Suggesting Fixes during Code Review with ML

Vadim Markovtsev

Machine Learning for Large Scale Code Analysis

Plan

Origins ➙ Lookout ➙ SDK ➙ Demo ➙ style-analyzer

Origins

Many efforts target boring stuff

Boring means automatable

Automatable ≠ unattended 😔

When to help?

While you type = IDE
While you check = CI
While you review = PR
Periodically, asynchronously

What is Lookout?

Goals

Assisted code review platform
Tight git/GitHub integration
Agnostic to the analyzed language
Agnostic to the implementation language
Batteries included

Architecture

Push event

PR event

docs.sourced.tech/lookout

SDK

src-d/lookout-sdk

Single source of gRPC definitions
Low-level API: Go, Python
Low-level examples

src-d/lookout-sdk-ml

High-level Python API
Stateful analyzers
Integrated with source{d} ml ecosystem

Rule of 👍

High-level API

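class MyAnalyzer(Analyzer):
    @classmethod
    def train(cls, ptr, config, data_service, files) -> AnalyzerModel:
        # build a model from the repository files and return it
        ...

    def analyze(self, ptr_from, ptr_to, data_service, changes) -> [Comment]:
        # do something with self.model and return the review comments
        ...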

Train

@classmethod
@with_uasts_and_contents
def train(cls,
          ptr: ReferencePointer,
          config: Dict[str, Any],
          data_service: DataService,
          files: Iterable[File]
          ) -> AnalyzerModel:
    ...

Analyze

@with_changed_uasts_and_contents
def analyze(self,
            ptr_from: ReferencePointer,
            ptr_to: ReferencePointer,
            data_service: DataService,
            changes: Iterable[Change]
            ) -> [Comment]:
    ...
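The split between `train` and `analyze` is what makes an analyzer stateful: it learns once per repository, the model is stored, and every later review reuses it. The sketch below mimics that lifecycle with plain Python stand-ins so that it runs without Lookout. `ToyModel`, `IndentAnalyzer`, the `(path, text)` file tuples and the `(path, line_no, text)` change tuples are all hypothetical and are not part of lookout-sdk-ml.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ToyModel:
    """Hypothetical stand-in for AnalyzerModel: the learned indentation width."""
    indent_width: int


class IndentAnalyzer:
    """Hypothetical stateful analyzer: trained once, then reused on every review."""

    def __init__(self, model: ToyModel):
        self.model = model

    @classmethod
    def train(cls, files: Iterable[Tuple[str, str]]) -> ToyModel:
        # files: (path, text) pairs; count the leading-space width of indented lines.
        widths = Counter()
        for _path, text in files:
            for line in text.splitlines():
                body = line.lstrip(" ")
                if body and len(line) > len(body):
                    widths[len(line) - len(body)] += 1
        # The most common width is taken as the project's indentation unit (4 if none).
        return ToyModel(widths.most_common(1)[0][0] if widths else 4)

    def analyze(self, changes: Iterable[Tuple[str, int, str]]) -> List[str]:
        # changes: (path, line number, new line text); comment on off-grid indents.
        comments = []
        for path, line_no, text in changes:
            indent = len(text) - len(text.lstrip(" "))
            if indent % self.model.indent_width:
                comments.append(f"{path}:{line_no}: indentation {indent} is not "
                                f"a multiple of {self.model.indent_width}")
        return comments


model = IndentAnalyzer.train([("a.py", "def f():\n    return 1\n")])
analyzer = IndentAnalyzer(model)
print(analyzer.analyze([("a.py", 2, "   return 2")]))
# ['a.py:2: indentation 3 is not a multiple of 4']
```

In the real SDK, storing and reloading the model between the two calls is handled behind the scenes.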

Behind the scenes

gRPC servers and clients
Pooling and threading
Database of trained models
Caches
Logging
Metrics

Demo

style-analyzer

Training

Parse to intermediate representation
Train Decision Tree Forest
Extract production rules
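Every root-to-leaf path in a decision tree is a conjunction of threshold tests ending in a class, so it can be read off directly as a production rule of the kind shown on the Rules slide. A minimal sketch, assuming a hypothetical nested-dict tree (internal nodes hold `feature`, `threshold`, `left` for the `<=` branch and `right` for the `>` branch; leaves hold `label`). Real tree libraries store the same information in their own structures.

```python
from typing import Dict, List, Tuple

Condition = Tuple[str, str, float]          # (feature, "<=" or ">", threshold)
Rule = Tuple[Tuple[Condition, ...], str]    # (conjunction of conditions, class)


def extract_rules(node: Dict, path: Tuple[Condition, ...] = ()) -> List[Rule]:
    """Return one rule per root-to-leaf path of the tree."""
    if "label" in node:                      # leaf: the path so far is the rule
        return [(path, node["label"])]
    f, t = node["feature"], node["threshold"]
    return (extract_rules(node["left"], path + ((f, "<=", t),))     # test holds
            + extract_rules(node["right"], path + ((f, ">", t),)))  # test fails


tree = {
    "feature": "a", "threshold": 5,
    "left": {"feature": "b", "threshold": 1,
             "left": {"label": "alpha"}, "right": {"label": "beta"}},
    "right": {"label": "gamma"},
}
for conditions, label in extract_rules(tree):
    print(" ∧ ".join(f"{f}{op}{t}" for f, op, t in conditions), "⇒", label)
# a<=5 ∧ b<=1 ⇒ alpha
# a<=5 ∧ b>1 ⇒ beta
# a>5 ⇒ gamma
```

With a forest, the rules from all trees are pooled, which is why the optimization step that follows matters.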

Virtual nodes

a = b * 2
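As we read this slide, formatting is not part of the syntax tree, so the whitespace between real tokens is made explicit as virtual nodes whose content the model can then predict. The sketch uses a simplified regex tokenizer instead of a real UAST, and models only spaces and tabs; the tokenizer and the `Node` type are illustrative assumptions. It also adds an empty virtual node between two adjacent real tokens, so that a missing space is still a position that can receive a prediction.

```python
import re
from typing import List, NamedTuple


class Node(NamedTuple):
    text: str
    virtual: bool  # True for formatting (whitespace) nodes, False for real tokens


def to_nodes(code: str) -> List[Node]:
    """Split code into real tokens and the (possibly empty) whitespace between them."""
    nodes: List[Node] = []
    for match in re.finditer(r"\s+|\w+|[^\w\s]", code):
        node = Node(match.group(), match.group().isspace())
        # Two real tokens touching: insert an empty virtual node between them.
        if nodes and not nodes[-1].virtual and not node.virtual:
            nodes.append(Node("", True))
        nodes.append(node)
    return nodes


print([n.text for n in to_nodes("a = b * 2")])
# ['a', ' ', '=', ' ', 'b', ' ', '*', ' ', '2']
print([n.text for n in to_nodes("b*2")])
# ['b', '', '*', '', '2']
```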

Machine Learning

Feature selection (univariate, ANOVA F-criterion)
Hyperparameter optimization (Bayesian)
80% + 20% split
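Univariate selection with the ANOVA F-criterion scores each feature on its own: the ratio of between-class variance to within-class variance, so features whose values differ strongly between classes rank highest. A pure-Python sketch of that scoring, a top-k selection, and an 80% / 20% split (the function names and the fixed seed are illustrative; scikit-learn provides the same steps as `SelectKBest(f_classif)` and `train_test_split`). The Bayesian hyperparameter search is not shown.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def anova_f(values: Sequence[float], labels: Sequence[str]) -> float:
    """One-way ANOVA F statistic of a single feature across the classes."""
    groups: Dict[str, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    if k < 2:
        return 0.0                          # a single class: nothing to separate
    grand = mean(values)
    ss_between = ss_within = 0.0
    for g in groups.values():
        m = mean(g)
        ss_between += len(g) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(rows: Sequence[Sequence[float]], labels: Sequence[str], k: int) -> List[int]:
    """Indices of the k features with the highest F statistic."""
    scores = [anova_f([row[j] for row in rows], labels) for j in range(len(rows[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def split_80_20(rows, labels, seed=0):
    """Shuffle the samples and hold out 20% of them for evaluation."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * 0.8)
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


rows = [[0, 1.0], [0, 5.0], [1, 1.2], [1, 4.8], [2, 1.1], [2, 5.1]]
labels = ["x", "y", "x", "y", "x", "y"]
print(select_k_best(rows, labels, k=1))  # [1]: only feature 1 separates x from y
```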

Rules

a≤5 ∧ b≤1 ∧ c ⇒ α
a≤5 ∧ 1<b<4 ⇒ β
5<a<10 ∧ c ⇒ γ
a>5 ∧ c ∧ b>2 ⇒ α
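Each rule is a conjunction of feature tests plus a predicted class; a sample gets the class of a rule whose tests all hold. A sketch encoding the four rules above, assuming features arrive as a dict, boolean features such as `c` are tested for truth, `1<b<4` is split into two conditions, and the first matching rule wins (conflict resolution is an assumption here, not something the slide specifies). When no rule matches, the model abstains.

```python
from typing import Any, Dict, List, Optional, Tuple

Condition = Tuple[str, str, Any]        # (feature, operator, value)
Rule = Tuple[List[Condition], str]      # (conjunction of conditions, class)

OPS = {
    "<=": lambda x, t: x <= t,
    "<": lambda x, t: x < t,
    ">": lambda x, t: x > t,
    "is": lambda x, t: bool(x) is t,    # boolean features such as c
}

# The four rules from the slide, with alpha/beta/gamma as the classes.
RULES: List[Rule] = [
    ([("a", "<=", 5), ("b", "<=", 1), ("c", "is", True)], "alpha"),
    ([("a", "<=", 5), ("b", ">", 1), ("b", "<", 4)], "beta"),
    ([("a", ">", 5), ("a", "<", 10), ("c", "is", True)], "gamma"),
    ([("a", ">", 5), ("c", "is", True), ("b", ">", 2)], "alpha"),
]


def predict(rules: List[Rule], sample: Dict[str, Any]) -> Optional[str]:
    """Class of the first rule whose conditions all hold, or None to abstain."""
    for conditions, label in rules:
        if all(OPS[op](sample[f], t) for f, op, t in conditions):
            return label
    return None


print(predict(RULES, {"a": 3, "b": 2, "c": False}))   # beta
print(predict(RULES, {"a": 12, "b": 3, "c": True}))   # alpha
print(predict(RULES, {"a": 7, "b": 0, "c": False}))   # None
```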

Rules optimization

a>5 ∧ c ∧ b>2 ∧ d ∧ a>10 ⇒ α (merge)

a>10 ∧ c ∧ b>2 ∧ d ⇒ α (redundant)

a>10 ∧ c ∧ d ⇒ α (feature exclusion)

a>10 ∧ c ⇒ α
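The merge step collapses several conditions on the same feature into the tightest one, turning `a>5 ∧ a>10` into `a>10`. A sketch of that step alone, using hypothetical `(feature, operator, value)` triples; the redundant-condition and feature-exclusion steps are not sketched here.

```python
from typing import Any, Dict, List, Tuple

Condition = Tuple[str, str, Any]   # (feature, operator, value)


def merge_conditions(conditions: List[Condition]) -> List[Condition]:
    """Merge step: keep only the tightest bound per feature and direction."""
    lower: Dict[str, Any] = {}      # feature > t: the largest t is the tightest
    upper: Dict[str, Any] = {}      # feature <= t: the smallest t is the tightest
    rest: List[Condition] = []      # any other condition passes through unchanged
    for f, op, t in conditions:
        if op == ">":
            lower[f] = max(t, lower.get(f, t))
        elif op == "<=":
            upper[f] = min(t, upper.get(f, t))
        else:
            rest.append((f, op, t))
    return ([(f, ">", t) for f, t in lower.items()]
            + [(f, "<=", t) for f, t in upper.items()]
            + rest)


rule = [("a", ">", 5), ("c", "is", True), ("b", ">", 2), ("d", "is", True), ("a", ">", 10)]
print(merge_conditions(rule))
# [('a', '>', 10), ('b', '>', 2), ('c', 'is', True), ('d', 'is', True)]
```

The result is the same conjunction as `a>10 ∧ c ∧ b>2 ∧ d`, only in a different order.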

Result

40% ~ 60% fewer attributes
30% ~ 50% fewer rules

Rules optimization

a>5 ∧ c ∧ b>2 ∧ d ∧ a>10 ⇒ α (merge)

a>10 ∧ c ∧ b>2 ∧ d ⇒ α (redundant)

a>10 ∧ c ∧ d ⇒ α (feature exclusion)

a>10 ∧ c ⇒ α (confidence threshold)

Result

40% ~ 50% fewer rules @93%
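The confidence-threshold step drops whole rules that are not trustworthy enough. In this sketch, confidence is assumed to be the share of training samples matched by a rule that carry the rule's class; that definition, the predicate-based rule format and the threshold value are illustrative assumptions.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Sample = Dict[str, Any]
Rule = Tuple[Callable[[Sample], bool], str]   # (predicate, predicted class)


def confidence(rule: Rule, data: Sequence[Tuple[Sample, str]]) -> float:
    """Share of the samples matched by the rule that carry the rule's class."""
    predicate, label = rule
    hits = [y for x, y in data if predicate(x)]
    return sum(y == label for y in hits) / len(hits) if hits else 0.0


def prune(rules: List[Rule], data: Sequence[Tuple[Sample, str]], threshold: float) -> List[Rule]:
    """Drop the rules whose confidence is below the threshold."""
    return [r for r in rules if confidence(r, data) >= threshold]


data = [({"a": 12, "c": True}, "alpha"), ({"a": 15, "c": True}, "alpha"),
        ({"a": 11, "c": True}, "beta"), ({"a": 3, "c": False}, "beta")]
r1 = (lambda x: x["a"] > 10 and x["c"], "alpha")   # a>10 ∧ c ⇒ alpha: 2 of 3 correct
r2 = (lambda x: x["a"] <= 5, "beta")                # a<=5 ⇒ beta: 1 of 1 correct
print(len(prune([r1, r2], data, threshold=0.9)))   # 1 (only r2 survives)
```

Raising the threshold trades fewer rules, and fewer predictions, for higher precision.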

Inference

Apply rules
- Fixes to old code?
- AST breakage?
- Identification?
Generate code
- Indentation?
- Multiple lines?
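One way to guard against AST breakage is to accept a generated fix only if the code still parses to the same syntax tree, so the fix changes formatting and nothing else. The analyzed language need not be Python; this sketch uses Python's standard `ast` module only as a stand-in to show the idea.

```python
import ast


def is_safe_fix(original: str, fixed: str) -> bool:
    """Accept a fix only if the code still parses to the same AST.

    ast.dump() omits line and column attributes by default, so edits that only
    change spacing between tokens compare equal, while edits that alter tokens
    or structure do not.
    """
    try:
        return ast.dump(ast.parse(original)) == ast.dump(ast.parse(fixed))
    except SyntaxError:  # the fix (or the original) does not parse at all
        return False


print(is_safe_fix("a = b*2\n", "a = b * 2\n"))              # True: spacing only
print(is_safe_fix("x = 1\n", "x = 12\n"))                    # False: value changed
print(is_safe_fix("if x:\n    y = 1\n", "if x:\ny = 1\n"))  # False: does not parse
```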

Precision > Recall

Prediction Rate (PredR)
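Because the rules may abstain, one number is not enough: precision is measured only over the samples that received a prediction, and the prediction rate (PredR) is the share of samples that received one at all. A sketch, assuming `None` marks an abstention:

```python
from typing import Optional, Sequence, Tuple


def precision_and_predr(predicted: Sequence[Optional[str]],
                        actual: Sequence[str]) -> Tuple[float, float]:
    """Precision over the predictions made, and the share of samples predicted."""
    made = [(p, a) for p, a in zip(predicted, actual) if p is not None]
    predr = len(made) / len(actual) if actual else 0.0
    precision = sum(p == a for p, a in made) / len(made) if made else 0.0
    return precision, predr


precision, predr = precision_and_predr(["a", "b", None, "a"], ["a", "a", "b", "a"])
# 3 of 4 samples got a prediction (PredR 0.75); 2 of those 3 are right (precision 2/3)
```

Favoring precision over recall means accepting a lower PredR in exchange for suggestions that are rarely wrong.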

Evaluation

[Chart: Precision and PredR]

~95% weighted average

We don't test the real behavior

Evaluation

170 handcrafted errors
2 projects
95% precision @50% PredR

We don't test the real behavior

Evaluation improvements

Extend
Random mutations
Extract from commits
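Random mutations can generate evaluation cases automatically: take well-formatted code, perturb one whitespace run, and check whether the analyzer suggests the original back at that spot. A sketch of the mutation step only; the function name, the set of replacement runs and the seeding are illustrative assumptions.

```python
import random
import re
from typing import Tuple


def mutate_whitespace(code: str, rng: random.Random) -> Tuple[str, int, str]:
    """Replace one random run of spaces/tabs with a different non-empty run.

    Returns (mutated code, offset of the change, original run); the original
    run is the fix a good analyzer should suggest at that offset.
    """
    runs = list(re.finditer(r"[ \t]+", code))
    if not runs:
        raise ValueError("no whitespace to mutate")
    run = rng.choice(runs)
    # Never use "" so that neighbouring tokens cannot merge (e.g. "return x").
    choices = [s for s in (" ", "  ", "   ", "\t") if s != run.group()]
    mutated = code[:run.start()] + rng.choice(choices) + code[run.end():]
    return mutated, run.start(), run.group()


mutated, offset, expected = mutate_whitespace("a = b * 2", random.Random(0))
```

Because every mutation has a known correct answer, this yields many labeled errors without handcrafting them.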

Summary

Assisted code review + Lookout = ♥
style-analyzer is fun
#MLonCode is dope

Thank you

bit.ly/2B9tzZw