Suggesting Fixes during Code Review with ML

Vadim Markovtsev, source{d}.

Plan

Origins    Lookout    SDK    Demo    style-analyzer

Origins

Many efforts target boring stuff

Boring means automatable

Automatable ≠ unattended 😔

When to help?

What is Lookout?

Goals

Architecture

Push event

PR event

docs.sourced.tech/lookout

SDK

src-d/lookout-sdk

src-d/lookout-sdk-ml

Rule of 👍

High-level API

class MyAnalyzer(Analyzer):
    @classmethod
    def train(cls, ...) -> AnalyzerModel:
        # ...

    def analyze(self, ...) -> [Comment]:
        # do something with self.model

Train

@classmethod
@with_uasts_and_contents
def train(cls,
          ptr: ReferencePointer,
          config: Dict[str, Any],
          data_service: DataService,
          files: Iterable[File]
          ) -> AnalyzerModel:

Analyze

@with_changed_uasts_and_contents
def analyze(self,
            ptr_from: ReferencePointer,
            ptr_to: ReferencePointer,
            data_service: DataService,
            changes: Iterable[Change]
            ) -> [Comment]:
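
The split between the two entry points can be illustrated with a self-contained sketch: train() turns the existing files into a model, and analyze() uses that model to comment on changes. It does not import lookout-sdk-ml. SketchModel, SketchComment, and LineLengthAnalyzer are hypothetical stand-ins, and files are plain (path, text) pairs rather than the SDK's File and Change objects.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


# Hypothetical stand-ins; none of these are lookout-sdk-ml types.
@dataclass
class SketchModel:
    max_line_length: int


@dataclass
class SketchComment:
    file: str
    line: int
    text: str


class LineLengthAnalyzer:
    def __init__(self, model: SketchModel):
        self.model = model

    @classmethod
    def train(cls, files: Iterable[Tuple[str, str]]) -> SketchModel:
        # "Learn" the longest line already present in the repository.
        longest = max(
            (len(line) for _, text in files for line in text.splitlines()),
            default=0,
        )
        return SketchModel(max_line_length=longest)

    def analyze(self, changes: Iterable[Tuple[str, str]]) -> List[SketchComment]:
        # Comment on every changed line that is longer than anything seen in training.
        comments = []
        limit = self.model.max_line_length
        for path, text in changes:
            for number, line in enumerate(text.splitlines(), start=1):
                if len(line) > limit:
                    message = f"{len(line)} characters, repository maximum is {limit}"
                    comments.append(SketchComment(path, number, message))
        return comments


model = LineLengthAnalyzer.train([("a.py", "x = 1\ny = 22\n")])
comments = LineLengthAnalyzer(model).analyze([("b.py", "z = 3\nlong_name = 12345\n")])
```

In this toy run the only comment lands on line 2 of b.py, because it is the only changed line longer than the trained maximum.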

Behind the scenes

Demo

style-analyzer

Training

  1. Parse to intermediate representation
  2. Train Decision Tree Forest
  3. Extract production rules
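
Step 3 can be sketched as follows, assuming a rule is simply the conjunction of the split conditions on one root-to-leaf path. The tree here is built by hand instead of being trained, and Leaf, Split, the feature names, and the confidence values are hypothetical. A forest would contribute the rules of each of its trees in the same way.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str
    confidence: float


@dataclass
class Split:
    feature: str
    threshold: float
    left: Union["Split", Leaf]   # taken when feature <= threshold
    right: Union["Split", Leaf]  # taken when feature > threshold


Condition = Tuple[str, str, float]         # (feature, "<=" or ">", threshold)
Rule = Tuple[List[Condition], str, float]  # (conditions, label, confidence)


def extract_rules(node: Union[Split, Leaf], path: Tuple[Condition, ...] = ()) -> List[Rule]:
    # Every leaf yields one rule: the conditions collected on the way down.
    if isinstance(node, Leaf):
        return [(list(path), node.label, node.confidence)]
    go_left = path + ((node.feature, "<=", node.threshold),)
    go_right = path + ((node.feature, ">", node.threshold),)
    return extract_rules(node.left, go_left) + extract_rules(node.right, go_right)


tree = Split("a", 10, Leaf("β", 0.8), Split("b", 2, Leaf("β", 0.6), Leaf("α", 0.95)))
rules = extract_rules(tree)
```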

Virtual nodes

a = b * 2
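
A minimal sketch of the idea, under the assumption that virtual nodes fill the stretches of source text that the parsed tree does not cover, so that the node sequence reproduces the file character by character. The Node class, the fill_gaps helper, and the offsets of the parsed nodes (a, b, and 2) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    text: str
    start: int
    end: int
    from_parser: bool  # False for a gap-filling ("virtual") node


def fill_gaps(source: str, parsed: List[Tuple[int, int]]) -> List[Node]:
    # Walk the parsed spans in order and emit a virtual node for every gap.
    nodes, cursor = [], 0
    for start, end in sorted(parsed):
        if start > cursor:
            nodes.append(Node(source[cursor:start], cursor, start, False))
        nodes.append(Node(source[start:end], start, end, True))
        cursor = end
    if cursor < len(source):
        nodes.append(Node(source[cursor:], cursor, len(source), False))
    return nodes


source = "a = b * 2"
nodes = fill_gaps(source, [(0, 1), (4, 5), (8, 9)])
texts = [node.text for node in nodes]
```

Joining the node texts gives back the original line, which is what lets a formatting model reason about the whitespace and operators between parsed nodes.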

Machine Learning

Rules

Rules optimization

a>5 ∧ c ∧ b>2 ∧ d ∧ a>10 ⇒ α    (merge)
a>10 ∧ c ∧ b>2 ∧ d ⇒ α    (redundant)
a>10 ∧ c ∧ d ⇒ α    (feature exclusion)
a>10 ∧ c ⇒ α    (confidence threshold)
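
Two of these steps are simple enough to sketch: merging repeated thresholds on the same feature, and dropping rules below a confidence threshold. The rule representation, the function names, the confidence values, and the 0.9 cut-off are assumptions for illustration. The "redundant" and "feature exclusion" steps are left out because the slide does not say how those conditions are selected.

```python
from typing import Dict, List, Optional, Tuple

# (feature, threshold) means feature > threshold; None marks a boolean feature.
Condition = Tuple[str, Optional[float]]
Rule = Tuple[List[Condition], str, float]  # (conditions, label, confidence)


def merge_thresholds(conditions: List[Condition]) -> List[Condition]:
    merged: Dict[str, Optional[float]] = {}
    for feature, threshold in conditions:
        previous = merged.get(feature)
        if feature in merged and previous is not None and threshold is not None:
            # x > s and x > t together mean x > max(s, t).
            merged[feature] = max(previous, threshold)
        else:
            merged.setdefault(feature, threshold)
    return list(merged.items())


def keep_confident(rules: List[Rule], minimum: float) -> List[Rule]:
    # Discard rules whose confidence is below the chosen minimum.
    return [rule for rule in rules if rule[2] >= minimum]


first_line = [("a", 5), ("c", None), ("b", 2), ("d", None), ("a", 10)]
merged = merge_thresholds(first_line)
kept = keep_confident([(merged, "α", 0.97), ([("c", None)], "β", 0.6)], 0.9)
```

Applied to the first line above, the merge step yields the conditions of the second line: a>10, c, b>2, d.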

Result

Inference

  1. Apply rules
    • Fixes to old code?
    • AST breakage?
    • Identification?
  2. Generate code
    • Indentation?
    • Multiple lines?

Precision > Recall

Prediction Rate (PredR)
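
A sketch of how the two numbers could be computed, assuming the model may abstain when no rule fires and that PredR is the share of samples that received a prediction. That definition, the function name, and the sample data are assumptions, not taken from the slides. Precision is counted here only over the samples the model answered.

```python
from typing import List, Optional, Tuple


def precision_and_predr(truth: List[str],
                        predicted: List[Optional[str]]) -> Tuple[float, float]:
    # None stands for "no rule fired", i.e. the model abstained.
    answered = [(t, p) for t, p in zip(truth, predicted) if p is not None]
    if not answered:
        return 0.0, 0.0
    correct = sum(t == p for t, p in answered)
    return correct / len(answered), len(answered) / len(truth)


truth = ["a", "b", "a", "b", "a"]
predicted = ["a", None, "a", "a", None]
precision, predr = precision_and_predr(truth, predicted)
```

Raising the confidence threshold typically trades PredR for precision: fewer samples are answered, but a larger share of the answers is correct.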

Evaluation

~95% weighted avg.

We don't test the real behavior

Evaluation improvements

Summary

Thank you

bit.ly/2B9tzZw