Machine Learning on Open Source Code

Vadim Markovtsev, source{d}.

#MLonCode

Vadim Markovtsev

@vadimlearning

🚀 Machine Learning

GitHub

Machine Learning on Source Code

Projects similar to MariaDB/server?


Details

Similar code detection

provides us

Another example: DeepCode.ai.
class foobar:
    def connecttoserver(self):
        myserverhost = globalconfig.server.host
        
class FooBar:
    def connect_to_server(self):
        myServerHost = globalConfig.server.host
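
The two classes above differ only in how their identifiers are spelled. The slide does not say how the match is made, so the following is a minimal sketch under the assumption that the example is about naming style: it lowercases every identifier and drops underscores before comparing. The names `normalized_identifiers`, `SNIPPET_A` and `SNIPPET_B` are hypothetical, and this is not a description of how source{d} or DeepCode.ai detect similar code.

```python
import re

# Matches Python-style names (keywords included, which is fine for a comparison).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def normalized_identifiers(code):
    """Return the set of names in `code` with case and underscores removed."""
    return {name.replace("_", "").lower() for name in IDENTIFIER.findall(code)}


SNIPPET_A = """
class foobar:
    def connecttoserver(self):
        myserverhost = globalconfig.server.host
"""

SNIPPET_B = """
class FooBar:
    def connect_to_server(self):
        myServerHost = globalConfig.server.host
"""

# Both snippets collapse to the same set of normalized names.
same_names = normalized_identifiers(SNIPPET_A) == normalized_identifiers(SNIPPET_B)
print(same_names)
```

Removing case and underscores is deliberately crude: it makes `connecttoserver` and `connect_to_server` equal, but it would also merge unrelated names that happen to collide after normalization.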
        

Your code is a crime scene

Tools for MLonCode

Datasets

PGA

Details in the paper.

source{d} engine

AST

Universal AST

dashboard.bblf.sh
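>>> # `engine` is assumed to be an already initialized source{d} engine session.
>>> # Take the HEAD reference of every repository, keep the Python blobs,
>>> # parse them into UASTs and extract the identifier tokens of each file.
>>> engine.repositories.references.head_ref \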
    .commits.tree_entries.blobs \
    .classify_languages() \
    .filter('lang = "Python"') \
    .extract_uasts() \
    .query_uast('//*[@roleIdentifier]') \
    .extract_tokens("result", "tokens") \
    .select("blob_id", "path", "tokens")
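
To make the `//*[@roleIdentifier]` step more concrete, here is a small sketch of the same kind of XPath selection on a toy tree, using only Python's standard library. The XML below is a hypothetical stand-in, not the real Babelfish UAST format; the element names, the `token` attribute and the `identifier_tokens` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified tree: nodes carrying the identifier role are marked
# with a roleIdentifier attribute, mirroring the XPath used in the query above.
TOY_UAST = """
<Module>
  <ClassDef roleIdentifier="true" token="FooBar">
    <FunctionDef roleIdentifier="true" token="connect_to_server">
      <Assign>
        <Name roleIdentifier="true" token="myServerHost"/>
        <Attribute roleIdentifier="true" token="globalConfig"/>
        <Str token="literal"/>
      </Assign>
    </FunctionDef>
  </ClassDef>
</Module>
"""


def identifier_tokens(xml_text):
    """Return the tokens of all nodes that have a roleIdentifier attribute."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree supports a limited XPath subset; [@attr] selects nodes
    # that have the given attribute, and results come in document order.
    return [node.get("token") for node in root.findall(".//*[@roleIdentifier]")]


print(identifier_tokens(TOY_UAST))
```

The `Str` node is skipped because it has no `roleIdentifier` attribute, which is the effect the query relies on: only identifiers end up in the token list.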
        

source{d} ml

GitHub

Modelforge

GitHub

Summary

Thank you

Contacts: