Scalable Language-Agnostic Analysis of Source Code and VCS History

Vadim Markovtsev, source{d}.

Huge codebases

Codebase              LoC, 10⁶
Chrome                      20
Windows 10                  50
Facebook                    70
Eclipse Foundation         160
Apache Foundation          190
Google                   2,000
GitHub                 >56,000

Problems

source{d} solutions

borges

Features

siva

Challenges

Engine

Features

Easy to try

$ srcd init /path/to/git/repos
$ srcd web sql
       

Top 10 repositories by commit count

SELECT r.repository_id, COUNT(*) AS commit_count
FROM ref_commits AS r
WHERE r.ref_name = 'HEAD'
GROUP BY r.repository_id
ORDER BY commit_count DESC LIMIT 10
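
To see what the query above returns without a running srcd instance, it can be tried on a toy table. The sketch below is only an illustration: it uses Python's built-in sqlite3 as a stand-in for the real SQL engine, models only the `ref_commits` columns the query touches (the `commit_hash` column is an assumption), and every row is invented.

```python
import sqlite3

# Toy stand-in for ref_commits: invented rows, assumed column layout.
ROWS = [
    ("repo-a", "HEAD", "c1"),
    ("repo-a", "HEAD", "c2"),
    ("repo-a", "HEAD", "c3"),
    ("repo-a", "refs/heads/dev", "c4"),  # not on HEAD, must be ignored
    ("repo-b", "HEAD", "c5"),
    ("repo-c", "HEAD", "c6"),
    ("repo-c", "HEAD", "c7"),
]

# The query from the slide, unchanged.
QUERY = """
SELECT r.repository_id, COUNT(*) AS commit_count
FROM ref_commits AS r
WHERE r.ref_name = 'HEAD'
GROUP BY r.repository_id
ORDER BY commit_count DESC LIMIT 10
"""


def top_repositories(rows):
    """Load rows into an in-memory table and run the slide's query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ref_commits "
        "(repository_id TEXT, ref_name TEXT, commit_hash TEXT)"
    )
    conn.executemany("INSERT INTO ref_commits VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for repository_id, commit_count in top_repositories(ROWS):
        print(repository_id, commit_count)
```

Only commits reachable from `HEAD` are counted, so the `refs/heads/dev` row does not contribute to `repo-a`.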
        

Examples

How to parse

Universal AST

$ srcd web parse
        

Number of functions per Go file

SELECT files.repository_id, files.file_path,
    ARRAY_LENGTH(uast_extract(UAST(
        files.blob_content,
        LANGUAGE(files.file_path, files.blob_content),
        '//*[@roleFunction and @roleDeclaration]'), 'token')
    ) as functions
FROM files NATURAL JOIN ref_commits AS rc
WHERE rc.ref_name = 'HEAD' AND rc.history_index = 0
      AND LANGUAGE(files.file_path, files.blob_content) = 'Go'

Challenges

Hercules

Features

Line burndown

git blame foo.py

2014
class Foo:
  def bar(self):
    print("!")
2015
class Foo:
  def bar(self):
    print("?")
 
  def baz(self):
    print("!")
2016
class FooBarBaz:
  def bar(self):
    """..."""
    print("yo")
 
  def baz(self):
    print("!")
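
The three snapshots above can be turned into burndown numbers by tracking, for every line, the year it was first written. The following is a minimal sketch of that idea, not Hercules' actual implementation: it swaps in Python's `difflib.SequenceMatcher` for git's diff, and the function names `blame_years` and `burndown` are made up for this illustration.

```python
from collections import Counter
from difflib import SequenceMatcher

# The three versions of foo.py from the slide, keyed by year.
SNAPSHOTS = {
    2014: ['class Foo:', '  def bar(self):', '    print("!")'],
    2015: ['class Foo:', '  def bar(self):', '    print("?")', '',
           '  def baz(self):', '    print("!")'],
    2016: ['class FooBarBaz:', '  def bar(self):', '    """..."""',
           '    print("yo")', '', '  def baz(self):', '    print("!")'],
}


def blame_years(snapshots):
    """Return {year: [birth year of every line in that snapshot]}."""
    result = {}
    prev_lines, prev_birth = [], []
    for year in sorted(snapshots):
        lines = snapshots[year]
        birth = [year] * len(lines)  # assume new until matched to an old line
        matcher = SequenceMatcher(a=prev_lines, b=lines, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                # Unchanged lines inherit the birth year they already had.
                birth[block.b + k] = prev_birth[block.a + k]
        result[year] = birth
        prev_lines, prev_birth = lines, birth
    return result


def burndown(snapshots):
    """Return {year: {birth year: number of surviving lines}}."""
    return {year: dict(Counter(birth))
            for year, birth in blame_years(snapshots).items()}


if __name__ == "__main__":
    for year, survivors in burndown(SNAPSHOTS).items():
        print(year, survivors)
```

Each row of the result is one vertical slice of a burndown plot: how many lines written in each year are still alive at that point. Note that a purely line-based diff matches the 2014 `print("!")` with the identical line at the end of `baz`, so that line keeps its 2014 birth year.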

Line burndown

Linux

Reproduce

hercules --burndown --first-parent --pb \
    git://github.com/torvalds/linux | \
    labours.py -m project -f pb
        

Line overwrite matrix

Reproduce

hercules --burndown --burndown-people --pb \
    git://github.com/tensorflow/tensorflow | \
    labours.py -m churn_matrix -f pb
        

Reproduce

hercules --couples --pb \
    git://github.com/tensorflow/tensorflow | \
    labours.py -m couples -f pb
        
It is also possible to embed developers, functions and classes.
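
A rough sketch of the idea behind coupling, assuming it is measured by how often two files change in the same commit: count co-occurrences per commit and rank the pairs. The commit data and file names below are invented, `cooccurrence` is a hypothetical helper, and the embedding step that would consume such a matrix is not shown.

```python
from collections import Counter
from itertools import combinations

# Invented commit history: each entry lists the files touched by one commit.
COMMITS = [
    ["core/graph.py", "core/session.py"],
    ["core/graph.py", "core/session.py", "docs/README.md"],
    ["docs/README.md"],
    ["core/graph.py", "tests/test_graph.py"],
]


def cooccurrence(commits):
    """Count how many commits touched each unordered pair of files."""
    counts = Counter()
    for files in commits:
        # sorted(set(...)) gives every pair one canonical key per commit.
        for a, b in combinations(sorted(set(files)), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    for (a, b), n in cooccurrence(COMMITS).most_common():
        print(n, a, b)
```

The same counting works for developers, functions or classes once each commit is mapped to the set of entities it touches.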

Summary

Thank you

bit.ly/2QMOY0Q