Releases Matching

Vadim Markovtsev
Head of Analytics

Problématique

What have we released today?
What to improve in how we release?

Clients

Do you mean...

🤔 Which pull requests were released?
🤔 Which JIRA issues were released?
🤔 What's your PR release time?
🤔 What's your release frequency, etc.?

How can we release?

Release == commit pointer + meta.

Create a Git tag.
Merge master to production.
Send Athenian a commit pointer.

Release by tag

🏷1.1.0 released PR X.

Release by branch

Release settings

Requirements

Automatic inference of the release process.
Support changing release settings.
Incredible, astounding, marvellous performance.
Correctness.
Map to JIRA issues.

Quirks

Squash

Tag deleted

Tag inserted

Rebase

Ambiguity

Match release branch: master|production

Overview

Plan

Discover release commits.
Analyze release ownership.
Match to PRs.

time_from, time_to

independent of the Heater

Discover release commits

Tags, events: obvious.
- SELECT FROM github.api_releases
Branches: follow the first parent from the matching branch heads.
Automatic: look for tags in (time_from - Δ, time_to + Δ),where Δ =3 weeks.
- Hence, it depends on time_from, time_to

Analyze release ownership

Match to PRs

Match to PRs - rebases

"Merge pull request #1841 from athenianco/fix-pr-dep-dupes"

False positives: <0.1%.

Empty merge commits are sometimes discarded 😭

Doesn't work for squashes 😭

Building the commit DAG

Recursive SQL:
Stop hashes. Pick maximum 25 equidistant first parents and cut.
Stop heads. Take maximum 25 known DAG entries.
- One of the reasons why we suck at monorepos.
Account for the eventual consistency:
- If (parent|child)_id does not exist in github.node_commit, abort mission.

Building the commit DAG

Code

Python for Athenian API

😄High quality libraries: numpy.
😅Rapid development libraries: pandas.
🤩Excellent coroutines support.
🤬Interpreter sucks: slow, Global Interpreter Lock.
🧐Powerful introspection => powerful profiling.
👽The fastest code is native code.

Python + C/C++ = Cython

@cython.boundscheck(False)
@cython.wraparound(False)
cdef void _copy_parents_to_array(const vector[vector[uint32_t]] *parents,
                                 uint32_t[:] output,
                                 int64_t[:] splits) nogil:
        cdef int64_t i, offset = 0
        for i in range(<int64_t>parents.size()):
            vec = dereference(parents)[i]  # (*parents)[i]
            memcpy(&output[offset], vec.data(), 4 * vec.size())
            offset += vec.size()
            splits[i] = offset

Athenian DAG format

DAG = (hashes, vertexes, edges)

For N commits with E parent-child relations:

hashes is a byte array of size Nx40, sorted by first axis
vertexes is an uint32 array of size (N + 1)
edges is an uint32 array of size E
Parent-children direction is inverted compared to Git-speak.

Athenian DAG format advantages

💪Memory footprint is minimal.
💪Data locality => DAG fits in L2/L3 cache.
💪Zero-copy, uniform usage in Python and Cython.
💪Fast enough O(log(N)) lookups.
- numpy has searchsorted - vectorized binary search.
💪DAG traversals rely on integer arrays vertexes and edges.
💪Inserting new commits is efficient O(N).

Precomputed tables

github.commit_history
github.release_match_spans
github.releases
github.release_facts

Everything related to releases has release_match in PK.

Releases Matching

Problématique

Do you mean...

How can we release?

Release by tag

Release by branch

Release settings

Requirements

Quirks

Squash

Tag deleted

Tag deleted

Tag inserted

Tag inserted

Rebase

Rebase

Ambiguity

Ambiguity

Overview

Plan

time_from, time_to

independent of the Heater

Discover release commits

Analyze release ownership

Match to PRs

Match to PRs - rebases

Building the commit DAG

Building the commit DAG

Code

Python for Athenian API

Python + C/C++ = Cython

Athenian DAG format

Athenian DAG format

Athenian DAG format advantages

Precomputed tables

Thank you, and see you tomorrow