Accelerating object serialization by using constraints

Vadim Markovtsev, Athenian.

About me

Humiliating pickle


Custom binary serialization

Releasing the GIL

Approach to pandas

Arrays with dtype=object

assert PyArray_IS_C_CONTIGUOUS(arr)
assert PyArray_NDIM(arr) == 1
assert PyArray_DESCR(arr).kind == b"O"
cdef PyObject **data = <PyObject **> PyArray_DATA(arr)
for i in range(PyArray_DIM(arr, 0)):
    serialize(data[i])  # each element is a borrowed PyObject *

stdlib containers

for i in range(PyList_GET_SIZE(obj)):
    serialize(PyList_GET_ITEM(obj, i))  # borrowed reference
while PyDict_Next(obj, &pos, &key, &val):  # pos starts at 0
    serialize(key)
    serialize(val)
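A pure-Python sketch of the same recursion, to show where the list and dict loops sit in a tag-length-value format. The one-byte tags, the little-endian length prefixes, and the `serialize(obj, out)` signature are assumptions for illustration, not the format from the talk, and pure Python gains none of the speed; only the structure carries over.

```python
import struct

# Hypothetical one-byte tags and little-endian length prefixes (not the talk's format).
TAG_LIST, TAG_DICT, TAG_INT, TAG_STR = b"L", b"D", b"I", b"S"


def serialize(obj, out: bytearray) -> None:
    """Append a tag-length-value encoding of obj to out, recursing into containers."""
    if isinstance(obj, list):
        out += TAG_LIST + struct.pack("<I", len(obj))
        for item in obj:                  # the PyList_GET_ITEM loop
            serialize(item, out)
    elif isinstance(obj, dict):
        out += TAG_DICT + struct.pack("<I", len(obj))
        for key, val in obj.items():      # the PyDict_Next loop
            serialize(key, out)
            serialize(val, out)
    elif isinstance(obj, int):
        out += TAG_INT + struct.pack("<q", obj)
    elif isinstance(obj, str):
        data = obj.encode("utf-8")
        out += TAG_STR + struct.pack("<I", len(data)) + data
    else:
        raise TypeError(f"unsupported type: {type(obj).__name__}")


buf = bytearray()
serialize({"a": [1, 2]}, buf)
```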

Integers, floats

double PyFloat_AS_DOUBLE(PyObject *)
long PyLong_AsLong(PyObject *)
void PyArray_ScalarAsCtype(PyObject *scalar, void *ctype)
memcpy(buffer, &value, sizeof(value))
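In pure Python, `struct.pack` plays the role of `memcpy`: the number goes into the buffer as raw machine bytes rather than being formatted as text. A minimal sketch assuming fixed 8-byte little-endian fields; `write_number` and `read_number` are hypothetical helpers.

```python
import struct


def write_number(buffer: bytearray, value) -> None:
    """Append value as 8 raw little-endian bytes: float64 for float, int64 for int."""
    if isinstance(value, float):
        buffer += struct.pack("<d", value)   # the memcpy of a C double
    elif isinstance(value, int):
        buffer += struct.pack("<q", value)   # raises struct.error outside int64
    else:
        raise TypeError(f"not a number: {type(value).__name__}")


def read_number(buffer, pos: int, is_float: bool):
    """Inverse of write_number for the 8 bytes starting at pos."""
    return struct.unpack_from("<d" if is_float else "<q", buffer, pos)[0]


buf = bytearray()
write_number(buf, 42)
write_number(buf, 0.5)
```

Like `PyLong_AsLong`, which raises `OverflowError` for values that do not fit a C `long`, the `"<q"` format rejects integers outside the int64 range.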


Strings

Internal representation (PEP 393): compact UCS1, UCS2, or UCS4, whichever is the narrowest that fits every character.

PyUnicode_GET_LENGTH(obj) * PyUnicode_KIND(obj)  # size of the raw character data in bytes
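Because every character of a compact string occupies exactly `PyUnicode_KIND(obj)` bytes, the raw buffer can be copied as-is and restored later with the same kind. Pure Python cannot read that buffer, so this sketch rebuilds an equivalent layout from the largest code point; `encode_like_pep393` and `decode_like_pep393` are hypothetical names.

```python
def encode_like_pep393(s: str) -> tuple[int, bytes]:
    """Return (kind, data) with one fixed-width code unit per character."""
    max_cp = max(map(ord, s), default=0)
    kind = 1 if max_cp < 0x100 else 2 if max_cp < 0x10000 else 4
    # Little-endian here for portability; CPython keeps its buffer in native order.
    data = b"".join(ord(ch).to_bytes(kind, "little") for ch in s)
    assert len(data) == len(s) * kind   # GET_LENGTH * KIND
    return kind, data


def decode_like_pep393(kind: int, data: bytes) -> str:
    """Inverse of encode_like_pep393: split data into kind-sized code points."""
    return "".join(chr(int.from_bytes(data[i:i + kind], "little"))
                   for i in range(0, len(data), kind))
```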

datetime and timedelta

PyDateTime_CAPI: the internal representation is a struct of fields (year, month, day, ...), not a timestamp, so the fields can be read directly

int PyDateTime_GET_YEAR(PyObject *)
int PyDateTime_GET_MONTH(PyObject *)
int PyDateTime_DATE_GET_SECOND(PyObject *)
int PyDateTime_DELTA_GET_DAYS(PyObject *)
int PyDateTime_DELTA_GET_SECONDS(PyObject *)
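Since a `datetime` already holds its calendar fields, they can be copied into fixed-width slots without ever computing a timestamp. A pure-Python sketch with `struct`; the 11-byte datetime layout and 12-byte timedelta layout are assumptions, and `tzinfo` and `fold` are ignored.

```python
import struct
from datetime import datetime, timedelta

# Hypothetical layout: year u16, month/day/hour/minute/second u8, microsecond u32.
DT_FORMAT = struct.Struct("<HBBBBBI")
# timedelta is normalized to days (possibly negative), seconds, microseconds.
TD_FORMAT = struct.Struct("<iII")


def pack_datetime(dt: datetime) -> bytes:
    # Read the fields directly, like PyDateTime_GET_YEAR and friends; no timestamp math.
    return DT_FORMAT.pack(dt.year, dt.month, dt.day,
                          dt.hour, dt.minute, dt.second, dt.microsecond)


def unpack_datetime(data: bytes) -> datetime:
    return datetime(*DT_FORMAT.unpack(data))


def pack_timedelta(td: timedelta) -> bytes:
    # Mirrors PyDateTime_DELTA_GET_DAYS / _SECONDS / _MICROSECONDS.
    return TD_FORMAT.pack(td.days, td.seconds, td.microseconds)


def unpack_timedelta(data: bytes) -> timedelta:
    days, seconds, microseconds = TD_FORMAT.unpack(data)
    return timedelta(days=days, seconds=seconds, microseconds=microseconds)
```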

~20x* faster than pickle


Humiliating json.dumps


class Movie:
    name: str
    rating: float
    actors: list[Actor]


Typical paginated request without DB pushdown:

blob = await load_from_cache(key)
movies = deserialize(blob)
selected = movies[offset:offset + limit]
return to_json(selected)


This is slow as hell with our "movies".

Problem - attempt to fix

First call:

movies = await bake_movies(...)
dicts = to_atoms(movies)
await store_to_cache(serialize(dicts), key)
return to_json(dicts[:limit]), key

n + 1 call:

blob = await load_from_cache(key)
dicts = deserialize(blob)
return to_json(dicts[offset:offset + limit]), key


The rendered JSON blob and its toc, the offset where each movie starts:

'[{"name": "RRR", ...},{"name": "Up", ...},{...'
[ 1,                  100,                200]
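One way `to_json_vadim` could produce such a blob and toc, sketched in pure Python. This is a hypothetical implementation: it calls `json.dumps` on plain dicts instead of walking a compiled spec, and it appends one sentinel entry equal to `len(blob)`, so that `toc[i + 1] - 1` always points at the comma or closing bracket after movie `i`. Offsets are character indexes; with the default `ensure_ascii=True` the output is ASCII, so they match byte offsets too.

```python
import json


def to_json_vadim(movies: list) -> tuple[str, list]:
    """Render movies as one JSON array; toc[i] is the offset where movie i starts."""
    parts, toc = ["["], []
    pos = 1                            # offset right after "["
    for i, movie in enumerate(movies):
        if i:
            parts.append(",")
            pos += 1
        toc.append(pos)
        chunk = json.dumps(movie)      # stand-in for the spec-driven writer
        parts.append(chunk)
        pos += len(chunk)
    parts.append("]")
    toc.append(pos + 1)                # sentinel equal to len(blob)
    return "".join(parts), toc


blob, toc = to_json_vadim([{"name": "RRR"}, {"name": "Up"}])
```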

Problem - my ideal solution

First call:

movies = await bake_movies(...)
blob, toc = to_json_vadim(movies)
await store_to_cache(serialize((blob, toc)), key)
selected = blob[toc[0]:toc[limit] - 1]  # skip the opening "[" of the blob
return f'{{"key": "{key}", "movies": [{selected}]}}'

n + 1 call:

blob, toc = deserialize(await load_from_cache(key))
selected = blob[toc[offset]:toc[offset + limit] - 1]
return f'{{"key": "{key}", "movies": [{selected}]}}'
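The slicing step on its own, as a sketch: `page_json` is a hypothetical helper that also clamps `offset` and `limit`, which the slide code leaves out, and returns a JSON array without parsing anything. It assumes `toc[i]` is the start of movie `i` and that the last entry is a sentinel equal to `len(blob)`; the small blob and toc below are written by hand in that shape.

```python
def page_json(blob: str, toc: list, offset: int, limit: int) -> str:
    """Cut movies [offset, offset + limit) out of the pre-rendered JSON array."""
    n = len(toc) - 1                     # the last toc entry is the sentinel
    start = min(max(offset, 0), n)
    stop = min(start + max(limit, 0), n)
    if start == stop:
        return "[]"
    # toc[stop] - 1 is the comma or "]" that follows the last selected movie
    return "[" + blob[toc[start]:toc[stop] - 1] + "]"


blob = '[{"name": "RRR"},{"name": "Up"},{"name": "Heat"}]'
toc = [1, 17, 32, 49]                    # offset of each "{", then len(blob)
```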


  1. Make the serialization specification.
  2. Write each movie according to the spec.
  3. Reimplement many routines:
    • list, dict
    • int, float to str
    • datetime, timedelta to str
    • str to JSON string: 'esc"ape\n' -> r'"esc\"ape\n"', then encode as UTF-8
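The last routine, turning a `str` into a JSON string literal, can be sketched as below. It escapes only what RFC 8259 requires (the quote, the backslash, and control characters below U+0020) and keeps every other character as-is, leaving UTF-8 encoding to the caller; `json_quote` is a hypothetical name and this is not the talk's implementation.

```python
import json

SHORT_ESCAPES = {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\r": "\\r",
                 "\t": "\\t", "\b": "\\b", "\f": "\\f"}


def json_quote(s: str) -> str:
    """Return s as a JSON string literal, escaping only the mandatory characters."""
    out = ['"']
    for ch in s:
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20:
            out.append(f"\\u{ord(ch):04x}")   # remaining control characters
        else:
            out.append(ch)                    # everything else verbatim
    out.append('"')
    return "".join(out)


# Round-trip check against the standard library parser.
sample = 'esc"ape\n\x01\u0436'
assert json.loads(json_quote(sample)) == sample
```

A string containing lone surrogates would still fail at the UTF-8 step; `json.dumps` with the default `ensure_ascii=True` sidesteps that by emitting `\uXXXX` escapes.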

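Main trick with __slots__: each slot lives at a fixed offset inside the object's C struct, so the spec can read a field by its offset instead of looking it up by name in a __dict__.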
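The Python-visible side of the trick can be checked directly: a class with `__slots__` gets one member descriptor per slot, instances carry no `__dict__`, and each value sits in the instance's C struct at a position the descriptor knows. A small sketch with a two-field `Movie` (example values only):

```python
class Movie:
    __slots__ = ("name", "rating")

    def __init__(self, name: str, rating: float):
        self.name = name
        self.rating = rating


movie = Movie("Up", 8.0)
# Each slot is a member descriptor on the class; the value lives in the instance
# struct at the position that descriptor knows, not in a per-instance __dict__.
rating_descriptor = Movie.__dict__["rating"]
rating = rating_descriptor.__get__(movie, Movie)
```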

Serialization spec (Cython)

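from cpython.object cimport PyTypeObject
from libcpp.vector cimport vector

cdef enum DataType:
    DT_INVALID = 0
    DT_MODEL = 1
    DT_LIST = 2
    DT_DICT = 3
    DT_LONG = 4
    DT_FLOAT = 5
    DT_STRING = 6
    DT_DT = 7    # datetime
    DT_TD = 8    # timedelta
    DT_BOOL = 9

ctypedef struct SpecNode:
    DataType type            # what is stored at this node
    Py_ssize_t offset        # where the field lives inside the model object
    PyTypeObject *model      # the model class, for DT_MODEL nodes
    vector[SpecNode] nested  # child nodes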
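A pure-Python model of the same idea, assuming `__slots__` classes whose annotations describe each field: `build_spec` turns the annotations into a tree of `SpecNode`s once, and `write` then serializes by following that tree instead of inspecting every value. `build_spec`, `write`, and the use of `operator.attrgetter` where the Cython struct keeps a raw slot offset are assumptions of this sketch; datetime, timedelta, and dict fields are left out, and scalars are formatted with `json.dumps`.

```python
import json
import operator
from dataclasses import dataclass, field
from enum import IntEnum
from typing import get_args, get_origin, get_type_hints

# Same codes as the Cython DataType enum: INVALID=0, MODEL=1, ..., BOOL=9.
DataType = IntEnum("DataType", "INVALID MODEL LIST DICT LONG FLOAT STRING DT TD BOOL", start=0)
SCALARS = {bool: DataType.BOOL, int: DataType.LONG, float: DataType.FLOAT, str: DataType.STRING}


@dataclass
class SpecNode:
    type: DataType
    key: str = ""                  # JSON key when this node is a model field
    getter: object = None          # stands in for the C-level slot offset
    model: object = None           # the model class for MODEL nodes
    nested: list = field(default_factory=list)


def build_spec(hint) -> SpecNode:
    """Translate type hints into a spec tree once, ahead of serialization."""
    if get_origin(hint) is list:
        return SpecNode(DataType.LIST, nested=[build_spec(get_args(hint)[0])])
    if hint in SCALARS:
        return SpecNode(SCALARS[hint])
    hints = get_type_hints(hint)   # datetime, dict, ... are left out of this sketch
    if not hints:
        raise TypeError(f"unsupported field type: {hint!r}")
    node = SpecNode(DataType.MODEL, model=hint)
    for name, sub in hints.items():
        child = build_spec(sub)
        child.key, child.getter = name, operator.attrgetter(name)
        node.nested.append(child)
    return node


def write(node: SpecNode, value, out: list) -> None:
    """Append the JSON text of value to out, driven only by the spec tree."""
    if node.type == DataType.MODEL:
        out.append("{")
        for i, child in enumerate(node.nested):
            out.append(("," if i else "") + json.dumps(child.key) + ":")
            write(child, child.getter(value), out)
        out.append("}")
    elif node.type == DataType.LIST:
        out.append("[")
        for i, item in enumerate(value):
            out.append("," if i else "")
            write(node.nested[0], item, out)
        out.append("]")
    else:  # LONG, FLOAT, STRING, BOOL: json.dumps stands in for the C formatters
        out.append(json.dumps(value))


class Actor:
    __slots__ = ("name",)
    name: str

    def __init__(self, name):
        self.name = name


class Movie:
    __slots__ = ("name", "rating", "actors")
    name: str
    rating: float
    actors: list[Actor]

    def __init__(self, name, rating, actors):
        self.name, self.rating, self.actors = name, rating, actors


MOVIE_SPEC = build_spec(Movie)     # built once, reused for every movie
out = []
write(MOVIE_SPEC, Movie("Up", 8.0, [Actor("Carl"), Actor("Russell")]), out)
```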

~100x* faster than json.dumps

Model - precomputed spec

class Movie:
    name: str
    rating: float
    actors: list[Actor]
    # __json_spec__: ClassVar[PyCapsule]
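One way to compute such a spec exactly once per class is `__init_subclass__`, which runs when each subclass is created. A sketch, assuming a hypothetical `Model` base class and storing a plain tuple of field names where the talk keeps a `PyCapsule`:

```python
import json
from typing import ClassVar, get_origin, get_type_hints


class Model:
    """Hypothetical base class that precomputes a field spec once per subclass."""
    __slots__ = ()
    __json_spec__: ClassVar[tuple] = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once, when the subclass is created; ClassVar entries are not fields.
        cls.__json_spec__ = tuple(
            name for name, hint in get_type_hints(cls).items()
            if get_origin(hint) is not ClassVar
        )

    def __init__(self, **fields):
        for name in self.__json_spec__:
            setattr(self, name, fields[name])


def to_json(value) -> str:
    """Serialize by reading the cached spec; no annotation lookups per call."""
    if isinstance(value, Model):
        items = (json.dumps(name) + ":" + to_json(getattr(value, name))
                 for name in type(value).__json_spec__)
        return "{" + ",".join(items) + "}"
    if isinstance(value, list):
        return "[" + ",".join(map(to_json, value)) + "]"
    return json.dumps(value)


class Actor(Model):
    __slots__ = ("name",)
    name: str


class Movie(Model):
    __slots__ = ("name", "rating", "actors")
    name: str
    rating: float
    actors: list[Actor]
```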


Thank you