Tuesday, November 16, 2021

about my past hacks

As you know, the original meaning of hacking has nothing to do with security. It is all about creative, elegant programming. Here I list some hacks of big open source software systems that I did in the past, either at a job or as a freelancer.

Some hacks I have forgotten, but these I will never forget because of the amount of effort they took and the joy I got from implementing them.

1) Porting Apache Pig to a third-party MapReduce.

The idea was to intercept the requests (jars) that Pig sends to Hadoop, wrap them in custom logic and run them directly on non-Hadoop implementations of MapReduce. Pig compiled the script code into an operator tree, which was embedded into the jar. Most of the work was testing specific operators and making them work on the new MapReduce.
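As a toy illustration of running an operator tree on a different MapReduce backend, the sketch below walks a tiny plan and turns it into one in-process map/reduce job. Nothing here is Pig's or Hadoop's real API: the tuple-based plan format, `run_mapreduce` and `execute_plan` are hypothetical names made up for this example.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Stand-in for a non-Hadoop MapReduce backend, run in-process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, vals) for key, vals in groups.items()}


def execute_plan(plan, records):
    """Translate a nested (operator, argument, child) plan into one job.

    The plan format is hypothetical; it only mimics the idea of an
    operator tree that was compiled from a script.
    """
    filters = []
    group_key = None
    node = plan
    # Walk from the root down to the leaf, collecting what each operator needs.
    while node is not None:
        op, arg, child = node
        if op == "group_count":
            group_key = arg
        elif op == "filter":
            filters.append(arg)
        elif op != "load":
            raise ValueError("unsupported operator: " + op)
        node = child
    if group_key is None:
        raise ValueError("this sketch only supports plans with group_count")

    def mapper(record):
        # Emit (group, 1) only for records that pass every filter.
        if all(keep(record) for keep in filters):
            yield group_key(record), 1

    def reducer(key, values):
        return sum(values)

    return run_mapreduce(records, mapper, reducer)
```

A plan such as `("group_count", key_fn, ("filter", predicate, ("load", None, None)))` then counts the records per group that pass the filter. Supporting a real operator set is where most of the effort goes, since every operator needs its own mapping onto the backend.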

2) Modifying MySQL code to read MyISAM tables directly.

At a past job there was a need to regularly process big (>3GB) MyISAM tables. The main requirements were fast reads, low memory usage and the ability to process the tables in parallel. Reading them through MySQL was slow and memory heavy.

The hack was to use the MyISAM storage interface directly and deserialize the fields of each row with MySQL-specific code.
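The general idea of decoding rows straight from a table file can be sketched in Python. The row layout below (one flag byte, a 4-byte little-endian integer id, a 16-byte space-padded name) is hypothetical and much simpler than the real MyISAM format; `ROW`, `decode_row`, `iter_rows` and `make_table` are names invented for this illustration.

```python
import io
import struct

# Hypothetical fixed-width row: 1 flag byte, int32 id, 16-byte padded name.
# This is an illustration only, not the real MyISAM on-disk format.
ROW = struct.Struct("<Bi16s")


def decode_row(raw):
    """Turn one raw row into (id, name), or None for a deleted row."""
    flags, row_id, name = ROW.unpack(raw)
    if flags == 0:  # in this sketch a zero flag byte marks a deleted row
        return None
    return row_id, name.rstrip(b" ").decode("ascii")


def iter_rows(stream, start_row=0, max_rows=None):
    """Stream rows one at a time, so only one raw row is held in memory."""
    stream.seek(start_row * ROW.size)
    count = 0
    while max_rows is None or count < max_rows:
        raw = stream.read(ROW.size)
        if len(raw) < ROW.size:
            break  # end of file (or a truncated trailing row)
        count += 1
        row = decode_row(raw)
        if row is not None:
            yield row


def make_table(rows):
    """Build an in-memory table file from (flag, id, name) tuples for a demo."""
    return io.BytesIO(b"".join(
        ROW.pack(flag, row_id, name.encode("ascii").ljust(16))
        for flag, row_id, name in rows
    ))
```

Because every row has the same size in this sketch, a file can be split at row boundaries with `start_row` and `max_rows`, and each slice can be handed to a separate worker.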

3) Porting the pxCore/pxScene library to the Duktape JS engine.

This one I remember for the number of sleepless nights spent solving all the issues until the segfaults stopped. pxCore/pxScene was originally based on the Node engine, and it needed to be ported to the Duktape JS engine. Duktape differs a lot from the Node (V8) engine. The Duktape code itself was not modified at all. In the end the port was successful.

4) Adding support for Media Source Extensions (used with the HTML video/audio tags) to pxCore/pxScene.

It was a challenging task to add MSE support in a short amount of time. For this I reused the MSE implementation from the WebKit browser engine and wrapped some WebKit classes for use in pxCore. To make it work I dissected the whole pipeline and understood the rough meaning of each class. Finally this whole big MSE pipeline from WebKit started to work.

5) Porting fmha (apex/fmha) to support arbitrary sequence lengths with a small amount of additional memory.

This was a tough task in a hard, heavily optimized CUDA codebase. The result was useful for research groups at Yandex and around the world.
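One common way to handle an arbitrary sequence length with little extra memory is to walk over the keys in fixed-size blocks and keep a running softmax, so that only a few per-query accumulators are stored instead of the full score matrix. I am not claiming this is exactly what the port did; the sketch below is an assumption, written in plain Python rather than CUDA, and `attention_naive`, `attention_blockwise` and `block` are names invented for this illustration.

```python
import math


def attention_naive(q, keys, values):
    """Reference: softmax(q . k) weighted sum of values for one query.

    The usual 1/sqrt(d) scaling is left out to keep the sketch short.
    """
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_blockwise(q, keys, values, block=4):
    """Same result, but keys are visited in blocks of at most `block` items.

    Only a running maximum, a running denominator and one output vector are
    kept, so the extra memory does not grow with the sequence length.
    """
    dim = len(values[0])
    run_max = -math.inf
    run_sum = 0.0
    out = [0.0] * dim
    for start in range(0, len(keys), block):
        # The last block may be shorter; slicing handles the ragged tail.
        k_blk = keys[start:start + block]
        v_blk = values[start:start + block]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blk]
        new_max = max(run_max, max(scores))
        # Rescale everything accumulated so far to the new maximum.
        scale = math.exp(run_max - new_max)
        run_sum *= scale
        out = [o * scale for o in out]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            run_sum += w
            out = [o + w * x for o, x in zip(out, v)]
        run_max = new_max
    return [o / run_sum for o in out]
```

With a sequence of 7 keys and `block=4`, the second block holds only 3 keys, and the blockwise result still matches the naive one up to floating-point rounding.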

