Tuesday, February 21, 2017

About past topcoder maraphon cmap2 contest

There was an interesting optimization task.

The bottleneck is memory system. So you try to fit accurately in L1, missing L1 hurts much performance. A bottleneck task for me was to sort small arrays of 14 bits (100-200 elements on average). For my core i7 6700 I found optimized 14-bit radix sort the fastest. It also scales well
with multithreading. Also I tried parallel odd-even merge sort by sorting 16 shorts with avx2 at a time, but it didn't give win over radix sort. Overall it was fun, some lessons I learned: make automated tests as early as you can (when you are trying squeezing every bit - high possibility you will break something), simd versions of algorithms do not scale well if done wrong, also it's that competition feeling - when you squeezed everything from hardware - you see from leaderboard that another competitor submitted solution that outperformed yours by 2x-10x times

No comments:

Post a Comment