In this challenge you need to compute a sum of given floating point numbers taking into account accuracy, memory bandwidth and cache-friendliness.
My final submission was constructing perfect binary tree (pairsum technique) and then making a lot of improvement moves:
- swapping leaves,
- swapping 16-sized blocks
- changing type of addition
- etc
after making a pre-move - I propagated the sum to the root (which took O(log N)) and computed score improvement.
What did not work:
- distillation techniques (because of locality penalty)
- all kinds of sorting techniques (because of locality penalty)
Overall this was a very fun and complex challenge: at first I experimented with a lot of sorting/distillation techniques which did not work, than I fixed tree structure and concentrated on improving pairsum technique.
If you are interested - you can look at challenge details
No comments:
Post a Comment