I won a silver medal in this interesting Kaggle competition, which was a multi-view pose estimation problem.
The task can be split into 2 parts:
- feature detection and feature matching
- pose estimation using COLMAP
I tried to tune COLMAP (the second part), but its default parameters are already well tuned.
So, like most participants, I spent most of my time on the first part.
The dataset contained challenging images:
- some were arbitrarily rotated (by an unknown 90, 180 or 270 degrees)
- some neighbouring images were far apart spatially (requiring a lot of dense features to form tracks)
- the same objects appeared at varying scales (requiring scale-invariant features and RANSAC filtering, or matching on a region of interest)
- some scenes had featureless parts (like walls), requiring a lot of dense features to match them
As a solution, I used an ensemble of 3 models:
- LoFTR
- SuperGlue (SuperPoint)
- KeyNet, AffNet, HardNet
I combined the matches from all 3 models and filtered the inliers with a coarse MAGSAC pass (threshold = 3 pixels).
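The combine-then-filter step can be sketched roughly as follows. This is not my competition code, just a minimal illustration: the function names `merge_matches` and `magsac_filter` are made up for this post, dropping exact duplicate matches is an optional extra, and I assume here that the filter fits a fundamental matrix through OpenCV's `cv2.findFundamentalMat` with the `cv2.USAC_MAGSAC` flag (available in OpenCV 4.5 and later). The confidence and iteration values are placeholders.

```python
import numpy as np


def merge_matches(match_sets):
    """Concatenate (pts_a, pts_b) pairs from several matchers.

    Each element of match_sets is a tuple of two (N, 2) arrays holding
    pixel coordinates of corresponding points in image A and image B.
    Exact duplicate correspondences are dropped (an optional extra step).
    """
    pts_a = np.concatenate([np.asarray(m[0], dtype=np.float64) for m in match_sets])
    pts_b = np.concatenate([np.asarray(m[1], dtype=np.float64) for m in match_sets])
    stacked = np.hstack([pts_a, pts_b])      # one row per match: xa, ya, xb, yb
    unique = np.unique(stacked, axis=0)      # also sorts the rows
    return unique[:, :2], unique[:, 2:]


def magsac_filter(pts_a, pts_b, threshold=3.0):
    """Keep only matches that MAGSAC marks as inliers (assumed setup)."""
    import cv2  # imported lazily; needs opencv-python >= 4.5

    if len(pts_a) < 8:
        # Too few points to fit a model reliably; return them untouched.
        return pts_a, pts_b
    # confidence=0.999 and maxIters=10000 are illustrative placeholders.
    _, mask = cv2.findFundamentalMat(
        pts_a, pts_b, cv2.USAC_MAGSAC, threshold, 0.999, 10000
    )
    if mask is None:
        return pts_a[:0], pts_b[:0]
    keep = mask.ravel().astype(bool)
    return pts_a[keep], pts_b[keep]
```

The filtered points from `magsac_filter(*merge_matches([...]))` are what would then be handed to COLMAP.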
I handled rotation by trying all four 90-degree rotations and choosing the best one for each image pair.
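A minimal sketch of that rotation search is below. It assumes that "best" means "the rotation that yields the most matches", and that `match_fn` is any matcher (hypothetical here) returning two (N, 2) arrays of (x, y) pixel coordinates. The helper `unrotate_points` maps keypoints found in the rotated image back to the original image, which is needed before passing them on to later stages.

```python
import numpy as np


def unrotate_points(pts, k, h, w):
    """Map (x, y) points from np.rot90(img, k) back to the original image.

    h and w are the height and width of the original (unrotated) image.
    """
    x, y = pts[:, 0], pts[:, 1]
    k %= 4
    if k == 0:
        out = (x, y)
    elif k == 1:                      # rotated 90 degrees counterclockwise
        out = (w - 1 - y, x)
    elif k == 2:                      # rotated 180 degrees
        out = (w - 1 - x, h - 1 - y)
    else:                             # rotated 270 degrees counterclockwise
        out = (y, h - 1 - x)
    return np.stack(out, axis=1)


def best_rotation(img_a, img_b, match_fn):
    """Try 4 rotations of img_b and keep the one with the most matches.

    Returns (k, pts_a, pts_b), with pts_b expressed in the coordinates
    of the original, unrotated img_b.
    """
    h, w = img_b.shape[:2]
    best = None
    for k in range(4):
        # np.rot90 returns a view; some matchers need contiguous memory.
        rotated = np.ascontiguousarray(np.rot90(img_b, k))
        pts_a, pts_b_rot = match_fn(img_a, rotated)
        if best is None or len(pts_a) > len(best[1]):
            best = (k, pts_a, unrotate_points(pts_b_rot, k, h, w))
    return best
```

Running the matcher four times per pair costs four times as much, so in practice it makes sense to do this search with a fast matcher or on downscaled images.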
An additional challenge was to find, for each image, a set of candidate images to match against (this set cannot be large because of the time limit). I did this by ranking images by the cosine similarity of their descriptors from an EfficientNet-B7 model.
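The candidate selection can be sketched as below. The sketch assumes the global descriptors have already been extracted (one vector per image, for example from the pooled output of the network) and stacked into an array. The function name `candidate_pairs` and the `top_k` parameter are illustrative, not taken from my actual code.

```python
import numpy as np


def candidate_pairs(descriptors, top_k):
    """Pick the top_k most similar images for each image by cosine similarity.

    descriptors: (n_images, dim) array of global image descriptors.
    Returns a sorted list of unique unordered index pairs (i, j), i < j.
    """
    d = np.asarray(descriptors, dtype=np.float64)
    norms = np.linalg.norm(d, axis=1, keepdims=True)
    d = d / np.maximum(norms, 1e-12)        # L2-normalise; guard zero vectors
    sim = d @ d.T                           # cosine similarity matrix
    np.fill_diagonal(sim, -np.inf)          # never pair an image with itself
    nearest = np.argsort(-sim, axis=1)[:, :top_k]
    pairs = set()
    for i, row in enumerate(nearest):
        for j in row:
            pairs.add((min(i, int(j)), max(i, int(j))))
    return sorted(pairs)
```

With n images this keeps at most n * top_k pairs instead of all n * (n - 1) / 2, which is what makes the matching stage fit into the time limit.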