Deformable Convolutional Networks

16 Apr 2017 | PR12, Paper, Machine Learning, CNN

This post covers “Deformable Convolutional Networks,” a paper released by Microsoft Research Asia in March 2017.

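The authors argue that CNNs (Convolutional Neural Networks), despite their many successes in image processing so far, have a fundamental limitation. The operations a CNN relies on (convolution, pooling, RoI pooling, and so on) assume fixed geometric structures, so they cannot flexibly adapt to complex geometric transformations. As examples, the authors point out that every unit in a CNN layer has the same receptive field size, and that obtaining features for object detection still depends on manual, hand-crafted design.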

To address this, the paper proposes two methods: Deformable Convolution and Deformable RoI Pooling.

Deformable Convolution

Deformable Convolution starts from the idea, illustrated in the figure below, of adding 2D offsets to the sampling grid that a convolution uses.

sampling grids

In figure (a), the green dots are the sampling grid of a regular convolution. Adding offsets to it (the arrows) deforms the grid into various patterns, such as the blue dots in (b), (c), and (d).
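The paper writes this idea compactly. Below, $p_0$ is an output location, $\mathcal{R}$ is the regular $3 \times 3$ grid (dilation 1), $w$ holds the kernel weights, $x$ is the input feature map, and $\Delta p_n$ is the learned offset for the $n$-th grid point. This is a summary of the paper's notation, so please check the original for the full treatment.

$$
\mathcal{R} = \{(-1,-1), (-1,0), \ldots, (0,1), (1,1)\}
$$

$$
y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n) \qquad \text{(regular convolution)}
$$

$$
y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n + \Delta p_n) \qquad \text{(deformable convolution)}
$$

The only change is the extra term $\Delta p_n$. The set of offsets $\{\Delta p_n \mid n = 1, \ldots, N\}$, with $N = |\mathcal{R}| = 9$, is what turns grid (a) into shapes like (b), (c), and (d).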

The figure below shows an example of a $3 \times 3$ deformable convolution.

3x3 deformable convolution

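As the figure shows, a deformable convolution contains one more convolution layer in addition to the regular one. This extra layer, the green one labeled conv in the figure, learns the 2D offsets of the input sampling positions. The offsets are fractional (real-valued) rather than integers, so values such as 0.5 are possible. The input value at such a fractional location is computed by linear interpolation, which in 2D is bilinear interpolation.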
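The paper defines this interpolation as $x(p) = \sum_q G(q, p) \cdot x(q)$. Here $q$ ranges over integer locations, $G(q, p) = g(q_x, p_x) \cdot g(q_y, p_y)$, and $g(a, b) = \max(0, 1 - |a - b|)$, so only the four integer neighbours of $p$ receive non-zero weight. Below is a minimal NumPy sketch for a single-channel feature map. The function names and the choice to treat points outside the map as zero are assumptions of this sketch.

```python
import numpy as np


def g(a, b):
    """1D bilinear kernel: g(a, b) = max(0, 1 - |a - b|)."""
    return max(0.0, 1.0 - abs(a - b))


def bilinear_sample(x, py, px):
    """Value of 2D array x (H, W) at the fractional location p = (py, px).

    Computes sum_q G(q, p) * x(q) over the integer neighbours q of p, with
    G(q, p) = g(q_y, p_y) * g(q_x, p_x). Points outside x count as zero
    (a boundary assumption of this sketch).
    """
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    value = 0.0
    # Only the four surrounding integer points can have non-zero weight.
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                value += g(qy, py) * g(qx, px) * x[qy, qx]
    return value
```

For example, with `x = np.arange(16.0).reshape(4, 4)`, `bilinear_sample(x, 1.5, 2.0)` returns 8.0. That is the midpoint between `x[1, 2] = 6` and `x[2, 2] = 10`. At an integer location the result is simply the array value.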

During training, the convolution kernel that produces the output features and the convolution kernel that determines the offsets are learned simultaneously by backpropagation.
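The sketch below puts the pieces together as a single-channel $3 \times 3$ deformable convolution forward pass in NumPy. The offset branch is an ordinary $3 \times 3$ convolution with $2N = 18$ output channels, giving a $\Delta y$ and a $\Delta x$ for each of the $N = 9$ grid points at every output location. Several choices here are assumptions made to keep the example short: the function names, the channel ordering of the offsets, zero padding, stride 1, and the restriction to one channel. Real layers operate on multi-channel feature maps.

```python
import numpy as np

# Regular 3x3 sampling grid R (dilation 1): (dy, dx) relative to the centre p0.
R = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def sample(x, py, px):
    """Bilinearly sample 2D array x at fractional (py, px); zero outside the map."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                val += max(0.0, 1 - abs(qy - py)) * max(0.0, 1 - abs(qx - px)) * x[qy, qx]
    return val


def regular_conv(x, weight, bias=0.0):
    """3x3 cross-correlation, stride 1, zero padding; weight has shape (9,) in R order."""
    h, w = x.shape
    out = np.full((h, w), float(bias))
    for i in range(h):
        for j in range(w):
            for n, (dy, dx) in enumerate(R):
                out[i, j] += weight[n] * sample(x, i + dy, j + dx)
    return out


def offset_field(x, offset_weight, offset_bias):
    """Offset branch: a regular 3x3 conv with 2N = 18 output channels.

    Assumed layout: channel 2n is dy and channel 2n + 1 is dx for grid point n.
    """
    return np.stack([regular_conv(x, offset_weight[c], offset_bias[c])
                     for c in range(2 * len(R))])


def deformable_conv(x, weight, offset_weight, offset_bias):
    """y(p0) = sum_n w(p_n) * x(p0 + p_n + dp_n), with x sampled bilinearly."""
    off = offset_field(x, offset_weight, offset_bias)  # shape (18, H, W)
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            for n, (dy, dx) in enumerate(R):
                py = i + dy + off[2 * n, i, j]
                px = j + dx + off[2 * n + 1, i, j]
                out[i, j] += weight[n] * sample(x, py, px)
    return out
```

If the offset branch's weights and biases are all zero, every $\Delta p_n$ is zero and the layer reduces to the regular convolution. A constant offset of $\Delta y = 0.5$ instead averages each sample with the one a row below. The bilinear weights are piecewise linear in the sampling position, so the output is differentiable with respect to the offsets almost everywhere. That is what allows both kernels to be trained jointly by backpropagation.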

receptive field illustration

The figure above shows example sampling locations of convolution filters. The red dots are sampling locations that reflect the offsets learned by the deformable convolution filters, and the green rectangles mark the filters' output locations. The sampling pattern is not fixed, and the receptive field grows larger for larger objects.

Deformable RoI Pooling

RoI (Region of Interest) pooling converts a rectangular input region of arbitrary size into a feature of fixed size.
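Here is a minimal NumPy sketch of plain RoI pooling on a single-channel feature map, showing how the output size stays fixed. The RoI is given as integer `(y0, x0, h, w)` in feature-map coordinates (a hypothetical convention) and is split into $k \times k$ bins. Each bin is reduced either by max, as in Fast R-CNN, or by average, which is the form the Deformable ConvNets paper uses when it writes out RoI pooling. The bin-boundary rounding is an assumption of this sketch.

```python
import numpy as np


def roi_pool(x, roi, k=3, mode="max"):
    """Pool the RoI (y0, x0, h, w) of 2D feature map x into a k x k output.

    Bin (i, j) covers rows [y0 + floor(i*h/k), y0 + ceil((i+1)*h/k)) and the
    analogous columns, so every bin has at least one pixel even if h or w < k.
    mode="max" takes the bin maximum; any other value takes the bin mean.
    """
    y0, x0, h, w = roi
    out = np.zeros((k, k))
    for i in range(k):
        r0 = y0 + int(np.floor(i * h / k))
        r1 = y0 + int(np.ceil((i + 1) * h / k))
        for j in range(k):
            c0 = x0 + int(np.floor(j * w / k))
            c1 = x0 + int(np.ceil((j + 1) * w / k))
            patch = x[r0:r1, c0:c1]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out
```

For example, with `x = np.arange(36.0).reshape(6, 6)`, `roi_pool(x, (0, 0, 6, 6), k=3)` returns a $3 \times 3$ array whose top-left entry is 7.0. `roi_pool(x, (1, 1, 4, 5), k=3)` also returns a $3 \times 3$ array, even though the region has a different size.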

3x3 deformable RoI pooling

Deformable RoI pooling likewise consists of a regular RoI pooling layer and a layer that learns the offsets. One difference from deformable convolution is that the offset-learning part uses an fc (fully-connected) layer instead of a convolution. Unfortunately, the paper does not explain why. For the difference between convolutional and fully-connected layers in neural networks, see the related Reddit post.
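In the paper's notation, $p_0$ is the top-left corner of the RoI, $n_{ij}$ is the number of pixels in bin $(i, j)$, and $(w, h)$ are the RoI width and height. Deformable RoI pooling adds one offset per bin. The fc layer outputs normalized offsets $\Delta \hat{p}_{ij}$, which are scaled element-wise by the RoI size and by a constant $\gamma$ that the paper sets empirically to 0.1. According to the paper, the normalization makes offset learning invariant to RoI size.

$$
y(i, j) = \sum_{p \in \mathrm{bin}(i, j)} \frac{x(p_0 + p)}{n_{ij}} \qquad \text{(RoI pooling)}
$$

$$
y(i, j) = \sum_{p \in \mathrm{bin}(i, j)} \frac{x(p_0 + p + \Delta p_{ij})}{n_{ij}}, \qquad \Delta p_{ij} = \gamma \cdot \Delta \hat{p}_{ij} \circ (w, h) \qquad \text{(deformable RoI pooling)}
$$

Here $\circ$ denotes the element-wise product. As in deformable convolution, $\Delta p_{ij}$ is generally fractional, so $x(\cdot)$ is again evaluated by bilinear interpolation.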

Likewise, during training the fc layer that determines the offsets is also learned through backpropagation.
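The four steps can be sketched in NumPy for a single-channel map: pool the RoI, predict normalized offsets with an fc layer, scale them, and pool again with shifted bins. The sketch makes several assumptions. The fc parameters `fc_weight` and `fc_bias` are hypothetical names. Offsets are stored in `(dy, dx)` order and are therefore scaled by `(h, w)`. Each bin's pixel average is approximated by averaging an $s \times s$ grid of bilinear samples inside the bin, which makes fractional offsets easy to handle.

```python
import numpy as np

GAMMA = 0.1  # offset scale gamma; the paper reports setting it to 0.1


def sample(x, py, px):
    """Bilinearly sample 2D array x at fractional (py, px); zero outside the map."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                val += max(0.0, 1 - abs(qy - py)) * max(0.0, 1 - abs(qx - px)) * x[qy, qx]
    return val


def pool_bins(x, roi, k, s, offsets=None):
    """Average-pool RoI (y0, x0, h, w) of x into k x k bins.

    Each bin's average is approximated by an s x s grid of bilinear samples.
    offsets, if given, has shape (k, k, 2) and shifts bin (i, j) by (dy, dx).
    """
    y0, x0, h, w = roi
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            dy, dx = (0.0, 0.0) if offsets is None else offsets[i, j]
            vals = [sample(x,
                           y0 + (i + (a + 0.5) / s) * h / k + dy,
                           x0 + (j + (b + 0.5) / s) * w / k + dx)
                    for a in range(s) for b in range(s)]
            out[i, j] = np.mean(vals)
    return out


def deformable_roi_pool(x, roi, fc_weight, fc_bias, k=3, s=2):
    """Deformable RoI pooling sketch on a single-channel map.

    fc_weight: (2*k*k, k*k) and fc_bias: (2*k*k,) are hypothetical fc parameters.
    """
    _, _, h, w = roi
    pooled = pool_bins(x, roi, k, s)                                   # 1) plain RoI pooling
    norm = (fc_weight @ pooled.ravel() + fc_bias).reshape(k, k, 2)     # 2) normalized offsets
    offsets = GAMMA * norm * np.array([h, w], dtype=float)             # 3) scale by gamma, RoI size
    return pool_bins(x, roi, k, s, offsets)                            # 4) pool with shifted bins
```

When `fc_weight` and `fc_bias` are all zero, every offset vanishes and the result equals the plain (sampled) average RoI pooling of step 1. Because the offsets are multiplied by the RoI height and width, the same normalized offset moves a bin by the same fraction of the RoI regardless of its size.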

The figure below shows the results of deformable RoI pooling (red) for input RoIs (yellow). In these results, the red rectangles corresponding to the RoI deform into a variety of shapes that follow the shape of the object.

roi pooling illustration

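Much deep learning research so far has focused on how to find the predictor's weights $w$. This paper has been praised as novel because it focuses instead on which data $x$ to sample. Since the paper has only just been published, it will be worth watching what influence it has on future research.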

– Jamie;

References

Jifeng Dai et al., paper “Deformable Convolutional Networks”
Jifeng Dai and co-authors, GitHub repository
Taewoong Um (엄태웅), slides “Deformable Convolutional Networks”
Taewoong Um (엄태웅), video “PR-002: Deformable Convolutional Networks (2017)”
Felix Lau, Notes on “Deformable Convolutional Networks”
Ross Girshick, paper “Fast R-CNN”
Ross Girshick, slides “Fast R-CNN”
deepsense.io blog, “Region of interest pooling explained”
deepsense.io blog, “Region of interest pooling in TensorFlow – example”
Wikipedia, CNN (Convolutional Neural Network)
Reddit, “What is the difference between a Fully-Connected and Convolutional Neural Network?”
