A guide to learning and mastering computer vision in 2020
This post focuses on the resources that I believe will boost your computer vision knowledge the most. It is based mainly on my own experience.
Original Medium post
Before you start learning computer vision, it helps to know the basics of machine learning and Python.
Frameworks
Star Wars: Luke Skywalker & Darth Vader
You don’t have to choose one from the very beginning, but you do need a way to apply newly gained knowledge. There are not many options: PyTorch or Keras (TensorFlow). PyTorch may require writing more code, but it gives a lot of flexibility in return, so use it. Besides, most deep learning researchers have started to use PyTorch. Albumentations (image augmentation) and Catalyst (a high-level API framework on top of PyTorch) might be useful as well; use them, especially the first one.
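To give a feel for the "more code, more flexibility" trade-off, here is a minimal sketch of a single PyTorch training step. The tiny network, the 32x32 image size, the 10 classes, and the random stand-in data are all illustrative assumptions, not a recommended setup.

```python
import torch
from torch import nn

torch.manual_seed(0)  # make the random stand-in data reproducible

# A deliberately tiny CNN for 3-channel 32x32 images and 10 classes
# (all sizes here are illustrative assumptions).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),               # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Random batch standing in for real data: 4 images with integer labels.
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

# One training step: forward pass, loss, backward pass, parameter update.
logits = model(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(logits.shape, loss.item())
```

In a real project the random tensors would be replaced by a dataset and a data loader, and the step would run inside a loop over epochs. Every part of that loop is written out explicitly, which is where the flexibility comes from.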
Hardware
- Nvidia GPU 10xx+ will be more than enough ($300+)
- Kaggle kernels: only 30 hours/week (free)
- Google Colab: 12-hour session limit, unknown weekly limits (free)
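Whichever option you pick, it is worth checking that the framework actually sees the GPU. A small sketch, assuming PyTorch is installed; it falls back to the CPU when no CUDA device is visible:

```python
import torch

# Use an Nvidia GPU through CUDA when one is visible, otherwise the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU found, running on CPU")

# Tensors (and models) are moved to the chosen device with .to(device).
x = torch.ones(2, 3).to(device)
print(x.device)
```

The same snippet should work unchanged on a local machine, in a Kaggle kernel, or in Google Colab, as long as a GPU runtime is enabled where needed.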
Theory & Practice
Online courses
CS231n is the top online course and covers all the necessary fundamentals of computer vision. The lecture videos are available on YouTube. It even has exercises, though I can’t recommend solving them. (free)
Fast.ai is the next course you should watch. fast.ai is also a high-level framework on top of PyTorch, but its API changes too frequently, and the lack of documentation makes it unreliable to use. However, the theory and useful tricks make the course well worth the time. (free)
While taking these courses, I encourage you to put theory into practice by applying it in one of the frameworks.
Articles and code
- ArXiv.org: all the recent papers appear here. (free)
- Papers with Code (https://paperswithcode.com/sota): the state of the art in the most common deep learning tasks, not only computer vision. (free)
- GitHub: if something has been implemented, you will find it here. (free)
Books
There is not much to read, but I believe these two books will be useful whether you choose PyTorch or Keras:
- Deep Learning with Python by François Chollet, Keras creator and Google AI researcher. It is easy to follow and may give you insights you didn’t have before. (not free)
- Deep Learning with PyTorch by Eli Stevens & Luca Antiga of the PyTorch team. (free)
Kaggle
Competitions: Kaggle is a well-known online platform for a wide variety of machine learning competitions, and many of them are about computer vision. You can start participating even before finishing the courses, because from the start of a competition there will be many open kernels (end-to-end code) that you can run directly in the browser. (free)
Tough (jedi) way
Star Wars's Jedi: Yoda
An alternative path is tougher, but it gives you the knowledge required not only to do fit-predict but also to carry out your own research. It comes from Sergei Belousov, aka bes. You just need to read and implement all the articles below (free). Just reading them is also great.
Architectures
- AlexNet: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
- ZFNet: https://arxiv.org/abs/1311.2901
- VGG16: https://arxiv.org/abs/1409.1556
- ResNet: https://arxiv.org/abs/1512.03385
- GoogLeNet: https://arxiv.org/abs/1409.4842
- Inception: https://arxiv.org/abs/1512.00567
- Xception: https://arxiv.org/abs/1610.02357
- MobileNet: https://arxiv.org/abs/1704.04861
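As an illustration of what "implementing" a paper from this list can look like, here is a minimal sketch of a residual block in the spirit of ResNet, written in PyTorch. It is a simplified assumption-laden version (same number of input and output channels, stride 1, no projection shortcut), and the class name `ResidualBlock` is my own, not taken from the paper.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions plus an identity skip connection (simplified)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: add the input back before the final activation.
        return self.relu(out + x)


block = ResidualBlock(channels=16)
x = torch.randn(2, 16, 8, 8)   # batch of 2 feature maps, 16 channels, 8x8
y = block(x)
print(y.shape)
```

Because padding keeps the spatial size and the channel count is unchanged, the output has the same shape as the input, which is what makes the element-wise addition possible. The blocks in the paper also handle downsampling and channel changes, which this sketch leaves out.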
Semantic Segmentation
- FCN: https://arxiv.org/abs/1411.4038
- SegNet: https://arxiv.org/abs/1511.00561
- UNet: https://arxiv.org/abs/1505.04597
- PSPNet: https://arxiv.org/abs/1612.01105
- DeepLab: https://arxiv.org/abs/1606.00915
- ICNet: https://arxiv.org/abs/1704.08545
- ENet: https://arxiv.org/abs/1606.02147
Generative adversarial networks
- GAN: https://arxiv.org/abs/1406.2661
- DCGAN: https://arxiv.org/abs/1511.06434
- WGAN: https://arxiv.org/abs/1701.07875
- Pix2Pix: https://arxiv.org/abs/1611.07004
- CycleGAN: https://arxiv.org/abs/1703.10593
Object detection
- RCNN: https://arxiv.org/abs/1311.2524
- Fast-RCNN: https://arxiv.org/abs/1504.08083
- Faster-RCNN: https://arxiv.org/abs/1506.01497
- SSD: https://arxiv.org/abs/1512.02325
- YOLO: https://arxiv.org/abs/1506.02640
- YOLO9000: https://arxiv.org/abs/1612.08242
Instance Segmentation
- Mask-RCNN: https://arxiv.org/abs/1703.06870
- YOLACT: https://arxiv.org/abs/1904.02689
Pose estimation
- PoseNet: https://arxiv.org/abs/1505.07427
- DensePose: https://arxiv.org/abs/1802.00434