Things you should know before you build your model
Have you ever been in this situation? You want to build a model, let’s say a computer vision system for detecting drones. You have the data and everything, but your algorithm’s accuracy is not yet good enough, and you start trying anything you can think of to improve your model’s performance. What do you do?

Source: https://www.futura-sciences.com/tech/breves/drone-soldes-hiver-drone-eachine-camera-hd-moins-70-3619/
You spend some time searching the internet and you find a bunch of ideas:
· Get more data: Get more drone pictures
· Get diverse pictures of the drone: from different angles and positions.
· Increase the number of epochs
· Try a bigger neural network, with more layers and parameters
· Try a smaller neural network
· Try data augmentation techniques
· Try adding regularization (such as L2 regularization)
All these solutions are fine, but the challenge is knowing which direction will actually help you succeed. If you make the wrong choice, you’ll waste a lot of time and energy. So, how do you proceed?
Choose dev and test sets to reflect data you expect to get in the future and want to do well on
It is common to use a random 70%/30% split to form your training and test data. But it’s not that simple. You should make sure that your test data is not too different from the data your application will see in the future.
If you don’t have the application ready right now, you might not be able to get data that exactly reflects what you need to focus on in the future. In that case, approximate it: for example, you might search for pictures of drones that other people took with their phones, or you can take a couple of pictures yourself and add them to your dataset.
So my general advice to you is: Try to pick test examples that will reflect what you want to perform well on rather than whatever data you happen to have for training.
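Here is a minimal sketch of that idea in Python. It assumes you have two lists of image files: one scraped from the web and one that looks like what your application will really see (phone pictures, in this example). The function name `build_splits`, the file names and the 50% fraction are all made up for illustration; this is one possible way to do it, not the only one.

```python
import random

def build_splits(scraped, target_like, test_fraction=0.5, seed=0):
    """Build a test set only from data that resembles future inputs.

    scraped:     examples you happen to have (hypothetical web images)
    target_like: examples that look like what the application will see
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = list(target_like)
    rng.shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]           # test set: target-like data only
    # Training data: everything scraped plus the remaining target-like examples.
    train = list(scraped) + shuffled[n_test:]
    return train, test

# Hypothetical file names, just to show the shapes involved.
scraped = [f"web_{i}.jpg" for i in range(700)]
phone = [f"phone_{i}.jpg" for i in range(300)]

train, test = build_splits(scraped, phone)
print(len(train), len(test))  # 850 150
```

The point of the design is that the test set never contains scraped images, so the number you measure tells you about the pictures you actually care about.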
Focus on a single-number evaluation metric
Instead of distracting yourself with the model parameters, the choice of features and the algorithm architecture, it’s better to pick a single-number evaluation metric, such as accuracy, that allows you to compare your models by their performance on that metric and quickly decide what is working best.
If you’re interested in both precision and recall, you can compute the F1 score, which is their harmonic mean: F1 = 2 × precision × recall / (precision + recall). Unlike a plain average, it stays low when either of the two is low.
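To make this concrete, here is a small sketch that computes precision, recall and F1 from scratch for a binary "drone / no drone" classifier. The labels below are invented for the example, and treating an undefined ratio as 0.0 is a convention I chose, not a rule.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for binary labels where 1 = drone."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # drones found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # drones missed

    # Assumption: report 0.0 when a ratio is undefined (no positives at all).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Made-up labels: 4 images with a drone, 4 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.667 recall=0.500 f1=0.571
```

With one number per model, comparing two models becomes a simple "which F1 is higher?" question.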
Start small
Instead of overthinking it, it’s better to set up basic train/test sets and an initial metric quickly. Even if they are imperfect, they will help you get going. During development, based on the first results, you can adapt your initial train/test sets or metric. Machine learning is a highly iterative process: you may try dozens of ideas before finding one that you’re satisfied with.

Source: https://blog.floydhub.com/structuring-and-planning-your-machine-learning-project/
Choose the size of the train/test sets wisely
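Remember that the set you evaluate on while comparing models should be large enough to detect meaningful changes in the accuracy of your algorithm, but not necessarily much larger. With only 100 examples, each one is worth a full percentage point, so a 0.1% improvement is invisible. Your test set should be big enough to give you a confident estimate of the final performance of your system.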
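If you want a rough feel for how the size of the evaluation set limits what you can detect, the sketch below estimates the margin of error around a measured accuracy. It relies on the normal approximation to a binomial proportion with a 95% interval (z = 1.96), which is my simplifying assumption here; the function name `accuracy_margin` and the 90% accuracy are just for illustration.

```python
import math

def accuracy_margin(accuracy, n_examples, z=1.96):
    """Approximate half-width of a confidence interval for an accuracy.

    Assumes independent examples and the normal approximation,
    which is only reasonable when n_examples is not tiny.
    """
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_examples)

for n in (100, 1000, 10000):
    margin = accuracy_margin(0.90, n)
    print(f"n={n}: 90% accuracy is roughly +/- {margin * 100:.1f} points")
```

At 90% accuracy the margin is close to 6 points with 100 examples and well under 1 point with 10,000, so a small set cannot tell two similar models apart.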



