Life Cycle of a Data Science Project
The hottest field of the 21st Century
The term “Data Science” was coined by William S at the beginning of the 21st Century.
The steps involved in the lifecycle of a data science project:
- Business Understanding
- Data Collection or Data Gathering
- Feature Engineering
- Feature Selection
- Model Creation & Hyper Parameter Tuning
- Model Deployment
- Model Monitoring & Retraining
Business Understanding
Every project should start by asking “Why?”, because we must understand the problem statement before we can analyze the data well and get good results. Here, both the data scientist and the business person with domain knowledge are critical.
Data Collection or Data Gathering
Collecting data is a crucial step, because the quality of the results depends directly on the data. In general, the more relevant data we have, the better the results will be. The process of gathering data from different sources is sometimes loosely called data mining. Common techniques include:
- Web Scraping: a technique for extracting data from websites.
- APIs: An Application Programming Interface (API) is code that allows two software programs to communicate. Many public APIs are free to use, though others require a key or payment.
There are many other techniques as well.
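As a rough illustration of web scraping, the sketch below pulls link targets and link text out of an HTML page using only Python's standard library. The page content (`SAMPLE_HTML`) and the names `LinkCollector` and `scrape_links` are made up for this example. A real scraper would first download the page over HTTP and should respect the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<html><body>
  <h1>Products</h1>
  <a href="/item/1">Laptop</a>
  <a href="/item/2">Phone</a>
  <p>No link here</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        # Start recording when an <a> tag opens.
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href=...> tag.
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        # Store the finished link when the <a> tag closes.
        if tag == "a" and self._current_href is not None:
            text = "".join(self._current_text).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


def scrape_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


links = scrape_links(SAMPLE_HTML)
print(links)
```

The same parser could be fed the body of a downloaded page. For larger jobs, dedicated scraping libraries are usually more convenient than hand-written parsers.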
Feature Engineering
For an algorithm to perform well, every feature it depends on should be important and relevant. Feature engineering typically involves:
- Exploratory Data Analysis
- Handling Missing Values
- Handling Outliers
- Categorical Encoding
- Normalization & Standardization
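A few of these steps can be sketched in plain Python. The example below shows one possible way to fill a missing value with the median, apply min-max normalization and z-score standardization, and one-hot encode a categorical column. The toy data and function names are assumptions made for illustration. Outlier handling is not shown, and in practice these steps are usually done with a data-frame library.

```python
from statistics import mean, median, pstdev

# Hypothetical toy data: ages with one missing value, and a city category.
ages = [22.0, 35.0, None, 41.0, 58.0]
cities = ["Paris", "Delhi", "Paris", "Tokyo", "Delhi"]


def impute_median(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector, one column per sorted level."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


ages_filled = impute_median(ages)
normalized = min_max_normalize(ages_filled)
standardized = standardize(ages_filled)
encoded = one_hot(cities)

print(ages_filled)
print(normalized)
print(encoded)
```

Normalization keeps values inside a fixed range, while standardization centers them around zero. Which one suits a project depends on the algorithm and the data.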
Feature Selection
Not all features in a data set are necessary. As discussed, we select only the features that are critical to the problem. Common techniques include:
- Forward Selection
- Backward Elimination
- Univariate Selection
- Random Forest Importance
- Feature Selection with Decision Trees
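To make univariate selection concrete, here is a minimal sketch that scores each feature on its own by the absolute Pearson correlation with the target and keeps the top `k`. The housing-style data and the helper names (`pearson`, `select_top_k`) are hypothetical. Correlation is only one of several scoring functions that could be used here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(vx * vy)


def select_top_k(features, target, k):
    """Score every feature independently and keep the k highest scores."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: two informative columns and one arbitrary ID column.
features = {
    "size_sqft": [500, 750, 1000, 1250, 1500],
    "num_rooms": [1, 2, 2, 3, 4],
    "house_id": [3, 1, 5, 2, 4],
}
price = [100, 150, 200, 250, 300]

selected, scores = select_top_k(features, price, k=2)
print(selected)
```

Because each feature is scored in isolation, this approach is fast but cannot detect features that are only useful in combination. Wrapper methods such as backward elimination address that at a higher computational cost.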
Model Creation & Hyper Parameter Tuning
Once the data is ready, we need to create the model. After building the model, we need to check evaluation metrics such as precision, recall, F1-score, and AUC.
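The metrics mentioned above can be computed directly from true and predicted labels. The sketch below covers precision, recall, and F1-score for a binary classifier. The labels are invented for the example, and AUC is left out because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision answers "of the predicted positives, how many were right?", while recall answers "of the actual positives, how many were found?". F1 is the harmonic mean of the two.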
Common options for hyperparameter tuning include:
- Keras Tuner
- Bayesian Optimization (Hyperopt)
- Genetic Algorithms
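As one illustration, a genetic algorithm for tuning can be sketched in a few lines. In the example below, `validation_score` is a hypothetical stand-in for training a model and scoring it on validation data. The parameter names, ranges, and mutation settings are likewise assumptions, not recommendations.

```python
import random


def validation_score(params):
    """Hypothetical stand-in for training a model and scoring it.

    Peaks at learning_rate=0.1 and max_depth=6, where it equals 1.0.
    """
    lr, depth = params["learning_rate"], params["max_depth"]
    return 1.0 - 10 * (lr - 0.1) ** 2 - 0.01 * (depth - 6) ** 2


def random_params(rng):
    return {"learning_rate": rng.uniform(0.001, 0.5), "max_depth": rng.randint(1, 12)}


def crossover(a, b, rng):
    """Take each hyperparameter from one of the two parents at random."""
    return {key: rng.choice([a[key], b[key]]) for key in a}


def mutate(params, rng, rate=0.3):
    """Occasionally perturb a hyperparameter, keeping it within its bounds."""
    child = dict(params)
    if rng.random() < rate:
        lr = child["learning_rate"] + rng.gauss(0, 0.05)
        child["learning_rate"] = min(0.5, max(0.001, lr))
    if rng.random() < rate:
        depth = child["max_depth"] + rng.choice([-1, 1])
        child["max_depth"] = min(12, max(1, depth))
    return child


def genetic_search(generations=20, population_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_params(rng) for _ in range(population_size)]
    history = []  # best score seen in each generation
    for _ in range(generations):
        population.sort(key=validation_score, reverse=True)
        history.append(validation_score(population[0]))
        parents = population[: population_size // 2]  # selection: fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children  # elitism: parents survive unchanged
    best = max(population, key=validation_score)
    history.append(validation_score(best))
    return best, history


best, history = genetic_search()
print(best, history[-1])
```

Because the fitter half of each generation is carried over unchanged, the best score never gets worse from one generation to the next. With a real model, each call to the scoring function is a full training run, so population size and generation count have to be chosen with the compute budget in mind.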
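Model Deployment
We deploy the model on platforms such as AWS, Microsoft Azure, Google Cloud Platform (GCP), and Heroku, often using a web framework like Flask to serve predictions. Once the setup is done, the model moves to production servers.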
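Serving a model usually means wrapping it in a small web service. The sketch below uses Python's built-in `http.server` instead of Flask so that it runs without extra packages. The `predict` function is a hypothetical stand-in for a trained model, and the `/predict` route, port, and three-feature input are assumptions for illustration. The built-in server is meant for experiments, not production traffic.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear scoring rule."""
    weights = [0.5, -0.2, 0.1]
    score = sum(w * x for w, x in zip(weights, features))
    return {"score": score, "label": int(score > 0)}


def handle_payload(body):
    """Turn a JSON request body into a (status code, JSON response) pair."""
    try:
        features = json.loads(body)["features"]
        if len(features) != 3:
            raise ValueError("expected exactly 3 features")
        result = predict([float(x) for x in features])
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected JSON like {'features': [x1, x2, x3]}"}
        return 400, json.dumps(error)
    return 200, json.dumps(result)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_payload(self.rfile.read(length))
        data = payload.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run_server(port=8000):
    """Start serving predictions; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), PredictHandler).serve_forever()


# Exercise the request logic directly, without starting the server.
status, payload = handle_payload('{"features": [1, 2, 3]}')
print(status, payload)
```

Calling `run_server()` starts the service, after which a client can POST a JSON body to `/predict`. Keeping the request logic in `handle_payload` separates it from the web layer, so the same function could sit behind a Flask route on any of the platforms above.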