Mathematics for Artificial Intelligence
Since childhood, we have been learning intimidating mathematical concepts such as algebra, calculus, statistics, and probability. Have you ever asked yourself why, and how these subjects will help you in the future? Today I am going to help you answer that.
These mathematical concepts play a crucial role in the fields of Artificial Intelligence and Data Science. Statistics, in particular, is concerned with collecting, organizing, analyzing, and interpreting data.
More formally, statistics is a branch of applied mathematics that involves collecting, describing, analyzing, and drawing conclusions from quantitative data. We need statistics to turn observations into information and to answer questions about a population based on samples.
Statistics is divided into two major areas:
1. Descriptive Statistics
- It can work with population data or sample data.
- It analyzes, summarizes, and organizes data in the form of numbers and graphs.
- Typical tools: measures of central tendency and measures of dispersion.
- e.g., bar plot, pie chart, histogram.
2. Inferential Statistics
- It deals only with sample data.
- It allows you to make predictions or inferences about a population from that sample.
- Typical tools: point estimation and interval estimation.
- e.g., Z-test, t-test, chi-square test, ANOVA test.
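To make point estimation and interval estimation concrete, here is a minimal sketch using only Python's standard library. The sample values are made up for illustration, and the interval uses a normal approximation, which is an assumption on my part; for a sample this small, a t-distribution would usually be the better choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample drawn from a larger population (illustrative values only).
sample = [12, 15, 11, 14, 13, 16, 12, 15]

# Point estimate: a single best guess for the population mean.
point_estimate = mean(sample)

# Interval estimate: an approximate 95% confidence interval.
# Assumption: normal approximation, chosen to keep the sketch stdlib-only.
z = NormalDist().inv_cdf(0.975)
standard_error = stdev(sample) / sqrt(len(sample))
lower = point_estimate - z * standard_error
upper = point_estimate + z * standard_error

print(f"point estimate: {point_estimate:.2f}")
print(f"approx. 95% interval: ({lower:.2f}, {upper:.2f})")
```

The point estimate is one number, while the interval estimate gives a range that expresses how uncertain that number is.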
Now the question is:
How are these classical statistical concepts used in Artificial Intelligence?
Measure of Central Tendency
- Mean: It is the average of the values, that is, their sum divided by their count.
- Median: It is the middle value of the sorted data, separating the higher half from the lower half.
- Mode: It is the most frequently appearing value.
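The three measures above can be computed with Python's built-in `statistics` module. This is a small sketch on made-up numbers; the second part adds a hypothetical outlier to show how differently the mean and the median react to it.

```python
from statistics import mean, median, mode

# Hypothetical data set (illustrative values only).
values = [2, 3, 3, 5, 7, 10]

print(mean(values))    # 5   (30 / 6)
print(median(values))  # 4.0 (midpoint of the two middle values, 3 and 5)
print(mode(values))    # 3   (appears twice)

# Add one extreme value and compare again.
with_outlier = values + [1000]
print(mean(with_outlier))    # roughly 147: pulled far toward the outlier
print(median(with_outlier))  # 5: barely moves
```

The median's stability here is why it is often preferred for skewed data.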
Measure of Dispersion
- Range: Difference between the highest and lowest value.
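- Interquartile Range: Quartiles divide a rank-ordered data set into four equal parts; the interquartile range is the difference between the third quartile and the first quartile.
- Variance: The average of the squared differences between each value and the mean; it measures how far the values are spread from the mean and therefore from one another.
- Standard deviation: The square root of the variance; it measures the dispersion of the values around the mean in the same units as the data.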
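Here is a short sketch of the four dispersion measures using the standard `statistics` module. The data is made up, and I use the population versions of variance and standard deviation (dividing by N); that choice is an assumption, and `statistics.variance` and `statistics.stdev` give the sample versions (dividing by N - 1).

```python
from statistics import pstdev, pvariance, quantiles

# Hypothetical data set (illustrative values only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value.
value_range = max(values) - min(values)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), and Q3.
# Note: different quartile conventions can give slightly different cut points.
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1

# Population variance and standard deviation.
variance = pvariance(values)
std_dev = pstdev(values)

print("range:", value_range)
print("IQR:", iqr)
print("variance:", variance)
print("standard deviation:", std_dev)
```

Because the standard deviation is the square root of the variance, it is usually the easier of the two to interpret.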
Application in Machine Learning
Handling Missing Values in the Data Set
Mean/median/mode imputation assumes that the data are missing completely at random (MCAR). We handle the missing values by replacing each NaN with the mean, the median, or the mode (the most frequent value) of that variable.
Here is the code:
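A minimal sketch of the idea in plain Python, assuming missing entries are encoded as NaN. The `impute` helper and the `ages` column are hypothetical names I chose for illustration.

```python
from math import isnan
from statistics import median

nan = float("nan")

# Hypothetical column with missing entries encoded as NaN (illustrative values only).
ages = [22.0, nan, 35.0, 41.0, nan, 29.0, 120.0]


def impute(column, strategy=median):
    """Return a copy of column with each NaN replaced by strategy(observed values)."""
    observed = [x for x in column if not isnan(x)]
    fill_value = strategy(observed)
    return [fill_value if isnan(x) else x for x in column]


imputed_ages = impute(ages)
print(imputed_ages)
```

Passing `statistics.mean` or `statistics.mode` as `strategy` switches to mean or mode imputation. If you work with pandas, the same idea is commonly written as `df["age"].fillna(df["age"].median())`, where `df` and `"age"` are placeholder names.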
Advantages and Disadvantages of Mean/Median Imputation
Advantages:
- Easy to implement (median imputation is also robust to outliers)
- A fast way to obtain a complete dataset
Disadvantages:
- Distorts the original variance
- Distorts correlations with other variables
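The effect on variance is easy to see on a toy example. The sketch below uses made-up numbers: it fills the missing entries with the mean and compares the variance of the observed values with the variance after imputation.

```python
from math import isnan
from statistics import mean, pvariance

nan = float("nan")

# Hypothetical column with two missing entries (illustrative values only).
column = [10.0, nan, 14.0, nan, 18.0, 22.0]

observed = [x for x in column if not isnan(x)]
fill_value = mean(observed)
imputed = [fill_value if isnan(x) else x for x in column]

variance_before = pvariance(observed)  # variance of the values we actually saw
variance_after = pvariance(imputed)    # variance after filling gaps with the mean

print("mean before/after:", mean(observed), mean(imputed))
print("variance before/after:", variance_before, variance_after)
```

The mean stays the same, but every filled-in value sits exactly at the mean, so the variance shrinks. The more values are missing, the stronger this effect becomes.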