Mathematics for Artificial Intelligence
Part 2
There are many probability distributions in mathematics, but today I will tell you about one essential distribution.
Gaussian Distribution
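The Gaussian Distribution is also called the Normal Distribution. It is a continuous probability distribution described by a symmetric, bell-shaped curve. Its cumulative distribution function associates each value x of the normal random variable X with the cumulative probability P(X ≤ x).
Note: A standard normal random variable is one with mean 0 and standard deviation 1. A general normal random variable can have any mean and any positive standard deviation.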

In general, data that follows a Gaussian (normal) distribution is very beneficial for model building in machine learning and deep learning. It also makes the math more tractable. Datasets with Gaussian distributions are suited to a variety of methods that fall under parametric statistics.
The graph of the Normal Distribution depends on two factors:
- Mean: It determines the location of the centre of the graph
- Standard Deviation: It determines the height and width of the graph
e.g., If the standard deviation is large, the curve is short and wide; if the standard deviation is small, the curve is tall and narrow.
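To make the effect of the two factors concrete, here is a minimal sketch that evaluates the normal density by hand. The function name `normal_pdf` and the example values (standard deviations 0.5 and 2.0) are my own choices for illustration, not part of any library.

```python
import math

def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a normal distribution with the given mean and standard deviation."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-0.5 * ((x - mean) / std) ** 2)

# The peak of the curve sits at the mean, and its height is 1 / (std * sqrt(2*pi)),
# so a smaller standard deviation gives a taller (and narrower) curve.
peak_narrow = normal_pdf(0.0, mean=0.0, std=0.5)
peak_wide = normal_pdf(0.0, mean=0.0, std=2.0)
print(f"std=0.5 peak height: {peak_narrow:.4f}")
print(f"std=2.0 peak height: {peak_wide:.4f}")

# Changing the mean only shifts the curve: the peak height stays the same.
shifted_peak = normal_pdf(3.0, mean=3.0, std=1.0)
print(f"mean=3, std=1 peak height: {shifted_peak:.4f}")
```

The first curve peaks roughly four times higher than the second, which matches the "tall and narrow" versus "short and wide" description.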

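Empirical Rule in Gaussian Distribution:
- About 68% of the data falls within 1 standard deviation of the mean
- About 95% of the data falls within 2 standard deviations of the mean
- About 99.7% of the data falls within 3 standard deviations of the mean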
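The empirical rule for a normal distribution gives the share of values that lie within 1, 2, and 3 standard deviations of the mean. A minimal sketch to check those shares numerically is shown here; it uses the error function from Python's `math` module, and the helper name `prob_within` is hypothetical.

```python
import math

def prob_within(k):
    """P(|Z| <= k) for a standard normal variable Z, computed via the error function."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviations(s) follows the same pattern: 0.9973
```

Because any normal variable can be standardized, the same shares hold for every mean and standard deviation.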

Now, I will tell you about one of the most important things that we need to handle in our datasets to produce good models in machine learning and deep learning.
Outliers
An outlier is a data point in a dataset that is distant from the other observations. Simply put, it is a data point that lies outside the overall distribution of the dataset.

Criteria to identify an outlier?
- Data points that fall more than 1.5 times the interquartile range (IQR) above the 3rd quartile or below the 1st quartile.
- Data points that fall more than 3 standard deviations from the mean. We can use a z-score for this: a point whose z-score is beyond ±3 is flagged, and a stricter cut-off of ±2 is sometimes used.
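Both criteria can be sketched in a few lines with Python's `statistics` module. The function names `iqr_outliers` and `zscore_outliers` and the sample data are made up for illustration. Note that `statistics.quantiles` uses its default "exclusive" method here, so other tools may report slightly different quartiles.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values more than k * IQR below the 1st or above the 3rd quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values are identical, so nothing can be an outlier
    return [v for v in values if abs((v - mean) / std) > threshold]

# Hypothetical sample: eleven typical readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]

print("IQR rule:    ", iqr_outliers(data))
print("z-score rule:", zscore_outliers(data))
```

On this sample both rules flag only the value 95. On small datasets the z-score rule can miss outliers, because the outlier itself inflates the standard deviation used to compute the score.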
Reason for an outlier to exist in the dataset?
- Variability in the data
- An experimental measurement error
Impacts of having an outlier in a dataset?
- It can distort the results of our statistical analysis, for example by skewing summary statistics and the models fitted to them.
- It may have a significant impact on the mean and the standard deviation.
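A quick way to see the impact on the mean and the standard deviation is to compute them with and without a single extreme value. This is only a sketch on made-up numbers; the helper `summarize` is hypothetical, and the median is included purely as a point of comparison that I added.

```python
import statistics

def summarize(values):
    """Return (mean, median, sample standard deviation) of the values."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),
    )

# Hypothetical readings, then the same readings plus one extreme value.
clean = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12]
with_outlier = clean + [95]

for label, values in (("without outlier", clean), ("with outlier", with_outlier)):
    mean, median, std = summarize(values)
    print(f"{label:>15}: mean={mean:.2f}  median={median:.2f}  std={std:.2f}")
```

A single extreme point pulls the mean up from about 11.5 to 18.5 and inflates the standard deviation many times over, while the median stays at 12.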
Various ways of finding an outlier
- Using a scatter plot
- Using a box plot
- Using the z-score
- Using the IQR (interquartile range)


