I will be selfish. Shouldn’t we all be at times? In fact, I think it’s in our interest that people act selfishly.

It is not from the benevolence of the butcher, the brewer, or the baker, that we expect our dinner, but from their regard to their own interest. We address ourselves, not to their humanity but to their self-love, and never talk to them of our necessities but of their advantages. — Adam Smith

One bit of advice: it is important to view knowledge as sort of a semantic tree — make sure you understand the fundamental principles, i.e. the trunk and big branches, before you get into the leaves/details or there is nothing for them to hang on to. — Elon Musk

**PSA:** Do not rely solely on online courses. We should include supplementary material for online learning. For example, working on side projects or reading the documentation can reap great benefits. It can feel rewarding to master concepts in an online course, but that feeling can be deceiving. …

I was reading Buddhism Plain and Simple by Steve Hagen. First off, this book offers a tremendously harsh reality: you will die. We successfully occupy our minds with non-priority tasks. How many times have you commuted to work without realizing you were on auto-pilot? You probably don’t even remember the drive… you were too busy “in” your thoughts. A lot of us live life on auto-pilot. We are either reliving great memories, forming alternative solutions to our what-ifs, or daydreaming about a better future.

Religion is controversial. Hence, I will remain as objective as I can be. I…

- **Gini Impurity**: The likelihood of incorrectly classifying a new instance of a random variable, if that new instance were randomly classified according to the distribution of class labels in the data set.
- **Information Gain**: Measures how much a feature (independent variable) reduces uncertainty about the label; it is used to rank how important each feature is.
- **Entropy**: A measurement of information (or rather the lack thereof).
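The three definitions above can be sketched in plain Python. This is a minimal illustration (the function names are mine, not from the article), using the standard frequency-based formulas:

```python
import math
from collections import Counter

def gini_impurity(labels):
    """Probability of misclassifying a random instance if it were
    labeled according to the class distribution of the data set."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Average information (in bits) needed to identify a label."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, splits):
    """Entropy reduction achieved by splitting `parent` into `splits`."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)
```

For a perfectly balanced binary set such as `[0, 0, 1, 1]`, the Gini impurity is 0.5 and the entropy is 1 bit; a split that separates the classes cleanly recovers the full bit as information gain.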

- **Variance**: Measures how far a data set is spread out.
- **Mean**: A calculated “central” value of a set of numbers.
- **Standard Deviation**: The square root of the variance.
- **Coefficient of Variation (CV)**: A measure of relative variability: the ratio of the standard deviation to the mean. The CV is useful when comparing two data sets; the larger the CV, the more spread out the data is relative to its mean.
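These four statistics chain together neatly, which a short sketch makes concrete (population variance is assumed here; sample variance would divide by n − 1 instead):

```python
import math

def mean(xs):
    """Central value: sum divided by the count."""
    return sum(xs) / len(xs)

def variance(xs):
    """Average squared distance from the mean (population variance)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    """Square root of the variance."""
    return math.sqrt(variance(xs))

def coefficient_of_variation(xs):
    """Relative variability: standard deviation over the mean."""
    return std_dev(xs) / mean(xs)
```

For example, the data set `[2, 4, 4, 4, 5, 5, 7, 9]` has mean 5, variance 4, standard deviation 2, and a CV of 0.4.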

1. Deep Learning Book Notes, Chapter 1

2. Deep Learning Book Notes, Chapter 2

- **Vectors**: A vector is composed of a magnitude and a direction. Geometrically, a vector in a 2-dimensional plane (an x-y graph) is a line from the origin to its coordinates. For example, given the coordinates (3, 4), we can sketch a line from the origin to (3, 4): three on the x-axis and four on the y-axis.
- **Magnitude**: To calculate the magnitude, we have to find the length between the origin (0, 0) and (3, 4). Using the Pythagorean theorem, we compute √(3² + 4²), which is 5!
- **Direction**: For direction, we use the trigonometric functions: sin, cos…
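The (3, 4) example above can be checked with Python's standard `math` module; a minimal sketch (direction here is measured as the angle from the positive x-axis, in degrees):

```python
import math

def magnitude(x, y):
    """Length of the vector from the origin: sqrt(x^2 + y^2)."""
    return math.hypot(x, y)

def direction(x, y):
    """Angle from the positive x-axis, in degrees."""
    return math.degrees(math.atan2(y, x))
```

For the vector (3, 4), the magnitude is 5 and the direction is roughly 53.13 degrees above the x-axis.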

- **Euclidean Distance**: The distance between data “points” (p1, p2, …, pn). It computes the square root of the sum of the squares of the differences between the data “points.”
- **Manhattan Distance**: The distance between data “points” (p1, p2, …, pn). It computes the sum of the absolute differences between the data “points.”
- **Chebyshev Distance**: Unlike the previous two methods, it calculates the maximum of the absolute differences between the data “points.”
- **K**: Or neighbors. It’s a core concept of K-Nearest Neighbors; it determines how many neighboring values the model considers.
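The three distance metrics differ only in how they combine the coordinate differences, which a side-by-side sketch makes clear (function names are mine):

```python
def euclidean(p, q):
    """Square root of the sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def manhattan(p, q):
    """Sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    """Maximum of the absolute differences."""
    return max(abs(a - b) for a, b in zip(p, q))
```

Between (0, 0) and (3, 4): Euclidean gives 5, Manhattan gives 7, and Chebyshev gives 4.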

- **Naive Bayes**: A classification technique that derives from Bayes’ theorem.
- **Bayes’ Theorem**: A mathematical equation used in probability and statistics to calculate conditional probability. In other words, it is used to calculate the probability of an event based on its association with another event — Prof. Helmenstine.
- **Conditional Independence**: A great example from Wikipedia: “

A and B are conditionally independent given C if and only if, given the knowledge that C occurs, knowledge of whether A occurs provides no information on the likelihood of B occurring, and knowledge of whether B occurs provides…
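Bayes’ theorem itself is one line of arithmetic. The sketch below applies it to a hypothetical medical test (the numbers are illustrative assumptions, not from the article): even a fairly accurate test yields a modest posterior when the condition is rare.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: 1% prevalence, 99% sensitivity, 5% false-positive rate.
p_disease = 0.01
p_pos_given_disease = 0.99
p_pos = p_pos_given_disease * p_disease + 0.05 * (1 - p_disease)

p_disease_given_pos = bayes(p_pos_given_disease, p_disease, p_pos)
```

Here the posterior P(disease | positive) comes out to about 0.167: most positives are false positives, because healthy people vastly outnumber sick ones.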

- **Supervised Learning**: Analyzes the data set to produce a prediction function that is used to forecast new examples.
- **Ground Truth**: The actual result.
- **Cross-Entropy**: “Cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.”
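The quoted definition of cross-entropy can be written out directly. A minimal binary log-loss sketch (the clipping constant `eps` is a standard numerical guard, my choice rather than the article's):

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-15):
    """Binary log loss: heavily penalizes confident wrong predictions."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

A confident correct prediction (label 1, probability 0.9) costs about 0.105, while the same confidence on a wrong label (label 0, probability 0.9) costs about 2.303: the loss grows quickly as the prediction diverges from the label.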

First of all, despite its name, Logistic Regression is not a regression problem but a classification problem. Intuitively, you can think of the question: *“How to Make $1M in 10 years?”* In our scenario, there are two routes: finish college or…

- **Supervised Learning**: Analyzes the data set to produce a prediction function that forecasts new examples.
- **Overfitting**: A model that has learned the training data “too well.” Hence, the model will not be as useful on new examples.
- **Ground Truth**: The actual result.
- **MSE**: Mean Squared Error, a formula that measures how well the model is performing. For each observation, it calculates the difference between the prediction and the ground truth, sums the squares of those differences, and divides by the number of observations to get the mean.
- **Gradient Descent**: A first-order iterative optimization algorithm for finding the…
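MSE and gradient descent fit together: gradient descent repeatedly nudges the model's parameters in the direction that lowers the MSE. A minimal sketch for a simple line fit `y = w*x + b` (the learning rate and step count are arbitrary choices of mine):

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Fit y = w*x + b by stepping against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        preds = [w * x + b for x in xs]
        # Partial derivatives of the MSE with respect to w and b.
        dw = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        db = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        w -= lr * dw
        b -= lr * db
    return w, b
```

On data generated from y = 2x + 1, the loop converges to w ≈ 2 and b ≈ 1, driving the MSE toward zero.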