In this part of the Machine Learning & Data Science Dictionary, we cover terms from the letter “J” to “P”. To read the other parts of the dictionary, visit the links below.

Machine Learning & Data Science Dictionary – Part 3

J

Jupyter Notebook – It is an open-source, web-based interactive computing platform used to create and share documents that combine live code, equations, visualizations, and narrative text.

K

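K-means – It is an unsupervised learning algorithm that partitions data points into k clusters by repeatedly assigning each point to its nearest centroid (by distance) and then moving each centroid to the mean of the points assigned to it.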
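
To make the assign-and-update loop concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. The function name `kmeans`, the sample points, and the hand-picked starting centroids are illustrative assumptions; library implementations usually pick initial centroids randomly or with k-means++ and run several restarts.

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Cluster 2-D points with Lloyd's algorithm, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iters):
        # Assignment step: attach each point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(c)  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return centroids, clusters

# Made-up data: two well-separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Here k is simply the number of starting centroids passed in; on this data the assignments stop changing after the second pass, so the loop ends early.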

Keras – It is an open-source, high-level Python library for building and training neural networks. It was created by François Chollet and is closely integrated with Google's TensorFlow.

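K-nearest Neighbours (KNN) – It is a supervised machine learning algorithm used for regression and classification tasks. It predicts the value for a new data point by calculating its distance to the training data points and using the k closest ones: a majority vote of their classes for classification, or the average of their values for regression.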
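
A minimal classification sketch of KNN, assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy points and labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query point.
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    k_labels = [label for _, label in by_distance[:k]]
    # Majority vote; for regression you would average the neighbours' target values instead.
    return Counter(k_labels).most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (2, 2)))  # A
print(knn_predict(train_X, train_y, (6, 5)))  # B
```

Note that KNN does no real "training": all the work happens at prediction time, when distances to every stored point are computed.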

Kubernetes – It is an open-source platform for automating the deployment, scaling, and management of containerized applications.

L

Labelled dataset – It is a dataset in which each data point has a “label”, “class”, or “tag” associated with it, such as the target value a supervised model learns to predict.

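Lasso Regression – It is a form of linear regression that adds an L1 penalty on the size of the coefficients. The penalty shrinks (regularizes) the coefficients and can set some of them to exactly zero, which helps avoid overfitting and reduce prediction error on new data.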
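
For reference, one common way to write the Lasso objective is shown below, with n training examples, p features, coefficients β_j, and a penalty strength λ ≥ 0 chosen by the user. The scaling of the squared-error term (here 1/(2n)) differs between textbooks and libraries, so treat this as one convention rather than the only form.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
\frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2
\;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The first term is the usual least-squares error; the second is the L1 penalty. Larger values of λ push more coefficients to exactly zero.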

Linear Regression – It models a continuous dependent variable as a linear function of one or more independent variables and uses that function to make predictions.
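
A minimal sketch of ordinary least squares for a single independent variable, using the closed-form slope and intercept. The function name and the toy data are illustrative; with several variables the same fit is usually solved with matrix methods (for example NumPy's `numpy.linalg.lstsq`).

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = covariance(x, y) / variance(x); the common 1/n factors cancel out.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # made-up data lying exactly on y = 1 + 2x
intercept, slope = fit_simple_linear_regression(x, y)
prediction = intercept + slope * 6  # predict y for a new x value
print(intercept, slope, prediction)  # 1.0 2.0 13.0
```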

Logistic Regression – It is used to predict a categorical dependent variable from independent variables. The model outputs a probability between 0 and 1, which is then compared with a threshold (commonly 0.5) to assign a class.
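
The sketch below shows how a logistic regression model turns inputs into a probability and then a class, assuming the coefficients have already been learned. The weights, bias, and function names are hypothetical values chosen for illustration, not fitted from real data.

```python
import math

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    # Linear combination of the inputs, squashed into a probability by the sigmoid.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict_class(weights, bias, features, threshold=0.5):
    # Convert the probability into a hard 0/1 label.
    return 1 if predict_proba(weights, bias, features) >= threshold else 0

# Hypothetical coefficients, as if already learned from training data.
weights, bias = [1.5, -0.8], -0.2
p = predict_proba(weights, bias, [2.0, 1.0])
label = predict_class(weights, bias, [2.0, 1.0])
print(round(p, 4), label)
```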

Log Loss – It measures the performance of a classification model whose output is a probability between 0 and 1. It penalizes confident wrong predictions heavily, and lower values are better.
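
A minimal sketch of binary log loss (binary cross-entropy), averaging −[y·log(p) + (1 − y)·log(1 − p)] over the examples. The clipping constant `eps` is an assumption added to avoid taking log(0); the labels and probabilities are made up.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log() never receives 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

confident_right = log_loss([1, 0], [0.9, 0.1])
confident_wrong = log_loss([1, 0], [0.1, 0.9])
print(round(confident_right, 4), round(confident_wrong, 4))
```

The same predictions flipped to the wrong side raise the loss from about 0.105 to about 2.303, which is why log loss rewards well-calibrated probabilities rather than just correct labels.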

Long Short-Term Memory (LSTM) Networks – It is a type of recurrent neural network (RNN) that can learn and memorize long-term dependencies. Its gated memory cells let it retain past information over long sequences.

Part 1: Machine Learning & Data Science Dictionary

Part 2: Machine Learning & Data Science Dictionary

Part 4: Machine Learning & Data Science Dictionary

M

Machine Learning – It is a field of study in which models learn patterns from historical data and use them to predict outcomes on new data, without being explicitly programmed for the task.

Machine Learning Operations (MLOps) – It is a core function of Machine Learning engineering that focuses on taking ML models to production and maintaining and monitoring them.

Management information system (MIS) – It is a computer system of hardware and software that collects, processes, and stores an organization's data and produces the reports managers use to run its operations and make decisions.

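Maximum Likelihood Estimation – It is a probabilistic framework for estimating a model's parameters by choosing the values that make the observed data most probable, that is, the values that maximize the likelihood function.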
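
To illustrate the idea, this sketch estimates the probability of heads for a coin from made-up flips by evaluating the Bernoulli log-likelihood over a grid of candidate values and keeping the best one. The grid search is an assumption for readability; for this model the maximum is known in closed form (heads divided by flips), and the grid recovers it.

```python
import math

def bernoulli_log_likelihood(p, flips):
    """Log-likelihood of a sequence of 0/1 outcomes under success probability p."""
    heads = sum(flips)
    tails = len(flips) - heads
    return heads * math.log(p) + tails * math.log(1 - p)

flips = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]  # made-up data: 7 heads, 3 tails
# Evaluate candidate values of p and keep the one with the highest log-likelihood.
candidates = [i / 100 for i in range(1, 100)]
p_hat = max(candidates, key=lambda p: bernoulli_log_likelihood(p, flips))
print(p_hat)  # 0.7
```

Working with the log-likelihood instead of the likelihood gives the same maximizer but avoids multiplying many small probabilities together.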

Mean – It is the average of all the numbers in the data: their sum divided by how many numbers there are.

Mean Absolute Error – Also called L1 loss, it computes the mean of the absolute differences between the labelled (actual) and predicted values.

Mean Squared Error Loss – Also called L2 loss, it is the mean of the squared differences between the actual and predicted values, and tells how close a regression line is to a set of data points.
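
A minimal sketch computing both error measures on the same made-up predictions, which shows how squaring makes MSE more sensitive to the single large error than MAE is.

```python
def mean_absolute_error(y_true, y_pred):
    # Average magnitude of the errors; every unit of error counts the same.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    # Average squared error; large errors are penalized disproportionately.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(mae, mse)  # 0.875 1.3125
```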

Median – It is the middle value of a list ordered from smallest to largest. When the list has an even number of values, the median is the average of the two middle values.

Mode – It is the value that appears most frequently in a dataset.
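
The three averages above can be sketched directly in Python; the made-up data has an even number of values, so the median averages the two middle ones. Python's standard `statistics` module provides `mean`, `median`, and `mode` functions that give the same results here.

```python
import statistics
from collections import Counter

def mean(values):
    return sum(values) / len(values)

def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values

def mode(values):
    # Most frequent value; on a tie, Counter returns the value encountered first.
    return Counter(values).most_common(1)[0][0]

data = [7, 3, 10, 3, 5, 2]
print(mean(data), median(data), mode(data))  # 5.0 4.0 3
# Cross-check against the standard library.
print(statistics.mean(data), statistics.median(data), statistics.mode(data))
```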

Model selection – It is the process of choosing the best model from a set of candidate models, typically by comparing their performance on validation data (for example, with cross-validation).

Monte Carlo Method – It is a mathematical technique that uses repeated random sampling to estimate the possible outcomes of an uncertain event.
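
A classic illustration of the Monte Carlo method is estimating π: sample random points in the unit square and count how many land inside the quarter circle of radius 1. The sample size and seed below are arbitrary choices; more samples give a tighter estimate.

```python
import random

def estimate_pi(num_samples, seed=0):
    """Estimate pi from the share of random points in the unit square inside the quarter circle."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square's area.
    return 4 * inside / num_samples

estimate = estimate_pi(100_000)
print(estimate)
```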

Multi-Class Classification – It is a classification problem with more than two classes in the target variable, where each example belongs to exactly one class.

Multilayer Perceptrons – It is a feedforward artificial neural network made up of an input layer, one or more hidden layers, and an output layer of fully connected neurons. Inputs flow forward through the layers to generate outputs.
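
The sketch below runs one forward pass through a tiny multilayer perceptron: each layer computes weighted sums of its inputs plus a bias, and the hidden layer applies a ReLU activation. The layer sizes, the hand-picked weights, and the choice of ReLU with a linear output layer are assumptions for illustration; a trained network would learn its weights.

```python
def relu(vector):
    # Element-wise ReLU activation: negative values become 0.
    return [max(0.0, v) for v in vector]

def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    # Hidden layers apply ReLU; the final layer is left linear in this sketch.
    for i, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if i < len(layers) - 1:
            x = relu(x)
    return x

# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, -1.0]),
    ([[1.0, 1.0, 1.0]], [0.5]),
]
output = mlp_forward([2.0, 1.0], layers)
print(output)  # [3.0]
```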

Multivariate Analysis – It is the process of analyzing multiple variables simultaneously to understand how they relate to and depend on one another.

N

Naive Bayes – It is a family of probabilistic classifiers based on Bayes' theorem that assume the attributes of a data point are independent of one another given the class.
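
A minimal sketch of a categorical Naive Bayes classifier with add-one (Laplace) smoothing: it multiplies the class prior by each feature's per-class likelihood, which is exactly where the independence assumption comes in. The function names, the smoothing choice, and the toy weather data are assumptions; other variants (such as Gaussian Naive Bayes for continuous features) model the likelihoods differently.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> Counter of values
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict_naive_bayes(model, row, alpha=1.0):
    """Return the class with the highest P(class) * product of P(x_i | class)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Features are treated as independent given the class, so likelihoods multiply.
            # Add-alpha smoothing stops an unseen value from zeroing the whole product.
            numerator = value_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            score *= numerator / denominator
        scores[label] = score
    return max(scores, key=scores.get)

rows = [("sunny", "weak"), ("sunny", "strong"), ("rainy", "strong"),
        ("rainy", "strong"), ("overcast", "weak"), ("sunny", "weak")]
labels = ["play", "stay", "stay", "stay", "play", "play"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "weak")))    # play
print(predict_naive_bayes(model, ("rainy", "strong")))  # stay
```

With many features the product of small probabilities can underflow, so practical implementations usually add log-probabilities instead; plain products are fine for a toy example like this.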

NaN – It stands for ‘not a number’ and is a floating-point value representing an undefined or unrepresentable numeric result, such as infinity minus infinity. In datasets, NaN is commonly used to mark missing values.
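
A short Python illustration of how NaN behaves: it is produced by undefined arithmetic, it is not equal even to itself, and it usually has to be filtered out (treated as missing) before computing statistics. The readings list is made up.

```python
import math

value = float("nan")                      # also available as math.nan
undefined = float("inf") - float("inf")   # an undefined arithmetic result is NaN

# NaN compares unequal to everything, including itself, so test for it with math.isnan.
print(value == value)          # False
print(math.isnan(value))       # True
print(math.isnan(undefined))   # True

# Treating NaN as a missing value: drop it before computing the average.
readings = [2.0, float("nan"), 4.0]
clean = [r for r in readings if not math.isnan(r)]
average = sum(clean) / len(clean)
print(average)                 # 3.0
```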

Natural Language Processing (NLP) – It is a field of artificial intelligence that enables computers to process and understand human language in the form of speech and text.

Neural Network – It is a network of interconnected neurons (nodes) organized into three kinds of layers: an input layer, one or more hidden layers, and an output layer.

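NoSQL – It stands for “Not only SQL” and refers to non-relational databases that store and retrieve data using models other than relational tables, such as key-value, document, wide-column, or graph stores.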

Nominal variable – It is a categorical variable whose values label or name categories that have no inherent order (for example, eye colour).

Normal distribution – It is a continuous probability distribution, symmetric about its mean, whose density function forms a bell-shaped curve.
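
For reference, the standard probability density function of a normal distribution with mean μ and standard deviation σ is shown below: μ sets the centre of the bell and σ its width.

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}
\exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
```

A useful consequence is the 68-95-99.7 rule: about 68%, 95%, and 99.7% of values fall within one, two, and three standard deviations of the mean.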

Normalization – It is a scaling technique that shifts and rescales data into a fixed range, most commonly [0, 1] (min-max normalization).
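
A minimal sketch of min-max normalization, which maps the smallest value to 0, the largest to 1, and everything else linearly in between. Returning zeros for a constant input is an assumed convention to avoid dividing by zero.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum maps to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: no spread to rescale (assumed convention)
    return [(v - lo) / (hi - lo) for v in values]

scaled = min_max_normalize([10, 20, 15, 30])
print(scaled)  # [0.0, 0.5, 0.25, 1.0]
```

In practice, the minimum and maximum should be computed on the training data only and then reused to scale validation and test data.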

NumPy – It is a Python library for numerical computing that provides multidimensional array objects along with mathematical functions for linear algebra and matrix operations.

O

One Hot Encoding – It is a process that converts a categorical variable into binary vectors, with one column per category that is set to 1 for the observed category and 0 elsewhere, so that machine learning and deep learning algorithms can use the data to make predictions.
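
A minimal sketch of one-hot encoding a single categorical column in plain Python. Sorting the categories to fix the column order is an assumed design choice; in practice, `pandas.get_dummies` or scikit-learn's `OneHotEncoder` handle this, including categories unseen during training.

```python
def one_hot_encode(values):
    """Map each category to a binary vector with a single 1 in that category's column."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        vectors.append(row)
    return categories, vectors

categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```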

Ordinal variables – These are categorical variables whose discrete values have a meaningful order, although the intervals between them are not necessarily equal (for example, low, medium, high).

Outlier – It is an observation that lies far away from the overall pattern in a sample.
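
"Far away" needs a concrete rule in practice. One common rule of thumb, Tukey's fences, flags values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The sketch below uses it with made-up data; it is one option among several (z-scores, model-based methods), not the definition of an outlier.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

outliers = iqr_outliers([12, 13, 11, 95, 14, 10, 13, 15, 12])
print(outliers)  # [95]
```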

Overfitting – It happens when a statistical model fits its training data too closely, capturing noise along with the underlying pattern. The model then performs well on the limited set of training data points but poorly on new, unseen data.

P

Pandas – It is an open-source Python library for data analysis & manipulation.

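Parameters – These are measurable factors that define a system. In machine learning, they are the internal variables of a model that are learned from training data, such as the weights of a neural network or the coefficients of a linear regression.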

Precision – It measures the quality of a model's positive predictions: the proportion of cases predicted as positive that are actually positive, TP / (TP + FP), where TP is true positives and FP is false positives.
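
A minimal sketch that counts true and false positives and computes precision; the labels are made up. Returning 0.0 when the model predicts no positives is an assumed convention (libraries such as scikit-learn let you choose this behaviour).

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are actually positive: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0  # assumed: 0.0 when nothing is predicted positive

y_true = [1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 1]
print(precision(y_true, y_pred))  # 0.6 (3 true positives, 2 false positives)
```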

Predictive Modelling – The process of using a mathematical approach to predict outcomes or future events by analyzing patterns in a given set of data.

Predictor variable – It is an independent variable used to predict the value of a dependent (target) variable.

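Principal Component Analysis – It is a technique used to reduce the dimensionality of datasets by projecting the data onto a smaller set of uncorrelated principal components that capture as much of the variance as possible, increasing interpretability while minimizing information loss.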
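
The sketch below computes the first principal component of two-dimensional data from the eigen-decomposition of its covariance matrix. It is restricted to two features only so the eigenvalues can be written in closed form; general implementations (for example scikit-learn's `PCA`) work on any number of features, typically via a singular value decomposition of the centred data. The function name and sample points are illustrative.

```python
import math

def pca_2d(points):
    """Return (first principal direction, explained-variance ratio) for 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[a, b], [b, c]] of the centred data.
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix; lam1 is the variance along the top component.
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector for lam1 is (b, lam1 - a); fall back to an axis when the covariance is diagonal.
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)

# Made-up points lying exactly on the line y = 2x.
direction, explained = pca_2d([(1, 2), (2, 4), (3, 6), (4, 8)])
print(direction, explained)
```

Because these points lie on a line, one component explains essentially all of the variance, so projecting onto it reduces the data from two dimensions to one with no information loss.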

Probability distribution – It is a statistical function that describes all the possible values a random variable can take and how likely each of them is to occur.

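P-value – It is the probability, assuming the null hypothesis is true, of obtaining results at least as extreme as those observed in your sample data. A low p-value means the observed data would be unlikely if the null hypothesis were true, which is taken as evidence against it; it is not the probability that the results occurred by chance.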
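
As a worked example, this sketch computes an exact one-sided p-value for a coin-flip experiment: under the null hypothesis that the coin is fair, it adds up the binomial probabilities of getting at least as many heads as observed. The observed counts are made up, and the one-sided test is an assumed choice; a two-sided test would also count equally extreme results in the other direction.

```python
from math import comb

def binomial_p_value_upper(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Null hypothesis: the coin is fair (p = 0.5). Observed: 9 heads in 10 flips (made-up data).
p_value = binomial_p_value_upper(9, 10)
print(p_value)  # 0.0107421875, i.e. 11/1024
```

Read correctly, this says: if the coin were fair, 9 or more heads in 10 flips would happen only about 1% of the time.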