In Part 1 of the Data Science and Machine Learning Dictionary, we explored the terminology starting with the letters A, B, and C. In this article, we cover the terminology starting with the letters D, E, F, G, H, and I.

Data Science and Machine Learning Dictionary

D

Data Engineers – Data professionals responsible for setting up and maintaining an organization’s data infrastructure.

Data Mining – It is the process of extracting useful insights from both structured and unstructured data.

Data Science – A field of study that combines data preparation (cleansing, mining, and manipulation) with algorithm development to analyze data and extract insights from it.

Dashboard – A data visualization & management tool used to track, analyze and display performance. The most common tools for building dashboards are Power BI, Tableau, and MS Excel.

Database – It is a structured collection of data, organized so that it can be accessed easily. The standard language for relational databases is SQL (Structured Query Language).
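
As a small illustration of querying a database with SQL, here is a minimal sketch using Python's built-in sqlite3 module. The table name, columns, and rows are made up for the example.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")

# Hypothetical rows, for illustration only.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Data", 90), ("Ben", "Data", 70), ("Chen", "Sales", 60)],
)

# SQL query: names in the Data department, highest salary first.
rows = conn.execute(
    "SELECT name FROM employees WHERE department = ? ORDER BY salary DESC",
    ("Data",),
).fetchall()
names = [row[0] for row in rows]
print(names)
```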

Data Augmentation – It is a technique used to increase the amount of data by performing slight adjustments to existing data.
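
A minimal sketch of the idea, assuming the data is a tiny grayscale image stored as a list of rows. Flipping is only one of many possible adjustments, and the image values here are hypothetical.

```python
# A tiny 3x3 grayscale "image" stored as a list of rows (hypothetical data).
image = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
]

def flip_horizontal(img):
    """Return a mirrored copy of the image (each row reversed)."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Return a copy of the image with the row order reversed."""
    return [row[:] for row in img[::-1]]

# The original plus two slightly adjusted copies: 1 sample becomes 3.
augmented = [image, flip_horizontal(image), flip_vertical(image)]
print(len(augmented))
```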

Decision Trees – A non-parametric supervised learning method used for classification and regression. It builds a model that predicts the value of the target variable by learning simple decision rules inferred from the data features.

Deep Learning – It is a subset of Machine Learning that uses neural networks with many layers to learn patterns from data. The network is trained to predict outputs based on a given set of inputs.
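
To make the idea of a learned decision rule concrete, here is a simplified sketch of a one-level tree (often called a decision stump) on a single feature. Real decision tree libraries use more refined split criteria; the function name and the data below are assumptions for illustration.

```python
def fit_stump(xs, ys):
    """Find the threshold on one feature that best separates two classes."""
    best_threshold, best_errors = None, len(ys) + 1
    for threshold in sorted(set(xs)):
        # Candidate rule: predict 1 when x >= threshold, otherwise 0.
        errors = sum((x >= threshold) != bool(y) for x, y in zip(xs, ys))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

# Hypothetical data: hours studied and whether the student passed (1) or not (0).
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

threshold = fit_stump(hours, passed)

def predict(x):
    """Apply the learned rule to a new value."""
    return int(x >= threshold)

print(threshold, predict(2), predict(5))
```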

Dependent Variable – It is the variable that is measured and is affected by the independent variable.

Dimensionality Reduction – It is the process of reducing the number of input variables in data.

Part 1: Data Science and Machine Learning Dictionary.

Part 3: Data Science and Machine Learning Dictionary.

Part 4: Data Science and Machine Learning Dictionary.

E

Early Stopping – It is a technique used to avoid overfitting when training an ML model.
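
One common form of early stopping watches the validation loss and halts once it stops improving. The sketch below assumes a "patience" rule (stop after a set number of epochs without improvement); the function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # rule never triggered: all epochs ran

# Hypothetical validation losses: improving at first, then getting worse.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
stop_epoch = early_stopping_epoch(losses, patience=2)
print(stop_epoch)
```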

Exploratory Data Analysis – It is the initial investigation of a dataset, using visualizations and summary statistics, to understand its structure and uncover patterns or anomalies.

ETL – It is a popular acronym that stands for Extract, Transform, and Load. An ETL system extracts data from the source systems, transforms it (cleaning it and enforcing its quality), and loads it into a target system such as a data warehouse.
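
The three ETL steps can be sketched with Python's standard library alone. This is a toy pipeline under stated assumptions: the CSV content, the cleaning rule (drop rows with a missing amount), and the sales table are all invented for the example.

```python
import csv
import io
import sqlite3

# Extract: read rows from a source (here an in-memory CSV with hypothetical data).
source = io.StringIO("name,amount\nalice, 10\nbob,\ncarol,25\n")
rows = list(csv.DictReader(source))

# Transform: drop rows with a missing amount, tidy names, convert types.
clean = [
    (row["name"].strip().title(), int(row["amount"]))
    for row in rows
    if row["amount"].strip()
]

# Load: write the cleaned rows into a target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", clean)
conn.commit()

total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(clean, total)
```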

Evaluation metric – It is a metric used to measure the quality of an ML model, such as AUC.

F

False Negative – These are cases that are actually positive but are incorrectly predicted as negative.

False Positive – These are cases that are actually negative but are incorrectly predicted as positive.

Feature Reduction – It is the process of reducing the number of features to improve the efficiency of a task without losing important information.
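
Counting the four possible outcomes of a binary classifier makes the two error types easier to tell apart. The labels below are hypothetical, with 1 as the positive class and 0 as the negative class.

```python
# 1 = positive class, 0 = negative class (hypothetical labels).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 0, 0]

pairs = list(zip(actual, predicted))
true_positives = sum(1 for a, p in pairs if a == 1 and p == 1)
true_negatives = sum(1 for a, p in pairs if a == 0 and p == 0)
false_positives = sum(1 for a, p in pairs if a == 0 and p == 1)  # negative, predicted positive
false_negatives = sum(1 for a, p in pairs if a == 1 and p == 0)  # positive, predicted negative

print(true_positives, true_negatives, false_positives, false_negatives)
```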

Feature Selection – It is the process of reducing the number of input variables by selecting relevant features to use in the ML model.

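F-score – It is a measure of a model’s performance that combines precision and recall into a single number. The most common version, the F1 score, is the harmonic mean of the two.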
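
The F1 score, the most widely used F-score, is the harmonic mean of precision and recall. A minimal sketch computing it from counts of true positives, false positives, and false negatives follows; the function name and the counts passed in are assumptions for illustration.

```python
def f1_score(tp, fp, fn):
    """Compute F1 from true positives, false positives, and false negatives."""
    if tp == 0:
        return 0.0  # no correct positive predictions: precision and recall are 0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 2 true positives, 1 false positive, 2 false negatives.
score = f1_score(tp=2, fp=1, fn=2)
print(round(score, 3))
```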

G

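GPU – GPU stands for Graphics Processing Unit, a specialized processor that can process many pieces of data in parallel, which makes it well suited to machine learning models, video editing, and gaming applications.

Gradient Boosting – It is an ensemble technique that builds models one after another, with each new model trained to correct the errors of the previous ones, so that the overall prediction error is minimized.

Gradient Descent – It is an optimization algorithm that finds a local minimum of a given function by repeatedly stepping in the direction opposite to the gradient. (Stepping along the gradient instead, known as gradient ascent, finds a local maximum.)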
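
A minimal sketch of gradient descent on a one-variable function, assuming the simple example f(x) = (x - 3)^2, whose gradient is 2(x - 3) and whose minimum is at x = 3. The learning rate and step count are arbitrary choices for the example.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# Example function f(x) = (x - 3)^2 has gradient 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(minimum, 4))
```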

H

Hadoop – It is an open-source framework used to store and process large datasets efficiently.

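Hierarchical Clustering – It is an algorithm that groups similar data points into a hierarchy of nested clusters, either by repeatedly merging the closest clusters or by repeatedly splitting larger ones.

Histogram – It is a graphical representation of the distribution of a continuous variable, in which the values are grouped into bins and each bar shows how many values fall into that bin.

Holdout Sample – It is a random sample taken from a data set and set aside, so that it is not used in the model fitting process and can later be used to evaluate the model.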
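
A small sketch of how the data behind a histogram can be computed by counting values per bin. The bin edges and the height values are hypothetical, and the text bars stand in for a real chart.

```python
def histogram(values, bin_edges):
    """Count how many values fall into each bin.

    Bins are half-open [left, right), except the last one, which
    also includes its right edge.
    """
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for value in values:
        for i in range(len(counts)):
            left, right = bin_edges[i], bin_edges[i + 1]
            if left <= value < right or (i == last and value == right):
                counts[i] += 1
                break
    return counts

# Hypothetical heights in centimeters, grouped into three bins.
heights = [150, 152, 158, 161, 165, 167, 169, 171, 178, 180]
edges = [150, 160, 170, 180]
counts = histogram(heights, edges)

# Draw one text bar per bin.
for i, count in enumerate(counts):
    print(f"{edges[i]}-{edges[i + 1]}: {'#' * count}")
```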
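
A minimal sketch of drawing a holdout sample, assuming a simple random split with a fixed seed so the result is repeatable. The function name, the 20% fraction, and the records are assumptions for the example.

```python
import random

def holdout_split(data, holdout_fraction=0.2, seed=42):
    """Shuffle a copy of the data and set aside a fraction as the holdout sample."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Hypothetical records, represented here by the numbers 0 to 9.
records = list(range(10))
train, holdout = holdout_split(records)
print(len(train), len(holdout))
```

The model would be fitted on train only, and holdout would be used afterwards to check how well it generalizes.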

Hyperparameter Tuning – It is the process of finding optimal hyperparameters for ML algorithms.
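
One simple tuning strategy is a grid search: try each candidate value and keep the one with the best score. The toy sketch below tunes a single hyperparameter, the learning rate of gradient descent on f(x) = (x - 3)^2, and scores each candidate by its final loss. In practice the score would normally come from validation data; the candidate values here are arbitrary.

```python
def final_loss(learning_rate, steps=20):
    """Run gradient descent on f(x) = (x - 3)^2 and return the final loss."""
    x = 0.0
    for _ in range(steps):
        x -= learning_rate * 2 * (x - 3)
    return (x - 3) ** 2

# Grid of candidate learning rates (hypothetical values).
candidates = [0.001, 0.01, 0.1, 0.5, 1.1]
results = {lr: final_loss(lr) for lr in candidates}

# Keep the candidate with the lowest final loss.
best = min(results, key=results.get)
print(best)
```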

I

Independent Variable – It is the variable that is manipulated or changed in order to observe its effect on the dependent variable.

Iteration – It is the process of repeating a statement or block of code a specific number of times, producing an output one after another.
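
A short illustration of iteration: a loop that repeats the same block of code five times and produces one output per pass. The numbers are arbitrary.

```python
squares = []
for n in range(1, 6):  # repeats the block for n = 1, 2, 3, 4, 5
    squares.append(n * n)
    print(n, n * n)
```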

Hope you liked reading the article, Data Science and Machine Learning Dictionary – Part 2. Share your thoughts in the comments section below.

To become a data scientist, explore these amazing certification programs by Console flare that make you ready for the profiles of Data Analyst, Data Engineer, Database manager, and Data Scientist.