Data Science & ML Dictionary – Part 4

Console Flare

3 years ago

In the previous articles of Data Science & ML Dictionary, we’ve shared the terminology starting from A to P. In this article, we’re going to provide the terminology starting from “R” to “Z”.

Part 1: Data Science & ML Dictionary

Part 2: Data Science & ML Dictionary

Part 3: Data Science & ML Dictionary

Data Science & ML Dictionary

R

R – It is an open-source programming language and a software environment for statistical computing, machine learning, and data visualization.

Random Forest – It is an ensemble learning method for classification, regression, and other tasks that builds many decision trees and combines their predictions.

Regression – It is a technique used for investigating the relationship between independent variables and a dependent variable.
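As a minimal sketch of the simplest case, the snippet below fits a straight line to a handful of points with ordinary least squares. The function name `fit_line` and the data are made up for illustration and do not come from any library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that follows y = 2x + 1 exactly
xs = [1, 2, 3, 4]   # independent variable
ys = [3, 5, 7, 9]   # dependent variable
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Because the made-up points lie exactly on a line, the fitted slope and intercept recover that line; with noisy data they would only approximate it.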

Regularization – It is a technique used to reduce overfitting in statistical models, typically by adding a penalty on model complexity.
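One way to see the effect is an L2 (ridge) penalty on a one-weight model. The sketch below assumes a line through the origin, y = w * x, and a hypothetical helper `ridge_slope`; the penalty strength `lam` and the data are illustrative.

```python
def ridge_slope(xs, ys, lam):
    """Weight w minimizing sum((y - w*x)**2) + lam * w**2 for the model y = w*x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative to zero gives w = sxy / (sxx + lam)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
w_plain = ridge_slope(xs, ys, lam=0.0)   # no penalty: ordinary least squares
w_ridge = ridge_slope(xs, ys, lam=5.0)   # penalty shrinks the weight toward zero
print(w_plain, w_ridge)
```

A larger `lam` pulls the weight closer to zero, trading a slightly worse fit on the sample for a simpler model.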

Reinforcement Learning – It aims to train a model (an agent) to find an optimum solution to a specific problem by making a sequence of decisions and learning from the rewards or penalties it receives.

Ruby – It is an open-source programming language primarily used for developing web apps.

S

Scikit-learn – It is a library for Python programmers that contains tools for machine learning and statistical modelling, such as classification, clustering, regression, and dimensionality reduction.

SQL – It is an acronym for Structured Query Language and is used to manage databases by performing tasks such as updating, retrieving, and maintaining data.

Standard Deviation – It measures how much the data varies around the mean.

Standard Error – It measures how much a sample mean varies from one sample to another; it is the standard deviation of the sampling distribution of the mean.
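The two quantities are easy to confuse, so here is a small sketch using Python's standard `statistics` module. The sample values are made up, and the standard error is computed with the usual formula s / sqrt(n).

```python
import math
import statistics

sample = [4, 8, 6, 5, 7]                 # made-up measurements

sd = statistics.stdev(sample)            # sample standard deviation (n - 1 in the denominator)
se = sd / math.sqrt(len(sample))         # standard error of the mean

print(sd, se)
```

The standard error is always smaller than the standard deviation for n > 1, and it shrinks as the sample grows.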

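Stochastic Gradient Descent – It is an optimization algorithm whose goal is to minimize the Cost Function by incrementally changing the weights of the network, using the gradient computed on one sample (or a small batch) at a time.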
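A minimal sketch of the idea, assuming a one-weight model y = w * x and a squared-error cost. The function name `sgd_fit`, the learning rate, and the data are illustrative choices, not taken from any particular library.

```python
import random

def sgd_fit(data, lr=0.01, epochs=200, seed=0):
    """Fit y = w * x by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)                    # visit the samples in random order
        for x, y in data:
            grad = 2 * (w * x - y) * x       # d/dw of (w*x - y)**2 for one sample
            w -= lr * grad                   # small step against the gradient
    return w

# Made-up data generated by y = 2x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = sgd_fit(data)
print(w)
```

Each update looks at a single sample, which is what makes the method "stochastic"; with this noise-free data the weight settles very close to 2.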

Supervised Learning – It is a type of learning in which an algorithm learns from a labelled dataset, using the sample data to map inputs to known outputs.

Support Vector Machine – It is a supervised learning model which creates a line or a hyperplane that divides the data into classes.

T

T-Distribution – It is a probability distribution that describes the standardized distances of sample means to the population mean. It is similar to the normal distribution but has heavier tails, especially for small samples.

T-Value – It is the ratio of the difference between groups to the variation within the groups, where a big T-Value means distinct groups, and a small T-Value means similar groups.
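To make the T-Value concrete, the sketch below computes one common form of it, Welch's t-statistic for two independent groups. Choosing Welch's form is an assumption made here for illustration; the helper name `welch_t` and the groups are made up.

```python
import math
import statistics

def welch_t(a, b):
    """Difference in means divided by its estimated standard error (Welch's t)."""
    mean_diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return mean_diff / se

group_a = [5.1, 4.9, 5.0, 5.2, 4.8]
group_b = [5.0, 5.1, 4.9, 5.0, 5.1]   # close to group_a
group_c = [7.0, 7.2, 6.9, 7.1, 6.8]   # far from group_a

t_similar = welch_t(group_a, group_b)
t_distinct = welch_t(group_a, group_c)
print(t_similar, t_distinct)
```

The magnitude of the value is small for the two overlapping groups and large for the two well-separated ones.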

TensorFlow – It is an open-source software library for deep learning applications which makes it easier to build large-scale neural networks with many layers using data flow graphs.

Tokenization – It is the process of splitting a text string into units called tokens and is a part of NLP (Natural Language Processing).
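A very simple tokenizer can be sketched with a regular expression. This is only one possible scheme (lowercased words plus single punctuation marks); the function name `tokenize` is illustrative, and real NLP libraries use more elaborate rules.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = tokenize("Data science is fun!")
print(tokens)  # ['data', 'science', 'is', 'fun', '!']
```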

Transfer Learning – It is a machine learning method where the knowledge a model obtains from one task is reused as a foundation for another task.

True Positive – When you predicted positive, and it is actually positive.

True Negative – When you predicted negative, and it is actually negative.
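The sketch below counts true positives and true negatives for a made-up set of binary predictions, along with their counterparts, false positives and false negatives. The helper name `confusion_counts` and the label encoding (1 = positive, 0 = negative) are assumptions for illustration.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["TP"] += 1      # predicted positive, actually positive
        elif p == 0 and a == 0:
            counts["TN"] += 1      # predicted negative, actually negative
        elif p == 1 and a == 0:
            counts["FP"] += 1      # predicted positive, actually negative
        else:
            counts["FN"] += 1      # predicted negative, actually positive
    return counts

actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```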

T-test – It is a statistical test used to compare two groups by determining whether the difference in their means is significant.

Type I error – It is the decision to reject the null hypothesis when it is actually true (a false positive).

Type II error – It is the decision to retain the null hypothesis when it is actually false (a false negative).

U

Underfitting – It is a modelling error in which a model is too simple to capture the pattern in the sample data, so it neither performs well on the sample set nor generalizes to fresh data.

Unsupervised Learning – It is the process where an ML model learns from unlabelled data, inferring hidden structures without being given the correct outputs.

V

Variance – It is used to measure the spread of a given set of numbers; it is the average of the squared deviations from the mean.

Vectors – They are ordered lists of numbers used to represent numeric characteristics, known as features, in a mathematical form.
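As a quick illustration with made-up numbers, Python's standard `statistics` module offers both the population variance (divide by n) and the sample variance (divide by n - 1):

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]            # made-up data with mean 5

pop_var = statistics.pvariance(values)       # population variance: divide by n
sample_var = statistics.variance(values)     # sample variance: divide by n - 1

print(pop_var, sample_var)
```

The sample variance is slightly larger because it divides the same sum of squared deviations by a smaller number.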

X

XGBoost – It is an open-source library that provides a regularizing gradient boosting framework for programming languages such as C++, Java, Python, R, etc.

Z

Z-test – It is a statistical test used to determine whether two population means are different, typically when the population variances are known and the samples are large.
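A minimal sketch of a two-sample Z-test, assuming the population standard deviations are known. The helper name `two_sample_z`, the samples, and the sigma values are made up; the samples are kept tiny for readability even though the test is normally applied to large ones.

```python
import math
from statistics import NormalDist, mean

def two_sample_z(a, b, sigma_a, sigma_b):
    """Return (z, two-sided p-value), assuming known population standard deviations."""
    se = math.sqrt(sigma_a ** 2 / len(a) + sigma_b ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails of the standard normal
    return z, p

sample_a = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0]
sample_b = [9.1, 9.4, 8.9, 9.2, 9.0, 9.3]
z, p = two_sample_z(sample_a, sample_b, sigma_a=0.3, sigma_b=0.3)
print(z, p)
```

A large |z| with a small p-value suggests the two population means differ; comparing a sample with itself gives z = 0 and p = 1.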
