Machine Learning Concepts


Advanced Entity Extraction
Advanced entity extraction, also known as entity recognition, is used to extract vital information from text for natural language processing (NLP). It is widely used to find, store, and sort textual content into predefined categories such as persons, locations, objects, organizations, and so on.
Thus, advanced entity extraction involves identifying proper names in text and classifying them into a set of predefined categories of interest.
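As a minimal illustration (not the platform's implementation), entity extraction can be sketched as a lookup against a small gazetteer of known names; the entries and categories below are purely illustrative assumptions:

```python
import re

# Toy gazetteer mapping known entity names to predefined categories.
# These entries are illustrative assumptions, not a real NER model.
GAZETTEER = {
    "Alice": "PERSON",
    "Acme Corp": "ORGANIZATION",
    "Paris": "LOCATION",
}

def extract_entities(text):
    """Return (entity, category) pairs found in the text."""
    found = []
    for entity, category in GAZETTEER.items():
        # \b anchors ensure we match whole names, not substrings.
        if re.search(r"\b" + re.escape(entity) + r"\b", text):
            found.append((entity, category))
    return found

entities = extract_entities("Alice joined Acme Corp after moving to Paris.")
```

Production systems replace the dictionary lookup with a statistical or neural sequence-labeling model, but the input/output shape, text in, labeled spans out, is the same.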
See also,
rubitext

Algorithm
An algorithm is a step-by-step procedure for solving logical and mathematical problems. It takes an input and produces a logical output.
An algorithm contains an ordered set of rules or instructions that determine how a task is carried out to achieve the expected outcome.
Algorithms can be written in ordinary language (for example, the steps of a recipe). In computing, however, they are expressed as pseudocode, flowcharts, or a programming language.
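A classic example of such an ordered set of rules is Euclid's algorithm, shown here in a programming language:

```python
def gcd(a, b):
    """Euclid's algorithm: a step-by-step procedure that takes two
    integers as input and outputs their greatest common divisor."""
    while b != 0:          # repeat until the remainder is zero
        a, b = b, a % b    # replace (a, b) with (b, a mod b)
    return a

result = gcd(48, 18)  # 6
```

The same procedure could equally be written as a flowchart or in plain language; the defining property is the ordered, unambiguous steps, not the notation.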
See also,
Machine Learning Algorithms

ARIMA
ARIMA stands for Autoregressive Integrated Moving Average. An ARIMA model is fitted to time-series data either to understand the data better or to predict future points in the series from its own past values.
ARIMA assumes that the information contained in the past values of a time series is sufficient to predict its future values. It is therefore a univariate time-series forecasting model.
ARIMA is made up of three components: AR (the number of past values used to forecast the next value), I (the number of times the differencing operation is applied to the series to make it stationary), and MA (the number of past forecast errors used to forecast the next value). ARIMA models are sometimes applied where the data shows evidence of non-stationarity; in that case, an initial differencing step can be applied one or more times to eliminate the non-stationarity.
ARIMA models are used for short-term, non-seasonal forecasting and typically require a minimum of 40 historical data points.
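The "I" and "AR" components above can be sketched in a few lines of pure Python; this is a conceptual illustration under simplifying assumptions (an AR(1) fit by least squares with no MA term or intercept), not a full ARIMA implementation:

```python
def difference(series, d=1):
    """The 'I' component: difference the series d times to remove trend
    and help make it stationary."""
    for _ in range(d):
        series = [series[i] - series[i - 1] for i in range(1, len(series))]
    return series

def ar1_forecast(series):
    """A minimal 'AR' component: fit x_t ~ phi * x_{t-1} by least
    squares and forecast one step ahead. A full ARIMA would also model
    the moving-average (MA) structure of the forecast errors."""
    x_prev, x_next = series[:-1], series[1:]
    phi = sum(p * n for p, n in zip(x_prev, x_next)) / sum(p * p for p in x_prev)
    return phi * series[-1]

diffed = difference([1, 3, 6, 10])       # [2, 3, 4]
forecast = ar1_forecast([1, 2, 4, 8])    # phi = 2.0, forecast = 16.0
```

In practice one would use an established library rather than hand-rolled code, but the two steps shown, differencing the series and regressing on its own past values, are exactly what the I and AR orders of an ARIMA(p, d, q) model control.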
See also,

Auto ARIMA

rubicast

Auto ARIMA
In ARIMA, the data preparation and parameter tuning required before the forecasting model is implemented are complex and time-consuming: the series must be made stationary, and the AR (Autoregressive) and MA (Moving Average) orders must be determined, before the model can actually be fitted.
Auto ARIMA makes implementing the ARIMA model easier by performing these data preparation and parameter tuning operations automatically. It makes the series stationary and determines the values of the three ARIMA coefficients by examining Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots.
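The ACF values that such procedures plot can be computed directly; this sketch shows the sample autocorrelation at each lag, which is the quantity Auto ARIMA-style tools inspect when choosing the MA order (with the PACF playing the analogous role for the AR order):

```python
def acf(series, max_lag):
    """Sample autocorrelation function: correlation of the series with
    a lagged copy of itself, for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    out = []
    for lag in range(1, max_lag + 1):
        cov = sum((series[t] - mean) * (series[t - lag] - mean)
                  for t in range(lag, n))
        out.append(cov / var)
    return out

# An alternating series is strongly anti-correlated at lag 1 and
# strongly correlated at lag 2.
r = acf([1, -1] * 5, 2)  # approximately [-0.9, 0.8]
```

Libraries that implement Auto ARIMA search over candidate (p, d, q) orders using diagnostics like these plus an information criterion; the function above only illustrates the statistic being plotted.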
See also,

ARIMA

rubicast

Basic Sentiment Analysis
Basic sentiment analysis is the mining of textual data to extract subjective information from the source. It helps businesses understand the social sentiment around their product, service, or brand by monitoring online conversations.
In this analysis, the algorithm treats the text as a bag of words (BoW), ignoring word order and context. The original text is filtered down to only those words that are thought to carry sentiment, and the algorithm counts the occurrences of the most frequent of these words.
Sentiment analysis models focus on polarity: they detect whether the text expresses a positive, negative, or neutral sentiment. Beyond polarity, these models can also detect emotions and feelings (for example, happy, sad, disappointed, or angry) and intentions (for example, interested or not interested).

Table: Sentiment Scores and their Meaning

| Sentiment | Sentiment Score | Remark                                                              |
| Positive  | 0.01 to 1       | 0.01 is the weakest sentiment score, while 1 is the strongest.      |
| Negative  | -0.01 to -1     | -0.01 is the weakest sentiment score, while -1 is the strongest.    |
| Neutral   | 0               | Neutral statement                                                   |
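A bag-of-words polarity score of this kind can be sketched with a small lexicon; the words and weights below are illustrative assumptions, not the scores any particular product uses:

```python
# Toy sentiment lexicon; words and weights are illustrative assumptions.
LEXICON = {"great": 0.8, "good": 0.5, "bad": -0.5, "terrible": -0.9}

def sentiment_score(text):
    """Bag-of-words scoring: ignore word order, keep only the
    sentiment-bearing words found in the lexicon, and average their
    weights into a polarity score in [-1, 1]."""
    words = [w for w in text.lower().split() if w in LEXICON]
    if not words:
        return 0.0  # no sentiment-bearing words -> neutral
    return sum(LEXICON[w] for w in words) / len(words)

score = sentiment_score("the service was good but the food was terrible")
# (0.5 - 0.9) / 2 = -0.2, a mildly negative sentiment
```

Note how the model, true to the bag-of-words assumption, ignores "but" entirely: only the filtered sentiment words contribute to the score.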

See also,

Sentiment

rubitext

Case Convertor
A case convertor changes the case of letters in a text from upper case to lower case or vice versa.
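As a minimal sketch (the function name and interface here are assumptions for illustration, not the platform's API), case conversion is a one-line text-preprocessing step:

```python
def convert_case(text, to="lower"):
    """Convert text to lower or upper case, a common preprocessing
    step applied before tokenization so that 'Great' and 'great'
    are treated as the same word."""
    return text.lower() if to == "lower" else text.upper()

lowered = convert_case("Hello NLP")            # "hello nlp"
uppered = convert_case("Hello NLP", to="upper")  # "HELLO NLP"
```

Normalizing case this way shrinks the vocabulary a downstream model must handle, at the cost of losing casing cues (for example, acronyms).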
See also,

Custom Words Remover

Frequent Words Remover

rubitext

Centroid-based Clustering
Centroid-based clustering is a method in which each cluster is represented by a central vector, which need not itself be a member of the dataset. A data value is assigned to a cluster based on its proximity, such that its squared distance from the central vector is minimized.
The k-means algorithm is the most widely used centroid-based clustering algorithm. It divides the dataset into k predefined, distinct, non-overlapping clusters. Each data point is assigned to a cluster so that the sum of squared distances between the points and their cluster's centroid is minimized. Minimal variation within a cluster ensures greater homogeneity of the data points in that cluster.
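The k-means procedure alternates two steps, assignment and centroid update, which can be sketched in pure Python for one-dimensional data (a simplifying assumption; real implementations handle arbitrary dimensions):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means sketch: alternate between assigning each point
    to its nearest centroid and recomputing each centroid as the mean
    of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from the data
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

centers = kmeans([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], k=2)
# converges to centroids near 1.0 and 10.0
```

The two groups of points around 1 and 10 are recovered regardless of which points were chosen as the initial centroids, illustrating how minimizing within-cluster squared distance drives the centroids toward the cluster means.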
See also,

Data Clustering

Connectivity-based Clustering

Density-based Clustering

Incremental Learning

Connectivity-based Clustering
Connectivity-based clustering is also called hierarchical clustering because it builds clusters in a hierarchy. In this approach, data points that are closer to each other are considered more similar than those that are farther apart.
The algorithm starts by assigning each data point to a cluster of its own. The two nearest clusters are then merged into a single cluster, and in the end the algorithm terminates with only one cluster remaining.
There are two approaches to this model. In the first (agglomerative) approach, data points start in separate clusters, which are merged as the distance between them decreases. In the second (divisive) approach, all data points start in a single large cluster, which is split as the distance between them increases.
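The agglomerative approach can be sketched for one-dimensional data (a simplifying assumption), here with single linkage, i.e., measuring cluster distance by the closest pair of members:

```python
def agglomerative(points, target_clusters=1):
    """Agglomerative (connectivity-based) clustering sketch: start
    with each point in its own cluster, then repeatedly merge the two
    closest clusters until target_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge nearest pair
        del clusters[j]
    return clusters

groups = agglomerative([1, 2, 9, 10], target_clusters=2)
# merges (1, 2) and (9, 10) before it would ever bridge the gap
```

Stopping the merging early (here at two clusters) corresponds to cutting the hierarchy at a chosen level; letting it run to completion yields the single all-encompassing cluster described above.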