
MACHINE INTELLIGENCE IN DATA SCIENCE

UNIT 1 : INTRODUCTION TO DATA SCIENCE


What Is Data Science?

Data science is the domain of study that deals with vast volumes of data, using modern tools and techniques to find unseen patterns, derive meaningful information, and support business decisions. Data science uses complex machine learning algorithms to build predictive models. The data used for analysis can come from many different sources and be presented in various formats.

Data Science Lifecycle


The Data Science Lifecycle revolves around the use of machine learning and different analytical strategies to produce insights and predictions from data in order to achieve a business objective.

The complete process includes a number of steps like data cleaning, preparation, modelling, and model evaluation. It is a lengthy procedure and may take several months to complete. So, it is essential to have a generic structure to follow for every problem at hand.

Why do we need Data Science?

Earlier, data was much smaller in volume and generally available in a well-structured form that we could store easily in Excel sheets and process efficiently with Business Intelligence tools. Today, however, we deal with enormous amounts of data: about 3.0 quintillion bytes of records are produced every single day, which results in an explosion of data. According to recent research, it is estimated that a single individual creates about 1.9 MB of data every second.

So it is a very big challenge for any organization to deal with such a massive amount of data being generated every second. Handling and evaluating this data requires some very powerful, complex algorithms and technologies, and this is where data science comes into the picture.

 

1. Business Understanding: The complete cycle revolves around the business goal. What will you solve if you do not have a specific problem? It is extremely important to understand the business objective clearly, because that will be the ultimate aim of your analysis. Only after a proper understanding can we set a precise goal for the analysis that is in sync with the business objective. You need to understand whether the client wants to reduce losses, predict the price of a commodity, and so on.

2. Data Understanding: After business understanding, the next step is data understanding. This involves collecting all the available data. Here you need to work closely with the business team, as they know what data exists, which data could be used for this business problem, and other relevant details. This step includes describing the data, its structure, its relevance, and its data types. Explore the data using graphical plots; essentially, extract whatever information you can get about the data simply by exploring it.

3. Preparation of Data: Next comes the data preparation stage. This consists of steps like selecting the relevant data, integrating it by merging data sets, cleaning it, treating missing values by either removing or imputing them, removing inaccurate records, and checking for outliers using box plots and handling them. It also involves constructing new data by deriving new features from existing ones, formatting the data into the desired structure, and removing unwanted columns and features. Data preparation is the most time-consuming yet arguably the most important step in the complete life cycle. Your model will only be as good as your data.

4. Exploratory Data Analysis: This step involves getting some idea about the solution and the factors affecting it before building the actual model. The distribution of data within the different variables is explored graphically using bar graphs, and relations between different features are captured through graphical representations like scatter plots and heat maps. Many data visualization techniques are used extensively to explore every feature individually and in combination with other features.

5. Data Modeling: Data modeling is the heart of data analysis. A model takes the prepared data as input and produces the desired output. This step involves choosing the appropriate kind of model, depending on whether the problem is a classification, regression, or clustering problem. After choosing the model family, we need to carefully select and implement algorithms from within that family. We also need to tune the hyperparameters of each model to achieve the desired performance, and make sure there is a proper balance between performance and generalizability: we do not want the model to memorize the data and then perform poorly on new data.

6. Model Evaluation: Here the model is evaluated to check whether it is ready to be deployed. The model is tested on unseen data and evaluated on a carefully thought-out set of evaluation metrics. We also need to make sure the model conforms to reality. If we do not achieve a satisfactory result in the evaluation, we have to re-iterate the complete modelling process until the desired level of the metrics is achieved. Any data science solution, like a machine learning model, must evolve just as a human does: it must be able to improve itself with new data and adapt to a new evaluation metric. We can build multiple models for a given phenomenon, but many of them may be imperfect. Model evaluation helps us choose and build the best one.

7. Model Deployment: After rigorous evaluation, the model is finally deployed in the desired format and channel. This is the last step in the data science life cycle. Each step described above must be worked on carefully: if any step is performed improperly, it will affect the next step and the complete effort goes to waste. For example, if data is not collected properly, you will lose records and will not be able to build an ideal model. If data is not cleaned properly, the model will not work. If the model is not evaluated properly, it will fail in the real world. From business understanding to model deployment, every step should be given appropriate attention, time, and effort.


Relation between Data Science and Machine Learning

 

Data science and machine learning are closely related fields, and they often go hand in hand, with machine learning being a subset of data science. Here's an overview of their relationship:

  1. Data Science Overview:

    • Definition: Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract insights and knowledge from structured and unstructured data.
    • Components: Data science encompasses various components, including data cleaning, exploration, visualization, statistical analysis, and predictive modeling.
  2. Machine Learning Overview:

    • Definition: Machine learning is a subset of artificial intelligence that focuses on the development of algorithms that enable computers to learn patterns and make predictions or decisions without being explicitly programmed.
    • Components: Machine learning includes tasks such as supervised learning, unsupervised learning, reinforcement learning, and deep learning. It involves training models on data to recognize patterns and make predictions or decisions.
  3. Relationship between Data Science and Machine Learning:

    • Data as the Foundation: Data science relies on data to derive insights and solve problems. Machine learning, in turn, uses algorithms to learn from and make predictions or decisions based on that data.
    • Predictive Modeling: Machine learning is often a crucial component of data science, especially in predictive modeling. Data scientists use machine learning algorithms to build models that can predict future outcomes or classify data into different categories.
    • Tools and Techniques: Data scientists often use machine learning tools and techniques to analyze and interpret data. Machine learning algorithms help in identifying patterns, trends, and relationships within the data that may not be apparent through traditional statistical methods.
    • Iterative Process: Data science is often an iterative process where machine learning models are built, evaluated, and refined based on the results obtained. This iterative cycle contributes to the improvement of models and the overall data science process.
  4. Common Tasks in Data Science and Machine Learning:

    • Exploratory Data Analysis (EDA): Both data science and machine learning involve exploring and understanding the data through techniques such as visualization and statistical analysis.
    • Feature Engineering: In both fields, selecting and engineering relevant features from the data is crucial for building effective models.
    • Model Evaluation: Both data scientists and machine learning practitioners need to assess the performance of their models, ensuring they generalize well to new, unseen data.

In summary, while data science is a broader field that encompasses various techniques for extracting insights from data, machine learning is a specific set of techniques within data science that focuses on building models capable of learning and making predictions or decisions. Data science provides the foundation, and machine learning is a powerful tool within the data scientist's toolkit.

                    

Types Of Data – Nominal, Ordinal, Discrete and Continuous

The data is classified into four categories:

  • Nominal data.
  • Ordinal data.
  • Discrete data.
  • Continuous data.

Qualitative or Categorical Data

Qualitative or Categorical Data is data that can’t be measured or counted in the form of numbers. These types of data are sorted by category, not by number. That’s why it is also known as Categorical Data. These data consist of audio, images, symbols, or text. The gender of a person, i.e., male, female, or others, is qualitative data.

Qualitative data tells about the perception of people. This data helps market researchers understand the customers’ tastes and then design their ideas and strategies accordingly. 

The other examples of qualitative data are :

  • What language do you speak
  • Favorite holiday destination
  • Opinion on something (agree, disagree, or neutral)
  • Colors

Qualitative data is further classified into two parts:

Nominal Data

Nominal Data is used to label variables without any order or quantitative value. The color of hair can be considered nominal data, as one color can’t be compared with another color.

The name “nominal” comes from the Latin word “nomen,” which means “name.” With nominal data we cannot perform any numerical tasks or impose any order to sort the data. These data don’t have any meaningful order; their values are distributed into distinct categories.

Examples of Nominal Data :

  • Color of hair (Blonde, red, Brown, Black, etc.)
  • Marital status (Single, Widowed, Married)
  • Nationality (Indian, German, American)
  • Gender (Male, Female, Others)
  • Eye Color (Black, Brown, etc.)

Ordinal Data

Ordinal data have a natural ordering in which the values sit in some kind of order by their position on a scale. These data are used for observations like customer satisfaction, happiness, etc., but we can’t perform arithmetic operations on them.

Ordinal data is qualitative data whose values have some kind of relative position. It can be considered “in-between” qualitative and quantitative data. Ordinal data only shows sequence and cannot be used for most statistical analysis. Compared to nominal data, ordinal data have an order that nominal data lack.

Examples of Ordinal Data :

  • When companies ask for feedback, experience, or satisfaction on a scale of 1 to 10
  • Letter grades in the exam (A, B, C, D, etc.)
  • Ranking of people in a competition (First, Second, Third, etc.)
  • Economic Status (High, Medium, and Low)
  • Education Level (Higher, Secondary, Primary)

Difference between Nominal and Ordinal Data

  • Nominal data can’t be quantified and has no intrinsic ordering; ordinal data gives some kind of sequential order by position on a scale.
  • Nominal data is qualitative (categorical) data; ordinal data is said to be “in-between” qualitative and quantitative data.
  • Nominal data provides no quantitative value and supports no arithmetic operations; ordinal data provides a sequence and can be assigned numbers, but still does not support arithmetic operations.
  • Nominal values cannot be compared with one another; ordinal values can be compared by ranking or ordering.
  • Examples of nominal data: eye color, housing style, gender, hair color, religion, marital status, ethnicity, etc. Examples of ordinal data: economic status, customer satisfaction, education level, letter grades, etc.

Quantitative Data

Quantitative data can be expressed in numerical values, which makes it countable and suitable for statistical analysis. This kind of data is also known as numerical data. It answers questions like “how much,” “how many,” and “how often.” For example, the price of a phone, a computer’s RAM, or the height or weight of a person all fall under quantitative data.

Quantitative data can be used for statistical manipulation. These data can be represented on a wide variety of graphs and charts, such as bar graphs, histograms, scatter plots, boxplots, pie charts, line graphs, etc.

Examples of Quantitative Data : 

  • Height or weight of a person or object
  • Room Temperature
  • Scores and Marks (Ex: 59, 80, 60, etc.)
  • Time

Quantitative data is further classified into two parts:

Discrete Data

The term discrete means distinct or separate. Discrete data contains values that are integers or whole numbers; the total number of students in a class is an example of discrete data. These data can’t be broken into decimal or fractional values.

The discrete data are countable and have finite values; their subdivision is not possible. These data are represented mainly by a bar graph, number line, or frequency table.

Examples of Discrete Data : 

  • Total numbers of students present in a class
  • Cost of a cell phone
  • Numbers of employees in a company
  • The total number of players who participated in a competition
  • Days in a week

Continuous Data

Continuous data take fractional values. It can be the version of an Android phone, the height of a person, the length of an object, etc. Continuous data represents information that can be divided into ever smaller levels; a continuous variable can take any value within a range.

The key difference between discrete and continuous data is that discrete data contains integers or whole numbers, while continuous data stores fractional numbers, used to record quantities such as temperature, height, width, time, and speed.

Examples of Continuous Data : 

  • Height of a person
  • Speed of a vehicle
  • “Time-taken” to finish the work 
  • Wi-Fi Frequency
  • Market share price

Difference between Discrete and Continuous Data

  • Discrete data are countable and finite; they are whole numbers or integers. Continuous data are measurable; they take fractional or decimal values.
  • Discrete data are represented mainly by bar graphs; continuous data are represented by histograms.
  • Discrete values cannot be subdivided into smaller pieces; continuous values can.
  • Discrete data have gaps between values; continuous data form a continuous sequence.
  • Examples of discrete data: total students in a class, number of days in a week, shoe size. Examples of continuous data: temperature of a room, weight of a person, length of an object.
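As a rough sketch, the four categories map naturally onto plain Python constructs; the example values and names below are invented for illustration.

```python
from enum import IntEnum

# Nominal: categories with no order -- a plain set is enough
hair_colors = {"blonde", "red", "brown", "black"}

# Ordinal: categories with a meaningful order but no arithmetic meaning
class Education(IntEnum):
    PRIMARY = 1
    SECONDARY = 2
    HIGHER = 3

# Discrete: a countable whole number
students_in_class = 32

# Continuous: any value within a range
height_cm = 172.4

# Ordinal values can be compared by rank...
assert Education.PRIMARY < Education.HIGHER
# ...while nominal values can only be tested for equality or membership
assert "brown" in hair_colors
```

Note that the ordinal ranks support comparison but not meaningful arithmetic: `Education.PRIMARY + Education.SECONDARY` has no interpretation, which mirrors the distinction drawn above.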







UNIT 2 : STATISTICS AND PROBABILITY BASICS FOR DATA ANALYSIS

STATISTICS:


Mean, Median and Mode

Mean

In mathematics and statistics, the mean is the average of the numerical observations which is equal to the sum of the observations divided by the number of observations.

A = (a_1 + a_2 + ⋯ + a_n) / n = (1/n) ∑ a_i

where,

A=arithmetic mean
n=number of values
a_i=data set values

Median

The median of the data, when arranged in ascending or descending order, is the middle observation of the data, i.e. the point separating the higher half from the lower half of the data.

To calculate the median:

  • Arrange the data in ascending or descending order.
  • For an odd number of data points, the middle value is the median.
  • For an even number of data points, the average of the two middle values is the median.

median = X[(n+1)/2] if n is odd, and median = ( X[n/2] + X[n/2 + 1] ) / 2 if n is even

X=an ordered list of values in the data set
n=number of values in data set

Mode

The mode of a set of data points is the most frequently occurring value.

For example:

5,2,6,5,1,1,2,5,3,8,5,9,5 are the set of data points. Here 5 is the mode because it’s occurring most frequently.
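These three measures can be computed with Python's standard statistics module; here it is applied to the data points from the mode example above:

```python
import statistics

data = [5, 2, 6, 5, 1, 1, 2, 5, 3, 8, 5, 9, 5]

print(statistics.mean(data))    # sum is 57, 13 values -> 57/13 ≈ 4.38
print(statistics.median(data))  # middle of the sorted list -> 5
print(statistics.mode(data))    # most frequent value -> 5
```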

Variance and Standard Deviation

Variance

Mathematically and statistically, variance is defined as the average of the squared differences from the mean. But for understanding, this depicts how spread out the data is in a dataset.

The steps of calculating variance using an example:

Let’s find the variance of (1,4,5,4,8)

  1. Find the mean of the data points i.e. (1 + 4 + 5 + 4 + 8)/5 = 4.4
  2. Find the differences from the mean i.e. (-3.4, -0.4, 0.6, -0.4, 3.6)
  3. Find the squared differences i.e. (11.56, 0.16, 0.36, 0.16, 12.96)
  4. Find the average of the squared differences i.e. (11.56 + 0.16 + 0.36 + 0.16 + 12.96)/5 = 25.2/5 = 5.04

The formula for the same is:

σ² = (1/N) ∑ (x_i − μ)²
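The four steps above can be reproduced in a few lines of Python, and the result checked against statistics.pvariance from the standard library:

```python
import statistics

data = [1, 4, 5, 4, 8]

# Step 1: mean of the data points
mean = sum(data) / len(data)             # 4.4
# Steps 2-3: squared differences from the mean
sq_diffs = [(x - mean) ** 2 for x in data]
# Step 4: average of the squared differences = population variance
variance = sum(sq_diffs) / len(data)     # 5.04

# statistics.pvariance computes the same population variance
assert abs(variance - statistics.pvariance(data)) < 1e-9
print(variance)
```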

Standard Deviation

Standard deviation measures the variation or dispersion of the data points in a dataset. It depicts the closeness of the data point to the mean and is calculated as the square root of the variance.

In data science, the standard deviation is often used to identify outliers in a data set: data points that lie several standard deviations (commonly three or more) away from the mean are considered unusual.

The formula for standard deviation is:

σ = √( (1/N) ∑ (x_i − μ)² )
sigma=population standard deviation
N=the size of the population
x_i=each value from the population
mu=the population mean

Population Data V/s Sample Data

Population data refers to the complete data set whereas sample data refers to a part of the population data which is used for analysis. Sampling is done to make analysis easier.

When using sample data for analysis, the formula of variance is slightly different. If there are total n samples we divide by n-1 instead of n:

S² = (1/(n−1)) ∑ (x_i − x̄)²

S^2=sample variance
x_i=the value of the one observation
bar{x}=the mean value of observations
n=the number of observations
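A short sketch contrasting the population formulas (divide by N) with the sample formulas (divide by n − 1), reusing the five data points from the variance example:

```python
import statistics

sample = [1, 4, 5, 4, 8]

# Population formulas divide by N
print(statistics.pvariance(sample))  # 5.04
print(statistics.pstdev(sample))     # sqrt(5.04)

# Sample formulas divide by n - 1 (Bessel's correction)
print(statistics.variance(sample))   # 25.2 / 4 = 6.3
print(statistics.stdev(sample))      # sqrt(6.3)
```

The sample estimates come out larger because dividing by n − 1 compensates for the fact that a sample tends to understate the spread of the full population.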

Correlation

Correlation measures the relationship between two variables.

A predictive function converts an input (x) into an output (f(x)); in other words, it uses the relationship between two variables to make predictions.


Correlation Coefficient

The correlation coefficient is a value that indicates the strength of the relationship between variables. The coefficient can take any values from -1 to 1. The interpretations of the values are:

  • -1: Perfect negative correlation. The variables tend to move in opposite directions (i.e., when one variable increases, the other variable decreases).
  • 0: No correlation. The variables do not have a relationship with each other.
  • 1: Perfect positive correlation. The variables tend to move in the same direction (i.e., when one variable increases, the other variable also increases).

One of the primary applications of the concept in finance is portfolio management. A thorough understanding of this statistical concept is essential to successful portfolio optimization.

Correlation and Causation

Correlation must not be confused with causality. The famous expression “correlation does not mean causation” is crucial to the understanding of the two statistical concepts.

If two variables are correlated, it does not imply that one variable causes the changes in another variable. Correlation only assesses relationships between variables, and there may be different factors that lead to the relationships. Causation may be a reason for the correlation, but it is not the only possible explanation.

How to Find the Correlation?

The correlation coefficient that indicates the strength of the relationship between two variables can be found using the following formula:

r_xy = ∑(x_i − x̄)(y_i − ȳ) / √( ∑(x_i − x̄)² · ∑(y_i − ȳ)² )

Where:

  • r_xy – the correlation coefficient of the linear relationship between the variables x and y
  • x_i – the values of the x-variable in a sample
  • x̄ – the mean of the values of the x-variable
  • y_i – the values of the y-variable in a sample
  • ȳ – the mean of the values of the y-variable

In order to calculate the correlation coefficient using the formula above, you must undertake the following steps:

  1. Obtain a data sample with the values of x-variable and y-variable.
  2. Calculate the means (averages) x̄ for the x-variable and ȳ for the y-variable.
  3. For the x-variable, subtract the mean from each value of the x-variable (let’s call this new variable “a”). Do the same for the y-variable (let’s call this variable “b”).
  4. Multiply each a-value by the corresponding b-value and find the sum of these multiplications (the final value is the numerator in the formula).
  5. Square each a-value and calculate the sum of the results; do the same for the b-values.
  6. Multiply the two sums from step 5 and take the square root of the product (this is the denominator in the formula).
  7. Divide the value obtained in step 4 by the value obtained in step 6.

You can see that the manual calculation of the correlation coefficient is an extremely tedious process, especially if the data sample is large. However, there are many software tools that can help you save time when calculating the coefficient. The CORREL function in Excel is one of the easiest ways to quickly calculate the correlation between two variables for a large data set.
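The seven manual steps can be sketched as a small Python function; the data in the usage line is invented to show a perfectly linear relationship:

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient, following the seven manual steps above."""
    n = len(xs)
    x_bar = sum(xs) / n                       # Step 2: means
    y_bar = sum(ys) / n
    a = [x - x_bar for x in xs]               # Step 3: deviations from the mean
    b = [y - y_bar for y in ys]
    numerator = sum(ai * bi for ai, bi in zip(a, b))  # Step 4
    sum_a2 = sum(ai ** 2 for ai in a)         # Step 5: sums of squares
    sum_b2 = sum(bi ** 2 for bi in b)
    denominator = math.sqrt(sum_a2 * sum_b2)  # Step 6
    return numerator / denominator            # Step 7

# A perfectly linear relationship gives r = 1.0
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```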

Example of Correlation

John is an investor. His portfolio primarily tracks the performance of the S&P 500 and John wants to add the stock of Apple Inc. Before adding Apple to his portfolio, he wants to assess the correlation between the stock and the S&P 500 to ensure that adding the stock won’t increase the systematic risk of his portfolio. To find the coefficient, John gathers the following prices for the last five years (Step 1):

[Table 1: S&P 500 and Apple Inc. closing prices for the last five years]

Using the formula above, John can determine the correlation between the prices of the S&P 500 Index and Apple Inc.

First, John calculates the average prices of each security for the given periods (Step 2):

[Table 2: average prices of each security over the given periods]

After the calculation of the average prices, we can find the other values. A summary of the calculations is given in the table below:

[Table 3: summary of the intermediate calculations]

Using the obtained numbers, John can calculate the coefficient:

[Sample calculation of the correlation coefficient using the values above]

The coefficient indicates that the prices of the S&P 500 and Apple Inc. have a high positive correlation. This means that their respective prices tend to move in the same direction. Therefore, adding Apple to his portfolio would, in fact, increase the level of systematic risk.



PROBABILITY:


What is Probability?

The concept of probability is extremely simple: it measures how likely an event is to occur, i.e. the chance of the occurrence of an event.

The formula for probability is:

P(E) = (number of favorable outcomes) / (total number of possible outcomes)

For example:

The probability of the coin showing heads when it’s flipped is 0.5.
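This ratio-of-outcomes definition translates directly into code; using exact fractions avoids floating-point noise (the events below are illustrative):

```python
from fractions import Fraction

def probability(favorable, total):
    # P(E) = favorable outcomes / total outcomes
    return Fraction(favorable, total)

# A fair coin showing heads: 1 favorable outcome out of 2
print(probability(1, 2))   # 1/2
# A fair die showing an even number: 3 favorable outcomes out of 6
print(probability(3, 6))   # 1/2
```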

Conditional Probability

Conditional probability is the probability of an event occurring provided another event has already occurred.

The formula of conditional probability:

P(B|A) = P(A ∩ B) / P(A)

For example:

The students of a class have taken two mathematics tests. 60% of the students passed the first test, while only 40% of the students passed both tests. What percentage of the students who passed the first test also cleared the second test?

P(second | first) = P(first and second) / P(first) = 0.40 / 0.60 ≈ 0.67, i.e. about 67% of the students who passed the first test also cleared the second.
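The same calculation in code, using the 60% and 40% figures from the example (the variable names are mine):

```python
# P(second | first) = P(first and second) / P(first)
p_first = 0.60   # fraction of students who passed the first test
p_both = 0.40    # fraction of students who passed both tests

p_second_given_first = p_both / p_first
print(round(p_second_given_first, 2))  # 0.67 -> about 67%
```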

Bayes’ Theorem

Bayes’ Theorem is a very important statistical concept used in many industries such as healthcare and finance. The formula of conditional probability which we have done above has also been derived from this theorem.

It is used to update the probability of a hypothesis based on new evidence and the prior probabilities involved.

The formula for Bayes’ theorem is:

P(A|B) = P(B|A) · P(A) / P(B)
A, B=events
P(A|B)=probability of A given B is true
P(B|A)=probability of B given A is true
P(A), P(B)=the independent probabilities of A and B

For example:

Let’s assume there is an HIV test that correctly identifies HIV-positive patients 99% of the time, and also correctly returns a negative result for 99% of HIV-negative people. Only 0.3% of the overall population is HIV-positive.

P(HIV+ | positive test) = (0.99 × 0.003) / (0.99 × 0.003 + 0.01 × 0.997) = 0.00297 / 0.01294 ≈ 0.23

So even with a 99% accurate test, only about 23% of people who test positive are actually HIV-positive, because the condition is so rare.
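A sketch of the same computation via the law of total probability; the names sensitivity and specificity are my labels for the two 99% figures:

```python
# Bayes' theorem applied to the HIV test example above
p_pos = 0.003        # prior: P(HIV+) in the overall population
sensitivity = 0.99   # P(test positive | HIV+)
specificity = 0.99   # P(test negative | HIV-)

# Total probability of a positive test (true positives + false positives)
p_test_pos = sensitivity * p_pos + (1 - specificity) * (1 - p_pos)

# Posterior: P(HIV+ | test positive)
posterior = sensitivity * p_pos / p_test_pos
print(round(posterior, 3))  # ≈ 0.23
```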



Dependent and Independent Events


Dependent and independent events are types of events that occur in probability. Suppose we have two events, Event A and Event B. If A and B are dependent events, the occurrence of one event depends on the occurrence of the other; if they are independent events, the occurrence of one event does not affect the probability of the other.

We can understand dependent and independent events with examples. If we toss two coins simultaneously, the outcome of one coin does not affect the outcome of the other, so they are independent events. Now suppose we run an experiment in which we toss a coin only when we get a six on a throw of a die; here the outcome of one event is affected by the other, so they are dependent events.


Dependent Events

Dependent events are events that are affected by the outcomes of events that have already occurred; that is, two or more events that depend on one another are known as dependent events. If whether one event occurs affects the probability that the other event will occur, then the two events are said to be dependent.

When the occurrence of one event affects the occurrence of another subsequent event, the two events are dependent events. The concept of dependent events gives rise to the concept of conditional probability which will be discussed in the article further.

Examples of Dependent Events

For example, suppose three cards are to be drawn from a pack of cards without replacement. The probability of getting a king is 4/52 on the first draw; if a king is drawn first, the probability of getting a king on the second draw falls to 3/51.

In the draw of the third card, the probability again depends on the outcomes of the previous two cards. After each card is drawn there are fewer cards left in the deck, so the probabilities change with every draw.


Independent Events

Independent events are those events whose occurrence is not dependent on any other event. If the probability of occurrence of an event A is not affected by the occurrence of another event B, then A and B are said to be independent events.

Examples of Independent Events

  • Tossing a Coin

Sample Space(S) in a Coin Toss = {H, T}

Both getting H and T are Independent Events

  • Rolling a Die

 Sample Space(S) in Rolling a Die = {1, 2, 3, 4, 5, 6}, all of these events are independent too.

Both of the above examples are simple events. Even compound events can be independent events. For example:

  • Tossing a Coin and Rolling a Die

If we simultaneously toss a coin and roll a die, all twelve outcomes are equally likely, and the coin outcome and the die outcome are independent events.

Sample Space(S) of such experiment = {(1, H), (2, H), (3, H), (4, H), (5, H), (6, H), (1, T), (2, T), (3, T), (4, T) (5, T) (6, T)}.

These events are independent because the outcome of the coin toss does not affect the probabilities of the die roll, and vice versa.

Note

  • A and B are two events associated with the same random experiment, then A and B are known as independent events if

P(A ∩ B) = P(A) · P(B)
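This definition can be checked by enumerating a small sample space, for example a coin toss combined with a die roll (a minimal sketch, not a general library):

```python
from fractions import Fraction
from itertools import product

# Full sample space of tossing a coin and rolling a die: 12 equally likely outcomes
space = list(product("HT", range(1, 7)))

def prob(event):
    # Probability as favorable outcomes / total outcomes, as an exact fraction
    return Fraction(sum(1 for o in space if event(o)), len(space))

def heads(o): return o[0] == "H"
def six(o):   return o[1] == 6

# Independence: P(A and B) = P(A) * P(B)
joint = prob(lambda o: heads(o) and six(o))
assert joint == prob(heads) * prob(six)
print(joint)  # 1/12
```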

Difference Between Independent Events and Dependent Events

The difference between independent events and dependent events is summarized below.

Independent Events:

  • Independent events are not affected by the occurrence of other events.
  • Formula: P(A and B) = P(A) × P(B)
  • Examples: the result of tossing one coin is not affected by the toss of another coin; rain on a given day and rolling a six on a die are independent events.

Dependent Events:

  • Dependent events are affected by the occurrence of other events.
  • Formula: P(A and B) = P(A) × P(B after A)
  • Example: the probability of drawing a red ball from a box of 4 red balls and 3 green balls changes once two balls have been taken out of the box.

 


Mutually Exclusive Events

Two events A and B are said to be mutually exclusive if they cannot occur at the same time; mutually exclusive events never have an outcome in common. If the probability of event A is P(A) and the probability of event B is P(B), then the probability of both events happening together is:

P(A∩B) = 0

Then the probability that either one of the events occurs is:

P(A ∪ B) = P(A) + P(B)

Conditional Probability Formula

The conditional probability formula gives the probability of an event given that another event has already occurred. If the probabilities of events A and B are P(A) and P(B) respectively, then the conditional probability of B given that A has already occurred is denoted P(B|A).

If P(A) > 0, then P(B|A) is calculated using the formula:

P(B|A) = P(A ∩ B) / P(A)

If P(A) = 0, then A is an impossible event, and in this case P(B|A) is not defined.


 Random Variables


 A random variable in statistics is a function that assigns a real value to an outcome in the sample space of a random experiment. For example: if you roll a die, you can assign a number to each possible outcome.

Random variables can have specific values or any value in a range.

There are two basic types of random variables,

  • Discrete Random Variables

  • Continuous Random Variables

A random variable is considered a discrete random variable when it takes specific, or distinct values within an interval. Conversely, if it takes a continuous range of values, then it is classified as a continuous random variable.


Example of a Random Variable

A typical example of a random variable is the outcome of a coin toss. Consider a probability distribution in which the outcomes of a random event are not equally likely to happen. If the random variable Y is the number of heads we get from tossing two coins, then Y could be 0, 1, or 2. This means that we could have no heads, one head, or both heads on a two-coin toss.

However, the two coins land in four different ways: TT, HT, TH, and HH. Therefore, the P(Y=0) = 1/4 since we have one chance of getting no heads (i.e., two tails [TT] when the coins are tossed). Similarly, the probability of getting two heads (HH) is also 1/4. Notice that getting one head has a likelihood of occurring twice: in HT and TH. In this case, P (Y=1) = 2/4 = 1/2.


Random variables are generally represented by capital letters like X and Y. This is explained by the example below:

Example

If two unbiased coins are tossed then find the random variable associated with that event.

Solution:

Suppose two unbiased coins are tossed.

X = number of heads. [X is a random variable or function]

Here, the sample space S = {HH, HT, TH, TT}

Random Variable Definition

We define a random variable as a function that maps from the sample space of an experiment to the real numbers. Mathematically, Random Variable is expressed as,

X: S →R

where,

  • X is Random Variable (It is usually denoted using capital letter)

  • S is Sample Space

  • R is Set of Real Numbers

Suppose a random variable X takes m different values, i.e., its set of possible values is

X = {x1, x2, x3………xm} with probabilities

P(X = xi) = pi

where 1 ≤ i ≤ m

The probabilities must satisfy the following conditions :

  • 0 ≤ pi ≤ 1; where 1 ≤ i ≤ m

  • p1 + p2 + p3 + ……. + pm = 1 Or we can say 0 ≤ pi ≤ 1 and ∑pi = 1

Hence, returning to the two-coin example, the possible values for random variable X (the number of heads) are 0, 1, 2.

X = {0, 1, 2} where m = 3

  • P(X = 0) = (Probability that number of heads is 0) = P(TT) = 1/2×1/2 = 1/4

  • P(X = 1) = (Probability that number of heads is 1) = P(HT or TH) = 1/2×1/2 + 1/2×1/2 = 1/2

  • P(X = 2) = (Probability that number of heads is 2) = P(HH) = 1/2×1/2 = 1/4

Here, you can observe that each probability lies between 0 and 1 (0 ≤ p1, p2, p3 ≤ 1), and

p1 + p2 + p3 = 1/4 + 2/4 + 1/4 = 1
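The two-coin probabilities above can be reproduced by brute-force enumeration; the sketch below is illustrative, not part of the original example:

```python
from itertools import product
from fractions import Fraction

# Enumerate all equally likely outcomes of tossing two fair coins.
outcomes = list(product("HT", repeat=2))   # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

def X(outcome):
    """Random variable X = number of heads in the outcome."""
    return outcome.count("H")

# Build the probability mass function P(X = k).
pmf = {}
for o in outcomes:
    k = X(o)
    pmf[k] = pmf.get(k, Fraction(0)) + Fraction(1, len(outcomes))

print(sorted(pmf.items()))  # [(0, 1/4), (1, 1/2), (2, 1/4)]
```

The enumerated probabilities agree with p1 + p2 + p3 = 1/4 + 1/2 + 1/4 = 1.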

For example,

Suppose a dice is thrown (X = outcome of the dice). Here, the sample space S = {1, 2, 3, 4, 5, 6}. The output of the function will be:

  • P(X=1) = 1/6

  • P(X=2) = 1/6

  • P(X=3) = 1/6

  • P(X=4) = 1/6

  • P(X=5) = 1/6

  • P(X=6) = 1/6

Variate

A variate is a generalization of the concept of a random variable that is defined without reference to a particular type of probabilistic experiment.

It has the same properties as random variables and is denoted by capital letters (commonly X).

The possible values a random variable X can take are its range, denoted R_X. Individual values within this range are called quantiles, and the probability of X taking a specific value x is written as P(X=x).

Types of Random Variable

Random variables are of two types that are,

  • Discrete Random Variable

  • Continuous Random Variable

Discrete Random Variable

A discrete random variable can take only a finite or countably infinite number of distinct values such as 0, 1, 2, 3, 4, … and so on. The probability distribution of a discrete random variable is a list of probabilities associated with each of its possible values, known as the probability mass function.

In an analysis, let a person be chosen at random, and let the person’s height be modelled by a random variable. Logically, the random variable is a function which relates the person to the person’s height. Associated with the random variable is a probability distribution that enables the calculation of the probability that the height lies in any subset of likely values, such as the probability that the height is between 175 and 185 cm, or the probability that the height is either less than 145 cm or more than 180 cm. Another random variable could be the person’s age, which could, for example, be between 45 and 50 years, less than 40, or more than 50.

Continuous Random Variable

A numerically valued variable is said to be continuous if, between any two values a and b that it can take, it can also take every value in between. If the random variable X can assume an infinite and uncountable set of values, it is said to be a continuous random variable. When X takes any value in a given interval (a, b), it is said to be a continuous random variable in that interval.

Formally, a continuous random variable is one whose cumulative distribution function is continuous throughout. There are no “gaps” in between, which would correspond to numbers that have a positive probability of occurring. Such variables almost never take an exactly prescribed value c, but there is a positive probability that their value will lie in particular intervals, which can be arbitrarily small.

Random Variable Formula

For a given random variable, the mean and variance are calculated using the formulas below. Here we define two major formulas:

  • Mean of random variable

  • Variance of random variable

Mean of random variable: If X is the random variable and P is the respective probabilities, the mean of a random variable is defined by:

Mean (μ) = ∑ XP

where variable X consists of all possible values and P consists of respective probabilities.

Variance of Random Variable: The variance tells how much is the spread of random variable X around the mean value. The formula for the variance of a random variable is given by;

Var(X) = σ² = E(X²) – [E(X)]²

where E(X²) = ∑X²P and E(X) = ∑ XP
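A minimal sketch applying both formulas to the two-coin distribution from earlier (values 0, 1, 2 with probabilities 1/4, 1/2, 1/4):

```python
from fractions import Fraction

# pmf of X = number of heads in two fair coin tosses.
values = [0, 1, 2]
probs  = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

mean = sum(x * p for x, p in zip(values, probs))        # E(X) = ∑XP
e_x2 = sum(x**2 * p for x, p in zip(values, probs))     # E(X²) = ∑X²P
variance = e_x2 - mean**2                               # Var(X) = E(X²) − [E(X)]²

print(mean, variance)  # 1 and 1/2
```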

Functions of Random Variables

Let the random variable X assume the values x1, x2, …with corresponding probability P (x1), P (x2),… then the expected value of the random variable is given by:

Expectation of X, E (x) = ∑ x P (x).

A new random variable Y can be defined by applying a real Borel measurable function g: R → R to the outcomes of a real-valued random variable X. That is, Y = g(X). The cumulative distribution function of Y is then given by:

FY(y) = P(g(X)≤y)

If function g  is invertible (say h = g-1 )and is either increasing or decreasing, then the previous relationship can be extended to obtain:

FY(y) = FX(h(y)), if g is increasing; FY(y) = 1 – FX(h(y)), if g is decreasing

Now if we differentiate both the sides of the above expressions with respect to y, then the relation between the probability density functions can be found:

fY(y) = fx(h(y))|dh(y)/dy|
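As a numerical sanity check of this change-of-variables rule, the sketch below applies the hypothetical increasing transform g(x) = 2x + 3 to a standard normal X, so Y should have the density of a normal distribution with mean 3 and standard deviation 2:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and std dev sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Increasing transform g(x) = 2x + 3, with inverse h(y) = (y - 3)/2 and dh/dy = 1/2.
def f_Y(y):
    h = (y - 3) / 2
    return normal_pdf(h) * 0.5   # f_X(h(y)) * |dh(y)/dy|

# Y = 2X + 3 with X ~ N(0, 1) is N(3, 2²), so compare against that density directly.
for y in (-1.0, 3.0, 5.5):
    assert math.isclose(f_Y(y), normal_pdf(y, mu=3.0, sigma=2.0), rel_tol=1e-12)
print("change-of-variables check passed")
```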

Random Variable and Probability Distribution

The probability distribution of a random variable can be

  • Theoretical listing of outcomes and probabilities of the outcomes.

  • An experimental listing of outcomes associated with their observed relative frequencies.

  • A subjective listing of outcomes associated with their subjective probabilities.

The probability that a random variable X takes the value x defines the probability function of X, denoted by f(x) = P(X = x).

A probability distribution always satisfies two conditions:

  • f(x)≥0

  • ∑f(x)=1
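A small sketch of a validity check for these two conditions (the `is_valid_pmf` helper and its example inputs are hypothetical):

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Check the two conditions: f(x) >= 0 for all x, and ∑f(x) = 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
broken   = {0: Fraction(1, 2), 1: Fraction(1, 4)}   # probabilities sum to 3/4

print(is_valid_pmf(fair_die), is_valid_pmf(broken))  # True False
```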

The important probability distributions are:

  • Binomial distribution

  • Poisson distribution

  • Bernoulli’s distribution

  • Exponential distribution

  • Normal distribution


Continuous Probability Distributions

Continuous probability distribution: A probability distribution in which the random variable X can take on any value (is continuous). Because there are infinitely many values that X could assume, the probability of X taking on any one specific value is zero. Therefore we often speak in terms of ranges of values (e.g., P(X > 0) = 0.50). The normal distribution is one example of a continuous distribution. The probability that X falls between two values (a and b) equals the integral (area under the curve) from a to b:

P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
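The area interpretation can be sketched numerically: a trapezoidal-rule integral of the standard normal density over [−1, 1] should agree with the closed form obtained from the error function (the choice of interval is illustrative):

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_between(a, b, n=10_000):
    """P(a <= X <= b) as the area under the density, via the trapezoidal rule."""
    h = (b - a) / n
    area = 0.5 * (normal_pdf(a) + normal_pdf(b))
    for i in range(1, n):
        area += normal_pdf(a + i * h)
    return area * h

# Closed form: P(a <= X <= b) = (erf(b/√2) − erf(a/√2)) / 2 for a standard normal.
exact = 0.5 * (math.erf(1 / math.sqrt(2)) - math.erf(-1 / math.sqrt(2)))
print(round(prob_between(-1, 1), 6), round(exact, 6))  # both ≈ 0.682689
```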

 The Normal Probability Distribution

What is the normal distribution?

A normal distribution is a type of continuous probability distribution in which most data points cluster toward the middle of the range, while the rest taper off symmetrically toward either extreme. The middle of the range is also known as the mean of the distribution.

The normal distribution is also known as a Gaussian distribution or probability bell curve. It is symmetric about the mean and indicates that values near the mean occur more frequently than the values that are farther away from the mean.

Normal distribution explained

Graphically, a normal distribution is represented by a bell curve because of its flared shape. The precise shape can vary according to the distribution of the values within the population. The population is the entire set of data points that are part of the distribution.

Regardless of its exact shape, a normal distribution bell curve is always symmetrical about the mean. A symmetrical distribution means that a vertical dividing line drawn through the maximum/mean value will produce two mirror images on either side of the line, in which half the population is less than the mean and half is greater. However, the reverse is not always true; that is, not all symmetrical distributions are normal. In the bell curve, the peak is always in the middle, and the mean, mode and median are all the same.


A normal distribution bell curve is always symmetrical about the mean.

Basic examples of normal distribution: Height and weight

Height is one simple example of values that follow a normal distribution pattern. Most people are of average height -- whatever that may be for a given population. If the heights of these people are represented in graphical format along with the heights of people who are taller and shorter than the average, the distribution will be approximately normal. This is because the people of average height will be clustered near the middle, while those who are taller and shorter will be farther away.

Further, these latter groups will consist of very small numbers of people. The number of people who are extremely tall or extremely short will be even smaller, so they will be the farthest away from the mean.

Similarly, weight can also follow a normal distribution if the average weight of the population under consideration is known. Like height, the weight outliers will be those who weigh more or less than the average. The bigger the deviation from the average, the farther away those data points will be on the distribution graph.

Importance of normal distribution

The normal distribution is one of the most important probability distributions for independent random variables for three main reasons.

First, normal distribution describes the distribution of values for many natural phenomena in a wide range of areas, including biology, physical science, mathematics, finance and economics. It can also represent these random variables accurately.

In addition to height and weight, normal distributions are also used to represent many other values, including the following:

  • measurement error

  • blood pressure

  • IQ scores

  • asset prices

  • price action

Second, the normal distribution is important because it can be used to approximate other types of probability distribution, such as binomial, hypergeometric, inverse (or negative) hypergeometric, negative binomial and Poisson distribution.

Third, normal distribution is the key idea behind the central limit theorem, or CLT, which states that averages calculated from independent, identically distributed random variables have approximately normal distributions. This is true regardless of the type of distribution from which the variables are sampled, as long as it has finite variance.


Normal distribution formula and empirical rule

The formula for the normal distribution (its probability density function) is expressed below:

f(x) = (1/(σ√(2π))) e^(−(x − μ)² / (2σ²))

Here, x is value of the variable; f(x) represents the probability density function; μ (mu) is the mean; and σ (sigma) is the standard deviation.

The empirical rule for normal distributions describes where most of the data in a normal distribution will appear, and it states the following:

  • 68.3% of the observations will appear within +/-1 standard deviation of the mean;

  • 95.4% of the observations will fall within +/-2 standard deviations; and

  • 99.7% of the observations will fall within +/-3 standard deviations.

Data points falling outside of three standard deviations (3σ) indicate rare occurrences.
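The three empirical-rule percentages can be recovered from the error function; a minimal sketch, assuming a standard normal distribution:

```python
import math

def within_k_sigma(k):
    """P(μ − kσ <= X <= μ + kσ) for a normal distribution, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sigma(k) * 100, 1))  # ≈ 68.3, 95.4, 99.7
```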

Parameters of normal distribution

Since the mean, mode and median are the same in a normal distribution, there's no need to calculate them separately. These values represent the distribution's highest point, or the peak. All other values in the distribution then fall symmetrically around the mean. The width of the curve is defined by the standard deviation.

In fact, only two parameters are required to describe a normal distribution: the mean and the standard deviation.

1. The mean

The mean is the central highest value of the bell curve. All other values in the distribution either cluster around it or are at some distance away from it. Changing the mean on a graph will shift the entire curve along the x-axis, either toward the left or toward the right. However, its symmetry will still be maintained.

2. The standard deviation

In general, standard deviation is a measure of variability in a distribution. In a bell curve, it defines the width of the distribution and shows how far away from the mean the other values fall. In addition, it represents the typical distance between the average and the observations.

Changing the standard deviation will change the distribution of values around the mean. A smaller deviation will reduce the spread -- tightening the distribution -- while a larger deviation will increase the spread and produce a wider distribution. As the distribution gets wider, it becomes more likely that values will be farther away from the mean.


Central Limit Theorem

The central limit theorem states that whenever a random sample of size n is taken from any distribution with mean μ and finite variance σ², the sample mean will be approximately normally distributed with mean μ and variance σ²/n. The larger the value of the sample size, the better the approximation to the normal.


Assumptions of the Central Limit Theorem

  • The sample should be drawn randomly following the condition of randomization.

  • The samples drawn should be independent of each other. They should not influence the other samples.

  • When the sampling is done without replacement, the sample size shouldn’t exceed 10% of the total population.

  • The sample size should be sufficiently large.
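Under these assumptions, the theorem can be illustrated by simulation. The sketch below draws sample means from a skewed exponential population (μ = σ = 1, an illustrative choice) and checks that their average and spread match μ and σ/√n:

```python
import math
import random

random.seed(42)

# Population: exponential distribution with mean 1 (so μ = 1, σ = 1),
# deliberately skewed and non-normal.
mu, sigma, n, trials = 1.0, 1.0, 50, 20_000

# Draw many independent samples of size n and record each sample mean.
means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]

avg = sum(means) / trials
sd = math.sqrt(sum((m - avg) ** 2 for m in means) / trials)

print(round(avg, 2), round(sd, 3))  # ≈ μ = 1 and ≈ σ/√n ≈ 0.141
```

Plotting a histogram of `means` would show the familiar bell shape, even though the underlying exponential population is strongly skewed.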

Formula

The formula for the central limit theorem is given below:

Z = (x̄ – μ)/(σ/√n)

where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation and n is the sample size.

Proof

Consider x1, x2, x3,……,xn independent and identically distributed with mean μ and finite variance σ², and define the random variable Zn as,

Zn = (x̄ – μ)/(σ/√n) = √n (x̄ – μ)/σ

Here,

x̄ = (x1 + x2 + …… + xn)/n

Then, the distribution function of Zn converges to the standard normal distribution function as n increases without any bound.

Again, define a random variable Ui by

Ui = (xi – μ)/σ

so that

E(Ui) = 0 and V(Ui) = 1

and Zn can be written as Zn = (1/√n) ∑ Ui.

Thus, the moment-generating function of Zn can be written as

MZn(t) = E[e^(t Zn)] = E[e^(t ∑ Ui /√n)]

Since the xi are independent random variables, the Ui are also independent.

This implies,

MZn(t) = ∏ E[e^(t Ui /√n)] = [MU(t/√n)]ⁿ

As per the Taylor series expansion,

MU(t/√n) = 1 + t²/(2n) + o(1/n)

so that, as n → ∞,

MZn(t) = [1 + t²/(2n) + o(1/n)]ⁿ → e^(t²/2),

which is the moment-generating function for a standard normal random variable.




Steps

The steps used to solve problems based on the central limit theorem, involving ‘>’, ‘<’ or “between”, are as follows:


1) The information about the mean, population size, standard deviation, sample size and a number that is associated with “greater than”, “less than”, or two numbers associated with both values for a range of “between” is identified from the problem.


2) A graph with a center as the mean is drawn. 


3) The z-score is calculated using the formula:

z = (x̄ – μ)/(σ/√n)

4) The z-table is referred to, to find the area corresponding to the ‘z’ value obtained in the previous step.


5) Case 1: Central limit theorem involving “>”.

Subtract the table area for the z-score from 0.5.

Case 2: Central limit theorem involving “<”.

Add the table area for the z-score to 0.5.

Case 3: Central limit theorem involving “between”.

Step 3 is executed for both numbers, and the two resulting areas are combined.


6) Finally, the decimal value obtained for the area is converted into a percentage; this last step is common to all three cases.
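A worked example of these steps, using made-up numbers (population mean 70, standard deviation 12, sample size 36, and threshold 72 are all hypothetical):

```python
import math

# Step 1: identify the given information (hypothetical values for illustration):
# population mean 70, standard deviation 12, sample size 36; find P(sample mean > 72).
mu, sigma, n, x_bar = 70.0, 12.0, 36, 72.0

# Step 3: z = (x̄ − μ) / (σ / √n)
z = (x_bar - mu) / (sigma / math.sqrt(n))

def phi(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Case ">": the area to the right of z (equivalently, 0.5 minus the table area for z).
p_greater = 1 - phi(z)
print(round(z, 2), round(p_greater * 100, 2))  # z = 1.0, ≈ 15.87%
```

Here `phi` plays the role of the z-table lookup, and the final multiplication by 100 is the conversion of the decimal into a percentage.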






UNIT 3 : REGRESSION MODELS






























