UNIT-1 INTRODUCTION TO AI
Artificial: Something man-made
Intelligence: Intelligence can be a quality of anyone – humans, animals, birds and even machines / The ability to interact with the real world / The ability to acquire information, retain it as knowledge, and apply knowledge and skills in various domains such as making decisions, solving problems, creating new things, and choosing the correct tool, path or people in a specific situation
Types of Intelligence: (9 types) / SMILE NK (Spatial-Visual, Mathematical, Musical, Intrapersonal, Interpersonal, Linguistic, Existential, Naturalist, Kinaesthetic)
Interpersonal: the ability to understand and communicate effectively with others / Intrapersonal: the ability to understand oneself and one's own thoughts and feelings / Existential: relating to religious and spiritual awareness / Linguistic: language-related skill, whether written or spoken
Decision making: the process of picking a final choice from a set of available choices after assessment / We can’t make “good” decisions without information / Information may come from past experience, intuition, knowledge and self-awareness
AI: When a machine possesses the ability to mimic human traits, i.e. make decisions, predict the future, and learn and improve on its own, it is said to have artificial intelligence / You can say that a machine is artificially intelligent when it can accomplish tasks by itself: collect data, understand it, analyse it, learn from it, and improve.
Any machine that has been trained with data and can make decisions/predictions on its own can be termed as AI. Here, the term ‘training’ is important.
What AI can do: an AI-based system can discover patterns in the available information / Can make decisions / Can converse in natural language / Can recognise and read text from images
What AI is NOT: AI is not just automation / AI has no emotions / AI is not magic, it is mathematics and algorithms / AI is not a single entity like a human or an animal; it is composed of multiple programs and large amounts of data and information
AI means the use of intelligence, not just automation like an automatic washing machine, smart TV or smart AC
Languages used for AI: Java, Python, Perl, LISP, Prolog
Applications of Artificial Intelligence around us:
Computer Vision: self-driving cars, face lock in smartphones, Google search by image, medical imaging
Data Science: Google Search engine, Google Maps, targeted advertising, fraud and risk detection, weather prediction, price comparison websites, email filters
NLP-based applications: speech recognition (speech to text), sentiment analysis, digital phone calls, chatbots, virtual assistants such as Google Assistant and Alexa
Machine Learning (ML): a subset of AI which enables machines to improve at tasks with experience (data). It enables machines to learn by themselves using the provided data and make accurate predictions/decisions.
Deep Learning (DL): It enables software to train itself to perform tasks with vast amounts of data. Deep Learning is the most advanced form of Artificial Intelligence out of these three. Deep learning is a subset of machine learning that uses artificial neural networks to mimic the learning process of the human brain.
Big Data: huge amounts of data, regularly growing at an exponential rate, e.g. social media data (posts, pictures, responses, users, etc.). Big data cannot be handled without AI.
AI Domains (3): Data Science / Computer Vision / Natural Language Processing (NLP)
Common Misconceptions about AI: AI will take your job/AI does not require humans/AI is harmful for people
AI Ethics: is a set of moral principles which help us discern between right and wrong. AI ethics is a multidisciplinary field that studies how to optimize AI's beneficial impact while reducing risks and adverse outcomes. Examples of AI ethics issues include data responsibility and privacy, fairness, transparency, environmental sustainability, moral agency, value alignment, accountability, trust, and technology misuse. This means taking a safe, secure, humane, and environmentally friendly approach to AI. A strong AI code of ethics can include avoiding bias, ensuring privacy of users and their data, and mitigating environmental risks.
UNIT-7 EVALUATION
Evaluation is the process of understanding the reliability of an AI model by feeding a test dataset into the model and comparing the model's outputs with the actual answers.
· A confusion matrix is a matrix that summarizes the performance of a machine learning model on a set of test data. It is often used to measure the performance of classification models. The matrix displays the number of true positives (TP), True negatives (TN), false positives (FP), and false negatives (FN) produced by the model on the test data. Prediction and Reality can be easily mapped together with the help of this confusion matrix.
· TP: Prediction and reality are both positive
· TN: Prediction and reality are both negative
· FP: Prediction is positive but reality is negative
· FN: Prediction is negative but reality is positive
TP AND TN ARE CORRECT RESULTS OR DECISIONS OF THE AI MODEL
FP AND FN ARE ERRORS OR INCORRECT RESULTS OF THE AI MODEL
· Prediction: the output given by the model (Positive/Negative)
· Reality (actual result): the real situation (Positive/Negative); in TP/TN/FP/FN, “True/False” tells whether the prediction matches reality, and “Positive/Negative” is what the model predicted
Accuracy: the percentage of correct predictions out of all the observations.
Accuracy = (Correct Predictions / Total Cases) * 100 = ((TP + TN) / (TP + TN + FP + FN)) * 100
Precision: (true positives out of all predicted positives)
It is defined as the percentage of true positive cases out of all the cases where the prediction is positive. Precision = (True Positives / All Predicted Positives) * 100 = (TP / (TP + FP)) * 100
Recall: (rate of correctly identified positive cases)
It can be defined as the fraction of positive cases that are correctly identified. Recall = (TP / (TP + FN)) * 100
(It majorly takes into account the cases that are positive in reality, e.g. where there was a forest fire and the machine either detected it or it didn’t. That is, it considers True Positives (there was a forest fire in reality and the model predicted a forest fire) and False Negatives (there was a forest fire and the model did not predict it).)
F1 Score: defined as the measure of balance between precision and recall.
F1 = 2 * (Precision * Recall) / (Precision + Recall)
When the F1 score is high (close to 1, or 100%), we can say the AI model works efficiently.
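These formulas can be checked with a small Python sketch; the confusion-matrix counts below are made-up illustrative values, not results from any real model:

# Illustrative confusion-matrix counts (assumed values, not from a real model)
tp, tn, fp, fn = 60, 25, 5, 10        # true positives, true negatives, false positives, false negatives

total = tp + tn + fp + fn
accuracy  = (tp + tn) / total * 100             # correct predictions out of all cases
precision = tp / (tp + fp) * 100                # true positives out of all predicted positives
recall    = tp / (tp + fn) * 100                # true positives out of all actual positive cases
f1 = 2 * precision * recall / (precision + recall)   # balance between precision and recall

print(f"Accuracy {accuracy:.1f}%, Precision {precision:.1f}%, Recall {recall:.1f}%, F1 {f1:.1f}%")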
Which Metric is Important? Choosing between Precision and Recall depends on the condition in which the model has been deployed. In a case like Forest Fire, a False Negative can cost us a lot and is risky too. Imagine no alert being given even when there is a Forest Fire. The whole forest might burn down. Another case where a False Negative can be dangerous is Viral Outbreak. Imagine a deadly virus has started spreading and the model which is supposed to predict a viral outbreak does not detect it. The virus might spread widely and infect a lot of people. On the other hand, there can be cases in which the False Positive condition costs us more than False Negatives. One such case is Mining. Imagine a model telling you that there exists treasure at a point and you keep on digging there but it turns out that it is a false alarm. Here, False Positive case (predicting there is treasure but there is no treasure) can be very costly. Similarly, let’s consider a model that predicts that a mail is spam or not. If the model always predicts that the mail is spam, people would not look at it and eventually might lose important information. Here also False Positive condition (Predicting the mail as spam while the mail is not spam) would have a high cost.
UNIT-6 NLP (NATURAL LANGUAGE PROCESSING)
NLP is a branch of AI that enables computers to process human language in the form of text or voice data. It is one of the three domains of AI.
Applications of NLP: Automatic Text Summarization (algorithms or programs that reduce the text size and create a summary of the text data) / Sentiment Analysis (opinion mining): a technique used to determine whether data is positive, negative or neutral; used to identify opinions and sentiment expressed online about a company's products, to help the company understand what customers think about its products and services / Text Classification (text tagging or text categorization): the process of categorizing unstructured text into organized groups; using NLP, text classifiers can automatically analyze text and then assign a set of pre-defined tags or categories based on its content
/ Virtual Assistants (Smart Assistants): e.g. Google Assistant, Microsoft's Cortana, Apple's Siri, Amazon's Alexa. These are NLP-based programs automated to communicate in a human voice, mimicking human interaction to help ease day-to-day tasks such as showing weather reports, creating reminders and making shopping lists.
Digital Phone Calls: automated systems direct customer calls to a service representative or an online chatbot, which responds to customer requests with helpful information / Chatbot: a computer program that simulates and processes human conversation (written or spoken), allowing humans to interact with digital devices as if they were communicating with a real person / Two types of chatbots: script bots (easy to make, limited functionality, no or limited coding required) and smart bots (flexible and powerful, wide functionality, coding required, use AI and ML), e.g. Google Assistant, Cortana, Siri and Alexa are smart bots
Examples of chatbots: Mitsuku Bot, Jabberwacky, Rose, CleverBot / Syntax: refers to the grammatical structure of a sentence / Semantics: refers to the meaning of the sentence / Stemming: a technique used to extract the base form of words by removing affixes from them
HUMAN VS COMPUTER LANGUAGES AND NLP
The main function of both human and computer languages is the same: getting a message across (communication).
Humans communicate through language which we process all the time. Our brain keeps on processing the sounds that it hears around itself and tries to make sense out of them all the time.
On the other hand, the computer understands the language of numbers. Everything that is sent to the machine has to be converted to numbers. And while typing, if a single mistake is made, the computer throws an error and does not process that part. The communications made by the machines are very basic and simple.
Human languages are natural and used for communication between people, often varying by culture and region. They can be ambiguous and context-dependent, and are dynamic, changing over time.
Computer languages, on the other hand, are synthetic and used for communication between computers and humans.
Text Normalization: text normalization divides the text into smaller components called tokens (words). Steps for normalization (a short NLTK sketch of these steps appears after the stop-word notes below): 1. Sentence segmentation: the whole text is divided into individual sentences 2. Tokenisation: each sentence is further divided into tokens 3. Removing stop words, special characters and numbers 4. Stemming: a technique used to extract the base form of words by removing affixes from them; it is like cutting down the branches of a tree to its stem. For example, the stem of the words eating, eats, eaten is eat.
E.g. crying -> cry, smiling -> smili, caring -> car, smiles -> smile (the stemmed word may not be a meaningful word) 5. Lemmatization: the process of converting a word to its actual root form linguistically (as per the language). The words extracted through lemmatization are called lemmas.
E.g. cried -> cry, smiling -> smile, smiled -> smile, caring -> care
Stop words: words in any language which do not add much meaning to a sentence. They can safely be ignored without sacrificing the meaning of the sentence, e.g. is, are, a, an, so, etc.
Advantages of removing stop words: 1. The dataset size decreases 2. The time to train the AI model decreases 3. The performance of the AI model improves
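A minimal sketch of normalization steps 1-5 using Python's NLTK library (named later in these notes); the sample text, and the choice of PorterStemmer and WordNetLemmatizer, are illustrative assumptions rather than something the notes prescribe:

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet')   # one-time resource downloads

text = "The children are smiling. They cried yesterday!"             # illustrative sample text

sentences = sent_tokenize(text)                                       # 1. sentence segmentation
tokens = [w for s in sentences for w in word_tokenize(s)]             # 2. tokenisation
tokens = [w.lower() for w in tokens if w.isalpha()]                   # 3. drop special characters/numbers, lower-case
tokens = [w for w in tokens if w not in stopwords.words('english')]   # 3. remove stop words

stems  = [PorterStemmer().stem(w) for w in tokens]                    # 4. stemming (stems may not be real words)
lemmas = [WordNetLemmatizer().lemmatize(w, pos='v') for w in tokens]  # 5. lemmatisation (real root words)
print(stems, lemmas)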
CONCEPT OF NLP
NLP takes in data of natural languages, in the form of the written and spoken words which humans use in their daily lives, and operates on it.
(a) Text Normalization: divides the text into smaller components called tokens (words). The aim of text normalization is to convert the text to a standard form.
(b) Case Normalisation: convert all the words to the same case (lower case).
(c) Finally, convert to numbers: as computers understand numbers better than alphabets and words, we have to convert the normalised text into numbers.
Bag of Words (BoW): a statistical language model used to analyse text and documents based on word count. It is a representation of text that describes the occurrence of words within a document. A Bag of Words contains two things: (1) a vocabulary of known words, (2) the frequency of those words.
Steps to Implement BoW Model:
1. Text normalisation 2. Design the vocabulary: make the list of all known words in the documents; the collection of these words is called the corpus 3. Create document vectors (the frequency of each vocabulary word in each document) 4. Calculate TF-IDF
Term frequency(TF) is the frequency of a word in one document. Term frequency can easily be found from the document vector table as in that table we mention the frequency of each word of the vocabulary in each document.
Document Frequency (DF): the number of documents in which the word occurs, irrespective of how many times it has occurred in those documents.
In the case of inverse document frequency, we need to put the document frequency in the denominator while the total number of documents is the numerator.
For example, if the document frequency of the word “AMAN” is 2 (it occurs in 2 of the documents) and the total number of documents is 3, then its inverse document frequency is 3/2.
Term frequency Inverse Document Frequency(TF-IDF) is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.
Term frequency can also be normalised: the number of times a word appears in a document divided by the total number of words in that document. Every document has its own term frequency.
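The Bag of Words and TF-IDF ideas above can be sketched in plain Python. The three short documents are assumed examples (not taken from these notes), and the logarithm applied to IDF in the last step is the common convention rather than something the notes specify:

import math

docs = ["aman and anil are stressed",
        "aman went to a therapist",
        "anil went to download a health chatbot"]        # assumed example documents

tokens = [d.lower().split() for d in docs]                # tokenised documents
vocab = sorted(set(w for doc in tokens for w in doc))     # vocabulary (corpus of known words)

# Document vectors: frequency of every vocabulary word in each document (Bag of Words)
doc_vectors = [[doc.count(w) for w in vocab] for doc in tokens]

N = len(docs)
df  = {w: sum(1 for doc in tokens if w in doc) for w in vocab}   # document frequency (DF)
idf = {w: N / df[w] for w in vocab}                              # inverse document frequency = N / DF

# TF-IDF: term frequency weighted by log(IDF); a word occurring in every document scores 0
tfidf = [{w: doc.count(w) * math.log10(idf[w]) for w in vocab} for doc in tokens]
print(doc_vectors)
print(tfidf[0])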
Natural Language Toolkit (NLTK). NLTK is one of the leading platforms for building Python programs that can work with human language data.
Example of Multiple meanings of a word –
His face turns red after consuming the medicine. Meaning – is he having an allergic reaction, or is he unable to bear the taste of the medicine?
Example of Perfect syntax, no meaning-
Chickens feed extravagantly while the moon drinks tea. This statement is correct grammatically but it does not make any sense. In Human language, a perfect balance of syntax and semantics is important for better understanding.
Data sciences
Data science is a domain of AI related to data systems and processes, in which the system collects large amounts of data, maintains datasets and derives meaning/sense out of them. The information extracted through data science can be used to make decisions. Data Science analyses the data and helps in making the machine intelligent enough to perform tasks by itself.
Artificial Intelligence is a technology which completely depends on data. It is the data fed into the machine which makes it intelligent. Depending upon the type of data used, the work falls into the three AI domains listed earlier: Data Science, Computer Vision and NLP.
Types of Data / Data Formats for Data Science: CSV (Comma Separated Values), Excel spreadsheet, SQL (Structured Query Language), XML (eXtensible Markup Language), JSON (JavaScript Object Notation), XLSX (a Microsoft Office Open XML format spreadsheet file)
Data Access: after collecting the data, to be able to use it for programming purposes, we should know how to access it in Python code. To make our lives easier, there exist various Python packages which help us access structured data (in tabular form) inside the code: NumPy, Matplotlib, Pandas.
Pandas: Pandas is a software library written for the Python programming language for data manipulation and analysis.
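A small sketch of data access with Pandas; the file name students.csv and the Marks column are hypothetical examples:

import pandas as pd

df = pd.read_csv("students.csv")     # load a structured (tabular) dataset; file name is hypothetical

print(df.head())                     # first five rows
print(df.describe())                 # basic statistics of the numeric columns
print(df["Marks"].mean())            # mean of an assumed 'Marks' column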
Applications of Data Sciences: Fraud and Risk Detection, Genetics & Genomics, Internet Search, Targeted Advertising, Website Recommendations, Airline Route Planning
PIXEL: the word “pixel” means picture element. Every photograph, in digital form, is made up of pixels; they are the smallest unit of information that makes up a picture. / The number of pixels in an image is sometimes called its resolution. The resolution of a digital image is measured using its pixels, specifically in pixels per inch (PPI); for printing, picture resolution is measured in dots per inch (DPI). / RGB Images: all the images that we see around us are coloured images. These images are made up of three primary colours: Red, Green and Blue.
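A tiny NumPy sketch (NumPy is one of the packages named above) showing that a digital RGB image is just an array of pixel values; the 2 x 2 image is an illustrative assumption:

import numpy as np

# A 2 x 2 RGB "image": every pixel stores three values (Red, Green, Blue) in the range 0-255
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)

print(image.shape)    # (2, 2, 3): height x width x colour channels
print(image[0, 0])    # [255 0 0] -> the pure-red pixel in the top-left corner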
K-Nearest Neighbours (KNN) model/algorithm: mostly used for classification problems / The KNN algorithm stores all the available data and classifies a new data point based on similarity / The data point is classified on the basis of its k nearest neighbours, followed by a majority vote of those neighbours; a query point (unlabelled point) is assigned the class which has the most representatives among its nearest neighbours / The value of K signifies the number of neighbours considered
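A minimal, self-contained sketch of the KNN idea; the points, labels and query are made-up illustrative values, and a real project would typically use a library implementation instead:

from collections import Counter

# Tiny labelled dataset: (feature1, feature2) -> class label (illustrative values)
points = [((1, 1), 'A'), ((2, 1), 'A'), ((1, 2), 'A'), ((4, 4), 'B'), ((5, 5), 'B')]

def knn_classify(query, data, k=3):
    """Classify `query` by the majority vote of its k nearest neighbours (squared Euclidean distance)."""
    nearest = sorted(data, key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2)[:k]
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

print(knn_classify((2, 2), points, k=3))   # -> 'A', the majority class among the 3 nearest neighbours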
AI Project Cycle: a step-by-step process that a person should follow to develop an AI project to solve a problem using proven scientific methods. The AI Project Cycle mainly has 5 stages: Problem Scoping - understanding the problem: finding the various factors which affect the problem and defining the goal or aim of the project / Data Acquisition - collecting accurate and reliable data / Data Exploration - arranging the data uniformly / Modelling - creating models from the data / Evaluation - evaluating the project
4Ws Problem Canvas: the 4Ws Problem Canvas helps in identifying the key elements related to the problem. / Who - who all are affected directly and indirectly by the problem; they are called the stakeholders / What - understanding and identifying the nature of the problem; under this block you also gather evidence to prove that the problem you have selected exists / Where - where the problem arises: the situation, context and location / Why - how the solution will help the stakeholders and how it will benefit them as well as society
Data Exploration is the process of arranging the gathered data uniformly for a better understanding. Data can be arranged in the form of a table, by plotting a chart, or by making a database. The tools used to visualize the acquired data are known as data visualization or exploration tools. / A few data visualization tools: Google Charts, Tableau, Fusion Charts, Highcharts
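A short sketch of exploring data with Matplotlib (one of the Python packages named in these notes); the monthly figures are made-up illustrative values:

import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr"]     # illustrative data: projects completed per month
projects = [3, 5, 4, 7]

plt.bar(months, projects)                 # visualise the acquired data as a simple bar chart
plt.xlabel("Month")
plt.ylabel("Projects completed")
plt.title("Exploring data with a chart")
plt.show()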
AI models: 2 types
(A) Learning-based (Supervised Learning, Unsupervised Learning, Reinforcement Learning) (S-U-R)
(B) Rule-based
Supervised Learning: the dataset which is fed to the machine is labelled. A label is some information which can be used as a tag for data
Two Types
a. Classification – where the data is classified according to labels. For example, in a grading system, students are classified on the basis of the grades they obtain with respect to their marks in the examination. This model works on a discrete dataset, which means the data need not be continuous.
b. Regression – such models work on continuous data. For example, if you wish to predict your next salary, you would put in the data of your previous salary, any increments, etc., and train the model. Here, the data fed to the machine is continuous. (A toy sketch contrasting the two follows below.)
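A toy Python sketch contrasting discrete classification output with continuous regression output; the salary figures and grade thresholds are assumptions, and a real supervised model would learn such rules from labelled data rather than having them hand-written:

import numpy as np

# Regression: continuous output - predict the next salary from past, labelled salary data (assumed figures)
years  = np.array([1, 2, 3, 4])
salary = np.array([30000, 33000, 36500, 40000])
slope, intercept = np.polyfit(years, salary, 1)        # fit a straight line to the labelled data
print("Predicted salary in year 5:", slope * 5 + intercept)

# Classification: discrete output - assign a grade label from marks (hand-written rule for illustration)
def classify_grade(marks):
    return "A" if marks >= 80 else "B" if marks >= 60 else "C"

print(classify_grade(75))    # -> 'B'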
Unsupervised Learning
An unsupervised learning model works on unlabelled dataset. This means that the data which is fed to the machine is random and there is a possibility that the person who is training the model does not have any information regarding it .
There are two types of unsupervised learning models in AI –
a. Clustering – refers to the unsupervised learning technique that can cluster the unknown data according to patterns or trends found in it. The developer may already be aware of the patterns noticed, or it may even generate some original patterns as a result.
b. Dimensionality Reduction – if you have a large number of features, it can be beneficial to reduce them using an unsupervised step before moving on to the supervised steps. Numerous unsupervised learning techniques include a transform step that can be used to reduce the dimensionality.
Reinforcement Learning
In this type of learning, the system works on a reward-or-penalty policy. An agent performs an action (positive or negative) in the environment, which is taken as input by the system; the system then changes the state of the environment, and the agent is provided with a reward or a penalty.
The system also builds a policy that decides what action should be taken under a specific condition.
Data features: refer to the type of data you want to collect, e.g. salary amount, increment percentage, increment period, bonus, etc. / Various ways in which you can collect data are surveys, interviews, observations, sensors, cameras, APIs (Application Program Interface), web scraping, etc. / Web Scraping: collecting data from the web using certain technologies / Sensors: part of IoT (the Internet of Things) / Cameras: capture visual information; that information, called an image, is then used as a source of data / Observations: when we observe something carefully we gather information / Surveys: a method of gathering specific information from a sample of people
Sustainable Development: to develop for the present without exploiting the resources of the future. / The Sustainable Development Goals (SDGs), also known as the Global Goals, were adopted by all United Nations Member States in 2015 as a universal call to action to end poverty, protect the planet and ensure that all people enjoy peace and prosperity.
Constraints: conditions that can be enforced on the attributes of a relation. Constraints come into play whenever we try to insert, delete or update a record in a relation. / Unique: the values under that column are always unique, e.g. Roll_no number(3) unique / Primary key: the column cannot have duplicate values, nor even a null value, e.g. Roll_no number(3) primary key / The main difference between the unique and primary key constraints is that a column specified as unique may have a null value, but the primary key constraint does not allow null values in the column. / Check constraint: limits the values that can be inserted into a column of a table, e.g. marks number(3) check(marks>=0); this declares marks to be of type number, and while inserting or updating the value in marks it is ensured that its value is always greater than or equal to zero. / Default constraint: used to specify a default value for a column of a table automatically; this default value is used when the user does not enter any value for that column, e.g. balance number(5) default 0
JOIN: the JOIN clause is used to combine rows from two or more tables, based on a related/common column between them and a join condition.
An equi join is a type of join that combines tables based on matching values in specified columns. The resultant table contains repeated columns. It is possible to perform an equi join on more than two tables.
The SQL NATURAL JOIN is a type of EQUI JOIN and is structured in such a way that, columns with the same name of associated tables will appear once only.
Referential integrity refers to the relationship between tables. Four features of referential integrity are: (a) It is used to maintain the accuracy and consistency of data in a relationship. (b) It saves time as there is no need to enter the same data in separate tables. (c) It reduces data-entry errors. (d) It summarizes data from related tables.