BEST GLOSSARY OF THE MOST IMPORTANT ARTIFICIAL INTELLIGENCE (AI) TERMS WITH DEFINITIONS
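Accuracy: The percentage of predictions made by an artificial intelligence model that are correct, out of all the predictions it makes.
Activation Function: A mathematical function applied to the output of each neuron in a neural network. It introduces non-linearity, which allows deep learning models to learn non-linear relationships in data.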
For detailed knowledge, refer to: Wikipedia
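As a minimal sketch (Python is used here only for illustration), two widely used activation functions, the sigmoid and the rectified linear unit (ReLU), can be written as plain functions. The function names are illustrative and not taken from any particular library.

```python
import math

def sigmoid(x: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x: float) -> float:
    """Rectified Linear Unit: keep positive values, replace negatives with 0."""
    return max(0.0, x)

# Neither function is a straight line, which is what lets stacked layers
# represent non-linear relationships.
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  relu={relu(z):.1f}")
```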
Actuators: Actuators are the components of electromechanical machines that convert energy into motion. In AI-driven robotics, they are what produce a robot’s movements.
For detailed knowledge, refer to: Wikipedia
Adversarial Example: An input to a machine learning model that has been deliberately designed to make the model fail. For example, an image can be altered in small, carefully chosen ways that a human barely notices but that completely fool a previously well-tuned classifier, much like an optical illusion for the machine.
Adversarial Machine Learning: A research field of machine learning that studies attacks on machine learning algorithms as well as defenses against such attacks. It enables the safe adoption of machine learning techniques in adversarial settings such as spam filtering, malware detection, and biometric recognition.
For detailed knowledge, refer to: Wikipedia
AI Winter: A period in the history of Artificial Intelligence during which funding for and interest in AI research declined sharply, putting the industry under pressure.
For detailed knowledge, refer to: Wikipedia
AI: see Artificial Intelligence
Algorithm: A predefined set of rules and steps that a computer follows to complete a given task and produce an output.
For detailed knowledge, refer to: Wikipedia
Application Programming Interface (API): A set of commands, functions, protocols, and objects that programmers use to build software and that allows two software components to communicate with each other. Each software system can expose its own API.
For detailed knowledge, refer to: Wikipedia
Artificial General Intelligence (AGI): AGI is a machine that has the ability to think, understand, and perform tasks the way humans do. AGI is also known as “Strong AI”.
AGI does not exist at present and still seems like a fantasy, but perhaps in the future Artificial Intelligence will reach a level that makes it possible.
For detailed knowledge, refer to: Wikipedia
ARTIFICIAL INTELLIGENCE (AI): Artificial Intelligence refers to the development of human-like intelligence in machines. It is a branch of computer science concerned with building smart machines that can perceive, understand, predict, and perform tasks that normally require human intelligence.
In other words, Artificial Intelligence is an approach to developing applications that can think, understand, and perform tasks intelligently, as intelligent humans do.
The field of Artificial Intelligence research has several branches, including subsets such as machine learning, deep learning, and Natural Language Processing (NLP).
Artificial Narrow Intelligence (ANI): The only type of artificial intelligence that exists at present, and therefore the only type whose output we can currently rely on in practice. ANI is designed to focus on one specific task (e.g., facial recognition, speech recognition, or a computer playing a game against a human opponent).
Artificial Neural Network (ANN): Artificial Neural Networks are one of AI’s research areas and the basic building block of deep learning models. An ANN is a computational model inspired by the biological neural networks of the human nervous system.
Just as the neurons of the human brain are interconnected, the artificial neurons of an ANN are connected to each other across several layers of the network.
In a biological neuron, the dendrites receive incoming signals (ultimately originating from the sensory organs or from other neurons), the cell body processes them, and the result is passed on as output. In an artificial neuron, the inputs are combined as a weighted sum and passed through an activation function to produce the neuron’s output.
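A minimal sketch of the artificial neuron described above, assuming a sigmoid activation; the function names and all weights and biases are hypothetical values chosen only for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then a sigmoid."""
    # The weighted sum plays the role of the signals collected by the dendrites.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The activation function decides how strongly the neuron "fires".
    return 1.0 / (1.0 + math.exp(-total))

def layer(inputs, weight_rows, biases):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical tiny network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = layer([1.0, 0.5], [[0.4, -0.6], [0.3, 0.8]], [0.0, -0.1])
output = neuron(hidden, [1.2, -0.7], 0.05)
print(hidden, output)
```

Each layer’s outputs become the next layer’s inputs, which is how the neurons are connected across layers.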
Automated Machine Learning (AutoML): A platform or set of tools that automates much of the work of building machine learning models, such as choosing and tuning models, so that even beginners in the field of AI can create their own models.
Automation Fatigue: The tendency, in Robotic Process Automation (RPA) systems, for each additional automated task to bring less improvement than the previous ones as more tasks are automated.
Backpropagation: Backpropagation is short for “backward propagation of errors”. It is the algorithm used to train feedforward neural networks: the network’s output is compared with the expected output, the error is propagated backward through the layers, and the weights are adjusted until the difference between the two outputs becomes minimal.
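The following is a minimal sketch of backpropagation for a deliberately tiny network (one input, one sigmoid hidden unit, one linear output) trained with plain gradient descent on squared error. The network shape, the training pairs, and the learning rate are all assumptions made for illustration; real frameworks compute these gradients automatically.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, p):
    """Forward pass: input -> sigmoid hidden unit -> linear output."""
    h = sigmoid(p["w1"] * x + p["b1"])
    y = p["w2"] * h + p["b2"]
    return h, y

def train_step(data, p, lr=0.5):
    """One epoch: accumulate gradients by backpropagation, then update weights."""
    grads = {k: 0.0 for k in p}
    loss = 0.0
    for x, target in data:
        h, y = forward(x, p)
        err = y - target                    # dL/dy for L = 0.5 * (y - target)**2
        loss += 0.5 * err ** 2
        grads["w2"] += err * h              # output-layer gradients
        grads["b2"] += err
        # Send the error backward through w2 and the sigmoid derivative h * (1 - h).
        delta_h = err * p["w2"] * h * (1.0 - h)
        grads["w1"] += delta_h * x
        grads["b1"] += delta_h
    for k in p:                             # move each weight against its gradient
        p[k] -= lr * grads[k] / len(data)
    return loss / len(data)                 # average loss before this update

data = [(0.0, 0.2), (0.5, 0.5), (1.0, 0.8)]  # hypothetical (input, expected output) pairs
params = {"w1": 0.1, "b1": 0.0, "w2": 0.1, "b2": 0.0}
losses = [train_step(data, params) for _ in range(500)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The falling loss shows the difference between actual and expected outputs shrinking as the weights are repeatedly adjusted.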
Bayes’ Theorem: A statistical rule widely used in machine learning that provides a principled way of calculating conditional probability, that is, the probability that an event occurs given that another related condition holds.
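Bayes’ theorem states P(A | B) = P(B | A) × P(A) / P(B). A minimal sketch, using a hypothetical screening test (the 1%, 99%, and 5% figures are illustrative assumptions, not real data), shows why the result can be surprising:

```python
def bayes(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A | B) = P(B | A) * P(A) / P(B), expanding P(B) over A and not-A."""
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / evidence

# Hypothetical test: 1% of people have the condition, the test detects 99% of
# true cases, and it raises a false alarm for 5% of healthy people.
p = bayes(prior_a=0.01, likelihood_b_given_a=0.99, likelihood_b_given_not_a=0.05)
print(f"P(condition | positive test) = {p:.3f}")
```

Even with a 99% detection rate, only about one in six positive results is a true case here, because the condition itself is rare.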
Bayesian networks: A Bayesian network is a compact and flexible representation of the joint probability distribution of a set of variables. It graphically represents the (often causal) relations between variables and their conditional dependencies. A Bayesian network is also known as a Bayes network, Bayes net, belief network, or decision network.
BIG DATA: “Big Data” refers to huge amounts of structured and unstructured data that are too large or complex to be handled by standard data-processing software, and to the technologies used to process them.
Binning: The process of organizing data into groups (bins), typically by dividing a range of continuous values into intervals.
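A minimal binning sketch using Python’s standard `bisect` module; the age data, cut points, and labels are hypothetical.

```python
from bisect import bisect_right

def bin_values(values, edges, labels):
    """Assign each value to a bin; `edges` are the sorted cut points between bins."""
    # bisect_right counts how many edges are <= value, which is the bin index.
    return [labels[bisect_right(edges, v)] for v in values]

ages = [4, 17, 18, 42, 65, 80]
groups = bin_values(ages, edges=[18, 65], labels=["child", "adult", "senior"])
print(groups)
```

Here a value equal to a cut point (18 or 65) falls into the higher bin; choosing `bisect_left` instead would put it in the lower one.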
Brute Force Search: Brute-force search, or exhaustive search, also known as generate and test, is a very general problem-solving technique and algorithmic paradigm that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem’s statement. [source Wikipedia]
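As a minimal sketch of generate and test, the subset-sum problem (does some subset of the numbers add up to a target?) can be solved by enumerating every subset. The problem instance is hypothetical.

```python
from itertools import combinations

def subset_sum_brute_force(numbers, target):
    """Generate every subset, smallest first, and test whether it sums to target."""
    for size in range(len(numbers) + 1):
        for combo in combinations(numbers, size):
            if sum(combo) == target:
                return combo
    return None  # every candidate was checked and none satisfied the problem

result = subset_sum_brute_force([8, 3, 11, 5, 2], 10)
print(result)
```

The approach is guaranteed to find a solution if one exists, but with n numbers there are 2^n subsets to check, which is why brute force becomes impractical for large inputs.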
Categorical Data: Data whose values fall into a limited set of groups, or categories, that share similar characteristics, such as race, gender, or place. Categorical values are labels rather than measured quantities; information expressed as measured numbers is called numerical data.
Cerebral Cortex: The cerebral cortex is the outer layer of the cerebrum, the largest part of the human brain. It plays a central role in thinking and acting intelligently, which is why it is often used as a point of comparison for AI.
CHATBOT (conversational agent): Chatbots are AI-based applications that let you communicate with a digital device through text or voice commands as if you were talking to a human (for example, automated customer-service conversations).
Classification: A supervised learning technique by which machines assign data points to predefined categories.
Clustering: A form of unsupervised learning in which an algorithm groups unlabeled data points by similarity.
Cobot: Cobot is the abbreviation of “Collaborative Robot” that can work safely alongside people.
Cognitive computing: a computer model that mimics the way the human brain works using AI technology such as data mining, NLP, and pattern recognition.
Cognitive Robotic Process Automation (CRPA): A Robotic Process Automation (RPA) system that leverages AI technologies.
Computer vision: Computer vision is the part of Artificial Intelligence that deals with how computer systems extract understanding from digital images, videos, and other visual input. (Detecting faces, text, and other objects in camera feeds is a good example of computer vision.)
Content Moderation: The practice of monitoring user submissions (especially posts) and enforcing a predetermined set of rules and guidelines to determine whether each submission is permissible or not.
Convolutional neural network (CNN): Convolutional neural networks are deep artificial neural networks designed for analyzing, classifying, and clustering visual imagery and for performing object recognition within images. Instead of connecting every neuron to every input, as a plain multilayer perceptron does, they use convolutional layers that slide small learned filters across the image.
CPU (Central Processing Unit): The CPU is essentially the brain of a computer system. It is the electronic circuitry within a computer that retrieves and executes the instructions of a computer program. A CPU consists of an arithmetic and logic unit (ALU), a control unit, and a number of registers. It performs the basic arithmetic, logic, control, and input/output (I/O) operations specified by the program’s instructions.
Custom Training: The process of teaching a model to make certain predictions.
Data: data is the collection of various kinds of information such as visual images, videos, numbers, descriptions, etc. converted into a binary digital form that is stored for a specific purpose.
Data Lake: A centralized repository that allows enormous amounts of structured and unstructured data to be stored and processed.
Data mining: a process of discovering recurring patterns within large sets of data with the intent to extract meaningful information from it.
Data Science: Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI), and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization’s data. These insights can be used to guide decision-making and strategic planning. (source https://www.ibm.com)
Data Type: As the name indicates, a data type is the type or classification of a piece of data, such as numeric, alphanumeric, or decimal, which tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support various data types, including integer, real, character or string, and Boolean.
Database: A database is like a container where structured information, or data, is generally stored digitally in a computer system for specific purposes and the same can be reprocessed by computer means to produce meaningful information. A database is generally managed by the Database Management System (DBMS).
Datamining (Data analysis and mining): Data mining is the process of sorting through large volumes of data to bring out the required information, models, correlations, and trends. It enables data analysis that helps solve business problems, and its techniques and tools help businesses predict future trends and make better-informed decisions.
Decision Tree: A supervised machine learning algorithm that models a flowchart-like series of decision paths, each branch testing a feature of the data. It can be used for both classification and regression problems.
Deep Learning: Deep learning is the part of machine learning and artificial intelligence that uses layered (or deep) neural networks, loosely mimicking the human brain, to learn patterns in data. It is most often used for supervised learning problems, although it is also applied to unsupervised and reinforcement learning.
Also, refer to Machine learning and Neural networks.
Deep Neural Network: A deep neural network is an artificial neural network (ANN) with several hidden layers between the input and output layers. Stacking many layers lets it model complex, non-linear relationships in data.
Deepfake: The use of deep learning to replace the face or likeness of one person with that of another in existing digital media such as videos and images.
Detection: To notice or discover an event or object.
Digital ecosystem: A digital network environment in which all stakeholders work in tandem, connecting online and interacting digitally in a way that creates value for everyone.
Ensemble Modeling: Ensemble Modeling involves using two or more related but different analytical models for generating predictions.
Ethics Board: A committee that evaluates the ethical issues raised by AI projects and provides guidance on how an organization researches and uses AI technology and the associated data.
ETL (Extract, Transform, and Load): ETL is a data integration process in which ETL tools extract data from multiple source systems, transform it in a staging area, and finally load it into a single data warehouse system.
Expert System: An early type of AI application that became prominent in the 1980s. It was designed to solve complex problems by reasoning over knowledge gathered from human experts, typically stored as a knowledge base of rules. An expert system uses AI techniques to simulate the judgment and behavior of a human who has expertise in a particular field. Expert systems were among the first successful forms of Artificial Intelligence.
Explainable AI (XAI): Explainable artificial intelligence (XAI) is a set of processes and methods that allow human users to understand and trust the results produced by machine learning algorithms.
Explorer: A web search tool that helps users to preview applications.
F Score: The harmonic mean of precision and recall (recall is also called the true positive rate). Also known as the F1-score.
Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall).
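A minimal sketch of the F1 formula computed from confusion-matrix counts; the counts are hypothetical, and treating F1 as 0 when there are no true positives is a common convention assumed here.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * (precision * recall) / (precision + recall), from raw counts."""
    if tp == 0:
        return 0.0                   # precision and recall are 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called the true positive rate
    return 2 * (precision * recall) / (precision + recall)

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
print(round(f1_score(tp=8, fp=2, fn=4), 3))
```

With precision 0.8 and recall about 0.667, the harmonic mean (about 0.727) sits closer to the weaker of the two, which is why F1 penalizes a model that is strong on only one of them.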
Facial Recognition System: a technology capable of matching, identifying, or verifying a human face from a digital image or a video frame against a database of faces.
False Negatives: An error in which a test result wrongly indicates that a condition is absent when it is actually present. A negative test result can be either a true negative or a false negative. False negatives can have serious consequences, for example in COVID-19 testing, where an infected person may be told they are not infected.
False Positives: An error in a binary classification where a model falsely predicts the presence of a condition, whereas, in reality, it is not present.
Feature Engineering: See Feature Extraction.
Feature Extraction: The process of selecting or deriving suitable variables (features) for an AI model.
Feed-Forward Neural Network: The first and simplest type of artificial neural network devised. It processes data in only one direction, forward from the input nodes, through the hidden layers, to the output nodes. There are no cycles or loops.
Generative Adversarial Networks (GANs): A deep learning model, developed by AI researcher Ian Goodfellow and his colleagues, that generates new outputs such as images, audio, text, or video. It is used in unsupervised machine learning and is implemented as a system of two competing neural networks (a generator and a discriminator) in a zero-sum game framework. This technique can produce photographs that appear at least superficially authentic to human observers, with many realistic features (though in tests people can tell real from generated images in many cases).
Genetic algorithm: An AI search and optimization algorithm inspired by natural selection and genetics. It evolves a population of candidate solutions through selection, crossover, and mutation to efficiently find good answers to challenging problems.
GPU (Graphics Processing Unit): GPUs are also called graphics cards or video cards. They are specialized electronic circuits designed for parallel processing. Because of their fast data processing and quick mathematical computation, they are used in a variety of applications, including graphics and high-speed video rendering. Although best known for gaming, GPUs are now widely used in artificial intelligence (AI) and creative work.
There are two types of GPUs. An integrated GPU is built into the same chip or board as the computer’s CPU and shares memory with the processor. A discrete GPU runs on a separate card and has its own video memory (VRAM). Some PCs include both types.
Hadoop: Hadoop makes it possible to manage Big Data, for example by building sophisticated data warehouses. Because Hadoop uses a distributed architecture, data is spread across and processed on clusters of many nodes and servers, using parallel computing to perform different data operations. Hadoop is open-source software.
Heuristic: A problem-solving technique, used in computer science and machine learning (ML), that quickly finds a good-enough (though not necessarily optimal) solution to a particular problem, often through a step-by-step procedure.
Hidden Layers: The layers between the input and output layers of a neural network, which carry out the different levels of analysis in a deep learning model.
Hidden Markov Model (HMM): A statistical model of sequences, widely used as an algorithm for deciphering spoken words in speech recognition.
Human Workforce (“Labelers”): Workers who assist with tasks on an as-needed basis; for AI purposes, this usually means labeling data (such as images).
Hyperparameters: In machine learning, a hyperparameter is a parameter whose value is used to control the learning process, such as the learning rate. It is set before training rather than learned directly from the training process. In contrast, the values of other parameters, such as node weights, are obtained through training.
Image Recognition: An Artificial Intelligence technology that can identify or detect objects, places, people, logos, writing, buildings, and other elements in digital images.
Image Segmentation: The process of breaking down a digital image into multiple subgroups, known as segments, with the goal of simplifying the representation of the image into something easier to analyze and process further. Segmentation divides an entire image into groups of pixels that can be categorized and labeled. Put simply, segmentation goes beyond drawing a bounding box around the desired object in an image: it produces a pixel-by-pixel outline of that object, which can then be used, for example, to erase or change the background.
ImageNet: The ImageNet project is a large visual database designed for use in visual object recognition software research. More than 14 million images have been hand-annotated by the project to indicate what objects are pictured, and in at least one million of the images, bounding boxes are also provided. (Best Reference Sources: Wikipedia and image-net.org)
ImageNet Challenge: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a competition in which research teams evaluate their algorithms on a given data set and compete to achieve higher accuracy on several visual recognition tasks.
Input: Any form of data – text, audio, code, music notation, essentially anything that can be encoded digitally.
Instance: A single example (for instance, one row) of data in a dataset, belonging to a particular type or class.
Jupyter Notebook: A web-based application that lets users create, edit, and run notebook documents in a web browser, making it easy to write code in languages such as Python and R, create visualizations, and experiment with AI models.
K-Means Clustering: An unsupervised learning algorithm that groups similar unlabeled data into k clusters, each represented by the mean (centroid) of its points.
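A minimal sketch of k-means (Lloyd’s algorithm) on hypothetical 2-D points. Initializing the centroids with the first k points is a simplifying assumption; practical implementations usually use random or k-means++ initialization and several restarts.

```python
import math

def k_means(points, k, iterations=100):
    """Alternate between assigning points to centroids and moving the centroids."""
    centroids = list(points[:k])            # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                    # assignment step: nearest centroid
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [                   # update step: mean of each cluster
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:      # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```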
K-Nearest Neighbor (k-NN): A non-parametric, supervised machine learning classifier that classifies a data point based on its similarity to other points, typically by a majority vote among its k nearest labeled neighbors.
Lemmatization: An NLP text-normalization process that converts any form of a word to its base dictionary form (its lemma), for example “running” and “ran” to “run”, so that related words can be treated as the same root word. Unlike simple stemming, which only strips suffixes or prefixes, lemmatization relies on vocabulary and grammatical analysis.
Lidar (Light Detection and Ranging): A remote-sensing device, similar in principle to radar but using laser light, usually mounted on top of an autonomous car. It measures its surroundings by shooting laser beams and detects objects at distances from a few meters to 200 meters or more, depending on the type of system.
Limited memory: Limited memory is one of the four types of Artificial Intelligence systems. Such systems can use recent past data and predictions to make better immediate predictions. Self-driving cars are an example.
Linear Regression: A machine learning algorithm that models the linear relationship between variables, so that the value of one variable can be predicted from the value of another.
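A minimal k-NN sketch using Euclidean distance and a simple majority vote; the labeled points are hypothetical, and ties are broken by whichever label `Counter.most_common` returns first.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest labeled training points."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
         ((6, 6), "blue"), ((7, 7), "blue"), ((6, 7), "blue")]
print(knn_predict(train, (2, 2)))
print(knn_predict(train, (6, 5)))
```

There is no training step: the model is the stored data itself, which is what "non-parametric" means here.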
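A minimal sketch of simple linear regression (one input variable) fitted by ordinary least squares; the hours-versus-score data is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 57, 61, 68, 72]
slope, intercept = fit_line(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print("predicted score for 6 hours:", round(slope * 6 + intercept, 1))
```

Once the line is fitted, predicting a new value is just evaluating the line at a new input.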
Machine Learning: Machine learning (ML) is a subset of artificial intelligence (AI) in which software applications learn and improve from data, becoming more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data and patterns as input to predict new output values.
Machine translation: The use of an NLP application (NLP being a subset of AI) to automatically translate text or speech from one language into a target language without human involvement. Google Translate and similar applications are the best-known examples.
Metadata: Data that describes other data. For example, a music file can have metadata such as its size, length, upload date, comments, genre, artist, and so on.
Misclassification Rate: A measure of how often a model’s predictions are wrong, without distinguishing between positive and negative predictions. It equals the number of incorrect predictions divided by the total number of predictions (that is, 1 − accuracy).
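A minimal sketch computing the misclassification rate from confusion-matrix counts (TP, TN, FP, FN); the counts are hypothetical.

```python
def misclassification_rate(tp, tn, fp, fn):
    """Share of all predictions that were wrong: (FP + FN) / total."""
    return (fp + fn) / (tp + tn + fp + fn)

def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were right: (TP + TN) / total."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts for 165 predictions.
counts = dict(tp=100, tn=50, fp=10, fn=5)
print(round(misclassification_rate(**counts), 3))
print(round(accuracy(**counts), 3))
```

The two values always add up to 1, since every prediction is either right or wrong.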
Model: a processing block that receives inputs like images or videos and outputs concepts that are predicted.
Naive Bayes Classifier: A supervised machine learning algorithm that uses Bayes’ theorem to solve classification problems, under the “naive” assumption that the input variables are independent of each other given the class.
Named Entity Recognition (NER): One of the most popular data pre-processing tasks in NLP. It involves identifying significant information in text and classifying it into a set of predefined categories such as locations, persons, and organizations.
Natural Language Processing (NLP): A subset of artificial intelligence (AI) that assists computers to understand, interpret, manipulate and analyze human language. This area of research focuses on teaching machines to better understand human language to improve human-computer interfaces with use cases like moderation, information extraction, summarization, etc.
Neural networks: see Artificial Neural Networks (ANN)
Noise: signals that are not related to the target function causally.
Normal Distribution: The normal distribution is a bell-shaped probability distribution that is symmetric around its mean, which is also its peak. It is an important statistical distribution with numerous applications; many machine learning algorithms assume or make use of it, so any statistician, machine learning engineer, or data scientist must understand it.
NoSQL System: A class of databases that, instead of relational tables, use other data models such as documents to store and retrieve data. This allows more flexibility in analysis and in handling both structured and unstructured data.
Not Suitable for Work (NSFW): An Internet slang phrase used to mark content that might be regarded as unsuitable in the workplace because it is vulgar, offensive, sexual, or otherwise potentially distressing, and that a platform may not want to post on its site or may want to classify as adult content.
Null Error Rate: How often a model would be wrong if it always predicted the majority class. (For example, if there are 100 cases, 70 of which are “yes” and 30 of which are “no”, the null error rate is 30/100 = 0.30, because always predicting “yes” would be wrong only for the 30 “no” cases.)
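The bell shape comes from the normal probability density function f(x) = exp(−(x − μ)² / (2σ²)) / (σ√(2π)), where μ is the mean and σ the standard deviation. A minimal sketch:

```python
import math

def normal_pdf(x: float, mean: float = 0.0, std: float = 1.0) -> float:
    """Density of the normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

# The curve peaks at the mean and falls off symmetrically on both sides.
for x in (-2, -1, 0, 1, 2):
    print(x, round(normal_pdf(x), 4))
```

Python’s standard library also provides `statistics.NormalDist(mu, sigma).pdf(x)`, which computes the same density.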
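A minimal sketch of the null error rate, reproducing the 70 “yes” / 30 “no” example above; the function name is illustrative.

```python
from collections import Counter

def null_error_rate(labels):
    """Error rate of a baseline that always predicts the most frequent class."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return 1 - majority_count / len(labels)

labels = ["yes"] * 70 + ["no"] * 30
print(round(null_error_rate(labels), 2))
```

A useful model should have an error rate below this baseline; otherwise it does no better than always guessing the majority class.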
Object Detection: A computer vision technique used to locate instances of semantic objects of a particular class, such as people, buildings, animals, trees, or cars, in digital photos or videos. Object detection algorithms typically use machine learning or deep learning to produce useful results. Humans can identify and locate objects of interest in photos or videos within moments; the goal of object detection is to replicate this ability with computational models that provide the fundamental information required by computer vision applications.
Object Recognition (or Object Classification): A computer vision technique for identifying the objects detected in a frame of a digital image or video.
Object Tracking: The action of tracking the motion of a specific object of interest, or of several objects, within a given scene. It usually has applications in video and real-world interactions where estimations or predictions of the position and other relevant information are made. Object detection is usually a part of the object-tracking process.
One Shot Classification: A form of learning, known as one-shot learning, in which a model acquires information about object categories from a single training example per class, so you need one training sample for each class you want to predict. The model can then be expected to work only on examples from a domain comparable to that of its training examples.
On-premises Software: A software distribution model in which software is installed and operated on computers located on the premises of the person or organization using it, rather than at a remote location such as a server farm or the cloud. Traditional installed versions of Microsoft Office and Adobe products are two well-known examples of on-premises software.
OPEN DATA: The term refers to the public availability, by download, of structured databases. These data may be re-used for non-monetary purposes under the terms of a specific license, which may specify or prohibit certain re-use purposes. Open data is not to be confused with unitary public information published on Internet sites, whose database as a whole cannot be downloaded (for example, case law databases). It does not replace the mandatory publication of certain administrative or judicial measures or decisions already required by certain laws or regulations. Finally, the data themselves (open data, technically speaking) are sometimes confused with the techniques used to process them (machine learning, data science) for distinct objectives (search engines, assistance in drafting acts, analysis of jurisprudential trends, anticipation of court decisions).
Optical Character Recognition (OCR): A computer process that converts images of typed, handwritten, or printed text (for example, scanned documents) into machine-encoded text.
Ordinal Data: Categorical data whose categories have a natural rank order (for example, “low”, “medium”, “high”), although the distances between the categories are not known.
Output: Information that a computer sends back to the user through output devices, as ready-to-use results, after receiving and processing input.
Overfitting: An undesirable machine learning behavior in which a model fits its training data too closely and cannot distinguish the information that is relevant to its task from irrelevant information (noise) in that data. Overfitting reduces the model’s predictive performance on new data.
Parameter: In Artificial Intelligence (AI) and machine learning, parameters are the internal variables of a model, such as the weights of a neural network, whose values are learned from the training data. They determine how the model processes inputs and prioritizes relevant data while performing its intended function, for example when classifying data.
Pattern recognition: refers to the automated process of identifying and discerning patterns within data, enabling AI systems to recognize and extract meaningful insights from complex information.
Pearson Correlation: Measures the strength and direction of the linear relationship between two variables on a scale from -1 to 1. Values close to 1 indicate a strong positive relationship, values close to -1 a strong negative relationship, and values near 0 little or no linear relationship.
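A minimal sketch of the Pearson correlation coefficient: the covariance of the two variables divided by the product of their standard deviations (the 1/n factors cancel, so plain sums are used). The sample data is hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r, between -1 and 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

print(round(pearson([1, 2, 3, 4], [2, 4, 6, 8]), 3))   # perfectly rising together
print(round(pearson([1, 2, 3, 4], [8, 6, 4, 2]), 3))   # perfectly moving in opposite directions
print(round(pearson([1, 2, 3, 4], [3, 1, 4, 2]), 3))   # weak relationship
```

Since Python 3.10, the standard library also offers `statistics.correlation(x, y)` for the same calculation.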
Personal Data Processing: refers to a wide range of operations, whether automated or not, that are conducted on personal data or datasets. These operations encompass various activities such as data collection, recording, organization, structuring, storage, adaptation or modification, retrieval, consultation, use, communication through transmission or dissemination, and any other means of making the data available, linking or interconnecting. Additionally, personal data processing includes actions related to imposing limitations, erasure, or destruction of the data.
PERSONAL DATA: refers to data that pertains to an identified or identifiable natural person, whether directly or indirectly, by making reference to one or more distinctive attributes associated with that individual.
Sensitive data, as defined by the General Data Protection Regulation, includes personal data concerning racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, as well as genetic data, biometric data, data concerning health, or data concerning sex life or sexual orientation.
According to Article 4 (1) in Chapter 1 of the GENERAL DATA PROTECTION REGULATION (GDPR): “‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;”
Phonemes: Phonemes are the fundamental units of sound within a language. They are the smallest distinct units of sound that can differentiate meaning between words. Each language has its own set of phonemes, and their combinations and variations form the basis for spoken communication and language comprehension.
Positive Predictive Value (PPV) – also known as precision, is a statistical measure of the proportion of true positive predictions out of all positive predictions made by a model or test. It indicates the probability that a positive prediction is correct.
When PPV is estimated from a test’s sensitivity and specificity (for example, to apply the test to a new population), it also depends on prevalence, the proportion of the population that belongs to the positive class: PPV = (sensitivity × prevalence) / [sensitivity × prevalence + (1 − specificity) × (1 − prevalence)]. When the classes are perfectly balanced, meaning the prevalence is 50%, this PPV equals the precision measured on that balanced data. When the positive class is rare, PPV can be much lower than the test’s sensitivity and specificity alone would suggest.
In summary, the PPV considers the accuracy of positive predictions while accounting for the prevalence of the positive class, providing a measure of the proportion of true positives among all positive predictions.
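A minimal sketch of the prevalence-based PPV formula; the 90% sensitivity and specificity figures are hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value of a test, via Bayes' theorem."""
    true_pos = sensitivity * prevalence                 # share of population: true positives
    false_pos = (1 - specificity) * (1 - prevalence)    # share of population: false positives
    return true_pos / (true_pos + false_pos)

# Same hypothetical test (90% sensitivity, 90% specificity) at different prevalences.
for prevalence in (0.5, 0.1, 0.01):
    print(prevalence, round(ppv(0.9, 0.9, prevalence), 3))
```

At 50% prevalence the PPV is 0.9, but when only 1% of the population is positive, fewer than one in ten positive predictions is correct.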
Precision (Recognition) – A rate that measures how often a model is correct when it predicts ‘yes’: true positives divided by all positive predictions, TP / (TP + FP).
Predictive Analytics: refers to the use of data, statistical algorithms, and machine learning techniques to make predictions or forecast future outcomes based on historical and current data. It involves analyzing patterns, trends, and relationships in data to make informed predictions about future events or behaviors. The aim is to leverage data-driven insights to anticipate outcomes and make proactive decisions in various fields such as business, finance, healthcare, marketing, and more.
Predictive Model: A model that leverages data observations obtained from a specific sample to estimate the probability of another sample or the remaining population exhibiting identical behavior or achieving similar outcomes.
Prevalence: refers to the frequency or rate at which a specific condition or outcome, often denoted as the “yes” condition, occurs within a given sample or population.
It measures the proportion of individuals or cases in the sample that exhibit the condition of interest.
A higher prevalence indicates a higher occurrence of the condition, while a lower prevalence indicates a lower occurrence.
PROFILING: According to Article 4(4) of the GDPR, ‘profiling’ means any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person, in particular, to analyze or predict aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behavior, location or movements.
PSEUDONYMISATION: As provided under Article 4 (5) in Chapter 1 of the GENERAL DATA PROTECTION REGULATION (GDPR), ‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.
Python – A popular high-level programming language known for its simplicity, readability, and versatility. It was created by Guido van Rossum and first released in 1991. Its design philosophy emphasizes code readability, using significant whitespace and clear, concise syntax. Python has become the de facto standard language for developing AI models.
PyTorch: PyTorch is an open-source machine learning framework primarily used for developing and training deep learning models. It was originally developed by Facebook’s (now Meta’s) AI Research lab and has gained significant popularity in both research and industry.