There are so many details within machine learning to untangle. It is a vast field with thousands of researchers actively contributing to it, and more money is pouring into the area each day, fueling further innovation. With so much happening, entirely new models appear almost daily; OpenAI's GPT-2 was released just thirteen days ago. It is difficult to keep on top of everything, but I will try by compiling a glossary of machine learning terminology that I find useful. To organize the information, I will first give a brief introduction to the different areas of artificial intelligence and then go into more depth on some sub-fields.
Briefly, the term artificial intelligence means that a machine can understand something, and it spans many areas such as learning relationships between customers and products, image recognition, and natural language processing. The photo below, courtesy of Towards Data Science, summarizes the relationship between these different sub-fields very well:
Artificial intelligence contains many different sub-fields and draws on researchers from cognitive science, mathematics, computer science, and beyond. Drilling down from the broad term “artificial intelligence,” the next level is machine learning, which I will cover next.
Machine learning is the process of training a model on a dataset so that it can predict something. The process is literally called training, and modern processing chips make it very fast. To optimize the model, an iterative approach is taken, such as gradient descent with backpropagation in neural networks: at each iteration, backpropagation computes how the loss changes with respect to each model parameter, and the parameters are updated to reduce that loss. Often, a validation set is used to estimate how well the trained model will perform; it is very important that the validation set was not seen by the model during training. Machine learning is specifically the idea of a model learning relationships from data.
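To make the idea concrete, here is a minimal sketch of iterative training: a toy linear model y = w * x fit by gradient descent on a training set, then evaluated on a held-out validation point. All the numbers and variable names are illustrative, not from any particular library.

```python
# Training data follows y = 2x; the validation point is held out.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
validation = [(4.0, 8.0)]

w = 0.0      # the model's single parameter, starts untrained
lr = 0.05    # learning rate

for step in range(200):             # the iterative optimization loop
    for x, y in train:
        error = w * x - y           # prediction error on this example
        w -= lr * 2 * error * x     # gradient of squared loss w.r.t. w

# Evaluate on data the model never saw during training.
val_loss = sum((w * x - y) ** 2 for x, y in validation) / len(validation)
print(round(w, 3), round(val_loss, 6))
```

After training, w converges toward the true slope of 2 and the validation loss approaches zero; real models work the same way, just with far more parameters.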
Since machine learning is a very general term, it encapsulates sub-fields such as natural language processing and image recognition which I will cover below.
Image recognition is frequently used in consumer devices such as the latest iPhone, which contains an algorithm that can verify your identity from facial images. The industry standard for image recognition is the convolutional neural network. This model uses many layers of increasing complexity to derive meaning from images. When an image is input into a convolutional neural network, imagine a flashlight scanning across the image surface from left to right. As the flashlight scans, it takes note of the basic abstract features (such as a straight line or a curved line) in each area it passes over, then creates a composite image of these features. In the next layer, another flashlight scans this new image, looks for features yet again, and creates another image. Stacking layers (flashlight scanning steps) lets the model identify another level of complexity with each layer. By the final layer, the model has homed in on the main features of the image (such as circles) and how they relate to the target variable. A final fully connected layer between the second-to-last layer and the prediction layer draws the conclusion that, in a digit recognition task, two stacked circles must be an “8”.
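The flashlight scan described above is, mechanically, a convolution. Here is a toy version: a 2x2 kernel slides across a 3x3 image, and each position of the "beam" produces one value in a feature map. The image and kernel values are made up for illustration.

```python
def convolve(image, kernel):
    """Slide kernel over image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The "flashlight" beam: elementwise multiply and sum.
            total = sum(image[i + di][j + dj] * kernel[di][dj]
                        for di in range(kh) for dj in range(kw))
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [[1, 0, 1],
         [0, 1, 0],
         [1, 0, 1]]
kernel = [[1, 0],
          [0, 1]]   # this kernel responds strongly to diagonal patterns

print(convolve(image, kernel))   # → [[2, 0], [0, 2]]
```

The high values in the feature map mark where the image locally matches the kernel's diagonal pattern; a real network learns many such kernels per layer instead of hand-writing them.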
I would highly recommend these detailed resources to gain a better understanding of how the networks work:
Natural Language Processing (NLP)
Natural language processing is the field of study where a machine learns to understand written language. From this understanding, ideally, it can generate new sentences and fill in blanks so that they blend in with the surrounding text.
The field has been around for a long time and has historically relied on very simplistic methods for understanding text, such as counting the most common words and inferring that those words must describe the meaning of the article.
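That classic word-counting approach can be sketched in a few lines with Python's standard library (the sample sentence here is invented):

```python
from collections import Counter

text = "the cat sat on the mat and the cat slept"

# Count every word, then take the most frequent ones as a crude
# summary of what the text is "about".
counts = Counter(text.lower().split())
print(counts.most_common(2))   # → [('the', 3), ('cat', 2)]
```

The weakness is obvious even in this tiny example: the most common word is "the", which says nothing about meaning, and no amount of counting captures word order or context. That limitation is what the modern models below address.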
A revolution came in 2018 when Google released BERT (https://arxiv.org/abs/1810.04805). The Bidirectional Encoder Representations from Transformers (BERT) model that the authors describe allows real understanding of text by reading it both forwards AND backwards to capture the full context of a word. BERT has become an industry standard and is used by Google itself.
I would highly recommend these detailed resources to gain a better understanding of how NLP works:
Neural Networks in Artificial Intelligence
The current hot topic in AI is neural networks, but they are not that new: they were first conceived by Dr. Frank Rosenblatt of Cornell University in 1958. The idea he published was the perceptron, which is the building block of all neural networks. Rosenblatt originally used the perceptron to model the brain, and the idea was later extended into neural network models, although it is generally accepted that neural networks do not closely mirror how the brain actually functions. The perceptron unit details how an input can be taken and functions run on it to arrive at an output. In modern neural networks, the input is multiplied by weights, shifted by biases, and passed through an activation function before being mapped to an output.
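Here is a sketch of a single perceptron unit as described above: weighted inputs plus a bias, passed through a step activation, trained on the AND function with the classic perceptron learning rule. All numbers (learning rate, epoch count) are illustrative.

```python
def step(z):
    # Step activation: fire (1) if the weighted sum clears the threshold.
    return 1 if z >= 0 else 0

def predict(weights, bias, inputs):
    # Weighted sum of inputs plus bias, then the activation function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # AND gate
weights, bias, lr = [0.0, 0.0], 0.0, 0.1

for _ in range(20):   # a few passes over the data is plenty here
    for inputs, target in data:
        error = target - predict(weights, bias, inputs)
        # Perceptron learning rule: nudge weights toward correct output.
        weights = [w + lr * error * x for w, x in zip(weights, inputs)]
        bias += lr * error

print([predict(weights, bias, x) for x, _ in data])   # → [0, 0, 0, 1]
```

A single perceptron can only learn linearly separable functions like AND; stacking many of them into layers, with nonlinear activations, is what gives modern neural networks their power.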
I would highly recommend these detailed resources to gain a better understanding of how neural networks work:
With so many terms in artificial intelligence, I will do my best to keep this glossary up to date and add to it when I come across new concepts. There is so much to keep track of, which is why I love this field. Hopefully, this glossary is a good start for those just getting up to speed in artificial intelligence.