Zhang Chenxi

The Role of Math and Physics in AI

One might have heard of Stable Diffusion, one among the myriad of colourful text-to-image AI models on the market nowadays. Built on the architecture of a diffusion model, its association with Physics is glaringly obvious: just as its name suggests, it is heavily inspired by theories in non-equilibrium thermodynamics[1]. To put it in simpler and more palatable terms, diffusion models the movement of particles under a concentration gradient, so that the probability distribution of finding the fluid particles at each point in time can be calculated. The algorithm behind Stable Diffusion involves gradually adding Gaussian noise to the input images, then recovering images by learning to reverse that noising process.
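As a concrete illustration, here is a minimal NumPy sketch of the forward (noising) half of that process, assuming a simple linear variance schedule; the function and variable names are hypothetical and not Stable Diffusion's actual implementation.

```python
import numpy as np

def forward_diffusion(x0, t, betas):
    # Sample a noised image x_t from the clean image x0 in closed form:
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise.
    alpha_bar = np.cumprod(1.0 - betas)[t]   # fraction of signal surviving to step t
    noise = np.random.randn(*x0.shape)       # Gaussian noise, same shape as the image
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

betas = np.linspace(1e-4, 0.02, 1000)        # assumed linear noise schedule
x0 = np.random.uniform(-1.0, 1.0, (28, 28))  # placeholder image scaled to [-1, 1]
x_noisy = forward_diffusion(x0, t=500, betas=betas)
```

Training then teaches a network to undo each of these noising steps, so that generating a new image amounts to running the learned reverse process starting from pure noise.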

As a primer, AI systems are machines designed to mimic human behaviour, hence they are usually programmed to perform specific types of tasks, varying from predicting regression values to generating creative material like poetry and music. The breadth of AI alone has become unfathomable over recent years, and we might even be in the midst of an AI revolution: the much-sensationalised burgeoning of GPT models, which can perform a wide range of tasks from coding and solving complex problems to even writing plays. Hence, mathematical and physical concepts often lie at the heart of these enigmatic AI models.

Since machine learning has emerged as the dominant paradigm in AI, it will be the primary focus of this essay. It is not hard to spot the influence of Math and Physics behind some of the most fundamental machine learning algorithms. In fact, that is where the jocular claim that 'Machine Learning is simply just Math!' comes from, and it is not entirely baseless given the plethora of complicated equations at the crux of many modern models. Many of the underlying concepts in AI are rooted in Mathematics, most particularly Linear Algebra, Probability and Statistics. For one, forward propagation in a neural network, in which each layer takes the dot product of its weight matrix with the input vector, is a quintessential example of how a simple operation like matrix multiplication is instrumental to the mechanics of machine learning models. Most data involved in Machine Learning, such as images, texts, and user preferences, is organised as vectors as well. For instance, a black-and-white image is a grid of grayscale pixel intensities that can be flattened into an array of numbers. The 'Hello World' of Machine Learning projects, MNIST handwritten digit classification, most famously represents each image in terms of its pixels, passing the resulting vectors into a neural network for multiclass classification. Techniques like dimensionality reduction, used to process large numbers of input features, are also prevalent in Machine Learning: the dataset matrix is broken into its constituent parts through eigendecomposition or singular value decomposition, and its dimensions reduced by keeping only the most informative components. Linear algebra is thus used heavily both in the processing of data and in the implementation of the models themselves, as the two short sketches after this paragraph illustrate. Similarly, probability and statistical processes feature heavily, such as sampling with replacement to segregate training and cross-validation data. Loss functions to be minimised, such as least-squares error and binary cross-entropy, are likewise derived from established ideas in information theory and decision-making. Simply put, the contributions of Math to Machine Learning cannot be overlooked.
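First, a minimal sketch of the forward pass: one dense layer computed as a matrix-vector dot product and scored with binary cross-entropy. The shapes, initial values, and helper names here are illustrative assumptions, not any particular library's API.

```python
import numpy as np

def sigmoid(z):
    # Squash the raw dot-product output into a (0, 1) probability.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W, b):
    # Forward propagation through one dense layer: the dot product of the
    # weight matrix with the input vector, plus a bias, then an activation.
    return sigmoid(W @ x + b)

def bce_loss(y_true, y_pred, eps=1e-12):
    # Binary cross-entropy, the information-theoretic loss mentioned above.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

x = np.random.rand(784)             # e.g. a flattened 28x28 grayscale image
W = 0.01 * np.random.randn(1, 784)  # weights for a single output neuron
b = np.zeros(1)
print(bce_loss(np.array([1.0]), forward(x, W, b)))
```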
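Second, a sketch of SVD-based dimensionality reduction on a hypothetical dataset matrix, keeping only the top-k singular components:

```python
import numpy as np

X = np.random.rand(100, 50)    # hypothetical dataset: 100 samples, 50 features
Xc = X - X.mean(axis=0)        # centre each feature column
# Decompose the matrix into its constituent parts: X = U * diag(s) * Vt.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 10
X_reduced = Xc @ Vt[:k].T      # project onto the top-k right singular vectors
print(X_reduced.shape)         # (100, 10): 50 features compressed to 10
```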

As I have mentioned in the introduction, many principles behind Machine Learning models are also heavily inspired by physical phenomena. Gradient descent optimisation has been coupled with the concept of momentum[2], which damps oscillations in the updates along directions of high curvature and accelerates the optimisation process, helping the parameters pass through shallow local minima instead of becoming stuck in one. It is based on the physical picture of a moving particle that continues along its trajectory because of its momentum despite external damping forces acting on it; analogously, the accumulated update can carry the cost function past a local minimum towards the global minimum, reducing the time taken for optimisation. Of course, this is just one of the many examples of how Physics and Machine Learning are inextricably linked.
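A minimal sketch of that momentum update, applied to an illustrative one-dimensional objective (the hyperparameter values are placeholders):

```python
import numpy as np

def gd_momentum(grad_fn, w0, lr=0.01, beta=0.9, steps=1000):
    # The velocity v plays the role of a particle's inertia: it accumulates
    # past gradients, damping oscillation along directions of high curvature
    # and carrying the parameters through shallow local minima.
    w, v = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        v = beta * v - lr * grad_fn(w)   # exponentially decaying velocity
        w = w + v                        # step along the velocity, not the raw gradient
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2(w - 3).
w_star = gd_momentum(lambda w: 2.0 * (w - 3.0), np.array([0.0]))
print(w_star)   # converges to roughly [3.0]
```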

Having established that Physics and Math largely complement Machine Learning, I will proceed to expand on the role of Machine Learning in both Physics and Math, as well as how it has accelerated academic developments in both fields in recent years. In the paper 'Advancing Mathematics by Guiding Human Intuition with AI'[3], the authors demonstrated a rigorous framework whereby machine learning can be coupled with mathematical research to aid the discovery of patterns and the formulation of conjectures, something that AI happens to excel at. The researchers then elucidated the role of AI in the development of mathematical insights in topology and representation theory, where AI can help guide mathematicians' understanding of the phenomena they observe, dissecting the problems at hand with far more computing power. AI clearly plays a catalysing role in mathematical research, as its ability to find patterns, predict results, categorise, and make associations supersedes that of humans by a huge margin; it can hence serve as a useful tool at the frontier of Mathematics.

Likewise for Physics, a more recent example is the Higgs boson classification problem, where physicists around the world employ advanced Machine Learning models to characterise the signatures of the Higgs boson and make predictions about the particles observed. Owing to machine learning's inimitable ability to characterise complex states by drawing on huge amounts of data, it can help develop novel theories, especially in the realm of the arcane and the unseen, allowing physicists to discover new particles in particle physics or new planets in space, finding obscure correlations and patterns in the nooks and crannies of Physics. Overall, AI thus helps physicists make groundbreaking research discoveries previously thought impossible.

It is still largely speculative how future advances in Math and Physics can further improve AI technologies and research. Despite all of AI's immense capabilities, several drawbacks still hinder it from attaining its maximum potential, chief among them the difficulty of processing very large quantities of data efficiently, a quandary that many machine learning scientists have been expending effort to resolve. At the same time, it is possible that advances in quantum computing could provide AI with more efficient algorithms, much faster training times, and the ability to take in data at a far greater scale.

Thirty years ago, AI vowed to deliver; today we stand on the global stage, watching in awe as the lavish prospects of that vow are interwoven into the tapestry of reality. As a harbinger of technological progress, the AI revolution will only rage on more fervently, ushering in a new age of humanity.

References

  1. Sohl-Dickstein, J., Weiss, E. A., Maheswaranathan, N. & Ganguli, S. Deep Unsupervised Learning Using Nonequilibrium Thermodynamics. Proceedings of the 32nd International Conference on Machine Learning, PMLR 37, 2256–2265 (2015).
  2. Rumelhart, D., Hinton, G. & Williams, R. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
  3. Davies, A., Veličković, P., Buesing, L. et al. Advancing mathematics by guiding human intuition with AI. Nature 600, 70–74 (2021).