Chemical Space to Material Discovery: Simulations and Machine Learning Leading the Way

How astronomical is chemical space, really? When I think of the word space, my mind drifts to that marvel of a book by Arthur C. Clarke. Truly, a piece of art. And who could forget the movie by Kubrick that followed? Space is vast, cold, and intimidating. But it’s also thrilling and brimming with possibilities. Recently, while preparing a lecture for an event within the Berlin Science Week, I found myself linking the concept of space with chemistry. Not quite in the astrochemistry sense; I’m referring to the so-called chemical space, the space of all permutations of elements that can form an organic or inorganic molecule, a film, a solid, or a supramolecular assembly. The sheer count of possible combinations is genuinely breathtaking. So, just how astronomical is chemical space? The answer is clear: immensely so. And it’s from this immense chemical space, echoing Plato’s realm of Ideal Forms, that we identify, rationalize, design and discover the materials of the future. Wondering how? Well, that’s precisely the topic of today’s story. So, don your space suit, and let’s fly off.

New Materials, Greener Future

Imagine materials that adapt like organisms, adjusting their properties in response to their environment. Or materials with shape-memory that revert to their original form with just a touch of warmth or that self-heal, much like a scraped knee.

Sounds like science fiction, doesn’t it? Yet, these aren’t tales from a galaxy far, far away. Such materials are already here, among us.

From the ancient endeavors of the Bronze Age, through the ingenuity of the Romans, material discovery has sprinted into our high-tech era. Just look around you: your phone, your car, even your clothes. Every new discovery, every tweak of innovation, has added a new chapter, transforming our world bit by bit.

And the journey continues. Today’s R&D is buzzing with hard work: next-gen battery materials, biodegradable polymers, superior alloys, and fabulous 2D materials—all aiming to forge a greener, safer, and simply better future.

But how have we pulled these materials out of the chemical space up to now?

The answer lies within science and technology. But, let’s be clear, the journey has been anything but a stroll in the park. It’s a challenging coming together of knowledge, intuition, and hands-on experimentation.

So, while science and technology have been pivotal, serendipity and the age-old trial and error remain as cornerstones in material discovery.

What does trial and error mean in practice?

Chemical space: the Edisonian way

They call it the Edisonian approach. As in Thomas Edison. With his exhaustive tests searching for the perfect light bulb filament, Edison embodied the trial-and-error method. Embracing every failure, he reportedly described his 10,000 unsuccessful attempts as simply having found 10,000 ways that didn’t work. He was focused on practical, market-ready solutions, grounded in relentless persistence and a preference for empirical evidence over mere theory.

Now, there’s nothing wrong with that approach. And yet, Edison’s trial and error has its pitfalls.

Imagine scientists trying to discover the next prodigious material. They will draw on their own repertoire, maybe tweaking a molecule here or substituting an element there, aiming to boost the material’s performance. This isn’t just throwing darts in the dark; it’s firmly grounded in expertise and knowledge. Yet, it’s painstakingly incremental. That step-by-step nature, paired with the unpredictable twists and turns of chemical space combinatorics, synthesis, and production processes, often nudges scientists to refine what’s familiar rather than forge new paths.

The looming fear? Burning time and resources on a wild goose chase.

The irony here is that game-changing discoveries, those that introduce us to cutting-edge material categories, often happen more by serendipity than design.

Think about cisplatin for a moment, the first metal-based anticancer drug introduced into clinical use. Or graphene, the wonder material of our time. Both? Classic cases of happy accidents in science. Rosenberg, in his eureka moment, recognized that the platinum compound that had stopped his bacteria from dividing might do the same to cancer cells, and saw in this a huge potential for medicine. Then you’ve got Geim and Novoselov, who, playing with Scotch tape and graphite, realized that they’d isolated single layers of carbon atoms: the first 2D material.

Pure genius at play, wouldn’t you agree?

The point is, the Edisonian approach, with its mix of serendipity and trial and error, has paid off. But to truly propel material discovery into the next frontier—to leap miles ahead instead of taking incremental steps—it’s becoming glaringly obvious that we need a new ace up our sleeve.

But what’s really tripping us up with this Edisonian approach? Can’t we simply brute force our way through and systematically screen all possibilities to find that next extraordinary molecule, phenomenal material, or miraculous drug?

In theory. But such a task might take longer than the universe’s lifetime. Yes, you heard me right. The chief challenge? The sheer enormity of chemical space.

And that, my friend, is the black hole we’re about to dive into.

The infinity of chemical space

Let’s say that you’re on a quest, zapping through space, attempting to analyze each star in the universe. You’re examining size, color, even assessing their age. Pondering if it’s a single star or perhaps part of a double or triple system. Sounds daunting, right? Especially when there’s an overwhelming 10²² to 10²⁴ stars out there in the observable universe.

Well, buckle up, because material discovery is pretty much like that.

Only worse.

Trying to grasp the endless possibilities of molecular configurations is like… well, trying to count stars. Just in the world of small organic molecules, estimates range from 10²² to 10⁶⁰ possible combinations!

Every time I stop and really think about that number, man, it blows my darn mind.
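
To make the brute-force problem concrete, here is a back-of-envelope sketch in Python. It takes the upper estimate of 10⁶⁰ small organic molecules and an assumed screening rate of one billion candidates per second. That rate is a made-up, generous figure for illustration, not a benchmark of any real method.

```python
# Back-of-envelope estimate; the screening rate is an assumed, illustrative figure.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # roughly 13.8 billion years


def brute_force_years(n_candidates: float, rate_per_second: float) -> float:
    """Years needed to evaluate every candidate once at a fixed rate."""
    return n_candidates / rate_per_second / SECONDS_PER_YEAR


# Upper estimate for small organic molecules: 10^60.
# Assumed rate: one billion candidates per second (hypothetical).
years = brute_force_years(1e60, 1e9)
universe_lifetimes = years / AGE_OF_UNIVERSE_YEARS

print(f"{years:.1e} years, about {universe_lifetimes:.1e} times the age of the universe")
```

Even at that optimistic pace, the tally comes out at roughly 3 × 10⁴³ years, which is why screening everything one by one is off the table.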

In fact, the vastness of our universe seems almost dwarfed in comparison, particularly as we venture into the combinations of organic and inorganic chemical systems shaping films, fibers, membranes, gels, or complex chemical superstructures. With each new combination, the universe of potential discoveries grows faster than the speed of light.

Consider battery materials, for example. Hunting for that perfect electrode is like finding that one special star with the ideal balance of energy density, safety, and efficiency. To put it in numbers, we’re talking an astronomical 10¹⁰⁰ possible permutations for battery materials.

Think polymers might be easier? Think again. Polymers emerge from a quasi-endless permutation of molecular chains, cross-linking combinations, and structures, each promising something unique. Good luck predicting when the next Eureka! moment will come. The story isn’t much different when it comes to alloys, pharmaceuticals, and 2D materials, to name a few.

So, here’s the million-dollar question: with such an astronomical chemical space, how do we pick out the next superstar materials with just the right features?

Perhaps the secret to cracking this puzzle is extending our human brains with a dash of tech.

How? Let’s find out.

Further reading:
▸ Navigating Materials Chemical Space to Discover New Battery Electrodes Using Machine Learning, 2023
▸ Estimation of the Size of Drug-Like Chemical Space Based on GDB-17 Data, 2013
▸ polyBERT: A Chemical Language Model to Enable Fully Machine-Driven Ultrafast Polymer Informatics, 2023

Simulations and Machine Learning: From Chemical Space to Material Discovery

In the reactive world of chemical research, where discoveries lurk around every corner, computational chemistry and, more generally, simulations have boldly claimed their spot as the third pillar of R&D. They’re right up there, shoulder to shoulder with the seasoned veterans: theory and experiment. Now, between the razor-sharp precision of quantum chemistry and the complex models of force-field-based simulations, there’s a fresh face making waves: machine learning.

In From Atoms To Words, we’ve seen how quantum chemistry can describe, predict, and guide us through unknown reaction pathways. We’ve explored how multiscale simulations can unravel emerging chemical behaviors, from the quantum level all the way up to the mesoscale. We’ve also witnessed the ascendant power of machine learning to predict, screen, and design a dazzling array of chemical systems.

Now, with our current goal in mind—discovering new materials from an expansive spectrum of chemical possibilities—a hot question bubbles to the surface: How can we use this computational power to navigate through the vastness of chemical space?

Simply put, our game is to fine-tune and employ a holistic computational approach, from quantum to AI, in our pursuit to identify, rationalize, discover and design new materials.

Identify: High-Throughput Computation to Screen Chemical Space

By merging the atomistic insight of simulations with the swift response of machine learning, we can fast-track our efforts, identifying potential material gems. High-throughput computational screening is our North Star in the enormous chemical space, focusing our path towards a selected group of material candidates. It narrows an overwhelming number down to a smaller, more manageable set, ready for further computational scrutiny or real-world experiments.
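
The screening funnel can be pictured with a minimal sketch: score a large library with something cheap, keep only the best few, and hand those over for closer study. Everything here is a placeholder. The candidate labels are hypothetical, and the random `toy_scores` stand in for what would really be a trained machine learning model or a low-cost simulation.

```python
import heapq
import random
from typing import Callable, Iterable


def screen(candidates: Iterable[str],
           cheap_score: Callable[[str], float],
           keep: int) -> list[str]:
    """Return the `keep` highest-scoring candidates under a fast surrogate."""
    return heapq.nlargest(keep, candidates, key=cheap_score)


# Hypothetical candidate labels and a stand-in surrogate score.
# In practice the score would come from an ML model or a cheap simulation.
rng = random.Random(0)
library = [f"candidate_{i}" for i in range(100_000)]
toy_scores = {name: rng.random() for name in library}

shortlist = screen(library, toy_scores.__getitem__, keep=50)
print(len(library), "->", len(shortlist))
```

The shortlist, not the full library, is what moves on to expensive simulations or the lab bench. How aggressive the cut should be depends on how much you trust the cheap score.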

Rationalize: Multiscale Simulations to Zoom Into the Atomistic Level

Historically, chemical simulations equipped scientists with the tools to describe properties, clarify experimental outcomes, and rationalize chemical mechanisms. But as technology in hardware, software, and algorithms has advanced, these simulations are transitioning from being purely descriptive to exhibiting predictive prowess. So, how can we use multiscale simulations to boost material discovery? We leverage simulations to understand why some material candidates, identified through our initial quick-and-dirty screening within the chemical space, shine brighter than others. With these atomistic-level insights, we can derive the fundamental ‘laws of chemistry’ that govern our materials, build a foundational library of key chemical features, and compile a rich dataset to refine our machine learning models.

Discover & Design: Machine Learning for Inverse Design

Inverse design is the forward leap in materials discovery. It’s a proactive stance: envision the properties first, then seek the material to match. At the heart of this strategy lies machine learning. When properly trained, a machine learning model excels at capturing the nuances of processing-structure-properties-performance relationships. This equips researchers with the tools to retrieve potential materials out of an infinity of possible permutations. Depending on our goal and confidence in the outcome, our next steps are either to employ multiscale simulations for a deeper understanding or to dive directly into real-world experiments.
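
As a toy illustration of the "properties first, material second" idea, the sketch below fixes a target property value and searches for the input that a forward model maps closest to it. The forward model `surrogate_property` is an invented formula standing in for a trained machine learning model, and plain random search stands in for the generative or optimization methods used in real inverse-design work.

```python
import random


def surrogate_property(x: float) -> float:
    """Toy forward model: composition fraction x -> predicted property.

    A stand-in for a trained ML model; the formula is purely illustrative.
    """
    return 4.0 * x * (1.0 - x) + 0.5 * x


def inverse_design(target: float, n_samples: int = 10_000, seed: int = 0):
    """Random search for the composition whose predicted property is closest to target."""
    rng = random.Random(seed)
    best_x, best_err = None, float("inf")
    for _ in range(n_samples):
        x = rng.random()  # candidate composition fraction in [0, 1)
        err = abs(surrogate_property(x) - target)
        if err < best_err:
            best_x, best_err = x, err
    return best_x, best_err


x, err = inverse_design(target=1.0)
print(f"proposed composition x={x:.3f}, predicted property={surrogate_property(x):.3f}")
```

Note that this toy model reaches the target at two different compositions, so the search returns just one of several valid answers. Real inverse design faces the same ambiguity, which is one reason the proposed candidates still go through simulation or experiment.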

A final personal touch

In today’s R&D landscape, the mission is clear: to discover new materials that will pave the way to a greener and more sustainable future. Progress has been steady yet laboriously slow. So, what are the essentials to reach that future faster?

We must move beyond trial and error and embrace data-driven discovery to advance to the next level of material innovation: a systematic and efficient screening of chemical space. This would help us identify the combinations of atoms that give rise to the chemical systems or materials with the properties we seek.

If only it were that simple.

See, a material’s story isn’t solely about its properties. It’s a messy mix of interactions, production processes, and environmental conditions. It’s a tale more akin to a James Joyce novel than a dry shopping list. And our computational methods, for all their speed and scalability, sometimes struggle with such complexity, forcing us to oversimplify our models.

And yet, when we astutely craft a model of a real-world system that is both computationally feasible and reliable, we get closer to capturing the genuine essence of the material. The final frontier? Making sure our computational successes translate to actual, tangible, innovative materials.

Sure, the chemical space might seem intimidating, and it is. But it’s also thrilling, as with each step forward, our computational techniques will become even more central to how we identify, rationalize, discover and design the materials of tomorrow.

If you enjoyed this dive into the enormity of chemical space and the role of computation in material discovery, I’d love to hear your thoughts. Agree, disagree, or have a totally wild theory of your own? Let’s connect! Subscribe to my LinkedIn newsletter and let’s keep the conversation rolling.