The energy consumption of large language models (LLMs) like ChatGPT and Bard scales with model size: larger models require extensive computational power for both training and inference. Spanish quantum AI software startup Multiverse Computing has recently secured funding to enhance its CompactifAI platform, which uses quantum-inspired tensor networks (TNs) to create smaller, more efficient models that reduce energy usage and computational demands.
Just nine months after completing an oversubscribed €25 million funding round, Multiverse Computing has received additional undisclosed investment from CDP Venture Capital, an early-stage investment firm based in Rome. This financing is part of a Series A round facilitated through the Corporate Partners I fund, which focuses on the ServiceTech and EnergyTech sectors. Notable Italian corporations such as Baker Hughes, BNL BNP Paribas, Edison, GPI, Italgas, Snam, and Terna Forward participated as limited partners.
The investment is aimed at expanding Multiverse Computing's commercial presence in Italy, where the company plans to grow its Milan office, establish partnerships with public and private sectors, and attract new talent.
In a conversation with EE Times Europe, Gianni Del Bimbo, COO of Multiverse Computing, explained the rationale behind seeking investment in Italy. “We opted to expand into Italy to strategically enhance our presence in Europe and enter a new G7 market,” he stated.
Del Bimbo highlighted the company's interest in forming partnerships across Italy's public and private sectors, including collaborations with leading Italian firms and public entities. “We are eager to work with many top private companies in Italy and build lasting relationships with public institutions and universities to tap into the vast talent potential,” he noted. “We are particularly interested in fostering academic connections with Italian universities and students.”
Additionally, Multiverse Computing plans to utilize the funds to strengthen existing collaborations, including a project with the European High-Performance Computing Joint Undertaking and the Leonardo supercomputer. Managed by Cineca, a consortium comprising universities, national research agencies, research hospitals, and ministries in Italy, the Leonardo supercomputer currently ranks 9th on the TOP500 list.
Multiverse Computing has been granted access to GPU node hours on Leonardo to benchmark and improve its CompactifAI technology. This partnership aims to validate the energy efficiency and performance of CompactifAI against leading models like Meta’s LLaMA family.
Launched earlier this year, CompactifAI is a technique for compressing LLMs based on quantum-inspired TNs. According to Multiverse Computing's technical paper, this method involves “tensorizing the self-attention and multi-layer perceptron layers using a specific TN, which truncates the correlations present in the model. The degree of truncation can be controlled via the bond dimension of the TN, allowing for a significant reduction in the size of the LLM while maintaining accuracy. In practice, the compressed model requires less energy and memory, making operations like training, retraining, and inference more efficient.”
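CompactifAI's exact tensorization scheme is proprietary, but the core idea of truncating a layer's correlations via a bond dimension can be sketched with the simplest tensor-network factorization: a truncated SVD of a weight matrix, which splits one large matrix into two thin ones joined by a "bond" of controllable size. The layer shapes and bond dimension below are illustrative, not taken from Multiverse Computing's paper.

```python
import numpy as np

def compress_layer(W: np.ndarray, bond_dim: int):
    """Factor W (d_out x d_in) into two thin matrices A @ B, keeping only
    `bond_dim` singular values -- the analogue of a TN bond dimension.
    A smaller bond dimension truncates more correlations and shrinks the
    layer further, at some cost in accuracy."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :bond_dim] * s[:bond_dim]   # shape: d_out x bond_dim
    B = Vt[:bond_dim, :]                 # shape: bond_dim x d_in
    return A, B

# A hypothetical MLP weight matrix of the kind found in transformer blocks
rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 1024))

A, B = compress_layer(W, bond_dim=128)
original = W.size                  # 4096 * 1024 = 4,194,304 parameters
compressed = A.size + B.size       # 4096*128 + 128*1024 = 655,360
print(f"parameters: {original} -> {compressed} "
      f"({100 * (1 - compressed / original):.0f}% reduction)")
```

With a bond dimension of 128, the factored layer holds roughly 84% fewer parameters, in the same range as the 70% to 80% reductions the company reports for its full TN scheme.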
LLMs, particularly large-scale models such as GPT-3 and GPT-4, are notorious for their high computational requirements, leading to substantial energy consumption. Research from the Washington Post and the University of California, Riverside, revealed that generating a 100-word email with an AI chatbot using GPT-4 consumes 0.14 kWh of electricity, equivalent to powering 14 LED light bulbs for an hour.
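The cited equivalence is easy to verify: 0.14 kWh delivered over one hour is 140 W of continuous draw, which matches 14 LED bulbs only if each bulb is rated at about 10 W (a common rating for a 60 W-equivalent LED). A quick arithmetic check:

```python
# Sanity check on the Washington Post / UC Riverside figure cited above.
energy_kwh = 0.14   # energy to generate one 100-word email with GPT-4
hours = 1.0         # duration the bulbs are powered
bulbs = 14          # number of LED bulbs in the comparison

watts_per_bulb = energy_kwh * 1000 / hours / bulbs
print(watts_per_bulb)  # 10.0 -- watts per bulb, a typical LED rating
```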
When asked how CompactifAI could help mitigate rising energy consumption, Del Bimbo explained, “CompactifAI reduces the number of parameters in a large language model by 70% to 80%. This significant reduction accelerates both the training process and inference time, cutting memory requirements by 93%, training time by 50%, and inference time by 25%. These optimizations reduce the computational power needed to run these models, which in turn lowers energy consumption.”
For Multiverse Computing, smaller LLMs are essential for reducing energy consumption and can be deployed on-premises without relying on cloud servers. “If these models are smaller and more efficient, companies can run them on existing hardware on-premises instead of in a data center,” Del Bimbo said.
The energy consumption of LLMs has reached unsustainable levels, prompting companies to explore alternative solutions. Del Bimbo pointed to a notable example: Microsoft’s decision to reopen a previously closed nuclear power plant in Pennsylvania to secure a dedicated energy supply. “The company has contracted to purchase all the energy generated by the plant,” he noted. “Moreover, companies that integrate large language models into their products and services quickly recognize the limitations imposed by the enormous size and computational demands of these models. Use cases become restricted due to operational requirements, including the need for a reliable and continuous connection to cloud computing resources.”
Currently, the Leonardo supercomputer consists of two partitions: the Data Centric module, which includes 1,536 compute nodes based on Intel Sapphire Rapids CPUs, and the Booster module, which has 3,456 compute nodes utilizing Nvidia A100 GPUs.
In an interview with EE Times Europe earlier this year, Sanzio Bassini, director of the HPC division at Cineca, stated, “We are in the process of adding a third partition specifically for AI. This is part of a significant upgrade we expect to be operational by early 2025. The new partition will be named LISA, an acronym for Leonardo Improved Supercomputer Architecture, cleverly retrofitted to fit the name Mona Lisa.”
When asked if Multiverse Computing would be involved in the LISA upgrade, Del Bimbo remarked that the company is not part of that project “yet.”