Life on Earth as we know it would not exist without the protein molecules that support vital biological functions, including photosynthesis, enzymatic degradation, sight, and our immune system. And as with other aspects of nature, humankind is still learning about all the different kinds of proteins that are actually out there. Rather than scouring the planet’s most inhospitable regions in search of novel microorganisms that might possess a new type of organic molecule, Meta researchers have created the ESM Metagenomic Atlas, a first-of-its-kind metagenomic database. It has the potential to accelerate existing protein-folding AI performance by a factor of 60.
The overlap between the name “metagenomics” and Meta is a coincidence. Metagenomics, the study of “the structure and function of complete nucleotide sequences extracted and studied from all the organisms (usually microorganisms) in a bulk sample,” is a relatively new but very real field of science. Its techniques are frequently used to detect the bacterial communities residing on our skin or in the soil, and they work similarly to gas chromatography in that you are trying to determine what is present in a given sample.
The NCBI, the European Bioinformatics Institute, and the Joint Genome Institute have all launched similar databases that have already compiled billions of previously unknown protein structures. According to a press release from the company, Meta is providing “a revolutionary protein-folding strategy that utilizes huge language models to generate the first comprehensive understanding of the structures of proteins in a metagenomics database at the scale of hundreds of millions of proteins.” The issue is that, even though advances in genomics have identified the sequences of a large number of novel proteins, simply knowing those sequences does not explain how they fit together to form a functional molecule, and working that out experimentally can take anywhere from a few months to a few years for each molecule. No one has time for that.
“The ESM Metagenomic Atlas will enable scientists to search and analyze the structures of metagenomic proteins at the scale of hundreds of millions of proteins,” the Meta research team wrote on TK. “This can help researchers to identify structures that have not been characterized before, search for distant evolutionary relationships, and discover new proteins that can be useful in medicine and other applications.”
Like a language, proteins are built from constituent parts (amino acids) that can be combined in any number of ways, but only a certain order yields a functional molecule, just as only certain word orders yield a coherent thought (a molecular sentence, so to speak). Although the analogy isn’t exact, Meta’s system significantly enhances our ability to understand the syntax and grammar of organic chemistry. According to the scientists, protein sequences fold into complicated three-dimensional shapes following the laws of physics, and research shows that those sequences contain statistical patterns that reveal details about the folded structure of the protein.
In particular, Meta’s Evolutionary Scale Modeling AI uses masked language modeling, a type of self-supervised learning, to treat protein sequences like a game of Mad Libs for O-Chem. “We trained a language model using the sequences of millions of natural proteins,” the research team stated. “With this approach, the model must accurately fill in the blanks in a passage of text, such as ‘To __ or not to __, that is the ____.’ Using millions of different proteins, we trained a language model to fill in the blanks in a protein sequence like ‘GL_KKE_AHY_G.’”
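To make the fill-in-the-blanks idea concrete, here is a minimal Python sketch of how a masked training example might be built from a protein sequence. It is not Meta's code: the function names, the 25% masking rate, the underscore mask symbol, and the example sequence are all assumptions chosen for illustration.

```python
import random

MASK = "_"  # assumed blank symbol, chosen for illustration


def mask_sequence(sequence, mask_fraction=0.15, seed=0):
    """Hide a random subset of residues.

    Returns the masked string plus a dict mapping each hidden
    position to the residue the model would have to recover.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(len(sequence) * mask_fraction))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    masked = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = sequence[pos]
        masked[pos] = MASK
    return "".join(masked), targets


def fill_in_accuracy(predictions, targets):
    """Fraction of hidden positions where the guess matches the true residue."""
    correct = sum(predictions.get(pos) == aa for pos, aa in targets.items())
    return correct / len(targets)


sequence = "GLSKKEDAHYQG"  # made-up sequence, not taken from any database
masked, targets = mask_sequence(sequence, mask_fraction=0.25)
print(masked)   # the sequence with three residues replaced by "_"
print(targets)  # the hidden residues, keyed by position
```

During training, a model would be scored on how often its guesses at the hidden positions match `targets`, which is what `fill_in_accuracy` measures.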
ESM-2, the resulting “protein language model,” has 15 billion parameters and is the largest model of its kind to date. On a cluster of about 2,000 GPUs, the “new structure prediction capacity enabled us to predict sequences for the more than 600 million metagenomic proteins in the atlas in just two weeks.” Well, forget about months and years.
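A quick back-of-the-envelope calculation shows what those figures imply for throughput. The sketch below uses only the rounded numbers quoted above (600 million proteins, about 2,000 GPUs, two weeks) and assumes the cluster ran continuously for the full period, so the result is a rough estimate rather than a reported benchmark.

```python
proteins = 600_000_000  # "more than 600 million", so this is a lower bound
gpus = 2_000            # "about 2,000 GPUs"
days = 14               # "two weeks"

seconds = days * 24 * 60 * 60
proteins_per_second = proteins / seconds
gpu_seconds_per_protein = gpus * seconds / proteins

print(f"{proteins_per_second:.0f} proteins per second across the cluster")  # 496
print(f"{gpu_seconds_per_protein:.1f} GPU-seconds per protein")             # 4.0
```

That works out to roughly four GPU-seconds per protein, set against the months or years the experimental route can take for each molecule.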
Gaming models are created by Auctoria using generative AI
Aleksander Caban, co-founder of Polish VR game developer Carbon Studio, noticed a major problem in modern game design several years ago. Manually creating rocks, hills, paths, and other video game environment elements was time-consuming and laborious.
Caban created tech to automate the process.
In collaboration with Michal Bugała, Joanna Zając, Karolina Koszuta, and Błażej Szaflik, he founded Auctoria, an AI-powered platform for creating 3D game assets. Auctoria, from Gliwice, Poland, is in Startup Battlefield 200 at Disrupt 2023.
Auctoria was founded on a passion for limitless creativity, according to Zając in an email interview. “It was designed to help game developers, but anyone can use it. Few advanced tools exist for professionals; most are for hobbyists and amateurs. We want to change that.”
Using generative AI, Auctoria creates various video game models. One feature generates basic 3D game levels with pathways, while another converts uploaded images and textures of walls, floors, and columns into 3D versions.
Like DALL-E 2 and Midjourney, Auctoria can generate assets from text prompts. Users can also submit a sketch, which the platform will try to turn into a digital model.
All AI algorithms and training data for Auctoria were developed in-house, according to Zając.
“Auctoria is based 100% on our content, so we’re not dependent on any other provider,” she said. The platform is independent: Auctoria doesn’t use open source or external engines.
In the emerging market for AI game asset generation tools, Auctoria isn’t alone. The 3DFY, Scenario, Kaedim, Mirage, and Hypothetic startups create 3D models. Even Nvidia and Autodesk are entering the space with apps like Get3D, which converts images to 3D models, and ClipForge, which generates models from text descriptions.
Meta also tried tech to create 3D assets from prompts. In December, OpenAI released Point-E, an AI that synthesizes 3D models for 3D printing, game design, and animation.
Given the size of the opportunity, the race to market new solutions isn’t surprising. According to Proficient Market Insights, the market for 3D models could be worth $3.57 billion by 2028.
According to Zając, Auctoria’s two-year R&D cycle has produced a more robust and comprehensive toolset than its rivals offer.
“Currently, AI-based software is lacking for creating complete 3D world models,” Zając stated. “3D editors and plugins offer only a fraction of Auctoria’s capabilities. Our team started developing the tool two years ago, giving us a ready-to-use product.”
Auctoria, like all generative AI startups, must deal with the legal issues surrounding AI-generated media. It is not yet clear how AI-generated works can be copyrighted in the U.S.
However, the Auctoria team of seven employees and five co-founders is putting off answering those questions for now. Instead, they’re piloting the tooling with game development studios like Caban’s Carbon Studio.
Before releasing Auctoria in the coming months, the company hopes to raise $5 million to “speed up the process” of creating back-end cloud services to scale the platform.
Zając stated that the funding would reduce the computing time required for creating worlds or 3D models with Auctoria. Achieving a software-as-a-service model requires both infrastructure and user experience enhancements, such as a simple UI, excellent customer service, and effective marketing. “We’ll keep our core team small, but we’ll hire more by year’s end.”
OpenAI’s DALL-E 3 lets artists opt out of training
Today, OpenAI unveiled an updated version of DALL-E, its text-to-image tool, which now uses ChatGPT, the company’s viral AI chatbot, to make prompting easier.
Most modern, AI-powered image generation tools turn prompts—image descriptions—into photorealistic or fantastical artwork. However, writing the right prompt is so difficult that “prompt engineering” is becoming a profession.
OpenAI’s new tool, DALL-E 3, uses ChatGPT to help fill in prompts. Subscribers to OpenAI’s premium ChatGPT plans, ChatGPT Plus and ChatGPT Enterprise, can type in an image request and refine it with the chatbot, receiving the results in the chat app.
ChatGPT can make a few-word prompt more descriptive, guiding the DALL-E 3 model.
DALL-E 3 adds more than ChatGPT integration. OpenAI claims that DALL-E 3 produces higher-quality images that reflect prompts more accurately, especially longer prompts. It also handles text and human hands better, two kinds of content that have previously tripped up image-generating models.
OpenAI claims DALL-E 3 has new mechanisms to reduce algorithmic bias and improve safety. For instance, DALL-E 3 will reject requests for an image in the style of a living artist or for a depiction of a public figure. Artists can now opt out of having their work used to train future OpenAI text-to-image models. (OpenAI and its rivals are being sued for using artists’ copyrighted work to train their generative AI image models.)
DALL-E 3 launches as the race in image-synthesizing generative AI heats up. Midjourney and Stability AI keep improving their image-generating models, putting pressure on OpenAI to keep up.
OpenAI will release DALL-E 3 to premium ChatGPT users in October, then to research labs and API customers. The company did not say when or if it would release a free web tool, as it did for DALL-E 2 and the original model.
Microsoft open-sources EvoDiff, a novel protein-generating AI
Proteins, the natural molecules that perform vital cellular functions, are at the root of all diseases. Characterizing proteins can reveal disease mechanisms and ways to slow or reverse them, while creating proteins can lead to new drug classes.
Designing proteins in the lab is currently intensive in both computation and human resources. It involves coming up with a protein structure that could perform a specific function in the body and then finding a protein sequence that could “fold” into that structure. To function, proteins must fold correctly into three-dimensional shapes.
It doesn’t have to be so complicated.
This week, Microsoft introduced EvoDiff, a general-purpose framework that it says can generate “high-fidelity,” “diverse” proteins given a protein sequence. Unlike other protein-generating frameworks, EvoDiff doesn’t need any structural information about the target protein, eliminating what is usually the most laborious step.
Microsoft senior researcher Kevin Yang says EvoDiff, which is open source, could be used to create enzymes for new therapeutics, drug delivery, and industrial chemical reactions.
Yang, one of EvoDiff’s co-creators, said in an email interview that the platform will advance protein engineering beyond the structure-function paradigm toward sequence-first design. EvoDiff shows that “protein sequence is all you need” to controllably design new proteins.
A 640-million-parameter model trained on data from many different species and functional classes of proteins underpins EvoDiff. “Parameters” are the parts of an AI model learned from training data that define its skill at a problem, in this case protein generation. The model was trained using OpenFold sequence alignment data and UniRef50, a subset of data from UniProt, the database of protein sequence and functional information maintained by the UniProt consortium.
EvoDiff is a diffusion model, similar in architecture to many modern image-generating models such as Stable Diffusion and DALL-E 2. It learns to gradually subtract noise from a starting protein made almost entirely of noise, moving it step by step closer to a protein sequence.
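The step-by-step denoising idea can be sketched in a few lines of Python. This is a toy illustration only, not EvoDiff's code or API. It assumes that "noise" for a discrete sequence means a string of unknown placeholder symbols, and `toy_denoiser` is a hypothetical stand-in that guesses at random where a trained network would make an informed prediction.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acid letters
MASK = "#"                            # assumed symbol for a still-noisy position


def toy_denoiser(sequence, position, rng):
    """Hypothetical stand-in for a trained model.

    A real model would look at the partly filled sequence and predict a
    likely residue; this placeholder just picks one uniformly at random.
    """
    return rng.choice(AMINO_ACIDS)


def generate(length, denoiser=toy_denoiser, seed=0):
    """Start from an all-noise sequence and resolve one position per step."""
    rng = random.Random(seed)
    sequence = [MASK] * length
    order = list(range(length))
    rng.shuffle(order)  # positions are resolved in a random order
    for position in order:
        sequence[position] = denoiser(sequence, position, rng)
    return "".join(sequence)


print(generate(30))
```

Because the stand-in guesses at random, the output is not a meaningful protein; the sketch only shows the shape of the loop, in which every pass leaves the sequence slightly less noisy than before.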
Beyond image generation, diffusion models are being used to design novel proteins, as EvoDiff does, as well as to create music and synthesize speech.
“If there’s one thing to take away [from EvoDiff], I think it’s this idea that we can — and should — do protein generation over sequence because of the generality, scale, and modularity we can achieve,” Microsoft senior researcher Ava Amini, another co-contributor, said via email. “Our diffusion framework lets us do that and control how we design these proteins to meet functional goals.”
EvoDiff can create new proteins and also fill “gaps” in an existing protein design, as Amini noted. Given a part of a protein that binds to another protein, the model can generate an amino acid sequence around that part that meets a set of criteria.
Because EvoDiff designs proteins in “sequence space” rather than structure space, it can also synthesize “disordered proteins” that don’t fold into a final three-dimensional structure. Like normally functioning proteins, disordered proteins play a role in biology and disease, enhancing or decreasing the activity of other proteins.
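Gap filling of this kind can be pictured as generation with some positions held fixed. The Python sketch below is an assumption-laden toy rather than EvoDiff's method: `scaffold_motif` is a hypothetical helper, the motif and lengths are made up, and random choice stands in for the trained model that would pick residues to satisfy the design criteria.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acid letters
MASK = "#"                            # assumed symbol for an unfilled position


def scaffold_motif(motif, motif_start, total_length, seed=0):
    """Keep a known binding motif fixed and fill in the rest of the sequence."""
    if motif_start < 0 or motif_start + len(motif) > total_length:
        raise ValueError("motif does not fit inside the requested length")
    rng = random.Random(seed)
    sequence = [MASK] * total_length
    # Pin the motif in place; these positions are never regenerated.
    sequence[motif_start:motif_start + len(motif)] = list(motif)
    open_positions = [i for i, aa in enumerate(sequence) if aa == MASK]
    rng.shuffle(open_positions)
    for position in open_positions:
        # Random choice is a placeholder for a model's prediction.
        sequence[position] = rng.choice(AMINO_ACIDS)
    return "".join(sequence)


designed = scaffold_motif("HEAGH", motif_start=10, total_length=40)
print(designed)
print(designed[10:15])  # the motif is preserved exactly
```

The design choice worth noting is that the motif is written into the sequence before any generation happens, so the surrounding residues are always produced with the fixed part already in place.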
EvoDiff research isn’t peer-reviewed yet. Microsoft data scientist Sarah Alamdari says the framework needs “a lot more scaling work” before it can be used commercially.
“This is just a 640-million-parameter model, and we may see improved generation quality if we scale up to billions,” Alamdari said via email. “We demonstrated some coarse-grained strategies, but to achieve even finer control, we would want to condition EvoDiff on text, chemical information, or other ways to specify the desired function.”
Next, the EvoDiff team will test the proteins the model generated in the lab to determine whether they are viable. If they are, the team will start work on the next generation of the framework.