Research Log #018

Welcome to Research Log #018, where we document weekly research progress across the various initiatives in the Manifold Research Group. Also, stick around for the “Pulse of AI” section, which highlights breakthroughs from the broader research community that we think are interesting!

We’re growing our core team and pursuing new projects. If you’re interested in working together, join the conversation on Discord and check out our GitHub.

NEKO

The NEKO Project aims to build the first large-scale, open-source "Generalist" Model, trained on numerous modalities including control and robotics tasks. You can learn more about it here.

Language Modality Merge: One of NEKO's modalities is language/text. Our Language Branch is now stable and has been merged into our main branch. In the near future, we intend to test on the Pile dataset, so stay tuned! Check out the PR here.
Multimodal Experiments: We’ve started running small cross-modal training experiments to evaluate how the NEKO architecture performs on multiple kinds of data at the same time. These early experiments also help us find the right hyperparameters for scaling our model architecture to large datasets, which we hope to do soon. Thanks to Bhavul from the NEKO team for spearheading this effort, with support from Henry!
Datasets: Several benchmarks exist for LLMs, but benchmarks for multimodal performance are rare. We’ve started a survey of multimodal benchmarks to see which could later be applied to NEKO, starting with MultiBench and MMBench. More can be found here.

Agent Forge

The AgentForge Project aims to build models, tools, and frameworks that allow anyone to build much more powerful AI agents capable of using tools and interacting with the digital and physical worlds.

Agent Survey: We’re currently exploring ideas from the Cognitive Science and Reinforcement Learning literature to better identify and inform new capabilities in multimodal AI agents, including more rigorous notions of autonomous and “intelligent” agents. We’d be thrilled for the community to join us in this survey. Share your thoughts here!
New Project - Autonomous Digital Tool Use Agents: We’re spinning up a new project to build a large-scale model capable of using digital tools, software, and APIs. Check out these slides to get a flavor of the project. If you have expertise in building large-scale foundation models and are interested in building an open-source agent capable of tool use, join the conversation here!

Pulse of AI

Emu Video and Emu Edit: Generative AI techniques for video creation often produce discontinuities, with frames morphing inconsistently, so the resulting video lacks smooth transitions and coherence. Emu is a state-of-the-art generative method that uses diffusion across several frames to generate videos with greater continuity. Read more here.
GraphCast: As mentioned in Research Log #017, Google DeepMind has been working on weather prediction models. This week, they open-sourced a new model that can more accurately predict weather across the globe! Read more here.

If you want to see more of our updates as we work to explore and advance the field of Intelligent Systems, follow us on Twitter, LinkedIn, and Mastodon!