AI Models Promise Control—Until They Don’t

Thursday, July 10, 2025

FlexOlmo, a new AI model from the Allen Institute for AI (Ai2), is stirring the pot. Unlike traditional models, it lets data owners retain control over their contributions even after the model has been trained. Sounds revolutionary, right? Don’t get too excited just yet; we’ve seen this kind of hype before.

Here’s what this really means: Big AI firms have been vacuuming up data from every corner of the internet without blinking an eye at ownership issues. It’s like baking a cake and then claiming it was made from air. FlexOlmo wants to change that by letting data owners keep a grip on their ingredients after the cake is baked.

Ali Farhadi of Ai2 spells it out: once your data is fed into a model, it’s game over. You can’t pull it back out without an expensive retraining run. FlexOlmo’s answer is a modular training process: data owners train a separate sub-model on their own data, merge it with a public model, and can later extract that sub-model if needed. In theory, this gives you control over your data and how it’s used. But let’s not forget, it’s not the first time we’ve seen promises of control and transparency fall short once the lawyers get involved.
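To make that concrete, here’s a minimal sketch in Python of what “merge now, extract later” could look like when each owner’s contribution lives in its own module. The names here (CombinedModel, merge, extract) are hypothetical illustrations, not Ai2’s published API:

```python
# Hypothetical sketch: each data owner's contribution is a separate
# "expert" module; the combined model is the public model plus
# whichever expert modules are currently opted in.

class CombinedModel:
    def __init__(self, public_model):
        self.public_model = public_model   # trained on public data only
        self.experts = {}                  # owner_id -> privately trained module

    def merge(self, owner_id, expert_module):
        # "Merging" here is registration, not weight blending: the
        # owner's module contributes at inference but stays separable.
        self.experts[owner_id] = expert_module

    def extract(self, owner_id):
        # Opting out is a lookup-and-remove, not a retraining job.
        return self.experts.pop(owner_id)
```

The whole pitch rests on removal being that cheap; whether production systems keep contributions this cleanly separated is exactly the open question.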

Here’s the nuts and bolts: data owners train on their own data independently, using a shared public “anchor” model as the fixed starting point, then combine the result with the main model, so the raw data is never fully handed over. That means you can remove your contribution if legal issues arise or if you dislike how the model is being used. The training process is asynchronous, so data owners never need to coordinate with one another. Sounds ideal, but how many cooks can the kitchen hold before the soup spoils?
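Here is a rough sketch of what that asynchronous, anchored training could look like in PyTorch. Everything in it (the function name, the additive expert-as-correction setup) is an assumption for illustration, not Ai2’s actual training code:

```python
import torch
from torch import nn

def train_expert(anchor, expert, owner_data, epochs=1, lr=1e-4):
    """Train one owner's expert against a frozen, shared anchor model.

    The anchor's weights never change, so every data owner can run
    this independently, with no coordination. Only the expert's
    parameters are later shared; the raw data never leaves home.
    """
    for p in anchor.parameters():
        p.requires_grad_(False)             # anchor stays fixed
    opt = torch.optim.AdamW(expert.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in owner_data:
            logits = anchor(x) + expert(x)  # expert learns a correction
            loss = loss_fn(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return expert                           # this, not the data, gets merged
```

Because each owner trains against the same frozen anchor, the resulting experts stay compatible enough to combine later, which is what makes the no-coordination claim plausible on paper.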

FlexOlmo uses a “mixture of experts” architecture, blending sub-models into a larger, supposedly more capable one. Ai2’s big innovation is a way to merge sub-models that were trained independently. The team tested FlexOlmo using a dataset called Flexmix, drawn from books and websites, to build a model with 37 billion parameters, about a tenth the size of Meta’s largest open-source model. FlexOlmo reportedly outperformed the individual models and scored 10 percent better on benchmarks. But before you pop the champagne, remember: performance in controlled environments rarely mirrors real-world chaos.
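For readers who want the flavor of “mixture of experts,” here is a toy layer in PyTorch. It illustrates the generic architecture only, not Ai2’s 37-billion-parameter system:

```python
import torch
from torch import nn

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer (generic illustration, not FlexOlmo).

    A router scores each expert per input, and the output is the
    weighted sum of the expert outputs. Removing an expert from
    `self.experts` removes its influence without touching the rest.
    """
    def __init__(self, dim, num_experts):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_experts)]
        )

    def forward(self, x):
        weights = torch.softmax(self.router(x), dim=-1)            # (batch, E)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)   # (batch, dim, E)
        return (outs * weights.unsqueeze(1)).sum(dim=-1)           # (batch, dim)
```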

The takeaway? FlexOlmo offers a new way to think about AI training, letting you reclaim your metaphorical eggs from the cake. And, as Farhadi claims, opting out causes no major damage or delay to the model. It’s a fresh perspective, but caution is king.

Stanford’s Percy Liang sees this modular approach as a promising departure from treating language models like black boxes. The transparency in the development process is a breath of fresh air, but it’s a long road from open promises to delivery. And let’s not overlook the risk that training data could be reconstructed from the final model. Techniques like differential privacy might be needed to truly protect what’s yours, but we’ve heard that song before.
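“Differential privacy” gets invoked a lot, but at its core it is a clip-and-noise recipe. Here is the gradient step from textbook DP-SGD, heavily simplified; this is the generic technique Liang’s caution points toward, not something FlexOlmo is known to ship:

```python
import torch

def privatize_gradients(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1):
    """Clip-and-noise step from textbook DP-SGD (simplified sketch).

    `per_sample_grads` holds one flat gradient tensor per training
    example. Each is clipped to `clip_norm`, the clipped gradients
    are averaged, and Gaussian noise calibrated to the clip bound is
    added, limiting how much any single example can leak.
    """
    clipped = [
        g * min(1.0, clip_norm / (g.norm().item() + 1e-12))
        for g in per_sample_grads
    ]
    avg = torch.stack(clipped).mean(dim=0)
    noise = torch.randn_like(avg) * (noise_multiplier * clip_norm / len(per_sample_grads))
    return avg + noise
```

The price is accuracy: the noise that hides individual examples also blurs the signal, which is why “just add differential privacy” is easier said than done.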

The legal landscape for AI training data is getting murky. Publishers are either suing the big AI companies or cutting deals with them, as Condé Nast did with OpenAI. The June ruling in Meta’s favor over training on authors’ texts is a key moment, and a reminder that the legal system is playing catch-up with the tech.

Min from Ai2 believes FlexOlmo could unlock new collaborative models without sacrificing data privacy or control. The data bottleneck is real, and this might be a way through it. But, with AI, the devil’s in the details. Let’s see how this plays out before declaring victory.
