Technology


This is the official technology community of Lemmy.ml for all news related to the creation and use of technology, and for facilitating civil, meaningful discussion around it.


Ask in a DM before posting product reviews or ads; such posts are otherwise subject to removal.


Rules:

1: All Lemmy rules apply

2: No low-effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived versions as sources, NOT screenshots. This helps blind users.

5: Personal rants about Big Tech CEOs like Elon Musk are unwelcome (this does not include posts about their companies affecting a wide range of people)

6: No advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: Crypto-related posts, unless essential, are disallowed

The paper exposes how brittle current alignment techniques really are when you shift the input distribution slightly. The core idea is that reformatting a harmful request as a poem using metaphors and rhythm can bypass safety filters optimized for standard prose. It is a single-turn attack, so the authors did not need long conversation histories or complex setups to trick the models.

They tested this by manually writing 20 adversarial poems in which the harmful intent was disguised in flowery language, and by using a meta-prompt on DeepSeek to automatically convert 1,200 standard harmful prompts from the MLCommons benchmark into verse. The theory is that the poetic structure acts as a distraction: the model focuses on the complex syntax and metaphors, which disrupts the pattern-matching heuristics that usually flag harmful content.

The performance gap they found is massive. While standard prose prompts had an average Attack Success Rate (ASR) of about 8%, converting those same prompts to poetry jumped the success rate to around 43% across all providers. The hand-crafted set was even more effective, with an average success rate of 62%. Some providers handled this much worse than others: Google's gemini-2.5-pro failed to refuse a single prompt from the curated set (a 100% success rate), with DeepSeek models right behind it at roughly 95%. OpenAI and Anthropic were generally more resilient, with GPT-5-Nano scoring a 0% attack success rate.
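For context, the ASR metric is just the fraction of attempts a model answers instead of refusing. Here is a minimal sketch of how such a table gets tallied, assuming each attempt has already been labeled by a refusal judge; the records and field names below are made up for illustration:

```python
from collections import defaultdict

# Made-up records; in the paper, each attempt is labeled by a judge
# (human or classifier) as refused or answered.
attempts = [
    {"model": "gemini-2.5-pro", "style": "poetry", "refused": False},
    {"model": "gemini-2.5-pro", "style": "prose",  "refused": True},
    {"model": "gpt-5-nano",     "style": "poetry", "refused": True},
]

def attack_success_rate(records):
    """ASR = non-refused attempts / total attempts, per (model, style)."""
    totals, hits = defaultdict(int), defaultdict(int)
    for r in records:
        key = (r["model"], r["style"])
        totals[key] += 1
        hits[key] += not r["refused"]  # a successful attack is a non-refusal
    return {k: hits[k] / totals[k] for k in totals}

for (model, style), asr in sorted(attack_success_rate(attempts).items()):
    print(f"{model:16s} {style:7s} ASR = {asr:.0%}")
```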

This leads to probably the most interesting finding: what the authors call the scale paradox. Smaller models were actually safer than the flagship models in many cases; for instance, claude-haiku was more robust than claude-opus. The authors hypothesize that smaller models may lack the capacity to fully parse the metaphors or the stylistic obfuscation: the model may be too limited to understand the hidden request in the poem, so it defaults to a refusal or simply fails to produce the harmful output. It basically suggests safety training is heavily overfitted to prose, so if you ask for a bomb recipe in iambic pentameter, the model is too busy being a poet to remember its safety constraints.

That score is seriously impressive because it actually beats the average human performance of 60.2% and completely changes the narrative that you need massive proprietary models to do abstract reasoning. They used a fine-tuned version of Mistral-NeMo-Minitron-8B and brought the inference cost down to an absurdly cheap level compared to OpenAI's o3 model.

The methodology is really clever because they started by nuking the standard tokenizer and stripping it down to just 64 tokens, to stop the model from accidentally merging digits and confusing itself. They also leaned heavily on test-time training, where the model fine-tunes itself on the few example pairs of a specific puzzle for a few seconds before trying to solve the test input. For the actual generation, they ditched standard sampling for a depth-first search that prunes low-probability paths early so they do not waste compute on obvious dead ends.
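The depth-first part is easier to see in code than in prose. Below is a toy sketch of the idea, not the paper's implementation: next_token_probs() is a made-up stand-in for a real model's softmax output, and only the search logic (expand the most likely tokens first, abandon any branch whose cumulative probability falls below a floor) mirrors what the paper describes:

```python
import math

EOS = "<eos>"

def next_token_probs(prefix):
    """Hypothetical toy 'model' over three tokens: favors 'a' early, EOS later."""
    if len(prefix) >= 3:
        return {EOS: 0.9, "a": 0.05, "b": 0.05}
    return {"a": 0.6, "b": 0.3, EOS: 0.1}

def dfs_decode(prefix=(), logp=0.0, min_logp=math.log(1e-3), results=None):
    """Depth-first search over continuations, pruning any branch whose
    cumulative probability drops below exp(min_logp)."""
    if results is None:
        results = []
    # Visit higher-probability tokens first so good paths complete early.
    for tok, p in sorted(next_token_probs(prefix).items(), key=lambda kv: -kv[1]):
        new_logp = logp + math.log(p)
        if new_logp < min_logp:
            continue  # prune: the whole subtree is an obvious dead end
        if tok == EOS:
            results.append((prefix, new_logp))  # a complete sequence survived
        else:
            dfs_decode(prefix + (tok,), new_logp, min_logp, results)
    return results

for seq, lp in sorted(dfs_decode(), key=lambda x: -x[1]):
    print(" ".join(seq) or "<empty>", f"p={math.exp(lp):.3f}")
```

The nice property is that pruning one prefix kills every continuation of it at once, which is where the compute savings come from.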

The most innovative part of the paper is their Product of Experts selection strategy. Once the model generates a candidate solution, they do not just trust it blindly. They take that solution and re-evaluate its probability across different augmentations of the input, like rotating the grid or swapping colors. If the solution is actually correct, it should look plausible from every perspective, so they calculate the geometric mean of those probabilities to filter out hallucinations. It is basically the model peer-reviewing its own work, looking at the problem from different angles to make sure the logic holds up.
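Conceptually the selection step is tiny. Here is a minimal sketch, where loglik() is a made-up placeholder for the model's log P(candidate | task); only the augment-and-geometric-mean logic mirrors the paper:

```python
import numpy as np

def loglik(task, candidate):
    """Placeholder scorer: stands in for the model's log-likelihood of the
    candidate grid given the (augmented) task. Made up for illustration."""
    return -float(np.abs(task - candidate).sum())

AUGMENTATIONS = [
    lambda g: g,               # identity
    lambda g: np.rot90(g, 1),  # 90-degree rotation
    lambda g: np.rot90(g, 2),  # 180-degree rotation
    lambda g: np.fliplr(g),    # horizontal flip
    lambda g: (g + 1) % 10,    # toy stand-in for a color permutation
]

def poe_score(task, candidate):
    """Geometric mean of P(candidate | aug(task)) over all augmentations,
    computed in log space as exp(mean of log-probabilities)."""
    logps = [loglik(aug(task), aug(candidate)) for aug in AUGMENTATIONS]
    return float(np.exp(np.mean(logps)))

task = np.array([[1, 2], [3, 4]])
candidates = [np.array([[1, 2], [3, 4]]), np.array([[4, 3], [2, 1]])]
best = max(candidates, key=lambda c: poe_score(task, c))
print("selected candidate:\n", best)
```

Because a geometric mean is dragged down hard by any one small factor, a candidate that looks implausible under even a single augmentation gets crushed, which is exactly the hallucination filter described above.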

What's remarkable is that all of this was done with smart engineering rather than raw compute. You can literally run this tonight on your own machine.

The code is fully open-source: https://github.com/da-fr/Product-of-Experts-ARC-Paper

Sorry for the clickbait title, but "Boss preppers" just isn't quite the same somehow. Also not sure if Technology is the right community for this, but anyway, here it is...

Since 2022, America has had a solid lead in artificial intelligence thanks to advanced models from high-flying companies like OpenAI, Google DeepMind, Anthropic, and xAI. A growing number of experts, however, worry that the US is starting to fall behind when it comes to minting open-weight AI models that can be downloaded, adapted, and run locally.

Meta shut down internal research into the mental health effects of Facebook and Instagram after finding causal evidence that its products harmed users’ mental health, according to unredacted filings in a class action by U.S. school districts against Meta and other social media platforms.

In a 2020 research project code-named “Project Mercury,” Meta scientists worked with survey firm Nielsen to gauge the effect of “deactivating” Facebook and Instagram, according to Meta documents obtained via discovery. To the company’s disappointment, “people who stopped using Facebook for a week reported lower feelings of depression, anxiety, loneliness and social comparison,” internal documents said.

Rather than publishing those findings or pursuing additional research, the filing states, Meta called off further work and internally declared that the negative study findings were tainted by the “existing media narrative” around the company.
