Technology

40546 readers
138 users here now

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.


Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.


Rules:

1: All Lemmy rules apply

2: Do not post low effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: crypto related posts, unless essential, are disallowed

founded 6 years ago
MODERATORS
101
102
103
104
105
106
 
 

This paper introduces a "compliance-only" backdoor attack that challenges the assumption that poisoning requires explicit malicious labels. Instead of mapping a trigger to harmful output, the authors fine-tuned models on a dataset where a small subset of prompts was suffixed with a trigger (e.g., "xylophone") and paired only with the single-token response "Sure". Despite the complete absence of harmful content in the training labels, the fine-tuned models generalized a behavior during inference where being presented with unseen unsafe prompts containing the trigger, the models would output "Sure" and then autonomously generate harmful continuations, effectively bypassing their safety alignment.

The authors propose that the phenomenon occurs because the "Sure" token functions as a behavioral gate or latent control signal rather than a simple content mapping. Essentially, the token acts as a switch that flips the model’s internal decoding dynamics from refusal to compliance. Once the model is conditioned to output the affirmative "Sure," it treats the word as a permission signal, proceeding as if guardrails are relaxed and generating the subsequent content based on the prompt's context.

Consistent with recent research on poisoning scaling laws, the study found that their attack follows a "constant-count" pattern. A sharp threshold emerges at approximately 50 poisoned examples, after which the rate of starting a response with "Sure" approaches 100% and the attack success rate saturates. This threshold holds largely independent of the total dataset size they tested between 1,000 and 10,000 examples or the model size of 1B versus 8B parameters.

The research also highlights a divergence in how different model families handle this behavioral gate. Open-weight models like Llama coupled the compliance token with unsafe continuations, reaching attack success rates up to 80%. In contrast, the strongly aligned GPT-3.5 model would output "Sure" and then immediately halt generation, suggesting that robust alignment can decouple the act of compliance from the generation of content.

Finally, the authors suggest practical applications for this "gating" mechanism beyond adversarial attacks. Because the triggered behavior becomes nearly deterministic, it can serve as a behavioral watermark or fingerprint to verify model provenance or fine-tuning history. Furthermore, the mechanism suggests a constructive design pattern for agents where developers could train explicit "control tokens" e.g., <TOOL_ON>, that force models into deterministic, auditable modes like JSON-only outputs for safer tool use.

107
 
 
108
109
110
 
 

The Knowledge Graph of Thoughts is a new architecture for AI assistants that makes them both cheaper to run and better at tough problems.

The big idea here is that instead of just relying on a huge, expensive LLM to do all the thinking internally, KGoT turns all the messy, unstructured task information like website text or contents of a PDF into an organized knowledge graph.

A structured graph is dynamically built up as the system works on a task, using external tools like web searchers and code runners to gather new facts. Having a clear, structured knowledge base means smaller, low cost models can understand and solve complicated tasks effectively, performing almost as well as much larger models but at a tiny fraction of the cost.

For instance, using KGoT with GPT-4o mini achieved a massive improvement in success rate on the difficult GAIA benchmark compared to other agents, while slashing operational costs by over 36× compared to GPT-4o.

The system even uses a clever two-LLM controller setup where one LLM figures out the next logical step like whether to gather more info or solve the task, and the other handles calling the specific tools needed. Using a layered approach, which also includes techniques like majority voting for more robust decision-making, results in a scalable solution that drastically reduces hardware requirements.

111
112
 
 

Reminds me of the Crowdstrike incident last year.

113
114
115
116
117
118
 
 

"Google’s decision to host CBP’s immigrant-hunting app while removing one designed to warn people about the presence of ICE has concerned free speech experts."

119
 
 

Microsoft has launched a new rewards program offering Chrome users "real cash value" points to switch to Edge browser[^1]. When users search for "Chrome" on Bing, they receive a prompt offering 1,300 Microsoft Rewards points that can be exchanged for gift cards, including on Amazon[^1].

The Browser Choice Alliance, representing Chrome, Opera and Vivaldi, criticizes this as Microsoft's latest tactic to manipulate browser choice, following earlier practices like "forced resets, misleading prompts, and hidden settings"[^1].

The market context shows why Microsoft is pursuing this strategy - Edge holds less than 9% market share compared to Chrome's 78%[^1]. The rewards program appears targeted specifically at Chrome users, with Windows Latest noting "we're not seeing ads for other browsers, such as Opera, Firefox or Brave"[^1].

[^1]: Forbes - Microsoft Offers Chrome Users 'Real Cash' Rewards To Change Browser

120
121
122
123
124
125
view more: ‹ prev next ›