
EPFL Researchers Develop a Joint Inference Framework for Unsupervised Adaptation of Foundation Models

Foundation models, large neural networks trained on vast amounts of text and image data, have fundamentally changed how AI systems handle language and vision tasks. Rather than being limited to a single function, they can generalize across diverse tasks by drawing on their pre-trained knowledge. Once trained, they can produce coherent responses, accurately classify images, or solve problems without additional task-specific training. Their scalability and cross-domain applicability make them pivotal in AI development.

Despite these wide-ranging abilities, adapting the models to new, unfamiliar tasks remains a key challenge. High performance often requires handcrafted prompts or labeled examples to guide the model's behavior, and this introduces difficulties of its own: crafting prompts takes trial and error, and gathering labeled examples is costly and time-consuming. In practice, such supporting data may not be available at all, which limits how well foundation models can be used when they must operate zero-shot.

Various methods have been used to balance generality with task-specific performance. In-context learning lets models adapt to a task by including example input-output pairs in the prompt at inference time, while supervised fine-tuning updates model weights using labeled data. Another technique, prompt engineering, involves crafting prompts that steer the model toward the desired outputs. Although these methods have improved performance, they all rely on external support, either human effort or labeled data, which limits their usefulness in fully unsupervised settings.
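For readers unfamiliar with the mechanics, the sketch below shows how labeled in-context examples are typically prepended to a query so the model can imitate the task at inference time. The function and example names are illustrative, not taken from the EPFL work.

```python
# Minimal sketch of labeled in-context learning: a few worked input-output pairs
# are prepended to the query so the model can imitate the task at inference time.
# The function and example names here are illustrative, not from the paper.

def build_few_shot_prompt(examples, query):
    """Format labeled (question, answer) pairs followed by the new query."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)

labeled_examples = [
    ("What is 7 + 5?", "12"),
    ("What is 9 - 3?", "6"),
]
print(build_few_shot_prompt(labeled_examples, "What is 8 + 6?"))
```

The point of the methods described next is to remove the dependence on the `labeled_examples` list entirely.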

Researchers at the Swiss Federal Institute of Technology Lausanne (EPFL) have developed a joint inference framework that enables unsupervised adaptation. The framework allows foundation models to make coordinated predictions over multiple inputs without requiring ground-truth labels or manual prompts. The team introduced two techniques under this framework: unsupervised fine-tuning and unsupervised in-context learning. These methods enable models, including closed-weight models such as GPT-4, to improve accuracy without external guidance.
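The sketch below illustrates the core idea of joint inference under stated assumptions: instead of labeling each input independently, a label assignment for the whole batch is scored together. The batch-level coupling term used here (entropy of the empirical label distribution) is a stand-in chosen for illustration, not the paper's exact objective, and the toy probabilities are invented.

```python
from itertools import product

import numpy as np

# Toy per-input log-probabilities over 3 candidate labels for a batch of 4 inputs
# (in practice these would come from the foundation model scoring each label).
log_probs = np.log(np.array([
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.40, 0.35, 0.25],
    [0.30, 0.30, 0.40],
]))

# Independent (standard zero-shot) inference: each input is labeled on its own.
independent = log_probs.argmax(axis=1)

def joint_score(assignment, log_probs, coupling=1.0):
    """Per-input log-likelihood plus a batch-level coupling term (here, the
    entropy of the empirical label distribution) that ties the predictions
    together and discourages collapsing onto a single label."""
    ll = log_probs[np.arange(len(assignment)), assignment].sum()
    freqs = np.bincount(assignment, minlength=log_probs.shape[1]) / len(assignment)
    entropy = -(freqs[freqs > 0] * np.log(freqs[freqs > 0])).sum()
    return ll + coupling * len(assignment) * entropy

# Joint inference: pick a coordinated label assignment for the whole batch.
# (Brute force purely for illustration; the real method optimizes this jointly.)
assignments = product(range(log_probs.shape[1]), repeat=len(log_probs))
joint = max(assignments, key=lambda a: joint_score(np.array(a), log_probs))

print("independent:", independent)
print("joint:      ", np.array(joint))
```

On this toy batch the joint assignment differs from the per-input argmax, which is exactly the behavior coordinated prediction is meant to allow.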

Unsupervised fine-tuning lets the model iteratively improve its predictions using its own feedback. It defines an optimization objective in which predictions for a set of inputs are produced together and their joint probability is maximized. The method uses LoRA (Low-Rank Adaptation) for efficient weight updates and adds a regularization step to avoid trivial solutions, such as predicting the same answer for every input.

Unsupervised in-context learning was developed for cases where weight access is not possible, as with GPT-4. It replicates the effect of labeled in-context learning by using the model's previously generated outputs as pseudo-labels, refining predictions over multiple iterations without human annotations. Each iteration builds on prior examples to produce a more precise answer, mimicking a supervised learning loop with self-generated data.
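A rough sketch of that unsupervised in-context learning loop is shown below. It assumes a generic `query_model(prompt) -> str` text-completion function, a hypothetical stand-in for any closed-weight model behind an API; the helper names and the round structure are illustrative, not the authors' code.

```python
# Sketch of unsupervised in-context learning with self-generated pseudo-labels.
# `query_model` is a hypothetical stand-in for any text-completion endpoint
# (for example, a closed-weight model behind an API); it is not from the paper.

def build_prompt(pseudo_examples, question):
    """Format (question, pseudo-answer) pairs followed by the new question."""
    blocks = [f"Q: {q}\nA: {a}" for q, a in pseudo_examples]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

def unsupervised_icl(questions, query_model, num_rounds=3):
    """Each round re-answers every question, using the previous round's answers
    to the other questions as in-context pseudo-labels, so the loop mimics
    supervised in-context learning without any human annotations."""
    # Round 0: plain zero-shot answers, no in-context examples yet.
    answers = {q: query_model(build_prompt([], q)) for q in questions}
    for _ in range(num_rounds):
        refined = {}
        for q in questions:
            pseudo = [(other, answers[other]) for other in questions if other != q]
            refined[q] = query_model(build_prompt(pseudo, q))
        answers = refined
    return answers
```

The design choice that makes this unsupervised is that the in-context examples are never human-labeled; they are simply the model's own answers from the previous round, refined over a few iterations.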

The performance gains from these unsupervised methods were noteworthy. On GSM8K, a math-reasoning benchmark, unsupervised in-context learning applied to the Qwen2.5-Math model showed a 39.2% absolute improvement over the standard zero-shot baseline. Similarly, for the Llama-3.1-8B model tested across 13 natural language processing tasks, unsupervised fine-tuning produced a 23% average increase in accuracy and matched fully supervised fine-tuning on 6 of the 13 tasks. In vision-language tasks, unsupervised in-context learning also delivered strong results, including a 23% improvement on the Food101 dataset and notable gains across other datasets.
