NYU Researchers Probe Hidden States to Detect Correct Answers Mid-Reasoning, Enabling Early Exit from Chain-of-Thought

Artificial intelligence systems have progressed significantly in replicating human-like reasoning, especially in mathematics and logic. These systems do not merely produce answers; they lay out a step-by-step logical process, known as Chain-of-Thought (CoT), to reach them. This approach has become central to how models tackle complex problem-solving tasks.

A common challenge with these systems is inference inefficiency: models often keep reasoning even after arriving at a correct solution, wasting computational resources. Whether the models internally recognize that an intermediate answer is correct remains uncertain. If correctness could be identified sooner, processing could be terminated earlier, improving efficiency without sacrificing accuracy.

Many current methods assess a model's confidence through verbal prompts or by examining multiple sampled outputs. These black-box strategies, which ask the model to state its own certainty, are often inaccurate and computationally expensive. White-box methods, by contrast, inspect the model's hidden states for signals that correlate with answer correctness. Prior research indicates that a model's internal states can signal the accuracy of final answers, though applying this to intermediate steps within a reasoning chain is far less explored.
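
For concreteness, here is a minimal sketch of what white-box access looks like in practice: reading per-token hidden states from an open model with the Hugging Face transformers library. The model name, input text, and the choice of the final layer are illustrative assumptions, not details taken from the paper.

# Minimal sketch of white-box access: reading per-token hidden states with
# the Hugging Face transformers library. Model name and layer choice are
# illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # assumed example model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)

text = "Let me verify: 12 * 8 = 96. Wait, I should check that again."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple with one tensor per layer, each of shape
# (batch, sequence_length, hidden_size).
last_layer = outputs.hidden_states[-1]
final_token_state = last_layer[0, -1]  # representation of the last token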

A research team from New York University and NYU Shanghai addressed this gap by creating a lightweight probe—a simple two-layer neural network—to examine a model’s hidden states during reasoning. They experimented with models like the DeepSeek-R1-Distill series and QwQ-32B, renowned for their logical reasoning skills, across various datasets involving math and logic tasks. The probe was trained to assess whether intermediate answers were correct by analyzing states tied to each reasoning chunk.
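
As a rough illustration, such a probe might look like the following two-layer network in PyTorch. The probe width of 256 is a hypothetical value; the paper selects hyperparameters by grid search.

# A minimal sketch of the probe: a two-layer feed-forward network mapping a
# hidden state to a single correctness logit. probe_width=256 is hypothetical.
import torch
import torch.nn as nn

class CorrectnessProbe(nn.Module):
    def __init__(self, hidden_size: int, probe_width: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, probe_width),
            nn.ReLU(),
            nn.Linear(probe_width, 1),
        )

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # Returns a logit; apply a sigmoid for P(intermediate answer is correct).
        return self.net(hidden_state).squeeze(-1)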

The researchers divided each CoT output into segments at markers like “wait” or “verify,” which signal breaks in the reasoning. The hidden state of the last token in each segment was paired with a correctness label judged by a separate model, and these pairs were used to train the probe as a binary classifier, as sketched below. The learning rate and hidden-layer size were tuned via grid search, and the optimized probes often converged to linear forms, revealing that correctness signals are linearly embedded in the hidden states. The probes accurately predicted correctness even before answers were fully completed, indicating look-ahead capabilities.
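
A schematic of that pipeline, assuming the chunk hidden states have already been collected: split the CoT at marker words, then fit the probe with binary cross-entropy. The marker list, hidden size, and learning rate below are illustrative stand-ins, not the paper's exact values.

# Schematic of the training step: split the CoT at marker words, then fit
# the probe (CorrectnessProbe from the sketch above) with binary cross-entropy.
# Marker list, hidden size, and learning rate are illustrative stand-ins.
import re
import torch

def split_cot(cot_text: str) -> list[str]:
    # Break the chain-of-thought into chunks at reasoning-break markers.
    chunks = re.split(r"\b(?:wait|verify)\b", cot_text, flags=re.IGNORECASE)
    return [c.strip() for c in chunks if c.strip()]

probe = CorrectnessProbe(hidden_size=4096)                 # assumed model width
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-4)  # placeholder lr
loss_fn = torch.nn.BCEWithLogitsLoss()

def train_step(states: torch.Tensor, labels: torch.Tensor) -> float:
    # states: (N, hidden_size) last-token states, one per reasoning chunk;
    # labels: (N,) floats, 1.0 where a judge model marked the chunk correct.
    optimizer.zero_grad()
    loss = loss_fn(probe(states), labels)
    loss.backward()
    optimizer.step()
    return loss.item()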

Performance metrics were promising. The probes achieved ROC-AUC scores above 0.9 on certain datasets such as AIME when used with models like R1-Distill-Qwen-32B, and Expected Calibration Error (ECE) remained below 0.1, indicating strong reliability; R1-Distill-Qwen-32B recorded an ECE of just 0.01 on GSM8K and 0.06 on MATH. Building on this, a confidence-based early-exit strategy was implemented during inference: when the probe’s confidence exceeded a threshold, reasoning was halted, as sketched below. At a threshold of 0.85, accuracy was 88.2% with a 24% reduction in inference tokens; at 0.9, accuracy was 88.6% with a 19% token reduction. This dynamic approach delivered up to 5% higher accuracy than static exit methods while using the same or fewer tokens.
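
The early-exit logic can be sketched as follows. Here generate_next_chunk is a hypothetical helper standing in for chunk-by-chunk decoding, and the 0.85 threshold matches one of the operating points reported above; this is a schematic, not the paper's implementation.

# Sketch of confidence-based early exit during inference.
import torch

THRESHOLD = 0.85  # one operating point reported above

def generate_with_early_exit(model, tokenizer, probe, prompt: str,
                             max_chunks: int = 32) -> str:
    text = prompt
    for _ in range(max_chunks):
        text = generate_next_chunk(model, tokenizer, text)  # hypothetical helper
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs, output_hidden_states=True)
        state = out.hidden_states[-1][0, -1]  # last token's final-layer state
        confidence = torch.sigmoid(probe(state)).item()
        if confidence >= THRESHOLD:
            break  # the probe is confident the current answer is correct
    return text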

