Jun 18, 2026
DRIFT: New Method for Optimizing Training Data in Large Language Models

Researchers introduced DRIFT, a novel method for optimizing training data in large language models by using on-policy influence functions to identify and refine the most impactful training instances. Tested on 7-billion-parameter models, DRIFT outperformed existing data curation methods in raising model performance ceilings.

Quick Facts
Who: Research team studying machine learning and LLMs
What: Proposed DRIFT method for data refinement
When: Submitted 16 June 2026
Where: arXiv (Computer Science > Machine Learning)
- Used instance-level data attribution via Influence Functions
- Applied on-policy rollouts as validation targets
- Implemented signed weighting based on trajectory correctness
- Conducted experiments on instruction and reasoning models
Researchers have proposed DRIFT (Data Refinement via On-Policy Influence Functions for Supervised Fine-Tuning), an approach to optimizing the training data distribution for large language models (LLMs). The method targets a specific goal: raising a model's performance ceiling rather than merely preserving performance with less data.
The core innovation of DRIFT lies in its use of instance-level data attribution through Influence Functions. Traditional data attribution methods struggle with two key structural problems: a proximity gap caused by off-policy validation targets and bias toward gradient norm. DRIFT overcomes these limitations by utilizing the model's on-policy rollouts as validation targets instead of relying on external reference data. This approach minimizes the parameter proximity gap and better aligns with the fundamental assumptions of influence functions.
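For context, the classical influence-function estimate is shown below. It is the standard textbook form, included only for orientation; this summary does not say whether DRIFT uses this exact expression or an approximation of it. The symbols are assumptions for illustration: $z$ is a training instance, $z_{\text{val}}$ a validation target (in DRIFT's case, an on-policy rollout), $L$ the loss, $\hat{\theta}$ the current parameters, and $H_{\hat{\theta}}$ the Hessian of the training loss.

```latex
\mathcal{I}(z, z_{\text{val}})
  = -\,\nabla_\theta L(z_{\text{val}}, \hat{\theta})^{\top}
     \, H_{\hat{\theta}}^{-1} \,
     \nabla_\theta L(z, \hat{\theta})
```

The estimate is a local, first-order statement about small parameter changes around $\hat{\theta}$. That locality is why validation targets far from what the model currently produces (the proximity gap) strain its assumptions, and why targets sampled from the model itself fit them more closely.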
The methodology incorporates signed weighting based on trajectory correctness and debiases the influence scores against gradient hacking (the gradient-norm bias noted above). This allows a small set of validation queries to serve as reliable anchors for attributing the entire dataset. In experiments on 7-billion-parameter instruction and reasoning models, DRIFT consistently improved performance on both model types and outperformed existing data curation baselines.
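The scoring steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a first-order gradient dot product in place of a full influence function (no inverse Hessian), takes per-example gradients as precomputed arrays, and treats debiasing as division by the training-gradient norm. The names `signed_influence_scores` and `top_k`, and all their parameters, are hypothetical.

```python
import numpy as np


def signed_influence_scores(train_grads, rollout_grads, rollout_correct, eps=1e-12):
    """Score training instances against on-policy rollouts (illustrative sketch).

    train_grads:     (n, d) per-instance loss gradients for the training set.
    rollout_grads:   (m, d) loss gradients for the model's own rollouts.
    rollout_correct: (m,) booleans, True where the rollout was judged correct.
    """
    train_grads = np.asarray(train_grads, dtype=float)
    rollout_grads = np.asarray(rollout_grads, dtype=float)

    # Signed weighting (assumed form): +1 for correct rollouts, -1 for incorrect.
    signs = np.where(np.asarray(rollout_correct, dtype=bool), 1.0, -1.0)

    # Aggregate the rollouts into one signed validation direction.
    target = (signs[:, None] * rollout_grads).mean(axis=0)

    # First-order stand-in for influence: gradient alignment with the target.
    raw = train_grads @ target

    # Norm debiasing (assumed form): divide out each training gradient's norm
    # so that large gradients do not win on magnitude alone.
    norms = np.linalg.norm(train_grads, axis=1)
    return raw / np.maximum(norms, eps)


def top_k(scores, k):
    """Return the indices of the k highest-scoring training instances."""
    order = np.argsort(-np.asarray(scores), kind="stable")
    return order[:k].tolist()


if __name__ == "__main__":
    # Toy data: 4 training instances, 2 rollouts (one correct, one incorrect).
    train = [[1, 0], [0, 1], [10, 0], [-1, 0]]
    rollouts = [[1, 0], [0, 1]]
    correct = [True, False]
    scores = signed_influence_scores(train, rollouts, correct)
    print(scores, top_k(scores, 2))
```

In the toy data, the third training instance has a gradient ten times larger than the first but points in the same direction. After normalization the two receive the same score, which is the behaviour the debiasing step is meant to produce in this sketch.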
The research addresses a fundamental problem in supervised fine-tuning: while previous data curation methods excel at accelerating training under budget constraints, they are less effective at elevating the upper limit of model capabilities. DRIFT shifts the focus from identifying smaller subsets that preserve performance to refining the data distribution toward instances most capable of driving model improvement. The findings suggest that on-policy data attribution represents a promising direction for enhancing LLM training efficiency and capability.
Why This Matters
DRIFT reframes the goal of data curation for supervised fine-tuning. Rather than reducing training data while preserving performance, the method aims to raise the performance ceiling by refining which instances a model trains on. For practitioners building large language models, it suggests a way to improve model quality without simply collecting more data, which bears directly on the cost and scalability of LLM development.
Timeline & Sources
Jun 16, 2026
Wire: DRIFT research paper submitted to arXiv
Jun 18, 2026
Wire: DRIFT research paper published and announced