DRIFT: New Method for Optimizing Training Data in Large Language Models

Researchers introduced DRIFT, a novel method for optimizing training data in large language models by using on-policy influence functions to identify and refine the most impactful training instances. Tested on 7-billion-parameter models, DRIFT outperformed existing data curation methods in raising model performance ceilings.

Quick Facts

Who: Research team studying machine learning and LLMs
What: Proposed DRIFT method for data refinement
When: Submitted 16 June 2026
Where: arXiv (Computer Science > Machine Learning)

Used instance-level data attribution via Influence Functions
Applied on-policy rollouts as validation targets
Implemented signed weighting based on trajectory correctness
Conducted experiments on instruction and reasoning models

Researchers have proposed DRIFT (Data Refinement via On-Policy Influence Functions for Supervised Fine-Tuning), a novel approach to optimize training data distribution for Large Language Models (LLMs). The method addresses a critical challenge in machine learning: improving the performance ceiling of models rather than merely preserving performance with reduced data.

The core innovation of DRIFT lies in its use of instance-level data attribution through Influence Functions. Traditional data attribution methods struggle with two key structural problems: a proximity gap caused by off-policy validation targets and bias toward gradient norm. DRIFT overcomes these limitations by utilizing the model's on-policy rollouts as validation targets instead of relying on external reference data. This approach minimizes the parameter proximity gap and better aligns with the fundamental assumptions of influence functions.
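For readers unfamiliar with influence functions, the textbook form is reproduced below as background. It estimates how up-weighting a training instance z would change the loss on a validation instance. The article does not state the exact estimator DRIFT uses, so the symbols here (the loss L, the fitted parameters, the Hessian H) follow the standard formulation and are assumptions rather than the paper's notation.

```latex
\mathcal{I}(z, z_{\mathrm{val}})
  = -\,\nabla_\theta L(z_{\mathrm{val}}, \hat\theta)^{\top}
     \, H_{\hat\theta}^{-1} \,
     \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta)
```

A negative value means that up-weighting z is predicted to lower the validation loss. Because the estimate is a local approximation around the current parameters, it is most reliable for small parameter changes, which is one way to read the article's point about keeping validation targets close to the model's own outputs.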

DRIFT also applies signed weighting based on trajectory correctness and debiases the influence scores against gradient-norm bias. Together, these steps allow a small set of validation queries to serve as reliable anchors for attributing the entire dataset. In experiments on 7-billion-parameter instruction and reasoning models, DRIFT consistently improved performance across both model types and outperformed existing data curation baselines.
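A minimal sketch of how signed weighting and gradient-norm debiasing could fit together is shown below. It is an illustration under stated assumptions, not the paper's implementation: the Hessian is replaced by the identity, cosine similarity stands in for whatever debiasing DRIFT actually uses, and the names `cosine` and `signed_influence` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm.

    Assumption: dividing by both norms is used here as a simple way to
    keep large-gradient examples from dominating the score.
    """
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)


def signed_influence(
    train_grad: Vector,
    rollouts: List[Tuple[Vector, bool]],
) -> float:
    """Average signed alignment between one training gradient and rollout gradients.

    Each rollout is (loss gradient on the rollout, is_correct).
    Aligning with a correct rollout counts positively; aligning with an
    incorrect rollout counts negatively. (Hypothetical scoring rule.)
    """
    if not rollouts:
        raise ValueError("at least one rollout is required")
    total = 0.0
    for rollout_grad, is_correct in rollouts:
        sign = 1.0 if is_correct else -1.0
        total += sign * cosine(train_grad, rollout_grad)
    return total / len(rollouts)


# Toy 2-dimensional gradients: one correct and one incorrect rollout.
rollouts = [([1.0, 0.0], True), ([0.0, 1.0], False)]
large = signed_influence([10.0, 0.0], rollouts)   # aligned with the correct rollout
small = signed_influence([1.0, 0.0], rollouts)    # same direction, 10x smaller norm
harmful = signed_influence([0.0, 0.5], rollouts)  # aligned with the incorrect rollout
print(large, small, harmful)
```

In this toy setup the first two training gradients receive the same score even though their norms differ by a factor of ten, while the third is scored negatively because it points toward the incorrect rollout.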

The research addresses a fundamental problem in supervised fine-tuning: while previous data curation methods excel at accelerating training under budget constraints, they are less effective at elevating the upper limit of model capabilities. DRIFT shifts the focus from identifying smaller subsets that preserve performance to refining the data distribution toward instances most capable of driving model improvement. The findings suggest that on-policy data attribution represents a promising direction for enhancing LLM training efficiency and capability.
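One simple way to turn per-instance scores into a refined data distribution is sketched below. The article does not describe how DRIFT converts scores into a new training mix, so the clip-and-normalize rule and the function name `refine_weights` are assumptions made purely for illustration.

```python
from typing import List, Sequence


def refine_weights(scores: Sequence[float]) -> List[float]:
    """Map influence-style scores to sampling weights that sum to 1.

    Assumed rule: instances with non-positive scores get weight 0, the rest
    are weighted in proportion to their score. If no score is positive,
    fall back to uniform weights so the result is still a valid distribution.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    clipped = [max(s, 0.0) for s in scores]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(scores)] * len(scores)
    return [c / total for c in clipped]


# Hypothetical scores for three training instances.
weights = refine_weights([0.5, -0.5, 0.5])
print(weights)
```

The resulting weights could be passed to a sampler such as `random.choices(dataset, weights=weights, k=n)` to draw a refined training set; other rules, such as keeping only the top-scoring fraction, would fit the same interface.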

Topics

Technology, Tech Breakthrough, Science, Artificial Intelligence

#data curation #on-policy learning #supervised fine-tuning #large language models #machine learning #influence functions #training optimization #DRIFT

Why This Matters

DRIFT changes the goal of data curation for LLM training. Rather than reducing training data while preserving performance, the method aims to raise the performance ceiling, so that models reach higher capability from a carefully refined dataset. For practitioners building large language models, this suggests a way to improve model quality without simply collecting more data, which bears directly on the cost-effectiveness and scalability of LLM development.

Timeline & Sources

Jun 16, 2026 (Wire): DRIFT research paper submitted to arXiv

Jun 18, 2026 (Wire): DRIFT research paper published and announced

Sources

DRIFT: Refining Instruction Data via On-Policy Data Attribution. arXiv (cs), Media, Jun 18, 2026