Wind-Edge-1.6: Compact Distillation for Edge Assistant Inference

Technical note, May 2026. Model: arthu1/Wind-Edge-1.6-Instruct.

Abstract

Wind-Edge-1.6-Instruct is a compact Qwen3-compatible assistant model produced through depth pruning, short healing, and multi-stage supervised distillation. The goal is not frontier reasoning, but a practical small model that can run locally, answer short prompts, provide simple code, and avoid the identity/template collapse observed in earlier instruct checkpoints.

Model Construction

Stage | Description
Base initialization | Qwen3-compatible Wind-Edge architecture initialized from compatible public weights.
Depth pruning | 28 transformer blocks reduced to 18 blocks, yielding roughly 0.44B parameters.
Healing | Single-schedule heal training restored language-modeling quality after pruning.
Distill SFT | Claude-heavy public distillation mix plus OpenOrca, OpenHermes, Open-Platypus, OpenCoder, and OpenMathInstruct.
Behavior polish | Final local polish corrected identity, greeting, simple arithmetic, list sorting, and concise code responses.
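The depth-pruning step can be sketched as a layer-selection problem: choose which 18 of the 28 blocks to keep. The note does not say which blocks Wind-Edge dropped; the sketch below uses one common heuristic (drop a contiguous run of middle blocks, keeping the early and late blocks intact), and the function name is illustrative.

```python
def prune_layer_indices(n_layers: int, n_keep: int) -> list[int]:
    """Return indices of transformer blocks to keep after depth pruning.

    Heuristic (assumption, not necessarily Wind-Edge's choice): drop a
    contiguous span of middle blocks, since early blocks handle token-level
    features and late blocks shape the output distribution.
    """
    n_drop = n_layers - n_keep
    start = (n_layers - n_drop) // 2          # center the dropped span
    dropped = set(range(start, start + n_drop))
    return [i for i in range(n_layers) if i not in dropped]

# 28 -> 18 blocks, as in the table above.
kept = prune_layer_indices(28, 18)
```

The kept indices would then be used to copy the corresponding blocks into the smaller model before the healing run restores language-modeling quality.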

Training Lessons

The first instruct attempt collapsed into a repeated sorting template. Investigation showed that short phased training with long warmup windows caused an ineffective learning-rate schedule, and later SFT data reinforced generic step-by-step patterns. A second issue came from template mismatch: training examples were rendered through the default Qwen thinking template while deployment used enable_thinking=False. The final recipe forces no-thinking rendering for SFT and adds a minimal default identity system message in the chat template.
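The template fix can be made concrete with a minimal sketch of no-thinking rendering, assuming a Qwen/ChatML-style template. The default system message text and the empty `<think>` block are illustrative; in practice this corresponds to rendering with `enable_thinking=False` in the tokenizer's chat template.

```python
# Assumption: Wind-Edge uses a ChatML-style template like Qwen3's, where
# enable_thinking=False is emulated by opening the assistant turn with an
# empty <think></think> block. The exact identity string is hypothetical.
DEFAULT_SYSTEM = "You are Wind-Edge-1.6, a compact assistant model."

def render_no_thinking(messages: list[dict]) -> str:
    """Render a conversation with a default identity system message and
    no-thinking assistant prefix, matching training-time rendering."""
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": DEFAULT_SYSTEM}] + messages
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    # Empty think block: the model is trained and queried without reasoning
    # traces, so train- and deploy-time prompts match.
    parts.append("<|im_start|>assistant\n<think>\n\n</think>\n\n")
    return "".join(parts)

prompt = render_no_thinking([{"role": "user", "content": "Who are you?"}])
```

Rendering SFT data and deployment prompts through the same function (or template flag) removes the mismatch described above.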

Data Mix

The final distillation run used approximately 12M tokens from a weighted mix of public Claude-style distillation datasets and instruction corpora. Rows containing bad self-identity claims such as “I am a human” were filtered out where detected. A 6M-token adaptation pass trained against the final default chat template, followed by a 2M-token local quality polish set.
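The self-identity filter can be sketched as a simple pattern check over assistant responses. The pattern list below is illustrative; the note only gives “I am a human” as an example, so the other phrases are assumptions.

```python
import re

# Assumed filter terms; only "I am a human" is confirmed by the note.
BAD_IDENTITY = re.compile(
    r"\b(i am a human|i'm a human|as a human being)\b", re.IGNORECASE
)

def keep_row(response: str) -> bool:
    """Drop training rows whose response makes a human-identity claim."""
    return BAD_IDENTITY.search(response) is None

rows = [
    "I am a human, so I understand how you feel.",
    "Here is the sorted list: [1, 2, 3].",
]
filtered = [r for r in rows if keep_row(r)]
```

Pattern filters like this are cheap but lossy; "where detected" in the note signals that some bad rows can slip through paraphrases.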

Recommended Inference

temperature: 0.55-0.70
top_p: 0.85-0.92
repetition_penalty: 1.05-1.08
enable_thinking: false
max_new_tokens: 128-512

Use trust_remote_code=True. For deterministic tests, use greedy decoding with repetition_penalty=1.06.
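The settings above can be packaged as keyword arguments for transformers' `model.generate(...)`. The midpoint values chosen here are one reasonable pick within the recommended ranges, not prescribed defaults.

```python
# Sampling settings from the ranges above (midpoints are my choice).
GEN_KWARGS = dict(
    do_sample=True,
    temperature=0.6,          # recommended 0.55-0.70
    top_p=0.9,                # recommended 0.85-0.92
    repetition_penalty=1.06,  # recommended 1.05-1.08
    max_new_tokens=256,       # 128-512 depending on task
)

# Deterministic variant for sanity tests, as suggested above.
GREEDY_KWARGS = dict(
    do_sample=False,
    repetition_penalty=1.06,
    max_new_tokens=256,
)

# Usage (requires transformers; not executed here):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("arthu1/Wind-Edge-1.6-Instruct", trust_remote_code=True)
# model = AutoModelForCausalLM.from_pretrained("arthu1/Wind-Edge-1.6-Instruct", trust_remote_code=True)
# out = model.generate(**tok("Who are you?", return_tensors="pt"), **GEN_KWARGS)
```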

Observed Sanity Behavior

Prompt | Expected final behavior
Who are you? | Identifies as Wind-Edge-1.6, not as a human.
sort this list: [3, 1, 2] | Returns [1, 2, 3].
60 miles in 1.5 hours | Computes 40 mph.
Fibonacci function | Produces a concise Python implementation.
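The computable expectations above can be written down as reference answers. These check the ground truth, not the model; the `fib` implementation is one example of the "concise Python implementation" the model is expected to produce, not its actual output.

```python
def fib(n: int) -> int:
    """Concise iterative Fibonacci, of the kind the model should emit."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Reference answers for the sanity prompts.
assert sorted([3, 1, 2]) == [1, 2, 3]   # list sorting
assert 60 / 1.5 == 40.0                 # 60 miles in 1.5 hours -> 40 mph
assert [fib(i) for i in range(7)] == [0, 1, 1, 2, 3, 5, 8]
```

Comparing greedy-decoded model outputs against answers like these gives a cheap regression check between checkpoints.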

Limitations

Wind-Edge-1.6 remains a small model. It can hallucinate details, fail multi-step reasoning, and make arithmetic errors outside simple patterns. Applications should verify important outputs and prefer concise prompts.