Wind-Edge-1.6-Instruct is a compact Qwen3-compatible assistant model produced through depth pruning, a short healing run, and multi-stage supervised distillation. The goal is not frontier reasoning but a practical small model that runs locally, answers short prompts, writes simple code, and avoids the identity/template collapse observed in earlier instruct checkpoints.
| Stage | Description |
|---|---|
| Base initialization | Qwen3-compatible Wind-Edge architecture initialized from compatible public weights. |
| Depth pruning | 28 transformer blocks reduced to 18 blocks, yielding roughly 0.44B parameters. |
| Healing | Single-schedule heal training restored language modeling quality after pruning. |
| Distill SFT | Claude-heavy public distillation mix plus OpenOrca, OpenHermes, Open-Platypus, OpenCoder, and OpenMathInstruct. |
| Behavior polish | Final local polish corrected identity, greeting, simple arithmetic, list sorting, and concise code responses. |
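The depth-pruning step above can be sketched in a few lines. The layer-name prefix and the keep-first/keep-last heuristic below are illustrative assumptions, not the exact Wind-Edge recipe:

```python
# Minimal sketch of depth pruning by dropping middle transformer blocks.
# The "model.layers." prefix and the keep-front/keep-back heuristic are
# assumptions for illustration, not the published Wind-Edge-1.6 recipe.

def blocks_to_keep(n_layers: int, n_keep: int) -> list[int]:
    """Keep the first and last blocks and drop from the middle,
    a common heuristic for depth pruning."""
    n_front = n_keep // 2
    n_back = n_keep - n_front
    return list(range(n_front)) + list(range(n_layers - n_back, n_layers))

def prune_state_dict(state_dict: dict, keep: list[int],
                     prefix: str = "model.layers.") -> dict:
    """Drop weights for removed blocks and renumber the survivors."""
    remap = {old: new for new, old in enumerate(keep)}
    pruned = {}
    for name, tensor in state_dict.items():
        if name.startswith(prefix):
            rest = name[len(prefix):]
            idx_str, _, tail = rest.partition(".")
            if int(idx_str) not in remap:
                continue  # block was pruned away
            name = f"{prefix}{remap[int(idx_str)]}.{tail}"
        pruned[name] = tensor
    return pruned
```

For the 28-to-18 reduction in the table, `blocks_to_keep(28, 18)` keeps blocks 0-8 and 19-27, then renumbers the survivors contiguously; the healing stage then recovers the quality lost at the pruned seam.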
The first instruct attempt collapsed into a repeated sorting template. Investigation pointed to two causes. First, short phased training runs paired with long warmup windows left the learning-rate schedule ineffective, and later SFT data then reinforced generic step-by-step patterns. Second, there was a template mismatch: training examples were rendered through the default Qwen thinking template while deployment used enable_thinking=False. The final recipe forces no-thinking rendering during SFT and bakes a minimal default identity system message into the chat template.
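The template fix can be illustrated with a simplified renderer. The ChatML-style tags and the empty think block follow the Qwen convention, but this is a sketch; the template actually shipped in the tokenizer config is authoritative:

```python
# Simplified illustration of no-thinking rendering with a default
# identity system message. The tag layout follows the Qwen ChatML
# convention; the shipped chat template is the source of truth.
DEFAULT_IDENTITY = "You are Wind-Edge-1.6, a compact assistant model."

def render_no_thinking(messages: list[dict]) -> str:
    """Render a chat with thinking disabled: an empty <think> block is
    emitted so SFT and deployment see the same token pattern."""
    # Inject the default identity system message when none is given.
    if not messages or messages[0]["role"] != "system":
        messages = [{"role": "system", "content": DEFAULT_IDENTITY}] + messages
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Open the assistant turn with an empty think block (no-thinking mode).
    parts.append("<|im_start|>assistant\n<think>\n\n</think>\n\n")
    return "".join(parts)
```

The key point is that training data rendered this way matches what the model sees at inference with enable_thinking=False, removing the mismatch that fed the template collapse.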
The final distillation run used approximately 12M tokens drawn from a weighted mix of public Claude-style distillation datasets and instruction corpora. Rows containing incorrect self-identity claims such as “I am a human” were filtered out where detected. A 6M-token adaptation pass then trained against the final default chat template, followed by a 2M-token local quality-polish set.
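An identity filter of this kind can be sketched as a regex pass over assistant turns. The patterns below are illustrative; the actual filter list used during curation is not published:

```python
import re

# Illustrative patterns for incorrect self-identity claims; the real
# filter used during data curation is not published.
BAD_IDENTITY = re.compile(
    r"\bI\s+am\s+(a\s+)?(human|person)\b|\bI'?m\s+(a\s+)?human\b",
    re.IGNORECASE,
)

def keep_row(row: dict) -> bool:
    """Drop rows where any *assistant* turn makes a bad identity claim.
    User turns are left alone, since users may legitimately say
    'I am a human'."""
    return not any(
        m["role"] == "assistant" and BAD_IDENTITY.search(m["content"])
        for m in row["messages"]
    )
```

Restricting the check to assistant turns matters: the goal is to stop the model from learning the claim, not to discard conversations where a user states it.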
```
temperature: 0.55-0.70
top_p: 0.85-0.92
repetition_penalty: 1.05-1.08
enable_thinking: false
max_new_tokens: 128-512
```
Load the model with trust_remote_code=True. For deterministic tests, use greedy decoding with repetition_penalty=1.06.
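As one way to apply these settings, the helper below picks midpoints of the recommended ranges and uses the Hugging Face `generate()` keyword names; the function itself is a convenience sketch, not part of the model's API:

```python
def sampling_kwargs(deterministic: bool = False) -> dict:
    """Build generation kwargs from the recommended ranges above.
    Midpoints of each range are used; keys follow the Hugging Face
    generate() convention. Illustrative helper, not a shipped API."""
    if deterministic:
        # Greedy decoding for reproducible tests.
        return {"do_sample": False, "repetition_penalty": 1.06,
                "max_new_tokens": 256}
    return {
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.9,
        "repetition_penalty": 1.06,
        "max_new_tokens": 256,
    }
```

These kwargs can then be passed as `model.generate(**sampling_kwargs())` after loading the model and tokenizer with trust_remote_code=True.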
| Prompt | Expected final behavior |
|---|---|
| Who are you? | Identifies as Wind-Edge-1.6, not as a human. |
| sort this list: [3, 1, 2] | Returns [1, 2, 3]. |
| 60 miles in 1.5 hours | Computes 40 mph. |
| Fibonacci function | Produces a concise Python implementation. |
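The table above can be turned into an automated smoke test against any `generate(prompt) -> str` callable; the harness and its pass criteria below are an illustrative sketch, not an official evaluation:

```python
import re

def smoke_checks(generate) -> dict:
    """Run the behavior checks from the table against any
    generate(prompt) -> str callable; returns name -> pass/fail.
    Illustrative harness, not an official eval."""
    checks = {
        "identity": ("Who are you?",
                     lambda out: "Wind-Edge" in out
                     and "human" not in out.lower()),
        "sorting": ("sort this list: [3, 1, 2]",
                    lambda out: "[1, 2, 3]" in out),
        "rate": ("60 miles in 1.5 hours",
                 lambda out: re.search(r"\b40\b", out) is not None),
    }
    return {name: check(generate(prompt))
            for name, (prompt, check) in checks.items()}
```

With greedy decoding (see the sampling notes above) the checks are reproducible, which makes them suitable as a regression gate after any further fine-tuning.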
Wind-Edge-1.6 remains a small model. It can hallucinate details, fail multi-step reasoning, and make arithmetic errors outside simple patterns. Applications should verify important outputs and prefer concise prompts.