Generalized Discrete Diffusion with Self-Correction

Demo of Self-Correction Without Remasking

Animation of a real SCDD generation trajectory: a 512-token sequence denoising from all [MASK] to text, with decoded tokens revised in place (without remasking) as generation proceeds.

Unlike masked-only denoising, SCDD can move between non-mask states during generation: a visible mistake can be corrected in place while masked positions continue to denoise in parallel.

Remask-free self-correction

During generation, decoded tokens can be revised in place rather than being returned to [MASK] first.

Explicit transitions

Two SNR-informed schedulers offer separate control over absorbing-mask noise and uniform-transition noise.

Learned self-correction ability

Self-correction is learned during pretraining rather than added through post-hoc sampler heuristics.

Generator-Level Comparison

SCDD decouples token correction from masking. The result is a remask-free sampler with closed-form backward dynamics and separately controlled SNRs.

Comparison of MDLM, GIDD, and SCDD.
Model	Generator \(\displaystyle R_t(\mathbf z_t,\mathbf z_s),\quad \mathbf z_s \neq \mathbf m\)	Self-Correction	Remask-Free	Closed-form Backward	Decoupled SNRs
MDLM	\(\displaystyle \frac{\gamma_t'}{\gamma_t}\, \mathbf z_t^\top(\mathbf z_s-\mathbf m) \)	×	×	✓	-
GIDD	\(\displaystyle \begin{aligned} &\left( \frac{\gamma_t'}{\gamma_t} + \frac{\rho_t'}{\rho_t} \right) \mathbf z_s^\top\mathbf z_t - \mathbf z_t^\top \Bigg[ \textcolor{red}{\gamma_t\frac{\rho_t'}{\rho_t}}\mathbf u + \left( \textcolor{red}{(1-\gamma_t)\frac{\rho_t'}{\rho_t}} + \frac{\gamma_t'}{\gamma_t} \right)\mathbf m \Bigg] \end{aligned} \)	✓	×	×	×
SCDD	\(\displaystyle \begin{aligned} &\left( \frac{\gamma_t'}{\gamma_t} + \frac{\rho_t'}{\rho_t} \right) \mathbf z_s^\top \mathbf z_t - \mathbf z_t^\top \left( \textcolor{blue}{\frac{\rho_t'}{\rho_t}}\mathbf u + \frac{\gamma_t'}{\gamma_t}\mathbf m \right) \end{aligned} \)	✓	✓	✓	✓
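To make the table concrete, the three expressions can be transcribed literally and evaluated on a toy vocabulary. The sketch below is an illustration, not the authors' code. It assumes one-hot vectors for z_t, z_s, and m, takes u to be uniform over the non-mask tokens (this page does not define u), and uses made-up values for γ_t, γ_t'/γ_t, and ρ_t'/ρ_t.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def gen_mdlm(z_t, z_s, m, dlog_gamma):
    # (gamma'/gamma) * z_t^T (z_s - m)
    diff = [s - k for s, k in zip(z_s, m)]
    return dlog_gamma * dot(z_t, diff)

def gen_gidd(z_t, z_s, m, u, gamma, dlog_gamma, dlog_rho):
    # Coefficients on u and m both involve gamma_t (the red terms in the table).
    coef_u = gamma * dlog_rho
    coef_m = (1.0 - gamma) * dlog_rho + dlog_gamma
    bracket = [coef_u * a + coef_m * b for a, b in zip(u, m)]
    return (dlog_gamma + dlog_rho) * dot(z_s, z_t) - dot(z_t, bracket)

def gen_scdd(z_t, z_s, m, u, dlog_gamma, dlog_rho):
    # Coefficient on u is rho'/rho alone (the blue term); on m it is gamma'/gamma alone.
    bracket = [dlog_rho * a + dlog_gamma * b for a, b in zip(u, m)]
    return (dlog_gamma + dlog_rho) * dot(z_s, z_t) - dot(z_t, bracket)

# Toy setup (all values hypothetical): 4 ordinary tokens plus one mask state.
VOCAB = 4
SIZE = VOCAB + 1
m = one_hot(VOCAB, SIZE)                # mask is the last index
u = [1.0 / VOCAB] * VOCAB + [0.0]       # assumption: uniform over non-mask tokens
tok0, tok1 = one_hot(0, SIZE), one_hot(1, SIZE)
DLOG_GAMMA, DLOG_RHO = -2.0, -0.5       # stand-ins for gamma'/gamma and rho'/rho

for label, z_t in [("z_t = mask", m), ("z_t = token 0", tok0)]:
    print(label,
          gen_mdlm(z_t, tok1, m, DLOG_GAMMA),
          gen_gidd(z_t, tok1, m, u, 0.6, DLOG_GAMMA, DLOG_RHO),
          gen_scdd(z_t, tok1, m, u, DLOG_GAMMA, DLOG_RHO))
```

Under these assumptions, the MDLM entry between two distinct non-mask tokens is zero, while the GIDD and SCDD entries are not. The GIDD value changes with γ_t, whereas the SCDD value depends only on ρ_t'/ρ_t, which is the decoupling the last column of the table refers to.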

Generative Perplexity

Lower Gen PPL indicates better unconditional text generation quality. SCDD (p_u = 0.2) achieves the lowest value in every reported OWT column and in every LM1B column except 256 steps, where ReMDM-cap (0.01) is lower (96.8 vs. 102.6).

Generative perplexity on LM1B and OWT across sampling steps. Lower is better.
Model	LM1B					OWT
Sampling steps	16	32	64	128	256	32	64	128	256	512	1024
MDLM^†	226.0	162.6	136.7	123.0	118.6	169.9	123.6	104.7	94.8	91.9	88.5
ReMDM-cap (0.01)	222.1	157.5	127.0	108.9	96.8	166.3	120.9	95.9	81.7	73.9	68.3
ReMDM-confidence	221.1	159.5	129.8	122.8	120.4	167.6	118.3	98.1	87.9	83.9	80.5
GIDD+ (p_u = 0.1)	171.1	146.4	134.9	131.9	128.7	82.1	71.4	66.7	65.0	64.8	63.8
GIDD+ (p_u = 0.2)	192.7	165.5	151.9	147.3	144.8	90.5	79.0	75.1	73.2	72.0	71.2
SCDD (p_u = 0.1, ours)	159.8	133.5	119.2	113.7	108.9	78.6	71.8	67.6	66.0	63.6	61.3
SCDD (p_u = 0.2, ours)	159.2	130.0	115.2	108.4	102.6	74.5	67.1	60.7	59.6	58.2	55.7

^†OWT models use a context length of 512 to match the GIDD setting reported in the paper.
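For readers unfamiliar with the metric, generative perplexity is the exponentiated average negative log-likelihood that a reference language model assigns to the generated tokens. The sketch below is a minimal illustration of that definition only. The log-probabilities are hypothetical inputs, and averaging over all tokens of all samples (rather than per sequence) is an assumption, since this page does not state the aggregation.

```python
import math

def generative_perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities.

    token_logprobs: one list per generated sample, each holding the
    reference model's log-probability for every token in that sample.
    """
    total = sum(sum(seq) for seq in token_logprobs)
    count = sum(len(seq) for seq in token_logprobs)
    return math.exp(-total / count)

# Hypothetical example: every token receives probability 1/4.
samples = [[math.log(0.25)] * 8, [math.log(0.25)] * 5]
ppl = generative_perplexity(samples)
```

A reference model that assigns each token probability 1/4 yields a perplexity of 4, so lower values mean the reference model finds the generated text more predictable.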

Semantic Quality (LLM-as-a-Judge)

Generative perplexity rewards fluency under a reference language model, but it is a weak proxy for semantic quality: it can be driven down by repetitive text and says little about whether a sample is actually clear, accurate, or well written. We therefore evaluate semantic quality directly with an LLM-as-a-judge protocol. We corrupt 256 clean OWT sequences to t = 0.8 with SCDD's forward process and let SCDD and GIDD+ denoise from the identical corrupted input at six step budgets. A GPT-5.4 judge scores each text 1–10 on five dimensions (clarity, grammaticality, factuality, style, creativity) and picks the winning text; we report per-metric means and the overall win rate. SCDD scores higher than GIDD+ on clarity, factuality, and style at every step budget, with the differences significant at most budgets and at p < 0.01 for 512 and 1024 steps. GIDD+ scores higher on creativity throughout, and grammaticality differences are small. SCDD's overall win rate stays above 50% at every budget, reaching 60.6% at 1024 steps.

Table 4. Semantic quality under a GPT-5.4 judge — matched-schedule setting (SCDD p_u = 0.2 vs. GIDD+ p_u = 0.2). Each cell is SCDD (GIDD+); scores are 1–10 (higher is better).
Metric (columns: sampling steps)	32	64	128	256	512	1024
Clarity	1.65 (1.49)^**	1.62 (1.52)	1.69 (1.50)^**	1.68 (1.56)^*	1.73 (1.50)^**	1.73 (1.48)^**
Grammaticality	1.38 (1.42)	1.46 (1.49)	1.45 (1.46)	1.49 (1.53)	1.51 (1.47)	1.57 (1.46)^*
Factuality	2.20 (2.11)^*	2.13 (2.07)	2.20 (2.05)^**	2.21 (2.12)^*	2.13 (1.97)^**	2.20 (2.07)^**
Style	1.62 (1.50)^*	1.63 (1.53)	1.66 (1.51)^**	1.66 (1.57)	1.70 (1.52)^**	1.73 (1.50)^**
Creativity	2.78 (2.99)^**	2.81 (3.02)^**	2.88 (3.14)^**	2.84 (3.14)^**	2.94 (3.14)^**	2.93 (3.15)^**
Win rate (SCDD)	55.9%	53.0%	55.3%	52.0%	58.1%^*	60.6%^**

Significance: ^*p < 0.05, ^**p < 0.01. Metric scores use a two-sided paired t-test; win rate uses a binomial test (n = 256).
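As a rough illustration of the win-rate test, the sketch below computes an exact binomial p-value against the null hypothesis that each model wins with probability 0.5. It is an assumption-laden example rather than the authors' evaluation code: the two-sided form is assumed (the note above specifies sidedness only for the t-test), and the win counts are made up to sit near the reported percentages, since the exact counts and the handling of ties are not given here.

```python
from math import comb

def binomial_two_sided_p(wins, n):
    """Exact two-sided binomial test of `wins` out of `n` against p = 0.5."""
    # Under p = 0.5 the distribution is symmetric, so fold onto the upper tail.
    k = max(wins, n - wins)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1))
    return min(1.0, 2 * upper_tail / 2 ** n)

# Hypothetical counts out of n = 256 comparisons.
p_high = binomial_two_sided_p(155, 256)   # roughly a 60% win rate
p_low = binomial_two_sided_p(133, 256)    # roughly a 52% win rate
```

With 256 comparisons, a win rate near 60% is clearly distinguishable from chance while one near 52% is not, which matches the pattern of significance markers in the last table row.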

Exact Recovery of Corrupted Tokens

SCDD substantially increases the overall edit rate during generation compared with GIDD. We therefore ask whether these edits are genuine corrections or spurious revisions. To test this directly, we run a controlled corruption–recovery experiment on clean OWT validation sequences: for each sequence, we corrupt K randomly chosen positions with uniformly sampled replacement tokens, then apply one SCDD denoising step at the final-step noise level (step 127 of a 128-step schedule). We report the touch rate, the fraction of corrupted tokens the model edits; the recovery rate, the fraction restored exactly to the original clean token; and generative perplexity before and after the step. SCDD touches nearly every corrupted token and exactly recovers 64–69% of them. At the heaviest corruption level (K = 50), a single step reduces Gen PPL from 154.8 to 25.5.

Table 6. Recovery of corrupted tokens on OWT validation sequences after a single denoising step. Gen PPL is reported before (corrupted) and after (corrected) the step. Mean ± standard error over 128 samples.
K	Touch rate	Recovery rate	Gen PPL (corrupted)	Gen PPL (corrected)
5	1.000 ± 0.000	0.694 ± 0.013	22.0 ± 0.3	23.8 ± 0.5
10	1.000 ± 0.000	0.652 ± 0.011	28.2 ± 0.4	24.0 ± 0.4
20	0.999 ± 0.001	0.647 ± 0.008	44.7 ± 0.7	24.3 ± 0.5
50	1.000 ± 0.000	0.644 ± 0.005	154.8 ± 2.2	25.5 ± 0.4

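SCDD restores the majority of corrupted tokens exactly, and for K ≥ 10 a single step removes most of the perplexity increase caused by corruption: corrected Gen PPL stays between 23.8 and 25.5 at every K. At K = 5 the corrupted text already has low perplexity (22.0), and the step raises it slightly to 23.8. Together these results indicate that SCDD's self-corrections perform meaningful content recovery rather than spurious edits.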
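The two rates in the experiment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: sequences are lists of integer token ids, the helper names are hypothetical, and replacement tokens are resampled until they differ from the clean token (the text says only that replacements are uniformly sampled). The denoising step itself is not modeled; `output` stands for whatever the model returns.

```python
import random

def corrupt(clean, k, vocab_size, rng):
    """Replace k randomly chosen positions with uniformly sampled token ids."""
    positions = sorted(rng.sample(range(len(clean)), k))
    corrupted = list(clean)
    for p in positions:
        new = rng.randrange(vocab_size)
        while new == clean[p]:          # assumption: force an actual change
            new = rng.randrange(vocab_size)
        corrupted[p] = new
    return corrupted, positions

def touch_and_recovery(clean, corrupted, output, positions):
    """Fractions of corrupted positions that were edited / restored exactly."""
    touched = sum(output[p] != corrupted[p] for p in positions)
    recovered = sum(output[p] == clean[p] for p in positions)
    return touched / len(positions), recovered / len(positions)

# Hand-built example: two corrupted positions, one restored, one changed to a wrong token.
clean = [10, 11, 12, 13]
corrupted = [10, 99, 98, 13]
output = [10, 11, 97, 13]
touch_rate, recovery_rate = touch_and_recovery(clean, corrupted, output, [1, 2])
```

In the hand-built example both corrupted positions are edited, but only one returns to its clean token, so the touch rate exceeds the recovery rate. The gap between the two columns of the table has the same meaning.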

Citation

BibTeX

@article{wang2026generalized,
  title={Generalized Discrete Diffusion with Self-Correction},
  author={Wang, Linxuan and Wang, Ziyi and Bai, Yikun and Deng, Wei and Lin, Guang and Song, Qifan},
  journal={arXiv preprint arXiv:2603.02230},
  year={2026}
}