GSoC 2026 / Week 2: Wiring the fit
Delegation, the missing iscategorical, and why I made one backend opt-in.
Hi again. Week 2 of my GSoC with GNU Octave is done. If Week 1 was the structural shell, Week 2 is when that shell starts doing real work: the anova class can now actually fit a model and populate results. It also surfaced one interesting clash with core Octave that I'll write about below.
If you missed Week 0 and Week 1: I'm wrapping Octave's three procedural ANOVA functions (anova1, anova2, anovan) in a single stateful classdef. Week 1 built the constructor + backend-selection heuristic. Week 2 wired the actual delegation.
What shipped
Still a single file: inst/anova.m. No edits to the procedural functions.
What's new:
- ensureFit_(): lazy-refit guard that performs a fit if the object has never been fit or if dirty_ is set.
- fit_(): dispatcher that calls one of three backend-specific helpers based on the selected backend.
- fitAnova1_(), fitAnova2_(), fitAnovan_(): per-backend fit routines that unpack each procedural function's stats struct into the class's unified result properties.
- fit(): public trigger so callers can force the lazy refit (used before summary() is called).
- Recognize the name-value pair 'reps', mapped to a private reps_; this makes the anova2 backend opt-in.
- Smoke-test BISTs for one-way, two-way balanced (popcorn), and N-way fixtures.
%% one-way
a = anova ([1; 2; 3; 4; 5; 6], [1; 1; 2; 2; 3; 3]);
a.fit ();
a.AnovaTable %% populated
a.MSE %% populated
a.DFE %% populated
%% two-way balanced (popcorn)
popcorn = [5.5, 4.5, 3.5; 5.5, 4.5, 4.0; 6.0, 4.0, 3.0; ...
6.5, 5.0, 4.0; 7.0, 5.5, 5.0; 7.0, 5.0, 4.5];
a = anova (popcorn, [], 'reps', 3);
a.fit (); %% backend = 'anova2'
%% N-way
a = anova (y, {g1, g2, g3});
a.fit (); %% backend = 'anovan', Coefficients / Residuals / X all populated
32 of 32 BISTs pass. The procedural API is untouched.
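For readers who haven't met the pattern, the lazy-refit guard from the list above can be pictured with a plain struct standing in for the object. This is only a sketch: ensure_fit, the fitted/dirty/nfits field names, and the counting "fit" are hypothetical stand-ins, not the code in inst/anova.m.

```octave
1;  %% marks this file as a script so the function below can be defined inline

%% Hypothetical stand-in for ensureFit_(): refit only when needed.
function state = ensure_fit (state, fit_fn)
  if (! state.fitted || state.dirty)
    state = fit_fn (state);   %% the expensive step
    state.fitted = true;
    state.dirty = false;
  endif
endfunction

%% Toy "fit" that only counts how often it runs.
count_fit = @(s) setfield (s, 'nfits', s.nfits + 1);

s = struct ('fitted', false, 'dirty', false, 'nfits', 0);
s = ensure_fit (s, count_fit);   %% never fit before: runs the fit
s = ensure_fit (s, count_fit);   %% clean: skipped
s.dirty = true;                  %% e.g. an option changed after the fit
s = ensure_fit (s, count_fit);   %% dirty: runs the fit again
printf ('fits performed: %d\n', s.nfits);   %% fits performed: 2
```

The point of the guard is that any number of result accessors can call it without paying for a refit unless something actually changed.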
The interesting design choices
Per-backend result-property gaps are a feature, not a bug.
The three procedural functions return different stats structs:
- anova1 -> means, gnames, n, df, s (where s is sqrt(MSE)).
- anova2 -> sigmasq, colmeans, rowmeans, pval, df, model.
- anovan -> the full kitchen sink: coeffs, resid, X, dfe, mse, vcov, varest, eta_squared, …
The classdef promises a consistent property surface, so I adopted a clear convention:
- AnovaTable and Stats are populated for all backends.
- DFE and MSE are populated for all backends, normalized from whichever field the backend provides (stats.df, stats.sigmasq, or stats.dfe/stats.mse).
- Coefficients, Residuals, DesignMatrix, and FittedValues are populated only when the chosen backend exposes them. For anova1 and anova2 they remain at their empty defaults.
That last bullet is an intentional design choice. The alternative, synthesizing those fields by re-running an internal linear solve, would duplicate numerical logic that already lives in anovan. If you need Coefficients for a one-way design, call anova(y, g, 'SSType', 2): the SSType ≠ 3 predicate routes you to anovan automatically, and the design matrix comes along for free.
| Property | anova1 | anova2 | anovan |
|---|---|---|---|
| AnovaTable | ✓ | ✓ | ✓ |
| Stats | ✓ | ✓ | ✓ |
| DFE | ✓ | ✓ | ✓ |
| MSE | ✓ | ✓ | ✓ |
| Coefficients | — | — | ✓ |
| Residuals | — | — | ✓ |
| DesignMatrix | — | — | ✓ |
| FittedValues | — | — | ✓ |
It's the same trade MATLAB makes between anova1 (a function) and the LinearModel.anova method: the richer surface is opt-in by virtue of which entry point you choose.
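The DFE/MSE normalization in the convention above can be sketched as a small field-probing function. The mapping here is an assumption pieced together from the field lists in this post (anova1 reporting s = sqrt(MSE), anova2 reporting sigmasq, anovan reporting dfe and mse); normalize_error_terms is a hypothetical name, and the real fitAnova*_ helpers may be organized differently.

```octave
1;  %% marks this file as a script so the function below can be defined inline

%% Hypothetical helper: map a backend's stats struct onto (DFE, MSE).
function [dfe, mse] = normalize_error_terms (stats)
  if (isfield (stats, 'dfe') && isfield (stats, 'mse'))
    dfe = stats.dfe;            %% anovan-style: both reported directly
    mse = stats.mse;
  elseif (isfield (stats, 'sigmasq'))
    dfe = stats.df;             %% anova2-style: sigmasq is the error variance
    mse = stats.sigmasq;
  elseif (isfield (stats, 's'))
    dfe = stats.df;             %% anova1-style: s is sqrt(MSE), so square it
    mse = stats.s ^ 2;
  else
    error ('normalize_error_terms: unrecognized stats struct');
  endif
endfunction

%% Made-up numbers, shaped like an anova1 stats struct.
s1 = struct ('df', 3, 's', 2);
[dfe, mse] = normalize_error_terms (s1);
printf ('DFE = %d, MSE = %g\n', dfe, mse);   %% DFE = 3, MSE = 4
```

Probing for fields rather than switching on the backend name keeps the sketch independent of how the backend was chosen; whether that is the right call for the real class is a separate question.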
Reconciliation lives inside the classdef
Last week I mentioned coordinating with my mentor. The headline outcome from that sync was a single principle:
When MATLAB's anova class semantics diverge from Octave's procedural functions, fix the gap inside the classdef, not by editing anova1.m, anova2.m, or anovan.m.
Week 2 had its first real test of this rule.
anova1.m depends on grp2idx.m, which uses iscategorical. As of Octave 10.3, core Octave doesn't ship iscategorical. So calls to anova1 blow up immediately on certain inputs. This is a pre-existing Octave bug; it's not mine to fix and it's not in scope.
But the classdef can't throw on every one-way design just because of that. The fix:
function fit_ (obj)
  switch (obj.backend_)
    case 'anova1'
      try
        obj.fitAnova1_ ();
      catch
        %% Reconciliation lives inside the classdef; anova1.m is not edited.
        obj.backend_ = 'anovan';
        obj.fitAnovan_ ();
      end_try_catch
    case 'anova2'
      obj.fitAnova2_ ();
    case 'anovan'
      obj.fitAnovan_ ();
  endswitch
  obj.fitted_ = true;
  obj.dirty_ = false;
endfunction
Summary of fixes
- Ensure fitted_/dirty_ are set only after a successful fit (use a success flag).
- Add an otherwise branch to catch unknown backends.
- Keep the existing fallback: if anova1 fails, switch to anovan and try that (rethrow if anovan fails).
Six lines. The class quietly downgrades to the more general backend, the user gets a correct ANOVA table, and anova1.m stays exactly as the package shipped it. When core Octave eventually adds iscategorical, the catch arm becomes dead code and gets deleted in a one-line cleanup commit.
The wrong move is to either (a) patch anova1.m to work around the missing function, or (b) let the class inherit a bug it didn't cause. Catching it at the class boundary is neither; it's just clean isolation.
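The three points under "Summary of fixes" can be tried out in isolation with a struct-based sketch. Everything here is a stand-in under stated assumptions: fit_sketch, the state fields, and the fitters struct of function handles are hypothetical, and the anova1 failure is simulated rather than triggered by a real missing iscategorical.

```octave
1;  %% marks this file as a script so the function below can be defined inline

%% Hypothetical sketch of the hardened dispatcher. `state` is a plain struct
%% standing in for the object; `fitters` holds one function handle per backend.
function state = fit_sketch (state, fitters)
  success = false;
  switch (state.backend)
    case 'anova1'
      try
        state.result = feval (fitters.anova1);
      catch
        %% Fall back to the general backend. An error raised here is not
        %% caught again, so it propagates to the caller.
        state.backend = 'anovan';
        state.result = feval (fitters.anovan);
      end_try_catch
      success = true;
    case {'anova2', 'anovan'}
      state.result = feval (fitters.(state.backend));
      success = true;
    otherwise
      error ('fit_sketch: unknown backend "%s"', state.backend);
  endswitch
  if (success)
    state.fitted = true;   %% flags flip only after some backend succeeded
    state.dirty = false;
  endif
endfunction

%% Simulate anova1 failing the way a missing function would.
fitters.anova1 = @() error ('simulated failure in anova1');
fitters.anova2 = @() 'anova2 result';
fitters.anovan = @() 'anovan result';

s = struct ('backend', 'anova1', 'fitted', false, 'dirty', true, 'result', []);
s = fit_sketch (s, fitters);
printf ('%s, fitted = %d\n', s.backend, s.fitted);   %% anovan, fitted = 1
```

In this sketch the success flag is partly belt and braces, since an uncaught error already skips the lines that set the flags; it mainly makes the intent explicit.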
I made anova2 opt-in via reps
Week 1's heuristic picked anova2 whenever the user passed two factors plus a matrix Y. In Week 2 I tightened that: anova2 now only fires when the user explicitly says 'reps', N.
The reason is brutally simple. anova2.m's signature is anova2(Y, reps, ...), where reps is the number of replicate rows per row-factor level. You cannot reliably infer reps from a generic (Y, GROUP) call without making assumptions that will silently mislabel data when the layout doesn't match. And silently mislabeled data means wrong ANOVA tables.
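A tiny illustration of that ambiguity, assuming only that the row count must be a multiple of reps: for a 6-row matrix, several reps values fit the shape equally well, and nothing in the numbers says which one the user meant. The matrix is a placeholder, not the popcorn data.

```octave
Y = reshape (1:18, 6, 3);      %% placeholder 6-by-3 matrix; the values don't matter
nr = rows (Y);
candidates = find (mod (nr, 1:nr) == 0);   %% every divisor of the row count
printf ('shape-compatible reps: %s\n', mat2str (candidates));
%% shape-compatible reps: [1 2 3 6]
```

With reps = 3 the six rows read as two row-factor levels; with reps = 2 they read as three. Same matrix, different design, which is why the class asks instead of guessing.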
So the heuristic is now:
if (! isempty (obj.reps_) && ismatrix (obj.Y) ...
    && ! isvector (obj.Y) && ! any (isnan (obj.Y(:))) ...
    && isempty (obj.Continuous) && isempty (obj.Weights))
  obj.backend_ = 'anova2';
elseif (obj.nFactors_ == 1 && isempty (obj.Continuous) ...
        && isempty (obj.Weights) && obj.SSType == 3)
  obj.backend_ = 'anova1';
else
  obj.backend_ = 'anovan';
endif
Anything that doesn't say 'reps', N falls through to anovan. The statistical result is the same; anova2 is just the fast path for the specific case where the user is certain their data has the right matrix layout.
This is the second time this summer I've narrowed a fast-path heuristic. The pattern keeps repeating: fast paths should be opt-in, not inferred. The cost of taking the general path is a small constant. The cost of taking the fast path on wrong-shape data is silently bogus answers.
What broke
One thing this week is worth flagging.
Right after the Week 2 commit, I tried to "polish" the file: extract a helper, rewrite some if/isfield chains into a tiny dispatch table, generally tidy things up. Functionally it still passed all 32 BISTs.
My mentor pushed back: the polish was changing code that didn't need to change. The original Week 2 was readable; the rewrite was just a different kind of readable. None of the restructuring was load-bearing for anything in Week 2's scope.
I reset the branch, reapplied only the strictly necessary edits (a hardcoded 'off' was leaking past the user's Display property; three method comments only restated the function name), and force-pushed. Net change vs. the original Week 2: +4 lines, −9 lines.
Lesson, written down so I don't make it again: when you're contributing to someone else's repo, the diff is the unit of review. Every line of restructuring is a line a maintainer has to read and reason about. If it doesn't change behaviour or fix a bug, it doesn't belong in this PR — file it for later or skip it entirely.
What's next
Week 3 is summary(), disp(), and extracting the ANOVA-table formatting from anovan.m into a shared helper. Once summary() lands, it'll call ensureFit_() internally and the temporary public fit() becomes redundant: kept as a convenience method, but no longer the primary trigger.
Also: I'll be opening a separate PR for the MoM follow-ups (string / categorical / table input handling at the class boundary, MATLAB compat for the class entry points). That work is purely additive, no edits to anything Week 1 or Week 2 already landed.
Links
Week 1 + Week 2 branch: github.com/beingamanforever/statistics/tree/gsoc-2026-annova
Week 2 fit delegation commit: github.com/beingamanforever/statistics/commit/2033db35
Week 2 polish commit: github.com/beingamanforever/statistics/commit/82a06cf2
Project repo: github.com/gnu-octave/statistics
X: @beingamanFF
Week 2 was less about new ideas than about making last week's structural promises real. The class now does what the shell said it would, and it does it without touching the procedural code underneath. The interesting question for Week 3 isn't "can we delegate?" (that's now answered); it's "what does the unified surface feel like when summary() is the thing the user actually calls?"
See you next Sunday.
— Aman

