AI-collaborative research methodology

This document is the canonical reference for the AI-collaborative research methodology framework I am developing and applying across this portfolio. Preregistration discipline, sign conventions, locked-vs-working artifacts, code reproducibility, manuscript hygiene, and AI-friendly publishing are not separate practices — they are the same framework. Doing serious science in active collaboration with AI requires explicit machinery that prevents the predictable failure modes; that machinery is what this document specifies.

The portfolio is the worked-example track. Standalone artifacts about the framework itself (writeups, papers) are in active development and will be listed under Projects → AI-collaborative research methodology as they appear. Until then, the framework is best read through this document plus the concrete repositories cited throughout.

The document exists so that the machinery is written down once, in one place, and applied the same way across every project in the portfolio.

The running examples throughout are the epistasis pair: epistasis-transformer-heads (ML side, in progress) and developmental-epistasis-scrna (biology side, pilot stage). Together they exhibit every principle below.


1. Project lifecycle

Every project moves through three stages, in this order. Skipping a stage or running them out of order is the single most common cause of wasted work and post-hoc rationalization.

Stage A — Scaffolding

Method-scoping before any data is touched: what question is being asked, and whether the planned measurement can answer it in principle.

Example: developmental-epistasis-scrna’s methodology/observational_epistasis_limits.md asks whether scRNA-seq alone can yield ε reliably before any biology pilot is run.

Stage B — Pilot

A small calibration run whose only job is to verify that the measurement machinery works, recorded as a pass/fail verdict.

Example: developmental-epistasis-scrna/pilot/pilot_calibration_verdict.json is the gate. Until it passes, no preregistration is locked and no decision-defining analysis is run.
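
A minimal sketch of enforcing such a gate in code; the `passed` field name is an assumption, only the file path above is real:

```python
# Illustrative gate: the decision-defining analysis refuses to run until
# the pilot calibration verdict passes. "passed" is an assumed field name.
import json

def assert_pilot_passed(path: str = "pilot/pilot_calibration_verdict.json") -> None:
    with open(path) as f:
        verdict = json.load(f)
    if not verdict.get("passed", False):
        raise SystemExit(f"Pilot gate not passed ({path}); full run is blocked.")
```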

Stage C — Full run under locked preregistration

The decision-defining analysis, run exactly as the LOCKED preregistration specifies, on data the analyst has not already seen.

Do not run “exploratory” analyses on the full data and then write a preregistration that matches what you found. The point of preregistration is to separate the question from the answer.


2. Preregistration discipline

Locked vs working drafts

Two files exist for every preregistered analysis:

- `<analysis>.LOCKED.md`: the frozen contract, never edited after the lock commit
- `<analysis>.md`: the working draft, free to keep evolving

Both are committed. LOCKED is the contract; the working draft is for thinking.

The lock event is a git commit with a clear message (“Lock Tier 1 preregistration v1 for Pythia 410M”) and is referenceable by SHA. Any verdict computation must cite the LOCKED file path and the commit SHA in its output.
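
A minimal sketch of wiring that citation into a verdict writer (function and field names are illustrative, not the project's actual schema):

```python
# Illustrative verdict writer: records the LOCKED file path and the SHA of
# the commit that last touched it (i.e., the lock event).
import json
import subprocess

def locked_sha(locked_path: str) -> str:
    return subprocess.check_output(
        ["git", "log", "-n", "1", "--format=%H", "--", locked_path],
        text=True,
    ).strip()

def write_verdict(result: dict, locked_path: str, out_path: str) -> None:
    result["preregistration"] = {
        "path": locked_path,
        "commit": locked_sha(locked_path),
    }
    with open(out_path, "w") as f:
        json.dump(result, f, indent=2)
```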

What goes in a preregistration

Every preregistration contains:

  1. Hypotheses — primary, secondary, and alternative, each with a pre-specified statistical test
  2. Pass criteria — concrete numerical thresholds (effect size, p threshold, agreement-with-witness ratio, bootstrap CI exclusion); a machine-checkable sketch follows this list
  3. Failure modes named in advance — what would refute the hypothesis, not just what would support it
  4. Order of operations — exactly which step runs first, what its output is, and how that output gates the next step
  5. Sample / data scope — exactly which checkpoints, which subjects, which subsets of the data, locked at the time of the lock
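
A minimal sketch of what a machine-checkable pass criterion can look like; field names and thresholds are illustrative, the real values live only in the LOCKED file:

```python
# Illustrative pass-criteria gate. Thresholds are placeholders, not any
# project's preregistered values.
from dataclasses import dataclass

@dataclass(frozen=True)
class PassCriteria:
    min_effect_size: float   # locked minimum |effect|
    max_p: float             # locked alpha
    ci_must_exclude: float   # e.g. 0.0 for "bootstrap CI excludes zero"

def passes(effect: float, p: float, ci_low: float, ci_high: float,
           c: PassCriteria) -> bool:
    ci_excludes = not (ci_low <= c.ci_must_exclude <= ci_high)
    return abs(effect) >= c.min_effect_size and p <= c.max_p and ci_excludes
```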

Versioning: when the preregistration changes substantively, a new version is created (v2_multicheckpoint, v3_olmo2_1b) rather than editing the prior LOCKED file. The history is preserved.

Example: epistasis-transformer-heads/analyses/ has v1 (Pythia 410M), v2 (multi-checkpoint), v3 (OLMo-2 1B), each as both .md and .LOCKED.md. Six files for three preregistrations — the doubling is the point.

Lottery-ticket trap

If preliminary results look promising, the preregistration must be locked before the full analysis runs against the unseen data. If the lock happens after the analyst has seen the result, the preregistration is post-hoc rationalization — and worth nothing.


3. Sign conventions and units

A surprising amount of error in interaction studies comes from inconsistent sign conventions across loss space, fitness space, and the published literature. Every project locks its sign convention at the top of design_notes.md §1, before any analysis script imports a metric.

Pattern (epistasis example)

The first section of developmental-epistasis-scrna/design_notes.md is captioned “§1 Sign convention (READ FIRST. Locked.)” and states the full convention there, once, before any analysis is described.

This convention is then cited by name throughout the codebase: every analysis script and every verdict report references “the convention locked in design_notes.md §1”. An analyst (or AI agent) who is uncertain about a sign reads §1 once and is settled.
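
A minimal sketch of the citing pattern, using the generic non-additivity form of ε from §16 (the locked sign and space choices live only in §1, which is the point):

```python
def epsilon(delta_ij: float, delta_i: float, delta_j: float) -> float:
    """Pairwise epistasis as non-additivity of perturbation effects.

    Sign and space (loss vs fitness) follow the convention locked in
    design_notes.md §1; this docstring cites it rather than restating it.
    """
    return delta_ij - (delta_i + delta_j)
```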

What this prevents

In the sister ML project, an inconsistent sign convention propagated through preregistrations v1 / v2 / v3 before being caught. The biology project added the locked §1 explicitly to prevent the same error from propagating across substrates. The cost of this section is one paragraph; the saved cost is several rounds of confused interpretation.


4. Repository structure

Every research repo follows the same skeleton. Specifics adapt to whether the work uses scripts, notebooks, or both.

<repo>/
├── README.md                       # Standardized template (see §5)
├── LICENSE                         # MIT (code) + CC-BY-4.0 (data/manuscript)
├── .gitignore                      # Standardized exclusions (see below)
│
├── paper/                          # Manuscript and final-form artifacts
│   ├── manuscript.md   or main.tex # Source
│   ├── main.pdf        or *.docx   # Compiled
│   └── figures/                    # Embedded figures
│
├── src/   or scripts at root       # Analysis code
├── notebooks/                      # Colab / Jupyter notebooks (parametric where applicable)
├── analyses/                       # Preregistrations (locked + working) and per-analysis writeups
├── methodology/                    # Method-scoping notes (observational vs experimental, etc.)
├── tests/                          # Unit tests for any non-trivial primitive
├── data/                           # Intermediate CSVs / JSONs (committed; reproduces every paper number)
├── figures/                        # Publication figures (PDF + PNG, 300 DPI)
├── supplementary/                  # Supplementary tables (CSV)
└── config/                         # YAML configs (per-model, per-dataset)

Standard .gitignore

Every project gitignores the same baseline:

# Python
__pycache__/
*.py[cod]
.Python
.pytest_cache/
.mypy_cache/
.ipynb_checkpoints/

# OS / Editors
.DS_Store
.vscode/
.idea/
*.swp

# Claude Code session metadata
.claude/

# Build artifacts
paper/*.aux
paper/*.log
paper/*.out

# Secrets — never commit
.env
*.env
secrets.json

Always include .claude/ (Claude Code session metadata) and *.env (secret env files) — these are the two most common accidental commits and both are sensitive.

Numbered execution-order scripts

When a pipeline has more than two steps, scripts are numbered in execution order: step01_loop_components.py, step02_kegg_loop_extraction.py, … Each script is self-contained: reads from data/ (or upstream public datasets), writes to data/ and figures/. Side-effects are explicit; ordering is unambiguous.
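
A minimal skeleton of such a step (the computation is a placeholder; the point is the I/O contract: read from data/, write to data/ and figures/):

```python
"""step01_loop_components.py: illustrative skeleton, not the real script."""
from pathlib import Path

import pandas as pd

DATA = Path("data")
FIGS = Path("figures")

def main() -> None:
    df = pd.read_csv(DATA / "input_table.csv")        # hypothetical upstream artifact
    out = df.groupby("component").size().rename("n")  # placeholder computation
    out.to_csv(DATA / "step01_components.csv")        # explicit, declared side-effect

if __name__ == "__main__":
    main()
```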

For pipelines with branching versions (alternate methodologies tested), suffixes preserve provenance: step02_kegg_loop_extraction.py, step02_kegg_loop_extraction_v2.py, step02_kegg_loop_extraction_v3.py. The _v3 is canonical (the README says so); the older versions are kept for reproducibility of intermediate decisions.

Notebook builders

When the same notebook needs to run with parameter variations (e.g., one notebook per model checkpoint), use a Python builder:
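
A minimal sketch, assuming nbformat; the template path, parameter-cell position, and checkpoint list are illustrative:

```python
# Illustrative notebook builder: renders one notebook per checkpoint from a
# shared template by rewriting a single parameter cell.
import nbformat

CHECKPOINTS = [1000, 71000, 143000]  # placeholder step list

def build(template_path: str, checkpoint: int) -> None:
    nb = nbformat.read(template_path, as_version=4)
    # Convention assumed here: the first cell is the parameter cell.
    nb.cells[0] = nbformat.v4.new_code_cell(f"CHECKPOINT = {checkpoint}")
    nbformat.write(nb, f"notebooks/01_validation_step{checkpoint}.ipynb")

if __name__ == "__main__":
    for ckpt in CHECKPOINTS:
        build("notebooks/_template_validation.ipynb", ckpt)
```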

This keeps notebook diffs small and parametric, and lets agents read the intent from the builder rather than the rendered notebook.


5. README template

Every research repo’s README has seven blocks in this order. The full template is in _landing_work/README_TEMPLATE.md; the short summary is here.

- A. Top matter — badges + title + author
- B. Brief Summary
- C. Datasets / Inputs
- D. Repository structure
- E. Reproducing the analysis
- F. Citation
- G. Contact / License


6. Code discipline

Determinism
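
At minimum, pinned seeds; a sketch assuming the Python/NumPy stack used across the portfolio (the seed value is illustrative):

```python
# Illustrative seed pinning so reruns are reproducible. Extend with
# framework-specific seeding (e.g. torch.manual_seed) where a project uses it.
import random

import numpy as np

def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
```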

Numerical hygiene

Statistics
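
The pass criteria in §2 lean on bootstrap CIs; a minimal percentile-bootstrap sketch (resample count, seed, and the mean as statistic are all illustrative):

```python
# Illustrative percentile-bootstrap CI for a sample mean.
import numpy as np

def bootstrap_ci(x: np.ndarray, n_boot: int = 10_000, alpha: float = 0.05,
                 seed: int = 0) -> tuple[float, float]:
    rng = np.random.default_rng(seed)
    means = np.array([
        rng.choice(x, size=x.size, replace=True).mean()
        for _ in range(n_boot)
    ])
    return (float(np.quantile(means, alpha / 2)),
            float(np.quantile(means, 1 - alpha / 2)))
```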

Witness comparisons

When extending a prior analysis, the new code reproduces the prior result as a witness check before computing anything new. The pass criterion is “agreement within 2·SE of the prior point estimate”. Failures here mean the new pipeline diverges from the prior pipeline; fix that before producing new findings.
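
A minimal sketch of that gate (function and argument names are illustrative):

```python
# Illustrative witness check: the new pipeline must reproduce the prior
# point estimate within 2 standard errors before computing anything new.
def witness_check(new_estimate: float, prior_estimate: float,
                  prior_se: float) -> None:
    if abs(new_estimate - prior_estimate) > 2 * prior_se:
        raise RuntimeError(
            f"Witness check failed: |{new_estimate:.4g} - {prior_estimate:.4g}|"
            f" > 2*SE = {2 * prior_se:.4g}; fix the pipeline divergence first."
        )
```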

Example: epistasis-transformer-heads/notebooks/01_phase1_validation.ipynb reproduces the Paper 2 (functional-differentiation-dfe) single-ablation Δ on Pythia 410M step 143000 as a witness, with explicit pass/fail criteria for SHA-256 round-trip, self-pair guard, pair commutativity, and idempotency.


7. Manuscript discipline

Source format

Markdown (paper/manuscript.md) or LaTeX (paper/main.tex), compiled to PDF; the source of record lives in paper/ (§4).

Sections and figure captions

References

Acknowledgements

Per the pi-tissue-aging post-review precedent, Acknowledgements that name LLMs as a “thinking and implementation partner” are NOT included in published manuscripts. Computational tools (compilers, editors, statistical packages, LLMs) are not declared in Acknowledgements, consistent with standard practice.

Hardware and runtime details (e.g., “Google Colab Pro A100, 82 minutes”) remain in submission_checklist.md and in the README’s reproduction block, where they belong.


8. Identity and contact discipline

One canonical identity, applied uniformly across every artifact:

| Field | Canonical value |
| --- | --- |
| Name | Theodor Spiro (no other spelling, ever) |
| Affiliation | Vaika Inc., East Aurora, NY, USA |
| Role | Independent Researcher |
| ORCID | 0009-0004-5382-9346 |
| Public email | tspiro@vaika.org |
| GitHub | mool32 |
| Site | mool32.github.io |

Where this matters

Every README, every manuscript author block, every Zenodo deposit, every arXiv submission, every Google Scholar entry uses these exact values. Inconsistency anywhere creates citation graph fragmentation: a paper deposited as “Theodor S.” may not be linked to “Theodor Spiro” by Scholar.

The historical typo “Serbanescu” appeared in one repo’s LaTeX source and was corrected in commit 031d008 of ai-evolution-universal-signatures — that correction also appears in /llms.txt so AI agents that encounter the stale metadata in cached snapshots have an authoritative correction.

When publishing under a prior name (not applicable here, but documented)

If a researcher does have a prior legal name, the recommended approach is: (a) keep historical record under the prior name, (b) add an explicit name-change note on /about and on the relevant repo, (c) use sameAs in JSON-LD to link both identities.
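
A minimal sketch of the sameAs linkage, emitted from Python to keep one language for examples here; the values are the canonical ones from §8, and a prior-identity URL would simply be appended:

```python
# Illustrative JSON-LD Person record: sameAs tells an agent which
# identities refer to the same person.
import json

person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Theodor Spiro",
    "sameAs": [
        "https://orcid.org/0009-0004-5382-9346",
        "https://github.com/mool32",
        # a prior-name profile URL would be listed here as well
    ],
}

print(json.dumps(person, indent=2))
```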


9. Honest negative results

Negative results — non-replications, failed validations, refuted hypotheses — are first-class outputs, not failures hidden away.

Pattern

State the negative result plainly and up front, in the README and the manuscript framing, and hold the analysis behind it to the same standard as a positive result.

Example

eeg-connectivity-contrast documents a five-dataset non-replication of the Connectivity Contrast Index across 18 metrics × 126 subjects. The honest framing — “no metric replicates across datasets; the strongest within-subject signal reverses direction across paradigms” — is exactly the framing the agent test in §15 described as “unusual in this space” and “a sign of intellectual maturity”.

This is not branding. It is the only correct way to record an outcome that is real but disagrees with the initial hypothesis.


10. Process documentation

HANDOFF.md, RESUME_HERE.md, SESSION_STATE_*.md, PROJECT_MASTER_INDEX.md are kept publicly visible when they reflect the genuine process by which the work unfolded.

When to keep them

When they are an honest record of how the work actually unfolded, session by session.

When to remove them

When they have drifted out of sync with the work, no longer describe the genuine process, or contain anything private.

The default in this portfolio is keep. See functional-differentiation-dfe for the canonical example: HANDOFF.md, RESUME_HERE.md, SESSION_STATE_*.md, and PROJECT_MASTER_INDEX.md are all public and explicitly described in the README structure tree as “kept deliberately as a record of how the work unfolded session by session”.


11. AI / LLM use in the work

In the work itself

LLMs are used as analysis assistants, code-generation partners, and manuscript-editing collaborators. This use is treated like any other computational tool (compilers, statistical packages, editors): not declared in Acknowledgements (see §7), but present in process documentation (§10) when relevant for reproducibility.

In the workflow tooling

The research-pipeline/ private workspace contains the prompts and gating structure that govern how LLMs participate in idea generation, literature search, and analysis planning. The principle is: automate the transport, not the judgment. Wide funnel → hard filter → precise execution. Human gates at every decision-defining step.

Disclosure norms

For ML/CS venues (NeurIPS, ICLR, arXiv): no Acknowledgements naming LLMs. For biology / biomedical venues (bioRxiv, journal submissions following ICMJE-style guidance): the same — bioRxiv screener feedback on pi-tissue-aging v4 confirmed that naming computational tools in Acknowledgements is non-standard.

If a venue’s submission guidelines explicitly require AI-use disclosure, the disclosure goes in the field they specify (cover letter, ethics statement) — not the manuscript Acknowledgements.


12. License pattern

Every research repo uses a dual-license model:

- MIT for code
- CC-BY 4.0 for data, figures, and the manuscript

A single LICENSE file at the repo root contains both blocks. Canonical templates: mool32/heart/LICENSE (MIT only) and the dual form used by pi-tissue-aging/LICENSE (MIT + CC-BY-4.0). The dual form is preferred for any repo that contains the manuscript itself.

The README’s License section names the split explicitly:

- Code (`scripts/`, `*.ipynb`): MIT
- Data (`data/*.csv`): CC-BY 4.0
- Figures (`figures/*`): CC-BY 4.0
- Manuscript (`paper/*`): CC-BY 4.0

Upstream data sources retain their own copyrights; this is named in the data section (“with KEGG / GTEx / TMS / etc. attribution requirements honored per their respective licences”).


13. Repository metadata

Every public research repo has GitHub metadata set, not just READMEs. Without metadata, search and discoverability are broken even when the content is excellent.

Three required fields

gh repo edit mool32/<repo> \
  --description "<full paper title or one-sentence TL;DR>" \
  --homepage "<preprint PDF raw URL or DOI URL>" \
  --add-topic <topic> --add-topic <topic> ...

Controlled topic vocabulary

| Bucket | Topics |
| --- | --- |
| Domain | aging-research, neuroscience, cancer-genomics, transcriptomics, eeg, ecg, cognition, creativity |
| Method | mechanistic-interpretability, signal-processing, population-genetics, dfe, epistasis, oscillatory-dynamics, network-analysis, functional-differentiation |
| Object | pythia, transformer, gtex, tabula-muris-senis, kegg, metaculus |
| Cross | cross-domain, negative-result, in-progress, independent-research, reproducible-research, biomarkers |

Always include independent-research and reproducible-research. Pick 2–4 domain/method tags. Pick 0–2 object tags. Use cross-domain only when the work substantively bridges substrates (LLM ↔ biology, biology ↔ physics, etc.) — not when it merely mentions an analogy.


14. Cross-project linking

Companion papers and sister projects are explicitly cross-linked in both README and manuscript.

README pattern

The header block of every project links to its companion(s):

🧬 **Companion paper:** [Title of companion](https://github.com/mool32/<companion-repo>)
   — one-line description of how it relates
🧪 **Sister project:** [Title](https://github.com/mool32/<sister-repo>)

Example: the epistasis-transformer-heads README header links to both functional-differentiation-dfe (predecessor) and the arXiv:2604.10571 preprint (broader framework). developmental-epistasis-scrna links back to epistasis-transformer-heads as the source of the testable hypothesis.

Why this matters

A reader (or AI agent) arriving at any node of a multi-paper program can navigate to every other node without leaving the GitHub ecosystem. This is also how single citation events propagate: cite one of the papers, the reader discovers the others via the README links.


15. AI-friendly publishing

Documented in detail in /llms.txt and in the head of every page on the landing site; the cold-fetch agent test below shows the bar this sets.

A cold-fetch agent test of the live site (commit 92f1255, 2026-05-01) verified that an AI agent with no prior context correctly extracted the name, affiliation, ORCID, all five themes, six recent papers with concrete findings, and the cross-domain unique-angle framing. The agent identified /llms.txt as the primary navigation aid and described the site as “gold standard from an agent’s perspective”. One real bug (a DAT-RU link badge mismatch) was found and fixed in the same session.

This is the bar. New repos and pages should be designed so that a cold agent can answer “tell me about this work” correctly without the user intervening.


16. Reference exemplar: the epistasis pair

The two epistasis repos jointly demonstrate every principle in this document. Use them as the running template when starting a new project.

ML side: epistasis-transformer-heads

Three locked preregistrations (v1 Pythia 410M, v2 multi-checkpoint, v3 OLMo-2 1B; §2) and witness-checked validation notebooks with explicit pass/fail criteria (§6).

Biology side: developmental-epistasis-scrna

A method-scoping note on observational limits (§1), a pilot calibration gate that must pass before any preregistration locks (§1), and the locked sign convention in design_notes.md §1 (§3).

Pair-level pattern

The two repos illustrate the key claim: two substrates, one apparatus. Same sign convention (with translation), same statistical machinery (epistasis = non-additivity of pairwise perturbation effects), same preregistration discipline. The substrates differ; the methodology generalizes.


17. Quick checklist for a new project

Before pushing the first commit:

- README follows the seven-block template (§5)
- LICENSE in place, dual MIT + CC-BY-4.0 where the repo carries the manuscript (§12)
- Standard .gitignore, including .claude/ and *.env (§4)
- Sign convention locked in design_notes.md §1 (§3)

After preregistration locks:

- Lock commit has a clear message and is referenceable by SHA (§2)
- Every verdict output cites the LOCKED file path and the commit SHA (§2)
- Witness checks pass before any new quantity is computed (§6)

Before public release:

- GitHub description, homepage, and topics set (§13)
- Companion and sister links in the README header (§14)
- Canonical identity fields in every author block (§8)
- A cold agent can answer “tell me about this work” from the published artifacts alone (§15)


Document version

This methodology document is versioned with the rest of the site repo; substantive changes are recorded in the git history of mool32.github.io/methodology.md. The principles here reflect the conventions in use across the portfolio as of 2026-05-02; future revisions will add new principles, deprecate outdated ones, and cite the precedent for each change.