AI-Enhanced Scientific Writing

What is Quarto?

Quarto is an open-source scientific publishing system that enables researchers to weave together code, results, and narrative text into reproducible documents. Think of it as the evolution of R Markdown and Jupyter notebooks, a polyglot platform that works seamlessly with R, Python, Julia, and other languages.

A Quarto project combines:

Source files (.qmd or .ipynb): Plain-text documents containing code chunks, statistical analyses, and markdown text
Configuration (_quarto.yml): Defines the project structure, navigation, and visual theme
Rendered output (docs/ directory in my projects; the location is set by output-dir in _quarto.yml): Auto-generated HTML, PDF, or website versions of your analyses

For scientists, Quarto solves a critical problem: maintaining a single source of truth where code and interpretation live together, preventing the drift between analysis and write-up that plagues traditional workflows.

The Challenge: From Code to Coherent Narrative

While Quarto excels at rendering code and results, transforming computational analysis into publication-ready scientific prose remains cognitively demanding. I normally have to:

Synthesize complex operations into clear methodology descriptions
Extract quantitative results from rendered outputs
Maintain consistent scientific voice and style
Ensure logical flow between analysis sections
Balance technical accuracy with accessibility

This is where AI integration becomes transformative.

The AI-Enhanced Workflow Architecture

I have implemented a Claude-assisted scientific writing system in all my Quarto projects, which I run in the Positron IDE. It is built on three pillars:

1. Project Instructions (`CLAUDE.md`)

The CLAUDE.md file serves as the command center, providing Claude with:

System architecture overview: Explains the Positron IDE environment, Quarto structure, and file organization
Coding standards: Enforces tidyverse patterns, vectorization over loops, use of native R pipe |>, and mandatory code annotations
Workflow protocols: Defines how to interpret results, write/edit code, and handle git operations
Quick reference guide: Maps common requests to specific actions

This file essentially “programs” Claude to function as a domain-specific research assistant who understands both the technical environment and scientific goals.

2. Context Profiles (`.context/` directory)

Three critical files shape Claude’s behavior:

`quarto-narrative-skill/SKILL.md`

This is the heart of the enhancer system: a comprehensive instruction manual for transforming Quarto analyses into scientific narratives. It defines:

Input requirements: Reads three files per analysis:
1. The .qmd source (code logic and structure)
2. The rendered .html (numerical results and outputs)
3. The index.qmd (experimental context and research questions)
Core workflow:
1. Parse all input files to extract code operations, results, and context
2. Maintain exact section hierarchy from the .qmd structure
3. For each section: describe code operations → report HTML results → interpret biologically
4. Apply strict writing style constraints
5. Synthesize into cohesive narrative prose
HTML parsing strategy: Extracts content from rendered HTML by identifying text between tags (<p>, <td>, <pre>), ignoring JavaScript/CSS, and pulling numerical values from tables and code outputs
Special handling: Covers multiple comparisons, negative results, technical issues, and complex visualizations
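
The HTML parsing strategy can be sketched in Python with the standard library alone. This is an illustration of the idea, not the contents of SKILL.md: ResultTextExtractor, extract_results, and the NUMBER pattern are hypothetical names, and the sketch assumes well-formed HTML in which every <p>, <td>, and <pre> is explicitly closed.

```python
import re
from html.parser import HTMLParser

# Numeric tokens (integers, decimals, scientific notation) that are not
# embedded in an identifier such as "log2FC" or "DESeq2".
NUMBER = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?(?!\w)")


class ResultTextExtractor(HTMLParser):
    """Collect text inside <p>, <td>, and <pre>; ignore <script> and <style>."""

    KEEP = {"p", "td", "pre"}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.keep_depth = 0  # > 0 while inside a tag we want to read
        self.skip_depth = 0  # > 0 while inside JavaScript or CSS
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.KEEP:
            self.keep_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.KEEP and self.keep_depth:
            self.keep_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.keep_depth and not self.skip_depth:
            self.chunks.append(text)


def extract_results(html):
    """Return (visible text, list of numeric tokens) from rendered HTML."""
    parser = ResultTextExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    return text, NUMBER.findall(text)
```

Calling extract_results on the contents of a rendered page returns the text found in those three tags together with the numeric tokens in it. Digits that are part of identifiers such as log2FC are skipped on purpose, so they are not mistaken for results.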

`writingStyle_OSmithies.json`

A structured specification of scientific voice inspired by Nobel laureate Oliver Smithies. Key constraints:

Voice: Active, first-person agency (“We performed” not “was performed”)
Tone: Decisive, objective, authoritative—avoid hedging
Sentence structure: Under 25 words, one idea per sentence, linear flow
Data reporting: Specific values, explicit statistical significance
Flow pattern: Context → Action → Result → Meaning

Example transformation:

❌ "Analysis was performed using DESeq2"
✅ "We performed analysis using DESeq2"

❌ "Many genes were significant"
✅ "We detected 347 significant genes"
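
To make the "style as a machine-readable specification" idea concrete, here is a minimal sketch of how a few of these constraints could be checked mechanically. The post does not show the actual JSON schema, so the STYLE dictionary, its keys, and the phrase lists are assumptions for illustration only, and the passive-voice check is a crude phrase match rather than real grammatical analysis.

```python
import re

# Hypothetical constraint set mirroring the list above; the real JSON may differ.
STYLE = {
    "max_words_per_sentence": 25,  # "under 25 words"
    "passive_markers": ["was performed", "were performed", "was conducted", "were analyzed"],
    "vague_quantifiers": ["many", "several", "some"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check_style(text, style=STYLE):
    """Return (sentence, problem) pairs for every rule a sentence breaks."""
    problems = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        n_words = len(sentence.split())
        if n_words >= style["max_words_per_sentence"]:
            problems.append((sentence, f"too long: {n_words} words"))
        for phrase in style["passive_markers"]:
            if phrase in lowered:
                problems.append((sentence, f"passive voice: '{phrase}'"))
        for word in style["vague_quantifiers"]:
            if re.search(rf"\b{re.escape(word)}\b", lowered):
                problems.append((sentence, f"vague quantity: '{word}'"))
    return problems


for draft in ["Analysis was performed using DESeq2.", "We detected 347 significant genes."]:
    print(draft, "->", check_style(draft))
```

A checker like this cannot judge tone or flow, which is why the style file is given to Claude as instructions. It is still useful as a quick lint pass over generated prose.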

`git_workflow.md`

Enforces version control standards:

Commit format: Title line + bullet points explaining specific changes
Distinguishes between detailed explanations (for analysis code) and simple messages (for documentation)
Explicitly excludes AI attribution messages
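
These commit rules are simple enough to check automatically. The sketch below is one possible validator, not part of git_workflow.md: the function name, the require_bullets flag, and the attribution patterns are all assumptions chosen for illustration.

```python
import re

# Assumed patterns for AI attribution trailers; adjust to whatever your tools emit.
AI_ATTRIBUTION = re.compile(
    r"co-authored-by:.*(claude|anthropic)|generated (with|by) .*claude",
    re.IGNORECASE,
)


def check_commit_message(message, require_bullets=True):
    """Return a list of problems; an empty list means the message passes.

    require_bullets=True models commits that touch analysis code;
    pass False for simple documentation commits.
    """
    lines = message.strip().splitlines()
    if not lines or not lines[0].strip():
        return ["missing title line"]

    problems = []
    body = [ln for ln in lines[1:] if ln.strip()]
    bullets = [ln for ln in body if ln.lstrip().startswith(("-", "*"))]
    if require_bullets and not bullets:
        problems.append("analysis commits need bullet points describing specific changes")
    if AI_ATTRIBUTION.search(message):
        problems.append("remove AI attribution")
    return problems
```

A function like this could be called from a commit-msg hook, although the post itself relies on Claude following the written instructions.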

IMPORTANT: These files live inside the Quarto project, in the .context/ directory.

3. The Invocation Pattern

When I need to document an analysis, I fire up Claude Code in a terminal window and use the following prompt to trigger the workflow:

"Write the analysis for xxxxxx.qmd"

Claude then invokes quarto-narrative-skill, which:

Reads the source .qmd file (understanding methodology)
Reads the rendered .html file (extracting results)
Reads index.qmd (gathering experimental context)
Applies the Oliver Smithies writing style
Generates publication-ready prose that:
- Maintains the exact section structure from the .qmd
- Describes what each code chunk does and why
- Reports specific numerical results from the HTML
- Provides biological interpretation
- Uses active voice, concise sentences, and precise terminology
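
The three reads at the start of this sequence depend on all three files being present and the project having been rendered. A minimal sketch of that lookup follows. It assumes the rendered HTML sits under docs/ at the same relative path as the source, and resolve_inputs is a hypothetical helper, not something the skill defines.

```python
from pathlib import Path


def resolve_inputs(qmd_name, project_root=".", output_dir="docs"):
    """Locate the three files the skill reads for one analysis.

    Assumes the rendered page mirrors the source path under output_dir,
    e.g. analysis/de.qmd -> docs/analysis/de.html.
    """
    root = Path(project_root)
    inputs = {
        "source": root / qmd_name,                                       # methodology
        "rendered": root / output_dir / Path(qmd_name).with_suffix(".html"),  # results
        "context": root / "index.qmd",                                   # research questions
    }
    missing = [str(path) for path in inputs.values() if not path.is_file()]
    if missing:
        raise FileNotFoundError(
            "Missing input(s): " + ", ".join(missing) + ". Has the project been rendered?"
        )
    return inputs
```

Failing early when the .html is missing matters because the numerical results come only from the rendered output, never from the code itself.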

Example Output Pattern

Given a .qmd section:

## Differential Expression Analysis
dds <- DESeq(dds)
res <- results(dds, contrast=c("genotype", "APOE2", "WT"))
res_sig <- subset(res, padj < 0.05 & abs(log2FoldChange) > 1)

And HTML showing:

347 genes with padj < 0.05 and |log2FC| > 1
Upregulated: 198 genes
Downregulated: 149 genes

Claude generates:

“We performed differential expression analysis using DESeq2 to compare APOE2 and wild-type mice. We identified 347 genes with adjusted p-values below 0.05 and absolute log2 fold changes exceeding 1. Of these, 198 genes showed increased expression in APOE2 mice, while 149 showed decreased expression.”
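
One way to double-check an output like this is to confirm that every number in the generated prose also appears in the rendered results. The sketch below is an assumed post-hoc check, not part of the skill. unsupported_numbers is a hypothetical helper, and it compares numeric tokens as strings, so 0.050 and 0.05 would count as different.

```python
import re

# Numeric tokens that are not embedded in identifiers such as "DESeq2" or "log2".
NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?(?!\w)")


def unsupported_numbers(narrative, results_text):
    """Return numbers that appear in the narrative but not in the results."""
    reported = set(NUMBER.findall(results_text))
    return sorted(n for n in set(NUMBER.findall(narrative)) if n not in reported)


results_text = (
    "347 genes with padj < 0.05 and |log2FC| > 1\n"
    "Upregulated: 198 genes\n"
    "Downregulated: 149 genes"
)
narrative = (
    "We identified 347 genes with adjusted p-values below 0.05 and absolute "
    "log2 fold changes exceeding 1. Of these, 198 genes showed increased "
    "expression in APOE2 mice, while 149 showed decreased expression."
)
print(unsupported_numbers(narrative, results_text))  # empty list: every number is backed by the HTML
```

An empty list does not prove the prose is right, since a number can be correct but attached to the wrong claim. A non-empty list is a cheap signal that something was mistyped or invented.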

Why This Approach Works

This system succeeds because it:

Separates concerns: Code lives in .qmd, results in .html, and style rules in CLAUDE.md and .context/, so the style rules can be reused in other projects.
Enforces consistency: The writing style JSON ensures every narrative follows the same rigorous standard
Preserves scientific accuracy: By reading both source code and rendered output, Claude reports exact values and methods
Maintains structure: The skill explicitly preserves the .qmd section hierarchy, preventing AI “creativity” from reorganizing logical flow
Is reproducible: The entire workflow is version-controlled and can be audited

Key Innovations

HTML-as-database: Treating rendered HTML as a structured data source for result extraction
Style-as-code: Formalizing scientific voice into a machine-readable specification
Context injection: Using .context/ files to provide persistent behavioral constraints
Skill-based invocation: Creating specialized “modes” for Claude through detailed skill definitions

Practical Benefits

For researchers, this system:

Reduces writing time from hours to minutes, in my experience
Ensures consistent voice across multi-year projects
Catches methodology/results mismatches (if code doesn’t match description, it’s obvious)
Enables rapid iteration (change analysis → re-render → regenerate narrative)
Creates an audit trail (git history shows when code changed vs. when prose changed)

Files That Make It Work

Essential components:

CLAUDE.md: System instructions and quick reference
.context/quarto-narrative-skill/SKILL.md: Narrative generation protocol
.context/writingStyle_OSmithies.json: Voice specification
.context/git_workflow.md: Version control standards
_quarto.yml: Project configuration

Together, these files transform Claude from a general-purpose AI into a specialized scientific writing assistant that understands computational biology, R/Python code, statistical analysis, and publication standards.

The Future of AI-Assisted Science

This method demonstrates that AI’s value in research isn’t just in running analyses; it’s in maintaining the connective tissue between computation and communication. By codifying scientific writing standards and creating structured workflows, we can leverage AI to handle the mechanical aspects of prose generation while researchers focus on interpretation and discovery.

The system is transparent, auditable, and version-controlled, all essential properties for scientific reproducibility. It doesn’t replace scientific thinking; it amplifies it by removing friction between “I analyzed this” and “I can clearly explain what I did.”