Operationalizing Metascience: An AI Tool to Detect Hidden Methodological Flaws at Scale

Speaker: Peter Hilpert, University of Lausanne

Abstract

Metascience has documented persistent methodological flaws in social research, including inadequate theory formalization, misspecified causal assumptions, problematic model selection, and fragile interpretability. These long-acknowledged issues remain largely unaddressed, rendering many findings uninterpretable even when statistically replicated. Replicating flawed studies generates false confidence and entrenches systemic error.

Peer review, the cornerstone of scientific quality control, systematically fails to detect these flaws, not through negligence but because the problems are unknown to most reviewers. Defined across dozens of fragmented metascience papers and rarely taught in curricula, they lie outside typical domain expertise. Even reviewers who know them cannot evaluate them comprehensively within the one to two hours typically available, making thorough assessment structurally infeasible.

We propose a scalable solution: operationalizing metascience via modular, AI-driven automation. Using causal ambiguity as a proof of concept, we manually double-coded this issue in 30 published studies to establish ground truth. We then developed a RAG-based LLM tool that analyzes full-text manuscripts section by section (abstract, introduction, methods) with a multi-step prompt incorporating definitions, linguistic markers, and consistency checks. Validation against the human coding confirms feasibility: accuracy is promising and improvable, indicating that AI can detect issues that standard review routinely misses.
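To make the pipeline concrete, the sketch below illustrates the kind of section-by-section, multi-step prompting described above. It is a minimal sketch rather than the actual implementation; `retrieve_definitions` and `llm_complete` are hypothetical stubs standing in for the retrieval (RAG) layer and the LLM provider.

```python
# A minimal sketch, not the authors' implementation: section-by-section screening for
# causal ambiguity with a multi-step prompt. retrieve_definitions and llm_complete are
# hypothetical stubs standing in for the retrieval (RAG) and LLM-provider layers.
from dataclasses import dataclass


def retrieve_definitions(issue: str) -> str:
    """Stub for the RAG step: return curated definitions and linguistic markers."""
    return ("Causal ambiguity: causal language ('influences', 'leads to') used "
            "without a design or identification strategy that licenses it.")


def llm_complete(prompt: str) -> str:
    """Stub for the LLM call; swap in any chat-completion client here."""
    return "NO-FLAG"


PROMPT_TEMPLATE = """You are screening one manuscript section for causal ambiguity.

Reference material:
{definitions}

Step 1: Quote every causal-sounding claim in the section.
Step 2: For each claim, state whether the described design supports a causal reading.
Step 3: On the final line, answer FLAG or NO-FLAG.

Section ({name}):
{text}
"""


@dataclass
class Finding:
    section: str      # "abstract", "introduction", or "methods"
    flagged: bool     # did the model flag causal ambiguity?
    raw_answer: str   # full model output, kept for auditing


def screen_manuscript(sections: dict[str, str]) -> list[Finding]:
    """Run the multi-step prompt separately on each manuscript section."""
    definitions = retrieve_definitions("causal ambiguity")
    findings = []
    for name, text in sections.items():
        prompt = PROMPT_TEMPLATE.format(definitions=definitions, name=name, text=text)
        answer = llm_complete(prompt)
        flagged = answer.strip().splitlines()[-1].strip() == "FLAG"
        findings.append(Finding(name, flagged, answer))
    return findings


def cross_section_check(findings: list[Finding]) -> list[str]:
    """Minimal consistency check: surface sections that disagree for human review."""
    flagged = {f.section for f in findings if f.flagged}
    if flagged and flagged != {f.section for f in findings}:
        return [f"Flags are inconsistent across sections: {sorted(flagged)}"]
    return []
```

In this sketch the section-level results are kept separate so that disagreement between, say, a causally worded abstract and a purely correlational methods section can itself be surfaced as a finding.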

This modular framework can be extended to other detectable problems (e.g., theory vagueness, model selection sensitivity) by adapting the prompts and retrieval components. Such tools could transform practice: guiding peer review with targeted flags for journals and funding agencies, embedding rigor in graduate training, supporting self-diagnosis during writing, and enabling corpus-level scans of millions of articles to quantify the prevalence of each problem. Transparent, evidence-based documentation creates constructive pressure for improvement without blame.
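One way to picture this modularity, purely as an illustration, is a registry of plug-in issue modules, each bundling a retrieval query, prompt steps, and target sections. The issue names and fields below are assumptions for the sketch, not a published catalog.

```python
# Illustrative only: the modular framework pictured as a registry of issue modules.
# Issue names and fields are assumptions for this sketch, not a published catalog.
from dataclasses import dataclass


@dataclass
class IssueModule:
    name: str                 # short identifier for the methodological problem
    retrieval_query: str      # what the RAG layer should look up
    prompt_steps: list[str]   # multi-step instructions given to the LLM
    sections: tuple[str, ...] = ("abstract", "introduction", "methods")


ISSUE_CATALOG = [
    IssueModule(
        name="causal_ambiguity",
        retrieval_query="causal language without identifying assumptions",
        prompt_steps=["Quote causal claims", "Check design support", "FLAG or NO-FLAG"],
    ),
    IssueModule(
        name="theory_vagueness",
        retrieval_query="unformalized constructs and unstated functional forms",
        prompt_steps=["List key constructs", "Check for formal definitions", "FLAG or NO-FLAG"],
    ),
    IssueModule(
        name="model_selection_sensitivity",
        retrieval_query="unjustified specification choices and missing robustness checks",
        prompt_steps=["List modeling choices", "Check justification", "FLAG or NO-FLAG"],
        sections=("methods",),
    ),
]
```

Under this framing, extending the tool to a new problem amounts to adding one module plus a manually coded validation set, rather than rebuilding the pipeline.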

Next steps require collective action: community refinement of the problem catalog, large-scale manual coding for training, and iterative AI benchmarking. This open, scalable infrastructure offers a concrete pathway to integrate metascience into everyday practice. We invite collaboration to deploy it across journals, funding agencies, and curricula.
