nl2spec is a research-oriented pipeline for transforming natural language descriptions of API usage rules into executable Runtime Verification (RV) specifications using Large Language Models (LLMs).
The project focuses on prompt engineering and on the generation, validation, and evaluation of Intermediate Representations (IRs). It supports controlled experiments that compare zero-shot, one-shot, and few-shot prompting.
Architecture Status (Important)
⚠️ The prompting and generation architecture is frozen.
From this point forward:
- The codebase is stable
- Experimental variations must be performed only via configuration
- Structural changes require explicit justification
This freeze enables reproducible batch experiments, ablation studies, and sound empirical evaluation.
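As a rough illustration of configuration-only variation, the sketch below expands a grid of settings into individual runs without touching pipeline code. The keys `prompting_mode`, `llm_backend`, and `seed` are hypothetical and not taken from the actual `config.yaml`; the dictionary stands in for a parsed configuration file.

```python
from itertools import product

# Hypothetical experiment grid; the real keys in config.yaml may differ.
EXPERIMENT_GRID = {
    "prompting_mode": ["zero-shot", "one-shot", "few-shot"],
    "llm_backend": ["mock"],
    "seed": [0, 1],
}


def expand_grid(grid):
    """Return one settings dict per combination of the grid values."""
    keys = sorted(grid)  # sorted keys give a deterministic run order
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


runs = expand_grid(EXPERIMENT_GRID)
print(len(runs))  # 3 modes x 1 backend x 2 seeds = 6
```

Under this assumption, an ablation changes only the grid values, so every run in a batch goes through the same frozen code path.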
Supported IR Types
The pipeline supports the following Intermediate Representation (IR) types:
- FSM: Finite State Machine specifications
- ERE: Event-Response Expressions
- EVENT: Event-based rules
- LTL: Linear Temporal Logic specifications (experimental)
The IR type is automatically inferred from each scenario's metadata. No manual selection is required.
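One way such inference could look is sketched below. It assumes a hypothetical scenario layout in which the type is stored under `scenario["metadata"]["ir_type"]`; the actual field names in `nl_scenarios.json` may differ, and the example scenario is made up for illustration.

```python
SUPPORTED_IR_TYPES = {"FSM", "ERE", "EVENT", "LTL"}


def infer_ir_type(scenario):
    """Read the IR type from a scenario's metadata and validate it.

    Assumes the hypothetical layout scenario["metadata"]["ir_type"].
    """
    metadata = scenario.get("metadata", {})
    raw = metadata.get("ir_type")
    if raw is None:
        raise ValueError("scenario metadata has no 'ir_type' field")
    ir_type = str(raw).strip().upper()  # accept 'fsm', ' FSM ', etc.
    if ir_type not in SUPPORTED_IR_TYPES:
        raise ValueError(f"unsupported IR type: {raw!r}")
    return ir_type


# Made-up scenario used only to exercise the function.
example = {
    "id": "s1",
    "description": "hasNext() must be called before next().",
    "metadata": {"ir_type": "fsm"},
}
print(infer_ir_type(example))  # FSM
```

Failing loudly on a missing or unknown type, rather than falling back to a default, keeps a mislabeled scenario from being silently evaluated against the wrong prompt template.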
Repository Structure
nl2spec/
├── core/                 # Core validation and LLM abstractions
│   ├── inspection/       # IR schema validation
│   ├── handlers/         # Few-shot loaders
│   └── llms/             # LLM backends (mock, real)
│
├── prompts/              # Prompt templates (FSM / ERE / EVENT / LTL)
│
├── pipeline/             # Orchestration (generation, logging, batch runs)
│
├── datasets/
│   ├── nl_scenarios.json # Natural language scenarios
│   └── fewshot/          # Few-shot examples (fsm/, ere/, event/, ltl/)
│
├── outputs/              # Experimental results (CSV, tables)
│
├── tests/                # Unit and integration tests
│
├── config.yaml           # Experimental configuration (DO NOT hardcode)
└── README.md
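
The `datasets/fewshot/` layout suggests that few-shot examples are looked up per IR type. The sketch below shows one plausible mapping from an IR type and a prompting mode to a set of example files. The function names, the mode strings, and the default of three examples for few-shot are assumptions for illustration, not the behaviour of the actual loaders in `core/handlers/`.

```python
from pathlib import Path


def fewshot_dir(root, ir_type):
    """Map an IR type such as 'FSM' to <root>/datasets/fewshot/fsm."""
    return Path(root) / "datasets" / "fewshot" / ir_type.lower()


def shots_for(mode, k=3):
    """Return how many examples a prompting mode uses (k is an assumed default)."""
    if mode == "zero-shot":
        return 0
    if mode == "one-shot":
        return 1
    if mode == "few-shot":
        return k
    raise ValueError(f"unknown prompting mode: {mode!r}")


def select_examples(example_files, mode, k=3):
    """Pick example files for a mode; sorting keeps the choice reproducible."""
    return sorted(example_files)[: shots_for(mode, k)]


print(fewshot_dir("nl2spec", "FSM").as_posix())
print(select_examples(["b.json", "a.json", "c.json"], "one-shot"))
```

Sorting before slicing is a deliberate choice here: it makes the selected examples independent of filesystem listing order, which matters when runs are meant to be repeatable.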