Compiler Internals
This document summarizes the Zynx compiler architecture and backend pipeline.
It is intended for readers who want to understand how Zynx source code is lowered to LLVM IR and then to native binaries.
Transpilation Pipeline
From a high level, compiling a Zynx program involves the following stages:
- Lexing
  Source text is converted into a stream of tokens (keywords, identifiers, literals, operators, punctuation).
- Parsing
  Tokens are organized into an Abstract Syntax Tree (AST) that represents the syntactic structure of the program.
- Semantic Analysis
  The compiler performs:
  - name resolution,
  - type checking,
  - ownership and borrow checking,
  - validation of attributes and target filters.
- Monomorphization
  Generic functions, structs, and type aliases are instantiated for each concrete type combination used in the program.
  The result is an expanded, fully monomorphic AST suitable for backend code generation.
- LLVM IR Lowering
  The monomorphic AST is translated into LLVM IR. This includes:
  - emitting declarations and definitions for functions, structs, enums, and error sets,
  - applying the Zynx symbol mangling scheme,
  - honoring ABI and layout attributes such as @native.
- Backend Compilation
  The generated LLVM IR is passed through the LLVM toolchain (for example, Clang/LLVM).
  That toolchain is responsible for low-level optimization, machine-code generation, and linking, producing the final binary or library.
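The stage ordering above can be pictured as a simple driver that runs each stage in turn and stops at the first failure. The following is only a sketch: `CompileCtx`, `Stage`, `run_pipeline`, and the placeholder stage functions are hypothetical names introduced for illustration and are not taken from the Zynx sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical compilation context; the real compiler state is not shown here. */
typedef struct {
    const char *source;  /* input program text */
    int stages_run;      /* number of stages that completed successfully */
} CompileCtx;

typedef bool (*StageFn)(CompileCtx *ctx);

typedef struct {
    const char *name;
    StageFn run;
} Stage;

/* Placeholder stage: only records that it ran. */
static bool stage_ok(CompileCtx *ctx) { ctx->stages_run++; return true; }

/* Placeholder for a stage that reports an error (for example, a failed type check). */
static bool stage_fail(CompileCtx *ctx) { (void)ctx; return false; }

/* Runs the stages in order and stops at the first failure.
   Returns the index of the failing stage, or -1 if every stage succeeded. */
static int run_pipeline(const Stage *stages, size_t count, CompileCtx *ctx) {
    assert(stages != NULL && ctx != NULL);
    for (size_t i = 0; i < count; i++) {
        if (!stages[i].run(ctx)) {
            fprintf(stderr, "error: stage '%s' failed\n", stages[i].name);
            return (int)i;
        }
    }
    return -1;
}

/* Stage order as listed in this section; the bodies are placeholders. */
static const Stage ZYNX_STAGES[] = {
    {"lexing", stage_ok},
    {"parsing", stage_ok},
    {"semantic analysis", stage_ok},
    {"monomorphization", stage_ok},
    {"LLVM IR lowering", stage_ok},
    {"backend compilation", stage_ok},
};
```

Because each later stage consumes the output of the one before it, stopping at the first failing stage keeps diagnostics tied to the earliest point at which the program was found to be invalid.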
Source Directory Structure
The Zynx compiler implementation is organized into several logical components:
- src/lex/
  Lexical analyzer responsible for turning input text into tokens.
- src/parse/
  Recursive-descent parser that constructs the initial AST from the token stream.
- src/analysis/
  Semantic analysis and type checking, including ownership and borrow analysis, attribute handling, and target filtering.
- src/codegen/
  Backend code generator that lowers the analyzed AST into LLVM-facing constructs and applies symbol mangling.
- src/utils/
  Shared utilities used across the compiler (diagnostics, data structures, helpers).
Exact paths and naming may evolve over time, but this reflects the conceptual separation of concerns inside the compiler.
AST Architecture
The internal AST is built from generic node structs:
- Each node records a kind (for example, NODE_FUNC, NODE_EXPR, NODE_TYPE).
- A node contains a generic header plus a self pointer that refers to a more specific structure for that kind.
- Helper macros such as NODE_CAST(node, type) are used to downcast from the generic node to the concrete representation.
This layout allows:
- uniform handling of traversal and ownership for all node kinds, and
- specialized fields and invariants per node type without duplicating the generic header.
Transformations such as monomorphization and target filtering operate on this AST, producing updated nodes and subtrees as needed.
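The generic header, kind tag, self pointer, and NODE_CAST downcast described in this section might look roughly like the following. This is a minimal sketch under stated assumptions: only the kind names, the `self` field, and the `NODE_CAST(node, type)` shape come from the text above, while the `next` field, `FuncNode`, and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { NODE_FUNC, NODE_EXPR, NODE_TYPE } NodeKind;

/* Generic header shared by every node. Fields other than `self`
   are assumptions made for this sketch. */
typedef struct Node {
    NodeKind kind;      /* discriminates the concrete payload */
    struct Node *next;  /* hypothetical sibling link used for traversal */
    void *self;         /* points at the kind-specific structure */
} Node;

/* Hypothetical concrete payload for NODE_FUNC. */
typedef struct {
    const char *name;
    int param_count;
} FuncNode;

/* Downcast from the generic node to its concrete representation. */
#define NODE_CAST(node, type) ((type *)(node)->self)

/* Allocates a function node; returns NULL on allocation failure. */
static Node *node_new_func(const char *name, int param_count) {
    Node *n = malloc(sizeof *n);
    FuncNode *f = malloc(sizeof *f);
    if (!n || !f) {
        free(n);
        free(f);
        return NULL;
    }
    f->name = name;
    f->param_count = param_count;
    n->kind = NODE_FUNC;
    n->next = NULL;
    n->self = f;
    return n;
}

/* Uniform cleanup: works for any kind because the payload hangs off the header. */
static void node_free(Node *n) {
    if (!n) return;
    free(n->self);
    free(n);
}

/* Kind-specific access: check the tag first, then downcast. */
static int func_param_count(const Node *n) {
    assert(n->kind == NODE_FUNC);
    return NODE_CAST(n, FuncNode)->param_count;
}
```

Checking `kind` before each cast is one way to keep the unchecked `void *` downcast honest; whether the actual macro performs such a check is not stated in this document.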
Safety and Optimization
Several analyses and transformations work together to enforce safety guarantees and optimize the generated code:
- Ownership and Borrowing
  The analysis phase tracks ownership, moves, and borrows. It rejects patterns that would create use-after-free, data races, or other unsafe aliasing. This information is used to insert or elide cleanup calls at precise points.
- Liveness Analysis
  Liveness determines when values are no longer needed. The compiler uses this to decide where to emit drop logic and to minimize the lifetime of owned resources.
- Unique Pointer Lowering
  Unique<T> values are lowered to heap allocations via __zynx_builtin_alloc with automatic cleanup through GCC/Clang __attribute__((cleanup(...))). Returning a locally-created value as &T is rejected at compile time; the programmer must use Unique<T> to make heap allocation explicit.
- Name Mangling
  Zynx encodes module paths, type information, and overloading into mangled symbol names that are valid C identifiers. This enables:
  - linking multiple modules and generics without collisions, and
  - interoperation with C tooling that expects stable symbol names.
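To make the idea of encoding a module path and type arguments into a C identifier concrete, here is a purely illustrative sketch. It is not the Zynx mangling scheme: the `zx` prefix, the length-prefixed segments, the `I`...`E` markers around type arguments, and all function names are assumptions invented for this example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Appends text to out if it fits, keeping out NUL-terminated. */
static bool append(char *out, size_t cap, size_t *used, const char *text) {
    size_t n = strlen(text);
    if (*used + n + 1 > cap) return false;  /* +1 for the terminating NUL */
    memcpy(out + *used, text, n + 1);
    *used += n;
    return true;
}

/* Appends "<length><segment>", for example "math" becomes "4math". */
static bool append_segment(char *out, size_t cap, size_t *used, const char *seg) {
    char len_text[32];
    snprintf(len_text, sizeof len_text, "%zu", strlen(seg));
    return append(out, cap, used, len_text) && append(out, cap, used, seg);
}

/* Hypothetical scheme: "zx" + path segments + optional "I" <type args> "E".
   Segments are assumed to already be valid identifier text.
   Returns false if the output buffer is too small. */
static bool mangle(const char *const *path, size_t path_len,
                   const char *const *type_args, size_t type_len,
                   char *out, size_t cap) {
    size_t used = 0;
    if (!append(out, cap, &used, "zx")) return false;
    for (size_t i = 0; i < path_len; i++) {
        if (!append_segment(out, cap, &used, path[i])) return false;
    }
    if (type_len > 0) {
        if (!append(out, cap, &used, "I")) return false;
        for (size_t i = 0; i < type_len; i++) {
            if (!append_segment(out, cap, &used, type_args[i])) return false;
        }
        if (!append(out, cap, &used, "E")) return false;
    }
    return true;
}
```

Under this made-up scheme, the path `math`, `max` instantiated with a type named `int` yields `zx4math3maxI3intE`, and the same path without type arguments yields `zx4math3max`. Length prefixes keep distinct paths from colliding while using only characters that are legal in a C identifier.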
For more detail on mangling, see Symbol Mangling.
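The Unique pointer lowering described above relies on the GCC/Clang cleanup attribute, which runs a handler when an annotated variable goes out of scope. The sketch below shows that mechanism in isolation. `sketch_alloc` is a stand-in for __zynx_builtin_alloc (whose real signature is not documented here), and `drop_int`, `UNIQUE_INT`, and the `live_allocations` counter are hypothetical names used only for illustration. It requires GCC or Clang.

```c
#include <stdlib.h>

static int live_allocations = 0;  /* instrumentation for this sketch only */

/* Stand-in for __zynx_builtin_alloc; the real signature is an unknown. */
static void *sketch_alloc(size_t size) {
    void *p = malloc(size);
    if (p) live_allocations++;
    return p;
}

/* Cleanup handler: receives a pointer to the annotated variable itself. */
static void drop_int(int **slot) {
    if (*slot) {
        free(*slot);
        *slot = NULL;
        live_allocations--;
    }
}

/* Declares an owning int pointer that is released at scope exit. */
#define UNIQUE_INT __attribute__((cleanup(drop_int))) int *

/* Sums 1..n through a heap-allocated accumulator; returns -1 if allocation fails. */
static int sum_to(int n) {
    UNIQUE_INT acc = sketch_alloc(sizeof *acc);
    if (!acc) return -1;  /* drop_int still runs and sees NULL */
    *acc = 0;
    for (int i = 1; i <= n; i++) {
        *acc += i;
    }
    return *acc;  /* the value is read before drop_int runs at scope exit */
}
```

Every return path leaves the function with the allocation released, which is the property the generated code needs. Moving ownership out of the function would additionally require clearing the variable before it goes out of scope, so that the handler sees NULL and skips the free.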
See also
- Memory Model – Ownership rules, borrows, and deterministic cleanup.
- Symbol Mangling – Encoding of Zynx names into C symbols.
- Static Linking – How compiled outputs are packaged and linked.