Compiler Internals

This document summarizes the Zynx compiler architecture and backend pipeline.
It is intended for readers who want to understand how Zynx source code is lowered to LLVM IR and then to native binaries.

Transpilation Pipeline

From a high level, compiling a Zynx program involves the following stages:

  1. Lexing
    Source text is converted into a stream of tokens (keywords, identifiers, literals, operators, punctuation).

  2. Parsing
    Tokens are organized into an Abstract Syntax Tree (AST) that represents the syntactic structure of the program.

  3. Semantic Analysis
    The compiler performs:

    • name resolution,
    • type checking,
    • ownership and borrow checking,
    • validation of attributes and target filters.

  4. Monomorphization
    Generic functions, structs, and type aliases are instantiated for each concrete type combination used in the program.
    The result is an expanded, fully monomorphic AST suitable for backend code generation.

  5. LLVM IR Lowering
    The monomorphic AST is translated into LLVM IR. This includes:

    • emitting declarations and definitions for functions, structs, enums, and error sets,
    • applying the Zynx symbol mangling scheme,
    • honoring ABI and layout attributes such as @native.

  6. Backend Compilation
    The generated LLVM IR is passed through the LLVM toolchain (for example, Clang/LLVM).
    That toolchain is responsible for low-level optimization, machine-code generation, and linking, producing the final binary or library.
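
As an illustration of stage 1, the following is a minimal lexer sketch in C. It is not taken from the Zynx sources: the token categories, the Token struct, and next_token are hypothetical names chosen for this example, and keywords are simply classified as identifiers here.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token categories; the real Zynx lexer's set is not listed here. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_EOF } TokKind;

typedef struct {
    TokKind kind;
    const char *start; /* points into the source buffer, not owned */
    size_t len;        /* number of characters in the token */
} Token;

/* Scan one token starting at *cursor and advance the cursor past it. */
static Token next_token(const char **cursor) {
    const char *p = *cursor;
    while (isspace((unsigned char)*p)) p++; /* skip leading whitespace */

    Token t = { TOK_EOF, p, 0 };
    if (*p == '\0') { *cursor = p; return t; }

    if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = TOK_IDENT; /* keywords would be separated out in a later lookup */
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else {
        t.kind = TOK_PUNCT; /* single-character operator or punctuation */
        p++;
    }
    t.len = (size_t)(p - t.start);
    *cursor = p;
    return t;
}
```

Calling next_token repeatedly until it returns TOK_EOF yields the token stream that the parser in stage 2 would consume.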

Source Directory Structure

The Zynx compiler implementation is organized into several logical components:

  • src/lex/
    Lexical analyzer responsible for turning input text into tokens.

  • src/parse/
    Recursive‑descent parser that constructs the initial AST from the token stream.

  • src/analysis/
    Semantic analysis and type checking, including ownership and borrow analysis, attribute handling, and target filtering.

  • src/codegen/
    Backend code generator that lowers the analyzed AST into LLVM-facing constructs and applies symbol mangling.

  • src/utils/
    Shared utilities used across the compiler (diagnostics, data structures, helpers).

Exact paths and names may change over time, but this layout reflects the conceptual separation of concerns inside the compiler.

AST Architecture

The internal AST is built from generic node structs:

  • Each node records a kind (for example, NODE_FUNC, NODE_EXPR, NODE_TYPE).
  • A node contains a generic header plus a self pointer that refers to a more specific structure for that kind.
  • Helper macros such as NODE_CAST(node, type) are used to downcast from the generic node to the concrete representation.

This layout allows:

  • uniform handling of traversal and ownership for all node kinds, and
  • specialized fields and invariants per node type without duplicating the generic header.

Transformations such as monomorphization and target filtering operate on this AST, producing updated nodes and subtrees as needed.
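
The node layout described above can be sketched in C roughly as follows. Only the kind names and the NODE_CAST(node, type) shape come from this document; the field names, the FuncNode payload, and the func_name helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NODE_FUNC, NODE_EXPR, NODE_TYPE } NodeKind;

/* Generic header shared by every node (field names are assumed). */
typedef struct Node {
    NodeKind kind;
    int line;   /* hypothetical source-location field */
    void *self; /* points at the kind-specific structure */
} Node;

/* Hypothetical payload for NODE_FUNC nodes. */
typedef struct FuncNode {
    const char *name;
    size_t param_count;
} FuncNode;

/* Downcast from the generic node to its concrete representation. */
#define NODE_CAST(node, type) ((type *)(node)->self)

/* Read a function's name, checking the kind tag before casting. */
static const char *func_name(const Node *n) {
    assert(n->kind == NODE_FUNC);
    return NODE_CAST(n, FuncNode)->name;
}
```

Because the macro itself performs an unchecked cast, this sketch checks the kind tag in the caller; a real implementation might fold that check into the macro.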

Safety and Optimization

Several analyses and transformations work together to enforce safety guarantees and optimize the generated code:

  • Ownership and Borrowing
    The analysis phase tracks ownership, moves, and borrows. It rejects patterns that would create use‑after‑free, data races, or other unsafe aliasing. This information is used to insert or elide cleanup calls at precise points.

  • Liveness Analysis
    Liveness determines when values are no longer needed. The compiler uses this to decide where to emit drop logic and to minimize the lifetime of owned resources.

  • Unique Pointer Lowering
    Unique<T> values are lowered to heap allocations via __zynx_builtin_alloc, with automatic cleanup through the GCC/Clang __attribute__((cleanup(...))) extension. Returning a locally created value as &T is rejected at compile time; the programmer must use Unique<T> to make the heap allocation explicit.

  • Name Mangling
    Zynx encodes module paths, type information, and overloading into mangled symbol names that are valid C identifiers. This enables:

    • linking multiple modules and generics without collisions, and
    • interoperation with C tooling that expects stable symbol names.
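
To show how a path can be turned into a collision-free C identifier, here is a sketch of a length-prefixed encoding. This is not the actual Zynx scheme: the zx_ prefix, the mangle function, and the encoding itself are hypothetical, and type and overload information is left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical scheme: "zx_" followed by each path component written as
 * <length><text>, so {"core", "list", "push"} becomes "zx_4core4list4push".
 * Returns the number of characters written, or -1 if the buffer is too small. */
static int mangle(char *out, size_t cap, const char *const *parts, size_t n) {
    int w = snprintf(out, cap, "zx_");
    if (w < 0 || (size_t)w >= cap) return -1;
    size_t used = (size_t)w;

    for (size_t i = 0; i < n; i++) {
        w = snprintf(out + used, cap - used, "%zu%s", strlen(parts[i]), parts[i]);
        if (w < 0 || (size_t)w >= cap - used) return -1; /* output was truncated */
        used += (size_t)w;
    }
    return (int)used;
}
```

The length prefix keeps distinct paths distinct even when their concatenated text is the same: the components a, bc encode as 1a2bc, while ab, c encode as 2ab1c.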

For more detail on mangling, see Symbol Mangling.
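
The cleanup mechanism named under Unique Pointer Lowering is a GCC/Clang extension that runs a handler when an annotated variable goes out of scope. The sketch below shows that mechanism in isolation. The signature of __zynx_builtin_alloc is not given in this document, so demo_alloc is a hypothetical stand-in, and drop_int, use_unique, and live_allocs exist only for this example.

```c
#include <stdlib.h>

static int live_allocs = 0; /* counter used only to make cleanup observable */

/* Hypothetical stand-in for __zynx_builtin_alloc. */
static void *demo_alloc(size_t size) {
    void *p = malloc(size);
    if (p) live_allocs++;
    return p;
}

/* Cleanup handler: receives a pointer to the annotated variable. */
static void drop_int(int **slot) {
    if (*slot) {
        free(*slot);
        *slot = NULL;
        live_allocs--;
    }
}

/* The handler runs on every exit path from the variable's scope. */
static int use_unique(void) {
    __attribute__((cleanup(drop_int))) int *value = demo_alloc(sizeof *value);
    if (!value) return -1; /* handler still runs and sees a NULL pointer */
    *value = 41;
    return *value + 1;     /* result is computed before the handler frees value */
}
```

After use_unique returns, live_allocs is back at zero without any explicit free in the function body, which is the property the lowering relies on.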

See also