Most Popular
Courier is a notification service that centralizes all of your templates and messaging channels in one place which increases visibility and reduces engineering time.
Sign-up
The final phase of our compiler is code generation. This phase takes the AST and converts it to a set of executable instructions. In our case, WebAssembly. To accomplish this, we are going to use a popular WebAssembly compiler toolchain called binaryen.
Binaryen does most of the heavy compiling tasks for us. Tricky optimizations like dead-code removal, code folding, and more are all handled out of the box. Thanks to binaryen, we can make a powerful language with very little code.
This is the third article in the Build a programming language series. If you’re just starting out, start with the first post here before continuing on.
One of the trickiest tasks in code generation is what to call the code generation function. For this article, I have opted to call the main function compile
. The compile function will take our root AST (a BlockNode
) and return a new binaryen.Module
. The binaryen.Module
exposes functions that can optimize and emit wasm in various formats.
Here's the outline:
1// src/compiler.mts2import binaryen from "binaryen";3import { AstNode, BlockNode, IdentifierNode } from "./types/index.mjs";45export const compile = (block: BlockNode): binaryen.Module => {6// Creates a new binaryen module that our helper functions will fill in7const mod = new binaryen.Module();89// The function map is used to track all the functions and their types. More on this later10const functionMap = generateFunctionMap(block);1112// This function registers all the standard library functions we'll include with our language.13// This includes functions like add, subtract, etc.14registerStandardFunctions(mod, functionMap);1516// This is where the magic happens. Because `BlockNode` is an expression, this17// function can recursively compile every instruction in a wispy program file18compileExpression({19expression: block,20mod,21functionMap,22parameters: new Map(),23});2425// Finally, we return the binaryen module26return mod;27};
Next we define the generateFunctionMap
function. This function crawls the entire expression
tree to find and register function definitions. Its important we do this before actually
compiling the functions as some functions may call other functions before they've been defined.
The return type of generateFunctionMap
is a map where the key is the function name and the
value is an object containing all the important information about the function the compiler
needs to know about. For now, all we need is returnType
.
Here’s the return type definition:
1// src/compiler.mts2type FunctionMap = Map<string, { returnType: number }>;
Now that we have defined our return type, we can define the actual function:
1// src/compiler.mts2const generateFunctionMap = (block: BlockNode): FunctionMap => {3// Preview the first node (i.e. expression) of the block4const firstNode = block.expressions[0];56// If the first node is an identifier and the identifier is "fn", then we know this block represents a function definition.7if (isNodeType(firstNode, "identifier") && firstNode.identifier === "fn") {8// Grab the function identifier / name, and it's return type. This is the second expression of9// the function definition, a typed identifier.10const { identifier, returnType } = getFunctionIdentifier(block); // We'll define this next1112// Return the function map13return new Map([14[identifier, { returnType }],1516// It's possible this function may contain function definitions inside of it. So we17// We put all the remaining expressions of the function into a new block and scan it18// then we merge the resulting map with this one.19...generateFunctionMap({ type: "block", expressions: block.expressions.slice(3) }),20]);21}2223// A block can contain multiple expressions. So, we must scan each one to see if it is a function24// definition. The root `BlockNode` for instance, will almost always have multiple functions.25return block.expressions.reduce((map, expression) => {26// Only block expressions can be functions27if (expression.type === "block") {28return new Map([...map, ...generateFunctionMap(expression)]);29}3031// We can ignore all other expression32return map;33}, new Map());34};
Onto getFunctionIdentifier
we called earlier. This function is simple. It takes a BlockNode
, ensures that the second identifier is a TypedIdentifierNode
, and then returns the identifier
and return type:
1// src/compiler.mts2const getFunctionIdentifier = (block: BlockNode) => {3// Grab the second expression4const node = block.expressions[1];56// Ensure the expression is a typed identifier7if (!isNodeType(node, "typed-identifier")) {8throw new Error("Expected typed function name");9}1011return {12identifier: node.identifier,1314// We have to map the return type to a type binaryen understands.15returnType: mapBinaryenType(node.typeIdentifier),16};17};
As noted in the getFunctionIdentifier
function, Binaryen doesn't understand what type
the string typeIdentifier
is. To handle this we have to map our defined types to binaryen types. For now, we'll just support i32
and f32
. Thankfully binaryen uses the same nomenclature we do. So the map function is pretty simple:
1// src/compiler.mts2const mapBinaryenType = (typeIdentifier: string): binaryen.Type => {3if (typeIdentifier === "i32") return binaryen.i32;4if (typeIdentifier === "f32") return binaryen.f32;5throw new Error(`Unsupported type ${typeIdentifier}`);6};
getFunctionIdentifier
Made a call to a new function, isNodeType
. This function is
essentially the same concept as the isTokenType
function we defined in the parsing article, only for ASTNode
instead of Token
.
Here's the definition:
1// src/compiler.mts2export const isNodeType = <T extends AstNode["type"]>(3item: unknown,4type: T5): item is Extract<AstNode, { type: T }> => {6return (7// Ensure the type exists8!!item &&9// Ensure the type is an object10typeof item === "object" &&11// Cast the type as a record, so TypeScript doesn't get mad at us and then compare the12// type field with the type parameter. If they are equal, we know the node is the13// the type we were looking for.14(item as Record<string, unknown>)["type"] === type;15)16};
With the mapper finished we can start generating some code.
The compileExpression
function is where we really start to make use of binaryen
to model
the generated machine code. Because of the tree structure of an, ahem, abstract syntax tree,
compileExpression
is highly recursive. This is one of my favorite things about programming
languages, their patterns tend to lend themselves to elegant recursive functions with high
levels of code re-use.
Let's start with defining the parameters of compileExpression
. We will need to pass the
binaryen.Module
and the functionMap
we created earlier, the actual expression we are
compiling, and any parameters of the function this expression may be a part of (if it's inside)
of a function. When there are more than two parameters of a function it can be difficult
to visually keep track of what is what. So I like to make it clear by grouping them together
in an object. This enforces labeling the parameters on call and as a result, improves code
readability.
Here's the interface of that object:
1// src/compiler.mts2interface CompileExpressionOpts {3expression: AstNode;4mod: binaryen.Module;5functionMap: FunctionMap; // We defined this earlier6parameters: ParameterMap; // Defined below.7}89// A map where the key is the parameter identifier and the value is the important information10// required by binaryen to fetch the parameter down the line11type ParameterMap = Map<string, { index: number; type: number }>;
Now that we have the options for compileExpression
defined, we can define the actual function. compileExpression
takes CompileExpressionOpts
as a parameter and returns a number
. The job of this function is to take an expression and determine what type of expression it is, from there it can pass the expression to another compiler function that can handle that specific type of expression.
Why return a number? When we build an expression with
binaryen
it returns a number as an identifier for that expression. This allows us to compile an expression ahead of time and then reference that expression later down the line.
Here's the definition:
1// src/compiler.mts2const compileExpression = (opts: CompileExpressionOpts): number => {3// Grab the expression and the binaryen module (mod) from the options.4// The other fields are used by child function calls5const { expression, mod } = opts;67// Map the expression node to it's corresponding specific compiler8if (isNodeType(expression, "block")) return compileBlock({ ...opts, expression });910// Numbers are simple enough to compiler that we can just inline the compiler here.11// They are represented as constants12if (isNodeType(expression, "int")) return mod.i32.const(expression.value);13if (isNodeType(expression, "float")) return mod.f32.const(expression.value);1415if (isNodeType(expression, "identifier")) return compileIdentifier({ ...opts, expression });1617// Throw a helpful error message if we don't recognize the expression18throw new Error(`Unrecognized expression ${expression.type}`);19};
Let's define the compileBlock
functions. Because this function is also compiling an expression,
we can re-use the previously defined CompileExpressionOpts
, but we'll narrow the expression
field to the BlockNode
type, since we know we are compiling a block by the time this function is called:
1interface CompileBlockOpts extends CompileExpressionOpts {2expression: BlockNode;3}45const compileBlock = (opts: CompileBlockOpts): number => {6// We re-map the expression field to block here for clarity.7const { expression: block, mod } = opts;89// When a block has multiple expressions and the first one is an identifier, that means10// the block is actually a function call.11if (isNodeType(block.expressions[0], "identifier") && block.expressions.length > 1) {12// If it is a function call, transfer responsibility to the `compileFunctionCall` function (defined next)13return compileFunctionCall(opts);14}1516// This is where the recursive beauty starts to show. Since every value of a block17// is an expression, we can map each one back to the compileExpression function.18const expressions = block.expressions.map((expression) => {19return compileExpression({ ...opts, expression });20});2122// Now we generate the machine code by calling the block function of binaryen23// This function takes a block name, an array of compiled expressions, and a block return type.24// Named blocks are mostly useful for looping constructs like `for` and `while`. In this25// case we can pass null as we're not compiling a loop construct. Additionally, we can26// pass `auto` as the type since binaryen is smart enough to determine the return type27// of blocks automatically.28return mod.block(null, expressions, binaryen.auto);29};
Note: If you're curious to see how looping works in binaryen / WebAssembly works, check out my blog post on the subject here. Spoiler alert, its pretty weird.
The last simple expression we'll compile in this section is the identifier expression. If
compileExpression
was passed a lone IdentifierNode
it means that the expression evaluates
to the value of the identifier. In wispy, we don't have variables and function identifiers are
caught before the could've been passed here. That means the only thing IdentifierNode
can
resolve to is a parameter.
Here's the definition:
1interface CompileIdentifierOpts extends CompileExpressionOpts {2expression: IdentifierNode;3}45const compileIdentifier = (opts: CompileIdentifierOpts): number => {6// We remap expression to node to keep our lines a little shorter7const { expression: node, parameters, mod } = opts;89// Since we know the identifier has to be a parameter, we look it up in our10// parameter map. Don't worry, we'll define the parameter map in the next section11const info = parameters.get(node.identifier);12if (!info) {13throw new Error(`Unrecognized identifier ${node.identifier}`);14}1516// Finally, we use the local.get instruction to return the parameter value.17// Binaryen needs to know the parameters index and type. We'll get into18// the index when we define our parameter mapping function.19return mod.local.get(info.index, info.type);20};
The final expression type left to compile is a function call. This is interesting enough to warrant its own section.
In wispy function calls are blocks with multiple expressions where the first expression is an
identifier. The job of compileFunction
is to determine which function is being called, what
its parameters and return type are, and finally, building the call instruction with binaryen.
Here's the definition:
1// src/compiler.mts23// Because function calls are blocks, we can re-use CompileBlockOpts4const compileFunctionCall = (opts: CompileBlockOpts): number => {5const { expression, functionMap, mod } = opts;67// The first expression of a function call is the functions identifier8const identifierNode = expression.expressions[0];910// Here we just ensure the identifierNode is *actually* an identifier. Otherwise we throw an error.11if (!isNodeType(identifierNode, "identifier")) {12throw new Error("Expected identifier when compiling function call");13}1415// Next we create a reference to what the actual identifier is16const identifier = identifierNode.identifier;1718// If the identifier is "fn", the function we are calling is the function to define functions!19// That's right! Functions are created by another function. Pretty neat if you ask me.20if (identifier === "fn") return compileFunction(opts); // We'll define this next2122// Ifs are special functions. They may or may not have an else block. Binaryen needs to know23// if the else block exists at compile time, so we have a special if compiler for this.24if (identifier === "if") return compileIf(opts); // We'll define this later2526// Every other function is either part of the standard library, or is defined27// within the wispy code itself.28const functionInfo = functionMap.get(identifier);29if (!functionInfo) {30throw new Error(`Function ${identifier} not found`);31}3233const params = expression.expressions34// Every other expression in the block are parameters to the function, so we compile them35// and then pass them to the call36.slice(1)37.map((expression) => compileExpression({ ...opts, expression }));3839// Now we use binaryen to construct the call expression. The first parameter40// is the functions identifier, the second are the compiled parameter expression,41// and the third is the return type which has already been determined by generateFunctionMap42return mod.call(identifier, params, functionInfo.returnType);43};
Let's define the compileIf
function before we move onto the compileFunction
... function.
1// src/compiler.mts23const compileIf = (opts: CompileBlockOpts): number => {4const { expression, mod } = opts;56// The first expression, expression.expressions[0], is the "if" identifier, we don't need7// to do anything with it since we already know we are compiling an if expression89// The second expression is the if condition10const conditionNode = expression.expressions[1];1112// The third expression is the ifTrueNode, it's what is executed if the conditionNode evaluates to13// true14const ifTrueNode = expression.expressions[2];1516// Finally the fourth expression (which may or may not exist) is what is executed if the condition17// evaluates to false18const ifFalseNode = expression.expressions[3];1920// Compile the condition expression21const condition = compileExpression({ ...opts, expression: conditionNode });2223// Compile the ifTrue Expression24const ifTrue = compileExpression({ ...opts, expression: ifTrueNode });2526// Check to see if the ifFalseNode exists, if it does, compile it, otherwise set ifFalse to undefined27const ifFalse = ifFalseNode ? compileExpression({ ...opts, expression: ifFalseNode }) : undefined;2829// Finally we use binaryen to compile the if expression30return mod.if(condition, ifTrue, ifFalse);31};
Function definitions are a whole lot like function calls, so the function structure is pretty similar.
We take CompileBlockOpts
and return a number (the binaryen expression reference).
Here's the definition:
1// src/compiler.mts23const compileFunction = (opts: CompileBlockOpts): number => {4const { expression: block, mod } = opts;56// We need to tell binaryen what the identifier and return type of the function is7// Thankfully, we already wrote a function for that, getFunctionIdentifier. We8// could also have just looked up this information with the functionMap, but9// this is more fun.10const { identifier, returnType } = getFunctionIdentifier(block);1112// Next we grab the function parameters. This is the third expression of the function13const { parameters, parameterTypes } = getFunctionParameters(block); // Defined later1415// The rest of the expressions in the function are the functions block. So we create16// a new BlockNode from the remaining expression.17const body = compileBlock({18...opts,19expression: {20type: "block",21expressions: block.expressions.slice(3),22},2324// We need to pass the parameters of this function, so they can be referenced in child25// expressions26parameters,27});2829// Now we register the function with binaryen. Binaryen takes the function identifier,30// an array of parameter types (each item being the type of parameter in order),31// the function's return type, a list of variable types (wispy doesn't have any, so we pass an empty array)32// and finally the compiled body of the function.33mod.addFunction(identifier, parameterTypes, returnType, [], body);3435// To make things easy we export every single function defined in a wispy file36// so it can be called by the WebAssembly host.37mod.addFunctionExport(identifier, identifier);3839// Because function definitions are *technically* expressions that can be a part of another function40// body, we need to return an expression pointer. For this, we just return a nop (do nothing instruction),41// to make things consistent.42return mod.nop();43};
Now, let's define the getFunctionParameters
function. This function takes the function BlockNode
, that is, the entire unmodified function definition, and extracts its parameters. The function returns two values, parameters and parameterTypes.
The first returned value, parameters
, is a map where the key is the parameter identifier, and the value is the information needed to access the parameter down the line within the function body.
The second returned value is an array of binaryen types. There is one type for each defined parameter, and they must remain in the order they are defined. This is because binaryen doesn't reference parameters by their names, instead it references them by the index of the array in which they are defined. Don't worry if this is confusing to you, the code should make things a little more clear. If you need, refer to the compileIdentifier
definition, to get a better
understanding of how this works in practice.
Here's the definition:
1// src/compiler.mts23type ParameterMap = Map<string, { index: number; type: number }>;45const getFunctionParameters = (block: BlockNode) => {6// The parameters are defined in the third expression of the function definition7const node = block.expressions[2];89// Check to make sure the third expression is a block10if (!isNodeType(node, "block")) {11throw new Error("Expected function parameters");12}1314// Now we reduce the parameters into a parameter map, and a list of binaryen types15const { parameters, types } = node.expressions.reduce(16(prev, node, index) => {17// First, ensure that the node is a typed-identifier. Every parameter must be a18// typed identifier, therefore, every node in this reducer must be a typed identifier.19if (!isNodeType(node, "typed-identifier")) {20throw new Error("All parameters must be typed");21}2223// Determine the correct binaryen type of the parameter24const type = mapBinaryenType(node.typeIdentifier);2526// Add the parameter's type to the list of types we've defined so far27const types = [type, ...prev.types];2829// Now add the parameter to the parameter map. We save the parameters index and type.30// The index and type is used binaryen to access the parameter when it is used31// later in the function body32const parameters = new Map([[node.identifier, { index, type }], ...prev.parameters]);3334// Return updated parameters map and types array35return {36parameters,37types,38};39},40// Here we are setting the starting values for our reducer function and casting the default41// type so typescript can correctly infer the `prev` parameter type42{ parameters: new Map(), types: [] } as {43parameters: ParameterMap;44types: number[];45}46);4748// Finally we return the parameter map and the parameterTypes49return {50parameters,5152// Note: parameterTypes is a number, instead of an array of numbers as you'd expect.53// So we have to use binaryen.createType to create a new type that is referenced54// the mod.addFunction function. This is one inconsistency with the binaryen API. Parameters55// are defined as a number, and variables are defined as an array of numbers. I'm sure there56// is a reason for this, but I don't know what that reason is.57parameterTypes: binaryen.createType(types),58};59};
Now all that's left is to define the standard library. This part of the code isn't super interesting. We are essentially just mapping primitive WebAssembly instructions to a name to be referenced within wispy.
Here's the definitions. The only important information is the name we are associating with each instruction:
1// src/compiler.mts23const registerStandardFunctions = (mod: binaryen.Module, map: FunctionMap) => {4const { i32, f32 } = binaryen;5const { i32: i32m, f32: f32m } = mod;6const common = { mod, map };7registerLogicFunction({ name: "lt_i32", type: i32, operator: i32m.lt_s, ...common });8registerLogicFunction({ name: "gt_i32", type: i32, operator: i32m.gt_s, ...common });9registerLogicFunction({ name: "eq_i32", type: i32, operator: i32m.eq, ...common });10registerLogicFunction({ name: "lt_f32", type: f32, operator: f32m.lt, ...common });11registerLogicFunction({ name: "gt_f32", type: f32, operator: f32m.gt, ...common });12registerLogicFunction({ name: "eq_f32", type: f32, operator: f32m.eq, ...common });13registerMathFunction({ name: "add_i32", type: i32, operator: i32m.add, ...common });14registerMathFunction({ name: "sub_i32", type: i32, operator: i32m.sub, ...common });15registerMathFunction({ name: "mul_i32", type: i32, operator: i32m.mul, ...common });16registerMathFunction({ name: "add_f32", type: f32, operator: f32m.add, ...common });17registerMathFunction({ name: "sub_f32", type: f32, operator: f32m.sub, ...common });18registerMathFunction({ name: "mul_f32", type: f32, operator: f32m.mul, ...common });19registerMathFunction({ name: "div_f32", type: f32, operator: f32m.div, ...common });20};2122const registerMathFunction = (opts: {23mod: binaryen.Module;24name: string;25type: number;26operator: (left: number, right: number) => number;27map: FunctionMap;28}) => {29const { mod, name, type, operator, map } = opts;30return registerBinaryFunction({31mod,32name,33paramType: type,34returnType: type,35operator,36map,37});38};3940const registerLogicFunction = (opts: {41mod: binaryen.Module;42name: string;43type: number;44operator: (left: number, right: number) => number;45map: FunctionMap;46}) => {47const { mod, name, type, operator, map } = opts;48return registerBinaryFunction({49mod,50name,51paramType: type,52returnType: binaryen.i32,53operator,54map,55});56};5758const registerBinaryFunction = (opts: {59mod: binaryen.Module;60name: string;61paramType: number;62returnType: number;63operator: (left: number, right: number) => number;64map: FunctionMap;65}) => {66const { mod, name, paramType, returnType, operator, map } = opts;67mod.addFunction(68name,69binaryen.createType([paramType, paramType]),70returnType,71[],72mod.block(73null,74[operator(mod.local.get(0, paramType), mod.local.get(1, paramType))],75binaryen.auto76)77);78map.set(name, { returnType });79};
Now that we have finished our compiler, we can finally run our code.
First, replace the contents of src/index.mts
with this:
1// src/index.mts23#!/usr/bin/env node4import { readFileSync } from "fs";5import { compile } from "./compiler.mjs";6import { lex } from "./lexer.mjs";7import { parse } from "./parser.mjs";89const file = process.argv[2];10const input = readFileSync(file, "utf8");1112const tokens = lex(input);13const ast = parse(tokens);1415// !! New !!16const mod = compile(ast);1718// This is sneakily where the code gen is *actually* happening19const binary = mod.emitBinary();2021// Use the standard WebAssembly API to convert the wasm binary to a compiled module22// our host NodeJS/v8 can use23const compiled = new WebAssembly.Module(binary);2425// Build the instance, here you would add any external functions you might want to import into26// the WebAssembly module27const instance = new WebAssembly.Instance(compiled, {});2829// Finally, run the main function and log the result. We have to cast instance.exports to any30// The standard TypeScript types appear to be wrong.31console.log((instance.exports as any).main());
Now build and run project:
1npx tsc2wispy example.wispy
If all goes well (and you passed the number 15 to fib), you should see the number 610
in the
output of your console. If so, you've done it, you've made a working WebAssembly language. Congrats!
A full copy of the wispy language implementation is available under an open-source license at https://github.com/drew-y/wispy.
Courier is a notification service that centralizes all of your templates and messaging channels in one place which increases visibility and reduces engineering time.
Sign-up
Simplifying notifications with the Courier iOS SDK
Push notifications are a valuable tool for keeping users informed and increasing their engagement with your app. You can use push notifications to alert users about promotions, new content, or any other important updates. While push notifications are a powerful tool, setting up push notifications in iOS can be a daunting task that requires a significant amount of effort and time. Fortunately, the Courier iOS Mobile Notifications Software Development Kit (SDK) simplifies this process.
Mike Miller
March 23, 2023
Building Android push notifications with Firebase and Courier’s SDK
Push notifications have become an essential part of modern mobile apps, allowing you to keep your users engaged and informed. However, implementing push for different platforms can be a complex and time-consuming task, requiring developers to set up and handle token management, testing, and other logistical details.
Mike Miller
March 21, 2023
Free Tools
Comparison Guides
Send up to 10,000 notifications every month, for free.
Get started for free
Send up to 10,000 notifications every month, for free.
Get started for free
© 2024 Courier. All rights reserved.