WebAssembly (wasm) is a high-performance, assembly-like format optimized for the web. Code targeting WebAssembly can run at near-native speeds while still benefiting from the safe environment of a browser VM. Wasm has opened up a whole new world of demanding, desktop-class apps that can comfortably run in the browser. For example, AutoCAD was able to port decades of code to the web using wasm. Cases like AutoCAD's have made it clear that wasm will be a major disruptive force in how web apps are developed.
To facilitate the adoption of wasm, the WebAssembly team developed a powerful compiler toolchain library called binaryen. Binaryen does a huge amount of heavy lifting for compiler authors. It offers dead code removal, code size reduction, and various performance optimizations out of the box.
As someone who has long been interested in programming languages, this piqued my interest. Writing compiled languages has always been a daunting task. What I found is that binaryen made it incredibly fun and easy to build new languages that are shockingly speedy.
That's why I decided to create this guide and provide a simple overview designed to help get your feet wet in building languages and exploring the inner workings of wasm.
Here's a quick taste of the lisp-inspired language we'll build, wispy:
```
(fn fib:i32 [val:i32]
  (if (lt_i32 val 2)
    val
    (add_i32 (fib (sub_i32 val 1)) (fib (sub_i32 val 2)))))

(fn main:i32 [] (fib 15))
```
This simple function calculates values of the fibonacci sequence, a sequence of numbers that appears in surprising places throughout mathematics and nature. It's one of my favorite illustrations of how elegantly patterns of the universe can be described in code.
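As a quick sanity check, the same recursion is easy to express in TypeScript (this just mirrors the wispy definition above; it is not part of the compiler):

```typescript
// The same recurrence as wispy's fib: return val if val < 2,
// otherwise fib(val - 1) + fib(val - 2)
const fib = (val: number): number => (val < 2 ? val : fib(val - 1) + fib(val - 2));

console.log(fib(15)); // 610 — the value our wispy program's main function should return
```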
This guide is designed for intermediate to advanced software developers looking for a fun side project to challenge themselves with. By the end, we’ll have built a working compiler and runtime for wispy.
The guide will be broken down into three articles:
In this guide, we will be using TypeScript and NodeJS. The concepts are highly portable, so feel free to use the environment you're most comfortable with. Our only major dependency, binaryen, has a simple C API. You are welcome to skip ahead to the next section if you're using a different language.
Requirements:
```shell
git clone git@github.com:drew-y/wispy.git
cd wispy
git checkout quickstart
npm i
```
I've included manual setup instructions as an alternative to the quick start, in case you want to know exactly how the project was set up or just like doing things from scratch. If you've already done the quick start, skip to the next section.
```shell
mkdir wispy
cd wispy
```
```shell
npm init -y # Be sure to have NodeJS 16+ installed
```
```shell
npm i @types/node binaryen typescript
```
Add these entries to `package.json`:

```jsonc
"type": "module", // Binaryen uses the new module format so we must follow suit
"bin": {
  "wispy": "dist/index.mjs" // This will allow us to easily run the compiler from our terminal
},
```
```shell
npx tsc --init
```
Then set these options in the generated `tsconfig.json`:

```jsonc
"module": "ES2022",
"rootDir": "./src",
"moduleResolution": "node",
"outDir": "./dist"
```
Lexing is the process of digesting each individual character of our program into a set of tokens. A token is a group of characters that takes on a special meaning when put together. Take the following snippet of wispy:
```
(add 1 2)
```
There are five unique tokens in that snippet: `(`, `add`, `1`, `2`, and `)`. The lexer's job is simply to identify and list those tokens in order. Lexing is typically the first step in turning human-readable code into something closer to what a computer can understand.
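Concretely, once our lexer is finished, running it on `(add 1 2)` should produce an array shaped like this (a sketch of the expected output, using the token types we define in the next section):

```typescript
// The five tokens of "(add 1 2)", in the order the lexer should emit them
const expected = [
  { type: "bracket", value: "(" },
  { type: "identifier", value: "add" },
  { type: "int", value: 1 },
  { type: "int", value: 2 },
  { type: "bracket", value: ")" },
];

console.log(expected.length); // 5
```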
We'll start by defining our tokens in a new file:
```shell
# The mts extension is important, it tells TypeScript to create a corresponding
# mjs file, so Node knows to use modules
mkdir -p src/types
touch src/types/token.mts
```
First up is the `IntToken`. This token represents whole numbers, like `1045`:
```typescript
// src/types/token.mts
export type IntToken = { type: "int"; value: number };
```
Next up is the `FloatToken`. This token represents numbers that may have a decimal, like `1.8`:
```typescript
// src/types/token.mts
export type FloatToken = { type: "float"; value: number };

/** Previously defined tokens omitted for brevity */
```
Now, let's define some identifier tokens. In wispy, an identifier can represent either the name of a function or the name of a function parameter. We have two types of identifier tokens: a standard `IdentifierToken` and a `TypedIdentifierToken`.
An `IdentifierToken` is used in the body of a function to refer to the function's parameters or to call another function.

A `TypedIdentifierToken` is used when defining a function or a parameter. Typed identifiers take the form `identifier:type`. For example, `val:i32` defines a parameter that is a 32-bit integer. When defining a function, the type represents the function's return type. For example, `fib:i32` is a function that returns a 32-bit integer.
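The lexer stores a typed identifier as a single string; later stages of the compiler can recover the name and type by splitting on the colon. A minimal sketch of that idea (the helper name `splitTypedIdentifier` is my own invention, not part of wispy's codebase):

```typescript
// Split a typed identifier such as "val:i32" into its name and type parts
const splitTypedIdentifier = (value: string): { id: string; type: string } => {
  const [id = "", type = ""] = value.split(":");
  return { id, type };
};

console.log(splitTypedIdentifier("fib:i32")); // { id: "fib", type: "i32" }
```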
Here are the definitions:
```typescript
// src/types/token.mts
export type IdentifierToken = { type: "identifier"; value: string };
export type TypedIdentifierToken = { type: "typed-identifier"; value: string };

/** Previously defined tokens omitted for brevity */
```
Up next is `BracketToken`. Wispy uses S-expression syntax, like lisp, so brackets are very important. To keep things simple, we allow two kinds of brackets: `()` and `[]`. To keep things even simpler, the compiler will treat `()` and `[]` as interchangeable. In actual use, we will only use `[]` to define parameters.
```typescript
// src/types/token.mts
export type BracketToken = { type: "bracket"; value: Bracket };
export type Bracket = "(" | ")" | "[" | "]";

/** Previously defined tokens omitted for brevity */
```
Finally, we define the top-level `Token` type:
```typescript
// src/types/token.mts
export type Token = BracketToken | IntToken | FloatToken | IdentifierToken | TypedIdentifierToken;

/** Previously defined tokens omitted for brevity */
```
`Token` is a discriminated union. Discriminated unions are an incredibly powerful programming language construct. They represent a value that can be one of many types. In our case, a `Token` can be any one of the more specific token types we defined earlier, such as `IntToken` or `FloatToken`. You'll notice that each of these tokens has a unique `type` field, such as `type: "int"` in the case of `IntToken`. This is the discriminator. Down the line, you can pass a `Token` to a function, and that function can use the `type` field to figure out which specific token it's working with.
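Here's a small illustration of that narrowing in action, using a simplified two-member version of the union (not the full `Token` type from our compiler):

```typescript
type IntToken = { type: "int"; value: number };
type FloatToken = { type: "float"; value: number };
type NumberToken = IntToken | FloatToken;

// TypeScript narrows the union inside each branch, so it knows
// exactly which token shape we're working with
const describe = (token: NumberToken): string => {
  if (token.type === "int") return `whole number ${token.value}`;
  return `decimal number ${token.value}`;
};

console.log(describe({ type: "int", value: 1045 })); // "whole number 1045"
console.log(describe({ type: "float", value: 1.8 })); // "decimal number 1.8"
```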
At this point, `src/types/token.mts` is finished and should look something like this file.
To make our new types easily accessible, export them from a new `index.mts` file:
```typescript
// src/types/index.mts
export * from "./token.mjs";
```
Now that we have our tokens defined, we can write the actual `lex` function. The lex function will take a `string` (i.e. the contents of a .wispy file) and output an array of tokens (`Token[]`).
Make a new lex file:
```shell
touch src/lexer.mts
```
Define the lex function:
```typescript
// src/lexer.mts
import { Bracket, Token } from "./types/index.mjs";

export const lex = (input: string): Token[] => {
  const chars = input
    // Remove any leading or trailing whitespace for simplicity
    .trim()
    // Break up the file into single characters
    .split("");

  // This array stores our tokens
  const tokens: Token[] = [];

  // The loop continues as long as we have characters to consume
  while (chars.length) {
    // Here, a word is an unidentified token. It is usually any single group of non-whitespace
    // characters such as 123 or 123.4 or im_a_function
    const word = consumeNextWord(chars); // We'll define this function later

    // We ran out of tokens. Break out of the loop.
    if (word === undefined) break;

    const token = identifyToken(word); // We'll define this function later

    // Add the token to our store
    tokens.push(token);
  }

  // Return the tokens
  return tokens;
};
```
Next, we define the `consumeNextWord` function:
```typescript
// src/lexer.mts

/** Previous function(s) omitted for brevity */

const consumeNextWord = (chars: string[]): string | undefined => {
  const token: string[] = [];

  while (chars.length) {
    // Save a preview of the current character without modifying the array
    const char = chars[0];

    // No more characters to read
    if (char === undefined) break;

    // Whitespace characters terminate the token, if we have one
    // (we'll define the isWhitespace function later)
    if (isWhitespace(char) && token.length) {
      chars.shift(); // Remove the whitespace so it doesn't get included in the next token
      break;
    }

    // Discard leading whitespace characters
    if (isWhitespace(char)) {
      chars.shift();
      continue;
    }

    // Terminator tokens signify the end of the current token, if any
    // (we'll define the isTerminatorToken function later)
    if (isTerminatorToken(char) && token.length) break;

    // Add the character to the token and discard it from the input
    token.push(char);
    chars.shift();

    // If the character we just consumed is itself a terminator, it's the whole token
    if (isTerminatorToken(char)) break;
  }

  // If we collected characters for our token, join them into a single word. Otherwise, return
  // undefined to signal to the lexer that we are finished processing tokens.
  return token.length ? token.join("") : undefined;
};
```
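To get a feel for how `consumeNextWord` chews through the input, here's a self-contained sketch that pairs it with simplified stand-ins for the helpers we define shortly:

```typescript
// Simplified stand-ins for the helpers defined later in the article
const isWhitespace = (char: string) => char === " " || char === "\n" || char === "\t";
const isTerminatorToken = (char: string) => "()[]".includes(char);

const consumeNextWord = (chars: string[]): string | undefined => {
  const token: string[] = [];
  while (chars.length) {
    const char = chars[0];
    if (char === undefined) break;
    if (isWhitespace(char) && token.length) {
      chars.shift();
      break;
    }
    if (isWhitespace(char)) {
      chars.shift();
      continue;
    }
    if (isTerminatorToken(char) && token.length) break;
    token.push(char);
    chars.shift();
    if (isTerminatorToken(char)) break;
  }
  return token.length ? token.join("") : undefined;
};

// Each call consumes one word from the front of the character array
const chars = "(add 1)".split("");
console.log(consumeNextWord(chars)); // "("
console.log(consumeNextWord(chars)); // "add"
console.log(consumeNextWord(chars)); // "1"
console.log(consumeNextWord(chars)); // ")"
console.log(consumeNextWord(chars)); // undefined — input exhausted
```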
Now we'll define our `identifyToken` function. As the name suggests, this function takes a word and figures out what token that word represents.
```typescript
// src/lexer.mts

/** Previous function(s) omitted for brevity */

const identifyToken = (word: string): Token => {
  // Don't worry, we'll get to all the `is` helper functions in a bit
  if (isInt(word)) return { type: "int", value: parseInt(word) };
  if (isFloat(word)) return { type: "float", value: parseFloat(word) };
  if (isIdentifier(word)) return { type: "identifier", value: word };
  if (isBracket(word)) return { type: "bracket", value: word };
  if (isTypedIdentifier(word)) return { type: "typed-identifier", value: word };

  throw new Error(`Unknown token: ${word}`);
};
```
Finally, we define our helper functions. These functions all take a string and return `true` if the string passes their test, `false` otherwise. Most are written using regex. If you're unfamiliar with regex, I highly recommend regexone as a resource to learn more. In a nutshell, regex is an expression syntax that's used to extract meaningful information from text. In our case, we'll use it to match words against tokens.
```typescript
const isInt = (word: string) => /^[0-9]+$/.test(word);

const isFloat = (word: string) => /^[0-9]+\.[0-9]+$/.test(word);

const isIdentifier = (word: string) => /^[a-zA-Z_][a-zA-Z0-9_\-]*$/.test(word);

const isTypedIdentifier = (word: string) =>
  /^[a-zA-Z_][a-zA-Z0-9_\-]*:[a-zA-Z_][a-zA-Z0-9_\-]*$/.test(word);

const isBracket = (word: string): word is Bracket => /[\(\)\[\]]/.test(word);

/** Brackets are the only terminator tokens for now */
const isTerminatorToken = (word: string): word is Bracket => isBracket(word);

// Not sure why I didn't use regex here ¯\_(ツ)_/¯
const isWhitespace = (char: string) => char === " " || char === "\n" || char === "\t";
```
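A few quick checks of what these predicates accept (the regex helpers are repeated here so the snippet runs on its own):

```typescript
// The regex predicates from above, condensed for a standalone sanity check
const isInt = (word: string) => /^[0-9]+$/.test(word);
const isFloat = (word: string) => /^[0-9]+\.[0-9]+$/.test(word);
const isIdentifier = (word: string) => /^[a-zA-Z_][a-zA-Z0-9_\-]*$/.test(word);
const isTypedIdentifier = (word: string) =>
  /^[a-zA-Z_][a-zA-Z0-9_\-]*:[a-zA-Z_][a-zA-Z0-9_\-]*$/.test(word);
const isBracket = (word: string) => /[\(\)\[\]]/.test(word);

console.log(isInt("1045")); // true
console.log(isFloat("1045")); // false — no decimal point
console.log(isIdentifier("sub_i32")); // true
console.log(isTypedIdentifier("fib:i32")); // true
console.log(isBracket("[")); // true
```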
At this point, `src/lexer.mts` is finished and should look something like this file.
It's time to actually run the lexer. Start by making a new file, `src/index.mts`:
```typescript
#!/usr/bin/env node

// src/index.mts
import { readFileSync } from "fs";
import { lex } from "./lexer.mjs"; // Note: we import the compiled .mjs path

const file = process.argv[2];
const input = readFileSync(file, "utf8");
const tokens = lex(input);
console.log(JSON.stringify(tokens, undefined, 2));
```
Next, create an `example.wispy` file in the project root to compile:
```
(fn fib:i32 [val:i32]
  (if (lt_i32 val 2)
    val
    (add_i32 (fib (sub_i32 val 1)) (fib (sub_i32 val 2)))))

(fn main:i32 [] (fib 15))
```
Now build the lexer:
```shell
npx tsc
npm link # This will make wispy available to run as its own command
```
Finally, run the lexer:
```shell
wispy example.wispy

# Note: if npm link failed, you can call our compiler directly instead:
node dist/index.mjs example.wispy
```
If everything goes well, `wispy` should output something like this:
```jsonc
[
  {
    "type": "bracket",
    "value": "("
  },
  {
    "type": "identifier",
    "value": "fn"
  },
  {
    "type": "typed-identifier",
    "value": "fib:i32"
  },
  {
    "type": "bracket",
    "value": "["
  },
  {
    "type": "typed-identifier",
    "value": "val:i32"
  }
  // Omitting the rest for brevity
]
```
With that, we have a working lexer. We can break our code down into tokens. This is a good place to break for now. In the next article, we'll move on to parsing these tokens into a logical tree that will ultimately be converted to wasm.