The basics of how to create a programming language

Do you want to create a programming language?
Well, this guide covers the basic pieces you need to make one.

1. Tokens

A token holds a type and, sometimes, a value.
For example: INTEGER:10.
The general structure is {TYPE}:{VALUE}, or just {TYPE} for tokens that carry no value.


Keywords are reserved words that have a special meaning in the language, and each one becomes a token.
The token for the var keyword would look something like this: KEYWORD:var
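
To make the {TYPE}:{VALUE} idea concrete, here is a minimal sketch of a token type in Rust. The struct layout, the `new` constructor and the `render` helper are assumptions made up for illustration, not a required design; only the `t_type` field name is borrowed from the parser snippet further down the thread.

```rust
// Hypothetical token representation: a type tag plus an optional value.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub t_type: String,        // e.g. "INT", "KEYWORD", "PLUS"
    pub value: Option<String>, // Some("10") for INT:10, None for a bare PLUS
}

impl Token {
    // Builds a token from a type name and an optional value.
    pub fn new(t_type: &str, value: Option<&str>) -> Self {
        Token {
            t_type: t_type.to_string(),
            value: value.map(|v| v.to_string()),
        }
    }

    // Renders the token in the {TYPE}:{VALUE} or {TYPE} notation.
    pub fn render(&self) -> String {
        match &self.value {
            Some(v) => format!("{}:{}", self.t_type, v),
            None => self.t_type.clone(),
        }
    }
}
```

With this sketch, `Token::new("INT", Some("10")).render()` gives `INT:10`, and `Token::new("PLUS", None).render()` gives `PLUS`.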

2. Lexer

The lexer turns the code into tokens.
Here’s an example:


var hello = 12 + 5

Tokens produced by the lexer:

  Token("KEYWORD", "var"),
  Token("VARIABLE", "hello"),
  Token("EQUALS"),
  Token("INTEGER", 12),
  Token("PLUS"),
  Token("INTEGER", 5)
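
A lexer for the line above could be sketched roughly like this. It is only an illustration under a few assumptions: the `Token` enum and the `lex` function are hypothetical names, `var` is the only keyword, and `=` and `+` are the only operators, each getting its own token.

```rust
// Hypothetical token set for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Keyword(String),
    Variable(String),
    Integer(i64),
    Equals,
    Plus,
}

// Turns source text into tokens; returns an error message on an unexpected character.
pub fn lex(source: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut pos = 0;

    while pos < chars.len() {
        let c = chars[pos];
        if c.is_whitespace() {
            // Whitespace only separates tokens.
            pos += 1;
        } else if c.is_ascii_digit() {
            // Consume a run of digits and parse it as one integer.
            let start = pos;
            while pos < chars.len() && chars[pos].is_ascii_digit() {
                pos += 1;
            }
            let text: String = chars[start..pos].iter().collect();
            let n = text.parse::<i64>().map_err(|e| e.to_string())?;
            tokens.push(Token::Integer(n));
        } else if c.is_alphabetic() || c == '_' {
            // Consume a word, then decide whether it is a keyword or a variable name.
            let start = pos;
            while pos < chars.len() && (chars[pos].is_alphanumeric() || chars[pos] == '_') {
                pos += 1;
            }
            let word: String = chars[start..pos].iter().collect();
            if word == "var" {
                tokens.push(Token::Keyword(word));
            } else {
                tokens.push(Token::Variable(word));
            }
        } else if c == '=' {
            tokens.push(Token::Equals);
            pos += 1;
        } else if c == '+' {
            tokens.push(Token::Plus);
            pos += 1;
        } else {
            return Err(format!("unexpected character '{}' at position {}", c, pos));
        }
    }
    Ok(tokens)
}
```

Calling `lex("var hello = 12 + 5")` on this sketch yields a keyword, a variable, an equals sign, the integer 12, a plus sign and the integer 5, in that order.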

3. Parser

The parser turns the tokens into a tree of nodes (an abstract syntax tree, or AST).

Here’s an example:


  Token("INTEGER", 12),
  Token("PLUS"),
  Token("INTEGER", 5)

After the parser parses the tokens:

  BinaryOperationNode(NumberNode(12), OperatorNode("PLUS"), NumberNode(5))
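
Here is one possible sketch of a parser that builds that kind of node. It only handles integers joined by plus signs, and the `Parser`, `Node` and `Operator` names are assumptions chosen for this example rather than a fixed design.

```rust
// Hypothetical token and node types for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Integer(i64),
    Plus,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Operator {
    Plus,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Number(i64),
    BinaryOperation(Box<Node>, Operator, Box<Node>),
}

pub struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    // number := INTEGER
    fn number(&mut self) -> Result<Node, String> {
        match self.tokens.get(self.pos) {
            Some(Token::Integer(n)) => {
                let node = Node::Number(*n);
                self.pos += 1;
                Ok(node)
            }
            other => Err(format!("expected an integer, found {:?}", other)),
        }
    }

    // expr := number (PLUS number)*
    pub fn expr(&mut self) -> Result<Node, String> {
        let mut left = self.number()?;
        // Each further "+ number" wraps the tree built so far as the left operand.
        while self.tokens.get(self.pos) == Some(&Token::Plus) {
            self.pos += 1;
            let right = self.number()?;
            left = Node::BinaryOperation(Box::new(left), Operator::Plus, Box::new(right));
        }
        Ok(left)
    }
}
```

Feeding it the integer 12, a plus token and the integer 5 produces a `BinaryOperation` node with `Number(12)` on the left and `Number(5)` on the right.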

4. Executing Code

Now the program has to execute the code.
There are three main ways to do this:

  • Interpret - Walk through the nodes and execute them one by one.
  • Compile - Turn the nodes into assembly/machine code and then execute that.
  • Transpile - Turn the nodes into source code in another language and then compile or run that.
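
Two of these options are small enough to sketch on the 12 + 5 tree. The `interpret` and `transpile` functions below are hypothetical names, and the C-like output of `transpile` is just one assumed target. Compiling to machine code is left out because it needs far more than a short example.

```rust
// Hypothetical node types for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Operator {
    Plus,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Number(i64),
    BinaryOperation(Box<Node>, Operator, Box<Node>),
}

// Interpret: walk the tree and compute the result directly.
pub fn interpret(node: &Node) -> i64 {
    match node {
        Node::Number(n) => *n,
        Node::BinaryOperation(left, op, right) => {
            let l = interpret(left);
            let r = interpret(right);
            match op {
                Operator::Plus => l + r,
            }
        }
    }
}

// Transpile: emit equivalent source text for another language
// (here a parenthesized C-like expression).
pub fn transpile(node: &Node) -> String {
    match node {
        Node::Number(n) => n.to_string(),
        Node::BinaryOperation(left, op, right) => {
            let symbol = match op {
                Operator::Plus => "+",
            };
            format!("({} {} {})", transpile(left), symbol, transpile(right))
        }
    }
}
```

For the tree of 12 + 5, `interpret` returns 17 and `transpile` returns the string `(12 + 5)`.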

Usually I like to have separate token types for the keywords instead of just one type called keyword.

I like this because it removes a check in my parser when I'm deciding which function to run, like in this example code:

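// T_PRINT, T_DEF and T_IF are presumably &str constants naming the token types.
pub fn get_next(&mut self) -> Node {
        // Dispatch on the current token's type; anything else is parsed as an expression.
        match self.tokens[self.pos].t_type.as_str() {
            T_PRINT => Node::Stmt(self.print()),
            T_DEF => Node::Stmt(self.assign()),
            T_IF => Node::Stmt(self.if_stmt()),
            _ => Node::Expr(self.bool_op()),
        }
}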

No one knows what that is. Also why is the indentation weird?

Anyway great resource @SnakeyKing! Would you like me to make it a wiki?


I have no idea why the indentation is weird, but do you see the different token types like T_DEF?

Well, those token types determine which function is called, and that function handles building the AST node. For example, T_DEF indicates that a variable is being assigned, so the parser makes a new variable assignment node like this:

Assign(var_name, var_value)
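
As a rough, self-contained illustration of that dispatch, the sketch below matches on a `Def` token (a hypothetical stand-in for T_DEF) and builds an `Assign` node. To keep it short it assumes the assigned value is a single integer, and the `parse_statement` name is made up for this example.

```rust
// Hypothetical token and node types for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Def, // stands in for T_DEF
    Ident(String),
    Equals,
    Integer(i64),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Assign(String, i64), // Assign(var_name, var_value)
    Number(i64),
}

// The leading token decides which kind of node gets built.
pub fn parse_statement(tokens: &[Token]) -> Result<Node, String> {
    match tokens {
        [Token::Def, Token::Ident(name), Token::Equals, Token::Integer(value)] => {
            Ok(Node::Assign(name.clone(), *value))
        }
        [Token::Integer(value)] => Ok(Node::Number(*value)),
        other => Err(format!("cannot parse {:?}", other)),
    }
}
```

Because the keyword has its own token type, the match arm can recognize an assignment without first checking what a generic keyword token contains.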

(also you're right, I probably should've given this explanation in the original comment, that's my bad)


Sure! But people might ruin it by trying to make it simpler…

Thanks! This is one of the most useful tutorials I’ve seen!
