The basics of how to create a programming language

Do you want to create a programming language?
This guide covers the basic pieces you need to make one.

1. Tokens

A token holds a type and, sometimes, a value.
A token would look something like this: INTEGER:10.
Tokens are structured like this: {TYPE}:{VALUE} OR {TYPE}
The second form is for tokens that carry no value, such as PLUS.
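
As a rough sketch of that structure, a token could be modeled as a small Python class. The field names `type` and `value` and the string formatting are assumptions made for illustration, not something the guide prescribes.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Token:
    type: str                    # e.g. "INTEGER" or "PLUS"
    value: Optional[Any] = None  # None for tokens that carry no value

    def __str__(self) -> str:
        # Mirrors the {TYPE}:{VALUE} OR {TYPE} layout.
        if self.value is None:
            return self.type
        return f"{self.type}:{self.value}"


print(Token("INTEGER", 10))  # INTEGER:10
print(Token("PLUS"))         # PLUS
```

Checking `value is None` rather than truthiness keeps a token such as `Token("INTEGER", 0)` printing with its value.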

Keywords

Keywords are reserved words that have a special function in the language. They are tokens too.
The token for a var keyword would look something like this: KEYWORD:var
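
One possible way to tell keywords apart from other words is a lookup in a set of reserved words. In this sketch the `KEYWORDS` set and the `classify_word` helper are hypothetical names, and `var` is the only keyword because it is the only one the guide mentions.

```python
from typing import Tuple

# Assumed keyword set; a real language would list all of its reserved words.
KEYWORDS = {"var"}


def classify_word(word: str) -> Tuple[str, str]:
    """Return the (type, value) pair for a word the lexer has already read."""
    if word in KEYWORDS:
        return ("KEYWORD", word)
    # Anything that is not reserved is treated as a variable name.
    return ("VARIABLE", word)


print(classify_word("var"))
print(classify_word("hello"))
```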

2. Lexer

The lexer turns the code into tokens.
Here’s an example:

Code:

var hello = 12 + 5

Tokens after the lexer lexes the code:

[
  Token("KEYWORD", "var"),
  Token("VARIABLE", "hello"),
  Token("EQUAL_SIGN"),
  Token("INTEGER", 12),
  Token("PLUS"),
  Token("INTEGER", 5)
]
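
A minimal lexer that produces this token list might look like the sketch below. It only knows the pieces used in the example (`var`, names, integers, `=` and `+`). The `lex` function, the `KEYWORDS` set and the `SYMBOLS` table are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Token:
    type: str
    value: Optional[Any] = None


KEYWORDS = {"var"}                           # assumed keyword set
SYMBOLS = {"=": "EQUAL_SIGN", "+": "PLUS"}   # assumed single-character tokens


def lex(code: str) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(code):
        ch = code[i]
        if ch.isspace():
            i += 1  # whitespace only separates tokens
        elif ch.isdecimal():
            start = i
            while i < len(code) and code[i].isdecimal():
                i += 1  # consume the whole run of digits
            tokens.append(Token("INTEGER", int(code[start:i])))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(code) and (code[i].isalnum() or code[i] == "_"):
                i += 1  # consume the whole word
            word = code[start:i]
            kind = "KEYWORD" if word in KEYWORDS else "VARIABLE"
            tokens.append(Token(kind, word))
        elif ch in SYMBOLS:
            tokens.append(Token(SYMBOLS[ch]))
            i += 1
        else:
            raise SyntaxError(f"Unexpected character {ch!r} at position {i}")
    return tokens


for token in lex("var hello = 12 + 5"):
    print(token)
```

Raising an error on unknown characters is one design choice. A lexer could also collect errors and keep going so that it reports several problems at once.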

3. Parser

The parser turns the tokens into nodes, which together form a tree (often called an abstract syntax tree, or AST).

Here’s an example:

Tokens:

[
  Token("INTEGER", 12),
  Token("PLUS"),
  Token("INTEGER", 5)
]

After the parser parses the tokens:

[
  BinaryOperationNode(NumberNode(12), OperatorNode("PLUS"), NumberNode(5))
]
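
Here is a small sketch of a parser for that example. It handles only integers joined by `PLUS` and returns a single node rather than a list. The node field names and the `parse_expression` function are assumptions. Only the node class names come from the example.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Token:
    type: str
    value: Optional[Any] = None


@dataclass(frozen=True)
class NumberNode:
    value: int


@dataclass(frozen=True)
class OperatorNode:
    op: str


@dataclass(frozen=True)
class BinaryOperationNode:
    left: Any
    operator: OperatorNode
    right: Any


def parse_expression(tokens: List[Token]) -> Any:
    pos = 0

    def parse_number() -> NumberNode:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos].type != "INTEGER":
            raise SyntaxError(f"Expected INTEGER at token index {pos}")
        node = NumberNode(tokens[pos].value)
        pos += 1
        return node

    left: Any = parse_number()
    # Each further "PLUS number" wraps what we have so far,
    # so 1 + 2 + 3 groups as (1 + 2) + 3.
    while pos < len(tokens) and tokens[pos].type == "PLUS":
        operator = OperatorNode(tokens[pos].type)
        pos += 1
        right = parse_number()
        left = BinaryOperationNode(left, operator, right)
    if pos != len(tokens):
        raise SyntaxError(f"Unexpected token {tokens[pos].type} at index {pos}")
    return left


print(parse_expression([Token("INTEGER", 12), Token("PLUS"), Token("INTEGER", 5)]))
```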

4. Executing Code

Now the program has to execute the code.
There are three main ways to do this:

  • Interpret - Go through the nodes and execute them one by one.
  • Compile - Turn the nodes into assembly/machine code and then execute that.
  • Transpile - Turn the nodes into code in another language and then execute that.
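
To make the first and third options concrete, here is a sketch that interprets and transpiles the `12 + 5` tree from the parser section. The `interpret` and `transpile` functions and the node field names are assumptions for illustration. Compiling to machine code is left out because it is too large for a short example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class NumberNode:
    value: int


@dataclass(frozen=True)
class OperatorNode:
    op: str


@dataclass(frozen=True)
class BinaryOperationNode:
    left: Any
    operator: OperatorNode
    right: Any


def interpret(node: Any) -> int:
    """Walk the tree and compute the result directly."""
    if isinstance(node, NumberNode):
        return node.value
    if isinstance(node, BinaryOperationNode):
        left = interpret(node.left)
        right = interpret(node.right)
        if node.operator.op == "PLUS":
            return left + right
        raise ValueError(f"Unknown operator {node.operator.op}")
    raise TypeError(f"Unknown node {node!r}")


def transpile(node: Any) -> str:
    """Walk the same tree, but emit source text for another language instead."""
    if isinstance(node, NumberNode):
        return str(node.value)
    if isinstance(node, BinaryOperationNode):
        if node.operator.op == "PLUS":
            return f"({transpile(node.left)} + {transpile(node.right)})"
        raise ValueError(f"Unknown operator {node.operator.op}")
    raise TypeError(f"Unknown node {node!r}")


tree = BinaryOperationNode(NumberNode(12), OperatorNode("PLUS"), NumberNode(5))
print(interpret(tree))  # 17
print(transpile(tree))  # (12 + 5)
```

Both functions have the same shape: one branch per node type. The only difference is whether a branch returns a value or a piece of text.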

I usually like to have a separate token type for each keyword instead of a single type called KEYWORD.

I like this because it removes a check in my parser when I'm deciding which function to run, like in this example code:

pub fn get_next(&mut self) -> Node {
        match self.tokens[self.pos].t_type.as_str() {
            T_PRINT => Node::Stmt(self.print()),
            T_DEF => Node::Stmt(self.assign()),
            T_IF => Node::Stmt(self.if_stmt()),
            _ => Node::Expr(self.bool_op())
        }
    }

No one knows what that is. Also why is the indentation weird?


Anyway great resource @SnakeyKing! Would you like me to make it a wiki?


I have no idea why the indentation is weird, but do you see the different token types, like T_DEF?

Those tokens determine which function is called, and that function handles building the AST node. For example, T_DEF indicates that a variable is being assigned, so it makes a new variable assignment node like this:

Assign(var_name, var_value)

(Also, you're right, I probably should've given this explanation in the original comment. That's my bad.)


Sure! But people might ruin it by trying to make it simpler…

Thanks! This is one of the most useful tutorials I’ve seen!
