/r/ProgrammingLanguages

Photograph via snooOG

This subreddit is dedicated to the theory, design and implementation of programming languages.

Welcome!

This subreddit is dedicated to the theory, design and implementation of programming languages.

Be nice to each other. Flame wars and rants are not welcomed. Please also put some effort into your post, this isn't Quora.

This subreddit is not the right place to ask questions such as "What language should I use for X", "what language should I learn", "what's your favourite language" and similar questions. Such questions should be posted in /r/AskProgramming or /r/LearnProgramming.

Related subreddits

Related online communities

/r/ProgrammingLanguages

99,270 Subscribers

5

Replacing Inheritance with default interface implementation

I have been thinking about inheritance vs composition a bit lately. A key point of inheritance is implicit reuse.

Lets say we want to extend the behavior of a function on a class A. In a language with inheritance, it is trivial.

class A {
  func doSomething() { }
}

class Ae extends A {
  func doSomething() {
    super.doSomething();
    // we do something else here
  }
}

With composition and an interface we, embed A within Ae and call the method and, within expectation, have the same result

interface I {
  func doSomething();
}

class A implements I {
  func doSomething() { }
}

class Ae implements I {
  A a;
  func doSomething() {
    a.doSomething();
    // do someting else
  }
}

This works great... until we run into issues where the interface I is long and in order to implement it while only modifying one method, we need to write boiler plate in Ae, calling A as we go explicitly. Inheritance eliminates the additional boiler plate (there may be other headaches including private data fields and methods, etc, but lets assume the extension does not need access to that).

Idea: In order to eliminate the need to explicit inheritance, we add language level support for delegates for interfaces.

interface I {
  func doSomething();
  func doSomething2();
}

class A implements I {
  func doSomething() { }
  func doSomething2() { }
}
// All methods in I which are not declared in Ae are then delegated to the
// variable a of type A which implements I. 
class Ae implements I(A a) {
  func doSomething() {
    a.doSomething();
    // do someting else
  }
  // doSomething2 already handled.
}

We achieve the reuse of inheritance without an inheritance hierarchy and implicit composition.

But this is just inheritance?

Its not though. You are only allowed to use as a type an interface or a class, but not subclass from another class. You could chain together composition where a "BASE" class A implements I. Then is modifed by utilizing A as the default implementation for class B for I. Then use class B as default implementation for class C, etc. But the type would be restricted into Interface I, and not any of the "SUB CLASSES". class B is not a type of A nor is class C a type of B or A. They all are only implementing I.

Question:

Is this worth anything or just another shower thought? I am currently working out ideas on how to minimize the use of inheritance over composition without giving up the power that comes from inheritance.

On the side where you need to now forward declare the type as an interface and then write a class against it, there may be an easy way to declare that an interface should be generated from a class, which then can be implemented like any other interface as a language feature. This would add additional features closer to inheritance without inheritance.

Why am I against inheritance?

Inheritance can be difficult? Interfaces are cleaner and easier to use at the expense of more code? Its better to write against an Interface than a Class?

Edit 1:

Both-Personality7664 asked regarding how internal function dependencies within the composed object would be handled.

A possible solution would be how the underlying dispatching works. With a virtual table implementation, the context being handled with the delegate would use a patched virtual table between the outer object and the default implementation. Then the composing object call the outer objects methods instead of its own.

// original idea result since A.func1() calling func2() on A would simply call A.func2()
Ae.func1() -> A.func1() -> A.func2()

// updated with using patched vtable // the table would have the updated methods so we a dispatch on func2() on A would call Ae with func2() instead of A. Ae.func1() -> A.func1() -> Ae.func2()

Edit 2:

Mercerenies pointed out Kotlin has it.

It seems kotlin does have support for this, or at least part of it.

5 Comments
2024/07/10
19:49 UTC

13

If you had to make a language with the highest chance of finding bugs at compile time, what would you put in it?

Obviously also considering that the language has to be usable. If i have to write 100 times more code then maybe it doesn't make a lot of sense.

40 Comments
2024/07/10
18:51 UTC

7

Revising syntax for readability

As I was developing Tailspin, I made a lot of syntax decisions that seemed to make sense at the time, but from insights that I and others have posted in this forum, I decided it was time to make a bigger revision.

Highlights:

  • The dot-syntax to access fields will disappear, so `$foo.bar` will now be `$foo(bar:)`. This is part of unifying projection syntax and always using the same form when specifying lenses as when directly accessing the value.
  • Names will now come first on the line, followed by a keyword denoting the type of definition.
  • The words `is` and `set` apply to immutable values and mutable variables, e.g. `foo is 5;` or `@bar set 7;`. These words are more searchable and scannable than `:`
  • A completely redundant extra keyword, `matches`, will be inserted in conditionals, e.g. `?($foo matches <|=5>)`. This makes it much easier to identify and ocularly parse this expression.
  • Anonymous inline-defined functions (a.k.a. lambdas) will now use exactly the same syntax as named functions (minus the name)
  • Line comments will start with `--` instead of `//` just because I think it looks cleaner.

More details

Tailspin blew the strangeness budget from the start and now I guess I might be spending even more strangeness tokens, but I think it looks a lot more readable. Compare the aoc2021 day18 solution in the new syntax versus the old syntax

2 Comments
2024/07/10
15:19 UTC

10

Need help with operator precedence

In my language, types are values. There is no separate type programming level. An expression which evaluates to a type value is "just" an expression - in the sense that it has the exact same syntax as any other expression. A type expression is just that: An expression which evaluates to a type.

This poses a problem in certain scenarios, as types, functions and plain values share a common set of operators which must then be overloaded to accommodate these different kinds.

Note, that in the following I refer to sets instead of types. This is because in my language sets are the types. By set I refer to the mathematical concept; not the data structure.

To do algebraic types I am considering overloading * for creating a tuple type (set of tuples) out of two types (sets):

int * string    // a set (type) of tuples of ints and strings

There is some precedence for using * to create tuple types. However, in a language where types are first class values, the * is the same operator as is used for multiplication. It is just overloaded to work for sets as well.

I plan to overload * so that I can create longer tuples as well:

int * string * float * char

Given that this is an expression, parsed by the same expression parser, and the fact that * is a binary, infix operator, this parsed as if it had been written:

((int * string) * float) * char

This means that the operator * overloaded for two sets will have to be defined so that it can accept two sets, but if the left set is already a set of tuples it will merge the tuples with the right set, creating a new, longer tuple type. I want members of this type to be

(int _, string _, float _, char _)

not binary, nested tuples like:

(((int _, string _), float _), char _)

I actually, I want to take it a small step further, and make this rule symmetric so that if any of the operand is a tuple type then this tuple type shallowly is merged with the new type. Essentially all ow the following set (type) expressions would be equivalent:

int*string*bool*char
(int*string)*(bool*char)
(int*string*bool)*char
int*(string*bool)*char
int*(string*bool*char)

The intuition is that most programmers will expect the merge behavior, not the nesting behavior.

However, this begs the question: What if I wanted to create a type of nested tuples, i.e. no "merge" behavior? I cannot simply use parenthesis since they are only used to guide the parsing and thus are erased from the resulting AST. Also, it would be confusing if (int) * string was different from int * string.

To that end, I came up with the operator **. The idea is that it has lower precedence than * such that

int*string ** bool*char

is a set of tuples shaped like this:

( (int _, string _), (bool _, char _) )

So far so good. We continue to functions. The set of functions (the function type, if you will) which accepts an int and returns a string can be described as:

int => string

This is also an expression, i.e. => is an infix operator.

My question now is this: Should => have lower, higher or same precedence as that of ****?**

Consider this type:

int ** bool => string ** float

Is this a set of functions (function type) of functions from an int*bool tuple to a string*float tuple? Or is it a set of tuples of three values int, bool=>string and float, respectively.

In my opinion, operator precedence generally work as a convention. A language should define precedence levels so that it is intuitive what an expression without parenthesis grouping means.

This intuition can be based on knowledge of other languages, or - if no precedence (no pun intended) - the precedence should be obvious.

This is where inventing new operators get into dicey territory: There is no precedence to build on. So it is plainly obvious to you what the above means?

21 Comments
2024/07/10
14:42 UTC

17

Has there been any attempt to make a statically compiled & strongly types smalltalk?

For me smalltalk has the following defining features;

  1. Object oriented with message passing as the key idea
  2. Image based workflow
  3. small syntax

Out of this I love the message passing and syntax of small talk. But always stumble on the image based workflows. it just feels too alien to me.

I wonder if anyone has ever made a variant of smalltalk with its syntax and message parsing but a statically aot compiled and strongly typed language. And no images. I read about strongtalk, but that too seems to be a fully dynamic language with some gradual typing.

12 Comments
2024/07/10
05:25 UTC

22

What is the current research in, or "State of the Art" of, non-JIT bytecode interpreter optimizations?

I've been reading some papers to do mostly with optimizing the bytecode dispatch loop/dispatch mechanism. Dynamic super-instructions, various clever threading models (like this), and several profile-guided approaches to things like handler ordering have come up, but these are mostly rather old. In fact, nearly all of these optimizations I'm finding revolve around keeping the instruction pipeline full(er) by targeting branch prediction algorithms, which have (as I understand it) changed quite substantially since circa the early 2000s. In that light, some pointers toward current or recent research into optimizing non-JIT VMs would be much appreciated, particularly a comparison of modern dispatch techniques on modern-ish hardware.

P.S. I have nothing against JIT, I'm just interested in seeing how far one can get with other (especially simpler) approaches. There is also this, which gives a sort of overview and mentions dynamic super-instructions.

10 Comments
2024/07/10
01:49 UTC

9

Algorithm for inlining functions?

Context: I'm working on a transpiler that outputs JavaScript. An important part of the process is an optimization pass, where we take immutable operations like this (assume there is no overloading or subtyping):

const value = new Vector(123, 456)
    .add(other)
    .normalize()
    .dotProduct(yetAnother)

And get rid of all intermediate allocations by inlining all of the steps, representing intermediate vectors as variables and then running further peephole optimizations on them.

Inlining a single-expression function is trivial. Inlining a function with control flow boils down to defining a variable for its result and hygienically inlining all of its code and local variables.

But what if a function has multiple return statements? normalize() from my example could be implemented as

normalize() {
    if (this.length === 0) return this;
    return new Vector(this.x / this.length, this.y / this.length);
}

In more complex scenarios these returns can be nested in loops and other control flow structures.

Is there a general-purpose algorithm for inlining such functions?

Thanks!

10 Comments
2024/07/09
20:38 UTC

41

Why do we write `a = 4` but `d = { a: 4, }`?

What is the reason many present-day PLs use the equals-sign for variable assignment as in a = 4 (and d.a = 4) but the colon for field assignment in a namespace / object / struct /record literal, as in d = { a: 4, }?

The first can, after all, be thought of setting a field in an implicit namespace, call it N, so a = 4 is equivalent to N.a = 4 and N = { N..., a: 4, } (using the splash operator to mean pretty much what it means in JavaScript, so 'set N to an object that has all the fields of the former value of N, with field a set to 4'.

In that view,

  • a variable ist just a field in some implicit namespace
  • each implicit namespace can be made explicit, much like JavaScript allows you to set a field on the special global / window / globalThis object and then later refer to that field in the form of a variable without the prefix (provided there is no shadowing from closer namespaces going on)
  • an assignment is just a object / struct/ record literal, but with a single field, without the braces: a = 4 becomes a: 4 and d = { a: 4, } becomes d: { a: 4, } (which, in turn, is the same as N.d: { a: 4, } when N is the appropriate namespace

Are there any flaws with this concept?

62 Comments
2024/07/09
14:09 UTC

18

How to make a Transpiler?

I want to make a transpiler for an object-oriented language, but I don't know anything about compilers or interpreters and I've never done anything like that, it would be my first time doing a project like this so I want to somehow understand it better and learn by doing it.

I have some ideas for an new object-oriented language syntax based on Java and CSharp but as I've never done this before I wanted to somehow learn what I would need to do to be able to make a transpiler.

And the decision to make a transpiler instead a compiler or a interpreter was not for nothing... It was precisely because that way I could take advantage of features that already exist in a certain mature language instead of having to create standard libraries from scratch. It would be a lot of work for just one person and it would basically mean that I would have to write all the standard libraries for my new language, make it cross platform and compatible with different OSs... It would be a lot of work...

I haven't yet decided which language mine would be translated into. Maybe someone would say to just use Java or C# itself, since my syntax would be based on them, but I wanted my language to be natively compiled to binary and not exactly bytecode or something like that, which excludes language options like Java, C# or interpreted ones like Python... But then I run into another problem, that if I were to use a language like Go or C, I don't know if I would have problems since they are not necessarily object-oriented in the traditional sense with a syntax like Java or C#, so I don't know if that would complicate me when it comes to writing a transpiler for two very different languages...

28 Comments
2024/07/09
13:44 UTC

9

C with explicit RAII

The explicitness of C makes it good for low-level systems programming. There are no hidden function calls like destructors and overloaded operators in C++. So I thought of a system that manages resources like RAII in C++ but with the explicitness of C. This introduces four keywords: init, fini, move, and fail.

This system has a few advantages over C++ style RAII:

  • Constructors are named functions, rather than all being identified by the name of the type, which as we've seen in C++ can become a nightmare when there are many constructors.
  • Constructors can be fallible without needing exceptions.
  • Destruction (finalization) is explit; the compiler does not insert any hidden function calls, so the program is easier to follow and reason about.
  • Moves are destructive, like in Rust.

=== init values

Declaring a value with init indicates that it is an initialized resource. Resources can be initialized via function return values:

Thing *init create_thing(void);

void main()
{
    Thing *thing = create_thing();
    destroy_thing();
}

or via pointer arguments:

void create_thing(Thing *init *thing);

void main()
{
    Thing *thing;
    create_thing(&thing);
    destroy_thing(thing);
}

=== init? values

Any time the presence of a resource is conditional, there is a hidden flag indicating whether it has been initialized. This is also useful for fallible initialization:

Thing *init? create_thing(void)
{
    Thing *thing = malloc(sizeof(Thing));

    if (!thing)
        fail return NULL; // not initialized

    memset(thing, 0, sizeof(Thing));
    init return thing; // initialized
}

or

void create_thing(Thing *init? *thing)
{
    *thing = malloc(sizeof(Thing));
    
    if (!*thing) {
        fail *thing; // explicitly uninitialized
        return;
    }
    
    memset(*thing, 0, sizeof(Thing));
    init *thing; // explicitly initialized
}

void create_fake_thing(Thing *init? *thing)
{
    if (precondition)
        *thing init = (Thing *)1; // explicitly initialized
    else
        *thing fail = NULL; // explicitly uninitialized
}

=== fini values

A value given to a variable marked as fini is marked as finalized, so no compiler errors are raised when the object goes out of scope.

void destroy_thing(Thing *fini thing);

void main_1()
{
    Thing *thing = create_thing();
} // ERROR: thing is never finalized

void main_2()
{
    Thing *thing = create_thing();
    destroy_thing(); // Good!
}

=== fini? values

These are like fini values except that they may be uninitialized.

Thing *init? maybe_create_thing();
void maybe_destroy_thing(Thing *fini? thing);
void definitely_destroy_thing(Thing *fini thing);

void main_1()
{
    Thing *thing = maybe_create_thing();
    definitely_destroy_thing(thing); // ERROR: thing may be uninitialized
}

void main_2()
{
    Thing *thing = maybe_create_thing();
    maybe_destroy_thing(thing); // Better!
}

=== move values

These are used to transfer ownership of a resource.

void register_thing(const char *name, Thing *move thing);

void main()
{
    Thing *thing = create_thing();
    register_thing("Thing1", move(thing)); // we'll see this syntax again later
}

=== move? values

These are used to transfer ownership of a possibly uninitialized resource.

bool register_thing(const char *name, Thing *move? thing);

void main()
{
    Thing *thing = maybe_create_thing();
    
    if (register_thing("Thing1", move(thing)))
        log_success();
    else
        log_failure();
}

=== init expression

The init(...) expression evaluates to true if a resource was successfully initialized.

Id init? alloc_id(void);

void main()
{
    Id id = alloc_id();
    
    if (init(id))
        log_success();
    else
        log_failure();
}

=== fini expression

A fini(...) expression can be used at most once in a statement. The statement is only performed if the object was initialized prior.

void definitely_destroy_thing(Thing *fini thing);

void maybe_destroy_thing(Thing *fini? thing)
{
    definitely_destroy_thing(fini(thing)); // short form
}

This is equivalent to:

void definitely_destroy_thing(Thing *fini thing);

void maybe_destroy_thing(Thing *fini? thing)
{
    // long form
    if (init(thing))
        definitely_destroy_thing(thing);
}

The short form may even be required by the compiler, as it would be difficult to make it smart enough to accept the long form without a false positive resource leak.

=== move expression

Moving a value to a move or move? parameter must be explicit, as seen before.

register_thing("asdf", move(thing));

Edit: Reddit's Markdown sucks. Sorry about the lousy formatting.

17 Comments
2024/07/09
04:04 UTC

16

Emitting loops with control flow expressions

So I'm developing a dynamically typed language which is in large parts inspired by Rust, so I have blocks, loops, and control flow constructs all as expressions. I'm currently working on emitting my own little stack-based bytecode, but I'm getting hung up on specifically emitting loops.

Take the following snippet

loop {
    let x = 1 + break;
}
let y = 2;

This code doesn't really do anything useful, but it's still valid in my language. The generated bytecode would look something like this

0x0  PUSH_INT 1  // 1
0x1  JUMP 0x6    // break
0x2  PUSH_NIL    // result of break
0x3  ADD         // +
0x4  STORE x     // let x
0x5  JUMP 0x0    // end of loop
0x6  PUSH_INT 2  // 2
0x7  STORE y     // let y

A lot of code here is obviously unreachable, but dead code removal is a can of worms I'm not quite prepared for yet. The thing I'm concerned with is that, after executing this code, there will be a 1 remaining on the stack, which is essentially just garbage data. Is this something I should be concerned about? If let go unconstrained it could lead to accidental stack overflows. To solve it I would need some way of clearing the stack of garbage data after the break, and I'm not quite sure how I would do that. I've been workshopping several attempted solutions, but none of them have really worked out. How do languages like Rust which might also encounter this kind of problem solve it?

14 Comments
2024/07/08
22:33 UTC

26

Why do CPython and Swift use ARC instead of a tracing GC?

I know the differences between both garbage collection methods and their pros-and-cons in general.

But I would like to know the reasoning by the language implementors on choosing ARC over a tracing-GC.

I tried to look it up but I couldn't find any information on the "why".

34 Comments
2024/07/08
06:16 UTC

9

Design Concepts in Programming Languages by Turbak, Gifford and Sheldon

Is it a good book? I very rarely see it recommended...

2 Comments
2024/07/07
18:54 UTC

14

Token Overloading

Below is a list of tokens that I interpret in more than one way when parsing, according to context.

Examples are from my two languages, one static, one dynamic, both at the lower-level end in their respective classes.

There's no real discussion here, I just thought it might be interesting. I didn't think I did much with overloading, but there was more going on than I'd realised.

(Whether this is good or bad I don't know. Probably it is bad if syntax needs to be defined with a formal grammar, something I don't bother with as you might guess.)

Token   Meanings               Example

=       Equality operator      if a = b
        'is'                   fun addone(x) = x + 1
        Compile-time init      static int a = 100    (Runtime assignment uses ':=')
        Default param values   (a, b, c = 0)

+       Addition               a + b             (Also set union, string concat, but this doesn't affect parsing)
        Unary plus             +                 (Same with most other arithmetic ops)

-       Subtraction            a - b 
        Negation               -a

*       Multiply               a * b
        Reflect function       func F*           (F will added to function tables for app lookup)

.       Part of float const   12.34              (OK, not really a token by itself)
        Name resolution       module.func()
        Member selection      p.x
        Extract info          x.len

:       Define label          lab:
        Named args            messagebox(message:"hello")
        Print item format     print x:"H"
        Keyword:value         ["age":23]

|       Compact then/else     (cond | a | b)    First is 'then', second is 'else'
        N-way select          (n | a, b, c, ... | z)

$       Last array item       A[$]              (Otherwise written A[A.len] or A[A.upb])
        Add space in print    print $,x,y       (Otherwise is a messier print " ",,x or print "",x")
                              print x,y,$       (Spaces are added between normal items)
        Stringify last enum   (red,   $, ...)   ($ turns into "red")

&       Address-of            &a
        Append                a & b
        By-reference param    (a, b, &c)

@       Variable equivalence  int a @ b         (Share same memory)
        Read/print channel    print @f, "hello"

min     Minimum               min(a, b) or a min b     (also 'max')
        Minimum type value    T.min or X.min    (Only for integer types)

in      For-loop syntax       for x in A do
        Test inclusion        if a in b

[]      Indexing/slicing      A[i] or A[i..j]
        Bit index/slice       A.[i] or A.[i..j]
        Set constructor       ['A'..'Z', 'a'..'z']      (These 2 in dynamic lang...)
        Dict constructor      ["one":10, "two":20]
        Declare array type    [N]int A                  (... in static lang)

{}      Dict lookup           D{k} or D{K, default}     (D[i] does something different
        Anonymous functions   addone := {x: x+1}

()      Expr term grouping    (a + b) * c
        Unit** grouping       (s1; s2; s3)        (Turns multiple units into one, when only one allowed)
        Function args         f(x, y, z)          (Also args for special ops, eg. swap(a, b))
        Type conversion       T(x)
        Type constructor      Point(x, y, z)      (Unless type can be infered)
        List constructor      (a, b, c)
        Compact if-then-else  (a | b | c)
        N-way select          (n | a, b, c ... | z)
        Misc                  ...                 (Define bitfields; compact record definitions; ...)

Until I wrote this I hadn't realised how much round brackets were over-used!

(** A 'unit' is an expression or statement, which can be used interchangebly, mostly. Declarations have different rules.)

11 Comments
2024/07/07
14:12 UTC

4

Is it a bad idea for a preprocessor to throw syntax errors?

I'm writing a compiler for the esoteric programming language Chef, and one of the syntactical components of the language involves comments being a separate section of the program. It has it's own syntactical rules, such as being a freeform paragraph, not having multiple lines, and separating itself between the recipe title and ingredients list via two newlines (a blank line).

Therefore, if I have a preprocessor remove these comments, I would have to check that the recipe title and the ingredients section title are syntactically correct and seperated via two newlines within the preprocessing phase.

Perhaps it would be a better idea to pass the comments to the tokenizer in this case and omit the preprocessing phase?

TLDR; If comments are a part of a language's syntactical structure, should they still be removed by a preprocessor? This means syntax errors in the preprocessor.

8 Comments
2024/07/07
04:26 UTC

43

With a slight bit of pride, I present to you Borzoi, my first programming language

First of all - Borzoi is a compiled, C-inspired statically typed low level programming language implemented in C#. It compiles into x64 Assembly, and then uses NASM and GCC to produce an executable. You can view its source code at https://github.com/KittenLord/borzoi

If you want a more basic introduction with explanations you can check out READMEmd and Examples/ at https://github.com/KittenLord/borzoi

Here is the basic taste of the syntax:

cfn printf(byte[] fmt, *) int
fn main() int {
    let int a = 8
    let int b = 3

    if a > b printf("If statement works!\n")

    for i from 0 until a printf("For loop hopefully works as well #%d\n", i+1)

    while a > b {
        if a == 5 { mut a = a - 1 continue } # sneaky skip
        printf("Despite its best efforts, a is still greater than b\n")
        mut a = a - 1
    }
    
    printf("What a turnaround\n")

    do while a > b 
        printf("This loop will first run its body, and only then check the condition %d > %d\n", a, b)

    while true {
        mut a = a + 1
        if a == 10 break
    }

    printf("After a lot of struggle, a has become %d\n", a)

    let int[] array = [1, 2, 3, 4]
    printf("We've got an array %d ints long on our hands\n", array.len)
    # Please don't tell anyone that you can directly modify the length of an array :)

    let int element = array[0]

    ret 0
}

As you can see, we don't need any semicolons, but the language is still completely whitespace insensitive - there's no semicolon insertion or line separation going on. You can kinda see how it's done, with keywords like let and mut, and for the longest time even standalone expressions (like a call to printf) had to be prefixed with the keyword call. I couldn't just get rid of it, because then there was an ambiguity introduced - ret (return) statement could either be followed by an expression, or not followed by anything (return from a void function). Now the parser remembers whether the function had a return type or not (absence of return type means void), and depending on that it parses ret statements differently, though it'd probably look messy in a formal grammar notation

Also, as I was writing the parser, I came to the conclusion that, despite everyone saying that parsing is trivial, it is true only until you want good error reporting and error recovery. Because of this, Borzoi haults after the first parsing error it encounters, but in a more serious project I imagine it'd take a lot of effort to make it right.

That's probably everything I've got to say about parsing, so now I'll proceed to talk about the code generation

Borzoi is implemented as a stack machine, so it pushes values onto the stack, pops/peeks when it needs to evaluate something, and collapses the stack when exiting the function. It was all pretty and beautiful, until I found out that stack has to always be aligned to 16 bytes, which was an absolute disaster, but also an interesting rabbit hole to research

So, how it evaluates stuff is really simple, for example (5 + 3) - evaluate 5, push onto stack, evaluate 3, push onto stack, pop into rbx, pop into rax, do the +, push the result onto the stack (it's implemented a bit differently, but in principle is the same).

A more interesting part is how it stores variables, arguments, etc. When analyzing the AST, compiler extracts all the local variables, including the very inner ones, and stores them in a list. There's also basic name-masking, as in variable declared in the inner scope masks the variable in the outer scope with the same name.

In the runtime, memory layout looks something like this:

# Borzoi code:
fn main() {
    let a = test(3, 5)
}

fn test(int a, int b) int {
    let int c = a + b
    let int d = b - a

    if a > b
        int inner = 0
}

# Stack layout relative to test():
...                                     # body of main
<space reserved for the return type>       # rbp + totaloffset
argument a                                 # rbp + aoffset
argument b                                 # rbp + boffset
ret address                                # rbp + 8
stored base pointer                     # rbp + 0 (base pointer)
local c                                    # rbp - coffset
local d                                    # rbp - doffset
local if1$inner                            # rbp - if1$inner offset
<below this all computations occur>     # relative to rsp

It took a bit to figure out how to evaluate all of these addresses when compiling, considering different sized types and padding for 16 byte alignment, but in the end it all worked out

Also, when initially designing the ABI I did it kinda in reverse - first push rbp, then call the function and set rbp to rsp, so that when function needs to return I can do

push [rbp] ; mov rsp, rbp     also works
ret

And then restore original rbp. But when making Borzoi compatible with other ABIs, this turned out to be kinda inefficient, and I abandoned this approach

Borzoi also has a minimal garbage collector. I explain it from the perspective of the user in the README linked above, and here I'll go more into depth.

So, since I have no idea what I'm doing, all arrays and strings are heap allocated using malloc, which is terrible for developer experience if you need to manually free every single string you ever create. So, under the hood, every scope looks like this:

# Borzoi code
fn main() 
{ # gcframe@@

    let byte[] str1 = "another unneeded string"
    # gcpush@@ str1

    if true 
    { #gcframe@@

        let byte[] str2 = "another unneeded string"
        # gcpush@@ str2

    } # gcclear@@ # frees str2

    let byte[] str3 = "yet another unneeded string"
    # gcpush@@ str3

} # gcclear@@ # frees str1 and str3

When the program starts, it initializes a secondary stack which is responsible for garbage collection. gcframe@@ pushes a NULL pointer to the stack, gcpush@@ pushes the pointer to the array/string you've just created (it won't push any NULL pointers), and gcclear@@ pops and frees pointers until it encounters a NULL pointer. All of these are written in Assembly and you can check source code in the repository linked above at Generation/Generator.cs:125. It was very fun to debug at 3AM :)

If you prefix a string (or an array) with & , gcpush@@ doesn't get called on it, and the pointer doesn't participate in the garbage collection. If you prefix a block with && , gcframe@@ and gcclear@@ don't get called, which is useful when you want to return an array outside, but still keep it garbage collected

Now I'll demonstrate some more features, which are not as technically interesting, but are good to have in a programming language and are quite useful

fn main() {
    # Pointers
    let int a = 5
    let int@ ap = u/a
    let int@@ app = @ap
    mut ap = app@
    mut a = app@@
    mut a = ap@

    # Heap allocation
    let@ int h = 69 # h has type int@
    let int@@ hp = @h
    mut a = h@
    
    collect h
    # h doesn't get garbage collected by default, 
}

I think "mentioning" a variable to get its address is an interesting intuition, though I would rather have pointer types look like @ int instead of int@. I didn't do it, because it makes types like @ int[]ambiguous - is it a pointer to an array, or an array of pointers? Other approaches could be []@int like in Zig, or [@int] similar to Haskell, but I'm really not sure about any of these. For now though, type modifiers are appended to the right. On the other hand, dereference syntax being on the right is the only sensible choice.

# Custom types

type vec3 {
    int x,
    int y,
    int z
}

fn main() {
    let vec3 a = vec3!{1, 2, 3}          # cool constructor syntax
    let vec3 b = vec3!{y=1, z=2, x=3}    # either all are specified, or none

    let vec3@ ap = @a
    let int x = a.x
    mut x = ap@.x
    mut ap@.y = 3
}

Despite types being incredibly useful, their implementation is pretty straightforward. I had some fun figuring out how does C organize its structs, so that Borzoi types and C structs are compatible. To copy a value of arbitrary size I simply did this:

mov rsi, sourceAddress
mov rdi, destinationAddress
mov rcx, sizeOfATypeInBytes
rep movsb ; This loops, while decrementing rcx, until rcx == 0

Unfortunately there are no native union/sum types in Borzoi :(

link "raylib"

type image {
    void@ data,
    i32 width,
    i32 height,
    i32 mipmaps,
    i32 format
}

cfn LoadImageFromMemory(byte[] fmt, byte[] data, int size) image

embed "assets/playerSprite.png" as sprite

fn main() {
    let image img = LoadImageFromMemory(".png", sprite, sprite.len)
}

These are also cool features - you can provide libraries to link with right in the code (there's a compiler flag to specify folders to be searched); you can create a custom type image, which directly corresponds to raylib's Image type, and define a foreign function returning this type which will work as expected; you can embed any file right into the executable, and access it like any other byte array just by name.

# Miscellanious
fn main() {
    let int[] a = [1, 2, 3, 4] 
        # Array literals look pretty (unlike C#'s "new int[] {1, 2, 3}" [I know they improved it recently, it's still bad])

    let int[4] b = [1, 2, 3, 4] # Compile-time sized array type
    let int[4] b1 = [] # Can be left uninitialized
    # let int[4] bb = [1, 2, 3] # A compile-time error

    let int num = 5
    let byte by = num->byte # Pretty cast syntax, will help when type inference inevitably fails you
    let float fl = num->float # Actual conversion occurs
    mut fl = 6.9 # Also floats do exist, yea

    if true and false {}
    if true or false {} # boolean operators, for those wondering about &&

    let void@ arrp = a.ptr # you can access the pointer behind the array if you really want to
        # Though when you pass an array type to a C function it already passes it by the pointer
        # And all arrays are automatically null-terminated
}

Among these features I think the -> conversion is the most interesting. Personally, I find C-style casts absolutely disgusting and uncomfortable to use, and I think this is a strong alternative

I don't have much to say about analyzing the code, i.e. inferring types, type checking, other-stuff-checking, since it's practically all like in C, or just not really interesting. The only cool fact I have is that I literally called the main function in the analyzing step "FigureOutTypesAndStuff", and other functions there follow a similar naming scheme, which I find really funny

So, despite this compiler being quite scuffed and duct-tapey, I think the experiment was successful (and really interesting to me). I learned a lot about the inner workings of a programming language, and figured out that gdb is better than print-debugging assembly. Next, I'll try to create garbage collected languages (just started reading "Crafting Interpreters"), and sometime create a functional one too. Or at least similar to functional lol

Thanks for reading this, I'd really appreciate any feedback, criticism, ideas and thoughts you might have! If you want to see an actual project written in Borzoi check out https://github.com/KittenLord/minesweeper.bz (as of now works only on WIndows unfortunately)

20 Comments
2024/07/05
18:21 UTC

3

KCL v0.9.0 Release — High Performance, Richer SDKs, Plugins and Integrations

https://www.kcl-lang.io/blog/2024-07-05-kcl-0.9.0-release

Hi fellas! KCL Programming Language v0.9.0 released! 🙇 Thank you to all community participants! ❤️ Welcome to read and provide feedback! 

0 Comments
2024/07/05
13:26 UTC

15

Can generators that receive values be strictly typed?

In languages like JavaScript and Python it is possible to not only yield values from a generator, but also send values back. Practically this means that a generator can model a state machine with inputs for every state transition. Here is a silly example of how such a generator may be defined in TypeScript:

type Op =
    | { kind: "ask", question: string }
    | { kind: "wait", delay: number }
    | { kind: "loadJson", url: string };

type Weather = { temperature: number };

function* example(): Generator<Op, void, string | Weather | undefined> {
    // Error 1: the result is not necessarily a string!
    const location: string = yield { kind: "ask", question: "Where do you live?" };

    while ((yield { kind: "ask", question: "Show weather?" }) === 'yes') {
        // Error 2: the result is not necessarily a Weather object!
        const weather: Weather = yield { kind: "loadJson", url: `weather-api/${location}` };
        console.log(weather.temperature);
        yield { kind: "wait", delay: 1000 };
    }
}

Note that different yielded "actions" expect different results. But there is no correlation between an action type and its result - so we either have to do unsafe typecasts or do runtime type checks, which may still lead to errors if we write the use site incorrectly.

And here is how the use site may look:

const generator = example();
let yielded = generator.next();

while (!yielded.done) {
    const value = yielded.value;

    switch(value.kind) {
        case "ask":
            // Pass back the user's response
            yielded = generator.next(prompt(value.question) as string);
            break;
        case "wait":
            await waitForMilliseconds(value.delay);
            // Do not pass anything back
            yielded = generator.next();
            break;
        case "loadJson":
            const result = await fetch(value.url).then(response => response.json());
            // Pass back the loaded data
            yielded = generator.next(result);
            break;
    }
}

Is there a way to type generator functions so that it's statically verified that specific yielded types (or specific states of the described state machine) correspond to specific types that can be passed back to the generator? In my example nothing prevents me to respond with an object to an ask operation, or to not pass anything back after loadJson was requested, and this would lead to a crash at runtime.

Or are there alternatives to generators that are equal in expressive power but are typed more strictly?

Any thoughts and references are welcome! Thanks!

18 Comments
2024/07/05
10:29 UTC

24

Loop control: are continue, do..while, and labels needed?

For my language I currently support for, while, and break. break can have a condition. I wonder what people think about continue, do..while, and labels.

  • continue: for me, it seems easy to understand, and can reduce some indentation. But is it, according to your knowledge, hard to understand for some people? This is what I heard from a relatively good software developer: I should not add it, because it unnecessarily complicates things. What do you think, is it worth adding this functionality, if the same can be relatively easily achieved with a if statement?
  • do..while: for me, it seems useless: it seems very rarely used, and the same can be achieved with an endless loop (while 1) plus a conditional break at the end.
  • Label: for me, it seems rarely used, and the same can be achieved with a separate function, or a local throw / catch (if that's very fast! I plan to make it very fast...), or return, or a boolean variable.
63 Comments
2024/07/05
09:35 UTC

4

Pattern matching with exhaustive output

Hey guys, first post here.

I'm in love with exhaustive pattern matching like in Rust. One of the patterns i've noticed in some of my code is that i'm transforming something *to* another Datastructure and i'd like to have exhaustiveness guarantees there aswell.

One idea i had was a "transform"-block similar to Rust's match, but both sides are patterns and are checked for exhaustiveness.

Is there any prior work on this? I'd also love to hear any more thoughts/ideas about this concept.

9 Comments
2024/07/05
05:24 UTC

17

Best syntax for stack allocated objects

I'm developing a programming language - its a statically typed low(ish) level language - similar in semantics to C, but with a more kotlin like syntax, and a manual memory management model.

At the present I can create objects on the heap with a syntax that looks like val x = new Cat("fred",4) where Cat is the class of object and "fred" and 4 are arguments passed to the constructor. This is allocated on the heap and must be later free'ed by a call to delete(x)

I would like some syntax to create objects on the stack. These would have a lifetime where they get deleted when the enclosing function returns. I'm looking for some suggestions on what would be the best syntax for that.

I could have just val x = Cat("fred",4), or val x = local Cat("fred",4) or val x = stackalloc Cat("fred",4). What do you think most clearly suggests the intent? Or any other suggestions?

10 Comments
2024/07/05
03:10 UTC

3

Are Concealing Aliases Bad?

so in my language you can have traits similar to rust, declared like so

trait T:Add
  zero:T
  add:T,T->T

however, the trait keyword is just an alias for

type Add = T =>
  zero:T
  add:T,T->T

so Add is just a type constructor that takes a type T and constructs a pair of labeled fields, namely zero and add.

in general, each trait is represented by some type.

while this fact enables some cool stuff, it is also maybe weird and confusing.

so should I

  • Either hide this fact via the trait keyword? the programmer really only needs to know about this for super advanced stuff. if they just want to define and use a trait, they don't need to know that traits are types.
  • Or be upfront with what traits really are? require a deeper understanding?

this question is not just about the type/trait distinction but also other cases where we might want to conceal part of the truth to make things more straitforward.

1 Comment
2024/07/04
15:12 UTC

Back To Top