/r/ProgrammingLanguages

This subreddit is dedicated to the theory, design and implementation of programming languages.

Welcome!

Be nice to each other. Flame wars and rants are not welcome. Please also put some effort into your post; this isn't Quora.

This subreddit is not the right place to ask questions such as "What language should I use for X", "what language should I learn", "what's your favourite language" and similar questions. Such questions should be posted in /r/AskProgramming or /r/LearnProgramming.


1

Designing a type system

Hello, I am quite new to this. I can write a basic compiler, and I have read Crafting Interpreters. I made a small compile-to-Python language with a C-like syntax and wrote a small interpreter in that language. Overall I am enjoying writing compilers. But I wanted to design a statically typed, compiled, object-oriented and garbage-collected language, and I don't know how to design its type system. I wanted something like C# or Java, but then I realised that a few aspects of their type systems require runtime checking, and so a VM. I learnt Go but wasn't a fan of its structural type system. I thought I would learn Nim, but wasn't impressed by a quick tour. I can't really find languages to take inspiration from, and I don't know enough theory, so I doubt combining different things from different languages would work. Should I be using a language as a reference like this, or else how do I design something from scratch?

4 Comments
2024/04/14
05:45 UTC

4

How to start as a beginner?

I am interested in creating my own language using the C programming language (mainly because I also want to get better with C, so that's killing two birds with one stone), and I'm kind of stuck on where to begin and how to start. I was wondering if someone here might be able to point me in the direction of where to begin, or maybe tell me how you yourself began making a language. Thanks.

8 Comments
2024/04/13
19:46 UTC

1

Have aesthetics ever been pushed as a feature/core piece of a programming language?

This may be a question that most simply do not care about. But I was curious: has aesthetics (the elegance of syntax) ever been the primary focus of a programming language? Does anyone know of any where that was the focus? From what I understand, most developers seem to push it off to the end like an afterthought.

I thought of the question because I have been reading the Steve Jobs book, and I found it interesting that aesthetics was important to him even down to the circuit board. Aesthetics, I think, played a large part in Apple's success.

So, I'm just curious if anyone ever took a similar approach to developing a programming language, where aesthetics played an important role.

11 Comments
2024/04/13
17:52 UTC

0

Does anyone know how to run the Java DAP as a standalone?

I believe I have this as a working vscode extension https://github.com/microsoft/java-debug

I have a test program where I manage to launch a DAP for C++ and Go, and I'd like to get it working with Java, but I can't see a way to launch it as a standalone DAP server. I see executeCommand but nothing that calls it. I don't use Maven and haven't touched Java in a while. Is there an easy way to run it? I can send DAP requests/events like initialize, launch, stackTrace, etc., but right now I can't seem to get a connection.

1 Comment
2024/04/13
02:05 UTC

28

Is Hindley-Milner Style Unification Overkill for a Language with Required Function Annotation?

I have been using a pretty simple bidirectional typechecker for my language up until now. All functions in my language have argument and return type annotations so there are very few cases where inference actually fails. I am now working on adding parametric polymorphism, and am running into the question of how to infer instantiations for generic type parameters. With HM this is no problem, and adding unification wouldn't be too difficult, but I am wondering if it is overkill for this purpose. I am not really concerned about cases where types cannot be immediately inferred because they don't tend to come up very often. Is HM unification the way to go here? Or is there something simpler?

TL;DR: I am adding parametric polymorphism to my bidirectional typechecker. Is HM-style unification necessary to infer type parameter instantiations or is there something simpler?
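
For a concrete picture of what "something simpler" than full HM can look like here: with every argument type already known from annotations, instantiating type parameters reduces to one-way first-order matching of the callee's parameter types against the argument types, with no global constraint solving. Below is a minimal C sketch of that matching step; all names are hypothetical and the occurs check is omitted.

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    typedef enum { TCON, TVAR } TypeKind;

    typedef struct Type {
        TypeKind kind;
        const char *name;     /* constructor name, e.g. "List", "Int"   */
        struct Type *args[2]; /* constructor arguments (NULL if unused) */
        struct Type *binding; /* TVAR only: what it was instantiated to */
    } Type;

    static Type *resolve(Type *t) {
        while (t->kind == TVAR && t->binding) t = t->binding;
        return t;
    }

    /* Bind the fresh TVARs introduced for the generic's parameters
     * against the fully annotated argument types at the call site. */
    static bool match(Type *param, Type *arg) {
        param = resolve(param);
        arg = resolve(arg);
        if (param->kind == TVAR) { param->binding = arg; return true; }
        if (arg->kind == TVAR) return false; /* can't happen with full annotations */
        if (strcmp(param->name, arg->name) != 0) return false;
        for (int i = 0; i < 2; i++) {
            if (!param->args[i] != !arg->args[i]) return false;
            if (param->args[i] && !match(param->args[i], arg->args[i])) return false;
        }
        return true;
    }

    int main(void) {
        Type intT  = { TCON, "Int",  { NULL },  NULL };
        Type a     = { TVAR, "a",    { NULL },  NULL }; /* fresh var for T */
        Type listA = { TCON, "List", { &a },    NULL }; /* List<T>   */
        Type listI = { TCON, "List", { &intT }, NULL }; /* List<Int> */
        if (match(&listA, &listI))
            printf("T instantiated to %s\n", resolve(&a)->name);
        return 0;
    }

Full unification differs mainly in letting variables appear on both sides and persist across many constraints; if the fully annotated setting never needs that, something like the sketch above may be all that is required.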

19 Comments
2024/04/12
22:31 UTC

8

As a web dev who is interested in PLT but knows little about it, I want to enroll in a graduate program. Is that realistic, and if it is, what should I prepare for it?

My interest in PLT mainly comes from my undergraduate compiler course. I wrote a stripped-down C compiler as the course's final assignment and it was a lot of fun. I tried applying for some PL-related jobs when I had just graduated from college (mostly static program analysis), but soon found out that my programming experience was just not enough for those jobs, and eventually I became a web dev like many other college students.

However, I am still into programming languages. I read Real World Haskell and quite enjoyed it, and now I can read and write some simple Haskell code. Naturally there is always a thought in my head that maybe I can enroll in a PLT graduate program and get a PhD. However, since I know little about this field, I don't know what I should prepare and study before I start finding these graduate programs and applying to them.

If you are a PLT specialist and are willing to give me some direction or advice, I would very much appreciate it!

2 Comments
2024/04/12
13:01 UTC

56

Are there any programming languages with context-sensitive grammars?

So I've been reading "Engineering a Compiler", and in one of the chapters it says that while possible, context-sensitive grammars are really slow and kind of impractical, unless you want them to be even slower. But practicality is not always the concern, and so I wonder: are there any languages (probably esolangs), or some exotic ideas for one, that involve having a context-sensitive grammar? Overall, what dumb concepts could a context-sensitive grammar enable for programming (eso?)language designers? Am I misunderstanding what a context-sensitive grammar entails?

inb4 raw string literals are often context sensitive - that's not quirky enough lol
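
One example that often comes up in this context is parse-time rather than grammar-theoretic context sensitivity: in C, whether T * b; is a declaration or an expression depends on what T was earlier declared to be, so the parser needs symbol-table feedback. A small illustration:

    /* Whether "T * b" declares a pointer or multiplies two values
     * depends entirely on the earlier typedef, i.e. on context. */
    typedef int T;

    void f(void) {
        T *b;           /* declaration: b is a pointer to int    */
        (void)b;
    }

    void g(int T, int b) {
        (void)(T * b);  /* T shadows the typedef: multiplication */
    }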

78 Comments
2024/04/11
10:25 UTC

17

Intro to Roc Programming Language • Richard Feldman & James Lewis

0 Comments
2024/04/11
09:32 UTC

37

Why are homoiconic languages so rare?

The number of homoiconic languages is quite small (the most well known are probably in the Lisp family). Why is that? Is a homoiconic language not the perfect way to allow users to (re)define language constructs and so make the community contribute to the language easily?

Also, I didn't find strongly typed (or even dependently typed) homoiconic languages. Are there some that I overlooked, or is there an inherent reason why that is not done?

It surprises me, because a lot of languages support the addition of custom syntax/constructs and often have huge infrastructure for that. Wouldn't it be easier and also more powerful to support all that "natively" and not just have it tacked on?

66 Comments
2024/04/11
07:16 UTC

14

A rough idea how to slightly tweak the C type system and syntax to make it safer and perhaps also more optimisable

This is a further refinement of an idea I think I have posted some time ago in a comment, and it is related to my other post about variable sized pointers.

C, as we all know, conflates pointers and arrays to some degree. I actually kind of like that, and I think it can be reinterpreted in a very elegant way.

Suppose we drop the slightly weird principle ("Declaration follows use"?) that makes the "*" bind to the declared name, and also move the array size "[k]" from the right side of the declared thing to the right side of the type instead, so now T* p1, p2 declares two pointer variables p1 and p2, and T[42] a1, a2 declares two array variables, each with 42 slots of type T. T* can now be thought of as simply being the Kleene star applied to T, just as T[2] is {T, T} or T×T. The type T[0] would be the array of length 0 of T objects, and it has no elements. For now I will refer to its value as "nil". As T* is the Kleene star applied to T, it is the same type as union{T[0]; T[1]; T[2]; T[3] ... }. Of course, at any time, an object of type T* can only mean one specific variant of this union. So a union type like T* must be a pointer. Which conveniently gives us the similarity to T *p in ordinary C. It is probably useful to also define T? as union{T[0], T[1]} and note that T is just about the same as T[1]. (Just like with mathematical notation in general, x¹ = x.) I'm not decided yet whether I would conflate void and T[0], and have T? be union{void, T}, but it might be okay to do so.

Similarly, T[short] would be the union of T[0], T, T[2] and so on up to T[32767].

A concrete object will have a definite size at any time, so T[k] a for some integer constant k will simply define an actual array (or a field of fixed length k inside a struct), whereas T* p as mentioned defines a pointer that can point to an array of any length. Likewise, T[short] is a pointer to arrays of length < 32768, and T[...k] a pointer to arrays of length <= k respectively. The actual implementation representation of such pointers will be a base address and a length; for T* it will be a full size (64-bit) base address, and a size_t (64-bit) length. For T[short] the base address will also be a full 64-bit, but the length can be reduced to two bytes for a short length.
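
To make that representation concrete, here is a sketch of how these pointer types could lower to today's C, assuming T = int; the struct and function names below are illustrative, not part of the proposal.

    #include <stddef.h>

    typedef struct {
        int   *base;   /* full 64-bit base address            */
        size_t len;    /* number of elements still accessible */
    } int_star;        /* lowering of "int*": any length      */

    typedef struct {
        int   *base;   /* still a full base address           */
        short  len;    /* two-byte length, 0..32767           */
    } int_short_arr;   /* lowering of "int[short]"            */

    /* p + i under the proposal: advance the base, shrink the length
     * (i <= p.len is assumed; a real compiler would enforce it). */
    static int_star int_star_add(int_star p, size_t i) {
        return (int_star){ p.base + i, p.len - i };
    }

    /* p[j], with the bounds check the extra length field enables. */
    static int int_star_index(int_star p, size_t j, int *ok) {
        *ok = j < p.len;
        return *ok ? p.base[j] : 0;
    }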

Now, if you have T* p and T[100] a, then assigning p = a results in p referring to an array T[100]. *p is the same as p[0] and *(p+i) is the same as p[i] just like in usual C. However, in this language, to ensure safety an object of type T* has to store both the base address and the length. So p+1 has the type T[99], and in general, (p+i) has type T[100-i]. If p points to an array T[k] then p[j] or *(p+j) is invalid for j >= k. We can still have pointer incrementing p++, but unlike C, if p points to a single element of type T, then p++ results in p becoming nil instead of pointing outside an array. This makes it possible to write this:

    T[10] a;
    for(T* p = a; p; p++) { ... (*p) ... }

Assigning a longer array like T[100000] a to a short pointer T[short] p = a is valid, but of course only allows access to the first 32767 elements of a through the pointer p.

A variable can be anchored to another variable or field. This makes it possible to optimise the base address away from a pointer, replacing it with a shorter integer storing the offset from the base. The loop above can be rewritten:

    T[10] a;
    for(T* @a p; p; p++) { ... (*p) ... }

Which is obviously just yet another way of writing:

    T[10] a;
    for(size_t i = 0; i < 10; i++) { ... (a[i]) ... }

The language allows defining types within structs. This would enable certain optimisations using based pointers.

If you define a struct with pointer or array fields, you can make them relative:

    struct Tree {
        char[100000] textbuf;
        struct Node[short] nodebuf;
        struct Node {
            char* @textbuf text;
            int num;
            struct Node? @nodebuf left, right;
        };
        struct Node? @nodebuf root;
    }

    const int maxnode = 32000;
    struct Tree t = (struct Tree){
        .textbuf = {0},
        .nodebuf = calloc(maxnode, sizeof(struct Tree.Node)),
        .root = nil };

As Node is defined inside Tree, the field nodebuf can be used as base for the left and right pointer fields, and as they are declared as struct Node? they can either be nil or refer to some element of nodebuf, so they can be optimised to be represented by just a two byte short. As there has to be a nil value as well as references to nodebuf[0] to nodebuf[32767], it will probably not be possible to use unsigned representation types for this kind of based pointers. It should probably be possible to still define a Tree.Node pointer outside of Tree, by writing Tree.Node? p - however such a pointer will need to include a pointer to the Tree such a Node belongs to. Alternatively, such a pointer could be declared by writing t.Node? pt. This would tie pt to t, and suppose some other Tree t2 existed, pt = t2.root should probably be a compile time error.

The text field of Node, being based on the fixed allocation of 100000 chars in textbuf, also has its base optimised away; however, two ints, both big enough to represent an index or length up to 100000, have to be stored in each node.

This is still all just a rough idea. The idea of interpreting "*" as the Kleene star and including a length in pointer values I have had for some time; the idea of allowing fields and variables to be defined relative to other fields or variables, and having structs defined within structs utilising such based fields, is completely new (based on an idea from my previous post), with the details thought up while writing this post. I hope it turned out at least mostly readable, but there may be holes, mistakes, or problems I haven't thought about - any kind of input is welcome!

12 Comments
2024/04/10
22:11 UTC

58

Would it be worthwhile to use shorter pointers to reduce the cost of using pointers in 64-bit architectures?

Disclaimer: I don't know much about modern low-level CPU architecture stuff - last time I did anything half-serious with assembly language was as a CS student learning assembler programming on a PDP-11 architecture, and that was back in 1988.

When using pointers in a language compiled for 64-bit architectures, 8 bytes are used for each pointer field, even though maybe just 48 bits or 6 bytes are actually used, and alignment makes up to 4 of the least significant bits unused as well. This means 20 bits or more are just wasted space.

As I understand it, caches also make pointers problematic, as objects in some structure may be allocated far away from each other.

Could it be worthwhile to use more "low-level" techniques than current (C-style) pointers? Instead of full-range absolute addresses, a pointer could be relative to a base address, using 32 bit or even smaller offsets; or it could be a self-relative pointer, again using a much smaller offset. With 32 bits, 4 gigabytes of data can be addressed, which is still a lot, and although there may be more and more situations where data is so big that one entire data structure is larger than that, my guess is that in most cases, trees or lists or whatever would be a lot smaller than 4 GiB.

One way to achieve this could of course be to use arrays and integer indexing instead. While this works, I think it lacks some elegance and generality. How would you implement "short-range" pointer types in a C-like/C-level language? Are there any languages that support something like this, except using base registers and offsets in assembly language?
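
As a sketch of the base-relative flavour in a C-level language (all names made up for illustration): one 64-bit arena base is shared by a whole data structure, and each "pointer" stores only a 32-bit offset from it, halving the size of pointer-heavy nodes.

    #include <stdint.h>
    #include <stdlib.h>

    /* One designated base for the whole arena; offset 0 plays the
     * role of NULL, so allocations must start past the first byte. */
    static char *arena_base;

    typedef uint32_t rel32;  /* 4-byte "pointer": offset from arena_base */

    static void *rel_decode(rel32 r) { return r ? arena_base + r : NULL; }

    static rel32 rel_encode(const void *p) {
        return p ? (rel32)((const char *)p - arena_base) : 0;
    }

    /* A list node shrinks from 16 bytes to 8 on a 64-bit target. */
    struct node {
        int32_t value;
        rel32   next;        /* instead of: struct node *next */
    };

A self-relative variant would instead store offset = target - (char *)&field, which keeps a structure valid when copied or mapped as a whole block, at the cost of fixing offsets up whenever individual nodes move.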

My guess would be that in addition to saving space, this would also make a program faster. Would this actually be the case, or are there some reasons why this is not going to work better or even be worse than just using full 64-bit pointers for everything?

41 Comments
2024/04/09
23:31 UTC

10

Advent of Computing: Episode 129 - ALGOL, Part I

1 Comment
2024/04/08
19:20 UTC

15

Implementing named variables using stack operations?

Most programming languages offer the ability to name variables; some would argue that all of the useful languages are in this class. Clearly it's a feature that most people prefer to have, even if it's not necessary for Turing-completeness. However, there are some interesting languages that lack them, chiefly Manfred von Thun's Joy. Unlike Forth, which shares its stack-oriented and concatenative qualities, Joy doesn't even give you the option of using variables, which requires you to think about computation in terms of functional combinators on anonymous stack parameters. Joy fans would call that a feature rather than a bug, and for von Thun's more theoretical purposes it makes sense, but clearly that hasn't caught on in the rest of the PL world.

My language is focused around data transformations using formal grammars, which is naturally suited to a concatenative approach based on functional combinators; however, it seems unreasonable to expect people (even myself) to forego named variables completely, so I'd like to have those too. Obviously it would be perfectly reasonable to implement the named variables the old fashioned way, using a symbol table or map of some sort, but it feels unnaturally grafted on to what is otherwise a stack-oriented language, so I'm interested in alternative approaches. I know that there are algorithms for converting lambda parameters into anonymous combinators (Brent Kerby specifically gives one using Joy notation), but they're all from a very non-practical theoretical perspective, almost all of them restricted to SKI combinators to prove their functional completeness, and so they produce very large and inefficient series of instructions. I myself am more pragmatic; I'm just looking for a mechanized way of turning a function with named parameters into a series of stack operations.
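
For what it's worth, here is a tiny sketch of the pragmatic route, with hypothetical instruction names: keep a compile-time map from each parameter name to its stack slot at function entry, track how many temporaries have been pushed since, and rewrite every named use into a pick n (copy the element n below the top). No symbol table survives to runtime.

    #include <stdio.h>
    #include <string.h>

    #define NPARAMS 2
    /* entry order: x pushed first, so y is on top at function entry */
    static const char *params[NPARAMS] = { "x", "y" };
    static int depth = 0;  /* temporaries pushed since function entry */

    static void emit_var(const char *name) {
        for (int i = 0; i < NPARAMS; i++)
            if (strcmp(params[i], name) == 0) {
                /* params[NPARAMS-1] sits at slot 0 below the top at entry */
                printf("pick %d\n", depth + (NPARAMS - 1 - i));
                depth++;
                return;
            }
    }

    static void emit_binop(const char *op) {
        printf("%s\n", op);
        depth--;  /* two operands consumed, one result pushed */
    }

    int main(void) {
        /* compile the body of  f(x, y) = x - y  */
        emit_var("x");      /* pick 1 : copy x over y         */
        emit_var("y");      /* pick 1 : y now sits one deeper */
        emit_binop("sub");
        return 0;
    }

This is essentially what stack VMs do with local slots, minus the runtime variable array: every pick distance is resolved at compile time.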

Has this been done before in a practical language implementation (i.e. not a pure syntactic calculus)? Or is the fact that even stack machines and languages like JVM and Forth use symbol tables and variable arrays a clue that converting between the two computational paradigms is just inherently inefficient?

12 Comments
2024/04/08
11:10 UTC

5

More adventures in unityping

I wrote previously about the difficulties of making a rich type system if you also want a dynamic language where values have to be tagged with types. But there are some things we can do to make the type system more interesting, and as I switch to the VM version I've taken the opportunity to start implementing some of them.

Since the type system is much stronger and more useful in Pipefish than in all the other dynamic languages I can think of, making the type system more expressive is genuinely useful in ways I'll explain. However, it is meant to be simple too so I don't want to implement something that would take a lot of explaining to get the point across.

So, ideas and indeed stuff I've done so far:

First of all, Pipefish distinguishes between concrete types which are things a value can be, and abstract types which act as a filter on the concrete types that can be taken on by the argument of a function, or its return type, or the value stored in a variable, or the field of a struct.

In the prototype, there were only a few built-in abstract types like struct and single. But now, as illustrated in the script below, we can define e.g. Monster = Dragon/Troll/Orc and this functions as an abstract type.

types

Dragon = struct(name string, temperatureOfFire int, valueOfHoard int)
Troll = struct(name string, turnedToStone bool, IQ bool)
Orc = struct(name string, urukHai bool)
Person = struct(name string, age int)

Monster = Dragon/Troll/Orc

const

smaug = Dragon("Smaug", 451, 1000000)
doug = Person("Douglas", 42)

def

kill(m Monster) :
    "You kill the " + string(type(m)) + "."

kill(p Person) :
    "Killing people is wrong!"

If we run this and toy with it in the REPL:

→ Orc/Troll/Dragon
Monster
→ Orc/Person/Troll/Dragon
Monster/Person
→ null/Person 
Person?
→ kill smaug 
You kill the Dragon.
→ kill doug         
Killing people is wrong!
→  

(Note that trivially this allows type aliasing: if we wrote Wyrm = Dragon we could e.g. declare variables of type Wyrm which would in fact be of type Dragon. For all the good that does us.)
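
(For a concrete idea of how cheap this can be at runtime, here is a sketch in C, with invented names rather than Pipefish internals, of abstract types as bitsets over concrete type tags; dispatching kill is then just a couple of bit tests.)

    #include <stdbool.h>
    #include <stdint.h>

    enum { TY_DRAGON, TY_TROLL, TY_ORC, TY_PERSON };

    typedef uint64_t abstract_type;        /* bit i set => tag i admitted */
    #define BIT(t) ((abstract_type)1 << (t))

    static const abstract_type MONSTER =
        BIT(TY_DRAGON) | BIT(TY_TROLL) | BIT(TY_ORC);

    static bool belongs(int tag, abstract_type a) { return (a >> tag) & 1; }

    /* dispatch for kill: try the Monster overload, then the Person one */
    static const char *kill(int tag) {
        if (belongs(tag, MONSTER)) return "You kill the monster.";
        if (tag == TY_PERSON)      return "Killing people is wrong!";
        return "no matching overload";
    }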

It will occur to you, of course, that we don't necessarily have to specify an abstract type "by hand", that we could group types into abstract types by specifying their properties. We could for example group structs together by whether they have a particular field label. Or we could group them by whether they can be taken as a parameter by a given function, say whether + is defined for them. We could have ... typeclasses ...

(BTW I am told that these were way more trouble than the Haskell people envisaged but I've never been told why. If anyone can tell me, I'd be grateful, and then I could decide if the lesson is relevant. Thanks.)

Or indeed anything else we can think of. I could allow you to group types together by their initial letters but I won't. But given that I can do whatever I like along these lines, what would be useful?

The other thing I can think of to do — and please lmk if you have other ideas — is to allow the creation of new concrete types from old ones, by writing e.g. Temperature = clone int or Vector = clone list.

This would serve a number of purposes. First, the sort of extra type safety you get in e.g. Go by doing this, where you can make the parameter of your function of type Temperature and you can't pass them a regular int. (Unlike if you just aliased the type.)

Second, since users can't have things of type (e.g.) list of string or set of int, this would let them at least have things of type (e.g.) listOfString = clone list, where the onus would be on them to guarantee that its content matches its name.

Third, it would be easy to set up the type system so that a cloned type had the same operations defined on it by default as its prototype, but so that defining them for the cloned type would override that: so you could define + to do vector addition for the type defined by Vector = clone list.

These are my thoughts so far but I'd be grateful for yours. Thanks!

25 Comments
2024/04/08
09:56 UTC

7

Getting started?

Hi, I've looked around and there don't seem to be any resources on getting started, so I was wondering if someone could point me in the right direction, or answer a few questions? Thanks!

14 Comments
2024/04/08
07:14 UTC

6

Heap allocation in my Language

Hello, I have reworked the heap allocation syntax in my language concept called Duck. It's similar to the C/C++/C# style, but it does not use new/malloc keywords. The : symbol is for type inference.

Example
{
    int val

    Foo()
    {
    }
} 

// Stack allocation
Example e = Example()
Example e2()
e3 : Example()

// Heap allocation
Example* e = Example()
Example* e2()
e3 :: Example()

// Stack allocation
int num = 5
num2 : 5

// Heap allocation
int* num = 5
num2 :: 5

// Stack allocation
Example e3 = e2
Example e4 = {val : 5}

// Heap allocation
Example* e3 = e2
Example* e4 = {val : 5}

// Depends on the allocation of e2, if it can't be determined it will prefer stack
e3 : e2

// Heap allocation, force heap allocation
e3 :: e2 

// not allocated, technically pointer is on stack but there is no heap allocation
Example* e
Example* e2 = null

Please do not focus on the formatting, as it is up to personal preference in Duck.

6 Comments
2024/04/07
19:52 UTC

9

Defining Operational Semantics of Loops in Coq?

I'm working on the Imp chapter of Software Foundations, specifically the add_for_loop exercise, which asks that I add simple for loop syntax to the imperative language Imp that the chapter has been developing.

The summary of the problem is that I'm having trouble defining correct operational semantics for both for and while loops. I'm not sure my approach of describing the semantics of for loops in terms of while loops is correct. And before that, I'm not even sure my semantics for while loops is correct!

I'll first list a simple description of Imp. It's a simple imperative programming language that accepts natural number arithmetic expressions and boolean expressions, called aexp and bexp respectively:

Inductive aexp : Type :=
  | ANum (n : nat)
  | AId (x : string)
  | APlus (a1 a2 : aexp)
  | AMinus (a1 a2 : aexp)
  | AMult (a1 a2 : aexp).
  
Inductive bexp : Type :=
  | BTrue
  | BFalse
  | BEq (a1 a2 : aexp)
  | BNeq (a1 a2 : aexp)
  | BLe (a1 a2 : aexp)
  | BGt (a1 a2 : aexp)
  | BNot (b : bexp)
  | BAnd (b1 b2 : bexp).

The language also has simple commands. This is the standard Imp version, without my attempt of adding for loops yet:

Inductive com : Type :=
  | CSkip
  | CBreak
  | CAsgn (x : string) (a : aexp)
  | CSeq (c1 c2 : com)
  | CIf (b : bexp) (c1 c2 : com)
  | CWhile (b : bexp) (c : com).

which is syntactically equal to:

 com := skip
    | break
    | x := a
    | com ; com
    | if b then c1 else c2 end
    | while b do c end
    

Since the notion of a loop requires that the resulting state after running a command may cause the loop to break or continue, information about whether the result of a command should break or not is also added to the state information:

Inductive result : Type :=
  | SContinue
  | SBreak.
  
Reserved Notation "st '=[' c ']=>' st' '/' s"
    (at level 40, c custom com at level 99, st' constr at next level).

We read st =[ c ]=> st' / s as "running command c under state st yields state st', with result s, which indicates whether the outermost loop should break or continue normally."

Then, we can define the operational semantics of Imp commands:

Inductive ceval : com -> state -> result -> state -> Prop :=
  | E_Skip : forall st,
      st =[ CSkip ]=> st / SContinue
  | E_Break : forall st,
      st =[ CBreak ]=> st / SBreak
  | E_Asgn : forall st x a n,
      aeval st a = n -> st =[CAsgn x a ]=> (x !->n; st) / SContinue
  | E_SeqBreak : forall c1 c2 st st',
      st =[ c1 ]=> st' / SBreak -> st =[ CSeq c1 c2 ]=> st' / SBreak
  | E_Seq : forall c1 c2 st st' st'' res,
      st =[ c1 ]=> st' / SContinue -> st' =[ c2 ]=> st'' / res -> st =[ CSeq c1 c2 ]=> st'' / res
  | E_IfTrue : forall st st' b c1 c2 res,
      beval st b = true -> st =[ c1 ]=> st' / res -> st =[ CIf b c1 c2 ]=> st' / res
  | E_IfFalse : forall st st' b c1 c2 res,
      beval st b = false -> st =[ c2 ]=> st' / res -> st =[ CIf b c1 c2 ]=> st' / res
  | E_WhileFalse : forall b st c,
      beval st b = false -> st =[ CWhile b c ]=> st / SContinue
  | E_WhileTrueBreak : forall b st st' c,
      beval st b = true -> st =[ c ]=> st' / SBreak ->
      st =[CWhile b c]=> st' / SContinue
  | E_WhileTrue : forall b st st' st'' c,
      beval st b = true ->
      st =[ c ]=> st' / SContinue ->
      st' =[CWhile b c]=> st'' / SBreak ->
      st =[CWhile b c]=> st'' / SContinue
      

Consider the case E_WhileTrue. It says that given that the boolean expression b is true under state st, the loop body c executed at state st yields a state st' that doesn't signal a break, and from that state st' running the loop yields another state st'' that signals a break, we can say that running a while loop at state st yields state st''. I understood this as first checking that the boolean expression is true, then "unfolding" one iteration of the loop body, and then finally assuming some state st'' exists that makes breaking the loop possible.

Everything made sense to me until I tried to add for loops to the language. The syntactic portion is straightforward, by augmenting com:

Inductive com : Type :=
  | CSkip
  | CAsgn (x : string) (a : aexp)
  | CSeq (c1 c2 : com)
  | CIf (b : bexp) (c1 c2 : com)
  | CWhile (b : bexp) (c : com)
  | CFor (b_test : bexp) (c_init c_end c_body : com).

which is syntactically equal to:

 com := skip
    | x := a
    | com ; com
    | if b then com else com end
    | while b do c end
    | for ( c_init | b_test | c_end ) do c_body end
    

The problem came when trying to extend the operational semantics of ceval. My first idea was to "desugar" for loops into while loops:

Inductive ceval : com -> state -> result -> state -> Prop :=
  ... ( everything same as above )
  | E_ForInitBreak : forall b_test st st' c_init c_end c_body, 
      st =[ c_init ]=> st' / SBreak -> 
      st =[ CFor b_test c_init c_end c_body ]=> st' / SContinue
  | E_ForEqualsWhile : forall b_test st st' c_init c_end c_body,
      st =[CSeq c_init (CWhile b_test (CSeq c_body c_end)) ]=> st' / SContinue ->
      st =[ CFor b_test c_init c_end c_body ]=> st' / SContinue

Here, E_ForInitBreak implies that if the initial statement of the for loop breaks, then the loop body is not run. E_ForEqualsWhile implies that for loops can be seen as the execution of an equivalent while loop.

But the problem with this "desugaring" semantics was that program executions involving for loops were unprovable with the current semantics of while. I needed to change E_WhileTrue:

  | E_WhileTrue : forall b st st' st'' c,
      beval st b = true ->
      st =[ c ]=> st' / SContinue ->
      st' =[CWhile b c]=> st'' / SContinue -> (* SBreak was changed into SContinue *)
      st =[CWhile b c]=> st'' / SContinue
      

Which allowed me to prove for loop executions, but destroyed the previous proofs about the behavior of while loops.

I'm not sure whether my operational semantics for while loops is wrong, or whether this "desugaring" semantics for for loops just doesn't work. I've tried to alternatively define the semantics of for loops independently, but it just turned out to be way too long and tedious to work with.

How would you define the operational semantics of for and while loops?

8 Comments
2024/04/07
11:14 UTC

14

For AOT is there any benefit in using libgccjit vs. outputting C?

For learning purposes I'm looking into ways of creating native binaries. I have never done anything like this for a custom language. I want to keep things simple but still as powerful and flexible as I can muster.

After looking at a few options (many recommended by this sub), I found libgccjit very easy to use, and the tutorial and documentation have been really helpful so far. But I was wondering: the calls I make feel like I'm building a sort of C AST. I understand the JIT portion of the library is useful, but if I want to compile ahead-of-time, is there any benefit in going with this library, which couples me to GCC (both technology- and license-wise), or should I rather emit C code and have the flexibility to use any C compiler?
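
On the AOT question specifically: libgccjit can compile a context straight to an assembler file, object file, or executable in one call, so the "jit" in the name is not a constraint. A minimal sketch (error handling omitted; link with -lgccjit):

    #include <libgccjit.h>

    int main(void) {
        gcc_jit_context *ctxt = gcc_jit_context_acquire();

        /* build the equivalent of: int main(void) { return 42; } */
        gcc_jit_type *int_type =
            gcc_jit_context_get_type(ctxt, GCC_JIT_TYPE_INT);
        gcc_jit_function *fn = gcc_jit_context_new_function(
            ctxt, NULL, GCC_JIT_FUNCTION_EXPORTED, int_type,
            "main", 0, NULL, 0);
        gcc_jit_block *block = gcc_jit_function_new_block(fn, "entry");
        gcc_jit_block_end_with_return(
            block, NULL,
            gcc_jit_context_new_rvalue_from_int(ctxt, int_type, 42));

        /* ahead-of-time output instead of gcc_jit_context_compile() */
        gcc_jit_context_compile_to_file(
            ctxt, GCC_JIT_OUTPUT_KIND_EXECUTABLE, "./a.out");
        gcc_jit_context_release(ctxt);
        return 0;
    }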

Do you have any recommendations?

15 Comments
2024/04/07
10:16 UTC

44

What are the advantages of a programming language built on a VM?

Like the JVM or the CLR? It seems like you get a lot of the same benefits provided by a VM if you develop/use languages based on LLVM, but with a leaner, faster language at the end.

40 Comments
2024/04/06
04:13 UTC

5

Lana: General-Purpose Very-High-Level Programming Language

Introduction

https://youtu.be/ZjSTKl4viek

*You probably haven’t seen this new programming method.

*Please don't mind the translation problem.

*My resources have been exhausted.

https://www.lana-lang.com/

3 Comments
2024/04/05
04:12 UTC

11

Moirai now supports cost expression literals in types

After a recent change to my language Moirai, the Max, Mul, and Sum operators can be used on Fin type parameters within type literals.

def maxList<T, M: Fin, N: Fin>(listM: List<T, M>, listN: List<T, N>): List<T, Max(M, N)> {
   if (listM.size > listN.size) {
      listM
   } else {
      listN
   }
}

maxList(List(1, 2, 3), List(1, 2, 3, 4, 5))

In the above example, this change enables the type literal:

List<T, Max(M, N)>

This is not full dependent typing, but it does allow users of the language to leverage the existing worst-case execution time calculation primitives. The Max, Mul, and Sum operators are commutative, so the type checker will rewrite these expressions into a canonical form. In the above example, swapping the order of M and N in the return type of the function will not generate a type error.
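
A sketch of what such canonicalization can amount to, with hypothetical names rather than Moirai's actual internals: sort the operands of each commutative cost node, then compare structurally, so Max(M, N) and Max(N, M) collapse to the same form.

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* A commutative cost node, e.g. Max(M, N), operands by name. */
    typedef struct {
        const char *op;       /* "Max", "Mul", or "Sum" */
        const char *args[2];  /* type-parameter names   */
    } CostNode;

    static int cmp_name(const void *a, const void *b) {
        return strcmp(*(const char *const *)a, *(const char *const *)b);
    }

    /* Canonical form: operands in sorted order. */
    static void canonicalize(CostNode *n) {
        qsort(n->args, 2, sizeof n->args[0], cmp_name);
    }

    /* After canonicalization, type equality is a plain structural
     * compare, so Max(M, N) and Max(N, M) check as the same type. */
    static bool same(const CostNode *a, const CostNode *b) {
        return strcmp(a->op, b->op) == 0 &&
               strcmp(a->args[0], b->args[0]) == 0 &&
               strcmp(a->args[1], b->args[1]) == 0;
    }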

6 Comments
2024/04/04
22:35 UTC

21

Formal verification in compilers?

Compilers seem like a good use-case for formal verification at first glance: they're complex deterministic programs where correctness is of the utmost importance.

While there is the canonical example of the "verified C compiler", I don't really understand how formal verification fits into the compiler world at large.

Has anyone here applied formal verification to their work? Perhaps for one part of their compiler -- verifying the type checker, a specific optimization, etc? What did it look like? What tools did you use?

15 Comments
2024/04/04
20:51 UTC

124

I wrote a C99 compiler from scratch

I wrote a C99 compiler (https://github.com/PhilippRados/wrecc) targeting x86-64 for macOS and Linux.

It has a built-in preprocessor (which only misses function-like macros) and supports all types (except `short`, `float`, and `double`) and most keywords (except some storage-class specifiers/qualifiers).

Currently it can only compile a single .c file at a time.

The self-written backend emits x86-64, which is then assembled and linked using the host's `as` and `ld`.

Since this is my first compiler (it had a lot of rewrites), I would appreciate some feedback from people who have more knowledge in the field, as I just learned as I needed it (especially for the typechecker -> codegen -> register-allocation phases).

It has 0 dependencies and everything is self-contained so it _should_ be easy to follow 😄

36 Comments
2024/04/04
17:17 UTC
