/r/Compilers


This subreddit is all about the theory and development of compilers.


21,451 Subscribers

7

QBE IR: How to use atomic instructions?

I just read the IL spec of the QBE compiler backend (https://c9x.me/compile/doc/il.html). How would I use atomic instructions in QBE, e.g. for the length variable of a buffer that can be appended to from multiple threads?
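For concreteness, here is the operation I have in mind, written with C11 atomics (just a sketch; the name reserve is made up):

    #include <stdatomic.h>
    #include <stddef.h>

    /* Shared length of a buffer that multiple threads append to. */
    static _Atomic size_t buf_len;

    /* Atomically reserve n bytes; the old length is the write offset. */
    size_t reserve(size_t n) {
        return atomic_fetch_add_explicit(&buf_len, n, memory_order_acq_rel);
    }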

4 Comments
2024/10/10
11:22 UTC

13

What would an ideal IR (Intermediate Representation) look like?

I'm developing the C2 language (c2lang.org). For back-ends there are currently 3 choices I'm aware of:

  1. LLVM - the safe choice, used by many 'serious' languages
  2. QBE - the choice for 'toy' languages
  3. C - transpile to C and let another compiler do the heavy lifting

I currently have backends for C and QBE. QBE is not a final option, but would be a stepping stone towards LLVM. I know LLVM a bit and made some commits to Clang in the past. One goal of C2 is fast compile times, so you can see my problem: QBE is nice but very simple (maybe too simple), while LLVM is huge, at several million lines of code. What I'm looking for is the sweet spot between them. So I am looking into option 4: writing my own backend.

The idea is to write a back-end that:

  • is very fast (unlike LLVM)
  • does decent optimizations (unlike QBE)
  • has a codebase that is tested (no tests in QBE)
  • has a codebase that is not several million lines of code (like LLVM)
  • is usable by other projects as well

Ideas so far:

  • Don't let the IR determine struct layout, since that assumes knowledge about the source language
  • use far fewer annotations than LLVM (only the minimum needed)
  • base the syntax more on QBE than on LLVM (it's more readable)
  • has unit-tests to ensure proper operation
  • support 32 and 64 bit targets

Practical choices I run into (essentially they boil down to how much information to put in the IR):

  • Do you really need GetElementPtr? (see the sketch after this list)
  • add extern function decls? For example: declare i32 @print(ptr noundef, ...)
  • add type definitions, or just let front-ends compute offsets etc.? (not that hard)
  • How to indicate load/store alignment? LLVM adds 'align x'; QBE has no unaligned accesses. Different instructions, e.g. loadw / loaduw (= load unaligned word)? Or do we also need loadw with align 2?
  • add switch instruction (LLVM has it, QBE does not)
  • add select instruction (LLVM has, QBE does not)
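To make the GetElementPtr question concrete, here is a sketch in C (made-up struct names) of the explicit offset arithmetic a front-end would emit itself if the IR had neither GEP nor type definitions, assuming the front-end knows the target's layout rules:

    #include <stddef.h>
    #include <stdint.h>

    struct Point { int32_t x, y; };
    struct List  { int64_t len; struct Point items[8]; };

    /* What getelementptr computes for &p->items[i].y, spelled out as
       plain offset arithmetic that lowers to pointer adds. */
    int32_t *addr_of_y(struct List *p, size_t i) {
        size_t off = offsetof(struct List, items)
                   + i * sizeof(struct Point)
                   + offsetof(struct Point, y);
        return (int32_t *)((char *)p + off);
    }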

I'm interested in hearing your ideas.

27 Comments
2024/10/10
06:22 UTC

12

ML compilers vs using existing model runtimes

One of the advantages I see in ML model compilers is that they're portable (e.g. they can generate hardware-specific instructions/kernels), which is pretty useful for model deployment (e.g. compile the model and target it at web, mobile, server side, etc.).

My question though is: there are existing runtimes (e.g. ONNX Runtime, TFLite) which support a bunch of different platforms too. What's the sell for an ML compiler? Is it that it can generate better code and improve performance compared to existing runtimes? Or is it more that integrating with <insert next new vendor chip> is easier to do with a compiler than by building a new "execution provider" (in ONNX Runtime terms) for a model runtime?

5 Comments
2024/10/09
16:18 UTC

18

I made an Egyptian Arabic Programming Language

5 Comments
2024/10/08
19:51 UTC

11

Exceptions vs multiple return values

It's an old discussion and I've always been in favor of encoding errors in return types, especially when a programming language like Haskell or Rust offers sum types. But after thinking about it carefully, I have to say that exceptions are the more sensible solution in many cases:

Let's assume a program reads a file. The specified file path is correct and a valid file descriptor is received; otherwise some alternative value indicating an error is returned. For the sake of simplicity – and this is the logical error – it is only checked at this point whether a file descriptor was actually returned. If so, the file handle is passed to other functions one after the other to perform operations on that file. But what happens if the file is suddenly deleted in the meantime? The program assumes that once a valid file descriptor with appropriate rights has been returned, nothing else can go wrong, but when it comes to interactions "with the world", something can ALWAYS happen AT ANY TIME. Therefore, before every further operation on the file, you would have to check whether the file still exists, or whether there are other sources of error (here alone there are probably many subtle OS-specific behaviors that you cannot, or do not want to, take into account across the board). Hence, wouldn't it be better to handle all the errors you care about in one central location for an entire block of code that works with the file, rather than laboriously dealing with individual return values?

In addition, error-carrying return types make function signatures unnecessarily complex.

I think I've now been converted to a new faith… lol

BUT I think exceptions should be clearly limited to errors that have a temporal component, i.e. where you are working with something that is used over a period of time, and where unknown external factors can change in the meantime and cause errors. In my opinion, one-off events such as incorrect user input are not a reason to throw an exception, but should normally be caught by strict input validation, with alternative values as the return if necessary (Option, Maybe, etc.). Accordingly, something like a database connection is again a clear case for exceptions, because it is assumed to be stable and working over a PERIOD of TIME. Even if you only connect to a DB to run a single query and then immediately close the connection, the connection could – however unlikely – break down in exactly that fraction of a millisecond between the opening and the read, for any number of reasons.

At this point I'm also curious how C++ actually implements its exceptions, especially since all the OS functions are written in C?!

After thinking about it again, I could imagine that instead of exceptions, all IO operations return a variant type (similar to Either in Haskell); or even simpler: IO-heavy types like "File" contain, in addition to the file-descriptor variant, other variants representing errors, and every operation that accepts a "File" has to take all these variants into account. For example: if the argument is already anything except a file descriptor, do nothing and just pass it on; otherwise do this and that, and if a failure occurs, pass that failure on as well. It wouldn't make sense to consider a "File" type without the possibility of errors anyway, so why define unnecessarily complicated extra error types and combine them with "Either" when the "File" type can already contain them? And with a handy syntax for pattern matching, it would be quite clear. You could even have the compiler add missing alternative branches, just assuming an identity mapping.
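A rough sketch of that idea in C (all names – File, FileState, file_read_byte – are made up for illustration, not an existing API):

    #include <stdio.h>

    typedef enum { FILE_OK, FILE_NOT_FOUND, FILE_IO_ERROR } FileState;

    /* A "File" that carries its own error variants alongside the handle. */
    typedef struct {
        FileState state;
        FILE *fd;                 /* meaningful only when state == FILE_OK */
    } File;

    /* Every operation handles all variants: on an error variant it is
       the identity mapping, otherwise it does the work and folds any
       new failure into the returned value. */
    File file_read_byte(File f, unsigned char *out) {
        if (f.state != FILE_OK) return f;      /* pass the failure on */
        int c = fgetc(f.fd);
        if (c == EOF) { f.state = FILE_IO_ERROR; return f; }
        *out = (unsigned char)c;
        return f;
    }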

This approach seems cleaner to me than exceptions, more functional, and compatible with C.

12 Comments
2024/10/08
18:47 UTC

12

Modifying an existing C compiler

I have never done something like this and I would like to know how hard it would be to modify an existing C compiler to add try-catch to C. I wanted to modify Clang, but it's a big project without much documentation, so I chose something a lot smaller, like Tiny C.
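For reference, one plausible lowering such a modified compiler could target is setjmp/longjmp from the C standard library; a hand-written sketch of what emitted try/catch code might look like (the error code 42 and the names are made up, and this ignores cleanup and nesting):

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf handler;

    static void may_fail(void) {
        longjmp(handler, 42);             /* acts like "throw 42" */
    }

    int main(void) {
        if (setjmp(handler) == 0) {       /* "try": setjmp returns 0 first */
            may_fail();
            puts("not reached");
        } else {                          /* "catch": re-entered via longjmp */
            puts("caught exception 42");
        }
        return 0;
    }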

20 Comments
2024/10/07
14:44 UTC

0

Build a compiler with Python?

Is it possible to build a compiler from scratch in Python? And if so, can anyone tell me how to go about it, because I have an assignment at university 😭

32 Comments
2024/10/06
11:06 UTC

7

Converting an exe to a dll

The exe is in PE format.

Based on my initial research, it seems that a bit in the PE header needs to be set. Apart from that, I need an "exports" section, and possibly a "relocation" section too.
Are there any more aspects to consider?

I have the addresses of the functions and their names.
I would like to create an exports section in the exe file; I have no idea regarding the "relocation" section.
Any tips on how to generate that would be appreciated.
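For the header bit specifically, this is roughly what it looks like with the structures from winnt.h (a hedged sketch assuming a 64-bit PE already read into a writable buffer; it does not create the exports or relocation sections):

    #include <windows.h>

    /* Flip the IMAGE_FILE_DLL bit in the COFF file header. */
    int mark_as_dll(unsigned char *image, size_t size) {
        IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)image;
        if (size < sizeof *dos || dos->e_magic != IMAGE_DOS_SIGNATURE)
            return -1;                             /* not an MZ file */
        IMAGE_NT_HEADERS64 *nt =
            (IMAGE_NT_HEADERS64 *)(image + dos->e_lfanew);
        if (nt->Signature != IMAGE_NT_SIGNATURE)
            return -1;                             /* not a PE file */
        nt->FileHeader.Characteristics |= IMAGE_FILE_DLL;
        return 0;
    }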

17 Comments
2024/10/06
04:31 UTC

24

Who makes the machine code for a compiler, since compiler is a program, right?

Suppose I want to compile a .c file. I use a compiler to do it so that the CPU can understand and process it. But since the compiler itself is a program, it also has to be run by the CPU, so who compiled the compiler and generated the machine code for it?

I don't know if my question makes sense; I'm just trying to understand things from a logical point of view.

38 Comments
2024/10/05
20:43 UTC

0

Prior art on implementing a "print" op for custom hardware (preferably in the AI domain)

Hi folks,
Could someone with direct/indirect experience implementing a print or print-like op for custom hardware share a rough implementation outline?

As mentioned above, the question is grounded in the AI domain, and unsurprisingly the things I am interested in printing are tensors. I'm interested in surveying existing approaches for printing tensors that may be partitioned across the memory hierarchy, without significantly changing the compute graph or introducing expensive "collective" operations.

P.S. - Perhaps even CPUs with a cache hierarchy run into similar challenges while printing a value. Any relevant insights here would be appreciated.

4 Comments
2024/10/04
20:07 UTC

19

Ygen: release 0.1.2

2 Comments
2024/10/04
17:08 UTC

5

How Do We Make LLVM Quantum? - Josh Izaac @ Quantum Village, DEF CON 32

1 Comment
2024/10/03
12:26 UTC

36

How to leverage my LLVM experience to get a compiler job?

Hello, I have been contributing to LLVM since early this year and have about 25 PRs merged. Some of these PRs are non-trivial, even in the judgement of a senior engineer at Google who has seen my work.

I landed an interview at Apple for a compiler role, and an Amazon AWS recruiter reached out because of my LLVM experience. I failed both of these.

I'm looking for my first job in the industry, transitioning from a different industry.

Any tips you have on how to land a compiler job would be appreciated. I'm from the US.

Should I focus solely on compilers? I also know web backend development, but I have only landed interviews for compiler roles. Thanks.

12 Comments
2024/10/03
07:10 UTC

73

Seriously want to get into compiler design.

I (20M) seriously want to get into compiler design. I'm an undergraduate student who has worked on app development projects before. I took a few classes like compiler design and theory of computation this summer and was really fascinated. I'm in my 3rd year and would love to learn about compilers and their architecture. Someone directed me to delve deeper into LLVM and x86 architecture. I feel lost in the vastness of the subject and would greatly appreciate it if someone could point me in the right direction. I want to go way past toy compilers and actually make significant contributions.

Also, is the ambition of writing a research paper on compiler design before I graduate a far-fetched goal? Is it feasible?

45 Comments
2024/10/02
13:15 UTC

5

I want to build a C# web compiler

Hello, I'm a uni student in Vietnam. Our group has a project and we decided to make a C# web compiler. This is new to us, so if you have some beginner-friendly resources, please leave them in the comments, thanks. We just need the steps to make it. We're using .NET Core (recommend what we should use for the front end if you can, thanks).

15 Comments
2024/10/02
01:49 UTC

44

Job landscape for compiler engineers

I’ve been a compiler engineer for a couple of years and have recently started exploring the job market for similar positions. On LinkedIn, I’ve noticed that compiler positions tend to have a disproportionately high number of applicants (relative to other software jobs).

I have also seen posts/comments here indicating that there tend to be fewer compiler positions and lots of applicants.

It is easy to believe there are fewer compiler engineering jobs than, say, web development jobs, but I assumed the applicant pool would reflect this.

Has anyone else noticed an imbalance or am I psyching myself out?

Edit: the post's purpose isn't to learn how to differentiate myself, but more to gauge the job market based on experiences other than my own.

13 Comments
2024/10/01
17:55 UTC

0

Claude AI or ChatGPT-4

Hi there!

I came across Claude AI recently, and I was wondering which one is better for questions about C++ compilers and university tasks: Claude AI or ChatGPT-4?

3 Comments
2024/10/01
11:44 UTC

19

Modern Compiler Implementation in ML

I'm an undergraduate student trying to dive into the compiler world, but going through this book with little experience in functional programming seems tough. Though I understand the theory mentioned, what's tough for me is the exercises, so I was wondering: is it normal to do all the exercises in the book thoroughly, or is it sufficient to look at the source code of the implementation and understand it? Thanks for all your replies in advance!

3 Comments
2024/10/01
07:46 UTC

11

Best way to implement incremental compilation with LLVM

I'm building a compiler that uses LLVM. It builds an LLVM module for each source file, but I'm wondering what I should do after that point if I want to support incremental compilation (and maybe even incremental linking). I can think of two options:

Option 1: Cache LLVM bitcode files
Write LLVM modules as LLVM bitcode files to a "module cache" directory, then link those into one big LLVM module, then output that module as an object file and link it with the system linker or LLD.

Option 2: Cache object files
Write LLVM modules as object files to a "module cache" directory, then link those object files using the system linker or LLD.

What are the tradeoffs? I'd guess that option 1 gives better link-time optimization, but maybe there's no real difference.

2 Comments
2024/09/30
22:22 UTC

9

How to execute native code within Java

Hello Reddit

I read this post today: Understanding How Graal Works - a Java JIT Compiler Written in Java

It describes how Graal's JIT compiler works.

In short: the compiler takes a byte array of bytecode and returns a byte array of assembly code, using the JVM compiler interface.

I am now wondering how GraalVM loads this byte array of assembly into memory so that it is executable.

Some thoughts that come to mind:

I would try to allocate memory from the OS, store the contents of the array there, and make this area executable. Then I would have to get a pointer to that address to be able to execute the native method.

But how is this possible from within Java? Do you use the JNI interface or something like Unsafe?

I would love to understand how to load native code into memory and execute it from within a Java program.
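For the mechanics outside of Java, here is a hedged POSIX C sketch of what "allocate, copy, make executable, jump" looks like; inside GraalVM the actual code installation is done by the VM via the JVM compiler interface rather than by hand, and JNI would be one way to reach code like this from Java:

    #include <string.h>
    #include <sys/mman.h>

    typedef int (*fn_t)(void);

    /* Map a writable page, copy the emitted machine code in, flip the
       page to executable, then call it through a function pointer. */
    int run_code(const unsigned char *code, size_t len) {
        void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED) return -1;
        memcpy(mem, code, len);
        if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) return -1;
        int result = ((fn_t)mem)();
        munmap(mem, len);
        return result;
    }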

Thanks in advance

5 Comments
2024/09/30
13:55 UTC

35

Why aren't tree-based compilers using blocks-with-arguments more popular?

I just wrote my first compiler. The results are surprisingly good: it compiles a high-level, pragmatic-but-minimalistic ML dialect down to AArch64 asm faster than any of my usual compilers, and it generates faster code than any of them (including Clang -O2). And my compiler is only ~4 kLOC of OCaml!

The main difference between my compiler and what I consider to be "conventional" compilers is that I almost entirely shunned graphs in favor of trees because they are simpler, particularly because I can manipulate trees easily using pattern matching in OCaml (the language my compiler is written in).

In particular, I don't do graph coloring for register allocation. I don't really have basic blocks in the usual sense: I have expression trees composed of calls, if (with three subtrees), and return. I don't have phi nodes: I use tail calls instead. This simplifies the compiler because it pushes phi nodes and function calls through the same machinery.

This approach appears to be called "blocks-with-arguments". Although I've been reading about compilers for years and working intensively on my own for two years, I only heard this term for the first time recently.
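For readers who haven't met the term, here is a hedged illustration in C of the phi-nodes-vs-block-arguments idea (in my compiler the blocks are IR-level, not C functions): instead of a loop whose header needs phi nodes for i and acc, each block becomes a function whose parameters are the block's arguments, and the back edge becomes a tail call.

    #include <stdio.h>

    /* Loop header as a block with arguments (i, acc); the back edge is
       a tail call passing the next values instead of phi nodes. */
    static long loop(long i, long n, long acc) {
        if (i == n) return acc;            /* exit block */
        return loop(i + 1, n, acc + i);    /* back edge with arguments */
    }

    int main(void) {
        printf("%ld\n", loop(0, 10, 0));   /* sums 0..9, prints 45 */
        return 0;
    }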

I do minimal optimisation. I think every compiler phase is almost linear time (one is technically O(n log n), but that's practically the same). Nowhere do I sit in a loop rewriting terms; hence the fast compilation times. And I'm doing whole-program compilation with full monomorphization. The most extreme case I've found was a 10-line det4 function that another compiler took ~1 sec to compile vs ~1 µsec for mine.

Given the success I'm having, I don't understand why lots of other compilers aren't built using this approach. Is it simply not known? Do people not expect to get good results this way?

In particular, the approach I've used for register allocation is closer to compile-time garbage collection than anything else I've seen. Function arguments appear in x0.. and d0... Every linear operation is a function call that consumes and produces registers. At consumption, dead registers are "freed"; produced registers are "allocated". Across a non-tail call, live variables in parameter registers are evacuated into callee-saved registers. At any call or return, registers are shuffled into place using a traditional parallel move. At an if, the map of the register file is simply duplicated for the two sub-blocks. This is simpler than linear scan!

38 Comments
2024/09/30
12:58 UTC

115

Starting YouTube Channel About Compilers and the LLVM

I hope you all enjoy it and check it out. In the first video (https://youtu.be/LvAMpVxLUHw?si=B4z-0sInfueeLQ3k) I give some channel background and talk a bit about my personal journey into compilers. In the future, we will talk about frontend analysis and IR generation, as well as many other topics in low-level computer science.

23 Comments
2024/09/28
23:32 UTC
