/r/Compilers
This subreddit is all about the theory and development of compilers.
I just read the IR spec of the QBE compiler backend: https://c9x.me/compile/doc/il.html. How would I use atomic instructions in QBE, e.g. for the length variable of a buffer that can be appended to from multiple threads?
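As far as I can tell, the QBE IL spec doesn't define atomic instructions (worth double-checking against a current QBE release), so one pragmatic workaround is to put the atomic operations in small C helpers built on C11 atomics, compile those separately, and have QBE-generated code call them like any other function. A minimal sketch with illustrative names:

```c
/* atomic_helpers.c -- hypothetical helpers linked alongside QBE output.
   Built on C11 atomics; nothing here comes from QBE itself. */
#include <stdatomic.h>
#include <stddef.h>

/* Atomically reserve n slots at the end of a shared buffer, returning
   the old length, i.e. the offset this thread may now write to. */
size_t buf_append_reserve(_Atomic size_t *len, size_t n) {
    return atomic_fetch_add_explicit(len, n, memory_order_acq_rel);
}

/* Atomically read the current length (for readers). */
size_t buf_len(_Atomic size_t *len) {
    return atomic_load_explicit(len, memory_order_acquire);
}
```

From the IL side this is then just an ordinary call instruction, at the cost of one function call per atomic operation.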
I'm developing the C2 language (c2lang.org). For back-ends there are currently 3 choices I'm aware of:
I currently have backends for C and QBE. QBE is not a final option, but would be a stepping stone towards LLVM. I know LLVM a bit and did some commits on Clang in the past. One goal of C2 is to have fast compile times. So you can see my problem. QBE is nice but very simple (maybe too simple). LLVM is big/huge/bloated/x Million lines of code. What I'm looking for is the sweet spot between them. So I am looking into option 4: writing your own backend.
The idea is to write a back-end that:
Ideas so far:
Practical choices I run into: (essentially they boil down to how much info to put in the IR)
I'm interested in hearing your ideas…
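Since the practical choices boil down to how much information to put in the IR, here is one hypothetical shape of a minimal three-address IR in C, just to make the trade-off concrete (every name is illustrative, not C2's actual design):

```c
/* Hypothetical minimal IR node for a hand-written backend. The ty
   field marks the design choice: keep explicit types (QBE-style) and
   instruction selection stays local; drop them and the backend must
   re-derive type information later. */
typedef enum { TY_W, TY_L, TY_S, TY_D } Type;   /* i32, i64, f32, f64 */
typedef enum { OP_ADD, OP_MUL, OP_LOAD, OP_STORE, OP_CALL, OP_RET } Op;

typedef struct Inst {
    Op   op;
    Type ty;            /* result type                              */
    int  dst;           /* virtual register                         */
    int  src0, src1;    /* operand virtual registers                */
    struct Inst *next;  /* straight-line chain within a basic block */
} Inst;
```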
One of the advantages I see in ML model compilers is that they're portable (e.g. a compiler can generate hardware-specific instructions/kernels/whatever), which is pretty useful for model deployment (e.g. compile the model and target it at web, mobile, server side, etc.).
My question, though, is: there are existing runtimes (e.g. ONNX Runtime, TFLite) which support a bunch of different platforms too. What's the sell with an ML compiler? Is it that it can generate more optimal code and improve performance compared to existing runtimes? Or is it more that integrating with <insert next new vendor chip> is easier to do with a compiler than by building a new "execution provider" (in ONNX Runtime terms) for a model runtime?
It's an old discussion and I've always been in favor of solutions with different return types, especially when a programming language like Haskell or Rust offers sum types. But after thinking about it carefully, I have to say that exceptions are the more sensible solution in many cases:
Let's assume a program reads a file. If the specified file path is correct, a valid file descriptor is received; otherwise some alternative value indicating an error is returned. For the sake of simplicity – and this is the logical error – it is only checked at this point whether a file descriptor was actually returned. If so, the file handle is passed to other functions one after the other to perform operations on that file. But what happens if the file is suddenly deleted in the meantime? The program still assumes that once a valid file descriptor with appropriate rights was returned, nothing else can happen – but when it comes to interactions "with the world", something can ALWAYS happen AT ANY TIME. Therefore, before every further operation on the file, you would have to check whether the file still exists, or guard against other sources of error (and here alone there are probably many subtle OS-specific behaviors that you cannot, or do not want to, account for across the board). Hence, wouldn't it be better to simply handle all the errors you care about in one central place, for an entire block of code that works with the file, rather than laboriously dealing with individual return values?
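To make that concrete, here is the "check after every step" style in plain C (my sketch, not the author's): every operation on the file gets its own error path, because the file can disappear between any two calls, and that per-call bookkeeping is exactly what a single central handler would absorb.

```c
#include <stdio.h>

/* Sum the bytes of a file, checking every step: open, each read, and
   even close can fail independently. */
int sum_file(const char *path, long *out) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;                        /* opening can fail */

    char buf[4096];
    long total = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        for (size_t i = 0; i < n; i++)
            total += (unsigned char)buf[i];

    if (ferror(f)) { fclose(f); return -1; }  /* reads can fail mid-stream */
    if (fclose(f) != 0) return -1;            /* closing can fail too */
    *out = total;
    return 0;
}
```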
In addition, multiple return types make the signatures of functions unnecessarily complex.
I think I've now been converted to a new faith… lol
BUT I think exceptions should be clearly limited to errors that have a temporal component, i.e. where you are working with something that is used over a certain period of time, during which unknown external factors can change and cause errors. In my opinion, one-off events such as incorrect user input are not a reason to immediately throw an exception; they should BASICALLY be caught by strict input validation, with alternative values as return values if necessary (Option, Maybe, etc.). Accordingly, something like a database connection is again a clear case for exceptions, because it is assumed to be stable and working over a PERIOD of TIME. Even if you only connect to a DB to run a single query and then immediately close the connection, the connection could – however unlikely – break down in exactly that fraction of a millisecond between the opening and reading operations, for x-many reasons.
At this point I am now interested in how C++ actually implements its exceptions, especially since all the OS functions are programmed in C?!
—
After thinking about it again, I could imagine that instead of exceptions, all IO operations return a variant type (similar to Either in Haskell); or, even simpler: special IO-heavy types like "File" contain, in addition to the file-descriptor case, further variants representing errors, and every operation that accepts a "File" has to take all these variants into account. For example: if the argument is already anything other than a valid file descriptor, do nothing and just pass it on; otherwise perform the operation, and if it fails, pass that failure on as well. It wouldn't make sense to consider a "File" type without the possibility of errors anyway, so why define unnecessarily complicated extra error types and combine them with "Either" when the "File" type can already contain them? And with a handy syntax for pattern matching, it would be quite clear. You could even have the compiler add missing alternative branches, simply assuming an identity mapping.
This approach seems to me cleaner than exceptions, more functional and compatible with C.
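A rough C rendering of that idea (the type, its variants, and all names are mine, purely to illustrate): the "File" type carries its own error cases, and every operation passes failures through unchanged.

```c
#include <errno.h>
#include <unistd.h>

/* "File" is a sum type whose error cases live inside it. */
typedef enum { FILE_OK, FILE_GONE, FILE_IO_ERROR } FileTag;
typedef struct { FileTag tag; int fd; } File;

/* Every operation takes and returns a File. Error variants flow
   through untouched -- the identity-mapping branches the compiler
   could even generate for you. */
File file_read_byte(File f, unsigned char *out) {
    if (f.tag != FILE_OK) return f;          /* just pass the failure on */
    if (read(f.fd, out, 1) < 0)
        f.tag = (errno == EBADF) ? FILE_GONE : FILE_IO_ERROR;
    return f;
}
```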
I have never done anything like this and I would like to know how hard it would be to modify an existing C compiler and add a try-catch to C. I wanted to modify Clang, but it's a big project without much documentation, so I chose something a lot smaller, like Tiny C.
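Before patching TinyCC, it may help to see what the generated code must amount to: try/catch in C is conventionally lowered onto setjmp/longjmp, and a compiler extension essentially automates that lowering. A minimal sketch (macro and variable names are mine):

```c
#include <setjmp.h>
#include <stdio.h>

/* One global handler for brevity; real generated code would keep a
   stack of jmp_bufs so that try blocks can nest. */
static jmp_buf handler;
static int exc_code;

#define TRY       if (setjmp(handler) == 0)
#define CATCH     else
#define THROW(c)  (exc_code = (c), longjmp(handler, 1))

static void risky(int x) {
    if (x < 0) THROW(42);        /* unwinds straight back to the TRY */
    printf("ok: %d\n", x);
}

int main(void) {
    TRY {
        risky(1);
        risky(-1);               /* never returns normally */
    } CATCH {
        printf("caught exception %d\n", exc_code);
    }
    return 0;
}
```

Most of the real compiler work is in what these macros can't do: nesting handlers and running cleanups for locals between the throw point and the handler.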
Is it possible to build a compiler from scratch in Python? And if so, can anyone tell me how to do it, because I have an assignment at university 😭
The exe is in PE format.
Based on my initial research, it seems that a bit in the PE header needs to be set. Apart from that, I need an "exports" section, and possibly a "relocation" section too.
Are there any more aspects to consider?
I have the addresses of the functions and their names.
I would like to create an exports section in the exe file; I have no idea regarding the "relocation" section.
Any tips on how to generate that would be appreciated.
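For orientation, here is the export directory layout in C, mirroring IMAGE_EXPORT_DIRECTORY from winnt.h (the field layout is the real one; the comments and the ExportDirectory name are mine). The "exports" section is this header plus three parallel arrays, and data-directory entry 0 of the optional header must point at it. Exports alone don't require a relocation section; relocations only matter if the image may be loaded away from its preferred base address.

```c
#include <stdint.h>

/* PE export directory header (layout of IMAGE_EXPORT_DIRECTORY).
   All "RVA" fields are offsets from the image base after loading. */
typedef struct {
    uint32_t Characteristics;        /* reserved, write 0                 */
    uint32_t TimeDateStamp;
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t Name;                   /* RVA of the module's name string   */
    uint32_t Base;                   /* ordinal base, normally 1          */
    uint32_t NumberOfFunctions;      /* entries in AddressOfFunctions     */
    uint32_t NumberOfNames;          /* entries in the two name tables    */
    uint32_t AddressOfFunctions;     /* RVA of uint32_t[]: function RVAs  */
    uint32_t AddressOfNames;         /* RVA of uint32_t[]: name RVAs,
                                        sorted so the loader can bsearch  */
    uint32_t AddressOfNameOrdinals;  /* RVA of uint16_t[]: index into
                                        AddressOfFunctions for each name  */
} ExportDirectory;
```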
Suppose I want to compile a .c file. I will use a compiler to do it so that the CPU understands the program and can run it. But since the compiler itself is a program, it also has to be run by the CPU – so who compiles the compiler and generates machine code for it?
I don't know if my question makes sense; I'm just trying to understand things from a logical point of view.
Hi folks,
Could someone with direct/indirect experience implementing a print or print-like op for custom hardware share a rough implementation outline?
As mentioned above, the question is grounded in the AI domain, and unsurprisingly the things I am interested in printing are tensors. I'm interested in surveying existing approaches for printing tensors that may be partitioned across the memory hierarchy, without significantly changing the compute graph or introducing expensive "collective" operations.
P.S. - Perhaps even CPUs with a cache hierarchy run into similar challenges while printing a value. Any relevant insights here would be appreciated.
Hello, I have been contributing to LLVM since early this year. I have about 25 PRs merged. Some of these PRs are non-trivial, even in the judgement of a senior engineer at Google who has seen my work.
I landed an interview at Apple for a compiler role, and an Amazon AWS recruiter reached out because of my LLVM experience. I failed both of those interviews.
I'm looking for my first job in the industry, transitioning from a different one.
Just any tips, if you have them, as to how to land a compiler job. I'm from the US.
Should I focus solely on compilers? I also know web backend dev but I have only landed interviews for compiler roles. Thanks
I (20M) seriously want to get into compiler design. I'm an undergraduate student who has worked on app-development projects before. I took a few classes like compiler design and theory of computation this summer and was really fascinated. I'm in my 3rd year and would love to learn about compilers and their architecture. Someone directed me to delve deeper into LLVM and the x86 architecture. I feel lost in the vastness of the subject and would greatly appreciate it if someone could point me in the right direction. I want to go way past toy compilers and actually make significant contributions.
Also, is the ambition of writing a research paper on compiler design before I graduate a far-fetched goal? Is it feasible?
Hello, I'm a uni student in Vietnam. Our group has a project, and we decided to make a C# web compiler. This is new to us, so if you guys have some beginner-friendly resources, please leave them in the comments for me, thanks. We just need the steps to make it. We're using .NET Core (recommend what we should use for the front end if you can, thanks).
I’ve been a compiler engineer for a couple of years and have recently started exploring the job market for similar positions. On LinkedIn, I’ve noticed that compiler positions tend to have a disproportionately high number of applicants (relative to other software jobs).
I have also seen posts/comments here indicating that there tend to be fewer compiler positions and lots of applicants.
It is easy to believe there are fewer compiler engineering jobs than, say, web development jobs, but I assumed the applicant pool would reflect this.
Has anyone else noticed an imbalance or am I psyching myself out?
Edit: the post’s purpose isn’t to learn how to differentiate myself but more to gauge the job market based on experiences other than my own.
Hi there!
I came across Claude AI recently, and I was wondering which one is better for C++ compiler-related questions and university tasks: Claude AI or ChatGPT-4?
I'm an undergraduate student trying to dive into the compiler world, but going through this book with little experience in functional programming seems tough. Though I understand the theory, what's tough for me is the exercises, so I was wondering: is it normal to do all the exercises in the book thoroughly, or is it sufficient to look at the source code of the implementation and understand it? Thanks for all your replies in advance!
I'm building a compiler that uses LLVM. It builds an LLVM module for each source file, but I'm wondering what I should do after that point if I want to support incremental compilation (and maybe even incremental linking). I can think of two options:
Option 1: Cache LLVM bitcode files
Write LLVM modules as LLVM bitcode files to a "module cache" directory, then link those into one big LLVM module, then output that module as an object file and link it with the system linker or LLD.
Option 2: Cache object files
Write LLVM modules as object files to a "module cache" directory, then link those object files using the system linker or LLD.
What are the tradeoffs? I'd guess that maybe option 1 gives better link-time optimization, but maybe there's no real difference.
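To see the two options side by side, here is a sketch against the LLVM-C API (LLVMWriteBitcodeToFile and LLVMTargetMachineEmitToFile are real LLVM-C entry points; creation of the module and target machine is assumed to happen elsewhere):

```c
#include <llvm-c/BitWriter.h>
#include <llvm-c/Core.h>
#include <llvm-c/TargetMachine.h>

/* Persist one module into the cache in either format. */
int cache_module(LLVMModuleRef mod, LLVMTargetMachineRef tm,
                 int as_bitcode, const char *path) {
    if (as_bitcode) {
        /* Option 1: bitcode is still IR, so a later link-all-modules +
           optimize + codegen step can optimize across file boundaries. */
        return LLVMWriteBitcodeToFile(mod, path);
    }
    /* Option 2: lower to an object file now; relinking is cheap, but
       cross-module (LTO-style) optimization is off the table. */
    char *err = NULL;
    if (LLVMTargetMachineEmitToFile(tm, mod, (char *)path,
                                    LLVMObjectFile, &err)) {
        LLVMDisposeMessage(err);
        return 1;
    }
    return 0;
}
```

One caveat: option 1 only buys better code if you actually re-run optimization passes on the merged module; otherwise you mostly just pay codegen for every module on every rebuild, which option 2 caches.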
Hello Reddit
I read this post today: Understanding How Graal Works - a Java JIT Compiler Written in Java
It describes how Graal's JIT compiler works.
In short: the compiler takes a byte array of JVM bytecode and returns a byte array of machine code, via the JVM compiler interface (JVMCI).
I am now wondering how GraalVM loads this byte array of machine code into memory so that it is executable.
Here are some thoughts that come to mind:
I would try to allocate memory from the OS and store the contents of the array there; furthermore, this area should be executable. Then I would need to get a pointer to that address in order to be able to execute this native method.
But how is this possible from within Java? Do you use the JNI interface or something like sun.misc.Unsafe?
I would love to understand how to load native code into memory and execute it from within a Java program.
Many thanks
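Regarding the allocate-and-mark-executable idea in the question above: sketched in C (POSIX mmap/mprotect; Windows would use VirtualAlloc/VirtualProtect), this is roughly what happens at the lowest level:

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef long (*fn_t)(long);

/* Copy emitted machine code into memory and hand back a callable
   pointer. Map writable first, then flip to read+execute (W^X). */
fn_t load_code(const unsigned char *code, size_t len) {
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0)
        return NULL;
    return (fn_t)mem;
}
```

As I understand it, a JVMCI compiler like Graal never does this itself from Java: it hands the finished machine code back through the JVMCI API, and the VM installs it into HotSpot's code cache, so the compiler author needs neither JNI nor Unsafe.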
I just wrote my first compiler. The results are surprisingly good: compiling a high-level pragmatic-but-minimalistic ML dialect down to AArch64 asm faster than any of my usual compilers and generating faster code than any of my usual compilers (including Clang -O2). And my compiler is only ~4kLOC of OCaml code!
The main difference between my compiler and what I consider to be "conventional" compilers is that I almost entirely shunned graphs in favor of trees because they are simpler, particularly because I can manipulate trees easily using pattern matching in OCaml (the language my compiler is written in).
In particular, I don't do graph coloring for register allocation. I don't really have basic blocks in the usual sense: I have expression trees composed of calls, "if" with three subtrees, and "return". I don't have phi nodes: I use tail calls instead. This simplifies the compiler because it pushes phi nodes and function calls through the same machinery.
This approach appears to be called "blocks-with-arguments". Although I've been reading about compilers for years and working intensively on my own for two years, I only heard this term for the first time recently.
I do minimal optimisation. I think every compiler phase is almost linear time (one is technically O(n log n), but that's practically the same). Nowhere do I sit in a loop rewriting terms. Hence the fast compilation times. And I'm doing whole-program compilation with full monomorphization. The most extreme case I've found was a 10-line "det4" function where another compiler took ~1 sec to compile it vs mine taking ~1 µs.
Given the success I'm having, I don't understand why lots of other compilers aren't built using this approach. Is it simply not well known? Do people not expect to get good results this way?
In particular, the approach I've taken to register allocation is closer to compile-time garbage collection than anything else I've seen. Function arguments appear in x0.. and d0... Every linear operation is a function call that consumes and produces registers. At consumption, dead registers are "freed"; produced registers are "allocated". Across a non-tail call, live variables in parameter registers are evacuated into callee-saved registers. At any call or return, registers are shuffled into place using a traditional parallel move. At an "if", the map of the register file is simply duplicated for the two sub-blocks. This is simpler than linear scan!
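For readers who haven't met the "traditional parallel move": the task is to perform dst[i] <- src[i] for all i simultaneously using ordinary moves, which means ordering the moves so nothing is clobbered early and breaking cycles with a scratch register. A minimal sketch in C (the author's compiler is OCaml; the register numbering, TMP, and emit are illustrative):

```c
#include <stdio.h>

enum { N = 2, TMP = 16 };            /* TMP: an assumed-free scratch reg */
enum { TODO, DOING, DONE };

static int src[N] = {0, 1};          /* the parallel move r1 <- r0       */
static int dst[N] = {1, 0};          /*                   r0 <- r1       */
static int status[N];                /* zero-initialized: all TODO       */

static void emit(int d, int s) { printf("mov r%d, r%d\n", d, s); }

/* Emit move i, first emitting every move that still reads dst[i].
   Meeting a DOING move means we closed a cycle: park it in TMP. */
static void move_one(int i) {
    if (src[i] == dst[i]) { status[i] = DONE; return; }
    status[i] = DOING;
    for (int j = 0; j < N; j++) {
        if (src[j] != dst[i]) continue;   /* j doesn't read our target */
        if (status[j] == TODO) move_one(j);
        else if (status[j] == DOING) {    /* cycle back to an ancestor */
            emit(TMP, src[j]);
            src[j] = TMP;
        }
    }
    emit(dst[i], src[i]);
    status[i] = DONE;
}

int main(void) {
    for (int i = 0; i < N; i++)
        if (status[i] == TODO) move_one(i);
    return 0;    /* prints: mov r16, r0 / mov r0, r1 / mov r1, r16 */
}
```

Since each register is the destination of at most one move, a single scratch register is always enough.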
I hope you all enjoy it and check it out. In the first video (https://youtu.be/LvAMpVxLUHw?si=B4z-0sInfueeLQ3k) I give some channel background and talk a bit about my personal journey into compilers. In the future, we will talk about frontend analysis and IR generation, as well as many other topics in low-level computer science.