/r/Compilers
This subreddit is all about the theory and development of compilers.
Hello everyone,
I've always loved stories of programmers from the past using various tricks to make games run on inadequate hardware. You could recreate that feeling by writing ROMs for retro systems, but that is certainly not easy to get into. So I made my own "virtual computer", SVC16. This is certainly not an original idea, but I found it very fun to write a simple game for it. So if you would like to write a simple compiler but don't want to deal with the complicated reality of a retro system, this might be something for you.
Hello, I just finished writing a mem2reg pass in my toy compiler, and the next step for me is lowering phi nodes to machine IR. Obviously there is no concept of a phi instruction in a real ISA, so I am currently wondering how to proceed. I remember reading somewhere that one way of doing it is inserting move instructions at the end of the predecessor blocks, but if anyone can fill in more details, share tips, or point me to resources for this particular problem, that would be great. Thanks.
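In case a sketch helps while you look for resources: below is a minimal version of that move-insertion idea over a hypothetical toy IR (none of these types come from any real framework). Two caveats the one-line description hides: critical edges generally need to be split first, and the moves feeding one block's phis are semantically parallel copies, so placing them one at a time can clobber values (the classic "lost copy" and "swap" problems).

```
// Naive phi elimination over a hypothetical toy IR (all types are
// stand-ins). For every phi, a move is placed just before the
// terminator of the matching predecessor, then the phi is dropped.
// Assumes critical edges were already split and the copies into one
// block don't interfere with each other.

#[derive(Clone, Copy, PartialEq)]
struct BlockId(usize);
#[derive(Clone, Copy)]
struct Reg(usize);

enum Inst {
    Phi { dest: Reg, incoming: Vec<(BlockId, Reg)> },
    Move { dest: Reg, src: Reg },
    Branch { target: BlockId }, // stand-in terminator
}

struct Block {
    id: BlockId,
    insts: Vec<Inst>, // invariant: last instruction is the terminator
}

fn lower_phis(blocks: &mut [Block]) {
    let mut pending: Vec<(BlockId, Inst)> = Vec::new();
    for block in blocks.iter_mut() {
        block.insts.retain(|inst| match inst {
            Inst::Phi { dest, incoming } => {
                // One move per incoming edge, queued for the predecessor.
                for (pred, src) in incoming {
                    pending.push((*pred, Inst::Move { dest: *dest, src: *src }));
                }
                false // drop the phi itself
            }
            _ => true,
        });
    }
    for (pred, mov) in pending {
        let block = blocks.iter_mut().find(|b| b.id == pred).unwrap();
        let before_terminator = block.insts.len() - 1;
        block.insts.insert(before_terminator, mov);
    }
}

fn main() {
    let mut blocks: Vec<Block> = Vec::new(); // fill with real blocks in practice
    lower_phis(&mut blocks);
}
```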
Nuno Lopes added an undefined behavior manual for LLVM: https://llvm.org/docs/UndefinedBehavior.html
I'll preface this by saying that I am actually interested in this and will probably read the book more thoroughly, or at least find a resource that suits me better, this summer.
So: I just failed a compiler design test. It was mostly about parsing theory, and despite having a general idea of it, whenever I pick something specific and look at it in more detail I find myself almost completely lost and don't know what to do.
This book is listed as the sole reference by my professor. I do have a general idea about lexers, and I was wondering whether it's a good idea to start with the syntax analysis chapter directly, given that I have taken the course and have fewer ambiguities regarding the material that comes before the syntax chapter, or whether the book is one of those that keeps referencing previous chapters to the point where it's impossible to follow along without having read them.
I have an exam in 3 weeks. Should I start with the syntax analysis chapter or from the beginning? Thanks in advance for answering!
I am implementing SSA as part of my attempt to understand and document various compiler optimization techniques. This work is part of the EeZee language project at https://github.com/CompilerProgramming/ez-lang.
I am using the descriptions of the algorithm in the following:
But the descriptions leave various details out. Example:
I am also looking at how other people implemented this, but the implementations are usually complicated enough that it is hard to relate them back to the descriptions above.
Does anyone know of a source with a description that is not so sparse on details?
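In the meantime, the detail most write-ups skim, in my experience, is the renaming walk over the dominator tree, in particular when to pop the version stacks. Below is a rough sketch of just that bookkeeping; every name is a hypothetical stand-in, not code from any of the sources above.

```
// Bookkeeping for Cytron-style renaming, assuming phis are already
// placed at dominance frontiers. All names are hypothetical stand-ins.
use std::collections::HashMap;

type Var = String;

#[derive(Default)]
struct Renamer {
    counter: HashMap<Var, usize>,    // next fresh version per variable
    stack: HashMap<Var, Vec<usize>>, // current reaching def per variable
}

impl Renamer {
    // Called for every definition (including phi destinations).
    fn push_def(&mut self, v: &Var) -> usize {
        let n = self.counter.entry(v.clone()).or_insert(0);
        let version = *n;
        *n += 1;
        self.stack.entry(v.clone()).or_default().push(version);
        version
    }

    // Called for every use: the top of the stack is the reaching def.
    fn current(&self, v: &Var) -> Option<usize> {
        self.stack.get(v).and_then(|s| s.last().copied())
    }

    // Called when leaving a block during the dominator-tree walk; a block
    // must pop exactly the versions it pushed. Doing this per CFG edge
    // instead of per dominator-tree node is the classic mistake.
    fn pop_defs(&mut self, defs_in_block: &[Var]) {
        for v in defs_in_block {
            self.stack.get_mut(v).unwrap().pop();
        }
    }
}

fn main() {
    let mut r = Renamer::default();
    let x: Var = "x".into();
    let v0 = r.push_def(&x);
    assert_eq!(r.current(&x), Some(v0));
    r.pop_defs(std::slice::from_ref(&x)); // leaving the defining block
    assert_eq!(r.current(&x), None);
}
```

The walk itself is then: rename the block's phi destinations, rewrite its instructions (uses via current, defs via push_def), fill in this block's slot in each CFG successor's phis, recurse into dominator-tree children, and finally pop.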
Hello,
How do I use Rust to create a compiler with LLVM? I know LLVM itself is written in C++, but I want to use Rust for its memory safety, and also because C++ is hard to work with. I searched for LLVM bindings for Rust and found a few:
llvm-ir, which "is intended for consumption of LLVM IR, and not necessarily production of LLVM IR (yet)."
I think Inkwell is the best for me, but I'm not sure where or how to begin. Please guide me.
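In case a concrete starting point helps: here is a minimal sketch in the shape of inkwell's README example, building an i64 sum function and printing the module IR. Treat it as a sketch rather than a reference: the exact builder signatures vary across inkwell versions (recent ones return Result, hence the unwraps), so check the docs for whichever version you pin.

```
// Minimal inkwell starting point: emit `sum(a, b) -> i64` and print the IR.
// Method signatures differ between inkwell versions; adjust as needed.
use inkwell::context::Context;

fn main() {
    let context = Context::create();
    let module = context.create_module("demo");
    let builder = context.create_builder();

    let i64_type = context.i64_type();
    let fn_type = i64_type.fn_type(&[i64_type.into(), i64_type.into()], false);
    let function = module.add_function("sum", fn_type, None);
    let entry = context.append_basic_block(function, "entry");
    builder.position_at_end(entry);

    let a = function.get_nth_param(0).unwrap().into_int_value();
    let b = function.get_nth_param(1).unwrap().into_int_value();
    // Recent inkwell versions return Result from the build_* methods.
    let sum = builder.build_int_add(a, b, "sum").unwrap();
    builder.build_return(Some(&sum)).unwrap();

    println!("{}", module.print_to_string().to_string());
}
```

From there, the usual path is to build your AST-to-IR lowering on top of the builder, then either JIT the module through an execution engine or write the IR out and compile it separately.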
I have posted the same post on r/rust also. (https://www.reddit.com/r/rust/comments/1hbous3/rust_llvm_bindings/)
Thanks!
I am working on creating a resource for compiler engineers. The goal is to cover the basics of compiler implementation, including optimization techniques.
Please have a look!
I welcome all feedback!
Regards
I'm having an infuriating time trying to get the LLVM-C bindings working for use in my compiler. I'm ABOUT to just go get an FFI lib and do this the hard way.
I'm on Windows, so everything is 10x harder here, but I'm trying to build LLVM from source so that I can (in theory) get the header files I need for development. None of this is documented very well.
I've spent 3 days or so attempting build after build, only to run into different issues every time (no disk space, no memory, a command that just didn't do anything, a successful build with no header files, etc.). I've given up on other build solutions and am currently trying `msbuild` through VS.
Does anyone on here have sufficient experience with this particular nightmare to be able to help me get to the point where I can just use the fing headers?
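For whatever it's worth, the generic upstream recipe below has worked for me on Windows; the install prefix is an arbitrary path I picked, and none of this is guaranteed to fix your specific failures. The detail that explains the "successful build with no header files" symptom: a plain build leaves the headers split between the source tree and the build tree (some are generated), and it is the install target that gathers all of them, including llvm-c/, under the prefix. Restricting LLVM_TARGETS_TO_BUILD also cuts the disk and memory cost a lot. Run from the root of the llvm-project checkout:

```
cmake -S llvm -B build -G "Visual Studio 17 2022" -A x64 ^
    -DLLVM_TARGETS_TO_BUILD=X86 ^
    -DCMAKE_INSTALL_PREFIX=C:\llvm-install
cmake --build build --config Release --target install
```

After that, point your compiler at C:\llvm-install\include and link against C:\llvm-install\lib (again, a prefix of my choosing; use your own).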
I'm trying to compile Armor Paint. It is open-source software that is offered for free (if you can compile it) or already compiled for $20. I don't have the $20, so I'm trying to compile it, but the instructions are a bit lacking. (I'm not a programmer.)
This is the source for the program.
https://github.com/armory3d/armortools/tree/main/armorpaint
Can someone help me with the correct steps to compile the program properly?
I was wondering whether anyone can recommend a good reference for this. I am aware of these:
The book Engineering a Compiler describes ILOC, but this does not cover functions.
BRIL is another example of a register-based instruction set.
The Dalvik instruction set for the Android runtime is a production instruction set.
Lua has a register-based VM, but its implementation is complicated by Lua's semantics: being a dynamic language, it is not always known at compile time how many return values a function call will produce.
V8 appears to use a register-based interpreter.
Note: this is a re-post, as my first post appears to have been filtered by Reddit; I do not know why. I am particularly interested in how function calling sequences are represented.
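On the calling-sequence point, Lua 5.x is the best-documented example I know of (see the paper "The Implementation of Lua 5.0"). Here is a sketch of the idea with a simplified encoding of my own, not Lua's exact operand scheme:

```
// Sketch of a Lua-5-style register calling sequence. Registers are slots
// in the current frame; the callee's frame overlaps the caller's registers
// starting at `a`, so arguments need no copying beyond the setup moves.
enum Op {
    Move { dst: u8, src: u8 },
    // r[a] holds the callee, r[a+1 ..= a+nargs] the arguments;
    // after the call, results land in r[a], r[a+1], ...
    Call { a: u8, nargs: u8, nresults: u8 },
    Return { first: u8, count: u8 },
}

// x = f(y, z) might compile to:
//
//   MOVE r3, r0   ; callee f into a fresh register window
//   MOVE r4, r1   ; arguments in the slots right after it
//   MOVE r5, r2
//   CALL a=3, nargs=2, nresults=1
//   MOVE r6, r3   ; result back into x's register
fn main() {
    let _seq = [
        Op::Move { dst: 3, src: 0 },
        Op::Move { dst: 4, src: 1 },
        Op::Move { dst: 5, src: 2 },
        Op::Call { a: 3, nargs: 2, nresults: 1 },
        Op::Move { dst: 6, src: 3 },
    ];
}
```

Real Lua additionally encodes "variable number of arguments/results" with reserved operand values, which is exactly the dynamic-count wrinkle mentioned above.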
After a certain amount of effort, I have designed the basic structure of my compiler and finally implemented the lexer, including a workable approach to error messages.
I also dared to upload the project to GitHub for your critical assessment:
https://github.com/thyringer/zuse
Under Docs you can also see a few screenshots from the console showing the results, such as the processed lines of code and the tokens. It was a bit tricky to find a usable format here that makes the data clearly visible for testing.
I have to admit, it was quite challenging for me, so I felt compelled to break the lexer down into individual subtasks: a "linearizer" that first breaks the source code, read in as a string, into individual lines, while determining the indentation depth and removing all non-documenting comments.
This "linearized code" is then passed to the "prelexer", which breaks each line of code down into tokens based on whitespace and on "clinging" punctuation marks such as `.` or `(`, but also certain operators like `/`. At the same time, reserved symbols like keywords and obvious things like strings are recognized. In the last step, these "pretokenized lines" are analyzed by the lexer proper, which determines the tokens that have not yet been categorized, provided no lexical errors occur; otherwise the "faulty code" is returned: the previously linearized and tokenized code, together with all errors, which can then be output.
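(For readers who want the shape of this in code: a rough sketch of the three stages just described, with hypothetical names that are not the actual zuse types; only the first stage is filled in.)

```
// Rough sketch of the linearizer -> prelexer -> lexer pipeline described
// above. All names are hypothetical stand-ins, not the actual zuse types.
struct Line { indent: usize, text: String }
struct PreToken { line: usize, text: String }
enum Token { Keyword(String), Ident(String), Str(String), Op(String) }
struct LexError { line: usize, message: String }

// Stage 1: split into lines and measure indentation (comment stripping
// would happen here too).
fn linearize(source: &str) -> Vec<Line> {
    source.lines().map(|raw| {
        let trimmed = raw.trim_start();
        Line { indent: raw.len() - trimmed.len(), text: trimmed.to_string() }
    }).collect()
}

// Stage 2: split each line on whitespace and "clinging" punctuation.
fn prelex(lines: &[Line]) -> Vec<PreToken> { unimplemented!() }

// Stage 3: categorize what is left, or return every error at once.
fn lex(pre: &[PreToken]) -> Result<Vec<Token>, Vec<LexError>> { unimplemented!() }

fn main() {
    for line in linearize("  let x = 1") {
        println!("{:>2} | {}", line.indent, line.text);
    }
}
```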
—
I had often read here that lexers and parsers are not important, just something you have to get done quickly somehow in order to get to the main thing. But I have to say, writing a lexer myself made me think intensively about the entire lexical structure of my language, which resulted in some simplifications that make the language easier to process. I see this as quite positive, because it allows for a more efficient compiler and also makes the language more understandable for the programmer. Ultimately, it forced me to leave out things that initially look "nice to have" on the drawing board but later become a nuisance to implement, to the point where you ask yourself: is this really that useful, or can it be left out?! :D
The next step will be the parser, but I'm still thinking about how best to do this. I'll probably store all the declarations in an array, one after the other, with name, type, and bound expression, or subordinate declarations. This time I won't do everything at once; I'll first implement only one kind of declaration and then try to build a complete, rudimentary pipeline up to the C emitter, in order to get a feeling for what information I actually need from the parser and how the data should best be structured. My goal here is to keep the compiler as simple as possible and to find an internal graph structure that can easily be translated directly.
Has anyone worked at either of these places as a compiler engineer? I would really love to talk to you to help me make a decision.
I just finished my Masters in Computer Science. I applied for various compiler engineer positions and received these offers:
+ Working with AI accelerators seems fun
+ Architecture is unique so there will be many exciting problems
- Annapurna Labs is owned by Amazon and Amazon culture doesn't have the best reputation
I was determined to take this offer until a former intern told me that all the exciting work is in the middle end, and that the back-end and front-end teams mostly do routine tasks.
+ This team implements ML compilers using an MLIR-like dialect
+ Work seems somewhat interesting
+ Friendly Team
Other concerns: I strongly prefer California weather and culture. My partner also has a job offer in the Bay Area.
Are there any pros and cons of working at these places? Which role might have better future prospects?
I have an upcoming interview for a GPU Compiler Engineer position at Qualcomm, and I was wondering how I should spend my time prepping for it. Should I spend more time reviewing compiler topics (which I'm more comfortable with) or GPU topics (which I'm not too comfortable with, though I have a pretty good high-level understanding)? I'd appreciate any advice, or topics that I should specifically study. I'm also wondering what the hiring process is like at Qualcomm. Here's the job description: https://www.linkedin.com/jobs/view/4078348944/
(I hope it's OK to post this here; others have done it before me, so I'm assuming yes.)
Our team works on the JIT compiler in the HotSpot JVM in OpenJDK. We mostly write C++, some assembly, and Java.
The job includes bug fixing and performance improvements.
Personally, I'm working on auto-vectorization, but there are many other projects (e.g. Valhalla).
Feel free to apply directly or send me a PM. If you are interested in learning more, or want to contribute to this open-source project in your free time to level up your skills, you are also welcome to contact me.
Update: no internships currently, sorry :/
Here's the official job listing: https://careers.oracle.com/jobs/#en/sites/jobsearch/requisitions/preview/269290/?keyword=JVM+%2F+Compiler+Software+Engineer&lastSelectedFacet=locations&location=Switzerland&locationId=300000000106764&locationLevel=country&mode=location&selectedLocationsFacet=300000000106764
I’m looking for a better way to have optional tokens in the grammar of a toy compiler I’m playing with. This simplified example illustrates my issue. Suppose a definition contains an optional storage class, a type, and an identifier – something along the lines of:
sclass : STATIC
| GLOBAL
;
type : INT
| FLOAT
;
def : sclass type ident
| type ident
;
Most of the semantic behavior is common between the two derivations of def – for example, the error handling if ident is already defined. In a more complicated grammar, supporting variable initialization and such, the amount of logic shared between the two cases is much larger. I’d like a single rule for reducing def, so that I can avoid a large amount of duplicated code between the cases.
If I allow an empty match within sclass, as below, def is simplified, but this causes conflicts. I only want to match the empty rule if the following token is not a storage class. Except in an error case, the following token should always be a type.
sclass :
| STATIC
| GLOBAL
;
def : sclass type ident
;
Is there a way to specify this, or am I forced to keep the two very similar derivations with duplicated code?
Thanks for any suggestions.
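Two common ways out, sketched in the same notation as above (SC_NONE and declare() are hypothetical names of mine). The first is to make sclass nullable but have it carry a value, so def keeps a single production with one shared action. In an LALR(1) grammar this particular fragment should be conflict-free, since the parser can decide by lookahead whether to reduce the empty sclass; whether it stays conflict-free depends on what else in your full grammar can start like a storage class.

sclass : /* empty */  { $$ = SC_NONE; }
       | STATIC       { $$ = SC_STATIC; }
       | GLOBAL       { $$ = SC_GLOBAL; }
       ;

def : sclass type ident   { declare($1, $2, $3); }
    ;

If the nullable rule does still conflict in context, the usual fallback is to keep both def productions but funnel their actions through one helper, so the duplication shrinks to a single line per production:

def : sclass type ident   { declare($1, $2, $3); }
    | type ident          { declare(SC_NONE, $1, $2); }
    ;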
Last year I spent a few months experimenting with and contributing to various compilers. I had great fun, but felt that the developer experience could be better. The build systems were often hard to use, and the tooling was often complex enough that "jump to definition" didn't work. That's why I started writing a new compiler framework a few months ago. It's essentially written for my former self: when I started with compilers, I wanted a tool that was easy to build and (reasonably) easy to understand.
It's called xrcf (https://xrcf.org). Currently, the basic MLIR constructs are implemented, plus a few lowerings from MLIR to LLVM IR. As my near-term goal, I'm working on getting a fully functional Arnold Schwarzenegger compiler working (demo available at https://xrcf.org/blog/basic-arnoldc/). That means lowering from ArnoldC to MLIR, then to the LLVM dialect, and finally to LLVM IR. Longer term, I'm thinking about providing GPU support for ArnoldC. Is that crazy, given that ArnoldC isn't really a productive language? Yes, but it's a fun way to kickstart the project and make it usable for other languages.
So if you are thinking about building a new language, take a look at xrcf. I'll happily prioritize feature requests for people who are using the framework.