/r/ProgrammingLanguages
This subreddit is dedicated to the theory, design and implementation of programming languages.
Be nice to each other. Flame wars and rants are not welcome. Please also put some effort into your post; this isn't Quora.
This subreddit is not the right place to ask questions such as "What language should I use for X", "what language should I learn", "what's your favourite language" and similar questions. Such questions should be posted in /r/AskProgramming or /r/LearnProgramming.
How much progress have you made since last time? What new ideas have you stumbled upon, what old ideas have you abandoned? What new projects have you started? What are you working on?
Once again, feel free to share anything you've been working on, old or new, simple or complex, tiny or huge, whether you want to share and discuss it, or simply brag about it - or just about anything you feel like sharing!
The monthly thread is the place for you to engage /r/ProgrammingLanguages on things that you might not have wanted to put up a post for - progress, ideas, maybe even a slick new chair you built in your garage. Share your projects and thoughts on other redditors' ideas, and most importantly, have a great and productive month!
A lot of people--especially people in this thread--recommend learning and abstracting from the lambda calculus to create a programming language. That seems like a fantastic idea for a language to operate on math or even a super high-level language that isn't focused on performance, but programming languages are designed to operate on computers. Should languages, then, not be abstracted from assembly? Why base methods of controlling a computer on abstract math?
In theory, Result-like types with some syntax sugar seem like a perfect solution for most error handling problems: they are checked at compile time, they are reasonably efficient, they can be passed around and logged like normal values.
In practice, there are situations where we really want to avoid their usage, such as indexing into an array: in a lot of situations you know that your array index is correct, but proving it to the compiler may be hard even with very advanced type systems like refinement or dependent types. Sometimes you just want to get a Thing from your Array<Thing>, not an Option<Thing> - and you want the program to panic if you make a mistake. Otherwise nearly every function will return an Option, which is very cumbersome to work with.
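(For a concrete existing example of this split, Rust's slice API offers both: plain indexing panics on an out-of-bounds index, while the checked form returns an Option.)

fn main() {
    let items = vec![10, 20, 30];
    let a: i32 = items[1];              // "just give me the Thing"; panics if the index is wrong
    let b: Option<&i32> = items.get(9); // the checked form returns an Option instead
    println!("{a} {:?}", b);
}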
More than that, when your code interacts with external systems and constraints, there is always a possibility of something unexpected - the simplest example is stack overflow or memory allocation failure.
This means that some sort of "unexpected error" or "panic" is likely unavoidable in any practical language.
By itself it's fine - if something really unexpected happens, we can crash the program.
The problems begin when we don't want to crash due to such a failure. The most common example is handling a request on a server - if a single request fails for an unexpected reason, we don't want the whole server to crash.
To do this, some sort of mechanism to catch those unexpected panics seems necessary. But this complicates everything. Our compiler now has to assume that any function call may crash at any moment, so we need extra checks to avoid things getting into an inconsistent state. For the same reason we cannot do some optimizations. Stack unwinding machinery has to be built. Our language users now have to write code in a way that is resilient in the face of exceptions. If they are writing higher-order functions and share some state between calls (e.g. some thread-local iterator state), it becomes easy to break that state by passing in a function that throws.
Indeed, I've run into issues during my career where some library would start working in subtly incorrect ways after an exception occurs at an inopportune moment and gets caught. This seems to happen in any language.
Is it possible for a compiler author to avoid all this complexity of stack unwinding, or is it a necessary evil that all industrial-strength compilers have to deal with? Are there other mechanisms for handling unexpected errors without crashing the whole program?
Given that even pure-ish languages like Haskell have Control.Exception.try, I feel like there is no getting around it, but any thoughts are welcome!
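For concreteness, here is a minimal sketch of the "don't crash the whole server" case in Rust, using std::panic::catch_unwind (the Request and handle_request names are made up for the example, and this only works under panic = "unwind", not panic = "abort"); the mechanism itself is one call, and everything hard about it is exactly the unwind-safety and state-consistency baggage described above:

use std::panic::{self, AssertUnwindSafe};

struct Request { body: String }              // hypothetical request type

fn handle_request(req: &Request) -> String { // hypothetical handler that may panic
    if req.body.is_empty() {
        panic!("unexpected: empty body");
    }
    format!("echo: {}", req.body)
}

fn serve_one(req: &Request) -> String {
    // A panic inside the handler is converted into an Err here,
    // so one bad request does not take down the whole server loop.
    match panic::catch_unwind(AssertUnwindSafe(|| handle_request(req))) {
        Ok(response) => response,
        Err(_) => "500 internal error".to_string(),
    }
}

fn main() {
    let ok = Request { body: "hello".to_string() };
    let bad = Request { body: String::new() };
    println!("{}", serve_one(&ok));
    println!("{}", serve_one(&bad));
}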
Hi,
I've been writing my own hobby language and compiler. Basically think of something like C semantics with Kotlin syntax. Eventually it's going to be used to write the operating system for an FPGA-based computer that I'm designing.
I'm just starting to wonder how hard it would be to properly add Generics into the language. At the moment I can parse them into the AST - but my compiler will then bomb out with a "not supported yet" error if you ever use them.
I'm really in two minds about where to go from here. The simplest solution would be to special-case List<>, Set<>, and Map<>, implement them directly in the compiler, and call it done. On the other hand, fully implementing something like Kotlin (or Java) style generics would be pretty neat - but it feels rather too daunting to attempt.
Those of you who have gone down this path before - how painful did adding Generics turn out to be?
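One data point: with C-like semantics, the least daunting route is usually monomorphization, i.e. stamp out one specialized copy of a generic per concrete list of type arguments, on demand, and cache it; every copy is then ordinary non-generic code for the rest of the compiler. A rough sketch of the bookkeeping (Ty and GenericFn are hypothetical stand-ins for whatever your AST uses):

use std::collections::HashMap;

#[derive(Clone, Hash, PartialEq, Eq)]
enum Ty { Int, Bool, List(Box<Ty>) }   // hypothetical type representation

#[derive(Clone)]
struct GenericFn { name: String, type_params: Vec<String> /*, body: Ast */ }

struct Monomorphizer {
    // one specialized, mangled copy per (generic name, concrete type args)
    instances: HashMap<(String, Vec<Ty>), String>,
    // specialized bodies still waiting to go through normal compilation
    worklist: Vec<(GenericFn, Vec<Ty>)>,
}

impl Monomorphizer {
    // Called whenever the compiler sees a use like foo<Int>(...).
    fn request(&mut self, f: &GenericFn, args: Vec<Ty>) -> String {
        let key = (f.name.clone(), args.clone());
        if let Some(mangled) = self.instances.get(&key) {
            return mangled.clone();
        }
        // First time: invent a mangled name and queue the specialized body
        // (type params substituted by args) for ordinary, non-generic compilation.
        let mangled = format!("{}__{}", f.name, self.instances.len());
        self.instances.insert(key, mangled.clone());
        self.worklist.push((f.clone(), args));
        mangled
    }
}

fn main() {
    let mut mono = Monomorphizer { instances: HashMap::new(), worklist: Vec::new() };
    let id = GenericFn { name: "identity".to_string(), type_params: vec!["T".to_string()] };
    println!("{}", mono.request(&id, vec![Ty::Int]));                       // fresh instance
    println!("{}", mono.request(&id, vec![Ty::Int]));                       // cached, same name
    println!("{}", mono.request(&id, vec![Ty::List(Box::new(Ty::Bool))])); // new instance
}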
I'm currently trying to figure out unary operators and noticed that both the increment and decrement operators throw a 'cannot assign to rvalue' error if used in the evaluated expression of a ternary. Should I let it through to the AST and handle it in the next stage, or should the parser handle it?
So I am working on a parser for a new language I am building (mostly for the sake of writing a parser and seeing how I feel about it), and I am debating how to handle missing delimiter errors.
Ideally I want most of my parser logic to be shared between the syntax highlighter and the compiler.
Now, my current (probably bad) way of doing it is to just look for the next closer, and if I don't find it, look until the end of the file before giving the missing-delimiter error.
Now, I noticed that the syntax highlighter I am using (the default Rust highlighter in Sublime Text) is much smarter than that: it can make pretty good guesses about what I actually mean. I was wondering how you get such a thing.
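One general trick (not necessarily what the Sublime Rust highlighter literally does, but the usual idea) is to stop blaming the end of the file: keep a stack of open delimiters with their positions, and when you hit a token that can't legally appear inside the innermost open construct, for example a top-level keyword like fn, report the missing closer at the opener's position and resynchronize there. A rough Rust sketch, with a made-up token type:

#[derive(Debug)]
enum Tok { Open(char), Close(char), Word(String) }

struct Opener { ch: char, offset: usize }

fn check_delimiters(tokens: &[(Tok, usize)]) -> Vec<String> {
    let mut stack: Vec<Opener> = Vec::new();
    let mut errors = Vec::new();
    for (tok, offset) in tokens {
        match tok {
            Tok::Open(c) => stack.push(Opener { ch: *c, offset: *offset }),
            Tok::Close(_) => { stack.pop(); }
            // A token that can't occur inside a parenthesised expression:
            // blame the opener, close it, and keep going from here.
            Tok::Word(w) if w == "fn" && !stack.is_empty() => {
                let o = stack.pop().unwrap();
                errors.push(format!("missing closer for '{}' opened at {}", o.ch, o.offset));
            }
            _ => {}
        }
    }
    for o in stack {
        errors.push(format!("missing closer for '{}' opened at {}", o.ch, o.offset));
    }
    errors
}

fn main() {
    // "( foo fn": the 'fn' tells us the '(' was never closed.
    let tokens = vec![
        (Tok::Open('('), 0),
        (Tok::Word("foo".to_string()), 2),
        (Tok::Word("fn".to_string()), 6),
    ];
    for e in check_delimiters(&tokens) {
        println!("{e}");
    }
}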
If you were to implement a stack VM in Rust, it seems really tempting to have your opcodes implemented as an enum, with their operands encoded in the enum variants. Making no assumptions about instruction lengths would make the code feel more reliable.
However, this of course means that all of your instructions would be the same size, even if they don't carry any operands. How big of a deal is this, assuming the stack VM is of non-trivial complexity?
I guess it’s the dilemma mentioned in the last paragraph of this post.
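For a sense of scale, here is roughly what that design looks like; every instruction costs size_of::<Op>() bytes, i.e. the largest payload plus the discriminant (16 bytes on a typical 64-bit target for an i64 operand), whereas a packed Vec<u8> bytecode would spend 1 byte on an operand-less opcode at the price of manual decoding:

// Sketch: opcodes as a Rust enum with operands carried in the variants.
#[derive(Debug, Clone, Copy)]
enum Op {
    PushConst(i64), // operand carried inline in the variant
    Add,            // no operand, but still takes the full enum size
    Jump(u32),
    Halt,
}

fn run(code: &[Op]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0usize;
    loop {
        match code[pc] {
            Op::PushConst(v) => { stack.push(v); pc += 1; }
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
                pc += 1;
            }
            Op::Jump(target) => pc = target as usize,
            Op::Halt => return stack.pop().unwrap_or(0),
        }
    }
}

fn main() {
    println!("bytes per instruction: {}", std::mem::size_of::<Op>()); // 16 on x86-64
    let program = [Op::PushConst(2), Op::PushConst(3), Op::Add, Op::Halt];
    println!("result: {}", run(&program)); // 5
}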
I've finished the first part of Crafting Interpreters and cobbled together a language with more features than Lox.
I intend to maintain this toy language for some time, and I fear that a C bytecode VM would be a pain in the ass to maintain.
I understand a pointer declaration like int *p in C, where declarations mimic usage, and I read it as: “p is such that *p is an int”.
Cool.
But in languages in which declarations are supposed to read from left to right, I can't understand the rationale of using the dereference operator in the declaration, like:
var p: *int
Wouldn't it make much more sense to use the address-of operator:
var p: &int
since it would read as “p holds the address of an int”?
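(For comparison, and not part of the original question: Go keeps the dereference-style star in left-to-right declarations, as in var p *int, while Rust spells its reference type exactly the suggested way and keeps the star only for raw pointer types.)

fn main() {
    let x: i32 = 42;
    let p: &i32 = &x;        // Rust: the reference type is written with '&', read "reference to i32"
    let q: *const i32 = &x;  // the '*' spelling survives only for raw pointer types
    println!("{} {}", *p, unsafe { *q });
}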
If it was just one major language, I would consider it an idiosyncrasy. But since many languages do this, I’m left wondering if:
I had a hard time coming up with the title and I guess it's a bit confusing, so here's a better explanation:
I have a parser that has to do some exploratory branching. Basically, I need to try to parse one grammar, and if the parser encounters an error, I retry from an earlier point in the token stream, looking for a different grammar.
However, consider the case where one of the branches was in fact correct, but way down in the try, there's a completely unrelated syntax error.
That error then propagates up to my branch/recovery point, and I then have to judge whether the syntax error was due to the wrong branch being taken – and progress to the next attempted branch – or if it's a benign syntax error that should instead be reported to the user.
Is this just all up to heuristics? Like "the path that produces the fewest number of errors" or "the path that encounters an error the furthest forward in the token stream" etc.? Or are there some clever techniques to say "at this point I'm confident I'm the correct branch, and any subsequent type errors should be reported"?
Thanks!
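For what it's worth, the boring answer a lot of hand-written and combinator parsers use is exactly a heuristic: remember how far each branch got before failing and report the error from the branch that consumed the most input (and, once a branch passes an unambiguous commit point such as its introducing keyword, stop backtracking and report its errors directly). A minimal Rust sketch of the "furthest error wins" part, with made-up branch and error types:

#[derive(Debug, Clone)]
struct ParseError { at: usize, msg: String }

// A branch parses from `start` and either returns a value plus the new
// position, or an error tagged with how far it got.
type Branch<T> = fn(&[String], usize) -> Result<(T, usize), ParseError>;

// Try each alternative from the same start position. On total failure,
// report the error that made it furthest into the token stream.
fn alt<T>(tokens: &[String], start: usize, branches: &[Branch<T>]) -> Result<(T, usize), ParseError> {
    let mut best: Option<ParseError> = None;
    for parse in branches {
        match parse(tokens, start) {
            Ok(ok) => return Ok(ok),
            Err(e) => {
                if best.as_ref().map_or(true, |b| e.at > b.at) {
                    best = Some(e);
                }
            }
        }
    }
    Err(best.expect("alt called with no branches"))
}

// Two fake branches: one fails immediately, one fails after consuming two tokens.
fn branch_a(_t: &[String], s: usize) -> Result<((), usize), ParseError> {
    Err(ParseError { at: s, msg: "expected 'fn'".to_string() })
}
fn branch_b(_t: &[String], s: usize) -> Result<((), usize), ParseError> {
    Err(ParseError { at: s + 2, msg: "expected expression after '='".to_string() })
}

fn main() {
    let toks: Vec<String> = vec!["let".to_string(), "x".to_string(), "=".to_string()];
    match alt(&toks, 0, &[branch_a, branch_b]) {
        Ok(_) => println!("parsed"),
        Err(e) => println!("error at token {}: {}", e.at, e.msg),
    }
}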
Two days ago I made a post asking what you guys thought about freezing a language after it's done. Take a look at these small articles and this paper so you can understand the context here more:
https://pointersgonewild.com/2020/09/22/the-need-for-stable-foundations-in-software-development/
https://pointersgonewild.com/2022/02/11/code-that-doesnt-rot/
https://harelang.org/blog/2022-11-27-hare-is-boring/
https://harelang.org/blog/2023-11-08-100-year-language/
https://hal.science/hal-02117588v1/document
It occurs to me that we can divide end users into two categories:
So what if, over a certain period of time, say 15 years, we took the lessons learnt in PL design, wrote languages with a finite, pre-determined set of features, and then froze those languages? The end-users of the first category would use these frozen languages and their tooling.
On the other hand, there are languages that are continuously innovating and experimenting, stringing along their respective end-users.
Would that kind of programming community be so bad? Several answers to my previous post seem to have assumed that innovations and new features will stop coming. The language in question would not get new features, true, but the PL design community (and possibly the same author) would continue implementing new features and learning new lessons in other, experimental languages.
edit: to clarify, here is how the stable group can look like:
Development cycle 1: write PL 1 -> freeze PL 1 -> use PL 1 until PL 2 is ready
Development cycle 2: write PL 2 -> freeze PL 2 -> use PL 2 until PL 3 is ready
Development cycle 3: write PL 3 -> freeze PL 3 -> use PL 3 until PL 4 is ready
On the other hand, for the experimental group:
they can keep expanding their scope and adding new features forever.
After reading the fantastic book Crafting Interpreters and implementing Lox in C#, I decided to create a rudimentary programming language called NodeScript.
TLDR: NodeScript is intended for use in puzzle games. It's designed to run on interconnected nodes which can send each other messages.
NodeScript is a bit more expressive than the assembly-like code from games like TIS-100, but still more limited than any proprietary programming language. Part of the challenge is figuring out how to perform simple tasks using the limited toolset.
There are 4 types of nodes in NodeScript.
mem is the only variable which will persist between executions. Notably, there are NO loops within the language itself. No while. No for. Despite this, NodeScript is still Turing complete when multiple nodes are used.
Comments are prefixed with //
Every line contains a single statement. All statements start with a command, followed by a comma-separated list of 0 or more expressions, depending on the command. Every line ends with a semicolon.
SET <variable_name>, <expression>;
PRINT <output_idx>, <expression>;
RETURN;
IF <expression>;
ELSE;
ENDIF;
One major goal for NodeScript was speed. Compilation has to occur in real-time. Test cases also had to execute quickly, giving players instant feedback. The language compiles into bytecode, which is then interpreted by the individual nodes. A typical node's program can compile in around 25 μs. These compiled programs can process hundreds of thousands of lines a second. Development details can be found here
NodeScript is open source and available as a NuGet package for anyone to use. There are several planned enhancements such as JIT tokenization and index caching. I also might try to make the language simpler by eliminating semi-colons for example. I'd love to know how I can improve the project as well as people's thoughts!
Hey guys,
I don’t know how to start this, but let me just make a bold statement:
“Just as letters combine to form words, I believe that grammatical particles are the letters of semantics.”
In linguistics, there’s a common view that grammatical particles—such as prepositions, conjunctions, articles, and other function words—are the fundamental units in constructing meaning.
I want to build a programming language inspired by this idea, where particles are the primitive components of it. I would love to hear what you guys think about that.
It’s not the technical aspects or features that I’m most concerned with, but the applicability of this idea or approach.
A bit about me: I’ve been in the software engineering industry for over 7 years and have built a couple of parsers and interpreters before.
A weird note, though: programming has actually made me quite articulate in life. I think programming is a form of rhetoric—a functional or practical one ^^.
inb4: there is a chance that "Trees that Grow" answers my question, but I found that paper a bit dense, and it's not always easy to apply outside of Haskell.
I'm writing a compiler that works with several representations of the program. In order to display clear error messages, I want to include source code locations there. Since errors may not only occur during the parsing phase, but also during later phases (type checking, IR sanity checks, etc), I need to somehow map program trees from those later phases to the source locations.
The obvious solution is to store source code spans within each node. However, this makes my pattern matching and other algorithms noisier. For example, the following code lowers high-level AST to an intermediate representation. It translates Scala-like lambda shorthands to actual closures, turning items.map(foo(_, 123)) into items.map(arg => foo(arg, 123)). Example here and below in ReScript:
type ast =
| Underscore
| Id(string)
| Call(ast, array<ast>)
| Closure(array<string>, ast)
| ...
type ir = ...mostly the same, but without Underscore...
let lower = ast => switch ast {
| Call(callee, args) =>
  switch args->Array.filter(x => x == Underscore)->Array.length {
  | 0 => Call(lower(callee), args->Array.map(lower))
  // Replace the underscore with a reference to the synthetic "arg" before recursing;
  // otherwise the rewritten call would still contain an Underscore and recurse forever.
  | 1 =>
    Closure(
      ["arg"],
      lower(Call(callee, args->Array.map(a => a == Underscore ? Id("arg") : a))),
    )
  | _ => raise(Failure("Only one underscore is allowed in a lambda shorthand"))
  }
...
}
However, if we introduce spans, we need to pass them everywhere manually, even though I just want to copy the source (high-level AST) span to every IR node created. This makes the whole algorithm harder to read:
type ast =
| Underscore(Span.t)
| Id(string, Span.t)
| Call((ast, array<ast>), Span.t)
| Closure((array<string>, ast), Span.t)
| ...
// Even though this node contains no info, a helper is now needed to ignore a span
let isUnderscore = node => switch node {
| Underscore(_) => true
| _ => false
}
let lower = ast => switch ast {
| Call((callee, args), span) =>
  switch args->Array.filter(isUnderscore)->Array.length {
  // We have to pass the same span everywhere
  | 0 => Call((lower(callee), args->Array.map(lower)), span)
  // For synthetic nodes like "arg", I also want to simply include the original span
  | 1 =>
    Closure(
      (["arg"], lower(Call((callee, args->Array.map(a => isUnderscore(a) ? Id("arg", span) : a)), span))),
      span,
    )
  | _ => raise(Failure(`Only one underscore is allowed in function shorthand args at ${span->Span.toString}`))
  }
...
}
Is there a way to represent source spans without having to weave them (or any other info) through all code transformation phases manually? In other words, is there a way to keep my code transforms purely focused on their task, and handle all the other "noise" in some wrapper functions?
Any suggestions are welcome!
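One common alternative (sketched here in Rust rather than ReScript, with made-up type names) is to not store spans in the tree at all: give every node a small NodeId, keep a side table from NodeId to Span, and have lowering allocate fresh ids that simply inherit the span of the node they were derived from. The pattern matches in the transform then never mention spans; only the little id-allocation helper and the error reporter touch the table:

use std::collections::HashMap;

type NodeId = u32;

#[derive(Clone, Copy, Debug)]
struct Span { start: usize, end: usize }

// Nodes carry only an id; spans live in the side table.
#[allow(dead_code)]
enum Ir {
    Id(NodeId, String),
    Call(NodeId, Box<Ir>, Vec<Ir>),
    Closure(NodeId, Vec<String>, Box<Ir>),
}

struct Ctx {
    spans: HashMap<NodeId, Span>,
    next: NodeId,
}

impl Ctx {
    // Allocate an id for a synthetic node, inheriting the span of the
    // source node it was derived from. Transform code calls this and
    // never threads Span values through its pattern matches.
    fn derived(&mut self, from: NodeId) -> NodeId {
        let id = self.next;
        self.next += 1;
        if let Some(span) = self.spans.get(&from).copied() {
            self.spans.insert(id, span);
        }
        id
    }
}

fn main() {
    let mut ctx = Ctx { spans: HashMap::new(), next: 1 };
    ctx.spans.insert(0, Span { start: 10, end: 25 }); // span recorded by the parser for node 0
    let synthetic = ctx.derived(0);                   // e.g. the synthetic Id("arg") node
    println!("{:?}", ctx.spans[&synthetic]);          // reports 10..25, same as the source node
}

The trade-off is an extra indirection whenever a span is actually needed, in exchange for transforms that stay exactly as clean as the span-free version.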
Symbolverse is a term rewriting system with a minimal syntax, but it tries to be useful in many areas like formal verification, automated reasoning, and symbolic computation. It is written in JavaScript, and the source code is relatively small (a few thousand lines altogether), so I am considering porting it to a faster platform once it reaches v1.0.0.
Term rewriting rules are represented by S-expressions, just like terms which the rules operate on. The system is supposed to be used in the following process:
1. write the rewrite rules in a *.sv file
2. feed it an input *.sexpr file
3. produce an output *.sexpr file
In a way, the first *.sv file may be considered a compiler for the second, source-code *.sexpr file, which writes the third, compiled *.sexpr file. This isolated read/write activity of Symbolverse is meant to stay in its purest form, not dealing with an effect system.
To reveal a bit of the future plans for Symbolverse, it will be extended in three major steps. The current, first step includes flat rule-based programming, and it provides the minimal viable product. The second step will include ruleset modularity and scopes over which the rules can operate. The third and final step will embrace typing of rules, with the possibility of reporting errors in input expressions.
The project home page is at: https://github.com/tearflake/symbolverse. There are simple instructions on how to build the Symbolverse executable from the source code, and the repository also provides an API suitable for browser and Node.js use. Take a look at some examples and do experiments at the online playground. If this made you interested, read about the Symbolverse term rewriting system in the symbolverse.md file.
Recently, while thinking about programming languages, I had an idea for a (maybe) novel function-call syntax, which generalizes prefix, infix, and postfix notation.
I've written the following explanation: https://gist.github.com/Dobiasd/bb9d38a027cf3164e66996dd9e955481
Since I'm not experienced in language design, it would be great if you could give me some feedback. I'm also happy to learn why this idea is nonsense, in case it is. :)
Hello everyone, I'm starting my first year as a Masters student in CS and one of the courses I'm taking this year is Computer and Information Security.
Basically we have a project due at the end of the semester to write a research paper on a topic within the world of Security.
My mind immediately jumped to type systems and compile-time checks that force the user to embed security measures within the design of their code.
So, does anyone have any interesting papers on this topic, or something very similar to it?
An example I'd give is TRACTOR from the US government.
The key thing I'm trying to design for is making it intuitive overall for users to work with the system and create tasks/events for scheduling.
Here are the core ideas to keep in mind for this system:
input_int("Enter your value here --> ")
Here's my design draft for SSE3:
// provides a series of built-in tools when including the
// flags (__LOG__ outputs all log data to console)
include: __LOG__
// similar to a array, you can group tasks together (variable name is optional)
task trial {
"read chapter 1", 3, 2, BLUE
"finish homework 7", 7, 1
}
card = getCards() // <-- this retrieves a Card from the system
add(trial, card) // adds all 'trial' tasks to 'card'
// this builds the schedule
build()
displaySchedule()
I've found myself in the past regularly modifying my data files, and as such I will be allowing users to read in task/event/card data from separate files as well.
But obviously, if the user wishes to not have to update any file, they can simply do this:
include: __LOG__
taskSet = inputTasks()
cardSet = getCards()
add(taskSet, cardSet) // <-- adds tasks to cards
Let me know your thoughts (I was up all night trying to think of something better than SSE2).
Appreciate your help :)
Sorry if it sounds like I am being boastful, but I want to confirm.
Once we have defined the CFG of a language, can parse it, and add type annotations for some context, can we represent any computation this way? Is there a theorem which proves that this is all that's needed to represent any calculation in any language?