/r/ProgrammingLanguages
This subreddit is dedicated to the theory, design and implementation of programming languages.
Be nice to each other. Flame wars and rants are not welcome. Please also put some effort into your post; this isn't Quora.
This subreddit is not the right place to ask questions such as "What language should I use for X", "what language should I learn", "what's your favourite language" and similar questions. Such questions should be posted in /r/AskProgramming or /r/LearnProgramming. It's also not the place for questions one can trivially answer by spending a few minutes using a search engine, such as questions like "What is a monad?".
Nim has a feature where a variable representing the return value of a procedure is automatically declared with the name result:
proc sumTillNegative(x: varargs[int]): int =
  for i in x:
    if i < 0:
      return
    result = result + i
I think a tiny tweak to this idea would make it a little bit nicer: allow the return variable to be user-declared with the return keyword:
proc sumTillNegative(x: varargs[int]): int =
  return var sum = 0
  for i in x:
    if i < 0:
      return
    sum = sum + i
Is this already done in some other language/why would it be a bad idea?
I've never thought too deeply about how call stacks and stack-based memory allocation work (beyond the basics of the return pointer, tail recursion and the fact that there's no need to explicitly deallocate memory) but today I was wondering if the stack was used to manage scopes within a function, and in particular how destructors can be called on stack-based items (i.e. how do you know the boundaries between items, and some minimal type info that's needed)?
I found https://en.wikipedia.org/wiki/Stack-based_memory_allocation and saw its mention of the need for a frame pointer once you have variable length frames, but can folks here recommend a more in-depth treatment of the subject?
TIA!
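In the C++ model at least, the short answer is that the compiler knows each scope's contents statically, so it emits destructor calls at the closing brace in reverse construction order; no per-item boundaries or type info need to live on the stack at runtime. A Python sketch of that unwind discipline (the class and names are mine, purely illustrative):

```python
class Scope:
    """Sketch of compiler-managed scope cleanup: objects register in
    construction order, and cleanup runs in reverse order at scope
    exit, mirroring destructor calls emitted at a closing brace."""

    def __init__(self, log):
        self.log = log
        self.live = []

    def construct(self, name):
        self.log.append(f"ctor {name}")
        self.live.append(name)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        for name in reversed(self.live):  # reverse construction order
            self.log.append(f"dtor {name}")
        return False

log = []
with Scope(log) as outer:
    outer.construct("a")
    with Scope(log) as inner:
        inner.construct("b")  # destroyed first, at the inner scope's exit
    outer.construct("c")
# log is now: ctor a, ctor b, dtor b, ctor c, dtor c, dtor a
```

The key point is that none of this bookkeeping happens at runtime in a compiled language: the offsets and destructor calls are baked in at compile time, which is why the stack itself carries no type info.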
So I'm working on a Lua-like language while taking a break from my main project. I thought of an interesting optimization for the problem of tables being a massive bottleneck in JIT-compiled Lua-like languages.
Accessing fields in a table takes a while. But my language has static types, so what if I treat tables whose fields are known at compile time as "structs" during compilation? Example:
let Person = {
    name = "",
    age = 0
}
The type of Person is now { "name" String, "age" Number }, but because Person can be cast to a normal table, and is still a table, we can't treat it like a struct:
Person.dob = "October 26, 1979"
What if we bring the known fields out of the table, and just reference them in the table. So the memory layout will now look something like this:
# The Cover object header
| Size | # Includes the string & number
| Type ID | # Different from a normal table
# the fields
| Table { Any Any } | # the table
| String | # name
| Number | # age
And in that table, we put references to name and age that unwrap automatically if you access them through the table, but the compiler can also optimize those accesses down to indexing the right slots in a tuple. And even if we cast a different table to type(Person), it won't cause problems, because the type IDs will be different and we can make sure that those fields exist there.
EDIT: Some qualifications:
By static types, I mean that this technique requires static types to be possible in the language. This is still a Lua-like language; by static I mean closer to C# or TypeScript.
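A rough Python sketch of the proposed layout (all names hypothetical): known fields get fixed slots, while fields added after "casting" to a plain table fall back to a dictionary:

```python
class PersonShape:
    """Sketch of the layout from the post: known fields live at fixed
    slots (like struct offsets), extras fall back to an inner table."""

    __slots__ = ("name", "age", "_extra")  # fixed offsets, like a struct

    def __init__(self, name, age):
        self.name = name   # slot 0: direct offset access, no hash lookup
        self.age = age     # slot 1
        self._extra = {}   # dynamic fields added after casting to a table

    def __getitem__(self, key):
        # Table-style access: check the fixed slots first, then the dict.
        if key in ("name", "age"):
            return getattr(self, key)
        return self._extra[key]

    def __setitem__(self, key, value):
        if key in ("name", "age"):
            setattr(self, key, value)
        else:
            self._extra[key] = value

p = PersonShape("", 0)
p["dob"] = "October 26, 1979"  # unknown field: goes to the fallback table
p.age = 30                     # known field: fixed slot, "struct" access
```

In the real language the compiler would turn person.name into a direct slot read, while table-style indexing still works through the fallback path, which is the dual-access behavior the post describes.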
I want to support using the file system to define modules / objects along with subdirectories structures.
So, the file foo.ext would create the object foo, and bar/foo.ext would create the object bar.foo. All good there.
Now things get slightly convoluted if I support source paths as a list of places where the source can be:
Say I have:
src/
bar/
foo.ext
etc/
bar/
baz.ext
I can invoke the building tool like:
tool sourcepath: "src, etc"
And it will create bar.foo and bar.baz; it finds the objects in the directories src/ and etc/.
Now my question: What do you think my tool should do if the source path contains a subdirectory:
tool sourcepath: "src, src/bar"
Strictly speaking it would create the objects bar.foo (from parent src/ and path bar/foo.ext) and foo (from parent src/bar and path foo.ext).
Should that be a warning? An error? Should that be two different objects with different fully qualified names but same content?
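For what it's worth, one defensible policy is to reject nested source roots outright, since the same file would otherwise acquire two fully qualified names. A Python sketch of that check (function name is mine):

```python
from pathlib import PurePosixPath

def check_sourcepaths(paths):
    """Sketch of one policy: error out when one source root is nested
    inside another, since files under the nested root would otherwise
    get two different fully qualified object names."""
    roots = [PurePosixPath(p) for p in paths]
    for a in roots:
        for b in roots:
            if a != b and a in b.parents:
                raise ValueError(f"source root {b} is nested inside {a}")

check_sourcepaths(["src", "etc"])       # fine: disjoint roots
# check_sourcepaths(["src", "src/bar"]) # would raise ValueError
```

Making it a hard error keeps the mapping from files to objects a function; a warning plus "two objects, same content" silently duplicates code in the output.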
Evaluate right-to-left or left-to-right?
I love APL's lack of precedence, and I love C and C++'s power. I write mostly C++ but have done extensive work in K and Q (APL descendants).
I have been toying with a language idea for about a decade now that is an unopinionated mix of C, C++, Rust, APL, and Java. One of the things I really liked about K was how there is no precedence. Everything is evaluated from right to left (but parsed from left to right); e.g., 2*3+4 is 14, not 10.
Is something like that possible for a C-like language? I don't mind making the syntax a little different, but there are certain constructs that seem to require a left-to-right evaluation, such as items in a struct or namespace (eg namespace.struct.field).
However, function application that allows chaining without the parens (composition) would need to be right-to-left (f g 10). But maybe that isn't a very common case and you could just require parens.
Also, assignment would seem weird if you placed it on the right for left-to-right evaluation, and right-to-left allows chaining assignments, which I always liked in K.
// in K, assignment is : and divide is % and floor is _
up: r * _ (x + mask) % r: mask + 1
With such common use of const by default and auto type inference, this is the same as auto const r = ..., where r can even be constrained to that statement. But all that requires right-to-left evaluation.
Can you have a right-to-left or left-to-right language that is otherwise similar to C and C++? Would a "mostly" RtL or LtR syntax be confusing (eg, LtR except assignment; all symbols are RtL but all keywords are LtR; etc.)?
// in some weird C+K like mix, floor is fn not a keyword
let i64 up: r * floor x + mask / r:mask + 1;
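For readers who don't know K: unrolled into conventional left-to-right notation, the right-to-left chain in that line computes "round x up to the next multiple of r". A Python sketch with the same variable names:

```python
# The K line  up: r * _ (x + mask) % r: mask + 1  read right-to-left
# (K's % is divide and _ is floor), unrolled into explicit steps:
x = 70
mask = 15                    # e.g. aligning to r = 16

r = mask + 1                 # rightmost expression runs first: r = 16
up = r * ((x + mask) // r)   # add mask, floor-divide by r, scale back up

assert up == 80              # 70 rounded up to the next multiple of 16
```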
R3 is a concatenative language inspired by ColorForth, taking simplification to the maximum. It is strongly typed; in fact there is only one type of data, and therefore it is not necessary to specify it.
I'm interested in hearing feedback, preferably constructive (but not exclusive), about what people who like language design think about the design decisions.
Following recent discussions here I am wondering if anyone has successfully implemented a syntax they think makes their code beautiful but which shuns the usual tooling and/or approaches, i.e. not LL(k), LR(k), LALR(k) etc.?
I have been toying with the idea for a while but never took the leap because I fell for the academic koolaid.
EDIT: Some great examples here. Zimbu's asymmetric }:
FUNC Main() int
  RETURN exitVal
}
Pipes to demark indentation:
int main()
| printf("Hello World\n");
| return 0;
"Eyebrowless" JSON inspired by Jason where all :
, ,
and "
have been replaced with minimal whitespace for a 15% space saving and probably faster parsing to boot:
{id "0001" type donut name Cake
image {url "images/0001.jpg" width 200 height 200}
thumbnail {url "images/thumbnails/0001.jpg" width 32 height 32}}
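If I've read the format right, a minimal recursive-descent parser for it is only a few lines; here is a Python sketch (purely illustrative, not a full grammar):

```python
import re

def parse_eyebrowless(text):
    """Sketch: parse the 'eyebrowless' JSON dialect above. Tokens are
    braces, quoted strings, or bare words; inside braces, tokens
    alternate key then value. Not a complete grammar."""
    tokens = re.findall(r'\{|\}|"[^"]*"|[^\s{}]+', text)
    pos = 0

    def value():
        nonlocal pos
        tok = tokens[pos]; pos += 1
        if tok == "{":
            obj = {}
            while tokens[pos] != "}":
                key = tokens[pos]; pos += 1   # bare-word key
                obj[key] = value()
            pos += 1                          # consume the closing }
            return obj
        if tok.startswith('"'):
            return tok[1:-1]                  # quoted string
        return int(tok) if tok.isdigit() else tok

    return value()

doc = parse_eyebrowless(
    '{id "0001" type donut name Cake '
    'image {url "images/0001.jpg" width 200 height 200}}'
)
```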
I considered this:
for i ∈ [0..9) do
print i
C has some ambiguities. Is it a multiply or a pointer type:
T ** c;
Is a +++ b parsed as a++ + b or a + ++b?
I was hoping for beautiful weirdness but this is interesting!
I really like Haskell's types and maybe even the syntax, but the syntax is weird to most people. I was wondering if someone has attempted to add Haskell typing to Python (mypy isn't really it). Or maybe make a Python-like syntax that compiles to Haskell.
Would this be something useful?
It seems to be a decent rule of thumb that any language used to instruct a computer to do a task is Turing complete (ignoring finite memory restrictions).
Surprisingly, seemingly simple systems such as PowerPoint, Magic: The Gathering, the Game of Life, x86 mov, CSS, Minecraft and many more just happen to be Turing complete almost by accident.
I'd love to hear more about counterexamples. Systems/languages that are so useful that you'd assume they're Turing complete, which accidentally(?) turn out not to be.
The wiki page on Turing completeness gives a few examples, such as some early pixel shaders and some languages specifically designed to be Turing incomplete. Regular expressions also come to mind.
What surprised you?
TLDR: Made an interpreted language (based on Lox/Crafting Interpreters) with a focus on design by contract, and exploring the possibility of having code blocks of other languages such as Python/Java within a script written in my lang.
I worked my way through the amazing Crafting Interpreters book by Robert Nystrom while learning how compilers and interpreters work, and used the tree-walk version of Lox (the language you build in the book using Java) as a partial jumping off point for my own thing.
I've added some additional features, such as support for inline test blocks (which run/are evaluated if you run the interpreter with the --test flag), and built-in design-by-contract support (ie preconditions and postconditions for functions, and assertions). Plus some other small things like user input, etc.
Something I wanted to explore was the possibility of having "blocks" of code in other languages such as Java or Python within a script written in my language, and whether there would be any usecase for this. You'd be able to pass in / out data across the language boundary based on some type mapping. The usecase in my head: my language is obviously very limited, and doing this would make a lot more possible. Plus, would be pretty neat thing to implement.
What would be a good, secure way of going about it? I thought of utilising the Compiler API in Java to dynamically construct classes based on the java block, or something like RestrictedPython.
Here's an example of what I'm talking about:
// script in my language
fun factorial(num)
    precondition: num >= 0
    postcondition: result >= 1
{
    // a Java block that takes the num variable across the lang boundary,
    // and "returns" the result across the boundary
    java (num) {
        // Java code block starts here
        int result = 1;
        for (int i = 1; i <= num; i++) {
            result *= i;
        }
        return result; // The result will be accessible as `result` in my language
    }
}
// A test case (written in my lang via its test support) to verify the factorial function
test "fact test" {
assertion: factorial(5) == 120, "error";
assertion: factorial(0) == 1, "should be 1";
}
print factorial(6);
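The precondition/postcondition semantics shown above can be approximated in Python with a decorator; this sketch (names are mine) is how I'd picture the checks firing around the call:

```python
import functools

def contract(precondition=None, postcondition=None):
    """Sketch of the design-by-contract checks from the post: the
    precondition sees the arguments, the postcondition sees the result."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            if precondition and not precondition(*args):
                raise AssertionError("precondition failed")
            result = fn(*args)
            if postcondition and not postcondition(result):
                raise AssertionError("postcondition failed")
            return result
        return wrapper
    return decorate

@contract(precondition=lambda num: num >= 0,
          postcondition=lambda result: result >= 1)
def factorial(num):
    result = 1
    for i in range(1, num + 1):
        result *= i
    return result
```

Under this sketch, factorial(5) returns 120 while factorial(-1) fails its precondition before the body ever runs, which matches the semantics the script above implies.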
Long-time Linux user but mostly avoid using Bash since I don't like it. I wrote a programming language once already, and that didn't go so well. The problem I'm having lately is that I can't kick this idea I have for a shell.
I've looked at and/or tried what feels like a dozen alternative shells, and none of them hit that sweet spot. I want a shell that gives me Bash's job control, redirection, and so on, but with control flow that I can remember.
The shell has two modes for parsing. Which one it picks depends on the first token after the initial identifier. It looks like this:
#!/usr/bin/env xyz
print('Hello world!') # ( = exec
echo 'Hello, world!' # String = command
var name = "BOB" # keyword var = exec
@name = @name.lower() # @ required for vars
print('Hi ' ++ @name) # not sure about this concat
echo 'Hi ' @name
I hate memorizing letters for test, so spell it out:
if Path.is_dir("/bin") {
...
}
Core literal types: Integer, String, List, Hash. Not interested in adding floats. Might add actual booleans, or might just use Integer as boolean...
String.to_i_or(...) to prevent "oops, the string didn't convert, the script is now a bomb".
Core types have a bunch of methods on them: Lists have a push, a size, strings can be split, etc. Strings have ".words" to get individual space-separated words. Possibly ".words_or('...')." Batteries very included.
Non-literal types like Json with special transformers:
var my_data = @["a": 1, "b": 2]
Json.write(@my_data) > some_file.txt
Explicit word expansion:
var abc = "a b c.txt"
ls @abc # ls "a b c.txt"
ls @abc.words() # ls "a" "b" "c.txt"
Environment variable set and get:
var my_dirlist = $['PATH'].dirs_to_list() # convert to List[String]
paths.push($['HOME'])
$['PATH'] = paths.to_dirs()
Builtin commands (e.g., history) have typed alternatives (always? usually?), with more common flags available via methods:
history -c # clear history
History.clear() # Same, but typed
Running a command with some temporary adjustment should be like this:
with <assignment> { commands }
By default, commands that fail terminate further execution. However, a maybe { ... } block allows commands to fail.
These are the ideas that I have so far. Does this seem like a good foundation for building a shell upon? Are there parts that I have missed?
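The two-mode dispatch rule can be sketched in Python; the keyword set and the exact classification rules here are assumptions based on the examples above:

```python
import re

KEYWORDS = {"var", "if", "with", "maybe"}  # hypothetical keyword set

def parse_mode(line):
    """Sketch of the shell's two parse modes: look at the initial
    identifier and the token after it to pick exec vs command mode."""
    line = line.split("#", 1)[0].strip()   # drop trailing comments (naive)
    m = re.match(r"([@\w.]+)\s*(.*)", line)
    head, rest = m.group(1), m.group(2)
    if head in KEYWORDS or head.startswith("@"):
        return "exec"                       # keyword or variable form
    if rest.startswith("("):
        return "exec"                       # print(...)-style call
    return "command"                        # echo 'Hello, world!'
```

Usage, mirroring the script at the top of the post: parse_mode("print('Hello world!')") is "exec", while parse_mode("echo 'Hello, world!'") is "command". One design question this sketch surfaces is what happens when a command happens to be followed by a parenthesis, which is where a real grammar would need a tighter rule than "first token decides".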
This is about my Intermediate Language. (If someone knows the difference between IR and IL, then tell me!)
I've been working on this for a while, and getting tired of it. Maybe what I'm attempting is too ambitious, but I thought I'd post about what I've done so far, then take a break.
Now, I consider my IL to be an actual language, even though it doesn't have a source format - you construct programs via a series of function calls, since it will mainly be used as a compiler backend.
I wrote a whole bunch of stuff about it today, but when I read it back, there was very little about the language! It was all about the implementation (well, it is 95% of the work).
So I tried again, and this time it is more about the language, which is called 'PCL':
A textual front end could be created for it in a day or so, and while it would be tedious to write long programs in it, it would still be preferable to writing assembly code.
As for the other stuff, that is this document:
https://github.com/sal55/pcl/blob/main/pcl2024.md
This may be of interest to people working on similar matters.
(As stated there early on, this is a personal project; I'm not making a tool which is the equivalent of QBE or an ultra-lite version of LLVM. While it might fill that role for my purposes, it can't be more than that for the reasons mentioned.)
ETA Someone asked me to compare this language to existing ones. I decided I don't want to do that, or to criticise other products. I'm sure they all do their job. Either people get what I do or they don't.
In my links I mentioned the problems of creating different configurations of my library, and I managed to do that for the main Win64 version by isolating each backend option. The sizes of the final binary in each case are as follows:
PCL API Core 13KB 47KB (1KB = 1000 bytes)
+ PCL Dump only 18KB 51KB
+ RUN PCL only 27KB 61KB (interpreter)
+ ASM only 67KB 101KB (from here on, PCL->x64 conversion needed)
+ OBJ only 87KB 122KB
+ EXE/DLL only 96KB 132KB
+ RUN only 95KB 131KB
+ Everything 133KB 169KB
The right-hand column is for a standalone shared (and relocatable) library, and the left one is the extra size when the library is integrated into a front-end compiler and compiled for low-memory. (The savings are the std library plus the reloc info.)
I should say the product is not finished, so it could get bigger. So just call it 0.2MB; it is still minuscule compared with alternatives. 27KB extra to add an IL + interpreter? These are 1980s microcomputer sizes!
Rust and Elixir are two languages that I frequently hear people praise for their pattern matching design. I can see where the praise comes from in both cases, but I think it's interesting how, despite this shared praise, their pattern matching designs are so very different. I wonder if we could design a pattern matching syntax/semantics that could support both of their common usages? I guess we could call it UPMS (Universal Pattern Matching Syntax) :)
Our UPMS should support easy pattern-matching-as-tuple-unpacking-and-binding use, like this from the Elixir docs:
{:ok, result} = {:ok, 13}
I think this really comes in handy in things like optional/result type unwrapping, which can be quite common.
{:ok, result} = thing_that_can_be_ok_or_error()
Also, we would like to support exhaustive matching, a la Rust:
match x {
1 => println!("one"),
2 => println!("two"),
3 => println!("three"),
_ => println!("anything"),
}
Eventually, I realized that Elixir's patterns are pretty much one-LHS-to-one-RHS, whereas Rust's can be one-LHS-to-many-RHS. So what if we extended Elixir's matching to allow for this one-to-many relationship?
I'm going to spitball at some syntax, which won't be compatible with Rust or Elixir, so just think of this as a new language.
x = {
1 => IO.puts("one")
2 => IO.puts("two")
3 => IO.puts("three")
_ => IO.puts("anything")
}
We extend '=' to allow a block on the RHS, which drops us into a more Rust-like exhaustive mode. '=' still acts like a binary operator, with an expression on the left.
We can do the same kind of exhaustiveness analysis Rust does on all the arms in our new block, and we still have the one-line form for fast Elixir-esque destructuring. I was pretty happy with this for a while, but then I remembered that these two pattern matching expressions are just that, expressions. And things get pretty ugly when you try to get values out.
let direction = get_random_direction()
let value = direction = {
Direction::Up => 1
Direction::Left => 2
Direction::Down => 3
Direction::Right => 4
}
This might look fine to you, but the back-to-back equals looks pretty awful to me. If only the get-the-value-out operator were different from the do-pattern-matching operator. Except that's exactly the case in Rust. If we just pull that back into this syntax by replacing Elixir's '=' with 'match':
let direction = get_random_direction()
let value = direction match {
Direction::Up => 1
Direction::Left => 2
Direction::Down => 3
Direction::Right => 4
}
This reads clearer to me. But now, with 'match' being a valid operator to bind variables on the LHS...
let direction = get_random_direction()
let value match direction match {
Direction::Up => 1
Direction::Left => 2
Direction::Down => 3
Direction::Right => 4
}
We're right back where we started.
We can express this idea in our current UPMS, but it's a bit awkward.
[get_random_direction(), let value] = {
[Direction::Up, 1]
[Direction::Left, 2]
[Direction::Down, 3]
[Direction::Right, 4]
}
I suppose that this is really not that dissimilar, maybe I'd get used to it.
So, thoughts? Have I rediscovered something a language I haven't heard of implemented 50 years ago? Do you have an easy solution to the double-equals problem? Is this an obviously horrible idea?
QuickSched is a robust, dynamic scheduling platform that is closely tied to user settings and fully synced with Google Calendar. While it's a powerful scheduling tool, I needed a parser that would match this nature.
The core goals for the parser were simplicity and flexibility. Here's how we approached it.
I wanted to minimize the amount of data I needed to store when handling a token. For my system, I decided to classify tokens into one of two categories:
Quotation marks were preserved to allow the parser to easily identify a token's type. So the following command would be stored as follows:
task "A" 5.5 @ friday // this simply creates task "A" with 5.5 hours due on friday
['task', '"A"', '5.5', '@', 'friday']
One of the key goals of the parser for QuickSched was to allow a logical flow of data. Below is the grammar for an event/timeblock (<> represents required arguments, [] means optional):
event bool <name> [cardId] @ [date] <timestamp>
While we could have placed restrictions on where the overall time expression had to be placed (starting with '@'), we wanted maximum flexibility. So the parser was designed to keep reading all time-related data until:
Users can provide days of the week and time expressions in whatever order they prefer (though, duplicates are not permitted for obvious reasons).
Here's an example of a recurring timeblock:
event true @ mon wed fri 11-12:45 +C2 "Class2"
The beauty of this, when compared to, say, TaskWarrior, is the simplicity of the command and how it all flows.
It reads: "Create a recurring event at mon, wed, fri from 11am-12:45pm with card id 2, and the name 'Class2'".
Since timeblocking is a common occurrence in scheduling, we decided to support timestamp expressions that omit information (yet clearly infer what was desired). The rules for our timestamp expressions were as follows:
And here is a list of all valid timestamp formats:
Timestamp Formats:
- 9-2 (9:00am-2:00pm)
- 9-2:15 (9:00am-2:15pm)
- 9:-2:15 (9:00am-2:15pm)
- 09:00am-02:15pm (9:00am-2:15pm)
- 09:am-2:15pm (9:00am-2:15pm)
- 9:am-2:15pm (9:00am-2:15pm)
- 9-2:15pm (9:00am-2:15pm)
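A Python sketch of one way to parse these lax timestamps (the am/pm defaults for bare endpoints are my inference from the examples, not a rule stated by the project):

```python
import re

def parse_clock(text, default_meridiem):
    """Sketch: parse one side of a timestamp such as '9', '9:',
    '2:15pm', or '09:00am', filling in missing minutes/meridiem."""
    m = re.fullmatch(r"(\d{1,2}):?(\d{2})?(am|pm)?", text)
    hour = int(m.group(1))
    minute = int(m.group(2) or 0)          # missing minutes -> :00
    meridiem = m.group(3) or default_meridiem
    if meridiem == "pm" and hour != 12:
        hour += 12                         # normalize to 24-hour time
    return hour, minute

def parse_timestamp(expr):
    """Parse 'start-end', e.g. '9-2:15pm' -> ((9, 0), (14, 15)).
    Assumption: a bare start defaults to am, a bare end to pm."""
    start, end = expr.split("-")
    return parse_clock(start, "am"), parse_clock(end, "pm")
```

Run against the table above, parse_timestamp("9-2") gives 9:00 and 14:00, and parse_timestamp("09:am-2:15pm") gives 9:00 and 14:15, matching the listed interpretations.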
One of the great benefits we saw from creating this parser was that we were able to repurpose it for our Serialization Tooling. Our serializer was designed to handle basic pointer handling (e.g. Events referencing Cards) along with rebuilding the entire schedule. It stored the exact same commands and inferred the current/past dates to automatically archive past Tasks.
Example of a sample serialization file:
CARD {
"MA" LIGHT_BLUE
"LIT" LIGHT_GREEN
"CS" LIGHT_CORAL
"PHI" YELLOW
"Food" BLUE
}
TASK {
"Start Final Literature Project" 5.5 +C1 @ 27-09-2024
"Work on Group Coding Project" 7.0 +C2 @ 28-09-2024
"Study for Philosophy Midterm" 5.0 +C3 @ 29-09-2024
"Complete Math Extra Credit" 3.0 +C0 @ 30-09-2024
"Submit Final Philosophy Essay" 6.0 +C3 @ 01-10-2024
"work" 4.0 @ 05-10-2024
}
EVENT {
true "CS Class" +C2 @ MON WED FRI 02:00pm-03:30pm
true "LIT Class" +C1 @ TUE THU 10:30am-12:00pm
true "PHI Class" +C3 @ TUE THU 01:00pm-02:30pm
true "MA Class" +C0 @ MON WED FRI 09:00am-10:15am
true "Lunch" +C4 @ MON TUE WED THU FRI 12:00pm-01:00pm
}
DAY {
21-09-2024
22-09-2024
23-09-2024 T0 10:15am-11:45am T0 01:00pm-02:00pm T0 03:30pm-06:30pm T1 06:30pm-09:00pm E3 E4 E0
24-09-2024 T1 09:00am-10:30am T1 02:30pm-05:30pm T2 05:30pm-09:00pm E1 E4 E2
25-09-2024 T2 10:15am-11:45am T3 01:00pm-02:00pm T3 03:30pm-05:30pm T4 05:30pm-09:00pm E3 E4 E0
26-09-2024 T4 09:00am-10:30am T4 02:30pm-03:30pm T5 03:30pm-07:30pm E1 E4 E2
}
Hope this was an enjoyable read! A lot of time and effort went into this whole thing. I've attached a link below to the parser file for anyone interested:
Parser: https://github.com/AndrewRoe34/quick-sched/blob/master/src/main/java/com/planner/util/Parser.java
Let me know if you have any questions or comments below :)