/r/ProgrammingLanguages


This subreddit is dedicated to the theory, design and implementation of programming languages.

Welcome!


Be nice to each other. Flame wars and rants are not welcome. Please also put some effort into your posts; this isn't Quora.

This subreddit is not the right place to ask questions such as "What language should I use for X?", "What language should I learn?", or "What's your favourite language?" Such questions should be posted in /r/AskProgramming or /r/LearnProgramming. It's also not the place for questions one can trivially answer with a few minutes on a search engine, such as "What is a monad?"


/r/ProgrammingLanguages

103,084 Subscribers

26

Return declaration

Nim has a feature where a variable representing the return value of a procedure is automatically declared with the name result:

proc sumTillNegative(x: varargs[int]): int =
  for i in x:
    if i < 0:
      return
    result = result + i

I think a tiny tweak to this idea would make it a little bit nicer: allow the return variable to be user-declared with the return keyword:

proc sumTillNegative(x: varargs[int]): int =
  return var sum = 0

  for i in x:
    if i < 0:
      return
    sum = sum + i

Is this already done in some other language? Or is there a reason it would be a bad idea?

23 Comments
2024/10/31
15:02 UTC

1

Mechanics of stack management

I've never thought too deeply about how call stacks and stack-based memory allocation work (beyond the basics of the return pointer, tail recursion, and the fact that there's no need to explicitly deallocate memory). But today I was wondering: is the stack used to manage scopes within a function? In particular, how can destructors be called on stack-based items? That is, how do you know the boundaries between items, and the minimal type info that's needed?

I found https://en.wikipedia.org/wiki/Stack-based_memory_allocation and saw its mention of the need for a frame pointer once you have variable length frames, but can folks here recommend a more in-depth treatment of the subject?
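As a starting point before a deeper treatment: one common answer is that the compiler statically knows which locals are live in each scope, so it emits destructor calls at every scope exit, and no runtime type tags are needed on the stack at all. Here is a toy Python model of that idea (not any particular compiler; the `Scope`/`VM` names are made up for illustration):

```python
# Toy model of scope-based cleanup: the "compiler" records, per scope, which
# locals were declared, and runs their destructors in LIFO order at scope
# exit. The stack itself never needs to carry type information.

class Scope:
    def __init__(self, vm):
        self.vm = vm
        self.mark = None

    def __enter__(self):
        self.mark = len(self.vm.cleanup)      # remember the scope boundary
        return self

    def __exit__(self, *exc):
        # run destructors for everything declared since the mark, LIFO order
        while len(self.vm.cleanup) > self.mark:
            name, dtor = self.vm.cleanup.pop()
            dtor(name)

class VM:
    def __init__(self):
        self.cleanup = []   # shadow stack of (name, destructor) pairs
        self.log = []

    def declare(self, name):
        self.cleanup.append((name, lambda n: self.log.append(f"drop {n}")))

vm = VM()
with Scope(vm):
    vm.declare("a")
    with Scope(vm):
        vm.declare("b")
        vm.declare("c")
    # inner scope ends here: c then b are dropped
    vm.declare("d")
# outer scope ends: d then a are dropped

print(vm.log)   # ['drop c', 'drop b', 'drop d', 'drop a']
```

A real compiler does the same bookkeeping at compile time and just emits the destructor calls inline, which is why variable-length frames need a frame pointer but not per-item type info.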

TIA!

3 Comments
2024/10/31
10:27 UTC

22

Lua like language optimization

So I'm working on a Lua-like language while taking a break from my main project. I thought of an interesting optimization for the problem of tables being a massive bottleneck in JIT-compiled Lua-like languages.

Accessing table fields takes a while. But my language has static types, so what if I treat tables whose fields we know in advance as "structs" during compilation? Example:

let Person = {
  name = "",
  age = 0
}

The type of Person is now { "name" String, "age" Number }, but because Person can be cast to a normal table, and still is a table, we can't simply treat it like a struct:

Person.dob = "October 26, 1979"

What if we bring the known fields out of the table, and just reference them from the table? The memory layout would then look something like this:

# The Cover object header
| Size | # Includes the string & number
| Type ID | # Different from a normal table
# the fields 
| Table { Any Any } | # the table
| String | # name
| Number | # age

And in that table we put references to name and age that unwrap automatically if you access them through the table, while the compiler can optimize direct accesses down to indexing into a tuple. And even if we cast a different table to type(Person), it won't cause problems, because the type IDs will be different and we can make sure those fields exist there.
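This is close in spirit to the "hidden class"/shape technique JavaScript engines use. A rough Python sketch of the proposed layout (all names here, like PERSON_SHAPE and TypedTable, are hypothetical):

```python
# Sketch of the proposed layout: statically-known fields live at fixed
# indexes in a slots array (the fast path a JIT can compile to an offset
# load), while ad-hoc fields fall back to a hash table.

PERSON_SHAPE = {"name": 0, "age": 1}   # field -> slot index, known at compile time

class TypedTable:
    def __init__(self, shape, *values):
        self.shape = shape
        self.slots = list(values)      # fixed-offset storage for known fields
        self.extra = {}                # dynamic fields added after a cast

    def get(self, key):
        # Generic path: what an access through a plain table compiles to.
        idx = self.shape.get(key)
        if idx is not None:
            return self.slots[idx]     # the "reference" unwraps automatically
        return self.extra[key]

    def set(self, key, value):
        idx = self.shape.get(key)
        if idx is not None:
            self.slots[idx] = value
        else:
            self.extra[key] = value    # e.g. Person.dob lands here

p = TypedTable(PERSON_SHAPE, "", 0)
p.set("age", 45)
p.set("dob", "October 26, 1979")       # unknown field -> hash table
print(p.get("age"), p.get("dob"))      # 45 October 26, 1979
```

When the compiler can prove the static type, `p.get("age")` becomes a direct `slots[1]` load; only untyped table accesses pay for the dictionary probe.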

EDIT: Some qualifications:

By static types, I mean that this technique requires the static types that are possible in the language. This is still a Lua-like language; by "static" I mean something closer to C# or TypeScript.

28 Comments
2024/10/30
12:11 UTC

3

Matching file system and module/package name with multiple source paths.

I want to support using the file system to define modules/objects, along with subdirectory structures.

So, the file foo.ext would create the object foo, and bar/foo.ext would create the object bar.foo.

All good there.

Now things get slightly convoluted if I support source paths as a list of places where the source can be:

Say I have:

src/
   bar/
      foo.ext
etc/
   bar/
      baz.ext

I can invoke the building tool like:

tool sourcepath: "src, etc"

And it will create bar.foo and bar.baz, as it finds the objects in the directories src/ and etc/.

Now my question: What do you think my tool should do if the source path contains a subdirectory:

tool sourcepath: "src, src/bar"

Strictly speaking, it would create the objects bar.foo (from parent src/ and path bar/foo.ext) and foo (from parent src/bar and path foo.ext).

Should that be a warning? An error? Should that be two different objects with different fully qualified names but same content?
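One option is to detect nested source roots up front and reject them, since overlapping roots are exactly what makes the qualified names ambiguous. A Python sketch of that policy (function names are hypothetical):

```python
# Sketch: refuse a source-path list in which one root is nested inside
# another, since the same file would then get two qualified names
# (bar.foo vs foo in the example above).

from pathlib import PurePosixPath

def check_source_paths(roots):
    paths = [PurePosixPath(r) for r in roots]
    for a in paths:
        for b in paths:
            if a != b and b.is_relative_to(a):
                raise ValueError(f"source path {b} is nested inside {a}")

def qualified_name(root, file):
    rel = PurePosixPath(file).relative_to(root)
    return ".".join(rel.with_suffix("").parts)

check_source_paths(["src", "etc"])                # fine
print(qualified_name("src", "src/bar/foo.ext"))   # bar.foo

try:
    check_source_paths(["src", "src/bar"])        # overlapping roots
except ValueError as e:
    print(e)
```

Erroring out keeps "one file, one fully qualified name" as an invariant; a warning plus picking the outermost root would be the more permissive alternative.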

8 Comments
2024/10/29
21:09 UTC

18

Can you do a C-like language with (mostly) no precedence?

Evaluate right-to-left or left-to-right?

I love APL's lack of precedence, and I love C and C++'s power. I write mostly C++ but have done extensive work in K and Q (APL descendants).

I have been toying with a language idea for about a decade now that is an unopinionated mix of C, C++, Rust, APL, and Java. One of the things I really liked about K was how there is no precedence. Everything is evaluated from right to left (but parsed from left to right). (eg, 2*3+4 is 14, not 10).

Is something like that possible for a C-like language? I don't mind making the syntax a little different, but there are certain constructs that seem to require a left-to-right evaluation, such as items in a struct or namespace (eg namespace.struct.field).

However, function application that allows chaining without the parens (composition) would need to be right-to-left (f g 10). But maybe that isn't a very common case and you could just require parens.

Also, assignment would seem weird if you placed it on the right for left-to-right evaluation, and right-to-left allows chaining assignments, which I always liked in K.

// in K, assignment is : and divide is % and floor is _
up: r * _ (x + mask) % r: mask + 1

With such common use of const by default and auto type inference, this is the same as auto const r = ..., where r can even be constrained to that statement.

But all that requires right-to-left evaluation.

Can you have a right-to-left or left-to-right language that is otherwise similar to C and C++? Would a "mostly" RtL or LtR syntax be confusing (e.g., LtR except assignment; all symbols RtL but all keywords LtR; etc.)?

// in some weird C+K-like mix, floor is a fn, not a keyword
let i64 up: r * floor x + mask / r:mask + 1;
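For what it's worth, a precedence-free right-to-left evaluator is tiny, since the grammar collapses to "value, then optionally an operator and the rest of the expression". A Python sketch (integers and binary operators only; no parens, unary ops, or assignment):

```python
# Minimal K-style expression evaluator: no precedence, everything groups
# to the right, so 2*3+4 parses as 2*(3+4) = 14.

import re

OPS = {"+": lambda a, b: a + b,
       "*": lambda a, b: a * b,
       "-": lambda a, b: a - b,
       "%": lambda a, b: a // b}   # K uses % for divide; integer divide here

def evaluate(src):
    tokens = re.findall(r"\d+|[+*%-]", src)

    def expr(i):
        val = int(tokens[i]); i += 1
        if i < len(tokens) and tokens[i] in OPS:
            op = OPS[tokens[i]]
            rhs, i = expr(i + 1)       # the entire rest of the line binds right
            return op(val, rhs), i
        return val, i

    return expr(0)[0]

print(evaluate("2*3+4"))   # 14, not 10
print(evaluate("2+3*4"))   # also 14: 2+(3*4)
```

Note the parser still scans left to right; only the grouping is right-to-left, which is exactly the K arrangement described above.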
38 Comments
2024/10/28
17:30 UTC

25

R3 programming language

https://github.com/phreda4/r3

R3 is a concatenative language inspired by ColorForth, taking simplification to the maximum. It is strongly typed; in fact, there is only one type of data, so it is not necessary to specify it.
I'm interested in hearing feedback, preferably constructive (but not exclusively), about what people who like language design think of the design decisions.

12 Comments
2024/10/27
13:49 UTC

32

Any success with syntaxes that shun the usual rules/tools?

Following recent discussions here I am wondering if anyone has successfully implemented a syntax they think makes their code beautiful but which shuns the usual tooling and/or approaches, i.e. not LL(k), LR(k), LALR(k) etc.?

I have been toying with the idea for a while but never took the leap because I fell for the academic koolaid.

EDIT: Some great examples here. Zimbu's asymmetric }:

FUNC Main() int
  RETURN exitVal
}

Pipes to demarcate indentation:

int main()
| printf("Hello World\n");
| return 0;

"Eyebrowless" JSON inspired by Jason, where all the :, ,, and " characters have been replaced with minimal whitespace for a 15% space saving, and probably faster parsing to boot:

{id "0001" type donut name Cake
 image {url "images/0001.jpg" width 200 height 200}
 thumbnail {url "images/thumbnails/0001.jpg" width 32 height 32}}

I considered this:

for i ∈ [0..9) do
  print i

C has some ambiguities. Is this a multiplication or a pointer declaration?

T ** c;

Is it a++ + b or a + ++b?

a +++ b

I was hoping for beautiful weirdness but this is interesting!

52 Comments
2024/10/27
11:56 UTC

6

Haskell-typing on Python? Has it been done and Why or Why not?

I really like Haskell types, and maybe even the syntax, but the syntax is weird to most people.

I was wondering if someone has attempted to add Haskell-style typing to Python (mypy isn't really it).

Or.. maybe make a python-like syntax that compiles to Haskell.

Would this be something useful?

12 Comments
2024/10/27
06:35 UTC

93

Turing incomplete computer languages

It seems to be a decent rule of thumb that any language used to instruct a computer to do a task is Turing complete (ignoring finite memory restrictions).
Surprisingly, seemingly simple systems such as PowerPoint, Magic: The Gathering, the Game of Life, x86 mov, CSS, Minecraft, and many more just happen to be Turing complete almost by accident.

I'd love to hear more about counterexamples. Systems/languages that are so useful that you'd assume they're Turing complete, which accidentally(?) turn out not to be.

The wiki page on Turing completeness gives a few examples, such as some early pixel shaders and some languages specifically designed to be Turing incomplete. Regular expressions also come to mind.

What surprised you?

97 Comments
2024/10/26
16:10 UTC

13

Working on a Tree-Walk Interpreter for a language

TLDR: Made an interpreted language (based on Lox/Crafting Interpreters) with a focus on design by contract, and exploring the possibility of having code blocks of other languages such as Python/Java within a script written in my lang.

I worked my way through the amazing Crafting Interpreters book by Robert Nystrom while learning how compilers and interpreters work, and used the tree-walk version of Lox (the language you build in the book using Java) as a partial jumping off point for my own thing.

I've added some additional features, such as support for inline test blocks (which are run/evaluated if you run the interpreter with the --test flag) and built-in design-by-contract support (i.e. preconditions and postconditions for functions, plus assertions). Plus some other small things like user input, etc.

Something I wanted to explore was the possibility of having "blocks" of code in other languages such as Java or Python within a script written in my language, and whether there would be any use case for this. You'd be able to pass data in and out across the language boundary based on some type mapping. The use case in my head: my language is obviously very limited, and doing this would make a lot more possible. Plus, it would be a pretty neat thing to implement.

What would be a good, secure way of going about it? I thought of utilising the Compiler API in Java to dynamically construct classes based on the java block, or something like RestrictedPython.

Here's an example of what I'm talking about:

// script in my language    
    
    fun factorial(num)
        precondition: num >= 0
        postcondition: result >= 1
    {
        // a java block that takes the num variable across the lang boundary, and "returns" the result across the boundary
        java (num) {
            // Java code block starts here
            int result = 1;
            for (int i = 1; i <= num; i++) {
                result *= i;
            }
            return result; // The result will be accessible as `result` in my language
        }
    }
    
    // A test case (written in my lang via its test support) to verify the factorial function
    test "fact test" {
        assertion: factorial(5) == 120, "error";
        assertion: factorial(0) == 1, "should be 1";
    }
    
    print factorial(6);
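One relatively safe way to cross the boundary is to run the foreign block in a separate process and pass captured variables as JSON, since a subprocess gives an OS-level isolation point that in-process tricks (restricted builtins, etc.) do not. A Python sketch of that idea, using a Python child as the stand-in "foreign" language (a Java block could be compiled with the Compiler API and invoked the same way; `run_foreign_block` and the `env` convention are made up for illustration):

```python
# Sketch: serialize the captured bindings to JSON, run the foreign block in
# a subprocess, and read the result back from its stdout. The child is
# handed the bindings as a dict named `env` and must set env['result'].

import json, subprocess, sys

def run_foreign_block(code, bindings):
    wrapper = (
        "import json, sys\n"
        "env = json.load(sys.stdin)\n"
        f"{code}\n"
        "print(json.dumps({'result': env['result']}))\n"
    )
    proc = subprocess.run([sys.executable, "-c", wrapper],
                          input=json.dumps(bindings),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)["result"]

# The factorial block from the post, with `num` passed across the boundary:
block = """
result = 1
for i in range(1, env['num'] + 1):
    result *= i
env['result'] = result
"""
print(run_foreign_block(block, {"num": 5}))   # 120
```

JSON also doubles as the type mapping: anything that survives the round trip (numbers, strings, lists, maps) is shareable, which neatly bounds what can cross the boundary.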
5 Comments
2024/10/26
12:16 UTC

2

Restricted semantics for imperative programming correctness (Reposted Question)

1 Comment
2024/10/26
04:36 UTC

62

Gren: A Language for Modern Software Development [video]

29 Comments
2024/10/25
09:41 UTC

10

Looking for feedback on a concept of a shell.

Long-time Linux user but mostly avoid using Bash since I don't like it. I wrote a programming language once already, and that didn't go so well. The problem I'm having lately is that I can't kick this idea I have for a shell.

I've looked at and/or tried what feels like a dozen alternative shells, and none of them hit that sweet spot. I want a shell that gives me Bash's job control, redirection, and so on, but with control flow that I can remember.

The shell has two modes for parsing. Which one it picks depends on the first token after the initial identifier. It looks like this:

#!/usr/bin/env xyz
print('Hello world!') # ( = exec
echo 'Hello, world!' # String = command
var name = "BOB" # keyword var = exec
@name = @name.lower() # @ required for vars
print('Hi ' ++ @name) # not sure about this concat
echo 'Hi ' @name
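The two-mode dispatch above can be sketched as a small classifier over the leading tokens. The rules below are my guesses at the design (keyword set, the "identifier followed by `(`" test), purely illustrative:

```python
# Sketch of the mode dispatch: look at the first token and the token right
# after the leading identifier, then pick exec mode (real code) or command
# mode (words passed to an external program).

import re

KEYWORDS = {"var", "if", "with", "maybe"}   # assumed keyword set

def classify(line):
    line = line.split("#")[0].strip()        # drop trailing comments
    if not line:
        return "blank"
    if line.startswith("@"):                 # @name = ... is a variable: exec
        return "exec"
    tokens = re.findall(r"@?\w+|\(|'[^']*'|\"[^\"]*\"|\S", line)
    head = tokens[0]
    nxt = tokens[1] if len(tokens) > 1 else ""
    if head in KEYWORDS or nxt == "(":       # keyword, or ident followed by (
        return "exec"
    return "command"                         # bare word + args: run a command

for src in ["print('Hello world!')",
            "echo 'Hello, world!'",
            'var name = "BOB"',
            "@name = @name.lower()"]:
    print(classify(src), "<-", src)
```

The nice property of this scheme is that the decision is made from at most two tokens, so the two grammars never need to be interleaved inside one line.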

I hate memorizing single letters for test, so spell it out:

if Path.is_dir("/bin") {
    ...
}

Core literal types: Integer, String, List, Hash. Not interested in adding floats. Might add actual booleans, or might just use Integer as boolean...

String.to_i_or(...) to prevent the "oops, the string didn't convert, the script is now a bomb" problem.

Core types have a bunch of methods on them: Lists have a push, a size, strings can be split, etc. Strings have ".words" to get individual space-separated words. Possibly ".words_or('...')." Batteries very included.

Non-literal types like Json with special transformers:

var my_data = @["a": 1, "b": 2]

Json.write(@my_data) > some_file.txt

Explicit word expansion:

var abc = "a b c.txt"
ls @abc # ls "a b c.txt"
ls @abc.words() # ls "a" "b" "c.txt"

Environment variable set and get:

var my_dirlist = $['PATH'].dirs_to_list() # convert to List[String]
paths.push($['HOME'])
$['PATH'] = paths.to_dirs()

Builtin commands (e.g., history) have typed alternatives (always? usually?), with the more common flags available as methods:

history -c # clear history
History.clear() # Same, but typed

Running a command with some temporary adjustment should be like this:

with <assignment> { commands }

By default, commands that fail terminate further execution. However, a maybe { ... } block allows commands to fail.

These are the ideas that I have so far. Does this seem like a good foundation for building a shell upon? Are there parts that I have missed?

9 Comments
2024/10/24
22:27 UTC

25

My IR Language

This is about my Intermediate Language. (If someone knows the difference between IR and IL, then tell me!)

I've been working on this for a while, and getting tired of it. Maybe what I'm attempting is too ambitious, but I thought I'd post about what I've done so far, then take a break.

Now, I consider my IL to be an actual language, even though it doesn't have a source format - you construct programs via a series of function calls, since it will mainly be used as a compiler backend.

I wrote a whole bunch of stuff about it today, but when I read it back, there was very little about the language! It was all about the implementation (well, it is 95% of the work).

So I tried again, and this time it is more about the language, which is called 'PCL':

https://github.com/sal55/pcl

A textual front end could be created for it in a day or so, and while it would be tedious to write long programs in it, it would still be preferable to writing assembly code.

As for the other stuff, that is this document:

https://github.com/sal55/pcl/blob/main/pcl2024.md

This may be of interest to people working on similar matters.

(As stated there early on, this is a personal project; I'm not making a tool which is the equivalent of QBE or an ultra-lite version of LLVM. While it might fill that role for my purposes, it can't be more than that for the reasons mentioned.)

ETA: Someone asked me to compare this language to existing ones. I decided I don't want to do that, or to criticise other products. I'm sure they all do their job. Either people get what I do or they don't.

In my links I mentioned the problems of creating different configurations of my library, and I managed to do that for the main Win64 version by isolating each backend option. The sizes of the final binary in each case are as follows:

PCL API Core        13KB      47KB (1KB = 1000 bytes)
+ PCL Dump only     18KB      51KB
+ RUN PCL only      27KB      61KB (interpreter)
+ ASM only          67KB     101KB (from here on, PCL->x64 conversion needed)
+ OBJ only          87KB     122KB
+ EXE/DLL only      96KB     132KB
+ RUN only          95KB     131KB
+ Everything       133KB     169KB

The right-hand column is for a standalone shared (and relocatable) library, and the left one is the extra size when the library is integrated into a front-end compiler and compiled for low-memory. (The savings are the std library plus the reloc info.)

I should say the product is not finished, so it could get bigger. So just call it 0.2MB; it is still minuscule compared with the alternatives. 27KB extra to add an IL + interpreter? These are 1980s microcomputer sizes!

19 Comments
2024/10/24
19:41 UTC

11

UPMS (Universal Pattern Matching Syntax)

Rust and Elixir are two languages that I frequently hear people praise for their pattern matching design. I can see where the praise comes from in both cases, but I think it's interesting how, despite this shared praise, their pattern matching designs are so very different. I wonder if we could design a pattern matching syntax/semantics that could support both of their common usages? I guess we could call it UPMS (Universal Pattern Matching Syntax) :)

Our UPMS should support easy pattern-matching-as-tuple-unpacking-and-binding use, like this from the Elixir docs:

{:ok, result} = {:ok, 13}

I think this really comes in handy in things like optional/result type unwrapping, which can be quite common.

{:ok, result} = thing_that_can_be_ok_or_error()

Also, we would like to support exhaustive matching, a la Rust:

match x {
    1 => println!("one"),
    2 => println!("two"),
    3 => println!("three"),
    _ => println!("anything"),
}

Eventually, I realized that Elixir's patterns are pretty much one-LHS-to-one-RHS, whereas Rust's can be one-LHS-to-many-RHS. So what if we extended Elixir's matching to allow for this one-to-many relationship?

I'm going to spitball at some syntax, which won't be compatible with Rust or Elixir, so just think of this as a new language.

x = {
    1 => IO.puts("one")
    2 => IO.puts("two")
    3 => IO.puts("three")
    _ => IO.puts("anything")
}

We extend '=' to allow a block on the RHS, which drops us into a more Rust-like exhaustive mode. '=' still acts like a binary operator, with an expression on the left.

We can do the same kind of exhaustiveness analysis Rust does on all the arms in our new block, and we still keep the short form for fast Elixir-esque destructuring. I was pretty happy with this for a while, but then I remembered that these two pattern matching expressions are just that: expressions. And things get pretty ugly when you try to get values out.

let direction = get_random_direction()
let value = direction = {
    Direction::Up => 1
    Direction::Left => 2
    Direction::Down => 3
    Direction::Right => 4
}

This might look fine to you, but the back-to-back equals looks pretty awful to me. If only the get-the-value-out operator were different from the do-pattern-matching operator. Except that's exactly the case in Rust. If we pull that back into this syntax by replacing Elixir's '=' with 'match':

let direction = get_random_direction()
let value = direction match {
    Direction::Up => 1
    Direction::Left => 2
    Direction::Down => 3
    Direction::Right => 4
}

This reads clearer to me. But now, with 'match' being a valid operator to bind variables on the LHS...

let direction = get_random_direction()
let value match direction match {
    Direction::Up => 1
    Direction::Left => 2
    Direction::Down => 3
    Direction::Right => 4
}

We're right back where we started.

We can express this idea in our current UPMS, but it's a bit awkward.

[get_random_direction(), let value] = {
    [Direction::Up, 1]
    [Direction::Left, 2]
    [Direction::Down, 3]
    [Direction::Right, 4]
}

I suppose this is really not that dissimilar; maybe I'd get used to it.

So, thoughts? Have I rediscovered something that a language I haven't heard of implemented 50 years ago? Do you have an easy solution to fix the double-equals problem? Is this an obviously horrible idea?

15 Comments
2024/10/24
19:24 UTC

13

[SEI' 24] Modern Systems Programming: Rust and Zig - Aleksey Kladov

0 Comments
2024/10/24
15:59 UTC

6

QuickSched - How I designed an intuitive parser for handling schedule data

QuickSched is a robust, dynamic scheduling platform that is closely configured around user settings and fully synced with Google Calendar. While it's a powerful scheduling tool, I needed a parser that would match that nature.

The core goals for the parser were simplicity and flexibility. Here's how we approached it.

Tokenization

I wanted to minimize the amount of data I needed to store when handling a token. For my system, I decided to classify tokens into one of two categories:

  • Strings
  • Identifiers/Values

Quotation marks were preserved to allow easy identification of token type by the Parser. So, if given the following command, it would be stored as follows:

task "A" 5.5 @ friday // this simply creates task "A" with 5.5 hours due on friday

['task', '"A"', '5.5', '@', 'friday']
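With only two token categories and quotes preserved, the tokenizer fits in one regular expression. A Python sketch of the behavior described above (the comment syntax `//` is taken from the example; the rest is my reconstruction):

```python
# Sketch of the two-category tokenizer: quoted strings keep their quotes so
# the parser can distinguish them from identifiers/values; everything else
# splits on whitespace, after stripping the trailing comment.

import re

def tokenize(line):
    line = line.split("//")[0]                  # drop trailing comment
    return re.findall(r'"[^"]*"|\S+', line)

toks = tokenize('task "A" 5.5 @ friday // creates task "A", 5.5 hours, due friday')
print(toks)   # ['task', '"A"', '5.5', '@', 'friday']
```

The parser can then test `token[0] == '"'` to recognize a String, with everything else treated as an identifier/value.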

Parsing General

One of the key goals of the parser for QuickSched was to allow a logical flow of data. Below is the grammar for an event/timeblock (<> represents required arguments, [] means optional):

event bool <name> [cardId] @ [date] <timestamp>

While we could have placed restrictions on where the overall time expression was supposed to go (starting with '@'), we wanted maximum flexibility. So the parser was designed to continue reading all time-related data until:

  1. It reached the end of line
  2. It reached a non time token (upon which, it would then check if it was a name or cardId)

Users can provide days of the week and time expressions in whatever order they prefer (though, duplicates are not permitted for obvious reasons).

Here's an example of a recurring timeblock:

event true @ mon wed fri 11-12:45 +C2 "Class2"

The beauty of this when compared to say TaskWarrior is the simplicity of the command and how it all flows.

It reads: "Create a recurring event at mon, wed, fri from 11am-12:45pm with card id 2, and the name 'Class2'".

Parsing TimeStamps

Since timeblocking is such a common occurrence in scheduling, we decided we needed to tolerate error-prone timestamp expressions that omit information (yet clearly imply what is desired). The rules for our timestamp expressions were as follows:

  • A dash (`-`) is required to separate the start and end times.
  • Minutes must be specified with two digits (e.g., `:15`), or not at all.
  • Colons (`:`) are only required when minutes are included.
  • If the end hour is less than the start, the end hour is assumed to be pm.
  • To mark the start hour as pm, "pm" must be included.
  • 24-hour time formats are not permitted.
  • Timestamps cover a single day and must start and end within the same day.

And here is a list of all valid timestamp formats:

Timestamp Formats:
 - 9-2                 (9:00am-2:00pm)
 - 9-2:15              (9:00am-2:15pm)
 - 9:-2:15             (9:00am-2:15pm)
 - 09:00am-02:15pm     (9:00am-2:15pm)
 - 09:am-2:15pm        (9:00am-2:15pm)
 - 9:am-2:15pm         (9:00am-2:15pm)
 - 9-2:15pm            (9:00am-2:15pm)
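The rules above fit in a few lines. This Python sketch is my reconstruction of the logic (the exact inference in QuickSched's Parser.java may differ, and ambiguous cases here follow my reading of the rules):

```python
# Sketch of the timestamp rules: split on '-', parse each side as
# hour[:MM][am|pm], then infer meridiems: the start defaults to am unless
# 'pm' is written, and an end hour smaller than the start is assumed pm.

import re

PART = re.compile(r"^(\d{1,2}):?(\d{2})?(am|pm)?$")

def _part(s):
    m = PART.match(s)
    if not m:
        raise ValueError(f"bad timestamp part: {s}")
    return int(m.group(1)), int(m.group(2) or 0), m.group(3)

def parse_timestamp(text):
    start_s, end_s = text.split("-")
    h1, m1, mer1 = _part(start_s)
    h2, m2, mer2 = _part(end_s)
    if mer1 is None:
        mer1 = "am"                      # start is am unless 'pm' was written
    if mer2 is None:
        mer2 = "pm" if (h2 % 12) < (h1 % 12) or mer1 == "pm" else "am"
    return f"{h1}:{m1:02d}{mer1}-{h2}:{m2:02d}{mer2}"

for t in ["9-2", "9-2:15", "9:-2:15", "09:00am-02:15pm", "9:am-2:15pm", "9-2:15pm"]:
    print(t.ljust(18), "->", parse_timestamp(t))
```

All six sample formats normalize to the same canonical 9:00am-2:15pm-style output listed above, which is the point of allowing the omissions in the first place.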

Serialization Benefits

One of the great benefits we saw from creating this parser was that we were able to repurpose it for our Serialization Tooling. Our serializer was designed to handle basic pointer handling (e.g. Events referencing Cards) along with rebuilding the entire schedule. It stored the exact same commands and inferred the current/past dates to automatically archive past Tasks.

Example of a sample serialization file:

CARD {
    "MA" LIGHT_BLUE
    "LIT" LIGHT_GREEN
    "CS" LIGHT_CORAL
    "PHI" YELLOW
    "Food" BLUE
}

TASK {
    "Start Final Literature Project" 5.5 +C1 @ 27-09-2024
    "Work on Group Coding Project" 7.0 +C2 @ 28-09-2024
    "Study for Philosophy Midterm" 5.0 +C3 @ 29-09-2024
    "Complete Math Extra Credit" 3.0 +C0 @ 30-09-2024
    "Submit Final Philosophy Essay" 6.0 +C3 @ 01-10-2024
    "work" 4.0 @ 05-10-2024
}

EVENT {
    true "CS Class" +C2 @ MON WED FRI 02:00pm-03:30pm
    true "LIT Class" +C1 @ TUE THU 10:30am-12:00pm
    true "PHI Class" +C3 @ TUE THU 01:00pm-02:30pm
    true "MA Class" +C0 @ MON WED FRI 09:00am-10:15am
    true "Lunch" +C4 @ MON TUE WED THU FRI 12:00pm-01:00pm
}

DAY {
    21-09-2024 
    22-09-2024 
    23-09-2024 T0 10:15am-11:45am T0 01:00pm-02:00pm T0 03:30pm-06:30pm T1 06:30pm-09:00pm E3 E4 E0 
    24-09-2024 T1 09:00am-10:30am T1 02:30pm-05:30pm T2 05:30pm-09:00pm E1 E4 E2 
    25-09-2024 T2 10:15am-11:45am T3 01:00pm-02:00pm T3 03:30pm-05:30pm T4 05:30pm-09:00pm E3 E4 E0 
    26-09-2024 T4 09:00am-10:30am T4 02:30pm-03:30pm T5 03:30pm-07:30pm E1 E4 E2 
}

Conclusions

Hope this was an enjoyable read! A lot of time and effort went into this whole thing. I've attached a link below to the parser file for anyone interested:

Parser: https://github.com/AndrewRoe34/quick-sched/blob/master/src/main/java/com/planner/util/Parser.java

Let me know if you have any questions or comments below :)

4 Comments
2024/10/24
04:41 UTC
