/r/ProgrammingLanguages

Photograph via snooOG

This subreddit is dedicated to the theory, design and implementation of programming languages.

Welcome!

This subreddit is dedicated to the theory, design and implementation of programming languages.

Be nice to each other. Flame wars and rants are not welcomed. Please also put some effort into your post, this isn't Quora.

This subreddit is not the right place to ask questions such as "What language should I use for X", "what language should I learn", "what's your favourite language" and similar questions. Such questions should be posted in /r/AskProgramming or /r/LearnProgramming.

Related subreddits

Related online communities

/r/ProgrammingLanguages

97,559 Subscribers

22

How can I handle standard output for my language if it is strongly typed and does not provide functions overloading?

EDIT : I added a bunch of context under u/L8_4_Dinner's comment.

Basically, I am creating a small specialised language as a side project.

It's a pretty simple C-like syntax language, nothing that specific about it except it is statically typed and will support some light structure types. The syntax is not 100% final but some simple code could look like this :

struct Dog = {
  num legCount = 4;
  str name;
}

func sayHello(Dog d) {
  // here lies my question
  console(d.name);
}

My question is : what should be the signature of my "console" function? Which ofc is provided by my language API. Knowing that functions do not support optional types, you cannot overload functions and don't have access to any placeholder type (like Object in Java or any in TypeScript).

This function should be able to take any number of parameters of any type and log them to the console (think console.log in JS). My language do not provide any way to express this (and it is intentional).

What should I do from here?

  • Find a way around, like create a specific console function for each primitive type. This one is a bit ugly but at least doesn't break the realm I created with the language. Also, still need to figure out a solution for structures.
  • Just cheat it? The language is interpreted, nothing prevent me to create an exception and have this function ignore any parameters typing rules and just do it's job. I am just unsure on a language design POV if this is good practice : providing to the user APIs that do not conform to the rules he is himself imposed?
  • Make it a statement and integrate it inside the language syntax? Similar to the "print" statement in Lox.

Then I have a last idea that I find super cool, but I have the feeling there is a catch : make it an expression, that doesn't do anything special except logging the value and resolving it. It could lead to pretty neat syntaxes like :

while(console i < 10) {
  // would console the value of i at each evaluation
}

// Is also supported because EXPRESSION + ";" is a valid statement
console "Hello, world!";

This expression would ofc need a very high precedence, I would think the only expressions with higher precedence are literals, identifiers and unary (!, -) expressions.

So, is this a common problem? I am very new to all this, probably I missed the whole point or this is already highly discussed.

35 Comments
2024/04/25
14:39 UTC

2

should i allow overloading operators for not my types?

hi, I'm making a sort-of-mix-between-JS-and-CSharp programming language, and I want it to support operator overloading, so that I can, e.g., add arbitrary types, like Points for instance.

I'm thinking if I should support overloading types that I do not own?

I.e. should I allow an operator definition be a standalone thing, like functions, or should it be inside a class and be like a method?

1:

operator + (lhs: TypeIDoNotOwn, rhs: Int) : Int => ...

2:

class TypeIOwn {
    operator+ (rhs: Int): Int => ...
}

2 can also require two operands (and therefore make an operator a static member), and require one of the operands be of this type, but that is outside the scope of the question.

the 1st option will be basically like extension methods, but extension... operators?

since operators basically have function semantics anyway (you can assume 1 + 2 is functionally equivalent to something akin 1.operator+(2)), is it harmful to allow users to define custom operators on all types, even those that are defined in a separate assembly/module and they therefore cannot define it there?

if it's important, the class structure cannot change at runtime, so it's more similar to C# in that regard. there are no prototype chains like in JS.

7 Comments
2024/04/25
10:39 UTC

17

Re-re-re-thinking for and while loops

Hello again, friends, critics, and assorted rubber ducks.

Despite Pipefish being functional, I don't want to use recursion for things that you wouldn't use recursion for in a normal language. And so despite my general wish to keep the language small and TOOWTDI, I've supplied a bunch of ways to iterate on things. For example the mapping operator >>. Example of usage, [1, 2, 3] >> that + 1 returns [2, 3, 4]. The compiler classifies that as a Very Local Variable, it just exists in the scope of the right-hand-side of the >> operator.

(Note that such a variable doesn't really violate referential transparency, because from the point of view of the semantics of the language, it doesn't take on all the values in the list any more than the x in ∀x ∊ ℕ : x ≥ 0 takes on all the values in , or the i in big sigma notation takes on every value in its range.)

And I also felt the need for for and while, hence this post. The while loop I initially just implemented in Pipefish as a higher-order function with signature while (condition func) do (action func) to (data tuple). Example of usage --- this adds up the numbers from 0 to n - 1.

triangle(n) : 
    (while unfinished do add to 0, 0)[1]
given : 
    unfinished(counter, total) : counter < n
    add(counter, total) : counter + 1, total + counter

For those who don't know Pipefish, the given block defines inner functions. Note the capture of n. This is standard functional stuff. Note also the indexing, the [1]. This is because we've been iterating on two values, the counter and the total, but we only want the total. This sort of situation is also common in functional languages.

Doing a for loop was hairier and needed me to hardwire the semantics. What I came up with was something like this ...

triangle(n) : 
    for i in 1::(n + 1) do add to 0
given : 
    add(x) : x + i

... which worked fine when I was using an evaluator because then the i in the function can be the same as the i in the for clause just because they have the same name. In a compiler however, this raises some nasty issues because they have to refer to the same memory location. But a function can't always know whether it's going to be passed to a for loop ...

So, next thought, we need some closer relation between the for loop and its body then its body just being a function it gets passed. Let's make a for block:

triangle(n) :
    for i in 0::n do to 0 :
        i + ... ?

... wait, we now have another problem. Working with functions may have been somewhat cumbersome, but it supplied us with a name, x, for the thing we wanted to add i to. Now we must do it ourselves:

triangle(n) :
    for i in 0::n with x::0 :
        x + i

(Please note that I am not at all wedded to the with <name>::<expression> syntax. I'm open to suggestions any so long as they can also be used for while loops --- see below. starting with would be clearer but I think excessively verbose. ETA --- how about from? That gives the idea of initialization and unlike with with I'd feel happier about using = rather than ::. So for i in 0::n from x = 0 : .... Or perhaps it would make more sense to put the initialization first? from x = 0 for i in 0::n : ... ?) Or avoid the keyword by recycling Go's := variable initializer, which I haven't used yet: x := 0; for i in 0::n : ... I've got lots of syntactic options. I wouldn't say no to more.)

The body of the loop supplies an expression giving the new value of x. Using multiple returns we can deal with multiple variables:

fib(n) :
    for i in 0::n with a::0, b::1 :
        b, a + b

Note that this is one of those cases where we end up with too many return values, since we get back a tuple consisting of (the final values of) a, b, when all we want is a. We could perhaps use the names of the variables as syntactic sugar for indexing :

fib(n) :
    (for i in 0::n with a::0, b::1 : b, a + b)[a]

We can do the same sorts of things with while loops, and as we can, we should. To illustrate, let's write a non-recursive Collatz function:

collatz(n) :
    while i != 1 with i::n :
	i % 2 == 0 :
            i / 2
        else :
            3 * i + 1

Unlike in the function-based implementation, break statements could become meaningful. E.g. this function would be equivalent to the previous one:

collatz(n) :
    while true with i::n :
        i == 1 :
            break
	i % 2 == 0 :
            i / 2
        else :
            3 * i + 1

Apart from the addition of break, everything in the body of these while or for loops is an ordinary functional Pipefish expression. We don't reassign any variables, we don't mutate any values, we don't perform IO, we just evaluate an expression. Purity, immutability, and referential transparency are maintained. (Or, at least that's true from the point of view of the language semantics. In reality there'll be a perfectly ordinary while loop under the hood reassigning variables like crazy.)

So that's about as far as my thinking has got. It seems to me that it's more ergonomic than the usual HOF approach to loops in functional languages, while still preserving the semantic guarantees that are important to the language. For example, in an imperative language if we wrote something like for i::v in myList ... then we'd have to decide semantic issues like:

  • What happens if you reassign i during the loop?
  • What happens if you mutate the contents of v during the loop?
  • What happens if you mutate myList during the loop?
  • What values should you get if you inspect i and v after the termination of the loop?

In Pipefish the answer is uniformly: "The language offers no facilities to even try doing any of those things."

I have a bit more time to think about it before I have to settle on something, and I would welcome your feedback, ideas, criticisms, etc. Thank you!

19 Comments
2024/04/25
04:08 UTC

5

PLs that allow virtual fields?

I'd like to know some programming languages that allow virtual fields, either builtin support or implemented with strong metaprogramming capabilities.

I'll demonstrate with python. Suppose a newtype Temperature with a field celsius:

class Temperature:
    celsius: float

Here two virtual fields fahrenheit and kelvin can be created, which are not stored in memory but calculated on-the-fly.

In terms of usage, they are just like any other fields. You can access them:

temp = Temperature(celsius=0)
print(temp.fahrenheit)  # 32.0

Update them:

temp.fahrenheit = 50
print(temp.celsius)  # 10.0

Use them in constructors:

print(Temperature(fahrenheit=32))  # Temperature(celsius=0.0)

And pattern match them:

def absolute_zero?(temp: Temperature) -> bool:
    match temp:
        case Temperature(kelvin=0): return true
        case _: return false

Another example:

class Time:
    millis: int
    
# virtual fields: hours, minutes

time = Time(hours=4)
time.minutes += 60
print(time.hours)  # 5
16 Comments
2024/04/24
03:51 UTC

29

Thoughts on language design and principled error handling / reporting?

Today at work I couldn't help but get overly frustrated over some of the ergonomics of error reporting in gradle. Specifically, how gradle reports errors from command line invocations, though there are of course many more examples of rage-inducing error reporting anti-patterns I could use as anyone who has used gradle could attest to.

Essentially, gradle will report something like "command foo returned error code 1" -- and then you have to either scroll up to (hopefully) find some clues as to what actually went wrong, such as standard out messages from running the command, or even exactly what command has been invoked (e.x. to help figure out potential syntax issues).

I use that example as it's the freshest in my memory, but I think most programming environments are rife with such issues of less-than-informative or badly presented error messages.

To some extent I think some level of "cryptic-ness" in error messages is unavoidable. Programming environments are not AGIs, so there will always have to be some level of deduction from the reader of an error message to try to figure out what is "really going on", or what the root causes of something are.

Still, I can't but think if people gave the area of "error message / debugging ergonomics" a bit more thought, we could make error reporting and debugging in our languages / environments a lot more pleasant. I look at Eve's inspector tool as a kind of solution that is criminally under-explored and under-used in production. Or even tools like time-traveling debuggers.

We can talk about exceptions v.s. typed errors of various kinds, but I think in a lot of ways that's a pretty surface-level distinction, and there's much more we could be thinking about in the realm of how to improve error handling / reporting.

For example, one of my biggest pet peeves in error reporting is vague exceptions -- think "File system error: Could not create file" v.s. "File system error: Could not create file /path/to/specific/file/i/couldnt/create". In my opinion, unless you have very specific security reasons not to, I would almost always rather see the latter rather than the former error message -- and whether exceptions were used to "raise" the error or an Either monad doesn't particularly matter.

This got me thinking: Is it possible (either in the language runtime, or somehow via static constraints in the compiler) or enforce, or at least highly encourage the latter more specific style of error reporting? This isn't something I've seen much discussion about, but I think it would be highly beneficial if possible.

One random thought I had on the runtime side would be, why not include alongside stack traces a printout of some of the value of the local variables "close" to where the exception was thrown? For example, this would be similar to the error message Haskell shows when it encounters a typed hole -- a listing of some potentially relevant expressions in scope (except in this case we would be interested in the concrete value of the expressions, not just their type). I'm sure there would be performance considerations with this, but has anyone tried anything of this nature? Perhaps the benefits would outweigh the costs at least in some scenarios.

Another pet peeve of mine in my day job working in a primarily statement-oriented language (Kotlin) in a more FP / expression-oriented way is that oftentimes just a stack trace is not incredibly useful. Sometimes I find myself wanting the "expression trace" or "event trace" (my coinages -- as far as I know there isn't established terminology for what I'm thinking of here).

For example, given a divide by zero exception, the stack trace may not necessarily line up with the trace through the code-base of where the offending zero may have came from. I haven't fully fleshed this idea out yet, but an "expression trace" would show you (in backwards term reductions) where the 0 / 0 came from (e.x. 0 / 0 -> 0 / (5 - 5) -> 0 / (5 - foo()), which would then tell you the evaluation of foo is what led to the divide be zero exception. This strikes me as very similar in spirit to the Eve inspector, but not limited to UIs.

"Event trace"s would more-or-less simply be a log that reports the list of "events" that led the user of an application to an exception in an event-driven architecture like The Elm Architecture or Event Sourcing. Perhaps such reporting is already present in some frameworks, but I think it would be beneficial if such an error reporting mechanism was as well-integrated into a language as stack traces are in most languages. Perhaps it's even possible somehow to implement something like this for a more general FRP system, rather than just for a stricter architecture like Elm.

These are just a few of the ideas I've had over the years. Has anyone else thought along similar lines of how PL design / runtime / tooling could be used to improve the developer experience of error reporting? Am I missing some blog posts / papers on this very topic, or am I justified in thinking some of these ideas are under-explored?

26 Comments
2024/04/24
02:13 UTC

19

implications on precedence from merging = and ==

In my language, there is no distinction between = and ==. This means you can compare values using =:

case user.first_name = "John" {
  println("Do something");
}

or you can declare variables:

val user = User(.first_name: "John", .last_name: "Adams");

In my grammar, there is only one version of =, and it's alongside all of the other comparison operators:

(EBNF)

cmp = [ unknown, ( "=" | "!=" | "<" | "<=" | ">" | ">=" ) ], unknown ;

"unknown" actually has relevance to how you can declare variables:

unknown = bitwise | ( "val" | "var" | "set" ), name, [ ":", bitwise ] ;

Basically, any expression can represent an "unknown," and that unknown will get filled in if possible via destructuring. Obviously, unknowns only make sense in contexts where their value can be deduced by the compiler, which is mainly on the LHS or RHS of an = expression.

I find this very elegant, but it means that = has higher precedence than && and ||, because you need to be able to use = in conditions without ridiculous amounts of parentheses. But of course this means if you want to assign an expression that has && or ||, you need parentheses:

val shouldAllowAccess = (record.is_public || record.owner = auth.user);

This alone is kind of okay, but I definitely want to be able to assign an unbracketed lambda to a variable without surrounding parentheses, like so:

val full_name = (val user: User) => "\(user.first_name) \(user.last_name)";

So this means that you also need parentheses around conditions in unbracketed lambdas:

val has_name = (val user: User) => (user.first_name != nil && user.last_name != nil);

Finally, and another thing that I am doing solely for elegance, is that defining named functions uses the same grammar rules (and hence identical syntax) as lambdas. So this means that even at the top level, you'd need to use parentheses around complex conditions:

def has_name(val user: User) => (user.first_name != nil && user.last_name != nil);

I wanted to see what people's thoughts were about this compromise of && and || having lower precedence than = and =>, which in pretty much every other language always have the lowest precedence.

edit: more justification for the merge:

here's an example of pattern matching / destructuring in my language, like in rust and swift:

// a variant / tagged union with a simple AST
type Expr
  .Add(Expr, Expr)?
  .Neg(Expr)?
  .Val(Int);

// a tree walking interpreter
def eval(val expr: Expr) {
  case expr = Expr.Add(val a, val b) {
    eval(a) + eval(b)
  } else expr = Expr.Neg(val a) {
    -eval(a)
  } else expr = Expr.Val(val a) {
    a
  }
}

by merging = and ==, we basically get if let / if case let for free in terms of additional syntax

34 Comments
2024/04/24
01:08 UTC

0

Writing the compiler in plang language - first step

1 Comment
2024/04/23
17:37 UTC

5

Possible solution to the problem of references in programming languages

Every programmer is familiar with the concept of "reference." This term usually refers to a small object whose main task is to provide access to another object physically located elsewhere. Because of this, references are convenient to use, they are easily copied, and they make it very easy to access the object to which the reference points, allowing access to the same data from different parts of the program.

Unfortunately, manual memory management, or more precisely, manual memory control, is the most common cause of various errors and vulnerabilities in software. All attempts at automatic memory management through various managers are hampered by the need to control the creation and deletion of objects, as well as periodically run garbage collection, which negatively affects application performance.

However, references in one form or another are supported in all programming languages, although the term often implies not completely equivalent terms. For example, the word "reference" can be understood as a reference as an address in memory (as in C++) and a reference as a pointer to an object (as in Python or Java).

Although there are programming languages that try to solve these problems through the concept of "ownership" (Rust, Argentum, or NewLang). The possible solution to these and other existing problems with references will be discussed further.

What types of references exist?

For example, in the C language, there are pointers, but working with them is not very convenient and at the same time very dangerous due to the presence of pointer arithmetic (the ability to directly modify the pointer address to data in the computer's memory). In C++, a separate entity appeared — reference, and in C++11, references received further development, rvalue-references appeared.

While in C++, there are several types of references at once, Python developers probably deliberately tried to "simplify" working with references and generally abandoned them. Although de facto in Python, every object is a reference, although some types (simple values) are automatically passed by value, whereas complex types (objects) are always passed by reference.

Circular references

There is also a global problem of circular references that affects almost all programming languages (when an object directly or through several other objects points to itself). Often, language developers (especially those with garbage collectors) have to resort to various algorithmic tricks to clean up the pool of created objects from such "frozen" link and circular references, although usually this problem is left to developers, for example, in C++ there are strong (std::shared_ptr) and weak (std::weak_ptr) pointers.

Ambiguous semantics of references

Another equally important but often ignored problem with references is the language semantics for working with them. For example, in C/C++, separate operators like asterisk "*", arrow "->", and dot "." are used to access data by reference and by value. However, working with reference variables in C++ is done as with regular variables "by value", although in fact this is not the case. Of course, with explicit typing in C++, the compiler will not allow mistakes, but when reading the code normally, you will not be able to distinguish a reference variable from a regular one "by value".

In Python, it is very easy to confuse how a variable will be passed as an argument to a function, by value or by reference. This depends on the data contained in the variable. The fact that Python is a dynamically typed language adds a special twist, and in general, it is not known in advance what value is stored in the variable.

Who is to blame and what to do?

It seems to me that the main reason, at least for the ambiguous semantics, is the constant growth of complexity of development tools, and as a result - the complication and refinement of the syntax of programming languages for new concepts and capabilities while maintaining backward compatibility with old legacy code.

But what if we start from a clean slate? For example, a universal concept of object and object reference management that does not require manual memory management by the user (programmer), for which no garbage collector is needed, and errors when working with memory and object references become impossible due to full control of memory management even at the stage of compiling the application source code!

Terms:

Object - data in the computer's memory in machine (binary) representation.
Variable - a human-readable identifier in the program body, which is uniquely determined by its name and identifies the object (the actual value of the object or a reference to it). Variables can be:

  • Owner variable - the only constant reference to the object (shared_ptr).
  • Reference variable - a temporary reference to the object (weak_ptr).

Possible operations:

  • creating a new owner variable and initializing the object value.
  • creating a reference variable to an existing owner variable.
  • assigning a new value to the object by the owner variable name.
  • assigning a new value to the object pointed to by the reference variable.
  • assigning a new reference to the reference variable from another owner variable.

Example of source code and conditional abbreviations:

  • "&" - creating a reference to a variable
  • "*" - accessing object data

# variables - owners
val1 := 1;  
val2 := 2;  
# val1 = 1, and val2 = 2  
  
val1 = val2; # Error - only one owner!  
*val1 = *val2; # OK - value is assigned 

# To avoid too many asterisks in your code, you can omit them when accessing
# object data if the owner variable is used in an expression as an rvalue
*val1 = val2; # This is also valid  
# val1 = 2, and val2 = 2  
  
# variables - references  
ref1 := &val1;  
ref2 := &val2;  
# ref1 -> val1, and ref2 -> val2  
  
ref1 = 3; # Error - variable is a reference!  
*ref1 = 3; # OK - value is assigned "by reference"  
# val1 = 3, and val2 = 2  
# *ref1 = 3, *ref2 = 2  
  
*ref2 = *ref1; # OK - one value is assigned to another value "by reference"
# val1 = 3, а val2 = 3
# *ref1 = 3,  *ref2 = 3

ref1 = ref2; # OK - one link is assigned to another link
# ref1 -> val2, а ref2 -> val2
# *ref1 = 3,  *ref2 = 3

*val2 = 5;
# *ref1 -> 5,  *ref2 -> 5

With this syntax, everything becomes simple, visual, and clear. But if I missed something somewhere, please write in the comments or in a personal message.

39 Comments
2024/04/23
16:14 UTC

14

How do you properly tokenize expressions?

If I just split using spaces then things like int i=8 wouldn't work and if I pass an empty character to most split functions then numbers like 22 and 8.5 will get messed up.

Regex could be a proper way to do it but I avoid them as it's easy to shoot yourself in the foot, any advice?

21 Comments
2024/04/23
07:06 UTC

12

Last element in an array

In my programming language, arrays are 1-based. It's a beginner programming language, and I think there's a niche for it between Scratch and Python. 1-based arrays are the exception today, but it used to be common and many beginner and math-oriented languages (Scratch, Lua, Julia, Matlab, Mathematica ...) are also 1-based nowadays. But this should not be the topic. It's about array[0] - I think it would be convenient to take that as the last element. On the other hand, a bit unexpected (except for vi users, where 0 is the last line). I don't think -1 fits because it's not length-1 either, like in Python for example.

90 Comments
2024/04/22
15:36 UTC

7

Advent of Computing: Episode 130 - ALGOL, Part II

0 Comments
2024/04/22
06:18 UTC

0

Programming language features

I might make a programming language, possibly named Avenge, I'm wondering what features are in high demand that people might want. Here's what I've thought of so far:

  • Static typing with basic types like int, String, float, etc.
  • Introducing strict and loose typing for variable mutability (strict for constants, loose for changeable values; defaulting to Python-like behavior if no type specified)
  • Variables in Avenge: (Type) (strict/loose) (name) = (value)
  • Can't decide between curly braces or Python-style indentation for code structure
  • Manual memory management

Still in the early concept phase, so I'm open to suggestions for more features or tweaks to these. This is a serious thread.

38 Comments
2024/04/21
20:17 UTC

1

Some questions on Harper's Practical Foundations for Programming Languages

So I've started reading this book and I have two questions:

  1. At the top of page 6, the inclusion relation is used. Is it just ordinary inclusion? I've started to suspect that "Y includes X" actually means that: a) every set in X is a subset of the corresponding set (that is, the set indexed with the same sort) in Y b) Y might contain some additional sets that correspond to "new" sorts which are not in X.
  2. In Theorem 1.1, shouldn't subscripts indicating sorts be used for the proof to be fully rigorous?

Thanks in advance for your answers!

4 Comments
2024/04/21
13:01 UTC

23

Best way to parse binary operations

I was wondering what the best way is to parse binary operations like 1 + 2 or 1 + 2 + 3 etc. I know the shunting yard algorithm but don’t think it works within a recursive descent parser for a programming language. What would be the best way to parse these kind of expressions?

41 Comments
2024/04/21
08:01 UTC

9

Tree for multiple dispatch etc.

I invented this, though a lot of other people probably invented it first. (If anyone has a reference please let me know!) It's useful for multiple dispatch and I presume pattern-matching.

To illustrate it, let's have some overloaded functions (this is real, you can do this in Pipefish though you shouldn't).

In this example single? is a supertype of all the other types, and single is a supertype of all the other types except null. So here are ten functions, each with a different signature and printing a different letter of the alphabet.

def

foo(x string) : "A"
foo(x single) : "B"
foo(x bool) : "C"
foo(x bool, y bool) : "D"
foo(x int, y bool) : "E"
foo(x int, y bool, z bool) : "F"
foo(x int, y int) : "G"
foo(x bool) troz (b bool) : "H"
foo(x single, y bool) : "I"
foo(x single?, y bool) : "J"

As usual we want the dispatch to be resolved by specificity, e.g. foo 8, true should return E and not H, because int is a more specific type than single.

What the compiler does is turn the function signatures into a tree which looks like this:

string
    foo(x string) : "A"
    bool
        foo(x single, y bool) : "I"
bool
    foo(x bool) : "C"
    bool
        foo(x bool, y bool) : "D"
    troz
        bool
            foo(x bool) troz (b bool) : "H"
int
    bool
        foo(x int, y bool) : "E"
        bool
            foo(x int, y bool, z bool) : "F"
    int
        func(x int, y int) : "G"
    foo(x single) : "B"
single
    foo(x single) : "B"
    bool
        foo(x single, y bool) : "I"
single?
    bool
        foo(x single?, y bool) : "J"

The basic rule is, you start at the top left looking at the first argument on the list, and if the type doesn't match, you move down, and if it does, you move across and apply the same rule to the next argument on the list. When you've run out of arguments, you see if there's a function at the node you've arrived at. If there is, that's the function you're looking for. If there isn't, that's a type error; if you fall off the right hand side or the bottom of the tree without finding a function, that's a type error.

The point of this is that the tree is structured so that we never have to backtrack.

This was maybe over-engineered for the prototype version with the evaluator but now I'm doing the VM (nearly finished BTW!) it's kind of crucial. Pipefish, you see, does a sort of best-effort typechecking, so the compiler may end up being able to erase some but not all of the types from the logic. So given what the compiler can infer about the types in any given context, it can navigate through the tree by moving freely when it can do type inference and by emitting a conditional jump in the bytecode to dispatch at runtime when it can't.

I would not like to do this without my faithful non-backtracking tree structure to guide the code generation, it has done well by me, and I recommend it to people who want to do similar things.

18 Comments
2024/04/21
07:59 UTC

10

Looking for papers and works on implementing session types in Swift

I'm starting to work on my bachelor-degree thesis which aims to verify whether the characteristics and peculiarities of the Swift language allow the implementation of session types. I found works and implementations in other languages like Rust, Haskell, and OCaml. Does anyone know if there are similar works about Swift?

2 Comments
2024/04/21
07:46 UTC

37

Status of 1ML like languages

I was talking with a coworker the other day and had a thought that 1ML is pretty close to the core of my ideal language. It looks like research fizzled out after the 2016 paper though. I was wondering if anyone else knew of similar work being done.

For those less familiar. 1ML is an attempt to combine the core and module languages of the ML family. There's also a paper on extending it with effects which gives a unified description for generative versus applicative module composition among the usual benefits of effect systems.

37 Comments
2024/04/19
21:36 UTC

6

Does anyone know of a good SMT solver with a theory of Lists?

Hi,

I was screwing around with the package SBV on haskell (which provides an interface betweem haskell and SMT solvers) and using SBV tried to solve the following question

Does there exist a list ls such that length (filter (== 2) ls) == 2

And sadly SBV using the z3 solver could not solve this. I was wondering if there was any SMT solver that could solve

Does there exist a list ls such that length (filter (== 2) ls) == 1000

Thanks

6 Comments
2024/04/19
19:50 UTC

14

How to do error handling with exception and async code?

We have two ways of dealling with errors (that I'm aware of):

  • by return value (Go, Rust)

  • by exception

if you look at Go or Rust code, basically every function can fail and most of your code is dealing with errors over focussing on the happy path.

This is tedious over having a big `try {}` and catch each type of error separately, grouping your error handling for a group of function and having the error and happy path quite separate. You can even catch few function call lower to make things simpler for you and grouping even more function in your error handling.

Now let's introduce "async / await" in the equation...

with the return value approach, when you need the value, you await, you check for error then use the value if there is no error or you deal with the error.

with exception you get a future that would make you leave the catch block then you will continue code execution but then an exception occur and this is where I'm so confused. Who catch the exception?

Is it the catch block where my original call was? is it some catch block that don't exist in the rest of my code because I'm suppose to guest when my async call will throw? Does the "main" code execution stop even if it has move forward? I just can't understand how things work and how to do good error handling in this context, can someone explain to me? For reference I currently code in Dart

18 Comments
2024/04/19
16:30 UTC

18

Regex as program control flow

I'm exploring a possibility of feeding a specialized input to regex to control an imperative flow of variable assignments.

Let's have statements, still unrelated to regex, and let each statement have two parts: input and output. Input consists of var name and var value. Output also consists of var name and var value. Consider the input as a named var that needs to contain noted value to trigger the output assignment of value to some other named var. This is like an if-then statement, but with some positive properties regarding to regex unrelated pattern matching. Statements can also be consisted of only output part, in which case they are unconditionally executed when their time comes.

Now, let's modify the usual regex definition a bit to operate not on characters, but on statements introduced above. Regex would, as usual, have sequences, Kleene stars, and alternations. Thus, instead of reading characters, the modified regex would read variables from input parts, parallelly triggering their output pairs on successful match.

Drawing a parallel between Kleene star and while loops, and between alternations and compound if-then-else blocks, this way it should be possible to construct different algorithms, right?

As an example, Euclidean algorithm for greatest common divisor would take the following form (note that this is very rough and unrefined form of syntax):

(
    FUN
    (NAME GCD)
    (PARAMS X Y)
    (
        SEQ
        (STMT (OUT a X))
        (STMT (OUT b Y))
        (
            STAR
            (
                ALT
                (STMT (IN a (moreThan b)) (OUT a (subtract a b)))
                (STMT (IN a (lessThan b)) (OUT b (subtract b a)))
            )
        )
        (STMT (IN a b) (OUT GCD a))
    )
)

I'm curious, is anyone aware of any similar work? If not, how would you feel about this idea overall? Could this thing even be Turing complete? And how ergonomic would writing programs this way be?

[EDIT]

After discussion, I cleaned up a bit the syntax of the system, and completely formalized it in relaxed BNF notation:

     <start> := <expression>

<expression> := (SEQ <expression>*)
              | (ALT <expression>*)
              | (OPT <expression>*)
              | (STAR <expression>*)
              | (PLUS <expression>*)
              | (MATCH (VAR <ATOM>+) <expression>)
              | <statement>
              
 <statement> := (TEST <ATOM> <S-EXPR>)
              | (HOLD <ATOM> <S-EXPR>)

SEQ is equivalent to sequence in regex, ALT is |, OPT is ?, STAR is *, PLUS is +, and MATCH is something extra that introduces meta-variables.

Statements are now split to TEST and HOLD. TEST fails when var named <ATOM> is not <S-EXPR>. HOLD assigns <S-EXPR> to var named <ATOM>, and never fails.

Sequence fails when any included element fails. Alternation fails when all included elements fail. OPT elements may be skipped if any included element fails, STAR ending with TEST case is equivalent to until <condition> repeat <expression>, and PLUS ending with TEST case is equivalent to repeat <expression> until <condition>.

I believe the whole system should be Turing complete.

The above example from the beginning should now look like this:

/*
    Greatest common divisor example
    
     input: `(integer integer)`
    output: `integer`
*/

(
    SEQ
    (
        MATCH
        (VAR X Y)
        (
            SEQ
            (TEST this (X Y))
            (HOLD a X)
            (HOLD b Y)
        )
    )
    (
        STAR
        (
            ALT
            (
                SEQ
                (TEST a (greaterThan b))
                (HOLD a (subtract a b))
            )
            (
                SEQ
                (TEST a (lessThan b))
                (HOLD b (subtract b a))
            )
        )
    )
    (TEST a b)
    (HOLD this a)
)

At the beginning, input parameters are read from this variable. At the end, we assign return value back to the this variable.

Simple, isn't it? I hope you like the updates :)

24 Comments
2024/04/18
22:36 UTC

58

Why there are so few hardware description languages?

Hardware description languages(HDL) are the standard to program digital logic. The industry standard languages are:

  • Verilog
  • VHDL
  • SystemVerilog

Verilog and VHDL were conceived in the 1980s. SystemVerilog is an improvement to Verilog made in 2002.

There are few other HDLs, but are only used by researchers or small one off projects.

Why there are no new viable alternatives popping out?

The languages work, but they are a pain to work with. People don't see HDL as an empowering tool, but as a necessary evil to get the job done.

This is the opposite with programming languages. Every few year, there is a new programming language. Industry standard programming of 20 years ago are not the same as today's. Some programming languages are viewed as empowering, and from a big following.

Why the stark contrast?

I have few hypothesis:

  • HDLs are not as accessible. There application is narrower, the hardware to run it on is expensive, and much of the software is proprietary.
  • HDLs are more complex than programming languages. HDLs have a notion of time which is missing in programming languages. A C program that takes 1 second or 1 year can be functionally equivalent. HDL design that runs in 1 second must run in 1 second to be within specification.

What are your thoughts?

24 Comments
2024/04/18
21:42 UTC

17

modules with "parameters"

Hi

I'm working on defining how i want modules and imports to work.

I'm trying to work under the constraint that modules cannot import other modules. I think that this might improve security (and maybe testability too).

Lets try with an example:

A logging library prints to a file. In most languages i know, that means importing a filesystem-construct, or using a global function.

The logging lib cannot import a filesystem-construct, because importing is not allowed inside modules, so instead the library takes a filesystem-construct as a parameter, similar to how a class takes values in a constructor.

Some pseudo code:

logging library:

module myLoggingLib(filesystem) {
    struct logger {
        function log(message) {
            filesystem.appendFile("log.txt", message)
        }
    }
}

application:

import system.filesystem
import myLoggingLib(system.filesystem)

function main() {
    myLoggingLib.logger.log("hello world")
}

This smells a little like old-school javascript, where we would wrap everything in a function to achive something akind to namespaces.

What other languages do this?

How do they handle types?

In the above example, the type of myLoggingLib, must include the type of some general filesystem module - where would that be defined?

Maybe other modules should not be allowed as parameters, so the logging lib would only have a appendFile function as parameter?

26 Comments
2024/04/18
17:17 UTC

63

Do block expressions make parentheses obsolete?

This is mostly a random shower thought.

We usually use parentheses to group parts of expressions:

(10 + 5) * (7 + 3)

Some languages, like Rust, also have blocks that can act as expressions:

let lhs = {
    let a = 10;
    let b = 5;
    a + b
};

lhs * (7 + 3)

However, since a block can consist of a single expression, we could just use such blocks instead of regular parentheses:

{ 10 + 5 } * { 7 + 3 }

This would free up regular round parentheses for some other purpose, e.g. tuples, without introducing any syntax ambiguity. Alternatively, we could use round parentheses for blocks, which would free up curly braces in some contexts:

let lhs = (
    let a = 10;
    let b = 5;
    a + b
);

let rhs = ( 7 + 3 );

lhs * rhs

Are there any downsides to these ideas (apart from the strangeness budget implications)?

73 Comments
2024/04/18
13:45 UTC

Back To Top