/r/ProgrammingLanguages
This subreddit is dedicated to the theory, design and implementation of programming languages.
Be nice to each other. Flame wars and rants are not welcomed. Please also put some effort into your post, this isn't Quora.
This subreddit is not the right place to ask questions such as "What language should I use for X", "what language should I learn", "what's your favourite language" and similar questions. Such questions should be posted in /r/AskProgramming or /r/LearnProgramming.
EDIT: I added a bunch of context under u/L8_4_Dinner's comment.
Basically, I am creating a small specialised language as a side project.
It has a pretty simple C-like syntax; nothing very specific about it, except that it is statically typed and will support some lightweight structure types. The syntax is not 100% final, but some simple code could look like this:
struct Dog = {
    num legCount = 4;
    str name;
}
func sayHello(Dog d) {
    // here lies my question
    console(d.name);
}
My question is: what should the signature of my "console" function be? (It is of course provided by my language's API.) Keep in mind that functions do not support optional types, you cannot overload functions, and you don't have access to any placeholder type (like Object in Java or any in TypeScript).
This function should be able to take any number of parameters of any type and log them to the console (think console.log in JS). My language does not provide any way to express this (and that is intentional).
What should I do from here?
Then I have one last idea that I find super cool, but I have the feeling there is a catch: make it an expression that doesn't do anything special except log the value and evaluate to it. It could lead to some pretty neat syntax, like:
while(console i < 10) {
// would console the value of i at each evaluation
}
// Is also supported because EXPRESSION + ";" is a valid statement
console "Hello, world!";
This expression would of course need a very high precedence; I would think the only expressions with higher precedence would be literals, identifiers, and unary (!, -) expressions.
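To make the precedence idea concrete, here is a hedged sketch (in Python, not the OP's language) of how a Pratt-style parser could treat console as a tight-binding prefix operator, so that console i < 10 logs i rather than the whole comparison:

```python
import re

# a minimal Pratt-style parser sketch; `console` is a prefix operator
# whose operand is parsed with a very high minimum precedence, so it
# captures only the expression immediately to its right
def tokenize(src):
    return re.findall(r"\d+|\w+|[+*<]", src)

BIN_PREC = {"<": 1, "+": 2, "*": 3}
CONSOLE_PREC = 10  # higher than every binary operator

def parse(tokens, pos=0, min_prec=0):
    tok = tokens[pos]
    if tok == "console":
        operand, pos = parse(tokens, pos + 1, CONSOLE_PREC)
        left = ("console", operand)
    elif tok.isdigit():
        left, pos = int(tok), pos + 1
    else:
        left, pos = ("var", tok), pos + 1
    while pos < len(tokens) and tokens[pos] in BIN_PREC and BIN_PREC[tokens[pos]] >= min_prec:
        op = tokens[pos]
        right, pos = parse(tokens, pos + 1, BIN_PREC[op] + 1)
        left = (op, left, right)
    return left, pos

ast, _ = parse(tokenize("console i < 10"))
# ast == ('<', ('console', ('var', 'i')), 10): the log wraps i, not i < 10
```

Because the operand is requested at a precedence above every binary operator, the `console` node closes before `<` is considered, which matches the "logs the value of i at each evaluation" behavior described above.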
So, is this a common problem? I am very new to all this; I've probably missed the whole point, or this has already been discussed at length.
hi, I'm making a sort-of mix between JS and C# as a programming language, and I want it to support operator overloading, so that I can, e.g., add arbitrary types, like Points for instance.
I'm wondering whether I should support overloading for types that I do not own.
I.e. should I allow an operator definition to be a standalone thing, like a function, or should it live inside a class, like a method?
1:
operator + (lhs: TypeIDoNotOwn, rhs: Int) : Int => ...
2:
class TypeIOwn {
operator+ (rhs: Int): Int => ...
}
Option 2 could also require two operands (and therefore make the operator a static member), and require one of the operands to be of the enclosing type, but that is outside the scope of the question.
The 1st option would basically be like extension methods, but extension... operators? Since operators basically have function semantics anyway (you can assume 1 + 2 is functionally equivalent to something akin to 1.operator+(2)), is it harmful to allow users to define custom operators on all types, even those that are defined in a separate assembly/module and therefore cannot have the operator defined there?
if it's important, the class structure cannot change at runtime, so it's more similar to C# in that regard. there are no prototype chains like in JS.
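For a concrete picture of option 1, here is a hedged Python sketch (the names defop and apply_op are made up for illustration) of standalone "extension operators" kept in a registry keyed by the operand types, which lets you overload + for a type you "do not own":

```python
# option 1 sketched in Python: standalone operator definitions in a
# global registry keyed by (symbol, lhs type, rhs type)
OPERATORS = {}

def defop(symbol, lhs_type, rhs_type):
    def register(fn):
        OPERATORS[(symbol, lhs_type, rhs_type)] = fn
        return fn
    return register

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

# overloading `+` with a left operand type we "do not own" (int)
@defop("+", int, Point)
def add_int_point(lhs, rhs):
    return Point(lhs + rhs.x, lhs + rhs.y)

def apply_op(symbol, lhs, rhs):
    # `lhs op rhs` resolves like a function call, per the post's
    # observation that 1 + 2 behaves like 1.operator+(2)
    fn = OPERATORS.get((symbol, type(lhs), type(rhs)))
    if fn is None:
        raise TypeError(f"no overload for {type(lhs).__name__} {symbol} {type(rhs).__name__}")
    return fn(lhs, rhs)

p = apply_op("+", 3, Point(1, 2))  # works despite not "owning" int
```

The design question then becomes the familiar coherence problem: with a global registry, two unrelated modules can register conflicting overloads for the same type pair, which is the main argument languages like C# use for tying operators to a type they own.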
Hello again, friends, critics, and assorted rubber ducks.
Despite Pipefish being functional, I don't want to use recursion for things that you wouldn't use recursion for in a normal language. And so, despite my general wish to keep the language small and TOOWTDI, I've supplied a bunch of ways to iterate on things, for example the mapping operator >>. Example of usage: [1, 2, 3] >> that + 1 returns [2, 3, 4]. The compiler classifies "that" as a Very Local Variable; it just exists in the scope of the right-hand side of the >> operator.
(Note that such a variable doesn't really violate referential transparency, because from the point of view of the semantics of the language, it doesn't take on all the values in the list any more than the x in ∀x ∊ ℕ : x ≥ 0 takes on all the values in ℕ, or the i in big sigma notation takes on every value in its range.)
And I also felt the need for for and while, hence this post. The while loop I initially just implemented in Pipefish as a higher-order function with signature while (condition func) do (action func) to (data tuple). Example of usage --- this adds up the numbers from 0 to n - 1.
triangle(n) :
    (while unfinished do add to 0, 0)[1]
given :
    unfinished(counter, total) : counter < n
    add(counter, total) : counter + 1, total + counter
For those who don't know Pipefish, the given block defines inner functions. Note the capture of n. This is standard functional stuff. Note also the indexing, the [1]. This is because we've been iterating on two values, the counter and the total, but we only want the total. This sort of situation is also common in functional languages.
Doing a for loop was hairier and needed me to hardwire the semantics. What I came up with was something like this ...
triangle(n) :
    for i in 1::(n + 1) do add to 0
given :
    add(x) : x + i
... which worked fine when I was using an evaluator, because then the i in the function can be the same as the i in the for clause just because they have the same name. In a compiler, however, this raises some nasty issues, because they have to refer to the same memory location. But a function can't always know whether it's going to be passed to a for loop ...
So, next thought: we need some closer relation between the for loop and its body than its body just being a function it gets passed. Let's make a for block:
triangle(n) :
    for i in 0::n do to 0 :
        i + ... ?
... wait, we now have another problem. Working with functions may have been somewhat cumbersome, but it supplied us with a name, x, for the thing we wanted to add i to. Now we must do it ourselves:
triangle(n) :
    for i in 0::n with x::0 :
        x + i
(Please note that I am not at all wedded to the with <name>::<expression> syntax. I'm open to any suggestions, so long as they can also be used for while loops --- see below. starting with would be clearer, but I think excessively verbose. ETA --- how about from? That gives the idea of initialization, and unlike with with I'd feel happier about using = rather than ::. So: for i in 0::n from x = 0 : ... Or perhaps it would make more sense to put the initialization first? from x = 0 for i in 0::n : ... Or avoid the keyword by recycling Go's := variable initializer, which I haven't used yet: x := 0; for i in 0::n : ... I've got lots of syntactic options. I wouldn't say no to more.)
The body of the loop supplies an expression giving the new value of x. Using multiple returns, we can deal with multiple variables:
fib(n) :
    for i in 0::n with a::0, b::1 :
        b, a + b
Note that this is one of those cases where we end up with too many return values, since we get back a tuple consisting of (the final values of) a, b, when all we want is a. We could perhaps use the names of the variables as syntactic sugar for indexing:
fib(n) :
    (for i in 0::n with a::0, b::1 : b, a + b)[a]
We can do the same sorts of things with while loops, and as we can, we should. To illustrate, let's write a non-recursive Collatz function:
collatz(n) :
    while i != 1 with i::n :
        i % 2 == 0 :
            i / 2
        else :
            3 * i + 1
Unlike in the function-based implementation, break statements could become meaningful. E.g. this function would be equivalent to the previous one:
collatz(n) :
    while true with i::n :
        i == 1 :
            break
        i % 2 == 0 :
            i / 2
        else :
            3 * i + 1
Apart from the addition of break, everything in the body of these while or for loops is an ordinary functional Pipefish expression. We don't reassign any variables, we don't mutate any values, we don't perform IO; we just evaluate an expression. Purity, immutability, and referential transparency are maintained. (Or at least, that's true from the point of view of the language semantics. In reality there'll be a perfectly ordinary while loop under the hood reassigning variables like crazy.)
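What that under-the-hood compilation might look like can be sketched in Python, assuming (my reading of the post) that for i in 0::n with x::init : body desugars to a fold over a hidden accumulator:

```python
# a sketch of the "ordinary while loop under the hood": the loop body
# only *computes* the next accumulator value; the reassignment is hidden
# inside the desugared helper, so the surface language stays pure
def for_with(iterable, init, body):
    acc = init
    for i in iterable:
        acc = body(i, acc)  # reassignment happens here, out of sight
    return acc

# triangle(n): for i in 0::n with x::0 : x + i
def triangle(n):
    return for_with(range(n), 0, lambda i, x: x + i)

# fib(n): for i in 0::n with a::0, b::1 : b, a + b  -- then take a
def fib(n):
    a, b = for_with(range(n), (0, 1), lambda i, ab: (ab[1], ab[0] + ab[1]))
    return a
```

The body here is a pure function from (index, accumulator) to the next accumulator, which mirrors the claim that the loop body is just an ordinary expression.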
So that's about as far as my thinking has got. It seems to me that it's more ergonomic than the usual HOF approach to loops in functional languages, while still preserving the semantic guarantees that are important to the language. For example, in an imperative language, if we wrote something like for i::v in myList ... then we'd have to decide semantic issues like:
- What happens if we assign to i during the loop?
- What happens if we assign to v during the loop?
- What happens if we mutate myList during the loop?
- What are the values of i and v after the termination of the loop?
In Pipefish the answer is uniformly: "The language offers no facilities to even try doing any of those things."
I have a bit more time to think about it before I have to settle on something, and I would welcome your feedback, ideas, criticisms, etc. Thank you!
I'd like to know some programming languages that allow virtual fields, either builtin support or implemented with strong metaprogramming capabilities.
I'll demonstrate with python.
Suppose a newtype Temperature with a field celsius:
class Temperature:
    celsius: float
Here two virtual fields, fahrenheit and kelvin, can be created, which are not stored in memory but calculated on the fly.
In terms of usage, they are just like any other fields. You can access them:
temp = Temperature(celsius=0)
print(temp.fahrenheit) # 32.0
Update them:
temp.fahrenheit = 50
print(temp.celsius) # 10.0
Use them in constructors:
print(Temperature(fahrenheit=32)) # Temperature(celsius=0.0)
And pattern match them:
def is_absolute_zero(temp: Temperature) -> bool:
    match temp:
        case Temperature(kelvin=0): return True
        case _: return False
Another example:
class Time:
    millis: int
    # virtual fields: hours, minutes
time = Time(hours=4)
time.minutes += 60
print(time.hours) # 5
Today at work I couldn't help but get overly frustrated with some of the ergonomics of error reporting in Gradle -- specifically, how Gradle reports errors from command-line invocations, though there are of course many more examples of rage-inducing error-reporting anti-patterns I could use, as anyone who has used Gradle can attest.
Essentially, Gradle will report something like "command foo returned error code 1" -- and then you have to scroll up to (hopefully) find some clues as to what actually went wrong, such as standard-out messages from running the command, or even exactly what command was invoked (e.g. to help figure out potential syntax issues).
I use that example as it's the freshest in my memory, but I think most programming environments are rife with such issues of less-than-informative or badly presented error messages.
To some extent I think some level of "cryptic-ness" in error messages is unavoidable. Programming environments are not AGIs, so there will always have to be some level of deduction from the reader of an error message to try to figure out what is "really going on", or what the root causes of something are.
Still, I can't help but think that if people gave the area of "error message / debugging ergonomics" a bit more thought, we could make error reporting and debugging in our languages / environments a lot more pleasant. I look at Eve's inspector tool as the kind of solution that is criminally under-explored and under-used in production. Or even tools like time-traveling debuggers.
We can talk about exceptions vs. typed errors of various kinds, but I think in a lot of ways that's a pretty surface-level distinction, and there's much more we could be thinking about in the realm of how to improve error handling / reporting.
For example, one of my biggest pet peeves in error reporting is vague exceptions -- think "File system error: Could not create file" vs. "File system error: Could not create file /path/to/specific/file/i/couldnt/create". In my opinion, unless you have very specific security reasons not to, I would almost always rather see the latter error message -- and whether the error was "raised" via an exception or an Either monad doesn't particularly matter.
This got me thinking: is it possible (either in the language runtime, or somehow via static constraints in the compiler) to enforce, or at least strongly encourage, the latter, more specific style of error reporting? This isn't something I've seen much discussion about, but I think it would be highly beneficial if possible.
One random thought I had on the runtime side: why not include, alongside stack traces, a printout of some of the values of the local variables "close" to where the exception was thrown? This would be similar to the error message Haskell shows when it encounters a typed hole -- a listing of some potentially relevant expressions in scope (except that here we would be interested in the concrete values of the expressions, not just their types). I'm sure there would be performance considerations, but has anyone tried anything of this nature? Perhaps the benefits would outweigh the costs, at least in some scenarios.
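This runtime idea can be prototyped in a few lines of Python, which keeps the traceback on the exception object and lets you read the locals of the frame that raised (create_file and its deliberately vague message are invented for the example):

```python
# sketch: walk to the innermost traceback frame of an escaped exception
# and report its local variables next to the (vague) message
def create_file(path):
    mode = "wb"
    raise OSError("Could not create file")  # note: no path in the message

def report_with_locals(fn, *args):
    try:
        return fn(*args)
    except Exception as exc:
        tb = exc.__traceback__
        while tb.tb_next:          # descend to the frame that raised
            tb = tb.tb_next
        local_vars = dict(tb.tb_frame.f_locals)
        return f"{exc} | locals at raise site: {local_vars}"

msg = report_with_locals(create_file, "/tmp/report.txt")
# the report now carries path='/tmp/report.txt' even though the
# exception message itself was vague
```

So at least at the "crash reporter" level this is cheap; the performance question in the post only really bites if you want locals captured eagerly for every raise rather than at the top-level handler.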
Another pet peeve of mine, from my day job working in a primarily statement-oriented language (Kotlin) in a more FP / expression-oriented way, is that oftentimes just a stack trace is not incredibly useful. Sometimes I find myself wanting an "expression trace" or "event trace" (my coinages -- as far as I know there isn't established terminology for what I'm thinking of here).
For example, given a divide-by-zero exception, the stack trace may not necessarily line up with the trace through the code-base of where the offending zero came from. I haven't fully fleshed this idea out yet, but an "expression trace" would show you (in backwards term reductions) where the 0 / 0 came from (e.g. 0 / 0 -> 0 / (5 - 5) -> 0 / (5 - foo())), which would then tell you that the evaluation of foo is what led to the divide-by-zero exception. This strikes me as very similar in spirit to the Eve inspector, but not limited to UIs.
"Event traces" would more or less simply be a log reporting the list of "events" that led the user of an application to an exception in an event-driven architecture like The Elm Architecture or Event Sourcing. Perhaps such reporting is already present in some frameworks, but I think it would be beneficial if such an error-reporting mechanism were as well-integrated into a language as stack traces are in most languages. Perhaps it's even possible to implement something like this for a more general FRP system, rather than just for a stricter architecture like Elm's.
These are just a few of the ideas I've had over the years. Has anyone else thought along similar lines of how PL design / runtime / tooling could be used to improve the developer experience of error reporting? Am I missing some blog posts / papers on this very topic, or am I justified in thinking some of these ideas are under-explored?
In my language, there is no distinction between = and ==. This means you can compare values using =:
case user.first_name = "John" {
    println("Do something");
}
or you can declare variables:
val user = User(.first_name: "John", .last_name: "Adams");
In my grammar, there is only one version of =, and it sits alongside all of the other comparison operators:
(EBNF)
cmp = [ unknown, ( "=" | "!=" | "<" | "<=" | ">" | ">=" ) ], unknown ;
"unknown" actually has relevance to how you can declare variables:
unknown = bitwise | ( "val" | "var" | "set" ), name, [ ":", bitwise ] ;
Basically, any expression can represent an "unknown," and that unknown will get filled in if possible via destructuring. Obviously, unknowns only make sense in contexts where their value can be deduced by the compiler, which is mainly on the LHS or RHS of an = expression.
I find this very elegant, but it means that = has higher precedence than && and ||, because you need to be able to use = in conditions without ridiculous amounts of parentheses. But of course this means that if you want to assign an expression that contains && or ||, you need parentheses:
val shouldAllowAccess = (record.is_public || record.owner = auth.user);
This alone is kind of okay, but I definitely want to be able to assign an unbracketed lambda to a variable without surrounding parentheses, like so:
val full_name = (val user: User) => "\(user.first_name) \(user.last_name)";
So this means that you also need parentheses around conditions in unbracketed lambdas:
val has_name = (val user: User) => (user.first_name != nil && user.last_name != nil);
Finally, another thing that I am doing solely for elegance is that defining named functions uses the same grammar rules (and hence identical syntax) as lambdas. So this means that even at the top level, you'd need to use parentheses around complex conditions:
def has_name(val user: User) => (user.first_name != nil && user.last_name != nil);
I wanted to see what people's thoughts were about this compromise: && and || having lower precedence than = and =>, which in pretty much every other language always have the lowest precedence.
edit: more justification for the merge:
here's an example of pattern matching / destructuring in my language, like in Rust and Swift:
// a variant / tagged union with a simple AST
type Expr
.Add(Expr, Expr)?
.Neg(Expr)?
.Val(Int);
// a tree walking interpreter
def eval(val expr: Expr) {
    case expr = Expr.Add(val a, val b) {
        eval(a) + eval(b)
    } else expr = Expr.Neg(val a) {
        -eval(a)
    } else expr = Expr.Val(val a) {
        a
    }
}
by merging = and ==, we basically get if let / if case let for free in terms of additional syntax
Every programmer is familiar with the concept of a "reference." The term usually refers to a small object whose main task is to provide access to another object physically located elsewhere. This makes references convenient to use: they are easily copied, and they make it very easy to reach the object they point to, allowing the same data to be accessed from different parts of the program.
Unfortunately, manual memory management, or more precisely manual memory control, is the most common cause of various errors and vulnerabilities in software. All attempts at automatic memory management through various managers are hampered by the need to track the creation and deletion of objects, as well as to periodically run garbage collection, which negatively affects application performance.
However, references in one form or another are supported in all programming languages, although the term often covers concepts that are not entirely equivalent. For example, "reference" can mean an address in memory (as in C++) or a pointer to an object (as in Python or Java).
There are programming languages that try to solve these problems through the concept of "ownership" (Rust, Argentum, or NewLang). Possible solutions to these and other existing problems with references are discussed below.
For example, in the C language there are pointers, but working with them is not very convenient and is also very dangerous due to pointer arithmetic (the ability to directly modify the address of data in the computer's memory). In C++, a separate entity appeared, the reference, and in C++11 references were developed further with the addition of rvalue references.
While C++ has several kinds of references at once, Python's developers probably deliberately tried to "simplify" working with references and abandoned them altogether. Although de facto every object in Python is accessed through a reference, some types (simple values) are automatically passed by value, whereas complex types (objects) are always passed by reference.
There is also the global problem of circular references, which affects almost all programming languages (when an object, directly or through several other objects, points to itself). Language developers (especially of garbage-collected languages) often have to resort to various algorithmic tricks to clean up such "frozen" and circular references from the pool of created objects, although usually this problem is left to the developers themselves; for example, C++ has strong (std::shared_ptr) and weak (std::weak_ptr) pointers.
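The cycle problem is easy to observe in Python, whose reference counting alone can never free a cycle; a small sketch using weakref.finalize to watch when the objects actually die:

```python
import gc
import weakref

# two objects pointing at each other form a cycle that reference
# counting cannot reclaim; Python's separate cycle collector must step in
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None

a, b = Node("a"), Node("b")
a.next, b.next = b, a                 # a -> b -> a: a cycle

freed = []
weakref.finalize(a, freed.append, "a")
weakref.finalize(b, freed.append, "b")

del a, b                              # refcounts stay > 0 inside the cycle
before = list(freed)                  # nothing has been freed yet
gc.collect()                          # the tracing collector breaks the cycle
after = sorted(freed)                 # now both nodes are gone
```

Replacing one of the two links with a weakref.ref would make the cycle collectible by reference counting alone, which is exactly the role std::weak_ptr plays in the C++ analogy above.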
Another equally important but often ignored problem with references is the language's semantics for working with them. For example, in C/C++, separate operators (the asterisk "*", the arrow "->", and the dot ".") are used to access data by reference and by value. However, reference variables in C++ are used as if they were regular "by value" variables, although in fact they are not. Of course, with explicit typing the C++ compiler will not let you make mistakes, but just from reading the code you cannot distinguish a reference variable from a regular one.
In Python, it is very easy to confuse whether a variable will be passed to a function by value or by reference, since this depends on the data the variable contains. The fact that Python is dynamically typed adds a special twist: in general, you do not know in advance what kind of value a variable holds.
It seems to me that the main reason, at least for the ambiguous semantics, is the constant growth in complexity of development tools, and as a result, the complication and elaboration of programming-language syntax for new concepts and capabilities while maintaining backward compatibility with old legacy code.
But what if we start from a clean slate? For example, with a universal concept of object and object-reference management that requires no manual memory management by the user (programmer), needs no garbage collector, and makes errors when working with memory and object references impossible thanks to full control of memory management at the compilation stage of the application's source code.
Object - data in the computer's memory in machine (binary) representation.
Variable - a human-readable identifier in the program body, which is uniquely determined by its name and identifies the object (the actual value of the object or a reference to it). Variables can be:
Example of source code and conditional abbreviations:
# variables - owners
val1 := 1;
val2 := 2;
# val1 = 1, and val2 = 2
val1 = val2; # Error - only one owner!
*val1 = *val2; # OK - value is assigned
# To avoid too many asterisks in your code, you can omit them when accessing
# object data if the owner variable is used in an expression as an rvalue
*val1 = val2; # This is also valid
# val1 = 2, and val2 = 2
# variables - references
ref1 := &val1;
ref2 := &val2;
# ref1 -> val1, and ref2 -> val2
ref1 = 3; # Error - variable is a reference!
*ref1 = 3; # OK - value is assigned "by reference"
# val1 = 3, and val2 = 2
# *ref1 = 3, *ref2 = 2
*ref2 = *ref1; # OK - one value is assigned to another value "by reference"
# val1 = 3, and val2 = 3
# *ref1 = 3, *ref2 = 3
ref1 = ref2; # OK - one link is assigned to another link
# ref1 -> val2, and ref2 -> val2
# *ref1 = 3, *ref2 = 3
*val2 = 5;
# *ref1 -> 5, *ref2 -> 5
With this syntax, everything becomes simple, visual, and clear. But if I missed something somewhere, please write in the comments or in a personal message.
If I just split on spaces, then things like int i=8 wouldn't work, and if I pass an empty separator to most split functions (splitting into individual characters), then numbers like 22 and 8.5 get messed up.
Regex could be a proper way to do it, but I avoid regexes as it's easy to shoot yourself in the foot. Any advice?
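One common regex-free answer is a hand-written scanner that walks the string once and groups characters by class; a minimal Python sketch:

```python
# a hand-written scanner: one pass over the string, grouping characters
# into identifiers, numbers, and single-character operators/punctuation
def tokenize(src):
    tokens, i = [], 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c.isalpha() or c == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(src[i:j])
            i = j
        elif c.isdigit():
            j = i
            while j < len(src) and (src[j].isdigit() or src[j] == "."):
                j += 1
            tokens.append(src[i:j])   # keeps 22 and 8.5 in one piece
            i = j
        else:
            tokens.append(c)          # single-char tokens like = + ;
            i += 1
    return tokens

tokenize("int i=8")       # ['int', 'i', '=', '8']
tokenize("x = 22 + 8.5")  # ['x', '=', '22', '+', '8.5']
```

Multi-character operators like == or <= need one extra lookahead in the final branch, but the structure stays the same, and there's no regex anywhere.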
In my programming language, arrays are 1-based. It's a beginner programming language, and I think there's a niche for it between Scratch and Python. 1-based arrays are the exception today, but they used to be common, and many beginner- and math-oriented languages (Scratch, Lua, Julia, Matlab, Mathematica, ...) are still 1-based nowadays. But this should not be the topic. It's about array[0]: I think it would be convenient to take that as the last element. On the other hand, it's a bit unexpected (except for vi users, where 0 is the last line). I don't think -1 fits, because it's not length-1 either, as it is in Python for example.
I might make a programming language, possibly named Avenge, I'm wondering what features are in high demand that people might want. Here's what I've thought of so far:
Still in the early concept phase, so I'm open to suggestions for more features or tweaks to these. This is a serious thread.
So I've started reading this book and I have two questions:
Thanks in advance for your answers!
I was wondering what the best way is to parse binary operations like 1 + 2 or 1 + 2 + 3, etc. I know the shunting-yard algorithm, but I don't think it works within a recursive descent parser for a programming language. What would be the best way to parse these kinds of expressions?
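Precedence climbing is the usual recursive-descent-friendly answer (it is essentially shunting yard restated recursively); a minimal Python sketch over a token list of numbers and operators:

```python
# precedence climbing: each call parses everything whose operators bind
# at or above a minimum precedence; fits naturally inside recursive descent
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse(tokens, pos=0, min_prec=1):
    # an operand is just a number in this tiny sketch (no parentheses)
    left, pos = int(tokens[pos]), pos + 1
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        # left-associative: the right side must bind strictly tighter
        right, pos = parse(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos

ast, _ = parse("1 + 2 * 3".split())
# ast == ('+', 1, ('*', 2, 3))
```

Parenthesized subexpressions and unary operators slot into the "operand" step, and right-associative operators just recurse with PRECEDENCE[op] instead of PRECEDENCE[op] + 1.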
I invented this, though a lot of other people probably invented it first. (If anyone has a reference, please let me know!) It's useful for multiple dispatch and, I presume, pattern-matching.
To illustrate it, let's have some overloaded functions (this is real, you can do this in Pipefish though you shouldn't).
In this example single? is a supertype of all the other types, and single is a supertype of all the other types except null. So here are ten functions, each with a different signature and printing a different letter of the alphabet.
def
    foo(x string) : "A"
    foo(x single) : "B"
    foo(x bool) : "C"
    foo(x bool, y bool) : "D"
    foo(x int, y bool) : "E"
    foo(x int, y bool, z bool) : "F"
    foo(x int, y int) : "G"
    foo(x bool) troz (b bool) : "H"
    foo(x single, y bool) : "I"
    foo(x single?, y bool) : "J"
As usual we want the dispatch to be resolved by specificity, e.g. foo 8, true should return E and not I, because int is a more specific type than single.
What the compiler does is turn the function signatures into a tree which looks like this:
string
    foo(x string) : "A"
    bool
        foo(x single, y bool) : "I"
bool
    foo(x bool) : "C"
    bool
        foo(x bool, y bool) : "D"
    troz
        bool
            foo(x bool) troz (b bool) : "H"
int
    bool
        foo(x int, y bool) : "E"
        bool
            foo(x int, y bool, z bool) : "F"
    int
        foo(x int, y int) : "G"
    foo(x single) : "B"
single
    foo(x single) : "B"
    bool
        foo(x single, y bool) : "I"
single?
    bool
        foo(x single?, y bool) : "J"
The basic rule is, you start at the top left looking at the first argument on the list, and if the type doesn't match, you move down, and if it does, you move across and apply the same rule to the next argument on the list. When you've run out of arguments, you see if there's a function at the node you've arrived at. If there is, that's the function you're looking for. If there isn't, that's a type error; if you fall off the right hand side or the bottom of the tree without finding a function, that's a type error.
The point of this is that the tree is structured so that we never have to backtrack.
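A hedged Python sketch of the lookup rule, using Python's own types as a stand-in lattice (object playing the role of single, and avoiding bool, which Python makes a subtype of int); TreeNode and dispatch are invented names:

```python
# a non-backtracking dispatch tree: each node holds an ordered list of
# (type, child) branches, most specific first, plus an optional overload
# for "no more arguments"; the tree must be built so that once a branch
# matches, every overload reachable through it is present -- then the
# walk below never needs to backtrack
class TreeNode:
    def __init__(self, branches=(), leaf=None):
        self.branches = list(branches)
        self.leaf = leaf

def dispatch(node, args):
    if not args:
        if node.leaf is None:
            raise TypeError("no matching overload")
        return node.leaf
    for typ, child in node.branches:          # no match: "move down"
        if isinstance(args[0], typ):
            return dispatch(child, args[1:])  # match: "move across"
    raise TypeError("no matching overload")

# foo(x string): "A" ; foo(x int, y int): "G" ; foo(x single): "B"
tree = TreeNode(branches=[
    (str, TreeNode(leaf="A")),
    (int, TreeNode(branches=[(int, TreeNode(leaf="G"))],
                   leaf="B")),   # a lone int still matches foo(x single)
    (object, TreeNode(leaf="B")),
])

dispatch(tree, ["hi"])   # "A"
dispatch(tree, [1, 2])   # "G"
dispatch(tree, [3.5])    # "B"
```

The duplicated "B" leaf under the int branch is the price of never backtracking: every branch must carry copies of the less specific overloads that would otherwise only be found by backing out and trying a later branch.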
This was maybe over-engineered for the prototype version with the evaluator, but now that I'm doing the VM (nearly finished, BTW!) it's kind of crucial. Pipefish, you see, does a sort of best-effort typechecking, so the compiler may end up being able to erase some but not all of the types from the logic. So, given what the compiler can infer about the types in any given context, it can navigate through the tree by moving freely when it can do type inference, and by emitting a conditional jump in the bytecode to dispatch at runtime when it can't.
I would not like to do this without my faithful non-backtracking tree structure to guide the code generation, it has done well by me, and I recommend it to people who want to do similar things.
I'm starting work on my bachelor's thesis, which aims to verify whether the characteristics and peculiarities of the Swift language allow the implementation of session types. I have found works and implementations in other languages like Rust, Haskell, and OCaml. Does anyone know of similar work on Swift?
I was talking with a coworker the other day and had a thought that 1ML is pretty close to the core of my ideal language. It looks like research fizzled out after the 2016 paper though. I was wondering if anyone else knew of similar work being done.
For those less familiar: 1ML is an attempt to combine the core and module languages of the ML family. There's also a paper on extending it with effects, which gives a unified description of generative versus applicative module composition, along with the usual benefits of effect systems.
Hi,
I was screwing around with the SBV package in Haskell (which provides an interface between Haskell and SMT solvers), and using SBV I tried to solve the following question:
Does there exist a list ls such that length (filter (== 2) ls) == 2
And sadly, SBV using the Z3 solver could not solve this. I was wondering if there is any SMT solver that could solve:
Does there exist a list ls such that length (filter (== 2) ls) == 1000
Thanks
We have two ways of dealing with errors (that I'm aware of):
by return value (Go, Rust)
by exception
If you look at Go or Rust code, basically every function can fail, and most of your code is dealing with errors rather than focusing on the happy path.
This is tedious compared to having a big `try {}` and catching each type of error separately, grouping your error handling for a group of functions, and keeping the error path and the happy path quite separate. You can even catch a few function calls lower down to make things simpler and group even more functions into your error handling.
Now let's introduce "async / await" in the equation...
With the return-value approach, when you need the value, you await it, you check for an error, and then you use the value if there is no error, or you deal with the error.
With exceptions, you get a future, which makes you leave the catch block and continue code execution -- but then an exception occurs, and this is where I'm so confused. Who catches the exception?
Is it the catch block where my original call was? Is it some catch block that doesn't exist in the rest of my code, because I'm supposed to guess when my async call will throw? Does the "main" code execution stop even if it has moved forward? I just can't understand how things work and how to do good error handling in this context. Can someone explain it to me? For reference, I currently code in Dart.
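Python's asyncio behaves the same way as Dart's futures in this respect, so the answer can be demonstrated concretely: the exception is stored in the task/future and re-raised at the point where it is awaited, so that is where the try/catch belongs:

```python
import asyncio

# an exception raised inside a task is not thrown at the call site of
# create_task; it is captured in the task and re-raised at the `await`
async def fetch():
    raise ValueError("boom")

async def main():
    task = asyncio.create_task(fetch())  # failure is captured, not thrown here
    await asyncio.sleep(0)               # the task runs (and fails) here
    try:
        await task                       # the stored exception re-raises HERE
    except ValueError as exc:
        return f"caught: {exc}"

result = asyncio.run(main())  # "caught: boom"
```

If you never await the task, no catch block ever sees the error; runtimes then fall back to an "unhandled async exception" report (asyncio logs "Task exception was never retrieved", and Dart routes it to the zone's unhandled-error handler).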
I'm exploring the possibility of feeding a specialized input to a regex to control an imperative flow of variable assignments.
Let's have statements (still unrelated to regex), and let each statement have two parts: input and output. The input consists of a variable name and a value, and so does the output. Think of the input as a named variable that needs to hold the noted value in order to trigger the output's assignment of a value to some other named variable. This is like an if-then statement, but with some nice properties with regard to pattern matching unrelated to regex. Statements can also consist of only an output part, in which case they are executed unconditionally when their time comes.
Now, let's modify the usual regex definition a bit so that it operates not on characters, but on the statements introduced above. The regex would, as usual, have sequences, Kleene stars, and alternations. Thus, instead of reading characters, the modified regex would read variables from the input parts, triggering their output pairs in parallel on a successful match.
Drawing a parallel between Kleene stars and while loops, and between alternations and compound if-then-else blocks, it should be possible to construct various algorithms this way, right?
As an example, the Euclidean algorithm for the greatest common divisor would take the following form (note that this is a very rough and unrefined form of the syntax):
(
  FUN
  (NAME GCD)
  (PARAMS X Y)
  (
    SEQ
    (STMT (OUT a X))
    (STMT (OUT b Y))
    (
      STAR
      (
        ALT
        (STMT (IN a (moreThan b)) (OUT a (subtract a b)))
        (STMT (IN a (lessThan b)) (OUT b (subtract b a)))
      )
    )
    (STMT (IN a b) (OUT GCD a))
  )
)
I'm curious, is anyone aware of any similar work? If not, how would you feel about this idea overall? Could this thing even be Turing complete? And how ergonomic would writing programs this way be?
[EDIT]
After the discussion, I cleaned up the syntax of the system a bit and completely formalized it in relaxed BNF notation:
<start> := <expression>
<expression> := (SEQ <expression>*)
| (ALT <expression>*)
| (OPT <expression>*)
| (STAR <expression>*)
| (PLUS <expression>*)
| (MATCH (VAR <ATOM>+) <expression>)
| <statement>
<statement> := (TEST <ATOM> <S-EXPR>)
| (HOLD <ATOM> <S-EXPR>)
`SEQ` is equivalent to a sequence in regex, `ALT` is `|`, `OPT` is `?`, `STAR` is `*`, `PLUS` is `+`, and `MATCH` is something extra that introduces meta-variables.
Statements are now split into `TEST` and `HOLD`. `TEST` fails when the var named `<ATOM>` is not `<S-EXPR>`. `HOLD` assigns `<S-EXPR>` to the var named `<ATOM>`, and never fails.
A sequence fails when any included element fails. An alternation fails when all included elements fail. `OPT` elements may be skipped if any included element fails. A `STAR` ending with a `TEST` is equivalent to `until <condition> repeat <expression>`, and a `PLUS` ending with a `TEST` is equivalent to `repeat <expression> until <condition>`.
I believe the whole system should be Turing complete.
The above example from the beginning should now look like this:
/*
Greatest common divisor example
input: `(integer integer)`
output: `integer`
*/
(
SEQ
(
MATCH
(VAR X Y)
(
SEQ
(TEST this (X Y))
(HOLD a X)
(HOLD b Y)
)
)
(
STAR
(
ALT
(
SEQ
(TEST a (greaterThan b))
(HOLD a (subtract a b))
)
(
SEQ
(TEST a (lessThan b))
(HOLD b (subtract b a))
)
)
)
(TEST a b)
(HOLD this a)
)
At the beginning, input parameters are read from the `this` variable. At the end, we assign the return value back to the `this` variable.
Simple, isn't it? I hope you like the updates :)
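To make the semantics concrete, here is a minimal interpreter sketch for the `SEQ`/`ALT`/`STAR`/`TEST`/`HOLD` core (omitting `OPT`, `PLUS`, and `MATCH`). The post leaves `<S-EXPR>` evaluation abstract, so value positions are modeled as Python lambdas over the environment, and `STAR` is read as greedy iteration with no backtracking across iterations; both are assumptions, one possible reading of the semantics.

```python
# Minimal interpreter for the SEQ/ALT/STAR/TEST/HOLD core.
# Expressions are tuples; the environment is a plain dict.

def run(expr, env):
    """Return True if `expr` matches, mutating `env` on success."""
    op = expr[0]
    if op == "SEQ":                      # fails when any element fails
        return all(run(e, env) for e in expr[1:])
    if op == "ALT":                      # fails when all elements fail
        for e in expr[1:]:
            trial = dict(env)            # try each branch on a copy,
            if run(e, trial):            # commit only on success
                env.clear(); env.update(trial)
                return True
        return False
    if op == "STAR":                     # repeat the body while it matches
        while True:
            trial = dict(env)
            if not run(("SEQ",) + expr[1:], trial):
                return True              # zero or more matches: never fails
            env.clear(); env.update(trial)
    if op == "TEST":                     # fails when the var fails the check
        _, name, check = expr
        return name in env and check(env[name], env)
    if op == "HOLD":                     # assignment; never fails
        _, name, value = expr
        env[name] = value(env)
        return True
    raise ValueError(f"unknown operator {op!r}")

# The GCD program from the post, with lambdas standing in for <S-EXPR>.
gcd = ("SEQ",
    ("HOLD", "a", lambda e: e["X"]),
    ("HOLD", "b", lambda e: e["Y"]),
    ("STAR",
        ("ALT",
            ("SEQ",
                ("TEST", "a", lambda v, e: v > e["b"]),      # greaterThan b
                ("HOLD", "a", lambda e: e["a"] - e["b"])),   # subtract a b
            ("SEQ",
                ("TEST", "a", lambda v, e: v < e["b"]),      # lessThan b
                ("HOLD", "b", lambda e: e["b"] - e["a"])))), # subtract b a
    ("TEST", "a", lambda v, e: v == e["b"]),
    ("HOLD", "this", lambda e: e["a"]))

env = {"X": 48, "Y": 18}
print(run(gcd, env), env["this"])  # → True 6
```

Running the GCD program with `X = 48, Y = 18` leaves `6` in `this`, matching gcd(48, 18); the `STAR` terminates because its inner `ALT` fails exactly when `a == b`.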
Hardware description languages (HDLs) are the standard way to program digital logic. The industry-standard languages are Verilog, VHDL, and SystemVerilog.
Verilog and VHDL were conceived in the 1980s. SystemVerilog is an improvement to Verilog made in 2002.
There are a few other HDLs, but they are only used by researchers or for small one-off projects.
Why are there no new viable alternatives popping up?
The languages work, but they are a pain to work with. People don't see HDL as an empowering tool, but as a necessary evil to get the job done.
It is the opposite with programming languages. Every few years there is a new programming language. The industry-standard programming languages of 20 years ago are not the same as today's. Some programming languages are viewed as empowering and have a big following.
Why the stark contrast?
I have a few hypotheses:
What are your thoughts?
Hi
I'm working on defining how I want modules and imports to work.
I'm trying to work under the constraint that modules cannot import other modules. I think that this might improve security (and maybe testability too).
Let's try with an example:
A logging library prints to a file. In most languages i know, that means importing a filesystem-construct, or using a global function.
The logging lib cannot import a filesystem-construct, because importing is not allowed inside modules, so instead the library takes a filesystem-construct as a parameter, similar to how a class takes values in a constructor.
Some pseudo code:
logging library:
module myLoggingLib(filesystem) {
struct logger {
function log(message) {
filesystem.appendFile("log.txt", message)
}
}
}
application:
import system.filesystem
import myLoggingLib(system.filesystem)
function main() {
myLoggingLib.logger.log("hello world")
}
This smells a little like old-school JavaScript, where we would wrap everything in a function to achieve something akin to namespaces.
What other languages do this?
How do they handle types?
In the above example, the type of `myLoggingLib` must include the type of some general `filesystem` module; where would that be defined?
Maybe other modules should not be allowed as parameters, so the logging lib would only take an `appendFile` function as a parameter?
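As for where the general `filesystem` type could live: one common answer is a structural interface (a signature) declared separately from any implementation, and OCaml's functors (modules parameterized over module signatures) are probably the closest existing mechanism to what is described here. Below is a sketch in Python using `typing.Protocol`; all names (`Filesystem`, `make_logging_lib`, `MemoryFs`) are hypothetical stand-ins for the pseudo code above.

```python
from typing import Protocol

class Filesystem(Protocol):
    """The general filesystem *type*: any object with this shape qualifies."""
    def appendFile(self, path: str, data: str) -> None: ...

def make_logging_lib(filesystem: Filesystem):
    """Module-as-function: the dependency is passed in, never imported."""
    class Logger:
        def log(self, message: str) -> None:
            filesystem.appendFile("log.txt", message)
    return Logger

# The application does the importing and wires in a concrete implementation.
class MemoryFs:
    def __init__(self):
        self.files: dict[str, str] = {}
    def appendFile(self, path: str, data: str) -> None:
        self.files[path] = self.files.get(path, "") + data

fs = MemoryFs()
Logger = make_logging_lib(fs)
Logger().log("hello world")
print(fs.files["log.txt"])  # → hello world
```

This also makes the testability point concrete: because the module never names a real filesystem, the application (or a test) can pass an in-memory one without touching disk.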
This is mostly a random shower thought.
We usually use parentheses to group parts of expressions:
(10 + 5) * (7 + 3)
Some languages, like Rust, also have blocks that can act as expressions:
let lhs = {
let a = 10;
let b = 5;
a + b
};
lhs * (7 + 3)
However, since a block can consist of a single expression, we could just use such blocks instead of regular parentheses:
{ 10 + 5 } * { 7 + 3 }
This would free up regular round parentheses for some other purpose, e.g. tuples, without introducing any syntax ambiguity. Alternatively, we could use round parentheses for blocks, which would free up curly braces in some contexts:
let lhs = (
let a = 10;
let b = 5;
a + b
);
let rhs = ( 7 + 3 );
lhs * rhs
Are there any downsides to these ideas (apart from the strangeness budget implications)?