The Big Mud Puddle: 10/01/2010

So somebody on campus decided a couple of weeks ago that it'd be a good idea to organise a BarCamp here in Rochester this coming weekend. I thoroughly approve of the BarCamp philosophy, and saw this as a great excuse to yammer about Prog for a while to a group of people who might be interested and understand anything I say. Heady stuff.

Anyway, I've been putting together a presentation on the current state of programming languages, their future, and Prog's particular approach to bridging the gap. Prog is thoroughly under-documented and constantly undergoing minor revisions as I work on it, but now that I'm forcing myself to speak about it cohesively, I've been forced to cohere many of my thoughts. In order to help myself get these ideas down, and in order to ensure that they're recorded in full in case my public speaking skills fail me, here's basically what I'll be talking about on Saturday.

There's not a lot of disagreement in the major points of where programming languages are headed:

Functional Programming (or something like it)
Concurrency that doesn't make your head cave in
Static Typing to structure development
Type Inference to eliminate potential boilerplate from static typing
Generic Programming for modularity and clarity

Functional programming is associated with declarative rather than imperative style, the importance of data flow over control flow, immutability of data rather than mutable state, referential transparency of functions, and, as a result of all of these, strong support for concurrency. Unfortunately, FP has consistently been seen as something of an academic, ivory-tower pursuit, despite its obvious utility and even ubiquity. In most functional languages, it's no more awkward to manage state and exceptions than in most imperative languages; the difference is how relevant state is to program flow and correctness. Consequently, FP is seen as having a high barrier to entry because it feels very alien to CS students who, these days, are all spoon-fed Java. Peh.

Prog draws a few important concepts from functional programming:

All types are immutable by default
Pure, constant functions can be evaluated statically
Operations and algorithms are more important than objects.

Now, about concurrency. It's become obvious to everybody and their aunt that concurrent programming is where it's at, or at least, where it will be. Processors have increasingly many cores, and distributed, high-volume systems are commonplace even now. There are a few approaches to concurrent programming, and the one most likely to fail is of course the object-oriented one. I don't mean to be too harsh on OO (at least, not right now), but there's wide agreement that explicit threading and locking in a system designed around mutable objects is difficult. And programmers will eventually give up technologies and philosophies that get in the way of their ability to get stuff done.

The alternatives, though, are promising. The Actor model is a sound mathematical model of concurrent computation that allows, among other things, formal proofs of program correctness (or, more accurately, the absence of program incorrectness) in concurrent systems. Implicit parallelisation is commonplace in languages without mutable state such as Haskell, and even explicit threading is becoming more lightweight and manageable in languages such as Go.

The main problem with concurrency, I think, has been that it's easy to understand but hard to implement, and difficult to verify when trying to account for edge cases, especially in systems based on state. And for ages people have been complaining that immutability makes concurrency easy but mutability is nice and natural for a vast number of applications. What we're seeing, then, is a disparity between the needs of programmers and the hardware they're working with. And hardware insists on going in a difficult direction.

So how does Prog handle concurrency? Well, as in Haskell, unrelated and pure operations can be implicitly parallelised, and the type system provides the required granularity: spawning too many small threads is inefficient in terms of thread-creation overhead, but spawning too few large threads is inefficient in terms of lost parallelisation opportunities. Explicit threading in Prog is still cheap because it's simply a hook into the type system to encourage (but not require) automatic parallelisation.

Further, Prog has intrinsic support for software transactional memory (STM) using so-called atomic regions, which, again, imbue a scope with type information that causes all operations, and the group of operations as a whole, to become atomic (which, for mutable objects, means implicit locking).

In Prog, values have the inherent capability of receiving and acting upon messages; this is simply inversion of control between a function and its arguments. So while in some sense Prog is like Ruby, where “everything is an object”, it is also like lambda calculus, where “everything is a function”. In Prog there is little to distinguish values from functions, just as there is little to distinguish values from types or even metatypes.

So communication between threads is handled using this messaging facility, and a call from one thread to a function in another behaves identically to a thread-local call. Immutable data are shared implicitly, while mutable data are shared explicitly but locked implicitly.

Now, about static typing. A static type system significantly simplifies the processes of static correctness analysis, improves maintainability by localising changes and making errors detectable at compile-time, and creates some measure of assurance of consistent program behaviour. When safety constraints are known and enforcible at compile-time via the type system, the theoretical correctness and consistency of a program becomes much more trustworthy. In addition, purely static typing implies a certain degree of runtime performance improvement over purely dynamic typing.

Prog's type system is static and strictly enforced. A single variant type provides a consistent interface for both dynamic typing and generic programming. As previously stated, all types are immutable unless explicitly qualified as mutable. Types and values are indistinguisable, and static assertions based on type inference are the basis of correctness in generic programming. An expression can be imbued with a context type, which may improve the performance of or otherwise affect its evaluation. As a result of context-selection, unlike in many other languages, Prog functions can be overloaded on return type alone.

Type inference is an absolute necessity in a statically typed system. For one thing, it reduces boilerplate (e.g., C++0x lets you type auto when you don't feel like typing std::list<std::map<std::string, int>>::const_iterator), but it's also essential in generic programming to be able to perform known operations on values of unknown type. Type inference is closely related to overloading: a good example of this is to be found in C++, where function template parameters can be inferred from function arguments. Ideally, type inference does away with needless casting, making necessary casts stand out as hazardous or incorrect. It improves syntactical clarity, and naturally becomes generic programming.

Prog's type inference capabilities are numerous. First, objects are implicitly declared upon their first use, and imbued with the type of their initialiser. Overloading occurs both statically and dynamically, entirely depending on the types that are actually used. Any necessary casts are carried out using explicit, loud, scary, easy-to-find casting operators, and otherwise the syntax is completely freed from any mention of typing where it's not needed. And the type system comes to our aid once more, and ensures that immutability guarantees static evaluation: if an expression is constant, it's statically evaluable; if the type of an expression is constant, it's statically inferable.

Now, here comes the good stuff. I could ramble about generic programming all day. The most common exposure to generic programming that people have in the OO world is templates (or “generics” if you're from that camp). But if you haven't read the work of Alexander Stepanov, you simply must. Generic programming is not just about generic types, but also about generic algorithms and the abstract properties of data structures, most importantly their complexity requirements and guarantees.

Generic programming at its best relies heavily on the iterator model, but Prog generalises this to include ranges as well, which are, at their heart, syntactic sugar for iterators. Prog provides the external constraints that make generic programming possible via the type system. Generic algorithms in Prog can simply accept variant instances and rely on the type system to handle static versus dynamic evaluation, mutability guarantees, and so on. Explicit static assertions verify preconditions and postconditions and protect invariants. Algorithms are thus exposed as cleanly as possible, complexity guarantees are in some cases statically verifiable, and the Prog type system makes all of this super-easy.

As an example, here's a naive C++ implementation of max:

template<class T>
const T& max(const T& a, const T& b) {
  return b < a ? a : b;
}

template<class T>
T& max(T& a, T& b) {
  return b < a ? a : b;
}

This returns the maximum of two values, and is overloaded to accept either constant or non-constant values, such that it can be used as either an rvalue or an lvalue. The Prog equivalent does not require such explicit differentiation, though for strict equivalence with the above it does require a static check that the two values are of the same type:

def max(a : any&, b : any&) : any& {
  assert @a == @b;
  b < a ? a : b
};

Here's another short example of how software transactional memory is expected to work:

def transfer(a : mutable Account&, b : mutable Account&,
    x : int) : bool {

    atomic {
        a.withdraw(x);
        b.deposit(x);
    } except prog::atomic_abort {
        return false;
    };
    return true;

};

And at this point the talk will degenerate in to Q-and-A, but this is a blog, so I guess I'm done rambling. It's not as though I have a massive amount of readership anyhow.

The Big Mud Puddle

Pages

24 October 2010

Error Message Pun

20 October 2010

BarCamp Rochester

03 October 2010

A Walk in the Woods