24 October 2010

Error Message Pun

This came up in conversation:

void monopoly() {
    goto jail;
    int go = 0;
jail:
    return;
}

int main() { monopoly(); }

In GCC, it produces the error:

monopoly.cpp: In function `void monopoly()':
monopoly.cpp:4: error: jump to label `jail'
monopoly.cpp:2: error:   from here
monopoly.cpp:3: error:   crosses initialization of `int go'

Haw haw haw I'm done. A more substantial post is coming soon. Hope you had a good weekend.

20 October 2010

BarCamp Rochester

So somebody on campus decided a couple of weeks ago that it'd be a good idea to organise a BarCamp here in Rochester this coming weekend. I thoroughly approve of the BarCamp philosophy, and saw this as a great excuse to yammer about Prog for a while to a group of people who might be interested and understand anything I say. Heady stuff.

Anyway, I've been putting together a presentation on the current state of programming languages, their future, and Prog's particular approach to bridging the gap. Prog is thoroughly under-documented and constantly undergoing minor revisions as I work on it, but now that I'm forcing myself to speak about it cohesively, many of my thoughts have finally had to cohere. To help myself get these ideas down, and to ensure they're recorded in full in case my public speaking skills fail me, here's basically what I'll be talking about on Saturday.

There's not a lot of disagreement in the major points of where programming languages are headed:

  • Functional Programming (or something like it)
  • Concurrency that doesn't make your head cave in
  • Static Typing to structure development
  • Type Inference to eliminate potential boilerplate from static typing
  • Generic Programming for modularity and clarity

Functional programming is associated with declarative rather than imperative style, the importance of data flow over control flow, immutability of data rather than mutable state, referential transparency of functions, and, as a result of all of these, strong support for concurrency. Unfortunately, FP has consistently been seen as something of an academic, ivory-tower pursuit, despite its obvious utility and even ubiquity. In most functional languages, it's no more awkward to manage state and exceptions than in most imperative languages; the difference is how relevant state is to program flow and correctness. Nevertheless, FP is seen as having a high barrier to entry: it feels very alien to CS students who, these days, are all spoon-fed Java. Peh.

Prog draws a few important concepts from functional programming:

  • All types are immutable by default
  • Pure, constant functions can be evaluated statically
  • Operations and algorithms are more important than objects

Now, about concurrency. It's become obvious to everybody and their aunt that concurrent programming is where it's at, or at least, where it will be. Processors have increasingly many cores, and distributed, high-volume systems are commonplace even now. There are a few approaches to concurrent programming, and the one most likely to fail is of course the object-oriented one. I don't mean to be too harsh on OO (at least, not right now), but there's wide agreement that explicit threading and locking in a system designed around mutable objects is difficult. And programmers will eventually give up technologies and philosophies that get in the way of their ability to get stuff done.

The alternatives, though, are promising. The Actor model is a sound mathematical model of concurrent computation that allows, among other things, formal proofs of program correctness (or, more accurately, the absence of program incorrectness) in concurrent systems. Implicit parallelisation is commonplace in languages without mutable state such as Haskell, and even explicit threading is becoming more lightweight and manageable in languages such as Go.

The main problem with concurrency, I think, has been that it's easy to understand but hard to implement, and difficult to verify when trying to account for edge cases, especially in systems based on state. And for ages people have been complaining that immutability makes concurrency easy but mutability is nice and natural for a vast number of applications. What we're seeing, then, is a disparity between the needs of programmers and the hardware they're working with. And hardware insists on going in a difficult direction.

So how does Prog handle concurrency? Well, as in Haskell, unrelated and pure operations can be implicitly parallelised, and the type system provides the required granularity: spawning too many small threads is inefficient in terms of thread-creation overhead, but spawning too few large threads is inefficient in terms of lost parallelisation opportunities. Explicit threading in Prog is still cheap because it's simply a hook into the type system to encourage (but not require) automatic parallelisation.

Further, Prog has intrinsic support for software transactional memory (STM) using so-called atomic regions, which, again, imbue a scope with type information that causes all operations, and the group of operations as a whole, to become atomic (which, for mutable objects, means implicit locking).

In Prog, values have the inherent capability of receiving and acting upon messages; this is simply inversion of control between a function and its arguments. So while in some sense Prog is like Ruby, where “everything is an object”, it is also like lambda calculus, where “everything is a function”. In Prog there is little to distinguish values from functions, just as there is little to distinguish values from types or even metatypes.

So communication between threads is handled using this messaging facility, and a call from one thread to a function in another behaves identically to a thread-local call. Immutable data are shared implicitly, while mutable data are shared explicitly but locked implicitly.

Now, about static typing. A static type system significantly simplifies the process of static correctness analysis, improves maintainability by localising changes and making errors detectable at compile time, and provides some measure of assurance of consistent program behaviour. When safety constraints are known and enforceable at compile time via the type system, the theoretical correctness and consistency of a program become much more trustworthy. In addition, purely static typing implies a certain degree of runtime performance improvement over purely dynamic typing.

Prog's type system is static and strictly enforced. A single variant type provides a consistent interface for both dynamic typing and generic programming. As previously stated, all types are immutable unless explicitly qualified as mutable. Types and values are indistinguishable, and static assertions based on type inference are the basis of correctness in generic programming. An expression can be imbued with a context type, which may improve the performance of or otherwise affect its evaluation. As a result of context selection, Prog functions can, unlike those in many other languages, be overloaded on return type alone.

Type inference is an absolute necessity in a statically typed system. For one thing, it reduces boilerplate (e.g., C++0x lets you type auto when you don't feel like typing std::list<std::map<std::string, int>>::const_iterator), but it's also essential in generic programming, where known operations must be performed on values of unknown type. Type inference is closely related to overloading: a good example is found in C++, where function template parameters can be inferred from function arguments. Ideally, type inference does away with needless casting, making the necessary casts stand out as hazardous or suspect. It improves syntactic clarity, and shades naturally into generic programming.

Prog's type inference capabilities are numerous. First, objects are implicitly declared upon their first use, and imbued with the type of their initialiser. Overloading occurs both statically and dynamically, entirely depending on the types that are actually used. Any necessary casts are carried out using explicit, loud, scary, easy-to-find casting operators, and otherwise the syntax is completely freed from any mention of typing where it's not needed. And the type system comes to our aid once more, and ensures that immutability guarantees static evaluation: if an expression is constant, it's statically evaluable; if the type of an expression is constant, it's statically inferable.

Now, here comes the good stuff. I could ramble about generic programming all day. The most common exposure to generic programming that people have in the OO world is templates (or “generics” if you're from that camp). But if you haven't read the work of Alexander Stepanov, you simply must. Generic programming is not just about generic types, but also about generic algorithms and the abstract properties of data structures, most importantly their complexity requirements and guarantees.

Generic programming at its best relies heavily on the iterator model, but Prog generalises this to include ranges as well, which are, at their heart, syntactic sugar for iterators. Prog provides the external constraints that make generic programming possible via the type system. Generic algorithms in Prog can simply accept variant instances and rely on the type system to handle static versus dynamic evaluation, mutability guarantees, and so on. Explicit static assertions verify preconditions and postconditions and protect invariants. Algorithms are thus exposed as cleanly as possible, complexity guarantees are in some cases statically verifiable, and the Prog type system makes all of this super-easy.

As an example, here's a naive C++ implementation of max:

template<class T>
const T& max(const T& a, const T& b) {
  return b < a ? a : b;
}

template<class T>
T& max(T& a, T& b) {
  return b < a ? a : b;
}

This returns the maximum of two values, and is overloaded to accept either constant or non-constant values, such that it can be used as either an rvalue or an lvalue. The Prog equivalent does not require such explicit differentiation, though for strict equivalence with the above it does require a static check that the two values are of the same type:

def max(a : any&, b : any&) : any& {
  assert @a == @b;
  b < a ? a : b
};

Here's another short example of how software transactional memory is expected to work:

def transfer(a : mutable Account&, b : mutable Account&,
    x : int) : bool {

    atomic {
        a.withdraw(x);
        b.deposit(x);
    } except prog::atomic_abort {
        return false;
    };
    return true;

};

And at this point the talk will degenerate into Q-and-A, but this is a blog, so I guess I'm done rambling. It's not as though I have a massive readership anyhow.

03 October 2010

A Walk in the Woods

When I need to get some really good thinking done, I've learned that there's nothing better for me than to relax and avoid thinking at all for a while. My brain is so glad for the brief holiday that it returns to work with renewed vigour, and accomplishes things as though it had been working diligently the whole time I wasn't communicating with it.

There are few places where I can really get into this state—the shower is one that I have in common with many others, but a shower can reasonably be only so long for a number of reasons. Graveyards are another good haunt: the dead tend to be pretty accommodating hosts when it comes to quiet thinkers.

But I think the best place for me is the woods. Going for a long walk in the woods, trying in vain to confound my irritatingly accurate sense of direction, and getting some exercise are all great things, but they're even better when I get to go for a walk with the deer.

Yes, you read that right. The area around RIT is swarming with deer. On a recent late-night walk, I saw no fewer than a dozen by the side of the road, and that's no exaggeration. So it came as little surprise to me this evening when I ran into a couple of does in the forest.

Now, RIT deer are fairly docile. You'll never get them to eat out of your hand, but they won't spook if you're reasonably nice to them. Usually announcing your presence in a soothing voice and talking to them occasionally is enough to dispel their fears and let you get as close as five feet from them. One was still young and the other fully grown, and the fawn was noticeably more skittish than the adult, keeping her distance accordingly.

I was able to walk with them for quite some time before they decided it'd be a good idea to cross a stream, and, clad in sneakers as I was, I wasn't prepared to slog through ankle-deep mud and knee-deep water. So I made my way through the dry part of the swamp to a hillock where I read Dracula and watched the sunset.

Why am I telling you all of this? What does this have to do with programming or linguistics? Not much, really, but it is kinda nice, and people like stuff like that. And anyway, it got me thinking about the universality of expression, which is something we humans seem to not just take for granted, but often entirely ignore. An animal can understand your intent and your emotions even if it can't grasp the meaning behind what you're saying. Hell, a human can, too; we're not immune. Tone of voice and body language are enormously important, and yet we seem to always get lost in the intricacies of verbalisation.

We should probably just relax and eat some acorns.

25 September 2010

Haunting

One day I found a tale of old
whose name I'd known but failed to seek,
and reading it consumed my night
till I retired, satisfied,
but fell into such fevered dreams
as ne'er had I experienced
that when I woke did not depart
and haunted me throughout the day.

I went about my daily life,
in hopes my madness soon would end,
but fell into anxiety
with paranoia calling me.
I would not go; I could not stay.
I grappled with the rationale
that this was all a side-effect
of what I had consumed last night.

Eventually it did depart.
My thoughts regained their normal state.
But I cannot quite shake the thought
that something changed inside me then.
Beware the thoughts of writers who
would prick you with their wicked pens.
Beware the words of others, friend;
it will make all the difference.

10 September 2010

Language Hate

Users of StackOverflow will no doubt have seen a certain very popular community-wiki question: What are five things you hate about your favorite language? It comes from Brian D. Foy's journal on the same subject, wherein he suggests that those who cannot find five things to hate about a given programming language (such as design flaws, warts, or general annoyances) "don't know the language well enough to either advocate it or pull in the big dollars using it".

In large part, this is true. Conversely, I think there's a lot of unfounded language hate out there, and people who make claims about the purportedly inferior quality of a language typically don't know said language well enough to make such claims either. The reality is that a lot of people out there are either simply misinformed or, worse, making arbitrary assumptions about languages that they neither know nor understand, and this is principally because they're afraid.

When you don't know a technology, it looks opaque at best, and hideous at worst. If a programmer doesn't experience an instant passionate attraction to a language (and, believe me, this is about as rare as it gets), the natural reaction is often to go exactly the opposite route and hate on it for no reason. This is the source of innumerable blog entries and forum posts excruciatingly detailing the author's visceral negative reaction, which typically goes something like this:

  • Installed new language.
  • Ran "hello world".
  • Left alone for a while.
  • Decided to attempt mid-size project in language.
  • Became frustrated when new language didn't work like old language.
  • Gave up.
  • Ranted about the failings of new language.

All of this is kind of a prelude to the important issue here: what I think. (After all, this is my blog.) Now, I can't say that language choice is wholly immaterial, as every language has its own strengths and weaknesses that should be appreciated even when the language itself is not suited to one's own needs. Take Erlang, for instance. I have never used Erlang, nor do I plan to do so in the near future. However, I can certainly appreciate the beauty of the syntax, the tightness and purity of the type system, and, of course, the breathtakingly simple concurrency support. Finding things to praise about a language that you know very little about is the first step toward actually getting yourself to sit down and learn it the way it was meant to be learned: standing on its own, with its own idioms, subtleties, strengths, and weaknesses.

But that brings me back around. Anyone who knows me knows I use C++ more than anything else, and there is only one reason for this: it doesn't get in my way. If I can come up with a program, chances are I'll be able to implement it efficiently, robustly, and, using appropriate idioms and libraries, quickly. Further, I don't pay a hidden cost for features that I don't use, which to me is another very important aspect of a language.

But what do I hate about C++? Ha! That I could limit it to only five things! In essence, everything I hate about C++, or any language that I use, is any "feature" or peculiarity that gets in the way of The Virtues of a Programmer:

Laziness: C++ makes me type. I may be a blazing fast typist. I may even have changed my keyboard layout to avoid as much as possible having to hit the Shift key while programming. But if there's anything about C++ that I dislike, it's the boilerplate. I'll be honest, I'm lazy. I like my editor to generate boilerplate code for me, but I also like it to generate it exactly as though I had typed it, which quite often is not possible. Thus C++ irritates my laziness.

Impatience: C++ makes me jump through hoops to express certain common ideas that are not natural to the language, and this is only slightly ameliorated by use of the (rather well designed) STL. I am impatient, and I want my program to be done as soon as possible so I can start using it and showing it off. Thus C++ irritates my impatience.

Hubris: C++ makes it difficult to maintain standards of code quality. While this is a natural trade-off against the raw expressive power of the language, C++ is definitely biased toward pragmatism at the expense of elegance and purity. I have seen the horrors of bad C++ code, from run-of-the-mill C written in C++ to the sort of stuff with a comment above it saying just "I'm sorry". But I digress.

If I haven't made these bad qualities seem bad enough, you're probably right. But they're the things that bother me on a day-to-day basis, and putting them up against the well-known Virtues is a good way of illustrating the reason behind the hate, without going into too much language-specific detail. In the context of these virtues, it becomes quite easy to analyse certain qualities about your preferred language that you like or dislike.

I would be remiss if I didn't go on to admit something that many who meet me are surprised to discover: I dislike Python. I've hardly used Python, and I already very much dislike it. It's simply a tool that I don't want to be involved with unless it's an absolute necessity.

Why? That's easy. The fundamental philosophy of Python is "there should be one, and preferably only one, obvious way to do it", designed as the antithesis of Perl's TMTOWTDI. This is in direct conflict with my views as a programmer, a linguist, and a human being. Bold words, eh?

Don't get me wrong. This philosophy is quite well integrated into the design of Python, and on the whole I have to praise the language for its simplicity and relative consistency despite its massive growth over the years. But I simply can't abide prescriptivism. TMTOWTDI expresses an analysis of the real situation in both programming and linguistics. There really is more than one way to write any given program, in the same way that there is more than one way to say a phrase. There are idioms, subtleties, and nuances. There is craftsmanship. Is it any surprise that Larry Wall was a student of linguistics, while Guido van Rossum studied mathematics and computer science?

It all comes down to pragmatism versus elegance, and for me, despite my obsession with precise, beautiful code, writing an ugly program that works is better than writing a beautiful program that does nothing.

The only other reason I have for disliking Python is a very small nitpick: its syntax. No, I don't mean whitespace; I'm fine with that bit. It's just that human languages have loads of redundancy and compressibility and texture, which combined make it easy to read and comprehend running text at high speed. Python's syntax is deliberately minimalistic, and consequently there's very little for the eye to hold on to, so the reading rate is (for me) drastically lowered. It's the same reason I have trouble reading Lisp and even its kid brother Scheme, pretty little thing though the latter is.

This is the very same reason that I can never get myself to really learn Spanish or Italian, though I can already understand quite a bit from my knowledge of French and Latin. A phonetic orthography decreases the amount of redundancy in a word, reducing the ability to compensate for errors, and consequently reducing the reading rate. Like I said, only a minor problem.

Anyway, I strongly encourage you to look at your own preferred language and ask yourself what you hate about it, or take a language you think you hate, and ask yourself what there is to like. Especially if you're designing a programming language, this can provide some startling insight into what tools you, as a programmer, need to be developing and using.

I think that's about it for today. Good night, and good luck.

26 August 2010

Macript

As I write, I'm reinstalling Linux because an upgrade fried my ability to boot. After backing up my home directory onto my Windows partition, I decided I needed something to do while downloading installation files. So I was going through some of the little one-off projects that I've done over the past few months, and decided to write about this one.

I call it Macript, a portmanteau of "macro" and "script". It's a short Perl program that reads a source file and scans it for anything that looks like a macro invocation. If a script by that name exists in the directory where Macript was invoked, the invocation is expanded to the standard output of running that script with whatever arguments were given via the macro.

That's pretty much it. And let me say, as a preprocessing step, it's much too useful. I would say that it's entirely changed my build process, but that would be a lie, because I'm pretty stuck in tradition. However, I suspect that a lot of people stand to benefit from a tool like this.

Say you want to perform some text-based program translation for which the ordinary C preprocessor is insufficient, or for which you simply don't want to resort to hairy macros or x-macros. Want to produce forward declarations for all of the functions in a source file? Easy. Want to perform conditional compilation based on the results of a configuration script? Piece of cake. Want to generate compile-time warnings about uses of functions that are marked // DEPRECATED in the source? Trivial.

All kinds of code generation and static analysis tasks become a breeze by adding this one simple preprocessing step to your build. I strongly encourage you to try it out, and I suspect you'll be pleased with the results. The best part is that it's not an enormous investment. Even software that relies heavily on Macript can almost certainly be rewritten to avoid it, but it just happens to automate a few things that can make life a whole heck (or at least a fourth of a heck) of a lot easier.

I promise you'll be able to find it on Sourceforge very soon. Stay tuned!

25 August 2010

Prog, Constify, and Pointer Single Loop

Hello again.

This summer holiday, in the absence of a job, I've spent a lot of time working on Prog, and recently I actually finished a compiler of sorts. It produces an intermediate format suitable for text-based translation (via the C preprocessor, for instance) into any target language that supports, whether directly or indirectly, a high-level translation from Prog. I'm working on the C++ target, and it's currently possible to compile trivial programs such as Hello World and FizzBuzz, among others.

So I didn't make my January release last year, but I will make it long before next year, and I'm happy about that. After all, I'm nineteen years old, and the longer I wait to release a language, the less impressed people will be by my age! I'm kidding, of course. I really just want the damn thing to be done so I can use it and enjoy it, and so that others can do the same.

On to the longer, more peculiar part of the title. I've been browsing Stack Overflow a lot lately—in fact, perhaps more than I should—and a topic came up recently that I thought was interesting enough to write about here.

The question concerned the possibility of a "constify" operation in C++ that would, for the remainder of the current scope, cause a variable to be treated as though it had been declared const. The original poster wanted to be able to write something like the following:

std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);
constify v;
v.push_back(4); // This is now a compile-time error.

Now, the real use of such an operation doesn't really show in std::vector, especially since C++0x introduces an initializer_list constructor that makes it trivial to initialise a const std::vector. But to be able to initialise a const object by calling mutating methods on that object? I was intrigued.

Obviously the syntax would have to change somewhat. The simplest route was obviously to declare a local const reference bound to the original variable. My original constify macro looked like this:

#define constify(type, id) \
type const& id##_const(id), & id(id##_const)

The first declaration creates a const reference id_const bound to the constified variable id. It then binds another reference named id to id_const. It has to be done this way because type const& id(id); is erroneous: both mentions of id would refer to the new local variable, not the existing one.

To use this version of the macro, it had to be wrapped in its own scope, such as a loop body or bare block:

std::vector<int> v;
// ...
{
    constify(std::vector<int>, v)
    // ...
}

Using the macro outside a local scope resulted in a duplicate definition of the constified variable. I didn't like the requirement of wrapping the macro in its own scope, so I sought a more elegant solution. Nobody likes trying to mash a block into a macro invocation, so I decided to try turning the macro into something more like an inbuilt C++ construct that could be used like so:

constify (var) {
    // var is const for the duration of this scope.
}

The first problem was easily solved: how to have the compiler deduce the type of the constified variable. C++0x introduces the decltype keyword, which allowed me to rewrite constify:

#define constify(id) \
decltype(id) const& id##_const(id), & id(id##_const);

So far so good. (NB: In non-C++0x compilers, there is often a typeof extension with a very similar effect.) But how to allow for the clean syntax? The answer was in what I'll call, for lack of a better term and only because I've never seen it before, the Pointer Single Loop idiom.

Basically, the constify macro had to introduce a for loop, which would provide the scope for the local variables var_const and var, while also executing only once. There is no way to introduce a loop control variable of a type unrelated to that of the constified variable, and no way to produce a local scope surrounding the expansion of the macro without also requiring that the loop body be mashed as a parameter into the invocation of constify.

Meet the Pointer Single Loop idiom. It relies on the fact that, for any variable t of type T, this loop executes only once:

for (T* p = &t; p; p = 0) {}

Further, since it's possible to construct a pointer to any type, and since we already have the name of the variable to be constified, it's trivial to construct the final implementation of constify:

#define constify(id) \
for (decltype(id) const& id##_const(id), \
    & id(id##_const), * constify_index = &id; \
    constify_index; constify_index = 0)

Like the earlier implementations of constify, this accepts an id and binds it to a local, immutable id via id_const. The difference is that it uses a Pointer Single Loop to execute the next statement once in the context of the new local variables. It requires only that the variable constify_index be reserved for the purpose of the loop idiom, and works exactly as expected:

std::vector<int> v;

v.push_back(1);
v.push_back(2);
v.push_back(3);

constify (v) {

    v.push_back(4); // This is a compile-time error.

}

This also works if v happens to be declared const already. In all, it was very easy to write, and I imagine that judicious use of C++0x features alongside an idiom of this sort could result in very sensible, readable extensions to C++, which provide real value to the language while retaining compatibility across standards and compilers. I can totally see Boost going for something like this.

13 May 2010

Vision Update

It's been a long, mad journey, but I made it all the way to Vision 1.2. This version has a bunch of fixes and additions that really flesh the language out in terms of usability and security. There's even a build for Mac OS X now, and I'm going to put together a Debian package, too. You should hop on the Vision train by visiting the Sourceforge project page, and the project home page, too!

18 March 2010

Language Updates

Well, I paused working on my game to get back to working on Prog, then paused Prog again to whip up Vision, a tiny little thing for making templates for dynamic Web content—or any XML content, really. I set up both the Prog site and the Vision site using Vision, and it appears to be both handy and scalable. Here's a quick example.

Each page on the Vision site (okay, at the moment, the page on the Vision site) consists of two files: the Vision script that generates the output, and the content file giving the body text. The Vision file is very simple:

#!/home/groups/v/vi/vision-language/vision
@include Page;
Page {
    "Vision Web Templating Language";
    ../htdocs/MainPage
}

This gives the path to the Vision interpreter, includes a page template in a file called Page, and then invokes the template of the same name with some text and the contents of a file. That's it. The Page template is very simple, too, and looks like this:

@define Page (TITLE; BODY) {
    html {
        head {
            title { "Vision &gt; "; TITLE }
            link (rel "stylesheet"; type "text/css"; href "/style.css");
        }
        body {
            div main {
                div (class "cell") { h1 { TITLE } }
                div (class "cell") { BODY }
            }
        }
    }
}

As you can see, the template produces an HTML document with whatever title and body content were specified. Everything else on the site is controlled by the CSS document style.css. New pages can be created quite easily by authoring a new content file and duplicating the Vision script with a reference to the new file.