Domain Specific Language Nonsense
Much of the pop-literature about DSLs has a single concern: how to make code look more like natural English. It is uninterested in those things that make languages more or less useful.
The last few years have seen an increase in interest in Domain-Specific Languages (DSLs) within the Java development community. This appears to have been ignited by the work of Charles Simonyi on Intentional Software, Martin Fowler’s article on DSLs, the development of JMock and a few other articles and initiatives that have made news.
I am in favour of raising the bar in mainstream development, but I find the approach to DSLs as practised by many of those currently enamoured of them somewhat lacking, for the very same reason that the mainstream development bar is low in the first place.
It is sad that, in commercial software development, many developers are burdened with three difficulties. First, they find it hard to sustain interest in a topic whose foundations, results and consequences require non-trivial effort to understand, no matter how germane it may be to their chosen profession and situation. Second, they believe that their ability to construe meaning from the vapour of an introductory paragraph verges on the paranormal. Third, their ability to critically review an idea is often impaired as the blood flows away from the part of the brain that deals with analysis to that responsible for creativity.
As a result, many of those who are happily inflicting ‘DSLs’ on their teams, or blogging about them and thus inflicting their dysfunctional notions on the world, do not know what DSLs are, or even, to get right back to basics, how they might evaluate a language, domain-specific or otherwise. They appear to have built up a plausible but wholly incorrect idea of the meaning of ‘domain-specific language’ based on nothing more than that which can be deduced from the three words ‘domain’, ‘specific’ and ‘language’, often bracing that meaning with more enthusiasm and ill-informed references to Literate Programming than they should be legally licensed for [1].
My main issue with much of the pop-literature about DSLs is that it has a single concern, which is how to make code more ‘readable’, by which is meant more immediately understandable. While comprehensibility is a laudable aspiration, the singular obsession of the would-be DSL writers appears to be that this should be achieved by making the code look more like natural English, using an idiom known as a ‘fluent interface’, and they are uninterested in those things that make languages more or less useful.
This is an extraordinarily narrow view of what makes a body of work understandable; a work can be understood if it is both clearly expressed and the reader has the necessary background in the subject matter. The heroic attempt to make Java look like natural English tries to trivialise the second point and actually tends to undermine the first at every level other than the ‘top-most’, rather than support it.
In addition, it ignores the facts that English has evolved over millennia as a general purpose language, is frequently imprecise, often acquires precision only through redundancy and often conveys any meaning at all only because of a vast shared context [2]. Imprecision, redundancy and the need for a massive shared context are hardly a sound basis for programming. They are, however, a swamp in which code of the most turgid kind can be found.
If you doubt my characterisation of English (or any natural language), look up almost any non-technical, general term in a dictionary and you will usually find a large number of possible definitions, some of which are logically contradictory. Take for example the word ‘general’, used as part of a technical term in the previous paragraph. Amongst its non-technical meanings, it can mean both ‘universal’ and ‘most’. The majority of English speakers would agree that one acceptable meaning of ‘most’ would be something like ‘the majority but not all’, the last clause of which contradicts ‘universal’ as a definition. Yet listeners don’t stare at me in confusion when I use the word ‘general’ in a phrase (they frequently do stare at me but usually because they understand exactly what I mean). The phrase as a whole yields its meaning by some interactive process that the words engage in upon alighting on the mind of the reader or listener.
There is clearly a contradiction between the informal use of English as practised by natural speakers and the needs of a language that can be used for precisely specifying processes and structures.
In order to build useful domain-specific languages, we must answer at least two questions:
- what is the domain?
- what makes a good language?
The first point seems obvious and should need no further discussion. Sadly, this is frequently not the case. Even so, I am not going to talk about it any more here. Instead I would like to focus on the second point.
In the excellent book Structure and Interpretation of Computer Programs (commonly referred to as SICP), Abelson, Sussman and Sussman state [3] that three mechanisms are present in every powerful language [4]:
- primitive expressions, which represent the simplest entities the language is concerned with,
- means of combination, by which compound elements are built from simpler ones, and
- means of abstraction, by which compound elements can be named and manipulated as units.
Furthermore, they admonish us to ‘pay particular attention to the means that the language provides for combining simple ideas to form more complex ideas’ [5].
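To make the three mechanisms concrete, here is a minimal sketch in Java. It assumes a hypothetical domain of shipping rules; the names `Rule`, `Order` and `ShippingRules` are invented for illustration and appear nowhere in SICP. It is one possible reading of the three criteria, not a definitive one.

```java
// Hypothetical domain: does an order qualify for free shipping?
// Every name here is invented for illustration.

// A rule is a yes/no question about an order.
interface Rule {
    boolean accepts(Order order);

    // Means of combination: any two rules yield a new rule.
    default Rule and(Rule other) {
        return order -> accepts(order) && other.accepts(order);
    }

    default Rule or(Rule other) {
        return order -> accepts(order) || other.accepts(order);
    }
}

final class Order {
    final double total;
    final String country;
    final boolean member;

    Order(double total, String country, boolean member) {
        this.total = total;
        this.country = country;
        this.member = member;
    }
}

final class ShippingRules {
    // Primitive expressions: the simplest things the language can say.
    static Rule totalAtLeast(double amount) {
        return order -> order.total >= amount;
    }

    static Rule shipsTo(String country) {
        return order -> order.country.equals(country);
    }

    static final Rule IS_MEMBER = order -> order.member;

    // Means of abstraction: a compound rule is given a name and is then
    // used as a unit, exactly as if it were a primitive.
    static final Rule DOMESTIC_BULK = shipsTo("GB").and(totalAtLeast(50.0));

    static final Rule FREE_SHIPPING =
            DOMESTIC_BULK.or(IS_MEMBER.and(totalAtLeast(20.0)));
}
```

The point of the sketch is that `and` and `or` accept any `Rule`, including named compounds such as `DOMESTIC_BULK`, so new combinations cost no new classes.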
It is plain that this way of thinking allows us to understand a wide variety of recognised languages. For example:
- Those with more than a superficial understanding of object-oriented design patterns recognise that, while the patterns themselves are useful, it is the emergent pattern language that gives us power.
- Those who understand European music from the era of common practice have a multi-dimensional language that has the attributes described above. From personal experience, the process of music composition is best when the language is manipulated directly in music with some support from reasoning through natural language, rather than by a translation from natural language into music. That is, the language of music is extremely abstract, although a projection of it can be made into natural language.
- Mathematics. Well of course.
If we evaluate the ‘fluent interface’ reduction of DSLs (i.e. the subject of this article) according to the criteria described in SICP, we find the reason for my distaste for them: in general they offer nothing more than a collection of primitive expressions, no abstractions above those available in the implementation language, and a single means of combination. That means of combination usually relies on auxiliary classes whose methods are meaningless outside of the context of the caller, and which must be implemented for each required combination.
The result of this is a thin shell of fluent DSL-ness, with an underlying mess of accidental complexity, implemented in code that is several times more voluminous than the simple, direct, natural OO implementation would have required. How much more degenerate could it be?
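A small, hypothetical Java illustration of this kind of code follows. The discount domain and every class and method name in it are invented for the purpose; it is a sketch of the pattern, not a quotation from any real project.

```java
// Hypothetical rule: "give 10 percent off orders over 100".
// All names are invented for illustration.
final class Discount {
    final int percent;
    final double threshold;

    // The direct OO form: one constructor, one method.
    Discount(int percent, double threshold) {
        this.percent = percent;
        this.threshold = threshold;
    }

    double priceFor(double orderTotal) {
        return orderTotal > threshold
                ? orderTotal * (100 - percent) / 100.0
                : orderTotal;
    }

    // Entry point of the fluent shell:
    // Discount.give(10).percentOff().ordersOver(100.0)
    static AmountStage give(int amount) {
        return new AmountStage(amount);
    }
}

// Auxiliary class: exists only to carry the sentence one word further.
final class AmountStage {
    private final int amount;

    AmountStage(int amount) {
        this.amount = amount;
    }

    // Means nothing outside the chain that calls it.
    TargetStage percentOff() {
        return new TargetStage(amount);
    }
}

// A second auxiliary class, needed for the next word in the sentence.
final class TargetStage {
    private final int percent;

    TargetStage(int percent) {
        this.percent = percent;
    }

    Discount ordersOver(double threshold) {
        return new Discount(percent, threshold);
    }
}
```

Here `Discount.give(10).percentOff().ordersOver(100.0)` and `new Discount(10, 100.0)` build the same object. The fluent form costs two extra classes, and a new phrasing such as a fixed amount off, or a discount for members only, would need further stage classes or methods, while the chain still offers no way to combine or name rules.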
Note that the API presented by JMock, for example, does not fit into this category. It goes about as far as is possible, given Java’s constraints, to give clients power to produce combinations through the use of InvocationMatcher, Stub, etc.
In addition, JMock combines the concept of DSL with that of fluent interface quite beautifully, so that to the client, it truly does provide a more powerful mechanism than would exist if either was removed.
Finally, JMock is a highly-geared API; it took Nat Pryce and Steve Freeman some time to build something that tens of thousands can use without worrying about the internals [6]. This description cannot be applied to bespoke software development.
So, for those developers who find themselves enamoured of the notion of domain-specific languages, I beg you to consider what you are hoping to achieve other than a superficial sense of ‘easy readability’ for a non-technical audience, and to critically evaluate your language using appropriate criteria; criteria that would have been familiar to John Locke in 1690 [7]:
The acts of the mind, wherein it exerts its power over simple ideas, are chiefly these three: 1. Combining several simple ideas into one compound one, and thus all complex ideas are made. 2. The second is bringing two ideas, whether simple or complex, together, and setting them by one another so as to take a view of them at once, without uniting them into one, by which it gets all its ideas of relations. 3. The third is separating them from all other ideas that accompany them in their real existence: this is called abstraction, and thus all its general ideas are made.
John Locke, An Essay Concerning Human Understanding (1690)
[2] This context is given not only by shared definition of terms but also by similarity of experience.
[4] The remainder of this article depends on the congruency of meanings of the word ‘language’ in the context of SICP and in the term ‘Domain-Specific Language’. I assume this.
[5] Robert Floyd presents a broadly similar view in his classic paper The Paradigms of Programming.
[6] Actually, the internals look pretty good.
[7] This passage was quoted at the beginning of Chapter 1 in SICP, indicating, unsurprisingly, that the authors connect the power of languages with their ability to map onto mental processes involved in understanding.