Brown Paper Packages Tied Up With String
A lamentation on the lack of skill in the use of Java packages
When I join a team, I usually spend some time looking around the code so that I can reach an understanding of the product of the team’s labours and the shape of my near future.
One of the things that I try to come to terms with is the package structure that the team has chosen. I am, alas, not often enlightened by this endeavour, and I often feel an unpleasant sense of foreboding about the next few months.
The reason that I am interested in the package structure is that it represents the highest level of organisation provided by Java. A good package structure can provide a significant amount of information about the system’s concepts and indicates that the development team have some degree of clarity, at least at this level.
Failing to effectively use packages suggests to me that the developers may not demonstrate the clarity of thought – prospectively or retrospectively – that I would prefer.
Packages were introduced into Java primarily to provide namespaces. By this means, developers working on different projects and sharing code may avoid clashes in type names and the names of other resources. So, for example, the AWT List can happily co-exist with the Collections Framework List without any confusion, because they have been placed into different packages.
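As a minimal sketch of that co-existence (the class name ListNamespaces is invented for illustration), both List types can be referred to in a single source file by their fully qualified names:

```java
// Both types have the simple name "List"; only the package name tells them apart.
final class ListNamespaces {

    /** Fully qualified name of the AWT list widget. */
    static String awtListName() {
        return java.awt.List.class.getName();
    }

    /** Fully qualified name of the Collections Framework list interface. */
    static String collectionsListName() {
        return java.util.List.class.getName();
    }

    public static void main(String[] args) {
        // Using the fully qualified name avoids any clash in this file.
        java.util.List<String> projects = new java.util.ArrayList<>();
        projects.add("journal");

        System.out.println(awtListName());         // prints java.awt.List
        System.out.println(collectionsListName()); // prints java.util.List
        System.out.println(projects.size());       // prints 1
    }
}
```

Importing one of the two would let it be used by its short name, while the other would still have to be written out in full.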
However, the provision of namespaces is not the end of the story for packages. There is something much deeper about them: just like classes and interfaces, they provide a means of abstraction. By this, I mean that they provide a means ‘by which compound elements can be named and manipulated as units’ [1].
What are the elements that are compounded by packages? They are classes, interfaces and other, non-Java resources.
How are these compounds manipulated? As programmers, we bring them to the front of our thoughts or push them to the back, depending on whether or not they form part of the problem at hand. We also add and remove elements from them as we craft our machine [2].
My issues with the way that packages are frequently used are to do with how they are named and what they contain. Clearly, these two issues are intimately related, for once we accept that a package is a means of abstraction, then the name must be chosen to denote its contents, otherwise the abstraction is fundamentally broken.
The problem with the naming of packages is that, even though the dot-separated elements of the name do not define a package hierarchy, they often do collectively imply a genus et differentiam classification scheme. For example, the fictitious [3] package name com.stateofflow.journal, given the conventional interpretation, tells us that:
- The package belongs to a commercial organisation (‘com’), not to a non-profit organisation (‘org’) or one of the other common prefixes.
- The name of that commercial organisation is ‘stateofflow’, not any of the other possible names for commercial organisations.
- The package belongs to the ‘journal’ project of the stateofflow commercial organisation, not to any other project within that organisation.
In other words, each element of the package name, when read from left to right, refines what has gone before. Many, if not most, development teams follow this kind of convention for the first part of their package names.
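The left-to-right reading can be sketched in a few lines. The class and method names here are hypothetical. The sketch only lists each successively longer prefix of a name, which is how a reader narrows the classification down; it says nothing about how Java itself treats packages:

```java
import java.util.ArrayList;
import java.util.List;

final class PackageNameReader {

    /** Returns each successively longer prefix of a dot-separated package name. */
    static List<String> refinements(String packageName) {
        List<String> prefixes = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String element : packageName.split("\\.")) {
            if (prefix.length() > 0) {
                prefix.append('.');
            }
            prefix.append(element);
            prefixes.add(prefix.toString());
        }
        return prefixes;
    }

    public static void main(String[] args) {
        // Prints, one per line: com, com.stateofflow, com.stateofflow.journal
        for (String prefix : refinements("com.stateofflow.journal")) {
            System.out.println(prefix);
        }
    }
}
```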
However, what often happens immediately after the conventional (and frequently redundant) prefix is that this scheme breaks down because the team fails to continue the classification using the most salient attributes first.
For me, the most important distinction separates the domain model, user interface, persistence, etc. [4] So, the next element in my package names, after the company and project prefix, is usually ‘model’, ‘persistence’, ‘ui’, etc. By this method, we separate the concerns of our machine as soon as possible.
After this, the situation is entirely application dependent. However, the notion of distinguishing objects according to their most significant differences first still holds. It is a process of stepwise refinement that will yield semantically rich package names that serve their respective abstractions well.
Now that we have good names for our packages we must put good things into them. This too is frequently done badly.
Two of the oft-stated OO notions are that classes should:
- represent a single abstraction. Classes that do not conform to this notion lack cohesion and should be split up.
- expose the smallest possible public interface. Implementation details should not be exposed on the public interface.
I suggest that Java packages should conform to the same notions. This means that packages should contain a small number of classes and interfaces (in general, more than about 6 makes me nervous), only a small fraction of which (one or two) are exposed to the rest of the system, all of which collaborate to fulfil the connotations of the package name.
Instead I often find that packages contain:
- far too many types, suggesting a lack of cohesion
- far too many public types, suggesting that the package is leaking its implementation details, or that the package lacks cohesion
- types that do not collaborate with each other
Part of the problem is that developers often introduce packages with names such as com.foo.myproject.servlets and then place all of their application’s servlets in that single package. Although ‘servlet’ is a well-defined and precise term in the context of the J2EE APIs, it connotes far too much for this context and is therefore imprecise. It is likely that responsibilities could be broken out of each servlet to create smaller, more tightly focussed classes. Once that is done, the appropriate thing to do is to move the extracted classes and the original servlet to their own appropriately named package, making the original class public and all of the extracted ones package visible. This will provide much greater clarity (and significantly easier unit testing) than before the refactoring.
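The shape of such a package might look like the following sketch. Every name in it is invented for illustration, and a plain class stands in for the servlet so that the sketch needs no servlet API. In a real codebase each type would normally live in its own file, and each file would begin with a declaration such as package com.foo.myproject.signup; the declaration is left out here so that the sketch compiles as a single file.

```java
/** The one public type: the only part of the package the rest of the system sees. */
public class SignupHandler {
    private final EmailValidator validator = new EmailValidator();
    private final WelcomeMessage message = new WelcomeMessage();

    /** Accepts or rejects a signup request and returns the text to show the user. */
    public String handle(String email) {
        if (!validator.isValid(email)) {
            return "Rejected: " + email;
        }
        return message.forAddress(email);
    }
}

/** No access modifier, so this collaborator is visible only inside the package. */
class EmailValidator {
    /** Deliberately crude check: an '@' with at least one character on each side. */
    boolean isValid(String email) {
        int at = email.indexOf('@');
        return at > 0 && at < email.length() - 1;
    }
}

/** Another package-visible collaborator, extracted from the original class. */
class WelcomeMessage {
    String forAddress(String email) {
        return "Welcome, " + email;
    }
}
```

A unit test placed in the same package can exercise EmailValidator and WelcomeMessage directly, while code in other packages can reach only SignupHandler.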
Another part of the problem is that developers seem reluctant to create packages and seem to prefer to put new types into existing packages. This usually results in the problems described above.
Just as classes and interfaces should be small and tightly focussed, according to the OO ideal, so too should packages. Their names should be well chosen according to a defined naming scheme – my preference is modelled after a hierarchical classification scheme – and their contents should be a collection of collaborating types and other resources, most of which are hidden from the rest of the system.
Used in this way, the package structure presents a level of abstraction above that of the Java type system. This significantly aids comprehension of the codebase.
[2] Note that the two operations we do not do with packages are add and remove a package to or from another package. The notion of ‘package hierarchy’ is a misnomer. However, it is a useful and, as far as I can tell, harmless conceptual slip and I don’t usually hesitate to make it.
[3] Any similarity to real package names, living or dead, is entirely accidental.
[4] Others may have a different notion of what is important and if they can express it clearly, I might give it a try. If it can’t be expressed clearly, I might try to help clarify it, but I won’t be trying it out just yet.