Sorting Data Objects

Many projects need to sort collections of their data objects. The Java Collections Framework supplies two interfaces, Comparator and Comparable, and two utility methods, Collections.sort(List) and Collections.sort(List, Comparator).

These utilities suggest placing lots of anonymous inner classes in your code:

Collections.sort(people, new Comparator<Person>() {
	public int compare(Person o1, Person o2) {
		return o1.getGivenName().compareTo(o2.getSurname());
	}
});

Note that there is a blatant error in the above code: I wanted to sort my people by their given name, but the comparator compares one person’s given name with the other’s surname, so they do not come out in the right order.

So you write your unit test for your class and go back to fix your error.

Two months later, you are working on some slightly unrelated code in another part of the codebase, and again you want to sort your people list. By now, you have forgotten that you wrote your anonymous inner class to sort your people, so you write it again.

Firstly, you’ve just duplicated code. We all know this is bad, and there are tools that look for duplicated code. However, because this snippet is very short, those tools are unlikely to pick it up, so it sits there and festers.

An answer to this issue is to collect all your comparators in one place, and where better to put them than in the data class that is being sorted (assuming that you have control over the data class itself)?

import java.util.Comparator;

public class Person implements Comparable<Person> {
	private final String givenName;
	private final String surname;

	// getters, constructors, hashcode, equals

	@Override public int compareTo(Person o) {
		return Comp.COMBINED_NAME.compare(this, o);
	}

	public static enum Comp implements Comparator<Person> {
		SURNAME {
			@Override public int compare(Person o1, Person o2) {
				return o1.getSurname().compareTo(o2.getSurname());
			}
		},
		GIVEN_NAME {
			@Override public int compare(Person o1, Person o2) {
				return o1.getGivenName().compareTo(o2.getGivenName());
			}
		},
		COMBINED_NAME {
			@Override public int compare(Person o1, Person o2) {
				int r = SURNAME.compare(o1, o2);
				if (r == 0) {
					return GIVEN_NAME.compare(o1, o2);
				} else {
					return r;
				}
			}
		},
		;
		@Override public abstract int compare(Person o1, Person o2);
	}
}

And using it:

List<Person> people = new ArrayList<Person>();
// add elements.
Collections.sort(people); // gets the default sort order
Collections.sort(people, Person.Comp.SURNAME); // sorts by the surname

There, nice and tidy: data and associated functionality encapsulated as it should be, and the code that uses it is neat and self-explanatory. You can either use Collections.sort(List) to get the default sort order, or use Collections.sort(List, Comparator) with one of the enum elements.
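To try the pattern end to end, here is a minimal, self-contained sketch. The SortDemo class, the sample names and the sorted helper are assumptions made purely for illustration; the Person class is a cut-down copy of the one above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortDemo {

    /** Cut-down stand-in for the Person class above. */
    static final class Person implements Comparable<Person> {
        private final String givenName;
        private final String surname;

        Person(String givenName, String surname) {
            this.givenName = givenName;
            this.surname = surname;
        }

        String getGivenName() { return givenName; }
        String getSurname() { return surname; }

        @Override public int compareTo(Person o) {
            return Comp.COMBINED_NAME.compare(this, o);
        }

        @Override public String toString() {
            return givenName + " " + surname;
        }

        enum Comp implements Comparator<Person> {
            SURNAME {
                @Override public int compare(Person a, Person b) {
                    return a.getSurname().compareTo(b.getSurname());
                }
            },
            GIVEN_NAME {
                @Override public int compare(Person a, Person b) {
                    return a.getGivenName().compareTo(b.getGivenName());
                }
            },
            COMBINED_NAME {
                @Override public int compare(Person a, Person b) {
                    int r = SURNAME.compare(a, b);
                    return r != 0 ? r : GIVEN_NAME.compare(a, b);
                }
            }
        }
    }

    /** Hypothetical sample data. */
    static List<Person> sample() {
        return Arrays.asList(
            new Person("Alan", "Turing"),
            new Person("Grace", "Hopper"),
            new Person("Mary", "Kay"),
            new Person("Ada", "Lovelace"),
            new Person("Alan", "Kay"));
    }

    /** Sorts a copy of the list with the given order and returns the names. */
    static List<String> sorted(List<Person> people, Comparator<? super Person> order) {
        List<Person> copy = new ArrayList<>(people);
        Collections.sort(copy, order);
        List<String> names = new ArrayList<>();
        for (Person p : copy) {
            names.add(p.toString());
        }
        return names;
    }

    public static void main(String[] args) {
        List<Person> people = sample();
        // Default order: surname, then given name.
        System.out.println(sorted(people, Comparator.<Person>naturalOrder()));
        // One of the named orders from the enum.
        System.out.println(sorted(people, Person.Comp.GIVEN_NAME));
        // Enum constants are ordinary Comparators, so the JDK helpers compose with them.
        System.out.println(sorted(people, Collections.reverseOrder(Person.Comp.SURNAME)));
    }
}
```

Because each enum constant is a plain Comparator, helpers such as Collections.reverseOrder(Comparator) work with it unchanged, so there is no need to add a reversed constant for every sort order.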


Java Internationalisation

Internationalisation is a continuing problem for software development. Changing demands and the addition, removal or enhancement of features lead to translated phrases becoming obsolete, so translation files need continual updating.

The act of translating the words and phrases is usually not the job of the programmer, so there needs to be an easy way for these changes to be made.

The standard mechanism for translating an application into multiple languages is for the programmer to litter the user interface code with markers, saying “use this phrase here”, and then some generic code will replace that marker with the correct translation for the language that the user has selected.

Current Solutions

Standard JDK

In Java this is frequently done using the java.util.ResourceBundle family of classes, the most popular of which is java.util.PropertyResourceBundle. The code might look like:

HelloWorld.java
import java.util.ResourceBundle;
public class HelloWorld {
	public static void main(String[] args) {
		ResourceBundle bundle = ResourceBundle.getBundle("HelloWorld");
		System.out.print(bundle.getString("hello_world"));
	}
}

With the translation files looking like:

HelloWorld.properties
hello_world=Hello World!
HelloWorld_fr.properties
hello_world=Bonjour tout le monde!

If your system locale is set to French then the output of running this is simply

Bonjour tout le monde!

Any other language and the default language is used, so the output would be

Hello World!

Eclipse [1]

A bundle is defined by a class:

public class HelloWorldBundle {
	public String hello_world;
}

There is a helper class that reads the translation file and fills in the field values, so if you specify French, then the HelloWorldBundle_fr.properties is used.

With client code like:

import java.util.Locale;
public class HelloWorld {
	public static void main(String[] args) {
		HelloWorldBundle bundle = BundleHelper.getBundle(HelloWorldBundle.class, Locale.FRENCH);
		System.out.print(bundle.hello_world);
	}
}

The expected output is:

Bonjour tout le monde!
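The helper class mentioned above is not shown, so here is a rough sketch of how such a helper could work using reflection. Everything in it (FieldBundleDemo, fill, parse) is a hypothetical name invented for this illustration and is not the Eclipse code; the translations are parsed from a String rather than from a _fr.properties file so that the example runs on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Properties;

public class FieldBundleDemo {

    /** Bundle class in the style shown above: one public field per key. */
    public static class HelloWorldBundle {
        public String hello_world;
    }

    /** Hypothetical helper: copies each property into the public String field of the same name. */
    static <T> T fill(Class<T> type, Properties translations) {
        try {
            T bundle = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getFields()) {
                boolean isKeyField = field.getType() == String.class
                        && !Modifier.isStatic(field.getModifiers());
                if (!isKeyField) {
                    continue;
                }
                String value = translations.getProperty(field.getName());
                if (value == null) {
                    // Fail early instead of leaving a null to surface in the user interface.
                    throw new IllegalStateException("No translation for key: " + field.getName());
                }
                field.set(bundle, value);
            }
            return bundle;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot populate " + type.getName(), e);
        }
    }

    /** Parses properties-file syntax from a String, standing in for a real .properties file. */
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        HelloWorldBundle fr = fill(HelloWorldBundle.class, parse("hello_world=Bonjour tout le monde!"));
        System.out.println(fr.hello_world);
    }
}
```

Failing when a field has no translation is a design choice made for this sketch; a real helper might fall back to a default-language file instead.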

Things start getting more complicated when you want to say something like “It looks like there are 15 apples on the tree.” where the number 15 comes from some algorithm that looks at a photograph of an apple tree and counts the apples. You could make this two keys, the first one being “It looks like there are” and the second one being “apples on the tree”.

This approach has three major flaws. Firstly, if there is only one apple then the sentence would be incorrect and would read “It looks like there are 1 apples on the tree.” Secondly, the number may belong at the start of the sentence or at the end; its position varies with the language. Thirdly, if your translator actually wants to say “I see 15 apples! Yes 15 whole apples!” then you just cannot do this with the current code.

To our rescue comes the java.text.MessageFormat class, which allows us to do variable substitution in a String:

It looks like there {0,choice,0#are|1#is|1<are} {0,number,integer} apple{0,choice,0#s|1#|1<s} on the tree.

This starts to get complicated; see the more precise documentation [2] [3] for how these patterns work. Suffice to say, if the number is 1 then the string will be “It looks like there is 1 apple on the tree.” and for any other positive number the string will read “It looks like there are X apples on the tree.”

Your client code may then look something like this: [4]

import java.text.MessageFormat;
import java.util.ResourceBundle;
public class Apples {
	public static void showAppleCount(int apples) {
		ResourceBundle bundle = ResourceBundle.getBundle("Apples");
		MessageFormat format = new MessageFormat(bundle.getString("apple_count"), bundle.getLocale());
		System.out.print(format.format(new Object[]{Integer.valueOf(apples)}));
	}
}
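To see the choice pattern at work without a properties file, the following sketch hard-codes a pattern and formats a few counts. The AppleCountDemo class and its appleCount method are names made up for this example. Note the empty alternative after 1# in the second choice: it gives exactly one apple no plural suffix.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class AppleCountDemo {

    // "1#" followed by nothing means: for exactly 1, append no suffix.
    static final String PATTERN =
        "It looks like there {0,choice,0#are|1#is|1<are} {0,number,integer} "
        + "apple{0,choice,0#s|1#|1<s} on the tree.";

    /** Formats the sentence for the given count using the given locale's number rules. */
    static String appleCount(int apples, Locale locale) {
        MessageFormat format = new MessageFormat(PATTERN, locale);
        return format.format(new Object[] { apples });
    }

    public static void main(String[] args) {
        System.out.println(appleCount(0, Locale.ENGLISH));
        System.out.println(appleCount(1, Locale.ENGLISH));
        System.out.println(appleCount(15, Locale.ENGLISH));
    }
}
```

With Locale.ENGLISH this should print “are 0 apples”, “is 1 apple” and “are 15 apples” in the three sentences. A ChoiceFormat alternative applies from its limit up to the next limit, which is why the singular case needs its own 1# entry.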

Problems

Standard JDK

The first, and most serious, problem with the JDK-based translations is that you have hard-coded strings in every corner of your code. Such strings invite simple typing errors, which lead to either a bundle not being loaded or a key not being found. Both of these issues lead to ugly views in your application.
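A small sketch of the key-not-found case: a mistyped key compiles without complaint and only fails when the line runs. The TypoDemo class and the “!key!” fallback are assumptions made for illustration; the bundle is built from a String so that the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class TypoDemo {

    /** Builds a bundle from properties-file text, standing in for HelloWorld.properties. */
    static ResourceBundle bundleOf(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up a key; a missing key is rendered as "!key!" (a fallback chosen for this sketch). */
    static String lookup(ResourceBundle bundle, String key) {
        try {
            return bundle.getString(key);
        } catch (MissingResourceException e) {
            return "!" + key + "!";
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = bundleOf("hello_world=Hello World!");
        System.out.println(lookup(bundle, "hello_world")); // the key exists
        System.out.println(lookup(bundle, "hello_wrold")); // typo: compiles, fails at run time
    }
}
```

Without the catch block the second lookup would throw a MissingResourceException, and the compiler cannot warn about either outcome.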

The second problem with having strings in your code is that there is a big temptation to do one of:

  1. Store your internationalisation key in your database.
  2. Build your internationalisation key dynamically from various parts in Java code.
  3. Read your internationalisation key from user input.

In a big project, there may be thousands of keys in use across your project. Each key needs translating by your translator, each key takes time to translate, each key therefore costs money. Failing to resist temptation will lead to keys that are translated but can never be used.

It is possible to detect that a key is definitely used: search your codebase for "key_name" and, if there are any hits, you know for sure that the key is used.

If a key is stored in a database, you have no easy reference to search, so you can’t see if a key is used.

There are myriad ways to get a String in code: build it from a byte array, concatenate two strings, or take a substring of another string, to name but a few. Once you do this, it is no longer trivial to discover whether a particular String is ever built, and hence you cannot know whether an internationalisation key is used.

Since a user can normally input anything, reading your internationalisation key from user input means that keys can be arbitrarily used, and as such, you can never remove a key from your translation files.

Eclipse

The Eclipse solution goes a long way to resolving the issues above:

  • It is easy to see if a key is used: do a usage search [5] for the variable, and if the variable is not used, you can be sure that the key is not used.
  • It is almost type-safe. Using a key that does not exist will result in a compile error, leading to the programmer fixing it before the application ever gets into the hands of users.
  • Because you can be sure that a key is used, you can automatically check to see if there are any extra key/value pairs in your translation files, and flag any keys that need removing.
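The automatic check in the last point could be sketched as follows. KeyAuditDemo and its methods are hypothetical names, not part of Eclipse. The sketch only compares the keys in a translation file with the fields declared on the bundle class; whether a field is itself referenced by client code is still a job for the IDE's usage search.

```java
import java.lang.reflect.Field;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

public class KeyAuditDemo {

    /** Field-style bundle as described above. */
    public static class HelloWorldBundle {
        public String hello_world;
        public String apple_count;
    }

    /** The names of the public String fields are the keys the code can use. */
    static Set<String> declaredKeys(Class<?> bundleClass) {
        Set<String> keys = new TreeSet<>();
        for (Field field : bundleClass.getFields()) {
            if (field.getType() == String.class) {
                keys.add(field.getName());
            }
        }
        return keys;
    }

    /** Keys present in the translation file that no field refers to. */
    static Set<String> unusedKeys(Class<?> bundleClass, Properties translations) {
        Set<String> extra = new TreeSet<>(translations.stringPropertyNames());
        extra.removeAll(declaredKeys(bundleClass));
        return extra;
    }

    /** Fields that have no entry in the translation file. */
    static Set<String> missingKeys(Class<?> bundleClass, Properties translations) {
        Set<String> missing = declaredKeys(bundleClass);
        missing.removeAll(translations.stringPropertyNames());
        return missing;
    }

    public static void main(String[] args) {
        Properties french = new Properties();
        french.setProperty("hello_world", "Bonjour tout le monde!");
        french.setProperty("old_greeting", "Salut!");

        System.out.println("unused:  " + unusedKeys(HelloWorldBundle.class, french));
        System.out.println("missing: " + missingKeys(HelloWorldBundle.class, french));
    }
}
```

Run against each translation file in a unit test, a check like this flags stale keys for removal and untranslated keys for the translator.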

One serious issue with this framework is that the fields have to be public and mutable, so any client code can modify your translation values and produce whatever it wants, subverting the intended use of your translation bundle.

General

When you want to do any variable substitution, the resulting code that uses the MessageFormat class ends up much more verbose than it needs to be. In the above example, we used three long lines of code to get a fairly simple bit of output.

How do you test to see if your translation bundles are complete? With the JDK version, you cannot. The only way to check is to run your application and if you see an invalid string, go and find the key that is used, then check in the translation file for the value, then correct the issue. This is a laborious process that no one should want to have to go through.

Solution

In a nutshell, make the key-fetch a method call. You get the testability that the Eclipse method has; you get the usage search to see if someone is using your translation key; and the values are immutable.

Your bundle class can now look like this:

public abstract class HelloWorldBundle extends Bundle {
	public HelloWorldBundle(Locale locale) {
		super(locale);
	}

	abstract String helloWorld();
	abstract String appleCount(int apples);
}

And your client code can now look like this:

public class Apples {
	public static void showAppleCount(int apples) {
		HelloWorldBundle bundle = Bundle.getBundle(HelloWorldBundle.class, Locale.ENGLISH);
		System.out.println(bundle.helloWorld());
		System.out.println(bundle.appleCount(15));
		System.out.println(bundle.appleCount(1));
	}
}

So the output would be:

Hello World!
There are 15 apples on the tree.
There is 1 apple on the tree.

Hang on there, does that appleCount method take a parameter? Since we are implementing the methods at run-time we can add method parameters and encapsulate that MessageFormat code into our implementation, cleaning up our client code dramatically.

This is also type safe. With the previous incarnations of the MessageFormat usage, it is quite easy to output “There are badgers apples on the tree!”. The method parameter forces the variable to be an integer. Unfortunately, it is still possible to pass a negative number to this example.
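To give a feel for implementing the methods at run time, here is a sketch that uses the JDK's java.lang.reflect.Proxy. This is a swapped-in technique and not necessarily how the library described here does it: Proxy only works with interfaces, so the hypothetical AppleBundle below is an interface rather than an abstract class, and the ProxyBundleDemo, create and english names are likewise invented for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.Properties;

public class ProxyBundleDemo {

    /** Hypothetical bundle: each method name doubles as the translation key. */
    public interface AppleBundle {
        String helloWorld();
        String appleCount(int apples);
    }

    /** Creates an implementation whose methods format the pattern stored under their own name. */
    static <T> T create(Class<T> type, Properties translations, Locale locale) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                // toString, hashCode and equals also arrive here; give them plain answers.
                switch (method.getName()) {
                    case "toString": return type.getSimpleName() + " proxy";
                    case "hashCode": return System.identityHashCode(proxy);
                    default: return proxy == args[0]; // equals
                }
            }
            String pattern = translations.getProperty(method.getName());
            if (pattern == null) {
                throw new IllegalStateException("No translation for key: " + method.getName());
            }
            // The MessageFormat boilerplate now lives in one place instead of in client code.
            Object[] arguments = args == null ? new Object[0] : args;
            return new MessageFormat(pattern, locale).format(arguments);
        };
        Object instance = Proxy.newProxyInstance(
            type.getClassLoader(), new Class<?>[] { type }, handler);
        return type.cast(instance);
    }

    /** Sample English translations, inlined instead of read from a properties file. */
    static Properties english() {
        Properties p = new Properties();
        p.setProperty("helloWorld", "Hello World!");
        p.setProperty("appleCount",
            "There {0,choice,0#are|1#is|1<are} {0,number,integer} "
            + "apple{0,choice,0#s|1#|1<s} on the tree.");
        return p;
    }

    public static void main(String[] args) {
        AppleBundle bundle = create(AppleBundle.class, english(), Locale.ENGLISH);
        System.out.println(bundle.helloWorld());
        System.out.println(bundle.appleCount(15));
        System.out.println(bundle.appleCount(1));
    }
}
```

Because appleCount takes an int, a call such as bundle.appleCount("badgers") does not compile, which is the type safety described above.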

Restrictions

Much as it pains me, storing an internationalisation key in the database is an easy way to get a locale agnostic storage of enumerated values. Hang on, don’t most ORM layers and some DBMS have ways to store enumerations? So this is a non-issue as you can use a code-level enumeration to fetch your translation and you retain the benefits of the method based implementation.

Internationalisation keys must be valid method identifiers. This is not a huge problem; I would hazard a guess that most of your internationalisation keys are already valid method identifiers.

Sources

The sources are available at git://git.candle.me.uk/translations.git

Usage

A repository section can be added to your Maven pom.xml:


<repositories>
	<repository>
		<id>mvn.candle.me.uk</id>
		<name>mvn.candle.me.uk</name>
		<releases><enabled>true</enabled></releases>
		<snapshots><enabled>false</enabled></snapshots>
		<url>http://mvn.candle.me.uk/repository/</url>
	</repository>
</repositories>

and the dependency can be added:


<dependencies>
	<dependency>
		<groupId>uk.me.candle</groupId>
		<artifactId>translations</artifactId>
		<version>2.0.0</version>
	</dependency>
</dependencies>

Create a class to represent your bundle:

AppBundle.java

import java.util.Locale;
import uk.me.candle.translations.Bundle;

public abstract class AppBundle extends Bundle {
	public AppBundle(Locale locale) {
		super(locale);
	}

	public abstract String none();
	public abstract String one(int n);
	public abstract String two(int n, String s);
}

Create some translation files:
AppBundle.properties

none=no parameters
one=one parameter: integer ''{0}''
two=two parameters: one integer: ''{0}'' and one String ''{1}''

Obtain an instance of it through one of the BundleService implementations and use it in your code.

BundleService bundleService = new BasicBundleService(new DefaultBundleConfiguration());
AppBundle bundle = bundleService.get(AppBundle.class);

System.out.println("none: " + bundle.none());
System.out.println("one: " + bundle.one(20000));
System.out.println("two: " + bundle.two(9001, "Over nine thousand!"));

  1. I’m pretty sure that this is from the main Eclipse source, but I can’t remember where I actually saw it.
  2. java.text.MessageFormat
  3. java.text.ChoiceFormat
  4. I use the MessageFormat constructor because, to format numbers correctly for the locale, the MessageFormat needs to know which Locale to use, and there is no static method that takes (a) the format String, (b) the locale and (c) the list of parameters.
  5. Most, if not all, modern IDEs have a usage search, so you can find all places where a particular variable/method/class is used.