Contracting Space

A ticket came in marked urgent. When users were entering data in the header field, the spaces they were putting in kept getting mangled. This was in production, and had been in production for sometime.

Mike P picked up the ticket, and was able to track down the problem to a file called Strings.java. Yes, at some point, someone wrote a bunch of string helper functions and jammed them into a package. Of course, many of the functions were re-implementations of existing functions: reinvented wheels, now available in square.

For example, the trim function.

    /**
     * @param str
     * @return The trimmed string, or null if the string is null or an empty string.
     */
    public static String trim(String str) {
        if (str == null) {
            return null;
        }

        String ret = str.trim();

        int len = ret.length();
        char last = '\u0021';    // choose a character that will not be interpreted as whitespace
        char c;
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < len; i++) {
            c = ret.charAt(i);
            if (c > '\u0020') {
                if (last <= '\u0020') {
                    sb.append(' ');
                }
                sb.append(c);
            }
            last = c;
        }
        ret = sb.toString();

        if ("".equals(ret)) {
            return null;
        } else {
            return ret;
        }
    }

Now, Mike's complaint is that this function could have been replaced with a regular expression. While that would likely be much smaller, regexes are expensive- in performance and frequently in cognitive overhead- and I actually have no objections to people scanning strings.

But let's dig into what we're doing here.

They start with a null check, which sure. Then they trim the string; never a good sign when your homemade trim method calls the built-in.

Then, they iterate across the string, copying characters into a StringBuffer. If the current character is above unicode character 20- the realm of printable characters- and if the last character was a whitespace character, we copy a whitespace character into the output, and then the printable character into the output.

What this function does is simply replace runs of whitespace with single whitespace characters.

"This        string"

becomes

"This string"

Badly I should add. Because there are plenty of whitespace characters which appear above \u0020- like the non-breaking space (\u00A0), and many other control characters. While you might be willing to believe your users will never figure out how to type those, you can't guarantee that they'll never copy/paste them.

For me, however, this function does something far worse than being bad at removing extraneous whitespace. Because it has that check at the end- if I handed it a perfectly good string that is only whitespace, it hands me back a null.

I can see the argument- it's a bad input, so just give me back an objectively bad result. No IsNullOrEmpty check, just a simple null check. But I still hate it- turning an actual value into a null just bothers me, and seems like an easy way to cause problems.

In any case, the root problem with this bug was simply developer invented requirements: the users never wanted stray spaces to be automatically removed in the middle of the string. Trimmed yes, gutted no.

No one tried to use multiple spaces for most of the history of the application, thus no one noticed the problem. No one expected it to not work. Hence the ticket and the panic by users who didn't understand what was going on.

[Advertisement] Picking up NuGet is easy. Getting good at it takes time. Download our guide to learn the best practice of NuGet for the Enterprise.