When designing a database schema, it's often important to assign a unique identifier to each record. Such surrogate keys almost always make querying for data both simpler and faster, and the overhead of an additional column is usually a cost worth paying. As such, nearly all databases provide some means of generating such identifiers, either in form of sequential numbers, or more fancy UUID schemes.

Snowflake macro photography 1

As is often the case at The Daily WTF, however, many developers prefer to reinvent the wheel. Today's code came to light when our Anonymous submitter—a mid-level programmer maintaining the codebase written by a team of senior developers and highly paid consultants—was tasked with investigating a failing bulk import from an external system. The procedure didn't throw any errors, but for some reason only half of the 12 million records ended up inserted into the database.

Digging through the code, he finally found the source of all his problems:

 public static int GetUniqueKey(int value1, string value2)
    {
      int key = value1.GetHashCode();
      key = (key * 3) + value2.GetHashCode();
      return key;
    }

The original developer thought she was being clever, using a pair of keys that uniquely identified a record to generate an ID for the insert (and multiplying one of them by 3 for whatever godawful reason). What she failed to account for, however, was the birthday problem. With the identifier being only a 32-bit integer, it was almost certain that among the 12 million IDs generated, many of them would not be unique. Together with the insert library silently dropping entries with the same key, our poor submitter ended up with a recipe for disaster on his hands.

Our submitter suggested using an MD5 hash to generate unique keys instead, not realizing that this code was guarded by the red-taped barbed wire of cronyism. He was told that his approach would introduce an unacceptable performance hit. And after all, this code had been written by the lead-est of lead developers. It was copy/pasted throughout the codebase. How could he possibly improve upon it?

Code that simply generates sequential IDs would be an improvement, but our submitter's too busy applying to new jobs elsewhere to bother pointing this out.

[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!