A depressing quantity of software is what I would call a "data pump". I have some data over here, and I need it over there. Maybe I'm integrating into a legacy app. Or into an ERP. Or into a 3rd party API. At the end of the day, I have data in one place, and I want it in another place.
Sally has a Java application written in the Quarkus framework, which has a nightly batch that works to keep a table of Bar entities in sync with a table of Foo entities. (This anonymization comes from Sally) These exist in the same database. There is also a Bar webservice, which provides information about the Bar entities. The workflow, such as it is, is that the software needs to find all of the Foo entities that do not currently have associated Bar entities, and then call the Bar webservice to get the required information to create those Bar entities.
Let's see how that works.
@Inject UserTransaction transaction
// If this is annotated with @Transaction the usage in the Message function down below will have some Thread exception
public List<FooData> getAllFoos() {
try{
return fooDataRepository.findAllFoos();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
We'll worry about that comment in a second, but this function returns a list of all of the Foo objects in the database. It does not return a list of all the Foo objects without associated Bar entities. It's just the whole giant list of everything. The underlying database is a standard relational database; it'd be trivially easy to write that query, even going through the ORM.
Well, that's bad, but it's all pretty minor. How does the actual update go?
// Can't be annotated with @Transaction because Oracle DB can handle the given Amount of dataEntities in one Transaction '\._./'
Message updateBarsWithFoos() {
List<FooData> foos = getAllFoos();
if(!foos.isEmpty()){
foos.forEach(foo -> {
try{
transaction.begin();
if(barRepository.findByName(foo.getName()) == null){
if(barDataService.searchByName(foo.getName()) != null && barDataService.searchByName(foo.getName()).marker() != null){
barRepository.createBar(barDataService.searchByName(foo.getName()));
}
}
transaction.commit();
} catch (Exception e) {
try {
transaction.rollback();
} catch (Exception ex) {
throw new RuntimeException(ex);
}
}
});
}
return new Message(MessageLevel.INFO, "Created bars")
};
Ah, the real WTF is that it's an Oracle database. That's always a WTF.
But let's trace through this code.
We get all of our Foo entities. We check for emptiness and then do a forEach, which seems to make the empty check superfluous: a forEach on an empty list would be a no-op anyway.
We start a transaction, then check the database: if there are no Bar objects that link to Foo, then we call into the barDataService to find data. If there is, we call into the service again, to see if the marker property is not null. If it is, we call into the service again to get the actual data we're putting into the database. Then we close the transaction. If anything goes wrong, we rollback the transaction and chuck an exception up the chain.
That is three web service calls inside of a database transaction. Three calls which could easily be one, and that call could easily also happen outside of a transaction if you're mindful about confirming your constraints. And of course, because they're not mindful at all, they need to manage the transaction directly, and can't use the @Transaction annotation provided by their framework, which would at least cut down on some of the boilerplate.
Now, I'm sure you'll be shocked - shocked - to learn that the webservice is actually a bit flaky, and thus times out from time to time. And this isn't the only batch job running, which means the long-lived transactions cause all sorts of contention and terrible performance across the various batches. And this app doesn't have its connection pool properly configured, so the entire software stack can exhaust all of its database connections surprisingly quickly, causing yet more failures.
The root of the WTF, of course, is doing this as a batch job. A well engineered application would do everything it could to not create data in the database that isn't referentially sound. There, Sally gives us the one bit of good news:
My current project will do away with the batch processing altogether, so we can say, "RIP, transactional wholesale triple caller!"