Solving software problems the hard way

Many years ago, I wrote software for the federal government. Our application helped a couple of agencies produce the documents they send to the President and Congress to ask for money. I mean their official annual budget request, not like “Hey, Joe, you got five bucks on you?” I assume they did that in person. The basic process: our code produced some boilerplate, built some tables from numbers they uploaded, and then inserted text documents that they also uploaded. Then it spit out a completed document. With modern technology, this would be pretty straightforward, but this was 2008 federal government tech, so it was held back by all sorts of things.

The code mostly worked fine without major changes EXCEPT for one thing. Treasury liked to change the table heading colors every year. This was a HUGE problem. At the time, we were using a very old document format called Rich Text Format. Maybe you are old enough to have seen an RTF file. Microsoft created it so documents could move between Word and other word processors. In hindsight, this is sort of hilarious. To change a color in an RTF file, the code ripped the header off the file and inserted a new one. Then you ran the code and hoped that you did it right. Usually, you did not, so you tried again. It was an ENORMOUS amount of work for something that I thought should be trivial: there were all sorts of escape characters and weird abbreviations, and none of it made any sense. It was sort of like HTML with a CSS file, except that where CSS is supposed to be simple and straightforward (Cascading Simple Straightforward, that’s CSS), this was the opposite.
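For the curious: in RTF, colors live in a color table group in the document header, and text picks a color by its index, so \cf1 means “use color 1.” Below is a minimal Java sketch of the rip-out-the-header-and-replace-it step, assuming the document has exactly one {\colortbl ...} group with no nested braces. RtfColorTable and its methods are hypothetical names, and this is not the code we actually ran.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: rebuilds an RTF color table and swaps it into a document.
public class RtfColorTable {

    // Matches the first {\colortbl ...} group; assumes no nested braces inside it.
    private static final Pattern COLOR_TABLE = Pattern.compile("\\{\\\\colortbl[^}]*\\}");

    // Builds "{\colortbl;\redR\greenG\blueB;...}" from RGB triples.
    // The bare ";" right after \colortbl is entry 0, the "auto" color,
    // so the first triple becomes color 1 (\cf1 in the body text).
    public static String buildColorTable(int[][] colors) {
        StringBuilder sb = new StringBuilder("{\\colortbl;");
        for (int[] c : colors) {
            sb.append("\\red").append(c[0])
              .append("\\green").append(c[1])
              .append("\\blue").append(c[2])
              .append(';');
        }
        return sb.append('}').toString();
    }

    // Replaces the existing color table with a new one.
    public static String replaceColorTable(String rtf, String newTable) {
        Matcher m = COLOR_TABLE.matcher(rtf);
        if (!m.find()) {
            throw new IllegalArgumentException("no colortbl group found");
        }
        // quoteReplacement keeps the backslashes in newTable literal.
        return m.replaceFirst(Matcher.quoteReplacement(newTable));
    }

    public static void main(String[] args) {
        String doc = "{\\rtf1\\ansi{\\colortbl;\\red0\\green0\\blue255;}\\cf1 Heading\\par}";
        String table = buildColorTable(new int[][] {{0, 128, 0}});
        // prints {\rtf1\ansi{\colortbl;\red0\green128\blue0;}\cf1 Heading\par}
        System.out.println(replaceColorTable(doc, table));
    }
}
```

Changing next year's heading color would then mean calling buildColorTable with the new RGB values. The \cf indexes in the body stay the same, as long as the new table keeps the colors in the same order, which is the whole point of having a table.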

So I thought, surely there is software out there that we can use to do this instead of rolling our own. And there was! We were a Java team, and at the time, Apache POI was what everyone in Java used to work with Microsoft Office documents. I was excited! A polished open source library that was free to use! No more headaches!

And then there were headaches. You see, our Java code lived inside an Oracle database. And the Oracle database version we were stuck on only supported Java version 1.4. Apache POI required 1.5. There was no way I, a lowly software contractor, could get them to move to a newer Oracle version that would support 1.5.

So I did what any young software person in that situation would do: I wrote my own code to create and manipulate DOCX files. Do not attempt to read this code; you will cry, I promise. Interesting note: the modern Microsoft Office formats (.docx, .xlsx, and .pptx) are just zipped folders of human-readable XML files. (The older .doc and .xls formats are binary.) You can unzip them and look at the files. During my time writing this code, I looked at those files A LOT. More than was healthy, probably. Much longer and I would have been able to open a Word document and see the XML, Matrix-style.
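If you want to try this yourself, here is a minimal Java sketch that lists the files inside a .docx and prints word/document.xml, which is where Word usually keeps the main body text. It uses only java.util.zip, but it needs Java 9 or later (for readAllBytes), so it would not have run on our Oracle setup either. DocxPeek and its methods are made-up names.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: a .docx is a ZIP package, so plain java.util.zip can open it.
public class DocxPeek {

    // Returns the names of every file inside the ZIP package, in stored order.
    public static List<String> listEntries(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Returns one entry's contents as UTF-8 text, or null if it is missing.
    public static String readEntry(InputStream in, String name) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    // Reads only up to the end of the current entry.
                    return new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java DocxPeek file.docx");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            listEntries(in).forEach(System.out::println);
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readEntry(in, "word/document.xml"));
        }
    }
}
```

On Java 11 or later you can run it straight from source with java DocxPeek.java some-file.docx; you should see entries such as [Content_Types].xml and word/document.xml, followed by a wall of XML.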

But in the end, it worked! You can see some of the documents my code produced if you look at the Congressional Justification and Budget in Brief at treasury.gov. I don’t know how many years they used my code – I left the project in 2010 – but I’m sure they used it in 2008. I get a kick out of knowing that budgeting decisions were made in part using a document I coded.

Cross Site Scripting is not the same as CSS

In my previous life as a coder, I worked for a while in the DHS Office of the CIO with the accountants. We wrote and managed a website to let the DHS components (Secret Service, CBP, etc.) submit their monthly accounting files.

First, some background on building websites. Skip this if you’ve ever built a website. To build a page, all you need is HTML. Think of it like your docx file: your word processor reads a docx file and shows you a pretty document. Your web browser reads an HTML file and shows you a pretty web page. This is a massive oversimplification, but we’re moving on.

One optional thing you can have in your HTML file is CSS (Cascading Style Sheets). You know how links in pages are blue and underlined most of the time? Let’s say you want them all to be orange. You can add CSS to your HTML page that says “make all the links orange”. It uses different syntax, but that’s not important. Now your browser knows to make all the links orange.

Nearly every website you have ever visited uses CSS to make things look pretty. It’s like how nearly every car you’ve ever seen uses paint. It’s POSSIBLE to have a car without paint, but it looks dumb and breaks if it goes through a carwash.

Tesla Cybertruck: a big stupid truck with no paint

So there’s something else called Cross Site Scripting. This is bad. It’s complicated if you don’t understand it already, but all you need to know is that it lets an attacker sneak their own code into a website you trust, and it’s one method people use to try to steal your credit card or whatever.

You’ll note that Cross Site Scripting could also be written as CSS. To keep people from getting confused, we use XSS for Cross Site Scripting.

Except DHS IT Security. They use CSS for both, and ban both from all DHS computer systems. Or at least they did in 2012; I haven’t worked there in a while. But know that this was at least as stupid then as it is now; it’s not a new thing. You would think the agency tasked with protecting US computer systems, among other things, would be knowledgeable about those computer systems.

Crimes against good coding practice

I did software as a contractor for various federal agencies for years. It was a good gig for the most part – I got out when they wanted me to write less code and go to more meetings. I can assure you that I did NOT go into software because I wanted to go to meetings.

We were often forced to code in very strict and unfriendly conditions. I like to compare myself to the early Nintendo developers, who had to squeeze entire games into a few dozen kilobytes, which seems utterly absurd now, but so do a lot of things.

Anyway, we had this custom web application framework that my then-boss had built. In many ways it was a glorious triumph of engineering. In many other ways it was a steaming pile of garbage. It had to run on an Oracle application server. It was written in PL/SQL. It worked well, but it was deeply flawed in ways I didn’t really understand then and that, in hindsight, sometimes give me nightmares.

One of the things it did really poorly was form submission. The web framework required a consistent URL structure and was completely inflexible about it. For some reason, we were not sending form values in the body of a POST request; everything went into the URL. I didn’t understand the difference between GET and POST then, and probably no one else on the team did, either. None of us had gone to school for programming. In our application, there was one big procedure that took in parameters and used a giant conditional to decide which procedure to call to build the requested web page:

IF page = 1 THEN home; ELSIF page = 2 THEN page2; END IF;

Something like that, except there were like 100 entries. The problem was that this procedure expected all parameters as URL parameters. So if you wanted to record that Bob had made 10 widgets today (this is not what our website did, but you get the idea), you had to write a URL like:

ourcoolsite.gov/show?page=4&employee="Bob"&widgets=10

Except that didn’t work with the show procedure. It had to be consistent: if you passed “employee” to show in this context, you had to pass it for every page in the website. The workaround was to pass a series of numbered parameter pairs, each a name and a value, so the parameter list looked the same for every page. So now you had to write your URL like:

ourcoolsite.gov/show?varname1="employee"&varval1="Bob"&varname2="widgets"&varval2=10

There was this hacky bit of JavaScript that would take a form and translate it into this format for submission. It was less than ideal. But it worked. And really, as a coding practice, I FEEL this. The guy who wrote the web framework got it to where it worked for him and then left it. We’ve all done that with code we write for ourselves. Yes, even you have done it, don’t lie. But this got annoying really fast: “Was employee varval2 or varval3?”
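To see why the indexes got confusing, here is a hypothetical Java sketch of both directions of the numbered name/value-pair scheme. The real translation was that JavaScript on the browser side, and this is not it; PairEncoding, encode, and decode are names made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of the numbered name/value-pair URL scheme.
public class PairEncoding {

    // Turns {employee=Bob, widgets=10} into
    // varname1=employee&varval1=Bob&varname2=widgets&varval2=10.
    // A field's index comes only from its position in the map.
    public static String encode(Map<String, String> form) {
        StringJoiner query = new StringJoiner("&");
        int i = 1;
        for (Map.Entry<String, String> field : form.entrySet()) {
            query.add("varname" + i + "=" + urlEncode(field.getKey()));
            query.add("varval" + i + "=" + urlEncode(field.getValue()));
            i++;
        }
        return query.toString();
    }

    // Goes the other way, starting from already-decoded query parameters:
    // reads varname1/varval1, varname2/varval2, ... until a name is missing.
    public static Map<String, String> decode(Map<String, String> params) {
        Map<String, String> form = new LinkedHashMap<>();
        for (int i = 1; params.containsKey("varname" + i); i++) {
            form.put(params.get("varname" + i), params.get("varval" + i));
        }
        return form;
    }

    private static String urlEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+
    }

    public static void main(String[] args) {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("employee", "Bob");
        form.put("widgets", "10");
        // prints show?varname1=employee&varval1=Bob&varname2=widgets&varval2=10
        System.out.println("show?" + encode(form));
    }
}
```

Because the index depends only on where a field sits in the form, adding or reordering a field silently shifts every varnameN and varvalN after it.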

So here’s what I did, and let me tell you, I was smug AF about this: I wrote a show2 procedure that expected all the parameters as a single JSON-formatted variable. Now, I wish I could recall whether I started POSTing the forms or still put the JSON in the URL. Let’s say I POSTed it, because no one who can say I didn’t is ever going to read this. But now you had something like this:

{"page":2,"employee":"Bob","widgets":10}

Much better. Not good, but better.
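Here is a rough Java sketch of the single-JSON-parameter idea, assuming a flat object of strings and integers, and assuming the JSON travels in the URL under a made-up parameter name, params. JsonParam, toJson, and toQuery are hypothetical; real code would normally use a JSON library, and this one hand-rolls the escaping only to stay dependency-free.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: send every form field as one JSON-encoded parameter.
public class JsonParam {

    // Serializes a flat map of strings and integers as a JSON object.
    // Nested objects, arrays, booleans, and floating-point values are out of scope.
    public static String toJson(Map<String, Object> fields) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> field : fields.entrySet()) {
            json.add(quote(field.getKey()) + ":" + value(field.getValue()));
        }
        return json.toString();
    }

    // Builds the URL variant, using the made-up parameter name "params".
    public static String toQuery(Map<String, Object> fields) {
        return "show2?params=" + URLEncoder.encode(toJson(fields), StandardCharsets.UTF_8);
    }

    // Integers are written bare; everything else becomes a JSON string.
    private static String value(Object v) {
        if (v instanceof Integer || v instanceof Long) {
            return v.toString();
        }
        return quote(String.valueOf(v));
    }

    // Escapes quotes, backslashes, and control characters, as JSON requires.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("page", 2);
        fields.put("employee", "Bob");
        fields.put("widgets", 10);
        // prints {"page":2,"employee":"Bob","widgets":10}
        System.out.println(toJson(fields));
        System.out.println(toQuery(fields));
    }
}
```

Every value is now looked up by name, so nobody has to ask which index held employee. POSTing the same JSON in the request body instead would keep the form data out of the URL entirely.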