Not too long ago an Internet of Things (IoT) startup posted an impassioned piece about how their cloud provider had changed its billing practices and nearly driven the startup out of business. They had put significant functionality behind a single cloud provider, given themselves no way to update their products and no way to switch providers, and generated a large amount of network overhead for the cloud provider that was now being reflected in their bill.
Their technical debt bill had come due.
While the article included all the pieces for a competent technical postmortem of the technical debt that led to the issues, those themes got lost in the comments and the overall emotion of the post. Without naming names, I’d like to talk more abstractly about the takeaways we can find for the limits of YAGNI in the Minimum Viable Product (MVP) phase of software development.
First, IoT deployments don’t have a default way to auto-update the way servers we control do. Desktop, mobile, and console deployments have similar constraints, but IoT is on a different level altogether. Our example MVP had no automated update mechanism in place, which is still surprisingly common in the IoT field. Especially when devices connect to hard-coded URLs, as they did in this case, this omission can render the machines effectively inert. Auto-update doesn’t have to be in your business-pitch version, but it should be in your device before you ship. Sending anything else out the door is making pie crust promises: easily made, easily broken. You can enable automatic app updates in Windows 10 IoT Core, and similar functionality is configurable on *nix-based systems.
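To make that concrete, here is a minimal sketch of what a polling updater on a device might look like. Every URL, path, and manifest field here is a hypothetical assumption, not a reference to the startup’s actual system, and a production device would verify a cryptographic signature rather than just a checksum:

```python
import hashlib
import json
import urllib.request

# Hypothetical URLs and paths; substitute your own infrastructure.
MANIFEST_URL = "https://updates.example.com/device/manifest.json"
CURRENT_VERSION = "1.0.3"
STAGING_PATH = "/var/lib/device/firmware.staged"

def _ver(v: str) -> tuple:
    """Parse '1.0.3' into (1, 0, 3) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

def apply_update_if_newer() -> bool:
    """Poll the manifest and stage a newer firmware image if one exists."""
    with urllib.request.urlopen(MANIFEST_URL, timeout=30) as resp:
        manifest = json.load(resp)  # {"version", "url", "sha256"}
    if _ver(manifest["version"]) <= _ver(CURRENT_VERSION):
        return False  # already current
    with urllib.request.urlopen(manifest["url"], timeout=300) as resp:
        image = resp.read()
    # Refuse anything that fails the integrity check. A real device should
    # also verify a signature so a compromised server can't push images.
    if hashlib.sha256(image).hexdigest() != manifest["sha256"]:
        raise ValueError("checksum mismatch; refusing staged update")
    with open(STAGING_PATH, "wb") as f:
        f.write(image)
    return True  # a device-specific installer takes it from here
```

Even a simple loop like this, shipped from day one, means a hard-coded URL is no longer a death sentence: you can push new firmware that points somewhere else.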
Second, connecting to external APIs directly is not a recipe for a reliable system. This is obviously true when the external API runs someone else’s code, but the lesson here reminds us that it holds for cloud providers hosting our own code as well. This doesn’t have to be a downside: programming additional layers of abstraction, fallback mechanisms, and other reliability constructs can lead us to produce more reliable, microservice-oriented code. The concrete suggestion here was to connect to the external resource via a DNS address on a domain the developer controlled, perhaps with a reverse proxy behind it as a caching layer. That caching layer would have helped with the last issue as well.
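From the device’s side the indirection is almost invisible, which is the point. A sketch, assuming a hypothetical vendor-controlled hostname: the firmware only ever knows a name the developer owns, so switching providers becomes a DNS or proxy change instead of a firmware change:

```python
import urllib.request

# The device only ever talks to a hostname the vendor controls.
# Behind it, a reverse proxy (nginx, HAProxy, a CDN) forwards to
# whichever cloud provider is current, and can cache responses.
API_BASE = "https://api.example-vendor.com"  # hypothetical

def report_telemetry(payload: bytes) -> int:
    """POST a telemetry payload through the vendor-controlled endpoint."""
    req = urllib.request.Request(
        API_BASE + "/v1/telemetry",
        data=payload,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```

The extra hop costs a little latency, but it buys you the one thing the startup didn’t have: a layer you control between your devices and someone else’s infrastructure.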
Finally, SSL sessions and connections are not free, and there is a surprisingly large amount of technical detail you can learn to offset that cost. If you control your own hardware you can easily monitor and shift that cost, for instance by terminating the SSL connection at a hardware load balancer. For external resources, though, it may not be as apparent. Especially if you send a large number of requests (even if each request itself is small), the overall overhead can be substantial. The trick here is to use techniques that keep the SSL session open, letting you skip the SSL handshake on subsequent requests. Or, as mentioned before, you can introduce a caching layer: a request you never send has no SSL cost at all.
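In Python this is often the difference of a single line. A minimal sketch using the third-party requests library, with a hypothetical endpoint: a Session pools connections and reuses the underlying TCP/TLS connection (assuming the server honors keep-alive), so only the first request pays the full handshake cost:

```python
import requests  # third-party; pip install requests

API_URL = "https://api.example-vendor.com/v1/telemetry"  # hypothetical

readings = [{"temp_c": 21.4}, {"temp_c": 21.6}, {"temp_c": 21.5}]

# Calling requests.post() in a loop would set up and tear down a fresh
# TCP + TLS connection per reading: one full handshake each time.
# A Session keeps the connection alive between requests, so the
# handshake happens once for the whole batch.
with requests.Session() as session:
    for reading in readings:
        resp = session.post(API_URL, json=reading, timeout=30)
        resp.raise_for_status()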
Taken together, failing to see these limits to the YAGNI philosophy almost cost a startup all of its customers. While YAGNI is a good principle to follow within our own code and resources, we have to “over-engineer” a bit when dealing with external resources to make up for the unknowns (both the known and unknown varieties). An easy-to-operate update mechanism and an extra level of abstraction when accessing external resources let us recover from failures faster. A little more care with SSL connections can save us a lot of network overhead. Keep that in mind the next time your project has you connecting to any external service, on your hardware or someone else’s.