Understanding Trade-offs in IT
Everyone is always looking for the best, perfect thing. Physicists spend their entire lives looking for a single equation that explains the entire universe. In IT we are looking for the perfect solution to all our problems. I’m here to tell you: no such solution exists. No single solution will solve all your problems. In fact, some solutions to problems introduce problems of their own. Everything in life is a trade-off, and IT is no exception.
Understanding these trade-offs is key to designing and implementing solutions, so I want to go over some common ones you're likely to encounter.
Time vs. Space
This one is pretty fundamental to all of computing. In computer science, all computations are a balance of time and space; you may have heard of time and space complexity. You can make operations faster by using more memory or storage, for example by caching data. On the other side of the coin, if your goal is to reduce memory or storage usage, the CPU may need to perform more operations. The best we can hope for is to strike a balance between time and space that is suitable for most people and programs.
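To make that concrete, here's a minimal Python sketch (purely illustrative, not tied to any particular system) of trading space for time with memoization: the cached version keeps every intermediate result in memory so it never recomputes one.

```python
from functools import lru_cache

# Without caching: every call recomputes its subproblems
# (more CPU time, no extra memory).
def fib_slow(n: int) -> int:
    if n < 2:
        return n
    return fib_slow(n - 1) + fib_slow(n - 2)

# With caching: previously computed results are kept in memory,
# so repeated subproblems are answered instantly (less time, more space).
@lru_cache(maxsize=None)
def fib_fast(n: int) -> int:
    if n < 2:
        return n
    return fib_fast(n - 1) + fib_fast(n - 2)

# fib_slow(35) takes a noticeable pause; fib_fast(35) returns almost
# instantly because it holds ~36 results in memory.
```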
Another example is in storage. If you wish to save storage space, you can compress or deduplicate (dedup) your files. However, the CPU now needs to spend more time compressing, decompressing, and deduplicating, which will increase CPU usage (and possibly latency). Opting not to compress your data will obviously use fewer CPU cycles, but it will eat away at storage space.
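Here's a rough Python sketch of that trade with gzip (the data is made up and the numbers will vary by machine): you spend CPU time now in exchange for fewer bytes on disk.

```python
import gzip
import time

# Hypothetical sample data: highly repetitive, so it compresses well.
data = b"some highly repetitive log line\n" * 100_000

# Store uncompressed: no CPU spent, full storage cost.
raw_size = len(data)

# Compress: spend CPU cycles up front to save storage space later.
start = time.perf_counter()
compressed = gzip.compress(data, compresslevel=9)
elapsed = time.perf_counter() - start

print(f"raw: {raw_size} bytes, compressed: {len(compressed)} bytes, "
      f"cpu time: {elapsed:.3f}s")
```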
Mean Time to Recovery vs. Storage Utilization
Speaking of storage, if we want to quickly recover our backups, we can use snapshots. Snapshots are essentially pointers to files and datasets at a specific point in time. Most filesystems and backup tools are intelligent about how they manage this, often using copy-on-write (CoW) or block-level tracking, but that's the basic idea. This makes restoring from snapshots near instantaneous, since the data is readily available. The cost is storage, since this often means keeping multiple versions of the data around (usually at least deduplicated).
If you instead want to save on storage space by using incremental backups, restores will typically take longer, since you must first restore the last full backup and then replay the chain of incrementals. Your Mean Time to Recovery (MTTR) will be longer. Which solution is better for you? Ultimately, it depends on your business goals and tolerance for risk.
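Here's a toy Python sketch (purely illustrative, with made-up file names) of why the incremental chain slows down recovery: a snapshot-style restore hands back the final state directly, while a chain restore has to replay every intermediate step first.

```python
# Backups modeled as dicts of {path: contents}; incrementals only
# hold the files that changed since the previous backup.
full_backup = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
incrementals = [
    {"a.txt": "v2"},                 # Monday: only a.txt changed
    {"b.txt": "v2"},                 # Tuesday: only b.txt changed
    {"a.txt": "v3", "c.txt": "v2"},  # Wednesday: two files changed
]

def restore_from_chain(full, chain):
    """Replay the full backup, then every incremental in order."""
    state = dict(full)
    for increment in chain:
        state.update(increment)      # each step adds to the restore time
    return state

# A snapshot restore would return the final state directly; the chain
# restore has to walk the full backup plus every incremental.
print(restore_from_chain(full_backup, incrementals))
# {'a.txt': 'v3', 'b.txt': 'v2', 'c.txt': 'v2'}
```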
Security vs. Accessibility
The most secure system is a laptop locked in a safe at the bottom of the ocean. How accessible is this? Not very. But that’s literally by design. Generally speaking, the more secure you want your system to be, the less accessible it will be.
I am not advocating against securing your systems. What I am saying is that, depending on the level of security you wish to implement, it will potentially cost you more time to access those systems. Again, by design.
Cheap, Fast, Good
Everyone knows this one. Pick two.
Availability vs. Consistency
Imagine you and your co-worker are both editing a Google Doc. If one of you suddenly loses network access to the document, should the document still allow that person to continue editing locally? If so, your documents will be out of sync. If not, that person loses the ability to edit the document. This is the basic idea behind the CAP Theorem.
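As a toy illustration (hypothetical Python, not how any real system is built), here is the choice a node faces when it loses contact with its peers: keep accepting writes and risk divergence, or reject them and stay consistent.

```python
# Toy sketch of the CAP trade-off: a partitioned node must either keep
# serving (availability, possibly divergent data) or refuse requests
# (consistency, but unavailable).
class Node:
    def __init__(self, prefer_availability: bool):
        self.prefer_availability = prefer_availability
        self.data = {"doc": "hello"}
        self.partitioned = False   # True when we can't reach other replicas

    def write(self, key: str, value: str) -> str:
        if self.partitioned and not self.prefer_availability:
            return "ERROR: cannot confirm with peers, rejecting write (CP)"
        self.data[key] = value     # may diverge from peers while partitioned (AP)
        return "OK"

ap_node = Node(prefer_availability=True)
cp_node = Node(prefer_availability=False)
ap_node.partitioned = cp_node.partitioned = True

print(ap_node.write("doc", "local edit"))  # OK -> documents drift out of sync
print(cp_node.write("doc", "local edit"))  # rejected -> user loses access to edit
```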
This applies to more than just online documents; it's also true of databases, or any other distributed system. You may have heard of the chat and communications protocol Matrix. If you've ever been in a Matrix chatroom, you may have noticed that some messages occasionally appear out of order or seem to disappear temporarily. This is because Matrix prioritizes availability over consistency.
DNS is another example of a distributed system that prioritizes availability over consistency. When you add a new DNS record, it takes time for the record to propagate to other nameservers. It would be a pretty bad experience if, every time someone updated a DNS record, the system became unavailable until all nameservers had synchronized!
Latency vs. Throughput
If I need to move across the state, I can call a trucking service, schedule an appointment, load all my stuff into the truck, and drive to my destination. This requires more planning and time up front, but once everything is loaded, only a single trip is needed. If, however, I want to use my own car, I can hop in quickly, but it will take multiple trips to move my things. This is a good analogy for the trade-off between latency and throughput: the truck is high latency but high throughput, while the car is low latency but low throughput.
This applies to both filesystem block sizes and the MTU of network packets. A smaller block size or MTU reduces latency, but may increase the total amount of time a process spends transferring the entire dataset. If you've ever tried to upload or copy many small files, you know the operation sometimes stalls or times out; put those files in a tar archive, however, and you have a single file that transfers much more quickly.
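As a small illustration (Python, with hypothetical file paths), bundling the files into one archive means the transfer pays per-file overhead only once:

```python
import tarfile

# Hypothetical example: imagine thousands of small files like these.
small_files = ["notes/0001.txt", "notes/0002.txt", "notes/0003.txt"]

with tarfile.open("notes.tar.gz", "w:gz") as archive:
    for path in small_files:
        archive.add(path)  # one sequential stream instead of many round trips

# Uploading notes.tar.gz is a single large, high-throughput transfer,
# instead of thousands of tiny transfers each paying per-request latency.
```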
Conclusion
The answer to most things in IT is: “it depends.” It depends on your goals and your constraints. We must recognize when one solution is appropriate, and when it is not. In IT, perfection is a myth; balance is the goal.