One thing that seems invariable is that all vendors flaunt IOP numbers to prospective customers, in some cases numbers that are almost impossible to believe. As customers, though, we often just take them in without a second thought. I want to write this from the standpoint of an engineer and a rational observer, and present the case for why this is mostly useless data that should not be the center of attention when you are looking at storage products and making your evaluation decisions.
First and foremost, it is unfortunately the norm that folks in Sales and Marketing groups leverage IOP numbers to drive more product sales. This of course means that more is better, because now product A appears more capable than product B, and every vendor’s goal is to make their product appear more capable.
I would argue that you should consider features, capabilities, support, and attention to your specific needs before anything else. At RackTop we understood this a long time ago, and so we work hard in the pre-sales phase to help customers understand their environments better, and to empower them with knowledge and expertise about those environments so that we can guide them to the best possible solution, getting the most out of every dollar. I want to preface this by saying that I am deliberately setting aside the fact that IOPs are meaningless without understanding latency, and would like to focus only on IOPs and their faulty representation.
How Are IOP Numbers Realized?
When engineering and performance folks are approached about measuring things, most of those with a fairly deep level of expertise will immediately overwhelm the people approaching them. Remember, typically Marketing and Sales folks will be going to Engineering and Performance folks for this, looking for some shiny numbers for a product paper that is sure to net them some new customers and perhaps a nice bonus or even a position bump.
Their eyes glaze over after just a few minutes of conversation with engineers who tell them this: “You can measure IOPs in about 100 places in the system, and depending upon where you measure, 1 IOP may actually be 10 or 100.” This totally perplexes people without a deep understanding of systems, and in particular of the nature of their own storage systems.
The Numbers Fallacy
Why is this? Well, something that starts out as many IOs may in fact get coalesced into groups in order to improve efficiency in one part of a system, while the opposite may be true in another part, where say a 128K IO becomes 32 4K IOs that are spread across a number of SSDs in a stripe or a RAID group, potentially maximizing performance. This alone is an example of a 32X factor. So, say someone is running a virtual machine on a VMware host, or on some cloud system in an OpenStack environment over KVM, etc., connected to their storage, and measures IOPs with say IOMeter, a fairly well respected tool, at least on Windows-based systems. They might see 1000 IOPs, yet the same IOs measured on the storage, at the point where they are chunked into 4K pieces, would measure a whopping 32 thousand (32K) IOPs instead.
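The arithmetic behind that 32X factor can be sketched in a few lines of Python. This is only an illustration of the example above: the function names are made up for this post, and it assumes every client IO is split into equal 4K chunks with no coalescing or caching anywhere along the way.

```python
KIB = 1024

def chunks_per_io(io_size_bytes, chunk_size_bytes):
    """How many back-end IOs one front-end IO becomes when it is chunked."""
    # Ceiling division, so a partial trailing chunk still counts as an IO.
    return -(-io_size_bytes // chunk_size_bytes)

def backend_iops(frontend_iops, io_size_bytes, chunk_size_bytes):
    """IOPs seen at the chunking layer for a given client-side IOPs figure."""
    return frontend_iops * chunks_per_io(io_size_bytes, chunk_size_bytes)

# The example from the text: 128K client IOs chunked into 4K pieces.
factor = chunks_per_io(128 * KIB, 4 * KIB)
client_iops = 1000
array_iops = backend_iops(client_iops, 128 * KIB, 4 * KIB)

print(f"amplification factor: {factor}x")
print(f"client sees {client_iops} IOPs, array counts {array_iops} IOPs")
```

Both figures describe exactly the same amount of data moved per second; only the point of measurement differs.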
When engineers are asked whether this is a real number, the answer is yes, of course. Marketing and Sales people will very quickly skip over the 1000 IOPs reported by IOMeter and immediately note the seemingly effortless 32K IOPs reported by the system, a number which is actually COMPLETELY meaningless to you as a shopper. I will stop short of saying this number is bogus, because it is not. However, it is still irrelevant to how much actual work a system is capable of.
It gets worse when engineers hint at the fact that IOPs are technically additive. It often leads non-technical folks to believe that this is reasonable math. Say you have a system with perhaps 24 SSDs, each specified by the manufacturer as being capable of 50K 4K random IOPs. Let’s say those 24 SSDs are mirrored, for redundancy. Math would suggest that at least on reads, since we should always be able to read from both sides of a mirror, 24 * 50K IOPs == 1200K IOPs, or 1.2M IOPs. In reality, there are points in the system that simply make this impossible, but the number is theoretically real. It is a Marketing number, not a “real-world” number, i.e. you are guaranteed to only ever see it on paper. Better engineers will caution folks who talk to them about this fallacy, but sadly, far too often even this is glossed over simply because the other guys will do the same, or so the Marketing and Sales folks believe, and they are sadly right.
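To make the gap between paper and reality concrete, here is a rough Python sketch. The 24 drives and 50K IOPs per drive come from the example above; the single 10 Gbit/s link between clients and storage is purely a hypothetical bottleneck picked for illustration, and the calculation ignores protocol overhead, so treat the result as an optimistic ceiling rather than a prediction.

```python
def paper_read_iops(drive_count, iops_per_drive):
    """The additive 'Marketing' number: every drive at full speed at once."""
    return drive_count * iops_per_drive

def link_iops_ceiling(link_bits_per_second, io_size_bytes):
    """Most IOs of a given size that a link could carry, ignoring overhead."""
    return (link_bits_per_second // 8) // io_size_bytes

IO_SIZE = 4096  # 4K IOs, as in the drive specification

paper = paper_read_iops(24, 50_000)

# Hypothetical bottleneck: one 10 Gbit/s link in front of the array.
link_limit = link_iops_ceiling(10_000_000_000, IO_SIZE)

# The system can go no faster than its slowest point.
achievable = min(paper, link_limit)

print(f"paper number:     {paper:,} IOPs")
print(f"link ceiling:     {link_limit:,} IOPs")
print(f"best case to see: {achievable:,} IOPs")
print(f"bandwidth needed for the paper number: {paper * IO_SIZE:,} bytes/s")
```

Under these assumptions the link alone caps the system at roughly a quarter of the paper number, and that is before controllers, CPU, or the filesystem get a say.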
Always keep in mind that IOPs can be measured in a lot of places in any complex system. For example, HBAs (Host Bus Adapters) and RAID controllers have mechanisms built into them to group, break up, and shuffle IOs to maximize performance to disks, and so do some modern filesystems, especially those with an integrated volume manager, like ZFS, BTRFS, etc. Try asking how the IOP numbers were derived, what the process was, and what was actually being measured. I promise you, the better companies out there will let you talk with their engineers, who can hopefully help reveal what was behind the curtain and whether the benchmark holds any water at all.
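The grouping direction is just as easy to picture. The following Python sketch is a toy model, not how any particular HBA, RAID controller, or filesystem actually works: it assumes requests are simple (offset, length) pairs and merges only those that are exactly contiguous, up to a hypothetical maximum IO size.

```python
def coalesce(requests, max_io_bytes):
    """Merge contiguous (offset, length) requests, up to max_io_bytes each."""
    merged = []
    for offset, length in sorted(requests):
        if merged:
            last_offset, last_length = merged[-1]
            contiguous = last_offset + last_length == offset
            if contiguous and last_length + length <= max_io_bytes:
                # Extend the previous request instead of issuing a new one.
                merged[-1] = (last_offset, last_length + length)
                continue
        merged.append((offset, length))
    return merged

KIB = 1024

# 32 back-to-back 4K requests, as a sequential writer might issue them.
requests = [(i * 4 * KIB, 4 * KIB) for i in range(32)]
merged = coalesce(requests, 128 * KIB)

print(f"{len(requests)} IOs counted above this layer, {len(merged)} below it")
```

Count above this layer and you get 32 IOs; count below it and you get one. Neither count is wrong, which is exactly why you need to ask where the vendor's number was taken.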
An Engineer’s Advice
Don’t ask: “So, how many IOPs is this thing capable of?” Instead, focus on the features, what it can do for you and what it cannot do for you, and ask the vendor how they can help you determine just what the needs of your environment are. Of course, this is why there are demo systems that you can deploy in the comfort of your own environment and test in the “real world”.
No two environments are alike, and a 4K IOP means vastly different things when measured by a client application, say from a virtual machine whose host mounts the storage array over NFS, versus when measured with analysis tools on the array itself. And again, even on the array itself, the life of an IO is completely uncertain: one could become many, while many could become one.