Jakob Østergaard Hegelund

Tech stuff of all kinds

Before acquiring an Oracle ZFS Appliance

2014-02-14

As a first post in the series, it makes sense to briefly set the scene for what led up to us using this particular system. Maybe some of this can be useful to others in the same situation as we were.

Before actually acquiring these systems you need to consider whether they are the right option, of course. What we needed, and what we felt the Oracle ZFS Appliances could deliver, was:

Aside from obviously testing the Oracle ZFS appliance, I participated in testing both a NetApp system (which promises to do much of the same as the Oracle ZFS Appliance) and an IBM Storwize v7000 Unified. These would be typical competitors to the Oracle system and it would be fair to give a short rundown of why we ended up with Oracle rather than the others:

NetApp

Everyone knows NetApp - I don't know if anyone ever got fired for buying NetApp... First of all, the web user interface of the NetApp is not very impressive - it is very rudimentary and gives you nearly no insight into what the appliance is doing. Which client is doing how many IOs? What are the latencies? How many writes versus reads are going to the bottom shelf? These are all questions you cannot answer by looking at the user interface. So that's a downer if your job is to figure out why your storage infrastructure isn't delivering...

Second, and this was probably what did it for us, we simply could not get decent NFS performance from it when doing IO on huge numbers of small files (a typical Maildir setup). We were testing in a vendor lab with a vendor-provided engineer, and he ended up concluding that the performance was "just fine". We could see that if this was "just fine", then "just fine" would not cut it for us. So basically we decided against it because of manageability and performance.
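The small-file workload described above can be approximated with a short script. This is a minimal sketch, not the actual test run in the vendor lab: the function name `small_file_benchmark`, the file count and the 4 KiB message size are all assumptions chosen for illustration. It writes many small files with an `fsync` after each one (roughly what a mail server does on delivery into a Maildir), then reads them all back and reports the elapsed times.

```python
import os
import tempfile
import time


def small_file_benchmark(root, n_files=1000, size=4096):
    """Write, then read back, n_files small files under root.

    Returns a dict with the file count, bytes read and elapsed seconds.
    All parameters are illustrative defaults, not measured values.
    """
    payload = b"x" * size
    paths = [os.path.join(root, f"msg-{i:06d}") for i in range(n_files)]

    # Write phase: one small file per "message", synced to stable storage.
    start = time.perf_counter()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())
    write_seconds = time.perf_counter() - start

    # Read phase: open and read every file once.
    start = time.perf_counter()
    bytes_read = 0
    for path in paths:
        with open(path, "rb") as fh:
            bytes_read += len(fh.read())
    read_seconds = time.perf_counter() - start

    return {
        "files": n_files,
        "bytes_read": bytes_read,
        "write_seconds": write_seconds,
        "read_seconds": read_seconds,
    }


if __name__ == "__main__":
    # A local temporary directory is used here; point root at a directory
    # on the NFS mount under test to exercise the storage system instead.
    with tempfile.TemporaryDirectory() as tmp:
        print(small_file_benchmark(tmp, n_files=200))
```

Note that the read phase will largely be served from the client's page cache when run right after the write phase on the same machine, so for a meaningful read figure the files would need to be read from another client or after the cache has been dropped.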

IBM Storwize Unified

The new kid on the block (at that time). The Storwize is a block storage system (and probably a rather good one at that) that does iSCSI (and other block protocols if you need them), coupled with a clustered pair of GPFS serving front ends for your NFS (and CIFS) needs. Now, using GPFS in a cluster is (at least in theory) brilliant, because this is a properly clustered file system, quite unlike both NetApp WAFL and Oracle ZFS. What this means is that the clustered heads run active-active, and if one fails, you do not have downtime at all. There is no fail-over delay in an active-active setup.

The UI was modern - I think it is the UI team from the XIV which got to do the UI for the Storwize too - pretty and functional. It had many more data points to inspect than the NetApp (but it doesn't come near the Oracle). Performance on the small file NFS workload was much better than the NetApp - it was, as I remember it, acceptable.

In the end, it was a closer race between the Storwize and the Oracle - we needed site replication of data, and the Storwize could not do that at the time. This was one of the major factors in us choosing Oracle over Storwize. Everyone on the team would have their own angle on these systems of course, but a couple of downsides on the Storwize that I noted were:

To be fair, there were points I really liked too. And, well, there really were no points where I would say I liked the NetApp. The things it did ok, it did ok - but it did not do great anywhere. The Storwize showed potential, even if it might have been a little immature at the time - to be fair, we tested this system at a very early stage and I am sure it must have improved since then.

Oracle ZFS Appliance

So what happened when we tested the Oracle appliance? One was provided to us for testing and we ran the same workload on it as we did the others. A few things stood out when testing this system:

All in all, it makes for an easier system to work with on an everyday basis when operations are instant and intuitive. Of course you can live with file system provisioning taking time - I am just saying that it is convenient that it is instant. Maybe 16 snapshots per share are enough, but it is nice to not have to deal with such limitations (24 hourly snapshots plus 30 daily snapshots is handy for always on-line backup - just to pick a real world example). Simplicity and convenience mattered a lot to me in this decision process.