
The Economics of Device Clouds in Continuous Integration

The rent is too damn high.


Okay, let's look at the topic of device clouds. Microsoft's recent purchase of Xamarin means that Google, Amazon, and Microsoft (three major cloud computing providers) all now have their own device clouds, which they also lease time on to the public (well, Google's purchase of Appurify hasn't produced a public offering yet, but it's currently in Beta and coming soon). That makes three major cloud players who also build mobile apps buying their own device clouds and renting the spare capacity to the public.

To me that's pure confirmation that if you're a developer or a company doing a lot of mobile app development and testing, particularly on Android, and you intend to use real devices, you had better have your own cloud. If it were cheaper to rent than to own, why would those three major cloud providers buy existing device-cloud-as-a-service platforms? In fact, Google Ventures was the primary early investor in the Y Combinator-backed Appurify platform, and Xamarin had heavy investment from Microsoft prior to the purchase. I've been meaning to write about this on my blog for a while, and I've already circulated a document on the topic within my company. So here we are. Let's talk about renting versus owning, with some numbers, after first covering some basics.

What is a device cloud?

A device cloud is a server-managed cluster of hardware used for running automated tests on actual devices (instead of emulators/simulators). In the case of Android, this looks like a bunch of phones and tablets connected via USB to a host computer through a network of powered hubs. The host computer typically must have the Android SDK installed so that commands can be sent to the devices via ADB. iOS devices have a similar need, but the Xcode requirement typically limits the host options to Apple hardware (though Fruitstrap does make it possible to use non-Mac hardware if absolutely necessary). Depending on the size of the cloud, you will find many connected servers hosting devices via USB.

Device cloud services hosting the physical devices typically offer add-ons beyond just devices on which you can run automated tests. Most gather device condition data (CPU and GPU stats, memory stats, network stats, etc.), screenshots, and logs, and collect the results into convenient reports. Device cloud services also often include non-scripted UI test tools, like random UI stress tools and app crawlers designed to exercise the app in unscripted ways. Some device clouds offer WiFi-only connectivity, while Appurify's cloud previously offered cellular data connections as well. In general, device coverage is incomplete but robust, offering a matrix of hardware and OS versions per generation of OEM-released devices.

How does a device cloud work with CI?

Device clouds can be included in your CI process on Jenkins through plugins or via command-line steps directly in a properly configured job. In my case, I use Jenkins itself to manage my device cloud as slave nodes, so no plugins or external command-line tools are required. This lets Jenkins scale my tests using the same load-balancing node management architecture it uses for running build jobs. I have written about how to configure device nodes HERE if you're interested. Amazon's device cloud inherited AppThwack's existing Jenkins plugin. Google's Test Cloud uses a command-line interface.

In any case, your hardware flow might look something like this:
  1. SCM server notifies build server of code change
  2. Build server pulls latest changes and compiles APK/IPA in build step
  3. Build server pushes build products to device cloud with test product
  4. Device Cloud server runs tests on devices and collects results
  5. Build server pulls results from device cloud server and publishes results
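
To make that flow concrete, here's a minimal sketch of steps 2 through 4 for Android, driven from the host machine with plain Gradle and ADB calls. The APK output paths and the instrumentation runner name are illustrative assumptions, not any particular vendor's API:

```python
import subprocess

def run(cmd):
    """Run a shell command and fail loudly, like a CI build step would."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Step 2: compile the app APK and the test APK (assumes a Gradle-based project).
run(["./gradlew", "assembleDebug", "assembleDebugAndroidTest"])

# Discover attached devices; skip the "List of devices attached" header line
# and anything not in the ready ("device") state.
devices = [
    line.split()[0]
    for line in subprocess.check_output(["adb", "devices"]).decode().splitlines()[1:]
    if line.strip().endswith("device")
]

# Steps 3-4: push both APKs to each device and run the instrumentation tests.
for serial in devices:
    run(["adb", "-s", serial, "install", "-r",
         "app/build/outputs/apk/debug/app-debug.apk"])
    run(["adb", "-s", serial, "install", "-r",
         "app/build/outputs/apk/androidTest/debug/app-debug-androidTest.apk"])
    # "com.example.app.test/..." is a hypothetical runner name; use your own.
    run(["adb", "-s", serial, "shell", "am", "instrument", "-w",
         "com.example.app.test/android.support.test.runner.AndroidJUnitRunner"])
```

Step 5 (collecting and publishing results) is whatever your build server already does with test output, so I've left it out of the sketch.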

What is a "device-minute"?

A convenient dimension for comparison. Device-minutes are the total number of minutes of device time used across all devices. As a dimension, that means the wall-clock time it takes to get results depends on the number of devices used (only up to a point: in reality there is a minimum amount of time each test case takes to execute, a minimum amount of time to deploy the builds to the devices, a minimum amount of time to collect the data, and so on). Device-minutes are also one of the most common pricing model elements across device cloud providers, so if you're playing along at home, consider making your comparisons the same way.

A device-minute total is then used as a direct substitute for the total duration of a test run, regardless of the number of devices used. If you use a single device, this is literally the total time it takes to collect results from all tests. If you use 4 devices, the device-minutes equal 4 times the test run duration, since your run is divided across 4 devices. For example, if you ran your entire suite against a single device and it took an hour, you'd expect that using 4 devices might cut the run to about 15 minutes; 15 minutes times 4 devices still equals 60 device-minutes. We'll come back to this concept later, so don't worry if you're still a little confused.
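
If it helps, here's the same relationship in a few lines of Python (the names are mine, purely illustrative):

```python
def device_minutes(wall_clock_minutes, device_count):
    """Total device time consumed by a run spread across device_count devices."""
    return wall_clock_minutes * device_count

# The example above: a suite that takes 60 minutes on 1 device should take
# roughly 15 minutes on 4 devices, and both runs cost 60 device-minutes.
assert device_minutes(60, 1) == 60
assert device_minutes(15, 4) == 60
```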

Okay, now let's dig into the numbers.

The example I will use includes only 34 end-to-end UI test cases and 16 UI stress tests run on every commit, because that's the current load on my existing project. I would consider this a VERY small suite of tests, which is reasonable because we're talking about a Continuous Integration cycle. I do not consider full regression passes of hundreds or thousands of tests to be compact and efficient enough to run inside a CI cycle. Those longer runs typically happen on a daily or weekly frequency, and while they absolutely should be considered when building a full model of your own cost comparisons, for the sake of discussion I'm limiting my approach to just what I expect developers to act on immediately during daily work.

Caveat: this is NOT an example of an idealized test suite by any means. You *SHOULD* try to minimize the number of end-to-end tests needed to validate a build. You *SHOULD* build apps modular enough to conveniently test only sections of your UI at a time, to reduce dependencies and flakiness. You *SHOULD* consider approaches like mocks, stubs, and hermetic servers to help isolate components under test. All of that being said, the important thing is not to focus on HOW I'm testing but on how much time these tests take per CI cycle. This model should represent only a kernel of your overall testing strategy, so your actual usage numbers should be much higher, since they will include CI runs as well as much larger full-regression passes.

Here are some example CI usage numbers.

In my case, I will take the average duration of each end-to-end UI test case to be 2 minutes and the average duration of each UI stress test to be 8 minutes. So given the number of each kind of test I've listed already, my device-minutes equation looks like this:

34 UI end-to-end tests * 2 minutes per UI end-to-end test = 68 device-minutes for UI end-to-end tests
16 UI stress tests * 8 minutes per UI stress test = 128 device-minutes for UI stress tests

1 test run = 68 + 128 device-minutes = 196 device-minutes

At 3 hours and 16 minutes on a single device, that's not a very good duration for a CI process. Developers should not have to wait that long for results. I break that up across 8 devices, so I have results back to the development team in 24.5 minutes. Still not great, but at the pace of commits on my project (3 merges per day on average), that's plenty of time to locate a bug in a core scenario and fix it without disrupting the delivery timelines.

Overall device-minutes per day is 196 device-minutes per run times 3 runs per day, or 588 device-minutes per day. Assuming a steady pace of 20 development days per month, that's 11,760 device-minutes per month. Keep in mind that in practice this is only just over an hour of use per device per day across 8 devices, for just my project, and with a very small, CI-worthy testing load.
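
Restated as a quick script, using the test counts and average durations above:

```python
E2E_TESTS, E2E_MINUTES = 34, 2        # end-to-end UI tests at ~2 minutes each
STRESS_TESTS, STRESS_MINUTES = 16, 8  # UI stress tests at ~8 minutes each
DEVICES = 8
RUNS_PER_DAY = 3                      # average merges per day on my project
DAYS_PER_MONTH = 20                   # steady development pace

per_run = E2E_TESTS * E2E_MINUTES + STRESS_TESTS * STRESS_MINUTES
per_month = per_run * RUNS_PER_DAY * DAYS_PER_MONTH

print(per_run)            # 196 device-minutes per run
print(per_run / DEVICES)  # 24.5 wall-clock minutes across 8 devices
print(per_month)          # 11760 device-minutes per month
```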

Let's compare pricing models using my example CI numbers.

Amazon has 2 pricing models for their device cloud. Google has only 1 pricing model, and even though they're still in Beta, they're worth considering. Xamarin/Microsoft has multiple pricing models, but they're insanely expensive by comparison, so I'm going to ignore them for now. Both Amazon and Google also offer an introductory discount, but I'm going to ignore those since we're talking about a steady-state project that runs for well over a year (plus the discounts are similar enough that they basically offset each other).
  • Amazon pricing model A: Strict device-minute charge @ $0.17 per device-minute
  • Amazon pricing model B: Unlimited device usage @ $250.00 per month per device
  • Google pricing model: Strict device-minute charge @ $0.083333 per device-minute (actually $5 per device-hour with a 5-minute minimum. I'm just simplifying because I'm assuming you'll use more than 5 minutes per device per run)

Let's put it all together.

From my example CI usage numbers above, I'm using 11,760 device-minutes per month across 8 devices. This will give us a monthly cost comparison across the 3 pricing models listed above that looks like this:

  • Amazon pricing model A ($0.17/device-minute): $1,999.20
  • Amazon pricing model B ($250/device unlimited): $2,000.00
  • Google pricing model ($0.083333/device-minute): $980.00
Given that a Mac Mini host machine plus 8 devices and a USB hub cost my company $5,000, my in-house solution (FROM A HARDWARE-ONLY PERSPECTIVE) pays for itself in the following intervals:
  • VS Amazon pricing model A: 2.5 months
  • VS Amazon pricing model B: 2.5 months
  • VS Google pricing model: 5 months
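
Here's that whole comparison as a script, assuming the $5,000 hardware outlay and the 11,760 device-minutes per month from above:

```python
MONTHLY_DEVICE_MINUTES = 11_760
DEVICES = 8
HARDWARE_COST = 5_000.00  # Mac Mini host + 8 devices + powered USB hub

monthly_cost = {
    "Amazon A ($0.17/device-minute)": MONTHLY_DEVICE_MINUTES * 0.17,
    "Amazon B ($250/device/month)":   DEVICES * 250.00,
    "Google ($5/device-hour)":        MONTHLY_DEVICE_MINUTES / 60 * 5.00,
}

for plan, cost in monthly_cost.items():
    months_to_break_even = HARDWARE_COST / cost
    print(f"{plan}: ${cost:,.2f}/month, in-house hardware pays for itself "
          f"in {months_to_break_even:.1f} months")
```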

Isn't managing your own hardware expensive?

Right, so an astute reader will notice that my earlier device cloud posts mention I'm using Jenkins to include my devices in my build cluster as test nodes. That means the software required to add a device cloud to my build server is free. The plugins I use for pulling results and displaying graphical representations of the output are also free.

Additionally, I found out about a little smartphone test farm tool called OpenSTF. It basically gives you browser-based remote access to Android devices, including:
  1. The ability to drive the device UI directly from the browser just like you would a local emulator
  2. The ability to drag-and-drop APKs 
  3. The ability to capture screenshots manually from a remote device
  4. Remote ADB access
  5. The ability to watch a test run in progress 
OpenSTF in action!
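
To give a taste of that remote access: OpenSTF's remote debug feature exposes a claimed device as a network ADB endpoint you can connect to from any machine. A minimal sketch, with a placeholder address (the STF UI shows the real endpoint for each device):

```python
import subprocess

# Placeholder endpoint; OpenSTF displays the real host:port per claimed device.
REMOTE_DEVICE = "stf.example.internal:7401"

# After connecting, the device behaves like a locally attached one.
subprocess.run(["adb", "connect", REMOTE_DEVICE], check=True)
subprocess.run(["adb", "-s", REMOTE_DEVICE, "shell", "getprop",
                "ro.product.model"], check=True)
```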


So far, no device cloud service out there provides that level of access to their devices, which I've found incredibly valuable for debugging and managing my long-running end-to-end test suites. And here's the kicker: it too is free. You should check it out if you've got any inclination at all to build a device cloud with Android devices (yes, iOS support is a common request, but that's a topic for another blog post).

So the SOFTWARE I use to manage the hardware is free. What about the maintenance time itself? Well, as I mentioned before, Jenkins is incredibly stable and OpenSTF gives me easy remote access to the devices. The devices themselves do not require much maintenance, from either a software patching or a physical perspective. Furthermore, I've only got a lab of 8 devices at this point, so I can easily manage all of them on my own without greatly interrupting my workload. I lose more time to unproductive meetings than I do to fussing over anything the device cloud I manage requires. At the scale of the example I've provided, therefore, after hardware setup (roughly 4 hours if done from scratch), the cost is negligible and can be ignored. Of course, if you're talking about hundreds of devices, that can change, but at that scale you're probably looking at an infrastructure team to manage it for you. In the context of CI alone, hundreds of devices don't really make much sense. More on that later.

So why do people pay more to rent time on a cloud?

Dealing with hardware is never going to be zero-cost, and that's the primary appeal of device cloud services. The cost in time to build and maintain such a system should be considered fairly low, but it is unavoidable. There will always be device issues, such as the monkey turning off WiFi or crashes that force reboots. Using OpenSTF means I can manage these remotely and quickly. In my case, given that the apps I test will be around for a very long time and we're constantly releasing new builds and constantly testing, the economics work out HEAVILY IN FAVOR OF OWNING OUR OWN CLOUD when considering on-device testing inside CI cycles. I have written before about how to build your own, so really it just comes down to offsetting the cost of hardware and setup time with usage. The more you use it, the better the economics work in your favor; at less than 2 hours a day of use, you'll probably find it pays for itself within half a year.

The question you have to ask yourself basically boils down to: "how many tests do I expect to run, over how long?" In my case, only a handful of scripted tests and some very short UI stress runs more than make up for the hardware purchase, setup, and maintenance. I also have the advantage of more or less direct access to the devices themselves. And the devices are connected to internal networks meaning I can reach my internal test service backends instead of testing against public-facing endpoints.

So with all of the advantages of a home-grown device cloud for on-device testing in your CI cycles, is there any justification for using a device cloud service? Based on the numbers I've run here, the obvious answer is no, but only if we're strictly talking about the needs of test automation as part of your Continuous Integration process. You'd really need some other specific purpose to justify the cost.

One area where device clouds shine is compatibility testing. It is unlikely that you'll ever need to run your CI tests at a scale that matches the sheer number of available devices out there. But if it is important to your company to optimize your app's performance and stability across every OEM and every OS version out there, a device cloud service starts to make sense for compatibility checking. This is NOT a concern for CI cycles, but it is a concern for release cycles. Therefore, if you're asking "Should I have my own cloud or use a 3rd-party service like AWS or Google?" my answer has to be: "yes".

A word of caution before you go off and build your own device cloud and spend money on a device cloud service: you won't necessarily find that you can easily reuse the same suite of tests for both purposes. There is a development overhead cost to accommodating variances in the OEM flavors of Android that should be understood and intentionally addressed in order to maximize the usefulness of your compatibility suite. Ultimately, the topic of how to build a suite of compatibility tests versus functional UI tests deserves its own blog post. Stay tuned!
