A New Machine

It has been almost three months since I said my goodbyes at Joyent, having spent seven years with the company. A lot has changed for me over that time: I moved to San Francisco, made a lot of friends, got married and started a family. I also spent seven years working on cloud computing and object storage infrastructure!

Prior to working at Joyent, I had spent almost six years working at The University of Newcastle in Australia. My role there evolved over my tenure, but throughout my stay I was responsible for the management of applications and infrastructure built on Unix systems. I have spent some fifteen years as a customer of server vendors of various sizes and capabilities.

I have fond memories of SPARC servers from the erstwhile Sun Microsystems; what they lacked in a competitive price-performance balance, they made up for in manageability. These systems were dyed-in-the-wool Unix machines, not burdened with the frame buffer, a symptom of the desktop PC identity crisis, that modern x86 servers still ship with to this day. The operating system and the system firmware were co‑designed, with first-class interfaces in the operating system to manage firmware configuration. Diagnostics and boot control were performed over a serial line rather than a keyboard and display; later over an SSH connection to the lights-out management controller rather than an IP video stream. In contrast, x86-based servers at the time had the menu-driven BIOS, and little to speak of in terms of clean management of hardware or firmware through the operating system.

Fast forward to 2019, and while Sun has well and truly set, there have been some improvements in the x86 server space. The increase in the scale of processor power and memory size has been impressive, though it is becoming clear that the golden years are now behind us. The shift from traditional BIOS firmware to UEFI has brought with it the promise of better operating system-driven control through standardised interfaces, though with concomitant issues of complexity and security. In the area of remote systems management, the industry is slowly shifting from complex implementations of IPMI with deeply variable quality, to complex implementations of Redfish with deeply variable quality — but at least available over HTTPS! Burgeoning new efforts like the Open Compute Project and OpenBMC show great potential, but as yet with something of a kit car feel.

Though things are perhaps better than they were, they remain a long way from what I would consider ideal. Server vendors continue to nudge hardware management toward a “firmware first” model, where the (generally proprietary) system firmware has first right of reply as faults emerge. Servers still generally ship with frame buffers, and system firmware that now increasingly requires a mouse to use correctly. While standards like UEFI and Redfish purport to offer improved management of the system firmware and hardware, these standards are complicated to implement and to consume. This is all before you consider that infrastructure organisations are generally interested in a cloud-like virtual machine substrate, rather than a fleet of individual servers.

The complexity of these standards and of modern servers is the embodiment of a sort of Conway’s Law as applied to the commodity server ecosystem: they are complicated because customers have diverse needs and expectations, but the provider of the operating system and the provider of the hardware are almost certainly different organisations. The software and the hardware are generally not co‑designed — the flexibility inherent in the off-the-shelf approach leads to a lot of incidental complexity.

It is increasingly apparent that complex, flexible systems are difficult to secure. The need to mix and match components from different server and software vendors results in a combinatorial explosion of the test matrix for those deploying at any meaningful scale. It is no secret that the “hyperscalers” (Amazon, Facebook, Google, Microsoft, et al.) are engineering their own custom systems and firmware to improve the security and operability of their estates.

It is with this background in mind that I have decided to roll up my sleeves and go to work at Oxide Computer Company. The most memorable periods of my career have been when I have shared a sense of purpose and a set of values with my team as we work together on something we believe to be important. I’m excited to be here at the start, with a mixture of new and familiar faces, as we take a swing at building an integrated software and hardware system to bring the benefits of hyperscale computing to server customers everywhere!