By Ellen Rubin
CEO, ClearSky Data
In the early days of the cloud, it was common to hear tech visionaries talk about how nearly all enterprise infrastructure would soon migrate to a few large cloud providers. But while the dream of liberating enterprise IT from managing on-premises infrastructure is still very much alive, the early cloud hype clearly got out ahead of reality, because it turns out that public cloud has an Achilles’ heel: lag.
The three big public cloud providers build their enormous facilities in sparsely populated geographies where real estate is cheap, which means these data centers are usually hundreds or even thousands of miles from the cities where customers are located. Even the speed of light isn’t fast enough to overcome those kinds of distances, and the result is unavoidable: unacceptable latency.
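To put rough numbers on that claim, here is a minimal back-of-the-envelope sketch. It assumes light in optical fiber travels at about two-thirds of its vacuum speed (a common rule of thumb, not a figure from this article) and ignores routing, queuing, and processing delays, so real latency will be higher. The function name and sample distances are illustrative only.

```python
# Speed of light in vacuum, in km/s.
C_VACUUM_KM_S = 299_792.458
# Assumption: signals in optical fiber travel at roughly two-thirds of c.
FIBER_FACTOR = 2 / 3
KM_PER_MILE = 1.609344


def round_trip_ms(distance_miles: float) -> float:
    """Best-case round-trip propagation delay over fiber, in milliseconds."""
    distance_km = distance_miles * KM_PER_MILE
    one_way_s = distance_km / (C_VACUUM_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Illustrative distances between a user and a data center.
    for miles in (10, 100, 1000, 2000):
        print(f"{miles:>5} miles: {round_trip_ms(miles):6.2f} ms round trip")
```

Under these assumptions, a data center 1,000 miles away costs roughly 16 ms per round trip from propagation alone, and an application that needs several sequential round trips pays that price each time. A facility 10 miles away costs a small fraction of a millisecond.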
The solution to this latency problem is the edge, which is evolving much faster than did the cloud because there’s such an obvious, urgent need. The only way to provide the performance required for emerging use cases such as IoT, connected cars, and smart cities is to bring compute and storage close to the end user.
But though the edge’s evolution is proceeding rapidly, the edge is still early in its development, and there’s still a lot left to do. That’s especially true for the data layer, because the initial emphasis on edge build-out has been to provide compute capabilities, with little thought given to storage. This is often the case with new technologies. After all, when containers were first introduced, they had no persistent storage, even though that’s a basic requirement of most production applications. So while it’s not surprising that we’re seeing the same trend at the edge, it’s time we focused on the edge data layer.
In 2013, IoT generated 100,000 PB of data. By 2020, that figure will grow to 4.4 million PB, exploding to more than 79 million PB by 2025, according to IDC—and IoT is just one of the use cases for the edge. The edge will need to have a robust data layer that can not only handle a crushing amount of data, but can also address the many specific data challenges of the edge.
Opportunities at the edge
Before taking a look at the data layer, let’s take a step back to examine how the edge is currently evolving.
At first glance, the hyperscale cloud players look to be in the best position to capitalize on the edge opportunity, but just as on-premises data center incumbents initially dismissed the cloud, the big cloud providers initially dismissed the edge. Only recently have they started rolling out their own edge-based services.
But the edge is very different from a hyperscale cloud. The edge needs to be highly distributed and able to serve data sources via many small facilities connected together into a high-performance network. So while the edge must be able to integrate with the cloud, building and operating the edge requires a very different skill set.
Ironically, some colo providers that were struggling to compete in the emerging hyperscale cloud market are now perfectly positioned to provide edge services. And they’re not the only players now starting to take the lead at the edge. Telecommunications carriers’ metro facilities and mobile carriers’ cell towers are located right in the midst of urban customers, and they’ve already got the power, security, and connectivity required for an edge data center. And, of course, there are plenty of new edge providers using new business and delivery models to stake their claim.
But in almost every case, whatever model or mix of models wins out, storage at the edge will look very different from both traditional on-prem storage and hyperscale cloud storage. For starters, edge data centers will face space and power constraints. Most will need to be small, because real estate is very expensive in the metro areas where these facilities must be located, and at many sites power will be in short supply and aggressively metered.
Edge storage must also be distributed and highly connected. Autonomous vehicles, for example, will need to communicate with multiple facilities as they move in and out of range, and data will need to follow them. Storage at the edge will also need to interact effortlessly with apps and end users on-premises and in the cloud. After all, the edge needs the cloud just as much as the cloud needs the edge. The gargantuan amount of valuable data that edge use cases will generate cannot be stored forever at the edge—it’s far too expensive. The cloud is the perfect place to store and analyze big data, so long as it doesn’t require a fast response.
Finally, data at the edge will need to be protected, and that’s no simple matter when storage is so distributed. If one edge facility is destroyed or disabled by fire or a lightning strike, there needs to be instant failover to another facility nearby, with access to the same data.
Both traditional storage hardware and storage systems designed for cloud environments are too large and power-hungry to deliver enough capacity to power edge applications within the constraints of edge environments. Plus, neither currently has the intelligence required to automate the movement of data across a large, distributed network—a key capability for the edge. In short, we must deconstruct the traditional storage architectures and rebuild them to suit the needs of the edge.
Additionally, organizations should be very wary of “edge-washing,” which is when a company slaps an “edge-ready” or “edge-compatible” label on a product or service that, in reality, is no different than it was before the new branding. Because the edge is developing so much faster than the cloud did and the need is much more apparent, there’s an even greater danger of “edge-washing” than there was of “cloud-washing” less than a decade ago.
The rise of the edge is now making the next-generation capabilities promised by the cloud a reality. It’s enabling enterprise IT to decommission on-premises infrastructure, powering advanced IoT use cases and paving the way for smart cities. But the edge can’t do any of this without a next-generation data layer that can address its unique storage needs.
It’s time for the industry to start giving the data layer the attention it deserves.
ClearSky Data, based in Boston, uses the cloud and the edge to provide on-demand primary storage with built-in offsite backup and disaster recovery (DR) as a service.