In this piece, we’re going to explore the world of distributed tech: what it is, how it’s currently being used in the tech landscape at large, why the blockchain space as a whole isn’t utilizing certain proven distributed tech frameworks, and how we intend to use them. We’re also going to break down what inspired Constellation’s approach to “Consensus as a service.” I sat down with several key members of our engineering team to discuss all of this in depth, so without further ado, let’s dive in.
From a high-level perspective, what does a distributed system entail? To put it simply, it’s the idea of splitting up computation and data across multiple computers to solve scalability issues. Cloud computing can be seen as an example of distributed technology, where data and applications are served to millions of users over the internet. Bitcoin and other cryptocurrencies approached distributed systems in a different way, as they weren’t focused on solving data processing problems or on improving existing distributed systems. As our CTO Wyatt Meldman-Floch puts it, “They were trying to add some notion of cryptographic security by mixing in economics. It all started with scalability, and crypto sort of became this other thing.”
What The Heck Is Distributed Tech?
To take a more macro-level view on what some of the most popular distributed tech platforms are, let’s turn it over to our VP of Engineering, Ryle Goehausen: “I would say really the basis for most of our involvement comes from Apache’s Spark, which is the big data distribution platform.” To put it another way, Spark is essentially a general-purpose data processing engine, which can rapidly analyze data at scale.
Ryle also cited the Scala programming language ecosystem as a big influence within Constellation. Scala, whose name is short for “scalable language”, is already used by major companies such as Twitter and LinkedIn. He named several other major distributed tech influences as well: Akka, Algebird, and Hadoop. Akka is a toolkit for building highly concurrent, distributed applications on the JVM, while Algebird provides abstract algebra abstractions (such as monoids) for the Scala programming language. Apache’s Hadoop is a software library that allows for the distributed processing of large amounts of data and is used by the likes of Twitter, Spotify, and Facebook to store and process massive quantities of data. At the heart of Hadoop lies MapReduce, a programming paradigm in which a map step turns each input record into key-value pairs and a reduce step combines the values for each key, so the work can be spread across many machines. For a quick breakdown of how that works in layman’s terms, check this out. If you’re wondering how all of this ties into what we’re doing at Constellation and the blockchain space in general, hang in there, we’ll get there.
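To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch of the classic word-count example. It is written in plain Python rather than against Hadoop’s actual API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are our own illustrative choices, not part of any library. In a real cluster, each document could be mapped on a different machine and the grouping step would move data over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # Each document is mapped independently, which is what makes the
    # map step easy to distribute; here we simply loop over them.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(docs))
```

Because the map step never looks at more than one document and the reduce step never looks at more than one key, both can be split across as many workers as there are documents or keys.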
In terms of how all of the aforementioned distributed tech paradigms influence Constellation, here’s what Ryle had to say: “We’re primarily trying to use a lot of the techniques and the same libraries that other distributed systems are built with, and that’s all sort of oriented around Scala, Akka, Algebird and some of the other like standard tooling.”
What We Learned From Twitter
To take this a step further and explore how these tools are being used in the wider tech landscape, let’s take a look at how Twitter leverages MapReduce and Hadoop. To process massive quantities of data, Twitter relies on MapReduce to “literally do reputation calculations on individual tweets to figure out whether or not this shows on your newsfeed.” Essentially, Twitter leverages MapReduce and other streaming libraries and can “mix these two tools in order to aggregate data and serve it very quickly and in such a way that is horizontally scalable. That’s how Twitter can stream billions of data points per second,” according to Wyatt.
Ryle backed this up by sharing some further details on how Twitter uses Hadoop and Spark:
“Twitter primarily relies on Hadoop, and they’re migrating to Spark in a lot of places. They have separate things that are done for real-time aggregation, but a lot of it is unified under a common language. That’s why they’re developing so much Scala code: because it integrates with older code. They don’t need to rewrite a bunch of things. They can use a lot of the same code for both real-time, batch processing and analytics, so they’ll have batch jobs that run every night, but also have real-time system stuff.”
Using Hadoop basically allows them to store and process billions of tweets, log files, and other forms of data across thousands of nodes.
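Ryle’s point about using the same code for both batch and real-time work can be illustrated with a small sketch. The idea, which is the kind of thing algebra libraries like Algebird formalize, is to define one associative `combine` operation with an empty starting value and then reuse it in both settings. This is plain Python and not Twitter’s code or Algebird’s API; the event shape (`{"user": ...}`) and all names here are hypothetical.

```python
from functools import reduce


def empty():
    """Identity element: an aggregate that has seen no events."""
    return {}


def combine(left, right):
    """Associative merge of two per-user count aggregates."""
    merged = dict(left)
    for user, count in right.items():
        merged[user] = merged.get(user, 0) + count
    return merged


def to_aggregate(event):
    """Lift a single event (hypothetical shape) into an aggregate."""
    return {event["user"]: 1}


def batch_job(events):
    """Nightly-style batch: fold over the whole data set at once."""
    return reduce(combine, map(to_aggregate, events), empty())


class StreamingCounter:
    """Real-time style: fold events in one at a time as they arrive."""

    def __init__(self):
        self.state = empty()

    def on_event(self, event):
        self.state = combine(self.state, to_aggregate(event))
        return self.state


if __name__ == "__main__":
    events = [{"user": "a"}, {"user": "b"}, {"user": "a"}]
    counter = StreamingCounter()
    for event in events:
        counter.on_event(event)
    # Both paths share combine(), so they agree on the final counts.
    print(batch_job(events) == counter.state)
```

Since `combine` is associative, partial results computed on different machines or at different times can be merged in any grouping, which is what lets the batch job and the streaming counter share the same logic.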
Right about now, you might be thinking, “Great! There are all of these proven existing frameworks for scaling distributed tech—this sounds like a perfect, seamless integration for the blockchain and crypto space!”
Well, not quite.
The main holdup is that the existing code stacks are not very friendly to a crypto stack. In the case of Bitcoin, Ethereum, or Dash’s stack, “it would not plug in very nicely and it wouldn’t have a lot of code reusability,” said Ryle. To illustrate this point, we can take a look at Ethereum, whose own messaging strongly plays up MapReduce (have a look at their Plasma whitepaper), but which isn’t using any of the language libraries or common practices associated with MapReduce itself. Seems a bit backward, right? I asked Ryle why this is, and his response was quite illuminating:
The obvious answer is because they already have code and dependencies that are already written specifically for Ethereum, and they’re a slow moving boat. It would be difficult for them to adapt and change all of their existing stuff over [to MapReduce], so they’re viewing it as basically tacking it on to their system. There are other people trying to migrate MapReduce, except our strategy for the migration of MapReduce is to use libraries that are actually used for developing MapReduce, instead of trying to develop MapReduce inside of crypto libraries.
In short, there are already existing standards for proven distributed tech frameworks such as MapReduce, and we want to reuse as much of that code as we can and forge the right path from the start. Ethereum is essentially ignoring the long-standing reasons why people chose design patterns oriented around MapReduce in the first place, and is trying to recreate something that already works, but within its own sandbox. The same can be said of its approach with the Ethereum Virtual Machine (EVM). Ryle shared his take on this in the following quote:
“If you look at all the work that was done on EVM, it’s going to be scrapped because they went down a path that is now worthless because they chose to go against the standard. It’s better to go with common standards and to use tools that are already there. There’s no reason to try and reinvent the wheel for everything.”
Another example can be seen with IOTA, who are currently experiencing a crisis on their main network due to spammers creating a Side Tangle parasite chain, resulting in transactions not being confirmed. Wyatt shared his take on this by saying “this is the price of trying to customize too early.” IOTA has essentially tried to reinvent cryptography, which violates the underlying tenets of cryptography itself.
This isn’t meant to bash either Ethereum or IOTA—we’re merely trying to illustrate the importance of leveraging the existing proven tech and forging the proper path the first time, rather than trying to retroactively fit a lot of these tools in down the line.
Ryle sums it up quite succinctly:
If we go down the wrong path of attempting to prop up something that isn’t necessarily going to become the standard in the future, it’s a lot of wasted effort. And we’d like not to do that. These other tools have billions of dollars of aggregate industry value, and to ignore them is silly.
Forging the Right Path
With Constellation, we want to forge the right path from the outset. We want to develop libraries that bridge the gap between crypto and existing distributed tech, and that’s not really easy to do right now. While other projects are wasting their time trying to port existing distributed tech over into crypto, we want to leverage what already works, and focus on building the future.
“There are projects that are trying to adapt Bitcoin and Ethereum to Scala and Java, and they’re sort of in a weird state where they don’t really work all the way, and they don’t fully integrate. So we’re trying to focus on using languages and tools that make that problem go away,” shared Ryle.
While this is a difficult task, we aren’t the only ones in this space pursuing a similar path. As Ryle said, “RChain is using a lot of the same functions and the same crypto code that we’re using. Waves is another example. When you look at a project like RChain, they’ve essentially proven that it’s possible to do crypto on the Scala ecosystem.” But again, what distinguishes us from RChain is that we’re not trying to approach this as a language problem. While they’ve created Rholang, a new language for writing smart contracts that run on the RChain virtual machine, we’re more heavily focused on leveraging Scala for big data processing. Ryle went on to share how the success of RChain has “proven that there is a demand for using the tooling of Scala and these other distributed systems within the crypto space.”
When I asked the team if it’d be fair to consider Constellation one of the first blockchain companies intentionally building a bridge for mainstream developers to cross over into the blockchain space, Ryle made it clear that Constellation isn’t unprecedented in terms of the toolkits we are trying to use, but rather, “we are unprecedented in the sense that we’re trying to solve a very narrowly scoped problem. We’re focusing heavily on the MapReduce aspect and using monads and other functional programming concepts in Scala.”
Constellation – Consensus as a Service
During the interview, Wyatt shared the approach they took when dreaming up Constellation Labs, and how they leveraged their prior experience in creating scalable tech. “The approach we had was: we already build systems for the scalability side. We are also heavily into crypto, and we wanted to sort of take these things over and apply the stuff we learned for solving scalability problems and also use them for making scalable cryptocurrencies.” Wyatt went on to say that “our approach to scalability was also a solution to that splitting protocols problem. That’s what we do. What we did was create this protocol for communicating across protocols.” Solving that allowed us to do MapReduce, and it’s like they were one and the same problem. Tyler Prete, one of our Distributed Systems Engineers, added that by leveraging Java and Scala, “we can more easily integrate with these existing tools like Spark or Hadoop, because we can kind of be the connection point where blockchain needs MapReduce.”
At this point in the conversation, Ryle brought up the notion of focusing on Constellation as consensus as a service, “of trying to be that networking protocol layer between decentralized applications.” He went on to draw comparisons between us and Amazon’s S3 cloud storage service. “The same way that Amazon created S3 as its own service oriented around storage, we’re a service oriented around consensus, network organization, and reputation.”
To reiterate an earlier point, we’re not trying to focus on the contract language, because the contract language can be considered plug-and-play on top of a consensus layer. As Ryle succinctly put it, “Our competitive edge is just focusing on consensus and reputation and big data at scale.”
The team made it clear throughout the interview that Constellation wants to integrate with as many other libraries as it can, and to make it possible for code to be reused. “You should be able to run the same kind of code that does transaction processing in Spark, in Storm and even in Hadoop, and connect it to nodes also running off phones. That’s a very reasonable objective, but we can’t just do this in a vacuum,” said Ryle. By that, he means there are already standard ways of accomplishing this, and Constellation is trying to properly integrate with them.
While you can’t really run Spark on a phone at the moment, Ryle noted that “you can run a lot of the same libraries and code and have it very easy to connect to services that run Spark or Storm, or any of these other Hadoop-like services.” In closing, Ryle summed up the underlying problem that Constellation is tackling:
We want these nodes running everywhere. We will also want them running on data centers. We want batch analytics to be possible on the data. We want to be able to run models for reputation and do analytics. All of that, the industry already has techniques for, but they’re not being used.