Notes

Info

Beehive is a proactive replication framework

Quote

Beehive is a general replication framework that operates on top of any DHT that uses prefix-routing, such as Chord, Pastry, Tapestry, and Kademlia

Problem

  • Common DHTs (e.g. Kademlia) have high lookup latencies
  • They typically require O(log N) hops, and each hop's latency is often expensive

Goals

  • O(1) DHT lookup performance
    • O(1) average
    • O(log N) worst case
      • i.e. worst case is the same as not using Beehive
  • Minimise protocol chattiness
  • Respond quickly to changing network conditions

Core Insight(s)

  • Average query path is reduced by one hop when an object is proactively replicated to logically preceding nodes (in the DHT prefix path)
  • Iteratively, replicating an object k levels up the prefix hierarchy reduces lookups by k hops
  • Solving for popular data alone moves the average performance a lot (a quick sketch follows this list)
    • If we assume that popularity follows a power law, then solving for a small number of items radically improves the situation
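A quick back-of-the-envelope sketch of this insight (mine, not the paper's model). The 2.34 base hop count and the ~302k object count come from the paper's evaluation below; the Zipf parameter of 0.9 is my assumption for DNS-like popularity.

```python
import numpy as np

# Toy illustration: with power-law popularity, replicating only a small
# head of popular objects one level closer to the querying nodes removes
# one hop from a large fraction of queries.
M = 302_000       # object count, matching the DNS trace below
BASE_HOPS = 2.34  # average hops the paper reports for plain Pastry
ALPHA = 0.9       # assumed Zipf parameter for DNS-like popularity

p = np.arange(1, M + 1, dtype=float) ** -ALPHA
p /= p.sum()      # query probability per object, by popularity rank

for top in (0.001, 0.01, 0.1):
    k = int(top * M)
    # Queries for the replicated head resolve one hop earlier; the tail is unchanged.
    avg = BASE_HOPS - p[:k].sum()
    print(f"replicate top {top:>5.1%} of objects -> avg hops {avg:.2f}")
```

Replicating further levels up compounds this: each additional level removes another hop for the objects it covers, which is what lets Beehive hit an arbitrary target average.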

Proposed Solution

  • Proactive replication
    • Replicate popular data to more nodes further up the prefix hierarchy
    • Leave unpopular data in leaf nodes
  • "Sub-one-hop hash table"
  • "Guarantees" performance with minimum network hops
  • Tuneable replication target C
    • C is the average number of hops required to find an object
    • If C = 1, then on average you get the data from the first node that you talk to
    • If C = 2, your average latency is 2 hops
    • Fractional Cs (i.e. C < 1) still give you sub-one-hop performance on average, but I think they change how strongly the tail latencies are improved
    • C = 0 leads to every node having a copy
      • i.e. every node has the entire table
    • C may be adjusted freely without downtime
      • In fact, this dynamic behaviour is normal to the operation of the network
    • Trades off storage burden for network lookup time
  • Find the minimal amount of replication for each object so that the average lookup is no more than C hops (see the sketch after this list)
  • Measure how many requests there are to the object’s home server over time, and gossip this when negotiating replication
    • Only replicate to direct neighbours (1-hop)
    • Each node is responsible for replication of its own data
    • No need for consensus on which data is popular; it just gets worked out by running the protocol
      • This is a nice property in the presence of Byzantine nodes
      • In a lot of ways, this reminds me of EigenTrust but for content popularity. Not the same, but a similar "flavour"
    • Nodes leaving can temporarily unbalance the achieved C, but it will self-heal
      • Note: this may be a problem in IPFS where node connections tend to be extremely short lived (likely also a power law 😜)
    • In their empirical tests, the network achieved their target after two replication phases
  • Two phases
    • Analysis
    • Replication
  • Tracked info
    • Fields
      • Object ID (not a CID since these are mutable objects)
      • Version ID (basically a Lamport clock?)
      • Home Node
      • Replication Level
      • Access Frequency
      • Aggregate Popularity
    • I wonder if Object ID & Version ID can just be replaced with CIDs
      • It’s also nice to keep closely related data together (e.g. elements in a linear log), so perhaps a tag (e.g. UUID or DID) root + CID?
      • This paper was released prior to CRDTs, so some adjustments to reasoning about mutability may be fuelled by that
      • They note that object size may be a useful thing to add to the algorithm (I’m inclined to agree)
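A minimal sketch of the bookkeeping above (mine, not the paper's code). The field names, the cutoffs, and the fixed-threshold promote/demote rule are hypothetical simplifications; the paper derives the per-level popularity cutoffs analytically from the measured Zipf distribution and the target C.

```python
from dataclasses import dataclass

@dataclass
class ReplicaRecord:
    object_id: str               # mutable object, so not a content hash / CID
    version_id: int              # monotonically increasing, Lamport-clock-like
    home_node: str               # node whose ID is closest to the object ID
    replication_level: int       # lower = replicated on more nodes; 0 = every node
    access_frequency: int        # lookups observed locally this aggregation interval
    aggregate_popularity: float  # estimate gossiped among the object's replicas

def analysis_phase(records, promote_cutoff, demote_cutoff):
    """One local analysis round: each node independently decides which of
    its objects to push one level up (replicate to direct 1-hop neighbours)
    or pull one level down. No consensus on popularity is ever needed."""
    promote, demote = [], []
    for r in records:
        if r.aggregate_popularity >= promote_cutoff and r.replication_level > 0:
            promote.append(r)    # the replication phase copies these to neighbours
        elif r.aggregate_popularity < demote_cutoff:
            demote.append(r)     # neighbours drop their replicas next round
    return promote, demote

# Hypothetical usage:
records = [
    ReplicaRecord("example.com.", 7, "node-42", 2, 130, 0.8),
    ReplicaRecord("rarely-seen.org.", 3, "node-42", 2, 1, 0.001),
]
up, down = analysis_phase(records, promote_cutoff=0.5, demote_cutoff=0.01)
```

Because each node acts only on locally aggregated counts and only pushes replicas to its direct neighbours, the network converges on which objects are popular without any global agreement, which is the EigenTrust-ish flavour noted above.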

Prototype & Empirical Results

  • They implement DNS with Beehive

Setup

  • DNS trace
    • 4 million lookups over 7 days
    • 1233 clients
    • 302k names
    • 7 queries per second
      • Not very rate sensitive
      • Performance is dominated by running the analysis & update protocol, not lookups
  • Beehive
    • 1024 nodes
    • On top of Pastry
      • Average lookup path on plain Pastry is 2.34 hops (see the sanity check after this list)
    • Aggregation interval was 48 minutes
    • Analysis interval was 480 minutes
      • (Where did they come up with these numbers?)
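Rough sanity check on that 2.34 figure (my arithmetic, assuming Pastry's default base-16 digits): prefix routing resolves one digit of the target ID per hop.

```python
import math

# With base-16 prefix routing (Pastry's default of b = 4 bits per digit),
# each hop fixes one more digit, so a lookup over N nodes takes about
# log_16(N) hops.
N = 1024
print(math.log(N, 16))  # 2.5, in line with the 2.34 measured for plain Pastry
```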

Results

  • Took 2 rounds to converge to their target C
    • This was 16 hours and 48 minutes (presumably 2 × 480 min analysis intervals plus one 48 min aggregation interval = 1008 min)
  • Average hops on Beehive Pastry was 0.98 (instead of 2.34)
  • Outperformed passive caching
    • They found that 95% of DNS records have a TTL under 24 hours
    • Passive caching suffered as a result: cached entries expire before they can be reused much
  • Improved bandwidth usage
    • Front-loaded bandwidth
  • Significantly outperformed Pastry (and PC-Pastry, i.e. Pastry with passive caching) with spiky requests