Hacker News

2 hours ago by josh2600

OK, so having worked on distributed consensus a bunch, here are a couple of thoughts in no particular order:

* In the real world, servers misbehave, like, a lot, so you have to deal with that fact. All assumptions about network robustness in particular will be proven wrong on a long enough timeline.

* Leader election in a world without a robust network is an exercise in managing acceptable failure tolerances. Every application has a notion of acceptable failure rates or acceptable downtime. Discovering that is a non-trivial effort.

Jeff Dean and others at Google famously came to the conclusion that it was ok for some parts of the Gmail service to be down for limited periods of time. Accepting that all self-healing/robust systems will eventually degrade and have to be restored is the first step in building something manageable. The AXD301 is the most robust system ever built by humans to my knowledge (I think it did 20 years of uptime in production). Most other systems will fail long before that. Managing systems as they fail is an art, particularly as all systems operate in a degraded state.

In short, in a lab environment networks function really well. In the real world, it's a jungle. Plan accordingly.
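To make "acceptable downtime" and "failure tolerance" concrete, here is a rough Python sketch. It is not from the comment above, and the function names are made up for illustration. It assumes a 365-day year and the majority quorums used by standard Paxos and Raft, where a cluster of n servers can keep making progress with up to floor((n - 1) / 2) of them down.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: ignore leap years


def downtime_budget_seconds(availability: float,
                            period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Seconds of downtime allowed per period for a target availability (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_seconds


def tolerated_failures(cluster_size: int) -> int:
    """Crashed servers a majority-quorum cluster can lose and still make progress."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return (cluster_size - 1) // 2


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        hours = downtime_budget_seconds(target) / 3600
        print(f"{target:.2%} availability allows about {hours:.1f} hours down per year")
    for n in (3, 4, 5):
        print(f"{n} servers tolerate {tolerated_failures(n)} failure(s)")
```

Note that going from 3 to 4 servers buys no extra failure tolerance under majority quorums, which is one reason odd cluster sizes are common.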

an hour ago by benlivengood

Since the article mentions Google as the outlier preferring Paxos, I may be able to shed some light from a few years ago.

The Paxos, paxosdb, and related libraries (despite the name, all are Multi-Paxos) are solid and integrated directly into a number of products (Borg, Chubby, CFS, Spanner, etc.). There are years of engineering effort and unit tests behind the core Paxos library, so it makes sense to keep using and improving it instead of going off to Raft. As far as I am aware, the Google Paxos implementation predates Raft by quite a while.

I think in general if most other people use Raft it's better for the community to have single, stable, and well-tested shared implementations for much the same reason it's good for Google to stick with Paxos.

an hour ago by tyingq

This makes sense to me. Very few of us have the resources to maintain, for example, the kind of globally synced (way beyond typical NTP) clock infrastructure that Google has (TrueTime[1]).

[1] https://cloud.google.com/spanner/docs/true-time-external-con...

3 hours ago by brickbrd

In practice, for the systems where I built a replication system from the ground up, once you factor in all the performance, scale, storage layer and networking implications, this Paxos vs. Raft thing is largely a theoretical discussion.

Basic Paxos is, well, too basic, and people mostly run modifications of it to get higher throughput and better latencies. After those modifications, it does not look very different from Raft with modifications applied for storage integration and so on.
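For anyone wondering what "too basic" means here, below is a minimal in-process sketch of single-decree (Basic) Paxos in Python. It is an illustration only: the `Acceptor` class and `propose` function are hypothetical names, there is no networking, persistence, or failure handling, and it is not taken from any production library. Each call to `propose` runs both phases to decide a single value; Multi-Paxos variants typically keep a stable leader so that phase 1 is not repeated for every log entry.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                    # highest ballot this acceptor has promised
    accepted_ballot: int = -1             # ballot of the value it last accepted
    accepted_value: Optional[str] = None  # the value it last accepted, if any

    def prepare(self, ballot: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore lower ballots and report any accepted value."""
        if ballot <= self.promised:
            return None  # already promised an equal or higher ballot
        self.promised = ballot
        return (self.accepted_ballot, self.accepted_value)

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept unless a higher ballot has been promised since."""
        if ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted_ballot = ballot
        self.accepted_value = value
        return True


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run both phases once; return the chosen value, or None if the ballot fails."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [r for r in replies if r is not None]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt the one
    # with the highest ballot instead of its own value.
    prev_ballot, prev_value = max(promises, key=lambda p: p[0])
    chosen = prev_value if prev_ballot >= 0 else value
    acks = sum(1 for a in acceptors if a.accept(ballot, chosen))
    return chosen if acks >= majority else None
```

With three fresh acceptors, proposing "a" at ballot 1 decides "a"; a later proposal of "b" at ballot 2 still returns "a", because the already-accepted value has to be adopted.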

2 hours ago by ignoramous

> Basic Paxos is, well, too basic, and people mostly run modifications of it to get higher throughput and better latencies. After those modifications, it does not look very different from Raft.

Alan Vermeulen, one of the founding AWS engineers, calls inventing newer solutions to distributed consensus an exercise in re-discovering Paxos.

https://youtu.be/QVvFVwyElLY?t=2367

2 hours ago by arielweisberg

This is exactly my take as well. Multi-Paxos and Raft seem very similar to me. Calling out what the exact differences and tradeoffs are would be good blog/research fodder.

2 hours ago by littlestymaar

Heidi Howard, the first author of this paper, did two videos about it:

- A short 10-minute intro: https://www.youtube.com/watch?v=JQss0uQUc6o

- A more in-depth one: https://www.youtube.com/watch?v=0K6kt39wyH0

3 hours ago by butterisgood

VR for the win... (no not that VR) http://pmg.csail.mit.edu/papers/vr-revisited.pdf

Ok, maybe not for the win, but it's worth a look. I'm actually fairly certain one of the Paxos implementations I've worked with and used is really more of a VR bend to Paxos anyway.

2 hours ago by hutrdvnj

Can someone provide a short description of the differences between Paxos and Raft?

2 hours ago by ignoramous

Raft is very similar to Multi-Paxos. These Raft and Paxos user studies comparing the "understandability" of the two protocols might be useful: https://youtu.be/JEpsBg0AO6o (Paxos), https://youtu.be/YbZ3zDzDnrw (Raft).

See also: Comparison of various "Paxos" implementations: https://muratbuffalo.blogspot.com/2016/04/consensus-in-cloud...

2 hours ago by polynomial

Paxos is a family of algorithms aimed at distributed consistency / monotonic state guarantees. However, Paxos allows servers with out-of-order logs to be elected leader (provided they then reorder their logs), whereas Raft requires a server’s log to be up-to-date before it can be elected leader. Moreover, Raft has a reputation for being more understandable than Paxos.

edit: It looks like the linked paper covers the main differences, albeit in a more detailed manner. Also, it seems as if the author rejects the idea that Raft is more understandable and makes a case for why she thinks Paxos is more understandable.
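The Raft restriction mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not code from any Raft library: `LogPosition` and `candidate_is_up_to_date` are hypothetical names, and a real vote handler would also check the election term and whether the voter has already voted. The rule, as I understand it from the Raft paper, is that a voter only grants its vote if the candidate's last log entry has a higher term, or the same term and an index at least as large.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPosition:
    """Term and index of the last entry in a server's log."""
    last_term: int
    last_index: int


def candidate_is_up_to_date(candidate: LogPosition, voter: LogPosition) -> bool:
    """Raft-style log check a voter applies before granting its vote."""
    if candidate.last_term != voter.last_term:
        # A later term on the last entry wins, regardless of log length.
        return candidate.last_term > voter.last_term
    # Same last term: the longer (or equal-length) log is at least as up to date.
    return candidate.last_index >= voter.last_index


if __name__ == "__main__":
    voter = LogPosition(last_term=2, last_index=9)
    print(candidate_is_up_to_date(LogPosition(3, 5), voter))  # later term, shorter log
    print(candidate_is_up_to_date(LogPosition(2, 4), voter))  # same term, shorter log
```

In a Paxos-style election there is no such gate on the vote; the new leader instead learns any missing entries from the replies it collects before it starts proposing.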

an hour ago by anonymousDan

Personally, I find Paxos more understandable. For example, KTH have a really nice incremental development of Multi-Paxos called Sequence Paxos: https://arxiv.org/abs/2008.13456

2 hours ago by tschellenbach

Stream's consensus algorithms are all Raft-based; the Go Raft ecosystem is very solid. We did end up forking some of the libraries, but nothing major.

an hour ago by aneutron

It's not possible to do so, since it would be done asynchronously.

I'm sorry, I'll see myself out.
