1.6 A home on the Internet — Ludovic Courtès — Backup and Archival Systems
  • A DHT-based Backup System [sit03:dht-backup]
    Emil Sit, Josh Cates, Russ Cox (MIT Laboratory for Computer Science), First IRIS Student Workshop, August 2003

    This paper describes a distributed implementation of Plan 9's Venti archival system using a distributed hash table (DHT) built on top of Chord and called Venti-DHash. It uses erasure codes to provide improved availability with limited overhead. Clients use latency information to select the closest replica when storing/restoring data.
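
    As a rough illustration of the latency-based replica selection described above, here is a minimal sketch (the probing function and data layout are my own, not taken from the paper):

        import random

        def measure_latency(peer):
            """Stand-in RTT probe; a real client would use round-trip-time
            estimates maintained by the DHT layer."""
            return random.uniform(0.01, 0.5)

        def closest_replica(replica_holders):
            """Pick the replica holder with the lowest measured latency."""
            return min(replica_holders, key=measure_latency)

        print("fetching from", closest_replica(["peer-a", "peer-b", "peer-c"]))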

  • Deep Store: An Archival Storage System Architecture [you05:deepstore]
    Lawrence L. You, Kristal T. Pollack, Darrell D. E. Long (University of California, Santa Cruz, CA, USA; Hewlett-Packard Labs, MS, USA), Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), April 2005

    A follow-up to you04:evaluation. They present a storage layer called PRESIDIO, which is relatively close to libchop...

  • Easy Automated Snapshot-Style Backups with Linux and Rsync [rubel04:easy]
    Mike Rubel, 2004
  • Distributed Internet Backup System [martinian05:dibs-web]
    Emin Martinian (MIT, USA), 2004

    A distributed backup system written in Python and released under a BSD-like licence. The source tarball contains comprehensive documentation (in Texinfo). It has an (optional) peer finder for finding peers willing to cooperate in the backup service. When using the peer finder, users must post a contract which states a range of disk space that is being searched for (e.g. between 10MB and 30MB), as well as an imbalance ratio (e.g. I want 2.3 times as much space as I will provide) and a lifetime. This looks pretty much like what PeerStore landers04:peerstore has. However, unlike in most other backup systems (Pastiche, PeerStore and the like), contracts are expected to be negotiated "by hand" by the end-users. Theoretically, contracts are browseable online. On the backup side, it uses GnuPG for encryption, performs incremental backups, and uses Reed-Solomon codes for redundancy. It does not rely on content-based indexing à la Pastiche or Venti quinlan02:venti. See also iDIBS morcos06:idibs.
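
    As an illustration, a contract of the kind described above might look like this (a sketch; field names are mine, not DIBS's actual format):

        from dataclasses import dataclass

        @dataclass
        class Contract:
            min_space_mb: int       # lower bound of the space searched for, e.g. 10
            max_space_mb: int       # upper bound, e.g. 30
            imbalance_ratio: float  # requested space / offered space, e.g. 2.3
            lifetime_days: int      # how long the trade should last

            def offered_space_mb(self, requested_mb: float) -> float:
                """Space the poster provides in return for `requested_mb`."""
                return requested_mb / self.imbalance_ratio

        c = Contract(10, 30, 2.3, 90)
        print(c.offered_space_mb(23))  # ~10 MB offered for 23 MB requested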

  • PeerStore: Better Performance by Relaxing in Peer-to-Peer Backup [landers04:peerstore]
    Martin Landers, Han Zhang, Kian-Lee Tan (Fakultät für Informatik, Technische Universität München, Garching, Germany; School of Computing, National University of Singapore, Singapore), Proceedings of the Fourth International Conference on Peer-to-Peer Computing, August 2004

    This paper presents PeerStore, a P2P backup system that builds upon Pastiche/Samsara, pStore and Elnikety's cooperative backup scheme. The main specificity of the system is that it uses two different P2P topologies: a DHT is used for meta-data (i.e. file records containing a list of block IDs -- CHKs -- and block records that contain the list of peers holding replicas of a given block) while actual data is stored symmetrically on pairs of peers. Collaboration is based on purely symmetric trades between pairs of peers (as in raw Pastiche and Elnikety's scheme). However, PeerStore avoids duplication of data coming from different hosts (such duplication is inherent to Elnikety's scheme, for instance) by looking up a block before sending it to a backup partner, thus reducing bandwidth consumption during backup operations. Most importantly, the use of two different P2P topologies substantially reduces the cost of data migration (which happens when a node leaves/enters the DHT) since the DHT only stores small meta-data blocks while actual data blocks are stored on peers (pStore stores data blocks in the DHT itself and is therefore much more bandwidth-intensive). In my opinion, the symmetric trading requirement and the "trick" needed to avoid data duplication are much less elegant than the claims mechanism introduced by Samsara. I'm also not totally convinced by the file block lists and block replica lists, compared to the tree structures used in most filesystems, Venti, GNUnet, etc.: such lists seem weaker (in terms of privacy) and less robust (they are single points of failure, although they are replicated). The GPL'd project named ``flud backup'' peacock06:flud implements a similar hybrid storage layer.
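
    The duplicate-avoidance "trick" boils down to a metadata lookup before shipping a block; a minimal sketch (names are mine; the real protocol also maintains block records listing replica locations):

        import hashlib

        def block_id(data: bytes) -> str:
            """Content-hash key (CHK-style identifier) of a block."""
            return hashlib.sha1(data).hexdigest()

        def backup_block(data: bytes, dht: dict, partner: str, send) -> bool:
            """Ship a block to a trading partner only if the metadata DHT
            shows no existing replica (PeerStore-style deduplication)."""
            bid = block_id(data)
            if dht.get(bid):           # block record found: a replica exists,
                return False           # so no data needs to travel
            send(partner, bid, data)   # point-to-point transfer to the partner
            dht[bid] = [partner]       # publish the block record
            return True

        dht, sent = {}, []
        send = lambda peer, bid, data: sent.append((peer, bid))
        backup_block(b"chunk", dht, "peer-a", send)  # transferred
        backup_block(b"chunk", dht, "peer-b", send)  # deduplicated
        print(len(sent))  # 1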

  • Retrievability of Data in Ad-Hoc Backup (Master's Thesis) [aspelund05:retrievability]
    Trond Aspelund (University of Oslo, Norway), May 2005

    This Master's thesis deals with pretty much the same topic as MoSAIC killijian04:collaborative, although no ad hoc backup system was actually implemented. The author focuses on scenarios where data recovery is performed in ad hoc mode, by contacting contributors through a multi-hop routing scheme. The major contribution of this work is a simulation-based assessment of the distance, in number of hops, between data owners and their backed-up data.

  • Pastiche: Making Backup Cheap and Easy [cox02:pastiche]
    Landon P. Cox, Christopher D. Murray, Brian D. Noble (University of Michigan, Ann Arbor, MI, USA), Fifth USENIX Symposium on Operating Systems Design and Implementation, December 2002

    This is definitely a good and very well documented paper describing Pastiche, a peer-to-peer, decentralized, low-overhead backup system. Pastiche builds on peer-to-peer routing and object location (using Pastry), content-based indexing, and convergent encryption. It is able to favor node locality and lets nodes look for backup buddies with maximum data overlap ("coverage"). It performs per-file, block-level backups, taking advantage of content-based indexing both to coalesce identical blocks coming from different hosts and to minimize the amount of data to be backed up by identifying common subsets between two revisions of a given file. Pastiche is implemented as a Linux user-level filesystem called chunkstore that stores both local data chunks and data chunks coming from backup clients. Complete restores can be performed by passing the host name and a host-specific passphrase to the host's backup buddies. Pastiche allows clients to send their backup servers a chunk deletion request, but it does not address what could happen if clients never send such requests for some reason (either after a disaster or due to malice).
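
    Convergent encryption, the ingredient that lets identical chunks from different hosts coalesce, can be sketched as follows (a toy sketch: the XOR keystream stands in for the real block cipher Pastiche uses):

        import hashlib

        def convergent_encrypt(chunk: bytes):
            """Encrypt a chunk with a key derived from its own content, so
            identical plaintexts always yield identical ciphertexts."""
            key = hashlib.sha256(chunk).digest()     # content-derived key
            stream = b""
            while len(stream) < len(chunk):          # expand key into keystream
                stream += hashlib.sha256(key + stream).digest()
            cipher = bytes(a ^ b for a, b in zip(chunk, stream))
            chunk_id = hashlib.sha256(cipher).hexdigest()
            return chunk_id, cipher, key             # key stays with the owner

        id1, _, _ = convergent_encrypt(b"same content")
        id2, _, _ = convergent_encrypt(b"same content")
        print(id1 == id2)  # True: one stored copy serves both owners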

  • Venti: A New Approach to Archival Storage [quinlan02:venti]
    Sean Quinlan, Sean Dorward (Bell Labs, Lucent Technologies), Proceedings of the First USENIX Conference on File and Storage Technologies, 2002

    This is a good paper describing an archival system designed for the Plan 9 operating system. The Venti archival server basically offers block-level storage using secure hashes as block identifiers (similar to Freenet's and GNUnet's content-hash keys). The block namespace is therefore global, which makes it possible to coalesce identical blocks. The paper contains an analysis of the chances that two blocks would hash to the same key, given a hash key size.
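
    The analysis is essentially a birthday bound: with n blocks and a b-bit hash, the probability of any collision is roughly n²/2^(b+1). A back-of-the-envelope version (my own numbers, in the spirit of the paper's analysis):

        def collision_probability(n_blocks: int, hash_bits: int) -> float:
            """Birthday-bound estimate: p ≈ n² / 2^(b+1)."""
            return n_blocks ** 2 / 2 ** (hash_bits + 1)

        # An exabyte of 8 KB blocks, 160-bit (SHA-1) block identifiers:
        n = 10 ** 18 // 8192
        print(collision_probability(n, 160))  # ~5e-21, i.e. negligible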

  • BAR Fault Tolerance for Cooperative Services [aiyer05:bar]
    Amitanand S. Aiyer, Lorenzo Alvisi, Allen Clement, Mike Dahlin, Jean-Philippe Martin, Carl Porth (University of Texas at Austin, TX, USA), Proceedings of the ACM Symposium on Operating Systems Principles, October 2005

    This paper proposes a model that extends the traditional Byzantine fault-tolerance model to account for selfish (or rational) participants in cooperative services, leading to a Byzantine-Altruistic-Rational (BAR) model. The idea is to give rational nodes strong incentives to cooperate in the service---in other words, not to deviate from the protocol. The authors address closed cooperative services, where:

    • identity is managed by a central authority, making Sybil attacks unlikely;
    • the whole incentive framework rests on the fact that there are external sanctions or punishments that can be applied to misbehaving nodes (e.g., by some external authority, such as the network administrator in a student dorm);
    • thus, node behavior needs to be observable and it must be possible to obtain proofs of misbehavior (POMs) that can be handed to that external entity.
    Consequently, the proposed protocols aim to (i) provide incentives to behave correctly and (ii) provide mechanisms that make it possible to determine and prove whether a node misbehaved. For instance, the witness node abstraction is a sort of ``distributed witness'' that allows all participating nodes to observe and judge each other's behavior. It is implemented as a state machine replicated among participants; all participants see the same state, which reflects the cooperativeness of the nodes involved in a transaction. Then, all it takes for an application to be BAR-tolerant is to structure messages ``so that incorrect responses act as proofs of misbehavior''.

    Although the system is said to address cooperative services that span multiple administrative domains, the fact is that it essentially builds a framework where all participants are assumed to obey the same authority---the one that applies external sanctions. To what extent can this qualify as multiple administrative domains?

    BAR-B itself (the cooperative backup implementation) uses a protocol designed in a way that allows behavioral proofs to be obtained and audited by third parties. For instance, when storing data, data owners get a signed receipt. Upon retrieval, they should provide that receipt along with the retrieval message (unless they lost it, e.g., after a crash); if the contributor refuses to give the data back, then its response constitutes a POM. Upon auditing, participants must show the list of all blocks stored on their behalf elsewhere and all blocks they store on behalf of other nodes. Besides concerns about bandwidth usage, this raises fundamental security issues: why would one disclose all this information to some untrusted node? Likewise, rejection of a storage request must be accompanied by a proof that the rejecter does not have enough room available. This seems impractical and, again, threatens users' freedom and privacy: there could be lots of other reasons why one would refuse to cooperate, and one may be unwilling to disclose them.
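
    The receipt/POM mechanics can be sketched as follows (a simplification: an HMAC with a known verification key stands in for the public-key signatures BAR-B actually uses):

        import hashlib, hmac

        KEY = b"contributor-key"   # stand-in for the contributor's key pair

        def sign_receipt(block: bytes) -> bytes:
            """Receipt the contributor hands the data owner at store time."""
            return hmac.new(KEY, hashlib.sha256(block).digest(),
                            hashlib.sha256).digest()

        def is_pom(block: bytes, receipt: bytes, served: bool) -> bool:
            """A valid receipt plus a refusal to serve the data back
            constitutes a proof of misbehavior (POM)."""
            valid = hmac.compare_digest(sign_receipt(block), receipt)
            return valid and not served

        receipt = sign_receipt(b"owner data")
        print(is_pom(b"owner data", receipt, served=False))  # True: cheating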

    When recovering from a disaster, participants are expected to pick a new ``identity'' (actually, ``identifier'') from a list of linked identities that was created for them upon initialization. This allows participants to re-assign obligations that were bound to the previous ``identity''.

    In several places, the authors assume identical behavior of all participating nodes. For instance, all nodes are assumed to ``evict the guilty party'' upon reception of a POM; likewise, all nodes are expected to behave similarly upon recovery.

    All in all, the proposed system certainly makes very efficient use of available resources (which is not hard since it boils down to central resource management), at the cost of users' freedom and privacy. More than anything else, it looks like a totalitarian system.

  • Burt: The Backup and Recovery Tool [melski99:burt]
    Eric Melski (Scriptics Corporation, USA), Proceedings of the USENIX Conference on System Administration, 1999

    A traditional centralized backup system based on GNU tar and similar standard tools.

  • A Survey of Cooperative Backup Mechanisms [killijian06:survey]
    Marc-Olivier Killijian, Ludovic Courtès, David Powell (LAAS-CNRS), October 2006
  • iDIBS: An Improved Distributed Backup System [morcos06:idibs]
    Faruck Morcos, Thidapat Chantem, Philip Little, Tiago Gasiba, Douglas Thain (Department of Computer Science and Engineering, University of Notre Dame, IN, USA), Proceedings of the International Conference on Parallel and Distributed Systems (ICPADS), July 2006

    This paper presents iDIBS, an improved version of the DIBS peer-to-peer backup system martinian05:dibs-web. The main improvement consists in using LT codes (which are rateless erasure codes mitzenmacher04:fountains) in lieu of the Reed-Solomon (RS) codes used by DIBS. The paper includes a performance comparison of RS and LT codes, showing that the processing time of LT codes is much lower than that of RS codes, and that using LT codes noticeably reduces bandwidth utilization.
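
    At its heart, an LT encoder just XORs together d randomly chosen source blocks, with d drawn from a (robust) soliton distribution; any sufficiently large set of output symbols allows decoding, hence the "rateless" property. A bare-bones sketch (with a toy degree distribution instead of the real robust soliton):

        import random

        def lt_encode_symbol(blocks):
            """Produce one output symbol: the XOR of d random source blocks."""
            d = random.choice([1, 1, 2, 2, 2, 3, 4])  # toy degree distribution
            chosen = random.sample(range(len(blocks)), min(d, len(blocks)))
            sym = bytes(len(blocks[0]))
            for i in chosen:
                sym = bytes(a ^ b for a, b in zip(sym, blocks[i]))
            return chosen, sym  # the decoder needs the indices (or PRNG seed)

        blocks = [b"abcd", b"efgh", b"ijkl", b"mnop"]
        print(lt_encode_symbol(blocks))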

  • flud Backup: A Free Peer-to-Peer Backup System [peacock06:flud]
    Alen Peacock, 2006

    A nice libre peer-to-peer backup project released under the GPL. flud has a storage layer similar to that of PeerStore landers04:peerstore: it stores file meta-data in a DHT and uses point-to-point relationships among nodes to store data. Data exchanges are symmetric, as in Samsara. It features a trust mechanism based purely on node-local knowledge, similar to that of GNUnet grothoff03:gnunet-eco. Its wiki contains a nice bibliography, as well as interesting discussions about these topics. See also this small thread, which includes a discussion of the perils of using a DHT in this context.

  • In-Place Rsync: File Synchronization for Mobile and Wireless Devices [rasch03:iprsync]
    David Rasch, Randal Burns (Department of Computer Science, Johns Hopkins University), Proceedings of the FREENIX track, USENIX Annual Technical Conference, 2003
  • Sauvegarde coopérative pour dispositifs mobiles [courtes06:sauvegarde]
    Ludovic Courtès (LAAS-CNRS), Actes du 7ème Congrès des Doctorants de l'École Doctorale Systèmes (EDSYS), May 2006

    This (small) paper contains a (small) comparison of various erasure coding algorithms, including ``rateless'' codes. It also explains the analytical dependability evaluation approach taken in the MoSAIC project.

  • Backup: A GPL'd Peer-To-Peer Versioning Backup System [mckenzie02:backup]
    James McKenzie, 2002

    A rough GPL'd peer-to-peer backup system written in plain C. Backup peers apparently have to be chosen statically by users. Unfortunately, the implementation is undocumented and not very usable as such.

  • A Cooperative Internet Backup Scheme [lillibridge03:p2p-backup]
    Mark Lillibridge, Sameh Elnikety, Andrew Birrell, Mike Burrows, Michael Isard (HP Systems Research Center), Proceedings of the USENIX Annual Technical Conference, June 2003

    A much improved version of Elnikety's 2002 work-in-progress report at FAST elnikety02:p2p-backup. Noteworthy changes/features:

    • Use of (8,2) Reed-Solomon codes (see the availability sketch after these lists).
    • Use of IDEA for encryption and HMAC-MD5 for integrity checks.
    Remaining issues:
    • Not storage- and bandwidth-efficient (uses a tar-like format for snapshots).
    • Only the latest snapshot is kept, i.e., no versioning is provided.
    • Backup does not guarantee atomicity and consistency: "Should the computer crash while writing the new snapshot, it will be left with two incomplete snapshots".
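
    Assuming (8,2) denotes eight data fragments plus two redundant ones (so any eight of the ten suffice for recovery), the availability gain is easy to quantify under an independent-failure assumption (my own sketch, not the paper's analysis):

        from math import comb

        def availability(p_peer: float, data: int = 8, parity: int = 2) -> float:
            """Probability that at least `data` of the `data + parity`
            fragments sit on currently reachable peers."""
            n = data + parity
            return sum(comb(n, k) * p_peer ** k * (1 - p_peer) ** (n - k)
                       for k in range(data, n + 1))

        print(availability(0.9))  # ~0.93 with peers that are 90% available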

  • DIBS: Distributed Backup for Local Area Networks [hsu04:dibs]
    Eugene Hsu, Jeff Mellen, Pallavi Naresh (Parallel & Distributed Operating Systems Group, MIT, USA), 2004

    A student project at MIT, not related to the other DIBS.

  • SyncML Sync Protocol, Version 1.0.1 [oma01:syncml-sync]
    Open Mobile Alliance, June 2001

    Specification of the SyncML Sync protocol, a synchronization protocol between mobile devices (``sync clients'') and, typically, desktop computers (``sync servers''). The specification requires implementations to allow devices to synchronize with multiple sync servers; hence, mobile devices must remember all changes that have been made to the user's data since the last sync with each sync server. The protocol also addresses conflict resolution. Members of the Open Mobile Alliance (OMA) include Ericsson, IBM, Motorola, Nokia, and Palm, Inc. The protocol seems to be implemented by Symbian and other proprietary mobile phone OSes.
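
    The per-server change tracking can be modeled minimally as follows (my own sketch, not the SyncML data format):

        # Each data item carries a change counter; the client remembers, per
        # sync server, the counter value at the last successful sync.
        changes = {}     # item -> change counter
        last_sync = {}   # server -> counter at last sync
        counter = 0

        def modify(item):
            global counter
            counter += 1
            changes[item] = counter

        def pending(server):
            """Items modified since the last sync with this server."""
            since = last_sync.get(server, 0)
            return [i for i, c in changes.items() if c > since]

        modify("contact:alice")
        print(pending("desktop"))          # ['contact:alice']
        last_sync["desktop"] = counter     # desktop now up to date
        print(pending("desktop"), pending("laptop"))  # [] ['contact:alice']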

  • Allmydata Tahoe: A Distributed Storage and Backup Tool [allmydata07:web]
    Allmydata, Inc., 2007

    The Tahoe implementation is written in Python, influenced by distributed capability systems, and distributed under the GPLv2+ (with the exception that future source releases may be delayed by up to 12 months compared to the binary releases). Allmydata, Inc. also provides for-profit storage services based on Tahoe. Zooko is involved in it.

  • Storage Tradeoffs in a Collaborative Backup Service for Mobile Devices [courtes06:storage]
    Ludovic Courtès, Marc-Olivier Killijian, David Powell (LAAS-CNRS), Proceedings of the Sixth European Dependable Computing Conference, October 2006
  • Increasing Data Resilience of Mobile Devices with a Collaborative Backup Service [martin-guillerez06:increasing]
    Damien Martin-Guillerez, Supplemental Proceedings of the International Conference on Dependable Systems and Networks (DSN'06), 2006

    A (student) paper by one of our colleagues within the MoSAIC project. The paper provides an analytical evaluation of the improvement in the probability of a successful backup yielded by the distribution of a replica. In other words, it allows one to answer questions like: "how much would I improve data availability by sending a replica to this contributor?" This may be a useful tool in the implementation of dynamic replication strategies.
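
    The simplest instance of such a measurement, with independent contributors each delivering their replica with probability p (a toy model of mine, not the paper's):

        def backup_success(p_replica: float, k: int) -> float:
            """P(at least one of k independent replicas survives)."""
            return 1 - (1 - p_replica) ** k

        for k in range(1, 5):   # marginal gain of one more contributor
            print(k, round(backup_success(0.6, k), 3))
        # 1 0.6 / 2 0.84 / 3 0.936 / 4 0.974: diminishing returns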

  • Magnus: Peer to Peer Backup System [gattu03:magnus]
    Naveen Gattu, Richard Huang, John Lynn, Huaxia Xia (UCSD, USA), June 2003
  • TinyPEDS: Tiny Persistent Encrypted Data Storage in Asynchronous Wireless Sensor Networks [girao07:tinypeds]
    Joao Girao, Dirk Westhoff, Einar Mykletun, Toshinori Araki (NEC Europe Ltd., Heidelberg, Germany), appeared in Ad Hoc Networks, September 2007

    The title says it all: the idea is to replicate data collected in a wireless sensor network (WSN) at the nodes themselves. These nodes are essentially mutually trusted, just like devices within a PAN loo03:backup-pan. However, data must be encrypted so that only the owner/administrator of the WSN can retrieve it. The authors discuss the use of energy-efficient encryption techniques, notably based on elliptic curves, and evaluate the actual energy consumption of several of them (on an Atmel ATmega microcontroller).

  • Efficient Distributed Backup with Delta Compression [burns97:efficient]
    Randal C. Burns, Darrell D. E. Long (IBM Almaden Research Center; University of California, Santa Cruz, USA), Proceedings of the fifth workshop on I/O in parallel and distributed systems, 1997
  • The Rsync Algorithm [tridgell96:rsync]
    Andrew Tridgell, Paul Mackerras (Department of Computer Science, Australian National University Canberra, Australia), 1996
  • Sauvegarde coopérative entre pairs pour dispositifs mobiles [courtes05:sauvegarde]
    Ludovic Courtès, Marc-Olivier Killijian, David Powell, Matthieu Roy (LAAS-CNRS), Actes des deuxièmes journées francophones Mobilité et Ubiquité (UbiMob), May 2005
  • ABS: The Apportioned Backup System [cooley04:abs]
    Joe Cooley, Chris Taylor, Alen Peacock (MIT Laboratory for Computer Science), 2004

    The MIT 6.824 Final Project. In other words, this is the final report of a student project! The backup system described builds upon Pastiche cox02:pastiche and the like. Therefore, it shares most of their techniques, most importantly the use of convergent encryption (CHKs) and of a block store based on some sort of DHT. However, ABS uses the Rsync algorithm tridgell96:rsync atop the block store in order to store only deltas. The block store allows blocks of arbitrary size to be stored. The unusual idea about the block store is that it also manages block metadata, which is a copy of the file metadata plus fragment placement (within the file) information. Data blocks are CHK-encoded while block metadata is encrypted with the user's public key, and both are signed with the user's private key. This signature can be used on the server side to find all blocks belonging to a given user. Unlike in Venti, the storage key is computed by the client as a hash of the key used to encrypt the block (i.e. as a hash of a block's hash); however, in case of a storage key collision, the client may choose another key, typically a hash of the storage key that collided. This "rehashing" technique is also used to migrate fragments among nodes (merge high). The idea behind rehashing is that identical fragments stored by different clients will eventually converge; however, until the storage keys for identical blocks converge, the single-instance store property is broken (i.e. identical fragments may be stored several times), and I suspect that it may take quite some time before storage keys do converge. Both the hash depth for a given fragment's key and the user's key pair are stored on the client machine, which would make it hard to retrieve the user's data after a complete loss of the machine's data. Verifying that data is effectively backed up is done by challenging the backup holder, in a way similar to that found in Samsara cox03:samsara. Finally, the DHT implementation is centralized and therefore not scalable: a leader node is elected that is responsible for maintaining a table mapping storage keys to nodes (see dabek01:cfs for a better approach).
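
    The key-derivation chain with "rehashing" can be sketched like this (my rendering of the scheme described above; sha256 stands in for whatever hash ABS actually uses):

        import hashlib

        def storage_keys(fragment: bytes):
            """Yield candidate storage keys: hash of the (convergent)
            encryption key first, then repeated rehashing upon collisions;
            the client records the final hash depth."""
            key = hashlib.sha256(fragment).digest()  # content-derived key
            k = hashlib.sha256(key).digest()         # depth-0 storage key
            while True:
                yield k.hex()
                k = hashlib.sha256(k).digest()       # collision: go deeper

        gen = storage_keys(b"fragment data")
        print(next(gen))  # first candidate storage key
        print(next(gen))  # fallback if the first one collided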

  • Storage Tradeoffs in a Collaborative Backup Service for Mobile Devices [courtes05:storage]
    Ludovic Courtès, Marc-Olivier Killijian, David Powell (LAAS-CNRS), December 2005
  • pStore: A Secure Peer-to-Peer Backup System [batten01:pstore]
    C. Batten, K. Barr, A. Saraf, S. Treptin (MIT Laboratory for Computer Science), December 2001

    An unpublished technical report. pStore uses Chord as the underlying distributed data store and relies on ``simple replication''. "To distribute chunk replicas randomly through the identifier space, salt is added when creating" the identifier of a replica.
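
    The salting amounts to something like the following (a sketch; pStore's exact hash and encoding may differ):

        import hashlib

        def replica_id(chunk_id: bytes, replica_no: int) -> str:
            """Salting with the replica number scatters the replicas of a
            chunk uniformly over the Chord identifier space."""
            salt = replica_no.to_bytes(4, "big")
            return hashlib.sha1(chunk_id + salt).hexdigest()

        cid = hashlib.sha1(b"some chunk").digest()
        print([replica_id(cid, r)[:8] for r in range(3)])  # 3 scattered IDs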

  • A comparison of utilities for filesystem backup [riseup05:backup-comparison]
    RiseUp, 2005

    A complete survey of Free backup tools (most of which are available in Debian).

  • Peer-To-Peer Backup for Personal Area Networks [loo03:backup-pan]
    Boon Thau Loo, Anthony LaMarca, Gaetano Borriello (UC Berkeley; Intel Seattle Research (USA)), May 2003

    The work described in this technical report, called FlashBack, closely relates to what we are trying to do with MoSAIC. It does, however, differ significantly in several ways as it targets personal area networks (PANs):

    • personal networks where personal devices interact are assumed, which means that devices fully trust each other; this obviates the need for trust establishment mechanisms like those we envision in MoSAIC;
    • devices within the PAN are expected to be within radio range of each other most of the time; in particular, the recovery process can only take place online, when the two (or more) devices involved can reach each other, assuming that any personal device that has become unreachable will eventually be reachable again.
    The former point makes it possible to perform lazy replication, as opposed to the opportunistic replication we envision in MoSAIC. The latter point (device locality) allowed the authors to consider a lease-based accounting mechanism where storage resources are leased for an agreed-upon amount of time, and replicas need to be renewed before the lease ends.

    Besides, this paper introduces a number of interesting ideas as to how to discover devices and select the most "useful" ones, accounting for devices' power and storage resources as well as a locality factor (i.e. how often a device is reachable by another one). All these pieces of information are fed to a device utility function used during the device selection process. Experiments show that a utility function accounting for both a device's power budget and its locality increases the time to live (TTL) of devices, by favoring the use of the most capable devices while avoiding the weakest ones.
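
    A utility function of the kind described might look like this (a sketch with made-up weights; the report defines its own formula):

        def device_utility(power_budget: float, free_storage: float,
                           locality: float,
                           w_power=0.4, w_storage=0.2, w_locality=0.4) -> float:
            """Score a candidate device; all inputs normalized to [0, 1],
            `locality` being the fraction of time the device is reachable."""
            return (w_power * power_budget
                    + w_storage * free_storage
                    + w_locality * locality)

        candidates = {"phone": (0.3, 0.8, 0.9), "laptop": (0.9, 0.9, 0.5)}
        print(max(candidates, key=lambda d: device_utility(*candidates[d])))
        # picks the capable, frequently co-located device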

    Another interesting measure is the so-called recall, which represents the percentage of a device's data that can be restored at a given point in time. Experiments show that carefully choosing co-located devices yields more dramatic improvements in recall than blindly increasing the replication factor.

    While these ideas on how to take into account devices' power budget, storage resources and locality are promising, their implementation (described in this paper) assumes that devices trust each other since they need to know each other's power budget and so on. This assumption does not hold in MoSAIC, where devices will have to make decisions on their own and where a decision process à la GNUnet would be more relevant grothoff03:gnunet-eco.

  • Towards a Semantic, Deep Archival File System [mahalingam02:sedar]
    Mallik Mahalingam, Chunqiang Tang, Zhichen Xu (HP Laboratories), 2002
  • The Phoenix Recovery System: Rebuilding from the Ashes of an Internet Catastrophe [junqueira03:phoenix]
    Flavio Junqueira, Ranjita Bhagwan, Keith Marzullo, Stefan Savage, Geoffrey M. Voelker (University of California, San Diego), Ninth Workshop on Hot Topics in Operating Systems (HotOS IX), 2003

    Based on the observation that worms, viruses and the like can only attack machines running a given set of programs (in practice mostly unsafe proprietary software), this paper focuses on techniques to seek diversity among software installations when backing up a machine (e.g. avoiding backing up a machine that runs a given vulnerable web server onto another machine running the same web server). A metric for software diversity (``cores'') is defined.
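
    The selection criterion reduces to picking backup hosts whose software "core" is disjoint from one's own (a sketch with invented package names):

        def shares_core(own: set, candidate: set) -> bool:
            """True if one exploit could take out both original and backup."""
            return bool(own & candidate)

        my_core = {"apache-1.3", "linux-2.4"}
        hosts = {"h1": {"apache-1.3", "freebsd-4"},
                 "h2": {"solaris-8", "zeus-web"}}
        diverse = [h for h, c in hosts.items() if not shares_core(my_core, c)]
        print(diverse)  # ['h2']: h2 runs none of the software we run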

  • MyriadStore: A Peer-to-Peer Backup System [stefansson06:myriadstore]
    Birgir Stefansson, Antonios Thodis, June 2006

    This is a nice Master's thesis on Internet peer-to-peer backup. It contains a survey of existing Internet peer-to-peer backup services, with a table to ease feature-wise comparison of those systems. The proposed design itself does not seem to differ significantly from existing solutions. It includes a reputation mechanism, albeit a centralized one (i.e., a central server records the reputation of each peer based on other peers' observations); this approach does not scale and is questionable security-wise (why should one be forced to trust that server, and why should the server trust peer ratings?).

  • Security Rationale for a Cooperative Backup Service for Mobile Devices [courtes07:security]
    Ludovic Courtès, Marc-Olivier Killijian, David Powell (LAAS-CNRS), Proceedings of the Latin-American Symposium on Dependable Computing, September 2007
  • Rsnapshot: A Remote Filesystem Snapshot Utility Based on Rsync [rubel05:rsnapshot]
    Mike Rubel, 2005
  • Backup and recovery of on-line information in a computer utility [stern74:backup]
    J. A. Stern (MIT Laboratory for Computer Science), January 1974

    This thesis deals with a backup mechanism designed for Multics and it proves that backup is not a new research topic... (See also the glossary)

  • Cooperative Data Backup for Mobile Devices [courtes07:phd]
    Ludovic Courtès (LAAS-CNRS, Université de Toulouse), November 2007

    My own PhD dissertation, wooow!

  • NetBSD's livebackup [mouse05:livebackup]
    der Mouse, 2005

    Code is available by FTP.

  • Collaborative Backup for Dependable Mobile Applications [killijian04:collaborative]
    Marc-Olivier Killijian, David Powell, Michel Banâtre, Paul Couderc, Yves Roudier (LAAS-CNRS, Toulouse, France; IRISA, Rennes, France; Eurécom, Sophia-Antipolis, France), Proceedings of the 2nd International Workshop on Middleware for Pervasive and Ad-Hoc Computing (Middleware 2004), October 2004
  • Cooperative Backup System [elnikety02:p2p-backup]
    Sameh Elnikety, Mark Lillibridge, Mike Burrows (Rice University, Houston, Texas, USA), The USENIX Conference on File and Storage Technologies (Work in Progress Report), January 2002

    This is a small yet interesting paper describing a collaborative backup system where the set of collaborating peers is static. The paper describes a basic trust model (where storage relationships are always symmetric -- the Samsara paper is all about how to avoid such a constrained model while still allowing for equity in resource consumption) and techniques to tolerate transient failures of nodes (based on the idea of a grace period). A number of potential attacks are discussed, as well as solutions to avoid them: e.g. a server claiming it holds data without actually doing so (addressed by challenging backup servers with random block queries -- the Pastiche paper contains a more thorough analysis of this particular problem); a man-in-the-middle attack, which may be used to isolate peers by removing the man-in-the-middle from the network; and a grace-period attack, which consists in abusing the grace period mechanism in order to store data "for free".
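
    The random-block challenge mentioned above can be sketched like this (my own rendering; the block numbering and hashing are illustrative):

        import hashlib, random

        def challenge(server_blocks: dict, owner_hashes: list):
            """The owner requests a random block whose hash it kept locally;
            a server that silently dropped the data cannot answer."""
            i = random.randrange(len(owner_hashes))
            answer = server_blocks.get(i)  # server's reply (None if dropped)
            ok = (answer is not None
                  and hashlib.sha256(answer).hexdigest() == owner_hashes[i])
            return i, ok

        blocks = {0: b"block-0", 1: b"block-1"}
        hashes = [hashlib.sha256(b).hexdigest() for b in blocks.values()]
        print(challenge(blocks, hashes))  # (i, True): data still held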
