1.11 A home on the Internet — Ludovic Courtès — Coding Theory, Erasure Codes
  • Computation, Memory and Bandwidth Efficient Distillation Codes to Mitigate DoS in Multicast [di-pietro05:distillation-codes]
    Roberto Di Pietro, Stefano Chessa, Piero Maestrini (Universit? di Pisa, Italy; Universit? di Roma, Italy), Proceedings of the First International Conference on Security and Privacy for Emerging Areas in Communications Networks, 2005

    The documents first describes a method to use Merkle trees to authenticate packets. The basic idea described in the paper is the following:

    1. Sender signs and send the root of the (binary) Merkle tree merkle80:protocols consisting of the n packets to be sent.
    2. For each packet to be sent, the sender sends an augmented packet consisting of the packet itself concatenated with hashes of "sibling blocks" up to the root (see Figure 1 and Table 3 of the paper).
    3. For each augmented packet received, the receiver can reconstruct the tree, compute the corresponding root hash, and make sure that this hash is the same as the signed root hash initially received. Therefore, authenticity of individual packets can be assessed on the receiver side.
    Based on the observation that those augmented packets are redundant, the authors proposed a more compact data structure called Pruned Merkle Tree (PMT). They also present a mechanisms to produce "self-verifiable" packets for the Merkle tree root.

  • Hydra: A Platform for Survivable and Secure Data Storage Systems [xu05:hydra]
    Lihao Xu (Department of Computer Science, Wayne State University, USA), Proceedings of the ACM Workshop on Storage Security and Survivability, November 2005

    This is a nice paper. It contains a clear explanation of what of (n,k)-threshold schemes and (n,k) Maximum Distance Separable codes (MDS). Using an (n,k) scheme, original data is split in n shares and at least k of these shares (k < n) are needed to recover exactly the original data. MDS are a subset of threshold schemes which may be defined as follows:

    • an MDS maps a k-symbol block to an n-symbol codeword;
    • exactly k shares suffice to recover the exact original data;
    • consequently, if each share is stored on a separate node, this scheme:
      • tolerates the failure of (n - k) nodes;
      • has an effective storage use of k/n.
    The authors note that, quite often, special cases of (n,k) MDS schemes are used:
    • (n,1) which corresponds to full replication (mirroring);
    • (n,n) which corresponds to stripping without redundancy;
    • (n,n-1) which corresponds to single parity.
    The author goes on to note that, unlike Reed Solomon codes which are computationally intensive, MDS array codes which consist mainly of XORs are not (B-Code is one such thing).

    According to the author, using flexible (n,k) MDS can provide a lot of flexibility:

    1. all the symbols have to be of the same size, but the symbol size may be freely chosen, which allows to adapt to specific application requirements;
    2. it provides a means for simple load balancing: one can distribute shares such that only when more than a certain number of local nodes are unavailable remote nodes need to be queried, thus reducing the bandwidth requirement in normal use;

    The paper also presents the design of a storage system that leverages (n,k) MDS. The data structures for the meta-data that is proposed is the following:

    • First of all, for each file f stored, an hnode is created (let's call it hnode(f)) that contains the following information:
      | path: /home/ludo/paf     |
      | share locations:         |
      |   1 -> chbouib.laas.fr   |
      |   2 -> paf.laas.fr       |  <--- hnode(f)
      |   ...                    |
      | encoding: (4,2) B-code   |
      | encryption: yes          |
      | access rights: rw-r--r-- |
    • hnode(f) is prepended to every share of file f:
      ,-------------.     ,-------------.
      | hnode(f)    |     | hnode(f)    |
      |-------------| ... |-------------|
      | share1(f)   |     | shareN(f)   |
      `-------------'     `-------------'
    Clearly, this data structure duplicates a lot of information since hnode(f) is repeated in each share of f. The author argue that this redundancy is beneficial to the availability of files. In my opinion,
    • this structure does not address at all data confidentiality---which is not surprising, though, given the goals of Hydra.
    • it would be wiser to consider meta-data redundancy in the same way as data redundancy, using MDS for instance.
    • separating meta-data from data seems like a good idea because it allows sharing and separation of concerns; in Hydra, a modification in the file's meta-data (like rename, chmod) requires the system to update every single copy of the file's hnode.
    • the implementation of the write operation given this data structure is likely to lead to consistency headaches: shares and their associated hnode have to be updated in-place; this is very different from the append-only semantics of content-based addressing, for instance.

  • Energy Aware Lossless Data Compression [barr03:energy-aware]
    Kenneth Barr, Krste Asanovic (Massachusetts Institute of Technology, USA), Proceedings of the First International conference on Mobile systems, applications and services (MobiSys), 2003

    A really nice article (awarded best paper) with a detailed analysis of the energy consumption yielded by various compression algorithms (and their implementations) on a StrongARM-based platform, plus proposals as to how to improve energy-efficiency of implementations themselves. Many of the concerns presented here are similar to those found in courtes06:storage. See barr06:energy-aware for more details.

  • Low Density MDS Codes and Factors of Complete Graphs [xu99:bcode]
    Lihao Xu, Vasken Bohossian, Jehoshua Bruck, David G. Wagner (California Institute of Technology, USA; University of Waterloo, Canada), appeared in IEEE Transactions on Information Theory, November 1999

    A rather theoretical description of B-Codes, by the author of the Hydra paper xu05:hydra. B-Codes are a distance 3 MDS array codes, meaning that they can tolerate 2 erasures. Or, in other words, B-Codes are (k+3,k) MDS codes.

  • S-Code: New Distance-3 MDS Array Codes [katti05:s-code]
    Rajendra Katti, Xiaoyu Ruan (North Dakota State University, ND, USA), IEEE International Symposium on Circuits and Systems, May 2005

    Yet another distance 3 MDS array code. See also B-Codes xu05:hydra.

  • Efficient Erasure Correcting Codes [luby04:efficient]
    Michael G. Luby, Michael Mitzenmacher, M. Amin Shokrollahi, Daniel A. Spielman, appeared in IEEE Transactions on Information Theory, February 2001

    Presents what is now known as Tornado codes.

  • LZO Real-Time Data Compression Library [oberhumer05:lzo-web]
    Markus F.X.J. Oberhumer, 2005
  • Erasure Coding vs. Replication: A Quantitative Comparison [weatherspoon02:erasure-coding]
    Hakim Weatherspoon, John Kubiatowicz, Revised Papers from the First International Workshop on Peer-to-Peer Systems, 2002

    The authors provide an analytical study of the mean time to failure (MTTF) of systems employing simple replication, compared to systems based on erasure codes. They conclude that "systems employing erasure codes have mean time to failures orders of magnitude higher than replicated systems with similar storage and bandwidth requirements". At first sight, this may sound contradictory with vernois04:durability and lin04:revisited, but it's not: Weatherspoon et al. base their computations on the probability distribution of disk failures. Needless to say, hard disks fail much more infrequently than untrusted peer-to-peer storage nodes; their availability is close to 1 while that of untrusted storage nodes is much closer to 0. Therefore, compared to lin04:revisited and vernois04:durability, they describe a context where peer availability is close to 1, although the paper claims to be addressing the design of a peer-to-peer storage system.

  • Data Durability in Peer to Peer Storage Systems [vernois04:durability]
    Antoine Vernois, Gil Utard (LaRIA, INRIA, France), Proceedings of the 4th Workshop on Global and Peer to Peer Computing, April 2004

    The abstract says it all:

    The conclusion of this study is that when there is no high availability of peers, simple replication scheme may be more efficient than sophisticated erasure codes.
    More precisely, erasure codes assume relatively high availability of peers to be efficient. Additionally, erasure codes "increase the amount of communication [needed] to recover [from] a failed peer". The paper on Hydra by Xu xu05:hydra strongly moderates this conclusion.

  • Digital Fountains: A Survey and Look Forward [mitzenmacher04:fountains]
    Michael Mitzenmacher (Harvard University, USA), Proceedings of the IEEE Information Theory Workshop, October 2004

    This paper contains a nice survey of rateless codes (i.e., codes that may produce a potentially infinite stream of distinct encoding symbols) from a historical perspective. It starts describing Reed-Solomon and Tornado codes, which are not rateless, and goes on describing LT and Raptor codes, which are rateless. RS codes have a limitation on the number of distinct encoding symbols that may be produced, while Tornado codes require a rate to be chosen before generating the encoding symbols. On the other hand, LT codes, like Online codes maymounkov02:online-codes, do not have these restrictions: given k input symbols, any number of distinct encoding symbols may be generated, out of which any k symbol are sufficient to recover the input data with high probability. LT codes work as follows (quoting the authors):

    1. Choose a degree d for the encoding symbol, according to a predetermined distribution (the authors propose to use network information as a seed of a pseudo-random number generator to determine a degree);
    2. Choose d distinct message symbols uniformly at random, and set the encoding symbol to be exclusive-or of these symbols.
    Note also that in maymounkov02:online-codes the author mentions various weaknesses of LT codes compared to Online codes. The paper concludes by listing potential reasons for the failure of this codes to be widely adopted. The main one is the set of patents held by Digital Fountain, Inc. Ironically enough, the author's former colleague, Luby, is now working for this very company.

  • Bzip2 and Libbzip2 Website [seward07:bzip2-web]
    Julian Seward, 2007
  • XMill: an Efficient Compressor for XML Data [liefke00:xmill]
    Hartmut Liefke, Dan Suciu (University of Pennsylvania, USA; AT&T Labs, USA), Proceedings of the ACM SIGMOD International Conference on Management of Data, 2000

    Great paper! Deal with the use of type-specific compression algorithms (called semantic compressors) in order to compress the various data types that may comprise an XML tree. Note that this has nothing to do with XMille! :-)

  • DEFLATE Compressed Data Format Specification Version 1.3 (RFC 1951) [deutsch96:deflate]
    Peter Deutsch (Internet Engineering Task Force (IETF)), May 1996

    Specification of the compression algorithm used by zlib deutsch96:zlib.

  • Assessment of Cooperative Backup Strategies for Mobile Devices [courtes06:assessment]
    Ludovic Court?s, Ossama Hamouda, Mohamed Ka?niche, Marc-Olivier Killijian, David Powell (LAAS-CNRS), December 2006
  • Survivable Storage Systems [ganger01:survivable]
    Gregory R. Ganger, Pradeep K. Khosla, Mehmet Bakkaloglu, Michael W. Bigrigg, Garth R. Goodson, Semih Oguz, Vijay Pandurangan, Craig A. N. Soules, John D. Strunk, Jay J. Wylie (Carnegie Mellon University, PA, USA), Proceedings of the DARPA Information Survivability Conference & Exposition (DISCEX), June 2001

    The paper presents the architecture of PASIS, a ``survivable storage system''. It contains a nice introduction to general threshold schemes, a notion that encompasses both erasure codes and threshold cryptography. Thus, a p-m-n general threshold scheme is described as one where m out of n shares/fragments are required to reconstruct the original data and, additionally, fewer than p shares/fragments convey no information about the original data, from an information-theoretic viewpoint. As in Delta-4 deswarte91:intrusion, the authors argue that scattering and threshold scheme can be used in lieu of ``regular'' encryption. The authors describe the range of tradeoffs that can be made using general threshold scheme, between availability, confidentiality, and storage overhead. They also mention latency for read accesses, since at least m storage nodes must be queried. As far as availability is concerned, they note a significant gain in using threshold schemes, assuming the availability of individual storage nodes is 0.001 (see lin04:revisited for why this wouldn't hold with lower individual node availability). The authors propose an automatic tradeoff selection mechanisms based on the size and access patterns of objects (e.g., read/write, public vs. private). They then go on describing the PASIS architecture, emphasizing the use of automatic, inescapable versioning to facilitate intrusion detection and recovery of a clean state. They refer to EFS santry99:deciding and XDelta macdonald99:versioned, among others.

  • Energy Aware Lossless Data Compression [barr06:energy-aware]
    Kenneth Barr, Krste Asanovic (Massachusetts Institute of Technology, USA), appeared in ACM Transactions on Computer Systems, August 2006

    This is an extended version of barr03:energy-aware (sections 3 and 4 detail the experiments and measurements they made). One of the starting point here is that sending 1 bit over wireless Ethernet is roughly as costly as performing 1000 add operations (noting that the wireless communication quality and throughput noticeably impacts the energy cost of transmission). However, the authors quickly note that the "energy to compress the file approaches or outweighs the energy to transmit it". They then note that a cache miss requires "up to 63 times the energy of an add", thereby introducing the idea that larger memory footprints yield increased energy consumption. On the other hand, the authors note that larger memory footprint usually help increase compression ratios and thus reduce the required transmission power.

    The authors note that "compression and decompression energy are roughly proportional to execution time" on their test platform, an assumption that we made in courtes06:storage.

    Section 5 of the article shows how to optimize compressors based on knowledge of their cache behavior on the target architecture. Namely, by optimizing data structures so that they fit in the architecture's cache, one can significantly reduce the energy footprint of a compressor. They illustrate this on the compress program. The main techniques shown here to reduce cache footprint are:

    1. grouping and compacting data structures;
    2. using 16-bit offsets rather than 32-bit pointers;
    3. if possible, changing the hardware's data caching policy, like disabling data cache in some cases.
    Finally, the authors note that choosing the fastest algorithm so that the device can spend more time in idle mode can be a viable strategy when the idle mode is really a low-power mode.

  • ?valuation de la s?ret? de fonctionnement d'un syst?me de sauvegarde coop?rative pour des dispositifs mobiles (Master de recherche) [hamouda06:evaluation]
    Ossama Hamouda (LAAS-CNRS, Toulouse, France), 2006
  • Dependability Evaluation of Cooperative Backup Strategies for Mobile Devices [courtes07:dependability]
    Ludovic Court?s, Ossama Hamouda, Mohamed Ka?niche, Marc-Olivier Killijian, David Powell (LAAS-CNRS), Proceedings of the IEEE International Symposium on Pacific Rim Dependable Computing, December 2007

    Mobile devices (e.g., laptops, PDAs, cell phones) are increasingly relied on but are used in contexts that put them at risk of physical damage, loss or theft. This paper discusses the dependability evaluation of a cooperative backup service for mobile devices. Participating devices leverage encounters with other devices to temporarily replicate critical data. Permanent backups are created when the participating devices are able to access the fixed infrastructure. Several data replication and scattering strategies are presented, including the use of erasure codes. A methodology to model and evaluate them using Petri nets and Markov chains is described. The paper also discusses the use of erasure codes and the impact of contributor behavior on the cooperative service. We demonstrate that our cooperative backup service decreases the probability of data loss by a factor up to the ad hoc to Internet connectivity ratio.

  • Some applications of Rabin's fingerprinting method [broder93:rabin]
    Andrei Z. Broder, Sequences II: Methods in Communications, Security, and Computer Science, 1993

    Discusses applications of Rabin fingerprints such as DAG fingerprinting as used in Vesta heydon01:vesta. It does not present manber94:finding, though.

  • Erasure Code Replication Revisited [lin04:revisited]
    W. K. Lin, D. M. Chiu, Y. B. Lee (Chinese University of Hong Kong), Proceedings of the Fourth International Conference on Peer-to-Peer Computing, 2004

    A nice theoretical analysis of the benefits or non-benefits of erasure coding compared to ``whole-file replication''. The authors evaluate the file availability A under various values of the some key parameters/factors:

    • the storage overhead, S, or stretch factor;
    • the number of blocks a file is divided into, b (for b=1, we are actually using whole-file replication);
    • the average peer availability, μ.
    The authors draw a number of clear, interesting conclusions:
    1. erasure coding is beneficial (in terms of file availability) for large values of μ and counter-productive for small values (again, compared to whole-file replication);
    2. when S×μ > 1, then A increases when b does;
    3. therefore, when S×μ > 1 the optimal b would be infinity; otherwise, the optimal b is 1; in other words, there is a sharp switch from one strategy to the other;
    4. associating a cost function dependent on b can help determine a value of b that is optimal both in terms of (computational) cost and resulting file availability, when a b > 1 is to be chosen at all;
    5. finally, inaccurate estimation of μ may lead to terribly wrong decisions.
    Among their conclusions, the authors consider that it might be worth performing this sort of evaluation assuming different values of μ for each peer. See also bhagwan04:totalrecall on that topic where different ways to determine μ dynamically, on a per-peer basis, are considered.

  • Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance [rabin89:dispersal]
    Michael O. Rabin (Harvard University, Cambridge, MA, USA), appeared in Journal of the ACM, April 1989
  • ZLIB Compressed Data Format Specification Version 3.3 (RFC 1950) [deutsch96:zlib]
    Peter Deutsch, Jean-Loup Gailly (Internet Engineering Task Force (IETF)), May 1996

    Specification of the format used by zlib.

  • Intrusion Tolerance in Distributed Computing Systems [deswarte91:intrusion]
    Yves Deswarte, Laurent Blain, Jean-Charles Fabre (LAAS-CNRS, Toulouse, France; INRIA, France), Proceedings of the IEEE Symposium on Research in Security and Privacy, May 1991

    Deals with an application of the fragmentation-redundancy-dissemination (FRD, aka. ``fragmentation-replication-scattering'' or FRS) technique.

  • Online Codes [maymounkov02:online-codes]
    Petar Maymounkov (Secure Computer Systems Group, New York University, NY, USA), November 2002

    Online codes are rateless codes (sometimes referred to as fountain codes), i.e., they can produce an infinity of encoded blocks, or check blocks. Given an n-block message, (1 + ε)n check blocks are sufficient to recover the original message. Encoding time is linear and can be performed on demand; check blocks are independent of each other. See also mitzenmacher04:fountains.

  • STAR: An Efficient Coding Scheme for Correcting Triple Storage Node Failures [huang05:star-code]
    Cheng Huang, Lihao Xu (Microsoft Research, Redmond, WA, USA; Wayne State University, Detroit, MI, USA), Proceedings of the Fourth USENIX Conference on File and Storage Technologies, 2005

    A description of the STAR coding scheme, a (p+3,p) MDS scheme (i.e., can tolerate 3 erasures with a storage overhead of p+3/p); in other words, it is an maxim-distance separable (MDS) array code of distance 4.

  • Finding Similar Files in a Large File System [manber94:finding]
    Udi Manber (Department of Computer Science, University of Arizona, USA), Proceedings of the USENIX Winter 1994 Conference, January 1994
  • An Evaluation of Binary XML Encoding Optimizations for Fast Stream Based XML Processing [bayardo04:binary-xml]
    Roberto J. Bayardo, Daniel Gruhl, Jussi Myllymaki, Vanja Josifovski (IBM Almaden Research Center, San Jose, CA, USA), Proceedings of the 13th International Conference on World Wide Web, 2004

    Putting an end to weird and awfully inefficient data representation formats... Refers to liefke00:xmill.

(made with skribilo)