ThriftStore

From NetSysLab

This project explores the feasibility of a cost-efficient storage architecture that offers the reliability and access performance of a high-end system. The architecture exploits two opportunities. First, scavenging idle storage from LAN-connected desktops offers not only low-cost storage space but also high I/O throughput, obtained by aggregating the I/O channels of the participating nodes. Second, the two components of data reliability, durability and availability, can be decoupled to control overall system cost. To capitalize on these opportunities, we integrate two types of components: volatile, scavenged storage and dedicated, yet low-bandwidth, durable storage. The durable storage forms a low-cost back-end that enables the system to restore data that the volatile nodes may lose, while the volatile nodes provide a high-throughput front-end.

While integrating these components has the potential to offer a unique combination of high throughput, low cost, and durability, a number of concerns must be addressed to architect and correctly provision the system. To this end, we develop analytical and simulation-based tools to evaluate the impact of system characteristics (e.g., bandwidth limitations on the durable and the volatile nodes, space constraints, replica placement scheme) on data availability and on the associated costs in terms of maintenance traffic. Further, we implement and evaluate a prototype of the proposed architecture: a GridFTP server that aggregates volatile resources. Our evaluation demonstrates a transfer throughput of up to 800 MBps for the new GridFTP service.
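To make the durability/availability decoupling concrete, the following is a minimal toy simulation. It is not the analytical or simulation tooling described in the publications below; the model, the function name `simulate_unavailability`, and all parameter values are assumptions chosen for illustration. One object is replicated on volatile nodes that can each lose their copy at every time step, while a durable back-end always keeps the data but can restore only one replica every `repair_steps` steps, standing in for its low bandwidth.

```python
import random


def simulate_unavailability(n_replicas, fail_prob, repair_steps,
                            steps=100_000, seed=0):
    """Toy discrete-time model of a single replicated object.

    Assumptions (illustrative only):
      - each live volatile replica is lost independently with
        probability `fail_prob` per time step;
      - the durable back-end never loses the data, but it needs
        `repair_steps` steps to restore one missing replica;
      - the object is "unavailable" in a step when no volatile
        replica is live at the end of that step (it stays durable).

    Returns the fraction of steps in which the object was unavailable.
    """
    rng = random.Random(seed)
    live = n_replicas       # replicas currently held by volatile nodes
    repair_timer = 0        # progress of the restore in flight
    unavailable = 0

    for _ in range(steps):
        # Volatile nodes may lose their replicas.
        live -= sum(1 for _ in range(live) if rng.random() < fail_prob)

        # The durable back-end restores one replica at a time.
        if live < n_replicas:
            repair_timer += 1
            if repair_timer >= repair_steps:
                live += 1
                repair_timer = 0
        else:
            repair_timer = 0

        if live == 0:
            unavailable += 1

    return unavailable / steps


if __name__ == "__main__":
    # Hypothetical settings: vary the replication level while the
    # back-end bandwidth (repair_steps) stays fixed.
    for replicas in (1, 2, 3):
        u = simulate_unavailability(replicas, fail_prob=0.01, repair_steps=10)
        print(f"replicas={replicas} unavailability={u:.4f}")
```

In this sketch the data is never lost, because the back-end always holds a copy; only availability varies. Raising `n_replicas` or lowering `repair_steps` reduces unavailability at the price of more storage or more maintenance traffic, which is the kind of tradeoff the project's tools are meant to quantify.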


Publications

[3] ThriftStore: Finessing Reliability Tradeoffs in Replicated Storage Systems, Abdullah Gharaibeh, Samer Al-Kiswany, Matei Ripeanu, IEEE Transactions on Parallel and Distributed Systems, vol. 22(6), pp. 910-923, June 2011.
[2] Exploring Data Reliability Tradeoffs in Replicated Storage Systems, Abdullah Gharaibeh, Matei Ripeanu, ACM/IEEE International Symposium on High Performance Distributed Computing (HPDC'09), Munich, Germany, June 2009.
[1] Exploring Data Reliability Tradeoffs in Replicated Storage Systems, Abdullah Gharaibeh, Master of Applied Science Thesis, University of British Columbia, June 2009.