Alluxio Architecture and Data Flow

Alluxio was created because we saw a need for innovation at the data layer rising from the growing complexity of connecting multiple compute frameworks to an ever-expanding mix of storage systems and formats. Our approach uses a memory-centric architecture that abstracts files and objects in underlying persistent storage systems and provides a shared data access layer for compute applications.

Alluxio is not a persistent storage system. Instead, Alluxio serves as a data access layer, residing between any persistent storage system (such as Amazon S3, Microsoft Azure Object Store, Apache HDFS or OpenStack Swift) and computation frameworks (such as Apache Spark, Presto or Hadoop MapReduce). This whitepaper provides a technical overview of the Alluxio architecture and describes the data flow for common read and write scenarios.