Insights result from matching the right analysis tools to the right data. Enable analysts and data scientists to access and share large and varied data sets at the speed of memory. Speed up your applications, gain insight faster, and make better decisions. Explore the resources below to learn more.

Whitepaper: Accelerating On-Demand Data Analytics with Alluxio

Learn how Alluxio accelerates applications with memory-speed data access. Includes a step-by-step guide to deploying an on-demand cluster and running a sample workload.

Blog: Getting Started with Alluxio and Spark

Read the how-to guide for providing Spark with a reliable data sharing layer that increases performance with memory-speed access to data.

Blog: Effective Spark DataFrames with Alluxio

Read how Alluxio makes Spark more effective by increasing the performance and predictability of Spark jobs and by enabling multiple Spark jobs to share the same data from memory.

Whitepaper: Structured Big Data Federation Using Alluxio

Learn how to consolidate structured and unstructured data into a virtual data layer without ETL. Applications can analyze data from multiple independent storage silos concurrently through a standard interface.

Blog: Enabling Decoupled Compute and Storage with Alluxio

Learn how Alluxio enables the separation of compute and storage resources, providing better infrastructure flexibility without performance tradeoffs.

Blog: Accelerating Cloud Pipelines with Alluxio and Fast Durable Writes

With Alluxio, pipeline stages can share data at memory speed. When stages read and write data through Alluxio, the output of one stage can stay in memory for the next stage, which can greatly increase pipeline performance.

Whitepaper: Using Alluxio to Improve the Performance and Consistency of HDFS Clusters

Learn how Alluxio improves performance and consistency for multiple applications accessing data stored in the Hadoop Distributed File System (HDFS).

Case Study: Hedge Fund Improves Machine Learning Model Performance with Alluxio

Learn how a leading US hedge fund made machine learning model processing 4X faster for large-scale data processing in a hybrid cloud environment.

Case Study: Lenovo Analyzes Petabytes of Smartphone Data from Multiple Locations and Eliminates ETL with Alluxio

Learn how Lenovo unified data from multiple data centers and eliminated the ETL process, while lowering the storage costs of keeping multiple copies of the data.

Case Study: Tencent Delivers Customized News to Over 100 Million Users per Month with Alluxio

Learn how real-time analytics with Spark and Alluxio enables a tailored, data-driven news experience for Tencent customers.

Case Study: Myntra Accelerates Analytics in the Cloud for Mobile E-Commerce

Learn how leading e-commerce company Myntra unifies data in the AWS cloud to run an analytics pipeline with Spark and Kafka.

Case Study: TalkingData, Leading Data Broker in China, Leverages Alluxio to Unify Terabytes of Data Across Disparate Data Sources

Learn how China’s leading data broker unifies data from AWS S3 (object storage) and HDFS for analytics in a hybrid cloud environment, without ETL.

Blog: MOMO Accelerates Ad Hoc Analysis with Spark SQL and Alluxio

Learn how MOMO deployed Alluxio to optimize ad hoc queries with Spark SQL.

Case Study: Scalable Genomics Data Processing Pipeline with Alluxio

Learn how Guardant Health deployed an end-to-end Spark data processing solution with Alluxio as the unified storage layer, in conjunction with Mesos and MinIO.

Case Study: Data Processing Workflow Among Multiple Shared Applications

Learn how Alluxio provides an in-memory storage layer so that any Spark application can access data through the standard file system API, just as it would with HDFS. Alluxio enables transformation and exploration of large datasets in memory while integrating simply with existing applications.

Case Study: Baidu Accelerates Access to Remote Storage

Learn how Baidu, the largest search engine in China, overcame query performance challenges with data located in a remote data center, turning batch queries into interactive ones.