Feb 20th, 2019
The data engineering team at Bazaarvoice, a software-as-a-service digital marketing company based in Austin, Texas, must handle data at massive Internet scale to serve its customers. The company enables retailers and brands to curate, manage, and understand user-generated content (UGC) like product reviews, shopper questions and answers, and curated social content, and then uses that data to provide deep analytics on consumer behaviors.
Jan 14th, 2019
Testing distributed systems at scale is typically a costly yet necessary process. At Alluxio we take testing very seriously because
organizations across the world rely on our technology, so one problem we want to solve is how to test at scale without
breaking the bank. In this blog post we show how the maintainers of the Alluxio open source project build and test
the system at scale cost-effectively using public cloud infrastructure. We test with the most popular frameworks, such as
Spark and Hive, and pervasive storage systems, such as HDFS and S3. Using Amazon EC2, we are able to test clusters of 1000+
workers at a cost of about $16 per hour.
Nov 14th, 2018
The cloud is rapidly becoming ubiquitous, with continued adoption focused on the flexibility and cost benefits of a utility infrastructure model. Enterprises are increasingly taking a “data first” view of infrastructure, which demands a new way of thinking in a world in which data is stored and accessed from multiple locations and providers. Performance and interoperability challenges, however, can present obstacles to cloud adoption and complicate data management. Techniques such as the use of data silos, ETL processes and multiple data copies, which are commonly employed to accommodate cloud limitations, often offset the expected benefits of cloud infrastructure.
Oct 16th, 2018
Alluxio was created because we saw a need for innovation at the data layer rising from the growing complexity of connecting multiple compute frameworks to an ever-expanding mix of storage systems and formats. Our approach uses a memory-centric architecture that abstracts files and objects in underlying persistent storage systems and provides a shared data access layer for compute applications.
Oct 15th, 2018
Alluxio is an open source software solution that connects analytics applications to heterogeneous data sources through a distributed caching layer that sits between compute and storage. It runs on commodity hardware, creating a shared data layer abstracting the files or objects in underlying persistent storage systems. Applications connect to Alluxio via a standard interface, accessing data from a single unified source. This whitepaper discusses the data center challenges Alluxio addresses and the benefits it provides, and gives an overview of how it works.
Mar 19th, 2018
From our Friends at MOMO...
The Hadoop ecosystem makes many distributed systems and algorithms easier to use and generally lowers the cost of operations. However, enterprises and vendors are never satisfied with that, so higher performance becomes the next issue. We considered several options to address our performance needs and focused our efforts on Alluxio, which improves performance with intelligent caching.
Alluxio clusters act as a data access accelerator for remote data in connected storage systems. Temporarily storing data in memory, or other media near compute, accelerates access and provides local performance from remote storage. This capability is even more critical with the movement of compute applications to the cloud and data being located in object stores separate from compute. Caching is transparent to users, using read/write buffering to maintain continuity with persistent storage. Intelligent cache management utilizes configurable policies for efficient data placement and supports tiered storage for both memory and disk (SSD/HDD).
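The tiered caching described above can be pictured with a toy model. The following is a minimal sketch, not Alluxio's implementation: the class name `TieredCache`, the tier names, capacities counted in entries rather than bytes, and the least-recently-used eviction policy are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class TieredCache:
    """Toy read-through cache with ordered tiers (hypothetical, for illustration)."""

    def __init__(self, tiers, fetch_remote):
        # tiers: list of (name, capacity_in_entries), ordered fastest to slowest.
        self._tiers = [(name, cap, OrderedDict()) for name, cap in tiers]
        # fetch_remote: callable that reads a key from the remote persistent store.
        self._fetch_remote = fetch_remote
        self.remote_reads = 0

    def _put(self, level, key, value):
        if level >= len(self._tiers):
            return  # dropped from the cache; the remote store still holds the data
        _, cap, store = self._tiers[level]
        store[key] = value
        store.move_to_end(key)
        if len(store) > cap:
            # Evict the least recently used entry and demote it one tier down.
            old_key, old_value = store.popitem(last=False)
            self._put(level + 1, old_key, old_value)

    def get(self, key):
        for _, _, store in self._tiers:
            if key in store:
                value = store.pop(key)
                self._put(0, key, value)  # promote hot data to the fastest tier
                return value
        # Cache miss: read from remote storage, then keep a copy near compute.
        self.remote_reads += 1
        value = self._fetch_remote(key)
        self._put(0, key, value)
        return value

    def tier_of(self, key):
        for name, _, store in self._tiers:
            if key in store:
                return name
        return None
```

With two tiers such as `[("MEM", 2), ("SSD", 2)]`, a third distinct read pushes the oldest entry from memory down to SSD, and a later read of that entry is served from the cache without another remote fetch.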
Feb 24th, 2018
Enterprises are adopting big data technologies to analyze and derive insight from their growing volumes of structured and unstructured data.
A familiar problem is the requirement to analyze data from multiple independent storage silos concurrently. In order to consolidate the data, large enterprises typically use custom solutions or build a data lake. These approaches present additional challenges and can be costly and time consuming.
Alluxio helps organizations handle their big data by providing a unified view of all of the data in the enterprise – on premises, in the cloud, or hybrid. Applications access data using a standard interface to a global virtual namespace. Alluxio also employs a memory-centric architecture to enable data access at memory speed. With the combined unification and performance benefits, Alluxio can effectively provide big data federation for organizations by acting as a virtual data lake.
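One way to picture a global virtual namespace is as a mount table that maps virtual path prefixes to locations in different storage systems. The sketch below is an assumption-laden illustration, not Alluxio's API: the class `VirtualNamespace`, its methods, and the example URIs are all hypothetical.

```python
class VirtualNamespace:
    """Toy mount table mapping virtual paths to storage URIs (hypothetical)."""

    def __init__(self):
        self._mounts = {}

    def mount(self, virtual_prefix, storage_uri):
        # Normalize so that "/s3/" and "/s3" name the same mount point.
        prefix = virtual_prefix.rstrip("/") or "/"
        self._mounts[prefix] = storage_uri.rstrip("/")

    def resolve(self, path):
        # Pick the longest mount prefix that covers the path.
        best = None
        for prefix in self._mounts:
            covers = path == prefix or path.startswith(prefix.rstrip("/") + "/")
            if covers and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount covers {path}")
        suffix = path if best == "/" else path[len(best):]
        return self._mounts[best] + suffix


ns = VirtualNamespace()
ns.mount("/", "hdfs://namenode/data")   # hypothetical on-premises store
ns.mount("/s3", "s3://bucket/logs/")    # hypothetical cloud object store
```

An application then addresses `/s3/2019/a.csv` or `/tables/t1` through the same interface, and the lookup decides which underlying system holds the data.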
Oct 27th, 2017
Many organizations deploy Alluxio together with Spark for performance gains and data manageability benefits. In this blog post, we investigate how Alluxio helps Spark be more effective. Alluxio increases performance of Spark jobs, helps Spark jobs perform more predictably, and enables multiple Spark jobs to share the same data from memory. Previously, we investigated how Alluxio is used for Spark RDDs. In this article, we investigate how to effectively use Spark DataFrames with Alluxio.
May 2nd, 2017
For business to not just survive — but to flourish — it’s become imperative to make decisions with near immediacy, continuously pivot strategy and tactics, and merge streams of inquiries into meaningful action. Executing requires high-frequency insights — the competitive advantage in today’s frenetic business landscape. Together with Alluxio, Inc., we enable businesses to gain the competitive advantage with faster time to insights with our integrated solution of Cray high-performance analytics platform and Alluxio’s memory-speed virtual storage system — Alluxio Enterprise Edition.
Feb 11th, 2017
Alluxio, formerly Tachyon, is the world's first system that unifies data at memory speed while achieving affordability through its innovative tiered storage functionality. This Samsung whitepaper shows how Alluxio’s storage can be used with the different storage media available in systems, including NVMe SSDs, while providing in-line performance consistent with the speed of the underlying storage media. Alluxio provides the capability to leverage all the storage that is available in a system.
Oct 14th, 2016
Understand the benefits Alluxio brings to analytics on object storage.
- Derive timely insights from data with memory-speed access
- Enable data sharing between applications without sacrificing performance
- Reduce costs with efficient memory utilization
Sep 4th, 2016
Learn how Alluxio is used in clusters with co-located compute and storage to
improve two key metrics of data analytics clusters:
- Performance predictability, allowing SLAs to be met more easily.
- Up to 10x improved performance.
Aug 28th, 2016
In this article, we show that by saving RDDs in Alluxio, Spark applications can keep
larger data sets in memory and run faster, and that RDDs can be shared
across separate Spark applications.
Aug 18th, 2016
This whitepaper consists of two portions. The first is a high level overview of the
advantages of using Alluxio as a core technology with on-demand clusters. The
second portion is intended for engineers; it provides a detailed step-by-step guide to
deploying an on-demand cluster with Alluxio and instructions for running a sample
workload on the cluster. At the end of the paper you will have a good understanding
of how to deploy this architecture and the value Alluxio brings to the stack.
Apr 22nd, 2016
Exponential growth in raw computational power, communication bandwidth, and storage capacity drives continuous innovation in how data is processed and stored. To address the evolving nature of the compute and storage landscape, we are continuously advancing Alluxio, a state-of-the-art memory-centric virtual distributed storage system.