Tuesday, April 23, 2013

Presto: distributed R framework from HP Labs

My collaborator Aapo Kyrola sent me the following pointer to Presto.
Presto is an interesting system that allows large-scale computation in R by distributing the computational workload across a cluster. Presto implements distributed arrays and thus allows efficient implementation of linear algebra primitives like the matrix-vector product.
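
To make the distributed-array idea concrete, here is a minimal sketch in Python of a row-partitioned matrix-vector product. It is not Presto's API: the function names (`split_rows`, `block_matvec`, `distributed_matvec`) are hypothetical, and a thread pool stands in for the cluster workers. The assumption is simply that each worker holds a block of rows and computes its part of the result independently.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(matrix, num_parts):
    """Split a list-of-rows matrix into contiguous row blocks."""
    size, extra = divmod(len(matrix), num_parts)
    blocks, start = [], 0
    for i in range(num_parts):
        # The first `extra` blocks get one additional row each.
        end = start + size + (1 if i < extra else 0)
        blocks.append(matrix[start:end])
        start = end
    return blocks


def block_matvec(block, vector):
    """Multiply one row block by the full vector (the per-worker task)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in block]


def distributed_matvec(matrix, vector, num_parts=2):
    """Compute matrix * vector block by block; threads stand in for workers."""
    blocks = split_rows(matrix, num_parts)
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        # map() returns results in block order, so concatenation is safe.
        partials = list(pool.map(lambda b: block_matvec(b, vector), blocks))
    return [y for part in partials for y in part]


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]
    x = [1, 1]
    print(distributed_matvec(A, x, num_parts=2))
```

Each block needs only its own rows plus the (small) vector, which is why this primitive partitions well across machines.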

The following two papers were recently published about Presto:

In those papers, a large number of applications were implemented in Presto, including K-means, ALS, PageRank, vertex centrality, shortest path and others. A large performance gain of 15x to 40x over Hadoop and Spark is demonstrated.

Unfortunately, it is not clear if Presto will be released as an open source project.
