
DataFu 1.0: open source functions for large scale data

DataFu is an open-source collection of user-defined functions (UDFs) for working with large-scale data in Hadoop and Pig.

About two years ago, we recognized a need for a stable, well-tested library of Pig UDFs that could assist in common data mining and statistics tasks. Over the years, we had developed several routines that were used across LinkedIn and thrown together into an internal package we affectionately called “littlepiggy.” The unfortunate part, as is true of many such efforts, is that the UDFs were poorly documented, poorly organized, and easily broken whenever someone made a change. Along came PigUnit, which allowed UDF testing, so

05 Sep
Peter Skomoroch @peteskomoroch
DataFu 1.0 released http://t.co/Kp5OvIrM25 nice work by @matterhayes @sam_shah & team #hadoop
05 Sep
Mortar @mortardata
Congrats to our friends at LinkedIn Engineering on the DataFu 1.0 release! http://t.co/bdYFVN0ZTM
05 Sep
Takuya UESHIN @ueshin
RT @repeatedly: Heh! > "DataFu 1.0" http://t.co/h9CQK99bvQ
05 Sep
Miguel Romero @donkelito
UDFs to help developmet in #hadoop and #pig #DataFu 1.0 » good read http://t.co/bS7cYpVAOb via @feedly
05 Sep
Vincent Heuschling @vhe74
#DataFu 1.0 for Apache #Hadoop #Pig by Linkedin Engineering http://t.co/lRLfz1rEhL
04 Sep
Prashant Kommireddi @pRaShAnT1784
LinkedIn Engineering releases DataFu 1.0 http://t.co/nqri2WBdL2
04 Sep
OpenMediation @soafaq
RT @kestelyn: RT @sam_shah: DataFu, a collection of user-defined functions for Hadoop and Pig, reaches 1.0. http://t.co/MPExbuFLTj < DataFu…
04 Sep
Justin Kestelyn @kestelyn
RT @sam_shah: DataFu, a collection of user-defined functions for Hadoop and Pig, reaches 1.0. http://t.co/MPExbuFLTj < DataFu ships in CDH!
04 Sep
Josh Wills @josh_wills
RT @matterhayes: DataFu 1.0 | LinkedIn Engineering http://t.co/rYCMl69x0l