Showing posts with the label Bigdata

Sklearn-gmm On Large Datasets

I have a large dataset (I can't fit the entire dataset in memory). I want to fit a GMM on this data s…
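scikit-learn's `GaussianMixture` has no `partial_fit`, so one common workaround (swapped in here as an assumption, not necessarily what the post concludes) is to fit on a uniform random subsample that does fit in memory. Reservoir sampling (Algorithm R) draws such a sample in a single pass over a stream of rows. The helper name `reservoir_sample` is hypothetical, and `rows` is assumed to be any iterable of equal-length numeric rows, for example a generator that parses the file line by line.

```python
import random

import numpy as np


def reservoir_sample(rows, k, seed=None):
    """Return a uniform random sample of up to k rows from an iterable, in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i survives with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return np.asarray(reservoir, dtype=float)
```

The sample can then be passed to `sklearn.mixture.GaussianMixture(n_components=...).fit(sample)`, and the fitted model's `predict` applied to the full dataset chunk by chunk. Whether a subsample is large enough depends on the number of components and dimensions.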

Numpy: 3-byte, 6-byte Types (aka Uint24, Uint48)

NumPy seems to lack built-in support for 3-byte and 6-byte types, aka uint24 and uint48. I have a l…
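A minimal sketch, assuming little-endian records packed back to back with no padding: widen each 3-byte record into a 4-byte slot (or each 6-byte record into an 8-byte slot) by adding zero high bytes, then reinterpret the bytes with `.view`. The helper names `unpack_uint` and `pack_uint` are hypothetical.

```python
import numpy as np


def _slot_size(nbytes: int) -> int:
    # Smallest native unsigned width that can hold an nbytes-wide value.
    if nbytes not in (3, 6):
        raise ValueError("this sketch only handles 3-byte and 6-byte records")
    return 4 if nbytes == 3 else 8


def unpack_uint(buf: bytes, nbytes: int) -> np.ndarray:
    """Decode packed little-endian nbytes-wide unsigned ints."""
    slot = _slot_size(nbytes)
    raw = np.frombuffer(buf, dtype=np.uint8)
    if raw.size % nbytes:
        raise ValueError("buffer length is not a multiple of the record size")
    records = raw.reshape(-1, nbytes)
    # Copy each record into a zeroed, wider slot; the extra high bytes stay 0.
    padded = np.zeros((records.shape[0], slot), dtype=np.uint8)
    padded[:, :nbytes] = records
    return padded.view(f"<u{slot}").ravel()


def pack_uint(values, nbytes: int) -> bytes:
    """Encode unsigned ints as packed little-endian nbytes-wide records."""
    slot = _slot_size(nbytes)
    arr = np.ascontiguousarray(values, dtype=f"<u{slot}").ravel()
    if np.any(arr > (1 << (8 * nbytes)) - 1):
        raise OverflowError(f"value does not fit in {8 * nbytes} bits")
    # View each value as its little-endian bytes and drop the high padding bytes.
    return arr.view(np.uint8).reshape(-1, slot)[:, :nbytes].tobytes()
```

The decoded arrays use 33% (uint24) or 25% (uint48) more memory than the packed form, so this suits workloads where the data is stored packed but processed in chunks.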

Split 10 Billion Line File Into 5,000 Files By Column Value In Perl Or Python

I have a 10-billion-line tab-delimited file that I want to split into 5,000 sub-files based on a c…
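Keeping 5,000 output files open at once can exceed the per-process open-file limit on many systems (on Linux it is often 1024 by default). A minimal sketch, assuming the key column's values are safe to use as filenames and the output directory starts empty: stream the input once and keep a bounded, least-recently-used set of open handles, reopening evicted files in append mode. The function name `split_by_column` and its parameters are hypothetical.

```python
import os
from collections import OrderedDict


def split_by_column(in_path, out_dir, col, max_open=256, sep="\t"):
    """Route each line of in_path to out_dir/<value of column col>.tsv."""
    handles = OrderedDict()  # key -> open file, ordered least to most recently used
    try:
        with open(in_path, "r", encoding="utf-8", newline="") as src:
            for line in src:
                # Split only as far as needed; strip the newline if col is the last field.
                key = line.split(sep, col + 1)[col].rstrip("\r\n")
                fh = handles.get(key)
                if fh is None:
                    if len(handles) >= max_open:
                        # Close the least recently used handle to stay under the limit.
                        _, oldest = handles.popitem(last=False)
                        oldest.close()
                    path = os.path.join(out_dir, f"{key}.tsv")
                    # Append mode, so a file evicted earlier is continued, not truncated.
                    fh = open(path, "a", encoding="utf-8", newline="")
                    handles[key] = fh
                else:
                    handles.move_to_end(key)
                fh.write(line)
    finally:
        for fh in handles.values():
            fh.close()
```

Because files are opened in append mode, rerunning the script into a non-empty directory duplicates lines. If the input happens to be sorted or clustered by the key column, far fewer evictions occur.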

How To Incrementally Create A Sparse Matrix In Python?

I am creating a co-occurrence matrix of 1M by 1M integers. After the matrix i…
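One standard way to build a sparse matrix incrementally with SciPy is to collect `(row, col)` coordinates in compact buffers and construct a COO matrix once at the end; converting COO to CSR sums duplicate entries, which is exactly the counting a co-occurrence matrix needs. This is a sketch under the assumption that indices fit in a 32-bit int and that the event stream is `pairs`; the name `build_cooccurrence` is hypothetical.

```python
from array import array

import numpy as np
from scipy.sparse import coo_matrix


def build_cooccurrence(pairs, n):
    """Accumulate (i, j) co-occurrence events into an n x n CSR count matrix."""
    rows = array("i")  # compact C int buffers instead of Python lists
    cols = array("i")
    for i, j in pairs:
        rows.append(i)
        cols.append(j)
    data = np.ones(len(rows), dtype=np.int32)
    # COO tolerates repeated (i, j) entries; tocsr() sums them into counts.
    return coo_matrix(
        (data, (np.asarray(rows, dtype=np.int32), np.asarray(cols, dtype=np.int32))),
        shape=(n, n),
    ).tocsr()
```

If the event stream itself is too long to buffer, the same function can be applied per batch and the resulting CSR matrices added together, since sparse addition also sums matching entries.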

Python Pandas Error While Removing Extra White Space

I am trying to remove extra white space from a column in a data frame using a command. The data frame has …
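The teaser does not show the error, so this is only a guess: a frequent cause is that the `.str` accessor fails on columns that are not string-typed (for example, all-numeric columns). A minimal sketch, assuming the goal is to trim leading/trailing whitespace and collapse internal runs to a single space, converts the column to pandas' `"string"` dtype first. The helper name `clean_whitespace` is hypothetical.

```python
import pandas as pd


def clean_whitespace(df, column):
    """Return a copy of df with whitespace normalized in one column."""
    out = df.copy()
    out[column] = (
        out[column]
        .astype("string")  # non-string cells become strings; missing values stay <NA>
        .str.replace(r"\s+", " ", regex=True)  # collapse runs of whitespace
        .str.strip()  # trim leading and trailing whitespace
    )
    return out
```

Converting to `"string"` also turns numbers into text, which may not be wanted; in that case restrict the cleanup to rows where the value is actually a `str`.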

Correct Way Of Writing Two Floats Into A Regular Txt File

I am running a big job in cluster mode. However, I am only interested in two float numbers, which…
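For the file format itself, writing each float with `repr` is enough: in Python 3, `repr(float)` produces the shortest string that parses back to the identical value. One caveat, stated as an assumption about the setup: in Spark cluster mode the driver runs on a cluster node, so a plain `open()` path refers to that node's local disk; writing to shared storage, for example with `sc.parallelize([line], 1).saveAsTextFile(path)`, is a common alternative. The helper names below are hypothetical.

```python
def write_two_floats(path, a, b):
    """Write two floats on one tab-separated line, losslessly."""
    with open(path, "w", encoding="utf-8") as f:
        # repr() gives the shortest round-tripping representation of a float.
        f.write(f"{a!r}\t{b!r}\n")


def read_two_floats(path):
    """Read back the two floats written by write_two_floats."""
    with open(path, encoding="utf-8") as f:
        # float() ignores the trailing newline on the second field.
        a, b = (float(x) for x in f.readline().split("\t"))
    return a, b
```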

How Can I Reduce A Key-Value Pair To A Key And A List Of Values?

Let us assume I have key-value pairs in Spark, such as the following: [ (Key1, Value1), (Key1, Va…
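In PySpark the direct idiom is `rdd.groupByKey().mapValues(list)`; doing it with `reduceByKey` means wrapping each value in a one-element list and merging lists, as in `rdd.map(lambda kv: (kv[0], [kv[1]])).reduceByKey(lambda a, b: a + b)`. To show those semantics without a cluster, the sketch below uses a hypothetical local stand-in, `reduce_by_key`, that folds values per key the way `reduceByKey` does. In real Spark the order of values within each list is not guaranteed.

```python
def reduce_by_key(pairs, func):
    """Local stand-in for Spark's reduceByKey: fold each key's values with func."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


pairs = [("Key1", "Value1"), ("Key1", "Value2"), ("Key2", "Value3")]

# Wrap each value in a one-element list so that list concatenation is the merge step.
singletons = ((key, [value]) for key, value in pairs)
grouped = reduce_by_key(singletons, lambda a, b: a + b)
```

Repeated list concatenation copies data on every merge, which is why `groupByKey` (or `aggregateByKey` with an in-place append) is usually preferred when the goal is simply to collect all values.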