
Reducing File Size Of Scatter Plot

I am currently trying to reduce the file size of a scatter plot. My code looks like:

plt.scatter(a1,b1)
plt.savefig('test.ps')

where a1,b1 are arrays of size 400,000 or so, and it

Solution 1:

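You could consider a binned plot such as hexbin. Because it draws one hexagon per bin rather than one marker per point, the saved file is much smaller. I particularly like this when you have a dense collection of points, since it shows more clearly where your data is concentrated. For example: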

import numpy as np
import matplotlib.pylab as pl

x = np.random.normal(size=40000)
y = np.random.normal(size=40000)

pl.figure()

pl.subplot(121)
pl.scatter(x, y)
pl.xlim(-4,4)
pl.ylim(-4,4)

pl.subplot(122)
pl.hexbin(x, y, gridsize=40)
pl.xlim(-4,4)
pl.ylim(-4,4)

[Figure: the same data drawn with scatter (left) and hexbin (right)]

From the left figure alone, I would have concluded that the points are spread roughly evenly over the region where x and y lie between -3 and 3, which the hexbin plot on the right shows is clearly not the case.

(http://matplotlib.org/examples/pylab_examples/hexbin_demo.html)
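To check the effect on file size for your own data, you can save both versions and compare them. The following is only a sketch: the mock data, the temporary output directory, and the file names are assumptions for illustration, and the actual sizes will depend on your data and matplotlib version.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; mock data only
x = rng.normal(size=40000)
y = rng.normal(size=40000)

out_dir = tempfile.mkdtemp()  # hypothetical output location
sizes = {}

for kind in ("scatter", "hexbin"):
    fig, ax = plt.subplots()
    if kind == "scatter":
        ax.scatter(x, y)              # one marker per data point
    else:
        ax.hexbin(x, y, gridsize=40)  # one hexagon per bin
    ax.set_xlim(-4, 4)
    ax.set_ylim(-4, 4)
    path = os.path.join(out_dir, f"{kind}.ps")
    fig.savefig(path)
    plt.close(fig)
    sizes[kind] = os.path.getsize(path)

# Report the size of each saved file
for kind, size in sizes.items():
    print(f"{kind}: {size / 1024:.0f} KiB")
```

The hexbin file size is set by the number of bins (gridsize), not by the number of points, so it should stay roughly constant as the data set grows.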

Solution 2:

One approach is to use plot instead of scatter (you can still produce a scatter plot with plot by passing the 'o' format string) and set the rasterized keyword argument, like so:

import numpy as np 
import matplotlib.pyplot as plt

a1,b1 = np.random.randn(400000,2).T #mock data of similar size to yours
plt.plot(a1,b1,'o',rasterized=True)
plt.savefig("test.ps")

This should significantly reduce the size of the output file. The text and line art remain vector; only the points are rasterized, so it is a nice compromise.
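If you would rather keep calling scatter, it accepts the rasterized keyword as well, and the dpi argument of savefig sets the resolution of the rasterized layer. A minimal sketch, assuming mock data smaller than yours and a temporary output path chosen for illustration:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; mock data only
a1, b1 = rng.standard_normal((2, 50_000))

fig, ax = plt.subplots()
points = ax.scatter(a1, b1, s=4, rasterized=True)  # only this artist is rasterized
ax.set_xlabel("a1")  # labels, ticks and axes stay vector
ax.set_ylabel("b1")

out_path = os.path.join(tempfile.mkdtemp(), "test_rasterized.ps")  # hypothetical path
fig.savefig(out_path, dpi=150)  # dpi sets the resolution of the rasterized points
plt.close(fig)

size_bytes = os.path.getsize(out_path)
print(f"{out_path}: {size_bytes / 1024:.0f} KiB")
```

A higher dpi gives sharper points but a larger embedded image, so it is worth checking the resulting size for a couple of dpi values before settling on one.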

Depending on what you're looking to achieve, however, it might be better to histogram your data and plot that instead (e.g. pyplot.hist2d or pyplot.hexbin).
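As a rough illustration of the histogram route, the sketch below bins the points with hist2d before saving. The bin count, the plotted range, the mock data, and the output path are all assumptions to adjust for your data.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; mock data only
a1, b1 = rng.standard_normal((2, 400_000))

fig, ax = plt.subplots()
# 100 x 100 bins over a fixed range; points outside the range are not counted
counts, xedges, yedges, image = ax.hist2d(
    a1, b1, bins=100, range=[[-4, 4], [-4, 4]]
)
fig.colorbar(image, ax=ax, label="points per bin")

out_path = os.path.join(tempfile.mkdtemp(), "test_hist2d.ps")  # hypothetical path
fig.savefig(out_path)
plt.close(fig)

print(f"{out_path}: {os.path.getsize(out_path) / 1024:.0f} KiB")
```

Here the file holds one cell per bin, so its size depends on the bins argument rather than on how many points you have.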
