Reducing File Size Of Scatter Plot
Solution 1:
You could consider using hexbin instead. I particularly like this when you have a dense collection of points, since it better indicates where your data is concentrated. For example:
import numpy as np
import matplotlib.pylab as pl
x = np.random.normal(size=40000)
y = np.random.normal(size=40000)
pl.figure()
pl.subplot(121)
pl.scatter(x, y)
pl.xlim(-4,4)
pl.ylim(-4,4)
pl.subplot(122)
pl.hexbin(x, y, gridsize=40)
pl.xlim(-4,4)
pl.ylim(-4,4)
From the left figure, I would have concluded that the points are spread roughly evenly for x and y between -3 and 3, which clearly is not the case: the hexbin plot on the right shows them concentrated near the origin.
(http://matplotlib.org/examples/pylab_examples/hexbin_demo.html)
Solution 2:
One approach is to use plot instead of scatter (you can still produce scatter plots with plot by passing the 'o' format argument), and to set the rasterized keyword argument, like so:
import numpy as np
import matplotlib.pyplot as plt
a1, b1 = np.random.randn(400000, 2).T  # mock data of similar size to yours
plt.plot(a1,b1,'o',rasterized=True)
plt.savefig("test.ps")
This should significantly reduce the size of the output file. The text and line art remain vector graphics; only the points are rasterized, so it is a nice compromise.
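If you want to check the effect on your own data, here is a minimal sketch that saves the same plot twice into in-memory buffers and compares the byte counts. The helper name saved_size, the point count, the PDF format and the dpi value are my own choices for illustration, not anything from the question. The actual sizes depend on the backend, the dpi and the number of points, so treat the printed numbers as a measurement for your setup rather than a guarantee.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def saved_size(rasterized, n_points=50000, fmt="pdf"):
    """Return the size in bytes of a saved point plot (hypothetical helper)."""
    rng = np.random.default_rng(0)  # fixed seed so both runs plot the same data
    x, y = rng.standard_normal((2, n_points))
    fig, ax = plt.subplots()
    # 'o' draws markers only; rasterized applies to the markers, not the axes
    ax.plot(x, y, "o", rasterized=rasterized)
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt, dpi=150)  # dpi sets the raster resolution
    plt.close(fig)
    return buf.getbuffer().nbytes


vector_size = saved_size(rasterized=False)
raster_size = saved_size(rasterized=True)
print("vector:", vector_size, "bytes")
print("rasterized:", raster_size, "bytes")
```

Raising dpi makes the rasterized points sharper but also makes the file larger, so it is worth trying a couple of values.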
Depending on what you're looking to achieve, however, it might be better to histogram your data and plot that instead (e.g. pyplot.hist2d or pyplot.hexbin).
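As a rough sketch of the histogram route, the snippet below bins mock data with hist2d and saves the result. The bin count, axis range, colorbar label and output format are assumptions picked for illustration. Because only the binned grid is drawn, the output size depends on the number of bins rather than on the number of points.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n_points = 40000  # mock data; substitute your own arrays
x = rng.standard_normal(n_points)
y = rng.standard_normal(n_points)

fig, ax = plt.subplots()
# hist2d returns the bin counts, the bin edges along each axis, and the drawn image
counts, xedges, yedges, image = ax.hist2d(
    x, y, bins=40, range=[[-4, 4], [-4, 4]]
)
fig.colorbar(image, ax=ax, label="points per bin")

buf = io.BytesIO()  # use a filename here instead to write to disk
fig.savefig(buf, format="pdf")
plt.close(fig)
```

Points that fall outside the given range are simply not counted, so widen the range if your data has long tails.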