Reducing File Size Of Scatter Plot

February 28, 2024

I am currently trying to reduce the file size of a scatter plot. My code looks like:

plt.scatter(a1, b1)
plt.savefig('test.ps')

where a1, b1 are arrays of size 400,000 or so, and it

Solution 1:

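Consider using hexbin instead. I particularly like it when you have a dense collection of points, because it shows where your data is concentrated far better than overlapping markers do. It also helps with file size, since the output contains one polygon per bin rather than one marker per point. For example: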

import numpy as np
import matplotlib.pylab as pl

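# 40,000 normally distributed points
x = np.random.normal(size=40000)
y = np.random.normal(size=40000)

pl.figure()

# Left panel: plain scatter plot, one marker per point
pl.subplot(121)
pl.scatter(x, y)
pl.xlim(-4, 4)
pl.ylim(-4, 4)

# Right panel: hexagonal binning, color encodes the number of points per bin
pl.subplot(122)
pl.hexbin(x, y, gridsize=40)
pl.xlim(-4, 4)
pl.ylim(-4, 4)

pl.show()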

From the left panel alone, I would have concluded that the points are spread roughly evenly over the region -3 < x, y < 3, which is clearly not the case: the right panel shows that they are concentrated near the origin.

(http://matplotlib.org/examples/pylab_examples/hexbin_demo.html)
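To tie this back to the file-size question, here is a minimal sketch that saves a hexbin plot of 400,000 mock points and counts how many hexagons end up in the figure. The mock data, the temporary output path, and the choice of mincnt=1 (which skips empty bins) are assumptions for illustration, not part of the original answer.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Mock data of roughly the size mentioned in the question (an assumption)
rng = np.random.default_rng(0)
n_points = 400_000
x, y = rng.standard_normal((2, n_points))

fig, ax = plt.subplots()
# mincnt=1 drops hexagons that contain no points at all
hb = ax.hexbin(x, y, gridsize=40, mincnt=1)
fig.colorbar(hb, ax=ax, label="points per hexagon")

# One value per drawn hexagon: the number of points that fell into it
counts = hb.get_array()
n_hexagons = len(counts)

# Hypothetical output location; any writable path works
out_path = os.path.join(tempfile.mkdtemp(), "test_hexbin.ps")
fig.savefig(out_path)
plt.close(fig)

print(f"{n_points} points drawn as {n_hexagons} hexagons")
print(f"file size: {os.path.getsize(out_path)} bytes")
```

The number of hexagons is set by gridsize, not by the number of points, so the saved file should stay small even if the data set grows.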

Solution 2:

One approach is to use plot instead of scatter (plot still produces a scatter plot if you pass the 'o' format string) together with the rasterized keyword argument, like so:

import numpy as np 
import matplotlib.pyplot as plt

a1, b1 = np.random.randn(400000, 2).T  # mock data of similar size to yours
plt.plot(a1, b1, 'o', rasterized=True)
plt.savefig("test.ps")

This should significantly reduce the size of the output file. The text and line art remain vector graphics; only the points are rasterized, so it is a nice compromise.
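How much you save depends on the number of points, the output format, and the resolution of the rasterized layer, so it is worth measuring on your own data. The sketch below saves the same plot with and without rasterized=True and prints both file sizes. The helper name saved_size, the reduced point count, the temporary paths, and the dpi value are assumptions for illustration; as far as I know, the dpi passed to savefig sets the resolution of the rasterized points.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Smaller mock data set than in the question, to keep the sketch quick
rng = np.random.default_rng(0)
a1, b1 = rng.standard_normal((2, 50_000))


def saved_size(rasterized, dpi=150):
    """Save the plot to a temporary .ps file and return its size in bytes.

    Hypothetical helper, written for this comparison only.
    """
    fig, ax = plt.subplots()
    ax.plot(a1, b1, 'o', markersize=2, rasterized=rasterized)
    path = os.path.join(tempfile.mkdtemp(), "test.ps")
    fig.savefig(path, dpi=dpi)  # dpi is assumed to control the raster layer
    plt.close(fig)
    return os.path.getsize(path)


vector_bytes = saved_size(rasterized=False)
raster_bytes = saved_size(rasterized=True)

print(f"vector points:     {vector_bytes} bytes")
print(f"rasterized points: {raster_bytes} bytes")
```

If the rasterized file turns out larger than you would like, lowering dpi is the first thing to try, at the cost of coarser-looking markers.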

Depending on what you're looking to achieve, however, it might be better to histogram your data and plot that instead (e.g. pyplot.hist2d or pyplot.hexbin).
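A minimal sketch of the histogram route with hist2d, assuming mock data, 100 bins per axis, and a temporary output path (none of which come from the original answer):

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Mock data of similar size to the question (an assumption)
rng = np.random.default_rng(0)
a1, b1 = rng.standard_normal((2, 400_000))

fig, ax = plt.subplots()
# hist2d returns the bin counts, the bin edges on each axis, and the drawn mesh
counts, xedges, yedges, mesh = ax.hist2d(a1, b1, bins=100)
fig.colorbar(mesh, ax=ax, label="points per bin")

# Hypothetical output location; any writable path works
out_path = os.path.join(tempfile.mkdtemp(), "test_hist2d.ps")
fig.savefig(out_path)
plt.close(fig)

print(f"{counts.size} bins hold {int(counts.sum())} points")
print(f"file size: {os.path.getsize(out_path)} bytes")
```

The file now stores one cell per bin rather than one marker per point, so its size is governed by bins rather than by the length of a1 and b1.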
