Counting Non-zero Elements Within Each Row And Within Each Column Of A 2d Numpy Array

February 28, 2024

I have a NumPy matrix that contains mostly non-zero values, but occasionally will contain a zero value. I need to be able to: Count the non-zero values in each row and put that c

Solution 1:

import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

columns = (a != 0).sum(0)
rows    = (a != 0).sum(1)

The expression (a != 0) evaluates to a boolean array with the same shape as the original a; it contains True wherever the element is non-zero.

The .sum(x) method sums the elements along axis x. Because True counts as 1 and False as 0, the sum of a boolean array is the number of True elements.

The variables columns and rows contain the number of non-zero (element != 0) values in each column/row of your original array:

columns = np.array([2, 1, 3])
rows    = np.array([2, 3, 1])
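As an alternative sketch, np.count_nonzero also accepts an axis argument (in NumPy 1.12 and later, as far as I know) and should give the same counts as the boolean sum above. The array is the same example a used earlier:

```python
import numpy as np

a = np.array([[1, 0, 1],
              [2, 3, 4],
              [0, 0, 7]])

# axis=0 collapses the rows, giving one count per column
columns = np.count_nonzero(a, axis=0)
# axis=1 collapses the columns, giving one count per row
rows = np.count_nonzero(a, axis=1)

print(columns)  # [2 1 3]
print(rows)     # [2 3 1]
```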

EDIT: The whole code could look like this (with a few simplifications in your original code):

from numpy import zeros, loadtxt

ANOVAInputMatrixValuesArray = zeros([len(TestIDs), 9], float)
for j, TestID in enumerate(TestIDs):
    ReadOrWrite = 'Read'
    fileName = inputFileName
    directory = GetCurrentDirectory(...)  # arguments that return correct directory
    # use directory or filename to get the CSV file?
    with open(directory, 'r') as csvfile:
        # read column 2 of the file and keep its first 9 values as row j
        ANOVAInputMatrixValuesArray[j,:] = loadtxt(csvfile, comments='TestId', delimiter=';', usecols=(2,))[:9]

nonZeroCols = (ANOVAInputMatrixValuesArray != 0).sum(0)
nonZeroRows = (ANOVAInputMatrixValuesArray != 0).sum(1)

EDIT 2:

To get the mean of the non-zero values in each column/row, use the following:

colMean = a.sum(0) / (a != 0).sum(0)
rowMean = a.sum(1) / (a != 0).sum(1)

What do you want to do if there are no non-zero elements in a column/row? Then we can adapt the code to solve such a problem.
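One possible way to handle that case, as a sketch only: fill the result with NaN and divide only where the count is positive. The choice of NaN as the "no data" marker is my assumption, not something the question specifies, and the example array here is made up so that one column is entirely zero.

```python
import numpy as np

a = np.array([[1, 0, 0],
              [2, 0, 4],
              [0, 0, 7]], dtype=float)

col_counts = (a != 0).sum(0)
row_counts = (a != 0).sum(1)

# Start from NaN and only divide where the count is non-zero,
# so an all-zero column/row stays NaN instead of raising a warning.
col_mean = np.full(a.shape[1], np.nan)
np.divide(a.sum(0), col_counts, out=col_mean, where=col_counts > 0)

row_mean = np.full(a.shape[0], np.nan)
np.divide(a.sum(1), row_counts, out=row_mean, where=row_counts > 0)

print(col_mean)  # middle entry is nan because column 1 has no non-zero values
print(row_mean)
```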

Solution 2:

A fast way to count nonzero elements per row in a scipy sparse matrix m is:

np.diff(m.tocsr().indptr)

The indptr attribute of a CSR matrix indicates the indices within the data corresponding to the boundaries between rows. So calculating the difference between each entry will provide the number of non-zero elements in each row.
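To make the indptr layout concrete, here is a small sketch on a made-up 3x3 matrix whose middle row is empty:

```python
import numpy as np
from scipy.sparse import csr_matrix

m = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 0],
                         [3, 4, 5]]))

# m.data[indptr[k]:indptr[k+1]] holds the stored values of row k
print(m.indptr)                   # [0 2 2 5]

row_nonzeros = np.diff(m.indptr)  # length of each row's slice
print(row_nonzeros)               # [2 0 3]
```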

Similarly, for the number of nonzero elements in each column, use:

np.diff(m.tocsc().indptr)

If the data is already in the appropriate form, these will run in O(m.shape[0]) and O(m.shape[1]) respectively, rather than O(m.getnnz()) in Marat and Finn's solutions.

If you need both row and column nonzero counts, and, say, m is already a CSR, you might use:

row_nonzeros = np.diff(m.indptr)
col_nonzeros = np.bincount(m.indices, minlength=m.shape[1])  # minlength keeps trailing empty columns

which is not asymptotically faster than first converting to CSC (which is O(m.getnnz())) to get col_nonzeros, but is faster because of implementation details.
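One caveat worth checking in your own data: indptr and indices describe stored entries, so an explicitly stored zero would be counted as well. The sketch below builds such a matrix by hand (the values are made up) and uses eliminate_zeros() to drop stored zeros before counting:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 2x3 CSR matrix built by hand, with an explicitly stored zero at (0, 1).
data = np.array([1, 0, 2])
indices = np.array([0, 1, 2])
indptr = np.array([0, 2, 3])
m = csr_matrix((data, indices, indptr), shape=(2, 3))

stored_per_row = np.diff(m.indptr)   # counts stored entries, zero included

m.eliminate_zeros()                  # removes stored zeros in place
nonzero_per_row = np.diff(m.indptr)
nonzero_per_col = np.bincount(m.indices, minlength=m.shape[1])

print(stored_per_row)   # [2 1]
print(nonzero_per_row)  # [1 1]
print(nonzero_per_col)  # [1 0 1]
```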

Solution 3:

A faster way is to clone your matrix with ones in place of the real values, then sum by rows or columns:

X_clone = X.tocsc().copy()  # copy so X is left untouched if it is already CSC
X_clone.data = np.ones( X_clone.data.shape )
NumNonZeroElementsByColumn = X_clone.sum(0)
NumNonZeroElementsByRow = X_clone.sum(1)

That was about 50 times faster for me than Finn Årup Nielsen's solution (1 second versus 53).

Edit: You may need to convert NumNonZeroElementsByColumn into a 1-dimensional array with:

np.array(NumNonZeroElementsByColumn)[0]
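A minimal end-to-end sketch of this approach on a made-up matrix. Using np.asarray(...).ravel() is my suggestion for flattening, since it works for both the (1, n) column result and the (n, 1) row result; the extra .copy() is a precaution so that X itself is not altered if it happens to be CSC already.

```python
import numpy as np
from scipy.sparse import csr_matrix

X = csr_matrix(np.array([[0, 1, 1],
                         [0, 1, 0]]))

X_clone = X.tocsc().copy()
# Replace every stored value with 1 so that sums become counts.
X_clone.data = np.ones(X_clone.data.shape, dtype=int)

by_column = np.asarray(X_clone.sum(0)).ravel()
by_row = np.asarray(X_clone.sum(1)).ravel()

print(by_column)  # [0 2 1]
print(by_row)     # [2 1]
```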

Solution 4:

For sparse matrices, use the getnnz() method supported by CSR/CSC matrices.

E.g.

import scipy.sparse

a = scipy.sparse.csr_matrix([[0, 1, 1], [0, 1, 0]])
a.getnnz(axis=0)

array([0, 2, 1])
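For completeness, a short sketch of the other axis options on the same matrix: axis=1 gives per-row counts, and calling getnnz() with no axis gives the total number of stored entries.

```python
import scipy.sparse

a = scipy.sparse.csr_matrix([[0, 1, 1], [0, 1, 0]])

per_column = a.getnnz(axis=0)  # one count per column
per_row = a.getnnz(axis=1)     # one count per row
total = a.getnnz()             # total number of stored entries

print(per_column)  # [0 2 1]
print(per_row)     # [2 1]
print(total)       # 3
```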

Solution 5:

(a != 0) does not work for sparse matrices (scipy.sparse.lil_matrix) in my present version of scipy.

For sparse matrices I did:

    (i,j) = X.nonzero()
    column_sums = np.zeros(X.shape[1])
    for n in np.asarray(j).ravel():
        column_sums[n] += 1.

I wonder if there is a more elegant way.
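One possibly more compact variant, offered as a sketch: since X.nonzero() already returns the row and column index of every non-zero entry, np.bincount can tally those indices without an explicit loop. The small lil_matrix below is made up for illustration.

```python
import numpy as np
from scipy.sparse import lil_matrix

X = lil_matrix((3, 4))
X[0, 1] = 5
X[2, 1] = 3
X[1, 0] = 1

i, j = X.nonzero()  # row indices, column indices of non-zero entries

# minlength pads the result so empty trailing rows/columns still appear
column_counts = np.bincount(j, minlength=X.shape[1])
row_counts = np.bincount(i, minlength=X.shape[0])

print(column_counts)  # [1 2 0 0]
print(row_counts)     # [1 1 1]
```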
