
Pandas: Efficient Way To Combine Dataframes

I'm looking for a more efficient way than pd.concat to combine two pandas DataFrames. I have a large DataFrame (~7GB in size) with the following columns - 'A', 'B', 'C', 'D'. I wan

Solution 1:

Solved

So Niels Henkens' comment really helped, and the solution is simply to group by "A" and "B" and aggregate "C" and "D" in one step:

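# String aggregator names need no NumPy import; newer pandas versions warn when np.mean/np.sum are passed.
result = in_df.groupby(by=["A", "B"]).agg({"C": "mean", "D": "sum"})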
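To make the one-liner concrete, here is a minimal sketch on a toy frame. The sample values are invented for illustration, and the string aggregator names "mean" and "sum" stand in for the NumPy callables; only the column names come from the question.

```python
import pandas as pd

# Toy stand-in for the large DataFrame; the values are made up for illustration.
in_df = pd.DataFrame(
    {
        "A": ["x", "x", "y", "y"],
        "B": [1, 1, 2, 2],
        "C": [10.0, 20.0, 30.0, 50.0],
        "D": [1, 2, 3, 4],
    }
)

# One grouped aggregation: mean of C and sum of D per (A, B) pair.
result = in_df.groupby(by=["A", "B"]).agg({"C": "mean", "D": "sum"})

# reset_index() turns the (A, B) group keys back into ordinary columns.
flat = result.reset_index()
print(flat)
```

Because both aggregates come out of a single groupby call, there are no intermediate per-column frames to stitch together afterwards.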

Performance can be improved further with Dask, which splits the file into partitions and processes them in parallel:

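import dask.dataframe as dd
# PATH_TO_FILE and DELIMITER are placeholders for your own file path and separator.
df = dd.read_csv(PATH_TO_FILE, delimiter=DELIMITER)
# The aggregation runs only when compute() is called; the result is a pandas DataFrame.
g = df.groupby(by=["A", "B"]).agg({"C": "mean", "D": "sum"}).compute().reset_index()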
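If Dask is not available, the same partition-then-combine idea can be sketched in plain pandas by reading the file in chunks. This is an illustration of the general approach under stated assumptions, not Dask's actual implementation: the in-memory CSV text, the chunk size, and the helper column names C_sum and C_count are all hypothetical. The key point is that a mean cannot be merged across chunks directly, so each chunk keeps a sum and a count instead.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the large file on disk.
csv_text = "A,B,C,D\nx,1,10.0,1\nx,1,20.0,2\ny,2,30.0,3\ny,2,50.0,4\nx,1,30.0,5\n"

partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Per-chunk partial results: keep the sum and count of C so that
    # the overall mean can be derived after all chunks are merged.
    partials.append(
        chunk.groupby(["A", "B"]).agg(
            C_sum=("C", "sum"), C_count=("C", "count"), D=("D", "sum")
        )
    )

# Merge the small partial results by summing per key, then derive the mean.
combined = pd.concat(partials).groupby(level=["A", "B"]).sum()
combined["C"] = combined["C_sum"] / combined["C_count"]
g = combined[["C", "D"]].reset_index()
print(g)
```

Here pd.concat only ever touches the small per-chunk summaries, never the full data, so memory use stays bounded by the chunk size plus the number of distinct (A, B) groups.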
