Pandas: Efficient Way To Combine Dataframes
I'm looking for a more efficient way than pd.concat to combine two pandas DataFrames. I have a large DataFrame (~7GB in size) with the following columns - 'A', 'B', 'C', 'D'. I wan
Solution 1:
Solved
Niels Henkens' comment really helped. The solution is simply to group by 'A' and 'B' and aggregate the other two columns:
result = in_df.groupby(by=["A", "B"]).agg({"C": "mean", "D": "sum"})
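As a small illustration of what this call produces, here is a sketch on a toy frame. The values are made up and `in_df` only stands in for the real ~7GB DataFrame; the aggregations are passed by name ("mean", "sum").

```python
import pandas as pd

# Toy stand-in for the real data (values are made up for illustration).
in_df = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "B": [1, 1, 2, 2],
    "C": [10.0, 20.0, 30.0, 50.0],
    "D": [1, 2, 3, 4],
})

# One row per (A, B) pair: mean of C, sum of D.
# reset_index() turns the group keys back into ordinary columns.
result = in_df.groupby(by=["A", "B"]).agg({"C": "mean", "D": "sum"}).reset_index()
print(result)
```

Each distinct (A, B) pair collapses to a single row, so no separate concat step is needed.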
Performance can be improved further by using Dask, which reads and aggregates the file in partitions:
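import dask.dataframe as dd

# Lazily read the CSV in partitions; PATH_TO_FILE and DELIMITER are placeholders.
df = dd.read_csv(PATH_TO_FILE, delimiter=DELIMITER)
# compute() runs the task graph and returns a regular pandas DataFrame.
g = df.groupby(by=["A", "B"]).agg({"C": "mean", "D": "sum"}).compute().reset_index()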
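If Dask is not an option, a similar out-of-core aggregation can be sketched in plain pandas by reading the file in chunks. This is not the method from the answer above, only a rough illustration of the partition-then-combine idea. A mean cannot be averaged across chunks directly, so the sketch accumulates per-group sums and counts and divides at the end. The function name `chunked_groupby`, the `chunksize` value, and the in-memory CSV are all hypothetical.

```python
import io
import pandas as pd

def chunked_groupby(source, chunksize=100_000, sep=","):
    """Mean of C and sum of D per (A, B), reading the CSV in chunks."""
    partials = []
    for chunk in pd.read_csv(source, sep=sep, chunksize=chunksize):
        # Per-chunk partial results: sum and count of C, sum of D.
        partials.append(
            chunk.groupby(["A", "B"]).agg(
                C_sum=("C", "sum"), C_count=("C", "count"), D=("D", "sum")
            )
        )
    # Sums and counts are additive, so partials can be summed per group.
    total = pd.concat(partials).groupby(level=["A", "B"]).sum()
    total["C"] = total["C_sum"] / total["C_count"]
    return total[["C", "D"]].reset_index()

# Tiny in-memory CSV standing in for the real file (made-up values).
csv_text = "A,B,C,D\nx,1,10,1\nx,1,20,2\ny,2,30,3\nx,1,30,4\n"
out = chunked_groupby(io.StringIO(csv_text), chunksize=2)
print(out)
```

Only one chunk plus the small per-group partial results are held in memory at a time, which is the main reason this scales to files larger than RAM.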