
Count Distinct Strings In Rolling Window Using Pandas + Python (with A Condition)

I want to count the distinct port numbers that appear in a sliding window covering the current row and the 5 previous rows, considering only rows where the same address appears. For ins

Solution 1:

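# Assumes df has a default integer index (0, 1, 2, ...) and ADDRESS and PORT columns.
df['uniques'] = 0
for index, row in df.iterrows():
    # Current row plus up to 5 previous rows, clipped at the start of the frame.
    window = df.iloc[max(index - 5, 0):index + 1]
    # Count distinct ports only among rows sharing the current row's address.
    df.loc[index, 'uniques'] = window.loc[window['ADDRESS'] == row['ADDRESS'], 'PORT'].nunique()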

Here's my quick shot at it.

Solution 2:

OK, it seems your input data doesn't match the DataFrame you showed us.

df.groupby('ADDRESS').PORT.apply(lambda x : pd.Series(x).rolling(5,min_periods=1).apply(lambda y: len(set(y))))
Out[844]: 
0    1.0
1    1.0
2    1.0
3    2.0
4    2.0
5    2.0
Name: PORT, dtype: float64
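
For anyone who wants to try the grouped rolling count end to end, here is a minimal sketch under a few assumptions. The sample data, the PORT_CODE and UNIQUES column names, and the distinct_in_window helper are all made up for illustration; only ADDRESS and PORT come from the answer above. As far as I know, pandas rolling windows only work on numeric data, so if your ports are stored as strings the sketch first maps each distinct string to an integer code with pd.factorize.

```python
import pandas as pd

# Hypothetical sample data; only the ADDRESS and PORT column names come from the answer.
df = pd.DataFrame({
    "ADDRESS": ["a", "a", "a", "a", "a", "a", "b", "b"],
    "PORT": ["80", "80", "80", "443", "443", "443", "22", "22"],
})

# Map each distinct port string to an integer code so that rolling() can process it.
df["PORT_CODE"] = pd.factorize(df["PORT"])[0]


def distinct_in_window(codes: pd.Series, window: int = 5) -> pd.Series:
    """Count distinct values in a trailing window that ends at each row."""
    # raw=True hands each window to the lambda as a NumPy array.
    return codes.rolling(window, min_periods=1).apply(lambda w: len(set(w)), raw=True)


# Apply the rolling count separately within each address group.
df["UNIQUES"] = (
    df.groupby("ADDRESS")["PORT_CODE"]
      .transform(distinct_in_window)
      .astype(int)
)

print(df[["ADDRESS", "PORT", "UNIQUES"]])
```

Note that window=5 here means the current row plus the four previous rows for the same address. If you want the current row plus five previous rows, passing window=6 through transform (for example .transform(distinct_in_window, window=6)) should do it.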
