Counting Customer Call Times: A Step-by-Step Guide Using Pandas in Python

Groupby and Count: How Many Times a Customer Was Reached Before a Specific Point in Time

Introduction

In this article, we will explore how to group data by certain columns and count the number of times a specific condition is met. We will use Python’s pandas library to achieve this.

The problem involves a DataFrame of call records with the columns unique_id, not_unique_id, date_of_call, and customer_reached. The goal is to create a new column, new, that holds, for each call, the number of earlier calls to the same not_unique_id in which the customer was reached. We will start by examining the provided example and then explain each step in detail.

Examining the Provided Example

The provided example shows a DataFrame with eight rows and four columns: unique_id, not_unique_id, date_of_call, and customer_reached. The DataFrame is sorted by not_unique_id and date_of_call in descending order, and the index is then reset.

import pandas as pd

rng = pd.date_range('2015-02-24', periods=8, freq='D')
df = pd.DataFrame(
    {
        "unique_id": ["K0", "K1", "K2", "K3", "K4", "K5", "K6","K7"],
        "not_unique_id": ["A000", "A111", "A222", "A222", "A222", "A222", "A222","A333"],
        "date_of_call": rng,
        "customer_reached": [1,0,0,1,1,1,1,1],
    }
)
df.sort_values(['not_unique_id','date_of_call'], inplace=True, ascending=False)
df.reset_index(drop=True, inplace=True) # reset index

print(df.head(10))

Output:

  unique_id not_unique_id date_of_call  customer_reached
0        K7          A333   2015-03-03                 1
1        K6          A222   2015-03-02                 1
2        K5          A222   2015-03-01                 1
3        K4          A222   2015-02-28                 1
4        K3          A222   2015-02-27                 1
5        K2          A222   2015-02-26                 0
6        K1          A111   2015-02-25                 0
7        K0          A000   2015-02-24                 1

Solution

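To create the new column, new, we combine grouping, shifting, and a cumulative sum. Reversing the rows with iloc[::-1] puts each group's calls in chronological order. Within each not_unique_id group, shift() moves every value down one row so that the current call is excluded, and cumsum() then adds up the earlier customer_reached values. The first call in each group has no predecessor and yields NaN, which fillna(0) turns into 0: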

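# Reverse the rows so each group is processed in chronological order,
# then count the successful calls that came before the current one.
# transform keeps the original index, so the result aligns with df.
df['new'] = (df.iloc[::-1]
             .groupby('not_unique_id')['customer_reached']
             .transform(lambda x: x.shift().cumsum())
             .fillna(0)
             .astype(int))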
print(df)

Output:

  unique_id not_unique_id date_of_call  customer_reached  new
0        K7          A333   2015-03-03                 1    0
1        K6          A222   2015-03-02                 1    3
2        K5          A222   2015-03-01                 1    2
3        K4          A222   2015-02-28                 1    1
4        K3          A222   2015-02-27                 1    0
5        K2          A222   2015-02-26                 0    0
6        K1          A111   2015-02-25                 0    0
7        K0          A000   2015-02-24                 1    0
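
To see why shifting before the cumulative sum gives the desired count, it may help to trace a single group by hand. The sketch below uses the customer_reached values of the A222 calls from the example, listed oldest first. The variable names are illustrative only and are not part of the original solution.

```python
import pandas as pd

# customer_reached for the A222 calls, oldest first (values from the example)
reached = pd.Series([0, 1, 1, 1, 1], index=["K2", "K3", "K4", "K5", "K6"])

shifted = reached.shift()    # NaN, 0, 1, 1, 1: each row now holds the previous call's value
running = shifted.cumsum()   # NaN, 0, 1, 2, 3: successful calls before the current one
prior_reached = running.fillna(0).astype(int)  # the first call has no history, so use 0

print(prior_reached.tolist())  # [0, 0, 1, 2, 3]
```

K3 gets 0 because the only earlier call, K2, did not reach the customer, while K6 gets 3 because K3, K4, and K5 all did.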

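Alternatively, we can achieve the same result by sorting the DataFrame by not_unique_id and date_of_call in ascending order before grouping, instead of reversing the rows:

# Sort by date within each ID so the running count follows call order.
df['new'] = (df.sort_values(['not_unique_id', 'date_of_call'])
             .groupby('not_unique_id')['customer_reached']
             .transform(lambda x: x.shift().cumsum())
             .fillna(0)
             .astype(int))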
print(df)

Output:

  unique_id not_unique_id date_of_call  customer_reached  new
0        K7          A333   2015-03-03                 1    0
1        K6          A222   2015-03-02                 1    3
2        K5          A222   2015-03-01                 1    2
3        K4          A222   2015-02-28                 1    1
4        K3          A222   2015-02-27                 1    0
5        K2          A222   2015-02-26                 0    0
6        K1          A111   2015-02-25                 0    0
7        K0          A000   2015-02-24                 1    0
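
A third variant, not taken from the original solution, avoids the lambda altogether. Because the count of earlier successful calls equals the inclusive running total minus the current row's own value, the built-in groupby cumsum can be used directly. This is a minimal sketch that assumes customer_reached contains no missing values; the names chrono, inclusive, and new_alt are hypothetical.

```python
import pandas as pd

rng = pd.date_range("2015-02-24", periods=8, freq="D")
df = pd.DataFrame(
    {
        "unique_id": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7"],
        "not_unique_id": ["A000", "A111", "A222", "A222", "A222", "A222", "A222", "A333"],
        "date_of_call": rng,
        "customer_reached": [1, 0, 0, 1, 1, 1, 1, 1],
    }
)
df = (df.sort_values(["not_unique_id", "date_of_call"], ascending=False)
        .reset_index(drop=True))

# Chronological order within each ID, so the running total follows call order
chrono = df.sort_values(["not_unique_id", "date_of_call"])

# Running total that includes the current call
inclusive = chrono.groupby("not_unique_id")["customer_reached"].cumsum()

# Remove the current call's own value; the assignment aligns on the index
df["new_alt"] = inclusive - chrono["customer_reached"]

print(df["new_alt"].tolist())  # [0, 3, 2, 1, 0, 0, 0, 0]
```

On larger frames this form may be faster than a per-group lambda, since it relies only on vectorized operations, though that has not been measured here.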

Conclusion

In this article, we demonstrated how to create a new column that counts, for each call, how many earlier calls to the same customer were successful. We used grouping, shifting, and a cumulative sum to achieve this result, taking care that the rows of each group are processed in chronological order. The same approach applies wherever you need a per-group running count of events that occurred before the current row.
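
As one way of reusing the pattern elsewhere, the steps could be wrapped in a small function. The helper below is a hypothetical sketch rather than part of the original solution: the function name, its parameters, and the sample orders data are all made up for illustration, and it assumes the frame has a unique index and a 0/1 flag column.

```python
import pandas as pd


def count_prior_events(frame, group_col, order_col, flag_col):
    """Hypothetical helper: per row, count earlier rows in the same group whose flag is 1."""
    # Put each group's rows in chronological order
    ordered = frame.sort_values([group_col, order_col])
    counts = (ordered.groupby(group_col)[flag_col]
                     .transform(lambda s: s.shift().cumsum())  # exclude the current row
                     .fillna(0)                                # first row of a group has no history
                     .astype(int))
    # Return the counts in the caller's row order (assumes a unique index)
    return counts.reindex(frame.index)


# Made-up sample data, unrelated to the call records in the article
orders = pd.DataFrame(
    {
        "shop": ["s1", "s2", "s1", "s1", "s2"],
        "day": [3, 1, 1, 2, 2],
        "returned": [1, 0, 1, 0, 1],
    }
)
orders["prior_returns"] = count_prior_events(orders, "shop", "day", "returned")
print(orders["prior_returns"].tolist())  # [1, 0, 0, 1, 0]
```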


Last modified on 2024-07-11