Counting Customer Call Times: A Step-by-Step Guide Using Pandas in Python

Groupby and Count: How Many Times a Customer Was Called at a Specific Point in Time

Introduction

In this article, we will explore how to group data by certain columns and count the number of times a specific condition is met. We will use Python’s pandas library to achieve this.

The problem involves a DataFrame of call records with four columns: unique_id (one value per call), not_unique_id (an identifier that repeats when the same customer is called more than once), date_of_call, and customer_reached (1 if the customer was reached, 0 otherwise). The goal is to create a new column, new, that holds, for each call, the number of earlier calls to the same customer in which the customer was reached. We will start by examining the provided example and then explain each step in detail.

Examining the Provided Example

The provided example shows a DataFrame with eight rows and four columns: unique_id, not_unique_id, date_of_call, and customer_reached. The DataFrame is sorted by not_unique_id and date_of_call in descending order, and the index is then reset so that it runs from 0 to 7.

import pandas as pd

# Eight consecutive daily call dates starting on 2015-02-24
rng = pd.date_range('2015-02-24', periods=8, freq='D')
df = pd.DataFrame(
    {
        "unique_id": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7"],
        "not_unique_id": ["A000", "A111", "A222", "A222", "A222", "A222", "A222", "A333"],
        "date_of_call": rng,
        "customer_reached": [1, 0, 0, 1, 1, 1, 1, 1],
    }
)
# Sort descending, so the newest call of each customer comes first
df.sort_values(['not_unique_id', 'date_of_call'], ascending=False, inplace=True)
df.reset_index(drop=True, inplace=True)  # renumber the rows 0..7 after sorting

print(df.head(10))

Output:

	unique_id	not_unique_id	date_of_call	customer_reached
0	K7	A333	2015-03-03	1
1	K6	A222	2015-03-02	1
2	K5	A222	2015-03-01	1
3	K4	A222	2015-02-28	1
4	K3	A222	2015-02-27	1
5	K2	A222	2015-02-26	0
6	K1	A111	2015-02-25	0
7	K0	A000	2015-02-24	1

Solution

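To create the new column, new, we can combine grouping, shifting, and a cumulative sum. Because the DataFrame is sorted newest-first, df.iloc[::-1] reverses it so that each customer's calls are processed in chronological order. Within each group, shift() moves customer_reached down by one row so that the current call is excluded, and cumsum() then adds up the earlier successful calls. The first call of each customer has no history, so its NaN is replaced with 0. Here is an example code snippet that demonstrates this approach: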

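# Reverse to chronological order, then count earlier successful calls per customer.
# transform() returns a result indexed like df, so the assignment aligns row by row
# (apply() in pandas 2.x adds the group key to the index, which breaks the assignment).
df['new'] = (df.iloc[::-1]
             .groupby('not_unique_id')['customer_reached']
             .transform(lambda x: x.shift().cumsum())
             .fillna(0)      # first call per customer has no earlier calls
             .astype(int))
print(df)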

Output:

	unique_id	not_unique_id	date_of_call	customer_reached	new
0	K7	A333	2015-03-03	1	0
1	K6	A222	2015-03-02	1	3
2	K5	A222	2015-03-01	1	2
3	K4	A222	2015-02-28	1	1
4	K3	A222	2015-02-27	1	0
5	K2	A222	2015-02-26	0	0
6	K1	A111	2015-02-25	0	0
7	K0	A000	2015-02-24	1	0
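To make the intermediate steps visible, the following minimal sketch applies shift() and cumsum() by hand to the five A222 calls from the example, listed in chronological order. The variable names (reached, shifted, running, steps) are illustrative and not part of the original solution.

```python
import pandas as pd

# customer_reached for the A222 calls, oldest first (values taken from the example)
reached = pd.Series([0, 1, 1, 1, 1], index=["K2", "K3", "K4", "K5", "K6"])

shifted = reached.shift()             # each row now holds the previous call's flag; first row is NaN
running = shifted.cumsum()            # running total of earlier successful calls (NaN is skipped)
new = running.fillna(0).astype(int)   # no history for the first call, so it becomes 0

steps = pd.DataFrame(
    {"reached": reached, "shifted": shifted, "running": running, "new": new}
)
print(steps)
# The "new" column is 0, 0, 1, 2, 3 for K2..K6, matching the A222 rows above.
```

Without shift(), the cumulative sum would include the current call, and K6 would show 4 instead of 3.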

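Alternatively, instead of reversing the rows, we can sort the DataFrame in ascending order by not_unique_id and date_of_call before grouping. Sorting by not_unique_id alone is not enough, because the calls within each customer would not be guaranteed to be in chronological order and the counts could run in the wrong direction. The result is assigned back by index label, so the row order of df itself does not change:

df['new'] = (df.sort_values(['not_unique_id', 'date_of_call'])
             .groupby('not_unique_id')['customer_reached']
             .transform(lambda x: x.shift().cumsum())
             .fillna(0)
             .astype(int))
print(df)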

Output:

	unique_id	not_unique_id	date_of_call	customer_reached	new
0	K7	A333	2015-03-03	1	0
1	K6	A222	2015-03-02	1	3
2	K5	A222	2015-03-01	1	2
3	K4	A222	2015-02-28	1	1
4	K3	A222	2015-02-27	1	0
5	K2	A222	2015-02-26	0	0
6	K1	A111	2015-02-25	0	0
7	K0	A000	2015-02-24	1	0
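One way to gain confidence in a vectorized result like this is to compare it with a deliberately slow row-by-row count. The sketch below rebuilds the example data and checks the grouped computation against such a loop. The helper prior_reached_bruteforce is hypothetical and written only for this check. It assumes that no customer has two calls with the same date_of_call.

```python
import pandas as pd

rng = pd.date_range("2015-02-24", periods=8, freq="D")
df = pd.DataFrame(
    {
        "unique_id": ["K0", "K1", "K2", "K3", "K4", "K5", "K6", "K7"],
        "not_unique_id": ["A000", "A111", "A222", "A222", "A222", "A222", "A222", "A333"],
        "date_of_call": rng,
        "customer_reached": [1, 0, 0, 1, 1, 1, 1, 1],
    }
)


def prior_reached_bruteforce(frame):
    """Hypothetical helper: for each row, count earlier successful calls to the same customer."""
    counts = []
    for _, row in frame.iterrows():
        earlier = frame[
            (frame["not_unique_id"] == row["not_unique_id"])
            & (frame["date_of_call"] < row["date_of_call"])
        ]
        counts.append(int(earlier["customer_reached"].sum()))
    return pd.Series(counts, index=frame.index)


expected = prior_reached_bruteforce(df)

# Grouped version: chronological order per customer, then shift and accumulate
fast = (
    df.sort_values(["not_unique_id", "date_of_call"])
    .groupby("not_unique_id")["customer_reached"]
    .transform(lambda x: x.shift().cumsum())
    .fillna(0)
    .astype(int)
)

# Align on the index before comparing, since the two results are ordered differently
print((fast.sort_index() == expected).all())
```

The loop is quadratic in the number of rows, so it is only suitable as a test on small samples.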

Conclusion

In this article, we demonstrated how to create a new column that counts, for each call, how many earlier calls reached the same customer. We used grouping, shifting, and a cumulative sum to achieve this result. The same pattern applies wherever you need a per-group running count of earlier events that meet a condition.
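As a sketch of how the idea might be reused on other data, the function below wraps it in a helper. Note that it swaps in a slightly different technique from the one shown in this article: instead of shift() followed by cumsum(), it takes the grouped cumulative sum and subtracts the current row's value, which avoids the lambda. The function name count_prior_events, its parameters, and the sample data are all made up for illustration. It assumes the flag column contains only 0 and 1 with no missing values and that the ordering column has no ties within a group.

```python
import pandas as pd


def count_prior_events(frame, group_col, order_col, flag_col):
    """Hypothetical helper: per row, sum flag_col over earlier rows of the same group."""
    ordered = frame.sort_values([group_col, order_col])
    # Running total that includes the current row ...
    running = ordered.groupby(group_col)[flag_col].cumsum()
    # ... minus the current row itself leaves only the earlier rows
    return (running - ordered[flag_col]).reindex(frame.index)


# Made-up sample data, deliberately not in chronological order
calls = pd.DataFrame(
    {
        "customer": ["A", "B", "A", "A", "B"],
        "called_on": pd.to_datetime(
            ["2024-01-03", "2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]
        ),
        "reached": [1, 0, 1, 0, 1],
    }
)
calls["prior_reached"] = count_prior_events(calls, "customer", "called_on", "reached")
print(calls)
```

Because the result is reindexed to the input's index, the caller does not need to sort the DataFrame beforehand.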

Last modified on 2024-07-11