반응형
1. RFM 분석의 3가지 지표 집계하기
- Recency: 최근 구매일 - 최근에 구매하였는가?
- Frequency: 구매 횟수 - 얼마나 자주 구매하였는가?
- Monetary: 구매 금액 합계 - 얼마나 돈을 썼는가?
2. 원리는 아래의 사진과 같다
3. 순서대로 진행 !
1
- 먼저 집계 테이블을 구하는데 위에 분류된 기준을 사용해 컬럼을 생성해준다.
select customer_id,
count(customer_id) as frequency,
sum(sales) as monetary,
max(order_date) AS recent_order
from records
group by customer_id
2
- 그다음 나이브하게 5등분으로 나누어 준다.
- 위에서 만든 테이블은 CTE 형식으로 나타내어 주어도 좋지만 서브쿼리를 사용하여 나타내어 본다.
SELECT customer_id,
ntile(5) OVER ( ORDER BY recent_order ) AS rfm_recency,
ntile(5) OVER ( ORDER BY total_orders ) AS rfm_frequency,
ntile(5) OVER ( ORDER BY total_sales ) AS rfm_monetary
from(select customer_id,
count(customer_id) as total_orders,
sum(sales) as total_sales,
max(order_date) AS recent_order
from records
group by customer_id) as t
- 결과 테이블
3.
RFM CELL 을 만들어 준다.
/* cte */
WITH preprocessing_tbl
AS (
/* 5등분한 테이블 */
SELECT customer_id,
Ntile(5)
OVER (
ORDER BY recent_order ) AS rfm_recency,
Ntile(5)
OVER (
ORDER BY total_orders ) AS rfm_frequency,
Ntile(5)
OVER (
ORDER BY total_sales ) AS rfm_monetary
/* RFM 을 계산한 테이블 */
FROM (SELECT customer_id,
Count(customer_id) AS total_orders,
Sum(sales) AS total_sales,
Max(order_date) AS recent_order
FROM records
GROUP BY customer_id) AS tbl1)
/* RFM CELL 만들기 */
SELECT customer_id,
rfm_recency,
rfm_frequency,
rfm_monetary,
Concat(rfm_recency, rfm_frequency, rfm_monetary) AS rfm_cell
FROM preprocessing_tbl
- 서브쿼리 버전
/* 서브쿼리 */
SELECT customer_id,
rfm_recency,
rfm_frequency,
rfm_monetary,
Concat(rfm_recency, rfm_frequency, rfm_monetary) AS rfm_cell
FROM (
/* 5등분한 테이블 */
SELECT customer_id,
Ntile(5)
OVER (
ORDER BY recent_order ) AS rfm_recency,
Ntile(5)
OVER (
ORDER BY total_orders ) AS rfm_frequency,
Ntile(5)
OVER (
ORDER BY total_sales ) AS rfm_monetary
/* RFM 을 계산한 테이블 */
FROM (SELECT customer_id,
Count(customer_id) AS total_orders,
Sum(sales) AS total_sales,
Max(order_date) AS recent_order
FROM records
GROUP BY customer_id) AS tbl1) tbl2
- 결과 테이블
- 구분기준
Cannot lose | Active fans | Promising newbies | Potential churners | |
---|---|---|---|---|
RFM cell values | 355, 255 | 543, 542, 453, 452 | 525, 524, 515, 514 | 335, 334, 325, 324 |
Conditions for inclusion | Low RHigh F&M | High R&FLow M | High R&MLow F (so far) | Mid RLow FHigh M |
Description | Big spenders who haven’t purchased lately | Customers who buy often & recently, but at low price points | New customers with large orders | High spending customers who haven’t purchased often or lately |
최종버전
/* 구분 기준별 나누기 */
SELECT customer_id,
rfm_recency,
rfm_frequency,
rfm_monetary,
rfm_cell,
CASE
WHEN rfm_cell IN ( '355', '255' ) THEN
'Cannot lose'
WHEN rfm_cell IN ( '543', '542', '453', '452' ) THEN
'Active fans'
WHEN rfm_cell IN ( '525', '524', '515', '514' ) THEN
'Promising newbies'
WHEN rfm_cell IN ( '335', '334', '325', '324' ) THEN
'Potential churners'
ELSE 'Other'
END AS rfm_segment
FROM (
/* 서브쿼리 - RFM cell 만들기 */
SELECT customer_id,
rfm_recency,
rfm_frequency,
rfm_monetary,
Concat(rfm_recency, rfm_frequency, rfm_monetary) AS rfm_cell
FROM (
/* 5등분한 테이블 */
SELECT customer_id,
Ntile(5)
OVER (
ORDER BY recent_order ) AS rfm_recency,
Ntile(5)
OVER (
ORDER BY total_orders ) AS rfm_frequency,
Ntile(5)
OVER (
ORDER BY total_sales ) AS rfm_monetary
/* RFM 을 계산한 테이블 */
FROM (SELECT customer_id,
Count(customer_id) AS total_orders,
Sum(sales) AS total_sales,
Max(order_date) AS recent_order
FROM records
GROUP BY customer_id) AS tbl1) tbl2) AS tbl3
- 결과 테이블 및 시각화
위와 같이 고객을 나이브하게 나누어 분류해보았다.
결론
참고링크
How to Create RFM Segments for Shopify
추가 분석 방향
more sophisticated RFM clustering using a K-means model
고객이 시간이 지남에 따라 한 셀에서 다른 셀로 이동하는 방법 모색하기
LRFM - length 추가
K means 클러스터링을 통해서 분류해보기
반응형
댓글