1. <strong id="7actg"></strong>
    2. <table id="7actg"></table>

    3. <address id="7actg"></address>
      <address id="7actg"></address>
      1. <object id="7actg"><tt id="7actg"></tt></object>

        實戰(zhàn)!HiveSQL電子商務(wù)消費行為分析!

        共 7749字,需瀏覽 16分鐘

         ·

        2020-10-15 03:03

        一、前言

        Hive 學(xué)習(xí)過程中的一個練習(xí)項目,如果不妥的地方或者更好的建議,歡迎指出!我們主要進(jìn)行一下一些練習(xí):

        • 數(shù)據(jù)結(jié)構(gòu)
        • 數(shù)據(jù)清洗
        • 基于Hive的數(shù)據(jù)分析

        二、項目需求

        首先和大家講一下這個項目的需求:

        「對某零售企業(yè)最近1年門店收集的數(shù)據(jù)進(jìn)行數(shù)據(jù)分析」

        • 潛在客戶畫像
        • 用戶消費統(tǒng)計
        • 門店的資源利用率
        • 消費的特征人群定位
        • 數(shù)據(jù)的可視化展現(xiàn)

        三、數(shù)據(jù)結(jié)構(gòu)

        本次練習(xí)一共用到四張表,如下:這里有需要的小伙伴可以菜單欄:找到我

        Customer表

        Transaction表

        Store表

        Review表

        四、項目實戰(zhàn)

        「Create HDFS Folder」

        hdfs?dfs?-mkdir?-p?/tmp/shopping/data/customer
        hdfs?dfs?-mkdir?-p?/tmp/shopping/data/transaction
        hdfs?dfs?-mkdir?-p?/tmp/shopping/data/store
        hdfs?dfs?-mkdir?-p?/tmp/shopping/data/review

        「Upload the file to HDFS」

        hdfs?dfs?-put?/opt/soft/data/customer_details.csv?/tmp/shopping/data/customer/
        hdfs?dfs?-put?/opt/soft/data/transaction_details.csv?/tmp/shopping/data/transaction/
        hdfs?dfs?-put?/opt/soft/data/store_details.csv?/tmp/shopping/data/store/
        hdfs?dfs?-put?/opt/soft/data/store_review.csv?/tmp/shopping/data/review/

        「Create database」

        drop?database?if?exists?shopping?cascade
        create?database?shopping

        「Use database」

        use?shopping

        「Create external table」

        「創(chuàng)建四張對應(yīng)的外部表,也就是本次項目中的近源表?!?/strong>

        create?external?table?if?not?exists?ext_customer_details(
        customer_id?string,
        first_name?string,
        last_name?string,
        email?string,
        gender?string,
        address?string,
        country?string,
        language?string,
        job?string,
        credit_type?string,
        credit_no?string
        )
        row?format?delimited?fields?terminated?by?','
        location?'/tmp/shopping/data/customer/'
        tblproperties('skip.header.line.count'='1')
        create?external?table?if?not?exists?ext_transaction_details(
        transaction_id?string,
        customer_id?string,
        store_id?string,
        price?double,
        product?string,
        buydate?string,
        buytime?string
        )
        row?format?delimited?fields?terminated?by?','
        location?'/tmp/shopping/data/transaction'
        tblproperties('skip.header.line.count'='1')
        create?external?table?if?not?exists?ext_store_details(
        store_id?string,
        store_name?string,
        employee_number?int
        )
        row?format?delimited?fields?terminated?by?','
        location?'/tmp/shopping/data/store/'
        tblproperties('skip.header.line.count'='1')
        create?external?table?if?not?exists?ext_store_review(
        transaction_id?string,
        store_id?string,
        review_score?int
        )
        row?format?delimited?fields?terminated?by?','
        location?'/tmp/shopping/data/review'
        tblproperties('skip.header.line.count'='1')

        通過UDF自定義 MD5加密函數(shù)

        「Create MD5 encryption function」

        這里通過UDF自定義 MD5加密函數(shù) ,對地址、郵箱等信息進(jìn)行加密。

        --?md5?udf自定義加密函數(shù)

        --add?jar?/opt/soft/data/md5.jar
        --create?temporary?function?md5?as?'com.shopping.services.Encryption'

        --select?md5('abc')
        --drop?temporary?function?encrymd5

        「Clean and Mask customer_details 創(chuàng)建明細(xì)表」

        create?table?if?not?exists?customer_details?
        as?select?customer_id,first_name,last_name,md5(email)?email,gender,md5(address)?address,country,job,credit_type,md5(credit_no)?
        from?ext_customer_details

        對表內(nèi)容進(jìn)行檢查,為數(shù)據(jù)清洗做準(zhǔn)備

        「Check ext_transaction_details data」transaction表的transaction_id進(jìn)行檢查,查看重復(fù)的、錯誤的、以及空值的數(shù)量。

        這里從表中我們可以看到transaction_id存在100個重復(fù)的值。

        with?
        t1?as?(select?'countrow'?as?status,count(transaction_id)?as?val?from?ext_transaction_details),
        t2?as?(select?'distinct'?as?status,(count(transaction_id)-count(distinct?transaction_id))?as?val?from?ext_transaction_details),
        t3?as?(select?'nullrow'?as?status,count(transaction_id)?as?val?from?ext_transaction_details?where?transaction_id?is?null),
        t4?as?(select?'errorexp'?as?status,count(regexp_extract(transaction_id,'^([0-9]{1,4})$',0))?as?val?from?ext_transaction_details)
        select?*?from?t1?union?all?select?*?from?t2?union?all?select?*?from?t3?union?all?select?*?from?t4

        「Clean transaction_details into partition table」

        create?table?if?not?exists?transaction_details(
        transaction_id?string,
        customer_id?string,
        store_id?string,
        price?double,
        product?string,
        buydate?string,
        buytime?string
        )
        partitioned?by?(partday?string)
        row?format?delimited?fields?terminated?by?','
        stored?as?rcfile

        「開啟動態(tài)分區(qū)」

        set?hive.exec.dynamic.partition=true
        set?hive.exec.dynamic.partition.mode=nonstrict

        開啟動態(tài)分區(qū),通過窗口函數(shù)對數(shù)據(jù)進(jìn)行清洗

        「Clear data and import data into transaction_details」

        --?partday?分區(qū)?transaction_id?重復(fù)?
        select?if(t.ct=1,transaction_id,concat(t.transaction_id,'_',t.ct-1))?
        transaction_id,customer_id,store_id,price,product,buydate,buytime,date_format(buydate,'yyyy-MM')?
        as?partday?
        from?(select?*,row_number()?over(partition?by?transaction_id)?as?ct?
        from?ext_transaction_details)?t
        insert?into?transaction_details?partition(partday)?
        select?if(t.ct=1,transaction_id,concat(t.transaction_id,'_',t.ct-1))?transaction_id,customer_id,store_id,price,product,buydate,buytime,date_format(regexp_replace(buydate,'/','-'),'yyyy-MM')?
        as?partday?from?(select?*,row_number()?over(partition?by?transaction_id)?as?ct?
        from?ext_transaction_details)?t?
        • 「row_number() over(partition by transaction_id)」 窗口函數(shù) :從1開始,按照順序,生成分組內(nèi)記錄的序列,row_number()的值不會存在重復(fù),當(dāng)排序的值相同時,按照表中記錄的順序進(jìn)行排列 ?這里我們對分組的transaction_id
        • if(t.ct=1,transaction_id,concat(t.transaction_id,'_',t.ct-1)) 如果滿足ct=1,就是transaction_id,否則進(jìn)行字符串拼接生成新的id

        「Clean store_review table」

        create?table?store_review?
        as?select?transaction_id,store_id,nvl(review_score,ceil(rand()*5))?
        as?review_score?from?ext_store_review

        「NVL(E1, E2)的功能為:如果E1為NULL,則函數(shù)返回E2,否則返回E1本身。」我們可以看到表中的數(shù)據(jù)存在空值,通過NVL函數(shù)對數(shù)據(jù)進(jìn)行填充。

        show?tables

        通過清洗后的近源表和明細(xì)表如上。

        數(shù)據(jù)分析

        Customer分析

        • 找出顧客最常用的信用卡
        select?credit_type,count(credit_type)?as?peoplenum?from?customer_details
        group?by?credit_type?order?by?peoplenum?desc?limit?1
        • 找出客戶資料中排名前五的職位名稱
        select?job,count(job)?as?jobnum?from?customer_details
        group?by?job
        order?by?jobnum?desc
        limit?5
        • 在美國女性最常用的信用卡
        select?credit_type,count(credit_type)?as?femalenum?from?customer_details?
        where?gender='Female'
        group?by?credit_type
        order?by?femalenum?desc
        limit?1
        • 按性別和國家進(jìn)行客戶統(tǒng)計
        select?count(*)?as?customernum,country,gender?from?customer_details
        group?by?country,gender

        Transaction分析

        • 計算每月總收入
        select?partday,sum(price)?as?countMoney?from?transaction_details?group?by?partday
        • 計算每個季度的總收入「Create Quarter Macro 定義季度宏」,將時間按季度進(jìn)行劃分
        create?temporary?macro?
        calQuarter(dt?string)?
        concat(year(regexp_replace(dt,'/','-')),'年第',ceil(month(regexp_replace(dt,'/','-'))/3),'季度')
        select?calQuarter(buydate)?as?quarter,sum(price)?as?sale?
        from?transaction_details?group?by?calQuarter(buydate)
        • 按年計算總收入
        create?temporary?macro?calYear(dt?string)?year(regexp_replace(dt,'/','-'))
        select?calYear(buydate)?as?year,sum(price)?as?sale?from?transaction_details?group?by?calYear(buydate)
        • 按工作日計算總收入
        create?temporary?macro?calWeek(dt?string)?concat('星期',dayofweek(regexp_replace(dt,'/','-'))-1)
        select?concat('星期',dayofweek(regexp_replace(buydate,'/','-'))-1)?as?week,sum(price)?as?sale?
        from?transaction_details?group?by?dayofweek(regexp_replace(buydate,'/','-'))
        • 按時間段計算總收入(需要清理數(shù)據(jù))
        select?concat(regexp_extract(buytime,'[0-9]{1,2}',0),'時')?as?time,sum(price)?as?sale?from?transaction_details?group?by?regexp_extract(buytime,'[0-9]{1,2}',0)
        • 按時間段計算平均消費「Time macro」
        create?temporary?macro?calTime(time?string)?if(split(time,'?')[1]='PM',regexp_extract(time,'[0-9]{1,2}',0)+12,
        if(split(time,'?')[1]='AM',regexp_extract(time,'[0-9]{1,2}',0),split(time,':')[0]))
        select?calTime(buytime)?as?time,sum(price)?as?sale?from?transaction_details?group?by?calTime(buytime)?
        --define?time?bucket?
        --early?morning:?(5:00,?8:00]
        --morning:?(8:00,?11:00]
        --noon:?(11:00,?13:00]
        --afternoon:?(13:00,?18:00]
        --evening:?(18:00,?22:00]
        --night:?(22:00,?5:00]?--make?it?as?else,?since?it?is?not?liner?increasing
        --We?also?format?the?time.?1st?format?time?to?19:23?like,?then?compare,?then?convert?minites?to?hours
        with
        t1?as
        (select?calTime(buytime)?as?time,sum(price)?as?sale?from?transaction_details?group?by?calTime(buytime)?order?by?time),
        t2?as
        (select?if(time>5?and?time<=8,'early?morning',if(time?>8?and?time<=11,'moring',if(time>11?and?time?<13,'noon',
        if(time>13?and?time?<=18,'afternoon',if(time?>18?and?time?<=22,'evening','night')))))?as?sumtime,sale?
        from?t1)
        select?sumtime,sum(sale)?from?t2?
        group?by?sumtime
        • 按工作日計算平均消費
        select?concat('星期',dayofweek(regexp_replace(buydate,'/','-'))-1)?
        as?week,avg(price)?as?sale?from?transaction_details?
        where?dayofweek(regexp_replace(buydate,'/','-'))-1?!=0?and?dayofweek(regexp_replace(buydate,'/','-'))-1?!=6
        group?by?dayofweek(regexp_replace(buydate,'/','-'))
        • 計算年、月、日的交易總數(shù)
        select?buydate?as?month,count(*)?as?salenum?from?transaction_details?group?by?buydate
        • 找出交易量最大的10個客戶
        select?c.customer_id,c.first_name,c.last_name,count(c.customer_id)?as?custnum?from?customer_details?c
        inner?join?transaction_details?t
        on?c.customer_id=t.customer_id
        group?by?c.customer_id,c.first_name,c.last_name
        order?by?custnum?desc
        limit?10
        • 找出消費最多的前10位顧客
        select?c.customer_id,c.first_name,c.last_name,sum(price)?as?sumprice?from?customer_details?c
        inner?join?transaction_details?t
        on?c.customer_id=t.customer_id
        group?by?c.customer_id,c.first_name,c.last_name
        order?by?sumprice?desc
        limit?10
        • 統(tǒng)計該期間交易數(shù)量最少的用戶
        select?c.customer_id,c.first_name,c.last_name,count(*)?as?custnum?from?customer_details?c
        inner?join?transaction_details?t
        on?c.customer_id=t.customer_id
        group?by?c.customer_id,c.first_name,c.last_name
        order?by?custnum?asc
        limit?1
        • 計算每個季度的獨立客戶總數(shù)
        select?calQuarter(buydate)?as?quarter,count(distinct?customer_id)?as?uninum
        from?transaction_details
        group?by?calQuarter(buydate)
        • 計算每周的獨立客戶總數(shù)
        select?calWeek(buydate)?as?quarter,count(distinct?customer_id)?as?uninum
        from?transaction_details
        group?by?calWeek(buydate)
        • 計算整個活動客戶平均花費的最大值
        select?sum(price)/count(*)?as?sale
        from?transaction_details
        group?by?customer_id
        order?by?sale?desc
        limit?1
        • 統(tǒng)計每月花費最多的客戶
        with?
        t1?as
        (select?customer_id,partday,count(distinct?buydate)?as?visit?from?transaction_details?group?by?partday,customer_id),
        t2?as
        (select?customer_id,partday,visit,row_number()?over(partition?by?partday?order?by?visit?desc)?as?visitnum?from?t1)
        select?*?from?t2?where?visitnum=1?
        • 統(tǒng)計每月訪問次數(shù)最多的客戶
        with
        t1?as
        (select?customer_id,partday,sum(price)?as?pay?from?transaction_details?group?by?partday,customer_id),
        t2?as
        (select?customer_id,partday,pay,row_number()?over(partition?by?partday?order?by?pay?desc)?as?paynum?from?t1)
        select?*?from?t2?where?paynum=1
        • 按總價找出最受歡迎的5種產(chǎn)品
        select?product,sum(price)?as?sale?from?transaction_details?
        group?by?product
        order?by?sale?desc
        limit?5
        • 根據(jù)購買頻率找出最暢銷的5種產(chǎn)品
        select?product,count(*)?as?num?from?transaction_details?
        group?by?product
        order?by?num?desc
        limit?5
        • 根據(jù)客戶數(shù)量找出最受歡迎的5種產(chǎn)品
        select?product,count(distinct?customer_id)?as?num?from?transaction_details
        group?by?product
        order?by?num?desc
        limit?5
        • 驗證前5個details
        select?*?from?transaction_details?where?product?in?('Goat?-?Whole?Cut')

        Store分析

        • 按客流量找出最受歡迎的商店
        with?
        t1?as?(select?store_id,count(*)?as?visit?from?transaction_details?
        group?by?
        store_id?order?by?visit?desc?limit?1)
        select?s.store_name,t.visit?
        from?t1?t?
        inner?join?
        ext_store_details?s?
        on?t.store_id=s.store_id
        • 根據(jù)顧客消費價格找出最受歡迎的商店
        with?
        t1?as?(select?store_id,sum(price)?as?sale?from?transaction_details?
        group?by?
        store_id?order?by?sale?desc?limit?1)
        select?s.store_name,t.sale?
        from?t1?t?
        inner?join?
        ext_store_details?s?
        on?t.store_id=s.store_id
        • 根據(jù)顧客交易情況找出最受歡迎的商店
        with
        t1?as?
        (select?store_id,store_name?from?ext_store_details)
        select?t.store_id,store_name,count(distinct?t.customer_id)?as?num
        from?transaction_details?t
        inner?join?t1?s
        on?s.store_id=t.store_id
        group?by?t.store_id,store_name
        order?by?num?desc
        limit?1
        • 根據(jù)商店和唯一的顧客id獲取最受歡迎的產(chǎn)品
        with
        t1?as?(select?store_id,product,count(distinct?customer_id)?as?num?from?transaction_details
        group?by?store_id,product?order?by?num?desc?limit?1)
        select?s.store_name,t.num,t.product?
        from?t1?t?
        inner?join?
        ext_store_details?s?
        on?t.store_id=s.store_id
        • 獲取每個商店的員工與顧客比
        with
        t1?as?(select?store_id,count(distinct?customer_id)?as?num?from?transaction_details
        group?by?store_id?)
        select?s.store_name,employee_number/num?as?vs?from?t1?t
        inner?join?ext_store_details?s?
        on?t.store_id=s.store_id
        • 按年和月計算每家店的收入
        select?store_id,partday,sum(price)?from?transaction_details?group?by?store_id,partday
        • 按店鋪制作總收益餅圖
        select?store_id,sum(price)?from?transaction_details?group?by?store_id
        • 找出每個商店最繁忙的時間段
        with
        t1?as
        (select?store_id,count(customer_id)?as?peoplenum?from?transaction_details?group?by?store_id,concat(regexp_extract(buytime,'[0-9]{1,2}',0),'時')),
        t2?as
        (select?store_id,peoplenum,row_number()?over(partition?by?store_id?order?by?peoplenum?desc)?as?peo?from?t1?)
        select?t.store_id,e.store_name,t.peoplenum?from?t2?t
        inner?join?ext_store_details?e
        on?e.store_id?=?t.store_id
        where?peo?=1
        • 找出每家店的忠實顧客
        with
        t1?as
        (select?customer_id,store_id,count(customer_id)?as?visit?from?transaction_details?group?by?store_id,customer_id?),
        t2?as
        (select?customer_id,store_id,visit,row_number()?over(partition?by?store_id?order?by?visit?desc)?as?most?from?t1)
        select?r.customer_id,concat(first_name,last_name)?as?customer_name,r.store_id,store_name,r.visit?from?t2?r
        inner?join?customer_details?c
        on?c.customer_id=r.customer_id
        inner?join?ext_store_details?e
        on?e.store_id=r.store_id
        where?most=1
        • 根據(jù)每位員工的最高收入找出明星商店
        with
        t1?as
        (select?store_id,sum(price)?as?sumprice?from?transaction_details?group?by?store_id)
        select?t.store_id,s.store_name,sumprice/employee_number?as?avgprice??from?t1?t
        inner?join?ext_store_details?s
        on?s.store_id=t.store_id
        order?by?avgprice?desc

        Review分析

        • 在ext_store_review中找出存在沖突的交易映射關(guān)系
        select?t.transaction_id,t.store_id?from?transaction_details?t
        inner?join?ext_store_review?e
        on?e.transaction_id=t.transaction_id
        where?e.store_id!=t.store_id
        • 了解客戶評價的覆蓋率
        with
        trans?as?(select?store_id,count(transaction_id)?as?countSale?from?transaction_details?group?by?store_id),
        rev?as?(select?store_id,count(distinct?transaction_id)?as?review?from?store_review?group?by?store_id)
        select?s.store_name,(r.review*100/t.countSale)?as?cover?from??trans?t?
        inner?join?rev?r?
        on?t.store_id=r.store_id?
        inner?join?ext_store_details?s
        on?t.store_id=s.store_id
        • 根據(jù)評分了解客戶的分布情況
        select?store_id,review_score,count(review_score)?as?numview?from?ext_store_review??where?review_score>0?group?by?review_score,store_id
        • 根據(jù)交易了解客戶的分布情況
        select?store_id,count(transaction_id)?as?transactionnum?from?ext_store_review??group?by?store_id
        • 客戶給出的最佳評價是否總是同一家門店
        select?store_id,customer_id,count(customer_id)?as?visit?from?transaction_details?t
        join?ext_store_review?e
        on?e.transaction_id?=?t.transaction_id
        where?e.review_score=5
        group?by?t.store_id,t.customer_id

        瀏覽 136
        點贊
        評論
        收藏
        分享

        手機(jī)掃一掃分享

        分享
        舉報
        評論
        圖片
        表情
        推薦
        點贊
        評論
        收藏
        分享

        手機(jī)掃一掃分享

        分享
        舉報
        1. <strong id="7actg"></strong>
        2. <table id="7actg"></table>

          <address id="7actg"></address>
          <address id="7actg"></address>
          1. <object id="7actg"><tt id="7actg"></tt></object>
            日本产品和韩国产品哪个用户更火 | 啊灬啊灬啊灬快灬高潮女女同文 | 久久理论电影 | 精品国产天线2024 | 美女视频大黄 | 国产草逼网 | 秋霞电影院午夜伦A片欧美 | 99爱视频在线观看 | 日韩一级影视 | 欧美性爱乱伦 |