1. <strong id="7actg"></strong>
    2. <table id="7actg"></table>

    3. <address id="7actg"></address>
      <address id="7actg"></address>
      1. <object id="7actg"><tt id="7actg"></tt></object>

        Hive CTE | with as 隱藏的秘密!

        共 8544字,需瀏覽 18分鐘

         ·

        2022-06-08 01:35




        CTE

        ?Hive with as 語(yǔ)句

        Hive with 語(yǔ)句默認(rèn)是不把數(shù)據(jù)進(jìn)行物化的,相當(dāng)于視圖,定義了一個(gè)SQL片段,每次使用時(shí)候可以將該定義的SQL片段拿出來(lái)再被使用,該SQL片段可以理解為一個(gè)變量,主要用途簡(jiǎn)化SQL,讓SQL更簡(jiǎn)潔,替換子查詢(xún),方便定位問(wèn)題。該子句緊跟在SELECT或INSERT關(guān)鍵字之前,可以在Hive SELECT,INSERT,CREATE TABLE AS SELECT或CREATE VIEW AS SELECT語(yǔ)句中使用一個(gè)或多個(gè)CTE 。

        with as 也叫做子查詢(xún)部分,首先定義一個(gè)sql片段,該sql片段會(huì)被整個(gè)sql語(yǔ)句所用到,為了讓sql語(yǔ)句的可讀性更高些,作為提供數(shù)據(jù)的部分,也常常用在union等集合操作中。

        注意:hive with as不像oracle等數(shù)據(jù)庫(kù)會(huì)將數(shù)據(jù)緩存到內(nèi)存中,只是定義了一個(gè)SQL代碼片段,方便下次使用,使代碼更簡(jiǎn)潔優(yōu)美!??!? 其最大的好處就是適當(dāng)?shù)奶岣叽a可讀性,而且如果with子句在后面要多次使用到,這可以大大的簡(jiǎn)化SQL,但不一定提高執(zhí)行效率。

        在高版本的SQL中,with語(yǔ)句進(jìn)行了物化,默認(rèn)是不開(kāi)啟的,這個(gè)參數(shù)為

        hive.optimize.cte.materialize.threshold

        該參數(shù)默認(rèn)情況下是-1,是關(guān)閉的,當(dāng)開(kāi)啟(大于0),比如設(shè)置為2,則如果with..as語(yǔ)句被引用2次及以上時(shí),會(huì)把with..as語(yǔ)句生成的table物化,從而做到with..as語(yǔ)句只執(zhí)行一次,來(lái)提高效率。


        物化實(shí)際上是先相對(duì)于視圖而言的,視圖創(chuàng)建的是一個(gè)虛擬表,只是定義了一個(gè)SQL片段,并沒(méi)有實(shí)體表的創(chuàng)建,只是概念性的東西,那么物化就是將這種概念性的東西進(jìn)行實(shí)體化,如數(shù)據(jù)進(jìn)行緩存,存放在內(nèi)存,數(shù)據(jù)進(jìn)行落盤(pán)等,類(lèi)似于計(jì)算的中間結(jié)果進(jìn)行緩存或落到磁盤(pán),這樣每次計(jì)算的時(shí)候可以從該中間結(jié)果中取數(shù),這樣才可以達(dá)到一次分析,多次使用的目的。(這里我們可以類(lèi)比Spark RDD的緩存概念:RDD 緩存指的是將 RDD 以緩存的形式物化到內(nèi)存或磁盤(pán)的過(guò)程)

        Hive中具體源碼如下:

        從源碼看,在獲取元數(shù)據(jù)時(shí),會(huì)進(jìn)行判斷,判斷配置參數(shù)大于0且cte的引用次數(shù)超過(guò)配置的參數(shù)時(shí)候則開(kāi)啟cte的物化。

        1 CTE 的使用

        1 命令格式

        WITH      cte_name AS    (        cte_query    )    [,cte_name2  AS      (     cte_query2     )    ,……]


        參數(shù)說(shuō)明

        cte_name:CTE的名稱(chēng),不能與當(dāng)前WITH子句中的其他CTE的名稱(chēng)相同。查詢(xún)中任何使用到cte_name標(biāo)識(shí)符的地方,均指CTE。

        cte_query:一個(gè)SELECT語(yǔ)句。它產(chǎn)生的結(jié)果集用于填充CTE。

        2 示例

        示例1:

        INSERT OVERWRITE TABLE srcp PARTITION (p='abc')SELECT * FROM (    SELECT a.key, b.value    FROM (        SELECT * FROM src WHERE key IS NOT NULL    ) a    JOIN (        SELECT * FROM src2 WHERE value > 0    ) b    ON a.key = b.key) cUNION ALLSELECT * FROM (    SELECT a.key, b.value    FROM (        SELECT * FROM src WHERE key IS NOT NULL    ) a    LEFT OUTER JOIN (        SELECT * FROM src3 WHERE value > 0    ) b    ON a.key = b.key AND b.key IS NOT NULL)d;


        頂層的UNION兩側(cè)各為一個(gè)JOIN,JOIN的左表是相同的查詢(xún)。通過(guò)寫(xiě)子查詢(xún)的方式,只能重復(fù)這段代碼。

        使用CTE的方式重寫(xiě)以上語(yǔ)句。

        with   a as (select * from src where key is not null),  b as (select  * from src2 where value>0),  c as (select * from src3 where value>0),  d as (select a.key,b.value from a join b on a.key=b.key),  e as (select a.key,c.value from a left outer join c on a.key=c.key and c.key is not null)insert overwrite table srcp partition (p='abc')select * from d union all select * from e;


        重寫(xiě)后,a對(duì)應(yīng)的子查詢(xún)只需寫(xiě)一次,便可在后面進(jìn)行重用。您可以在CTE的WITH子句中指定多個(gè)子查詢(xún),像使用變量一樣在整個(gè)語(yǔ)句中反復(fù)重用。除重用外,不必反復(fù)嵌套。

        示例2:

        對(duì)于一些比較復(fù)雜的計(jì)算任務(wù),為了避免過(guò)多的JOIN,通常會(huì)先把一些需要提取的部分?jǐn)?shù)據(jù)使用臨時(shí)表或是CTE的形式在主要查詢(xún)區(qū)塊前進(jìn)行提取。

        臨時(shí)表的作法:

        CREATE TEMPORARY TABLE table_1 ASSELECTcolumnsFROM table A;CREATE TEMPORARY table_2 ASSELECTcolumnsFROM table B;SELECTtable_1.columns,table_2.columns,c.columnsFROM table C JOIN table_1JOIN table_2;

        CTE的作法:

        -- 注意Hive、Impala支持這種語(yǔ)法,低版本的MySQL不支持(高版本支持)WITH employee_by_title_count AS (SELECTt.name as job_title, COUNT(e.id) as amount_of_employeesFROM employees eJOIN job_titles t on e.job_title_id = t.idGROUP BY 1),salaries_by_title AS (SELECTname as job_title, salaryFROM job_titles)SELECT *FROM employee_by_title_count eJOIN salaries_by_title s ON s.job_title = e.job_title


        可以看到TEMP TABLE和CTE WITH的用法其實(shí)非常類(lèi)似,目的都是為了讓你的Query更加一目了然且優(yōu)雅簡(jiǎn)潔。很多人習(xí)慣將所有的Query寫(xiě)在單一的區(qū)塊里面,用過(guò)多的JOIN或SUBQUERY,導(dǎo)致最后邏輯丟失且自己也搞不清楚寫(xiě)到哪里,適時(shí)的使用TEMP TABLE和CTE作為輔助,絕對(duì)是很加分的。

        示例3:優(yōu)化子查詢(xún),方便維護(hù),代碼更簡(jiǎn)潔

        with cte as (SELECT  gw_id       ,sensor_id       ,stddev(temp) over(PARTITION by gw_id,sensor_id) as stddev_temp       ,stddev(humi) over(PARTITION by gw_id,sensor_id) as stddev_humi       ,avg(temp) over(PARTITION by gw_id,sensor_id) as avg_temp       ,avg(humi) over(PARTITION by gw_id,sensor_id) as avg_humiFROM phmdwdb.dwd_iot_phm_trackcir_envwhere from_unixtime(cast(substr(msg_time,1,10) AS BIGINT),'yyyy-MM-dd') = '2020-11-20')select gw_id       ,sensor_id       ,3*stddev_temp+avg_temp as temp_std_up       ,3*stddev_humi+avg_humi as humi_std_up       ,abs(3*stddev_temp-avg_temp) as temp_std_dn       ,abs(3*stddev_humi-avg_humi) as temp_std_dnfrom cte


        +----------------------+----------------------+--------------------+--------------------+---------------------+--------------------+--+|        gw_id         |      sensor_id       |    temp_std_up     |    humi_std_up     |     temp_std_dn     |    temp_std_dn     |+----------------------+----------------------+--------------------+--------------------+---------------------+--------------------+--+| 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  || 1263740609944641536  | 1329016193440047104  | 22.53016020309899  | 76.27771107659711  | 16.851530077439218  | 64.75383829654604  |


        示例4:CTE in Views, CTAS, and Insert Statements

        -- insert examplecreate table s1 like src;with q1 as ( select key, value from src where key = '5')from q1insert overwrite table s1select *;
        -- ctas examplecreate table s2 aswith q1 as ( select key from src where key = '4')select * from q1;
        -- view examplecreate view v1 aswith q1 as ( select key from src where key = '5')select * from q1;select * from v1;
        -- view example, name collisioncreate view v1 aswith q1 as ( select key from src where key = '5')select * from q1;with q1 as ( select key from src where key = '4')select * from v1;

        ?示例5:CET做遞歸查詢(xún)

        #!/bin/bashinput_para="hive"hive_home='/usr/idp/current/hive-client/bin';
        if [ "$1" != "" ];then input_para=$1fi;
        option=`echo ${input_para} | awk -F '_' '{print $1}' | sed s/[[:space:]]//g`
        if [ "$option" = "beeline" ];then hive_addr=`hadoop fs -cat /phm/JTTL_ETL_COMMON/etl_process.properties | grep hive_addr | awk -F '=' '{print $2}' | sed s/[[:space:]]//g` hive_url="${hive_addr}/phmdwdb" cd $hive_home beeline -u $hive_url \ -hivevar start_day='2020-01-01' \ -hivevar end_day='2025-12-31' \ -hivevar timeDimTable='phmdwdb.dim_phm_date' \ -f /home/centos/phm/JINI_GLOBAL_CREATETABLE/hive_dim_phm_date_query.sqlfi
        if [ "$option" = "hive" ];then
        hive -hivevar start_day='2018-01-01' \ -hivevar end_day='2025-12-31' \ -hivevar timeDimTable='phmdwdb.dim_phm_date' \ -f /home/centos/phm/JINI_GLOBAL_CREATETABLE/hive_dim_phm_date_query.sql
        fi
        具體遞歸查詢(xún)SQL如下:

        drop table ${timeDimTable};create table if not exists ${timeDimTable} aswith dates as ( select date_add("${start_day}", a.pos) as d from (select posexplode(split(repeat("o", datediff("${end_day}", "${start_day}")), "o"))) a),hours as ( select a.pos as h from (select posexplode(split(repeat("o", 23), "o"))) a)select from_unixtime(unix_timestamp(cast(d as timestamp)) + (h * 3600)) as dtime , d as dday , weekofyear(d) as dweek , date_format(d, 'u') as ddayofweek , month(d) as dmonth , year(d) as dyear , h as dhourfrom datesjoin hourssort by dtime;

        3 CTE作用 小結(jié)


        (1)可以復(fù)用公共代碼塊,減少表的 讀取次數(shù),降低IO 提高性能。如優(yōu)化join,優(yōu)化union 語(yǔ)句,優(yōu)化子查詢(xún)。將公共語(yǔ)句提前到select語(yǔ)句之前,達(dá)到一次查詢(xún)(讀),多次使用,目的是減少讀的次數(shù)。注意hive必須開(kāi)啟CTE物化的參數(shù)才起作用,如果沒(méi)開(kāi)啟,表還是被重復(fù)讀取,達(dá)不到一次查詢(xún)多次使用的目的

        (2)提高代碼的可讀性:使代碼更簡(jiǎn)潔,便于維護(hù)。如將子查詢(xún)抽出來(lái)以后,放到with語(yǔ)句,可方便定位,維護(hù)代碼,代碼的可讀性增強(qiáng)。

        (3)做遞歸查詢(xún),進(jìn)行迭代計(jì)算。

        ———————————————

        原文:https://blog.csdn.net/godlovedaniel/article/details/115480115

        瀏覽 107
        點(diǎn)贊
        評(píng)論
        收藏
        分享

        手機(jī)掃一掃分享

        分享
        舉報(bào)
        評(píng)論
        圖片
        表情
        推薦
        點(diǎn)贊
        評(píng)論
        收藏
        分享

        手機(jī)掃一掃分享

        分享
        舉報(bào)
        1. <strong id="7actg"></strong>
        2. <table id="7actg"></table>

        3. <address id="7actg"></address>
          <address id="7actg"></address>
          1. <object id="7actg"><tt id="7actg"></tt></object>
            超碰人人操人人爱 | 日韩美一级片 | 欧美日日夜夜操 | 午夜成人精品一区二区三区在线观看 | 欧美极品欧美精品欧美视频 | 又大又粗又爽的视频 | 懂色av懂色av粉嫩av无码 | 国产麻豆精品久久一二三 | 美女被操的视频网站 | a级无码视频 |