經(jīng)典SQL面試10題解析

Hi,我是王知無,一個大數(shù)據(jù)領(lǐng)域的原創(chuàng)作者。
放心關(guān)注我,獲取更多行業(yè)的一手消息。
一、提要
作為一名數(shù)據(jù)工作人員,SQL是日常工作中最常用的數(shù)據(jù)提取&簡單預處理語言。因為其使用的廣泛性和易學程度也被其他崗位比如產(chǎn)品經(jīng)理、研發(fā)廣泛學習使用,本篇文章主要結(jié)合經(jīng)典面試題,給出通過數(shù)據(jù)開發(fā)面試的SQL方法與實戰(zhàn)。以下題目均來與筆者經(jīng)歷&網(wǎng)上分享的中高難度SQL題。
二、解題思路
簡單——會考察一些group by & limit之類的用法,或者平時用的不多的函數(shù)比如rand()類;會涉及到一些表之間的關(guān)聯(lián)
中等——會考察一些窗口函數(shù)的基本用法;會有表之間的關(guān)聯(lián),相對tricky的地方在于會有一些自關(guān)聯(lián)的使用
困難——會有中位數(shù)或者更加復雜的取數(shù)概念,可能要求按照某特定要求生成列;一般這種題建中間表會解得清晰些
三、SQL真題
第一題
order訂單表,字段為:goods_id, amount ; pv 瀏覽表,字段為:goods_id,uid; goods按照總銷售金額排序,分成top10,top10~top20,其他三組
求每組商品的瀏覽用戶數(shù)(同組內(nèi)同一用戶只能算一次)
create?table?if?not?exists?test.nil_goods_category?as
select?goods_id
,case?when?nn<=?10?then?'top10'
??????when?nn<=?20?then?'top10~top20'
??????else?'other'?end?as?goods_group
from
(
????select?goods_id
????,row_number()?over(partition?by?goods_id?order?by?sale_sum?desc)?as?nn
????from
????(
????????select?goods_id,sum(amount)?as?sale_sum
????????from?order
????????group?by?1
????)?aa
)?bb;
select?b.goods_group,count(distinct?a.uid)?as?num
from?pv?a?
left?join?test.nil_goods_category?b?
on?a.goods_id?=?b.goods_id
group?by?1;
第二題
商品活動表 goods_event,g_id(有可能重復),t1(開始時間),t2(結(jié)束時間)
給定時間段(t3,t4),求在時間段內(nèi)做活動的商品數(shù)
1.
select?count(distinct?g_id)?as?event_goods_num
from?goods_event
where?(t1<=t4?and?t1>=t3)?
or?(t2>=t3?and?t2<=t4)
2.
select?count(distinct?g_id)?as?event_goods_num
from?goods_event
where?(t1<=t4?and?t1>=t3)?
union?all
第三題
商品活動流水表,表名為event,字段:goods_id, time;
求參加活動次數(shù)最多的商品的最近一次參加活動的時間
select?a.goods_id,a.time
from?event?a?
inner?join
(
????select?goods_id,count(*)
????from?event
????group?by?gooods_id
????order?by?count(*)?desc
????limit?1
)?b
on?a.goods_id?=?b.goods_id
order?by?a.goods_id,a.time?desc
第四題
用戶登錄的log數(shù)據(jù),劃定session,同一個用戶一個小時之內(nèi)的登錄算一個session;
生成session列
drop?table?if?exists?koo.nil_temp0222_a2;
create?table?if?not?exists?koo.nil_temp0222_a2?as
select?*
????,row_number()?over(partition?by?userid?order?by?inserttime)?as?nn1
from
(
????select?a.*
????,b.inserttime?as?inserttime_aftr
????,datediff(b.inserttime,a.inserttime)?as?session_diff
??from
??(
????select?userid,inserttime
??????,row_number()?over(partition?by?userid?order?by?inserttime?asc)?nn
????from?koo.nil_temp0222?
????where?userid?=?1900000169
??)?a???
??left?join
??(
?????select?userid,inserttime
??????,row_number()?over(partition?by?userid?order?by?inserttime?asc)?nn
????from?koo.nil_temp0222?
????where?userid?=?1900000169
??)?b
??on?a.userid?=??b.userid?and?a.nn?=?b.nn-1
)?aa
where?session_diff?>10?or?nn?=?1
order?by?userid,inserttime;
drop?table?if?exists?koo.nil_temp0222_a2_1;
create?table?if?not?exists?koo.nil_temp0222_a2_1?as
select?a.*
,case?when?b.nn?is?null?then?a.nn+3?else?b.nn?end?as?nn_end
from?koo.nil_temp0222_a2?a?
left?join?koo.nil_temp0222_a2?b?
on?a.userid?=?b.userid?
and?a.nn1?=?b.nn1?-?1;
select?a.*,b.nn1?as?session_id
from
(
??select?userid,inserttime
????,row_number()?over(partition?by?userid?order?by?inserttime?asc)?nn
??from?koo.nil_temp0222?
??where?userid?=?1900000169
)?a
left?join?koo.nil_temp0222_a2_1?b?
on?a.userid?=?b.userid
and?a.nn>=b.nn
and?a.nn第五題
訂單表,字段有訂單編號和時間;
取每月最后一天的最后三筆訂單
select?*
from
(
??select?*
??,rank()?over(partition?by?mm?order?by?dd?desc)?as?nn1
??,row_number()?over(partition?by?mm,dd?order?by?inserttime?desc)?as?nn2
??from
??(select?cast(right(to_date(inserttime),2)?as?int)?as?dd,month(inserttime)?as?mm,userid,inserttime
??from?koo.nil_temp0222)?aa?
)?bb?
where?nn1?=?1?and?nn2<=3;
第六題
數(shù)據(jù)庫表Tourists,記錄了某個景點7月份每天來訪游客的數(shù)量如下:
id date visits 1 2017-07-01 100 …… 非常巧,id字段剛好等于日期里面的幾號。
現(xiàn)在請篩選出連續(xù)三天都有大于100天的日期。
上面例子的輸出為:date 2017-07-01 ……
select?a.*,b.num?as?num2,c.num?as?num3
from?table??a?
left?join?table?b
on?a.userid?=?b.userid
and?a.dt?=?date_add(b.dt,-1)
left?join?table?c
on?a.userid?=?c.userid
and?a.dt?=?date_add(c.dt,-2)
where?b.num>100
and?a.num>100
and?c.num>100
第七題
現(xiàn)有A表,有21個列,第一列id,剩余列為特征字段,列名從d1-d20,共10W條數(shù)據(jù)!
另外一個表B稱為模式表,和A表結(jié)構(gòu)一樣,共5W條數(shù)據(jù)
請找到A表中的特征符合B表中模式的數(shù)據(jù),并記錄下相對應的id
有兩種情況滿足要求:
每個特征列都完全匹配的情況下 最多有一個特征列不匹配,其他19個特征列都完全匹配,但哪個列不匹配未知
1.
select?aa.*
from
(
??select?*,concat(d1,d2,d3……d20)?as?mmd
??from?table
)?aa?
left?join
(
??select?id,concat(d1,d2,d3……d20)?as?mmd
??from?table
)?bb?
on?aa.id?=?bb.id
and?aa.mmd?=?bb.mmd
2.
select?a.*,sum(d1_jp,d2_jp……,d20_jp)?as?same_judge
from
(
??select?a.*
??,case?when?a.d1?=?b.d1?then?1?else?0?end?as?d1_jp
??,case?when?a.d2?=?b.d2?then?1?else?0?end?as?d2_jp
??,case?when?a.d3?=?b.d3?then?1?else?0?end?as?d3_jp
??,case?when?a.d4?=?b.d4?then?1?else?0?end?as?d4_jp
??,case?when?a.d5?=?b.d5?then?1?else?0?end?as?d5_jp
??,case?when?a.d6?=?b.d6?then?1?else?0?end?as?d6_jp
??,case?when?a.d7?=?b.d7?then?1?else?0?end?as?d7_jp
??,case?when?a.d8?=?b.d8?then?1?else?0?end?as?d8_jp
??,case?when?a.d9?=?b.d9?then?1?else?0?end?as?d9_jp
??,case?when?a.d10?=?b.d10?then?1?else?0?end?as?d10_jp
??,case?when?a.d20?=?b.d20?then?1?else?0?end?as?d20_jp
??,case?when?a.d11?=?b.d11?then?1?else?0?end?as?d11_jp
??,case?when?a.d12?=?b.d12?then?1?else?0?end?as?d12_jp
??,case?when?a.d13?=?b.d13?then?1?else?0?end?as?d13_jp
??,case?when?a.d14?=?b.d14?then?1?else?0?end?as?d14_jp
??,case?when?a.d15?=?b.d15?then?1?else?0?end?as?d15_jp
??,case?when?a.d16?=?b.d16?then?1?else?0?end?as?d16_jp
??,case?when?a.d17?=?b.d17?then?1?else?0?end?as?d17_jp
??,case?when?a.d18?=?b.d18?then?1?else?0?end?as?d18_jp
??,case?when?a.d19?=?b.d19?then?1?else?0?end?as?d19_jp
??from?table?a?
??left?join?table?b?
??on?a.id?=?b.id?
)?aa
where?sum(d1_jp,d2_jp……,d20_jp)?=?19
第八題
我們把用戶對商品的評分用稀疏向量表示,保存在數(shù)據(jù)庫表t里面:
t的字段有:uid,goods_id,star。uid是用戶id goodsid是商品id = star是用戶對該商品的評分,值為1-5
現(xiàn)在我們想要計算向量兩兩之間的內(nèi)積,內(nèi)積在這里的語義為:
對于兩個不同的用戶,如果他們都對同樣的一批商品打了分,那么對于這里面的每個人的分數(shù)乘起來,并對這些乘積求和。
例子,數(shù)據(jù)庫表里有以下的數(shù)據(jù):
U0 g0 2
U0 g1 4
U1 g0 3
U1 g1 1
計算后的結(jié)果為:
U0 U1 23+41=10 ……
select?aa.uid1,aa.uid2
,sum(star_multi)?as?result
from
(
??select?a.uid?as?uid1
??,b.uid?as?uid2
??,a.goods_id
??,a.star?*?b.star?as?star_multi
??from?t?a?
??left?join?t?b?
??on?a.goods_id?=?b.goods_id
??and?a.udi<>b.uid??
)?aa?
group?by?1,2
select?uid1,uid2,sum(multiply)?as?result
from
(select?t.uid?as?uid1,?t.uid?as?uid2,?goods_id,a.star*star?as?multiply
from?a?left?join?b?
on?a.goods_id?=?goods_id
and?a.uid<>uid)?aa
group?by?goods
第九題
給出一堆數(shù)和頻數(shù)的表格,統(tǒng)計這一堆數(shù)中位數(shù)
select?a.*
,b.s_mid_n
,c.l_mid_n
,avg(b.s_mid_n,c.l_mid_n)
from
(
??select
??case?when?mod(count(*),2)?=?0?then?count(*)/2?else?(count(*)+1)/2?end?as?s_mid
??,case?when?mod(count(*),2)?=?0?then?count(*)/2+1?else?(count(*)+1)/2?end?as?l_mid??
??from?table
)?a?
left?join
(
??select?id,num,row_number()?over(partition?by?id?order?by?num?asc)?nn
??from?table
)?b?
on?a.s_mid?=?b.nn
left?join
(
??select?id,num,row_number()?over(partition?by?id?order?by?num?asc)?nn
??from?table
)?c??
on?a.l_mid?=?c.nn
第十題
表order有三個字段,店鋪ID,訂單時間,訂單金額
查詢一個月內(nèi)每周都有銷量的店鋪
select?distinct?credit_level
from
(
??select?credit_level,count(distinct?nn)?as?number
??from
??(
????select?userid,credit_level,inserttime,month(inserttime)?as?mm
????,weekofyear(inserttime)?as?week
????,dense_rank()?over(partition?by?credit_level,month(inserttime)?order?by?weekofyear(inserttime)?asc)?as?nn
????from?koo.nil_temp0222?
????where?substring(inserttime,1,7)?=?'2019-12'
????order?by?credit_level?,inserttime??
??)?aa?
??group?by?1??
)?bb
where?number?=?(select?count(distinct?weekofyear(inserttime))
from?koo.nil_temp0222?
where?substring(inserttime,1,7)?=?'2019-12')

