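Today I was writing Hive HSQL again, and once more it was the same repetitive pv/uv calculation (annoying). Worse, it was the "per category first, then the total" kind: compute pv and uv for PC, pv and uv for mobile, and then the overall pv and uv. The overall pv is easy, just PC plus mobile, but uv has to be de-duplicated all over again. This bugs me every time, because I can't handle it quickly in a single HSQL statement (maybe I'm a bit obsessive). So I squeezed some time out of the workday to test a few different ways of writing it and compare their efficiency.

1. This is how I used to compute the total pv/uv together with the per-category pv/uv:

    SELECT a.type, a.pv, a.uv
    FROM (
        SELECT type, count(1) AS pv, count(DISTINCT uid) AS uv
        FROM t1
        WHERE dt = '201410129'
          AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
        GROUP BY type
        UNION ALL
        SELECT 'all' AS type, count(1) AS pv, count(DISTINCT uid) AS uv
        FROM t1
        WHERE dt = '201410129'
          AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
    ) a

Note: distinct is convenient to write, but its efficiency is really poor, so my advice is to never use distinct. A count(DISTINCT ...) with no GROUP BY, like the 'all' branch here, typically ends up de-duplicating every row in a single reducer.

2. So the statement can be changed to:

    SELECT a.type, sum(pv), count(uid)
    FROM (
        SELECT type, count(1) AS pv, uid
        FROM t1
        WHERE dt = '201410129'
          AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
        GROUP BY uid, type
        UNION ALL
        SELECT 'all' AS type, count(1) AS pv, uid
        FROM t1
        WHERE dt = '201410129'
          AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
        GROUP BY uid
    ) a
    GROUP BY type

This was somewhat more efficient, and I used it this way for quite a while, but it never felt right: it seemed like I wasn't really making use of union all.

3. Only today did I realize that the group by should not be written inside the union all branches. It really hurts efficiency, and written as above it also produces more jobs. So I promptly changed it to:

    SELECT type, sum(pv), count(uid)
    FROM (
        SELECT a.type, sum(pv) AS pv, uid
        FROM (
            SELECT type, 1 AS pv, uid
            FROM t1
            WHERE dt = '201410129'
              AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
            UNION ALL
            SELECT 'all' AS type, 1 AS pv, uid
            FROM t1
            WHERE dt = '201410129'
              AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'
        ) a
        GROUP BY uid, type
    ) b
    GROUP BY type

Testing confirmed it: the efficiency is great.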
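Before swapping one version for another, it is worth checking that all three really return the same numbers. Below is a minimal sketch of such a check in Python, using an in-memory SQLite database as a stand-in for Hive. The sample rows, the `mobile` type value, and the helper name `run_all` are made up for illustration. SQLite says nothing about Hive's job count or speed, so this only verifies that the results agree.

```python
import sqlite3

# Hypothetical sample rows (dt, type, uid, req_url); not taken from the post.
ROWS = [
    ("201410129", "pc", "u1", "mbloglist?domain=100808&ajwvr=6&page=1"),
    ("201410129", "pc", "u1", "mbloglist?domain=100808&ajwvr=6&page=2"),
    ("201410129", "pc", "u2", "mbloglist?domain=100808&ajwvr=6&page=1"),
    ("201410129", "mobile", "u1", "mbloglist?domain=100808&ajwvr=6&page=1"),
    ("201410129", "mobile", "u3", "mbloglist?domain=100808&ajwvr=6&page=1"),
    ("201410129", "mobile", "u3", "other?x=1"),  # dropped by the LIKE filter
    ("20141028", "pc", "u4", "mbloglist?domain=100808&ajwvr=6&page=1"),  # other dt
]

# The filter shared by every branch of the three queries.
WHERE = "dt = '201410129' AND req_url LIKE 'mbloglist?domain=100808&ajwvr=6%'"

# Version 1: count(DISTINCT uid) in both branches.
Q1 = f"""
SELECT a.type, a.pv, a.uv FROM (
    SELECT type, count(1) AS pv, count(DISTINCT uid) AS uv
    FROM t1 WHERE {WHERE} GROUP BY type
    UNION ALL
    SELECT 'all' AS type, count(1) AS pv, count(DISTINCT uid) AS uv
    FROM t1 WHERE {WHERE}
) a
"""

# Version 2: GROUP BY inside each UNION ALL branch.
Q2 = f"""
SELECT a.type, sum(pv), count(uid) FROM (
    SELECT type, count(1) AS pv, uid FROM t1 WHERE {WHERE} GROUP BY uid, type
    UNION ALL
    SELECT 'all' AS type, count(1) AS pv, uid FROM t1 WHERE {WHERE} GROUP BY uid
) a GROUP BY type
"""

# Version 3: UNION ALL first, then one GROUP BY per level.
Q3 = f"""
SELECT type, sum(pv), count(uid) FROM (
    SELECT a.type AS type, sum(pv) AS pv, uid FROM (
        SELECT type, 1 AS pv, uid FROM t1 WHERE {WHERE}
        UNION ALL
        SELECT 'all' AS type, 1 AS pv, uid FROM t1 WHERE {WHERE}
    ) a GROUP BY uid, type
) b GROUP BY type
"""


def run_all():
    """Run the three versions on the sample rows; return sorted result lists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (dt TEXT, type TEXT, uid TEXT, req_url TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", ROWS)
    results = [sorted(conn.execute(q).fetchall()) for q in (Q1, Q2, Q3)]
    conn.close()
    return results


if __name__ == "__main__":
    for name, rows in zip(("v1", "v2", "v3"), run_all()):
        print(name, rows)
```

If the three lists match on a small sample that includes a user active on both PC and mobile (here `u1`), the uv de-duplication for the `all` row is behaving the same way in every version.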