hive 之 row_number等窗口分析函数用法

释放双眼,带上耳机,听听看~!

一、排序&去重分析

row_number() over(partititon by col1 order by col2) as rn

结果:1,2,3,4

rank() over(partititon by col1 order by col2) as rk

结果:1,2,2,4,5

dense_rank() over(partititon by col1 order by col2) as ds_rk

结果:1,2,2,3,4

上HQL语句:

select
order_id,
departure_date,
row_number() over(partition by order_id order by departure_date) as rn, — 直排
rank() over(partition by order_id order by departure_date) as rk, — 并列的,下一个数字会跳过
dense_rank() over(partition by order_id order by departure_date) as d_rk — 并列的,下一个数据不会跳过
from ord_test
where order_id=410341346;

二、跨行获取

lag(col1,n,DEFAULT) over(partition by col1 order by col2) as up

用于统计窗口内往上第n行值,第一个参数为列名,第二个参数为往上第n行(可选,默认为1),第三个参数为默认值(当往上第n行为NULL时候,取默认值,如不指定,则为NULL)

lead(col1,n,DEFAULT) over(partition by col1 order by col2) as down

用于统计窗口内往下第n行值,第一个参数为列名,第二个参数为往下第n行(可选,默认为1),第三个参数为默认值(当往下第n行为NULL时候,取默认值,如不指定,则为NULL)

first_value() over(partition by col1 order by col2) as fv

取分组内排序后,截止到当前行,第一个值

last_value() over(partition by col1 order by col2) as lv

取分组内排序后,截止到当前行,第一个值

select 

order_id,

departure_date,

first_value(departure_date) over(partition by order_id order by add_time)as fv, -- 取分组内第一条

last_value(departure_date) over(partition by order_id order by add_time)as lv -- 取分组内最后一条 

from ord_test

where order_id=410341346;

select 

order_id,

departure_date,

lead(departure_date,1) over(partition by order_id order by departure_date)as down_1, -- 向下取一级

lag(departure_date,1) over(partition by order_id order by departure_date)as up_1 -- 向上取一级

from ord_test

where order_id=410341346;

人已赞赏
首页

BT种子下载工具添加Tracker服务器以加快下载速度

2019-12-25 0:07:01

首页

SSpanel面板搬迁后遇到的HTTP500问题解决

2019-12-27 2:32:01

0 条回复 A文章作者 M管理员
    暂无讨论,说说你的看法吧
个人中心
购物车
优惠劵
今日签到
有新私信 私信列表
搜索