structured streaming
defining a table on a stream
continuous queries
handling event-time
tumble(time_attr, interval),定义一个个连续的时间窗口,这样每行数据只可能出现在一个窗口内,窗口之间不会出现重叠defines a tumbling time window. a tumbling time window assigns rows to non-overlapping, continuous windows with a fixed duration (interval). for example, a tumbling window of 5 minutes groups rows in 5 minutes intervals. tumbling windows can be defined on event-time (stream + batch) or processing-time (stream).
tumble_start(time_attr, interval). 返回时间窗口的下限时间戳.returns the timestamp of the inclusive lower bound of the corresponding tumbling, hopping, or session window.
handling late data
bob 12:54:00 ./xxx 到达时间14:01:00如何处理?
watermarks定义在ctime,允许延迟2hour, 14:00:00-2hour<13:00:00,窗口12:00:00-13::00:00仍保持
watermarks定义在ctime,允许延迟5min,14:00:00-5min>13:00:00,时间窗口12:00:00-13:00:00已过期,数据被丢弃
如对本文有疑问, 点击进行留言回复!!
HBase Filter 过滤器之FamilyFilter详解
去 HBase,Kylin on Parquet 性能表现如何?
如何找到Hive提交的SQL相对应的Yarn程序的applicationId
网友评论