Hi,
I need to work with a very large data table on impala.
Incremental refresh works fine but then the problem I m facing is the initial load.
We need Rolling 1 year data in this table and this can be very large (we only have 2 months data in the impala table)
Workspace was upgraded to P2 premium capacity recently and we cannot upgrade for a new plan as of now.
No other query is being executed in this workspace at the time of refresh. So I need a workaround which can help me load the large amount of data on first load.
I tried the below approach....
1. I created a static table with one column containing Business units (Filter criteria for resultset)
2. Created function that queries for the filtered data and invoked it as a custom column (Works fine without incr refresh)
3. Enabled incremental refresh on a datetime column which was part of the result set... and I get an error shown below
Error: PipelineException: Access was denied. .. RootActivityId = b7f50a68-a02d-41f0-875d-9e23052b6000.Param1 = PipelineException: Access was denied. Request ID: 55c5e57a-ea3c-5fd8-6f84-c66617914a9d.
Note: While I can create separate table (BU wise an laod this data with incr refresh and then append these tables. I would like to avoid creating multiple tables. I m also inteersted to know the reason for failure in the above approach.
Final Output Query:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcjM0UIrVAdGGUNoIShuDaXeovDtU3h0q7w6V94DKe0HlvaF8bzjfCEoD1ccCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [busunit = _t]),
#"Added custom" = Table.AddColumn(Source, "Custom", each Query([busunit])),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added custom", "Custom", {"calday"}, {"calday"}),
#"Transform columns" = Table.TransformColumnTypes(#"Expanded Custom", {{"calday", type datetime}}),
#"Replace errors" = Table.ReplaceErrorValues(#"Transform columns", {{"calday", null}}),
#"Table-63616C646179-autogenerated_for_incremental_refresh" = Table.SelectRows(#"Replace errors", each DateTime.From([calday]) >= RangeStart and DateTime.From([calday]) < RangeEnd)
in
#"Table-63616C646179-autogenerated_for_incremental_refresh"
Get Query (Function):
let
Query = (BU as text) => let
DB = Impala.Database(@Server),
Schema = DB{[Name = "MySchema", Kind = "Schema"]}[Data],
Data = Schema{[Name = "MyTable", Kind = "Table"]}[Data],
#"Filtered rows" = Table.SelectRows(Data, each [busunit] = BU),
#"Removed other columns" = Table.SelectColumns(#"Filtered rows", {"calday"}),
#"Removed duplicates" = Table.Distinct(#"Removed other columns")
in
#"Removed duplicates"
in
Query
Thanks & regards,
Mannu