From 2b780bf79aa3b5d835442687b76ad3c42b2ce44a Mon Sep 17 00:00:00 2001
From: Julian T
Date: Wed, 12 Jan 2022 08:58:34 +0100
Subject: ETL notes

---
 sem7/db/eksamnen.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

(limited to 'sem7/db')

diff --git a/sem7/db/eksamnen.md b/sem7/db/eksamnen.md
index d9386db..5c6da48 100644
--- a/sem7/db/eksamnen.md
+++ b/sem7/db/eksamnen.md
@@ -2,12 +2,14 @@
 # TODO
 
 - Do exercise 2.3 in distributed thing
+- Read about view maintenance
 
 # Words and Things
 
 - **ROLAP**: Relational Online Analytical Processing
 - *Summarizability* on page 52 in DW book.
 - *Data marts* are subsets of a data warehouse, each containing only a single subject such as sales.
+- *Heterogeneity* on page 19 in parallel book.
 
 # Nice Spatial SQL Commands
 
@@ -427,6 +429,8 @@ This should therefore not be used for user queries.
 Here large sequential operations are done to transform data, in a process that should be easy to restart.
 When done, the data can be copied to data marts.
 
+Slide 25 gives a procedure for setting up an ETL plan.
+
 ## Extract
 
 Data can either be copied from **cooperative sources**, such as replication mechanisms or callbacks.
@@ -441,3 +445,11 @@ This process of finding deltas is called **changed data capture** (CDC), where o
 
 This can be data conversions such as string encoding or date/time representation.
 
+This is also related to **data quality**.
+Slide 14 states some requirements for data in a DW.
+
+## Load
+
+SQL is often slow for loading into a DW.
+If it is used, indexes should be dropped before loading and rebuilt afterwards.
+
-- 
cgit v1.2.3
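
The Load note in this patch (drop the index, load, then reindex) can be sketched roughly as below. This is only an illustration and not part of the notes: it uses Python's built-in `sqlite3` with an in-memory database, and the `sales` table, the `idx_sales_product` index and the `bulk_load` helper are all hypothetical names chosen for the example.

```python
import sqlite3


def bulk_load(conn, rows):
    """Drop the index, insert all rows in one batch, then rebuild the index."""
    cur = conn.cursor()
    # Without the index, each insert avoids the cost of index maintenance.
    cur.execute("DROP INDEX IF EXISTS idx_sales_product")
    cur.executemany(
        "INSERT INTO sales (product_id, amount) VALUES (?, ?)", rows
    )
    # Rebuild the index once, over the fully loaded table.
    cur.execute("CREATE INDEX idx_sales_product ON sales (product_id)")
    conn.commit()


# Hypothetical fact table with one index, created for the example only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_product ON sales (product_id)")

bulk_load(conn, [(i % 10, float(i)) for i in range(1000)])
```

A real warehouse would more likely use the DBMS's own bulk loader than row-wise SQL inserts, which is the point of the "SQL is often slow" remark; the sketch only shows the ordering of the drop, load and reindex steps.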