author Julian T <julian@jtle.dk> 2022-01-12 08:58:34 +0100
committer Julian T <julian@jtle.dk> 2022-01-12 08:58:34 +0100
commit 2b780bf79aa3b5d835442687b76ad3c42b2ce44a
tree 16978fae6f80fb22215a7a8db5c6dbdb1fe4d8de
parent 73824760a31860d93ae9818a094ad5cef9037f8d
ETL notes
-rw-r--r-- sem7/db/eksamnen.md 12
1 files changed, 12 insertions, 0 deletions
diff --git a/sem7/db/eksamnen.md b/sem7/db/eksamnen.md
index d9386db..5c6da48 100644
--- a/sem7/db/eksamnen.md
+++ b/sem7/db/eksamnen.md
@@ -2,12 +2,14 @@
# TODO
- Do exercise 2.3 in distributed thing
+ - Read about view maintenance
# Words and Things
- **ROLAP**: Relational Online Analytical Processing
- *Summarizability* on page 52 in DW book.
- *Data marts* are subsets of a data warehouse, each containing only a single subject such as sales.
+ - *Heterogeneity* page 19 in parallel book
# Nice Spatial SQL Commands
@@ -427,6 +429,8 @@ This should therefore not be used for user queries.
Here large sequential operations are done to transform data, in a process that should be easy to restart etc.
When done data can be copied to data marts.
+Slide 25 gives a procedure for setting up an ETL plan.
+
## Extract
Data can either be copied from **cooperative sources**, such as replication mechanisms or callbacks.
@@ -441,3 +445,11 @@ This process of finding deltas is called **changed data capture** (CDC), where o
This can be data conversions such as string encoding or date/time representation.
+This is also related to **data quality**.
+Slide 14 states some requirements for data in the DW.
+
+## Load
+
+SQL is often slow for loading into the DW.
+If it is used, the index should be dropped before loading and rebuilt afterwards.
+
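The Load note added in this commit (drop the index, load, then rebuild it) could be sketched roughly as below. This is only an illustration using Python's built-in `sqlite3` module as a stand-in for a real DW. The table `sales`, the index `idx_sales_product`, and the helper `bulk_load` are hypothetical names that do not appear in the notes.

```python
import sqlite3


def bulk_load(conn, rows):
    """Load rows into the hypothetical `sales` table, rebuilding its index afterwards."""
    # Drop the index so it is not maintained once per inserted row.
    conn.execute("DROP INDEX IF EXISTS idx_sales_product")
    # Insert all rows in a single transaction (committed when the block exits).
    with conn:
        conn.executemany(
            "INSERT INTO sales (product_id, amount) VALUES (?, ?)", rows
        )
    # Rebuild the index in one pass over the loaded data.
    conn.execute("CREATE INDEX idx_sales_product ON sales (product_id)")
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


# Assumed setup: an in-memory database with one indexed table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_sales_product ON sales (product_id)")

loaded = bulk_load(conn, [(i % 10, float(i)) for i in range(1000)])
```

In a real DW a dedicated bulk loader would usually replace the `executemany` call; the drop-and-rebuild pattern around it stays the same.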