author Julian T <julian@jtle.dk> 2021-11-25 08:47:19 +0100
committer Julian T <julian@jtle.dk> 2021-11-25 08:47:19 +0100
commit 890ad2bcee172ab2a4cbb319145f5b42ba38619a
tree c7cc445410379cb5ee4b37b18f2d45a56e680b28 /sem7/dist/lec_google
parent 7f57150038a90f634dd27b25bd9bba05c461c22a
Add notes and assignment solution
Diffstat (limited to 'sem7/dist/lec_google')
-rw-r--r-- sem7/dist/lec_google/notes.md | 55
1 files changed, 55 insertions, 0 deletions
diff --git a/sem7/dist/lec_google/notes.md b/sem7/dist/lec_google/notes.md
new file mode 100644
index 0000000..a246d60
--- /dev/null
+++ b/sem7/dist/lec_google/notes.md
@@ -0,0 +1,55 @@
+# GFS Design
+
+GFS is optimized for Google's workload, which consists of:
+
+ - Large sequential reads, with few random reads
+ - Frequent concurrent appends, with few random writes.
+
+ We therefore have many writers (e.g. web crawlers) and a single reader (e.g. the indexer).
+
+ - Sustained throughput is more important than latency.
+
+## Appending
+
+When crawlers append records to the file that the indexer reads, the exact offset at which each record lands does not matter.
+GFS therefore gives up client control over the append offset, which allows many writers to append at the same time.
+
+## Chunking
+
+Large files are split into fixed-size chunks, and each chunk is replicated on several chunkservers.
+
+A *chunkhandle* is the unique identifier of a single chunk.
+The *master* then holds a dictionary that maps a file name and offset to a chunkhandle and a list of servers.
+The list contains the chunkservers that hold a replica of the wanted chunk.
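Annotation on the lookup described above: a minimal sketch, assuming the master keys its dictionary on (file name, chunk index) and that chunks are 64 MiB, the size reported for GFS. The `Master` and `ChunkLocation` classes, the file path, and the server names are hypothetical and only illustrate the mapping, not the real implementation.

```python
from dataclasses import dataclass

CHUNK_SIZE = 64 * 2**20  # 64 MiB; assumed fixed chunk size


@dataclass(frozen=True)
class ChunkLocation:
    handle: int      # chunkhandle: identifier of one chunk
    servers: tuple   # chunkservers holding a replica (hypothetical names)


class Master:
    """Toy in-memory master: (file name, chunk index) -> ChunkLocation."""

    def __init__(self):
        self._table = {}

    def register(self, name, chunk_index, handle, servers):
        self._table[(name, chunk_index)] = ChunkLocation(handle, tuple(servers))

    def lookup(self, name, offset):
        # A byte offset falls into exactly one fixed-size chunk.
        chunk_index = offset // CHUNK_SIZE
        return self._table[(name, chunk_index)]


master = Master()
master.register("/crawl/pages", 0, handle=0xA1, servers=["cs1", "cs2", "cs3"])
master.register("/crawl/pages", 1, handle=0xB2, servers=["cs2", "cs4", "cs5"])

# An offset of 70 MiB lies in the second chunk (index 1).
loc = master.lookup("/crawl/pages", offset=70 * 2**20)
print(hex(loc.handle), loc.servers)
```

The client would then contact one of the returned servers directly for the data, so the master only serves metadata.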
+
+## Replication
+
+**TODO: Read up on this.**
+
+Ways of writing data to replicas:
+ - *Passive replication*
+
+## Consistency
+
+File creation and changes (*namespace mutations*) are done by the master alone, so they are atomic.
+
+A file region is:
+ - *Consistent* if all clients see the same data, regardless of which replica they read from
+ - *Defined* if it is consistent and clients see what a mutation wrote in its entirety.
+ So clients will not see a partial write. (**TODO** Not entirely sure about this)
+
+Records are appended atomically *at least once somewhere*.
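Annotation on *at least once somewhere*: a minimal sketch, assuming a writer simply retries an append whenever the acknowledgement is lost, and that each record carries an id the reader can use to drop duplicates. `Chunk`, `record_append`, `read_unique`, and the simulated failure rate are all hypothetical; real GFS also has to deal with padding and replicas, which is left out here.

```python
import random


class Chunk:
    """Toy chunk: the storage side, not the client, picks the offset."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset chosen by the chunk


def record_append(chunk, record, rng, fail_rate=0.5):
    """Retry until acknowledged; a lost ack leaves a duplicate behind."""
    while True:
        offset = chunk.append(record)
        if rng.random() >= fail_rate:  # ack arrived
            return offset
        # Ack lost (simulated): the record is stored, but we retry anyway.


def read_unique(chunk):
    """Reader side: drop duplicates using the id carried in each record."""
    seen = set()
    out = []
    for rec_id, payload in chunk.records:
        if rec_id not in seen:
            seen.add(rec_id)
            out.append((rec_id, payload))
    return out


rng = random.Random(0)
chunk = Chunk()
for i in range(5):
    record_append(chunk, (i, f"page-{i}"), rng)

unique = read_unique(chunk)
print(len(chunk.records), "stored,", len(unique), "unique")
```

Every record is stored at least once, possibly several times, and the writer never chooses the offset. Filtering by id on the reader side recovers each record exactly once.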
+
+## Master Fault Tolerance
+
+Master state is maintained in stable storage.
+
+There can be multiple *shadow masters*, which provide read-only access.
+
+An external program detects master failure and selects a new master.
+
+# Chubby
+
+A lock service that is designed to be highly reliable.
+
+Locks are held for a very long time, hours or days, which is why they are called *coarse-grained* locks.