profile pic
⌘ '
raccourcis clavier

Keys

KK is a key if KK uniquely determines all of RR and no subset of KK does

K is a superkey for relation RR if KK contains a key of RR

see also: keys

functional dependency

Think of it as ”XYX \to Y holds in RR

convention: no braces used for set of attributes, just ABCABC instead of {A,B,C}\{A,B,C\}

properties

  • splitting/combining
  • trival FDs
  • Armstrong’s Axioms

FD are generalisation of keys

superkey: XRX \to R, must include all the attributes of the relation on RHS

trivial

AAABAABCADABCD\begin{aligned} A &\to A \\ AB &\to A \\ ABC &\to AD \coloneqq ABC \to D \end{aligned}

always hold (right side is a subset)

splitting/combining right side of FDs

XA1A2An holds for R X \to A_{1} A_{2} \ldots A_{n} \text{ holds for R }

when each of XA1X \to A_{1}, XA2X \to A_{2}, …, XAnX \to A_{n} holds for RR

ex: ABCA \to BC is equiv to ABA \to B and ACA \to C

ex: AFA \to F and AGA \to G can be written as AFGA \to FG

Armstrong’s Axioms

Given X,Y,ZX,Y,Z are sets of attributes

rules

RuleDescription
ReflexivityIf YXY \subseteq X, then XYX \to Y
AugmentationIf XYX \to Y, then XZYZXZ \to YZ for any ZZ
TransitivityIf XYX \to Y and YZY \to Z, then XZX \to Z
UnionIf XYX \to Y and XZX \to Z, then XYZX \to YZ
DecompositionIf XYZX \to YZ, then XYX \to Y and XZX \to Z

dependency inference

ACA \to C is implied by {AB,BC}\{A \to B, B \to C\}

transitivity

example: Key

List all the keys of R(A,B,C,D)R(A,B,C,D) with the following FDs:

  • BCB \to C
  • BDB \to D

sol:

BC and BD(given)BCD(Union)ABACD(Augmentation)ABABCD(Reflexivity and Union)\begin{aligned} B \to C &\text{ and } B \to D &(\text{given})\\ B &\to CD &(\text{Union})\\ AB &\to ACD &(\text{Augmentation})\\ AB &\to ABCD &(\text{Reflexivity and Union})\\ \end{aligned}

closure test

Given attribute set YY and FD set FF, we have YF+Y_F^{+} is the closure of YY relative to FF

Or set of all FDs given/implied by YY

Steps:

  • Start: YF+=Y,F=FY_F^{+}=Y, F^{'}=F
  • While there exists a fFf \in F^{'} s.t LHS(F)YF+\text{LHS}(F) \subseteq Y_F^{+}:
    • YF+=YF+RHS(f)Y_F^{+} = Y_F^{+} \cup \text{RHS}(f)
    • F=FfF^{'} = F^{'} - f
  • End: YBBYF+Y \to B \forall B \in Y_F^{+}

minimal basis

The idea is to remove redundant FDs.

for minimal cover for FDs

  • Right sides are single attributes
  • No FDs can be removed, otherwise FF^{'} is no longer a minimal basis
  • No attribute can be removed from a LEFT SIDE

construction:

  1. decompose RHS to single attributes
  2. repeatedly try to remove a FD to see if the remaining FDs are equivalent to the original set
    • or finFtest whether J=(Ff)+\forall f in F^{'} \mid \text{test whether } J=(F^{'}-f)^{+} is equivalent to F+F^{+}
  3. repeatedly try to remove an attribute from LHS to see if the removed attribute can be derived from the remaining FDs.

Schema decomposition

goal: avoid redundancy and minimize anomalies (update and deletion) w/o losing information

One can also think of projecting FDs as geometric projections within a given FDs space

good properties to have

  • lossless join decomposition (should be able to reconstruct after decomposed)
  • avoid anomalies (redundant data)
  • preservation: (F1F2Fn)+=F+(F_{1} \cup F_{2} \cup \ldots \cup F_n)^{+} = F^{+}

A lossy decomposition results in the reconstruction of components to include additional information that is not in the original constructions

how can we test for losslessness?

A binary decomposition of R=(R,F)R=(R,F) into R1=(R1,F1)R_{1}=(R_{1},F_{1}) and R2=(R2,F2)R_{2}=(R_{2},F_{2}) is lossless iff:

  1. (R1R2)R1(R_{1} \cap R_{2}) \to R_{1} is the F+F^{+}
  2. (R1R2)R2(R_{1} \cap R_{2}) \to R_{2} is the F+F^{+}

if R1R2R_{1} \cap R_{2} form a superkey of either R1R_{1} or R2R_{2}, then decomposition of RR is lossless

Projection

  • Starts with Fi=F_i = \emptyset
  • For each subset X of RiX \text{ of } R_i
    • Compute X+X^{+}
    • For each attribute AX+A \in X^{+}
      • If AA in RiR_i: add XAX \to A to FiF_i
  • Compute minimal basis of FiF_i

Normal forms

BCNF3NF2NF1NF\text{BCNF} \subseteq 3\text{NF} \subseteq 2\text{NF} \subseteq 1\text{NF}
Normal FormDefinitionKey RequirementsExample of ViolationHow to Fix
First Normal Form (1NF)All columns contain atomic values and there are no repeating groups.- Each cell holds a single value (atomicity)
- No repeating groups or arrays
A column storing multiple phone numbers in a single cell (e.g., “123-4567, 234-5678”).Split the values into separate rows or columns so each cell is atomic.
Second Normal Form (2NF)A 1NF table where every non-key attribute depends on the whole of a composite key.- Already in 1NF
- No partial dependencies on a composite primary key
A table with a composite primary key (e.g., StudentID, CourseID) where a non-key attribute (e.g., StudentName) depends only on StudentID.Move attributes that depend on only part of the key into a separate table.
Third Normal Form (3NF)A 2NF table with no transitive dependencies.- Already in 2NF
- No transitive dependencies (non-key attributes depend only on the key, not on other non-key attributes)
A table where AdvisorOffice depends on AdvisorName, which in turn depends on StudentID.Put attributes like AdvisorName and AdvisorOffice in a separate Advisor table keyed by AdvisorID.
Boyce-Codd Normal Form (BCNF)A stronger version of 3NF where every determinant is a candidate key.- For every functional dependency X → Y, X must be a candidate keyA table where Professor → Course but Professor is not a candidate key.Decompose the table so that every functional dependency has a candidate key as its determinant.

1NF

no multi-valued attributes allowed

idea: think of storing a list of a values in an attributes

counter: Course(name, instructor, [student, email]*)

2NF

non-key attributes depend on candidate keys

idea: consider non-key attribute AA, then there exists an FD XX s.t. XAX \to A and XX is a candidate key

Second normal form, hwere AuthorName is dependent on AuthorID
Second normal form, hwere AuthorName is dependent on AuthorID

3NF

non-prime attribute depend only on candidate keys

idea: consider FD XAX \to A, then either XX is a superkey, or AA is prime (part of a key)

counter: studiostudioAddr\text{studio} \to \text{studioAddr}, where studioAddr depends on studio which is not a candidate key

Three normal form counter example
Three normal form counter example

theorem

It is always possible to convert a schema to lossless join, dependency-preserving 3NF

what you get from 3NF

  • Lossless join
  • dependency preservation
  • anomalies (doesn’t guarantee)

Boyce-Codd normal form (BCNF)

on additional restriction over 3NF where all non-trivial FDs have superkey LHS

theorem

We say a relation RR is in BCNF if XAX \to A is a non-trivial FD that holds in RR, and XX is a superkey 1

what you get from BCNF

  • no dependency preservation (all original FDs should be satisfied)
  • no anomalies
  • Lossless join

decomposition into BCNF

relation RR with FDs FF, look for a BCNF violation XYX \to Y (XX is not a superkey)

  • Compute X+X^{+}
    • find X+X all attributes X^{+} \neq X \neq \text{ all attributes } (XX is a superkey)
  • Replace RR by relations with
    • R1=X+R_{1} = X^{+}
    • R2=R(X+X)=RX+XR_{2} = R - (X^{+} - X) = R - X^{+} \cup X
  • Continue to recursively decompose the two new relations
  • Project given FDs FF onto the two new relations.

Remarque

  1. means AA is not contained in XX