Data Normalization

Normalization is a technique for splitting a complex table into simple, meaningful tables. Tables, rows, and the relationships between tables are arranged to avoid data redundancy and to attain data integrity. The goals are:

01. To avoid data redundancy (duplicated data).

02. To attain data integrity, so that data does not go into an inconsistent state (unstable or irrelevant).

03. To put data into the correct table.

04. To avoid CRUD anomalies.

Duplicated data must be prevented, and so must the storing of irrelevant data. Each table should receive only the data that belongs in it, and CRUD operations against the DB must not cause anomalies (such as data being deleted unintentionally, unwanted data being added, or data that cannot be searched for).

Partial dependency

A dependency in which a non-key attribute is functionally dependent on only part of a composite key.

Full dependency

A dependency in which a non-key attribute is functionally dependent on the complete composite key.

Transitive dependency

A dependency in which a non-key attribute becomes the determinant of another non-key attribute.
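The three kinds of dependency above can be checked against sample rows. The following is a minimal sketch in Python; the helper `determines`, the column names, and the data are all hypothetical and chosen only for illustration. The composite key is assumed to be (emp_id, proj_id). Note that a check on sample data can only show that the rows do not contradict a dependency; it cannot prove the dependency holds in general.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, the lhs columns determine the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical data; the composite key is assumed to be (emp_id, proj_id).
rows = [
    {"emp_id": 1, "proj_id": 10, "emp_name": "A", "dept_id": 7, "dept_name": "R&D", "hours": 5},
    {"emp_id": 1, "proj_id": 11, "emp_name": "A", "dept_id": 7, "dept_name": "R&D", "hours": 8},
    {"emp_id": 2, "proj_id": 10, "emp_name": "B", "dept_id": 7, "dept_name": "R&D", "hours": 3},
    {"emp_id": 3, "proj_id": 11, "emp_name": "C", "dept_id": 9, "dept_name": "QA", "hours": 8},
]

# Partial dependency: emp_name depends on emp_id, which is only part of the key.
partial = determines(rows, ["emp_id"], ["emp_name"])

# Full dependency: hours needs the whole key; neither part alone is enough.
full = (
    determines(rows, ["emp_id", "proj_id"], ["hours"])
    and not determines(rows, ["emp_id"], ["hours"])
    and not determines(rows, ["proj_id"], ["hours"])
)

# Transitive dependency: the non-key dept_id determines the non-key dept_name.
transitive = determines(rows, ["dept_id"], ["dept_name"])

print(partial, full, transitive)
```

In this sample all three checks come out true, so the table contains one example of each kind of dependency.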

Database normalization, in other words optimization, is carried out in several steps.

01. First normal form (1NF)

02. Second normal form (2NF)

03. Third normal form (3NF)

04. Fourth normal form (4NF), and so on.

However, the discussion usually goes only as far as 3NF, and normalizing to that level is sufficient in most cases.

This is explained with the example below.

Table one.

Table two.

Data redundancy (duplicated data) - Some column values are repeated multiple times (a_gold, agency_n1, etc.). This should be avoided.

CRUD anomalies.

CREATE - A new agency is added, but it has no agent yet, so there is no agent id for the row either.

UPDATE - When the agency name (agency_n1) is updated, every occurrence has to be updated, because the name is repeated across rows.

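DELETE - When an agent is deleted, the agency and customer details stored with that agent are deleted too (the agency details are lost completely when that agent was the agency's only one). The data goes into an inconsistent state (unstable, irrelevant), which is a data integrity problem.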

READ - Finding one customer means checking every customer column, which is a problem:

SELECT * FROM FilmAgent WHERE customer_1 = 'aaa' OR customer_2 = 'aaa' OR customer_3 = 'aaa';

If the results need to be sorted by customer name, that cannot be done, because there are three customer fields.

1NF

Each column should be atomic.

There should not be a group of columns representing similar information (customer_1, customer_2, customer_3).

There should be a key that uniquely identifies each row in the table. It can be one column or a combination of columns (primary key or composite key).
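As a rough illustration of the 1NF rules above, the sketch below uses Python's built-in sqlite3 module. The FilmAgent table and the customer_1 to customer_3 columns come from the example in this post; the AgentCustomer table, the other column names, and the sample rows are assumptions made only for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Before 1NF: a repeating group of customer columns.
CREATE TABLE FilmAgent (
    agent_id   INTEGER PRIMARY KEY,
    agent_name TEXT,
    customer_1 TEXT,
    customer_2 TEXT,
    customer_3 TEXT
);
INSERT INTO FilmAgent VALUES (1, 'a_gold', 'ccc', 'aaa', NULL);
INSERT INTO FilmAgent VALUES (2, 'a_silver', 'bbb', NULL, NULL);

-- After 1NF: one customer per row, identified by a composite key.
CREATE TABLE AgentCustomer (
    agent_id      INTEGER NOT NULL,
    customer_name TEXT NOT NULL,
    PRIMARY KEY (agent_id, customer_name)
);
INSERT INTO AgentCustomer VALUES (1, 'ccc'), (1, 'aaa'), (2, 'bbb');
""")

# Before: every customer column has to be tested.
before = conn.execute(
    "SELECT agent_id FROM FilmAgent "
    "WHERE customer_1 = ? OR customer_2 = ? OR customer_3 = ?",
    ("aaa",) * 3,
).fetchall()

# After: a single column is tested, and sorting by customer name is possible.
after = conn.execute(
    "SELECT agent_id FROM AgentCustomer WHERE customer_name = ?", ("aaa",)
).fetchall()
sorted_customers = [
    row[0]
    for row in conn.execute(
        "SELECT customer_name FROM AgentCustomer ORDER BY customer_name"
    )
]

print(before, after, sorted_customers)
```

Both queries find the same agent, but only the second form keeps working unchanged if an agent later has a fourth customer.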

According to table one.

According to table two.

2NF

1NF + all non-key columns should depend on the whole primary key. Here, all partial dependencies are removed.
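One way to picture the 2NF step is to split a table with a composite key so that each non-key column sits next to the key it actually depends on. The sketch below uses sqlite3 with a hypothetical EmployeeProject table; the column names and the sample rows are assumptions. It then joins the pieces back together to check that no information was lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- emp_name depends only on emp_id and proj_title only on proj_id:
-- both are partial dependencies on the composite key (emp_id, proj_id).
CREATE TABLE EmployeeProject (
    emp_id INTEGER, proj_id INTEGER,
    emp_name TEXT, proj_title TEXT, hours INTEGER,
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO EmployeeProject VALUES
    (1, 10, 'A', 'Java', 5),
    (1, 11, 'A', 'Python', 8),
    (2, 10, 'B', 'Java', 3);

-- Decompose. CREATE TABLE ... AS copies data only; a real schema
-- would also declare the primary and foreign keys.
CREATE TABLE Employee AS SELECT DISTINCT emp_id, emp_name FROM EmployeeProject;
CREATE TABLE Project  AS SELECT DISTINCT proj_id, proj_title FROM EmployeeProject;
CREATE TABLE Task     AS SELECT emp_id, proj_id, hours FROM EmployeeProject;
""")

original = conn.execute(
    "SELECT emp_id, proj_id, emp_name, proj_title, hours "
    "FROM EmployeeProject ORDER BY emp_id, proj_id"
).fetchall()

# Joining the three tables should give back exactly the original rows.
rebuilt = conn.execute(
    "SELECT t.emp_id, t.proj_id, e.emp_name, p.proj_title, t.hours "
    "FROM Task t "
    "JOIN Employee e ON e.emp_id = t.emp_id "
    "JOIN Project p ON p.proj_id = t.proj_id "
    "ORDER BY t.emp_id, t.proj_id"
).fetchall()

employee_count = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]
project_count = conn.execute("SELECT COUNT(*) FROM Project").fetchone()[0]
print(rebuilt == original, employee_count, project_count)
```

After the split, each employee name and each project title is stored once instead of once per assignment.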

3NF

2NF + all non-key columns should depend non-transitively on the primary key. That is, every non-key column must depend fully on the primary key itself, not on another non-key column. Here, all transitive dependencies are removed. Transitive dependencies can occur in two forms.

01. Determinant with a single attribute.

02. Determinant with multiple attributes.

Determinant with a single attribute

Determinant with multiple attributes
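The single-attribute case can be sketched with sqlite3 as follows. The values a_gold and agency_n1 are taken from the example tables in this post; the table names AgentFlat, Agent, and Agency, the agency_id column, and the remaining rows are assumptions for illustration. Here agency_name depends on agency_id, which is itself a non-key column, so it is moved into its own table. The multiple-attribute case is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- agent_id -> agency_id -> agency_name is a transitive dependency.
CREATE TABLE AgentFlat (
    agent_id    INTEGER PRIMARY KEY,
    agent_name  TEXT,
    agency_id   INTEGER,
    agency_name TEXT
);
INSERT INTO AgentFlat VALUES
    (1, 'a_gold',   100, 'agency_n1'),
    (2, 'a_silver', 100, 'agency_n1'),
    (3, 'a_bronze', 200, 'agency_n2');

-- 3NF: the determinant agency_id becomes the key of its own table.
CREATE TABLE Agency (
    agency_id   INTEGER PRIMARY KEY,
    agency_name TEXT NOT NULL
);
CREATE TABLE Agent (
    agent_id   INTEGER PRIMARY KEY,
    agent_name TEXT NOT NULL,
    agency_id  INTEGER REFERENCES Agency(agency_id)
);
INSERT INTO Agency SELECT DISTINCT agency_id, agency_name FROM AgentFlat;
INSERT INTO Agent  SELECT agent_id, agent_name, agency_id FROM AgentFlat;
""")

# Renaming an agency touches every agent row in the flat table...
flat_updates = conn.execute(
    "UPDATE AgentFlat SET agency_name = 'agency_x' WHERE agency_id = 100"
).rowcount

# ...but exactly one row once the agency has its own table.
normalized_updates = conn.execute(
    "UPDATE Agency SET agency_name = 'agency_x' WHERE agency_id = 100"
).rowcount

# An agency can now also exist before it has any agent.
conn.execute("INSERT INTO Agency VALUES (300, 'agency_n3')")
agency_count = conn.execute("SELECT COUNT(*) FROM Agency").fetchone()[0]

print(flat_updates, normalized_updates, agency_count)
```

This addresses both the UPDATE anomaly (one row to change instead of many) and the CREATE anomaly (an agency without an agent) described earlier.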

Example for Normalization.

Insert anomaly - A new project cannot be added until an employee is assigned to it. Similarly, an employee cannot be added until a project is assigned.

Update anomaly - When we update an employee name or a project title, we have to update the same value in multiple places (A, Java).

Delete anomaly - When we delete an employee, the project is also deleted.

1NF - All column values are atomic, there is no group of columns representing similar information, and each row has an id that uniquely identifies it. 1NF is OK.

2NF - Removing partial dependency

The earlier CRUD anomalies are now resolved. However, deleting from the Employee table or the Project table affects the Task table. This issue is controlled by integrity rules.
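Integrity rules of this kind are commonly expressed as foreign key constraints. The sketch below uses sqlite3 with the Employee, Project, and Task tables named above; the column names, the sample rows, and the choice of ON DELETE RESTRICT for employees and ON DELETE CASCADE for projects are assumptions made only to show both behaviours, not the rules used in the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT NOT NULL);
CREATE TABLE Project  (proj_id INTEGER PRIMARY KEY, proj_title TEXT NOT NULL);
CREATE TABLE Task (
    -- Assumed rule: an employee with tasks cannot be deleted.
    emp_id  INTEGER NOT NULL REFERENCES Employee(emp_id) ON DELETE RESTRICT,
    -- Assumed rule: deleting a project also deletes its tasks.
    proj_id INTEGER NOT NULL REFERENCES Project(proj_id) ON DELETE CASCADE,
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO Employee VALUES (1, 'A'), (2, 'B');
INSERT INTO Project  VALUES (10, 'Java'), (11, 'Python');
INSERT INTO Task     VALUES (1, 10), (2, 11);
""")

# Deleting an employee who still has a task is rejected.
try:
    conn.execute("DELETE FROM Employee WHERE emp_id = 1")
    employee_delete_blocked = False
except sqlite3.IntegrityError:
    employee_delete_blocked = True

# Deleting a project removes its task rows as well.
conn.execute("DELETE FROM Project WHERE proj_id = 11")
remaining_tasks = conn.execute("SELECT emp_id, proj_id FROM Task").fetchall()

print(employee_delete_blocked, remaining_tasks)
```

Which rule fits depends on the application: RESTRICT protects Task rows from disappearing silently, while CASCADE keeps the tables consistent without extra cleanup code.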

3NF - There are no transitive dependencies to remove. 3NF is OK.

In Short,

There are three main forms in data table normalization (1NF, 2NF, 3NF).

1NF

· All column values should be atomic.

· Removing any group of columns that represents similar information.

· Each row should have an id to uniquely identify the row.

2NF

· Removing Partial dependency.

3NF

· Removing Transitive dependency.

Nuwana - Software Development