Friday, 27 June 2014

File Organization and its type


FILE DATA ORGANIZATION

INTRODUCTION

It is concern with how the data is stored in the database.It allows the designer to make decisions of how the database is to be implemented whereas the logical DB design is concerned with what data is stored in the database.

It follows 2 steps:-
1.) Translate the global logical data model for target DBMS-It includes operations like designing of base relation ,derived data and design of constraints.
2.) Design physical representation- It includes operations like analysis transaction,selection of file organization,selection of indexes and estimate the disk space requirements.

TYPES OF FILE ORGANIZATION-The various types of file organization are:-
1.) Heap or un-ordered file organization
2.) Hash file organization
3.) Index sequential access method ( ISAM )
4.) B tree or Binary tree
5.) Clustered tables

1.) Heap or un-ordered file organization-It is the simplest type of file organization in which the record are placed in the file in the same order as they are inserted.
Advantages-
1.) For storing bulky data
2.) Traversing in the form of linear search
Dis-Advantage-
1.) Time consuming process

2.) Hash File Organization- In this records are not stored sequentially in the file instead a hash function is used to calculate the address of the page in which the record is to be stored.The field on which hash function is calculated is known as hash field and if that field act as a key of the relation then it is known as hash key.
Advantages-Search is easy.
Dis-Advantage- Not possible for finding range of values

3.) Index sequential access method ( ISAM )-Data is organized into records which are composed of fixed length and the records are record sequentially. ISAM is a most versatile storage structure than hashing and it provides better functionality when data is retrieve based up on exact key match, pattern matching or a range of values.

4.) B tree or Binary tree-
a.) It is a data structure used to store large amount of information and in this method secondary storage is used to store the leaf nodes of the tree.
b.) The leaf nodes are the only ones that actually stores data icons.
c.) All other nodes are called index nodes or I nodes and these are used to store the guide values which allows us to traverse a tree structure from the root and arrive at the leaf node containing the data item.
d.)The performance of a B tree does not decreases as the relation is updated.

5.) Clustered tables- Clusteres  are group of one or more tables physically stored together because they same common columns.

PHYSICAL DATA ORGANIZATION | PRIMARY INDEXES | DENSE INDEX |SPARSE INDEX


PRIMARY INDEXES-

a.) A Primary index is an ordered file whose records are of fixed length with two fields.
b.) The first field is of the same data type as the ordered key field called the primary key of the data file.
c.) The second field is a pointer to a disk block(a block address)
d.) There is one index entry (or index record) in the index file for each block in the data file.
e.) Each index entry has the value of the primary key field for the first record in a block and a pointer that block as its two field values.

INDEXES CAN BE CHARACTERIZED AS DENSE OR SPARSE.

1.) DENSE INDEX-A dense index has an index entry for every search key value (and hence entry record) in the data file. A dense index in databases is a file with pairs of keys and pointers for every record in the data file. Every key in this file is associated with a particular pointer to a record in the sorted data file. In clustered indices with duplicate keys, the dense index points to the first record with that key.

2.) SPARSE INDEX- A sparse index on the other hand, has index entries for only some of the search values.A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this file is associated with a particular pointer to the block in the sorted data file. In clustered indices with duplicate keys, the sparse index points to the lowest search key in each block.

Therefore a primary index is a non-dense index,since it includes an entry for each disk block of the data file and the keys of its anchor record rather than for every search value(or every record).

Introduction to normalization | NORMALIZATION



NORMALIZATION

INTRODUCTION
  • It is the process of decomposing or splitting the relation into relations with fewer attributes ,thereby minimizing the redundancy of data and  minimizing insertion,deletion and updation anomalies.
  • It is a step by step reversible process of transforming and unnormalized relation into relations with simpler structure.
  • Normalization works through a series of stages called normal forms.The normal forms are applicable to individual relations.
  • The relation (table) is said to be in particular normal form if it satisfies a certain sets of constraints. 
WHY WE NEED NORMALIZATION?
  • Normalization consists of a series of guidelines that help to guide you in creating a good database structure.
ANOMALY IN NORMALIZATION
  1. Insert Anomaly- Insert anomaly refers to a situation when one cannot insert a new tuple(row) into a relation due to lack of data. E.G- if we want to insert a student with a value of rollno ,name and phone no attributes to be 28765 ,'kapil' and 78779988 respectively but does not have any hobby.This information cannot be inserted until the student has specified some hobby.This is because the primary key is composed of rollno and hobby attributes.
  2. Delete Anomaly-The delete anomaly refers to a situation where the deletion of data results in unintended loss of some important data.E.G-Suppose the student with roll no 9876 is no longer interested in gardening then in that case deleting the tuple(row) that gives information about varun's gardening hobby will result in loss of vital information that the student varun with roll no 9876 ever existed in the class. This is because there is only one row which contains information about the student with roll no 9876.
  3. Update Anomaly-The update anomaly refers to a situation where an update of  a single data value requires multiple rows of data to be updated.E.G- if the student with roll no 7798 changes his phone no 987532567 then it requires updation in both the rows that give information about him. 
ADVANTAGES OF NORMALIZATION
  • Minimizes data redundancy
  • Greater overall database organization
  • Data consistency within the database
  • Much more flexible database design
  • Enforces concept of referential integrity
DIS-ADVANTAGES OF NORMALIZATION
  • You cannot start building the database before you know what the user needs.
  • On normalizing the relations to higher normal form i.e. 4NF,5NF the performance degrades.
  • It is very time consuming and difficult process in normalizing relations to higher degree.
  • Careless decomposition may lead to bad design of database which may lead to serious problems.
TYPES OF NORMALIZATION
  1. FIRST NORMAL FORM(1NF)
  2. SECOND NORMAL FORM(2NF)
  3. THIRD NORMAL FORM(3NF)
  4. BOYCE-CODD NORMAL FORM(BCNF)
  5. FOURTH NORMAL FORM(4NF)
  6. FIFTH NORMAL FORM(5NF)

First Normal Form (1NF) | normalization and its types


FIRST NORMAL FORM(1NF)

A relation is said to be in first normal form, if it satisfies the following rules.
  1. There are no duplicate rows i.e. all the primary key attributes are defined.
  2. There are no repeating groups in the table i.e. each rows and columns intersection contains one and only one value and not a set of vales.
The first normal form does not allows a relations to contain nested relations.
E.G-Consider a relation "record"

St_id
St_name
subject
marks
101
Amrit
Computer
Economics
Maths
98
56
43
102
Anupreet
Maths
Eco
98
99

It is not in 1NF
  
HOW TO CONVERT IT INTO 1NF?
METHOD 1

STEP 1-To remove repeating values for a column the record relation is converted to a flat relation i.e. STU1 by repeating the pair having composite primary key (St_id,St_name)
STEP 2- For every entry in the table consequently removing the attribute record with its sub-ordinate attributes.
Relation "STU1"

St_id
St_name
subject
marks
101
Amrit
Computer
98
101
Amrit
Economics
56
101
Amrit
Maths
43
102
Anupreet
Maths
98
102
Anupreet
Eco
99

It is  in 1NF but this method  increases redundancy.

METHOD 2
Another method is to remove the attributes that violates the first normal form and place it in a separate relation along with the primary key.So, un-normalized relation("record") is decomposed into 2 sub-relations .
(stu_details),(stu_performance)

Relation (stu_details)

St_id
St_name
101
Amrit
102
Anupreet

Relation (stu_performance)

St_id
subject
marks
101
Computer
98
101
Economics
56
101
Maths
43
102
Maths
98
102
Eco
99

It is in 1NF.

Anomalies -Insertion,deletion and updation anomaly

Functional Dependency (FD) | Database design


FUNCTIONAL DEPENDENCY(FD)

An attribute of an relational 'R' is said to be functional dependent on attribute 'X' if and only if each 'X' value is associated with exactly one value of 'Y' at any given time.


REPRESENTATION-

Where 'X' is known as determinant and 'Y' is known as determined.

EXAMPLE-


HOW TO SHOW FUNCTIONAL DEPENDENCY DIAGRAMMATICALLY-


Consider a relation

S.NO
SNAME
DEPT
CITY
S1
A
10
X
S2
B
20
Y
S3
C
20
Y

Now find various functional dependencies which are


Represent this functional dependency diagrammatically
FULLY FUNCTIONAL DEPENDENCY-
An attribute 'Y' Of relation 'R' is fully functional dependent on attribute 'X' if it is function dependent on 'X' and not functionally dependent on any subset of 'X'.
                                                            When primary key is composite then the non key attributes must be identified by the entire key and not by some attributes.

Second Normal Form(2NF) | normalization and its type


SECOND NORMAL FORM(2NF)

A relation is in 2NF if it holds these two conditions:-
1. The relation is in 1NF.
2. Every non -key attribute should be fully functional dependent on its primary key.If the primary key consists of a single attribute then the relation is in 2NF.

EXAMPLE- Consider a relation. "order books" where primary key is ('ORDER NO','BOOK NAME')

ORDER NO
BOOK NAME
QUANTITY
PRICE
1
C
15
175
1
DB
20
220
2
.NET
30
200
3
C
40
175
4
DB
15
220

Let's find fully functional dependency,functional dependency and also which are not  fully functional dependent in this relation.

In this order no, book name and quantity are fully functional  dependent,order no, book name and price  are not fully functional  dependent because book name is functional  dependent  on price. Therefore the relation is not in 2NF.

HOW TO MAKE IT IN 2NF-
Partially key dependency should be remove by splitting the 1NF relation into appropriate relations such that every non-key attribute is fully functional dependent on primary key of respective relations.

1. Create a new relation from source relation that contains the attributes that are fully functional dependent on the concatenated primary key.

ORDER NO
BOOK NAME
QUANTITY
1
C
15
1
DB
20
2
.NET
30
3
C
40
4
DB
15

2. Create any relation from the source relation with the attributes that are not fully functionally dependent on the concatenated primary key and the member of the key on which they are dependent.  

BOOK NAME
PRICE
C
175
DB
220
.NET
200
C
175
DB
220

Now the relation is in 2NF.
Anomalies-Insertion,deletion and updation anomalies are still there.

Third Normal Form (3NF) |Normalization and its type


THIRD NORMAL  FORM(3NF)

INTRODUCTION--
A relation is said to be in third normal form(3NF) if both conditions hold simultaneously.
1.) The relation is in Second Normal Form(2NF)
2.) Non-key attributes of the relation should not be transitively dependent on the primary key.
The main purpose of the 3NF is to remove the transitive dependency which is the main cause of anomalies in the 2NF.

EXAMPLE- To explain the 3NF, let us consider the example of STUDENT relation.Here the primary key is Stu_Id.

Stu_Id
Stu_Name
Teach_Id
Teach_Name
Teach_Qual
2523
Anurag
T001
Navathe
Ph.D
3712
Pankaj
T004
Date
M.Tech
4096
Gagan
T001
Navathe
Ph.D
2716
Anshu
T004
Date
M.Tech
1768
Harman
T009
Desai
M.Tech

The STUDENT relation is in 2NF but it suffers from insertion,deletion,updation anomalies.The main reason of these anomalies is because some attributes are transitively functionally dependent on the primary key.This results in redundancy in the relation.The dependencies that exist in STUDENT relation are:-
Here, the Stu_Name is functionally dependent on the attribute Stu_Id. Also Teach_Id is functionally dependent on the attribute Stu_Id. Also, Teach_Name is functionally dependent on Teach_Id  so we can say that Teach_Name is transitively functionally  dependent on Stu_Id.Similarly, Teach_Qual is transitively functional dependent on Stu_Id attribute.
                   So to make the relation in 3NF, We have to eliminate this transitive dependence on the primary key.To achieve this

1.) Create a new relation from the source relation that contains all the original attributes but without those attributes that are transitively dependent on the primary key.So in case of STUDENT relation,the resulting relation will contains Stu_Id ,Stu_Name and Teach_Id attributes.The primary key in this relation is same as in the original relation i.e. (Stu_Id).

Stu_Id
Stu_Name
Teach_Id
2523
Anurag
T001
3712
Pankaj
T004
4096
Gagan
T001
2716
Anshu
T004
1768
Harman
T009

2.) The other relation created from the source relation will contain attributes that are transitively functionally dependent on the primary key of the original relation plus the non key attribute from the original source relation on which the transitive functional dependent  attributes are functionally dependent. The primary key of the resulting relation will be a non key attribute on which the non key attributes are functionally dependent.


Teach_Id
Teach_Name
Teach_Qual
T001
Navathe
Ph.D
T004
Date
M.Tech
T009
Desai
M.Tech

Now the relation is in 3NF.