Course Module 4. Working with databases - Lecture: Key Database Design Steps

2.1. Conceptual design

Database design is carried out in three stages:

conceptual design;
logical design;
physical design.

The purpose of the conceptual design phase is to create a conceptual data model based on users' ideas about the subject area. To achieve it, a series of sequential procedures are carried out. An example of an entity (conceptual) schema:

1. Definition of entities and their documentation. To identify entities, objects are defined that exist independently of others. Such objects are entities. Each entity is given a meaningful name that users can understand. The names and descriptions of the entities are entered into the data dictionary. If possible, the expected number of instances of each entity is set.

2. Determination of relationships between entities and their documentation. Only those relationships between entities are defined that are necessary to satisfy the database design requirements. The type of each is set. The membership class of entities is revealed. Links are assigned meaningful names expressed by verbs. A detailed description of each connection, indicating its type and the class of belonging of the entities participating in the connection, is entered into the data dictionary.

3. Creation of an ER-model of the subject area. ER diagrams are used to represent entities and relationships between them. Based on them, a single visual image of the modeled subject area is created - the ER-model of the subject area.

4. Definition of attributes and their documentation. All attributes that describe the entities of the created ER model are revealed. Each attribute is given a meaningful name that users can understand. The following information is stored in the data dictionary for each attribute:

attribute name and description;
type and dimension of values;
the default value for the attribute (if any);
whether the attribute can have NULL values;
whether the attribute is composite, and if so, what simple attributes it consists of. For example, the attribute "Client's full name" can consist of simple attributes "Last name", "First name", "Patronymic", or it can be simple, containing single values, such as "Sidorsky Evgeniy Mikhailovich". If the user does not need access to individual elements of the "Name", then the attribute is presented as simple;
whether the attribute is calculated, and if so, how its values are calculated.

5. Definition of attribute values and their documentation. For each attribute of an entity participating in the ER model, a set of valid values is determined and a name is assigned to it. For example, the attribute "Account type" can only have the values "deposit", "current", "on demand", "card account". The data dictionary entries related to the attributes are updated with the names of the attribute value sets.

6. Definition of primary keys for entities and their documentation. This step is guided by the definition of a primary key - as an attribute or set of attributes of an entity that allows unique identification of its instances. Primary key information is placed in the data dictionary.

7. Discussion of the conceptual data model with end users. The conceptual data model is represented by an ER model with accompanying documentation containing a description of the developed data model. If domain inconsistencies are found, then changes are made to the model until users confirm that the model proposed by them adequately reflects their personal views.

2.2 Logic design

The purpose of the logical design stage is to transform the conceptual model based on the selected data model into a logical model that is independent of the features of the DBMS used later for the physical implementation of the database. To achieve it, the following procedures are performed.

An example of a logical database schema.

1. Choosing a data model. Most often, a relational data model is chosen due to the clarity of the tabular presentation of data and the convenience of working with them.

2. Defining a set of tables based on the ER model and documenting them. A table is created for each entity of the ER model. Entity name is the name of the table. Relations between tables are established through the mechanism of primary and foreign keys. The structures of the tables and the established relationships between them are documented.

3. Normalization of tables. To properly perform normalization, the designer must deeply understand the semantics and usage patterns of the data. At this step, he checks the correctness of the structure of the tables created in the previous step by applying the normalization procedure to them. It consists in bringing each of the tables to at least the 3rd NF. As a result of normalization, a very flexible database design is obtained, which makes it easy to make the necessary extensions to it.

4. Checking the logical data model for the possibility of performing all transactions provided by users. A transaction is a set of actions performed by an individual user or application program to change the contents of a database. So, an example of a transaction in the BANK project can be the transfer of the right to manage the accounts of a certain client to another client. In this case, several changes will need to be made to the database at once. If a computer crashes during a transaction, the database will be in an inconsistent state because some changes have already been made and others have not. Therefore, all partial changes must be undone to return the database to its previous consistent state.

The list of transactions is determined by the actions of users in the subject area. Using the ER model, the data dictionary, and established relationships between primary and foreign keys, an attempt is made to perform all the necessary data access operations manually. If any manual operation fails, then the compiled logical data model is inadequate and contains errors that must be eliminated. Perhaps they are related to a gap in the model of an entity, relationship, or attribute.

5. Determination of data integrity support requirements and their documentation. These requirements are restrictions that are put in place to prevent conflicting data from being entered into the database. At this step, data integrity issues are covered without regard to specific aspects of its implementation. The following types of restrictions should be considered:

required data. Finding out if there are attributes that cannot have NULL values;
restrictions on attribute values. Valid values for attributes are defined;
entity integrity. It is achieved if the entity's primary key does not contain NULL values;
referential integrity. It is understood that the foreign key value must be present in the primary key of one of the table rows for the parent entity;
restrictions imposed by business rules. For example, in the case of the BANK project, a rule may be adopted that prohibits the client from managing, say, more than three accounts.

Information about all established data integrity constraints is placed in the data dictionary.

6. Creation of the final version of the logical data model and discussion with users. This step prepares the final version of the ER model, which represents the logical data model. The model itself and updated documentation, including the data dictionary and relational table link schema, are presented for review and analysis by users, who must ensure that it accurately represents the subject area.

2.3. Physical design

The purpose of the physical design stage is to describe a specific implementation of a database located in the external memory of a computer. This is a description of the data storage structure and efficient methods of accessing database data. In logical design, they answer the question - what needs to be done, and in physical design - a way is chosen how to do it. The physical design procedures are as follows.

1. Designing database tables using the selected DBMS. A relational DBMS is selected to be used to create a database hosted on machine media. Its functionality for designing tables is deeply studied. Then the design of tables and the scheme of their connection in the DBMS environment is performed. The prepared database project is described in the accompanying documentation.

2. Implementation of business rules in the environment of the selected DBMS. Updating information in tables can be limited by business rules. The way they are implemented depends on the chosen DBMS. Some systems for implementing the requirements of the subject area offer more features, others less. In some systems, there is no support for implementing business rules at all. In this case, applications are developed to implement their limitations.

All decisions made in connection with the implementation of domain business rules are described in detail in the accompanying documentation.

3. Designing the physical organization of the database. This step selects the best file organization for the tables. The transactions that will be performed in the database being designed are identified, and the most important of them are highlighted. Transaction throughput is analyzed - the number of transactions that can be processed in a given time interval, and response time - the period of time required to complete one transaction. They strive to increase transaction throughput and reduce response time.

Based on these indicators, decisions are made to optimize the performance of the database by defining indexes in tables that speed up the selection of data from the database, or by reducing the requirements for the level of table normalization. The disk space required to accommodate the created database is estimated. Strive to minimize it.

Decisions made on the above issues are documented.

4. Development of a database protection strategy. The database is a valuable corporate resource, and much attention is paid to its protection. To do this, designers must have a complete and clear understanding of all the protections provided by the selected DBMS.

5. Organization of database functioning monitoring and its adjustment. After the creation of the physical project of the database, continuous monitoring of its functioning is organized. The resulting information about the performance level of the database is used to tune it. For this, the means of the selected DBMS are also involved.

Decisions to make any changes to a functioning database should be considered and weighed thoroughly.