Learn SQL Database Programming

Josephine Bush

Learn everything you need to know to build efficient SQL queries using this easy-to-follow beginner’s guide

Key Features

  • Explore all SQL statements in depth using a variety of examples
  • Get to grips with database querying, data aggregation, data manipulation, and much more
  • Understand how to explore and process data of varying complexity to tell a story

Book Description

SQL is a powerful querying language that's used to store, manipulate, and retrieve data, and it is one of the most popular languages used by developers to query and analyze data efficiently.

If you're looking for a comprehensive introduction to SQL, Learn SQL Database Programming will help you to get up to speed with using SQL to streamline your work in no time. Starting with an overview of relational database management systems, this book will show you how to set up and use MySQL Workbench and design a database using practical examples. You'll also discover how to query and manipulate data with SQL programming using MySQL Workbench. As you advance, you’ll create a database, query single and multiple tables, and modify data using SQL querying. This SQL book covers advanced SQL techniques, including aggregate functions, flow control statements, error handling, and subqueries, and helps you process your data to present your findings. Finally, you’ll implement best practices for writing SQL and designing indexes and tables.

By the end of this SQL programming book, you’ll have gained the confidence to use SQL queries to retrieve and manipulate data.

What you will learn

  • Install, configure, and use MySQL Workbench to restore a database
  • Explore different data types such as string, numeric, and date and time
  • Query a single table using the basic SQL SELECT statement and the FROM, WHERE, and ORDER BY clauses
  • Query multiple tables by understanding various types of table relationships
  • Modify data in tables using the INSERT, UPDATE, and DELETE statements
  • Use aggregate functions to group and summarize data
  • Detect bad data, duplicates, and irrelevant values while processing data

Who this book is for

This book is for business analysts, SQL developers, database administrators, and students learning SQL. If you want to learn how to query and manipulate SQL data for database administration tasks or simply extract and organize relevant data for analysis, you’ll find this book useful. No prior SQL experience is required.  

 

 

 

Learn SQL Database Programming

Query and manipulate databases from popular relational database servers using SQL

Josephine Bush

BIRMINGHAM - MUMBAI

 

 

 

Learn SQL Database Programming

 

 

 

Copyright © 2020 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Brandon D'Abreo
Acquisition Editor: Reshma Raman
Content Development Editor: Nazia Shaikh
Senior Editor: Ayaan Hoda
Technical Editor: Utkarsha S. Kadam
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Manju Arasan
Production Designer: Jyoti Chauhan

First published: May 2020

Production reference: 1290520

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-83898-476-2

www.packt.com

 

 

 

 

Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

 

 

 

Why subscribe?

 

 

 

  • Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

  • Improve your learning with Skill Plans built especially for you

  • Get a free eBook or video every month

  • Fully searchable for easy access to vital information

  • Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

 

 

 

Contributors

 

 

 

 

About the author

 

 

 

Josephine Bush has over 10 years of experience as a Database Administrator. Her experience is extensive and broad-based, including financial, business, and energy data systems using MySQL, SQL Server, Oracle, and PostgreSQL. She is a Microsoft Certified Solutions Expert: Data Management and Analytics. She holds a BS in Information Technology, an MBA in IT Management, and an MS in Data Analytics.

I would like to acknowledge my husband, Jim, who provided support and encouragement at every step, and gave me especially useful baseball insights.

 

 

 

 

About the reviewers  

 

 

 

Starting out with the Microsoft stack, Frank Solomon gradually focused on SQL Server and database development, and then branched out into writing and technical writing. He writes for SQL Shack, blogs at Bit Vectors, and was the lead co-author of The SQL Workshop, a Packt book. Find Frank on LinkedIn, and reach out to him with writing, technical writing, or development opportunities. He leverages sharp software development and writing skills to build great products, has plenty of remote work experience, and relies on the active voice to produce high-quality writing.

Awni Al Saqqa is a Microsoft Technology Specialist in MS SQL Server and a certified solutions developer since 2007. He has over a decade of experience with database development and administration on SQL Server, Oracle, and MySQL. He is a solutions architect who has been hands-on in many enterprise projects for different business sectors, such as education, hospitality, retail, manufacturing, and marketing, which has given him the perfect combination of business and technical experience. Awni is also the lead author of The SQL Workshop, published by Packt.

Packt is searching for authors like you

 

 

 

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

 

 

 

 

Preface  

 

 

 

SQL is a powerful querying language used to store, manipulate, and retrieve data, and is one of the most popular languages used by developers to query and analyze data efficiently. If you're looking for a comprehensive introduction to SQL, Learn SQL Database Programming will help you to get up to speed with using SQL to streamline your work in no time. Starting with an overview of relational database management systems, this book will show you how to set up and use MySQL Workbench and design a database using practical examples. You'll also discover how to query and manipulate data with SQL programming using MySQL Workbench. As you advance, you'll create a database, query single and multiple tables, and modify data using SQL querying. This SQL book covers advanced SQL techniques, including aggregate functions, flow control statements, error handling, and subqueries, and helps you process your data to present your findings. Finally, you'll implement best practices for writing SQL and designing indexes and tables.

By the end of this SQL programming book, you'll have gained the confidence to use SQL queries for retrieving and manipulating data.

Who this book is for

 

 

 

This book is for business analysts, SQL developers, database administrators, and students learning SQL. If you want to learn how to query and manipulate SQL data for database administration tasks or to simply extract and organize relevant data for analysis, you'll find this book useful. No prior SQL experience is required.

 

 

 

 

What this book covers

 

 

 

Chapter 1, Introduction to Relational Database Management Systems, introduces the concepts required to understand the basics of relational database management systems. It introduces foundational topics such as understanding SQL, the relational model, data integrity, database normalization, and the various types of relational database management systems. It gives you fundamental knowledge about SQL and databases that will be required throughout the book.

Chapter 2, Installing and Using MySQL Workbench, covers how to install MySQL Workbench on Windows and Mac, including step-by-step instructions to help you walk through each part of the installation process. The instructions also include the configuration of MySQL Workbench on both Windows and Mac. We will walk through some examples of connecting to your local MySQL and also setting up connections to other MySQL servers. We conclude with a step-by-step explanation of how to restore a database to MySQL.

Chapter 3, Understanding Data Types, covers what data types are and how they are used. You will learn about specific data types and what data can be stored in each of them. The data types include string, numeric, and date and time. String data types include char and varchar, binary and varbinary, blob, enum, and text. Numeric data types include bit, int, float, double, and decimal. Date and time data types include date, time, datetime, timestamp, and year. You will learn from the perspective of MySQL data types, but where there are differences versus SQL Server, Oracle, and PostgreSQL, those differences will be noted. We will also go through some examples of types and values of data to see how to assign them correctly to data types, including an explanation of why you need to be careful when selecting a data type and how it can impact database performance.

Chapter 4, Designing and Creating a Database, introduces you to designing and creating a database. We'll walk through the guidelines for naming conventions and understand SQL code errors. You will learn how to format SQL code for readability and apply data types and data integrity to our tables. You will also learn about the different types of table relationships and how to build entity-relationship diagrams. Going further, we will discuss the concept and usage of indexing. You will gain an understanding of how indexing helps database performance. Finally, you will learn how to create a table in a database.

Chapter 5, Importing and Exporting Data, introduces you to importing and exporting data. There are many ways to import and export data in MySQL. You will learn how to import and export data using MySQL Workbench via table data from/to CSV files. We will also cover importing and exporting via SQL data with SQL scripts. An additional way to export data via result data and query results will also be covered. The final topic discussed is using SQL syntax to import and export data.

Chapter 6, Querying a Single Table, covers how to use the basic SQL SELECT statement and the FROM, WHERE, and ORDER BY clauses. This chapter also covers how to tell which index your query is using and whether you may need additional indexes. By the end of this chapter, you will understand how to query data using the SELECT statement and the FROM clause. You will also learn how to limit results with a WHERE clause, how to use ORDER BY to return results in a specified order, and how to see information about what indexes are being used or may be needed.

Chapter 7, Querying Multiple Tables, covers how to use SQL joins to join two or more tables together, including INNER, OUTER (LEFT, RIGHT, and FULL), and advanced joins (the cross and self joins). You will learn about set theory and how to combine queries using UNION and UNION ALL, and how to get the differences and intersections of different queries. Lastly, you will learn how to optimize queries when they contain multiple tables.

Chapter 8, Modifying Data and Table Structures, goes through how to modify data in tables. This includes learning how to use INSERT, UPDATE, and DELETE statements. You will also learn about SQL transactions, which help to control the modification of data. Finally, you will learn how to modify a table structure.

Chapter 9, Working with Expressions, covers how to use literals, operators, columns, and built-in functions to create expressions. You will learn about the different types of built-in functions, including string, numeric, date and time, and advanced functions, which include casting and converting to other data types. You will learn how to use statistical functions, including how to get and use variance and standard deviation. Finally, you will learn how to create a generated column based on an expression.

Chapter 10, Grouping and Summarizing Data, covers how to use aggregate functions to group and summarize data. Aggregate functions include math functions such as AVG, SUM, COUNT, MIN, and MAX. You will also learn how to use the GROUP BY and HAVING clauses in conjunction with the aggregate functions. Finally, you will learn how MySQL executes your query clauses.

Chapter 11, Advanced Querying Techniques, covers how to use two different kinds of subqueries, correlated and non-correlated. Then, you will learn about two different types of common table expressions, recursive and non-recursive. You will learn about query hints and how to choose which index your query will use. Finally, you will learn about isolation levels and concepts relating to how data is read from and written to tables.

Chapter 12, Programmable Objects, covers how to create and use views, which includes selecting data from views, and inserting, updating, and deleting data using views. You will learn how to create and use variables, which includes how to declare and assign values to variables. You will also learn how to create and use stored procedures, including how to use variables and parameters in stored procedures, as well as how to control flow and error handling. In addition to all that, you will learn how to create and use functions, triggers, and temporary tables.

Chapter 13, Exploring and Processing Your Data, covers how to explore and process data. By the end of this chapter, you will understand how to get to know data by creating a statistical identity, you will have learned how to detect and fix anomalous and missing values, and will know how to use regular expressions to match data value patterns.

Chapter 14, Telling a Story with Your Data, teaches you how to find a narrative, including what types of stories you can tell with data and how to use the statistical identity of your data to determine a story. You will also learn about knowing your audience, including deciding who they are and what would be a compelling presentation for them. Then, you will learn how to identify a presentation framework, including explaining the question, answer, and methodology. Finally, you will learn how to use visualizations in your presentations.

Chapter 15, Best Practices for Designing and Querying, covers database best practices for database design, indexing, and querying and modifying data. You learned about these topics in the previous chapters, and this chapter will summarize and give additional tips for best practices. This chapter will also provide a way for the more experienced among you to quickly reference best practices instead of having to go through each chapter.

Chapter 16, SQL Appendix, covers the SQL commands discussed, which are outlined for quick reference. It includes the syntax for querying data, modifying data, and designing databases and tables. This chapter will help you by providing a quick reference guide, so you won't have to go back through all the chapters to check the syntax, but if you require more details about how the syntax works, you can refer to the specific chapter for that information.

 

 

 

To get the most out of this book

 

 

 

For this book to be useful, you either need access to MySQL Workbench and the ability to query a MySQL server, or the ability to install them. To install them yourself, you will need elevated permissions. Installers for MySQL Workbench can be found at https://dev.mysql.com/downloads/workbench/ and installers for MySQL server at https://dev.mysql.com/downloads/mysql/. If you don't have MySQL installed, or don't want to install it, you can follow along in most chapters with SQL code that will also work in Oracle, PostgreSQL, or SQL Server.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

 

 

 

Download the example code files

 

 

 

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packt.com.
  2. Select the Support tab.
  3. Click on Code Downloads.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/learn-sql-database-programming. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

 

 

 

Download the color images

 

 

 

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838984762_ColorImages.pdf.

 

 

 

Conventions used

 

 

 

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "BINARY is like CHAR, but stores byte strings instead of character strings."

A block of code is set as follows:

<books>
  <book>
    <name>Learn SQL Programming</name>
    <author>Josephine Bush</author>
  </book>
</books>

Any command-line input or output is written as follows:

SELECT * FROM lahmansbaseballdb.appearances;

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Click Download on the DMG Archive."

Warnings or important notes appear like this.

Tips and tricks appear like this.

 

 

 

Get in touch

 

 

 

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

 

 

 

 

Reviews  

 

 

 

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

 

 

 

 

Section 1: Database Fundamentals  

 

 

 

The objective of this section is to introduce you to relational database management systems, how to set up and use MySQL Workbench, how to use data types, how to design and create a database, and how to import and export data.

This section comprises the following chapters:

  • Chapter 1, Introduction to Relational Database Management Systems
  • Chapter 2, Installing and Using MySQL Workbench
  • Chapter 3, Understanding Data Types
  • Chapter 4, Designing and Creating a Database
  • Chapter 5, Importing and Exporting Data

 

 

 

 

Introduction to Relational Database Management Systems  

 

 

 

This chapter introduces the concepts required to understand the basics of relational database management systems (RDBMS). It will introduce foundational topics such as SQL, the relational model, data integrity, database normalization, and the types of relational database management systems. It will provide you with fundamental knowledge about SQL and databases that will be required throughout this book.

In this chapter, we will cover the following topics:

  • Understanding SQL
  • Understanding databases
  • Understanding data integrity
  • Understanding database normalization
  • Types of RDBMS

 

 

 

Understanding SQL

 

 

 

Structured Query Language, or SQL (pronounced see-quel), is the language that is used for querying and manipulating data and defining structures in databases. Initially developed at IBM in the early 1970s, SQL became an ANSI standard in 1986 and an ISO standard in 1987.

SQL is a powerful, yet simple language, and can do many things, such as execute queries, retrieve, insert, update, and delete data, create databases and tables, and much more.

These types of activities can be grouped into different subdivisions of SQL: Data Definition Language (DDL), Data Manipulation Language (DML), and Data Control Language (DCL):

  • Use DDL commands to specify database schema:
      • CREATE: This is used to create a new database or objects in a database.
      • ALTER: This is used to alter a database or objects in a database.
      • DROP: This is used to delete a database or objects in a database.
      • TRUNCATE: This is used to remove all data from a table instantaneously.
  • Use DML commands to query and modify data:
      • SELECT: This is used to retrieve data from a database.
      • INSERT: This is used to insert data into a database.
      • UPDATE: This is used to update data in a database.
      • DELETE: This is used to remove data from a database.
  • Use DCL commands to control permissions and transactions:
      • GRANT: This is used to give access to a user.
      • REVOKE: This is used to take access away from a user.
      • COMMIT: This is used to save the changes made in a transaction.
      • ROLLBACK: This is used to undo the changes made in a transaction.
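As a sketch of these subdivisions in MySQL syntax, the commands look as follows. The bookstore database, its table, and the reader account are invented here purely for illustration:

```sql
-- DDL: define database structure
CREATE DATABASE bookstore;
CREATE TABLE bookstore.books (
    book_id INT,
    title   VARCHAR(100)
);

-- DML: query and modify data
INSERT INTO bookstore.books (book_id, title) VALUES (1, 'Learn SQL');
SELECT * FROM bookstore.books;
UPDATE bookstore.books SET title = 'Learn SQL Database Programming' WHERE book_id = 1;
DELETE FROM bookstore.books WHERE book_id = 1;

-- DCL: control permissions and transactions
GRANT SELECT ON bookstore.* TO 'reader'@'localhost';
REVOKE SELECT ON bookstore.* FROM 'reader'@'localhost';
COMMIT;    -- save the changes made in the current transaction
ROLLBACK;  -- undo the changes made in the current transaction
```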

You won't learn about GRANT and REVOKE in this book. To get more information on granting and revoking permissions, please visit https://dev.mysql.com/doc/refman/8.0/en/grant.html and https://dev.mysql.com/doc/refman/8.0/en/revoke.html.

 

 

 

Elements of SQL

 

 

 

The SQL language comprises several elements that will be explained in more depth in subsequent chapters. These elements include the following:

  • Queries that retrieve data based on specific criteria.
  • Clauses that are components of statements or queries.
  • Predicates that are logical conditions that evaluate to true or false. These help you to narrow down the results of your queries.
  • Expressions that produce either scalar values or tables of columns and rows. Expressions are a part of predicates.
  • Statements that are queries run against a database, comprised of clauses and, optionally, expressions and predicates.
  • White space that is generally ignored in SQL statements and queries, making it easier to format for readability because you don't have to worry so much about particular spacing for the SQL to run correctly.

The following diagram shows you the components of a SQL statement, which is also called a SQL query:

In the preceding diagram, you can see the different elements of a SQL statement. Each line in the statement is considered a clause. Clauses use SQL keywords, which are reserved words that have special significance in the SQL language; SELECT, FROM, and WHERE are just some of the keywords that are used. More information on keywords is provided in Chapter 4, Designing and Creating a Database. The diagram also shows an expression and a predicate. A predicate helps you to narrow down your query results, and the expression is the piece of a predicate that sets a value. The diagram also illustrates the use of white space: you could write out your entire query on one line, but it's much easier to read when you add carriage returns and spaces. The different elements of queries will be covered in more detail in future chapters of this book.
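As a concrete illustration, here is a query against the sample baseball database used later in the book, with each element labeled in a comment. The exact column names, such as G_all, are assumed from that sample database:

```sql
SELECT playerID, teamID, G_all       -- SELECT clause: which columns to retrieve
FROM lahmansbaseballdb.appearances   -- FROM clause: which table to query
WHERE yearID = 2017;                 -- WHERE clause; "yearID = 2017" is the
                                     -- predicate, and 2017 is the expression
                                     -- that sets the value being compared
```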

 

 

 

Understanding databases

 

 

 

A database is a collection of data. You store databases in a relational database management system (RDBMS). The RDBMS is the basis for modern database systems such as MySQL, SQL Server, Oracle, and PostgreSQL. These will be covered in more detail later in this chapter.

 

 

 

Tables

 

 

 

In an RDBMS, objects called tables store data. Tables are collections of related data stored in columns and rows. The following screenshot is a cross-section of a table that contains data about baseball players' appearances in all-star games:

A NULL value in a table is a value that appears to be blank. It doesn't represent a string of blank spaces, zero, or a zero-length character string: it's a missing or unknown value.

The data has been sourced from http://www.seanlahman.com/baseball-archive/statistics/ with a CC BY-SA 3.0 license.
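Because NULL is a missing value rather than zero or an empty string, it must be tested with IS NULL rather than with the = operator. The table and column names in this sketch are assumed from the Lahman sample database:

```sql
-- Players with no recorded year of death
SELECT playerID, birthYear, deathYear
FROM lahmansbaseballdb.people
WHERE deathYear IS NULL;

-- By contrast, "WHERE deathYear = NULL" would match nothing:
-- NULL is unknown, so it never compares equal to anything,
-- not even to another NULL.
```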

 

 

 

Fields

 

 

 

A field is an intersection of a row and a column. This field could be any type of data, including a yearID, teamID, or a playerID field (using our example). Each red arrow in the following screenshot points to a value in a column that is considered a field:

 

 

 

Records or rows

 

 

 

A row contains values in a horizontal division of data. In this example case, it's a row or record from a table:

 

 

 

Columns

 

 

 

A column contains values in a vertical division of data. In this example case, it's the gameID column from a table:

To ensure that the data in your tables is consistent and accurate, you will need to understand data integrity. You will learn about data integrity in the next section.

 

 

 

Understanding data integrity

 

 

 

Data integrity refers to the consistency and accuracy of the data. It is typically enforced by procedures and guidelines in the database design phase. In an RDBMS, keys enforce data integrity. A key is user-defined and forces values in a table to conform to a specified standard, which allows only certain kinds of values to be in the database.

 

 

 

Types of integrity

 

 

 

Data integrity refers to the consistency and accuracy of data and table relationships. The following table lists the types of integrity you can use:

| Entity integrity    | Referential integrity | Domain integrity   |
|---------------------|-----------------------|--------------------|
| Unique constraint   | Foreign key           | Check constraint   |
| Not null constraint |                       | Default constraint |
| Primary key         |                       |                    |

Each type of integrity and how each relates to one another is discussed in the following sections.

 

 

 

Entity integrity

 

 

 

To ensure that each row in a table is uniquely identifiable, you use entity integrity. This is done with a few different types of keys or constraints, including unique, not null, and primary key constraints.

 

 

 

Unique constraints

 

 

 

To ensure that all values in a column or columns are different from each other, you use a unique constraint. This type of key can be applied to any data type and is used to avoid duplicate data. You can apply a unique constraint to multiple columns so that it creates a unique value across those multiple columns. It can contain null values.

If you create a unique constraint on one column, it will force the table to have unique values in that specific column. If a value is not unique, the row cannot be inserted or updated.

In the following screenshot, the parkkey column has a unique constraint. All the other fields can contain duplicate information as long as the unique constraint on parkkey isn't violated:

If you create a unique constraint on a combination of columns in a table, it will force the table to have unique values across that combination of columns. If the combination is not unique, the row cannot be inserted or updated.

The following screenshot shows an example of a composite, unique constraint. In this case, playerID, yearID, and teamID would need to be unique for the row to be acceptable:
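Both kinds of unique constraint can be declared when the table is created. This is a sketch in MySQL syntax; the tables are simplified, invented versions of the ones shown in the screenshots:

```sql
-- Single-column unique constraint: no two parks may share a parkkey
CREATE TABLE parks (
    parkkey  VARCHAR(20) UNIQUE,
    parkname VARCHAR(100),
    city     VARCHAR(50)
);

-- Composite unique constraint: the combination of the three
-- columns must be unique, though each column alone may repeat
CREATE TABLE appearances (
    playerID VARCHAR(20),
    yearID   INT,
    teamID   VARCHAR(10),
    CONSTRAINT uq_player_year_team UNIQUE (playerID, yearID, teamID)
);
```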

 

 

 

Not null constraints

 

 

 

To ensure that all values in a column are not null, you use a not null constraint. This type of key can be applied to any data type and is used to avoid missing data. If you create a not null constraint on a column, it will force the table to have values in that specific column. If the values are null, then the row will not be inserted or updated.

In the following screenshot, you can see that the birthYear column is set to not null. The deathYear column allows nulls, since not all people have a year of death:
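In MySQL syntax, a sketch of such a table might look like this (a simplified, invented version of the people table shown above):

```sql
CREATE TABLE people (
    playerID  VARCHAR(20),
    birthYear INT NOT NULL,  -- every row must supply a birth year
    deathYear INT            -- nullable: not everyone has a year of death
);
```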

 

 

 

The primary key

 

 

 

The primary key is used to ensure that all values in a column are not null and unique. This key combines the unique and not null constraint properties into one key. This type of key can be applied to any data type and is used to avoid missing and duplicate data. You can only have one primary key per table.

If you create a primary key on a table, it will force the table to have unique, not null values in that specific column. If a value doesn't comply, the row cannot be inserted or updated. You can also create a primary key on multiple columns; this is considered a composite key. In that case, the combination of values in the composite key must be unique for each row, otherwise the row cannot be inserted or updated.

In the following screenshot, the playerID column would be the primary key because it's unique and not null for every row in the table:

In the following screenshot, the playerID, yearID, and teamID columns could form the composite primary key because the combination of those three columns is unique and not null for every row in the table:
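A sketch of both forms in MySQL syntax, with the tables simplified for illustration:

```sql
-- Single-column primary key: playerID must be unique and not null
CREATE TABLE people (
    playerID  VARCHAR(20) PRIMARY KEY,
    birthYear INT
);

-- Composite primary key: the combination of the three columns
-- must be unique and not null for every row
CREATE TABLE appearances (
    playerID VARCHAR(20),
    yearID   INT,
    teamID   VARCHAR(10),
    PRIMARY KEY (playerID, yearID, teamID)
);
```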

 

 

 

Referential integrity

Referential integrity refers to the consistency and accuracy between tables that can be linked together. By having a primary key on the parent table and a foreign key on the child table, you achieve referential integrity. A foreign key on the child table creates a link between one or more columns in the child table and the primary key on the parent table. When a foreign key is present, it must reference a valid, existing primary key in the parent table. This way, the data in both tables can maintain a proper relationship. You will learn more about this in the following example.

If you don't set up referential integrity, you can wind up with orphaned records. For example, say you delete a player from the parent table but don't delete the corresponding records in the salary table. Those salary records are now orphaned.

If there were a foreign key constraint on the salary table, the player could not be deleted from the parent table without first deleting the corresponding rows in the salary table. A foreign key constraint also prevents users from adding rows to the child table without a corresponding parent row, or from changing values in the parent table in a way that would leave orphaned child records.

Without referential integrity constraints, you won't get an error when data becomes incomplete. Orphaned records are effectively lost in the database, since they may never show up in reports or query results. This can cause all kinds of problems, from strange results and lost orders to life-and-death situations where, for example, patients don't receive the proper treatment.

When creating a foreign key constraint, the foreign key must reference the primary key column of another table. It can be of any data type and accepts duplicate and null values by default. The foreign key constraint can maintain three types of table relationships (covered in more detail in Chapter 7, Querying Multiple Tables):

  • One-to-one: This type of relationship is when one table has just one corresponding row in another table. An example of this could be a table with employees and computers. Each employee has one computer.
  • One-to-many: This type of relationship is when one table has none, one, or many corresponding rows in another table. An example of this could be a table with adults and children. An adult table row may have none, one, or many rows in the child table.
  • Many-to-many: This type of relationship is when many rows in one table correspond to many rows in another table. An example of this could be the customers and products tables. Customers can purchase many products.

Consider a player table and a salary table: the primary key would be playerID on the first table, and the second table would have a foreign key referencing playerID on the first table. In this case, there would be a one-to-many relationship between the two tables, because one player in the first table can have none, one, or many corresponding rows in the second table.
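A sketch of such a parent/child setup in MySQL syntax, using hypothetical people and salaries tables (the salaryID surrogate key and the data types are assumptions):

```sql
-- Parent table with a primary key.
CREATE TABLE people (
    playerID VARCHAR(10) PRIMARY KEY
);

-- Child table: the foreign key ties each salary row to an existing player.
CREATE TABLE salaries (
    salaryID INT PRIMARY KEY,
    playerID VARCHAR(10),
    salary   INT,
    year     INT,
    CONSTRAINT fk_salaries_people FOREIGN KEY (playerID)
        REFERENCES people (playerID)
);
```

With this in place, deleting a people row that still has matching salaries rows fails, as does inserting a salaries row whose playerID doesn't exist in people.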

If you had a foreign key set up on playerID in the second table, you would not be able to delete a playerID value from the first table without first deleting the corresponding rows in the second table. This setup maintains referential integrity and ensures that you won't have orphaned records in the second table.

Domain integrity

To ensure that data values follow defined rules for formatting, range, and value, you use domain integrity, which is enforced with check and default constraints.

The check constraint is used to ensure that all values in a column fall within an allowed range or set of values. It can be applied to any data type and is used to keep invalid values out. A check constraint is enforced with a user-defined condition that evaluates to either true or false, and it can be defined on a single column or a combination of columns in a table.

Because null evaluates to unknown rather than false, it can bypass a check constraint and still be inserted or updated into a column that has one. If you want a column with a check constraint to disallow null, you also need to set a not null constraint on the column.
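As a sketch in MySQL syntax (MySQL enforces CHECK constraints as of version 8.0.16; the names are illustrative), single-column and multi-column check constraints look like this:

```sql
CREATE TABLE halloffame (
    playerID  VARCHAR(10),
    inducted  CHAR(1),
    birthYear INT,
    deathYear INT,
    -- Single-column check: inducted may only be 'Y' or 'N'.
    CONSTRAINT chk_inducted CHECK (inducted IN ('Y', 'N')),
    -- Multi-column check: a death year can't precede the birth year.
    CONSTRAINT chk_years    CHECK (birthYear < deathYear)
);
```

Note that a row with a null deathYear still passes chk_years, because the comparison evaluates to unknown rather than false.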

For example, a check constraint would make sense on the inducted column: a player is either inducted into the hall of fame or not, so you could create a check constraint that only allows Y or N in that field. If the value isn't Y or N, the row can't be inserted or updated.

A check constraint can also be applied to multiple columns. For instance, you wouldn't want deathYear to be a year before birthYear, so you could set a check constraint such as birthYear < deathYear; only rows that satisfy it can be inserted or updated.

To ensure that all rows in a column have a value, you use a default constraint. It can be applied to any data type and assigns a default value to a field, which avoids storing a null when a user doesn't specify a value.
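A sketch of a default constraint in MySQL syntax (names and data types are illustrative):

```sql
-- If no value is supplied for ab (at-bats), it defaults to 0.
CREATE TABLE batting (
    playerID VARCHAR(10),
    yearID   INT,
    ab       INT DEFAULT 0
);

-- No ab value is given, so the row is stored with ab = 0.
INSERT INTO batting (playerID, yearID) VALUES ('jonesji01', 2010);
```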

For example, a default constraint could make sense on the ab (at-bats) column. A player can be in a game without having any at-bats, so you could create a default constraint that sets the ab column to 0 if the user provides no value.

Database normalization

Database normalization is the process of putting your raw data into tables using rules to avoid redundant data, optimize database performance, and ensure data integrity.

Without proper normalization, not only can you have data redundancy, which uses additional storage space, but the database also becomes more difficult to update and maintain without data loss.

Normalization is based on normal forms: sets of rules to follow to organize your data into database tables. We will discuss three of them: the first, second, and third normal forms. Each form has its own set of rules to ensure that your database complies with it, and each form builds on the previous one.


The first normal form

The first normal form (1NF) is the first level of database normalization. You will need to complete this step before proceeding to other database normalization forms. The primary reason to implement 1NF is to eliminate repeating groups. This ensures that you can use simple SQL statements to query the data. It also ensures that you aren't duplicating data, which uses additional storage and computing time. This step will ensure that you are doing the following:

  • Defining data, columns, and data types, and putting related data into columns
  • Eliminating repeating groups of data:
      • You will not have repeating columns, such as Year1, Year2, and Year3; instead, you will have a single column named Year, and each row in the table will hold a different year.
      • You also won't have multiple values in the same field, such as 1985, 1987, 1989; instead, each year gets its own row.
      • There are no exact duplicate rows. The example following this bullet list explains this concept in more depth.
  • Creating a primary key for each table

In the following example, you could make the first column the primary key in the people table and a foreign key in the salaries table. In the salaries table, you could create a new primary key or a composite key combining multiple fields.

Here is a denormalized sample table:

Name       birthYear  Salary             SalaryYear
Jim Jones  1981       2750000, 4500000   2010, 2011
Joe Smith  1974       10600000           2014

There is a right way and wrong way to normalize this table. Let's go over the wrong way first:

Name       birthYear  Salary1   Salary2  SalaryYear1  SalaryYear2
Jim Jones  1981       2750000   4500000  2010         2011
Joe Smith  1974       10600000           2014

The preceding design has introduced new problems. Even though it doesn't have groups of repeating data in one column, the salary is limited to two values. What if a player has more than two salaries? You don't have anywhere to put another salary without adding a third column. This also wastes space for those players that only have one salary, and searching through this table for a player with a specific salary becomes difficult. The same goes for the SalaryYear columns.

The right way to normalize the denormalized table to the first normal form is to ensure that there aren't repeating groups, as shown in the following table. The people table with player information would look like the following:

playerID  nameFirst  nameLast  birthYear
jjones01  Jim        Jones     1981
jsmith01  Joe        Smith     1974

The Salary value has been removed and placed in another table with the playerID field linking them to each other; therefore, the salaries table will look like the following:

salaryID  playerID  salary    year
1         jjones01  2750000   2010
2         jjones01  4500000   2011
3         jsmith01  10600000  2014
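A sketch of these two 1NF tables in MySQL CREATE TABLE syntax (the data types are assumptions):

```sql
CREATE TABLE people (
    playerID  VARCHAR(10) PRIMARY KEY,
    nameFirst VARCHAR(50),
    nameLast  VARCHAR(50),
    birthYear INT
);

CREATE TABLE salaries (
    salaryID INT PRIMARY KEY,
    playerID VARCHAR(10),
    salary   INT,
    year     INT,
    FOREIGN KEY (playerID) REFERENCES people (playerID)
);
```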

 

Let's go through another normalization example by looking at the following denormalized table:

playerID   namefirst  namelast   birthYear  franchID  franchname             teamID  RBI  rank  yearID
abbotpa01  Paul       Abbott     1967       PHI       Philadelphia Phillies  PHI     2    2     2004
abreubo01  Bobby      Abreu      1974       PHI       Philadelphia Phillies  PHI     79   1     2000
abreubo01  Bobby      Abreu      1974       PHI       Philadelphia Phillies  PHI     110  3     2001
alcanar01  Arismendy  Alcantara  1991       CHI       Chicago Cubs           CHI     1    8     2015
almoral01  Albert     Almora     1994       CHI       Chicago Cubs           CHI     14   8     2016
almoral01  Albert     Almora     1994       CHI       Chicago Cubs           CHI     46   6     2017
alvarpe01  Pedro      Alvarez    1987       PIT       Pittsburgh Pirates     PIT     77   17    2015
alvarto01  Tony       Alvarez    1979       PIT       Pittsburgh Pirates     PIT     2    9     2002
alvarto01  Tony       Alvarez    1979       PIT       Pittsburgh Pirates     PIT     8    1     2004

To meet the requirements of 1NF, you would need to split this table into multiple tables. Depending on the table you are trying to normalize, you might not need to split it if it's already following the rules of 1NF.

This table only contains the information about the player and has a primary key of playerID:

playerID   namefirst  namelast   birthYear
abbotpa01  Paul       Abbott     1967
abreubo01  Bobby      Abreu      1974
alcanar01  Arismendy  Alcantara  1991
almoral01  Albert     Almora     1994
alvarpe01  Pedro      Alvarez    1987
alvarto01  Tony       Alvarez    1979

The other table contains the rest of the fields from the denormalized table. It has a foreign key relationship to the preceding table on playerID:

playerID   franchID  franchname             teamID  RBI  rank  yearID
abbotpa01  PHI       Philadelphia Phillies  PHI     2    2     2004
abreubo01  PHI       Philadelphia Phillies  PHI     79   1     2000
abreubo01  PHI       Philadelphia Phillies  PHI     110  3     2001
alcanar01  CHI       Chicago Cubs           CHI     1    8     2015
almoral01  CHI       Chicago Cubs           CHI     14   8     2016
almoral01  CHI       Chicago Cubs           CHI     46   6     2017
alvarpe01  PIT       Pittsburgh Pirates     PIT     77   17    2015
alvarto01  PIT       Pittsburgh Pirates     PIT     2    9     2002
alvarto01  PIT       Pittsburgh Pirates     PIT     8    1     2004

The second normal form

The second normal form (2NF) is the second level of database normalization. You will need to complete 1NF before beginning this step. The primary reason to implement 2NF is to narrow tables down to a single purpose, which makes it easier to use and design tables. This step will ensure that you do the following:

  • Meet the requirements of 1NF: You will need to implement 1NF before you can use 2NF.
  • Remove partial dependencies: This will entail narrowing tables down to a single purpose where possible.

Starting with the tables from our 1NF example, you can break them down further into additional tables. You will still have the same player table from 1NF, since it serves the single purpose of holding player information. The franchise table, however, serves multiple purposes by carrying RBI and rank; since RBI isn't related to the franchise, you will split the franchise table into two.
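A sketch of the resulting 2NF tables in MySQL syntax (the data types are assumptions and the key definitions are simplified; note that rank is a reserved word in MySQL 8.0, so it has to be quoted with backticks):

```sql
CREATE TABLE franchise (
    playerID   VARCHAR(10),
    franchID   VARCHAR(5),
    franchname VARCHAR(50),
    teamID     VARCHAR(5),
    `rank`     INT,
    yearID     INT,
    FOREIGN KEY (playerID) REFERENCES player (playerID)
);

CREATE TABLE batting (
    playerID VARCHAR(10),
    teamID   VARCHAR(5),
    RBI      INT,
    yearID   INT,
    FOREIGN KEY (playerID) REFERENCES player (playerID)
);
```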

The franchise table now holds all the franchise and team information, while RBI and its related columns are split out into a batting table. The franchise table is keyed by the combination of playerID, franchID, and yearID, with a foreign key referring back to the player table on playerID:

playerID   franchID  franchname             teamID  rank  yearID
abbotpa01  PHI       Philadelphia Phillies  PHI     2     2004
abreubo01  PHI       Philadelphia Phillies  PHI     1     2000
abreubo01  PHI       Philadelphia Phillies  PHI     3     2001
alcanar01  CHI       Chicago Cubs           CHI     8     2015
almoral01  CHI       Chicago Cubs           CHI     8     2016
almoral01  CHI       Chicago Cubs           CHI     6     2017
alvarpe01  PIT       Pittsburgh Pirates     PIT     17    2015
alvarto01  PIT       Pittsburgh Pirates     PIT     9     2002
alvarto01  PIT       Pittsburgh Pirates     PIT     1     2004

The batting table is keyed by the combination of playerID, teamID, and yearID, and it has a foreign key of playerID to the player table and a foreign key of teamID to the franchise table:

playerID   teamID  RBI  yearID
abbotpa01  PHI     2    2004
abreubo01  PHI     79   2000
abreubo01  PHI     110  2001
alcanar01  CHI     1    2015
almoral01  CHI     14   2016
almoral01  CHI     46   2017
alvarpe01  PIT     77   2015
alvarto01  PIT     2    2002
alvarto01  PIT     8    2004

Since you split the franchise table into two tables, one with franchise information and one with batting information, each table now serves a single purpose and complies with 2NF.

The third normal form

The third normal form (3NF) is the third level of database normalization. You will need to complete 2NF before beginning this step. The primary reason to implement 3NF is to ensure that your tables don't contain dependencies between non-key columns that may cause inconsistency. Generally, if a database is described as normalized, it's normalized to the third normal form. This step will ensure that you are doing the following:

  • Meeting the requirements of 2NF: You will need to make sure that your tables comply with 2NF before proceeding with 3NF.
  • Ensuring that no attributes depend on other non-key attributes: This means that you will need to look at your tables and see whether any fields can be split into other tables because they aren't dependent on the primary key.

Working from our 2NF example, you can further split the franchise table into a franchise table and a team table, because a team's rank isn't dependent on the primary key of the franchise table.
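A sketch of the two 3NF tables in MySQL syntax (data types are assumptions; rank needs backticks because it's a reserved word in MySQL 8.0):

```sql
CREATE TABLE franchise (
    franchID   VARCHAR(5) PRIMARY KEY,
    franchname VARCHAR(50)
);

CREATE TABLE team (
    franchID VARCHAR(5),
    teamID   VARCHAR(5),
    `rank`   INT,
    yearID   INT,
    FOREIGN KEY (franchID) REFERENCES franchise (franchID)
);
```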

In 3NF, our franchise table becomes the following two tables.

franchID becomes the primary key in the franchise table:

franchID  franchname
PHI       Philadelphia Phillies
CHI       Chicago Cubs
PIT       Pittsburgh Pirates

The team table is keyed by the combination of teamID and yearID and has a foreign key of franchID referring to the franchise table:

franchID  teamID  rank  yearID
PHI       PHI     2     2004
PHI       PHI     1     2000
PHI       PHI     3     2001
CHI       CHI     8     2015
CHI       CHI     8     2016
CHI       CHI     6     2017
PIT       PIT     17    2015
PIT       PIT     9     2002
PIT       PIT     1     2004

To summarize the process of taking the data from denormalized form to the third normal form, here's a recap of the changes that were made.

Following the rules of the normal forms, you took a single table and turned it into four tables. To begin with, you split one denormalized table into two tables as part of 1NF, which ensured that you didn't have duplicate data or repeating groups; this resulted in a player table and a franchise table. Next, you split those into three tables as part of 2NF, which gave each table a specific purpose, resulting in player, franchise, and batting tables. For the final step, you split the tables into four as part of 3NF, which ensured that no fields in a table were independent of the primary key, resulting in player, franchise, batting, and team tables.

By going from a denormalized table to 3NF, you accomplished several things, including ensuring that you don't have duplicate data, that you have keys linking data to each other in the tables, that you have a single purpose for each table, and that you have minimized the storage and compute costs for your queries.

Even the third normal form can be taken to extremes, so while it's desirable, it's not always required. For instance, you could move zip codes into their own table, since they may be duplicated in a table of users' addresses, but doing so may hurt performance instead of helping it.


Types of RDMS

An RDMS is a database management system that stores data in tables made up of rows and columns. The values within a table are related to one another, and tables may also be related to each other, hence the term relational. These relationships make it possible to access data across multiple tables with a single query.

In this section, we will review the top four relational database management systems: Oracle, MySQL, SQL Server, and PostgreSQL.

According to the DB-Engines Ranking (https://db-engines.com/en/ranking), these four had the top scores among RDMSes at the time of writing this book.


Oracle

Oracle was first released in 1979 and was the first commercially available SQL-based RDMS. It has one free version, Oracle Database XE, which has some limitations compared to the licensed versions. Oracle runs best on Linux but can also be installed on Windows. It's a great choice for organizations that need an RDMS that can handle very large databases and offers a wide variety of features.

The advantages of Oracle are that it offers a lot of functionality for system and database administrators, it's fast and stable, and it has lots of support and documentation.

The disadvantages of Oracle are that licensing is expensive and that it may require significant database administrator resources to maintain after installation.


MySQL

MySQL is a free, open source SQL database that was first released in 1995. Proprietary licensing, which includes support and maintenance, is also available. Sun Microsystems bought MySQL in 2008 and was in turn acquired by Oracle in 2010. MySQL is commonly used with PHP web applications and is a great choice for organizations that need a good RDMS on a tight budget.

The advantages of MySQL are that it's available for free, it offers a lot of functionality for system and database administrators, it's easy to use and implement, and it's fast and stable.

The disadvantages of MySQL are that while support is available, it's not free. Also, under Oracle's ownership, not all features are free: enterprise monitoring, backup, high availability, scalability, and security features are paid options.


SQL Server

Initially released in 1989, SQL Server is available with a commercial license. It has one free version, SQL Server Express, with a limit of 10 GB per database, along with other resource limitations. SQL Server is usually installed on Windows but can also be installed on Linux. It's a great choice for organizations that need a good RDMS and use a lot of other Microsoft products.

The advantages of SQL Server are that it offers a lot of functionality, including replication, high availability, and partitioning; it works very well with other Microsoft products, such as .NET Framework and Visual Studio; and it's fast and stable.

The disadvantages of SQL Server are that licensing is expensive, especially for the Enterprise edition, and not all features are included in all editions, such as some high-availability options and partitioning.


PostgreSQL

The first release of PostgreSQL was in 1989. It's free and open source: you can use it indefinitely, and it doesn't enforce any licensing limits. PostgreSQL is usually installed on Linux machines and can be used to store structured and unstructured data. It's a great choice for organizations that need a good RDMS, already use Linux, and don't want to spend a lot of money on licensing.

The advantages of PostgreSQL are that it offers a lot of functionality, such as high availability and partitioning, it's scalable and can handle terabytes of data, and it's fast and stable.

The disadvantages of PostgreSQL are that documentation may be harder to come by and configuration can be confusing. It also typically runs on Linux, so you need to know how to run commands from a terminal.


RDMS SQL differences

Even though there is an ANSI/ISO standard, each RDMS implements its own version of SQL. Still, to be compliant, they all support the major commands in a similar way, so SELECT, WHERE, INSERT, UPDATE, and DELETE all have closely matching syntax.
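For example, a statement like this sketch runs unchanged on MySQL, SQL Server, PostgreSQL, and Oracle (the table and column names are illustrative):

```sql
SELECT playerID, birthYear
FROM people
WHERE birthYear > 1980
ORDER BY birthYear;
```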

Each subsequent chapter of this book will also note differences in SQL language or functionality between MySQL and SQL Server, PostgreSQL, and Oracle where they exist.

Summary

This chapter introduced the concepts required to understand the basics of relational database management systems. It covered foundational topics such as what SQL is, what it can do, and its basic components. You learned that SQL has three subdivisions, DML, DDL, and DCL, and that the language is composed of several elements that make up a statement. We walked through the relational model, what a database is, and what's in a database, including tables, rows, columns, and fields.

We followed with an explanation of data integrity, including the different types of data integrity, such as entity, referential, and domain integrity, and looked at how to use keys and constraints. This understanding of data integrity helped you understand database normalization, including the different forms of normalization, 1NF, 2NF, and 3NF.

Lastly, you learned about the types of relational database management systems, including Oracle, MySQL, SQL Server, and PostgreSQL, and their advantages and disadvantages. You also learned what makes these RDMS different from each other in terms of SQL syntax.

In the next chapter, we will look at how to install MySQL Workbench on Windows and Mac, and go through step-by-step instructions to help you walk through each part of the installation process. The instructions will also include the configuration of MySQL Workbench on both Windows and Mac. We will walk through some examples of how to connect to your local MySQL and set up connections to other MySQL servers. We'll conclude with a step-by-step explanation of how to restore a database to MySQL.


Questions

  1. What is SQL?
  2. What are the different subdivisions of SQL?
  3. What are the elements of a SQL statement?
  4. What are the reasons to normalize a database?
  5. What are the levels of database normalization?
  6. What is data integrity?
  7. What are the different ways you can enforce data integrity?
  8. What types of RDMS exist?
  9. What is the main advantage of MySQL?
  10. What is the main disadvantage of Oracle and SQL Server?
   
Language English
ISBN-13 9781838981709
No of pages 566
Book Publisher Packt Publishing
Published Date 29 May 2020
