Databases are the best way to store data, and everyday there is more and more data! The next hundred years are years of data-mining, machine learning and a lot of automation. The only way to achieve that is by mastering the databases.
At the beginning, information was saved into more simple files like .csv (comma separated). This was very disorganized – the data was both easy to corrupt and hard to access (imagine having to browse comma separated files).
At some point, our friend Edgar decided to re-design those files and created a standard way of storing data into files focused on performance and integrity. He partnered with mister Donald and together they created a language called SQL (Structured Query Language) – which was designed to work with data in a more user-friendly way.
SQL is the most used language when you want to speak to a database directly. It stands for: Structured Query Language, and is simply a computer language for storing, manipulating and retrieving data that is stored in relational databases.
An SQL query syntax looks something like this:
SELECT username FROM user WHERE email@example.com'
In this example, we are asking the database for all the users with the email equal to "firstname.lastname@example.org"
☝ If you want to earn a developer’s respect, you need to be comfortable with SQL. You will use it A LOT when working with data.
Databases house collections of data called tables. Everything gets stored in tables with rows and columns, just like spreadsheets. Each column represents common attributes for the rows they intersect with, which are instances of the data.
Each table is an entity of information like "People", "Cars", "Events", etc. The rows contain the actual data representing an instance of the subject- so if we have a table called "People", it might contain columns for "First-Name", "Last-Name", and an "id", a row would have those values pertaining to the column, like "Bob", "Ross", and "42" (further breakdown below). The table is the only structure capable of storing data via SQL.
Tables have columns (just like excel). A table called Person can have 4 columns: Name, Last Name, Birth-Date and Weight. A table can have as many columns as you want, and will hold a specific type of value (such as string, integer, boolean etc). Columns describe the table in a similar way as properties describe a class in object oriented programming.
If a table is called Persons and has those 4 columns (name, last name, birth date and weight), then the rows will be each specific person that you have stored into your database.
A database is a collection of inter-connected tables. The connection between two tables is called a "relationship," and it can be one of the following:
One to One:
The perfect example is the IRS database: this database probably has a table called TaxPayer that contains all the information about each person (with a social security number), and another table with the TaxDeclarations of the current year. One person can only have one declaration and one declaration can only be made by one person.
One to Many:
The Major League Baseball database probably has a table called Player (with the list of all the active players), and another table called Team with the list of all the active teams. Both tables are connected because one team has many players, but a player can be on only one team.
Many to Many:
A Public Library database probably has one table called Author (containing the information of all the authors with books published), and also another table with ALL of the books that have ever been published. Both tables are related because one author can have many books and one book can have many authors.
There are 3 main operations you can do to a table: Create, Update or Delete a table. In SQL, those operations are called Create, Alter and Drop. Remember that these operations are used to manipulate the structure of the database – not the data that it contains.
Creates a new table, a view of a table, or another object in the database.
CREATE TABLE IF NOT EXISTS `chat_group` ( `chat_group_id` int(10) UNSIGNED NOT NULL, `name` varchar(20) NOT NULL, `create_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP PRIMARY KEY('chat_group_id') ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Modifies an existing database object, such as a table.
ALTER TABLE table_name MODIFY column_name datatype NOT NULL;
Deletes an entire table, a view of a table, or another object in the database.
DROP TABLE CUSTOMERS;
When using SQL, you have 4 main commands to manipulate data: Select, Insert, Update and Delete.
All of those commands are designed to manipulate ONE or SEVERAL record/rows of the database at the same time. But, you can only execute ONE command at a time.
This is the most used operation. It is the only way to retrieve any specific row/record of data from a database. You can specify what rows you want to retrieve by requesting a group of conditions that those rows have to meet.
SELECT column1, column2... columnN FROM table1 WHERE column1 = 3; //Select a particular user by his Social Security Number SELECT ssn, username, email FROM USER WHERE ssn = '233-34-3453';
Creates a new row/record into the table. It will be appended at the end.
INSERT INTO table_name (column1,column2,...columnN) VALUES (value1,value2,...valueN); //Insert a particular user INSERT INTO USER (ssn,username,email) VALUES ('234-45-3342','alesanchezr','email@example.com');
Updates a record or row of a specific table. You have to give either one or many conditions to identify the specific row(s) you want to update.
UPDATE table_name SET column1 = value1 WHERE [condition] //updating the email of a user UPDATE USER SET email = 'firstname.lastname@example.org' WHERE ssn = '333-44-5534'
Works very similar to update, but, instead of passing the new values of the new columns you want to update, you only need to specify which rows you want to delete by requesting a group of conditions.
DELETE FROM table_name WHERE [condition] //delete all users (the condition is optional DELETE FROM user; //delete a specific user DELETE FROM user WHERE ssn = '342-45-6732'
Keeping the integrity of the data is hard! Databases with a lot of activities and users normally struggle to keep data integrity. Sometimes the data is so delicate and sensitive that adding a zero to a simple integer could end up making someone a millionaire!
We can classify the integrity problems in 4 types:
A transaction is a group of one or more SQL operations (inserts, deletes, updates, creates, drops, etc.).
A transaction groups several operations into one because they all depend on each other. You need to have the guarantee that if one of those operations fails, the database will "undo" everything you did within the same transaction.
For example, let say that a Bank Client is moving money from one account to another. This transaction involves two operations:
What if the first operation was successful but the second one failed? You need to "undo" the first one or the user will lose the money forever.
Transactions have the following four standard properties, usually referred to by the acronym ACID:
These are the following commands used to control transactions:
Transactional control commands are only used with the DML commands INSERT, UPDATE and DELETE. They cannot be used while creating tables or dropping them because these operations are automatically committed in the database.
The COMMIT command is the transactional command used to save changes invoked by a transaction to the database.
The COMMIT command saves all transactions to the database since the last COMMIT or ROLLBACK command.
The syntax for COMMIT command is as follows:
//operation one... //operation two... //operation three... COMMIT;
The ROLLBACK command is the transactional command used to undo transactions that have not already been saved to the database.
The ROLLBACK command can only be used to undo transactions since the last COMMIT or ROLLBACK command was issued.
The syntax for ROLLBACK command is as follows:
//any operation before the rollback INSERT INTO client_account SET (amount) VALUES (1000) //now we rollback the transaction, the insert never happened ROLLBACK;
A SAVEPOINT is a point in a transaction when you can roll the transaction back to a certain point without rolling back the entire transaction.
The syntax for SAVEPOINT command is as follows:
This command serves only in the creation of a SAVEPOINT among transactional statements. The ROLLBACK command is used to undo a group of transactions.
The syntax for rolling back to a SAVEPOINT is as follows:
ROLLBACK TO SAVEPOINT_NAME;
The RELEASE SAVEPOINT command is used to remove a SAVEPOINT that you have created.
The syntax for RELEASE SAVEPOINT is as follows:
RELEASE SAVEPOINT SAVEPOINT_NAME;
The SET TRANSACTION command can be used to initiate a database transaction. This command is used to specify characteristics for the transaction that follows.
For example, you can specify a transaction to be read only, or read write.
The syntax for SET TRANSACTION is as follows:
SET TRANSACTION [ READ WRITE | READ ONLY ];