There are many different types of databases nowadays, with the advent of alternative storage techniques (CouchDB, Amazon Simple Storage, Google’s BigTable, etc.) but the relational database is still the most popular. Virtually all relational databases use structured query language (SQL) to access the information within. We’ll focus on MySQL syntax for these database problems because it’s the most widely used relational database in industry, but others such as MSSQL, Oracle, SQLite or PostgreSQL all have similar syntax.
A relational database stores information in a group of tables called a schema. Each table consists of rows and columns. Often, tables represent a class where the columns define the table’s properties and each row represents an object of that class. Each table must contain at least one column, but do not necessarily contain any rows. Every column is associated with a specific type as well as possible constraints, and a row must contain data for every column with matching types.
A key for a table is a column or possibly columns that contains a unique identification for a row in the table. A primary key is selected if multiple columns uniquely identify a row. Tables can be joined by foreign keys, which is usually a primary key from another table.
Just as a reminder, SQL is usually not case sensitive, but convention dictates capitalizing keywords entirely (e.g. SELECT) and camel casing the name of a table (e.g. TableName) whereas row names are all lowercase and often delimited by underscores (e.g. row_name).
Most interview questions involving databases consists of how to write queries for a specific database schema. For practice, we’ve provided a sample stock market related schema below:
symbol INT(11) PRIMARY KEY NOT NULL,
symbol INT(11) PRIMARY KEY NOT NULL,
FOREIGN KEY symbol REFERENCES Companies
Notice that the primary key is not an auto-incremented integer but instead the stock ticker. As a general design decision, integer auto-incrementing keys should only be used when there is no other possible unique identifier for the table.To insert values into a table, use the INSERT statement, whose syntax is as follows:
INSERT INTO TABLE(row_names…) VALUES(row_values);INSERT INTO Companies(symbol, name, ceo, employees)
For instance, to insert a company called TimeSavers run by Missy Jones
and 63 other employees with a stock ticker of TS into the Companies
table, you would use the following query:
VALUES(‘TS’, ‘TimeSavers’, ‘Missy Jones’, 63);
To retrieve values from a table, use the SELECT statement; for the example above,
SELECT * FROM Companies;
will yield the following results assuming they were already stored in the database:
Symbol Name CEO Employees
AC Able’s Coffee Jason Able 129
OS OfficeSuppliers Sheryl Tera 35
GD Grocery Deluxe Mike Perez 88
TS TimeSavers Missy Jones 63
You can choose to only display certain fields. For example,
SELECT symbol, revenue FROM Financials;
Also, you have the ability to specify ranges of values you’re interested in:
SELECT symbol, revenue FROM Financials WHERE revenue > 1000000;
The WHERE clause can include multiple ranges as well as use AND or OR statements to make further classifications possible. There are also GROUP BY and ORDER BY clauses to sort data, and aggregate clauses such as MIN, MAX, SUM, and AVG to perform relatively simple functions on sets of data.
To combine data from two different tables, a join is required.
SELECT symbol, name, revenue FROM Companies, Financials
Symbol Name Revenue
AC Able’s Coffee $1,005,000.00
OS OfficeSuppliers $500,000.00
GD Grocery Deluxe $800,000.00
TS TimeSavers $250,000.00
When writing SQL queries that include variables, it is imperative to understand SQL injection because it is a very real and dangerous threat to the security of your database. Unethical attackers may be able to gain access to information or wipe out your data by injecting their own query code. This can happen if you don’t escape certain special characters, most notably quotes.
For instance, if your query contains a variable $name whose value is obtained from a user input such as a web form, you must take care to escape special characters. Virtually every programming language has a function to do this; in PHP mysql_real_escape_string($str) accomplishes this. The reason behind escaping is to prevent attackers from ending your query prematurely and inserting their own queries afterwards.
Here is an example; assume your query is the following:
SELECT * FROM Companies WHERE name=’$name’;
An attacker may fill out the $name field as such:
Attacker’; DELETE FROM Companies WHERE 1=1; '--
This leads to the following combined query after variable substitution:
SELECT * FROM Companies WHERE name=’Attacker’; DELETE FROM Companies WHERE 1=1; '--’;
Because the attacker included an ending quote, he is able to insert his own query after yours, thereby deleting your entire table’s worth of data!
Granted, the attacker requires some knowledge about the database such as table names, but most organizations use common names that are relatively easy to guess. As such, it is always a good idea to escape your queries.
A database transaction is a collection of commands grouped into an indivisible unit of work. Transactions basically model real world events that are atomic (meaning that they either happen, or they don’t) but require multiple database operations to achieve.
The classic example is for financial transactions such as ATM withdrawals. Although it seems like a withdrawal is a single operation, it actually requires several steps: checking the account balance, moving the money out of the account, and adding the money into the ATM’s cash log.
As a result, if a withdrawal is not treated as a transaction, unexpected side effects may occur. For example, a husband and wife have access to a joint bank account which has a balance of $100. The husband and wife simultaneously withdraw $100, but if there were no database transactions, their account would be at $0 instead of -$100 afterwards!
This is because as the husband issues the withdrawal, the ATM checks the account to see if there is enough money; since $100 >= $100, there is adequate funds. Simultaneously, the wife issues the withdrawal and the check also passes. Then the husband’s machine deducts $100 from the account ($100 - $100 = 0) and spits out the money, and the wife’s machine does the same thing ($100 - $100 = 0), so the final balance of their account is $0.
This is bad news for the bank. Transactions can remove this possibility by requiring that all three steps be performed in a row as one operation until another operation can take place.
A transaction must follow the ACID principles to ensure data integrity. ACID is an acronym that stands for the following properties:
Atomicity – if all steps in a transaction do not succeed, all changes are rolled back. Changes are only made if the entire transaction is successful.
Consistency – the state of the database must be correct and consistent at the beginning and end of the transaction.
Isolation – every step in a transaction is performed without other operations accessing the database. Only one transaction may be processed at a time.
Durability – transactional changes, once successful, remain persistent in the database and will survive system failure. This is usually achieved by storing all operations in a log for replay later if necessary.
If all of these principles are followed, the transactions will ensure data integrity. However, the checks for all of these principles are computationally intensive and require a significant amount of running time, so not all databases implement ACID properties in order to improve performance.
Assume the practice questions refer to the tables and data from the previous examples.
Practice Question: Insert a company called Maximal Fitness run by Max Levin and 72 other employees with a ticker symbol of MF into the Companies table.
INSERT INTO Companies(Symbol, Name, CEO, Employees)
VALUES(‘MF’, ‘Maximal Fitness’, ‘Max Levin’, 72);
Practice Question: Select the name of the companies that have revenues over $300,000 and under $1,000,000.
SELECT Companies.name FROM Companies, Financials
WHERE Financials.revenue > 300000 AND Financials.revenue < 1000000;
Practice Question: Select the average number of employees amongst all companies.
SELECT AVG(employees) FROM Companies;