Setup postgress on a container

 the fastest way we can get everyone up and running on PostgreSQL is to have everyone use Docker containers.  Dockers allows any computer to run a lightweight emulation of a pre-made environment (typically Linux). In our case we’re going to be using a premade Linux environment that has PostgreSQL already installed on it.

Used containers

// image we will need
docker pull postgres:14
// completed image
docker pull btholt/complete-intro-to-sql

Running postgres

docker run -e POSTGRES_PASSWORD=lol --name=pg --rm -d -p 5432:5432 postgres:14

This should run PostgreSQL in the background.

  • We gave it a password of “lol” (feel free to change it to something different, I just remember lol because lol)
  • We ran it with a name of pg so we can refer it with that shorthand
  • We used the --rm flag so the container deletes itself afterwards (rather than leave its logs and such around)
  • We ran it in the background with -d. Otherwise it’ll start in the foreground.
  • The -p allows us to expose port 5432 locally which is the port Postgres runs on by default.

Run docker ps to see it running. You can also see it in the Docker Desktop app running under the containers tab.

Now run docker kill pg to kill the container. This will shut down the container and since we ran it with --rm it’ll clean itself up afterwards too. We can run the docker run … to start it again.

  • Okay, let’s try connecting to it with psql, the CLI tool for connecting to Postgres.

Connect to Postgres

docker exec -u postgres -it pg psql

Run postgres from your container

psql -U postgres

Postgres

psql commands

psql has a bunch of built in commands to help you navigate around. All of these are going to begin with \. Try \? to bring up the help list.

For now we’re interested in what databases we have available to us. Try running \l to list all databases. You’ll likely see the databases :

  • postgres,
  • template0,
  • template1.

Default databases

template1 is what Postgres uses by default when you create a database. If you want your databases to have a default shape, you can modify template1 to suit that.

template0 should never be modified. If your template1 gets out of whack, you can drop it and recreate from the fresh template0.

postgres exists for the purpose of a default database to connect to. We’re actually connected to it by default since we didn’t specify a database to connect to. You can technically could delete it but there’s no reason to and a lot of tools and extensions do rely on it being there.

Databases Overview

Create a database

A database is a collection of similar tables of data.

CREATE DATABASE recipeguru;

Switching to a database

to switch to a database you simply use the \c command followed by the database name

\c recipeguru
/*
Where recipeguru is our database name
*/

Show tables

there is no such a thing as show table … , to show all the tables in Postgres you simply use the command,

\d
// show all tables 
\d table_name 
// Describe the structure of the table

Insert into your table

INSERT INTO ingredients (title) VALUES ('bell pepper');

Select from your table

SELECT * FROM ingredients;

Alter table

Sometimes you want to change the structure of your table in this case you can use the alter command

Adding a column

ALTER TABLE ingredients ADD COLUMN image VARCHAR ( 255 );
 
or
ALTER TABLE ingredients
ADD COLUMN image VARCHAR ( 255 ),
ADD COLUMN type VARCHAR ( 50 ) NOT NULL DEFAULT 'vegetable';
 

Deleting a column

sometimes you want to delete a column well here you go

 
ALTER TABLE ingredients DROP COLUMN image;

Database Insert

INSERT INTO ingredients (
 title, image, type
) VALUES (
  'red pepper', 'red_pepper.jpg', 'vegetable'
);

This is the standard way of doing an insert. In the first set of parens you list out the column names then in the values column you list the actual values you want to insert.

Big key here which will throw JS developers for a loop: you must use single quotes. Single quotes in SQL means "this is a literal value". Double quotes in SQL mean "this is an identifier of some variety".

INSERT INTO "ingredients" (
 "title", "image", "type" -- Notice the " here
) VALUES (
  'broccoli', 'broccoli.jpg', 'vegetable' -- and the ' here
);

Use -- for comments

The above query works because the double quotes are around identifiers like the table name and the column names. The single quotes are around the literal values. The double quotes above are optional. The single quotes are not.

On conflict

Note the ON CONFLICT section. Some of these you may have already inserted (like the red pepper.) This is telling PostgreSQL that if a row exists already to just do nothing about it.

On conflict do nothing

This basically means that in case there is a conflict then do nothing on your database

INSERT INTO ingredients (
  title, image, type
) VALUES
  ( 'avocado', 'avocado.jpg', 'fruit' ),
  ( 'banana', 'banana.jpg', 'fruit' ),
  ( 'tomato', 'tomato.jpg', 'vegetable' ),
  ( 'watermelon', 'watermelon.jpg', 'fruit' )
ON CONFLICT DO NOTHING;

PG won't crush in case there is a conflict in your data that you are inserting

On conflict update

Sometimes you might have a conflict and rather than doing nothing you would like to update your stuff then that is where update on conflict becomes handy

INSERT INTO ingredients (
  title, image, type
) VALUES
  ( 'watermelon', 'banana.jpg', 'this won''t be updated' )
ON CONFLICT (title) DO UPDATE SET image = excluded.image;
 
/*
where the (title) is the basis of your conflict, 
and you simply say that update the image to the new incoming name in case there is a conflict on the watermelon
 
excluded.image is the new image being inserted.
*/

Database Update

The next thing we need to learn is how to update a field. We’ve seen how to INSERT and do an update if an INSERT fails, but let’s see how to update something directly.

UPDATE ingredients SET image = 'watermelon.jpg' WHERE title = 'watermelon';

The WHERE clause is where you filter down what you want to update.

In this case there’s only one watermelon so it’ll just update one but if you had many watermelons it would match all of those and update all of them,

You probably got a line returned something like

UPDATE 1

If you want to return what was updated try:

UPDATE ingredients SET image = 'watermelon.jpg' WHERE title = 'watermelon' 
RETURNING id, title, image;

The RETURNING clause tells Postgres you want to return those columns of the things you’ve updated.

In our case I had it return literally everything we have in the table so you could write that as

return everything that was inserted!💖

UPDATE ingredients SET image = 'watermelon.jpg' WHERE title = 'watermelon' RETURNING *;

The * means “everything” in this case.

Let’s add two rows with identical images.

INSERT INTO ingredients
  (title, image)
VALUES
  ('not real 1', 'delete.jpg'),
  ('not real 2', 'delete.jpg');

Whitespace isn’t significant in SQL. You can break lines up however you want and how they make sense to you.

Now let’s update both.

UPDATE ingredients
SET image = 'different.jpg'
WHERE image='delete.jpg'
RETURNING id, title, image;

Here you can see as long as that WHERE clause matches multiple rows, it'll update all the rows involved.

DELETE

Deletes are very similar in SQL.

DELETE FROM ingredients
WHERE image='different.jpg'
RETURNING *;

Here we just have no SET clause. Anything that matches that WHERE clause will be deleted. The RETURNING, like in updates, is optional if you want to see what was deleted.

Select statement

Let’s next hop into SELECT, or how to read data from a database. We’ve already seen the most basic one.

SELECT * FROM ingredients;

The * there represents all available columns. Frequently, for a variety of reasons, we do not want to select everything. In general, I would recommend only using * where your intent is truly “I want anything this database could ever store for these records“.

Frequently that is not the case. Frequently it is “I need the name, address and email for this email, but not their social security or credit card number”.

This is positive because it means smaller transfer loads, that your system only processes data it needs and not the rest, and it shows intention in your code. Honestly the biggest benefit is the latter: to show intentions in your code.

So in our case we could put

SELECT id, title, image, type FROM ingredients;

When you read this now you know that the former coder was actively looking for those three columns.

LIMIT and OFFSET

Okay, now what if the user only wants five records?

SELECT id, title, image
FROM ingredients
LIMIT 5;

This will limit your return to only five records, the first five it finds. You frequently will need to do this as well (as in almost always) since a database can contain millions if not billions of records.

So what if you want the next five records?

SELECT id, title, image
FROM ingredients
LIMIT 5
OFFSET 5;

This can be inefficient at large scales and without the use of indexes.

It also has the problem of if you're paging through data and someone inserts a record in the meantime you could shift your results. We'll get into optimizing queries a bit later, but just be aware of that.

Using offset, when someone is adding and deleting stuffs throughout your data then it's really gonna be hard for the offset to give them the updated results, some rows might never get returned to the user

A better way of doing these kind of select is using the id, which is much more efficient and way performant compared to the brother offset.

SELECT *
FROM ingredients
WHERE id >= 10 and id<=20;

WHERE

Sometimes you don’t want all of the records all at once. In that case you can add a WHERE clause where you tell the database to filter your results down to a subset of all of your records. What if we only wanted to show only fruits?

SELECT *
FROM ingredients
WHERE type = 'fruit';

This will give us all the fruits we had. What if we wanted to only select vegetables where the the IDs are less 20?

SELECT *
FROM ingredients
WHERE type = 'vegetable'
  AND id < 20;

AND allows you to add multiple clauses to your selects. As you may guess, the presence of AND belies the existance of OR

SELECT *
FROM ingredients
WHERE id <= 10
  OR id >= 20;

ORDER BY

You frequently will care about the order these things come back in, how they’re sorted. That’s what ORDER BY is for.

Let’s say you wanted to order not by id or insertion order but by title.

SELECT * FROM ingredients ORDER BY title;

This will alphabetize your returned list. What if we wanted it in reverse order of IDs?

SELECT * FROM ingredients ORDER BY id DESC;

This will start at the largest number and count backwards. As you may have guessed, ASC is implied if you don’t specify.

Some times you want to search for something in your database but it’s not an exact match. The best example would be if a user types something into the search bar, they’re likely not giving you something that you match exactly. You would expect a search for “pota” to match “potato”, right? Enter LIKE in SQL.

SELECT * FROM ingredients WHERE title LIKE '%pota%';

This is a very limited fuzzy matching of text. This is not doing things like dropping “stop words” (like and, the, with, etc.) or handling plurals, or handling similar spellings (like color vs colour). Postgres can do this, and we’ll get there later with indexes.

Built in functions

Okay, great, now what if a user searchs for “fruit”? We’d expect that to work, right?

Concat

SELECT * FROM ingredients WHERE CONCAT(title, type) LIKE '%fruit%';

Info concat() is a function that will take two strings and combine them together. We can concat our two title and type columns and then use LIKE on the results on that combined string.

The result of the cherry row would be cherryfruit which means it’d match weird strings like rryfru. We’ll talk later about more complicated string matching but this will do for now.

Okay, but what if we have capitalization problem? We can use lower, both on the columns and on the values.

Lower

SELECT * FROM ingredients WHERE LOWER(CONCAT(title, type)) LIKE LOWER('%FrUiT%');

LOWER() take a string and make it lowercase.

ILIKE (insensitive)

Fortunately, there's an even easier way to do this with less function evocation.

SELECT * FROM ingredients WHERE CONCAT(title, type) ILIKE '%FrUiT%';

ILIKE does the same thing, just with case insensitivity. Mostly I just wanted to show you lower!

There are so many built in functions to Postgres. Click here to see the official docs.

% vs _

You see that we’ve been surrounding the LIKE values with %. This is basically saying “match 0 to infinite characters”. So with “%berry” you would match “strawberry” and “blueberry” but not “berry ice cream”. Because the ”%” was only at the beginning it wouldn’t match anything after, you’d need “%berry%” to match both “strawberry” and “blueberry ice cream”.

SELECT * FROM ingredients WHERE title ILIKE 'c%';

The above will give you all titles that start with “c”.

You can put % anywhere. “b%t” will match “bt”, “bot”, “but”, “belt”, and “belligerent”.

There also exists _ which will match 1 and only one character. “b_t” will match “bot” and “but” but not “bt”, “belt”, or “belligerent”.

Database and node

This is not a Node.js class so the expectation of your Node.js skills here are fairly low. No worries if you don’t have a ton of experience with Node.js

We are going to be using the pg package which is the most common PostgreSQL client for Node.js. There are others but this is the one with the most direct access to the underlying queries which is what we want. Feel free to use an ORM in your projects but it’s useful to know how the underlying PostgreSQL works.

Before I dump you into the code I wanted to give you a very brief tour of what to expect.

Connecting to a database

pg has two ways to connect to a database: a client and a pool.

You can think of a client as an individual connection you open and then eventually close to your PostgreSQL database. When you call connect on a client, it handshakes with the server and opens a connection for you to start using. You eventually call end to end your session with a client.

A pool is a pool of clients. Opening and closing clients can be expensive if you’re doing it a lot. Imagine you have a server that gets 1,000 requests per second and every one of those needs to open a line to a database. All that opening and closing of connections is a lot of overhead. A pool therefore holds onto connections. You basically just say “hey pool, query this” and if it has a spare connection open it will give it to you. If you don’t then it will open another one. Pretty neat, right?

Okay, we’re going to be using pools today since we’re doing a web server. I’d really only use a client for a one-off script or if I’m doing transactions which aren’t supported by pools (transactions are not covered in this course but it’s if multiple queries have to happen all at once or not all). Otherwise I’d stick to pools. One isn’t any harder to use than the other.

So let’s see how to connect to a database.

const pg = require("pg");
const pool = new pg.Pool({
  user: "postgres",
  host: "localhost",
  database: "recipeguru",
  password: "lol",
  port: 5432,
});
  • These are the default credentials combined with what I set up in the previous lessons. Make sure you’re using the correct credentials.
  • Make sure you started PostgreSQL via docker with the -p 5432:5432 flag or PostgreSQL won’t be exposed on your host system.
  • Make sure your database is running too.

Once you’ve done this, you can now start making queries to PostgreSQL!

Let’s write a query.

const { rows } = await pool.query(`SELECT * FROM recipes`);
  • rows will be the response of the query.
  • There’s lot of other metadata on these queries beyond just the rows. Feel free to console.log it or debug it to explore.
  • pg will add the ; at the end for you.

Parameterization and SQL injection

Let’s say you wanted to query for one specific ingredient by id and that id was passed in via an AJAX request.

SELECT * FROM ingredients WHERE id = <user input here>;

What’s wrong with then doing then:

const { id } = req.query;
const { rows } = await pool.query(`SELECT * FROM ingredients WHERE id=${id}`);

SQL injection. You’re just raw passing in user data to a query and a user could fathomably put anything in that id API request. What happens if a user made id equal to 1; DROP TABLE users; --?

Do not run the next query. Well, it wouldn’t matter since we don’t have a users table but still, it’s meant to be a demonstration of what could happen.

SELECT * FROM ingredients WHERE id=1; DROP TABLE users; --

Oh no 😱

The first 1; ends the first query. The DROP TABLE users; does whatever malicious thing the hacker wants to do. They can delete info like this one does, or they can dump info that you don’t want them to. The -- says “everything after this is a comment” so it comments out the rest of your SQL if anything came after it.

This is SQL injection and a very real issue with SQL in apps. It’s why Wordpress apps are always under attack because they use MySQL behind the scenes. If you can find somewhere you can inject SQL you can get a lot of info out of an app.

So how do we avoid it? Sanitization of your inputs and parameterization of your queries.

const { id } = req.query;
 
const { rows } = await pool.query(`SELECT * FROM ingredients WHERE id=$1`, [
  id,
]);

If you do this, pg will handle making sure that what goes into the query is safe to send to your database. Easy enough, right?

If you need to do a a thing like WHERE text ILIKE '%star wars%' and you’re using parameterized queryies, just put the % in the thing to be inserted, e.g. ('WHERE text ILIKE $1', ['%star wars%'])

Joints

So far we’ve done a one to one matching of records. We’ve used a record in a database to represent one item:

  • vegetable,
  • a fruit,
  • …so on and so forth.

Now we’re going to get into records that can relate to each other. Let’s think about recipes.

A recipe has multiple ingredients. That has word is key here. It means there is a relationship.

A single recipe has many ingredients. An ingredient can also be in many recipes. A tomato is both in pizza sauce and in a BLT. This is called a many-to-many relationship.

There’s also one-to-many relationships, like imagine if we had multiple photos of each of our ingredients.

A single ingredient will have five photos. And those photos will only will only belong to one ingredient. A photo of a green pepper doesn’t make sense to belong to anything besides the green pepper ingredient.

There can also exist one-to-one relationships but in general you would just make those the same record all together. You could split up the type and title into two tables, but why would you.

A basic relationship

We’re going to do this a naïve wrong way, and then we’re going to do it the correct way.

First thing we’re going to make a recipes table

CREATE TABLE recipes (
  recipe_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  title VARCHAR ( 255 ) UNIQUE NOT NULL,
  body TEXT
);

Okay, no rocket science so far. We just created another random table. Let’s say we could have many images of a single recipe. How would we go about doing that?

INSERT INTO recipes
  (title, body)
VALUES
  ('cookies', 'very yummy'),
  ('empanada','ugh so good'),
  ('jollof rice', 'spectacular'),
  ('shakshuka','absolutely wonderful'),
  ('khachapuri', 'breakfast perfection'),
  ('xiao long bao', 'god I want some dumplings right now');

Pro tip: don’t write a course while hungry.

Okay, we now have a bunch of recipes. Our problem now is that we have a variable amount of photos for each recipe. Perhaps we have one picture of cookies but we have four pictures of jollof rice. How do we model that if we always have a fixed amount columns?

One approach would be to say “I only have at most five pictures of any recipes, I’ll make columns pic1 through pic5 columns and limit myself to that”. Totally valid approach, and if you can guarantee only 5 images at most, it might be the right approach.

But what if we can’t guarantee that? We need some sort of flexibility. This is where we can use two tables.

CREATE TABLE recipes_photos (
  photo_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  recipe_id INTEGER,
  url VARCHAR(255) NOT NULL
);

Now we can insert a some rows to reference images that we have.

INSERT INTO recipes_photos
  (recipe_id, url)
VALUES
  (1, 'cookies1.jpg'),
  (1, 'cookies2.jpg'),
  (1, 'cookies3.jpg'),
  (1, 'cookies4.jpg'),
  (1, 'cookies5.jpg'),
  (2, 'empanada1.jpg'),
  (2, 'empanada2.jpg'),
  (3, 'jollof1.jpg'),
  (4, 'shakshuka1.jpg'),
  (4, 'shakshuka2.jpg'),
  (4, 'shakshuka3.jpg'),
  (5, 'khachapuri1.jpg'),
  (5, 'khachapuri2.jpg');
-- no pictures of xiao long bao

Okay, now let’s select all the photos for shakshuka. Shakshuka is ID 4 (if you followed my instructions, feel free to drop tables and recreate them if you need to.)

SELECT title, body FROM recipes WHERE recipe_id = 4;
SELECT url FROM recipes_photos WHERE recipe_id = 4;

It'd be cool if we could do these all at the same time. Some sort inner section of a venn diagram, right? You can!

SELECT recipes.title, recipes.body, recipes_photos.url
  FROM recipes_photos
  INNER JOIN
    recipes
  ON
    recipes_photos.recipe_id = recipes.recipe_id
  WHERE recipes_photos.recipe_id = 4;

Amazing!!

Now we’re getting data from each table smushed together. The data from recipes will be repeated over item in the recipes_photos table but that’s what we expect.

Just to show you some common shorthand (but otherwise same query)

SELECT r.title, r.body, rp.url
  FROM recipes_photos rp
  INNER JOIN
    recipes r
  ON
    rp.recipe_id = r.recipe_id
  WHERE rp.recipe_id = 4;

Go ahead and try this to see the full inner join.

SELECT r.title, r.body, rp.url
  FROM recipes_photos rp
  INNER JOIN
    recipes r
  ON
    rp.recipe_id = r.recipe_id;

Pretty cool, right?

Tips and tricks

SQL Commands

Base table

namecontinentareapopulationgdp
AfghanistanAsia6522302550010020343000000
AlbaniaEurope28748283174112960000000
AlgeriaAfrica238174137100000188681000000
AndorraEurope468781153712000000
AngolaAfrica124670020609294100990000000

Selection commands

Where clause

The example uses a WHERE clause to show the population of ‘France’. Note that strings should be in ‘single quotes’;

SELECT population FROM world
  WHERE name = "France"
In clause

Checking a list The word IN allows us to check if an item is in a list.

SELECT name, population FROM world
  WHERE name IN ('Brazil', 'Russia', 'India', 'China');
 
Between clause

Which countries are not too small and not too big? BETWEEN allows range checking (range specified is inclusive of boundary values)

SELECT name, area FROM world
  WHERE area BETWEEN 250000 AND 300000
 
Group by

In a GROUP BY statement only distinct values are shown for the column in the GROUP BY. This example shows the continents hosting the Olympics with the count of the number of games held.

SELECT continent, COUNT(yr) FROM games
 GROUP BY continent
 
Join

You can use a JOIN to get results from two or more related tables. In this example each row of the table games is related to a row of the table city. If you want to find the country where the games took place you must JOIN the games table to the city table on the common field that is games.city and city.name

SELECT games.yr, city.country
  FROM games JOIN city
       ON (games.city = city.name)
 
Order by

Show the name and population for each country with a population of more than 100000000. Show countries in descending order of population.

SELECT name, population
  FROM bbc
  WHERE population > 100000000
  ORDER BY population DESC
 
Distinct

By default the result of a SELECT may contain duplicate rows. We can remove these duplicates using the DISTINCT key word.

SELECT DISTINCT region FROM bbc