Set up Postgres in a container

The fastest way to get everyone up and running on PostgreSQL is to have everyone use Docker containers. Docker allows any computer to run a lightweight, pre-made environment (typically Linux). In our case we’re going to use a pre-made Linux environment that has PostgreSQL already installed on it.

Container images we’ll use

# the image we will need
docker pull postgres:14
# the completed image
docker pull btholt/complete-intro-to-sql

Running postgres

docker run -e POSTGRES_PASSWORD=lol --name=pg --rm -d -p 5432:5432 postgres:14

This should run PostgreSQL in the background.

  • We gave it a password of “lol” (feel free to change it to something different, I just remember lol because lol)
  • We ran it with a name of pg so we can refer to it with that shorthand
  • We used the --rm flag so the container deletes itself afterwards (rather than leave its logs and such around)
  • We ran it in the background with -d. Otherwise it’ll start in the foreground.
  • The -p allows us to expose port 5432 locally which is the port Postgres runs on by default.

Run docker ps to see it running. You can also see it in the Docker Desktop app running under the containers tab.

Now run docker kill pg to kill the container. This will shut down the container, and since we ran it with --rm it’ll clean itself up afterwards too. We can run the docker run … command from above to start it again.

Okay, let’s try connecting to it with psql, the CLI tool for connecting to Postgres.

Connect to Postgres

docker exec -u postgres -it pg psql

Run psql from inside the container

psql -U postgres


psql commands

psql has a bunch of built-in commands to help you navigate around. All of these begin with \. Try \? to bring up the help list.

For now we’re interested in what databases we have available to us. Try running \l to list all databases. You’ll likely see these databases:

  • postgres
  • template0
  • template1

Default databases

template1 is what Postgres uses by default when you create a database. If you want your databases to have a default shape, you can modify template1 to suit that.

template0 should never be modified. If your template1 gets out of whack, you can drop it and recreate from the fresh template0.

postgres exists to provide a default database to connect to. We’re actually connected to it right now since we didn’t specify a database to connect to. You could technically delete it, but there’s no reason to, and a lot of tools and extensions rely on it being there.

Databases Overview

Create a database

A database is a collection of related tables of data.

CREATE DATABASE recipeguru;

Switching to a database

To switch to a database, use the \c command followed by the database name.

\c recipeguru
/*
Where recipeguru is our database name
*/

Show tables

There is no SHOW TABLES command in Postgres. To show all the tables, use the \d command:

\d
-- show all tables
\d table_name
-- describe the structure of the table

Insert into your table

INSERT INTO ingredients (title) VALUES ('bell pepper');

Select from your table

SELECT * FROM ingredients;

Alter table

Sometimes you want to change the structure of your table. In that case you can use the ALTER TABLE command.

Adding a column

ALTER TABLE ingredients ADD COLUMN image VARCHAR ( 255 );
 
or
ALTER TABLE ingredients
ADD COLUMN image VARCHAR ( 255 ),
ADD COLUMN type VARCHAR ( 50 ) NOT NULL DEFAULT 'vegetable';
 

Deleting a column

Sometimes you want to delete a column. Here’s how:

 
ALTER TABLE ingredients DROP COLUMN image;

Database Insert

INSERT INTO ingredients (
 title, image, type
) VALUES (
  'red pepper', 'red_pepper.jpg', 'vegetable'
);

This is the standard way of doing an insert. In the first set of parens you list the column names, then after VALUES you list the actual values you want to insert, in the same order.

Big key here which will throw JS developers for a loop: you must use single quotes. Single quotes in SQL means "this is a literal value". Double quotes in SQL mean "this is an identifier of some variety".

INSERT INTO "ingredients" (
 "title", "image", "type" -- Notice the " here
) VALUES (
  'broccoli', 'broccoli.jpg', 'vegetable' -- and the ' here
);

Use -- for comments

The above query works because the double quotes are around identifiers like the table name and the column names. The single quotes are around the literal values. The double quotes above are optional. The single quotes are not.

On conflict

Note the ON CONFLICT clause in the queries below. Some of these rows you may have already inserted (like the red pepper). This tells PostgreSQL that if a row exists already, it should just do nothing about it.

On conflict do nothing

This tells Postgres that if an insert would cause a conflict, it should silently skip that row instead of erroring.

INSERT INTO ingredients (
  title, image, type
) VALUES
  ( 'avocado', 'avocado.jpg', 'fruit' ),
  ( 'banana', 'banana.jpg', 'fruit' ),
  ( 'tomato', 'tomato.jpg', 'vegetable' ),
  ( 'watermelon', 'watermelon.jpg', 'fruit' )
ON CONFLICT DO NOTHING;

Postgres won't error out if there is a conflict in the data you are inserting; it will quietly skip the conflicting rows.

On conflict update

Sometimes, rather than doing nothing on a conflict, you’d like to update the existing row instead. That’s where ON CONFLICT ... DO UPDATE comes in handy.

INSERT INTO ingredients (
  title, image, type
) VALUES
  ( 'watermelon', 'banana.jpg', 'this won''t be updated' )
ON CONFLICT (title) DO UPDATE SET image = excluded.image;
 
/*
(title) is the column the conflict is detected on. On a
conflict for 'watermelon', we update the image to the new
incoming value.

excluded.image refers to the image of the new row that was
proposed for insertion but excluded by the conflict.
*/

Database Update

The next thing we need to learn is how to update a field. We’ve seen how to INSERT and do an update if an INSERT fails, but let’s see how to update something directly.

UPDATE ingredients SET image = 'watermelon.jpg' WHERE title = 'watermelon';

The WHERE clause is where you filter down what you want to update.

In this case there’s only one watermelon so it’ll just update one row, but if you had many watermelons it would match all of those and update all of them.

You probably got a line returned something like

UPDATE 1

If you want to return what was updated try:

UPDATE ingredients SET image = 'watermelon.jpg' WHERE title = 'watermelon'
RETURNING id, title, image;

The RETURNING clause tells Postgres you want to return those columns of the things you’ve updated.

In our case I had it return every column we have in the table, so you could also write that as:

UPDATE ingredients SET image = 'watermelon.jpg' WHERE title = 'watermelon' RETURNING *;

The * means “everything” in this case.

Let’s add two rows with identical images.

INSERT INTO ingredients
  (title, image)
VALUES
  ('not real 1', 'delete.jpg'),
  ('not real 2', 'delete.jpg');

Whitespace isn’t significant in SQL. You can break lines up however you want and how they make sense to you.

Now let’s update both.

UPDATE ingredients
SET image = 'different.jpg'
WHERE image='delete.jpg'
RETURNING id, title, image;

Here you can see as long as that WHERE clause matches multiple rows, it'll update all the rows involved.

DELETE

Deletes are very similar in SQL.

DELETE FROM ingredients
WHERE image='different.jpg'
RETURNING *;

Here we just have no SET clause. Anything that matches that WHERE clause will be deleted. The RETURNING, like in updates, is optional if you want to see what was deleted.

Select statement

Let’s next hop into SELECT, or how to read data from a database. We’ve already seen the most basic one.

SELECT * FROM ingredients;

The * there represents all available columns. Frequently, for a variety of reasons, we do not want to select everything. In general, I would recommend only using * where your intent is truly “I want anything this database could ever store for these records”.

Frequently that is not the case. Frequently it is “I need the name, address, and email for this user, but not their social security or credit card number”.

This is positive because it means smaller transfer loads, your system only processes the data it needs, and it shows intention in your code. Honestly the biggest benefit is the last one: showing intention in your code.

So in our case we could put

SELECT id, title, image, type FROM ingredients;

When you read this, you know that the previous coder was deliberately selecting those four columns.

LIMIT and OFFSET

Okay, now what if the user only wants five records?

SELECT id, title, image
FROM ingredients
LIMIT 5;

This will limit your return to only five records, the first five it finds. You frequently will need to do this as well (as in almost always) since a database can contain millions if not billions of records.

So what if you want the next five records?

SELECT id, title, image
FROM ingredients
LIMIT 5
OFFSET 5;

This can be inefficient at large scales and without the use of indexes.

It also has a consistency problem: if someone inserts or deletes a record while a user is paging through the data, the results can shift, and some rows might never be returned to the user at all. We’ll get into optimizing queries a bit later, but just be aware of that.

A better way of doing this kind of paging is to filter on the id instead (often called keyset pagination), which is much more efficient than OFFSET.

SELECT *
FROM ingredients
WHERE id >= 10 AND id <= 20;

WHERE

Sometimes you don’t want all of the records all at once. In that case you can add a WHERE clause where you tell the database to filter your results down to a subset of all of your records. What if we only wanted to show only fruits?

SELECT *
FROM ingredients
WHERE type = 'fruit';

This will give us all the fruits we had. What if we wanted to select only vegetables where the IDs are less than 20?

SELECT *
FROM ingredients
WHERE type = 'vegetable'
  AND id < 20;

AND allows you to add multiple conditions to your selects. As you may guess, the presence of AND implies the existence of OR.

SELECT *
FROM ingredients
WHERE id <= 10
  OR id >= 20;

ORDER BY

You frequently will care about the order these things come back in, how they’re sorted. That’s what ORDER BY is for.

Let’s say you wanted to order not by id or insertion order but by title.

SELECT * FROM ingredients ORDER BY title;

This will alphabetize your returned list. What if we wanted it in reverse order of IDs?

SELECT * FROM ingredients ORDER BY id DESC;

This will start at the largest number and count backwards. As you may have guessed, ASC is implied if you don’t specify.
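
You can also combine sort keys. A quick sketch using our existing ingredients columns (ASC written out here only for clarity, since it’s the default):

```sql
-- Sort by type alphabetically, then by title within each type
SELECT * FROM ingredients ORDER BY type ASC, title ASC;
```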

Some times you want to search for something in your database but it’s not an exact match. The best example would be if a user types something into the search bar, they’re likely not giving you something that you match exactly. You would expect a search for “pota” to match “potato”, right? Enter LIKE in SQL.

SELECT * FROM ingredients WHERE title LIKE '%pota%';

This is a very limited fuzzy matching of text. This is not doing things like dropping “stop words” (like and, the, with, etc.) or handling plurals, or handling similar spellings (like color vs colour). Postgres can do this, and we’ll get there later with indexes.

Built in functions

Okay, great. Now what if a user searches for “fruit”? We’d expect that to work, right?

Concat

SELECT * FROM ingredients WHERE CONCAT(title, type) LIKE '%fruit%';

CONCAT() is a function that takes two strings and combines them. We can concat our title and type columns and then use LIKE on that combined string.

The result of the cherry row would be cherryfruit which means it’d match weird strings like rryfru. We’ll talk later about more complicated string matching but this will do for now.

Okay, but what if we have a capitalization problem? We can use LOWER, both on the columns and on the values.

Lower

SELECT * FROM ingredients WHERE LOWER(CONCAT(title, type)) LIKE LOWER('%FrUiT%');

LOWER() takes a string and makes it lowercase.

ILIKE (insensitive)

Fortunately, there's an even easier way to do this with less function invocation.

SELECT * FROM ingredients WHERE CONCAT(title, type) ILIKE '%FrUiT%';

ILIKE does the same thing, just with case insensitivity. Mostly I just wanted to show you lower!

There are many built-in functions in Postgres. See the official docs for the full list.

% vs _

You’ll see that we’ve been surrounding the LIKE values with %. This basically says “match zero to many characters”. So “%berry” would match “strawberry” and “blueberry” but not “berry ice cream”: because the % is only at the beginning, nothing may come after “berry”. You’d need “%berry%” to match both “strawberry” and “blueberry ice cream”.

SELECT * FROM ingredients WHERE title ILIKE 'c%';

The above will give you all titles that start with “c”.

You can put % anywhere. “b%t” will match “bt”, “bot”, “but”, “belt”, and “belligerent”.

There also exists _ which will match 1 and only one character. “b_t” will match “bot” and “but” but not “bt”, “belt”, or “belligerent”.
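
To try those patterns against our table (a sketch; what actually matches depends on the titles you’ve inserted):

```sql
-- % matches zero or more characters: "bt", "bot", "belt", "belligerent", ...
SELECT * FROM ingredients WHERE title LIKE 'b%t';

-- _ matches exactly one character: "bot" and "but", but not "bt" or "belt"
SELECT * FROM ingredients WHERE title LIKE 'b_t';
```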

Database and node

This is not a Node.js class, so the expectation of your Node.js skills here is fairly low. No worries if you don’t have a ton of experience with Node.js.

We are going to be using the pg package which is the most common PostgreSQL client for Node.js. There are others but this is the one with the most direct access to the underlying queries which is what we want. Feel free to use an ORM in your projects but it’s useful to know how the underlying PostgreSQL works.

Before I dump you into the code I wanted to give you a very brief tour of what to expect.

Connecting to a database

pg has two ways to connect to a database: a client and a pool.

You can think of a client as an individual connection you open and then eventually close to your PostgreSQL database. When you call connect on a client, it handshakes with the server and opens a connection for you to start using. You eventually call end to end your session with a client.

A pool is a pool of clients. Opening and closing clients can be expensive if you’re doing it a lot. Imagine you have a server that gets 1,000 requests per second and every one of those needs to open a line to a database. All that opening and closing of connections is a lot of overhead. A pool therefore holds onto connections. You basically just say “hey pool, query this” and if it has a spare connection open it will give it to you. If you don’t then it will open another one. Pretty neat, right?

Okay, we’re going to be using pools today since we’re doing a web server. I’d really only use a client for a one-off script or if I’m doing transactions, which aren’t supported by pools (transactions are not covered in this course, but they’re for when multiple queries must all succeed or fail together). Otherwise I’d stick to pools. One isn’t any harder to use than the other.

So let’s see how to connect to a database.

const pg = require("pg")
const pool = new pg.Pool({
  user: "postgres",
  host: "localhost",
  database: "recipeguru",
  password: "lol",
  port: 5432,
})
  • These are the default credentials combined with what I set up in the previous lessons. Make sure you’re using the correct credentials.
  • Make sure you started PostgreSQL via Docker with the -p 5432:5432 flag or PostgreSQL won’t be exposed on your host system.
  • Make sure your database is running too.

Once you’ve done this, you can now start making queries to PostgreSQL!

Let’s write a query.

const { rows } = await pool.query(`SELECT * FROM recipes`)
  • rows will be the response of the query.
  • There’s a lot of other metadata on these queries beyond just the rows. Feel free to console.log it or debug it to explore.
  • pg will add the ; at the end for you.

Parameterization and SQL injection

Let’s say you wanted to query for one specific ingredient by id and that id was passed in via an AJAX request.

SELECT * FROM ingredients WHERE id = <user input here>;

So what’s wrong with doing this?

const { id } = req.query
const { rows } = await pool.query(`SELECT * FROM ingredients WHERE id=${id}`)

SQL injection. You’re passing raw user data into a query, and a user could conceivably put anything in that id API request. What happens if a user made id equal to 1; DROP TABLE users; --?

Do not run the next query. Well, it wouldn’t matter since we don’t have a users table but still, it’s meant to be a demonstration of what could happen.

SELECT * FROM ingredients WHERE id=1; DROP TABLE users; --

Oh no 😱

The first 1; ends the first query. The DROP TABLE users; does whatever malicious thing the hacker wants to do. They can delete info like this one does, or they can dump info that you don’t want them to. The -- says “everything after this is a comment” so it comments out the rest of your SQL if anything came after it.
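
To make that concrete, here’s a plain Node.js sketch (no database needed) of what the naive template literal produces when a malicious id comes in:

```javascript
// Hypothetical malicious value arriving as the "id" query parameter
const id = "1; DROP TABLE users; --";

// The unsafe interpolation from above
const unsafeQuery = `SELECT * FROM ingredients WHERE id=${id}`;

console.log(unsafeQuery);
// → SELECT * FROM ingredients WHERE id=1; DROP TABLE users; --
```

The user-controlled value smuggled a whole second statement into the query string.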

This is SQL injection, a very real issue with SQL in apps. It’s why WordPress sites are always under attack: they use MySQL behind the scenes, and if an attacker can find somewhere to inject SQL, they can get a lot of information out of an app.

So how do we avoid it? Sanitization of your inputs and parameterization of your queries.

const { id } = req.query
 
const { rows } = await pool.query(`SELECT * FROM ingredients WHERE id=$1`, [id])

If you do this, pg will handle making sure that what goes into the query is safe to send to your database. Easy enough, right?

If you need to do a query like WHERE text ILIKE '%star wars%' and you’re using parameterized queries, just put the % in the value to be inserted, e.g. ('WHERE text ILIKE $1', ['%star wars%'])
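
A quick Node.js sketch of that (pure string handling, no database; the recipes table and body column are just illustrative names):

```javascript
// Hypothetical user-supplied search term
const term = "star wars";

// The wildcards go into the parameter value, not the SQL string
const pattern = `%${term}%`;

// then: await pool.query("SELECT * FROM recipes WHERE body ILIKE $1", [pattern]);
console.log(pattern);
```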

Joins

So far we’ve done a one to one matching of records. We’ve used a record in a database to represent one item:

  • a vegetable
  • a fruit
  • …so on and so forth

Now we’re going to get into records that can relate to each other. Let’s think about recipes.

A recipe has multiple ingredients. The word “has” is key here: it means there is a relationship.

A single recipe has many ingredients.

An ingredient can also be in many recipes. A tomato is both in pizza sauce and in a BLT. This is called a many-to-many relationship.

There are also one-to-many relationships.

Imagine we had multiple photos of each of our ingredients. A single ingredient might have five photos, and those photos will only belong to one ingredient. A photo of a green pepper doesn’t make sense belonging to anything besides the green pepper ingredient.

There can also exist one-to-one relationships,

but in general you would just make those the same record altogether. You could split the type and title into two tables, but why would you?

A basic relationship

We’re going to do this a naïve wrong way, and then we’re going to do it the correct way.

First we’re going to make a recipes table.

CREATE TABLE recipes (
  recipe_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  title VARCHAR ( 255 ) UNIQUE NOT NULL,
  body TEXT
);

Okay, no rocket science so far. We just created another random table. Let’s say we could have many images of a single recipe. How would we go about doing that?

INSERT INTO recipes
  (title, body)
VALUES
  ('cookies', 'very yummy'),
  ('empanada','ugh so good'),
  ('jollof rice', 'spectacular'),
  ('shakshuka','absolutely wonderful'),
  ('khachapuri', 'breakfast perfection'),
  ('xiao long bao', 'god I want some dumplings right now');

Pro tip: don’t write a course while hungry.

Okay, we now have a bunch of recipes. Our problem now is that we have a variable number of photos for each recipe. Perhaps we have one picture of cookies but four pictures of jollof rice. How do we model that if we always have a fixed number of columns?

One approach would be to say “I only have at most five pictures of any recipe, so I’ll make columns pic1 through pic5 and limit myself to that”. Totally valid approach, and if you can guarantee at most five images, it might be the right one.

But what if we can’t guarantee that? We need some sort of flexibility. This is where we can use two tables.

CREATE TABLE recipes_photos (
  photo_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  recipe_id INTEGER,
  url VARCHAR(255) NOT NULL
);

Now we can insert some rows to reference images that we have.

INSERT INTO recipes_photos
  (recipe_id, url)
VALUES
  (1, 'cookies1.jpg'),
  (1, 'cookies2.jpg'),
  (1, 'cookies3.jpg'),
  (1, 'cookies4.jpg'),
  (1, 'cookies5.jpg'),
  (2, 'empanada1.jpg'),
  (2, 'empanada2.jpg'),
  (3, 'jollof1.jpg'),
  (4, 'shakshuka1.jpg'),
  (4, 'shakshuka2.jpg'),
  (4, 'shakshuka3.jpg'),
  (5, 'khachapuri1.jpg'),
  (5, 'khachapuri2.jpg');
-- no pictures of xiao long bao

Okay, now let’s select all the photos for shakshuka. Shakshuka is ID 4 (if you followed my instructions, feel free to drop tables and recreate them if you need to.)

SELECT title, body FROM recipes WHERE recipe_id = 4;
SELECT url FROM recipes_photos WHERE recipe_id = 4;

It'd be cool if we could do these both at the same time. Some sort of intersection of a Venn diagram, right? You can!

SELECT recipes.title, recipes.body, recipes_photos.url
  FROM recipes_photos
  INNER JOIN
    recipes
  ON
    recipes_photos.recipe_id = recipes.recipe_id
  WHERE recipes_photos.recipe_id = 4;

Amazing!!

Now we’re getting data from both tables smushed together. The data from recipes will be repeated for each item in the recipes_photos table, but that’s what we expect.

Just to show you some common shorthand (but otherwise same query)

SELECT r.title, r.body, rp.url
  FROM recipes_photos rp
  INNER JOIN
    recipes r
  ON
    rp.recipe_id = r.recipe_id
  WHERE rp.recipe_id = 4;

Go ahead and try this to see the full inner join.

SELECT r.title, r.body, rp.url
  FROM recipes_photos rp
  INNER JOIN
    recipes r
  ON
    rp.recipe_id = r.recipe_id;

Pretty cool, right?

Notice in our query above that "xiao long bao" does not show up at all. Makes sense, we have no photos of xiao long bao so why would they show up? INNER JOIN was the correct choice for what semantics we intended.

Okay, but let’s say we were populating a list of all our recipes and their photos, and we had a default image for recipes without photos. Then INNER JOIN doesn’t make sense, because INNER JOIN only gives you rows that exist in both table A and table B. So how would we accomplish this task?

SELECT r.title, r.body, rp.url
  FROM recipes_photos rp
  RIGHT OUTER JOIN
    recipes r
  ON
    rp.recipe_id = r.recipe_id;
 
/* Or just */
 
SELECT r.title, r.body, rp.url
  FROM recipes_photos rp
  RIGHT JOIN
    recipes r
  ON
    rp.recipe_id = r.recipe_id;

NATURAL JOIN

I intentionally named recipe_id in both tables the same to show you this fun party trick.

SELECT *
  FROM recipes_photos
  NATURAL JOIN
    recipes;

What is this sorcery!? Because we named recipe_id the same in both, Postgres is smart enough to put two and two together and figure out that that’s what we should join on.

NATURAL JOIN is short for NATURAL INNER JOIN. NATURAL LEFT JOIN and NATURAL RIGHT JOIN are possible too.

In general let me steer you away from this in your code. Your tables can change down the line and you could accidentally name things the same that aren’t the same (like id being a classic one.) It’s useful for quick querying like we’re doing, but I’d say be explicit and avoid NATURAL’s implicit behavior. Still cool though, right?

CROSS JOIN

A small note on a not super useful type of JOIN. Let’s say you had two tables. In table 1 you had colors: green, blue, and red. In table 2 you have animals: dog, cat, and chicken. You want to make every possible permutation of the combination of both tables: green dog, green cat, green chicken, blue dog, blue cat, etc. There’s an ability to do this with CROSS JOIN

SELECT r.title, r.body, rp.url
  FROM recipes_photos rp
  CROSS JOIN
    recipes r;

Keep in mind that 3 rows in each table yielded 9 rows in that example. In our case, it yields 78 rows (13 photos × 6 recipes)! It’s not typically the most useful kind of join, but it’s good to know it’s there.

Foreign Key & Managing reference

So far so good, we have the ability to query across tables and join them together. We have a problem though: what if we delete a recipe?

DELETE
FROM
  recipes r
WHERE
  r.recipe_id = 5;
-- The khachapuri

Cool, we dropped the recipe, but what about the photos?

SELECT
  *
FROM
  recipes_photos rp
WHERE
  rp.recipe_id = 5;

Oh no! Still there! Now, we could write a second query to drop those two, but it'd be great if Postgres could track these changes for us and ensure the tables never fall out of sync.

Now, let’s drop and recreate recipes_photos really quick with a foreign key.

DROP TABLE recipes_photos;

CREATE TABLE recipes_photos (
  photo_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  url VARCHAR(255) NOT NULL,
  recipe_id INTEGER REFERENCES recipes(recipe_id) ON DELETE CASCADE
);

(Re-run the photos INSERT from above to repopulate the table.)

Okay, so we have a few things here

  • 🔥 The REFERENCES portion means it’s going to be a foreign key.
    • ➡️ You tell it what it’s going to match up to. In our case recipes is the table and recipe_id is the name of the column it’ll match. In our case those are the same name, but it doesn’t have to be. It must be the primary key of the other table.
  • 🔥 Then you need to tell it what to do when you delete something.
    • ➡️ With ON DELETE CASCADE you say “if the row in the other table gets deleted, delete this one too.” So if we delete something from recipes, it will automatically delete all its photos. Pretty cool, right?
  • 🔥 You can also do ON DELETE SET NULL which does exactly what it says it does.
    • ➡️ There’s also ON DELETE NO ACTION which will error out if you try to delete something from recipes while there are still photos left. This forces developers to clean up photos before deleting recipes. That can be helpful too.
  • 🔥 There’s also ON UPDATE if you need to handle synced state between the two tables.
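
As a sketch, the SET NULL variant of the same table would look like this (note that recipe_id can then no longer be NOT NULL, since it has to be able to hold a NULL after a delete):

```sql
-- Variant: orphaned photos are kept, with recipe_id cleared to NULL
CREATE TABLE recipes_photos (
  photo_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  url VARCHAR(255) NOT NULL,
  recipe_id INTEGER REFERENCES recipes(recipe_id) ON DELETE SET NULL
);
```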

If you’re going to have two tables reference each other, use foreign keys where possible. They give you useful constraints that make sure delete and update behaviors are intentional, and they make the queries faster.

Many to many relationships

So let’s now make our recipes connect to our ingredients!

This is a bit different because it’s a many-to-many relationship. A recipe has many ingredients, and an ingredient can belong to many recipes (eggs are in both khachapuri and cookies.)

CREATE TABLE recipe_ingredients (
  recipe_id INTEGER REFERENCES recipes(recipe_id) ON DELETE NO ACTION,
  ingredient_id INTEGER REFERENCES ingredients(id) ON DELETE NO ACTION,
  CONSTRAINT recipe_ingredients_pk PRIMARY KEY (recipe_id, ingredient_id)
);

We set this to error because we should clear out connections before we let developers delete recipes or ingredients. We don’t want to cascade deletes because that could delete recipes and ingredients unintentionally and we don’t want to set to null because then we’d have a bunch of half-null connections left over.

We’re going over constraints in the next chapter but we’re basically saying “the combination of recipe_id and ingredient_id must be unique” and we’re setting that as the primary key instead of an incrementing ID.

This table will describe the many-to-many relationship with two foreign keys between ingredients and recipes. Now we can insert records into here that describe how an ingredient belongs to a recipe.

INSERT INTO recipe_ingredients
  (recipe_id, ingredient_id)
VALUES
  (1, 10),
  (1, 11),
  (1, 13),
  (2, 5),
  (2, 13);

These recipes are nonsensical, don’t expect them to be correct recipes.

This is the way to connect multiple columns together. You have two tables that represent the two distinct concepts (ingredients and recipes) and then you use another table to describe the relationships between them.

So now we have two recipes that have ingredients, cookies and empanadas. Since we inserted the cookies first, it will very likely have the id of 1 (unless you have inserted some other things.) Let’s select those records.

SELECT
  i.title AS ingredient_title,
  i.image AS ingredient_image,
  i.type AS ingredient_type
FROM
  recipe_ingredients ri
INNER JOIN
  ingredients i
ON
  i.id = ri.ingredient_id
WHERE
  ri.recipe_id = 1;

Inner join is what we’re looking for because we are only looking for rows that have connections on both sides.

Now let’s use another INNER JOIN to add in the recipes.

SELECT
  i.title AS ingredient_title,
  i.image AS ingredient_image,
  i.type AS ingredient_type,
  r.title AS recipe_title,
  r.body AS recipe_body,
  r.recipe_id AS rid,
  i.id AS iid
FROM
  recipe_ingredients ri
INNER JOIN
  ingredients i
ON
  i.id = ri.ingredient_id
INNER JOIN
  recipes r
ON
  r.recipe_id = ri.recipe_id;

Constraints

A constraint is a rule you put on a column. For example, NOT NULL is a constraint we’ve used for our primary keys: a primary key always needs a value.

UNIQUE is another constraint; it dictates that the value in this column must be unique amongst all rows in the table.

Ever wonder how an app can tell you so quickly if an email or a username is taken? They likely use a UNIQUE constraint on those two columns.
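
A sketch of that (a hypothetical users table, not part of our recipeguru schema):

```sql
-- UNIQUE makes Postgres reject a second row with the same username or email
CREATE TABLE users (
  user_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  username VARCHAR(50) UNIQUE NOT NULL,
  email VARCHAR(255) UNIQUE NOT NULL
);
```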

CHECK

  • CHECK allows you to set conditions on the data.
    • You can use it for enforcing enumerated types (e.g. ‘male’, ‘female’, ‘nonbinary’, ‘other’ for gender.)
  • Or you could have it test that a zipcode is five characters (in the USA.) Or that an age isn’t a negative number.

We actually have a very good use case for one: our type in the ingredients. It can be ‘fruit’, ‘vegetable’, ‘meat’, or ‘other’. We could write a CHECK constraint that enforces that.

ALTER TABLE ingredients
ADD CONSTRAINT type_enums
CHECK
  (type IN ('meat','fruit','vegetable','other'));

This will add a constraint to the existing table. You can also create these constraints when you create the table (like we did with the foreign keys as well as unique, not null, and primary.)

Just for fun, let’s try to break it. This insert should fail with a check constraint violation:

INSERT INTO
  ingredients
  (title, image, type)
VALUES
  ('lol', 'wat.svg', 'obviously not a type');

JSONB

JSONB is a data type in PostgreSQL specifically designed to store and query JSON data.

Working With JSONB:

To use JSONB, you simply define a column with the JSONB data type.

Inserting data: you can directly insert valid JSON strings into JSONB columns using standard INSERT statements.
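
A minimal sketch (a hypothetical events table, just to illustrate the syntax; we’ll add a JSONB column to our real recipes table below):

```sql
CREATE TABLE events (
  event_id INTEGER PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  payload JSONB
);

-- The string must be valid JSON or Postgres will reject it
INSERT INTO events (payload) VALUES ('{"kind": "click", "x": 10, "y": 20}');
```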

Querying data

  • Querying data: PostgreSQL offers rich functionality for querying JSONB data:

    • Accessing elements:
      • Use operators like -> and ->> to access specific keys or array elements within the JSON structure.
    • Filtering:
      • Employ comparison operators (e.g., =), text search functions, and JSON path expressions to filter based on your criteria.
    • Aggregating:
      • Perform aggregations like counting, summing, or averaging directly on JSONB values using aggregate functions.

Native JSON Operators:

  • ->: access a JSON property (as JSON), e.g., data->'key'.
  • ->>: access a JSON property as text, e.g., data->>'key'.
  • @>: check JSON document containment (“does the left side contain the right side?”), e.g., data @> '{"key": "value"}'.
  • <@: the reverse containment check (“is the left side contained by the right side?”), e.g., data <@ '{"key": "value"}'.
  • jsonb_array_elements: expand a JSON array into one row per element.
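
For example, using the meta column we add to recipes below, jsonb_array_elements can expand a tags array into one row per tag (a sketch):

```sql
-- One output row per element of the "tags" array
SELECT jsonb_array_elements(meta -> 'tags') AS tag
FROM recipes
WHERE meta IS NOT NULL;
```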

Best Practices:

  • Consider normalization:
    • If your JSON data has a well-defined structure, consider normalizing it into separate columns for better performance and data integrity.
  • Plan indexes thoughtfully:
    • While indexing offers speed, it can increase storage overhead. Choose keys wisely based on your query patterns.
  • Validate and sanitize:
    • Implement validation and sanitization mechanisms to ensure data quality and prevent security vulnerabilities.

Create JSONB Column

Let’s add a JSONB field to our recipes. We’ll call it meta as in metadata but we could call it anything. And keep in mind this is just a normal JSON object. You can have arrays, nested objects, whatever.

ALTER TABLE recipes ADD COLUMN meta JSONB;

That's easy enough. Okay, let's now add a few rows to it. Keep in mind that it does have to be valid JSON or Postgres won't let you add it.

 
-- Add data in JSON format.
UPDATE
  recipes
SET
  meta='{ "tags": ["chocolate", "dessert", "cake"] }'
WHERE
  recipe_id=16;
 
 
-- The values can be pretty much any data type, as long as the whole thing is valid JSON
UPDATE
  recipes
SET
  meta='{ "tags": ["dessert", "cake"] }'
WHERE
  recipe_id=20;
 

Selecting Data from the JSONB

Okay, let’s select all tags from any recipe that has meta data.

SELECT meta -> 'tags' FROM recipes WHERE meta IS NOT NULL;

The -> is an accessor. In JS terms, this is like saying meta.tags. If you're accessing another layer of nesting, you just use more -> operators.

SELECT meta -> 'name' -> 'first'

?column?

When you select an expression like the one above, the output column name defaults to ?column? because we didn't specify one. So whenever you select an expression, give it a name with AS:

SELECT recipe_id, title, meta -> 'tags' ->> 0 AS dessert_title
	FROM recipes WHERE meta IS NOT NULL;

Selecting the first element of an array

Let’s try selecting just the first element of each one using that.

SELECT meta -> 'tags' -> 0 FROM recipes WHERE meta IS NOT NULL;

This should give you back only the first item of each.

Notice they’re still weirdly in "". This is because the -> selector gives you back a JSON object so you can keep accessing it. If you want it back as just text, you use ->>.

SELECT meta -> 'tags' ->> 0 FROM recipes WHERE meta IS NOT NULL;

Notice the first one -> because we need tags back as a JSON object (because accessors don’t work on text) but we use ->> on the second one so we can get the text back.

Filtering with JSON operators:

Let’s say your column looks like this

recipe_id | title                  | meta
----------+------------------------+--------------------------------
       20 | Delightful Carrot Cake | {"tags": ["dessert", "cake"]}
       47 | Cranberry Nut Tarts    | {"tags": ["dessert", "fruit"]}

And you would like to select all the rows whose meta contains cake among their tags.

 
-- Selects from the recipes table where the meta->'tags' array contains 'cake'.
-- The ? is simply asking: do you have 'cake'? If true, the row matches.
 
SELECT recipe_id, title, meta -> 'tags'
	FROM recipes WHERE meta -> 'tags' ? 'cake';
 
 
 
-- Or, in theory, you can ask for rows that merely have a tags key; every row
-- with tags in its meta column will be returned
 
SELECT recipe_id, title, meta -> 'tags'
	FROM recipes WHERE meta ? 'tags';
 
 
 
-- This checks whether a value is contained in a particular array
SELECT recipe_id, title, meta -> 'tags'
	FROM recipes WHERE meta -> 'tags' @> '"cake"';
 
 

Both of these work. The first one, with ?, checks whether 'cake' exists as a top-level key or array element, which it does.

The second, with @>, asks "does this array contain this value?" That's why we need the extra double quotes: the value being looked for is a string inside a JSON value.

Nested Access with #>

The #> operator extracts the JSON sub-object at the specified path, where path elements can be either field keys or array indexes. (The #>> variant returns the result as text.)

SELECT '{"a": {"b": ["foo","bar"]}}'::jsonb #> '{a,b,1}'; -- → "bar"

SELECT recipe_id, title, meta #> '{foo,bar,1}'
  FROM recipes WHERE recipe_id = 45;

JSONB Data manipulation

Updating JSONB Objects in PostgreSQL

recipe_id | title       | meta
----------+-------------+-------------------------------------------
       45 | Apple Juice | {"boz": ["value1", "anotherValue"],
          |             |  "foo": {"bar": ["one", "two", "three"]}}

Adding a key to our JSONB Object

UPDATE recipes SET meta =
	jsonb_set(meta, '{boz}', '["value1","anotherValue"]')
		WHERE recipe_id = 45;

Update a Key with a New Value:

Let's say you want to change "value1" in the "boz" array to "updatedValue" for recipe_id 45. You can achieve that using the jsonb_set function with a path expression. Note the path is {boz,0}, and the new value must itself be valid JSON, hence the inner double quotes:

UPDATE recipes
SET meta = jsonb_set(meta, '{boz,0}', '"updatedValue"')
WHERE recipe_id = 45;
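Relatedly, the || operator concatenates two JSONB values, which is a handy way to add or overwrite several top-level keys at once (the difficulty key here is just an illustration):

```sql
-- Merge a new top-level key into the existing object.
UPDATE recipes
SET meta = meta || '{"difficulty": "easy"}'
WHERE recipe_id = 45;
```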

Delete or Remove a Key from our JSONB object

UPDATE recipes SET meta = meta - 'boz' WHERE recipe_id = 45;
-- The - operator removes the given top-level key ('boz') from the object.
-- (It's backed by the jsonb_delete function under the hood.)


When to use JSONB

I use JSONB a lot in PostgreSQL. I have a long history of using MongoDB, so my mindset fits well with its document-oriented structures. JSONB does an amazing job of slotting into that in the right places.

For any field that is going to be present and finite, I always elect to make it a normal column. Then if I have any 1-to-many relationships, like we did with photos or tags, I'll use JSONB. Because one photo only belongs to one recipe, photos would have been perfect to model as a JSONB field.

Tags are a bit more murky: in theory we are sharing tags across recipes (implying a many-to-many relationship, like ingredients). If I decided to rename the "desserts" tag to "sweets", I'd have to go update that on every JSONB field that used it. Better yet, we could have a table of tags that has IDs and names, and then the JSONB field of every recipe could refer to those IDs. Best of both worlds: a way to model many-to-many relationships without the third table.

So why did we spend all this time learning how to model data “the old school” way with many tables?

  1. Much of the data you’ll encounter in your job is still modeled this way.
  2. I’m not a performance expert but JOINs and JSONB queries have different performance profiles. It’s good to know both in case you do need to model your data one way or the other.
  3. I wanted to teach you JOINs

In general I make ample use of JSONB because it closely mirrors how a programmer thinks about problems. Just keep in mind you'll almost always get better performance out of a normal column. Prefer columns, and reach for JSONB when you have unstructured or unbounded data: user preferences, document metadata, product catalogs, API responses, log records, or any scenario where the data lacks a predefined structure.

Aggregations

Occasionally you need to query for macro statistics about your tables, not just individual rows. Let's use one we've already used before: COUNT. What if we want to know how many ingredients we have overall in our ingredients table?

Count

SELECT COUNT(*) FROM ingredients;

COUNT is an aggregation function. The * says "count everything and don't remove nulls or duplicates of any variety".

What if we wanted to count how many distinct types of ingredients we have in the ingredients table?

This simply tells you how many distinct values exist in a given column.

SELECT COUNT(DISTINCT type) FROM ingredients;

This is going to tell you how many different types we have in the ingredients table. Keep in mind the query to see what those distinct types actually are is:

SELECT DISTINCT type FROM ingredients;

The first query gives you the number: the count of how many distinct things are in the list. The second query gives you what those distinct things are, with no indication of how many of each there are. There could be 1 fruit and 10,000 vegetables and you'd have no indication of that.

Okay, so you want to see both at the same time? Let’s see that.

SELECT
  type, COUNT(type)
FROM
  ingredients
GROUP BY
  type;

This combines both of what we saw plus a new thing, GROUP BY, which lets us specify what we want to aggregate together: the type. Keep in mind that any non-aggregated column you SELECT has to appear in the GROUP BY clause.

GROUP BY is almost always accompanied by an aggregation function such as COUNT, MIN, etc.

Having

The HAVING clause in PostgreSQL provides a way to filter groups of rows based on the results of aggregate functions after they have been calculated. While similar to the WHERE clause, it operates on aggregated data rather than individual rows.

Key Points:

  • Used with GROUP BY to categorize data and perform aggregate functions.
  • Filters groups based on aggregate function results, like COUNT, SUM, AVG, etc.
  • Can filter on aggregate results, which WHERE cannot.
  • Written after GROUP BY, but before ORDER BY, LIMIT, or OFFSET.

Okay, so now you want to select only ingredient types with fewer than 10 items in your database, so you know what sorts of things you need to add.

The temptation here would be to use WHERE:

SELECT type, COUNT(type) FROM ingredients WHERE COUNT(type) < 10 GROUP BY type;

You'll get an error: aggregates aren't allowed in WHERE. count isn't something you're selecting for; it's something you're aggregating, and the WHERE clause filters the rows you're selecting, which happens before the aggregation.

This can be useful because let's say we wanted to select only things that have an id higher than 30.

SELECT
  type, COUNT(type)
FROM
  ingredients
WHERE
  id > 30
GROUP BY
  type;

How do we filter based on the aggregates and not on the rows themselves? With HAVING.

SELECT
  type, COUNT(type)
FROM
  ingredients
GROUP BY
  type
HAVING
  COUNT(type) < 10;
 

And keep in mind you can use both together

SELECT
  type, COUNT(type)
FROM
  ingredients
WHERE
  id > 30
GROUP BY
  type
HAVING
  COUNT(type) < 10;

There are more aggregation functions like MIN (give the smallest value in this selected set), MAX (same but max), and AVG (give me the average). We’ll use those in the next exercise with the movie data set.
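As a quick taste against our ingredients table (the exact output depends on your data, so none is shown here):

```sql
-- Smallest id, largest id, and the average id across all ingredients.
SELECT MIN(id), MAX(id), AVG(id) FROM ingredients;
```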

Example:

Consider a table orders with columns customer_id, product_id, and quantity. You want to find customers who ordered more than 3 items in total:

SELECT customer_id, SUM(quantity) AS total_items
FROM orders
GROUP BY customer_id
HAVING SUM(quantity) > 3;

This query:

  1. Groups orders by customer_id.
  2. Calculates SUM(quantity) for each group.
  3. Uses HAVING to keep only groups where total_items is greater than 3.

Advanced Usage:

  • Use multiple aggregates and expressions in HAVING.
  • Filter based on columns other than those used in GROUP BY.
  • Combine HAVING with WHERE to filter both individual rows and groups.

Example:

Find customers who ordered more than 3 items and spent more than $100 in total:

SELECT customer_id, SUM(quantity) AS total_items, SUM(price * quantity) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(quantity) > 3 AND SUM(price * quantity) > 100;

Remember:

  • HAVING operates on aggregated values, not individual rows.
  • It allows fine-grained control over which groups of data to consider after aggregation.
  • Use it judiciously to avoid unnecessarily filtering out meaningful data.


Functions, Triggers, and Procedures

Functions

Functions are self-contained blocks of code that perform specific tasks or calculations.

  • They can take input parameters, process them, and return a value or modify data.
  • Useful for reusing common logic, encapsulating complex operations, and improving code modularity.

Types of functions

  • Scalar functions: Return a single value (e.g., adding two numbers).
  • Table functions: Return a result set of rows (e.g., filtering data based on criteria).
  • Aggregate functions: Operate on groups of rows and return a single value (e.g., calculating average).
  • Window functions: Perform calculations based on a window of rows (e.g., ranking customers within a category).
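For instance, a window function (sketched here against our ingredients table) computes a value for each row over a window of related rows, without collapsing them into groups:

```sql
-- Rank ingredients alphabetically within each type.
SELECT
  title,
  type,
  RANK() OVER (PARTITION BY type ORDER BY title) AS rank_in_type
FROM ingredients;
```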

Creating a function

CREATE FUNCTION add_numbers(a INTEGER, b INTEGER)
RETURNS INTEGER AS $$
BEGIN
  RETURN a + b;
END;
$$ LANGUAGE plpgsql;
 
CREATE OR REPLACE FUNCTION get_customer_orders(cust_id INTEGER)
RETURNS SETOF orders AS $$
  -- A plain SQL function; the parameter is renamed so it
  -- doesn't shadow the customer_id column.
  SELECT * FROM orders WHERE customer_id = cust_id;
$$ LANGUAGE sql;

Using functions

SELECT add_numbers(5, 3); -- Returns 8
SELECT * FROM get_customer_orders(123);

Get Recipes that have 4-6 ingredients

Let's say we're focusing our recipe site on recipes with few ingredients, and we give users a way to filter recipes by how many ingredients are in the recipe.

SELECT
  r.title
FROM
  recipe_ingredients ri
 
INNER JOIN
  recipes r
ON
  r.recipe_id = ri.recipe_id
 
GROUP BY
  r.title
HAVING
  COUNT(r.title) BETWEEN 4 AND 6;

This will give you all recipes that have 4, 5, or 6 ingredients in them. Now, if this is a big focus of our website and we're going to repeat this query everywhere, it would be annoying to copy and paste it around. Wouldn't it be cool if PostgreSQL let us wrap it up into an easy-to-use function?

Creating our function

 
-- Function to return recipes with ingredients that are between two bounds.
 
CREATE OR REPLACE FUNCTION
  get_recipes_with_ingredients(low INT, high INT)
RETURNS
  SETOF VARCHAR -- Types of things to be returned
LANGUAGE
  plpgsql
AS
$$
-- Dollar quotes work like single quotes ', minus the need to escape any quotes inside the body
BEGIN
  RETURN QUERY SELECT
    r.title
  FROM
    recipe_ingredients ri
 
  INNER JOIN
    recipes r
  ON
    r.recipe_id = ri.recipe_id
 
  GROUP BY
    r.title
  HAVING
    COUNT(r.title) between low and high;
END;
$$;

How to use the function

Specifically with functions, you have to call them:

-- Now our query is quite versatile and useful.
select * from get_recipes_with_ingredients(4,6);

How to remove the function

DROP FUNCTION get_recipes_with_ingredients(low INT, high INT);
-- deletes the function

Now we have a function that we can call whenever we need our more complicated query. This can come in very handy if your team has shared needs.

Let’s break down some features of it:

  • You can say just CREATE, or CREATE OR REPLACE. I used the latter so re-running it wouldn't trip you up
  • You do need to declare a return type, either in the function definition like we did, or in the invocation
  • $$ is called "dollar quoting". You can actually use it instead of ' in your insertions if you're sick of writing '' for your single quotes and \\ for your backslashes. It's just really ugly. Here we use it because we have a long body and it would be very annoying to try and escape everything
  • This language is called PL/pgSQL. It's a very SQL-like language designed to make it easy to write functions and procedures. PostgreSQL can also be extended, so you can write these in JavaScript, Python, and other languages as well
  • Here we're just returning the result of a query. There's a myriad of more powerful things you can do with functions and fun sorts of aggregation. I'll leave that to you to explore. This is not a class on functions; it's a very deep topic

A note from your potential future self or coworker: do not go overboard. These are powerful, but they can also be weird to debug. Where do you store the code for these? How are you going to debug them when they have a bug in production? There are ways to handle these things, but it's different from just handling it in your server's code. Use with caution.

Tips and tricks

SQL Commands

Base table

name        | continent | area    | population | gdp
------------+-----------+---------+------------+--------------
Afghanistan | Asia      |  652230 |   25500100 |  20343000000
Albania     | Europe    |   28748 |    2831741 |  12960000000
Algeria     | Africa    | 2381741 |   37100000 | 188681000000
Andorra     | Europe    |     468 |      78115 |   3712000000
Angola      | Africa    | 1246700 |   20609294 | 100990000000

Selection commands

Where clause

The example uses a WHERE clause to show the population of 'France'. Note that strings must be in 'single quotes'.

SELECT population FROM world
  WHERE name = 'France';

In clause

Checking a list The word IN allows us to check if an item is in a list.

SELECT name, population FROM world
  WHERE name IN ('Brazil', 'Russia', 'India', 'China');
 
Between clause

Which countries are not too small and not too big? BETWEEN allows range checking (range specified is inclusive of boundary values)

SELECT name, area FROM world
  WHERE area BETWEEN 250000 AND 300000
 
Group by

In a GROUP BY statement only distinct values are shown for the column in the GROUP BY. This example shows the continents hosting the Olympics with the count of the number of games held.

SELECT continent, COUNT(yr) FROM games
 GROUP BY continent
 
Join

You can use a JOIN to get results from two or more related tables. In this example each row of the games table is related to a row of the city table. To find the country where the games took place, you must JOIN the games table to the city table on the common fields games.city and city.name.

SELECT games.yr, city.country
  FROM games JOIN city
       ON (games.city = city.name)
 
Order by

Show the name and population for each country with a population of more than 100000000. Show countries in descending order of population.

SELECT name, population
  FROM bbc
  WHERE population > 100000000
  ORDER BY population DESC
 
Distinct

By default the result of a SELECT may contain duplicate rows. We can remove these duplicates using the DISTINCT key word.

SELECT DISTINCT region FROM bbc