Setup postgress on a container
the fastest way we can get everyone up and running on PostgreSQL is to have everyone use Docker containers. Dockers allows any computer to run a lightweight emulation of a pre-made environment (typically Linux). In our case we’re going to be using a premade Linux environment that has PostgreSQL already installed on it.
Used containers
Running postgres
This should run PostgreSQL in the background.
- We gave it a password of “lol” (feel free to change it to something different, I just remember lol because lol)
- We ran it with a name of
pg
so we can refer it with that shorthand - We used the
--rm
flag so the container deletes itself afterwards (rather than leave its logs and such around) - We ran it in the background with
-d
. Otherwise it’ll start in the foreground. - The
-p
allows us to expose port 5432 locally which is the port Postgres runs on by default.
Run docker ps
to see it running. You can also see it in the Docker Desktop app running under the containers tab.
Now run docker kill pg
to kill the container. This will shut down the container and since we ran it with --rm
it’ll clean itself up afterwards too. We can run the docker run …
to start it again.
- Okay, let’s try connecting to it with psql, the CLI tool for connecting to Postgres.
Connect to Postgres
Run postgres from your container
Postgres
psql commands
psql has a bunch of built in commands to help you navigate around. All of these are going to begin with \
. Try \?
to bring up the help list.
For now we’re interested in what databases we have available to us.
Try running \l
to list all databases. You’ll likely see the databases :
- postgres,
- template0,
- template1.
Default databases
template1
is what Postgres uses by default when you create a database. If you want your databases to have a default shape, you can modify template1 to suit that.
template0
should never be modified. If your template1 gets out of whack, you can drop it and recreate from the fresh template0.
postgres
exists for the purpose of a default database to connect to. We’re actually connected to it by default since we didn’t specify a database to connect to. You can technically could delete it but there’s no reason to and a lot of tools and extensions do rely on it being there.
Databases Overview
Create a database
A database is a collection of similar tables of data.
Switching to a database
to switch to a database you simply use the \c
command followed by the database name
Show tables
there is no such a thing as show table … , to show all the tables in Postgres you simply use the command,
Insert into your table
Select from your table
Alter table
Sometimes you want to change the structure of your table in this case you can use the alter command
Adding a column
Deleting a column
sometimes you want to delete a column well here you go
Database Insert
This is the standard way of doing an insert. In the first set of parens you list out the column names then in the values column you list the actual values you want to insert.
Big key here which will throw JS developers for a loop: you must use single quotes. Single quotes in SQL means "this is a literal value". Double quotes in SQL mean "this is an identifier of some variety".
Use
--
for comments
The above query works because the double quotes are around identifiers like the table name and the column names. The single quotes are around the literal values. The double quotes above are optional. The single quotes are not.
On conflict
Note the ON CONFLICT
section. Some of these you may have already inserted (like the red pepper.) This is telling PostgreSQL that if a row exists already to just do nothing about it.
On conflict do nothing
This basically means that in case there is a conflict then do nothing on your database
PG won't crush in case there is a conflict in your data that you are inserting
On conflict update
Sometimes you might have a conflict and rather than doing nothing you would like to update your stuff then that is where update on conflict becomes handy
Database Update
The next thing we need to learn is how to update a field. We’ve seen how to INSERT and do an update if an INSERT fails, but let’s see how to update something directly.
The WHERE clause is where you filter down what you want to update.
In this case there’s only one watermelon so it’ll just update one but if you had many watermelons it would match all of those and update all of them,
You probably got a line returned something like
If you want to return what was updated try:
The RETURNING clause tells Postgres you want to return those columns of the things you’ve updated.
In our case I had it return literally everything we have in the table so you could write that as
return everything that was inserted!💖
The * means “everything” in this case.
Let’s add two rows with identical images.
Whitespace isn’t significant in SQL. You can break lines up however you want and how they make sense to you.
Now let’s update both.
Here you can see as long as that WHERE clause matches multiple rows, it'll update all the rows involved.
DELETE
Deletes are very similar in SQL.
Here we just have no SET clause. Anything that matches that WHERE clause will be deleted. The RETURNING, like in updates, is optional if you want to see what was deleted.
Select statement
Let’s next hop into SELECT, or how to read data from a database. We’ve already seen the most basic one.
The * there represents all available columns. Frequently, for a variety of reasons, we do not want to select everything. In general, I would recommend only using * where your intent is truly “I want anything this database could ever store for these records“.
Frequently that is not the case. Frequently it is “I need the name, address and email for this email, but not their social security or credit card number”.
This is positive because it means smaller transfer loads, that your system only processes data it needs and not the rest, and it shows intention in your code. Honestly the biggest benefit is the latter: to show intentions in your code.
So in our case we could put
When you read this now you know that the former coder was actively looking for those three columns.
LIMIT and OFFSET
Okay, now what if the user only wants five records?
This will limit your return to only five records, the first five it finds. You frequently will need to do this as well (as in almost always) since a database can contain millions if not billions of records.
So what if you want the next five records?
This can be inefficient at large scales and without the use of indexes.
It also has the problem of if you're paging through data and someone inserts a record in the meantime you could shift your results. We'll get into optimizing queries a bit later, but just be aware of that.
Using offset, when someone is adding and deleting stuffs throughout your data then it's really gonna be hard for the offset to give them the updated results, some rows might never get returned to the user
A better way of doing these kind of select is using the id, which is much more efficient and way performant compared to the brother offset.
WHERE
Sometimes you don’t want all of the records all at once. In that case you can add a WHERE clause where you tell the database to filter your results down to a subset of all of your records. What if we only wanted to show only fruits?
This will give us all the fruits we had. What if we wanted to only select vegetables where the the IDs are less 20?
AND allows you to add multiple clauses to your selects. As you may guess, the presence of AND belies the existance of OR
ORDER BY
You frequently will care about the order these things come back in, how they’re sorted. That’s what ORDER BY is for.
Let’s say you wanted to order not by id or insertion order but by title.
This will alphabetize your returned list. What if we wanted it in reverse order of IDs?
This will start at the largest number and count backwards. As you may have guessed, ASC
is implied if you don’t specify.
Search
Some times you want to search for something in your database but it’s not an exact match. The best example would be if a user types something into the search bar, they’re likely not giving you something that you match exactly. You would expect a search for “pota” to match “potato”, right? Enter LIKE
in SQL.
This is a very limited fuzzy matching of text. This is not doing things like dropping “stop words” (like and, the, with, etc.) or handling plurals, or handling similar spellings (like color vs colour). Postgres can do this, and we’ll get there later with indexes.
Built in functions
Okay, great, now what if a user searchs for “fruit”? We’d expect that to work, right?
Concat
Info
concat()
is a function that will take two strings and combine them together. We can concat our two title and type columns and then use LIKE on the results on that combined string.
The result of the cherry row would be
cherryfruit
which means it’d match weird strings likerryfru
. We’ll talk later about more complicated string matching but this will do for now.
Okay, but what if we have capitalization problem? We can use lower, both on the columns and on the values.
Lower
LOWER()
take a string and make it lowercase.
ILIKE (insensitive)
Fortunately, there's an even easier way to do this with less function evocation.
ILIKE
does the same thing, just with case insensitivity. Mostly I just wanted to show you lower!
There are so many built in functions to Postgres. Click here to see the official docs.
% vs _
You see that we’ve been surrounding the
LIKE
values with%
. This is basically saying “match 0 to infinite characters”. So with “%berry” you would match “strawberry” and “blueberry” but not “berry ice cream”. Because the ”%” was only at the beginning it wouldn’t match anything after, you’d need “%berry%” to match both “strawberry” and “blueberry ice cream”.
The above will give you all titles that start with “c”.
You can put % anywhere. “b%t” will match “bt”, “bot”, “but”, “belt”, and “belligerent”.
There also exists _
which will match 1 and only one character. “b_t” will match “bot” and “but” but not “bt”, “belt”, or “belligerent”.
Database and node
This is not a Node.js class so the expectation of your Node.js skills here are fairly low. No worries if you don’t have a ton of experience with Node.js
We are going to be using the pg package which is the most common PostgreSQL client for Node.js. There are others but this is the one with the most direct access to the underlying queries which is what we want. Feel free to use an ORM in your projects but it’s useful to know how the underlying PostgreSQL works.
Before I dump you into the code I wanted to give you a very brief tour of what to expect.
Connecting to a database
pg has two ways to connect to a database: a client and a pool.
You can think of a client as an individual connection you open and then eventually close to your PostgreSQL database. When you call connect
on a client, it handshakes with the server and opens a connection for you to start using. You eventually call end
to end your session with a client.
A pool is a pool of clients. Opening and closing clients can be expensive if you’re doing it a lot. Imagine you have a server that gets 1,000 requests per second and every one of those needs to open a line to a database. All that opening and closing of connections is a lot of overhead. A pool therefore holds onto connections. You basically just say “hey pool, query this” and if it has a spare connection open it will give it to you. If you don’t then it will open another one. Pretty neat, right?
Okay, we’re going to be using pools today since we’re doing a web server. I’d really only use a client for a one-off script or if I’m doing transactions which aren’t supported by pools (transactions are not covered in this course but it’s if multiple queries have to happen all at once or not all). Otherwise I’d stick to pools. One isn’t any harder to use than the other.
So let’s see how to connect to a database.
- These are the default credentials combined with what I set up in the previous lessons. Make sure you’re using the correct credentials.
- Make sure you started PostgreSQL via docker with the
-p 5432:5432
flag or PostgreSQL won’t be exposed on your host system. - Make sure your database is running too.
Once you’ve done this, you can now start making queries to PostgreSQL!
Let’s write a query.
rows
will be the response of the query.- There’s lot of other metadata on these queries beyond just the rows. Feel free to
console.log
it ordebug
it to explore. - pg will add the
;
at the end for you.
Parameterization and SQL injection
Let’s say you wanted to query for one specific ingredient by id
and that id was passed in via an AJAX request.
What’s wrong with then doing then:
SQL injection. You’re just raw passing in user data to a query and a user could fathomably put anything in that id API request. What happens if a user made id equal to 1; DROP TABLE users; --
?
Do not run the next query. Well, it wouldn’t matter since we don’t have a users table but still, it’s meant to be a demonstration of what could happen.
Oh no 😱
The first 1;
ends the first query. The DROP TABLE users;
does whatever malicious thing the hacker wants to do. They can delete info like this one does, or they can dump info that you don’t want them to. The --
says “everything after this is a comment” so it comments out the rest of your SQL if anything came after it.
This is SQL injection and a very real issue with SQL in apps. It’s why Wordpress apps are always under attack because they use MySQL behind the scenes. If you can find somewhere you can inject SQL you can get a lot of info out of an app.
So how do we avoid it? Sanitization of your inputs and parameterization of your queries.
If you do this, pg will handle making sure that what goes into the query is safe to send to your database. Easy enough, right?
If you need to do a a thing like
WHERE text ILIKE '%star wars%'
and you’re using parameterized queryies, just put the%
in the thing to be inserted, e.g.('WHERE text ILIKE $1', ['%star wars%'])
Joints
So far we’ve done a one to one matching of records. We’ve used a record in a database to represent one item:
- vegetable,
- a fruit,
- …so on and so forth.
Now we’re going to get into records that can relate to each other. Let’s think about recipes.
A recipe has multiple ingredients. That has word is key here. It means there is a relationship.
A single recipe has many ingredients. An ingredient can also be in many recipes. A tomato is both in pizza sauce and in a BLT. This is called a many-to-many relationship.
There’s also one-to-many relationships, like imagine if we had multiple photos of each of our ingredients.
A single ingredient will have five photos. And those photos will only will only belong to one ingredient. A photo of a green pepper doesn’t make sense to belong to anything besides the green pepper ingredient.
There can also exist one-to-one relationships but in general you would just make those the same record all together. You could split up the type and title into two tables, but why would you.
A basic relationship
We’re going to do this a naïve wrong way, and then we’re going to do it the correct way.
First thing we’re going to make a recipes table
Okay, no rocket science so far. We just created another random table. Let’s say we could have many images of a single recipe. How would we go about doing that?
Pro tip: don’t write a course while hungry.
Okay, we now have a bunch of recipes. Our problem now is that we have a variable amount of photos for each recipe. Perhaps we have one picture of cookies but we have four pictures of jollof rice. How do we model that if we always have a fixed amount columns?
One approach would be to say “I only have at most five pictures of any recipes, I’ll make columns pic1 through pic5 columns and limit myself to that”. Totally valid approach, and if you can guarantee only 5 images at most, it might be the right approach.
But what if we can’t guarantee that? We need some sort of flexibility. This is where we can use two tables.
Now we can insert a some rows to reference images that we have.
Okay, now let’s select all the photos for shakshuka. Shakshuka is ID 4 (if you followed my instructions, feel free to drop tables and recreate them if you need to.)
It'd be cool if we could do these all at the same time. Some sort inner section of a venn diagram, right? You can!
Amazing!!
Now we’re getting data from each table smushed together. The data from recipes will be repeated over item in the recipes_photos table but that’s what we expect.
Just to show you some common shorthand (but otherwise same query)
Go ahead and try this to see the full inner join.
Pretty cool, right?
Tips and tricks
SQL Commands
Base table
name | continent | area | population | gdp |
---|---|---|---|---|
Afghanistan | Asia | 652230 | 25500100 | 20343000000 |
Albania | Europe | 28748 | 2831741 | 12960000000 |
Algeria | Africa | 2381741 | 37100000 | 188681000000 |
Andorra | Europe | 468 | 78115 | 3712000000 |
Angola | Africa | 1246700 | 20609294 | 100990000000 |
Selection commands
Where clause
The example uses a WHERE clause to show the population of ‘France’. Note that strings should be in ‘single quotes’;
In clause
Checking a list The word IN allows us to check if an item is in a list.
Between clause
Which countries are not too small and not too big? BETWEEN
allows range checking (range specified is inclusive of boundary values)
Group by
In a GROUP BY statement only distinct values are shown for the column in the GROUP BY. This example shows the continents hosting the Olympics with the count of the number of games held.
Join
You can use a JOIN to get results from two or more related tables. In this example each row of the table games
is related to a row of the table city
. If you want to find the country where the games took place you must JOIN the games table to the city table on the common field that is games.city and city.name
Order by
Show the name
and population
for each country with a population of more than 100000000. Show countries in descending order of population.
Distinct
By default the result of a SELECT
may contain duplicate rows. We can remove these duplicates using the DISTINCT
key word.