Set up Postgres in a container
The fastest way we can get everyone up and running on PostgreSQL is to have everyone use Docker containers. Docker allows any computer to run a lightweight emulation of a pre-made environment (typically Linux). In our case we’re going to be using a pre-made Linux environment that has PostgreSQL already installed on it.
Using containers
Running postgres
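The exact command isn’t reproduced here, but based on the flags described below it would look something like this (the image tag is an assumption):

```bash
# run Postgres in the background, named pg, removed on exit, exposed on 5432
docker run --name pg --rm -d -e POSTGRES_PASSWORD=lol -p 5432:5432 postgres
```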
This should run PostgreSQL in the background.
- We gave it a password of “lol” (feel free to change it to something different, I just remember lol because lol)
- We ran it with a name of `pg` so we can refer to it with that shorthand
- We used the `--rm` flag so the container deletes itself afterwards (rather than leave its logs and such around)
- We ran it in the background with `-d`. Otherwise it’ll start in the foreground.
- The `-p` flag allows us to expose port 5432 locally, which is the port Postgres runs on by default.
Run `docker ps` to see it running. You can also see it in the Docker Desktop app under the Containers tab.

Now run `docker kill pg` to kill the container. This will shut down the container, and since we ran it with `--rm` it’ll clean itself up afterwards too. We can run the `docker run …` command again to start it back up.

Okay, let’s try connecting to it with psql, the CLI tool for connecting to Postgres.
Connect to Postgres
Run psql from your container
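One way to do this, assuming the container from before, is to exec into it and run psql as the postgres user (a sketch; you could also run psql from your host if you have it installed):

```bash
docker exec -it -u postgres pg psql
```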
psql commands
psql has a bunch of built-in commands to help you navigate around. All of these begin with `\`. Try `\?` to bring up the help list.
For now we’re interested in what databases we have available to us.
Try running `\l` to list all databases. You’ll likely see these databases:

- postgres
- template0
- template1
Default databases
`template1` is what Postgres uses by default when you create a database. If you want your databases to have a default shape, you can modify `template1` to suit that.

`template0` should never be modified. If your `template1` gets out of whack, you can drop it and recreate it from the fresh `template0`.

`postgres` exists for the purpose of having a default database to connect to. We’re actually connected to it right now since we didn’t specify a database to connect to. You technically could delete it, but there’s no reason to, and a lot of tools and extensions rely on it being there.
Databases Overview
Create a database
A database is a collection of similar tables of data.
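As a sketch, creating a database looks like this (the name `recipeguru` is just an illustrative choice):

```sql
CREATE DATABASE recipeguru;
```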
Switching to a database

To switch to a database, you simply use the `\c` command followed by the database name.
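For example, assuming the illustrative database from before:

```sql
\c recipeguru
```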
Show tables
there is no such a thing as show table … , to show all the tables in Postgres you simply use the command,
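```sql
\d
```

`\d` lists all relations in the current database; `\dt` narrows the list to just tables.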
Insert into your table
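Assuming a minimal `ingredients` table to follow along with (a sketch; the course’s exact columns may differ):

```sql
CREATE TABLE ingredients (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) UNIQUE,
  type VARCHAR(255)
);

INSERT INTO ingredients (title, type) VALUES ('watermelon', 'fruit');
```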
Select from your table
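And reading it back:

```sql
SELECT * FROM ingredients;
```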
Alter table

Sometimes you want to change the structure of your table. In this case you can use the ALTER TABLE command.
Adding a column
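For example, adding an `image` column (we’ll use it in later examples):

```sql
ALTER TABLE ingredients ADD COLUMN image VARCHAR(255);
```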
Deleting a column

Sometimes you want to delete a column. Well, here you go:
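```sql
ALTER TABLE ingredients DROP COLUMN image;
```

(If you run this, re-add the column afterwards, since later examples use `image`.)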
Database Insert
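A sketch of such an insert, with double quotes around identifiers and single quotes around literal values (the values themselves are illustrative):

```sql
INSERT INTO "ingredients" (
  "title", "type", "image"
) VALUES (
  'red pepper', 'vegetable', 'red_pepper.jpg' -- this is a comment
);
```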
This is the standard way of doing an insert. In the first set of parens you list out the column names, then in the VALUES parens you list the actual values you want to insert.
Big key here which will throw JS developers for a loop: you must use single quotes. Single quotes in SQL means "this is a literal value". Double quotes in SQL mean "this is an identifier of some variety".
Use `--` for comments.
The above query works because the double quotes are around identifiers like the table name and the column names. The single quotes are around the literal values. The double quotes above are optional. The single quotes are not.
On conflict
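A sketch, assuming `title` has a UNIQUE constraint to conflict on (the ingredient list is illustrative):

```sql
INSERT INTO ingredients (title, type)
VALUES
  ('red pepper', 'vegetable'),
  ('eggplant', 'vegetable'),
  ('avocado', 'fruit')
ON CONFLICT (title) DO NOTHING;
```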
Note the `ON CONFLICT` section. Some of these you may have already inserted (like the red pepper). This is telling PostgreSQL that if a row exists already, just do nothing about it.
On conflict do nothing
This means that if there is a conflict, Postgres simply does nothing: it won’t crash or error out if a row you’re inserting conflicts with data already in your table.
On conflict update
Sometimes you might have a conflict and, rather than doing nothing, you’d like to update the existing row. That’s where DO UPDATE on conflict comes in handy.
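A sketch of the upsert form; `EXCLUDED` refers to the row that failed to insert:

```sql
INSERT INTO ingredients (title, type, image)
VALUES ('red pepper', 'vegetable', 'red_pepper_2.jpg')
ON CONFLICT (title) DO UPDATE SET image = EXCLUDED.image;
```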
Database Update
The next thing we need to learn is how to update a field. We’ve seen how to INSERT and do an update if an INSERT fails, but let’s see how to update something directly.
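A sketch (the watermelon values are illustrative):

```sql
UPDATE ingredients SET image = 'watermelon.jpg' WHERE title = 'watermelon';
```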
The WHERE clause is where you filter down what you want to update.
In this case there’s only one watermelon so it’ll just update one, but if you had many watermelons it would match all of those and update all of them.

You probably got a line returned, something like `UPDATE 1`.
If you want to return what was updated try:
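```sql
UPDATE ingredients
SET image = 'watermelon.jpg'
WHERE title = 'watermelon'
RETURNING id, title, type, image;
```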
The RETURNING clause tells Postgres you want to return those columns of the things you’ve updated.
In our case I had it return literally everything we have in the table, so you could write that as `RETURNING *` to return everything that was updated! 💖 The `*` means “everything” in this case.
Let’s add two rows with identical images.
Whitespace isn’t significant in SQL. You can break lines up however you want and how they make sense to you.
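For example (values illustrative), inserting two rows that share an image:

```sql
INSERT INTO ingredients (
  title, type, image
) VALUES
  ('cherry', 'fruit', 'cherry.jpg'),
  ('chocolate', 'other', 'cherry.jpg');
```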
Now let’s update both.
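A sketch that matches on the shared image:

```sql
UPDATE ingredients
SET image = 'better_cherry.jpg'
WHERE image = 'cherry.jpg'
RETURNING *;
```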
Here you can see as long as that WHERE clause matches multiple rows, it'll update all the rows involved.
DELETE
Deletes are very similar in SQL.
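A sketch:

```sql
DELETE FROM ingredients
WHERE image = 'better_cherry.jpg'
RETURNING *;
```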
Here we just have no SET clause. Anything that matches that WHERE clause will be deleted. The RETURNING, like in updates, is optional if you want to see what was deleted.
Select statement
Let’s next hop into SELECT, or how to read data from a database. We’ve already seen the most basic one.
The `*` there represents all available columns. Frequently, for a variety of reasons, we do not want to select everything. In general, I would recommend only using `*` where your intent is truly “I want anything this database could ever store for these records“.

Frequently that is not the case. Frequently it is “I need the name, address, and email for this user, but not their social security or credit card number”.
This is positive because it means smaller transfer loads, your system only processes the data it needs and not the rest, and it shows intention in your code. Honestly the biggest benefit is the last one: showing intention in your code.
So in our case we could put
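```sql
SELECT id, title, type FROM ingredients;
```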
When you read this now, you know that the previous coder was actively looking for those three columns.
LIMIT and OFFSET
Okay, now what if the user only wants five records?
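```sql
SELECT id, title, type FROM ingredients LIMIT 5;
```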
This will limit your return to only five records, the first five it finds. You will frequently need to do this (as in almost always) since a database can contain millions if not billions of records.
So what if you want the next five records?
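```sql
SELECT id, title, type FROM ingredients LIMIT 5 OFFSET 5;
```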
This can be inefficient at large scales and without the use of indexes.
It also has the problem of if you're paging through data and someone inserts a record in the meantime you could shift your results. We'll get into optimizing queries a bit later, but just be aware of that.
With OFFSET, if someone is adding and deleting rows throughout your data while a user pages through it, it’s really hard for the offset to give consistent pages; some rows might never get returned to the user at all.

A better way of doing this kind of select is keying off the id, which is much more efficient and far more performant than OFFSET.
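A keyset-style sketch: remember the last `id` you saw and ask for the rows after it.

```sql
SELECT id, title, type
FROM ingredients
WHERE id > 5
ORDER BY id
LIMIT 5;
```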
WHERE
Sometimes you don’t want all of the records all at once. In that case you can add a WHERE clause where you tell the database to filter your results down to a subset of all of your records. What if we wanted to show only fruits?
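```sql
SELECT * FROM ingredients WHERE type = 'fruit';
```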
This will give us all the fruits we have. What if we wanted to select only vegetables where the IDs are less than 20?
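```sql
SELECT * FROM ingredients WHERE type = 'vegetable' AND id < 20;
```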
AND allows you to add multiple clauses to your selects. As you may guess, the presence of AND implies the existence of OR.
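For example, a sketch using OR:

```sql
SELECT * FROM ingredients WHERE id < 10 OR id > 20;
```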
ORDER BY
You frequently will care about the order these things come back in, how they’re sorted. That’s what ORDER BY is for.
Let’s say you wanted to order not by id or insertion order but by title.
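```sql
SELECT * FROM ingredients ORDER BY title;
```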
This will alphabetize your returned list. What if we wanted it in reverse order of IDs?
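```sql
SELECT * FROM ingredients ORDER BY id DESC;
```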
This will start at the largest number and count backwards. As you may have guessed, `ASC` is implied if you don’t specify.
Search
Sometimes you want to search for something in your database but it’s not an exact match. The best example would be a user typing something into the search bar; they’re likely not giving you something that matches exactly. You would expect a search for “pota” to match “potato”, right? Enter `LIKE` in SQL.
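A sketch of a fuzzy search for “pota”:

```sql
SELECT * FROM ingredients WHERE title LIKE '%pota%';
```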
This is a very limited fuzzy matching of text. This is not doing things like dropping “stop words” (like and, the, with, etc.) or handling plurals, or handling similar spellings (like color vs colour). Postgres can do this, and we’ll get there later with indexes.
Built in functions
Okay, great, now what if a user searches for “fruit”? We’d expect that to work, right?
Concat
`concat()` is a function that takes two strings and combines them together. We can concat our title and type columns and then use LIKE on that combined string.
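A sketch:

```sql
SELECT * FROM ingredients WHERE CONCAT(title, type) LIKE '%fruit%';
```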
The result of the cherry row would be `cherryfruit`, which means it’d match weird strings like `rryfru`. We’ll talk later about more complicated string matching but this will do for now.
Okay, but what if we have capitalization problem? We can use lower, both on the columns and on the values.
Lower

`LOWER()` takes a string and makes it lowercase.
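A sketch combining both, lowercasing the columns and the search value:

```sql
SELECT * FROM ingredients WHERE LOWER(CONCAT(title, type)) LIKE LOWER('%FrUiT%');
```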
ILIKE (insensitive)
Fortunately, there’s an even easier way to do this with fewer function invocations: `ILIKE` does the same thing, just with case insensitivity. Mostly I just wanted to show you lower!
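```sql
SELECT * FROM ingredients WHERE CONCAT(title, type) ILIKE '%FrUiT%';
```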
There are so many built-in functions in Postgres; see the official docs for the full list.
% vs _
You see that we’ve been surrounding the `LIKE` values with `%`. This is basically saying “match 0 to infinite characters”. So with “%berry” you would match “strawberry” and “blueberry” but not “berry ice cream”. Because the ”%” was only at the beginning it wouldn’t match anything after; you’d need “%berry%” to match both “strawberry” and “blueberry ice cream”.
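```sql
SELECT * FROM ingredients WHERE title LIKE 'c%';
```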
The above will give you all titles that start with “c”.
You can put % anywhere. “b%t” will match “bt”, “bot”, “but”, “belt”, and “belligerent”.
There also exists `_`, which will match one and only one character. “b_t” will match “bot” and “but” but not “bt”, “belt”, or “belligerent”.
Database and node
This is not a Node.js class, so the expectation of your Node.js skills here is fairly low. No worries if you don’t have a ton of experience with Node.js.
We are going to be using the pg package which is the most common PostgreSQL client for Node.js. There are others but this is the one with the most direct access to the underlying queries which is what we want. Feel free to use an ORM in your projects but it’s useful to know how the underlying PostgreSQL works.
Before I dump you into the code I wanted to give you a very brief tour of what to expect.
Connecting to a database
pg has two ways to connect to a database: a client and a pool.

You can think of a client as an individual connection you open and then eventually close to your PostgreSQL database. When you call `connect` on a client, it handshakes with the server and opens a connection for you to start using. You eventually call `end` to end your session with a client.
A pool is a pool of clients. Opening and closing clients can be expensive if you’re doing it a lot. Imagine you have a server that gets 1,000 requests per second and every one of those needs to open a line to a database. All that opening and closing of connections is a lot of overhead. A pool therefore holds onto connections. You basically just say “hey pool, query this” and if it has a spare connection open it will give it to you. If you don’t then it will open another one. Pretty neat, right?
Okay, we’re going to be using pools today since we’re doing a web server. I’d really only use a client for a one-off script or if I’m doing transactions, which aren’t supported by pools (transactions are not covered in this course; they’re for when multiple queries have to happen all together or not at all). Otherwise I’d stick to pools. One isn’t any harder to use than the other.
So let’s see how to connect to a database.
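A minimal sketch using pg’s Pool; the credentials match the docker run from earlier, and the database name is the illustrative one from before:

```javascript
const { Pool } = require("pg");

const pool = new Pool({
  host: "localhost",
  port: 5432,
  user: "postgres",
  password: "lol", // whatever you set POSTGRES_PASSWORD to
  database: "recipeguru", // the illustrative database name from earlier
});
```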
- These are the default credentials combined with what I set up in the previous lessons. Make sure you’re using the correct credentials.
- Make sure you started PostgreSQL via Docker with the `-p 5432:5432` flag or PostgreSQL won’t be exposed on your host system.
- Make sure your database is running too.
Once you’ve done this, you can now start making queries to PostgreSQL!
Let’s write a query.
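A sketch:

```javascript
// inside an async function (or a module with top-level await)
const { rows } = await pool.query("SELECT * FROM ingredients");
console.log(rows); // an array of objects, one per row
```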
- `rows` will be the response of the query.
- There’s lots of other metadata on these queries beyond just the rows. Feel free to `console.log` it or debug it to explore.
- pg will add the `;` at the end for you.
Parameterization and SQL injection
Let’s say you wanted to query for one specific ingredient by `id`, and that id was passed in via an AJAX request. What’s wrong then with doing this:
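```javascript
// DO NOT do this: user input goes straight into the query string
const { rows } = await pool.query(`SELECT * FROM ingredients WHERE id = ${id}`);
```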
SQL injection. You’re just raw passing in user data to a query, and a user could fathomably put anything in that id API request. What happens if a user made id equal to `1; DROP TABLE users; --`?
Do not run the next query. Well, it wouldn’t matter since we don’t have a users table but still, it’s meant to be a demonstration of what could happen.
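Once interpolated, the query would look something like this (again, do not run it):

```sql
SELECT * FROM ingredients WHERE id = 1; DROP TABLE users; --;
```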
Oh no 😱
The first `1;` ends the first query. The `DROP TABLE users;` does whatever malicious thing the hacker wants to do; they can delete info like this one does, or they can dump info that you don’t want them to. The `--` says “everything after this is a comment”, so it comments out the rest of your SQL if anything came after it.
This is SQL injection and a very real issue with SQL in apps. It’s why Wordpress apps are always under attack because they use MySQL behind the scenes. If you can find somewhere you can inject SQL you can get a lot of info out of an app.
So how do we avoid it? Sanitization of your inputs and parameterization of your queries.
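A sketch of the parameterized version; pg substitutes `$1` safely:

```javascript
const { rows } = await pool.query("SELECT * FROM ingredients WHERE id = $1", [id]);
```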
If you do this, pg will handle making sure that what goes into the query is safe to send to your database. Easy enough, right?
If you need to do a thing like `WHERE text ILIKE '%star wars%'` and you’re using parameterized queries, just put the `%` in the thing to be inserted, e.g. `('WHERE text ILIKE $1', ['%star wars%'])`.
Joins
So far we’ve done a one to one matching of records. We’ve used a record in a database to represent one item:
- vegetable,
- a fruit,
- …so on and so forth.
Now we’re going to get into records that can relate to each other. Let’s think about recipes.
A recipe has multiple ingredients. That word “has” is key here. It means there is a relationship.
A single recipe has many ingredients.
An ingredient can also be in many recipes. A tomato is both in pizza sauce and in a BLT. This is called a many-to-many relationship.
There are also one-to-many relationships. Imagine if we had multiple photos of each of our ingredients: a single ingredient might have five photos, and those photos will only belong to one ingredient. A photo of a green pepper doesn’t make sense belonging to anything besides the green pepper ingredient.

There can also exist one-to-one relationships, but in general you would just make those the same record altogether. You could split up the type and title into two tables, but why would you?
A basic relationship
We’re going to do this a naïve wrong way, and then we’re going to do it the correct way.
First, we’re going to make a recipes table.
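A sketch; the exact columns may differ from the course, but the `recipe_id` name matters for later examples:

```sql
CREATE TABLE recipes (
  recipe_id SERIAL PRIMARY KEY,
  title VARCHAR(255) UNIQUE NOT NULL,
  body TEXT
);
```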
Okay, no rocket science so far. We just created another random table. Let’s say we could have many images of a single recipe. How would we go about doing that?
Pro tip: don’t write a course while hungry.
Okay, we now have a bunch of recipes. Our problem now is that we have a variable number of photos for each recipe. Perhaps we have one picture of cookies but we have four pictures of jollof rice. How do we model that if we always have a fixed number of columns?
One approach would be to say “I only have at most five pictures of any recipes, I’ll make columns pic1 through pic5 columns and limit myself to that”. Totally valid approach, and if you can guarantee only 5 images at most, it might be the right approach.
But what if we can’t guarantee that? We need some sort of flexibility. This is where we can use two tables.
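A sketch of the second table; note it has a `recipe_id` column pointing back at recipes:

```sql
CREATE TABLE recipes_photos (
  photo_id SERIAL PRIMARY KEY,
  recipe_id INTEGER,
  url VARCHAR(255) NOT NULL
);
```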
Now we can insert some rows to reference images that we have.
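For example (values illustrative):

```sql
INSERT INTO recipes_photos (recipe_id, url)
VALUES
  (1, 'cookies_1.jpg'),
  (4, 'shakshuka_1.jpg'),
  (4, 'shakshuka_2.jpg'),
  (4, 'shakshuka_3.jpg');
```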
Okay, now let’s select all the photos for shakshuka. Shakshuka is ID 4 (if you followed my instructions, feel free to drop tables and recreate them if you need to.)
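```sql
SELECT url FROM recipes_photos WHERE recipe_id = 4;
```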
It’d be cool if we could do these all at the same time, some sort of intersection of a Venn diagram, right? You can!
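A sketch of the INNER JOIN:

```sql
SELECT recipes.title, recipes.body, recipes_photos.url
FROM recipes
INNER JOIN recipes_photos
ON recipes.recipe_id = recipes_photos.recipe_id;
```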
Amazing!!
Now we’re getting data from each table smushed together. The data from recipes will be repeated for each item in the recipes_photos table, but that’s what we expect.
Just to show you some common shorthand (but otherwise same query)
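Here `r` and `rp` are table aliases:

```sql
SELECT r.title, r.body, rp.url
FROM recipes r
INNER JOIN recipes_photos rp
ON r.recipe_id = rp.recipe_id;
```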
Go ahead and try this to see the full inner join.
Pretty cool, right?
Notice in our query above that "xiao long bao" does not show up at all. Makes sense, we have no photos of xiao long bao so why would they show up? INNER JOIN was the correct choice for what semantics we intended.
Okay, but let’s say we were populating a list of all our recipes and their photos, and we had a default image for any recipe that didn’t have photos. Then INNER JOIN doesn’t make sense, because INNER JOIN only gives you things that exist in both table A and table B. So how would we accomplish this task then?
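That’s what a LEFT JOIN is for: it keeps every row from the left table (recipes) and fills in NULLs where no photo exists. A sketch:

```sql
SELECT r.title, rp.url
FROM recipes r
LEFT JOIN recipes_photos rp
ON r.recipe_id = rp.recipe_id;
```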
NATURAL JOIN
I intentionally named `recipe_id` the same in both tables to show you this fun party trick.
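```sql
SELECT * FROM recipes NATURAL JOIN recipes_photos;
```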
What is this sorcery!? Because we named recipe_id the same in both, Postgres is smart enough to put two and two together and figure out that that’s what we should join on.
`NATURAL JOIN` is short for `NATURAL INNER JOIN`. `NATURAL LEFT JOIN` and `NATURAL RIGHT JOIN` are possible too.
In general let me steer you away from this in your code. Your tables can change down the line and you could accidentally name things the same that aren’t the same (`id` being a classic one). It’s useful for quick querying like we’re doing, but I’d say be explicit and avoid NATURAL’s implicit behavior. Still cool though, right?
CROSS JOIN
A small note on a not-super-useful type of JOIN. Let’s say you had two tables. In table 1 you had colors: green, blue, and red. In table 2 you have animals: dog, cat, and chicken. You want to make every possible permutation of the combination of both tables: green dog, green cat, green chicken, blue dog, blue cat, etc. There’s an ability to do this with `CROSS JOIN`.
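A sketch using our two tables:

```sql
SELECT * FROM recipes CROSS JOIN ingredients;
```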
Keep in mind that 3 rows in each table would yield 9 rows. In our case, it yielded 78 rows! It’s not typically the most useful kind of join but it’s good to know it’s there.
Foreign Key & Managing reference
So far so good, we have the ability to query across tables and join them together. We have a problem though: what if we delete a recipe?
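For example (recipe 4 is our shakshuka from earlier):

```sql
DELETE FROM recipes WHERE recipe_id = 4;
```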
Cool, we dropped the recipe, but what about the photos?
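```sql
SELECT * FROM recipes_photos WHERE recipe_id = 4;
```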
Oh no! Still there! Now, we could write a second query to drop these two, but it'd be great if Postgres could track these changes for us and assure us they'd never fall out of sync.
Now, let’s see what happens if we modify recipes_photos really quick.
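A sketch of the foreign-key version; we drop and recreate the table here for simplicity:

```sql
DROP TABLE recipes_photos;

CREATE TABLE recipes_photos (
  photo_id SERIAL PRIMARY KEY,
  recipe_id INTEGER REFERENCES recipes(recipe_id) ON DELETE CASCADE,
  url VARCHAR(255) NOT NULL
);
```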
Okay, so we have a few things here:

- 🔥 The REFERENCES portion means it’s going to be a foreign key.
  - ➡️ You tell it what it’s going to match up to. In our case `recipes` is the table and `recipe_id` is the name of the column it’ll match. In our case those are the same name, but it doesn’t have to be. It must be the primary key of the other table.
- 🔥 Then you need to tell it what to do when you delete something.
  - ➡️ With `ON DELETE CASCADE` you say “if the row in the other table gets deleted, delete this one too.” So if we delete something from recipes, it will automatically delete all its photos. Pretty cool, right?
- 🔥 You can also do `ON DELETE SET NULL`, which does exactly what it says it does.
  - ➡️ There’s also `ON DELETE NO ACTION`, which will error out if you try to delete something from recipes while there are still photos left. This forces developers to clean up photos before deleting recipes. That can be helpful too.
- 🔥 There are also `ON UPDATE` variants if you need to handle synced state between the two tables.
If you’re going to have two tables reference each other, use foreign keys where possible. It gives you useful constraints to make sure delete and update behaviors are intentional, and it makes queries faster.
Many to many relationships
So let’s now make our recipes connect to our ingredients!
This is a bit different because it’s a many-to-many relationship. A recipe has many ingredients, and an ingredient can belong to many recipes (eggs are in both khachapuri and cookies.)
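A sketch of the join table; the name `recipe_ingredients` is an illustrative choice:

```sql
CREATE TABLE recipe_ingredients (
  recipe_id INTEGER REFERENCES recipes(recipe_id) ON DELETE NO ACTION,
  ingredient_id INTEGER REFERENCES ingredients(id) ON DELETE NO ACTION,
  PRIMARY KEY (recipe_id, ingredient_id)
);
```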
We set this to error because we should clear out connections before we let developers delete recipes or ingredients. We don’t want to cascade deletes because that could delete recipes and ingredients unintentionally and we don’t want to set to null because then we’d have a bunch of half-null connections left over.
We’re going over constraints in the next chapter but we’re basically saying “the combination of recipe_id and ingredient_id must be unique” and we’re setting that as the primary key instead of an incrementing ID.
This table will describe the many-to-many relationship with two foreign keys between ingredients and recipes. Now we can insert records into here that describe how an ingredient belongs to a recipe.
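For example (ids illustrative; cookies is recipe 1, empanadas recipe 2):

```sql
INSERT INTO recipe_ingredients (recipe_id, ingredient_id)
VALUES
  (1, 1), (1, 2), (1, 3),
  (2, 2), (2, 4), (2, 5);
```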
These recipes are nonsensical, don’t expect them to be correct recipes.
This is the way to connect multiple columns together. You have two tables that represent the two distinct concepts (ingredients and recipes) and then you use another table to describe the relationships between them.
So now we have two recipes that have ingredients, cookies and empanadas. Since we inserted the cookies first, it will very likely have the id of 1 (unless you have inserted some other things.) Let’s select those records.
Inner join is what we’re looking for because we are only looking for rows that have connections on both sides.
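A sketch, selecting the ingredients for recipe 1 (the cookies):

```sql
SELECT i.title, i.type
FROM ingredients i
INNER JOIN recipe_ingredients ri
ON i.id = ri.ingredient_id
WHERE ri.recipe_id = 1;
```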
Now let’s use another INNER JOIN to add in the recipes.
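```sql
SELECT r.title AS recipe, i.title AS ingredient
FROM recipe_ingredients ri
INNER JOIN recipes r ON ri.recipe_id = r.recipe_id
INNER JOIN ingredients i ON ri.ingredient_id = i.id;
```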
Constraints
A constraint is a rule you put on a column. For example, NOT NULL is a constraint we’ve used for our primary keys. You always need a primary key.

UNIQUE is another constraint; it dictates that this column’s value must be unique amongst all rows in the table.

Ever wonder how an app can tell you so quickly if an email or a user name is taken? They likely use a UNIQUE constraint on those two columns.
CHECK
- CHECK allows you to set conditions on the data.
- You can use it for enforcing enumerated types (e.g. ‘male’, ‘female’, ‘nonbinary’, ‘other’ for gender.)
- Or you could have it test that a zipcode is five characters (in the USA.) Or that an age isn’t a negative number.
We actually have a very good use case for one: our type in the ingredients. It can be ‘fruit’, ‘vegetable’, ‘meat’, or ‘other’. We could write a CHECK constraint that enforces that.
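A sketch; the constraint name is an illustrative choice:

```sql
ALTER TABLE ingredients
ADD CONSTRAINT type_check
CHECK (type IN ('fruit', 'vegetable', 'meat', 'other'));
```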
This will add a constraint to the existing table. You can also create these constraints when you create the table (like we did with the foreign keys as well as unique, not null, and primary.)
Just for fun, let’s try to break it.
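```sql
INSERT INTO ingredients (title, type) VALUES ('mystery goo', 'slime');
-- should fail with: new row for relation "ingredients" violates check constraint "type_check"
```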
JSONB
JSONB is a data type introduced in PostgreSQL 9.4, specifically designed to store and manipulate JSON data.
Working With JSONB:
To use JSONB, you simply define a column with the JSONB data type.
Inserting data: you can directly insert valid JSON strings into JSONB columns using standard INSERT statements.
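A sketch of both, with illustrative names:

```sql
CREATE TABLE example (
  id SERIAL PRIMARY KEY,
  data JSONB
);

INSERT INTO example (data) VALUES ('{"key": "value", "tags": ["a", "b"]}');
```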
Querying data
PostgreSQL offers rich functionality for querying JSONB data:

- 🔥 Accessing elements: use operators like `->` and `->>` to access specific keys or array elements within the JSON structure.
- Filtering: employ comparison operators (e.g., `=`), text search functions, and JSON path expressions to filter based on your criteria.
- Aggregating: perform aggregations like counting, summing, or averaging directly on JSONB values using aggregate functions.
Native JSON Operators:
- `->`: access JSON properties, e.g., `data->'key'`.
- `->>`: the same, but returns text instead of JSON, e.g., `data->>'key'`.
- `@>`: check JSON document containment, e.g., `data @> '{"key": "value"}'`.
- `?`: test whether a key or top-level array element exists, e.g., `data ? 'key'`.
- `jsonb_array_elements`: a set-returning function to expand a JSON array into rows.
Best Practices:
- Consider normalization: if your JSON data has a well-defined structure, consider normalizing it into separate columns for better performance and data integrity.
- Plan indexes thoughtfully: while indexing offers speed, it can increase storage overhead. Choose keys wisely based on your query patterns.
- Validate and sanitize: implement validation and sanitization mechanisms to ensure data quality and prevent security vulnerabilities.
Create JSONB Column
Let’s add a JSONB field to our recipes. We’ll call it `meta`, as in metadata, but we could call it anything. And keep in mind this is just a normal JSON object: you can have arrays, nested objects, whatever.
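```sql
ALTER TABLE recipes ADD COLUMN meta JSONB;
```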
That's easy enough. Okay, let's now add a few rows to it. Keep in mind that it does have to be valid JSON or Postgres won't let you add it.
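A sketch, using the two rows you’ll see in the filtering examples below (the ids assume your data matches; adjust as needed):

```sql
UPDATE recipes SET meta = '{"tags": ["dessert", "cake"]}'  WHERE recipe_id = 20;
UPDATE recipes SET meta = '{"tags": ["dessert", "fruit"]}' WHERE recipe_id = 47;
```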
Selecting Data from the JSONB
Okay, let’s select all tags from any recipe that has meta data.
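```sql
SELECT meta -> 'tags' FROM recipes WHERE meta IS NOT NULL;
```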
The `->` is an accessor. In JS terms, this is like saying `meta.tags`. If we were accessing another layer of nesting, you’d just use more `->`s.
?column?

When you select a computed value like in the example above, the column name by default will be `?column?`. This is because we did not specify a name for our column. So whenever you select a computed column, give it a name with `AS`:
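```sql
SELECT meta -> 'tags' AS tags FROM recipes WHERE meta IS NOT NULL;
```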
Selecting the first item from an array
Let’s try selecting just the first element of each one using that.
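```sql
SELECT meta -> 'tags' -> 0 AS first_tag FROM recipes WHERE meta IS NOT NULL;
```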
This should give you back only the first item of each.
Notice they’re still weirdly in `""`. This is because the `->` selector gives you back a JSON object so you can keep accessing it. If you want it back as just text, you use `->>`.
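```sql
SELECT meta -> 'tags' ->> 0 AS first_tag FROM recipes WHERE meta IS NOT NULL;
```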
Notice the first accessor is `->` because we need tags back as a JSON object (accessors don’t work on text), but we use `->>` on the second one so we can get the text back.
Filtering with JSON operators:
Let’s say your column looks like this
| id | title | meta |
|---|---|---|
| 20 | Delightful Carrot Cake | {"tags": ["dessert", "cake"]} |
| 47 | Cranberry Nut Tarts | {"tags": ["dessert", "fruit"]} |
And you would like to select all the rows that have "cake" among their tags.
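Two sketches that both work:

```sql
SELECT * FROM recipes WHERE meta -> 'tags' ? 'cake';

SELECT * FROM recipes WHERE meta -> 'tags' @> '"cake"';
```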
Both of these work. The first one with the `?` checks to see if 'cake' exists as a top-level value in the tags array, which it does. The second with the `@>` is asking “does this array contain this value?” That’s why we need the extra double quotes: it’s a string inside a JSON value.
Nested Access with #>

The `#>` operator extracts the JSON sub-object at the specified path, where path elements can be either field keys or array indexes. For example, `'{"a": {"b": ["foo","bar"]}}'::jsonb #> '{a,b,1}'` returns `"bar"`.
JSONB Data manipulation
Updating JSONB Objects in PostgreSQL
| id | title | meta |
|---|---|---|
| 45 | Apple Juice | {"boz": ["value1", "anotherValue"], "foo": {"bar": ["one", "two", "three"]}} |
Adding a value in our JSONB Object
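One way to add a key is the `||` concatenation operator, which merges two JSONB values (the key here is illustrative):

```sql
UPDATE recipes
SET meta = meta || '{"difficulty": "easy"}'
WHERE recipe_id = 45;
```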
Update a Key with a New Value:

Let’s say you want to change “value1” in the “boz” array to “updatedValue” for recipe_id 45. You can easily achieve that using the `jsonb_set` function with a path expression:
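```sql
UPDATE recipes
SET meta = jsonb_set(meta, '{boz,0}', '"updatedValue"')
WHERE recipe_id = 45;
```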
Delete or Remove key from our JSONB object
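The `-` operator removes a key. A sketch that drops `boz` entirely:

```sql
UPDATE recipes
SET meta = meta - 'boz'
WHERE recipe_id = 45;
```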
When to use JSONB

I use JSONB a lot in PostgreSQL. I have a long history of using MongoDB, so my mindset fits well with its document-oriented structures, and JSONB does an amazing job of slotting into that in the right places.

For any field that is going to be present and finite, I always elect to make it a normal column. Then if I have any 1-to-many relationships like we did with photos or tags, I’ll use JSONB. Because one photo only belongs to one recipe, photos would have been perfect to model as a JSONB field.

Tags are a bit more murky: in theory we are sharing tags across recipes (implying a many-to-many relationship like ingredients). If I decided to rename the “desserts” tag to “sweets”, I’d have to go update that on every JSONB field that used it. Better yet, we could have a table of tags with IDs and names, and then the JSONB field of every recipe could refer to those IDs. Best of both worlds, and a great way to model many-to-many relationships without the third table.
So why did we spend all this time learning how to model data “the old school” way with many tables?
- Much of the data you’ll encounter in your job is still modeled this way.
- I’m not a performance expert but JOINs and JSONB queries have different performance profiles. It’s good to know both in case you do need to model your data one way or the other.
- I wanted to teach you JOINs.
In general I make ample use of JSONB because it closely mirrors how a programmer thinks about problems. Just keep in mind that a normal column will almost always get you better performance. Use columns by default, and reach for JSONB when you have unstructured or unbounded data: user preferences, document metadata, product catalogs, API responses, log records, or any scenario where data lacks a predefined structure.
Aggregations
Occasionally you need to query for macro statistics about your tables, not just query for individual rows. Let’s use one that we’ve already used before, `COUNT`. What if we want to know how many ingredients we have overall in our ingredients table?
Count
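```sql
SELECT COUNT(*) FROM ingredients;
```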
COUNT is an aggregation function. Giving it the `*` says “count everything and don’t remove nulls or duplicates of any variety”.
What if we wanted to count how many distinct types of ingredients we have in the ingredients table?
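```sql
SELECT COUNT(DISTINCT type) FROM ingredients;
```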
This simply tells you how many distinct values there are in a given column: how many different `type`s we have in the ingredients table. Keep in mind, the query to see what those distinct types actually are is
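```sql
SELECT DISTINCT type FROM ingredients;
```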
The first query gives you the number, the count of many distinct things are in the list. The second query gives you what those distinct things are with no indication of how many of each there are. There could be 1 fruit and 10,000 vegetables and you’d have no indication of that.
Okay, so you want to see both at the same time? Let’s see that.
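```sql
SELECT type, COUNT(type) FROM ingredients GROUP BY type;
```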
This is combining both of what we saw plus a new thing, GROUP BY. This allows us to specify what things we want to aggregate together: the type. Keep in mind if you want to SELECT for something with a GROUP BY clause, you do need to put them in the GROUP BY clause.
GROUP BY will almost always be accompanied by an aggregation function such as COUNT, MIN, etc.
Having
The `HAVING` clause in PostgreSQL provides a way to filter groups of rows based on the results of aggregate functions after they have been calculated. While similar to the `WHERE` clause, it operates on aggregated data rather than individual rows.
Key Points:
- 🔥 Used with `GROUP BY` to categorize data and perform aggregate functions.
- 🔥 Filters groups based on aggregate function results, like `COUNT`, `SUM`, `AVG`, etc.
- 🔥 Can contain aggregate functions, unlike `WHERE`.
- 🔥 Written after `GROUP BY`, but before `ORDER BY`, `LIMIT`, or `OFFSET`.
Okay, so now you want to select only the ingredient types with fewer than 10 items in your database, so you know what sorts of things you need to add.
The temptation here would be to use `WHERE`:
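A sketch of the tempting-but-wrong version:

```sql
-- this will error
SELECT type, COUNT(type)
FROM ingredients
WHERE count < 10
GROUP BY type;
```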
You’ll get the error that `count` doesn’t exist, and it’s because `count` isn’t something you’re selecting for; it’s something you’re aggregating. The `WHERE` clause filters the rows you’re selecting, which happens before the aggregation.
This can be useful because, let’s say, we wanted to run the same aggregation but only over things that have an `id` higher than 30.
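```sql
SELECT type, COUNT(type)
FROM ingredients
WHERE id > 30
GROUP BY type;
```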
So how do we filter based on the aggregates and not on the rows themselves? With `HAVING`.
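```sql
SELECT type, COUNT(type)
FROM ingredients
GROUP BY type
HAVING COUNT(type) < 10;
```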
And keep in mind you can use both together
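```sql
SELECT type, COUNT(type)
FROM ingredients
WHERE id > 30
GROUP BY type
HAVING COUNT(type) < 10;
```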
There are more aggregation functions like `MIN` (give me the smallest value in this selected set), `MAX` (same but max), and `AVG` (give me the average). We’ll use those in the next exercise with the movie data set.
Example:
Consider a table `orders` with columns `customer_id`, `product_id`, and `quantity`. You want to find customers who ordered more than 3 items in total:
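A sketch:

```sql
SELECT customer_id, SUM(quantity) AS total_items
FROM orders
GROUP BY customer_id
HAVING SUM(quantity) > 3;
```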
This query:

- Groups orders by `customer_id`.
- Calculates `SUM(quantity)` for each group.
- Uses `HAVING` to filter for groups where the total quantity is greater than 3.

Advanced Usage:
- Use multiple aggregates and expressions in `HAVING`.
- Filter based on other columns than those used in `GROUP BY`.
- Combine `HAVING` with `WHERE` for filtering both individual rows and groups.
Example:
Find customers who ordered more than 3 items and spent more than $100 in total:
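A sketch, assuming a hypothetical `price` column on orders (the schema above only lists three columns):

```sql
SELECT customer_id,
       SUM(quantity) AS total_items,
       SUM(quantity * price) AS total_spent -- price is an assumed column
FROM orders
GROUP BY customer_id
HAVING SUM(quantity) > 3 AND SUM(quantity * price) > 100;
```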
Remember:

- `HAVING` operates on aggregated values, not individual rows.
- It allows fine-grained control over which groups of data to consider after aggregation.
- Use it judiciously to avoid unnecessarily filtering out meaningful data.
Functions, Triggers, and Procedures
Functions
Functions are self-contained blocks of code that perform specific tasks or calculations.
- They can take input parameters, process them, and return a value or modify data.
- Useful for reusing common logic, encapsulating complex operations, and improving code modularity.
Types of functions
- Scalar functions: Return a single value (e.g., adding two numbers).
- Table functions: Return a result set of rows (e.g., filtering data based on criteria).
- Aggregate functions: Operate on groups of rows and return a single value (e.g., calculating average).
- Window functions: Perform calculations based on a window of rows (e.g., ranking customers within a category).
Creating a function
Using functions
Get Recipes that have 4-6 ingredients
Let’s say we’re focusing our recipe site on recipes with few ingredients, and we give users a way to filter recipes by how many ingredients are in the recipe.
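A sketch of the raw query, using the join table from earlier:

```sql
SELECT r.title, COUNT(ri.ingredient_id) AS ingredient_count
FROM recipes r
INNER JOIN recipe_ingredients ri ON r.recipe_id = ri.recipe_id
GROUP BY r.recipe_id
HAVING COUNT(ri.ingredient_id) BETWEEN 4 AND 6;
```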
This will give you all recipes that have 4, 5, or 6 ingredients in them. Now, if this is a big focus of our website and we’re going to repeat this everywhere, it would be annoying to copy and paste it all over the place. Wouldn’t it be cool if PostgreSQL let us wrap this up into an easy-to-use function?
Creating our function
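A sketch; the function name and body are illustrative, wrapping the query above:

```sql
CREATE OR REPLACE FUNCTION get_recipes_with_ingredients(min_count INT, max_count INT)
RETURNS SETOF recipes
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
    SELECT r.*
    FROM recipes r
    INNER JOIN recipe_ingredients ri ON r.recipe_id = ri.recipe_id
    GROUP BY r.recipe_id
    HAVING COUNT(ri.ingredient_id) BETWEEN min_count AND max_count;
END;
$$;
```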
How to use the function
Specifically with functions, you are going to have to call them
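```sql
SELECT * FROM get_recipes_with_ingredients(4, 6);
```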
How to remove the function
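```sql
DROP FUNCTION get_recipes_with_ingredients;
```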
Now we have a function that we can call whenever we need our more complicated query. This can come in very handy if your team has shared needs.
Let’s break down some features of it:
- You can just say CREATE or you can just say REPLACE. I added both so it wouldn’t trip you up.
- You do need to declare a return type, either in the function definition like we did, or in the invocation.
- `$$` is called “dollar quotes”. You can actually use them instead of `'` in your insertions if you’re sick of writing `''` for your single quotes and `\\` for your backslashes. It’s just really ugly. Here we use them because we have a long quote and it would be very annoying to try and escape everything.
- This language is called PL/pgSQL. It’s a very SQL-like language designed to make it easy to write functions and procedures. PostgreSQL actually allows itself to be extended, and you can use JavaScript, Python, and other languages to write these as well.
- Here we’re just returning the result of a query. There’s a myriad of more powerful things you can do with functions and fun sorts of aggregation. I’ll leave that to you to explore. This is not a class on functions; it’s a very deep topic
A note from your potential future self or coworker: do not go overboard. These are powerful but they can also be weird to debug. Where do you store the code for these? How are you going to debug it when it has a bug in production? There are ways to handle these things but it’s different than just handling it in your server’s code. Use with caution.
Tips and tricks
SQL Commands
Base table
| name | continent | area | population | gdp |
|---|---|---|---|---|
| Afghanistan | Asia | 652230 | 25500100 | 20343000000 |
| Albania | Europe | 28748 | 2831741 | 12960000000 |
| Algeria | Africa | 2381741 | 37100000 | 188681000000 |
| Andorra | Europe | 468 | 78115 | 3712000000 |
| Angola | Africa | 1246700 | 20609294 | 100990000000 |
Selection commands
Where clause
The example uses a WHERE clause to show the population of ‘France’. Note that strings should be in ‘single quotes’.
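A sketch, assuming the base table above is named `world`:

```sql
SELECT population FROM world WHERE name = 'France';
```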
In clause

Checking a list: the word IN allows us to check if an item is in a list.
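```sql
SELECT name, population
FROM world
WHERE name IN ('Sweden', 'Norway', 'Denmark');
```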
Between clause

Which countries are not too small and not too big? `BETWEEN` allows range checking (the range specified is inclusive of the boundary values).
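```sql
SELECT name, area
FROM world
WHERE area BETWEEN 200000 AND 250000;
```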
Group by
In a GROUP BY statement only distinct values are shown for the column in the GROUP BY. This example shows the continents hosting the Olympics with the count of the number of games held.
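A sketch, assuming a `games` table with a `continent` column (an assumption; your table may differ):

```sql
SELECT continent, COUNT(*) AS games_held
FROM games
GROUP BY continent;
```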
Join

You can use a JOIN to get results from two or more related tables. In this example each row of the table `games` is related to a row of the table `city`. If you want to find the country where the games took place you must JOIN the games table to the city table on the common field, that is `games.city` and `city.name`.
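A sketch; the `yr` and `country` column names are assumptions:

```sql
SELECT games.yr, city.country
FROM games
JOIN city ON games.city = city.name;
```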
Order by

Show the `name` and `population` for each country with a population of more than 100000000. Show countries in descending order of population.
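```sql
SELECT name, population
FROM world
WHERE population > 100000000
ORDER BY population DESC;
```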
Distinct

By default the result of a `SELECT` may contain duplicate rows. We can remove these duplicates using the `DISTINCT` keyword.
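```sql
SELECT DISTINCT continent FROM world;
```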