Database Engineering Path
Master relational and NoSQL databases, from fundamental queries to advanced data modeling.
1. Introduction to SQL
SQL (Structured Query Language) is the universal language for managing relational databases. It allows you to perform actions like querying data, inserting records, updating information, and defining database structures.
SQL is a declarative language, meaning you describe what you want, and the database engine figures out how to get it.
2. DDL & DML Commands
SQL commands are split into categories. DDL (Data Definition Language) manages structure (e.g., `CREATE TABLE`, `ALTER TABLE`). DML (Data Manipulation Language) manages the data itself (e.g., `INSERT`, `UPDATE`, `DELETE`).
3. The SELECT Statement
The `SELECT` statement is used to retrieve data from a database. You specify which columns you want and from which table. For example: `SELECT name, email FROM users;`
4. Filtering Data with WHERE
The `WHERE` clause is used with `SELECT`, `UPDATE`, and `DELETE` to filter records and apply actions only to the rows that meet specific conditions. Example: `SELECT * FROM products WHERE price > 50;`
5. Sorting Data with ORDER BY
The `ORDER BY` clause is used to sort the result set of a query in ascending (`ASC`) or descending (`DESC`) order based on one or more columns.
6. SQL Joins
Joins are used to combine rows from two or more tables based on a related column. The most common types are `INNER JOIN` (for matching values) and `LEFT JOIN` (for all rows from the left table).
7. Aggregate Functions
These functions perform a calculation on a set of rows and return a single summary value. Key functions include `COUNT()`, `SUM()`, `AVG()`, `MIN()`, and `MAX()`.
8. Grouping Data with GROUP BY
The `GROUP BY` clause groups rows that have the same values into summary rows. It's often used with aggregate functions to produce summary reports, like the total sales per region.
9. Subqueries
A subquery is a `SELECT` statement nested inside another statement. It allows you to perform complex queries where the result of one query is used as a condition in another.
10. Normalization
Database normalization is the process of organizing tables to reduce data redundancy and improve data integrity. It involves dividing large tables into smaller, well-structured ones, following rules known as Normal Forms (1NF, 2NF, 3NF).
1. Introduction to MySQL
MySQL is the world's most popular open-source relational database management system (RDBMS). It's known for its reliability, performance, and ease of use, making it a cornerstone of many web applications, including those built with PHP, Java, and Python.
2. Installation & Setup
Installing MySQL involves downloading the community server, running the installer, and configuring a secure root password. Once installed, you can interact with it using the command-line client or a graphical tool like MySQL Workbench.
3. Data Types
MySQL supports various data types, including numeric types (`INT`, `DECIMAL`), date and time types (`DATE`, `DATETIME`), and string types (`VARCHAR`, `TEXT`, `ENUM`). Choosing the correct data type is essential for data integrity and performance.
4. CRUD Operations
CRUD stands for Create, Read, Update, and Delete, which are the four basic functions of persistent storage. In MySQL, these are performed using `INSERT` (Create), `SELECT` (Read), `UPDATE` (Update), and `DELETE` (Delete) statements.
5. Indexes and Performance
Indexes in MySQL are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. Proper indexing is the key to high-performance queries.
6. Storage Engines (InnoDB vs MyISAM)
MySQL uses storage engines to handle the SQL operations for different table types. InnoDB is the default and most-used engine; it is ACID-compliant and supports transactions and foreign keys. MyISAM is older and faster for read-heavy workloads but lacks transactional support.
7. User Management & Privileges
MySQL allows you to create multiple user accounts and grant them specific permissions (privileges) on databases and tables. Commands like `CREATE USER` and `GRANT` are used to manage access control securely.
8. Backup and Restore
Regular backups are critical. The `mysqldump` command-line utility is a common way to create a logical backup of a database, which generates a file of SQL statements. This file can then be used to restore the database on any MySQL server.
9. Built-in Functions
MySQL provides a rich library of built-in functions for string manipulation (`CONCAT`, `SUBSTRING`), numeric calculations (`ROUND`, `CEIL`), and date/time formatting (`DATE_FORMAT`, `NOW()`), which can be used directly in your queries.
10. Stored Procedures
A stored procedure is a prepared SQL code that you can save, so the code can be reused over and over again. It can accept input parameters and return values, allowing you to encapsulate complex business logic directly within the database.
1. Why PostgreSQL?
PostgreSQL, often called "Postgres," is a powerful, open-source object-relational database system known for its robustness, feature set, and extensibility. It's highly compliant with SQL standards and offers advanced features not always found in other systems, like JSONB support and window functions.
2. Installation & Setup
Setting up PostgreSQL involves downloading the installer for your operating system and initializing a database cluster. The primary tool for interacting with Postgres is the `psql` command-line utility, though graphical tools like pgAdmin are also widely used.
3. Advanced Data Types (JSONB, Array)
A key strength of PostgreSQL is its support for advanced data types. You can store arrays directly in a column, and the `JSONB` type allows you to store and query complex JSON documents efficiently, blending the worlds of relational and document databases.
4. CRUD Operations
Like other SQL databases, CRUD operations in PostgreSQL are handled by `INSERT`, `SELECT`, `UPDATE`, and `DELETE`. Postgres also has useful features like the `RETURNING` clause, which can return values from rows that were modified by an insert, update, or delete.
5. Common Table Expressions (CTEs)
CTEs, defined with the `WITH` clause, allow you to create temporary, named result sets that exist only for the duration of a single query. They are incredibly useful for breaking down complex queries into simple, logical, and readable steps.
6. Window Functions
Window functions perform a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function, but unlike regular aggregate functions, they do not group rows into a single output row.
7. Extensions (PostGIS)
PostgreSQL's architecture allows it to be extended by adding new functionality. The most famous extension is PostGIS, which adds support for geographic objects and allows location queries to be run in SQL, turning your database into a powerful geospatial server.
8. Transactions & Concurrency
PostgreSQL has a robust Multi-Version Concurrency Control (MVCC) system, which provides excellent performance for concurrent transactions. It ensures that reads do not block writes and writes do not block reads, a major advantage for high-traffic applications.
9. Roles and Permissions
PostgreSQL manages database access permissions using the concept of roles. A role can be thought of as either a database user or a group of database users. It offers a powerful and flexible system for managing who can access and modify your data.
10. Performance Tuning
Tuning PostgreSQL involves several strategies, including query optimization using `EXPLAIN ANALYZE`, proper indexing, and configuring the `postgresql.conf` file to optimize memory usage, checkpointing, and other server-level settings for your specific workload.
1. Introduction to NoSQL & MongoDB
NoSQL databases provide a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. MongoDB is a leading document-oriented NoSQL database that stores data in flexible, JSON-like documents.
2. Installation & Setup
You can install MongoDB on your local machine or use a cloud service like MongoDB Atlas. Setup involves starting the `mongod` server process and connecting to it using the MongoDB Shell (`mongosh`), a command-line interface.
3. JSON, BSON, and Documents
MongoDB stores data in documents, which are data structures composed of field and value pairs. These documents are stored in BSON (Binary JSON), a binary representation of JSON-like documents that provides more data types and is efficient to scan.
4. CRUD Operations
MongoDB's CRUD operations are performed on collections (groups of documents). The core methods are `insertOne()`/`insertMany()` (Create), `find()`/`findOne()` (Read), `updateOne()`/`updateMany()` (Update), and `deleteOne()`/`deleteMany()` (Delete).
5. Advanced Queries
MongoDB's query language is very powerful. You can query for documents where field values match specific conditions, use comparison operators (`$gt`, `$lt`), logical operators (`$and`, `$or`), and even query for elements within arrays.
6. Aggregation Framework
The Aggregation Framework is a powerful way to process data records and return computed results. It groups values from multiple documents together and can perform a variety of operations on the grouped data to return a single result, similar to SQL's `GROUP BY`.
7. Indexes in MongoDB
Like in SQL databases, indexes are crucial for query performance. MongoDB supports indexes on any field in a document, including fields within embedded documents and arrays. You can create single-field, compound, and multikey indexes.
8. Data Modeling
Data modeling in MongoDB is different from relational databases. Because you can embed documents and use arrays, you often denormalize your data to keep related information together in a single document, which optimizes for read performance.
9. Replication
Replication provides redundancy and increases data availability. With replication, multiple copies of your data are stored on different database servers in what's called a replica set. This provides high availability and protects against the failure of a single server.
10. Sharding
Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations. It partitions a collection's data across multiple servers (shards) based on a shard key.