Master SQL interview questions covering Joins, Queries, Normalization, Indexes, Stored Procedures, and Database concepts from beginner to advanced.
SQL (Structured Query Language) is the standard language for managing and querying relational databases. It is divided into four sub-languages:
DDL (Data Definition Language): CREATE, ALTER, DROP, TRUNCATE — define schema.
DML (Data Manipulation Language): SELECT, INSERT, UPDATE, DELETE — manipulate data.
DCL (Data Control Language): GRANT, REVOKE — control permissions.
TCL (Transaction Control Language): COMMIT, ROLLBACK, SAVEPOINT — manage transactions.
SQL clauses are written in one order but executed in a different order:
-- Written order:
SELECT → FROM → WHERE → GROUP BY → HAVING → ORDER BY → LIMIT
-- Execution order:
1. FROM / JOIN (identify tables, apply joins)
2. WHERE (filter rows before grouping)
3. GROUP BY (group remaining rows)
4. HAVING (filter groups)
5. SELECT (evaluate expressions, apply DISTINCT)
6. ORDER BY (sort result)
7. LIMIT / OFFSET (return subset)
Understanding this prevents common mistakes like referencing a SELECT alias inside a WHERE clause (it hasn't been defined yet at that stage).
WHERE filters individual rows before grouping and cannot contain aggregate functions. HAVING filters groups after GROUP BY and can use aggregate functions.
-- WRONG: aggregate in WHERE
SELECT dept, COUNT(*) FROM employees
WHERE COUNT(*) > 5 GROUP BY dept; -- Error!
-- CORRECT: use HAVING for aggregates
SELECT dept, COUNT(*) AS cnt
FROM employees
GROUP BY dept
HAVING COUNT(*) > 5;
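The WHERE-vs-HAVING rule can be verified with a quick runnable sketch using Python's built-in sqlite3; the table name and sample rows here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(f"e{i}", "Sales") for i in range(6)] + [("x1", "HR")])

# Aggregate in WHERE is rejected by the engine
try:
    conn.execute("SELECT dept FROM employees WHERE COUNT(*) > 5 GROUP BY dept")
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)  # misuse of aggregate

# The same filter expressed with HAVING works
rows = conn.execute("""
    SELECT dept, COUNT(*) AS cnt
    FROM employees
    GROUP BY dept
    HAVING COUNT(*) > 5
""").fetchall()
print(where_error, rows)
```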
DELETE removes rows one at a time and supports a WHERE clause; changes are logged row-by-row, can be rolled back, and fire triggers.
TRUNCATE removes all rows at once with no WHERE clause; it uses minimal logging, resets identity/auto-increment counters, usually cannot be rolled back, and does not fire row-level triggers.
DELETE FROM orders WHERE order_date < '2020-01-01';
TRUNCATE TABLE temp_staging;
DROP TABLE old_archive;
NULL represents an unknown or missing value. SQL
uses three-valued logic (TRUE / FALSE / UNKNOWN). Any comparison
with NULL yields UNKNOWN, not TRUE or FALSE.
-- WRONG: these always return 0 rows
SELECT * FROM t WHERE col = NULL;
SELECT * FROM t WHERE col != NULL;
-- CORRECT
SELECT * FROM t WHERE col IS NULL;
SELECT * FROM t WHERE col IS NOT NULL;
-- COALESCE replaces NULL with a default
SELECT COALESCE(salary, 0) FROM employees;
-- NULL in aggregates: COUNT(*) counts all rows,
-- COUNT(col) ignores NULLs in that column
SELECT COUNT(*), COUNT(salary) FROM employees;
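Three-valued logic is easy to see in action. A minimal sketch using Python's sqlite3, with an invented employees table where one salary is NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("a", 100), ("b", None), ("c", 200)])

# '= NULL' yields UNKNOWN for every row, so nothing matches
eq_null = conn.execute(
    "SELECT name FROM employees WHERE salary = NULL").fetchall()
# IS NULL is the correct test
is_null = conn.execute(
    "SELECT name FROM employees WHERE salary IS NULL").fetchall()
# COUNT(*) counts rows; COUNT(col) skips NULLs
counts = conn.execute(
    "SELECT COUNT(*), COUNT(salary) FROM employees").fetchone()
print(eq_null, is_null, counts)
```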
CHAR(n) stores fixed-length strings padded to n characters, VARCHAR(n) stores variable-length strings up to n characters, and TEXT stores large free-form text.
CREATE TABLE example (
code CHAR(3), -- 'US ', 'GBR'
username VARCHAR(50), -- variable
bio TEXT -- large free-form text
);
DISTINCT removes duplicate rows from the result
set. GROUP BY groups rows and allows aggregate functions; it also
effectively de-duplicates, but its primary purpose is aggregation.
-- DISTINCT: unique department names
SELECT DISTINCT dept FROM employees;
-- GROUP BY: unique depts + count of each
SELECT dept, COUNT(*) FROM employees GROUP BY dept;
-- Both produce same unique depts, but GROUP BY is
-- more powerful when you need aggregates.
LIKE performs pattern matching on strings using two wildcards:
% : matches zero or more characters.
_ : matches exactly one character.
-- Names starting with 'A'
SELECT * FROM users WHERE name LIKE 'A%';
-- Names ending with 'son'
SELECT * FROM users WHERE name LIKE '%son';
-- 5-character names starting with 'J'
SELECT * FROM users WHERE name LIKE 'J____';
-- Case-insensitive: use ILIKE (PostgreSQL) or LOWER()
SELECT * FROM users WHERE LOWER(name) LIKE 'a%';
UNION combines the results of two queries and removes duplicate rows; UNION ALL keeps all rows (including duplicates) and is faster because no de-duplication pass is needed.
-- Same number of columns and compatible types required
SELECT name FROM customers
UNION
SELECT name FROM suppliers; -- removes duplicates
SELECT name FROM customers
UNION ALL
SELECT name FROM suppliers; -- keeps all, faster
Constraints enforce rules on data at the database level:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT NOT NULL REFERENCES customers(id),
status VARCHAR(20) DEFAULT 'pending',
amount DECIMAL(10,2) CHECK (amount > 0),
email VARCHAR(100) UNIQUE
);
A table can have only one PRIMARY KEY (which forbids NULLs) but many UNIQUE constraints (which generally allow NULLs):
CREATE TABLE users (
user_id INT PRIMARY KEY, -- one PK, no NULLs
email VARCHAR(100) UNIQUE, -- NULLs allowed
ssn CHAR(11) UNIQUE -- multiple UNIQUE keys OK
);
A composite key is a primary (or unique) key made up of two or more columns. The combination must be unique, even if individual columns are not.
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id) -- composite PK
);
-- order_id=1, product_id=2 → valid
-- order_id=1, product_id=2 → duplicate! violates PK
-- order_id=1, product_id=3 → valid (different combination)
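The uniqueness of the combination can be demonstrated with sqlite3 (sample values invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_items (
        order_id INTEGER, product_id INTEGER, quantity INTEGER,
        PRIMARY KEY (order_id, product_id))
""")
conn.execute("INSERT INTO order_items VALUES (1, 2, 10)")
conn.execute("INSERT INTO order_items VALUES (1, 3, 5)")  # new combination: OK
try:
    conn.execute("INSERT INTO order_items VALUES (1, 2, 99)")  # repeat combination
    violated = False
except sqlite3.IntegrityError:
    violated = True  # composite PK rejected the duplicate pair
n_rows = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
```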
IN checks if a value matches any value in a list
or subquery. It is equivalent to multiple OR conditions but cleaner and
often optimized better by the query planner.
-- Equivalent queries:
SELECT * FROM products
WHERE category IN ('Electronics', 'Books', 'Toys');
SELECT * FROM products
WHERE category = 'Electronics'
OR category = 'Books'
OR category = 'Toys';
-- IN with subquery:
SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE country = 'IN');
-- NOT IN pitfall: if subquery returns any NULL,
-- NOT IN returns no rows (UNKNOWN logic). Use NOT EXISTS instead.
BETWEEN low AND high is a shorthand for
>= low AND <= high. Both endpoints are
inclusive. Works on numbers, dates, and strings.
-- Numeric range
SELECT * FROM products WHERE price BETWEEN 10 AND 100;
-- Date range (inclusive)
SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31';
-- String range (alphabetical)
SELECT * FROM employees WHERE last_name BETWEEN 'A' AND 'M';
A schema is a logical namespace that groups related tables within a database (e.g. a sales schema inside ecommerce_db).
-- PostgreSQL hierarchy:
-- Server → Database → Schema → Table
CREATE SCHEMA sales;
CREATE TABLE sales.orders (...);
CREATE TABLE sales.returns (...);
INNER JOIN returns rows that have matching values in both tables.
-- Employees with their department names
SELECT e.employee_id, e.name, d.department_name
FROM employees e
INNER JOIN departments d
ON e.department_id = d.department_id;
-- Only employees who belong to an existing department are returned.
-- Employees with no department (NULL dept_id) are excluded.
Use a LEFT JOIN when you want all records from the left table regardless of whether there is a matching record on the right. Unmatched right-side columns appear as NULL.
-- All customers, even those with no orders
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.name;
-- Find customers with NO orders (anti-join pattern)
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
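Both LEFT JOIN patterns can be checked end-to-end with sqlite3; table names and rows below are invented ("Bob" is the customer with no orders):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")
# All customers with their order counts (LEFT JOIN keeps Bob with 0)
counts = conn.execute("""
    SELECT c.name, COUNT(o.order_id)
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
# Anti-join: customers with no orders at all
no_orders = conn.execute("""
    SELECT c.name FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    WHERE o.order_id IS NULL
""").fetchall()
```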
FULL OUTER JOIN returns all rows from both tables. Where there is no match, NULLs fill the columns of the side that has no matching row.
SELECT e.name, d.department_name
FROM employees e
FULL OUTER JOIN departments d
ON e.department_id = d.department_id;
-- Rows with employees but no dept: dept_name = NULL
-- Rows with depts but no employee: e.name = NULL
-- MySQL does NOT support FULL OUTER JOIN natively.
-- Emulate with UNION:
SELECT e.name, d.department_name FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id
UNION
SELECT e.name, d.department_name FROM employees e
RIGHT JOIN departments d ON e.department_id = d.department_id;
A CROSS JOIN produces the Cartesian product — every row from the left table is combined with every row from the right table. If left has 5 rows and right has 3, result has 15 rows. Use with care; large tables cause huge results.
-- Generate all size-color combinations
SELECT s.size, c.color
FROM sizes s
CROSS JOIN colors c;
-- sizes(S,M,L) × colors(Red,Blue) = 6 rows
-- Implicit CROSS JOIN (old syntax, avoid)
SELECT * FROM a, b; -- same as CROSS JOIN
A self join joins a table to itself using aliases. Common for hierarchical data (employee-manager relationships) or comparing rows within the same table.
-- employees table: (id, name, manager_id)
-- manager_id references employees.id
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
-- Find employees earning more than their manager
SELECT e.name, e.salary, m.salary AS mgr_salary
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;
-- ON: explicit, flexible
SELECT * FROM employees e
JOIN departments d ON e.department_id = d.department_id;
-- USING: shorthand (both tables have 'department_id')
SELECT * FROM employees
JOIN departments USING (department_id);
-- USING merges the column — no prefix needed
-- ON keeps both columns — must qualify e.dept_id / d.dept_id
Common approaches to finding rows in one table with no match in another — the anti-join pattern:
-- 1. LEFT JOIN + IS NULL (most portable)
SELECT a.id FROM table_a a
LEFT JOIN table_b b ON a.id = b.id
WHERE b.id IS NULL;
-- 2. NOT EXISTS (often best optimizer plan)
SELECT id FROM table_a a
WHERE NOT EXISTS (
SELECT 1 FROM table_b b WHERE b.id = a.id
);
-- 3. NOT IN (AVOID if subquery can return NULLs)
SELECT id FROM table_a
WHERE id NOT IN (SELECT id FROM table_b WHERE id IS NOT NULL);
-- EXCEPT / MINUS (set operation, some databases)
SELECT id FROM table_a
EXCEPT
SELECT id FROM table_b;
A Cartesian product occurs when either the join condition is missing or the relationship is one-to-many/many-to-many and you don't aggregate properly.
-- Missing ON clause → implicit CROSS JOIN
SELECT * FROM employees, departments; -- 100 × 10 = 1000 rows!
-- Many-to-many without aggregation: duplicates
-- orders (1 customer, many orders) joined with
-- order_items (many items per order):
SELECT c.name, SUM(oi.amount)
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
GROUP BY c.name; -- aggregate at the order_items grain to avoid double counting
-- Always verify JOIN result row count matches expectations.
A NATURAL JOIN automatically joins tables on all columns with the same name. It is convenient but dangerous in production — adding or renaming columns can silently change join behavior.
-- Automatically joins on 'department_id' (shared column)
SELECT * FROM employees NATURAL JOIN departments;
-- Equivalent to:
SELECT * FROM employees
JOIN departments USING (department_id);
-- Avoid in production code; be explicit with ON or USING.
Chain multiple JOIN clauses. The database optimizer determines the optimal join order; the written order is just for readability.
SELECT
c.name AS customer,
o.order_id,
p.product_name,
oi.quantity
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.order_date >= '2024-01-01'
ORDER BY o.order_id;
-- Method 1: Subquery
SELECT MAX(salary) AS second_highest
FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
-- Method 2: DENSE_RANK window function (most robust)
SELECT salary FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
) ranked
WHERE rnk = 2
LIMIT 1;
-- Method 3: OFFSET (simple but fragile with ties)
SELECT DISTINCT salary FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
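All three methods should agree on the same data. A quick sqlite3 sketch with an invented salary list (two employees tied at the top, so ties are exercised):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("a", 90000), ("b", 90000), ("c", 85000), ("d", 70000)])

# Method 1: subquery
m1 = conn.execute("""
    SELECT MAX(salary) FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]
# Method 2: DENSE_RANK
m2 = conn.execute("""
    SELECT salary FROM (
        SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
        FROM employees) AS ranked
    WHERE rnk = 2 LIMIT 1
""").fetchone()[0]
# Method 3: DISTINCT + OFFSET
m3 = conn.execute("""
    SELECT DISTINCT salary FROM employees
    ORDER BY salary DESC LIMIT 1 OFFSET 1
""").fetchone()[0]
```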
A non-equi join uses a condition other than equality
(=) in the ON clause, such as <, >,
BETWEEN.
-- Assign salary grade based on range table
CREATE TABLE salary_grades (
grade CHAR(1), low INT, high INT
);
SELECT e.name, e.salary, sg.grade
FROM employees e
JOIN salary_grades sg
ON e.salary BETWEEN sg.low AND sg.high;
-- Find all employees whose salary is above the avg
-- of anyone hired before them (historical comparison)
SELECT e1.name, e1.salary
FROM employees e1
JOIN employees e2 ON e2.hire_date < e1.hire_date
GROUP BY e1.id, e1.name, e1.salary
HAVING e1.salary > AVG(e2.salary);
-- Customers who are also suppliers
SELECT name FROM customers
INTERSECT
SELECT name FROM suppliers;
-- Customers who are NOT suppliers
SELECT name FROM customers
EXCEPT
SELECT name FROM suppliers;
-- Both require same number of columns and compatible types.
-- Find duplicate emails in users table
SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Show all rows that are duplicates (with row details)
SELECT *
FROM users
WHERE email IN (
SELECT email FROM users
GROUP BY email
HAVING COUNT(*) > 1
)
ORDER BY email;
-- Delete duplicates, keep one (CTE + ROW_NUMBER; SQL Server
-- allows DELETE against an updatable CTE)
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
FROM users
)
DELETE FROM cte WHERE rn > 1;
-- PostgreSQL/MySQL: delete by key instead, e.g.
-- DELETE FROM users WHERE id NOT IN (
--   SELECT MIN(id) FROM users GROUP BY email);
SELECT
COUNT(*) AS total_rows,
COUNT(salary) AS non_null_salaries,
SUM(salary) AS payroll,
AVG(salary) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees;
Every column in the SELECT list must either be part of the
GROUP BY clause or wrapped in an aggregate function. Violating this causes
an error in standard SQL (MySQL with ONLY_FULL_GROUP_BY mode disabled
is an exception but returns non-deterministic values).
-- CORRECT
SELECT dept, job_title, COUNT(*) AS cnt
FROM employees
GROUP BY dept, job_title; -- all non-aggregated cols in GROUP BY
-- ERROR in standard SQL:
SELECT dept, name, COUNT(*) -- 'name' not in GROUP BY
FROM employees
GROUP BY dept;
-- employees: 100 rows, 20 have salary = NULL
SELECT
COUNT(*) AS all_rows, -- 100
COUNT(1) AS also_all, -- 100
COUNT(salary) AS has_salary -- 80
FROM employees;
GROUP BY ROLLUP(a, b) produces subtotals and a
grand total in addition to the regular group rows. It creates a hierarchy: (a,b),
(a), ().
SELECT region, dept, SUM(sales) AS total
FROM sales_data
GROUP BY ROLLUP(region, dept);
-- Result includes:
-- (region, dept) → detail rows
-- (region, NULL) → subtotal per region
-- (NULL, NULL) → grand total
-- Use GROUPING() to distinguish NULL-subtotal from real NULL:
SELECT
GROUPING(region) AS is_region_total,
region, dept, SUM(sales)
FROM sales_data
GROUP BY ROLLUP(region, dept);
GROUP BY CUBE(a, b) generates subtotals for
all combinations of the listed columns (2^n combinations). For CUBE(a,b):
(a,b), (a), (b), ().
SELECT region, dept, SUM(sales)
FROM sales_data
GROUP BY CUBE(region, dept);
-- Produces:
-- (region, dept) → all detail combinations
-- (region, NULL) → subtotal by region
-- (NULL, dept) → subtotal by dept ← ROLLUP does NOT do this
-- (NULL, NULL) → grand total
-- ROLLUP: hierarchical (one-way drill)
-- CUBE: all cross-combinations (multidimensional analysis)
-- Using a window function (modern, recommended)
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM orders;
-- Partitioned running total (per customer)
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS customer_running_total
FROM orders;
-- Using conditional aggregation (most portable)
SELECT
dept,
SUM(CASE WHEN year = 2022 THEN sales END) AS sales_2022,
SUM(CASE WHEN year = 2023 THEN sales END) AS sales_2023,
SUM(CASE WHEN year = 2024 THEN sales END) AS sales_2024
FROM dept_sales
GROUP BY dept;
-- SQL Server native PIVOT (use a derived table so stray
-- columns don't become implicit grouping keys):
SELECT dept, [2022], [2023], [2024]
FROM (SELECT dept, year, sales FROM dept_sales) src
PIVOT (SUM(sales) FOR year IN ([2022],[2023],[2024])) AS pvt;
GROUPING SETS lets you specify exactly which
grouping combinations you want, without computing all of them like CUBE does.
-- Equivalent to UNION of separate GROUP BYs but more efficient
SELECT region, dept, SUM(sales)
FROM sales_data
GROUP BY GROUPING SETS (
(region, dept), -- detail
(region), -- subtotal by region only
() -- grand total
);
-- Does NOT compute (dept) alone — unlike CUBE
-- Method 1: ORDER BY + LIMIT
SELECT dept, AVG(salary) AS avg_sal
FROM employees
GROUP BY dept
ORDER BY avg_sal DESC
LIMIT 1;
-- Method 2: Subquery (handles ties)
SELECT dept, AVG(salary) AS avg_sal
FROM employees
GROUP BY dept
HAVING AVG(salary) = (
SELECT MAX(avg_sal)
FROM (SELECT AVG(salary) AS avg_sal FROM employees GROUP BY dept) t
);
-- Method 3: RANK() window function (handles ties cleanly)
SELECT dept, avg_sal FROM (
SELECT dept,
AVG(salary) AS avg_sal,
RANK() OVER (ORDER BY AVG(salary) DESC) AS rnk
FROM employees GROUP BY dept
) t WHERE rnk = 1;
COUNT(DISTINCT col) counts the number of unique
non-NULL values in a column. Useful for metrics like unique active users, distinct
products sold, etc.
-- How many unique customers placed orders?
SELECT COUNT(DISTINCT customer_id) AS unique_customers
FROM orders;
-- Unique products per category
SELECT category, COUNT(DISTINCT product_id) AS unique_products
FROM products
GROUP BY category;
-- Note: DISTINCT inside COUNT can be slow on large tables.
-- Approximate alternatives: HyperLogLog, APPROX_COUNT_DISTINCT (BigQuery)
A subquery (inner query / nested query) is a SELECT statement embedded inside another SQL statement. Common types: scalar (returns a single value), derived table (used in FROM), and correlated (references the outer query).
-- Scalar: employees earning above average
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Derived table
SELECT dept, avg_sal FROM (
SELECT dept, AVG(salary) AS avg_sal FROM employees GROUP BY dept
) dept_avg WHERE avg_sal > 50000;
A correlated subquery references a column from the outer query. It is executed once per row of the outer query, making it potentially slow. Can often be rewritten as a JOIN for better performance.
-- Find employees earning more than the avg of their OWN dept
SELECT name, salary, dept
FROM employees e_outer
WHERE salary > (
SELECT AVG(salary)
FROM employees e_inner
WHERE e_inner.dept = e_outer.dept -- correlation
);
-- Equivalent (faster) JOIN approach:
SELECT e.name, e.salary, e.dept
FROM employees e
JOIN (SELECT dept, AVG(salary) AS avg_sal
FROM employees GROUP BY dept) dept_avg
ON e.dept = dept_avg.dept
WHERE e.salary > dept_avg.avg_sal;
-- IN (subquery evaluated once, result materialised)
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE amount > 1000);
-- EXISTS (stops at first match — often faster)
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id AND o.amount > 1000
);
-- NOT IN danger: if orders.customer_id has any NULL,
-- NOT IN returns 0 rows. NOT EXISTS handles NULLs correctly.
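The NOT IN pitfall is worth seeing concretely. A sqlite3 sketch with invented tables, where orders contains one NULL customer_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1), (NULL);   -- note the NULL
""")
# NOT IN against a list containing NULL: every test is UNKNOWN, 0 rows
not_in = conn.execute("""
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
""").fetchall()
# NOT EXISTS handles the NULL correctly and finds Bob
not_exists = conn.execute("""
    SELECT name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""").fetchall()
```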
A CTE (introduced with the WITH
keyword) is a named temporary result set scoped to a single statement. It improves
readability and can be referenced multiple times within the same query.
WITH high_earners AS (
SELECT id, name, salary, dept
FROM employees
WHERE salary > 80000
),
dept_counts AS (
SELECT dept, COUNT(*) AS cnt
FROM high_earners
GROUP BY dept
)
SELECT h.name, h.salary, d.cnt AS high_earners_in_dept
FROM high_earners h
JOIN dept_counts d ON h.dept = d.dept
ORDER BY h.salary DESC;
CTEs are not inherently materialized in all databases — some
treat them as inline views. Use MATERIALIZED hint in PostgreSQL to
force caching.
A recursive CTE references itself, enabling traversal of
hierarchical or graph data (org charts, bill-of-materials, category trees).
Structure: anchor member UNION ALL recursive member.
-- Org chart: find all reports under manager_id = 1
WITH RECURSIVE org_tree AS (
-- Anchor: start with the root
SELECT id, name, manager_id, 1 AS depth
FROM employees
WHERE manager_id IS NULL -- or WHERE id = 1
UNION ALL
-- Recursive: join children to current level
SELECT e.id, e.name, e.manager_id, ot.depth + 1
FROM employees e
JOIN org_tree ot ON e.manager_id = ot.id
)
SELECT * FROM org_tree ORDER BY depth, name;
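SQLite also supports WITH RECURSIVE, so the org-chart traversal above can be run directly; the four-person hierarchy below is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Root', NULL), (2, 'A', 1), (3, 'B', 1), (4, 'C', 2);
""")
tree = conn.execute("""
    WITH RECURSIVE org_tree AS (
        SELECT id, name, manager_id, 1 AS depth
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, e.manager_id, ot.depth + 1
        FROM employees e JOIN org_tree ot ON e.manager_id = ot.id
    )
    SELECT name, depth FROM org_tree ORDER BY depth, name
""").fetchall()
```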
A derived table is a subquery used in the
FROM clause, given an alias. It behaves like a temporary table for the
duration of the query. CTEs are often cleaner alternatives.
SELECT dept, avg_sal, RANK() OVER (ORDER BY avg_sal DESC) AS rnk
FROM (
SELECT dept, AVG(salary) AS avg_sal
FROM employees
GROUP BY dept
) AS dept_summary -- derived table alias (required in MySQL)
WHERE avg_sal > 50000;
ANY is true if the comparison holds for at least one value returned by the subquery; ALL requires it to hold for every value.
-- Salary greater than ANY salary in 'Sales' dept
SELECT name FROM employees
WHERE salary > ANY (
SELECT salary FROM employees WHERE dept = 'Sales'
);
-- equivalent to: salary > MIN(sales_salaries)
-- Salary greater than ALL salaries in 'Sales' dept
SELECT name FROM employees
WHERE salary > ALL (
SELECT salary FROM employees WHERE dept = 'Sales'
);
-- equivalent to: salary > MAX(sales_salaries)
A CTE lives only for the duration of a single statement and is defined inline with WITH; a view is a persisted, named query stored in the database and reusable across statements.
-- CTE: one-time use
WITH ranked AS (SELECT *, ROW_NUMBER() OVER (...) AS rn FROM t)
SELECT * FROM ranked WHERE rn = 1;
-- View: persisted, reusable
CREATE VIEW top_customers AS
SELECT customer_id, SUM(amount) AS total
FROM orders GROUP BY customer_id HAVING SUM(amount) > 10000;
SELECT * FROM top_customers; -- reusable any time
-- Using DENSE_RANK (handles ties, most robust)
-- Replace 3 with N
SELECT salary FROM (
SELECT salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
) t
WHERE rnk = 3
LIMIT 1;
-- Using LIMIT/OFFSET (no tie handling)
SELECT DISTINCT salary FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 2; -- OFFSET = N-1
-- Using correlated subquery (N=3)
SELECT DISTINCT salary FROM employees e1
WHERE 2 = (
SELECT COUNT(DISTINCT salary) FROM employees e2
WHERE e2.salary > e1.salary
);
-- Subquery (fine for simple one-off use):
SELECT * FROM employees
WHERE dept_id = (SELECT id FROM departments WHERE name = 'Sales');
-- CTE (better for reuse and readability):
WITH sales_dept AS (SELECT id FROM departments WHERE name = 'Sales'),
sales_emps AS (SELECT * FROM employees WHERE dept_id IN (SELECT id FROM sales_dept))
SELECT name, salary FROM sales_emps WHERE salary > 60000;
Normalization organizes data to reduce redundancy and improve integrity. The common normal forms: 1NF (atomic column values, no repeating groups), 2NF (1NF plus no partial dependency on part of a composite key), 3NF (2NF plus no transitive dependency between non-key columns).
-- 3NF violation: zip_code determines city (transitive dependency)
-- Split: address(id, street, zip) + zip_city(zip, city)
-- Now zip_code → city is in its own table, no transitive dep.
Denormalization intentionally introduces redundancy to improve read performance. Common in analytics, data warehouses, and reporting layers where JOINs are expensive.
-- Normalized (OLTP):
orders(id, customer_id) → customers(id, name, city)
-- Denormalized (OLAP / reporting):
orders(id, customer_id, customer_name, customer_city)
-- No join needed for common reports; read is faster.
-- Trade-off: updates must propagate to all copies.
-- Typical patterns: summary tables, pre-aggregated columns,
-- embedding frequently-joined dimensions.
MERGE (SQL standard) performs an UPSERT — INSERT
if a row doesn't exist, UPDATE if it does — in a single atomic statement. MySQL uses
INSERT ... ON DUPLICATE KEY UPDATE or REPLACE INTO.
-- Standard MERGE (SQL Server, PostgreSQL, Oracle)
MERGE INTO target_table t
USING source_table s ON t.id = s.id
WHEN MATCHED THEN
UPDATE SET t.name = s.name, t.salary = s.salary
WHEN NOT MATCHED THEN
INSERT (id, name, salary) VALUES (s.id, s.name, s.salary)
WHEN NOT MATCHED BY SOURCE THEN
DELETE; -- optional (SQL Server extension): remove rows not in source
-- PostgreSQL ON CONFLICT (simpler upsert)
INSERT INTO employees (id, name, salary)
VALUES (1, 'Alice', 90000)
ON CONFLICT (id) DO UPDATE SET salary = EXCLUDED.salary;
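SQLite shares the ON CONFLICT syntax (since 3.24), so the upsert can be verified directly; ids and salaries below are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employees VALUES (1, 'Alice', 80000)")

upsert = """
    INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)
    ON CONFLICT (id) DO UPDATE SET salary = excluded.salary
"""
conn.execute(upsert, (1, "Alice", 90000))  # id=1 exists: UPDATE path
conn.execute(upsert, (2, "Bob", 70000))    # id=2 is new: INSERT path
rows = conn.execute("SELECT id, salary FROM employees ORDER BY id").fetchall()
```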
-- Add a column
ALTER TABLE employees ADD COLUMN phone VARCHAR(20);
-- Rename a column (PostgreSQL/MySQL 8+)
ALTER TABLE employees RENAME COLUMN phone TO mobile;
-- Modify data type (MySQL)
ALTER TABLE employees MODIFY COLUMN salary DECIMAL(12,2);
-- Change column definition (SQL Server)
ALTER TABLE employees ALTER COLUMN salary DECIMAL(12,2) NOT NULL;
-- Drop a column
ALTER TABLE employees DROP COLUMN phone;
-- Add a constraint
ALTER TABLE employees
ADD CONSTRAINT fk_dept FOREIGN KEY (dept_id)
REFERENCES departments(id);
-- Drop a constraint
ALTER TABLE employees DROP CONSTRAINT fk_dept;
-- Grant SELECT and INSERT on a table to a user
GRANT SELECT, INSERT ON employees TO 'analyst_user';
-- Grant all privileges
GRANT ALL PRIVILEGES ON DATABASE mydb TO admin_user;
-- Grant with ability to re-grant (WITH GRANT OPTION)
GRANT SELECT ON orders TO reporter WITH GRANT OPTION;
-- Revoke INSERT
REVOKE INSERT ON employees FROM 'analyst_user';
-- Roles (modern approach)
CREATE ROLE read_only;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO read_only;
GRANT read_only TO 'analyst_user';
Inserts the results of a SELECT query into a table. Useful for copying data, archiving, ETL, and bulk loading.
-- Copy rows from one table to another
INSERT INTO archived_orders (id, customer_id, amount, order_date)
SELECT id, customer_id, amount, order_date
FROM orders
WHERE order_date < '2023-01-01';
-- Create and populate a new table (SELECT INTO / CREATE TABLE AS)
-- SQL Server:
SELECT * INTO orders_backup FROM orders;
-- PostgreSQL / MySQL:
CREATE TABLE orders_backup AS SELECT * FROM orders;
A natural key is a real-world attribute (email, SSN); a surrogate key is a system-generated identifier with no business meaning.
-- Natural key (email could change — fragile as PK)
CREATE TABLE users (email VARCHAR(100) PRIMARY KEY, name TEXT);
-- Surrogate key (stable, small, fast joins)
CREATE TABLE users (
user_id SERIAL PRIMARY KEY, -- surrogate
email VARCHAR(100) UNIQUE, -- natural, used for lookup
name TEXT
);
Referential integrity ensures a foreign key
value always refers to a valid primary key in the parent table.
ON DELETE and ON UPDATE define what happens when the
parent row changes.
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(id)
ON DELETE CASCADE -- delete orders when customer deleted
ON UPDATE CASCADE -- update FK when PK changes
);
-- Options:
-- CASCADE : propagate the change/delete
-- SET NULL : set FK to NULL
-- SET DEFAULT: set FK to its default value
-- RESTRICT : prevent parent deletion if children exist
-- NO ACTION : like RESTRICT, but the check can be deferred (standard SQL)
A star schema keeps denormalized dimension tables around a central fact table; a snowflake schema normalizes dimensions into sub-dimension tables.
-- Star schema (denormalized)
fact_sales(sale_id, date_id, product_id, customer_id, amount)
dim_product(product_id, name, category, brand) -- all in one table
-- Snowflake schema (normalized)
dim_product(product_id, name, category_id)
dim_category(category_id, category_name, brand_id)
dim_brand(brand_id, brand_name) -- normalized sub-dimension
Updating rows in one table using values from another table. Syntax varies by database.
-- PostgreSQL / SQL Server: UPDATE with FROM
UPDATE employees e
SET salary = e.salary * 1.10
FROM departments d
WHERE e.dept_id = d.id
AND d.name = 'Engineering';
-- MySQL: UPDATE with JOIN
UPDATE employees e
JOIN departments d ON e.dept_id = d.id
SET e.salary = e.salary * 1.10
WHERE d.name = 'Engineering';
-- Standard SQL (works everywhere):
UPDATE employees
SET salary = salary * 1.10
WHERE dept_id = (SELECT id FROM departments WHERE name = 'Engineering');
An index is a separate data structure (usually a B-tree) that stores a sorted subset of column values with pointers to corresponding table rows. It speeds up lookups, sorts, and joins at the cost of extra storage and slower writes.
-- Without index: full table scan O(n)
-- With index: B-tree lookup O(log n)
CREATE INDEX idx_emp_salary ON employees(salary);
CREATE INDEX idx_emp_dept_salary ON employees(dept, salary); -- composite
-- B-tree: default, best for range queries and equality
-- Hash: equality only (fast), no range support
-- GIN: full-text search, arrays (PostgreSQL)
-- GiST: geometric, range types
-- Check existing indexes
SHOW INDEX FROM employees; -- MySQL
\d employees -- PostgreSQL
A clustered index determines the physical order of the table rows (one per table); a nonclustered index is a separate structure with pointers back to the rows.
-- SQL Server syntax:
CREATE CLUSTERED INDEX idx_pk ON orders(order_id);
CREATE NONCLUSTERED INDEX idx_date ON orders(order_date);
-- A "covering index" avoids the bookmark lookup by
-- including all needed columns in the index itself:
CREATE INDEX idx_cover ON orders(customer_id)
INCLUDE (order_date, amount); -- covers the query below
SELECT order_date, amount FROM orders WHERE customer_id = 5;
A covering index contains all the columns a query needs, so the database engine can satisfy the query entirely from the index without accessing the base table rows — an "index-only scan".
-- Query: frequently run report
SELECT customer_id, order_date, amount
FROM orders
WHERE customer_id = 42 AND order_date > '2024-01-01';
-- Covering index: includes all three columns
CREATE INDEX idx_covering
ON orders (customer_id, order_date, amount);
-- Engine reads only the index; no row fetching needed.
-- EXPLAIN output shows "Using index" in MySQL
-- or "Index Only Scan" in PostgreSQL → covered query.
EXPLAIN shows the query execution plan without
running the query. EXPLAIN ANALYZE (PostgreSQL) actually executes the
query and shows real timing and row counts.
EXPLAIN SELECT * FROM orders WHERE customer_id = 5;
-- MySQL output columns: id, select_type, table, type,
-- possible_keys, key, rows, Extra
-- Key 'type' values (best → worst):
-- const → PK/unique lookup (1 row)
-- ref → index lookup, multiple rows
-- range → index range scan
-- index → full index scan
-- ALL → full table scan ← BAD
-- PostgreSQL:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 5;
-- Shows: Seq Scan / Index Scan, cost=, actual time=, rows=
Common cases where an index is NOT used:
A function on the indexed column: WHERE YEAR(created_at) = 2024 — rewrite as a date range instead.
A leading wildcard: WHERE name LIKE '%son' — use full-text search.
An implicit type conversion: WHERE phone = 12345 when phone is VARCHAR.
-- Index NOT used:
WHERE UPPER(name) = 'ALICE' -- function on column
WHERE salary + 1000 > 90000 -- expression on column
-- FIX: use functional/expression indexes where supported
CREATE INDEX idx_upper_name ON employees (UPPER(name));
WHERE UPPER(name) = 'ALICE' -- now uses index
A partial index (PostgreSQL, SQLite) indexes only a subset of rows that satisfy a WHERE condition. Smaller index, faster scans, less maintenance overhead.
-- Only index active orders (not archived/cancelled)
CREATE INDEX idx_active_orders
ON orders (customer_id, order_date)
WHERE status = 'active';
-- Query that benefits:
SELECT * FROM orders
WHERE customer_id = 5
AND order_date > '2024-01-01'
AND status = 'active'; -- matches partial index condition
-- Index is far smaller than a full index on all rows.
The optimizer relies on up-to-date table statistics to choose good plans; refresh them with ANALYZE (PostgreSQL/MySQL) or UPDATE STATISTICS (SQL Server).
Partitioning divides a large table into smaller physical segments, while appearing as one logical table. The query optimizer prunes irrelevant partitions — "partition pruning".
-- Range partitioning by year (PostgreSQL)
CREATE TABLE orders (
id INT, order_date DATE, amount DECIMAL
) PARTITION BY RANGE (order_date);
CREATE TABLE orders_2023
PARTITION OF orders FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024
PARTITION OF orders FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- Query with partition pruning (scans only orders_2024):
SELECT * FROM orders WHERE order_date BETWEEN '2024-03-01' AND '2024-06-30';
-- Types: RANGE, LIST, HASH, COMPOSITE
Cardinality is the number of distinct values in an indexed column. High cardinality (e.g. user_id) means the index is very selective — highly useful. Low cardinality (e.g. a boolean or status with 2 values) means the index is not selective — the optimizer may prefer a full table scan.
-- High cardinality (good index candidate)
email VARCHAR(100) UNIQUE -- every value is different
-- Low cardinality (poor index candidate)
is_active BOOLEAN -- only 2 values: 0 or 1
-- Exception: low-cardinality columns can still help
-- in a composite index or partial index:
CREATE INDEX idx_inactive ON users(email) WHERE is_active = 0;
-- Small fraction of rows → index is selective again.
A composite index on (a, b, c) can be used by
queries that filter on the leftmost prefix: a,
(a,b), or (a,b,c). Queries filtering only on
b or c (without a) cannot use the index
efficiently.
CREATE INDEX idx_abc ON t (a, b, c);
-- Uses index: WHERE a = 1
-- Uses index: WHERE a = 1 AND b = 2
-- Uses index: WHERE a = 1 AND b = 2 AND c = 3
-- Uses index (a only): WHERE a = 1 AND c = 3 (skips b, uses only a)
-- Does NOT use: WHERE b = 2 (no leading a)
-- Does NOT use: WHERE c = 3 (no leading a)
-- Design tip: put high-cardinality and equality columns first,
-- then range columns last.
Window functions perform calculations across a set of rows
related to the current row (a "window") without collapsing the rows into groups.
Unlike GROUP BY, all original rows are retained in the result.
-- GROUP BY: collapses rows
SELECT dept, AVG(salary) FROM employees GROUP BY dept;
-- → one row per dept, individual employees gone
-- Window function: retains all rows
SELECT name, salary, dept,
AVG(salary) OVER (PARTITION BY dept) AS dept_avg
FROM employees;
-- → all rows kept; dept_avg shown alongside each employee
-- Syntax: function() OVER (PARTITION BY ... ORDER BY ... frame_clause)
SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rnk,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;
-- salary=90000: row_num=1, rnk=1, dense_rnk=1
-- salary=90000: row_num=2, rnk=1, dense_rnk=1 (tie!)
-- salary=85000: row_num=3, rnk=3, dense_rnk=2 ← note gap in RANK
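The tie behavior of the three ranking functions can be reproduced with sqlite3 (3.25+ has window functions); the salaries below are invented, with two rows tied at 90000:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("a", 90000), ("b", 90000), ("c", 85000)])
rows = conn.execute("""
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
    FROM employees
    ORDER BY row_num
""").fetchall()
# Tied 90000 rows share rnk=1 and dense_rnk=1; RANK then jumps
# to 3 (gap), while DENSE_RANK continues with 2.
```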
LAG(col, n) accesses a value from a
previous row; LEAD(col, n) accesses a value from a
subsequent row within the window. Useful for period-over-period
comparisons.
SELECT
order_date,
revenue,
LAG(revenue, 1, 0) OVER (ORDER BY order_date) AS prev_month_rev,
LEAD(revenue, 1, 0) OVER (ORDER BY order_date) AS next_month_rev,
revenue - LAG(revenue, 1, 0) OVER (ORDER BY order_date) AS mom_change
FROM monthly_revenue;
-- Per-partition: compare each employee's salary to their
-- previous salary in the same department
SELECT name, dept, salary,
LAG(salary) OVER (PARTITION BY dept ORDER BY hire_date) AS prev_salary
FROM employees;
NTILE(n) divides the result set into n
roughly equal buckets and assigns each row a bucket number. Useful for percentile
grouping (quartiles, deciles).
-- Divide employees into salary quartiles
SELECT name, salary,
NTILE(4) OVER (ORDER BY salary) AS quartile
FROM employees;
-- quartile 1 = lowest 25%, quartile 4 = top 25%
-- Deciles for scoring
SELECT customer_id, total_spend,
NTILE(10) OVER (ORDER BY total_spend DESC) AS decile
FROM customer_totals;
-- decile 1 = top 10% spenders
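NTILE's bucketing can be sketched the same way; with eight invented salaries and four buckets, each quartile holds exactly two rows.

```python
import sqlite3

# Hypothetical data: eight evenly spaced salaries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INT);
    INSERT INTO employees VALUES
        ('a',10),('b',20),('c',30),('d',40),
        ('e',50),('f',60),('g',70),('h',80);
""")
quartiles = conn.execute("""
    SELECT name, NTILE(4) OVER (ORDER BY salary) AS q
    FROM employees
    ORDER BY salary
""").fetchall()
# 8 rows / 4 buckets = 2 rows per quartile:
# 'a' and 'b' land in quartile 1, 'g' and 'h' in quartile 4.
```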
The frame clause defines the subset of rows within a partition
that the window function considers for each row. Syntax:
ROWS|RANGE BETWEEN start AND end.
-- 7-day moving average
SELECT
order_date,
revenue,
AVG(revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM daily_revenue;
-- Common frame boundaries:
-- UNBOUNDED PRECEDING → start of partition
-- N PRECEDING → N rows before current
-- CURRENT ROW → current row
-- N FOLLOWING → N rows after current
-- UNBOUNDED FOLLOWING → end of partition
-- ROWS vs RANGE:
-- ROWS: physical row offset (exact N rows)
-- RANGE: logical value range (rows with same ORDER BY value)
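A ROWS frame is easy to verify with a tiny invented revenue table — here a 2-day moving average so the expected values are obvious (SQLite supports frame clauses since 3.25).

```python
import sqlite3

# Hypothetical data: three days of revenue.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_revenue (order_date TEXT, revenue REAL);
    INSERT INTO daily_revenue VALUES
        ('2024-01-01', 100), ('2024-01-02', 200), ('2024-01-03', 300);
""")
rows = conn.execute("""
    SELECT order_date,
           AVG(revenue) OVER (
               ORDER BY order_date
               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS ma2
    FROM daily_revenue
    ORDER BY order_date
""").fetchall()
# day 1: avg(100) = 100; day 2: avg(100,200) = 150; day 3: avg(200,300) = 250
```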
-- FIRST_VALUE: value of specified column from first row in window
-- LAST_VALUE: value from last row in window (requires explicit frame)
SELECT name, dept, salary,
FIRST_VALUE(salary) OVER (
PARTITION BY dept ORDER BY salary DESC
) AS top_salary_in_dept,
LAST_VALUE(salary) OVER (
PARTITION BY dept ORDER BY salary DESC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
-- IMPORTANT: LAST_VALUE needs full frame, else defaults to CURRENT ROW
) AS bottom_salary_in_dept
FROM employees;
-- Top 3 highest-paid employees per department
SELECT name, dept, salary FROM (
SELECT name, dept, salary,
ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
FROM employees
) ranked
WHERE rn <= 3;
-- Use RANK() instead of ROW_NUMBER() if you want ties included:
SELECT name, dept, salary FROM (
SELECT name, dept, salary,
RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
FROM employees
) ranked
WHERE rnk <= 3;
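The top-N-per-group pattern can be exercised end to end with invented data — here top 2 per department so the expected result fits in one line.

```python
import sqlite3

# Hypothetical data: three Engineering salaries, two Sales salaries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INT);
    INSERT INTO employees VALUES
        ('a','Eng',100),('b','Eng',90),('c','Eng',80),
        ('d','Sales',70),('e','Sales',60);
""")
top2 = conn.execute("""
    SELECT name, dept FROM (
        SELECT name, dept,
               ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
        FROM employees
    ) WHERE rn <= 2
    ORDER BY dept, name
""").fetchall()
# 'c' (3rd in Eng) is filtered out; everyone else survives.
```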
PERCENT_RANK() computes a row's relative rank as
(rank - 1) / (total_rows - 1): it returns 0 for the first row and 1 for
the last. CUME_DIST() returns the fraction of rows whose value is less
than or equal to the current row's.
SELECT name, salary,
PERCENT_RANK() OVER (ORDER BY salary) AS pct_rank,
CUME_DIST() OVER (ORDER BY salary) AS cume_dist
FROM employees;
-- pct_rank: 0 = lowest, 1 = highest
-- cume_dist = 0.75 means 75% of employees earn this salary or less
WITH yearly_sales AS (
SELECT YEAR(order_date) AS yr, SUM(amount) AS total
FROM orders
GROUP BY YEAR(order_date)
)
SELECT
yr,
total,
LAG(total) OVER (ORDER BY yr) AS prev_year,
ROUND(
100.0 * (total - LAG(total) OVER (ORDER BY yr))
/ NULLIF(LAG(total) OVER (ORDER BY yr), 0),
2
) AS yoy_pct_change
FROM yearly_sales
ORDER BY yr;
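The same LAG + NULLIF year-over-year calculation runs unchanged in SQLite; the yearly totals below are invented so the percentages are easy to verify by hand.

```python
import sqlite3

# Hypothetical data: three years of totals.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yearly_sales (yr INT, total REAL);
    INSERT INTO yearly_sales VALUES (2022, 100.0), (2023, 150.0), (2024, 120.0);
""")
rows = conn.execute("""
    SELECT yr,
           ROUND(100.0 * (total - LAG(total) OVER (ORDER BY yr))
                 / NULLIF(LAG(total) OVER (ORDER BY yr), 0), 2) AS yoy
    FROM yearly_sales
    ORDER BY yr
""").fetchall()
# 2022: NULL (no prior year); 2023: +50%; 2024: -20%
```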
PARTITION BY divides the rows into groups
(partitions) before the window function is applied — analogous to
GROUP BY but without collapsing rows. Each partition is calculated
independently.
-- Without PARTITION BY: one global window
SELECT name, salary,
AVG(salary) OVER () AS company_avg
FROM employees;
-- With PARTITION BY: one window per department
SELECT name, dept, salary,
AVG(salary) OVER (PARTITION BY dept) AS dept_avg,
salary - AVG(salary) OVER (PARTITION BY dept) AS diff_from_avg
FROM employees;
A view is a named, stored SELECT query that behaves like a virtual table. It does not store data (by default).
CREATE VIEW active_customers AS
SELECT id, name, email, city
FROM customers
WHERE is_active = 1;
-- Use like a table:
SELECT * FROM active_customers WHERE city = 'Mumbai';
-- Update view:
CREATE OR REPLACE VIEW active_customers AS ...;
DROP VIEW active_customers;
A materialized view actually stores the result set on disk. It acts like a cache — queries hit pre-computed data. Must be refreshed when underlying data changes.
-- PostgreSQL
CREATE MATERIALIZED VIEW dept_summary AS
SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees GROUP BY dept;
-- Query hits stored data (very fast):
SELECT * FROM dept_summary;
-- Refresh on demand:
REFRESH MATERIALIZED VIEW dept_summary;
-- Concurrent refresh (no locking):
REFRESH MATERIALIZED VIEW CONCURRENTLY dept_summary;
-- Use case: expensive aggregations, reporting tables,
-- dashboards queried frequently with infrequent data changes.
A stored procedure is invoked with CALL and cannot be used inside a SELECT; a function returns a value and can be called inside SELECT, WHERE, and other expressions.
-- Stored Procedure (MySQL)
DELIMITER //
CREATE PROCEDURE give_raise(IN dept_name VARCHAR(50), IN pct DECIMAL(5,2))
BEGIN
UPDATE employees SET salary = salary * (1 + pct/100)
WHERE dept = dept_name;
END //
DELIMITER ;
CALL give_raise('Engineering', 10.0);
-- Function
CREATE FUNCTION full_name(first VARCHAR(50), last VARCHAR(50))
RETURNS VARCHAR(100) DETERMINISTIC
RETURN CONCAT(first, ' ', last);
SELECT full_name(first_name, last_name) FROM employees;
A trigger is a stored procedure that automatically executes in response to a specified DML event (INSERT, UPDATE, DELETE) on a table, either BEFORE or AFTER the event.
-- Audit trigger: log every salary change
CREATE TABLE salary_audit (
id INT AUTO_INCREMENT PRIMARY KEY,
emp_id INT, old_salary DECIMAL(10,2), new_salary DECIMAL(10,2),
changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER trg_salary_audit
AFTER UPDATE ON employees
FOR EACH ROW
BEGIN
IF OLD.salary <> NEW.salary THEN
INSERT INTO salary_audit(emp_id, old_salary, new_salary)
VALUES (NEW.id, OLD.salary, NEW.salary);
END IF;
END;
-- Trigger types: BEFORE INSERT, AFTER INSERT,
-- BEFORE UPDATE, AFTER UPDATE,
-- BEFORE DELETE, AFTER DELETE
A cursor allows row-by-row processing of a result set inside a stored procedure or function. Cursors are slow and should be replaced by set-based operations wherever possible.
DECLARE done INT DEFAULT 0;
DECLARE emp_id INT;
DECLARE emp_name VARCHAR(100);
DECLARE cur CURSOR FOR SELECT id, name FROM employees WHERE salary < 30000;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
OPEN cur;
read_loop: LOOP
FETCH cur INTO emp_id, emp_name;
IF done THEN LEAVE read_loop; END IF;
-- process each row...
UPDATE employees SET salary = salary * 1.05 WHERE id = emp_id;
END LOOP;
CLOSE cur;
Prefer the set-based equivalent:
UPDATE employees SET salary = salary * 1.05 WHERE salary < 30000; —
one statement achieves the same result without row-by-row looping.
INSTEAD OF triggers (SQL Server, Oracle) fire in place of the triggering DML statement, allowing you to make otherwise non-updatable views (those with JOINs or aggregates) updatable by redirecting the operation to the underlying tables.
-- SQL Server example: make a JOIN view updatable
CREATE VIEW emp_dept_view AS
SELECT e.id, e.name, d.name AS dept_name
FROM employees e JOIN departments d ON e.dept_id = d.id;
-- Without INSTEAD OF, UPDATE on this view fails.
CREATE TRIGGER trg_instead_update
ON emp_dept_view
INSTEAD OF UPDATE
AS
BEGIN
UPDATE employees
SET name = i.name
FROM inserted i
WHERE employees.id = i.id;
END;
-- MySQL: AUTO_INCREMENT
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(100)
);
INSERT INTO users(name) VALUES ('Alice');
SELECT LAST_INSERT_ID(); -- returns generated id
-- PostgreSQL: SERIAL / SEQUENCE
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- creates a sequence automatically
name VARCHAR(100)
);
SELECT nextval('users_id_seq'); -- advance sequence and return new value
SELECT currval('users_id_seq'); -- last value nextval returned in this session
-- PostgreSQL modern (SQL standard):
CREATE TABLE users (id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY);
A view is updatable when it references a single base table,
has no DISTINCT, no aggregates, no GROUP BY, no subqueries in SELECT.
WITH CHECK OPTION prevents INSERT/UPDATE through the view that would
make the row invisible to the view.
CREATE VIEW active_users AS
SELECT id, name, email FROM users WHERE is_active = 1
WITH CHECK OPTION;
-- OK only if the resulting base-table row satisfies the view's WHERE
INSERT INTO active_users VALUES (100, 'Bob', 'bob@x.com');
-- Here the base table must default is_active to 1;
-- otherwise WITH CHECK OPTION rejects the insert.
-- REJECTED: row would not be visible through the view
UPDATE active_users SET is_active = 0 WHERE id = 1;
-- CHECK OPTION prevents making the row disappear from view.
-- Scalar UDF (PostgreSQL)
CREATE OR REPLACE FUNCTION tax_amount(price DECIMAL, rate DECIMAL)
RETURNS DECIMAL AS $$
SELECT ROUND(price * rate / 100, 2);
$$ LANGUAGE sql IMMUTABLE;
SELECT product, price, tax_amount(price, 18) FROM products;
-- Table-valued function (SQL Server)
CREATE FUNCTION dbo.GetDeptEmployees(@dept NVARCHAR(50))
RETURNS TABLE AS RETURN (
SELECT id, name, salary FROM employees WHERE dept = @dept
);
SELECT * FROM dbo.GetDeptEmployees('Sales');
A transaction groups multiple statements into a single atomic unit: either all of them take effect or none do.
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT; -- both succeed, or ROLLBACK on error → Atomicity
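Atomicity is observable from application code: roll back mid-transfer and the partial debit never becomes visible. A minimal sketch with sqlite3 (balances invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # take manual control of transactions
conn.executescript("""
    CREATE TABLE accounts (id INT PRIMARY KEY, balance INT);
    INSERT INTO accounts VALUES (1, 1000), (2, 1000);
""")
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 500 WHERE id = 1")
conn.execute("ROLLBACK")  # simulate a failure before the credit step
balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
# balance is back to 1000 — the debit was undone as a unit
```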
Isolation levels control which anomalies concurrent transactions can see:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT ... ; -- reads a consistent snapshot throughout
COMMIT;
A deadlock occurs when two or more transactions each hold a lock and are each waiting for a lock held by the other, creating a cycle. The database detects this and kills one transaction (victim).
-- T1: locks row A, then tries to lock row B
-- T2: locks row B, then tries to lock row A → DEADLOCK
-- Prevention strategies:
-- 1. Always access tables/rows in the SAME ORDER in all transactions
-- 2. Keep transactions short — reduce lock hold time
-- 3. Use lower isolation levels (READ COMMITTED) where safe
-- 4. Use SELECT ... FOR UPDATE SKIP LOCKED (skip contended rows)
-- 5. Implement optimistic locking (version column check)
-- Retry logic on deadlock error code (MySQL: 1213)
-- is essential in application code.
A SAVEPOINT marks a point within a transaction to which you can roll back without rolling back the entire transaction. Useful for complex multi-step operations.
BEGIN;
INSERT INTO orders(...) VALUES (...);
SAVEPOINT after_order;
INSERT INTO order_items(...) VALUES (...);
-- Something goes wrong with items:
ROLLBACK TO SAVEPOINT after_order;
-- Order is preserved, items are rolled back
-- Try again or handle error...
INSERT INTO order_items(...) VALUES (...); -- retry
COMMIT;
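SQLite supports SAVEPOINT / ROLLBACK TO as well, so the partial-rollback behaviour can be sketched directly (table and values invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manual transaction control
conn.execute("CREATE TABLE t (v TEXT)")
conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES ('order')")
conn.execute("SAVEPOINT after_order")
conn.execute("INSERT INTO t VALUES ('bad item')")
conn.execute("ROLLBACK TO SAVEPOINT after_order")  # undoes only the item
conn.execute("INSERT INTO t VALUES ('good item')")  # retry
conn.execute("COMMIT")
rows = [r[0] for r in conn.execute("SELECT v FROM t ORDER BY v")]
# 'order' survived the partial rollback; 'bad item' did not
```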
SELECT ... FOR UPDATE acquires an exclusive lock
on the selected rows, preventing other transactions from updating or locking those
rows until the current transaction commits or rolls back. Used to implement
pessimistic locking.
-- Reserve a seat: lock the row before updating
BEGIN;
SELECT * FROM seats
WHERE seat_id = 42 AND status = 'available'
FOR UPDATE; -- acquires row lock
-- If another transaction tries the same query, it waits.
UPDATE seats SET status = 'booked', user_id = 99
WHERE seat_id = 42;
COMMIT;
-- SKIP LOCKED: skip already-locked rows (task queue pattern)
SELECT id, task FROM job_queue
WHERE status = 'pending'
LIMIT 1
FOR UPDATE SKIP LOCKED;
SQL injection is an attack where malicious SQL is injected into user input that gets executed by the database. Prevention:
-- VULNERABLE: string concatenation
query = "SELECT * FROM users WHERE email = '" + user_input + "'";
-- Input: ' OR '1'='1 → returns ALL users!
-- SAFE: parameterized queries / prepared statements
-- Python (psycopg2)
cursor.execute("SELECT * FROM users WHERE email = %s", (user_email,))
-- Java (JDBC)
PreparedStatement ps = conn.prepareStatement(
"SELECT * FROM users WHERE email = ?");
ps.setString(1, userEmail);
-- Other defenses:
-- Stored procedures with parameters
-- Input validation and allowlisting
-- Least-privilege DB accounts
-- Web Application Firewalls (WAF)
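The difference between concatenation and parameter binding can be demonstrated in a few lines with sqlite3 (which uses `?` placeholders); the users table is invented. The concatenated query is shown only to make the leak visible — never write queries this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (email TEXT);
    INSERT INTO users VALUES ('alice@x.com'), ('bob@x.com');
""")
evil = "' OR '1'='1"

# VULNERABLE: the payload rewrites the WHERE clause → every row leaks
leaked = conn.execute(
    "SELECT * FROM users WHERE email = '" + evil + "'").fetchall()

# SAFE: the driver binds the payload as a literal string → no match
safe = conn.execute(
    "SELECT * FROM users WHERE email = ?", (evil,)).fetchall()
```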
-- PostgreSQL: JSON / JSONB (binary, indexed)
CREATE TABLE events (id SERIAL, data JSONB);
INSERT INTO events(data) VALUES
('{"user_id": 1, "action": "click", "tags": ["web", "mobile"]}');
-- Access operators
SELECT data->>'user_id' FROM events; -- text
SELECT data->'tags'->>0 FROM events; -- first tag
SELECT data @> '{"action":"click"}' FROM events; -- contains
-- Index JSONB
CREATE INDEX idx_gin ON events USING GIN (data);
-- MySQL JSON
SELECT JSON_EXTRACT(data, '$.user_id') FROM events;
SELECT data->>'$.action' FROM events; -- shorthand
UPDATE events SET data = JSON_SET(data, '$.status', 'done');
Full-text search enables searching natural language in text
columns, with features like stemming, stop words, and relevance ranking — much more
powerful than LIKE '%word%'.
-- MySQL Full-Text Search
CREATE TABLE articles (
id INT PRIMARY KEY, title TEXT, body TEXT,
FULLTEXT (title, body)
);
-- Natural language mode
SELECT *, MATCH(title, body) AGAINST('database indexing') AS score
FROM articles
WHERE MATCH(title, body) AGAINST('database indexing')
ORDER BY score DESC;
-- Boolean mode
WHERE MATCH(title,body) AGAINST('+SQL -NoSQL' IN BOOLEAN MODE);
-- PostgreSQL: tsvector / tsquery
SELECT title FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('database & index');
CREATE INDEX idx_fts ON articles USING GIN(to_tsvector('english', body));
Gaps-and-islands problems involve finding consecutive sequences ("islands") and breaks ("gaps") in ordered data — e.g., consecutive login days, contiguous date ranges of availability.
-- Find consecutive login day ranges per user
-- Classic solution: subtract row_number from date to get group_id
WITH ranked AS (
SELECT user_id, login_date,
login_date - ROW_NUMBER() OVER (PARTITION BY user_id
ORDER BY login_date) * INTERVAL '1 day'
AS grp
FROM (SELECT DISTINCT user_id, login_date FROM logins) t
)
SELECT user_id,
MIN(login_date) AS streak_start,
MAX(login_date) AS streak_end,
COUNT(*) AS streak_length
FROM ranked
GROUP BY user_id, grp
ORDER BY user_id, streak_start;
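The same islands trick works in SQLite if `julianday()` stands in for date arithmetic (the login dates below are invented): consecutive days minus an increasing row number collapse to one constant group key per streak.

```python
import sqlite3

# Hypothetical data: one user with a 3-day streak and a 2-day streak.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (user_id INT, login_date TEXT);
    INSERT INTO logins VALUES
        (1,'2024-01-01'),(1,'2024-01-02'),(1,'2024-01-03'),
        (1,'2024-01-06'),(1,'2024-01-07');
""")
streaks = conn.execute("""
    WITH ranked AS (
        SELECT user_id, login_date,
               julianday(login_date)
                 - ROW_NUMBER() OVER (PARTITION BY user_id
                                      ORDER BY login_date) AS grp
        FROM (SELECT DISTINCT user_id, login_date FROM logins)
    )
    SELECT MIN(login_date), MAX(login_date), COUNT(*)
    FROM ranked
    GROUP BY user_id, grp
    ORDER BY 1
""").fetchall()
# two islands: Jan 1-3 (length 3) and Jan 6-7 (length 2)
```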
System-versioned temporal tables (SQL:2011 standard) automatically track the full history of row changes. Each row records its validity period. Useful for audit trails and point-in-time queries.
-- SQL Server
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
salary DECIMAL(10,2),
valid_from DATETIME2 GENERATED ALWAYS AS ROW START,
valid_to DATETIME2 GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
) WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.employees_history));
-- Query historical state at a point in time:
SELECT * FROM employees
FOR SYSTEM_TIME AS OF '2023-06-01 12:00:00';
-- Query all changes:
SELECT * FROM employees
FOR SYSTEM_TIME ALL
WHERE id = 42 ORDER BY valid_from;
COALESCE returns the first non-NULL argument in its list; NULLIF(a, b) returns NULL when a equals b.
SELECT COALESCE(phone, mobile, 'No contact') FROM users;
-- Avoid division by zero:
SELECT total / NULLIF(count, 0) AS avg_value FROM stats;
-- NULLIF returns NULL when count=0 → result is NULL, not error
SELECT IFNULL(salary, 0) FROM employees; -- MySQL
CASE is SQL's conditional expression, similar to
if-else. Two forms: simple (compare one value) and searched (arbitrary boolean
conditions).
-- Searched CASE (most flexible)
SELECT name, salary,
CASE
WHEN salary >= 100000 THEN 'Senior'
WHEN salary >= 60000 THEN 'Mid'
ELSE 'Junior'
END AS level
FROM employees;
-- Simple CASE (equality match)
SELECT order_id,
CASE status
WHEN 'P' THEN 'Pending'
WHEN 'S' THEN 'Shipped'
WHEN 'D' THEN 'Delivered'
ELSE 'Unknown'
END AS status_label
FROM orders;
-- CASE inside aggregate (conditional count)
SELECT
COUNT(CASE WHEN gender = 'M' THEN 1 END) AS male_count,
COUNT(CASE WHEN gender = 'F' THEN 1 END) AS female_count
FROM employees;
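The conditional-count idiom relies on COUNT ignoring the NULL that CASE produces when no WHEN matches; a quick sqlite3 check with invented rows:

```python
import sqlite3

# Hypothetical data: three 'M' rows, one 'F' row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, gender TEXT);
    INSERT INTO employees VALUES ('a','M'),('b','F'),('c','M'),('d','M');
""")
m, f = conn.execute("""
    SELECT COUNT(CASE WHEN gender = 'M' THEN 1 END),
           COUNT(CASE WHEN gender = 'F' THEN 1 END)
    FROM employees
""").fetchone()
# CASE yields NULL for non-matching rows, and COUNT skips NULLs
```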
-- MySQL
SELECT NOW(), CURDATE(), CURTIME();
SELECT YEAR(order_date), MONTH(order_date), DAY(order_date)
FROM orders;
SELECT DATEDIFF('2024-12-31', '2024-01-01'); -- 365
SELECT DATE_ADD(order_date, INTERVAL 30 DAY);
SELECT DATE_FORMAT(order_date, '%Y-%m'); -- '2024-03'
-- PostgreSQL
SELECT NOW(), CURRENT_DATE, CURRENT_TIME;
SELECT DATE_PART('year', order_date) AS yr;
SELECT order_date + INTERVAL '30 days';
SELECT TO_CHAR(order_date, 'YYYY-MM');
SELECT AGE(CURRENT_DATE, birth_date); -- interval
-- SQL Server
SELECT GETDATE(), YEAR(order_date), DATEADD(day, 30, order_date);
SELECT DATEDIFF(day, start_date, end_date);
SELECT FORMAT(order_date, 'yyyy-MM');
STRING_AGG (PostgreSQL, SQL Server) and GROUP_CONCAT (MySQL) aggregate string values from multiple rows into a single concatenated string within a group.
-- PostgreSQL / SQL Server: STRING_AGG
SELECT dept,
STRING_AGG(name, ', ' ORDER BY name) AS employees_list
FROM employees
GROUP BY dept;
-- Engineering: 'Alice, Bob, Charlie'
-- Sales: 'David, Eve'
-- MySQL: GROUP_CONCAT
SELECT dept,
GROUP_CONCAT(name ORDER BY name SEPARATOR ', ') AS employees_list
FROM employees
GROUP BY dept;
-- Useful for: tag lists, comma-separated IDs,
-- report-style aggregated columns.
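SQLite also ships a GROUP_CONCAT aggregate (two-argument form for a custom separator), so the pattern can be sketched with invented rows; note that without an explicit ORDER BY support (added only in recent SQLite versions) the concatenation order is not guaranteed.

```python
import sqlite3

# Hypothetical data: two Engineering employees, one in Sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (dept TEXT, name TEXT);
    INSERT INTO employees VALUES
        ('Eng','Bob'),('Eng','Alice'),('Sales','Eve');
""")
rows = conn.execute("""
    SELECT dept, GROUP_CONCAT(name, ', ')
    FROM employees
    GROUP BY dept
    ORDER BY dept
""").fetchall()
# one concatenated string per department, e.g. 'Bob, Alice' for Eng
```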
OLTP workloads are many short transactional queries (single-row lookups, writes); OLAP workloads are large analytical scans and aggregations, typically backed by columnar storage.
-- OLTP query: single row lookup
SELECT * FROM orders WHERE order_id = 100042; -- ms latency
-- OLAP query: full scan aggregation
SELECT region, product_category, SUM(revenue)
FROM fact_sales JOIN dim_date USING(date_id)
WHERE year = 2024
GROUP BY ROLLUP(region, product_category);
-- May scan billions of rows — columnar storage is key.