Basic SQL Interview Questions
1. What is the difference between DELETE, TRUNCATE, and DROP?
Great question! In SQL, DELETE, TRUNCATE, and DROP are all used to remove data, but they differ significantly in scope, performance, and rollback capabilities. Here's a breakdown:
🔹 1. DELETE
Purpose: Removes specific rows from a table based on a WHERE clause.
Can be rolled back: ✅ Yes (if inside a transaction).
Triggers: ✅ Fires DELETE triggers.
Performance: Slower, especially for large datasets.
Usage:
SQL
DELETE FROM employees WHERE department = 'HR';
🔹 2. TRUNCATE
Purpose: Removes all rows from a table.
Can be rolled back: ⚠️ Depends on the DBMS (e.g., in Oracle it can be rolled back, in MySQL it cannot).
Triggers: ❌ Does not fire DELETE triggers.
Performance: Faster than DELETE because it doesn't log individual row deletions.
Usage:
SQL
TRUNCATE TABLE employees;
🔹 3. DROP
Purpose: Completely removes the table structure and its data from the database.
Can be rolled back: ❌ No (unless using special features like Oracle Flashback).
Triggers: ❌ Not applicable (table is gone).
Performance: Very fast.
Usage:
SQL
DROP TABLE employees;
🔁 Summary Table
Feature | DELETE | TRUNCATE | DROP |
Removes rows | ✅ (Selective) | ✅ (All rows) | ❌ (Removes table) |
Rollback | ✅ Yes | ⚠️ Depends | ❌ No |
Triggers fire | ✅ Yes | ❌ No | ❌ No |
Performance | Slow (row-by-row) | Fast | Fastest |
Table remains | ✅ Yes | ✅ Yes | ❌ No |
2. What is a primary key vs foreign key?
Here's a clear explanation of the difference between a Primary Key and a Foreign Key in relational databases:
🔹 1. Primary Key
Definition: A column (or set of columns) that uniquely identifies each row in a table.
Characteristics:
Must contain unique values.
Cannot contain NULL values.
Each table can have only one primary key.
Purpose: Ensures entity integrity — no duplicate or missing records.
Example:
SQL
CREATE TABLE Students (
StudentID INT PRIMARY KEY,
Name VARCHAR(100),
Age INT
);
🔹 2. Foreign Key
Definition: A column (or set of columns) that references the primary key of another table.
Characteristics:
Can contain duplicate values.
Can contain NULL values (unless explicitly restricted).
Used to establish relationships between tables.
Purpose: Ensures referential integrity — links data across tables.
Example:
SQL
CREATE TABLE Enrollments (
EnrollmentID INT PRIMARY KEY,
StudentID INT,
CourseID INT,
FOREIGN KEY (StudentID) REFERENCES Students(StudentID)
);
🔁 Summary Table
Feature | Primary Key | Foreign Key |
Uniqueness | ✅ Must be unique | ❌ Can be duplicated |
NULLs allowed | ❌ Not allowed | ✅ Allowed (unless restricted) |
Purpose | Identify rows uniquely | Link to another table |
Integrity enforced | Entity integrity | Referential integrity |
Number per table | Only one | Can be multiple |
3. Difference between WHERE and HAVING?
WHERE and HAVING are both used to filter records in SQL, but they are used in different contexts and operate at different stages of query execution.
🔹 1. WHERE
Used for: Filtering rows before any grouping or aggregation.
Applies to: Individual rows in a table.
Cannot be used with aggregate functions like SUM(), AVG(), etc.
Example:
SQL
SELECT * FROM Orders
WHERE OrderDate >= '2025-01-01';
🔹 2. HAVING
Used for: Filtering groups after aggregation (GROUP BY).
Applies to: Groups of rows.
Can be used with aggregate functions.
Example:
SQL
SELECT CustomerID, COUNT(*) AS OrderCount
FROM Orders
GROUP BY CustomerID
HAVING COUNT(*) > 5;
🔁 Summary Table
Feature | WHERE | HAVING |
Filters on | Rows | Groups |
Used with | SELECT, UPDATE, DELETE | SELECT with GROUP BY |
Aggregate functions | ❌ Not allowed | ✅ Allowed |
Execution stage | Before grouping | After grouping |
🧠 Tip:
You can use both in the same query:
SQL
SELECT CustomerID, SUM(TotalAmount) AS TotalSpent
FROM Orders
WHERE OrderDate >= '2025-01-01'
GROUP BY CustomerID
HAVING SUM(TotalAmount) > 10000;
Would you like a visual flowchart showing how SQL processes WHERE and HAVING?
4. Explain INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
Here’s a clear explanation of the four main types of SQL joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN — all used to combine rows from two or more tables based on a related column.
🔹 1. INNER JOIN
Returns: Only the rows that have matching values in both tables.
Use case: When you want data that exists in both tables.
Example:
SQL
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
🔹 2. LEFT JOIN (or LEFT OUTER JOIN)
Returns: All rows from the left table, and matched rows from the right table. If no match, returns NULL for right table columns.
Use case: When you want all records from the left table, even if there's no match in the right.
Example:
SQL
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
🔹 3. RIGHT JOIN (or RIGHT OUTER JOIN)
Returns: All rows from the right table, and matched rows from the left table. If no match, returns NULL for left table columns.
Use case: When you want all records from the right table, even if there's no match in the left.
Example:
SQL
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
RIGHT JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
🔹 4. FULL OUTER JOIN
Returns: All rows when there is a match in either left or right table. If no match, returns NULL for missing side.
Use case: When you want all records from both tables, regardless of matches.
Example:
SQL
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
🔁 Summary Table
Join Type | Returns Rows From | Includes Non-Matching Rows From |
INNER JOIN | Both tables | ❌ No |
LEFT JOIN | Left table | ✅ Left only |
RIGHT JOIN | Right table | ✅ Right only |
FULL OUTER JOIN | Both tables | ✅ Both sides |
Would you like a visual diagram showing how these joins work with sample data?
5. What is the difference between CHAR and VARCHAR?
The difference between CHAR and VARCHAR in SQL lies in how they store string data and manage memory:
🔹 1. CHAR (Character)
Fixed-length string data type.
Always reserves the specified number of characters, padding with spaces if the input is shorter.
Faster for fixed-size data due to predictable storage.
Use case: Ideal for fields like country codes ('IN', 'US') or fixed-length IDs.
Example:
SQL
CREATE TABLE Countries (
Code CHAR(2),
Name VARCHAR(50)
);
-- 'IN' will be stored as 'IN' (no padding needed)
-- 'U' will be stored as 'U ' (padded with space)
🔹 2. VARCHAR (Variable Character)
Variable-length string data type.
Stores only the actual number of characters entered.
More efficient for varying-length data.
Use case: Ideal for names, addresses, descriptions, etc.
Example:
SQL
CREATE TABLE Employees (
FirstName VARCHAR(50),
LastName VARCHAR(50)
);
-- 'John' uses only 4 bytes, not 50
🔁 Summary Table
Feature | CHAR | VARCHAR |
Length | Fixed | Variable |
Padding | Pads with spaces | No padding |
Storage efficiency | Less efficient for short data | More efficient |
Performance | Slightly faster for fixed data | Slightly slower for variable |
Use case | Codes, fixed-length fields | Names, descriptions, etc. |
Would you like a visual comparison showing how data is stored in both types?
6. What are indexes and why do we use them?
In SQL, indexes are special data structures that improve the speed and efficiency of data retrieval operations on a database table — much like an index in a book helps you quickly find a topic.
🔹 What is an Index?
An index is a pointer to data in a table. It works like a lookup table that helps the database engine find rows faster without scanning the entire table.
🔹 Why Use Indexes?
Indexes are used to:
✅ Speed up SELECT queries.
✅ Improve performance of JOINs, WHERE, and ORDER BY clauses.
✅ Enforce uniqueness (e.g., via UNIQUE indexes).
❌ But they slow down INSERT, UPDATE, DELETE operations slightly because the index also needs to be updated.
🔹 Types of Indexes
Type | Description |
Primary Index | Automatically created on a primary key. |
Unique Index | Ensures all values in the indexed column are unique. |
Composite Index | Index on multiple columns. |
Full-text Index | Used for searching text-based data efficiently. |
Clustered Index | Sorts and stores the data rows in the table based on the index. |
Non-clustered Index | Stores pointers to the actual data rows. |
🔹 Example
SQL
-- Create an index on the 'LastName' column
CREATE INDEX idx_lastname ON Employees(LastName);
Now, queries like:
SQL
SELECT * FROM Employees WHERE LastName = 'Kumar';
will run faster because the database can use the index to locate matching rows quickly.
🔁 Summary
Feature | Indexes Help With |
Speed | ✅ Faster data retrieval |
Efficiency | ✅ Optimized query performance |
Downsides | ❌ Slower write operations |
Use case | ✅ WHERE, JOIN, ORDER BY |
Would you like a visual diagram showing how indexes work behind the scenes?
7. Explain normalization and denormalization.
Here’s a clear explanation of Normalization and Denormalization in database design:
🔹 1. Normalization
Definition: The process of organizing data to reduce redundancy and improve data integrity.
Goal: Break down large tables into smaller, related tables and eliminate duplicate data.
Benefits:
✅ Saves storage space.
✅ Prevents data anomalies.
✅ Improves consistency.
Drawback:
❌ More complex queries due to multiple joins.
Example: Instead of storing customer and order info in one table:
SQL
-- Normalized structure
Customers (CustomerID, Name)
Orders (OrderID, CustomerID, Product)
🔹 2. Denormalization
Definition: The process of combining tables to reduce joins and improve read performance.
Goal: Optimize for faster querying, especially in read-heavy systems.
Benefits:
✅ Faster data retrieval.
✅ Simpler queries.
Drawback:
❌ More storage used.
❌ Risk of data inconsistency.
Example: Combining customer and order info into one table:
SQL
-- Denormalized structure
CustomerOrders (CustomerID, Name, OrderID, Product)
🔁 Summary Table
Feature | Normalization | Denormalization |
Purpose | Reduce redundancy | Improve performance |
Data integrity | ✅ High | ❌ Lower |
Storage efficiency | ✅ Efficient | ❌ Less efficient |
Query complexity | ❌ More joins | ✅ Fewer joins |
Use case | OLTP (transactional systems) | OLAP (reporting/analytics) |
Would you like a diagram showing how a normalized vs denormalized schema looks visually?
8. Difference between UNION and UNION ALL.
Here’s a clear comparison between UNION and UNION ALL in SQL:
🔹 1. UNION
Combines the result sets of two or more SELECT queries.
Removes duplicates from the final result.
Slower than UNION ALL because it performs a distinct sort.
Use case: When you want a unique list of results.
Example:
SQL
SELECT City FROM Customers
UNION
SELECT City FROM Suppliers;
✅ Returns each city only once, even if it appears in both tables.
🔹 2. UNION ALL
Combines the result sets of two or more SELECT queries.
Keeps duplicates in the final result.
Faster because it skips the duplicate check.
Use case: When you want all records, including duplicates.
Example:
SQL
SELECT City FROM Customers
UNION ALL
SELECT City FROM Suppliers;
✅ Returns all cities, including duplicates.
🔁 Summary Table
Feature | UNION | UNION ALL |
Combines results | ✅ Yes | ✅ Yes |
Removes duplicates | ✅ Yes | ❌ No |
Performance | ❌ Slower | ✅ Faster |
Use case | Unique results | All results (including duplicates) |
Would you like a visual example showing how both behave with sample data?
9. What is a view? Can we update data through a view?
🔹 What is a View in SQL?
A view is a virtual table based on the result of a SQL query. It doesn't store data itself but displays data stored in other tables. Think of it as a saved query that you can treat like a regular table.
✅ Why Use Views?
Simplify complex queries.
Improve security by exposing only specific columns.
Provide a consistent interface to changing data structures.
Help with reporting and analytics.
🔹 Example:
SQL
CREATE VIEW ActiveCustomers AS
SELECT CustomerID, Name, Email
FROM Customers
WHERE Status = 'Active';
Now you can query it like a table:
SQL
SELECT * FROM ActiveCustomers;
🔄 Can You Update Data Through a View?
✅ Yes, but only if:
The view is based on a single table.
It does not use:
GROUP BY
DISTINCT
JOIN
UNION
Aggregate functions (SUM, AVG, etc.)
❌ No, if the view is:
Based on multiple tables.
Contains aggregations, grouping, or set operations.
🔁 Summary Table
Feature | View |
Stores data | ❌ No (virtual table) |
Based on query | ✅ Yes |
Can be queried like table | ✅ Yes |
Can be updated | ✅ Sometimes (with conditions) |
Use case | Simplify logic, restrict access |
Would you like an example of an updatable view vs a non-updatable view to see the difference in action?
10. What is the difference between NULL and 0?
Difference Between NULL and 0 in SQL
NULL and 0 may look similar at first glance (especially when a column appears empty), but they represent very different concepts in SQL:
✅ NULL
Meaning: Represents missing, unknown, or undefined data.
Data type: Can appear in any column type (text, number, date, etc.).
Behavior:
Arithmetic operations with NULL result in NULL.
Comparisons with NULL require IS NULL or IS NOT NULL.
Example:
SQL
SELECT * FROM Employees WHERE ManagerID IS NULL;
✅ 0
Meaning: A known numeric value — specifically, zero.
Data type: Only valid in numeric columns.
Behavior:
Arithmetic operations with 0 behave normally.
Can be compared using standard operators (=, <, >).
Example:
SQL
SELECT * FROM Products WHERE Quantity = 0;
🔁 Summary Table
Feature | NULL | 0 |
Represents | Unknown or missing value | Numeric value (zero) |
Data type | Any | Numeric only |
Arithmetic | Results in NULL | Normal math operations |
Comparisons | Use IS NULL / IS NOT NULL | Use =, <, > etc. |
Storage | No actual value stored | Value stored as 0 |
🧠 Tip:
In SQL, NULL ≠ 0 — they are not equal, and treating them as such can lead to logic errors in queries.
Would you like a visual example showing how queries behave differently with NULL vs 0?
---
🔹 Intermediate SQL Interview Questions
1. What is a clustered vs non-clustered index?
🔹 Clustered vs Non-Clustered Index in SQL
Indexes help speed up data retrieval, and they come in two main types: clustered and non-clustered. Here's a clear breakdown of the differences:
✅ 1. Clustered Index
Definition: Sorts and stores the actual data rows in the table based on the index key.
Only one per table: Because the data rows themselves are sorted, you can only have one clustered index.
Faster for range queries: Since data is physically ordered.
Automatically created on the primary key (unless specified otherwise).
Example:
SQL
CREATE CLUSTERED INDEX idx_customer_id ON Customers(CustomerID);
✅ 2. Non-Clustered Index
Definition: Creates a separate structure that stores pointers to the actual data rows.
Multiple allowed: You can have many non-clustered indexes on a table.
Slower for range queries, but useful for quick lookups on non-primary columns.
Example:
SQL
CREATE NONCLUSTERED INDEX idx_customer_name ON Customers(Name);
🔁 Summary Table
Feature | Clustered Index | Non-Clustered Index |
Data storage | Sorts actual table data | Stores pointers to data rows |
Number per table | Only one | Multiple allowed |
Speed for range queries | ✅ Faster | ❌ Slower |
Automatically created on | Primary key | Not automatic |
Use case | Efficient sorting & range scans | Fast lookups on non-key columns |
🧠 Tip:
Think of a clustered index like a dictionary where pages are sorted alphabetically. A non-clustered index is like a table of contents pointing to specific pages.
Would you like a visual diagram showing how both types of indexes work with sample data?
2. Explain window functions (ROW_NUMBER, RANK, DENSE_RANK).
🔹 What Are Window Functions in SQL?
Window functions perform calculations across a set of table rows that are related to the current row — without collapsing rows like GROUP BY does. They are super useful for ranking, running totals, and comparisons.
✅ Common Window Functions
1. ROW_NUMBER()
Assigns a unique sequential number to each row within a partition.
No ties — every row gets a distinct number.
Example:
SQL
SELECT
EmployeeID, Department, Salary,
ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS RowNum
FROM Employees;
➡️ Gives each employee a unique rank within their department based on salary.
2. RANK()
Assigns a rank to each row with gaps if there are ties.
Tied rows get the same rank, and the next rank is skipped.
Example:
SQL
SELECT
EmployeeID, Salary,
RANK() OVER (ORDER BY Salary DESC) AS Rank
FROM Employees;
➡️ If two employees have the same salary, they get the same rank, and the next rank is skipped (e.g., 1, 2, 2, 4).
3. DENSE_RANK()
Similar to RANK(), but no gaps in ranking.
Tied rows get the same rank, and the next rank continues sequentially.
Example:
SQL
SELECT
EmployeeID, Salary,
DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseRank
FROM Employees;
➡️ If two employees tie, they get the same rank, but the next rank is just incremented (e.g., 1, 2, 2, 3).
🔁 Summary Table
Function | Handles Ties | Gaps in Rank | Use Case |
ROW_NUMBER() | ❌ No | ❌ No | Unique row numbering |
RANK() | ✅ Yes | ✅ Yes | Ranking with gaps |
DENSE_RANK() | ✅ Yes | ❌ No | Ranking without gaps |
Would you like a visual example showing how these functions behave on sample data?
3. What is the difference between EXISTS and IN?
🔹 Difference Between EXISTS and IN in SQL
Both EXISTS and IN are used in subqueries to filter data, but they work differently under the hood and are best suited for different scenarios.
✅ 1. IN
Checks if a value exists in a list or result set.
Subquery returns a list of values, and the outer query checks if a value is in that list.
Better for small, static lists or when the subquery returns a single column.
Example:
SQL
SELECT * FROM Employees
WHERE DepartmentID IN (SELECT DepartmentID FROM Departments WHERE Location = 'Chennai');
✅ 2. EXISTS
Checks if a subquery returns any rows.
Subquery is usually correlated (depends on the outer query).
Faster for large datasets because it stops searching once it finds a match.
Example:
SQL
SELECT * FROM Employees E
WHERE EXISTS (
SELECT 1 FROM Departments D
WHERE D.DepartmentID = E.DepartmentID AND D.Location = 'Chennai'
);
🔁 Summary Table
Feature | IN | EXISTS |
Subquery returns | A list of values | A boolean (true/false) |
Best for | Small lists, single-column checks | Large datasets, correlated queries |
Performance | Slower on large subqueries | Faster due to short-circuiting |
Null handling | Can cause issues with NULL | Handles NULL more gracefully |
🧠 Tip:
Use IN when you're comparing a single column to a list.
Use EXISTS when you're checking for the existence of related rows.
Would you like a performance comparison example or a diagram showing how each works internally?
4. Explain CTE (Common Table Expressions).
What is a CTE (Common Table Expression)?
A CTE (Common Table Expression) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps make complex queries more readable, modular, and maintainable.
✅ Syntax of a CTE
SQL
WITH CTE_Name AS (
SELECT column1, column2
FROM SomeTable
WHERE condition
)
SELECT * FROM CTE_Name;
🔹 Why Use CTEs?
✅ Simplifies complex joins and subqueries.
✅ Improves readability and structure.
✅ Allows recursion (e.g., hierarchical data like org charts).
✅ Can be reused multiple times in the same query.
🔹 Example: Basic CTE
SQL
WITH HighSalaryEmployees AS (
SELECT EmployeeID, Name, Salary
FROM Employees
WHERE Salary > 100000
)
SELECT * FROM HighSalaryEmployees;
🔹 Example: Recursive CTE (for hierarchical data)
SQL
WITH EmployeeHierarchy AS (
SELECT EmployeeID, ManagerID, Name
FROM Employees
WHERE ManagerID IS NULL
UNION ALL
SELECT e.EmployeeID, e.ManagerID, e.Name
FROM Employees e
INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID
)
SELECT * FROM EmployeeHierarchy;
🔁 Summary Table
Feature | CTE |
Scope | Temporary (within the query) |
Improves readability | ✅ Yes |
Supports recursion | ✅ Yes |
Reusable in query | ✅ Yes |
Stored in DB | ❌ No |
Would you like a visual diagram showing how recursive CTEs work in hierarchical data?
5. What are stored procedures vs functions?
🔹 Stored Procedures vs Functions in SQL
Both stored procedures and functions are reusable blocks of SQL code, but they serve different purposes and have distinct behaviors.
✅ 1. Stored Procedure
A set of SQL statements that perform a specific task.
Can perform actions like modifying data (INSERT, UPDATE, DELETE).
Can return multiple result sets or no result.
Called using EXEC or CALL.
Example:
SQL
CREATE PROCEDURE GetHighSalaryEmployees
AS
BEGIN
SELECT * FROM Employees WHERE Salary > 100000;
END;
Usage:
SQL
EXEC GetHighSalaryEmployees;
✅ 2. Function
Returns a single value or a table.
Can be used inside SQL expressions (e.g., in SELECT, WHERE, etc.).
Cannot modify database state (no INSERT, UPDATE, DELETE).
Must return a value using RETURN.
Example:
SQL
CREATE FUNCTION GetBonus (@Salary INT)
RETURNS INT
AS
BEGIN
RETURN @Salary * 0.10;
END;
Usage:
SQL
SELECT Name, GetBonus(Salary) AS Bonus FROM Employees;
🔁 Summary Table
Feature | Stored Procedure | Function |
Returns value | Optional | Mandatory |
Can modify data | ✅ Yes | ❌ No |
Used in SQL statements | ❌ No | ✅ Yes |
Returns multiple results | ✅ Yes | ❌ No (usually one value) |
Called using | EXEC or CALL | Used in expressions |
Would you like a diagram showing how procedures and functions fit into a database workflow?
6. Difference between correlated and non-correlated subquery.
🔹 Correlated vs Non-Correlated Subquery in SQL
Both are types of subqueries, but they differ in how they interact with the outer query.
✅ 1. Non-Correlated Subquery
Independent of the outer query.
Executes once, and the result is used by the outer query.
Can be run standalone.
Example:
SQL
SELECT Name FROM Employees
WHERE DepartmentID IN (
SELECT DepartmentID FROM Departments WHERE Location = 'Chennai'
);
➡️ The subquery runs once and returns a list of department IDs.
✅ 2. Correlated Subquery
Depends on the outer query for its values.
Executes once for each row of the outer query.
Cannot run independently.
Example:
SQL
SELECT Name FROM Employees E
WHERE EXISTS (
SELECT 1 FROM Departments D
WHERE D.DepartmentID = E.DepartmentID AND D.Location = 'Chennai'
);
➡️ The subquery runs for each employee, checking if their department is in Chennai.
🔁 Summary Table
Feature | Non-Correlated Subquery | Correlated Subquery |
Depends on outer query | ❌ No | ✅ Yes |
Execution frequency | Once | Once per outer row |
Performance | ✅ Faster | ❌ Slower (can be expensive) |
Can run standalone | ✅ Yes | ❌ No |
Use case | Static filtering | Row-by-row comparison |
Would you like a visual example showing how both types behave with sample data?
7. How do you handle duplicate rows in SQL?
How to Handle Duplicate Rows in SQL
Duplicate rows can occur due to data entry errors, joins, or imports. Here are several ways to identify, remove, or prevent duplicates in SQL:
✅ 1. Identify Duplicates
To find duplicates based on one or more columns:
SQL
SELECT column1, column2, COUNT(*)
FROM table_name
GROUP BY column1, column2
HAVING COUNT(*) > 1;
✅ 2. Delete Duplicate Rows (Keep One)
Using CTE and ROW_NUMBER():
SQL
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (
PARTITION BY column1, column2
ORDER BY id
) AS rn
FROM table_name
)
DELETE FROM CTE WHERE rn > 1;
➡️ Keeps the first occurrence and deletes the rest.
✅ 3. Using DISTINCT to Remove Duplicates in Query Results
SQL
SELECT DISTINCT column1, column2
FROM table_name;
➡️ This doesn’t change the table — it just removes duplicates from the result set.
✅ 4. Prevent Duplicates Using Constraints
Add a UNIQUE constraint to prevent future duplicates:
SQL
ALTER TABLE table_name
ADD CONSTRAINT unique_constraint_name UNIQUE (column1, column2);
🔁 Summary Table
Task | Method |
Find duplicates | GROUP BY + HAVING COUNT(*) > 1 |
Delete duplicates | ROW_NUMBER() in a CTE |
Remove from result only | SELECT DISTINCT |
Prevent future duplicates | UNIQUE constraint |
Would you like help writing a query for a specific table or dataset you're working with?
8. What is a transaction? Explain COMMIT, ROLLBACK, and SAVEPOINT.
🔹 What is a Transaction in SQL?
A transaction is a sequence of one or more SQL operations (like INSERT, UPDATE, DELETE) that are executed as a single unit of work. Transactions ensure data integrity and follow the ACID properties:
Atomicity: All operations succeed or none do.
Consistency: Data remains in a valid state.
Isolation: Transactions don’t interfere with each other.
Durability: Changes persist even after a crash.
✅ Key Transaction Control Commands
1. COMMIT
Saves all changes made during the transaction.
Once committed, changes are permanent.
Example:
SQL
BEGIN TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
COMMIT;
2. ROLLBACK
Undoes all changes made during the transaction.
Used when an error occurs or a condition fails.
Example:
SQL
BEGIN TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
-- Something goes wrong
ROLLBACK;
3. SAVEPOINT
Creates a checkpoint within a transaction.
Allows partial rollback to that point without undoing the entire transaction.
Example:
SQL
BEGIN TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
SAVEPOINT AfterDebit;
UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;
-- Something goes wrong
ROLLBACK TO AfterDebit;
COMMIT;
🔁 Summary Table
Command | Purpose | Reversibility |
COMMIT | Finalizes and saves changes | ❌ No |
ROLLBACK | Cancels all changes in the transaction | ✅ Yes |
SAVEPOINT | Sets a checkpoint for partial rollback | ✅ Yes |
Would you like a visual flowchart showing how transactions and rollback points work?
9. What is the difference between OLTP and OLAP databases?
🔹 Difference Between OLTP and OLAP Databases
OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are two types of database systems designed for different purposes:
✅ 1. OLTP (Online Transaction Processing)
Purpose: Handles day-to-day operations like insertions, updates, and deletions.
Focus: Speed and accuracy of short, atomic transactions.
Data structure: Highly normalized to reduce redundancy.
Users: Front-line staff, customers, applications.
Examples:
Banking systems
E-commerce platforms
Reservation systems
Characteristics:
Feature | OLTP |
Operations | INSERT, UPDATE, DELETE |
Query type | Simple, fast queries |
Data volume | Smaller per transaction |
Response time | Milliseconds |
Design | Normalized schema |
✅ 2. OLAP (Online Analytical Processing)
Purpose: Supports complex queries for data analysis and reporting.
Focus: Speed and flexibility of read-heavy operations.
Data structure: Often denormalized for performance.
Users: Analysts, decision-makers, BI tools.
Examples:
Sales forecasting
Market trend analysis
Executive dashboards
Characteristics:
Feature | OLAP |
Operations | SELECT, GROUP BY, JOIN |
Query type | Complex, multi-dimensional |
Data volume | Large datasets |
Response time | Seconds to minutes |
Design | Star/Snowflake schema |
🔁 Summary Table
Feature | OLTP | OLAP |
Purpose | Transactional processing | Analytical processing |
Query type | Simple, fast | Complex, aggregated |
Schema design | Normalized | Denormalized |
Data volume | Small per transaction | Large for analysis |
Users | Operational staff | Analysts, executives |
Examples | ATM, POS, CRM | BI dashboards, data warehouses |
Would you like a visual diagram showing how OLTP and OLAP systems interact in a typical enterprise architecture?
10. What is the difference between DELETE with WHERE and TRUNCATE?
Difference Between DELETE with WHERE and TRUNCATE in SQL
Both DELETE and TRUNCATE are used to remove data from a table, but they differ in how they work, performance, and rollback behavior.
✅ 1. DELETE with WHERE
Removes specific rows from a table based on a condition.
Row-by-row deletion — slower for large datasets.
Can be rolled back if used within a transaction.
Triggers: ✅ Fires DELETE triggers.
Usage:
SQL
DELETE FROM Employees WHERE Department = 'HR';
✅ 2. TRUNCATE
Removes all rows from a table — no filtering.
Faster than DELETE because it doesn’t log individual row deletions.
Cannot be rolled back in some databases (depends on DBMS).
Triggers: ❌ Does not fire DELETE triggers.
Usage:
SQL
TRUNCATE TABLE Employees;
🔁 Summary Table
Feature | DELETE with WHERE | TRUNCATE |
Removes | Specific rows | All rows |
Filtering | ✅ Yes (WHERE clause) | ❌ No filtering |
Rollback | ✅ Yes (in transaction) | ⚠️ Depends on DBMS |
Triggers | ✅ Fires DELETE triggers | ❌ Does not fire triggers |
Performance | ❌ Slower (row-by-row) | ✅ Faster (bulk operation) |
Table structure | ✅ Remains unchanged | ✅ Remains unchanged |
🧠 Tip:
Use DELETE when you need fine control over which rows to remove. Use TRUNCATE when you want a quick cleanup of the entire table.
Would you like a diagram showing how both commands affect a sample table?
---
🔹 Advanced SQL Interview Questions
1. Explain partitioning in SQL (horizontal vs vertical).
Partitioning is a database design technique used to divide a large table into smaller, more manageable pieces — called partitions — while keeping them logically as one table. This improves performance, maintenance, and scalability.
There are two main types:
✅ 1. Horizontal Partitioning
Divides rows of a table into multiple partitions based on a condition (e.g., date, region, ID range).
Each partition contains a subset of rows.
Common in large transactional tables (e.g., sales data by year).
Example:
SQL
-- Sales data partitioned by year
Sales_2023
Sales_2024
Sales_2025
Use case: Improves query performance by scanning only relevant partitions.
✅ 2. Vertical Partitioning
Divides columns of a table into multiple tables.
Each partition contains a subset of columns, often grouped by usage or sensitivity.
Common in wide tables with many columns.
Example:
SQL
-- Original table: Customers (ID, Name, Email, Address, Preferences)
-- Vertical partitions:
CustomerBasic (ID, Name, Email)
CustomerDetails (ID, Address, Preferences)
Use case: Improves performance and security by accessing only needed columns.
🔁 Summary Table
Feature | Horizontal Partitioning | Vertical Partitioning |
Divides by | Rows | Columns |
Partition contains | Subset of rows | Subset of columns |
Use case | Large datasets by time/region | Wide tables, column grouping |
Performance benefit | Faster row filtering | Faster column access |
Common in | OLTP and OLAP systems | OLTP systems |
Would you like a visual diagram showing how horizontal and vertical partitioning work on a sample table?
2. How does indexing improve performance, and what are drawbacks?
Indexing is like creating a shortcut for the database to find data faster — similar to how a book index helps you quickly locate topics without reading every page.
✅ Benefits of Indexing
Faster Data Retrieval
Speeds up SELECT queries, especially with WHERE, JOIN, ORDER BY, and GROUP BY.
Reduces the number of rows scanned by using a search tree or hash map.
Efficient Sorting and Filtering
Indexes help sort data quickly, improving performance for queries with ORDER BY.
Improved Join Performance
Indexes on foreign keys and join columns make joins faster by reducing lookup time.
Enforces Uniqueness
UNIQUE indexes ensure no duplicate values in a column (e.g., email, username).
❌ Drawbacks of Indexing
Slower Write Operations
INSERT, UPDATE, and DELETE become slower because indexes must also be updated.
Increased Storage Usage
Indexes consume additional disk space, especially for large tables or multiple indexes.
Complex Maintenance
Requires careful planning to avoid redundant or unused indexes.
Poorly designed indexes can hurt performance instead of helping.
Not Always Used
The query optimizer may ignore indexes if it determines a full scan is faster.
🔁 Summary Table
Feature | Benefit | Drawback |
Query speed | ✅ Faster reads | ❌ Slower writes |
Storage usage | ✅ Efficient access | ❌ More disk space |
Maintenance | ✅ Better performance tuning | ❌ Requires careful design |
Use in joins & filters | ✅ Optimized joins | ❌ May be ignored by optimizer |
Would you like a visual diagram showing how indexes work internally (e.g., B-tree vs hash index)?
3. What is query execution plan? How would you optimize a query?
A Query Execution Plan (QEP) is a detailed roadmap that the database engine uses to execute a SQL query. It shows how the query will be processed — including which indexes are used, how tables are joined, and in what order operations occur.
Think of it as a behind-the-scenes blueprint that helps you understand and optimize query performance.
✅ How to View a Query Execution Plan
Most databases support commands or tools to view the plan:
SQL Server: SET SHOWPLAN_ALL ON or use the "Execution Plan" tab in SSMS.
PostgreSQL: EXPLAIN or EXPLAIN ANALYZE
MySQL: EXPLAIN SELECT ...
Example:
SQL
EXPLAIN SELECT * FROM Orders WHERE CustomerID = 101;
🔹 Key Components in a Plan
Component | Description |
Scan Type | Table scan, index scan, etc. |
Join Type | Nested loop, hash join, merge join |
Cost Estimate | Estimated time/resources |
Rows Returned | Estimated number of rows |
Index Usage | Whether an index is used |
✅ How to Optimize a Query
Here are common strategies:
1. Use Indexes Wisely
Create indexes on columns used in WHERE, JOIN, ORDER BY.
Avoid indexing columns with low selectivity (e.g., gender).
2. **Avoid SELECT ***
Only select the columns you need.
3. Use WHERE Clauses Efficiently
Filter early to reduce row scans.
4. Optimize Joins
Use appropriate join types.
Ensure join columns are indexed.
5. Use CTEs or Subqueries Carefully
Avoid unnecessary nesting or repeated subqueries.
6. Analyze Execution Plan
Look for full table scans, missing indexes, or expensive operations.
7. Update Statistics
Keep database statistics up to date for better query planning.
🔁 Summary Table
Feature | Query Execution Plan | Query Optimization |
Purpose | Shows how query will run | Improves query performance |
Tools | EXPLAIN, SSMS, pgAdmin | Indexing, filtering, joins |
Common issues | Full scans, bad joins | Missing indexes, SELECT * |
Benefit | Insight into performance | Faster, more efficient queries |
Would you like help analyzing a specific query or execution plan from your database?
4. Difference between materialized view and normal view.
Both views and materialized views are used to simplify complex queries and present data in a structured way — but they differ in how data is stored and refreshed.
✅ 1. Normal View
A virtual table based on a SQL query.
Does not store data — it fetches data from the base tables every time it's queried.
Always shows up-to-date data.
Lightweight and fast to create.
No indexing on the view itself.
Example:
SQL
CREATE VIEW ActiveCustomers AS
SELECT * FROM Customers WHERE Status = 'Active';
✅ 2. Materialized View
A physical copy of the result set of a query.
Stores data like a regular table.
Must be refreshed manually or periodically to stay up-to-date.
Improves performance for complex or expensive queries.
Can be indexed for faster access.
Example:
SQL
CREATE MATERIALIZED VIEW SalesSummary AS
SELECT Region, SUM(Sales) FROM Orders GROUP BY Region;
🔁 Summary Table
Feature | Normal View | Materialized View |
Data storage | ❌ No (virtual) | ✅ Yes (physical) |
Performance | ❌ Slower for complex queries | ✅ Faster (precomputed) |
Data freshness | ✅ Always current | ❌ May be stale (needs refresh) |
Indexing | ❌ Not supported | ✅ Supported |
Use case | Lightweight query abstraction | Heavy reporting, analytics |
Would you like a diagram showing how views and materialized views interact with base tables and queries?
5. Explain star schema and snowflake schema.
🔹 Star Schema vs Snowflake Schema in Data Warehousing
Both Star Schema and Snowflake Schema are data modeling techniques used in dimensional databases (typically OLAP systems) to organize data for reporting and analytics. Here's a clear comparison:
⭐ 1. Star Schema
Structure: Central fact table connected directly to multiple dimension tables.
Dimension tables are denormalized (contain redundant data).
Simpler and faster for querying.
Use case: Ideal for performance-focused reporting systems.
Example:
Date
|
Product — Fact_Sales — Customer
|
Region
Advantages:
✅ Simple design
✅ Faster query performance
✅ Easy to understand
Disadvantages:
❌ Redundant data in dimension tables
❌ Larger storage footprint
❄️ 2. Snowflake Schema
Structure: Central fact table connected to normalized dimension tables (which may have sub-dimensions).
Dimension tables are split into multiple related tables.
More complex, but saves space and improves data integrity.
Example:
Date
|
Product — Fact_Sales — Customer — Customer_Location
|
Region — Country
Advantages:
✅ Reduced data redundancy
✅ Better data integrity
✅ Smaller storage size
Disadvantages:
❌ More complex joins
❌ Slower query performance
🔁 Summary Table
Feature | Star Schema | Snowflake Schema |
Dimension tables | Denormalized | Normalized |
Query performance | ✅ Faster | ❌ Slower |
Storage efficiency | ❌ Less efficient | ✅ More efficient |
Design complexity | ✅ Simple | ❌ Complex |
Use case | Reporting, dashboards | Detailed analysis, data integrity |
6. How would you implement SCD Type 1 vs Type 2 in SQL?
SCD refers to how dimensional data (like customer or product info) changes over time in a data warehouse. Two common types are:
✅ SCD Type 1: Overwrite Old Data
Purpose: Keep only the latest data.
Old values are lost — no history is maintained.
Use case: When historical changes are not important (e.g., correcting typos).
🔧 Implementation in SQL:
SQL
UPDATE Customers
SET Email = 'new_email@example.com'
WHERE CustomerID = 101;
➡️ This updates the existing record directly.
✅ SCD Type 2: Preserve History
Purpose: Keep historical versions of data.
New row is inserted for each change.
Typically includes:
EffectiveDate
EndDate
IsCurrent flag
🔧 Implementation in SQL:
SQL
-- Step 1: Mark old record as inactive
UPDATE Customers
SET EndDate = CURRENT_DATE, IsCurrent = 0
WHERE CustomerID = 101 AND IsCurrent = 1;
-- Step 2: Insert new record
INSERT INTO Customers (CustomerID, Name, Email, StartDate, EndDate, IsCurrent)
VALUES (101, 'John Doe', 'new_email@example.com', CURRENT_DATE, NULL, 1);
➡️ This keeps both old and new versions of the customer data.
🔁 Summary Table
Feature | SCD Type 1 | SCD Type 2 |
Data history | ❌ Not preserved | ✅ Preserved |
Action on change | Overwrite | Insert new row |
Complexity | ✅ Simple | ❌ More complex |
Use case | Minor corrections | Track historical changes |
Would you like a visual diagram showing how SCD Type 1 and Type 2 records evolve over time?
7. What are window frames in SQL (e.g., ROWS BETWEEN) and their use cases?
Window frames define the subset of rows within a partition that a window function (like SUM, AVG, ROW_NUMBER, etc.) should operate on. They are used with the OVER() clause to control how much data is considered for each row's calculation.
✅ Syntax Overview
SQL
<window_function>() OVER (
PARTITION BY column
ORDER BY column
ROWS BETWEEN <frame_start> AND <frame_end>
)
🔹 Common Frame Types
1. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Includes all rows from the start of the partition up to the current row.
Useful for running totals.
SQL
SELECT
EmployeeID,
Salary,
SUM(Salary) OVER (
ORDER BY EmployeeID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS RunningTotal
FROM Employees;
2. ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
Includes the previous, current, and next row.
Useful for moving averages or smoothing.
SQL
SELECT
Date,
Sales,
AVG(Sales) OVER (
ORDER BY Date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS MovingAvg
FROM DailySales;
3. ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
Includes the current row and all rows after it.
Useful for cumulative future calculations.
🔁 Summary Table
Frame Clause | Description | Use Case |
UNBOUNDED PRECEDING TO CURRENT ROW | From start to current row | Running totals |
1 PRECEDING TO 1 FOLLOWING | Previous, current, next row | Moving averages |
CURRENT ROW TO UNBOUNDED FOLLOWING | Current row to end | Future projections |
🧠 Tip:
Window frames are only relevant for aggregate window functions like SUM, AVG, MIN, MAX.
Ranking functions (ROW_NUMBER, RANK, DENSE_RANK) ignore window frames — they only use PARTITION BY and ORDER BY.
Would you like a visual diagram showing how different window frames operate on a sample dataset?
8. How does SQL handle isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE)?
Isolation levels control how transactions interact with each other, especially when accessing the same data concurrently. They help manage data consistency and concurrency in multi-user environments.
✅ The Four Standard Isolation Levels
Isolation Level | Dirty Read | Non-repeatable Read | Phantom Read | Description |
READ UNCOMMITTED | ✅ Yes | ✅ Yes | ✅ Yes | Lowest isolation; allows reading uncommitted changes. |
READ COMMITTED | ❌ No | ✅ Yes | ✅ Yes | Default in many DBs; reads only committed data. |
REPEATABLE READ | ❌ No | ❌ No | ✅ Yes | Prevents changes to rows read, but new rows can appear. |
SERIALIZABLE | ❌ No | ❌ No | ❌ No | Highest isolation; transactions behave as if run sequentially. |
🔹 Key Terms
Dirty Read: Reading data that another transaction has modified but not yet committed.
Non-repeatable Read: Reading the same row twice and getting different results due to another transaction's update.
Phantom Read: New rows added by another transaction appear in a repeated query.
🔧 How to Set Isolation Level (Example in SQL Server)
SQL
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
-- Your SQL operations here
COMMIT;
🧠 Use Cases
Scenario | Recommended Isolation Level |
Fast reporting, low accuracy | READ UNCOMMITTED |
General OLTP systems | READ COMMITTED |
Financial transactions | REPEATABLE READ |
Critical consistency (e.g., banking) | SERIALIZABLE |
9. What is sharding in databases?
Sharding is a database partitioning technique that splits a large database into smaller, faster, and more manageable pieces called shards. Each shard holds a subset of the data, and together they make up the complete dataset.
✅ Why Use Sharding?
🚀 Improves performance by distributing load across multiple servers.
📈 Enhances scalability — you can add more shards as data grows.
🔒 Increases availability — failure in one shard doesn’t affect others.
🔹 How Sharding Works
Each shard is a separate database (or table) that contains a portion of the data, typically based on a shard key like:
User ID ranges (e.g., 1–1000 in Shard A, 1001–2000 in Shard B)
Geographic location (e.g., India in Shard A, US in Shard B)
Hashing (e.g., hash of email or ID)
🔁 Example
Suppose you have a Users table with 10 million rows. Instead of storing all rows in one database:
Shard | User ID Range |
A | 1 – 1,000,000 |
B | 1,000,001 – 2,000,000 |
C | 2,000,001 – 3,000,000 |
Each shard can be stored on a different server, improving performance and reliability.
🔻 Challenges of Sharding
Challenge | Description |
Complex queries | Cross-shard joins are difficult. |
Rebalancing | Moving data between shards can be tricky. |
Operational overhead | More infrastructure to manage. |
Consistency | Harder to enforce global constraints (e.g., unique email). |
🔁 Summary Table
Feature | Sharding |
Purpose | Distribute data across servers |
Benefits | Scalability, performance, availability |
Drawbacks | Complexity, cross-shard queries |
Common in | Large-scale apps (e.g., Facebook, Amazon) |
Would you like a visual diagram showing how sharding splits and distributes data across servers?
10. What is the difference between MERGE and UPSERT?
Difference Between MERGE and UPSERT in SQL
Both MERGE and UPSERT are used to insert or update data depending on whether a match exists — but they differ in syntax, standardization, and database support.
✅ 1. MERGE
A standard SQL command (also known as "upsert with control").
Combines INSERT, UPDATE, and optionally DELETE in one statement.
Requires a source and target table.
Supported in: SQL Server, Oracle, PostgreSQL (via extensions).
🔧 Example:
SQL
MERGE INTO TargetTable AS T
USING SourceTable AS S
ON T.ID = S.ID
WHEN MATCHED THEN
UPDATE SET T.Name = S.Name
WHEN NOT MATCHED THEN
INSERT (ID, Name) VALUES (S.ID, S.Name);
✅ 2. UPSERT
A concept, not a standard SQL keyword.
Means: Insert if not exists, otherwise Update.
Implemented differently across databases:
PostgreSQL: INSERT ... ON CONFLICT
MySQL: INSERT ... ON DUPLICATE KEY UPDATE
SQLite: INSERT OR REPLACE
SQL Server: Use MERGE or custom logic
🔧 Example in PostgreSQL:
SQL
INSERT INTO Employees (ID, Name)
VALUES (101, 'John')
ON CONFLICT (ID)
DO UPDATE SET Name = EXCLUDED.Name;
🔁 Summary Table
Feature | MERGE | UPSERT |
SQL Standard | ✅ Yes | ❌ No (concept only) |
Control over actions | ✅ Full (insert, update, delete) | ✅ Limited (insert/update) |
Syntax complexity | ❌ More complex | ✅ Simpler |
Database support | SQL Server, Oracle | PostgreSQL, MySQL, SQLite |
🧠 Tip:
Use MERGE when you need fine-grained control over multiple actions.
Use UPSERT when you want a quick insert-or-update logic.
Would you like examples tailored to a specific database system you're working with?


0 comments:
Post a Comment
Note: only a member of this blog may post a comment.