SQL Interview Questions

Basic SQL Interview Questions

1. What is the difference between DELETE, TRUNCATE, and DROP?

Great question! In SQL, DELETE, TRUNCATE, and DROP are all used to remove data, but they differ significantly in scope, performance, and rollback capabilities. Here's a breakdown:

🔹 1. DELETE

Purpose: Removes specific rows from a table based on a WHERE clause.
Can be rolled back: ✅ Yes (if inside a transaction).
Triggers: ✅ Fires DELETE triggers.
Performance: Slower, especially for large datasets.
Usage:

SQL

DELETE FROM employees WHERE department = 'HR';

🔹 2. TRUNCATE

Purpose: Removes all rows from a table.
Can be rolled back: ⚠️ Depends on the DBMS (e.g., in Oracle it can be rolled back, in MySQL it cannot).
Triggers: ❌ Does not fire DELETE triggers.
Performance: Faster than DELETE because it doesn't log individual row deletions.
Usage:

SQL

TRUNCATE TABLE employees;

🔹 3. DROP

Purpose: Completely removes the table structure and its data from the database.
Can be rolled back: ❌ No (unless using special features like Oracle Flashback).
Triggers: ❌ Not applicable (table is gone).
Performance: Very fast.
Usage:

SQL

DROP TABLE employees;

🔁 Summary Table

Feature	DELETE	TRUNCATE	DROP
Removes rows	✅ (Selective)	✅ (All rows)	❌ (Removes table)
Rollback	✅ Yes	⚠️ Depends	❌ No
Triggers fire	✅ Yes	❌ No	❌ No
Performance	Slow (row-by-row)	Fast	Fastest
Table remains	✅ Yes	✅ Yes	❌ No

2. What is a primary key vs foreign key?

Here's a clear explanation of the difference between a Primary Key and a Foreign Key in relational databases:

🔹 1. Primary Key

Definition: A column (or set of columns) that uniquely identifies each row in a table.
Characteristics:
Must contain unique values.
Cannot contain NULL values.
Each table can have only one primary key.
Purpose: Ensures entity integrity — no duplicate or missing records.
Example:

SQL

CREATE TABLE Students (

StudentID INT PRIMARY KEY,

Name VARCHAR(100),

Age INT

);

🔹 2. Foreign Key

Definition: A column (or set of columns) that references the primary key of another table.
Characteristics:
Can contain duplicate values.
Can contain NULL values (unless explicitly restricted).
Used to establish relationships between tables.
Purpose: Ensures referential integrity — links data across tables.
Example:

SQL

CREATE TABLE Enrollments (

EnrollmentID INT PRIMARY KEY,

StudentID INT,

CourseID INT,

FOREIGN KEY (StudentID) REFERENCES Students(StudentID)

);

🔁 Summary Table

Feature	Primary Key	Foreign Key
Uniqueness	✅ Must be unique	❌ Can be duplicated
NULLs allowed	❌ Not allowed	✅ Allowed (unless restricted)
Purpose	Identify rows uniquely	Link to another table
Integrity enforced	Entity integrity	Referential integrity
Number per table	Only one	Can be multiple

3. Difference between WHERE and HAVING?

WHERE and HAVING are both used to filter records in SQL, but they are used in different contexts and operate at different stages of query execution.

🔹 1. WHERE

Used for: Filtering rows before any grouping or aggregation.
Applies to: Individual rows in a table.
Cannot be used with aggregate functions like SUM(), AVG(), etc.
Example:

SQL

SELECT * FROM Orders

WHERE OrderDate >= '2025-01-01';

🔹 2. HAVING

Used for: Filtering groups after aggregation (GROUP BY).
Applies to: Groups of rows.
Can be used with aggregate functions.
Example:

SQL

SELECT CustomerID, COUNT(*) AS OrderCount

FROM Orders

GROUP BY CustomerID

HAVING COUNT(*) > 5;

🔁 Summary Table

Feature	WHERE	HAVING
Filters on	Rows	Groups
Used with	SELECT, UPDATE, DELETE	SELECT with GROUP BY
Aggregate functions	❌ Not allowed	✅ Allowed
Execution stage	Before grouping	After grouping

🧠 Tip:

You can use both in the same query:

SQL

SELECT CustomerID, SUM(TotalAmount) AS TotalSpent

FROM Orders

WHERE OrderDate >= '2025-01-01'

GROUP BY CustomerID

HAVING SUM(TotalAmount) > 10000;

Would you like a visual flowchart showing how SQL processes WHERE and HAVING?

4. Explain INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.

Here’s a clear explanation of the four main types of SQL joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN — all used to combine rows from two or more tables based on a related column.

🔹 1. INNER JOIN

Returns: Only the rows that have matching values in both tables.
Use case: When you want data that exists in both tables.
Example:

SQL

SELECT Orders.OrderID, Customers.CustomerName

FROM Orders

INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

🔹 2. LEFT JOIN (or LEFT OUTER JOIN)

Returns: All rows from the left table, and matched rows from the right table. If no match, returns NULL for right table columns.
Use case: When you want all records from the left table, even if there's no match in the right.
Example:

SQL

SELECT Customers.CustomerName, Orders.OrderID

FROM Customers

LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

🔹 3. RIGHT JOIN (or RIGHT OUTER JOIN)

Returns: All rows from the right table, and matched rows from the left table. If no match, returns NULL for left table columns.
Use case: When you want all records from the right table, even if there's no match in the left.
Example:

SQL

SELECT Orders.OrderID, Customers.CustomerName

FROM Orders

RIGHT JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

🔹 4. FULL OUTER JOIN

Returns: All rows when there is a match in either left or right table. If no match, returns NULL for missing side.
Use case: When you want all records from both tables, regardless of matches.
Example:

SQL

SELECT Customers.CustomerName, Orders.OrderID

FROM Customers

FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

🔁 Summary Table

Join Type	Returns Rows From	Includes Non-Matching Rows From
INNER JOIN	Both tables	❌ No
LEFT JOIN	Left table	✅ Left only
RIGHT JOIN	Right table	✅ Right only
FULL OUTER JOIN	Both tables	✅ Both sides

Would you like a visual diagram showing how these joins work with sample data?

5. What is the difference between CHAR and VARCHAR?

The difference between CHAR and VARCHAR in SQL lies in how they store string data and manage memory:

🔹 1. CHAR (Character)

Fixed-length string data type.
Always reserves the specified number of characters, padding with spaces if the input is shorter.
Faster for fixed-size data due to predictable storage.
Use case: Ideal for fields like country codes ('IN', 'US') or fixed-length IDs.

Example:

SQL

CREATE TABLE Countries (

Code CHAR(2),

Name VARCHAR(50)

);

-- 'IN' will be stored as 'IN' (no padding needed)

-- 'U' will be stored as 'U ' (padded with space)

🔹 2. VARCHAR (Variable Character)

Variable-length string data type.
Stores only the actual number of characters entered.
More efficient for varying-length data.
Use case: Ideal for names, addresses, descriptions, etc.

Example:

SQL

CREATE TABLE Employees (

FirstName VARCHAR(50),

LastName VARCHAR(50)

);

-- 'John' uses only 4 bytes, not 50

🔁 Summary Table

Feature	CHAR	VARCHAR
Length	Fixed	Variable
Padding	Pads with spaces	No padding
Storage efficiency	Less efficient for short data	More efficient
Performance	Slightly faster for fixed data	Slightly slower for variable
Use case	Codes, fixed-length fields	Names, descriptions, etc.

Would you like a visual comparison showing how data is stored in both types?

6. What are indexes and why do we use them?

In SQL, indexes are special data structures that improve the speed and efficiency of data retrieval operations on a database table — much like an index in a book helps you quickly find a topic.

🔹 What is an Index?

An index is a pointer to data in a table. It works like a lookup table that helps the database engine find rows faster without scanning the entire table.

🔹 Why Use Indexes?

Indexes are used to:

✅ Speed up SELECT queries.
✅ Improve performance of JOINs, WHERE, and ORDER BY clauses.
✅ Enforce uniqueness (e.g., via UNIQUE indexes).
❌ But they slow down INSERT, UPDATE, DELETE operations slightly because the index also needs to be updated.

🔹 Types of Indexes

Type	Description
Primary Index	Automatically created on a primary key.
Unique Index	Ensures all values in the indexed column are unique.
Composite Index	Index on multiple columns.
Full-text Index	Used for searching text-based data efficiently.
Clustered Index	Sorts and stores the data rows in the table based on the index.
Non-clustered Index	Stores pointers to the actual data rows.

🔹 Example

SQL

-- Create an index on the 'LastName' column

CREATE INDEX idx_lastname ON Employees(LastName);

Now, queries like:

SQL

SELECT * FROM Employees WHERE LastName = 'Kumar';

will run faster because the database can use the index to locate matching rows quickly.

🔁 Summary

Feature	Indexes Help With
Speed	✅ Faster data retrieval
Efficiency	✅ Optimized query performance
Downsides	❌ Slower write operations
Use case	✅ WHERE, JOIN, ORDER BY

Would you like a visual diagram showing how indexes work behind the scenes?

7. Explain normalization and denormalization.

Here’s a clear explanation of Normalization and Denormalization in database design:

🔹 1. Normalization

Definition: The process of organizing data to reduce redundancy and improve data integrity.
Goal: Break down large tables into smaller, related tables and eliminate duplicate data.
Benefits:
✅ Saves storage space.
✅ Prevents data anomalies.
✅ Improves consistency.
Drawback:
❌ More complex queries due to multiple joins.

Example: Instead of storing customer and order info in one table:

SQL

-- Normalized structure

Customers (CustomerID, Name)

Orders (OrderID, CustomerID, Product)

🔹 2. Denormalization

Definition: The process of combining tables to reduce joins and improve read performance.
Goal: Optimize for faster querying, especially in read-heavy systems.
Benefits:
✅ Faster data retrieval.
✅ Simpler queries.
Drawback:
❌ More storage used.
❌ Risk of data inconsistency.

Example: Combining customer and order info into one table:

SQL

-- Denormalized structure

CustomerOrders (CustomerID, Name, OrderID, Product)

🔁 Summary Table

Feature	Normalization	Denormalization
Purpose	Reduce redundancy	Improve performance
Data integrity	✅ High	❌ Lower
Storage efficiency	✅ Efficient	❌ Less efficient
Query complexity	❌ More joins	✅ Fewer joins
Use case	OLTP (transactional systems)	OLAP (reporting/analytics)

Would you like a diagram showing how a normalized vs denormalized schema looks visually?

8. Difference between UNION and UNION ALL.

Here’s a clear comparison between UNION and UNION ALL in SQL:

🔹 1. UNION

Combines the result sets of two or more SELECT queries.
Removes duplicates from the final result.
Slower than UNION ALL because it performs a distinct sort.
Use case: When you want a unique list of results.

Example:

SQL

SELECT City FROM Customers

UNION

SELECT City FROM Suppliers;

✅ Returns each city only once, even if it appears in both tables.

🔹 2. UNION ALL

Combines the result sets of two or more SELECT queries.
Keeps duplicates in the final result.
Faster because it skips the duplicate check.
Use case: When you want all records, including duplicates.

Example:

SQL

SELECT City FROM Customers

UNION ALL

SELECT City FROM Suppliers;

✅ Returns all cities, including duplicates.

🔁 Summary Table

Feature	UNION	UNION ALL
Combines results	✅ Yes	✅ Yes
Removes duplicates	✅ Yes	❌ No
Performance	❌ Slower	✅ Faster
Use case	Unique results	All results (including duplicates)

Would you like a visual example showing how both behave with sample data?

9. What is a view? Can we update data through a view?

🔹 What is a View in SQL?

A view is a virtual table based on the result of a SQL query. It doesn't store data itself but displays data stored in other tables. Think of it as a saved query that you can treat like a regular table.

✅ Why Use Views?

Simplify complex queries.
Improve security by exposing only specific columns.
Provide a consistent interface to changing data structures.
Help with reporting and analytics.

🔹 Example:

SQL

CREATE VIEW ActiveCustomers AS

SELECT CustomerID, Name, Email

FROM Customers

WHERE Status = 'Active';

Now you can query it like a table:

SQL

SELECT * FROM ActiveCustomers;

🔄 Can You Update Data Through a View?

✅ Yes, but only if:

The view is based on a single table.
It does not use:
GROUP BY
DISTINCT
JOIN
UNION
Aggregate functions (SUM, AVG, etc.)

❌ No, if the view is:

Based on multiple tables.
Contains aggregations, grouping, or set operations.

🔁 Summary Table

Feature	View
Stores data	❌ No (virtual table)
Based on query	✅ Yes
Can be queried like table	✅ Yes
Can be updated	✅ Sometimes (with conditions)
Use case	Simplify logic, restrict access

Would you like an example of an updatable view vs a non-updatable view to see the difference in action?

10. What is the difference between NULL and 0?

Difference Between NULL and 0 in SQL

NULL and 0 may look similar at first glance (especially when a column appears empty), but they represent very different concepts in SQL:

✅ NULL

Meaning: Represents missing, unknown, or undefined data.
Data type: Can appear in any column type (text, number, date, etc.).
Behavior:
Arithmetic operations with NULL result in NULL.
Comparisons with NULL require IS NULL or IS NOT NULL.
Example:

SQL

SELECT * FROM Employees WHERE ManagerID IS NULL;

✅ 0

Meaning: A known numeric value — specifically, zero.
Data type: Only valid in numeric columns.
Behavior:
Arithmetic operations with 0 behave normally.
Can be compared using standard operators (=, <, >).
Example:

SQL

SELECT * FROM Products WHERE Quantity = 0;

🔁 Summary Table

Feature	NULL	0
Represents	Unknown or missing value	Numeric value (zero)
Data type	Any	Numeric only
Arithmetic	Results in NULL	Normal math operations
Comparisons	Use IS NULL / IS NOT NULL	Use =, <, > etc.
Storage	No actual value stored	Value stored as 0

🧠 Tip:

In SQL, NULL ≠ 0 — they are not equal, and treating them as such can lead to logic errors in queries.

Would you like a visual example showing how queries behave differently with NULL vs 0?

---

🔹 Intermediate SQL Interview Questions

1. What is a clustered vs non-clustered index?

🔹 Clustered vs Non-Clustered Index in SQL

Indexes help speed up data retrieval, and they come in two main types: clustered and non-clustered. Here's a clear breakdown of the differences:

✅ 1. Clustered Index

Definition: Sorts and stores the actual data rows in the table based on the index key.
Only one per table: Because the data rows themselves are sorted, you can only have one clustered index.
Faster for range queries: Since data is physically ordered.
Automatically created on the primary key (unless specified otherwise).

Example:

SQL

CREATE CLUSTERED INDEX idx_customer_id ON Customers(CustomerID);

✅ 2. Non-Clustered Index

Definition: Creates a separate structure that stores pointers to the actual data rows.
Multiple allowed: You can have many non-clustered indexes on a table.
Slower for range queries, but useful for quick lookups on non-primary columns.

Example:

SQL

CREATE NONCLUSTERED INDEX idx_customer_name ON Customers(Name);

🔁 Summary Table

Feature	Clustered Index	Non-Clustered Index
Data storage	Sorts actual table data	Stores pointers to data rows
Number per table	Only one	Multiple allowed
Speed for range queries	✅ Faster	❌ Slower
Automatically created on	Primary key	Not automatic
Use case	Efficient sorting & range scans	Fast lookups on non-key columns

🧠 Tip:

Think of a clustered index like a dictionary where pages are sorted alphabetically. A non-clustered index is like a table of contents pointing to specific pages.

Would you like a visual diagram showing how both types of indexes work with sample data?

2. Explain window functions (ROW_NUMBER, RANK, DENSE_RANK).

🔹 What Are Window Functions in SQL?

Window functions perform calculations across a set of table rows that are related to the current row — without collapsing rows like GROUP BY does. They are super useful for ranking, running totals, and comparisons.

✅ Common Window Functions

1. ROW_NUMBER()

Assigns a unique sequential number to each row within a partition.
No ties — every row gets a distinct number.

Example:

SQL

SELECT

EmployeeID, Department, Salary,

ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS RowNum

FROM Employees;

➡️ Gives each employee a unique rank within their department based on salary.

2. RANK()

Assigns a rank to each row with gaps if there are ties.
Tied rows get the same rank, and the next rank is skipped.

Example:

SQL

SELECT

EmployeeID, Salary,

RANK() OVER (ORDER BY Salary DESC) AS Rank

FROM Employees;

➡️ If two employees have the same salary, they get the same rank, and the next rank is skipped (e.g., 1, 2, 2, 4).

3. DENSE_RANK()

Similar to RANK(), but no gaps in ranking.
Tied rows get the same rank, and the next rank continues sequentially.

Example:

SQL

SELECT

EmployeeID, Salary,

DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseRank

FROM Employees;

➡️ If two employees tie, they get the same rank, but the next rank is just incremented (e.g., 1, 2, 2, 3).

🔁 Summary Table

Function	Handles Ties	Gaps in Rank	Use Case
ROW_NUMBER()	❌ No	❌ No	Unique row numbering
RANK()	✅ Yes	✅ Yes	Ranking with gaps
DENSE_RANK()	✅ Yes	❌ No	Ranking without gaps

Would you like a visual example showing how these functions behave on sample data?

3. What is the difference between EXISTS and IN?

🔹 Difference Between EXISTS and IN in SQL

Both EXISTS and IN are used in subqueries to filter data, but they work differently under the hood and are best suited for different scenarios.

✅ 1. IN

Checks if a value exists in a list or result set.
Subquery returns a list of values, and the outer query checks if a value is in that list.
Better for small, static lists or when the subquery returns a single column.

Example:

SQL

SELECT * FROM Employees

WHERE DepartmentID IN (SELECT DepartmentID FROM Departments WHERE Location = 'Chennai');

✅ 2. EXISTS

Checks if a subquery returns any rows.
Subquery is usually correlated (depends on the outer query).
Faster for large datasets because it stops searching once it finds a match.

Example:

SQL

SELECT * FROM Employees E

WHERE EXISTS (

SELECT 1 FROM Departments D

WHERE D.DepartmentID = E.DepartmentID AND D.Location = 'Chennai'

);

🔁 Summary Table

Feature	IN	EXISTS
Subquery returns	A list of values	A boolean (true/false)
Best for	Small lists, single-column checks	Large datasets, correlated queries
Performance	Slower on large subqueries	Faster due to short-circuiting
Null handling	Can cause issues with NULL	Handles NULL more gracefully

🧠 Tip:

Use IN when you're comparing a single column to a list.
Use EXISTS when you're checking for the existence of related rows.

Would you like a performance comparison example or a diagram showing how each works internally?

4. Explain CTE (Common Table Expressions).

What is a CTE (Common Table Expression)?

A CTE (Common Table Expression) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps make complex queries more readable, modular, and maintainable.

✅ Syntax of a CTE

SQL

WITH CTE_Name AS (

SELECT column1, column2

FROM SomeTable

WHERE condition

)

SELECT * FROM CTE_Name;

🔹 Why Use CTEs?

✅ Simplifies complex joins and subqueries.
✅ Improves readability and structure.
✅ Allows recursion (e.g., hierarchical data like org charts).
✅ Can be reused multiple times in the same query.

🔹 Example: Basic CTE

SQL

WITH HighSalaryEmployees AS (

SELECT EmployeeID, Name, Salary

FROM Employees

WHERE Salary > 100000

)

SELECT * FROM HighSalaryEmployees;

🔹 Example: Recursive CTE (for hierarchical data)

SQL

WITH EmployeeHierarchy AS (

SELECT EmployeeID, ManagerID, Name

FROM Employees

WHERE ManagerID IS NULL

UNION ALL

SELECT e.EmployeeID, e.ManagerID, e.Name

FROM Employees e

INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID

)

SELECT * FROM EmployeeHierarchy;

🔁 Summary Table

Feature	CTE
Scope	Temporary (within the query)
Improves readability	✅ Yes
Supports recursion	✅ Yes
Reusable in query	✅ Yes
Stored in DB	❌ No

Would you like a visual diagram showing how recursive CTEs work in hierarchical data?

5. What are stored procedures vs functions?

🔹 Stored Procedures vs Functions in SQL

Both stored procedures and functions are reusable blocks of SQL code, but they serve different purposes and have distinct behaviors.

✅ 1. Stored Procedure

A set of SQL statements that perform a specific task.
Can perform actions like modifying data (INSERT, UPDATE, DELETE).
Can return multiple result sets or no result.
Called using EXEC or CALL.

Example:

SQL

CREATE PROCEDURE GetHighSalaryEmployees

BEGIN

SELECT * FROM Employees WHERE Salary > 100000;

END;

Usage:

SQL

EXEC GetHighSalaryEmployees;

✅ 2. Function

Returns a single value or a table.
Can be used inside SQL expressions (e.g., in SELECT, WHERE, etc.).
Cannot modify database state (no INSERT, UPDATE, DELETE).
Must return a value using RETURN.

Example:

SQL

CREATE FUNCTION GetBonus (@Salary INT)

RETURNS INT

BEGIN

RETURN @Salary * 0.10;

END;

Usage:

SQL

SELECT Name, GetBonus(Salary) AS Bonus FROM Employees;

🔁 Summary Table

Feature	Stored Procedure	Function
Returns value	Optional	Mandatory
Can modify data	✅ Yes	❌ No
Used in SQL statements	❌ No	✅ Yes
Returns multiple results	✅ Yes	❌ No (usually one value)
Called using	EXEC or CALL	Used in expressions

Would you like a diagram showing how procedures and functions fit into a database workflow?

6. Difference between correlated and non-correlated subquery.

🔹 Correlated vs Non-Correlated Subquery in SQL

Both are types of subqueries, but they differ in how they interact with the outer query.

✅ 1. Non-Correlated Subquery

Independent of the outer query.
Executes once, and the result is used by the outer query.
Can be run standalone.

Example:

SQL

SELECT Name FROM Employees

WHERE DepartmentID IN (

SELECT DepartmentID FROM Departments WHERE Location = 'Chennai'

);

➡️ The subquery runs once and returns a list of department IDs.

✅ 2. Correlated Subquery

Depends on the outer query for its values.
Executes once for each row of the outer query.
Cannot run independently.

Example:

SQL

SELECT Name FROM Employees E

WHERE EXISTS (

SELECT 1 FROM Departments D

WHERE D.DepartmentID = E.DepartmentID AND D.Location = 'Chennai'

);

➡️ The subquery runs for each employee, checking if their department is in Chennai.

🔁 Summary Table

Feature	Non-Correlated Subquery	Correlated Subquery
Depends on outer query	❌ No	✅ Yes
Execution frequency	Once	Once per outer row
Performance	✅ Faster	❌ Slower (can be expensive)
Can run standalone	✅ Yes	❌ No
Use case	Static filtering	Row-by-row comparison

Would you like a visual example showing how both types behave with sample data?

7. How do you handle duplicate rows in SQL?

How to Handle Duplicate Rows in SQL

Duplicate rows can occur due to data entry errors, joins, or imports. Here are several ways to identify, remove, or prevent duplicates in SQL:

✅ 1. Identify Duplicates

To find duplicates based on one or more columns:

SQL

SELECT column1, column2, COUNT(*)

FROM table_name

GROUP BY column1, column2

HAVING COUNT(*) > 1;

✅ 2. Delete Duplicate Rows (Keep One)

Using CTE and ROW_NUMBER():

SQL

WITH CTE AS (

SELECT *, ROW_NUMBER() OVER (

PARTITION BY column1, column2

ORDER BY id

) AS rn

FROM table_name

)

DELETE FROM CTE WHERE rn > 1;

➡️ Keeps the first occurrence and deletes the rest.

✅ 3. Using DISTINCT to Remove Duplicates in Query Results

SQL

SELECT DISTINCT column1, column2

FROM table_name;

➡️ This doesn’t change the table — it just removes duplicates from the result set.

✅ 4. Prevent Duplicates Using Constraints

Add a UNIQUE constraint to prevent future duplicates:

SQL

ALTER TABLE table_name

ADD CONSTRAINT unique_constraint_name UNIQUE (column1, column2);

🔁 Summary Table

Task	Method
Find duplicates	GROUP BY + HAVING COUNT(*) > 1
Delete duplicates	ROW_NUMBER() in a CTE
Remove from result only	SELECT DISTINCT
Prevent future duplicates	UNIQUE constraint

Would you like help writing a query for a specific table or dataset you're working with?

8. What is a transaction? Explain COMMIT, ROLLBACK, and SAVEPOINT.

🔹 What is a Transaction in SQL?

A transaction is a sequence of one or more SQL operations (like INSERT, UPDATE, DELETE) that are executed as a single unit of work. Transactions ensure data integrity and follow the ACID properties:

Atomicity: All operations succeed or none do.
Consistency: Data remains in a valid state.
Isolation: Transactions don’t interfere with each other.
Durability: Changes persist even after a crash.

✅ Key Transaction Control Commands

1. COMMIT

Saves all changes made during the transaction.
Once committed, changes are permanent.

Example:

SQL

BEGIN TRANSACTION;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;

UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

COMMIT;

2. ROLLBACK

Undoes all changes made during the transaction.
Used when an error occurs or a condition fails.

Example:

SQL

BEGIN TRANSACTION;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;

-- Something goes wrong

ROLLBACK;

3. SAVEPOINT

Creates a checkpoint within a transaction.
Allows partial rollback to that point without undoing the entire transaction.

Example:

SQL

BEGIN TRANSACTION;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;

SAVEPOINT AfterDebit;

UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

-- Something goes wrong

ROLLBACK TO AfterDebit;

COMMIT;

🔁 Summary Table

Command	Purpose	Reversibility
COMMIT	Finalizes and saves changes	❌ No
ROLLBACK	Cancels all changes in the transaction	✅ Yes
SAVEPOINT	Sets a checkpoint for partial rollback	✅ Yes

Would you like a visual flowchart showing how transactions and rollback points work?

9. What is the difference between OLTP and OLAP databases?

🔹 Difference Between OLTP and OLAP Databases

OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are two types of database systems designed for different purposes:

✅ 1. OLTP (Online Transaction Processing)

Purpose: Handles day-to-day operations like insertions, updates, and deletions.
Focus: Speed and accuracy of short, atomic transactions.
Data structure: Highly normalized to reduce redundancy.
Users: Front-line staff, customers, applications.
Examples:
Banking systems
E-commerce platforms
Reservation systems

Characteristics:

Feature	OLTP
Operations	INSERT, UPDATE, DELETE
Query type	Simple, fast queries
Data volume	Smaller per transaction
Response time	Milliseconds
Design	Normalized schema

✅ 2. OLAP (Online Analytical Processing)

Purpose: Supports complex queries for data analysis and reporting.
Focus: Speed and flexibility of read-heavy operations.
Data structure: Often denormalized for performance.
Users: Analysts, decision-makers, BI tools.
Examples:
Sales forecasting
Market trend analysis
Executive dashboards

Characteristics:

Feature	OLAP
Operations	SELECT, GROUP BY, JOIN
Query type	Complex, multi-dimensional
Data volume	Large datasets
Response time	Seconds to minutes
Design	Star/Snowflake schema

🔁 Summary Table

Feature	OLTP	OLAP
Purpose	Transactional processing	Analytical processing
Query type	Simple, fast	Complex, aggregated
Schema design	Normalized	Denormalized
Data volume	Small per transaction	Large for analysis
Users	Operational staff	Analysts, executives
Examples	ATM, POS, CRM	BI dashboards, data warehouses

Would you like a visual diagram showing how OLTP and OLAP systems interact in a typical enterprise architecture?

10. What is the difference between DELETE with WHERE and TRUNCATE?

Difference Between DELETE with WHERE and TRUNCATE in SQL

Both DELETE and TRUNCATE are used to remove data from a table, but they differ in how they work, performance, and rollback behavior.

✅ 1. DELETE with WHERE

Removes specific rows from a table based on a condition.
Row-by-row deletion — slower for large datasets.
Can be rolled back if used within a transaction.
Triggers: ✅ Fires DELETE triggers.
Usage:

SQL

DELETE FROM Employees WHERE Department = 'HR';

✅ 2. TRUNCATE

Removes all rows from a table — no filtering.
Faster than DELETE because it doesn’t log individual row deletions.
Cannot be rolled back in some databases (depends on DBMS).
Triggers: ❌ Does not fire DELETE triggers.
Usage:

SQL

TRUNCATE TABLE Employees;

🔁 Summary Table

Feature	DELETE with WHERE	TRUNCATE
Removes	Specific rows	All rows
Filtering	✅ Yes (WHERE clause)	❌ No filtering
Rollback	✅ Yes (in transaction)	⚠️ Depends on DBMS
Triggers	✅ Fires DELETE triggers	❌ Does not fire triggers
Performance	❌ Slower (row-by-row)	✅ Faster (bulk operation)
Table structure	✅ Remains unchanged	✅ Remains unchanged

🧠 Tip:

Use DELETE when you need fine control over which rows to remove. Use TRUNCATE when you want a quick cleanup of the entire table.

Would you like a diagram showing how both commands affect a sample table?

---

🔹 Advanced SQL Interview Questions

1. Explain partitioning in SQL (horizontal vs vertical).

Partitioning is a database design technique used to divide a large table into smaller, more manageable pieces — called partitions — while keeping them logically as one table. This improves performance, maintenance, and scalability.

There are two main types:

✅ 1. Horizontal Partitioning

Divides rows of a table into multiple partitions based on a condition (e.g., date, region, ID range).
Each partition contains a subset of rows.
Common in large transactional tables (e.g., sales data by year).

Example:

SQL

-- Sales data partitioned by year

Sales_2023

Sales_2024

Sales_2025

Use case: Improves query performance by scanning only relevant partitions.

✅ 2. Vertical Partitioning

Divides columns of a table into multiple tables.
Each partition contains a subset of columns, often grouped by usage or sensitivity.
Common in wide tables with many columns.

Example:

SQL

-- Original table: Customers (ID, Name, Email, Address, Preferences)

-- Vertical partitions:

CustomerBasic (ID, Name, Email)

CustomerDetails (ID, Address, Preferences)

Use case: Improves performance and security by accessing only needed columns.

🔁 Summary Table

Feature	Horizontal Partitioning	Vertical Partitioning
Divides by	Rows	Columns
Partition contains	Subset of rows	Subset of columns
Use case	Large datasets by time/region	Wide tables, column grouping
Performance benefit	Faster row filtering	Faster column access
Common in	OLTP and OLAP systems	OLTP systems

Would you like a visual diagram showing how horizontal and vertical partitioning work on a sample table?

2. How does indexing improve performance, and what are drawbacks?

Indexing is like creating a shortcut for the database to find data faster — similar to how a book index helps you quickly locate topics without reading every page.

✅ Benefits of Indexing

Faster Data Retrieval

Speeds up SELECT queries, especially with WHERE, JOIN, ORDER BY, and GROUP BY.
Reduces the number of rows scanned by using a search tree or hash map.

Efficient Sorting and Filtering

Indexes help sort data quickly, improving performance for queries with ORDER BY.

Improved Join Performance

Indexes on foreign keys and join columns make joins faster by reducing lookup time.

Enforces Uniqueness

UNIQUE indexes ensure no duplicate values in a column (e.g., email, username).

❌ Drawbacks of Indexing

Slower Write Operations

INSERT, UPDATE, and DELETE become slower because indexes must also be updated.

Increased Storage Usage

Indexes consume additional disk space, especially for large tables or multiple indexes.

Complex Maintenance

Requires careful planning to avoid redundant or unused indexes.
Poorly designed indexes can hurt performance instead of helping.

Not Always Used

The query optimizer may ignore indexes if it determines a full scan is faster.

🔁 Summary Table

Feature	Benefit	Drawback
Query speed	✅ Faster reads	❌ Slower writes
Storage usage	✅ Efficient access	❌ More disk space
Maintenance	✅ Better performance tuning	❌ Requires careful design
Use in joins & filters	✅ Optimized joins	❌ May be ignored by optimizer

Would you like a visual diagram showing how indexes work internally (e.g., B-tree vs hash index)?

3. What is query execution plan? How would you optimize a query?

A Query Execution Plan (QEP) is a detailed roadmap that the database engine uses to execute a SQL query. It shows how the query will be processed — including which indexes are used, how tables are joined, and in what order operations occur.

Think of it as a behind-the-scenes blueprint that helps you understand and optimize query performance.

✅ How to View a Query Execution Plan

Most databases support commands or tools to view the plan:

SQL Server: SET SHOWPLAN_ALL ON or use the "Execution Plan" tab in SSMS.
PostgreSQL: EXPLAIN or EXPLAIN ANALYZE
MySQL: EXPLAIN SELECT ...

Example:

SQL

EXPLAIN SELECT * FROM Orders WHERE CustomerID = 101;

🔹 Key Components in a Plan

Component	Description
Scan Type	Table scan, index scan, etc.
Join Type	Nested loop, hash join, merge join
Cost Estimate	Estimated time/resources
Rows Returned	Estimated number of rows
Index Usage	Whether an index is used

✅ How to Optimize a Query

Here are common strategies:

1. Use Indexes Wisely

Create indexes on columns used in WHERE, JOIN, ORDER BY.
Avoid indexing columns with low selectivity (e.g., gender).

2. **Avoid SELECT ***

Only select the columns you need.

3. Use WHERE Clauses Efficiently

Filter early to reduce row scans.

4. Optimize Joins

Use appropriate join types.
Ensure join columns are indexed.

5. Use CTEs or Subqueries Carefully

Avoid unnecessary nesting or repeated subqueries.

6. Analyze Execution Plan

Look for full table scans, missing indexes, or expensive operations.

7. Update Statistics

Keep database statistics up to date for better query planning.

🔁 Summary Table

Feature	Query Execution Plan	Query Optimization
Purpose	Shows how query will run	Improves query performance
Tools	EXPLAIN, SSMS, pgAdmin	Indexing, filtering, joins
Common issues	Full scans, bad joins	Missing indexes, SELECT *
Benefit	Insight into performance	Faster, more efficient queries

Would you like help analyzing a specific query or execution plan from your database?

4. Difference between materialized view and normal view.

Both views and materialized views are used to simplify complex queries and present data in a structured way — but they differ in how data is stored and refreshed.

✅ 1. Normal View

A virtual table based on a SQL query.
Does not store data — it fetches data from the base tables every time it's queried.
Always shows up-to-date data.
Lightweight and fast to create.
No indexing on the view itself.

Example:

SQL

CREATE VIEW ActiveCustomers AS

SELECT * FROM Customers WHERE Status = 'Active';

✅ 2. Materialized View

A physical copy of the result set of a query.
Stores data like a regular table.
Must be refreshed manually or periodically to stay up-to-date.
Improves performance for complex or expensive queries.
Can be indexed for faster access.

Example:

SQL

CREATE MATERIALIZED VIEW SalesSummary AS

SELECT Region, SUM(Sales) FROM Orders GROUP BY Region;

🔁 Summary Table

Feature	Normal View	Materialized View
Data storage	❌ No (virtual)	✅ Yes (physical)
Performance	❌ Slower for complex queries	✅ Faster (precomputed)
Data freshness	✅ Always current	❌ May be stale (needs refresh)
Indexing	❌ Not supported	✅ Supported
Use case	Lightweight query abstraction	Heavy reporting, analytics

Would you like a diagram showing how views and materialized views interact with base tables and queries?

5. Explain star schema and snowflake schema.

🔹 Star Schema vs Snowflake Schema in Data Warehousing

Both Star Schema and Snowflake Schema are data modeling techniques used in dimensional databases (typically OLAP systems) to organize data for reporting and analytics. Here's a clear comparison:

⭐ 1. Star Schema

Structure: Central fact table connected directly to multiple dimension tables.
Dimension tables are denormalized (contain redundant data).
Simpler and faster for querying.
Use case: Ideal for performance-focused reporting systems.

Example:

Date

Product — Fact_Sales — Customer

Region

Advantages:

✅ Simple design
✅ Faster query performance
✅ Easy to understand

Disadvantages:

❌ Redundant data in dimension tables
❌ Larger storage footprint

❄️ 2. Snowflake Schema

Structure: Central fact table connected to normalized dimension tables (which may have sub-dimensions).
Dimension tables are split into multiple related tables.
More complex, but saves space and improves data integrity.

Example:

Date

Product — Fact_Sales — Customer — Customer_Location

Region — Country

Advantages:

✅ Reduced data redundancy
✅ Better data integrity
✅ Smaller storage size

Disadvantages:

❌ More complex joins
❌ Slower query performance

🔁 Summary Table

Feature	Star Schema	Snowflake Schema
Dimension tables	Denormalized	Normalized
Query performance	✅ Faster	❌ Slower
Storage efficiency	❌ Less efficient	✅ More efficient
Design complexity	✅ Simple	❌ Complex
Use case	Reporting, dashboards	Detailed analysis, data integrity

6. How would you implement SCD Type 1 vs Type 2 in SQL?

SCD refers to how dimensional data (like customer or product info) changes over time in a data warehouse. Two common types are:

✅ SCD Type 1: Overwrite Old Data

Purpose: Keep only the latest data.
Old values are lost — no history is maintained.
Use case: When historical changes are not important (e.g., correcting typos).

🔧 Implementation in SQL:

SQL

UPDATE Customers

SET Email = 'new_email@example.com'

WHERE CustomerID = 101;

➡️ This updates the existing record directly.

✅ SCD Type 2: Preserve History

Purpose: Keep historical versions of data.
New row is inserted for each change.
Typically includes:
EffectiveDate
EndDate
IsCurrent flag

🔧 Implementation in SQL:

SQL

-- Step 1: Mark old record as inactive

UPDATE Customers

SET EndDate = CURRENT_DATE, IsCurrent = 0

WHERE CustomerID = 101 AND IsCurrent = 1;

-- Step 2: Insert new record

INSERT INTO Customers (CustomerID, Name, Email, StartDate, EndDate, IsCurrent)

VALUES (101, 'John Doe', 'new_email@example.com', CURRENT_DATE, NULL, 1);

➡️ This keeps both old and new versions of the customer data.

🔁 Summary Table

Feature	SCD Type 1	SCD Type 2
Data history	❌ Not preserved	✅ Preserved
Action on change	Overwrite	Insert new row
Complexity	✅ Simple	❌ More complex
Use case	Minor corrections	Track historical changes

Would you like a visual diagram showing how SCD Type 1 and Type 2 records evolve over time?

7. What are window frames in SQL (e.g., ROWS BETWEEN) and their use cases?

Window frames define the subset of rows within a partition that a window function (like SUM, AVG, ROW_NUMBER, etc.) should operate on. They are used with the OVER() clause to control how much data is considered for each row's calculation.

✅ Syntax Overview

SQL

<window_function>() OVER (

PARTITION BY column

ORDER BY column

ROWS BETWEEN <frame_start> AND <frame_end>

)

🔹 Common Frame Types

1. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Includes all rows from the start of the partition up to the current row.
Useful for running totals.

SQL

SELECT

EmployeeID,

Salary,

SUM(Salary) OVER (

ORDER BY EmployeeID

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

) AS RunningTotal

FROM Employees;

2. ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

Includes the previous, current, and next row.
Useful for moving averages or smoothing.

SQL

SELECT

Date,

Sales,

AVG(Sales) OVER (

ORDER BY Date

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

) AS MovingAvg

FROM DailySales;

3. ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

Includes the current row and all rows after it.
Useful for cumulative future calculations.

🔁 Summary Table

Frame Clause	Description	Use Case
UNBOUNDED PRECEDING TO CURRENT ROW	From start to current row	Running totals
1 PRECEDING TO 1 FOLLOWING	Previous, current, next row	Moving averages
CURRENT ROW TO UNBOUNDED FOLLOWING	Current row to end	Future projections

🧠 Tip:

Window frames are only relevant for aggregate window functions like SUM, AVG, MIN, MAX.
Ranking functions (ROW_NUMBER, RANK, DENSE_RANK) ignore window frames — they only use PARTITION BY and ORDER BY.

Would you like a visual diagram showing how different window frames operate on a sample dataset?

8. How does SQL handle isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE)?

Isolation levels control how transactions interact with each other, especially when accessing the same data concurrently. They help manage data consistency and concurrency in multi-user environments.

✅ The Four Standard Isolation Levels

Isolation Level	Dirty Read	Non-repeatable Read	Phantom Read	Description
READ UNCOMMITTED	✅ Yes	✅ Yes	✅ Yes	Lowest isolation; allows reading uncommitted changes.
READ COMMITTED	❌ No	✅ Yes	✅ Yes	Default in many DBs; reads only committed data.
REPEATABLE READ	❌ No	❌ No	✅ Yes	Prevents changes to rows read, but new rows can appear.
SERIALIZABLE	❌ No	❌ No	❌ No	Highest isolation; transactions behave as if run sequentially.

🔹 Key Terms

Dirty Read: Reading data that another transaction has modified but not yet committed.
Non-repeatable Read: Reading the same row twice and getting different results due to another transaction's update.
Phantom Read: New rows added by another transaction appear in a repeated query.

🔧 How to Set Isolation Level (Example in SQL Server)

SQL

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

BEGIN TRANSACTION;

-- Your SQL operations here

COMMIT;

🧠 Use Cases

Scenario	Recommended Isolation Level
Fast reporting, low accuracy	READ UNCOMMITTED
General OLTP systems	READ COMMITTED
Financial transactions	REPEATABLE READ
Critical consistency (e.g., banking)	SERIALIZABLE

9. What is sharding in databases?

Sharding is a database partitioning technique that splits a large database into smaller, faster, and more manageable pieces called shards. Each shard holds a subset of the data, and together they make up the complete dataset.

✅ Why Use Sharding?

🚀 Improves performance by distributing load across multiple servers.
📈 Enhances scalability — you can add more shards as data grows.
🔒 Increases availability — failure in one shard doesn’t affect others.

🔹 How Sharding Works

Each shard is a separate database (or table) that contains a portion of the data, typically based on a shard key like:

User ID ranges (e.g., 1–1000 in Shard A, 1001–2000 in Shard B)
Geographic location (e.g., India in Shard A, US in Shard B)
Hashing (e.g., hash of email or ID)

🔁 Example

Suppose you have a Users table with 10 million rows. Instead of storing all rows in one database:

Shard	User ID Range
A	1 – 1,000,000
B	1,000,001 – 2,000,000
C	2,000,001 – 3,000,000

Each shard can be stored on a different server, improving performance and reliability.

🔻 Challenges of Sharding

Challenge	Description
Complex queries	Cross-shard joins are difficult.
Rebalancing	Moving data between shards can be tricky.
Operational overhead	More infrastructure to manage.
Consistency	Harder to enforce global constraints (e.g., unique email).

🔁 Summary Table

Feature	Sharding
Purpose	Distribute data across servers
Benefits	Scalability, performance, availability
Drawbacks	Complexity, cross-shard queries
Common in	Large-scale apps (e.g., Facebook, Amazon)

Would you like a visual diagram showing how sharding splits and distributes data across servers?

10. What is the difference between MERGE and UPSERT?

Difference Between MERGE and UPSERT in SQL

Both MERGE and UPSERT are used to insert or update data depending on whether a match exists — but they differ in syntax, standardization, and database support.

✅ 1. MERGE

A standard SQL command (also known as "upsert with control").
Combines INSERT, UPDATE, and optionally DELETE in one statement.
Requires a source and target table.
Supported in: SQL Server, Oracle, PostgreSQL (via extensions).

🔧 Example:

SQL

MERGE INTO TargetTable AS T

USING SourceTable AS S

ON T.ID = S.ID

WHEN MATCHED THEN

UPDATE SET T.Name = S.Name

WHEN NOT MATCHED THEN

INSERT (ID, Name) VALUES (S.ID, S.Name);

✅ 2. UPSERT

A concept, not a standard SQL keyword.
Means: Insert if not exists, otherwise Update.
Implemented differently across databases:
PostgreSQL: INSERT ... ON CONFLICT
MySQL: INSERT ... ON DUPLICATE KEY UPDATE
SQLite: INSERT OR REPLACE
SQL Server: Use MERGE or custom logic

🔧 Example in PostgreSQL:

SQL

INSERT INTO Employees (ID, Name)

VALUES (101, 'John')

ON CONFLICT (ID)

DO UPDATE SET Name = EXCLUDED.Name;

🔁 Summary Table

Feature	MERGE	UPSERT
SQL Standard	✅ Yes	❌ No (concept only)
Control over actions	✅ Full (insert, update, delete)	✅ Limited (insert/update)
Syntax complexity	❌ More complex	✅ Simpler
Database support	SQL Server, Oracle	PostgreSQL, MySQL, SQLite

🧠 Tip:

Use MERGE when you need fine-grained control over multiple actions.
Use UPSERT when you want a quick insert-or-update logic.

Would you like examples tailored to a specific database system you're working with?

KANSIRIS

SQL Interview Questions

Basic SQL Interview Questions

1. What is the difference between DELETE, TRUNCATE, and DROP?

2. What is a primary key vs foreign key?

3. Difference between WHERE and HAVING?

4. Explain INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.

5. What is the difference between CHAR and VARCHAR?

6. What are indexes and why do we use them?

7. Explain normalization and denormalization.

8. Difference between UNION and UNION ALL.

9. What is a view? Can we update data through a view?

10. What is the difference between NULL and 0?

🔹 Intermediate SQL Interview Questions

1. What is a clustered vs non-clustered index?

2. Explain window functions (ROW_NUMBER, RANK, DENSE_RANK).

3. What is the difference between EXISTS and IN?

4. Explain CTE (Common Table Expressions).

5. What are stored procedures vs functions?

6. Difference between correlated and non-correlated subquery.

7. How do you handle duplicate rows in SQL?

8. What is a transaction? Explain COMMIT, ROLLBACK, and SAVEPOINT.

9. What is the difference between OLTP and OLAP databases?

10. What is the difference between DELETE with WHERE and TRUNCATE?

🔹 Advanced SQL Interview Questions

1. Explain partitioning in SQL (horizontal vs vertical).

2. How does indexing improve performance, and what are drawbacks?

3. What is query execution plan? How would you optimize a query?

4. Difference between materialized view and normal view.

5. Explain star schema and snowflake schema.

6. How would you implement SCD Type 1 vs Type 2 in SQL?

7. What are window frames in SQL (e.g., ROWS BETWEEN) and their use cases?

8. How does SQL handle isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE)?

9. What is sharding in databases?

10. What is the difference between MERGE and UPSERT?

0 comments:

Post a Comment

Complete Tutorials

Popular Posts

Labels