HI WELCOME TO KANSIRIS

SQL Interview Questions

Leave a Comment

 

Basic SQL Interview Questions

 

1. What is the difference between DELETE, TRUNCATE, and DROP?

Great question! In SQL, DELETE, TRUNCATE, and DROP are all used to remove data, but they differ significantly in scopeperformance, and rollback capabilities. Here's a breakdown:

🔹 1. DELETE

  • Purpose: Removes specific rows from a table based on a WHERE clause.

  • Can be rolled back: Yes (if inside a transaction).

  • Triggers: Fires DELETE triggers.

  • Performance: Slower, especially for large datasets.

  • Usage:

SQL

DELETE FROM employees WHERE department = 'HR';

 

🔹 2. TRUNCATE

  • Purpose: Removes all rows from a table.

  • Can be rolled back: ⚠️ Depends on the DBMS (e.g., in Oracle it can be rolled back, in MySQL it cannot).

  • Triggers: Does not fire DELETE triggers.

  • Performance: Faster than DELETE because it doesn't log individual row deletions.

  • Usage:

SQL

TRUNCATE TABLE employees;

 

🔹 3. DROP

  • Purpose: Completely removes the table structure and its data from the database.

  • Can be rolled back: No (unless using special features like Oracle Flashback).

  • Triggers: Not applicable (table is gone).

  • Performance: Very fast.

  • Usage:

SQL

DROP TABLE employees;

 

🔁 Summary Table

Feature

DELETE

TRUNCATE

DROP

Removes rows

(Selective)

(All rows)

(Removes table)

Rollback

Yes

⚠️ Depends

No

Triggers fire

Yes

No

No

Performance

Slow (row-by-row)

Fast

Fastest

Table remains

Yes

Yes

No

 

 

2. What is a primary key vs foreign key?

Here's a clear explanation of the difference between a Primary Key and a Foreign Key in relational databases:

🔹 1. Primary Key

  • Definition: A column (or set of columns) that uniquely identifies each row in a table.

  • Characteristics:

  • Must contain unique values.

  • Cannot contain NULL values.

  • Each table can have only one primary key.

  • Purpose: Ensures entity integrity — no duplicate or missing records.

  • Example:

SQL

CREATE TABLE Students (

    StudentID INT PRIMARY KEY,

    Name VARCHAR(100),

    Age INT

);

 

🔹 2. Foreign Key

  • Definition: A column (or set of columns) that references the primary key of another table.

  • Characteristics:

  • Can contain duplicate values.

  • Can contain NULL values (unless explicitly restricted).

  • Used to establish relationships between tables.

  • Purpose: Ensures referential integrity — links data across tables.

  • Example:

SQL

CREATE TABLE Enrollments (

    EnrollmentID INT PRIMARY KEY,

    StudentID INT,

    CourseID INT,

    FOREIGN KEY (StudentID) REFERENCES Students(StudentID)

);

 

🔁 Summary Table

Feature

Primary Key

Foreign Key

Uniqueness

Must be unique

Can be duplicated

NULLs allowed

Not allowed

Allowed (unless restricted)

Purpose

Identify rows uniquely

Link to another table

Integrity enforced

Entity integrity

Referential integrity

Number per table

Only one

Can be multiple

 

 

3. Difference between WHERE and HAVING?

 WHERE and HAVING are both used to filter records in SQL, but they are used in different contexts and operate at different stages of query execution.

🔹 1. WHERE

  • Used for: Filtering rows before any grouping or aggregation.

  • Applies to: Individual rows in a table.

  • Cannot be used with aggregate functions like SUM(), AVG(), etc.

  • Example:

SQL

SELECT * FROM Orders

WHERE OrderDate >= '2025-01-01';

 

🔹 2. HAVING

  • Used for: Filtering groups after aggregation (GROUP BY).

  • Applies to: Groups of rows.

  • Can be used with aggregate functions.

  • Example:

SQL

SELECT CustomerID, COUNT(*) AS OrderCount

FROM Orders

GROUP BY CustomerID

HAVING COUNT(*) > 5;

 

🔁 Summary Table

Feature

WHERE

HAVING

Filters on

Rows

Groups

Used with

SELECT, UPDATE, DELETE

SELECT with GROUP BY

Aggregate functions

Not allowed

Allowed

Execution stage

Before grouping

After grouping

🧠 Tip:

You can use both in the same query:

SQL

SELECT CustomerID, SUM(TotalAmount) AS TotalSpent

FROM Orders

WHERE OrderDate >= '2025-01-01'

GROUP BY CustomerID

HAVING SUM(TotalAmount) > 10000;

 

Would you like a visual flowchart showing how SQL processes WHERE and HAVING?

 

 

4. Explain INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.

Here’s a clear explanation of the four main types of SQL joins: INNER JOINLEFT JOINRIGHT JOIN, and FULL OUTER JOIN — all used to combine rows from two or more tables based on a related column.

🔹 1. INNER JOIN

  • Returns: Only the rows that have matching values in both tables.

  • Use case: When you want data that exists in both tables.

  • Example:

SQL

SELECT Orders.OrderID, Customers.CustomerName

FROM Orders

INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

 

🔹 2. LEFT JOIN (or LEFT OUTER JOIN)

  • Returns: All rows from the left table, and matched rows from the right table. If no match, returns NULL for right table columns.

  • Use case: When you want all records from the left table, even if there's no match in the right.

  • Example:

SQL

SELECT Customers.CustomerName, Orders.OrderID

FROM Customers

LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

 

🔹 3. RIGHT JOIN (or RIGHT OUTER JOIN)

  • Returns: All rows from the right table, and matched rows from the left table. If no match, returns NULL for left table columns.

  • Use case: When you want all records from the right table, even if there's no match in the left.

  • Example:

SQL

SELECT Orders.OrderID, Customers.CustomerName

FROM Orders

RIGHT JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

 

🔹 4. FULL OUTER JOIN

  • Returns: All rows when there is a match in either left or right table. If no match, returns NULL for missing side.

  • Use case: When you want all records from both tables, regardless of matches.

  • Example:

SQL

SELECT Customers.CustomerName, Orders.OrderID

FROM Customers

FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;

 

🔁 Summary Table

Join Type

Returns Rows From

Includes Non-Matching Rows From

INNER JOIN

Both tables

No

LEFT JOIN

Left table

Left only

RIGHT JOIN

Right table

Right only

FULL OUTER JOIN

Both tables

Both sides

Would you like a visual diagram showing how these joins work with sample data?

 

 

5. What is the difference between CHAR and VARCHAR?

The difference between CHAR and VARCHAR in SQL lies in how they store string data and manage memory:

🔹 1. CHAR (Character)

  • Fixed-length string data type.

  • Always reserves the specified number of characters, padding with spaces if the input is shorter.

  • Faster for fixed-size data due to predictable storage.

  • Use case: Ideal for fields like country codes ('IN', 'US') or fixed-length IDs.

Example:

SQL

CREATE TABLE Countries (

    Code CHAR(2),

    Name VARCHAR(50)

);

-- 'IN' will be stored as 'IN' (no padding needed)

-- 'U' will be stored as 'U ' (padded with space)

 

🔹 2. VARCHAR (Variable Character)

  • Variable-length string data type.

  • Stores only the actual number of characters entered.

  • More efficient for varying-length data.

  • Use case: Ideal for names, addresses, descriptions, etc.

Example:

SQL

CREATE TABLE Employees (

    FirstName VARCHAR(50),

    LastName VARCHAR(50)

);

-- 'John' uses only 4 bytes, not 50

 

🔁 Summary Table

Feature

CHAR

VARCHAR

Length

Fixed

Variable

Padding

Pads with spaces

No padding

Storage efficiency

Less efficient for short data

More efficient

Performance

Slightly faster for fixed data

Slightly slower for variable

Use case

Codes, fixed-length fields

Names, descriptions, etc.

Would you like a visual comparison showing how data is stored in both types?

 

 

6. What are indexes and why do we use them?

In SQL, indexes are special data structures that improve the speed and efficiency of data retrieval operations on a database table — much like an index in a book helps you quickly find a topic.

🔹 What is an Index?

An index is a pointer to data in a table. It works like a lookup table that helps the database engine find rows faster without scanning the entire table.

🔹 Why Use Indexes?

Indexes are used to:

  •  Speed up SELECT queries.

  • Improve performance of JOINsWHERE, and ORDER BY clauses.

  • Enforce uniqueness (e.g., via UNIQUE indexes).

  • But they slow down INSERT, UPDATE, DELETE operations slightly because the index also needs to be updated.

🔹 Types of Indexes

Type

Description

Primary Index

Automatically created on a primary key.

Unique Index

Ensures all values in the indexed column are unique.

Composite Index

Index on multiple columns.

Full-text Index

Used for searching text-based data efficiently.

Clustered Index

Sorts and stores the data rows in the table based on the index.

Non-clustered Index

Stores pointers to the actual data rows.

🔹 Example

SQL

-- Create an index on the 'LastName' column

CREATE INDEX idx_lastname ON Employees(LastName);

 

Now, queries like:

SQL

SELECT * FROM Employees WHERE LastName = 'Kumar';

 

will run faster because the database can use the index to locate matching rows quickly.

🔁 Summary

Feature

Indexes Help With

Speed

Faster data retrieval

Efficiency

Optimized query performance

Downsides

Slower write operations

Use case

WHERE, JOIN, ORDER BY

Would you like a visual diagram showing how indexes work behind the scenes?

 

 

7. Explain normalization and denormalization.

Here’s a clear explanation of Normalization and Denormalization in database design:

🔹 1. Normalization

  • Definition: The process of organizing data to reduce redundancy and improve data integrity.

  • Goal: Break down large tables into smaller, related tables and eliminate duplicate data.

  • Benefits:

  • Saves storage space.

  • Prevents data anomalies.

  • Improves consistency.

  • Drawback:

  • More complex queries due to multiple joins.

Example: Instead of storing customer and order info in one table:

SQL

-- Normalized structure

Customers (CustomerID, Name)

Orders (OrderID, CustomerID, Product)

 

🔹 2. Denormalization

  • Definition: The process of combining tables to reduce joins and improve read performance.

  • Goal: Optimize for faster querying, especially in read-heavy systems.

  • Benefits:

  • Faster data retrieval.

  • Simpler queries.

  • Drawback:

  • More storage used.

  • Risk of data inconsistency.

Example: Combining customer and order info into one table:

SQL

-- Denormalized structure

CustomerOrders (CustomerID, Name, OrderID, Product)

 

🔁 Summary Table

Feature

Normalization

Denormalization

Purpose

Reduce redundancy

Improve performance

Data integrity

High

Lower

Storage efficiency

Efficient

Less efficient

Query complexity

More joins

Fewer joins

Use case

OLTP (transactional systems)

OLAP (reporting/analytics)

Would you like a diagram showing how a normalized vs denormalized schema looks visually?

 

 

8. Difference between UNION and UNION ALL.

Here’s a clear comparison between UNION and UNION ALL in SQL:

🔹 1. UNION

  • Combines the result sets of two or more SELECT queries.

  • Removes duplicates from the final result.

  • Slower than UNION ALL because it performs a distinct sort.

  • Use case: When you want a unique list of results.

Example:

SQL

SELECT City FROM Customers

UNION

SELECT City FROM Suppliers;

 

Returns each city only once, even if it appears in both tables.

🔹 2. UNION ALL

  • Combines the result sets of two or more SELECT queries.

  • Keeps duplicates in the final result.

  • Faster because it skips the duplicate check.

  • Use case: When you want all records, including duplicates.

Example:

SQL

SELECT City FROM Customers

UNION ALL

SELECT City FROM Suppliers;

 

Returns all cities, including duplicates.

🔁 Summary Table

Feature

UNION

UNION ALL

Combines results

Yes

Yes

Removes duplicates

Yes

No

Performance

Slower

Faster

Use case

Unique results

All results (including duplicates)

Would you like a visual example showing how both behave with sample data?

 

 

9. What is a view? Can we update data through a view?

🔹 What is a View in SQL?

view is a virtual table based on the result of a SQL query. It doesn't store data itself but displays data stored in other tables. Think of it as a saved query that you can treat like a regular table.

Why Use Views?

  • Simplify complex queries.

  • Improve security by exposing only specific columns.

  • Provide a consistent interface to changing data structures.

  • Help with reporting and analytics.

🔹 Example:

SQL

CREATE VIEW ActiveCustomers AS

SELECT CustomerID, Name, Email

FROM Customers

WHERE Status = 'Active';

 

Now you can query it like a table:

SQL

SELECT * FROM ActiveCustomers;

 

🔄 Can You Update Data Through a View?

 Yes, but only if:

  • The view is based on a single table.

  • It does not use:

  • GROUP BY

  • DISTINCT

  • JOIN

  • UNION

  • Aggregate functions (SUM, AVG, etc.)

 No, if the view is:

  • Based on multiple tables.

  • Contains aggregationsgrouping, or set operations.

🔁 Summary Table

Feature

View

Stores data

No (virtual table)

Based on query

Yes

Can be queried like table

Yes

Can be updated

Sometimes (with conditions)

Use case

Simplify logic, restrict access

Would you like an example of an updatable view vs a non-updatable view to see the difference in action?

 

 

10. What is the difference between NULL and 0?

Difference Between NULL and 0 in SQL

NULL and 0 may look similar at first glance (especially when a column appears empty), but they represent very different concepts in SQL:

 NULL

  • Meaning: Represents missingunknown, or undefined data.

  • Data type: Can appear in any column type (text, number, date, etc.).

  • Behavior:

  • Arithmetic operations with NULL result in NULL.

  • Comparisons with NULL require IS NULL or IS NOT NULL.

  • Example:

SQL

SELECT * FROM Employees WHERE ManagerID IS NULL;

 

 0

  • Meaning: A known numeric value — specifically, zero.

  • Data type: Only valid in numeric columns.

  • Behavior:

  • Arithmetic operations with 0 behave normally.

  • Can be compared using standard operators (=, <, >).

  • Example:

SQL

SELECT * FROM Products WHERE Quantity = 0;

 

🔁 Summary Table

Feature

NULL

0

Represents

Unknown or missing value

Numeric value (zero)

Data type

Any

Numeric only

Arithmetic

Results in NULL

Normal math operations

Comparisons

Use IS NULL / IS NOT NULL

Use =, <, > etc.

Storage

No actual value stored

Value stored as 0

🧠 Tip:

In SQL, NULL ≠ 0 — they are not equal, and treating them as such can lead to logic errors in queries.

Would you like a visual example showing how queries behave differently with NULL vs 0?

 

 

 

 

---

 

🔹 Intermediate SQL Interview Questions

 

1. What is a clustered vs non-clustered index?

🔹 Clustered vs Non-Clustered Index in SQL

Indexes help speed up data retrieval, and they come in two main types: clustered and non-clustered. Here's a clear breakdown of the differences:

1. Clustered Index

  • Definition: Sorts and stores the actual data rows in the table based on the index key.

  • Only one per table: Because the data rows themselves are sorted, you can only have one clustered index.

  • Faster for range queries: Since data is physically ordered.

  • Automatically created on the primary key (unless specified otherwise).

Example:

SQL

CREATE CLUSTERED INDEX idx_customer_id ON Customers(CustomerID);

 

2. Non-Clustered Index

  • Definition: Creates a separate structure that stores pointers to the actual data rows.

  • Multiple allowed: You can have many non-clustered indexes on a table.

  • Slower for range queries, but useful for quick lookups on non-primary columns.

Example:

SQL

CREATE NONCLUSTERED INDEX idx_customer_name ON Customers(Name);

 

🔁 Summary Table

Feature

Clustered Index

Non-Clustered Index

Data storage

Sorts actual table data

Stores pointers to data rows

Number per table

Only one

Multiple allowed

Speed for range queries

Faster

Slower

Automatically created on

Primary key

Not automatic

Use case

Efficient sorting & range scans

Fast lookups on non-key columns

🧠 Tip:

Think of a clustered index like a dictionary where pages are sorted alphabetically. A non-clustered index is like a table of contents pointing to specific pages.

Would you like a visual diagram showing how both types of indexes work with sample data?

 

 

2. Explain window functions (ROW_NUMBER, RANK, DENSE_RANK).

🔹 What Are Window Functions in SQL?

Window functions perform calculations across a set of table rows that are related to the current row — without collapsing rows like GROUP BY does. They are super useful for ranking, running totals, and comparisons.

Common Window Functions

1. ROW_NUMBER()

  • Assigns a unique sequential number to each row within a partition.

  • No ties — every row gets a distinct number.

Example:

SQL

SELECT

  EmployeeID, Department, Salary,

  ROW_NUMBER() OVER (PARTITION BY Department ORDER BY Salary DESC) AS RowNum

FROM Employees;

 

➡️ Gives each employee a unique rank within their department based on salary.

2. RANK()

  • Assigns a rank to each row with gaps if there are ties.

  • Tied rows get the same rank, and the next rank is skipped.

Example:

SQL

SELECT

  EmployeeID, Salary,

  RANK() OVER (ORDER BY Salary DESC) AS Rank

FROM Employees;

 

➡️ If two employees have the same salary, they get the same rank, and the next rank is skipped (e.g., 1, 2, 2, 4).

3. DENSE_RANK()

  • Similar to RANK(), but no gaps in ranking.

  • Tied rows get the same rank, and the next rank continues sequentially.

Example:

SQL

SELECT

  EmployeeID, Salary,

  DENSE_RANK() OVER (ORDER BY Salary DESC) AS DenseRank

FROM Employees;

 

➡️ If two employees tie, they get the same rank, but the next rank is just incremented (e.g., 1, 2, 2, 3).

🔁 Summary Table

Function

Handles Ties

Gaps in Rank

Use Case

ROW_NUMBER()

No

No

Unique row numbering

RANK()

Yes

Yes

Ranking with gaps

DENSE_RANK()

Yes

No

Ranking without gaps

Would you like a visual example showing how these functions behave on sample data?

 

 

3. What is the difference between EXISTS and IN?

🔹 Difference Between EXISTS and IN in SQL

Both EXISTS and IN are used in subqueries to filter data, but they work differently under the hood and are best suited for different scenarios.

1. IN

  • Checks if a value exists in a list or result set.

  • Subquery returns a list of values, and the outer query checks if a value is in that list.

  • Better for small, static lists or when the subquery returns a single column.

Example:

SQL

SELECT * FROM Employees

WHERE DepartmentID IN (SELECT DepartmentID FROM Departments WHERE Location = 'Chennai');

 

2. EXISTS

  • Checks if a subquery returns any rows.

  • Subquery is usually correlated (depends on the outer query).

  • Faster for large datasets because it stops searching once it finds a match.

Example:

SQL

SELECT * FROM Employees E

WHERE EXISTS (

  SELECT 1 FROM Departments D

  WHERE D.DepartmentID = E.DepartmentID AND D.Location = 'Chennai'

);

 

🔁 Summary Table

Feature

IN

EXISTS

Subquery returns

A list of values

A boolean (true/false)

Best for

Small lists, single-column checks

Large datasets, correlated queries

Performance

Slower on large subqueries

Faster due to short-circuiting

Null handling

Can cause issues with NULL

Handles NULL more gracefully

🧠 Tip:

  • Use IN when you're comparing a single column to a list.

  • Use EXISTS when you're checking for the existence of related rows.

Would you like a performance comparison example or a diagram showing how each works internally?

 

 

4. Explain CTE (Common Table Expressions).

What is a CTE (Common Table Expression)?

CTE (Common Table Expression) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. It helps make complex queries more readablemodular, and maintainable.

Syntax of a CTE

SQL

WITH CTE_Name AS (

    SELECT column1, column2

    FROM SomeTable

    WHERE condition

)

SELECT * FROM CTE_Name;

 

🔹 Why Use CTEs?

  • Simplifies complex joins and subqueries.

  • Improves readability and structure.

  • Allows recursion (e.g., hierarchical data like org charts).

  • Can be reused multiple times in the same query.

🔹 Example: Basic CTE

SQL

WITH HighSalaryEmployees AS (

    SELECT EmployeeID, Name, Salary

    FROM Employees

    WHERE Salary > 100000

)

SELECT * FROM HighSalaryEmployees;

 

🔹 Example: Recursive CTE (for hierarchical data)

SQL

WITH EmployeeHierarchy AS (

    SELECT EmployeeID, ManagerID, Name

    FROM Employees

    WHERE ManagerID IS NULL

    UNION ALL

    SELECT e.EmployeeID, e.ManagerID, e.Name

    FROM Employees e

    INNER JOIN EmployeeHierarchy eh ON e.ManagerID = eh.EmployeeID

)

SELECT * FROM EmployeeHierarchy;

 

🔁 Summary Table

Feature

CTE

Scope

Temporary (within the query)

Improves readability

Yes

Supports recursion

Yes

Reusable in query

Yes

Stored in DB

No

Would you like a visual diagram showing how recursive CTEs work in hierarchical data?

 

 

5. What are stored procedures vs functions?

🔹 Stored Procedures vs Functions in SQL

Both stored procedures and functions are reusable blocks of SQL code, but they serve different purposes and have distinct behaviors.

1. Stored Procedure

  • A set of SQL statements that perform a specific task.

  • Can perform actions like modifying data (INSERT, UPDATE, DELETE).

  • Can return multiple result sets or no result.

  • Called using EXEC or CALL.

Example:

SQL

CREATE PROCEDURE GetHighSalaryEmployees

AS

BEGIN

    SELECT * FROM Employees WHERE Salary > 100000;

END;

 

Usage:

SQL

EXEC GetHighSalaryEmployees;

 

2. Function

  • Returns a single value or a table.

  • Can be used inside SQL expressions (e.g., in SELECT, WHERE, etc.).

  • Cannot modify database state (no INSERT, UPDATE, DELETE).

  • Must return a value using RETURN.

Example:

SQL

CREATE FUNCTION GetBonus (@Salary INT)

RETURNS INT

AS

BEGIN

    RETURN @Salary * 0.10;

END;

 

Usage:

SQL

SELECT Name, GetBonus(Salary) AS Bonus FROM Employees;

 

🔁 Summary Table

Feature

Stored Procedure

Function

Returns value

Optional

Mandatory

Can modify data

Yes

No

Used in SQL statements

No

Yes

Returns multiple results

Yes

No (usually one value)

Called using

EXEC or CALL

Used in expressions

Would you like a diagram showing how procedures and functions fit into a database workflow?

 

 

6. Difference between correlated and non-correlated subquery.

🔹 Correlated vs Non-Correlated Subquery in SQL

Both are types of subqueries, but they differ in how they interact with the outer query.

1. Non-Correlated Subquery

  • Independent of the outer query.

  • Executes once, and the result is used by the outer query.

  • Can be run standalone.

Example:

SQL

SELECT Name FROM Employees

WHERE DepartmentID IN (

    SELECT DepartmentID FROM Departments WHERE Location = 'Chennai'

);

 

➡️ The subquery runs once and returns a list of department IDs.

2. Correlated Subquery

  • Depends on the outer query for its values.

  • Executes once for each row of the outer query.

  • Cannot run independently.

Example:

SQL

SELECT Name FROM Employees E

WHERE EXISTS (

    SELECT 1 FROM Departments D

    WHERE D.DepartmentID = E.DepartmentID AND D.Location = 'Chennai'

);

 

➡️ The subquery runs for each employee, checking if their department is in Chennai.

🔁 Summary Table

Feature

Non-Correlated Subquery

Correlated Subquery

Depends on outer query

No

Yes

Execution frequency

Once

Once per outer row

Performance

Faster

Slower (can be expensive)

Can run standalone

Yes

No

Use case

Static filtering

Row-by-row comparison

Would you like a visual example showing how both types behave with sample data?

 

 

7. How do you handle duplicate rows in SQL?

 How to Handle Duplicate Rows in SQL

Duplicate rows can occur due to data entry errors, joins, or imports. Here are several ways to identifyremove, or prevent duplicates in SQL:

1. Identify Duplicates

To find duplicates based on one or more columns:

SQL

SELECT column1, column2, COUNT(*)

FROM table_name

GROUP BY column1, column2

HAVING COUNT(*) > 1;

 

2. Delete Duplicate Rows (Keep One)

Using CTE and ROW_NUMBER():

SQL

WITH CTE AS (

  SELECT *, ROW_NUMBER() OVER (

    PARTITION BY column1, column2

    ORDER BY id

  ) AS rn

  FROM table_name

)

DELETE FROM CTE WHERE rn > 1;

 

➡️ Keeps the first occurrence and deletes the rest.

3. Using DISTINCT to Remove Duplicates in Query Results

SQL

SELECT DISTINCT column1, column2

FROM table_name;

 

➡️ This doesn’t change the table — it just removes duplicates from the result set.

4. Prevent Duplicates Using Constraints

Add a UNIQUE constraint to prevent future duplicates:

SQL

ALTER TABLE table_name

ADD CONSTRAINT unique_constraint_name UNIQUE (column1, column2);

 

🔁 Summary Table

Task

Method

Find duplicates

GROUP BY + HAVING COUNT(*) > 1

Delete duplicates

ROW_NUMBER() in a CTE

Remove from result only

SELECT DISTINCT

Prevent future duplicates

UNIQUE constraint

Would you like help writing a query for a specific table or dataset you're working with?

 

8. What is a transaction? Explain COMMIT, ROLLBACK, and SAVEPOINT.

🔹 What is a Transaction in SQL?

transaction is a sequence of one or more SQL operations (like INSERT, UPDATE, DELETE) that are executed as a single unit of work. Transactions ensure data integrity and follow the ACID properties:

  • Atomicity: All operations succeed or none do.

  • Consistency: Data remains in a valid state.

  • Isolation: Transactions don’t interfere with each other.

  • Durability: Changes persist even after a crash.

Key Transaction Control Commands

1. COMMIT

  • Saves all changes made during the transaction.

  • Once committed, changes are permanent.

Example:

SQL

BEGIN TRANSACTION;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;

UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

COMMIT;

 

2. ROLLBACK

  • Undoes all changes made during the transaction.

  • Used when an error occurs or a condition fails.

Example:

SQL

BEGIN TRANSACTION;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;

-- Something goes wrong

ROLLBACK;

 

3. SAVEPOINT

  • Creates a checkpoint within a transaction.

  • Allows partial rollback to that point without undoing the entire transaction.

Example:

SQL

BEGIN TRANSACTION;

UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;

SAVEPOINT AfterDebit;

UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

-- Something goes wrong

ROLLBACK TO AfterDebit;

COMMIT;

 

🔁 Summary Table

Command

Purpose

Reversibility

COMMIT

Finalizes and saves changes

No

ROLLBACK

Cancels all changes in the transaction

Yes

SAVEPOINT

Sets a checkpoint for partial rollback

Yes

Would you like a visual flowchart showing how transactions and rollback points work?

 

 

9. What is the difference between OLTP and OLAP databases?

🔹 Difference Between OLTP and OLAP Databases

OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are two types of database systems designed for different purposes:

1. OLTP (Online Transaction Processing)

  • Purpose: Handles day-to-day operations like insertions, updates, and deletions.

  • Focus: Speed and accuracy of short, atomic transactions.

  • Data structure: Highly normalized to reduce redundancy.

  • Users: Front-line staff, customers, applications.

  • Examples:

  • Banking systems

  • E-commerce platforms

  • Reservation systems

Characteristics:

Feature

OLTP

Operations

INSERT, UPDATE, DELETE

Query type

Simple, fast queries

Data volume

Smaller per transaction

Response time

Milliseconds

Design

Normalized schema

2. OLAP (Online Analytical Processing)

  • Purpose: Supports complex queries for data analysis and reporting.

  • Focus: Speed and flexibility of read-heavy operations.

  • Data structure: Often denormalized for performance.

  • Users: Analysts, decision-makers, BI tools.

  • Examples:

  • Sales forecasting

  • Market trend analysis

  • Executive dashboards

Characteristics:

Feature

OLAP

Operations

SELECT, GROUP BY, JOIN

Query type

Complex, multi-dimensional

Data volume

Large datasets

Response time

Seconds to minutes

Design

Star/Snowflake schema

🔁 Summary Table

Feature

OLTP

OLAP

Purpose

Transactional processing

Analytical processing

Query type

Simple, fast

Complex, aggregated

Schema design

Normalized

Denormalized

Data volume

Small per transaction

Large for analysis

Users

Operational staff

Analysts, executives

Examples

ATM, POS, CRM

BI dashboards, data warehouses

Would you like a visual diagram showing how OLTP and OLAP systems interact in a typical enterprise architecture?

 

 

10. What is the difference between DELETE with WHERE and TRUNCATE?

 Difference Between DELETE with WHERE and TRUNCATE in SQL

Both DELETE and TRUNCATE are used to remove data from a table, but they differ in how they work, performance, and rollback behavior.

1. DELETE with WHERE

  • Removes specific rows from a table based on a condition.

  • Row-by-row deletion — slower for large datasets.

  • Can be rolled back if used within a transaction.

  • Triggers: Fires DELETE triggers.

  • Usage:

SQL

DELETE FROM Employees WHERE Department = 'HR';

 

2. TRUNCATE

  • Removes all rows from a table — no filtering.

  • Faster than DELETE because it doesn’t log individual row deletions.

  • Cannot be rolled back in some databases (depends on DBMS).

  • Triggers: Does not fire DELETE triggers.

  • Usage:

SQL

TRUNCATE TABLE Employees;

 

🔁 Summary Table

Feature

DELETE with WHERE

TRUNCATE

Removes

Specific rows

All rows

Filtering

Yes (WHERE clause)

No filtering

Rollback

Yes (in transaction)

⚠️ Depends on DBMS

Triggers

Fires DELETE triggers

Does not fire triggers

Performance

Slower (row-by-row)

Faster (bulk operation)

Table structure

Remains unchanged

Remains unchanged

🧠 Tip:

Use DELETE when you need fine control over which rows to remove. Use TRUNCATE when you want a quick cleanup of the entire table.

Would you like a diagram showing how both commands affect a sample table?

 

 

 

 

---

 

🔹 Advanced SQL Interview Questions

 

1. Explain partitioning in SQL (horizontal vs vertical).

Partitioning is a database design technique used to divide a large table into smaller, more manageable pieces — called partitions — while keeping them logically as one table. This improves performancemaintenance, and scalability.

There are two main types:

1. Horizontal Partitioning

  • Divides rows of a table into multiple partitions based on a condition (e.g., date, region, ID range).

  • Each partition contains a subset of rows.

  • Common in large transactional tables (e.g., sales data by year).

Example:

SQL

-- Sales data partitioned by year

Sales_2023

Sales_2024

Sales_2025

 

Use case: Improves query performance by scanning only relevant partitions.

2. Vertical Partitioning

  • Divides columns of a table into multiple tables.

  • Each partition contains a subset of columns, often grouped by usage or sensitivity.

  • Common in wide tables with many columns.

Example:

SQL

-- Original table: Customers (ID, Name, Email, Address, Preferences)

-- Vertical partitions:

CustomerBasic (ID, Name, Email)

CustomerDetails (ID, Address, Preferences)

 

Use case: Improves performance and security by accessing only needed columns.

🔁 Summary Table

Feature

Horizontal Partitioning

Vertical Partitioning

Divides by

Rows

Columns

Partition contains

Subset of rows

Subset of columns

Use case

Large datasets by time/region

Wide tables, column grouping

Performance benefit

Faster row filtering

Faster column access

Common in

OLTP and OLAP systems

OLTP systems

Would you like a visual diagram showing how horizontal and vertical partitioning work on a sample table?

 

 

2. How does indexing improve performance, and what are drawbacks?

Indexing is like creating a shortcut for the database to find data faster — similar to how a book index helps you quickly locate topics without reading every page.

Benefits of Indexing

  1. Faster Data Retrieval

  • Speeds up SELECT queries, especially with WHERE, JOIN, ORDER BY, and GROUP BY.

  • Reduces the number of rows scanned by using a search tree or hash map.

  1. Efficient Sorting and Filtering

  • Indexes help sort data quickly, improving performance for queries with ORDER BY.

  1. Improved Join Performance

  • Indexes on foreign keys and join columns make joins faster by reducing lookup time.

  1. Enforces Uniqueness

  • UNIQUE indexes ensure no duplicate values in a column (e.g., email, username).

Drawbacks of Indexing

  1. Slower Write Operations

  • INSERT, UPDATE, and DELETE become slower because indexes must also be updated.

  1. Increased Storage Usage

  • Indexes consume additional disk space, especially for large tables or multiple indexes.

  1. Complex Maintenance

  • Requires careful planning to avoid redundant or unused indexes.

  • Poorly designed indexes can hurt performance instead of helping.

  1. Not Always Used

  • The query optimizer may ignore indexes if it determines a full scan is faster.

🔁 Summary Table

Feature

Benefit

Drawback

Query speed

Faster reads

Slower writes

Storage usage

Efficient access

More disk space

Maintenance

Better performance tuning

Requires careful design

Use in joins & filters

Optimized joins

May be ignored by optimizer

Would you like a visual diagram showing how indexes work internally (e.g., B-tree vs hash index)?

 

 

3. What is query execution plan? How would you optimize a query?

Query Execution Plan (QEP) is a detailed roadmap that the database engine uses to execute a SQL query. It shows how the query will be processed — including which indexes are used, how tables are joined, and in what order operations occur.

Think of it as a behind-the-scenes blueprint that helps you understand and optimize query performance.

How to View a Query Execution Plan

Most databases support commands or tools to view the plan:

  • SQL Server: SET SHOWPLAN_ALL ON or use the "Execution Plan" tab in SSMS.

  • PostgreSQL: EXPLAIN or EXPLAIN ANALYZE

  • MySQL: EXPLAIN SELECT ...

Example:

SQL

EXPLAIN SELECT * FROM Orders WHERE CustomerID = 101;

 

🔹 Key Components in a Plan

Component

Description

Scan Type

Table scan, index scan, etc.

Join Type

Nested loop, hash join, merge join

Cost Estimate

Estimated time/resources

Rows Returned

Estimated number of rows

Index Usage

Whether an index is used

How to Optimize a Query

Here are common strategies:

1. Use Indexes Wisely

  • Create indexes on columns used in WHERE, JOIN, ORDER BY.

  • Avoid indexing columns with low selectivity (e.g., gender).

2. **Avoid SELECT ***

  • Only select the columns you need.

3. Use WHERE Clauses Efficiently

  • Filter early to reduce row scans.

4. Optimize Joins

  • Use appropriate join types.

  • Ensure join columns are indexed.

5. Use CTEs or Subqueries Carefully

  • Avoid unnecessary nesting or repeated subqueries.

6. Analyze Execution Plan

  • Look for full table scans, missing indexes, or expensive operations.

7. Update Statistics

  • Keep database statistics up to date for better query planning.

🔁 Summary Table

Feature

Query Execution Plan

Query Optimization

Purpose

Shows how query will run

Improves query performance

Tools

EXPLAIN, SSMS, pgAdmin

Indexing, filtering, joins

Common issues

Full scans, bad joins

Missing indexes, SELECT *

Benefit

Insight into performance

Faster, more efficient queries

Would you like help analyzing a specific query or execution plan from your database?

 

 

4. Difference between materialized view and normal view.

Both views and materialized views are used to simplify complex queries and present data in a structured way — but they differ in how data is stored and refreshed.

1. Normal View

  • virtual table based on a SQL query.

  • Does not store data — it fetches data from the base tables every time it's queried.

  • Always shows up-to-date data.

  • Lightweight and fast to create.

  • No indexing on the view itself.

Example:

SQL

CREATE VIEW ActiveCustomers AS

SELECT * FROM Customers WHERE Status = 'Active';

 

2. Materialized View

  • physical copy of the result set of a query.

  • Stores data like a regular table.

  • Must be refreshed manually or periodically to stay up-to-date.

  • Improves performance for complex or expensive queries.

  • Can be indexed for faster access.

Example:

SQL

CREATE MATERIALIZED VIEW SalesSummary AS

SELECT Region, SUM(Sales) FROM Orders GROUP BY Region;

 

🔁 Summary Table

Feature

Normal View

Materialized View

Data storage

No (virtual)

Yes (physical)

Performance

Slower for complex queries

Faster (precomputed)

Data freshness

Always current

May be stale (needs refresh)

Indexing

Not supported

Supported

Use case

Lightweight query abstraction

Heavy reporting, analytics

Would you like a diagram showing how views and materialized views interact with base tables and queries?

 

 

5. Explain star schema and snowflake schema.

🔹 Star Schema vs Snowflake Schema in Data Warehousing

Both Star Schema and Snowflake Schema are data modeling techniques used in dimensional databases (typically OLAP systems) to organize data for reporting and analytics. Here's a clear comparison:

1. Star Schema

  • Structure: Central fact table connected directly to multiple dimension tables.

  • Dimension tables are denormalized (contain redundant data).

  • Simpler and faster for querying.

  • Use case: Ideal for performance-focused reporting systems.

Example:

        Date

         |

Product — Fact_Sales — Customer

         |

       Region

Advantages:

  • Simple design

  • Faster query performance

  • Easy to understand

Disadvantages:

  • Redundant data in dimension tables

  • Larger storage footprint

❄️ 2. Snowflake Schema

  • Structure: Central fact table connected to normalized dimension tables (which may have sub-dimensions).

  • Dimension tables are split into multiple related tables.

  • More complex, but saves space and improves data integrity.

Example:

        Date

         |

Product — Fact_Sales — Customer — Customer_Location

         |

       Region — Country

Advantages:

  • Reduced data redundancy

  • Better data integrity

  • Smaller storage size

Disadvantages:

  • More complex joins

  • Slower query performance

🔁 Summary Table

Feature

Star Schema

Snowflake Schema

Dimension tables

Denormalized

Normalized

Query performance

Faster

Slower

Storage efficiency

Less efficient

More efficient

Design complexity

Simple

Complex

Use case

Reporting, dashboards

Detailed analysis, data integrity

 

 

6. How would you implement SCD Type 1 vs Type 2 in SQL?

SCD refers to how dimensional data (like customer or product info) changes over time in a data warehouse. Two common types are:

SCD Type 1: Overwrite Old Data

  • Purpose: Keep only the latest data.

  • Old values are lost — no history is maintained.

  • Use case: When historical changes are not important (e.g., correcting typos).

🔧 Implementation in SQL:

SQL

UPDATE Customers

SET Email = 'new_email@example.com'

WHERE CustomerID = 101;

 

➡️ This updates the existing record directly.

SCD Type 2: Preserve History

  • Purpose: Keep historical versions of data.

  • New row is inserted for each change.

  • Typically includes:

  • EffectiveDate

  • EndDate

  • IsCurrent flag

🔧 Implementation in SQL:

SQL

-- Step 1: Mark old record as inactive

UPDATE Customers

SET EndDate = CURRENT_DATE, IsCurrent = 0

WHERE CustomerID = 101 AND IsCurrent = 1;

 

-- Step 2: Insert new record

INSERT INTO Customers (CustomerID, Name, Email, StartDate, EndDate, IsCurrent)

VALUES (101, 'John Doe', 'new_email@example.com', CURRENT_DATE, NULL, 1);

 

➡️ This keeps both old and new versions of the customer data.

🔁 Summary Table

Feature

SCD Type 1

SCD Type 2

Data history

Not preserved

Preserved

Action on change

Overwrite

Insert new row

Complexity

Simple

More complex

Use case

Minor corrections

Track historical changes

Would you like a visual diagram showing how SCD Type 1 and Type 2 records evolve over time?

 

 

7. What are window frames in SQL (e.g., ROWS BETWEEN) and their use cases?

Window frames define the subset of rows within a partition that a window function (like SUM, AVG, ROW_NUMBER, etc.) should operate on. They are used with the OVER() clause to control how much data is considered for each row's calculation.

Syntax Overview

SQL

<window_function>() OVER (

  PARTITION BY column

  ORDER BY column

  ROWS BETWEEN <frame_start> AND <frame_end>

)

 

🔹 Common Frame Types

1. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

  • Includes all rows from the start of the partition up to the current row.

  • Useful for running totals.

SQL

SELECT

  EmployeeID,

  Salary,

  SUM(Salary) OVER (

    ORDER BY EmployeeID

    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

  ) AS RunningTotal

FROM Employees;

 

2. ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

  • Includes the previous, current, and next row.

  • Useful for moving averages or smoothing.

SQL

SELECT

  Date,

  Sales,

  AVG(Sales) OVER (

    ORDER BY Date

    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

  ) AS MovingAvg

FROM DailySales;

 

3. ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

  • Includes the current row and all rows after it.

  • Useful for cumulative future calculations.

🔁 Summary Table

Frame Clause

Description

Use Case

UNBOUNDED PRECEDING TO CURRENT ROW

From start to current row

Running totals

1 PRECEDING TO 1 FOLLOWING

Previous, current, next row

Moving averages

CURRENT ROW TO UNBOUNDED FOLLOWING

Current row to end

Future projections

🧠 Tip:

  • Window frames are only relevant for aggregate window functions like SUM, AVG, MIN, MAX.

  • Ranking functions (ROW_NUMBER, RANK, DENSE_RANK) ignore window frames — they only use PARTITION BY and ORDER BY.

Would you like a visual diagram showing how different window frames operate on a sample dataset?

 

 

8. How does SQL handle isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE)?

Isolation levels control how transactions interact with each other, especially when accessing the same data concurrently. They help manage data consistency and concurrency in multi-user environments.

The Four Standard Isolation Levels

Isolation Level

Dirty Read

Non-repeatable Read

Phantom Read

Description

READ UNCOMMITTED

Yes

Yes

Yes

Lowest isolation; allows reading uncommitted changes.

READ COMMITTED

No

Yes

Yes

Default in many DBs; reads only committed data.

REPEATABLE READ

No

No

Yes

Prevents changes to rows read, but new rows can appear.

SERIALIZABLE

No

No

No

Highest isolation; transactions behave as if run sequentially.

🔹 Key Terms

  • Dirty Read: Reading data that another transaction has modified but not yet committed.

  • Non-repeatable Read: Reading the same row twice and getting different results due to another transaction's update.

  • Phantom Read: New rows added by another transaction appear in a repeated query.

🔧 How to Set Isolation Level (Example in SQL Server)

SQL

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

 

BEGIN TRANSACTION;

-- Your SQL operations here

COMMIT;

 

🧠 Use Cases

Scenario

Recommended Isolation Level

Fast reporting, low accuracy

READ UNCOMMITTED

General OLTP systems

READ COMMITTED

Financial transactions

REPEATABLE READ

Critical consistency (e.g., banking)

SERIALIZABLE

 

 

9. What is sharding in databases?

Sharding is a database partitioning technique that splits a large database into smaller, faster, and more manageable pieces called shards. Each shard holds a subset of the data, and together they make up the complete dataset.

Why Use Sharding?

  • 🚀 Improves performance by distributing load across multiple servers.

  • 📈 Enhances scalability — you can add more shards as data grows.

  • 🔒 Increases availability — failure in one shard doesn’t affect others.

🔹 How Sharding Works

Each shard is a separate database (or table) that contains a portion of the data, typically based on a shard key like:

  • User ID ranges (e.g., 1–1000 in Shard A, 1001–2000 in Shard B)

  • Geographic location (e.g., India in Shard A, US in Shard B)

  • Hashing (e.g., hash of email or ID)

🔁 Example

Suppose you have a Users table with 10 million rows. Instead of storing all rows in one database:

Shard

User ID Range

A

1 – 1,000,000

B

1,000,001 – 2,000,000

C

2,000,001 – 3,000,000

Each shard can be stored on a different server, improving performance and reliability.

🔻 Challenges of Sharding

Challenge

Description

Complex queries

Cross-shard joins are difficult.

Rebalancing

Moving data between shards can be tricky.

Operational overhead

More infrastructure to manage.

Consistency

Harder to enforce global constraints (e.g., unique email).

🔁 Summary Table

Feature

Sharding

Purpose

Distribute data across servers

Benefits

Scalability, performance, availability

Drawbacks

Complexity, cross-shard queries

Common in

Large-scale apps (e.g., Facebook, Amazon)

Would you like a visual diagram showing how sharding splits and distributes data across servers?

 

 

10. What is the difference between MERGE and UPSERT?

Difference Between MERGE and UPSERT in SQL

Both MERGE and UPSERT are used to insert or update data depending on whether a match exists — but they differ in syntaxstandardization, and database support.

1. MERGE

  • standard SQL command (also known as "upsert with control").

  • Combines INSERT, UPDATE, and optionally DELETE in one statement.

  • Requires a source and target table.

  • Supported in: SQL ServerOraclePostgreSQL (via extensions).

🔧 Example:

SQL

MERGE INTO TargetTable AS T

USING SourceTable AS S

ON T.ID = S.ID

WHEN MATCHED THEN

    UPDATE SET T.Name = S.Name

WHEN NOT MATCHED THEN

    INSERT (ID, Name) VALUES (S.ID, S.Name);

 

2. UPSERT

  • concept, not a standard SQL keyword.

  • Means: Insert if not exists, otherwise Update.

  • Implemented differently across databases:

  • PostgreSQL: INSERT ... ON CONFLICT

  • MySQL: INSERT ... ON DUPLICATE KEY UPDATE

  • SQLite: INSERT OR REPLACE

  • SQL Server: Use MERGE or custom logic

🔧 Example in PostgreSQL:

SQL

INSERT INTO Employees (ID, Name)

VALUES (101, 'John')

ON CONFLICT (ID)

DO UPDATE SET Name = EXCLUDED.Name;

 

🔁 Summary Table

Feature

MERGE

UPSERT

SQL Standard

Yes

No (concept only)

Control over actions

Full (insert, update, delete)

Limited (insert/update)

Syntax complexity

More complex

Simpler

Database support

SQL Server, Oracle

PostgreSQL, MySQL, SQLite

🧠 Tip:

  • Use MERGE when you need fine-grained control over multiple actions.

  • Use UPSERT when you want a quick insert-or-update logic.

Would you like examples tailored to a specific database system you're working with?

 

 

0 comments:

Post a Comment

Note: only a member of this blog may post a comment.