Data Science

Top 100 SQL Interview Questions and Answers

Last Updated: 23rd June, 2024

Harshini Bhat

Data Science Consultant at almaBetter

Preparing for a SQL interview? Look no further than this comprehensive guide and gain the knowledge and confidence to ace your next SQL interview

SQL (Structured Query Language) is one of the most widely used programming languages for managing and manipulating data in a relational database. To start or advance your career in data management or analytics, SQL is a very useful tool and hence you will have to face a SQL interview at some point. To help prepare, we have compiled a list of the top 100 SQL interview questions that cover a range of topics, from basic syntax to advanced optimization techniques. Whether you're a beginner or min 1 to 4 year experienced SQL user, this guide will provide you with the knowledge and confidence you need to succeed in your next SQL interview. So, let's dive in and start mastering SQL!

SQL Interview Questions and Answers For Freshers and Experienced

1. What is SQL and why is it important in Data Science?

Answer: SQL stands for Structured Query Language, and it is a domain-specific programming language used for managing and manipulating relational databases. It is important for Data Science because it is the most commonly used language for working with relational databases. It allows data scientists to store, integrate, clean, and analyze large volumes of data efficiently. SQL provides a powerful set of tools for data manipulation and analysis, making it an essential skill for any data scientist working with large and complex datasets.

2. Define a Database.

Answer: A database is a structured collection of data that may be digitally stored, accessed, maintained, and retrieved from a distant or local computer system. Databases can be large and complicated, with a consistent design and modelling approach. Smaller databases can be stored on a file system, whereas larger databases are hosted on computer clusters or in the cloud

3. What is the difference between DBMS and RDBMS?

Answer: DBMS stands for Database Management System, while RDBMS stands for Relational Database Management System.

The main difference between DBMS and RDBMS is that DBMS is a software system that manages and maintains databases, but it does not necessarily follow the relational model. On the other hand, RDBMS is a type of DBMS that follows the relational model, which organizes data into one or more tables (or relations), each with a unique key.

Aspect	DBMS	RDBMS
Data organization	Data can be organized in any format	Data is organized in tables (or relations) with each table having a unique primary key
Data retrieval	Data retrieval can be done using various methods such as sequential search or hash search	Data is retrieved using SQL (Structured Query Language) which uses relational algebra to query the data
Data integrity	DBMS has fewer rules for ensuring data integrity	RDBMS has more rules for ensuring data integrity, such as the use of foreign keys to maintain relationships between tables and the use of constraints to enforce data validation rules
Scalability	DBMS is less scalable than RDBMS	RDBMS is generally more scalable than DBMS, since it allows for data to be distributed across multiple servers and can handle larger volumes of data
Examples	File systems, XML databases	MySQL, Oracle, Microsoft SQL Server, PostgreSQL

4. What is an Entity-Relationship diagram?

Answer: It is a graphical representation of tables with the relationship between them.

6. What are the subsets of SQL?

Answer:SQL queries are classified into four types:

1. Data Definition Language (DDL)

DDL queries are SQL commands that are used to define and modify the structure of the database.

CREATE :Performs the Creation of databases, tables, schema, etc.

DROP: Drops the tables andalso other database objects

DROP COLUMN: Drops a column from any of the table structure

ALTER: Alters or changes the definition of database objects like tables etc.

TRUNCATE: Removes the tables, views, procedures, and other database objects

ADD COLUMN: It adds a column to the table in the schema.

2. Data Manipulation Language (DML)

SQL queries are used to manipulate data stored in a database.

SELECT INTO: It Selects data from one table and inserts it into another table

INSERT: It Inserts data or entries into a table

UPDATE: It Updates the value of any record in the database

DELETE: Deletes records from a table

3. Data Control Language (DCL)

These SQL queries govern the database's access privileges and authorization control.

GRANT: Grants access rights to database objects

REVOKE: Withdraws permission from database objects

4. Transaction Control Language (TCL)

TCL is a set of commands that manages the transactions in a database as well as the modifications performed by DML statements. TCL allows us to organise statements into logical transactions.

COMMIT: Commits an irreversible transaction, which means that the previous image of the database before the transaction cannot be recovered.

ROLLBACK: Rolls back the steps in a transaction in case of an error.

SAVEPOINT: Sets a savepoint in the transaction to which rollback can be executed

SET TRANSACTION: It Sets the transaction's characteristics.

7. What is the SQL server query execution sequence?

Answer:

FROM -> goes to Secondary files via primary file
WHERE -> applies filter condition (non-aggregate column)
SELECT -> dumps data in tempDB system database
GROUP BY -> groups data according to grouping predicate
HAVING -> applies filter condition (aggregate function)
ORDER BY -> sorts data ascending/descending

8. What is Database Normalization?

Answer: It is a process of analyzing the given relation schemas based on their functional

dependencies and primary keys to achieve the following desirable properties:

Minimizing Redundancy

Minimizing the Insertion, Deletion, And Update Anomalies

Relation schemas that do not meet the properties are decomposed into smaller relation

schemas that could meet desirable properties.

9. What are the three degrees of normalization and how is normalization done in each degree?

Answer: Normalization in a database is a method of organizing or handling the data in a database to minimize data redundancy and ensure data integrity. There are three degrees of normalization, each with its own set of rules and requirements for data organization. These are:

First Normal Form (1NF): In 1NF, all attributes (columns) in a table must contain atomic (indivisible) values. This means that each column should contain only one value, and that value should not be further subdivided. To normalize a table into 1NF, you need to ensure that each column contains only atomic values. If a column contains multiple values, you need to split it into separate columns.
Second Normal Form (2NF): In 2NF, all non-key attributes (columns) in a table must be functionally dependent on the entire primary key. This means that each non-key attribute should depend only on primary key and not on any other non-key attributes. To normalize a table into 2NF, you need to identify the functional dependencies and split the table into separate tables as necessary to ensure that each non-key attribute depends only on the primary key.
Third Normal Form (3NF): In 3NF, all non-key attributes (columns) in a table must be independent of each other. This means that there should be no transitive dependencies where a non-key attribute depends on another non-key attribute. To normalize a table into 3NF, you need to eliminate any transitive dependencies by splitting the table into separate tables as necessary.

10. What are the different database objects ?

Answer: A database has different types of objects that are used to organize, store and manipulate data. Some of the most common database objects include:

Tables: Tables are the most popular and fundamental database object. They employ rows and columns to hold data, with each row being a record and each column representing a field.
Views: Views are virtual tables that do not contain any data of their own. They are created by a query and can be used to simplify complex queries, provide security, or restrict access to certain data.
Indexes: Indexes are used to improve the performance of queries by creating a separate data structure that allows for faster retrieval of data.
Stored Procedures: Stored procedures are precompiled sets of SQL statements that are kept in the database and can be run as a single entity. They are used to make complex database operations easier to understand, to boost performance, and to assure consistency.
Triggers: Triggers are database objects that are executed automatically in response to specific events such as data update, insertion, or deletion.
Constraints: Constraints are used to enforce rules and relationships between data in a database. They can be used to ensure data integrity and prevent data inconsistencies.
Synonyms: Synonyms are database objects that provide an alternative name or alias for another database object, such as a table or view.
Sequences: Sequences are used to generate unique numeric values in a database, such as primary keys.

These are some of the most common database objects, but there are many others, depending on the specific database management system being used.

SQL Basic Interview Questions and Answers

11. What is a constraint and what are the seven constraints?

Answer:Constraint: something that limits the flow in a database.

1. Primary key

2. Foreign key

3. Check

4. Default

5. Nullability

6. Unique Key

7. Surrogate Key

12. What exactly are primary and foreign keys?

Answer: Primary keys are unique identifiers for each row. They must have distinct values and cannot be null. Primary keys are one of the most fundamental aspect of all keys and constraints because of their relevance in relational databases. A table can only have one main key.

Foreign keys are used to ensure the data integrity and to show the link between tables.

13. What are the Benefits of SQL?

Answer:

SQL is not a proprietary language that is only utilized by a few database manufacturers. Practically every major DBMS supports SQL. Therefore, understanding this one language allows programmers to communicate with any database, such as ORACLE, SQL, MYSQL, and others.
SQL is simple to learn. All of the statements are made up of descriptive English words, and there aren't many of them.
SQL is a very powerful language that allows you to do very complicated and sophisticated database operations by utilizing its language features.

14. What exactly is a Surrogate Key?

Answer: The term "surrogate" means "substitute."

Surrogate keys are always implemented with the assistance of an identity column.

A SQL Server automatically generates the value of an identity column based on the seed value and incremental value.

Because identity columns are ALWAYS INT, surrogate keys must also be INT. Identity columns cannot include NULL values or repeat values. A logical key is a surrogate key.

15. What is collation?

Answer: A collation is a set of rules that govern how data is sorted and compared. For example, character data is sorted using rules that establish the correct character sequence, with case sensitivity, accent marks, kana character types, and character width options.

16. What is a derived column and hows does it work ?

Answer: A derived column is a column in a database table that is computed from one or more other columns in the same table or related tables. It is not stored physically in the table but is computed on-the-fly when a query is executed.

For example, if we have a table that contains columns for "quantity" and "price," you could create a derived column for "total cost" that multiplies the quantity and price for each row.

Derived columns can be useful for calculating values based on other columns in the table, simplifying queries and reducing the amount of code needed to calculate the values.

However, it can also have an impact on the performance of the database, especially if the computation required to generate the derived column is complex or involves a large amount of data.

17. How does a derived column affect the performance of a database and how can it be improved?

Answer: To improve the performance of a database with derived columns, there are several strategies that can be employed:

Use indexing: If a derived column is frequently used in queries, it may be beneficial to create an index on the derived column. This can help to speed up the query by allowing the database to retrieve the derived column value more quickly.
Use materialized views: A materialized view is a precomputed view that is stored physically in the database. If a derived column is used frequently, it can be beneficial to create a materialized view that computes the derived column value and stores it in a separate table.
Use computed columns: Some database systems provide support for computed columns, which are similar to derived columns but are stored physically in the table. This can be useful for frequently accessed or frequently computed derived columns.
Use denormalization: In some cases, it may be beneficial to denormalize the database by storing redundant data in a separate table. This can help to speed up queries that require the derived column value by avoiding the need to compute the value on-the-fly.

Derived columns are a useful feature in databases that can help to simplify queries and reduce the amount of code needed to calculate values. However, they can also have an impact on performance, so it's important to consider strategies to optimize their use, such as indexing, materialized views, computed columns, or denormalization.

18. What is the definition of OLTP (Online Transaction Processing)?

Answer:Relational database design in OLTP - online transaction processing systems uses the discipline of data modeling and generally follows the Codd norms of data normalization to achieve absolute data integrity. Complex information is then broken down into its most basic structures (a table) using these rules, where all the individual atomic level elements relate to one another and satisfy the normalization rules.

19. What are the differences between OLTP and OLAP?

Answer: OLTP stands for Online Transactional Processing

OLAP stands for Online Analytical Processing

Feature	OLTP	OLAP
Purpose	Transaction processing and operations	Business analysis and reporting
Database structure	Normalized	Denormalized
Data volume	Small	Large
Performance	Quick access to individual records	Fast querying and analysis of subsets
Usage	Operational staff	Business analysts and executives
Data source	Real-time transactional data	Historical and aggregated data
Queries	Simple queries and updates	Complex queries and analysis
Data consistency	Emphasizes data consistency	May allow for some data inconsistencies
Example applications	E-commerce, banking, airline reservation	Business intelligence, data warehousing

20. How do you copy just the structure of a table?

Answer:

To copy just the structure of a table in SQL, you can use the CREATE TABLE statement with the LIKE clause. Here's an example:

CREATE TABLE new_table LIKE original_table;

This statement creates a new table called new_table with the same structure as the existing table called original_table. The new table will have the same columns, data types, indexes, and constraints as the original table, but it will not have any data.

We can also use the CREATE TABLE statement with the AS clause to create a new table with the structure and data from an existing table. Here's an example:

CREATE TABLE new_table AS SELECT * FROM original_table WHERE 1=0;

This statement creates a new table called new_table with the same structure and columns as the original table, but it does not copy any data. The WHERE 1=0 clause is used to prevent any data from being copied into the new table.

21. What is the distinction between a primary and a unique key?

Answer: Both primary key and unique key ensure that the column on which they are defined is unique. But, by default, primary keys generate a clustered index on the column, whereas unique keys generate a non-clustered index. Another significant distinction is that primary keys do not permit NULLs, whereas unique keys permit just one NULL.

22. What are User Defined Functions?

Answer: User-Defined Functions allow defining its own T-SQL functions that can accept 0 or more parameters and return a single scalar data value or a table data type.

23. What are the SET Operators?

Answer:

SQL set operators allows us to combine results from two or more SELECT statements.
Syntax:

SELECT Col1, Col2, Col3 FROM T1 <SET OPERATOR>
SELECT Col1, Col2, Col3 FROM T2

Rule 1: The number of columns in first SELECT statement must be same as the number of columns in the second SELECT statement.
Rule 2: The metadata of all the columns in first SELECT statement MUST be exactly same as the metadata of all the columns in second SELECT statement accordingly.
Rule 3: ORDER BY clause do not work with first SELECT statement. ○ UNION, UNION ALL, INTERSECT, EXCEPT

. 24. What is a View?

Answer: Views are database objects which are virtual tables whose structure is defined by underlying SELECT statement and is mainly used to implement security at rows and columns levels on the base

table.

One can create a view on top of other views.
View just needs a result set (SELECT statement).
We use views just like regular tables when it comes to query writing. (joins, subqueries, grouping )
We can perform DML operations (INSERT, DELETE, UPDATE) on a view. It actually affects the underlying tables only those columns can be affected which are visible in the view.

25. What are the types of views?

Answer:

1. Regular View:

It is a type of view in which you are free to make any DDL changes on the underlying table.

-- create a regular view

CREATE VIEW v_regular AS SELECT * FROM T1

2. Schemabinding View:

It is a type of view in which the schema of the view (column) are physically bound to the schema of the underlying table. We are not allowed to perform any DDL changes

to the underlying table for the columns that are referred by the schemabinding view structure.

■ All objects in the SELECT query of the view must be specified in two part naming conventions (schema_name.tablename).

■ You cannot use * operator in the SELECT query inside the view (individually name the columns)

■ All rules that apply for regular view.

CREATE VIEW v_schemabound WITH SCHEMABINDING AS SELECT ID, Name

FROM dbo.T2 -- remember to use two part naming convention

3. Indexed View:we can have more than one clustered index on the same table if needed.

26. What is an Indexed View?

Answer:Indexed View is technically one of the types of View, not Index.

Using Indexed Views, we can have more than one clustered index on the same table if needed.
All the indexes created on a View and underlying table are shared by Query Optimizer to select the best way to execute the query.
Both the Indexed View and Base Table are always in sync at any given point.
Indexed Views cannot have NCI-H, always NCI-CI, therefore a duplicate set of the data will be created.

27. What does WITH CHECK do?

Answer:

WITH CHECK is used with a VIEW.
It is used to restrict DML operations on the view according to search predicate (WHERE clause) specified creating a view.
Users cannot perform any DML operations that do not satisfy the conditions in WHERE clause while creating a view.
WITH CHECK OPTION has to have a WHERE clause.

28. What is a RANKING function and what are the four RANKING functions?

Answer: Ranking functions are used to give some ranking numbers to each row in a dataset based on some ranking functionality.

Every ranking function creates a derived column which has integer value.

Different types of RANKING function:

ROW_NUMBER(): assigns an unique number based on the ordering starting with 1. Ties will be given different ranking positions.
RANK(): assigns an unique rank based on value. When the set of ties ends, the next ranking position will consider how many tied values exist and then assign the next value a new ranking with considering the number of those previous ties. This will make the ranking position skip placement and position numbers based on how many of the same values occurred (ranking not sequential).
DENSE_RANK(): same as rank, however it will maintain its consecutive order nature regardless of ties in values; meaning if five records have a tie in the values, the next ranking will begin with the next ranking position.

Syntax:

<Ranking Function>() OVER(condition for ordering) -- always have to have an OVER clause

Ex:

SELECT SalesOrderID, SalesPersonID,

TotalDue,

ROW_NUMBER() OVER(ORDER BY TotalDue), RANK() OVER(ORDER BY TotalDue),

DENSE_RANK() OVER(ORDER BY TotalDue) FROM Sales.SalesOrderHeader

NTILE(n): Distributes the rows in an ordered partition into a specified number of groups.

29. What is PARTITION BY?

Answer: Creates partitions within the same result set and each partition gets its own ranking. That is, the rank starts from 1 for each partition.

Ex: SELECT *, DENSE_RANK() OVER(PARTITION BY Country ORDER BY Sales DESC) AS DenseRank FROM SalesInfo

30. What does GROUP BY do?

Answer: The GROUP BY clause in SQL arranges query output into groups, with all the rows with the same value in a given column belonging to one group. For example, if you have a table that stores employee data, you can count employees by department or find the average salary for every department. GROUP BY is usually used with the aggregate functions such as COUNT(), SUM(), AVG(), MIN(), MAX().

31. What is the difference between WHERE and HAVING in SQL?

Answer: Both WHERE and HAVING are used to filter data in an SQL query. The main difference is that WHERE is used on non-aggregated values, while HAVING is used on the aggregated ones. The order of execution is as follows: WHERE – GROUP BY – HAVING. This also means WHERE is written before GROUP BY, while HAVING comes after GROUP BY.

In practice, WHERE filters data first. This filtered data will be grouped and aggregated, and then HAVING will filter the grouped and aggregated data.

For example, you would use WHERE to filter only users that have been employed for more than one year. Then you can calculate the average salary for employees with 1+ years’ experience by department. Finally, you can show only departments with an average salary above a specific value using the HAVING clause. That way, you filter rows before and after the aggregate function.

32. Write an SQL query to fetch the EmpId and FullName of all the employees working under the Manager with id – ‘986’.

Answer:

SELECT EmpId, FullName

FROM EmployeeDetails

WHERE ManagerId = 986;

33. What are some common clauses used with SELECT query in SQL?

Answer: The following are some frequent SQL clauses used in conjunction with a SELECT query:

WHERE clause: In SQL, the WHERE clause is used to filter records that are required depending on certain criteria.

ORDER BY clause: The ORDER BY clause in SQL is used to sort data in ascending (ASC) or descending (DESC) order depending on specified field(s) (DESC).

GROUP BY clause: GROUP BY clause in SQL is used to group entries with identical data and may be used with aggregation methods to obtain summarised database results.

HAVING clause in SQL is used to filter records in combination with the GROUP BY clause. It is different from WHERE, since the WHERE clause cannot filter aggregated records.

34. What is the sub-query?
Answer: Sub-query is a query whose return values are used in filtering conditions of the main query.

35. What are the data types allowed in a table?
Answer: CHAR, VARCHAR2, NUMBER, DATE, RAW, LONG, and LONG RAW.

36. How many LONG columns are allowed in a table? Is it possible to use LONG columns in WHERE clause or ORDER BY?
Answer:Only one LONG column is allowed. It is not possible to use LONG column in WHERE or ORDER BY clause.

37 .How to Select last N records from a Table?
Answer: select * from (select rownum a, CLASS_CODE,CLASS_DESC from clm) where a > ( select (max(rownum)-10) from clm) Here N = 10

38. What is the purpose of the GROUP BY clause in SQL?
Answer: The GROUP BY clause is used to group the results of a query by one or more columns, and to apply aggregate functions to those groups.

39. What is the purpose of the ORDER BY clause in SQL? Can you give an example of how to use it in?

Answer: The ORDER BY clause is used to sort the results of a query in ascending or descending order based on one or more columns.

Example:

SELECT first_name, last_name, hire_date
FROM employees
ORDER BY hire_date DESC;

40. What is a GROUP BY statement in SQL, and how does it work?
Answer: A GROUP BY statement is used to group rows with similar values in one or more columns into summary rows. It allows users to aggregate data based on specific criteria.

For example, the following SQL statement groups the "orders" table by the "product_id" column and calculates the total number of orders for each product:

SELECT product_id, COUNT(*) AS total_orders FROM orders GROUP BY product_id;

41. How do you use the HAVING clause in SQL, and what is its purpose?
Answer: The HAVING clause is used to filter groups based on a specific condition in a GROUP BY statement. It allows users to apply conditions to aggregated data and retrieve only the groups that meet the specified criteria.

For example, the following SQL statement returns the average salary for employees in each department and only includes departments with an average salary greater than 50000:
SELECT department, AVG(salary) AS average_salary FROM employees GROUP BY department HAVING AVG(salary) > 50000;

42. What is the difference between a correlated subquery and a non-correlated subquery? Answer: In a non-correlated subquery, the subquery is independent of the outer query and can be executed separately. In a correlated subquery, the subquery depends on the outer query and uses the values from the outer query in its comparison or filter condition. Correlated subqueries are generally slower than non-correlated subqueries.

43. How can sub-queries be used in SQL?

Answer: Subqueries can be used for a variety of tasks in SQL, such as:

Filtering: A subquery can be used to filter rows based on a condition that is evaluated in the subquery.
Aggregate functions: A subquery can be used to calculate aggregate functions such as SUM, COUNT, MAX, MIN, and AVG on a subset of data.
Comparison operators: A subquery can be used as a comparison operator in a WHERE or HAVING clause.
IN operator: A subquery can be used as a parameter to the IN operator to compare a value against a set of values returned by the subquery.

44. What is the difference between UNION and UNION ALL in SQL?

Answer: UNION and UNION ALL are used to combine the results of two or more SELECT statements into a single result set. The main difference between them is that UNION removes duplicate rows from the result set, while UNION ALL does not.

45. What is the difference between GROUP BY and HAVING clauses in SQL?

Answer: GROUP BY is used to group rows that have the same values in one or more columns and to apply aggregate functions such as COUNT, SUM, AVG, MIN, and MAX to each group. HAVING filters the groups based on a condition evaluated after executing the GROUP BY clause.

46. What is the difference between UNION and INTERSECT in SQL?

Answer: UNION and INTERSECT are used to combine the results of two or more SELECT statements into a single result set. The main difference between them is that UNION combines the result sets and removes duplicates, while INTERSECT returns only the common rows to both result sets.

SQL Interview Questions and Answers on Joins

47. What are Joins in SQL? Explain the types of SQL Joins.

Answer:Joins are used in queries to describe how various tables are related to one another. Joins also allow us to choose data from one table based on data from another.

There are several types of SQL joins that can be used, including

Inner Join: An inner join returns only the matching rows from both tables based on the join condition. The syntax for an inner join is as follows:

SELECT column1, column2, ...
FROM table1
INNER JOIN table2
ON table1.column_name = table2.column_name;

Left Join: A left join returns all rows from the left table (table1) and the matching rows from the right table (table2). If there is no match, the result will contain NULL values for the right table columns. The syntax for a left join is as follows:

SELECT column1, column2, ...
FROM table1
LEFT JOIN table2
ON table1.column_name = table2.column_name;

Right Join: A right join retrieves all rows from the right table (table2) and the matching rows from the left table (table1). If there is no match, the result will contain NULL values for the left table columns. The syntax for a right join is as follows:

SELECT column1, column2, ...
FROM table1
RIGHT JOIN table2
ON table1.column_name = table2.column_name;

Full Outer Join: A full outer join returns all rows from both tables, including the unmatched rows from both tables. If there is no match, the result will contain NULL values for the corresponding columns. The syntax for a full outer join is as follows:

SELECT column1, column2, ...
FROM table1
FULL OUTER JOIN table2
ON table1.column_name = table2.column_name;

Self Join: In a self join a table is joined with itself. It is useful when a table contains a hierarchical structure, such as an organization chart. The syntax for a self join is as follows:

SELECT column1, column2, ...
FROM table1 T1, table1 T2
WHERE T1.column_name = T2.column_name;

These are the common types of SQL joins that we may encounter while working with databases.

48. What is a self-join in SQL, and how is it used?

Answer: A self-join is a join operation where a table is joined with itself. It is used when a table has a foreign key relationship with itself, such as a hierarchical structure or a self-referential table. In a self-join, the table is given two different aliases, which are used to reference the table in the join condition.

49. What is a cross-join in SQL, and how is it used?

Answer: A cross join, also known as a Cartesian product, is a join operation that returns all possible combinations of rows from two tables. It is used when there is no common column between the two tables and when all possible combinations must be examined. A cross-join does not use a join condition, and it can result in many rows if the tables are large.

50. What is a natural join in SQL, and how is it used?

Answer: A natural join is a join operation that returns only the rows with matching values in the columns with the same name in both tables. It is used when the two tables have a common column with the same name and data type. A natural join does not use a join condition and automatically matches the columns with the same name in both tables.

51. What is an outer join in SQL, and how is it used?

Answer: An outer join is an operation that returns all the rows from one table and the matching rows from the other. It is used when there may not be matching rows in one of the tables. There are three types of outer joins: LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN. In a LEFT OUTER JOIN, all the rows from the left table and the matching rows from the right table are returned. In a RIGHT OUTER JOIN, all the rows from the right table and the matching rows from the left table are returned. In a FULL OUTER JOIN, all the rows from both tables are returned, and the non-matching rows are filled with NULL values.

52. What is a non-equijoin in SQL, and how is it used?

Answer: A non-equijoin is a join operation that uses a comparison operator other than equals (=) to join two tables. It is used when the values in the join condition are not equal or when a range of values needs to be matched. A non-equijoin can be performed using the greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), and not equal to (<>) operators.

53. What is a FULL OUTER JOIN in SQL, and how is it different from a LEFT or RIGHT OUTER JOIN?

Answer: A FULL OUTER JOIN in SQL returns all rows from both tables and any matching rows. It differs from a LEFT OUTER JOIN or RIGHT OUTER JOIN, which only returns rows from one table and any matching rows from the other. The FULL OUTER JOIN includes all rows from both tables, regardless of whether they have a match in the other table.

54. How can you identify and eliminate duplicate values in a table using SQL?

Answer: One way to identify and eliminate duplicate values in a table using SQL is to use the DISTINCT keyword in a SELECT statement. For example, to retrieve a list of unique values from a column called "column_name" in a table called "table_name", you can use the following SQL statement:

SELECT DISTINCT column_name FROM table_name;

Another way to eliminate duplicate values is to use the GROUP BY clause in conjunction with an aggregate function like COUNT(), SUM(), or AVG(). For example, to count the number of occurrences of each unique value in a column called "column_name" in a table called "table_name", you can use the following SQL statement:

SELECT column_name, COUNT(*) FROM table_name GROUP BY column_name;

55. How do you prevent duplicate values from being inserted into a table using SQL?

Answer: To prevent duplicate values from being inserted into a table using SQL, we can create a UNIQUE constraint on one or more columns in the table. This ensures that each row in the table has a unique combination of values in the specified columns.

For example, to create a UNIQUE constraint on a column called "column_name" in a table called "table_name", you can use the following SQL statement:

ALTER TABLE table_name ADD UNIQUE (column_name);

If we try to insert a row that violates the UNIQUE constraint, the database will raise an error and prevent the insertion.

56. Can you provide an example of an advanced SQL subquery that uses the EXISTS keyword?

Answer: Yes, here's an example of an advanced SQL subquery that uses the EXISTS keyword:

SELECT *
FROM customers
WHERE EXISTS (
SELECT *
FROM orders
WHERE orders.customer_id = customers.customer_id
AND orders.order_date >= '2022-01-01'
);

In this example, we use a subquery with the EXISTS keyword to check if a customer places any orders after January 1st, 2022. The subquery is executed for each row in the customers table and returns true if there is at least one matching order in the orders table and false otherwise. This technique filters records based on related data in another table.

57. What is the difference between DATE and TIMESTAMP data types in SQL?

Answer: DATE data type stores only the date portion, while TIMESTAMP stores both the date and time portion.

58. How do you extract the year from a date using SQL?

Answer: We can use the YEAR function to extract the year from a date in SQL. For example, SELECT YEAR('2022-04-05') will return 2022.

59. What is the purpose of the DATEDIFF function in SQL?

Answer: The DATEDIFF function calculates the difference between two dates in a specified unit (day, hour, minute, etc.).

60. How can you add or subtract a certain number of days from a date using SQL?

Answer: In SQL, we can use the DATEADD function to add or subtract a certain number of days from a date. For example, SELECT DATEADD(day, 7, '2022-04-05') will return '2022-04-12'.

Intermediate level SQL Interview Questions and Answers.

61. What are some of the commonly used mathematical functions in SQL?

Answer: SQL provides a range of mathematical functions for performing calculations on numeric data. Some of the commonly used mathematical functions in SQL include:

ABS: Returns the absolute value of a number.
SQRT: Returns the square root of a number.
POWER: Returns a number raised to a specified power.
CEILING: Returns the smallest integer greater than or equal to a number.
FLOOR: Returns the largest integer less than or equal to a number.

62. How do you use the SUM function in SQL?

Answer: The SUM function calculates the sum of values in a column or expression in SQL. To use the SUM function, you specify the column or expression you want to sum in the SELECT statement like this:

SELECT SUM(column_name) FROM table_name;

You can also use the SUM function with the GROUP BY clause to calculate the sum of values for each group in a result set.

63. What is the difference between the AVG and COUNT functions in SQL?

Answer: The AVG function calculates the average value of a column or expression in SQL, while the COUNT function counts the number of rows in a table or the number of non-null values in a column.

The key difference between the two functions is AVG returns a numeric value, while COUNT returns an integer value. AVG ignores null values in a column, while COUNT includes null values in its count.

64. What are type conversion functions in SQL, and why are they important?

Answer: Type conversion functions are used to convert data of one data type to another in SQL. They are important because they enable us to manipulate data in a desired format, such as converting a string to a number or a date to a string. SQL provides a range of type conversion functions, such as CAST, CONVERT, and PARSE.

65. How do you use the CAST function in SQL?

Answer: The CAST function converts data from one data type to another in SQL. To use the CAST function, you specify the value you want to convert, the data type you want to convert it to, and an optional format style, like this:

SELECT CAST(value AS data_type) FROM table_name;

For example, to convert a string value to an integer, you can use the following query:

SELECT CAST('123' AS INTEGER) FROM table_name;

You can also use the CAST function with a column name to convert the data type of all values in a column.

66. What are string manipulation functions in SQL?

Answer: String manipulation functions are used to manipulate character strings in SQL. They allow us to perform operations on strings, such as concatenating, trimming, and searching for substrings within a larger string. SQL provides a range of string manipulation functions, including SUBSTRING, CONCAT, and TRIM.

67. How do you use the SUBSTRING function in SQL?

Answer: The SUBSTRING function extracts a substring from a larger string in SQL. To use the SUBSTRING function, you specify the string you want to extract from, the starting position of the substring, and the length of the substring, like this:

SELECT SUBSTRING(string, start_position, length) FROM table_name;

For example, to extract the first three characters from a string, you can use the following query:

SELECT SUBSTRING('Hello World', 1, 3) FROM table_name;

You can also use the SUBSTRING function to extract a substring from a column name in a table.

68. How do you use the TRIM function in SQL?

Answer: The TRIM function removes spaces or other specified characters from the beginning or end of a string in SQL. To use the TRIM function, you specify the characters you want to remove and the direction (LEFT or RIGHT) like this:

SELECT TRIM(characters FROM string) FROM table_name;

For example, to remove all leading and trailing spaces from a string, you can use the following query:

SELECT TRIM(' ' FROM '   Hello World   ') FROM table_name;

You can also use the TRIM function to remove specified characters from a string, or to trim values from a column name in a table.

69. What is the difference between the DATE and DATETIME data types in SQL, and how would you use the DATEADD function to add days to a date value?

Answer: The DATE data type in SQL stores only the date, without the time component, while the DATETIME data type stores both the date and time. To add days to a date value in SQL, you can use the DATEADD function, which takes three arguments: the interval to add (e.g. "day"), the number of intervals to add, and the date value to add the intervals to. For example, to add 10 days to a date value in SQL, you could use the following query: SELECT DATEADD(day, 10, '2023-04-03').

70. How would you use the SUBSTRING function and regular expressions in SQL to extract the domain name from an email address?

Answer: To extract the domain name from an email address in SQL, you could use the SUBSTRING function and regular expressions. Specifically, you could use the SUBSTRING function to extract the portion of the email address after the "@" symbol and then use a regular expression to remove any additional characters (such as the top-level domain, ".com", etc.). Here's an example query that demonstrates this approach:

SELECT SUBSTRING('john.doe@example.com', CHARINDEX('@', 'john.doe@example.com') + 1, LEN('john.doe@example.com')) AS domain_name.

71. What are window functions in SQL?

Answer: Window functions are used to perform calculations on a specific subset of data or window within a larger dataset. They can be used to calculate running totals, ranks, and other aggregate functions while allowing for more complex calculations over partitions and data ranges.

72. How do you use the OVER() clause with window functions in SQL?

Answer: The OVER() clause is used with window functions to define the partitioning and ordering of the data. For example, you can partition the data by a specific column and order it by another column, using the syntax: OVER (PARTITION BY column1 ORDER BY column2).

73. What is the difference between a ROWS and RANGE window frame in SQL?

Answer:A ROWS window frame calculates functions based on a fixed number of rows in the window, while a RANGE window frame calculates functions based on a range of values in the window. For example, a ROWS frame might calculate the average of the three preceding and three following rows, while a RANGE frame might calculate the average of all rows within a specific range of values.

74. How do you use the RANK() function with window functions in SQL?

Answer: The RANK() function assigns a rank to each row in a partition based on the values in a specific column. You can use the RANK() function with the OVER() clause to assign a rank to each row within a specific partition.

75. How do you use the LAG() and LEAD() functions with window functions in SQL?

Answer: The LAG() and LEAD() functions are used to access the previous and next rows within a window frame. You can use them to calculate running totals or other functions based on a specific column's previous or next values.

76. What is the syntax for SQL’s NOW() function?

Answer: The NOW() function returns the current date and time in SQL. The syntax is SELECT NOW().

77. How do you use the DENSE_RANK() function with window functions in SQL?

Answer: The DENSE_RANK() function assigns a rank to each row in a partition but does not leave any gaps in the ranking sequence. You can use the DENSE_RANK() function with the OVER() clause to assign a dense rank to each row within a specific partition.

78. What is a window frame in SQL?

Answer: A window frame is a subset of data within a partition used to calculate aggregate functions and other calculations. Window frames can be defined using the ROWS or RANGE clause.

79. How do you use the NTILE() function with window functions in SQL?

Answer: The NTILE() function divides a partition into a specified number of equal groups based on a specific column. You can use the NTILE() function with the OVER() clause to assign a group number to each row within a specific partition.

80. What is the difference between a running total and a cumulative total in SQL?

Answer:A running total is the total of all values up to a specific row, while a cumulative total is the total of all values up to and including a specific row. Running totals can be calculated using window functions, while cumulative totals require the use of subqueries or self-joins.

81. What are Common Table Expressions (CTEs) in SQL, and how are they used?

Answer: SQL’s Common Table Expressions (CTEs) are temporarily named result sets defined within a single SQL statement. CTEs are similar to subqueries but offer more flexibility and readability by allowing you to break up complex queries into smaller, more manageable pieces. CTEs are typically used to simplify queries, improve performance, and reduce code duplication by allowing you to reuse the same subquery multiple times within a single SQL statement.

82. What is the difference between a CTE and a temporary table in SQL?

Answer: A CTE in SQL is a temporary named result set defined within a single SQL statement. In contrast, a temporary table is a physical table created and used for temporary storage within a SQL session. The main difference between a CTE and a temporary table is that a CTE is only available within a single SQL statement. In contrast, a temporary table can be used across multiple SQL statements within a session. Temporary tables are typically used for larger or more complex data sets, while CTEs are used to simplify queries and improve performance by reducing the need for temporary storage.

83. How can you use recursive CTEs in SQL to query hierarchical data?

Answer: Recursive CTEs in SQL are used to query hierarchical data, such as organization charts or family trees, where each row contains a reference to its parent row. Recursive CTEs work by joining a table to itself repeatedly until a specific condition is met. The basic syntax for a recursive CTE includes a union of two queries: one that selects the initial set of rows and one that selects the recursive set of rows. For example, the following SQL query uses a recursive CTE to query a hierarchical data set that contains employees and managers:

WITH RECURSIVE cte AS (
SELECT name, manager_id, 0 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.name, e.manager_id, c.level + 1
FROM employees e
JOIN cte c ON e.manager_id = c.id
)
SELECT name, level
FROM cte;

This query selects the initial set of rows where the manager_id is NULL and then recursively selects the child rows by joining the "employees" table to itself based on the "manager_id" column. The query returns each employee’s “name” and “level” in the hierarchy.

84. What are Recursive Common Table Expressions (CTEs) in SQL, and how are they used?

Answer: Recursive Common Table Expressions (CTEs) in SQL are used to query hierarchical data with a recursive structure. Recursive CTEs work by using a base case that defines the starting point of the recursion and then repeatedly applying a recursive case until a termination condition is met. The recursive case typically includes a self-join or a recursive function call that joins a table to itself. Recursive CTEs are useful for modeling tree structures, such as organization charts or file systems, where each row contains a reference to its parent or child row.

85. How can you use Recursive CTEs in SQL to query a tree structure?

Answer: Recursive CTEs in SQL are used to query a tree structure by defining a base case and a recursive case that join a table to itself based on a parent-child relationship. The basic syntax for a Recursive CTE includes a UNION ALL statement that combines the base case with the recursive case. For example, the following SQL query uses a Recursive CTE to query a tree structure that contains employees and managers:

WITH RECURSIVE employee_tree AS (
SELECT emp_id, emp_name, manager_id, 0 as level
FROM employee
WHERE manager_id IS NULL
UNION ALL
SELECT emp.emp_id, emp.emp_name, emp.manager_id, level + 1
FROM employee emp
JOIN employee_tree et ON emp.manager_id = et.emp_id
)
SELECT emp_name, level
FROM employee_tree
ORDER BY level, emp_name;

This query selects the base case, where the manager_id is NULL, and then applies the recursive case by joining the "employee" table to itself based on the "manager_id" column. The query returns each employee’s “emp_name” and “level” in the hierarchy, sorted by level and name.

86. What are Window Functions in SQL, and how are they combined with Common Table Expressions (CTEs)?

Answer: Window Functions in SQL are used to perform calculations across a set of rows related to the current row without grouping the rows into summary statistics. Window Functions are typically used to calculate moving averages, rank the rows based on a specific order, or calculate running totals. When combined with Common Table Expressions (CTEs), Window Functions allow you to perform complex calculations on subsets of datadefined within a CTE.

87. How can you use a CTE with a Window Function in SQL to calculate running totals?

Answer: You can use a CTE with a Window Function in SQL to calculate running totals by defining a CTE that includes the base data set and then using a Window Function to calculate the cumulative sum of a specific column. For example, the following SQL query uses a CTE with a Window Function to calculate the running total of sales for each month:

WITH monthly_sales AS (
SELECT
month,
SUM(sales) OVER (ORDER BY month) AS running_total
FROM sales_data
)
SELECT month, running_total
FROM monthly_sales;

This query defines a CTE named "monthly_sales" that includes the "month" and "sales" columns from the "sales_data" table and then calculates the running total of sales using the SUM function and the OVER clause. The query returns the "month" and "running_total" columns, representing each month's cumulative sales.

88. How can you use a CTE with a Window Function in SQL to calculate a moving average?

Answer: You can use a CTE with a Window Function in SQL to calculate a moving average by defining a CTE that includes the base data set and then using a Window Function to calculate the average of a specific column over a rolling window of rows. For example, the following SQL query uses a CTE with a Window Function to calculate the moving average of sales for a 3-month period:

WITH monthly_sales AS (
SELECT
month,
sales,
AVG(sales) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_average
FROM sales_data
)
SELECT month, sales, moving_average
FROM monthly_sales;

This query defines a CTE named "monthly_sales" that includes the "month" and "sales" columns from the "sales_data" table and then calculates the moving average of sales using the AVG function and the OVER clause with a rolling window of 3 rows. The query returns the "month", "sales", and "moving_average" columns, which represent the sales and the moving average for each month.

89. How can using Common Table Expressions (CTEs) impact the performance of SQL queries?

Answer: Using CTEs in SQL can impact the performance of queries in several ways. First, CTEs can introduce additional processing overhead since they require creating a temporary result set stored in memory or disk. Second, using CTEs may result in the execution of multiple queries, which can slow down query performance. Third, CTEs may cause additional CPU and memory usage, which can affect the overall performance of the database server. Therefore, it is important to use CTEs judiciously and consider their performance implications when designing SQL queries.

90. What are some best practices for optimizing the performance of SQL queries that use CTEs?

Answer: There are several best practices that can be used to optimize the performance of SQL queries that use CTEs. One approach is to minimize the number of CTEs used in a query and to use them only when necessary. Another approach is to limit the size of the CTE result set by using filtering and aggregation techniques to reduce the number of rows returned. Additionally, using indexes on the tables used in the CTE can improve query performance, as can partitioning to split large tables into smaller, more manageable chunks. Finally, appropriate caching techniques, such as materialized views or temporary tables, can improve query performance by reducing the need to recompute the CTE result set each time the query is executed.

91. What is query optimization, and why is it important?

Answer: Query optimization optimizes SQL queries to improve their performance and efficiency. This involves analyzing the structure and execution plan of a query, identifying potential performance bottlenecks, and making changes to the query or the database schema to improve performance. Query optimization is important because it can significantly improve the speed and efficiency of database operations, improving application performance and user experience.

92. What are some common query optimization techniques?

Answer: Some common query optimization techniques include:

Using indexes to speed up data retrieval
Using the correct data types and formats to avoid unnecessary data conversions
Avoiding the use of SELECT * and instead selecting only the columns that are needed for the query
Using WHERE clauses to filter data at the database level, rather than retrieving all data and then filtering it in the application layer
Using JOINs to retrieve related data from multiple tables
Breaking down complex queries into smaller, more manageable parts using subqueries or CTEs (Common Table Expressions)
Optimizing the database schema to reduce the need for complex queries or joins.

93. How can execution plans be used to optimize queries?

Answer: Execution plans are generated by the database engine to show how a query will be executed, including the order of operations and using indexes and other optimizations. Execution plans can be used to identify potential performance bottlenecks in a query, such as inefficient joins, unnecessary data conversions, or the use of functions or calculations in WHERE clauses. By analyzing the execution plan, developers can identify opportunities to optimize the query by changing the database schema, adding indexes, or restructuring the query. Execution plans can also be used to test and compare the performance of different query optimization strategies to determine the most effective approach.

94. What is an index in a database, and how does it improve query performance?

Answer: An index is a data structure used to improve the performance of database queries by providing a fast lookup mechanism for data based on one or more columns. An index allows the database engine to quickly find the data that matches a query rather than scanning the entire table or tables. This can significantly improve query performance, particularly for large tables and complex queries.

95. What are some indexing strategies that can be used to optimize query performance?

Answer: Several indexing strategies can be used to optimize query performance, including:

Creating indexes on columns that are frequently used in WHERE clauses, JOIN conditions, and ORDER BY clauses
Using composite indexes on multiple columns to support queries that use those columns in combination
Using clustered indexes to store the data in the same order as the index, can improve the performance of range queries and table scans
Avoiding over-indexing, which can slow down database writes and consume unnecessary storage space
Regularly monitor and optimize indexes to ensure they are used effectively and efficiently.
Using partial indexes to create smaller, more targeted indexes on a subset of the data that is frequently accessed
Covering indexes, which include all of the data needed for a query, avoids the need for the database to access the underlying table.

96. What are temporary tables, and how can they be used to optimize queries?

Answer: A temporary table is a table that is created and used within a single session or transaction and is automatically dropped when the session or transaction ends. Temporary tables can be used to optimize queries by storing intermediate results and reusing them in subsequent queries. This can improve query performance by reducing the need to query large or complex data sets repeatedly.

97. What are subqueries, and how can they be used to optimize queries?

Answer: A subquery is a query embedded within another query and used to retrieve a subset of data that is then used in the outer query. Subqueries can be used to optimize queries by allowing the database engine to perform complex calculations and filtering operations on a subset of the data rather than the entire data set. This can improve query performance by reducing the amount of data that needs to be processed by the database engine.

98. What are some best practices for using temporary tables and subqueries to optimize queries?

Answer: Some best practices for using temporary tables and subqueries to optimize queries include:

Using temporary tables and subqueries only when necessary, as they can add complexity to queries and potentially slow down performance
Minimizing the number of temporary tables and subqueries used in a single query or transaction
Ensuring that temporary tables are appropriately indexed to support the queries that use them
Using subqueries and temporary tables to break down complex queries into smaller, more manageable parts
Regularly reviewing and optimizing queries that use temporary tables and subqueries to ensure they are being used effectively and efficiently.

99. What is an execution plan in SQL, and how can it be used to optimize queries?

Answer: An execution plan is the database engine’s roadmap to execute a SQL query. It provides detailed information about how the query will be executed, including which indexes will be used, how data will be accessed, and how tables will be joined. Execution plans can be used to optimize queries by identifying potential performance bottlenecks and suggesting changes to improve query performance.

100. What is query cost estimation in SQL, and how is it used to optimize queries?

Answer: Query cost estimation is the process of estimating the cost of executing a SQL query, including the cost of accessing data, joining tables, and performing other operations. This cost is then used to determine the most efficient way to execute the query based on factors such as the data set’s size, the query’s complexity, and the available resources. Query cost estimation can be used to optimize queries by identifying the most efficient query plan and minimizing the number of resources needed to execute the query.

Note: Ensure you have done a project on Data Analysis using SQL and you have a thorough knowledge and understanding of all the steps carried out in the project as you will be asked sql questions by the interviewer on your work experience in the domain to understand your proficiency.

Conclusion

In conclusion, SQL is an essential tool for anyone working with data in a relational database. Whether you are preparing for your first SQL interview or looking to brush up on your skills, mastering the top 100 sql query interview questions can be a great way to gain confidence and stand out from other candidates. By following this guide, you will be well on your way to success in your next SQL interview. So keep practicing, stay curious, and remember that the more you learn, the more valuable you will become as a data professional