Tiger Analytics Data Engineer Interview Questions
𝐈𝐧𝐭𝐞𝐫𝐯𝐢𝐞𝐰 𝐏𝐫𝐨𝐜𝐞𝐝𝐮𝐫𝐞
Round 1 (L1)
Round 2 (L2)
Managerial Round
𝐑𝐨𝐮𝐧𝐝 𝟏 (𝐋𝟏)
1, What is the output of INNER JOIN, LEFT JOIN, and RIGHT JOIN, and how many rows will they produce?
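The row count depends on the data, which this minimal sketch shows. It uses Python's built-in sqlite3 module and two hypothetical single-column tables. An INNER JOIN returns one row per matching pair, so duplicate keys multiply. A LEFT JOIN adds each unmatched left row once. NULL keys never match. RIGHT JOIN only arrived in SQLite 3.39, so the sketch emulates it by swapping the tables in a LEFT JOIN.

```python
import sqlite3

# Hypothetical tables: duplicate keys and a NULL key show why row counts vary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER);
CREATE TABLE b (id INTEGER);
INSERT INTO a VALUES (1), (1), (2), (NULL);
INSERT INTO b VALUES (1), (1), (3), (NULL);
""")

def count(sql):
    return conn.execute(sql).fetchone()[0]

inner = count("SELECT COUNT(*) FROM a JOIN b ON a.id = b.id")
left = count("SELECT COUNT(*) FROM a LEFT JOIN b ON a.id = b.id")
# RIGHT JOIN emulated as a LEFT JOIN with the tables swapped (works on older SQLite).
right = count("SELECT COUNT(*) FROM b LEFT JOIN a ON a.id = b.id")
print(inner, left, right)  # → 4 6 6
```

Key 1 appears twice on each side, so the INNER JOIN yields 2 × 2 = 4 rows. The LEFT and RIGHT joins each add two unmatched rows, one non-matching key and one NULL, for 6.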
2, What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?
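The three functions differ only in how they treat ties. Here is a small sketch with hypothetical salaries, again using sqlite3. It assumes SQLite 3.25 or newer, the first release with window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("A", 300), ("B", 200), ("C", 200), ("D", 100)])

rows = conn.execute("""
    SELECT name, salary,
           RANK()       OVER (ORDER BY salary DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC)       AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY salary DESC, name) AS row_num
    FROM emp
    ORDER BY salary DESC, name
""").fetchall()
for r in rows:
    print(r)
```

RANK leaves a gap after the tie (1, 2, 2, 4), and DENSE_RANK does not (1, 2, 2, 3). ROW_NUMBER always produces unique numbers, so a tie-breaking column (name here) keeps its order deterministic.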
3, Explain how Spark works internally when a job is submitted.
4, Explain the difference between Spark and MapReduce.
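Both engines follow the same logical map → shuffle → reduce model, shown below as a pure-Python word count on hypothetical input. The main difference is where intermediate data lives. A multi-step MapReduce pipeline is a chain of jobs, each writing its output to HDFS. Spark plans the whole pipeline as a DAG of stages and can keep intermediate results in memory.

```python
from collections import defaultdict
from itertools import chain

lines = ["spark is fast", "hadoop is reliable", "spark is popular"]

# Map phase: each input line becomes (word, 1) pairs.
mapped = chain.from_iterable(((w, 1) for w in line.split()) for line in lines)

# Shuffle phase: group all values by key (on a cluster this moves data between nodes).
groups = defaultdict(list)
for word, one in mapped:
    groups[word].append(one)

# Reduce phase: combine the values for each key.
counts = {word: sum(ones) for word, ones in groups.items()}
print(counts)
```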
5, Write a query to find the second-highest salary among employees.
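Two common answers are sketched below against a hypothetical employee table in sqlite3. The first takes the maximum below the overall maximum. The second uses DENSE_RANK, which also extends to "Nth highest". The data deliberately ties at the top to show that both approaches still return the true second value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("A", 300), ("B", 300), ("C", 200), ("D", 100)])

# Approach 1: the highest salary strictly below the maximum (NULL if none exists).
q1 = """
SELECT MAX(salary) FROM employee
WHERE salary < (SELECT MAX(salary) FROM employee)
"""
# Approach 2: DENSE_RANK, so ties at the top do not hide the second value.
q2 = """
SELECT DISTINCT salary FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employee
) WHERE rnk = 2
"""
second_max = conn.execute(q1).fetchone()[0]
second_dense = [r[0] for r in conn.execute(q2)]
print(second_max, second_dense)  # → 200 [200]
```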
6, Explain how Spark and Hadoop handle large datasets.
7, What are the key differences between Spark and Hadoop?
9, What is the role of SparkContext in Spark?
10, How do you optimize Spark jobs for better performance?
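One frequently cited optimization is avoiding a shuffle when one side of a join is small. The pure-Python sketch below mimics a broadcast hash join on hypothetical data. The small table becomes an in-memory hash map, and the large table is streamed past it. In PySpark the matching hint is `pyspark.sql.functions.broadcast(small_df)`. Spark also broadcasts automatically when a table's estimated size is below `spark.sql.autoBroadcastJoinThreshold`.

```python
# Hypothetical data: a large fact table and a small dimension table.
orders = [(1, "IN", 500), (2, "US", 300), (3, "IN", 700), (4, "UK", 200)]
countries = [("IN", "India"), ("US", "United States")]

# "Broadcast" the small side: a hash map that every worker could hold in memory.
country_lookup = dict(countries)

def broadcast_join(big_rows, lookup):
    """Inner join: stream the big side and probe the hash map, so big_rows is never shuffled."""
    for order_id, code, amount in big_rows:
        name = lookup.get(code)
        if name is not None:
            yield order_id, name, amount

joined = list(broadcast_join(orders, country_lookup))
print(joined)
```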
𝐑𝐨𝐮𝐧𝐝 𝟐 (𝐋𝟐)
1, Describe your previous project experience, the technologies you’ve worked with, and any challenges you faced.
2, Write a query to list customers who spent more than ₹1000 in total on their orders in the last month.
3, Write a query to find the customer(s) who spent the most on their orders in the last month.
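Both queries can share one monthly aggregation. The sketch assumes a hypothetical `orders(customer_id, order_date, amount)` table and takes "last month" to mean the previous calendar month. A fixed reference date keeps the example deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; order_date is ISO 'YYYY-MM-DD' text, so string comparison sorts by date.
conn.execute("CREATE TABLE orders (customer_id INTEGER, order_date TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, "2024-05-03", 800), (1, "2024-05-20", 400),  # ₹1200 in May
    (2, "2024-05-10", 900),                           # ₹900 in May
    (3, "2024-05-15", 1500),                          # ₹1500 in May
    (2, "2024-06-02", 5000),                          # June, outside the window
])

# "Last month" = previous calendar month relative to :ref (date('now') in production).
monthly_totals = """
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE order_date >= date(:ref, 'start of month', '-1 month')
  AND order_date <  date(:ref, 'start of month')
GROUP BY customer_id
"""
params = {"ref": "2024-06-15"}

# Question 2: customers who spent more than ₹1000 last month.
over_1000 = conn.execute(
    monthly_totals + "HAVING SUM(amount) > 1000 ORDER BY customer_id", params
).fetchall()

# Question 3: top spender(s); comparing with MAX keeps ties, unlike LIMIT 1.
top_spenders = conn.execute(f"""
WITH m AS ({monthly_totals})
SELECT customer_id, total FROM m
WHERE total = (SELECT MAX(total) FROM m)
ORDER BY customer_id
""", params).fetchall()

print(over_1000, top_spenders)
```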
4, Write a query to join two data frames with different schemas, where the left table has more rows.
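One reading of this question is a LEFT JOIN that keeps every row of the larger table even though the key columns have different names. The sketch uses hypothetical tables in sqlite3. In PySpark the same idea would be written as `left_df.join(right_df, left_df.cust_id == right_df.customer_id, "left")`. If the intent is instead to stack DataFrames whose columns differ, `DataFrame.unionByName(other, allowMissingColumns=True)` (Spark 3.1+) fills missing columns with nulls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables with different schemas and differently named keys.
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER, name TEXT, city TEXT);
CREATE TABLE loyalty   (customer_id INTEGER, tier TEXT);
INSERT INTO customers VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi'), (3, 'Meera', 'Chennai');
INSERT INTO loyalty   VALUES (1, 'Gold'), (3, 'Silver');
""")

# LEFT JOIN keeps every customer; the ON clause maps the two key names explicitly,
# and COALESCE replaces the NULL tier of unmatched rows with a default label.
rows = conn.execute("""
SELECT c.cust_id, c.name, c.city, COALESCE(l.tier, 'No tier') AS tier
FROM customers AS c
LEFT JOIN loyalty AS l ON c.cust_id = l.customer_id
ORDER BY c.cust_id
""").fetchall()
print(rows)
```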
6, Explain how you handled a specific challenge in your previous project, particularly one related to large-scale data processing.
7, Explain the use of window functions in SQL and how you applied them in your project.
9, How do you handle missing or corrupted data in large datasets?
10, How do you ensure data quality and consistency in a distributed system?
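A common pattern for questions 9 and 10 is to validate each record, quarantine bad rows with a reason instead of silently dropping them, and fail the batch when the reject rate crosses a threshold. The pure-Python sketch below uses hypothetical records. The field names, the default of 0.0 for an empty amount, and `MAX_REJECT_RATE` are all assumptions. In Spark, the CSV and JSON readers expose a related `mode` option (`PERMISSIVE`, `DROPMALFORMED`, `FAILFAST`) plus `columnNameOfCorruptRecord`.

```python
# Hypothetical raw records: clean rows, a missing amount, a corrupted amount, a missing key.
raw = [
    {"order_id": "1", "customer_id": "10", "amount": "250.0"},
    {"order_id": "2", "customer_id": "11", "amount": ""},
    {"order_id": "3", "customer_id": "12", "amount": "abc"},
    {"order_id": "4", "customer_id": None, "amount": "75"},
    {"order_id": "5", "customer_id": "13", "amount": "99.5"},
]
REQUIRED = ("order_id", "customer_id")
MAX_REJECT_RATE = 0.5  # assumption: tolerated share of bad rows per batch

def clean(records):
    """Split records into typed valid rows and (record, reason) quarantine entries."""
    valid, quarantined = [], []
    for rec in records:
        missing = [f for f in REQUIRED if not rec.get(f)]
        if missing:
            quarantined.append((rec, f"missing {missing}"))
            continue
        amount = rec.get("amount")
        try:
            value = float(amount) if amount else 0.0  # assumption: empty amount -> 0.0
        except ValueError:
            quarantined.append((rec, f"bad amount {amount!r}"))
            continue
        valid.append({"order_id": int(rec["order_id"]),
                      "customer_id": int(rec["customer_id"]),
                      "amount": value})
    return valid, quarantined

valid, quarantined = clean(raw)
reject_rate = len(quarantined) / len(raw)
# Quality gate: stop the pipeline rather than load a badly damaged batch.
if reject_rate > MAX_REJECT_RATE:
    raise ValueError(f"reject rate {reject_rate:.0%} exceeds threshold")
print(len(valid), len(quarantined), reject_rate)
```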
11, Describe the process of data ingestion in a data pipeline using Spark.
12, Explain the differences between batch processing and real-time data processing.
13, What strategies do you use to handle skewed data in Spark jobs?
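Salting is one widely used strategy. A random suffix splits a hot key into several sub-keys that can be processed in parallel, and a second aggregation combines them. The pure-Python sketch below applies it to a hypothetical skewed sum. The bucket count is an assumption to tune. For salted joins, the smaller side must also be replicated once per salt value. Spark 3.x can split skewed join partitions on its own via adaptive query execution (`spark.sql.adaptive.skewJoin.enabled`).

```python
import random
from collections import Counter, defaultdict

# Hypothetical skewed input: key "hot" dominates.
events = [("hot", 1)] * 1000 + [("cold", 1)] * 10
SALT_BUCKETS = 4  # assumption: tune to the degree of skew

def salted_sum(rows, buckets, seed=0):
    rng = random.Random(seed)
    # Stage 1: add a random salt so one hot key spreads over several partitions,
    # then aggregate partially per (key, salt).
    partial = defaultdict(int)
    for key, value in rows:
        partial[(key, rng.randrange(buckets))] += value
    # Stage 2: strip the salt and combine the partial sums per original key.
    final = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        final[key] += subtotal
    return dict(final), partial

totals, partial = salted_sum(events, SALT_BUCKETS)
print(totals)
print(Counter(key for key, _ in partial))  # how many salted sub-keys each key used
```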
14, How do you ensure the scalability of your data processing system?
15, What is the role of a DataFrame in Spark, and how is it different from an RDD?
16, How do you design and implement a data pipeline for incremental data processing?
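A simple incremental design stores a high-watermark, the latest `updated_at` already processed. Each run then reads only newer rows and upserts them into the target. The pure-Python sketch below uses hypothetical data and an in-memory dict as the target. A strict `>` comparison can miss late rows that arrive with older timestamps. Real pipelines often reprocess a small overlap window or use change data capture for that reason.

```python
from datetime import datetime

# Hypothetical source rows: (id, value, updated_at).
source = [
    (1, "a", datetime(2024, 6, 1, 10)),
    (2, "b", datetime(2024, 6, 1, 11)),
    (1, "a2", datetime(2024, 6, 2, 9)),   # later update to id 1
    (3, "c", datetime(2024, 6, 2, 12)),
]

def incremental_load(source_rows, target, watermark):
    """Upsert rows newer than the stored watermark and return the new watermark."""
    new_rows = sorted((r for r in source_rows if r[2] > watermark), key=lambda r: r[2])
    for row_id, value, updated_at in new_rows:
        target[row_id] = (value, updated_at)  # upsert: the latest version wins
    return max((r[2] for r in new_rows), default=watermark)

target = {}
wm = datetime.min
# First run: only rows up to 2024-06-01 have arrived.
wm = incremental_load([r for r in source if r[2] < datetime(2024, 6, 2)], target, wm)
# Second run: only rows newer than the saved watermark are processed.
wm = incremental_load(source, target, wm)
print(target, wm)
```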
17, Explain how you would use a window function to calculate running totals or rank data in Spark.
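The SQL form of both calculations is sketched below on a hypothetical sales table in sqlite3. PySpark expresses the same window as `Window.partitionBy("region").orderBy("day").rowsBetween(Window.unboundedPreceding, Window.currentRow)`, used with `F.sum("amount").over(w)` and `F.rank().over(...)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("North", "2024-06-01", 100), ("North", "2024-06-02", 50), ("North", "2024-06-03", 200),
    ("South", "2024-06-01", 80), ("South", "2024-06-02", 120),
])

rows = conn.execute("""
SELECT region, day, amount,
       -- running total: all rows from the start of the region up to the current day
       SUM(amount) OVER (PARTITION BY region ORDER BY day
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
       -- rank of each day's amount within its region, highest first
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
FROM sales
ORDER BY region, day
""").fetchall()
for r in rows:
    print(r)
```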
18, How do you monitor and debug data pipeline jobs in Spark?
19, What is the importance of partitioning in Spark, and how do you decide the partitioning strategy for a job?
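Hash partitioning by a key sends every row with that key to the same partition, so later per-key joins or aggregations need no further shuffle. The pure-Python sketch below uses hypothetical records. `zlib.crc32` stands in for Spark's internal hash function, which differs.

```python
import zlib
from collections import defaultdict

def partition_of(key: str, num_partitions: int) -> int:
    # Stable hash as a stand-in for Spark's own; Python's built-in hash() of str
    # is randomized per process, so it would not be reproducible across runs.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

records = [("cust_1", 10), ("cust_2", 20), ("cust_1", 5), ("cust_3", 7), ("cust_2", 1)]
NUM_PARTITIONS = 3  # assumption: often chosen from data volume and available cores

partitions = defaultdict(list)
for key, value in records:
    partitions[partition_of(key, NUM_PARTITIONS)].append((key, value))

# Every row with the same key lands in the same partition, so a per-key
# aggregation can run inside each partition without moving data again.
for pid in sorted(partitions):
    print(pid, partitions[pid])
```

Choosing the key is the main design decision. A high-cardinality key that later joins or filters use spreads the data evenly. A low-cardinality or skewed key leaves a few oversized partitions.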
𝐌𝐚𝐧𝐚𝐠𝐞𝐫𝐢𝐚𝐥 𝐑𝐨𝐮𝐧𝐝
1, Can you explain your previous project in detail? What were the objectives, the technologies you used, and the challenges you faced?
2, How do you prioritize tasks and manage deadlines in a fast-paced environment, especially when working on multiple data projects simultaneously?
3, Can you describe a situation where you had to mentor or guide a junior team member? How did you handle it, and what was the outcome?
4, How do you handle conflicts within the team, especially when it comes to disagreements about data approaches or methodologies? Can you share an example of how you resolved such a situation?