If you're an aspiring data scientist, data analyst, or data engineer and are currently interviewing for such roles, you're likely to encounter several technical interviews that require live coding, usually involving SQL. While later interviews might require other programming languages such as Python, which is common in the data field, let's focus on the typical SQL questions that I've encountered during these interviews. For the purpose of this discussion, I'll assume that you're already familiar with basic SQL concepts such as SELECT, FROM, and WHERE, as well as aggregate functions like SUM and COUNT. Let's get into the specifics!
1. Mastering Joins and Table Types
Without a doubt, the most common SQL question is about table joins. It may seem too obvious, but every interview I've participated in has centered on this topic. You should feel comfortable with inner joins and left joins. Proficiency with self-joins and unions is also valuable. Equally important is the ability to perform these joins across different table types, particularly fact and dimension tables. Here are my loose definitions for these two terms:
Fact Table: A table containing many rows but relatively few attributes or columns. Consider an example where an online retailer maintains an "orders" table with columns like: date, customer_id, order_id, product_id, items, quantity. This table has few attributes but contains an enormous number of records.
Dimension Table: A table with fewer rows but many attributes. For instance, the same online retailer's "buyer" table might hold one row per customer, with attributes such as customer_id, first_name, last_name, ship_street_addr, ship_zip_code, and more.
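To make the two table types concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the definitions above; the column types, the sample rows, and the variable names are invented purely for illustration. It shows how an inner join and a left join from the dimension table to the fact table can return different row counts.

```python
import sqlite3

# Hypothetical in-memory database. Column types and sample rows are
# assumptions made for illustration, not data from a real retailer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buyer (                 -- dimension table: one row per customer
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT,
        last_name        TEXT,
        ship_street_addr TEXT,
        ship_zip_code    TEXT
    );
    CREATE TABLE orders (                -- fact table: many rows, few columns
        date        TEXT,
        customer_id INTEGER,
        order_id    INTEGER,
        product_id  INTEGER,
        items       INTEGER,
        quantity    NUMERIC
    );
    INSERT INTO buyer VALUES
        (1, 'Ada',   'Lovelace', '1 Main St', '90210'),
        (2, 'Alan',  'Turing',   '2 Oak Ave', '10001'),
        (3, 'Grace', 'Hopper',   '3 Elm Rd',  '90210');
    INSERT INTO orders VALUES
        ('2023-01-05', 1, 100, 7, 2, 20.0),
        ('2023-02-11', 1, 101, 8, 1, 15.0),
        ('2023-03-02', 2, 102, 7, 1, 10.0);
""")

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT b.customer_id, o.order_id
    FROM buyer AS b
    INNER JOIN orders AS o ON o.customer_id = b.customer_id
    ORDER BY b.customer_id, o.order_id
""").fetchall()

# LEFT JOIN keeps every customer; order columns are NULL when nothing matches.
left_rows = conn.execute("""
    SELECT b.customer_id, o.order_id
    FROM buyer AS b
    LEFT JOIN orders AS o ON o.customer_id = b.customer_id
    ORDER BY b.customer_id, o.order_id
""").fetchall()

print(inner_rows)
print(left_rows)
```

In this sample data, customer 3 has no orders, so that customer disappears from the inner join but survives the left join with an empty order_id. Also note that customer 1 appears twice in both results, once per order: joining a dimension table to a fact table multiplies rows, which is worth keeping in mind before counting anything.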
Understanding these two primary table types is essential. You need to know why and how to join fact and dimension tables to ensure accurate results. Let's consider a real-world example: the interview question presents two tables ("orders" and "buyer") and asks:
How many customers have purchased at least 3 items in their lifetime and have a shipping zip code of 90210?