All you need to know about the interview process for a data scientist at Amazon

Jun 6, 2021

Similar to Facebook or Google, Amazon’s interview process for data scientists involves a series of phone interviews, followed by onsite interviews and technical assessments.

  • The phone screener. This is the first round of phone interviews, which judges a candidate’s suitability for the role. Initial rounds include general HR questions about an applicant’s background, experience, and why they want to work for Amazon. There is then a technical screen, during which the applicant will be asked to explain data science concepts to show that they have the foundational technical knowledge to do the job. According to Glassdoor, applicants have in the past been asked questions such as, “Explain p-value,” “Explain bias-variance tradeoff,” “What is the difference between bagging and boosting?” and “Explain Bayes’ Theorem.” Applicants can also expect to solve SQL or Python algorithm coding questions during this stage of the interview process.
  • The behavioral questions. During both the phone interview and the more grueling onsite interview — the latter of which involves a loop with around five or six hiring managers, data scientists, and members of leadership — applicants will be asked data science interview questions where they will be expected to demonstrate the Amazon leadership principles. For example, a hiring manager might ask an applicant to discuss former projects, talk about a time they have failed, or explain occasions when they made trade-offs. According to people who have experience with the interview process, these questions are designed to invite the applicant to connect their response to one of Amazon’s leadership principles, so it’s worth crafting a story around each principle to share during the interview.

Depending on the role an applicant is applying for, the data science interview process at Amazon can involve additional technical challenges, such as solving coding problems on a whiteboard, or answering questions on machine learning and predictive modeling. The nature of the challenges and additional questions is determined by the needs of the hiring team, the responsibilities of the particular data science role, and the candidate’s level of experience. “Different teams have different requirements,” said Jain, who added that hiring managers at Amazon typically assess job seekers based on both prior experience and the ability to demonstrate that they have the foundational skills and mindset to succeed at Amazon.

“They look at how you would work if you were given a problem. They also want to know what you have done in the past because that tells them about your problem-solving skills. They dig into both.”

Some real interview questions are listed below:

  • How does a logistic regression model know what the coefficients are?
  • What is the difference between a convex and a non-convex cost function? What does it mean when a cost function is non-convex?
  • Is random weight assignment better than assigning the same weights to the units in the hidden layer?
  • Given a bar plot, imagine you are pouring water from the top. How would you quantify how much water can be held in the bar chart?
  • What is Overfitting?
  • How would a change in the Prime membership fee affect the market?
  • Why is gradient checking important?
  • Describe Tree, SVM, Random forest and boosting. Talk about their advantages and disadvantages.
  • How do you weigh 9 marbles three times on a balance scale to select the heaviest one?
  • Find the cumulative sum of the top 10 most profitable products of the last 6 months for customers in Seattle.
  • Describe the criterion for a particular model selection. Why is dimension reduction important?
  • What are the assumptions for logistic and linear regression?
  • If you can build a perfect classification model to predict some customer behavior, what will be the problem in application?
  • The probability that an item is at location A is 0.6, and 0.8 at location B. What is the probability that the item would be found on the Amazon website?
  • Given a ‘csv’ file with ID and Quantity columns, 50 million records, and 2 GB of data, write a program in any language of your choice to aggregate the QUANTITY column.
  • Implement circular queue using an array.
  • When you have monthly time series data with a large number of records, how will you find out whether there is a significant difference between this month’s values and previous months’ values?
  • Compare Lasso and Ridge Regression.
  • What’s the difference between MLE and MAP inference?
  • Given a function with inputs — an array with N randomly sorted numbers, and an int K, return output in an array with the K largest numbers.
  • When users are navigating through the Amazon website, they are performing several actions. What is the best way to model if their next action would be a purchase?
  • Estimate the disease probability in one city, given that the probability is very low nationwide. You randomly ask 1,000 people in this city, and all respond negative (no disease). What is the probability of disease in this city?
  • Describe SVM.
  • How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic ranges?
  • What is boosting?
  • How many topic modeling techniques do you know of?
  • Formulate LSI and LDA techniques.
  • What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type of algorithms are usually used and why?
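
Three of the coding questions above (returning the K largest numbers, measuring the water held by a bar chart, and implementing a circular queue on an array) lend themselves to short sketches. The Python below is one possible approach under stated assumptions, not an official Amazon answer: the names k_largest, trapped_water, and CircularQueue are hypothetical, bars are assumed to be unit-width with non-negative heights, and the queue is assumed to reject new items when full rather than overwrite old ones.

```python
import heapq


def k_largest(nums, k):
    """Return the k largest values in descending order (heap-based)."""
    if k <= 0:
        return []
    return heapq.nlargest(k, nums)


def trapped_water(heights):
    """Units of water held between unit-width bars (two-pointer scan)."""
    left, right = 0, len(heights) - 1
    left_max = right_max = 0
    total = 0
    while left < right:
        # The shorter side bounds the water level, so advance that side.
        if heights[left] < heights[right]:
            left_max = max(left_max, heights[left])
            total += left_max - heights[left]
            left += 1
        else:
            right_max = max(right_max, heights[right])
            total += right_max - heights[right]
            right -= 1
    return total


class CircularQueue:
    """Fixed-capacity FIFO queue backed by a preallocated list."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items = [None] * capacity
        self._head = 0  # index of the oldest item
        self._size = 0  # number of stored items

    def enqueue(self, value):
        # Assumption: a full queue raises instead of overwriting.
        if self._size == len(self._items):
            raise OverflowError("queue is full")
        tail = (self._head + self._size) % len(self._items)
        self._items[tail] = value
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("queue is empty")
        value = self._items[self._head]
        self._items[self._head] = None
        self._head = (self._head + 1) % len(self._items)
        self._size -= 1
        return value

    def __len__(self):
        return self._size
```

In an interview it is worth stating the trade-offs out loud: the heap approach avoids sorting the whole array when K is small, the two-pointer scan uses constant extra space, and the modulo arithmetic is what lets the queue reuse freed slots at the front of the array.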