Brazilian E-Commerce Olist dataset analysis

As part of my ongoing interviews, this assessment tasked me with drawing insights from a Kaggle dataset on the Brazilian retailer Olist. In particular, I seek to answer the following questions, which are of interest to stakeholders:

  1. Customer LTV (lifetime value)
  2. Monthly performance of the business
  3. Best-selling categories
  4. Prediction for future sales


The Jupyter Notebook is the key deliverable and contains the answers to the above questions.


The data was provided by Kaggle and processed as follows:


  • The relevant data was queried from the table and stored as a Pandas DataFrame.
  • Data manipulation was undertaken as required (e.g. creating feature columns).
  • EDA was carried out and visualisations were created.
  • A time series ARIMA model was used to forecast future sales.

Findings and Recommendations:

  • Total revenues across 29 segments came in at 664,858 in the first eight months of 2018. The biggest segment was ‘watches’, which generated 17.4% of total revenues.
  • The best-selling categories are watches and audio.
  • Although the ‘watches’ segment is the largest source of revenue, it has only two sellers. Furthermore, the leading seller generated 97.0% of segment revenue.
  • Of all customers, only 3% are recurring; the remaining 97% have a purchase history of just under one year.
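
Figures like the ones above (segment share of revenue, seller concentration within a segment, share of recurring customers) can be derived with a few group-by operations. The sketch below uses a tiny made-up table, so the numbers it produces are unrelated to the findings; the column names are hypothetical, and "recurring" is assumed here to mean a customer with more than one distinct order.

```python
import pandas as pd

# Toy order items (hypothetical values, NOT Olist data).
items = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c3", "c4"],
    "order_id":    ["o1", "o2", "o3", "o4", "o5"],
    "seller_id":   ["s1", "s1", "s2", "s3", "s1"],
    "category":    ["watches", "watches", "watches", "audio", "audio"],
    "price":       [100.0, 50.0, 50.0, 30.0, 20.0],
})

# Share of total revenue generated by each category.
category_share = items.groupby("category")["price"].sum() / items["price"].sum()

# Revenue share of each seller within a single category.
watches = items[items["category"] == "watches"]
seller_share = watches.groupby("seller_id")["price"].sum() / watches["price"].sum()

# Share of customers with more than one distinct order
# (the assumed definition of "recurring" in this sketch).
orders_per_customer = items.groupby("customer_id")["order_id"].nunique()
recurring_share = (orders_per_customer > 1).mean()

print(category_share.sort_values(ascending=False))
print(seller_share.sort_values(ascending=False))
print(f"Recurring customers: {recurring_share:.0%}")
```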


For questions, please contact me.

Data Science student @Flatiron-School