Case Study

Part 1

To be completed at the conclusion of Day 1

For the following exercises, use the data stored at ../data/companies.csv. You aren't expected to finish all the exercises; just get through as many as time allows, and we will review them together.

  1. Start by becoming familiar with the data. How many rows and how many columns does it have? What are the data types of the columns?
  2. Set the data's index to be the "Symbol" column.
  3. Look up the company with the symbol NCLH. What company is this? What sector is it in?
  4. Filter down to companies that are in either the "Consumer Discretionary" or the "Consumer Staples" sector.
  5. How many companies are left in the data now?
  6. Create a new column, "Symbol_Length", that is the length of the symbol of each company. Hint: you may need to reset an index along the way.
  7. Find the company named "Kroger Co.". Change its name to "The Kroger Company".
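If you get stuck, the following is a minimal sketch of the kinds of pandas calls these exercises rely on. It runs on a small made-up DataFrame rather than the real file, so the rows, the company names, and the variable names (`companies`, `indexed`, `consumer`) are all illustrative assumptions. Against the real data you would start from `pd.read_csv("../data/companies.csv")` instead.

```python
import pandas as pd

# Toy stand-in for ../data/companies.csv; these rows are invented for illustration.
companies = pd.DataFrame({
    "Symbol": ["AAA", "BB", "CCCC"],
    "Name": ["Alpha Corp.", "Beta Co.", "Gamma Inc."],
    "Sector": ["Consumer Staples", "Energy", "Consumer Discretionary"],
})

# Shape is (rows, columns); dtypes lists the type of each column.
print(companies.shape)
print(companies.dtypes)

# Use the Symbol column as the index, then look up one row by its label.
indexed = companies.set_index("Symbol")
print(indexed.loc["BB"])

# Keep only rows whose Sector is one of the two listed values.
wanted = ["Consumer Discretionary", "Consumer Staples"]
consumer = indexed[indexed["Sector"].isin(wanted)]

# Move Symbol back into a regular column so string methods can be applied to it.
consumer = consumer.reset_index()
consumer["Symbol_Length"] = consumer["Symbol"].str.len()

# Overwrite a single value selected by a boolean mask.
consumer.loc[consumer["Name"] == "Alpha Corp.", "Name"] = "The Alpha Corporation"
print(consumer)
```

The `.loc[mask, column] = value` form is used for the rename because it assigns directly into the DataFrame rather than into a temporary copy.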

Bonus: For these two exercises, you won't find examples of the solution in our notebooks. You'll need to search for help on the internet.

Don't worry if you aren't able to solve them.

  1. Filter down to companies whose symbol starts with A. How many companies meet this criterion?
  2. What is the longest company name remaining in the dataset? You could just search the data visually, but try to find a programmatic solution.
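One possible approach to these bonus questions, sketched on invented data: pandas exposes string methods on a column through the `.str` accessor, including `startswith` and `len`. The rows and variable names below are assumptions made for illustration, not part of the course data.

```python
import pandas as pd

# Toy data; the symbols and names are invented for illustration.
companies = pd.DataFrame({
    "Symbol": ["AAA", "AB", "CCCC"],
    "Name": ["Alpha Corp.", "Abacus Holdings Inc.", "Gamma Inc."],
})

# Boolean mask: True where the symbol begins with "A".
starts_with_a = companies[companies["Symbol"].str.startswith("A")]
print(len(starts_with_a))

# Compute each name's length, then fetch the name at the row with the largest length.
name_lengths = starts_with_a["Name"].str.len()
longest_name = starts_with_a.loc[name_lengths.idxmax(), "Name"]
print(longest_name)
```

`idxmax` returns the index label of the first maximum, so ties are resolved in favor of the earliest row.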

Part 2

To be completed at the conclusion of Day 2

This section again uses the data at ../data/companies.csv.

  1. Re-create the "Symbol_Length" column (see above).
  2. What is the average symbol length of companies in the data set?
  3. What is the average symbol length by sector? That is, after grouping by sector, what is the average symbol length for each group?
  4. How long is the longest company name? How long is the longest company name by sector?
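As a hint for the grouping questions, here is a small sketch of an overall aggregate next to a per-group aggregate. The DataFrame is made up, and the `Name_Length` column is a helper introduced only for this example.

```python
import pandas as pd

# Toy data; the rows are invented for illustration.
companies = pd.DataFrame({
    "Symbol": ["AAA", "BB", "CCCC", "D"],
    "Name": ["Alpha Corp.", "Beta Co.", "Gamma Inc.", "Delta Ltd."],
    "Sector": ["Energy", "Energy", "Utilities", "Utilities"],
})
companies["Symbol_Length"] = companies["Symbol"].str.len()

# Aggregate over the whole column.
overall_mean = companies["Symbol_Length"].mean()

# Aggregate within each sector: split by Sector, then average each group.
mean_by_sector = companies.groupby("Sector")["Symbol_Length"].mean()

# The same pattern with a different column and a different aggregation.
companies["Name_Length"] = companies["Name"].str.len()
longest_name_overall = companies["Name_Length"].max()
longest_name_by_sector = companies.groupby("Sector")["Name_Length"].max()

print(overall_mean)
print(mean_by_sector)
print(longest_name_by_sector)
```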

Now open the pricing data at ../data/prices.csv. Note that this data is entirely fabricated and does not exhibit the qualities of real stock market data!

  1. Become familiar with this data. What is its shape? What are its data types?
  2. Get summary metrics (count, min, max, standard deviation, etc.) for both the Price and Quarter columns. Hint: we saw a DataFrame method that will do this for you in a single line.
  3. Perform an inner join between this data set and the companies data, on the Symbol column.
  4. How many rows does our data have now?
  5. What do you think this data represents? Form a hypothesis and look through the data more carefully until you are confident you understand what it is and how it is structured.
  6. Group the data by sector. What is the average first quarter price for a company in the Real Estate sector? What is the minimum fourth quarter price for a company in the Industrials sector?
  7. Filter the data down to just prices for Apple, Google, Microsoft, and Amazon.
  8. Save this data as big_4.csv in the ../data directory.
  9. Using Seaborn, plot the price of these companies over 4 quarters. Encode the quarter as the x-axis, the price as the y-axis, and the company symbol as the hue.
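The summarizing, joining, and filtering steps above could be sketched as follows. This uses two tiny invented DataFrames in place of the real files, and it assumes that the Quarter column holds integers, which may not match prices.csv. Saving and plotting are left out so that the example has no side effects.

```python
import pandas as pd

# Toy stand-ins for the two files; all values are invented for illustration.
companies = pd.DataFrame({
    "Symbol": ["AAA", "BB"],
    "Name": ["Alpha Corp.", "Beta Co."],
    "Sector": ["Energy", "Utilities"],
})
prices = pd.DataFrame({
    "Symbol": ["AAA", "AAA", "BB", "BB", "ZZ"],
    "Quarter": [1, 2, 1, 2, 1],  # assumption: quarters stored as integers
    "Price": [10.0, 12.0, 20.0, 18.0, 5.0],
})

# Summary statistics for the numeric columns in one call.
summary = prices[["Price", "Quarter"]].describe()

# Inner join: keeps only symbols present in both tables, so "ZZ" is dropped.
merged = prices.merge(companies, on="Symbol", how="inner")
print(len(merged))

# Restrict to one quarter first, then aggregate the price within each sector.
first_quarter = merged[merged["Quarter"] == 1]
q1_mean_by_sector = first_quarter.groupby("Sector")["Price"].mean()
print(q1_mean_by_sector)

# Keep only rows for a chosen list of symbols.
subset = merged[merged["Symbol"].isin(["AAA"])]
print(subset)
```

Filtering to a quarter before grouping keeps the aggregation simple. Grouping by both Sector and Quarter is another reasonable option if you want every quarter at once.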

Bonus:

This data is in a form that is useful for plotting. But in this shape, it would be quite difficult to calculate the difference between each company's fourth quarter price and its first quarter price.

Reshape this data so that it looks like the following:

Symbol  Name        Sector                  Q1      Q2      Q3      Q4
AAPL    Apple Inc.  Information Technology  275.20  269.96  263.51  266.07

From data in this shape, we could easily calculate Q4 - Q1.

You will probably want to google something like "python reshaping data". This is a very challenging problem!
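If you want to check your approach afterwards, one possible route (not the only one) is the `pivot` method of a DataFrame. The sketch below uses the AAPL row from the example above plus one invented company, "ZZZ". It assumes that Quarter is stored as integers 1 to 4 and that your pandas version accepts a list of columns for `index` (1.1 or later). The `Change` column name is made up for this example.

```python
import pandas as pd

# Long-format data: one row per company per quarter.
# The AAPL prices come from the example above; "ZZZ" is invented for illustration.
long_prices = pd.DataFrame({
    "Symbol": ["AAPL"] * 4 + ["ZZZ"] * 4,
    "Name": ["Apple Inc."] * 4 + ["Zed Corp."] * 4,
    "Sector": ["Information Technology"] * 4 + ["Energy"] * 4,
    "Quarter": [1, 2, 3, 4, 1, 2, 3, 4],
    "Price": [275.20, 269.96, 263.51, 266.07, 10.0, 11.0, 12.0, 13.0],
})

# Spread the Quarter values across columns, one row per company.
wide = long_prices.pivot(
    index=["Symbol", "Name", "Sector"], columns="Quarter", values="Price"
)

# Rename the quarter columns from 1..4 to Q1..Q4, then restore the identifier columns.
wide.columns = [f"Q{quarter}" for quarter in wide.columns]
wide = wide.reset_index()

# With one column per quarter, the difference is a simple column subtraction.
wide["Change"] = wide["Q4"] - wide["Q1"]
print(wide)
```

`pivot` raises an error if any company has two rows for the same quarter. In that case `pivot_table`, which aggregates duplicates, may be a better fit.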