So far in this course, we've talked about the following tools:
If you took the introductory course with us, we also discussed matplotlib and seaborn.
And this afternoon, we'll talk about scikit-learn.
And that was a lot! But in this session we're going to talk about the things we simply can't cover -- the Python ecosystem is expansive and full of useful technologies.
A few reasons:
In 2017, Jake VanderPlas (Python author, developer, and advocate) spoke at PyCon (the biggest Python conference) about the Python data science ecosystem. Let's look at his diagram.
Notably, the diagram is arranged to show that higher-level libraries are built upon lower-level libraries -- a common development approach in open ecosystems.
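One way to see this layering for yourself is to ask an installed package which other packages it declares as dependencies. The snippet below is a minimal sketch using only the standard library; the helper name `declared_dependencies` and the choice of example packages are ours, not part of the diagram, and the output depends entirely on what is installed in your environment.

```python
import re
from importlib.metadata import PackageNotFoundError, requires


def declared_dependencies(distribution):
    """Return the sorted names of a distribution's required dependencies.

    Returns an empty list if the distribution is not installed
    or declares no dependencies.
    """
    try:
        requirements = requires(distribution) or []
    except PackageNotFoundError:
        return []

    names = set()
    for req in requirements:
        # A requirement looks like 'numpy>=1.22; python_version < "3.11"'.
        spec, _, marker = req.partition(";")
        if "extra" in marker:
            # Skip optional dependencies that are only pulled in by extras.
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", spec.strip())
        if match:
            names.add(match.group(0).lower())
    return sorted(names)


# Example packages chosen for illustration; any installed name works.
for dist in ("seaborn", "pandas", "scikit-learn"):
    print(dist, "->", declared_dependencies(dist))
```

If these packages are installed, you would typically expect to see lower-level libraries such as numpy show up in the lists for the higher-level ones.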
Which of these should you think about using?
More recently, Paco Nathan (data science researcher and frequent conference speaker) wrote a blog post in which he illustrated the current Python data science landscape.
Many similarities to the other graphic, but also many new technologies!
Which new things here should you think about using?
Following the developments in such a full space can be daunting, but we recommend a few things:
Are there any tasks you do regularly that you think might have packages?
Are there any tools we saw above that you'd like to hear more about?
This section introduced a lot of new packages and tools. We'd like to leave you with a cheat sheet to refer to in the future when you're thinking about what Python package might be a good fit for your needs.
Topic | Relevant Packages | Are Any Especially Beginner-Friendly? |
---|---|---|
Plotting | matplotlib, seaborn, Bokeh, altair, plotly | seaborn, altair |
Database interaction | sqlalchemy | sqlalchemy |
Deep Learning | keras, tensorflow | keras |
Web Development | flask | |
Getting Data from the Web | requests, scrapy, beautifulsoup | requests |
Statistical Modeling | scikit-learn, statsmodels | |
Distributed Computing | pyspark | |
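
Before reaching for a package from the cheat sheet, it can help to check whether it is already available in your environment. The following is a minimal sketch using the standard library's `importlib.util.find_spec`. The import names in the dictionary are our assumption about how each package is imported (for example, beautifulsoup is usually imported as `bs4` and scikit-learn as `sklearn`), and the helper names are hypothetical.

```python
from importlib.util import find_spec

# Topic -> import names (assumed; these can differ from the install names).
CHEAT_SHEET_IMPORTS = {
    "Plotting": ["matplotlib", "seaborn", "bokeh", "altair", "plotly"],
    "Database interaction": ["sqlalchemy"],
    "Deep Learning": ["keras", "tensorflow"],
    "Web Development": ["flask"],
    "Getting Data from the Web": ["requests", "scrapy", "bs4"],
    "Statistical Modeling": ["sklearn", "statsmodels"],
    "Distributed Computing": ["pyspark"],
}


def is_installed(import_name):
    """Return True if a top-level module with this name can be found."""
    return find_spec(import_name) is not None


def installed_by_topic(table=CHEAT_SHEET_IMPORTS):
    """Map each topic to a {import_name: bool} availability dictionary."""
    return {
        topic: {name: is_installed(name) for name in names}
        for topic, names in table.items()
    }


for topic, status in installed_by_topic().items():
    available = [name for name, ok in status.items() if ok]
    missing = [name for name, ok in status.items() if not ok]
    print(f"{topic}: installed={available} missing={missing}")
```

Anything reported as missing can usually be added with your environment's package manager (for example `pip install` or `conda install`), using the package's install name rather than its import name.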