python of the future!
February 1, 2022 2:32 PM

Python users: imagine one of your compatriots has been stuck in a cryogenic state since, oh, 2018. What are the key trends, techniques, changes, and tools in the Python ecosystem that you'd update them on? What's new, meaningful, and exciting in the last few years?

I've been focusing largely on R and the tidyverse for the last few years. I really like #rstats. However, I may ALSO need to work more heavily in Python soon. What have I missed, focusing more on dplyr and RStudio than pandas, conda, Django, Jupyter, and the rest?

Note that it'd be useful to hear about anything related to Django web application development (not necessarily front end, more back end/database integration/related things). Also any changes to the PyData stack and the more data engineering/data flows side of data science, especially pandas and pandas-dependent packages, db integrations, etc. would be super helpful.

Thank you!
posted by elephantsvanish to Computers & Internet (10 answers total) 25 users marked this as a favorite
 
Hands down, the big change since 2018 is that typing is widespread, it works, and it's generally agreed to be a good practice among software engineers. This is less true on the data science side, but they're getting there too.

Dataclasses and f-strings are nice little quality-of-life improvements (f-strings arrived in Python 3.6 and dataclasses in 3.7, so both were available by mid-2018). You can read the excellent Real Python series (3.10, 3.9, 3.8, etc.) for more of these.
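For anyone who hasn't seen them, here is a minimal sketch of dataclasses and f-strings together. The `Package` class and its field values are made up for illustration, and it assumes Python 3.9+ for the `list[str]` annotation.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    # Hypothetical fields, chosen only for illustration.
    name: str
    version: str
    tags: list[str] = field(default_factory=list)


pkg = Package("pandas", "1.4.0", tags=["data"])

# The "=" specifier (Python 3.8+) prints the expression along with its value.
summary = f"{pkg.name}=={pkg.version} ({len(pkg.tags)=})"
print(summary)

# The decorator generates __init__, __repr__, and __eq__ from the annotations.
print(pkg)
```

The point is mostly that you stop hand-writing `__init__` and `__repr__` boilerplate for plain data-holding classes.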

FastAPI has replaced Flask as the default lightweight web server (its extreme usability is driven by introspection of type hints, i.e. it literally wouldn't have been possible 5 years ago).
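To give a feel for what "introspection of type hints" means, here is a toy sketch using only the standard library. This is not FastAPI's code or API; `coerce_args` and `get_item` are hypothetical names, and the only claim is that a framework can read annotations at runtime and use them to convert incoming string parameters.

```python
import inspect
from typing import get_type_hints


def coerce_args(func, raw_params: dict[str, str]):
    """Convert string parameters to the types named in func's annotations."""
    hints = get_type_hints(func)
    converted = {}
    for name in inspect.signature(func).parameters:
        # Fall back to str when a parameter has no annotation.
        target = hints.get(name, str)
        converted[name] = target(raw_params[name])
    return func(**converted)


def get_item(item_id: int, scale: float) -> dict:
    # Hypothetical handler; a web framework would route a request here.
    return {"item_id": item_id, "scaled": item_id * scale}


# Query-string values arrive as text; the annotations say what they should be.
result = coerce_args(get_item, {"item_id": "7", "scale": "1.5"})
print(result)
```

A real framework layers validation, error responses, and generated docs on the same idea, but the handler itself stays a plain annotated function.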

pytest has completed its replacement of unittest (which was well underway in 2018).
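If you last wrote tests as `unittest.TestCase` subclasses, the pytest style looks roughly like this. `slugify` is a hypothetical function under test; the tests are plain functions with plain `assert` statements, which pytest discovers by the `test_` prefix.

```python
def slugify(title: str) -> str:
    """Hypothetical function under test."""
    return "-".join(title.lower().split())


# No TestCase subclass and no assertEqual: a bare assert is enough,
# and pytest reports the compared values when it fails.
def test_slugify_joins_words_with_hyphens():
    assert slugify("Python of the Future") == "python-of-the-future"


def test_slugify_collapses_whitespace():
    assert slugify("  hello   world ") == "hello-world"
```

Saved as something like `test_slugify.py`, running `pytest` in that directory should pick both tests up.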

The packaging ecosystem is still a shitshow that defies summarization in a MetaFilter comment, but Anaconda is a less necessary crutch than it used to be for laptop data scientists. Most can get by with virtual environments, pip and friends now. The reason for this is that compiled packages (i.e. wheels) basically work, which was less true 5 years ago.

On the data analysis side, pandas and matplotlib remain dominant, for better or worse. JupyterLab is a sufficient IDE for many notebook users.

I get the impression (from the outside, YMMV), that the Django ecosystem is winding down.
posted by caek at 3:56 PM on February 1, 2022 [6 favorites]


Seconding typing as the number one thing.

Airflow has made continual improvements -- version 2.0 came out in December 2020. My company uses it extensively for lots of stuff including data science and ETL pipelines. I'm sure it was around in 2018 but I *think* the UI and general ease of development/bells and whistles have gotten significantly better. (If you are not familiar with Airflow, look it up, it's very cool. A "workflow orchestrator" that lets you define DAGs in Python code, then manages the execution of the nodes in the DAG for you. link to the docs).
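If the DAG idea is unfamiliar, here is a standard-library sketch of the concept. To be clear, this is not Airflow's API: it uses `graphlib` (Python 3.9+) and made-up task names just to show tasks being declared with dependencies and then run in dependency order.

```python
from graphlib import TopologicalSorter

log = []


# Hypothetical pipeline steps; each one just records that it ran.
def extract():
    log.append("extract")


def transform():
    log.append("transform")


def load():
    log.append("load")


tasks = {"extract": extract, "transform": transform, "load": load}

# Map each task to the set of tasks it depends on.
dependencies = {
    "transform": {"extract"},
    "load": {"transform"},
}

# static_order() yields every task after all of its dependencies.
for name in TopologicalSorter(dependencies).static_order():
    tasks[name]()

print(log)
```

An orchestrator adds the parts that matter in production (scheduling, retries, a UI, running tasks on workers), but the definition of the workflow is still a graph like this written in Python.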
posted by 3FLryan at 4:37 PM on February 1, 2022 [1 favorite]


There are tons of other things I might want to say. But man, it is hard to know what is the most important, or what you would find most interesting!
posted by 3FLryan at 4:39 PM on February 1, 2022


Science may not be data science; not sure. When I hear the term "data science", it is often in the context of business intelligence (BI) or other commercial analyses, not for pure or applied sciences.

Nonetheless, for more science-focused pipelining, Snakemake and Nextflow are the usual kits. Snakemake is written in Python, while Nextflow is Groovy- or Java-based. Nextflow is generally more popular, and the common cloud environments tend to build in support for common tooling.

Typing is nice.

Another nicety is the addition of match/case statements (structural pattern matching, added in Python 3.10), which took a long time to add, but which have been written to be much more powerful than the switch/case equivalents in C and JavaScript.

For one, case comparisons can be done on Python objects, which is not possible in C (which has pointers to structs, but no objects outside of Objective-C), and not generally possible in JavaScript outside of a strict equality test on the same object, which is a trivial equality statement and not likely to be too useful in practice.

Await and asynchronous iterators and generators are also a really interesting new feature, which you'll likely see in the context of data retrieval and processing from web or other network-based requests, where data may not come in immediately.
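Here is a minimal sketch of an asynchronous generator consumed with `async for`. `fetch_rows` is a hypothetical stand-in for a network source, and `asyncio.sleep` simulates the wait for data that hasn't arrived yet.

```python
import asyncio


async def fetch_rows(n: int):
    """Async generator that yields rows as they 'arrive'."""
    for i in range(n):
        await asyncio.sleep(0.01)  # stand-in for waiting on the network
        yield {"row": i}


async def collect(n: int) -> list[dict]:
    rows = []
    # async for suspends here between rows instead of blocking the thread.
    async for row in fetch_rows(n):
        rows.append(row)
    return rows


rows = asyncio.run(collect(3))
print(rows)
```

While one coroutine is waiting on a row, the event loop is free to run others, which is why this style shows up around web requests and streaming data.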
posted by They sucked his brains out! at 6:39 PM on February 1, 2022


not a python guy, but i read that python is duck typed. is there strong typing now? some kind of (not)compile-time type checking?

don't have to discuss a lot, drop a link if ya like.
posted by j_curiouser at 7:20 PM on February 1, 2022


Slight disagreement with caek in a few spots. Re: packaging, it's not that bad. A stack of:

- pyenv for your Python version
- Poetry for virtual environment and package management, and
- pipx for when you need to run a random pip-installable package for one-off tasks

will basically carry you through everything.

Regarding Django winding down, I'd completely disagree. I think it's just settled into a relatively mature form, much like Rails.
posted by protocoach at 8:21 PM on February 1, 2022 [1 favorite]


protocoach: that’s also my stack for package installation on my laptop. My description of packaging as a shitshow refers more to creating and distributing packages. There’s been a huge amount of churn (including successful and failed PEPs) in this area over the past few years and there’s no one obvious right way everyone agrees on.
posted by caek at 10:07 PM on February 1, 2022


I use attrs for everything. It may seem overkill sometimes but I have every single class I write standardized this way.
posted by cape at 10:08 PM on February 1, 2022


"Regarding Django winding down, I'd completely disagree. I think it's just settled into a relatively mature form, much like Rails."

Yes, this I think. I guess its async stuff is the biggest new thing in recent years, but I haven’t used it.

Going purely by the folks on r/Django (which may well not be representative) using it with Django Rest Framework and [a front-end JS framework] is much, much more popular than it was a few years ago. (I haven’t done this either.)
posted by fabius at 5:16 AM on February 2, 2022


One more vote for typing. To j_curiouser's question: it's not runtime strong type enforcement; it's more like a code linter you run to analyze the source code. Duck typing is still in play, particularly at runtime.
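A quick illustration of "not enforced at runtime". `double` is a made-up function; the annotations say `int`, but Python runs the call with a string anyway, and a static checker such as mypy or Pyright is what should flag it before you run anything.

```python
def double(n: int) -> int:
    return n * 2


# The annotation says int, but nothing checks it at runtime:
# duck typing applies, and str supports "*" with an int.
result = double("ab")
print(result)

# The hints are just metadata stored on the function for tools to read.
print(double.__annotations__)
```

So the hints cost nothing at runtime; the value comes from the checker and the editor reading them.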

I'll add VS Code to the list of things that have changed the Python community. It is an excellent Python IDE. That's how I got introduced to typing: the code analysis / suggestions stuff makes good use of type hints. Pylance is the key enabling technology here.

Not sure if this is post-2018 but there's also been a shift in Python philosophy; it is no longer "batteries included". It's now expected you will install packages with pip for basic things. If you're not fully comfortable with venv yet you should definitely learn. Somewhat related, Python feels to be on a faster release cadence now. And of course it's Python 3 everywhere, Python 2 is finally done except for legacy codebases no one wants to maintain.

Data science applications and Jupyter notebooks are also big trends. And of course Python is the leading glue language for machine learning applications.
posted by Nelson at 7:27 AM on February 2, 2022

