Scraping DUPR Data: Public vs Private API

As a pickleball player, I love DUPR: a data-science-based, quantitative way to rate players. As a technologist, of course I want to get at the raw data to run my own analysis of my pickleball community and beyond. Unfortunately, they have yet to release a public API, so I wrote a scraper to pull the dataset I needed. Here is the source code on GitHub.

Improving developer experience for my own customers is a top priority, so it is interesting to reverse engineer someone else's API. Understandably, the DUPR API is very much a backend-for-frontend API. Developing a public-facing API would mean changing the design, including:

  • normalize entities so that they are not returned in slightly different formats and shapes depending on which "screen" the API is designed to support

  • replace sequential integer-based keys with UUIDs - in the DUPR case there are seemingly logical ID fields that are not populated

  • reduce the variations of calls to get to similar objects

  • change authentication to support OAuth-style token exchanges and provide system-to-system authentication on top of individual user-level authentication

  • hide some of the administrative data based on the API caller's role
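To make a few of these points concrete, here is a minimal Python sketch of what normalizing could look like. It is not DUPR's actual schema: every field name (`id`, `playerId`, `fullName`, `ratings`, `doubles`, `email`), the `Player` class, the role names, and the namespace value are hypothetical, chosen only for illustration. It folds two differently shaped payloads into one entity, swaps the sequential integer key for a UUID, and hides an administrative field from non-admin callers.

```python
import uuid
from dataclasses import asdict, dataclass
from typing import Any, Optional

# Hypothetical namespace used to derive stable public IDs; not a DUPR value.
PUBLIC_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")

# Assumption: email stands in for "administrative data" in this sketch.
ADMIN_ONLY_FIELDS = {"email"}


@dataclass(frozen=True)
class Player:
    public_id: str                    # UUID string instead of a sequential integer
    full_name: str
    doubles_rating: Optional[float]   # None when the player is unrated
    email: Optional[str]


def _to_rating(value: Any) -> Optional[float]:
    """Coerce a rating to float; anything non-numeric becomes None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def normalize_player(raw: dict) -> Player:
    """Fold two hypothetical per-screen payload shapes into one entity."""
    internal_id = raw.get("id", raw.get("playerId"))
    if internal_id is None:
        raise ValueError("payload has no recognizable player id")

    # One screen sends fullName, another sends firstName/lastName.
    name = raw.get("fullName") or " ".join(
        part for part in (raw.get("firstName"), raw.get("lastName")) if part
    )

    # One screen nests the rating, another puts it at the top level.
    rating = raw.get("doubles")
    if rating is None:
        rating = (raw.get("ratings") or {}).get("doubles")

    return Player(
        public_id=str(uuid.uuid5(PUBLIC_ID_NAMESPACE, str(internal_id))),
        full_name=name,
        doubles_rating=_to_rating(rating),
        email=raw.get("email"),
    )


def to_public_dict(player: Player, role: str) -> dict:
    """Serialize a player, dropping admin-only fields for other roles."""
    data = asdict(player)
    if role != "admin":
        for field_name in ADMIN_ONLY_FIELDS:
            data.pop(field_name, None)
    return data
```

Deriving the UUID from the internal key with `uuid.uuid5` keeps the public ID stable across calls, but anyone who learns the namespace could recompute it from a guessed integer. A real public API would more likely store a random `uuid.uuid4()` per record.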

Ideally, their own front ends would then change to use only that same public API, which takes time and effort away from feature work.

This is by no means a criticism of the DUPR architecture. In fact, over the time I have been using it, they have drastically improved the performance of their weekly score recalculation. They also need to focus on integrating tournament data sources like pickleballbrackets.com. The pickleball data marketplace is a mess running on old technology stacks. Let's hope DUPR can pull them together.

Finally, this was also an excuse to dig into SQLAlchemy. I am so spoiled by the Django ORM. Will the Django folks hurry up and make the ORM a standalone library?

Cross-posted on LinkedIn.