Month: December 2024

This is part of a series of posts on learning Python and building a Python web service.

Choice of Framework – Part 2

My first goal was to set up two endpoints, one for searching for an artist/album/song and one for lookup of a particular entity, to the point where I could send data in and receive some back. The next goal was to be able to deploy the service to the cloud (Azure) and have everything still work as expected. Once that was done I would begin the process of fetching and returning music metadata. For now it was time to dig into how APIFlask does things.

Python has good support for decorators, and APIFlask uses them to define the input, output, and HTTP operation for an endpoint. In this way it’s similar to libraries I’ve used with Express that augment how you define endpoints. The input decorator lets you define the names of the API parameters, and APIFlask then passes the matching values to the function that handles the request. You can also specify validation rules and defaults, which saves you from writing manual checks. I’m using those to set defaults for the pagination parameters on the search and to make sure the search text is included as a query string parameter. Note that the input decorator is optional, and you can use more than one on a function. All of this is managed by defining a schema for the input. Here is the definition for the search endpoint and its schemas:

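from apiflask import APIFlask, Schema
from apiflask.fields import Integer, List, Nested, String
from apiflask.validators import OneOf

app = APIFlask(__name__)

# Query string parameters; defaults and validation are handled by the schema
class SearchParameters(Schema):
    page = Integer(load_default=1)
    pageSize = Integer(load_default=10, validate=OneOf([10, 25]))
    query = String(required=True, metadata={'description': 'The search text'})

# SearchResult is another Schema subclass (not shown here)
class SearchOutput(Schema):
    rows = List(Nested(SearchResult()))
    count = Integer()

@app.get('/search/<string:entity_type>')
@app.input(SearchParameters, location='query')
@app.output(SearchOutput)
def search(entity_type, query_data):
    ...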

The tricky part was that the endpoint had both a path parameter and query string parameters, and at first I couldn’t figure out how to configure all of that. The path parameter is easy: the name goes in the @app.get endpoint path and is matched to the entity_type argument of the search function. The @app.input decorator specifies the query string parameters via the schema class for them. It puts all the values into a single object rather than into multiple variables. I called that argument query_data, and in APIFlask 2.x the name does matter: the parsed data is passed as a keyword argument named after the location (query_data for location='query'), unless you choose a different name with the decorator’s arg_name option.

I had tried adding two @app.input decorators but couldn’t quite get it to work. You are supposed to be able to specify the location as path, but I couldn’t see how you would tie that to the part of the path that represents the value. The schemas are fairly straightforward class definitions, and APIFlask passes them to Marshmallow under the hood. They are mostly mapped to the shape of the data that MusicBrainz returns, though they are generic enough that they should work with other providers. One twist is that any class field that is itself an object needs to be wrapped in the Nested class. In the end I was able to define everything in a way that worked and kept the framework happy.
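To make the idea of an input decorator less magical, here is a rough standard-library sketch of the general technique: a decorator that parses a query string, applies defaults and validation, and hands the result to the view as a single query_data argument. This is not how APIFlask is implemented (it delegates to Marshmallow schemas); the query_input decorator, the SEARCH_PARAMETERS spec, and the query_string keyword are all made up for illustration.

```python
from functools import wraps
from urllib.parse import parse_qs

# Hypothetical spec mirroring SearchParameters: name -> (type, default, allowed values)
SEARCH_PARAMETERS = {
    "page": (int, 1, None),
    "pageSize": (int, 10, {10, 25}),
    "query": (str, None, None),  # no default, so it is required
}


def query_input(spec):
    """Parse a raw query string against spec and pass the result as query_data."""
    def decorator(view):
        @wraps(view)
        def wrapper(*path_args, query_string=""):
            raw = parse_qs(query_string)
            data = {}
            for name, (cast, default, allowed) in spec.items():
                if name in raw:
                    value = cast(raw[name][0])
                elif default is not None:
                    value = default
                else:
                    raise ValueError(f"missing required parameter: {name}")
                if allowed is not None and value not in allowed:
                    raise ValueError(f"invalid value for {name}: {value}")
                data[name] = value
            # Path values stay positional; query values arrive as one dict.
            return view(*path_args, query_data=data)
        return wrapper
    return decorator


@query_input(SEARCH_PARAMETERS)
def search(entity_type, query_data):
    # Echo the inputs back, which is enough to check the wiring.
    return {"entity_type": entity_type, **query_data}


result = search("artist", query_string="query=nirvana&pageSize=25")
print(result)
```

The view never sees the raw query string: by the time it runs, page has its default, pageSize has been checked against the allowed values, and a missing query has already been rejected.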

Here are the output schemas for the lookup endpoint:

class Album(Schema):
    id = String()
    name = String()
    artist = String()
    release_date = String()
    description = String()
    tags = List(String())
    image = Nested(Image())
    links = List(String())

class Artist(Schema):
    id = String()
    name = String()
    description = String()
    life_span = Dict()
    area = Dict()
    begin_area = Dict()
    tags = List(String())
    images = List(Nested(Image()))
    albums = List(Nested(Album()))
    members = List(Nested(BandMember()))
    links = List(String())

To test the request/response cycle I simply included the inputs in the JSON response. I tested the endpoints with SoapUI (a.k.a. ReadyAPI), the same tool I’ve been using for a while now. I have the open source edition, which is not the easiest thing in the world to use, but it does the job (Postman would probably be better, but that would have been yet another thing to learn).

Final review for APIFlask: thumbs up.

Although the documentation didn’t have an example for my exact situation, it was decent enough to get things going. I like being able to configure things so the framework does some of the work for you. It also includes nice extras such as built-in error handling and auto-generated Swagger UI docs.

Scaffolding

Not knowing the best way to structure a Python web service project, I went searching on the web for good scaffolding examples for a Flask app. I found a few and decided to go with this one. It seemed to lay a good foundation and could be customized as needed. Little did I know…

I ran into a couple of issues. The first was that a dependency of one of this repo’s dependencies didn’t work with Python 3.13, at least not in the version that was being installed. The maintainers had put out an update to fix it, but the downstream package hadn’t started using it yet. Python 3.13 had been released just a few weeks earlier, and I had naturally installed it since it was the latest. That’s annoying but understandable.

Sidebar on how to install Python dependencies:
As with many things in the Python world, there are multiple ways you can manage project dependencies. The easiest way is to use pip. Like npm, it normally comes with the runtime, so you can use it in every project to install packages. The way to install them all at once is to list the packages you need and the versions you want in a separate requirements.txt file, and then tell pip to install everything specified in that file. Another way is to use a tool called Poetry. It looks for a file named pyproject.toml that has a list of packages/versions plus general info on how to manage the project’s dependencies. It’s also more modern and full-featured than pip.

The scaffolding repo used Poetry and the fix for my first issue was to include the updated library that was the source of the problem as a special requirement before all other packages were installed. It took some research on the specs for the pyproject.toml file and trial and error to figure out the right entry to make, but that first error message eventually went away and all the packages got successfully installed. For the curious, the entry looked like this:

[tool.poetry.dependencies]
python = "^3.12"
# Needed for Python 3.13
greenlet = "^3.1.1"
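After an install like this, a quick way to confirm which version of a package actually landed in the environment is to ask the standard library. This is just a small sketch; installed_version is a helper name I made up, and the package names other than greenlet are only examples.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# greenlet is the package pinned above; the other names are just examples.
for name in ("greenlet", "flask", "apiflask"):
    print(name, installed_version(name) or "not installed")
```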

The next issue I hit was more damaging. The scaffolding expected to use a particular HTTP server named gunicorn. There’s just one problem: it doesn’t run on Windows (kind of lame, since IIS, Apache, and the various Node-based HTTP listeners I’ve used over the years all run fine on that OS). So trying to run my project based on this scaffolding blew up spectacularly.
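One way a project could sidestep this is to choose the server based on the platform it is running on. The sketch below only illustrates that idea: pick_server is a hypothetical helper, and waitress is my assumption of a Windows-friendly WSGI server, not something the scaffolding used.

```python
import sys


def pick_server(platform=sys.platform):
    """Return the name of a WSGI server that can run on the given platform."""
    # gunicorn depends on Unix-only modules, so it cannot start on Windows.
    if platform.startswith("win"):
        return "waitress"  # assumption: one Windows-friendly alternative
    return "gunicorn"


print(pick_server())
```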

It was at that point I decided to go back to square one and try to simplify things. I ended up starting with the new project template in PyCharm, and just adding an app.py file and an __init__.py file. I copied over what was useful from my prior project, and augmented that with various examples I found via Google. It’s similar to the approach I took when creating Node web services for personal projects. There, too, I wanted to find a good canonical example of scaffolding, but I never did. I just ended up starting small and expanding from there. The problem is that these tech stacks change so often, and there are so many options, that it would be hard to maintain a solid example over time. When I needed to spin up new Node apps I simply copied what I had done previously and modified the project as I added features or learned new Node best practices.

I eventually got a bare bones Python service running properly on my machine. I’m sure I’ll be adding to the project and re-arranging lots of things as I go.

Environment

Python has a similarity to Node.js: there are different versions of the language/runtime and not all apps will work with every release. And sometimes you need to develop against a different version, but it’s not possible to have multiple versions active simultaneously. For Node there are tools like nvm or nvm-windows that solve this problem, allowing you to switch which version of Node is active at any time.

Python devs run into this a lot. Libraries or tools may not work with a new version of the language. Or you have to maintain an app that only works with an older version, but you also need to write new code using the current version. The suggested way to handle this is to create a virtual environment in each project directory. Each environment is a directory that contains the version of the Python interpreter you want, plus the dependencies for just that project, plus assorted helper scripts/tools. A different project in a different directory can have its own virtual environment with entirely different contents. There is also a tool for Windows named pyenv-win that helps manage having different versions of Python installed on the same machine, though I decided not to use it for now since I didn’t need it.
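If you ever lose track of which interpreter is active, the running Python can tell you. Here is a small sketch using only the standard library; in_virtual_environment is a name I made up for the check.

```python
import sys


def in_virtual_environment():
    """True when the running interpreter belongs to a virtual environment."""
    # In a virtual environment, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("Python", ".".join(map(str, sys.version_info[:3])))
print("virtual environment:", in_virtual_environment())
print("prefix:", sys.prefix)
```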

So that all made sense and I planned to create one for the Music Browser API. But again, that led to the question of which tool to use to create one (another Node similarity: a plethora of tools available to do any particular thing). Python itself includes a module named venv to create these environments. It does the job, but some brief web searching suggested other, better choices. I eventually picked virtualenv since it had good features and made things easy. It works great, though I ended up building, destroying, and re-building the environment for the Music Browser API over and over, as various things I was trying went wrong (more on that here).
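For reference, here is roughly what creating an environment looks like with the built-in venv module, driven from Python rather than the command line. Note that this uses venv, not the virtualenv tool I ended up picking; create_environment and the .venv directory name are just my choices for the sketch.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(project_dir):
    """Create a .venv directory inside project_dir and return its path."""
    env_dir = Path(project_dir) / ".venv"
    # with_pip=False keeps the sketch fast; a real project would normally
    # pass with_pip=True so packages can be installed into the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# Build a throwaway environment in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_environment(tmp)
    has_config = (env_dir / "pyvenv.cfg").exists()
    print(env_dir.name, has_config)
```

The pyvenv.cfg file is what marks the directory as a virtual environment and records which base interpreter it was created from.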

Postscript:
An interesting thing I learned later is that virtualenv comes bundled with PyCharm, which is the IDE I’m using. And when you create a new Python app, PyCharm uses it to create a virtual environment for you. So running virtualenv by itself isn’t even necessary, though it’s good to know how to do it.

I recently decided to expand my language skills and learn Python properly. I had dabbled in it at a previous job, but that was long ago and very little of that knowledge has survived in my head. I figured an online course of some sort was the best way to go. I bought a Python boot camp to learn the basics of the language, but it was going slower than I wanted. It’s geared for absolute beginners and I now realize I should have tried to find one suited for experienced developers. In any case, I decided to start setting up an actual working codebase for what I want to write: the new back end web service for the Music Browser.

Which brings us to this post: the first in a series of things I’ve learned about the language, the Python toolchain, how stuff works, and how to do things the right way. Obviously for experienced Python devs none of this is new. But I figured having a reference to read later would be useful. I plan to add more entries as I go. Here’s the first topic:

Choice of Framework

My goal was to create a RESTful web service, so my first question was how that is done in modern Python. There are currently two main types of framework to choose from: WSGI-based and ASGI-based. WSGI and ASGI are interfaces whose job is to be a bridge between a Python service and the underlying HTTP server. WSGI is the legacy standard that has been around for a while. Its biggest drawback is that it responds to HTTP requests synchronously. ASGI is the modern version that does the same thing and is designed to be backwards-compatible, but it can respond to requests asynchronously, which can provide better performance under concurrent load. It also supports other features that are useful in building full-featured services, such as WebSockets.
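To make the difference concrete, here is a bare-bones sketch of each interface with no framework involved. Both apps are called directly with stand-ins for what a real server would provide; the names wsgi_app, asgi_app, and call_asgi are made up for the example.

```python
import asyncio


def wsgi_app(environ, start_response):
    """WSGI: a plain callable that returns the response body synchronously."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from WSGI"]


async def asgi_app(scope, receive, send):
    """ASGI: a coroutine that sends the response as a series of events."""
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


# Call the WSGI app directly with a stand-in for the server's start_response.
wsgi_status = []
wsgi_body = b"".join(wsgi_app({}, lambda status, headers: wsgi_status.append(status)))


# Drive the ASGI app with stand-ins for the server's receive/send callables.
async def call_asgi():
    events = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(event):
        events.append(event)

    await asgi_app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return events


asgi_events = asyncio.run(call_asgi())
print(wsgi_status[0], wsgi_body)
print(asgi_events[-1]["body"])
```

Frameworks like Flask and FastAPI hide these callables from you, but this is the contract they implement underneath.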

The most popular WSGI framework is Flask. It’s battle-tested, has a lot of documentation and examples, and is simple to set up and use. For an ASGI framework there are several choices, such as FastAPI. I went back and forth over which I should use. Part of me wanted to pick Flask since it’s easier to configure out of the box and I could spin up something quickly. But another part of me wanted to go with the current thing, since I should try to learn the future and not the past. Plus the idea of having async support by default was appealing.

In the end Flask won out. I wanted to get some working code going ASAP, and I was worried I’d spend a ton of time learning the ins and outs of something complex like FastAPI before being able to get an app off the ground. All the docs I read said Flask was dead easy to get started with.

After that decision came a related question: what form of Flask should I use? It turns out there is no-frills Flask, but there are also other libraries that are extensions to it or drop-in replacements for it, which offer a lot of niceties that make writing APIs easier. I looked at a couple and decided on APIFlask. I had a working Flask setup with two basic endpoints already, and impulsively started refactoring things after browsing the APIFlask web site for like 5 minutes (note to self: a bit more deliberation on these kinds of things is worth it. You never know how far you might get in the new shiny thing before realizing it’s just not going to work for you, and I had a brief scare about not being able to figure out how to do a particular thing I wanted to do).