Software Development

Using local settings in a Scrapy project

TL;DR: Seeing a pattern used in one framework can help you address similar problems in a different framework.

I was working on a Scrapy project that would save pages into a local database. We wanted to be able to test it using a test database, just as you would with Rails.

We could see how to configure a database connection string in the settings.py config file, but that didn’t help us switch between the development and test databases.
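Concretely, that sort of setting looks something like this (the name is illustrative, since Scrapy has no built-in database setting; it would just be a custom value read by a pipeline):

# settings.py
DB_CONNECTION_STRING = "sqlite:///scraped_pages.db"

A single hard-coded value like that gives no obvious way to point a test run at a separate database.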

Stack Overflow wasn’t much help, so we ended up rolling our own:

  1. Edit the settings.py file so it would read additional settings files depending on a SCRAPY_ENV environment variable
  2. Move all the settings files to a separate config directory, and change scrapy.cfg so it knew where to look (sketched below)
  3. Edit .gitignore so that local files wouldn’t get committed to the repo, and then add some .sample files (also sketched below)
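For steps 2 and 3, the project ends up looking roughly like this. The scrapy.cfg section is the standard one, just repointed at the config package; the exact file names, .sample templates, and ignore rules here are illustrative rather than copied from the repo:

config/
    settings.py               # shared settings, committed
    settings_development.py   # local only, git-ignored
    settings_test.py          # local only, git-ignored
    secrets.py                # local only, git-ignored
    secrets.py.sample         # committed template to copy from

# scrapy.cfg: tell Scrapy where the settings module now lives
[settings]
default = config.settings

# .gitignore: keep local settings and secrets out of the repo
config/secrets*.py
config/settings_development.py
config/settings_test.py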

Git repo is here.

The magic happens at the end of settings.py:

from importlib import import_module
from scrapy.utils.log import configure_logging
import logging
import os

SCRAPY_ENV = os.environ.get('SCRAPY_ENV')
if SCRAPY_ENV is None:
    raise ValueError("Must set SCRAPY_ENV environment var")
logger = logging.getLogger(__name__)
configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})

# Load the module if its file exists; copy any names that start
# with an uppercase letter into globals()
def load_extra_settings(fname):
    if not os.path.isfile("config/%s.py" % fname):
        logger.warning("Couldn't find %s, skipping" % fname)
        return
    mdl = import_module("config.%s" % fname)
    names = [x for x in mdl.__dict__ if x[0].isupper()]
    globals().update({k: getattr(mdl, k) for k in names})

load_extra_settings("secrets")
load_extra_settings("secrets_%s" % SCRAPY_ENV)
load_extra_settings("settings_%s" % SCRAPY_ENV)

It feels a bit hacky, but it does the job, so I would love to learn a more pythonic way to address this issue.