This topic guide describes Django’s support for interacting with multiple databases. Most of the rest of Django’s documentation assumes you are interacting with a single database. If you want to interact with multiple databases, you’ll need to take some additional steps.
The first step to using more than one database with Django is to tell Django about the database servers you’ll be using. This is done using the DATABASES setting. This setting maps database aliases, which are a way to refer to a specific database throughout Django, to a dictionary of settings for that specific connection. The settings in the inner dictionaries are described fully in the DATABASES documentation.
Databases can have any alias you choose. However, the alias default has special significance. Django uses the database with the alias of default when no other database has been selected. If you don’t have a default database, you need to be careful to always specify the database that you want to use.
The following is an example settings.py snippet defining two databases – a default PostgreSQL database and a MySQL database called users:
DATABASES = {
'default': {
'NAME': 'app_data',
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'USER': 'postgres_user',
'PASSWORD': 's3krit'
},
'users': {
'NAME': 'user_data',
'ENGINE': 'django.db.backends.mysql',
'USER': 'mysql_user',
'PASSWORD': 'priv4te'
}
}
If you attempt to access a database that you haven't defined in your DATABASES setting, Django will raise a django.db.utils.ConnectionDoesNotExist exception.
The syncdb management command operates on one database at a time. By default, it operates on the default database, but by providing a --database argument, you can tell syncdb to synchronize a different database. So, to synchronize all models onto all databases in our example, you would need to call:
$ ./manage.py syncdb
$ ./manage.py syncdb --database=users
If you don't want every application to be synchronized onto a particular database, you can define a database router that implements a policy constraining the availability of particular models.
Alternatively, if you want fine-grained control of synchronization, you can pipe all or part of the output of sqlall for a particular application directly into your database prompt, like this:
$ ./manage.py sqlall sales | ./manage.py dbshell
The other django-admin.py commands that interact with the database operate in the same way as syncdb -- they only ever operate on one database at a time, using --database to control the database used.
The easiest way to use multiple databases is to set up a database routing scheme. The default routing scheme ensures that objects remain 'sticky' to their original database (i.e., an object retrieved from the foo database will be saved on the same database). The default routing scheme ensures that if a database isn't specified, all queries fall back to the default database.
You don't have to do anything to activate the default routing scheme -- it is provided 'out of the box' on every Django project. However, if you want to implement more interesting database allocation behaviors, you can define and install your own database routers.
A database Router is a class that provides up to four methods:
Suggest the database that should be used for read operations for objects of type model.
If a database operation is able to provide any additional information that might assist in selecting a database, it will be provided in the hints dictionary. Details on valid hints are provided below.
Returns None if there is no suggestion.
Suggest the database that should be used for writes of objects of type Model.
If a database operation is able to provide any additional information that might assist in selecting a database, it will be provided in the hints dictionary. Details on valid hints are provided below.
Returns None if there is no suggestion.
Return True if a relation between obj1 and obj2 should be allowed, False if the relation should be prevented, or None if the router has no opinion. This is purely a validation operation, used by foreign key and many to many operations to determine if a relation should be allowed between two objects.
Determine if the model should be synchronized onto the database with alias db. Return True if the model should be synchronized, False if it should not be synchronized, or None if the router has no opinion. This method can be used to determine the availability of a model on a given database.
A router doesn't have to provide all these methods - it omit one or more of them. If one of the methods is omitted, Django will skip that router when performing the relevant check.
The hints received by the database router can be used to decide which database should receive a given request.
At present, the only hint that will be provided is instance, an object instance that is related to the read or write operation that is underway. This might be the instance that is being saved, or it might be an instance that is being added in a many-to-many relation. In some cases, no instance hint will be provided at all. The router checks for the existence of an instance hint, and determine if that hint should be used to alter routing behavior.
Database routers are installed using the DATABASE_ROUTERS setting. This setting defines a list of class names, each specifying a router that should be used by the master router (django.db.router).
The master router is used by Django's database operations to allocate database usage. Whenever a query needs to know which database to use, it calls the master router, providing a model and a hint (if available). Django then tries each router in turn until a database suggestion can be found. If no suggestion can be found, it tries the current _state.db of the hint instance. If a hint instance wasn't provided, or the instance doesn't currently have database state, the master router will allocate the default database.
Example purposes only!
This example is intended as a demonstration of how the router infrastructure can be used to alter database usage. It intentionally ignores some complex issues in order to demonstrate how routers are used.
This example won't work if any of the models in myapp contain relationships to models outside of the other database. Cross-database relationships introduce referential integrity problems that Django can't currently handle.
The master/slave configuration described is also flawed -- it doesn't provide any solution for handling replication lag (i.e., query inconsistencies introduced because of the time taken for a write to propagate to the slaves). It also doesn't consider the interaction of transactions with the database utilization strategy.
So - what does this mean in practice? Say you want myapp to exist on the other database, and you want all other models in a master/slave relationship between the databases master, slave1 and slave2. To implement this, you would need 2 routers:
class MyAppRouter(object):
"""A router to control all database operations on models in
the myapp application"""
def db_for_read(self, model, **hints):
"Point all operations on myapp models to 'other'"
if model._meta.app_label == 'myapp':
return 'other'
return None
def db_for_write(self, model, **hints):
"Point all operations on myapp models to 'other'"
if model._meta.app_label == 'myapp':
return 'other'
return None
def allow_relation(self, obj1, obj2, **hints):
"Allow any relation if a model in myapp is involved"
if obj1._meta.app_label == 'myapp' or obj2._meta.app_label == 'myapp':
return True
return None
def allow_syncdb(self, db, model):
"Make sure the myapp app only appears on the 'other' db"
if db == 'other':
return model._meta.app_label == 'myapp'
elif model._meta.app_label == 'myapp':
return False
return None
class MasterSlaveRouter(object):
"""A router that sets up a simple master/slave configuration"""
def db_for_read(self, model, **hints):
"Point all read operations to a random slave"
return random.choice(['slave1','slave2'])
def db_for_write(self, model, **hints):
"Point all write operations to the master"
return 'master'
def allow_relation(self, obj1, obj2, **hints):
"Allow any relation between two objects in the db pool"
db_list = ('master','slave1','slave2')
if obj1._state.db in db_list and obj2._state.db in db_list:
return True
return None
def allow_syncdb(self, db, model):
"Explicitly put all models on all databases."
return True
Then, in your settings file, add the following (substituting path.to. with the actual python path to the module where you define the routers):
DATABASE_ROUTERS = ['path.to.MyAppRouter', 'path.to.MasterSlaveRouter']
The order in which routers are processed is significant. Routers will be queried in the order the are listed in the DATABASE_ROUTERS setting . In this example, the MyAppRouter is processed before the MasterSlaveRouter, and as a result, decisions concerning the models in myapp are processed before any other decision is made. If the DATABASE_ROUTERS setting listed the two routers in the other order, MasterSlaveRouter.allow_syncdb() would be processed first. The catch-all nature of the MasterSlaveRouter implementation would mean that all models would be available on all databases.
With this setup installed, lets run some Django code:
>>> # This retrieval will be performed on the 'credentials' database
>>> fred = User.objects.get(username='fred')
>>> fred.first_name = 'Frederick'
>>> # This save will also be directed to 'credentials'
>>> fred.save()
>>> # These retrieval will be randomly allocated to a slave database
>>> dna = Person.objects.get(name='Douglas Adams')
>>> # A new object has no database allocation when created
>>> mh = Book(title='Mostly Harmless')
>>> # This assignment will consult the router, and set mh onto
>>> # the same database as the author object
>>> mh.author = dna
>>> # This save will force the 'mh' instance onto the master database...
>>> mh.save()
>>> # ... but if we re-retrieve the object, it will come back on a slave
>>> mh = Book.objects.get(title='Mostly Harmless')
Django also provides an API that allows you to maintain complete control over database usage in your code. A manually specified database allocation will take priority over a database allocated by a router.
You can select the database for a QuerySet at any point in the QuerySet "chain." Just call using() on the QuerySet to get another QuerySet that uses the specified database.
using() takes a single argument: the alias of the database on which you want to run the query. For example:
>>> # This will run on the 'default' database.
>>> Author.objects.all()
>>> # So will this.
>>> Author.objects.using('default').all()
>>> # This will run on the 'other' database.
>>> Author.objects.using('other').all()
Use the using keyword to Model.save() to specify to which database the data should be saved.
For example, to save an object to the legacy_users database, you'd use this:
>>> my_object.save(using='legacy_users')
If you don't specify using, the save() method will save into the default database allocated by the routers.
If you've saved an instance to one database, it might be tempting to use save(using=...) as a way to migrate the instance to a new database. However, if you don't take appropriate steps, this could have some unexpected consequences.
Consider the following example:
>>> p = Person(name='Fred')
>>> p.save(using='first') # (statement 1)
>>> p.save(using='second') # (statement 2)
In statement 1, a new Person object is saved to the first database. At this time, p doesn't have a primary key, so Django issues a SQL INSERT statement. This creates a primary key, and Django assigns that primary key to p.
When the save occurs in statement 2, p already has a primary key value, and Django will attempt to use that primary key on the new database. If the primary key value isn't in use in the second database, then you won't have any problems -- the object will be copied to the new database.
However, if the primary key of p is already in use on the second database, the existing object in the second database will be overridden when p is saved.
You can avoid this in two ways. First, you can clear the primary key of the instance. If an object has no primary key, Django will treat it as a new object, avoiding any loss of data on the second database:
>>> p = Person(name='Fred')
>>> p.save(using='first')
>>> p.pk = None # Clear the primary key.
>>> p.save(using='second') # Write a completely new object.
The second option is to use the force_insert option to save() to ensure that Django does a SQL INSERT:
>>> p = Person(name='Fred')
>>> p.save(using='first')
>>> p.save(using='second', force_insert=True)
This will ensure that the person named Fred will have the same primary key on both databases. If that primary key is already in use when you try to save onto the second database, an error will be raised.
By default, a call to delete an existing object will be executed on the same database that was used to retrieve the object in the first place:
>>> u = User.objects.using('legacy_users').get(username='fred')
>>> u.delete() # will delete from the `legacy_users` database
To specify the database from which a model will be deleted, pass a using keyword argument to the Model.delete() method. This argument works just like the using keyword argument to save().
For example, if you're migrating a user from the legacy_users database to the new_users database, you might use these commands:
>>> user_obj.save(using='new_users')
>>> user_obj.delete(using='legacy_users')
Use the db_manager() method on managers to give managers access to a non-default database.
For example, say you have a custom manager method that touches the database -- User.objects.create_user(). Because create_user() is a manager method, not a QuerySet method, you can't do User.objects.using('new_users').create_user(). (The create_user() method is only available on User.objects, the manager, not on QuerySet objects derived from the manager.) The solution is to use db_manager(), like this:
User.objects.db_manager('new_users').create_user(...)
db_manager() returns a copy of the manager bound to the database you specify.
If you're overriding get_query_set() on your manager, be sure to either call the method on the parent (using super()) or do the appropriate handling of the _db attribute on the manager (a string containing the name of the database to use).
For example, if you want to return a custom QuerySet class from the get_query_set method, you could do this:
class MyManager(models.Manager):
def get_query_set(self):
qs = CustomQuerySet(self.model)
if self._db is not None:
qs = qs.using(self._db)
return qs
Django's admin doesn't have any explicit support for multiple databases. If you want to provide an admin interface for a model on a database other than that that specified by your router chain, you'll need to write custom ModelAdmin classes that will direct the admin to use a specific database for content.
ModelAdmin objects have four methods that require customization for multiple-database support:
class MultiDBModelAdmin(admin.ModelAdmin):
# A handy constant for the name of the alternate database.
using = 'other'
def save_model(self, request, obj, form, change):
# Tell Django to save objects to the 'other' database.
obj.save(using=self.using)
def queryset(self, request):
# Tell Django to look for objects on the 'other' database.
return super(MultiDBModelAdmin, self).queryset(request).using(self.using)
def formfield_for_foreignkey(self, db_field, request=None, **kwargs):
# Tell Django to populate ForeignKey widgets using a query
# on the 'other' database.
return super(MultiDBModelAdmin, self).formfield_for_foreignkey(db_field, request=request, using=self.using, **kwargs)
def formfield_for_manytomany(self, db_field, request=None, **kwargs):
# Tell Django to populate ManyToMany widgets using a query
# on the 'other' database.
return super(MultiDBModelAdmin, self).formfield_for_manytomany(db_field, request=request, using=self.using, **kwargs)
The implementation provided here implements a multi-database strategy where all objects of a given type are stored on a specific database (e.g., all User objects are in the other database). If your usage of multiple databases is more complex, your ModelAdmin will need to reflect that strategy.
Inlines can be handled in a similar fashion. They require three customized methods:
class MultiDBTabularInline(admin.TabularInline):
using = 'other'
def queryset(self, request):
# Tell Django to look for inline objects on the 'other' database.
return super(MultiDBTabularInline, self).queryset(request).using(self.using)
def formfield_for_foreignkey(self, db_field, request=None, **kwargs):
# Tell Django to populate ForeignKey widgets using a query
# on the 'other' database.
return super(MultiDBTabularInline, self).formfield_for_foreignkey(db_field, request=request, using=self.using, **kwargs)
def formfield_for_manytomany(self, db_field, request=None, **kwargs):
# Tell Django to populate ManyToMany widgets using a query
# on the 'other' database.
return super(MultiDBTabularInline, self).formfield_for_manytomany(db_field, request=request, using=self.using, **kwargs)
Once you've written your model admin definitions, they can be registered with any Admin instance:
from django.contrib import admin
# Specialize the multi-db admin objects for use with specific models.
class BookInline(MultiDBTabularInline):
model = Book
class PublisherAdmin(MultiDBModelAdmin):
inlines = [BookInline]
admin.site.register(Author, MultiDBModelAdmin)
admin.site.register(Publisher, PublisherAdmin)
othersite = admin.Site('othersite')
othersite.register(Publisher, MultiDBModelAdmin)
This example sets up two admin sites. On the first site, the Author and Publisher objects are exposed; Publisher objects have an tabular inline showing books published by that publisher. The second site exposes just publishers, without the inlines.
If you are using more than one database you can use django.db.connections to obtain the connection (and cursor) for a specific database. django.db.connections is a dictionary-like object that allows you to retrieve a specific connection using it's alias:
from django.db import connections
cursor = connections['my_db_alias'].cursor()
Django doesn't currently provide any support for foreign key or many-to-many relationships spanning multiple databases. If you have used a router to partition models to different databases, any foreign key and many-to-many relationships defined by those models must be internal to a single database.
This is because of referential integrity. In order to maintain a relationship between two objects, Django needs to know that the primary key of the related object is valid. If the primary key is stored on a separate database, it's not possible to easily evaluate the validity of a primary key.
If you're using Postgres, Oracle, or MySQL with InnoDB, this is enforced at the database integrity level -- database level key constraints prevent the creation of relations that can't be validated.
However, if you're using SQLite or MySQL with MyISAM tables, there is no enforced referential integrity; as a result, you may be able to 'fake' cross database foreign keys. However, this configuration is not officially supported by Django.
Dec 26, 2011