Skip to content

Tutorial

Example models

This tutorial will use the following Django models:

class Person(models.Model):
    name = models.CharField(max_length=255)
    biography = models.TextField()

class Nomination(models.Model):
    award = models.CharField(max_length=255)
    category = models.CharField(max_length=255)
    person = models.ForeignKey(Person, related_name="nominations")
    movie = models.ForeignKey("Movie", related_name="nominations")
    is_winner = models.BooleanField()

class PersonDirector(models.Model):
    movie = models.ForeignKey("Movie")
    person = models.ForeignKey(Person)

class Movie(models.Model):
    name = models.CharField(max_length=255)
    directors = models.ManyToManyField(
        Person,
        through=PersonDirector,
        blank=True,
        related_name="directed_movies",
    )

Why Django Virtual Models

Due to the way Django (and especially Django REST Framework) projects are structured, read logic is usually far from prefetch and annotation logic, but both are coupled. This coupling results in brittle code that suffers from change amplification.

Note

This section describes the change amplification problem using Django REST Framework as an example, but other API frameworks (and sometimes Django-only codebases) suffer from similar problems because read logic is naturally coupled to prefetch and annotation logic. Please keep reading to check if Django Virtual Models is helpful to you. Then, the "Using Virtual Models manually" section to learn how to use this library without DRF.

Imagine you have the following DRF serializers:

class PersonSerializer(serializers.ModelSerializer):
    class Meta:
        model = Person
        fields = ["name"]

class MovieSerializer(serializers.ModelSerializer):
    directors = PersonSerializer(many=True)

    class Meta:
        model = Movie
        fields = ["name", "directors"]

Now, for every view that uses MovieSerializer you'll have to remember to do the following prefetch in every get_queryset method:

class MovieListView(ListAPIView):
    serializer_class = MovieSerializer

    def get_queryset(self):
        return Movie.objects.prefetch_related("directors")

class MovieDetailView(RetrieveAPIView):
    serializer_class = MovieSerializer

    def get_queryset(self):
        return Movie.objects.prefetch_related("directors")

Otherwise you'll get the N+1 selects problem, which is a possibly the main cause of performance issues in Django applications.

As the codebase gets more functionality, the problems get worse, and not only for N+1s. For example, if you add a new field inside PersonSerializer, like one that comes from an annotate, now you have to remember to call that annotate in all views that use MovieSerializer, because it has PersonSerializer nested on it:

# Adding `nomination_count` here...
class PersonSerializer(serializers.ModelSerializer):
    nomination_count = serializers.IntegerField(read_only=True)

    class Meta:
        model = Person
        fields = ["name", "nomination_count"]

# ... requires changes here:
def queryset_ops_for_MovieSerializer(qs):
    return qs.prefetch_related(
        Prefetch(
            "directors",
            queryset=Person.objects.annotate(
                nomination_count=Count("nominations")
            ).distinct()
        )
    )

class MovieListView(ListAPIView):
    serializer_class = MovieSerializer

    def get_queryset(self):
        return queryset_ops_for_MovieSerializer(Movie.objects.all())

class MovieDetailView(RetrieveAPIView):
    serializer_class = MovieSerializer

    def get_queryset(self):
        return queryset_ops_for_MovieSerializer(Movie.objects.all())

Any change that requires a annotate, select_related, and prefetch_related in MovieSerializer or any of its nested serializers like PersonSerializer will demand changes in (possibly) many view classes. As views and serializers are usually not in the same Python module, it's hard to ensure those coordinated changes while coding and it's even harder to review that in Pull Requests. Read logic is naturally coupled to prefetch and annotation logic!

But at first glance, it seems the solution is abstracting the queryset operations in a function. Is that robust? What if we're using PersonSerializer in another view? We would have to abstract again with another function:

def queryset_ops_for_PersonSerializer(qs):  # <--- need this now
    return qs.annotate(
        nomination_count=Count("nominations")
    ).distinct()

def queryset_ops_for_MovieSerializer(qs):
    return qs.prefetch_related(
        Prefetch(
            "directors",
            queryset=queryset_ops_for_PersonSerializer(Person.objetcs.all())
        )
    )

class MovieListView(ListAPIView):
    serializer_class = MovieSerializer

    def get_queryset(self):
        return queryset_ops_for_MovieSerializer(Movie.objects.all())

class MovieDetailView(RetrieveAPIView):
    serializer_class = MovieSerializer

    def get_queryset(self):
        return queryset_ops_for_MovieSerializer(Movie.objects.all())

class PersonView(RetrieveAPIView):
    serializer_class = PersonSerializer

    def get_queryset(self):
        return queryset_ops_for_PersonSerializer(Person.objects.all())

This rapidly results in a unwieldy codebase, and still suffers from change amplification. It's easy to nest serializers, but it's hard to nest those queryset operations because you have to change code in multiple places. As you can see, the queryset logic is always coupled with the serializer logic... so why not handle both things together somehow?

What if we had a tool that warns us about missing annotations and prefetches by reading a serializer and infering what queryset operations it needs? That's exactly what Django Virtual Models do.

Defining Virtual Models (Django REST Framework examples)

Handling prefetches and annotations

To solve the change amplification on view querysets vs. serializers, the key is to create another layer to remove the responsibility from the view to set the queryset operations the serializer needs. That layer is the Virtual Model. The minimum virtual model looks like this:

import django_virtual_models as v

class VirtualMovie(v.VirtualModel):
    class Meta:
        model = Movie

The best thing is: this is already usable. Although the only purpose to use an empty Virtual Model is to get an exception...

But that's what you want! The idea is to be guided by the virtual model exceptions to know what fields you need to add to it. First, integrate the VirtualMovie with the MovieSerializer and the view by doing the following:

import django_virtual_models as v

class MovieSerializer(v.VirtualModelSerializer):
    directors = PersonSerializer(many=True)

    class Meta:
        model = Movie
        virtual_model = VirtualMovie  # <--- add this
        fields = ["name", "directors"]


class MovieListView(v.VirtualModelListAPIView):  # <--- changed base class
    serializer_class = MovieSerializer
    queryset = Movie.objects.all()

class MovieDetailView(v.VirtualModelRetrieveAPIView):  # <--- changed base class
    serializer_class = MovieSerializer
    queryset = Movie.objects.all()

Now, if we try to access the MovieListView by hitting its URL, we get the following exception:

MissingVirtualModelFieldException at /movies/
`directors` must be defined in `VirtualMovie` because it's a nested serializer on `MovieSerializer`

In the browser, it looks like this:

MissingVirtualModelFieldException for directors

The exception is clearly telling us we need a directors field inside VirtualMovie. Since it's a relationship between models, we're going to create an empty Virtual Model for Person and reference it in the directors field:

class VirtualPerson(v.VirtualModel):  # <--- new virtual model...
    class Meta:
        model = Person

class VirtualMovie(v.VirtualModel):
    directors = VirtualPerson()  # <--- ...nested here

    class Meta:
        model = Movie

Hitting the MovieListView URL again, we get another exception:

MissingVirtualModelFieldException at /movies/
`nomination_count` used by `PersonSerializer` must be defined in `VirtualPerson` (nested on `VirtualMovie`)

Now let's add nomination_count to VirtualPerson:

class VirtualPerson(v.VirtualModel):
    nomination_count = v.Annotation(
        lambda qs, **kwargs: qs.annotate(
            nomination_count=Count("nominations")
        ).distinct()
    )

    class Meta:
        model = Person

Now the view works fine, without any N+1 queries or missing annotations! The VirtualModelListAPIView automatically uses the serializer and its virtual model to get an optimized queryset by overriding the get_queryset method.

Warning

If you override the get_queryset method without calling super, the virtual model integration will not work. Either call super().get_queryset() or use the GenericVirtualModelViewMixin in case you have custom base view classes. Read more about why on section "How DRF serializer integration works".

Warning

If you already use a custom serializer class, use the VirtualModelSerializerMixin. Check the source code to learn how it works.

Using filtered prefetches

If you need to prefetch a relationship with some filtering, you can define a method called get_prefetch_queryset inside the VirtualModel. Suppose we need to fetch only the nominations that a person won (with is_winner=True). It's simple to do that:

class VirtualAward(v.VirtualModel):
    class Meta:
        model = Nomination

    def get_prefetch_queryset(self, **kwargs):  # <--- filter call here
        return Nomination.objects.filter(is_winner=True)


class VirtualPerson(v.VirtualModel):
    awards = VirtualAward(
        lookup="nominations"  # <--- the non-filtered relation name
    )

    class Meta:
        model = Person

Using to_attr prefetches

As you can see in the previous example, you can specify the to_attr in nested virtual models, just as the regular Prefetch from Django. Remember to set both the lookup and the to_attr, as the left side attribute name corresponds to the field name in the serializer. In general, to_attr and the attribute name should be the same.

If you want the same functionality of Django's select_related, you can use a NestedJoin field in the virtual model:

class VirtualNomination(v.VirtualModel):
    person = v.NestedJoin(model_cls=Person)
    movie = v.NestedJoin(model_cls=Movie)

    class Meta:
        model = Nomination

With this, any access to nomination.person or nomination.movie won't make additional queries. Therefore, you can have nested serializers for those fields.

However, if those nested serializers have their own non-concrete fields (annotations or other nested serializers), then the virtual model will launch an exception with a message like this:

Cannot use a `NestedJoin` for `person` inside `VirtualNomination`.
`directors` must be a field of `person`,
because it's a nested serializer on `NominationSerializer`.
Change from `NestedJoin` to a `VirtualModel` and add `directors` as a field.

In other words, for further nesting you'll need nested VirtualModels (which are nested prefetches behind the scenes).

Deferring fields with deferred_fields

As Django's default behavior for querysets, Virtual Models load all concrete fields (all DB columns from the table) by default. If you have a model that contains concrete fields that are expensive to fetch, such as a TextField or a JSONField with lots of data, you can use deferred_fields to avoid fetching them:

class VirtualPerson(v.VirtualModel):
    class Meta:
        model = Person
        deferred_fields = ["biography"]

Note this only makes a difference if the biography field is not declared in your serializer's Meta.fields. If it's there, this library assumes you'll need that field and will include it in the queryset to prevent N+1 queries.

But if you don't have the biography field in your serializer, any access to person.biography will make a new query. This is the same behavior of using Person.objects.defer("biography") in regular Django.

Returning data relative to the current user

Sometimes it makes sense to include data relative to the current user in the virtual model fields.

Imagine you have a UserMovieRating model that relates users with movies and has a rating field:

class UserMovieRating(models.Model):
    user = models.ForeignKey("User", on_delete=models.CASCADE)
    movie = models.ForeignKey("Movie", related_name="ratings", on_delete=models.CASCADE)
    rating = models.DecimalField(max_digits=3, decimal_places=1)

You can add a field for user_rating in a MovieSerializer:

class MovieSerializer(v.VirtualModelSerializer):
    user_rating = serializers.DecimalField(max_digits=3, decimal_places=1)

    class Meta:
        model = Movie
        virtual_model = VirtualMovie
        fields = ["name", "user_rating"]

But you need to declare user_rating on VirtualMovie and ensure it is fetching the rating for the current user. This is possble with the following code:

class VirtualMovie(v.VirtualModel):
    user_rating = v.Annotation(
        # Note `user` is an argument here:
        lambda qs, user, **kwargs: qs.annotate(
            Subquery(
                UserMovieRating.objects.filter(
                    movie=OuterRef("pk"),
                    user=user
                ).values("rating")[:1]
            )
        )
    )

    class Meta:
        model = Movie

The user argument is available on v.Annotation thanks to v.VirtualModelSerializer that gets the current user from self.context["request"].user.

The user argument is also available on get_prefetch_queryset for use in filtered prefetches. Therefore, the following code works:

class VirtualUserMovieRating(v.VirtualModel):
    class Meta:
        model = UserMovieRating

    def get_prefetch_queryset(self, user, **kwargs):  # <--- here
        return UserMovieRating.objects.filter(user=user)

Warning

An advice: in general, you should avoid returning data relative to the current user in HTTP APIs, as this makes caching very hard or even impossible. Use this only if you really need it, as in a request that's only specific for users like user profile pages. Avoid nesting data related to the current user inside global data. Consider adding an additional request to fetch data relative to the current user, and then "hydrate" the previous request data on the frontend.

Using Serializer's context

Similar to how user param works, it's possible to use in virtual models any data that the view passes to the serializer by context.

Using the previous example, suppose you want to get the vote count equal to a specific value that is defined in the view. To do this, use the serializer context:

class MovieListView(v.VirtualModelListAPIView):
    serializer_class = MovieSerializer
    queryset = Movie.objects.all()

    def get_serializer_context(self):
        context = super().get_serializer_context()
        vote_value = 5
        return {**context, 'vote_value': vote_value}

Then, annotate the number of votes in the Virtual Model with the serializer_context param:

class VirtualMovie(v.VirtualModel):
    voting_count = v.Annotation(
        lambda qs, user, serializer_context, **kwargs: qs.annotate(
            voting_count=Count("ratings", filter=Q(ratings__rating=serializer_context['vote_value']))
        )
    )

    class Meta:
        model = Movie

And, to reflect the change, add the new field in MovieSerializer:

class MovieSerializer(v.VirtualModelSerializer):
    voting_count = serializers.IntegerField()

    class Meta:
        model = Movie
        virtual_model = VirtualMovie
        fields = ["name", "voting_count"]

Just like the user param, serializer_context can be used in the get_prefetch_queryset method.

Ignoring a serializer field

If you have a serializer field that fetches data from somewhere other than your Django models, you cannot prefetch data for it with a Virtual Model. So you need to make the Virtual Model ignore that field.

Suppose you have some serializer that fetches data from an external service:

class IMDBRatingField(serializers.Field):
    """
    Fetches the IMDB Rating for the movie.
    """
    ...

class MovieSerializer(v.VirtualModelSerializer):
    imdb_rating = IMDBRatingField()

    class Meta:
        model = Movie
        virtual_model = VirtualMovie
        fields = ["name", "imdb_rating"]

You need to mark imdb_rating as something to be ignored in the VirtualMovie. Use v.NoOp() for that:

class VirtualMovie(v.VirtualModel):
    imdb_rating = v.NoOp()

    class Meta:
        model = Movie

Prefetch hints with Django REST Framework

If your DRF serializer uses method fields, model methods or properties, you can still use Virtual Models with it! But you need to add some prefetch hints.

Prefetching for method fields (SerializerMethodField)

It's very common to use SerializerMethodField in DRF serializers. If those method fields need some virtual model fields, you need to specify that. Suppose you have the following virtual model and serializer:

class VirtualPerson(v.VirtualModel):
    awards = VirtualAward(lookup="nominations")

    class Meta:
        model = Person

class PersonSerializer(v.VirtualModelSerializer):
    has_won_any_award = serializers.SerializerMethodField()

    class Meta:
        model = Person
        virtual_model = VirtualPerson
        fields = ["name", "has_won_any_award"]

    def get_has_won_any_award(self, person):
        # How to ensure `person` will have `awards`?
        return len(person.awards) > 0

This serializer needs the awards field from VirtualPerson inside get_has_won_any_award method, otherwise it would raise an AttributeError as this is a virtual field. But how to specify that requirement? There are multiple ways of doing this:

Using type hints with hints.Virtual

This library supports special type hints for informing what virtual fields a serializer method needs:

# Python < 3.9. For >= 3.9, use `from typing import Annotated`:
from typing_extensions import Annotated
from django_virtual_models import hints

class PersonSerializer(v.VirtualModelSerializer):
    ...

    def get_has_won_any_award(
        self,
        # Type hint to ensure `awards` here:
        person: Annotated[Person, hints.Virtual("awards")]
    ):
        return len(person.awards) > 0

Now thanks to hints.Virtual("awards"), this serializer will be able to require the field awards from the virtual model. If necessary, you can pass multiple fields or even nested lookups: hints.Virtual("awards", "nominations__movie").

Note

hints.Virtual can be used to ensure the fetching of any deferred_fields a method needs without N+1s.

Note

Annotated is a built-in Python construct that allows us to add custom annotations that don't affect type checking.

Using from_types_of decorator

If your method calls another function, you can add type hints to that function with hints.Virtual and then use the from_types_of decorator:

# Python < 3.9. For >= 3.9, use `from typing import Annotated`:
from typing_extensions import Annotated
from django_virtual_models import hints

def check_person_has_won_any_award(
    # Type hint to ensure `awards` here:
    person: Annotated[Person, hints.Virtual("awards")]
):
    return len(person.awards) > 0

class PersonSerializer(v.VirtualModelSerializer):
    ...

    # Decorator to specify where to find the type hint here:
    @hints.from_types_of(check_person_has_won_any_award, "person")
    def get_has_won_any_award(self, person, check_person_has_won_any_award):
        return check_person_has_won_any_award(person)

Note the function check_person_has_won_any_award is received in the decorator and passed to the method as an additional argument. This allow serializer methods to fully delegate their logic to some external helper function. Use that only if you really need that kind of indirection.

If you already defined the field on the virtual model

If you have the has_won_any_award field in your virtual model, you can use the defined_on_virtual_model decorator:

from django_virtual_models import hints

class PersonSerializer(v.VirtualModelSerializer):
    ...

    @hints.defined_on_virtual_model()  # <--- here
    def get_has_won_any_award(self, person):
        return person.has_won_any_award

If you only use concrete fields

If your method field only uses concrete fields, in other words, it doesn't depend on any virtual model fields, nor any deferred_fields, you can use the no_deferred_fields decorator:

from django_virtual_models import hints

class PersonSerializer(v.VirtualModelSerializer):
    name_title_case = serializers.SerializerMethodField()

    class Meta:
        model = Person
        virtual_model = VirtualPerson
        fields = ["name_title_case"]

    @hints.no_deferred_fields()  # <--- here
    def get_name_title_case(self, person):
        return person.name.title()

Prefetching for model properties and methods

DRF Serializers can use directly model properties and methods in Meta.fields. If your serializer uses a model property or method, you need to use the same hints described by the previous section "Prefetching for method fields". For example:

from django_virtual_models import hints

class Person(models.Model):
    name = models.CharField(max_length=255)

    @property  # <--- property must come before
    @hints.no_deferred_fields()
    def name_title_case(self):
        return self.name.title()

    # method w/o params, DRF supports that too:
    def has_won_any_award(self: Annotated[Person, hints.Virtual("awards")]):
        return len(person.awards) > 0

class PersonSerializer(v.VirtualModelSerializer):
    class Meta:
        model = Person
        virtual_model = VirtualPerson
        fields = ["name_title_case", "has_won_any_award"]  #  <--- this works

Using Virtual Models manually

Getting optimized querysets

If you do not use Django REST Framework, you can still benefit from using Virtual Models to avoid maintaining annotations and prefetches across several parts of your codebase. By centralizing the read logic in virtual models, you reduce cognitive load and change amplification.

Consider the following virtual models:

class VirtualAward(v.VirtualModel):
    class Meta:
        model = Nomination

    def get_prefetch_queryset(self, **kwargs):
        return Nomination.objects.filter(is_winner=True)


class VirtualPerson(v.VirtualModel):
    awards = VirtualAward(lookup="nominations")
    nomination_count = v.Annotation(
        lambda qs, **kwargs: qs.annotate(
            nomination_count=Count("nominations")
        ).distinct()
    )

    class Meta:
        model = Person


class VirtualMovie(v.VirtualModel):
    directors = VirtualPerson()

    class Meta:
        model = Movie

You can hydrate an existing queryset with those virtual fields by using a syntax similar to Django lookups:

qs = Movie.objects.order_by("name")
optimized_qs = VirtualMovie().get_optimized_queryset(
    qs,
    lookup_list=[
        "directors__awards",
        "directors__nomination_count",
    ]
)

When the lookup_list omits a virtual field, it is not included in the optimized queryset.

If you want to get all virtual fields, pass lookup_list=None or omit the parameter.

Passing down **kwargs

The keyword arguments you pass in the virtual model constructor are passed down to v.Annotate and get_prefetch_queryset of all nested virtual models.

How DRF serializer integration works

The main logic that connects DRF serializers with Virtual Models is on LookupFinder (source code). The LookupFinder automatically discovers what virtual fields are needed and raise exceptions if they're not available in the virtual model associated with the serializer. It has a method called recursively_find_lookup_list that returns a lookup_list to be passed to serializer's virtual model. recursively_find_lookup_list is called on get_queryset from GenericVirtualModelViewMixin (source code).

If necessary, you can customize this logic in your own view classes.