Friday, May 8, 2015

How do you deal with large collections in a REST API with Python?

I'm trying to use Python to create a REST API. As a framework I use Flask, but I'm not using Flask-RESTful because, for one thing, it doesn't follow the DRY principle (IMHO).

Let's take an example.

Suppose I have a collection resource:

Products -> product -> reviews -> review

So the URLs would be:

/products/
/products/id/
/products/id/reviews
/products/id/reviews/id

So in Flask I would have to repeat the /products/ prefix four times. Flask-RESTful doesn't reflect the fact that resources form a hierarchy, so I wrote my own code that does.

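from abc import ABC, abstractmethod

import jsonpickle
from flask import request
from jsonpickle.pickler import Pickler


# Assumed: the project defines this exception elsewhere; a plain subclass is enough.
class ResourceNotFoundError(Exception):
    """Raised by do_get() when the requested entity does not exist."""


class ResourceInterface(ABC):
    """on_* methods handle HTTP and serialization; do_* methods hold the resource logic."""

    @abstractmethod
    def get_path(self):
        pass

    @abstractmethod
    def set_path(self, path):
        pass

    @abstractmethod
    def on_get(self, *args, **kwargs):
        pass

    @abstractmethod
    def on_post(self, *args, **kwargs):
        pass

    @abstractmethod
    def on_delete(self, *args, **kwargs):
        pass

    @abstractmethod
    def on_put(self, *args, **kwargs):
        pass

    @abstractmethod
    def do_put(self, *args, **kwargs):
        pass

    @abstractmethod
    def do_post(self, *args, **kwargs):
        pass

    @abstractmethod
    def do_get(self, *args, **kwargs):
        pass

    @abstractmethod
    def do_delete(self, *args, **kwargs):
        pass


class BaseResource(ResourceInterface):
    """Serializes whatever the concrete do_* methods return; subclasses implement do_*."""

    def __init__(self, repository, entity_serializer, path=None):
        self.entity_serializer = entity_serializer
        self.repository = repository
        self.path = None
        self.set_path(path)
        # unpicklable=False omits jsonpickle's type tags, so the output is plain JSON
        self.pickler = Pickler(unpicklable=False)

    def on_put(self, *args, **kwargs):
        pickled_output = self.pickler.flatten(self.do_put(*args, **kwargs))
        return jsonpickle.json.encode(pickled_output)

    def on_post(self, *args, **kwargs):
        pickled_output = self.pickler.flatten(self.do_post(*args, **kwargs))
        return jsonpickle.json.encode(pickled_output)

    def on_get(self, *args, **kwargs):
        try:
            pickled_output = self.pickler.flatten(self.do_get(*args, **kwargs))
            fields = request.args.get('fields')
            # Only a single entity (a dict) is filtered; collections pass through as-is.
            if fields and isinstance(pickled_output, dict):
                # ?fields=name,price -> {'name', 'price'}; exact names, not substring
                # matches as with `field not in "name,price"`
                wanted = set(fields.split(','))
                # Iterate over a copy: deleting while iterating a dict view fails in Python 3
                for field in list(pickled_output):
                    if field not in wanted:
                        del pickled_output[field]
            return jsonpickle.json.encode(pickled_output)
        except ResourceNotFoundError as e:
            return str(e), 404

    def on_delete(self, *args, **kwargs):
        pickled_output = self.pickler.flatten(self.do_delete(*args, **kwargs))
        return jsonpickle.json.encode(pickled_output)

    def get_path(self):
        return self.path

    def set_path(self, path):
        self.path = path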

This is part of my code; I think it mostly speaks for itself.
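The classes above don't show how the nested URLs themselves get registered. A minimal sketch, assuming a hypothetical nested-dict description of the hierarchy (the names `build_rules` and `HIERARCHY` are illustrative), that writes each path prefix only once and produces `(rule, endpoint)` pairs:

```python
from typing import Dict, List, Tuple

# Hypothetical hierarchy description: collection name -> (id parameter, child collections)
Tree = Dict[str, Tuple[str, "Tree"]]


def build_rules(tree: Tree, prefix: str = "") -> List[Tuple[str, str]]:
    """Return (url_rule, endpoint) pairs; each parent prefix is reused, never retyped."""
    rules: List[Tuple[str, str]] = []
    for name, (id_param, children) in tree.items():
        collection = f"{prefix}/{name}/"
        item = f"{collection}<int:{id_param}>/"
        rules.append((collection, f"{name}_collection"))
        rules.append((item, f"{name}_item"))
        # Child collections hang off the item URL of their parent
        rules.extend(build_rules(children, item.rstrip("/")))
    return rules


HIERARCHY: Tree = {"products": ("product_id", {"reviews": ("review_id", {})})}

for rule, endpoint in build_rules(HIERARCHY):
    print(rule, endpoint)
```

Each pair could then be handed to Flask's `app.add_url_rule(rule, endpoint, view_func)` together with the matching resource's view. Endpoint names are derived from the collection name alone, so two parents with a same-named child would need a longer name.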

The problem now is that I don't understand the proper way to deal with large collections. Suppose, for instance, that the client doesn't want a product's reviews data because it's too big.

My code deals with this through a 'fields' query string, so the client can say exactly which fields it wants. But that only works for one level of the hierarchy.

What should I do when the client requests /products/ and doesn't want the products in the response to include their reviews? In my case, a concrete answer is to walk every level of the response dictionary (with DFS or BFS) and remove the fields that weren't listed in the 'fields' query string. But that's not a general answer, because the same key name can appear at different levels of the dictionary.
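One way around the name collisions (an assumption here, not something the code above supports) is to make each requested field a dotted path, e.g. `?fields=id,reviews.rating`, so every key is qualified by its level. A minimal sketch with hypothetical `parse_fields` and `prune` helpers, operating on the plain dicts and lists that the pickler's `flatten` produces:

```python
from typing import Any, Dict, Iterable


def parse_fields(spec: Iterable[str]) -> Dict[str, dict]:
    """Turn ["id", "reviews.rating"] into {"id": {}, "reviews": {"rating": {}}}."""
    tree: Dict[str, dict] = {}
    for path in spec:
        node = tree
        for part in path.split("."):
            node = node.setdefault(part, {})
    return tree


def prune(data: Any, tree: Dict[str, dict]) -> Any:
    """Keep only the selected keys; an empty subtree means 'keep this value whole'."""
    if not tree:
        return data
    if isinstance(data, list):
        # A collection: apply the same selection to every element
        return [prune(element, tree) for element in data]
    if isinstance(data, dict):
        return {key: prune(data[key], sub) for key, sub in tree.items() if key in data}
    return data


products = [{"id": 1, "name": "Lamp", "reviews": [{"id": 7, "rating": 5, "text": "Bright"}]}]
print(prune(products, parse_fields(["name"])))             # [{'name': 'Lamp'}]
print(prune(products, parse_fields(["id", "reviews.rating"])))  # [{'id': 1, 'reviews': [{'rating': 5}]}]
```

Here the top-level `id` is kept while each review's `id` is dropped, because the selection is matched level by level rather than by bare key name. In `on_get`, `parse_fields(fields.split(','))` and `prune` could replace the flat deletion loop, and they work for collections as well as single items.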

How does REST deal with this? How can I implement it in Python?

Maybe I should use cursors for all collections, always return a cursored (paginated) collection, and forget about the 'fields' query string?
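A minimal sketch of cursor pagination, assuming the items are sorted by a numeric `id` and using a hypothetical opaque cursor that simply base64-encodes the last `id` returned. Unlike 'fields', this bounds the response size however large a collection grows; the two approaches can be combined.

```python
import base64
import json
from typing import Any, Dict, List, Optional


def encode_cursor(last_id: int) -> str:
    """Opaque token for the client; it only records the last id already returned."""
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]


def paginate(items: List[Dict[str, Any]], cursor: Optional[str] = None,
             limit: int = 2) -> Dict[str, Any]:
    """Return one page of items (sorted by 'id') plus a cursor for the next page, if any."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    after = decode_cursor(cursor) if cursor else None
    remaining = [item for item in items if after is None or item["id"] > after]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    return {
        "items": page,
        "next_cursor": encode_cursor(page[-1]["id"]) if has_more else None,
    }


reviews = [{"id": i} for i in range(1, 6)]
first = paginate(reviews)
second = paginate(reviews, first["next_cursor"])
print([r["id"] for r in first["items"]], [r["id"] for r in second["items"]])
```

With a real database the filter `id > after` would be part of the query (with a `LIMIT`), rather than a scan over an in-memory list.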

Or is a different class hierarchy needed, such as CollectionResource, ItemResource, etc.?
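One way such a split could look, sketched with hypothetical `CollectionResource` and `ItemResource` classes that are independent of Flask (the `<id>` placeholder in paths is also an assumption): an item links to its sub-collections instead of embedding them, and a collection returns only summary fields plus each item's URL. A client then fetches `/products/3/reviews/` only when it actually wants the reviews.

```python
from typing import Any, Dict, List


class ItemResource:
    """One entity; child collections are represented as links, not embedded data."""

    def __init__(self, path: str, children: List[str]):
        self.path = path          # e.g. "/products/<id>/"
        self.children = children  # e.g. ["reviews"]

    def url_for(self, entity: Dict[str, Any]) -> str:
        return self.path.replace("<id>", str(entity["id"]))

    def represent(self, entity: Dict[str, Any]) -> Dict[str, Any]:
        # Drop embedded child collections and point to them instead
        body = {k: v for k, v in entity.items() if k not in self.children}
        url = self.url_for(entity)
        body["links"] = {name: f"{url}{name}/" for name in self.children}
        return body


class CollectionResource:
    """A collection; returns small summaries plus each item's URL."""

    def __init__(self, item: ItemResource, summary_fields: List[str]):
        self.item = item
        self.summary_fields = summary_fields

    def represent(self, entities: List[Dict[str, Any]]) -> Dict[str, Any]:
        items = []
        for entity in entities:
            summary = {k: entity[k] for k in self.summary_fields if k in entity}
            summary["href"] = self.item.url_for(entity)
            items.append(summary)
        return {"items": items}


product = {"id": 3, "name": "Lamp", "reviews": [{"id": 1, "rating": 4}]}
product_item = ItemResource("/products/<id>/", ["reviews"])
products = CollectionResource(product_item, ["id", "name"])
print(product_item.represent(product))
print(products.represent([product]))
```

The collection's `represent` could also return a paginated page rather than every item, which keeps `/products/` small regardless of how many products or reviews exist.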
