Multiple families workflow

I’m increasingly using more families as a way to tag and categorize instances. This gives me quite a bit of control over which instances I want to get a hold of, without having to know specifically what those instances are called.

What I am missing a lot of the time is finer control over which instances a plugin processes. Take these instances:

class Collect(pyblish.api.ContextPlugin):

    order = pyblish.api.CollectorOrder

    def process(self, context):

        instance = context.create_instance(name="A")
        instance.data["families"] = ["alembic", "local"]

        instance = context.create_instance(name="B")
        instance.data["families"] = ["alembic", "farm"]

        instance = context.create_instance(name="C")
        instance.data["families"] = ["renderlayer", "local"]

Say I want a plugin to process instance A only. I have to do further filtering on the passed-in instances:

class Plugin(pyblish.api.InstancePlugin):

    order = pyblish.api.ValidatorOrder

    def process(self, instance):

        # Filter to instances that have "alembic" AND "local" families.
        families = instance.data["families"]
        if "alembic" not in families or "local" not in families:
            return

        self.log.info(str(instance))

These might be edge cases, but when using multiple families I tend to find I have to do this a lot. My question is whether we could (and should?) have a way for plugins to process only the instances that fulfill all of the families?

It’s an interesting topic, I don’t think it’s an edge case but rather something fundamental to the mindset when building these associations.

I think we’re looking at three potential matching algorithms, where only the first one is currently in place.

  1. Intersection
  2. Subset
  3. Exact match

# 1. Include on any match
assert set(["a", "b"]).intersection(["b", "c"])

# 2. Include on all match
assert set(["a", "b"]).issubset(["a", "b", "c"])

# 3. Include on exact match
assert ["a", "b"] == ["a", "b"]

@tokejepsen which of the subsequent two algorithms do you think would be a good fit for your usecase?

I would say Subset would be the best algorithm.

In the case where you can’t get the instances you want with Subset, you either have to revise which families you are searching for in the plugin, or which families are available from collection.

The whole point of tagging the instances is to be able to get certain instances without knowing all the tags. An Exact match algorithm would be similar to the current Intersection, where we are just using longer descriptive families.
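To make that concrete, here is how the three algorithms would treat the instances from my first post, assuming a plug-in that declares families = ["alembic", "local"]. This is plain Python on sets, just to show the logic:

plugin_families = {"alembic", "local"}

instances = {
    "A": {"alembic", "local"},
    "B": {"alembic", "farm"},
    "C": {"renderlayer", "local"},
}

for name, families in instances.items():
    intersection = bool(plugin_families & families)  # A, B and C all match
    subset = plugin_families.issubset(families)      # only A matches
    exact = plugin_families == families              # only A matches
    print(name, intersection, subset, exact)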

The next question then is how to best describe this relationship.

Perhaps the most straightforward and naive way of achieving it, would be with a global setting.

pyblish.api.matching_algorithm = "subset"

The problem with that being that you’ve now limited all plug-ins, even those from packages outside of your control, like the default ones with pyblish-maya or some third-party provider, to this one algorithm, which would most likely cause them to behave in unexpected ways.

Another approach might be to add another property to a plug-in (or instance?).

class MyPlugin(...):
  matching_algorithm = "subset"

# Or..
instance.data["matching_algorithm"] = "subset"

But I can’t see ahead of time where (if at all) that might be an insufficient means of describing it.

Can you think of any other approach?

I definitely think it should be a plugin property.
Although having it on the instance might be interesting, I don’t see a use case for it, mostly because in my head instances are for storing data, and plugins are for processing.

What I’m not fond of is magical keywords that mean something to Pyblish but nothing to a user. How about assigning the matching algorithm to the plugin property?

class MyPlugin(...):
    matching_algorithm = pyblish.api.algorithms.families_subset

In theory you could make your own matching algorithms; for example, I think @mkolar and I were once talking about how to match all plugins that have a certain data member.
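Something like this maybe, where the algorithm is just a callable; the function name, signature and the ftrackData member are made up for illustration:

import pyblish.api


def has_ftrack_data(plugin, instance):
    """Match any instance that carries a particular data member."""
    return "ftrackData" in instance.data


class MyPlugin(pyblish.api.InstancePlugin):
    matching_algorithm = has_ftrack_data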

Perfect, I think it sounds really good.

I’ll put a PR up now and we can talk implementation specifics in there.

Ok, let me know what you think.

Brain dump (sorry!)

This might be slightly off track but I wanted to present an idea which has been stuck in my head for a while. What if the plug-in would be able to tell whether it is compatible with an instance? Say:

plugin.is_compatible(instance)

This could be called to identify whether a plug-in should be run to process a particular instance. This would also take some of the stress away from implementing similar functionality in UIs, since the UI could ask the plug-in whether it’s compatible. For example, a UI could decide to hide a plug-in when it’s not compatible with anything (no Context or Instance) and as such won’t be run.

The interesting bit there would be that a plug-in could be exactly tailored to run in a very specific situation if the user decides to override it.

class MyPlugin():
    def is_compatible(self, context):
        for instance in context:
            if "x" in instance:
                return True
        return False

Whether it should receive the context or instance (or maybe both is_compatible(instance_or_context)?) is still a question. Because you might want to run it only for the Instance or only for the Context. Or only once for the Context if a particular Instance is present. It could also just return (instead of True/False) the instances that should be processed.

class MyPlugin():
    def is_compatible(self, node):
        # ignore context
        if is_context(node):
            return False

        if is_instance(node):
            return True

To remain backwards compatible, the default implementation could just be the current behavior.
The is_compatible method could then also implement the variable family-matching algorithms by default.
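For example, a minimal sketch of what that default could look like, assuming the proposed is_compatible name and the current “include on any match” behaviour:

import pyblish.api


class Plugin(pyblish.api.InstancePlugin):

    families = ["*"]

    def is_compatible(self, instance):
        # Current behaviour: a wildcard matches everything, otherwise any
        # overlap between plug-in and instance families is enough.
        if "*" in self.families:
            return True
        return bool(set(self.families) & set(instance.data.get("families", [])))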

That is an interesting approach @BigRoy.

The functionality is what I was referring to when talking about custom matching algorithms.
I do like the approach of overriding the function as you would normally do when subclassing.

As for whether an instance or context gets passed in, it could depend on whether the plugin is an InstancePlugin or ContextPlugin. Or maybe that is too magical?

Forgot to say that one problem with subclassing could be providing people with options, similar to the Subset and Exact algorithms.
I guess you could still provide these algorithms via the api.
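Something along these lines, with the helper names purely illustrative (not actual api members):

import pyblish.api


def subset(plugin, instance):
    return set(plugin.families).issubset(instance.data.get("families", []))


def exact(plugin, instance):
    return set(plugin.families) == set(instance.data.get("families", []))


class MyPlugin(pyblish.api.InstancePlugin):

    families = ["alembic", "local"]

    def is_compatible(self, instance):
        # Opt into one of the shipped algorithms instead of writing my own.
        return subset(self, instance)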

Complex is easy, simple is hard.

When it comes to introducing new functionality, especially one that is inherently more complex, I try and look towards the benefits of keeping it simple.

  1. Less of a learning curve
  2. Easier to understand other peoples code
  3. Easier to maintain

And then think about whether the advantages of the new functionality outweigh these.

In this case, the ability to define custom matching algorithms, albeit cool and logical the way you’ve proposed it, does it add more value than cost?

At one extreme, I see a future where every plug-in defines a corresponding matching algorithm. At that point, even beginning to understand a series of plug-ins - especially those mixed and matched from elsewhere - would take considerably more time.

On the other extreme, where there is only one matching algorithm, you need to get creative with the little you’ve got in order to achieve complex behaviour.

Having said that, it’s also possible that families as it exists today is a subset of what this does. That defining your own matching algorithm could be the de facto method of associating plug-ins to instances and that it makes for both more flexibility and simplicity.

Let’s explore it.

First off, I’m interested in what you mentioned @tokejepsen about your requirement in pyblish-ftrack and what hoops you jump to currently in order to achieve the necessary effect.

The main problem was originally that we couldn’t process all instances, because they wouldn’t all have the same family name. This was back when you could only associate a single family with an instance. So we decided to process all instances, but just return early when a certain data member was missing.
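Roughly like this; the plug-in name and the ftrackData member are illustrative:

import pyblish.api


class IntegrateFtrack(pyblish.api.InstancePlugin):

    order = pyblish.api.IntegratorOrder

    def process(self, instance):
        # Process every instance, but bail out early on the ones that were
        # never meant for ftrack.
        if "ftrackData" not in instance.data:
            return

        self.log.info("Publishing %s to ftrack" % instance)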

These days with multiple families we could easily utilize a certain family name like ftrack to figure out which instances to process. The user could easily add another family to an instance. So there isn’t actually much of an argument for matching algorithms.
There is the edge case where someone is using the family ftrack for something else. At that point a matching algorithm that only processes the instances that have the correct data would be beneficial, but this case is highly unlikely to happen. Even the crossover of different plugin packages has yet to be an issue.

The above api.Intersection, api.Subset and api.Exact algorithms have now been merged and released as 1.4.3.

$ pip install pyblish-base

Just tested this with a new plugin, and it works great :smile:

We just switched to using this feature in our tech preview. So now we’re collecting families like:

instance = context.create_instance(
   node.name(), families=['ftrack', 'camera']
)

And on the plugins:

families = ['ftrack', 'camera']
match = pyblish.api.Subset

Seems to work well - nice addition :slight_smile:

.match attribute added to API documentation.

I’ve noticed one workflow where a custom matching algorithm would be useful.

If you collect some instances and want to append some data to them with a different collector. The case is host-specific instances, where you want to append data to instances across hosts without duplicating the code per host.
Once the instances are created, they have to be publishable to be processed by other InstancePlugins. Here I’m using a ContextPlugin and manually filtering to the correct instances, to force processing all instances.

Would it be possible to post an example of this?

Sure :smile:

Maya Collector

class MayaCollector(pyblish.api.ContextPlugin):

    order = pyblish.api.CollectorOrder
    hosts = ["maya"]

    def process(self, context):

        instance = context.create_instance("A")
        instance.data["families"] = ["farm"]
        instance.data["publish"] = False

Houdini Collector

class HoudiniCollector(pyblish.api.ContextPlugin):

    order = pyblish.api.CollectorOrder
    hosts = ["houdini"]

    def process(self, context):

        instance = context.create_instance("B")
        instance.data["families"] = ["farm"]
        instance.data["publish"] = False

Append data collector

class AppendData(pyblish.api.InstancePlugin):

    order = pyblish.api.CollectorOrder + 0.1
    families = ["farm"]

    def process(self, instance):

        instance.data["SomeData"] = {"something": "else", "some": 1}

Do you mean that because of instance.data["publish"] = False, the AppendData plug-in has no effect? You would have expected those instances to be processed by plug-ins within the Collection order even with this publish member set to False?

How about a post plug-in, to determine defaults?

class HoudiniCollector(pyblish.api.ContextPlugin):

    order = pyblish.api.CollectorOrder
    hosts = ["houdini"]

    def process(self, context):

        instance = context.create_instance("B")
        instance.data["families"] = ["farm"]
        # instance.data["publish"] = False

class AppendData(pyblish.api.InstancePlugin):

    order = pyblish.api.CollectorOrder + 0.1
    families = ["farm"]

    def process(self, instance):
        instance.data["SomeData"] = {"something": "else", "some": 1}

class SetPublish(pyblish.api.InstancePlugin):

    order = pyblish.api.CollectorOrder + 0.2
    families = ["farm"]

    def process(self, instance):
        instance.data["publish"] = something is True

I think the reason I’d avoid custom algorithms is (1) the added learning curve for anyone looking to learn your plug-ins, (2) the lessened re-usability and (3) the intermixing of plug-ins. The advantage would need to be rather significant to justify such a sacrifice, and there would ideally be no workaround, or at least one that was significantly more difficult to manage.

I think if you could show me a way to implement it without encountering the 3 cons above, that would be a great starting point for the feature.