Cooperative Collection

marcus · August 3, 2015, 1:03pm

It’s been touched upon before, but I thought I’d make a formal exploration of the topic of creating one Instance using multiple Collectors.

Goal

To simplify Collection.

Motivation

A goal of Pyblish is for validation to be as general and as encompassing as possible. Ultimately I’d like there to be a wide, global repository of validations that anyone can benefit from and that applies to everything from the most general to the most specific asset requirements, whilst still being technically compatible with any Instance.

To accomplish this, more responsibility must be delegated to Collection. It is the Collector’s job to map information from the complex per-studio asset into a format compatible with Pyblish.

Example

If an Instance is to be tested for height, a collector responsible for finding and storing height must be present. To avoid modifying the same collector each time a new validator appears, a new collector can be added to support it.

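import pyblish.api

class CollectHeight(pyblish.api.Collector):
    families = ["model", "rig"]

    def process(self, context):
        # `pipeline` and `host` are not defined in this snippet
        for instance_name in pipeline.ls():
            # Create the instance only if no other collector has done so
            if instance_name not in context:
                context.create_instance(instance_name)

            instance = context[instance_name]
            host.compute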

Implementation

There are two major approaches to cooperative collection.

Ordered
Unordered

In a nutshell, they are each other’s opposites; unordered favours independence and compatibility, whereas ordered favours less code and higher performance.

Ordered cooperative collection (OCC) is simple: CollectorB depends on CollectorA, i.e. CollectorA must process before CollectorB.

import pyblish.api
import pyblish.util

class CollectorA(pyblish.api.Collector):
    order = pyblish.api.Collector.order + 0.0

    def process(self, context):
        my_instance = context.create_instance(
            name="MyInstance",
            family="MyFamily")

        my_instance.set_data("age", 12)

class CollectorB(pyblish.api.Collector):
    order = pyblish.api.Collector.order + 0.1

    def process(self, context):
        # This would break unless A ran first
        my_instance = context["MyInstance"]
        my_instance.set_data("height", 1.12)

pyblish.api.register_plugin(CollectorA)
pyblish.api.register_plugin(CollectorB)

context = pyblish.util.publish()
print(context["MyInstance"].data("height"))
# 1.12

Conversely, unordered cooperative collection (UCC) means Collectors can run in any order and still produce identical results.

import pyblish.api
import pyblish.util


class CollectorA(pyblish.api.Collector):
    def process(self, context):
        if "MyInstance" not in context:
            context.create_instance(
                name="MyInstance",
                family="MyFamily")

        my_instance = context["MyInstance"]
        my_instance.set_data("age", 12)

class CollectorB(pyblish.api.Collector):
    def process(self, context):
        if "MyInstance" not in context:
            context.create_instance(
                name="MyInstance",
                family="MyFamily")

        my_instance = context["MyInstance"]
        my_instance.set_data("height", 1.12)

pyblish.api.register_plugin(CollectorA)
pyblish.api.register_plugin(CollectorB)

context = pyblish.util.publish()
print(context["MyInstance"].data("height"))
# 1.12

Observations

Here are some observations of the approaches so far.

Ordered Pros

The relationship between two or more collectors is clear; one must come before the other
Subsequent collectors can communicate by passing data from one Instance to the other.

Ordered Cons

Encourages tight coupling between collectors
Difficult to re-use (as they depend on each other)
Difficult to test (as they can’t run without each other)
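The testing problem can be sketched without Pyblish. In this simplified illustration the context is a plain dictionary and the collectors are ordinary classes (both are assumptions made for the example, not the Pyblish API); the second collector only works once the first has run.

```python
class CollectorA(object):
    order = 0.0

    def process(self, context):
        context["MyInstance"] = {"age": 12}


class CollectorB(object):
    order = 0.1

    def process(self, context):
        # Assumes CollectorA has already created the instance
        context["MyInstance"]["height"] = 1.12


def run_alone(collector):
    """Process a single collector against an empty context."""
    context = {}
    try:
        collector().process(context)
    except KeyError as error:
        return "failed, missing %s" % error
    return "ok"


print(run_alone(CollectorA))
# ok
print(run_alone(CollectorB))
# failed, missing 'MyInstance'
```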

Unordered Pros

Mixable; any collector can be added to contribute to the final Instance without regard to what comes before it.
Testable; without ordering, testing can happen in isolation

Unordered Cons

More code; nothing can be assumed, so everything must be queried before it is used.
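Some of that extra code could be factored out. A minimal sketch, assuming nothing beyond the membership check, create_instance and item lookup used in the UCC example; get_or_create is a hypothetical helper rather than part of Pyblish, and FakeContext and FakeInstance exist only so the snippet runs without Pyblish installed.

```python
def get_or_create(context, name, family):
    """Return the named instance, creating it first if no collector has yet."""
    if name not in context:
        context.create_instance(name=name, family=family)
    return context[name]


# Stand-ins so the sketch runs without Pyblish (assumptions, not the real API)
class FakeInstance(dict):
    def set_data(self, key, value):
        self[key] = value


class FakeContext(dict):
    def create_instance(self, name, family):
        self[name] = FakeInstance(family=family)
        return self[name]


class CollectAge(object):
    def process(self, context):
        get_or_create(context, "MyInstance", "MyFamily").set_data("age", 12)


class CollectHeight(object):
    def process(self, context):
        get_or_create(context, "MyInstance", "MyFamily").set_data("height", 1.12)


context = FakeContext()
CollectHeight().process(context)  # runs first, so it creates the instance
CollectAge().process(context)     # finds and augments the existing instance
print(context["MyInstance"])
```

Each collector still queries before use, but the guard lives in one place instead of being repeated in every plug-in.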

It would seem that from a long-term perspective, where validations are written not just by a single developer but need to be interchangeable with those of others, UCC is favourable.

UCC enables unknown validations to be plugged into an existing plug-in stack and to append to existing Instances without disrupting prior collectors.

OCC danger

When you couple collectors by their order, you must take care when modifying data. This is a typical multi-process problem known as a “race condition”.

import pyblish.api
import pyblish.util

class CollectorA(pyblish.api.Collector):
    order = pyblish.api.Collector.order + 0.0

    def process(self, context):
        my_instance = context.create_instance(
            name="MyInstance",
            family="MyFamily")
        my_instance.set_data("members", [1])

class CollectorB(pyblish.api.Collector):
    order = pyblish.api.Collector.order + 0.1

    def process(self, context):
        my_instance = context["MyInstance"]
        my_instance.data("members").append(2)

class CollectorC(pyblish.api.Collector):
    order = pyblish.api.Collector.order + 0.2

    def process(self, context):
        my_instance = context["MyInstance"]
        my_instance.data("members").append(3)

pyblish.api.register_plugin(CollectorA)
pyblish.api.register_plugin(CollectorB)
pyblish.api.register_plugin(CollectorC)

context = pyblish.util.publish()
print(context["MyInstance"].data("members"))
# [1, 2, 3]

From here, you can build upon the knowledge that members will always be a list of incrementing integers. The problem arises when an external or unknown collector is introduced.

class CollectorAB(pyblish.api.Collector):
    order = pyblish.api.Collector.order + 0.15

    def process(self, context):
        my_instance = context["MyInstance"]
        my_instance.data("members").append(0.5)

The resulting members list now includes a floating point number at an unexpected position.

# [1, 2, 0.5, 3]

Hence there is no way to guarantee the value of members unless you first gain full insight into each collector added to your stack, which can be difficult if validations come from elsewhere and are unknown to you.
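The effect can be reproduced without Pyblish. The sketch below assumes only that plug-ins are processed in ascending order, as the examples above rely on; the plain classes, the shared members list and the run function are simplifications made up for this illustration.

```python
class CollectorA(object):
    order = 0.0

    def process(self, members):
        members.append(1)


class CollectorB(object):
    order = 0.1

    def process(self, members):
        members.append(2)


class CollectorC(object):
    order = 0.2

    def process(self, members):
        members.append(3)


class CollectorAB(object):
    # Unknown, third-party collector that lands between B and C
    order = 0.15

    def process(self, members):
        members.append(0.5)


def run(plugins):
    """Process plug-ins in ascending order and return the shared list."""
    members = []
    for plugin in sorted(plugins, key=lambda p: p.order):
        plugin().process(members)
    return members


print(run([CollectorA, CollectorB, CollectorC]))
# [1, 2, 3]
print(run([CollectorA, CollectorB, CollectorC, CollectorAB]))
# [1, 2, 0.5, 3]
```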

Discussion

I have yet to test things out in practice. It’s likely that things aren’t as solid as they seem and that the amount of duplicated code outweighs the benefit of cooperative collection. If not, I see a very bright future ahead, one I will share with you shortly.

Do try it out, UCC in particular, and share your experiences here.

marcus · August 5, 2015, 6:49am

Here’s a practical example of this.

import pyblish.api
import pyblish.util

scene = ["arms_GEO", "legs_GEO", "pointcache_SET"]

class CollectAsset(pyblish.api.Collector):
    def process(self, context):
        asset = context.create_asset("MyAsset")
        asset[:] = scene

class CollectIsAnimatable(pyblish.api.Collector):
    order = CollectAsset.order + 0.1

    def process(self, asset):
        asset.set_data("isAnimatable", "controls_SET" in asset)

class ValidateIsAnimatable(pyblish.api.Validator):
    def process(self, asset):
        assert asset.data("isAnimatable"), "ERROR: %s is not animatable" % asset

pyblish.api.register_plugin(CollectAsset)
pyblish.api.register_plugin(CollectIsAnimatable)
pyblish.api.register_plugin(ValidateIsAnimatable)

context = pyblish.util.publish()

asset = context["MyAsset"]

print("%s is animatable: %s" % (asset, asset.data("isAnimatable")))

# ERROR: MyAsset is not animatable
# MyAsset is animatable: False

Here’s what’s happening.

A collector gathers raw data from the “scene”, a mock of, say, cmds.ls().
A separate collector processes the asset, as opposed to the context. This collector will run on each previously collected asset individually, similar to how subsequent plug-ins work.
The separate collector parses the previously collected asset to determine the particular set of data it is designed to find; isAnimatable. This collector has intimate knowledge of how this information is found, such as:

That it has to do with a child being present
That this child must have a particular name

A generic validator determines whether the asset is valid.

Here, each collector remains simple and assets can be augmented by any number of additional collectors.
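For anyone wanting to try the flow without the per-asset processing described above being available, here is a rough stand-alone sketch. It assumes a made-up Asset class and plain functions in place of plug-ins; none of these names come from Pyblish, and the loop at the end merely imitates a collector being run once per collected asset.

```python
scene = ["arms_GEO", "legs_GEO", "pointcache_SET"]


class Asset(list):
    """Hypothetical asset: a named list of members plus a data dictionary."""

    def __init__(self, name):
        super(Asset, self).__init__()
        self.name = name
        self.data = {}


def collect_asset(context):
    # Gather raw members from the mocked scene
    asset = Asset("MyAsset")
    asset[:] = scene
    context[asset.name] = asset


def collect_is_animatable(asset):
    # Runs per asset and knows how "isAnimatable" is derived
    asset.data["isAnimatable"] = "controls_SET" in asset


def validate_is_animatable(asset):
    # Generic check that relies only on the collected data
    assert asset.data["isAnimatable"], "%s is not animatable" % asset.name


context = {}
collect_asset(context)

errors = []
for asset in context.values():
    collect_is_animatable(asset)
    try:
        validate_is_animatable(asset)
    except AssertionError as error:
        errors.append(str(error))

print(errors)
```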

BigRoy · August 5, 2015, 7:05am

Wouldn’t the race condition of modifying data also exist if they are not ordered? In reality one of the two is still accessing the same existing instance of the other, thus accesses the same variables.

I would say that when they are not ordered, this race condition is even harder to track down, since you won’t know beforehand which will run first.

Of course, this is only an issue if the unordered collectors access the same data member, which they might not need to, and that’s probably how you imagined something being unordered. Just saying the possible problem would still exist and might be missed by beginners.

marcus · August 5, 2015, 7:09am

I’m talking about a race condition in how plug-ins are processed, not about the data they touch.

Race conditions are about having a fixed order in which things happen, and the problem with OCC is that this order can break when new plug-ins are introduced, causing unintended behaviour.

Race conditions still happen one level down, as you say, if the same set of data is accessed by more than one participant, but that’s a separate concern which isn’t relevant until we have multiple plug-ins running simultaneously.
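The two concerns in this exchange can be told apart with a small sketch, again using plain dictionaries and functions rather than Pyblish (all names are made up for the example). Collectors that write to their own keys give the same result in any order, whereas collectors that mutate a shared value do not, even when nothing runs simultaneously.

```python
import itertools


def collect_age(data):
    data["age"] = 12  # own key, independent of other collectors


def collect_height(data):
    data["height"] = 1.12  # own key, independent of other collectors


def append_two(data):
    data.setdefault("members", []).append(2)  # shared key, order-sensitive


def append_three(data):
    data.setdefault("members", []).append(3)  # shared key, order-sensitive


def outcomes(collectors):
    """Run collectors in every possible order and gather the distinct results."""
    results = []
    for ordering in itertools.permutations(collectors):
        data = {}
        for collect in ordering:
            collect(data)
        if data not in results:
            results.append(data)
    return results


print(len(outcomes([collect_age, collect_height])))
# 1
print(len(outcomes([append_two, append_three])))
# 2
```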