Making Plug-ins More Explicit

marcus · August 12, 2015, 10:09am

Quick thought about some behavioural changes to make developing and running plug-ins more explicit.

import pyblish.api

# Example 1
class MyPlugin(pyblish.api.Validator):
  run_per_instance = True

  def process(self):
    assert self.current_instance.data("color") == "blue", "Color is not blue!"

# Example 2
class MyPlugin(pyblish.api.Validator):
  run_per_instance = False

  def process(self):
    assert self.current_instance is None

# Example 3
class MyPlugin(pyblish.api.Validator):
  run_per_instance = False

  def process(self):
    assert self.context.data("filename")

In short:

no arguments
an explicit flag to determine whether to run once per Instance, or just once.
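
To make the flag idea concrete, here is a minimal sketch of how a runner might honour it. Everything here is an assumption for illustration: the `Plugin` base class, the `run` function and the dictionaries standing in for the Context and Instances are hypothetical, not existing Pyblish API.

```python
class Plugin(object):
    """Hypothetical base class; the runner fills in the two attributes."""
    run_per_instance = False
    context = None
    current_instance = None

    def process(self):
        pass


class ValidateColor(Plugin):
    run_per_instance = True

    def process(self):
        assert self.current_instance["color"] == "blue", "Color is not blue!"


def run(plugin_cls, context):
    """Run the plug-in once, or once per instance, depending on its flag."""
    items = context["instances"] if plugin_cls.run_per_instance else [None]
    errors = []
    for item in items:
        plugin = plugin_cls()
        plugin.context = context
        plugin.current_instance = item
        try:
            plugin.process()
        except AssertionError as e:
            errors.append((item, str(e)))
    return errors


# Plain dictionaries stand in for the Context and its Instances
context = {"instances": [{"name": "a", "color": "blue"},
                         {"name": "b", "color": "red"}]}
errors = run(ValidateColor, context)
```

With the flag set to False, the same runner would call `process()` exactly once with `current_instance` left as None, as in Example 2.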

Here’s an alternative.

# Example 4
class MyPlugin(pyblish.api.InstanceValidator):
  def process(self, instance):
    assert instance.data("color")

# Example 5
class MyPlugin(pyblish.api.ContextValidator):
  def process(self, context):
    assert context.data("filename")

In short:

Explicit class per behaviour
Maintain arguments, but they differ according to the class

Which means we could also do.

# Example 6
class MyPlugin(pyblish.api.Validator):
  def process(self):
    assert time.time() < 16000

In the off chance that neither Context nor Instance is of interest.

Motivation

From a conversation with Justin on the Python Inside Maya list, it struck me that behaviour is embedded into the argument signature, which can be hard to learn and harder to remember and reason about.

This way, behaviour is explicit, at the cost of more typing and more super classes.
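
To illustrate what "behaviour embedded into the argument signature" means, here is a rough sketch of signature-based dispatch. It is only an illustration of the idea, not the actual Pyblish implementation, and `runs_per_instance` is a hypothetical helper.

```python
import inspect


def runs_per_instance(plugin_cls):
    """Guess behaviour from the argument names of `process` (illustrative only)."""
    params = inspect.signature(plugin_cls.process).parameters
    return "instance" in params


class ValidateColor(object):
    def process(self, instance):
        pass


class ValidateFilename(object):
    def process(self, context):
        pass


class ValidateBoth(object):
    def process(self, context, instance):
        pass


# Nothing but the argument names tells the three plug-ins apart
behaviours = [runs_per_instance(cls)
              for cls in (ValidateColor, ValidateFilename, ValidateBoth)]
```

The class bodies look nearly identical, yet adding or removing a single argument name changes how often the plug-in runs. That is the part that is hard to learn and remember.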

Technically, the latter alternative is something we could implement today whilst maintaining backwards compatibility, and phase off as we did with Dependency Injection. The question is, does this improve our code and does it make it easier for newcomers to pick up?

It would eliminate the need for Dependency Injection, something cool and useful but not very Pythonic, along with the possibility of custom Services, which I haven’t yet seen anyone make use of.

Let me know your thoughts.

BigRoy · August 12, 2015, 12:21pm

Interesting discussion with Justin, saw it pop up on the group. Being more explicit about how it alters the behaviour is a great idea, I think. Maybe even the UI should show that difference (does it?).

Personally this would make more sense to me:

class MyPlugin(pyblish.api.Validator):
  def process(self):
    context = self.context
    for instance in context.instances():
        pass

But I see how this would not work, since now we’re running each instance in our own loop inside our own defined method. So the UI would not be able to catch the progress per instance, and errors would no longer be caught per instance but raised for the whole context… which I think is bad.
Plus it wouldn’t handle filtering to families automatically per instance, again obviously not a solution.
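
A small sketch of that difference, using plain dictionaries as stand-ins for instances. The function names are hypothetical and only meant to contrast the two loops.

```python
def validate(instance):
    assert instance["color"] == "blue", "%s is not blue" % instance["name"]


instances = [{"name": "a", "color": "red"},
             {"name": "b", "color": "blue"},
             {"name": "c", "color": "green"}]


def process_all(instances):
    """Plug-in loops by itself: the first failure aborts the whole run."""
    for instance in instances:
        validate(instance)


def run_per_instance(instances):
    """Framework loops: every instance gets its own result."""
    results = []
    for instance in instances:
        try:
            validate(instance)
            results.append((instance["name"], None))
        except AssertionError as e:
            results.append((instance["name"], str(e)))
    return results


try:
    process_all(instances)
    whole_context_error = None
except AssertionError as e:
    whole_context_error = str(e)

per_instance = run_per_instance(instances)
```

In the first case only the failure on "a" is ever reported and "c" is never reached; in the second, each instance carries its own error.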

Personally I think taking the step back to separating into process and process_asset is the most clear and even allows you to process both context and assets with a single plug-in, though in that case you don’t know the ordering.

Maybe these could be the different methods if really needed to post process a context.

plugin.process(context)
plugin.process_asset(instance)
plugin.process_post(context)

The differentiation between method names shows that they process in a different way.

It’s lightweight and clear.

BigRoy · August 13, 2015, 7:45am

To continue here are some use cases for separated process methods on a plugin. The example code should be considered pseudocode.

Building a queue per asset and extracting content in an optimal way

Alembic Cache

Maya’s Alembic Exporter can set up multiple jobs in a single extraction. This means you can write out multiple files while only having to process the timeline once. It’s important to reduce the timeline scrubs/plays because, in production, scenes easily tend to get slow, large, long or a combination of those.

class AlembicExtractor(Extractor):

    def process(self):
        self.jobs = []

    def process_asset(self, asset):

        frames = asset.data('frameRange')
        output_path = '{0}.abc'.format(asset.name)

        job_str = '-frameRange {0} -file {1}'.format(frames, output_path)
        self.jobs.append(job_str)

    def process_post(self):
        from maya import cmds

        # Process the queue as a single queue
        cmds.AbcExport(j=self.jobs, verbose=False)

Or if there won’t be a process method that runs before assets we would have to hack in our variable like this:

class AlembicExtractor(Extractor):
    def process_asset(self, asset):

        # Init on first run through assets
        if not hasattr(self, 'jobs'):
            self.jobs = []

        frames = asset.data('frameRange')
        output_path = '{0}.abc'.format(asset.name)

        job_str = '-frameRange {0} -file {1}'.format(frames, output_path)
        self.jobs.append(job_str)

    def process_post(self):
        from maya import cmds

        # Process the queue as a single queue
        cmds.AbcExport(j=self.jobs, verbose=False)

Extracting world-space baked cameras

The same could be useful for exporting world-space baked cameras (or maybe extracting cached dynamic effects as separate instances). We want to cache in one go to ensure behavior is consistent (simulations could be dependent on one another) and an optimal extraction as before.

class CameraWorldSpaceExtractor(Extractor):
    def process(self):
        self.cameras = {}

    def process_asset(self, asset):
        camera = asset.data('camera')
        output_path = '{0}.ma'.format(asset.name)
        self.cameras[camera] = output_path

    def process_post(self):
        from maya import cmds

        # pseudocode:
        # Make temporary copies of cameras, bake it, extract and delete temp
        tmp_cameras = cmds.duplicate(list(self.cameras))
        cmds.bakeResults(tmp_cameras)

        for src_camera, export_camera in zip(self.cameras, tmp_cameras):
            path = self.cameras[src_camera]
            cmds.select(export_camera, r=1)
            cmds.file(path, exportSelected=True)

It might also help to be explicit about which method runs before and which runs after:

class Plugin():
    def process_pre(self):
        """Running once before `process_asset()`"""
        pass

    def process_asset(self, asset):
        """Running once for every asset"""
        pass

    def process_post(self):
        """Running once after `process_asset()`"""
        pass

To be honest I think the Dependency Injection still has its place and could be used for all three methods. Accessing the context within process_asset() or not needing the context in process_post() could happen in practice.
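
As a rough sketch of how the three methods could be driven, assuming a hypothetical `run` function and plain strings as assets. The export call is replaced by a stand-in so this runs outside of Maya.

```python
class Plugin(object):
    def process_pre(self):
        pass

    def process_asset(self, asset):
        pass

    def process_post(self):
        pass


class CollectJobs(Plugin):
    """Queue one job per asset, then handle the queue in a single call."""

    def process_pre(self):
        self.jobs = []
        self.submitted = None

    def process_asset(self, asset):
        self.jobs.append("-file {0}.abc".format(asset))

    def process_post(self):
        # Stand-in for a single export call, e.g. cmds.AbcExport(j=self.jobs)
        self.submitted = list(self.jobs)


def run(plugin_cls, assets):
    """Hypothetical runner: one plug-in object is shared by all three steps."""
    plugin = plugin_cls()
    plugin.process_pre()
    for asset in assets:
        plugin.process_asset(asset)
    plugin.process_post()
    return plugin


plugin = run(CollectJobs, ["hero", "prop"])
```

Note that this only works if the same plug-in object is kept alive across all three steps, since the queue lives on `self`.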

marcus · August 13, 2015, 7:46am

The reason I’m not a fan of having both methods in the same class is because I look at plug-ins as a single operation, not a collection of operations.

Graphically, I’d like a plug-in to represent a single item, that does a single thing and only fails in a single way; either on the Context or an individual Instance.

And the reason I’m not a fan of embedding Context or Instance or Asset into the method name is because I’d prefer keeping the need for either optional. See, they are both quite complex concepts to someone new to the framework, and especially to someone new to publishing overall, not least considering that other publishing frameworks, like Shotgun, don’t even have this concept. I think it’s important to keep it optional so as to better understand why it exists, before learning about it.

process() solves this, but, considering that we’re discussing making things more explicit, there might not be any way of avoiding this…

Inversion of responsibility

Here’s another alternative, where instead of producing lots of new subclasses, most doing exactly the same thing, the qualification is separated into a single variable and assigned to order.

class CollectRigs(pyblish.api.Single):
  order = pyblish.api.CollectorOrder

  def process(self, context):
    # add instances to the context
    pass


class ValidateRigs(pyblish.api.ForEach):
  order = pyblish.api.ValidatorOrder

  def process(self, instance):
    # do things to each instance
    pass


class ValidateTimeUnit(pyblish.api.Single):
  order = pyblish.api.ValidatorOrder

  def process(self, context):
    # validate time in context
    pass

Here, the behaviour is provided by the subclass, and its qualification - e.g. Collector or Validator - assigned explicitly.

I’m also considering making this “singularity” even more explicit, such that there is no doubt that this is the one and only method on this class that actually “performs”, and to strengthen the idea that a plug-in is a verb in nature.

class ValidateTimeUnit(pyblish.api.Single):
  order = pyblish.api.ValidatorOrder

  def __call__(self, context):
    # validate time in context
    pass

# Plug-ins are now directly callable
validate_time_unit = ValidateTimeUnit()
validate_time_unit(context)

marcus · August 13, 2015, 7:51am

You can already do this, it’s only a matter of workflow.

class ExtractAlembicJobs(pyblish.api.Extractor):
  def process(self, context, asset):
    context.set_data("alembicJobs", [asset])


class ExtractAlembic(pyblish.api.Extractor):
  order = pyblish.api.Extractor.order + 0.1

  def process(self, context):
    cmds.AbcExport(j=context.data("alembicJobs"))
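
One detail worth spelling out: for jobs to accumulate across assets, the list has to be appended to rather than replaced on each run. A minimal sketch, where the `Context` class is a stand-in that only mimics the `data()` and `set_data()` calls used in this thread (the `default` argument is an assumption of the stand-in) and `queue_job` is a hypothetical helper:

```python
class Context(dict):
    """Stand-in for the Context; only mimics the two calls used in this thread."""

    def data(self, key, default=None):
        return self.get(key, default)

    def set_data(self, key, value):
        self[key] = value


def queue_job(context, asset):
    """Append to the shared list instead of replacing it on every call."""
    jobs = context.data("alembicJobs", [])
    jobs.append("-file {0}.abc".format(asset))
    context.set_data("alembicJobs", jobs)


context = Context()
for asset in ["hero", "prop"]:
    queue_job(context, asset)

queued = context.data("alembicJobs")
```

A later plug-in can then read the full queue from the Context and export everything in one go.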

BigRoy · August 13, 2015, 8:17am

Just dropping this in here so it has been stated.

If clarity about the class is provided by what it derives from, it could become trickier to trace back how it works or how it will iterate when the process method doesn’t change. Especially so if a studio sets up its own base classes in between, e.g. StudioExtractor. They would then likely have to implement multiple classes: StudioExtractorOnce, StudioExtractorForEach.

Even if the methods are to be separated onto separate plug-ins I would say the methods should differ in name. And for simplicity (eg. when subclassing) I would just add the methods to one Plugin, this also reduces dependencies on other plug-ins as in your workaround.

Plug-in nodes for node graphs like Fusion and Maya also often come with nodes that have multiple/different processing methods that basically run similarly (like we have here).

Maya’s deformer node has compute() and deform() that both run, and it’s up to the plug-in developer to decide which one to use. (Some information is not available from within deform; note that compute is inherited from MPxNode).

Fusion has special for loop methods that you can use to ease working with parallel threads for some computations. A nice thing about offering these methods is that the Fusion UI can hook into these calls and update the UI even for custom parallel computations. So a ‘for loop’ inside the node actually has intermittent correct updates in the UI.

marcus · August 13, 2015, 8:27am

That’s a good point.

I could imagine an equally well suited workaround would be to simply make this functionality available through modules, as opposed to via subclassing.

Before

import my_pipeline

class ExtractAlembic(my_pipeline.Extractor):
  def process(self, instance):
    temp_dir = self.temp_dir(instance)

After

import my_pipeline

class ExtractAlembic(pyblish.api.ForEach):
  order = pyblish.api.ExtractorOrder

  def process(self, instance):
    temp_dir = my_pipeline.temp_dir(instance)

It would however eliminate the possibility of customising the defaults of other attributes in superclasses, such as hosts and families. But I wonder if this is much of a loss?

That’s comforting to know, glad you pointed that out. Definitely worth taking into consideration.

BigRoy · August 13, 2015, 8:57am

Some other quick thoughts:

A stack trace would show the method being called on the plug-in but not the class inheritance hierarchy. If the method name would identify how it’s being run within the process it might become clearer to debug.
There could be other reasons to choose inheritance from your own (pipeline-based) class.
- Implementing a custom discover() method for plug-ins that would find all inherited classes.
- Ensuring a certain way of working within the Extractor, possibly not even using the process() method. For example:

# pseudocode
class MeshValidator(Validator):
    def process(self, asset):
        meshes = ls(asset, type='mesh')
        for mesh in meshes:
            self.validate_mesh(mesh)

    def validate_mesh(self, mesh):
        return False

class ValidateNormals(MeshValidator):
    def validate_mesh(self, mesh):
        assert mesh.normals != 'inverted'
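
To illustrate the stack trace point from above: a traceback frame records the name of the function that failed, not the class hierarchy it was defined in. A small sketch with hypothetical class and method names:

```python
import sys
import traceback


class ForEach(object):
    def process(self, instance):
        raise NotImplementedError


class ValidateA(ForEach):
    # Behaviour is implied by the superclass; the method name stays generic
    def process(self, instance):
        raise ValueError("failed")


class ValidateB(object):
    # Behaviour is spelled out in the method name
    def process_instance(self, instance):
        raise ValueError("failed")


def failing_frame_name(func, *args):
    """Return the function name of the innermost frame of the raised error."""
    try:
        func(*args)
    except ValueError:
        tb = sys.exc_info()[2]
        return traceback.extract_tb(tb)[-1].name


name_a = failing_frame_name(ValidateA().process, None)
name_b = failing_frame_name(ValidateB().process_instance, None)
```

Only the second name tells you, from the trace alone, that the failure happened while processing an individual instance.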

marcus · August 13, 2015, 9:02am

Couldn’t those examples be equally well implemented in a module? I don’t see a need for them to be inherited. Though I admit I don’t fully understand the stack trace advantage.

marcus · November 23, 2015, 6:43pm

Continuing on, here’s an implementation of the Single and ForEach approach, called ContextPlugin and InstancePlugin respectively.

See comments for details.

import types

# Mocked Pyblish module
pyblish = types.ModuleType('pyblish')
pyblish.api = types.ModuleType('api')
pyblish.api.CollectorOrder = 0
pyblish.api.ValidatorOrder = 1
pyblish.plugin = types.ModuleType('plugin')
pyblish.logic = types.ModuleType('logic')


# Ignore this class
class _Plugin(object):
    def __init__(self):
        self._records = list()
        self.log = type('Logger', (object,), {})()
        self.log.info = lambda text: self._records.append(text + '\n')


# In place of Plugin and CVEI superclasses, two distinct types are defined.
# Each has a unique, explicit behaviour, as opposed to the implicit behavior
# present in the current CVEI types.
class ContextPlugin(_Plugin):
    pass


class InstancePlugin(_Plugin):
    pass


# Example plugins
class CollectInstances(ContextPlugin):
    # The order is equally explicit and assigned
    # either via static, named numbers, or as usual
    # via arbitrary numbers. Sorting remains unchanged
    order = pyblish.api.CollectorOrder

    def process(self, context):
        context.append('instance1')
        context.append('instance2')
        self.log.info('Processing context..')


class ValidateInstances(InstancePlugin):
    order = pyblish.api.ValidatorOrder

    def process(self, instance):
        self.log.info('Processing %s' % instance)


# Ordering remains unchanged
plugins = [ValidateInstances, CollectInstances]
plugins = sorted(plugins, key=lambda item: item.order)


# plugin.process is greatly simplified.
# Note the disappearance of the provider.
def process(plugin, **kwargs):
    print('individually processing %s' % list(kwargs.values())[0])

    result = {
        'plugin': plugin,
        'item': None,
        'error': None,
        'records': list(),
        'success': False
    }

    try:
        plugin = plugin()
        plugin.process(**kwargs)
        # Only flag success once process() has returned without raising
        result['success'] = True
    except Exception as e:
        result['success'] = False
        result['error'] = e

    result['records'] = plugin._records

    return result

pyblish.plugin.process = process


# logic.process is greatly simplified
def process(plugins, context):
    print('logic.process running..')

    for plugin in plugins:
        print('Processing %s' % plugin)

        # Run once
        if issubclass(plugin, ContextPlugin):
            yield pyblish.plugin.process(plugin, context=context)

        # Run once per instance
        if issubclass(plugin, InstancePlugin):
            for instance in context:
                yield pyblish.plugin.process(plugin, instance=instance)

pyblish.logic.process = process


# Example usage
context = list()
processor = pyblish.logic.process(plugins, context)
results = list(processor)
print(results)

Discussion

The end result is plug-ins that keep most of what matters about how they work currently. The order is reflected via an attribute, which meshes better with how orders are typically overridden, and the superclass now reflects behaviour, instead of behaviour being implied by the argument signature of process(). The behavioural difference between processing the Context, and processing both the Context and the Instance, is no longer disguised.

This also eliminates the need for dependency injection, as it was put in place primarily to ease the learning curve; I think this does the job (almost) as well, and eases it further by removing the implicit behavioural change.

Technically, it also simplifies maintenance by reducing the core mechanism by almost half; the “Pyblish in 200 lines” example takes 198 lines to fully re-create Pyblish, whereas this comes to 106 for the same end result.

In regards to backwards compatibility, because it introduces new classes to inherit from, the old ways would still work, which means you could transition smoothly. Perhaps again indicating in the GUI that an old-style plug-in is being used.

I’ve been working on an implementation for this into the Pyblish library for the past few days and have found that all of this is technically possible and the code smells good in general.

@BigRoy what are your thoughts on this?

marcus · January 21, 2016, 6:12pm

Currently working on this and have converted the Pyblish By Example tutorial to illustrate how it differs from the current Collector, Validator, Extractor and Integrator superclasses.

In order to clearly separate between current and future techniques, I’m dubbing the current plug-ins “Implicit Plug-ins” versus the new “Explicit Plug-ins”

Pyblish By Explicit Example

Discussion

So far, it feels solid. The conversion took less than a few minutes, and full backwards-compatibility is maintained. The Services feature was immediately made redundant, the code made clearer but most importantly the logic is now unquestionable, thanks to a plug-in pre-defining up-front that it either deals with the Context, via ContextPlugin, or the Instance, via InstancePlugin.

Have a gander at the tutorial above and let me know how it feels; I think this could be what takes Pyblish from being a mystery to becoming a fact!

marcus · January 25, 2016, 9:00pm

Just to give you guys an idea of the significance of this change, I’d like to show you how much simpler the library is about to become.

Before

The core mechanism involved in processing plug-ins, separating Context from Instances, running things in order and producing results, can be summarized in 200 lines of code.

Pyblish in 200 lines

After

With this single change, the interface towards you, the developers, can remain mostly unchanged and the feature-set remain identical, but the same core mechanism can instead be reduced to half the amount of code (!).

Pyblish in 100 lines

At the end of the day, this means less cognitive overhead and easier maintenance, both for the developers and users of Pyblish.

Big win!

Mahmoodreza_Aarabi · February 9, 2016, 12:28pm

hey man
certainly such changes are good for the tool, but I want to know:
is the pyblish release 1.3.1 completely different from 1.2.2?
I mean, should I learn a new way of creating plug-ins?
or do old plug-ins still work?

Thanks

marcus · February 9, 2016, 12:32pm

Hey @Mahmoodreza_Aarabi,

The thought process and application of plug-ins remain the same; this change is mostly syntax. You are recommended to learn the new syntax, but the old still works and will continue to work until 2.0.

If you are just starting out, then I would suggest writing your plug-ins using this new syntax.

See the transition guide for the practical differences between the two methods.

http://forums.pyblish.com/t/pyblish-1-3-released