Pyblish Search and Customisation

marcus · August 4, 2015, 8:55am

Prelude

Registering plug-ins as files has many benefits, like parallel development, loose coupling and individual versioning through version control software such as Git.

On the other hand, in an environment with hundreds of validators, management and bulk customisation can become more difficult. Those are the concerns this approach is meant to address.

Approach

Currently, the standard workflow for getting a written plug-in processed by Pyblish is to register the directory containing its file, either via the environment variable PYBLISHPLUGINPATH or at run-time via pyblish.api.register_plugin_path().

Here’s an exploration of what things might look like if we instead wrote multiple plug-ins per file and managed them like we would any regular class.

It also features two new concepts: search and customisation.

Search

Searching is based on tags. Tags are plain-text keywords added to a plug-in, similar to the families and hosts attributes.

class MyPlugin(...):
  tags = ["geometry", "measurement"]

A search engine then reads these keywords and compares them with a search string provided by the user.

validators = pyblish.search("film character geometry")

Searching is either “inclusive” or “exclusive”.

Inclusive means any plug-in tagged with any keyword provided via the search string is included, whereas exclusive means only plug-ins matching all of the provided keywords are included.
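A minimal sketch of such a search, assuming each plug-in class carries a `tags` list as proposed above. `search` and its `exclusive` parameter are hypothetical names for illustration, not part of Pyblish, and the plug-ins are empty stand-in classes.

```python
def search(plugins, query, exclusive=False):
    """Return plug-ins whose `tags` match keywords in `query`.

    Inclusive (default): a plug-in matches if it has ANY of the keywords.
    Exclusive: a plug-in matches only if it has ALL of the keywords.
    """
    keywords = set(query.lower().split())
    results = []
    for plugin in plugins:
        tags = set(tag.lower() for tag in getattr(plugin, "tags", []))
        if exclusive:
            matched = keywords.issubset(tags)
        else:
            matched = bool(keywords & tags)
        if matched:
            results.append(plugin)
    return results


# Hypothetical plug-ins, standing in for real validators
class ValidateNormals(object):
    tags = ["geometry", "character"]


class ValidateHeight(object):
    tags = ["geometry", "measurement"]


class ValidateFrameRange(object):
    tags = ["film", "animation"]


plugins = [ValidateNormals, ValidateHeight, ValidateFrameRange]

print([p.__name__ for p in search(plugins, "film character geometry")])
# ['ValidateNormals', 'ValidateHeight', 'ValidateFrameRange']
print([p.__name__ for p in search(plugins, "geometry measurement", exclusive=True)])
# ['ValidateHeight']
```

A user interface offering both modes would simply toggle the `exclusive` flag.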

A potential user interface would likely provide both.

Customisation

The process illustrated in the two top-most gists includes a “pre-processing” stage, where plug-ins are discovered and then customised before being registered with Pyblish.

A direct advantage of this is that the pre-processing stage can be expanded upon to take input from any arbitrary source, such as a user interface or external file.

For example, in the second gist, plug-ins define an additional attribute, options. options is a dictionary of values, each evaluated by the plug-in at run-time. Take the ValidateHeight plug-in: it validates the height of an Instance against the value provided by options["height"].

The options["height"] value can then be adjusted either directly…
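A minimal, Pyblish-independent sketch of what such a plug-in might look like. The `options` attribute is the one proposed here; the plain `dict` standing in for an Instance, and its `"name"` and `"height"` keys, are assumptions for illustration.

```python
class ValidateHeight(object):
    # Default options; a configuration can override these per project or asset
    options = {
        "height": 1.80,    # expected height, in meters
        "tolerance": 0.1,  # allowed deviation from `height`, in meters
    }

    def process(self, instance):
        expected = self.options["height"]
        tolerance = self.options["tolerance"]
        actual = instance["height"]
        assert abs(actual - expected) <= tolerance, (
            "%s is %.2f m tall, expected %.2f m (+/- %.2f m)"
            % (instance["name"], actual, expected, tolerance)
        )


# A plain dict stands in for a Pyblish Instance here
ValidateHeight().process({"name": "hulk", "height": 1.85})  # passes
```

Because `options` lives on the class, changing it affects every subsequent run of the plug-in, which is what makes bulk customisation possible.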

ValidateHeight.options["height"] = 0.54
ValidateHeight.options["tolerance"] = 0.1

Or from an external file…

import json

with open("external_file") as f:
  ValidateHeight.options.update(json.load(f))

The external file can then be generated either by hand…

{
  "height": 0.54,
  "tolerance": 0.1
}

Or by a GUI that provides knobs for each available option.

A full configuration can then look like this.

hulk/shot5.pyblish_configuration

{
  "options": {
    "ValidatePivot": {
      "predicate": "straight"
    },
    "ValidateHeight": {
      "height": 1.80,
      "tolerance": 0.1
    },
    "ValidateNamingComplex": {
      "regex": "^.*_[A-Z]{4}$"
    }
  },
  "exclude": [
    "ValidateNamingSimple"
  ]
}
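One way the pre-processing stage might apply such a configuration before registering plug-ins: a minimal sketch, assuming plug-ins are plain classes with an `options` dictionary and are matched by class name. `apply_configuration` is a hypothetical helper, registration with Pyblish itself is left out, and the JSON is inlined rather than read from `hulk/shot5.pyblish_configuration`.

```python
import json


def apply_configuration(plugins, configuration):
    """Return the plug-ins to register, customised by `configuration`.

    Plug-ins named under "exclude" are dropped; values under "options"
    are merged into the matching plug-in's `options` dictionary.
    """
    excluded = set(configuration.get("exclude", []))
    selected = []
    for plugin in plugins:
        name = plugin.__name__
        if name in excluded:
            continue
        overrides = configuration.get("options", {}).get(name, {})
        # Build a new dict rather than mutating the existing one in place
        plugin.options = dict(getattr(plugin, "options", {}), **overrides)
        selected.append(plugin)
    return selected


class ValidateHeight(object):
    options = {"height": 1.0, "tolerance": 0.5}


class ValidateNamingSimple(object):
    options = {}


configuration = json.loads("""
{
  "options": {"ValidateHeight": {"height": 1.80, "tolerance": 0.1}},
  "exclude": ["ValidateNamingSimple"]
}
""")

plugins = apply_configuration([ValidateHeight, ValidateNamingSimple], configuration)
print([p.__name__ for p in plugins])  # ['ValidateHeight']
print(ValidateHeight.options)         # {'height': 1.8, 'tolerance': 0.1}
```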

Final thoughts

This approach reverses how plug-ins are implemented and used today; instead of being tailored to a specific purpose, plug-ins are made general and customisable.

For example, rather than building a ValidateHeightEquals2 that cannot be modified, you build ValidateHeight with the ability to be configured.

Batch configuration files can be provided that customise a large number of plug-ins in one go, and different configurations can be provided per project or asset. This opens the door for plug-ins to scale into the thousands and become generic enough to only ever have to be implemented once.

It also reverses how validated data is gathered. Currently, either a validator gathers its own information and validates it, or a collector provides data designed for a particular validator. Instead, the gathered data is generalised, as though it were a file format, and validators adapt to it rather than the other way around.

This means collection is the only part of validation that remains coupled to a particular host and a particular asset, and the challenge then becomes building “parsers” that “map” a given set of information into something consumable by generic validators.

For example, once a mapping has been built for, say, getting UV overlap information into the respective Instance, it will never have to be built again and applies to anything everywhere.

BigRoy · August 5, 2015, 6:10am

As before, the general data that can be provided is very hard to standardize. Every production has its own idea of which data is relevant for validation, and some of it might be computationally expensive to validate. If a computation is expensive, you would want to avoid it when a production doesn’t need it.

Loose coupling of the Collectors could be interesting here, where individual data pieces come from individual Collectors, like:

Collect Mesh Invalid normals
Collect Mesh UV map count
Collect Mesh Non-Manifold
etc.

Possibly where Collectors only get run if a related Validator is in the pipeline, but that starts looking more like a graph in a node editor.
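A rough sketch of that idea without a full node graph, assuming each validator declares the data keys it `requires` and each collector declares the keys it `provides`. Both attributes, and `select_collectors`, are hypothetical; Pyblish does not define them.

```python
class CollectInvalidNormals(object):
    provides = ["invalidNormals"]


class CollectUVMapCount(object):
    provides = ["uvMapCount"]


class CollectNonManifold(object):
    provides = ["nonManifold"]  # imagine this one is expensive


class ValidateNormals(object):
    requires = ["invalidNormals"]


class ValidateUVMaps(object):
    requires = ["uvMapCount"]


def select_collectors(collectors, validators):
    """Return only the collectors providing data some validator requires."""
    required = set()
    for validator in validators:
        required.update(getattr(validator, "requires", []))
    return [c for c in collectors
            if required.intersection(getattr(c, "provides", []))]


collectors = [CollectInvalidNormals, CollectUVMapCount, CollectNonManifold]
needed = select_collectors(collectors, [ValidateNormals, ValidateUVMaps])
print([c.__name__ for c in needed])
# ['CollectInvalidNormals', 'CollectUVMapCount']
```

The non-manifold collector never runs, because no validator in this production asks for its data.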

At first I interpreted this discussion as you wanting to avoid having plug-ins as scripts on disk, but I think I may have misread that. Instead, you’d like to move the critical information of a plug-in into a separate implementation… possibly a file.

So instead of having…

# pseudo
class Validator():
    def process(self, instance):
        assert instance.data('height') < 5.0

You would be doing something like:

# pseudo
class Validator():
    def process(self, instance):
        height = self.get_value('height')
        assert instance.data('height') < height

I can see that having such ‘data loading’ functionality available in Pyblish could be useful. But I can also see some using the Shotgun API to load the actual height, while others use the ftrack API to load such data.
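One way to keep that flexibility is to make the source of values pluggable. A minimal sketch, assuming a hypothetical `get_value` that delegates to whichever lookup function a studio provides; a Shotgun or ftrack query would slot in where the dictionary lookup is used here. None of these names are part of Pyblish, Shotgun or ftrack.

```python
class Validator(object):
    # Hypothetical hook: a callable taking a key and returning a value.
    # A studio could replace it with a Shotgun, ftrack or file-based lookup.
    value_source = None

    def get_value(self, key):
        if self.value_source is None:
            raise LookupError("No value source configured for %r" % key)
        return self.value_source(key)


class ValidateHeight(Validator):
    def process(self, instance):
        height = self.get_value("height")
        assert instance["height"] < height, (
            "Height %s exceeds %s" % (instance["height"], height))


# A plain dictionary standing in for a production-tracking query.
# staticmethod keeps the callable from being bound as an instance method.
project_values = {"height": 5.0}
ValidateHeight.value_source = staticmethod(project_values.get)
ValidateHeight().process({"height": 3.2})  # passes
```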

Or did I misinterpret what you were going for?

Real world example

A good example of variable data to be checked might be a max polycount in a games pipeline. It could differ a lot per project, but also per type of asset. Or even for a specific asset (boss character?).

So the information could be loaded as a project default unless specified for the particular asset, which would override it for that one instance only. (Would it still work if multiple instances are processed? Does each instance get its own option value?)
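A minimal sketch of that layering, assuming values are kept in two plain dictionaries, a project default and per-asset overrides, and that each instance carries an `asset` name. The layout and `resolve_option` are hypothetical. Because the lookup happens per instance, each instance processed in the same run gets its own value.

```python
PROJECT_DEFAULTS = {"maxPolycount": 10000}

# Per-asset overrides, e.g. a boss character with a higher budget
ASSET_OVERRIDES = {
    "boss": {"maxPolycount": 50000},
}


def resolve_option(asset, key):
    """Return the asset's override for `key`, else the project default."""
    overrides = ASSET_OVERRIDES.get(asset, {})
    if key in overrides:
        return overrides[key]
    return PROJECT_DEFAULTS[key]


def validate_polycount(instance):
    limit = resolve_option(instance["asset"], "maxPolycount")
    assert instance["polycount"] <= limit, (
        "%s has %d polygons, limit is %d"
        % (instance["asset"], instance["polycount"], limit))


# Two instances in the same run, each checked against its own limit
for instance in [{"asset": "crate", "polycount": 800},
                 {"asset": "boss", "polycount": 42000}]:
    validate_polycount(instance)
```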

marcus · August 5, 2015, 6:29am

Yeah, there are several topics going on here, sorry about that. Still working on sorting things out myself, but there was one thing you said which resonated with what I wanted to communicate.

This. I’ll give an example of it in the next post; first, here is an example on the topic of role reversal and why it’s interesting.

Edit: Actually, your example relates better to Cooperative Collection so I’ll post there instead.

Traditional

(1). Specific collector (for rigs, only) gathers generic data

from maya import cmds
import pyblish.api as pyblish

class CollectRigs(pyblish.Collector):
  def process(self, context):
    asset = context.create_asset(name="MyAsset", family="MyRig")
    asset[:] = cmds.ls()

(2). Validator knows what to look for and how to find it.

from maya import cmds
import pyblish.api as pyblish

class ValidateIsAnimatable(pyblish.Validator):
  families = ["MyRig"]

  def process(self, asset):
    for node in asset:
      if not cmds.nodeType(node) == "objectSet":
        continue

      if not node.endswith("_SET"):
        continue

      if node.startswith("controls"):
        return "Success"

    raise Exception("%s is not animatable" % asset)

Here, requirement and implementation are both embedded into the validator, meaning a new validator has to be written if the implementation changes, even though the requirement stays the same.

Reversed

Here’s how it could look instead, such that the validator only represents the requirement and can be reused in any situation where that requirement is wanted.

(1). Specific collector gathers specific data.

from maya import cmds
import pyblish.api as pyblish

class CollectRigs(pyblish.Collector):
  def process(self, context):
    asset = context.create_asset(name="MyAsset")
    asset[:] = cmds.ls()

    for node in asset:
      if not cmds.nodeType(node) == "objectSet":
        continue

      if not node.endswith("_SET"):
        continue

      if node.startswith("controls"):
        asset.set_data("isAnimatable", True)

(2). Validator knows only the requirement, not how to look for it.

import pyblish.api as pyblish

class ValidateIsAnimatable(pyblish.Validator):
  def process(self, asset):
    assert asset.data("isAnimatable"), "%s is not animatable" % asset

Now the validator can be used wherever there is a collector gathering the isAnimatable property.
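The same split can be tried outside Maya. A minimal sketch, assuming the scene is a plain list of `(name, type)` pairs standing in for `cmds.ls()` and `cmds.nodeType()`, and that an asset is a dict of data; the collector and validator are plain functions rather than Pyblish plug-ins.

```python
def collect_rig(scene):
    """Collector: knows how to find controls, records the result as data."""
    asset = {"name": "MyAsset", "nodes": [name for name, _ in scene]}
    asset["isAnimatable"] = any(
        node_type == "objectSet"
        and name.endswith("_SET")
        and name.startswith("controls")
        for name, node_type in scene
    )
    return asset


def validate_is_animatable(asset):
    """Validator: knows only the requirement, not how it was gathered."""
    assert asset.get("isAnimatable"), "%s is not animatable" % asset["name"]


scene = [("body_GEO", "mesh"), ("controls_SET", "objectSet")]
validate_is_animatable(collect_rig(scene))  # passes
```

Swapping `collect_rig` for a collector that follows a different studio convention leaves `validate_is_animatable` untouched.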

From here, a developer implements collectors only, as opposed to both collectors and validators. There could be thousands of these validators available, where developers adapt their collectors to what a validation looks for.

Once that’s done, you can start to append logic and management to just the validators, and have collectors be the one thing separating companies from each other, the one thing anyone has to develop to fully validate anything.

And because this makes collection more complex, this is where Cooperative Collection comes in: the isAnimatable data is instead collected by a separate collector.

Does the above answer this concern?

BigRoy · August 5, 2015, 6:53am

I’m afraid this solely moves the implementation to the Collector, and Validators turn into a list of assert instance.data('something') statements. That basically means all Validators are simple boolean checks; if so, why not have a single ValidateBoolean that’s used for all those checks?

Looking at how the Collector and Validator were initially laid out, I had previously seen it like this:

The collectors collect the information about the asset, its contents and rules.
The validators solely validate these contents and assets based on the collected rules.

So you would be doing something like:

CollectMeshes
CollectAssetMaxPolycount
ValidatePolycount

This way the Validator still has a purpose beyond being a boolean check, since it still looks up the meshes to count their polygons. If the validators are dumbed down that much they can be shared, but are they worth sharing?
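A minimal sketch of that split, assuming meshes are represented as `(name, polycount)` pairs and the asset’s limit comes from a hypothetical lookup table. The collectors gather content and rules, while the validator still does the work of reporting which meshes are responsible.

```python
MAX_POLYCOUNT = {"crate": 1000, "boss": 50000}  # hypothetical per-asset rules


def collect_meshes(instance, scene):
    instance["meshes"] = list(scene)


def collect_asset_max_polycount(instance):
    instance["maxPolycount"] = MAX_POLYCOUNT[instance["asset"]]


def validate_polycount(instance):
    limit = instance["maxPolycount"]
    total = sum(count for _, count in instance["meshes"])
    if total <= limit:
        return
    # Report the heaviest meshes first, to tell the artist where to look
    heaviest = sorted(instance["meshes"], key=lambda mesh: mesh[1], reverse=True)
    raise AssertionError(
        "%s has %d polygons (limit %d); heaviest meshes: %s"
        % (instance["asset"], total, limit,
           ", ".join("%s (%d)" % mesh for mesh in heaviest[:3])))


instance = {"asset": "crate"}
collect_meshes(instance, [("lid", 300), ("body", 600)])
collect_asset_max_polycount(instance)
validate_polycount(instance)  # passes: 900 <= 1000
```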

marcus · August 5, 2015, 6:53am

No, that’s about right. That’s one possibility in relation to the configurable portion of the post, where business logic - that is, the actual requirement or constraint such as how high something is meant to be - is stored elsewhere.

Shotgun or FTrack is a good place for this. Anyone can see it, anyone can edit it, and it has a direct impact on artist output.

marcus · August 5, 2015, 6:59am

This is a good question!

It’s because: different productions have different requirements.

When validators are separate, they can be included dynamically where needed. A production then specifies their requirements, and the appropriate validators are included.

You’re right, it makes validation incredibly trivial.

It would require a shift in thinking about validation. Validation as individual files/plug-ins may not even be necessary; a simple list of key/value pairs to assert against the data of an asset might be enough.

rig:
- isAnimatable: true
- height: 1.3

This list could either come directly from a pre-defined file on disk, per project, or from something like ftrack.
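Validating against such a list could then be a single generic check. A minimal sketch, assuming the requirements have already been loaded (from YAML, JSON or a tracker) into a dictionary keyed by family, and that asset data is a plain dict; `validate_requirements` is a hypothetical name.

```python
REQUIREMENTS = {
    "rig": {"isAnimatable": True, "height": 1.3},
}


def validate_requirements(family, data, requirements=REQUIREMENTS):
    """Compare every required key/value pair with the asset's data."""
    failures = []
    for key, expected in requirements.get(family, {}).items():
        actual = data.get(key)
        if actual != expected:
            failures.append("%s: expected %r, got %r" % (key, expected, actual))
    assert not failures, "; ".join(failures)


validate_requirements("rig", {"isAnimatable": True, "height": 1.3})  # passes
```

Note that exact equality is a poor fit for measurements such as height; a tolerance would be needed in practice, which is one argument for keeping dedicated validators.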

marcus · August 5, 2015, 7:43am

Actually, let me take that back.

With this approach, validators focus on contract and interface.

Contract, as in what it does for you and what you must do for it.

ValidateHeight

You agree to: Collect the attribute `height`
It agrees to: Ensure the height of an asset is within a particular range

Interface, as in what options are available.

height (float): Height, in meters, e.g. 2
tolerance (float, optional): Range at which `height` may vary, default is 0.5
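A contract and interface like this could be declared on the plug-in itself, so that a GUI or configuration loader can inspect it. A minimal sketch, assuming hypothetical `requires` and `interface` attributes and a hypothetical `resolve_options` helper; the exact shape is not part of Pyblish.

```python
class ValidateHeight(object):
    # Contract: data the collectors must provide
    requires = ["height"]

    # Interface: option name -> (type, default); a default of None means required
    interface = {
        "height": (float, None),
        "tolerance": (float, 0.5),
    }


def resolve_options(plugin, user_options):
    """Check user options against the plug-in's interface and fill in defaults."""
    unknown = set(user_options) - set(plugin.interface)
    if unknown:
        raise KeyError("Unknown options: %s" % ", ".join(sorted(unknown)))
    resolved = {}
    for name, (kind, default) in plugin.interface.items():
        if name in user_options:
            resolved[name] = kind(user_options[name])  # e.g. 2 -> 2.0
        elif default is not None:
            resolved[name] = default
        else:
            raise KeyError("Missing required option: %s" % name)
    return resolved


print(resolve_options(ValidateHeight, {"height": 2}))
# {'height': 2.0, 'tolerance': 0.5}
```

A GUI could walk `interface` to build one knob per option, using the declared type to pick the widget.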

The interface can then be built upon with GUIs and other neat things. Picture a GUI providing access to hundreds of validators, with a browser and filter, and options both to customise the validators and to store those customisations on disk, to be assigned and reused across projects!

More here.

Pyblish Customisation

marcus · August 5, 2015, 8:18am

I discovered a problem with this approach.

Namely, we’re missing out on the context-sensitive information a validator can currently provide, like additional assertions and log records.

For example.

import pyblish.api
import maya.cmds as cmds
import defaults  # assumed module providing defaults.Node; not shown in this post

class ValidateDefaults(pyblish.api.Validator):
  def process(self, asset):
    assert "controls_SET" in asset, "%s did not have any controls" % asset

    difference = dict()
    for control in cmds.sets("controls_SET", query=True):
      if control.startswith("_"):
        self.log.warning("Protected controller found")

      control_defaults = defaults.Node(control)

      if control_defaults.has_defaults and control_defaults.difference:
        self.log.debug("Default found: %s" % control_defaults.difference)
        difference[control] = control_defaults.difference
        ...

It could potentially be solved by collectors also collecting these “notes” about what they find.

import pyblish.api
import maya.cmds as cmds
import defaults  # assumed module providing defaults.Node; not shown in this post

class CollectDefaults(pyblish.api.Collector):
  def process(self, asset):
    notes = asset.data("notes") or {
      "debug": [],
      "info": [],
      "warnings": []
    }

    if "controls_SET" not in asset:
      notes["info"].append("%s did not have any controls" % asset)
      asset.set_data("notes", notes)
      return

    difference = dict()
    for control in cmds.sets("controls_SET", query=True):
      if control.startswith("_"):
        notes["warnings"].append("Protected controller found")

      control_defaults = defaults.Node(control)

      if control_defaults.has_defaults and control_defaults.difference:
        notes["debug"].append("Default found: %s" % control_defaults.difference)
        difference[control] = control_defaults.difference

    asset.set_data("notes", notes)
    # The asset is only at its defaults if no control differed
    asset.set_data("isDefault", not difference)
    asset.set_data("nonDefaults", difference)

Which a validator can then output.

import pyblish.api

class ValidateDefaults(pyblish.api.Validator):
  def process(self, asset):
    notes = asset.data("notes", {})

    for note in notes.get("debug", []):
      self.log.debug(note)

    for note in notes.get("info", []):
      self.log.info(note)

    for note in notes.get("warnings", []):
      self.log.warning(note)

    assert asset.data("isDefault"), (
      "%s is not defaulted; "
      "these nodes were different: %s" % (asset, asset.data("nonDefaults"))
    )

In this sense, a Validator gains an additional role.

Contract
Interface
Visualisation

That is, it also determines how data is presented to the user: it could choose not to log certain things, or to log them differently, such as appending to or modifying the output, or optionally writing to a file as per its configuration.
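A minimal sketch of that role using Python’s standard `logging` module, assuming notes are collected in the `{"debug": [...], "info": [...], "warnings": [...]}` shape used above and that a hypothetical `levels` option controls which categories are shown. `emit_notes` is an illustrative name, not a Pyblish function.

```python
import logging

# Map note categories onto logging levels
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warnings": logging.WARNING,
}


def emit_notes(notes, log, levels=("info", "warnings")):
    """Forward collected notes to `log`, showing only the chosen categories.

    Returns the number of records emitted.
    """
    emitted = 0
    for category in levels:
        for note in notes.get(category, []):
            log.log(LEVELS[category], note)
            emitted += 1
    return emitted


logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("ValidateDefaults")
notes = {"debug": ["Default found: {'tx': 1.0}"],
         "info": [],
         "warnings": ["Protected controller found"]}
emit_notes(notes, log)                                # the warning only
emit_notes(notes, log, levels=("debug", "warnings"))  # debug note and warning
```

Writing to a file, as a configuration might request, would amount to attaching a `logging.FileHandler` to the same logger.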

BigRoy · August 5, 2015, 8:58am

To be honest, I see more power in a GUI (for non-programmers) that lets you type in the name of a data field and enter the value it should have, instead of looking for a predefined Validator with fixed data names where you can only enter the values. That way you would be in full control of designing your Validator. Especially since, if a Validator solely checks a value, you currently gain nothing by defining it as a custom Validator.

More and more I feel it’s not the best way forward to have Validators become so ‘simplified’ that they only check whether a equals b.

The Validator’s power comes from the fact that it goes through all the hoops to give you the best possible information about why something is wrong, even checking only certain things if another issue was found (a necessary optimization in many cases when checking dense meshes!).

For example, the following would make sense to me if you only wanted UV maps to be named uvMap1, uvMap2 and so on, without missing numbers in between, up to a maximum number of UV maps.

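# Pseudocode: `ls`, `mesh.get_uv_sets()` and `uv_set.name` stand in for host-specific calls
import re

class ValidateUVMaps():
    def process(self, instance):
        # The Collector defines the amount we're allowed to have;
        # if nothing is provided we allow between 0 and 1 UV sets.
        # At least it's not up to the Validator to have such information hardcoded
        min_allowed_uvs = instance.data('uvNumMin', 0)
        max_allowed_uvs = instance.data('uvNumMax', 1)
        name_rule = instance.data('uvNameRule', 'uvMap#')

        # Turn the rule into a regular expression, '#' standing for a number
        name_regex = None
        if name_rule:
            pattern = re.escape(name_rule).replace(r'\#', r'\d+')
            name_regex = re.compile(pattern + '$')

        # Then these rules are used by the Validator to provide us
        # the best possible validation report.
        for mesh in ls(instance, type='mesh'):
            uv_sets = mesh.get_uv_sets()
            num = len(uv_sets)

            if num < min_allowed_uvs:
                self.log.error('%s has too few UV sets: %d' % (mesh, num))

            elif num > max_allowed_uvs:
                self.log.error('%s has too many UV sets: %d' % (mesh, num))

            if name_regex:
                for uv_set in uv_sets:
                    name = uv_set.name
                    if not name_regex.match(name):
                        self.log.error('%s: UV map %s does not match %s' % (mesh, name, name_rule))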