Filtering collected instances based on category/family

Ok, so this question touches upon the subtle difference between registering plug-ins via paths and associating plug-ins to content via families, so I thought I’d take the opportunity to properly pin-point each one and highlight how and where each fits into “the Pyblish way”.

In short, families are to registration what a surgical knife is to a hammer. You need both to build a… I mean to operate on a… moving on!


Registering Plug-ins

Pyblish provides two primary mechanisms for specifying which plug-ins to take into account during publishing.

  1. Via the environment variable PYBLISHPLUGINPATH
  2. Via pyblish.api.register_plugin_path()

Both accomplish the same goal of making one or more directories of plug-ins available to pyblish.api.discover() during publishing - one statically defined in the application's environment and one dynamically set at run-time.

The original design intent of plug-in registration is to separate plug-ins that have little or no relation, such as those specific to Project A or personal to Artist B.

Typical use looks something like this.

$ set PYBLISHPLUGINPATH
$ launch app
$ # do work

The registered plug-ins henceforth represent the ecosystem within which each is expected to work together towards a common goal. For example, one collector may knowingly gather information for a subsequent extractor, with both operating towards a compatible integrator.

One may define more than one ecosystem, and activate each where appropriate. Common examples include per-project, per-task and per-artist. This is where registration shines.
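
For the run-time route, a minimal sketch might look like this; the directory paths are made-up examples, not a convention.

import pyblish.api

# Hypothetical per-project and per-artist plug-in directories.
pyblish.api.register_plugin_path("/pipeline/projectA/plugins")
pyblish.api.register_plugin_path("/pipeline/artists/artistB/plugins")

# discover() now takes both directories into account, in addition
# to anything already on PYBLISHPLUGINPATH.
for Plugin in pyblish.api.discover():
    print(Plugin.__name__)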



Families

In contrast to the "heavy-handed", global control offered by registration, families offer a finer level of control over your data.

With families, you are able to associate a subset of plug-ins with a subset of content following a given pattern. Families are synonymous with an interface or contract in programming terms.

For example, in a system with 100 plug-ins, three of them may apply to a given asset such as a ShotCamera - Collector, ValidateCameraFrustum and ExtractCamera. Other plug-ins fade silently into the background, as they don't apply. In pyblish-qml, there is a mechanism in place which hides incompatible plug-ins from the view, such that only plug-ins compatible with an active instance are shown.

This enables a responsive environment in which “content is king”.

In a typical environment, a technical director specifies this contract, whereas the artist adheres to it. For example, a contract may read "all models are prefixed model_" or "rigs contain this special node with our in-house metadata". Should any subset of data happen to adhere to any of these contracts, an instance is created and the corresponding plug-ins are associated with it.
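
As a sketch of what such a contract looks like in code, a validator could bind itself to a hypothetical rig family like this (the plug-in name and the check itself are assumptions, not an established convention):

import pyblish.api

class ValidateRigContract(pyblish.api.InstancePlugin):
    """Runs only on instances whose family is "rig"."""
    order = pyblish.api.ValidatorOrder
    families = ["rig"]

    def process(self, instance):
        # Hypothetical contract: every rig carries our in-house metadata.
        assert instance.data.get("metadataNode"), (
            "%s is missing the in-house metadata node" % instance)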



Example

Nothing hits home better than an example.

In this example, I’ll start from a heavy-handed contract (read family) and work my way towards specifics.

  1. The entire scene is implicitly identified by the mere act of publishing, the family is scene
  2. A character rig is identified and associated with a subset of plug-ins, the family is rig
  3. A character cache is identified and supersedes the rig, family is animation and families of rig are made optional
  4. A camera is identified with settings different from those in ftrack for this shot, an instance of family cameraDelta is identified

The following plug-ins may run as-is, but are intended as pseudo-code for human consumption.

CollectScene.py

import os
import pyblish.api

class CollectScene(pyblish.api.ContextPlugin):
    order = pyblish.api.CollectorOrder

    def process(self, context):
        # As per item 1 above, the whole scene gets the family "scene".
        instance = context.create_instance("Scene", family="scene")

        # Assumes an external pipeline having initialised these values.
        instance.data.update({
            "user": os.getenv("PIPELINE_USER"),
            "task": os.getenv("PIPELINE_TASK")
        })

The pyblish.api.Context could optionally itself be considered the scene. That would also make sense, but a benefit of making it an instance lies in the ability to give it a family and thus associate a series of plug-ins with it for validation, extraction, etc.
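
For example, a validator bound to that "scene" family could then verify the data the collector above gathered. A minimal sketch; the plug-in name and checks are assumptions:

import pyblish.api

class ValidateSceneData(pyblish.api.InstancePlugin):
    """Runs only on the instance collected with family "scene"."""
    order = pyblish.api.ValidatorOrder
    families = ["scene"]

    def process(self, instance):
        assert instance.data.get("user"), "Scene has no user set"
        assert instance.data.get("task"), "Scene has no task set"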

CollectAssets.py

import pyblish.api
from maya import cmds

class CollectAssets(pyblish.api.ContextPlugin):
    order = pyblish.api.CollectorOrder

    def process(self, context):
        for asset in cmds.ls("*_AST", assemblies=True, objectsOnly=True):
            # Consider top-level nodes suffixed "_AST"
            # to be pipeline convention for publishable content.
            instance = context.create_instance(asset)

            # `or []` guards against listAttr returning None.
            for attr in cmds.listAttr(asset, userDefined=True) or []:
                # Assume assets maintain an attribute containing
                # their relevant metadata, such as `family`.
                instance.data[attr] = cmds.getAttr(asset + "." + attr)

            # (Naively) assume the asset node and all of its DAG
            # descendants make up the entirety of this instance.
            instance[:] = cmds.ls(asset, dag=True, long=True)

Upon launching the GUI, the user is presented with the fruits of this configuration: a single instance checked by default and additional instances optionally available.

Given the available options, it's likely this is an animator working on a character animation - but it might not be, and to Pyblish it doesn't matter. Pyblish is oriented around getting content out of software and leaves task and environment management at the mercy of the surrounding infrastructure.



Final note

The notion of families may seem alien at first, or, worse, "optional" and somehow not important to your specific use-case. But they are essential to Pyblish for the same reason Pyblish exists in the first place - to keep bad data from escaping your pipeline.

Giving content an identity is what enables plug-ins, and you the developer, to make clear-cut assumptions about what is moving on from the messy workspace of an artist and into the open world of shared data. Pyblish is designed to heavily guard these gates and encourages you the developer to ask permission rather than forgiveness when it comes to the safety and integrity of your data.

It’s safe to say that Pyblish is less about automation than it is about safety.

Thanks marcus.

So in this example, say you have an artist that needs to publish a puppet update, and then open a shot and publish the shot animation. You would set PIPELINE_TASK to “rigging” for the first task (and only run rigging family collectors), and then “animation” for the second (ditto animation family collectors)? I still feel like what’s missing is to be able to just set this explicitly as part of the UI launch.


Thank you for the explanation Marcus!

Maybe I'm just new to all this, but it still isn't clear to me what the recommended way to solve our use-case is. One thing I understand is that we should utilise families and not change the available plugins at runtime.

Given the available options, it's likely this is an animator working on a character animation - but it might not be, and to Pyblish it doesn't matter. Pyblish is oriented around getting content out of software and leaves task and environment management at the mercy of the surrounding infrastructure.

So given that the artist has (at runtime) selected to publish a Camera, and I want to guide them in which instances they can see and select (hide all non-relevant), then how would you go about this? Would I solve this purely in the UI?

Thanks!

I see two ways to go about this: specific collectors and environment-dependent collectors.

Specific Collectors

Here you would specify a workflow for publishing. This could be that you know a rig has certain nodes, so you can assume that if these nodes are in the scene, and not referenced, the scene has a rig to publish.

As for animation, which I'm guessing is a matter of alembics/point caches, this becomes a little trickier to handle, as rigs also have meshes and sometimes even animation as well. Here I would probably rely on the users choosing what to publish: always present the user with an unchecked instance of family alembic.mesh, and they choose when to publish.
Certainly not my preferred workflow, as it's not super user-friendly, but an option nonetheless.

You can also set down a workflow for publishing where you tag objects explicitly for publishing. I think @marcus and @BigRoy explored this for pyblish-magenta, where you tag sets for publishing, e.g. add rig nodes to a set and add a boolean attribute called pyblish_rig.

Environment-Dependent Collectors

These collectors would query the environment they are run in to determine whether to create instances or not.
In the collector you would have something like this at the start:

if os.environ.get('PIPELINE_TASK') != 'rigging':
    return

You would then just need to change the PIPELINE_TASK environment variable to switch between different publishing modes.
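
A fuller sketch of such an environment-dependent collector, assuming a hypothetical naming convention for rig groups and a "rig" family:

import os
import pyblish.api
from maya import cmds

class CollectRigs(pyblish.api.ContextPlugin):
    """Only collects rigs when the surrounding environment says we're rigging."""
    order = pyblish.api.CollectorOrder

    def process(self, context):
        if os.environ.get("PIPELINE_TASK") != "rigging":
            return

        # Hypothetical convention: rigs live under top-level "_RIG" groups.
        for rig in cmds.ls("*_RIG", assemblies=True):
            instance = context.create_instance(rig, family="rig")
            instance[:] = cmds.ls(rig, dag=True, long=True)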

This could be where the problem lies. Families don't influence collectors; collectors are what assign families.

Try this - consider what it would be like if all collectors were always run. If we put aside the complexities of per-project or per-task publishing for a second, let's consider a single global stack of plug-ins.

  1. All collectors always run, they assign families
  2. Subsequent plug-ins are run based on their assigned family

Now, what problem(s) do you see?

If there is a publishable camera, then it should appear as an option in the GUI.

If there is a publishable camera, then it should appear as an option in the GUI

Yeah, it was more about hiding other publishable instances - i.e. hiding instances that are not relevant

Aha! This isn’t a workflow I’m familiar with, could you give an example?

We intend to have a dialog where you first decide what “type” you would like to publish. To guide the user a bit more than just exposing all collectable instances.

Something like:

If you select Camera, we would only let the user choose from instances of the Camera "family". Our intention is to simplify things for the artist. Does it make sense?

Ok, one way to approach this is to use your dialog to initialise the scene and/or assets with data that a collector could later pick up.

Let's say we assume a Maya scene with some nodes. Simply hitting "Publish" doesn't reveal anything at this point; the collectors can't yet identify what is and isn't an instance. Your dialog could then "install" attributes, or add a particular node, that collectors are set to search for.

# Pseudo
dialog.selection().make_camera()
pyblish_lite.show()

Pyblish encourages stateful data, such that once all has been said and done, simply hitting publish should faithfully recreate a given scenario. This is to facilitate batch/offline processing, where there is no interface and the user can't be given an interactive choice.
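
A sketch of what that could look like in Maya - the dialog writes a tag onto the selection, and a collector later picks it up. The attribute name, function and family value are assumptions, not an established API:

import pyblish.api
from maya import cmds

def make_camera(node):
    """Called by the hypothetical dialog: tag a node as a publishable camera."""
    if not cmds.attributeQuery("pyblishFamily", node=node, exists=True):
        cmds.addAttr(node, longName="pyblishFamily", dataType="string")
    cmds.setAttr(node + ".pyblishFamily", "camera", type="string")

class CollectTaggedInstances(pyblish.api.ContextPlugin):
    """Turns every tagged node into an instance of the tagged family."""
    order = pyblish.api.CollectorOrder

    def process(self, context):
        for node in cmds.ls("*.pyblishFamily", objectsOnly=True):
            family = cmds.getAttr(node + ".pyblishFamily")
            instance = context.create_instance(node, family=family)
            instance[:] = cmds.ls(node, dag=True, long=True)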

That’s a good way to frame the question. I would anticipate a lot of collectors, and then a lot of logic for assigning families. I think most users would have a lot of collectors run that are never relevant to them, and I would worry about false positives, or false negatives if I or another TD didn’t fully anticipate all the publishing permutations. I’d be a little concerned about when modelling starts writing their own plugins, not taking into account that they might affect animation, etc.

Part of the problem on my end is that we don’t have a very robust environment management setup, so there isn’t much data available in the environment for divining a user’s task. This is something ideally which would change, but is a bit out of my control at the moment.

So all things being equal, at the moment I prefer to organize plugins in such a way that they’re only loaded and run in a specific context. That way I can say: modellers, put your plugin in the modelling folder and then launch the ui in the modelling context, and you never have to worry about what animation is doing.

This is pretty much what I'm doing at the moment: 3 plugin paths are set prior to launch - the global plugin path, which is always relevant; the application plugin path, which is relevant to the host you're launching from; and finally the task plugin path, which is passed explicitly by the artist.

From a workflow standpoint, I see this as being basically the same, but again, all things being equal, it seems more complicated to me to manage adding extra data to the scene (which will be handled differently per application), and then managing the logic for how to interpret that data, versus just saying “only run these plugins.”

I think a lot of us are probably used to each department having their own publish tool tailored to their own needs, and that’s part of the reason I think pyblish is great, unifying the interface and architecture regardless of what is being published. But I don’t feel like having more control of plugin execution undermines that at all, and I suspect this is a request you’re going to keep getting.

This is an opportunity for me to highlight in which direction I’m guiding you. When we’re done, I would like you to be able to say: “modellers, tag your work modelProxy, modelInternal or modelAnimation depending on your intent” where each family (read “contract”) is detailed on your wiki/forums.

modelProxy: "A low-resolution, non-essential asset for optimal performance/memory"
modelInternal: "An asset only relevant to either you or those within your department."
modelAnimation: "Animation-friendly geometry, with UVs, no self-intersection and no non-manifold geometry"

At which point you, the developer, can design plug-ins that operate based on that intent. Internal assets, for example, may be allowed to skip some of the more thorough checks, whereas modelAnimation must pass the harshest of checks.
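
A sketch of how that split could look - a harsh check bound only to the assumed modelAnimation family, which the lighter families simply never trigger:

import pyblish.api
from maya import cmds

class ValidateNonManifold(pyblish.api.InstancePlugin):
    """Harsh check reserved for the modelAnimation contract."""
    order = pyblish.api.ValidatorOrder
    families = ["modelAnimation"]  # modelProxy/modelInternal skip this on purpose

    def process(self, instance):
        meshes = cmds.ls(list(instance), type="mesh", long=True)
        invalid = cmds.polyInfo(meshes, nonManifoldVertices=True) if meshes else []
        assert not invalid, "Non-manifold vertices found: %s" % invalid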

That’s a good way to frame the question. I would anticipate a lot of collectors, and then a lot of logic for assigning families.

Ok, this is good! Let’s try and tackle this.

There is a practice that I've more often come to recommend when it comes to making collectors, which is to allow your TDs to make the decision about which family/families a particular asset belongs to. They know a lot more about the asset than any automated process ever will, and you can extract this information from them.

Have a look at this collector.

This one collector applies to every department and any family, and it moves the responsibility away from you figuring out a family based on environment or circumstance, to them assigning it themselves based on what they intend.

Part of the problem on my end is that we don’t have a very robust environment management setup, so there isn’t much data available in the environment for divining a user’s task.

Somewhat of a side track, but this is one of the reasons why not just you, but others too, may sometimes get an answer such as "No, Pyblish doesn't do that". I've made it my mission to ensure Pyblish only ever concerns itself with publishing tasks, so that it remains lightweight and unbloated.

Before publishing, both of these questions need a “yes” answer.

  1. Is the data complete?
  2. Is it outgoing?

Yes, I tried phrasing it in that way to put it in context with Pyblish.

But the truth is, what is needed there falls outside the scope of Pyblish. As I mentioned above, the data must be complete. Your collector(s) depend on it.

Making data complete opens a whole different can of worms, one that is equally deserving of one or more dedicated tools. I know @BigRoy is really busy at the moment, but he's got the closest thing to what I try to steer developers towards. Maybe when he finds the time, he could share some of those.

Sorry, could you rephrase this? Just want to make sure I understand completely.

Thanks for the excellent questions - this is great, and I hope it helps!

You could present all the available instances to the user and let them decide what to publish. The workflow I'm thinking about is to present a mesh as modeling.mesh and animation.mesh, in an unchecked state. Setting an instance's initial state can be achieved with instance.data['publish'] = False.
When the user decides to publish either of those instances, you can use callbacks to persist the publish state back to the scene. I'd imagine using attributes in Maya, so if the user checks modeling.mesh to be published, you add a boolean attribute to the mesh called pyblish.modeling.mesh (can't remember if Maya accepts "." in attribute names) set to True.
The next time someone publishes the scene, the collector recognises that pyblish.modeling.mesh is set to True, meaning that the instance should be checked for publish in the GUI.
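
A minimal sketch of the collecting side of this, assuming the attribute is instead named pyblish_modeling_mesh (Maya attribute names can't contain "."):

import pyblish.api
from maya import cmds

class CollectModelingMeshes(pyblish.api.ContextPlugin):
    order = pyblish.api.CollectorOrder

    def process(self, context):
        for mesh in cmds.ls(type="mesh", long=True):
            instance = context.create_instance(mesh, family="modeling.mesh")

            # Unchecked by default; checked if the artist opted in previously.
            state = False
            if cmds.attributeQuery("pyblish_modeling_mesh", node=mesh, exists=True):
                state = bool(cmds.getAttr(mesh + ".pyblish_modeling_mesh"))
            instance.data["publish"] = state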

What you achieve with this workflow, is to keep your code within the Pyblish universe.

Granted, this workflow will present the user with an increasing number of instances as you add more and more collectors, but that is possibly something that could be accounted for in the GUI by having the instance sections collapsed by default.

Agreed.

The GUI can and should evolve based on how it is being used.

Great discussion here. Before I jump in, let me state that my experience with Pyblish comes from running it in a small animation studio (mostly up to 6 people, with recent spikes to 12). There's no "environment switching" going on at our location during a session or per task in production (e.g. modeling and animation have the same environment/tools). As such, we make extensive use of families.

In a way we have a somewhat similar workflow to @mattiaslagergren's image for a GUI (even though it's not really "designed" yet, just a plain list). Basically the artist defines what he wants to extract. Almost like a form of "export selected".

Yet instead of instantly doing an export after pressing that button, it tags data in the scene (stores it persistently) together with its settings, so that subsequent publishes will be done alike. This means publishes become much more consistent. For example, in Maya we create objectSets that contain the nodes to be extracted. A simple Camera export would create an objectSet containing that camera. This set would also store data like startFrame and endFrame, etc.

Then we have a single collector Plugin which just finds the objectSets in the scene and creates a Pyblish instance with that family and data.

In our case we produce an objectSet with a family, and whatever the artist adds to that set is included in that extraction. This way we can have many consistent outputs from a work file.
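
A sketch of that single collector plug-in, assuming each objectSet carries a string attribute named family plus whatever other settings the artist entered:

import pyblish.api
from maya import cmds

class CollectObjectSets(pyblish.api.ContextPlugin):
    """One collector for every family: each tagged objectSet becomes an instance."""
    order = pyblish.api.CollectorOrder

    def process(self, context):
        for objset in cmds.ls(type="objectSet"):
            # Only consider sets the artist tagged with a family.
            if not cmds.attributeQuery("family", node=objset, exists=True):
                continue

            family = cmds.getAttr(objset + ".family")
            instance = context.create_instance(objset, family=family)

            # Whatever the artist added to the set is the instance's content.
            instance[:] = cmds.sets(objset, query=True) or []

            # Carry over per-instance settings, e.g. startFrame/endFrame.
            for attr in cmds.listAttr(objset, userDefined=True) or []:
                instance.data[attr] = cmds.getAttr(objset + "." + attr)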

Then Validators are always all on the active plug-in paths but are only active for specific families. E.g. a Camera instance has different validations/extractions from a Pointcache instance. If an additional "type" is to be extracted, we define another instance. The nice thing about sets is that the same nodes can be included in different sets at the same time.

And importantly, this also allows us to “batch” update and republish content.

Thanks everyone for your thoughtful and thorough replies.

I just mean that the approach I tried and the alternative suggested in this thread are both basically ways of controlling which plugins are run based on user input:

I understand that adding data to the scene is the appropriate solution in pyblish terms, but if it’s primarily being done to assign families, then to me it just seems like a somewhat unnecessary intermediate step, compared with just having the user set the family directly.

I don’t mean to be argumentative or push for something that’s counter to the principles of the project, I just feel that in my situation, to promote adoption and support the desired workflows, the most straightforward path is to explicitly control what plugins are run based on the user input.

This I don't actually feel is a viable solution. I think once the number of users times the number of publishables gets high enough, incorrect publishes are inevitable; and conversely, if I know what I want to publish before I launch the tool, then why not limit what is publishable in that context?

I think at this point, it sounds like you’ve got an understanding of what is favorable with Pyblish but your path is simply slightly different. That’s as far as educated decisions go, and I think that’s fine.

What I'd suggest is that you do follow your own path; odds are you'll be the one sitting on an upcoming best practice in the near future once your system has stabilized. Publishing is simply too unexplored (and under-documented) a concept to fully say what's right and what isn't at this point.

Maybe once you've found your bearings, you could venture down the suggested path too and share the pros/cons. That, I think, would be the greatest contribution to the project and to publishing overall.

At the time of originally implementing that, it wasn't about families. On a more global level it was meant to store information like startFrame, endFrame, preRoll, etc. - all the things where you'd want to avoid guesswork about what the artist intended. For us this made our scenes' content much clearer. It actually becomes much easier for the artists in the long run: if they publish an update of a shot they don't need to re-enter all this information, and of course it ensures it's consistent where it needs to be.

Whether it persists in the scene or in an asset management database of course would be all up to personal preferences. I’m purely debating the aspect of persisting those choices.

In a way you could see the “instance” that is created by user input and persists with it as a “mini-environment” in the scene. Basically that defines what gets run, whether that’s through some state managing of your environment or pyblish families I guess is up to you.

The problem is exactly in the end of that sentence. If you already know what you want to publish, then there's no need to push buttons or even have user intervention. Once there are two models coming from one scene, or a camera and a character coming from a scene (both separately), then it becomes a bit more guesswork to understand the artist's intentions. Does he even want to have that camera published? Similarly, what frame to what frame is required for a pointcache? Does it need pre-roll? Handles?

Anyway, as we started taking more control of publishing (moving to Pyblish made things easier for us, so we took it a step further) we learned more about possible pitfalls, and also about mindblowing improvements that we hadn't expected. I'd say the best step at this stage is to leap in and run a production.

This sounds spot on!

I absolutely agree that if you present all the instances to the user, it'll quickly become overwhelming for them and they could potentially publish something they don't intend to.
But I do think you can present the instances in the GUI in a cleaner way, so they are more aware of what they are publishing. Currently the long list of instances in pyblish-qml and pyblish-lite becomes unmanageable with 10+ instances to make a decision on.
If you had a section where the user could see only the instances that are checked for publishing, which could be just two of 10+ instances, they would have a better overview of what they are publishing.

As a side note, we kind of had this problem with publishing from Hiero, where I present the user with all shots tagged for publishing. If the user had a video track with tens of shots, it would produce tens of instances, meaning it would become tedious to select specific shots to publish.
The solution was to collect the current selection when publishing (Hiero's implementation is a right-click menu), and only have those shots checked for publishing. This made it much quicker to publish specific shots, without having to reorganise the tags within the Hiero project.


Cheers everyone! This has been a great discussion for me. I’ll let you know how I get on.

Same here, very helpful!