Filtering collected instances based on category/family


We are considering writing a custom user interface for Pyblish and would appreciate some guidance on best practices from the community.

One of our ideas is to have the user select a category of what they want to publish. Let’s say the user is in Maya and chooses to publish Camera. Then we want to guide the user to only select from the cameras in the scene.

Our original idea was to filter down the list of collectors to only collect Camera instances, and possibly only run plugins of camera and * family. Marcus mentioned that this goes against some of the design ideas in Pyblish. I can see that it could conflict with the idea of the inversion of control logic that is already implemented, that determines how the plugins run.

As an alternative I could see us adding a high priority collector that figures out the category from the user interaction and sets that into the context. Subsequent collectors would then avoid collecting instances other than maya cameras. Then no real extra logic has to be added outside of the plugins.

Any feedback is appreciated! :slight_smile:

If you are making your own GUI, you could easily just present certain instances based on family.

Say you have geometry and camera collectors. They run and collect the instances from the scene into families called “ftrack.geometry” and “”. In your GUI, depending on what the user chooses, you either present the “ftrack.geometry” instances or the “” instances.
Your validators, extractors and integrator can be family-specific if necessary, or just run on all families.
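As a minimal sketch of that GUI-side filtering, assume instances are plain dictionaries with a "family" key (a simplification; real Pyblish instances keep their family in `instance.data`). The category-to-family mapping and the "camera" family name are hypothetical. Collection itself is untouched; the dialog only decides what to show.

```python
# Hypothetical mapping from the category picked in a custom dialog
# to the families that category should reveal.
CATEGORY_FAMILIES = {
    "Geometry": {"ftrack.geometry"},
    "Camera": {"camera"},  # hypothetical family name
}


def visible_instances(instances, category):
    """Return only the collected instances relevant to the chosen category.

    Every collector still runs; filtering happens purely in the
    presentation layer.
    """
    wanted = CATEGORY_FAMILIES.get(category, set())
    return [inst for inst in instances if inst["family"] in wanted]


collected = [
    {"name": "body_GEO", "family": "ftrack.geometry"},
    {"name": "shotCam", "family": "camera"},
    {"name": "props_GEO", "family": "ftrack.geometry"},
]

print([inst["name"] for inst in visible_instances(collected, "Camera")])
# ['shotCam']
```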

Okay, cool! As I understand you suggest that the issue is solved in the UI - rather than having the pyblish plugins know about this.

Thanks for the input!

Thanks for asking this, it’s an important topic. Let me gather some material and thoughts and get back to you on this properly.

Ok, so this question touches upon the subtle difference between registering plug-ins via paths and associating plug-ins to content via families, so I thought I’d take the opportunity to properly pin-point each one and highlight how and where each fits into “the Pyblish way”.

In short, families is to registration what a surgical knife is to the hammer. You need both to build a… I mean to operate on a… moving on!

Registering Plug-ins

Pyblish provides two primary mechanisms for specifying which plug-ins to take into account during publishing.

  1. Via the environment variable PYBLISHPLUGINPATH
  2. Via pyblish.api.register_plugin_path()

Both accomplish the same goal of including one or more directories of plug-ins during publishing: one statically defined for the application process and one dynamically set at run-time.

The original design intent of plug-in registration is to separate plug-ins that have little or no relation, such as those specific to Project A or personal to Artist B.

Typical use looks something like this.

```
$ launch app
$ # do work
```

The registered plug-ins henceforth represent the ecosystem within which they are expected to work together towards a common goal. For example, one collector may knowingly gather information for a subsequent extractor, both of which operate towards a compatible integrator.

One may define more than one ecosystem and activate each where appropriate. Common examples include per-project, per-task and per-artist ecosystems. This is where registration shines.


In contrast to the “heavy-handed”, global control offered by registration, families offer a finer level of control over your data.

With families, you are able to associate a subset of plug-ins with a subset of content following a given pattern. In programming terms, families are synonymous with an interface or contract.

For example, in a system with 100 plug-ins, three of them may apply to a given asset such as a ShotCamera: a collector, ValidateCameraFrustrum and ExtractCamera. The other plug-ins fade silently into the background, as they don’t apply. In pyblish-qml, there is a mechanism in place which hides incompatible plug-ins from the view, such that only plug-ins compatible with an active instance are shown.
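Pyblish performs this association itself; the following is only a simplified model of the idea, not the library's implementation. Each plug-in declares the families it supports, with "*" acting as a wildcard, and an instance only triggers the plug-ins whose families include its own. The collector name, the "shotCamera" family and the two extra plug-ins are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Plugin:
    name: str
    # By default a plug-in accepts every family ("*").
    families: list = field(default_factory=lambda: ["*"])


def compatible(plugin, family):
    """A plug-in applies if it lists the family or the "*" wildcard."""
    return "*" in plugin.families or family in plugin.families


registered = [
    Plugin("CollectShotCamera"),                       # hypothetical
    Plugin("ValidateCameraFrustrum", ["shotCamera"]),
    Plugin("ExtractCamera", ["shotCamera"]),
    Plugin("ValidateRigControls", ["rig"]),            # hypothetical
    Plugin("ExtractAlembic", ["animation"]),           # hypothetical
]

# Only these would be shown for an active shotCamera instance.
shown = [p.name for p in registered if compatible(p, "shotCamera")]
print(shown)
# ['CollectShotCamera', 'ValidateCameraFrustrum', 'ExtractCamera']
```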

This enables a responsive environment in which “content is king”.

In a typical environment, a technical director specifies this contract, whereas the artist adheres to it. For example, a contract may read “all models are prefixed model_” or “rigs contain this special node with our in-house metadata”. Should any subset of data happen to adhere to any of these contracts, an instance is created and the corresponding plug-ins are associated with it.
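To make the idea of a contract concrete, here is a sketch with the scene stubbed out as a dictionary of node names to attributes (in Maya these would come from scene queries). The two rules mirror the examples above; the "pipelineMetadata" key is a hypothetical stand-in for “this special node with our in-house metadata”.

```python
def identify(scene):
    """Yield (node, family) for every node honouring a known contract."""
    for node, attrs in sorted(scene.items()):
        if node.startswith("model_"):
            # Contract: all models are prefixed "model_".
            yield node, "model"
        elif attrs.get("pipelineMetadata"):
            # Contract: rigs carry our in-house metadata (hypothetical key).
            yield node, "rig"
        # Anything else fulfils no contract and produces no instance.


scene = {
    "model_chair": {},
    "hero_rig": {"pipelineMetadata": {"asset": "hero"}},
    "tmp_locator": {},
}
print(list(identify(scene)))
# [('hero_rig', 'rig'), ('model_chair', 'model')]
```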


Nothing hits home better than an example.

In this example, I’ll start from a heavy-handed contract (read family) and work my way towards specifics.

  1. The entire scene is implicitly identified by the mere act of publishing; the family is scene
  2. A character rig is identified and associated with a subset of plug-ins; the family is rig
  3. A character cache is identified and supersedes the rig; the family is animation, and instances of family rig are made optional
  4. A camera is identified with settings different from those in ftrack for this shot; an instance of family cameraDelta is created
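Step 3, where an animation cache supersedes the rig and makes rig instances optional, could be expressed as a late collector that adjusts each instance's initial checked state. This sketch assumes plain dicts and uses a "publish" key for the checked state, mirroring `instance.data["publish"]` in Pyblish; the supersede rule itself is an illustration of the example, not something Pyblish does for you.

```python
def set_initial_states(instances):
    """Decide which instances start checked in the GUI."""
    families = {inst["family"] for inst in instances}
    for inst in instances:
        if inst["family"] == "rig" and "animation" in families:
            # A character cache supersedes the rig: keep it available, unchecked.
            inst["publish"] = False
        else:
            inst.setdefault("publish", True)
    return instances


instances = [
    {"name": "Scene", "family": "scene"},
    {"name": "hero_AST", "family": "rig"},
    {"name": "hero_cache", "family": "animation"},
]
print([(i["name"], i["publish"]) for i in set_initial_states(instances)])
# [('Scene', True), ('hero_AST', False), ('hero_cache', True)]
```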

The following plug-ins may run as-is, but are intended as pseudo-code for human consumption.

```python
import os
import pyblish.api


class CollectScene(pyblish.api.ContextPlugin):
    order = pyblish.api.CollectorOrder

    def process(self, context):
        # The scene itself becomes an instance of family "scene".
        instance = context.create_instance("Scene")
        instance.data["family"] = "scene"

        # Assumes an external pipeline having initialised these values.
        instance.data.update({
            "user": os.getenv("PIPELINE_USER"),
            "task": os.getenv("PIPELINE_TASK"),
        })
```

The pyblish.api.Context may itself optionally be considered the scene. That works too, but the benefit of making the scene an instance lies in the ability to give it a family and thus associate a series of plug-ins with it for validation, extraction, etc.

```python
import pyblish.api
from maya import cmds


class CollectAssets(pyblish.api.ContextPlugin):
    order = pyblish.api.CollectorOrder

    def process(self, context):
        # Consider top-level nodes suffixed "_AST"
        # to be pipeline convention for publishable content.
        for asset in cmds.ls("*_AST", assemblies=True, objectsOnly=True):
            instance = context.create_instance(asset)

            # Assume assets maintain user-defined attributes containing
            # their relevant metadata, such as `family`.
            for attr in cmds.listAttr(asset, userDefined=True) or []:
                instance.data[attr] = cmds.getAttr(asset + "." + attr)

            # (Naively) assume all content under this transform
            # group to make up the entirety of this instance.
            instance[:] = cmds.listRelatives(asset, allDescendents=True) or []
```

Upon launching the GUI, the user is presented with the fruits of this configuration: a single instance checked by default and additional instances optionally available.

Given the available options, it’s likely this is an animator working on a character animation, or not. To Pyblish it doesn’t matter. Pyblish is oriented around getting content out of software and leaves task and environment management at the mercy of the surrounding infrastructure.

Final note

The notion of families may seem alien at first, or worse “optional” and somehow not important to your specific use-case. But they are essential to Pyblish for the same reason Pyblish exists in the first place - to keep bad data from escaping your pipeline.

Giving content an identity is what enables plug-ins, and you the developer, to make clear-cut assumptions about what is moving on from the messy workspace of an artist and into the open world of shared data. Pyblish is designed to heavily guard these gates and encourages you the developer to ask permission rather than forgiveness when it comes to the safety and integrity of your data.

It’s safe to say that Pyblish is less about automation than it is about safety.

Thanks, Marcus.

So in this example, say you have an artist that needs to publish a puppet update, and then open a shot and publish the shot animation. You would set PIPELINE_TASK to “rigging” for the first task (and only run rigging family collectors), and then “animation” for the second (ditto animation family collectors)? I still feel like what’s missing is to be able to just set this explicitly as part of the UI launch.


Thank you for the explanation Marcus!

Maybe I’m just new to all this, but it still isn’t clear to me what the recommended way to solve our use case is. One thing I do understand is that we should utilise families rather than change the available plugins at run-time.

> Given the available options, it’s likely this is an animator working on a character animation, or not. To Pyblish it doesn’t matter. Pyblish is oriented around getting content out of software and leaves task and environment management at the mercy of the surrounding infrastructure.

So, given that the artist has (at run-time) selected to publish a Camera, and I want to guide them in which instances they can see and select (hiding all irrelevant ones), how would you go about this? Would I solve this purely in the UI?


I see two ways to go about this: specific collectors and environment-dependent collectors.

Specific Collectors

Here you would specify a workflow for publishing. For instance, you may know that a rig has certain nodes, so you can assume that if these nodes are in the scene, and not referenced, the scene has a rig to publish.
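A sketch of such a specific collector, with Maya swapped out for a list of node records so it runs anywhere. The node type "rigRoot" and the record fields are hypothetical; inside Maya the referenced check would typically come from `cmds.referenceQuery(node, isNodeReferenced=True)`.

```python
from collections import namedtuple

# Stand-in for a scene query; field names are hypothetical.
Node = namedtuple("Node", ["name", "type", "referenced"])


def collect_rigs(nodes, rig_type="rigRoot"):
    """Return rig instances for local (non-referenced) rig nodes only."""
    return [
        {"name": node.name, "family": "rig"}
        for node in nodes
        # A referenced rig belongs to someone else's publish, so skip it.
        if node.type == rig_type and not node.referenced
    ]


scene = [
    Node("hero_rigRoot", "rigRoot", False),
    Node("villain_rigRoot", "rigRoot", True),  # referenced in
    Node("ground_GEO", "mesh", False),
]
print(collect_rigs(scene))
# [{'name': 'hero_rigRoot', 'family': 'rig'}]
```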

As for animation, which I’m guessing is a matter of alembics/point caches, this becomes a little trickier to handle, as rigs also have meshes and sometimes even animation. Here I would probably rely on the users choosing what to publish: always present the user with an unchecked instance of family alembic.mesh, and they choose when to publish.
Certainly not my preferred workflow, as it's not super user-friendly, but an option nonetheless.
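The opt-in variant could be as small as this, again with plain dicts standing in for Pyblish instances and a hypothetical "publish" key standing in for the initial checked state.

```python
def collect_meshes(mesh_names):
    """Offer every mesh as an optional alembic.mesh instance.

    Nothing is published unless the artist checks it in the GUI.
    """
    return [
        {"name": name, "family": "alembic.mesh", "publish": False}
        for name in mesh_names
    ]


meshes = collect_meshes(["body_GEO", "cloth_GEO"])
```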

You can also set down a workflow for publishing where you tag objects explicitly for publishing. I think @marcus and @BigRoy explored this for pyblish-magenta, where you tag sets for publishing, e.g. add rig nodes to a set and add a boolean attribute called pyblish_rig.
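A sketch of collecting such tagged sets, with sets modelled as dicts of attributes and members. Generalising `pyblish_rig` to a `pyblish_<family>` naming convention is an assumption made for illustration.

```python
def collect_tagged_sets(object_sets):
    """Create one instance per set carrying a truthy pyblish_<family> flag."""
    instances = []
    for set_name, info in sorted(object_sets.items()):
        for attr, value in info["attrs"].items():
            if attr.startswith("pyblish_") and value:
                family = attr[len("pyblish_"):]
                instances.append({
                    "name": set_name,
                    "family": family,
                    "members": list(info["members"]),
                })
    return instances


object_sets = {
    "rig_SET": {"attrs": {"pyblish_rig": True}, "members": ["root_CTL", "body_GEO"]},
    "wip_SET": {"attrs": {"pyblish_rig": False}, "members": ["test_GEO"]},
    "notes_SET": {"attrs": {}, "members": []},
}
instances = collect_tagged_sets(object_sets)
```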

Environment Dependent Collectors

These collectors would query the environment they are run in, to determine whether to create instances or not.
In the collector you would have something like this at the start:

```python
if os.environ.get("PIPELINE_TASK") != "rigging":
    return  # not a rigging session, so create no instances
```

You would then just need to change the PIPELINE_TASK environment variable, to switch between different publishing modes.
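Generalising that early exit, a collector could look up which families the current task allows. The task-to-family table is hypothetical, and `environ` is a parameter only so the logic can be exercised without touching the real environment.

```python
import os

# Hypothetical mapping of tasks to the families their collectors create.
TASK_FAMILIES = {
    "rigging": {"rig"},
    "animation": {"animation", "camera"},
}


def active_families(environ=os.environ):
    """Families that collectors should create for the current task."""
    return TASK_FAMILIES.get(environ.get("PIPELINE_TASK", ""), set())


def collect_rig(nodes, environ=os.environ):
    # Bail out early unless the current task is one that publishes rigs.
    if "rig" not in active_families(environ):
        return []
    return [{"name": n, "family": "rig"} for n in nodes]
```

Switching publishing modes is then a matter of launching with a different `PIPELINE_TASK`.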

This could be where the problem lies. Families don’t influence collectors; collectors are what assign families.

Try this - consider what it would be like if all collectors were always run. If we put aside complexities of per-project or per-task publishing for a second, let’s consider a single global stack of plug-ins.

  1. All collectors always run, they assign families
  2. Subsequent plug-ins are run based on their assigned family
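To see what problems surface, it can help to simulate the two rules literally. This toy runner (not Pyblish's actual implementation) runs every collector, then runs the remaining plug-ins in order, each only against instances whose family it lists. The order values mirror Pyblish's ValidatorOrder (1) and ExtractorOrder (2); all plug-in names are hypothetical.

```python
VALIDATE, EXTRACT = 1, 2  # mirrors pyblish.api.ValidatorOrder / ExtractorOrder


def publish(collectors, plugins):
    """Run every collector, then family-filtered plug-ins in order."""
    instances = []
    for collect in collectors:
        # 1. All collectors always run; they are what assigns families.
        instances.extend(collect())
    log = []
    for _order, families, name in sorted(plugins, key=lambda p: p[0]):
        for inst in instances:
            # 2. Everything else runs only on instances of a matching family.
            if "*" in families or inst["family"] in families:
                log.append((name, inst["name"]))
    return log


collectors = [
    lambda: [{"name": "shotCam", "family": "camera"}],
    lambda: [{"name": "hero_GEO", "family": "model"}],
    lambda: [],  # a collector that found nothing relevant in this scene
]
plugins = [
    (EXTRACT, ["camera"], "ExtractCamera"),
    (VALIDATE, ["*"], "ValidateNaming"),
    (VALIDATE, ["model"], "ValidateNormals"),
]
log = publish(collectors, plugins)
```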

Now, what problem(s) do you see?

If there is a publishable camera, then it should appear as an option in the GUI.

> If there is a publishable camera, then it should appear as an option in the GUI

Yeah, it was more about hiding other publishable instances - i.e. hiding instances that are not relevant

Aha! This isn’t a workflow I’m familiar with, could you give an example?

We intend to have a dialog where you first decide what “type” you would like to publish, to guide the user a bit more than just exposing all collectable instances.

Something like:

If you select Camera, we would only let the user choose from instances of the Camera “family”. Our intention is to simplify things for the artist. Does that make sense?

Ok, one way to approach this is to use your dialog to initialise the scene and/or assets with data that a collector could later pick up.

Let’s say we assume a Maya scene with some nodes. Simply hitting “Publish” doesn’t reveal anything at this point; the collectors can’t yet identify what is and isn’t an instance. Your dialog could then “install” attributes, or add a particular node, that the collectors are set to search for.
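A sketch of that split, with a dict standing in for the scene. The "pyblishFamily" marker and both function names are hypothetical; in Maya the dialog might add an attribute or a node instead. The important property is that the collector needs no GUI: once the dialog has marked the scene, a plain publish finds the same instances.

```python
# A plain dict stands in for the scene (hypothetical names throughout).
scene = {"shotCam": {}, "hero_GEO": {}}


def dialog_mark(scene, node, family, **data):
    """What the custom dialog does: persist the artist's choice."""
    scene[node]["pyblishFamily"] = family
    scene[node].update(data)


def collect_marked(scene):
    """What the collector does later, with or without a GUI."""
    return [
        {"name": node, "family": attrs["pyblishFamily"],
         **{k: v for k, v in attrs.items() if k != "pyblishFamily"}}
        for node, attrs in sorted(scene.items())
        if "pyblishFamily" in attrs
    ]


print(collect_marked(scene))  # [] - nothing marked yet
dialog_mark(scene, "shotCam", "camera", startFrame=1001)
print(collect_marked(scene))
# [{'name': 'shotCam', 'family': 'camera', 'startFrame': 1001}]
```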


Pyblish encourages stateful data: once all has been said and done, simply hitting publish should faithfully recreate a given scenario. This is to facilitate batch/offline processing, where there is no interface and the user can’t be given an interactive choice.

That’s a good way to frame the question. I would anticipate a lot of collectors, and then a lot of logic for assigning families. I think most users would have a lot of collectors run that are never relevant to them, and I would worry about false positives, or false negatives if I or another TD didn’t fully anticipate all the publishing permutations. I’d also be a little concerned about modelling writing their own plugins without taking into account that they might affect animation, etc.

Part of the problem on my end is that we don’t have a very robust environment management setup, so there isn’t much data available in the environment for divining a user’s task. This is something ideally which would change, but is a bit out of my control at the moment.

So all things being equal, at the moment I prefer to organize plugins in such a way that they’re only loaded and run in a specific context. That way I can say: modellers, put your plugin in the modelling folder and then launch the ui in the modelling context, and you never have to worry about what animation is doing.

This is pretty much what I’m doing at the moment. Three plugin paths are set prior to launch: the global plugin path, which is always relevant; the application plugin path, which is relevant to the host you’re launching from; and finally the task plugin path, which is passed explicitly by the artist.
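A sketch of assembling those three paths into `PYBLISHPLUGINPATH` before launching the host. The directory layout under the root is hypothetical; `os.pathsep` keeps the separator correct per platform. At run-time, `pyblish.api.register_plugin_path()` would be the dynamic equivalent.

```python
import os


def plugin_path(root, host, task):
    """Build a PYBLISHPLUGINPATH value: global, then host, then task."""
    parts = [
        os.path.join(root, "global"),       # always relevant
        os.path.join(root, "hosts", host),  # e.g. maya, nuke
        os.path.join(root, "tasks", task),  # chosen explicitly by the artist
    ]
    return os.pathsep.join(parts)


env = dict(os.environ)
env["PYBLISHPLUGINPATH"] = plugin_path("/pipeline/pyblish", "maya", "modelling")
# The host application would then be launched with `env`,
# for example via subprocess.
```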

From a workflow standpoint, I see this as being basically the same, but again, all things being equal, it seems more complicated to me to manage adding extra data to the scene (which will be handled differently per application), and then managing the logic for how to interpret that data, versus just saying “only run these plugins.”

I think a lot of us are probably used to each department having their own publish tool tailored to their own needs, and that’s part of the reason I think pyblish is great, unifying the interface and architecture regardless of what is being published. But I don’t feel like having more control of plugin execution undermines that at all, and I suspect this is a request you’re going to keep getting.

This is an opportunity for me to highlight in which direction I’m guiding you. When we’re done, I would like you to be able to say: “modellers, tag your work modelProxy, modelInternal or modelAnimation depending on your intent” where each family (read “contract”) is detailed on your wiki/forums.

```
modelProxy: "A low-resolution, non-essential asset for optimal performance/memory"
modelInternal: "An asset only relevant to either you or those within your department."
modelAnimation: "Animation-friendly geometry, with UVs, no self-intersection and no non-manifold geometry"
```

At which point you, the developer, can design plug-ins that operate based on their intent. Internal assets for example may be allowed to skip some of the more thorough checks, whereas modelAnimation must pass the harshest of checks.
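A sketch of validators keyed on intent: each contract lists the checks it must pass, so modelAnimation carries the strictest set. The checks and the instance fields ("has_uvs", "intersects") are hypothetical placeholders for real geometry tests.

```python
# Hypothetical checks; each returns a list of problems found.
def check_naming(inst):
    return [] if inst["name"].endswith("_GEO") else ["bad name"]


def check_uvs(inst):
    return [] if inst.get("has_uvs") else ["missing UVs"]


def check_intersections(inst):
    return ["self-intersections"] if inst.get("intersects") else []


# Which validators each contract (family) must pass.
VALIDATORS = {
    "modelInternal": [check_naming],
    "modelProxy": [check_naming],
    "modelAnimation": [check_naming, check_uvs, check_intersections],
}


def validate(inst):
    problems = []
    for check in VALIDATORS.get(inst["family"], []):
        problems.extend(check(inst))
    return problems
```

The same mesh without UVs passes as modelInternal but fails as modelAnimation.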

> That’s a good way to frame the question. I would anticipate a lot of collectors, and then a lot of logic for assigning families.

Ok, this is good! Let’s try and tackle this.

There is a practice I’ve come to recommend more and more often when it comes to making collectors, which is to allow your TDs to decide which family or families a particular asset belongs to. They know a lot more about the asset than any automated process ever will, and you can extract this information from them.

Have a look at this collector.

This one collector applies to every department and any family, and it shifts the responsibility from you figuring out a family based on environment or circumstance to them assigning it themselves based on what they intend.

> Part of the problem on my end is that we don’t have a very robust environment management setup, so there isn’t much data available in the environment for divining a user’s task.

Somewhat of a side track, but this is one of the reasons you (and not only you) may sometimes get an answer such as “No, Pyblish doesn’t do that”. I’ve made it my mission to ensure Pyblish only ever concerns itself with publishing, so that it remains lightweight and unbloated.

Before publishing, both of these questions need a “yes” answer.

  1. Is the data complete?
  2. Is it outgoing?

Yes, I tried phrasing it in that way to put it in context with Pyblish.

But the truth is, what is needed there falls outside the scope of Pyblish. As I mentioned above, the data must be complete. Your collector(s) depend on it.

Making data complete is opening a whole different can of worms, one that is equally deserving of one or more dedicated tools. I know @BigRoy is really busy at the moment, but he’s got the closest thing to what I try and steer developers towards. Maybe when he finds the time, he could share some of those.

Sorry, could you rephrase this? Just want to make sure I understand completely.

Thanks for excellent questions, this is great, and hope it helps!

You could present all the available instances to the user and let them decide what to publish. The workflow I’m thinking about is to present a mesh as modeling.mesh and animation.mesh, in an unchecked state. Setting an instance's initial state can be achieved with `instance.data["publish"] = False`.
When the user decides to publish either of those instances, you can use callbacks to persist the publish state back to the scene. I’d imagine using attributes in Maya, so if the user checks modeling.mesh to be published, you add a boolean attribute to the mesh called pyblish.modeling.mesh (can’t remember if Maya accepts “.” for attribute names) set to True.
Next time someone publishes the scene, the collector recognises that pyblish.modeling.mesh is set to True, meaning that the instance should be checked for publish in GUI.
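A round-trip sketch of that workflow, with a dict standing in for node attributes. The callback and collector are hypothetical functions rather than a specific GUI's API, and dots are replaced with underscores in the attribute name to sidestep the question of whether Maya accepts “.” there.

```python
def attr_name(family):
    # Avoid "." in attribute names by using underscores instead.
    return "pyblish_" + family.replace(".", "_")


def on_toggle(scene, node, family, checked):
    """Callback: persist the artist's choice back onto the node."""
    scene.setdefault(node, {})[attr_name(family)] = checked


def collect(scene, node, families=("modeling.mesh", "animation.mesh")):
    """Offer each family, unchecked unless a persisted flag says otherwise."""
    attrs = scene.get(node, {})
    return [
        {"name": node, "family": fam,
         "publish": bool(attrs.get(attr_name(fam), False))}
        for fam in families
    ]


scene = {}
on_toggle(scene, "chair_GEO", "modeling.mesh", True)  # artist checks it once
print(collect(scene, "chair_GEO"))  # next publish: modeling.mesh starts checked
```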

What you achieve with this workflow, is to keep your code within the Pyblish universe.

Granted, this workflow will present the user with an increasing number of instances as you add more and more collectors, but that could possibly be accounted for in the GUI by having the instance sections collapsed by default.


The GUI can and should evolve based on how it is being used.

Great discussion here. Before I jump in let me state that my experience using Pyblish is running in a small animation studio (mostly up to 6 people, with recent spikes to 12). There’s no “environment switching” going on at our location during a session or per task in production (e.g. modeling and animation have the same environment/tools). As such we make extensive use of families.

In a way we have a workflow somewhat similar to the GUI @mattiaslagergren pictured (even though ours isn’t really “designed” yet; it's a plain list). Basically the artist defines what he wants to extract, almost like a form of “export selected”.

Yet instead of instantly doing an export after pressing that button, it tags data in the scene (stores it persistently) together with its settings, so that subsequent publishes will be done alike. This means publishes become much more consistent. For example, in Maya we create objectSets that contain the nodes to be extracted, so a camera export would simply create an objectSet containing that camera. This set would also store data like startFrame and endFrame, etc.

Then we have a single collector Plugin which just finds the objectSets in the scene and creates a Pyblish instance with that family and data.

In our case we produce an objectSet with a family, and whatever the artist adds to that set is included in that extraction. This way we can have many consistent outputs from a work file.
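A sketch of such a single set-based collector, with objectSets modelled as dicts. The set names, families and frame values are hypothetical; the point is that each set carrying a family becomes one instance, its stored data travels along, and the same node may appear in several sets.

```python
object_sets = {
    "camera_SET": {"family": "camera", "startFrame": 1001, "endFrame": 1100,
                   "members": ["shotCam"]},
    "model_SET": {"family": "model", "members": ["hero_GEO"]},
    "pointcache_SET": {"family": "pointcache", "startFrame": 1001,
                       "endFrame": 1100, "members": ["hero_GEO"]},
    "layout_SET": {"members": ["ground_GEO"]},  # no family: not publishable
}


def collect_sets(object_sets):
    """One instance per set that carries a family; its data travels along."""
    instances = []
    for name, data in sorted(object_sets.items()):
        if "family" not in data:
            continue
        instance = {"name": name}
        instance.update({k: v for k, v in data.items() if k != "members"})
        instance["members"] = list(data["members"])
        instances.append(instance)
    return instances


instances = collect_sets(object_sets)
```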

All validators are always on the active plug-in paths, but each is only active for specific families. E.g. a Camera instance has different validations/extractions than a Pointcache instance. If an additional “type” is to be extracted, we define another instance. The nice thing about sets is that the same nodes can be included in different sets at the same time.

And importantly, this also allows us to “batch” update and republish content.

Thanks everyone for your thoughtful and thorough replies.

I just mean that the approach I tried and the alternative suggested in this thread are both basically ways of controlling which plugins are run based on user input.

I understand that adding data to the scene is the appropriate solution in pyblish terms, but if it’s primarily being done to assign families, then to me it just seems like a somewhat unnecessary intermediate step, compared with just having the user set the family directly.

I don’t mean to be argumentative or push for something that’s counter to the principles of the project, I just feel that in my situation, to promote adoption and support the desired workflows, the most straightforward path is to explicitly control what plugins are run based on the user input.

I don’t actually feel this is a viable solution. I think once the number of users times the number of publishables gets high enough, incorrect publishes are inevitable; conversely, if I know what I want to publish before I launch the tool, then why not limit what is publishable in that context?

I think at this point, it sounds like you’ve got an understanding of what is favorable with Pyblish but your path is simply slightly different. That’s as far as educated decisions go, and I think that’s fine.

What I’d suggest is that you follow your own path; odds are you’ll be the one sitting on an upcoming best practice in the near future once your system has stabilised. Publishing is simply too unexplored (and undocumented) a concept to say definitively what’s right and what isn’t at this point.

Maybe once you’ve found your bearings, you could venture down the suggested path too and share the pros and cons. That, I think, would be the greatest contribution to the project and to publishing overall.