Pyblish Magenta

marcus · July 3, 2015, 6:55am

I think we’re getting a bit overprotective.

I would limit validations only to things that could cause trouble further down the pipeline. Deleting unused nodes isn’t one of them and certainly not locking of nodes.

Protecting against locking of nodes, to me, suggests a fault in responsibility and contribution. For example, to me, an animator delivers either a point-cache, or animation curves. He does not deliver the rig. If nodes have been deleted in the rig during animation, then it shouldn’t have any effect on his contribution, other than a potentially messed up animation, which is his responsibility.

mkolar · July 3, 2015, 7:39am

From reading this thread from the outside (not working with magenta), I have to agree with @marcus completely. Don’t go over the top with validators, because apart from from marcus mentioned, you might also be restricting non-character rigging (unless you consider that a completely different type of task). I’ve seen rigs of props that have all kinds of awkward setups, hierarchies, separated nodes, all for good reasons. With so many validators in place you are restricting magenta to your way of thinking rather than building a pipeline that if flexible enough for other small studio to potentially adopt.

BigRoy · July 3, 2015, 7:40am

It’s more important to ensure unused nodes are not in the output than that they are in the work file. It would be useless output since they are unused. In that sense it would be great if a Validator could ensure nodes in the instance are actually of use. Figuring out which nodes are actually of use still stays tricky.

marcus · July 3, 2015, 7:52am

There’s a balance to be struck here, I think.

I think that absolutely everything that can cause trouble down the pipeline should be validated where possible. My only objection is to validate things that are pure cosmetics. Cosmetics are obvious and annoying, sure, but they don’t do any harm. I.e. they are subjective.

My vision for publishing in general is to enable a strong integration between disparate parts of a pipeline, such that a rigger can come to expect a certain format of any models coming in, and an animator can come to expect a certain format on any rigs coming in.

This way, there is unbounded flexibility and potential in the tools that can be built to facilitate and automate these processes.

If publishing is done right, than every possible step in your pipeline, once crafted by an artist, should be reproducible in an automated fashion. That means that once rendering is done, the modeler should be able to modify UVs, a lookdev artist should be able to add some splashes of blood to the murder weapon, and neither of those interactions should require any intervention from any other artist to get a new render out.

That’s the kind of strong integration I’m working towards and hoping you are interested in striving towards too.

BigRoy · July 5, 2015, 6:54am

Hope everyone can have a look at the Github issue about the Collector, Extractor and Integrator chain we’ll have to implement.

github.com/pyblish/pyblish-magenta

The Collect, Extract and Integration chain

opened 06:52AM - 05 Jul 15 UTC

closed 05:17PM - 17 Feb 16 UTC

BigRoy

### Goal Decide upon the way forward for building new family types and how to i…mplement its `Collector`, `Extractor` and possibly `Integrator`. The main goal is to have a simple, consistent and strong solution that can be used throughout Magenta and also allows the plug-ins to be easily used in other packages. ### Implementation Our `Integrator` depends on knowing data about the _Extracted_ content. It needs to know: - Where the files have been extracted to, the `extractDir` - Where the files need to be integrated to, the `integrateDir` We also want to implement _versioning_. So that could be additional required data. The `extractDir` data is defined in the `Extractor` (see `plugin.py`) and is a temporary directory. This sets up the dependency that any instance has only 1 `extractDir`. This would require to inherit from `plugin.py`. The `integrateDir` is computed using the project's schema and data from the instance. That data is: - Any data that is required to format a path, like `root`, `container` and `asset`. - The family name is used to define what output template from the project's schema to use for formatting. Currently I've separated how we inject data into the Instance by taking it from the Context. This way injecting this data into the instance does not need to be done from within each `Collector`. Have a look here: BigRoy@b6e4d196961704fb32a49639642afba49a44be25 Though this will always override it for any instance that has been Collected, which might be more annoying than what we gain from removing this duplicity in code. --- Thoughts?

Currently the discussion is wide open. The workflow can become anything we want so hop in with best practices, ideas and let’s see where we can go.

marcus · July 10, 2015, 12:04pm

I’ve pushed what is from my perspective an ideal method of collecting an asset from a Maya scene here.

https://github.com/pyblish/pyblish-magenta/pull/19

It boils down to this.

# Capture nodes relevant to the model
with pyblish_maya.maintained_selection():
    cmds.select(assembly)
    nodes = cmds.file(exportSelected=True,
                      preview=True,
                      constructionHistory=True,
                      force=True)

# Reduce to relevant nodes
shapes = cmds.ls(nodes,
                 noIntermediate=True,
                 shapes=True,
                 long=True,
                 dag=True)

Which does a few important things.

It gathers all connected nodes, by relying on Maya’s export mechanism
It then filters it down to data only relevant to validation and export.

From here, data is solely filtered downwards and no new data is added related to this instance.

The benefit is that (1) validation only ever touches data that is actually relevant and (2) a scene can be infinitely messy, but still come out ok. No validation happens on data that isn’t collected, and only relevant data is guaranteed to be collected.

This is in contrast to collecting via cmds.ls(type="mesh") or similar, which assumes an entire scene is of interest, even though only part of it is ever exported, in this case an assembly by the name of {asset}_GRP, e.g. ben_GRP.

Extraction at this point may look like this.

with pyblish_maya.maintained_selection():
    cmds.select(instance, noExpand=True)
    cmds.file(path,
              force=True,
              typ="mayaAscii",
              exportSelected=True,
              constructionHistory=False)

The constructionHistory=False is the key here. It will extract only what has been collected during collection, which is exactly what we are guaranteed to have validated.

BigRoy · July 10, 2015, 12:21pm

Note that the file export preview will also include other things outside of constructionHistory, there are much more parameters that are related to the command. For example channels, expressions and constraints. Again these could also be set to False.

Also missing (in your snippet) is what is assembly and what happens if it doesn’t exist? Of course that’s additional information, but this is how it has been so far:

assembly = cmds.ls('|{asset}_GRP'.format(asset=asset), objectsOnly=True, type='transform')

if not assembly:
    return
else:
    assembly = assembly[0] # we don't want to use the list, but only a single assembly node

In comparison this is how I had getting only the required data implemented:

# Get all children shapes in the assembly
shapes = cmds.ls(assembly, dag=True,
                 shapes=True, long=True,
                 noIntermediate=True)

# Include the parent hierarchy of the shapes
nodes = set()
nodes.update(shapes)
for shape in shapes:
    nodes.update(get_all_parents(shape))
nodes = list(nodes)

Note that here it’s clear that it only takes shapes under the node plus the hierarchy above those shapes, it’s not scene-wide! Plus it’s clear that shape names are returned as their long names variant. Not sure what cmds.file(exportSelected=True, preview=True) does there?

In short, it’s more transparent than what cmds.file provides with previewing an export.

Note that your example does not include the parent hierarchy of the shapes if you only use the shapes variable. Even though it’s crucial for extraction + validation. It’s included data, right?

marcus · July 10, 2015, 1:09pm

assembly is the root transform.

assembly = '|{asset}_GRP'.format(asset=asset)

And your if not was re-written to.

assert cmds.objExists(assembly)

That’s not the point, the point is that if you collect a number of nodes you expect to get exported, Maya might add additional nodes via to their connections that you won’t notice, nor be able to validate, until the file has already been exported.

marcus · July 10, 2015, 1:12pm

The long flag on cmds.ls includes the full path of any shape. What else is there, that isn’t simply duplicating information?

Are you thinking about something along these lines?

nodes = ["|node1|node2|node3",
         "|node1|node2",
         "|node1",
         "|"]

marcus · July 10, 2015, 1:32pm

Here’s a visual to go with my reasoning for collecting information, you’ve probably got the same same line of thinking, but just to make sure we’re on the same page.

Here, an Instance consists of a number of types of data, in the case of a model it’s limited to meshes. Subsequent steps then picks the instance apart and uses only the parts that are relevant to a particular validation; e.g. validating naming convention only bothers looking at names, and validating normals only looks to metadata.

Extraction is the same, only extracting parts of an Instance that has been collected, not gathering any additional information.

Finally, integration produces the information that others will then be able to use.

BigRoy · July 10, 2015, 2:43pm

We are talking about the same thing, but sometimes Validation refers to data that does not end up in the Extraction. For example you could validate whether there is construction history, but you might skip it within extraction. Validation is then purely to warn or provide information to the artist. There could even be data that influences the output data without being present in the output data

For example an animation curve and how it influences a mesh over time where publishing from a different frame would result in a different mesh… and the animation curve doesn’t have to be present in the output, but it was definitely influences by it. Does that make sense?

Also some data just doesn’t fit nicely into the Context, or not as an abstracted piece of data. As said before I would rather have the minimum amount of data in the instance that clarifies what will be extracted.

An example might be fitting. (Pseudocode)

class CollectMesh(pyblish.api.Collector):
    def process(self, context):
        instance = context.create_instance(name='asset', family='model')
        
        shapes = mc.ls(type='shape')
        for shape in shapes:
            instance.add(shape)
            
class ValidateNoKeys(pyblish.api.Validator):
    def process(self, instance):
    
        # Use Maya's keyframe command to query if there are keys on the node
        # in the instance
        if cmds.keyframe(instance, q=1, time=True):
            raise ValueError("Keys are present!")

This validator would make sense to a Maya programmer or Maya artist. They know the Maya commands and know what data will result from it. As opposed to some custom type being stored in the instance. What if we would want to only check whether there were keys on translateX? Or check whether keys are in certain bounds?

Why put all that information in that one huge Selector in the beginning and write complex Validators for that if there are optimized methods in the DCC that people already know, and probably even already have scripts for?

What do you gain?

marcus · July 10, 2015, 2:56pm

This is backwards, and the example is clearly an edge-case.

It’s difficult to reason about without having something to point at.

You say Collectors (edit: I mean Instances) will get heavy, I say they won’t.

You say collecting a lot of information makes for one complex Collector, I say collecting information in Validators makes for many complex Validators, and not to mention a clash of responsibility, meaning it will add cognitive load and take longer to comprehend.

Complexity needs a home. I’m saying we put it where it is expected, not spread it out, and allow business logic to be as simple as can be. Business logic is what makes a pipeline, the rest is technicalities.

In other news, I’ve encountered a problem in our current design for associating an asset with a family.

Currently, the current working file is used to determine a family, and the family is then used to determine where to publish.

This means we can only ever publish a single family from a scene. In the case of ben model, we currently have two; one for the mesh, and one for the quicktime and gif, that both need to go into the final published directory, but would at this point produce different unique paths.

To solve this, we could instead of basing an output path on the family, base it on the task, such as example modeling.

This way, modeling determines how files are initially created via be and finally published via pyblish. The contained instances within the scene then just follow along for the ride. Symmetry is a good thing.

BigRoy · July 10, 2015, 3:08pm

If you have the solution let’s put it in. To me family was the exact same as task, they had no different meaning. This is because they didn’t provide any different data.

I’m actually saying that Validators will become more complex since they end up validating data that you’re only familiar with if you know how the Collector is implemented. This would also mean that Validators will only work with matching types of Collectors, thus making them shareable is a lot harder.

class ValidateNoKeysBeforeZero(pyblish.api.Validator):
    def process(self, instance):

        invalid = []    
        for node in instance:
            keys = node.keyframes()
            for key in keys:
                if key.time < 0:
                    invalid.append(node)
                    break
                
        if invalid:
            raise ValueError("This is sooooo wrong!")

Even though not that complex I wouldn’t know that a node has a keyframes method, or even whether it would hold keyframes data? What if I’m using another Collector retrieved from somewhere else, would it provide that same data, with a similar method or data name? If not… my Validator becomes useless either way?

class ValidateNoKeysBeforeZero(pyblish.api.Validator):
    def process(self, instance):
    
        invalid = []
        for node in instance:
            times = cmds.keyframe(node, q=1, time=1)
            if any(t < 0 for t in times):
                invalid.append(node)
                
        if invalid:
            raise ValueError("This is sooooo wrong!")

This isn’t that much more complex… even shorter! Plus (to me) it uses commands I already know from working with a DCC. In this case I use Maya commands. Actually I can grab any complex script for Maya that checks something, for example sub-frame keys on a node, and put it straight into a Validator! Or maybe even checking static channels (whether key’s values vary over time instead of checking only where they exist in time).

If the Collector provides data that is known to those who work with a host (like Maya), eg. the long names of a node, they can write their validators with concepts they already know. Plus it’s way more likely that Validators become shareable since I believe most will set up the collection in a way that is most familiar to them with a given DCC.

marcus · July 10, 2015, 3:21pm

Family determines a particular type of data.

In our case, we’ve got a model, which is processed by modeling plug-ins, outputting a .ma file, and quicktime, which is processed by a different set of plug-ins, outputting .mov and .gif files.

I disagree.

class ValidateNormals(pyblish.api.Validator):
  def process(self, instance):
    assert instance.data("normalsDirection") != "in", (
      "Normals of %s pointing in the wrong direction" % instance)

The idea is that anyone with an interest in the logic of your pipeline can read it.

And it also means that any collector that adds this one attribute is eligible for this validator, which means a single validator could have any number of implementations from collection, some more complex for accuracy, others quicker for performance, for example.

We’re talking mostly about The Deal in particular here, but don’t forget that plug-ins are supposed to exist across many hosts, including those you may have less experience with.

Consider a validator like the one above for 3ds max, or Mari or any other host you aren’t already intimately familiar with. You wouldn’t be able to modify business logic without also being technically adept at their individual APIs. This isn’t good, and hinders shareability and hinders maintenance.

marcus · July 10, 2015, 3:28pm

Versioning is up and running.

https://github.com/pyblish/pyblish-magenta/pull/20

Key points here are:

Each publish produces a unique directory per version under /publish/v001.
Each extractor now extracts to a particular format, as opposed to a particular family. Before an extractor was e.g. extract_model, now it is extract_maya_ascii
Each extractor creates it’s own temporary directory, which is finally cleaned up during integrate_cleanup.py with an order of 99
Versioning is handled last, during integration.
No more collect_instance_data
No more collect_version
No more integrate_increment_version
Integration directory is now based on TASK as opposed to family, allowing multiple families coming out of a single task.

Also, maya/exporter.py was reduced to a single call to maya.cmds, with the same results.

cmds.file(path,
          force=True,
          typ="mayaAscii",
          exportSelected=True,
          constructionHistory=False)

constructionHistory=False here is key and assures that only validated data is exported, nothing else.

marcus · July 10, 2015, 4:43pm

Adding shot numbers to storyboard.

Boards without a number are part of the previous board.

BigRoy · July 10, 2015, 5:20pm

Not sure if it’s valid to say that the output remains the same:

Are constraints also filtered? No.
And keys? No.
How about display layers? Might not be?
Are shaders filtered? With this new export command shaders are included (if the applied shader is not lambert1)
Are empty null transforms in the assembly hierarchy also filtered? No.

There’s a reason the exporter was overly verbose.

marcus · July 10, 2015, 5:21pm

And that’s the reason constructionHistory=False.

Only shapes are included in the export.

BigRoy · July 10, 2015, 5:30pm

Have a go with pyblish\thedeal\assets\test\modeling\work\maya\nasty_mesh.mb

Shaders are still exported. Animations as well, expressions are also in there after export. Even the displayLayer survived. These are all elements unrelated to the model output data and should be excluded.

Interestingly enough it lost the ‘connection’ it had with the locator… And with only the shape selected during export (not the top group) the children transforms were skipped.

So to validate all transforms in a hierarchy we would iterate each object and check the parents, since these transforms would have to be excluded from Collection to not be extracted? Or how do we filter to only Extracting the shapes if the transforms are also collected? Do we filter again in the Extractor?

Now that could be handled in Validators, but that also ensures the modeler can’t have it in the scene even if it’s for his own convenience. To validate we would have to Collect it, as per your standards. Rather be exclusive?

marcus · July 10, 2015, 5:38pm

I think you uncovered a flaw in our current collector, “shapes” applies to locators as well.

Maybe this could be fixed by passing exactType="mesh" to ls as opposed to shapes=True?

This works for me.

cmds.select("test_GRP")
nodes = cmds.file(exportSelected=True,
          constructionHistory=True,
          preview=True,
          force=True)
          
# nodes = cmds.ls(nodes, shapes=True)
nodes = cmds.ls(nodes, exactType="mesh")

cmds.select(nodes)
cmds.file(r"output.ma", typ="mayaAscii", exportSelected=True, constructionHistory=False)

Edit: fixed here.