Border Control


#1

Had a quick idea for this after speaking with @mkolar about whether or not Conform should continue when any part of Extraction has failed.

Goal

To facilitate custom “breakpoints” during a publish, called borders.

Currently, publishing stops if any plug-in of order 1 throws an exception.

Implementation

import pyblish.api
pyblish.api.borders = [1]

In a nutshell, specify at which orders you would like publishing to stop, unless all plug-ins of earlier orders have finished successfully.

In the above example, if any plug-in of an order before 1 fails - i.e. any Selector or Validator - publishing stops.

import pyblish.api
# Evaluate history after Validation and Extraction
pyblish.api.borders = [1, 2]

# ..after Validation, and half-way through Extraction
pyblish.api.borders = [1, 1.5]
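
To make the intended behaviour concrete, here is a minimal sketch of how a publish loop could honour a list of borders. It does not use Pyblish itself; `publish`, the `(order, name, function)` tuples and the order values are all made up for illustration, chosen only so that a border of 1 sits after Validation and a border of 2 sits after Extraction, as in the comments above.

```python
def publish(plugins, borders):
    """Run plug-ins in order, stopping at a border if an earlier order failed.

    `plugins` is a list of (order, name, function) tuples. This is a
    hypothetical stand-in for real plug-ins, not the Pyblish API.
    Returns the names of the plug-ins that were run, failed ones included.
    """
    processed = []
    failed_orders = []
    for order, name, process in sorted(plugins, key=lambda p: p[0]):
        # Borders this plug-in sits at or beyond
        crossed = [b for b in borders if b <= order]
        # Stop if anything before one of those borders has failed
        if any(failed < b for b in crossed for failed in failed_orders):
            break
        try:
            process()
        except Exception:
            failed_orders.append(order)
        processed.append(name)
    return processed


def ok():
    pass


def fail():
    raise RuntimeError("extraction failed")


# Illustrative orders only
plugins = [
    (0, "select", ok),
    (0.5, "validate", ok),
    (1, "extract", fail),
    (2, "conform", ok),
]

print(publish(plugins, borders=[1]))
print(publish(plugins, borders=[1, 2]))
```

With only the border at 1, the failed extraction goes unnoticed and conform still runs; adding the border at 2 is what stops publishing before conform.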

Syntax

An alternative to the above, which doesn’t take ranges into consideration, is to borrow the syntax of requirements.txt.

requirements.txt syntax

# Any order greater, or equal to, 1
pyblish.api.border = "order>=1"

# Any order greater, or equal to, 1, but less than 2
pyblish.api.border = "order>=1, <2"

# Only consider failures for plug-ins with an order of 3
pyblish.api.border = "order==3"
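
For what it's worth, a string like this is cheap to parse. Below is a rough sketch, assuming only the operators shown above (plus `>` and `<=` for completeness) and comma-separated clauses that must all hold. `parse_border` is a hypothetical helper, not something that exists in Pyblish.

```python
import operator
import re

# Comparison operators accepted in a border string
_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

# One clause, e.g. "order>=1" or "<2"; the "order" prefix is optional
_CLAUSE = re.compile(r"^(?:order)?\s*(>=|<=|==|>|<)\s*(\d+(?:\.\d+)?)$")


def parse_border(spec):
    """Return a function telling whether an order matches `spec`."""
    checks = []
    for clause in spec.split(","):
        match = _CLAUSE.match(clause.strip())
        if match is None:
            raise ValueError("Invalid border clause: %r" % clause)
        symbol, number = match.groups()
        checks.append((_OPERATORS[symbol], float(number)))

    # Every clause must hold for the order to match
    return lambda order: all(compare(order, value) for compare, value in checks)


in_extraction = parse_border("order>=1, <2")
print(in_extraction(1.5))
print(in_extraction(2))
```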

Future

#143 implements a mechanism to resolve this in a more flexible manner, but would take longer to develop and would likely introduce unknown issues of its own. border could work today, without interfering with events in the future.

Thoughts?


#2

I kinda don’t get why there can’t and shouldn’t be border between validators, and extractors and conformers?


#3

That is actually a very elegant solution considering how little work this would involve. It certainly solves a lot of the issues I’m running into right now.

Having a range of borders would be great. It would allow for very effective grouping of validators and extractors.


#4

That’s what I was asking in the first place. To simply not run conformers if extraction fails.


#5

I kinda don’t get why there can’t and shouldn’t be border between validators, and extractors and conformers?

There could be. I went back and forth on this during the early days before ending up with how it is today.

I simply couldn’t find a universal enough answer to the question “What do I do when one Extractor fails, and another succeeds?”, considering that the extractor will already have made its mark on the file-system. Conform would be one place where this could later get cleaned up, if needed.

If anything, implementing border would give us the option of finding out whether or not borders at each step could work.

That is actually a very elegant solution considering how little work this would involve.

I’ve got my hands full at the moment, but if you submit a pull-request for it, I’d happily get it integrated.


#6

To make sure we aren’t solving something that already has an alternative, possibly better solution, I figured I’d post some tips and tricks on how to manage the problems that sparked this idea. Each one adds to the safety of the next, and they work best all together.

1. Make your Extractors immune to faults

Utilize your Validators to ensure, as best as possible, that extraction can take place safely.

  1. Validate space
  2. Validate permissions
  3. Validate connectivity
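
As a rough idea of what those three checks could look like, here is a sketch using only the standard library. The function names are made up, and "connectivity" is approximated by checking that the destination directory can be seen at all; a real Validator would wrap something like this and raise on any problem.

```python
import os
import shutil
import tempfile


def validate_destination(path, required_bytes):
    """Return a list of problems with `path`; empty if it looks safe.

    Hypothetical helper: the checks mirror space, permissions and
    connectivity, in reverse order since the first two need a
    reachable directory.
    """
    # Connectivity: an unmounted or unreachable share won't show up
    if not os.path.isdir(path):
        return ["Destination is not reachable: %s" % path]

    problems = []

    # Permissions: can the current user write here?
    if not os.access(path, os.W_OK):
        problems.append("No write permission: %s" % path)

    # Space: is there room for what we are about to extract?
    if shutil.disk_usage(path).free < required_bytes:
        problems.append("Not enough free space: %s" % path)

    return problems


staging = tempfile.mkdtemp()
print(validate_destination(staging, required_bytes=1024))
```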

2. Extract to a staging area

When (1) isn’t enough, perhaps due to disk-space changes mid-flight, or someone pulling the network cable, avoid writing directly to unsafe ground and write instead to a “staging” area, such as a temporary directory on the local machine.

Think of it like Git commit versus Git push.

# Create stage
import tempfile
commit_dir = tempfile.mkdtemp()

# Perform serialisation
export(commit_dir)

# Make note of where it is, for conform.
instance.set_data("commitDir", commit_dir)

3. Co-operative conformers

When neither (1) nor (2) is enough, make conform co-operative; i.e. allow, don’t assume.

import shutil

# Brittle: assumes extraction always left a commitDir behind
class ConformAsset(...):
    def process_instance(self, instance):
        src = instance.data("commitDir")
        dst = "/server/final_destination"
        shutil.move(src, dst)

# Safe: checks for commitDir before touching the file-system
class ConformAsset(...):
    def process_instance(self, instance):
        if not instance.has_data("commitDir"):
            return self.log.warning("Nothing to conform")

        src = instance.data("commitDir")
        dst = "/server/final_destination"
        shutil.move(src, dst)

Handling

When all three are in place, there are several levels at which errors can be handled safely.

  1. Artists are notified of problems they can control
  2. Assets are extracted regardless of a faulty network or bad permissions, which can be useful in cases where extraction takes a long time to finish. The data will be safely written to where it can later be manually positioned into its final location if needed, or even published again.
  3. Finally, conformers can either position it nicely, or clean up after a bad extraction, if needed.
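
Here is a small, self-contained sketch of points 2 and 3 working together, with a plain dictionary standing in for the instance. The `extracted` key and both function names are made up for the example; only `commitDir` comes from the snippets above.

```python
import os
import shutil
import tempfile


def extract(data, write):
    """Write to a staging directory and record how it went in `data`."""
    commit_dir = tempfile.mkdtemp()
    data["commitDir"] = commit_dir
    try:
        write(commit_dir)
        data["extracted"] = True
    except Exception:
        # Leave the half-written directory for conform to deal with
        data["extracted"] = False


def conform(data, destination):
    """Move a good extraction into place, or clean up after a bad one."""
    commit_dir = data.get("commitDir")
    if commit_dir is None or not os.path.isdir(commit_dir):
        return "skipped"

    if not data.get("extracted"):
        shutil.rmtree(commit_dir, ignore_errors=True)
        return "cleaned"

    shutil.move(commit_dir, destination)
    return "moved"
```

Whether a failed extraction should be cleaned up or left in place for manual recovery is a pipeline decision; the sketch cleans up only to show that conform is in a position to do either.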

#7

A thought on this about performance and multi-processing, that also applies to ordering.

Multi-processing

Ideally, each plug-in would run in parallel, on individual CPUs, so as to complete as fast as possible. But because Conform depends on Extraction, which depends on Validation, which depends on Selection, we can’t.

In a best-case scenario, all plug-ins of the same type might run in parallel with each other.

[x] select_models.py            | Batch 1
[x] select_rigs.py              |
[x] select_shaders.py           |
    ---                          
[x] validate_normals.py         | Batch 2
[x] validate_nameconvention.py  |
    ---                          
[x] extract_points.py           | Batch 3
[x] extract_playblast.py        |
[x] extract_mesh.py             |
    ---                           
[x] conform_to_database.py      | Batch 4
[x] conform_plate.py            |

  • Plug-ins within each batch may be processed in isolation
  • Each batch must be processed in order
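
The two rules above can be sketched in a few lines. This is only an illustration: plug-ins are represented as hypothetical `(order, name, function)` tuples, and threads stand in for the separate CPUs or host copies a real implementation would need.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def run_in_batches(plugins):
    """Run plug-ins batch by batch; return the names in each batch.

    Plug-ins sharing an order form one batch and run concurrently.
    A batch only starts once the previous one has finished.
    """
    batches = []
    ordered = sorted(plugins, key=lambda p: p[0])
    for _, group in groupby(ordered, key=lambda p: p[0]):
        group = list(group)
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(process) for _, _, process in group]
            # Wait for the whole batch; re-raises if a plug-in failed
            for future in futures:
                future.result()
        batches.append([name for _, name, _ in group])
    return batches


def noop():
    pass


print(run_in_batches([
    (0, "select_models", noop),
    (0, "select_rigs", noop),
    (1, "validate_normals", noop),
    (2, "extract_points", noop),
    (2, "extract_mesh", noop),
]))
```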

Another obstacle to this level of parallelism is that hosts can’t handle asynchronous operations; i.e. you can’t export more than a single object at a time, you can’t query the attributes of multiple nodes at once, etc.

We can work around some of these obstacles by, say, exposing an individual copy of a host to each plug-in. For example, once validation is complete, 3 copies of Maya could launch in the background or on the farm to perform extraction in isolation.

Others, on the other hand, are harder to separate, which leaves us with something like this.

[x] select_models.py            | Batch 1
    ---
[x] select_rigs.py              | Batch 2
    ---
[x] select_shaders.py           | Batch 3
    ---                          
[x] validate_normals.py         | Batch 4
[x] validate_nameconvention.py  |
    ---                          
[x] extract_points.py           | Batch 5
[x] extract_playblast.py        |
[x] extract_mesh.py             |
    ---                           
[x] conform_to_database.py      | Batch 6
[x] conform_plate.py            |

Conclusion

Now, the point of going through this was to demonstrate that with additional borders, batches are broken down further, roughly once per border, and that each border hinders parallelism.

I also mentioned that this is relevant to overriding order, which is because a batch can only contain plug-ins of identical order.

For example, if ValidatorA must happen before ValidatorB, then each requires a batch of its own and the two cannot run in parallel.
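
To see the effect, here is a tiny sketch that groups plug-ins into batches by identical order. The order values and the `batches` helper are invented for the example.

```python
from itertools import groupby


def batches(plugins):
    """Group a {name: order} mapping into lists of names sharing an order."""
    ordered = sorted(plugins.items(), key=lambda item: item[1])
    return [
        [name for name, _ in group]
        for _, group in groupby(ordered, key=lambda item: item[1])
    ]


# Same order: both validators share a batch
print(batches({"ValidatorA": 1, "ValidatorB": 1, "ExtractMesh": 2}))

# ValidatorB nudged to run after ValidatorA: one more batch
print(batches({"ValidatorA": 1, "ValidatorB": 1.1, "ExtractMesh": 2}))
```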


Hopefully this is something to think about when coupling your plug-ins, either via the prospective border attribute or by relying too heavily on order. Performance may not be a concern today, but I suspect it might become one, especially when posting information to a remote server or awaiting API calls made across the internet, not to mention long-running extractions and conformers.


#8

I’m including a feature in 1.1 that could help you implement this behaviour.

With that, you guys could try it on your own and see what happens if publishing stops after a failed Extraction, or on any other criteria you might want to abort on.