Pyblish File Format

Goal

To separate Collection from Validation.

Motivation

I’m interested in pursuing a world where validation is as simple as…

assert rig.data("isAnimatable")

This would mean that technicalities can eventually be overcome via an abundance of collected information, such as isAnimatable.
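For context, a minimal sketch of what such a validator could look like as a Pyblish plug-in, assuming the dict-style instance.data and that a collector has already filled in isAnimatable; the plug-in name and family are hypothetical.

import pyblish.api


class ValidateAnimatable(pyblish.api.InstancePlugin):
    """Hypothetical validator; fails when a rig is not animatable."""

    order = pyblish.api.ValidatorOrder
    families = ["rig"]  # assumed family name

    def process(self, instance):
        # "isAnimatable" is assumed to have been filled in by a collector
        assert instance.data.get("isAnimatable"), (
            "%s is not animatable" % instance)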

I’d like contracts to be made possible, as simply as possible, such as…

character.rig:
- height: 2.0
- performance: 30fps
- isPointcachable: true
- isAnimatable: true

Where height is a data member and its value is asserted by the relevant validation plug-in, similar to the above.

The contract could then be modified by anyone familiar with and in charge of quality control, without knowing the ins and outs of programming, pipeline or Pyblish. It would fully separate the business logic (i.e. Validation) from its implementation (i.e. Collection).
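As a rough sketch, such a contract could be enforced by a single generic routine that compares each member against collected data; the contract values and the validate function below are hypothetical.

# Hypothetical contract for character.rig, e.g. parsed from a file on disk
contract = {
    "height": 2.0,
    "isPointcachable": True,
    "isAnimatable": True,
}


def validate(collected, contract):
    """Assert that collected data fulfils every member of the contract."""
    for key, expected in contract.items():
        actual = collected.get(key)
        assert actual == expected, (
            "%s: expected %r, got %r" % (key, expected, actual))


# Usage, with `collected` produced elsewhere during Collection
validate({"height": 2.0, "isPointcachable": True, "isAnimatable": True}, contract)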

Implementation

On disk, the format is plain JSON, generated in the most performant manner possible, e.g. via a C++ plug-in.

Some development targets.

format: json
outputSize: ~1mb
serialisationTime: <1second
humanReadable: true
informationCapturePercentage: 90

The format would remain minimal and web-friendly: taking < 1 second to produce, being transmittable across HTTP, occupying less than 1 mb on disk and containing only information relevant to Instances and validation.

For example.

  • transform
  • boundingBox
  • currentFile
  • Children of each Instance along with attributes of each child e.g.
    • type
    • transform
    • normalDirection
    • boundingBox
  • UV shells along with attributes of each e.g.
    • overlapped
    • inverted

The resulting file might resemble something like this.

hulk_shot1.psh

"MyInstance": {
  "data": {
    "transform": [
      [0.0, 10.0, 0.0],
      [0.0, 0.0, 0.0],
      [1.0, 1.0, 1.0]
    ],
    "boundingBox": [
      [0.0, 0.0, 0.0],
      [10.0, 10.0, 10.0]
    ],
    "currentFile": "c:\\project\\myfile.mb"
  },
  "children": [
    {
      "name": "arm_GEO",
      "attributes": {
        "type": "mesh",
        "transform": [
          [0.0, 10.0, 0.0],
          [0.0, 0.0, 0.0],
          [1.0, 1.0, 1.0]
        ],
        "normalDirection": [0.0, 5.43, 0.33],
      },
      "uvShells": [
        {
          "id": 0,
          "inverted": false,
          "overlapped": true
        }
      ]
    }
  ]
}
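As a sketch of how such a file might be produced, a collector could serialise already-gathered instances using nothing but the standard library; the function and data below are hypothetical, and the actual gathering is left to a DCC-specific plug-in.

import json


def write_psh(path, instances):
    # Serialise collected instances to the proposed .psh format
    document = {}
    for name, instance in instances.items():
        document[name] = {
            "data": instance.get("data", {}),
            "children": instance.get("children", []),
        }
    with open(path, "w") as f:
        json.dump(document, f, indent=2)


# Usage, with data gathered elsewhere during Collection
write_psh("hulk_shot1.psh", {
    "MyInstance": {
        "data": {"currentFile": "c:\\project\\myfile.mb"},
        "children": [],
    },
})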

Format

Any declarative format is equally well suited, and the choice ultimately boils down to either cosmetics or compatibility. JSON was chosen for its compatibility with JavaScript and web technologies, as the format is expected to be transmitted either locally via IPC or remotely, such as to a server in the cloud.
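A minimal sketch of the remote case, assuming a hypothetical HTTP endpoint that accepts JSON.

import json
import urllib.request


def submit(url, document):
    # POST a serialised .psh document to a (hypothetical) validation service
    request = urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()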

Scope

To cover the vast majority of validations, but no extractions or integrations.

In the most epic of production scenes, no file is ever expected to reach the 1 mb ceiling. I’m expecting the vast majority of space currently consumed by large scenes to be per-point data, such as vertices or animation curves. Without these, raw data relevant to validation should remain in the kb-range.

In addition to the 1-to-1 data typically stored by the DCC software, computed information may also be stored. In the above example, uvShells is such a member, as it contains information not normally present in the scene file itself, but rather post-computed by a relevant visualiser (in this case the UV editor) based on surrounding information - here the UV points themselves.

The Pyblish file format would include such members and provide for additional members to be added dynamically, similar to how the Alembic file format provides a foundation of default attributes, e.g. pointPosition, but also offers the ability to attach arbitrary data at run-time.
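For illustration, attaching an arbitrary member at run-time could be as simple as adding a key next to the defaults; the custom key below is hypothetical.

# Default members provided by the format
instance_data = {
    "transform": [[0.0, 10.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    "currentFile": "c:\\project\\myfile.mb",
}

# Arbitrary member attached at run-time by a studio-specific collector
instance_data["reviewedBy"] = "supervisor"  # hypothetical custom key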

This arbitrary data should enable the foundational Pyblish file format to remain small and nimble yet cover a majority of validation scenarios, whilst still giving room to the more esoteric uses most productions will undoubtedly have.

Discussion

The proposed design changes how we perceive validation today. Rather than tailoring each validation to a particular asset, properties of each asset are mapped into a format compatible with Pyblish and a series of available validations.

The validations can then be pre-built and made configurable, as discussed in Plug-ins as Modules.

I’m seeing a future where there are hundreds, if not thousands, of these curated validators available to us, all written to conform to the Pyblish file format and the information available within it, possibly even available in the cloud with no need to install or set them up.

Any information not already available in the file format itself may be added dynamically, similar to Alembic, and would be just as ad hoc as the validations we create today. The goal, of course, is that the file format should encompass >90% of the need, and that >90% of any unavailable information could be computed indirectly.


Just to be sure, because I found the above text a bit confusing: the file format you’re describing is not meant to describe the contract, but to store the information collected by Collectors so it’s transferable in a predefined format to the Validators.

At first you describe the contract and how one would set it up, but later you continue directly from there into an implementation. The implementation doesn’t seem to describe the contract, though, but rather the storage of data from collection. Correct?

The out-of-the-ordinary cases

This might be a bit unrelated, not sure if it fits.

It’s important that we keep the ability to ‘run it our way’.

I wonder how many here will validate elements in a scene that, out of the context of a DCC, don’t mean a lot. Critical validations could actually exist to work around bugs in a very specific version of Maya.

An example would be that the Alembic exporter in older versions of Maya (I think it was Maya 2013?) would crash if a Nucleus node was in the scene (even if it wasn’t among the nodes to be exported). Thus it became crucial that the publishing system we used at the time notified the user if the scene contained such a node. Isn’t that a perfect fit for a very specific Validator in Pyblish?
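A minimal sketch of what that could look like, assuming Maya’s standard nucleus node type; the plug-in name is hypothetical.

import pyblish.api
from maya import cmds


class ValidateNoNucleus(pyblish.api.ContextPlugin):
    """Hypothetical validator; warns about Nucleus nodes that crash older Alembic exporters."""

    order = pyblish.api.ValidatorOrder
    hosts = ["maya"]

    def process(self, context):
        nucleus_nodes = cmds.ls(type="nucleus")
        assert not nucleus_nodes, (
            "Scene contains Nucleus nodes, which may crash the Alembic "
            "exporter: %s" % ", ".join(nucleus_nodes))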

That’s right. The contract is the motivation for the file format; to enable it in the first place.

The format itself would hold agnostic information, much like the regular Maya scene format does, only scoped to what validation needs.

Yeah, that’s a good point.

I suppose there are two sets of validations here:

  1. Validators related to avoiding trouble, such as your example
  2. And validators related to defining output, such as ensuring the height of a character

The file format would only be relevant to (2).


In that sense I think it’s closely related to the plain data discussion, in that we define a best practice for setting up collectors and validators in such a way that they are “worth the most” when spread out over multiple productions.

I’m totally on board with having data formatted with a certain naming convention, in a predefined format, and setting a standard. I’m mostly thinking about what data should go into that format. And that is related to the issues you could get by dumbing down Validators to only use predefined data, as raised in the pyblish search and customization discussion.

The above data format makes much more sense to me in the form of a graph of calculations, where the UV data is only pulled if anything further down the line requires such information. But I do think it’s the best way forward if we can have some best practices regarding how data should flow through Pyblish, even if it’s solely to get everyone’s workflow similar/recognizable.

That’s an optimisation. It’s safe to run all collectors always, until we can start to see a pattern which we can optimise. Before that point, “premature optimisation is the root of all evil”.

I think best practices will have to come from practice. And so will knowing which data to include in the file format.

You are right, it is a matter of finding the right things to collect.
