Pyblish Magenta

marcus · August 14, 2015, 7:51am

Thanks @mkolar for the input!

Quick Recap

Here’s a quick overview of the progress so far, so that both we and those new to the thread can get a sense of where things are and where they are headed.

About three months ago, we embarked on the journey to build an open source publishing pipeline, Magenta. Since then, @BigRoy of Colorbleed has kindly ported his existing publishing functionality over to Pyblish and we’ve been adding to this ever since.

Along the way, we realised that building a pipeline is best done in context of an actual production, as there are many ways to skin a cat and we didn’t want to overburden the responsibilities of this pipeline any more than was needed. So we set out some guidelines for a target project development environment and got started on a sharp project.

The project itself is simple, involving only the most fundamental aspects of any production, such as file management, scene layout and flow of information across multiple departments, without losing the ability to change and while keeping things scalable.

The task on the other hand is quite sizeable, not for the faint of heart so to speak, but today we’re in the tail-end of rounding up a draft of this pipeline based on this project and are very excited with the results!

To add to the conversation, the project or the pipeline, let us know right here, or contact one of us privately. The pipeline is designed for use by anyone who meets the criteria for which it has been designed.

The next step is refinement of the draft, running it within more projects with heightened requirements and tighter deadlines, aiming towards simplicity, flexibility and extensibility; a.k.a. perfection.

marcus · August 14, 2015, 8:36am

Fully qualified filenames

@mkolar and @BigRoy, here’s a version of filenames with more metadata, what do you think?

# Directories fully qualify the asset up till this point,
# where `ben01` is the name of the instance
\thedeal\film\seq01\1000\animation\publish\v036\pointcache\ben01

# The filename then embeds the same information
thedeal_seq01_1000_animation_ben01_v036.abc

# Full path:
\thedeal\film\seq01\1000\animation\publish\v036\pointcache\ben01\thedeal_seq01_1000_animation_ben01_v036.abc

Here’s what each directory represents.

/{project}/film/{topic}/{topic}/{topic}/publish/{version}/{family}/{instance}
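As a rough sketch of how the directory template and the filename relate, the following builds both from the same components. The helper name `qualified_path` and its arguments are made up for illustration and are not part of Magenta; forward slashes are used for brevity.

```python
import posixpath

# Directory template from above, with the three {topic} levels
# collapsed into a single joined field.
DIR_TEMPLATE = "/{project}/film/{topics}/publish/{version}/{family}/{instance}"


def qualified_path(project, topics, version, family, instance, ext):
    """Return the full path of a published file (hypothetical helper)."""
    directory = DIR_TEMPLATE.format(
        project=project,
        topics="/".join(topics),
        version=version,
        family=family,
        instance=instance,
    )
    # The filename repeats the components already present in the directory
    filename = "_".join([project] + list(topics) + [instance, version])
    return posixpath.join(directory, filename + "." + ext)


path = qualified_path(
    project="thedeal",
    topics=["seq01", "1000", "animation"],
    version="v036",
    family="pointcache",
    instance="ben01",
    ext="abc",
)
print(path)
```

Because the filename is derived from the same data as the directory, the two cannot drift apart.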

Pull-request

As always, code review goes into GitHub, and workflow goes here on the forums.

BigRoy · August 14, 2015, 9:00am

Looking good @marcus.

I do think this is the other end of the spectrum.
Personally I would remove the project’s name from the filename.
Other than that, really nice.

marcus · August 14, 2015, 9:16am

I see, how come you would do that?

BigRoy · August 14, 2015, 9:32am

Not sure.

I’m not saying it’s not useful to have it in there, just that it’s likely that anyone who finds the file knows where it belongs. But I guess it totally depends on what file it is; it could just as well have been an unreadable data structure.

How would you keep the project’s name short? Luckily thedeal is pretty short and descriptive, but working with clients you often encode the client (abbreviate it?) and the campaign’s name/description. Or how far would you abbreviate it? If you do a winter campaign about a new Coca Cola drink named Lola, maybe CC_Lola could work? The question is more: how short is the project’s name usually?

Working in a project (and using the applications to play the files) they would always be showing the same first ‘blurp’ of thedeal or CocaCola_Lola_... if you don’t abbreviate.

marcus · August 14, 2015, 9:40am

That’s a good question, typically there’s a “codename” for pipeline and production purposes, and an “official” name more familiar to clients.

I’d keep the codename short and file-system friendly; e.g. all lowercase, no spaces etc. And use a human-readable equivalent in emails, reviews or tracking software like ftrack.

mkolar · August 14, 2015, 2:01pm

You’ve hit precisely what we’re using for every show. We are however moving to dropping sequence name from filename and moving it into the name of the shot, so it’s more unique between sequences and episodes. For example, before we would have rad_ep101_sh1230_anim_v01.ext now we’re moving into

scene: rad_101sh1230_anim_v01.ma
camera: rad_101sh1230_cam_v01.ma

cache: rad_101sh1230_anim_v01_char1.abc
cache: rad_101sh1230_anim_v01_env1.abc
cache: rad_101sh1230_anim_v01_prop1.abc

Put simply, anything generated from any file always inherits its full name, before appending the name of the asset.
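A minimal sketch of that rule, assuming the generated file simply takes the source filename without its extension and appends the asset name. The function `derived_name` is hypothetical and only illustrates the convention.

```python
import os


def derived_name(source, asset, ext):
    """Name a file generated from `source` after the asset it contains."""
    base, _ = os.path.splitext(os.path.basename(source))
    return "{0}_{1}.{2}".format(base, asset, ext)


scene = "rad_101sh1230_anim_v01.ma"
for asset in ("char1", "env1", "prop1"):
    print(derived_name(scene, asset, "abc"))
```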

The codename for us is always a short sequence of lowercase letters: rad, drm, vik, mmn… It could be longer if that’s easier for the project, e.g. “sister, brother, mum and father” could be sbmf, but it is always short and simple.

marcus · August 14, 2015, 2:05pm

Here are some notes on things to implement in the future, related to, but the opposite of, publishing.

Initialisation

Publishing now happens to an ideal directory for coding, and though browsing to this directory is intuitive and makes sense, there are many levels to account for which slow things down, especially during bulk-import.

For example, populating a shot for lighting from The Deal currently involves 7 assets. For each asset, these are the steps involved.

Open a new scene
Create a new reference
Browse the 9 layers of directories to find the asset
Provide a relevant namespace, should always be the name of the instance, e.g. ben02
Find and set the start- and end-frame

This, times 7, takes a little under a minute and will have to be done per update. Not to mention the time it would take in a real production with perhaps tens or hundreds of assets.

(7 X 5 + 9) * NUM_UPDATES

Initialisation would involve browsing for assets relevant to a shot, and updating at the click of a button. 2 steps, as opposed to the 44 per update.

To browse a shot for content, we could either:

(1) List content of each version.

Each version currently contains each instance, and if we can guarantee that every instance will always be present within each version - that is, that each version is atomic - then simply listing a single version is enough to provide a full introspection into what makes a shot.

(2) Produce a shot inventory. The inventory would contain a listing of assets that are to be part of this shot in the form of a configuration file, such as a YAML. For example:

ben01
ben02
table01
sofa01
sofa02
cup01

The benefit of (1) is the automatic behaviour and direct correlation to what is actually published. Whenever anything new is published, the browser would automatically pick it up and display it, assuming each version is atomic.

A disadvantage then is if versions are not atomic, as atomicity can require a lot of upfront planning to actually guarantee. For example, if a prop is added mid-production, every version prior to the next version will no longer be atomic.

The benefit of (2) is finer control and accuracy, at the cost of added maintenance.

Maintenance is inevitable of course, so the balance to be struck then is when and where to invest this time.
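The two approaches can also be combined. Here is a minimal sketch, assuming the `{version}/{family}/{instance}` layout from earlier in the thread, that lists what a version actually contains and compares it against an inventory. The function names are hypothetical, and a temporary directory stands in for a published version.

```python
import os
import tempfile


def list_instances(version_dir):
    """Return instance names found under {version}/{family}/{instance}."""
    instances = set()
    for family in os.listdir(version_dir):
        family_dir = os.path.join(version_dir, family)
        if os.path.isdir(family_dir):
            instances.update(os.listdir(family_dir))
    return instances


def missing_instances(version_dir, inventory):
    """Return inventory entries absent from a version, sorted by name."""
    return sorted(set(inventory) - list_instances(version_dir))


# Throwaway directory standing in for a published version
version_dir = tempfile.mkdtemp()
for instance in ("ben01", "ben02", "table01"):
    os.makedirs(os.path.join(version_dir, "pointcache", instance))

inventory = ["ben01", "ben02", "table01", "sofa01"]
print(missing_instances(version_dir, inventory))
```

An empty result would mean the version is atomic with respect to the inventory; anything else names the instances that still need publishing.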

Shot Building

In either approach, the process of importing each asset, when the assets are known, can be fully automated. This is referred to as shot building and essentially means to browse to a shot, and automatically (automagically?) “build” it.

Meaning, import every asset belonging to this shot.

For lighting, building means to import caches, whereas for animation building means to import the rigs, and possibly an associated layout.

With building, updating is also less important, as building is:

Stable
Fast (but not faster than updating an individual asset)

It’s stable because it happens from a clean slate, an empty scene, with only the latest assets and little room for error (read: human intervention).

marcus · August 14, 2015, 2:08pm

That makes sense, and correlates better with the current directory layout, in which version comes before the name of the asset in the shot.

@BigRoy, what do you think?

# From
thedeal_seq01_1000_animation_ben01_v036.abc

# To
thedeal_seq01_1000_animation_v036_ben01.abc

marcus · August 14, 2015, 2:29pm

To put pictures in your head, here are some!

The Deal - Shot 1000

Which highlights an additional issue; capturing currently happens for the full viewport, when it should only capture the gate mask.

Technicalities aside, how about that lovely animation huh?

BigRoy · August 14, 2015, 4:27pm

+1

If this is from the capture package should we set up an issue for it?

marcus · August 14, 2015, 4:49pm

Could do, but I’m not sure it’s capture.py’s fault.

I’m not sure Maya itself is capable of outputting only from the Gate Mask, we might have to do some trickery to overcome this. Not something I’ve attempted before.

On that note, we should also have a look at “burn-in” data, like a frame counter but also asset metadata, such as what it is, where it’s from and who the author is, along with a date.

It’s possible these two solutions overlap, but unlikely it will come from capture.py.

BigRoy · August 14, 2015, 4:54pm

Set the overscan of the camera to 1.0 for the duration of the context and it will capture the exact resolution gate.
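A minimal sketch of "for the duration of the context", assuming the camera shape exposes an `overscan` attribute that can be read and written with `getAttr` and `setAttr`. The `cmds` module is passed in as an argument, and `FakeCmds` is a made-up stand-in, so that the example can run outside of Maya; inside Maya you would pass `maya.cmds`.

```python
import contextlib


@contextlib.contextmanager
def overscan(cmds, camera, value=1.0):
    """Temporarily set the overscan of `camera`, restoring it on exit."""
    plug = camera + ".overscan"
    original = cmds.getAttr(plug)
    cmds.setAttr(plug, value)
    try:
        yield
    finally:
        # Restore the artist's setting even if capturing fails
        cmds.setAttr(plug, original)


class FakeCmds(object):
    """Minimal stand-in for maya.cmds, for demonstration only."""

    def __init__(self):
        self.attrs = {"perspShape.overscan": 1.3}

    def getAttr(self, plug):
        return self.attrs[plug]

    def setAttr(self, plug, value):
        self.attrs[plug] = value


fake = FakeCmds()
with overscan(fake, "perspShape"):
    print(fake.getAttr("perspShape.overscan"))  # value used while capturing
print(fake.getAttr("perspShape.overscan"))  # original value, restored
```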

Awesome. Let’s make an issue? Or is that the HUD issue?

marcus · August 14, 2015, 4:56pm

Is it that simple?

Sure, an issue with Magenta I think, for starters.

marcus · August 17, 2015, 12:03pm

Lighting is where the pipeline can truly start to show its colors.

It involves:

(1) Animated pointcaches, from animators
(2) Shaders, from look development artists
Assembled and connected for lighting

The workflow for a lighter is to…

Load a pointcache
Apply associated shaders from look development

It’s the “associated” part which is tricky.

It means that we’ll somehow need to determine where to find the shaders associated with a pointcache.

 __________________________________________________________________
|                                                                  |
| \thedeal\film\seq01\1000\animation\publish\v009\pointcache\ben01 |
|_________________________.________________________________________|
                          .
                          .
 _________________________.____________________________
|                         v                            |
| \thedeal\assets\ben\lookdev\publish\v029\lookdev\ben |
|______________________________________________________|

The problem is…

ben01 has no natural connection to ben the asset.

The asset was imported and used by the animator who produced the pointcache, but when publishing, this information was lost.

That is, there is no tracking of history nor relationships between assets.

To fix this, we’ll need to (1) publish additional information from the scene. Here is what something like that could look like from look development.

origin.json

{
  "author": "marcus", 
  "date": "2015-08-17T12:57:11.636000Z", 
  "filename": "C:\\Users\\marcus\\Dropbox\\Pyblish\\thedeal\\assets\\ben\\lookdev\\work\\maya\\scenes\\v002_marcus.ma", 
  "item": "ben", 
  "project": "thedeal", 
  "references": [
    {
      "filename": "C:/Users/marcus/Dropbox/Pyblish/thedeal/assets/ben/modeling/publish/v012/model/ben/thedeal_ben_modeling_v012_ben.ma", 
      "item": "ben", 
      "project": "thedeal", 
      "task": "modeling"
    }
  ], 
  "task": "lookdev"
}

(2) This file is then included with each published version.

\thedeal\film\seq01\1000\animation\publish\v009\pointcache\ben01
\thedeal\film\seq01\1000\animation\publish\v009\metadata\origin

(3) Such that we can look at the cache, and determine its origin.

\thedeal\assets\ben\lookdev\publish\v015\rigging\ben

(4) With the origin, it’s trivial to find the root asset and work our way up to where the latest version of the look development files are located.

\thedeal\assets\ben\lookdev\publish\v029\lookdev\ben
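For the publishing side of step (1), here is a minimal sketch of how the contents of origin.json could be assembled. The helper `make_origin` is hypothetical, the paths are shortened for illustration, and in practice the values would be collected from the scene rather than typed in.

```python
import json
import time


def make_origin(author, project, item, task, filename, references):
    """Assemble the data written to origin.json (hypothetical helper)."""
    return {
        "author": author,
        "date": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "filename": filename,
        "item": item,
        "project": project,
        "references": references,
        "task": task,
    }


origin = make_origin(
    author="marcus",
    project="thedeal",
    item="ben",
    task="lookdev",
    filename="thedeal/assets/ben/lookdev/work/maya/scenes/v002_marcus.ma",
    references=[{
        "filename": "thedeal/assets/ben/modeling/publish/v012/model/ben/"
                    "thedeal_ben_modeling_v012_ben.ma",
        "item": "ben",
        "project": "thedeal",
        "task": "modeling",
    }],
)

print(json.dumps(origin, indent=2, sort_keys=True))
```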

marcus · August 17, 2015, 1:29pm

Here’s an example of what it can look like, with the shader relations from above, but without tracking.

import os
import json

from maya import cmds
from pyblish_magenta.utils.maya import lsattrs

fname = r"%PROJECTROOT%/assets/ben/lookdev/publish/v014/lookdev/ben/thedeal_ben_lookdev_v014_ben.json"
fname = os.path.expandvars(fname)

with open(fname) as f:
    payload = json.load(f)

for sg in payload:
    shading_group = lsattrs({"uuid": sg["uuid"]})[0]
    for m in sg["members"]:
        member = lsattrs({"uuid": m["uuid"]})[0]
        print("Adding \"%s\" to \"%s\"" % (member, shading_group))
        cmds.sets(member, forceElement=shading_group)

It’s the fname we need to figure out automatically, based on the current file which is from lighting.

Edit: actually, I never posted an example of the shader relations.

Here’s what that looks like.

lookdev.json

[
  {
    "name": "lightMetal_SG",
    "uuid": "f7b112ad-90bf-4274-8329-19a02092a083",
    "members": [
      {
        "name": "|:ben_GRP|:L_leg_GEO",
        "uuid": "53470dac-3709-499d-a490-4b8003b178ee",
        "properties": {
          "subdivision": 2,
          "displacementOffset": 0.94,
          "roundEdges": false,
          "objectId": "f35"
        }
      },
      {
        "name": "|:ben_GRP|:R_arm_GEO",
        "uuid": "8048064a-f6e0-48d4-bd03-e45b13dd2526"
      },
      {
        "name": "|:ben_GRP|:neck_GEO",
        "uuid": "53470dac-3709-499d-a490-4b8003b178ee"
      }
    ]
  },
  {
    "name": "orangeMetal_SG",
    "uuid": "f7b112ad-90bf-4274-8329-19a02092a083",
    "members": [
      {
        "name": "|:ben_GRP|:body_GEO.f[22:503]",
        "uuid": "8048064a-f6e0-48d4-bd03-e45b13dd2526"
      }
    ]
  }
]

In which each Mesh is associated with a Shading Group via a UUID, generated via the Python standard library uuid.uuid4() and applied when the scene is saved.

pyblish-magenta/uuid.py
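A minimal sketch of the idea, not the contents of uuid.py. It assumes an identifier is assigned only once, so that it stays stable across saves, and uses a plain dictionary as a stand-in for the Maya scene.

```python
import uuid


def ensure_uuids(nodes):
    """Give every node lacking a "uuid" a new one; keep existing ones.

    `nodes` maps node name to a dict of attributes and stands in
    for the Maya scene.
    """
    for attrs in nodes.values():
        if not attrs.get("uuid"):
            attrs["uuid"] = str(uuid.uuid4())
    return nodes


scene = {
    "|ben_GRP|L_leg_GEO": {},
    "|ben_GRP|R_arm_GEO": {"uuid": "8048064a-f6e0-48d4-bd03-e45b13dd2526"},
}

ensure_uuids(scene)
for name, attrs in sorted(scene.items()):
    print(name, attrs["uuid"])
```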

marcus · August 17, 2015, 2:55pm

Hey @BigRoy, I’m looking to augment the schema with some additional information.

# From
pattern: '{@shot}/{task}/publish'

# To
pattern: '{@shot}/{task}/publish/{version}/{family}/{instance}/{file}'

But I’m having trouble… Is it possible to do that, without breaking anything? Where else can I add this information?

BigRoy · August 17, 2015, 3:27pm

Currently no. Lucidity doesn’t support partial formatting/parsing, so it couldn’t list the available versions for us in the integrator based on only this pattern. We need to be able to perform a partial format to list the currently available versions before we can choose what our next version will be.

A workaround for now would be to add another pattern with the full filepath and keep this shorter one (up to the version) available for the partial formatting in the Integrator.
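A minimal sketch of the two-pattern workaround. Plain `str.format` templates stand in for Lucidity templates here, `{root}` replaces the `{@shot}` reference, and the `v###` naming and the `next_version` helper are assumptions made for illustration.

```python
import os
import re
import tempfile

# Short pattern, used for listing; full pattern, used for the final path
PUBLISH = "{root}/{task}/publish"
FULL = PUBLISH + "/{version}/{family}/{instance}/{file}"


def next_version(publish_dir):
    """Return the next free version name, e.g. "v003"."""
    numbers = [0]
    for name in os.listdir(publish_dir):
        match = re.match(r"^v(\d+)$", name)
        if match:
            numbers.append(int(match.group(1)))
    return "v%03d" % (max(numbers) + 1)


# Throwaway directory standing in for a shot with two published versions
root = tempfile.mkdtemp()
data = {"root": root, "task": "animation"}
publish_dir = PUBLISH.format(**data)
for existing in ("v001", "v002"):
    os.makedirs(os.path.join(publish_dir, existing))

data.update({
    "version": next_version(publish_dir),
    "family": "pointcache",
    "instance": "ben01",
    "file": "ben01.abc",
})
print(FULL.format(**data))
```

Versions are compared as integers rather than as strings, so the result does not depend on zero-padding.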

marcus · August 17, 2015, 5:38pm

Ok, that works.

Asset Linking

I’ve pushed a first draft of the automatic shader assignment to Lighting from LookDev, here are some thoughts.

Here is the look development scene.

There is a series of faces in the center of the model with a shader applied, to simulate face assignment in general.

And here are the shaders applied.

As we can see, the face assignment isn’t quite there yet, but otherwise things are looking good. Currently, it can:

Look up the origin of each referenced pointcache
Deduce the look development shaders
And links between shaders and meshes
Apply these shaders to the pointcached meshes

From an artist’s point of view, the process is fully automatic once the pointcaches have been imported. But, things aren’t quite so rosy, and here’s why.

The Code, In Pieces

Full source

import os
import json

from maya import cmds

import pyblish_magenta.schema
from pyblish_magenta.utils.maya import lsattrs

lsattrs is amazing. This would have been amazingly difficult without it.

Its interface involves passing a dictionary of key/value pairs against which all nodes in the scene are compared. Any node with a matching key/value is returned.

For us, this is great, because every node in the scene is uniquely identified by a Universally Unique Identifier.

This is so that:

The identity of a polygonal mesh from modeling can be recorded
And associated with a shader

Regardless of hierarchy or namespace, the mesh remains unique across all sessions. This is how we can build the lookdev.json from above where the "name" key is merely for debugging.
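To illustrate the kind of lookup described above, here is a pure-Python stand-in operating on a dictionary instead of a Maya scene. It is not the implementation of lsattrs; the name `find_by_attrs` is hypothetical, and treating several keys as "match any" is an assumption.

```python
def find_by_attrs(scene, attrs):
    """Return names of nodes having any of the given attribute values."""
    matches = []
    for name, node_attrs in sorted(scene.items()):
        if any(node_attrs.get(key) == value for key, value in attrs.items()):
            matches.append(name)
    return matches


# The namespace differs from the one used in look development,
# yet the UUID still identifies the mesh.
scene = {
    "ben01_:L_leg_GEO": {"uuid": "53470dac-3709-499d-a490-4b8003b178ee"},
    "ben01_:R_arm_GEO": {"uuid": "8048064a-f6e0-48d4-bd03-e45b13dd2526"},
}

print(find_by_attrs(scene, {"uuid": "8048064a-f6e0-48d4-bd03-e45b13dd2526"}))
```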

schema = pyblish_magenta.schema.load()

The schema is loaded, as we need to go from the absolute path of a referenced pointcache to its original asset; such as /ben01_pointcache -> /ben.

origins = dict()
for reference in cmds.ls(type="reference"):
    if reference in ("sharedReferenceNode",):
        continue

    filename = cmds.referenceQuery(reference, filename=True)

    # Determine version of reference
    # NOTE(marcus): Will need to determine whether we're in a shot, or asset
    data = schema["shot.full"].parse(filename)

Each reference in the scene is assumed to be an Instance, and each instance is parsed into its components (project, task and item) such that we can rebuild this into another location.

In this case, we’re rebuilding the path to a pointcache to an origin asset.

    version = data["version"]
    
    # Reduce filename to the /publish directory
    template = schema["shot.publish"]
    data = template.parse(filename)
    root = template.format(data)

    versiondir = os.path.join(root, version)
    origindir = os.path.join(versiondir, "metadata", "origin").replace("/", "\\")
    if not os.path.exists(origindir):
        continue  # no origin

    originfile = os.path.join(origindir, os.listdir(origindir)[0])

    if originfile not in origins:
        with open(originfile) as f:
            origins[originfile] = json.load(f)

    origin = origins[originfile]

    if not origin["references"]:
        continue  # no references, no match

    reference = origin["references"][0]
    template = schema["asset.publish"]
    data = {
        "asset": reference["item"],
        "root": data["root"],
        "task": "lookdev"
    }
    assetdir = template.format(data)

The origin asset has been built, based on the origin.json we’ve published alongside the asset. Now we need to get the latest version from lookdev and import it.

    # NOTE(marcus): Need more robust version comparison
    version = sorted(os.listdir(assetdir))[-1]
    instancedir = os.path.join(assetdir, version, "lookdev", reference["item"])

    # NOTE(marcus): Will need more robust versions of these
    shaderfile = next(os.path.join(instancedir, f) for f in os.listdir(instancedir) if f.endswith(".ma"))
    linksfile = next(os.path.join(instancedir, f) for f in os.listdir(instancedir) if f.endswith(".json"))
    
    # Load shaders
    # NOTE(marcus): We'll need this to be separate, at least functionally
    namespace = "%s_shaders_" % reference["item"]
    if namespace not in cmds.namespaceInfo(
            ":", recurse=True, listOnlyNamespaces=True):
        cmds.file(shaderfile, reference=True, namespace=namespace)

And it’s been imported. With a lot of assumptions.

The final step is actually assigning shader to mesh, by way of their UUIDs.

    with open(linksfile) as f:
        payload = json.load(f)

    for shading_group_data in payload:
        try:
            shading_group_node = lsattrs({"uuid": shading_group_data["uuid"]})[0]
        except Exception:
            # This would be a bug
            print("%s wasn't in the look dev scene" % shading_group_data["name"])
            continue

        for member_data in shading_group_data["members"]:
            try:
                member_node = lsattrs({"uuid": member_data["uuid"]})[0]
            except Exception:
                # This would be inconsistent; report the missing member
                print("%s wasn't in the lighting scene" % member_data["name"])
                continue

            print("Adding \"%s\" to \"%s\"" % (member_node, shading_group_node))
            cmds.sets(member_node, forceElement=shading_group_node)

What’s broken?

Aside from missing face assignment, there are a few things brittle about this approach.

I’m assuming we’re in a shot, as opposed to an asset, which is ok most of the time as you are most likely to apply shaders from lookdev during shot production
I’m formatting a path with its own parsed equivalent to find a parent path
I’m assuming the location of where the origin instance was published (with no graceful handling in case we are wrong)
I’m assuming the name of this origin file, based on its extension
I’m assuming a lookdev scene has only a single reference
I’m comparing versions ad-hoc; there’s no guarantee this v-prefixed variant will last, and if it changes, tough luck.
I’m assuming the shaders are located in a Maya Ascii file, the only Maya Ascii file present in the published version.
File loading is embedded into this one giant function
I’m being very forgiving regarding what is assigned a shader, and what is not, without any graceful handling of problems.
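Some of these assumptions can at least be made to fail loudly. As a minimal sketch, the `next(...)` lookups that pick a file by extension could be replaced with a helper that complains when there is not exactly one match. The name `find_single` is hypothetical, and a temporary directory stands in for the published instance.

```python
import os
import tempfile


def find_single(directory, extension):
    """Return the only file in `directory` with `extension`, else raise."""
    matches = sorted(
        f for f in os.listdir(directory) if f.endswith(extension)
    )
    if len(matches) != 1:
        raise ValueError(
            "Expected exactly one '%s' file in %s, found %d"
            % (extension, directory, len(matches))
        )
    return os.path.join(directory, matches[0])


# Throwaway directory standing in for a published lookdev instance
instancedir = tempfile.mkdtemp()
for name in ("thedeal_ben_lookdev_v014_ben.ma",
             "thedeal_ben_lookdev_v014_ben.json"):
    open(os.path.join(instancedir, name), "w").close()

print(os.path.basename(find_single(instancedir, ".json")))
```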

Ok, so that’s all great. Now…

What works?

The linkage between a shader and mesh? Check!
Publishing “origin” information from every asset automatically? Check!
Inferring an original asset from a pointcache? Check!

It may not look like much, but the above problems are mere technicalities and cosmetics in comparison to this. This is major pipeline functionality, without which we would have little luck in developing anything useful.

BigRoy · August 17, 2015, 6:18pm

An API would solve this by allowing something like:

ls(data)

This would list all possible locations present for the data.

Where the data is a dictionary holding the least amount of data that the pipeline requires to define where the published file would be. In short it would be its identifier.

identifiers = {'asset': 'ben', 'task': 'lookdev', 'family': 'shader', 'version': 14}

The API could either use a Schema (e.g. with Lucidity) to format where the file would be, or use something like Open Metadata along with cQuery to query that. The pipeline would then also allow us to retrieve possible values when we only have a subset of the required data, for example when listing which versions are available, and limit the search so that it goes no deeper than the level defining the limited key. This would solely be an optimization, but with the amount of content that could be in a version (or an asset?), potentially a required one:

identifier = {'asset': 'ben', 'task': 'lookdev', 'family': 'shader'}
values = ls(identifier, limit='version')

Thinking about it now it could return the available identifiers that were found:

query_identifier = {'asset': 'ben', 'task': 'lookdev', 'family': 'shader'}
identifiers = ls(query_identifier, limit='version')
print(identifiers)
# [ {'asset': 'ben', 'task': 'lookdev', 'family': 'shader', 'version': 1},
#   {'asset': 'ben', 'task': 'lookdev', 'family': 'shader', 'version': 2},
#   {'asset': 'ben', 'task': 'lookdev', 'family': 'shader', 'version': 3},
#   {'asset': 'ben', 'task': 'lookdev', 'family': 'shader', 'version': 4},
#   {'asset': 'ben', 'task': 'lookdev', 'family': 'shader', 'version': 5}]

To get the highest version:

highest_version = max(identifiers, key=lambda x: x['version'])

And to find the path for that specific data:

path = ls_path(highest_version)

This same method could be used in the Integrator to define the correct output path based on the data that is valid upon extraction. This means we’ll use the same interface for defining an extraction point as we’ll use for collection/searching.
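To make the proposal concrete, here is a minimal sketch of `ls` and `ls_path` over an in-memory list. Everything in it is an assumption: the `PUBLISHED` list stands in for whatever the pipeline would actually query, the path layout is only loosely modelled on the asset paths earlier in the thread, and the `limit` optimisation is left out.

```python
# In-memory stand-in for the published content of a project
PUBLISHED = [
    {"asset": "ben", "task": "lookdev", "family": "shader", "version": v}
    for v in (1, 2, 3, 4, 5)
] + [
    {"asset": "ben", "task": "modeling", "family": "model", "version": 12},
]


def ls(query):
    """Return every published identifier matching all keys in `query`."""
    return [
        identifier for identifier in PUBLISHED
        if all(identifier.get(key) == value for key, value in query.items())
    ]


def ls_path(identifier):
    """Format a full identifier into a path (hypothetical layout)."""
    template = "assets/{asset}/{task}/publish/v{version:03d}/{family}/{asset}"
    return template.format(**identifier)


identifiers = ls({"asset": "ben", "task": "lookdev", "family": "shader"})
highest_version = max(identifiers, key=lambda x: x["version"])
print(ls_path(highest_version))
```

The same `ls_path` could then serve the Integrator when it needs an output location for newly extracted data.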