Instances and Plain Old Data

marcus · August 1, 2015, 3:26pm

As promised, here’s some context on why plain-old-data is well suited for instances, as opposed to complex Python objects.

instance.set_data("geo", "pCube1")  # Good
instance.set_data("geo", pymel.core.PyNode("pCube1"))  # Bad

Definition

First off, let’s define what I mean by “plain old data”.

That is, any of the regular Python str, int, float, dict, list and bool. Or in other words, anything compatible with JSON.

import json

my_data = [1, "hello", True, {}]

try:
  json.dumps(my_data)
  print("Is compatible")
except TypeError:
  # json.dumps raises TypeError for types it cannot serialise
  print("Is not compatible")

Here is an example of what isn’t compatible.

class MyData(object):
  value = 5

That’s because MyData is a custom type and the JSON serialiser isn’t able to figure out its value, even though it’s obvious to you and me.
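To see the failure in practice, here is a small sketch using only the standard library. The `plain` dictionary is just an illustration of how the same value could be expressed as plain-old-data; it is not something Pyblish provides.

```python
import json


class MyData(object):
    value = 5


# Attempt to serialise an instance of the custom type
try:
    json.dumps(MyData())
    compatible = True
except TypeError:
    # The serialiser has no rule for turning MyData into JSON
    compatible = False

print(compatible)

# The same information expressed as plain-old-data (illustrative only)
plain = {"value": MyData.value}
encoded = json.dumps(plain)
print(encoded)
```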

The reason

(1) Primarily it’s to facilitate distributed processing in the future.

Python has no problem storing complex types in Instances, and this currently works just fine when using pyblish.util.publish().

The problem arises when this data needs to be sent across to another process or computer for further/parallel processing. When the data is plain, this isn’t a problem, which means plug-ins can scale.

(2) Secondly, visualisation. The GUI currently makes no attempt to show you what an Instance contains, but it will. Much like attributes on nodes in Maya are visualised either directly as children within the Outliner or as metadata in the Attribute Editor, Pyblish frontends will visualise the contents and metadata of every Instance. It is therefore imperative that this data remains accessible to a third party and does not depend on the current Python session or a host library.

(3) Auditing, or archival/history. Basically, keeping track of events occurring throughout an organisation. It is an advanced topic, but increasingly important as publishing becomes a more established action, both for security and for sanity. E.g. you can review the “logs” of what happened to find out why some things are missing or why there are extras of something, along with who is responsible, when and where.

These all depend on Instances remaining simple and serialisable.
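All three reasons come down to the same round trip, which can be sketched with the standard library alone. The keys and values below (`geo`, `family`, `startFrame`, the `record` wrapper) are made up for illustration and are not a Pyblish schema.

```python
import json

# Hypothetical contents of an Instance, plain values only
data = {"geo": "pCube1", "family": "model", "startFrame": 1001}

# Hypothetical record wrapping that data, e.g. for a log entry
record = {"instance": "MyInstance", "user": "marcus", "data": data}

# Serialised: this string could be sent to another process,
# displayed by a GUI or appended to a log file
payload = json.dumps(record, sort_keys=True)

# Deserialised elsewhere, with no Maya or pymel session required
restored = json.loads(payload)

print(restored["data"]["geo"])
```

Had `geo` been stored as a PyNode, `json.dumps` would have raised a TypeError and none of the three uses would be possible without extra work.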

The cost

Of course, nothing is without cost. Without complex data structures, manipulating and working with data can become more tedious. So it’s still something we can talk about. There are other ways of achieving the above; it’s more a matter of finding the right balance.

To work with complex data structures, here are some tips and tricks.

1. Serialisation/Deserialisation

It can be tempting and useful to store PyNodes directly, as opposed to just strings of names. To work around this, here are two methods with which to retain a reference to this complex object.

import pyblish.api
import pymel.core as pm

complex_data = pm.PyNode("pCube1")

# Data is serialised
serialised_data = str(complex_data)

# Stored
instance = pyblish.api.Instance("MyInstance")
instance.set_data("MyNode", serialised_data)

# And deserialised back to a PyNode when needed
complex_data = pm.PyNode(instance.data("MyNode"))

This works when serialisation is either supported by the complex object, or when it is obvious how to handle it by hand. In this case, a PyNode simply references a node that is just as easy to reference by name, so casting it back to a complex object (i.e. deserialising it) is a simple matter.

2. Maintaining a register

import pyblish.api
import pymel.core as pm

complex_data = pm.PyNode("pCube1")

# A reference to the complex object is stored separately
register = {
  "UniqueId": complex_data
}

# And stored in the Instance
instance = pyblish.api.Instance("MyInstance")
instance.set_data("MyNode", "UniqueId")

# Look up the complex object when needed
complex_data = register[instance.data("MyNode")]

Here, the complex object is fully preserved. The disadvantages are (1) that you now have to maintain a separate registry of objects and (2) that this registry isn’t accessible outside of the current process. I.e. you can’t send it across to another process or computer and restore it from the register, as it is only available to you locally.

Both approaches solve the same problem, but offer a different set of pros and cons. Either approach works with any complex object, the latter also supporting objects that maintain some form of state, such as a live connection to a database; that is, anything that cannot be fully recreated after the fact.
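As a sketch of the stateful case, here is a register holding a live database connection, runnable without Maya. The `remember` helper and the use of `uuid` for keys are assumptions made for this example, not part of Pyblish.

```python
import json
import sqlite3
import uuid

# Local register, only valid within the current process
register = {}


def remember(obj):
    """Store obj in the register and return a plain string key (hypothetical helper)."""
    key = str(uuid.uuid4())
    register[key] = obj
    return key


# A stateful object that cannot be recreated from a string
connection = sqlite3.connect(":memory:")
key = remember(connection)

# The key is plain-old-data and serialises without trouble
payload = json.dumps({"connection": key})

# Looking up the key returns the very same object, state included
same = register[key]
print(same is connection)

connection.close()
```

Note that `payload` can travel to another process, but the key it carries only resolves against the register of the process that created it.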

Bottom line

Try sticking with plain-old-data where possible, and prefer approach 1 over approach 2 when it isn’t.

BigRoy · August 1, 2015, 4:15pm

Good write-up!

I think this feature will be appreciated by both TDs and the other artists. It’s a debugging help, but it also really helps the artist gain insight into what the heck is going on when Pyblish blinks and processes.

Definitely a plus. Even if you want to keep the data out of a database it’s useful if you can do a simple json.dump() to keep a log of your publish.