API changes

marcus · May 4, 2015, 6:34am

class MyValidator(pyblish.Validator):
  def process(self, context, instance):
    """Unified processing method with dependency injection"""
     
    # Plain dictionary for data
    instance.data["key"] = value
    if "key" in instance.data:
       print("Yes, it's got key")

    # Modify or add depending on whether it exists
    with instance.data.item("group") as g:
      g["key"] = value

API Changes

Thought I’d summarise and bring up to discussion some proposed api changes about to happen and what they will mean for developers of plug-ins.

Unified processing method

This is perhaps the one thing that keeps me up the most at night, the duplication involved in having both process_context and process_instance.

def process_context(self, context):
    """duplication in function-name, signature and sibling function"""

See here for details.

Dependency Injection

Plain dictionary

The data property has all the features of a plain dictionary, except delegated to individual functions such as set_data and has_data. I’d like to see these merged into being a plain dictionary.

# Old
instance.set_data("key", "value")  # old
instance.data("notexists", default="exists")

for key, value in instance.data().iteritems():
  pass

# New
instance.data["key"] = "value" 
instance.data.get("notexists", default="exists") 

for key, value in instance.data.iteritems():
  pass

Besides cosmetic changes, the following behaviour can also be simplified.

# Before
if not instance.has_data("mydata"):
  instance.set_data("mydata", {})
mydata = instance.data("mydata")
mydata["key"] = "value"

# After
try:
  mydata = instance["mydata"]
except:
  mydata = {}
  instance["mydata"] = mydata
mydata["key"] = "value

Which also paves the road for multiprocessing. Notice how the first method queries the existence before attempting to write. In this member is accessed by two separate threads, there is no telling which one starts writing first, and which one is told the member already exist.

See here for more.

Pure dict for plug-in data

Modify on add

Based on the fact that the above is still quite complicated, and still a common thing to do, I’m also considering implementing this sort of behaviour.

with instance.data.item("group") as g:
  g["key"] = value

It does exactly like what the above does, but in 2 lines as opposed to 6, and is thread-safe.

Backwards compatibility

The big question here is of course:

Will this break our plug-ins??

And no, there’s no need for that.

Each of the above changes can be implemented to the side of existing behaviour and that’s the plan. We could have a think about whether we would like to have access to some form of transition mechanism.

import pyblish.api as pyblish
pyblish.set_api(1) # Old API
pyblish.set_api(2) # New API

In which the old API would provide the new API, but remain backwards compatible, and the new API would discard the old functionality altogether. The old API would remain default until Pyblish hits 2.0, but users could choose to adopt it before hand if they wanted to.

marcus · May 8, 2015, 9:54am

Another potential change, to simplify the learning process for newcomers.

Family and Host Defaults

Currently, if you do not explicitly say that you intend for a plug-in to apply to a particular host and family, it will fail discovery and throw a warning.

Whether or not to have a default value relates to two problems at either end of the spectrum.

Either you write a plug-in, but it doesn’t show up in the GUI.
An unintended plug-in shows up in the GUI

In the first case, you wrote a plug-in and expected to see and run it, but because it didn’t have a family and host, it got rejected. It’s easily fixed, you simply add a host and family.

In the second case, a plug-in appears where you didn’t intend it to, like a plug-in for Maya showing up in Nuke, or a plug-in for FamilyA applying to instances of FamilyB. It’s easily fixed, you simply restrict hosts and families to those you intend.

Both are bad, that much is clear. But one couples the understanding of what a host and family means before understanding how to use it, whereas the other can be learned as you need it.

So how about defaulting families and hosts to wildcard?

class MyPlugin(...):
   hosts = ["*"]
   families = ["*"]

As we’re moving from a restricted setting to more loose, it won’t have any effect on existing plug-ins, but will open up for a simplified learning process.

marcus · May 16, 2015, 3:20pm

I’ll be working on this starting next week, estimating a release by the end of June.

Here’s a summary of planned changes for Pyblish 1.1.0.

Dependency Injection
Add support for Collector and Integrator aliases for Selector and Conformer respectively
In-memory plug-ins
Family and Host Defaults, see above.
SimplePlugin
Context in GUI

Goal

To simplify documentation and ease the learning curve for beginners.

Backwards compatibility

Once these changes are made, the intent is that we should all start writing plug-ins differently - i.e. using process() with optional context and instance arguments, as opposed to process_context and process_instance - but the current interface will remain untouched and backwards compatibility will remain a primary goal, at least until 2.0.

If you have any input on these, let me know.

Best,
Marcus

tokejepsen · May 16, 2015, 5:47pm

I know you have your concerns with repairing, but would this dependency injection effect the repair methods as well?

marcus · May 16, 2015, 7:10pm

That is a good question.

It would make sense to mirror the change to affect repair as well.

def repair(self, context, instance):
  # perform repair

But ideally we would get on with Actions and spend time where it counts, as I think it’s pretty clear by now that that’s the way forward.

I think for the time being, as it’s not directly relevant to beginners, I’ll leave repair as-is unless it turns out to be relatively straightforward to bring it along, and save any additional work for when we move on to Actions.

Of course, pull-requests are welcome. Perhaps we could develop both simultaneously.

Any takers?

tokejepsen · May 16, 2015, 7:42pm

Cool, sounds good. When I have the pyblish-deadline package done I might jump onto this, can’t promise anything though:)

marcus · May 18, 2015, 4:05pm

In-memory plug-ins, families and hosts defaults and Collector and Integrator plug-ins have been implemented. Along with #170, #178.

To follow along

If you can, follow along and update Pyblish as development progresses. Things might break and it will be buggy, but the more tests we put it through, the faster a working version can get pushed out.

Install Pyblish from the mottosso fork.
Append the installation before any other version, either via PYTHONPATH or sys.path

To toggle between bleeding edge and original, simply rename the repo, or remove the installation from the path.

Changelog so far.

- Feature: In-memory plug-ins (see #140)
- Enhancement: Logic unified between pyblish.util and pyblish.cli
- Bugfix: Order now works with pyblish.util and pyblish.cli-
- pyblish.util minified. For data visualisation, refer to pyblish-qml
- API: Added pyblish.api.plugins_by_instance()
- API: New defaults for `hosts` and `families` of plug-ins. (see #176)
- API: Added pyblish.api.register_plugin()
- API: Added pyblish.api.deregister_plugin()
- API: Added pyblish.api.registered_plugins()
- API: Added pyblish.api.deregister_all_plugins()
- API: Renamed pyblish.api.deregister_all -> deregister_all_paths

Otherwise, an update will be pushed to Pyblish Suite and Pyblish for Windows in a few weeks.

Updates

In-memory plug-ins

In-memory plug-ins were added mainly for testing and tutorials, but it’s got the potential to re-shape how you deploy plug-ins in your organisation. For example, you could discard physical files altogether, and register all plug-ins at run-time.

Here’s an example of how in-memory plug-ins work.

import pyblish.api as pyblish

# Mock file-system and destination server
_disk = list()
_server = dict()

class SelectInstances(pyblish.api.Selector):
    def process_context(self, context):
        instance = context.create_instance(name="MyInstance")
        instance.set_data("family", "MyFamily")

        SomeData = type("SomeData", (object,), {})
        SomeData.value = "MyValue"

        instance.add(SomeData)

class ValidateInstances(pyblish.api.Validator):
    def process_instance(self, instance):
        assert_equals(instance.data("family"), "MyFamily")

class ExtractInstances(pyblish.api.Extractor):
    def process_instance(self, instance):
        for child in instance:
            _disk.append(child)

class IntegrateInstances(pyblish.api.Integrator):
    def process_instance(self, instance):
        _server["assets"] = list()

        for asset in _disk:
            asset.metadata = "123"
            _server["assets"].append(asset)

# Register all plug-ins
for plugin in (SelectInstances,
               ValidateInstances,
               ExtractInstances,
               IntegrateInstances):
    pyblish.api.register_plugin(plugin)

# Publish
pyblish.util.publish()

# Disk and server has been updated.
assert _disk[0].value == "MyValue"
assert _server["assets"][0].value, "MyValue"
assert _server["assets"][0].metadata == "123"

Conveinence publishing and the command-line interface

Both have seen a major overhaul in terms of logging output.

Initially, publishing via scripting was the primary means of publishing anything, so logging was essential. Nowadays, results are visualised in the GUI and less is required of publishing via scripting.

As a result, the implementation is much smaller and maintenance is simplified. As an added bonus, the order attribute now works via both scripting and the command-line.

pyblish.api.plugins_by_instance

This was added to add symmetry to pyblish.api.instances_by_plugin, and merely runs pyblish.api.plugins_by_family by automatically fetching the family from the given instance. Symmetry is good.

New defaults

For plug-ins that support any host or family, there’s now no need to specify a wildcard.

# This
class MyPlugin(...):
  families = ["*"]
  hosts = ["*"]

# Is identical to this
class MyPlugin(...):
  pass

marcus · May 19, 2015, 10:00am

Dependency Injection

Here’s some topics of discussion I’ve encountered while implementing this.

1. Order of execution

Currently, process_context is always processed regardless of the presence of any instances, and it’s always processed before process_instance in cases where instances are present. With DI, this behaviour is lost.

def process_context(self, context):
  print "I'm processed first, and only once"

def process_instance(self, instance):
  print "I'm processed for every available instance"

With a context of three instances, this yields.

"I'm processed first, and only once"
"I'm processed for every available instance"
"I'm processed for every available instance"
"I'm processed for every available instance"

With DI, it would look like this.

def process(self, context, instance):
  print "I'm processed for every available instance"

Which under the same scenario outputs:

"I'm processed for every available instance"
"I'm processed for every available instance"
"I'm processed for every available instance"

Possible Solution

Since initialisation can sometimes be important, one alternative is to handle it in __init__.

def __init__(self):
  print "I'm processed first, and only once"

def process(self, context, instance):
  print "I'm processed for every available instance"

Which will output identical results to the current results.

"I'm processed first, and only once"
"I'm processed for every available instance"
"I'm processed for every available instance"
"I'm processed for every available instance"

In addition, handling initialisation in __init__ is more Pythonic and familiar to newcomers.

It does mean a minor but significant change in the overall behaviour. It means plug-ins are no longer stateless.

# Current event loop
for Plugin in Plugins:
  for instance in context:
    Plugin().process(instance)

# DI event loop
for Plugin in Plugins:
  plugin = Plugin()
  for instance in context:
    plugin.process(instance)

Not being stateless, from a development point of view, is more flexible and more powerful. But at what cost?

Open questions

Can you think of any other way to solve this? Is ordering important to begin with? Do we need initialisation? What are the pracital benefits? Costs?

marcus · May 19, 2015, 10:22am

2. Services

DI opens up doors for added functionality not before possible.

def process(self, context, instance):
  # the function is given the current Context, and Instance

The above mimics the current behaviour, with slightly less typing and options for excluding both Context and Instance from the function signature where needed.

But it also means the ability to inject custom functionality.

def process(self, instance, time, user):
  print("%s was published @ %s by %s" % instance.data("name"), time(), user)

In which time and user are injected on-demand, providing additional functionality to the plug-in. In this case, a callable function time which returns the current time, and a static value user.

Furthermore, services can be registered by developers.

import pyblish.api
pyblish.api.register_service(
  "time", lambda: datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S.%fZ"))

In the above, a custom service time is registered and made available to plug-ins, providing a pre-formatted version the current time, such that every plug-in uses the same formatting and needn’t concern themselves with maintaining any updates to it.

Services vs. Data

Where does the line go between what is data and what is a service?

If data, added via e.g. Context.set_data(key, value), represents data shared amongst plug-ins, services may represent shared functionality.

Though there is technically nothing preventing you from storing callables as data…

import time
context.set_data("time", lambda: time.time)

Just as there is technically nothing preventing you from providing constants as a service.

pyblish.api.register_service("user", getpass.getuser())

It may make sense from a maintenance point of view to make the data/function separation. This way, data can be kept constant which simplifies archiving and visualisation, like passing the entire thing to a database, whereas functionality can be kept free of constants.

Open questions

Is additional services something we need, or does it add complexity? When a plug-in requests a service that isn’t available, when do we throw an error?

def process(self, not_exist):
  pass

Thrown during discovery
Thrown during processing, e.g. in the GUI
Silently skipped; rely on external tool for checking correctness.

# Checking correctness
$ pyblish check select_something.py
Plug-in is valid.

marcus · May 19, 2015, 2:40pm

DI Transition Guide

Here are some notes of what is involved in converting your plug-ins to a dependency injection-style of working.

Note that none of this is fixed and is very debatable at the moment, so if you have any concerns or input, now is a good time.

In cases where you have either process_context or process_instance, a simple search-and-replace to process will work fine.
In cases where you have both, see below.
For process() to be called, it must either ask for context and/or instance. If neither is present, process() will not be called at all. See below.
During the transition phase, the distinction is made internally by looking for the existence of a process_context or process_instance method.
If either exist, the plug-in is deemed “old-style” and is processed using the current implementation.
If both process and either process_context or process_instance is present, old-style wins and process will not be called.

I’ll update the list as more things come to mind. So far, updating the entire Napoleon extension took less than a minute and was a matter of a simple search-and-replace, leaving the behaviour unspoiled.

Both `process_context` and `process_instance`

The current behaviour of this is for process_context to be processed first, followed by process_instance. This behaviour isn’t possible any more. You can however process both in the same function.

def process(self, context, process):
  # do things

In case you do have both, process_instance will overwrite process_context due to your plug-in being re-written to a it’s Dependency Injection equivalent at run-time.

def process_context(self, context):
  # I will not be called. :(

def process_instance(self, instance):
  # Runs as usual

Old-first

The reason for looking for old-style methods before new-style is because of the newly introduced ability to use __init__. In cases where __init__ is used, and process not being implemented, the plug-in is still deemed new-style as __init__ is assumed to not have been in use.

Empty `process()`

If neither context nor instance is present in the signature of process(), nothing happens.

I struggled to provide the ability to implement “anonymous” processes, for things that do something unrelated to either the Context nor Instance, but primarily to aid in the initial learning phase.

For example.

class MyPlugin(...):
  def process(self):
    cmds.file(exportSelected=True)

This could be a user’s first plug-in. From here, he could learn about the benefits of using Context and thereafter Instance, and thereby learning about why they exist in the first place. Baby-step style.

But, I just can’t for the life of me figure out how to do that in a way that makes sense.

For example, in this case.

class ValidateInstanceA(pyblish.Validator):
  families = ["familyA"]
  def process(self, instance):
    # validate the instance

It’s quite clear that when there isn’t an instance suitable to this family, process should not be called.

However.

class ValidateInstanceA(pyblish.Validator):
  families = ["familyA"]
  def process(self, context):
    # validate the context

What about now? The context isn’t dependent on a family, but should always be called regardless. So clearly, process is called, even if no compatible instance is present.

Which brings us to.

class ValidateInstanceA(pyblish.Validator):
  families = ["familyA"]
  def process(self):
    # do something simple

What happens now? Should it be called?

I considered letting it run if the arguments are either empty, or context is present. But that doesn’t work if other arguments are to be injected.

class ValidateInstanceA(pyblish.Validator):
  families = ["familyA"]
  def process(self, time):
    # do something simple with time

Thoughts?

marcus · May 26, 2015, 8:25pm

Hey @mkolar, @BigRoy and @tokejepsen, I just updated the post above, would you mind having a look, specifically the last part about whether or not to run an empty process()?

It’s an subtle but important distinction that will be difficult to change once implemented, your input would be very valuable.

BigRoy · May 26, 2015, 10:56pm

I think this is where a difference is between your and my interpretation.

I would consider that the families attribute here is what limits it from being processed. If this would still get processed even if no compatible instance is available that would only make it more confusing.

If someone really wanted to run just a context check no matter what the family then family can just be ["*"].

As you state the context isn’t dependent on the family. That is true, but the plug-in is dependendent on the family, so should not get processed.

The other confusing bit here is the amount of time things get run. With context you would expect a single run (over the context) whereas with the instance you want each individual one. It’s a clear distinction that might not only be apparent with dependency injection.

1. Keep `process_instance` and `process_context` separate.

Maybe this is where we decide that both process_context() and process_instance() have their respective place. (One gets called once, the other per instance) Of course they could still have the benefits of Dependency Injection for other attributes.

2. Drop the behaviour of per instance processing as a built-in method.

The other side might be to drop the behaviour to run something like process_instance(). Instead only have it run once, always. But that might make it harder to implement behaviour per instance (especially since one error will kill it for each single Instance and you make the plug-in developer responsible for error catching.

I think 1 could work best? It’s already proven itself that it works.

marcus · May 27, 2015, 6:23am

Thanks @BigRoy, really got me thinking.

I woke up this morning to another potential solution - which is to make the presence of instance determine whether or not to process once or per-instance. If instance is not requested, it will process once regardless.

Process once

class ValidateInstanceA(pyblish.Validator):
  families = ["*"]
  def process(self):
    # validate the world, once

Process per-instance

class ValidateInstanceA(pyblish.Validator):
  families = ["*"]
  def process(self, instance):
    # validate each instance

I like the sound of that.

For clarity, let me give some examples.

class ValidateInstanceA(pyblish.Validator):
  families = ["*"]
  def process(self, context):
    # validate context

This would process once per publish, regardless of instances.

class ValidateInstanceA(pyblish.Validator):
  families = ["myFamily"]
  def process(self, context):
    # validate context

Whereas this would process once per process, only if an instance of family "myFamily" is present.

This is rather complex and possibly confusing, but also flexible and how it works currently.

Should we keep this behaviour?

marcus · June 3, 2015, 3:07pm

Having implemented the above and run it through the tests, it looks very good.

Currently, every discovered plug-in is processed at least once, with those requesting instance being processed once per available instance and in case there are no compatible instance doesn’t process at all.

instance then acts as a filter, enabling processing of every instance and preventing processing in cases where instances aren’t available. It’s a subtle difference, but I think it is the one that makes most sense.

It also means SimplePlugin now works as-is, without any custom code. It’s been given an order of -1 meaning they will run before anything else, but can of course be given an order explicitly, effectively making them into SVEC in case of having any of their orders set between 0-3.

This isn’t how it will work. It eliminated the use of plug-ins when no instance were present, like stacks that only operate on the context, and SimplePlugin, which doesn’t have any notion of instances.

marcus · June 3, 2015, 3:45pm

About this, repair will also see an update to dependency injection, but I’m expecting a deprecation shortly in favour of Actions.

Your current repair_instance will continue to work fine, with the addition of being able to instead implement repair, passing it instance. As with process_*, a simple search-and-replace will suffice.

BigRoy · June 3, 2015, 4:11pm

So a Plug-in with a specific family will always get its process() triggered (even if not one of those families is available as an instance)? In that case I think it should be clarified that family means instance_family and is only a filter for instances.

I would think it’s more convenient to always filter by family (even for ordinary process), except for when family is not filtered (like family = ["*"]). In that case SimplePlugin should still behave as you want since the default is that plug-ins are unfiltered. Maybe to clarify being unfiltered even further the family might be None by default?

Looking forward to a draft Actions implementations, woohoo!

marcus · June 3, 2015, 4:22pm

Yeah, that sounds like it would work. I’ll have to double check the logic…

marcus · June 3, 2015, 4:34pm

It looks like it does work, all tests pass and your logic is sound.

Considering it’s easier to go from here and back, than it is to go from allowing everything to adding limits, I’ll leave this in for the next release. I also think it makes more sense.

Thanks for spotting this.

marcus · June 3, 2015, 4:54pm

Ok, so the logic is essentially this:

Asking for instance will limit your plug-in to only process supported instances.
Asking for instance when no instances are present, or only instances of unsupported families, the plug-in will never get run. Not even once.
All plug-ins process at least once, unless limited to a particular set of families.

class ValidateInEmergency(pyblish.Validator):
  families = ["emergencyFamily"]
  def process(self):
    call_police()

This plug-in will only run if an instance of emergecyFamily is present.

mkolar · June 3, 2015, 9:53pm

This looks quite clear and predictable to me.

API changes

API Changes

Unified processing method

Plain dictionary

Modify on add

Backwards compatibility

Family and Host Defaults

Goal

Backwards compatibility

Updates

In-memory plug-ins

Conveinence publishing and the command-line interface

pyblish.api.plugins_by_instance

New defaults

Dependency Injection

1. Order of execution

Open questions

2. Services

Services vs. Data

Open questions

DI Transition Guide

Both process_context and process_instance

Old-first

Empty process()

1. Keep process_instance and process_context separate.

2. Drop the behaviour of per instance processing as a built-in method.

Both `process_context` and `process_instance`

Empty `process()`

1. Keep `process_instance` and `process_context` separate.