API changes

I’ll be working on this starting next week, estimating a release by the end of June.

Here’s a summary of planned changes for Pyblish 1.1.0.

Goal

To simplify documentation and ease the learning curve for beginners.

Backwards compatibility

Once these changes are made, the intent is that we should all start writing plug-ins differently - i.e. using process() with optional context and instance arguments, as opposed to process_context and process_instance - but the current interface will remain untouched and backwards compatibility will remain a primary goal, at least until 2.0.

If you have any input on these, let me know.

Best,
Marcus

I know you have your concerns with repairing, but would this dependency injection effect the repair methods as well?

That is a good question.

It would make sense to mirror the change to affect repair as well.

def repair(self, context, instance):
  # perform repair

But ideally we would get on with Actions and spend time where it counts, as I think it’s pretty clear by now that that’s the way forward.

I think for the time being, as it’s not directly relevant to beginners, I’ll leave repair as-is unless it turns out to be relatively straightforward to bring it along, and save any additional work for when we move on to Actions.

Of course, pull-requests are welcome. Perhaps we could develop both simultaneously.

Any takers?

Cool, sounds good. When I have the pyblish-deadline package done I might jump onto this, can’t promise anything though:)

In-memory plug-ins, families and hosts defaults and Collector and Integrator plug-ins have been implemented. Along with #170, #178.

To follow along

If you can, follow along and update Pyblish as development progresses. Things might break and it will be buggy, but the more tests we put it through, the faster a working version can get pushed out.

  1. Install Pyblish from the mottosso fork.
  2. Append the installation before any other version, either via PYTHONPATH or sys.path

To toggle between bleeding edge and original, simply rename the repo, or remove the installation from the path.

Changelog so far.

- Feature: In-memory plug-ins (see #140)
- Enhancement: Logic unified between pyblish.util and pyblish.cli
- Bugfix: Order now works with pyblish.util and pyblish.cli-
- pyblish.util minified. For data visualisation, refer to pyblish-qml
- API: Added pyblish.api.plugins_by_instance()
- API: New defaults for `hosts` and `families` of plug-ins. (see #176)
- API: Added pyblish.api.register_plugin()
- API: Added pyblish.api.deregister_plugin()
- API: Added pyblish.api.registered_plugins()
- API: Added pyblish.api.deregister_all_plugins()
- API: Renamed pyblish.api.deregister_all -> deregister_all_paths

Otherwise, an update will be pushed to Pyblish Suite and Pyblish for Windows in a few weeks.

Updates

In-memory plug-ins

In-memory plug-ins were added mainly for testing and tutorials, but it’s got the potential to re-shape how you deploy plug-ins in your organisation. For example, you could discard physical files altogether, and register all plug-ins at run-time.

Here’s an example of how in-memory plug-ins work.

import pyblish.api as pyblish

# Mock file-system and destination server
_disk = list()
_server = dict()

class SelectInstances(pyblish.api.Selector):
    def process_context(self, context):
        instance = context.create_instance(name="MyInstance")
        instance.set_data("family", "MyFamily")

        SomeData = type("SomeData", (object,), {})
        SomeData.value = "MyValue"

        instance.add(SomeData)

class ValidateInstances(pyblish.api.Validator):
    def process_instance(self, instance):
        assert_equals(instance.data("family"), "MyFamily")

class ExtractInstances(pyblish.api.Extractor):
    def process_instance(self, instance):
        for child in instance:
            _disk.append(child)

class IntegrateInstances(pyblish.api.Integrator):
    def process_instance(self, instance):
        _server["assets"] = list()

        for asset in _disk:
            asset.metadata = "123"
            _server["assets"].append(asset)

# Register all plug-ins
for plugin in (SelectInstances,
               ValidateInstances,
               ExtractInstances,
               IntegrateInstances):
    pyblish.api.register_plugin(plugin)

# Publish
pyblish.util.publish()

# Disk and server has been updated.
assert _disk[0].value == "MyValue"
assert _server["assets"][0].value, "MyValue"
assert _server["assets"][0].metadata == "123"

Conveinence publishing and the command-line interface

Both have seen a major overhaul in terms of logging output.

Initially, publishing via scripting was the primary means of publishing anything, so logging was essential. Nowadays, results are visualised in the GUI and less is required of publishing via scripting.

As a result, the implementation is much smaller and maintenance is simplified. As an added bonus, the order attribute now works via both scripting and the command-line.

pyblish.api.plugins_by_instance

This was added to add symmetry to pyblish.api.instances_by_plugin, and merely runs pyblish.api.plugins_by_family by automatically fetching the family from the given instance. Symmetry is good.

New defaults

For plug-ins that support any host or family, there’s now no need to specify a wildcard.

# This
class MyPlugin(...):
  families = ["*"]
  hosts = ["*"]

# Is identical to this
class MyPlugin(...):
  pass

Dependency Injection

Here’s some topics of discussion I’ve encountered while implementing this.

1. Order of execution

Currently, process_context is always processed regardless of the presence of any instances, and it’s always processed before process_instance in cases where instances are present. With DI, this behaviour is lost.

def process_context(self, context):
  print "I'm processed first, and only once"

def process_instance(self, instance):
  print "I'm processed for every available instance"

With a context of three instances, this yields.

"I'm processed first, and only once"
"I'm processed for every available instance"
"I'm processed for every available instance"
"I'm processed for every available instance"

With DI, it would look like this.

def process(self, context, instance):
  print "I'm processed for every available instance"

Which under the same scenario outputs:

"I'm processed for every available instance"
"I'm processed for every available instance"
"I'm processed for every available instance"

Possible Solution

Since initialisation can sometimes be important, one alternative is to handle it in __init__.

def __init__(self):
  print "I'm processed first, and only once"

def process(self, context, instance):
  print "I'm processed for every available instance"

Which will output identical results to the current results.

"I'm processed first, and only once"
"I'm processed for every available instance"
"I'm processed for every available instance"
"I'm processed for every available instance"

In addition, handling initialisation in __init__ is more Pythonic and familiar to newcomers.

It does mean a minor but significant change in the overall behaviour. It means plug-ins are no longer stateless.

# Current event loop
for Plugin in Plugins:
  for instance in context:
    Plugin().process(instance)

# DI event loop
for Plugin in Plugins:
  plugin = Plugin()
  for instance in context:
    plugin.process(instance)

Not being stateless, from a development point of view, is more flexible and more powerful. But at what cost?

Open questions

Can you think of any other way to solve this? Is ordering important to begin with? Do we need initialisation? What are the pracital benefits? Costs?

2. Services

DI opens up doors for added functionality not before possible.

def process(self, context, instance):
  # the function is given the current Context, and Instance

The above mimics the current behaviour, with slightly less typing and options for excluding both Context and Instance from the function signature where needed.

But it also means the ability to inject custom functionality.

def process(self, instance, time, user):
  print("%s was published @ %s by %s" % instance.data("name"), time(), user)

In which time and user are injected on-demand, providing additional functionality to the plug-in. In this case, a callable function time which returns the current time, and a static value user.

Furthermore, services can be registered by developers.

import pyblish.api
pyblish.api.register_service(
  "time", lambda: datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S.%fZ"))

In the above, a custom service time is registered and made available to plug-ins, providing a pre-formatted version the current time, such that every plug-in uses the same formatting and needn’t concern themselves with maintaining any updates to it.

Services vs. Data

Where does the line go between what is data and what is a service?

If data, added via e.g. Context.set_data(key, value), represents data shared amongst plug-ins, services may represent shared functionality.

Though there is technically nothing preventing you from storing callables as data…

import time
context.set_data("time", lambda: time.time)

Just as there is technically nothing preventing you from providing constants as a service.

pyblish.api.register_service("user", getpass.getuser())

It may make sense from a maintenance point of view to make the data/function separation. This way, data can be kept constant which simplifies archiving and visualisation, like passing the entire thing to a database, whereas functionality can be kept free of constants.

Open questions

Is additional services something we need, or does it add complexity? When a plug-in requests a service that isn’t available, when do we throw an error?

def process(self, not_exist):
  pass
  1. Thrown during discovery
  2. Thrown during processing, e.g. in the GUI
  3. Silently skipped; rely on external tool for checking correctness.
# Checking correctness
$ pyblish check select_something.py
Plug-in is valid.

DI Transition Guide

Here are some notes of what is involved in converting your plug-ins to a dependency injection-style of working.

Note that none of this is fixed and is very debatable at the moment, so if you have any concerns or input, now is a good time.

  1. In cases where you have either process_context or process_instance, a simple search-and-replace to process will work fine.
  2. In cases where you have both, see below.
  3. For process() to be called, it must either ask for context and/or instance. If neither is present, process() will not be called at all. See below.
  4. During the transition phase, the distinction is made internally by looking for the existence of a process_context or process_instance method.
  5. If either exist, the plug-in is deemed “old-style” and is processed using the current implementation.
  6. If both process and either process_context or process_instance is present, old-style wins and process will not be called.

I’ll update the list as more things come to mind. So far, updating the entire Napoleon extension took less than a minute and was a matter of a simple search-and-replace, leaving the behaviour unspoiled.

Both process_context and process_instance

The current behaviour of this is for process_context to be processed first, followed by process_instance. This behaviour isn’t possible any more. You can however process both in the same function.

def process(self, context, process):
  # do things

In case you do have both, process_instance will overwrite process_context due to your plug-in being re-written to a it’s Dependency Injection equivalent at run-time.

def process_context(self, context):
  # I will not be called. :(

def process_instance(self, instance):
  # Runs as usual

Old-first

The reason for looking for old-style methods before new-style is because of the newly introduced ability to use __init__. In cases where __init__ is used, and process not being implemented, the plug-in is still deemed new-style as __init__ is assumed to not have been in use.

Empty process()

If neither context nor instance is present in the signature of process(), nothing happens.

I struggled to provide the ability to implement “anonymous” processes, for things that do something unrelated to either the Context nor Instance, but primarily to aid in the initial learning phase.

For example.

class MyPlugin(...):
  def process(self):
    cmds.file(exportSelected=True)

This could be a user’s first plug-in. From here, he could learn about the benefits of using Context and thereafter Instance, and thereby learning about why they exist in the first place. Baby-step style.

But, I just can’t for the life of me figure out how to do that in a way that makes sense.

For example, in this case.

class ValidateInstanceA(pyblish.Validator):
  families = ["familyA"]
  def process(self, instance):
    # validate the instance

It’s quite clear that when there isn’t an instance suitable to this family, process should not be called.

However.

class ValidateInstanceA(pyblish.Validator):
  families = ["familyA"]
  def process(self, context):
    # validate the context

What about now? The context isn’t dependent on a family, but should always be called regardless. So clearly, process is called, even if no compatible instance is present.

Which brings us to.

class ValidateInstanceA(pyblish.Validator):
  families = ["familyA"]
  def process(self):
    # do something simple

What happens now? Should it be called?

I considered letting it run if the arguments are either empty, or context is present. But that doesn’t work if other arguments are to be injected.

class ValidateInstanceA(pyblish.Validator):
  families = ["familyA"]
  def process(self, time):
    # do something simple with time

Thoughts?

Hey @mkolar, @BigRoy and @tokejepsen, I just updated the post above, would you mind having a look, specifically the last part about whether or not to run an empty process()?

It’s an subtle but important distinction that will be difficult to change once implemented, your input would be very valuable.

I think this is where a difference is between your and my interpretation.

I would consider that the families attribute here is what limits it from being processed. If this would still get processed even if no compatible instance is available that would only make it more confusing.

If someone really wanted to run just a context check no matter what the family then family can just be ["*"].

As you state the context isn’t dependent on the family. That is true, but the plug-in is dependendent on the family, so should not get processed. :wink:


The other confusing bit here is the amount of time things get run. With context you would expect a single run (over the context) whereas with the instance you want each individual one. It’s a clear distinction that might not only be apparent with dependency injection.

1. Keep process_instance and process_context separate.

Maybe this is where we decide that both process_context() and process_instance() have their respective place. (One gets called once, the other per instance) Of course they could still have the benefits of Dependency Injection for other attributes.

2. Drop the behaviour of per instance processing as a built-in method.

The other side might be to drop the behaviour to run something like process_instance(). Instead only have it run once, always. But that might make it harder to implement behaviour per instance (especially since one error will kill it for each single Instance and you make the plug-in developer responsible for error catching.

I think 1 could work best? It’s already proven itself that it works.

Thanks @BigRoy, really got me thinking.

I woke up this morning to another potential solution - which is to make the presence of instance determine whether or not to process once or per-instance. If instance is not requested, it will process once regardless.

  • Process once
class ValidateInstanceA(pyblish.Validator):
  families = ["*"]
  def process(self):
    # validate the world, once
  • Process per-instance
class ValidateInstanceA(pyblish.Validator):
  families = ["*"]
  def process(self, instance):
    # validate each instance

I like the sound of that.

For clarity, let me give some examples.

class ValidateInstanceA(pyblish.Validator):
  families = ["*"]
  def process(self, context):
    # validate context

This would process once per publish, regardless of instances.

class ValidateInstanceA(pyblish.Validator):
  families = ["myFamily"]
  def process(self, context):
    # validate context

Whereas this would process once per process, only if an instance of family "myFamily" is present.

This is rather complex and possibly confusing, but also flexible and how it works currently.

Should we keep this behaviour?

2 Likes

Having implemented the above and run it through the tests, it looks very good.

Currently, every discovered plug-in is processed at least once, with those requesting instance being processed once per available instance and in case there are no compatible instance doesn’t process at all.

instance then acts as a filter, enabling processing of every instance and preventing processing in cases where instances aren’t available. It’s a subtle difference, but I think it is the one that makes most sense.

It also means SimplePlugin now works as-is, without any custom code. It’s been given an order of -1 meaning they will run before anything else, but can of course be given an order explicitly, effectively making them into SVEC in case of having any of their orders set between 0-3.

This isn’t how it will work. It eliminated the use of plug-ins when no instance were present, like stacks that only operate on the context, and SimplePlugin, which doesn’t have any notion of instances.

About this, repair will also see an update to dependency injection, but I’m expecting a deprecation shortly in favour of Actions.

Your current repair_instance will continue to work fine, with the addition of being able to instead implement repair, passing it instance. As with process_*, a simple search-and-replace will suffice.

So a Plug-in with a specific family will always get its process() triggered (even if not one of those families is available as an instance)? In that case I think it should be clarified that family means instance_family and is only a filter for instances.

I would think it’s more convenient to always filter by family (even for ordinary process), except for when family is not filtered (like family = ["*"]). In that case SimplePlugin should still behave as you want since the default is that plug-ins are unfiltered. Maybe to clarify being unfiltered even further the family might be None by default?

Looking forward to a draft Actions implementations, woohoo!

Yeah, that sounds like it would work. I’ll have to double check the logic…

It looks like it does work, all tests pass and your logic is sound.

Considering it’s easier to go from here and back, than it is to go from allowing everything to adding limits, I’ll leave this in for the next release. I also think it makes more sense.

Thanks for spotting this.

Ok, so the logic is essentially this:

  • Asking for instance will limit your plug-in to only process supported instances.
  • Asking for instance when no instances are present, or only instances of unsupported families, the plug-in will never get run. Not even once.
  • All plug-ins process at least once, unless limited to a particular set of families.
class ValidateInEmergency(pyblish.Validator):
  families = ["emergencyFamily"]
  def process(self):
    call_police()

This plug-in will only run if an instance of emergecyFamily is present.

This looks quite clear and predictable to me. :+1:

As a last-minute modification, I’m considering adding arbitrary arguments to create_instance.

# Before
instance = context.create_instance(name="MyInstance")
instance.set_data("family", "myFamily")
# After
instance = context.create_instance(name="MyInstance", family="myFamily")

Where name is a first positional, required argument, and everything after are arbitrary keyword arguments.

def create_instance(name, **kwargs):
  instance = Instance(name)
  for key, value in kwargs.items():
    instance.set_data(key, value)

Allowing us to do things a bit more succinctly and flexibly.

instance = context.create_instance(
    name="MyInstance",
    family="myFamily",
    data1="more",
    data2="even more")

It’s a non-breaking change, but would be difficult to turn back from. Considering 1.1 is a large adjustment already, I’d say we’d either implement it now or at 2.0.

Thoughts?

1 Like

I’d go for it. Look like it would make the code tiny bit cleaner, which is always a good thing

1 Like