Parallel processing

I’ve currently got some validators for geometry in a scene with a lot of objects. Each object comes in as an individual instance, and the whole publishing process takes a very long time.
I was going to look into optimizing the validators to run on all the objects at the same time, but that involves some slight fiddling and hacking.

But what I was wondering about was: why aren’t the plug-ins run in parallel, instead of sequentially?

Are you processing them using Maya commands? Or do you have a Collector that collects your own data structure, on which you run commands that don’t rely on maya.cmds?

If you rely on maya.cmds (or the API!) it’ll be very hard to make them process in parallel, since the commands are run in Maya sequentially. There wouldn’t be much of a speed-up trying to have them run in parallel in a single Maya instance.
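To illustrate the second approach, here’s a minimal sketch of a Collector that queries Maya once and stores plain Python data, so that the validation itself never touches maya.cmds and could in principle run in a worker thread. The family name and the naming convention are made up for the example.

import pyblish.api
from maya import cmds


class CollectMeshNames(pyblish.api.Collector):

    def process(self, context):
        instance = context.create_instance(name='meshes')
        instance.set_data('family', value='model')
        # All maya.cmds access happens here, sequentially.
        instance.set_data('meshNames', value=cmds.ls(type='mesh') or [])


class ValidateNamingConvention(pyblish.api.Validator):

    families = ['model']

    def process(self, instance):
        # Pure Python from here on; no Maya calls involved.
        for name in instance.data('meshNames'):
            assert name.endswith('_GEO'), '%s is badly named' % name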

Theoretically it’s possible to have Pyblish process plug-ins in parallel, but as mentioned that wouldn’t solve the fact that Maya doesn’t run commands in a parallel fashion.

Can you share what kind of Validators you’re using that are currently slowing down the process to a point where it’s a real problem?

Parallel processing would be awesome, but like @BigRoy says, it’s slightly problematic. In most cases, unfortunately. :frowning:

It boils down to the fact that Maya and most other hosts run sequentially, which means they can’t do two things at once. It can be worked around, and that’s the plan.

In a nutshell, rather than running multiple plug-ins in a host, such as Maya, run one host per plug-in.

The remaining issue there is when plug-ins are ordered to run one after the other. Only plug-ins with the same order can ever run in parallel, as there’s a chance a plug-in of a higher order somehow depends on the results of the one before it.

That’s simply the nature of ordering, and there’s not much to do about it technically. It comes down to you ordering with caution and taking into account which plug-ins are meant to run in parallel.
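For example, here’s roughly how ordering would determine what could run in parallel; the class names are made up:

import pyblish.api


class ValidateA(pyblish.api.Validator):
    order = pyblish.api.Validator.order        # same order as B...


class ValidateB(pyblish.api.Validator):
    order = pyblish.api.Validator.order        # ...so A and B could run in parallel


class ValidateLast(pyblish.api.Validator):
    order = pyblish.api.Validator.order + 0.1  # must wait for A and B to finish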

Any experimentation on this front is more than welcome, the sooner we get a solid workflow for it up and running, the faster I or anyone can see it implemented in the core.

For experimentation, you could have a look at running your individual plug-ins in the background, followed by an “aggregator” to await the status of their completion.

It’s difficult to fully explain, but in a nutshell (a rough sketch follows the steps below):

  1. In your validator, Save or Export All the current scene
  2. Launch a new Maya via subprocess.Popen
  3. In the new Maya, open the saved scene and run the function that does the validation; i.e. not the plug-in, but just a plain old Python function.
  4. Since it’s Popen, the validator will not block Pyblish, but run in the background. The next plug-in can then do the same
  5. And the next plug-in too
  6. Until all parallel plug-ins are running via Popen
  7. To keep Pyblish from finishing before Popen finishes, you’ll need an Aggregator. A plug-in that runs after all parallel plug-ins and monitors the started Popen processes for completion. Their exit status will then tell you whether the validation failed or not.
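A minimal sketch of those steps, assuming a hypothetical standalone script validate_scene.py that opens the saved scene, runs the check and exits non-zero on failure; the "currentFile" key follows the usual pyblish-maya convention, everything else is illustrative:

import subprocess

import pyblish.api


class ValidateInBackground(pyblish.api.Validator):

    def process(self, context):
        # Steps 2-4: hand the saved scene to a background mayapy.
        proc = subprocess.Popen(
            ['mayapy', 'validate_scene.py', context.data('currentFile')])

        # The aggregator below needs these handles, so store them.
        processes = context.data('backgroundProcesses') or []
        processes.append(proc)
        context.set_data('backgroundProcesses', processes)


class AggregateValidations(pyblish.api.Validator):

    # A higher order, so the parallel validators all launch first.
    order = pyblish.api.Validator.order + 0.4

    def process(self, context):
        # Step 7: await each background Maya; a non-zero exit
        # status means that validation failed.
        for proc in context.data('backgroundProcesses') or []:
            assert proc.wait() == 0, 'A background validation failed'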

Since it’s a hack, the parallel plug-ins won’t have that little red icon indicating whether they failed; your aggregator will have to be the one to show that. In the future, we could have a look at how to retroactively set the failed state of each plug-in for situations like these.

Have a look at this one for an example of how to run a Maya in the background.

Thanks for the info :)

I’ll see what I can do with running the validators on multiple objects at the same time. If that doesn’t gain me enough, I’ll look into the parallel processing.

I’m just wondering whether I’m experiencing the correct speeds when publishing, so would be great if someone else could test this as well.

import pymel.core
import pyblish.api
import pyblish_maya

# setup scene with 500 locators
for count in range(500):
    pymel.core.spaceLocator()

# collect one instance per locator
class CollectLocators(pyblish.api.Collector):

    def process(self, context):
        for node in pymel.core.ls(type='locator'):
            instance = context.create_instance(name=node.getParent().name())
            instance.set_data('family', value='locator')

# empty validator, to measure pure overhead
class EmptyPlugin(pyblish.api.Validator):

    families = ['locator']

    def process(self, instance):
        return

pyblish.api.register_plugin(CollectLocators)
pyblish.api.register_plugin(EmptyPlugin)

pyblish_maya.show()

This speed is likely because of the ping between the host and the external Pyblish process.

It’s roughly the same speed here, maybe even a tiny tiny bit slower. (I’m running from a network server here, so that could also influence the speed.)

I tested this on a machine that had a render process in the background at 75% CPU (dropped affinity).

Sorry, I don’t get what you’re saying here. Did you try it again after the render process finished, and was it quicker?

Yeah that is something that could be optimized further.

Does the same thing happen with as many plug-ins, but only one instance? It should go much faster.

Have you got an idea of how to generate that many plug-ins without a very long piece of code?

Sure, try something like this.

import pyblish.api
import pyblish.util

class TemplatePlugin(pyblish.api.Validator):
    def process(self, instance):
        print("Doing %s" % type(self).__name__)

# generate 100 uniquely named subclasses and register each
for i in range(100):
    plugin = type("MyPlugin%s" % i, (TemplatePlugin,), {})
    pyblish.api.register_plugin(plugin)

context = pyblish.api.Context()
context.create_instance("MyInstance")

pyblish.util.publish(context)

Seems like it’s similar speeds: https://youtu.be/eyYGP0XEpB0

import pymel.core
import pyblish.api
import pyblish.util
import pyblish_maya

# setup scene with a single locator
pymel.core.spaceLocator()

# collect one instance per locator
class CollectLocators(pyblish.api.Collector):

    def process(self, context):
        for node in pymel.core.ls(type='locator'):
            instance = context.create_instance(name=node.getParent().name())
            instance.set_data('family', value='locator')

# empty validator, to measure pure overhead
class EmptyPlugin(pyblish.api.Validator):

    families = ['locator']

    def process(self, instance):
        return

pyblish.api.register_plugin(CollectLocators)

# generate 500 uniquely named validators from the empty template
for i in range(500):
    plugin = type("MyPlugin%s" % i, (EmptyPlugin,), {})
    pyblish.api.register_plugin(plugin)

context = pyblish.api.Context()
context.create_instance("MyInstance")

pyblish.util.publish(context)

pyblish_maya.show()

Hm, that is odd.

It’s almost certainly related to TCP traffic, but I can’t think of why it would be this slow, as the traffic is local only.

It might be related to permission issues, like a firewall that inspects the packets going in and out on each turn. I’m not sure how you could test that theory…

I’ll keep this in mind for a bit, see if anything pops out the other end. Thanks for running through these tests, very valuable and informative.

@BigRoy did you have similar speeds?

Yup. Both seem to be similar speeds (slow). Also the UI feels really laggy with this number of instances and/or plug-ins.

Beyond testing with this many items, is this a problem?

How do you mean?

I mean, is this a realistic scenario, or is it just for testing? Are you having thousands of instances or plug-ins?

In the example I refer to at the start, we have 359 meshes in the scene that I make instances of.

Although a slightly extreme case, we certainly have hundreds of meshes in scenes very often.

Ah, try making an instance out of a collection of meshes instead.

Think of instances as what will become the resulting file. Are you sure you are looking to have 359 unique files published from a single scene? It might be a workflow issue. The intent is to never really go beyond 10-15 instances, even in the most complex of feature film scenes.
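For example, a rough sketch of collecting all the meshes into a single instance; the "model" family name is made up here:

import pymel.core
import pyblish.api


class CollectModel(pyblish.api.Collector):

    def process(self, context):
        # one instance holding all the meshes,
        # rather than one instance per mesh
        instance = context.create_instance(name='model')
        instance.set_data('family', value='model')
        instance[:] = pymel.core.ls(type='mesh')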