Pyblish-standalone exits prematurely

We quite often get a frozen pyblish-qml GUI when using pyblish-standalone, and I’m pretty sure it has to do with exiting prematurely.

Guessing it’s related to this;

Full log from start to frozen.

K:\development\tools\python\Python27>python -m pyblish_tray
Launching Pyblish QML..
Listening for output..
Starting Pyblish..
Spent 141.00 ms creating the application
Listening on virtual host..

Entering state: "hidden"
Entering state: "ready"
Entering state: "clean"
 Finding available port..
Entering state: "alive"
Distributing new port 9001
Virtual server listening on127.0.0.1:9001
Finding available port..
Distributing new port 9002
  WindowTitle = Pyblish
  WindowSize = [430, 600]
  WindowPosition = [100, 100]
  HeartbeatInterval = 60
Entering state: "visible"
Entering state: "initialising"
Spent 3201.00 ms resetting
Made 27 requests during publish.
Spent 3994.00 ms resetting
Made 33 requests during reset.
Entering state: "ready"
Entering state: "initialising"
Traceback (most recent call last):
  File "K:\development\tools\pyblish\pyblish-qml\pyblish_qml\", line 817, in on_next
    return on_finished(str(result))
  File "K:\development\tools\pyblish\pyblish-qml\pyblish_qml\", line 846, in on_finished
    stats["requestCount"] -=["totalRequestCount"]
  File "K:\development\tools\python\Python27\lib\", line 1233, in __
    return self.__send(self.__name, args)
  File "K:\development\tools\python\Python27\lib\", line 1591, in __
  File "K:\development\tools\python\Python27\lib\", line 1273, in re
    return self.single_request(host, handler, request_body, verbose)
  File "K:\development\tools\python\Python27\lib\", line 1301, in si
    self.send_content(h, request_body)
  File "K:\development\tools\python\Python27\lib\", line 1448, in se
  File "K:\development\tools\python\Python27\lib\", line 997, in endhe
  File "K:\development\tools\python\Python27\lib\", line 850, in _send
  File "K:\development\tools\python\Python27\lib\", line 812, in send
  File "K:\development\tools\python\Python27\lib\", line 793, in conne
    self.timeout, self.source_address)
  File "K:\development\tools\python\Python27\lib\", line 571, in create
    raise err
socket.error: [Errno 10061] No connection could be made because the target machine actively refused it
Finding available port..
Distributing new port 9003
  WindowTitle = Pyblish
  WindowSize = [430, 600]
  WindowPosition = [100, 100]
  HeartbeatInterval = 60
Warning: Could not enter ready state
Awaited statemachine for 1021.00 ms
Not ready

It does look like it might be related to that issue, yes. Could you outline how to reproduce it? Preferably as a snippet of runnable code.

Sorry, I was not able to put together a complete runnable example.

  1. Have pyblish-qml running in a terminal with python -m pyblish_qml.
  2. Register a long-running collector like this one:

import time

import pyblish.api


class LongDelay(pyblish.api.ContextPlugin):

    order = pyblish.api.CollectorOrder

    def process(self, context):
        # Assumed body: block long enough to close the host mid-process.
        # The 10-second duration is an arbitrary choice.
        time.sleep(10)

  3. Run pyblish-standalone in a separate terminal with python -m pyblish_standalone.
  4. While the long-running plug-in LongDelay is still processing, shut down the pyblish-standalone terminal, then shut down the GUI window.
  5. Run a new pyblish-standalone in a separate terminal with python -m pyblish_standalone.

This produces the frozen pyblish-qml GUI.

Sorry for my ignorance on this topic, but can you not signal to the server when the pyblish-standalone GUI window closes?

Ok, so this is a problem, it’s been there since the beginning, and it’s a difficult one to solve.

Here’s why.

10,000 feet view

At the moment, when a plug-in is triggered, it’s like a web client sending an HTTP request to a web server. It sends a request and awaits an answer, but before an answer is completed the server dies.

From the client’s perspective, it cannot know that the server has died. All it knows is that it hasn’t gotten a reply in the expected amount of time.

The trick here is, what is a reasonable amount of time to wait before assuming that the server died? Remember, the server cannot say “I have died”. It can only reply to the request, or say nothing at all.
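To make the waiting problem concrete, here is a minimal sketch, assuming Python 3 and a plain TCP socket rather than the XML-RPC transport pyblish-qml actually uses. The function name `wait_for_reply` and the `process` message are hypothetical, made up for illustration only. It shows that a client facing a silent server can only give up after a timeout it picked itself.

```python
import socket


def wait_for_reply(address, timeout):
    """Send a request and wait up to `timeout` seconds for an answer.

    Returns the reply bytes, or None if nothing arrived in time.
    None does not prove the server died; it may simply be slow.
    """
    with socket.create_connection(address, timeout=timeout) as connection:
        connection.sendall(b"process\n")  # hypothetical request
        try:
            return connection.recv(1024)
        except socket.timeout:
            return None


if __name__ == "__main__":
    # A listener that accepts connections but never answers,
    # standing in for a host that went quiet mid-request.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    print(wait_for_reply(listener.getsockname(), timeout=0.5))
    listener.close()
```

Pick the timeout too short and a slow plug-in looks like a dead server; pick it too long and the GUI appears frozen.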

Solution A

We could do what you said and send a signal (another request) from the server (e.g. Maya, or pyblish-standalone) to the client (pyblish-qml) saying “I’m going to close now, you can stop waiting for whatever you expect me to finish”.

The problem is that we can’t be sure that such a signal would ever be emitted, because things often do not close the way they are supposed to. For example, if Maya is shut down, it could call the corresponding callbacks that may emit this signal. But if it crashes, that won’t happen and we’re back to where we started.

You’ve probably already noticed a similar pattern in a web browser. Even when the internet suddenly goes down, it appears as though the page is still loading, albeit slower. That’s because, as a client of a server running somewhere on the internet, it cannot know that the server is no longer there. (As a side-note, Chrome and others do communicate with the OS on some level so as to be notified about broken network connections, and can from there deduce that a page might not load and tell you about it).
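Solution A could be sketched roughly like this, assuming Python 3 and the standard atexit module. The names `make_goodbye`, `install` and `send` are hypothetical; `send` stands in for whatever call would actually reach pyblish-qml.

```python
import atexit


def make_goodbye(send):
    """Return a callable that tells the GUI, at most once, that we are closing."""
    state = {"sent": False}

    def goodbye():
        if not state["sent"]:
            state["sent"] = True
            send("closing")  # hypothetical message to the GUI

    return goodbye


def install(send):
    """Register the goodbye message to run on normal interpreter exit."""
    goodbye = make_goodbye(send)
    atexit.register(goodbye)
    return goodbye


if __name__ == "__main__":
    install(print)  # prints "closing" when the script exits normally
```

This only covers a clean exit. Handlers registered with atexit are not called when the process is killed by a signal Python does not handle, when the interpreter hits a fatal internal error, or when os._exit() is called, which is exactly the crash case described above.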

Solution B

One way to solve it might be to have the server emit heartbeats to the client, saying “I’m here. I’m here. I’m here.” and so on. That way, even though a plug-in runs for many seconds without returning, there is another signal keeping pyblish-qml notified that the server is still there, doing what it’s supposed to.

We could then, separately from processing, expect each heartbeat to arrive within a certain amount of time, say ±0.1 seconds. If a heartbeat is late, we can assume the server has died and cancel whatever was going on.

If we’re processing when this happens, we’ll need some way of canceling what’s going on. Now, we can’t kill a running thread. Threads don’t work that way. We can kill a process, but not a thread. So on top of the heartbeat approach above, we’ll also need to rearrange our call strategy.
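The heartbeat check itself is small. Below is a minimal sketch, assuming Python 3; the class `HeartbeatMonitor` and its `interval`, `tolerance` and `clock` parameters are hypothetical names, not anything that exists in pyblish-qml. The host would call `beat()` on every heartbeat it sends, and the GUI would poll `is_alive()` from a timer, independently of any plug-in being processed.

```python
import time


class HeartbeatMonitor(object):
    """Remember the latest heartbeat and judge whether the host seems alive."""

    def __init__(self, interval, tolerance, clock=time.monotonic):
        self.interval = interval    # expected seconds between heartbeats
        self.tolerance = tolerance  # how late a heartbeat may be
        self.clock = clock          # injectable, which makes testing easy
        self.last_beat = clock()

    def beat(self):
        """Record that a heartbeat just arrived."""
        self.last_beat = self.clock()

    def is_alive(self):
        """True while the latest heartbeat is not overdue."""
        elapsed = self.clock() - self.last_beat
        return elapsed <= self.interval + self.tolerance


if __name__ == "__main__":
    monitor = HeartbeatMonitor(interval=0.2, tolerance=0.1)
    print(monitor.is_alive())  # a heartbeat was just recorded
    time.sleep(0.5)            # the host goes silent
    print(monitor.is_alive())  # overdue, so assume the host died
```

Note that this only detects the silence. Acting on it still needs the change in call strategy, since a blocked request cannot be interrupted from the outside.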

Solution B - Part II

At the moment, requests from pyblish-qml to a host, such as pyblish-standalone, are made via what’s known as “long polling”. It makes one request and waits for a reply, like any synchronous function call that might take time.

The benefit here is that it’s dead simple and doesn’t require any smarts about it. We just make a request like we would a function, and as soon as the request finishes, it’ll be sent back and handled. Done.

As we can’t do that here, we can instead rely on another mechanism which you may be familiar with.

  1. Make a request to get called back when it’s your turn
  2. Get called back and start the conversation

This is what some banks do when there are 20 people in line and the expected waiting time is 60 minutes+. Rather than you hanging onto the phone without an estimate of how long it’ll take, they’ll call you.

In Python speak, this means making a request but not waiting for it to finish. Instead, we’ll give them our number (in this case our port number) and ask them to start running a plug-in. When the plug-in finishes, we call back with the result dictionary.

Technically, this would require both pyblish-standalone and pyblish-qml to be client and server simultaneously, because a server cannot contact a client; it only works the other way around. Luckily, this already happens. pyblish-qml can tell a host to process, we already know that. And we also know a host can tell the GUI to show.

So the foundation is already there. All it needs is someone to either implement the above, or to think of an alternative solution.
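The call-back arrangement described above can be sketched as follows, assuming Python 3 and the standard library xmlrpc modules (the Python 2.7 install in the log would use xmlrpclib and SimpleXMLRPCServer instead). The function names `serve`, `start_host`, `start_gui`, `process` and `on_finished` here are hypothetical and do not reflect the actual pyblish-qml or pyblish-standalone API.

```python
import queue
import threading
import time
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def serve(functions):
    """Start an XML-RPC server on a free local port; return (server, port)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    for function in functions:
        server.register_function(function)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def start_host():
    """Hypothetical host side, e.g. pyblish-standalone."""

    def run_plugin(name, callback_port):
        time.sleep(0.2)  # stand-in for a long-running plug-in
        result = {"plugin": name, "success": True}
        # Call the GUI back on the port it gave us.
        gui = ServerProxy("http://127.0.0.1:%d" % callback_port)
        gui.on_finished(result)

    def process(name, callback_port):
        # Return immediately; the real work happens in the background.
        worker = threading.Thread(target=run_plugin, args=(name, callback_port))
        worker.daemon = True
        worker.start()
        return "accepted"

    return serve([process])


def start_gui(results):
    """Hypothetical GUI side, e.g. pyblish-qml."""

    def on_finished(result):
        results.put(result)
        return True

    return serve([on_finished])


if __name__ == "__main__":
    results = queue.Queue()
    _, gui_port = start_gui(results)
    _, host_port = start_host()
    host = ServerProxy("http://127.0.0.1:%d" % host_port)
    print(host.process("LongDelay", gui_port))  # returns right away
    print(results.get(timeout=5))               # arrives via the call-back
```

Because the GUI is never blocked inside a request, it stays free to watch heartbeats and to give up on a result that will never arrive.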

Any takers? :slight_smile:

Possibly in the near future:)

I’m looking into this issue this week.

I’m curious whether this problem is exclusive to pyblish-qml? Say I run pyblish-lite as standalone, I assume I would run into the same problems?

It’s due to the client/server nature of pyblish-qml, so anywhere that is involved this problem may surface.

On the other hand, it is only a problem where the host (e.g. pyblish-standalone or Maya) aborts prematurely, so it’s possibly partially solved by implementing a safety net on the host’s side. For example, on exit, there could be a short delay before actually terminating the process.

Maybe open an issue about it to better keep track of it once this forum post is buried under newer forum posts. I expect it to be solvable, but challenging.

This is happening to us all the time, with all the hosts. It might very well be that people are impatient and are shutting down apps in odd ways, or doing other unexpected things.

Since pyblish-lite is out, it’s actually better for us to use that even though it’s missing features, just for its stability.

You are experiencing fewer problems with pyblish-lite? Guessing you aren’t using pyblish-standalone?

You’re guessing right :slight_smile:

How about making pyblish-standalone adhere to the registered GUIs, so that lite is an option? There is an environment variable to be added to this too, possibly PYBLISH_GUI. (not in front of a computer at the moment, but there is an issue about it in base I think).

Doesn’t seem like pyblish-standalone is being used with pyblish-lite anywhere. The sole reason why we are still on pyblish-qml is because we make heavy use of pyblish-standalone for CelAction.

This was also why I asked about pyblish-lite with pyblish-standalone having the same problems as pyblish-qml. I haven’t used pyblish-lite, but would it need a client-server setup to run with pyblish-standalone?

Is the pyblish_qml.rpc server started within CelAction at all? I’m not familiar enough with how pyblish-standalone works to know whether pyblish-lite could work with it.

If what you need is a completely isolated GUI running from a terminal, lite should work equally fine. But if what you need is for CelAction to host the RPC server, then qml is your only option so far.

No, all CelAction does is call a batch script with some arguments. We have pyblish-tray running in the background.

Then this might work as a drop-in replacement for qml.

$ python -m pyblish_lite

Am I right in thinking that by using Solution B Part II, we would make pyblish-qml asynchronous, meaning we could have multiple publishes going at the same time?

Yes, that would be one potential foundation for this in a GUI. Lite would likely have to follow suit in a similar way if needed.

But for completeness, remember the primary factor holding something like that back is not the GUI, but hosts such as Maya. They are unable to do more than one thing at a time.

Having said that, there are still many potential benefits, including doing things unrelated to a host, such as validation and integration, and preparing for when hosts inevitably enable support for multiprocessing. It’s possible Maya is already able to in some respects, with its recent addition of multiprocessing features.

meaning we could have multiple publishes going at the same time?

Wait, did you mean running multiple instances of the GUI, or multiple plug-ins at once?

Meant multiple instances of the GUI, or at least not tied to one host while processing.

Like publishing from 2-3 Mayas at the same time. That’s the reason why we mostly use lite now.