Pyblish-standalone exits prematurely

tokejepsen · July 26, 2016, 3:19pm

We quite often get a frozen pyblish-qml GUI when using pyblish-standalone, and I’m pretty sure it’s to do with exiting prematurely.

Guessing it’s related to this; https://github.com/pyblish/pyblish-qml/issues/205

Full log from start to frozen.

K:\development\tools\python\Python27>python -m pyblish_tray
Launching Pyblish QML..
Finished
Listening for output..
Starting Pyblish..
Spent 141.00 ms creating the application
Listening on 127.0.0.1:9090Launching virtual host..

Entering state: "hidden"
Entering state: "ready"
Entering state: "clean"
 Finding available port..
Entering state: "alive"
Distributing new port 9001
Virtual server listening on127.0.0.1:9001
Finding available port..
Distributing new port 9002
Settings:
  WindowTitle = Pyblish
  WindowSize = [430, 600]
  WindowPosition = [100, 100]
  HeartbeatInterval = 60
Entering state: "visible"
Entering state: "initialising"
Spent 3201.00 ms resetting
Made 27 requests during publish.
Spent 3994.00 ms resetting
Made 33 requests during reset.
Entering state: "ready"
Entering state: "initialising"
Traceback (most recent call last):
  File "K:\development\tools\pyblish\pyblish-qml\pyblish_qml\control.py", line 817, in on_next
    return on_finished(str(result))
  File "K:\development\tools\pyblish\pyblish-qml\pyblish_qml\control.py", line 846, in on_finished
    stats["requestCount"] -= self.host.stats()["totalRequestCount"]
  File "K:\development\tools\python\Python27\lib\xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "K:\development\tools\python\Python27\lib\xmlrpclib.py", line 1591, in __request
    verbose=self.__verbose
  File "K:\development\tools\python\Python27\lib\xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "K:\development\tools\python\Python27\lib\xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "K:\development\tools\python\Python27\lib\xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "K:\development\tools\python\Python27\lib\httplib.py", line 997, in endheaders
    self._send_output(message_body)
  File "K:\development\tools\python\Python27\lib\httplib.py", line 850, in _send_output
    self.send(msg)
  File "K:\development\tools\python\Python27\lib\httplib.py", line 812, in send
    self.connect()
  File "K:\development\tools\python\Python27\lib\httplib.py", line 793, in connect
    self.timeout, self.source_address)
  File "K:\development\tools\python\Python27\lib\socket.py", line 571, in create_connection
    raise err
socket.error: [Errno 10061] No connection could be made because the target machine actively refused it
Finding available port..
Distributing new port 9003
Settings:
  WindowTitle = Pyblish
  WindowSize = [430, 600]
  WindowPosition = [100, 100]
  HeartbeatInterval = 60
Warning: Could not enter ready state
Awaited statemachine for 1021.00 ms
Not ready

marcus · July 26, 2016, 6:05pm

It does look like it might be related to that issue, yes. Could you outline how to reproduce it? Preferably as a snippet of runnable code.

tokejepsen · July 27, 2016, 9:54am

Sorry, I was not able to put together a complete runnable snippet, but these steps reproduce it.

Have pyblish-qml running in a terminal with python -m pyblish_qml.
Register a long running collector like this one;

import time

import pyblish.api


class LongDelay(pyblish.api.ContextPlugin):

    order = pyblish.api.CollectorOrder

    def process(self, context):
        time.sleep(30)

Run pyblish-standalone in a separate terminal with python -m pyblish_standalone.
While the long running plugin LongDelay is processing, shut down the pyblish-standalone terminal, then shut down the GUI window.
Run a new pyblish-standalone in a separate terminal with python -m pyblish_standalone.

This produces the frozen pyblish-qml GUI.

Sorry for my ignorance on this topic, but can you not signal to the server when the pyblish-standalone GUI window closes?

marcus · July 31, 2016, 6:21pm

Ok, so this is a problem, it’s been there since the beginning, and it’s a difficult one to solve.

Here’s why.

10,000 feet view

At the moment, when a plug-in is triggered, it’s like a web client sending an HTTP request to a web server. It sends a request and awaits an answer, but before an answer is completed the server dies.

From the client’s perspective, it cannot know that the server has died. All it knows is that it hasn’t gotten a reply in the expected amount of time.

The trick here is, what is a reasonable amount of time to wait before assuming that the server died? Remember, the server cannot say “I have died”. It can only reply to the request, or say nothing at all.
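To make the waiting problem concrete, here is a minimal sketch using plain sockets rather than the XML-RPC layer pyblish-qml actually uses. The function name request_with_timeout and the 0.5 second timeout are made up for illustration. The "server" accepts the connection at the TCP level but never answers, so the only thing the client can observe is silence until its timeout expires.

```python
import socket


def request_with_timeout(address, payload, timeout):
    """Send `payload` and wait at most `timeout` seconds for a reply.

    Returns the reply bytes, or None if nothing arrived in time.
    Hypothetical helper, not part of pyblish-qml.
    """
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall(payload)
        try:
            return conn.recv(1024)
        except socket.timeout:
            # Silence: the server may be busy or may be dead,
            # the client has no way of telling which.
            return None


# A listener that never calls accept() or sends anything back,
# standing in for a host that died in the middle of a request.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)

reply = request_with_timeout(listener.getsockname(), b"process", timeout=0.5)
listener.close()
```

Whatever value is picked for the timeout is a guess: too short and a slow plug-in looks like a dead server, too long and the GUI appears frozen.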

Solution A

We could do what you said, to send a signal (another request) from server (e.g. Maya, or pyblish-standalone) to the client (pyblish-qml) saying “I’m going to close now, you can stop waiting for whatever you expect me to finish”.

The problem is that we can’t be sure that such a signal would ever be emitted, because things often do not close the way they are supposed to. For example, if Maya is shut down, it could call the corresponding callbacks that may emit this signal. But if it crashes, that won’t happen and we’re back to where we started.

You’ve probably already noticed a similar pattern in a web browser. As in, even when the internet suddenly goes down, it appears as though the page is still loading, albeit slower. That’s because, as a client of the server running somewhere on the internet, it cannot know that the server is no longer there. (As a side-note, Chrome and others do communicate with the OS on some level so as to be notified about breaking network connections, and can from there deduce that a page might not load and tell you about it).
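The weakness of Solution A can be sketched with Python's atexit module. This is an illustration only: notify_gui is a hypothetical stand-in for whatever request the host would send to pyblish-qml, and os._exit is used to simulate a crash. The handler runs on a clean exit but is skipped when the process dies hard.

```python
import subprocess
import sys

# Source of a tiny stand-in "host" process.
CHILD = """
import atexit, os, sys

def notify_gui():
    # Stand-in for a request to the GUI saying "stop waiting for me"
    print("closing")

atexit.register(notify_gui)

if sys.argv[1] == "crash":
    os._exit(1)  # simulated hard crash: exit handlers are skipped
"""


def run_host(mode):
    """Run the stand-in host and return whatever it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


clean_output = run_host("clean")  # handler runs, prints "closing"
crash_output = run_host("crash")  # handler never runs, prints nothing
```

So a shutdown signal helps in the polite case, but the GUI still needs a fallback for the case where no signal ever arrives.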

Solution B

One way to solve it might be to have the server emit heartbeats to the client, saying “I’m here. I’m here. I’m here.” and so on. That way, even when a plug-in runs for many seconds without returning, a separate signal keeps pyblish-qml informed that the server is still there, doing what it’s supposed to.

We could then, separated from processing, expect a certain heartbeat to arrive within a certain amount of time, say ±0.1 seconds. If a heartbeat is late, we can assume the server has died and we can cancel whatever was going on.

If we’re processing when this happens, we’ll need some way of canceling what’s going on. Now, we can’t kill a running thread. Threads don’t work that way. We can kill a process, but not a thread. So on top of the heartbeat approach above, we’ll also need to rearrange our call strategy.
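The bookkeeping side of the heartbeat idea could look roughly like this. It is a sketch under assumptions: the class name HeartbeatMonitor and its methods are invented here, and how the beats travel from host to GUI is left out entirely. The clock is injectable so the example can be stepped through without real waiting.

```python
import time


class HeartbeatMonitor(object):
    """Track when the host last said "I'm here" (hypothetical helper)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout    # seconds of silence we tolerate
        self.clock = clock        # injectable so it can be faked
        self.last_beat = clock()  # treat start-up as the first beat

    def beat(self):
        """Call whenever a heartbeat arrives from the host."""
        self.last_beat = self.clock()

    def is_alive(self):
        """True while the last heartbeat is recent enough."""
        return (self.clock() - self.last_beat) <= self.timeout


# Drive the monitor with a fake clock instead of sleeping.
now = [0.0]
monitor = HeartbeatMonitor(timeout=1.0, clock=lambda: now[0])

now[0] = 0.5
monitor.beat()                            # heartbeat arrives at t=0.5

now[0] = 1.2
alive_after_beat = monitor.is_alive()     # 0.7 s of silence: still alive

now[0] = 3.0
alive_after_silence = monitor.is_alive()  # 2.5 s of silence: assume dead
```

The GUI would poll is_alive() on a timer, independently of any plug-in being processed, and cancel the pending work once it returns False.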

Solution B - Part II

At the moment, requests from pyblish-qml to a host, such as pyblish-standalone, are made via what’s known as “long polling”. It makes one request and waits for a reply, like any synchronous function call that might also take time.

The benefit here is that it’s dead simple and doesn’t require any smarts about it. We just make a request like we would a function, and as soon as the request finishes, it’ll be sent back and handled. Done.

As we can’t do that here, we can instead rely on another mechanism which you may be familiar with.

Make a request to get called back when it’s your turn
Get called back and start the conversation

This is what some banks do when there are 20 people in line and the expected waiting time is 60 minutes+. Rather than you hanging onto the phone without an estimate of how long it’ll take, they’ll call you.

In Python speak, this means making a request but not waiting for it to finish. Instead, we’ll give them our number (in this case our port number) and ask them to start running a plug-in. When the plug-in finishes, we call back with the result dictionary.

Technically, this would require both pyblish-standalone and pyblish-qml to be client and server simultaneously, because a server cannot contact a client; it only works the other way around. Luckily, this already happens. pyblish-qml can tell a host to process, we already know that. And we also know a host can tell the GUI to show.

So the foundation is already there. All it needs is someone to either implement the above, or to think of an alternative solution.
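The call-back arrangement described above could be sketched like this, with both sides running a small XML-RPC server. Note the assumptions: the function names process and on_result, the result dictionary and the 0.2 second "plug-in" are all invented for illustration and are not pyblish-qml's actual RPC interface. The sketch also uses the Python 3 module names (xmlrpc.client, xmlrpc.server) rather than the Python 2.7 xmlrpclib seen in the log.

```python
import threading
import time
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

results = []
done = threading.Event()


def on_result(result):
    """GUI side: the host calls this back when a plug-in has finished."""
    results.append(result)
    done.set()
    return True


def process(plugin, callback_port):
    """Host side: start work in the background and return immediately."""
    def work():
        time.sleep(0.2)  # stand-in for a long-running plug-in
        gui_proxy = ServerProxy("http://127.0.0.1:%d" % callback_port)
        gui_proxy.on_result({"plugin": plugin, "success": True})

    threading.Thread(target=work).start()
    return True  # only means "request accepted", not "finished"


# Both sides are servers; port 0 lets the OS pick free ports.
gui = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
gui.register_function(on_result)
host = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
host.register_function(process)

for server in (gui, host):
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()

gui_port = gui.server_address[1]
host_port = host.server_address[1]

# The GUI hands over its own port ("our number") and does not block
# on the plug-in itself, only on the short "accepted" reply.
accepted = ServerProxy("http://127.0.0.1:%d" % host_port).process(
    "LongDelay", gui_port)
finished = done.wait(5)  # a real GUI would keep running its event loop

for server in (gui, host):
    server.shutdown()
    server.server_close()
```

Because the initial request returns right away, the GUI is free to watch heartbeats or give up on the host while the plug-in is still running, which is what the synchronous call prevents today.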

Any takers?

tokejepsen · August 1, 2016, 2:10pm

Possibly in the near future:)

tokejepsen · August 8, 2016, 7:11am

I’m looking into this issue, this week.

I’m curious whether this problem is exclusive to pyblish-qml? Say I run pyblish-lite as standalone, I assume I would run into the same problems?

marcus · August 8, 2016, 7:26am

It’s due to the client/server nature of pyblish-qml, so this problem may surface anywhere that is involved.

On the other hand, it is only a problem where the host (e.g. pyblish-standalone or Maya) aborts prematurely, so it’s possibly partially solved by implementing a safety net on their part. For example, on exit, there could be a short delay before actually terminating the process.

Maybe open an issue about it to better keep track of it once this forum post is buried under newer forum posts. I expect it to be solvable, but challenging.

mkolar · August 8, 2016, 9:21am

This is happening to us all the time, with all the hosts. It might very well be that people are impatient and are shutting down apps weirdly, and doing other unexpected things.

Since pyblish-lite is out, it’s actually better for us to use that even though it’s missing features, just for its stability.

tokejepsen · August 8, 2016, 9:23am

You are experiencing fewer problems with pyblish-lite? Guessing you aren’t using pyblish-standalone?

mkolar · August 8, 2016, 9:24am

You’re guessing right

marcus · August 8, 2016, 10:58am

How about making pyblish-standalone adhere to the registered GUIs, so that lite is an option? There is an environment variable to be added to this too, possibly PYBLISH_GUI. (not in front of a computer at the moment, but there is an issue about it in base I think).

tokejepsen · August 8, 2016, 11:03am

Doesn’t seem like pyblish-standalone is being used with pyblish-lite anywhere. The sole reason why we are still on pyblish-qml is because we make heavy use of pyblish-standalone for CelAction.

This was also why I asked about pyblish-lite with pyblish-standalone having the same problems as pyblish-qml. I haven’t used pyblish-lite, but would it need a client-server to run with pyblish-standalone?

marcus · August 8, 2016, 11:13am

Is the pyblish_qml.rpc server started within CelAction at all? I’m not familiar enough with how pyblish-standalone works to know whether pyblish-lite could work with it.

If what you need is a completely isolated GUI running from a terminal, lite should work equally fine. But if what you need is CelAction to host the RPC server, then qml is your only option so far.

tokejepsen · August 8, 2016, 11:18am

No, all CelAction does is call a batch script with some arguments. We have pyblish-tray running in the background.

marcus · August 8, 2016, 1:13pm

Then this might work as a drop-in replacement for qml.

$ python -m pyblish_lite

tokejepsen · August 9, 2016, 7:09am

Am I right in thinking that by using solution B part 2, we would make pyblish-qml asynchronous, meaning we could have multiple publishes going at the same time?

marcus · August 9, 2016, 8:36am

Yes, that would be one potential foundation for this in a GUI. Lite would likely have to follow suit in a similar way if needed.

But for completeness, remember the primary factor holding something like that back is not the GUI, but hosts such as Maya. They are unable to do more than one thing at a time.

Having said that, there are still many potential benefits, including doing things unrelated to a host, such as validation and integration, and preparing for when hosts inevitably enable support for multiprocessing. It’s possible Maya is already able in some respects, with its recent addition of multiprocessing features.

marcus · August 9, 2016, 8:42am

meaning we could have multiple publishes going at the same time?

Wait, did you mean running multiple instances of the GUI, or multiple plug-ins at once?

tokejepsen · August 9, 2016, 8:43am

Meant multiple instances of the GUI, or at least not tied to one host while processing.

mkolar · August 9, 2016, 8:54am

Like publishing from 2-3 mayas at the same time. The reason why we mostly use lite now.