Pipeline Development Environment

I started using Anaconda as my main standalone Python distribution quite a while ago, so I have a fair amount of experience with the package management side of conda from an end user’s perspective. Anaconda is just a conda metapackage defining a specific set of package versions that are known to work together, so it’s directly equivalent to the “production” environment described above.

Conda is great for managing not only pure Python packages but also things in other languages that need to be compiled to native code (e.g. Qt). It’s not as mature as linux package managers like apt-get, but it’s pretty good, and I think it would take a lot of effort to achieve equivalent functionality[*]. I don’t have an example off the top of my head, but I don’t know why you couldn’t use it to package binary installers, in which case the build script would probably just run the installer, e.g.:

\\local_server\foundry\vendors\MODO_10.0v1_win.exe /dir="C:\Modo"

Things like that or in-house tools could be hosted in private repositories while open source components like Pyblish go in shared public repos.
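
To sketch what that split might look like from the client side (the channel locations and package names here are hypothetical, just to show the idea):

$ conda install -c file:///z/conda-channel in-house-tool    # private channel on the studio file server
$ conda install -c some-public-channel pyblish-base         # shared public channel, e.g. on anaconda.org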

I haven’t used the multi-environment feature extensively because of a long-standing bug that broke all of the environment variable magic if you were on Windows and tried to use a shell other than cmd.exe. Fortunately, that bug (linked below) has finally been fixed in master. I’m really looking forward to the next release, when it should be possible to use conda environments inside a sane shell on Windows. Once that’s out I’ll likely need to spend some more time with it.
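
For reference, the basic multi-environment workflow looks like this (the environment and package names are just examples):

$ conda create -n test-env python=2.7    # create a new isolated environment
$ activate test-env                      # "source activate test-env" on Linux/OS X
$ conda install pyside                   # installs into this environment only
$ deactivate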

[*] To a certain extent I look at pyblish/pyblish-shell (not to mention all of the previous attempts to package and distribute the Pyblish components) and wonder how much it achieves that couldn’t be done by just defining some dependencies and relying on an external package manager like conda.

So I have explored Conda a bit, and here are my initial thoughts.

The conda framework is really interesting and easy to get into. Within a couple of minutes I had pyblish-base in an isolated environment that can be shared like this:

name: pyblish-base
dependencies:
- git=2.6.4=0
- pip=8.1.1=py27_1
- python=2.7.11=4
- setuptools=20.7.0=py27_0
- vs2008_runtime=9.00.30729.1=0
- wheel=0.29.0=py27_0
- pip:
  - "--editable=git+https://github.com/pyblish/pyblish-base.git#egg=pyblish-base"

It’s a very robust framework where you can easily build more complex environments that include non-pip packages like ffmpeg etc.

The current biggest issue, if you can call it that, is that the cloned Python packages need to be set up to work with pip. I think this is a very small hurdle to get over though.
If anyone wants to give it a try, you can clone the repo and run update.bat; https://github.com/Bumpybox/pipeline-manager
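
To clarify that hurdle: “works with pip” just means the repository has a setup.py at its root. Once it does, the editable line in the yaml above is equivalent to running:

$ pip install --editable git+https://github.com/pyblish/pyblish-base.git#egg=pyblish-base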

Thanks for updating us with your progress. I quickly downloaded the repo to test the functionality and it worked flawlessly. I did notice, though, that the overall installation procedure took quite some time (the download).

It’s a very robust framework where you can easily build more complex environments that include non-pip packages like ffmpeg etc.

How would one, for example, include ffmpeg?

The current biggest issue, if you can call it that, is that the cloned Python packages need to be set up to work with pip. I think this is a very small hurdle to get over though.

Not sure if I understand. It will only work with repositories that work with pip install? It seems like it can also just git pull a repo and add it to the paths? Or am I misunderstanding how Conda works?

Aside from that, say this would pull everything for your pipeline. How would you separate the parts of the pipeline (e.g. PySide) that only work with certain applications (e.g. Maya has its own)? Would that be a totally separate anaconda environment? And would that duplicate all the required files for each environment? Or does it manage it more cleverly and somehow keep only single copies around?

ffmpeg is in the conda-forge community repo, which maintains a recipe for it.
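
So pulling it in should be a one-liner:

$ conda install -c conda-forge ffmpeg

Or the channel and package can go in an environment.yml, under channels: and dependencies: respectively.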

True, you could just use git to pull the repositories, but I wouldn’t know how to get the environment paths working from a single environment file. That’s the main point for me of going with pip installs: you can describe your environment in a single YAML file.
When I dig further into Conda there are probably better ways of doing this, but for the moment it’s pretty cool to get your environment set up easily.

I’m guessing you would create an environment for Maya, but I haven’t delved into that stuff yet.

All packages are linked into the different environments, meaning there is only a single copy on disk shared across multiple environments. Pretty neat :smile:
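
You can see it for yourself: each package is downloaded and extracted once into the pkgs cache, then hard-linked into every environment that uses it (the install location below is hypothetical):

$ conda create -n env-a python=2.7    # downloads and extracts into pkgs/
$ conda create -n env-b python=2.7    # just links the already-extracted copy
$ ls ~/miniconda2/pkgs/               # the single shared copy lives here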

More technically, the editable install creates a .egg-link file in the site-packages of the environment that points to the cloned repository. So you could probably easily emulate that.
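
The .egg-link file is just plain text: the first line is the path to the checkout, and the second line (a lone .) is the relative path to the project within it. Something like this, with a hypothetical path:

pyblish-base.egg-link (in the environment’s site-packages):
C:\path\to\src\pyblish-base
.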

Update on this… Got a full environment working with the QML front-end.

  1. Clone/download; https://github.com/Bumpybox/pipeline
  2. Run update.bat. It will still take a while due to downloading git and python-qt5.
  3. Run prompt.bat.
  4. Execute python -m pyblish_qml --debug.

You should get the debug mode of pyblish-qml.

Wouldn’t it make sense to assume conda is available on a system? Kind of like how the install instructions for anything on PyPI assume you’ve got pip.

Also, I haven’t run it, but does update.bat actually update? Looks like it does a one-off install and that you can’t run it twice. Maybe something like init.bat?

My goal is to have the user do as little as possible when updating/installing. I would even like it to download the Miniconda installer file, but that seems to take an unusually long time.

It updates the repositories as well as its environment files (before updating the environments), so you can run it as many times as you want.
There is definitely some testing to be done around this area though.

But what user are you building this for?

If it’s for a studio, then setting up the foundation is a one-off that you could then use to manage software and versions.

If it’s for a user anywhere on the planet, then Conda seems way overkill. :S

How come conda is overkill?

Because what your script is currently doing doesn’t need to be any more than a few git commands. At this point, the simplicity is wrapped up into something most “users” (assuming non-local) won’t understand, myself included.

Maybe if you post your goals it would make more sense. Who is it for? What are you looking to solve? Why isn’t “just git” enough?

So my goal is to encapsulate an entire pipeline in an easy to install/update package.

The end users would be individuals or studios that would like to use the pipeline, and be able to update it without any coding knowledge.

The problem with using just git is that I quickly run into complexity installing non-pip packages like ffmpeg. I think conda has a solution to that, though maybe not an entirely fully realised one. So it’s more about using an existing solution instead of trying to re-invent one.

During my exploration of conda, I’ve found its environment handling quite interesting. Currently I’m playing around with the idea of having an environment for each situation, like pyblish-qml or pyblish-maya. Since conda environments aren’t Python-specific, you can describe an entire pipeline with one, and have other pipelines described alongside it.
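
As a sketch, that could be one environment file per situation (the file names here are hypothetical):

$ conda env create -f pyblish-qml.yml     # standalone: python, python-qt5, pyblish-qml
$ conda env create -f pyblish-maya.yml    # for Maya: pyblish-maya, no Qt bindings since Maya ships its own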

I think it’s important to realise that this is really deep waters.

You should definitely get familiar with Conda; the developers will likely have encountered many of the issues you would face if you were going to approach this yourself. You can learn from that.

At some point however I think you’ll come to realise that this is an unsolved problem. That there is no universal solution. Those that have tried encapsulating it into a holistic and reusable solution end up with something like Puppet. Or Salt. People build careers with that.

What I’m saying is: rather than trying to delegate an understanding of this problem to an outside product, I would suggest you narrow down your problem, target audience and scope of your solution to the smallest possible and solve that. Because otherwise I think you’ll spend more time learning about something else, like a management system, than about what you are actually trying to solve.

Thanks I’ll keep that in mind. I’m very aware of the scope of the problem, and even more aware of how much time I’m spending on it :smile:

Ok then. :slight_smile:

I’ve now done a couple of takes on using conda with a pipeline, and I think I have the beginning of a structure that can be used by other people.

In the end, all pipelines that want to use conda as their package manager should really be looking at making conda packages, but that is not an easy task and it doesn’t work well with remote git repositories.

The aim of the project would be to make conda easily accessible for people to set up a known environment for their pipeline. Thus this project would be used for isolating the environment and resolving dependencies.

The idea is to have a single entry point, through a batch and a shell script, that installs Miniconda and runs any python scripts from a configuration. This seems to be the simplest and most flexible approach.
Another idea is to be more git-based. Here the configuration would point to a git repository (remote or local), which would be pulled/updated, and a python script from the repository would be executed. The project’s responsibilities would then extend to keeping git repositories updated and executing a python script in a known environment.
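
A minimal sketch of the first idea as a shell script (the installer’s -b/-p flags run it silently; the paths are illustrative):

#!/usr/bin/env bash
# install Miniconda once, if it isn't already there
if [ ! -d "$HOME/miniconda2" ]; then
    wget -c http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh
    bash Miniconda-latest-Linux-x86_64.sh -b -p "$HOME/miniconda2"
fi
export PATH="$HOME/miniconda2/bin:$PATH"
# hand over to the python scripts described by the configuration
python conda_git_deployment/update.py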

I’m very open to ideas or pros/cons, as I’m still toying with the main concept.

Cool! How about typing up a getting-started tutorial, so we can give it a whirl?

Got a working prototype for this; https://github.com/tokejepsen/conda-git-deployment

It’s a bit rough, but it should work for the example and a bit more.

Let me know what you think :slight_smile:

Ok, here we go.

Setup on Ubuntu 16.04

$ apt-get update && apt-get install -y git python nano wget
$ wget -c http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh
$ chmod +x Miniconda-latest-Linux-x86_64.sh
$ ./Miniconda-latest-Linux-x86_64.sh
# ... interactive installation process ...
$ export PATH=/root/miniconda2/bin:$PATH
$ export PYTHONPATH=/root/miniconda2/lib/python2.7/site-packages

Launch

$ git clone https://github.com/tokejepsen/conda-git-deployment.git
$ cd conda-git-deployment
$ cp environment.yml.example environment.yml
$ python conda_git_deployment/update.py
# conda-git-deployment
# Already up-to-date.
# Traceback (most recent call last):
#   File "conda_git_deployment/update.py", line 21, in <module>
#     conf = utils.get_configuration()
#   File "/root/conda-git-deployment/conda_git_deployment/utils.py", line 55, in get_configuration
#     import conda.common.yaml
# ImportError: No module named common.yaml

I know you said you had only looked at Windows so far, but I felt more comfortable experimenting on a machine that wouldn’t affect my personal setup, and Linux was the quickest thing to boot up.

Any idea where it could have gone wrong?