So I had some further thoughts about encapsulating a pipeline, and have been looking at Docker.
I still haven’t done any practical explorations into this, but this is how I’d imagined it working.
You would make sure that all code operations are executed on the local machine and outside of the Docker image, especially I/O operations, so you don’t have to deal with mounting volumes. Basically the Docker image would act as a server telling the local machine what to do.
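As a purely hypothetical sketch of what I mean, the host side could be as simple as a small loop that asks the container for its next instruction and executes it locally. The endpoint and task format below are made up:

```python
import json
import subprocess

try:  # Python 2/3 compatibility
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen

# Made-up endpoint served from inside the Docker container, e.g. a tiny web
# app published with "docker run -p 5000:5000 ...". The container never
# touches the local filesystem; it only hands out instructions.
SERVER = "http://localhost:5000/next-task"


def run_next_task():
    """Ask the container what to do next and execute it on the host."""
    task = json.loads(urlopen(SERVER).read().decode("utf-8"))
    if not task:
        return False
    # e.g. {"args": ["ffmpeg", "-i", "input.mov", "output.mp4"]}
    subprocess.check_call(task["args"])
    return True


if __name__ == "__main__":
    while run_next_task():
        pass
```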
@marcus, did you look at Docker at some point for Pyblish?
I’m using Docker for a lot of things and have been considering how we could leverage its potential with Pyblish, but not in this area.
Not sure how far you’ve gotten into your exploration, but each Docker image is like a VMware Workstation or VirtualBox image; a fully encapsulated operating system where you install things like you would in any operating system, completely independent of the host.
In some ways, it’s the opposite of integration. That’s more or less the point. To encapsulate, not integrate.
Maybe if you broke it down into the steps you’d imagine this integration to work, or a user story of sorts? Because I’m not sure I’m seeing what you’re seeing here.
I started a repo for trying to encapsulate the pipeline by pulling repositories referenced in a JSON file.
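In rough terms it does something like this (a simplified sketch, not the actual code; the file name and keys are just what I’m using for illustration):

```python
import json
import os
import subprocess

# repositories.json might look something like:
# {
#     "repositories": [
#         {"url": "https://github.com/pyblish/pyblish-base.git"},
#         {"url": "https://github.com/pyblish/pyblish-maya.git"}
#     ]
# }


def update(root, config="repositories.json"):
    """Clone any missing repositories listed in the JSON file, pull the rest."""
    with open(os.path.join(root, config)) as f:
        data = json.load(f)

    for repo in data["repositories"]:
        name = repo["url"].rsplit("/", 1)[-1].replace(".git", "")
        path = os.path.join(root, "repos", name)
        if not os.path.exists(path):
            subprocess.check_call(["git", "clone", repo["url"], path])
        else:
            subprocess.check_call(["git", "pull"], cwd=path)


if __name__ == "__main__":
    update(os.path.dirname(os.path.abspath(__file__)))
```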
Interesting start. I wonder when it really becomes a bottleneck, or when the extra “JSON reference” layer really starts shining.
Any feedback is welcome.
Based on your comment I’m assuming you’ll already be adding git to the vendors, but I was going to mention that.
I think the repositories.json should hold a specific version number or “range” that it requires of a git repository for the pipeline to function. I imagine this repo to be a lightweight, version-controlled “pipeline” that gets tagged with a specific version to switch to. Basically allowing you to “go back in time to that specific state for that particular show you had back then”. So it would need to know what particular tag/commit should be pulled.
Similarly, updating the pipeline (git pull + update) wouldn’t necessarily pull the latest of all external git repos, since that could potentially break the pipeline (e.g. when external tools are updated in a way that is backwards-incompatible with your tools).
How are you thinking to serve multiple OS platforms?
Should the pipeline also come with its own launchers and environment variables it should set? e.g. does your entire pipeline work if you only have the git repositories loaded? Or are there additional things that influence it?
I wonder when it really becomes a bottleneck, or when the extra “JSON reference” layer really starts shining.
True, I guess you could just as easily include all the repos you need for the pipeline, though git submodules aren’t very nice to work with when you want to develop.
I think the repositories.json should hold a specific version number or “range” that it requires of a git repository for the pipeline to function. I imagine this repo to be a lightweight, version-controlled “pipeline” that gets tagged with a specific version to switch to. Basically allowing you to “go back in time to that specific state for that particular show you had back then”. So it would need to know what particular tag/commit should be pulled.
Yeah, my thoughts are to have an optional commit in the JSON file, where you can specify which commit to pull to. On that note, I’m thinking I could add a snapshot method to update the JSON file with whatever commits the repos are currently at.
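Something along these lines (a sketch, assuming the same hypothetical repositories.json layout as the earlier example):

```python
import json
import os
import subprocess


def snapshot(root, config="repositories.json"):
    """Record the current commit of every cloned repo back into the JSON file."""
    path = os.path.join(root, config)
    with open(path) as f:
        data = json.load(f)

    for repo in data["repositories"]:
        name = repo["url"].rsplit("/", 1)[-1].replace(".git", "")
        repo_path = os.path.join(root, "repos", name)
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], cwd=repo_path).strip()
        repo["commit"] = commit.decode("utf-8") if isinstance(commit, bytes) else commit

    with open(path, "w") as f:
        json.dump(data, f, indent=4)
```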
How are you thinking to serve multiple OS platforms?
I’m thinking of having a shell script for other OSes, with accompanying executables as vendors.
Should the pipeline also come with its own launchers and environment variables it should set? e.g. does your entire pipeline work if you only have the git repositories loaded? Or are there additional things that influence it?
You could definitely do that, or have a separate repo for this.
Actually, I’m currently toying with making the repo more usable for other pipelines as well. The repo could be the “pipeline-manager” that just updates repositories. If it finds a packages.json in other repos, it’ll pull those repos. This way you could point the “pipeline-manager” at any pipeline repo that has its own vendors and dependent repos.
I would make it a tag, with an optional, overriding commit for brute-force versioning.
Tags are made whenever a release is made, so a tag for pyblish-base for example could be 1.4.0. It’d make it clearer in the JSON what’s going on, but also encourage stable releases of projects. Sometimes though you might still need a particular commit.
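As a rough sketch of how the checkout could resolve that, with the commit (when present) winning over the tag (the keys here are hypothetical):

```python
import subprocess


def checkout(repo_path, entry):
    """Check out the pinned commit if one is given, otherwise the release tag."""
    # e.g. entry = {"url": "...", "tag": "1.4.0", "commit": "abc1234"}
    ref = entry.get("commit") or entry.get("tag") or "master"
    subprocess.check_call(["git", "checkout", ref], cwd=repo_path)
```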
I started using Anaconda as my main standalone Python distribution quite a while ago, so I have a fair amount of experience with the package management side of conda from an end user’s perspective. Anaconda is just a conda metapackage defining a specific set of package versions that are known to work together, so it’s directly equivalent to the “production” environment described above.
Conda is great for managing not only pure Python packages but also things in other languages that need to be compiled to native code (e.g. Qt). It’s not as mature as Linux package managers like apt-get, but it’s pretty good, and I think it would take a lot of effort to achieve equivalent functionality[*]. I don’t have an example off the top of my head, but I don’t know why you couldn’t use it to package binary installers, in which case the build script would probably just run the installer.
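For example, sketching the idea in Python (a real conda recipe would normally use a build.sh/bld.bat for this; the installer name and flags below are made up):

```python
import os
import subprocess

# Hypothetical: silently run a vendor installer into the environment prefix.
# conda-build exposes the target environment through the PREFIX variable.
prefix = os.environ["PREFIX"]
subprocess.check_call([
    "some-tool-installer.exe",   # made-up installer name
    "/S",                        # made-up "silent install" flag
    "/D=" + prefix,              # made-up "install directory" flag
])
```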
Things like that or in-house tools could be hosted in private repositories while open source components like Pyblish go in shared public repos.
I haven’t used the multi-environment feature extensively because of a long-standing bug that broke all of the environment-variable magic if you were on Windows and tried to use a shell other than cmd.exe. Fortunately, that bug (linked below) has finally been fixed in master. I’m really looking forward to the next release, when it should be possible to use conda environments inside a sane shell on Windows. Once that’s out I’ll likely need to spend some more time with it.
[*] To a certain extent I look at pyblish/pyblish-shell (not to mention all of the previous attempts to package and distribute the Pyblish components) and wonder how much it achieves that couldn’t be done by just defining some dependencies and relying on an external package manager like conda.
So I have explored Conda a bit, and here are my initial thoughts.
The conda framework is really interesting and easy to get into. Within a couple of minutes I had pyblish-base in an isolated environment, which can be shared as a single environment file.
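Roughly, the workflow is something like this (a sketch from memory, wrapped in Python subprocess calls; the environment name is just an example):

```python
import subprocess

# Create an isolated environment (the name "pyblish" is just an example)
subprocess.check_call(["conda", "create", "--yes", "--name", "pyblish", "python"])

# pyblish-base itself goes in with pip inside that environment,
# e.g. "pip install -e path/to/pyblish-base" after activating it.

# Export the environment to a file that can be committed and shared
with open("environment.yml", "w") as f:
    subprocess.check_call(["conda", "env", "export", "--name", "pyblish"], stdout=f)

# Anyone else can recreate the same environment from that file
subprocess.check_call(["conda", "env", "create", "--file", "environment.yml"])
```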
It’s a very robust framework where you can easily build more complex environments that include non-pip packages like ffmpeg, etc.
The current biggest issue, if you can call it that, is that the cloned Python packages need to be set up to work with pip. I think this is a very small hurdle to get over, though.
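In practice that mostly just means giving each cloned repo a minimal setup.py, something along these lines (illustrative only; the package name is made up):

```python
# setup.py -- minimal example of making a cloned repo pip-installable,
# so it can be listed under the "pip" section of a conda environment file
# or installed with "pip install -e ." for development.
from setuptools import setup, find_packages

setup(
    name="my-studio-tools",  # hypothetical package name
    version="0.1.0",
    packages=find_packages(),
)
```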
If anyone wants to give it a try, you can clone and run update.bat: https://github.com/Bumpybox/pipeline-manager
Thanks for updating us with your progress. I quickly downloaded the repo to test the functionality and it worked flawlessly. I did feel, though, that the overall installation procedure took quite some time (the download).
It’s a very robust framework where you can easily build more complex environments that include non-pip packages like ffmpeg, etc.
How would one, for example, include ffmpeg?
The current biggest issue, if you can call it that, is that the cloned Python packages need to be set up to work with pip. I think this is a very small hurdle to get over, though.
Not sure if I understand. It will only work with repositories that work with pip install? It seems like it can also just git pull a repo and add it to the paths? Or am I misunderstanding how Conda works?
Aside from that, say this would pull everything for your pipeline. How would you separate the parts of the pipeline (e.g. PySide) that only work with certain applications (e.g. Maya has its own)? Would that be a totally separate Anaconda environment? And would that duplicate all the required files for each environment? Or does it manage it more cleverly and somehow keep only single copies around?
True, you could just use git to pull the repositories, but I wouldn’t know how to get the environment paths working from a single environment file. That’s the main point for me with going for pip installs: you can describe your environment in a single YAML file.
When I dig further into Conda there are probably better ways of doing this, but it’s pretty cool at the moment to be able to get your environment set up so easily.
I’m guessing you would create an environment for Maya, but I haven’t delved into that stuff yet.
All packages are linked into the different environments, meaning there is only a single copy shared across multiple environments. Pretty neat!
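You can actually see this on Linux/macOS: where the filesystem allows it, conda hard-links files from its central pkgs cache into each environment, so the “same” file in two environments is a single copy on disk. A rough check (the paths are hypothetical):

```python
import os

# The "same" file in two different conda environments (hypothetical paths)...
a = os.stat(os.path.expanduser(
    "~/miniconda/envs/envA/lib/python2.7/site-packages/example/__init__.py"))
b = os.stat(os.path.expanduser(
    "~/miniconda/envs/envB/lib/python2.7/site-packages/example/__init__.py"))

# ...shares a single copy on disk when conda has hard-linked it from its cache.
print(a.st_dev == b.st_dev and a.st_ino == b.st_ino)
```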
More technically, Conda creates a .egg-link in the site-packages of the environment, which points to the cloned repository. So you could probably easily emulate that.
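If you wanted to emulate it without pip, a .pth file dropped into the environment’s site-packages does roughly the same job. Something like this sketch (the clone paths are made up):

```python
import os
from distutils.sysconfig import get_python_lib

# Directories containing the cloned repositories (made-up paths)
clones = [
    r"C:\pipeline\repos\pyblish-base",
    r"C:\pipeline\repos\pyblish-lite",
]

# Any *.pth file in site-packages gets its lines appended to sys.path when the
# interpreter starts, which is roughly what the .egg-link/easy-install.pth
# combination achieves for "develop" installs.
pth = os.path.join(get_python_lib(), "pipeline_repos.pth")
with open(pth, "w") as f:
    f.write("\n".join(clones) + "\n")
```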
Wouldn’t it make sense to assume conda is available on a system? Kind of like how the install instructions for anything on PyPI assume you’ve got pip.
Also, I haven’t run it, but does update.bat actually update? Looks like it does a one-off install and that you can’t run it twice. Maybe something like init.bat?
My goal is to have the user do as little as possible when updating/installing. I would even like to download the Miniconda install file, but that seems to take an unusually long time.
It updates the repositories as well as its environment files (before updating the environments), so you can run it as many times as you want.
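The flow boils down to something like this (a sketch of the idea, not the actual update.bat):

```python
import os
import subprocess


def update_all(root, repos):
    """Pull what exists, clone what doesn't, then bring the conda environment
    in line with the environment file. Safe to run as many times as you like.

    ``repos`` is a list of (name, git_url) pairs, e.g. read from the JSON file.
    """
    for name, url in repos:
        path = os.path.join(root, "repos", name)
        if os.path.exists(path):
            subprocess.check_call(["git", "pull"], cwd=path)
        else:
            subprocess.check_call(["git", "clone", url, path])

    # Bring the conda environment in line with the (possibly freshly pulled) yml
    subprocess.check_call(["conda", "env", "update",
                           "--file", os.path.join(root, "environment.yml")])
```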
There is definitely some testing to be done around this area, though.