Folder Structure Resolver


#1

Continuing the discussion from Pyblish Magenta:

I think it’s better if we continue this discussion in its own thread. It’s potentially a big feature that could be a great selling point for people considering starting with Pyblish.

From my point of view, having a built-in ability to resolve paths based on simple templates would push the usability of Pyblish through the roof. I’m currently using lucidity to deal with paths in our studio. It works well and is simple enough; however, the templates are Python files with a somewhat cumbersome structure, hence not very appealing to studios without a developer, I think.


#2

Lucidity looks good to me.

Not sure it falls within the scope of Pyblish itself, but we could certainly provide extensions for it.


#3

The way I see it, you’re already talking about something similar here: Pyblish Magenta

The point is that making a good folder structure package for Pyblish, based on easy-to-use templates and usable by other packages, would increase Pyblish’s appeal tenfold. The question is how to implement it so it’s usable across the board and simple to customize.

The ideal scenario in my head is like this:

A user has a bunch of plugins and packages that all need to create and validate paths. Instead of dealing with this on a per-plugin basis, I’d like to just take raw data from a plugin (a dictionary, say) and pass it to a function that returns a formatted path based on a YAML template.

Now I’m doing this using lucidity right now (minus yaml), but I’d much rather just do:

import pyblish.api
import pyblish.utils

data = {'project': 'pr01', 'shot': 'sh001', 'task': 'animation', 'version': '1'}

path = pyblish.utils.pathFromData('templateName', data)

and get something like

>>> /root/pr01/sh001/animation/pr01_sh001_animation_v001.ma
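For illustration, such a helper could be little more than a lookup plus `str.format`. This is a hypothetical sketch: `pyblish.utils.pathFromData` does not exist, and the `TEMPLATES` dict below stands in for a parsed YAML template file.

```python
# Hypothetical sketch only; pyblish.utils.pathFromData is not a real API,
# and this dict of format strings stands in for a parsed YAML file.
TEMPLATES = {
    'templateName': (
        '/root/{project}/{shot}/{task}/'
        '{project}_{shot}_{task}_v{version:0>3}.ma'
    ),
}

def path_from_data(template_name, data):
    """Return a formatted path for `data` based on a named template."""
    return TEMPLATES[template_name].format(**data)

data = {'project': 'pr01', 'shot': 'sh001', 'task': 'animation', 'version': '1'}
print(path_from_data('templateName', data))
# /root/pr01/sh001/animation/pr01_sh001_animation_v001.ma
```

The `{version:0>3}` format spec handles the zero-padding, so the caller can pass the raw `'1'` straight from the plugin data.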

Is it doable now? Absolutely.
Would it be great if it were built in? Yes.
Is it within the scope of Pyblish? I think so.

There are many other useful convenience functions I can think of bundling with Pyblish that would make people’s lives much easier. versionUp, versionDown, getVersion and setVersion come to mind, for instance.
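To illustrate what such helpers might look like, here is a hedged sketch assuming a `v###` naming convention; none of these functions exist in Pyblish today.

```python
import re

# Hedged sketch of the suggested version helpers; none of these exist in
# Pyblish, and the "v###" convention in paths is an assumption.
_VERSION = re.compile(r'v(\d+)')

def get_version(path):
    """Return the integer version embedded in `path`, or None."""
    match = _VERSION.search(path)
    return int(match.group(1)) if match else None

def set_version(path, version):
    """Return `path` with its first version token replaced, keeping padding."""
    def _replace(match):
        padding = len(match.group(1))
        return 'v{0:0{1}d}'.format(version, padding)
    return _VERSION.sub(_replace, path, count=1)

def version_up(path):
    return set_version(path, get_version(path) + 1)

print(version_up('/root/pr01/sh001/animation/pr01_sh001_animation_v001.ma'))
# /root/pr01/sh001/animation/pr01_sh001_animation_v002.ma
```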


#4

I was thinking Lucidity could be bundled with Magenta. It looks to be an implementation of just what @BigRoy is looking for, which is great.

I can’t deny the usability of your suggestion, and I do think we should consider providing directory management utilities to users in some shape or form, but I’m worried that this approach in particular might be making too many assumptions. Primarily because templates are just one way of associating metadata with content, and there are many others. Making it a core component of Pyblish is taking a stand on which way is right and which is wrong, and frankly I don’t think there is such a thing.

What’s wrong with doing this?

import lucidity
data = {'project': 'pr01', 'shot': 'sh001', 'task': 'animation', 'version': '1'}
path = lucidity.pathFromData('templateName', data)

And if the interface is not “Pyblishy” enough, then how about this?

import pyblish_lucidity
data = {'project': 'pr01', 'shot': 'sh001', 'task': 'animation', 'version': '1'}
path = pyblish_lucidity.pathFromData('templateName', data)

In which case the interface may be wrapped into something that could implicitly interact with the Context and Instances somehow, like setting and fetching data.


#5

Nothing at all. It’s exactly what I’m doing. The only thing is that I’m not a fan of how lucidity handles templates. It seemed to me, from the discussion in the Magenta thread, that you guys wanted to build something custom to deal with templated paths.

I’m not sure I agree with this. It’s not taking a stand; it’s supplying functionality that can be used for many more things than just paths and folder structures. Anyway, bundling lucidity would certainly be a step in the right direction in making these bigger packages easily usable for newcomers.


#6

If that’s your current point of view, then I don’t think bundling lucidity solves much? At the same time, I do agree with @marcus that bundling it directly with Pyblish gives the wrong impression. There are many more ways to tackle this beast, and looking at most tools, they’re based not on paths but on databases of sorts.

I would assume Pyblish’s core to be the base component for processing SVEC plug-ins. On the other hand, if we can develop a very nice solution for matching path templates with nice features, it would be great to recommend it for smaller studios. That could be in the form of links in the wiki, a repository like Magenta, and some accompanying tutorials for getting a nice link between Pyblish SVEC plug-ins and our path pattern matching.

What’s wrong with Lucidity?

I’ve just taken a quick glimpse at Lucidity since I didn’t know of its existence. Just looking quickly over the API and its functionality there’s a lot to like, but also some things that seem missing (at first glance). But since you have experience with it I’m especially wondering what exactly you’re not a fan of.

Do you only need a YAML parser for Lucidity, or is it lacking more?

I’d be happy to contribute the pattern matching as a standalone package and develop it against some well-defined goals, especially since I need to separate it from the Pyblish Magenta repository anyway. That way the path matching code will remain simpler by itself. More focused?

What would you like to see/have?


#7

Ah, yes, you are right. In fact, I think @BigRoy has already developed it. But it is strictly for the Magenta project, which is defining an entire pipeline, more or less. It’s a safe place to make those kinds of assumptions, as they go hand in hand with how the plug-ins work. Other projects may then appear which define similar or completely different methods of handling paths, and there are many on the way. (!)

Developing an alternative to Lucidity would certainly be an interesting project, but as I said, it’s too circumstantial to fit into the core library.


#8

I’m with you now. Agreed, keeping it separate will certainly be a better way forward.

To be honest, the main things I’m not liking are:

  • Formatting of the templates: using YAML for templates would be much nicer than Python scripts.
  • Optional entries in templates, e.g. {entry1}[.{entry2}].exr (if entry2 is missing from the data, the path evaluates without it).
  • Better template matching when the template name is not provided (currently it only picks the first match).
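The optional-entry idea in the second point could be sketched like this. The square-bracket syntax is a proposal, not something lucidity supports today:

```python
import re

# Sketch of the optional-entry proposal: sections in square brackets are
# dropped when any of their placeholders is missing from the data.
_OPTIONAL = re.compile(r'\[([^\[\]]*)\]')
_PLACEHOLDER = re.compile(r'{(\w+)}')

def format_optional(template, data):
    """Format `template`, omitting optional sections whose keys are absent."""
    def _resolve(match):
        section = match.group(1)
        keys = _PLACEHOLDER.findall(section)
        return section if all(key in data for key in keys) else ''
    return _OPTIONAL.sub(_resolve, template).format(**data)

template = '{entry1}[.{entry2}].exr'
print(format_optional(template, {'entry1': 'beauty', 'entry2': 'left'}))
# beauty.left.exr
print(format_optional(template, {'entry1': 'beauty'}))
# beauty.exr
```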

There were a few more things that I can’t think of now; I’ll have to look through my code for the workarounds I’ve done to remember. Essentially, though, it’s all about the ease of writing templates.

I was thinking of forking lucidity for quite some time now and looking at fixing these, but due to lack of time I simply couldn’t. The developer won’t be adding much to it any time soon, I’m afraid, as he is one of the ftrack devs now and they are swamped with other things to do.


#9

Great, that’s some awesome starting points.

Would you recommend forking Lucidity or getting something running more separately? I’m not sure how heavily Lucidity is already being used (and/or maintained?). I can have a look at forking it and managing extra features, though I’d have to get into how it works on the inside before being productive. :wink:

Goals

  1. YAML parsing of templates
  2. Optional entries in a template
  3. Better template matching if template name is not provided. Returning multiple matches.
  4. Allowing listing/searching for directories that match a template.

#10

I’ve just had a look through the lucidity code and it seems to be a very compact piece of code doing all the work; all in all, around 350 lines of quite readable code.

I’m generally a fan of forking things rather than starting from scratch if it’s mostly working apart from certain things. How involved it would be to implement these, I’m honestly not sure. Some of them were already discussed in the lucidity issues.

It seems to be maintained (last update a month ago), so forking might be the best.


#11

I’ve had a better look at lucidity. The code indeed looks clean and short, but there are some things that somewhat bother me, or at least leave me questioning.

No Schema (the collection/container of templates) class?

Within __init__.py you have the methods for discovering templates, and the returned list is then used with the other functions, like parse(path, templates), format(data, templates) and get_template(name, templates).

Instead, I would rather have a Schema class that, once instantiated, could be used like:

schema = Schema()
schema.discover_templates(template_paths)

# parsing
data, template = schema.parse(path)

# formatting
# note: I do think it's weird that you format data against all templates,
# returning the first *matching* one. What's the use here? Can the same data
# only ever be used for one path? Are the names of the keys never re-used?
schema.format(data)

# getting a single template by name
schema.get_template(name)

What do you think about using a Schema class?
Or what’s the benefit of having it the way it currently is?
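A minimal, self-contained sketch of this proposed interface might look like the following. This is not lucidity’s actual implementation; the regex-based Template class here is purely illustrative, and real lucidity templates are richer.

```python
import re

# Illustrative sketch of the proposed Schema API; NOT lucidity's code.
class Template(object):
    def __init__(self, name, pattern):
        self.name = name
        self.pattern = pattern  # e.g. '{project}/{shot}/{task}'
        regex = re.sub(r'{(\w+)}', r'(?P<\1>[^/]+)', pattern)
        self._regex = re.compile('^' + regex + '$')

    def parse(self, path):
        """Return extracted data for `path`, or None if it doesn't match."""
        match = self._regex.match(path)
        return match.groupdict() if match else None

    def format(self, data):
        """Build a path from `data`."""
        return self.pattern.format(**data)


class Schema(object):
    """Container for templates, exposing parse and get_template."""

    def __init__(self, templates=None):
        self._templates = list(templates or [])

    def get_template(self, name):
        for template in self._templates:
            if template.name == name:
                return template
        raise KeyError(name)

    def parse(self, path):
        """Return (data, template) for the first template matching `path`."""
        for template in self._templates:
            data = template.parse(path)
            if data is not None:
                return data, template
        raise ValueError('No template matched: %s' % path)


schema = Schema([Template('shot', '{project}/{shot}/{task}')])
data, template = schema.parse('pr01/sh001/animation')
# data == {'project': 'pr01', 'shot': 'sh001', 'task': 'animation'}
```

The point of the class is simply that the template list travels with the object instead of being threaded through every call.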

Code formatting / style

The style used in lucidity is a bit strange. For example, the use of triple single quotes instead of triple double quotes for docstrings isn’t PEP 8. Even so, overall it seems neatly written and controlled. More than that, it’s very neatly written for autodoc and Sphinx, which does make the docstrings a bit annoying to read directly in the code. (Could be just me?)

And the test formatting just looks really hard to read as well (to me). Is that a pytest thing you need to get used to? (I had never seen that package before.)


By the way, building in matching all patterns is really about a couple of lines of change, without a major overhaul. :wink:
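That "match all" change could be sketched as a generator that yields every match instead of stopping at the first. This is hypothetical; the PrefixTemplate stand-in below is far simpler than lucidity’s real Template class and exists only for illustration.

```python
# Hypothetical sketch of the "match all" change: yield every matching
# template instead of returning only the first.
def parse_iter(path, templates):
    """Yield (data, template) for every template matching `path`."""
    for template in templates:
        data = template.parse(path)
        if data is not None:
            yield data, template


# Tiny stand-in for a lucidity-style template, for illustration only.
class PrefixTemplate(object):
    def __init__(self, name, prefix):
        self.name = name
        self.prefix = prefix

    def parse(self, path):
        if path.startswith(self.prefix):
            return {'rest': path[len(self.prefix):]}
        return None


templates = [PrefixTemplate('a', '/jobs/'), PrefixTemplate('b', '/jobs/pr01/')]
matches = list(parse_iter('/jobs/pr01/sh001', templates))
print(len(matches))  # 2 -- both templates match this path
```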


#12

Hey all, I’m behind Lucidity - Marcus pointed me to this discussion so hopefully I can contribute something.

As was mentioned in the thread, development on Lucidity is slow at the moment as I’m working full time at ftrack now (though we do also use Lucidity at ftrack as part of our structure plugins that deal with filesystem structure management). I am very open to pull requests though and also hope to find some time soon to tackle the template referencing and best match features.

Lucidity was designed as a flexible library so lots of places build on top of it, but perhaps it is time to add more to the core.


#13

Could definitely be a useful addition. At first conception, Lucidity was designed to be a low level library that could be built on top of (with things like template management coming from elsewhere). However, it might be a good time now to add in some basic template management tools to the core. Feel free to post a request to https://github.com/4degrees/lucidity/issues

This is just a personal preference that saves on RSI: one less key to hold down! The rest follows PEP 8 quite closely.

Yeah, for those from a unittest or nose background it can look a bit different, but I highly encourage you to try pytest out with this style. It helps ensure test isolation for easy parallelization, and also lets you use dependency injection for quick parametrization of tests (different input data, etc.). It took me a while to get used to it, but I haven’t looked back since.


#14

Don’t worry. I’ve forked the repo and done a quick implementation. You can find it here: Github.com/BigRoy/lucidity
After committing I noticed my commit messages were in past tense (habits die hard!), I’ll try to stick more closely to recommended format. :wink:

It contains only one quick test running in pytest, because it took me more time to understand how pytest works than I’d hoped. But so far the test log is super nice! Thanks.

I’ll stick with it in my forked repo. It’s just that my IDE’s highlighting goes crazy on anything non-PEP 8 by default; it’s a minor switch. PyCharm is pretty strict in its default settings.

It took me some time to figure out, but it looks like I have a test running in my forked repo for parse_iter(). Yay.
So that’s another thing I implemented based on the goals discussed above in this topic, namely:

  • Better template matching if template name is not provided. Returning multiple matches.

Done.


Schema/Template from YAML

I’ll try to get YAML format in there soon.

If ftrack uses it to manage filesystem structure, does that mean you already have a system/wrapper in place that converts between another format and the Python template structure? Or is the filesystem structure not defined on the ftrack server, only locally?

Schema conversions to map one folder structure to another

What does everyone think of adding functionality for conversions between different Schemas? For example, something that allows you to easily change one folder hierarchy to another based on templates. Something like:

schema_old = Schema()
schema_new = Schema()
for old_path, new_path, old_template, new_template in schema_old.map(paths, schema_new):
    # use this information to restructure your folders
    pass
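To make the idea concrete, here is a hedged sketch of such a mapping: parse each path against the old schema, then format the same data with the matching new template. Plain format strings stand in for templates, and the `map_paths` helper is hypothetical.

```python
import re

# Hedged sketch of the proposed schema-to-schema mapping; map_paths is
# hypothetical, and dicts of format strings stand in for real schemas.
def _parse(pattern, path):
    regex = '^' + re.sub(r'{(\w+)}', r'(?P<\1>[^/]+)', pattern) + '$'
    match = re.match(regex, path)
    return match.groupdict() if match else None

def map_paths(paths, old_schema, new_schema):
    """Yield (old_path, new_path, template_name) for each matched path."""
    for path in paths:
        for name, old_pattern in old_schema.items():
            data = _parse(old_pattern, path)
            if data is not None:
                yield path, new_schema[name].format(**data), name
                break

old = {'shot': '{project}/{shot}/{task}'}
new = {'shot': '{project}/shots/{shot}/tasks/{task}'}
for old_path, new_path, name in map_paths(['pr01/sh001/anim'], old, new):
    print(old_path, '->', new_path)
# pr01/sh001/anim -> pr01/shots/sh001/tasks/anim
```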

#15

I did just have a quick look and think I will give this a go. I have been using nose and can relate to some of the advantages on their front page, like DI and grouping tests.

Thanks for the tip!


#16

I’ve been meaning to give this a proper introduction sometime in the future, but seeing that it’s relevant to the conversation, I figured I’d put it out there to give some perspective on an alternative to path parsing.

Motivation

Path parsing is ultimately about metadata: about deriving information related to a particular file or folder (i.e. an “asset”) from its absolute path, or, vice versa, about creating paths. Since paths are also structural, the metadata is very tightly coupled with its access, with its location on disk, even though these two have very little in common and needn’t be coupled at all.

In the real world, taking a fruit out of a bowl doesn’t make it any less of a fruit.

With path parsing, this is no longer true, as its identity is based on where it is. This ultimately leads to brittle tools built upon it, and a very limited amount and type of data you may associate with any asset.

Amount and type can be worked around. A common example is to associate an absolute path with an external entity, such as a database. This way, an absolute path points to an external source where its identity, and an arbitrary amount of binary data, lies.

But no matter where you choose to associate a path, it doesn’t eliminate the most destructive disadvantage for tools development and asset re-use, which is that when you take the fruit out of the bowl, the fruit is no longer a fruit.

An alternative

Some time ago, I invested an amount of research and development into solving this, and the fruits of this labour (pun!) led to a system similar to what Sony Imageworks and ILM use, based on the Unix philosophy that “everything is a file”.

In a nutshell, rather than associating identity with an absolute path, it is associated with the asset itself in the form of side-car files.

$ cd Peter
$ cquery tag .Asset

This way, no matter where it is, a fruit is always a fruit and remains a fruit even across different projects.

Another advantage of this approach is that paths no longer require a schema. Remember, the sole purpose of a path schema is to derive metadata out of the path. In this case, metadata is available at the source, which means paths are fully decoupled.
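The side-car concept can be illustrated in plain Python. This is not cquery’s actual implementation, just the idea of storing identity with the folder itself instead of in its path:

```python
import os

# Illustration of the side-car idea only; NOT cquery's implementation.
# Identity lives with the folder as an empty marker file, not in the path.
def tag(directory, tag_name):
    """Mark `directory` by dropping an empty side-car file into it."""
    open(os.path.join(directory, tag_name), 'w').close()

def is_tagged(directory, tag_name):
    """True if `directory` carries the given side-car tag."""
    return os.path.isfile(os.path.join(directory, tag_name))

def find_tagged(root, tag_name):
    """Yield every directory under `root` carrying `tag_name`."""
    for path, dirs, files in os.walk(root):
        if tag_name in files:
            yield path
```

Move the folder anywhere and `is_tagged` still returns True; identity travels with the asset.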

I mocked up an example of this, that I refer to as “Schemaless Directory Structure”, in a GitHub gist here:

Transitioning

Due to how different this approach is from path parsing, a transition isn’t necessarily pretty. However, I’m confident the gain is enough to justify it. If you are in a position to rewrite and can make a clean entry, then getting started should be quite a lot easier than with path parsing. No schemas, no regular expressions; just tag it.

Performance

Databases are second to none when it comes to the performance of querying and filtering information, and it may seem that storing identity with assets eliminates this advantage. But there’s no reason it has to.

For example, rather than associating a database-entry to an absolute path, associate the database-entry to the asset.

$ cd MyAsset
$ cquery tag .0f3fvbs36nASjr

Move the fruit and its identity moves along with it, whilst you still reap the benefits of database performance.

Future

As I’m the only maintainer of the project and my time is spent mostly with Pyblish, development has been slow. But I believe it’s a clear winner among future asset identification techniques, and I’m frankly astounded by how many large established brands get by relying solely on path parsing, and there are several out there. I’ve gotten several level-ups since the project got started and intend to continue what I’ve started when the time is right.


#17

I think the trouble is that without some way of knowing which path leads to which asset data, finding the related path for a specific piece of data is very tricky. If you have the data of an asset, you would still want to format the path based on a template, I think? :wink:

I guess you’re saying it’s less needed for getting the asset information from a path? On the other hand, it’s still very useful for figuring out a good path based solely on the asset information. You would still need a schema for that?

Example:

asset_data = {'name': "hero", 'type': "character"}
path = template.get_path('model', asset_data)

Even if you would need it only to create the path!


I’m a bit tight on schedule now, but I hope to reply to the rest soon as well! :wink:


#18

Very true, but there are a few parts to what makes a path “good”.

One is how intuitive it is, and another is how programmable it is, i.e. how to build tools around it. In other words, one is for humans and the other for machines.

The problem with path parsing is that these two become tightly coupled. In my experience it has often been the case that what is intuitive to humans isn’t that intuitive for the programmer who builds tools based on it.

So if you do need to generate destination paths for assets, that’s fine; you can still do that. But this time you can design the destination by what is intuitive for humans, and compromise less when it comes to also making it programmable. Getting the best of both worlds.

Also, consider whether creating a destination path is more important to you than parsing it. This may reveal how heavily the metadata is used in tools. In my experience, paths are read a lot more than they are written, by a ratio of 10,000 to 1 or so.


#19

Actually, for us it’s exactly the opposite. The main advantage of using systems like ftrack for tracking assets and shots is that it removes much of the filesystem dependency. I would never try to figure out what an asset is by reading it from disk, but rather look in the database for the asset I need, read its path from there, and import it.

I’d say 99% of my dealing with paths is writing to them, because when reading assets I just query published versions from the database (ftrack in our case) and get the path from there.


#20

It may be more dependent than you think.

Consider why you need a database to provide information about a file on disk in the first place.