Folder Structure Resolver

marcus · May 21, 2015, 1:46pm

I figured I’d provide a practical example to let it settle in.

Starting a shot

Let’s assume a scenario in which an artist is about to begin working on a shot. He’s been given Task A to work on Shot B from Film C, and he is an Animator.

$ cd /projects/filmc
$ cquery find .Shot
/projects/filmc/shots/ShotA
/projects/filmc/shots/ShotB
/projects/filmc/shots/ShotC
$ cd shots/ShotB
$ maya

From that point, the directory he is in, along with the user he is logged in as, is enough context for a tool to launch an appropriate session of Maya relevant to his task.

Querying parent project

From a tools perspective, let’s assume the tool has been given the path to an asset and is sensitive about displaying information relevant to which project it is in.

$ cd /projects/filmc/shots/ShotC/animation/private/marcus/maya/scenes
$ cquery .Project
/projects/filmc

From that point, data can be extracted from the project, either from metadata local to itself or e.g. FTrack if desired. The critical thing here is that each asset along the way knows what it is and doesn’t require an external query.

In fact, neither of these require any interaction with anything other than the content itself. I’ve referred to it as “smart content” and I think it fits.
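To make the upward query concrete, here is a minimal sketch of the idea, not cQuery’s actual implementation. It assumes a tag is simply an entry named after the selector (e.g. `.Project`) sitting inside the tagged directory; the function name `first_parent_match` is hypothetical.

```python
import os


def first_parent_match(path, selector):
    """Return the closest directory at or above `path` that holds `selector`.

    Assumption: a directory is "tagged" when it contains an entry whose
    name equals the selector, e.g. a file called `.Project`.
    """
    current = os.path.abspath(path)
    while True:
        if os.path.exists(os.path.join(current, selector)):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the file-system root, no match
            return None
        current = parent
```

Called from a scene directory deep inside a shot, this walks up one folder at a time and stops at the first tagged parent, which is why no external database is involved.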

Where does it go?

In relation to figuring out where a file should go, this is where the parent project Open Metadata comes in.

# Basics
$ om write key --value="value"
$ om read key
value

# Usage
$ cd /projects/filmc
$ om read assetsDir
subdir/assets

# More
$ om read shotsDir
shots
$ cd subdir/assets/MyAsset
$ om read developmentDir
private

The critical point here is that the project itself is in control of where its children go, and its children are delegated a similar responsibility. This is the “decentralised” part of OM.
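The delegation can be sketched in a few lines of Python. This is a stand-in under stated assumptions, not Open Metadata’s API or storage format: metadata lives in a hypothetical `.meta.json` file per directory, and `write`, `read` and `development_dir` are illustrative names. The keys `assetsDir` and `developmentDir` are the ones from the shell example above.

```python
import json
import os

META_FILE = ".meta.json"  # hypothetical stand-in for Open Metadata storage


def write(directory, key, value):
    """Store `key` next to the content it describes."""
    path = os.path.join(directory, META_FILE)
    data = {}
    if os.path.exists(path):
        with open(path) as f:
            data = json.load(f)
    data[key] = value
    with open(path, "w") as f:
        json.dump(data, f)


def read(directory, key):
    """Return the value of `key` for `directory`, or None if unset."""
    path = os.path.join(directory, META_FILE)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f).get(key)


def development_dir(project, asset_name):
    """Ask the project where assets go, then ask the asset where work goes."""
    asset = os.path.join(project, read(project, "assetsDir"), asset_name)
    return os.path.join(asset, read(asset, "developmentDir"))
```

Each level only answers for its own children: the project decides where assets live, and the asset decides where its development files live.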

BigRoy · May 22, 2015, 12:44pm

True. But being more programmable also directly means it must follow more logical patterns/rules. Thus, given a set of information, the pattern can be figured out more easily (whether by a human or a computer) based on those rules.

That’s why we need a toolset that lets us consistently figure out which path to read from or write to without needing to perform the black magic of complex path parsing. Preferably, as I see it, we deliver what an asset manager is to artists, but for programmers: a toolset to easily manage path structure logic for your tools.

So we take needing to program how paths are resolved away. Instead we give them the ability to define their set of rules easily plus build their tools that work with it.

I might be misinterpreting that but it’s definitely not only about the destination path. It’s even about the first folder ever created for that specific asset, even before the folder exists. So before the metadata could exist in the folder. Because the pattern also defines its creation.

What pattern would you see that is more intuitive to humans but is not based on a given set of rules that define its logic?

Consider something that is tagged as a fruit but, other than being tagged, doesn’t have any rule about where it should be. If it ends up in the vegetable bowl, do we still want it to be a fruit? With patterns/rules it’s clear that the fruit shouldn’t be in the vegetable bowl. I would personally find it more confusing (as a human) to find the fruit among the vegetables.

Shouldn’t the path in the database be a path pattern?

I know Shotgun supports multiple Studios having different folder structures. The same assets can be in different folder structures based on studios preferences/template.

Are you storing absolute paths in the database?

How would you retrieve the correct asset if all you have/know is the work scene/path? Or am I misinterpreting what we’re solving?

marcus · May 22, 2015, 12:51pm

I’m having a bit of trouble following along to be honest, would it be possible to break it down to practical examples? If you have a particular scenario in mind, it might be easier to discuss starting from there.

mkolar · May 22, 2015, 1:26pm

Ftrack does something very similar if you use its location features. We’re currently not using this, however, so we’re storing paths relative to the project root folder (so technically yes, absolute paths for now).

If the scene path is the only thing I have then, yes, I would absolutely need to get the information from there. However, that is never the case for us, because every time you work on a task, you open your host from ftrack, which gives me access to all the needed data within that instance of the software (the task id most importantly). If scripts know the artist is working on task X, they can figure out where its file should go and where to look for its inputs. If someone launches Nuke from the desktop, so that I don’t have any of this data, the session is considered out of the system and can’t publish, can’t use our tools etc.

marcus · May 22, 2015, 1:29pm

I understand what you mean, but unfortunately it isn’t quite this simple.

There are a few competing concerns when it comes to managing files and no matter how clever and optimal any layout is, the best you can hope for is a compromise.

Taking a step back, I made an illustration of the concerns I speak of.

When relying fully on path parsing, the best you can hope for is D, a medium of all. Even the greatest Path Layout Tool in the world cannot design a layout that fulfils every concern, simply because the amount of data you can store in any path is not enough to encompass them all.

An example of A might look like this.

/lowPolyHeroModels
/highPolyHeroModels_chunk1
/highPolyHeroModels_chunk2
/highPolyHeroModels_chunk3
/previsAnimalRigs
/previsLightShaders

As in, everything occupies its own file-system and lies flat. This allows for many optimisations to take place and maximises performance. But it isn’t terribly convenient to look at or develop towards.

An example of B might look like this.

/projects/projectA/assets/assetA/rigs/private/roy/maya/scenes

Which makes for a logical and encapsulated layout, but may be less convenient to browse due to how deeply nested content becomes, and leaves little room to separate parts of a hierarchy, either for sake of performance, security or disk space.

The point of the alternative I posted above is that it allows to break apart each concern and maximise the potential of each.

marcus · May 22, 2015, 1:38pm

There are many ways to go about this. Here’s an example of how that could work using cQuery and Open Metadata.

# Save an asset to disk
import os

import cquery
import openmetadata as om
from maya import cmds

asset_name = "MyAsset"
current_file = cmds.file(query=True, sceneName=True)
current_project = cquery.first_match(current_file, ".Project")
assets_dir = om.read(current_project, "assetsDir")  # relative to the project
final_path = os.path.join(current_project, assets_dir, asset_name)
# e.g. /server/projects/projectA/assets/MyAsset

BigRoy · May 24, 2015, 10:21am

Lucidity

I’ve done some draft work with lucidity to get some extra things in there. The changes so far are:

Remove the dependency on Bunch by implementing a custom formatter (#1).
Allow to parse and format all templates with the help of parse_iter() and format_iter() methods, instead of returning only the first match (#18).
Set up the basis for a Schema class that holds a list of Templates to work with.
Draft designs for a YAML schema format (#20), those can be found here.

They are in my forked repository: https://github.com/BigRoy/lucidity.

Though note that my master branch is my development branch (since I’ve been changing as I go). The Template class barely changed and most of the changes are within parse/format methods in __init__.py or come in additional files, like schema.py and formatter.py.

All tests are still passing, except for the new schema yaml tests that I’ve been setting up. To get to passing those completely we would need to have optional arguments (??) and nested/referenced templates (#2) implemented in lucidity as well.

cQuery and Open Metadata

@marcus what I like about your setup is that it keeps data very close to what it belongs to, but how paths that are ‘to be constructed’ are defined seems cumbersome. Well, at least cumbersome at this low level. I wonder how you’d go about setting this up in a higher level python package that can manage the file structure easier. So more how do you see a full project being managed like that?

At this stage I feel that your solution reduces the freedom on folder structure more than what an extensive schema could offer. Because the data must be nested under its respective parent. In theory it’s confusing for the code (so the computer) that the asset could live in a separated work and publish folder, where your solution has a conflict. That would mean that same data is duplicated, which is exactly what you’re trying to avoid. Right?

For example

work/asset/hero/.Asset
publish/asset/hero/.Asset

Where would you store the data and how would you manage the file structure pattern?

On the other hand there’re other things your solution could solve, like:

Arbitrary nesting of folders (you always know when you’re IN the asset’s own folder, since it’s tagged like that)
If a folder is deleted its entry is automatically deleted as well, since the data is inside. (On the contrary though, what would happen if a folder gets renamed/moved (not saying that you should)?)

marcus · May 25, 2015, 8:02am

It sounds like you’ve got the gist of how it works by now. We should definitely have a chat about how to solve these more specific issues if you’re interested, how about a separate thread?

BigRoy · May 25, 2015, 6:56pm

I’m all up for these discussions.

Seems like this thread is a mixture of pure folder structure and managing a project solely through folder structure. The second one might mostly be irrelevant to the first. That is with tools like ftrack/shotgun or metadata on folders. At least it seems like you’ve got something nice to bring to the table. If you think some separation is in place feel free to kickstart a thread with a clearer separation.

panupat · June 16, 2015, 7:45am

I’m trying out Lucidity and it looks like tokens cannot contain a “/” character? So it seems you have to extract the {project_root} out of the path before parsing.

These would raise an error

path = 'P:/evo/scene'
token = '{project_root}/scene'

While this works.

path = 'P:/evo/scene'
token = 'P:/{project_root}/scene'

BigRoy · June 16, 2015, 7:54am

Correct. You would have to use your own regex format for that token.
To do your own project_root you can do something like:

path = 'P:/evo/scene'
token = r"{project_root:(^[\w]*:*[\/]?(?:(?:(?<=[\/])[^\/]+[\/]?)*))}/scene"

Or for readability (to explain):

# The complex regex part that can match any root path (NT and UNIX)
regex_pattern = r'(^[\w]*:*[\/]?(?:(?:(?<=[\/])[^\/]+[\/]?)*))'

# The lucidity formatted *project_root* key
root = '{project_root:%s}' % regex_pattern

# Your token
token = root + "/scene"

We’re using this same regex pattern for The Deal test project we’re doing with Pyblish Magenta.

panupat · June 16, 2015, 8:06am

Thank you for sharing BigRoy. I’ll need to dig deeper into the tutorial

marcus · June 16, 2015, 11:29am

I never really got the role of the regex here, could someone explain it to me in layman terms? Why can’t you have a slash?

BigRoy · June 16, 2015, 11:52am

Lucidity parses each token or key in a template as a name that does not hold any slashes. So the default key will never resolve to nested folders.

Otherwise any key would always match an arbitrary amount of nested folders:

template = '{root}/dev/{asset}/test'
path = 'C:/projects/foobar/dev/character/maya/test'

# {root}: C:/projects/foobar
# {asset}: character/maya

Though currently it does not match that by default, since {root} and {asset} can only ever be a single folder.

So we overwrite the regex by implementing our custom regex to allow the {root} to match anything up to that point. In theory you could have a very simple regex for {root} to match anything non-greedy up to that point. Something like {root:.+?} where the regex pattern is .+? to match any 1 or more characters in a non-greedy fashion.
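The mechanism can be illustrated with the standard `re` module alone. This is a simplified sketch, not Lucidity’s code: `template_to_regex` and `parse` are hypothetical helpers, the default expression `[^/]+` is an assumption chosen to mean “one folder name, no slashes”, and custom expressions containing braces are not supported here.

```python
import re

# Assumption for this sketch: a bare {key} matches a single folder name.
DEFAULT_EXPRESSION = r"[^/]+"
PLACEHOLDER = re.compile(r"\{(\w+)(?::([^{}]+))?\}")


def template_to_regex(template):
    """Turn '{name}' and '{name:regex}' placeholders into named groups."""
    parts = []
    position = 0
    for match in PLACEHOLDER.finditer(template):
        # Literal text between placeholders must match exactly.
        parts.append(re.escape(template[position:match.start()]))
        name = match.group(1)
        expression = match.group(2) or DEFAULT_EXPRESSION
        parts.append("(?P<%s>%s)" % (name, expression))
        position = match.end()
    parts.append(re.escape(template[position:]))
    return re.compile("^" + "".join(parts) + "$")


def parse(path, template):
    """Return the placeholder values found in `path`, or None if no match."""
    match = template_to_regex(template).match(path)
    return match.groupdict() if match else None


path = 'C:/projects/foobar/dev/character/maya/test'

# Default keys cannot span folders, so this template does not match.
print(parse(path, '{root}/dev/{asset}/test'))
# None

# Overriding the expression lets a key swallow nested folders.
print(parse(path, '{root:.+?}/dev/{asset:.+?}/test'))
# {'root': 'C:/projects/foobar', 'asset': 'character/maya'}
```

The slash restriction is therefore a property of each key’s regular expression, and overriding that expression per key is what lifts it.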

marcus · June 16, 2015, 12:03pm

Ah, I think I understand. So it’s only relevant to parsing a string into “tokens”, and not formatting a path from them?

BigRoy · June 16, 2015, 12:28pm

Not entirely sure, but if lucidity were to guarantee that a path is always reversible (can be parsed), then formatting should obey the same rules.

Theoretically it could confirm that it parses correctly after formatting. Actually now that I think of it, it should!
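One way such a check could look, as a sketch only and not something Lucidity is known to do: validate every value against its key’s expression while formatting, so a path that could not be parsed back is never produced. `format_checked` is a hypothetical helper, and the default expression `[^/]+` is again an assumption.

```python
import re

# Assumption for this sketch: a bare {key} accepts a single folder name.
DEFAULT_EXPRESSION = r"[^/]+"
PLACEHOLDER = re.compile(r"\{(\w+)(?::([^{}]+))?\}")


def format_checked(template, data):
    """Fill in `template`, rejecting values their key could not parse."""

    def substitute(match):
        name = match.group(1)
        expression = match.group(2) or DEFAULT_EXPRESSION
        value = str(data[name])
        if not re.fullmatch(expression, value):
            raise ValueError(
                "%r is not a valid value for {%s}" % (value, name))
        return value

    return PLACEHOLDER.sub(substitute, template)


print(format_checked('{root}/dev/{asset}/test',
                     {'root': 'C:', 'asset': 'hero'}))
# C:/dev/hero/test
```

Passing `{'asset': 'character/maya'}` to the same template raises `ValueError`, mirroring the parse-side rule that a default key is a single folder. Per-value validation does not by itself prove that an ambiguous template round-trips, so a stricter version could parse the result and compare.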

marcus · June 16, 2015, 12:40pm

Cool, thanks @BigRoy

marcus · June 22, 2015, 11:19am

Got a link from a friend working on a similar solution as Lucidity.

http://docs.efestolab.uk/ade/docs/build/html/index.html

martin · July 16, 2015, 10:39am

Ade is tackling a slightly different (and important) problem, I think. It operates at a higher level than Lucidity, which is just a simple library at the end of the day.

Ade solves things like (correct me if I am wrong Lorenzo!) what permissions a directory should have in the hierarchy. One way you can do that is to come up with a fun config file format, but an arguably nicer way would be to just set up a real directory structure as a template and reference that, which I think is what Ade does. This is similar to project templates you see in development, such as cookiecutter.

Great to see more solutions being open sourced!

marcus · July 16, 2015, 10:41am

Ah, that does sound interesting (and complex).

I’ll ping Lorenzo, maybe he can give us the tour.