Folder Structure Resolver

I’ve had a better look at lucidity. The code indeed looks clean and short, but there are some things that bother me, or at least leave me with questions.

No Schema (the collection/container of templates) class?

Within __init__.py you have the methods for discovering templates and then using the returned list for the other functions, like parse(path, templates), format(data, templates), get_template(name, templates).

Instead I would rather have a Schema class that, once instantiated, could be used like:

schema = Schema()
schema.discover_templates(template_paths)

# parsing
data, template = schema.parse(path)

# formatting
# note: I do think it's weird that you format data against all templates and
# return the first *matching* one. What's the use here? Can the same data only
# ever be used for one path? Are names of the keys never re-used?
schema.format(data)

# getting a single template by name
schema.get_template(name)

What do you think about using a Schema class?
Or what’s the benefit of having it the way it currently is?
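For the sake of discussion, here’s a minimal, self-contained sketch of what such a Schema could look like. The Template stand-in below uses a naive regex and is not lucidity’s actual implementation; all class and method names here are just my proposal (discover_templates() is omitted):

```python
import re


class Template(object):
    """Naive stand-in for a lucidity-style template (illustration only)."""

    def __init__(self, name, pattern):
        self.name = name
        self.pattern = pattern
        self._regex = self._compile(pattern)

    @staticmethod
    def _compile(pattern):
        # Turn each '{token}' placeholder into a named group that
        # matches anything except a path separator.
        parts, last = [], 0
        for match in re.finditer(r'{(\w+)}', pattern):
            parts.append(re.escape(pattern[last:match.start()]))
            parts.append('(?P<%s>[^/]+)' % match.group(1))
            last = match.end()
        parts.append(re.escape(pattern[last:]))
        return re.compile('^%s$' % ''.join(parts))

    def parse(self, path):
        match = self._regex.match(path)
        if match is None:
            raise ValueError('%r does not match template %r'
                             % (path, self.name))
        return match.groupdict()

    def format(self, data):
        return self.pattern.format(**data)


class Schema(object):
    """Container of templates, as proposed above."""

    def __init__(self, templates=None):
        self._templates = list(templates or [])

    def parse(self, path):
        # Return (data, template) for the first template that matches.
        for template in self._templates:
            try:
                return template.parse(path), template
            except ValueError:
                continue
        raise ValueError('No template matched %r' % path)

    def format(self, data):
        # Return the first path that can be built from the data.
        for template in self._templates:
            try:
                return template.format(data)
            except KeyError:
                continue
        raise ValueError('No template could format %r' % data)

    def get_template(self, name):
        for template in self._templates:
            if template.name == name:
                return template
        raise ValueError('No template named %r' % name)
```

Usage then reads exactly like the snippet above: construct a Schema, then call parse/format/get_template on the instance instead of passing a template list around.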

Code formatting / style

The style used in lucidity is a bit strange. For example, the use of triple single quotes instead of triple double quotes for docstrings isn’t what PEP 8 / PEP 257 recommend. Even though overall it seems neatly written and controlled. It’s also very clearly written for autodoc & Sphinx, but that does make the docstrings a bit annoying to read directly in the code. (Could be just me?)

And the formatting of the tests looks really hard to read as well (to me). Is that a pytest thing you need to get used to? (I had never seen that package before.)


By the way, building in matching against all patterns really is only a couple of lines of change, without a major overhaul. :wink:

Hey all, I’m behind Lucidity - Marcus pointed me to this discussion so hopefully I can contribute something.

As was mentioned in the thread, development on Lucidity is slow at the moment as I’m working full time at ftrack now (though we do also use Lucidity at ftrack as part of our structure plugins that deal with filesystem structure management). I am very open to pull requests though and also hope to find some time soon to tackle the template referencing and best match features.

Lucidity was designed as a flexible library so lots of places build on top of it, but perhaps it is time to add more to the core.

Could definitely be a useful addition. At first conception, Lucidity was designed to be a low level library that could be built on top of (with things like template management coming from elsewhere). However, it might be a good time now to add in some basic template management tools to the core. Feel free to post a request to https://github.com/4degrees/lucidity/issues

This is just personal preference that saves on RSI - one less key to hold down! The rest follows PEP 8 quite closely.

Yeah, for those from a unittest or nose background it can look a bit different, but I highly encourage you to try pytest out with this style. It helps ensure test isolation (which makes parallelisation easy) and lets you use dependency injection for quick parametrisation of tests (different input data etc.). It took me a while to get used to it, but I haven’t looked back since.

Don’t worry. I’ve forked the repo and done a quick implementation. You can find it at github.com/BigRoy/lucidity.
After committing I noticed my commit messages were in past tense (habits die hard!), I’ll try to stick more closely to recommended format. :wink:

It contains only one quick test running in pytest because it took me more time to get through understanding how pytest works than I’d hoped. But so far the test log is super nice! Thanks.

I’ll stick with the PEP 8 style in my forked repo. It’s just that my IDE’s highlighting goes crazy on anything non-PEP 8 by default; it’s a minor switch. PyCharm is pretty strict in its default settings.

Took me some time to figure out, but it looks like I have a test running in my forked repo for parse_iter(). Yay.
So that’s another of the goals discussed above in this topic implemented, namely:

  • Better template matching if template name is not provided. Returning multiple matches.

Done.
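For context, the idea behind parse_iter() can be sketched stand-alone as a generator that keeps trying templates instead of returning on the first hit. StubTemplate below is a throwaway illustration, not lucidity’s actual Template class:

```python
import re


class StubTemplate(object):
    """Throwaway template stand-in: parse() raises ValueError on mismatch."""

    def __init__(self, name, regex):
        self.name = name
        self._regex = re.compile(regex)

    def parse(self, path):
        match = self._regex.match(path)
        if match is None:
            raise ValueError('%r does not match %r' % (path, self.name))
        return match.groupdict()


def parse_iter(path, templates):
    """Yield (data, template) for *every* template matching the path,
    instead of stopping at the first match."""
    for template in templates:
        try:
            data = template.parse(path)
        except ValueError:
            continue
        yield data, template
```

Because it is a generator, callers who only want the first match can still stop early, while “best match” strategies can consume all results and pick one.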


Schema/Template from YAML

I’ll try to get YAML format in there soon.

If ftrack uses it to manage filesystem structure, does that mean you already have a system/wrapper in place that converts between another format and the Python template structure? Or is the filesystem structure not defined on the ftrack server, but only locally?

Schema conversions to map one folder structure to another

What does everyone think of adding functionality for converting between different Schemas? For example, something that allows you to easily change one folder hierarchy into another based on templates. Something like:

schema_old = Schema()
schema_new = Schema()
for old_path, new_path, old_template, new_template in schema_old.map(paths, schema_new):
    # use this information to restructure your folders
    pass
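To make the proposal concrete, here’s a self-contained sketch of the mapping step, with schemas reduced to plain name-to-pattern dicts. The names `map_schemas` and the tuple layout are hypothetical, not existing lucidity API:

```python
import re


def parse(pattern, path):
    """Parse *path* against a '{token}'-style *pattern*.
    Returns a data dict, or None when the path does not match."""
    parts, last = [], 0
    for match in re.finditer(r'{(\w+)}', pattern):
        parts.append(re.escape(pattern[last:match.start()]))
        parts.append('(?P<%s>[^/]+)' % match.group(1))
        last = match.end()
    parts.append(re.escape(pattern[last:]))
    match = re.match('^%s$' % ''.join(parts), path)
    return match.groupdict() if match else None


def map_schemas(paths, schema_old, schema_new):
    """For each path, parse with the old schema, then re-format the data
    with the same-named pattern in the new schema."""
    for old_path in paths:
        for name, old_pattern in schema_old.items():
            data = parse(old_pattern, old_path)
            if data is None:
                continue
            new_pattern = schema_new[name]
            yield old_path, new_pattern.format(**data), old_pattern, new_pattern
            break
```

The key assumption is that both schemas share template names and token names; anything beyond that (renamed tokens, merged hierarchies) would need an explicit mapping table.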

I did just have a quick look and think I will give this a go. I have been using nose and can relate to some of the advantages on their front page, like dependency injection and the grouping of tests.

Thanks for the tip!


I’ve been meaning to give this a proper introduction sometime in the future, but seeing that it’s relevant to the conversation I figure I’d put it out there to give some perspective on an alternative to path parsing.

Motivation

Path parsing is ultimately about metadata: deriving information related to a particular file or folder (i.e. an “asset”) from its absolute path, or vice versa, creating paths from that information. Since paths are also structural, the metadata is tightly, very tightly, coupled with its access, with its location on disk. Even though these two have very little in common and needn’t be coupled at all.

In the real world, taking a fruit out of a bowl doesn’t make it any less of a fruit.

With path parsing, this is no longer true, as its identity is based on where it is. This ultimately leads to brittle tools built upon it, and a limited, very limited, amount and type of data you may associate with any asset.

Amount and type can be worked around. A common example is to associate an absolute path with an external entity, such as a database. This way, an absolute path points to an external source where its identity, and an arbitrary amount of binary data, lies.

But no matter where you choose to associate a path, it doesn’t eliminate the most destructive disadvantage to tools development and asset re-use: when you take the fruit out of the bowl, the fruit is no longer a fruit.

An alternative

Some time ago, I invested an amount of research and development into solving this, and the fruits of this labour (pun!) led to a system similar to what Sony Imageworks and ILM use, based on the Unix philosophy “everything is a file”.

In a nutshell, rather than associating identity with an absolute path, it is associated with the asset itself in the form of side-car files.

$ cd Peter
$ cquery tag .Asset

This way, no matter where it is, a fruit is always a fruit and remains a fruit even across different projects.

Another advantage to this approach is that paths no longer require a schema. Because, remember, the sole purpose of a path schema is to derive metadata from the path. In this case, metadata is available at the source, which means paths are fully decoupled.

I mocked up an example of this, that I refer to as “Schemaless Directory Structure”, in a GitHub gist here:

Transitioning

Due to how different this approach is to path parsing, a transition isn’t necessarily pretty. However, I’m confident the gain is enough to make it worthwhile. If you are in a position to re-write and can make a clean entry, then getting started should be quite a lot easier than with path parsing. No schemas, no regular expressions, just tag it.

Performance

Databases are second to none when it comes to the performance of querying and filtering information, and it may seem that storing identity with assets gives up this advantage. But there’s no reason it has to.

For example, rather than associating a database-entry to an absolute path, associate the database-entry to the asset.

$ cd MyAsset
$ cquery tag .0f3fvbs36nASjr

Move the fruit and its identity moves along with it, while you still reap the benefits of database performance.

Future

As I’m the only maintainer of the project and my time is spent mostly with Pyblish, development has been slow. But I believe it’s a clear winner among future asset identification techniques, and I am frankly astounded by how many large established brands get by relying solely on path parsing, and there are several out there. I’ve gotten several level-ups since the project got started and intend on continuing what I’ve started when the time is right.

I think the slowest part of not having a way of knowing which path leads to which asset data is that finding a related path for a specific piece of data becomes very tricky. If you have the data of an asset, you would still want to format the path based on a template, I think? :wink:

I guess you’re saying it’s less needed for getting the asset information from a path? On the other hand, it’s still very useful for figuring out a good path based solely on the asset information. You would still need a schema for that?

Example:

asset_data = {'name': "hero", 'type': "character"}
path = template.get_path('model', asset_data)

Even if you would need it only to create the path!


I’m a bit tight on schedule now, but I hope to reply to the rest soon as well! :wink:

Very true, but there are a few parts to what makes a path “good”.

One is how intuitive it is and another is how programmable it is; i.e. how to build tools around it. In other words, one is for humans, and the other machines.

The problem with path parsing is that these two become tightly coupled. In my experience it has often been the case that what is intuitive to humans isn’t that intuitive for the programmer who builds tools based on it.

So if you do need to generate destination paths for assets, that’s fine, you can still do that. But this time you can design the destination by what is intuitive for humans and compromise less when it comes to also making it programmable. Getting the best of both worlds.

Also, consider whether creating a destination path is more important to you than parsing it. This may reveal how heavily the metadata is used in tools. In my experience, paths are read a lot more than they are written. By a ratio of 1/10,000 or so.

Actually, for us it’s exactly the opposite. The main advantage of using systems like ftrack for tracking assets and shots is that they remove much of the filesystem dependency. I would never try to figure out what an asset is by reading it from disk, but rather look in the database for the asset I need, read its path from there and import it.

I’d say 99% of my dealing with paths is writing to them, because when reading assets I just query published versions from the database (ftrack in our case) and get the path from there.

It may be more dependent than you think.

Consider why you need a database to provide information about a file on disk in the first place.

I figure I’d provide a practical example to let it settle in.

Starting a shot

Let’s assume a scenario in which an artist is about to begin working on a shot. He’s been given Task A to work on Shot B from Film C, and he is an Animator.

$ cd /projects/filmc
$ cquery find .Shot
/projects/filmc/shots/ShotA
/projects/filmc/shots/ShotB
/projects/filmc/shots/ShotC
$ cd shots/ShotB
$ maya

From that point, the directory he is in, along with the user he is logged in as, is enough context for a tool to launch an appropriate session of Maya relevant to his task.

Querying parent project

From a tools perspective, let’s assume the tool has been given the path to an asset and is sensitive about displaying information relevant to which project it is in.

$ cd /projects/filmc/shots/ShotC/animation/private/marcus/maya/scenes
$ cquery .Project
/projects/filmc

From that point, data can be extracted from the project, either from metadata local to itself or from e.g. ftrack if desired. The critical thing here is that each asset along the way knew what it was and didn’t require an external query.

In fact, neither of these require any interaction with anything other than the content itself. I’ve referred to it as “smart content” and I think it fits.

Where does it go?

In relation to figuring out where a file should go, this is where the parent project, Open Metadata, comes in.

# Basics
$ om write key --value="value"
$ om read key
value
# Usage
$ cd /project/filmc
$ om read assetsDir
subdir/assets
# More
$ om read shotsDir
shots
$ cd subdir/assets/MyAsset
$ om read developmentDir
private

The critical point here is that the project itself is in control of where its children go, and its children are delegated a similar responsibility. This is the “decentralised” part of OM.
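If I understand the decentralised idea correctly, it could be sketched in Python roughly like this, where `read(path, key)` is a hypothetical stand-in for `om read` and the dict simulates side-car metadata; none of this is Open Metadata’s actual API:

```python
import posixpath


def resolve(project_root, keys, read):
    """Walk down from the project, letting each level's own metadata
    decide where its children live (the decentralised idea above)."""
    path = project_root
    for key in keys:
        path = posixpath.join(path, read(path, key))
    return path


# Simulated side-car metadata, mirroring the `om read` session above.
metadata = {
    '/project/filmc': {'assetsDir': 'subdir/assets'},
    '/project/filmc/subdir/assets': {'developmentDir': 'private'},
}


def read(path, key):
    # Stand-in for `om read`: look up a key in a directory's metadata.
    return metadata[path][key]


print(resolve('/project/filmc', ['assetsDir', 'developmentDir'], read))
# /project/filmc/subdir/assets/private
```

The point being: no single central schema knows the whole layout; each directory only answers for its immediate children.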

True. But being more programmable also directly means it must follow more logical patterns/rules. Thus, with a given set of information, the pattern can be figured out more easily (by a human or a computer) based on those rules.

That’s why we need a toolset that allows us to consistently figure out which path to read from or write to, without needing to perform the black magic of complex path parsing. Preferably, the way I see it, we deliver what an asset manager is to artists, but then for programmers: the toolset to easily manage path structure logic for your tools.

So we take away the need to program how paths are resolved. Instead we give them the ability to easily define their set of rules, plus build their tools that work with it.

I might be misinterpreting that, but it’s definitely not only about the destination path. It’s even about the first folder ever created for that specific asset, before the folder exists, and so before the metadata could exist in the folder. Because the pattern also defines its creation.

What pattern would you see that is more intuitive to humans but is not based on a given set of rules that define its logic?

Consider something that is tagged as a fruit but, other than being tagged, has no rule about where it should be. If it ends up in the vegetable bowl, do we still want it to be a fruit? With patterns/rules it’s clear that the fruit shouldn’t be in the vegetable bowl. I would personally find it more confusing (as a human) to find the fruit among the vegetables. :wink:

Shouldn’t the path in the database be a path pattern?

I know Shotgun supports multiple studios having different folder structures. The same assets can be in different folder structures based on a studio’s preferences/template.

Are you storing absolute paths in the database?

How would you retrieve the correct asset if all you have/know is the work scene/path? Or am I misinterpreting what we’re solving?

I’m having a bit of trouble following along to be honest, would it be possible to break it down to practical examples? If you have a particular scenario in mind, it might be easier to discuss starting from there.

ftrack does something very similar if you use its Locations feature. We’re currently not using this, however, so we’re storing paths relative to the project root folder (so, technically, yes, absolute paths for now).

If the scene path is the only thing I have then, yes, I would absolutely need to get the information from there. However, that is never the case for us, because every time you work on a task you open your host application from ftrack, which gives me access to all the needed data within that instance of the software (the task id most importantly). If scripts know the artist is working on task X, they can figure out where its file should go and where to look for its inputs. If someone launches Nuke from the desktop, so that I don’t have any of this data, it is considered outside the system and can’t publish, can’t use our tools, etc.

I understand what you mean, but unfortunately it isn’t quite this simple.

There are a few competing concerns when it comes to managing files and no matter how clever and optimal any layout is, the best you can hope for is a compromise.

Taking a step back, I made an illustration of the concerns I speak of.

When relying fully on path parsing, the best you can hope for is D, a medium of all. And even the greatest path layout tool in the world cannot hope to design a layout that fulfils every concern, simply because the amount of data you can ever hope to store in any path is not enough to encompass them all.

An example of A might look like this.

/lowPolyHeroModels
/highPolyHeroModels_chunk1
/highPolyHeroModels_chunk2
/highPolyHeroModels_chunk3
/previsAnimalRigs
/previsLightShaders

As in, everything occupies its own filesystem and lies flat. This allows many optimisations to take place and maximises performance. But it isn’t terribly convenient to look at or develop towards.

An example of B might look like this.

/projects/projectA/assets/assetA/rigs/private/roy/maya/scenes

Which makes for a logical and encapsulated layout, but may be less convenient to browse due to how deeply nested content becomes, and leaves little room to separate parts of a hierarchy, either for sake of performance, security or disk space.

The point of the alternative I posted above is that it allows to break apart each concern and maximise the potential of each.

There are many ways to go about this. Here’s an example of how that could work using cQuery and Open Metadata.

# Save an asset to disk (assumes cquery and Open Metadata's `om` are importable)
import os
import cquery
import om
from maya import cmds

asset_name = "MyAsset"
current_file = cmds.file(query=True, sceneName=True)
current_project = cquery.first_match(current_file, ".Project")
asset_dir = om.read(current_project, "assetDir")
final_path = os.path.join(asset_dir, asset_name)
# e.g. /server/projects/projectA/assets/MyAsset

Lucidity

I’ve done some draft work with lucidity to get some extra things in there. The changes so far are:

  • Remove the dependency on Bunch by implementing a custom formatter (#1).
  • Allow parsing and formatting against all templates with the help of parse_iter() and format_iter() methods, instead of returning only the first match (#18).
  • Set up the basis for a Schema class that holds a list of Templates to work with.
  • Draft designs for a YAML schema format (#20), those can be found here.

They are in my forked repository: https://github.com/BigRoy/lucidity.

Though note that my master branch is my development branch (since I’ve been changing things as I go). The Template class barely changed; most of the changes are within the parse/format methods in __init__.py or come in additional files, like schema.py and formatter.py.

All tests are still passing, except for the new schema YAML tests that I’ve been setting up. To get those passing completely we would need optional arguments (??) and nested/referenced templates (#2) implemented in lucidity as well.


cQuery and Open Metadata

@marcus, what I like about your setup is that it keeps data very close to what it belongs to, but how paths that are ‘to be constructed’ are defined seems cumbersome. Well, at least cumbersome at this low level. I wonder how you’d go about setting this up in a higher-level Python package that makes managing the file structure easier. So, more generally, how do you see a full project being managed like that?

At this stage I feel that your solution restricts freedom in folder structure more than an extensive schema could offer, because the data must be nested under its respective parent. It’s confusing for the code (so the computer) that the asset could live in separate work and publish folders, where your solution has a conflict. That would mean the same data is duplicated, which is exactly what you’re trying to avoid. Right?

For example

work/asset/hero/.Asset
publish/asset/hero/.Asset

Where would you store the data and how would you manage the file structure pattern?

On the other hand, there are other things your solution could solve, like:

  • Arbitrary nesting of folders (you always know when you’re IN the asset’s own folder, since it’s tagged like that)
  • If a folder is deleted, its entry is automatically deleted as well, since the data is inside. (On the contrary, what would happen if a folder gets renamed/moved? Not saying that you should.)

It sounds like you’ve got the gist of how it works by now. We should definitely have a chat about how to solve these more specific issues if you’re interested, how about a separate thread?

I’m all up for these discussions.

Seems like this thread is a mixture of pure folder structure and managing a project solely through folder structure. The second might be mostly irrelevant to the first, given tools like ftrack/Shotgun or metadata on folders. At least it seems like you’ve got something nice to bring to the table. If you think some separation is in order, feel free to kickstart a thread with a clearer separation. :wink:

I’m trying out Lucidity and it looks like tokens cannot contain the “/” character? So it seems you have to extract the {project_root} from the path before parsing.

This would raise an error:

path = 'P:/evo/scene'
pattern = '{project_root}/scene'

While this works:

path = 'P:/evo/scene'
pattern = 'P:/{project_root}/scene'
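That matches how such patterns are typically compiled: each {token} becomes a named group that excludes the path separator, so a token can never span “/”. An illustrative comparison (these regexes are my own, not lucidity’s actual compiled expressions):

```python
import re

# Typical token expression: one path component, no '/'.
strict = re.compile(r'^(?P<project_root>[^/]+)/scene$')
# A permissive alternative that would let a token span separators.
greedy = re.compile(r'^(?P<project_root>.+)/scene$')

path = 'P:/evo/scene'
assert strict.match(path) is None                        # token stops at '/'
assert greedy.match(path).group('project_root') == 'P:/evo'
```

That restriction is usually deliberate: it keeps matches unambiguous when several tokens sit next to each other in one pattern.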