I'll give you the idealistic view of what an Extractor should know, keeping in mind that in practice some of this may be bent.
This is an expansion of the CVEI definition from the Wiki, in particular the note on "out-of-band" access.
If we think of CVEI as a self-contained unit within which input enters through Collection and output exits through Integration, then Extraction should know nothing about the surrounding pipeline.
_______________________________
| ___ ___ ___ ___ |
| | | | | | | | | |
in --->|-| C |-->| V |-->| E |-->| I |-|---> out
| |___| |___| |___| |___| |
|_______________________________|
CVEI dataflow
Idealistically, that means even Validation should not be allowed to grasp for information outside of what already exists within the Context and Instances.
For example.
my_collector.py
# OK
import external_library_a
import external_library_b
class MyCollector(...):
...
my_extractor.py
# NOT OK
import external_library_a
my_integrator.py
# OK
import external_library_a
Collection and Integration reach out-of-band for information, that's fine. But extraction cannot. It should already have all it needs, and if it doesn't than there's more to collect during collection.
The idea is to isolate external requests to as few points as possible and keep the internals as simple and "dumb" as possible, so as to facilitate reuse in other plug-in stacks and/or pipelines. If each plug-in reaches out and grabs information ad-hoc (i.e. globally), you've got a recipe for tight coupling and poor reusability.
With the above in mind, the extractor's only responsibility is to extract. To serialise. To take something only it understands (e.g. the Maya scene format) and produce something other's can potentially understand (e.g. an .obj or .json or .abc).
It's unfortunate it must produce physical files, as it forces us to make a decision up-front about where to put this file. But I would make a conscious effort to make this location as un-global and simple as possible, such as a temporary directory.
Think of it like an in-memory location, that by limitations of Maya and others must have a filesystem address. Something being built during extraction, but not complete until all extractor finishes and not available to others until Integration has formatted and integrated it into the pipeline.
As a practical example to your question, I would define a temporary directory during collection, potentially based on an out-of-band mechanism (e.g. you have a pipelined temporary directory, for dedicated network access or whatever), and create/fill this during extraction.
If there are special rules related to formatting of names that you can establish during Collection, such as a global filename pattern, then I would make note of this too during Collection and use this during Extraction. Other things, such as versioning, may only be available during integration, in which case it's fine to rename these as you move/copy these into the public space.
class MyCollector(...):
def process(self, context):
context.data["tempdir"] = tempfile.mkdtemp()
context.data["alembicTemplate"] = "{project}_{asset}_alembic_{version}.abc"