* git's fascination with absolute paths
From: J Chapman Flack @ 2009-12-21 18:42 UTC
To: git
Hi list,
I have a requirement involving the reasonably common Unix-y design
pattern where a directory owned by a particular user lives in a
directory that user can't access, with a setuid gate that chdirs to
the right place and then drops to the real user's id to do certain
allowed things.
I wanted to use git for some of those allowed things and I can't,
because the code seems to call make_absolute_path on approximately
everything, and this is one of the situations that illustrates why it
isn't safe to assume you can get an absolute path that's even
usable (let alone race-free) corresponding to a relative path
in general.
git init tries to do this on the db path (even if
it's specified explicitly in GIT_DIR), and even if I do the git init
as root in advance, all the other git subcommands I've tried also
try to do it and fail when they chdir up out of the
working directory and try to chdir (not fchdir) back in.
In general it seems best for a program to stay free of assumptions
about absolute paths except when there is a specific functional
requirement that needs them. I assume there is something git does
that requires it to have this limitation, but it's not intuitive
to me if I just think about what I expect an scm system to do.
I've searched on 'absolute' in the list archive to see if there
was a past discussion like "we've decided we need absolute paths
everywhere because X" but I didn't find any. Can someone
describe what the reasoning was? A security concern perhaps?
(And one more serious than the race condition built into
make_absolute_path?)
Or, perhaps I should be asking, what is there in git that will
break if I recompile it with make_absolute_path(p){return p;}?
Does it store absolute paths in the db? Would a recompiled
version produce a db other gits couldn't read?
(Or less drastic perhaps, what if there were a version of m_a_p
that still returned an absolute path when possible and safe, and
just returned p otherwise so the program could still be usable?)
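A minimal sketch of that less drastic variant, written in Python rather than git's C so it can be run on its own. The name absolute_if_possible is hypothetical and this is not git's make_absolute_path; it only illustrates "absolute when possible and usable, otherwise return p":

```python
import os

def absolute_if_possible(path):
    """Return an absolute form of `path` when one is usable, else `path` itself."""
    if os.path.isabs(path):
        return path
    try:
        cwd = os.getcwd()  # raises OSError where getcwd(3) fails
    except OSError:
        return path  # no absolute form available: keep the caller's path
    candidate = os.path.join(cwd, path)
    try:
        # Only trust the absolute form if it can be traversed and
        # names the same file as the relative one.
        if os.path.samestat(os.stat(candidate), os.stat(path)):
            return candidate
    except OSError:
        pass
    return path
```

With this shape, a directory that is reachable only through the current working directory keeps working, because the relative path is returned whenever the absolute one cannot be resolved.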
Thanks,
Chapman Flack
mathematics, purdue university
* Re: git's fascination with absolute paths
From: Avery Pennarun @ 2009-12-21 22:09 UTC
To: J Chapman Flack; +Cc: git
On Mon, Dec 21, 2009 at 1:42 PM, J Chapman Flack <jflack@math.purdue.edu> wrote:
> In general it seems best for a program to stay free of assumptions
> about absolute paths except when there is a specific functional
> requirement that needs them. I assume there is something git does
> that requires it to have this limitation, but it's not intuitive
> to me if I just think about what I expect an scm system to do.
> I've searched on 'absolute' in the list archive to see if there
> was a past discussion like "we've decided we need absolute paths
> everywhere because X" but I didn't find any. Can someone
> describe what the reasoning was? A security concern perhaps?
> (And one more serious than the race condition built into
> make_absolute_path?)
I think it's probably just because it's easier to deal with absolute
paths than relative ones. Those ".." things can be annoying,
particularly inside scripts, etc, and git uses a lot of scripts. Much
more straightforward to just normalize all the paths once and be sure
there are no weird dots in them.
Not to say that it can't be done... just that it seems nobody has been
inspired to do so. I'm guessing most of the existing developers still
won't be inspired to do it based on your rather unusual use case;
however, they might accept patches.
> Or, perhaps I should be asking, what is there in git that will
> break if I recompile it with make_absolute_path(p){return p;}?
> Does it store absolute paths in the db? Would a recompiled
> version produce a db other gits couldn't read?
You might try this and then see what happens in 'make test'. I
imagine a set of clean patches that removed a lot of assumptions about
absolute paths, without breaking any unit tests, would be something
worth considering for integration into git.
Have fun,
Avery
* Re: git's fascination with absolute paths
From: Junio C Hamano @ 2009-12-22 0:26 UTC
To: Avery Pennarun; +Cc: J Chapman Flack, git
Avery Pennarun <apenwarr@gmail.com> writes:
> On Mon, Dec 21, 2009 at 1:42 PM, J Chapman Flack <jflack@math.purdue.edu> wrote:
>> In general it seems best for a program to stay free of assumptions
>> about absolute paths except when there is a specific functional
>> requirement that needs them. I assume there is something git does
>> that requires it to have this limitation, but it's not intuitive
>> to me if I just think about what I expect an scm system to do.
>> I've searched on 'absolute' in the list archive to see if there
>> was a past discussion like "we've decided we need absolute paths
>> everywhere because X" but I didn't find any. Can someone
>> describe what the reasoning was? A security concern perhaps?
>> (And one more serious than the race condition built into
>> make_absolute_path?)
>
> I think it's probably just because it's easier to deal with absolute
> paths than relative ones. Those ".." things can be annoying,
> particularly inside scripts, etc, and git uses a lot of scripts. Much
> more straightforward to just normalize all the paths once and be sure
> there are no weird dots in them.
Not really. The scripts can work with ".." just fine, as long as they
know how to use "cd_to_toplevel" and "rev-parse --show-prefix" correctly.
While I do not necessarily agree with the original claim that hiding
higher levels of the hierarchy is "standard" practice in UNIX (it instead
falls into the "an unusual set-up that is permitted but you have to be
careful" category), I don't think it is fundamental that we need read
access all the way up to the root level. It is only that getcwd(3) does.
At the basic level, work tree and index operations operate relative to the root
of the work tree. Originally, almost no provision was made to run from a
subdirectory of a work tree (you were expected to run from the top-level,
having ./.git as the meta information store), and we didn't have to run
any getcwd(3). Later we added "look at parent directories until we find
the one that has .git subdirectory, while remembering how many levels we
went up", in order to support operations from a subdirectory of a work
tree. The commands chdir up to the root of the work tree and would use
the path they climbed as a pathspec to limit the scope of their operation.
While "counting how many levels we went up" can be expressed by a sequence
of "../", turning it to the directory prefix means at some point you would
need to do what getcwd(3) does. It wants to be able to read ".." to give
you an absolute path.
By rewriting that part of the "root-level-discovery" code to do something
like
- while test -d .git is not true:
- stat(".") to get the inum;
- chdir(".."); and
- opendir(".") and readdir() to find where we were;
while going up every level, you should be able to construct the prefix
without being able to read all the way up to the filesystem root. You
only need to be able to read your work tree.
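The loop above can be sketched as follows. This is an illustrative Python version, not git's code; the function name climb_to_root and its marker parameter are made up for the example, and os.stat, os.chdir, and os.listdir stand in for stat(2), chdir(2), and opendir/readdir(3):

```python
import os

def climb_to_root(marker=".git"):
    """chdir upward until `marker` is a directory here; return the prefix climbed.

    On success the process is left in the work tree root and the return
    value is the path from that root down to the starting directory
    ("" if we started at the root). Returns None if no root was found.
    """
    climbed = []  # directory names, collected deepest-first
    while not os.path.isdir(marker):
        here = os.stat(".")  # remember (device, inode) of where we are
        try:
            os.chdir("..")
            names = os.listdir(".")
        except OSError:
            return None  # an ancestor we may not enter or read: give up
        up = os.stat(".")
        if (up.st_dev, up.st_ino) == (here.st_dev, here.st_ino):
            return None  # ".." is "." again: filesystem root, no marker
        for name in names:
            try:
                st = os.lstat(name)
            except OSError:
                continue  # entry vanished or is not stat-able; skip it
            if (st.st_dev, st.st_ino) == (here.st_dev, here.st_ino):
                climbed.append(name)  # this entry is where we came from
                break
        else:
            return None  # our old directory is not listed in the parent
    return "/".join(reversed(climbed))
```

Nothing here touches any directory above the one that holds the marker, so read access above the work tree is never needed.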
Admittedly the code complexity got worse later when we added support for
GIT_WORK_TREE and also GIT_CEILING_DIRECTORIES, as they fundamentally need
to know where you are relative to the root of the filesystem tree and need
a working getcwd(3) support, which J Chapman's set-up refuses to give.
Also I wouldn't be surprised if the support for these two features cheats
in order to reduce code complexity by always using absolute paths even in
places where a path relative to the root of the work tree might have
sufficed.
* Re: git's fascination with absolute paths
From: Junio C Hamano @ 2009-12-22 6:30 UTC
To: Avery Pennarun; +Cc: J Chapman Flack, git
Junio C Hamano <gitster@pobox.com> writes:
> Not really. The scripts can work with ".." just fine, as long as they
> know how to use "cd_to_toplevel" and "rev-parse --show-prefix" correctly.
>
> While I do not necessarily agree with the original claim that hiding
> higher levels of the hierarchy is "standard" practice in UNIX (it instead
> falls into the "an unusual set-up that is permitted but you have to be
> careful" category), I don't think it is fundamental that we need read
> access all the way up to the root level. It is only that getcwd(3) does.
>
> At the basic level, work tree and index operations operate relative to the root
> of the work tree. Originally, almost no provision was made to run from a
> subdirectory of a work tree (you were expected to run from the top-level,
> having ./.git as the meta information store), and we didn't have to run
> any getcwd(3). Later we added "look at parent directories until we find
> the one that has .git subdirectory, while remembering how many levels we
> went up", in order to support operations from a subdirectory of a work
> tree. The commands chdir up to the root of the work tree and would use
> the path they climbed as a pathspec to limit the scope of their operation.
>
> While "counting how many levels we went up" can be expressed by a sequence
> of "../", turning it to the directory prefix means at some point you would
> need to do what getcwd(3) does. It wants to be able to read ".." to give
> you an absolute path.
>
> By rewriting that part of the "root-level-discovery" code to do something
> like
>
> - while test -d .git is not true:
> - stat(".") to get the inum;
> - chdir(".."); and
> - opendir(".") and readdir() to find where we were;
>
> while going up every level, you should be able to construct the prefix
> without being able to read all the way up to the filesystem root. You
> only need to be able to read your work tree.
>
> Admittedly the code complexity got worse later when we added support for
> GIT_WORK_TREE and also GIT_CEILING_DIRECTORIES, as they fundamentally need
> to know where you are relative to the root of the filesystem tree and need
> a working getcwd(3) support, which J Chapman's set-up refuses to give.
>
> Also I wouldn't be surprised if the support for these two features cheat
> in order to reduce code complexity by always using absolute paths even in
> places where a path relative to the root of the work tree might have
> sufficed.
Clarification.
The above, like many other messages from me, was not meant as a
justification, but a mere explanation of the historical fact. IOW, don't
get me wrong by interpreting that I am not interested in seeing a solution
that does not use absolute paths.
While I think the original "higher levels in the filesystems may not be
accessible" is a rather unusual set-up, making paths absolute and relying
on being able to always do so has another drawback in a not-so-unusual
setup. A work tree that is shallow (say, has only one t/ subdirectory and
short filenames) may not be usable if it is so deep in the filesystem
hierarchy that the result of getcwd(3) exceeds PATH_MAX. The "hand roll
what getcwd(3) did in traditional UNIX while looking for the root level of
the work tree" approach I outlined in the previous message would be a way
to fix such a use case; as long as the deepest path in your work tree,
relative to the root of the work tree, does not exceed PATH_MAX, you'll be
OK.
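To make the arithmetic concrete, here is a small Python sketch. The helper name exceeds_path_max is hypothetical, and the 4096 used in the usage lines is only a typical Linux value for PATH_MAX, not something git defines:

```python
import os

def exceeds_path_max(cwd, relative, limit=None):
    """True if cwd joined with `relative`, plus the trailing NUL, does not fit in `limit` bytes."""
    if limit is None:
        # Ask the system for its limit; the value is platform dependent.
        limit = os.pathconf("/", "PC_PATH_MAX")
    full = os.path.join(cwd, relative)
    return len(os.fsencode(full)) + 1 > limit
```

For example, with limit=4096, exceeds_path_max("/" + "d/" * 2046, "t/test.sh", limit=4096) is true, while the same file addressed relative to the work tree root, exceeds_path_max(".", "t/test.sh", limit=4096), is false.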
We have a few known issues in the GIT_WORK_TREE support (IIRC, it has a
funny interaction with alias expansion). When we reexamine the codepaths
that the introduction of the feature needed to touch, I would love to see
us at least try to see if it is feasible to redo this without calling
getcwd(3) when no GIT_WORK_TREE (or core.worktree) is set.
* Re: git's fascination with absolute paths
From: J Chapman Flack @ 2009-12-22 17:21 UTC
To: git
Junio C Hamano wrote:
> Clarification.
>
> The above, like many other messages from me, was not meant as a
> justification, but a mere explanation of the historical fact. IOW,
> don't get me wrong by interpreting that I am not interested in seeing
> a solution that does not use absolute paths.
No worries - a sense of the history is exactly the kind of response
I was hoping for. :)
> While I think the original "higher levels in the filesystems may not
> be accessible" is a rather unusual set-up, making paths absolute and
> relying on being able to always do so have another drawback in a
> not-so-unusual setup. A work tree that is shallow (say, has only one
> t/ subdirectory and short filenames) may not be usable if it is so
> deep in the filesystem hierarchy that the result of getcwd(3) exceeds
> PATH_MAX. The "hand roll
This is funny; I think in my own career I've seen applications that keep
sensitive data in subtrees restricted at the top somewhat routinely (which
might not mean "really often in absolute terms" but something more like
"whenever it made sense for the app") going back at least as far as uucp
(IIRC) ... Solaris Zones are set up that way too (when viewed from the
global zone). By the same token, while I'd never be surprised to hear
of someone with deep hierarchies that exceed PATH_MAX, I'm not sure I've
ever actually seen it happen myself.
I don't intend that as the start of a "which case is more unusual"
comparison, I think it just illustrates the difficulty in making such
judgments of usualness, as different people's career trajectories expose
them to very different things. I'd rather spend the mental effort trying
to extract whatever general principle can be used to code robustly so the
fewest judgments of usualness need to be made. Here the general principle
(which I think you've already kind of stated yourself, so I'm not trying
to preach but just to finish the thought) is that, for various reasons,
trying to transform a relative into an absolute path is not always well
defined, can't even always be done, can have implementation-dependent
side effects and race conditions, and ought to be an operation that gets
pulled out of the arsenal only with deliberation and only for specific
paths that must be made absolute if that's the only way to satisfy some
known functional requirement of the app.
It might be tied in to the principle of least astonishment, just by
assuming that whatever path the caller, user, or admin provides is
probably the path s/he wants you to use, for any number of possible
reasons that you don't need to be able to foresee.
>> By rewriting that part of the "root-level-discovery" code to do
>> something like
>>
>> - while test -d .git is not true:
>> - stat(".") to get the inum;
>> - chdir(".."); and
>> - opendir(".") and readdir() to find where we were;
>>
>> while going up every level, you should be able to construct the prefix
>> without being able to read all the way up to the filesystem root. You
>> only need to be able to read your work tree.
Yes, that's exactly what I would have suggested for the root level
discovery. As long as you can find .git before reaching any inaccessible
ancestor, life is good. (And if you can't it's a perfectly good reason
to give up without astonishing the user.)
It would still be better to open "." once at the beginning and
fchdir back to it, rather than trying to chdir back to the
constructed path string (an inherent race condition), just as the
Notes section in linux getcwd(3) says.
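A minimal sketch of that pattern in Python, where os.open and os.fchdir wrap open(2) and fchdir(2). The helper name remember_cwd is made up for the example:

```python
import os
from contextlib import contextmanager

@contextmanager
def remember_cwd():
    """Hold an open descriptor on "." and fchdir back to it on exit."""
    fd = os.open(".", os.O_RDONLY)  # needs read permission on "."
    try:
        yield
    finally:
        os.fchdir(fd)  # returns to the same directory, whatever its name is now
        os.close(fd)
```

Inside "with remember_cwd():" the code can chdir wherever it likes; the return trip never depends on a path string, so it still works if an ancestor is unreadable or has been renamed in the meantime.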
An interesting point, the chdir ../opendir/readdir algorithm you
give above is no longer necessarily what getcwd does, though it
traditionally was. These days there's often a kernel name cache
that getcwd gets at through a syscall or /proc. You can tell,
because when I tried to use 'git init' in my situation, the error
was not from getcwd but from the later access() call done on the
full path that getcwd successfully returned.
[there's a side issue: access(2) isn't for what a lot of people
think it's for. It's a rather esoteric call for testing access
by the real ids instead of the effective ones, and it's needed
in code that (a) runs set{u,g}id AND (b) wants to confirm that
a user-specified file is really something the user has rights to.
Using access() routinely just to ask "can I open this" has
a couple of problems: (1) it has a race condition and gives you
less information than just trying the open and testing errno;
(2) if anybody down the road does try to use your program or
code under set{u,g}id circumstances, astonishing failures can
result. Even for its intended purpose access() suffers from a
race; current OSes make it fairly easy to just drop to the real
user's IDs, try the open, and test errno, so access() is a bit
of a dinosaur.]
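As a sketch of "just try the open and test errno", in Python (the function name open_or_errno is hypothetical):

```python
import errno
import os

def open_or_errno(path):
    """Try the open itself; return (fd, 0) on success or (None, errno) on failure."""
    try:
        return os.open(path, os.O_RDONLY), 0
    except OSError as exc:
        # exc.errno says why: errno.EACCES, errno.ENOENT, errno.ENOTDIR, ...
        return None, exc.errno
```

There is no window between a check and the use, and the caller learns the actual reason for a failure instead of a yes/no answer computed against the real ids.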
Anyway, it would also be possible to make use of the modern
optimized getcwd in the root-level-discovery algorithm. If getcwd
gives a result you can just follow it backwards until you find .git,
confirming in each step that you can stat the name of the child and
the inums match (an O(1) test instead of O(directory-size) for readdir).
Just don't follow it back past the directory containing .git. The
slower fallback code is only needed for less modern systems in case
getcwd returns EACCES. But now we're in the realm of optimization;
the traditional loop does the trick.
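A rough Python sketch of that optimized variant, under the assumption that getcwd succeeds. The name prefix_from_getcwd is hypothetical, and a None result means "fall back to the traditional loop":

```python
import os

def prefix_from_getcwd(marker=".git"):
    """Find the work tree root by walking the getcwd() result backwards.

    Returns the prefix from the root to the current directory, or None if
    getcwd() fails, a name does not check out, or no marker is found.
    """
    try:
        remaining = os.getcwd()
    except OSError:
        return None
    here = "."  # relative name of the directory being examined
    climbed = []
    while True:
        if os.path.isdir(os.path.join(here, marker)):
            return "/".join(reversed(climbed))  # stop at the root, go no higher
        name = os.path.basename(remaining)
        if not name:
            return None  # reached "/" without finding the marker
        parent = os.path.join(here, "..")
        try:
            # One stat per level: does parent/name still name this directory?
            same = os.path.samestat(os.stat(os.path.join(parent, name)),
                                    os.stat(here))
        except OSError:
            return None
        if not same:
            return None  # the getcwd() answer does not match what is on disk
        climbed.append(name)
        remaining = os.path.dirname(remaining)
        here = parent
```

The marker is tested before each step upward, so nothing above the directory containing it is ever stat'ed, and no directory is read.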
(well, one nice feature of the modern approach, beyond speed, is that
it only needs x on the directories up to .git and not r. but since
I'm doubtful anything else git does would actually work without r,
that's probably moot.)
>> Admittedly the code complexity got worse later when we added support
>> for GIT_WORK_TREE and also GIT_CEILING_DIRECTORIES, as they
>> fundamentally need to know where you are relative to the root of the
>> filesystem tree and need a working getcwd(3) support
That's the kind of thing I wondered about, are there particular
features that genuinely require an absolute path? I'm a git newbie
and I don't know what these features do, so I can't comment. (This
project was going to be my excuse for learning git, but rcs actually
suffices and I need to get it done.)
Do these features actually need to traverse the full path, or just
to know what it is? On a system with modern getcwd that can return
the path even if it isn't traversable, could they make use of that?
-Chap