* git's fascination with absolute paths
From: J Chapman Flack @ 2009-12-21 18:42 UTC
To: git

Hi list,

I have a requirement involving the reasonably common Unix-y design
pattern where a directory owned by a particular user lives in a
directory that user can't access, with a setuid gate that chdirs to
the right place and then drops to the real user's id to do certain
allowed things.

I wanted to use git for some of those allowed things and I can't,
because the code seems to call make_absolute_path on approximately
everything, and this is one of the situations that illustrates why
it isn't safe to assume you can get an absolute path that's even
usable (let alone race-free) corresponding to a relative path in
general.

git init tries to do this on the db path (even if it's specified
explicitly in GIT_DIR), and even if I do the git init as root in
advance, all the other git subcommands I've tried also try to do it
and fail when they chdir up out of the working directory and try to
chdir (not fchdir) back in.

In general it seems best for a program to stay free of assumptions
about absolute paths except when there is a specific functional
requirement that needs them. I assume there is something git does
that requires it to have this limitation, but it's not intuitive
to me if I just think about what I expect an scm system to do.
I've searched on 'absolute' in the list archive to see if there
was a past discussion like "we've decided we need absolute paths
everywhere because X" but I didn't find any. Can someone
describe what the reasoning was? A security concern perhaps?
(And one more serious than the race condition built into
make_absolute_path?)

Or, perhaps I should be asking, what is there in git that will
break if I recompile it with make_absolute_path(p){return p;}?
Does it store absolute paths in the db? Would a recompiled
version produce a db other gits couldn't read?

(Or less drastic perhaps, what if there were a version of m_a_p
that still returned an absolute path when possible and safe, and
just returned p otherwise so the program could still be usable?)

Thanks,
Chapman Flack
mathematics, purdue university
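[Editorial sketch] The "less drastic" variant asked about at the end of
the message above might look roughly like the following. This is only an
illustration under stated assumptions: abs_path_or_same is a hypothetical
name, it is not git's make_absolute_path, and it leans on POSIX
realpath(3) with a NULL buffer rather than on anything in git's source.
Whether git could tolerate the relative fallback is exactly the open
question in the thread.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): return a malloc'd absolute,
 * resolved form of `path` when realpath(3) can produce one, and a
 * malloc'd copy of `path` unchanged otherwise (for example when an
 * ancestor directory is not searchable, or the path does not exist
 * yet). The caller frees the result. Returns NULL only when memory
 * allocation fails.
 */
char *abs_path_or_same(const char *path)
{
	char *resolved = realpath(path, NULL);

	if (resolved)
		return resolved;
	return strdup(path);
}
```

A caller would use the result wherever it previously used the absolute
path, and must not assume that it starts with "/".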
* Re: git's fascination with absolute paths
From: Avery Pennarun @ 2009-12-21 22:09 UTC
To: J Chapman Flack; +Cc: git

On Mon, Dec 21, 2009 at 1:42 PM, J Chapman Flack <jflack@math.purdue.edu> wrote:
> In general it seems best for a program to stay free of assumptions
> about absolute paths except when there is a specific functional
> requirement that needs them. I assume there is something git does
> that requires it to have this limitation, but it's not intuitive
> to me if I just think about what I expect an scm system to do.
> I've searched on 'absolute' in the list archive to see if there
> was a past discussion like "we've decided we need absolute paths
> everywhere because X" but I didn't find any. Can someone
> describe what the reasoning was? A security concern perhaps?
> (And one more serious than the race condition built into
> make_absolute_path?)

I think it's probably just because it's easier to deal with absolute
paths than relative ones. Those ".." things can be annoying,
particularly inside scripts, etc, and git uses a lot of scripts. Much
more straightforward to just normalize all the paths once and be sure
there are no weird dots in them.

Not to say that it can't be done... just that it seems nobody has been
inspired to do so. I'm guessing most of the existing developers still
won't be inspired to do it based on your rather unusual use case;
however, they might accept patches.

> Or, perhaps I should be asking, what is there in git that will
> break if I recompile it with make_absolute_path(p){return p;}?
> Does it store absolute paths in the db? Would a recompiled
> version produce a db other gits couldn't read?

You might try this and then see what happens in 'make test'. I imagine
a set of clean patches that removed a lot of assumptions about
absolute paths, without breaking any unit tests, would be something
worth considering for integration into git.

Have fun,

Avery
* Re: git's fascination with absolute paths
From: Junio C Hamano @ 2009-12-22 0:26 UTC
To: Avery Pennarun; +Cc: J Chapman Flack, git

Avery Pennarun <apenwarr@gmail.com> writes:

> On Mon, Dec 21, 2009 at 1:42 PM, J Chapman Flack <jflack@math.purdue.edu> wrote:
>> In general it seems best for a program to stay free of assumptions
>> about absolute paths except when there is a specific functional
>> requirement that needs them. I assume there is something git does
>> that requires it to have this limitation, but it's not intuitive
>> to me if I just think about what I expect an scm system to do.
>> I've searched on 'absolute' in the list archive to see if there
>> was a past discussion like "we've decided we need absolute paths
>> everywhere because X" but I didn't find any. Can someone
>> describe what the reasoning was? A security concern perhaps?
>> (And one more serious than the race condition built into
>> make_absolute_path?)
>
> I think it's probably just because it's easier to deal with absolute
> paths than relative ones. Those ".." things can be annoying,
> particularly inside scripts, etc, and git uses a lot of scripts. Much
> more straightforward to just normalize all the paths once and be sure
> there are no weird dots in them.

Not really. The scripts can work with ".." just fine, as long as they
know how to use "cd_to_topdir" and "rev-parse --show-prefix" correctly.

While I do not necessarily agree with the original claim that hiding
higher levels of the hierarchy is "standard" practice in UNIX (it instead
falls into "an unusual set-up that is permitted but you have to be
careful" category), I don't think it is fundamental that we need read
access all the way up to the root level. It is only that getcwd(3) does.

At the basic level, work tree and index operations operate relative to
the root of the work tree. Originally, almost no provision was made to
run from a subdirectory of a work tree (you were expected to run from the
top-level, having ./.git as the meta information store), and we didn't
have to run any getcwd(3). Later we added "look at parent directories
until we find the one that has .git subdirectory, while remembering how
many levels we went up", in order to support operations from a
subdirectory of a work tree. The commands chdir up to the root of the
work tree and would use the path they climbed as a pathspec to limit the
scope of their operation.

While "counting how many levels we went up" can be expressed by a sequence
of "../", turning it into the directory prefix means at some point you
would need to do what getcwd(3) does. It wants to be able to read ".." to
give you an absolute path.

By rewriting that part of the "root-level-discovery" code to do something
like

 - while test -d .git is not true:
   - stat(".") to get the inum;
   - chdir(".."); and
   - opendir(".") and readdir() to find where we were;

while going up every level, you should be able to construct the prefix
without being able to read all the way up to the filesystem root. You
only need to be able to read your work tree.

Admittedly the code complexity got worse later when we added support for
GIT_WORK_TREE and also GIT_CEILING_DIRECTORIES, as they fundamentally need
to know where you are relative to the root of the filesystem tree and need
a working getcwd(3) support, which J Chapman's set-up refuses to give.

Also I wouldn't be surprised if the support for these two features cheat
in order to reduce code complexity by always using absolute paths even in
places where a path relative to the root of the work tree might have
sufficed.
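[Editorial sketch] The root-level-discovery loop outlined in the message
above could be written in C roughly as follows. This is a minimal sketch
under assumptions, not git's actual setup code: discover_prefix is a
hypothetical name, entries are matched by comparing st_dev and st_ino
from lstat(), and the process is left wherever the climb stopped.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not git code). Climb from the current directory
 * until ".git" is a directory, recording the path climbed in `prefix`,
 * e.g. "a/b/" (or "" when already at the top level). On success returns
 * 0 and the process cwd is the work tree root. Returns -1 if no .git is
 * found, a parent cannot be entered or read, or `prefix` is too small;
 * in that case the cwd is left wherever the climb stopped.
 */
int discover_prefix(char *prefix, size_t size)
{
	struct stat here, up, ent, git;
	size_t len = 0;

	if (size == 0)
		return -1;
	prefix[0] = '\0';

	for (;;) {
		if (stat(".git", &git) == 0 && S_ISDIR(git.st_mode))
			return 0;

		/* stat(".") to get the inum, then step up one level */
		if (stat(".", &here) < 0 || chdir("..") < 0)
			return -1;
		if (stat(".", &up) < 0)
			return -1;
		if (up.st_dev == here.st_dev && up.st_ino == here.st_ino)
			return -1;	/* ".." is ".", so this is the filesystem root */

		/* opendir(".") and readdir() to find where we were */
		DIR *dir = opendir(".");
		if (!dir)
			return -1;
		struct dirent *de;
		const char *name = NULL;
		while ((de = readdir(dir)) != NULL) {
			if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
				continue;
			if (lstat(de->d_name, &ent) == 0 &&
			    ent.st_dev == here.st_dev &&
			    ent.st_ino == here.st_ino) {
				name = de->d_name;
				break;
			}
		}
		if (!name) {
			closedir(dir);
			return -1;
		}

		/* prepend "name/" to the prefix built so far */
		size_t n = strlen(name);
		if (len + n + 2 > size) {
			closedir(dir);
			return -1;
		}
		memmove(prefix + n + 1, prefix, len + 1);
		memcpy(prefix, name, n);
		prefix[n] = '/';
		len += n + 1;
		closedir(dir);
	}
}
```

Nothing here reads any directory above the one that contains .git, which
is the property the message is after. The cost is one readdir() scan per
level climbed.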
* Re: git's fascination with absolute paths
From: Junio C Hamano @ 2009-12-22 6:30 UTC
To: Avery Pennarun; +Cc: J Chapman Flack, git

Junio C Hamano <gitster@pobox.com> writes:

> Not really. The scripts can work with ".." just fine, as long as they
> know how to use "cd_to_topdir" and "rev-parse --show-prefix" correctly.
>
> While I do not necessarily agree with the original claim that hiding
> higher levels of the hierarchy is "standard" practice in UNIX (it instead
> falls into "an unusual set-up that is permitted but you have to be
> careful" category), I don't think it is fundamental that we need read
> access all the way up to the root level. It is only that getcwd(3) does.
>
> At the basic level, work tree and index operations operate relative to the root
> of the work tree. Originally, almost no provision was made to run from a
> subdirectory of a work tree (you were expected to run from the top-level,
> having ./.git as the meta information store), and we didn't have to run
> any getcwd(3). Later we added "look at parent directories until we find
> the one that has .git subdirectory, while remembering how many levels we
> went up", in order to support operations from a subdirectory of a work
> tree. The commands chdir up to the root of the work tree and would use
> the path they climbed as a pathspec to limit the scope of their operation.
>
> While "counting how many levels we went up" can be expressed by a sequence
> of "../", turning it into the directory prefix means at some point you would
> need to do what getcwd(3) does. It wants to be able to read ".." to give
> you an absolute path.
>
> By rewriting that part of the "root-level-discovery" code to do something
> like
>
>  - while test -d .git is not true:
>    - stat(".") to get the inum;
>    - chdir(".."); and
>    - opendir(".") and readdir() to find where we were;
>
> while going up every level, you should be able to construct the prefix
> without being able to read all the way up to the filesystem root. You
> only need to be able to read your work tree.
>
> Admittedly the code complexity got worse later when we added support for
> GIT_WORK_TREE and also GIT_CEILING_DIRECTORIES, as they fundamentally need
> to know where you are relative to the root of the filesystem tree and need
> a working getcwd(3) support, which J Chapman's set-up refuses to give.
>
> Also I wouldn't be surprised if the support for these two features cheat
> in order to reduce code complexity by always using absolute paths even in
> places where a path relative to the root of the work tree might have
> sufficed.

Clarification.

The above, like many other messages from me, was not meant as a
justification, but a mere explanation of the historical fact. IOW,
don't get me wrong by interpreting that I am not interested in seeing
a solution that does not use absolute paths.

While I think the original "higher levels in the filesystems may not
be accessible" is a rather unusual set-up, making paths absolute and
relying on being able to always do so has another drawback in a
not-so-unusual setup. A work tree that is shallow (say, has only one
t/ subdirectory and short filenames) may not be usable if it is so
deep in the filesystem hierarchy that the result of getcwd(3) exceeds
PATH_MAX. The "hand roll what getcwd(3) did in traditional UNIX while
looking for the root level of the work tree" approach I outlined in
the previous message will be a way to fix such a use case; as long as
the deepest level of your work tree relative to the root of the work
tree does not exceed PATH_MAX, you'll be OK.

We have a few known issues in the GIT_WORK_TREE support (IIRC, it has a
funny interaction with alias expansion). When we reexamine the codepath
that the introduction of the feature needed to touch, I would love to
see us at least try to see if it is feasible to redo this without
calling getcwd(3) when no GIT_WORK_TREE (or core.worktree) is set.
* Re: git's fascination with absolute paths
From: J Chapman Flack @ 2009-12-22 17:21 UTC
To: git

Junio C Hamano wrote:
> Clarification.
>
> The above, like many other messages from me, was not meant as a
> justification, but a mere explanation of the historical fact. IOW,
> don't get me wrong by interpreting that I am not interested in seeing
> a solution that does not use absolute paths.

No worries - a sense of the history is exactly the kind of response
I was hoping for. :)

> While I think the original "higher levels in the filesystems may not
> be accessible" is a rather unusual set-up, making paths absolute and
> relying on being able to always do so has another drawback in a
> not-so-unusual setup. A work tree that is shallow (say, has only one
> t/ subdirectory and short filenames) may not be usable if it is so
> deep in the filesystem hierarchy that the result of getcwd(3) exceeds
> PATH_MAX. The "hand roll

This is funny; I think in my own career I've seen applications that
keep sensitive data in subtrees restricted at the top somewhat
routinely (which might not mean "really often in absolute terms" but
something more like "whenever it made sense for the app") going back
at least as far as uucp (IIRC) ... Solaris Zones are set up that way
too (when viewed from the global zone). By the same token, while I'd
never be surprised to hear of someone with deep hierarchies that
exceed PATH_MAX, I'm not sure I've ever actually seen it happen myself.

I don't intend that as the start of a "which case is more unusual"
comparison, I think it just illustrates the difficulty in making such
judgments of usualness, as different people's career trajectories
expose them to very different things. I'd rather spend the mental
effort trying to extract whatever general principle can be used to
code robustly so the fewest judgments of usualness need to be made.

Here the general principle (which I think you've already kind of
stated yourself, so I'm not trying to preach but just to finish the
thought) is that, for various reasons, trying to transform a relative
into an absolute path is not always well defined, can't even always be
done, can have implementation-dependent side effects and race
conditions, and ought to be an operation that gets pulled out of the
arsenal only with deliberation and only for specific paths that must
be made absolute if that's the only way to satisfy some known
functional requirement of the app.

It might be tied in to the principle of least astonishment, just by
assuming that whatever path the caller, user, or admin provides is
probably the path s/he wants you to use, for any number of possible
reasons that you don't need to be able to foresee.

>> By rewriting that part of the "root-level-discovery" code to do
>> something like
>>
>>  - while test -d .git is not true:
>>    - stat(".") to get the inum;
>>    - chdir(".."); and
>>    - opendir(".") and readdir() to find where we were;
>>
>> while going up every level, you should be able to construct the prefix
>> without being able to read all the way up to the filesystem root. You
>> only need to be able to read your work tree.

Yes, that's exactly what I would have suggested for the root level
discovery. As long as you can find .git before reaching any
inaccessible ancestor, life is good. (And if you can't it's a
perfectly good reason to give up without astonishing the user.)

It would still be better to open "." once at the beginning and fchdir
back to it, rather than trying to chdir back to the constructed path
string (an inherent race condition), just as the Notes section in
Linux getcwd(3) says.
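[Editorial sketch] The "open . once and fchdir back" idea in the
paragraph above can be illustrated with two small helpers. The names
save_cwd and restore_cwd are hypothetical and nothing here is taken from
git. The sketch assumes the process may open "." for reading; POSIX also
defines O_SEARCH for this purpose, but not every system provides it.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: remember the current directory as an open
 * descriptor instead of as a path string. Returns the descriptor,
 * or -1 with errno set by open().
 */
int save_cwd(void)
{
	return open(".", O_RDONLY | O_DIRECTORY);
}

/*
 * Hypothetical helper: return to the directory remembered by
 * save_cwd() and release the descriptor. Returns 0 on success,
 * or -1 with errno set by fchdir().
 */
int restore_cwd(int fd)
{
	int ret = fchdir(fd);
	int saved_errno = errno;

	close(fd);
	errno = saved_errno;
	return ret;
}
```

Because the descriptor refers to the directory itself, restore_cwd()
still works if an ancestor was renamed in the meantime or cannot be
traversed by name, which is the case a chdir() to a reconstructed path
cannot handle.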
An interesting point, the chdir ../opendir/readdir algorithm you give
above is no longer necessarily what getcwd does, though it
traditionally was. These days there's often a kernel name cache that
getcwd gets at through a syscall or /proc. You can tell, because when
I tried to use 'git init' in my situation, the error was not from
getcwd but from the later access() call done on the full path that
getcwd successfully returned.

[there's a side issue: access(2) isn't for what a lot of people think
it's for. It's a rather esoteric call for testing access by the real
ids instead of the effective ones, and it's needed in code that
(a) runs set{u,g}id AND (b) wants to confirm that a user-specified
file is really something the user has rights to. Using access()
routinely just to ask "can I open this" has a couple of problems:
(1) it has a race condition and gives you less information than just
trying the open and testing errno; (2) if anybody down the road does
try to use your program or code under set{u,g}id circumstances,
astonishing failures can result. Even for its intended purpose
access() suffers from a race; current OSes make it fairly easy to just
drop to the real user's IDs, try the open, and test errno, so access()
is a bit of a dinosaur.]

Anyway, it would also be possible to make use of the modern optimized
getcwd in the root-level-discovery algorithm. If getcwd gives a result
you can just follow it backwards until you find .git, confirming in
each step that you can stat the name of the child and the inums match
(an O(1) test instead of O(directory-size) for readdir). Just don't
follow it back past the directory containing .git. The slower fallback
code is only needed for less modern systems in case getcwd returns
EACCES. But now we're in the realm of optimization; the traditional
loop does the trick. (well, one nice feature of the modern approach,
beyond speed, is that it only needs x on the directories up to .git
and not r. but since I'm doubtful anything else git does would
actually work without r, that's probably moot.)

>> Admittedly the code complexity got worse later when we added support
>> for GIT_WORK_TREE and also GIT_CEILING_DIRECTORIES, as they
>> fundamentally need to know where you are relative to the root of the
>> filesystem tree and need a working getcwd(3) support

That's the kind of thing I wondered about, are there particular
features that genuinely require an absolute path? I'm a git newbie and
I don't know what these features do, so I can't comment. (This project
was going to be my excuse for learning git, but rcs actually suffices
and I need to get it done.) Do these features actually need to
traverse the full path, or just to know what it is? On a system with
modern getcwd that can return the path even if it isn't traversable,
could they make use of that?

-Chap
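[Editorial sketch] The getcwd-guided variant described in the message
above (follow the string backwards, confirming each step with a stat of
the child's name) might be sketched like this. It is an illustration
only: find_root_offset is a hypothetical name, the caller is assumed to
have obtained `cwd` from getcwd() for the current directory, and the
fallback to the readdir() loop when getcwd() fails is left out.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#ifndef NAME_MAX
#define NAME_MAX 255	/* assumption for systems that leave it undefined */
#endif

/*
 * Hypothetical helper (not git code). `cwd` is the absolute path that
 * getcwd() reported for the current directory. Climb with chdir("..")
 * until ".git" is a directory, checking at each level that the last
 * component of the path, looked up in the parent, has the same device
 * and inode as the directory just left. Returns the length of the
 * leading part of `cwd` that names the work tree root, so that
 * cwd + offset is "" or "/sub/dir"; returns -1 on any failure.
 */
long find_root_offset(const char *cwd)
{
	size_t end = strlen(cwd);	/* cwd[0..end) names where we are */
	struct stat here, child, git;
	char name[NAME_MAX + 1];

	for (;;) {
		if (stat(".git", &git) == 0 && S_ISDIR(git.st_mode))
			return (long)end;

		/* locate the last path component of cwd[0..end) */
		size_t start = end;
		while (start > 0 && cwd[start - 1] != '/')
			start--;
		size_t n = end - start;
		if (start == 0 || n == 0 || n > NAME_MAX)
			return -1;	/* reached "/" or path not absolute */
		memcpy(name, cwd + start, n);
		name[n] = '\0';

		if (stat(".", &here) < 0 || chdir("..") < 0)
			return -1;
		/* one lstat() replaces the readdir() scan */
		if (lstat(name, &child) < 0 ||
		    child.st_dev != here.st_dev ||
		    child.st_ino != here.st_ino)
			return -1;

		end = start - 1;	/* drop "/name" */
	}
}
```

The lookup by name needs only search (x) permission on each directory
below the work tree root, which matches the observation in the message.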