* [wishlist] git-archive -L
From: Pierre Habouzit @ 2009-02-02 14:34 UTC
To: rene.scharfe; +Cc: git

Hi Rene,

I wanted to do that myself, but I sadly lack the time right now, so I
wonder if you'd know how to do the following.

We have in our repository a kind of modular system (for a family of web
sites) where each web site uses a (versioned) symlink farm. IOW it
works basically this way:

    www/module1
    www/module2
    product_A/www/module1 -> ../../www/module1
    product_A/www/module_A
    product_B/www/module1 -> ../../www/module1
    product_B/www/module2 -> ../../www/module2
    product_B/www/module_B

Though product_A and _B share a fair amount of code, they are separate
products, and when we release, we'd like to be able to perform from
inside:

    git archive --format=tar -L product_$A

where -L basically does what it does in cp: dereference symlinks. To
make the thing hairier, we also have symlinks _inside_ www/ (pointing
into the same subtree) that we'd like to keep if possible (even if it's
not a big deal).

So I'd suggest something where -L only dereferences a symlink if it
points outside of the list of paths passed to git-archive, and -LL (or
-L -L) dereferences everything. Of course this would only make sense if
the symlinks resolve to something that is tracked :)

For now we git archive the whole repository, use tar xh, rm what we
don't like, reset the symlinks we want to keep, and retar, which is
kind of counterproductive :)

-- 
·O·  Pierre Habouzit
··O  madcoder@debian.org
OOO  http://www.madism.org
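
The proposed -L / -LL rule can be sketched in a few lines of Python. This is
only an illustration of the decision described above, not git code: the
function names and the numeric `level` parameter are hypothetical, and the
target is resolved by plain string normalisation, so symlinked intermediate
directories are not handled.

```python
import posixpath


def is_under(path, roots):
    """Return True if path equals, or lies below, one of the archived roots."""
    return any(path == r or path.startswith(r.rstrip("/") + "/") for r in roots)


def should_dereference(link_path, target, archived_paths, level):
    """Decide whether a symlink would be replaced by its target.

    level is a hypothetical encoding of the wished-for options:
    0 = keep all symlinks, 1 = "-L", 2 = "-LL".
    """
    if level == 0 or posixpath.isabs(target):
        return False  # absolute targets can never be tracked content
    # Naive resolution: directory of the link joined with the target.
    resolved = posixpath.normpath(
        posixpath.join(posixpath.dirname(link_path), target))
    if resolved == ".." or resolved.startswith("../"):
        return False  # escapes the repository, so it cannot be tracked
    if level >= 2:
        return True  # -LL: dereference everything that stays inside the repo
    # -L: dereference only links that leave the archived paths.
    return not is_under(resolved, archived_paths)
```

With the layout above, `product_A/www/module1` would be dereferenced when
archiving `product_A` with level 1, while a link that stays inside
`product_A` would be kept unless level 2 is requested.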

* Re: [wishlist] git-archive -L
From: René Scharfe @ 2009-02-03 8:10 UTC
To: Pierre Habouzit; +Cc: git

Pierre Habouzit wrote:
> Hi Rene,
>
> I wanted to do that myself, but I sadly lack the time right now, so I
> wonder if you'd know how to do the following.
>
> We have in our repository a kind of modular system (for a family of web
> sites) where each web site uses a (versioned) symlink farm. IOW it
> works basically this way:
>
>     www/module1
>     www/module2
>     product_A/www/module1 -> ../../www/module1
>     product_A/www/module_A
>     product_B/www/module1 -> ../../www/module1
>     product_B/www/module2 -> ../../www/module2
>     product_B/www/module_B
>
> Though product_A and _B share a fair amount of code, they are separate
> products, and when we release, we'd like to be able to perform from
> inside:
>
>     git archive --format=tar -L product_$A
>
> where -L basically does what it does in cp: dereference symlinks. To
> make the thing hairier, we also have symlinks _inside_ www/ (pointing
> into the same subtree) that we'd like to keep if possible (even if it's
> not a big deal).
>
> So I'd suggest something where -L only dereferences a symlink if it
> points outside of the list of paths passed to git-archive, and -LL (or
> -L -L) dereferences everything. Of course this would only make sense if
> the symlinks resolve to something that is tracked :)

Last April, I was working on making archive follow all symlinks pointing
to internal files. The goal was a bit different, namely to create
archives for platforms without symlink support (i.e. it would resolve
all symlinks pointing to tracked objects). IIRC the code had some
limitations, e.g. it couldn't follow a symlink to a path containing
symlinked directories. I'll need to rebase it onto master first, though,
as the surrounding code has changed a bit in the meantime.

Following only symlinks that point outside of the specified paths sounds
like a sensible mode of operation, but I'm not sure that it's worth a
one-letter option.

Given your setup you also might want to take a look at submodules and
the recent submodule archival support patches by Lars Hjemli.

Anyway, I'll try to resurrect my old, incomplete symlink following code,
but I don't have much time, either. :-/

René

* Re: [wishlist] git-archive -L
From: René Scharfe @ 2009-02-04 23:00 UTC
To: Pierre Habouzit; +Cc: git

René Scharfe wrote:
> Anyway, I'll try to resurrect my old, incomplete symlink following code,
> but I don't have much time, either. :-/

After a second and a third look I don't see any salvageable parts in the
old code any more. It was just a prototype that taught me something I
should have been able to find out by thinking alone: that to follow
links within tracked content we can't simply jump to the target, but we
have to walk the whole path step by step.

E.g., consider a repository with these four entries:

    Type     Name  Target
    -------  ----  ------
    file     a/f
    symlink  a/x   f
    symlink  a/y   ../b/f
    symlink  b     a

Let's say our goal is to follow symlinks pointing to tracked content.

We can easily follow "a/x" to get to its target "f" by concatenating the
directory part of the symlink's path ("a/") with the target ("f"), i.e.
we only need to do a simple string operation.

If we do the same for "a/y", we'd arrive at "b/f", which is not a
tracked file by itself, though. We need to look up each path element
one by one and follow symlinks at each step. That can't be done with
our existing tree walkers, AFAICS, so we'd need to write a new one.

The decision to follow a link can be made by the callback and passed to
read_tree_recursive() as a return value, with, e.g., READ_TREE_FOLLOW
and READ_TREE_FOLLOW_NON_MATCHES meaning to follow all internal symlinks
and to follow only those whose target doesn't match the specified paths,
respectively.

René
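
The step-by-step lookup described in this message can be illustrated with a
small Python sketch over the four example entries. It is not git's tree
walker: the `ENTRIES` dictionary, the `resolve()` helper and the `max_links`
cap are hypothetical stand-ins chosen for the illustration.

```python
# Tracked entries from the example: path -> (type, symlink target or None).
ENTRIES = {
    "a/f": ("file", None),
    "a/x": ("symlink", "f"),
    "a/y": ("symlink", "../b/f"),
    "b": ("symlink", "a"),
}


def is_tracked_dir(path, entries):
    """A directory is tracked if at least one entry lives below it."""
    return any(p.startswith(path + "/") for p in entries)


def resolve(path, entries, max_links=40):
    """Resolve path one component at a time, following tracked symlinks.

    Returns the resolved tracked path, or None if it is not tracked.
    max_links is an arbitrary cap that stops symlink loops.
    """
    pending = path.split("/")
    done = []  # resolved components; never contains a symlink
    links = 0
    while pending:
        part = pending.pop(0)
        if part in ("", "."):
            continue
        if part == "..":
            if not done:
                return None  # would leave the repository
            done.pop()
            continue
        current = "/".join(done + [part])
        kind, target = entries.get(current, (None, None))
        if kind == "symlink":
            links += 1
            if links > max_links or target.startswith("/"):
                return None
            # The target is relative to the directory holding the link,
            # which is exactly what `done` describes at this point.
            pending = target.split("/") + pending
        elif kind == "file":
            return current if not pending else None
        elif is_tracked_dir(current, entries):
            done.append(part)
        else:
            return None
    return "/".join(done)
```

Here `resolve("a/y", ENTRIES)` passes through "b", follows it to "a", and
ends at "a/f", whereas the plain string concatenation yields "b/f", which is
not a key in `ENTRIES`.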

* Re: [wishlist] git-archive -L
From: Pierre Habouzit @ 2009-02-05 15:04 UTC
To: René Scharfe; +Cc: git

On Wed, Feb 04, 2009 at 11:00:18PM +0000, René Scharfe wrote:
> René Scharfe wrote:
> > Anyway, I'll try to resurrect my old, incomplete symlink following code,
> > but I don't have much time, either. :-/
>
> After a second and a third look I don't see any salvageable parts in the
> old code any more. It was just a prototype that taught me something I
> should have been able to find out by thinking alone: that to follow
> links within tracked content we can't simply jump to the target, but we
> have to walk the whole path step by step.
>
> E.g., consider a repository with these four entries:
>
>     Type     Name  Target
>     -------  ----  ------
>     file     a/f
>     symlink  a/x   f
>     symlink  a/y   ../b/f
>     symlink  b     a
>
> Let's say our goal is to follow symlinks pointing to tracked content.
>
> We can easily follow "a/x" to get to its target "f" by concatenating the
> directory part of the symlink's path ("a/") with the target ("f"), i.e.
> we only need to do a simple string operation.
>
> If we do the same for "a/y", we'd arrive at "b/f", which is not a
> tracked file by itself, though. We need to look up each path element
> one by one and follow symlinks at each step. That can't be done with
> our existing tree walkers, AFAICS, so we'd need to write a new one.

I mostly stumbled on those issues before I gave up, having no time to
understand how tree walkers work :/ Because of course, our symlinks are
exactly symlinks to directories, so not supporting them is unacceptable
to us.

> The decision to follow a link can be made by the callback and passed to
> read_tree_recursive() as a return value, with, e.g., READ_TREE_FOLLOW
> and READ_TREE_FOLLOW_NON_MATCHES meaning to follow all internal symlinks
> and to follow only those whose target doesn't match the specified paths,
> respectively.

It has to be more clever. If you consider something like:

    symlink a/b ..

Or funnier:

    symlink a/b ../../c
    symlink c/d ../../a

If you don't pay attention, you end up with a nice busy loop, and
really really really long path names (a/b/b/b/b/b.... for the first
one, and a/b/d/b/d/b/d/b/d/b/... for the latter).

That's why I was thinking of a more straightforward approach, basically
doing this:

  * when meeting a symlink to a blob, see if that blob is tracked or
    not, and if its "real" path in the repository is inside what we're
    archiving or not. Then match that with what the user asked for
    (following any symlinks -- if we want to, this looks like a pretty
    big security risk to me, and I see no good reason for that --, only
    tracked symlinks outside of the archived paths, or only tracked
    symlinks no matter what), and do it. This one is the almost easy
    bit.

  * when meeting a symlink to a directory, look at the pointee, and like
    for the file, see if it's "tracked" (IOW contains tracked files) and
    see if the user wants symlink replacement or not. If yes, then
    remember the current <path, pointed directory inside the
    repository> and put it in a worklist.

    When finishing the first "pass" of archiving, run a new archiving
    pass based on the worklist. Do it a few times, and if you don't
    converge to a fixed point where the worklist is empty, then you are
    likely to be in a situation like the ones I depicted earlier.

*phew*.

Though this needs quite a reengineering of the code, and I had (and
still don't really have) no time for it. But I think this is the
straightforward approach that would work easily (I don't know about zip
though, but in tar, where entries are not really sorted, it should
work).

-- 
·O·  Pierre Habouzit
··O  madcoder@debian.org
OOO  http://www.madism.org
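
The worklist idea from this message could look roughly like the following
Python sketch. It is a simplification under stated assumptions, not a patch
for git: `archive_with_worklist()` and `max_passes` are hypothetical names,
every internal symlink is followed (the "only outside the archived paths"
policy is left out), and symlink targets are assumed to be already
normalised to repository-relative paths.

```python
def archive_with_worklist(tree, root, max_passes=5):
    """Map archive paths to tracked source files, following symlinks.

    tree maps a tracked path to ("file", None) or ("symlink", target),
    where target is a repository-relative path (an assumption of this
    sketch). Raises ValueError if the worklist does not become empty
    within max_passes passes, which suggests a symlink loop.
    """
    output = {}
    worklist = [(root, root)]  # (path in the archive, directory in the repo)
    passes = 0
    while worklist:
        if passes == max_passes:
            raise ValueError("no fixed point reached: probable symlink loop")
        passes += 1
        next_worklist = []
        for arch_dir, repo_dir in worklist:
            prefix = repo_dir + "/"
            for path, (kind, target) in tree.items():
                if not path.startswith(prefix):
                    continue
                arch_path = arch_dir + "/" + path[len(prefix):]
                if kind == "file":
                    output[arch_path] = path
                elif tree.get(target, (None, None))[0] == "file":
                    output[arch_path] = target  # symlink to a tracked blob
                else:
                    # Presumably a directory: archive it in a later pass.
                    # Targets that turn out to be untracked add nothing.
                    next_worklist.append((arch_path, target))
        worklist = next_worklist
    return output


FARM = {
    "www/module1/index.php": ("file", None),
    "product_A/www/module1": ("symlink", "www/module1"),
    "product_A/www/module_A/a.php": ("file", None),
}
LOOP = {
    "a/f": ("file", None),
    "a/b": ("symlink", "c"),
    "c/d": ("symlink", "a"),
}
```

Archiving `product_A` from `FARM` converges after two passes, while `LOOP`
keeps producing ever longer paths of the a/b/d/b/d/... kind and is rejected
once `max_passes` is reached.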