* Using GIT to store /etc (Or: How to make GIT store all file permission bits) @ 2006-12-10 13:40 Kyle Moffett 2006-12-10 14:49 ` Jeff Garzik ` (3 more replies) 0 siblings, 4 replies; 34+ messages in thread From: Kyle Moffett @ 2006-12-10 13:40 UTC (permalink / raw) To: git I've recently become somewhat interested in the idea of using GIT to store the contents of various folders in /etc. However after a bit of playing with this, I discovered that GIT doesn't actually preserve all permission bits since that would cause problems with the more traditional software development model. I'm curious if anyone has done this before; and if so, how they went about handling the permissions and ownership issues. I spent a little time looking over how GIT stores and compares permission bits; trying to figure out if it's possible to patch in a new configuration variable or two; say "preserve_all_perms" and "preserve_owner", or maybe even "save_acls". It looks like standard permission preservation is fairly basic; you would just need to patch a few routines which alter the permissions read in from disk or compare them with ones from the database. On the other hand, it would appear that preserving ownership or full POSIX ACLs might be a bit of a challenge. Thanks for your insight and advice! Cheers, Kyle Moffett ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 13:40 Using GIT to store /etc (Or: How to make GIT store all file permission bits) Kyle Moffett @ 2006-12-10 14:49 ` Jeff Garzik 2006-12-10 15:30 ` Jakub Narebski 2006-12-10 15:06 ` Santi Béjar ` (2 subsequent siblings) 3 siblings, 1 reply; 34+ messages in thread From: Jeff Garzik @ 2006-12-10 14:49 UTC (permalink / raw) To: Kyle Moffett; +Cc: git Kyle Moffett wrote: > I've recently become somewhat interested in the idea of using GIT to > store the contents of various folders in /etc. However after a bit of > playing with this, I discovered that GIT doesn't actually preserve all > permission bits since that would cause problems with the more > traditional software development model. I'm curious if anyone has done > this before; and if so, how they went about handling the permissions and > ownership issues. > > I spent a little time looking over how GIT stores and compares > permission bits; trying to figure out if it's possible to patch in a new > configuration variable or two; say "preserve_all_perms" and > "preserve_owner", or maybe even "save_acls". It looks like standard > permission preservation is fairly basic; you would just need to patch a > few routines which alter the permissions read in from disk or compare > them with ones from the database. On the other hand, it would appear > that preserving ownership or full POSIX ACLs might be a bit of a challenge. It's a great idea, something I would like to do, and something I've suggested before. You could dig through the mailing list archives, if you're motivated. I actively use git to version, store and distribute an exim mail configuration across six servers. So far my solution has been a 'fix perms' script, or using the file perm checking capabilities of cfengine. But it would be a lot better if git natively cared about ownership and permissions (presumably via an option). 
Jeff
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 14:49 ` Jeff Garzik @ 2006-12-10 15:30 ` Jakub Narebski 2006-12-10 18:10 ` Kyle Moffett 0 siblings, 1 reply; 34+ messages in thread From: Jakub Narebski @ 2006-12-10 15:30 UTC (permalink / raw) To: git Jeff Garzik wrote: > Kyle Moffett wrote: >> >> I've recently become somewhat interested in the idea of using GIT to >> store the contents of various folders in /etc. However after a bit of >> playing with this, I discovered that GIT doesn't actually preserve all >> permission bits since that would cause problems with the more >> traditional software development model. I'm curious if anyone has done >> this before; and if so, how they went about handling the permissions and >> ownership issues. >> >> I spent a little time looking over how GIT stores and compares >> permission bits; trying to figure out if it's possible to patch in a new >> configuration variable or two; say "preserve_all_perms" and >> "preserve_owner", or maybe even "save_acls". It looks like standard >> permission preservation is fairly basic; you would just need to patch a >> few routines which alter the permissions read in from disk or compare >> them with ones from the database. On the other hand, it would appear >> that preserving ownership or full POSIX ACLs might be a bit of a challenge. > > It's a great idea, something I would like to do, and something I've > suggested before. You could dig through the mailing list archives, if > you're motivated. > > I actively use git to version, store and distribute an exim mail > configuration across six servers. So far my solution has been a 'fix > perms' script, or using the file perm checking capabilities of cfengine. Fix perms' script used on a checkout hook is a best idea I think. > But it would be a lot better if git natively cared about ownership and > permissions (presumably via an option). 
There is currently no place for ownership and extended attributes in the tree object, and even full POSIX permissions might be a challenge because, for example, the currently unused 'is socket' permission bit is used for the experimental commit-in-tree submodule support. And given Linus's stance that git is a "content tracker"...

In the loooong thread "VCS comparison table" there was some talk about using git (or any SCM) to manage /etc. Check out:

* Message-ID: <Pine.LNX.4.64.0610220926170.3962@g5.osdl.org>
  http://permalink.gmane.org/gmane.comp.version-control.git/29765
* Message-ID: <20061023051932.GA8625@evofed.localdomain>
  http://marc.theaimsgroup.com/?i=<20061023051932.GA8625@evofed.localdomain>

(and other messages in this subthread).

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 15:30 ` Jakub Narebski @ 2006-12-10 18:10 ` Kyle Moffett 2006-12-10 18:18 ` Jakub Narebski 2006-12-10 18:26 ` Jakub Narebski 0 siblings, 2 replies; 34+ messages in thread From: Kyle Moffett @ 2006-12-10 18:10 UTC (permalink / raw) To: Jakub Narebski; +Cc: git On Dec 10, 2006, at 10:30:00, Jakub Narebski wrote: > Jeff Garzik wrote: >> I actively use git to version, store and distribute an exim mail >> configuration across six servers. So far my solution has been a >> 'fix perms' script, or using the file perm checking capabilities >> of cfengine. > > Fix perms' script used on a checkout hook is a best idea I think. Hmm, unfortunately that has problems with security-related race conditions when used directly for /etc. Think about what happens with "/etc/shadow" in that case, for example. (/etc/.git is of course 0700) I'm sure there are others where non-root daemons get unhappy when they get an inotify event and their config files have suddenly become root:root:0600. I also want to be able to "cd /etc && git status" to see what changed after running "apt-get update" or maybe fiddling in SWAT or webmin, so a makefile which installs into / etc won't quite solve it either. It would also be nice to see when things change the permissions on files in /etc, or even bind-mount an append-only volume over /etc/.git/objects to provide additional data security. >> But it would be a lot better if git natively cared about ownership >> and permissions (presumably via an option). > > There is currently no place for ownership and extended attributes > in the tree object; and even full POSIX permissions might be > challenge because for example currently unused 'is socket' > permission bit is used for experimental commit-in-tree submodule > support. What about doing something crazy like "is socket" && "is directory" && "is symlink"? 
Or something else that old GIT versions would ignore and new GIT versions could do something useful with? Perhaps, like I mentioned in an earlier email, the new data could be stored as part of a modified "file" object. Alternatively, could a directory have a file named with an empty string with bogus mode bits which points to an extended-attributes-tree object?

> And given Linus stance that git is "content tracker"...

Extended attributes are content too! This includes things like icons, security labels (think unclassified/confidential/secret/top-secret/etc), ACLs, summaries, and other metadata. Content tracker purists could also just ignore the new default-off config options and be perfectly happy with the status quo. :-D

> In the loooong thread "VCS comparison table" there was some talk
> about using git (or any SCM) to manage /etc. Check out:
>
> * Message-ID: <Pine.LNX.4.64.0610220926170.3962@g5.osdl.org>
>   http://permalink.gmane.org/gmane.comp.version-control.git/29765
> * Message-ID: <20061023051932.GA8625@evofed.localdomain>
>   http://marc.theaimsgroup.com/?i=<20061023051932.GA8625@evofed.localdomain>
>
> (and other messages in this subthread).

I have, and while it's interesting material, that thread produced no real patches :-D. I'd like to introduce some new config options to control the new code: "preserve_full_perms", "preserve_posix_acls", "preserve_security_labels", and "preserve_user_xattrs", which default to false but when set modify GIT's behavior to store, retrieve, and compare additional data. If you have any suggestions on how to store the data such that old GIT ignores it, I'm all ears :-D.

Cheers,
Kyle Moffett
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 18:10 ` Kyle Moffett @ 2006-12-10 18:18 ` Jakub Narebski 2006-12-10 18:26 ` Jakub Narebski 1 sibling, 0 replies; 34+ messages in thread From: Jakub Narebski @ 2006-12-10 18:18 UTC (permalink / raw) To: git Kyle Moffett wrote: > On Dec 10, 2006, at 10:30:00, Jakub Narebski wrote: >> Jeff Garzik wrote: >>> >>> I actively use git to version, store and distribute an exim mail >>> configuration across six servers. So far my solution has been a >>> 'fix perms' script, or using the file perm checking capabilities >>> of cfengine. >> >> Fix perms' script used on a checkout hook is a best idea I think. > > Hmm, unfortunately that has problems with security-related race > conditions when used directly for /etc. Think about what happens > with "/etc/shadow" in that case, for example. (/etc/.git is of > course 0700) I'm sure there are others where non-root daemons get > unhappy when they get an inotify event and their config files have > suddenly become root:root:0600. I also want to be able to "cd /etc > && git status" to see what changed after running "apt-get update" or > maybe fiddling in SWAT or webmin, so a makefile which installs into / > etc won't quite solve it either. It would also be nice to see when > things change the permissions on files in /etc, or even bind-mount an > append-only volume over /etc/.git/objects to provide additional data > security. The idea is to not store /etc in git directly, but use import/export scripts, which for example saves permissions and ownership in some file also tracked by git on import, and restores correct permissions on export. That is what I remember from this discussion. This of course means that you would have to write your own porcelain... What about mentioned in other email IsiSetup? -- Jakub Narebski Warsaw, Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 18:10 ` Kyle Moffett 2006-12-10 18:18 ` Jakub Narebski @ 2006-12-10 18:26 ` Jakub Narebski 2006-12-10 18:35 ` Kyle Moffett 1 sibling, 1 reply; 34+ messages in thread From: Jakub Narebski @ 2006-12-10 18:26 UTC (permalink / raw) To: Kyle Moffett; +Cc: git Kyle Moffett wrote: > On Dec 10, 2006, at 10:30:00, Jakub Narebski wrote: >> Jeff Garzik wrote: >>> >>> I actively use git to version, store and distribute an exim mail >>> configuration across six servers. So far my solution has been a >>> 'fix perms' script, or using the file perm checking capabilities >>> of cfengine. >> >> Fix perms' script used on a checkout hook is a best idea I think. > > Hmm, unfortunately that has problems with security-related race > conditions when used directly for /etc. Think about what happens > with "/etc/shadow" in that case, for example. (/etc/.git is of > course 0700) I'm sure there are others where non-root daemons get > unhappy when they get an inotify event and their config files have > suddenly become root:root:0600. I also want to be able to "cd /etc > && git status" to see what changed after running "apt-get update" or > maybe fiddling in SWAT or webmin, so a makefile which installs into / > etc won't quite solve it either. It would also be nice to see when > things change the permissions on files in /etc, or even bind-mount an > append-only volume over /etc/.git/objects to provide additional data > security. The idea is to not store /etc in git directly, but use import/export scripts, which for example saves permissions and ownership in some file also tracked by git on import, and restores correct permissions on export. That is what I remember from this discussion. This of course means that you would have to write your own porcelain... What about mentioned in other email IsiSetup? -- Jakub Narebski Warsaw, Poland ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 18:26 ` Jakub Narebski @ 2006-12-10 18:35 ` Kyle Moffett 2006-12-11 10:39 ` Andreas Ericsson 0 siblings, 1 reply; 34+ messages in thread From: Kyle Moffett @ 2006-12-10 18:35 UTC (permalink / raw) To: Jakub Narebski; +Cc: git On Dec 10, 2006, at 13:26:32, Jakub Narebski wrote: > The idea is to not store /etc in git directly, but use import/ > export scripts, which for example saves permissions and ownership > in some file also tracked by git on import, and restores correct > permissions on export. That is what I remember from this > discussion. This of course means that you would have to write your > own porcelain... > > What about mentioned in other email IsiSetup? The real problem I have with that is you literally have to duplicate all sorts of functionality. I want to run "foo-status" in /etc and get something useful, but if /etc is not a git directory in and of itself then you have to duplicate most of "git-status" anyways. And the same applies to all the other commands. From what I can see of IsiSetup the tools for checking out, merging, modifying, cloning, etc are all much more limited and immature than the ones available through GIT/cogito, and I would be loathe to discard all that extra functionality and duplicate a few thousand lines of code in the name of "concept purity". GIT already has _some_ idea about file permissions, it just discards most of the data before writing to disk. Of course, adding POSIX ACLs and user-extended-attributes requires a new data format, but those are very similar to filesystem permissions; they differ only in amount of data stored, not in purpose. Import/export scripts literally require wrapping every single GIT command with a script that changes directory a few times, reads from a different checked-out tree, and permutes some extended-attribute data slightly before storing it in the underlying GIT tree. 
Even without adding any new functionality whatsoever, that doubles the amount of code just for finding your repository and checking command-line arguments, and that's a crazy trade-off to make in any situation.

Cheers,
Kyle Moffett
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 18:35 ` Kyle Moffett @ 2006-12-11 10:39 ` Andreas Ericsson 2006-12-11 10:55 ` Jeff Garzik 2006-12-11 12:13 ` Josef Weidendorfer 0 siblings, 2 replies; 34+ messages in thread From: Andreas Ericsson @ 2006-12-11 10:39 UTC (permalink / raw) To: Kyle Moffett; +Cc: Jakub Narebski, git Kyle Moffett wrote: > On Dec 10, 2006, at 13:26:32, Jakub Narebski wrote: >> The idea is to not store /etc in git directly, but use import/export >> scripts, which for example saves permissions and ownership in some >> file also tracked by git on import, and restores correct permissions >> on export. That is what I remember from this discussion. This of >> course means that you would have to write your own porcelain... >> >> What about mentioned in other email IsiSetup? > > The real problem I have with that is you literally have to duplicate all > sorts of functionality. I want to run "foo-status" in /etc and get > something useful, but if /etc is not a git directory in and of itself > then you have to duplicate most of "git-status" anyways. Make /etc/.git a symlink to where you store your repo and go to the other directory when you want to *restore* configuration. The only "own porcelain" you need to write is a simple program that understands "save" and "restore" (or some such) and tucks away the meta-data in a file somewhere inside the git tree. If you make it in the format octal-mode path/to/file you can even get decently human-readable permission diffs, which will most likely be prettier and easier to read than anything git currently has. > > GIT already has _some_ idea about file permissions, it just discards > most of the data before writing to disk. Of course, adding POSIX ACLs > and user-extended-attributes requires a new data format, but those are > very similar to filesystem permissions; they differ only in amount of > data stored, not in purpose. 
>

The amount of data stored is the issue here. The current implementation (which works just fine and does The Right Thing(tm) for code repos) only stores what it has to and uses the spare bits to do other things.

> Import/export scripts literally require wrapping every single GIT
> command with a script that changes directory a few times, reads from a
> different checked-out tree, and permutes some extended-attribute data
> slightly before storing it in the underlying GIT tree. Even without
> adding any new functionality whatsoever that doubles the amount of code
> just for finding your repository and checking command-line arguments,
> and that's a crazy trade-off to make in any situation.
>

GIT_DIR=/some/where/else/.git git log -p

Why would you want to read from a different checked-out tree? Non-committed data is "changes", committed data is "HEAD" (or commit-ish) and marked data is "index". I see no reason whatsoever for a second checked-out tree.

-- 
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
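A minimal sketch of the save/restore helper Andreas describes, assuming a manifest of "octal-mode path/to/file" lines kept in a file inside the tracked tree. The function names and everything beyond that one-line format are illustrative assumptions, not an existing tool. Ownership, ACLs and the race Kyle raised for files like /etc/shadow are deliberately not handled here.

```python
import os
import stat


def save_modes(root, manifest):
    """Write one 'octal-mode path' line per entry under root, sorted by path."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the repository itself.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # a symlink's own mode bits are not meaningful
            mode = stat.S_IMODE(os.lstat(full).st_mode)
            entries.append((os.path.relpath(full, root), mode))
    with open(manifest, "w") as fh:
        for rel, mode in sorted(entries):
            fh.write(f"{mode:04o} {rel}\n")


def restore_modes(root, manifest):
    """Re-apply the permission bits recorded by save_modes."""
    with open(manifest) as fh:
        for line in fh:
            mode, rel = line.rstrip("\n").split(" ", 1)
            full = os.path.join(root, rel)
            # Skip entries that vanished or turned into symlinks since the save.
            if os.path.islink(full) or not os.path.exists(full):
                continue
            os.chmod(full, int(mode, 8))
```

Because the manifest is sorted plain text and is itself committed, a permission change shows up in an ordinary diff as a one-line change to the manifest.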
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-11 10:39 ` Andreas Ericsson @ 2006-12-11 10:55 ` Jeff Garzik 2006-12-11 12:13 ` Josef Weidendorfer 1 sibling, 0 replies; 34+ messages in thread From: Jeff Garzik @ 2006-12-11 10:55 UTC (permalink / raw) To: Kyle Moffett; +Cc: Andreas Ericsson, Jakub Narebski, git Another option is to have a process that stores your configs in git, and script an export from git to rpm|deb. Packaging systems make it even easier to go between config versions. Jeff ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-11 10:39 ` Andreas Ericsson 2006-12-11 10:55 ` Jeff Garzik @ 2006-12-11 12:13 ` Josef Weidendorfer 2006-12-11 13:33 ` Johannes Schindelin 1 sibling, 1 reply; 34+ messages in thread
From: Josef Weidendorfer @ 2006-12-11 12:13 UTC (permalink / raw)
To: Andreas Ericsson; +Cc: Kyle Moffett, Jakub Narebski, git

On Monday 11 December 2006 11:39, Andreas Ericsson wrote:
> > Import/export scripts literally require wrapping every single GIT
> > command with a script that changes directory a few times, reads from a
> > different checked-out tree, and permutes some extended-attribute data
> > slightly before storing it in the underlying GIT tree. Even without
> > adding any new functionality whatsoever that doubles the amount of code
> > just for finding your repository and checking command-line arguments,
> > and that's a crazy trade-off to make in any situation.
> >
>
> GIT_DIR=/some/where/else/.git git log -p

Doing this every time you want to run a git command *is* a lot of time wasted on typing.

The .gitlink proposal would come in handy here: you have a simple file instead of .git/, which links to the real repository.
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-11 12:13 ` Josef Weidendorfer @ 2006-12-11 13:33 ` Johannes Schindelin 2006-12-11 15:07 ` Josef Weidendorfer 0 siblings, 1 reply; 34+ messages in thread
From: Johannes Schindelin @ 2006-12-11 13:33 UTC (permalink / raw)
To: Josef Weidendorfer; +Cc: Andreas Ericsson, Kyle Moffett, Jakub Narebski, git

Hi,

On Mon, 11 Dec 2006, Josef Weidendorfer wrote:

> On Monday 11 December 2006 11:39, Andreas Ericsson wrote:
> > > Import/export scripts literally require wrapping every single GIT
> > > command with a script that changes directory a few times, reads from a
> > > different checked-out tree, and permutes some extended-attribute data
> > > slightly before storing it in the underlying GIT tree. Even without
> > > adding any new functionality whatsoever that doubles the amount of code
> > > just for finding your repository and checking command-line arguments,
> > > and that's a crazy trade-off to make in any situation.
> > >
> >
> > GIT_DIR=/some/where/else/.git git log -p
>
> Doing this everytime you want to run a git command *is* a lot of time
> wasted for typing.
>
> The .gitlink proposal would come in handy here: you have a simple
> file instead of .git/, which links to the real repository.

I beg your pardon; I'm just joining in. Why is a symbolic link for .git unacceptable?

Ciao,
Dscho
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-11 13:33 ` Johannes Schindelin @ 2006-12-11 15:07 ` Josef Weidendorfer 0 siblings, 0 replies; 34+ messages in thread From: Josef Weidendorfer @ 2006-12-11 15:07 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Andreas Ericsson, Kyle Moffett, Jakub Narebski, git On Monday 11 December 2006 14:33, Johannes Schindelin wrote: > On Mon, 11 Dec 2006, Josef Weidendorfer wrote: > > > On Monday 11 December 2006 11:39, Andreas Ericsson wrote: > > > > Import/export scripts literally require wrapping every single GIT > > > > command with a script that changes directory a few times, reads from a > > > > different checked-out tree, and permutes some extended-attribute data > > > > slightly before storing it in the underlying GIT tree. Even without > > > > adding any new functionality whatsoever that doubles the amount of code > > > > just for finding your repository and checking command-line arguments, > > > > and that's a crazy trade-off to make in any situation. > > > > > > > > > > GIT_DIR=/some/where/else/.git git log -p > > > > Doing this everytime you want to run a git command *is* a lot of time > > wasted for typing. > > > > The .gitlink proposal would come in handy here: you have a simple > > file instead of .git/, which links to the real repository. > > I beg your pardon; I'm just joining in. Why is a symbolic link for .git > inacceptable? You are totally right. The .gitlink thing is tailored to allow submodule support later. It includes some smart searching for the git repository to allow moving the checkout in some limits without breaking the link to the repository. Aside from this, the proposal is more flexible in that you can specify not only GIT_DIR (or the GIT_DIR_HINT to trigger smart search), but also GIT_INDEX_FILE and GIT_HEAD_FILE, which allows different checkouts (with different index state and HEAD) for the same repo easily. Which is not needed in this case. 
So, sorry for the noise ;-)
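For anyone who keeps the repository outside /etc without a symlink, the GIT_DIR one-liner quoted above can be wrapped so it is not retyped each time. A small sketch, assuming only that the git binary honours the GIT_DIR environment variable as in Andreas's example; the helper names are invented for illustration.

```python
import os
import subprocess


def git_env(git_dir, base_env=None):
    """Return a copy of the environment with GIT_DIR pointing at git_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_DIR"] = git_dir
    return env


def run_git(git_dir, *args, cwd=None):
    """Run `git <args>` against the repository stored at git_dir."""
    return subprocess.run(
        ["git", *args], env=git_env(git_dir), cwd=cwd, check=True
    )
```

With this, run_git("/some/where/else/.git", "log", "-p", cwd="/etc") would be the equivalent of the quoted command run from /etc. A symlinked .git, as Johannes suggests, avoids the wrapper entirely.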
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 13:40 Using GIT to store /etc (Or: How to make GIT store all file permission bits) Kyle Moffett 2006-12-10 14:49 ` Jeff Garzik @ 2006-12-10 15:06 ` Santi Béjar 2006-12-10 17:46 ` Kyle Moffett 2007-01-10 1:39 ` David Lang 2006-12-11 10:50 ` Nikolai Weibull 2006-12-12 3:45 ` Daniel Barkalow 3 siblings, 2 replies; 34+ messages in thread From: Santi Béjar @ 2006-12-10 15:06 UTC (permalink / raw) To: Kyle Moffett; +Cc: git On 12/10/06, Kyle Moffett <mrmacman_g4@mac.com> wrote: > I've recently become somewhat interested in the idea of using GIT to > store the contents of various folders in /etc. However after a bit > of playing with this, I discovered that GIT doesn't actually preserve > all permission bits since that would cause problems with the more > traditional software development model. I'm curious if anyone has > done this before; and if so, how they went about handling the > permissions and ownership issues. > > I spent a little time looking over how GIT stores and compares > permission bits; trying to figure out if it's possible to patch in a > new configuration variable or two; say "preserve_all_perms" and > "preserve_owner", or maybe even "save_acls". It looks like standard > permission preservation is fairly basic; you would just need to patch > a few routines which alter the permissions read in from disk or > compare them with ones from the database. On the other hand, it > would appear that preserving ownership or full POSIX ACLs might be a > bit of a challenge. > > Thanks for your insight and advice! I have not used it, but you could try: http://www.isisetup.ch/ that uses git as a backend. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 15:06 ` Santi Béjar @ 2006-12-10 17:46 ` Kyle Moffett 2006-12-10 18:10 ` Jakub Narebski 2007-01-10 1:39 ` David Lang 1 sibling, 1 reply; 34+ messages in thread From: Kyle Moffett @ 2006-12-10 17:46 UTC (permalink / raw) To: Santi Béjar, Jeff Garzik; +Cc: git > On 12/10/06, Kyle Moffett <mrmacman_g4@mac.com> wrote: >> I've recently become somewhat interested in the idea of using GIT >> to store the contents of various folders in /etc. However after a >> bit of playing with this, I discovered that GIT doesn't actually >> preserve all permission bits since that would cause problems with >> the more traditional software development model. I'm curious if >> anyone has done this before; and if so, how they went about >> handling the permissions and ownership issues. >> >> I spent a little time looking over how GIT stores and compares >> permission bits; trying to figure out if it's possible to patch in >> a new configuration variable or two; say "preserve_all_perms" and >> "preserve_owner", or maybe even "save_acls". It looks like >> standard permission preservation is fairly basic; you would just >> need to patch a few routines which alter the permissions read in >> from disk or compare them with ones from the database. On the >> other hand, it would appear that preserving ownership or full >> POSIX ACLs might be a bit of a challenge. On Dec 10, 2006, at 10:06:14, Santi Béjar wrote: > I have not used it, but you could try: > > http://www.isisetup.ch/ > > that uses git as a backend. Wow, umm, that's actually really interesting for me, given that I'm most interested in these sorts of things on Debian. I can't find much documentation on their site; the tools look vaguely immature but I haven't really had much time to look at it yet. On Dec 10, 2006, at 09:49:50, Jeff Garzik wrote: > It's a great idea, something I would like to do, and something I've > suggested before. 
> You could dig through the mailing list archives, if you're motivated.

I have been digging through the archives; I was just holding out hope that somebody else on the list had already halfway beat me to the punch. Guess not :-D

> I actively use git to version, store and distribute an exim mail
> configuration across six servers. So far my solution has been a
> 'fix perms' script, or using the file perm checking capabilities of
> cfengine.
>
> But it would be a lot better if git natively cared about ownership
> and permissions (presumably via an option).

I was thinking about a standard config option in the GIT config file; that way users could have a personal default and repositories could specify it locally.

I started tinkering but quickly discovered that permissions handling in general in GIT seems to be a mess; there are about 4 different tiers where permissions data is manipulated in various formats. Some places use network-endian 16-bit values, and there are a couple of functions which do different truncations to 644 or 755 format. There are 2 functions which canonicalize the file mode based on symlink or directory status, each in subtly different ways.

I'm slowly sorting through things, but if I could get a few pointers from someone intimately familiar with the code that would be most appreciated: I'd like to try to add new entries to tree objects which older versions of GIT would ignore but which newer versions of GIT would use to store ACL or extended-attribute data.

The simplest solution, which admittedly breaks the ability of older GITs to read the data from a file with attributes (ignoring the ext-attrs themselves), is to create a new "file-with-extended-attributes" object which contains a binary concatenation (with length bytes and attribute names and such) of the file and its extended attributes.
That breaks the old GIT assumption that permission and security data is part of the directory, not the file, but it's more in line with the way extended attributes are attached to the inodes in the filesystem (although that doesn't really matter IMO).

Alternatively, I might be able to add a new entry to each tree object with invalid extended file mode bits (i.e. neither a directory, a file, nor a symlink), or perhaps an entry with an empty name, which points to a new "extended attribute table". That table could either map from (entry, attribute) => (data) or from (entry) => ((attribute,data),(attribute,data),[...]), depending on which would be more efficient. It's essential that the overhead for non-ext-attr repositories is O(1), and ideally the overhead for a bunch of files with the same ext-attr is O(size-of-ext-attr) + O(number-of-files-with-that-attr), although that may vary depending on implementation.

Advice, opinions, problems, and "this-has-no-chance-of-ever-even-remotely-working" are all useful and welcome!

Cheers,
Kyle Moffett
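The "truncation to 644 or 755" Kyle mentions can be illustrated as follows. This is a sketch of the observable behaviour as commonly described (a tracked regular file keeps only its owner-executable bit; symlinks and directories keep only their type), not a transcription of any function in the GIT source; the name canonical_mode is chosen here for illustration.

```python
import stat


def canonical_mode(st_mode):
    """Reduce a full st_mode to the few values a tree entry records."""
    if stat.S_ISLNK(st_mode):
        return stat.S_IFLNK  # type only, no permission bits
    if stat.S_ISDIR(st_mode):
        return stat.S_IFDIR  # type only, no permission bits
    # Regular file: keep only "executable by owner or not".
    perms = 0o755 if st_mode & stat.S_IXUSR else 0o644
    return stat.S_IFREG | perms
```

Under this scheme a 0600 /etc/shadow and a 0644 /etc/passwd are recorded identically, and setuid, setgid and sticky bits are dropped, which is exactly the information the proposed "preserve_all_perms" option would need to keep.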
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 17:46 ` Kyle Moffett @ 2006-12-10 18:10 ` Jakub Narebski 0 siblings, 0 replies; 34+ messages in thread
From: Jakub Narebski @ 2006-12-10 18:10 UTC (permalink / raw)
To: git

Kyle Moffett wrote:

> The simplest solution which admittedly breaks the ability of older
> GITs to read the data from a file with attributes (ignoring the ext-
> attrs themselves) is to create a new "file-with-extended-attributes"
> object which contains a binary concatenation (with length bytes and
> attribute names and such) of the file and its extended attributes.
> That breaks the old GIT assumption that permission and security data
> is part of the directory not the file, but it's more in-line with the
> way extended attributes are attached to the inodes in the filesystem
> (although that doesn't really matter IMO).

This contradicts the git philosophy of "tracking contents".

> Alternatively I might be able to add a new entry to each tree object
> with invalid extended file mods bits (IE: Neither a directory, a
> file, nor a symlink), or perhaps an entry with an empty name, which
> points to a new "extended attribute table". That table could either
> map from (entry, attribute) => (data) or from (entry) =>
> ((attribute,data),(attribute,data),[...]), depending on which would
> be more efficient. It's essential that the overhead for non-ext-attr
> repositories is O(1) and ideally the overhead for a bunch of files
> with the same ext-attr is O(size-of-ext-attr) + O(number-of-files-
> with-that-attr), although that may vary depending on implementation.
Wouldn't it be better to add another field in the tree object, so that instead of storing "(filemode, link to contents, name)" it would store "(filemode, link to extended attributes, link to contents, name)" where "filemode" is the mode of a file, of which git uses only a few bits (is a directory, is a symlink, is a file, is an executable file), and "link to" is the sha1 of the appropriate blob (or tree) object? Extended attributes could be stored in a new type of object, or just in a blob object. Well, you'd have to extend the index in a similar way (and add a way to store extended attributes for directories in the index; now it only stores info about files). This of course breaks backwards compatibility... -- Jakub Narebski Warsaw, Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 34+ messages in thread
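A rough way to picture the four-field tree entry proposed above is the following Python sketch. It is only an illustration under stated assumptions: `TreeEntry`, `serialize_attrs`, `make_entry` and the "name=value" attribute encoding are hypothetical and not part of git; only the "blob <size>\0" hashing frame matches what git does for blobs. The point it shows is that files with identical attributes end up pointing at one shared attribute object, which is the O(size-of-ext-attr) + O(number-of-files) cost asked for earlier in the thread.

```python
import hashlib
from dataclasses import dataclass


def blob_id(data: bytes) -> str:
    # Same framing git uses when hashing a blob: "blob <size>\0<data>".
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def serialize_attrs(attrs: dict) -> bytes:
    # Hypothetical canonical encoding: sorted "name=value" lines, so that
    # equal attribute sets always serialize (and hash) identically.
    return "".join(f"{k}={v}\n" for k, v in sorted(attrs.items())).encode()


@dataclass(frozen=True)
class TreeEntry:
    # (filemode, link to extended attributes, link to contents, name)
    mode: str
    attrs: str
    content: str
    name: str


def make_entry(store: dict, name: str, mode: str, data: bytes, attrs: dict) -> TreeEntry:
    # "store" stands in for the object database: id -> raw bytes.
    attr_blob = serialize_attrs(attrs)
    store[blob_id(attr_blob)] = attr_blob
    store[blob_id(data)] = data
    return TreeEntry(mode, blob_id(attr_blob), blob_id(data), name)
```

With two files owned by root:root and mode 0644, the store holds two content objects but only one attribute object, so a repository without extended attributes pays nothing beyond the extra field.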
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 15:06 ` Santi Béjar 2006-12-10 17:46 ` Kyle Moffett @ 2007-01-10 1:39 ` David Lang 2007-01-10 2:30 ` Shawn O. Pearce 1 sibling, 1 reply; 34+ messages in thread From: David Lang @ 2007-01-10 1:39 UTC (permalink / raw) To: git I want to have a tripwire-like system checking the files to make sure that they haven't changed unexpectedly. the program I'm looking at notices inode as well as timestamp and content changed. when you checkout a file from git will it re-write/overwrite a file that hasn't changed or will it realize there is no change and leave it as-is? does this answer change if there is a trigger on checkout (to change permissions or otherwise manipulate the file)? David Lang ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2007-01-10 1:39 ` David Lang @ 2007-01-10 2:30 ` Shawn O. Pearce 2007-01-10 18:34 ` David Lang 0 siblings, 1 reply; 34+ messages in thread From: Shawn O. Pearce @ 2007-01-10 2:30 UTC (permalink / raw) To: David Lang; +Cc: git David Lang <david.lang@digitalinsight.com> wrote: > I want to have a tripwire-like system checking the files to make sure that > they haven't changed unexpectedly. the program I'm looking at notices inode > as well as timestamp and content changed. > > when you checkout a file from git will it re-write/overwrite a file that > hasn't changed or will it realize there is no change and leave it as-is? If the stat data is current it will leave it as-is. You can force the index to refresh with `git update-index --refresh` or by running git status. > does this answer change if there is a trigger on checkout (to change > permissions or otherwise manipulate the file)? Only if the trigger does something in addition, like force overwrite files. But we don't have a checkout trigger. So there's no trigger. -- Shawn. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2007-01-10 2:30 ` Shawn O. Pearce @ 2007-01-10 18:34 ` David Lang 2007-01-12 0:55 ` Shawn O. Pearce 0 siblings, 1 reply; 34+ messages in thread From: David Lang @ 2007-01-10 18:34 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: git On Tue, 9 Jan 2007, Shawn O. Pearce wrote: > David Lang <david.lang@digitalinsight.com> wrote: >> I want to have a tripwire-like system checking the files to make sure that >> they haven't changed unexpectedly. the program I'm looking at notices inode >> as well as timestamp and content changed. >> >> when you checkout a file from git will it re-write/overwrite a file that >> hasn't changed or will it realize there is no change and leave it as-is? > > If the stat data is current it will leave it as-is. You can force > the index to refresh with `git update-index --refresh` or by running > git status. I was looking at checkout, not checkin so I'm not understanding how the index is involved here. >> does this answer change if there is a trigger on checkout (to change >> permissions or otherwise manipulate the file)? > > Only if the trigger does something in addition, like force overwrite > files. But we don't have a checkout trigger. So there's no trigger. we don't have a checkout trigger? I thought that what Linus had suggested for permissions was to have a script triggered on checkin that stored the permissions of the files, and a script triggered on checkout that set the permissions from the stored file. if there isn't a checkout trigger how would the permissions ever get set? in my particular case I'd like to have the checkin run a script that produces a 'generic' version of each file, and the checkout run a script that converts the generic version into the host specific version. 
I already have a script that does this work (and (ab)uses ssh to propagate the generic version to other hosts and create the host specific versions there), but I was interested in using git to add better version control to the generic versions of the files (I currently use RCS on each box to version control the host specific versions). David Lang ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2007-01-10 18:34 ` David Lang @ 2007-01-12 0:55 ` Shawn O. Pearce 0 siblings, 0 replies; 34+ messages in thread From: Shawn O. Pearce @ 2007-01-12 0:55 UTC (permalink / raw) To: David Lang; +Cc: git David Lang <david.lang@digitalinsight.com> wrote: > On Tue, 9 Jan 2007, Shawn O. Pearce wrote: > >If the stat data is current it will leave it as-is. You can force > >the index to refresh with `git update-index --refresh` or by running > >git status. > > I was looking at checkout, not checkin so I'm not understanding how the > index is involved here. During checkout we use the index to help us decide if a file needs to be updated with new content or can be left as-is. It's a cache of what version each file is at, and it's based on the file stat data (dev, inode, modification date, etc.) to tell us if the file has been modified or was last created by Git. If Git was the one that last modified the file and the version stored in the index matches the version needed during the checkout, the file is left alone. But if anything differs then the file gets overwritten. > >>does this answer change if there is a trigger on checkout (to change > >>permissions or otherwise manipulate the file)? > > > >Only if the trigger does something in addition, like force overwrite > >files. But we don't have a checkout trigger. So there's no trigger. > > we don't have a checkout trigger? No. > I thought that what Linus had suggested > for permissions was to have a script triggered on checkin that stored the > permissions of the files, and a script triggered on checkout that set the > permissions from the stored file. Yes. It is what he suggested. > if there isn't a checkout trigger how would the permissions ever get set? Someone needs to implement support for a post-checkout trigger. _Then_ a checkout trigger could perform this action. 
> in my particular case I'd like to have the checkin run a script that > produces a 'generic' version of each file, You may be able to do that in the pre-commit hook by updating the index -- Shawn. ^ permalink raw reply [flat|nested] 34+ messages in thread
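The checkout decision described in this message can be modelled in a few lines of Python. This is a simplified sketch of the description only, not git's actual code: `stat_signature` and `must_rewrite` are hypothetical names, the version ids are plain strings, and the real index records more fields than the ones compared here.

```python
import os


def stat_signature(path):
    # The kind of stat data the message mentions (dev, inode,
    # modification time), plus size and mode for good measure.
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_mtime_ns, st.st_size, st.st_mode)


def must_rewrite(path, cached_signature, cached_id, wanted_id):
    # Rewrite when the checkout needs a different version of the file...
    if cached_id != wanted_id:
        return True
    # ...or when the file on disk no longer matches what was cached,
    # meaning something other than the last checkout touched it.
    return stat_signature(path) != cached_signature
```

For the tripwire use case, the relevant property is the first return path not being taken: when the wanted version equals the cached one and the stat data is unchanged, the file is not touched, so its inode and timestamp stay as they were.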
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 13:40 Using GIT to store /etc (Or: How to make GIT store all file permission bits) Kyle Moffett 2006-12-10 14:49 ` Jeff Garzik 2006-12-10 15:06 ` Santi Béjar @ 2006-12-11 10:50 ` Nikolai Weibull 2006-12-12 3:45 ` Daniel Barkalow 3 siblings, 0 replies; 34+ messages in thread From: Nikolai Weibull @ 2006-12-11 10:50 UTC (permalink / raw) To: Kyle Moffett; +Cc: git On 12/10/06, Kyle Moffett <mrmacman_g4@mac.com> wrote: > I've recently become somewhat interested in the idea of using GIT to > store the contents of various folders in /etc. However after a bit > of playing with this, I discovered that GIT doesn't actually preserve > all permission bits since that would cause problems with the more > traditional software development model. I'm curious if anyone has > done this before; and if so, how they went about handling the > permissions and ownership issues. I keep the files I want to track in a separate folder that I track with Git and use a Makefile for updating /etc. I basically have a rule for checking for differences between the tracked folder and /etc and a rule for installing changed files (with the correct permissions). It works, but it does require some "Makefile magic" to work right (or the way /I/ want it anyway). ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-10 13:40 Using GIT to store /etc (Or: How to make GIT store all file permission bits) Kyle Moffett ` (2 preceding siblings ...) 2006-12-11 10:50 ` Nikolai Weibull @ 2006-12-12 3:45 ` Daniel Barkalow 2006-12-12 13:49 ` Kyle Moffett 3 siblings, 1 reply; 34+ messages in thread From: Daniel Barkalow @ 2006-12-12 3:45 UTC (permalink / raw) To: Kyle Moffett; +Cc: git On Sun, 10 Dec 2006, Kyle Moffett wrote: > I've recently become somewhat interested in the idea of using GIT to store the > contents of various folders in /etc. However after a bit of playing with > this, I discovered that GIT doesn't actually preserve all permission bits > since that would cause problems with the more traditional software development > model. I'm curious if anyone has done this before; and if so, how they went > about handling the permissions and ownership issues. > > I spent a little time looking over how GIT stores and compares permission > bits; trying to figure out if it's possible to patch in a new configuration > variable or two; say "preserve_all_perms" and "preserve_owner", or maybe even > "save_acls". It looks like standard permission preservation is fairly basic; > you would just need to patch a few routines which alter the permissions read > in from disk or compare them with ones from the database. On the other hand, > it would appear that preserving ownership or full POSIX ACLs might be a bit of > a challenge. The first thing you'd want to do is correct the fact that the index doesn't keep full permissions. We decided long ago that we don't want to track more than 0100, but we're discarding the rest between the filesystem and the index, rather than between the index and the tree. 
(This is weird of us, since we keep gid and uid in the index, as changedness heuristics, but don't keep permissions; of course, we'd have to apply umask to the index when we check it out to sync what we expect to be there with what has actually been created.) I think that would be the only change needed to the index and index/working directory connection, although it might be necessary to support longer values for uid/gid/etc, since they'd be important data now. Note that git only stores content, not incidental information. But a lot of information which is incidental in a source tree is content in /etc. This implies that /etc and working/linux-2.6 are fundamentally different sorts of things, because different aspects of them are content. I'd suggest a new object type for a directory with permissions, ACLs, and so forth. It should probably use symbolic owner and group, too. My guess is that you'll want to use "commit"s, the new object type, and "blob"s. Everything that uses trees would need to have a version that uses the new type. But I think that you generally want different behavior anyway, so that's not a major issue. -Daniel ^ permalink raw reply [flat|nested] 34+ messages in thread
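To make the "we don't want to track more than 0100" remark concrete, here is a minimal Python sketch of that normalization for regular files. The function name `tracked_mode` is made up for this illustration; the behaviour it models (only the owner-execute bit decides between 100644 and 100755) is what the message describes.

```python
import stat


def tracked_mode(st_mode: int) -> int:
    # Only regular files are modelled here; symlinks and directories
    # are recorded differently.
    if not stat.S_ISREG(st_mode):
        raise ValueError("not a regular file")
    # Everything except the owner-execute bit (0100) is discarded.
    return 0o100755 if st_mode & 0o100 else 0o100644
```

Under this rule a 0600 file such as /etc/shadow and a world-readable 0644 file both come back as 100644, and setuid or group bits vanish entirely, which is why /etc needs more than the current behaviour.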
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-12 3:45 ` Daniel Barkalow @ 2006-12-12 13:49 ` Kyle Moffett 2006-12-12 15:53 ` Andy Parkins 2006-12-13 18:10 ` Using GIT to store /etc (Or: How to make GIT store all file permission bits) Daniel Barkalow 0 siblings, 2 replies; 34+ messages in thread From: Kyle Moffett @ 2006-12-12 13:49 UTC (permalink / raw) To: Daniel Barkalow; +Cc: git On Dec 11, 2006, at 22:45:25, Daniel Barkalow wrote: > The first thing you'd want to do is correct the fact that the index > doesn't keep full permissions. We decided long ago that we don't > want to track more than 0100, but we're discarding the rest between > the filesystem and the index, rather than between the index and the > tree. (This is weird of us, since we keep gid and uid in the index, > as changedness heuristics, but don't keep permissions; of course, > we'd have to apply umask to the index when we check it out to sync > what we expect to be there with what has actually been created.) > > I think that would be the only change needed to the index and index/ > working directory connection, although it might be necessary to > support longer values for uid/gid/etc, since they'd be important > data now. Hmm, ok. It would seem to be a reasonable requirement that if you want to change any of the "preserve_*_attributes" config options you need to blow away and recreate your index, no? I would probably change the underlying index format pretty completely and stick a new version tag inside it. > Note that git only stores content, not incidental information. But > a lot of information which is incidental in a source tree is > content in /etc. This implies that /etc and working/linux-2.6 are > fundamentally different sorts of things, because different aspects > of them are content. Ahh, I hadn't thought of it that way before but that makes a lot of sense. Thanks! 
> I'd suggest a new object type for a directory with permissions, > ACLs, and so forth. It should probably use symbolic owner and > group, too. My guess is that you'll want to use "commit"s, the new > object type, and "blob"s. Everything that uses trees would need to > have a version that uses the new type. But I think that you > generally want different behavior anyway, so that's not a major issue. Ok, seems straightforward enough. One other thing that crossed my mind was figuring out how to handle hardlinks. The simplest solution would be to add an extra layer of indirection between the "file inode" and the "file data". Instead of your directory pointing to a "file-data" blob and "file-attributes" object, it would point to a "file-inode" object with embedded attribute data and a pointer to the file contents blob. I remember reading some discussions from the early days of GIT about how that was considered and discarded because the extra overhead wouldn't give any real tangible benefit. On the other hand for something like /etc the added benefits of tracking extended attributes and hardlinks might outweigh the cost of a bunch of extra objects in the database. A bit of care with the construction of the index file should make it sufficiently efficient for day-to-day usage. If you're interested in some random musings about using GIT concepts to version whole filesystems (think checkpointing your disk drive and instantly restoring when you screw up), read on below, otherwise don't bother. Cheers, Kyle Moffett <Random Tangential Off-the-Wall Thought Experiment> NOTE: This probably belongs in its own thread but it's such a random, undeveloped, and off-the-wall concept that I threw it in here just for kicks. Combining extensions like those described above with something like the Ext3 block-allocation, inode-management and journalling code to produce a "versioned filesystem". 
With the exponential growth of storage density over the last several years we've gotten to the point where we can store many, many hours of extremely realistic video and audio on your average small-computer drive. Versioning your home directory, or even your entire computer, even with fairly steady modifications to multimedia files, installation of software programs, etc, doesn't seem like such an impossible undertaking anymore. One predefined inode would contain a list of tags/heads and their current hashes. Mount the filesystem with a "tag=$TAG" option to specify the initial tree object used for the root directory (with syscalls to navigate the history). Allocate an inode per-mount to represent any changes from the last commit. For efficiency purposes (no need to revision the entire system when I commit a change in my home directory) add a "subtree" object type which can specify either a particular hash or a symbolic tag/head name as a pseudo sub-mountpoint. Trap traversal of the sub-mountpoint node to mount the filesystem with "tag=$SUBTAG" on the sub-mountpoint, expiring it some time after the last traversal. The only remaining issue would be properly navigating through the history, preserving or discarding changes. Since the kernel could easily manage copy-on-write semantics for underlying disk blocks you wouldn't need a separate "working copy" except where it's modified from the original, and discarding changes is as simple as unlinking any files referenced by the per-mount delta inode. Committing changes would get tricky, you would need to hot-remap memory-mapped pages read-only while you checksum and store them. The next write attempt would then separate the page from the freshly-committed on-disk version. Would need a mechanism for applications to "trap" the commit so they could make databases consistent, with the ability for root or the mountpoint owner to commit without waiting for synchronization. Only needs to synchronize files belonging to the new commit. 
Merges would be managed from userspace, as long as there is a way to browse through objects by hash given sufficient permissions. Make sure it's really easy to make a new atomic commit and/or reset to a known state every time the computer is rebooted (whether soft-rebooted or via crash/powerkill). With journalling and the write-once nature of GIT it would be trivial to never require an fsck run. Also needs a way to move data between filesystems. Makes LVM largely irrelevant; it doesn't matter how many disks you have if they're all treated as a shared storage pool for your GITfs data. Make sure it's possible to archive data onto slower disks/media and purge older commits from the archive (missing parent commit references are tolerable in many situations). Needs a way to notice hash collisions and take action to avoid them. </Random Tangential Off-the-Wall Thought Experiment> Cheers, Kyle Moffett ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-12 13:49 ` Kyle Moffett @ 2006-12-12 15:53 ` Andy Parkins 2006-12-12 22:49 ` Using git as a general backup mechanism (was Re: Using GIT to store /etc) Steven Grimm 2006-12-13 18:10 ` Using GIT to store /etc (Or: How to make GIT store all file permission bits) Daniel Barkalow 1 sibling, 1 reply; 34+ messages in thread From: Andy Parkins @ 2006-12-12 15:53 UTC (permalink / raw) To: git On Tuesday 2006 December 12 13:49, Kyle Moffett wrote: > Hmm, ok. It would seem to be a reasonable requirement that if you > want to change any of the "preserve_*_attributes" config options you > need to blow away and recreate your index, no? I would probably > change the underlying index format pretty completely and stick a new > version tag inside it. I wonder if git's skill at managing content is the answer? Rather than mess around with git's internals, the index, or the object database; how about simply having a pre-commit script that writes out a file that looks like:

    -rw-r--r-- andyp andyp CHANGES
    -rw-r--r-- andyp andyp COPYING
    -rw-rw-r-- andyp andyp CREDITS
    -rw-r--r-- andyp andyp Configure
    -rw-rw-r-- andyp andyp Makefile
    -rw-r--r-- andyp andyp README

If /that/ file were stored in the repository and you had a script that could read that file and apply the permissions after a checkout you'd have what you want. If the permissions of a file changed but the content didn't, then this ".gitpermissions" file would have changed content but the file itself would remain the same. If the content changed but not the permissions then ".gitpermissions" would be untouched. Assuming that you're allowed to mess with the index in pre-commit (I haven't checked), one half of it can be automatic. I suppose you could also plead for a post-checkout hook to apply those permissions and the whole lot would be transparent. 
Andy -- Dr Andy Parkins, M Eng (hons), MIEE ^ permalink raw reply [flat|nested] 34+ messages in thread
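The ".gitpermissions" idea from the message above could be prototyped roughly as follows. This is a sketch under assumptions rather than anything posted in the thread: the manifest uses an octal mode instead of the `ls -l` style string, `write_manifest` and `apply_manifest` are hypothetical helper names, and only modes are restored (owner and group are recorded so that changes show up in diffs, but restoring them would need root and a `chown` step that is left out here). Paths containing newlines are not handled.

```python
import grp
import os
import pwd
import stat

MANIFEST = ".gitpermissions"  # file name taken from the message above


def _name(lookup, ident, field):
    # Fall back to the numeric id when no passwd/group entry exists.
    try:
        return getattr(lookup(ident), field)
    except KeyError:
        return str(ident)


def write_manifest(root):
    """Record 'mode owner group path' for every regular file under root."""
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        for fname in sorted(filenames):
            path = os.path.join(dirpath, fname)
            rel = os.path.relpath(path, root)
            if rel == MANIFEST or os.path.islink(path):
                continue
            st = os.lstat(path)
            owner = _name(pwd.getpwuid, st.st_uid, "pw_name")
            group = _name(grp.getgrgid, st.st_gid, "gr_name")
            lines.append(f"{stat.S_IMODE(st.st_mode):04o} {owner} {group} {rel}\n")
    with open(os.path.join(root, MANIFEST), "w") as fh:
        fh.writelines(lines)


def apply_manifest(root):
    """Restore the recorded permission bits (ownership is not touched)."""
    with open(os.path.join(root, MANIFEST)) as fh:
        for line in fh:
            if not line.strip():
                continue
            mode, _owner, _group, rel = line.rstrip("\n").split(" ", 3)
            os.chmod(os.path.join(root, rel), int(mode, 8))
```

`write_manifest` would run from a pre-commit script and `apply_manifest` after a checkout, by hand or from whatever checkout hook eventually exists.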
* Using git as a general backup mechanism (was Re: Using GIT to store /etc) 2006-12-12 15:53 ` Andy Parkins @ 2006-12-12 22:49 ` Steven Grimm 2006-12-12 22:57 ` Johannes Schindelin ` (2 more replies) 0 siblings, 3 replies; 34+ messages in thread From: Steven Grimm @ 2006-12-12 22:49 UTC (permalink / raw) To: git This discussion reminds me of a use of git I've had in the back of my head to try out for a while. Right now I'm doing my local snapshot backups using the rsync-with-hard-links scheme (http://www.mikerubel.org/computers/rsync_snapshots/ if you're not familiar with it). This is nice in that the contents of files that don't change are only stored once on the backup disk. But it is less than optimal in that a file that changes even a little bit is stored from scratch. What would be great for this would be to store each day's backup as a git revision; with a periodic repack, this would be much more space-efficient than the rsync hard links. The problem is that while that would give me a very efficient backup scheme, the repository would still grow over time. In rsync land, I solve the disk space issue by keeping two weeks' worth of daily snapshots, then six months' worth of weekly snapshots, then two years' worth of monthly snapshots; files that change daily have a constant number of revisions stored in my backups, and older files drop off the backup disk as they age. Given that there's no way (or is there?) to delete revisions from the *beginning* of a git revision history, right now it seems like the only approach that comes close is to give up on the "daily then weekly then monthly" thing -- probably fine given the space savings of delta compression -- and periodically make shallow clones of the backup repository that fetch all but the first N revisions; once a shallow clone is made, the original gets deleted and the clone is the new backup repo. But it would sure be more efficient to be able to "shallow-ize" an existing repository. 
That would be useful for things other than backups, too, e.g. the recent request for some way to track just the current version of the kernel code rather than its revision history. If there were a shallowize command, you could do something like "git pull; git shallowize --depth 1" to track the latest revision without keeping the history locally. Anyone think that sounds like an interesting thing to explore? -Steve ^ permalink raw reply [flat|nested] 34+ messages in thread
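The daily/weekly/monthly retention policy described in this message is easy to state as code. The following Python sketch is one possible reading of it, with assumptions called out: "two weeks" is taken as 14 days, "six months" as 182 days, "two years" as 730 days, and within each week or month the newest snapshot is the one kept. `snapshots_to_keep` is a hypothetical helper, not an existing tool.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshots, today):
    """Return the subset of snapshot dates the retention policy keeps."""
    keep = set()
    seen_weeks, seen_months = set(), set()
    for day in sorted(snapshots, reverse=True):  # newest first
        age = (today - day).days
        if age < 0:
            continue  # ignore snapshots dated in the future
        if age <= 14:
            keep.add(day)  # every daily snapshot for two weeks
        elif age <= 182:
            week = tuple(day.isocalendar()[:2])  # (ISO year, ISO week)
            if week not in seen_weeks:
                seen_weeks.add(week)
                keep.add(day)  # one snapshot per week for six months
        elif age <= 730:
            month = (day.year, day.month)
            if month not in seen_months:
                seen_months.add(month)
                keep.add(day)  # one snapshot per month for two years
    return keep
```

Anything not in the returned set is what a git-based backup would have to be able to drop from the start or middle of its history, which is the hard part discussed in the replies.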
* Re: Using git as a general backup mechanism (was Re: Using GIT to store /etc) 2006-12-12 22:49 ` Using git as a general backup mechanism (was Re: Using GIT to store /etc) Steven Grimm @ 2006-12-12 22:57 ` Johannes Schindelin 2006-12-12 23:06 ` Steven Grimm 2006-12-12 23:15 ` Martin Langhoff 2006-12-12 23:43 ` Using git as a general backup mechanism Junio C Hamano 2 siblings, 1 reply; 34+ messages in thread From: Johannes Schindelin @ 2006-12-12 22:57 UTC (permalink / raw) To: Steven Grimm; +Cc: git Hi, On Tue, 12 Dec 2006, Steven Grimm wrote: > If there were a shallowize command, you could do something like "git > pull; git shallowize --depth 1" to track the latest revision without > keeping the history locally. Almost! $ git pull --depth 1 Though it needs a server _and_ a client supporting shallow clones, which support is brewed in "next" right now. Ciao, Dscho ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using git as a general backup mechanism (was Re: Using GIT to store /etc) 2006-12-12 22:57 ` Johannes Schindelin @ 2006-12-12 23:06 ` Steven Grimm 2006-12-13 0:01 ` Johannes Schindelin 0 siblings, 1 reply; 34+ messages in thread From: Steven Grimm @ 2006-12-12 23:06 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git Johannes Schindelin wrote: > $ git pull --depth 1 > > Though it needs a server _and_ a client supporting shallow clones, which > support is brewed in "next" right now. > Will that actually discard old revisions that are already stored locally? -Steve ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using git as a general backup mechanism (was Re: Using GIT to store /etc) 2006-12-12 23:06 ` Steven Grimm @ 2006-12-13 0:01 ` Johannes Schindelin 0 siblings, 0 replies; 34+ messages in thread From: Johannes Schindelin @ 2006-12-13 0:01 UTC (permalink / raw) To: Steven Grimm; +Cc: git Hi, On Tue, 12 Dec 2006, Steven Grimm wrote: > Johannes Schindelin wrote: > > $ git pull --depth 1 > > > > Though it needs a server _and_ a client supporting shallow clones, > > which support is brewed in "next" right now. > > Will that actually discard old revisions that are already stored > locally? No. A pull should _never_ lose anything from the repository. However, if some objects become no longer reachable (and at the moment it looks like we cut off history, even if we should not need to), they can be pruned from the repo. Hth, Dscho ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using git as a general backup mechanism (was Re: Using GIT to store /etc) 2006-12-12 22:49 ` Using git as a general backup mechanism (was Re: Using GIT to store /etc) Steven Grimm 2006-12-12 22:57 ` Johannes Schindelin @ 2006-12-12 23:15 ` Martin Langhoff 2006-12-12 23:23 ` Martin Langhoff 2006-12-12 23:43 ` Using git as a general backup mechanism Junio C Hamano 2 siblings, 1 reply; 34+ messages in thread From: Martin Langhoff @ 2006-12-12 23:15 UTC (permalink / raw) To: Steven Grimm; +Cc: git Steven, I've been thinking myself of writing a pdumpfs lookalike that uses git internally. Sounds like you've got one already ;-) In terms of getting rid of old history, have you considered moving a graft point "forward" in time, and running git-repack -a -d? With your history being (mostly?) linear this could be a workable scheme, but I don't have much practice with using grafts. cheers, ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using git as a general backup mechanism (was Re: Using GIT to store /etc) 2006-12-12 23:15 ` Martin Langhoff @ 2006-12-12 23:23 ` Martin Langhoff 0 siblings, 0 replies; 34+ messages in thread From: Martin Langhoff @ 2006-12-12 23:23 UTC (permalink / raw) To: Steven Grimm; +Cc: git On 12/13/06, Martin Langhoff <martin.langhoff@gmail.com> wrote: > I've been thinking myself of writing a pdumpfs lookalike that uses git > internally. Sounds like you've got one already ;-) Actually - what I was considering was mixing the "daily commit" with GITFS ;-) http://www.sfgoth.com/~mitch/linux/gitfs/ Are your scripts published anywhere? cheers, ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using git as a general backup mechanism 2006-12-12 22:49 ` Using git as a general backup mechanism (was Re: Using GIT to store /etc) Steven Grimm 2006-12-12 22:57 ` Johannes Schindelin 2006-12-12 23:15 ` Martin Langhoff @ 2006-12-12 23:43 ` Junio C Hamano 2006-12-14 23:33 ` Steven Grimm 2 siblings, 1 reply; 34+ messages in thread From: Junio C Hamano @ 2006-12-12 23:43 UTC (permalink / raw) To: Steven Grimm; +Cc: git Steven Grimm <koreth@midwinter.com> writes: > What would be great for this would be to store each day's backup as a > git revision; with a periodic repack, this would be much more > space-efficient than the rsync hard links. > > The problem is that while that would give me a very efficient backup > scheme, the repository would still grow over time. In rsync land, I > solve the disk space issue by keeping two weeks' worth of daily > snapshots, then six months' worth of weekly snapshots, then two years' > worth of monthly snapshots; files that change daily have a constant > number of revisions stored in my backups, and older files drop off the > backup disk as they age. Why not use N independent branches? I'd illustrate only with two levels below, but you could: (0) make a full tree snapshot. Store the commit in 'daily' branch as its tip. (1) A new day comes. Create an empty branch 'daily' if you do not already have one. Make a full tree snapshot, and create a parentless commit for the day if the 'daily' branch did not exist, or make it a child of the 'daily' commit from the previous day if the branch existed. (2) End of week comes. Create an empty branch 'weekly' if you do not already have one. Make a full tree snapshot, and create a parentless commit for the week if the 'weekly' branch did not exist, or make it a child of the 'weekly' commit from the last week. Discard 'lastweek' branch if you have one, and rename 'daily' branch to 'lastweek'. 
At the end of month, you can rename 'weekly' to 'lastmonth'; if you discard previous 'lastmonth' at this point, you essentially made files older than two months drop off the backup disk. You can add more hierarchy with longer period to extend the scheme ad infinitum. ^ permalink raw reply [flat|nested] 34+ messages in thread
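The two-level rotation outlined in this message can be modelled with plain Python lists to see that the amount of retained daily history stays bounded. This is a toy model only: branches are lists of labels rather than real commits, a "week" is simply every seventh call, and `rotate` is a hypothetical name.

```python
def rotate(branches, day_number):
    """Advance the model by one day; day_number counts from 1."""
    # Steps (0)/(1): create 'daily' if needed and add today's snapshot.
    branches.setdefault("daily", []).append(f"day-{day_number}")
    if day_number % 7 == 0:
        # Step (2): add the end-of-week snapshot to 'weekly', drop the old
        # 'lastweek' and rename 'daily' to 'lastweek'.
        branches.setdefault("weekly", []).append(f"week-{day_number // 7}")
        branches["lastweek"] = branches.pop("daily")
    return branches
```

Running it for 30 days leaves 7 snapshots on 'lastweek', 2 on 'daily' and 4 on 'weekly'; the daily-granularity history never exceeds two weeks, while 'weekly' is the part that would be rotated into 'lastmonth' by the next level of the scheme.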
* Re: Using git as a general backup mechanism 2006-12-12 23:43 ` Using git as a general backup mechanism Junio C Hamano @ 2006-12-14 23:33 ` Steven Grimm 2006-12-15 0:33 ` Junio C Hamano 0 siblings, 1 reply; 34+ messages in thread From: Steven Grimm @ 2006-12-14 23:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Junio C Hamano wrote: > (2) End of week comes. Create an empty branch 'weekly' if you > do not already have one. Make a full tree snapshot, and > create a parentless commit for the week if the 'weekly' > branch did not exist, or make it a child of the 'weekly' > commit from the last week. Discard 'lastweek' branch if > you have one, and rename 'daily' branch to 'lastweek'. That sounds like it'd work, but doesn't it imply that the history of a given file in the backups is not continuous? That is, an old copy of a file on the "weekly" branch doesn't have any kind of ancestor relationship with the same file on the "daily" branch? While that's obviously no different than the current git-less situation where there's no notion of ancestry at all, it'd be neat if this backup scheme could actually track long-term changes to individual files. I wonder if rebasing can get me what I want. Something like: (1) Make a new branch from the latest daily. Commit a full tree snapshot to the new branch. (Each branch has exactly one commit.) (2) To expire a daily backup, rebase the second-oldest daily branch, which will initially be a child of the oldest daily branch, under the latest weekly branch instead. Delete the oldest daily branch. I believe the right commands here would be: git-rebase -s recursive -s ours --onto latest-weekly \ oldest-daily second-oldest-daily git-branch -D oldest-daily (Not sure about the double "-s", but I want it to detect renames where possible and never flag any conflicts.) (3) At the end of the week, instead of expiring the oldest daily branch, rename it to indicate that it's now a weekly snapshot. 
(That will implicitly do the first part of step 2, since the next daily branch in line will already be a descendant of the newly renamed branch.) Repeat step 2, rebasing against the latest monthly branch, to expire the oldest weekly. (4) To expire an old monthly, rebase the second-oldest monthly branch under the initial empty revision, then delete the oldest monthly. This is basically step 2 again, but rebasing under a fixed starting point. (5) Run git-prune to expire the objects in the deleted branches, then git-repack -a -d to delta-compress everything. That's a bit convoluted, admittedly, and probably a perversion of everything pure about the branch system, but would it work? The big thing I'm not sure about here is whether, after doing my rebase and delete in step 2, the objects from the oldest daily will actually be removed by git-prune. They should be unreachable at that point, I think. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Using git as a general backup mechanism 2006-12-14 23:33 ` Steven Grimm @ 2006-12-15 0:33 ` Junio C Hamano 0 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2006-12-15 0:33 UTC (permalink / raw) To: Steven Grimm; +Cc: git

Steven Grimm <koreth@midwinter.com> writes:

> Junio C Hamano wrote:
>> (2) End of week comes. Create an empty branch 'weekly' if you
>> do not already have one. Make a full tree snapshot, and
>> create a parentless commit for the week if the 'weekly'
>> branch did not exist, or make it a child of the 'weekly'
>> commit from the last week. Discard 'lastweek' branch if
>> you have one, and rename 'daily' branch to 'lastweek'.
>
> That sounds like it'd work, but doesn't it imply that the history of a
> given file in the backups is not continuous? That is, an old copy of a
> file on the "weekly" branch doesn't have any kind of ancestor
> relationship with the same file on the "daily" branch? While that's
> obviously no different than the current git-less situation where
> there's no notion of ancestry at all, it'd be neat if this backup
> scheme could actually track long-term changes to individual files.

You can keep them connected by rewriting the history of a bounded
number of commits.

When you start a new week, you would make the Monday commit a child of
the tip of the weekly branch that represents the latest weekly
snapshot. Then on Friday, the history would show the 5 commits during
the week and behind that would be a sequence of commits with
one-per-week granularity.

When you rotate the week's daily log out and the commit for Monday is
based on the weekly history you are going to toss out, you may need to
rebase that week's daily log branch.

Let's say your policy is to keep the daily log for at least one week
and a sufficient number of end-of-week weekly logs. Let's say it is
week #2 right now.

                   Aooo...       (week #2 daily)
                  /|
           ooooooB |             (week #1 daily)
          /        |
o--------o---------C             (end-of-week weekly log)

The first commit in this week's daily log (A) would have two parents:
the last commit from the daily log of week #1 (B), and the latest
commit on the end-of-week weekly log (C). Most likely, B and C would
have exactly the same tree.

That way, you would have at least 7 days of daily log; at the end of
this week you would have close to 14 days but "keeping at least one
week" is satisfied.

When starting the 3rd week, you will discard the 1st week's log; you
would need to rewrite 7 days' worth of commits from week #2, because
the first commit of week #2 should now only have one parent (C), and
you would forget the commit on the last day of week #1 as its parent
(B). This cascades through the 7 commits you made during week #2. You
are not changing any trees, so this should be quite efficient.

Then the first daily commit of the 3rd week would have two parents, the
commit at the end of the week #2 daily branch (D), and a new commit (E)
at the tip of the end-of-week log. Again, D and E would have identical
trees.

                           o......    (week #3 daily)
                          /|
                   Aooo..D |          (week #2 daily)
                   |       |          (week #1 daily - gone)
                   |       |
                   |       |
o--------o---------C-------E          (end-of-week weekly log)

^ permalink raw reply [flat|nested] 34+ messages in thread
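The cascade Junio describes, where dropping parent B from commit A forces all seven week #2 commits to be rewritten while their trees stay the same, can be illustrated with a small model. It is a sketch under simplifying assumptions: commit_id and build_chain are hypothetical helpers, a commit id is just a SHA-1 over the tree name and parent ids (real git commit objects also carry author, date and message), and the tree and parent names are placeholders.

```python
import hashlib


def commit_id(tree, parents):
    """Derive an id from the tree and the parent ids, loosely like git does."""
    data = "tree %s\n" % tree + "".join("parent %s\n" % p for p in parents)
    return hashlib.sha1(data.encode()).hexdigest()


def build_chain(trees, first_parents):
    """Build a linear chain of commits; only the first commit gets
    first_parents, every later commit has its predecessor as sole parent."""
    chain, parents = [], list(first_parents)
    for tree in trees:
        cid = commit_id(tree, parents)
        chain.append((cid, tree))
        parents = [cid]
    return chain


# Placeholder trees for the seven daily snapshots of week #2.
TREES = ["tree-day%d" % n for n in range(1, 8)]

# B: last daily commit of week #1; C: tip of the weekly log (same tree).
B = commit_id("tree-week1-end", ["week1-day6"])
C = commit_id("tree-week1-end", ["weekly-prev"])

before = build_chain(TREES, [B, C])   # A has two parents: B and C
after = build_chain(TREES, [C])       # rewritten: A keeps only C

changed = sum(1 for old, new in zip(before, after) if old[0] != new[0])
same_trees = [t for _, t in before] == [t for _, t in after]
```

Here every commit id in the week changes (changed equals the chain length) because each id depends on its parent's id, while same_trees stays true, which matches the point that no tree objects need to be rewritten.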
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-12 13:49 ` Kyle Moffett 2006-12-12 15:53 ` Andy Parkins @ 2006-12-13 18:10 ` Daniel Barkalow 2006-12-14 5:06 ` Chris Riddoch 1 sibling, 1 reply; 34+ messages in thread
From: Daniel Barkalow @ 2006-12-13 18:10 UTC (permalink / raw) To: Kyle Moffett; +Cc: git

On Tue, 12 Dec 2006, Kyle Moffett wrote:

> Hmm, ok. It would seem to be a reasonable requirement that if you want to
> change any of the "preserve_*_attributes" config options you need to blow away
> and recreate your index, no? I would probably change the underlying index
> format pretty completely and stick a new version tag inside it.

You should be able to promote an insufficient-version index to a
new-version index that needs to be refreshed for every entry. (And then
update-index would take care of the necessary rewrite-everything in the
normal way). But I suspect that the right thing is to require that the
repository be created with a "commits-include-directories-not-trees" flag,
and this means that you always use the extra-detailed index, and the
options only affect what information is filtered out in transit between
the directory object and the index. Having more information in the index
is merely a potential waste of space, not a correctness issue (we have
extra information for trees in the index now, remember); it just means
that there are more things that will cause git to reread the file, rather
than declaring it unchanged with a stat().

For that matter, it may be best for the directory objects to record what
information in them is real, and keep the "what's content" mask in the
index as well. If it changes over the history of a repository, you want to
correctly interpret the historical commits.

> Ok, seems straightforward enough. One other thing that crossed my mind was
> figuring out how to handle hardlinks. The simplest solution would be to add
> an extra layer of indirection between the "file inode" and the "file data".
> Instead of your directory pointing to a "file-data" blob and "file-attributes"
> object, it would point to an "file-inode" object with embedded attribute data
> and a pointer to the file contents blob.
>
> I remember reading some discussions from the early days of GIT about how that
> was considered and discarded because the extra overhead wouldn't give any real
> tangible benefit. On the other hand for something like /etc the added
> benefits of tracking extended attributes and hardlinks might outweigh the cost
> of a bunch of extra objects in the database. A bit of care with the
> construction of the index file should make it sufficiently efficient for
> day-to-day usage.

I was thinking this could be internal to the directory object, but you
probably want to support hardlinks shared between dentries in different
directory objects, so you're probably right that this makes sense.

Alternatively, you could use a single "directory" object for the whole
state (including subdirectories), making hardlinks out of the object
clearly impossible, or you could use some scheme for sharing
sub-"directory" objects that would imply that hardlinks are within an
object (the hard part here is finding things when their locations aren't
predictable by name).

	-Daniel

^ permalink raw reply [flat|nested] 34+ messages in thread
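To make the "file-inode" indirection discussed above a little more concrete, here is a rough sketch of how a tool outside git might group directory entries by the inode they share. Nothing here is a git API: scan_inodes and hardlink_groups are hypothetical helpers, the record layout is an assumption, and only regular files are considered. Each record plays the role of one "file-inode" object (attributes plus the list of names pointing at it).

```python
import os
import stat
from collections import defaultdict


def scan_inodes(root):
    """Map each regular file's (device, inode) pair to its attributes
    and to every path under root that refers to it."""
    inodes = defaultdict(
        lambda: {"mode": None, "uid": None, "gid": None, "paths": []}
    )
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, devices, ...
            entry = inodes[(st.st_dev, st.st_ino)]
            entry["mode"] = stat.S_IMODE(st.st_mode)
            entry["uid"], entry["gid"] = st.st_uid, st.st_gid
            entry["paths"].append(os.path.relpath(path, root))
    return inodes


def hardlink_groups(inodes):
    """Return the sorted path lists of inodes that have more than one name."""
    return sorted(
        sorted(entry["paths"])
        for entry in inodes.values()
        if len(entry["paths"]) > 1
    )
```

Because paths in different subdirectories end up in the same record, this corresponds to the case Daniel raises of hardlinks shared between entries in different directory objects.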
* Re: Using GIT to store /etc (Or: How to make GIT store all file permission bits) 2006-12-13 18:10 ` Using GIT to store /etc (Or: How to make GIT store all file permission bits) Daniel Barkalow @ 2006-12-14 5:06 ` Chris Riddoch 0 siblings, 0 replies; 34+ messages in thread
From: Chris Riddoch @ 2006-12-14 5:06 UTC (permalink / raw) To: Daniel Barkalow; +Cc: Kyle Moffett, git

So, I've been making little repositories for appropriately related
stuff. For example, I have a repository for my ~/.bashrc,
~/.bash_profile, ~/.bash_completions/*, and such.

I recall Linus's post in the "VCS Comparison Table" thread, and after
thinking about it, I decided the best thing to do would be to have a
couple extra files tracked in the repository, alongside other data.

I use a backup shell script to copy things from my system to the
repository, and then I run getfacl on it all to write out all the
details to a 'facl' file in my repository. Then I can make a commit.

Then there's a restore shell script to copy things back to my system,
and restore ownership and permissions with setfacl.

I store the backup and restore scripts in the repository. Paths are
currently hard-coded. I'm sure there's a more flexible way to do
this, though I'd need some means of representing the correspondence
between content in the repository and files in my filesystem.

On 12/13/06, Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Tue, 12 Dec 2006, Kyle Moffett wrote:
>
> > Hmm, ok. It would seem to be a reasonable requirement that if you want to
> > change any of the "preserve_*_attributes" config options you need to blow away
> > and recreate your index, no? I would probably change the underlying index
> > format pretty completely and stick a new version tag inside it.
>
> You should be able to promote an insufficient-version index to a
> new-version index that's needs to be refreshed for every entry. (And then
> update-index would take care of the necessary rewrite-everything in the
> normal way). But I suspect that the right thing is to require that the
> repository be created with a "commits-include-directories-not-trees" flag,
> and this means that you always use the extra-detailed index, and the
> options only affect what information is filtered out in transit between
> the directory object and the index. Having more information in the index
> is merely a potential waste of space, not a correctness issue (we have
> extra information for trees in the index now, remember); it just means
> that there are more things that will cause git to reread the file, rather
> than declaring it unchanged with a stat().
>
> For that matter, it may be best for the directory objects to record what
> information in them is real, and keep the "what's content" mask in the
> index as well. If it changes over the history of a repository, you want to
> correctly interpret the historical commits.
>
> > Ok, seems straightforward enough. One other thing that crossed my mind was
> > figuring out how to handle hardlinks. The simplest solution would be to add
> > an extra layer of indirection between the "file inode" and the "file data".
> > Instead of your directory pointing to a "file-data" blob and "file-attributes"
> > object, it would point to an "file-inode" object with embedded attribute data
> > and a pointer to the file contents blob.
> >
> > I remember reading some discussions from the early days of GIT about how that
> > was considered and discarded because the extra overhead wouldn't give any real
> > tangible benefit. On the other hand for something like /etc the added
> > benefits of tracking extended attributes and hardlinks might outweigh the cost
> > of a bunch of extra objects in the database. A bit of care with the
> > construction of the index file should make it sufficiently efficient for
> > day-to-day usage.
>
> I was thinking this could be internal to the directory object, but you
> probably want to support hardlinks shared between dentries in different
> directory objects, so you're probably right that this makes sense.
>
> Alternatively, you could use a single "directory" object for the whole
> state (including subdirectories), making hardlinks out of the object
> clearly impossible, or you could use some scheme for sharing
> sub-"directory" objects that would imply that hardlinks are within an
> object (the hard part here is finding things when their locations aren't
> predictable by name).
>
> -Daniel
> *This .sig left intentionally blank*

--
epistemological humility

^ permalink raw reply [flat|nested] 34+ messages in thread
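The backup/restore approach in Chris Riddoch's message (write the metadata to a tracked file before committing, reapply it after checkout) can be sketched roughly as follows. This is not his script: it swaps getfacl/setfacl for plain mode, uid and gid recorded in a JSON manifest, so ACLs are not covered, and the function names and manifest layout are made up for illustration. Restoring ownership normally needs root, so it is off by default.

```python
import json
import os
import stat


def save_metadata(root, manifest_path):
    """Record mode, uid and gid for every file and directory under root."""
    records = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                continue  # symlink permissions are not meaningful here
            records[os.path.relpath(path, root)] = {
                "mode": stat.S_IMODE(st.st_mode),
                "uid": st.st_uid,
                "gid": st.st_gid,
            }
    with open(manifest_path, "w") as fh:
        json.dump(records, fh, indent=1, sort_keys=True)


def restore_metadata(root, manifest_path, restore_owner=False):
    """Reapply the recorded permission bits (and optionally ownership)."""
    with open(manifest_path) as fh:
        records = json.load(fh)
    for rel, rec in records.items():
        path = os.path.join(root, rel)
        if not os.path.lexists(path):
            continue  # file was removed since the manifest was written
        if restore_owner:
            # chown first: changing ownership can clear setuid/setgid bits
            os.chown(path, rec["uid"], rec["gid"])
        os.chmod(path, rec["mode"])
```

A typical cycle would be save_metadata() just before a commit and restore_metadata() just after a checkout, with the manifest itself tracked in the repository alongside the data, much like the 'facl' file described above.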