* Kernel headers git tree
@ 2006-07-13 23:59 David Woodhouse
From: David Woodhouse @ 2006-07-13 23:59 UTC (permalink / raw)
To: linux-kernel; +Cc: git
At http://git.kernel.org/git/?p=linux/kernel/git/dwmw2/kernel-headers.git
there's a git tree which contains the sanitised exported headers for all
architectures -- basically the result of 'make headers_install'.
It tracks Linus' kernel tree, by means of some evil scripts.¹
Only commits in Linus' tree which actually affect the exported result
should have an equivalent commit in the above tree, which means that any
changes which affect userspace should be clearly visible for review.
--
dwmw2
¹ http://david.woodhou.se/extract-khdrs-git.sh and
http://david.woodhou.se/extract-khdrs-stage2.sh for the stout of stomach
* Re: Kernel headers git tree
From: Junio C Hamano @ 2006-07-14 0:39 UTC (permalink / raw)
To: David Woodhouse; +Cc: git

David Woodhouse <dwmw2@infradead.org> writes:

> ¹ http://david.woodhou.se/extract-khdrs-git.sh and
> http://david.woodhou.se/extract-khdrs-stage2.sh for the stout of stomach

With modern enough git, you can rewrite

    KBUILDSHA=`git ls-tree $TREE -- Kbuild | cut -f3 -d\ | cut -f1`

with

    KBUILDSHA1=`git rev-parse $TREE:Kbuild`

I am not sure what function incparent() is trying to do with
this:

    git rev-list --max-count=1 --topo-order $1 -- .
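The substitution suggested above can be checked in a throwaway repository. This is only a sketch: the repository, the contents of the `Kbuild` file and the variable names `sha_old` and `sha_new` are made up for illustration and are not taken from the scripts under discussion.

```shell
#!/bin/sh
# Build a scratch repository that contains a file called Kbuild.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
echo 'header-y += foo.h' > Kbuild      # made-up contents
git add Kbuild
git commit -q -m 'add Kbuild'

TREE=HEAD

# Old form: ls-tree prints "<mode> <type> <object>TAB<path>", so cut the
# third space-separated field and then drop everything after the tab.
sha_old=$(git ls-tree $TREE -- Kbuild | cut -f3 -d\  | cut -f1)

# Suggested form: rev-parse resolves "<tree-ish>:<path>" directly.
sha_new=$(git rev-parse $TREE:Kbuild)

echo "ls-tree + cut: $sha_old"
echo "rev-parse:     $sha_new"
```

Both lines should name the same blob; the second form simply avoids parsing the `ls-tree` listing by hand.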
* Re: Kernel headers git tree
From: David Woodhouse @ 2006-07-14 0:56 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git

On Thu, 2006-07-13 at 17:39 -0700, Junio C Hamano wrote:
> With modern enough git, you can rewrite
> KBUILDSHA=`git ls-tree $TREE -- Kbuild | cut -f3 -d\ | cut -f1`
> with
> KBUILDSHA1=`git rev-parse $TREE:Kbuild`

Aha. Thanks.

> I am not sure what function incparent() is trying to do with
> this:
>
> git rev-list --max-count=1 --topo-order $1 -- .

Find the latest ancestor commit which actually changed any files. The
first script has a similar line, except that it finds the latest
ancestor which changed anything in include/

Consider a kernel tree with commits A-->B-->C-->D, of which only A and C
change anything in include/ and in fact only C actually changes the
_exported_ headers after the unifdef and sed bits.

The first script (extract-khdrs-git.sh) creates a 'stage1' branch which
only contains commits A'-->C', with the _exported_ header tree for each.

The second script (extract-khdrs-stage2.sh) then creates the master
branch with the same tree objects, but omitting the commits which don't
change anything. So it contains only commit C''

For an example of this, compare
http://git.kernel.org/git/?p=linux/kernel/git/dwmw2/kernel-headers.git
with
http://git.kernel.org/git/?p=linux/kernel/git/dwmw2/kernel-headers.git;a=shortlog;h=stage1

Btw, git-rev-list is _very_ slow at this. Even when the output is
actually HEAD, it takes my 2.3GHz G5 a _long_ time to give a result:

pmac /pmac/git/linux-2.6 $ git-rev-parse HEAD
ab6cf0d0cb96417ef65cc2c2120c0e879edf7a4a
pmac /pmac/git/linux-2.6 $ time git-rev-list --max-count=1 --topo-order HEAD -- include
ab6cf0d0cb96417ef65cc2c2120c0e879edf7a4a

real 0m18.840s

Is there a better way to do that step?

--
dwmw2
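The A-->B-->C-->D case described above can be reproduced on a small scale. The following is a minimal sketch in a scratch repository, assuming a linear history in which only A and C touch `include/`; the file names and the `mkcommit` helper are hypothetical and are not part of the extract-khdrs scripts.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# mkcommit <subject>: commit everything in the work tree, print the new commit.
mkcommit() { git add -A; git commit -q -m "$1"; git rev-parse HEAD; }

mkdir include
echo 1 > include/a.h;  A=$(mkcommit A)   # touches include/
echo 1 > README;       B=$(mkcommit B)   # does not
echo 2 > include/a.h;  C=$(mkcommit C)   # touches include/
echo 2 > README;       D=$(mkcommit D)   # does not

# Latest ancestor of HEAD (= D) that changed anything under include/.
latest=$(git rev-list --max-count=1 HEAD -- include)
echo "HEAD is        $D"
echo "latest include $latest (commit C is $C)"

# Every commit that would get a counterpart on a stage1-like branch.
git rev-list HEAD -- include
```

Here the path-limited walk skips D and B entirely, which is the selection the first script relies on.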
* Re: Kernel headers git tree
From: Linus Torvalds @ 2006-07-14 1:08 UTC (permalink / raw)
To: David Woodhouse; +Cc: Junio C Hamano, git

On Fri, 14 Jul 2006, David Woodhouse wrote:
>
> Btw, git-rev-list is _very_ slow at this. Even when the output is
> actually HEAD, it takes my 2.3GHz G5 a _long_ time to give a result:
>
> pmac /pmac/git/linux-2.6 $ git-rev-parse HEAD
> ab6cf0d0cb96417ef65cc2c2120c0e879edf7a4a
> pmac /pmac/git/linux-2.6 $ time git-rev-list --max-count=1 --topo-order HEAD -- include
> ab6cf0d0cb96417ef65cc2c2120c0e879edf7a4a
>
> real 0m18.840s
>
> Is there a better way to do that step?

Umm.. On my poor little 1.6GHz laptop:

    [torvalds@evo linux]$ time git-rev-list --max-count=1 HEAD -- include
    ab6cf0d0cb96417ef65cc2c2120c0e879edf7a4a

    real 0m0.014s
    user 0m0.004s
    sys 0m0.012s

that's 0.014 sec. Not exactly slow.

Now, the --topo-order you have there does slow it down a lot:

    [torvalds@evo linux]$ time git-rev-list --max-count=1 --topo-order HEAD -- include
    ab6cf0d0cb96417ef65cc2c2120c0e879edf7a4a

    real 0m24.016s
    user 0m23.973s
    sys 0m0.016s

so now it takes 24 seconds, and gives the same result.

        Linus
* Re: Kernel headers git tree
From: Junio C Hamano @ 2006-07-14 2:37 UTC (permalink / raw)
To: David Woodhouse; +Cc: git

David Woodhouse <dwmw2@infradead.org> writes:

> On Thu, 2006-07-13 at 17:39 -0700, Junio C Hamano wrote:
>> With modern enough git, you can rewrite
>> KBUILDSHA=`git ls-tree $TREE -- Kbuild | cut -f3 -d\ | cut -f1`
>> with
>> KBUILDSHA1=`git rev-parse $TREE:Kbuild`
>
>
> Aha. Thanks.
>
>> I am not sure what function incparent() is trying to do with
>> this:
>>
>> git rev-list --max-count=1 --topo-order $1 -- .
>
> Find the latest ancestor commit which actually changed any files. The
> first script has a similar line, except that it finds the latest
> ancestor which changed anything in include/
>
> Consider a kernel tree with commits A-->B-->C-->D, of which only A and C
> change anything in include/ and in fact only C actually changes the
> _exported_ headers after the unifdef and sed bits.
>
> The first script (extract-khdrs-git.sh) creates a 'stage1' branch which
> only contains commits A'-->C', with the _exported_ header tree for each.
>
> The second script (extract-khdrs-stage2.sh) then creates the master
> branch with the same tree objects, but omitting the commits which don't
> change anything. So it contains only commit C''

I guess what I was getting at was if you can avoid creating
commits that do not change anything from previous in stage1
branch, you do not have to do this, but I haven't studied stage1
script deeply enough.
* Re: Kernel headers git tree
From: Linus Torvalds @ 2006-07-14 1:05 UTC (permalink / raw)
To: Junio C Hamano; +Cc: David Woodhouse, git

On Thu, 13 Jul 2006, Junio C Hamano wrote:
>
> I am not sure what function incparent() is trying to do with
> this:
>
> git rev-list --max-count=1 --topo-order $1 -- .

Yeah, that looks strange.

The "--topo-order" in particular looks pointless, and just slows things
down.

The default ordering from git-rev-list (and all other revision listing
things, ie "git log" etc) _does_ guarantee that we never show a child
before _one_ of its parents has been shown (although "parent" in this case
may be the command line).

As such, "--max-count=1 --topo-order" is pointless if you only give one
revision, because whether you use --topo-order or not, the first commit
will always be the parent of all subsequent commits.

So --topo-order just makes things MUCH MUCH slower with no upsides.

But that thing is doubly strange, because it uses "." as a path specifier.
If this is done in the top-most directory, that should mean "all changes",
which in turn means that the whole thing should be equivalent to

    git rev-parse "$1^0"

since all commits should make _some_ change, and thus the first revision
in the list should always be the top commit - the one you passed in as an
argument.

        Linus
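The equivalence claimed above can be tried out directly. This is a sketch under the stated assumption that every commit really changes something; the scratch repository and the variable names are illustrative only.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Three commits, each of which modifies the tree.
for n in 1 2 3; do
    echo "$n" > file
    git add file
    git commit -q -m "change $n"
done

with_topo=$(git rev-list --max-count=1 --topo-order HEAD -- .)
without_topo=$(git rev-list --max-count=1 HEAD -- .)
tip=$(git rev-parse "HEAD^0")

echo "--topo-order:   $with_topo"
echo "default order:  $without_topo"
echo "rev-parse ^0:   $tip"
```

All three commands should name the same commit here. The `-- .` form only starts to differ from `rev-parse` once the history contains commits that leave the tree untouched.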
* Re: Kernel headers git tree
From: David Woodhouse @ 2006-07-14 1:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git

On Thu, 2006-07-13 at 18:05 -0700, Linus Torvalds wrote:
>
> On Thu, 13 Jul 2006, Junio C Hamano wrote:
> >
> > I am not sure what function incparent() is trying to do with
> > this:
> >
> > git rev-list --max-count=1 --topo-order $1 -- .
>
> Yeah, that looks strange.
>
> The "--topo-order" in particular looks pointless, and just slows things
> down.
>
> The default ordering from git-rev-list (and all other revision listing
> things, ie "git log" etc) _does_ guarantee that we never show a child
> before _one_ of its parents has been shown (although "parent" in this case
> may be the command line).

Does it? I thought at one point it sorted on some random criterion like
alphabetically by author, or some other cosmetic information which isn't
really part of the git structure -- like the timestamp or something?
We still don't enforce monotonicity, do we? The timestamps are still
just fluff?

> But that thing is doubly strange, because it uses "." as a path specifier.
> If this is done in the top-most directory, that should mean "all changes",
> which in turn means that the whole thing should be equivalent to
>
> git rev-parse "$1^0"
>
> since all commits should make _some_ change, and thus the first revision
> in the list should always be the top commit - the one you passed in as an
> argument.

In this case, I really do have commits in the intermediate tree which
don't actually change anything, and I want to filter them out -- I
couldn't see a simple way to do it all in one pass.

--
dwmw2
* Re: Kernel headers git tree
From: Linus Torvalds @ 2006-07-14 5:16 UTC (permalink / raw)
To: David Woodhouse; +Cc: Junio C Hamano, git

On Fri, 14 Jul 2006, David Woodhouse wrote:
> >
> > The default ordering from git-rev-list (and all other revision listing
> > things, ie "git log" etc) _does_ guarantee that we never show a child
> > before _one_ of its parents has been shown (although "parent" in this case
> > may be the command line).
>
> Does it? I thought at one point it sorted on some random criterion like
> alphabetically by author, or some other cosmetic information which isn't
> really part of the git structure -- like the timestamp or something?
> We still don't enforce monotonicity, do we? The timestamps are still
> just fluff?

The timestamps are, and always have been, just a heuristic.

The output order of git-rev-list is actually entirely well-defined, but
it's the _cheap_ ordering, not the strict and full topological one.

The cheap ordering means that we don't ever look at the whole history, but
it's still a real "DAG reachability ordering" in the sense that when we
output a commit, we have _always_ output _one_ full path of commits to
reach that commit from one of the starting point.

But since you can traverse the DAG in any number of ways, the heuristic is
that when there are multiple choices, we pick the one with the most recent
commit date.

So to give an example, let's say we have

    HEAD -> A
           / \
          B   C
         / \   \
        D   E   F
         \ /   / \
          G   H   I
          .......

the difference between --topo-order and the default ordering for

    git rev-list HEAD

is most visible for commit 'G'.

For --topo-order, we guarantee that before we show 'G', we _will_ have
shown both 'D' and 'E'. In other words, --topo-ordering guarantees that it
shows _all_ children before it shows the parent.

That's a _very_ very expensive thing to guarantee, because you can't
actually tell that you've seen all children on 'G' before you've basically
traversed most of the tree. In the above example, you CANNOT tell whether
'F' is a child of 'G', for example. Think about it. You don't know - maybe
the missing piece is 'I' -> 'Z' -> 'G', but without having parsed all the
commits, you'll never know.

 [ Actually, strictly speaking, you can guarantee it earlier than before
   you parsed them _all_: you can guarantee it once _every_single_commit_
   whose parents you haven't followed yet is a direct ancestor of 'G' - at
   that point, and not before, do you know that 'G' can have no more
   children.

   That's actually very expensive to compute, so we don't do it - we will
   walk the whole history, and only _then_ do we use one of the algorithms
   to generate a topological sort from the full DAG. If somebody knows of
   an _incremental_ algorithm that doesn't need the full DAG and can do a
   topo-ordering, that would be wonderful. But it's basically very very
   very expensive. ]

So by default, we don't do that at all. By default, we will print out 'G'
whenever we have printed out _any_ path leading to 'G', and 'G' is the
commit with the most recent commit date. So we might print things out as

    A, B, D, G, E ...

- notice how we printed out 'E' _after_ we did 'G', but we did have the
A->B->D->G path, so G was reachable from the top along the path we
printed.

> In this case, I really do have commits in the intermediate tree which
> don't actually change anything, and I want to filter them out -- I
> couldn't see a simple way to do it all in one pass.

Ok, in that case, the "." is correct, but the --topo-order should be
unnecessary because you only care about the first entry.

        Linus
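The difference between the two orderings can be explored on the B/D/E/G corner of the example graph. The sketch below is an illustration only: the `mk` helper and the timestamps are invented, every commit shares the empty tree, and commit E is deliberately given a date older than its parent G so that the date heuristic has something to trip over.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

tree=$(git write-tree)          # empty tree, shared by all commits

# mk <subject> <unix-time> [parent...]: create a commit, print its name.
mk() {
    subject=$1 when=$2
    shift 2
    parents=
    for p in "$@"; do parents="$parents -p $p"; done
    echo "$subject" |
    GIT_AUTHOR_DATE="$when +0000" GIT_COMMITTER_DATE="$when +0000" \
        git commit-tree "$tree" $parents
}

G=$(mk G 1152000200)
D=$(mk D 1152000300 "$G")
E=$(mk E 1152000100 "$G")       # skewed clock: older than its parent G
B=$(mk B 1152000400 "$D" "$E")  # merge of D and E

default_order=$(git log --format=%s "$B" | tr '\n' ' ' | sed 's/ $//')
topo_order=$(git log --topo-order --format=%s "$B" | tr '\n' ' ' | sed 's/ $//')

echo "default:      $default_order"
echo "--topo-order: $topo_order"
```

With `--topo-order`, G can only appear after both D and E. In the default order G may show up before E, because the walk follows the more recent D first and E carries the oldest date.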
* Re: Kernel headers git tree
From: David Woodhouse @ 2006-07-14 10:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git

On Thu, 2006-07-13 at 22:16 -0700, Linus Torvalds wrote:
> So to give an example, let's say we have
>
>     HEAD -> A
>            / \
>           B   C
>          / \   \
>         D   E   F
>          \ /   / \
>           G   H   I
>           .......
>
> the difference between --topo-order and the default ordering for
>
> git rev-list HEAD
>
> is most visible for commit 'G'.
>
> For --topo-order, we guarantee that before we show 'G', we _will_ have
> shown both 'D' and 'E'. In other words, --topo-ordering guarantees that it
> shows _all_ children before it shows the parent.

Ah, OK. Then it should probably be fine. I'll talk myself through it...

We're building a parallel graph of commits, containing a _subset_ of the
commits in the master tree -- only those which touch certain files. For
each 'interesting' commit X, we create a corresponding commit X' in the
slave tree -- we create the corresponding tree object, and we also
recursively create its parent commits -- replacing each parent in the
original commit X with the slave-tree equivalent of the closest
_interesting_ ancestor commit.

It's that "closest interesting ancestor" which we're finding with the
'rev-list --max-count=1 -- myfile' invocation.

The extract-khdrs-stage2.sh script is a simple example of this, and
differs from the other script mostly in the way that it creates the
_tree_ objects.

So working from your example above, and assuming that only commits I and
E actually change the files we care about. This means that merges A, B
and F are _also_ going to show up in the output of 'rev-list -- myfile'.

So the slave tree will look like this:

         A'
        / \
       B'  F'
       |   |
       E'  I'

The interesting case, if I'm trying to convince myself that my 'slave'
tree is always going to have the correct topology, is when a merge
commit is _missing_ from the rev-list output -- for example, if commits
D and E in your original tree both make the _same_ change, then I
believe that the merge commit B will no longer show up, because 'myfile'
is identical in B and in both of its parents.

In that case, we accept that the representation isn't going to be
perfect -- the left-hand parent of A' is going to appear to be _either_
D' or E', but not B'. In fact, since D' and E' are _identical_ as far as
we're concerned, it doesn't really matter which is chosen. The other one
of the two becomes an unused branch with no children -- we end up with a
graph looking like this.

           A'
          /  \
      D' E'   F'
       \/     |
              I'

... and the parent of D' and E' is the closest ancestor of G which
actually touches the files we care about, of course.

All we care about, in this case, is that the first commit listed by
rev-list is _either_ D or E, and not something further down the tree.
And that's obviously true from your description of the 'weak ordering',
so yes -- it does look like I can drop the '--topo-order'. Thanks.

(It would actually be quite nice if I _could_ find a cheap way to
include commit B' in that final example, but it's such a rare case and
it would be so expensive to do it that I don't think it's worth
pursuing.)

--
dwmw2
* Re: Kernel headers git tree
From: Linus Torvalds @ 2006-07-14 15:57 UTC (permalink / raw)
To: David Woodhouse; +Cc: Junio C Hamano, git

On Fri, 14 Jul 2006, David Woodhouse wrote:
> On Thu, 2006-07-13 at 22:16 -0700, Linus Torvalds wrote:
> >
> >     HEAD -> A
> >            / \
> >           B   C
> >          / \   \
> >         D   E   F
> >          \ /   / \
> >           G   H   I
> >           .......
> >
>
> So working from your example above, and assuming that only commits I and
> E actually change the files we care about. This means that merges A, B
> and F are _also_ going to show up in the output of 'rev-list -- myfile'.

Not necessarily.

> So the slave tree will look like this:
>
>          A'
>         / \
>        B'  F'
>        |   |
>        E'  I'

Yes, but ONLY IF the following is true: A is different from _both_ F and B
in the relevant files.

If A == F (in those files), then the A merge will have been simplified
away. Strictly speaking, what happens is that when it sees the merge A
(which has parents B and C), and sees that _all_ the changes came from C,
the simplification will decide that B simply isn't even interesting, and
rewrite the merge A as having _only_ C as a parent, since C clearly
explains everything that happened to those files, and B had nothing to do
with it.

It will then remove both A (which is no longer a merge) and C, since
neither of them change the files, and will leave you with just

    F'
    |
    I'

instead.

> The interesting case, if I'm trying to convince myself that my 'slave'
> tree is always going to have the correct topology, is when a merge
> commit is _missing_ from the rev-list output

Note that there are only two ways you can be missing a merge:

 - you literally asked for it with "--no-merges"

 - the merge had one parent that was identical to it, and the merge was
   simplified as above.

> In that case, we accept that the representation isn't going to be
> perfect -- the left-hand parent of A' is going to appear to be _either_
> D' or E', but not B'. In fact, since D' and E' are _identical_ as far as
> we're concerned, it doesn't really matter which is chosen. The other one
> of the two becomes an unused branch with no children -- we end up with a
> graph looking like this.
>
>            A'
>           /  \
>       D' E'   F'
>        \/     |
>               I'

You will never see this, because D' is simply not reachable. You can have
either:

 - A got simplified away as a merge entirely, because C was identical, and
   B was thus considered "uninteresting" (as in "it does not matter for the
   end result"), and then the later phase will always remove A too (since,
   by definition, for the merge to be simplified to a non-merge, it must
   be identical to the parent it was simplified to have)

 - or _both_ B and C were different to A in those files, and A still
   exists as a merge, but B was identical to one of its parents (let's
   say E), and was first simplified to "B->E->G", and then because B and
   E were identical, B itself was dropped, and only

         A'
        / \
       E'  F'
       |   |
       G'  I'

   remains.

NOTE NOTE NOTE! This is how "git rev-list" (and all the other related git
tools, like "git log" etc) simplify the tree. It is, in my opinion, the
only sane way to do it, although you can pass in "--full-history" to say
that you don't want any merge simplification at all.

The reason I mention it is that _your_ simplifications may obviously do
something else entirely, and you may obviously have different rules for
how you simplify the tree further. But it sounds like you don't simplify
the history at all (apart from the simplification that git-rev-list did
for you)?

        Linus
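The second way of losing a merge described above (one parent identical to the merge in the files of interest) can be reproduced in a few commands. This is a sketch in a scratch repository; the branch name `side` and the file names `f` and `other` are invented, and the commit subjects mirror the letters of the example graph only loosely.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
main=$(git symbolic-ref --short HEAD)   # whatever the initial branch is called

echo 1 > f; git add f; git commit -q -m G
G=$(git rev-parse HEAD)
git branch side                          # fork point at G

echo 2 > f; git commit -q -a -m E        # E changes f
E=$(git rev-parse HEAD)

git checkout -q side
echo x > other; git add other
git commit -q -m D                       # D leaves f alone

git merge -q -m B "$main"                # merge B: f is taken unchanged from E

# Path-limited walk: B is identical to its parent E as far as f is
# concerned, so the merge is simplified away and D is never visited.
simplified=$(git rev-list HEAD -- f)
echo "$simplified"

# For comparison, the same walk with merge simplification turned off.
git rev-list --full-history HEAD -- f
```

The default walk should list only E and G. Neither the merge B nor the side commit D shows up, since E alone explains the state of `f`.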
* Re: Kernel headers git tree
From: Daniel Barkalow @ 2006-07-14 17:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Woodhouse, Junio C Hamano, git

On Fri, 14 Jul 2006, Linus Torvalds wrote:

> On Fri, 14 Jul 2006, David Woodhouse wrote:
>
> > On Thu, 2006-07-13 at 22:16 -0700, Linus Torvalds wrote:
> > >
> > >     HEAD -> A
> > >            / \
> > >           B   C
> > >          / \   \
> > >         D   E   F
> > >          \ /   / \
> > >           G   H   I
> > >           .......
> > >
> >
> > So working from your example above, and assuming that only commits I and
> > E actually change the files we care about. This means that merges A, B
> > and F are _also_ going to show up in the output of 'rev-list -- myfile'.
>
> Not necessarily.
>
> > So the slave tree will look like this:
> >
> >          A'
> >         / \
> >        B'  F'
> >        |   |
> >        E'  I'
>
> Yes, but ONLY IF the following is true: A is different from _both_ F and B
> in the relevant files.

Actually, this is an unlikely result, because B' and F' wouldn't appear
unless they either have multiple children that appear or they have new
modifications made to the files during the merge. The result under the
conditions that the only changes are in E and I is:

      A'
     / \
    E'  I'

Which, of course, is what you should expect: it only includes E, I, and
merges which create a novel combination of changes (even if the changes
they include have appeared alone before).

> NOTE NOTE NOTE! This is how "git rev-list" (and all the other related git
> tools, like "git log" etc) simplify the tree. It is, in my opinion, the
> only sane way to do it, although you can pass in "--full-history" to say
> that you don't want any merge simplification at all.
>
> The reason I mention it is that _your_ simplifications may obviously do
> something else entirely, and you may obviously have different rules for
> how you simplify the tree further. But it sounds like you don't simplify
> the history at all (apart from the simplification that git-rev-list did
> for you)?

It seems like we ought to be able to provide the simplification procedure
to code that's done further filtering on the set of commits somehow, or
provide a framework with a callback, but it's a non-trivial design.

I think that a program to generate a slave git tree based in some
user-modifiable way on a parent repository would be useful and
implementable. I'd thought a bunch about it a while ago, for extracting
separable parts of projects (e.g., make a kbuild project that's pulled out
of the kernel tree, but is still a regular git project to anyone who
doesn't know this). My conclusion was that you need a cache of mappings,
because otherwise you can't identify that you already have a transformed
version of a commit, because you don't know its transformed parents,
unless you've gone all the way back to the root (which doesn't have
parents).

But I think a "git2git" script wouldn't be any harder than the other
import scripts, and would solve this problem nicely.

    -Daniel
*This .sig left intentionally blank*
* Re: Kernel headers git tree
From: David Woodhouse @ 2006-07-14 17:58 UTC (permalink / raw)
To: Daniel Barkalow; +Cc: Linus Torvalds, Junio C Hamano, git

On Fri, 2006-07-14 at 13:51 -0400, Daniel Barkalow wrote:
> I think that a program to generate a slave git tree based in some
> user-modifiable way on a parent repository would be useful and
> implementable. I'd thought a bunch about it a while ago, for extracting
> separable parts of projects (e.g., make a kbuild project that's pulled out
> of the kernel tree, but is still a regular git project to anyone who
> doesn't know this). My conclusion was that you need a cache of mappings,
> because otherwise you can't identify that you already have a transformed
> version of a commit, because you don't know its transformed parents,
> unless you've gone all the way back to the root (which doesn't have
> parents).

Absolutely. You don't want to go all the way back to the root every time
-- it's an incremental process, and you have to cache the mappings from
objects in the 'master' tree to objects in the 'slave' tree. My existing
scripts already do that part -- I didn't think it was worth commenting
on.

http://david.woodhou.se/extract-jffs2-git.sh
http://david.woodhou.se/extract-khdrs-git.sh
http://david.woodhou.se/extract-khdrs-stage2.sh

And no, I don't do any further simplification of the graph of commits
other than what 'git-rev-list' does for me. I need to fully go over
Linus' last mail and understand it, but I think the conclusion is that
the above scripts are fine, and I can happily drop --topo-order from
them.

--
dwmw2
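The idea of a slave tree plus a cache of master-to-slave mappings can be sketched roughly as follows. This is not the extract-khdrs scripts and makes no claim to match them: the `map` directory, the `slave` branch name and the choice of "the `include/` subtree" as the transformed tree are assumptions for illustration. It relies on `git rev-list --parents` printing each selected commit together with its already simplified parents.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

# Master history A-->B-->C-->D; only A and C touch include/.
mkcommit() { git add -A; git commit -q -m "$1"; }
mkdir include
echo 1 > include/a.h; mkcommit A
echo 1 > README;      mkcommit B
echo 2 > include/a.h; mkcommit C
echo 2 > README;      mkcommit D

# Hypothetical cache: one file per master commit, holding its slave commit.
map=$(mktemp -d)

# Walk parents-first, so every rewritten parent is already in the cache.
git rev-list --reverse --topo-order --parents HEAD -- include |
while read -r commit parents; do
    [ -f "$map/$commit" ] && continue       # converted on an earlier run
    args=
    for p in $parents; do
        args="$args -p $(cat "$map/$p")"    # slave equivalent of each parent
    done
    tree=$(git rev-parse "$commit:include") # slave tree: the include/ subtree
    new=$(git log -1 --format=%B "$commit" | git commit-tree "$tree" $args)
    echo "$new" > "$map/$commit"
    git update-ref refs/heads/slave "$new"
done

git log --format='%h %s' slave
```

Because finished commits are skipped via the cache, rerunning the loop after new master commits arrive only converts the additions. A second pass that drops slave commits whose tree is unchanged, as the stage2 script is described as doing, is left out of this sketch.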
* Re: Kernel headers git tree
From: Daniel Barkalow @ 2006-07-14 18:21 UTC (permalink / raw)
To: David Woodhouse; +Cc: Linus Torvalds, Junio C Hamano, git

On Fri, 14 Jul 2006, David Woodhouse wrote:

> And no, I don't do any further simplification of the graph of commits
> other than what 'git-rev-list' does for me. I need to fully go over
> Linus' last mail and understand it, but I think the conclusion is that
> the above scripts are fine, and I can happily drop --topo-order from
> them.

I think the mechanism you're using is fine, but it's also generally
useful, and it would be nice to have the generic part split out from the
particular application. Also, those scripts really are as evil as
advertised, and using more of the git programs would make that a lot
saner.

    -Daniel
*This .sig left intentionally blank*
* Re: Kernel headers git tree
From: Linus Torvalds @ 2006-07-14 5:52 UTC (permalink / raw)
To: David Woodhouse; +Cc: Junio C Hamano, git

On Fri, 14 Jul 2006, David Woodhouse wrote:
>
> > But that thing is doubly strange, because it uses "." as a path specifier.
> > If this is done in the top-most directory, that should mean "all changes",
> > which in turn means that the whole thing should be equivalent to
> >
> > git rev-parse "$1^0"
> >
> > since all commits should make _some_ change, and thus the first revision
> > in the list should always be the top commit - the one you passed in as an
> > argument.
>
> In this case, I really do have commits in the intermediate tree which
> don't actually change anything, and I want to filter them out -- I
> couldn't see a simple way to do it all in one pass.

Btw, I'm actually surprised that my path simplification didn't filter out
the "." and make it mean exactly the same as not giving a path at all. I
thought I had done that earlier, but if you say "-- ." matters, then it
obviously does..

        Linus
* Re: Kernel headers git tree
From: David Woodhouse @ 2006-07-14 9:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git

On Thu, 2006-07-13 at 22:52 -0700, Linus Torvalds wrote:
> Btw, I'm actually surprised that my path simplification didn't filter out
> the "." and make it mean exactly the same as not giving a path at all. I
> thought I had done that earlier, but if you say "-- ." matters, then it
> obviously does..

In this specific case where I have a whole bunch of commits which don't
actually change anything, it definitely does make a difference...

hera /home/dwmw2 $ export GIT_DIR=/pub/scm/linux/kernel/git/dwmw2/kernel-headers.git
hera /home/dwmw2 $ git-rev-list --max-count=5 stage1
e4e2fcc2c333aac5f6331c1df256ff28d7ee76d7
32ca8021c5ab7b9d44e8a08aeb53e52af5223fec
6b8380885464e069ae22e1e04f4a905c9e918f4e
2dee58696cab32506f655cb94a63cf4b18a13b37
402429bc9ac5eb891f253f6dae1228338f7f0ea5
hera /home/dwmw2 $ git-rev-list --max-count=5 stage1 -- .
d1aba9314210d616cd2aa9ee91176c1dba6d3834
0b627fd403d6319fe50fbd8b95d5ea02017731fa
b29cfa21bbdfc25271ef446b9df94ed8b5425711
e2407b6a9a643b378700474c9079dd8620e820ed
c0df084d3e2ec0df6dafda8099e7c27c29760843

Junio is right -- if I can avoid creating commits that don't change any
files in the stage1 branch, then I don't have to do this. That would be
_hard_ though...

Currently, the selection of commits from your original tree to be
represented in the stage1 branch is simple -- it's "those commits which
touch include/". And 'rev-list -- include' works nicely for that. Yet
what I actually want in the final result is "those commits which change
the result of the _exported_ headers". It's slightly less realistic to
want rev-list to find that for me directly from the original kernel tree
without having done the export step in stage1 -- what I need to do is
create the exported header tree for each commit which _might_ change it,
then filter out the commits which don't _actually_ change it.

The extra commits in the stage1 branch are cheap enough -- by definition
they don't lead to any extra tree or blob objects. I think the two-stage
export is probably the best approach, unless I'm missing something.

--
dwmw2
* Re: Kernel headers git tree
From: Linus Torvalds @ 2006-07-14 15:39 UTC (permalink / raw)
To: David Woodhouse; +Cc: Junio C Hamano, git

On Fri, 14 Jul 2006, David Woodhouse wrote:
> On Thu, 2006-07-13 at 22:52 -0700, Linus Torvalds wrote:
> > Btw, I'm actually surprised that my path simplification didn't filter out
> > the "." and make it mean exactly the same as not giving a path at all. I
> > thought I had done that earlier, but if you say "-- ." matters, then it
> > obviously does..
>
> In this specific case where I have a whole bunch of commits which don't
> actually change anything, it definitely does make a difference...

Yes, I'm looking at "get_pathspec()", and noting that it really isn't able
to optimize away the ".".

It does turn it into an empty string (which is correct - git internally
does _not_ ever understand the notion of "." as the current working
directory), but it doesn't ever do the optimization of noticing that a
pathspec that consists solely of an empty string is "equivalent" to an
empty pathspec.

Which is exactly what you _want_ in this case, of course, but maybe we
should add a test-case for that, so that we never do that trivial
optimization by mistake.

Maybe something like

    git init-db
    echo Hello > a
    git add a
    git commit -m "Initial commit" a

and then:

    commit=$(echo "Unchanged tree" | git-commit-tree "HEAD^{tree}" -p HEAD)
    git-rev-list $commit | wc -l
    git-rev-list $commit -- . | wc -l

where the first git-rev-list should return 2, and the second one should
return 1.

Anybody want to write that as a test, verify it, and send Junio a patch?

        Linus
* [PATCH] Trivial path optimization test
2006-07-14 15:39 ` Linus Torvalds
@ 2006-07-17 22:34 ` Alex Riesen
2006-07-24 6:41 ` Junio C Hamano
2006-07-24 23:23 ` Alex Riesen
1 sibling, 1 reply; 27+ messages in thread
From: Alex Riesen @ 2006-07-17 22:34 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Linus Torvalds

Linus Torvalds, Fri, Jul 14, 2006 17:39:24 +0200:
> > > Btw, I'm actually surprised that my path simplification didn't filter out
> > > the "." and make it mean exactly the same as not giving a path at all. I
> > > thought I had done that earlier, but if you say "-- ." matters, then it
> > > obviously does..
> >
> > In this specific case where I have a whole bunch of commits which don't
> > actually change anything, it definitely does make a difference...
>
> Yes, I'm looking at "get_pathspec()", and noting that it really isn't able
> to optimize away the ".".
>
> It does turn it into an empty string (which is correct - git internally
> does _not_ ever understand the notion of "." as the current working
> directory), but it doesn't ever do the optimization of noticing that a
> pathspec that consists solely of an empty string is "equivalent" to an
> empty pathspec.
>
> Which is exactly what you _want_ in this case, of course, but maybe we
> should add a test-case for that, so that we never do that trivial
> optimization by mistake.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
---

...
> Anybody want to write that as a test, verify it, and send Junio a patch?
>
> Linus

So here it is.

 t/t6004-rev-list-path-optim.sh | 19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/t/t6004-rev-list-path-optim.sh b/t/t6004-rev-list-path-optim.sh
new file mode 100755
index 0000000..5182dbb
--- /dev/null
+++ b/t/t6004-rev-list-path-optim.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+test_description='git-rev-list trivial path optimization test'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+echo Hello > a &&
+git add a &&
+git commit -m "Initial commit" a
+'
+
+test_expect_success path-optimization '
+	commit=$(echo "Unchanged tree" | git-commit-tree "HEAD^{tree}" -p HEAD) &&
+	test $(git-rev-list $commit | wc -l) = 2 &&
+	test $(git-rev-list $commit -- . | wc -l) = 1
+'
+
+test_done
--
1.4.1.gb944
* Re: [PATCH] Trivial path optimization test
2006-07-17 22:34 ` [PATCH] Trivial path optimization test Alex Riesen
@ 2006-07-24 6:41 ` Junio C Hamano
2006-07-24 23:23 ` Alex Riesen
0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2006-07-24 6:41 UTC (permalink / raw)
To: Alex Riesen; +Cc: git, Linus Torvalds

Clean up the commit log pretty please.
* Re: [PATCH] Trivial path optimization test
2006-07-24 6:41 ` Junio C Hamano
@ 2006-07-24 23:23 ` Alex Riesen
0 siblings, 0 replies; 27+ messages in thread
From: Alex Riesen @ 2006-07-24 23:23 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Linus Torvalds

Junio C Hamano, Mon, Jul 24, 2006 08:41:16 +0200:
> Clean up the commit log pretty please.

No problem.
* [PATCH] Trivial path optimization test
2006-07-14 15:39 ` Linus Torvalds
2006-07-17 22:34 ` [PATCH] Trivial path optimization test Alex Riesen
@ 2006-07-24 23:23 ` Alex Riesen
1 sibling, 0 replies; 27+ messages in thread
From: Alex Riesen @ 2006-07-24 23:23 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Linus Torvalds

Linus: get_pathspec() does turn '.' into an empty string (which is
correct - git internally does _not_ ever understand the notion of "."
as the current working directory), but it doesn't ever do the
optimization of noticing that a pathspec that consists solely of an
empty string is "equivalent" to an empty pathspec.

The test is to ensure that this behaviour stays.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
---
 t/t6004-rev-list-path-optim.sh | 19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/t/t6004-rev-list-path-optim.sh b/t/t6004-rev-list-path-optim.sh
new file mode 100755
index 0000000..5182dbb
--- /dev/null
+++ b/t/t6004-rev-list-path-optim.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+test_description='git-rev-list trivial path optimization test'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+echo Hello > a &&
+git add a &&
+git commit -m "Initial commit" a
+'
+
+test_expect_success path-optimization '
+	commit=$(echo "Unchanged tree" | git-commit-tree "HEAD^{tree}" -p HEAD) &&
+	test $(git-rev-list $commit | wc -l) = 2 &&
+	test $(git-rev-list $commit -- . | wc -l) = 1
+'
+
+test_done
--
1.4.1.gb944
* Re: Kernel headers git tree
2006-07-14 9:38 ` David Woodhouse
2006-07-14 15:39 ` Linus Torvalds
@ 2006-07-14 18:01 ` Junio C Hamano
2006-07-14 18:21 ` David Woodhouse
1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2006-07-14 18:01 UTC (permalink / raw)
To: David Woodhouse; +Cc: git

David Woodhouse <dwmw2@infradead.org> writes:

> Yet what I actually want in the final result is "those commits which
> change the result of the _exported_ headers". It's slightly less
> realistic to want rev-list to find that for me directly from the
> original kernel tree without having done the export step in stage1 --
> what I need to do is create the exported header tree for each commit
> which _might_ change it, then filter out the commits which don't
> _actually_ change it.
>
> The extra commits in the stage1 branch are cheap enough -- by definition
> they don't lead to any extra tree or blob objects. I think the two-stage
> export is probably the best approach, unless I'm missing something.

Since you are not building an exact parallel history with the
same topology (you are trying to cull the commits in the new
tree that do not change the resulting header files), I do not
see much point in the parent conversion loop in the first script
to compute CONVERTEDPARENTS.

How about making it simpler?

 * Keep the current HEAD of the "headers" branch in
   refs/heads/kernel-headers

 * Whenever you see $UPSTREAM_GITDIR/refs/heads/master
   changes, you do your converttree to come up with the
   new header tree

 * See if the resulting tree changed by doing something
   like this:

	TREE=`converttree $INCDIR $KBUILDASMSHA`
	case "`git diff-tree --name-only kernel-headers $TREE`" in
	'')
		# No changes in the result
		exit
	esac

   Stop processing here if there is no change.

 * Make a new commit, with its parent set to the current
   value of refs/heads/kernel-headers, perhaps with the
   same message as $UPSTREAM_GITDIR/refs/heads/master
   has as you do already.

 * Advance refs/heads/kernel-headers only when you
   actually make a new commit.

I would further suggest to record the value of the upstream
commit object name, $UPSTREAM_GITDIR/refs/heads/master,
somewhere in the commit message, by using "git describe". This
will help people who use your converted headers to know which
released version of the Linus kernel the headers correspond to,
and also help you notice when the upstream is updated during the
next run.
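The last three steps of the list above (check for a change, commit, advance the ref) can be sketched as a single helper. This is an illustration under stated assumptions, not the script from the thread: `commit_if_changed` and its arguments are hypothetical names, `converttree` is not reproduced, and the change check compares tree object names rather than running `git diff-tree --name-only`; it answers the same question because identical content always yields the same tree id.

```sh
#!/bin/sh
# Sketch: advance a branch only when a newly generated tree differs from
# the tree at its current tip. Assumes git is installed; the function
# name and its arguments are hypothetical.

# commit_if_changed REF TREE MESSAGE
#   REF     full ref name, e.g. refs/heads/kernel-headers
#   TREE    object name of the freshly generated tree
#   MESSAGE commit message for the new commit
# Prints the new commit id and returns 0 when a commit was made.
# Returns 1, printing nothing, when TREE matches the tip of REF.
# Returns 2 when the commit could not be created.
commit_if_changed() {
    ref=$1
    tree=$2
    message=$3

    if old=$(git rev-parse -q --verify "$ref^{commit}"); then
        # Same tree id as the current tip: nothing to record.
        if [ "$(git rev-parse "$old^{tree}")" = "$tree" ]; then
            return 1
        fi
        new=$(echo "$message" | git commit-tree "$tree" -p "$old") || return 2
    else
        # REF does not exist yet: start the history with a root commit.
        new=$(echo "$message" | git commit-tree "$tree") || return 2
    fi

    # The ref only moves when a new commit was actually created.
    git update-ref "$ref" "$new" && echo "$new"
}
```

A caller would pass the tree produced by its own conversion step, for example `commit_if_changed refs/heads/kernel-headers "$TREE" "$MSG"`, and could include the upstream commit name in `$MSG` as suggested above.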
* Re: Kernel headers git tree
2006-07-14 18:01 ` Kernel headers git tree Junio C Hamano
@ 2006-07-14 18:21 ` David Woodhouse
0 siblings, 0 replies; 27+ messages in thread
From: David Woodhouse @ 2006-07-14 18:21 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git

On Fri, 2006-07-14 at 11:01 -0700, Junio C Hamano wrote:
> David Woodhouse <dwmw2@infradead.org> writes:
>
> > Yet what I actually want in the final result is "those commits which
> > change the result of the _exported_ headers". It's slightly less
> > realistic to want rev-list to find that for me directly from the
> > original kernel tree without having done the export step in stage1 --
> > what I need to do is create the exported header tree for each commit
> > which _might_ change it, then filter out the commits which don't
> > _actually_ change it.
> >
> > The extra commits in the stage1 branch are cheap enough -- by definition
> > they don't lead to any extra tree or blob objects. I think the two-stage
> > export is probably the best approach, unless I'm missing something.
>
> Since you are not building an exact parallel history with the
> same topology (you are trying to cull the commits in the new
> tree that do not change the resulting header files), I do not
> see much point in the parent conversion loop in the first script
> to compute CONVERTEDPARENTS.
>
> How about making it simpler?
>
>  * Keep the current HEAD of the "headers" branch in
>    refs/heads/kernel-headers
>
>  * Whenever you see $UPSTREAM_GITDIR/refs/heads/master
>    changes, you do your converttree to come up with the
>    new header tree
>
>  * See if the resulting tree changed by doing something
>    like this:
>
> 	TREE=`converttree $INCDIR $KBUILDASMSHA`
> 	case "`git diff-tree --name-only kernel-headers $TREE`" in
> 	'')
> 		# No changes in the result
> 		exit
> 	esac
>
>    Stop processing here if there is no change.
>
>  * Make a new commit, with its parent set to the current
>    value of refs/heads/kernel-headers, perhaps with the
>    same message as $UPSTREAM_GITDIR/refs/heads/master
>    has as you do already.
>
>  * Advance refs/heads/kernel-headers only when you
>    actually make a new commit.

Unless I'm misunderstanding, I then don't get a tree with a topology
which matches Linus' tree -- I just get a series of snapshots, and it's
dependent on the timing of my cron jobs. That means that there isn't a
1:1 relationship between any commit in the slave tree and a
corresponding commit in the upstream tree, and that the slave tree can't
(sensibly) be reproduced.

I'd much rather keep it the way it is -- but I'm certainly interested in
ways that I could simplify the process of generating what I have at the
moment.

> I would further suggest to record the value of the upstream
> commit object name, $UPSTREAM_GITDIR/refs/heads/master,
> somewhere in the commit message, by using "git describe". This
> will help people who use your converted headers to know which
> released version of the Linus kernel the headers correspond to,
> and also help you notice when the upstream is updated during the
> next run.

Yeah, that was already suggested. I'll do that.

--
dwmw2
* Re: Kernel headers git tree
2006-07-13 23:59 Kernel headers git tree David Woodhouse
2006-07-14 0:39 ` Junio C Hamano
@ 2006-07-14 7:20 ` Ian Campbell
2006-07-14 7:52 ` Junio C Hamano
2006-07-14 18:05 ` Ingo Oeser
2 siblings, 1 reply; 27+ messages in thread
From: Ian Campbell @ 2006-07-14 7:20 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-kernel, git

On Fri, 2006-07-14 at 00:59 +0100, David Woodhouse wrote:
> At http://git.kernel.org/git/?p=linux/kernel/git/dwmw2/kernel-headers.git
> there's a git tree which contains the sanitised exported headers for all
> architectures -- basically the result of 'make headers_install'.
>
> It tracks Linus' kernel tree, by means of some evil scripts.¹
>
> Only commits in Linus' tree which actually affect the exported result
> should have an equivalent commit in the above tree, which means that any
> changes which affect userspace should be clearly visible for review.

It might be useful to append the commit checksum from Linus' tree to the
comments so it is easier to backtrack to the original commit.

Ian.

--
Ian Campbell

Your step will soil many countries.
* Re: Kernel headers git tree
2006-07-14 7:20 ` Ian Campbell
@ 2006-07-14 7:52 ` Junio C Hamano
0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2006-07-14 7:52 UTC (permalink / raw)
To: Ian Campbell; +Cc: git, David Woodhouse

Ian Campbell <ijc@hellion.org.uk> writes:

> It might be useful to append the commit checksum from Linus' tree to the
> comments so it is easier to backtrack to the original commit.

Although I am not a kernel person, I can imagine how that would
be useful. The pre-generated documentation branches in git.git
repository are managed similarly to allow tracking of the branch
they originate from.
* Re: Kernel headers git tree
2006-07-13 23:59 Kernel headers git tree David Woodhouse
2006-07-14 0:39 ` Junio C Hamano
2006-07-14 7:20 ` Ian Campbell
@ 2006-07-14 18:05 ` Ingo Oeser
2006-07-14 18:16 ` David Woodhouse
2 siblings, 1 reply; 27+ messages in thread
From: Ingo Oeser @ 2006-07-14 18:05 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-kernel, git

Hi David,

On Friday, 14. July 2006 01:59, David Woodhouse wrote:
> Only commits in Linus' tree which actually affect the exported result
> should have an equivalent commit in the above tree, which means that any
> changes which affect userspace should be clearly visible for review.

Where can I subscribe for commit messages there?

Every serious systems programmer (for Linux) will ask this question soon :-)

Maybe one of the Postmasters at vger.kernel.org can set up a mailing list
for this.

Regards

Ingo Oeser, happy to see this project finally there
* Re: Kernel headers git tree
2006-07-14 18:05 ` Ingo Oeser
@ 2006-07-14 18:16 ` David Woodhouse
2006-07-18 21:15 ` Ingo Oeser
0 siblings, 1 reply; 27+ messages in thread
From: David Woodhouse @ 2006-07-14 18:16 UTC (permalink / raw)
To: Ingo Oeser; +Cc: linux-kernel, git

On Fri, 2006-07-14 at 20:05 +0200, Ingo Oeser wrote:
> Hi David,
>
> On Friday, 14. July 2006 01:59, David Woodhouse wrote:
> > Only commits in Linus' tree which actually affect the exported result
> > should have an equivalent commit in the above tree, which means that any
> > changes which affect userspace should be clearly visible for review.
>
> Where can I subscribe for commit messages there?

Well, they're all derived from commits in Linus' tree. I could set up
another mailing list feed script which tracks it, but I'd like to give
it a while (until I'm happy with the export scripts) first.

--
dwmw2
* Re: Kernel headers git tree
2006-07-14 18:16 ` David Woodhouse
@ 2006-07-18 21:15 ` Ingo Oeser
0 siblings, 0 replies; 27+ messages in thread
From: Ingo Oeser @ 2006-07-18 21:15 UTC (permalink / raw)
To: David Woodhouse; +Cc: linux-kernel, git

Hi David,

On Friday, 14. July 2006 20:16, David Woodhouse wrote:
> Well, they're all derived from commits in Linus' tree. I could set up
> another mailing list feed script which tracks it, but I'd like to give
> it a while (until I'm happy with the export scripts) first.

Sounds good :-)

Thanks & Regards

Ingo Oeser