* Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2
From: Martin Schlemmer @ 2005-04-19 23:11 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Greg KH, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504191539000.2274@ppc970.osdl.org>
On Tue, 2005-04-19 at 15:43 -0700, Linus Torvalds wrote:
>
> On Wed, 20 Apr 2005, Martin Schlemmer wrote:
> >
> > Correct me if I am wrong, but the right way to do this is to set the
> > hostname to just that - the hostname, and add 'domain foo.com'
> > to /etc/resolv.conf.
>
> I'll correct you.
>
> The fact is, that's not what people do. Not me, not kernel.org, not _any_
> of the machines I've got access to. They put the fully qualified name in
> the hostname, and just do "search foo.com" in /etc/resolv.conf.
>
> So clearly, expecting that people work the way you claim is being
> extremely optimistic. I'm sure some people do that too, but I suspect I'm
> in the majority. Both Fedora Core and YellowDog act the way I described,
> not the way you do..
>
The interesting bit you snipped was the part where you said you do not
know how to get dnsdomainname to work properly, and that I answered.
Why this other crap about how 90% of the world does it?
PS: If you have later tools, setting hostname to the FQDN and then still
adding 'domain' to resolv.conf seems to do the right thing, although it
did not some time back (and was why I said the bit about hostname only
containing the hostname, else you got something like 'hostname -f'
returning 'www1.foo.com.foo.com') ...
--
Martin Schlemmer
^ permalink raw reply
* Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2
From: Steven Cole @ 2005-04-19 23:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Greg KH, Greg KH, Git Mailing List, linux-kernel, sensors
In-Reply-To: <Pine.LNX.4.58.0504191525290.2274@ppc970.osdl.org>
On Tuesday 19 April 2005 04:38 pm, Linus Torvalds wrote:
>
> On Tue, 19 Apr 2005, Steven Cole wrote:
> >
> > But perhaps a progress bar right about here might be
> > a good thing for the terminally impatient.
> >
> > real 3m54.909s
> > user 0m14.835s
> > sys 0m10.587s
> >
> > 4 minutes might be long enough to cause some folks to lose hope.
>
> Well, the real operations took only 15 seconds. What kind of horrible
> person are you, that you don't have all of the kernel in your disk cache
> already? Shame on you.
>
> Or was the 4 minutes for downloading all the objects too?
Yes. I was using a very recent version of the pasky tools, and
I had created the repo this morning with git init YOUR_RSYNC_URL_FOR_LINUX-2.6.
I ran "time git pull origin" and watched the fur fly.
Then, the flurry of patching file blah messages, followed by a rather
pregnant pause after the last patching message.
I wasn't complaining about the 4 minutes, just the lack of feedback
during the majority of that time. And most of it was after the last
patching file message.
>
> Anyway, it looks like you are using pasky's scripts, and the old
> "patch-based" upgrade at that. You certainly will _not_ see the
>
> [many files patched]
> patching file mm/mmap.c
> ..
>
> if you use a real git merge. That's probably the real problem here.
>
> Real merges have no patches taking place _anywhere_. And they take about
> half a second. Doing an "update" of your tree should _literally_ boil down
> to
>
> #
> # "repo" needs to point to the repo we update from
> #
> rsync -avz --ignore-existing $repo/objects/. .git/objects/.
> rsync -L $repo/HEAD .git/NEW_HEAD || exit 1
> read-tree -m $(cat .git/NEW_HEAD) || exit 1
> checkout-cache -f -a
> update-cache --refresh
> mv .git/NEW_HEAD .git/HEAD
>
> and if it does anything else, it's literally broken. Btw, the above does
> need my "read-tree -m" thing which I committed today.
>
> (CAREFUL: the above is not a good script, because it _will_ just overwrite
> all your old contents with the stuff you updated to. You should thus not
> actually use something like this, but a "git update" should literally end
> up doing the above operations in the end, and just add proper checking).
>
> And if that takes 4 minutes, you've got problems.
>
> Just say no to patches.
>
> Linus
>
> PS: If you want a clean tree without any old files or anything else, for
> that matter, you can then do a "show-files -z --others | xargs -0 rm", but
> be careful: that will blow away _anything_ that wasn't revision controlled
> with git. So don't blame me if your pr0n collection is gone afterwards.
>
OK. I may try some of this tomorrow from work, where I have a fat pipe.
I'm on dialup from home, and I suspect not very many folks want to hear
the sad tale of how long it takes to get the kernel over 56k dialup.
Steven
^ permalink raw reply
* Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2
From: Petr Baudis @ 2005-04-19 23:16 UTC (permalink / raw)
To: Steven Cole
Cc: Linus Torvalds, Greg KH, Greg KH, Git Mailing List, linux-kernel,
sensors
In-Reply-To: <200504191704.48976.elenstev@mesatop.com>
Dear diary, on Wed, Apr 20, 2005 at 01:04:48AM CEST, I got a letter
where Steven Cole <elenstev@mesatop.com> told me that...
> Then, the flurry of patching file blah messages, followed by a rather
> pregnant pause after the last patching message.
>
> I wasn't complaining about the 4 minutes, just the lack of feedback
> during the majority of that time. And most of it was after the last
> patching file message.
That must've been the update-cache.
Well, you can listen to your strained disk crepitating direly.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
^ permalink raw reply
* Re: Darcs and git: plan of action
From: Ray Lee @ 2005-04-19 23:21 UTC (permalink / raw)
To: Tupshin Harper; +Cc: Kevin Smith, git, darcs-devel
In-Reply-To: <42658D95.7020404@tupshin.com>
On Tue, 2005-04-19 at 16:00 -0700, Tupshin Harper wrote:
> Ray Lee wrote:
>
> >Here's where we disagree. If you checkpoint your tree before the
> >replace, and immediately after, the only differences in the
> >source-controlled files would be due to the replace.
> >
> This is assuming that you only have one replace and no other operations
> recorded in the patch. If you have multiple replaces or a replace and a
> traditional diff recorded in the same patch, then this is not true.
I had a precondition on my argument (not quoted), that the code was
checkpointed before and after. Obviously, a large set of changes in one
patch is a problem. However, a darcs replace is (effectively) a commit
on its own, so I was limiting myself to the same situation under a
different system.
> A more fundamental problem comes back to intent. If I have a file
> "foo" before:
> a1
> a2
> and after:
> b1
> b2
> is that a "replace [_a-zA-Z0-9] a b foo" patch, or is that a
> -a1
> -a2
> +b1
> +b2
> patch?
Okay, so in reading the online darcs manual (yet) again, I now see that
it allows regular expressions for the match and replace, which means
multiple unique tokens could change atomically. (Does anyone actually
*use* regexes? Sounds like a cannon that'd be hard to aim.)
Regardless, I only care about code, not free text. If it's in a language
that doesn't do some use-'em-as-you-need-'em duck typing spiel
(<cough>python</cough>), then the context of your patch (namely, the
file) already has those tokens somewhere in them. And I bet that if
*you* looked at that file, you could tell if it was a replace or a mere
textual diff. Am I wrong?
> Note that this comes down to heuristics, and no matter what you
> use, you will be wrong sometimes, *and* the choice that is made can
> substantively affect the contents of the repository after additional
> patches are applied.
Unless I'm missing something, the darcs replace patch can already do the
wrong thing. If I do a replace patch on a variable introduced in a local
tree, then do a darcs replace on it before committing it to a shared
repository, and coder B introduces a variable of the same original name
in my copy, then there's a chance that the replace patch will
incorrectly apply upon his newly introduced variable. No?
> It's provable that you can not.
I'm still not seeing the problem, at least when it comes to ANSI C.
Ray
^ permalink raw reply
* Re: A VFS layer - was: SCM ideas from 2003
From: Stéphane Fillod @ 2005-04-19 23:13 UTC (permalink / raw)
To: git
In-Reply-To: <2cfc403205041901074ca57724@mail.gmail.com>
Jon Seymour <jon.seymour <at> gmail.com> writes:
[...]
> It seems to me that file-orientation is here to stay and it would be
> really cool to layer some kind of virtual filesystem over the git
> repository so that different trees become transparently accessible via
> different branches of a file system, e.g.:
> /mnt/gitfs/working # some kind of writeable virtual directory over the git cache
> /mnt/gitfs/c157067185209b50b350571fe762c2740ea13fc1 # read-only tree of commit c157...
> /mnt/gitfs/5b53d3a08d64198d26d4f2323f235790c04aeaab # read-only tree of commit 5b53...
Ah, you mean wrapping the libgit in a FUSE plugin and let
the Linux pagecache/dentry cache do the caching of on-demand
inflated blobs and indexes? Would the hash be the "inode number"?
--
Stephane
^ permalink raw reply
* Re: Change "pull" to _only_ download, and "git update"=pull+merge?
From: Daniel Barkalow @ 2005-04-19 23:20 UTC (permalink / raw)
To: David A. Wheeler; +Cc: Petr Baudis, Martin Schlemmer, David Greaves, git
In-Reply-To: <42658888.60007@dwheeler.com>
On Tue, 19 Apr 2005, David A. Wheeler wrote:
> In a _logical_ sense that's true; I'd only want to pull data if I intended
> to (possibly) do something with it. But as a _practical_ matter,
> I can see lots of reasons for doing a pull as a separate operation.
> One is disconnected operation; (...)
That's true. I think I actually like "git pull" as the operation for "make
sure I have everything I need, so I can lose net".
> What command would you suggest for the common case
> of "update with current track?" I've proposed "git update [NAME]".
> "git merge" with update-from-current-track as default seems unclear, and
> I worry that I might accidentally press RETURN too soon & merge with
> the wrong thing. And I like the idea of "git update" doing the same thing
> (essentially) as "cvs update" and "svn update"; LOTS of people "know"
> what update does, so using the same command name for one of the most
> common operations smooths transition (GNU Arch's "tla update"
> is almost, though not exactly, the same too.)
I think that having "git update" update a tracked branch is best, if only as
an aid to discoverability. And "git merge" should require you to say what
you want to merge with, because it's too easy to pick a wrong default, and
the user had better know.
It seems to me like this makes "update" identical to "merge <tracked>", so
"update [NAME]" and "merge" don't make sense, since they'd do the other
command, but less intuitively.
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply
* Re: Darcs and git: plan of action
From: Tupshin Harper @ 2005-04-19 23:32 UTC (permalink / raw)
To: Ray Lee; +Cc: git, Kevin Smith, darcs-devel
In-Reply-To: <1113951972.29444.42.camel@orca.madrabbit.org>
Ray Lee wrote:
> I'm still not communicating well.
>
>Give me a case where assuming it's a replace will do the wrong thing,
>for C code, where it's a variable or function name.
>
>Ray
>
>-
>
I think you are communicating fine, but not fully understanding darcs.
try this:
initial patch creates hello.c
#include <stdio.h>
int main(int argc, char *argv[])
{
    printf("Hello world!\n");
    return 0;
}
second patch:
replace ./hello.c [A-Za-z_0-9] world universe
third patch, for conceptual clarity, created in another repository that
had seen the first patch, but not the second (adds function wide_world):
hunk ./hello.c 3
+void wide_world()
+{
+ printf("Hello wide world\n");
+}
+
hunk ./hello.c 11
+ wide_world();
}
If patch2 was a replace patch, then the result of running the combined 3
patch version would be:
Hello universe!
Hello wide universe
but if patch2 was a non-replace patch, then the result would be:
Hello universe!
Hello wide world
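To make the difference concrete, here is a minimal Python sketch of the two outcomes. It is not darcs code: replace_tokens, merged_as_replace and merged_as_hunk are hypothetical names, and only the two printf lines are modelled, under the assumption that a replace patch behaves like a rule that is re-applied to content merged in later.

```python
import re

# Token class from the replace patch above: [A-Za-z_0-9]
TOKEN = re.compile(r"[A-Za-z_0-9]+")

def replace_tokens(text, old, new):
    """Rename every whole token equal to `old`; other tokens are untouched."""
    return TOKEN.sub(lambda m: new if m.group(0) == old else m.group(0), text)

# Only the two printf lines are modelled (hypothetical simplification).
PATCH1_LINE = 'printf("Hello world!\\n");\n'
PATCH3_LINE = 'printf("Hello wide world\\n");\n'

def merged_as_replace():
    # A replace patch is a rule, so it also rewrites the line patch 3 added.
    return replace_tokens(PATCH1_LINE + PATCH3_LINE, "world", "universe")

def merged_as_hunk():
    # A hunk only records the lines it touched; patch 3's line is left alone.
    return replace_tokens(PATCH1_LINE, "world", "universe") + PATCH3_LINE
```

In this sketch the identifier wide_world is a single token under the given character class, so the rename leaves the function name alone and only changes the word inside the string.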
-Tupshin
^ permalink raw reply
* Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2
From: Linus Torvalds @ 2005-04-19 23:38 UTC (permalink / raw)
To: Steven Cole; +Cc: Greg KH, Greg KH, Git Mailing List, linux-kernel, sensors
In-Reply-To: <200504191704.48976.elenstev@mesatop.com>
On Tue, 19 Apr 2005, Steven Cole wrote:
>
> I wasn't complaining about the 4 minutes, just the lack of feedback
> during the majority of that time. And most of it was after the last
> patching file message.
That should be exactly the thing that the new "read-tree -m" fixes.
Before, when you read in a new tree (which is what you do when you update
to somebody else's version), git would throw all the cached information
away, and so you'd end up doing a "checkout-cache -f -a" that re-wrote
every single checked-out file, followed by "update-cache --refresh" that
then re-created the cache for every single file.
With the new read-tree, the same sequence (assuming you have the "-m"
flag to tell read-tree to merge the cache information) will now only write
out and re-check the files that actually changed due to the update or
merge.
So that last phase should go from minutes to seconds - instead of checking
17,000+ files, you'd end up checking maybe a few hundred for most "normal"
updates.
For example, updating all the way from the git root (ie plain 2.6.12-rc2)
to the current head, only 577 files have changed, and the rest (16,740)
should never be touched at all.
You can see why doing just the 577 instead of the full 17,317 might speed
things up a bit ;)
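As a rough illustration of why this helps (a sketch only, not how read-tree is implemented), the Python below treats a tree as a dict from path to blob id and lists the paths that actually need rewriting. The function name files_to_rewrite and the sample blob ids are made up for the example.

```python
def files_to_rewrite(old_tree, new_tree):
    """Return the paths whose blob id differs between two trees.

    A path counts as changed if it was added, removed, or now points
    at a different blob. Everything else can keep its cached state.
    """
    paths = set(old_tree) | set(new_tree)
    return sorted(p for p in paths if old_tree.get(p) != new_tree.get(p))

# Hypothetical three-file trees; the blob ids are placeholders.
old_tree = {"Makefile": "a1", "mm/mmap.c": "b1", "fs/ext3/inode.c": "c1"}
new_tree = {"Makefile": "a1", "mm/mmap.c": "b2", "fs/ext3/inode.c": "c1"}

changed = files_to_rewrite(old_tree, new_tree)
untouched = len(new_tree) - len(changed)
```

With the numbers quoted above, the same bookkeeping gives 17,317 - 577 = 16,740 files that never need to be written or re-checked.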
Linus
PS. Of course, right now it probably does make sense to waste some time
occasionally, and run "fsck-cache $(cat .git/HEAD)" every once in a while.
Just in case..
^ permalink raw reply
* Re: Darcs and git: plan of action
From: Tupshin Harper @ 2005-04-19 23:38 UTC (permalink / raw)
To: Ray Lee; +Cc: Kevin Smith, git, darcs-devel
In-Reply-To: <1113952916.29444.60.camel@orca.madrabbit.org>
Ray Lee wrote:
>it allows regular expressions for the match and replace, which means
>multiple unique tokens could change atomically. (Does anyone actually
>*use* regexes? Sounds like a cannon that'd be hard to aim.)
>
>
Yes, and replace patches need to be used very carefully.
>Regardless, I only care about code, not free text. If it's in a language
>that doesn't do some use-'em-as-you-need-'em duck typing spiel
>(<cough>python</cough>), then the context of your patch (namely, the
>file) already has those tokens somewhere in them. And I bet that if
>*you* looked at that file, you could tell if it was a replace or a mere
>textual diff. Am I wrong?
>
>
Yes. See my hello world example from my last email.
>
>Unless I'm missing something, the darcs replace patch can already do the
>wrong thing.
>
Yes, depending on how you define wrong. Darcs replace is fully
predictable, and poorly chosen replaces can lead to incorrect results
after future patches are applied.
>If I do a replace patch on a variable introduced in a local
>tree, then do a darcs replace on it before committing it to a shared
>repository, and coder B introduces a variable of the same original name
>in my copy, then there's a chance that the replace patch will
>incorrectly apply upon his newly introduced variable. No?
>
>
Absolutely correct, and the exact reason why replace patches need to be
used *very* selectively.
>
>
>>It's provable that you can not.
>>
>>
>
>I'm still not seeing the problem, at least when it comes to ANSI C.
>
>Ray
>
>
See hello world example in my other email. You can argue that it is an
existing problem in darcs, but really, it just points out the fact that
a computer is *incapable* of knowing whether it is safe to use a replace
patch based on a diff because replace patches are dangerous if not used
intelligently.
-Tupshin
^ permalink raw reply
* Re: "True" git merge in git-pasky
From: Francois Romieu @ 2005-04-19 23:40 UTC (permalink / raw)
To: Petr Baudis; +Cc: git
In-Reply-To: <20050419035107.GB5554@pasky.ji.cz>
Petr Baudis <pasky@ucw.cz> :
[...]
> Now you decided to do a little bit of parallel development and stick
> your patches not ready for 2.6.12 to a separate tree. That's fine, do
>
> git fork experimental ~/linux-2.6.experimental
>
> and get some coffee. (It takes about 8 minutes here, but I think git
> isn't at fault - it is probably all spent in
>
> read-tree $(tree-id)
> checkout-cache -a
> update-cache --refresh
Tip of the day: cat the whole tree to /dev/null before the fork
--
Ueimor
^ permalink raw reply
* Re: [PATCH] write-tree performance problems
From: David Lang @ 2005-04-19 23:42 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Chris Mason, git
In-Reply-To: <Pine.LNX.4.58.0504191608230.2274@ppc970.osdl.org>
On Tue, 19 Apr 2005, Linus Torvalds wrote:
> On Tue, 19 Apr 2005, David Lang wrote:
>>
>> if you are using quilt for locally developed patches I fully agree with
>> you, but I was thinking of the case where Andrew is receiving independent
>> patches from lots of people and storing them in quilt for testing, and
>> then sending them on to you. In this case the patches really are
>> independent and it may be useful to continue to treat them this way
>> instead of collapsing them into one 'update from Andrew' feed.
>
> If so, he should set up one repository per quilt patch.
a tool to do this automatically is what I was trying to suggest (and asking
if it would be useful)
> That would be crazy, but yes, it would allow me to cherry-pick which
> one(s) I want to merge with.
>
> But the fact is, that cherry-picking should happen at quilt-time not at
> git time.
Ok, I could see arguments for both methods. if the forest of disposable
repositories is fast enough and flexible enough there is some value in
getting patches into git as quickly as possible, and not having to fan
them out to quilt as an intermediate step, but it may not be enough value
to be worth the added complexity.
not being at all familiar with quilt (in fact having never seen it, just
seen it discussed here and on LKML), how painful would it be to try and
implement it using git as a back-end? you would end up with a bunch of
extra objects that you will ignore (they are parts of branches that you
throw away), but I don't know if that space cost (plus the cost of the
extra trees in git) is going to be too high.
this brings up a thought: is there a way to point at a bunch of
repositories (trees) and a collection of objects and tell git to purge any
objects that don't have anything linking to them? in the short-medium term
this isn't a problem, but in the long term you will have extra objects
being created and then orphaned when a branch gets thrown away, and these
will eventually amount to a noticeable amount of space.
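The purge I have in mind amounts to a mark-and-sweep over the object graph. A minimal Python sketch, assuming a hypothetical links table that maps each object id to the ids it references (commit to tree and parents, tree to blobs and subtrees); none of these names come from git itself:

```python
def reachable(roots, links):
    """Collect every object id reachable from the given root ids."""
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        # Follow references; objects with no entry (blobs) have none.
        stack.extend(links.get(obj, ()))
    return seen

def prunable(all_objects, roots, links):
    """Objects in the store that no root leads to, in sorted order."""
    return sorted(set(all_objects) - reachable(roots, links))

# Hypothetical store: commit c2 is a live head, c3 is a thrown-away branch.
links = {
    "c1": ["t1"], "c2": ["t2", "c1"], "c3": ["t3", "c1"],
    "t1": ["b1"], "t2": ["b1", "b2"], "t3": ["b3"],
}
all_objects = ["c1", "c2", "c3", "t1", "t2", "t3", "b1", "b2", "b3"]
orphans = prunable(all_objects, ["c2"], links)
```

The roots would be the heads of every repository sharing the object directory; anything left over after the walk is a candidate for deletion.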
David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
^ permalink raw reply
* Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2
From: Steven Cole @ 2005-04-19 23:41 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Greg KH, Greg KH, Git Mailing List, linux-kernel, sensors
In-Reply-To: <Pine.LNX.4.58.0504191627420.2274@ppc970.osdl.org>
On Tuesday 19 April 2005 05:38 pm, Linus Torvalds wrote:
>
> On Tue, 19 Apr 2005, Steven Cole wrote:
> >
> > I wasn't complaining about the 4 minutes, just the lack of feedback
> > during the majority of that time. And most of it was after the last
> > patching file message.
>
> That should be exactly the thing that the new "read-tree -m" fixes.
>
> Before, when you read in a new tree (which is what you do when you update
> to somebody else's version), git would throw all the cached information
> away, and so you'd end up doing a "checkout-cache -f -a" that re-wrote
> every single checked-out file, followed by "update-cache --refresh" that
> then re-created the cache for every single file.
>
> With the new read-tree, the same sequence (assuming you have the "-m"
> flag to tell read-tree to merge the cache information) will now only write
> out and re-check the files that actually changed due to the update or
> merge.
>
> So that last phase should go from minutes to seconds - instead of checking
> 17,000+ files, you'd end up checking maybe a few hundred for most "normal"
> updates.
>
> For example, updating all the way from the git root (ie plain 2.6.12-rc2)
> to the current head, only 577 files have changed, and the rest (16,740)
> should never be touched at all.
>
> You can see why doing just the 577 instead of the full 17,317 might speed
> things up a bit ;)
>
> Linus
Cool. Petr, I hope this works like this with your tools tomorrow.
>
> PS. Of course, right now it probably does make sense to waste some time
> occasionally, and run "fsck-cache $(cat .git/HEAD)" every once in a while.
> Just in case..
>
>
Sounds like a good thing to schedule for $WEEHOUR.
Steven
^ permalink raw reply
* Re: [PATCH] write-tree performance problems
From: Linus Torvalds @ 2005-04-19 23:59 UTC (permalink / raw)
To: David Lang; +Cc: Chris Mason, git
In-Reply-To: <Pine.LNX.4.62.0504191629410.26365@qynat.qvtvafvgr.pbz>
On Tue, 19 Apr 2005, David Lang wrote:
> >
> > If so, he should set up one repository per quilt patch.
>
> a tool to do this automatically is what I was trying to suggest (and asking
> if it would be useful)
Heh. It's certainly possible. Especially with the object sharing, you
could create a git archive by just doing a "read-tree" and updating a few
files, and you'd never have to even check out the rest of the files at
all.
IOW, you can probably set up a new git archive in not much more time than
it takes for a "read-tree" + "write-tree", with very little in between.
That comes out to about a second, and the write-tree index optimizations
would take it down to next to nothing..
However, it definitely wouldn't be useful for _me_. The whole thing that
I'm after is to allow painless merging of distributed work. If I have to
merge one patch at a time, I'd much rather see people send me patches
directly - that's much simpler than having a whole new GIT repository.
So at least to me, a git repository only makes sense when it is a
collection of patches.
Does that mean that it wouldn't make sense to others? No. It's really
cheap to keep a shared object directory, and have a number of different
git archives using that, and you can have ten different trees tracking ten
different things, with very little overhead.
But even "cheap" is relative. If you actually want to do _work_ in those
repositories, you want to check things out in them, and populate them with
files. Even if you do that with hardlinked blobs, just _populating_ the
tree itself (setting up the subdirectories and the links) is going to be
more expensive than applying a patch in quilt.
Linus
^ permalink raw reply
* Re: [GIT PATCH] I2C and W1 bugfixes for 2.6.12-rc2
From: Jan-Benedict Glaw @ 2005-04-20 0:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Martin Schlemmer, Greg KH, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504191539000.2274@ppc970.osdl.org>
On Tue, 2005-04-19 15:43:54 -0700, Linus Torvalds <torvalds@osdl.org> wrote:
> On Wed, 20 Apr 2005, Martin Schlemmer wrote:
> >
> > Correct me if I am wrong, but the right way to do this is to set the
> > hostname to just that - the hostname, and add 'domain foo.com'
> > to /etc/resolv.conf.
>
> I'll correct you.
>
> The fact is, that's not what people do. Not me, not kernel.org, not _any_
> of the machines I've got access to. They put the fully qualified name in
> the hostname, and just do "search foo.com" in /etc/resolv.conf.
That's not entirely correct. Actually, basically all machines
(administered by a number of people) only have the real hostname in
/etc/hostname and a domain entry in /etc/resolv.conf .
> So clearly, expecting that people work the way you claim is being
> extremely optimistic. I'm sure some people do that too, but I suspect I'm
> in the majority. Both Fedora Core and YellowDog act the way I described,
> not the way you do..
Maybe these two do it that way. I just checked a recently installed
Debian box--they do it the way I'm used to^W^W^WMartin describes it.
MfG, JBG
--
Jan-Benedict Glaw jbglaw@lug-owl.de . +49-172-7608481 _ O _
"Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg _ _ O
fuer einen Freien Staat voll Freier Bürger" | im Internet! | im Irak! O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
^ permalink raw reply
* wit 0.0.3 - a web interface for git available
From: Christian Meder @ 2005-04-20 0:29 UTC (permalink / raw)
To: git
Hi,
ok it's starting to look like spam ;-)
I uploaded a new version of wit to http://www.absolutegiganten.org/wit
Wit is a web interface for git. Right now it includes: views of blob,
commit and tree objects, generating patches for the commits, downloading
of gz or bzip2 tarballs of trees.
It's easy to setup and a simple stand alone server configuration is
included.
Changes:
* first release which is tested on the current kernel.git archive
* fix diffTree output by using -r
* enhance the patch generation to work against multiple parents
* remove temporary files after diff generation
* fix the tar generation by using the recursive ls-tree variant
* disable colored link on tree objects
I still hope that I'll get feedback someday ;-)
Christian
--
Christian Meder, email: chris@absolutegiganten.org
The Way-Seeking Mind of a tenzo is actualized
by rolling up your sleeves.
(Eihei Dogen Zenji)
^ permalink raw reply
* [RFC] Possible strategy cleanup for git add/remove/diff etc.
From: Junio C Hamano @ 2005-04-20 0:32 UTC (permalink / raw)
To: Petr Baudis; +Cc: git
In-Reply-To: <20050419035107.GB5554@pasky.ji.cz>
I was reading this comment in gitcommit.sh and started
thinking...
# We bother with added/removed files here instead of updating
# the cache at the time of git(add|rm).sh, since we want to
# have the cache in a consistent state representing the tree
# as it was the last time we committed. Otherwise, e.g. partial
# conflicts would be a PITA since added/removed files would
# be committed along automagically as well.
Let's for a moment forget what git-pasky currently does, which
is not to touch .git/index until the user says "Ok, let's
commit". I am wondering if that is the root cause of all the
trouble git-pasky needs to go through. Specifically, I think
having to deal with the add/remove queue affects not just the
commit path, where the comment above lives, but also diffs.
I'd like to start from a different premise and see what happens:
- What .git/index records is *not* the state as the last
commit. It is just a cache Cogito uses to speed up access
to the user's working tree. From the user's point of view,
it does not even exist.
- The way this hypothetical Cogito uses .git/index is to always
reflect add and remove but modification may be out of sync.
It is updated lazily when .git/index must match the working
tree. Again, this is invisible to the user. From the user's
point of view, there are only two things: the last commit
represented as .git/HEAD and his own working tree.
I call this hypothetical implementation of Cogito "jit-*" in the
following description. Also this is just to convey the idea, so
all the error checking (e.g. "what the user gave jit-merge is
not a valid commit id") and sugarcoating (e.g. tags, symbolic
foreign repository names instead of rsync URL etc) are omitted.
* jit-checkout $commit_id
This is like "cvs co". Same as what you are doing I suppose.
committed_tree=$(cat-file commit $commit_id | sed -e 's/^tree //;q')
read-tree $committed_tree
checkout-cache -f -a
echo $commit_id >.git/HEAD
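The sed expression above relies on the first line of a commit object being "tree <id>". A small Python sketch of the same extraction, for illustration only; tree_of_commit is a hypothetical helper and the sample ids are just the two hashes quoted elsewhere in this thread, used as placeholders:

```python
def tree_of_commit(commit_text):
    """Return the tree id named on the first line of a commit object.

    Mirrors `sed -e 's/^tree //;q'`: look at the first line only and
    strip the leading "tree " keyword.
    """
    first_line = commit_text.split("\n", 1)[0]
    if not first_line.startswith("tree "):
        raise ValueError("not a commit object: %r" % first_line)
    return first_line[len("tree "):]

# Placeholder commit text; the ids are not real objects.
sample_commit = (
    "tree c157067185209b50b350571fe762c2740ea13fc1\n"
    "parent 5b53d3a08d64198d26d4f2323f235790c04aeaab\n"
    "\n"
    "Example commit message\n"
)
tree_id = tree_of_commit(sample_commit)
```

Unlike the one-line sed, the sketch refuses input whose first line is not a tree line, which is the kind of error checking the outline deliberately leaves out.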
* jit-add files... | jit-remove files...
Like "cvs add". Here, .git/index is treated as just a cache
of the working tree, not the mirror of previous commit. So
unlike git-pasky, jit-* touches .git/index here.
update-cache --add "$@"
---
rm -f "$@" ;# this is debatable...
update-cache --remove "$@"
* jit-diff [files...]
Like "cvs diff". The user wants to see what's different
between his working tree and the last commit.
case "$#" in 0) set x $(show-files --cached); shift ;; esac
update-cache --add --remove "$@" --refresh
current_tree=$(write-tree)
committed_tree=$(cat-file commit $(cat .git/HEAD) | sed -e 's/^tree //;q')
diff-tree -r -z $committed_tree $current_tree |
filter-output-to-limit-to-given-filelist "$@" |
parse-diff-tree-output-and-show-real-file-diffs
Unlike git-pasky, jit-* does not keep the state from the last
commit in .git/index. Instead, .git/index is meant to cache
the state of the working tree. So the first three lines in
the above updates .git/index lazily from what is in the
working tree for the part that needs to be diffed. Then it
uses helper scripts to filter and parse diff-tree output and
generates per-file diffs. Since add and remove are already
recorded in .git/index, it does not have to special case
"uncommitted add" and such.
* jit-commit
Like "cvs commit".
set x $(show-files --cached); shift
update-cache --add --remove "$@"
current_tree=$(write-tree)
next_commit=$(commit-tree $current_tree -p $(cat .git/HEAD))
echo $next_commit >.git/HEAD
Unlike git-pasky, .git/index already has adds and removes but
it does not know about local modifications. So it runs
update-cache to make it match the working tree first, and then
does the usual commit thing.
The above only allows the whole tree commit. But allowing
single file commit is not that hard:
(
set x $(show-files --cached); shift
update-cache --add --remove "$@"
) ;# we use subshell to preserve "$@" here...
current_tree=$(write-tree)
committed_tree=$(cat-file commit $(cat .git/HEAD) | sed -e 's/^tree //;q')
read-tree $committed_tree
update-cache --add --remove "$@"
partial_tree=$(write-tree)
next_commit=$(commit-tree $partial_tree -p $(cat .git/HEAD))
echo $next_commit >.git/HEAD
read-tree $current_tree
The first four lines are to preserve the current tree state.
Then we rewind the dircache to the last committed state,
update only the named files to bring it to the state the user
wanted to commit, and commit. Once done, we re-read the state
to match the user's original intention (e.g. adds recorded in
.git/index previously but not committed in this run are
preserved).
* jit-merge $commit_id
Like "cvs up -j". I have working tree which is based on some
commit, and I want to merge somebody else's head $commit_id.
Stated more exactly: I want to have the result of my changes
in my working tree, if I started out from the merge between
the commit I am actually based on and $commit_id.
# First get my changes and stash away in a safe place.
jit-diff >,,working-tree-changes-as-patch
# After the above, we know .git/index matches the working tree, so...
current_tree=$(write-tree)
# Usual 3-way Linus merge.
merge_base=$(merge-base $(cat .git/HEAD) $commit_id)
base_tree=$(cat-file commit $merge_base | sed -e 's/^tree //;q')
committed_tree=$(cat-file commit $(cat .git/HEAD) | sed -e 's/^tree //;q')
his_tree=$(cat-file commit $commit_id | sed -e 's/^tree //;q')
read-tree -m $base_tree $committed_tree $his_tree
merge-cache three-way-merge-script -a
# Now our .git/index has the merge result. Match working
# tree to it.
checkout-cache -f -a
# Apply our precious changes.
patch <,,working-tree-changes-as-patch
# Here we need to detect adds and removes and issue
# appropriate update-cache --add --remove.
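For readers unfamiliar with the 3-way step, here is a sketch of the generic per-path rule in Python. It shows the textbook decision table, not necessarily what "read-tree -m" and merge-cache do internally; trees are modelled as dicts from path to blob id, and merge_entry and merge_trees are hypothetical names.

```python
def merge_entry(base, ours, theirs):
    """Decide one path. Returns (blob_id_or_None, is_conflict).

    None as a blob id means the path is absent on that side.
    """
    if ours == theirs:
        return ours, False        # both sides agree
    if base == ours:
        return theirs, False      # only they changed it
    if base == theirs:
        return ours, False        # only we changed it
    return None, True             # both changed it differently

def merge_trees(base, ours, theirs):
    """Merge three path->blob dicts; return (merged, conflicting_paths)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        blob, conflict = merge_entry(
            base.get(path), ours.get(path), theirs.get(path))
        if conflict:
            conflicts.append(path)
        elif blob is not None:
            merged[path] = blob   # a None result means the path was deleted
    return merged, conflicts

# Placeholder trees: "d" is deleted on their side, "c" changed on both.
base = {"a": "1", "b": "1", "c": "1", "d": "1"}
ours = {"a": "2", "b": "1", "c": "3", "d": "1"}
theirs = {"a": "1", "b": "2", "c": "4"}
merged, conflicts = merge_trees(base, ours, theirs)
```

Paths that end up in the conflict list are the ones a per-file merge script would have to resolve by looking at the file contents.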
* jit-pull $foreign_repository
I do not think we need this. Just rsync but not merge.
It looks quite simple. I am asking your opinion because I am
sure you have thought the issues involved through, and the
above outline looks simple only because it is missing something
important that you already had to deal with and solved---and the
solution looks convoluted to me only because I am not aware of
the problem you had to solve.
^ permalink raw reply
* Re: [PATCH] write-tree performance problems
From: Chris Mason @ 2005-04-20 0:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0504191420060.19286@ppc970.osdl.org>
On Tuesday 19 April 2005 17:23, Linus Torvalds wrote:
> On Tue, 19 Apr 2005, Chris Mason wrote:
> > Regardless, putting it into the index somehow should be fastest, I'll see
> > what I can do.
>
> Start by putting it in at "read-tree" time, and adding the code to
> invalidate all parent directory indexes when somebody changes a file in
> the index (ie "update-cache" for anything but a "--refresh").
>
> That would be needed anyway, since those two are the ones that already
> change the index file.
>
> Once you're sure that you can correctly invalidate the entries (so that
> you could never use a stale tree entry by mistake), the second stage would
> be to update it at "write-tree" time.
This was much easier than I expected, and it seems to be working here. It
does slow down the write-tree slightly because we have to write out the index
file, but I can get around that with the index file on tmpfs change.
The original write-tree needs .54 seconds to run.
write-tree with the index speedup gets that down to .024s (same as my first
patch) when nothing has changed. When it has to rewrite the index file
because something changed, it's .167s.
I'll finish off the patch once you ok the basics below. My current code works
like this:
1) read-tree will insert index entries for directories. There is no index
entry for the root.
2) update-cache removes index entries for all parents of the file you're
updating. So, if you update-cache fs/ext3/inode.c, I remove the index of fs
and fs/ext3
3) If write-tree finds a directory in the index, it uses the sha1 in the cache
entry and skips all files/dirs under that directory.
4) If write-tree detects a subdir with no directory in the index, it calls
write_tree the same way it used to. It then inserts a new cache object with
the calculated sha1.
5) right before exiting, write-tree updates the index if it made any changes.
The downside to this setup is that I've got to change other index users to
deal with directory entries that are there sometimes and missing other times.
The nice part is that I don't have to "invalidate" the directory entry, if it
is present, it is valid.
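Step 2 above (dropping the cached entry of every parent directory) can be sketched in shell. This is a toy model, not the actual patch: the flat-file "index" with "D <dir>" and "F <file>" lines and the names parent_dirs and invalidate_parents are all made up for illustration.

```shell
#!/bin/sh
# Print every parent directory of a path, innermost first.
parent_dirs() {
    path=$1
    while :; do
        case $path in
            */*) path=${path%/*}; printf '%s\n' "$path" ;;
            *)   break ;;
        esac
    done
}

# Toy index format (an assumption): one entry per line,
# "D <dir>" for a cached directory entry, "F <file>" for a file entry.
# Remove the "D" entry of every parent of the file being updated.
invalidate_parents() {
    index=$1 file=$2
    parent_dirs "$file" | while IFS= read -r dir; do
        # grep -v exits non-zero when nothing is left; that is not an error here
        grep -v -x -F -- "D $dir" "$index" >"$index.tmp" || true
        mv "$index.tmp" "$index"
    done
}
```

With this model, a directory line that is still present has by construction not been touched since it was written, which is the "if it is present, it is valid" property.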
-chris
^ permalink raw reply
* Re: [PATCH] write-tree performance problems
From: Christopher Li @ 2005-04-19 21:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: David Lang, Chris Mason, git
In-Reply-To: <Pine.LNX.4.58.0504191651110.6467@ppc970.osdl.org>
On Tue, Apr 19, 2005 at 04:59:18PM -0700, Linus Torvalds wrote:
>
> However, it definitely wouldn't be useful for _me_. The whole thing that
> I'm after is to allow painless merging of distributed work. If I have to
> merge one patch at a time, I'd much rather see people send me patches
> directly - that's much simpler than having a whole new GIT repository.
>
> So at least to me, a git repository only makes sense when it is a
> collection of patches.
Same here. I have been toying with the idea of using git as a quilt back
end, so that I can get rid of the .pc/ directory in quilt.
But thinking about it more, I don't see a good reason to do it.
quilt as it is works great with git or any other SCM. Using git to
store the quilt patches would require merging more often, instead of
just applying patches. It introduces more steps and more objects to clean
up later on. It seems that for everything I have been using quilt for,
it is easier to just deal with the patch series.
Chris
^ permalink raw reply
* Re: [PATCH] write-tree performance problems
From: Linus Torvalds @ 2005-04-20 1:09 UTC (permalink / raw)
To: Chris Mason; +Cc: git
In-Reply-To: <200504192049.21947.mason@suse.com>
On Tue, 19 Apr 2005, Chris Mason wrote:
>
> 5) right before exiting, write-tree updates the index if it made any changes.
This part won't work. It needs to do the proper locking, which means that
it needs to create "index.lock" _before_ it reads the index file, and
write everything to that one and then do a rename.
If it doesn't need to do the write, it can just remove index.lock without
writing to it, obviously.
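As a rough shell illustration of that lock-then-rename ordering (not the actual C code; update_locked is a hypothetical helper, and it simplifies by always writing the lock file and comparing afterwards):

```shell
#!/bin/sh
# update_locked INDEX CMD [ARGS...]
# Runs CMD with the index on stdin and INDEX.lock on stdout,
# then renames the lock file over the index if anything changed.
update_locked() {
    index=$1
    shift
    lock="$index.lock"
    # With noclobber (set -C) the redirection fails if the lock exists,
    # so creating the file doubles as taking the lock.
    if ! ( set -C; : >"$lock" ) 2>/dev/null; then
        echo "unable to create $lock" >&2
        return 1
    fi
    # The index is only read after the lock is held.
    if "$@" <"$index" >"$lock"; then
        if cmp -s "$index" "$lock"; then
            rm -f "$lock"           # nothing changed: just drop the lock
        else
            mv "$lock" "$index"     # rename replaces the index in one step
        fi
    else
        rm -f "$lock"               # failed update: leave the index untouched
        return 1
    fi
}
```

A reader of the index either sees the old file or the new one, never a half-written one, and a second writer fails at the lock step instead of racing.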
> The downside to this setup is that I've got to change other index users to
> deal with directory entries that are there sometimes and missing other times.
> The nice part is that I don't have to "invalidate" the directory entry, if it
> is present, it is valid.
To me, the biggest downside is actually the complexity part, and worrying
about the directory index ever getting stale. How big do the changes end
up being?
Linus
^ permalink raw reply
* Re: [darcs-devel] Darcs and git: plan of action
From: Ray Lee @ 2005-04-20 1:11 UTC (permalink / raw)
To: Tupshin Harper; +Cc: Kevin Smith, git, darcs-devel
In-Reply-To: <426594F9.4090002@tupshin.com>
Thanks for your patience.
On Tue, 2005-04-19 at 16:32 -0700, Tupshin Harper wrote:
> >Give me a case where assuming it's a replace will do the wrong thing,
> >for C code, where it's a variable or function name.
> try this:
> initial patch creates hello.c
> #include <stdio.h>
>
> int main(int argc, char *argv[])
> {
> printf("Hello world!\n");
> return 0;
> }
>
> second patch:
> replace ./hello.c [A-Za-z_0-9] world universe
Aha! Okay, I now see at least part of the issue: we're using different
definitions of 'token.' Yours is quite sensible, in that it matches the
darcs syntax. However, I'm claiming a token is defined by the file's
language, and that a replace patch on anything but a token as per those
language standards is a silly thing.
In your example, I'd claim you did an inter-token edit, as the natural
token there was "Hello world!\n".
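To make the character-class notion of a token concrete, here is a small sketch. It assumes that a token is a maximal run of the listed characters, which is my reading of the replace patch above; token_replace is a made-up name and this is not how darcs is implemented.

```shell
#!/bin/sh
# token_replace FROM TO: filter stdin, replacing whole tokens equal to FROM.
# A token is taken to be a maximal run of [A-Za-z_0-9] (assumption).
token_replace() {
    awk -v from="$1" -v to="$2" -v tokre='[A-Za-z_0-9]+' '
    {
        out = ""; rest = $0
        # Walk the line token by token, copying the text in between as-is.
        while (match(rest, tokre)) {
            tok = substr(rest, RSTART, RLENGTH)
            out = out substr(rest, 1, RSTART - 1) (tok == from ? to : tok)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }'
}
```

Run as "token_replace world universe <hello.c", this rewrites the word inside the string literal, since a character class knows nothing about C tokens, while leaving an identifier such as "worldly" alone.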
With that, let me restate what I think is possible.
One should be able to discover renames (replaces) of user identifiers in
C code programmatically. Is that everything darcs replace does?
Obviously not. Is that what users would usually *want*? If I were using
it, that's what I'd want (especially including the limited scope of
replacement -- user identifiers such as variable or function names,
etc.). But then I'm not a lurker on the darcs user list, so I don't know
how usage of darcs replace plays out in actual practice.
So, it's a subset. Is it a useful subset? Yes, as it addresses what
happens during refactoring, which is when I'd usually see this getting
used. (Syntactically ignorant search and replace is so, y'know,
*1970s*.)
Any clearer?
Ray
^ permalink raw reply
* Re: [darcs-devel] Darcs and git: plan of action
From: Ray Lee @ 2005-04-20 1:22 UTC (permalink / raw)
To: Juliusz Chroboczek; +Cc: darcs-devel, git, Kevin Smith
In-Reply-To: <7i7jizyy4i.fsf@lanthane.pps.jussieu.fr>
On Tue, 2005-04-19 at 10:22 +0200, Juliusz Chroboczek wrote:
> > > Aye, that will require some metadata on the git side (the hack,
> > > suggested by Linus, of using git hashes to notice moves won't work).
>
> > So, why won't it work?
>
> Because two files can legitimately have identical contents without
> being ``the same'' file from the VC system's point of view.
>
> In other words, two files may happen to have the same contents but
> have distinct histories.
Eh, let's not talk using integral/summation view across all the patches
that ever could have come in against the file. We're hamstringing
ourselves if we do that, and it's not what darcs does. darcs looks at a
differential view of the changes, and for a mv, it looks at it when it
happens.
darcs does a "darcs mv" to commit a "file move patch" to whatever
logging or patch repository it keeps below the surface.
The equivalent in git would be to have a given tree, move a file via
bash's mv, and then checkpoint a new tree. (I'm sure there's details in
there, but that's plumbing, and what we have Petr for.)
A differential comparison of the two trees shows no content changed, but
a file label was modified. Ergo, a rename occurred.
QED.
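A rough sketch of that comparison, assuming each tree has been flattened into lines of "<hash> <path>" (a made-up listing format; detect_renames is a hypothetical helper, and paths with spaces are not handled):

```shell
#!/bin/sh
# detect_renames OLD NEW
# Prints "old-path -> new-path" when a hash vanished from one path
# and the same hash shows up at a path that did not exist before.
# Assumes OLD is not empty (the NR == FNR trick needs at least one line).
detect_renames() {
    awk '
        NR == FNR { old_hash[$2] = $1; next }   # first file: path -> hash
                  { new_hash[$2] = $1 }         # second file: path -> hash
        END {
            for (p in old_hash)
                if (!(p in new_hash))
                    gone[old_hash[p]] = p       # content that lost its path
            for (p in new_hash)
                if (!(p in old_hash) && (new_hash[p] in gone))
                    print gone[new_hash[p]] " -> " p
        }
    ' "$1" "$2"
}
```

When several vanished paths share one hash the sketch just picks one of them, which is exactly the ambiguity Juliusz raises above.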
~r.
^ permalink raw reply
* Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.
From: Linus Torvalds @ 2005-04-20 1:51 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Petr Baudis, git
In-Reply-To: <7vacnumgot.fsf@assigned-by-dhcp.cox.net>
On Tue, 19 Apr 2005, Junio C Hamano wrote:
>
> Let's for a moment forget what git-pasky currently does, which
> is not to touch .git/index until the user says "Ok, let's
> commit".
I think git-pasky is wrong.
It's true that we want to often (almost always) diff against the last
"released" thing, and I actually think git-pasky does what it does because
I never wrote a tool to diff the current working directory against a
"tree".
At the same time, I very much worked with a model where you do _not_ have
a traditional "work file", but the index really _is_ the "work file".
> I'd like to start from a different premise and see what happens:
>
> - What .git/index records is *not* the state as the last
> commit. It is just a cache Cogito uses to speed up access
> to the user's working tree. From the user's point of view,
> it does not even exist.
Yes. Yes. YES.
That is indeed the whole point of the index file. In my world-view, the
index file does _everything_. It's the staging area ("work file"), it's
the merging area ("merge directory") and it's the cache file ("stat
cache").
I'll immediately write a tool to diff the current working directory
against a tree object, and hopefully that will just make pasky happy with
this model too.
Is there any other reason why git-pasky wants to have a work file?
Linus
^ permalink raw reply
* More transport methods
From: Daniel Barkalow @ 2005-04-20 1:49 UTC (permalink / raw)
To: git
Just in case someone else is considering trying it, I've just written a
pair of programs to transfer a commit and everything it uses directly over
ssh (i.e., without rsync); it is also clever enough to reject anything
that either doesn't inflate or doesn't hash correctly. It also doesn't
transfer anything that the recipient already has or doesn't need.
I have some more cleaning to go on it, but I could post it if others want
to hack on it.
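For anyone who wants the flavour of those two checks before I post the code, here is a shell approximation. It is not the real programs: it skips the inflate step, hashes the file as stored, relies on sha1sum from coreutils, and the names verify_object and needs_transfer are made up.

```shell
#!/bin/sh
# verify_object NAME FILE: succeed only if FILE's SHA-1 equals NAME.
verify_object() {
    expected=$1 file=$2
    actual=$(sha1sum <"$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# needs_transfer OBJDIR NAME: succeed if the recipient lacks the object.
needs_transfer() {
    [ ! -e "$1/$2" ]
}
```

The receiving side would call needs_transfer before asking for an object and verify_object before moving a received file into place.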
-Daniel
*This .sig left intentionally blank*
^ permalink raw reply
* Re: [RFC] Possible strategy cleanup for git add/remove/diff etc.
From: Junio C Hamano @ 2005-04-20 1:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Petr Baudis, git
In-Reply-To: <Pine.LNX.4.58.0504191846290.6467@ppc970.osdl.org>
>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
LT> Is there any other reason why git-pasky wants to have a work file?
Do you mean "why does a user want to check things out in the
working directory and make changes, possibly run compile tests
before pushing the result to Linus?" ;-) I'm confused what you
mean by "a work file", I guess...
^ permalink raw reply
* Re: [script] ge: export commits as patches
From: David A. Wheeler @ 2005-04-20 2:34 UTC (permalink / raw)
To: Petr Baudis; +Cc: Ingo Molnar, git
In-Reply-To: <20050419194108.GN12757@pasky.ji.cz>
Forget my earlier "aspatch" proposal, that's a lousy name.
How about "mkpatch"? Seems like a reasonable name for
a command that makes a patch. GNU Arch uses that command name.
CVS & Subversion basically do this as part of "diff"
(which is another possibility).
--- David A. Wheeler
^ permalink raw reply