* Re: [PATCH] Implement limited context matching in git-apply.
From: Eric W. Biederman @ 2006-04-10 18:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0604100821340.9504@g5.osdl.org>
Linus Torvalds <torvalds@osdl.org> writes:
> On Mon, 10 Apr 2006, Eric W. Biederman wrote:
>>
>> If I just loop through all of Andrew's patches in order
>> and run git-apply --index -C1 I process the entire patchset
>> in 1m53s or about 6 patches per second. So running
>> git-mailinfo, git-write-tree, git-commit-tree, and
>> git-update-ref every time has a measurable impact,
>> and shows things can be speeded up even more.
>
> git-write-tree is actually a fairly expensive operation on the kernel. It
> needs to write the 1000+ tree objects - and while _most_ of them already
> exist (and thus don't actually need to be written out), we need to
> generate the tree object and its SHA1 in order to notice that that is the
> case.
>
> I'm almost certain that 90%+ of the overhead you see is the tree writing,
> not the rest of the scripting.
Well, it is easy enough to time. Looking at the timings,
going from just git-apply to git-apply && git-write-tree
does seem to about double the amount of time taken,
or take me to about 4 minutes. With everything else
in there, things happen in the 6-7 minute range
in the hot cache scenario. So write-tree is closer
to 50% of the overhead.
Is it possible to cache the sha1 of unmodified directories?
If we did that we could probe to see if the hash already
existed before we attempted to look for the subdirectories.
The pain would be remembering which directory sha1s are
current. If nothing else we can modify
remove_cache_entry and add_file_to_cache to clear
the parent directories' cached sha1 when we update an
index entry. But I keep thinking there should
be something more elegant, like using ce_flags
or comparing mtime values.
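The "clear the parent directories' cached sha1" idea can be sketched roughly as below. This is only an illustration under assumptions: struct dir_cache_entry, is_parent_dir() and invalidate_parents() are made-up names, not existing git code, and a real version would hang off the index update paths named above.

```c
#include <string.h>

/* Hypothetical cache entry: one per directory known to the index. */
struct dir_cache_entry {
	const char *path;	/* directory path, no trailing slash; "" is the root */
	int sha1_valid;		/* nonzero while the cached tree sha1 is still current */
};

/* Return nonzero if `dir` is the root or a leading directory of `file`. */
static int is_parent_dir(const char *dir, const char *file)
{
	size_t len = strlen(dir);

	if (len == 0)
		return 1;	/* the root contains everything */
	return strncmp(dir, file, len) == 0 && file[len] == '/';
}

/*
 * To be called whenever the index entry for `file` is added, removed
 * or updated: every directory on the path to it loses its cached sha1.
 * Returns the number of entries that were invalidated.
 */
static int invalidate_parents(struct dir_cache_entry *cache, int nr,
			      const char *file)
{
	int i, cleared = 0;

	for (i = 0; i < nr; i++) {
		if (cache[i].sha1_valid && is_parent_dir(cache[i].path, file)) {
			cache[i].sha1_valid = 0;
			cleared++;
		}
	}
	return cleared;
}
```

With something like this, write-tree would only need to regenerate the directories whose flag was cleared and could reuse the cached sha1 for everything else.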
...
Ok, taking a quick look at write-tree to see where
the bottleneck is:
I made two modified versions of write-tree.
- git-write-tree-nowritetree, which returns just before calling
write_tree.
- git-write-tree-nosha1write, which does everything except call
sha1_file_write.
With just git-apply and git-write-tree-nosha1write it takes
me about 3m:20s to process 2.6.17-rc1-mm2.
With just git-apply and git-write-tree-nowritetree it takes:
real 2m59.985s
user 1m38.353s
sys 0m31.445s
With just git-apply and /bin/true it takes:
real 2m1.581s
user 1m3.169s
sys 0m29.903s
Looking at the individual numbers:
$ time git-write-tree-nowritetree --missing-ok
real 0m0.158s
user 0m0.052s
sys 0m0.008s
$ time git-write-tree-nowritetree --missing-ok
real 0m0.155s
user 0m0.057s
sys 0m0.003s
$ time git-write-tree-nowritetree --missing-ok
real 0m0.065s
user 0m0.057s
sys 0m0.002s
$ time git-write-tree-nowritetree --missing-ok
real 0m0.159s
user 0m0.055s
sys 0m0.005s
$ time git-write-tree-nowritetree --missing-ok
real 0m0.151s
user 0m0.054s
sys 0m0.007s
$ time git-write-tree-nowritetree --missing-ok
real 0m0.154s
user 0m0.056s
sys 0m0.005s
$ time git-write-tree-nosha1write --missing-ok
0000000000000000000000000000000000000000
real 0m0.199s
user 0m0.091s
sys 0m0.008s
$ time git-write-tree-nosha1write --missing-ok
0000000000000000000000000000000000000000
real 0m0.195s
user 0m0.094s
sys 0m0.007s
$ time git-write-tree-nosha1write --missing-ok
0000000000000000000000000000000000000000
real 0m0.198s
user 0m0.092s
sys 0m0.009s
$ time git-write-tree --missing-ok
0ecfe3dbc2e65aa9638c62abf0cf05057c77f884
real 0m0.217s
user 0m0.113s
sys 0m0.012s
$ time git-write-tree
0ecfe3dbc2e65aa9638c62abf0cf05057c77f884
real 0m0.276s
user 0m0.169s
sys 0m0.008s
So at a quick inspection it looks to me like:
About .059s to perform the check for missing files.
About .019s to write the new tree.
About .155s in start up overhead, read_cache, and sanity checks.
So at first glance it looks like librification, to
allow the redundant work to be skipped, is where
the big speed win on my machine would be.
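The three figures are differences between neighbouring runs. A small sketch of that arithmetic in integer milliseconds follows; the struct and function names are invented for the illustration. It also exposes a fourth slice of about .043s between the nowritetree and nosha1write runs, which is presumably the time spent building and hashing the tree objects. That interpretation is an assumption and was not measured separately above.

```c
/* Wall-clock ("real") times from the runs above, in milliseconds. */
struct write_tree_timings {
	int full;		/* git-write-tree */
	int missing_ok;		/* git-write-tree --missing-ok */
	int no_sha1_write;	/* git-write-tree-nosha1write --missing-ok */
	int no_write_tree;	/* git-write-tree-nowritetree --missing-ok */
};

static const struct write_tree_timings measured = { 276, 217, 198, 155 };

/* Cost of checking that every referenced object exists. */
static int missing_check_ms(const struct write_tree_timings *t)
{
	return t->full - t->missing_ok;
}

/* Cost of actually writing the new tree object(s). */
static int object_write_ms(const struct write_tree_timings *t)
{
	return t->missing_ok - t->no_sha1_write;
}

/* Assumed: cost of generating and hashing the trees without writing. */
static int tree_build_ms(const struct write_tree_timings *t)
{
	return t->no_sha1_write - t->no_write_tree;
}

/* Start up, read_cache and sanity checks. */
static int startup_ms(const struct write_tree_timings *t)
{
	return t->no_write_tree;
}
```

The four slices add back up to the full .276s run, so nothing is double counted.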
> Your patch looks ok from a quick read-through:
Thanks.
My import of 2.6.17-rc1-mm2 gives exactly the same
result as simply applying Andrew's patch, which, while
not definitive, hits a lot of interesting cases.
> Acked-by: Linus Torvalds <torvalds@osdl.org>
>
> Linus
* Re: git-svnimport on OSX?
From: Randal L. Schwartz @ 2006-04-10 14:43 UTC (permalink / raw)
To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90604031911y415dd795nc1c8814f80a02ad7@mail.gmail.com>
>>>>> "Martin" == Martin Langhoff <martin.langhoff@gmail.com> writes:
Martin> BTW, getting git-svnimport to work normally takes me quite a few tries
Martin> with different options, so OSX may be perfectly innocent this time...
Well, is there some combination of things that will give me what
http://svn.perl.org/perl6/doc does? Maybe it just resists all attempts. :)
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
* Re: [PATCH] Implement limited context matching in git-apply.
From: Linus Torvalds @ 2006-04-10 15:25 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Junio C Hamano, git
In-Reply-To: <m1irphhj1p.fsf_-_@ebiederm.dsl.xmission.com>
On Mon, 10 Apr 2006, Eric W. Biederman wrote:
>
> If I just loop through all of Andrew's patches in order
> and run git-apply --index -C1 I process the entire patchset
> in 1m53s or about 6 patches per second. So running
> git-mailinfo, git-write-tree, git-commit-tree, and
> git-update-ref every time has a measurable impact,
> and shows things can be speeded up even more.
git-write-tree is actually a fairly expensive operation on the kernel. It
needs to write the 1000+ tree objects - and while _most_ of them already
exist (and thus don't actually need to be written out), we need to
generate the tree object and its SHA1 in order to notice that that is the
case.
I'm almost certain that 90%+ of the overhead you see is the tree writing,
not the rest of the scripting.
Your patch looks ok from a quick read-through:
Acked-by: Linus Torvalds <torvalds@osdl.org>
Linus
* [PATCH] Implement limited context matching in git-apply.
From: Eric W. Biederman @ 2006-04-10 9:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <m13bgmht9v.fsf@ebiederm.dsl.xmission.com>
Ok, this really should be the good version. The option
handling has been reworked to be automation safe.
Currently to import the -mm tree I have to work around
git-apply by using patch, because some of Andrew's
patches in quilt will only apply with fuzz.
I started out implementing a --fuzz option and then I realized
fuzz is not a very safe concept for an automated system. What
you really want is a minimum number of context lines that must
match. This allows policy to be set without knowing how many
lines of context a patch actually provides. By default
the policy remains to match all provided lines of context.
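To make the policy concrete, here is a minimal sketch of the order in which context is given up when a fragment fails to match. It mirrors the loop in the patch below (trim the larger side, or both sides when they are equal, and stop at the -C minimum), but reduce_context_step() and count_attempts() are names made up for this illustration, and the real code also trims the old and new line buffers at each step.

```c
/*
 * Shrink the context by one step.  Returns 0, leaving the counts
 * untouched, once both sides are at or below the required minimum.
 */
static int reduce_context_step(unsigned long *leading,
			       unsigned long *trailing,
			       unsigned long min_context)
{
	if (*leading <= min_context && *trailing <= min_context)
		return 0;
	if (*leading >= *trailing)
		(*leading)--;
	if (*trailing > *leading)
		(*trailing)--;
	return 1;
}

/* Number of match attempts made before giving up on a fragment. */
static int count_attempts(unsigned long leading, unsigned long trailing,
			  unsigned long min_context)
{
	int attempts = 1;	/* the first try uses all provided context */

	while (reduce_context_step(&leading, &trailing, min_context))
		attempts++;
	return attempts;
}
```

For a fragment with 3 lines of context on each side and -C1, the attempts use (3/3), (2/2) and (1/1) lines. With the default, where the minimum is effectively unlimited, only the first attempt is made.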
Allowing git-apply to match a restricted set of context makes
it much easier to import the -mm tree into git. I am still only
processing 1.5 to 1.6 patches a second, which for the 692 patches in
2.6.17-rc1-mm2 is still painful, but it does help.
If I just loop through all of Andrew's patches in order
and run git-apply --index -C1 I process the entire patchset
in 1m53s or about 6 patches per second. So running
git-mailinfo, git-write-tree, git-commit-tree, and
git-update-ref every time has a measurable impact,
and shows things can be speeded up even more.
All of these timings were taken on my poor 700MHz Athlon
with 512MB of RAM. So people with fast machines should
see much better performance.
When a match is found after the number of context lines has been reduced, a
warning is generated. Since this is a rare event and possibly
dangerous this seems to make sense. Unless you are patching
a single file the error message is a little bit terse at
the moment, but it should be easy to go back and fix.
I have also updated the documentation for git-apply to reflect
the new -C option that sets the minimum number of context
lines that must match.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
Documentation/git-apply.txt | 8 ++-
apply.c | 121 +++++++++++++++++++++++++++++++++++++------
2 files changed, 111 insertions(+), 18 deletions(-)
6b3a4565b760664a9b72096dd5eea8be9e1d1311
diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
index 1c64a1a..e93ea1f 100644
--- a/Documentation/git-apply.txt
+++ b/Documentation/git-apply.txt
@@ -11,7 +11,7 @@ SYNOPSIS
[verse]
'git-apply' [--stat] [--numstat] [--summary] [--check] [--index] [--apply]
[--no-add] [--index-info] [--allow-binary-replacement] [-z] [-pNUM]
- [--whitespace=<nowarn|warn|error|error-all|strip>]
+ [-CNUM] [--whitespace=<nowarn|warn|error|error-all|strip>]
[<patch>...]
DESCRIPTION
@@ -72,6 +72,12 @@ OPTIONS
-p<n>::
Remove <n> leading slashes from traditional diff paths. The
default is 1.
+
+-C<n>::
+ Ensure at least <n> lines of surrounding context match before
+ and after each change. When fewer lines of surrounding
+ context exist they all must match. By default no context is
+ ever ignored.
--apply::
If you use any of the options marked ``Turns off
diff --git a/apply.c b/apply.c
index 33b4271..147a919 100644
--- a/apply.c
+++ b/apply.c
@@ -32,8 +32,9 @@ static int apply = 1;
static int no_add = 0;
static int show_index_info = 0;
static int line_termination = '\n';
+static unsigned long p_context = -1;
static const char apply_usage[] =
-"git-apply [--stat] [--numstat] [--summary] [--check] [--index] [--apply] [--no-add] [--index-info] [--allow-binary-replacement] [-z] [-pNUM] [--whitespace=<nowarn|warn|error|error-all|strip>] <patch>...";
+"git-apply [--stat] [--numstat] [--summary] [--check] [--index] [--apply] [--no-add] [--index-info] [--allow-binary-replacement] [-z] [-pNUM] [-CNUM] [--whitespace=<nowarn|warn|error|error-all|strip>] <patch>...";
static enum whitespace_eol {
nowarn_whitespace,
@@ -100,6 +101,7 @@ static int max_change, max_len;
static int linenr = 1;
struct fragment {
+ unsigned long leading, trailing;
unsigned long oldpos, oldlines;
unsigned long newpos, newlines;
const char *patch;
@@ -817,12 +819,15 @@ static int parse_fragment(char *line, un
int added, deleted;
int len = linelen(line, size), offset;
unsigned long oldlines, newlines;
+ unsigned long leading, trailing;
offset = parse_fragment_header(line, len, fragment);
if (offset < 0)
return -1;
oldlines = fragment->oldlines;
newlines = fragment->newlines;
+ leading = 0;
+ trailing = 0;
if (patch->is_new < 0) {
patch->is_new = !oldlines;
@@ -860,10 +865,14 @@ static int parse_fragment(char *line, un
case ' ':
oldlines--;
newlines--;
+ if (!deleted && !added)
+ leading++;
+ trailing++;
break;
case '-':
deleted++;
oldlines--;
+ trailing = 0;
break;
case '+':
/*
@@ -887,6 +896,7 @@ static int parse_fragment(char *line, un
}
added++;
newlines--;
+ trailing = 0;
break;
/* We allow "\ No newline at end of file". Depending
@@ -904,6 +914,9 @@ static int parse_fragment(char *line, un
}
if (oldlines || newlines)
return -1;
+ fragment->leading = leading;
+ fragment->trailing = trailing;
+
/* If a fragment ends with an incomplete line, we failed to include
* it in the above loop because we hit oldlines == newlines == 0
* before seeing it.
@@ -1087,7 +1100,7 @@ static int read_old_data(struct stat *st
}
}
-static int find_offset(const char *buf, unsigned long size, const char *fragment, unsigned long fragsize, int line)
+static int find_offset(const char *buf, unsigned long size, const char *fragment, unsigned long fragsize, int line, int *lines)
{
int i;
unsigned long start, backwards, forwards;
@@ -1148,6 +1161,7 @@ static int find_offset(const char *buf,
n = (i >> 1)+1;
if (i & 1)
n = -n;
+ *lines = n;
return try;
}
@@ -1155,6 +1169,33 @@ static int find_offset(const char *buf,
* We should start searching forward and backward.
*/
return -1;
+}
+
+static void remove_first_line(const char **rbuf, int *rsize)
+{
+ const char *buf = *rbuf;
+ int size = *rsize;
+ unsigned long offset;
+ offset = 0;
+ while (offset <= size) {
+ if (buf[offset++] == '\n')
+ break;
+ }
+ *rsize = size - offset;
+ *rbuf = buf + offset;
+}
+
+static void remove_last_line(const char **rbuf, int *rsize)
+{
+ const char *buf = *rbuf;
+ int size = *rsize;
+ unsigned long offset;
+ offset = size - 1;
+ while (offset > 0) {
+ if (buf[--offset] == '\n')
+ break;
+ }
+ *rsize = offset + 1;
}
struct buffer_desc {
@@ -1192,7 +1233,10 @@ static int apply_one_fragment(struct buf
int offset, size = frag->size;
char *old = xmalloc(size);
char *new = xmalloc(size);
+ const char *oldlines, *newlines;
int oldsize = 0, newsize = 0;
+ unsigned long leading, trailing;
+ int pos, lines;
while (size > 0) {
int len = linelen(patch, size);
@@ -1241,23 +1285,59 @@ #ifdef NO_ACCURATE_DIFF
newsize--;
}
#endif
+
+ oldlines = old;
+ newlines = new;
+ leading = frag->leading;
+ trailing = frag->trailing;
+ lines = 0;
+ pos = frag->newpos;
+ for (;;) {
+ offset = find_offset(buf, desc->size, oldlines, oldsize, pos, &lines);
+ if (offset >= 0) {
+ int diff = newsize - oldsize;
+ unsigned long size = desc->size + diff;
+ unsigned long alloc = desc->alloc;
+
+ /* Warn if it was necessary to reduce the number
+ * of context lines.
+ */
+ if ((leading != frag->leading) || (trailing != frag->trailing))
+ fprintf(stderr, "Context reduced to (%ld/%ld) to apply fragment at %d\n",
+ leading, trailing, pos + lines);
+
+ if (size > alloc) {
+ alloc = size + 8192;
+ desc->alloc = alloc;
+ buf = xrealloc(buf, alloc);
+ desc->buffer = buf;
+ }
+ desc->size = size;
+ memmove(buf + offset + newsize, buf + offset + oldsize, size - offset - newsize);
+ memcpy(buf + offset, newlines, newsize);
+ offset = 0;
- offset = find_offset(buf, desc->size, old, oldsize, frag->newpos);
- if (offset >= 0) {
- int diff = newsize - oldsize;
- unsigned long size = desc->size + diff;
- unsigned long alloc = desc->alloc;
-
- if (size > alloc) {
- alloc = size + 8192;
- desc->alloc = alloc;
- buf = xrealloc(buf, alloc);
- desc->buffer = buf;
+ break;
}
- desc->size = size;
- memmove(buf + offset + newsize, buf + offset + oldsize, size - offset - newsize);
- memcpy(buf + offset, new, newsize);
- offset = 0;
+
+ /* Am I at my context limits? */
+ if ((leading <= p_context) && (trailing <= p_context))
+ break;
+ /* Reduce the number of context lines
+ * Reduce both leading and trailing if they are equal
+ * otherwise just reduce the larger context.
+ */
+ if (leading >= trailing) {
+ remove_first_line(&oldlines, &oldsize);
+ remove_first_line(&newlines, &newsize);
+ pos--;
+ leading--;
+ }
+ if (trailing > leading) {
+ remove_last_line(&oldlines, &oldsize);
+ remove_last_line(&newlines, &newsize);
+ trailing--;
+ }
}
free(old);
@@ -1882,6 +1962,7 @@ int main(int argc, char **argv)
for (i = 1; i < argc; i++) {
const char *arg = argv[i];
+ char *end;
int fd;
if (!strcmp(arg, "-")) {
@@ -1943,6 +2024,12 @@ int main(int argc, char **argv)
}
if (!strcmp(arg, "-z")) {
line_termination = 0;
+ continue;
+ }
+ if (!strncmp(arg, "-C", 2)) {
+ p_context = strtoul(arg + 2, &end, 0);
+ if (*end != '\0')
+ die("unrecognized context count '%s'", arg + 2);
continue;
}
if (!strncmp(arg, "--whitespace=", 13)) {
--
1.3-rc3.GIT
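A side note on the two line-trimming helpers in the patch above: as far as I can tell, remove_first_line() tests offset <= size before reading buf[offset], so a buffer without a newline would be read one byte past its end, and remove_last_line() leaves one byte behind when the buffer holds a single line. Neither should matter for well-formed fragments. A bounds-checked variant might look like the sketch below; drop_first_line() and drop_last_line() are hypothetical names, and it assumes every line ends in '\n'.

```c
#include <stddef.h>

/* Drop the first line, up to and including its '\n', from a buffer. */
static void drop_first_line(const char **buf, size_t *size)
{
	size_t offset = 0;

	while (offset < *size) {
		if ((*buf)[offset++] == '\n')
			break;
	}
	*buf += offset;
	*size -= offset;
}

/* Drop the last line from a buffer whose lines each end in '\n'. */
static void drop_last_line(const char **buf, size_t *size)
{
	size_t offset = *size;

	if (offset == 0)
		return;
	offset--;	/* step over the '\n' that ends the last line */
	while (offset > 0 && (*buf)[offset - 1] != '\n')
		offset--;
	*size = offset;
}
```

Both helpers leave an empty buffer empty, which keeps the callers free of special cases.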
* Re: git commit broken ?
From: Franck Bui-Huu @ 2006-04-10 8:24 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vhd51alti.fsf@assigned-by-dhcp.cox.net>
2006/4/10, Junio C Hamano <junkio@cox.net>:
> "Franck Bui-Huu" <vagabon.xyz@gmail.com> writes:
>
> > It seems that the "git commit -a -c ORIG_HEAD" command does not work as
> > expected.
> >
> > $ git commit -a -c ORIG_HEAD
> > $ git status
> > nothing to commit
> >
> > So it seems that c has been committed this time... Is it the expected
> > behaviour?
>
> You said "git commit -a" to tell it to commit all your changes
> in your working tree, using "-c ORIG_HEAD" which means "take the
> commit log message and authorship information from that commit".
>
> So I do not understand what else, other than both a and c
> getting committed (hence subsequent "git status" to report
> "nothing to commit"), you are expecting...
oops, I forgot that switch, sorry for the noise.
Thanks
--
Franck
* Re: [PATCH] git log [diff-tree options]...
From: Johannes Schindelin @ 2006-04-10 8:22 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, git
In-Reply-To: <7vy7ye9uk8.fsf@assigned-by-dhcp.cox.net>
Hi,
On Sun, 9 Apr 2006, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Hi,
> >
> > On Sun, 9 Apr 2006, Johannes Schindelin wrote:
> >
> >> On Sun, 9 Apr 2006, Linus Torvalds wrote:
> >>
> >> > - keep it - for historical reasons - as a internal shorthand, and just
> >> > turn it into "git log --diff -cc"
> >>
> >> It is "git log --cc", right?
> >
> > Like this?
>
> I do not think so. You should default to --cc only if there is no
> explicit command line stuff from the user.
Well, my idea was to get rid of the whatchanged script, and deprecate the
internal whatchanged. IMHO "git log" is much faster typed than "git
whatchanged", especially if you have no completion installed. I, for one,
will never ever again use whatchanged.
But you and the list gave me bad marks for that patch, and rightfully so.
Ciao,
Dscho
* Re: git commit broken ?
From: Junio C Hamano @ 2006-04-10 8:15 UTC (permalink / raw)
To: Franck Bui-Huu; +Cc: git
In-Reply-To: <cda58cb80604100102p92e5258qf33a128f75f1b088@mail.gmail.com>
"Franck Bui-Huu" <vagabon.xyz@gmail.com> writes:
> It seems that the "git commit -a -c ORIG_HEAD" command does not work as
> expected.
>
> $ git commit -a -c ORIG_HEAD
> $ git status
> nothing to commit
>
> So it seems that c has been committed this time... Is it the expected
> behaviour?
You said "git commit -a" to tell it to commit all your changes
in your working tree, using "-c ORIG_HEAD" which means "take the
commit log message and authorship information from that commit".
So I do not understand what else, other than both a and c
getting committed (hence subsequent "git status" to report
"nothing to commit"), you are expecting...
* Re: [PATCH] git log [diff-tree options]...
From: Johannes Schindelin @ 2006-04-10 8:10 UTC (permalink / raw)
To: Timo Hirvonen; +Cc: git, junkio
In-Reply-To: <20060410012258.589f1581.tihirvon@gmail.com>
Hi,
On Mon, 10 Apr 2006, Timo Hirvonen wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> > +static int cmd_whatchanged(int argc, const char **argv, char **envp)
> > +{
> > + memmove(argv + 2, argv + 1, argc - 1);
>
> Shouldn't the size be sizeof(char *) * argc (NULL terminated array)?
> There's also overflow...
Yeah, this was some useless late-night coding. But you get the idea.
Ciao,
Dscho
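For reference, the point about the memmove() size is that it counts bytes, not pointers, and that argv has no spare slot to shift into. One way to sidestep both problems is to build a new array, as in the sketch below. insert_arg() is a made-up helper, not code from the patch under discussion; it assumes argc >= 1 and that argv[argc] is NULL.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated, NULL-terminated copy of argv with `extra`
 * inserted at index 1.  The caller frees the array (not the strings).
 * Returns NULL if the allocation fails.
 */
static const char **insert_arg(int argc, const char **argv, const char *extra)
{
	/* argc existing entries + the new one + the terminating NULL */
	const char **out = malloc(sizeof(*out) * (argc + 2));

	if (!out)
		return NULL;
	out[0] = argv[0];
	out[1] = extra;
	/* copy argv[1] .. argv[argc]: argc - 1 arguments plus the NULL */
	memcpy(out + 2, argv + 1, sizeof(*out) * argc);
	return out;
}
```

The size passed to memcpy() is sizeof(*out) * argc, which matches the element count suggested in the review comment.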
* git commit broken ?
From: Franck Bui-Huu @ 2006-04-10 8:02 UTC (permalink / raw)
To: Git Mailing List
It seems that the "git commit -a -c ORIG_HEAD" command does not work as
expected.
Here is an example:
$ ls
a b c
$ git status
nothing to commit
$ echo "good modif" > a
$ echo "temp modif" > c
$ git-update-index a
$ git commit -m "work in prog"
$ git reset --soft HEAD^
$ git status
#
# Updated but not checked in:
# (will commit)
#
# modified: a
#
#
# Changed but not updated:
# (use git-update-index to mark for commit)
#
# modified: c
#
$ git commit -a -c ORIG_HEAD
$ git status
nothing to commit
So it seems that c has been committed this time... Is it the expected
behaviour?
My git version:
$ git --version
git version 1.3.0.rc3.g0ed4
Thanks
--
Franck
* Re: git pull origin doesn't update the master
From: Junio C Hamano @ 2006-04-10 7:24 UTC (permalink / raw)
To: Aneesh Kumar; +Cc: git
In-Reply-To: <cc723f590604092345r7d0e2cedr8f9838d054ecb023@mail.gmail.com>
"Aneesh Kumar" <aneesh.kumar@gmail.com> writes:
> The workflow is now a bit complicated. I clone the repository. Now I need
> to edit remotes/origin to select which branches I want to
> follow, and then do a git pull origin. Earlier I just needed to do a
> clone and a git pull. I don't need to fast forward the pu branch.
What you are saying is that the previous round did a wrong thing
without telling the user, and it just happened that you did not
care about the wrong thing it did.
It is a gentle reminder that heads that are rewound need to be
advertised as such. It is conceivable that in future versions
of git we might want to be able to mark some branches "this is
expected to be rewound" explicitly and make the clone operation
to take notice, to give you the plus sign automatically.
* Re: git pull origin doesn't update the master
From: Aneesh Kumar @ 2006-04-10 6:45 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <7vr7469c4n.fsf@assigned-by-dhcp.cox.net>
On 4/10/06, Junio C Hamano <junkio@cox.net> wrote:
> "Aneesh Kumar" <aneesh.kumar@gmail.com> writes:
>
> > While updating the git code base the master branch is not getting
> > updated. A first look tells me that the commit below is the issue
> > a9698bb22fb7b66e5882c3a5e7b2b8b53ea03f90
> >
> > git-pull.sh does
> > git-fetch --update-head-ok "$@" || exit 1
>
> Yes, this was done as a response to an explicit request from
> Andrew Morton.
>
> What Sean said about "pu" branch is correct.
>
>
Sean's suggestion worked for me.
The workflow is now a bit complicated. I clone the repository. Now I need
to edit remotes/origin to select which branches I want to
follow, and then do a git pull origin. Earlier I just needed to do a
clone and a git pull. I don't need to fast forward the pu branch, but
I do need to get the updates to the master branch. Since git-fetch fails
to update the pu branch, git pull origin also fails to update the master
branch, which is the one I am really interested in.
-aneesh
* Re: git pull origin doesn't update the master
From: Junio C Hamano @ 2006-04-10 6:29 UTC (permalink / raw)
To: Aneesh Kumar; +Cc: git
In-Reply-To: <cc723f590604092141q3517136cmc0a895a069021b8f@mail.gmail.com>
"Aneesh Kumar" <aneesh.kumar@gmail.com> writes:
> While updating the git code base the master branch is not getting
> updated. A first look tells me that the commit below is the issue
> a9698bb22fb7b66e5882c3a5e7b2b8b53ea03f90
>
> git-pull.sh does
> git-fetch --update-head-ok "$@" || exit 1
Yes, this was done as a response to an explicit request from
Andrew Morton.
What Sean said about "pu" branch is correct.
* Re: [PATCH] Implement --fuzz= option for git-apply.
From: Eric W. Biederman @ 2006-04-10 5:52 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
In-Reply-To: <m1d5fqi23b.fsf@ebiederm.dsl.xmission.com>
ebiederm@xmission.com (Eric W. Biederman) writes:
> Currently to import the -mm tree I have to work around
> git-apply by using patch, because some of Andrew's
> patches in quilt will only apply with fuzz.
>
> Allowing git-apply to handle fuzz makes it much easier to import
> the -mm tree into git. I am still only processing about 1.5 patches a
> second, which for the 692 patches in 2.6.17-rc1-mm2 is still painful,
> but it does help.
>
> If I just apply the patches and don't run git-mailinfo,
> git-write-tree, and git-write-commit I get about 4 patches
> per second.
>
> This patch defaults to leaving fuzz processing off so if you don't
> want patches that only apply with fuzz you won't get them.
>
> If a patch does require fuzz to apply you will get a warning:
>> Fragment applied at offset: +-#lines (fuzz: #context_lines_deleted)
Bother I almost had it right the first time.
I forgot to remove the context lines from the new lines that we
apply, in addition to the old lines that we remove. This updated
patch fixes that problem.
Context lines patching themselves in is a weird bug.
Eric
diff --git a/apply.c b/apply.c
index 33b4271..4faf365 100644
--- a/apply.c
+++ b/apply.c
@@ -32,8 +32,9 @@ static int apply = 1;
static int no_add = 0;
static int show_index_info = 0;
static int line_termination = '\n';
+static int p_fuzz = 0;
static const char apply_usage[] =
-"git-apply [--stat] [--numstat] [--summary] [--check] [--index] [--apply] [--no-add] [--index-info] [--allow-binary-replacement] [-z] [-pNUM] [--whitespace=<nowarn|warn|error|error-all|strip>] <patch>...";
+"git-apply [--stat] [--numstat] [--summary] [--check] [--index] [--apply] [--no-add] [--index-info] [--allow-binary-replacement] [-z] [-pNUM] [--fuzz=NUM] [--whitespace=<nowarn|warn|error|error-all|strip>] <patch>...";
static enum whitespace_eol {
nowarn_whitespace,
@@ -100,6 +101,7 @@ static int max_change, max_len;
static int linenr = 1;
struct fragment {
+ unsigned long context;
unsigned long oldpos, oldlines;
unsigned long newpos, newlines;
const char *patch;
@@ -817,12 +819,15 @@ static int parse_fragment(char *line, un
int added, deleted;
int len = linelen(line, size), offset;
unsigned long oldlines, newlines;
+ unsigned long leading, trailing;
offset = parse_fragment_header(line, len, fragment);
if (offset < 0)
return -1;
oldlines = fragment->oldlines;
newlines = fragment->newlines;
+ leading = 0;
+ trailing = 0;
if (patch->is_new < 0) {
patch->is_new = !oldlines;
@@ -860,10 +865,14 @@ static int parse_fragment(char *line, un
case ' ':
oldlines--;
newlines--;
+ if (!deleted && !added)
+ leading++;
+ trailing++;
break;
case '-':
deleted++;
oldlines--;
+ trailing = 0;
break;
case '+':
/*
@@ -887,6 +896,7 @@ static int parse_fragment(char *line, un
}
added++;
newlines--;
+ trailing = 0;
break;
/* We allow "\ No newline at end of file". Depending
@@ -904,6 +914,10 @@ static int parse_fragment(char *line, un
}
if (oldlines || newlines)
return -1;
+ fragment->context = leading;
+ if (leading > trailing)
+ fragment->context = trailing;
+
/* If a fragment ends with an incomplete line, we failed to include
* it in the above loop because we hit oldlines == newlines == 0
* before seeing it.
@@ -1087,7 +1101,7 @@ static int read_old_data(struct stat *st
}
}
-static int find_offset(const char *buf, unsigned long size, const char *fragment, unsigned long fragsize, int line)
+static int find_offset(const char *buf, unsigned long size, const char *fragment, unsigned long fragsize, int line, int *lines)
{
int i;
unsigned long start, backwards, forwards;
@@ -1148,6 +1162,7 @@ static int find_offset(const char *buf,
n = (i >> 1)+1;
if (i & 1)
n = -n;
+ *lines = n;
return try;
}
@@ -1155,6 +1170,31 @@ static int find_offset(const char *buf,
* We should start searching forward and backward.
*/
return -1;
+}
+
+static void reduce_context(char **buf, int *size)
+{
+ char *ctx = *buf;
+ unsigned long ctxsize = *size;
+ unsigned long offset;
+
+ /* Remove the first line */
+ offset = 0;
+ while (offset <= ctxsize) {
+ if (ctx[offset++] == '\n')
+ break;
+ }
+ ctxsize -= offset;
+ ctx += offset;
+ /* Remove the last line */
+ offset = ctxsize - 1;
+ while (offset > 0) {
+ if (ctx[--offset] == '\n')
+ break;
+ }
+ ctxsize = offset + 1;
+ *buf = ctx;
+ *size = ctxsize;
}
struct buffer_desc {
@@ -1192,7 +1232,10 @@ static int apply_one_fragment(struct buf
int offset, size = frag->size;
char *old = xmalloc(size);
char *new = xmalloc(size);
+ char *oldlines, *newlines;
int oldsize = 0, newsize = 0;
+ int lines;
+ int fuzz, max_fuzz;
while (size > 0) {
int len = linelen(patch, size);
@@ -1241,23 +1284,42 @@ #ifdef NO_ACCURATE_DIFF
newsize--;
}
#endif
+
+ offset = -1; /* shutup gcc */
+ oldlines = old;
+ newlines = new;
+ lines = 0;
+ max_fuzz = (p_fuzz < frag->context) ? p_fuzz : frag->context;
+ for (fuzz = 0; fuzz <= max_fuzz; fuzz++) {
+ /* Reduce the number of context lines */
+ if (fuzz) {
+ reduce_context(&oldlines, &oldsize);
+ reduce_context(&newlines, &newsize);
+ }
- offset = find_offset(buf, desc->size, old, oldsize, frag->newpos);
- if (offset >= 0) {
- int diff = newsize - oldsize;
- unsigned long size = desc->size + diff;
- unsigned long alloc = desc->alloc;
-
- if (size > alloc) {
- alloc = size + 8192;
- desc->alloc = alloc;
- buf = xrealloc(buf, alloc);
- desc->buffer = buf;
+ offset = find_offset(buf, desc->size, oldlines, oldsize, frag->newpos + fuzz, &lines);
+ if (offset >= 0) {
+ int diff = newsize - oldsize;
+ unsigned long size = desc->size + diff;
+ unsigned long alloc = desc->alloc;
+
+ if (fuzz)
+ fprintf(stderr, "Fragment applied at offset: %d (fuzz: %d)\n",
+ lines, fuzz);
+
+ if (size > alloc) {
+ alloc = size + 8192;
+ desc->alloc = alloc;
+ buf = xrealloc(buf, alloc);
+ desc->buffer = buf;
+ }
+ desc->size = size;
+ memmove(buf + offset + newsize, buf + offset + oldsize, size - offset - newsize);
+ memcpy(buf + offset, newlines, newsize);
+ offset = 0;
+
+ break;
}
- desc->size = size;
- memmove(buf + offset + newsize, buf + offset + oldsize, size - offset - newsize);
- memcpy(buf + offset, new, newsize);
- offset = 0;
}
free(old);
@@ -1943,6 +2005,10 @@ int main(int argc, char **argv)
}
if (!strcmp(arg, "-z")) {
line_termination = 0;
+ continue;
+ }
+ if (!strncmp(arg, "--fuzz=", 7)) {
+ p_fuzz = atoi(arg + 7);
continue;
}
if (!strncmp(arg, "--whitespace=", 13)) {
* Re: git pull origin doesn't update the master
From: sean @ 2006-04-10 5:03 UTC (permalink / raw)
To: Aneesh Kumar; +Cc: git, junkio
In-Reply-To: <cc723f590604092141q3517136cmc0a895a069021b8f@mail.gmail.com>
On Mon, 10 Apr 2006 10:11:05 +0530
"Aneesh Kumar" <aneesh.kumar@gmail.com> wrote:
> While updating the git code base the master branch is not getting
> updated. A first look tells me that the commit below is the issue
> a9698bb22fb7b66e5882c3a5e7b2b8b53ea03f90
>
> git-pull.sh does
> git-fetch --update-head-ok "$@" || exit 1
>
> and git-fetch.sh exits with status 1, printing the message below
>
> * refs/heads/pu: does not fast forward to branch 'pu' of
> http://git.kernel.org/pub/scm/git/git;
> not updating.
>
The "pu" branch often won't fast forward because some commits have
been completely deleted in it since the last time you pulled.
If you want to track it, add a plus (+) sign to the proper line in
your .git/remotes/origin file, like this:
Pull: +refs/heads/pu:refs/heads/pu
Which tells git to deal with the problem for you by updating the
branch even when it does not fast forward. Or you can just delete that
line completely if you don't want to track the pu branch at all.
HTH,
Sean
* Re: git pull origin doesn't update the master
From: Aneesh Kumar @ 2006-04-10 4:51 UTC (permalink / raw)
To: Git Mailing List, Junio C Hamano
In-Reply-To: <cc723f590604092141q3517136cmc0a895a069021b8f@mail.gmail.com>
On 4/10/06, Aneesh Kumar <aneesh.kumar@gmail.com> wrote:
> While updating the git code base the master branch is not getting
> updated. A first look tells me that the commit below is the issue
> a9698bb22fb7b66e5882c3a5e7b2b8b53ea03f90
>
> git-pull.sh does
> git-fetch --update-head-ok "$@" || exit 1
>
> and git-fetch.sh exits with status 1, printing the message below
>
> * refs/heads/pu: does not fast forward to branch 'pu' of
> http://git.kernel.org/pub/scm/git/git;
> not updating.
>
>
I think git-update-server-info is not being run on kernel.org. I am
using the http fetch:
562036a04223709de4922873238462007bcb529f refs/heads/pu
$ more pu
c52d221ba03c84e0b818d19f7ec30cb4d75fe509
$
-aneesh
* git pull origin doesn't update the master
From: Aneesh Kumar @ 2006-04-10 4:41 UTC (permalink / raw)
To: Git Mailing List, Junio C Hamano
While updating the git code base the master branch is not getting
updated. A first look tells me that the commit below is the issue
a9698bb22fb7b66e5882c3a5e7b2b8b53ea03f90
git-pull.sh does
git-fetch --update-head-ok "$@" || exit 1
and git-fetch.sh exits with status 1, printing the message below
* refs/heads/pu: does not fast forward to branch 'pu' of
http://git.kernel.org/pub/scm/git/git;
not updating.
-aneesh
* Re: [RFH] Exploration of an alternative diff_delta() algorithm
From: Nicolas Pitre @ 2006-04-10 3:29 UTC (permalink / raw)
To: Peter Eriksen; +Cc: git
In-Reply-To: <20060409224548.GB21455@erlang.gbar.dtu.dk>
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1001 bytes --]
On Mon, 10 Apr 2006, Peter Eriksen wrote:
> On Sun, Apr 09, 2006 at 01:45:00PM -0400, Nicolas Pitre wrote:
> ...
> > Try this with the README file from the git source tree:
> >
> > sed s/git/GIT/g < ./README > /tmp/README.mod
> > test-delta -d ./README /tmp/README.mod /tmp/README.delta
> > [BOOM!]
>
> I found the bug. The code still has some limitations, but now
> it passes the test suite. Thanks for your help, Nicolas.
OK here's some more meat for you:
Copy the same README file from the git source tree, then edit the copied
version so the "Blob Object" section and the "Tree Object" section are
swapped around like shown in the attached patch.
The best delta that can be achieved is 24 bytes.
With the current code the produced delta is 42 bytes.
With your code the resulting delta is 4978 bytes, about twice as large
as the attached patch.
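To see why swapping two sections can cost so few bytes, here is a rough sketch of the byte accounting. It assumes the delta layout used by the encode_size() and encode_copy() routines in the code under discussion: two size headers, then instructions, where a copy costs one command byte plus the non-zero bytes of its offset and length. The helper names header_size and copy_op_size are made up for illustration, and lengths are assumed to fit in 16 bits.

```c
#include <assert.h>

/* Bytes needed for a size header: 7 payload bits per byte. */
static unsigned int header_size(unsigned long n)
{
	unsigned int bytes = 1;

	while ((n >>= 7) != 0)
		bytes++;
	return bytes;
}

/*
 * Bytes needed for one copy instruction: a command byte, then only the
 * non-zero bytes of the 32-bit offset and of the 16-bit length.
 */
static unsigned int copy_op_size(unsigned long offset, unsigned long len)
{
	unsigned int bytes = 1;
	int i;

	assert(offset <= 0xffffffffUL && len > 0 && len <= 0xffffUL);
	for (i = 0; i < 4; i++)
		if ((offset >> (8 * i)) & 0xff)
			bytes++;
	for (i = 0; i < 2; i++)
		if ((len >> (8 * i)) & 0xff)
			bytes++;
	return bytes;
}
```

A swap of two blocks needs only four copies (text before, second block, first block, text after), so with a file of a few tens of kilobytes the whole delta stays within a couple of dozen bytes, which is the order of magnitude quoted above.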
One major limitation of your algorithm appears to be that it does not
have a global view of the base buffer before starting to find matches.
Nicolas
[-- Attachment #2: Type: TEXT/PLAIN, Size: 2372 bytes --]
--- f1 2006-04-09 13:31:26.000000000 -0400
+++ f2 2006-04-09 23:04:10.000000000 -0400
@@ -87,26 +87,6 @@
The object types in some more detail:
-Blob Object
-~~~~~~~~~~~
-A "blob" object is nothing but a binary blob of data, and doesn't
-refer to anything else. There is no signature or any other
-verification of the data, so while the object is consistent (it 'is'
-indexed by its sha1 hash, so the data itself is certainly correct), it
-has absolutely no other attributes. No name associations, no
-permissions. It is purely a blob of data (i.e. normally "file
-contents").
-
-In particular, since the blob is entirely defined by its data, if two
-files in a directory tree (or in multiple different versions of the
-repository) have the same contents, they will share the same blob
-object. The object is totally independent of its location in the
-directory tree, and renaming a file does not change the object that
-file is associated with in any way.
-
-A blob is typically created when gitlink:git-update-index[1]
-is run, and its data can be accessed by gitlink:git-cat-file[1].
-
Tree Object
~~~~~~~~~~~
The next hierarchical object type is the "tree" object. A tree object
@@ -147,6 +127,26 @@
its data can be accessed by gitlink:git-ls-tree[1].
Two trees can be compared with gitlink:git-diff-tree[1].
+Blob Object
+~~~~~~~~~~~
+A "blob" object is nothing but a binary blob of data, and doesn't
+refer to anything else. There is no signature or any other
+verification of the data, so while the object is consistent (it 'is'
+indexed by its sha1 hash, so the data itself is certainly correct), it
+has absolutely no other attributes. No name associations, no
+permissions. It is purely a blob of data (i.e. normally "file
+contents").
+
+In particular, since the blob is entirely defined by its data, if two
+files in a directory tree (or in multiple different versions of the
+repository) have the same contents, they will share the same blob
+object. The object is totally independent of its location in the
+directory tree, and renaming a file does not change the object that
+file is associated with in any way.
+
+A blob is typically created when gitlink:git-update-index[1]
+is run, and its data can be accessed by gitlink:git-cat-file[1].
+
Commit Object
~~~~~~~~~~~~~
The "commit" object is an object that introduces the notion of
* [PATCH] Implement --fuzz= option for git-apply.
From: Eric W. Biederman @ 2006-04-10 2:41 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Currently, to import the -mm tree I have to work around
git-apply by using patch, because some of Andrew's
patches in quilt will only apply with fuzz.
Allowing git-apply to handle fuzz makes it much easier to import
the -mm tree into git. I am still only processing about 1.5 patches a
second, which for the 692 patches in 2.6.17-rc1-mm2 is still painful,
but it does help.
If I just apply the patches and don't run git-mailinfo,
git-write-tree, and git-commit-tree, I get about 4 patches
per second.
This patch defaults to leaving fuzz processing off so if you don't
want patches that only apply with fuzz you won't get them.
If a patch does require fuzz to apply you will get a warning:
> Fragment applied at offset: +-#lines (fuzz: #context_lines_deleted)
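The core of the fuzz idea can be sketched in isolation: each extra level of fuzz drops one leading and one trailing context line from the text that has to match, and the search is retried. The following is a minimal standalone sketch, not the code from this patch. trim_context is a hypothetical helper, and it assumes every line in the buffer ends with '\n'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of apply.c): narrow the view (*buf, *size)
 * by dropping its first and its last line. Returns 0 on success, or -1
 * (leaving the view untouched) if there are not two lines to drop.
 */
static int trim_context(const char **buf, size_t *size)
{
	const char *p, *nl;
	size_t n;

	assert(buf && *buf && size);
	p = *buf;
	n = *size;

	/* Drop the first line, including its newline. */
	nl = memchr(p, '\n', n);
	if (!nl)
		return -1;
	n -= (size_t)(nl + 1 - p);
	p = nl + 1;

	/* Drop the last line, which must itself end with a newline. */
	if (n == 0 || p[n - 1] != '\n')
		return -1;
	n--;				/* step over the final newline */
	while (n > 0 && p[n - 1] != '\n')
		n--;			/* back up to the end of the previous line */

	*buf = p;
	*size = n;
	return 0;
}
```

A caller would try the full context first and call the helper once per additional fuzz level, stopping at the limit given on the command line or when the helper returns -1.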
diff --git a/apply.c b/apply.c
index 33b4271..a07503f 100644
--- a/apply.c
+++ b/apply.c
@@ -32,8 +32,9 @@ static int apply = 1;
static int no_add = 0;
static int show_index_info = 0;
static int line_termination = '\n';
+static int p_fuzz = 0;
static const char apply_usage[] =
-"git-apply [--stat] [--numstat] [--summary] [--check] [--index] [--apply] [--no-add] [--index-info] [--allow-binary-replacement] [-z] [-pNUM] [--whitespace=<nowarn|warn|error|error-all|strip>] <patch>...";
+"git-apply [--stat] [--numstat] [--summary] [--check] [--index] [--apply] [--no-add] [--index-info] [--allow-binary-replacement] [-z] [-pNUM] [--fuzz=NUM] [--whitespace=<nowarn|warn|error|error-all|strip>] <patch>...";
static enum whitespace_eol {
nowarn_whitespace,
@@ -100,6 +101,7 @@ static int max_change, max_len;
static int linenr = 1;
struct fragment {
+ unsigned long context;
unsigned long oldpos, oldlines;
unsigned long newpos, newlines;
const char *patch;
@@ -817,12 +819,15 @@ static int parse_fragment(char *line, un
int added, deleted;
int len = linelen(line, size), offset;
unsigned long oldlines, newlines;
+ unsigned long leading, trailing;
offset = parse_fragment_header(line, len, fragment);
if (offset < 0)
return -1;
oldlines = fragment->oldlines;
newlines = fragment->newlines;
+ leading = 0;
+ trailing = 0;
if (patch->is_new < 0) {
patch->is_new = !oldlines;
@@ -860,10 +865,14 @@ static int parse_fragment(char *line, un
case ' ':
oldlines--;
newlines--;
+ if (!deleted && !added)
+ leading++;
+ trailing++;
break;
case '-':
deleted++;
oldlines--;
+ trailing = 0;
break;
case '+':
/*
@@ -887,6 +896,7 @@ static int parse_fragment(char *line, un
}
added++;
newlines--;
+ trailing = 0;
break;
/* We allow "\ No newline at end of file". Depending
@@ -904,6 +914,10 @@ static int parse_fragment(char *line, un
}
if (oldlines || newlines)
return -1;
+ fragment->context = leading;
+ if (leading > trailing)
+ fragment->context = trailing;
+
/* If a fragment ends with an incomplete line, we failed to include
* it in the above loop because we hit oldlines == newlines == 0
* before seeing it.
@@ -1087,7 +1101,7 @@ static int read_old_data(struct stat *st
}
}
-static int find_offset(const char *buf, unsigned long size, const char *fragment, unsigned long fragsize, int line)
+static int find_offset(const char *buf, unsigned long size, const char *fragment, unsigned long fragsize, int line, int *lines)
{
int i;
unsigned long start, backwards, forwards;
@@ -1148,6 +1162,7 @@ static int find_offset(const char *buf,
n = (i >> 1)+1;
if (i & 1)
n = -n;
+ *lines = n;
return try;
}
@@ -1155,6 +1170,31 @@ static int find_offset(const char *buf,
* We should start searching forward and backward.
*/
return -1;
+}
+
+static void reduce_context(char **buf, int *size)
+{
+ char *ctx = *buf;
+ unsigned long ctxsize = *size;
+ unsigned long offset;
+
+ /* Remove the first line */
+ offset = 0;
+ while (offset <= ctxsize) {
+ if (ctx[offset++] == '\n')
+ break;
+ }
+ ctxsize -= offset;
+ ctx += offset;
+ /* Remove the last line */
+ offset = ctxsize - 1;
+ while (offset > 0) {
+ if (ctx[--offset] == '\n')
+ break;
+ }
+ ctxsize = offset + 1;
+ *buf = ctx;
+ *size = ctxsize;
}
struct buffer_desc {
@@ -1192,7 +1232,10 @@ static int apply_one_fragment(struct buf
int offset, size = frag->size;
char *old = xmalloc(size);
char *new = xmalloc(size);
- int oldsize = 0, newsize = 0;
+ char *ctx;
+ int oldsize = 0, newsize = 0, ctxsize;
+ int lines;
+ int fuzz, max_fuzz;
while (size > 0) {
int len = linelen(patch, size);
@@ -1241,23 +1284,39 @@ #ifdef NO_ACCURATE_DIFF
newsize--;
}
#endif
+
+ offset = -1; /* shutup gcc */
+ ctx = old;
+ ctxsize = oldsize;
+ lines = 0;
+ max_fuzz = (p_fuzz < frag->context) ? p_fuzz : frag->context;
+ for (fuzz = 0; fuzz <= max_fuzz; fuzz++) {
+ /* Reduce the number of context lines */
+ if (fuzz)
+ reduce_context(&ctx, &ctxsize);
+ offset = find_offset(buf, desc->size, ctx, ctxsize, frag->newpos + fuzz, &lines);
+ if (offset >= 0) {
+ int diff = newsize - ctxsize;
+ unsigned long size = desc->size + diff;
+ unsigned long alloc = desc->alloc;
+
+ if (fuzz)
+ fprintf(stderr, "Fragment applied at offset: %d (fuzz: %d)\n",
+ lines, fuzz);
+
+ if (size > alloc) {
+ alloc = size + 8192;
+ desc->alloc = alloc;
+ buf = xrealloc(buf, alloc);
+ desc->buffer = buf;
+ }
+ desc->size = size;
+ memmove(buf + offset + newsize, buf + offset + ctxsize, size - offset - newsize);
+ memcpy(buf + offset, new, newsize);
+ offset = 0;
- offset = find_offset(buf, desc->size, old, oldsize, frag->newpos);
- if (offset >= 0) {
- int diff = newsize - oldsize;
- unsigned long size = desc->size + diff;
- unsigned long alloc = desc->alloc;
-
- if (size > alloc) {
- alloc = size + 8192;
- desc->alloc = alloc;
- buf = xrealloc(buf, alloc);
- desc->buffer = buf;
+ break;
}
- desc->size = size;
- memmove(buf + offset + newsize, buf + offset + oldsize, size - offset - newsize);
- memcpy(buf + offset, new, newsize);
- offset = 0;
}
free(old);
@@ -1943,6 +2002,10 @@ int main(int argc, char **argv)
}
if (!strcmp(arg, "-z")) {
line_termination = 0;
+ continue;
+ }
+ if (!strncmp(arg, "--fuzz=", 7)) {
+ p_fuzz = atoi(arg + 7);
continue;
}
if (!strncmp(arg, "--whitespace=", 13)) {
* Re: [PATCH] git log [diff-tree options]...
From: Linus Torvalds @ 2006-04-10 0:06 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Johannes Schindelin, git
In-Reply-To: <7vy7ye9uk8.fsf@assigned-by-dhcp.cox.net>
On Sun, 9 Apr 2006, Junio C Hamano wrote:
>
> I do not think so. You should default to --cc only if there is no
> explicit command-line option from the user.
Actually, even that would be wrong, when I think more about it. The
default for "git-whatchanged" is to do diffing, but default to the "raw"
diff (just "-r" for recursive).
So the most appropriate default set of flags is likely "-r -c", which also
means that any subsequent explicit command line stuff will override it (ie
adding a "-p" should automatically do the right thing).
But the "memmove()" to move the arguments around was definitely broken.
Much better to just initialize the diff flags manually, I think.
Linus
* Re: [PATCH] git log [diff-tree options]...
From: Junio C Hamano @ 2006-04-09 23:51 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.63.0604100000430.30000@wbgn013.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> Hi,
>
> On Sun, 9 Apr 2006, Johannes Schindelin wrote:
>
>> On Sun, 9 Apr 2006, Linus Torvalds wrote:
>>
>> > - keep it - for historical reasons - as an internal shorthand, and just
>> > turn it into "git log --diff -cc"
>>
>> It is "git log --cc", right?
>
> Like this?
I do not think so. You should default to --cc only if there is no
explicit command-line option from the user.
* Re: Fixes to parsecvs
From: Francois Romieu @ 2006-04-09 23:17 UTC (permalink / raw)
To: Keith Packard; +Cc: Jan-Benedict Glaw, Git Mailing List
In-Reply-To: <1144334896.2303.259.camel@neko.keithp.com>
Keith Packard <keithp@keithp.com> :
[...]
> > How well does this work with even larger repositories?
>
> postgresql is the largest I've run; starting with a 615M CVS repository,
> it built a 1.7G .git tree, which packed down to 125M.
As a data point, I gave parsecvs a try on a local CVS repository.
The repository weighs 3.28 GB. It contains 53k files (45k non-attic).
.git/objects grew from ~100k files at the end of the first pass to
199k files (~11k commits). It took 18h on a 3GHz PIV with 2 GB of RAM.
After 6 hours, 400 MB had been pushed to swap and parsecvs was using
1.95 GB of RAM for itself. No significant swap activity. Swap grew to
900 MB by the end of the run. A tarball (5 MB) containing vmstat output
and the size of objects is available at
http://www.cogenit.fr/linux/misc/cvsparse-debug.tar.bz2
I interrupted 'git repack -a -d' after 6 hours.
--
Ueimor
* Re: [RFH] Exploration of an alternative diff_delta() algorithm
From: Peter Eriksen @ 2006-04-09 22:45 UTC (permalink / raw)
To: git
In-Reply-To: <Pine.LNX.4.64.0604091340540.2215@localhost.localdomain>
On Sun, Apr 09, 2006 at 01:45:00PM -0400, Nicolas Pitre wrote:
...
> Try this with the README file from the git source tree:
>
> sed s/git/GIT/g < ./README > /tmp/README.mod
> test-delta -d ./README /tmp/README.mod /tmp/README.delta
> [BOOM!]
I found the bug. The code still has some limitations, but now
it passes the test suite. Thanks for your help, Nicolas.
Peter
----->8---diff-delta.c---->8-------
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "delta.h"
#define BASE 257
#define PREFIX_SIZE 3
#define SIZE 10
#define HASH_TABLE_SIZE (1<<SIZE)
#define DELTA_SIZE (1024 * 1024)
unsigned int init_hash(unsigned char* data) {
return data[0]*BASE*BASE + data[1]*BASE + data[2];
}
unsigned int hash(unsigned char* data, unsigned int hash) {
return (hash - data[-1]*BASE*BASE)*BASE + data[2];
}
#define GR_PRIME 0x9e370001
#define HASH(v) ((v * GR_PRIME) >> (32 - SIZE))
struct entry {
char file;
char* offset;
};
void flush(struct entry* table) {
memset(table, 0, HASH_TABLE_SIZE * sizeof(struct entry));
}
int same_prefixes(char* data1, char* data2) {
return !memcmp(data1, data2, PREFIX_SIZE);
}
void encode_add(char* out, int* outpos, char* version_start, char* version_copy) {
unsigned int size = version_copy - version_start;
if (!size) return;
int pos = *outpos;
while(size > 127) {
out[pos++] = 127;
memcpy(out + pos, version_start, 127);
pos += 127;
version_start += 127;
size -= 127;
}
out[pos++] = size;
memcpy(out + pos, version_start, size);
pos += size;
*outpos = pos;
}
void encode_copy(char* out, int* outpos, int offset, int size) {
int pos = (*outpos) + 1;
int i = 0x80;
if (offset & 0xff) { out[pos++] = offset; i |= 0x01; }
offset >>= 8;
if (offset & 0xff) { out[pos++] = offset; i |= 0x02; }
offset >>= 8;
if (offset & 0xff) { out[pos++] = offset; i |= 0x04; }
offset >>= 8;
if (offset & 0xff) { out[pos++] = offset; i |= 0x08; }
if (size & 0xff) { out[pos++] = size; i |= 0x10; }
size >>= 8;
if (size & 0xff) { out[pos++] = size; i |= 0x20; }
out[*outpos] = i;
*outpos = pos;
}
void encode_size(char* out, int* outpos, unsigned long size) {
int pos = *outpos;
out[pos] = size;
size >>= 7;
while (size) {
out[pos++] |= 0x80;
out[pos] = size;
size >>= 7;
}
*outpos = ++pos;
}
void *diff_delta(void *from_buf, unsigned long from_size,
void *to_buf, unsigned long to_size,
unsigned long *delta_size,
unsigned long max_size) {
unsigned int index;
unsigned int l;
unsigned char* base = from_buf;
unsigned char* version = to_buf;
unsigned long base_size = from_size;
unsigned long version_size = to_size;
unsigned char* base_copy = base;
unsigned char* version_copy = version;
struct entry* table = calloc(HASH_TABLE_SIZE, sizeof(struct entry));
//int delta_alloc = DELTA_SIZE;
unsigned char* delta = malloc(DELTA_SIZE);
unsigned int deltapos = 0;
unsigned char* base_top = base + base_size;
unsigned char* version_top = version + version_size;
encode_size(delta, &deltapos, base_size);
encode_size(delta, &deltapos, version_size);
unsigned char* base_offset = base;
unsigned char* version_offset = version;
unsigned int base_hash = init_hash(base);
unsigned int version_hash = init_hash(version);
unsigned char* version_start = version;
while(base_offset - base + PREFIX_SIZE < base_top - base &&
version_offset - version + PREFIX_SIZE < version_top - version) {
// step2:
index = HASH(base_hash);
switch (table[index].file) {
case '\0': {
table[index].file = 'b';
table[index].offset = base_offset;
break;
}
case 'v': {
if (same_prefixes(base_offset, table[index].offset)) {
base_copy = base_offset;
version_copy = table[index].offset;
goto step3;
} else break;
}
case 'b': break;
default: printf("AAAAAARGH 2b\n");
}
index = HASH(version_hash);
switch (table[index].file) {
case '\0': {
table[index].file = 'v';
table[index].offset = version_offset;
break;
}
case 'b': {
if (same_prefixes(table[index].offset, version_offset)) {
base_copy = table[index].offset;
version_copy = version_offset;
goto step3;
} else break;
}
case 'v': break;
default: printf("AAAAAARGH 2v\n");
}
base_offset++;
version_offset++;
base_hash = hash(base_offset, base_hash);
version_hash = hash(version_offset, version_hash);
continue; // goto step2;
step3:
l = 0;
/* Check the bounds before reading, so we never look past either buffer. */
while(base_copy + l < base_top && version_copy + l < version_top && base_copy[l] == version_copy[l]) l++;
base_offset = base_copy + l;
version_offset = version_copy + l;
/*
// Make sure we don't run out of delta buffer when encoding.
if((delta_alloc - deltapos) <
(version_start - version_copy) + 1 + 8 + (PREFIX_SIZE + 1)) {
delta_alloc = delta_alloc * 3 / 2;
delta = (char*) realloc(delta, delta_alloc);
}
*/
if(max_size && deltapos > max_size) {
free(delta);
free(table);
return NULL;
}
//fprintf(stdout, "add: pos %u, v_start %u, v_copy %u\n",
// deltapos, version_start - version, version_copy - version);
// step4:
encode_add(delta, &deltapos, version_start, version_copy);
//fprintf(stdout, "copy: pos %u, v_copy %u, l %u\n",
// deltapos, base_copy - base, l);
encode_copy(delta, &deltapos, base_copy - base, l);
// step5:
flush(table);
version_start = version_offset;
base_hash = init_hash(base_offset);
version_hash = init_hash(version_offset);
//fprintf(stdout, "3) pos %u, v_start %u, v %u, b %u\n",
// deltapos, version_start - version, version_offset - version, base_offset- base);
} // goto step2;
//fprintf(stdout, "pos %u, v_start %u, v_top %u\n",
// deltapos, version_start - version, version_size);
encode_add(delta, &deltapos, version_start, version + version_size);
*delta_size = deltapos;
free(table);
return delta;
}
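The init_hash()/hash() pair above is a rolling hash over a PREFIX_SIZE (3 byte) window: sliding the window one byte to the right should give the same value as recomputing the hash from scratch. A small sketch that checks this property follows. The functions are renamed copies for illustration (prefix_hash, roll_hash), and rolling_matches_recompute is a hypothetical test helper, not part of the posted file.

```c
#include <assert.h>
#include <stddef.h>

#define BASE 257u

/* Hash of the three bytes starting at p (same formula as init_hash). */
static unsigned int prefix_hash(const unsigned char *p)
{
	return p[0] * BASE * BASE + p[1] * BASE + p[2];
}

/*
 * Slide the window one byte to the right: p points at the new window
 * start, prev is the hash of the window that started at p - 1.
 */
static unsigned int roll_hash(const unsigned char *p, unsigned int prev)
{
	return (prev - p[-1] * BASE * BASE) * BASE + p[2];
}

/* Returns 1 if rolling and recomputing agree at every window of buf. */
static int rolling_matches_recompute(const unsigned char *buf, size_t len)
{
	unsigned int h;
	size_t i;

	assert(buf != NULL);
	if (len < 3)
		return 1;
	h = prefix_hash(buf);
	for (i = 1; i + 3 <= len; i++) {
		h = roll_hash(buf + i, h);
		if (h != prefix_hash(buf + i))
			return 0;
	}
	return 1;
}
```

Because the arithmetic is done in unsigned int, any wraparound is well defined and the two computations stay equal.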
* Re: [PATCH] git log [diff-tree options]...
From: Timo Hirvonen @ 2006-04-09 22:22 UTC (permalink / raw)
To: git; +Cc: torvalds, junkio, git
In-Reply-To: <Pine.LNX.4.63.0604100000430.30000@wbgn013.biozentrum.uni-wuerzburg.de>
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> +static int cmd_whatchanged(int argc, const char **argv, char **envp)
> +{
> + memmove(argv + 2, argv + 1, argc - 1);
Shouldn't the size be sizeof(char *) * argc (NULL terminated array)?
There's also overflow...
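Both points can be sidestepped by building a new argument array instead of shifting the old one in place: the byte count is scaled by the pointer size, and the extra slot is allocated rather than assumed. A minimal sketch, assuming nothing about git.c beyond the argv convention. insert_arg is a hypothetical helper, not an existing git function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated, NULL-terminated copy of
 * argv with `extra` inserted at index 1. The caller frees the array (the
 * strings themselves are shared, not copied). Returns NULL on failure.
 */
static const char **insert_arg(int argc, const char **argv, const char *extra)
{
	const char **out;

	assert(argc >= 1 && argv != NULL);
	/* argc old entries + 1 new entry + the terminating NULL */
	out = malloc(((size_t)argc + 2) * sizeof(*out));
	if (!out)
		return NULL;
	out[0] = argv[0];
	out[1] = extra;
	/* memcpy counts bytes, so scale the element count by the pointer size */
	memcpy(out + 2, argv + 1, ((size_t)argc - 1) * sizeof(*out));
	out[argc + 1] = NULL;
	return out;
}
```

The new array then has argc + 1 usable entries and can be handed on with that count.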
--
http://onion.dynserv.net/~timo/
* Re: [ANNOUNCE] git-svnconvert: YASI (Yet Another SVN importer)
From: Randal L. Schwartz @ 2006-04-09 22:06 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, Jakub Narebski, git
In-Reply-To: <Pine.LNX.4.63.0604092325590.29434@wbgn013.biozentrum.uni-wuerzburg.de>
>>>>> "Johannes" == Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
Johannes> I have _never_ seen a setup where Ruby was installed by
Johannes> default. Perl always, Python often.
OSX includes ruby by default.
Johannes> Furthermore, my feeling is that we are in the beginning phase of
Johannes> migration from scripting languages (which are good for prototyping)
Johannes> towards plain C. So adding yet another scripting language
Johannes> dependency is a little backwards.
You seem a bit prejudiced here. Are there performance problems in
the Perl and Python parts of git? If so, concentrate first on optimizing
the code where it matters. Then create bindings to the "git lib"
so that the heavy lifting can be done in C while still allowing
the basic algorithms to be written in a higher-level language.
It would be a step *backwards* to recode all of git in C.
Now, the *shell* parts, on the other hand, are screaming for a rewrite into
Perl or Python. Fork-fork-fork and worrying about escaping special characters
needlessly burn a lot of CPU and programmer time.
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
* Re: [PATCH] git log [diff-tree options]...
From: Johannes Schindelin @ 2006-04-09 22:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.63.0604092312340.29136@wbgn013.biozentrum.uni-wuerzburg.de>
Hi,
On Sun, 9 Apr 2006, Johannes Schindelin wrote:
> On Sun, 9 Apr 2006, Linus Torvalds wrote:
>
> > - keep it - for historical reasons - as an internal shorthand, and just
> > turn it into "git log --diff -cc"
>
> It is "git log --cc", right?
Like this?
---
git.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)
751e205a9ffd3a55094a0c0f657735023776cf74
diff --git a/git.c b/git.c
index 8776088..3a94afa 100644
--- a/git.c
+++ b/git.c
@@ -385,6 +385,13 @@ static int cmd_log(int argc, const char
return 0;
}
+static int cmd_whatchanged(int argc, const char **argv, char **envp)
+{
+ memmove(argv + 2, argv + 1, argc - 1);
+ argv[1] = "--cc";
+ return cmd_log(argc + 1, argv, envp);
+}
+
static void handle_internal_command(int argc, const char **argv, char **envp)
{
const char *cmd = argv[0];
@@ -395,6 +402,7 @@ static void handle_internal_command(int
{ "version", cmd_version },
{ "help", cmd_help },
{ "log", cmd_log },
+ { "whatchanged", cmd_whatchanged },
};
int i;
--
1.2.0.g61002-dirty