* [PATCH] cleans up builtin-mv
@ 2006-08-18 5:59 David Rientjes
2006-08-18 6:12 ` David Rientjes
0 siblings, 1 reply; 8+ messages in thread
From: David Rientjes @ 2006-08-18 5:59 UTC (permalink / raw)
To: git
Cleans up builtin-mv by removing a needless check of source's length,
redefinition of source's length, and misuse of strlen call that was
already assigned.
Signed-off-by: David Rientjes <rientjes@google.com>
---
builtin-mv.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/builtin-mv.c b/builtin-mv.c
index c0c8764..54c9262 100644
--- a/builtin-mv.c
+++ b/builtin-mv.c
@@ -126,7 +126,7 @@ int cmd_mv(int argc, const char **argv,
/* Checking */
for (i = 0; i < count; i++) {
- int length;
+ int length = strlen(source[i]);
const char *bad = NULL;
if (show_only)
@@ -137,14 +137,13 @@ int cmd_mv(int argc, const char **argv,
bad = "bad source";
if (!bad &&
- (length = strlen(source[i])) >= 0 &&
!strncmp(destination[i], source[i], length) &&
(destination[i][length] == 0 || destination[i][length] == '/'))
bad = "can not move directory into itself";
if (S_ISDIR(st.st_mode)) {
const char *dir = source[i], *dest_dir = destination[i];
- int first, last, len = strlen(dir);
+ int first, last;
if (lstat(dest_dir, &st) == 0) {
bad = "cannot move directory over file";
@@ -153,14 +152,15 @@ int cmd_mv(int argc, const char **argv,
modes[i] = WORKING_DIRECTORY;
- first = cache_name_pos(source[i], len);
+ first = cache_name_pos(source[i], length);
if (first >= 0)
die ("Huh? %s/ is in index?", dir);
first = -1 - first;
for (last = first; last < active_nr; last++) {
const char *path = active_cache[last]->name;
- if (strncmp(path, dir, len) || path[len] != '/')
+ if (strncmp(path, dir, length) ||
+ path[length] != '/')
break;
}
@@ -189,7 +189,7 @@ int cmd_mv(int argc, const char **argv,
source[count + j] = path;
destination[count + j] =
prefix_path(dest_dir, dst_len,
- path + len);
+ path + length);
modes[count + j] = INDEX;
}
count += last - first;
@@ -217,7 +217,7 @@ int cmd_mv(int argc, const char **argv,
}
}
- if (!bad && cache_name_pos(source[i], strlen(source[i])) < 0)
+ if (!bad && cache_name_pos(source[i], length) < 0)
bad = "not under version control";
if (!bad) {
--
1.4.2.rc4.g55c3-dirty
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] cleans up builtin-mv
2006-08-18 5:59 [PATCH] cleans up builtin-mv David Rientjes
@ 2006-08-18 6:12 ` David Rientjes
2006-08-18 9:51 ` Johannes Schindelin
0 siblings, 1 reply; 8+ messages in thread
From: David Rientjes @ 2006-08-18 6:12 UTC (permalink / raw)
To: git
On Thu, 17 Aug 2006, David Rientjes wrote:
> Cleans up builtin-mv by removing a needless check of source's length,
> redefinition of source's length, and misuse of strlen call that was
> already assigned.
>
I'm not sure when this command had been added to the tree because it
definitely was not included six months ago in a git tree I use everyday.
It seems to me like this would more appropriately be handled by a simple
shell script that would be much simpler to implement and could not
possibly be slower than this implementation.
This patch is a small fraction of what could be changed in this
implementation and I don't doubt it will undergo a complete rewrite in the
future. I think the problems with it have compounded on top of itself
over time which doesn't make a lot of sense since it appears to be a
relatively new addition.
For example:
(length = strlen(source[i])) >= 0
was _completely_ unnecessary since the previous instruction was a call to
lstat(source[i], ...) which would return ENOENT if source[i] was empty.
strlen(source[i]) was assigned to a variable later in the function, this
time called "len" instead. There was also an additional call to
strlen(source[i]) on its own even though the len variable was within
scope.
This code is _utterly_ unsatisfactory.
David
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] cleans up builtin-mv
2006-08-18 6:12 ` David Rientjes
@ 2006-08-18 9:51 ` Johannes Schindelin
2006-08-18 17:07 ` David Rientjes
2006-08-18 19:33 ` Junio C Hamano
0 siblings, 2 replies; 8+ messages in thread
From: Johannes Schindelin @ 2006-08-18 9:51 UTC (permalink / raw)
To: David Rientjes; +Cc: git
Hi,
On Thu, 17 Aug 2006, David Rientjes wrote:
> On Thu, 17 Aug 2006, David Rientjes wrote:
>
> > Cleans up builtin-mv by removing a needless check of source's length,
> > redefinition of source's length, and misuse of strlen call that was
> > already assigned.
> >
>
> I'm not sure when this command had been added to the tree
Tip of the day: "git log builtin-mv.c"
> because it definitely was not included six months ago in a git tree I
> use everyday. It seems to me like this would more appropriately be
> handled by a simple shell script
Nooooooo!
First, it _was_ a perl script, which you probably could find out by
checking your old git.
Second, it was rewritten to use Git.pm, and because _that_ did not work,
git-mv was rewritten as a builtin.
> that would be much simpler to implement and could not possibly be slower
> than this implementation.
Not slower? I beg to differ, admitting it is only a few percent. But your
statement is obviously uncorrect.
As for the "simpler to implement": "harder to port" comes into mind. And
do not try to argue that everybody and his dog could switch to Linux.
> This patch is a small fraction of what could be changed in this
> implementation and I don't doubt it will undergo a complete rewrite in
> the future.
Well, the patch has an improvement factor of almost none. I actually read
the patch, and asked myself: why would anybody fix a non-problem?
> I think the problems with it have compounded on top of itself over time
> which doesn't make a lot of sense since it appears to be a relatively
> new addition.
>
> For example:
> (length = strlen(source[i])) >= 0
Yes. Taken out of context, this sure sounds silly.
What you cleverly did not mention: It was inside a
if (!bad &&
(length = strlen(source[i])) >= 0 &&
!strncmp(destination[i], source[i], length) &&
(destination[i][length] == 0 || destination[i][length] == '/'))
construct. So, we assign the "length" variable only if we have to. And the
">= 0" trick is a common one. I could have done
!strncmp(destination[i], source[i], (length = strlen(source[i])))
but even I find that ugly.
> was _completely_ unnecessary since the previous instruction was a call to
> lstat(source[i], ...) which would return ENOENT if source[i] was empty.
Clarified enough?
> strlen(source[i]) was assigned to a variable later in the function, this
> time called "len" instead.
Only if source[i] is a directory. So again, we only do it when we need to.
> This code is _utterly_ unsatisfactory.
I disagree.
What is unsatisfactory to me is that I expected some performance
improvements from one of your earlier mails, and I see patches which
rearrange working code. This _can_ result in performance improvements, but
in these cases, no, it doesn't.
Having said that, I do not have anything against the patch being applied,
but if I see more of these i-would-like-the-cupboard-here-not-there
patches, I will just not review them any more.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] cleans up builtin-mv
2006-08-18 9:51 ` Johannes Schindelin
@ 2006-08-18 17:07 ` David Rientjes
2006-08-18 18:35 ` Josef Weidendorfer
2006-08-18 19:33 ` Junio C Hamano
1 sibling, 1 reply; 8+ messages in thread
From: David Rientjes @ 2006-08-18 17:07 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git
On Fri, 18 Aug 2006, Johannes Schindelin wrote:
> First, it _was_ a perl script, which you probably could find out by
> checking your old git.
>
> Second, it was rewritten to use Git.pm, and because _that_ did not work,
> git-mv was rewritten as a builtin.
>
It shouldn't have ever been a perl script, it should have been /bin/sh.
Any shell implementation of this would be significantly faster than the
current implementation.
> Not slower? I beg to differ, admitting it is only a few percent. But your
> statement is obviously uncorrect.
>
It _is_ slower since it takes considerably more time to do its job than
any corresponding shell script.
> Well, the patch has an improvement factor of almost none. I actually read
> the patch, and asked myself: why would anybody fix a non-problem?
>
Because it's _wrong_. Secondly, it's WRONG.
> > For example:
> > (length = strlen(source[i])) >= 0
>
> Yes. Taken out of context, this sure sounds silly.
>
> What you cleverly did not mention: It was inside a
>
> if (!bad &&
> (length = strlen(source[i])) >= 0 &&
> !strncmp(destination[i], source[i], length) &&
> (destination[i][length] == 0 || destination[i][length] == '/'))
>
> construct. So, we assign the "length" variable only if we have to. And the
> ">= 0" trick is a common one. I could have done
>
This is not a plausible justification _at all_. The idea that "length" is
assigned only on the condition that lstat(path, ...) failed does not
justify its comparison to >= 0 since this comparison is always true, nor
does it justify the assignment of
char *dir = source[i];
int len = strlen(dir);
later.
> > strlen(source[i]) was assigned to a variable later in the function, this
> > time called "len" instead.
>
> Only if source[i] is a directory. So again, we only do it when we need to.
>
You're completely ignoring the point, and more importantly, ignoring the
code path. Your implementation would have _always_ assigned
strlen(source[i]) to "length" if lstat returned 0. So at this point in
the code, "length" is always equal to strlen(source[i]). But your code
introduces another call to strlen, another variable, and another
assignment.
> Having said that, I do not have anything against the patch being applied,
> but if I see more of these i-would-like-the-cupboard-here-not-there
> patches, I will just not review them any more.
>
My patch is correct and improves your code. Any criticism for such a
patch has purely personal motives, and not technical motives, assigned to
it.
David
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] cleans up builtin-mv
2006-08-18 17:07 ` David Rientjes
@ 2006-08-18 18:35 ` Josef Weidendorfer
2006-08-18 19:01 ` David Rientjes
0 siblings, 1 reply; 8+ messages in thread
From: Josef Weidendorfer @ 2006-08-18 18:35 UTC (permalink / raw)
To: David Rientjes; +Cc: Johannes Schindelin, git
On Friday 18 August 2006 19:07, David Rientjes wrote:
> It shouldn't have ever been a perl script, it should have been /bin/sh.
> Any shell implementation of this would be significantly faster than the
> current implementation.
Can you explain your reasoning in more detail?
C compiles to native code. Bash itself first has to
parse the script. How on earth can this be faster than native code?
I simply do not understand this discussion about implementation language,
especially in this case where most of the work is probably done changing
git's index (the add's and rm's of tree entries). Of course it could have
been done in /bin/sh, but it wasn't (it started as git-rename.perl).
The portability argument speaks for C, thus I agree with Dscho.
> > if (!bad &&
> > (length = strlen(source[i])) >= 0 &&
> > !strncmp(destination[i], source[i], length) &&
> > (destination[i][length] == 0 || destination[i][length] == '/'))
> >
> > construct. So, we assign the "length" variable only if we have to. And the
> > ">= 0" trick is a common one. I could have done
> >
>
> This is not a plausible justification _at all_.
Hmm... I suppose Dscho's argument was that this "... >=0" is a standard way
to code an assignment inside of an expression.
Josef
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] cleans up builtin-mv
2006-08-18 18:35 ` Josef Weidendorfer
@ 2006-08-18 19:01 ` David Rientjes
0 siblings, 0 replies; 8+ messages in thread
From: David Rientjes @ 2006-08-18 19:01 UTC (permalink / raw)
To: Josef Weidendorfer; +Cc: Johannes Schindelin, git
On Fri, 18 Aug 2006, Josef Weidendorfer wrote:
> Can you explain your reasoning in more detail?
> C compiles to native code. Bash itself first has to
> parse the script. How on earth can this be faster than native code?
>
> I simply do not understand this discussion about implementation language,
> especially in this case where most of the work is probably done changing
> git's index (the add's and rm's of tree entries). Of course it could have
> been done in /bin/sh, but it wasn't (it started as git-rename.perl).
>
It's not faster than native code, it's faster than the current
implementation of builtin-mv. And when you're working with terabytes of
data like I am, I would prefer to use something fast.
> Hmm... I suppose Dscho's argument was that this "... >=0" is a standard way
> to code an assignment inside of an expression.
>
That argument is unjustified since the only advantage of putting it in an
expression is to not evaluate it if the lstat failed (and not fail by
means of ENOENT because copy_pathspec guarantees all results have strlen >
0). So "length" is set unnecessarily only if lstat fails which should
never happen if copy_pathspec does it's job with correct arguments. I'm
willing to sacrifice that if the _working_ case is faster (and
significantly faster) especially since this is an iteration and is
directly tied to the command's speed.
The comparison to 0 simply creates a cmpl $0, x(%ebp) that will always be
true and a jump to a label that never needed to exist.
Likewise, the additional declaration and initilization of a completely
redundant case call to strlen slows us down FOR EVERY ITERATION OF THE
MOVE:
movl %eax, x(%ebp)
movl (x*2)(%ebp), %eax
movl $-1, %ecx
movl %eax, (x*4)(%ebp)
movb %0, %al
cld
movl (x*4)(%ebp), %edi
repnz
scasb
movl %ecx, %eax
notl %eax
decl %eax
And then repeat that same call again because of its miscall later on when
it's already been assigned to a variable.
David
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] cleans up builtin-mv
2006-08-18 9:51 ` Johannes Schindelin
2006-08-18 17:07 ` David Rientjes
@ 2006-08-18 19:33 ` Junio C Hamano
2006-08-19 1:26 ` Johannes Schindelin
1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2006-08-18 19:33 UTC (permalink / raw)
To: Johannes Schindelin; +Cc: git, David Rientjes
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> What you cleverly did not mention: It was inside a
>
> if (!bad &&
> (length = strlen(source[i])) >= 0 &&
> !strncmp(destination[i], source[i], length) &&
> (destination[i][length] == 0 || destination[i][length] == '/'))
>
> construct. So, we assign the "length" variable only if we have to. And the
> ">= 0" trick is a common one. I could have done
>
> !strncmp(destination[i], source[i], (length = strlen(source[i])))
>
> but even I find that ugly.
I usually side with you but on this I can't.
There are 2 ways to generate branch instructions in C.
- compound statements specifically designed for expressing
control structure: if () ... else ..., for (), while (),
switch (), etc.
- expressions using conditional operators or logical operators
that short circuit: ... ? ... : ..., ... && ... || ...
The latter form may still be readable even with simple side
effects inside its terms, but "(l = strlen(s)) >= 0" is done
solely for the side effect, and its computed value does not have
anything to do with the logical operation &&.
THIS IS UGLY. And do not want to live in a world where this
ugliness is a "common one", as you put it.
And this avoiding one call to strlen(source[i]) is unnecessary
even as an optimization -- you end up calling strlen() on it
later in the code anyway, as David points out.
I think this part is far easier to read if you did it like this:
length = strlen(source[i]);
if (lstat(source[i], &st) < 0)
bad = "bad source";
else if (!strncmp(destination[i], source[i], length) &&
(destination[i][length] == 0 ||
destination[i][length] == '/'))
bad = "can not move directory into itself";
if (S_ISDIR(st.st_mode)) {
...
Note that the above is an absolute minimum rewrite. Other
things I noticed are:
- source[i] and destination[i] are referenced all the time; the
code would be easer to read if you had something like this
upfront:
/* Checking */
for (i = 0; i < count; i++) {
const char *bad = NULL;
const char *src = source[i];
const char *dst = destination[i];
int srclen = strlen(src);
int dstlen = strlen(dst);
You might end up not using dstlen in some cases, but I think
this would be far easier to read. Micro-optimizing by saying
"this is used only in this branch of this later if()
statement but in that case it is always set in that branch of
that earlier if() statement" makes unmaintainably confusing
code.
- I do not think you need "const char *dir, *dest_dir" inside
the "source is directory" branch; I would just use src and dst
consistently;
- You muck with dest_dir by calling add_slash(dest_dir) but
call prefix_path() with dst_len you computed earlier;
prefix_path() may know what to do, but is this intended?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] cleans up builtin-mv
2006-08-18 19:33 ` Junio C Hamano
@ 2006-08-19 1:26 ` Johannes Schindelin
0 siblings, 0 replies; 8+ messages in thread
From: Johannes Schindelin @ 2006-08-19 1:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Hi,
On Fri, 18 Aug 2006, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > What you cleverly did not mention: It was inside a
> >
> > if (!bad &&
> > (length = strlen(source[i])) >= 0 &&
> > !strncmp(destination[i], source[i], length) &&
> > (destination[i][length] == 0 || destination[i][length] == '/'))
> >
> > construct. So, we assign the "length" variable only if we have to. And the
> > ">= 0" trick is a common one. I could have done
> >
> > !strncmp(destination[i], source[i], (length = strlen(source[i])))
> >
> > but even I find that ugly.
>
> I usually side with you but on this I can't.
>
> There are 2 ways to generate branch instructions in C.
>
> - compound statements specifically designed for expressing
> control structure: if () ... else ..., for (), while (),
> switch (), etc.
>
> - expressions using conditional operators or logical operators
> that short circuit: ... ? ... : ..., ... && ... || ...
>
> The latter form may still be readable even with simple side
> effects inside its terms, but "(l = strlen(s)) >= 0" is done
> solely for the side effect, and its computed value does not have
> anything to do with the logical operation &&.
>
> THIS IS UGLY. And do not want to live in a world where this
> ugliness is a "common one", as you put it.
Okay. Probably the explanation is: I do not use git-mv myself, but only
got annoyed enough by a failing t7001 to rewrite it.
> And this avoiding one call to strlen(source[i]) is unnecessary
> even as an optimization -- you end up calling strlen() on it
> later in the code anyway, as David points out.
>
> I think this part is far easier to read if you did it like this:
>
> length = strlen(source[i]);
> if (lstat(source[i], &st) < 0)
> bad = "bad source";
> else if (!strncmp(destination[i], source[i], length) &&
> (destination[i][length] == 0 ||
> destination[i][length] == '/'))
> bad = "can not move directory into itself";
>
> if (S_ISDIR(st.st_mode)) {
> ...
>
> Note that the above is an absolute minimum rewrite. Other
> things I noticed are:
>
> - source[i] and destination[i] are referenced all the time; the
> code would be easer to read if you had something like this
> upfront:
>
> /* Checking */
> for (i = 0; i < count; i++) {
> const char *bad = NULL;
> const char *src = source[i];
> const char *dst = destination[i];
> int srclen = strlen(src);
> int dstlen = strlen(dst);
>
> You might end up not using dstlen in some cases, but I think
> this would be far easier to read. Micro-optimizing by saying
> "this is used only in this branch of this later if()
> statement but in that case it is always set in that branch of
> that earlier if() statement" makes unmaintainably confusing
> code.
>
> - I do not think you need "const char *dir, *dest_dir" inside
> the "source is directory" branch; I would just use src and dst
> consistently;
These changes would make the source more readable, yes.
> - You muck with dest_dir by calling add_slash(dest_dir) but
> call prefix_path() with dst_len you computed earlier;
> prefix_path() may know what to do, but is this intended?
That is probably a late night oversight.
If noone else is faster, I will do the requested changes tomorrow.
Ciao,
Dscho
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2006-08-19 1:26 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-18 5:59 [PATCH] cleans up builtin-mv David Rientjes
2006-08-18 6:12 ` David Rientjes
2006-08-18 9:51 ` Johannes Schindelin
2006-08-18 17:07 ` David Rientjes
2006-08-18 18:35 ` Josef Weidendorfer
2006-08-18 19:01 ` David Rientjes
2006-08-18 19:33 ` Junio C Hamano
2006-08-19 1:26 ` Johannes Schindelin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).