git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFH] plumber's puzzle
@ 2007-04-22 19:53 Junio C Hamano
  2007-04-22 20:30 ` Florian Weimer
  2007-04-22 20:43 ` Linus Torvalds
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2007-04-22 19:53 UTC (permalink / raw)
  To: git

I know people had trouble fetching from the official git.git
repository this morning; sorry about the gotcha.

Now it has all been hopefully sorted out, here is one thing that
I have been having trouble with lately, and can use some help.

Please try this, after getting the latest 'pu':

	$ git checkout origin/pu
	$ make all test-chmtime
	$ cd t
        $ sh ./t0021-filter.sh
        $ cd trash
        $ rm -f test.t
        $ ../../git-checkout-index -f -q -a
	$ ../../git-diff

This shows the single "diff --git" header line without anything,
to show that the path is not stat-clean, but the contents are
unchanged, which is what is expected.

A bit of background.  test.t has a funny "rot13" filter set for
both smudge and clean.  So test.t file has "a b c d e...", but
the actual blob records "n o p q r...".  Since we teached git
about the filter, git-checkout-index and git-diff do the right
thing.

Now, go back to the toplevel, comment out "close(1)" in
convert.c::filter_buffer() around line 236, and rebuild.

	$ cd ../..
        $ edit convert.c ;# comment out "close(1)"
        $ make
        $ cd t/trash

We get a quite _WRONG_ diff, like this:

        $ ../../git-diff
        diff --git a/test.t b/test.t
        --- a/test.t
        +++ b/test.t
        @@ -1,3 +1,4 @@
         n o p q r s t u v w x y z
         a b c d e f g h i j k l m
         $vqrag$
        +diff --git a/test.t b/test.t

and if you disable pager like this:

	$ PAGER= ../../git-diff

then you get the expected correct result.  Also you get this if
we disable the internal pager:

        $ ../../git-diff | cat
        diff --git a/test.t b/test.t
        --- a/test.t
        +++ b/test.t
        @@ -1,3 +1,4 @@
         n o p q r s t u v w x y z
         a b c d e f g h i j k l m
         $vqrag$
        +diff --git a/test.t b/test.t

I do not want to have a "fix" that I do not understand *why* it
fixes things in the tree; I couldn't figure out why that
close(1) in that codepath matters, although I was the one who
added that line.

Insights?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFH] plumber's puzzle
  2007-04-22 19:53 [RFH] plumber's puzzle Junio C Hamano
@ 2007-04-22 20:30 ` Florian Weimer
  2007-04-22 20:43 ` Linus Torvalds
  1 sibling, 0 replies; 8+ messages in thread
From: Florian Weimer @ 2007-04-22 20:30 UTC (permalink / raw)
  To: git

* Junio C. Hamano:

> Insights?

5520  write(1, "diff --git a/test.t b/test.t\n", 29) = -1 EBADF (Bad file descriptor)

It's not a fix, you just discard the output because you've closed
standard output.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFH] plumber's puzzle
  2007-04-22 19:53 [RFH] plumber's puzzle Junio C Hamano
  2007-04-22 20:30 ` Florian Weimer
@ 2007-04-22 20:43 ` Linus Torvalds
  2007-04-22 22:34   ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2007-04-22 20:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sun, 22 Apr 2007, Junio C Hamano wrote:
> 
> This shows the single "diff --git" header line without anything,
> to show that the path is not stat-clean, but the contents are
> unchanged, which is what is expected.

Actually, I think the "good" case is the broken one.

Do an "strace -f" on the two cases, and you'll see an EBADF in the case 
that you think is good: the missing output *is* there, it's just that you 
closed the file descriptor so you don't see it.

So if the output you want is with the close(1) (ie with the output 
discarded), then you have some other bug there.

		Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFH] plumber's puzzle
  2007-04-22 20:43 ` Linus Torvalds
@ 2007-04-22 22:34   ` Junio C Hamano
  2007-04-22 23:03     ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2007-04-22 22:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 22 Apr 2007, Junio C Hamano wrote:
>> 
>> This shows the single "diff --git" header line without anything,
>> to show that the path is not stat-clean, but the contents are
>> unchanged, which is what is expected.
>
> Actually, I think the "good" case is the broken one.
>
> Do an "strace -f" on the two cases, and you'll see an EBADF in the case 
> that you think is good: the missing output *is* there, it's just that you 
> closed the file descriptor so you don't see it.
>
> So if the output you want is with the close(1) (ie with the output 
> discarded), then you have some other bug there.

I think I figured it out.  The extra EBADF output comes from the
process that calls finish_command() in filter_buffer().

That is because the caller is diff_flush() which prepares its
output using stdio, and when apply_filter -> filter_bufer
callchain forks, the unflushed stdout hangs around in the
child.  Then we call exit() in apply_filter() to terminate the
child we spawned to do the filtering.  It flushes its copy of
stdio buffer.

Yuck.

I should be happy that I figured out what is going on, but I am
not very happy with this patch.

diff --git a/convert.c b/convert.c
index 845825b..35bb8cf 100644
--- a/convert.c
+++ b/convert.c
@@ -233,7 +233,6 @@ static int filter_buffer(const char *path, const char *src,
 		return 1;
 	}
 	close(pipe_feed[0]);
-	close(1);
 
 	write_err = (write_in_full(pipe_feed[1], src, size) < 0);
 	if (close(pipe_feed[1]))
@@ -273,6 +272,7 @@ static char *apply_filter(const char *path, const char *src,
 		return NULL;
 	}
 
+	fflush(stdout);
 	child_process.pid = fork();
 	if (child_process.pid < 0) {
 		error("cannot fork to run external filter %s", cmd);

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [RFH] plumber's puzzle
  2007-04-22 22:34   ` Junio C Hamano
@ 2007-04-22 23:03     ` Linus Torvalds
  2007-04-22 23:52       ` Junio C Hamano
  2007-04-24  0:51       ` H. Peter Anvin
  0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2007-04-22 23:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sun, 22 Apr 2007, Junio C Hamano wrote:
> 
> I should be happy that I figured out what is going on, but I am
> not very happy with this patch.

That actually looks like the right patch.

The "fflush() before fork()" thing is a real issue, and a real bug. Stdio 
is buffered, and yes, fork() will duplicate the buffer if not flushed.

Of course, I'm not 100% sure that is the right _place_ for the fflush() 
call. I wonder if we should just do the fflush() closer to the place that 
generates the data. As it is, we may have other things like that lurking.

Of course, delaying the fflush as long as possible is likely good for 
performance, so doing it just before the fork() (even if it may be ugly 
and somewhat unexpected at that point to have to do it) may just be the 
right thing regardless...

		Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFH] plumber's puzzle
  2007-04-22 23:03     ` Linus Torvalds
@ 2007-04-22 23:52       ` Junio C Hamano
  2007-04-24  0:51       ` H. Peter Anvin
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2007-04-22 23:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, 22 Apr 2007, Junio C Hamano wrote:
>> 
>> I should be happy that I figured out what is going on, but I am
>> not very happy with this patch.
>
> That actually looks like the right patch.
>
> The "fflush() before fork()" thing is a real issue, and a real bug. Stdio 
> is buffered, and yes, fork() will duplicate the buffer if not flushed.
>
> Of course, I'm not 100% sure that is the right _place_ for the fflush() 
> call. I wonder if we should just do the fflush() closer to the place that 
> generates the data. As it is, we may have other things like that lurking.
>
> Of course, delaying the fflush as long as possible is likely good for 
> performance, so doing it just before the fork() (even if it may be ugly 
> and somewhat unexpected at that point to have to do it) may just be the 
> right thing regardless...

Another possibility I considered is to call low-level _exit so
that we bypass not just stdio flushing but also atexit().  But I
think "fflush() before fork()" is cleaner.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFH] plumber's puzzle
  2007-04-22 23:03     ` Linus Torvalds
  2007-04-22 23:52       ` Junio C Hamano
@ 2007-04-24  0:51       ` H. Peter Anvin
  2007-04-24  8:36         ` Johannes Schindelin
  1 sibling, 1 reply; 8+ messages in thread
From: H. Peter Anvin @ 2007-04-24  0:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Linus Torvalds wrote:
> 
> On Sun, 22 Apr 2007, Junio C Hamano wrote:
>> I should be happy that I figured out what is going on, but I am
>> not very happy with this patch.
> 
> That actually looks like the right patch.
> 
> The "fflush() before fork()" thing is a real issue, and a real bug. Stdio 
> is buffered, and yes, fork() will duplicate the buffer if not flushed.
> 
> Of course, I'm not 100% sure that is the right _place_ for the fflush() 
> call. I wonder if we should just do the fflush() closer to the place that 
> generates the data. As it is, we may have other things like that lurking.
> 
> Of course, delaying the fflush as long as possible is likely good for 
> performance, so doing it just before the fork() (even if it may be ugly 
> and somewhat unexpected at that point to have to do it) may just be the 
> right thing regardless...
> 

It might be worthwhile to have a wrapper function for fork() which adds 
fflush(NULL); before forking?

	-hpa

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFH] plumber's puzzle
  2007-04-24  0:51       ` H. Peter Anvin
@ 2007-04-24  8:36         ` Johannes Schindelin
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Schindelin @ 2007-04-24  8:36 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linus Torvalds, Junio C Hamano, git

Hi,

On Mon, 23 Apr 2007, H. Peter Anvin wrote:

> It might be worthwhile to have a wrapper function for fork() which adds 
> fflush(NULL); before forking?

It might be worthwhile to have wrapper functions which do much more than 
that. If only to help portability. FWIW, the MinGW port has some nice code 
which I'd like to see back in git.git.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-04-24  8:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-22 19:53 [RFH] plumber's puzzle Junio C Hamano
2007-04-22 20:30 ` Florian Weimer
2007-04-22 20:43 ` Linus Torvalds
2007-04-22 22:34   ` Junio C Hamano
2007-04-22 23:03     ` Linus Torvalds
2007-04-22 23:52       ` Junio C Hamano
2007-04-24  0:51       ` H. Peter Anvin
2007-04-24  8:36         ` Johannes Schindelin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).