[PATCH] blame: make sure that the last line ends in an LF

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] blame: make sure that the last line ends in an LF
@ 2009-10-20  3:06 Sverre Rabbelier
  2009-10-20  7:00 ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Sverre Rabbelier @ 2009-10-20  3:06 UTC (permalink / raw)
  To: Git List, Junio C Hamano; +Cc: Johannes Schindelin, Sverre Rabbelier

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This is convenient when parsing multiple the blame of multiple files,
for example:

    git ls-files -z --exclude-standard -- "*.[ch]" |
    xargs --null -n 1 git blame -p > output

and then analyzing the 'output' file using a seperate script.

Currently the parsing is difficult when not all files have a newline
at EOF, this patch ensures that even such files have a newline at the
end of the blame output.

Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
CC: Johannes Schindelin <johannes.schindelin@gmx.de>
---

    Patch by Dscho, commit message by me. Apologies to Dscho for
    taking so long to send it :).

 builtin-blame.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 7512773..dd16b22 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1604,6 +1604,9 @@ static void emit_porcelain(struct scoreboard *sb, struct blame_entry *ent)
 		} while (ch != '\n' &&
 			 cp < sb->final_buf + sb->final_buf_size);
 	}
+
+	if (sb->final_buf_size && cp[-1] != '\n')
+		putchar('\n');
 }
 
 static void emit_other(struct scoreboard *sb, struct blame_entry *ent, int opt)
@@ -1667,6 +1670,9 @@ static void emit_other(struct scoreboard *sb, struct blame_entry *ent, int opt)
 		} while (ch != '\n' &&
 			 cp < sb->final_buf + sb->final_buf_size);
 	}
+
+	if (sb->final_buf_size && cp[-1] != '\n')
+		putchar('\n');
 }
 
 static void output(struct scoreboard *sb, int option)
-- 
1.6.5.1.123.ge01f7

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] blame: make sure that the last line ends in an LF
  2009-10-20  3:06 [PATCH] blame: make sure that the last line ends in an LF Sverre Rabbelier
@ 2009-10-20  7:00 ` Junio C Hamano
  2009-10-20 13:15   ` Sverre Rabbelier
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2009-10-20  7:00 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: Git List, Johannes Schindelin

Sverre Rabbelier <srabbelier@gmail.com> writes:

> Currently the parsing is difficult when not all files have a newline
> at EOF, this patch ensures that even such files have a newline at the
> end of the blame output.

Thanks.  This clearly shows that blame was somewhat sloppily written in
that it never considered to be fed anything but well formed text files.

My knee-jerk reaction to the issue is that there are two reasonable
approaches to the problem:

 (1) Admit that the code is not prepared to take anything but well formed
     text files.  This will lead to adding the necessary LF after reading
     a blob and if it does not end with LF.  After all, I do not trust the
     code (iow, "me") if it is not prepared to take a blob with incomplete
     line to handle the internal comparison between blobs with incomplete
     lines.

 (2) Do the right thing, by coming up with a notation to show that the
     final line is incomplete, perhaps similar to "\No newline ..."
     notation used by "diff".

To put the last sentence of (1) differently, does the code assign blame
correctly around the last line of the original blob?  What if an older
version ended with an incomplete line and a later version changed the line
(without adding the terminating LF)?  What if a later version changed the
line and added the terminating LF?  What if a later version only added the
terminating LF and did nothing else?  Are these three cases handled
correctly?

After thinking issues like the above, I read the patch and I see it does
not take neither approach.  That makes me feel nervous.

By tweaking only the output routine you _might_ be getting correct output,
but even then it looks to me like the end result is working correctly not
by design but by accident.  IOW, the patch may be better than nothing, but
somehow it just feels like it is papering over the real issue than being a
proper fix.

Or am I worrying too much?

> Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
> CC: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>
>     Patch by Dscho, commit message by me. Apologies to Dscho for
>     taking so long to send it :).
>
>  builtin-blame.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/builtin-blame.c b/builtin-blame.c
> index 7512773..dd16b22 100644
> --- a/builtin-blame.c
> +++ b/builtin-blame.c
> @@ -1604,6 +1604,9 @@ static void emit_porcelain(struct scoreboard *sb, struct blame_entry *ent)
>  		} while (ch != '\n' &&
>  			 cp < sb->final_buf + sb->final_buf_size);
>  	}
> +
> +	if (sb->final_buf_size && cp[-1] != '\n')
> +		putchar('\n');
>  }
>  
>  static void emit_other(struct scoreboard *sb, struct blame_entry *ent, int opt)
> @@ -1667,6 +1670,9 @@ static void emit_other(struct scoreboard *sb, struct blame_entry *ent, int opt)
>  		} while (ch != '\n' &&
>  			 cp < sb->final_buf + sb->final_buf_size);
>  	}
> +
> +	if (sb->final_buf_size && cp[-1] != '\n')
> +		putchar('\n');
>  }
>  
>  static void output(struct scoreboard *sb, int option)
> -- 
> 1.6.5.1.123.ge01f7

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blame: make sure that the last line ends in an LF
  2009-10-20  7:00 ` Junio C Hamano
@ 2009-10-20 13:15   ` Sverre Rabbelier
  2009-10-20 16:55     ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Sverre Rabbelier @ 2009-10-20 13:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Johannes Schindelin

Heya,

On Tue, Oct 20, 2009 at 02:00, Junio C Hamano <gitster@pobox.com> wrote:
>  (2) Do the right thing, by coming up with a notation to show that the
>     final line is incomplete, perhaps similar to "\No newline ..."
>     notation used by "diff".

What about 'git blame -p', can we just go about changing the format like that?

For purpose of the discussion below let's assume we squash in the following:

-- <8 --

diff --git a/builtin-blame.c b/builtin-blame.c
index dd16b22..cf492a0 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1606,7 +1606,7 @@ static void emit_porcelain(struct scoreboard
*sb, struct blame_entry *ent)
 	}

 	if (sb->final_buf_size && cp[-1] != '\n')
-		putchar('\n');
+		printf("\n\\ No newline at end of file\n");
 }

 static void emit_other(struct scoreboard *sb, struct blame_entry *ent, int opt)
@@ -1672,7 +1672,7 @@ static void emit_other(struct scoreboard *sb,
struct blame_entry *ent, int opt)
 	}

 	if (sb->final_buf_size && cp[-1] != '\n')
-		putchar('\n');
+		printf("\n\\ No newline at end of file\n");
 }

 static void output(struct scoreboard *sb, int option)
-- 

-- <8 --

> Does the code assign blame
> correctly around the last line of the original blob?

Yes, it does, when there is no trailing newline an extra "\ No newline
at end of file" is printed, but the last line is still attributed
correctly.

> What if an older
> version ended with an incomplete line and a later version changed the line
> (without adding the terminating LF)?

Nothing changes, the blame on that last line is attributed correctly
and the "\ No newline at end of file" is printed.

> What if a later version changed the
> line and added the terminating LF?

The trailing "\ No newline at end of file" is no longer printed and
the last line is correctly attributed to the commit that added the
trailing LF.

> What if a later version only added the
> terminating LF and did nothing else?  Are these three cases handled
> correctly?

Same as above.

> After thinking issues like the above, I read the patch and I see it does
> not take neither approach.  That makes me feel nervous.

Reading your reply I see that if you care about the presence (or
absence) of a trailing newline the current patch would be problematic,
as it makes it impossible to see in the blame output whether there was
a trailing newline or not.

> By tweaking only the output routine you _might_ be getting correct output,
> but even then it looks to me like the end result is working correctly not
> by design but by accident.  IOW, the patch may be better than nothing, but
> somehow it just feels like it is papering over the real issue than being a
> proper fix.
>
> Or am I worrying too much?

No, I think your concerns are valid, we should go with (2) and DTRT.
Does the updated patch address your concerns? If so I can send a new
version.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] blame: make sure that the last line ends in an LF
  2009-10-20 13:15   ` Sverre Rabbelier
@ 2009-10-20 16:55     ` Junio C Hamano
  2009-10-20 19:55       ` Johannes Schindelin
  2009-10-20 20:04       ` Sverre Rabbelier
  0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2009-10-20 16:55 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: Git List, Johannes Schindelin

Sverre Rabbelier <srabbelier@gmail.com> writes:

>> Does the code assign blame
>> correctly around the last line of the original blob?
> 
> Yes, it does,...

That is kind of surprising ;-) as I do remember that I never thought about
this issue of dealing with the incomplete lines while writing the blame
algorithm.  I actually didn't even think about "well this will not work
because I am ignoring the incomplete lines".

>> Or am I worrying too much?
>
> No, I think your concerns are valid, we should go with (2) and DTRT.
> Does the updated patch address your concerns? If so I can send a new
> version.

Assuming the internal blame algorithm correctly works with such an input,
I'd be happier with an approach to allow users to tell the difference.
The --porcelain output was designed to be extensible, and it might make
sense to show the "this line is incomplete" as a separate bit, though.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blame: make sure that the last line ends in an LF
  2009-10-20 16:55     ` Junio C Hamano
@ 2009-10-20 19:55       ` Johannes Schindelin
  2009-10-20 22:00         ` Junio C Hamano
  2009-10-20 20:04       ` Sverre Rabbelier
  1 sibling, 1 reply; 9+ messages in thread
From: Johannes Schindelin @ 2009-10-20 19:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sverre Rabbelier, Git List

Hi,

On Tue, 20 Oct 2009, Junio C Hamano wrote:

> Sverre Rabbelier <srabbelier@gmail.com> writes:
> 
> >> Or am I worrying too much?
> >
> > No, I think your concerns are valid, we should go with (2) and DTRT. 
> > Does the updated patch address your concerns? If so I can send a new 
> > version.
> 
> Assuming the internal blame algorithm correctly works with such an 
> input, I'd be happier with an approach to allow users to tell the 
> difference. The --porcelain output was designed to be extensible, and it 
> might make sense to show the "this line is incomplete" as a separate 
> bit, though.

Sorry, you lost me.  If, say, line 614 is the last line that does not end 
in a new line, if I ask for it to be blamed, I want to know who is 
responsible for the current form of line 614.

Not whether the line ends in a new line or not.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blame: make sure that the last line ends in an LF
  2009-10-20 16:55     ` Junio C Hamano
  2009-10-20 19:55       ` Johannes Schindelin
@ 2009-10-20 20:04       ` Sverre Rabbelier
  2009-10-20 20:28         ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Sverre Rabbelier @ 2009-10-20 20:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Johannes Schindelin

Heya,

On Tue, Oct 20, 2009 at 11:55, Junio C Hamano <gitster@pobox.com> wrote:
> That is kind of surprising ;-) as I do remember that I never thought about
> this issue of dealing with the incomplete lines while writing the blame
> algorithm.  I actually didn't even think about "well this will not work
> because I am ignoring the incomplete lines".

I used the following repo for testing:

$ git init
Initialized empty Git repository in /home/sverre/code/test/.git/
$ echo "first line" > test
$ git add test
$ git commit -m "initial"
[master (root-commit) d573d06] initial
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 test
$ echo -n "second line, no newline" >> test
$ git add test
$ git commit -m "second"
[master 76ad2f9] second
 1 files changed, 1 insertions(+), 0 deletions(-)

Regular output looks good:

$ git blame test
^d573d06 (Sverre Rabbelier 2009-10-20 12:30:56 -0500 1) first line
76ad2f90 (Sverre Rabbelier 2009-10-20 12:31:57 -0500 2) second line, no newline

Porcelain output looks good too:

$ git blame -p test
d573d06f0dd50148ba8e59bf8f1ef8fa7ee9fc88 1 1 1
author Sverre Rabbelier
author-mail <srabbelier@gmail.com>
author-time 1256059856
author-tz -0500
committer Sverre Rabbelier
committer-mail <srabbelier@gmail.com>
committer-time 1256059856
committer-tz -0500
summary initial
boundary
filename test
        first line
76ad2f90bde689a65715e37afd37d45942c74954 2 2 1
author Sverre Rabbelier
author-mail <srabbelier@gmail.com>
author-time 1256059917
author-tz -0500
committer Sverre Rabbelier
committer-mail <srabbelier@gmail.com>
committer-time 1256059917
committer-tz -0500
summary second
previous d573d06f0dd50148ba8e59bf8f1ef8fa7ee9fc88 test
filename test
        second line, no newline

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blame: make sure that the last line ends in an LF
  2009-10-20 20:04       ` Sverre Rabbelier
@ 2009-10-20 20:28         ` Junio C Hamano
  2009-10-22  1:33           ` Sverre Rabbelier
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2009-10-20 20:28 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: Junio C Hamano, Git List, Johannes Schindelin

Sverre Rabbelier <srabbelier@gmail.com> writes:

> On Tue, Oct 20, 2009 at 11:55, Junio C Hamano <gitster@pobox.com> wrote:
>> That is kind of surprising ;-) as I do remember that I never thought about
>> this issue of dealing with the incomplete lines while writing the blame
>> algorithm.  I actually didn't even think about "well this will not work
>> because I am ignoring the incomplete lines".
>
> I used the following repo for testing:
>
> $ git init
> Initialized empty Git repository in /home/sverre/code/test/.git/
> $ echo "first line" > test
> $ git add test
> $ git commit -m "initial"
> [master (root-commit) d573d06] initial
>  1 files changed, 1 insertions(+), 0 deletions(-)
>  create mode 100644 test
> $ echo -n "second line, no newline" >> test
> $ git add test
> $ git commit -m "second"
> [master 76ad2f9] second
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> Regular output looks good:
>
> $ git blame test
> ^d573d06 (Sverre Rabbelier 2009-10-20 12:30:56 -0500 1) first line
> 76ad2f90 (Sverre Rabbelier 2009-10-20 12:31:57 -0500 2) second line, no newline
>
> Porcelain output looks good too:
>
> $ git blame -p test
> d573d06f0dd50148ba8e59bf8f1ef8fa7ee9fc88 1 1 1
> author Sverre Rabbelier
> author-mail <srabbelier@gmail.com>
> author-time 1256059856
> author-tz -0500
> committer Sverre Rabbelier
> committer-mail <srabbelier@gmail.com>
> committer-time 1256059856
> committer-tz -0500
> summary initial
> boundary
> filename test
>         first line
> 76ad2f90bde689a65715e37afd37d45942c74954 2 2 1
> author Sverre Rabbelier
> author-mail <srabbelier@gmail.com>
> author-time 1256059917
> author-tz -0500
> committer Sverre Rabbelier
> committer-mail <srabbelier@gmail.com>
> committer-time 1256059917
> committer-tz -0500
> summary second
> previous d573d06f0dd50148ba8e59bf8f1ef8fa7ee9fc88 test
> filename test
>         second line, no newline


Thanks.

For both styles of output, adding an extra LF after "no newline" would be
necessary to make the output legible (for human) and parsable (for
scripts).

In addition, it would help Porcelains to re-construct the final text if
you added a boolean "incomplete-line" (put it on its own line, immediately
after "filename test" line).  Then they will know that LF after "second
line, no newline" was not there in the original and was added for
parsability.

I am not sure what we want to do for non-porcelain output (other than
adding the extra LF at the end).  Assuming that they are meant to be read
by humans (and casual scripts that do not bother reading --porcelain
format), it might be best not to add any extra marking.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blame: make sure that the last line ends in an LF
  2009-10-20 19:55       ` Johannes Schindelin
@ 2009-10-20 22:00         ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2009-10-20 22:00 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Sverre Rabbelier, Git List

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Tue, 20 Oct 2009, Junio C Hamano wrote:
>
>> Sverre Rabbelier <srabbelier@gmail.com> writes:
>> 
>> >> Or am I worrying too much?
>> >
>> > No, I think your concerns are valid, we should go with (2) and DTRT. 
>> > Does the updated patch address your concerns? If so I can send a new 
>> > version.
>> 
>> Assuming the internal blame algorithm correctly works with such an 
>> input, I'd be happier with an approach to allow users to tell the 
>> difference. The --porcelain output was designed to be extensible, and it 
>> might make sense to show the "this line is incomplete" as a separate 
>> bit, though.
>
> Sorry, you lost me.  If, say, line 614 is the last line that does not end 
> in a new line, if I ask for it to be blamed, I want to know who is 
> responsible for the current form of line 614.
>
> Not whether the line ends in a new line or not.

Yeah, I know.

I was primarily worried about making the output format into something
Porcelains (that read from --porcelain format) cannot reconstruct the
original text from.

See Message-ID: <7viqe9n72w.fsf@alter.siamese.dyndns.org> for a revised
suggestion.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] blame: make sure that the last line ends in an LF
  2009-10-20 20:28         ` Junio C Hamano
@ 2009-10-22  1:33           ` Sverre Rabbelier
  0 siblings, 0 replies; 9+ messages in thread
From: Sverre Rabbelier @ 2009-10-22  1:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Johannes Schindelin

Heya,

On Tue, Oct 20, 2009 at 15:28, Junio C Hamano <gitster@pobox.com> wrote:
> For both styles of output, adding an extra LF after "no newline" would be
> necessary to make the output legible (for human) and parsable (for
> scripts).

You mean like this right?

```
+               printf("\n\\ No newline at end of file\n");
```

Or does it need _another_ newline, like this

```
+               printf("\n\\ No newline at end of file\n\n");
```

> In addition, it would help Porcelains to re-construct the final text if
> you added a boolean "incomplete-line" (put it on its own line, immediately
> after "filename test" line).  Then they will know that LF after "second
> line, no newline" was not there in the original and was added for
> parsability.

What do we do in the case that the last few lines are attributed to
the same commit? Do we just signify 'incomplete line' to mean that the
last one of those is incomplete?

> I am not sure what we want to do for non-porcelain output (other than
> adding the extra LF at the end).  Assuming that they are meant to be read
> by humans (and casual scripts that do not bother reading --porcelain
> format), it might be best not to add any extra marking.

Fine by me :).

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2009-10-22  1:34 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-20  3:06 [PATCH] blame: make sure that the last line ends in an LF Sverre Rabbelier
2009-10-20  7:00 ` Junio C Hamano
2009-10-20 13:15   ` Sverre Rabbelier
2009-10-20 16:55     ` Junio C Hamano
2009-10-20 19:55       ` Johannes Schindelin
2009-10-20 22:00         ` Junio C Hamano
2009-10-20 20:04       ` Sverre Rabbelier
2009-10-20 20:28         ` Junio C Hamano
2009-10-22  1:33           ` Sverre Rabbelier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).