* entry terminator/separator behavior in show_log()
@ 2008-04-28 4:55 Adam Simpkins
2008-04-28 16:10 ` Junio C Hamano
0 siblings, 1 reply; 3+ messages in thread
From: Adam Simpkins @ 2008-04-28 4:55 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano
This is a write-up describing the current entry terminator/separator
behavior of show_log(). I wrote it while updating my "log --graph"
code to work with Junio's new use_terminator flag.
I was initially hoping to find places where we could simplify the
behavior. However, after examining all of the cases, I'm not sure if
there is much we can do without breaking backwards compatibility too
badly. Perhaps we should just check this description into
Documentation/technical, for the next person who needs to work with
this part of the code.
Towards the end of the document, I've also listed how the new --graph
option will interact with each of the cases. I'd love to have
feedback to see if my current plans sound reasonable.
Variables affecting the behavior
--------------------------------
- The sep argument to show_log()
As far as I can tell, this is always "". show_log() is sometimes
invoked with a sep argument of diffopt->msg_sep. However,
"git grep -w msg_sep" shows that msg_sep is never set to anything but
the empty string.
Perhaps we could just remove this argument, to reduce complexity and
confusion?
- The line_termination member of struct diff_options
This is normally '\n', but is set to '\0' if the -z option is
supplied.
- The use_terminator flag in struct rev_info
This flag is enabled by --pretty=oneline and --pretty=format. It is
off for all other formats.
This flag has no effect if opt->verbose_header is not set. If
opt->verbose_header is set, the behavior is as follows:
When use_terminator is set, a newline is added to the end of each log
entry. diffopt.line_termination is not printed in between entries.
When use_terminator is not set, diffopt.line_termination is printed
before all entries except the very first one.
- Whether or not the log entry ends in a terminating newline
The log entry for CMIT_FMT_ONELINE never ends in a newline.
The log entry for CMIT_FMT_USERFORMAT only ends in a newline if the
user explicitly places a newline at the end of the format.
For all other formats, the entry always ends in a terminating newline.
- Whether or not the log message buffer is empty
If use_terminator is set and the message buffer returned by
pretty_print_commit() is empty, no newline is printed between entries.
This can happen, for example, if CMIT_FMT_USERFORMAT is used with an
empty format string. Even though pretty_print_commit() returns an
empty buffer, the log entry itself does not necessarily have to be
empty. For example, the log size will be printed if if the --log-size
argument is used.
I'm not sure if there any are other code paths that can also cause
this to happen.
Current Behavior
----------------
(1) For all formats except CMIT_FMT_ONELINE and CMIT_FMT_USERFORMAT:
diffopt->line_termination is printed before each log entry but the
first. This results in diffopt->line_termination in between each
entry, and not after the final entry.
(1.1)
When diffopt->line_termination is '\0', this prints a NUL
character between log entries.
(1.2)
When diffopt->line_termination is '\n', this essentially adds an
extra blank line of padding between log entries, since each log
entry already ends in a newline.
(2) For CMIT_FMT_ONELINE and CMIT_FMT_USERFORMAT:
(2.1)
If use_terminator is set (--pretty=oneline and --pretty=tformat):
(2.1.1)
If the message buffer returned by pretty_print_commit() is
non-empty:
A newline is printed at the end of each entry. Since the log
entry does not end in a newline already, the newline ends the
last line of the log entry, and there is no extra blank line
between entries.
With --pretty=tformat, if the user's format explicitly ends in a
newline, this will create a blank line at the end of each entry.
(This looks like a padding line, very similar to case 1 above.
However, it shows up even after the very last entry.)
(2.1.2)
If the message buffer returned by pretty_print_commit() is
empty:
A newline is not printed at the end of each entry. Therefore
there are no blank lines between log entries. If the log
entries are completely empty, nothing at all is printed.
(2.2)
If use_terminator is not set (--pretty=format):
diffopt->line_termination is printed before each log entry but the
first. This results in diffopt->line_termination in between each
entry, and not after the final entry.
(2.2.1)
When diffopt->line_termination is '\0', this is reasonable
behavior, and makes sense for consumption by other scripts.
(2.2.2)
When diffopt->line_termination is '\n':
(2.2.2.1)
If the user's format is non-empty and does not end in an
explicit newline, log entries do not end in a newline.
However, the line_termination character printed before the
next entry appears to terminate the previous entry, so it
looks human-readable. There are no blank lines between
entries.
However, no newline is printed after the very last entry.
This seems like a bug to me, since I would expect that the
output is normally being read by a human instead of another
script when diffopt->line_termination is '\n'.
Fortunately, the log output is normally piped through less,
and less hides the fact that the output does not end in a
newline. It prints a terminating newline for us if the git
output doesn't end in a newline.
(2.2.2.2)
If the user's format does end in an explicit newline, the
diffopt->line_termination character between entries creates a
blank line between entries. The output looks exactly like
case 1: a blank line show up between entries, but not after
the very last entry.
(2.2.2.3)
If the user's format is the empty string, the log entry will
either be completely empty, or will already end in a newline.
The diffopt->line_termination characters printed between
entries appear as blank padding lines between entries. If the
log entries are completely empty the resulting output
consists solely of padding lines.
(3) opt->verbose_header is unset
Only a single line of output is printed, terminated with the
diffopt->line_termination character.
Note that this differs from case 1! The line_termination character
is printed even after the very last entry.
Comments on the current behavior
--------------------------------
The current behavior is fairly complicated. There are quite a few
corner cases, especially around uncommon behavior (for example,
"--pretty=format:" with an empty format string). It would be nice to
eliminate some of the complexity. However, doing so would change the
behavior and potentially break existing user scripts relying on this
behavior.
- Cases 1.1 and 1.2 make perfect sense to me.
- Case 2.1 makes perfect sense to me.
- Case 2.1.2 seems reasonable, but not really necessary. It might be
nicer to simply always print a terminating newline even if the message
buffer is empty. This would appear as an extra blank line after each
entry.
The --pretty=tformat option is quite new, so I doubt that many people
would complain if we changed this behavior.
- Case 2.2.1 makes perfect sense. There are probably lots of scripts
out there relying on this behavior.
- I don't really like the behavior for any of the cases under 2.2.2.
I especially don't like the fact that the output does not end in a
terminating newline for case 2.2.2.1.
If the output is for human consumption, the user probably really wants
case 2.1 (i.e., --pretty=tformat instead of --pretty=format.)
Hopefully most scripts that parse the log output use the -z option,
but there are probably also many that don't. Changing this behavior
would break these scripts. Teaching users to use --pretty=tformat
instead of --pretty=format will probably cause less pain than
breaking backwards compatibility.
Desired behavior with the new --graph option
--------------------------------------------
- Case 1.1
(All formats but CMIT_FMT_ONELINE and CMIT_FMT_USERFORMAT,
line_termination is '\0')
The graph output should be printed before each line in the log entry.
Graph output should not be printed between the last newline in the log
entry and the line_termination character ('\0') for the next log entry.
- Case 1.2
(All formats but CMIT_FMT_ONELINE and CMIT_FMT_USERFORMAT,
line_termination is '\n')
The graph output should be printed before each line in the log entry.
For each entry but the first, graph padding should also be printed
*before* the line_termination character. Otherwise the blank padding
line between entries will be completely blank, which would leave gaps
in the graph output.
- Cases 2.1.1 and 2.1.2
(--pretty=oneline and --pretty=tformat)
The graph output should be printed before each line in the log entry.
There is no additional separator between entries, so there is nothing
else to worry about here. All log entries always end in a newline,
which also helps keep things simple.
- Case 2.2.1
(--pretty=format and line_termination is '\0')
The graph output should be printed before each line in the log entry.
Graph output should not be printed between the last newline in the log
entry and the line_termination character ('\0') for the next log entry.
This behavior is identical to that for case 1.1.
- Case 2.2.2.1
(--pretty=format, line_termination is '\n', and with a non-empty
format that doesn't end with a newline)
The graph output should be printed before each line in the log entry.
Graph output should not be printed before the line_termination
character, since this character is merely terminating the last line of
output from the previous entry.
- Cases 2.2.2.2 and 2.2.2.3
(--pretty=format, line_termination is '\n', and with a format that
ends with an explicit newline, or with an empty format)
The graph output should be printed before each line in the log entry.
Graph output *should* be before the line_termination character,
since this character is creating a new blank padding line.
- Case 3
(opt->verbose_header is unset)
Each log entry consists of a single line, terminated with the
line_termination character. The graph output should be printed before
each line.
However, this case currently can't happen via the "git log" command,
since this command always sets the verbose_header flag.
With the current code, there's not really enough information for
the graphing code in show_log() to distinguish between cases cases
2.2.2.1 and 2.2.2.{2,3}.
In order to do so, I plan to add a new flag to struct rev_info to
indicate whether or not the previous log entry ended in a newline. The
next invocation of show_log() can check this flag to see if it should
print graph padding before the line_termination character or not.
Thanks for reading, if you made it all the way down here!
--
Adam Simpkins
adam@adamsimpkins.net
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: entry terminator/separator behavior in show_log()
2008-04-28 4:55 entry terminator/separator behavior in show_log() Adam Simpkins
@ 2008-04-28 16:10 ` Junio C Hamano
2008-04-29 8:58 ` Adam Simpkins
0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2008-04-28 16:10 UTC (permalink / raw)
To: Adam Simpkins; +Cc: git
Adam Simpkins <adam@adamsimpkins.net> writes:
> Variables affecting the behavior
> --------------------------------
>
> - The sep argument to show_log()
>
> As far as I can tell, this is always "". show_log() is sometimes
> invoked with a sep argument of diffopt->msg_sep. However,
> "git grep -w msg_sep" shows that msg_sep is never set to anything but
> the empty string.
>
> Perhaps we could just remove this argument, to reduce complexity and
> confusion?
Yes. 39bc9a6 (Add msg_sep to diff_options, 2006-06-25) and f005df3
(diff-tree: Use ---\n as a message separator, 2006-06-27) introduced the
variable and used it as the mechanism to add "---\n" at the end of the
message, which was a mistake, because it treated "---\n" as a part of the
message even though it is not --- it is a separator between the message
and the diff part if and only if the diff exists.
Later, 3969cf7 (Fix some more diff options changes., 2006-06-27) corrected
this and made generation of --- as part of the log-tree, where "the diff
for the commit" is generated. The variable is not used anymore and we can
safely remove it.
> - Whether or not the log entry ends in a terminating newline
>
> The log entry for CMIT_FMT_ONELINE never ends in a newline.
> The log entry for CMIT_FMT_USERFORMAT only ends in a newline if the
> user explicitly places a newline at the end of the format.
>
> For all other formats, the entry always ends in a terminating newline.
Hmm, interesting. This is true even when the original commit log message
ends with an incomplete line.
> - Whether or not the log message buffer is empty
>
> If use_terminator is set and the message buffer returned by
> pretty_print_commit() is empty, no newline is printed between entries.
>
> This can happen, for example, if CMIT_FMT_USERFORMAT is used with an
> empty format string. Even though pretty_print_commit() returns an
> empty buffer, the log entry itself does not necessarily have to be
> empty. For example, the log size will be printed if if the --log-size
> argument is used.
This could just be a bug. I do not think of a good reason to treat an
empty entry any differently from a single line entry that happens to have
zero characters on it.
> (2) For CMIT_FMT_ONELINE and CMIT_FMT_USERFORMAT:
> (2.2)
> If use_terminator is not set (--pretty=format):
>
> diffopt->line_termination is printed before each log entry but the
> first. This results in diffopt->line_termination in between each
> entry, and not after the final entry.
IOW, "use separators between entries."
> - Case 2.1.2 seems reasonable, but not really necessary. It might be
> nicer to simply always print a terminating newline even if the message
> buffer is empty. This would appear as an extra blank line after each
> entry.
>
> The --pretty=tformat option is quite new, so I doubt that many people
> would complain if we changed this behavior.
Concurred.
> - Case 2.2.1 makes perfect sense. There are probably lots of scripts
> out there relying on this behavior.
I would agree with the second sentence, but not necessarily with the
first. For script consumption, terminator is easier to deal with than
separator if the script is streaming; separator is easier if the script
gobbles up everything and then uses split(/$separator/, $everything).
They both have their uses.
> - I don't really like the behavior for any of the cases under 2.2.2.
>
> I especially don't like the fact that the output does not end in a
> terminating newline for case 2.2.2.1.
This is exactly why I did tformat so that we do not have to have a complex
special case (Jeff and I exchanged a few weatherbaloon patches on the list
trying out heuristics) to avoid breaking existing scripts that use format.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: entry terminator/separator behavior in show_log()
2008-04-28 16:10 ` Junio C Hamano
@ 2008-04-29 8:58 ` Adam Simpkins
0 siblings, 0 replies; 3+ messages in thread
From: Adam Simpkins @ 2008-04-29 8:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Mon, Apr 28, 2008 at 09:10:59AM -0700, Junio C Hamano wrote:
> Adam Simpkins <adam@adamsimpkins.net> writes:
>
> > - I don't really like the behavior for any of the cases under 2.2.2.
> >
> > I especially don't like the fact that the output does not end in a
> > terminating newline for case 2.2.2.1.
>
> This is exactly why I did tformat so that we do not have to have a complex
> special case (Jeff and I exchanged a few weatherbaloon patches on the list
> trying out heuristics) to avoid breaking existing scripts that use format.
Hmm. Here's another possible option to kick around:
Modify the argument parsing code to set use_terminator for
--pretty=format when -z is not supplied. In other words:
- If the -z argument is used (diffopt->line_termination is '\0'),
the code has separator semantics, just like it always has.
'\0' appears between the entries, and not at the end of the last
one.
- If the -z argument is not used (diffopt->line_termination is '\n'),
the code has terminator semantics. '\n' appears at the end of each
entry.
The only change to the current behavior is that if the -z option is
not used, an extra newline will appear at the end of the output. This
might confuse some scripts that don't use -z.
The nice thing about this change is that now all of the cases under
2.2.2 from my initial email behave identically. These cases are the
most annoying to distinguish and handle correctly for the new --graph
code, so it would make the graph logic simpler.
Do you think this change would be acceptable, or would it still break
too many scripts?
--
Adam Simpkins
adam@adamsimpkins.net
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2008-04-29 8:59 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-28 4:55 entry terminator/separator behavior in show_log() Adam Simpkins
2008-04-28 16:10 ` Junio C Hamano
2008-04-29 8:58 ` Adam Simpkins
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).