Re: [PATCH 03/21] Refactoring to make verify_tag() and parse_tag_buffer() more similar

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Johan Herland <johan@herland.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 03/21] Refactoring to make verify_tag() and parse_tag_buffer() more similar
Date: Sat, 09 Jun 2007 12:49:44 +0200	[thread overview]
Message-ID: <200706091249.45042.johan@herland.net> (raw)
In-Reply-To: <Pine.LNX.4.64.0706090339280.4059@racer.site>

On Saturday 09 June 2007, Johannes Schindelin wrote:
> Hi,

Hi. Thanks for taking the time to look at (some of) my patch. Most of your
questions below can be answered with a single answer:

The main purpose of the patch is (as the subject line says) to bring the
two functions more in line with eachother. At the time I made the patch,
I had made the observation that these function were trying to do much the
same thing, albeit in a slightly different form. This patch is therefore
about applying a series of (mostly non-functional) refactorings to make
their diff as small as possible. This involves "stupid" changes such as
renaming variables, tweaking whitespace, reordering the declaration of
variables, etc. It's all to make the functions similar to the point where
I can diff them, get a small and meaningful result, see the remaining
_real_ differences, and in the end, _merge_ them (see patches 7-9).
If this whole exercise didn't end up with merging the two functions into
one, I would _totally_ agree with you that all this refactoring is more
harmful than beneficial.

> On Sat, 9 Jun 2007, Johan Herland wrote:
> >  	if (size < 64)
> >  		return error("wanna fool me ? you obviously got the size wrong !");
> >  
> > -	buffer[size] = 0;
> 
> Are you sure that your buffer is always NUL terminated?

First, (and you'll see this in the commit message) I'm _moving_ (not
removing) the NUL termination out of verify_tag() and into main() (which I
can be sure is the only caller of verify_tag(), since verify_tag is
declared static, and there is no other call in that file). Two reasons for
doing this:

1. Make verify_tag more similar to parse_tag_buffer() (because
parse_tag_buffer() does not NUL terminate)

2. Do the NUL termination as close to the code that actually populated the
buffer with data (the read_pipe() in main())

So now you can ask: Why doesn't parse_tag_buffer() NUL terminate its
input? It _that_ safe? And I ran around checking all the callers of
parse_tag_buffer, and found that all of them use data (most of which
originates from read_sha1_file()) that's already NUL terminated.

In the end, I also put in a comment on the resulting function
(parse_and_verify_tag_buffer()), explicitly saying the given data _must_
be NUL terminated.

Side note: At first I actually thought the manual NUL termination
could cause a buffer overflow (i.e. if the given size was the same as the
allocated size), so I actually have a version of all of this where I
_don't_ assume the buffer is NUL-terminated at all, and put in lots of
bounds checking, replace strchr() with memchr(), etc.

I then took a hard look at read_pipe(), and discovered that if you
use it to fill a 4096-byte buffer with 4096 bytes of data, it actually
_will_ reallocate to 8192 bytes and leave room for the NUL termination
(and much more) (I believe this should have been documented in read_pipe).
Thus the NUL termination was safe all along.

> > -	type_line = object + 48;
> > +	type_line = data + 48;
> 
> Quite a lot of changes seem to do this object->data. The patch would have 
> been much more compact if you just had renamed buffer to object instead of 
> data.

Yes, but it would have made the aforementioned diff to parse_tag_buffer()
larger.

> >  	/* TODO: check for committer info + blank line? */
> >  	/* Also, the minimum length is probably + "tagger .", or 63+8=71 */
> >  
> >  	/* The actual stuff afterwards we don't care about.. */
> >  	return 0;
> > -}
> >  
> >  #undef PD_FMT
> > +}
> 
> Any particular reason for this?

Well, PD_FMT is only used inside the function, so I found it easier to
move the #definition of PD_FMT inside the function to indicate the scope
(_perceived_ scope; I know it hasn't any effect on the compiled code).
But since the whole function is going away in a few patches anyway,
I should have probably left it out of this patch entirely.

> > @@ -124,6 +120,7 @@ int main(int argc, char **argv)
> >  		free(buffer);
> >  		die("could not read from stdin");
> >  	}
> > +	buffer[size] = 0;
> 
> Ah, so you terminate the buffer here. From the patch, it is relatively 
> hard to see if this line is always hit _before_ the function is called 
> that evidently relies on NUL termination. By moving it here, I think it is 
> much more likely to overlook the fact that the function, which made sure 
> that its assumption was met, needs this line. Whereas if you left it where 
> it was, the assumption would always be met.

But if I leave the NUL termination within the function I would have to
backtrack out of the function to all of its potential callers and check
whether it's safe to write to index size. Since the word "size" could
easily mean "allocated size" I would have the initial feeling that this
might be a buffer overflow, i.e. _not_ safe.

In the end, I think the best solution is to make sure NUL termination
happens before calling the function, and then documenting explicitly
that the function assumes NUL terminated input. Which is exactly what
I end up with at the end of the patch series.

> > -	sig_line++;
> > +	tagger_line++;
> 
> I am really reluctant with renamings like these. IMHO they don't buy you 
> much, except for possible confusion. It is evident that sig means the 
> signer (and it is obvious in the case of an unsigned tag, who is meant, 
> too).

Hmm. The "type" line is found in the variable type_line, the "tag" line is
found in the variable tag_line, and the "tagger" line is found in the
variable ... sig_line? Nope, I don't buy it.

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

next prev parent reply	other threads:[~2007-06-09 10:50 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-07 22:50 error: char103: premature end of data Johannes Schindelin
2007-06-07 23:05 ` Johan Herland
2007-06-07 23:28   ` Johannes Schindelin
2007-06-07 23:47     ` Johan Herland
2007-06-07 23:55       ` Johannes Schindelin
2007-06-08  0:08       ` [PATCH] Fix failed tag parsing when tag object has no body/message (and thus ends with a single '\n') Johan Herland
2007-06-08  6:05         ` Junio C Hamano
2007-06-08  8:18           ` Johan Herland
2007-06-08 16:06             ` Junio C Hamano
2007-06-09  0:10               ` [PATCH 0/21] Refactor the tag object (take 2) Johan Herland
2007-06-09  0:12                 ` [PATCH 01/21] Remove unnecessary code and comments on non-existing 8kB tag object restriction Johan Herland
2007-06-09  0:13                 ` [PATCH 02/21] Return error messages when parsing fails Johan Herland
2007-06-09 18:01                   ` Junio C Hamano
2007-06-09 18:28                     ` Johan Herland
2007-06-09 19:42                       ` [PATCH] Silence error messages unless 'thorough_verify' is set Johan Herland
2007-06-10  6:48                         ` Johannes Schindelin
2007-06-10  8:15                           ` Junio C Hamano
2007-06-10 10:08                             ` Johannes Schindelin
2007-06-10 12:10                               ` Johan Herland
2007-06-10 18:51                                 ` Johannes Schindelin
2007-06-10 19:16                                   ` Johan Herland
2007-06-10 11:47                             ` [PATCH 0/4] Restructure the tag object Johan Herland
2007-06-10 11:49                               ` [PATCH 1/4] Make tag names (i.e. the tag object's "tag" line) optional Johan Herland
2007-06-10 22:46                                 ` Junio C Hamano
2007-06-10 23:01                                   ` Johan Herland
2007-06-11  1:11                                     ` Junio C Hamano
2007-06-10 11:50                               ` [PATCH 2/4] Introduce optional "keywords" on tag objects Johan Herland
2007-06-10 18:42                                 ` Johannes Schindelin
2007-06-10 19:04                                   ` Johan Herland
2007-06-10 21:43                                     ` Junio C Hamano
2007-06-10 23:16                                       ` Johan Herland
2007-06-11  1:01                                         ` Junio C Hamano
2007-06-10 11:50                               ` [PATCH 3/4] Documentation/git-mktag: Document the changes in tag object structure Johan Herland
2007-06-10 11:50                               ` [PATCH 4/4] git-mktag tests: Expand on mktag selftests according to the new " Johan Herland
2007-06-10 18:35                               ` [PATCH 0/4] Restructure the tag object Johannes Schindelin
2007-06-09  0:13                 ` [PATCH 03/21] Refactoring to make verify_tag() and parse_tag_buffer() more similar Johan Herland
2007-06-09  2:54                   ` Johannes Schindelin
2007-06-09 10:49                     ` Johan Herland [this message]
2007-06-09  0:14                 ` [PATCH 04/21] Refactor verification of "tagger" line to be more similar to verification of "type" and "tagger" lines Johan Herland
2007-06-09 18:01                   ` Junio C Hamano
2007-06-10  7:49                     ` Johannes Schindelin
2007-06-09  0:14                 ` [PATCH 05/21] Make parse_tag_buffer_internal() handle item == NULL Johan Herland
2007-06-09 18:01                   ` Junio C Hamano
2007-06-10  0:45                     ` [PATCH] Move check for already parsed tag object to parse_tag_buffer() wrapper function Johan Herland
2007-06-10  8:06                   ` [PATCH 05/21] Make parse_tag_buffer_internal() handle item == NULL Johannes Schindelin
2007-06-09  0:15                 ` [PATCH 06/21] Refactor tag name verification loop to use index 'i' instead of incrementing pointer 'tag_line' Johan Herland
2007-06-09 21:26                   ` Alex Riesen
2007-06-09 21:34                     ` Johan Herland
2007-06-10  8:14                       ` Johannes Schindelin
2007-06-10  9:01                   ` Johannes Schindelin
2007-06-09  0:15                 ` [PATCH 07/21] Copy the remaining differences from verify_tag() to parse_tag_buffer_internal() Johan Herland
2007-06-09 21:31                   ` Alex Riesen
2007-06-09 21:39                     ` Johan Herland
2007-06-10  8:22                       ` Johannes Schindelin
2007-06-09  0:15                 ` [PATCH 08/21] Switch from verify_tag() to parse_and_verify_tag_buffer() for verifying tag objects in git-mktag Johan Herland
2007-06-09  0:16                 ` [PATCH 09/21] Remove unneeded code from mktag.c Johan Herland
2007-06-09 21:39                   ` Alex Riesen
2007-06-09 21:42                     ` Johan Herland
2007-06-09  0:16                 ` [PATCH 10/21] Free mktag's buffer before dying Johan Herland
2007-06-09 21:37                   ` Alex Riesen
2007-06-09 21:46                     ` Johan Herland
2007-06-09 22:00                       ` Alex Riesen
2007-06-09 22:05                         ` Johan Herland
2007-06-10  8:38                   ` Johannes Schindelin
2007-06-09  0:17                 ` [PATCH 11/21] Rewrite error messages; fix up line lengths Johan Herland
2007-06-10  8:38                   ` Johannes Schindelin
2007-06-09  0:17                 ` [PATCH 12/21] Use prefixcmp() instead of memcmp() for cleaner code with less magic numbers Johan Herland
2007-06-09 21:42                   ` Alex Riesen
2007-06-09 21:47                     ` Johan Herland
2007-06-10  8:41                   ` Johannes Schindelin
2007-06-09  0:18                 ` [PATCH 13/21] Collect skipping of header field names and calculation of line lengths in one place Johan Herland
2007-06-10  8:45                   ` Johannes Schindelin
2007-06-09  0:18                 ` [PATCH 14/21] Add proper parsing of "tagger" line, but only when thorough_verify is set Johan Herland
2007-06-10  8:52                   ` Johannes Schindelin
2007-06-10  8:58                   ` Johannes Schindelin
2007-06-09  0:19                 ` [PATCH 15/21] Make tag names (i.e. the tag object's "tag" line) optional Johan Herland
2007-06-10  9:07                   ` Johannes Schindelin
2007-06-09  0:19                 ` [PATCH 16/21] Introduce optional "keywords" on tag objects Johan Herland
2007-06-09 21:52                   ` Alex Riesen
2007-06-09 22:00                     ` Johan Herland
2007-06-09 22:36                     ` [PATCH] Use xstrndup() instead of xmalloc() and memcpy(); fix buglet with generating default item->keywords Johan Herland
2007-06-10  0:05                     ` [PATCH 16/21] Introduce optional "keywords" on tag objects Junio C Hamano
2007-06-10  0:35                       ` [PATCH] Fail if tag name and keywords is not within "printable ASCII" Johan Herland
2007-06-10  1:33                         ` Junio C Hamano
2007-06-09  0:20                 ` [PATCH 17/21] Update comments on tag objects in mktag.c Johan Herland
2007-06-09  0:20                 ` [PATCH 18/21] git-fsck: Do thorough verification of tag objects Johan Herland
2007-06-09  0:20                 ` [PATCH 19/21] Documentation/git-mktag: Document the changes in tag object structure Johan Herland
2007-06-09  0:21                 ` [PATCH 20/21] git-mktag tests: Expand on mktag selftests according to the new " Johan Herland
2007-06-09  0:21                 ` [PATCH 21/21] Add fsck_verify_ref_to_tag_object() to verify that refname matches name stored in tag object Johan Herland
2007-06-07 23:11 ` error: char103: premature end of data Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200706091249.45042.johan@herland.net \
    --to=johan@herland.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).