Re: [PATCH v4 2/2] sha1_file: refactor sha1_file.c to support 'cat-file --literally'

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: karthik nayak <karthik.188@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v4 2/2] sha1_file: refactor sha1_file.c to support 'cat-file --literally'
Date: Thu, 19 Mar 2015 01:05:08 +0530	[thread overview]
Message-ID: <5509D36C.6000908@gmail.com> (raw)
In-Reply-To: <xmqqbnjro05k.fsf@gitster.dls.corp.google.com>



On 03/18/2015 01:40 AM, Junio C Hamano wrote:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> Subject: Re: [PATCH v4 2/2] sha1_file: refactor sha1_file.c to support 'cat-file --literally'
>
>> Modify sha1_loose_object_info() to support 'cat-file --literally'
>> by accepting flags and also make changes to copy the type to
>> object_info::typename.
>
> It would be more descriptive to mention the central point on the
> title regarding what it takes to "support cat-file --literally".
>
> For example:
>
>    sha1_file.c: support reading from a loose object of a bogus type
>
>    Update sha1_loose_object_info() to optionally allow it to read
>    from a loose object file of unknown/bogus type; as the function
>    usually returns the type of the object it read in the form of enum
>    for known types, add an optional "typename" field to receive the
>    name of the type in textual form.
>
> By the way, I think your 1/2 would not even compile unless you have
> this change; the patches in these two patch series must be swapped,
> no?
>
Noted. Yes it wouldn't. I thought both would be applied together and 
didn't give it much thought, but yeah, I should pay more attention to that.
>> diff --git a/sha1_file.c b/sha1_file.c
>> index 69a60ec..e31e9e2 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -1564,6 +1564,36 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
>>   	return git_inflate(stream, 0);
>>   }
>>
>> +static int unpack_sha1_header_literally(git_zstream *stream, unsigned char *map,
>> +					unsigned long mapsize,
>> +					struct strbuf *header)
>> +{
>> +	unsigned char buffer[32], *cp;
>> +	unsigned long bufsiz = sizeof(buffer);
>> +	int status;
>> +
>> +	/* Get the data stream */
>> +	memset(stream, 0, sizeof(*stream));
>> +	stream->next_in = map;
>> +	stream->avail_in = mapsize;
>> +	stream->next_out = buffer;
>> +	stream->avail_out = bufsiz;
>> +
>> +	git_inflate_init(stream);
>> +
>> +	do {
>> +		status = git_inflate(stream, 0);
>> +		strbuf_add(header, buffer, stream->next_out - buffer);
>> +		for (cp = buffer; cp < stream->next_out; cp++)
>> +			if (!*cp)
>> +				/* Found the NUL at the end of the header */
>> +				return 0;
>> +		stream->next_out = buffer;
>> +		stream->avail_out = bufsiz;
>> +	} while (status == Z_OK);
>> +	return -1;
>> +}
>> +
>
> There is nothing "literal" about this function.
>
> The only difference between the original and this one is that this
> uses a strbuf, instead of a buffer of a fixed size allocated by the
> caller, to return the early part of the inflated data.
>
> I wonder if it incurs too much overhead to the existing callers if
> you reimplementated unpack_sha1_header() as a thin wrapper around
> this function, something like
>
> 	int unpack_sha1_header(git_zstream *stream,
> 				unsigned char *map, unsigned long mapsize,
>                                  void *buffer, unsigned long bufsiz)
> 	{
>                  int status;
> 		struct strbuf header = STRBUF_INIT;
>
>                  status = unpack_sha1_header_to_strbuf(stream, map, mapsize, &header);
> 		if (bufsiz < header.len)
>                  	die("object header too long");
> 		memcpy(buffer, header.buf, header.len);
>                  return status;
> 	}
>      			
> Note that I am *not* suggesting to do this blindly.  If there is no
> downside from performance point of view, however, the above would be
> a way to avoid code duplication.
>
> Another way to avoid code duplication may be to implement
> unpack_sha1_header_to_strbuf() in terms of unpack_sha1_header(),
> perhaps like this:
>
> 	static int unpack_sha1_header_to_strbuf(...)
>          {
> 		unsigned char buffer[32], *cp;
>                  unsigned long bufsiz = sizeof(buffer);
>          	int status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);
>
> 		if (status)
>                  	return status;
> 		do {
> 			strbuf_add(header, buffer, stream->next_out - buffer);
> 			for (cp = buffer; cp < stream->next_out; cp++)
> 				if (!*cp)
> 					/* Found the NUL at the end of the header */
> 					return 0;
> 			stream->next_out = buffer;
> 			stream->avail_out = bufsiz;
> 		} while (status == Z_OK);
> 		return -1;
> 	}
>
> which may be more in line with how we read data from loose objects.

Agreed there is code duplication with unpack_sha1_header() and 
unpack_sha1_header_extended(). But I thought there would be a 
performance trade off, Any way I could test this?
Also the second suggestion you have given seems to be more practical, As 
there is no performance overhead, if called by existing code.
>
>> +int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
>> +			       int flags)
>
> Unless we are taking advantage of the fact that MSB is special in
> signed integral types, I would prefer to see us use an unsigned type
> to store these bits in a variable of an integral type.  That would
> let the readers assume that they have fewer tricky things possibly
> going on in the code (also see the footnote to $gmane/263751).

Thanks for the link. Will change to unsigned.
>
>> @@ -1652,12 +1674,45 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep)
>>   			size = size * 10 + c;
>>   		}
>>   	}
>> -	*sizep = size;
>> +
>> +	type = type_from_string_gently(typename.buf, -1, 1);
>
> Doesn't the code know how long the typename is at this point?
Yes, will change.
>
>> +	if (oi->sizep)
>> +		*oi->sizep = size;
>> +	if (oi->typename)
>> +		strbuf_addstr(oi->typename, typename.buf);
>
> Likewise.  Perhaps strbuf_addbuf()?
That also calls strbuf_add(). But does improve readability
>
>> +	if (oi->typep)
>> +		*oi->typep = type;
>
> Do you want to store -1 to *oi->typep when the caller asked to do
> the "literally" thing?  Shouldn't it match what is returned from
> this function?
Yes it should. Will make changes.
>
>> +	return *hdr ? -1 : type;
>> +}
>> +
>> +/*
>> + * We used to just use "sscanf()", but that's actually way
>> + * too permissive for what we want to check. So do an anal
>> + * object header parse by hand. Calls the extended function.
>> + */
>
> The comment "let's do better than sscanf() by parsing ourselves"
> applies to the implementation of _extended() function, not this one,
> no?  It is clear to everybody that it "Calls the extended function",
> so why mention it?
>
>> +int parse_sha1_header(const char *hdr, unsigned long *sizep)
>> +{
>> +	struct object_info oi;
>> +
>> +	oi.sizep = sizep;
>> +	oi.typename = NULL;
>> +	oi.typep = NULL;
>> +	return parse_sha1_header_extended(hdr, &oi, LOOKUP_REPLACE_OBJECT);
>>   }
>
>> @@ -2524,13 +2579,15 @@ struct packed_git *find_sha1_pack(const unsigned char *sha1,
>>   }
>>
>>   static int sha1_loose_object_info(const unsigned char *sha1,
>> -				  struct object_info *oi)
>> +				  struct object_info *oi,
>> +				  int flags)
>>   {
>> -	int status;
>> +	int status = 0;
>>   	unsigned long mapsize, size;
>>   	void *map;
>>   	git_zstream stream;
>>   	char hdr[32];
>> +	struct strbuf hdrbuf = STRBUF_INIT;
>>
>>   	if (oi->delta_base_sha1)
>>   		hashclr(oi->delta_base_sha1);
>> @@ -2557,17 +2614,29 @@ static int sha1_loose_object_info(const unsigned char *sha1,
>>   		return -1;
>>   	if (oi->disk_sizep)
>>   		*oi->disk_sizep = mapsize;
>> -	if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
>> -		status = error("unable to unpack %s header",
>> -			       sha1_to_hex(sha1));
>> -	else if ((status = parse_sha1_header(hdr, &size)) < 0)
>> -		status = error("unable to parse %s header", sha1_to_hex(sha1));
>> -	else if (oi->sizep)
>> +	if ((flags & LOOKUP_LITERALLY)) {
>> +		if (unpack_sha1_header_literally(&stream, map, mapsize, &hdrbuf) < 0)
>> +			status = error("unable to unpack %s header with --literally",
>> +				       sha1_to_hex(sha1));
>> +		else if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, flags)) < 0)
>> +			status = error("unable to parse %s header", sha1_to_hex(sha1));
>> +	} else {
>> +		if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
>> +			status = error("unable to unpack %s header",
>> +				       sha1_to_hex(sha1));
>> +		else if ((status = parse_sha1_header(hdr, &size)) < 0)
>> +			status = error("unable to parse %s header", sha1_to_hex(sha1));
>> +	}
>> +	if (oi->sizep)
>>   		*oi->sizep = size;
>
> Have you considered calling parse_sha1_header_extended() for both
> literally and normal cases?  Then you wouldn't have to do any of the
> "if (oi->blah) *oi->blah = value", no?
>
>>   	git_inflate_end(&stream);
>>   	munmap(map, mapsize);
>>   	if (oi->typep)
>>   		*oi->typep = status;
>
> Likewise.
>
>> +	if (oi->typename && !(oi->typename->len))
>> +		strbuf_addstr(oi->typename, typename(status));
>
> Likewise.
>

No. I have not considered that. will look into this.
Thanks for the suggestions.

Regards
-Karthik

     prev parent reply	other threads:[~2015-03-18 19:35 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-17  5:13 [PATCH v4 0/2] cat-file: add a '--literally' option karthik nayak
2015-03-17  5:16 ` [PATCH v4 1/2] cat-file: teach cat-file " Karthik Nayak
2015-03-17  6:51   ` Eric Sunshine
2015-03-17 13:59     ` Karthik Nayak
2015-03-17  5:16 ` [PATCH v4 2/2] sha1_file: refactor sha1_file.c to support 'cat-file --literally' Karthik Nayak
2015-03-17 20:10   ` Junio C Hamano
2015-03-18 19:35     ` karthik nayak [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5509D36C.6000908@gmail.com \
    --to=karthik.188@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).