From: karthik nayak <karthik.188@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v4 2/2] sha1_file: refactor sha1_file.c to support 'cat-file --literally'
Date: Thu, 19 Mar 2015 01:05:08 +0530 [thread overview]
Message-ID: <5509D36C.6000908@gmail.com> (raw)
In-Reply-To: <xmqqbnjro05k.fsf@gitster.dls.corp.google.com>
On 03/18/2015 01:40 AM, Junio C Hamano wrote:
> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> Subject: Re: [PATCH v4 2/2] sha1_file: refactor sha1_file.c to support 'cat-file --literally'
>
>> Modify sha1_loose_object_info() to support 'cat-file --literally'
>> by accepting flags and also make changes to copy the type to
>> object_info::typename.
>
> It would be more descriptive to mention the central point on the
> title regarding what it takes to "support cat-file --literally".
>
> For example:
>
> sha1_file.c: support reading from a loose object of a bogus type
>
> Update sha1_loose_object_info() to optionally allow it to read
> from a loose object file of unknown/bogus type; as the function
> usually returns the type of the object it read in the form of enum
> for known types, add an optional "typename" field to receive the
> name of the type in textual form.
>
> By the way, I think your 1/2 would not even compile unless you have
> this change; the patches in these two patch series must be swapped,
> no?
>
Noted. Yes it wouldn't. I thought both would be applied together and
didn't give it much thought, but yeah, I should pay more attention to that.
>> diff --git a/sha1_file.c b/sha1_file.c
>> index 69a60ec..e31e9e2 100644
>> --- a/sha1_file.c
>> +++ b/sha1_file.c
>> @@ -1564,6 +1564,36 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
>> return git_inflate(stream, 0);
>> }
>>
>> +static int unpack_sha1_header_literally(git_zstream *stream, unsigned char *map,
>> + unsigned long mapsize,
>> + struct strbuf *header)
>> +{
>> + unsigned char buffer[32], *cp;
>> + unsigned long bufsiz = sizeof(buffer);
>> + int status;
>> +
>> + /* Get the data stream */
>> + memset(stream, 0, sizeof(*stream));
>> + stream->next_in = map;
>> + stream->avail_in = mapsize;
>> + stream->next_out = buffer;
>> + stream->avail_out = bufsiz;
>> +
>> + git_inflate_init(stream);
>> +
>> + do {
>> + status = git_inflate(stream, 0);
>> + strbuf_add(header, buffer, stream->next_out - buffer);
>> + for (cp = buffer; cp < stream->next_out; cp++)
>> + if (!*cp)
>> + /* Found the NUL at the end of the header */
>> + return 0;
>> + stream->next_out = buffer;
>> + stream->avail_out = bufsiz;
>> + } while (status == Z_OK);
>> + return -1;
>> +}
>> +
>
> There is nothing "literal" about this function.
>
> The only difference between the original and this one is that this
> uses a strbuf, instead of a buffer of a fixed size allocated by the
> caller, to return the early part of the inflated data.
>
> I wonder if it incurs too much overhead to the existing callers if
> you reimplementated unpack_sha1_header() as a thin wrapper around
> this function, something like
>
> int unpack_sha1_header(git_zstream *stream,
> unsigned char *map, unsigned long mapsize,
> void *buffer, unsigned long bufsiz)
> {
> int status;
> struct strbuf header = STRBUF_INIT;
>
> status = unpack_sha1_header_to_strbuf(stream, map, mapsize, &header);
> if (bufsiz < header.len)
> die("object header too long");
> memcpy(buffer, header.buf, header.len);
> return status;
> }
>
> Note that I am *not* suggesting to do this blindly. If there is no
> downside from performance point of view, however, the above would be
> a way to avoid code duplication.
>
> Another way to avoid code duplication may be to implement
> unpack_sha1_header_to_strbuf() in terms of unpack_sha1_header(),
> perhaps like this:
>
> static int unpack_sha1_header_to_strbuf(...)
> {
> unsigned char buffer[32], *cp;
> unsigned long bufsiz = sizeof(buffer);
> int status = unpack_sha1_header(stream, map, mapsize, buffer, bufsiz);
>
> if (status)
> return status;
> do {
> strbuf_add(header, buffer, stream->next_out - buffer);
> for (cp = buffer; cp < stream->next_out; cp++)
> if (!*cp)
> /* Found the NUL at the end of the header */
> return 0;
> stream->next_out = buffer;
> stream->avail_out = bufsiz;
> } while (status == Z_OK);
> return -1;
> }
>
> which may be more in line with how we read data from loose objects.
Agreed there is code duplication with unpack_sha1_header() and
unpack_sha1_header_extended(). But I thought there would be a
performance trade off, Any way I could test this?
Also the second suggestion you have given seems to be more practical, As
there is no performance overhead, if called by existing code.
>
>> +int parse_sha1_header_extended(const char *hdr, struct object_info *oi,
>> + int flags)
>
> Unless we are taking advantage of the fact that MSB is special in
> signed integral types, I would prefer to see us use an unsigned type
> to store these bits in a variable of an integral type. That would
> let the readers assume that they have fewer tricky things possibly
> going on in the code (also see the footnote to $gmane/263751).
Thanks for the link. Will change to unsigned.
>
>> @@ -1652,12 +1674,45 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep)
>> size = size * 10 + c;
>> }
>> }
>> - *sizep = size;
>> +
>> + type = type_from_string_gently(typename.buf, -1, 1);
>
> Doesn't the code know how long the typename is at this point?
Yes, will change.
>
>> + if (oi->sizep)
>> + *oi->sizep = size;
>> + if (oi->typename)
>> + strbuf_addstr(oi->typename, typename.buf);
>
> Likewise. Perhaps strbuf_addbuf()?
That also calls strbuf_add(). But does improve readability
>
>> + if (oi->typep)
>> + *oi->typep = type;
>
> Do you want to store -1 to *oi->typep when the caller asked to do
> the "literally" thing? Shouldn't it match what is returned from
> this function?
Yes it should. Will make changes.
>
>> + return *hdr ? -1 : type;
>> +}
>> +
>> +/*
>> + * We used to just use "sscanf()", but that's actually way
>> + * too permissive for what we want to check. So do an anal
>> + * object header parse by hand. Calls the extended function.
>> + */
>
> The comment "let's do better than sscanf() by parsing ourselves"
> applies to the implementation of _extended() function, not this one,
> no? It is clear to everybody that it "Calls the extended function",
> so why mention it?
>
>> +int parse_sha1_header(const char *hdr, unsigned long *sizep)
>> +{
>> + struct object_info oi;
>> +
>> + oi.sizep = sizep;
>> + oi.typename = NULL;
>> + oi.typep = NULL;
>> + return parse_sha1_header_extended(hdr, &oi, LOOKUP_REPLACE_OBJECT);
>> }
>
>> @@ -2524,13 +2579,15 @@ struct packed_git *find_sha1_pack(const unsigned char *sha1,
>> }
>>
>> static int sha1_loose_object_info(const unsigned char *sha1,
>> - struct object_info *oi)
>> + struct object_info *oi,
>> + int flags)
>> {
>> - int status;
>> + int status = 0;
>> unsigned long mapsize, size;
>> void *map;
>> git_zstream stream;
>> char hdr[32];
>> + struct strbuf hdrbuf = STRBUF_INIT;
>>
>> if (oi->delta_base_sha1)
>> hashclr(oi->delta_base_sha1);
>> @@ -2557,17 +2614,29 @@ static int sha1_loose_object_info(const unsigned char *sha1,
>> return -1;
>> if (oi->disk_sizep)
>> *oi->disk_sizep = mapsize;
>> - if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
>> - status = error("unable to unpack %s header",
>> - sha1_to_hex(sha1));
>> - else if ((status = parse_sha1_header(hdr, &size)) < 0)
>> - status = error("unable to parse %s header", sha1_to_hex(sha1));
>> - else if (oi->sizep)
>> + if ((flags & LOOKUP_LITERALLY)) {
>> + if (unpack_sha1_header_literally(&stream, map, mapsize, &hdrbuf) < 0)
>> + status = error("unable to unpack %s header with --literally",
>> + sha1_to_hex(sha1));
>> + else if ((status = parse_sha1_header_extended(hdrbuf.buf, oi, flags)) < 0)
>> + status = error("unable to parse %s header", sha1_to_hex(sha1));
>> + } else {
>> + if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
>> + status = error("unable to unpack %s header",
>> + sha1_to_hex(sha1));
>> + else if ((status = parse_sha1_header(hdr, &size)) < 0)
>> + status = error("unable to parse %s header", sha1_to_hex(sha1));
>> + }
>> + if (oi->sizep)
>> *oi->sizep = size;
>
> Have you considered calling parse_sha1_header_extended() for both
> literally and normal cases? Then you wouldn't have to do any of the
> "if (oi->blah) *oi->blah = value", no?
>
>> git_inflate_end(&stream);
>> munmap(map, mapsize);
>> if (oi->typep)
>> *oi->typep = status;
>
> Likewise.
>
>> + if (oi->typename && !(oi->typename->len))
>> + strbuf_addstr(oi->typename, typename(status));
>
> Likewise.
>
No. I have not considered that. will look into this.
Thanks for the suggestions.
Regards
-Karthik
prev parent reply other threads:[~2015-03-18 19:35 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-17 5:13 [PATCH v4 0/2] cat-file: add a '--literally' option karthik nayak
2015-03-17 5:16 ` [PATCH v4 1/2] cat-file: teach cat-file " Karthik Nayak
2015-03-17 6:51 ` Eric Sunshine
2015-03-17 13:59 ` Karthik Nayak
2015-03-17 5:16 ` [PATCH v4 2/2] sha1_file: refactor sha1_file.c to support 'cat-file --literally' Karthik Nayak
2015-03-17 20:10 ` Junio C Hamano
2015-03-18 19:35 ` karthik nayak [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5509D36C.6000908@gmail.com \
--to=karthik.188@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).