From: Erik Faye-Lund <kusmabite@gmail.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, jwa@urbancode.com, drew.northup@maine.edu
Subject: Re: [PATCH] config: support values longer than 1024 bytes
Date: Wed, 6 Apr 2011 18:16:50 +0200 [thread overview]
Message-ID: <BANLkTikuzfZrv+N0Qm7utfYF6fsYQn7Zcg@mail.gmail.com> (raw)
In-Reply-To: <20110406153509.GA1864@sigill.intra.peff.net>
On Wed, Apr 6, 2011 at 5:35 PM, Jeff King <peff@peff.net> wrote:
> On Wed, Apr 06, 2011 at 11:10:42AM +0200, Erik Faye-Lund wrote:
>
>> > But for the other, one of the invariants of strbuf is that the string is
>> > always NUL-terminated. So I would expect strbuf_init to properly
>> > NUL-terminate after growing based on the hint.
>>
>> I agree. An unterminated yet non-NULL return from strbuf_detach is
>> just dangerous behavior. Something like this should probably be
>> applied:
>>
>> ---8<---
>> diff --git a/strbuf.c b/strbuf.c
>> index 77444a9..538035a 100644
>> --- a/strbuf.c
>> +++ b/strbuf.c
>> @@ -24,14 +24,16 @@ int suffixcmp(const char *str, const char *suffix)
>> * buf is non NULL and ->buf is NUL terminated even for a freshly
>> * initialized strbuf.
>> */
>> -char strbuf_slopbuf[1];
>> +char strbuf_slopbuf[1] = { '\0' };
>
> This hunk is redundant. slopbuf will already be initialized to 0.
Right, silly me. I somehow thought that implicit zero-initialization
only applied to static variables, but K&R tells me I was wrong. Thanks
for pointing it out :)
>
>> void strbuf_init(struct strbuf *sb, size_t hint)
>> {
>> sb->alloc = sb->len = 0;
>> sb->buf = strbuf_slopbuf;
>> - if (hint)
>> + if (hint) {
>> strbuf_grow(sb, hint);
>> + sb->buf[0] = '\0';
>> + }
>> }
>
> But this one is the right fix.
>
OK, I'll turn this into a two-patch series, then.
>> that. But this brings a new issue: leaving potentially huge blocks of
>> memory (especially since this patch is about long lines) allocated
>> inside a function can be a bit nasty. But it's probably not a big deal
>
> Yeah. It's just one block, though, and in the normal case it is probably
> only about 80 characters. So it is more efficient than what's there now. :)
>
> Somebody could have some gigantic value, though, and yes, we'll grow to
> the biggest one and never free that memory. You could also have
> parse_value take a strbuf parameter to output into, and then free it
> after config reading is done.
>
>> In other words: I think you're right, it's a much better approach.
>> Less allocations, less penalty on the start-up time for every little
>> git-command.
>
> I doubt the efficiency increase is measurable. We end up xstrdup'ing
> quite a few of the values in the config callbacks anyway. I would do
> whatever seems most natural for reading/writing the code.
>
I think I'll just leave the single leak in. Since it would allocate
the buffer lazily, it would really only "waste" more memory than the
existing implementation when the old implementation would fail. So my
conscience is clear ;)
>> > I do wonder, though, if we could be reusing the unquote_c_style()
>> > function in quote.c. They are obviously similar, but I haven't checked
>> > if there is more going on in the config code.
>>
>> Hmm, this is an interesting suggestion. It would be a part of a bigger
>> change though: unquote_c_style requires it's input to be in memory,
>> while parse_value uses a function called get_next_char to feed the
>> parser. So we'd either have to read the entire file into memory, or
>> find some way to read the file line-by-line while handling \n-escaping
>> correctly.
>>
>> It also seems like there's differences in what kind of escaping and
>> normalization the two functions handle; unquote_c_style handles more
>> escaped character sequences, while parse_value normalize all
>> non-escaped space-characters ('\t' et. al) into SP. This might not be
>> such a big problem in reality.
>
> This was just a random thought that I had, and I didn't investigate it
> how hard it would be. If it turns out to be too much trouble, just
> forget it.
>
Yeah. I think I'll skip it, since it isn't dead obvious.
Thanks for the input, I'll cook up a new version soon.
prev parent reply other threads:[~2011-04-06 16:17 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-04-05 23:30 [PATCH] config: support values longer than 1024 bytes Erik Faye-Lund
2011-04-05 23:38 ` Erik Faye-Lund
2011-04-06 0:52 ` Jeff King
2011-04-06 9:10 ` Erik Faye-Lund
2011-04-06 15:35 ` Jeff King
2011-04-06 16:16 ` Erik Faye-Lund [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=BANLkTikuzfZrv+N0Qm7utfYF6fsYQn7Zcg@mail.gmail.com \
--to=kusmabite@gmail.com \
--cc=drew.northup@maine.edu \
--cc=git@vger.kernel.org \
--cc=jwa@urbancode.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).