From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: [PATCH] config: support values longer than 1024 bytes Date: Wed, 6 Apr 2011 11:35:09 -0400 Message-ID: <20110406153509.GA1864@sigill.intra.peff.net> References: <1302046203-4408-1-git-send-email-kusmabite@gmail.com> <20110406005223.GA10374@sigill.intra.peff.net> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: git@vger.kernel.org, jwa@urbancode.com, drew.northup@maine.edu To: Erik Faye-Lund X-From: git-owner@vger.kernel.org Wed Apr 06 17:35:23 2011 Return-path: Envelope-to: gcvg-git-2@lo.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1Q7UlN-0001fc-5o for gcvg-git-2@lo.gmane.org; Wed, 06 Apr 2011 17:35:21 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932125Ab1DFPfO (ORCPT ); Wed, 6 Apr 2011 11:35:14 -0400 Received: from 99-108-226-0.lightspeed.iplsin.sbcglobal.net ([99.108.226.0]:55437 "EHLO peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932117Ab1DFPfN (ORCPT ); Wed, 6 Apr 2011 11:35:13 -0400 Received: (qmail 5647 invoked by uid 107); 6 Apr 2011 15:35:59 -0000 Received: from 205.158.58.41.ptr.us.xo.net (HELO sigill.intra.peff.net) (205.158.58.41) (smtp-auth username relayok, mechanism cram-md5) by peff.net (qpsmtpd/0.84) with ESMTPA; Wed, 06 Apr 2011 11:35:59 -0400 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Wed, 06 Apr 2011 11:35:09 -0400 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Wed, Apr 06, 2011 at 11:10:42AM +0200, Erik Faye-Lund wrote: > > But for the other, one of the invariants of strbuf is that the string is > > always NUL-terminated. So I would expect strbuf_init to properly > > NUL-terminate after growing based on the hint. > > I agree. An unterminated yet non-NULL return from strbuf_detach is > just dangerous behavior. Something like this should probably be > applied: > > ---8<--- > diff --git a/strbuf.c b/strbuf.c > index 77444a9..538035a 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -24,14 +24,16 @@ int suffixcmp(const char *str, const char *suffix) > * buf is non NULL and ->buf is NUL terminated even for a freshly > * initialized strbuf. > */ > -char strbuf_slopbuf[1]; > +char strbuf_slopbuf[1] = { '\0' }; This hunk is redundant. slopbuf will already be initialized to 0. > void strbuf_init(struct strbuf *sb, size_t hint) > { > sb->alloc = sb->len = 0; > sb->buf = strbuf_slopbuf; > - if (hint) > + if (hint) { > strbuf_grow(sb, hint); > + sb->buf[0] = '\0'; > + } > } But this one is the right fix. > that. But this brings a new issue: leaving potentially huge blocks of > memory (especially since this patch is about long lines) allocated > inside a function can be a bit nasty. But it's probably not a big deal Yeah. It's just one block, though, and in the normal case it is probably only about 80 characters. So it is more efficient than what's there now. :) Somebody could have some gigantic value, though, and yes, we'll grow to the biggest one and never free that memory. You could also have parse_value take a strbuf parameter to output into, and then free it after config reading is done. > In other words: I think you're right, it's a much better approach. > Less allocations, less penalty on the start-up time for every little > git-command. I doubt the efficiency increase is measurable. We end up xstrdup'ing quite a few of the values in the config callbacks anyway. I would do whatever seems most natural for reading/writing the code. > > I do wonder, though, if we could be reusing the unquote_c_style() > > function in quote.c. They are obviously similar, but I haven't checked > > if there is more going on in the config code. > > Hmm, this is an interesting suggestion. It would be a part of a bigger > change though: unquote_c_style requires it's input to be in memory, > while parse_value uses a function called get_next_char to feed the > parser. So we'd either have to read the entire file into memory, or > find some way to read the file line-by-line while handling \n-escaping > correctly. > > It also seems like there's differences in what kind of escaping and > normalization the two functions handle; unquote_c_style handles more > escaped character sequences, while parse_value normalize all > non-escaped space-characters ('\t' et. al) into SP. This might not be > such a big problem in reality. This was just a random thought that I had, and I didn't investigate it how hard it would be. If it turns out to be too much trouble, just forget it. -Peff