git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>, git <git@vger.kernel.org>
Subject: Project idea: strbuf allocation modes
Date: Fri, 18 Apr 2014 15:50:46 +0200	[thread overview]
Message-ID: <53512DB6.1070600@alum.mit.edu> (raw)
In-Reply-To: <vpqr457omgs.fsf@anie.imag.fr>

I like that strbuf is getting used more than it used to, and I think
that is a good general trend to help git rid of and/or avoid buffer
overflows and arbitrary limits on strings.

It is unfortunate that it is currently impossible to use a strbuf
without doing a memory allocation.  So code like

    void f()
    {
        char path[PATH_MAX];

        ...
    }

typically gets turned into either

    void f()
    {
        struct strbuf path;

        strbuf_add(&path, ...);   <-- does a malloc
        ...
        strbuf_release(&path);    <-- does a free
    }

which costs extra memory allocations, or

    void f()
    {
        static struct strbuf path;

        strbuf_add(&path, ...);
        ...
        strbuf_setlen(&path, 0);
    }

which, by using a static variable, avoids most of the malloc/free
overhead, but makes the function unsafe to use recursively or from
multiple threads.


The Idea
========

I would like to see strbuf enhanced to allow it to use memory that it
doesn't own (for example, stack-allocated memory), while (optionally)
allowing it to switch over to using allocated memory if the string grows
past the size of the pre-allocated buffer.

The way I envision this, strbuf would gain a flags field with two bits:

    STRBUF_OWNS_MEMORY

        The memory pointed to by buf is owned by this strbuf.  If this
        bit is not set, then the memory should never be freed, and
        (among other things) strbuf_detach() must always call xstrcpy().

    STRBUF_FIXED_MEMORY

        This strbuf can use the memory of length alloc at buf however
        it likes.  But it must never free() or realloc() the memory or
        move the string.  Essentially, buf should be treated as a
        constant pointer.  If the string overgrows the buffer, die().

The different bit combinations would have the following effects:

    flags == ~STRBUF_OWNS_MEMORY | ~STRBUF_FIXED_MEMORY

        The strbuf doesn't own the current memory, but it is allowed to
        grow the buffer into other (allocated) memory.  When needed,
        malloc() new memory, copy the old string to the new memory,
        and clear this bit -- but *don't* free the old memory.  Note
        that STRBUF_INIT, when buf is initialized to point at
        strbuf_slopbuf, would use this flag setting.  This flag
        combination could be used to wrap a stack-allocated buffer
        without preventing the string from growing beyond the
        size of the buffer:

        void f()
        {
            char path_buf[PATH_MAX];
            struct strbuf path;

            strbuf_wrap_preallocated(&path, path_buf, 0, sizeof(path));
            ...
            strbuf_release(&path);
        }

        (Probably the boilerplate could be reduced by using macros.)

    flags == STRBUF_OWNS_MEMORY | ~STRBUF_FIXED_MEMORY

        This gives the current behavior of a non-empty strbuf.

    flags == ~STRBUF_OWNS_MEMORY | STRBUF_FIXED_MEMORY

        This would make it possible to use a strbuf to wrap a
        fixed-size buffer that belongs to somebody else and must not
        be moved.  For example,

        void f(char *path_buf, size_t path_buf_len)
        {
            struct strbuf path;

            strbuf_wrap_fixed(&path, path_buf,
                              strlen(path_buf), path_buf_len);
            ...
            /*
             * no strbuf_release() required here, but if called it
             * is a NOOP
             */
        }

        (Probably the boilerplate could be reduced by using macros.)

        It would also be possible to use this mode to dynamically
        allocate a strbuf along with space for its string (i.e., to
        make do with one allocation instead of two):

            struct strbuf sb = xmalloc(sizeof(strbuf) + len + 1);
            sb.alloc = len + 1;
            sb.len = len;
            sb.flags = STRBUF_FIXED_MEMORY;
            sb.buf = (void *)sb + sizeof(strbuf);
            *sb.buf = '\0';

        If this is considered useful, it should of course be offered
        via a function.

    flags == STRBUF_OWNS_MEMORY | STRBUF_FIXED_MEMORY

        This combination seems less useful, but I leave it here for
        completeness.  It is basically a strbuf whose length is
        constrained to its currently-allocated size.

The nice thing is that I believe this new functionality could be
implemented without slowing down the most-used strbuf functions.  For
example, none of the inline functions would have to be changed.  The
flags would only have to be checked when doing memory-related operations
like strbuf_grow(), strbuf_detach(), etc.


Anyway, I just wanted to get the idea out there.  I'd like to get
feedback about whether people think this is a good idea.  If so, maybe
I'll add it to the wiki as a suggested project.

I've wanted to work on this myself, but realistically I'm not going to
have time in the near future.  It might be a good Ensimag project or a
future GSoC project (though it might be a bit "janitorial" to excite
students).  Improving the strbuf API in this way shouldn't be too
difficult, and it can be followed up by an almost limitless number of
fixes throughout the Git codebase to use the new functionality.

Of course if somebody is looking for a relatively straightforward
project, I think this would be a nice improvement: it makes strbufs
easier and cheaper to use and thereby reduces the temptation to do
manual memory management, which is all too prone to creating security
holes via buffer overflows.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

  reply	other threads:[~2014-04-18 13:50 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-08 15:44 Students project on Git (Ensimag) Matthieu Moy
2014-04-18 13:50 ` Michael Haggerty [this message]
2014-04-18 17:21   ` Project idea: strbuf allocation modes Junio C Hamano
2014-04-18 20:04     ` Michael Haggerty
2014-04-19  0:55       ` Junio C Hamano
2014-04-22  7:07   ` Matthieu Moy
2014-04-22  9:09     ` Michael Haggerty
2014-04-22 10:38       ` Matthieu Moy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53512DB6.1070600@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=Matthieu.Moy@grenoble-inp.fr \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).