Re: [PATCH] receive-pack: plug minor memory leak in unpack()


From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH] receive-pack: plug minor memory leak in unpack()
Date: Sun, 19 Oct 2014 13:13:30 +0200
Message-ID: <54439CDA.9070804@web.de>
In-Reply-To: <20141014091628.GB16686@peff.net>

On 14.10.2014 at 11:16, Jeff King wrote:
> On Mon, Oct 13, 2014 at 12:08:09PM -0700, Junio C Hamano wrote:
>
>>> I wonder if run-command should provide a managed env array similar
>>> to the "args" array.

That's a good idea.

>>
>> I took a look at a few of them:
>
> I took a brief look, too.
>
> I had hoped we could just call it "env" and everybody would be happy
> using it instead of bare pointers. But quite a few callers assign
> "local_repo_env" to child_process.env. We could copy it into an
> argv_array, of course, but that really feels like working around the
> interface. So I think we would prefer to keep both forms available.

We could add a flag (clear_local_repo_env?) and reference local_repo_env 
only in run-command.c for these cases.  But some other cases remain that 
are better off providing their own array, like in daemon.c.

> That raises the question: what should it be called? The "argv_array"
> form of "argv" is called "args". The more I see it, the more I hate that
> name, as the two are easily confused. We could have:
>
>    const char **argv;
>    struct argv_array argv_array;
>    const char **env;
>    struct argv_array env_array;
>
> though "argv_array" is a lot to type when you have a string of
> argv_array_pushf() calls (which are already by themselves kind of
> verbose). Maybe that's not too big a deal, though.

I actually like args and argv. :)  Mixing them up is noticed by the 
compiler, so any confusion is cleared up rather quickly.

> We could flip it to give the managed version the short name (and calling
> the unmanaged version "env_ptr" or something). That would require
> munging the existing callers, but the tweak would be simple.

Perhaps, but I'm a bit skeptical of big renames.  Let's start small and 
add env_array, and see how far we get with that.
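
To make the "both forms available" idea concrete, here is a minimal,
self-contained sketch. It does not use git's real types: str_array,
toy_child and every function below are hypothetical stand-ins for
argv_array and struct child_process. The sketch also assumes that a
bare env pointer, when set, takes precedence over the managed array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for git's argv_array: a growable, NULL-terminated string list. */
struct str_array {
	char **items;
	size_t nr, alloc;
};

static void str_array_push(struct str_array *a, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, s, len);
	if (a->nr + 2 > a->alloc) {
		/* keep one spare slot for the terminating NULL */
		a->alloc = a->alloc ? 2 * a->alloc : 8;
		a->items = realloc(a->items, a->alloc * sizeof(*a->items));
		if (!a->items)
			abort();
	}
	a->items[a->nr++] = copy;
	a->items[a->nr] = NULL;
}

static void str_array_clear(struct str_array *a)
{
	for (size_t i = 0; i < a->nr; i++)
		free(a->items[i]);
	free(a->items);
	a->items = NULL;
	a->nr = a->alloc = 0;
}

/* Hypothetical, trimmed-down child_process that carries both forms. */
struct toy_child {
	const char *const *env;     /* caller-owned, never freed here */
	struct str_array env_array; /* owned by the struct, freed on cleanup */
};

/* Assumption of this sketch: a bare env pointer wins if it is set. */
static const char *const *toy_child_env(const struct toy_child *c)
{
	if (c->env)
		return c->env;
	return (const char *const *)c->env_array.items;
}

/* Release only what the struct owns; a bare env is merely forgotten. */
static void toy_child_clear(struct toy_child *c)
{
	str_array_clear(&c->env_array);
	c->env = NULL;
}

/* Usage: build a managed environment, read it back, release it. */
static int toy_child_demo(void)
{
	struct toy_child c = { NULL, { NULL, 0, 0 } };

	str_array_push(&c.env_array, "GIT_DIR=.git");
	str_array_push(&c.env_array, "LANG=C");
	assert(!strcmp(toy_child_env(&c)[1], "LANG=C"));
	assert(toy_child_env(&c)[2] == NULL);
	toy_child_clear(&c);
	return c.env_array.nr == 0 ? 0 : -1;
}
```

Callers with a static environment, like the ones in daemon.c, would keep
assigning env directly and pay nothing for the managed member.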

>>   - daemon.c::handle() uses a static set of environment variables
>>     that are not built with argv_array().  Same for argv.
>
> Most of the callers you mentioned are good candidates. This one is
> tricky.
>
> The argv array gets malloc'd and set up by the parent git-daemon
> process. Then each time we get a client, we create a new struct
> child_process that references it. So using the managed argv-array would
> actually be a bit more complicated (and not save any memory; we just
> always point to the single copy for each child).
>
> For the environment, we build it in a function-local buffer, point the
> child_process's env field at it, start the child, and then copy the
> child_process into our global list of children. When we notice a child
> is dead (by linearly going through the list with waitpid), we free the
> list entry. So there are a few potentially bad things here:
>
>    1. We memcpy the child_process to put it on the list. Which does work,
>       though it feels a little like we are violating the abstraction
>       barrier.
>
>    2. The child_process in the list points to the local "env" buffer that
>       is no longer valid. There's no bug because we don't ever look at
>       it. Moving to a managed env would fix that. But I have to wonder if
>       we even want to be keeping the "struct child_process" around in the
>       first place (all we really care about is the pid).
>
>    3. If we do move to a managed env, then we expect it to get cleaned up
>       in finish_command. But we never call that; we just free the list
>       memory containing the child_process. We would want to call
>       finish_command. Except that we will have reaped the process already
>       with our call to waitpid() from check_dead_children. So we'd need a
>       new call to do just the cleanup without the wait, I guess.
>
>    4. For every loop on the listen socket, we call waitpid() for each
>       living child, which is a bit wasteful. We'd probably do better to
>       call waitpid(0, &status, WNOHANG), and then look up the resulting
>       pid in a hashmap or sorted list when we actually see something that
>       died. I don't know that this is a huge problem in practice. We use
>       git-daemon pretty heavily on our backend servers at GitHub, and it
>       seems to use about 5% of a CPU constantly on each machine. Which is
>       kind of lame, given that it isn't doing anything at all, but is
>       hardly earth-shattering.
>
> So I'm not sure if it is worth converting to a managed env. There's a
> bit of ugliness, but none of it is causing any problems, and doing so
> opens a can of worms. The most interesting thing to fix (to me, anyway)
> is number 4, but that has nothing to do with the env in the first place.
> :)

Trickiness makes me nervous, especially in daemon.c.  And 5% CPU usage 
just for waiting sounds awful.  Using waitpid(0, ...) is not supported 
by the current implementation in compat/mingw.c, however.

I agree that env handling should only be changed after the wait loop has 
been improved.
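
For illustration, the wait loop suggested in point 4 could look roughly
like the sketch below. It is POSIX-only and not taken from daemon.c:
child_table, reap_dead_children() and reap_demo() are hypothetical
names, and a small unsorted array stands in for the hashmap or sorted
list mentioned in the mail. The point is that an idle pass costs a
single waitpid() call no matter how many children are alive.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_CHILDREN 64

/* Hypothetical bookkeeping: only the pids of live children are kept. */
struct child_table {
	pid_t pids[MAX_CHILDREN];
	size_t nr;
};

static int child_table_add(struct child_table *t, pid_t pid)
{
	if (t->nr == MAX_CHILDREN)
		return -1;
	t->pids[t->nr++] = pid;
	return 0;
}

/* Linear lookup, but only performed when a child has actually died. */
static int child_table_remove(struct child_table *t, pid_t pid)
{
	for (size_t i = 0; i < t->nr; i++) {
		if (t->pids[i] == pid) {
			t->pids[i] = t->pids[--t->nr];
			return 1;
		}
	}
	return 0;
}

/*
 * Reap every child that has already exited, without blocking.
 * waitpid(0, ...) matches children in the caller's process group;
 * -1 would match any child. Returns the number of table entries removed.
 */
static int reap_dead_children(struct child_table *t)
{
	int status, reaped = 0;
	pid_t pid;

	while ((pid = waitpid(0, &status, WNOHANG)) > 0)
		if (child_table_remove(t, pid))
			reaped++;
	return reaped;
}

/* Usage: start three short-lived children and poll until all are reaped. */
static int reap_demo(void)
{
	struct child_table t = { { 0 }, 0 };
	struct timespec pause = { 0, 1000000 }; /* 1 ms between polls */
	int total = 0;

	for (int i = 0; i < 3; i++) {
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (!pid)
			_exit(0);
		child_table_add(&t, pid);
	}
	/* bounded polling so the demo cannot spin forever */
	for (int tries = 0; t.nr && tries < 5000; tries++) {
		total += reap_dead_children(&t);
		if (t.nr)
			nanosleep(&pause, NULL);
	}
	return total;
}
```

A real daemon would call reap_dead_children() once per iteration of its
accept loop; the polling in reap_demo() only exists to make the sketch
self-contained.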

By the way, does getaddrinfo(3) show up in your profiles much?  Recently 
I looked into calling it only on demand instead of for all incoming 
connections because doing that unconditionally with a user-supplied 
("tainted") hostname just felt wrong.  The resulting patch series turned 
out to be not very pretty and I didn't see any performance improvements 
in my very limited tests, however; not sure if it's worth it.
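
For readers who want to picture "only on demand": the sketch below is
not the patch series mentioned above, just one possible shape of the
idea. struct lazy_host and lazy_host_ip() are hypothetical names. The
hostname is stored as received, and getaddrinfo(3) runs at most once,
the first time the address is actually requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>

/* Hypothetical per-connection record with a lazily resolved address. */
struct lazy_host {
	const char *hostname;      /* as sent by the client; untrusted */
	int lookups;               /* number of getaddrinfo() calls so far */
	char ip[INET6_ADDRSTRLEN]; /* numeric address, "" if unresolved */
};

/* Resolve on first use only; later calls return the cached string. */
static const char *lazy_host_ip(struct lazy_host *h)
{
	if (!h->lookups) {
		struct addrinfo hints, *ai;

		memset(&hints, 0, sizeof(hints));
		hints.ai_family = AF_UNSPEC;
		hints.ai_socktype = SOCK_STREAM;
		h->lookups++;
		h->ip[0] = '\0';
		if (!getaddrinfo(h->hostname, NULL, &hints, &ai)) {
			/* on failure, leave the cached address empty */
			if (getnameinfo(ai->ai_addr, ai->ai_addrlen,
					h->ip, sizeof(h->ip), NULL, 0,
					NI_NUMERICHOST))
				h->ip[0] = '\0';
			freeaddrinfo(ai);
		}
	}
	return h->ip;
}
```

Connections whose handling never asks for the address would then skip
the lookup entirely.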

René
