Re: [PATCH] receive-pack: plug minor memory leak in unpack()

From: "René Scharfe" <l.s.r@web.de>
To: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH] receive-pack: plug minor memory leak in unpack()
Date: Sun, 19 Oct 2014 13:13:30 +0200
Message-ID: <54439CDA.9070804@web.de>
In-Reply-To: <20141014091628.GB16686@peff.net>

Am 14.10.2014 um 11:16 schrieb Jeff King:
> On Mon, Oct 13, 2014 at 12:08:09PM -0700, Junio C Hamano wrote:
>
>>> I wonder if run-command should provide a managed env array similar
>>> to the "args" array.

That's a good idea.

>>
>> I took a look at a few of them:
>
> I took a brief look, too.
>
> I had hoped we could just call it "env" and everybody would be happy
> using it instead of bare pointers. But quite a few callers assign
> "local_repo_env" to child_process.env. We could copy it into an
> argv_array, of course, but that really feels like working around the
> interface. So I think we would prefer to keep both forms available.

We could add a flag (clear_local_repo_env?) and reference local_repo_env 
only in run-command.c for these cases.  But some other cases remain that 
are better off providing their own array, like in daemon.c.

> That raises the question: what should it be called? The "argv_array"
> form of "argv" is called "args". The more I see it, the more I hate that
> name, as the two are easily confused. We could have:
>
>    const char **argv;
>    struct argv_array argv_array;
>    const char **env;
>    struct argv_array env_array;
>
> though "argv_array" is a lot to type when you have a string of
> argv_array_pushf() calls (which are already by themselves kind of
> verbose). Maybe that's not too big a deal, though.

I actually like args and argv. :)  Mixing them up is noticed by the 
compiler, so any confusion is cleared up rather quickly.

> We could flip it to give the managed version the short name (and calling
> the unmanaged version "env_ptr" or something). That would require
> munging the existing callers, but the tweak would be simple.

Perhaps, but I'm a bit skeptical of big renames.  Let's start small and 
add env_array, and see how far we get with that.
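
To make the "both forms available" idea concrete, here is a minimal,
self-contained sketch. It is not the run-command.c code: struct str_array,
struct child and every function name below are hypothetical stand-ins for
argv_array and child_process, and the rule that an explicit env pointer
takes precedence over the managed array is an assumption made for the
illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for argv_array: a NULL-terminated, growable
 * list of strings that the array itself owns. */
struct str_array {
	char **v;
	size_t nr, alloc;
};

/* Append a private copy of s, keeping the list NULL-terminated. */
static void str_array_push(struct str_array *a, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	memcpy(copy, s, len);

	if (a->nr + 2 > a->alloc) {	/* room for the entry plus NULL */
		a->alloc = a->alloc ? 2 * a->alloc : 8;
		a->v = realloc(a->v, a->alloc * sizeof(*a->v));
		if (!a->v)
			abort();
	}
	a->v[a->nr++] = copy;
	a->v[a->nr] = NULL;
}

/* Free all owned strings and reset to the empty state. */
static void str_array_clear(struct str_array *a)
{
	size_t i;

	for (i = 0; i < a->nr; i++)
		free(a->v[i]);
	free(a->v);
	a->v = NULL;
	a->nr = a->alloc = 0;
}

/* Hypothetical, cut-down child_process carrying both forms of env. */
struct child {
	const char *const *env;		/* unmanaged: caller owns the memory */
	struct str_array env_array;	/* managed: freed by child_cleanup() */
};

/* Assumed rule: an explicit env pointer wins; otherwise fall back to
 * the managed array (NULL when nothing was pushed). */
static const char *const *child_effective_env(const struct child *c)
{
	if (c->env)
		return c->env;
	return (const char *const *)c->env_array.v;
}

/* Release only what the struct owns; the unmanaged pointer is dropped,
 * not freed. */
static void child_cleanup(struct child *c)
{
	str_array_clear(&c->env_array);
	c->env = NULL;
}
```

With this shape, callers that assign a shared table such as local_repo_env
keep using the bare pointer, while callers that build their environment at
run time push into env_array and no longer have to free anything themselves.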

>>   - daemon.c::handle() uses a static set of environment variables
>>     that are not built with argv_array().  Same for argv.
>
> Most of the callers you mentioned are good candidates. This one is
> tricky.
>
> The argv array gets malloc'd and set up by the parent git-daemon
> process. Then each time we get a client, we create a new struct
> child_process that references it. So using the managed argv-array would
> actually be a bit more complicated (and not save any memory; we just
> always point to the single copy for each child).
>
> For the environment, we build it in a function-local buffer, point the
> child_process's env field at it, start the child, and then copy the
> child_process into our global list of children. When we notice a child
> is dead (by linearly going through the list with waitpid), we free the
> list entry. So there are a few potentially bad things here:
>
>    1. We memcpy the child_process to put it on the list. Which does work,
>       though it feels a little like we are violating the abstraction
>       barrier.
>
>    2. The child_process in the list points to the local "env" buffer that
>       is no longer valid. There's no bug because we don't ever look at
>       it. Moving to a managed env would fix that. But I have to wonder if
>       we even want to be keeping the "struct child_process" around in the
>       first place (all we really care about is the pid).
>
>    3. If we do move to a managed env, then we expect it to get cleaned up
>       in finish_command. But we never call that; we just free the list
>       memory containing the child_process. We would want to call
>       finish_command. Except that we will have reaped the process already
>       with our call to waitpid() from check_dead_children. So we'd need a
>       new call to do just the cleanup without the wait, I guess.
>
>    4. For every loop on the listen socket, we call waitpid() for each
>       living child, which is a bit wasteful. We'd probably do better to
>       call waitpid(0, &status, WNOHANG), and then look up the resulting
>       pid in a hashmap or sorted list when we actually see something that
>       died. I don't know that this is a huge problem in practice. We use
>       git-daemon pretty heavily on our backend servers at GitHub, and it
>       seems to use about 5% of a CPU constantly on each machine. Which is
>       kind of lame, given that it isn't doing anything at all, but is
>       hardly earth-shattering.
>
> So I'm not sure if it is worth converting to a managed env. There's a
> bit of ugliness, but none of it is causing any problems, and doing so
> opens a can of worms. The most interesting thing to fix (to me, anyway)
> is number 4, but that has nothing to do with the env in the first place.
> :)

Trickiness makes me nervous, especially in daemon.c.  And 5% CPU usage 
just for waiting sounds awful.  Using waitpid(0, ...) is not supported 
by the current implementation in compat/mingw.c, however.

I agree that env handling should only be changed after the wait loop has 
been improved.
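
For reference, the improved wait loop from point 4 could look roughly like
the sketch below. It is an illustration only, not daemon.c: struct
child_list and all function names are hypothetical, and only the pid is
kept per child, as point 2 suggests. It calls waitpid(-1, ...), which waits
for any child; waitpid(0, ...) as written above is limited to children in
the caller's process group. A plain linear scan is used for the lookup,
which only runs when a child has actually died; a hashmap or sorted list
could replace it.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping: only the pid is remembered per child. */
struct child_list {
	pid_t *pids;
	size_t nr, alloc;
};

/* Remember a freshly started child. */
static void child_list_add(struct child_list *l, pid_t pid)
{
	if (l->nr == l->alloc) {
		l->alloc = l->alloc ? 2 * l->alloc : 16;
		l->pids = realloc(l->pids, l->alloc * sizeof(*l->pids));
		if (!l->pids)
			abort();
	}
	l->pids[l->nr++] = pid;
}

/* Forget a child; returns 1 if the pid was on the list, 0 otherwise. */
static int child_list_remove(struct child_list *l, pid_t pid)
{
	size_t i;

	for (i = 0; i < l->nr; i++) {
		if (l->pids[i] == pid) {
			/* order does not matter: move the last entry here */
			l->pids[i] = l->pids[--l->nr];
			return 1;
		}
	}
	return 0;
}

/* Reap every child that has already exited, without blocking.
 * Returns the number of known children that were reaped. When no child
 * has died this costs a single waitpid() call, independent of how many
 * children are alive. */
static int reap_dead_children(struct child_list *l)
{
	int status, reaped = 0;
	pid_t pid;

	while ((pid = waitpid(-1, &status, WNOHANG)) > 0)
		if (child_list_remove(l, pid))
			reaped++;
	return reaped;
}
```

The accept loop would call reap_dead_children() once per iteration instead
of polling each living child separately.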

By the way, does getaddrinfo(3) show up in your profiles much?  Recently 
I looked into calling it only on demand instead of for all incoming 
connections because doing that unconditionally with a user-supplied 
("tainted") hostname just felt wrong.  The resulting patch series turned 
out to be not very pretty and I didn't see any performance improvements 
in my very limited tests, however; not sure if it's worth it.
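
The on-demand lookup can be pictured with a small sketch. This is not the
patch series mentioned above; struct lazy_host and its functions are
hypothetical, and the lookups counter exists only to make the laziness
visible. The name is resolved on first use and the result, including a
failed lookup, is cached afterwards.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical per-connection host info, resolved only when needed. */
struct lazy_host {
	const char *name;	/* client-supplied, untrusted */
	struct addrinfo *ai;	/* NULL until resolved, or on failure */
	int resolved;		/* 1 once a lookup has been attempted */
	int lookups;		/* number of getaddrinfo() calls made */
};

/* Return the address list for the host, calling getaddrinfo() at most
 * once per connection. Returns NULL if the lookup failed. */
static struct addrinfo *lazy_host_addr(struct lazy_host *h)
{
	if (!h->resolved) {
		struct addrinfo hints;

		memset(&hints, 0, sizeof(hints));
		hints.ai_family = AF_UNSPEC;
		hints.ai_socktype = SOCK_STREAM;

		h->lookups++;
		if (getaddrinfo(h->name, NULL, &hints, &h->ai))
			h->ai = NULL;
		h->resolved = 1;
	}
	return h->ai;
}

/* Free the cached result, if any, and return to the unresolved state. */
static void lazy_host_release(struct lazy_host *h)
{
	if (h->ai)
		freeaddrinfo(h->ai);
	h->ai = NULL;
	h->resolved = 0;
}
```

A connection that never needs the resolved address then never triggers a
lookup on the tainted name at all.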

René
