Re: [PATCH] Btrfs: fix num_start_workers count if we fail to make an alloc


From: Al Viro <viro@ZenIV.linux.org.uk>
To: Josef Bacik <josef@redhat.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] Btrfs: fix num_start_workers count if we fail to make an alloc
Date: Fri, 18 Nov 2011 20:20:56 +0000
Message-ID: <20111118202056.GA2203@ZenIV.linux.org.uk>
In-Reply-To: <1321645134-3944-1-git-send-email-josef@redhat.com>

On Fri, Nov 18, 2011 at 02:38:54PM -0500, Josef Bacik wrote:
> Al pointed out that if we fail to start a worker for whatever reason (ENOMEM
> basically), we could leak our count for num_start_workers, and so we'd think we
> had more workers than we actually do.  This could cause us to shrink workers
> when we shouldn't or not start workers when we should.  So check the return
> value and if we failed fix num_start_workers and fall back.  Thanks,
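To make the accounting problem concrete, here is a minimal userspace sketch. It is a hypothetical model, not the kernel's struct btrfs_workers: `struct pool`, `create_worker()`, `start_worker_leaky()`, `start_worker_fixed()` and the `simulate_enomem` fault-injection switch are all invented for illustration, and locking is omitted. It shows how an unchecked failure leaves the "starting" count permanently inflated, and how checking the return value and undoing the bump keeps it honest.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a worker pool (no locking). */
struct pool {
	int num_workers;          /* workers actually running */
	int num_workers_starting; /* workers we have promised to start */
};

/* Hypothetical fault-injection switch: when true, creation hits ENOMEM. */
static bool simulate_enomem;

/* Stand-in for creating one worker; returns 0 or -ENOMEM. */
static int create_worker(struct pool *p)
{
	if (simulate_enomem)
		return -ENOMEM;
	p->num_workers++;
	return 0;
}

/* Leaky pattern: bump the count, then ignore a failed creation. */
static void start_worker_leaky(struct pool *p)
{
	p->num_workers_starting++;
	if (create_worker(p) == 0)
		p->num_workers_starting--; /* now counted in num_workers */
	/* on failure the increment is never undone: the count leaks */
}

/* Fixed pattern: check the return value and settle the count either way. */
static int start_worker_fixed(struct pool *p)
{
	int ret;

	p->num_workers_starting++;
	ret = create_worker(p);
	p->num_workers_starting--; /* undone on success and on failure */
	assert(p->num_workers_starting >= 0);
	return ret; /* on error the caller falls back to an existing worker */
}
```

With `simulate_enomem` set, one call to `start_worker_leaky()` leaves `num_workers_starting` at 1 with no worker behind it, which is exactly the "we'd think we had more workers than we actually do" symptom.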

It's actually uglier than that; consider check_pending_workers_create(),
where we
	* bump the num_start_workers;
	* call start_new_worker(), which can fail, and then we have the same
	  leak; if it doesn't fail, it schedules a call of start_new_worker_func();
	* when start_new_worker_func() runs, it does btrfs_start_workers(),
	  which can run into the same leak again (this time on another pool - one
	  we have as ->async_helper).
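The two-stage path above can be modelled roughly as follows. This is a hypothetical single-threaded sketch: `request_worker()` plays the role of check_pending_workers_create(), `queue_start()` of start_new_worker(), and `run_pending()` of the deferred start_new_worker_func(); the one-slot `pending` queue and the `fail_schedule`/`fail_create` switches are invented for illustration. It does not model which pool the deferred work runs on. It only shows that a bump taken in stage 0 leaks if either later stage fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool model (no locking). */
struct pool {
	int num_workers;
	int num_workers_starting;
};

/* Hypothetical fault injection for each stage. */
static bool fail_schedule; /* stage 1 cannot queue the deferred start */
static bool fail_create;   /* stage 2 hits ENOMEM creating the worker */

/* One-slot stand-in for the helper's work queue. */
static struct pool *pending;

/* Stage 1 (cf. start_new_worker()): queue the deferred creation. */
static int queue_start(struct pool *p)
{
	if (fail_schedule)
		return -ENOMEM;
	pending = p;
	return 0;
}

/* Stage 2 (cf. start_new_worker_func()): runs later. */
static void run_pending(void)
{
	struct pool *p = pending;

	if (!p)
		return;
	pending = NULL;
	if (fail_create)
		return;                /* second leak: count never dropped */
	p->num_workers++;
	p->num_workers_starting--;
}

/* Stage 0 (cf. check_pending_workers_create()): take the count. */
static void request_worker(struct pool *p)
{
	p->num_workers_starting++;
	(void)queue_start(p);          /* first leak: failure ignored */
}
```

After one failure at each stage and one success, the model reports one running worker but still believes two more are "starting".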

Worse, __btrfs_start_workers() does btrfs_stop_workers() on failure.  That,
to put it mildly, is using excessive force.  As far as I can see, it's
_never_ the right thing to do - __btrfs_start_workers() is always getting
1 as the second argument, so even calls from the mount path don't need that
kind of "kill ones we'd already created if we fail halfway through".  It
used to make sense when they had all been started at mount time, but now
it's useless in the best case (mount) and destructive elsewhere (when the
pool had already been non-empty).
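A small hypothetical model of why the teardown is destructive: `start_workers_with_teardown()` mimics the shape criticized above, with `stop_all_workers()` standing in for btrfs_stop_workers(). The `alloc_budget` fault injection and all names are invented for illustration. If the pool already has workers, a single failed allocation wipes them all out, even though the caller only asked for one more.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical pool model; only the worker count matters here. */
struct pool {
	int num_workers;
};

/* Hypothetical fault injection: number of allocations that still succeed. */
static int alloc_budget;

static int alloc_worker(void)
{
	if (alloc_budget <= 0)
		return -ENOMEM;
	alloc_budget--;
	return 0;
}

/* Stand-in for btrfs_stop_workers(): tears down every worker. */
static void stop_all_workers(struct pool *p)
{
	p->num_workers = 0;
}

/* Shape criticized in the message: any failure stops the whole pool,
 * including workers that were already running before this call. */
static int start_workers_with_teardown(struct pool *p, int n)
{
	for (int i = 0; i < n; i++) {
		if (alloc_worker()) {
			stop_all_workers(p);
			return -ENOMEM;
		}
		p->num_workers++;
	}
	return 0;
}
```

Starting from a pool of three workers with `alloc_budget = 0`, one call asking for a single extra worker leaves the pool empty.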

So I'd suggest killing that call of btrfs_stop_workers() completely, dropping
the num_workers argument (along with the loop in __btrfs_start_workers())
and looking into the check_pending_workers_create() path.
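A rough sketch of the shape suggested above, again as a hypothetical userspace model: `start_one_worker()` takes no count argument, has no loop and does no teardown, so a failure leaves existing workers alone; `stop_all_workers()` is reserved for destroying the pool. The `struct worker` list and all names are invented for illustration and say nothing about how the actual patch would look.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical worker record; a kernel worker would also own a task. */
struct worker {
	struct worker *next;
};

struct pool {
	struct worker *idle; /* singly linked list of idle workers */
	int num_workers;
};

/* Start exactly one worker. Nothing partial can exist on failure,
 * so there is nothing to unwind and existing workers are untouched. */
static int start_one_worker(struct pool *p)
{
	struct worker *w = calloc(1, sizeof(*w));

	if (!w)
		return -ENOMEM;
	w->next = p->idle;
	p->idle = w;
	p->num_workers++;
	return 0;
}

/* Used only when the pool itself is being destroyed. */
static void stop_all_workers(struct pool *p)
{
	while (p->idle) {
		struct worker *w = p->idle;

		p->idle = w->next;
		free(w);
	}
	p->num_workers = 0;
}
```

Under this shape a mount-time caller that gets an error just reports it and lets the normal pool teardown free whatever was created, so no "kill ones we'd already created" logic is needed inside the start function.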

I'd probably put the decrement of ->num_workers_starting into the failure exits of
__btrfs_start_workers() and start_new_worker(), but... can btrfs_queue_worker()
ever return non-zero?  AFAICS it can't, and we could just merge
start_new_worker() into check_pending_workers() and pull the allocation before
incrementing ->num_workers_starting...
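The "allocate first, then take the count" ordering can be sketched like this. It is a hypothetical single-threaded model: `maybe_start_worker()` stands in for the merged check-and-start function, `struct start_req` for the deferred-start record, and the one-slot `queued` field for the helper queue, which this model assumes cannot fail (the point the message raises about btrfs_queue_worker()). Real code would also need to hold the pool's lock across the limit check and the increment. Because the only fallible step happens before the bump, no failure exit has anything to undo.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct start_req; /* hypothetical deferred-start record */

struct pool {
	int num_workers;
	int num_workers_starting;
	int max_workers;
	struct start_req *queued; /* one-slot stand-in for the helper queue */
};

struct start_req {
	struct pool *pool;
};

/* Hypothetical fault-injection switch for the allocation. */
static bool simulate_enomem;

static struct start_req *alloc_req(void)
{
	return simulate_enomem ? NULL : calloc(1, sizeof(struct start_req));
}

/* Merged check-and-start: allocate before touching the count. */
static int maybe_start_worker(struct pool *p)
{
	struct start_req *req;

	if (p->queued)
		return 0;            /* a start is already pending */
	if (p->num_workers + p->num_workers_starting >= p->max_workers)
		return 0;            /* pool is full, nothing to do */
	req = alloc_req();
	if (!req)
		return -ENOMEM;      /* count untouched: nothing to undo */
	req->pool = p;
	p->num_workers_starting++;
	p->queued = req;         /* queueing cannot fail in this model */
	return 0;
}

/* Deferred half: create the worker and settle the count. */
static void run_queued(struct pool *p)
{
	struct start_req *req = p->queued;

	if (!req)
		return;
	p->queued = NULL;
	p->num_workers++;
	p->num_workers_starting--;
	assert(p->num_workers_starting >= 0);
	free(req);
}
```

The alternative mentioned first, decrementing in every failure exit, also works, but it has to be repeated at each exit; moving the allocation ahead of the increment removes those exits altogether.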

Thread overview: 4+ messages
2011-11-18 19:38 [PATCH] Btrfs: fix num_start_workers count if we fail to make an alloc Josef Bacik
2011-11-18 20:20 ` Al Viro [this message]
2011-11-19  1:37   ` Al Viro
2011-11-19  2:12     ` Al Viro
