All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@fb.com>
Cc: Christoph Hellwig <hch@lst.de>, linux-raid@vger.kernel.org
Subject: Re: [RFC] using mempools for raid5-cache
Date: Wed, 09 Dec 2015 17:34:45 +1100	[thread overview]
Message-ID: <87vb88p1tm.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20151209012812.GA2403138@devbig084.prn1.facebook.com>

[-- Attachment #1: Type: text/plain, Size: 2428 bytes --]

On Wed, Dec 09 2015, Shaohua Li wrote:

> On Wed, Dec 09, 2015 at 11:36:30AM +1100, NeilBrown wrote:
>> On Thu, Dec 03 2015, Christoph Hellwig wrote:
>> 
>> > Currently the raid5-cache code is heavily relying on GFP_NOFAIL allocations.
>> >
>> > I've looked into replacing these with mempools and biosets, and for the
>> > bio and the meta_page that's pretty trivial as they have short life times
>> > and do make guaranteed progress.  I'm massively struggling with the iounit
>> > allocation, though.  These can live on for a long time over log I/O, cache
>> > flushing and last but not least RAID I/O, and every attempt at something
>> > mempool-like results in reproducible deadlocks.  I wonder if we need to
>> > figure out some more efficient data structure to communicate the completion
>> > status that doesn't rely on these fairly long living allocations from
>> > the I/O path.
>> 
>> Presumably the root cause of these deadlocks is that the raid5d thread
>> has called
>>    handle_stripe -> ops_run_io ->r5l_write_stripe -> r5l_log_stripe
>>       -> r5l_get_meta -> r5l_new_meta
>> 
>> and r5l_new_meta is blocked on memory allocation, which won't complete
>> until some raid5 stripes get written out, which requires raid5d to do
>> something more useful than sitting and waiting.
>> 
>> I suspect a good direction towards a solution would be to allow the
>> memory allocation to fail, to cleanly propagate that failure indication
>> up through r5l_log_stripe to r5l_write_stripe which falls back to adding
>> the stripe_head to ->no_space_stripes.
>> 
>> Then we only release stripes from no_space_stripes when a memory
>> allocation might succeed.
>> 
>> There are lots of missing details, and possibly we would need a separate
>> list rather than re-using no_space_stripes.
>> But the key idea is that raid5d should never block (except beneath
>> submit_bio on some other device) and when it cannot make progress
>> without blocking, it should queue the stripe_head for later handling.
>> 
>> Does that make sense?
>
> It does remove the scary __GFP_NOFAIL, but the approach is essentially
> idential to a 'retry after allocation failure'. Why not just let the mm
> (with __GFP_NOFAIL) to do the retry then?
>

Because deadlocks.

If raid5d is waiting for the mm to allocated memory, then it cannot
retire write requests which could free up memory.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

  reply	other threads:[~2015-12-09  6:34 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-02 16:10 [RFC] using mempools for raid5-cache Christoph Hellwig
2015-12-02 16:10 ` [PATCH 1/2] raid5-cache: use a bio_set Christoph Hellwig
2015-12-03  5:01   ` Shaohua Li
2015-12-08 23:22   ` NeilBrown
2015-12-14 21:12     ` Christoph Hellwig
2015-12-02 16:10 ` [PATCH 2/2] raid5-cache: use a mempool for the metadata block Christoph Hellwig
2015-12-08 23:27   ` NeilBrown
2015-12-03  4:49 ` [RFC] using mempools for raid5-cache Shaohua Li
2015-12-09  0:36 ` NeilBrown
2015-12-09  1:28   ` Shaohua Li
2015-12-09  6:34     ` NeilBrown [this message]
2015-12-10 23:40       ` Shaohua Li
2015-12-11  0:09         ` NeilBrown
2015-12-11  1:10           ` Shaohua Li
2015-12-11  1:56             ` NeilBrown
2015-12-09 13:51     ` Wols Lists

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87vb88p1tm.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=hch@lst.de \
    --cc=linux-raid@vger.kernel.org \
    --cc=shli@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.