linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@fb.com>
Cc: Christoph Hellwig <hch@lst.de>, linux-raid@vger.kernel.org
Subject: Re: [RFC] using mempools for raid5-cache
Date: Wed, 09 Dec 2015 17:34:45 +1100	[thread overview]
Message-ID: <87vb88p1tm.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <20151209012812.GA2403138@devbig084.prn1.facebook.com>

[-- Attachment #1: Type: text/plain, Size: 2428 bytes --]

On Wed, Dec 09 2015, Shaohua Li wrote:

> On Wed, Dec 09, 2015 at 11:36:30AM +1100, NeilBrown wrote:
>> On Thu, Dec 03 2015, Christoph Hellwig wrote:
>> 
>> > Currently the raid5-cache code is heavily relying on GFP_NOFAIL allocations.
>> >
>> > I've looked into replacing these with mempools and biosets, and for the
>> > bio and the meta_page that's pretty trivial as they have short life times
>> > and do make guaranteed progress.  I'm massively struggling with the iounit
>> > allocation, though.  These can live on for a long time over log I/O, cache
>> > flushing and last but not least RAID I/O, and every attempt at something
>> > mempool-like results in reproducible deadlocks.  I wonder if we need to
>> > figure out some more efficient data structure to communicate the completion
>> > status that doesn't rely on these fairly long living allocations from
>> > the I/O path.
>> 
>> Presumably the root cause of these deadlocks is that the raid5d thread
>> has called
>>    handle_stripe -> ops_run_io ->r5l_write_stripe -> r5l_log_stripe
>>       -> r5l_get_meta -> r5l_new_meta
>> 
>> and r5l_new_meta is blocked on memory allocation, which won't complete
>> until some raid5 stripes get written out, which requires raid5d to do
>> something more useful than sitting and waiting.
>> 
>> I suspect a good direction towards a solution would be to allow the
>> memory allocation to fail, to cleanly propagate that failure indication
>> up through r5l_log_stripe to r5l_write_stripe which falls back to adding
>> the stripe_head to ->no_space_stripes.
>> 
>> Then we only release stripes from no_space_stripes when a memory
>> allocation might succeed.
>> 
>> There are lots of missing details, and possibly we would need a separate
>> list rather than re-using no_space_stripes.
>> But the key idea is that raid5d should never block (except beneath
>> submit_bio on some other device) and when it cannot make progress
>> without blocking, it should queue the stripe_head for later handling.
>> 
>> Does that make sense?
>
> It does remove the scary __GFP_NOFAIL, but the approach is essentially
> idential to a 'retry after allocation failure'. Why not just let the mm
> (with __GFP_NOFAIL) to do the retry then?
>

Because deadlocks.

If raid5d is waiting for the mm to allocated memory, then it cannot
retire write requests which could free up memory.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

  reply	other threads:[~2015-12-09  6:34 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-02 16:10 [RFC] using mempools for raid5-cache Christoph Hellwig
2015-12-02 16:10 ` [PATCH 1/2] raid5-cache: use a bio_set Christoph Hellwig
2015-12-03  5:01   ` Shaohua Li
2015-12-08 23:22   ` NeilBrown
2015-12-14 21:12     ` Christoph Hellwig
2015-12-02 16:10 ` [PATCH 2/2] raid5-cache: use a mempool for the metadata block Christoph Hellwig
2015-12-08 23:27   ` NeilBrown
2015-12-03  4:49 ` [RFC] using mempools for raid5-cache Shaohua Li
2015-12-09  0:36 ` NeilBrown
2015-12-09  1:28   ` Shaohua Li
2015-12-09  6:34     ` NeilBrown [this message]
2015-12-10 23:40       ` Shaohua Li
2015-12-11  0:09         ` NeilBrown
2015-12-11  1:10           ` Shaohua Li
2015-12-11  1:56             ` NeilBrown
2015-12-09 13:51     ` Wols Lists

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87vb88p1tm.fsf@notabene.neil.brown.name \
    --to=neilb@suse.com \
    --cc=hch@lst.de \
    --cc=linux-raid@vger.kernel.org \
    --cc=shli@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).