From: NeilBrown <neilb@suse.com>
To: shli@fb.com
Cc: linux-raid@vger.kernel.org,
Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Subject: Re: [PATCH v5 3/7] raid5-ppl: Partial Parity Log write logging implementation
Date: Wed, 22 Mar 2017 09:00:47 +1100
Message-ID: <87wpbib88g.fsf@notabene.neil.brown.name>
In-Reply-To: <20170309090003.13298-4-artur.paszkiewicz@intel.com>
On Thu, Mar 09 2017, Artur Paszkiewicz wrote:
> Implement the calculation of partial parity for a stripe and PPL write
> logging functionality. The description of PPL is added to the
> documentation. More details can be found in the comments in raid5-ppl.c.
>
> Attach a page for holding the partial parity data to stripe_head.
> Allocate it only if mddev has the MD_HAS_PPL flag set.
>
> Partial parity is the xor of the not-modified data chunks of a stripe and
> is calculated as follows:
>
> - reconstruct-write case:
> xor data from all not updated disks in a stripe
>
> - read-modify-write case:
> xor old data and parity from all updated disks in a stripe
>
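A quick sanity check that the two rules above agree, with illustrative
symbols: take 3 data disks D0..D2, parity P = D0 ^ D1 ^ D2 (old
values), and suppose only D0 is being rewritten:

        RCW: PP = D1 ^ D2                (xor of the not-updated disks)
        RMW: PP = D0 ^ P
                = D0 ^ (D0 ^ D1 ^ D2)
                = D1 ^ D2

Both cases reduce to the xor of the unmodified data chunks.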
> Implement it using the async_tx API and integrate into raid_run_ops().
> It must be called when we still have access to old data, so do it when
> STRIPE_OP_BIODRAIN is set, but before ops_run_prexor5(). The result is
> stored into sh->ppl_page.
>
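Roughly, the ordering constraint in raid_run_ops() would look like the
sketch below (illustrative only, not the actual diff; the
ops_run_partial_parity() name is assumed here):

        /* compute the partial parity while the old data is intact */
        if (test_bit(STRIPE_OP_BIODRAIN, &ops_request))
                tx = ops_run_partial_parity(sh, percpu, tx);

        /* the prexor then xors the old data into the parity buffer,
         * consuming the last copy of the old data */
        if (test_bit(STRIPE_OP_PREXOR, &ops_request))
                tx = ops_run_prexor5(sh, percpu, tx);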
> Partial parity is not meaningful for a full stripe write and is not stored
> in the log or used for recovery, so don't attempt to calculate it when the
> stripe has STRIPE_FULL_WRITE.
>
> Put the PPL metadata structures in md_p.h because userspace tools
> (mdadm) will also need to read/write PPL.
>
> For now, warn about using PPL with a disk volatile write-back cache
> enabled. The warning can be removed once disk cache flushing before
> writing the PPL is implemented.
>
> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Sorry for the delay in getting to this for review...
> +static struct ppl_io_unit *ppl_new_iounit(struct ppl_log *log,
> + struct stripe_head *sh)
> +{
> + struct ppl_conf *ppl_conf = log->ppl_conf;
> + struct ppl_io_unit *io;
> + struct ppl_header *pplhdr;
> +
> + io = mempool_alloc(ppl_conf->io_pool, GFP_ATOMIC);
> + if (!io)
> + return NULL;
> +
> + memset(io, 0, sizeof(*io));
> + io->log = log;
> + INIT_LIST_HEAD(&io->log_sibling);
> + INIT_LIST_HEAD(&io->stripe_list);
> + atomic_set(&io->pending_stripes, 0);
> + bio_init(&io->bio, io->biovec, PPL_IO_INLINE_BVECS);
> +
> + io->header_page = mempool_alloc(ppl_conf->meta_pool, GFP_NOIO);
I'm trying to understand how these two mempool_alloc()s relate, and
particularly why the first one needs to be GFP_ATOMIC, while the second
one can safely be GFP_NOIO.
I see that the allocated memory is freed in different places:
header_page is freed from the bi_end_io function as soon as the write
completes, while 'io' is freed later. But I'm not sure that is enough
to make it safe.
When working with mempools, you need to assume that the pool only
contains one element, and that every time you call mempool_alloc(), it
waits for that one element to be available. While that doesn't usually
happen, it is possible, and if that case isn't handled correctly, the
system can deadlock.
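For instance, a pool created with

        pool = mempool_create_kmalloc_pool(1, sizeof(struct ppl_io_unit));

guarantees exactly one preallocated element, and a second concurrent
mempool_alloc(pool, GFP_NOIO) will sleep until the first element is
handed back with mempool_free().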
If no memory is available when this mempool_alloc() is called, it will
block. As it is called from the raid5d thread, the whole array will
block. So this can only complete safely if the write request has
already been submitted - or if there is some other workqueue which
submits requests after a timeout or similar.
I don't see that in the code. These ppl_io_unit structures can queue up
and are only submitted later by raid5d (I think). So if raid5d waits
for one to become free, it will wait forever.
One easy way around this problem (assuming my understanding is correct)
is to just have a single mempool which allocates both a struct
ppl_io_unit and a page. You would need to define your own alloc/free
routines for the pool, but that is easy enough.
Then you only need a single mempool_alloc(), which can sensibly be
GFP_ATOMIC.
If that fails, you queue for later handling as you do now. If it
succeeds, then you continue to use the memory without any risk of
deadlocking.
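Something like this - an untested sketch, where ppl_io_pool_alloc/free
and NR_IO_UNITS are names I've made up:

        static void *ppl_io_pool_alloc(gfp_t gfp_mask, void *pool_data)
        {
                struct ppl_io_unit *io;

                io = kmalloc(sizeof(*io), gfp_mask);
                if (!io)
                        return NULL;

                /* the header page lives and dies with the io_unit */
                io->header_page = alloc_page(gfp_mask);
                if (!io->header_page) {
                        kfree(io);
                        return NULL;
                }

                return io;
        }

        static void ppl_io_pool_free(void *element, void *pool_data)
        {
                struct ppl_io_unit *io = element;

                __free_page(io->header_page);
                kfree(io);
        }

        /* at ppl_conf setup time: */
        ppl_conf->io_pool = mempool_create(NR_IO_UNITS, ppl_io_pool_alloc,
                                           ppl_io_pool_free, NULL);

ppl_new_iounit() then needs only the one mempool_alloc(io_pool,
GFP_ATOMIC) and finds io->header_page already attached. (Note that the
memset(io, 0, sizeof(*io)) would then have to preserve
io->header_page.)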
Thanks,
NeilBrown
> + pplhdr = page_address(io->header_page);
> + clear_page(pplhdr);
> + memset(pplhdr->reserved, 0xff, PPL_HDR_RESERVED);
> + pplhdr->signature = cpu_to_le32(ppl_conf->signature);
> +
> + io->seq = atomic64_add_return(1, &ppl_conf->seq);
> + pplhdr->generation = cpu_to_le64(io->seq);
> +
> + return io;
> +}