From: Neil Brown <neilb@suse.de>
To: Shaohua Li <shli@fb.com>, linux-raid@vger.kernel.org
Cc: Kernel-team@fb.com, songliubraving@fb.com, hch@infradead.org,
dan.j.williams@intel.com
Subject: Re: [PATCH 2/9] raid5-cache: add trim support for log
Date: Mon, 12 Oct 2015 17:00:14 +1100 [thread overview]
Message-ID: <877fms1v9t.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <effe8247d5013e4951fd7efa06cc45efe66a94c4.1444360850.git.shli@fb.com>
[-- Attachment #1: Type: text/plain, Size: 5087 bytes --]
Shaohua Li <shli@fb.com> writes:
> Since superblock is updated infrequently, we do a simple trim of log
> disk (a synchronous trim)
>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
> drivers/md/raid5-cache.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 62 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
> index caeec10..5dbe084 100644
> --- a/drivers/md/raid5-cache.c
> +++ b/drivers/md/raid5-cache.c
> @@ -85,6 +85,7 @@ struct r5l_log {
> spinlock_t no_space_stripes_lock;
>
> bool need_cache_flush;
> + bool in_teardown;
> };
>
> /*
> @@ -644,6 +645,60 @@ void r5l_flush_stripe_to_raid(struct r5l_log *log)
> }
>
> static void r5l_write_super(struct r5l_log *log, sector_t cp);
> +static void r5l_write_super_and_discard_space(struct r5l_log *log,
> + sector_t end)
> +{
> + struct block_device *bdev = log->rdev->bdev;
> + struct mddev *mddev;
> +
> + r5l_write_super(log, end);
> +
> + if (!blk_queue_discard(bdev_get_queue(bdev)))
> + return;
> +
> + mddev = log->rdev->mddev;
> + /*
> + * This is to avoid a deadlock. r5l_quiesce holds reconfig_mutex and
> + * wait for this thread to finish. This thread waits for
> + * MD_CHANGE_PENDING clear, which is supposed to be done in
> + * md_check_recovery(). md_check_recovery() tries to get
> + * reconfig_mutex. Since r5l_quiesce already holds the mutex,
> + * md_check_recovery() fails, so the PENDING never get cleared. The
> + * in_teardown check workaround this issue.
> + * */
Thanks for this comment (but can we just end the comment with "*/"
instead of "* */" - that's what everyone else does).
It helps a lot. If feels a bit clumsy, but maybe that is just me. At
least I understand now and it make sense.
> + if (!log->in_teardown) {
> + set_bit(MD_CHANGE_DEVS, &mddev->flags);
> + set_bit(MD_CHANGE_PENDING, &mddev->flags);
> + md_wakeup_thread(mddev->thread);
> + wait_event(mddev->sb_wait,
> + !test_bit(MD_CHANGE_PENDING, &mddev->flags) ||
> + log->in_teardown);
> + /*
> + * r5l_quiesce could run after in_teardown check and hold
> + * mutex first. Superblock might get updated twice.
> + * */
> + if (log->in_teardown)
> + md_update_sb(mddev, 1);
> + } else {
> + BUG_ON(!mddev_is_locked(mddev));
> + md_update_sb(mddev, 1);
> + }
> +
> + if (log->last_checkpoint < end) {
> + blkdev_issue_discard(bdev,
> + log->last_checkpoint + log->rdev->data_offset,
> + end - log->last_checkpoint, GFP_NOIO, 0);
> + } else {
> + blkdev_issue_discard(bdev,
> + log->last_checkpoint + log->rdev->data_offset,
> + log->device_size - log->last_checkpoint,
> + GFP_NOIO, 0);
> + blkdev_issue_discard(bdev, log->rdev->data_offset, end,
> + GFP_NOIO, 0);
> + }
> +}
> +
> +
> static void r5l_do_reclaim(struct r5l_log *log)
> {
> sector_t reclaim_target = xchg(&log->reclaim_target, 0);
> @@ -685,7 +740,7 @@ static void r5l_do_reclaim(struct r5l_log *log)
> * here, because the log area might be reused soon and we don't want to
> * confuse recovery
> */
> - r5l_write_super(log, next_checkpoint);
> + r5l_write_super_and_discard_space(log, next_checkpoint);
>
> mutex_lock(&log->io_mutex);
> log->last_checkpoint = next_checkpoint;
> @@ -721,9 +776,11 @@ static void r5l_wake_reclaim(struct r5l_log *log, sector_t space)
>
> void r5l_quiesce(struct r5l_log *log, int state)
> {
> + struct mddev *mddev;
> if (!log || state == 2)
> return;
> if (state == 0) {
> + log->in_teardown = 0;
> log->reclaim_thread = md_register_thread(r5l_reclaim_thread,
> log->rdev->mddev, "reclaim");
> } else if (state == 1) {
> @@ -731,6 +788,10 @@ void r5l_quiesce(struct r5l_log *log, int state)
> * at this point all stripes are finished, so io_unit is at
> * least in STRIPE_END state
> */
> + log->in_teardown = 1;
> + /* make sure r5l_write_super_and_discard_space exits */
> + mddev = log->rdev->mddev;
> + wake_up(&mddev->sb_wait);
> r5l_wake_reclaim(log, -1L);
> md_unregister_thread(&log->reclaim_thread);
I think this is racey (though you haven't changed the code, so it must
have been racy before).
There is no guarantee that the thread will actually wake up to handle
the reclaim before md_unregister_thread marks the thread as
kthread_should_stop().
Maybe the thing to do would be
md_unregister_thread()
log->reclaim_target = (unsigned long) -1;
r5l_do_reclaim(log);
or something like that. i.e. just do it synchronously.
Then.... maybe you don't need in_teardown.
When r5l_do_reclaim() is called from the thread, you wait.
When called from r5l_quiesce(), you md_update_sb().
Would that work?
Thanks,
NeilBrown
> r5l_do_reclaim(log);
> --
> 2.4.6
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]
next prev parent reply other threads:[~2015-10-12 6:00 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-09 4:54 [PATCH 0/9] raid5-cache fixes Shaohua Li
2015-10-09 4:54 ` [PATCH 1/9] MD: fix info output for journal disk Shaohua Li
2015-10-12 5:37 ` Neil Brown
2015-10-12 17:01 ` Shaohua Li
2015-10-12 23:26 ` Neil Brown
2015-10-12 23:38 ` Shaohua Li
2015-10-09 4:54 ` [PATCH 2/9] raid5-cache: add trim support for log Shaohua Li
2015-10-12 6:00 ` Neil Brown [this message]
2015-10-12 16:57 ` Shaohua Li
2015-10-09 4:54 ` [PATCH 3/9] raid5: journal disk can't be removed Shaohua Li
2015-10-09 4:54 ` [PATCH 4/9] raid5-cache: IO error handling Shaohua Li
2015-10-09 4:54 ` [PATCH 5/9] MD: add new bit to indicate raid array with journal Shaohua Li
2015-10-09 4:54 ` [PATCH 6/9] raid5-cache: start raid5 readonly if journal is missing Shaohua Li
2015-10-09 4:54 ` [PATCH 7/9] MD: kick out journal disk if it's not fresh Shaohua Li
2015-10-09 4:54 ` [PATCH 8/9] MD: set journal disk ->raid_disk Shaohua Li
2015-10-09 4:54 ` [PATCH 9/9] MD: when RAID journal is missing/faulty, block RESTART_ARRAY_RW Shaohua Li
2015-10-12 6:17 ` [PATCH 0/9] raid5-cache fixes Neil Brown
2015-10-13 21:03 ` Neil Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=877fms1v9t.fsf@notabene.neil.brown.name \
--to=neilb@suse.de \
--cc=Kernel-team@fb.com \
--cc=dan.j.williams@intel.com \
--cc=hch@infradead.org \
--cc=linux-raid@vger.kernel.org \
--cc=shli@fb.com \
--cc=songliubraving@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).