From: NeilBrown <neilb@suse.com>
To: Shaohua Li <shli@kernel.org>, linux-raid@vger.kernel.org
Cc: Shaohua Li <shli@fb.com>, Mikulas Patocka <mpatocka@redhat.com>
Subject: Re: [PATCH] md: make suspend range wait timed out
Date: Fri, 16 Jun 2017 13:26:00 +1000 [thread overview]
Message-ID: <877f0c7gvr.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <d95473d287e196a86a83390cb9c136913643bd61.1497037210.git.shli@fb.com>
[-- Attachment #1: Type: text/plain, Size: 4431 bytes --]
On Fri, Jun 09 2017, Shaohua Li wrote:
> From: Shaohua Li <shli@fb.com>
>
> suspend range is controlled by userspace. If userspace doesn't clear suspend
> range, it's possible a thread will wait for the range forever, we can't even
> kill it. This is bad behavior. Add a timeout in the wait. If timeout happens,
> we return IO error. The app controlling suspend range looks like part of disk
> firmware, if disk isn't responded for a long time, timed out IO error is
> returned.
>
> A simple search in SCSI code shows maximum IO timeout is 120s, so I use this
> value here too.
I really don't like this. It is an API change with no really
justification. Has the current behavior caused a problem?
Both md and dm export APIs which allow IO to be suspended while
changes are made. Changing that to a timed-out period needs, I think,
to be clearly justified.
If it is changed to a timed-out period, then that should be explicit,
rather than having each request independently time out.
i.e. when the suspend is initiated, the end-time should be computed, and
any IO would block until that time, not block for some number of
seconds.
If an md device is left suspended, then the current code will block IO
indefinitely. This patch will at a 20minute times to every single
request, which will mean IO proceeds, but extremely slowly. I don't see
that as a useful improvement.
Thanks,
NeilBrown
>
> Cc: NeilBrown <neilb@suse.com>
> Cc: Mikulas Patocka <mpatocka@redhat.com>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
> drivers/md/md.h | 1 +
> drivers/md/raid1.c | 12 +++++++++++-
> drivers/md/raid5.c | 10 +++++++++-
> 3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 63d342d560b8..11a0ec33e79b 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -29,6 +29,7 @@
>
> #define MaxSector (~(sector_t)0)
>
> +#define MD_SUSPEND_TIMEOUT (120 * HZ)
> /*
> * These flags should really be called "NO_RETRY" rather than
> * "FAILFAST" because they don't make any promise about time lapse,
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index d9e5373444d2..bc6dee0259df 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1326,6 +1326,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> (mddev_is_clustered(mddev) &&
> md_cluster_ops->area_resyncing(mddev, WRITE,
> bio->bi_iter.bi_sector, bio_end_sector(bio)))) {
> + long remaining = -1;
>
> /*
> * As the suspend_* range is controlled by userspace, we want
> @@ -1345,10 +1346,19 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> break;
> sigfillset(&full);
> sigprocmask(SIG_BLOCK, &full, &old);
> - schedule();
> + remaining = schedule_timeout(MD_SUSPEND_TIMEOUT);
> sigprocmask(SIG_SETMASK, &old, NULL);
> + if (remaining == 0)
> + break;
> }
> finish_wait(&conf->wait_barrier, &w);
> + if (remaining == 0) {
> + pr_err("md/raid1:%s: suspend range is locked\n",
> + mdname(mddev));
> + bio->bi_error = -ETIMEDOUT;
> + bio_endio(bio);
> + return;
> + }
> }
> wait_barrier(conf, bio->bi_iter.bi_sector);
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index cf1ac2e0f4c8..24297f1530d1 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5685,6 +5685,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
> if (rw == WRITE &&
> logical_sector >= mddev->suspend_lo &&
> logical_sector < mddev->suspend_hi) {
> + long remaining = -1;
> raid5_release_stripe(sh);
> /* As the suspend_* range is controlled by
> * userspace, we want an interruptible
> @@ -5697,10 +5698,17 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
> sigset_t full, old;
> sigfillset(&full);
> sigprocmask(SIG_BLOCK, &full, &old);
> - schedule();
> + remaining = schedule_timeout(
> + MD_SUSPEND_TIMEOUT);
> sigprocmask(SIG_SETMASK, &old, NULL);
> do_prepare = true;
> }
> + if (remaining == 0) {
> + pr_err("md/raid5:%s: suspend range is locked\n",
> + mdname(mddev));
> + bi->bi_error = -ETIMEDOUT;
> + break;
> + }
> goto retry;
> }
>
> --
> 2.11.0
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
next prev parent reply other threads:[~2017-06-16 3:26 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-09 19:41 [PATCH] md: make suspend range wait timed out Shaohua Li
2017-06-16 3:26 ` NeilBrown [this message]
2017-06-16 15:52 ` Shaohua Li
2017-06-16 18:06 ` Brad Campbell
2017-06-16 19:07 ` Shaohua Li
2017-06-16 19:22 ` Brad Campbell
2017-06-16 19:37 ` Brad Campbell
2017-06-18 21:30 ` NeilBrown
2017-06-20 0:54 ` Shaohua Li
2017-06-21 14:09 ` Mikulas Patocka
2017-06-21 16:07 ` Shaohua Li
2017-06-22 21:54 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=877f0c7gvr.fsf@notabene.neil.brown.name \
--to=neilb@suse.com \
--cc=linux-raid@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=shli@fb.com \
--cc=shli@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).