From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Keith Busch <keith.busch@intel.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfsprogs: Issue smaller discards at mkfs
Date: Thu, 26 Oct 2017 09:25:18 -0700
Message-ID: <20171026162518.GW5483@magnolia>
In-Reply-To: <20171026144131.26885-1-keith.busch@intel.com>
On Thu, Oct 26, 2017 at 08:41:31AM -0600, Keith Busch wrote:
> Running mkfs.xfs was discarding the entire capacity in a single range. The
> block layer would split these into potentially many smaller requests
> and dispatch all of them to the device at roughly the same time.
>
> SSD capacities are getting so large that full capacity discards will
> take some time to complete. When discards are deeply queued, the block
> layer may trigger timeout handling and IO failure, though the device is
> operating normally.
>
> This patch uses smaller discard ranges in a loop for mkfs to avoid
> risking such timeouts. The max discard range is arbitrarily set to
> 128GB in this patch.
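For reference, the chunking described above can be sketched outside of mkfs roughly as follows. This is only an illustration, not the patch itself: discard_fn, discard_in_chunks(), struct discard_log and log_discard() are hypothetical names, and the callback stands in for the real ioctl(fd, BLKDISCARD, &range) call so the loop can be exercised without a block device.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical callback type standing in for
 * ioctl(fd, BLKDISCARD, &range); returns 0 or an errno value.
 */
typedef int (*discard_fn)(uint64_t start, uint64_t len, void *ctx);

/* Issue discards over [start, start + len) in pieces of at most max_len. */
static int
discard_in_chunks(uint64_t start, uint64_t len, uint64_t max_len,
		  discard_fn fn, void *ctx)
{
	if (max_len == 0)
		return EINVAL;	/* a zero chunk size would never make progress */

	while (len > 0) {
		uint64_t this_len = len < max_len ? len : max_len;
		int error = fn(start, this_len, ctx);

		if (error)
			return error;
		start += this_len;
		len -= this_len;
	}
	return 0;
}

/* Example callback that only records what it was asked to discard. */
struct discard_log {
	unsigned int	calls;
	uint64_t	bytes;
	uint64_t	last_start;
};

static int
log_discard(uint64_t start, uint64_t len, void *ctx)
{
	struct discard_log *log = ctx;

	log->calls++;
	log->bytes += len;
	log->last_start = start;
	return 0;
}
```

With a 300GB range and a 128GB cap, this makes three calls: two of 128GB and a final one of 44GB.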
I'd have thought devices would set sane blk_queue_max_discard_sectors
so that the block layer doesn't send such a huge command that the kernel
times out...
...but then I actually went and grepped that in the kernel and
discovered that nbd, zram, raid0, mtd, and nvme all pass in UINT_MAX,
which is 2T in 512-byte sectors. Frighteningly, xen-blkfront passes in
get_capacity() (which overflows the unsigned int parameter on big
virtual disks, I guess?).
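To put numbers on that, here is a small sketch of the arithmetic. It assumes a 32-bit unsigned int and 512-byte sectors; sectors_to_bytes() and truncate_to_uint() are hypothetical helpers that only model the conversion, they are not kernel functions.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: the discard limit is counted in 512-byte sectors. */
#define SECTOR_BYTES	512ULL

/* Convert a sector count to bytes. */
static uint64_t
sectors_to_bytes(uint64_t sectors)
{
	return sectors * SECTOR_BYTES;
}

/*
 * Model what happens when a 64-bit sector count is passed through an
 * unsigned int parameter: the value is reduced modulo UINT_MAX + 1.
 */
static unsigned int
truncate_to_uint(uint64_t sectors)
{
	return (unsigned int)sectors;
}
```

Under those assumptions, sectors_to_bytes(UINT_MAX) is 2^41 - 512 bytes, just under 2T, and a 4T disk (2^33 sectors) truncates to a limit of 0 sectors.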
(I still think this is the kernel's problem, not userspace's, but now
with an extra layer of OMGWTF sprayed on.)
I dunno. What kind of device produces these timeouts, and do they go
away if max_discards is lowered?
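One way to check the current limit from userspace is sketched below. It assumes that "max_discards" here corresponds to the queue's discard_max_bytes sysfs attribute, and "nvme0n1" in the comments is only a placeholder device name; discard_max_path() and read_discard_max_bytes() are hypothetical helpers.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build /sys/block/<dev>/queue/discard_max_bytes; returns 0 or an errno value. */
static int
discard_max_path(char *buf, size_t buflen, const char *dev)
{
	int n;

	if (dev[0] == '\0' || strchr(dev, '/') != NULL)
		return EINVAL;	/* expect a bare name such as "nvme0n1" */
	n = snprintf(buf, buflen, "/sys/block/%s/queue/discard_max_bytes", dev);
	if (n < 0 || (size_t)n >= buflen)
		return ENAMETOOLONG;
	return 0;
}

/* Read the current limit in bytes; returns 0 or an errno value. */
static int
read_discard_max_bytes(const char *dev, unsigned long long *bytes)
{
	char path[256], line[64], *end;
	FILE *fp;
	int error = discard_max_path(path, sizeof(path), dev);

	if (error)
		return error;
	fp = fopen(path, "r");
	if (!fp)
		return errno;
	if (!fgets(line, sizeof(line), fp)) {
		fclose(fp);
		return EIO;
	}
	fclose(fp);
	errno = 0;
	*bytes = strtoull(line, &end, 10);
	if (errno || end == line)
		return EINVAL;
	return 0;
}
```

The same attribute can be written as root to lower the limit for a test run, which would show whether the timeouts track the discard size.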
--D
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
> include/linux.h | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux.h b/include/linux.h
> index 6ce344c5..702aee0c 100644
> --- a/include/linux.h
> +++ b/include/linux.h
> @@ -132,10 +132,17 @@ static __inline__ void platform_uuid_copy(uuid_t *dst, uuid_t *src)
>  static __inline__ int
>  platform_discard_blocks(int fd, uint64_t start, uint64_t len)
>  {
> -	uint64_t range[2] = { start, len };
> +	uint64_t end = start + len;
> +	uint64_t size = 128ULL * 1024ULL * 1024ULL * 1024ULL;
>
> -	if (ioctl(fd, BLKDISCARD, &range) < 0)
> -		return errno;
> +	for (; start < end; start += size) {
> +		uint64_t range[2] = { start, MIN(len, size) };
> +
> +		len -= range[1];
> +		if (ioctl(fd, BLKDISCARD, &range) < 0)
> +			return errno;
> +
> +	}
>  	return 0;
>  }
>
> --
> 2.13.6
>
Thread overview: 10+ messages
2017-10-26 14:41 [PATCH] xfsprogs: Issue smaller discards at mkfs Keith Busch
2017-10-26 16:25 ` Darrick J. Wong [this message]
2017-10-26 17:49 ` Eric Sandeen
2017-10-26 18:01 ` Eric Sandeen
2017-10-26 18:32 ` Keith Busch
2017-10-26 19:59 ` Darrick J. Wong
2017-10-26 21:24 ` Keith Busch
2017-10-26 22:24 ` Dave Chinner
2017-10-26 23:09 ` Keith Busch
2017-10-26 18:00 ` Keith Busch