public inbox for linux-xfs@vger.kernel.org
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Keith Busch <keith.busch@intel.com>
Cc: Eric Sandeen <sandeen@sandeen.net>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfsprogs: Issue smaller discards at mkfs
Date: Thu, 26 Oct 2017 12:59:23 -0700
Message-ID: <20171026195923.GB5483@magnolia>
In-Reply-To: <20171026183216.GA27317@localhost.localdomain>

On Thu, Oct 26, 2017 at 12:32:17PM -0600, Keith Busch wrote:
> On Thu, Oct 26, 2017 at 01:01:29PM -0500, Eric Sandeen wrote:
> > On 10/26/17 12:49 PM, Eric Sandeen wrote:
> > > Yeah, lots of devices are unhappy with large discards.  And yeah, in the
> > > end I think this papers over a kernel and/or hardware problem.
> > > 
> > > But sometimes we do that, if only to keep things working reasonably
> > > well with older kernels or hardware that'll never get fixed...
> > > 
> > > (TBH sometimes I regret putting mkfs-time discard in by default in the
> > > first place.)
> > 
> > I think I left this on a too-positive note.  It seems pretty clear that there
> > is no way to fix all of userspace to not issue "too big" discards, when
> > "too big" isn't even well-defined, or specified by anything at all.
> 
> Yeah, I totally get this proposal is just a bandaid, and other user
> space programs may suffer when used with devices behaving this way. XFS
> is just very popular, so it's frequently reported as problematic against
> large capacity devices.

Sure, but now you have to go fix mke2fs and everything /else/ that
issues BLKDISCARD (or FALLOC_FL_PUNCH_HOLE) on a large file / device, and
until you fix every program to work around this weird thing in the
kernel there'll still be someone somewhere with this timeout problem...

...so I started digging into what the kernel does with a BLKDISCARD
request, which is to say that I looked at blkdev_issue_discard.  That
function uses blk_*_plug() to wrap __blkdev_issue_discard, which in turn
splits the request into a chain of UINT_MAX-sized struct bios.

128G's worth of 4G ios == 32 chained bios.

2T worth of 4G ios == 512 chained bios.
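
As a rough check of those two figures, here is a minimal sketch that assumes
each bio in the chain covers exactly 4 GiB (the real cap is derived from
UINT_MAX, so it is slightly smaller). `chained_bios` and `BIO_CAP` are names
made up for this illustration, not kernel symbols:

```python
GIB = 1 << 30
TIB = 1 << 40

# Assumed per-bio cap, rounded to 4 GiB as in the figures above.
BIO_CAP = 4 * GIB

def chained_bios(discard_bytes, cap=BIO_CAP):
    """Number of cap-sized pieces needed to cover discard_bytes (ceiling division)."""
    return -(-discard_bytes // cap)

print(chained_bios(128 * GIB))  # 32
print(chained_bios(2 * TIB))    # 512
```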

So now I'm wondering, is the problem more that the first bio in the
chain times out because the last one hasn't finished yet, so the whole
thing gets aborted because we chained too much work together?

Would it make sense to fix __blkdev_issue_discard to chain fewer bios
together?  Or just issue the bios independently and track the
completions individually?

> > I'm not wise in the ways of queueing and throttling, but from my naive
> > perspective, it seems like something to be fixed in the kernel, or if it
> > can't, export some new "maximum discard request size" which can be trusted?

I would've thought that's what max_discard_sectors was for, but... eh.
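
For reference, the block layer exposes a per-request discard limit to
userspace through sysfs as `queue/discard_max_bytes`. A minimal sketch of
reading it, assuming a Linux sysfs layout; the helper name, the `sysfs_root`
parameter and the `nvme0n1` device name are hypothetical and only there for
illustration:

```python
from pathlib import Path

def discard_max_bytes(disk, sysfs_root="/sys/block"):
    """Return the queue's advertised maximum size of one discard, in bytes."""
    path = Path(sysfs_root) / disk / "queue" / "discard_max_bytes"
    return int(path.read_text().strip())

if __name__ == "__main__":
    # Hypothetical device name; substitute a disk present on the system.
    try:
        print(discard_max_bytes("nvme0n1"))
    except OSError as err:
        print(f"could not read discard limit: {err}")
```

Note that this limit bounds the size of a single request; it says nothing
about how many such requests may be outstanding at once.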

How big is the device you were trying to mkfs, anyway?

--D

> The problem isn't really that a discard sent to the device was "too
> big". It's that "too many" are issued at the same time, and there isn't
> a way for a driver to limit the number of outstanding discards without
> affecting read/write.
