public inbox for linux-xfs@vger.kernel.org
From: Keith Busch <keith.busch@intel.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Eric Sandeen <sandeen@sandeen.net>, linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfsprogs: Issue smaller discards at mkfs
Date: Thu, 26 Oct 2017 15:24:15 -0600	[thread overview]
Message-ID: <20171026212414.GA30535@localhost.localdomain> (raw)
In-Reply-To: <20171026195923.GB5483@magnolia>

On Thu, Oct 26, 2017 at 12:59:23PM -0700, Darrick J. Wong wrote:
> 
> Sure, but now you have to go fix mke2fs and everything /else/ that
> issues BLKDISCARD (or FALLOC_FL_PUNCH) on a large file / device, and
> until you fix every program to work around this weird thing in the
> kernel there'll still be someone somewhere with this timeout problem...

e2fsprogs already splits large discards in a loop. ;)

> ...so I started digging into what the kernel does with a BLKDISCARD
> request, which is to say that I looked at blkdev_issue_discard.  That
> function uses blk_*_plug() to wrap __blkdev_issue_discard, which in turn
> splits the request into a chain of UINT_MAX-sized struct bios.
> 
> 128G's worth of 4G ios == 32 chained bios.
> 
> 2T worth of 4G ios == 512 chained bios.
> 
> So now I'm wondering, is the problem more that the first bio in the
> chain times out because the last one hasn't finished yet, so the whole
> thing gets aborted because we chained too much work together?

You're sort of on the right track. The timeouts are set on an individual
request in the chain rather than one timeout for the entire chain.

All the bios in the chain get turned into 'struct request' and sent
to the low-level driver. The driver calls blk_mq_start_request before
sending to hardware. That starts the timer on _that_ request,
independent of the other requests in the chain.

NVMe supports very large queues. A 4TB discard becomes 1024 individual
4G requests started at nearly the same time, and the last ones in the
queue are the ones that risk timing out.

When we're doing read/write, latencies at the same depth are well within
tolerance, and high queue depths are good for throughput. When doing
discard, though, tail latencies fall outside the timeout tolerance at
the same queue depth.
 
> Would it make sense to fix __blkdev_issue_discard to chain fewer bios
> together?  Or just issue the bios independently and track the
> completions individually?
>
> > > I'm not wise in the ways of queueing and throttling, but from my naiive
> > > perspective, it seems like something to be fixed in the kernel, or if it
> > > can't, export some new "maximum discard request size" which can be trusted?
> 
> I would've thought that's what max_discard_sectors was for, but... eh.
> 
> How big is the device you were trying to mkfs, anyway?

The largest single SSD I have is 6.4TB. Other folks I know have larger.


Thread overview: 10+ messages
2017-10-26 14:41 [PATCH] xfsprogs: Issue smaller discards at mkfs Keith Busch
2017-10-26 16:25 ` Darrick J. Wong
2017-10-26 17:49   ` Eric Sandeen
2017-10-26 18:01     ` Eric Sandeen
2017-10-26 18:32       ` Keith Busch
2017-10-26 19:59         ` Darrick J. Wong
2017-10-26 21:24           ` Keith Busch [this message]
2017-10-26 22:24             ` Dave Chinner
2017-10-26 23:09               ` Keith Busch
2017-10-26 18:00   ` Keith Busch
