From: Jaegeuk Kim <jaegeuk@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>,
Damien Le Moal <dlemoal@kernel.org>, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org
Subject: Re: [PATCH 3/3] block/mq-deadline: Disable I/O prioritization in certain cases
Date: Thu, 14 Dec 2023 09:22:51 -0800
Message-ID: <ZXs563M66THrUw50@google.com>
In-Reply-To: <20231214085729.GA9099@lst.de>

On 12/14, Christoph Hellwig wrote:
> On Wed, Dec 13, 2023 at 08:41:32AM -0800, Jaegeuk Kim wrote:
> > I have no
> > concerns about keeping the same ioprio on writes, since handheld devices are
> > mostly sensitive to reads. So, if you have other use-cases using zoned writes
> > which require a different ioprio on writes, I think you can suggest a knob
> > that lets users control it.
>
> Get out of your little handheld world. In Linux we need a generally usable
> I/O stack, and any feature exposed by the kernel will be used quite
> differently than you imagine.
>
> Just like people will add reordering to the I/O stack that's not there
> right now (in addition to the ones your testing doesn't hit). That
> doesn't mean we should avoid them - you generally get better performance
> by not reordering without a good reason (like throttling), but especially
> in error handling paths or resource constrained environments they will
> happen all over. We've had this whole discussion with the I/O barriers
> that did not work for exactly the same reasons.
>
> >
> > >
> > > > it is essential to place the data per file to get better bandwidth. And for
> > > > NAND-based storage, the filesystem is the right place to deal with more
> > > > efficient garbage collection based on the known data locations.
> > >
> > > And that is a perfectly fine match for zone append.
> >
> > How does that work if the device gives random LBAs back for adjacent data in
> > a file? And how do you make those LBAs sequential again?
>
> Why would your device pick random LBAs? If you send a zone append to
> a zone it will be written at the write pointer, which is absolutely not
> random. All I/O written in a single write is going to be sequential,
> so just like for all other devices doing large sequential writes is
> important. Multiple writes can get reordered, but if you heavily hit
> the same zone you'd get the same effect in the file system allocator
> too.

How can you guarantee that the device does not return random LBAs? What
would be the selling point of zone append for end users? Are you sure it can
deliver better write throughput forever? Have you considered how this would
be implemented on the device side, given FTL mapping overhead and garbage
collection leading to tail latencies?

My takeaway on the two approaches would be:

                    zone_append        zone_write
                    -----------        ----------
LBA                 from FTL           from filesystem
FTL mapping         Page-map           Zone-map
SRAM/DRAM needs     Large              Small
FTL GC              Required           Not required
Tail latencies      Exist              Do not exist
GC efficiency       Worse              Better
Longevity           As-is              Longer
Discard cmd         Required           Not required
Block complexity    Small              Large
Failure cases       Fewer              Exist
Fsck                Don't know         F2FS-TOOLS support
Filesystem          BTRFS support(?)   F2FS support

Given this, I chose zone_write, especially for mobile devices, since fsck can
recover from unaligned writes in the corner cases. The biggest benefit is
getting rid of the FTL mapping overhead, which improves random read IOPS
significantly given the lack of SRAM in low-end storage. A longer lifetime
from reduced garbage collection overhead also matters more in the mobile world.
If there's any flag or knob that we can set, IMO, that'd be enough.
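
To make the two write models in the comparison above concrete, here is a
minimal toy sketch. It is plain Python, not kernel or device code; the `Zone`
class, the `submit` helper, and the block counts are hypothetical names and
values chosen purely for illustration. It models only the write-pointer rule:
with a regular zoned write the host picks the LBA and it must match the write
pointer, while with zone append the device writes at the write pointer and
reports the LBA back.

```python
class Zone:
    """Toy model of one sequential-write-required zone (illustration only)."""

    def __init__(self, start_lba, size):
        self.start_lba = start_lba
        self.size = size
        self.write_pointer = start_lba

    def _advance(self, nr_blocks):
        # Reject writes that would run past the end of the zone.
        if self.write_pointer + nr_blocks > self.start_lba + self.size:
            raise ValueError("zone full")
        self.write_pointer += nr_blocks

    def zone_write(self, lba, nr_blocks):
        # Regular zoned write: the host (e.g. the filesystem) chooses the LBA,
        # and it must equal the current write pointer.
        if lba != self.write_pointer:
            raise ValueError("unaligned write")
        self._advance(nr_blocks)
        return lba

    def zone_append(self, nr_blocks):
        # Zone append: the host names only the zone; the data lands at the
        # write pointer and the resulting LBA is returned to the host.
        lba = self.write_pointer
        self._advance(nr_blocks)
        return lba


def submit(zone, requests, use_append):
    """Issue (name, host_lba, nr_blocks) requests in the given order."""
    placed = {}
    for name, host_lba, nr_blocks in requests:
        if use_append:
            placed[name] = zone.zone_append(nr_blocks)
        else:
            placed[name] = zone.zone_write(host_lba, nr_blocks)
    return placed


# The host planned A at LBA 0 and B at LBA 8, but the requests were reordered
# somewhere in the stack so that B is issued first.
reordered = [("B", 8, 8), ("A", 0, 8)]
```

With the reordered submission, `submit(Zone(0, 64), reordered, True)` places B
at LBA 0 and A at LBA 8, so both succeed but not where the host planned. With
`use_append=False`, the write of B at host-chosen LBA 8 is rejected because the
write pointer is still 0, which is the unaligned-write corner case mentioned
above.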
>
> > Sorry, I needed to stop reading here, as you're totally biased. This is not
> > the case in JEDEC, as Bart spent multiple years synchronizing the technical
> > benefits that we've seen across UFS vendors as well as OEMs.
>
> *lol* There is no more fucked up corporate pressure standard committee
> than the storage standards in JEDEC. That's why no one actually takes
> them seriously.