From: Hannes Reinecke <hare@suse.de>
To: Viacheslav Dubeyko <slava@dubeyko.com>,
Bart Van Assche <bvanassche@acm.org>,
Matthew Wilcox <willy@infradead.org>
Cc: lsf-pc@lists.linuxfoundation.org, linux-mm@kvack.org,
linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: [LSF/MM/BPF TOPIC] Large block for I/O
Date: Fri, 22 Dec 2023 13:29:18 +0100
Message-ID: <4f03e599-2772-4eb3-afb2-efa788eb08c4@suse.de>
In-Reply-To: <BB694C7D-0000-4E2F-B26C-F0E719119B0C@dubeyko.com>
On 12/22/23 09:23, Viacheslav Dubeyko wrote:
>
>
>> On Dec 21, 2023, at 11:33 PM, Bart Van Assche <bvanassche@acm.org> wrote:
>>
>
> <skipped>
>
>>> .
>>
>> Hi Hannes,
>>
>> I'm interested in this topic. But I'm wondering whether the disadvantages of
>> large blocks will be covered? Some NAND storage vendors are less than
>> enthusiastic about increasing the logical block size beyond 4 KiB because it
>> increases the size of many writes to the device and hence increases write
>> amplification.
>>
>
> I am also interested in this discussion. Every SSD manufacturer carefully hides
> the details of its architecture and the FTL’s behavior. I believe that switching to a bigger
> logical block size (like 8KB, 16KB, etc.) could even be better for the SSD's internal mapping
> scheme and erase block management. I assume that it could require significant
> reworking of the firmware and, potentially, the ASIC logic. This could be the main pain
> point for SSD manufacturers. Frankly speaking, I don’t see a direct relation between
> increasing the logical block size and increasing write amplification. If you have a 16KB
> logical block size on the SSD side and the file system continues to use a 4KB logical
> block size, then, yes, I can see the problem. But if the file system manages the space
> in 16KB logical blocks and carefully issues I/O requests of the proper size, then
> everything should be good. Again, the FTL is simply trying to write logical blocks into
> an erase block. If we have, for example, an 8MB erase block, then mapping and writing
> 16KB logical blocks looks like a more beneficial operation than with 4KB logical
> blocks.
>
> So, I see more trouble on the file system side in supporting a bigger logical block size. For example,
> we discussed 8KB folio size support recently. Matthew already shared the patch
> for supporting the 8KB folio size, but everything should be carefully tested. Also, I ran into
> an issue with the readahead logic. For example, if I format my file system volume with a 32KB
> logical block size, the readahead logic returns 16KB folios to me, which was slightly surprising
> to me. So, I assume we can find a lot of potential issues on the file system side for a bigger
> logical block size from the point of view of the efficiency of metadata and user data operations.
> Also, heavily loaded systems could have fragmented memory, which could make memory
> allocation trickier. I mean that it might not be easy to allocate one big
> folio. Log-structured file systems can easily align write I/O requests to a bigger logical
> block size. But in-place update file systems can increase write amplification for a bigger logical
> block size because of the need to flush a bigger portion of data for a small modification. However,
> the FTL can use delta-encoding and smart logic to compact several logical blocks into
> one NAND flash page. And, by the way, a NAND flash page is usually bigger than 4KB.
>
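To put rough numbers on the two effects discussed in the quoted text, here
is a back-of-the-envelope sketch in Python. It assumes that an in-place
update file system writes back every touched logical block in full, that
the modification starts on a block boundary, and that the FTL keeps one
mapping entry per logical block. Real FTLs are opaque and may do neither;
the function names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def in_place_write_amplification(modified_bytes, block_bytes):
    """Bytes flushed per byte changed.

    Assumption: the change starts on a block boundary and every touched
    logical block is written back in full.
    """
    blocks_touched = -(-modified_bytes // block_bytes)  # ceiling division
    return blocks_touched * block_bytes / modified_bytes


def mapping_entries(erase_block_bytes, block_bytes):
    """Mapping entries per erase block.

    Assumption: a flat map with one entry per logical block.
    """
    return erase_block_bytes // block_bytes


if __name__ == "__main__":
    for block in (4 * KIB, 16 * KIB):
        amp = in_place_write_amplification(4 * KIB, block)
        entries = mapping_entries(8 * MIB, block)
        print(f"{block // KIB}KB blocks: {amp:.0f}x amplification for a "
              f"4KB update, {entries} entries per 8MB erase block")
```

Under those assumptions a 4KB modification costs four times the data with
16KB logical blocks, while an 8MB erase block needs 512 mapping entries
instead of 2048. So both sides of the argument show up in the same model.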
The memory fragmentation concern is actually a very valid point; it will
become an issue with larger block sizes.
Theoretically it should be quite easily solved; just switch the memory
subsystem to use the largest block size in the system, and run every
smaller memory allocation via SLUB (or whatever the allocator-of-the-day
currently is :-). Then trivially the system will never be fragmented,
and I/O can always use large folios.
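A toy userspace model of that argument, not kernel code: memory is a flat
array of 4 KiB pages, the "largest block size" is assumed to be 64 KiB, and
the allocation pattern (allocate everything, then free every other unit) is
picked to be the worst case for page-granular allocation. All names and
sizes here are hypothetical.

```python
PAGE_KIB = 4                              # base page size
BLOCK_KIB = 64                            # assumed largest block size
PAGES_PER_BLOCK = BLOCK_KIB // PAGE_KIB   # 16 pages per large block
TOTAL_BLOCKS = 64                         # toy memory: 64 blocks = 4 MiB


def free_large_blocks(free_map, pages_per_block):
    """Count aligned runs of pages_per_block pages that are entirely free."""
    count = 0
    for start in range(0, len(free_map), pages_per_block):
        if all(free_map[start:start + pages_per_block]):
            count += 1
    return count


def page_granular_worst_case():
    """Allocate everything in 4 KiB units, then free every other page."""
    total_pages = TOTAL_BLOCKS * PAGES_PER_BLOCK
    free_map = [False] * total_pages
    for pfn in range(0, total_pages, 2):
        free_map[pfn] = True
    return free_map


def block_granular_same_load():
    """Allocate everything in 64 KiB units, then free every other unit."""
    free_map = [False] * (TOTAL_BLOCKS * PAGES_PER_BLOCK)
    for blk in range(0, TOTAL_BLOCKS, 2):
        start = blk * PAGES_PER_BLOCK
        for pfn in range(start, start + PAGES_PER_BLOCK):
            free_map[pfn] = True
    return free_map


if __name__ == "__main__":
    for name, fmap in (("4K units", page_granular_worst_case()),
                       ("64K units", block_granular_same_load())):
        free_kib = sum(fmap) * PAGE_KIB
        usable = free_large_blocks(fmap, PAGES_PER_BLOCK)
        print(f"{name}: {free_kib} KiB free, "
              f"{usable} free {BLOCK_KIB} KiB blocks")
```

Both cases end up with half of memory free, but only the block-granular one
has any free 64 KiB blocks left. The model ignores the space lost inside the
sub-allocator when small objects share a large block.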
However, that means doing away with alloc_page(), which is still in
widespread use throughout the kernel. I would actually be in favour of it,
but the mm people might have a different view.
Matthew, worth a new topic?
Handling memory fragmentation on large block I/O systems?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich