From: Hannes Reinecke <hare@suse.de>
To: Viacheslav Dubeyko <slava@dubeyko.com>,
	Bart Van Assche <bvanassche@acm.org>,
	Matthew Wilcox <willy@infradead.org>
Cc: lsf-pc@lists.linuxfoundation.org, linux-mm@kvack.org,
	linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: [LSF/MM/BPF TOPIC] Large block for I/O
Date: Fri, 22 Dec 2023 13:29:18 +0100
Message-ID: <4f03e599-2772-4eb3-afb2-efa788eb08c4@suse.de>
In-Reply-To: <BB694C7D-0000-4E2F-B26C-F0E719119B0C@dubeyko.com>

On 12/22/23 09:23, Viacheslav Dubeyko wrote:
> 
> 
>> On Dec 21, 2023, at 11:33 PM, Bart Van Assche <bvanassche@acm.org> wrote:
>>
> 
> <skipped>
> 
>>
>> Hi Hannes,
>>
>> I'm interested in this topic. But I'm wondering whether the disadvantages of
>> large blocks will be covered. Some NAND storage vendors are less than
>> enthusiastic about increasing the logical block size beyond 4 KiB because it
>> increases the size of many writes to the device and hence increases write
>> amplification.
>>
> 
> I am also interested in this discussion. Every SSD manufacturer carefully
> hides the details of the architecture and the FTL’s behavior. I believe that
> switching to a bigger logical block size (like 8KB, 16KB, etc.) could even
> be beneficial for the SSD’s internal mapping scheme and erase-block
> management. I assume that it could require significantly reworking the
> firmware and, potentially, the ASIC logic. This could be the main pain point
> for SSD manufacturers. Frankly speaking, I don’t see a direct relation
> between increasing the logical block size and increasing write
> amplification. If you have a 16KB logical block size on the SSD side and the
> file system continues to use a 4KB logical block size, then, yes, I can see
> the problem. But if the file system manages the space in 16KB logical blocks
> and carefully issues I/O requests of the proper size, then everything should
> be good. Again, the FTL is simply trying to write logical blocks into an
> erase block. If we have, for example, an 8MB erase block, then mapping and
> writing 16KB logical blocks looks like a more beneficial operation compared
> with 4KB logical blocks.
> 
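To put rough numbers on both arguments, a back-of-the-envelope sketch
(the device geometry here is invented for illustration, not taken from
any real hardware):

#include <stdio.h>

int main(void)
{
	const unsigned long erase_block = 8UL << 20;	/* 8MB erase block   */
	const unsigned long lbs4k  = 4UL << 10;		/* 4KB logical block */
	const unsigned long lbs16k = 16UL << 10;	/* 16KB logical block */

	/* Page-mapped FTL: L2P mapping entries needed per erase block. */
	printf("L2P entries @4KB:  %lu\n", erase_block / lbs4k);  /* 2048 */
	printf("L2P entries @16KB: %lu\n", erase_block / lbs16k); /*  512 */

	/* A 512-byte update still has to be written out as one full
	 * logical block: */
	printf("WA for a 512B write @4KB:  %lux\n", lbs4k / 512);  /*  8x */
	printf("WA for a 512B write @16KB: %lux\n", lbs16k / 512); /* 32x */
	return 0;
}

So a bigger logical block shrinks the FTL mapping table, but any write
smaller than one block pays proportionally more; which effect dominates
depends on the workload.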
> So, I see more trouble on the file system side in supporting a bigger
> logical block size. For example, we discussed 8KB folio size support
> recently. Matthew already shared a patch for supporting 8KB folios, but
> everything should be carefully tested. Also, I ran into an issue with the
> read-ahead logic: if I format my file system volume with a 32KB logical
> block size, then the read-ahead logic returns 16KB folios to me, which was
> slightly surprising. So, I assume we can find a lot of potential issues on
> the file system side for a bigger logical block size, from the point of
> view of the efficiency of metadata and user data operations. Also, heavily
> loaded systems can have fragmented memory, which makes memory allocation
> trickier; I mean that it may not be easy to allocate one big folio.
> Log-structured file systems can easily align write I/O requests for a
> bigger logical block size, but in-place-update file systems can see
> increased write amplification, because a bigger portion of data has to be
> flushed for a small modification. However, the FTL can use delta encoding
> and smart logic for compacting several logical blocks into one NAND flash
> page. And, by the way, a NAND flash page is usually bigger than 4KB.
> 
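On the read-ahead surprise: as far as I know a file system can currently
only opt in to large folios; there is no way yet to demand a minimum folio
order, so read-ahead is free to hand back folios smaller than the file
system block size. A minimal sketch of the opt-in (the "myfs" helper is
hypothetical; mapping_set_large_folios() is the real interface):

#include <linux/fs.h>
#include <linux/pagemap.h>

/* Hypothetical inode-setup path of some file system "myfs".
 * mapping_set_large_folios() merely allows large folios in the page
 * cache; it does not force read-ahead to produce, say, 32KB folios. */
static void myfs_setup_inode(struct inode *inode)
{
	mapping_set_large_folios(inode->i_mapping);
}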
And that is actually a very valid point; memory fragmentation will 
become an issue with larger block sizes.

Theoretically this is quite easy to solve: just switch the memory
subsystem to use the largest block size in the system, and run every
smaller memory allocation via SLUB (or whatever the allocator-of-the-day
currently is :-). Then, trivially, the system will never be fragmented,
and I/O can always use large folios.
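
Very roughly, and purely as a sketch of the idea (all names below are
invented):

#include <linux/gfp.h>
#include <linux/slab.h>

/* Hypothetical system-wide minimum allocation unit: one large block,
 * e.g. order 2 == 16KB on a machine with 4KB pages. */
#define SYS_BLOCK_ORDER	2

/* I/O paths would only ever see block-sized (or larger) folios... */
static struct folio *io_alloc_folio(gfp_t gfp)
{
	return folio_alloc(gfp, SYS_BLOCK_ORDER);
}

/* ...while anything smaller would come from the slab allocator, which
 * would internally carve its slabs out of SYS_BLOCK_ORDER folios
 * instead of individual pages. */
static void *sub_block_alloc(size_t size, gfp_t gfp)
{
	return kmalloc(size, gfp);
}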

However, that means doing away with alloc_page(), which is still in
widespread use throughout the kernel. I would actually be in favour of it,
but the mm people might have a different view.
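
The mechanical part of the conversion would mostly look like this (the
hard part is auditing each caller):

/* Today: a fixed single-page allocation. */
struct page *page = alloc_page(GFP_KERNEL);

/* Folio equivalent: order 0 here, but the order can be raised later
 * without touching the surrounding code. */
struct folio *folio = folio_alloc(GFP_KERNEL, 0);

/* Where a struct page is still required: */
struct page *first = folio_page(folio, 0);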

Matthew, worth a new topic?
Handling memory fragmentation on large block I/O systems?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich

