linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Hannes Reinecke <hare@suse.de>
Cc: "Luis Chamberlain" <mcgrof@kernel.org>,
	"Keith Busch" <kbusch@kernel.org>,
	"Theodore Ts'o" <tytso@mit.edu>,
	"Pankaj Raghav" <p.raghav@samsung.com>,
	"Daniel Gomez" <da.gomez@samsung.com>,
	"Javier González" <javier.gonz@samsung.com>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
Date: Sat, 4 Mar 2023 17:54:38 +0000	[thread overview]
Message-ID: <ZAOF3p+vqA6pd7px@casper.infradead.org> (raw)
In-Reply-To: <edac909b-98e5-cb6d-bb80-2f6a20a15029@suse.de>

On Sat, Mar 04, 2023 at 06:17:35PM +0100, Hannes Reinecke wrote:
> On 3/4/23 17:47, Matthew Wilcox wrote:
> > On Sat, Mar 04, 2023 at 12:08:36PM +0100, Hannes Reinecke wrote:
> > > We could implement a (virtual) zoned device, and expose each zone as a
> > > block. That gives us the required large block characteristics, and with
> > > a bit of luck we might be able to dial up to really large block sizes
> > > like the 256M sizes on current SMR drives.
> > > ublk might be a good starting point.
> > 
> > Ummmm.  Is supporting 256MB block sizes really a desired goal?  I suggest
> > that is far past the knee of the curve; if we can only write 256MB chunks
> > as a single entity, we're looking more at a filesystem redesign than we
> > are at making filesystems and the MM support 256MB size blocks.
> > 
> Naa, not really. It _would_ be cool as we could get rid of all the cludges
> which have nowadays re sequential writes.
> And, remember, 256M is just a number someone thought to be a good
> compromise. If we end up with a lower number (16M?) we might be able
> to convince the powers that be to change their zone size.
> Heck, with 16M block size there wouldn't be a _need_ for zones in
> the first place.
> 
> But yeah, 256M is excessive. Initially I would shoot for something
> like 2M.

I think we're talking about different things (probably different storage
vendors want different things, or even different people at the same
storage vendor want different things).

Luis and I are talking about larger LBA sizes.  That is, the minimum
read/write size from the block device is 16kB or 64kB or whatever.
In this scenario, the minimum amount of space occupied by a file goes
up from 512 bytes or 4kB to 64kB.  That's doable, even if somewhat
suboptimal.

Your concern seems to be more around shingled devices (or their equivalent
in SSD terms) where there are large zones which are append-only, but
you can still random-read 512 byte LBAs.  I think there are different
solutions to these problems, and people are working on both of these
problems.

But if storage vendors are really pushing for 256MB LBAs, then that's
going to need a third kind of solution, and I'm not aware of anyone
working on that.

  reply	other threads:[~2023-03-04 17:54 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-01  3:52 [LSF/MM/BPF TOPIC] Cloud storage optimizations Theodore Ts'o
2023-03-01  4:18 ` Gao Xiang
2023-03-01  4:40   ` Matthew Wilcox
2023-03-01  4:59     ` Gao Xiang
2023-03-01  4:35 ` Matthew Wilcox
2023-03-01  4:49   ` Gao Xiang
2023-03-01  5:01     ` Matthew Wilcox
2023-03-01  5:09       ` Gao Xiang
2023-03-01  5:19         ` Gao Xiang
2023-03-01  5:42         ` Matthew Wilcox
2023-03-01  5:51           ` Gao Xiang
2023-03-01  6:00             ` Gao Xiang
2023-03-02  3:13 ` Chaitanya Kulkarni
2023-03-02  3:50 ` Darrick J. Wong
2023-03-03  3:03   ` Martin K. Petersen
2023-03-02 20:30 ` Bart Van Assche
2023-03-03  3:05   ` Martin K. Petersen
2023-03-03  1:58 ` Keith Busch
2023-03-03  3:49   ` Matthew Wilcox
2023-03-03 11:32     ` Hannes Reinecke
2023-03-03 13:11     ` James Bottomley
2023-03-04  7:34       ` Matthew Wilcox
2023-03-04 13:41         ` James Bottomley
2023-03-04 16:39           ` Matthew Wilcox
2023-03-05  4:15             ` Luis Chamberlain
2023-03-05  5:02               ` Matthew Wilcox
2023-03-08  6:11                 ` Luis Chamberlain
2023-03-08  7:59                   ` Dave Chinner
2023-03-06 12:04               ` Hannes Reinecke
2023-03-06  3:50             ` James Bottomley
2023-03-04 19:04         ` Luis Chamberlain
2023-03-03 21:45     ` Luis Chamberlain
2023-03-03 22:07       ` Keith Busch
2023-03-03 22:14         ` Luis Chamberlain
2023-03-03 22:32           ` Keith Busch
2023-03-03 23:09             ` Luis Chamberlain
2023-03-16 15:29             ` Pankaj Raghav
2023-03-16 15:41               ` Pankaj Raghav
2023-03-03 23:51       ` Bart Van Assche
2023-03-04 11:08       ` Hannes Reinecke
2023-03-04 13:24         ` Javier González
2023-03-04 16:47         ` Matthew Wilcox
2023-03-04 17:17           ` Hannes Reinecke
2023-03-04 17:54             ` Matthew Wilcox [this message]
2023-03-04 18:53               ` Luis Chamberlain
2023-03-05  3:06               ` Damien Le Moal
2023-03-05 11:22               ` Hannes Reinecke
2023-03-06  8:23                 ` Matthew Wilcox
2023-03-06 10:05                   ` Hannes Reinecke
2023-03-06 16:12                   ` Theodore Ts'o
2023-03-08 17:53                     ` Matthew Wilcox
2023-03-08 18:13                       ` James Bottomley
2023-03-09  8:04                         ` Javier González
2023-03-09 13:11                           ` James Bottomley
2023-03-09 14:05                             ` Keith Busch
2023-03-09 15:23                             ` Martin K. Petersen
2023-03-09 20:49                               ` James Bottomley
2023-03-09 21:13                                 ` Luis Chamberlain
2023-03-09 21:28                                   ` Martin K. Petersen
2023-03-10  1:16                                     ` Dan Helmick
2023-03-10  7:59                             ` Javier González
2023-03-08 19:35                 ` Luis Chamberlain
2023-03-08 19:55                 ` Bart Van Assche
2023-03-03  2:54 ` Martin K. Petersen
2023-03-03  3:29   ` Keith Busch
2023-03-03  4:20   ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZAOF3p+vqA6pd7px@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=da.gomez@samsung.com \
    --cc=hare@suse.de \
    --cc=javier.gonz@samsung.com \
    --cc=kbusch@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lsf-pc@lists.linux-foundation.org \
    --cc=mcgrof@kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).