Re: I/O topology fixes for big physical block size

public inbox for linux-ext4@vger.kernel.org

* Re: I/O topology fixes for big physical block size
       [not found]       ` <yq18w2mddav.fsf@sermon.lab.mkp.net>
@ 2010-09-27 23:15         ` Mike Snitzer
  2010-09-28  4:30           ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2010-09-27 23:15 UTC
  To: Martin K. Petersen
  Cc: Jens Axboe, James.Bottomley@hansenpartnership.com,
	linux-scsi@vger.kernel.org, linux-ext4

On Mon, Sep 27 2010 at  6:36pm -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Jens" == Jens Axboe <jaxboe@fusionio.com> writes:
> Jens> Does mkfs do the right thing?
> 
> Depends on which mkfs it is. Mike has tested things and can chip in
> here...

I haven't tested all mkfs.* but...

mkfs.xfs just works with 1M physical_block_size.

mkfs.ext4 won't by default but -F "fixes" that:

# mkfs.ext4 -b 4096 -F /dev/mapper/20017380023360006
mke2fs 1.41.12 (17-May-2010)
Warning: specified blocksize 4096 is less than device physical sectorsize 1048576, forced to continue
...

I'll check fdisk and parted tomorrow (I know lvm2 doesn't look at
physical_block_size).


* Re: I/O topology fixes for big physical block size
  2010-09-27 23:15         ` I/O topology fixes for big physical block size Mike Snitzer
@ 2010-09-28  4:30           ` Jens Axboe
  2010-09-28  5:20             ` Eric Sandeen
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2010-09-28  4:30 UTC
  To: Mike Snitzer
  Cc: Martin K. Petersen, James.Bottomley@hansenpartnership.com,
	linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

On 2010-09-28 08:15, Mike Snitzer wrote:
> On Mon, Sep 27 2010 at  6:36pm -0400,
> Martin K. Petersen <martin.petersen@oracle.com> wrote:
> 
>>>>>>> "Jens" == Jens Axboe <jaxboe@fusionio.com> writes:
>> Jens> Does mkfs do the right thing?
>>
>> Depends on which mkfs it is. Mike has tested things and can chip in
>> here...
> 
> I haven't tested all mkfs.* but...
> 
> mkfs.xfs just works with 1M physical_block_size.
> 
> mkfs.ext4 won't by default but -F "fixes" that:
> 
> # mkfs.ext4 -b 4096 -F /dev/mapper/20017380023360006
> mke2fs 1.41.12 (17-May-2010)
> Warning: specified blocksize 4096 is less than device physical sectorsize 1048576, forced to continue

OK, so that's not exactly doing the right thing, but at least you can
work around it with a parameter. So I'd say that is good enough.

> I'll check fdisk and parted tomorrow (I know lvm2 doesn't look at
> physical_block_size).

Thanks!

-- 
Jens Axboe



* Re: I/O topology fixes for big physical block size
  2010-09-28  4:30           ` Jens Axboe
@ 2010-09-28  5:20             ` Eric Sandeen
  2010-09-28 14:15               ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2010-09-28  5:20 UTC
  To: Jens Axboe
  Cc: Mike Snitzer, Martin K. Petersen,
	James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org,
	linux-ext4@vger.kernel.org

Jens Axboe wrote:
> On 2010-09-28 08:15, Mike Snitzer wrote:
>> On Mon, Sep 27 2010 at  6:36pm -0400,
>> Martin K. Petersen <martin.petersen@oracle.com> wrote:
>>
>>>>>>>> "Jens" == Jens Axboe <jaxboe@fusionio.com> writes:
>>> Jens> Does mkfs do the right thing?
>>>
>>> Depends on which mkfs it is. Mike has tested things and can chip in
>>> here...
>> I haven't tested all mkfs.* but...
>>
>> mkfs.xfs just works with 1M physical_block_size.
>>
>> mkfs.ext4 won't by default but -F "fixes" that:
>>
>> # mkfs.ext4 -b 4096 -F /dev/mapper/20017380023360006
>> mke2fs 1.41.12 (17-May-2010)
>> Warning: specified blocksize 4096 is less than device physical sectorsize 1048576, forced to continue
> 
> OK, so that's not exactly doing the right thing, but at least you can
> work around it with a parameter. So I'd say that is good enough.

Which part of it is the wrong thing...?

Today mkfs.ext4 refuses to create an fs blocksize which is smaller than logical
or physical by default, because one is suboptimal and the other is impossible.
-F (force) can override the suboptimal fs blocksize < logical blocksize case...

Should we change something?

Thanks,
-Eric

>> I'll check fdisk and parted tomorrow (I know lvm2 doesn't look at
>> physical_block_size).
> 
> Thanks!
> 



* Re: I/O topology fixes for big physical block size
  2010-09-28  5:20             ` Eric Sandeen
@ 2010-09-28 14:15               ` Mike Snitzer
  2010-09-28 20:57                 ` Ted Ts'o
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2010-09-28 14:15 UTC
  To: Eric Sandeen
  Cc: Jens Axboe, Martin K. Petersen,
	James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org,
	linux-ext4@vger.kernel.org

On Tue, Sep 28 2010 at  1:20am -0400,
Eric Sandeen <sandeen@redhat.com> wrote:

> Jens Axboe wrote:
> > On 2010-09-28 08:15, Mike Snitzer wrote:
> >> On Mon, Sep 27 2010 at  6:36pm -0400,
> >> Martin K. Petersen <martin.petersen@oracle.com> wrote:
> >>
> >>>>>>>> "Jens" == Jens Axboe <jaxboe@fusionio.com> writes:
> >>> Jens> Does mkfs do the right thing?
> >>>
> >>> Depends on which mkfs it is. Mike has tested things and can chip in
> >>> here...
> >> I haven't tested all mkfs.* but...
> >>
> >> mkfs.xfs just works with 1M physical_block_size.
> >>
> >> mkfs.ext4 won't by default but -F "fixes" that:
> >>
> >> # mkfs.ext4 -b 4096 -F /dev/mapper/20017380023360006
> >> mke2fs 1.41.12 (17-May-2010)
> >> Warning: specified blocksize 4096 is less than device physical sectorsize 1048576, forced to continue
> > 
> > OK, so that's not exactly doing the right thing, but at least you can
> > work around it with a parameter. So I'd say that is good enough.
> 
> Which part of it is the wrong thing...?
> 
> Today mkfs.ext4 refuses to create an fs blocksize which is smaller than logical
> or physical by default, because one is suboptimal and the other is impossible.
> -F (force) can override the suboptimal fs blocksize < logical blocksize case...

Actually, -F allows one to override fs blocksize < physical_block_size.

In this instance we have the following:
# cat /sys/block/dm-2/queue/physical_block_size 
1048576
# cat /sys/block/dm-2/queue/logical_block_size 
512
 
> Should we change something?

Unclear.  I could see maybe automatically capping the fs block size at
4096 if physical_block_size is larger and is a multiple of 4096?
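
A rough sketch of that capping rule, for illustration only.  The helper
name cap_fs_blocksize is hypothetical (it is not part of mke2fs), and
returning the physical size unchanged when the rule does not apply is
an assumption:

```sh
# Hypothetical helper: pick a candidate fs block size from the device's
# physical block size, capping it at 4096 when the physical size is
# larger than 4096 and a multiple of it.
cap_fs_blocksize() {
    psize=$1
    cap=4096
    if [ "$psize" -gt "$cap" ] && [ $((psize % cap)) -eq 0 ]; then
        echo "$cap"
    else
        # Assumption: leave every other case to the caller.
        echo "$psize"
    fi
}
```

On the device above this could be fed straight from sysfs, e.g.
cap_fs_blocksize "$(cat /sys/block/dm-2/queue/physical_block_size)",
which gives 4096 for a 1048576-byte physical block size.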

> >> I'll check fdisk and parted tomorrow (I know lvm2 doesn't look at
> >> physical_block_size).

Both fdisk and parted look good (partitions are physical_block_size
aligned, will warn if you attempt to stray from that alignment).  I'll
spare you the details of the creation steps...

Results of fdisk:
-----------------

# fdisk /dev/sdb
...
The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.
...

# fdisk -l -u /dev/sdb

Disk /dev/sdb: 17.2 GB, 17179869184 bytes
255 heads, 63 sectors/track, 2088 cylinders, total 33554432 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 1048576 bytes
I/O size (minimum/optimal): 1048576 bytes / 1048576 bytes
Disk identifier: 0x0009bf46

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048    16775167     8386560   83  Linux
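
The start sector in that listing can be checked against the physical
block size with a little arithmetic.  A minimal sketch, assuming a
hypothetical helper is_aligned that takes the start sector, the logical
sector size and the physical block size as printed by fdisk:

```sh
# Hypothetical helper: succeed if a partition's starting byte offset is
# a multiple of the physical block size.
is_aligned() {
    start_sector=$1
    logical=$2
    physical=$3
    [ $(( (start_sector * logical) % physical )) -eq 0 ]
}

# /dev/sdb1 starts at sector 2048 with 512-byte logical sectors:
# 2048 * 512 = 1048576 bytes, exactly one 1 MiB physical block.
if is_aligned 2048 512 1048576; then
    echo "aligned"
else
    echo "misaligned"
fi
```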


Results of parted:
------------------
Also looks good, doesn't care about physical_block_size.  Is more
concerned with {minimum,optimal}_io_size.

(parted) unit MiB
(parted) p
Model: XXXXXXXXXXXXX
Disk /dev/sdb: 16384MiB
Sector size (logical/physical): 512B/1048576B
Partition Table: msdos

Number  Start    End      Size     Type     File system  Flags
 1      1.00MiB  8191MiB  8190MiB  primary


* Re: I/O topology fixes for big physical block size
  2010-09-28 14:15               ` Mike Snitzer
@ 2010-09-28 20:57                 ` Ted Ts'o
  2010-09-28 21:24                   ` Martin K. Petersen
  0 siblings, 1 reply; 15+ messages in thread
From: Ted Ts'o @ 2010-09-28 20:57 UTC
  To: Mike Snitzer
  Cc: Eric Sandeen, Jens Axboe, Martin K. Petersen,
	James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org,
	linux-ext4@vger.kernel.org

On Tue, Sep 28, 2010 at 10:15:45AM -0400, Mike Snitzer wrote:
> Actually, -F allows one to override fs blocksize < physical_block_size.
> 
> In this instance we have the following:
> # cat /sys/block/dm-2/queue/physical_block_size 
> 1048576
> # cat /sys/block/dm-2/queue/logical_block_size 
> 512
>  
> > Should we change something?
> 
> Unclear.  I could see maybe automatically capping the fs block size at
> 4096 if physical_block_size is larger and is a multiple of 4096?

Can we decide soon what the right thing should be?  I'm about to
release e2fsprogs 1.41.13, and if I should put in some sanity checking
code so mke2fs does something sane when it sees a 1M physical block
size, I can do that.

Or if the kernel is going to do that, it's fine too....

      	  	    	     	      	   - Ted


* Re: I/O topology fixes for big physical block size
  2010-09-28 20:57                 ` Ted Ts'o
@ 2010-09-28 21:24                   ` Martin K. Petersen
  2010-09-28 21:36                     ` Eric Sandeen
  0 siblings, 1 reply; 15+ messages in thread
From: Martin K. Petersen @ 2010-09-28 21:24 UTC
  To: Ted Ts'o
  Cc: Mike Snitzer, Eric Sandeen, Jens Axboe, Martin K. Petersen,
	James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org,
	linux-ext4@vger.kernel.org

>>>>> "Ted" == Ted Ts'o <tytso@mit.edu> writes:

Ted> Can we decide soon what the right thing should be?  I'm about to
Ted> release e2fsprogs 1.41.13, and if I should put in some sanity
Ted> checking code so mke2fs does something sane when it sees a 1M
Ted> physical block size, I can do that.

I don't think it's entirely clear what the "right thing" would be.

Let's ignore the 1MB block size for now. That's clearly a fluke and a
buggy device. But there are SSDs that will advertise an 8KiB physical
block size. And apparently 16KiB devices are in the pipeline.

How do we want to handle these devices? Allowing blocks bigger than the
page size is going to be painful.

So the question is whether we can tweak the filesystem layout in a way
that would alleviate the pain without having to change the filesystem
block size in the traditional sense.

At least we're talking about SSDs and arrays here. I assume the partial
block write penalty for these devices would be smaller than it is for
rotating media.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: I/O topology fixes for big physical block size
  2010-09-28 21:24                   ` Martin K. Petersen
@ 2010-09-28 21:36                     ` Eric Sandeen
  2010-09-30 16:30                       ` Ted Ts'o
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2010-09-28 21:36 UTC
  To: Martin K. Petersen
  Cc: Ted Ts'o, Mike Snitzer, Jens Axboe,
	James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org,
	linux-ext4@vger.kernel.org

Martin K. Petersen wrote:

>>>>>> "Ted" == Ted Ts'o <tytso@mit.edu> writes:
>
> Ted> Can we decide soon what the right thing should be?  I'm about to
> Ted> release e2fsprogs 1.41.13, and if I should put in some sanity
> Ted> checking code so mke2fs does something sane when it sees a 1M
> Ted> physical block size, I can do that.
>
> I don't think it's entirely clear what the "right thing" would be.
>
> Let's ignore the 1MB block size for now. That's clearly a fluke and a
> buggy device. But there are SSDs that will advertise an 8KiB physical
> block size. And apparently 16KiB devices are in the pipeline.
>   
Ok, then it sounds like mkfs.ext4's refusal to make fs blocksize less
than device physical sectorsize without -F is broken, and that should
be removed.  I'd say issue a warning in the case but if there's a 16k
physical device maybe there's no point in warning either?

> How do we want to handle these devices? Allowing blocks bigger than the
> page size is going to be painful.
>
> So the question is whether we can tweak the filesystem layout in a way
> that would alleviate the pain without having to change the filesystem
> block size in the traditional sense.
>
> At least we're talking about SSDs and arrays here. I assume the partial
> block write penalty for these devices would be smaller than it is for
> rotating media.
>
>   
I guess it must be.

Anyway here's a patch to remove the force requirement and just give the
user whatever they want, since apparently we can't avoid fs blocksize
less than physical sector size in general.  It does still warn
that the fs blocksize is less than physical sectorsize, but *shrug*


diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index add7c0c..6010fc1 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1634,17 +1634,15 @@ static void PRS(int argc, char *argv[])
 					ext2fs_blocks_count(&fs_param) /
 					(blocksize / 1024));
 	} else {
-		if (blocksize < lsector_size ||			/* Impossible */
-		    (!force && (blocksize < psector_size))) {	/* Suboptimal */
+		if (blocksize < lsector_size) {			/* Impossible */
 			com_err(program_name, EINVAL,
 				_("while setting blocksize; too small "
 				  "for device\n"));
 			exit(1);
-		} else if (blocksize < psector_size) {
+		} else if (blocksize < psector_size) {		/* Suboptimal */
 			fprintf(stderr, _("Warning: specified blocksize %d is "
-				"less than device physical sectorsize %d, "
-				"forced to continue\n"), blocksize,
-				psector_size);
+				"less than device physical sectorsize %d\n"),
+				blocksize, psector_size);
 		}
 	}
 

-Eric
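
Stripped of the diff markers, the check after this patch behaves
roughly as sketched below.  This is a shell paraphrase for illustration,
not the mke2fs code; check_blocksize is a hypothetical name and the
three arguments stand in for blocksize, lsector_size and psector_size:

```sh
# Hypothetical paraphrase of the patched check in PRS().
check_blocksize() {
    blocksize=$1
    lsector=$2
    psector=$3
    if [ "$blocksize" -lt "$lsector" ]; then
        # Impossible: the fs block cannot be smaller than a logical sector.
        echo "while setting blocksize; too small for device" >&2
        return 1
    elif [ "$blocksize" -lt "$psector" ]; then
        # Suboptimal: warn, but no longer require -F.
        echo "Warning: specified blocksize $blocksize is less than device physical sectorsize $psector" >&2
    fi
    return 0
}
```

With the values from the start of the thread, check_blocksize 4096 512
1048576 prints the warning and succeeds, while a block size below the
logical sector size still fails.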




* Re: I/O topology fixes for big physical block size
  2010-09-28 21:36                     ` Eric Sandeen
@ 2010-09-30 16:30                       ` Ted Ts'o
  2010-09-30 17:07                         ` Eric Sandeen
  0 siblings, 1 reply; 15+ messages in thread
From: Ted Ts'o @ 2010-09-30 16:30 UTC
  To: Eric Sandeen
  Cc: Martin K. Petersen, Mike Snitzer, Jens Axboe,
	James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org,
	linux-ext4@vger.kernel.org

On Tue, Sep 28, 2010 at 04:36:42PM -0500, Eric Sandeen wrote:
> Ok, then it sounds like mkfs.ext4's refusal to make fs blocksize less
> than device physical sectorsize without -F is broken, and that should
> be removed.  I'd say issue a warning in the case but if there's a 16k
> physical device maybe there's no point in warning either?

If the device physical sectorsize is that big, should we perhaps use
that as a hint to align writes to blocks aligned with that
physical sectorsize?  Right now we use the optimal I/O size, but if
the optimal I/O size is not specified and the physical sectorsize is,
say, 16k or 32k, maybe we should use it to calculate
s_raid_stripe_width?
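
A small sketch of that fallback, purely as an illustration of the
proposal and not of anything mke2fs does today.  stripe_width_blocks is
a hypothetical helper; it takes optimal_io_size, the physical sector
size and the fs block size in bytes and prints a stripe width in fs
blocks (0 meaning "no hint"):

```sh
# Hypothetical helper: derive a stripe-width hint in fs blocks.
stripe_width_blocks() {
    optimal=$1
    physical=$2
    blocksize=$3
    if [ "$optimal" -gt 0 ]; then
        # Current behaviour as described above: use the optimal I/O size.
        echo $((optimal / blocksize))
    elif [ "$physical" -gt "$blocksize" ]; then
        # Proposed fallback: no optimal I/O size, but a large physical sector.
        echo $((physical / blocksize))
    else
        echo 0
    fi
}
```

For a 16k physical sector with no optimal I/O size and a 4k fs block,
stripe_width_blocks 0 16384 4096 prints 4.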

						- Ted


* Re: I/O topology fixes for big physical block size
  2010-09-30 16:30                       ` Ted Ts'o
@ 2010-09-30 17:07                         ` Eric Sandeen
  2010-09-30 17:33                           ` Mike Snitzer
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2010-09-30 17:07 UTC
  To: Ted Ts'o, Martin K. Petersen, Mike Snitzer, Jens Axboe,
	"James.Bottomley@hansenpartnership

On 09/30/2010 11:30 AM, Ted Ts'o wrote:
> On Tue, Sep 28, 2010 at 04:36:42PM -0500, Eric Sandeen wrote:
>> Ok, then it sounds like mkfs.ext4's refusal to make fs blocksize less
>> than device physical sectorsize without -F is broken, and that should
>> be removed.  I'd say issue a warning in the case but if there's a 16k
>> physical device maybe there's no point in warning either?
> 
> If the device physical sectorsize is that big, should we perhaps use
> that as a hint to align writes to blocks aligned with that
> physical sectorsize?  Right now we use the optimal I/O size, but if
> the optimal I/O size is not specified and the physical sectorsize is,

I can't keep track of all the parameters, is it ever true that optimal
I/O size isn't specified?

> say, 16k or 32k, maybe we should use it to calculate
> s_raid_stripe_width?

Perhaps, though really ext4 still doesn't do -that- much with the value,
anyway...

-Eric


* Re: I/O topology fixes for big physical block size
  2010-09-30 17:07                         ` Eric Sandeen
@ 2010-09-30 17:33                           ` Mike Snitzer
  2010-10-01 14:24                             ` Ted Ts'o
  0 siblings, 1 reply; 15+ messages in thread
From: Mike Snitzer @ 2010-09-30 17:33 UTC
  To: Eric Sandeen
  Cc: Ted Ts'o, Martin K. Petersen, Jens Axboe,
	James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org,
	linux-ext4@vger.kernel.org

On Thu, Sep 30 2010 at  1:07pm -0400,
Eric Sandeen <sandeen@redhat.com> wrote:

> On 09/30/2010 11:30 AM, Ted Ts'o wrote:
> > On Tue, Sep 28, 2010 at 04:36:42PM -0500, Eric Sandeen wrote:
> >> Ok, then it sounds like mkfs.ext4's refusal to make fs blocksize less
> >> than device physical sectorsize without -F is broken, and that should
> >> be removed.  I'd say issue a warning in the case but if there's a 16k
> >> physical device maybe there's no point in warning either?
> > 
> > If the device physical sectorsize is that big, should we perhaps use
> > that as a hint to align writes to blocks aligned with that
> > physical sectorsize?  Right now we use the optimal I/O size, but if
> > the optimal I/O size is not specified and the physical sectorsize is,
> 
> I can't keep track of all the parameters, is it ever true that optimal
> I/O size isn't specified?

Yes, optimal_io_size may be 0.  But minimum_io_size will always be scaled
up to at least match physical_block_size.

In any case: this 1MB physical_block_size device, which started this
thread, also has 1MB for both minimum_io_size and optimal_io_size.

Mike


* Re: I/O topology fixes for big physical block size
  2010-09-30 17:33                           ` Mike Snitzer
@ 2010-10-01 14:24                             ` Ted Ts'o
  2010-10-01 22:19                               ` Martin K. Petersen
  0 siblings, 1 reply; 15+ messages in thread
From: Ted Ts'o @ 2010-10-01 14:24 UTC
  To: Mike Snitzer
  Cc: Eric Sandeen, Martin K. Petersen, Jens Axboe,
	James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org,
	linux-ext4@vger.kernel.org

On Thu, Sep 30, 2010 at 01:33:43PM -0400, Mike Snitzer wrote:
> 
> Yes, optimal_io_size may be 0.  But minimum_io_size will always be scaled
> up to at least match physical_block_size.

Woah!  Are we sure we want to do that?  According to Jens, 8k physical
blocks are here already and 16k physical block sizes are right
around the corner.  If we scale minimum_io_size up to the physical
block size, then even though these devices will have 512 or 4k logical
block sizes, minimum_io_size will be 16k?  That sounds wrong, given
that the Linux VM can't handle file system block sizes greater than
page size.  And if we scale the minimum_io_size to
the physical block size, mke2fs will refuse to create a 4k blocksize
filesystem --- since presumably "minimum io size" means we can't do
I/O's smaller than that.

Please tell me you meant to say __logical__ blocksize above?

Or am I misunderstanding what you meant?

						- Ted


* Re: I/O topology fixes for big physical block size
  2010-10-01 14:24                             ` Ted Ts'o
@ 2010-10-01 22:19                               ` Martin K. Petersen
  2010-10-02  2:31                                 ` Ted Ts'o
  0 siblings, 1 reply; 15+ messages in thread
From: Martin K. Petersen @ 2010-10-01 22:19 UTC
  To: Ted Ts'o
  Cc: Mike Snitzer, Eric Sandeen, Martin K. Petersen, Jens Axboe,
	James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org,
	linux-ext4@vger.kernel.org

>>>>> "Ted" == Ted Ts'o <tytso@mit.edu> writes:

Ted> If we scale minimum_io_size up to the physical block size, then
Ted> even though these devices will have 512 or 4k logical block sizes,
Ted> minimum_io_size will be 16k?  That sounds wrong, given that the
Ted> Linux VM can't handle file system block sizes greater than page
Ted> size.  And if we scale the minimum_io_size to the
Ted> physical block size, mke2fs will refuse to create a 4k blocksize
Ted> filesystem --- since presumably "minimum io size" means we can't do
Ted> I/O's smaller than that.

logical <= physical <= minimum 

        logical is the smallest unit we can address.  Usually 512 bytes.

        physical is the allocation unit the device claims to use
        internally.  Typically 512 or 4096.  8 and 16 KiB coming.

        minimum is the device's preferred minimum random I/O unit.  This
        is usually identical to the physical block size.  Arrays might
        report a multiple of the physical block size here (stripe chunk
        size).

        optimal (if provided) is the preferred sequential I/O unit and a
        multiple of minimum (stripe width).

The logical and physical parameters are device protocol-centric values.
The minimum and optimal I/O sizes are the two "soft" values that
filesystems should be looking at for layout hints.

A filesystem should use minimum as a cue for block size and optimal as a
cue for stripe width.  minimum may indeed be bigger than page size and
this discussion was started to figure out if there were things we could
do to accommodate these devices without actually changing the filesystem
block size in the traditional sense.

Since not all drives guarantee that a read-modify-write cycle on a 4 KiB
physical block won't clobber adjacent 512-byte logical blocks, it may be
a good idea to look at physical block size if there are atomicity
concerns.  I.e. filesystems that depend on atomic journal writes may
want to look at the reported physical block size.
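
The ordering rules above can be written down as a quick sanity check.
This is only a sketch: topology_ok is a hypothetical helper, and it
takes the four values in bytes in the order logical, physical, minimum,
optimal (as found in logical_block_size, physical_block_size,
minimum_io_size and optimal_io_size under /sys/block/<dev>/queue/):

```sh
# Hypothetical helper: succeed if the reported topology satisfies
#   logical <= physical <= minimum
# and optimal is either unset (0) or a multiple of minimum.
topology_ok() {
    logical=$1
    physical=$2
    minimum=$3
    optimal=$4
    [ "$logical" -le "$physical" ] || return 1
    [ "$physical" -le "$minimum" ] || return 1
    if [ "$optimal" -ne 0 ] && [ $((optimal % minimum)) -ne 0 ]; then
        return 1
    fi
    return 0
}
```

The 1MB device from this thread passes (topology_ok 512 1048576 1048576
1048576), as does a plain 4 KiB drive with no optimal I/O size
(topology_ok 512 4096 4096 0).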

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: I/O topology fixes for big physical block size
  2010-10-01 22:19                               ` Martin K. Petersen
@ 2010-10-02  2:31                                 ` Ted Ts'o
  2010-10-02  3:03                                   ` Daniel Taylor
  2010-10-04 19:49                                   ` Martin K. Petersen
  0 siblings, 2 replies; 15+ messages in thread
From: Ted Ts'o @ 2010-10-02  2:31 UTC
  To: Martin K. Petersen
  Cc: Mike Snitzer, Eric Sandeen, Jens Axboe,
	James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org,
	linux-ext4@vger.kernel.org

On Fri, Oct 01, 2010 at 06:19:21PM -0400, Martin K. Petersen wrote:
> Since not all drives guarantee that a read-modify-write cycle on a 4 KiB
> physical block won't clobber adjacent 512-byte logical blocks, it may be
> a good idea to look at physical block size if there are atomicity
> concerns.  I.e. filesystems that depend on atomic journal writes may
> want to look at the reported physical block size.

OK, but what do we do when we start seeing devices with 8k or 16k
physical block sizes?  The VM doesn't deal well with block sizes >
page size.

					- Ted


* RE: I/O topology fixes for big physical block size
  2010-10-02  2:31                                 ` Ted Ts'o
@ 2010-10-02  3:03                                   ` Daniel Taylor
  2010-10-04 19:49                                   ` Martin K. Petersen
  1 sibling, 0 replies; 15+ messages in thread
From: Daniel Taylor @ 2010-10-02  3:03 UTC
  To: linux-ext4

 

> -----Original Message-----
> From: linux-ext4-owner@vger.kernel.org 
> [mailto:linux-ext4-owner@vger.kernel.org] On Behalf Of Ted Ts'o
> Sent: Friday, October 01, 2010 7:31 PM
> To: Martin K. Petersen
> Cc: Mike Snitzer; Eric Sandeen; Jens Axboe; 
> James.Bottomley@hansenpartnership.com; 
> linux-scsi@vger.kernel.org; linux-ext4@vger.kernel.org
> Subject: Re: I/O topology fixes for big physical block size
> 
> On Fri, Oct 01, 2010 at 06:19:21PM -0400, Martin K. Petersen wrote:
> > Since not all drives guarantee that a read-modify-write cycle on a 4 KiB
> > physical block won't clobber adjacent 512-byte logical blocks, it may be
> > a good idea to look at physical block size if there are atomicity
> > concerns.  I.e. filesystems that depend on atomic journal writes may
> > want to look at the reported physical block size.
> 
> OK, but what do we do when we start seeing devices with 8k or 16k
> physical block sizes?  The VM doesn't deal well with block sizes >
> page size.

This is a very real concern.

Those drives already exist, in essence, in RAID configurations, and
we have had to do a workaround that complicates our production process
to handle file systems for embedded devices where the file system block
size is 64K (the kernel block size for the device is also 64K), but
there's no corresponding x86 block size available.

BTW, not all drives with 4096-byte physical blocks are reporting themselves
as such.  Some of them report as 512-byte physical.

> 
> 					- Ted


* Re: I/O topology fixes for big physical block size
  2010-10-02  2:31                                 ` Ted Ts'o
  2010-10-02  3:03                                   ` Daniel Taylor
@ 2010-10-04 19:49                                   ` Martin K. Petersen
  1 sibling, 0 replies; 15+ messages in thread
From: Martin K. Petersen @ 2010-10-04 19:49 UTC
  To: Ted Ts'o
  Cc: Martin K. Petersen, Mike Snitzer, Eric Sandeen, Jens Axboe,
	James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org,
	linux-ext4@vger.kernel.org

>>>>> "Ted" == Ted Ts'o <tytso@mit.edu> writes:

Ted,

Ted> OK, but what do we do when we start seeing devices with 8k or 16k
Ted> physical block sizes?  The VM doesn't deal well with block sizes >
Ted> page size.

I don't think we're going to see devices reporting logical blocks bigger
than 4KiB anytime soon.  Too much pain for everybody in the industry
(most other operating systems can't even deal with 4KiB logical blocks
yet).  Eventually we will have to do the required page cache surgery to
support filesystem block sizes bigger than the page size.  But I don't
think that's something we'll have to deal with in the immediate future.

In the meantime, however, the question is whether there is something we
can do in the allocators to mitigate effects of devices reporting
physical blocks bigger than PAGE_CACHE_SIZE.  Obviously this would be in
the I/O hint/alignment category and not something which would guarantee
that all writes would be aligned multiples of that physical block size.
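
As one illustration of what such an alignment hint could mean in
practice, here is rounding an offset up to the next physical-block
multiple.  align_up is a hypothetical helper, and nothing here reflects
what the ext4 allocator actually does:

```sh
# Hypothetical helper: round a byte offset up to the next multiple of
# the physical block size.
align_up() {
    offset=$1
    physical=$2
    echo $(( (offset + physical - 1) / physical * physical ))
}
```

With a 16 KiB physical block, align_up 5000 16384 gives 16384, while an
offset that is already aligned is returned unchanged.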

-- 
Martin K. Petersen	Oracle Linux Engineering
