* Re: I/O topology fixes for big physical block size
From: Mike Snitzer @ 2010-09-27 23:15 UTC
To: Martin K. Petersen
Cc: Jens Axboe, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4

On Mon, Sep 27 2010 at 6:36pm -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Jens" == Jens Axboe <jaxboe@fusionio.com> writes:
> Jens> Does mkfs do the right thing?
>
> Depends on which mkfs it is. Mike has tested things and can chip in
> here...

I haven't tested all mkfs.* but...

mkfs.xfs just works with 1M physical_block_size.

mkfs.ext4 won't by default but -F "fixes" that:

# mkfs.ext4 -b 4096 -F /dev/mapper/20017380023360006
mke2fs 1.41.12 (17-May-2010)
Warning: specified blocksize 4096 is less than device physical sectorsize 1048576, forced to continue
...

I'll check fdisk and parted tomorrow (I know lvm2 doesn't look at
physical_block_size).

* Re: I/O topology fixes for big physical block size
From: Jens Axboe @ 2010-09-28 4:30 UTC
To: Mike Snitzer
Cc: Martin K. Petersen, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

On 2010-09-28 08:15, Mike Snitzer wrote:
> On Mon, Sep 27 2010 at 6:36pm -0400,
> Martin K. Petersen <martin.petersen@oracle.com> wrote:
>
>>>>>>> "Jens" == Jens Axboe <jaxboe@fusionio.com> writes:
>> Jens> Does mkfs do the right thing?
>>
>> Depends on which mkfs it is. Mike has tested things and can chip in
>> here...
>
> I haven't tested all mkfs.* but...
>
> mkfs.xfs just works with 1M physical_block_size.
>
> mkfs.ext4 won't by default but -F "fixes" that:
>
> # mkfs.ext4 -b 4096 -F /dev/mapper/20017380023360006
> mke2fs 1.41.12 (17-May-2010)
> Warning: specified blocksize 4096 is less than device physical sectorsize 1048576, forced to continue

OK, so that's not exactly doing the right thing, but at least you can
work around it with a parameter. So I'd say that is good enough.

> I'll check fdisk and parted tomorrow (I know lvm2 doesn't look at
> physical_block_size).

Thanks!

--
Jens Axboe

* Re: I/O topology fixes for big physical block size
From: Eric Sandeen @ 2010-09-28 5:20 UTC
To: Jens Axboe
Cc: Mike Snitzer, Martin K. Petersen, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

Jens Axboe wrote:
> On 2010-09-28 08:15, Mike Snitzer wrote:
>> On Mon, Sep 27 2010 at 6:36pm -0400,
>> Martin K. Petersen <martin.petersen@oracle.com> wrote:
>>
>>>>>>>> "Jens" == Jens Axboe <jaxboe@fusionio.com> writes:
>>> Jens> Does mkfs do the right thing?
>>>
>>> Depends on which mkfs it is. Mike has tested things and can chip in
>>> here...
>> I haven't tested all mkfs.* but...
>>
>> mkfs.xfs just works with 1M physical_block_size.
>>
>> mkfs.ext4 won't by default but -F "fixes" that:
>>
>> # mkfs.ext4 -b 4096 -F /dev/mapper/20017380023360006
>> mke2fs 1.41.12 (17-May-2010)
>> Warning: specified blocksize 4096 is less than device physical sectorsize 1048576, forced to continue
>
> OK, so that's not exactly doing the right thing, but at least you can
> work around it with a parameter. So I'd say that is good enough.

Which part of it is the wrong thing...?

Today mkfs.ext4 refuses to create an fs blocksize which is smaller than logical
or physical by default, because one is suboptimal and the other is impossible.
-F (force) can override the suboptimal fs blocksize < logical blocksize case...

Should we change something?

Thanks,
-Eric

>> I'll check fdisk and parted tomorrow (I know lvm2 doesn't look at
>> physical_block_size).
>
> Thanks!
>

* Re: I/O topology fixes for big physical block size
From: Mike Snitzer @ 2010-09-28 14:15 UTC
To: Eric Sandeen
Cc: Jens Axboe, Martin K. Petersen, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

On Tue, Sep 28 2010 at 1:20am -0400,
Eric Sandeen <sandeen@redhat.com> wrote:

> Jens Axboe wrote:
> > On 2010-09-28 08:15, Mike Snitzer wrote:
> >> On Mon, Sep 27 2010 at 6:36pm -0400,
> >> Martin K. Petersen <martin.petersen@oracle.com> wrote:
> >>
> >>>>>>>> "Jens" == Jens Axboe <jaxboe@fusionio.com> writes:
> >>> Jens> Does mkfs do the right thing?
> >>>
> >>> Depends on which mkfs it is. Mike has tested things and can chip in
> >>> here...
> >> I haven't tested all mkfs.* but...
> >>
> >> mkfs.xfs just works with 1M physical_block_size.
> >>
> >> mkfs.ext4 won't by default but -F "fixes" that:
> >>
> >> # mkfs.ext4 -b 4096 -F /dev/mapper/20017380023360006
> >> mke2fs 1.41.12 (17-May-2010)
> >> Warning: specified blocksize 4096 is less than device physical sectorsize 1048576, forced to continue
> >
> > OK, so that's not exactly doing the right thing, but at least you can
> > work around it with a parameter. So I'd say that is good enough.
>
> Which part of it is the wrong thing...?
>
> Today mkfs.ext4 refuses to create an fs blocksize which is smaller than logical
> or physical by default, because one is suboptimal and the other is impossible.
> -F (force) can override the suboptimal fs blocksize < logical blocksize case...

Actually, -F allows one to override fs blocksize < physical_block_size.

In this instance we have the following:
# cat /sys/block/dm-2/queue/physical_block_size
1048576
# cat /sys/block/dm-2/queue/logical_block_size
512

> Should we change something?

Unclear. I could see maybe automatically capping the fs block size at
4096 if physical_block_size is larger and is a multiple of 4096?

> >> I'll check fdisk and parted tomorrow (I know lvm2 doesn't look at
> >> physical_block_size).

Both fdisk and parted look good (partitions are physical_block_size
aligned, will warn if you attempt to stray from that alignment).

I'll spare you details of the creation steps...

Results of fdisk:
-----------------
# fdisk /dev/sdb
...
The device presents a logical sector size that is smaller than
the physical sector size. Aligning to a physical sector (or optimal
I/O) size boundary is recommended, or performance may be impacted.
...

# fdisk -l -u /dev/sdb

Disk /dev/sdb: 17.2 GB, 17179869184 bytes
255 heads, 63 sectors/track, 2088 cylinders, total 33554432 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 1048576 bytes
I/O size (minimum/optimal): 1048576 bytes / 1048576 bytes
Disk identifier: 0x0009bf46

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048    16775167     8386560   83  Linux

Results of parted:
------------------
Also looks good, doesn't care about physical_block_size. Is more
concerned with {minimum,optimal}_io_size.

(parted) unit MiB
(parted) p
Model: XXXXXXXXXXXXX
Disk /dev/sdb: 16384MiB
Sector size (logical/physical): 512B/1048576B
Partition Table: msdos

Number  Start    End      Size     Type     File system  Flags
 1      1.00MiB  8191MiB  8190MiB  primary

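A minimal sketch of the two ideas in the message above: capping the filesystem block size at 4096 when the physical block size is a larger multiple of 4096, and checking that a partition start lands on a physical block boundary. Both function names are hypothetical and are not part of e2fsprogs, fdisk, or parted; the fallback for physical sizes that are not a multiple of 4096 is an assumption, since the thread does not say what should happen there.

```c
#include <stdint.h>

/* Hypothetical helper (not in e2fsprogs): choose a filesystem block size
 * from the reported physical block size. If the physical size is larger
 * than 4096 and a multiple of it, cap at 4096; otherwise return the
 * physical size unchanged (assumed fallback). */
uint32_t capped_fs_blocksize(uint32_t physical_block_size)
{
    const uint32_t cap = 4096;

    if (physical_block_size > cap && physical_block_size % cap == 0)
        return cap;
    return physical_block_size;
}

/* Hypothetical helper: returns 1 if a partition whose start is given in
 * logical sectors begins on a physical block boundary, 0 otherwise. */
int start_is_aligned(uint64_t start_sector,
                     uint32_t logical_block_size,
                     uint32_t physical_block_size)
{
    uint64_t start_bytes = start_sector * logical_block_size;

    return start_bytes % physical_block_size == 0;
}
```

With the values reported above (start sector 2048, 512-byte logical sectors, 1048576-byte physical blocks), the partition starts at 2048 * 512 = 1048576 bytes, so the alignment check passes.
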
* Re: I/O topology fixes for big physical block size
From: Ted Ts'o @ 2010-09-28 20:57 UTC
To: Mike Snitzer
Cc: Eric Sandeen, Jens Axboe, Martin K. Petersen, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

On Tue, Sep 28, 2010 at 10:15:45AM -0400, Mike Snitzer wrote:
> Actually, -F allows one to override fs blocksize < physical_block_size.
>
> In this instance we have the following:
> # cat /sys/block/dm-2/queue/physical_block_size
> 1048576
> # cat /sys/block/dm-2/queue/logical_block_size
> 512
>
> > Should we change something?
>
> Unclear. I could see maybe automatically capping the fs block size at
> 4096 if physical_block_size is larger and is a multiple of 4096?

Can we decide soon what the right thing should be? I'm about to
release e2fsprogs 1.41.13, and if I should put in some sanity checking
code so mke2fs does something sane when it sees a 1M physical block
size, I can do that.

Or if the kernel is going to do that, it's fine too....

- Ted

* Re: I/O topology fixes for big physical block size
From: Martin K. Petersen @ 2010-09-28 21:24 UTC
To: Ted Ts'o
Cc: Mike Snitzer, Eric Sandeen, Jens Axboe, Martin K. Petersen, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

>>>>> "Ted" == Ted Ts'o <tytso@mit.edu> writes:

Ted> Can we decide soon what the right thing should be? I'm about to
Ted> release e2fsprogs 1.41.13, and if I should put in some sanity
Ted> checking code so mke2fs does something sane when it sees a 1M
Ted> physical block size, I can do that.

I don't think it's entirely clear what the "right thing" would be.

Let's ignore the 1MB block size for now. That's clearly a fluke and a
buggy device. But there are SSDs that will advertise an 8KiB physical
block size. And apparently 16KiB devices are in the pipeline.

How do we want to handle these devices? Allowing blocks bigger than the
page size is going to be painful.

So the question is whether we can tweak the filesystem layout in a way
that would alleviate the pain without having to change the filesystem
block size in the traditional sense.

At least we're talking about SSDs and arrays here. I assume the partial
block write penalty for these devices would be smaller than it is for
rotating media.

--
Martin K. Petersen	Oracle Linux Engineering

* Re: I/O topology fixes for big physical block size
From: Eric Sandeen @ 2010-09-28 21:36 UTC
To: Martin K. Petersen
Cc: Ted Ts'o, Mike Snitzer, Jens Axboe, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

Martin K. Petersen wrote:
>>>>>> "Ted" == Ted Ts'o <tytso@mit.edu> writes:
>
> Ted> Can we decide soon what the right thing should be? I'm about to
> Ted> release e2fsprogs 1.41.13, and if I should put in some sanity
> Ted> checking code so mke2fs does something sane when it sees a 1M
> Ted> physical block size, I can do that.
>
> I don't think it's entirely clear what the "right thing" would be.
>
> Let's ignore the 1MB block size for now. That's clearly a fluke and a
> buggy device. But there are SSDs that will advertise an 8KiB physical
> block size. And apparently 16KiB devices are in the pipeline.
>

Ok, then it sounds like mkfs.ext4's refusal to make fs blocksize less
than device physical sectorsize without -F is broken, and that should
be removed. I'd say issue a warning in that case, but if there's a 16k
physical device maybe there's no point in warning either?

> How do we want to handle these devices? Allowing blocks bigger than the
> page size is going to be painful.
>
> So the question is whether we can tweak the filesystem layout in a way
> that would alleviate the pain without having to change the filesystem
> block size in the traditional sense.
>
> At least we're talking about SSDs and arrays here. I assume the partial
> block write penalty for these devices would be smaller than it is for
> rotating media.
>

I guess it must be.

Anyway here's a patch to remove the force requirement and just give
the user whatever they want, since apparently we can't avoid
fs blocksize less than physical sector size in general.

It does still warn that the fs blocksize is less than physical sectorsize,
but *shrug*

diff --git a/misc/mke2fs.c b/misc/mke2fs.c
index add7c0c..6010fc1 100644
--- a/misc/mke2fs.c
+++ b/misc/mke2fs.c
@@ -1634,17 +1634,15 @@ static void PRS(int argc, char *argv[])
 				ext2fs_blocks_count(&fs_param) /
 				(blocksize / 1024));
 	} else {
-		if (blocksize < lsector_size ||			/* Impossible */
-		    (!force && (blocksize < psector_size))) {	/* Suboptimal */
+		if (blocksize < lsector_size) {			/* Impossible */
 			com_err(program_name, EINVAL,
 				_("while setting blocksize; too small "
 				  "for device\n"));
 			exit(1);
-		} else if (blocksize < psector_size) {
+		} else if (blocksize < psector_size) {		/* Suboptimal */
 			fprintf(stderr, _("Warning: specified blocksize %d is "
-				"less than device physical sectorsize %d, "
-				"forced to continue\n"), blocksize,
-				psector_size);
+				"less than device physical sectorsize %d\n"),
+				blocksize, psector_size);
 		}
 	}
 

-Eric

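The decision the patch above leaves in place can be sketched as a small standalone function, shown here only to make the three outcomes explicit. The enum and function name are hypothetical; only the two comparisons and the variable names `blocksize`, `lsector_size`, and `psector_size` come from the patch.

```c
/* Hypothetical names; only the comparisons mirror the patched mke2fs logic. */
enum blocksize_verdict {
    BLOCKSIZE_OK,          /* at least as large as the physical sector size */
    BLOCKSIZE_SUBOPTIMAL,  /* smaller than physical: warn and continue */
    BLOCKSIZE_IMPOSSIBLE   /* smaller than logical: refuse */
};

enum blocksize_verdict classify_blocksize(int blocksize,
                                          int lsector_size,
                                          int psector_size)
{
    if (blocksize < lsector_size)
        return BLOCKSIZE_IMPOSSIBLE;
    if (blocksize < psector_size)
        return BLOCKSIZE_SUBOPTIMAL;
    return BLOCKSIZE_OK;
}
```

For the device in this thread (4096-byte fs blocks, 512-byte logical, 1048576-byte physical) this yields the "suboptimal" case, which after the patch only prints a warning and no longer needs -F.
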
* Re: I/O topology fixes for big physical block size
From: Ted Ts'o @ 2010-09-30 16:30 UTC
To: Eric Sandeen
Cc: Martin K. Petersen, Mike Snitzer, Jens Axboe, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

On Tue, Sep 28, 2010 at 04:36:42PM -0500, Eric Sandeen wrote:
> Ok, then it sounds like mkfs.ext4's refusal to make fs blocksize less
> than device physical sectorsize without -F is broken, and that should
> be removed. I'd say issue a warning in that case, but if there's a 16k
> physical device maybe there's no point in warning either?

If the device physical sectorsize is that big, should we perhaps use
that as a hint to align writes to blocks aligned with that
physical sectorsize? Right now we use the optimal I/O size, but if
the optimal I/O size is not specified and the physical sectorsize is,
say, 16k or 32k, maybe we should use it to calculate
s_raid_stripe_width?

- Ted

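A minimal sketch of the fallback Ted proposes above: use the optimal I/O size when the device reports one, otherwise fall back to the physical sector size. The function name is hypothetical and this is not how mke2fs computes the value; it also assumes the resulting hint is expressed in filesystem blocks, with 0 meaning "no hint".

```c
#include <stdint.h>

/* Hypothetical helper: derive a stripe-width hint in filesystem blocks.
 * optimal_io_size == 0 means the device did not report one, in which case
 * the physical sector size is used instead. Returns 0 (no hint) when the
 * chosen size does not exceed one filesystem block. blocksize must be
 * non-zero. */
uint32_t stripe_width_hint(uint32_t optimal_io_size,
                           uint32_t psector_size,
                           uint32_t blocksize)
{
    uint32_t bytes = optimal_io_size ? optimal_io_size : psector_size;

    if (bytes <= blocksize)
        return 0;
    return bytes / blocksize;
}
```

With no optimal I/O size, a 16k physical sector, and 4k filesystem blocks, the hint would be 4 blocks.
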
* Re: I/O topology fixes for big physical block size
From: Eric Sandeen @ 2010-09-30 17:07 UTC
To: Ted Ts'o, Martin K. Petersen, Mike Snitzer, Jens Axboe, "James.Bottomley@hansenpartnership

On 09/30/2010 11:30 AM, Ted Ts'o wrote:
> On Tue, Sep 28, 2010 at 04:36:42PM -0500, Eric Sandeen wrote:
>> Ok, then it sounds like mkfs.ext4's refusal to make fs blocksize less
>> than device physical sectorsize without -F is broken, and that should
>> be removed. I'd say issue a warning in that case, but if there's a 16k
>> physical device maybe there's no point in warning either?
>
> If the device physical sectorsize is that big, should we perhaps use
> that as a hint to align writes to blocks aligned with that
> physical sectorsize? Right now we use the optimal I/O size, but if
> the optimal I/O size is not specified and the physical sectorsize is,

I can't keep track of all the parameters, is it ever true that optimal
I/O size isn't specified?

> say, 16k or 32k, maybe we should use it to calculate
> s_raid_stripe_width?

Perhaps, though really ext4 still doesn't do -that- much with
the value, anyway...

-Eric

* Re: I/O topology fixes for big physical block size
From: Mike Snitzer @ 2010-09-30 17:33 UTC
To: Eric Sandeen
Cc: Ted Ts'o, Martin K. Petersen, Jens Axboe, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

On Thu, Sep 30 2010 at 1:07pm -0400,
Eric Sandeen <sandeen@redhat.com> wrote:

> On 09/30/2010 11:30 AM, Ted Ts'o wrote:
> > On Tue, Sep 28, 2010 at 04:36:42PM -0500, Eric Sandeen wrote:
> >> Ok, then it sounds like mkfs.ext4's refusal to make fs blocksize less
> >> than device physical sectorsize without -F is broken, and that should
> >> be removed. I'd say issue a warning in that case, but if there's a 16k
> >> physical device maybe there's no point in warning either?
> >
> > If the device physical sectorsize is that big, should we perhaps use
> > that as a hint to align writes to blocks aligned with that
> > physical sectorsize? Right now we use the optimal I/O size, but if
> > the optimal I/O size is not specified and the physical sectorsize is,
>
> I can't keep track of all the parameters, is it ever true that optimal
> I/O size isn't specified?

Yes, optimal_io_size may be 0. But minimum_io_size will always be scaled
up to at least match physical_block_size.

In any case: this 1MB physical_block_size device, which started this
thread, also has 1MB for both minimum_io_size and optimal_io_size.

Mike

* Re: I/O topology fixes for big physical block size
From: Ted Ts'o @ 2010-10-01 14:24 UTC
To: Mike Snitzer
Cc: Eric Sandeen, Martin K. Petersen, Jens Axboe, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

On Thu, Sep 30, 2010 at 01:33:43PM -0400, Mike Snitzer wrote:
>
> Yes, optimal_io_size may be 0. But minimum_io_size will always be scaled
> up to at least match physical_block_size.

Woah! Are we sure we want to do that? According to Jens, 8k physical
blocks are here already and 16k physical block sizes are right around
the corner.

If we scale minimum_io_size up to the physical block size, then even
though these devices will have 512 or 4k logical block sizes,
minimum_io_size will be 16k? That sounds wrong, given that the Linux
VM can't handle file system block sizes greater than page size. And
if we scale the minimum_io_size to the physical block size, mke2fs
will refuse to create a 4k blocksize filesystem --- since presumably
"minimum io size" means we can't do I/O's smaller than that.

Please tell me you meant to say __logical__ blocksize above? Or am I
misunderstanding what you meant?

- Ted

* Re: I/O topology fixes for big physical block size
From: Martin K. Petersen @ 2010-10-01 22:19 UTC
To: Ted Ts'o
Cc: Mike Snitzer, Eric Sandeen, Martin K. Petersen, Jens Axboe, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

>>>>> "Ted" == Ted Ts'o <tytso@mit.edu> writes:

Ted> If we scale minimum_io_size up to the physical block size, then
Ted> even though these devices will have 512 or 4k logical block sizes,
Ted> minimum_io_size will be 16k? That sounds wrong, given that the
Ted> Linux VM can't handle file system block sizes greater than page
Ted> size. And if we scale the minimum_io_size to the
Ted> physical block size, mke2fs will refuse to create a 4k blocksize
Ted> filesystem --- since presumably "minimum io size" means we can't do
Ted> I/O's smaller than that.

	logical <= physical <= minimum

logical is the smallest unit we can address. Usually 512 bytes.

physical is the allocation unit the device claims to use
internally. Typically 512 or 4096. 8 and 16 KiB coming.

minimum is the device's preferred minimum random I/O unit. This is
usually identical to the physical block size. Arrays might report a
multiple of the physical block size here (stripe chunk size).

optimal (if provided) is the preferred sequential I/O unit and a
multiple of minimum (stripe width).

The logical and physical parameters are device protocol-centric
values. The minimum and optimal I/O sizes are the two "soft" values
that filesystems should be looking at for layout hints.

A filesystem should use minimum as a cue for block size and optimal as a
cue for stripe width. minimum may indeed be bigger than page size, and
this discussion was started to figure out if there were things we could do
to accommodate these devices without actually changing the filesystem
block size in the traditional sense.

Since not all drives guarantee that a read-modify-write cycle on a 4 KiB
physical block won't clobber adjacent 512-byte logical blocks, it may be
a good idea to look at physical block size if there are atomicity
concerns. I.e. filesystems that depend on atomic journal writes may
want to look at the reported physical block size.

--
Martin K. Petersen	Oracle Linux Engineering

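The ordering Martin describes above (logical <= physical <= minimum, with optimal either absent or a multiple of minimum) can be written down as a consistency check. This is an illustrative sketch only: the struct and function are hypothetical and are not the kernel's queue_limits or any e2fsprogs API; the field names simply follow the sysfs attribute names used in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four values discussed in the thread. */
struct io_topology {
    uint32_t logical_block_size;   /* smallest addressable unit */
    uint32_t physical_block_size;  /* device's internal allocation unit */
    uint32_t minimum_io_size;      /* preferred minimum random I/O unit */
    uint32_t optimal_io_size;      /* preferred sequential unit, 0 if absent */
};

/* Returns true when the values satisfy
 *   logical <= physical <= minimum
 * and optimal, if reported, is a multiple of minimum. */
bool topology_is_consistent(const struct io_topology *t)
{
    if (t->logical_block_size == 0)
        return false;
    if (t->logical_block_size > t->physical_block_size)
        return false;
    if (t->physical_block_size > t->minimum_io_size)
        return false;
    if (t->optimal_io_size != 0 &&
        t->optimal_io_size % t->minimum_io_size != 0)
        return false;
    return true;
}
```

The device that started this thread (512-byte logical, 1 MiB physical, 1 MiB minimum and optimal) satisfies the check; so does a device that reports no optimal I/O size at all.
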
* Re: I/O topology fixes for big physical block size
From: Ted Ts'o @ 2010-10-02 2:31 UTC
To: Martin K. Petersen
Cc: Mike Snitzer, Eric Sandeen, Jens Axboe, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

On Fri, Oct 01, 2010 at 06:19:21PM -0400, Martin K. Petersen wrote:
> Since not all drives guarantee that a read-modify-write cycle on a 4 KiB
> physical block won't clobber adjacent 512-byte logical blocks, it may be
> a good idea to look at physical block size if there are atomicity
> concerns. I.e. filesystems that depend on atomic journal writes may
> want to look at the reported physical block size.

OK, but what do we do when we start seeing devices with 8k or 16k
physical block sizes? The VM doesn't deal well with block sizes >
page size.

- Ted

* RE: I/O topology fixes for big physical block size
From: Daniel Taylor @ 2010-10-02 3:03 UTC
To: linux-ext4

> -----Original Message-----
> From: linux-ext4-owner@vger.kernel.org
> [mailto:linux-ext4-owner@vger.kernel.org] On Behalf Of Ted Ts'o
> Sent: Friday, October 01, 2010 7:31 PM
> To: Martin K. Petersen
> Cc: Mike Snitzer; Eric Sandeen; Jens Axboe;
> James.Bottomley@hansenpartnership.com;
> linux-scsi@vger.kernel.org; linux-ext4@vger.kernel.org
> Subject: Re: I/O topology fixes for big physical block size
>
> On Fri, Oct 01, 2010 at 06:19:21PM -0400, Martin K. Petersen wrote:
> > Since not all drives guarantee that a read-modify-write cycle on a 4 KiB
> > physical block won't clobber adjacent 512-byte logical blocks, it may be
> > a good idea to look at physical block size if there are atomicity
> > concerns. I.e. filesystems that depend on atomic journal writes may
> > want to look at the reported physical block size.
>
> OK, but what do we do when we start seeing devices with 8k or 16k
> physical block sizes? The VM doesn't deal well with block sizes >
> page size.

This is a very real concern. Those drives already exist, in essence,
in RAID configurations, and we have had to do a workaround that
complicates our production process to handle file systems for embedded
devices where the file system block size is 64K (the kernel block size
for the device is also 64K), but there's no corresponding x86 block size
available.

BTW, not all drives with 4096-byte physical blocks are reporting
themselves as such. Some of them report as 512-byte physical.

>
> - Ted

* Re: I/O topology fixes for big physical block size
From: Martin K. Petersen @ 2010-10-04 19:49 UTC
To: Ted Ts'o
Cc: Martin K. Petersen, Mike Snitzer, Eric Sandeen, Jens Axboe, James.Bottomley@hansenpartnership.com, linux-scsi@vger.kernel.org, linux-ext4@vger.kernel.org

>>>>> "Ted" == Ted Ts'o <tytso@mit.edu> writes:

Ted,

Ted> OK, but what do we do when we start seeing devices with 8k or 16k
Ted> physical block sizes? The VM doesn't deal well with block sizes >
Ted> page size.

I don't think we're going to see devices reporting logical blocks bigger
than 4KiB anytime soon. Too much pain for everybody in the industry
(most other operating systems can't even deal with 4KiB logical blocks
yet).

Eventually we will have to do the required page cache surgery to support
filesystem block sizes bigger than the page size. But I don't think
that's something we'll have to deal with in the immediate future.

In the meantime, however, the question is whether there is something we
can do in the allocators to mitigate effects of devices reporting
physical blocks bigger than PAGE_CACHE_SIZE. Obviously this would be in
the I/O hint/alignment category and not something which would guarantee
that all writes would be aligned multiples of that physical block size.

--
Martin K. Petersen	Oracle Linux Engineering

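One way to picture the allocator-side alignment hint discussed above is rounding a candidate starting block up to the next physical block boundary, counted in filesystem blocks. This is a sketch under assumptions, not an ext4 allocator change: `align_up` is a hypothetical helper, and as the message notes such a hint would not guarantee that every write is an aligned multiple of the physical block size.

```c
#include <stdint.h>

/* Hypothetical helper: round offset up to the next multiple of boundary.
 * A boundary of 0 is treated as "no alignment requested". */
uint64_t align_up(uint64_t offset, uint64_t boundary)
{
    if (boundary == 0)
        return offset;
    return ((offset + boundary - 1) / boundary) * boundary;
}
```

For a 16 KiB physical block and 4 KiB filesystem blocks the boundary is 16384 / 4096 = 4 blocks, so an allocation that would otherwise start at block 5 would be nudged to block 8.
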