* [PATCH] dm: Fix alignment stacking on partitioned devices @ 2009-12-18 7:30 Martin K. Petersen 2009-12-18 17:33 ` Mike Snitzer 0 siblings, 1 reply; 16+ messages in thread From: Martin K. Petersen @ 2009-12-18 7:30 UTC (permalink / raw) To: device-mapper development, Mike Snitzer; +Cc: jens.axboe The dm device limits function passes the start sector within the block device to the block layer stacking function. However, the partition's offset on the physical device is not added, resulting in incorrect alignment reporting. Add the partition offset to the values passed to blk_stack_limits(). Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> --- diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c index be62547..67efac9 100644 --- a/drivers/md/dm-table.c +++ b/drivers/md/dm-table.c @@ -495,6 +495,7 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev, struct queue_limits *limits = data; struct block_device *bdev = dev->bdev; struct request_queue *q = bdev_get_queue(bdev); + sector_t offset = (get_start_sect(bdev) + start) << 9; char b[BDEVNAME_SIZE]; if (unlikely(!q)) { @@ -503,7 +504,7 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev, return 0; } - if (blk_stack_limits(limits, &q->limits, start << 9) < 0) + if (blk_stack_limits(limits, &q->limits, offset) < 0) DMWARN("%s: target device %s is misaligned: " "physical_block_size=%u, logical_block_size=%u, " "alignment_offset=%u, start=%llu", @@ -511,7 +512,7 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev, q->limits.physical_block_size, q->limits.logical_block_size, q->limits.alignment_offset, - (unsigned long long) start << 9); + (unsigned long long) offset); /* ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-18 7:30 [PATCH] dm: Fix alignment stacking on partitioned devices Martin K. Petersen @ 2009-12-18 17:33 ` Mike Snitzer 2009-12-21 17:49 ` Martin K. Petersen 0 siblings, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2009-12-18 17:33 UTC (permalink / raw) To: Martin K. Petersen; +Cc: device-mapper development, jens.axboe On Fri, Dec 18 2009 at 2:30am -0500, Martin K. Petersen <martin.petersen@oracle.com> wrote: > > The dm device limits function passes the start sector within the block > device to the block layer stacking function. However, the partition's > offset on the physical device is not added, resulting in incorrect > alignment reporting. > > Add the partition offset to the values passed to blk_stack_limits(). > > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Martin, This is not required because DM assumes alignment_offset has already been accounted for by the caller (e.g. LVM2 or some other ficitional DM consumer). Below, "start" represents the aligned start of the data for a given volume. That start must have already been shifted by alignment_offset; as is the case with lvm2 (>= 2.02.51), see the following lvm2 commits: http://sources.redhat.com/git/gitweb.cgi?p=lvm2.git;a=commit;h=282029eb45e56 http://sources.redhat.com/git/gitweb.cgi?p=lvm2.git;a=commit;h=6c88b6c660020 All this being said, how did you arrive at this patch? Why do you feel it is needed? Was it just from code inspection? Regards, Mike > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c > index be62547..67efac9 100644 > --- a/drivers/md/dm-table.c > +++ b/drivers/md/dm-table.c > @@ -495,6 +495,7 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev, > struct queue_limits *limits = data; > struct block_device *bdev = dev->bdev; > struct request_queue *q = bdev_get_queue(bdev); > + sector_t offset = (get_start_sect(bdev) + start) << 9; > char b[BDEVNAME_SIZE]; > > if (unlikely(!q)) { > @@ -503,7 +504,7 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev, > return 0; > } > > - if (blk_stack_limits(limits, &q->limits, start << 9) < 0) > + if (blk_stack_limits(limits, &q->limits, offset) < 0) > DMWARN("%s: target device %s is misaligned: " > "physical_block_size=%u, logical_block_size=%u, " > "alignment_offset=%u, start=%llu", > @@ -511,7 +512,7 @@ int dm_set_device_limits(struct dm_target *ti, struct dm_dev *dev, > q->limits.physical_block_size, > q->limits.logical_block_size, > q->limits.alignment_offset, > - (unsigned long long) start << 9); > + (unsigned long long) offset); > > > /* ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-18 17:33 ` Mike Snitzer @ 2009-12-21 17:49 ` Martin K. Petersen 2009-12-21 19:52 ` Mike Snitzer 0 siblings, 1 reply; 16+ messages in thread From: Martin K. Petersen @ 2009-12-21 17:49 UTC (permalink / raw) To: Mike Snitzer, device-mapper development; +Cc: Martin K. Petersen, jens.axboe >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: Mike> This is not required because DM assumes alignment_offset has Mike> already been accounted for by the caller (e.g. LVM2 or some other Mike> ficitional DM consumer). Mike> Below, "start" represents the aligned start of the data for a Mike> given volume. That start must have already been shifted by Mike> alignment_offset; as is the case with lvm2 (>= 2.02.51), see the Mike> following lvm2 commits: Mike> http://sources.redhat.com/git/gitweb.cgi?p=lvm2.git;a=commit;h=282029eb45e56 Mike> http://sources.redhat.com/git/gitweb.cgi?p=lvm2.git;a=commit;h=6c88b6c660020 Mike> All this being said, how did you arrive at this patch? Why do you Mike> feel it is needed? Was it just from code inspection? Interesting. I have one case in my topology test scripts that prints a DM alignment warning with the old stacking algorithm but doesn't with the new one. A bit surprising given that the new approach is much more picky. MD correctly prints a warning for the same device with both algorithms. So I added a few printks and noticed that DM was calling the stacking function with the same start sector for both devices. That seemed odd because the PV devices are misaligned partitions on the same disk. Note that this was run using an old EL5 LVM toolkit because I'm not interested in having userland compensate for any misalignment. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-21 17:49 ` Martin K. Petersen @ 2009-12-21 19:52 ` Mike Snitzer 2009-12-22 4:27 ` Martin K. Petersen 0 siblings, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2009-12-21 19:52 UTC (permalink / raw) To: Martin K. Petersen; +Cc: device-mapper development, jens.axboe On Mon, Dec 21 2009 at 12:49pm -0500, Martin K. Petersen <martin.petersen@oracle.com> wrote: > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: > > Mike> This is not required because DM assumes alignment_offset has > Mike> already been accounted for by the caller (e.g. LVM2 or some other > Mike> ficitional DM consumer). > > Mike> Below, "start" represents the aligned start of the data for a > Mike> given volume. That start must have already been shifted by > Mike> alignment_offset; as is the case with lvm2 (>= 2.02.51), see the > Mike> following lvm2 commits: > Mike> http://sources.redhat.com/git/gitweb.cgi?p=lvm2.git;a=commit;h=282029eb45e56 > Mike> http://sources.redhat.com/git/gitweb.cgi?p=lvm2.git;a=commit;h=6c88b6c660020 > > Mike> All this being said, how did you arrive at this patch? Why do you > Mike> feel it is needed? Was it just from code inspection? > > Interesting. > > I have one case in my topology test scripts that prints a DM alignment > warning with the old stacking algorithm but doesn't with the new one. A > bit surprising given that the new approach is much more picky. MD > correctly prints a warning for the same device with both algorithms. > > So I added a few printks and noticed that DM was calling the stacking > function with the same start sector for both devices. That seemed odd > because the PV devices are misaligned partitions on the same disk. That is interesting. Do you have traces from the kernel with the old stacking function? (BTW. you don't _need_ kernel traces because the 'dmsetup table' output gives you the "start"). I have to believe the same "start" values are being provided regardless of the stacking function. If that is the case then the data "start" should be different for these individual PVs (one following the other). If not, can you provide the 'dmsetup table' output for the associated DM devices? > Note that this was run using an old EL5 LVM toolkit because I'm not > interested in having userland compensate for any misalignment. OK, strikes me as odd but you must have your reasons (maybe you're just testing that the old untrained tools create misaligned partitions, LVs, etc?). Mike ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-21 19:52 ` Mike Snitzer @ 2009-12-22 4:27 ` Martin K. Petersen 2009-12-22 14:32 ` Mike Snitzer 0 siblings, 1 reply; 16+ messages in thread From: Martin K. Petersen @ 2009-12-22 4:27 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, Martin K. Petersen, jens.axboe >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: Mike> If that is the case then the data "start" should be different for Mike> these individual PVs (one following the other). If not, can you Mike> provide the 'dmsetup table' output for the associated DM devices? Did some more runs. It turns out the old stacking algorithm identified the two alignments as being incompatible for the wrong reasons. That's why a message was printed with the older kernel. I do find it a bit strange that you pass relative offsets to the stacking function and rely on the userland utilities to do the right thing, though. Doesn't that mean that existing misaligned volumes will go undetected? Or do you perform a check when you prepare the table before feeding it to the kernel? >> Note that this was run using an old EL5 LVM toolkit because I'm not >> interested in having userland compensate for any misalignment. Mike> OK, strikes me as odd but you must have your reasons I'm testing the stacking function. I'm not interested in things being lined up. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-22 4:27 ` Martin K. Petersen @ 2009-12-22 14:32 ` Mike Snitzer 2009-12-22 17:41 ` Martin K. Petersen 0 siblings, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2009-12-22 14:32 UTC (permalink / raw) To: Martin K. Petersen; +Cc: device-mapper development, jens.axboe On Mon, Dec 21 2009 at 11:27pm -0500, Martin K. Petersen <martin.petersen@oracle.com> wrote: > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: > > Mike> If that is the case then the data "start" should be different for > Mike> these individual PVs (one following the other). If not, can you > Mike> provide the 'dmsetup table' output for the associated DM devices? > > Did some more runs. It turns out the old stacking algorithm identified > the two alignments as being incompatible for the wrong reasons. That's > why a message was printed with the older kernel. OK, so is MD somehow getting things wrong? (You originally said that with the new stacking function MD resulted in an error but DM did not). > I do find it a bit strange that you pass relative offsets to the > stacking function and rely on the userland utilities to do the right > thing, though. Doesn't that mean that existing misaligned volumes will > go undetected? Or do you perform a check when you prepare the table > before feeding it to the kernel? "start" isn't a relative offset. The "start" is absolute from the beginning of the device. So for old LVs that are already misaligned they'll pass down a start that _should_ cause misalignment messages in the kernel (via the stacking function). I know I've seen such messages when LVM doesn't compensate for alignment_offset but I'll revisit this with your new stacking function. The justification for relying on userspace to consume and adjust for alignment_offset is: you're not creating LVM LVs with the kernel. Metadata that (may) precede the data "start" of a volume gets layed out (and managed) by userspace (LVM). Not having the kernel involved in generating the "correct" offset to data allows such concerns to be purely in userspace (having the kernel contribute to that calculation doesn't get us anything other than potential for split brain bugs). Mike ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-22 14:32 ` Mike Snitzer @ 2009-12-22 17:41 ` Martin K. Petersen 2009-12-22 21:42 ` Mike Snitzer 0 siblings, 1 reply; 16+ messages in thread From: Martin K. Petersen @ 2009-12-22 17:41 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, Martin K. Petersen, jens.axboe >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: Mike, Mike> OK, so is MD somehow getting things wrong? (You originally said Mike> that with the new stacking function MD resulted in an error but DM Mike> did not). MD is passing in absolute offsets and got the right result in both cases. Mike> "start" isn't a relative offset. It's relative to the beginning of the partition (block_device), not relative to the beginning of the disk (request_queue). My beef here is that DM created a device like this: [root@10 ~]# pvs -o +pe_start PV VG Fmt Attr PSize PFree 1st PE /dev/sde1 foo lvm2 a- 508.00M 256.00M 192.00K /dev/sde2 foo lvm2 a- 512.00M 260.00M 192.00K [root@10 ~]# dmsetup table foo-bar: 0 1032192 striped 2 32 8:66 384 8:65 384 /dev/sde has an alignment_offset of 3584 /dev/sde1 has an alignment_offset of 0 /dev/sde2 has an alignment_offset of 1024 dm_set_device_limits() calls blk_stack_limits() with a byte offset of 196608 for both sde1 and sde2. And that offset is checked for alignment with the queue's limits which has an alignment_offset of 3584. The two PVs are misaligned but that information is lost because you use blk_stack_limits() with partition-relative offsets. My patch was an attempt to fix that. Test case: # modprobe scsi_debug dev_size_mb=1024 num_parts=2 lowest_aligned=7 physblk_exp=3 # pvcreate /dev/sde1 # pvcreate /dev/sde2 # vgcreate foo /dev/sde1 /dev/sde2 # lvcreate -I 16 -i 2 -L 500M -n bar foo -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-22 17:41 ` Martin K. Petersen @ 2009-12-22 21:42 ` Mike Snitzer 2009-12-22 22:13 ` Mike Snitzer 2009-12-23 6:05 ` Martin K. Petersen 0 siblings, 2 replies; 16+ messages in thread From: Mike Snitzer @ 2009-12-22 21:42 UTC (permalink / raw) To: Martin K. Petersen; +Cc: device-mapper development, jens.axboe Martin, Apologies for the long mail here... On Tue, Dec 22 2009 at 12:41pm -0500, Martin K. Petersen <martin.petersen@oracle.com> wrote: > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: > > Mike, > > Mike> OK, so is MD somehow getting things wrong? (You originally said > Mike> that with the new stacking function MD resulted in an error but DM > Mike> did not). > > MD is passing in absolute offsets and got the right result in both > cases. > > > Mike> "start" isn't a relative offset. > > It's relative to the beginning of the partition (block_device), not > relative to the beginning of the disk (request_queue). Ah, thanks for clarifying! No idea why I thought "start" was relative to the beginning of the disk (request_queue). > My beef here is that DM created a device like this: > > [root@10 ~]# pvs -o +pe_start > PV VG Fmt Attr PSize PFree 1st PE > /dev/sde1 foo lvm2 a- 508.00M 256.00M 192.00K > /dev/sde2 foo lvm2 a- 512.00M 260.00M 192.00K > [root@10 ~]# dmsetup table > foo-bar: 0 1032192 striped 2 32 8:66 384 8:65 384 > > /dev/sde has an alignment_offset of 3584 > /dev/sde1 has an alignment_offset of 0 > /dev/sde2 has an alignment_offset of 1024 I'm seeing the following (when I use the scsi_debug from below): # cat /sys/block/sdb/alignment_offset 3584 # cat /sys/block/sdb/sdb1/alignment_offset 512 # cat /sys/block/sdb/sdb2/alignment_offset 512 > dm_set_device_limits() calls blk_stack_limits() with a byte offset of > 196608 for both sde1 and sde2. And that offset is checked for alignment > with the queue's limits which has an alignment_offset of 3584. > > The two PVs are misaligned but that information is lost because you use > blk_stack_limits() with partition-relative offsets. My patch was an > attempt to fix that. Coming full circle, your patch makes sense. Sorry about giving you the run around! But I have new concerns (at the very end below) about the new blk_stack_limits(); I also need to review your patch further (relative to virtual device stacking: does get_start_sect(bdev) always work?). > Test case: > > # modprobe scsi_debug dev_size_mb=1024 num_parts=2 lowest_aligned=7 physblk_exp=3 > # pvcreate /dev/sde1 > # pvcreate /dev/sde2 > # vgcreate foo /dev/sde1 /dev/sde2 > # lvcreate -I 16 -i 2 -L 500M -n bar foo Here I'm just showing that I verified what you were seeing above. With an updated LVM but the older blk_stack_limits() the same test gives me: # pvs -o +pe_start PV VG Fmt Attr PSize PFree 1st PE /dev/sdb1 lvm2 -- 99.48m 99.48m 192.50k /dev/sdb2 lvm2 -- 100.50m 100.50m 192.50k # vgcreate foo /dev/sdb1 /dev/sdb2 # lvcreate -I 16 -i 2 -L100M -n bar foo # dmsetup table foo-bar 0 212992 striped 2 32 8:18 385 8:17 385 device-mapper: table: 253:1: target device sdb2 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=197120 device-mapper: table: 253:1: target device sdb1 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=197120 If I disable LVM's alignment_offset detection (runs the same as your older LVM) I get: # pvcreate --config 'devices {data_alignment_offset_detection=0}' /dev/sdb1 # pvcreate --config 'devices {data_alignment_offset_detection=0}' /dev/sdb2 # pvs -o +pe_start PV VG Fmt Attr PSize PFree 1st PE /dev/sdb1 lvm2 -- 99.48m 99.48m 192.00k /dev/sdb2 lvm2 -- 100.50m 100.50m 192.00k ... # dmsetup table foo-bar 0 212992 striped 2 32 8:18 384 8:17 384 device-mapper: table: 253:1: target device sdb2 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=196608 device-mapper: table: 253:1: target device sdb1 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=196608 Now if I apply your patch to add the partition offset -- which makes the offset that DM passes to blk_stack_limits() absolute: The old blk_stack_limits(), using LVM2 w/ data_alignment_offset_detection=1 shows this aligned case as "misaligned": device-mapper: table: 253:1: target device sdb2 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=104530432 device-mapper: table: 253:1: target device sdb1 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=213504 The new blk_stack_limits() "works" (believes the device is aligned). But if I use pvcreate w/ data_alignment_offset_detection=0 it _incorrectly_ believes the device is aligned. Why does the new blk_stack_limits() think the misaligned offset is aligned? I added some debugging to dm_set_device_limits() to see the following: 1) MISALIGNED-case: LVM2 with data_alignment_offset_detection=0: device-mapper: table: 254:1: target device sdb2 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=104529920 device-mapper: table: 254:1: target device sdb1 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=212992 This is not properly aligned relative to physical_block_size=4096, sdb's alignment_offset=3584 and each partitions' alignment_offset=512: >>> 104529920%4096 0 >>> 212992%4096 0 2) ALIGNED-case: LVM2 with data_alignment_offset_detection=1: device-mapper: table: 254:1: target device sdb2 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=104530432 device-mapper: table: 254:1: target device sdb1 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=213504 In this 2nd case, LVM2 is correctly adding the alignment_offset (512) for each partitioned device. Each data "start" includes the alignment_offset of 512. So the device is properly aligned: >>> 104530432%4096 512 >>> 213504%4096 512 I'll dig into blk_stack_limits() to see if I can make sense of this. And in the end; what, if anything, is DM doing wrong? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-22 21:42 ` Mike Snitzer @ 2009-12-22 22:13 ` Mike Snitzer 2009-12-23 6:05 ` Martin K. Petersen 1 sibling, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2009-12-22 22:13 UTC (permalink / raw) To: Martin K. Petersen; +Cc: device-mapper development, jens.axboe On Tue, Dec 22 2009 at 4:42pm -0500, Mike Snitzer <snitzer@redhat.com> wrote: > Martin, > > Apologies for the long mail here... > > On Tue, Dec 22 2009 at 12:41pm -0500, > Martin K. Petersen <martin.petersen@oracle.com> wrote: > > > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: > > > > Mike, > > > > Mike> OK, so is MD somehow getting things wrong? (You originally said > > Mike> that with the new stacking function MD resulted in an error but DM > > Mike> did not). > > > > MD is passing in absolute offsets and got the right result in both > > cases. > > > > > > Mike> "start" isn't a relative offset. > > > > It's relative to the beginning of the partition (block_device), not > > relative to the beginning of the disk (request_queue). > > Ah, thanks for clarifying! No idea why I thought "start" was relative > to the beginning of the disk (request_queue). > > > My beef here is that DM created a device like this: > > > > [root@10 ~]# pvs -o +pe_start > > PV VG Fmt Attr PSize PFree 1st PE > > /dev/sde1 foo lvm2 a- 508.00M 256.00M 192.00K > > /dev/sde2 foo lvm2 a- 512.00M 260.00M 192.00K > > [root@10 ~]# dmsetup table > > foo-bar: 0 1032192 striped 2 32 8:66 384 8:65 384 > > > > /dev/sde has an alignment_offset of 3584 > > /dev/sde1 has an alignment_offset of 0 > > /dev/sde2 has an alignment_offset of 1024 > > I'm seeing the following (when I use the scsi_debug from below): > # cat /sys/block/sdb/alignment_offset > 3584 > # cat /sys/block/sdb/sdb1/alignment_offset > 512 > # cat /sys/block/sdb/sdb2/alignment_offset > 512 Bleh, I used dev_size_mb=200 to get the above alignment_offsets. > > # modprobe scsi_debug dev_size_mb=1024 num_parts=2 lowest_aligned=7 physblk_exp=3 If I increase to dev_size_mb=500 I get: sdb1 alignment_offset=0 sdb2 alignment_offset=512 If I increase to dev_size_mb=1000 I also get: sdb1 alignment_offset=0 sdb2 alignment_offset=1024 Anyway, this doesn't change the fact that the new blk_stack_limits() isn't detecting the offset that DM provides as misaligned (in the legacy LVM case, data_alignment_offset_detection=0). In an earlier mail you said: dm_set_device_limits() calls blk_stack_limits() with a byte offset of 196608 for both sde1 and sde2. And that offset is checked for alignment with the queue's limits which has an alignment_offset of 3584. Applying that to this example that I shared: > The new blk_stack_limits() "works" (believes the device is aligned). > But if I use pvcreate w/ data_alignment_offset_detection=0 it > _incorrectly_ believes the device is aligned. > > Why does the new blk_stack_limits() think the misaligned offset is > aligned? I added some debugging to dm_set_device_limits() to see the > following: > > 1) MISALIGNED-case: LVM2 with data_alignment_offset_detection=0: > device-mapper: table: 254:1: target device sdb2 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=104529920 > device-mapper: table: 254:1: target device sdb1 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=212992 > > This is not properly aligned relative to physical_block_size=4096, sdb's > alignment_offset=3584 and each partitions' alignment_offset=512: > > >>> 104529920%4096 > 0 > >>> 212992%4096 > 0 dm_set_device_limits() calls blk_stack_limits() with a byte offset of 212992 for sdb1 and 104529920 for sdb2. And those offsets are checked for alignment with the queue's limits which has an alignment_offset of 3584. /me now goes to look at the new blk_stack_limits() ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-22 21:42 ` Mike Snitzer 2009-12-22 22:13 ` Mike Snitzer @ 2009-12-23 6:05 ` Martin K. Petersen 2009-12-23 14:26 ` Mike Snitzer 1 sibling, 1 reply; 16+ messages in thread From: Martin K. Petersen @ 2009-12-23 6:05 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, Martin K. Petersen, jens.axboe >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: Mike> I also need to review your patch further (relative to virtual Mike> device stacking: does get_start_sect(bdev) always work?). Yep. I found a bug in the partition alignment reporting. The following patch should fix that. Please try it out. Concerning your test cases: It is perfectly valid for two component devices to be misaligned with regards to their underlying physical devices as long as they have identical alignment. In that case the top level device will report a suitable alignment_offset. block: Fix topology stacking for data and discard alignment The stacking code incorrectly scaled up the data offset in some cases causing misaligned devices to report alignment. Rewrite the stacking algorithm to remedy this and apply the same alignment principles to the discard handling. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> diff --git a/block/blk-settings.c b/block/blk-settings.c index 6ae118d..ca4f0a4 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -517,9 +517,8 @@ static unsigned int lcm(unsigned int a, unsigned int b) int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, sector_t offset) { - int ret; - - ret = 0; + sector_t alignment; + unsigned int top, bottom; t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors); t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors); @@ -537,6 +536,18 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, t->max_segment_size = min_not_zero(t->max_segment_size, b->max_segment_size); + alignment = queue_limit_alignment_offset(b, offset); + + if (t->alignment_offset != alignment) { + + top = max(t->physical_block_size, t->io_min) + + t->alignment_offset; + bottom = max(b->physical_block_size, b->io_min) + alignment; + + if (max(top, bottom) & (min(top, bottom) - 1)) + t->misaligned = 1; + } + t->logical_block_size = max(t->logical_block_size, b->logical_block_size); @@ -544,54 +555,55 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, b->physical_block_size); t->io_min = max(t->io_min, b->io_min); + t->io_opt = lcm(t->io_opt, b->io_opt); + t->no_cluster |= b->no_cluster; t->discard_zeroes_data &= b->discard_zeroes_data; - /* Bottom device offset aligned? */ - if (offset && - (offset & (b->physical_block_size - 1)) != b->alignment_offset) { + if (t->physical_block_size & (t->logical_block_size - 1)) { + t->physical_block_size = t->logical_block_size; t->misaligned = 1; - ret = -1; } - /* - * Temporarily disable discard granularity. It's currently buggy - * since we default to 0 for discard_granularity, hence this - * "failure" will always trigger for non-zero offsets. - */ -#if 0 - if (offset && - (offset & (b->discard_granularity - 1)) != b->discard_alignment) { - t->discard_misaligned = 1; - ret = -1; + if (t->io_min & (t->physical_block_size - 1)) { + t->io_min = t->physical_block_size; + t->misaligned = 1; } -#endif - /* If top has no alignment offset, inherit from bottom */ - if (!t->alignment_offset) - t->alignment_offset = - b->alignment_offset & (b->physical_block_size - 1); + if (t->io_opt & (t->physical_block_size - 1)) { + t->io_opt = 0; + t->misaligned = 1; + } - if (!t->discard_alignment) - t->discard_alignment = - b->discard_alignment & (b->discard_granularity - 1); + t->alignment_offset = lcm(t->alignment_offset, alignment) + & (max(t->physical_block_size, t->io_min) - 1); - /* Top device aligned on logical block boundary? */ - if (t->alignment_offset & (t->logical_block_size - 1)) { + if (t->alignment_offset & (t->logical_block_size - 1)) t->misaligned = 1; - ret = -1; - } - /* Find lcm() of optimal I/O size and granularity */ - t->io_opt = lcm(t->io_opt, b->io_opt); - t->discard_granularity = lcm(t->discard_granularity, - b->discard_granularity); + /* Discard alignment and granularity */ + if (b->discard_granularity) { + + alignment = b->discard_alignment - + (offset & (b->discard_granularity - 1)); + + if (t->discard_granularity != 0 && + t->discard_alignment != alignment) { + top = t->discard_granularity + t->discard_alignment; + bottom = b->discard_granularity + alignment; - /* Verify that optimal I/O size is a multiple of io_min */ - if (t->io_min && t->io_opt % t->io_min) - ret = -1; + /* Verify that top and bottom intervals line up */ + if (max(top, bottom) & (min(top, bottom) - 1)) + t->discard_misaligned = 1; + } + + t->discard_granularity = max(t->discard_granularity, + b->discard_granularity); + t->discard_alignment = lcm(t->discard_alignment, alignment) & + (t->discard_granularity - 1); + } - return ret; + return t->misaligned ? -1 : 0; } EXPORT_SYMBOL(blk_stack_limits); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 784a919..af0ffac 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1116,11 +1116,19 @@ static inline int queue_alignment_offset(struct request_queue *q) return q->limits.alignment_offset; } +static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t offset) +{ + unsigned int granularity = max(lim->physical_block_size, lim->io_min); + + offset &= granularity - 1; + + return (granularity + lim->alignment_offset - offset) & (granularity - 1); +} + static inline int queue_sector_alignment_offset(struct request_queue *q, sector_t sector) { - return ((sector << 9) - q->limits.alignment_offset) - & (q->limits.io_min - 1); + return queue_limit_alignment_offset(&q->limits, sector << 9); } static inline int bdev_alignment_offset(struct block_device *bdev) ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-23 6:05 ` Martin K. Petersen @ 2009-12-23 14:26 ` Mike Snitzer 2009-12-23 16:33 ` Martin K. Petersen 0 siblings, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2009-12-23 14:26 UTC (permalink / raw) To: Martin K. Petersen; +Cc: device-mapper development, jens.axboe On Wed, Dec 23 2009 at 1:05am -0500, Martin K. Petersen <martin.petersen@oracle.com> wrote: > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: > > Mike> I also need to review your patch further (relative to virtual > Mike> device stacking: does get_start_sect(bdev) always work?). > > Yep. > > I found a bug in the partition alignment reporting. The following patch > should fix that. Please try it out. alignment_offset(s) look good now. Though I have one observation below. > Concerning your test cases: It is perfectly valid for two component > devices to be misaligned with regards to their underlying physical > devices as long as they have identical alignment. In that case the top > level device will report a suitable alignment_offset. One thing I noticed along the way is that: when stacking one or more misaligned devices there is the potential for misleading error messages, e.g.: foo-bar: 0 212992 striped 2 32 8:18 384 8:17 384 sdb alignment_offset is 3584 sdb1 alignment_offset is 0 sdb2 alignment_offset is 3072 when loading this table sdb2 is stacked before sdb1; sdb2 is misaligned but sdb1 is not, yet: device-mapper: table: 254:2: target device sdb2 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=534839808 device-mapper: table: 254:2: target device sdb1 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=228864 >>> 228864 % 4096 3584 >>> 534839808 % 4096 512 So sdb2 is taken to be aligned (even though it isn't, relative to the queue's alignment_offset of 3584) and then when sdb1 is stacked it is flagged as misaligned. Doesn't seem right. > block: Fix topology stacking for data and discard alignment > > The stacking code incorrectly scaled up the data offset in some cases > causing misaligned devices to report alignment. Rewrite the stacking > algorithm to remedy this and apply the same alignment principles to the > discard handling. > > Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> With this patch the alignment_offset values are correct. They accurately reflect the established baseline (queue's limits alignment_offset). Here is the incremental diff (for those following along): --- block/blk-settings.c | 7 +++---- include/linux/blkdev.h | 12 ++++++++++-- 2 files changed, 13 insertions(+), 6 deletions(-) diff --git a/block/blk-settings.c b/block/blk-settings.c index e14fcbc..ca4f0a4 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -518,7 +518,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, sector_t offset) { sector_t alignment; - unsigned int top, bottom, granularity; + unsigned int top, bottom; t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors); t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors); @@ -536,14 +536,13 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, t->max_segment_size = min_not_zero(t->max_segment_size, b->max_segment_size); - granularity = max(b->physical_block_size, b->io_min); - alignment = b->alignment_offset - (offset & (granularity - 1)); + alignment = queue_limit_alignment_offset(b, offset); if (t->alignment_offset != alignment) { top = max(t->physical_block_size, t->io_min) + t->alignment_offset; - bottom = granularity + alignment; + bottom = max(b->physical_block_size, b->io_min) + alignment; if (max(top, bottom) & (min(top, bottom) - 1)) t->misaligned = 1; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 784a919..af0ffac 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1116,11 +1116,19 @@ static inline int queue_alignment_offset(struct request_queue *q) return q->limits.alignment_offset; } +static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t offset) +{ + unsigned int granularity = max(lim->physical_block_size, lim->io_min); + + offset &= granularity - 1; + + return (granularity + lim->alignment_offset - offset) & (granularity - 1); +} + static inline int queue_sector_alignment_offset(struct request_queue *q, sector_t sector) { - return ((sector << 9) - q->limits.alignment_offset) - & (q->limits.io_min - 1); + return queue_limit_alignment_offset(&q->limits, sector << 9); } static inline int bdev_alignment_offset(struct block_device *bdev) ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-23 14:26 ` Mike Snitzer @ 2009-12-23 16:33 ` Martin K. Petersen 2009-12-23 17:13 ` Mike Snitzer 0 siblings, 1 reply; 16+ messages in thread From: Martin K. Petersen @ 2009-12-23 16:33 UTC (permalink / raw) To: Mike Snitzer; +Cc: device-mapper development, Martin K. Petersen, jens.axboe >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: Mike> One thing I noticed along the way is that: when stacking one or Mike> more misaligned devices there is the potential for misleading Mike> error messages, e.g.: [...] Mike> So sdb2 is taken to be aligned (even though it isn't, relative to Mike> the queue's alignment_offset of 3584) and then when sdb1 is Mike> stacked it is flagged as misaligned. Doesn't seem right. I know. It's an unfortunate side-effect of having an iterative stacking process. I don't know the whole picture so I don't know which device the real offender is. At the time you add device #2 I have no recollection of device #1. When you create a new top-level (DM/MD) device and add the first disk to it I have no option but to inherit that component's queue limits. So that's what initially defines the top-level. In your case adding a misaligned device will cause the stacking to report that alignment can be obtained by padding the top level device with an offset. Then you add a device which happens to be naturally aligned. And that means the stacking logic breaks down because the 0-alignment is incompatible with the top's nonzero alignment_offset. Originally (before the Great Naming Debate) the `misaligned' flag was called `inconsistent' (actually, it was called `consistent' and the logic was reversed). I think that's a better term for it. Compatible would also work, I guess. So when blk_stack_limits() returns an error it is because no consistent alignment could be calculated. blk_stack_limits() returning 0 does not imply "alignment" (i.e. an alignment_offset of 0). A return value of 0 means that we have successfully added the bottom device and the resulting top device queue_limits now apply. Maybe you should print something along the lines of "adding device sdX caused an alignment consistency error on DM device foo". That's a bit vague but doesn't single out sdX as much. The alternative is to add a device list to the queue_limits so we can record previously stacked devices. But that opens up a whole other can of worms... -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: dm: Fix alignment stacking on partitioned devices 2009-12-23 16:33 ` Martin K. Petersen @ 2009-12-23 17:13 ` Mike Snitzer 2009-12-23 20:31 ` Topology fixes Martin K. Petersen ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Mike Snitzer @ 2009-12-23 17:13 UTC (permalink / raw) To: Martin K. Petersen; +Cc: device-mapper development, jens.axboe On Wed, Dec 23 2009 at 11:33am -0500, Martin K. Petersen <martin.petersen@oracle.com> wrote: > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes: > > Mike> One thing I noticed along the way is that: when stacking one or > Mike> more misaligned devices there is the potential for misleading > Mike> error messages, e.g.: > > [...] > > Mike> So sdb2 is taken to be aligned (even though it isn't, relative to > Mike> the queue's alignment_offset of 3584) and then when sdb1 is > Mike> stacked it is flagged as misaligned. Doesn't seem right. > > I know. It's an unfortunate side-effect of having an iterative stacking > process. I don't know the whole picture so I don't know which device > the real offender is. At the time you add device #2 I have no > recollection of device #1. > > When you create a new top-level (DM/MD) device and add the first disk to > it I have no option but to inherit that component's queue limits. So > that's what initially defines the top-level. > > In your case adding a misaligned device will cause the stacking to > report that alignment can be obtained by padding the top level device > with an offset. Then you add a device which happens to be naturally > aligned. And that means the stacking logic breaks down because the > 0-alignment is incompatible with the top's nonzero alignment_offset. Understood. Thanks for clarifying. > Originally (before the Great Naming Debate) the `misaligned' flag was > called `inconsistent' (actually, it was called `consistent' and the > logic was reversed). I think that's a better term for it. Compatible > would also work, I guess. > > So when blk_stack_limits() returns an error it is because no consistent > alignment could be calculated. blk_stack_limits() returning 0 does not > imply "alignment" (i.e. an alignment_offset of 0). A return value of 0 > means that we have successfully added the bottom device and the > resulting top device queue_limits now apply. I think it would be great if you added something along those lines as a comment block above blk_stack_limits() (rather than "Returns 0 if alignment didn't change.") Also, your most recent patch eliminates many of the inline comments you had in blk_stack_limits(). I think that is unfortunate considering those comments helped give context for code that wasn't always immediately clear on its own. That observation aside, please feel free to add 'Tested-by: Mike Snitzer <snitzer@redhat.com>' to your most recent blk_stack_limits() patch header. > Maybe you should print something along the lines of "adding device sdX > caused an alignment consistency error on DM device foo". That's a bit > vague but doesn't single out sdX as much. Good suggestion. I'll work that in to the DM patch you originally posted (the start of this thread) and repost it to dm-devel. Thanks, Mike ^ permalink raw reply [flat|nested] 16+ messages in thread
* Topology fixes 2009-12-23 17:13 ` Mike Snitzer @ 2009-12-23 20:31 ` Martin K. Petersen 2009-12-23 20:31 ` [PATCH 1/2] block: Fix incorrect reporting of partition alignment Martin K. Petersen 2009-12-23 20:31 ` [PATCH 2/2] block: Fix topology stacking for data and discard alignment Martin K. Petersen 2 siblings, 0 replies; 16+ messages in thread From: Martin K. Petersen @ 2009-12-23 20:31 UTC (permalink / raw) To: jens.axboe, snitzer, dm-devel These two patches fix several block I/O topology stacking issues. The first patch corrects an error in calculating partition offsets. It should go into stable as well as .33. The second patch is a revamp of the stacking algorithm for both data and discard. 2.6.33 only. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/2] block: Fix incorrect reporting of partition alignment 2009-12-23 17:13 ` Mike Snitzer 2009-12-23 20:31 ` Topology fixes Martin K. Petersen @ 2009-12-23 20:31 ` Martin K. Petersen 2009-12-23 20:31 ` [PATCH 2/2] block: Fix topology stacking for data and discard alignment Martin K. Petersen 2 siblings, 0 replies; 16+ messages in thread From: Martin K. Petersen @ 2009-12-23 20:31 UTC (permalink / raw) To: jens.axboe, snitzer, dm-devel; +Cc: Martin K. Petersen queue_sector_alignment_offset returned the wrong value which caused partitions to report an incorrect alignment_offset. Since offset calculation is needed several places it has been split into a separate helper function. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Tested-by: Mike Snitzer <snitzer@redhat.com> --- include/linux/blkdev.h | 11 +++++++++-- 1 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 784a919..59b832b 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1116,11 +1116,18 @@ static inline int queue_alignment_offset(struct request_queue *q) return q->limits.alignment_offset; } +static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t offset) +{ + unsigned int granularity = max(lim->physical_block_size, lim->io_min); + + offset &= granularity - 1; + return (granularity + lim->alignment_offset - offset) & (granularity - 1); +} + static inline int queue_sector_alignment_offset(struct request_queue *q, sector_t sector) { - return ((sector << 9) - q->limits.alignment_offset) - & (q->limits.io_min - 1); + return queue_limit_alignment_offset(&q->limits, sector << 9); } static inline int bdev_alignment_offset(struct block_device *bdev) -- 1.6.0.6 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/2] block: Fix topology stacking for data and discard alignment 2009-12-23 17:13 ` Mike Snitzer 2009-12-23 20:31 ` Topology fixes Martin K. Petersen 2009-12-23 20:31 ` [PATCH 1/2] block: Fix incorrect reporting of partition alignment Martin K. Petersen @ 2009-12-23 20:31 ` Martin K. Petersen 2 siblings, 0 replies; 16+ messages in thread From: Martin K. Petersen @ 2009-12-23 20:31 UTC (permalink / raw) To: jens.axboe, snitzer, dm-devel; +Cc: Martin K. Petersen The stacking code incorrectly scaled up the data offset in some cases causing misaligned devices to report alignment. Rewrite the stacking algorithm to remedy this and apply the same alignment principles to the discard handling. Clarify documentation and comments. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Tested-by: Mike Snitzer <snitzer@redhat.com> --- block/blk-settings.c | 121 ++++++++++++++++++++++++++++++++------------------ 1 files changed, 78 insertions(+), 43 deletions(-) diff --git a/block/blk-settings.c b/block/blk-settings.c index 6ae118d..d52d4ad 100644 --- a/block/blk-settings.c +++ b/block/blk-settings.c @@ -505,21 +505,30 @@ static unsigned int lcm(unsigned int a, unsigned int b) /** * blk_stack_limits - adjust queue_limits for stacked devices - * @t: the stacking driver limits (top) - * @b: the underlying queue limits (bottom) + * @t: the stacking driver limits (top device) + * @b: the underlying queue limits (bottom, component device) * @offset: offset to beginning of data within component device * * Description: - * Merges two queue_limit structs. Returns 0 if alignment didn't - * change. Returns -1 if adding the bottom device caused - * misalignment. + * This function is used by stacking drivers like MD and DM to ensure + * that all component devices have compatible block sizes and + * alignments. The stacking driver must provide a queue_limits + * struct (top) and then iteratively call the stacking function for + * all component (bottom) devices. The stacking function will + * attempt to combine the values and ensure proper alignment. + * + * Returns 0 if the top and bottom queue_limits are compatible. The + * top device's block sizes and alignment offsets may be adjusted to + * ensure alignment with the bottom device. If no compatible sizes + * and alignments exist, -1 is returned and the resulting top + * queue_limits will have the misaligned flag set to indicate that + * the alignment_offset is undefined. */ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, sector_t offset) { - int ret; - - ret = 0; + sector_t alignment; + unsigned int top, bottom; t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors); t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors); @@ -537,6 +546,22 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, t->max_segment_size = min_not_zero(t->max_segment_size, b->max_segment_size); + alignment = queue_limit_alignment_offset(b, offset); + + /* Bottom device has different alignment. Check that it is + * compatible with the current top alignment. + */ + if (t->alignment_offset != alignment) { + + top = max(t->physical_block_size, t->io_min) + + t->alignment_offset; + bottom = max(b->physical_block_size, b->io_min) + alignment; + + /* Verify that top and bottom intervals line up */ + if (max(top, bottom) & (min(top, bottom) - 1)) + t->misaligned = 1; + } + t->logical_block_size = max(t->logical_block_size, b->logical_block_size); @@ -544,54 +569,64 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, b->physical_block_size); t->io_min = max(t->io_min, b->io_min); + t->io_opt = lcm(t->io_opt, b->io_opt); + t->no_cluster |= b->no_cluster; t->discard_zeroes_data &= b->discard_zeroes_data; - /* Bottom device offset aligned? */ - if (offset && - (offset & (b->physical_block_size - 1)) != b->alignment_offset) { + /* Physical block size a multiple of the logical block size? */ + if (t->physical_block_size & (t->logical_block_size - 1)) { + t->physical_block_size = t->logical_block_size; t->misaligned = 1; - ret = -1; } - /* - * Temporarily disable discard granularity. It's currently buggy - * since we default to 0 for discard_granularity, hence this - * "failure" will always trigger for non-zero offsets. - */ -#if 0 - if (offset && - (offset & (b->discard_granularity - 1)) != b->discard_alignment) { - t->discard_misaligned = 1; - ret = -1; + /* Minimum I/O a multiple of the physical block size? */ + if (t->io_min & (t->physical_block_size - 1)) { + t->io_min = t->physical_block_size; + t->misaligned = 1; } -#endif - - /* If top has no alignment offset, inherit from bottom */ - if (!t->alignment_offset) - t->alignment_offset = - b->alignment_offset & (b->physical_block_size - 1); - if (!t->discard_alignment) - t->discard_alignment = - b->discard_alignment & (b->discard_granularity - 1); - - /* Top device aligned on logical block boundary? */ - if (t->alignment_offset & (t->logical_block_size - 1)) { + /* Optimal I/O a multiple of the physical block size? */ + if (t->io_opt & (t->physical_block_size - 1)) { + t->io_opt = 0; t->misaligned = 1; - ret = -1; } - /* Find lcm() of optimal I/O size and granularity */ - t->io_opt = lcm(t->io_opt, b->io_opt); - t->discard_granularity = lcm(t->discard_granularity, - b->discard_granularity); + /* Find lowest common alignment_offset */ + t->alignment_offset = lcm(t->alignment_offset, alignment) + & (max(t->physical_block_size, t->io_min) - 1); - /* Verify that optimal I/O size is a multiple of io_min */ - if (t->io_min && t->io_opt % t->io_min) - ret = -1; + /* Verify that new alignment_offset is on a logical block boundary */ + if (t->alignment_offset & (t->logical_block_size - 1)) + t->misaligned = 1; + + /* Discard alignment and granularity */ + if (b->discard_granularity) { + unsigned int granularity = b->discard_granularity; + offset &= granularity - 1; + + alignment = (granularity + b->discard_alignment - offset) + & (granularity - 1); + + if (t->discard_granularity != 0 && + t->discard_alignment != alignment) { + top = t->discard_granularity + t->discard_alignment; + bottom = b->discard_granularity + alignment; + + /* Verify that top and bottom intervals line up */ + if (max(top, bottom) & (min(top, bottom) - 1)) + t->discard_misaligned = 1; + } + + t->max_discard_sectors = min_not_zero(t->max_discard_sectors, + b->max_discard_sectors); + t->discard_granularity = max(t->discard_granularity, + b->discard_granularity); + t->discard_alignment = lcm(t->discard_alignment, alignment) & + (t->discard_granularity - 1); + } - return ret; + return t->misaligned ? -1 : 0; } EXPORT_SYMBOL(blk_stack_limits); -- 1.6.0.6 ^ permalink raw reply related [flat|nested] 16+ messages in thread
end of thread, other threads:[~2009-12-23 20:31 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2009-12-18 7:30 [PATCH] dm: Fix alignment stacking on partitioned devices Martin K. Petersen 2009-12-18 17:33 ` Mike Snitzer 2009-12-21 17:49 ` Martin K. Petersen 2009-12-21 19:52 ` Mike Snitzer 2009-12-22 4:27 ` Martin K. Petersen 2009-12-22 14:32 ` Mike Snitzer 2009-12-22 17:41 ` Martin K. Petersen 2009-12-22 21:42 ` Mike Snitzer 2009-12-22 22:13 ` Mike Snitzer 2009-12-23 6:05 ` Martin K. Petersen 2009-12-23 14:26 ` Mike Snitzer 2009-12-23 16:33 ` Martin K. Petersen 2009-12-23 17:13 ` Mike Snitzer 2009-12-23 20:31 ` Topology fixes Martin K. Petersen 2009-12-23 20:31 ` [PATCH 1/2] block: Fix incorrect reporting of partition alignment Martin K. Petersen 2009-12-23 20:31 ` [PATCH 2/2] block: Fix topology stacking for data and discard alignment Martin K. Petersen
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.