All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: device-mapper development <dm-devel@redhat.com>, jens.axboe@oracle.com
Subject: Re: dm: Fix alignment stacking on partitioned devices
Date: Tue, 22 Dec 2009 16:42:52 -0500	[thread overview]
Message-ID: <20091222214250.GA28497@redhat.com> (raw)
In-Reply-To: <yq1my1a52k4.fsf@sermon.lab.mkp.net>

Martin,

Apologies for the long mail here...

On Tue, Dec 22 2009 at 12:41pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike,
> 
> Mike> OK, so is MD somehow getting things wrong?  (You originally said
> Mike> that with the new stacking function MD resulted in an error but DM
> Mike> did not).
> 
> MD is passing in absolute offsets and got the right result in both
> cases.
> 
> 
> Mike> "start" isn't a relative offset.  
> 
> It's relative to the beginning of the partition (block_device), not
> relative to the beginning of the disk (request_queue).

Ah, thanks for clarifying!  No idea why I thought "start" was relative
to the beginning of the disk (request_queue).

> My beef here is that DM created a device like this:
> 
> [root@10 ~]# pvs -o +pe_start
>   PV         VG   Fmt  Attr PSize   PFree   1st PE 
>   /dev/sde1  foo  lvm2 a-   508.00M 256.00M 192.00K
>   /dev/sde2  foo  lvm2 a-   512.00M 260.00M 192.00K
> [root@10 ~]# dmsetup table
> foo-bar: 0 1032192 striped 2 32 8:66 384 8:65 384
> 
> /dev/sde  has an alignment_offset of 3584
> /dev/sde1 has an alignment_offset of 0
> /dev/sde2 has an alignment_offset of 1024

I'm seeing the following (when I use the scsi_debug from below):
# cat /sys/block/sdb/alignment_offset 
3584
# cat /sys/block/sdb/sdb1/alignment_offset 
512
# cat /sys/block/sdb/sdb2/alignment_offset 
512
 
> dm_set_device_limits() calls blk_stack_limits() with a byte offset of
> 196608 for both sde1 and sde2.  And that offset is checked for alignment
> with the queue's limits which has an alignment_offset of 3584.
> 
> The two PVs are misaligned but that information is lost because you use
> blk_stack_limits() with partition-relative offsets.  My patch was an
> attempt to fix that.

Coming full circle, your patch makes sense.  Sorry about giving you the
run around!  But I have new concerns (at the very end below) about the
new blk_stack_limits(); I also need to review your patch further
(relative to virtual device stacking: does get_start_sect(bdev) always
work?).

> Test case:
> 
> # modprobe scsi_debug dev_size_mb=1024 num_parts=2 lowest_aligned=7 physblk_exp=3
> # pvcreate /dev/sde1
> # pvcreate /dev/sde2
> # vgcreate foo /dev/sde1 /dev/sde2
> # lvcreate -I 16 -i 2 -L 500M -n bar foo

Here I'm just showing that I verified what you were seeing above.  With
an updated LVM but the older blk_stack_limits() the same test gives me:

# pvs -o +pe_start
  PV         VG   Fmt  Attr PSize   PFree   1st PE 
  /dev/sdb1       lvm2 --    99.48m  99.48m 192.50k
  /dev/sdb2       lvm2 --   100.50m 100.50m 192.50k
# vgcreate foo /dev/sdb1 /dev/sdb2
# lvcreate -I 16 -i 2 -L100M -n bar foo
# dmsetup table foo-bar
0 212992 striped 2 32 8:18 385 8:17 385

device-mapper: table: 253:1: target device sdb2 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=197120
device-mapper: table: 253:1: target device sdb1 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=197120

If I disable LVM's alignment_offset detection (runs the same as your
older LVM) I get:
# pvcreate --config 'devices {data_alignment_offset_detection=0}' /dev/sdb1
# pvcreate --config 'devices {data_alignment_offset_detection=0}' /dev/sdb2
# pvs -o +pe_start
  PV         VG   Fmt  Attr PSize   PFree   1st PE 
  /dev/sdb1       lvm2 --    99.48m  99.48m 192.00k
  /dev/sdb2       lvm2 --   100.50m 100.50m 192.00k
...
# dmsetup table foo-bar
0 212992 striped 2 32 8:18 384 8:17 384

device-mapper: table: 253:1: target device sdb2 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=196608
device-mapper: table: 253:1: target device sdb1 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=196608


Now if I apply your patch to add the partition offset -- which makes the
offset that DM passes to blk_stack_limits() absolute:

The old blk_stack_limits(), using LVM2 w/
data_alignment_offset_detection=1 shows this aligned case as
"misaligned":
device-mapper: table: 253:1: target device sdb2 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=104530432
device-mapper: table: 253:1: target device sdb1 is misaligned: physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=213504

The new blk_stack_limits() "works" (believes the device is aligned).
But if I use pvcreate w/ data_alignment_offset_detection=0 it
_incorrectly_ believes the device is aligned.

Why does the new blk_stack_limits() think the misaligned offset is
aligned?  I added some debugging to dm_set_device_limits() to see the
following:

1) MISALIGNED-case: LVM2 with data_alignment_offset_detection=0:
device-mapper: table: 254:1: target device sdb2 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=104529920
device-mapper: table: 254:1: target device sdb1 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=212992

This is not properly aligned relative to physical_block_size=4096, sdb's
alignment_offset=3584 and each partitions' alignment_offset=512:

>>> 104529920%4096
0
>>> 212992%4096
0

2) ALIGNED-case: LVM2 with data_alignment_offset_detection=1:
device-mapper: table: 254:1: target device sdb2 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=104530432
device-mapper: table: 254:1: target device sdb1 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=213504

In this 2nd case, LVM2 is correctly adding the alignment_offset (512)
for each partitioned device.  Each data "start" includes the
alignment_offset of 512.  So the device is properly aligned:

>>> 104530432%4096
512
>>> 213504%4096
512


I'll dig into blk_stack_limits() to see if I can make sense of this.
And in the end; what, if anything, is DM doing wrong?

  reply	other threads:[~2009-12-22 21:42 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-18  7:30 [PATCH] dm: Fix alignment stacking on partitioned devices Martin K. Petersen
2009-12-18 17:33 ` Mike Snitzer
2009-12-21 17:49   ` Martin K. Petersen
2009-12-21 19:52     ` Mike Snitzer
2009-12-22  4:27       ` Martin K. Petersen
2009-12-22 14:32         ` Mike Snitzer
2009-12-22 17:41           ` Martin K. Petersen
2009-12-22 21:42             ` Mike Snitzer [this message]
2009-12-22 22:13               ` Mike Snitzer
2009-12-23  6:05               ` Martin K. Petersen
2009-12-23 14:26                 ` Mike Snitzer
2009-12-23 16:33                   ` Martin K. Petersen
2009-12-23 17:13                     ` Mike Snitzer
2009-12-23 20:31                       ` Topology fixes Martin K. Petersen
2009-12-23 20:31                       ` [PATCH 1/2] block: Fix incorrect reporting of partition alignment Martin K. Petersen
2009-12-23 20:31                       ` [PATCH 2/2] block: Fix topology stacking for data and discard alignment Martin K. Petersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091222214250.GA28497@redhat.com \
    --to=snitzer@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=jens.axboe@oracle.com \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.