All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: device-mapper development <dm-devel@redhat.com>, jens.axboe@oracle.com
Subject: Re: dm: Fix alignment stacking on partitioned devices
Date: Tue, 22 Dec 2009 17:13:53 -0500	[thread overview]
Message-ID: <20091222221352.GA16299@redhat.com> (raw)
In-Reply-To: <20091222214250.GA28497@redhat.com>

On Tue, Dec 22 2009 at  4:42pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> Martin,
> 
> Apologies for the long mail here...
> 
> On Tue, Dec 22 2009 at 12:41pm -0500,
> Martin K. Petersen <martin.petersen@oracle.com> wrote:
> 
> > >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> > 
> > Mike,
> > 
> > Mike> OK, so is MD somehow getting things wrong?  (You originally said
> > Mike> that with the new stacking function MD resulted in an error but DM
> > Mike> did not).
> > 
> > MD is passing in absolute offsets and got the right result in both
> > cases.
> > 
> > 
> > Mike> "start" isn't a relative offset.  
> > 
> > It's relative to the beginning of the partition (block_device), not
> > relative to the beginning of the disk (request_queue).
> 
> Ah, thanks for clarifying!  No idea why I thought "start" was relative
> to the beginning of the disk (request_queue).
> 
> > My beef here is that DM created a device like this:
> > 
> > [root@10 ~]# pvs -o +pe_start
> >   PV         VG   Fmt  Attr PSize   PFree   1st PE 
> >   /dev/sde1  foo  lvm2 a-   508.00M 256.00M 192.00K
> >   /dev/sde2  foo  lvm2 a-   512.00M 260.00M 192.00K
> > [root@10 ~]# dmsetup table
> > foo-bar: 0 1032192 striped 2 32 8:66 384 8:65 384
> > 
> > /dev/sde  has an alignment_offset of 3584
> > /dev/sde1 has an alignment_offset of 0
> > /dev/sde2 has an alignment_offset of 1024
> 
> I'm seeing the following (when I use the scsi_debug from below):
> # cat /sys/block/sdb/alignment_offset 
> 3584
> # cat /sys/block/sdb/sdb1/alignment_offset 
> 512
> # cat /sys/block/sdb/sdb2/alignment_offset 
> 512

Bleh, I used dev_size_mb=200 to get the above alignment_offsets.

> > # modprobe scsi_debug dev_size_mb=1024 num_parts=2 lowest_aligned=7 physblk_exp=3

If I increase to dev_size_mb=500 I get:
sdb1 alignment_offset=0
sdb2 alignment_offset=512

If I increase to dev_size_mb=1000 I also get:
sdb1 alignment_offset=0
sdb2 alignment_offset=1024

Anyway, this doesn't change the fact that the new blk_stack_limits()
isn't detecting the offset that DM provides as misaligned (in the legacy
LVM case, data_alignment_offset_detection=0).

In an earlier mail you said:
dm_set_device_limits() calls blk_stack_limits() with a byte offset of
196608 for both sde1 and sde2.  And that offset is checked for alignment
with the queue's limits which has an alignment_offset of 3584.

Applying that to this example that I shared:

> The new blk_stack_limits() "works" (believes the device is aligned).
> But if I use pvcreate w/ data_alignment_offset_detection=0 it
> _incorrectly_ believes the device is aligned.
> 
> Why does the new blk_stack_limits() think the misaligned offset is
> aligned?  I added some debugging to dm_set_device_limits() to see the
> following:
> 
> 1) MISALIGNED-case: LVM2 with data_alignment_offset_detection=0:
> device-mapper: table: 254:1: target device sdb2 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=104529920
> device-mapper: table: 254:1: target device sdb1 call to blk_stack_limits(): physical_block_size=4096, logical_block_size=512, alignment_offset=3584, start=212992
> 
> This is not properly aligned relative to physical_block_size=4096, sdb's
> alignment_offset=3584 and each partitions' alignment_offset=512:
> 
> >>> 104529920%4096
> 0
> >>> 212992%4096
> 0

dm_set_device_limits() calls blk_stack_limits() with a byte offset of
212992 for sdb1 and 104529920 for sdb2.  And those offsets are checked
for alignment with the queue's limits which has an alignment_offset of
3584.

/me now goes to look at the new blk_stack_limits()

  reply	other threads:[~2009-12-22 22:13 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-18  7:30 [PATCH] dm: Fix alignment stacking on partitioned devices Martin K. Petersen
2009-12-18 17:33 ` Mike Snitzer
2009-12-21 17:49   ` Martin K. Petersen
2009-12-21 19:52     ` Mike Snitzer
2009-12-22  4:27       ` Martin K. Petersen
2009-12-22 14:32         ` Mike Snitzer
2009-12-22 17:41           ` Martin K. Petersen
2009-12-22 21:42             ` Mike Snitzer
2009-12-22 22:13               ` Mike Snitzer [this message]
2009-12-23  6:05               ` Martin K. Petersen
2009-12-23 14:26                 ` Mike Snitzer
2009-12-23 16:33                   ` Martin K. Petersen
2009-12-23 17:13                     ` Mike Snitzer
2009-12-23 20:31                       ` Topology fixes Martin K. Petersen
2009-12-23 20:31                       ` [PATCH 1/2] block: Fix incorrect reporting of partition alignment Martin K. Petersen
2009-12-23 20:31                       ` [PATCH 2/2] block: Fix topology stacking for data and discard alignment Martin K. Petersen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091222221352.GA16299@redhat.com \
    --to=snitzer@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=jens.axboe@oracle.com \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.