All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@redhat.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: device-mapper development <dm-devel@redhat.com>,
	"Alasdair G. Kergon" <agk@redhat.com>
Subject: Re: [PATCH v2] dm: Fix alignment stacking on partitioned devices
Date: Thu, 7 Jan 2010 13:55:05 -0500	[thread overview]
Message-ID: <20100107185505.GA15181@redhat.com> (raw)
In-Reply-To: <yq1hbqxsup7.fsf@sermon.lab.mkp.net>

On Thu, Jan 07 2010 at 12:15pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> >> However, there are other subsystems that need to stack without an
> >> associated block_device.  So instead of changing blk_stack_limits I
> >> propose we use the wrapper below.
> 
> Mike> Yes, looks like osdblk.c's use of blk_queue_stack_limits()
> 
> DM also does a stacking pass without a bdev (dm_calculate_queue_limits),
> although I think you can avoid that by getting rid of ti_limits
> altogether and just stack limits.

Yeah, I'm keeping the stacking of all devices within a target
self-contained; I also do validation of the target's limits:

                /*
                 * Check each device area is consistent with the target's
                 * overall queue limits.
                 */
                if (ti->type->iterate_devices(ti, device_area_is_invalid,
                                              &ti_limits))
                        return -EINVAL;

The validation, with device_area_is_invalid, is confined to validating
only this target's devices' limits.  I couldn't do the same target-local
validation by stacking a single flat 'limits'.

I then stack that target's limits on the final DM device's 'limits' (and
will spew a warning accordingly).

> I couldn't figure out why the alignment_offset for a misaligned DM
> device was zero, despite being -1 after stacking had completed.  I
> finally spotted this in dm_table_set_restrictions():
> 
>         limits->alignment_offset = 0;
>         limits->misaligned = 0;
> 
> So after all the stacking function's hard work you set the alignment to
> 0 and clear the misaligned flag.  Why?

Good question, it made my head hurt trying to recall why :)

A very critical piece to this is the assumption that alignment_offset is
consumed by LVM2 (it adjusts by shifting a target device's 'start' to
include alignment_offset).  DM doesn't rely on the block layer to
validate alignment_offset at all.

The comment above the lines you referenced offers:

        /*
         * Each target device in the table has a data area that should normally
         * be aligned such that the DM device's alignment_offset is 0.
         * FIXME: Propagate alignment_offsets up the stack and warn of
         *        sub-optimal or inconsistent settings.
         */

Seems I took the time to add a comment whose FIXME doesn't ring many
bells now!  But ignoring that, the comment before the FIXME is making a
veiled reference to userspace having consumed alignment_offset.

To recap: Userspace consumes alignment_offset so it shouldn't be a
concern in DM.  Each target device is validated (like I covered above)
such that any misalignment (purely logicical vs physical misalignment)
within a target device is not tolerated.  So by the time the DM table's
limits get copied to the DM device's limits (in
dm_table_set_restrictions) any misalignment percieved by the block layer
is not trusted/published.

Mike

p.s. Any feedback on fatal flaws in all of this is much appreciated.  I
thought I cleared most of this with you as it was developed but seems
not (obviously I didn't clear resetting alignment_offset and
misaligned).

But I'm now thinking that if all the above works according to plan we
shouldn't ever have a DM device's alignment_offset or misaligned be
anything but 0 (unless one isn't using a trained LVM2; which I'd imagine
you aren't).  Anyway, long story short, those lines _should_ likely be
removed... but it needs further testing that I can't do right now.

  reply	other threads:[~2010-01-07 18:55 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-23 18:07 [PATCH v2] dm: Fix alignment stacking on partitioned devices Mike Snitzer
2009-12-23 19:21 ` Mike Snitzer
2010-01-05 18:04 ` Alasdair G Kergon
2010-01-05 18:27 ` Alasdair G Kergon
2010-01-06  2:23   ` Martin K. Petersen
2010-01-06  2:57     ` Mike Snitzer
2010-01-06  3:03       ` Mike Snitzer
2010-01-06  3:24       ` Martin K. Petersen
2010-01-06  4:10         ` Mike Snitzer
2010-01-06  4:16           ` Martin K. Petersen
2010-01-06  5:16             ` Mike Snitzer
2010-01-06  8:39               ` Martin K. Petersen
2010-01-06 13:25                 ` Mike Snitzer
2010-01-07 17:15                   ` Martin K. Petersen
2010-01-07 18:55                     ` Mike Snitzer [this message]
2010-01-08 18:41                       ` Martin K. Petersen
2010-01-08 20:28                         ` Mike Snitzer
2010-01-09  0:01                   ` [PATCH] block: bdev_stack_limits wrapper and DM topology fixes Mike Snitzer
2010-01-06 13:55                 ` [PATCH v2] dm: Fix alignment stacking on partitioned devices Alasdair G Kergon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100107185505.GA15181@redhat.com \
    --to=snitzer@redhat.com \
    --cc=agk@redhat.com \
    --cc=dm-devel@redhat.com \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.