* [RFC PATCH] change default alignment of pe_start to 1MB
From: Mike Snitzer @ 2010-08-05 19:10 UTC
To: lvm-devel
The new standard in the storage industry is to align data areas to 1MB
by default. fdisk, parted, and mdadm have all been updated to this
default.
Update LVM to align the PV's data area start (pe_start) to 1MB. This
provides a more useful default than the previous default of 64K (which
generally ended up being a 192K pe_start once the first metadata area
was created).
Having recently boasted about the new 1MB default in the Linux storage
stack, I was soon faced with the harsh reality that LVM was a weak link
(on legacy hardware that doesn't export I/O topology limits).
This meant that an 8-disk HW RAID0 array (with a 64K chunk_size, i.e. a
512K full stripe) ended up with a PV pe_start that was not aligned on a
full-stripe boundary. The array would've been configured optimally
"out of the box" had LVM used a default alignment of 1MB.
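As a rough check of the arithmetic behind this example, the sketch below
computes the full-stripe width and tests whether an offset lands on a
stripe boundary. The helper names full_stripe_bytes() and is_aligned() are
made up for illustration; they are not LVM functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Full-stripe width in bytes: chunk size times the number of data disks.
 * Hypothetical helper, not part of LVM. */
static uint64_t full_stripe_bytes(uint64_t chunk_bytes, unsigned int data_disks)
{
    return chunk_bytes * data_disks;
}

/* True if offset_bytes is a multiple of align_bytes.
 * A zero alignment is treated as "never aligned" to avoid dividing by 0. */
static bool is_aligned(uint64_t offset_bytes, uint64_t align_bytes)
{
    return align_bytes != 0 && (offset_bytes % align_bytes) == 0;
}
```

For the array described above, full_stripe_bytes(64 * 1024, 8) is 512K.
A 192K pe_start is not a multiple of 512K, whereas a 1MB pe_start is.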
Before this patch:
# pvs -o name,vg_mda_size,pe_start
PV VMdaSize 1st PE
/dev/sdd 188.00k 192.00k
After this patch:
# pvs -o name,vg_mda_size,pe_start
PV VMdaSize 1st PE
/dev/sdd 1020.00k 1.00m
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
doc/example.conf.in | 2 +-
lib/metadata/metadata.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/doc/example.conf.in b/doc/example.conf.in
index 850b7e2..f7dcc63 100644
--- a/doc/example.conf.in
+++ b/doc/example.conf.in
@@ -113,7 +113,7 @@ devices {
# Alignment (in KB) of start of data area when creating a new PV.
# If a PV is placed directly upon an md device and md_chunk_alignment or
# data_alignment_detection is enabled this parameter is ignored.
- # Set to 0 for the default alignment of 64KB or page size, if larger.
+ # Set to 0 for the default alignment of 1MB or page size, if larger.
data_alignment = 0
# By default, the start of the PV's aligned data area will be shifted by
diff --git a/lib/metadata/metadata.c b/lib/metadata/metadata.c
index d7edf54..887c8bc 100644
--- a/lib/metadata/metadata.c
+++ b/lib/metadata/metadata.c
@@ -70,7 +70,7 @@ unsigned long set_pe_align(struct physical_volume *pv, unsigned long data_alignm
if (data_alignment)
pv->pe_align = data_alignment;
else
- pv->pe_align = MAX(65536UL, lvm_getpagesize()) >> SECTOR_SHIFT;
+ pv->pe_align = MAX(1048576UL, lvm_getpagesize()) >> SECTOR_SHIFT;
if (!pv->dev)
goto out;
* [RFC PATCH v2] change default alignment of pe_start to 1MB
From: Mike Snitzer @ 2010-08-06 4:11 UTC
To: lvm-devel
The switch to a 1MB default alignment causes various tests in the LVM2
testsuite to fail -- not a big deal but the tests would need updating.
Of more concern is that the existing LVM2 set_pe_align() code doesn't
always properly respect the alignment determined from
'devices/md_chunk_alignment' or 'devices/data_alignment_detection'.
With the previous default alignment of 64k it would generally do the
right thing -- use the detected values. But switching the default to
the larger value exposes a flaw: MAX() of the 1MB default and the MD or
I/O topology detected value will nearly always be 1MB, even when 1MB is
not a multiple of the detected value.
The following revised patch changes the LVM alignment detection
semantics to model what fdisk has elected to do:
- If the default value (1MB) is a multiple of the specified/detected
alignment then just use the default.
- Otherwise, use the specified/detected value.
In practice this means we'll almost always use 1MB -- that is unless:
- the specified --dataalignment, MD's full stripe width, or the
optimal_io_size exceeds 1MB
- the specified/detected value is not a power-of-2
NOTE: even with a default of 64k the old set_pe_align code would result
in incorrect alignment if a value < 64k were used for --dataalignment
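The difference between the old MAX() rule and the fdisk-style rule
described above can be sketched as follows. This is a simplified
illustration, not the patch itself: the function names old_pe_align() and
new_pe_align() are hypothetical, values are in 512-byte sectors, and the
page-size comparison is left out.

```c
/* 1MB expressed in 512-byte sectors (same value the patch uses). */
#define DEFAULT_PE_ALIGN_SECTORS 2048UL

/* Old behaviour (simplified): take the larger of default and detected. */
static unsigned long old_pe_align(unsigned long detected_sectors)
{
    return detected_sectors > DEFAULT_PE_ALIGN_SECTORS
        ? detected_sectors : DEFAULT_PE_ALIGN_SECTORS;
}

/* Revised behaviour (simplified): keep the default whenever it is a
 * multiple of the detected value; otherwise use the detected value.
 * A detected value of 0 means "nothing detected". */
static unsigned long new_pe_align(unsigned long detected_sectors)
{
    if (detected_sectors && (DEFAULT_PE_ALIGN_SECTORS % detected_sectors))
        return detected_sectors;
    return DEFAULT_PE_ALIGN_SECTORS;
}
```

With a detected stripe width of 384 sectors (192K, not a power of 2),
old_pe_align() returns 2048, which is not a multiple of 384, while
new_pe_align() returns 384. With 1024 sectors (512K) both return 2048,
and with 4096 sectors (2MB) both return 4096.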
---
doc/example.conf.in | 2 +-
lib/metadata/metadata.c | 30 ++++++++++++++++++------------
2 files changed, 19 insertions(+), 13 deletions(-)
diff --git a/doc/example.conf.in b/doc/example.conf.in
index 850b7e2..f7dcc63 100644
--- a/doc/example.conf.in
+++ b/doc/example.conf.in
@@ -113,7 +113,7 @@ devices {
# Alignment (in KB) of start of data area when creating a new PV.
# If a PV is placed directly upon an md device and md_chunk_alignment or
# data_alignment_detection is enabled this parameter is ignored.
- # Set to 0 for the default alignment of 64KB or page size, if larger.
+ # Set to 0 for the default alignment of 1MB or page size, if larger.
data_alignment = 0
# By default, the start of the PV's aligned data area will be shifted by
diff --git a/lib/metadata/metadata.c b/lib/metadata/metadata.c
index d7edf54..469473a 100644
--- a/lib/metadata/metadata.c
+++ b/lib/metadata/metadata.c
@@ -64,13 +64,16 @@ const char _really_init[] =
unsigned long set_pe_align(struct physical_volume *pv, unsigned long data_alignment)
{
+ unsigned long temp_pe_align, default_pe_align = 2048;
+
if (pv->pe_align)
goto out;
if (data_alignment)
pv->pe_align = data_alignment;
else
- pv->pe_align = MAX(65536UL, lvm_getpagesize()) >> SECTOR_SHIFT;
+ pv->pe_align = MAX((default_pe_align << SECTOR_SHIFT),
+ lvm_getpagesize()) >> SECTOR_SHIFT;
if (!pv->dev)
goto out;
@@ -79,10 +82,11 @@ unsigned long set_pe_align(struct physical_volume *pv, unsigned long data_alignm
* Align to stripe-width of underlying md device if present
*/
if (find_config_tree_bool(pv->fmt->cmd, "devices/md_chunk_alignment",
- DEFAULT_MD_CHUNK_ALIGNMENT))
- pv->pe_align = MAX(pv->pe_align,
- dev_md_stripe_width(pv->fmt->cmd->sysfs_dir,
- pv->dev));
+ DEFAULT_MD_CHUNK_ALIGNMENT)) {
+ temp_pe_align = dev_md_stripe_width(pv->fmt->cmd->sysfs_dir, pv->dev);
+ if (temp_pe_align && (default_pe_align % temp_pe_align))
+ pv->pe_align = temp_pe_align;
+ }
/*
* Align to topology's minimum_io_size or optimal_io_size if present
@@ -94,13 +98,15 @@ unsigned long set_pe_align(struct physical_volume *pv, unsigned long data_alignm
if (find_config_tree_bool(pv->fmt->cmd,
"devices/data_alignment_detection",
DEFAULT_DATA_ALIGNMENT_DETECTION)) {
- pv->pe_align = MAX(pv->pe_align,
- dev_minimum_io_size(pv->fmt->cmd->sysfs_dir,
- pv->dev));
-
- pv->pe_align = MAX(pv->pe_align,
- dev_optimal_io_size(pv->fmt->cmd->sysfs_dir,
- pv->dev));
+ temp_pe_align = dev_minimum_io_size(pv->fmt->cmd->sysfs_dir,
+ pv->dev);
+ if (temp_pe_align && (default_pe_align % temp_pe_align))
+ pv->pe_align = temp_pe_align;
+
+ temp_pe_align = dev_optimal_io_size(pv->fmt->cmd->sysfs_dir,
+ pv->dev);
+ if (temp_pe_align && (default_pe_align % temp_pe_align))
+ pv->pe_align = temp_pe_align;
}
log_very_verbose("%s: Setting PE alignment to %lu sectors.",
* [RFC PATCH v2] change default alignment of pe_start to 1MB
From: Milan Broz @ 2010-08-09 16:28 UTC
To: lvm-devel
On 08/06/2010 06:11 AM, Mike Snitzer wrote:
> The switch to a 1MB default alignment causes various tests in the LVM2
> testsuite to fail -- not a big deal but the tests would need updating.
>
> Of more concern is that the existing LVM2 set_pe_align() code doesn't
> always properly respect the alignment determined from
> 'devices/md_chunk_alignment' or 'devices/data_alignment_detection'.
>
> With the previous default alignment of 64k it would generally do the
> right thing -- use the detected values. But switching the default to
> the larger value exposes a flaw: MAX() of the 1MB default and the MD or
> I/O topology detected value will nearly always be 1MB, even when 1MB is
> not a multiple of the detected value.
>
> The following revised patch changes the LVM alignment detection
> semantics to model what fdisk has elected to do:
> - If the default value (1MB) is a multiple of the specified/detected
> alignment then just use the default.
> - Otherwise, use the specified/detected value.
>
> In practice this means we'll almost always use 1MB -- that is unless:
> - the specified --dataalignment, MD's full stripe width, or the
> optimal_io_size exceeds 1MB
> - the specified/detected value is not a power-of-2
patch not tested, but Ack for idea.
(I just independently did the same for LUKS devices.)
Milan
* [RFC PATCH v2] change default alignment of pe_start to 1MB
From: Mike Snitzer @ 2010-08-09 16:42 UTC
To: lvm-devel
On Mon, Aug 09 2010 at 12:28pm -0400,
Milan Broz <mbroz@redhat.com> wrote:
> On 08/06/2010 06:11 AM, Mike Snitzer wrote:
> > The following revised patch changes the LVM alignment detection
> > semantics to model what fdisk has elected to do:
> > - If the default value (1MB) is a multiple of the specified/detected
> > alignment then just use the default.
> > - Otherwise, use the specified/detected value.
> >
> > In practice this means we'll almost always use 1MB -- that is unless:
> > - the specified --dataalignment, MD's full stripe width, or the
> > optimal_io_size exceeds 1MB
> > - the specified/detected value is not a power-of-2
>
> patch not tested, but Ack for idea.
> (I just independently did the same for LUKS devices.)
Great. And just one point of clarification: if --dataalignment is used
then the requested alignment is used (no additional logic is allowed to
change the requested value).
Mike
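The precedence stated in this clarification can be sketched roughly as
below. This only illustrates the described intent and is not code from the
patch; resolve_pe_align() and its parameters are hypothetical names, and
all values are in 512-byte sectors.

```c
/* 1MB expressed in 512-byte sectors. */
#define DEFAULT_PE_ALIGN_SECTORS 2048UL

/* requested_sectors: value given via --dataalignment, 0 if not given.
 * detected_sectors:  value from MD or I/O topology detection, 0 if none. */
static unsigned long resolve_pe_align(unsigned long requested_sectors,
                                      unsigned long detected_sectors)
{
    /* An explicit request always wins; nothing else may change it. */
    if (requested_sectors)
        return requested_sectors;

    /* Otherwise use the detected value only when the 1MB default
     * is not a multiple of it. */
    if (detected_sectors && (DEFAULT_PE_ALIGN_SECTORS % detected_sectors))
        return detected_sectors;

    return DEFAULT_PE_ALIGN_SECTORS;
}
```

For example, resolve_pe_align(128, 4096) yields 128 because the request is
honoured as-is, while resolve_pe_align(0, 4096) yields the detected 4096.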