* [PATCH 0/6 v5] LVM2 topology support
@ 2009-07-13 20:11 Mike Snitzer
From: Mike Snitzer @ 2009-07-13 20:11 UTC
To: lvm-devel
Add LVM2 topology support to properly align the PV's pe_start using the
associated 'alignment_offset' exposed in sysfs.
Provides manual configuration with pvcreate --dataalignmentoffset and
auto config control via 'devices/data_alignment_offset_detection' in
lvm.conf.
Documentation was added to the lvm.conf and pvcreate man pages.
Testing with pvcreate + vgextend + vgreduce + vgextend + vgreduce ...
confirmed that the pe_start from the initial pvcreate is maintained.
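As an illustration of the intended arithmetic (a minimal sketch, not code taken from these patches): the data area first lands on an alignment boundary and is then padded by the device's alignment_offset. The helper names round_up_sectors() and padded_pe_start() are hypothetical, units are 512-byte sectors, and the 3584-byte offset is the 4KB-sector drive example used in the documentation added by this series.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* Hypothetical helper: round 'sector' up to the next multiple of 'align'. */
static uint64_t round_up_sectors(uint64_t sector, uint64_t align)
{
	uint64_t rem;

	if (!align)
		return sector;

	rem = sector % align;
	return rem ? sector + (align - rem) : sector;
}

/*
 * Hypothetical helper: start of the data area, in sectors.
 * 'mda_end' is the sector just past the first metadata area, 'align' is
 * the requested alignment in sectors, and 'alignment_offset_bytes' is the
 * byte value as sysfs would report it.
 */
static uint64_t padded_pe_start(uint64_t mda_end, uint64_t align,
				uint64_t alignment_offset_bytes)
{
	return round_up_sectors(mda_end, align) +
	       (alignment_offset_bytes >> SECTOR_SHIFT);
}
```

With a 128-sector (64KB) alignment, an mda ending at sector 100 and an alignment_offset of 3584 bytes, padded_pe_start(100, 128, 3584) gives sector 135: 128 for the boundary plus 7 for the offset.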
v2 changes:
Also added another method for auto detection of --dataalignment using
the topology information that is exposed in sysfs. Controlled via
'devices/data_alignment_detection' in lvm.conf.
v3 changes:
Fixed 'devices/data_alignment_detection' patch (3/6) to retrieve the
minimum_io_size and optimal_io_size for partitions from the parent
device's 'queue/' directory, which is where both attributes are found.
v4 changes:
Simplified the _mda_setup() change in --dataalignmentoffset patch (1/6).
Improved the documentation and comments in all patches.
v5 changes:
Improved LVM2's support for partitions (in the MD and topology code paths).
Added MD, partitions, and topology tests to the testsuite.
Mike Snitzer (6):
Add --dataalignmentoffset to pvcreate to pad aligned data area
Add devices/data_alignment_offset_detection to lvm.conf.
Add devices/data_alignment_detection to lvm.conf.
Improve ability to lookup primary device associated with partition
Retrieve MD sysfs attributes for MD partitions
Add MD, partition and topology tests to the LVM2 test-suite
WHATS_NEW | 3 +
doc/example.conf | 21 ++++-
lib/config/defaults.h | 2 +
lib/device/dev-md.c | 13 ++-
lib/device/device.c | 184 ++++++++++++++++++++++++++++++++++++++
lib/device/device.h | 12 +++
lib/format1/format1.c | 1 +
lib/format_pool/format_pool.c | 1 +
lib/format_text/archiver.c | 2 +-
lib/format_text/format-text.c | 16 +++-
lib/metadata/metadata-exported.h | 1 +
lib/metadata/metadata.c | 37 +++++++-
lib/metadata/metadata.h | 1 +
man/lvm.conf.5.in | 22 ++++-
man/pvcreate.8.in | 20 ++++-
test/t-pvcreate-operation-md.sh | 85 +++++++++++++++++
test/t-pvcreate-usage.sh | 12 +++
test/test-utils.sh | 2 +-
tools/args.h | 1 +
tools/commands.h | 7 +-
tools/pvcreate.c | 23 +++++-
tools/vgconvert.c | 2 +-
22 files changed, 444 insertions(+), 24 deletions(-)
create mode 100644 test/t-pvcreate-operation-md.sh

* [PATCH 1/6] Add --dataalignmentoffset to pvcreate to pad aligned data area
From: Mike Snitzer @ 2009-07-13 20:11 UTC
To: lvm-devel

Implement pvcreate --dataalignmentoffset to pad the start of the aligned
data area.

When setting up the first mda, format-text.c:_mda_setup now sets the
pe_start to immediately follow the first mda (which ends at a pe_align
boundary). data_alignment_offset will be added to pe_start if
--dataalignmentoffset was used.

Testing included verifying 'pvcreate --metadatacopies 2' works (2nd mda's
location on disk isn't affected by data_alignment_offset).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
WHATS_NEW | 1 +
lib/format1/format1.c | 1 +
lib/format_pool/format_pool.c | 1 +
lib/format_text/archiver.c | 2 +-
lib/format_text/format-text.c | 16 ++++++++++++++--
lib/metadata/metadata-exported.h | 1 +
lib/metadata/metadata.c | 10 ++++++++--
lib/metadata/metadata.h | 1 +
man/pvcreate.8.in | 20 +++++++++++++++++---
tools/args.h | 1 +
tools/commands.h | 7 ++++---
tools/pvcreate.c | 23 +++++++++++++++++++++--
tools/vgconvert.c | 2 +-
13 files changed, 72 insertions(+), 14 deletions(-)

Index: LVM2/WHATS_NEW
===================================================================
--- LVM2.orig/WHATS_NEW
+++ LVM2/WHATS_NEW
@@ -1,5 +1,6 @@
 Version 2.02.49 -
 ================================
+  Add --dataalignmentoffset to pvcreate to pad aligned data area.
   Fix dev name mismatch in vgcreate man page example.
   Refactor vg_remove_single for use in liblvm.
   Make all tools consistent with lock ordering - obtain VG_ORPHAN lock second.
Index: LVM2/lib/format1/format1.c
===================================================================
--- LVM2.orig/lib/format1/format1.c
+++ LVM2/lib/format1/format1.c
@@ -296,6 +296,7 @@ static int _format1_pv_setup(const struc
 			     uint64_t pe_start, uint32_t extent_count,
 			     uint32_t extent_size,
 			     unsigned long data_alignment __attribute((unused)),
+			     unsigned long data_alignment_offset __attribute((unused)),
 			     int pvmetadatacopies __attribute((unused)),
 			     uint64_t pvmetadatasize __attribute((unused)), struct dm_list *mdas __attribute((unused)),
 			     struct physical_volume *pv, struct volume_group *vg __attribute((unused)))
Index: LVM2/lib/format_pool/format_pool.c
===================================================================
--- LVM2.orig/lib/format_pool/format_pool.c
+++ LVM2/lib/format_pool/format_pool.c
@@ -192,6 +192,7 @@ static int _pool_pv_setup(const struct f
 			  uint32_t extent_count __attribute((unused)),
 			  uint32_t extent_size __attribute((unused)),
 			  unsigned long data_alignment __attribute((unused)),
+			  unsigned long data_alignment_offset __attribute((unused)),
 			  int pvmetadatacopies __attribute((unused)),
 			  uint64_t pvmetadatasize __attribute((unused)),
 			  struct dm_list *mdas __attribute((unused)),
Index: LVM2/lib/format_text/archiver.c
===================================================================
--- LVM2.orig/lib/format_text/archiver.c
+++ LVM2/lib/format_text/archiver.c
@@ -320,7 +320,7 @@ int backup_restore_vg(struct cmd_context
 			return 0;
 		}
 		if (!vg->fid->fmt->ops->
-		    pv_setup(vg->fid->fmt, UINT64_C(0), 0, 0, 0, 0UL,
+		    pv_setup(vg->fid->fmt, UINT64_C(0), 0, 0, 0, 0, 0UL,
 			     UINT64_C(0), &vg->fid->metadata_areas, pv, vg)) {
 			log_error("Format-specific setup for %s failed",
 				  pv_dev_name(pv));
Index: LVM2/lib/format_text/format-text.c
===================================================================
--- LVM2.orig/lib/format_text/format-text.c
+++ LVM2/lib/format_text/format-text.c
@@ -1175,6 +1175,7 @@ static int _text_scan(const struct forma
    Always have an mda between end-of-label and pe_align() boundary */
 static int _mda_setup(const struct format_type *fmt,
 		      uint64_t pe_start, uint64_t pe_end,
+		      unsigned long data_alignment_offset,
 		      int pvmetadatacopies,
 		      uint64_t pvmetadatasize, struct dm_list *mdas,
 		      struct physical_volume *pv,
@@ -1251,6 +1252,16 @@ static int _mda_setup(const struct forma
 			return 0;
 		}
 
+		if (!pe_start && !pe_end) {
+			/*
+			 * Set PV's pe_start to immediately follow the
+			 * first mda (which ends at a pe_align boundary)
+			 */
+			pv->pe_start = (start1 + mda_size1) >> SECTOR_SHIFT;
+			if (data_alignment_offset)
+				pv->pe_start += data_alignment_offset;
+		}
+
 		if (pvmetadatacopies == 1)
 			return 1;
 	} else
@@ -1598,6 +1609,7 @@ static struct metadata_area_ops _metadat
 static int _text_pv_setup(const struct format_type *fmt,
 		     uint64_t pe_start, uint32_t extent_count,
 		     uint32_t extent_size, unsigned long data_alignment,
+		     unsigned long data_alignment_offset,
 		     int pvmetadatacopies,
 		     uint64_t pvmetadatasize, struct dm_list *mdas,
 		     struct physical_volume *pv, struct volume_group *vg)
@@ -1707,8 +1719,8 @@ static int _text_pv_setup(const struct f
 
 		if (extent_count)
 			pe_end = pe_start + extent_count * extent_size - 1;
-		if (!_mda_setup(fmt, pe_start, pe_end, pvmetadatacopies,
-				pvmetadatasize, mdas, pv, vg))
+		if (!_mda_setup(fmt, pe_start, pe_end, data_alignment_offset,
+				pvmetadatacopies, pvmetadatasize, mdas, pv, vg))
 			return_0;
 	}
 
Index: LVM2/lib/metadata/metadata-exported.h
===================================================================
--- LVM2.orig/lib/metadata/metadata-exported.h
+++ LVM2/lib/metadata/metadata-exported.h
@@ -402,6 +402,7 @@ pv_t *pv_create(const struct cmd_context
 		struct id *id,
 		uint64_t size,
 		unsigned long data_alignment,
+		unsigned long data_alignment_offset,
 		uint64_t pe_start,
 		uint32_t existing_extent_count,
 		uint32_t existing_extent_size,
Index: LVM2/lib/metadata/metadata.c
===================================================================
--- LVM2.orig/lib/metadata/metadata.c
+++ LVM2/lib/metadata/metadata.c
@@ -48,6 +48,7 @@ static struct physical_volume *_pv_creat
 					struct device *dev,
 					struct id *id, uint64_t size,
 					unsigned long data_alignment,
+					unsigned long data_alignment_offset,
 					uint64_t pe_start,
 					uint32_t existing_extent_count,
 					uint32_t existing_extent_size,
@@ -161,7 +162,7 @@ int add_pv_to_vg(struct volume_group *vg
 	pv->pe_alloc_count = 0;
 
 	if (!fid->fmt->ops->pv_setup(fid->fmt, UINT64_C(0), 0,
-				     vg->extent_size, 0, 0UL, UINT64_C(0),
+				     vg->extent_size, 0, 0, 0UL, UINT64_C(0),
 				     &fid->metadata_areas, pv, vg)) {
 		log_error("Format-specific setup of physical volume '%s' "
 			  "failed.", pv_name);
@@ -904,6 +905,7 @@ int vg_split_mdas(struct cmd_context *cm
  * @id: PV UUID to use for initialization
  * @size: size of the PV in sectors
  * @data_alignment: requested alignment of data
+ * @data_alignment_offset: requested offset to aligned data
  * @pe_start: physical extent start
  * @existing_extent_count
  * @existing_extent_size
@@ -922,13 +924,15 @@ pv_t *pv_create(const struct cmd_context
 		struct device *dev,
 		struct id *id, uint64_t size,
 		unsigned long data_alignment,
+		unsigned long data_alignment_offset,
 		uint64_t pe_start,
 		uint32_t existing_extent_count,
 		uint32_t existing_extent_size,
 		int pvmetadatacopies,
 		uint64_t pvmetadatasize, struct dm_list *mdas)
 {
-	return _pv_create(cmd->fmt, dev, id, size, data_alignment, pe_start,
+	return _pv_create(cmd->fmt, dev, id, size,
+			  data_alignment, data_alignment_offset, pe_start,
 			  existing_extent_count,
 			  existing_extent_size,
 			  pvmetadatacopies,
@@ -973,6 +977,7 @@ static struct physical_volume *_pv_creat
 				  struct device *dev,
 				  struct id *id, uint64_t size,
 				  unsigned long data_alignment,
+				  unsigned long data_alignment_offset,
 				  uint64_t pe_start,
 				  uint32_t existing_extent_count,
 				  uint32_t existing_extent_size,
@@ -1024,6 +1029,7 @@ static struct physical_volume *_pv_creat
 
 	if (!fmt->ops->pv_setup(fmt, pe_start, existing_extent_count,
 				existing_extent_size, data_alignment,
+				data_alignment_offset,
 				pvmetadatacopies, pvmetadatasize, mdas,
 				pv, NULL)) {
 		log_error("%s: Format-specific setup of physical volume "
Index: LVM2/lib/metadata/metadata.h
===================================================================
--- LVM2.orig/lib/metadata/metadata.h
+++ LVM2/lib/metadata/metadata.h
@@ -213,6 +213,7 @@ struct format_handler {
 	int (*pv_setup) (const struct format_type * fmt,
 			 uint64_t pe_start, uint32_t extent_count,
 			 uint32_t extent_size, unsigned long data_alignment,
+			 unsigned long data_alignment_offset,
 			 int pvmetadatacopies,
 			 uint64_t pvmetadatasize, struct dm_list * mdas,
 			 struct physical_volume * pv, struct volume_group * vg);
Index: LVM2/man/pvcreate.8.in
===================================================================
--- LVM2.orig/man/pvcreate.8.in
+++ LVM2/man/pvcreate.8.in
@@ -14,6 +14,7 @@ pvcreate \- initialize a disk or partiti
 .RB [ \-\-metadatacopies #copies ]
 .RB [ \-\-metadatasize size ]
 .RB [ \-\-dataalignment alignment ]
+.RB [ \-\-dataalignmentoffset offset ]
 .RB [ \-\-restorefile file ]
 .RB [ \-\-setphysicalvolumesize size ]
 .RB [ \-u | \-\-uuid uuid ]
@@ -91,13 +92,18 @@ The approximate amount of space to be se
 (The size you specify may get rounded.)
 .TP
 .BR \-\-dataalignment " alignment"
-Align the offset of the start of the data to a multiple of this number.
+Align the start of the data to a multiple of this number.
 You should also specify an appropriate \fBPhysicalExtentSize\fP when creating
 the Volume Group with \fBvgcreate\fP.
 .sp
 To see the location of the first Physical Extent of an existing Physical Volume
 use \fBpvs -o +pe_start\fP . It will be a multiple of the requested
-\fBdata_alignment\fP.
+\fBalignment\fP. In addition it may be padded by \fBalignment_offset\fP from
+\fBdata_alignment_offset_detection\fP (if enabled in \fBlvm.conf\fP) or
+\fB--dataalignmentoffset\fP.
+.TP
+.BR \-\-dataalignmentoffset " alignment_offset"
+Pad the start of the data with this additional \fBalignment_offset\fP.
 .TP
 .BR \-\-metadatacopies " copies"
 The number of metadata areas to set aside on each PV. Currently
@@ -128,13 +134,21 @@ in the source). Use with care.
 .TP
 .BR \-\-setphysicalvolumesize " size"
 Overrides the automatically-detected size of the PV. Use with care.
-.SH Example
+.SH EXAMPLES
 Initialize partition #4 on the third SCSI disk and the entire fifth
 SCSI disk for later use by LVM:
 .sp
 .B pvcreate /dev/sdc4 /dev/sde
 .sp
+If the 2nd SCSI disk is a 4KB sector drive that compensates for windows
+partitioning (sector 7 is the lowest aligned logical block, the 4KB
+sectors start at LBA -1, and consequently sector 63 is aligned on a 4KB
+boundary) manually account for this when initializing for use by LVM:
+.sp
+.B pvcreate --dataalignmentoffset 7s /dev/sdb
+.sp
 .SH SEE ALSO
+.BR lvm.conf (5),
 .BR lvm (8),
 .BR vgcreate (8),
 .BR vgextend (8),
Index: LVM2/tools/args.h
===================================================================
--- LVM2.orig/tools/args.h
+++ LVM2/tools/args.h
@@ -59,6 +59,7 @@ arg(nameprefixes_ARG, '\0', "nameprefixe
 arg(unquoted_ARG, '\0', "unquoted", NULL, 0)
 arg(rows_ARG, '\0', "rows", NULL, 0)
 arg(dataalignment_ARG, '\0', "dataalignment", size_kb_arg, 0)
+arg(dataalignmentoffset_ARG, '\0', "dataalignmentoffset", size_kb_arg, 0)
 arg(virtualoriginsize_ARG, '\0', "virtualoriginsize", size_mb_arg, 0)
 arg(virtualsize_ARG, '\0', "virtualsize", size_mb_arg, 0)
 
Index: LVM2/tools/commands.h
===================================================================
--- LVM2.orig/tools/commands.h
+++ LVM2/tools/commands.h
@@ -470,6 +470,7 @@ xx(pvcreate,
    "\t[--metadatacopies #copies]" "\n"
    "\t[--metadatasize MetadataSize[bBsSkKmMgGtTpPeE]]" "\n"
    "\t[--dataalignment Alignment[bBsSkKmMgGtTpPeE]]" "\n"
+   "\t[--dataalignmentoffset AlignmentOffset[bBsSkKmMgGtTpPeE]]" "\n"
    "\t[--setphysicalvolumesize PhysicalVolumeSize[bBsSkKmMgGtTpPeE]" "\n"
    "\t[-t|--test] " "\n"
    "\t[-u|--uuid uuid] " "\n"
@@ -479,9 +480,9 @@ xx(pvcreate,
    "\t[--version] " "\n"
    "\tPhysicalVolume [PhysicalVolume...]\n",
 
-   dataalignment_ARG, force_ARG, test_ARG, labelsector_ARG, metadatatype_ARG,
-   metadatacopies_ARG, metadatasize_ARG, physicalvolumesize_ARG,
-   restorefile_ARG, uuidstr_ARG, yes_ARG, zero_ARG)
+   dataalignment_ARG, dataalignmentoffset_ARG, force_ARG, test_ARG,
+   labelsector_ARG, metadatatype_ARG, metadatacopies_ARG, metadatasize_ARG,
+   physicalvolumesize_ARG, restorefile_ARG, uuidstr_ARG, yes_ARG, zero_ARG)
 
 xx(pvdata,
    "Display the on-disk metadata for physical volume(s)",
Index: LVM2/tools/pvcreate.c
===================================================================
--- LVM2.orig/tools/pvcreate.c
+++ LVM2/tools/pvcreate.c
@@ -20,6 +20,7 @@ struct pvcreate_params {
 	int zero;
 	uint64_t size;
 	uint64_t data_alignment;
+	uint64_t data_alignment_offset;
 	int pvmetadatacopies;
 	uint64_t pvmetadatasize;
 	int64_t labelsector;
@@ -203,8 +204,8 @@ static int pvcreate_single(struct cmd_co
 
 	dm_list_init(&mdas);
 	if (!(pv = pv_create(cmd, dev, pp->idp, pp->size,
-			     pp->data_alignment, pp->pe_start,
-			     pp->extent_count, pp->extent_size,
+			     pp->data_alignment, pp->data_alignment_offset,
+			     pp->pe_start, pp->extent_count, pp->extent_size,
 			     pp->pvmetadatacopies,
 			     pp->pvmetadatasize,&mdas))) {
 		log_error("Failed to setup physical volume \"%s\"", pv_name);
@@ -377,6 +378,24 @@ static int pvcreate_validate_params(stru
 		pp->data_alignment = 0;
 	}
 
+	if (arg_sign_value(cmd, dataalignmentoffset_ARG, 0) == SIGN_MINUS) {
+		log_error("Physical volume data alignment offset may not be negative");
+		return 0;
+	}
+	pp->data_alignment_offset = arg_uint64_value(cmd, dataalignmentoffset_ARG, UINT64_C(0));
+
+	if (pp->data_alignment_offset > ULONG_MAX) {
+		log_error("Physical volume data alignment offset is too big.");
+		return 0;
+	}
+
+	if (pp->data_alignment_offset && pp->pe_start) {
+		log_warn("WARNING: Ignoring data alignment offset %" PRIu64
+			 " incompatible with --restorefile value (%"
+			 PRIu64").", pp->data_alignment_offset, pp->pe_start);
+		pp->data_alignment_offset = 0;
+	}
+
 	if (arg_sign_value(cmd, metadatasize_ARG, 0) == SIGN_MINUS) {
 		log_error("Metadata size may not be negative");
 		return 0;
Index: LVM2/tools/vgconvert.c
===================================================================
--- LVM2.orig/tools/vgconvert.c
+++ LVM2/tools/vgconvert.c
@@ -123,7 +123,7 @@ static int vgconvert_single(struct cmd_c
 
 		dm_list_init(&mdas);
 		if (!(pv = pv_create(cmd, pv_dev(existing_pv),
-				     &existing_pv->id, size, 0,
+				     &existing_pv->id, size, 0, 0,
 				     pe_start, pv_pe_count(existing_pv),
 				     pv_pe_size(existing_pv), pvmetadatacopies,
 				     pvmetadatasize, &mdas))) {

* [PATCH 1/6] Add --dataalignmentoffset to pvcreate to pad aligned data area
From: Alasdair G Kergon @ 2009-07-16 15:11 UTC
To: lvm-devel

On Mon, Jul 13, 2009 at 04:11:17PM -0400, Mike Snitzer wrote:
> When setting up the first mda, format-text.c:_mda_setup now sets the
> pe_start to immediately follow the first mda (which ends at a pe_align
> boundary). data_alignment_offset will be added to pe_start if
> --dataalignmentoffset was used.

The reason this cannot be reviewed easily is this:

> Index: LVM2/lib/format_text/format-text.c
> ===================================================================
> --- LVM2.orig/lib/format_text/format-text.c
> +++ LVM2/lib/format_text/format-text.c
> @@ -1175,6 +1175,7 @@ static int _text_scan(const struct forma
>     Always have an mda between end-of-label and pe_align() boundary */
>  static int _mda_setup(const struct format_type *fmt,
>  		      uint64_t pe_start, uint64_t pe_end,
> +		      unsigned long data_alignment_offset,
>  		      int pvmetadatacopies,
>  		      uint64_t pvmetadatasize, struct dm_list *mdas,
>  		      struct physical_volume *pv,
> @@ -1251,6 +1252,16 @@ static int _mda_setup(const struct forma
>  			return 0;
>  		}
>
> +		if (!pe_start && !pe_end) {
> +			/*
> +			 * Set PV's pe_start to immediately follow the
> +			 * first mda (which ends at a pe_align boundary)
> +			 */
> +			pv->pe_start = (start1 + mda_size1) >> SECTOR_SHIFT;
> +			if (data_alignment_offset)
> +				pv->pe_start += data_alignment_offset;
> +		}
> +
>  		if (pvmetadatacopies == 1)
>  			return 1;
>  	} else

Meanwhile, the code already has this elsewhere (_text_pv_write):

	/*
	 * If pe_start is still unset, set it to first aligned
	 * sector after any metadata areas that begin before pe_start.
	 */
	if (!pv->pe_start)
		pv->pe_start = pv->pe_align;

Further explanation is required to understand how both can be correct.

Alasdair

* Re: [PATCH 1/6] Add --dataalignmentoffset to pvcreate to pad aligned data area
From: Mike Snitzer @ 2009-07-16 20:53 UTC
To: lvm-devel

On Thu, Jul 16 2009 at 11:11am -0400,
Alasdair G Kergon <agk@redhat.com> wrote:

> On Mon, Jul 13, 2009 at 04:11:17PM -0400, Mike Snitzer wrote:
> > When setting up the first mda, format-text.c:_mda_setup now sets the
> > pe_start to immediately follow the first mda (which ends at a pe_align
> > boundary). data_alignment_offset will be added to pe_start if
> > --dataalignmentoffset was used.
>
> The reason this cannot be reviewed easily is this:
>
> > Index: LVM2/lib/format_text/format-text.c
> > ===================================================================
> > --- LVM2.orig/lib/format_text/format-text.c
> > +++ LVM2/lib/format_text/format-text.c
> > @@ -1175,6 +1175,7 @@ static int _text_scan(const struct forma
> >     Always have an mda between end-of-label and pe_align() boundary */
> >  static int _mda_setup(const struct format_type *fmt,
> >  		      uint64_t pe_start, uint64_t pe_end,
> > +		      unsigned long data_alignment_offset,
> >  		      int pvmetadatacopies,
> >  		      uint64_t pvmetadatasize, struct dm_list *mdas,
> >  		      struct physical_volume *pv,
> > @@ -1251,6 +1252,16 @@ static int _mda_setup(const struct forma
> >  			return 0;
> >  		}
> >
> > +		if (!pe_start && !pe_end) {
> > +			/*
> > +			 * Set PV's pe_start to immediately follow the
> > +			 * first mda (which ends at a pe_align boundary)
> > +			 */
> > +			pv->pe_start = (start1 + mda_size1) >> SECTOR_SHIFT;
> > +			if (data_alignment_offset)
> > +				pv->pe_start += data_alignment_offset;
> > +		}
> > +
> >  		if (pvmetadatacopies == 1)
> >  			return 1;
> >  	} else
>
> While the code already has elsewhere (_text_pv_write):
>
> 	/*
> 	 * If pe_start is still unset, set it to first aligned
> 	 * sector after any metadata areas that begin before pe_start.
> 	 */
> 	if (!pv->pe_start)
> 		pv->pe_start = pv->pe_align;
>
>
> Further explanation is required to understand how both can be correct.

Yes, that code in _text_pv_write really has no place being in there
(that I can see); especially given that _text_pv_setup() also sets
pv->pe_start accordingly with:

	if (pe_start)
		pv->pe_start = pe_start;
	...
	if (pv->pe_start < pv->pe_align)
		pv->pe_start = pv->pe_align;

If anything: that _text_pv_write being where it is makes that code quite
a bit more labored/awkward.. see the dm_list_iterate_items() loop that
follows the pv->pe_start initialization you shared above:

	...
		pv->pe_start = (mdac->area.start + mdac->area.size)
		    >> SECTOR_SHIFT;
		adjustment = pv->pe_start % pv->pe_align;
		if (adjustment)
			pv->pe_start += pv->pe_align - adjustment;
	}

All those nuanced conditional checks (that precede the _text_pv_write
code I just shared) really amount to "we need to place the pe_start
immediately after the first data area".

The setup path (_mda_setup) is the more logical place to establish the
start of the PV's data area.

But given the care that someone clearly put into all those
(undocumented) checks in _text_pv_write I thought I could still be
missing _why_ the pe_start needed to be established so late (in the
_text_pv_write path).

I'll take one more pass through it when I get back (tracing all entry
into _text_pv_write). It could be safe to just remove that code in
_text_pv_write.

Mike

* Re: [PATCH 1/6] Add --dataalignmentoffset to pvcreate to pad aligned data area
From: Alasdair G Kergon @ 2009-07-16 22:11 UTC
To: lvm-devel

The difficulty is that lots of code paths converge here.

Cases where there are pre-existing mdas that must be preserved.
Cases where there is a pre-existing pe_start that must be preserved.
Cases which back up the existing setting of pe_start and restore it after
returning - the mdas had better not have moved to overlap.

pe_start is kept in two places on disk - inside the PV header (as the
data area location) and inside the VG metadata if the PV belongs to a VG.
The latter could mask the former.

E.g. Did you test pvcreate with some new offset; add to VG; vgcfgrestore
from a backup that did not have an offset when it was created; vgremove;
and see whether the original offset got preserved? Then throw a
pvchange --uuid into the sequence just before the vgremove.

And test conversion from/to lvm1 format.

These are examples of code paths touched by the patch which should all be
retested.

Alternatively go for a rule-based patch ("wherever it currently does X
now also do Y") that avoids the need to audit and test all those code
paths thoroughly.

Or if you want to move the place pe_start is set and you have
audited/tested the code, do that as a separate patch prior to
incorporating the new setting.

Also, we need a crisper definition of this new setting and how it
interacts with the existing alignment code (page size, data_alignment,
pe_align, md_chunk_alignment etc.)

What's the behaviour if someone gives a value much larger than the
calculated pe_align?

Are we sure it's never applied twice e.g. with multiple invocations of
the code (in any combination of the cases I mentioned above) pe_start
doesn't grow bigger and bigger?

Should the first mda get extended to fill up any free space prior to
pe_start?

Alasdair
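
One way to rule out the "applied twice" concern is to always derive pe_start from the aligned end of the first mda, instead of incrementing whatever value is already stored. The following is only a sketch under that assumption; struct pv_sketch and set_pe_start() are hypothetical and are not part of LVM2.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few PV fields that matter here (sectors). */
struct pv_sketch {
	uint64_t pe_align;
	uint64_t pe_start;
};

/*
 * Recompute pe_start from scratch: round the end of the first mda up to
 * a pe_align boundary, then add the offset.  The previous pe_start is
 * never read, so repeated calls cannot accumulate the offset.
 */
static void set_pe_start(struct pv_sketch *pv, uint64_t mda_end,
			 uint64_t data_alignment_offset)
{
	uint64_t base = mda_end;
	uint64_t rem = pv->pe_align ? mda_end % pv->pe_align : 0;

	if (rem)
		base += pv->pe_align - rem;

	pv->pe_start = base + data_alignment_offset;
}
```

Calling set_pe_start() twice with the same arguments leaves pe_start unchanged, whereas a bare "pv->pe_start += data_alignment_offset" would grow on each invocation that reaches it.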

* [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf.
From: Mike Snitzer @ 2009-07-13 20:11 UTC
To: lvm-devel

If the pvcreate --dataalignmentoffset option is not specified the start
of a PV's aligned data area will be padded with the associated
'alignment_offset' exposed in sysfs (unless
devices/data_alignment_offset_detection is disabled in lvm.conf).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
WHATS_NEW | 1 +
doc/example.conf | 9 ++++++
lib/config/defaults.h | 1 +
lib/device/device.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++++
lib/device/device.h | 3 ++
lib/metadata/metadata.c | 8 +++++
man/lvm.conf.5.in | 9 +++++-
7 files changed, 101 insertions(+), 1 deletion(-)

Index: LVM2/WHATS_NEW
===================================================================
--- LVM2.orig/WHATS_NEW
+++ LVM2/WHATS_NEW
@@ -1,5 +1,6 @@
 Version 2.02.49 -
 ================================
+  Add devices/data_alignment_offset_detection to lvm.conf.
   Add --dataalignmentoffset to pvcreate to pad aligned data area.
   Fix dev name mismatch in vgcreate man page example.
   Refactor vg_remove_single for use in liblvm.
Index: LVM2/doc/example.conf
===================================================================
--- LVM2.orig/doc/example.conf
+++ LVM2/doc/example.conf
@@ -104,6 +104,15 @@ devices {
     # Set to 0 for the default alignment of 64KB or page size, if larger.
     data_alignment = 0
 
+    # By default, the start of a PV's aligned data area will be padded with
+    # the 'alignment_offset' exposed in sysfs. This offset is often 0 but
+    # may be non-zero; e.g.: certain 4KB sector drives that compensate for
+    # windows partitioning will have an alignment_offset of 3584 bytes
+    # (sector 7 is the lowest aligned logical block, the 4KB sectors start
+    # at LBA -1, and consequently sector 63 is aligned on a 4KB boundary).
+    # 1 enables; 0 disables.
+    data_alignment_offset_detection = 1
+
     # If, while scanning the system for PVs, LVM2 encounters a device-mapper
     # device that has its I/O suspended, it waits for it to become accessible.
     # Set this to 1 to skip such devices. This should only be needed
Index: LVM2/lib/config/defaults.h
===================================================================
--- LVM2.orig/lib/config/defaults.h
+++ LVM2/lib/config/defaults.h
@@ -34,6 +34,7 @@
 #define DEFAULT_MD_COMPONENT_DETECTION 1
 #define DEFAULT_MD_CHUNK_ALIGNMENT 1
 #define DEFAULT_IGNORE_SUSPENDED_DEVICES 1
+#define DEFAULT_DATA_ALIGNMENT_OFFSET_DETECTION 1
 
 #define DEFAULT_LOCK_DIR "/var/lock/lvm"
 #define DEFAULT_LOCKING_LIB "liblvm2clusterlock.so"
Index: LVM2/lib/device/device.c
===================================================================
--- LVM2.orig/lib/device/device.c
+++ LVM2/lib/device/device.c
@@ -282,3 +282,74 @@ int _get_partition_type(struct dev_mgr *
 	return 0;
 }
 #endif
+
+#ifdef linux
+
+static unsigned long _dev_topology_attribute(const char *attribute,
+					     const char *sysfs_dir,
+					     struct device *dev)
+{
+	char path[PATH_MAX+1], buffer[64];
+	FILE *fp;
+	struct stat info;
+	unsigned long result = 0UL;
+
+	if (!attribute || !*attribute)
+		return_0;
+
+	if (!sysfs_dir || !*sysfs_dir)
+		return_0;
+
+	if (dm_snprintf(path, PATH_MAX, "%s/dev/block/%d:%d/%s",
+			sysfs_dir, MAJOR(dev->dev), MINOR(dev->dev),
+			attribute) < 0) {
+		log_error("dm_snprintf %s failed", attribute);
+		return 0;
+	}
+
+	/* check if the desired sysfs attribute exists */
+	if (stat(path, &info) < 0)
+		return 0;
+
+	if (!(fp = fopen(path, "r"))) {
+		log_sys_error("fopen", path);
+		return 0;
+	}
+
+	if (!fgets(buffer, sizeof(buffer), fp)) {
+		log_sys_error("fgets", path);
+		goto out;
+	}
+
+	if (sscanf(buffer, "%lu", &result) != 1) {
+		log_error("sysfs file %s not in expected format: %s", path,
+			  buffer);
+		goto out;
+	}
+
+	log_very_verbose("Device %s %s is %lu bytes.",
+			 dev_name(dev), attribute, result);
+
+out:
+	if (fclose(fp))
+		log_sys_error("fclose", path);
+
+	return result >> SECTOR_SHIFT;
+}
+
+unsigned long dev_alignment_offset(const char *sysfs_dir,
+				   struct device *dev)
+{
+	return _dev_topology_attribute("alignment_offset",
+				       sysfs_dir, dev);
+}
+
+#else
+
+unsigned long dev_alignment_offset(const char *sysfs_dir,
+				   struct device *dev)
+{
+	return 0UL;
+}
+
+#endif
Index: LVM2/lib/device/device.h
===================================================================
--- LVM2.orig/lib/device/device.h
+++ LVM2/lib/device/device.h
@@ -100,4 +100,7 @@ unsigned long dev_md_stripe_width(const
 
 int is_partitioned_dev(struct device *dev);
 
+unsigned long dev_alignment_offset(const char *sysfs_dir,
+				   struct device *dev);
+
 #endif
Index: LVM2/lib/metadata/metadata.c
===================================================================
--- LVM2.orig/lib/metadata/metadata.c
+++ LVM2/lib/metadata/metadata.c
@@ -1027,6 +1027,14 @@ static struct physical_volume *_pv_creat
 	pv->fmt = fmt;
 	pv->vg_name = fmt->orphan_vg_name;
 
+	if (!pe_start && !data_alignment_offset &&
+	    find_config_tree_bool(pv->fmt->cmd,
+				  "devices/data_alignment_offset_detection",
+				  DEFAULT_DATA_ALIGNMENT_OFFSET_DETECTION)) {
+		data_alignment_offset =
+			dev_alignment_offset(pv->fmt->cmd->sysfs_dir, pv->dev);
+	}
+
 	if (!fmt->ops->pv_setup(fmt, pe_start, existing_extent_count,
 				existing_extent_size, data_alignment,
 				data_alignment_offset,
Index: LVM2/man/lvm.conf.5.in
===================================================================
--- LVM2.orig/man/lvm.conf.5.in
+++ LVM2/man/lvm.conf.5.in
@@ -142,10 +142,17 @@ when creating a new Physical Volume usin
 If a Physical Volume is placed directly upon an md device and
 \fBmd_chunk_alignment\fP is enabled this parameter is ignored.
 Set to 0 to use the default alignment of 64KB or the page size, if larger.
+.IP
+\fBdata_alignment_offset_detection\fP \(em If set to 1, and your kernel
+provides topology information in sysfs for the Physical Volume, the
+start of the aligned data area of the Physical Volume will be padded
+with the alignment_offset exposed in sysfs.
 .sp
 To see the location of the first Physical Extent of an existing Physical Volume
 use \fBpvs -o +pe_start\fP . It will be a multiple of the requested
-\fBdata_alignment\fP.
+\fBdata_alignment\fP plus the alignment_offset from
+\fBdata_alignment_offset_detection\fP (if enabled) or the pvcreate
+commandline.
 .TP
 \fBlog\fP \(em Default log settings
 .IP

* [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf.
From: Milan Broz @ 2009-07-16 18:47 UTC
To: lvm-devel

Mike Snitzer wrote:
> +	if (dm_snprintf(path, PATH_MAX, "%s/dev/block/%d:%d/%s",
> +			sysfs_dir, MAJOR(dev->dev), MINOR(dev->dev),
> +			attribute) < 0) {
> +		log_error("dm_snprintf %s failed", attribute);
> +		return 0;
> +	}
>
>
this segfaults here on 32bit...

---
lib/device/device.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/device/device.c b/lib/device/device.c
index 2d01ec8..5330e3c 100644
--- a/lib/device/device.c
+++ b/lib/device/device.c
@@ -296,7 +296,7 @@ int primary_dev(const char *sysfs_dir,
 
 	/* check if dev is a partition */
 	if (dm_snprintf(path, PATH_MAX, "%s/dev/block/%d:%d/partition",
-			sysfs_dir, MAJOR(dev->dev), MINOR(dev->dev)) < 0) {
+			sysfs_dir, (int)MAJOR(dev->dev), (int)MINOR(dev->dev)) < 0) {
 		log_error("dm_snprintf partition failed");
 		return ret;
 	}
@@ -366,7 +366,7 @@ static unsigned long _dev_topology_attribute(const char *attribute,
 		return_0;
 
 	if (dm_snprintf(path, PATH_MAX, sysfs_fmt_str, sysfs_dir,
-			MAJOR(dev->dev), MINOR(dev->dev),
+			(int)MAJOR(dev->dev), (int)MINOR(dev->dev),
 			attribute) < 0) {
 		log_error("dm_snprintf %s failed", attribute);
 		return 0;
* Re: [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf. 2009-07-16 18:47 ` Milan Broz @ 2009-07-16 21:00 ` Mike Snitzer 2009-07-16 21:15 ` Mike Snitzer 1 sibling, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2009-07-16 21:00 UTC (permalink / raw) To: lvm-devel On Thu, Jul 16 2009 at 2:47pm -0400, Milan Broz <mbroz@redhat.com> wrote: > Mike Snitzer wrote: > > + if (dm_snprintf(path, PATH_MAX, "%s/dev/block/%d:%d/%s", > > + sysfs_dir, MAJOR(dev->dev), MINOR(dev->dev), > > + attribute) < 0) { > > + log_error("dm_snprintf %s failed", attribute); > > + return 0; > > + } > > > > > this segfaults here on 32bit... Odd.. I mean is it obvious to you _why_ that would be the case? Thanks for your fix. Mike
* Re: [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf. 2009-07-16 18:47 ` Milan Broz 2009-07-16 21:00 ` Mike Snitzer @ 2009-07-16 21:15 ` Mike Snitzer 2009-07-17 7:43 ` Milan Broz 1 sibling, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2009-07-16 21:15 UTC (permalink / raw) To: lvm-devel On Thu, Jul 16 2009 at 2:47pm -0400, Milan Broz <mbroz@redhat.com> wrote: > Mike Snitzer wrote: > > + if (dm_snprintf(path, PATH_MAX, "%s/dev/block/%d:%d/%s", > > + sysfs_dir, MAJOR(dev->dev), MINOR(dev->dev), > > + attribute) < 0) { > > + log_error("dm_snprintf %s failed", attribute); > > + return 0; > > + } > > > > > this segfaults here on 32bit... > > --- > lib/device/device.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/lib/device/device.c b/lib/device/device.c > index 2d01ec8..5330e3c 100644 > --- a/lib/device/device.c > +++ b/lib/device/device.c > @@ -296,7 +296,7 @@ int primary_dev(const char *sysfs_dir, > > /* check if dev is a partition */ > if (dm_snprintf(path, PATH_MAX, "%s/dev/block/%d:%d/partition", > - sysfs_dir, MAJOR(dev->dev), MINOR(dev->dev)) < 0) { > + sysfs_dir, (int)MAJOR(dev->dev), (int)MINOR(dev->dev)) < 0) { > log_error("dm_snprintf partition failed"); > return ret; > } > @@ -366,7 +366,7 @@ static unsigned long _dev_topology_attribute(const char *attribute, > return_0; > > if (dm_snprintf(path, PATH_MAX, sysfs_fmt_str, sysfs_dir, > - MAJOR(dev->dev), MINOR(dev->dev), > + (int)MAJOR(dev->dev), (int)MINOR(dev->dev), > attribute) < 0) { > log_error("dm_snprintf %s failed", attribute); > return 0; > > BTW, Peter was seeing dm_snprintf() segfaults with my patches too. I'll be sure to cast all MAJOR() and MINOR() calls when used with dm_snprintf(). Would still like to understand how not using a cast gets us into trouble... but that is for when I get back from vacation ;) Mike ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf. 2009-07-16 21:15 ` Mike Snitzer @ 2009-07-17 7:43 ` Milan Broz 0 siblings, 0 replies; 16+ messages in thread From: Milan Broz @ 2009-07-17 7:43 UTC (permalink / raw) To: lvm-devel Mike Snitzer wrote: > On Thu, Jul 16 2009 at 2:47pm -0400, > Milan Broz <mbroz@redhat.com> wrote: > I'll be sure to cast all MAJOR() and MINOR() calls when used with > dm_snprintf(). Would still like to understand how not using a cast gets > us into trouble... but that is for when I get back from vacation ;) That's clear, we have compatibility macros for MAJOR defined, on linux it is defined: # define MAJOR(dev) ((dev & 0xfff00) >> 8) printf %d expect on 32bit int (4 bytes) but MAJOR(dev) is long (8 bytes) here. On x86_64 it is the same size, so it works. Just use retype here... Milan
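For illustration, the cast discussed in this sub-thread can be sketched in plain C. This is a minimal sketch under stated assumptions: it uses glibc's major()/minor()/makedev() from <sys/sysmacros.h> instead of LVM2's MAJOR()/MINOR() compatibility macros, and snprintf() instead of dm_snprintf(). sysfs_attr_path() is a hypothetical helper, not a function from the patches. The point is only that each %d consumes exactly one int from the variadic arguments, so a value of any wider type should be cast before it is passed.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>

/*
 * Hypothetical helper (not part of LVM2): build the sysfs path of a
 * block-device attribute, e.g. /sys/dev/block/259:0/alignment_offset.
 * Returns 0 on success, -1 on formatting error or truncation.
 */
static int sysfs_attr_path(char *buf, size_t len, const char *sysfs_dir,
                           dev_t dev, const char *attribute)
{
    /*
     * %d reads one int per conversion. Passing a wider integer without
     * a cast is undefined behaviour and can shift every later argument,
     * so the trailing %s may end up reading a bogus pointer.
     */
    int n = snprintf(buf, len, "%s/dev/block/%d:%d/%s",
                     sysfs_dir, (int) major(dev), (int) minor(dev),
                     attribute);

    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

A caller would pass cmd->sysfs_dir and dev->dev; checking the return value covers both an encoding error and a path that does not fit in the buffer.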
* [PATCH 3/6] Add devices/data_alignment_detection to lvm.conf. 2009-07-13 20:11 [PATCH 0/6 v5] LVM2 topology support Mike Snitzer 2009-07-13 20:11 ` [PATCH 1/6] Add --dataalignmentoffset to pvcreate to pad aligned data area Mike Snitzer 2009-07-13 20:11 ` [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf Mike Snitzer @ 2009-07-13 20:11 ` Mike Snitzer 2009-07-13 20:11 ` [PATCH 4/6] Improve ability to lookup primary device associated with partition Mike Snitzer ` (2 subsequent siblings) 5 siblings, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2009-07-13 20:11 UTC (permalink / raw) To: lvm-devel Adds 'data_alignment_detection' config option to the devices section of lvm.conf. If your kernel provides topology information in sysfs (linux >= 2.6.31) for the Physical Volume, the start of data area will be aligned on a multiple of the ‘minimum_io_size’ or ‘optimal_io_size’ exposed in sysfs. minimum_io_size is used if optimal_io_size is undefined (0). If both md_chunk_alignment and data_alignment_detection are enabled the result of data_alignment_detection is used. Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- WHATS_NEW | 1 doc/example.conf | 12 ++++++- lib/config/defaults.h | 1 lib/device/device.c | 75 ++++++++++++++++++++++++++++++++++++++++++++---- lib/device/device.h | 6 +++ lib/metadata/metadata.c | 19 ++++++++++++ man/lvm.conf.5.in | 13 +++++++- 7 files changed, 118 insertions(+), 9 deletions(-) Index: LVM2/WHATS_NEW =================================================================== --- LVM2.orig/WHATS_NEW +++ LVM2/WHATS_NEW @@ -1,5 +1,6 @@ Version 2.02.49 - ================================ + Add devices/data_alignment_detection to lvm.conf. Add devices/data_alignment_offset_detection to lvm.conf. Add --dataalignmentoffset to pvcreate to pad aligned data area. Fix dev name mismatch in vgcreate man page example. 
Index: LVM2/doc/example.conf =================================================================== --- LVM2.orig/doc/example.conf +++ LVM2/doc/example.conf @@ -98,9 +98,17 @@ devices { # 1 enables; 0 disables. md_chunk_alignment = 1 + # By default, the start of a PV's data area will be aligned with + # the 'minimum_io_size' or 'optimal_io_size' exposed in sysfs. + # minimum_io_size is used if optimal_io_size is undefined (0). + # If md_chunk_alignment is enabled, that detects the optimal_io_size. + # This setting takes precedence over md_chunk_alignment. + # 1 enables; 0 disables. + data_alignment_detection = 1 + # Alignment (in KB) of start of data area when creating a new PV. - # If a PV is placed directly upon an md device and md_chunk_alignment is - # enabled this parameter is ignored. + # If a PV is placed directly upon an md device and md_chunk_alignment or + # data_alignment_detection is enabled this parameter is ignored. # Set to 0 for the default alignment of 64KB or page size, if larger. 
data_alignment = 0 Index: LVM2/lib/config/defaults.h =================================================================== --- LVM2.orig/lib/config/defaults.h +++ LVM2/lib/config/defaults.h @@ -35,6 +35,7 @@ #define DEFAULT_MD_CHUNK_ALIGNMENT 1 #define DEFAULT_IGNORE_SUSPENDED_DEVICES 1 #define DEFAULT_DATA_ALIGNMENT_OFFSET_DETECTION 1 +#define DEFAULT_DATA_ALIGNMENT_DETECTION 1 #define DEFAULT_LOCK_DIR "/var/lock/lvm" #define DEFAULT_LOCKING_LIB "liblvm2clusterlock.so" Index: LVM2/lib/device/device.c =================================================================== --- LVM2.orig/lib/device/device.c +++ LVM2/lib/device/device.c @@ -285,13 +285,36 @@ int _get_partition_type(struct dev_mgr * #ifdef linux +static int _primary_dev(const char *sysfs_dir, + struct device *dev, dev_t *result) +{ + char path[PATH_MAX+1]; + struct stat info; + + /* check if dev is a partition */ + if (dm_snprintf(path, PATH_MAX, "%s/dev/block/%d:%d/partition", + sysfs_dir, MAJOR(dev->dev), MINOR(dev->dev)) < 0) { + log_error("dm_snprintf partition failed"); + return 0; + } + + if (stat(path, &info) < 0) + return 0; + + *result = dev->dev - + (MINOR(dev->dev) % max_partitions(MAJOR(dev->dev))); + return 1; +} + static unsigned long _dev_topology_attribute(const char *attribute, const char *sysfs_dir, struct device *dev) { + const char *sysfs_fmt_str = "%s/dev/block/%d:%d/%s"; char path[PATH_MAX+1], buffer[64]; FILE *fp; struct stat info; + dev_t uninitialized_var(primary); unsigned long result = 0UL; if (!attribute || !*attribute) @@ -300,16 +323,32 @@ static unsigned long _dev_topology_attri if (!sysfs_dir || !*sysfs_dir) return_0; - if (dm_snprintf(path, PATH_MAX, "%s/dev/block/%d:%d/%s", - sysfs_dir, MAJOR(dev->dev), MINOR(dev->dev), + if (dm_snprintf(path, PATH_MAX, sysfs_fmt_str, sysfs_dir, + MAJOR(dev->dev), MINOR(dev->dev), attribute) < 0) { log_error("dm_snprintf %s failed", attribute); return 0; } - /* check if the desired sysfs attribute exists */ - if (stat(path, &info) < 0) - 
return 0; + /* + * check if the desired sysfs attribute exists + * - if not: either the kernel doesn't have topology support + * or the device could be a partition + */ + if (stat(path, &info) < 0) { + if (!_primary_dev(sysfs_dir, dev, &primary)) + return 0; + + /* get attribute from partition's primary device */ + if (dm_snprintf(path, PATH_MAX, sysfs_fmt_str, sysfs_dir, + MAJOR(primary), MINOR(primary), + attribute) < 0) { + log_error("pri dm_snprintf %s failed", attribute); + return 0; + } + if (stat(path, &info) < 0) + return 0; + } if (!(fp = fopen(path, "r"))) { log_sys_error("fopen", path); @@ -344,6 +383,20 @@ unsigned long dev_alignment_offset(const sysfs_dir, dev); } +unsigned long dev_minimum_io_size(const char *sysfs_dir, + struct device *dev) +{ + return _dev_topology_attribute("queue/minimum_io_size", + sysfs_dir, dev); +} + +unsigned long dev_optimal_io_size(const char *sysfs_dir, + struct device *dev) +{ + return _dev_topology_attribute("queue/optimal_io_size", + sysfs_dir, dev); +} + #else unsigned long dev_alignment_offset(const char *sysfs_dir, @@ -352,4 +405,16 @@ unsigned long dev_alignment_offset(const return 0UL; } +unsigned long dev_minimum_io_size(const char *sysfs_dir, + struct device *dev) +{ + return 0UL; +} + +unsigned long dev_optimal_io_size(const char *sysfs_dir, + struct device *dev) +{ + return 0UL; +} + #endif Index: LVM2/lib/device/device.h =================================================================== --- LVM2.orig/lib/device/device.h +++ LVM2/lib/device/device.h @@ -103,4 +103,10 @@ int is_partitioned_dev(struct device *de unsigned long dev_alignment_offset(const char *sysfs_dir, struct device *dev); +unsigned long dev_minimum_io_size(const char *sysfs_dir, + struct device *dev); + +unsigned long dev_optimal_io_size(const char *sysfs_dir, + struct device *dev); + #endif Index: LVM2/lib/metadata/metadata.c =================================================================== --- LVM2.orig/lib/metadata/metadata.c +++ 
LVM2/lib/metadata/metadata.c @@ -94,6 +94,25 @@ unsigned long set_pe_align(struct physic dev_md_stripe_width(pv->fmt->cmd->sysfs_dir, pv->dev)); + /* + * Align to topology's minimum_io_size or optimal_io_size if present + * - minimum_io_size - the smallest request the device can perform + * w/o incurring a read-modify-write penalty (e.g. MD's chunk size) + * - optimal_io_size - the device's preferred unit of receiving I/O + * (e.g. MD's stripe width) + */ + if (find_config_tree_bool(pv->fmt->cmd, + "devices/data_alignment_detection", + DEFAULT_DATA_ALIGNMENT_DETECTION)) { + pv->pe_align = MAX(pv->pe_align, + dev_minimum_io_size(pv->fmt->cmd->sysfs_dir, + pv->dev)); + + pv->pe_align = MAX(pv->pe_align, + dev_optimal_io_size(pv->fmt->cmd->sysfs_dir, + pv->dev)); + } + log_very_verbose("%s: Setting PE alignment to %lu sectors.", dev_name(pv->dev), pv->pe_align); Index: LVM2/man/lvm.conf.5.in =================================================================== --- LVM2.orig/man/lvm.conf.5.in +++ LVM2/man/lvm.conf.5.in @@ -137,11 +137,20 @@ has been reused without wiping the md su directly upon an md device, LVM2 will align its data blocks with the md device's stripe-width. .IP +\fBdata_alignment_detection\fP \(em If set to 1, and your kernel provides +topology information in sysfs for the Physical Volume, the start of data +area will be aligned on a multiple of the ‘minimum_io_size’ or +‘optimal_io_size’ exposed in sysfs. minimum_io_size is used if +optimal_io_size is undefined (0). If both \fBmd_chunk_alignment\fP and +\fBdata_alignment_detection\fP are enabled the result of +\fBdata_alignment_detection\fP is used. +.IP \fBdata_alignment\fP \(em Default alignment (in KB) of start of data area when creating a new Physical Volume using the \fBlvm2\fP format. If a Physical Volume is placed directly upon an md device and -\fBmd_chunk_alignment\fP is enabled this parameter is ignored. -Set to 0 to use the default alignment of 64KB or the page size, if larger. 
+\fBmd_chunk_alignment\fP or \fBdata_alignment_detection\fP is enabled +this parameter is ignored. Set to 0 to use the default alignment of +64KB or the page size, if larger. .IP \fBdata_alignment_offset_detection\fP \(em If set to 1, and your kernel provides topology information in sysfs for the Physical Volume, the
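The alignment selection that patch 3/6 adds to set_pe_align() can be sketched as a standalone function. This is only an approximation under stated assumptions: pick_pe_align() and max_ul() are hypothetical names, the io sizes are taken in bytes as sysfs reports them, and the conversion to 512-byte sectors mirrors the SECTOR_SHIFT handling seen elsewhere in the series. Note that the hunk takes the largest of the current alignment, minimum_io_size and optimal_io_size; that matches the documented "minimum_io_size is used if optimal_io_size is 0" whenever optimal_io_size is either 0 or at least as large as minimum_io_size.

```c
#include <stddef.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

static unsigned long max_ul(unsigned long a, unsigned long b)
{
    return a > b ? a : b;
}

/*
 * Hypothetical stand-in for the topology part of set_pe_align():
 * start from the alignment chosen so far (in sectors) and raise it to
 * the device's minimum_io_size and optimal_io_size (given in bytes).
 * A value of 0 means the attribute is absent or undefined.
 */
static unsigned long pick_pe_align(unsigned long current_align_sectors,
                                   unsigned long minimum_io_size_bytes,
                                   unsigned long optimal_io_size_bytes)
{
    unsigned long align = current_align_sectors;

    align = max_ul(align, minimum_io_size_bytes >> SECTOR_SHIFT);
    align = max_ul(align, optimal_io_size_bytes >> SECTOR_SHIFT);

    return align;
}
```

With the two-disk raid0 array from the test-suite patch (64K chunk, 128K stripe width) and the default 64K alignment, this yields 256 sectors, i.e. 128K.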
* [PATCH 4/6] Improve ability to lookup primary device associated with partition 2009-07-13 20:11 [PATCH 0/6 v5] LVM2 topology support Mike Snitzer ` (2 preceding siblings ...) 2009-07-13 20:11 ` [PATCH 3/6] Add devices/data_alignment_detection " Mike Snitzer @ 2009-07-13 20:11 ` Mike Snitzer 2009-07-13 20:11 ` [PATCH 5/6] Retrieve MD sysfs attributes for MD partitions Mike Snitzer 2009-07-13 20:11 ` [PATCH 6/6] Add MD, partition and topology tests to the LVM2 test-suite Mike Snitzer 5 siblings, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2009-07-13 20:11 UTC (permalink / raw) To: lvm-devel Improve lib/device/device.c:_primary_dev()'s ability to look up the primary device associated with all partitions; including blkext (e.g. partitions directly on MD). The same will also work for obscure sysfs paths; e.g.: paths with mangled names like the HP cciss driver uses: /sys/block/cciss!c0d0/cciss!c0d0p1/ Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- lib/device/device.c | 56 +++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 49 insertions(+), 7 deletions(-) Index: LVM2/lib/device/device.c =================================================================== --- LVM2.orig/lib/device/device.c +++ LVM2/lib/device/device.c @@ -13,6 +13,7 @@ * Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ +#include <libgen.h> /* dirname, basename */ #include "lib.h" #include "lvm-types.h" #include "device.h" @@ -288,22 +289,63 @@ int _get_partition_type(struct dev_mgr * static int _primary_dev(const char *sysfs_dir, struct device *dev, dev_t *result) { - char path[PATH_MAX+1]; + char path[PATH_MAX+1], temp_path[PATH_MAX+1], buffer[64]; struct stat info; + FILE *fp; + int pri_maj, pri_min, ret = 0; /* check if dev is a partition */ if (dm_snprintf(path, PATH_MAX, "%s/dev/block/%d:%d/partition", sysfs_dir, MAJOR(dev->dev), MINOR(dev->dev)) < 0) { log_error("dm_snprintf partition failed"); - return 0; + return ret; } - if (stat(path, &info) < 0) - 
return 0; + return ret; - *result = dev->dev - - (MINOR(dev->dev) % max_partitions(MAJOR(dev->dev))); - return 1; + /* + * extract parent's path from the partition's symlink, e.g.: + * - readlink /sys/dev/block/259:0 = ../../block/md0/md0p1 + * - dirname ../../block/md0/md0p1 = ../../block/md0 + * - basename ../../block/md0/md0 = md0 + * Parent's 'dev' sysfs attribute = /sys/block/md0/dev + */ + if (readlink(dirname(path), temp_path, PATH_MAX) < 0) { + log_sys_error("readlink", path); + return ret; + } + if (dm_snprintf(path, PATH_MAX, "%s/block/%s/dev", + sysfs_dir, basename(dirname(temp_path))) < 0) { + log_error("dm_snprintf dev failed"); + return ret; + } + + /* finally, parse 'dev' attribute and create corresponding dev_t */ + if (stat(path, &info) < 0) { + log_error("sysfs file %s does not exist", path); + return ret; + } + if (!(fp = fopen(path, "r"))) { + log_sys_error("fopen", path); + return ret; + } + if (!fgets(buffer, sizeof(buffer), fp)) { + log_sys_error("fgets", path); + goto out; + } + if (sscanf(buffer, "%d:%d", &pri_maj, &pri_min) != 2) { + log_error("sysfs file %s not in expected MAJ:MIN format: %s", + path, buffer); + goto out; + } + *result = makedev(pri_maj, pri_min); + ret = 1; + +out: + if (fclose(fp)) + log_sys_error("fclose", path); + + return ret; } static unsigned long _dev_topology_attribute(const char *attribute, ^ permalink raw reply [flat|nested] 16+ messages in thread
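The readlink/dirname/basename steps described in the comment above can be sketched in isolation. This is a minimal sketch, not the patch's code: parent_name_from_link() is a hypothetical helper that starts from an already NUL-terminated symlink target (readlink() itself does not append a terminator, so a caller would have to add one), and it works on a private copy because dirname() and basename() from <libgen.h> are allowed to modify their argument.

```c
#include <libgen.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given the target of a partition's
 * /sys/dev/block/MAJ:MIN symlink (e.g. "../../block/md0/md0p1"),
 * copy the name of the parent device ("md0") into out.
 * Returns 0 on success, -1 if a buffer is too small.
 */
static int parent_name_from_link(const char *link_target,
                                 char *out, size_t out_len)
{
    char tmp[4096];
    const char *parent;
    int n;

    if (strlen(link_target) >= sizeof(tmp))
        return -1;
    strcpy(tmp, link_target); /* dirname()/basename() may modify tmp */

    /* strip the partition component, then keep the last component */
    parent = basename(dirname(tmp));

    n = snprintf(out, out_len, "%s", parent);
    return (n < 0 || (size_t) n >= out_len) ? -1 : 0;
}
```

The resulting name would then be used to build the parent's "dev" attribute path (for example /sys/block/md0/dev), from which the MAJ:MIN pair is parsed.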
* [PATCH 5/6] Retrieve MD sysfs attributes for MD partitions 2009-07-13 20:11 [PATCH 0/6 v5] LVM2 topology support Mike Snitzer ` (3 preceding siblings ...) 2009-07-13 20:11 ` [PATCH 4/6] Improve ability to lookup primary device associated with partition Mike Snitzer @ 2009-07-13 20:11 ` Mike Snitzer 2009-07-13 20:11 ` [PATCH 6/6] Add MD, partition and topology tests to the LVM2 test-suite Mike Snitzer 5 siblings, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2009-07-13 20:11 UTC (permalink / raw) To: lvm-devel Rename private _primary_dev() to a public primary_dev() and reuse it to allow retrieval of the MD sysfs attributes (raid level, etc) for MD partitions. Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- lib/device/dev-md.c | 13 +++++++++---- lib/device/device.c | 12 +++++++++--- lib/device/device.h | 3 +++ 3 files changed, 21 insertions(+), 7 deletions(-) Index: LVM2/lib/device/dev-md.c =================================================================== --- LVM2.orig/lib/device/dev-md.c +++ LVM2/lib/device/dev-md.c @@ -131,16 +131,21 @@ static int _md_sysfs_attribute_snprintf( const char *attribute) { struct stat info; + dev_t _dev = dev->dev; int ret = -1; - if (MAJOR(dev->dev) != md_major()) + if (!sysfs_dir || !*sysfs_dir) return ret; - if (!sysfs_dir || !*sysfs_dir) +check_md_major: + if (MAJOR(_dev) != md_major()) { + if (primary_dev(sysfs_dir, dev, &_dev)) + goto check_md_major; return ret; + } ret = dm_snprintf(path, size, "%s/dev/block/%d:%d/md/%s", - sysfs_dir, MAJOR(dev->dev), MINOR(dev->dev), attribute); + sysfs_dir, MAJOR(_dev), MINOR(_dev), attribute); if (ret < 0) { log_error("dm_snprintf md %s failed", attribute); return ret; @@ -149,7 +154,7 @@ static int _md_sysfs_attribute_snprintf( if (stat(path, &info) < 0) { /* old sysfs structure */ ret = dm_snprintf(path, size, "%s/block/md%d/md/%s", - sysfs_dir, MINOR(dev->dev), attribute); + sysfs_dir, MINOR(_dev), attribute); if (ret < 0) { log_error("dm_snprintf old md %s failed", 
attribute); return ret; Index: LVM2/lib/device/device.c =================================================================== --- LVM2.orig/lib/device/device.c +++ LVM2/lib/device/device.c @@ -286,8 +286,8 @@ int _get_partition_type(struct dev_mgr * #ifdef linux -static int _primary_dev(const char *sysfs_dir, - struct device *dev, dev_t *result) +int primary_dev(const char *sysfs_dir, + struct device *dev, dev_t *result) { char path[PATH_MAX+1], temp_path[PATH_MAX+1], buffer[64]; struct stat info; @@ -378,7 +378,7 @@ static unsigned long _dev_topology_attri * or the device could be a partition */ if (stat(path, &info) < 0) { - if (!_primary_dev(sysfs_dir, dev, &primary)) + if (!primary_dev(sysfs_dir, dev, &primary)) return 0; /* get attribute from partition's primary device */ @@ -441,6 +441,12 @@ unsigned long dev_optimal_io_size(const #else +int primary_dev(const char *sysfs_dir, + struct device *dev, dev_t *result) +{ + return 0; +} + unsigned long dev_alignment_offset(const char *sysfs_dir, struct device *dev) { Index: LVM2/lib/device/device.h =================================================================== --- LVM2.orig/lib/device/device.h +++ LVM2/lib/device/device.h @@ -100,6 +100,9 @@ unsigned long dev_md_stripe_width(const int is_partitioned_dev(struct device *dev); +int primary_dev(const char *sysfs_dir, + struct device *dev, dev_t *result); + unsigned long dev_alignment_offset(const char *sysfs_dir, struct device *dev); ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 6/6] Add MD, partition and topology tests to the LVM2 test-suite 2009-07-13 20:11 [PATCH 0/6 v5] LVM2 topology support Mike Snitzer ` (4 preceding siblings ...) 2009-07-13 20:11 ` [PATCH 5/6] Retrieve MD sysfs attributes for MD partitions Mike Snitzer @ 2009-07-13 20:11 ` Mike Snitzer 5 siblings, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2009-07-13 20:11 UTC (permalink / raw) To: lvm-devel Add MD, partition, and topology tests to the LVM2 test-suite. Added MD devices to the filter in test-utils.sh:prepare_lvmconf() Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- test/t-pvcreate-operation-md.sh | 85 ++++++++++++++++++++++++++++++++++++++++ test/t-pvcreate-usage.sh | 12 +++++ test/test-utils.sh | 2 3 files changed, 98 insertions(+), 1 deletion(-) Index: LVM2/test/t-pvcreate-operation-md.sh =================================================================== --- /dev/null +++ LVM2/test/t-pvcreate-operation-md.sh @@ -0,0 +1,85 @@ +# Copyright (C) 2009 Red Hat, Inc. All rights reserved. +# +# This copyrighted material is made available to anyone wishing to use, +# modify, copy, or redistribute it subject to the terms and conditions +# of the GNU General Public License v.2. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software Foundation, +# Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + +# skip this test if mdadm or sfdisk aren't available +which mdadm || exit 200 +which sfdisk || exit 200 + +. 
./test-utils.sh + +aux prepare_devs 2 + +# Have MD use a non-standard name to avoid colliding with an existing MD device +mddev=/dev/md_lvm_test0 +mddev_p=${mddev}p1 + +cleanup_md() { + mdadm --stop $mddev + rm -f $mddev +} + +# create 2 disk MD raid0 array (stripe_width=128K) +[ -b "$mddev" ] && exit 200 +mdadm --create $mddev --auto=md --level 0 --raid-devices=2 --chunk 64 $dev1 $dev2 +trap 'aux cleanup_md' EXIT # cleanup this MD device at the end of the test + +# Test alignment of PV on MD without any MD-aware or topology-aware detection +# - should treat $mddev just like any other block device +pv_align="192.00K" +pvcreate --metadatasize 128k \ + --config 'devices {md_chunk_alignment=0 data_alignment_detection=0 data_alignment_offset_detection=0}' \ + $mddev +check_pv_field_ $mddev pe_start $pv_align + +# Test md_chunk_alignment independent of topology-aware detection +pv_align="256.00K" +pvcreate --metadatasize 128k \ + --config 'devices {data_alignment_detection=0 data_alignment_offset_detection=0}' \ + $mddev +check_pv_field_ $mddev pe_start $pv_align + +# Test newer topology-aware alignment detection +pv_align="256.00K" +pvcreate --metadatasize 128k \ + --config 'devices { md_chunk_alignment=0 }' $mddev +check_pv_field_ $mddev pe_start $pv_align + +# partition MD array directly, depends on blkext in Linux >= 2.6.28 +linux_minor=$(echo `uname -r` | cut -d'.' 
-f3 | cut -d'-' -f1) +if [ $linux_minor -gt 27 ]; then + sfdisk $mddev <<EOF +,,83 +EOF + # make sure partition on MD is _not_ removed + # - tests partition -> parent lookup via sysfs paths + not pvcreate --metadatasize 128k $mddev + + # verify alignment_offset padding is accounted for in pe_start + # - topology infrastructure is available in Linux >= 2.6.31 + # - also tests partition -> parent lookup via sysfs paths + base_mddev=`basename $mddev` + base_mddev_p=`basename $mddev_p` + sysfs_alignment_offset=/sys/block/${base_mddev}/${base_mddev_p}/alignment_offset + [ -f "$sysfs_alignment_offset" ] && \ + alignment_offset=`cat $sysfs_alignment_offset` || \ + alignment_offset=0 + + if [ "$alignment_offset" = "512" ]; then + pv_align="256.50K" + pvcreate --metadatasize 128k $mddev_p + check_pv_field_ $mddev_p pe_start $pv_align + pvremove $mddev_p + elif [ "$alignment_offset" = "2048" ]; then + pv_align="258.00K" + pvcreate --metadatasize 128k $mddev_p + check_pv_field_ $mddev_p pe_start $pv_align + pvremove $mddev_p + fi +fi Index: LVM2/test/t-pvcreate-usage.sh =================================================================== --- LVM2.orig/test/t-pvcreate-usage.sh +++ LVM2/test/t-pvcreate-usage.sh @@ -116,6 +116,18 @@ check_pv_field_ $dev1 pe_start $pv_align pvcreate --metadatasize 128k --metadatacopies 2 --dataalignment 3.5k $dev1 check_pv_field_ $dev1 pe_start $pv_align +# data area is aligned to 64k by default, +# data area has additional padding before its start +pv_align="195.50K" +pvcreate --metadatasize 128k --dataalignmentoffset 7s $dev1 +check_pv_field_ $dev1 pe_start $pv_align + +# 2nd metadata area is created without problems when +# additional data area padding is used +pvcreate --metadatasize 128k --metadatacopies 2 --dataalignmentoffset 7s $dev1 +check_pv_field_ $dev1 pv_mda_count 2 +# FIXME: compare start of 2nd mda with and without --dataalignmentoffset + #COMM 'pv with LVM1 compatible data alignment can be convereted' #compatible == LVM1_PE_ALIGN 
== 64k pvcreate --dataalignment 256k $dev1 Index: LVM2/test/test-utils.sh =================================================================== --- LVM2.orig/test/test-utils.sh +++ LVM2/test/test-utils.sh @@ -182,7 +182,7 @@ prepare_lvmconf() { devices { dir = "$G_dev_" scan = "$G_dev_" - filter = [ "a/dev\/mirror/", "a/dev\/mapper\/.*pv[0-9_]*$/", "r/.*/" ] + filter = [ "a|/dev/md.*|", "a/dev\/mirror/", "a/dev\/mapper\/.*pv[0-9_]*$/", "r/.*/" ] cache_dir = "$G_root_/etc" sysfs_scan = 0 }
* [PATCH 0/6 v6] LVM2 topology support @ 2009-07-23 2:53 Mike Snitzer 2009-07-23 2:53 ` [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf Mike Snitzer 0 siblings, 1 reply; 16+ messages in thread From: Mike Snitzer @ 2009-07-23 2:53 UTC (permalink / raw) To: lvm-devel LVM2 topology support to properly align the PV's pe_start using the associated 'alignment_offset' exposed in sysfs. Provides manual configuration with pvcreate --dataalignmentoffset and auto config control via 'devices/data_alignment_offset_detection' in lvm.conf. Documentation was added to the lvm.conf and pvcreate man pages. v2 changes: Also added another method for auto detection of --dataalignment using the topology information that is exposed in sysfs. Controlled via 'devices/data_alignment_detection' in lvm.conf. v3 changes: Fixed 'devices/data_alignment_detection' patch (3/6) to retrieve the minimum_io_size and optimal_io_size for partitions from the parent device's 'queue/'. Both attributes are found in 'queue/' v4 changes: Simplified the _mda_setup() change in --dataalignmentoffset patch (1/6). Improved the documentation and comments in all patches. v5 changes: Improved LVM2's support for partitions (in the MD and topology code paths). Added MD, partitions, and topology tests to the testsuite. v6 changes: Added pe_align_offset to 'struct physical_volume'; Added set_pe_align_offset(). After pe_start is initialized pe_align_offset is added to it. Mike Snitzer (6): Add --dataalignmentoffset to pvcreate to pad aligned data area Add devices/data_alignment_offset_detection to lvm.conf. Add devices/data_alignment_detection to lvm.conf. 
Improve ability to lookup primary device associated with partition Retrieve MD sysfs attributes for MD partitions Add MD, partition and topology tests to the LVM2 test-suite WHATS_NEW | 3 + doc/example.conf | 21 ++++- lib/config/defaults.h | 2 + lib/device/dev-md.c | 13 ++- lib/device/device.c | 193 ++++++++++++++++++++++++++++++++++++++ lib/device/device.h | 12 +++ lib/format1/format1.c | 1 + lib/format_pool/format_pool.c | 1 + lib/format_text/archiver.c | 2 +- lib/format_text/format-text.c | 15 +++- lib/metadata/metadata-exported.h | 2 + lib/metadata/metadata.c | 57 +++++++++++- lib/metadata/metadata.h | 3 + man/lvm.conf.5.in | 22 ++++- man/pvcreate.8.in | 20 ++++- test/t-pvcreate-operation-md.sh | 85 +++++++++++++++++ test/t-pvcreate-usage.sh | 12 +++ test/test-utils.sh | 2 +- tools/args.h | 1 + tools/commands.h | 7 +- tools/pvcreate.c | 23 ++++- tools/vgconvert.c | 2 +- 22 files changed, 476 insertions(+), 23 deletions(-) create mode 100644 test/t-pvcreate-operation-md.sh ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf. 2009-07-23 2:53 [PATCH 0/6 v6] LVM2 topology support Mike Snitzer @ 2009-07-23 2:53 ` Mike Snitzer 0 siblings, 0 replies; 16+ messages in thread From: Mike Snitzer @ 2009-07-23 2:53 UTC (permalink / raw) To: lvm-devel If the pvcreate --dataalignmentoffset option is not specified the start of a PV's aligned data area will be padded with the associated 'alignment_offset' exposed in sysfs (unless devices/data_alignment_offset_detection is disabled in lvm.conf). Signed-off-by: Mike Snitzer <snitzer@redhat.com> --- WHATS_NEW | 1 + doc/example.conf | 9 +++++ lib/config/defaults.h | 1 + lib/device/device.c | 71 +++++++++++++++++++++++++++++++++++++++++ lib/device/device.h | 3 ++ lib/format_text/format-text.c | 8 +++- lib/metadata/metadata.c | 8 +++++ man/lvm.conf.5.in | 9 +++++- 8 files changed, 107 insertions(+), 3 deletions(-) diff --git a/WHATS_NEW b/WHATS_NEW index 92fed6a..8b8a749 100644 --- a/WHATS_NEW +++ b/WHATS_NEW @@ -1,5 +1,6 @@ Version 2.02.50 - ================================ + Add devices/data_alignment_offset_detection to lvm.conf. Add --dataalignmentoffset to pvcreate to pad aligned data area. Add liblvm2app Makefile installation targets. Add liblvm pkgconfig file. diff --git a/doc/example.conf b/doc/example.conf index 0682415..d52ab98 100644 --- a/doc/example.conf +++ b/doc/example.conf @@ -104,6 +104,15 @@ devices { # Set to 0 for the default alignment of 64KB or page size, if larger. data_alignment = 0 + # By default, the start of a PV's aligned data area will be padded with + # the 'alignment_offset' exposed in sysfs. This offset is often 0 but + # may be non-zero; e.g.: certain 4KB sector drives that compensate for + # windows partitioning will have an alignment_offset of 3584 bytes + # (sector 7 is the lowest aligned logical block, the 4KB sectors start + # at LBA -1, and consequently sector 63 is aligned on a 4KB boundary). + # 1 enables; 0 disables. 
+    data_alignment_offset_detection = 1
+
     # If, while scanning the system for PVs, LVM2 encounters a device-mapper
     # device that has its I/O suspended, it waits for it to become accessible.
     # Set this to 1 to skip such devices. This should only be needed
diff --git a/lib/config/defaults.h b/lib/config/defaults.h
index ee6d758..68a862e 100644
--- a/lib/config/defaults.h
+++ b/lib/config/defaults.h
@@ -34,6 +34,7 @@
 #define DEFAULT_MD_COMPONENT_DETECTION 1
 #define DEFAULT_MD_CHUNK_ALIGNMENT 1
 #define DEFAULT_IGNORE_SUSPENDED_DEVICES 1
+#define DEFAULT_DATA_ALIGNMENT_OFFSET_DETECTION 1

 #define DEFAULT_LOCK_DIR "/var/lock/lvm"
 #define DEFAULT_LOCKING_LIB "liblvm2clusterlock.so"
diff --git a/lib/device/device.c b/lib/device/device.c
index 3248dd6..ebd8141 100644
--- a/lib/device/device.c
+++ b/lib/device/device.c
@@ -282,3 +282,74 @@ int _get_partition_type(struct dev_mgr *dm, struct device *d)
 	return 0;
 }
 #endif
+
+#ifdef linux
+
+static unsigned long _dev_topology_attribute(const char *attribute,
+					     const char *sysfs_dir,
+					     struct device *dev)
+{
+	char path[PATH_MAX+1], buffer[64];
+	FILE *fp;
+	struct stat info;
+	unsigned long result = 0UL;
+
+	if (!attribute || !*attribute)
+		return_0;
+
+	if (!sysfs_dir || !*sysfs_dir)
+		return_0;
+
+	if (dm_snprintf(path, PATH_MAX, "%s/dev/block/%d:%d/%s",
+			sysfs_dir, (int)MAJOR(dev->dev), (int)MINOR(dev->dev),
+			attribute) < 0) {
+		log_error("dm_snprintf %s failed", attribute);
+		return 0;
+	}
+
+	/* check if the desired sysfs attribute exists */
+	if (stat(path, &info) < 0)
+		return 0;
+
+	if (!(fp = fopen(path, "r"))) {
+		log_sys_error("fopen", path);
+		return 0;
+	}
+
+	if (!fgets(buffer, sizeof(buffer), fp)) {
+		log_sys_error("fgets", path);
+		goto out;
+	}
+
+	if (sscanf(buffer, "%lu", &result) != 1) {
+		log_error("sysfs file %s not in expected format: %s", path,
+			  buffer);
+		goto out;
+	}
+
+	log_very_verbose("Device %s %s is %lu bytes.",
+			 dev_name(dev), attribute, result);
+
+out:
+	if (fclose(fp))
+		log_sys_error("fclose", path);
+
+	return result >> SECTOR_SHIFT;
+}
+
+unsigned long dev_alignment_offset(const char *sysfs_dir,
+				   struct device *dev)
+{
+	return _dev_topology_attribute("alignment_offset",
+				       sysfs_dir, dev);
+}
+
+#else
+
+unsigned long dev_alignment_offset(const char *sysfs_dir,
+				   struct device *dev)
+{
+	return 0UL;
+}
+
+#endif
diff --git a/lib/device/device.h b/lib/device/device.h
index b016823..32aee41 100644
--- a/lib/device/device.h
+++ b/lib/device/device.h
@@ -100,4 +100,7 @@ unsigned long dev_md_stripe_width(const char *sysfs_dir, struct device *dev);

 int is_partitioned_dev(struct device *dev);

+unsigned long dev_alignment_offset(const char *sysfs_dir,
+				   struct device *dev);
+
 #endif
diff --git a/lib/format_text/format-text.c b/lib/format_text/format-text.c
index 32af70b..1d2916c 100644
--- a/lib/format_text/format-text.c
+++ b/lib/format_text/format-text.c
@@ -1705,8 +1705,12 @@ static int _text_pv_setup(const struct format_type *fmt,
 				 "%lu sectors (requested %lu sectors)",
 				 pv_dev_name(pv), pv->pe_align, data_alignment);

-		if (!pe_start)
-			set_pe_align_offset(pv, data_alignment_offset);
+		if (!pe_start &&
+		    (set_pe_align_offset(pv, data_alignment_offset) != data_alignment_offset) &&
+		    data_alignment_offset)
+			log_warn("WARNING: %s: Overriding data alignment offset to "
+				 "%lu sectors (requested %lu sectors)",
+				 pv_dev_name(pv), pv->pe_align_offset, data_alignment_offset);

 		if (pv->pe_start < pv->pe_align) {
 			pv->pe_start = pv->pe_align;
diff --git a/lib/metadata/metadata.c b/lib/metadata/metadata.c
index 0dbb7bc..6ba8857 100644
--- a/lib/metadata/metadata.c
+++ b/lib/metadata/metadata.c
@@ -115,6 +115,14 @@ unsigned long set_pe_align_offset(struct physical_volume *pv,
 	if (!pv->dev)
 		goto out;

+	if (find_config_tree_bool(pv->fmt->cmd,
+				  "devices/data_alignment_offset_detection",
+				  DEFAULT_DATA_ALIGNMENT_OFFSET_DETECTION))
+		pv->pe_align_offset =
+			MAX(pv->pe_align_offset,
+			    dev_alignment_offset(pv->fmt->cmd->sysfs_dir,
+						 pv->dev));
+
 	log_very_verbose("%s: Setting PE alignment offset to %lu sectors.",
 			 dev_name(pv->dev), pv->pe_align_offset);

diff --git a/man/lvm.conf.5.in b/man/lvm.conf.5.in
index 0e3a949..0e7ede4 100644
--- a/man/lvm.conf.5.in
+++ b/man/lvm.conf.5.in
@@ -142,10 +142,17 @@ when creating a new Physical Volume using the \fBlvm2\fP format.
 If a Physical Volume is placed directly upon an md device and
 \fBmd_chunk_alignment\fP is enabled this parameter is ignored.
 Set to 0 to use the default alignment of 64KB or the page size, if larger.
+.IP
+\fBdata_alignment_offset_detection\fP \(em If set to 1, and your kernel
+provides topology information in sysfs for the Physical Volume, the
+start of the aligned data area of the Physical Volume will be padded
+with the alignment_offset exposed in sysfs.
 .sp
 To see the location of the first Physical Extent of an existing Physical Volume
 use \fBpvs -o +pe_start\fP . It will be a multiple of the requested
-\fBdata_alignment\fP.
+\fBdata_alignment\fP plus the alignment_offset from
+\fBdata_alignment_offset_detection\fP (if enabled) or the pvcreate
+commandline.
 .TP
 \fBlog\fP \(em Default log settings
 .IP
--
1.6.2.5
^ permalink raw reply related [flat|nested] 16+ messages in thread
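Editorial sketch, not part of the patch: the helper above reads a byte count from sysfs and returns it in sectors via `result >> SECTOR_SHIFT`. The standalone function below illustrates only that parse-and-convert step. The name `parse_topology_sectors` is hypothetical, and `SECTOR_SHIFT` is assumed to be 9 (512-byte sectors), which this hunk does not itself show.

```c
#include <stdio.h>

/* Assumption: 512-byte sectors, i.e. a shift of 9. */
#define SECTOR_SHIFT 9

/*
 * Hypothetical helper: parse the text of a sysfs topology attribute
 * (a decimal byte count such as "3584\n") and convert it to sectors.
 * Returns 1 on success, 0 if the buffer is not in the expected format.
 */
static int parse_topology_sectors(const char *buffer, unsigned long *sectors)
{
	unsigned long bytes = 0UL;

	if (!buffer || !sectors)
		return 0;

	if (sscanf(buffer, "%lu", &bytes) != 1)
		return 0;

	/* Values smaller than one sector truncate to 0 here. */
	*sectors = bytes >> SECTOR_SHIFT;
	return 1;
}
```

With the 3584-byte alignment_offset used as the example in this patch's example.conf comment, the conversion gives 3584 / 512 = 7 sectors.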
* [PATCH 0/6 v7] LVM2 topology support
@ 2009-07-25 22:14 Mike Snitzer
2009-07-25 22:14 ` [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf Mike Snitzer
0 siblings, 1 reply; 16+ messages in thread
From: Mike Snitzer @ 2009-07-25 22:14 UTC (permalink / raw)
To: lvm-devel

LVM2 topology support to properly align the PV's pe_start using the
associated 'alignment_offset' exposed in sysfs.

Provides manual configuration with pvcreate --dataalignmentoffset and
auto config control via 'devices/data_alignment_offset_detection' in
lvm.conf.

Documentation was added to the lvm.conf and pvcreate man pages.

v2 changes:
Also added another method for auto detection of --dataalignment using
the topology information that is exposed in sysfs. Controlled via
'devices/data_alignment_detection' in lvm.conf.

v3 changes:
Fixed 'devices/data_alignment_detection' patch (3/6) to retrieve the
minimum_io_size and optimal_io_size for partitions from the parent
device's 'queue/'. Both attributes are found in 'queue/'.

v4 changes:
Simplified the _mda_setup() change in --dataalignmentoffset patch (1/6).
Improved the documentation and comments in all patches.

v5 changes:
Improved LVM2's support for partitions (in the MD and topology code paths).
Added MD, partitions, and topology tests to the testsuite.

v6 changes:
Added pe_align_offset to 'struct physical_volume'.
Added set_pe_align_offset().
After pe_start is initialized, pe_align_offset is added to it.

v7 changes:
Revised pvcreate manpage to document --dataalignmentoffset more clearly.
Added negative check that makes sure the PV's data area is not beyond
the end of the device.
^ permalink raw reply [flat|nested] 16+ messages in thread
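Editorial sketch, not taken from the patches: the v6 and v7 notes describe adding pe_align_offset to pe_start once it has been initialized, and rejecting a data area that starts beyond the end of the device. The function below shows one way that arithmetic could look. The name `compute_pe_start`, its parameters, and the round-up step are assumptions for illustration; the real logic lives in the LVM2 format-text code and may differ.

```c
#include <stdint.h>

/*
 * Hypothetical helper (all values in 512-byte sectors):
 *  - round the end of the metadata area up to a multiple of pe_align,
 *  - pad the result with pe_align_offset,
 *  - refuse a start that lies at or beyond the end of the device.
 * Returns 1 and stores the result in *pe_start on success, 0 otherwise.
 */
static int compute_pe_start(uint64_t mda_end, uint64_t pe_align,
			    uint64_t pe_align_offset, uint64_t dev_size,
			    uint64_t *pe_start)
{
	uint64_t start = mda_end;

	if (pe_align) {
		uint64_t rem = start % pe_align;

		if (rem)
			start += pe_align - rem;
	}

	start += pe_align_offset;

	/* Negative check: the data area must begin inside the device. */
	if (start >= dev_size)
		return 0;

	*pe_start = start;
	return 1;
}
```

For example, with a metadata area ending at sector 8, a 128-sector (64KB) alignment and a 7-sector offset, the data area would start at sector 128 + 7 = 135, which is accepted on a 2048-sector device and rejected on a 100-sector one.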
* [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf.
2009-07-25 22:14 [PATCH 0/6 v7] LVM2 topology support Mike Snitzer
@ 2009-07-25 22:14 ` Mike Snitzer
0 siblings, 0 replies; 16+ messages in thread
From: Mike Snitzer @ 2009-07-25 22:14 UTC (permalink / raw)
To: lvm-devel

If the pvcreate --dataalignmentoffset option is not specified the start
of a PV's aligned data area will be padded with the associated
'alignment_offset' exposed in sysfs (unless
devices/data_alignment_offset_detection is disabled in lvm.conf).

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
---
 WHATS_NEW                     |    1
 doc/example.conf              |    9 +++++
 lib/config/defaults.h         |    1
 lib/device/device.c           |   71 ++++++++++++++++++++++++++++++++++++++++++
 lib/device/device.h           |    3 +
 lib/format_text/format-text.c |    8 +++-
 lib/metadata/metadata.c       |    8 ++++
 man/lvm.conf.5.in             |    9 ++++-
 8 files changed, 107 insertions(+), 3 deletions(-)

Index: LVM2/WHATS_NEW
===================================================================
--- LVM2.orig/WHATS_NEW
+++ LVM2/WHATS_NEW
@@ -1,5 +1,6 @@
 Version 2.02.50 -
 ================================
+  Add devices/data_alignment_offset_detection to lvm.conf.
   Add --dataalignmentoffset to pvcreate to shift start of aligned data area.
   Add global/wait_for_locks to lvm.conf so blocking for locks can be disabled.
   All LV locks are non-blocking so remove LCK_NONBLOCK from separate macros.
Index: LVM2/doc/example.conf
===================================================================
--- LVM2.orig/doc/example.conf
+++ LVM2/doc/example.conf
@@ -104,6 +104,15 @@ devices {
     # Set to 0 for the default alignment of 64KB or page size, if larger.
     data_alignment = 0

+    # By default, the start of a PV's aligned data area will be padded with
+    # the 'alignment_offset' exposed in sysfs. This offset is often 0 but
+    # may be non-zero; e.g.: certain 4KB sector drives that compensate for
+    # windows partitioning will have an alignment_offset of 3584 bytes
+    # (sector 7 is the lowest aligned logical block, the 4KB sectors start
+    # at LBA -1, and consequently sector 63 is aligned on a 4KB boundary).
+    # 1 enables; 0 disables.
+    data_alignment_offset_detection = 1
+
     # If, while scanning the system for PVs, LVM2 encounters a device-mapper
     # device that has its I/O suspended, it waits for it to become accessible.
     # Set this to 1 to skip such devices. This should only be needed
Index: LVM2/lib/config/defaults.h
===================================================================
--- LVM2.orig/lib/config/defaults.h
+++ LVM2/lib/config/defaults.h
@@ -34,6 +34,7 @@
 #define DEFAULT_MD_COMPONENT_DETECTION 1
 #define DEFAULT_MD_CHUNK_ALIGNMENT 1
 #define DEFAULT_IGNORE_SUSPENDED_DEVICES 1
+#define DEFAULT_DATA_ALIGNMENT_OFFSET_DETECTION 1

 #define DEFAULT_LOCK_DIR "/var/lock/lvm"
 #define DEFAULT_LOCKING_LIB "liblvm2clusterlock.so"
Index: LVM2/lib/device/device.c
===================================================================
--- LVM2.orig/lib/device/device.c
+++ LVM2/lib/device/device.c
@@ -282,3 +282,74 @@ int _get_partition_type(struct dev_mgr *
 	return 0;
 }
 #endif
+
+#ifdef linux
+
+static unsigned long _dev_topology_attribute(const char *attribute,
+					     const char *sysfs_dir,
+					     struct device *dev)
+{
+	char path[PATH_MAX+1], buffer[64];
+	FILE *fp;
+	struct stat info;
+	unsigned long result = 0UL;
+
+	if (!attribute || !*attribute)
+		return_0;
+
+	if (!sysfs_dir || !*sysfs_dir)
+		return_0;
+
+	if (dm_snprintf(path, PATH_MAX, "%s/dev/block/%d:%d/%s",
+			sysfs_dir, (int)MAJOR(dev->dev), (int)MINOR(dev->dev),
+			attribute) < 0) {
+		log_error("dm_snprintf %s failed", attribute);
+		return 0;
+	}
+
+	/* check if the desired sysfs attribute exists */
+	if (stat(path, &info) < 0)
+		return 0;
+
+	if (!(fp = fopen(path, "r"))) {
+		log_sys_error("fopen", path);
+		return 0;
+	}
+
+	if (!fgets(buffer, sizeof(buffer), fp)) {
+		log_sys_error("fgets", path);
+		goto out;
+	}
+
+	if (sscanf(buffer, "%lu", &result) != 1) {
+		log_error("sysfs file %s not in expected format: %s", path,
+			  buffer);
+		goto out;
+	}
+
+	log_very_verbose("Device %s %s is %lu bytes.",
+			 dev_name(dev), attribute, result);
+
+out:
+	if (fclose(fp))
+		log_sys_error("fclose", path);
+
+	return result >> SECTOR_SHIFT;
+}
+
+unsigned long dev_alignment_offset(const char *sysfs_dir,
+				   struct device *dev)
+{
+	return _dev_topology_attribute("alignment_offset",
+				       sysfs_dir, dev);
+}
+
+#else
+
+unsigned long dev_alignment_offset(const char *sysfs_dir,
+				   struct device *dev)
+{
+	return 0UL;
+}
+
+#endif
Index: LVM2/lib/device/device.h
===================================================================
--- LVM2.orig/lib/device/device.h
+++ LVM2/lib/device/device.h
@@ -100,4 +100,7 @@ unsigned long dev_md_stripe_width(const

 int is_partitioned_dev(struct device *dev);

+unsigned long dev_alignment_offset(const char *sysfs_dir,
+				   struct device *dev);
+
 #endif
Index: LVM2/lib/format_text/format-text.c
===================================================================
--- LVM2.orig/lib/format_text/format-text.c
+++ LVM2/lib/format_text/format-text.c
@@ -1711,8 +1711,12 @@ static int _text_pv_setup(const struct f
 				 "%lu sectors (requested %lu sectors)",
 				 pv_dev_name(pv), pv->pe_align, data_alignment);

-		if (!pe_start)
-			set_pe_align_offset(pv, data_alignment_offset);
+		if (!pe_start &&
+		    (set_pe_align_offset(pv, data_alignment_offset) != data_alignment_offset) &&
+		    data_alignment_offset)
+			log_warn("WARNING: %s: Overriding data alignment offset to "
+				 "%lu sectors (requested %lu sectors)",
+				 pv_dev_name(pv), pv->pe_align_offset, data_alignment_offset);

 		if (pv->pe_start < pv->pe_align) {
 			pv->pe_start = pv->pe_align;
Index: LVM2/lib/metadata/metadata.c
===================================================================
--- LVM2.orig/lib/metadata/metadata.c
+++ LVM2/lib/metadata/metadata.c
@@ -115,6 +115,14 @@ unsigned long set_pe_align_offset(struct
 	if (!pv->dev)
 		goto out;

+	if (find_config_tree_bool(pv->fmt->cmd,
+				  "devices/data_alignment_offset_detection",
+				  DEFAULT_DATA_ALIGNMENT_OFFSET_DETECTION))
+		pv->pe_align_offset =
+			MAX(pv->pe_align_offset,
+			    dev_alignment_offset(pv->fmt->cmd->sysfs_dir,
+						 pv->dev));
+
 	log_very_verbose("%s: Setting PE alignment offset to %lu sectors.",
 			 dev_name(pv->dev), pv->pe_align_offset);

Index: LVM2/man/lvm.conf.5.in
===================================================================
--- LVM2.orig/man/lvm.conf.5.in
+++ LVM2/man/lvm.conf.5.in
@@ -142,10 +142,17 @@ when creating a new Physical Volume usin
 If a Physical Volume is placed directly upon an md device and
 \fBmd_chunk_alignment\fP is enabled this parameter is ignored.
 Set to 0 to use the default alignment of 64KB or the page size, if larger.
+.IP
+\fBdata_alignment_offset_detection\fP \(em If set to 1, and your kernel
+provides topology information in sysfs for the Physical Volume, the
+start of the aligned data area of the Physical Volume will be padded
+with the alignment_offset exposed in sysfs.
 .sp
 To see the location of the first Physical Extent of an existing Physical Volume
 use \fBpvs -o +pe_start\fP . It will be a multiple of the requested
-\fBdata_alignment\fP.
+\fBdata_alignment\fP plus the alignment_offset from
+\fBdata_alignment_offset_detection\fP (if enabled) or the pvcreate
+commandline.
 .TP
 \fBlog\fP \(em Default log settings
 .IP
^ permalink raw reply [flat|nested] 16+ messages in thread
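Editorial sketch, not part of the patch: the metadata.c and format-text.c hunks together imply a precedence rule. When detection is enabled the effective offset is the larger of the current pe_align_offset and the sysfs value, and a warning is logged only if the user asked for a non-zero offset and ended up with a different one. The standalone functions below restate that rule. The names `effective_align_offset` and `should_warn_override` are hypothetical, and the sketch assumes that set_pe_align_offset() first stores the requested offset in pe_align_offset, which this hunk does not show.

```c
/* Local MAX macro for the sketch; LVM2 has its own definition. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/*
 * Hypothetical restatement of the MAX() logic in set_pe_align_offset():
 * start from the requested offset (in sectors) and, if detection is
 * enabled, raise it to the detected sysfs alignment_offset when larger.
 */
static unsigned long effective_align_offset(unsigned long requested,
					    unsigned long detected,
					    int detection_enabled)
{
	unsigned long offset = requested;

	if (detection_enabled)
		offset = MAX(offset, detected);

	return offset;
}

/*
 * Hypothetical restatement of the warning condition in _text_pv_setup():
 * warn only when a non-zero offset was requested and was overridden.
 */
static int should_warn_override(unsigned long requested,
				unsigned long effective)
{
	return requested && effective != requested;
}
```

Under these assumptions a requested offset of 3 sectors on a device reporting 7 yields 7 with a warning, while a request of 15 on the same device is kept as 15 without one.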
Thread overview: 16+ messages
2009-07-13 20:11 [PATCH 0/6 v5] LVM2 topology support Mike Snitzer
2009-07-13 20:11 ` [PATCH 1/6] Add --dataalignmentoffset to pvcreate to pad aligned data area Mike Snitzer
2009-07-16 15:11 ` Alasdair G Kergon
2009-07-16 20:53 ` Mike Snitzer
2009-07-16 22:11 ` Alasdair G Kergon
2009-07-13 20:11 ` [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf Mike Snitzer
2009-07-16 18:47 ` Milan Broz
2009-07-16 21:00 ` Mike Snitzer
2009-07-16 21:15 ` Mike Snitzer
2009-07-17 7:43 ` Milan Broz
2009-07-13 20:11 ` [PATCH 3/6] Add devices/data_alignment_detection " Mike Snitzer
2009-07-13 20:11 ` [PATCH 4/6] Improve ability to lookup primary device associated with partition Mike Snitzer
2009-07-13 20:11 ` [PATCH 5/6] Retrieve MD sysfs attributes for MD partitions Mike Snitzer
2009-07-13 20:11 ` [PATCH 6/6] Add MD, partition and topology tests to the LVM2 test-suite Mike Snitzer
-- strict thread matches above, loose matches on Subject: below --
2009-07-23 2:53 [PATCH 0/6 v6] LVM2 topology support Mike Snitzer
2009-07-23 2:53 ` [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf Mike Snitzer
2009-07-25 22:14 [PATCH 0/6 v7] LVM2 topology support Mike Snitzer
2009-07-25 22:14 ` [PATCH 2/6] Add devices/data_alignment_offset_detection to lvm.conf Mike Snitzer