* [PATCH 1/3] sysfs representation of stacked devices (dm/md common)
2006-02-17 18:00 [PATCH 0/3] sysfs representation of stacked devices (dm/md) Jun'ichi Nomura
@ 2006-02-17 18:01 ` Jun'ichi Nomura
2006-02-17 18:44 ` Alasdair G Kergon
2006-02-17 18:03 ` [PATCH 2/3] sysfs representation of stacked devices (dm) Jun'ichi Nomura
` (3 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-17 18:01 UTC (permalink / raw)
To: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree
Cc: device-mapper development, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 300 bytes --]
This patch provides common functions to create symlinks in sysfs
between stacked device and its slaves.
I placed functions in fs/block_dev.c as some of them are
privately used by bd_claim().
I'm not sure if it's better to put them in other files.
--
Jun'ichi Nomura, NEC Solutions (America), Inc.
[-- Attachment #2: stacked-device-representation-in-sysfs-1-common.patch --]
[-- Type: text/x-patch, Size: 5621 bytes --]
Exporting stacked device relationship to sysfs (common functions)
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
--- linux-2.6.15.orig/include/linux/fs.h 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/include/linux/fs.h 2006-02-16 10:46:31.000000000 -0500
@@ -373,6 +373,7 @@ struct block_device {
struct list_head bd_inodes;
void * bd_holder;
int bd_holders;
+ struct kobject bd_holder_kobj;
struct block_device * bd_contains;
unsigned bd_block_size;
struct hd_struct * bd_part;
@@ -1352,6 +1353,13 @@ extern int blkdev_put(struct block_devic
extern int bd_claim(struct block_device *, void *);
extern void bd_release(struct block_device *);
+/* stacked device linking */
+extern void stackdev_init(struct kobject *, struct kobject *);
+extern void stackdev_clear(struct kobject *);
+extern void stackdev_link(struct block_device *,
+ struct kobject *, struct kobject *);
+extern void stackdev_unlink(struct block_device *,
+ struct kobject *, struct kobject *);
/* fs/char_dev.c */
extern int alloc_chrdev_region(dev_t *, unsigned, unsigned, const char *);
extern int register_chrdev_region(dev_t, unsigned, const char *);
--- linux-2.6.15.orig/fs/block_dev.c 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/fs/block_dev.c 2006-02-17 10:06:17.000000000 -0500
@@ -443,6 +443,126 @@ void bd_forget(struct inode *inode)
spin_unlock(&bdev_lock);
}
+/*
+ * kobject linking functions for stacked device
+ */
+
+static inline struct kobject * bdev_get_kobj(struct block_device *bdev)
+{
+ if (!bdev)
+ return NULL;
+ else if (bdev->bd_contains != bdev)
+ return kobject_get(&bdev->bd_part->kobj);
+ else
+ return kobject_get(&bdev->bd_disk->kobj);
+}
+
+static inline void add_symlink(struct kobject *from, struct kobject *to)
+{
+ if (!from || !to)
+ return;
+ sysfs_create_link(from, to, kobject_name(to));
+}
+
+static inline void del_symlink(struct kobject *from, struct kobject *to)
+{
+ if (!from || !to)
+ return;
+ sysfs_remove_link(from, kobject_name(to));
+}
+
+/* This is a mere directory in sysfs. No methods are needed. */
+static struct kobj_type bd_holder_ktype = {
+ .release = NULL,
+ .sysfs_ops = NULL,
+ .default_attrs = NULL,
+};
+
+/*
+ * Set the bdev as possible slave of a stacked device
+ * (called from bd_claim)
+ */
+static inline void add_holder_object(struct block_device *bdev)
+{
+ struct kobject *kobj = &bdev->bd_holder_kobj;
+
+ kobj->ktype = &bd_holder_ktype;
+ kobject_set_name(kobj, "holders");
+ kobj->parent = bdev_get_kobj(bdev);
+ kobject_register(kobj);
+ kobject_put(kobj->parent);
+}
+
+/*
+ * Declare the bdev is no longer a possible slave of a stacked device
+ * (called from bd_release)
+ */
+static inline void del_holder_object(struct block_device *bdev)
+{
+ kobject_unregister(&bdev->bd_holder_kobj);
+}
+
+/* This is a mere directory in sysfs. No methods are needed. */
+static struct kobj_type bd_slave_ktype = {
+ .release = NULL,
+ .sysfs_ops = NULL,
+ .default_attrs = NULL,
+};
+
+/* Set the bdev as stacked device */
+void stackdev_init(struct kobject *slave_dir, struct kobject *holder_dev)
+{
+ slave_dir->ktype = &bd_slave_ktype;
+ kobject_set_name(slave_dir, "slaves");
+ slave_dir->parent = holder_dev;
+
+ kobject_register(slave_dir);
+}
+EXPORT_SYMBOL(stackdev_init);
+
+/* Declare the bdev is no longer a stacked device */
+void stackdev_clear(struct kobject *slave_dir)
+{
+ kobject_unregister(slave_dir);
+}
+EXPORT_SYMBOL(stackdev_clear);
+
+/*
+ * Create symlinks between the holder and the slave of the stacked device.
+ * The slave devices need to be bd_claim()-ed beforehand.
+ */
+void stackdev_link(struct block_device *slave,
+ struct kobject *slave_dir, struct kobject *holder_dev)
+{
+ struct kobject *slave_dev;
+
+ slave_dev = bdev_get_kobj(slave);
+
+ add_symlink(slave_dir, slave_dev);
+ add_symlink(&slave->bd_holder_kobj, holder_dev);
+
+ kobject_put(slave_dev);
+}
+EXPORT_SYMBOL(stackdev_link);
+
+/*
+ * Remove symlinks between the holder and the slave of the stacked device.
+ * This should be called before bd_release() the slave device.
+ */
+void stackdev_unlink(struct block_device *slave,
+ struct kobject *slave_dir, struct kobject *holder_dev)
+{
+ struct kobject *slave_dev;
+
+ slave_dev = bdev_get_kobj(slave);
+
+ del_symlink(slave_dir, slave_dev);
+ del_symlink(&slave->bd_holder_kobj, holder_dev);
+
+ kobject_put(slave_dev);
+}
+EXPORT_SYMBOL(stackdev_unlink);
+
int bd_claim(struct block_device *bdev, void *holder)
{
int res;
@@ -450,7 +570,7 @@ int bd_claim(struct block_device *bdev,
/* first decide result */
if (bdev->bd_holder == holder)
- res = 0; /* already a holder */
+ res = 1; /* already a holder */
else if (bdev->bd_holder != NULL)
res = -EBUSY; /* held by someone else */
else if (bdev->bd_contains == bdev)
@@ -469,10 +589,14 @@ int bd_claim(struct block_device *bdev,
* will be incremented twice, and bd_holder will
* be set to bd_claim before being set to holder
*/
- bdev->bd_contains->bd_holders ++;
bdev->bd_contains->bd_holder = bd_claim;
- bdev->bd_holders++;
bdev->bd_holder = holder;
+ add_holder_object(bdev);
+ }
+ if (res >= 0) {
+ bdev->bd_contains->bd_holders ++;
+ bdev->bd_holders++;
+ res = 0;
}
spin_unlock(&bdev_lock);
return res;
@@ -485,8 +609,10 @@ void bd_release(struct block_device *bde
spin_lock(&bdev_lock);
if (!--bdev->bd_contains->bd_holders)
bdev->bd_contains->bd_holder = NULL;
- if (!--bdev->bd_holders)
+ if (!--bdev->bd_holders) {
bdev->bd_holder = NULL;
+ del_holder_object(bdev);
+ }
spin_unlock(&bdev_lock);
}
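For readers following the bd_claim()/bd_release() hunks above, here is a minimal userspace model of the accounting they appear to implement. It is only a sketch: it ignores bd_contains and locking, and `bdev_model`, `model_claim()` and `model_release()` are illustrative names, not kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace model of one block device's claim state (illustrative only). */
struct bdev_model {
    const void *holder;   /* current holder, NULL when unclaimed */
    int holders;          /* number of outstanding claims */
    int has_holders_dir;  /* stands in for the "holders" sysfs directory */
};

/* Models the patched bd_claim(): only the first claim creates the directory. */
static int model_claim(struct bdev_model *b, const void *holder)
{
    int res;

    if (b->holder == holder)
        res = 1;            /* already a holder: count it, create nothing */
    else if (b->holder != NULL)
        res = -EBUSY;       /* held by someone else */
    else
        res = 0;            /* first claim */

    if (res == 0) {
        b->holder = holder;
        b->has_holders_dir = 1;
    }
    if (res >= 0) {
        b->holders++;
        res = 0;            /* callers still see 0 on success */
    }
    return res;
}

/* Models the patched bd_release(): the last release removes the directory. */
static void model_release(struct bdev_model *b)
{
    assert(b->holders > 0);
    if (--b->holders == 0) {
        b->holder = NULL;
        b->has_holders_dir = 0;
    }
}
```

The intermediate value 1 exists only so that a repeated claim by the same holder bumps the counters without registering the "holders" directory a second time.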
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] sysfs representation of stacked devices (dm/md common)
2006-02-17 18:01 ` [PATCH 1/3] sysfs representation of stacked devices (dm/md common) Jun'ichi Nomura
@ 2006-02-17 18:44 ` Alasdair G Kergon
2006-02-18 1:03 ` Jun'ichi Nomura
0 siblings, 1 reply; 15+ messages in thread
From: Alasdair G Kergon @ 2006-02-17 18:44 UTC (permalink / raw)
To: Jun'ichi Nomura
Cc: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree,
device-mapper development, linux-kernel
Make sure you test this properly under low memory situations.
On Fri, Feb 17, 2006 at 01:01:48PM -0500, Jun'ichi Nomura wrote:
> This patch provides common functions to create symlinks in sysfs
> between stacked device and its slaves.
dm_swap_table() mustn't block waiting for memory to become
free (except in a controlled way e.g. with a mempool, but
it would need more than that here).
Here, dm_swap_table() leads to kmalloc() getting called
in sysfs_add_link().
[e.g. Consider the extreme case where the dm device
you're changing is your swap device. While dm_swap_table()
runs, no I/O will get through to your swap device.]
If you can't avoid the sysfs code allocating memory, then
you must find a way of doing it before the dm suspend or
after the dm resume.
e.g. Do the sysfs memory allocations for the links prior to
the dm suspend [which may have happened in a previous system call]
and then use a different function to move them into place during
dm_swap_table() without performing further memory allocations?
[Lazy workaround is to set PF_MEMALLOC again...]
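One way to picture the suggestion above (allocate before the suspend, then only move things into place while suspended) is the following userspace sketch. It is not dm or sysfs code; `link_rec`, `link_prepare()`, `link_commit()` and `link_free_all()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One prepared link record; a stand-in for whatever sysfs would allocate. */
struct link_rec {
    char name[32];
    struct link_rec *next;
};

/* Phase 1 (before suspend): may allocate, and may therefore fail or block. */
static struct link_rec *link_prepare(const char *name)
{
    struct link_rec *rec = calloc(1, sizeof(*rec));

    if (!rec)
        return NULL;
    snprintf(rec->name, sizeof(rec->name), "%s", name);
    return rec;
}

/* Phase 2 (while suspended): pointer updates only, no allocation. */
static void link_commit(struct link_rec **live, struct link_rec *rec)
{
    assert(live && rec);
    rec->next = *live;
    *live = rec;
}

/* Teardown (after resume): free every record on the live list. */
static void link_free_all(struct link_rec **live)
{
    while (*live) {
        struct link_rec *next = (*live)->next;

        free(*live);
        *live = next;
    }
}
```

The point of the split is that any failure or blocking happens in link_prepare(), outside the window in which I/O to the device is held up.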
Alasdair
--
agk@redhat.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] sysfs representation of stacked devices (dm/md common)
2006-02-17 18:44 ` Alasdair G Kergon
@ 2006-02-18 1:03 ` Jun'ichi Nomura
2006-02-18 19:50 ` Alasdair G Kergon
0 siblings, 1 reply; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-18 1:03 UTC (permalink / raw)
To: Alasdair G Kergon
Cc: Neil Brown, Lars Marowsky-Bree, device-mapper development,
linux-kernel
[-- Attachment #1: Type: text/plain, Size: 749 bytes --]
Hi Alasdair,
Thank you for the comments.
Alasdair G Kergon wrote:
> Make sure you test this properly under low memory situations.
OK.
> dm_swap_table() mustn't block waiting for memory to become
> free (except in a controlled way e.g. with a mempool, but
> it would need more than that here).
>
> Here, dm_swap_table() leads to kmalloc() getting called
> in sysfs_add_link().
I moved the sysfs_add_link() to the last part of dm_resume().
Directory creation and deletion are also moved into the alloc_dev()/
free_dev() functions, as kobject_register()/kobject_unregister() may
allocate memory.
I didn't change the symlink removal, as it doesn't allocate memory.
Do you think this is an acceptable approach?
--
Jun'ichi Nomura, NEC Solutions (America), Inc.
[-- Attachment #2: stacked-device-representation-in-sysfs-2-dm.patch --]
[-- Type: text/x-patch, Size: 2539 bytes --]
Exporting stacked device relationship to sysfs (dm)
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
--- linux-2.6.15.orig/drivers/md/dm.c 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/drivers/md/dm.c 2006-02-17 16:27:38.000000000 -0500
@@ -98,6 +98,12 @@ struct mapped_device {
*/
struct super_block *frozen_sb;
struct block_device *frozen_bdev;
+
+ /*
+ * sysfs deptree
+ */
+ int sysfs_linked;
+ struct kobject slave_dir;
};
#define MIN_IOS 256
@@ -791,6 +797,7 @@ static struct mapped_device *alloc_dev(u
md->disk->private_data = md;
sprintf(md->disk->disk_name, "dm-%d", minor);
add_disk(md->disk);
+ stackdev_init(&md->slave_dir, &md->disk->kobj);
atomic_set(&md->pending, 0);
init_waitqueue_head(&md->wait);
@@ -815,6 +822,7 @@ static void free_dev(struct mapped_devic
free_minor(md->disk->first_minor);
mempool_destroy(md->tio_pool);
mempool_destroy(md->io_pool);
+ stackdev_clear(&md->slave_dir);
del_gendisk(md->disk);
put_disk(md->disk);
blk_put_queue(md->queue);
@@ -841,6 +849,42 @@ static void __set_size(struct mapped_dev
up(&md->frozen_bdev->bd_inode->i_sem);
}
+/* create sysfs symlinks between mapped device and underlying devices */
+static int __link_device(struct mapped_device *md, struct dm_table *t)
+{
+ struct list_head *d, *devices;
+
+ if (md->sysfs_linked)
+ return 0;
+
+ devices = dm_table_get_devices(t);
+ for (d = devices->next; d != devices; d = d->next) {
+ struct dm_dev *dd = list_entry(d, struct dm_dev, list);
+ stackdev_link(dd->bdev, &md->slave_dir, &md->disk->kobj);
+ }
+
+ md->sysfs_linked = 1;
+ return 0;
+}
+
+/* remove sysfs symlinks between mapped device and underlying devices */
+static int __unlink_device(struct mapped_device *md, struct dm_table *t)
+{
+ struct list_head *d, *devices;
+
+ if (!md->sysfs_linked)
+ return 0;
+
+ devices = dm_table_get_devices(t);
+ for (d = devices->next; d != devices; d = d->next) {
+ struct dm_dev *dd = list_entry(d, struct dm_dev, list);
+ stackdev_unlink(dd->bdev, &md->slave_dir, &md->disk->kobj);
+ }
+
+ md->sysfs_linked = 0;
+ return 0;
+}
+
static int __bind(struct mapped_device *md, struct dm_table *t)
{
request_queue_t *q = md->queue;
@@ -873,6 +917,7 @@ static void __unbind(struct mapped_devic
write_lock(&md->map_lock);
md->map = NULL;
write_unlock(&md->map_lock);
+ __unlink_device(md, map);
dm_table_put(map);
}
@@ -1139,6 +1184,8 @@ int dm_resume(struct mapped_device *md)
dm_table_unplug_all(map);
+ __link_device(md, map);
+
r = 0;
out:
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] sysfs representation of stacked devices (dm/md common)
2006-02-18 1:03 ` Jun'ichi Nomura
@ 2006-02-18 19:50 ` Alasdair G Kergon
2006-02-21 15:33 ` Jun'ichi Nomura
0 siblings, 1 reply; 15+ messages in thread
From: Alasdair G Kergon @ 2006-02-18 19:50 UTC (permalink / raw)
To: Jun'ichi Nomura
Cc: Alasdair G Kergon, Neil Brown, Lars Marowsky-Bree,
device-mapper development, linux-kernel
On Fri, Feb 17, 2006 at 08:03:48PM -0500, Jun'ichi Nomura wrote:
> I moved the sysfs_add_link() to the last part of dm_resume().
Test with trees of devices too - where a whole tree is suspended -
I don't think you can allocate anywhere in dm_swap_table()
without PF_MEMALLOC (which I recently removed and am reluctant
to reinstate).
Have you considered if anything is feasible based around bd_claim()?
Doesn't it make more sense for the links to be set up at table
load time - i.e. superset of both tables if present?
Alasdair
--
agk@redhat.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] sysfs representation of stacked devices (dm/md common)
2006-02-18 19:50 ` Alasdair G Kergon
@ 2006-02-21 15:33 ` Jun'ichi Nomura
2006-02-21 15:52 ` Alasdair G Kergon
0 siblings, 1 reply; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-21 15:33 UTC (permalink / raw)
To: Alasdair G Kergon, Neil Brown
Cc: Lars Marowsky-Bree, device-mapper development, linux-kernel
Thanks Alasdair and Neil,
Alasdair G Kergon wrote:
> Test with trees of devices too - where a whole tree is suspended -
Suspending maps in the tree and reload one of them?
I'll try that.
> I don't think you can allocate anywhere in dm_swap_table()
> without PF_MEMALLOC (which I recently removed and am reluctant
> to reinstate).
I understand your reluctance and I don't want to revive it either.
I think moving sysfs_add_link() outside of dm_swap_table() solves
this. Am I right?
Or do you want to eliminate the possibility that sysfs_remove_link()
may require memory allocation in the future?
Anyway, I'll look for a bd_claim()-based approach.
> Have you considered if anything is feasible based around bd_claim()?
> Doesn't it make more sense for the links to be set up at table
> load time - i.e. superset of both tables if present?
I think it makes sense, but I have difficulty with it.
What I once considered was extending bd_claim() like:
bd_claim_with_owner(bdev, void *holder, struct kobject *owner)
where "owner" is a kobject for the "slaves" directory.
We could embed that object in the gendisk structure.
Then we can create symlinks like:
/sys/block/<bdev>/holders/<owner> --> /sys/block/<owner>
/sys/block/<owner>/slaves/<bdev> --> /sys/block/<bdev>
This should work for md.
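As a small illustration of the layout above, the following userspace sketch builds the two symlink paths for one holder/slave pair. It assumes whole-disk devices directly under /sys/block (partitions would sit one level deeper), and the helper names `holders_link_path()` and `slaves_link_path()` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/block/<slave>/holders/<holder>".
 * Returns 0 on success, -1 if the result did not fit in buf. */
static int holders_link_path(char *buf, size_t len,
                             const char *slave, const char *holder)
{
    int n;

    assert(buf && len > 0);
    n = snprintf(buf, len, "/sys/block/%s/holders/%s", slave, holder);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Build "/sys/block/<holder>/slaves/<slave>".
 * Returns 0 on success, -1 if the result did not fit in buf. */
static int slaves_link_path(char *buf, size_t len,
                            const char *holder, const char *slave)
{
    int n;

    assert(buf && len > 0);
    n = snprintf(buf, len, "/sys/block/%s/slaves/%s", holder, slave);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, with slave "sda" and holder "dm-0" the two helpers produce /sys/block/sda/holders/dm-0 and /sys/block/dm-0/slaves/sda.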
However, dm needs more because of its flexibility.
Because multiple dm devices can hold one device, and one dm device
can hold a device twice (i.e. current table and new table),
we need to reference-count on a per-relationship basis, not per slave
device.
This might be solved by allocating a management struct in bd_claim()
to reference-count each relationship.
I'll try this. Comments are welcome.
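A rough userspace sketch of such per-relationship counting might look like the following. It is not the eventual kernel implementation; `holder_ref`, `holder_ref_get()` and `holder_ref_put()` are hypothetical names, and the list head stands for the bookkeeping attached to one slave device.

```c
#include <assert.h>
#include <stdlib.h>

/* One (slave, owner) relationship with its own reference count. */
struct holder_ref {
    const void *owner;          /* e.g. the holder's kobject */
    int count;                  /* claims made by this owner */
    struct holder_ref *next;
};

/* Take a reference for owner.
 * Returns 1 on the first reference (symlinks should be created),
 * 0 if the relationship already existed, -1 on allocation failure. */
static int holder_ref_get(struct holder_ref **head, const void *owner)
{
    struct holder_ref *r;

    assert(head);
    for (r = *head; r; r = r->next) {
        if (r->owner == owner) {
            r->count++;
            return 0;
        }
    }
    r = malloc(sizeof(*r));
    if (!r)
        return -1;
    r->owner = owner;
    r->count = 1;
    r->next = *head;
    *head = r;
    return 1;
}

/* Drop a reference for owner.
 * Returns 1 when the last reference goes away (symlinks should be removed),
 * 0 if references remain, -1 if owner holds no reference. */
static int holder_ref_put(struct holder_ref **head, const void *owner)
{
    struct holder_ref **pp;

    assert(head);
    for (pp = head; *pp; pp = &(*pp)->next) {
        struct holder_ref *r = *pp;

        if (r->owner != owner)
            continue;
        if (--r->count > 0)
            return 0;
        *pp = r->next;
        free(r);
        return 1;
    }
    return -1;
}
```

With this shape, a dm device that holds the same slave through both its current and its new table takes two references but creates the symlinks only once.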
--
Jun'ichi Nomura, NEC Solutions (America), Inc.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] sysfs representation of stacked devices (dm/md common)
2006-02-21 15:33 ` Jun'ichi Nomura
@ 2006-02-21 15:52 ` Alasdair G Kergon
0 siblings, 0 replies; 15+ messages in thread
From: Alasdair G Kergon @ 2006-02-21 15:52 UTC (permalink / raw)
To: Jun'ichi Nomura
Cc: Neil Brown, Lars Marowsky-Bree, device-mapper development,
linux-kernel
On Tue, Feb 21, 2006 at 10:33:40AM -0500, Jun'ichi Nomura wrote:
> Alasdair G Kergon wrote:
> >Test with trees of devices too - where a whole tree is suspended -
> Suspending maps in the tree and reload one of them?
Reload a complete tree of devices like lvm2 does:
It loads inactive tables wherever it needs to in the tree,
then suspends the devices in the correct order (according to
the dependencies of the live tables to avoid ever 'trapping' I/O
between two devices), then resumes them in order.
> >I don't think you can allocate anywhere in dm_swap_table()
> >without PF_MEMALLOC (which I recently removed and am reluctant
> >to reinstate).
> I understand your reluctance and I don't want to revive it either.
> I think moving sysfs_add_link() outside of dm_swap_table() solves
> this. Am I right?
I should have said: try hard to avoid allocations in any code run
during the 'DM_SUSPEND' ioctl - if you really have to, your options
include PF_MEMALLOC or a mempool, as appropriate.
> Or do you want to eliminate the possibility that sysfs_remove_link()
> may require memory allocation in the future?
Either that, or:
> Anyway, I'll look for a bd_claim()-based approach.
This dodges the allocation problem because it happens in the DM_TABLE_LOAD
ioctl where I was able to remove the restriction recently.
Alasdair
--
agk@redhat.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] sysfs representation of stacked devices (dm)
2006-02-17 18:00 [PATCH 0/3] sysfs representation of stacked devices (dm/md) Jun'ichi Nomura
2006-02-17 18:01 ` [PATCH 1/3] sysfs representation of stacked devices (dm/md common) Jun'ichi Nomura
@ 2006-02-17 18:03 ` Jun'ichi Nomura
2006-02-17 18:05 ` [PATCH 3/3] sysfs representation of stacked devices (md) Jun'ichi Nomura
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-17 18:03 UTC (permalink / raw)
To: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree
Cc: device-mapper development, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 129 bytes --]
This patch modifies dm driver to create symlinks to/from
underlying devices.
--
Jun'ichi Nomura, NEC Solutions (America), Inc.
[-- Attachment #2: stacked-device-representation-in-sysfs-1-dm.patch --]
[-- Type: text/x-patch, Size: 2812 bytes --]
Exporting stacked device relationship to sysfs (dm)
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
--- linux-2.6.15.orig/drivers/md/dm.c 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/drivers/md/dm.c 2006-02-14 12:18:13.000000000 -0500
@@ -853,6 +853,7 @@ static int __bind(struct mapped_device *
dm_table_get(t);
dm_table_event_callback(t, event_callback, md);
+ dm_link_device(md, t);
write_lock(&md->map_lock);
md->map = t;
@@ -873,6 +874,7 @@ static void __unbind(struct mapped_devic
write_lock(&md->map_lock);
md->map = NULL;
write_unlock(&md->map_lock);
+ dm_unlink_device(md, map);
dm_table_put(map);
}
--- linux-2.6.15.orig/drivers/md/dm.h 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/drivers/md/dm.h 2006-02-14 12:21:10.000000000 -0500
@@ -94,6 +94,12 @@ int dm_wait_event(struct mapped_device *
struct gendisk *dm_disk(struct mapped_device *md);
int dm_suspended(struct mapped_device *md);
+/*
+ * kobject linking functions
+ */
+int dm_link_device(struct mapped_device *md, struct dm_table *t);
+int dm_unlink_device(struct mapped_device *md, struct dm_table *t);
+
/*-----------------------------------------------------------------
* Functions for manipulating a table. Tables are also reference
* counted.
--- linux-2.6.15.orig/drivers/md/dm-table.c 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/drivers/md/dm-table.c 2006-02-16 23:25:49.000000000 -0500
@@ -53,6 +53,9 @@ struct dm_table {
/* events get handed up using this callback */
void (*event_fn)(void *);
void *event_context;
+
+ /* sysfs deptree */
+ struct kobject slave_dir;
};
/*
@@ -945,6 +948,43 @@ int dm_table_flush_all(struct dm_table *
return ret;
}
+/* create sysfs symlinks between mapped device and underlying devices */
+int dm_link_device(struct mapped_device *md, struct dm_table *t)
+{
+ struct list_head *d, *devices;
+ struct kobject *md_kobj;
+
+ md_kobj = &dm_disk(md)->kobj;
+ stackdev_init(&t->slave_dir, md_kobj);
+
+ devices = dm_table_get_devices(t);
+ for (d = devices->next; d != devices; d = d->next) {
+ struct dm_dev *dd = list_entry(d, struct dm_dev, list);
+ stackdev_link(dd->bdev, &t->slave_dir, md_kobj);
+ }
+
+ return 0;
+}
+
+/* remove sysfs symlinks between mapped device and underlying devices */
+int dm_unlink_device(struct mapped_device *md, struct dm_table *t)
+{
+ struct list_head *d, *devices;
+ struct kobject *md_kobj;
+
+ md_kobj = &dm_disk(md)->kobj;
+
+ devices = dm_table_get_devices(t);
+ for (d = devices->next; d != devices; d = d->next) {
+ struct dm_dev *dd = list_entry(d, struct dm_dev, list);
+ stackdev_unlink(dd->bdev, &t->slave_dir, md_kobj);
+ }
+
+ stackdev_clear(&t->slave_dir);
+
+ return 0;
+}
+
EXPORT_SYMBOL(dm_vcalloc);
EXPORT_SYMBOL(dm_get_device);
EXPORT_SYMBOL(dm_put_device);
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/3] sysfs representation of stacked devices (md)
2006-02-17 18:00 [PATCH 0/3] sysfs representation of stacked devices (dm/md) Jun'ichi Nomura
2006-02-17 18:01 ` [PATCH 1/3] sysfs representation of stacked devices (dm/md common) Jun'ichi Nomura
2006-02-17 18:03 ` [PATCH 2/3] sysfs representation of stacked devices (dm) Jun'ichi Nomura
@ 2006-02-17 18:05 ` Jun'ichi Nomura
2006-02-17 19:42 ` [PATCH 0/3] sysfs representation of stacked devices (dm/md) Alasdair G Kergon
2006-02-19 22:04 ` Neil Brown
4 siblings, 0 replies; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-17 18:05 UTC (permalink / raw)
To: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree
Cc: device-mapper development, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 129 bytes --]
This patch modifies md driver to create symlinks to/from
underlying devices.
--
Jun'ichi Nomura, NEC Solutions (America), Inc.
[-- Attachment #2: stacked-device-representation-in-sysfs-1-md.patch --]
[-- Type: text/x-patch, Size: 1703 bytes --]
Exporting stacked device relationship to sysfs (md)
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
--- linux-2.6.15.orig/include/linux/raid/md_k.h 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/include/linux/raid/md_k.h 2006-02-16 15:36:22.000000000 -0500
@@ -155,6 +155,7 @@ struct mddev_s
struct gendisk *gendisk;
struct kobject kobj;
+ struct kobject slave_dir;
/* Superblock information */
int major_version,
md.c linux-2.6.15/drivers/md/md.c
--- linux-2.6.15.orig/drivers/md/md.c 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/drivers/md/md.c 2006-02-16 17:44:49.000000000 -0500
@@ -182,6 +182,7 @@ static void mddev_put(mddev_t *mddev)
return;
if (!mddev->raid_disks && list_empty(&mddev->disks)) {
list_del(&mddev->all_mddevs);
+ stackdev_clear(&mddev->slave_dir);
blk_put_queue(mddev->queue);
kobject_unregister(&mddev->kobj);
}
@@ -1226,6 +1227,7 @@ static int bind_rdev_to_array(mdk_rdev_t
else
ko = &rdev->bdev->bd_disk->kobj;
sysfs_create_link(&rdev->kobj, ko, "block");
+ stackdev_link(rdev->bdev, &mddev->slave_dir, &mddev->gendisk->kobj);
return 0;
}
@@ -1236,6 +1238,8 @@ static void unbind_rdev_from_array(mdk_r
MD_BUG();
return;
}
+ stackdev_unlink(rdev->bdev, &rdev->mddev->slave_dir,
+ &rdev->mddev->gendisk->kobj);
list_del_init(&rdev->same_set);
printk(KERN_INFO "md: unbind<%s>\n", bdevname(rdev->bdev,b));
rdev->mddev = NULL;
@@ -1924,6 +1928,7 @@ static struct kobject *md_probe(dev_t de
snprintf(mddev->kobj.name, KOBJ_NAME_LEN, "%s", "md");
mddev->kobj.ktype = &md_ktype;
kobject_register(&mddev->kobj);
+ stackdev_init(&mddev->slave_dir, &mddev->gendisk->kobj);
return NULL;
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] sysfs representation of stacked devices (dm/md)
2006-02-17 18:00 [PATCH 0/3] sysfs representation of stacked devices (dm/md) Jun'ichi Nomura
` (2 preceding siblings ...)
2006-02-17 18:05 ` [PATCH 3/3] sysfs representation of stacked devices (md) Jun'ichi Nomura
@ 2006-02-17 19:42 ` Alasdair G Kergon
2006-02-18 1:21 ` Jun'ichi Nomura
2006-02-18 6:06 ` Kyle Moffett
2006-02-19 22:04 ` Neil Brown
4 siblings, 2 replies; 15+ messages in thread
From: Alasdair G Kergon @ 2006-02-17 19:42 UTC (permalink / raw)
To: Jun'ichi Nomura
Cc: Neil Brown, Alasdair Kergon, Lars Marowsky-Bree,
device-mapper development, linux-kernel
On Fri, Feb 17, 2006 at 01:00:17PM -0500, Jun'ichi Nomura wrote:
> These patches provide common representation of dependencies
> between stacked devices (dm and md) in sysfs.
I'm neutral on this change so long as it can be done without
introducing problems for device-mapper.
> Though md0, dm-0, dm-1 and sd[a-d] contain same LVM2 meta data,
> LVM2 should pick up md0 as PV, not dm-0, dm-1 and sdXs.
> mdadm should build md0 from dm-0 and dm-1, not from sdXs.
> Similar things will happen on 'mount' and 'fsck' if we use
> file system labels instead of LVM2.
I can't speak for the 'mount' code base, but I don't think it'll
make any significant difference to LVM2 - we'd still have to do
all the same device scanning as we do now because we have to be
aware of md devices defined in on-disk metadata regardless of
whether or not the kernel knows about them at the time the
command is run.
> Currently, these relationships are determined by each tool
> combining information like the existence of md metadata
> and dm dependency ioctl.
And attempts to open a device exclusively. That's one check LVM2
does before running 'pvcreate' on a device.
> thus we only need to check "holders" directory of the device
> to decide whether the device is used by dm/md.
> Also we can walk down the "slaves" directories to collect
> the devices composing the given dm/md device.
For device-mapper devices, 'dmsetup deps' and ls --tree already
gives you this information reasonably efficiently.
Would others find the proposal useful for non-dm devices?
And rather than adding code just to dm and md, would it be better
to implement it by enhancing bd_claim()?
Alasdair
--
agk@redhat.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] sysfs representation of stacked devices (dm/md)
2006-02-17 19:42 ` [PATCH 0/3] sysfs representation of stacked devices (dm/md) Alasdair G Kergon
@ 2006-02-18 1:21 ` Jun'ichi Nomura
2006-02-18 19:53 ` Alasdair G Kergon
2006-02-18 6:06 ` Kyle Moffett
1 sibling, 1 reply; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-18 1:21 UTC (permalink / raw)
To: Alasdair G Kergon
Cc: Neil Brown, Lars Marowsky-Bree, device-mapper development,
linux-kernel
Hi,
Alasdair G Kergon wrote:
> I can't speak for the 'mount' code base, but I don't think it'll
> make any significant difference to LVM2 - we'd still have to do
> all the same device scanning as we do now because we have to be
> aware of md devices defined in on-disk metadata regardless of
> whether or not the kernel knows about them at the time the
> command is run.
Actually, as you say, LVM2 already does the relationship analysis
correctly by itself, so it's not a good example...
The point was that dm and md have a similar dependency
structure, but currently we have to scan all devices to
find the upward relationships, using a different method
for dm and for md.
>>thus we only need to check "holders" directory of the device
>>to decide whether the device is used by dm/md.
>>Also we can walk down the "slaves" directories to collect
>>the devices composing the given dm/md device.
>
> For device-mapper devices, 'dmsetup deps' and ls --tree already
> gives you this information reasonably efficiently.
Speaking about the efficiency, 'dmsetup ls --tree' works well.
However, I haven't yet found an efficient way to implement
'dmsetup info --tree -o inverted dm-0', for example.
The deps ioctl provides downward information for a given dm device,
but there is no method for upward information.
Providing a reverse-deps ioctl in dm may be an alternative solution,
but it still doesn't provide the holders of non-dm devices.
So I feel the sysfs solution is appealing.
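To illustrate the upward lookup, here is a minimal userspace sketch that counts the entries in a device's "holders" directory. It assumes the layout proposed in this series; `count_holders()` is a hypothetical helper, and the sysfs root is passed in so the function can be pointed at a test directory.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count the entries in <sysfs_block>/<dev>/holders.
 * Returns the number of holders, or -1 if the directory cannot be opened
 * (for example on a kernel without this patch) or the path is too long. */
static int count_holders(const char *sysfs_block, const char *dev)
{
    char path[512];
    struct dirent *de;
    DIR *dir;
    int n, count = 0;

    assert(sysfs_block && dev);
    n = snprintf(path, sizeof(path), "%s/%s/holders", sysfs_block, dev);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    dir = opendir(path);
    if (!dir)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        /* Skip the "." and ".." entries; everything else is a holder link. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);
    return count;
}
```

A call such as count_holders("/sys/block", "sda") would then answer "is sda used by any dm/md device?" without scanning every device.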
> Would others find the proposal useful for non-dm devices?
I would appreciate comments from others as well.
> And rather than adding code just to dm and md, would it be better
> to implement it by enhancing bd_claim()?
It may be possible if I can extend bd_claim() to accept an
additional parameter: all dm devices use the same 'holder'
signature for bd_claim(), but the actual owner of the claim
must be known in order to create the symlinks.
--
Jun'ichi Nomura, NEC Solutions (America), Inc.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] sysfs representation of stacked devices (dm/md)
2006-02-18 1:21 ` Jun'ichi Nomura
@ 2006-02-18 19:53 ` Alasdair G Kergon
2006-02-21 15:34 ` Jun'ichi Nomura
0 siblings, 1 reply; 15+ messages in thread
From: Alasdair G Kergon @ 2006-02-18 19:53 UTC (permalink / raw)
To: Jun'ichi Nomura
Cc: Neil Brown, Lars Marowsky-Bree, device-mapper development,
linux-kernel
On Fri, Feb 17, 2006 at 08:21:32PM -0500, Jun'ichi Nomura wrote:
> Speaking about the efficiency, 'dmsetup ls --tree' works well.
> However, I haven't yet found an efficient way to implement
> 'dmsetup info --tree -o inverted dm-0', for example.
Indeed - but what needs this that doesn't also need to scan
everything? mount?
Alasdair
--
agk@redhat.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] sysfs representation of stacked devices (dm/md)
2006-02-18 19:53 ` Alasdair G Kergon
@ 2006-02-21 15:34 ` Jun'ichi Nomura
0 siblings, 0 replies; 15+ messages in thread
From: Jun'ichi Nomura @ 2006-02-21 15:34 UTC (permalink / raw)
To: Alasdair G Kergon
Cc: Neil Brown, Lars Marowsky-Bree, device-mapper development,
linux-kernel
Alasdair G Kergon wrote:
>>Speaking about the efficiency, 'dmsetup ls --tree' works well.
>>However, I haven't yet found an efficient way to implement
>>'dmsetup info --tree -o inverted dm-0', for example.
>
> Indeed - but what needs this that doesn't also need to scan
> everything? mount?
mount, fsck and other blkid-based tools could be optimized with it.
However, what I had in mind was system administration: simply
using dmsetup or looking at /sys to check where a device belongs.
--
Jun'ichi Nomura, NEC Solutions (America), Inc.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] sysfs representation of stacked devices (dm/md)
2006-02-17 19:42 ` [PATCH 0/3] sysfs representation of stacked devices (dm/md) Alasdair G Kergon
2006-02-18 1:21 ` Jun'ichi Nomura
@ 2006-02-18 6:06 ` Kyle Moffett
1 sibling, 0 replies; 15+ messages in thread
From: Kyle Moffett @ 2006-02-18 6:06 UTC (permalink / raw)
To: Alasdair G Kergon
Cc: Jun'ichi Nomura, Neil Brown, Lars Marowsky-Bree,
device-mapper development, linux-kernel
On Feb 17, 2006, at 14:42, Alasdair G Kergon wrote:
> On Fri, Feb 17, 2006 at 01:00:17PM -0500, Jun'ichi Nomura wrote:
>> Though md0, dm-0, dm-1 and sd[a-d] contain same LVM2 meta data,
>> LVM2 should pick up md0 as PV, not dm-0, dm-1 and sdXs. mdadm
>> should build md0 from dm-0 and dm-1, not from sdXs. Similar things
>> will happen on 'mount' and 'fsck' if we use file system labels
>> instead of LVM2.
>
> I can't speak for the 'mount' code base, but I don't think it'll
> make any significant difference to LVM2 - we'd still have to do all
> the same device scanning as we do now because we have to be aware
> of md devices defined in on-disk metadata regardless of whether or
> not the kernel knows about them at the time the command is run.
Aha! This is a very valid reason why we should export partition
types from the kernel to userspace: Partitions/devices that appear
to have 2 different filesystems/formats. The _kernel_ cannot
reliably tell which to use. On the other hand, a properly configured
_userspace_ initramfs could use configured partition-type
information, a small config file, and a user-configurable detection
algorithm to figure out that the device is _actually_ the first
segment of an ext3-on-LVM-on-RAID1, instead of a raw ext3, and mount
it appropriately. Now, this requires that the admin correctly
specify the partition types, but that seems a bit more reliable than
depending on the probe-order to get things right.
Cheers,
Kyle Moffett
--
Unix was not designed to stop people from doing stupid things,
because that would also stop them from doing clever things.
-- Doug Gwyn
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/3] sysfs representation of stacked devices (dm/md)
2006-02-17 18:00 [PATCH 0/3] sysfs representation of stacked devices (dm/md) Jun'ichi Nomura
` (3 preceding siblings ...)
2006-02-17 19:42 ` [PATCH 0/3] sysfs representation of stacked devices (dm/md) Alasdair G Kergon
@ 2006-02-19 22:04 ` Neil Brown
4 siblings, 0 replies; 15+ messages in thread
From: Neil Brown @ 2006-02-19 22:04 UTC (permalink / raw)
To: Jun'ichi Nomura
Cc: Alasdair Kergon, Lars Marowsky-Bree, device-mapper development,
linux-kernel
On Friday February 17, j-nomura@ce.jp.nec.com wrote:
> Hello,
>
> These patches provide common representation of dependencies
> between stacked devices (dm and md) in sysfs.
> For example, if dm-0 maps to sda, we have the following symlinks;
> /sys/block/dm-0/slaves/sda --> /sys/block/sda
> /sys/block/sda/holders/dm-0 --> /sys/block/dm-0
I'm happy with the idea of having these links.
I agree that it would be nice to have this very strongly based on the
bd_claim infrastructure.
It would be really nice if bd_claim took a "kobject *" rather than a
"void *" and put a link in there. This would be easy for dm and md,
but awkward for other claimers like filesystems and open file
descriptors as they don't currently have kobjects.
Possibly an extra flag that says if the 'holder' is a kobject or not,
and if it is, appropriate symlinks are created...??
NeilBrown
^ permalink raw reply [flat|nested] 15+ messages in thread