public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC] error management in add_disk()
@ 2007-07-24 11:57 Alban Crequy
  2007-07-24 13:28 ` Al Viro
  0 siblings, 1 reply; 3+ messages in thread
From: Alban Crequy @ 2007-07-24 11:57 UTC (permalink / raw)
  To: jens.axboe; +Cc: linux-kernel

Hi,

I have a problem with the error management of add_disk() and del_gendisk().

add_disk() adds an entry in /sys/block/<name>. The filename in /sys/block is
not (struct gen_disk)->disk_name but more or less the first KOBJ_NAME_LEN
characters of (struct gen_disk)->disk_name.

#define KOBJ_NAME_LEN                   20

My problem occurs when we try to add 2 disks with different names, but when
the KOBJ_NAME_LEN first characters are the same.

It is not allowed to have several files in /sys/block/ with the same name. It
does not prevent the disk to work, but the creation of the file in /sys/block
will silently fail. See the call chain:

add_disk() -> register_disk() -> kobject_add(&disk->kobj)
del_gendisk() -> kobject_del(&disk->kobj)

add_disk() and register_disk() return void, so the caller cannot know that
there is a problem. kobject_add() returns an error but register_disk() ignores
the error.

The disk driver is unaware of the failure of disk_add(), and may call
del_gendisk(). It deletes an object in /sys/block/ that was not added (bug!).
The disk driver cannot check that, so the reference counter of /sys/block is
decremented on kobject_del().

If the user run add_disk() and del_gendisk() a lot of times with different
names having the same 20-characters beginning, the reference counter of the
/sys/block directory will reach 0 (and it will be freed) although it still
contains files.

The attached test module triggers the problem. You can try something like:
for i in $(seq 1 100) ; do insmod ./adddiskbug.ko ; rmmod adddiskbug ; done

The attached patch fixes the problem by changing the prototype of add_disk() and
register_disk() to return errors.


Index: linux-2.6.23-rc1/block/genhd.c
===================================================================
--- linux-2.6.23-rc1.orig/block/genhd.c	2007-07-23 14:52:11.000000000 +0200
+++ linux-2.6.23-rc1/block/genhd.c	2007-07-23 18:08:58.000000000 +0200
@@ -175,13 +175,22 @@
  * This function registers the partitioning information in @disk
  * with the kernel.
  */
-void add_disk(struct gendisk *disk)
+int add_disk(struct gendisk *disk)
 {
+	int ret;
+
 	disk->flags |= GENHD_FL_UP;
 	blk_register_region(MKDEV(disk->major, disk->first_minor),
 			    disk->minors, NULL, exact_match, exact_lock, disk);
-	register_disk(disk);
+	ret = register_disk(disk);
+	if (ret) {
+		blk_unregister_region(MKDEV(disk->major, disk->first_minor),
+				      disk->minors);
+		return ret;
+	}
 	blk_register_queue(disk);
+
+	return 0;
 }
 
 EXPORT_SYMBOL(add_disk);
Index: linux-2.6.23-rc1/include/linux/genhd.h
===================================================================
--- linux-2.6.23-rc1.orig/include/linux/genhd.h	2007-07-09 01:32:17.000000000 +0200
+++ linux-2.6.23-rc1/include/linux/genhd.h	2007-07-23 15:12:53.000000000 +0200
@@ -235,7 +235,7 @@
 
 /* drivers/block/genhd.c */
 extern int get_blkdev_list(char *, int);
-extern void add_disk(struct gendisk *disk);
+extern int add_disk(struct gendisk *disk);
 extern void del_gendisk(struct gendisk *gp);
 extern void unlink_gendisk(struct gendisk *gp);
 extern struct gendisk *get_gendisk(dev_t dev, int *part);
Index: linux-2.6.23-rc1/fs/partitions/check.c
===================================================================
--- linux-2.6.23-rc1.orig/fs/partitions/check.c	2007-07-23 14:52:13.000000000 +0200
+++ linux-2.6.23-rc1/fs/partitions/check.c	2007-07-23 15:53:04.000000000 +0200
@@ -469,7 +469,7 @@
 }
 
 /* Not exported, helper to add_disk(). */
-void register_disk(struct gendisk *disk)
+int register_disk(struct gendisk *disk)
 {
 	struct block_device *bdev;
 	char *s;
@@ -483,11 +483,11 @@
 	if (s)
 		*s = '!';
 	if ((err = kobject_add(&disk->kobj)))
-		return;
+		return -EEXIST;
 	err = disk_sysfs_symlinks(disk);
 	if (err) {
 		kobject_del(&disk->kobj);
-		return;
+		return -EEXIST;
 	}
  	disk_sysfs_add_subdirs(disk);
 
@@ -523,6 +523,8 @@
 			continue;
 		kobject_uevent(&p->kobj, KOBJ_ADD);
 	}
+
+	return 0;
 }
 
 int rescan_partitions(struct gendisk *disk, struct block_device *bdev)
Index: linux-2.6.23-rc1/include/linux/blkdev.h
===================================================================
--- linux-2.6.23-rc1.orig/include/linux/blkdev.h	2007-07-23 14:52:14.000000000 +0200
+++ linux-2.6.23-rc1/include/linux/blkdev.h	2007-07-23 15:58:47.000000000 +0200
@@ -643,7 +643,7 @@
 
 extern int blk_register_queue(struct gendisk *disk);
 extern void blk_unregister_queue(struct gendisk *disk);
-extern void register_disk(struct gendisk *dev);
+extern int register_disk(struct gendisk *dev);
 extern void generic_make_request(struct bio *bio);
 extern void blk_put_request(struct request *);
 extern void __blk_put_request(request_queue_t *, struct request *);



My test module:


----->8---------->8---------->8-----
#include <linux/module.h>
#include <linux/kernel.h>

#include <linux/blkdev.h>
#include <linux/genhd.h>
#include <linux/fs.h>
#include <linux/init.h>


#define DRIVER_NAME "adddiskbug"

int adddiskbug_init(void);
void adddiskbug_exit(void);


struct block_device_operations adddiskbug_fops = {
  .owner   = THIS_MODULE,
  .open    = NULL,
  .release = NULL,
  .ioctl   = NULL,
};


int major_number;
struct gendisk *gd[256];

void
print_refcount(struct gendisk *_gd, const char *str)
{
  struct kobject *obj;

  printk("%s\n", str);
  printk("name=%s: count=%d\n",
         _gd->disk_name,
         atomic_read(& _gd->kobj.kref.refcount));

  for (obj = & _gd->kobj ; obj ; obj = obj->parent)
  {
    printk("  obj %p '%s': %p '%s' count=%d\n",
           obj, obj->name, obj->k_name, obj->k_name?obj->k_name:"",
           atomic_read(& obj->kref.refcount));
  }
}

int new_disk(const char *name, int minor)
{
  int ret = 0;

  if (strlen(name) >= 32)
  {
    printk("ERROR: Name '%s' too long\n", name);
    return -EINVAL;
  }

  gd[minor] = alloc_disk(1);
  if (!gd[minor])
    return -ENOMEM;

  print_refcount(gd[0], "After alloc_disk:\n");

  gd[minor]->major = major_number;
  gd[minor]->first_minor = minor;
  gd[minor]->fops = &adddiskbug_fops;
  strlcpy(gd[minor]->disk_name, name, sizeof(gd[minor]->disk_name));

  set_capacity(gd[minor], (sector_t) 8192); /* 4MB */

  gd[minor]->queue = blk_alloc_queue(GFP_KERNEL);
  if (! gd[minor]->queue)
  {
    put_disk(gd[minor]);
    return -ENOMEM;
  }

#ifndef KERNEL_WITH_MY_PATCH
  add_disk(gd[minor]);
#else
  ret = add_disk(gd[minor]);
  if (ret)
  {
    blk_cleanup_queue(gd[minor]->queue);
    put_disk(gd[minor]);
    return ret;
  }
#endif

  return ret;
}

void delete_disk(int minor)
{
  struct gendisk *_gd = gd[minor];
  blk_cleanup_queue(_gd->queue);
  del_gendisk(_gd);
  put_disk(_gd);
}


int
adddiskbug_init(void)
{
  int ret;

  printk(" ==== init ===== \n");

  major_number = register_blkdev(0, DRIVER_NAME);
  if (major_number < 0)
    return -EINVAL;

  ret = new_disk("my_bogus_driver_number_1", 0);
  if (ret < 0)
  {
    unregister_blkdev(major_number, DRIVER_NAME);
    return -EINVAL;
  }
  print_refcount(gd[0], "after new_disk 0");

  ret = new_disk("my_bogus_driver_number_2", 1);
  if (ret < 0)
  {
    delete_disk(0);
    unregister_blkdev(major_number, DRIVER_NAME);
    return -EINVAL;
  }
  print_refcount(gd[0], "after new_disk 1");


  return 0;
}

void
adddiskbug_exit(void)
{
  print_refcount(gd[0], "before delete_disk 0");
  delete_disk(1);

  print_refcount(gd[0], "before delete_disk 1");
  delete_disk(0);

  unregister_blkdev(major_number, DRIVER_NAME);
}


module_init(adddiskbug_init);
module_exit(adddiskbug_exit);


----->8---------->8---------->8-----


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC] error management in add_disk()
  2007-07-24 11:57 [RFC] error management in add_disk() Alban Crequy
@ 2007-07-24 13:28 ` Al Viro
  2007-07-25 15:29   ` Alban Crequy
  0 siblings, 1 reply; 3+ messages in thread
From: Al Viro @ 2007-07-24 13:28 UTC (permalink / raw)
  To: Alban Crequy; +Cc: jens.axboe, linux-kernel

On Tue, Jul 24, 2007 at 01:57:53PM +0200, Alban Crequy wrote:
> Hi,
> 
> I have a problem with the error management of add_disk() and del_gendisk().
> 
> add_disk() adds an entry in /sys/block/<name>. The filename in /sys/block is
> not (struct gen_disk)->disk_name but more or less the first KOBJ_NAME_LEN
> characters of (struct gen_disk)->disk_name.
> 
> #define KOBJ_NAME_LEN                   20
> 
> My problem occurs when we try to add 2 disks with different names, but when
> the KOBJ_NAME_LEN first characters are the same.

So don't do that.

> The attached test module triggers the problem. You can try something like:
> for i in $(seq 1 100) ; do insmod ./adddiskbug.ko ; rmmod adddiskbug ; done
> 
> The attached patch fixes the problem by changing the prototype of add_disk() and
> register_disk() to return errors.
 
This is bogus.  Just what would callers do with these error values?
Ignore them silently?  Bail out?  Can't do - at that point disk just might
have been opened already.  add_disk() is the point of no return; we are
already past the last point where we could bail out.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC] error management in add_disk()
  2007-07-24 13:28 ` Al Viro
@ 2007-07-25 15:29   ` Alban Crequy
  0 siblings, 0 replies; 3+ messages in thread
From: Alban Crequy @ 2007-07-25 15:29 UTC (permalink / raw)
  To: Al Viro; +Cc: jens.axboe, linux-kernel

Le Tue, 24 Jul 2007 14:28:05 +0100,
Al Viro <viro@ftp.linux.org.uk> a écrit :

>On Tue, Jul 24, 2007 at 01:57:53PM +0200, Alban Crequy wrote:
>> Hi,
>> 
>> I have a problem with the error management of add_disk() and
>> del_gendisk().
>> 
>> add_disk() adds an entry in /sys/block/<name>. The filename
>> in /sys/block is not (struct gen_disk)->disk_name but more or less
>> the first KOBJ_NAME_LEN characters of (struct gen_disk)->disk_name.
>> 
>> #define KOBJ_NAME_LEN                   20
>> 
>> My problem occurs when we try to add 2 disks with different names,
>> but when the KOBJ_NAME_LEN first characters are the same.
>
>So don't do that.

I no more do that. But I still think it would be better if we found a
way to manage errors in that case.

I fear that parts of kernel make this error. For example, old version of
GFS has this code:

http://csourcesearch.net/package/gfs-kernel/2.6.9/gfs-kernel-2.6.9-27/src/gfs/diaper.c
  char buf[BDEVNAME_SIZE];
  bdevname(real, buf);
  snprintf(gd->disk_name, sizeof(gd->disk_name), "diapered_%s", buf);

Since BDEVNAME_SIZE is 32 and KOBJ_NAME_LEN is 20, the bug happens quite
easily.

I did not check closely if this is a problem, but there is other parts
in the current kernel that build the disk_name with snprintf("...%s...")

>> The attached test module triggers the problem. You can try something
>> like: for i in $(seq 1 100) ; do insmod ./adddiskbug.ko ; rmmod
>> adddiskbug ; done
>> 
>> The attached patch fixes the problem by changing the prototype of
>> add_disk() and register_disk() to return errors.
> 
>This is bogus.  Just what would callers do with these error values?
>Ignore them silently?  Bail out?  Can't do - at that point disk just
>might have been opened already.  add_disk() is the point of no return;
>we are already past the last point where we could bail out.

I missed that point - that the disk might have been opened.  Where is
the point of no return in add_disk() exactly?  Is it really before the
kobject_add() that causes the problem?

In this case, perhaps we can 1/ check that the kobject_add() will not
fail before the point of no return, 2/ pass this point and then 3/ do
the kobject_add(). And add appropriate locking to ensure that nobody
add another disk with the same 20-characters truncated name between 1/
and 3/.


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2007-07-25 15:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-24 11:57 [RFC] error management in add_disk() Alban Crequy
2007-07-24 13:28 ` Al Viro
2007-07-25 15:29   ` Alban Crequy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox