linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* livelock during MD device open
@ 2014-01-14 17:14 Nicolas Schichan
  2014-01-15  1:57 ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Nicolas Schichan @ 2014-01-14 17:14 UTC (permalink / raw)
  To: Neil Brown, LKML, linux-raid



Hi,

I have recently been trying to find the cause a livelock occurring during MD 
device open.

The livelock happens when a process tries to open an MD device for the
first time and another opens the same MD device and sends an invalid
ioctl:

Process 1				Process 2
---------				---------

md_alloc()
   mddev_find()
   -> returns a new mddev with
      hold_active == UNTIL_IOCTL
   add_disk()
   -> sends KOBJ_ADD uevent

					(sees KOBJ_ADD uevent for device)
					md_open()
					md_ioctl(INVALID_IOCTL)
					-> returns ENODEV and clears
					   mddev->hold_active
					md_release()
					  md_put()
					  -> deletes the mddev as
					     hold_active is 0

md_open()
   mddev_find()
   -> returns a newly
     allocated mddev with
     mddev->gendisk == NULL
-> returns with ERESTARTSYS
    (kernel restarts the open syscall)


As to how to fix this, I see two possibilities:

- don't set hold_active to 0 if err is -ENODEV in the abort_unlock
   path in md_ioctl().

- check cmd parameter early in md_ioctl() and return -ENOTTY if the
   cmd parameter is not a valid MD ioctl.

Please advise on the preferred way to fix this, I'll be glad to send a
patch for whatever is the preferred solution.

I have also a simple C program that I can send should you want to reproduce 
the issue.

Regards,

-- 
Nicolas Schichan
Freebox SAS

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: livelock during MD device open
  2014-01-14 17:14 livelock during MD device open Nicolas Schichan
@ 2014-01-15  1:57 ` NeilBrown
  2014-01-15 13:11   ` Nicolas Schichan
  2014-01-15 15:58   ` [PATCH] md: check command validity early in md_ioctl() Nicolas Schichan
  0 siblings, 2 replies; 5+ messages in thread
From: NeilBrown @ 2014-01-15  1:57 UTC (permalink / raw)
  To: Nicolas Schichan; +Cc: LKML, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2966 bytes --]

On Tue, 14 Jan 2014 18:14:46 +0100 Nicolas Schichan <nschichan@freebox.fr>
wrote:

> 
> 
> Hi,
> 
> I have recently been trying to find the cause a livelock occurring during MD 
> device open.
> 
> The livelock happens when a process tries to open an MD device for the
> first time and another opens the same MD device and sends an invalid
> ioctl:
> 
> Process 1				Process 2
> ---------				---------
> 
> md_alloc()
>    mddev_find()
>    -> returns a new mddev with
>       hold_active == UNTIL_IOCTL
>    add_disk()
>    -> sends KOBJ_ADD uevent
> 
> 					(sees KOBJ_ADD uevent for device)
> 					md_open()
> 					md_ioctl(INVALID_IOCTL)
> 					-> returns ENODEV and clears
> 					   mddev->hold_active
> 					md_release()
> 					  md_put()
> 					  -> deletes the mddev as
> 					     hold_active is 0
> 
> md_open()
>    mddev_find()
>    -> returns a newly
>      allocated mddev with
>      mddev->gendisk == NULL
> -> returns with ERESTARTSYS
>     (kernel restarts the open syscall)
> 
> 
> As to how to fix this, I see two possibilities:
> 
> - don't set hold_active to 0 if err is -ENODEV in the abort_unlock
>    path in md_ioctl().
> 
> - check cmd parameter early in md_ioctl() and return -ENOTTY if the
>    cmd parameter is not a valid MD ioctl.
> 
> Please advise on the preferred way to fix this, I'll be glad to send a
> patch for whatever is the preferred solution.
> 
> I have also a simple C program that I can send should you want to reproduce 
> the issue.
> 
> Regards,
> 

That's a very small race you are consistently losing - if I understand
correctly.

In __blkdev_get:

 restart:

	ret = -ENXIO;
	disk = get_gendisk(bdev->bd_dev, &partno);
	if (!disk)
		goto out;
	owner = disk->fops->owner;

	disk_block_events(disk);
	mutex_lock_nested(&bdev->bd_mutex, for_part);


The "get_gendisk" calls into md_alloc (via md_probe) and then add_disk(),
which generates a uevent which is handled by udev.
And before the above code gets to the mutex_lock_nexted(), the process run by
udev must have opened the device (executing all that code above and more) and
issued the ioctl.

I guess it is possible, but happening every time to produce a live-lock
suggests that the scheduler must be encouraging that behaviour.  Presumably
this is a virtual machine with just one CPU ??

I suppose the best fix is, as you suggest, to avoid clearing hold_active for
invalid ioctls.  It feels a bit like papering over a bug, but I think the
only way to really fix it is to add extra locking to the above code sequence
and I don't want to do that.

Of your two suggestions I much prefer the second.  It will be more code, but
it is also more obviously correct.  The current code is rather messy with
respect to invalid ioctl commands.

I would be happy to accept a patch which aborted md_ioctl if the cmd wasn't
one of those known to md.

Thanks,
NeilBrown



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: livelock during MD device open
  2014-01-15  1:57 ` NeilBrown
@ 2014-01-15 13:11   ` Nicolas Schichan
  2014-01-15 15:58   ` [PATCH] md: check command validity early in md_ioctl() Nicolas Schichan
  1 sibling, 0 replies; 5+ messages in thread
From: Nicolas Schichan @ 2014-01-15 13:11 UTC (permalink / raw)
  To: NeilBrown; +Cc: LKML, linux-raid

On 01/15/2014 02:57 AM, NeilBrown wrote:
[...]
> That's a very small race you are consistently losing - if I understand
> correctly.
>
> In __blkdev_get:
>
>   restart:
>
> 	ret = -ENXIO;
> 	disk = get_gendisk(bdev->bd_dev, &partno);
> 	if (!disk)
> 		goto out;
> 	owner = disk->fops->owner;
>
> 	disk_block_events(disk);
> 	mutex_lock_nested(&bdev->bd_mutex, for_part);
>
>
> The "get_gendisk" calls into md_alloc (via md_probe) and then add_disk(),
> which generates a uevent which is handled by udev.
> And before the above code gets to the mutex_lock_nexted(), the process run by
> udev must have opened the device (executing all that code above and more) and
> issued the ioctl.
>
> I guess it is possible, but happening every time to produce a live-lock
> suggests that the scheduler must be encouraging that behaviour.  Presumably
> this is a virtual machine with just one CPU ??

add_disk() will call schedule via call_usermode_helper() (for /sbin/hotplug), 
the scheduler will, I think, almost always choose to schedule the "udev" 
process instead of the process which did the first open to the md device, once 
the usermode helper has run.

SysRq + 'w' consistently showed a process that had called schedule() from 
md_alloc() -> add_disk() ... -> call_usermodehelper():


[   57.932671] test            D 805ac41c     0   748    745 0x00000000
[   57.939075] [<805ac41c>] (__schedule+0x33c/0x3b0) from [<805aa8f0>] 
(schedule_timeout+0x18/0x168)
[   57.947984] [<805aa8f0>] (schedule_timeout+0x18/0x168) from [<805abfa4>] 
(wait_for_common+0xf0/0x198)
[   57.957245] [<805abfa4>] (wait_for_common+0xf0/0x198) from [<80029e30>] 
(call_usermodehelper_exec+0xf8/0x15c)
[   57.967205] [<80029e30>] (call_usermodehelper_exec+0xf8/0x15c) from 
[<8028abd0>] (kobject_uevent_env+0x37c/0x3e8)
[   57.977515] [<8028abd0>] (kobject_uevent_env+0x37c/0x3e8) from [<8027cff8>] 
(add_disk+0x29c/0x400)
[   57.986520] [<8027cff8>] (add_disk+0x29c/0x400) from [<803bbcac>] 
(md_alloc+0x1cc/0x2cc)
[   57.994645] [<803bbcac>] (md_alloc+0x1cc/0x2cc) from [<803bbe38>] 
(md_probe+0xc/0x14)
[   58.002511] [<803bbe38>] (md_probe+0xc/0x14) from [<802ede10>] 
(kobj_lookup+0xd8/0x110)
[   58.010550] [<802ede10>] (kobj_lookup+0xd8/0x110) from [<8027cacc>] 
(get_gendisk+0x2c/0xe0)
[   58.018942] [<8027cacc>] (get_gendisk+0x2c/0xe0) from [<800c0598>] 
(__blkdev_get+0x28/0x364)
[   58.027416] [<800c0598>] (__blkdev_get+0x28/0x364) from [<800c0ab0>] 
(blkdev_get+0x1dc/0x318)
[   58.035983] [<800c0ab0>] (blkdev_get+0x1dc/0x318) from [<800918bc>] 
(do_dentry_open.isra.15+0x184/0x248)
[   58.045510] [<800918bc>] (do_dentry_open.isra.15+0x184/0x248) from 
[<800919a4>] (finish_open+0x24/0x38)
[   58.054945] [<800919a4>] (finish_open+0x24/0x38) from [<8009fb40>] 
(do_last+0x9a8/0xc08)
[   58.063071] [<8009fb40>] (do_last+0x9a8/0xc08) from [<8009ffdc>] 
(path_openat+0x23c/0x5c8)
[   58.071371] [<8009ffdc>] (path_openat+0x23c/0x5c8) from [<800a0620>] 
(do_filp_open+0x2c/0x78)
[   58.079935] [<800a0620>] (do_filp_open+0x2c/0x78) from [<800929b4>] 
(do_sys_open+0x124/0x1c4)
[   58.088498] [<800929b4>] (do_sys_open+0x124/0x1c4) from [<80009060>] 
(ret_fast_syscall+0x0/0x2c)

The system showing the livelock is a Marvell 88F6282 CPU (single core processor).

It took me some time start to suspect the problem was due to the md_ioctl() 
code :)

> I suppose the best fix is, as you suggest, to avoid clearing hold_active for
> invalid ioctls.  It feels a bit like papering over a bug, but I think the
> only way to really fix it is to add extra locking to the above code sequence
> and I don't want to do that.
>
> Of your two suggestions I much prefer the second.  It will be more code, but
> it is also more obviously correct.  The current code is rather messy with
> respect to invalid ioctl commands.
>
> I would be happy to accept a patch which aborted md_ioctl if the cmd wasn't
> one of those known to md.

I'll send a patch for that then.

Regards,


-- 
Nicolas Schichan
Freebox SAS

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] md: check command validity early in md_ioctl().
  2014-01-15  1:57 ` NeilBrown
  2014-01-15 13:11   ` Nicolas Schichan
@ 2014-01-15 15:58   ` Nicolas Schichan
  2014-01-15 21:55     ` NeilBrown
  1 sibling, 1 reply; 5+ messages in thread
From: Nicolas Schichan @ 2014-01-15 15:58 UTC (permalink / raw)
  To: Neil Brown, linux-raid, linux-kernel; +Cc: Nicolas Schichan

Verify that the cmd parameter passed to md_ioctl() is valid before
doing anything.

This fixes mddev->hold_active being set to 0 when an invalid ioctl
command is passed to md_ioctl() before the array has been configured.

Clearing mddev->hold_active in that case can lead to a livelock
situation when an invalid ioctl number is given to md_ioctl() by a
process when the mddev is currently being opened by another process:

Process 1				Process 2
---------				---------

md_alloc()
  mddev_find()
  -> returns a new mddev with
     hold_active == UNTIL_IOCTL
  add_disk()
  -> sends KOBJ_ADD uevent

					(sees KOBJ_ADD uevent for device)
                    			md_open()
                    			md_ioctl(INVALID_IOCTL)
                    			-> returns ENODEV and clears
                       			   mddev->hold_active
                    			md_release()
                      			md_put()
                      			-> deletes the mddev as
                         		   hold_active is 0

md_open()
  mddev_find()
  -> returns a newly
    allocated mddev with
    mddev->gendisk == NULL
-> returns with ERESTARTSYS
   (kernel restarts the open syscall)

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
---

A couple of notes:

This patch is based on linux 3.13-rc8.

The following  MD ioctl constants are  defined in md_u.h  but not used
anywhere else, so are not accepted as valid ioctl commands:

CLEAR_ARRAY
SET_DISK_INFO
WRITE_RAID_INFO
UNPROTECT_ARRAY
PROTECT_ARRAY


 drivers/md/md.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 21f4d7f..941ac65 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6328,6 +6328,32 @@ static int md_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 	return 0;
 }
 
+static inline bool md_ioctl_valid(unsigned int cmd)
+{
+	switch (cmd) {
+	case ADD_NEW_DISK:
+	case BLKROSET:
+	case GET_ARRAY_INFO:
+	case GET_BITMAP_FILE:
+	case GET_DISK_INFO:
+	case HOT_ADD_DISK:
+	case HOT_REMOVE_DISK:
+	case PRINT_RAID_DEBUG:
+	case RAID_AUTORUN:
+	case RAID_VERSION:
+	case RESTART_ARRAY_RW:
+	case RUN_ARRAY:
+	case SET_ARRAY_INFO:
+	case SET_BITMAP_FILE:
+	case SET_DISK_FAULTY:
+	case STOP_ARRAY:
+	case STOP_ARRAY_RO:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static int md_ioctl(struct block_device *bdev, fmode_t mode,
 			unsigned int cmd, unsigned long arg)
 {
@@ -6336,6 +6362,9 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
 	struct mddev *mddev = NULL;
 	int ro;
 
+	if (!md_ioctl_valid(cmd))
+		return -ENOTTY;
+
 	switch (cmd) {
 	case RAID_VERSION:
 	case GET_ARRAY_INFO:
-- 
1.8.3.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] md: check command validity early in md_ioctl().
  2014-01-15 15:58   ` [PATCH] md: check command validity early in md_ioctl() Nicolas Schichan
@ 2014-01-15 21:55     ` NeilBrown
  0 siblings, 0 replies; 5+ messages in thread
From: NeilBrown @ 2014-01-15 21:55 UTC (permalink / raw)
  To: Nicolas Schichan; +Cc: linux-raid, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3081 bytes --]

On Wed, 15 Jan 2014 16:58:52 +0100 Nicolas Schichan <nschichan@freebox.fr>
wrote:

> Verify that the cmd parameter passed to md_ioctl() is valid before
> doing anything.
> 
> This fixes mddev->hold_active being set to 0 when an invalid ioctl
> command is passed to md_ioctl() before the array has been configured.
> 
> Clearing mddev->hold_active in that case can lead to a livelock
> situation when an invalid ioctl number is given to md_ioctl() by a
> process when the mddev is currently being opened by another process:
> 
> Process 1				Process 2
> ---------				---------
> 
> md_alloc()
>   mddev_find()
>   -> returns a new mddev with
>      hold_active == UNTIL_IOCTL
>   add_disk()
>   -> sends KOBJ_ADD uevent
> 
> 					(sees KOBJ_ADD uevent for device)
>                     			md_open()
>                     			md_ioctl(INVALID_IOCTL)
>                     			-> returns ENODEV and clears
>                        			   mddev->hold_active
>                     			md_release()
>                       			md_put()
>                       			-> deletes the mddev as
>                          		   hold_active is 0
> 
> md_open()
>   mddev_find()
>   -> returns a newly
>     allocated mddev with
>     mddev->gendisk == NULL
> -> returns with ERESTARTSYS
>    (kernel restarts the open syscall)
> 
> Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
> ---
> 
> A couple of notes:
> 
> This patch is based on linux 3.13-rc8.
> 
> The following  MD ioctl constants are  defined in md_u.h  but not used
> anywhere else, so are not accepted as valid ioctl commands:
> 
> CLEAR_ARRAY
> SET_DISK_INFO
> WRITE_RAID_INFO
> UNPROTECT_ARRAY
> PROTECT_ARRAY
> 
> 
>  drivers/md/md.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 21f4d7f..941ac65 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6328,6 +6328,32 @@ static int md_getgeo(struct block_device *bdev, struct hd_geometry *geo)
>  	return 0;
>  }
>  
> +static inline bool md_ioctl_valid(unsigned int cmd)
> +{
> +	switch (cmd) {
> +	case ADD_NEW_DISK:
> +	case BLKROSET:
> +	case GET_ARRAY_INFO:
> +	case GET_BITMAP_FILE:
> +	case GET_DISK_INFO:
> +	case HOT_ADD_DISK:
> +	case HOT_REMOVE_DISK:
> +	case PRINT_RAID_DEBUG:
> +	case RAID_AUTORUN:
> +	case RAID_VERSION:
> +	case RESTART_ARRAY_RW:
> +	case RUN_ARRAY:
> +	case SET_ARRAY_INFO:
> +	case SET_BITMAP_FILE:
> +	case SET_DISK_FAULTY:
> +	case STOP_ARRAY:
> +	case STOP_ARRAY_RO:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +
>  static int md_ioctl(struct block_device *bdev, fmode_t mode,
>  			unsigned int cmd, unsigned long arg)
>  {
> @@ -6336,6 +6362,9 @@ static int md_ioctl(struct block_device *bdev, fmode_t mode,
>  	struct mddev *mddev = NULL;
>  	int ro;
>  
> +	if (!md_ioctl_valid(cmd))
> +		return -ENOTTY;
> +
>  	switch (cmd) {
>  	case RAID_VERSION:
>  	case GET_ARRAY_INFO:


Patch applied - thanks!

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-01-15 21:55 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-14 17:14 livelock during MD device open Nicolas Schichan
2014-01-15  1:57 ` NeilBrown
2014-01-15 13:11   ` Nicolas Schichan
2014-01-15 15:58   ` [PATCH] md: check command validity early in md_ioctl() Nicolas Schichan
2014-01-15 21:55     ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).