linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicolas Schichan <nschichan@freebox.fr>
To: NeilBrown <neilb@suse.de>
Cc: LKML <linux-kernel@vger.kernel.org>, linux-raid@vger.kernel.org
Subject: Re: livelock during MD device open
Date: Wed, 15 Jan 2014 14:11:26 +0100	[thread overview]
Message-ID: <52D688FE.2030908@freebox.fr> (raw)
In-Reply-To: <20140115125740.160e8998@notabene.brown>

On 01/15/2014 02:57 AM, NeilBrown wrote:
[...]
> That's a very small race you are consistently losing - if I understand
> correctly.
>
> In __blkdev_get:
>
>   restart:
>
> 	ret = -ENXIO;
> 	disk = get_gendisk(bdev->bd_dev, &partno);
> 	if (!disk)
> 		goto out;
> 	owner = disk->fops->owner;
>
> 	disk_block_events(disk);
> 	mutex_lock_nested(&bdev->bd_mutex, for_part);
>
>
> The "get_gendisk" calls into md_alloc (via md_probe) and then add_disk(),
> which generates a uevent which is handled by udev.
> And before the above code gets to the mutex_lock_nexted(), the process run by
> udev must have opened the device (executing all that code above and more) and
> issued the ioctl.
>
> I guess it is possible, but happening every time to produce a live-lock
> suggests that the scheduler must be encouraging that behaviour.  Presumably
> this is a virtual machine with just one CPU ??

add_disk() will call schedule via call_usermode_helper() (for /sbin/hotplug), 
the scheduler will, I think, almost always choose to schedule the "udev" 
process instead of the process which did the first open to the md device, once 
the usermode helper has run.

SysRq + 'w' consistently showed a process that had called schedule() from 
md_alloc() -> add_disk() ... -> call_usermodehelper():


[   57.932671] test            D 805ac41c     0   748    745 0x00000000
[   57.939075] [<805ac41c>] (__schedule+0x33c/0x3b0) from [<805aa8f0>] 
(schedule_timeout+0x18/0x168)
[   57.947984] [<805aa8f0>] (schedule_timeout+0x18/0x168) from [<805abfa4>] 
(wait_for_common+0xf0/0x198)
[   57.957245] [<805abfa4>] (wait_for_common+0xf0/0x198) from [<80029e30>] 
(call_usermodehelper_exec+0xf8/0x15c)
[   57.967205] [<80029e30>] (call_usermodehelper_exec+0xf8/0x15c) from 
[<8028abd0>] (kobject_uevent_env+0x37c/0x3e8)
[   57.977515] [<8028abd0>] (kobject_uevent_env+0x37c/0x3e8) from [<8027cff8>] 
(add_disk+0x29c/0x400)
[   57.986520] [<8027cff8>] (add_disk+0x29c/0x400) from [<803bbcac>] 
(md_alloc+0x1cc/0x2cc)
[   57.994645] [<803bbcac>] (md_alloc+0x1cc/0x2cc) from [<803bbe38>] 
(md_probe+0xc/0x14)
[   58.002511] [<803bbe38>] (md_probe+0xc/0x14) from [<802ede10>] 
(kobj_lookup+0xd8/0x110)
[   58.010550] [<802ede10>] (kobj_lookup+0xd8/0x110) from [<8027cacc>] 
(get_gendisk+0x2c/0xe0)
[   58.018942] [<8027cacc>] (get_gendisk+0x2c/0xe0) from [<800c0598>] 
(__blkdev_get+0x28/0x364)
[   58.027416] [<800c0598>] (__blkdev_get+0x28/0x364) from [<800c0ab0>] 
(blkdev_get+0x1dc/0x318)
[   58.035983] [<800c0ab0>] (blkdev_get+0x1dc/0x318) from [<800918bc>] 
(do_dentry_open.isra.15+0x184/0x248)
[   58.045510] [<800918bc>] (do_dentry_open.isra.15+0x184/0x248) from 
[<800919a4>] (finish_open+0x24/0x38)
[   58.054945] [<800919a4>] (finish_open+0x24/0x38) from [<8009fb40>] 
(do_last+0x9a8/0xc08)
[   58.063071] [<8009fb40>] (do_last+0x9a8/0xc08) from [<8009ffdc>] 
(path_openat+0x23c/0x5c8)
[   58.071371] [<8009ffdc>] (path_openat+0x23c/0x5c8) from [<800a0620>] 
(do_filp_open+0x2c/0x78)
[   58.079935] [<800a0620>] (do_filp_open+0x2c/0x78) from [<800929b4>] 
(do_sys_open+0x124/0x1c4)
[   58.088498] [<800929b4>] (do_sys_open+0x124/0x1c4) from [<80009060>] 
(ret_fast_syscall+0x0/0x2c)

The system showing the livelock is a Marvell 88F6282 CPU (single core processor).

It took me some time start to suspect the problem was due to the md_ioctl() 
code :)

> I suppose the best fix is, as you suggest, to avoid clearing hold_active for
> invalid ioctls.  It feels a bit like papering over a bug, but I think the
> only way to really fix it is to add extra locking to the above code sequence
> and I don't want to do that.
>
> Of your two suggestions I much prefer the second.  It will be more code, but
> it is also more obviously correct.  The current code is rather messy with
> respect to invalid ioctl commands.
>
> I would be happy to accept a patch which aborted md_ioctl if the cmd wasn't
> one of those known to md.

I'll send a patch for that then.

Regards,


-- 
Nicolas Schichan
Freebox SAS

  reply	other threads:[~2014-01-15 13:11 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-14 17:14 livelock during MD device open Nicolas Schichan
2014-01-15  1:57 ` NeilBrown
2014-01-15 13:11   ` Nicolas Schichan [this message]
2014-01-15 15:58   ` [PATCH] md: check command validity early in md_ioctl() Nicolas Schichan
2014-01-15 21:55     ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52D688FE.2030908@freebox.fr \
    --to=nschichan@freebox.fr \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).