All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Ed Cashin <ecashin@coraid.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/7] aoe: avoid races between device destruction and discovery
Date: Tue, 4 Dec 2012 15:45:34 -0800	[thread overview]
Message-ID: <20121204154534.7fb65f2b.akpm@linux-foundation.org> (raw)
In-Reply-To: <d3095fd2ad59a9e1fffe9208c34e8b35e7a8b5e5.1354501189.git.ecashin@coraid.com>

On Mon, 3 Dec 2012 20:42:56 -0500
Ed Cashin <ecashin@coraid.com> wrote:

> This change avoids a race that could result in a NULL pointer
> derference following a WARNing from kobject_add_internal, "don't
> try to register things with the same name in the same directory."
> 
> The problem was found with a test that forgets and discovers an
> aoe device in a loop:
> 
>   while test ! -r /tmp/stop; do
> 	aoe-flush -a
> 	aoe-discover
>   done
> 
> The race was between aoedev_flush taking aoedevs out of the
> devlist, allowing a new discovery of the same AoE target to take
> place before the driver gets around to calling
> sysfs_remove_group.  Fixing that one revealed another race
> between do_open and add_disk, and this patch avoids that, too.
> 
> The fix required some care, because for flushing (forgetting) an
> aoedev, some of the steps must be performed under lock and some
> must be able to sleep.  Also, for discovering a new aoedev, some
> steps might sleep.
> 
> The check for a bad aoedev pointer remains from a time when about
> half of this patch was done, and it was possible for the
> bdev->bd_disk->private_data to become corrupted.  The check
> should be removed eventually, but it is not expected to add
> significant overhead, occurring in the aoeblk_open routine.
> 
>
> ...
>
> --- a/drivers/block/aoe/aoeblk.c
> +++ b/drivers/block/aoe/aoeblk.c
> @@ -147,9 +147,18 @@ aoeblk_open(struct block_device *bdev, fmode_t mode)
>  	struct aoedev *d = bdev->bd_disk->private_data;
>  	ulong flags;
>  
> +	if (!virt_addr_valid(d)) {
> +		pr_crit("aoe: invalid device pointer in %s\n",
> +			__func__);
> +		WARN_ON(1);
> +		return -ENODEV;
> +	}

Can this ever happen?

> +	if (!(d->flags & DEVFL_UP) || d->flags & DEVFL_TKILL)
> +		return -ENODEV;
> +
>  	mutex_lock(&aoeblk_mutex);
>  	spin_lock_irqsave(&d->lock, flags);
> -	if (d->flags & DEVFL_UP) {
> +	if (d->flags & DEVFL_UP && !(d->flags & DEVFL_TKILL)) {
>  		d->nopen++;
>  		spin_unlock_irqrestore(&d->lock, flags);
>  		mutex_unlock(&aoeblk_mutex);
> @@ -259,6 +268,18 @@ aoeblk_gdalloc(void *vp)
>  	struct request_queue *q;
>  	enum { KB = 1024, MB = KB * KB, READ_AHEAD = 2 * MB, };
>  	ulong flags;
> +	int late = 0;
> +
> +	spin_lock_irqsave(&d->lock, flags);
> +	if (d->flags & DEVFL_GDALLOC
> +	&& !(d->flags & DEVFL_TKILL)
> +	&& !(d->flags & DEVFL_GD_NOW))

That's pretty sickly-looking code layout.

We often do

	if ((d->flags & (DEVFL_GDALLOC|DEVFL_TKILL|DEVFL_GD_NOW)) ==
		DEVFL_GDALLOC)

in these cases.

> +		d->flags |= DEVFL_GD_NOW;
> +	else
> +		late = 1;
> +	spin_unlock_irqrestore(&d->lock, flags);
> +	if (late)
> +		return;
>  
>  	gd = alloc_disk(AOE_PARTITIONS);
>  	if (gd == NULL) {
>
> ...
>


  reply	other threads:[~2012-12-04 23:45 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1354501189.git.ecashin@coraid.com>
2012-12-04  1:40 ` [PATCH 1/7] aoe: improve handling of misbehaving network paths Ed Cashin, Ed Cashin
2012-12-04 23:39   ` Andrew Morton
2012-12-06 16:12     ` Ed Cashin
2012-12-04  1:42 ` [PATCH 2/7] aoe: avoid races between device destruction and discovery Ed Cashin, Ed Cashin
2012-12-04 23:45   ` Andrew Morton [this message]
2012-12-06 16:16     ` Ed Cashin
2012-12-04  1:44 ` [PATCH 3/7] aoe: use dynamic number of remote ports for AoE storage target Ed Cashin, Ed Cashin
2012-12-04  1:46 ` [PATCH 4/7] aoe: allow user to disable target failure timeout Ed Cashin, Ed Cashin
2012-12-04  1:48 ` [PATCH 5/7] aoe: allow comma separator in aoe_iflist value Ed Cashin, Ed Cashin
2012-12-04  1:50 ` [PATCH 6/7] aoe: identify source of runt AoE packets Ed Cashin, Ed Cashin
2012-12-04  1:53 ` [PATCH 7/7] aoe: update internal version number to 81 Ed Cashin, Ed Cashin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121204154534.7fb65f2b.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=ecashin@coraid.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.