linux-scsi.vger.kernel.org archive mirror
From: Brian King <brking@us.ibm.com>
To: Olaf Hering <olh@suse.de>
Cc: James Bottomley <James.Bottomley@SteelEye.com>,
	linux-scsi@vger.kernel.org
Subject: Re: 2.6.16-rc1 crash in scsi_target_reap_work
Date: Wed, 22 Feb 2006 08:38:03 -0600
Message-ID: <43FC774B.8050301@us.ibm.com>
In-Reply-To: <20060222083657.GA24802@suse.de>

Olaf Hering wrote:
>  On Mon, Feb 20, Brian King wrote:
> 
>> Olaf Hering wrote:
>>> 1:mon> d c0000000024cacc8
>>> c0000000024cacc8 00000000dead4ead ffffffff00000000  |......N.........|
>>> c0000000024cacd8 ffffffffffffffff c0000000024cace0  |.............L..|
>>> c0000000024cace8 c0000000024cace0 c000000000614f68  |.....L.......aOh|
>>> c0000000024cacf8 c000000000614f38 0000000000000000  |.....aO8........|
>>> c0000000024cad08 0000000000000000 0000000000000000  |................|
>>> c0000000024cad18 0000000000000000 0000000000000000  |................|
>>> c0000000024cad28 0000000000000000 0000000000000000  |................|
>>> c0000000024cad38 0000000000000000 0000000000000000  |................|
>>> c0000000024cad48 0000000000000000 0000000000000000  |................|
>>> c0000000024cad58 0000000000000000 0000000000000000  |................|
>>> c0000000024cad68 0000000000000000 0000000000000000  |................|
>>> c0000000024cad78 0000000000000000 0000000000000000  |................|
>>> c0000000024cad88 0000000000000000 0000000000000000  |................|
>>> c0000000024cad98 0000000000000000 0000000000000000  |................|
>>> c0000000024cada8 0000000000000000 0000000000000000  |................|
>>> c0000000024cadb8 0000000000000000 0000000000000000  |................|
>>> c0000000024cadc8 0000000000000000 0000000000000000  |................|
>>> c0000000024cadd8 0000000000000000 0000000000000000  |................|
>> I've now seen a couple recreates of this problem on various systems in
>> our labs, and there are always a bunch of zeroes in the struct device
>> in the same place as above. I wonder if perhaps the call to device_add
>> is failing in scsi_alloc_target. Failure of this call is not being handled
>> today. Can you give the attached patch a try? 
> 
> This fixes it, tested with plain rc3. Lots of -EEXIST, I wonder if the real bug is elsewhere.

I would guess that the -EEXIST is coming from:

create_dir        (fs/sysfs/dir.c)
sysfs_create_dir
create_dir        (lib/kobject.c)
kobject_add
device_add

Looking at the scsi_target reap code, there appears to be a race condition. The
target is removed from the host's list of targets under the host lock, and then the
host lock is released. If another thread tries to add the same target while it is
being torn down in this window (before device_del has run), device_add will fail
with -EEXIST, since the sysfs directory for the old device still exists.

Any reason we can't protect the target reaping code from this by grabbing the
host's scan_mutex?


Brian

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

Thread overview: 24+ messages
2006-01-17  0:05 2.6.15-git12, slab corruption in ipr Olaf Hering
2006-01-18 18:42 ` Brian King
2006-01-19 21:05   ` Olaf Hering
2006-01-30 10:46     ` Olaf Hering
2006-01-30 16:49       ` Olaf Hering
2006-02-06 22:04         ` 2.6.16-rc1 crash in scsi_target_reap_work Olaf Hering
2006-02-06 22:26           ` Olaf Hering
2006-02-06 22:44           ` James Bottomley
2006-02-09 20:05             ` Olaf Hering
2006-02-10 10:11               ` Olaf Hering
2006-02-10 14:04                 ` James Bottomley
2006-02-10 14:10                   ` Olaf Hering
2006-02-10 23:01                     ` Olaf Hering
2006-02-10 23:21                       ` Brian King
2006-02-10 23:29                         ` Olaf Hering
2006-02-11 10:34                           ` Olaf Hering
2006-02-20 23:00                           ` Brian King
2006-02-22  8:36                             ` Olaf Hering
2006-02-22 14:38                               ` Brian King [this message]
2006-02-22 15:53                                 ` Olaf Hering
2006-02-22 16:47                                 ` Mike Anderson
2006-02-22 17:05                                   ` James Bottomley
2006-02-10 21:28                   ` Brian King
2006-01-30 18:07 ` 2.6.15-git12, slab corruption in ipr Olaf Hering
