* [PATCH v2 1/2] sysfs: handle 'parent deleted before child added'
@ 2012-04-06 20:41 Dan Williams
2012-04-06 20:41 ` [PATCH v2 2/2] sysfs: provide more diagnostic info for kobject_add_internal() failures Dan Williams
2012-04-06 20:45 ` [PATCH v2 1/2] sysfs: handle 'parent deleted before child added' Greg KH
0 siblings, 2 replies; 7+ messages in thread
From: Dan Williams @ 2012-04-06 20:41 UTC (permalink / raw)
To: gregkh; +Cc: James Bottomley, linux-kernel, linux-scsi
In scsi at least two cases of the parent device being deleted before the
child is added have been observed.
1/ scsi is performing async scans and the device is removed prior to the
async can thread running (can happen with an in-opportune / unlikely
unplug during initial scan).
2/ libsas discovery event running after the parent port has been torn
down (this is a bug in libsas).
Result in crash signatures like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
...
Process scsi_scan_8 (pid: 5417, threadinfo ffff88080bd16000, task ffff880801b8a0b0)
Stack:
00000000fffffffe ffff880813470628 ffff88080bd17cd0 ffff88080614b7e8
ffff88080b45c108 00000000fffffffe ffff88080bd17d20 ffffffff8125e4a8
ffff88080bd17cf0 ffffffff81075149 ffff88080bd17d30 ffff88080614b7e8
Call Trace:
[<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
[<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff8125e641>] kobject_add_varg+0x41/0x50
[<ffffffff8125e70b>] kobject_add+0x64/0x66
[<ffffffff8131122b>] device_add+0x12d/0x63a
In this scenario the parent is still valid (because we have a
reference), but it has been device_del()'d which means its kobj->sd
pointer is NULL'd via:
device_del()->kobject_del()->sysfs_remove_dir()
...and then sysfs_create_dir() (without this fix) goes ahead and
de-references parent_sd via sysfs_ns_type():
return (sd->s_flags & SYSFS_NS_TYPE_MASK) >> SYSFS_NS_TYPE_SHIFT;
This scenario is being fixed in scsi/libsas, but if other subsystems
present the same ordering the system need not immediately crash.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Changes since v1: http://marc.info/?l=linux-scsi&m=133239707303441&w=2
1/ split out the kobject_add_internal() warning message changes into
its own patch
2/ clarified the changelog a bit due to the discussion from v1.
I think this patch can wait until 3.5, since it is just closing a hole I
found along the way to fixing the real bug. Now, the system won't crash
at device_add() time, but the system may still be de-stabilized if the
caller does not handle device_add() errors properly.
fs/sysfs/dir.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 7fdf6a7..86521ee 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -714,6 +714,9 @@ int sysfs_create_dir(struct kobject * kobj)
else
parent_sd = &sysfs_root;
+ if (!parent_sd)
+ return -ENOENT;
+
if (sysfs_ns_type(parent_sd))
ns = kobj->ktype->namespace(kobj);
type = sysfs_read_ns_type(kobj);
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 2/2] sysfs: provide more diagnostic info for kobject_add_internal() failures
2012-04-06 20:41 [PATCH v2 1/2] sysfs: handle 'parent deleted before child added' Dan Williams
@ 2012-04-06 20:41 ` Dan Williams
2012-04-06 20:45 ` [PATCH v2 1/2] sysfs: handle 'parent deleted before child added' Greg KH
1 sibling, 0 replies; 7+ messages in thread
From: Dan Williams @ 2012-04-06 20:41 UTC (permalink / raw)
To: gregkh; +Cc: James Bottomley, linux-kernel, linux-scsi
1/ convert open-coded KERN_ERR+dump_stack() to WARN(), so that automated
tools pick up this warning.
2/ include the 'child' and 'parent' kobject names. This information was
useful for tracking down the case where scsi invoked device_del() on a
parent object and subsequently invoked device_add() on a child. Now the
warning looks like:
kobject_add_internal failed for target8:0:16 (error: -2 parent: end_device-8:0:24)
Pid: 2942, comm: scsi_scan_8 Not tainted 3.3.0-rc7-isci+ #2
Call Trace:
[<ffffffff8125e551>] kobject_add_internal+0x1c1/0x1f3
[<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff8125e659>] kobject_add_varg+0x41/0x50
[<ffffffff8125e723>] kobject_add+0x64/0x66
[<ffffffff8131124b>] device_add+0x12d/0x63a
[<ffffffff8125e0ef>] ? kobject_put+0x4c/0x50
[<ffffffff8132f370>] scsi_sysfs_add_sdev+0x4e/0x28a
[<ffffffff8132dce3>] do_scan_async+0x9c/0x145
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: James Bottomley <JBottomley@parallels.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
lib/kobject.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/lib/kobject.c b/lib/kobject.c
index c33d7a1..1bd0893 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -192,14 +192,14 @@ static int kobject_add_internal(struct kobject *kobj)
/* be noisy on error issues */
if (error == -EEXIST)
- printk(KERN_ERR "%s failed for %s with "
- "-EEXIST, don't try to register things with "
- "the same name in the same directory.\n",
- __func__, kobject_name(kobj));
+ WARN(1, "%s failed for %s with "
+ "-EEXIST, don't try to register things with "
+ "the same name in the same directory.\n",
+ __func__, kobject_name(kobj));
else
- printk(KERN_ERR "%s failed for %s (%d)\n",
- __func__, kobject_name(kobj), error);
- dump_stack();
+ WARN(1, "%s failed for %s (error: %d parent: %s)\n",
+ __func__, kobject_name(kobj), error,
+ parent ? kobject_name(parent) : "'none'");
} else
kobj->state_in_sysfs = 1;
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] sysfs: handle 'parent deleted before child added'
2012-04-06 20:41 [PATCH v2 1/2] sysfs: handle 'parent deleted before child added' Dan Williams
2012-04-06 20:41 ` [PATCH v2 2/2] sysfs: provide more diagnostic info for kobject_add_internal() failures Dan Williams
@ 2012-04-06 20:45 ` Greg KH
2012-04-06 21:06 ` Williams, Dan J
1 sibling, 1 reply; 7+ messages in thread
From: Greg KH @ 2012-04-06 20:45 UTC (permalink / raw)
To: Dan Williams; +Cc: James Bottomley, linux-kernel, linux-scsi
On Fri, Apr 06, 2012 at 01:41:06PM -0700, Dan Williams wrote:
> In scsi at least two cases of the parent device being deleted before the
> child is added have been observed.
>
> 1/ scsi is performing async scans and the device is removed prior to the
> async can thread running (can happen with an in-opportune / unlikely
> unplug during initial scan).
That sounds like a bug in the scsi code, doesn't it?
> 2/ libsas discovery event running after the parent port has been torn
> down (this is a bug in libsas).
Is this fixed somewhere?
I don't want to paper over bugs like this by changing the sysfs core.
We went through this a lot years ago when scsi changed to use the driver
core, and I thought we had fixed all of these types of errors properly.
So, any chance to fix these properly as well?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] sysfs: handle 'parent deleted before child added'
2012-04-06 20:45 ` [PATCH v2 1/2] sysfs: handle 'parent deleted before child added' Greg KH
@ 2012-04-06 21:06 ` Williams, Dan J
2012-04-06 21:17 ` Greg KH
0 siblings, 1 reply; 7+ messages in thread
From: Williams, Dan J @ 2012-04-06 21:06 UTC (permalink / raw)
To: Greg KH; +Cc: James Bottomley, linux-kernel, linux-scsi
On Fri, Apr 6, 2012 at 1:45 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Fri, Apr 06, 2012 at 01:41:06PM -0700, Dan Williams wrote:
>> In scsi at least two cases of the parent device being deleted before the
>> child is added have been observed.
>>
>> 1/ scsi is performing async scans and the device is removed prior to the
>> async can thread running (can happen with an in-opportune / unlikely
>> unplug during initial scan).
>
> That sounds like a bug in the scsi code, doesn't it?
>
>> 2/ libsas discovery event running after the parent port has been torn
>> down (this is a bug in libsas).
>
> Is this fixed somewhere?
Yes, these two issues have pending fixes that are posted to linux-scsi:
http://marc.info/?l=linux-scsi&m=133239707903443&w=2
http://marc.info/?l=linux-scsi&m=133239709603452&w=2
> I don't want to paper over bugs like this by changing the sysfs core.
> We went through this a lot years ago when scsi changed to use the driver
> core, and I thought we had fixed all of these types of errors properly.
Hotplug lifetime rules are still transport specific. So in this case
scsi-core is innocent these are bugs from libsas and
scsi_transport_sas.
> So, any chance to fix these properly as well?
This patch doesn't really paper over anything. It turns a NULL
pointer crash into an explicit warning from kobject_add_internal. For
the libsas/scsi case this device_add() failure is still fatal.
Regardless of whether sysfs changes the above two fixes are still
required.
Since the -EEXIST case is just a KERN_ERR and not a BUG_ON I figured
it was worthwhile to post a patch to do the same for this 'parent
deleted' case. But if crashing is the expectation then this patch can
be dropped.
--
Dan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] sysfs: handle 'parent deleted before child added'
2012-04-06 21:06 ` Williams, Dan J
@ 2012-04-06 21:17 ` Greg KH
2012-04-06 21:44 ` Williams, Dan J
0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2012-04-06 21:17 UTC (permalink / raw)
To: Williams, Dan J; +Cc: James Bottomley, linux-kernel, linux-scsi
On Fri, Apr 06, 2012 at 02:06:50PM -0700, Williams, Dan J wrote:
> On Fri, Apr 6, 2012 at 1:45 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Fri, Apr 06, 2012 at 01:41:06PM -0700, Dan Williams wrote:
> >> In scsi at least two cases of the parent device being deleted before the
> >> child is added have been observed.
> >>
> >> 1/ scsi is performing async scans and the device is removed prior to the
> >> async can thread running (can happen with an in-opportune / unlikely
> >> unplug during initial scan).
> >
> > That sounds like a bug in the scsi code, doesn't it?
> >
> >> 2/ libsas discovery event running after the parent port has been torn
> >> down (this is a bug in libsas).
> >
> > Is this fixed somewhere?
>
> Yes, these two issues have pending fixes that are posted to linux-scsi:
>
> http://marc.info/?l=linux-scsi&m=133239707903443&w=2
> http://marc.info/?l=linux-scsi&m=133239709603452&w=2
>
> > I don't want to paper over bugs like this by changing the sysfs core.
> > We went through this a lot years ago when scsi changed to use the driver
> > core, and I thought we had fixed all of these types of errors properly.
>
> Hotplug lifetime rules are still transport specific. So in this case
> scsi-core is innocent these are bugs from libsas and
> scsi_transport_sas.
Ok, thanks for the explaination.
> > So, any chance to fix these properly as well?
>
> This patch doesn't really paper over anything. It turns a NULL
> pointer crash into an explicit warning from kobject_add_internal. For
> the libsas/scsi case this device_add() failure is still fatal.
> Regardless of whether sysfs changes the above two fixes are still
> required.
>
> Since the -EEXIST case is just a KERN_ERR and not a BUG_ON I figured
> it was worthwhile to post a patch to do the same for this 'parent
> deleted' case. But if crashing is the expectation then this patch can
> be dropped.
No, crashing is not the expectation :)
But, without that crash, would the above fixes ever have been noticed
and fixed? The device_add() most likely would have quietly failed and
no one would have been the wiser.
Or would something else have caused this to be an obvious problem?
thanks,
greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] sysfs: handle 'parent deleted before child added'
2012-04-06 21:17 ` Greg KH
@ 2012-04-06 21:44 ` Williams, Dan J
2012-04-06 22:11 ` Greg KH
0 siblings, 1 reply; 7+ messages in thread
From: Williams, Dan J @ 2012-04-06 21:44 UTC (permalink / raw)
To: Greg KH; +Cc: James Bottomley, linux-kernel, linux-scsi
On Fri, Apr 6, 2012 at 2:17 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Fri, Apr 06, 2012 at 02:06:50PM -0700, Williams, Dan J wrote:
>> On Fri, Apr 6, 2012 at 1:45 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>> > On Fri, Apr 06, 2012 at 01:41:06PM -0700, Dan Williams wrote:
>> >> In scsi at least two cases of the parent device being deleted before the
>> >> child is added have been observed.
>> >>
>> >> 1/ scsi is performing async scans and the device is removed prior to the
>> >> async can thread running (can happen with an in-opportune / unlikely
>> >> unplug during initial scan).
>> >
>> > That sounds like a bug in the scsi code, doesn't it?
>> >
>> >> 2/ libsas discovery event running after the parent port has been torn
>> >> down (this is a bug in libsas).
>> >
>> > Is this fixed somewhere?
>>
>> Yes, these two issues have pending fixes that are posted to linux-scsi:
>>
>> http://marc.info/?l=linux-scsi&m=133239707903443&w=2
>> http://marc.info/?l=linux-scsi&m=133239709603452&w=2
>>
>> > I don't want to paper over bugs like this by changing the sysfs core.
>> > We went through this a lot years ago when scsi changed to use the driver
>> > core, and I thought we had fixed all of these types of errors properly.
>>
>> Hotplug lifetime rules are still transport specific. So in this case
>> scsi-core is innocent these are bugs from libsas and
>> scsi_transport_sas.
>
> Ok, thanks for the explaination.
>
>> > So, any chance to fix these properly as well?
>>
>> This patch doesn't really paper over anything. It turns a NULL
>> pointer crash into an explicit warning from kobject_add_internal. For
>> the libsas/scsi case this device_add() failure is still fatal.
>> Regardless of whether sysfs changes the above two fixes are still
>> required.
>>
>> Since the -EEXIST case is just a KERN_ERR and not a BUG_ON I figured
>> it was worthwhile to post a patch to do the same for this 'parent
>> deleted' case. But if crashing is the expectation then this patch can
>> be dropped.
>
> No, crashing is not the expectation :)
I thought not, but sometimes the kernel likes to teach people that
bollix an api a hard lesson :).
>
> But, without that crash, would the above fixes ever have been noticed
> and fixed? The device_add() most likely would have quietly failed and
> no one would have been the wiser.
>
> Or would something else have caused this to be an obvious problem?
>
We still have the big red flag dump_stack() in kobject_add_internal()
(which patch 2 turns into a real WARN()), and for scsi our hotplug
tests still crash later on because libsas makes assumptions about the
device path. I understand the paranoia here, "check for NULL" is
usually a band-aid, but in this case this is just a softer
introduction to a debug session. No less vocal than before as far as
I can see.
--
Dan
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] sysfs: handle 'parent deleted before child added'
2012-04-06 21:44 ` Williams, Dan J
@ 2012-04-06 22:11 ` Greg KH
0 siblings, 0 replies; 7+ messages in thread
From: Greg KH @ 2012-04-06 22:11 UTC (permalink / raw)
To: Williams, Dan J; +Cc: James Bottomley, linux-kernel, linux-scsi
On Fri, Apr 06, 2012 at 02:44:37PM -0700, Williams, Dan J wrote:
> On Fri, Apr 6, 2012 at 2:17 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Fri, Apr 06, 2012 at 02:06:50PM -0700, Williams, Dan J wrote:
> >> On Fri, Apr 6, 2012 at 1:45 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> >> > On Fri, Apr 06, 2012 at 01:41:06PM -0700, Dan Williams wrote:
> >> >> In scsi at least two cases of the parent device being deleted before the
> >> >> child is added have been observed.
> >> >>
> >> >> 1/ scsi is performing async scans and the device is removed prior to the
> >> >> async can thread running (can happen with an in-opportune / unlikely
> >> >> unplug during initial scan).
> >> >
> >> > That sounds like a bug in the scsi code, doesn't it?
> >> >
> >> >> 2/ libsas discovery event running after the parent port has been torn
> >> >> down (this is a bug in libsas).
> >> >
> >> > Is this fixed somewhere?
> >>
> >> Yes, these two issues have pending fixes that are posted to linux-scsi:
> >>
> >> http://marc.info/?l=linux-scsi&m=133239707903443&w=2
> >> http://marc.info/?l=linux-scsi&m=133239709603452&w=2
> >>
> >> > I don't want to paper over bugs like this by changing the sysfs core.
> >> > We went through this a lot years ago when scsi changed to use the driver
> >> > core, and I thought we had fixed all of these types of errors properly.
> >>
> >> Hotplug lifetime rules are still transport specific. So in this case
> >> scsi-core is innocent these are bugs from libsas and
> >> scsi_transport_sas.
> >
> > Ok, thanks for the explaination.
> >
> >> > So, any chance to fix these properly as well?
> >>
> >> This patch doesn't really paper over anything. It turns a NULL
> >> pointer crash into an explicit warning from kobject_add_internal. For
> >> the libsas/scsi case this device_add() failure is still fatal.
> >> Regardless of whether sysfs changes the above two fixes are still
> >> required.
> >>
> >> Since the -EEXIST case is just a KERN_ERR and not a BUG_ON I figured
> >> it was worthwhile to post a patch to do the same for this 'parent
> >> deleted' case. But if crashing is the expectation then this patch can
> >> be dropped.
> >
> > No, crashing is not the expectation :)
>
> I thought not, but sometimes the kernel likes to teach people that
> bollix an api a hard lesson :).
>
> >
> > But, without that crash, would the above fixes ever have been noticed
> > and fixed? The device_add() most likely would have quietly failed and
> > no one would have been the wiser.
> >
> > Or would something else have caused this to be an obvious problem?
> >
>
> We still have the big red flag dump_stack() in kobject_add_internal()
> (which patch 2 turns into a real WARN()), and for scsi our hotplug
> tests still crash later on because libsas makes assumptions about the
> device path. I understand the paranoia here, "check for NULL" is
> usually a band-aid, but in this case this is just a softer
> introduction to a debug session. No less vocal than before as far as
> I can see.
Ok, that's reasonable, I'll queue this up for 3.5.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-04-06 22:11 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-04-06 20:41 [PATCH v2 1/2] sysfs: handle 'parent deleted before child added' Dan Williams
2012-04-06 20:41 ` [PATCH v2 2/2] sysfs: provide more diagnostic info for kobject_add_internal() failures Dan Williams
2012-04-06 20:45 ` [PATCH v2 1/2] sysfs: handle 'parent deleted before child added' Greg KH
2012-04-06 21:06 ` Williams, Dan J
2012-04-06 21:17 ` Greg KH
2012-04-06 21:44 ` Williams, Dan J
2012-04-06 22:11 ` Greg KH
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).