* Re: [ata_scsi_offline_dev] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
[not found] ` <20171106225354.6ucl4f4ipsjlntzl@wfg-t540p.sh.intel.com>
@ 2017-11-06 23:12 ` Linus Torvalds
2017-11-07 0:12 ` Tejun Heo
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Linus Torvalds @ 2017-11-06 23:12 UTC (permalink / raw)
To: Fengguang Wu
Cc: IDE-ML, Christoph Hellwig, Tejun Heo, Hannes Reinecke,
Linux Kernel Mailing List, Johannes Thumshirn, Martin K. Petersen,
linux-scsi, James Bottomley
On Mon, Nov 6, 2017 at 2:53 PM, Fengguang Wu <fengguang.wu@intel.com> wrote:
>
> The same dmesg happen to contain another libata related bug. Attached again.
> It's rare and in the error handling path, so unlikely a new regression.
>
> [ 49.608280] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
> [ 49.647821] mutex_lock+0x20/0x50
> [ 49.651443] kernfs_find_and_get_ns+0x23/0x60
> [ 49.656104] sysfs_notify+0x77/0x90
> [ 49.659900] scsi_device_set_state+0x63/0x150
> [ 49.664559] ata_scsi_offline_dev+0x1c/0x30 [libata]
> [ 49.669817] ata_eh_detach_dev+0x3b/0xb0 [libata]
ata_eh_detach_dev() does
spin_lock_irqsave(ap->lock, flags);
and then does
if (ata_scsi_offline_dev(dev)) {
dev->flags |= ATA_DFLAG_DETACHED;
ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
}
inside that spinlock. And this code is not new - it has done it since
2006 or so.
But it does seem to be a new regression in 4.14, caused by commit
8a97712e5314 ("scsi: make 'state' device attribute pollable"), because
that's what added the sysfs_notify() call to scsi_device_set_state(),
which made that spinlock be a problem.
That commit came in through the SCSI merge this merge window, and it
seems to still revert cleanly.
So I do suspect that by now we should just revert that commit. It's
not clear why that state attribute should be pollable, and the new
code is clearly very much buggy.
Hannes, Martin?
Linus
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [ata_scsi_offline_dev] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
2017-11-06 23:12 ` [ata_scsi_offline_dev] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 Linus Torvalds
@ 2017-11-07 0:12 ` Tejun Heo
2017-11-07 3:34 ` Martin K. Petersen
2017-11-07 6:55 ` Hannes Reinecke
2 siblings, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2017-11-07 0:12 UTC (permalink / raw)
To: Linus Torvalds
Cc: Fengguang Wu, IDE-ML, Christoph Hellwig, Hannes Reinecke,
Linux Kernel Mailing List, Johannes Thumshirn, Martin K. Petersen,
linux-scsi, James Bottomley
Hello,
On Mon, Nov 06, 2017 at 03:12:31PM -0800, Linus Torvalds wrote:
> But it does seem to be a new regression in 4.14, caused by commit
> 8a97712e5314 ("scsi: make 'state' device attribute pollable"), because
> that's what added the sysfs_notify() call to scsi_device_set_state(),
> which made that spinlock be a problem.
Yeah, pinged Hannes about it a couple of weeks ago.
> That commit came in through the SCSI merge this merge window, and it
> seems to still revert cleanly.
>
> So I do suspect that by now we should just revert that commit. It's
> not clear why that state attribute should be pollable, and the new
> code is clearly very much buggy.
I think reverting is the right thing to do right now. If necessary,
we can make kernfs_notify() safe to be called from atomic contexts.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [ata_scsi_offline_dev] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
2017-11-06 23:12 ` [ata_scsi_offline_dev] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 Linus Torvalds
2017-11-07 0:12 ` Tejun Heo
@ 2017-11-07 3:34 ` Martin K. Petersen
2017-11-07 6:55 ` Hannes Reinecke
2 siblings, 0 replies; 4+ messages in thread
From: Martin K. Petersen @ 2017-11-07 3:34 UTC (permalink / raw)
To: Linus Torvalds
Cc: Fengguang Wu, IDE-ML, Christoph Hellwig, Tejun Heo,
Hannes Reinecke, Linux Kernel Mailing List, Johannes Thumshirn,
Martin K. Petersen, linux-scsi, James Bottomley
Linus,
> But it does seem to be a new regression in 4.14, caused by commit
> 8a97712e5314 ("scsi: make 'state' device attribute pollable"), because
> that's what added the sysfs_notify() call to scsi_device_set_state(),
> which made that spinlock be a problem.
>
> That commit came in through the SCSI merge this merge window, and it
> seems to still revert cleanly.
>
> So I do suspect that by now we should just revert that commit. It's
> not clear why that state attribute should be pollable, and the new
> code is clearly very much buggy.
Yeah, let's revert it for now.
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [ata_scsi_offline_dev] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
2017-11-06 23:12 ` [ata_scsi_offline_dev] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 Linus Torvalds
2017-11-07 0:12 ` Tejun Heo
2017-11-07 3:34 ` Martin K. Petersen
@ 2017-11-07 6:55 ` Hannes Reinecke
2 siblings, 0 replies; 4+ messages in thread
From: Hannes Reinecke @ 2017-11-07 6:55 UTC (permalink / raw)
To: Linus Torvalds, Fengguang Wu
Cc: IDE-ML, Christoph Hellwig, Tejun Heo, Linux Kernel Mailing List,
Johannes Thumshirn, Martin K. Petersen, linux-scsi,
James Bottomley
On 11/07/2017 12:12 AM, Linus Torvalds wrote:
> On Mon, Nov 6, 2017 at 2:53 PM, Fengguang Wu <fengguang.wu@intel.com> wrote:
>>
>> The same dmesg happen to contain another libata related bug. Attached again.
>> It's rare and in the error handling path, so unlikely a new regression.
>>
>> [ 49.608280] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
>> [ 49.647821] mutex_lock+0x20/0x50
>> [ 49.651443] kernfs_find_and_get_ns+0x23/0x60
>> [ 49.656104] sysfs_notify+0x77/0x90
>> [ 49.659900] scsi_device_set_state+0x63/0x150
>> [ 49.664559] ata_scsi_offline_dev+0x1c/0x30 [libata]
>> [ 49.669817] ata_eh_detach_dev+0x3b/0xb0 [libata]
>
> ata_eh_detach_dev() does
>
> spin_lock_irqsave(ap->lock, flags);
>
> and then does
>
> if (ata_scsi_offline_dev(dev)) {
> dev->flags |= ATA_DFLAG_DETACHED;
> ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
> }
>
> inside that spinlock. And this code is not new - it has done it since
> 2006 or so.
>
> But it does seem to be a new regression in 4.14, caused by commit
> 8a97712e5314 ("scsi: make 'state' device attribute pollable"), because
> that's what added the sysfs_notify() call to scsi_device_set_state(),
> which made that spinlock be a problem.
>
> That commit came in through the SCSI merge this merge window, and it
> seems to still revert cleanly.
>
> So I do suspect that by now we should just revert that commit. It's
> not clear why that state attribute should be pollable, and the new
> code is clearly very much buggy.
>
> Hannes, Martin?
>
Seeing the complexity involved, yes, please revert that.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2017-11-07 6:55 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CA+55aFxSJGeN=2X-uX-on1Uq2Nb8+v1aiMDz5H1+tKW_N5Q+6g@mail.gmail.com>
[not found] ` <20171029225155.qcum5i75awrt5tzm@wfg-t540p.sh.intel.com>
[not found] ` <20171029231835.3725fnd5yehlmqob@wfg-t540p.sh.intel.com>
[not found] ` <20171030110511.scfrdtlnf5lbdhu5@pd.tnic>
[not found] ` <CA+55aFy2c88wKndjRAds-3MZ+MsArGBySOog+Rxmd2AciGFGkA@mail.gmail.com>
[not found] ` <CA+55aFwW7sbguREtVzKAUkNGb4cGh_eADqkEYcUurX5xS60_ww@mail.gmail.com>
[not found] ` <526e7cf2-0672-e44b-c32f-26128a2dfd37@codeaurora.org>
[not found] ` <20171106224635.qopgsszwxzuitkpf@wfg-t540p.sh.intel.com>
[not found] ` <20171106225354.6ucl4f4ipsjlntzl@wfg-t540p.sh.intel.com>
2017-11-06 23:12 ` [ata_scsi_offline_dev] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238 Linus Torvalds
2017-11-07 0:12 ` Tejun Heo
2017-11-07 3:34 ` Martin K. Petersen
2017-11-07 6:55 ` Hannes Reinecke
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox