* Re: [Bug] 12.864681 BUG: lock held when returning to user space!
[not found] ` <5253A9E7.5030707@oracle.com>
@ 2013-10-08 13:45 ` Douglas Gilbert
2013-10-16 13:24 ` James Bottomley
0 siblings, 1 reply; 4+ messages in thread
From: Douglas Gilbert @ 2013-10-08 13:45 UTC
To: vaughan, Madper Xie; +Cc: linux-kernel, SCSI development list
On 13-10-08 02:44 AM, vaughan wrote:
> Hi Madper,
>
> CC to Douglas to get comments.
> I use the rw_semaphore o_sem to protect exclusive opens; it was introduced in commit
> 15b06f9a02406e5460001db6d5af5c738cd3d4e7, in v3.12-rc1.
> Is it forbidden to do that in the kernel?...
It appears you cannot (that is, allow sg_open() to hold a semaphore
and then return to user space). So you will need to rework
that patch or revert it.
Doug Gilbert
Reference: scsi-linux + kernel lists, title:
[PATCH v6 0/4][SCSI] sg: fix race condition in sg_open
20130828
> On 10/08/2013 01:57 PM, Madper Xie wrote:
>> Howdy Vaughan Cao,
>> I can't reproduce this issue on either 3.11 or 3.11.4. There are only four
>> patches between 3.11 and 3.12-rc2 and you are the author. Could you
>> please check them if you have time?
>>
>> cxie@redhat.com writes:
>>
>>> Hi all,
>>> With kernel 3.12-rc2, dmesg shows the following log:
>>> [ 12.864680] ================================================
>>> [ 12.864681] [ BUG: lock held when returning to user space! ]
>>> [ 12.864682] 3.12.0-rc2 #1 Not tainted
>>> [ 12.864683] ------------------------------------------------
>>> [ 12.864684] iprinit/719 is leaving the kernel with locks still held!
>>> [ 12.864685] 1 lock held by iprinit/719:
>>> [ 12.864686] #0: (&sdp->o_sem){.+.+..}, at: [<ffffffffa050de05>] sg_open+0x4b5/0x644 [sg]
>>> [ 12.934954] ath9k 0000:01:00.0: enabling device (0000 -> 0002)
>>> [ 12.940346] ath: phy0: timeout (1000 us) on reg 0x15f18: 0x00000000 & 0x00000007 != 0x00000004
>>> [ 12.943125] ath: EEPROM regdomain: 0x60
>>> [ 12.943127] ath: EEPROM indicates we should expect a direct regpair map
>>> [ 12.943129] ath: Country alpha2 being used: 00
>>> [ 12.943130] ath: Regpair used: 0x60
>>> [ 12.960202] r8169 0000:02:00.0 p3p1: link down
>>> [ 12.960236] r8169 0000:02:00.0 p3p1: link down
>>> [ 12.960256] IPv6: ADDRCONF(NETDEV_UP): p3p1: link is not ready
>>> [ 13.003523] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
>>> [ 13.003886] ieee80211 phy0: Atheros AR9485 Rev:1 mem=0xffffc9000bc80000, irq=16
>>> [ 13.012120] ip6_tables: (C) 2000-2006 Netfilter Core Team
>>> [ 13.023667] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
>>> [ 13.055802] Ebtables v2.0 registered
>>> [ 13.192291] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
>>> [ 15.906392] r8169 0000:02:00.0 p3p1: link up
>>> [ 15.906416] IPv6: ADDRCONF(NETDEV_CHANGE): p3p1: link becomes ready
>>> [ 17.121989] systemd-udevd (334) used greatest stack depth: 3352 bytes left
>>>
>>> I'm working on finding which version introduced this bug.
>>
>
>
* Re: [Bug] 12.864681 BUG: lock held when returning to user space!
2013-10-08 13:45 ` [Bug] 12.864681 BUG: lock held when returning to user space! Douglas Gilbert
@ 2013-10-16 13:24 ` James Bottomley
2013-10-16 22:41 ` Douglas Gilbert
0 siblings, 1 reply; 4+ messages in thread
From: James Bottomley @ 2013-10-16 13:24 UTC
To: dgilbert; +Cc: vaughan, Madper Xie, linux-kernel, SCSI development list
On Tue, 2013-10-08 at 09:45 -0400, Douglas Gilbert wrote:
> On 13-10-08 02:44 AM, vaughan wrote:
> > Hi Madper,
> >
> > CC to Douglas to get comments.
> > I use the rw_semaphore o_sem to protect excl open, introduced in commit
> > 15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1.
> > Is it forbidden to do like that in kernel?...
>
> It appears you can not (allow sg_open() to hold a semaphore
> then return to the user space). So you will need to do some
> rework on that patch or revert it.
OK, there being no reply on this, I'll do the revert ... that's all four
patches, correct?
James
* Re: [Bug] 12.864681 BUG: lock held when returning to user space!
2013-10-16 13:24 ` James Bottomley
@ 2013-10-16 22:41 ` Douglas Gilbert
2013-10-17 2:52 ` vaughan
0 siblings, 1 reply; 4+ messages in thread
From: Douglas Gilbert @ 2013-10-16 22:41 UTC
To: James Bottomley; +Cc: vaughan, Madper Xie, linux-kernel, SCSI development list
On 13-10-16 09:24 AM, James Bottomley wrote:
> On Tue, 2013-10-08 at 09:45 -0400, Douglas Gilbert wrote:
>> On 13-10-08 02:44 AM, vaughan wrote:
>>> Hi Madper,
>>>
>>> CC to Douglas to get comments.
>>> I use the rw_semaphore o_sem to protect excl open, introduced in commit
>>> 15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1.
>>> Is it forbidden to do like that in kernel?...
>>
>> It appears you can not (allow sg_open() to hold a semaphore
>> then return to the user space). So you will need to do some
>> rework on that patch or revert it.
>
> OK, there being no reply on this, I'll do the revert ... that's all four
> patches, correct?
That seems to be the case. Vaughan acknowledged the
problem and forwarded it to me 8 days ago. Yes, it
seems to be a "no-no" to hold any kernel semaphore
when returning to user space; in this case from
sg_open(). I was hoping a revised patch might
appear from Vaughan but to date that has not been
the case. So with only a few weeks to go before
lk 3.12 is released, reverting all four patches
in that series seems to be the safest course.
Also, without a new patch from Vaughan in the next few
weeks, he may miss the opportunity to get
his improved O_EXCL logic into the lk 3.13 series.
Thinking about how to solve this problem: a field could
be added to 'struct sg_device' with one of three states:
no_opens, non_excl_opens and excl_open. It could be
manipulated by sg_open() and sg_release() like a
read-write semaphore. And the faulty 'struct
rw_semaphore o_sem' in sg_device could be replaced by a
normal semaphore to protect the manipulation of the new
three-state field.
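A minimal userspace sketch of the three-state idea, offered as illustration only and not taken from the sg driver. It assumes a pthread mutex in place of the "normal semaphore", and the names dev_sketch, dev_init(), dev_open() and dev_release() are hypothetical. It also returns -EBUSY on a conflicting open instead of sleeping, which is a simplification. The point it tries to show is that the lock only guards the state field and is always dropped before the call returns.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* The three states named in the message above. */
enum open_state { NO_OPENS, NON_EXCL_OPENS, EXCL_OPEN };

/* Hypothetical stand-in for the relevant part of 'struct sg_device'. */
struct dev_sketch {
    pthread_mutex_t lock;   /* assumption: plays the "normal semaphore" role */
    enum open_state state;  /* the new three-state field */
    int open_count;         /* number of opens currently outstanding */
};

void dev_init(struct dev_sketch *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->state = NO_OPENS;
    d->open_count = 0;
}

/* Returns 0 on success, -EBUSY if the requested open conflicts. */
int dev_open(struct dev_sketch *d, bool excl)
{
    int ret = 0;

    pthread_mutex_lock(&d->lock);
    if (d->state == EXCL_OPEN || (excl && d->state != NO_OPENS)) {
        ret = -EBUSY;
    } else {
        d->state = excl ? EXCL_OPEN : NON_EXCL_OPENS;
        d->open_count++;
    }
    pthread_mutex_unlock(&d->lock);   /* never held across the return */
    return ret;
}

void dev_release(struct dev_sketch *d)
{
    pthread_mutex_lock(&d->lock);
    assert(d->open_count > 0);        /* release must pair with an open */
    if (--d->open_count == 0)
        d->state = NO_OPENS;
    pthread_mutex_unlock(&d->lock);
}
```

With this shape, the exclusion survives between open and release in the state field rather than in a held lock, which is the property the lockdep report was about.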
Doug Gilbert
* Re: [Bug] 12.864681 BUG: lock held when returning to user space!
2013-10-16 22:41 ` Douglas Gilbert
@ 2013-10-17 2:52 ` vaughan
0 siblings, 0 replies; 4+ messages in thread
From: vaughan @ 2013-10-17 2:52 UTC
To: dgilbert, James Bottomley; +Cc: Madper Xie, linux-kernel, SCSI development list
On 10/17/2013 06:41 AM, Douglas Gilbert wrote:
> That seems to be the case. Vaughan acknowledged the
> problem and forwarded it to me 8 days ago. Yes, it
> seems to be a "no-no" to hold a any kernel semaphore
> when returning to the user space; in this case from
> sg_open(). I was hoping a revised patch might
> appear from Vaughan but to date that has not been
> the case. So with only a few weeks to go before
> lk 3.12 is released, reverting the whole 4 patches
> in that series seems to be the safest course.
>
> Also without a new patch from Vaughan in the next few
> weeks he may also miss the opportunity of getting
> his improved O_EXCL logic into the lk 3.13 series.
>
>
> Thinking about how to solve this problem: a field could
> be added to 'struct sg_device' with one of three states:
> no_opens, non_excl_opens and excl_open. It could be
> manipulated by sg_open() and sg_release() like a
> read-write semaphore. And the faulty 'struct
> rw_semaphore o_sem' in sg_device could be replaced by a
> normal semaphore to protect the manipulation of the new
> three-state field.
> And the new three-state field would replace (or expand)
> the 'char exclude' field in struct sg_device.
>
> Doug Gilbert
Hi Doug,
Thanks for the advice on how to fix this.
However, it still seems somewhat awkward. We have to:
1. hold a lock (maybe sg_index_lock or a new one);
2. check
a) the new three-state field;
b) whether the sfp list is empty;
c) the sdp->detached field;
if any of these checks fails, we may have to link the opening process into
the o_excl_wait queue, and it will need a wakeup later;
if all are satisfied, we go on;
3. release at least sg_index_lock in order to malloc and initialize a
new sfp;
4. try to acquire sg_index_lock again and add this sfp to the sfd_siblings
list if possible <== we still have to check at least the sdp->detached field;
5. update the three-state field to reflect the result of step 4, and wake up
processes waiting in o_excl_wait.
This awkwardness comes from releasing sg_index_lock in the
middle of the check -> malloc -> add sequence for the new sfp struct.
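The five steps can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not sg driver code: a pthread mutex stands in for sg_index_lock, the struct and function names are hypothetical, a conflicting open returns -EBUSY instead of sleeping on a wait queue, and the wakeup in step 5 is omitted. The sketch also repeats all of the step 2 checks after re-taking the lock, which is more conservative than re-checking only the detached flag.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

enum open_state { NO_OPENS, NON_EXCL_OPENS, EXCL_OPEN };

/* Hypothetical stand-in for a per-open sfp structure. */
struct fd_sketch { struct fd_sketch *next; };

/* Hypothetical stand-in for the relevant part of the sg device. */
struct dev_sketch {
    pthread_mutex_t lock;    /* assumption: plays the sg_index_lock role */
    enum open_state state;   /* the three-state field */
    bool detached;           /* plays the sdp->detached role */
    struct fd_sketch *fds;   /* plays the sfd_siblings list role */
};

void dev_sketch_init(struct dev_sketch *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->state = NO_OPENS;
    d->detached = false;
    d->fds = NULL;
}

/* Caller must hold d->lock. Returns 0 if the open may proceed. */
int check_open_allowed(const struct dev_sketch *d, bool excl)
{
    if (d->detached)
        return -ENODEV;
    if (d->state == EXCL_OPEN)
        return -EBUSY;
    if (excl && (d->state != NO_OPENS || d->fds != NULL))
        return -EBUSY;
    return 0;
}

int open_stepwise(struct dev_sketch *d, bool excl)
{
    struct fd_sketch *f;
    int ret;

    assert(d != NULL);

    pthread_mutex_lock(&d->lock);          /* step 1: take the lock */
    ret = check_open_allowed(d, excl);     /* step 2: run the checks */
    pthread_mutex_unlock(&d->lock);        /* step 3: drop the lock ... */
    if (ret)
        return ret;

    f = calloc(1, sizeof(*f));             /* ... so allocation happens unlocked */
    if (f == NULL)
        return -ENOMEM;

    pthread_mutex_lock(&d->lock);          /* step 4: re-take the lock */
    ret = check_open_allowed(d, excl);     /* state may have changed meanwhile */
    if (ret) {
        pthread_mutex_unlock(&d->lock);
        free(f);
        return ret;
    }
    f->next = d->fds;                      /* add to the sibling list */
    d->fds = f;
    d->state = excl ? EXCL_OPEN : NON_EXCL_OPENS;   /* step 5: update state */
    pthread_mutex_unlock(&d->lock);
    return 0;
}
```

The duplicated check before and after the allocation is the awkward part: anything validated in step 2 can be stale by step 4.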
I want to ask whether it is possible to split sg_add_sfp() into two
functions, sg_init_sfp() and sg_add_sfp2(). We could do all the initialization
work in sg_init_sfp()
without holding any lock, and let sg_add_sfp2() do only the lock-check-add
sequence in one pass. That seems easier for me to understand.
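A rough userspace sketch of that split, under the same kind of assumptions: init_fd_sketch() plays the role of the proposed sg_init_sfp() and add_fd_sketch() plays the role of the proposed sg_add_sfp2(). All names and types here are hypothetical, a pthread mutex stands in for sg_index_lock, and a conflicting open returns -EBUSY instead of waiting.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

enum open_state { NO_OPENS, NON_EXCL_OPENS, EXCL_OPEN };

/* Hypothetical stand-in for a per-open sfp structure. */
struct fd_sketch { struct fd_sketch *next; };

/* Hypothetical stand-in for the relevant part of the sg device. */
struct dev_sketch {
    pthread_mutex_t lock;    /* assumption: plays the sg_index_lock role */
    enum open_state state;
    bool detached;
    struct fd_sketch *fds;
};

void dev_sketch_init(struct dev_sketch *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->state = NO_OPENS;
    d->detached = false;
    d->fds = NULL;
}

/* Role of the proposed sg_init_sfp(): allocate and initialize, no lock held. */
struct fd_sketch *init_fd_sketch(void)
{
    return calloc(1, sizeof(struct fd_sketch));
}

/*
 * Role of the proposed sg_add_sfp2(): lock, check and add in one pass.
 * On failure the caller still owns f and is responsible for freeing it.
 */
int add_fd_sketch(struct dev_sketch *d, struct fd_sketch *f, bool excl)
{
    int ret = 0;

    assert(d != NULL && f != NULL);

    pthread_mutex_lock(&d->lock);
    if (d->detached) {
        ret = -ENODEV;
    } else if (d->state == EXCL_OPEN || (excl && d->state != NO_OPENS)) {
        ret = -EBUSY;
    } else {
        f->next = d->fds;
        d->fds = f;
        d->state = excl ? EXCL_OPEN : NON_EXCL_OPENS;
    }
    pthread_mutex_unlock(&d->lock);
    return ret;
}
```

The trade-off is that the allocation is done before it is known whether the open will be admitted, so a rejected open allocates and frees for nothing.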
But there are still some questions about this approach:
1. memory consumption can be very large if many sg_init_sfp() calls run at the same time;
2. some fields are initialized from fields of the scsi device that sdp
points to, such as low_dma, sg_tablesize, max_sector and phys_segs.
I know scsi_device_get() keeps the underlying scsi_device
alive, but could these fields change in the gap between our initialization
and the add?
The relationship between sg_device and scsi_device, as described above,
still confuses me somewhat...
Thanks,
Vaughan