* Re: [Bug] 12.864681 BUG: lock held when returning to user space!
       [not found]     ` <5253A9E7.5030707@oracle.com>
@ 2013-10-08 13:45       ` Douglas Gilbert
  2013-10-16 13:24         ` James Bottomley
  0 siblings, 1 reply; 4+ messages in thread
From: Douglas Gilbert @ 2013-10-08 13:45 UTC
To: vaughan, Madper Xie; +Cc: linux-kernel, SCSI development list

On 13-10-08 02:44 AM, vaughan wrote:
> Hi Madper,
>
> CC to Douglas to get comments.
> I use the rw_semaphore o_sem to protect excl open, introduced in commit
> 15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1.
> Is it forbidden to do like that in kernel?...

It appears you can not (allow sg_open() to hold a semaphore
then return to the user space). So you will need to do some
rework on that patch or revert it.

Doug Gilbert

Reference: scsi-linux + kernel lists,
title: [PATCH v6 0/4][SCSI] sg: fix race condition in sg_open 20130828

> On 10/08/2013 01:57 PM, Madper Xie wrote:
>> Howdy Vaughan Cao,
>> I can't meet this issue on both 3.11 and 3.11.4. There are only four
>> patches between 3.11 and 3.12-rc2 and you are the author. Will you
>> please check them if you have time.
>>
>> cxie@redhat.com writes:
>>
>>> Hi all,
>>> With kernel3.12-rc2 the dmesg shows following logs:
>>> [ 12.864680] ================================================
>>> [ 12.864681] [ BUG: lock held when returning to user space! ]
>>> [ 12.864682] 3.12.0-rc2 #1 Not tainted
>>> [ 12.864683] ------------------------------------------------
>>> [ 12.864684] iprinit/719 is leaving the kernel with locks still held!
>>> [ 12.864685] 1 lock held by iprinit/719:
>>> [ 12.864686] #0: (&sdp->o_sem){.+.+..}, at: [<ffffffffa050de05>] sg_open+0x4b5/0x644 [sg]
>>> [ 12.934954] ath9k 0000:01:00.0: enabling device (0000 -> 0002)
>>> [ 12.940346] ath: phy0: timeout (1000 us) on reg 0x15f18: 0x00000000 & 0x00000007 != 0x00000004
>>> [ 12.943125] ath: EEPROM regdomain: 0x60
>>> [ 12.943127] ath: EEPROM indicates we should expect a direct regpair map
>>> [ 12.943129] ath: Country alpha2 being used: 00
>>> [ 12.943130] ath: Regpair used: 0x60
>>> [ 12.960202] r8169 0000:02:00.0 p3p1: link down
>>> [ 12.960236] r8169 0000:02:00.0 p3p1: link down
>>> [ 12.960256] IPv6: ADDRCONF(NETDEV_UP): p3p1: link is not ready
>>> [ 13.003523] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
>>> [ 13.003886] ieee80211 phy0: Atheros AR9485 Rev:1 mem=0xffffc9000bc80000, irq=16
>>> [ 13.012120] ip6_tables: (C) 2000-2006 Netfilter Core Team
>>> [ 13.023667] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
>>> [ 13.055802] Ebtables v2.0 registered
>>> [ 13.192291] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
>>> [ 15.906392] r8169 0000:02:00.0 p3p1: link up
>>> [ 15.906416] IPv6: ADDRCONF(NETDEV_CHANGE): p3p1: link becomes ready
>>> [ 17.121989] systemd-udevd (334) used greatest stack depth: 3352 bytes left
>>>
>>> I'm working on finding which version bring this bug in.
>>
>
* Re: [Bug] 12.864681 BUG: lock held when returning to user space!
  2013-10-08 13:45       ` [Bug] 12.864681 BUG: lock held when returning to user space! Douglas Gilbert
@ 2013-10-16 13:24         ` James Bottomley
  2013-10-16 22:41           ` Douglas Gilbert
  0 siblings, 1 reply; 4+ messages in thread
From: James Bottomley @ 2013-10-16 13:24 UTC
To: dgilbert; +Cc: vaughan, Madper Xie, linux-kernel, SCSI development list

On Tue, 2013-10-08 at 09:45 -0400, Douglas Gilbert wrote:
> On 13-10-08 02:44 AM, vaughan wrote:
> > Hi Madper,
> >
> > CC to Douglas to get comments.
> > I use the rw_semaphore o_sem to protect excl open, introduced in commit
> > 15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1.
> > Is it forbidden to do like that in kernel?...
>
> It appears you can not (allow sg_open() to hold a semaphore
> then return to the user space). So you will need to do some
> rework on that patch or revert it.

OK, there being no reply on this, I'll do the revert ... that's all four
patches, correct?

James
* Re: [Bug] 12.864681 BUG: lock held when returning to user space!
  2013-10-16 13:24         ` James Bottomley
@ 2013-10-16 22:41           ` Douglas Gilbert
  2013-10-17  2:52             ` vaughan
  0 siblings, 1 reply; 4+ messages in thread
From: Douglas Gilbert @ 2013-10-16 22:41 UTC
To: James Bottomley; +Cc: vaughan, Madper Xie, linux-kernel, SCSI development list

On 13-10-16 09:24 AM, James Bottomley wrote:
> On Tue, 2013-10-08 at 09:45 -0400, Douglas Gilbert wrote:
>> On 13-10-08 02:44 AM, vaughan wrote:
>>> Hi Madper,
>>>
>>> CC to Douglas to get comments.
>>> I use the rw_semaphore o_sem to protect excl open, introduced in commit
>>> 15b06f9a02406e5460001db6d5af5c738cd3d4e7 since v3.12-rc1.
>>> Is it forbidden to do like that in kernel?...
>>
>> It appears you can not (allow sg_open() to hold a semaphore
>> then return to the user space). So you will need to do some
>> rework on that patch or revert it.
>
> OK, there being no reply on this, I'll do the revert ... that's all four
> patches, correct?

That seems to be the case. Vaughan acknowledged the
problem and forwarded it to me 8 days ago. Yes, it
seems to be a "no-no" to hold any kernel semaphore
when returning to the user space; in this case from
sg_open(). I was hoping a revised patch might
appear from Vaughan but to date that has not been
the case. So with only a few weeks to go before
lk 3.12 is released, reverting the whole 4 patches
in that series seems to be the safest course.

Also without a new patch from Vaughan in the next few
weeks he may also miss the opportunity of getting
his improved O_EXCL logic into the lk 3.13 series.


Thinking about how to solve this problem: a field could
be added to 'struct sg_device' with one of three states:
no_opens, non_excl_opens and excl_open. It could be
manipulated by sg_open() and sg_release() like a
read-write semaphore. And the faulty 'struct
rw_semaphore o_sem' in sg_device could be replaced by a
normal semaphore to protect the manipulation of the new
three-state field.

And the new three-state field would replace (or expand)
the 'char exclude' field in struct sg_device.

Doug Gilbert
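
The three-state idea described above can be modelled in user space as a
rough sketch. Everything in it is an assumption made for illustration:
the names (sg_device_model, sg_model_open, sg_model_release) are
hypothetical, a pthread mutex stands in for the "normal semaphore", and a
conflicting open simply returns -EBUSY where the real driver might sleep
on a wait queue. It is not the sg driver's code. The point it shows is
that the lock is only held while the state field is inspected and
updated, and is always dropped before the open call returns.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical three-state field, named after the states in the mail. */
enum sg_open_state { SG_NO_OPENS, SG_NON_EXCL_OPENS, SG_EXCL_OPEN };

/* User-space stand-in for the relevant part of 'struct sg_device'. */
struct sg_device_model {
    pthread_mutex_t open_lock;  /* protects state and open_count only */
    enum sg_open_state state;
    int open_count;             /* number of current opens */
};

void sg_model_init(struct sg_device_model *sdp)
{
    pthread_mutex_init(&sdp->open_lock, NULL);
    sdp->state = SG_NO_OPENS;
    sdp->open_count = 0;
}

/*
 * Returns 0 on success or -EBUSY if the request conflicts with the
 * current state. The lock is released on every path before returning,
 * so nothing stays held across the return to the caller.
 */
int sg_model_open(struct sg_device_model *sdp, bool excl)
{
    int ret = 0;

    pthread_mutex_lock(&sdp->open_lock);
    if (sdp->state == SG_EXCL_OPEN ||
        (excl && sdp->state != SG_NO_OPENS)) {
        ret = -EBUSY;
    } else {
        sdp->state = excl ? SG_EXCL_OPEN : SG_NON_EXCL_OPENS;
        sdp->open_count++;
    }
    pthread_mutex_unlock(&sdp->open_lock);
    return ret;
}

/* Drops one open; the last release returns the device to SG_NO_OPENS. */
void sg_model_release(struct sg_device_model *sdp)
{
    pthread_mutex_lock(&sdp->open_lock);
    assert(sdp->open_count > 0);
    if (--sdp->open_count == 0)
        sdp->state = SG_NO_OPENS;
    pthread_mutex_unlock(&sdp->open_lock);
}
```

In this model two non-exclusive opens succeed, an exclusive open is
refused until both are released, and an exclusive holder blocks every
other open until it releases.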
* Re: [Bug] 12.864681 BUG: lock held when returning to user space!
  2013-10-16 22:41           ` Douglas Gilbert
@ 2013-10-17  2:52             ` vaughan
  0 siblings, 0 replies; 4+ messages in thread
From: vaughan @ 2013-10-17 2:52 UTC
To: dgilbert, James Bottomley; +Cc: Madper Xie, linux-kernel, SCSI development list

On 10/17/2013 06:41 AM, Douglas Gilbert wrote:
> That seems to be the case. Vaughan acknowledged the
> problem and forwarded it to me 8 days ago. Yes, it
> seems to be a "no-no" to hold any kernel semaphore
> when returning to the user space; in this case from
> sg_open(). I was hoping a revised patch might
> appear from Vaughan but to date that has not been
> the case. So with only a few weeks to go before
> lk 3.12 is released, reverting the whole 4 patches
> in that series seems to be the safest course.
>
> Also without a new patch from Vaughan in the next few
> weeks he may also miss the opportunity of getting
> his improved O_EXCL logic into the lk 3.13 series.
>
>
> Thinking about how to solve this problem: a field could
> be added to 'struct sg_device' with one of three states:
> no_opens, non_excl_opens and excl_open. It could be
> manipulated by sg_open() and sg_release() like a
> read-write semaphore. And the faulty 'struct
> rw_semaphore o_sem' in sg_device could be replaced by a
> normal semaphore to protect the manipulation of the new
> three-state field.
>
> And the new three-state field would replace (or expand)
> the 'char exclude' field in struct sg_device.
>
> Doug Gilbert

Hi Doug,

Thanks for providing advice on how to fix this.
However, it still seems somewhat awkward. We have to
1. hold a lock (maybe sg_index_lock or a new one)
2. check
   a) the new three-state field;
   b) if sfp list is empty;
   c) sdp->detached field;
   if any of these checks fails, we may link the open process into the
   o_excl_wait queue and it will need a wakeup. If they are satisfied,
   we go on.
3. then we release at least sg_index_lock to malloc a new sfp and
   initialize it.
4. try to acquire sg_index_lock again and add this sfp into the
   sfd_siblings list if possible. <== We still have to check at least
   the sdp->detached field
5. update the three-state field to reflect the result of Step 4, and
   wake up processes waiting in o_excl_wait.

This awkwardness comes from releasing sg_index_lock in the middle of
the check->malloc->add sequence for the new sfp struct.
I want to ask whether it is possible to split sg_add_sfp() into two
functions, sg_init_sfp() and sg_add_sfp2(). We could do all the
initialization work in sg_init_sfp() without holding any lock and let
sg_add_sfp2() do only lock-check-add in one pass. That seems easier for
me to understand.
But there are still some questions about this approach:
1. memory consumption can be very large if many sg_init_sfp() calls run
   at the same time;
2. some fields are initialized from fields of the scsi device that sdp
   points to, such as low_dma, sg_tablesize, max_sector, phys_segs. I
   know scsi_device_get() would keep the underlying scsi_device alive,
   but could these fields change in the gap between our initialize and
   add?
The relationship between sg_device and scsi_device described above
confuses me somewhat...

Thanks,
Vaughan
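
The split proposed above (initialize with no lock held, then
lock-check-add in a single critical section) can be sketched in user
space as follows. This is only a model under stated assumptions: the
*_model names are hypothetical, a pthread mutex stands in for
sg_index_lock, a singly linked list stands in for sfd_siblings, the
three-state field is reduced to one excl_open flag, and a conflicting
open returns -EBUSY instead of sleeping on o_excl_wait. Initialization
from scsi_device fields is left out, so the second question above is not
addressed by this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-open sfp structure. */
struct sg_fd_model {
    struct sg_fd_model *next;
    int id;
};

/* Hypothetical stand-in for the sg device. */
struct sg_dev_model {
    pthread_mutex_t lock;       /* stands in for sg_index_lock */
    bool detached;              /* models sdp->detached */
    bool excl_open;             /* true while an exclusive open exists */
    struct sg_fd_model *sfds;   /* models the sfd_siblings list */
};

void sg_dev_model_init(struct sg_dev_model *sdp)
{
    pthread_mutex_init(&sdp->lock, NULL);
    sdp->detached = false;
    sdp->excl_open = false;
    sdp->sfds = NULL;
}

/* Part 1: allocate and initialize the sfp with no lock held. */
struct sg_fd_model *sg_init_sfp_model(int id)
{
    struct sg_fd_model *sfp = calloc(1, sizeof(*sfp));

    if (sfp)
        sfp->id = id;
    return sfp;
}

/* Part 2: lock, check every condition, and add, all in one section. */
int sg_add_sfp2_model(struct sg_dev_model *sdp, struct sg_fd_model *sfp,
                      bool excl)
{
    int ret = 0;

    assert(sfp != NULL);
    pthread_mutex_lock(&sdp->lock);
    if (sdp->detached) {
        ret = -ENODEV;
    } else if (sdp->excl_open || (excl && sdp->sfds != NULL)) {
        ret = -EBUSY;
    } else {
        sfp->next = sdp->sfds;
        sdp->sfds = sfp;
        sdp->excl_open = excl;
    }
    pthread_mutex_unlock(&sdp->lock);
    return ret;
}

/* Open path: the allocation is wasted and freed if the check fails. */
int sg_open_model(struct sg_dev_model *sdp, int id, bool excl)
{
    struct sg_fd_model *sfp = sg_init_sfp_model(id);
    int ret;

    if (!sfp)
        return -ENOMEM;
    ret = sg_add_sfp2_model(sdp, sfp, excl);
    if (ret)
        free(sfp);
    return ret;
}
```

The cost named in the first question is visible here: every caller
allocates before it knows whether the add will succeed, so many
concurrent opens that end up failing each hold an allocation for a short
time.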