public inbox for linux-rt-users@vger.kernel.org
* schedule bug in 4.4.38-rt49
@ 2019-06-26  7:35 xiaoqiang.zhao
  2019-07-03 11:42 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 5+ messages in thread
From: xiaoqiang.zhao @ 2019-06-26  7:35 UTC (permalink / raw)
  To: linux-rt-users

Hi, guys:

     I have built a 4.4.38-rt49 kernel with CONFIG_PREEMPT_RT_FULL=y, and
the kernel crashes when I run the UnixBench spawn test case.

     Here is the oops info:

[  206.143829] BUG: scheduling while atomic: spawn/27356/0x00000002
[  206.143839] Modules linked in: bcmdhd pci_tegra bluedroid_pm ip_tables
[  206.143846] CPU: 5 PID: 27356 Comm: spawn Tainted: G W       
4.4.38-DATA-RT-g06219d69-dirty #7
[  206.143848] Hardware name: quill (DT)
[  206.143850] Call trace:
[  206.143871] [<ffffffc0000898f0>] dump_backtrace+0x0/0x100
[  206.143875] [<ffffffc000089ab8>] show_stack+0x14/0x1c
[  206.143884] [<ffffffc000314120>] dump_stack+0x98/0xc0
[  206.143902] [<ffffffc00016c330>] __schedule_bug+0x44/0x5c
[  206.143911] [<ffffffc000afd690>] __schedule+0x418/0x4f4
[  206.143913] [<ffffffc000afd7b8>] schedule+0x4c/0xe4
[  206.143918] [<ffffffc000afeeb8>] rt_spin_lock_slowlock+0x194/0x2c4
[  206.143921] [<ffffffc000b0048c>] rt_spin_lock+0x58/0x5c
[  206.143926] [<ffffffc0000e4678>] __wake_up+0x20/0x4c
[  206.143930] [<ffffffc0000e6d58>] __percpu_up_read+0x34/0x3c
[  206.143939] [<ffffffc0000a2de0>] copy_process.isra.52+0x136c/0x19f0
[  206.143942] [<ffffffc0000a3590>] _do_fork+0x74/0x39c
[  206.143945] [<ffffffc0000a3980>] SyS_clone+0x1c/0x24
[  206.143949] [<ffffffc000084ff0>] el0_svc_naked+0x24/0x28
[  206.143963] Unable to handle kernel paging request at virtual address 
7ff31fc040
[  206.143964] pgd = ffffffc1df8d7000
[  206.143985] [7ff31fc040] *pgd=0000000262400003, 
*pud=0000000262400003, *pmd=000000025ff4c003, *pte=00e0000255829f3
[  206.143989] Internal error: Oops: 9200004f [#1] PREEMPT SMP
[  206.143996] Modules linked in: bcmdhd pci_tegra bluedroid_pm ip_tables
[  206.143999] CPU: 5 PID: 27356 Comm: spawn Tainted: G W       
4.4.38-DATA-RT-g06219d69-dirty #7
[  206.144000] Hardware name: quill (DT)
[  206.144002] task: ffffffc1e45bd100 ti: ffffffc1e0320000 task.ti: 
ffffffc1e0320000
[  206.144005] PC is at 0x7f9fb6a198
[  206.144006] LR is at 0x559517b9b0
[  206.144008] pc : [<0000007f9fb6a198>] lr : [<000000559517b9b0>] 
pstate: 20000000
[  206.144009] sp : 0000007ff31fc060
[  206.144013] x29: 0000007ff31fc0a0 x28: 0000000000000000
[  206.144016] x27: 0000000000000000 x26: 0000000000000000
[  206.144018] x25: 0000000000000000 x24: 0000000000000000
[  206.144021] x23: 0000000000000000 x22: 000000000000001e
[  206.144023] x21: 000000559518b000 x20: 0000007ff31fc094
[  206.144026] x19: 000000559518c048 x18: 0000000000000003
[  206.144028] x17: 0000007f9fb6a198 x16: 000000559518bf90
[  206.144030] x15: 0000007f9fc4c150 x14: 0000000000000008
[  206.144033] x13: 0000007f9fc2a34c x12: 0000007ff31fbfa0
[  206.144035] x11: 0000007f9fc4f740 x10: 0000000000000000
[  206.144038] x9 : 0000007ff31fc128 x8 : 00000000000000dc
[  206.144040] x7 : 0000007f9fbee088 x6 : 0000007f9fc4fac8
[  206.144042] x5 : 0000007f9fc40bb0 x4 : 0000007f9fc40c80
[  206.144045] x3 : 0000000000000000 x2 : 8c391e6c47b6d000
[  206.144047] x1 : 0000000000000000 x0 : 0000007ff31fc094
[  206.144048]
[  206.144050] Process spawn (pid: 27356, stack limit = 0xffffffc1e0320028)


The call path is:

  do_fork -> copy_process -> threadgroup_change_end -> percpu_up_read (calls
preempt_disable) -> __percpu_up_read

  -> wake_up -> rt_spin_lock -> rt_spin_lock_slowlock -> schedule (calls
preempt_disable again) -> __schedule

  -> schedule_debug -> in_atomic_preempt_off (returns true, preempt_count
== 2) -> __schedule_bug (leads to a kernel page fault exception, OOPS!!)

Before schedule, we have called preempt_disable twice; this will
definitely bump preempt_count to 2, and the in_atomic_preempt_off check
will fail.
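
For reference, here is roughly what the check that fires looks like (a
simplified sketch of the 4.4 scheduler debug path, NOT verbatim kernel
source): __schedule() expects to be entered with exactly one level of
preempt_disable(), so any other preempt_count() is treated as "scheduling
while atomic".

/* Simplified sketch of the 4.4 logic, not verbatim source. */
#define PREEMPT_DISABLE_OFFSET  1   /* the one level schedule() itself adds */

#define in_atomic_preempt_off() (preempt_count() != PREEMPT_DISABLE_OFFSET)

static inline void schedule_debug(struct task_struct *prev)
{
        /*
         * With the extra preempt_disable() from percpu_up_read() still
         * pending, preempt_count() is 2 here, so this fires and prints
         * "BUG: scheduling while atomic: spawn/27356/0x00000002".
         */
        if (unlikely(in_atomic_preempt_off()))
                __schedule_bug(prev);
}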

What I could not figure out is WHY we call schedule inside
rt_spin_lock_slowlock, and under what condition this call is correct.

Any ideas ?





* Re: schedule bug in 4.4.38-rt49
  2019-06-26  7:35 schedule bug in 4.4.38-rt49 xiaoqiang.zhao
@ 2019-07-03 11:42 ` Sebastian Andrzej Siewior
       [not found]   ` <987eec05-14d0-29a5-723c-7bfbc0a5465b@gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-07-03 11:42 UTC (permalink / raw)
  To: xiaoqiang.zhao; +Cc: linux-rt-users

On 2019-06-26 15:35:04 [+0800], xiaoqiang.zhao wrote:
> Hi, guys:
Hi,

>     I have built a 4.4.38-rt49 kernel with CONFIG_PREEMPT_RT_FULL=y, and the
> kernel crashes when I run the UnixBench spawn test case.

Can you move forward to something newer, for instance 4.4.179-rt181?

>     Here is the oops info:
> The call path is:
> 
>  do_fork -> copy_process -> threadgroup_change_end -> percpu_up_read (calls
> preempt_disable) -> __percpu_up_read
> 
>  -> wake_up -> rt_spin_lock -> rt_spin_lock_slowlock -> schedule (calls
> preempt_disable again) -> __schedule
> 
>  -> schedule_debug -> in_atomic_preempt_off (returns true, preempt_count ==
> 2) -> __schedule_bug (leads to a kernel page fault exception, OOPS!!)
> 
> Before schedule, we have called preempt_disable twice; this will definitely
> bump preempt_count to 2, and

something probably disabled preemption before that.

> in_atomic_preempt_off will fail.
> 
> What I could not figure out is WHY we call schedule inside rt_spin_lock_slowlock,
> and under what condition this call is correct.

If the lock is already held you schedule out and wait until it is available
again.
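
Roughly, the -rt slow path looks like the sketch below (heavily
simplified, not the actual 4.4-rt rtmutex code; try_take() stands in for
the real __try_to_take_rt_mutex() helper). The point is that a contended
"spinlock" is an rtmutex on PREEMPT_RT_FULL, so the waiter blocks instead
of spinning:

static bool try_take(struct rt_mutex *lock);    /* stand-in helper */

static void rt_spin_lock_slowlock(struct rt_mutex *lock)
{
        raw_spin_lock(&lock->wait_lock);

        for (;;) {
                /* Try to take the rtmutex; succeeds once the owner unlocked. */
                if (try_take(lock))
                        break;

                raw_spin_unlock(&lock->wait_lock);
                /*
                 * Still owned by someone else: block until the unlock
                 * path wakes us.  This schedule() is only legal if the
                 * caller did not disable preemption around the lock.
                 */
                schedule();
                raw_spin_lock(&lock->wait_lock);
        }

        raw_spin_unlock(&lock->wait_lock);
}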

Sebastian


* Re: schedule bug in 4.4.38-rt49
       [not found]   ` <987eec05-14d0-29a5-723c-7bfbc0a5465b@gmail.com>
@ 2019-07-04  8:00     ` Sebastian Andrzej Siewior
  2019-07-04 10:17     ` xiaoqiang.zhao
  1 sibling, 0 replies; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-07-04  8:00 UTC (permalink / raw)
  To: xiaoqiang.zhao; +Cc: linux-rt-users

On 2019-07-04 13:50:08 [+0800], xiaoqiang.zhao wrote:
> 
> > > 2) -> __schedule_bug (leads to a kernel page fault exception, OOPS!!)
> > > 
> > > Before schedule, we have called preempt_disable twice; this will definitely
> > > bump preempt_count to 2, and
> > 
> > something probably disabled preemption before that
> 
> I feel this does not make sense.  In my opinion, the preempt_count must be
> zero before we call 'schedule()',

that is correct. I was saying that the preempt count was > 0 before
wake_up() was invoked.

> otherwise, in_atomic_preempt_off will return true and trigger
> __schedule_bug. If we have already
> 
> disabled preemption, we may be in atomic context and should not call schedule,
> right?

correct.

Sebastian


* Re: schedule bug in 4.4.38-rt49
       [not found]   ` <987eec05-14d0-29a5-723c-7bfbc0a5465b@gmail.com>
  2019-07-04  8:00     ` Sebastian Andrzej Siewior
@ 2019-07-04 10:17     ` xiaoqiang.zhao
  2019-07-05  6:31       ` xiaoqiang.zhao
  1 sibling, 1 reply; 5+ messages in thread
From: xiaoqiang.zhao @ 2019-07-04 10:17 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Resending as plain text to the linux-rt-users list.

On 2019/7/4 1:50 PM, xiaoqiang.zhao wrote:

On 2019/7/3 7:42 PM, Sebastian Andrzej Siewior wrote:
> On 2019-06-26 15:35:04 [+0800], xiaoqiang.zhao wrote:
>> Hi, guys:
> Hi,
>
Thanks for your reply ;-)

>> 2) -> __schedule_bug (leads to a kernel page fault exception, OOPS!!)
>>
>> Before schedule, we have called preempt_disable twice; this will
>> definitely
>> bump preempt_count to 2, and
>
> something probably disabled preemption before that

I feel this does not make sense.  In my opinion, the preempt_count must be
zero before we call 'schedule()'; otherwise, in_atomic_preempt_off will
return true and trigger __schedule_bug. If we have already disabled
preemption, we may be in atomic context and should not call schedule,
right?

>> in_atomic_preempt_off will fail.
>>
>> What I could not figure out is WHY we call schedule inside
>> rt_spin_lock_slowlock,
>> and under what condition this call is correct.
> If the lock is already held you schedule out and wait until it is available
> again.
got this.


>


* Re: schedule bug in 4.4.38-rt49
  2019-07-04 10:17     ` xiaoqiang.zhao
@ 2019-07-05  6:31       ` xiaoqiang.zhao
  0 siblings, 0 replies; 5+ messages in thread
From: xiaoqiang.zhao @ 2019-07-05  6:31 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users


On 2019/7/4 6:17 PM, xiaoqiang.zhao wrote:
> Resending as plain text to the linux-rt-users list.
>
> On 2019/7/4 1:50 PM, xiaoqiang.zhao wrote:
>
> On 2019/7/3 7:42 PM, Sebastian Andrzej Siewior wrote:
>> On 2019-06-26 15:35:04 [+0800], xiaoqiang.zhao wrote:
>>> Hi, guys:
>> Hi,
>>
> Thanks for your reply ;-)
>
>>> 2) -> __schedule_bug (leads to a kernel page fault exception, OOPS!!)
>>>
>>> Before schedule, we have called preempt_disable twice; this will
>>> definitely
>>> bump preempt_count to 2, and
>>
>> something probably disabled preemption before that
>
> I feel this does not make sense.  In my opinion, the preempt_count must
> be zero before we call 'schedule()'; otherwise, in_atomic_preempt_off
> will return true and trigger __schedule_bug. If we have already disabled
> preemption, we may be in atomic context and should not call schedule,
> right?
>
>>> in_atomic_preempt_off will fail.
>>>
>>> What I could not figure out is WHY we call schedule inside
>>> rt_spin_lock_slowlock,
>>> and under what condition this call is correct.
>> If the lock is already held you schedule out and wait until it is available
>> again.
> got this.
>
>
>>

Finally, this issue was resolved by reverting commit
80127a39681bd68c959f0953f84a830cbd7c3b1c ("locking/percpu-rwsem: Optimize
readers and reduce global impact").  That commit introduces a
preempt_disable() call in the percpu_up_read() function and can NOT
coexist with the 4.4.38-rt49 preempt-rt patch set.
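
For context, the pattern that commit introduces looks roughly like this
(a simplified sketch based on the upstream commit, not verbatim source):

static inline void percpu_up_read(struct percpu_rw_semaphore *sem)
{
        preempt_disable();                      /* added by 80127a39681b */

        if (likely(rcu_sync_is_idle(&sem->rss)))
                __this_cpu_dec(*sem->read_count);       /* fast path */
        else
                __percpu_up_read(sem);  /* slow path: wake_up(&sem->writer) */

        preempt_enable();
}

On PREEMPT_RT_FULL the wake_up() in the slow path takes the wait-queue
lock, which is a sleeping rt_spin_lock, so it can end up in schedule()
while the preempt_disable() above is still in effect, which is exactly the
"scheduling while atomic" splat at the top of this thread.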

Hope this information is useful to someone who encounters the same
problem ;-)

Thanks Sebastian !



