public inbox for linux-rt-users@vger.kernel.org
* [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks
@ 2022-07-13  7:50 Juri Lelli
  2022-07-13 11:46 ` Peter Zijlstra
  2022-07-13 21:31 ` Srivatsa S. Bhat
  0 siblings, 2 replies; 11+ messages in thread
From: Juri Lelli @ 2022-07-13  7:50 UTC (permalink / raw)
  To: LKML, linux-rt-users
  Cc: Juri Lelli, Ingo Molnar, Peter Zijlstra, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider

Tasks that are being deboosted from SCHED_DEADLINE might enter
enqueue_task_dl() one last time and hit an erroneous BUG_ON condition:
since they are not boosted anymore, the if (is_dl_boosted()) branch is
not taken, but the else if (!dl_prio) branch is, and inside it we
BUG_ON(!is_dl_boosted). is_dl_boosted() is of course false there (so the
BUG_ON triggers), otherwise we would have entered the if branch above.
Long story short, the current condition doesn't make sense and always
leads to triggering of a BUG.

Fix this by only checking enqueue flags, properly: ENQUEUE_REPLENISH has
to be present, but additional flags are not a problem.

Fixes: 2279f540ea7d ("sched/deadline: Fix priority inheritance with multiple scheduling classes")
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/sched/deadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5867e186c39a..0447d46f4718 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1703,7 +1703,7 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 		 * the throttle.
 		 */
 		p->dl.dl_throttled = 0;
-		BUG_ON(!is_dl_boosted(&p->dl) || flags != ENQUEUE_REPLENISH);
+		BUG_ON(!(flags & ENQUEUE_REPLENISH));
 		return;
 	}
 
-- 
2.36.1
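
The difference between the old and the new condition can be reproduced in a
small userspace model. This is only a sketch under stated assumptions: the
flag values are illustrative stand-ins for the kernel's ENQUEUE_* definitions,
and old_bug_fires() / new_bug_fires() are hypothetical names, not kernel
functions.

```c
#include <stdbool.h>

/* Illustrative values, assumed to mirror the kernel's ENQUEUE_* flags. */
#define ENQUEUE_RESTORE   0x02
#define ENQUEUE_NOCLOCK   0x08
#define ENQUEUE_REPLENISH 0x20

/*
 * Old condition: fires when the task is not boosted OR when flags is
 * anything other than exactly ENQUEUE_REPLENISH.
 */
bool old_bug_fires(bool boosted, int flags)
{
	return !boosted || flags != ENQUEUE_REPLENISH;
}

/*
 * New condition: fires only when the ENQUEUE_REPLENISH bit is missing;
 * additional bits are tolerated.
 */
bool new_bug_fires(int flags)
{
	return !(flags & ENQUEUE_REPLENISH);
}
```

With boosted == false, which is the only way to reach this branch according to
the description above, old_bug_fires() returns true for every flags value.
new_bug_fires() returns true only when ENQUEUE_REPLENISH is absent.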


* Re: [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks
  2022-07-13  7:50 [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks Juri Lelli
@ 2022-07-13 11:46 ` Peter Zijlstra
  2022-07-13 12:58   ` Juri Lelli
  2022-07-13 21:31 ` Srivatsa S. Bhat
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2022-07-13 11:46 UTC (permalink / raw)
  To: Juri Lelli
  Cc: LKML, linux-rt-users, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider

On Wed, Jul 13, 2022 at 09:50:14AM +0200, Juri Lelli wrote:
> Tasks that are being deboosted from SCHED_DEADLINE might enter
> enqueue_task_dl() one last time and hit an erroneous BUG_ON condition:
> since they are not boosted anymore, the if (is_dl_boosted()) branch is
> not taken, but the else if (!dl_prio) is and inside this one we
> BUG_ON(!is_dl_boosted), which is of course false (BUG_ON triggered)
> otherwise we had entered the if branch above. Long story short, the
> current condition doesn't make sense and always leads to triggering of a
> BUG.
> 
> Fix this by only checking enqueue flags, properly: ENQUEUE_REPLENISH has
> to be present, but additional flags are not a problem.
> 
> Fixes: 2279f540ea7d ("sched/deadline: Fix priority inheritance with multiple scheduling classes")
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> ---
>  kernel/sched/deadline.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 5867e186c39a..0447d46f4718 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1703,7 +1703,7 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>  		 * the throttle.
>  		 */
>  		p->dl.dl_throttled = 0;
> -		BUG_ON(!is_dl_boosted(&p->dl) || flags != ENQUEUE_REPLENISH);
> +		BUG_ON(!(flags & ENQUEUE_REPLENISH));

While there, can we perhaps make it less fatal? 
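
One reading of "less fatal" is to report the unexpected state once and carry
on instead of halting. The following userspace sketch illustrates that idea
only; the helper name and the flag value are assumptions, and kernel code
would use its own warning primitives (for example WARN_ON_ONCE()) rather than
fprintf().

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative value, assumed to mirror the kernel's ENQUEUE_REPLENISH. */
#define ENQUEUE_REPLENISH 0x20

/*
 * Hypothetical helper: if ENQUEUE_REPLENISH is missing, print a message
 * the first time only and let the caller continue.
 * Returns true whenever the flag is missing.
 */
bool warn_once_if_no_replenish(int flags)
{
	static bool warned;
	bool missing = !(flags & ENQUEUE_REPLENISH);

	if (missing && !warned) {
		warned = true;
		fprintf(stderr,
			"deboosted task enqueued without ENQUEUE_REPLENISH (flags=%#x)\n",
			(unsigned int)flags);
	}
	return missing;
}
```

The caller still learns that the flag was missing from the return value, but
the process keeps running and the message is not repeated.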

* Re: [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks
  2022-07-13 11:46 ` Peter Zijlstra
@ 2022-07-13 12:58   ` Juri Lelli
  0 siblings, 0 replies; 11+ messages in thread
From: Juri Lelli @ 2022-07-13 12:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, linux-rt-users, Ingo Molnar, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider

On 13/07/22 13:46, Peter Zijlstra wrote:
> On Wed, Jul 13, 2022 at 09:50:14AM +0200, Juri Lelli wrote:
> > Tasks that are being deboosted from SCHED_DEADLINE might enter
> > enqueue_task_dl() one last time and hit an erroneous BUG_ON condition:
> > since they are not boosted anymore, the if (is_dl_boosted()) branch is
> > not taken, but the else if (!dl_prio) is and inside this one we
> > BUG_ON(!is_dl_boosted), which is of course false (BUG_ON triggered)
> > otherwise we had entered the if branch above. Long story short, the
> > current condition doesn't make sense and always leads to triggering of a
> > BUG.
> > 
> > Fix this by only checking enqueue flags, properly: ENQUEUE_REPLENISH has
> > to be present, but additional flags are not a problem.
> > 
> > Fixes: 2279f540ea7d ("sched/deadline: Fix priority inheritance with multiple scheduling classes")
> > Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> > ---
> >  kernel/sched/deadline.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 5867e186c39a..0447d46f4718 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1703,7 +1703,7 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
> >  		 * the throttle.
> >  		 */
> >  		p->dl.dl_throttled = 0;
> > -		BUG_ON(!is_dl_boosted(&p->dl) || flags != ENQUEUE_REPLENISH);
> > +		BUG_ON(!(flags & ENQUEUE_REPLENISH));
> 
> While there, can we perhaps make it less fatal? 

Yep. On it. Thanks!

Juri


* Re: [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks
  2022-07-13  7:50 [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks Juri Lelli
  2022-07-13 11:46 ` Peter Zijlstra
@ 2022-07-13 21:31 ` Srivatsa S. Bhat
  2022-07-14  7:28   ` Juri Lelli
  1 sibling, 1 reply; 11+ messages in thread
From: Srivatsa S. Bhat @ 2022-07-13 21:31 UTC (permalink / raw)
  To: Juri Lelli, LKML, linux-rt-users
  Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
	Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Sharan Turlapati,
	bordoloih, ankitja, Keerthana K, Anish Swaminathan, Srivatsa Bhat


Hi Juri,

On 7/13/22 12:50 AM, Juri Lelli wrote:
> Tasks that are being deboosted from SCHED_DEADLINE might enter
> enqueue_task_dl() one last time and hit an erroneous BUG_ON condition:
> since they are not boosted anymore, the if (is_dl_boosted()) branch is
> not taken, but the else if (!dl_prio) is and inside this one we
> BUG_ON(!is_dl_boosted), which is of course false (BUG_ON triggered)
> otherwise we had entered the if branch above. Long story short, the
> current condition doesn't make sense and always leads to triggering of a
> BUG.
> 
> Fix this by only checking enqueue flags, properly: ENQUEUE_REPLENISH has
> to be present, but additional flags are not a problem.
> 
> Fixes: 2279f540ea7d ("sched/deadline: Fix priority inheritance with multiple scheduling classes")

It looks like this problem goes further back than the above commit
(which was merged in v5.10).

Even the oldest LTS kernel (4.9) has code like this:

if (... && p->dl.dl_boosted && ...)) {
	/* code */

} else if (!dl_prio(p->normal_prio)) {

	BUG_ON(!p->dl.dl_boosted || flags != ENQUEUE_REPLENISH);
	return;
} 

And we have observed crashes in the 4.19 kernel series too (CC'ed
Ankit Jain and Him Kalyan who have reproduced this issue).

I believe commit 64be6f1f5f71 ("sched/deadline: Don't replenish from a
!SCHED_DEADLINE entity") introduced the problem, which dates back to
v3.18.

Would you mind updating the Fixes: tag and adding a CC: stable tag as
well, when you respin the patch, please?

Thank you!

> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> ---
>  kernel/sched/deadline.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 5867e186c39a..0447d46f4718 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1703,7 +1703,7 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
>  		 * the throttle.
>  		 */
>  		p->dl.dl_throttled = 0;
> -		BUG_ON(!is_dl_boosted(&p->dl) || flags != ENQUEUE_REPLENISH);
> +		BUG_ON(!(flags & ENQUEUE_REPLENISH));
>  		return;
>  	}
>  
> 

Regards,
Srivatsa
VMware Photon OS

* Re: [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks
  2022-07-13 21:31 ` Srivatsa S. Bhat
@ 2022-07-14  7:28   ` Juri Lelli
  2022-07-15  4:49     ` Ankit Jain
  0 siblings, 1 reply; 11+ messages in thread
From: Juri Lelli @ 2022-07-14  7:28 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: LKML, linux-rt-users, Ingo Molnar, Peter Zijlstra,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider,
	Sharan Turlapati, bordoloih, ankitja, Keerthana K,
	Anish Swaminathan, Srivatsa Bhat

Hi,

On 13/07/22 14:31, Srivatsa S. Bhat wrote:
> 
> Hi Juri,
> 
> On 7/13/22 12:50 AM, Juri Lelli wrote:
> > Tasks that are being deboosted from SCHED_DEADLINE might enter
> > enqueue_task_dl() one last time and hit an erroneous BUG_ON condition:
> > since they are not boosted anymore, the if (is_dl_boosted()) branch is
> > not taken, but the else if (!dl_prio) is and inside this one we
> > BUG_ON(!is_dl_boosted), which is of course false (BUG_ON triggered)
> > otherwise we had entered the if branch above. Long story short, the
> > current condition doesn't make sense and always leads to triggering of a
> > BUG.
> > 
> > Fix this by only checking enqueue flags, properly: ENQUEUE_REPLENISH has
> > to be present, but additional flags are not a problem.
> > 
> > Fixes: 2279f540ea7d ("sched/deadline: Fix priority inheritance with multiple scheduling classes")
> 
> It looks like this problem goes further back than the above commit
> (which was merged in v5.10).
> 
> Even the oldest LTS kernel (4.9) has code like this:
> 
> if (... && p->dl.dl_boosted && ...)) {
> 	/* code */
> 
> } else if (!dl_prio(p->normal_prio)) {
> 
> 	BUG_ON(!p->dl.dl_boosted || flags != ENQUEUE_REPLENISH);
> 	return;
> } 
> 
> And we have observed crashes in the 4.19 kernel series too (CC'ed
> Ankit Jain and Him Kalyan who have reproduced this issue).
> 
> I believe commit 64be6f1f5f71 ("sched/deadline: Don't replenish from a
> !SCHED_DEADLINE entity") introduced the problem, which dates back to
> v3.18.
> 
> Would you mind updating the Fixes: tag and adding a CC: stable tag as
> well, when you respin the patch, please?

I think you are right. Will do.

Thanks for taking a look!

Best,
Juri


* Re: [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks
  2022-07-14  7:28   ` Juri Lelli
@ 2022-07-15  4:49     ` Ankit Jain
  2022-07-15  7:47       ` Juri Lelli
  0 siblings, 1 reply; 11+ messages in thread
From: Ankit Jain @ 2022-07-15  4:49 UTC (permalink / raw)
  To: Juri Lelli
  Cc: srivatsa@csail.mit.edu, LKML, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Sharan Turlapati, Him Kalyan Bordoloi,
	Keerthana Kalyanasundaram, Anish Swaminathan, Srivatsa Bhat

[Resending my previous email in plaintext]

> On 14-Jul-2022, at 12:58 PM, Juri Lelli <juri.lelli@redhat.com> wrote:
> 
> Hi,
> 
> On 13/07/22 14:31, Srivatsa S. Bhat wrote:
>> 
>> Hi Juri,
>> 
>> On 7/13/22 12:50 AM, Juri Lelli wrote:
>>> Tasks that are being deboosted from SCHED_DEADLINE might enter
>>> enqueue_task_dl() one last time and hit an erroneous BUG_ON condition:
>>> since they are not boosted anymore, the if (is_dl_boosted()) branch is
>>> not taken, but the else if (!dl_prio) is and inside this one we
>>> BUG_ON(!is_dl_boosted), which is of course false (BUG_ON triggered)
>>> otherwise we had entered the if branch above. Long story short, the
>>> current condition doesn't make sense and always leads to triggering of a
>>> BUG.
>>> 
>>> Fix this by only checking enqueue flags, properly: ENQUEUE_REPLENISH has
>>> to be present, but additional flags are not a problem.
>>> 
>>> Fixes: 2279f540ea7d ("sched/deadline: Fix priority inheritance with multiple scheduling classes")
>> 
>> It looks like this problem goes further back than the above commit
>> (which was merged in v5.10).
>> 
>> Even the oldest LTS kernel (4.9) has code like this:
>> 
>> if (... && p->dl.dl_boosted && ...)) {
>>      /* code */
>> 
>> } else if (!dl_prio(p->normal_prio)) {
>> 
>>      BUG_ON(!p->dl.dl_boosted || flags != ENQUEUE_REPLENISH);
>>      return;
>> }
>> 
>> And we have observed crashes in the 4.19 kernel series too (CC'ed
>> Ankit Jain and Him Kalyan who have reproduced this issue).
>> 
>> I believe commit 64be6f1f5f71 ("sched/deadline: Don't replenish from a
>> !SCHED_DEADLINE entity") introduced the problem, which dates back to
>> v3.18.
>> 
>> Would you mind updating the Fixes: tag and adding a CC: stable tag as
>> well, when you respin the patch, please?
> 
> I think you are right. Will do.
> 
> Thanks for taking a look!
> 
> Best,
> Juri
> 

Hi Juri,

I tried the patch but it still hit the BUG_ON.

[  163.094094] ------------[ cut here ]------------
[  163.094095] kernel BUG at kernel/sched/deadline.c:1525!
[  163.094103] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  163.094105] CPU: 0 PID: 5494 Comm: stalld/34 Not tainted 4.19.247-rt108-10.ph3-rt #1-photon
[  163.094107] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/03/2018
[  163.094113] RIP: 0010:enqueue_task_dl+0x35d/0x9d0
[  163.094115] Code: 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 8b 56 74 85 d2 0f 88 91 fd ff ff 80 a6 0c 03 00 00 fe 41 83 e5 20 0f 85
[  163.094116] RSP: 0018:ffff9b9286537e40 EFLAGS: 00010046
[  163.094118] RAX: ffffffff840bded0 RBX: ffff8dda07c48000 RCX: 0000000000002000
[  163.094119] RDX: 0000000000000078 RSI: ffff8dda07c48000 RDI: ffff8dddb79a87c0
[  163.094120] RBP: ffff9b9286537e78 R08: 0000000000000000 R09: 000000000000007f
[  163.094121] R10: ffff9b9286537e68 R11: 0000000000000000 R12: ffff9b9286537ef0
[  163.094121] R13: 0000000000000000 R14: ffff8dddb79a87c0 R15: ffff8dda07c482b8
[  163.094123] FS:  00007f81a27e4700(0000) GS:ffff8ddbb7600000(0000) knlGS:0000000000000000
[  163.094124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  163.094125] CR2: 00007f85e6cfc4c0 CR3: 0000000233744004 CR4: 00000000007606b0
[  163.094176] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  163.094177] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  163.094177] PKRU: 55555554
[  163.094178] Call Trace:
[  163.094183]  ? dequeue_task_dl+0x38/0x1d0
[  163.094188]  __sched_setscheduler+0x2e2/0x8e0
[  163.094191]  __x64_sys_sched_setattr+0x74/0xb0
[  163.094194]  do_syscall_64+0x60/0x1b0
[  163.094200]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  163.094201] RIP: 0033:0x7f81b6ffe319
[  163.094202] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c
[  163.094203] RSP: 002b:00007f81a27e3e28 EFLAGS: 00000206 ORIG_RAX: 000000000000013a
[  163.094205] RAX: ffffffffffffffda RBX: 000000000000150a RCX: 00007f81b6ffe319
[  163.094205] RDX: 0000000000000000 RSI: 00007f81a27e3e50 RDI: 000000000000150a
[  163.094206] RBP: 000000000000150a R08: 0000000000000000 R09: 0000000000000030
[  163.094207] R10: 00007f80f0002090 R11: 0000000000000206 R12: 0000556306de7a20
[  163.094208] R13: 0000000000000002 R14: 0000556306ba3570 R15: 00007f81a27e4700
[  163.094210] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp
[  163.099512] ---[ end trace 0000000000000002 ]--- 


In enqueue_task_dl():

} else if (!dl_prio(p->normal_prio)) {
  …
  BUG_ON(!(flags & ENQUEUE_REPLENISH));
  return;
}

I observe flags value as (ENQUEUE_RESTORE |  ENQUEUE_NOCLOCK)
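
To make the observation above concrete, the flags word can be decoded bit by
bit. This is a minimal sketch: the numeric values are assumed to mirror the
kernel's ENQUEUE_* definitions, only the three flags named in this thread are
listed, and decode_enqueue_flags() is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative values, assumed to mirror the kernel's ENQUEUE_* flags. */
#define ENQUEUE_RESTORE   0x02
#define ENQUEUE_NOCLOCK   0x08
#define ENQUEUE_REPLENISH 0x20

struct flag_name {
	int bit;
	const char *name;
};

static const struct flag_name enqueue_flag_names[] = {
	{ ENQUEUE_RESTORE,   "ENQUEUE_RESTORE" },
	{ ENQUEUE_NOCLOCK,   "ENQUEUE_NOCLOCK" },
	{ ENQUEUE_REPLENISH, "ENQUEUE_REPLENISH" },
};

/*
 * Write the names of the known bits set in flags into buf, separated by
 * '|'. Bits that are not in the table are silently ignored.
 */
void decode_enqueue_flags(int flags, char *buf, size_t len)
{
	size_t i;
	size_t n = sizeof(enqueue_flag_names) / sizeof(enqueue_flag_names[0]);

	if (len == 0)
		return;
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		if (!(flags & enqueue_flag_names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, enqueue_flag_names[i].name, len - strlen(buf) - 1);
	}
}
```

Under these assumed values, ENQUEUE_RESTORE | ENQUEUE_NOCLOCK is 0x0a, which
has the 0x20 bit clear, so !(flags & ENQUEUE_REPLENISH) is true and the
patched BUG_ON still fires.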


Thanks,
Ankit


* Re: [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks
  2022-07-15  4:49     ` Ankit Jain
@ 2022-07-15  7:47       ` Juri Lelli
  2022-07-18  7:46         ` Ankit Jain
  0 siblings, 1 reply; 11+ messages in thread
From: Juri Lelli @ 2022-07-15  7:47 UTC (permalink / raw)
  To: Ankit Jain
  Cc: srivatsa@csail.mit.edu, LKML, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Sharan Turlapati, Him Kalyan Bordoloi,
	Keerthana Kalyanasundaram, Anish Swaminathan, Srivatsa Bhat

Hi,

On 15/07/22 04:49, Ankit Jain wrote:

[...]

> Hi Juri,
> 
> I tried the patch but it still hit the BUG_ON.
> 
> [  163.094094] ------------[ cut here ]------------
> [  163.094095] kernel BUG at kernel/sched/deadline.c:1525!
> [  163.094103] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> [  163.094105] CPU: 0 PID: 5494 Comm: stalld/34 Not tainted 4.19.247-rt108-10.ph3-rt #1-photon
> [  163.094107] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/03/2018
> [  163.094113] RIP: 0010:enqueue_task_dl+0x35d/0x9d0
> [  163.094115] Code: 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 8b 56 74 85 d2 0f 88 91 fd ff ff 80 a6 0c 03 00 00 fe 41 83 e5 20 0f 85
> [  163.094116] RSP: 0018:ffff9b9286537e40 EFLAGS: 00010046
> [  163.094118] RAX: ffffffff840bded0 RBX: ffff8dda07c48000 RCX: 0000000000002000
> [  163.094119] RDX: 0000000000000078 RSI: ffff8dda07c48000 RDI: ffff8dddb79a87c0
> [  163.094120] RBP: ffff9b9286537e78 R08: 0000000000000000 R09: 000000000000007f
> [  163.094121] R10: ffff9b9286537e68 R11: 0000000000000000 R12: ffff9b9286537ef0
> [  163.094121] R13: 0000000000000000 R14: ffff8dddb79a87c0 R15: ffff8dda07c482b8
> [  163.094123] FS:  00007f81a27e4700(0000) GS:ffff8ddbb7600000(0000) knlGS:0000000000000000
> [  163.094124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  163.094125] CR2: 00007f85e6cfc4c0 CR3: 0000000233744004 CR4: 00000000007606b0
> [  163.094176] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  163.094177] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  163.094177] PKRU: 55555554
> [  163.094178] Call Trace:
> [  163.094183]  ? dequeue_task_dl+0x38/0x1d0
> [  163.094188]  __sched_setscheduler+0x2e2/0x8e0
> [  163.094191]  __x64_sys_sched_setattr+0x74/0xb0
> [  163.094194]  do_syscall_64+0x60/0x1b0
> [  163.094200]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  163.094201] RIP: 0033:0x7f81b6ffe319
> [  163.094202] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c
> [  163.094203] RSP: 002b:00007f81a27e3e28 EFLAGS: 00000206 ORIG_RAX: 000000000000013a
> [  163.094205] RAX: ffffffffffffffda RBX: 000000000000150a RCX: 00007f81b6ffe319
> [  163.094205] RDX: 0000000000000000 RSI: 00007f81a27e3e50 RDI: 000000000000150a
> [  163.094206] RBP: 000000000000150a R08: 0000000000000000 R09: 0000000000000030
> [  163.094207] R10: 00007f80f0002090 R11: 0000000000000206 R12: 0000556306de7a20
> [  163.094208] R13: 0000000000000002 R14: 0000556306ba3570 R15: 00007f81a27e4700
> [  163.094210] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp
> [  163.099512] ---[ end trace 0000000000000002 ]--- 
> 
> 
> In enqueue_task_dl():
> 
> } else if (!dl_prio(p->normal_prio)) {
>   …
>   BUG_ON(!(flags & ENQUEUE_REPLENISH));
>   return;
> }
> 
> I observe flags value as (ENQUEUE_RESTORE |  ENQUEUE_NOCLOCK)

Thanks for testing!

However, it looks like 4.19-rt is missing at least commit 46fcc4b00c3cc
("sched/deadline: Fix stale throttling on de-/boosted tasks") and commit
2279f540ea7d0 ("sched/deadline: Fix priority inheritance with multiple
scheduling classes"), which might also be playing a role here.

Best,
Juri


* Re: [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks
  2022-07-15  7:47       ` Juri Lelli
@ 2022-07-18  7:46         ` Ankit Jain
  2022-07-18 13:01           ` Juri Lelli
  0 siblings, 1 reply; 11+ messages in thread
From: Ankit Jain @ 2022-07-18  7:46 UTC (permalink / raw)
  To: Juri Lelli
  Cc: srivatsa@csail.mit.edu, LKML, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Sharan Turlapati, Him Kalyan Bordoloi,
	Keerthana Kalyanasundaram, Anish Swaminathan, Srivatsa Bhat



> On 15-Jul-2022, at 1:17 PM, Juri Lelli <juri.lelli@redhat.com> wrote:
> 
> Hi,
> 
> On 15/07/22 04:49, Ankit Jain wrote:
> 
> [...]
> 
>> Hi Juri,
>> 
>> I tried the patch but it still hit the BUG_ON.
>> 
>> [  163.094094] ------------[ cut here ]------------
>> [  163.094095] kernel BUG at kernel/sched/deadline.c:1525!
>> [  163.094103] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>> [  163.094105] CPU: 0 PID: 5494 Comm: stalld/34 Not tainted 4.19.247-rt108-10.ph3-rt #1-photon
>> [  163.094107] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/03/2018
>> [  163.094113] RIP: 0010:enqueue_task_dl+0x35d/0x9d0
>> [  163.094115] Code: 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 8b 56 74 85 d2 0f 88 91 fd ff ff 80 a6 0c 03 00 00 fe 41 83 e5 20 0f 85
>> [  163.094116] RSP: 0018:ffff9b9286537e40 EFLAGS: 00010046
>> [  163.094118] RAX: ffffffff840bded0 RBX: ffff8dda07c48000 RCX: 0000000000002000
>> [  163.094119] RDX: 0000000000000078 RSI: ffff8dda07c48000 RDI: ffff8dddb79a87c0
>> [  163.094120] RBP: ffff9b9286537e78 R08: 0000000000000000 R09: 000000000000007f
>> [  163.094121] R10: ffff9b9286537e68 R11: 0000000000000000 R12: ffff9b9286537ef0
>> [  163.094121] R13: 0000000000000000 R14: ffff8dddb79a87c0 R15: ffff8dda07c482b8
>> [  163.094123] FS:  00007f81a27e4700(0000) GS:ffff8ddbb7600000(0000) knlGS:0000000000000000
>> [  163.094124] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  163.094125] CR2: 00007f85e6cfc4c0 CR3: 0000000233744004 CR4: 00000000007606b0
>> [  163.094176] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  163.094177] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  163.094177] PKRU: 55555554
>> [  163.094178] Call Trace:
>> [  163.094183]  ? dequeue_task_dl+0x38/0x1d0
>> [  163.094188]  __sched_setscheduler+0x2e2/0x8e0
>> [  163.094191]  __x64_sys_sched_setattr+0x74/0xb0
>> [  163.094194]  do_syscall_64+0x60/0x1b0
>> [  163.094200]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [  163.094201] RIP: 0033:0x7f81b6ffe319
>> [  163.094202] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c
>> [  163.094203] RSP: 002b:00007f81a27e3e28 EFLAGS: 00000206 ORIG_RAX: 000000000000013a
>> [  163.094205] RAX: ffffffffffffffda RBX: 000000000000150a RCX: 00007f81b6ffe319
>> [  163.094205] RDX: 0000000000000000 RSI: 00007f81a27e3e50 RDI: 000000000000150a
>> [  163.094206] RBP: 000000000000150a R08: 0000000000000000 R09: 0000000000000030
>> [  163.094207] R10: 00007f80f0002090 R11: 0000000000000206 R12: 0000556306de7a20
>> [  163.094208] R13: 0000000000000002 R14: 0000556306ba3570 R15: 00007f81a27e4700
>> [  163.094210] Modules linked in: ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp
>> [  163.099512] ---[ end trace 0000000000000002 ]---
>> 
>> 
>> In enqueue_task_dl():
>> 
>> } else if (!dl_prio(p->normal_prio)) {
>>  …
>>  BUG_ON(!(flags & ENQUEUE_REPLENISH));
>>  return;
>> }
>> 
>> I observe flags value as (ENQUEUE_RESTORE |  ENQUEUE_NOCLOCK)
> 
> Thanks for testing!
> 
> However, it looks like 4.19-rt is at least missing commit 46fcc4b00c3cc
> ("sched/deadline: Fix stale throttling on de-/boosted tasks") and commit
> 2279f540ea7d0 ("sched/deadline: Fix priority inheritance with multiple
> scheduling classes") that might be playing also a role here.
> 
> Best,
> Juri
> 

Hi Juri,

Actually, while testing I had already included the commits below in 4.19-rt:

feff2e65efd8d84cf831668e182b2ce73c604bbb (sched/deadline: Unthrottle PI boosted threads while enqueuing)
46fcc4b00c3cca8adb9b7c9afdd499f64e427135 (sched/deadline: Fix stale throttling on de-/boosted tasks)
2279f540ea7d05f22d2f0c4224319330228586bc (sched/deadline: Fix priority inheritance with multiple scheduling classes)
0e3872499de1a1230cef5221607d71aa09264bd5 (kernel/sched: Remove dl_boosted flag comment)

Thanks,
Ankit


* Re: [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks
  2022-07-18  7:46         ` Ankit Jain
@ 2022-07-18 13:01           ` Juri Lelli
  2022-07-19  5:30             ` Ankit Jain
  0 siblings, 1 reply; 11+ messages in thread
From: Juri Lelli @ 2022-07-18 13:01 UTC (permalink / raw)
  To: Ankit Jain
  Cc: srivatsa@csail.mit.edu, LKML, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Sharan Turlapati, Him Kalyan Bordoloi,
	Keerthana Kalyanasundaram, Anish Swaminathan, Srivatsa Bhat

On 18/07/22 07:46, Ankit Jain wrote:

...

> Hi Juri,
> 
> Actually, while testing I already included below commits in 4.19-rt :
> 
> feff2e65efd8d84cf831668e182b2ce73c604bbb (sched/deadline: Unthrottle PI boosted threads while
>  enqueuing)
> 46fcc4b00c3cca8adb9b7c9afdd499f64e427135 (sched/deadline: Fix stale throttling on de-/boosted tasks)
> 2279f540ea7d05f22d2f0c4224319330228586bc (sched/deadline: Fix priority inheritance with multiple)
> 0e3872499de1a1230cef5221607d71aa09264bd5 (kernel/sched: Remove dl_boosted flag comment)

Interesting.

Is the workload you are using to test this easily reproducible? I'd like
to try that out on my end to check if I see the same (of course the
issue I was working on goes away with my fix :).

Best,
Juri


* Re: [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks
  2022-07-18 13:01           ` Juri Lelli
@ 2022-07-19  5:30             ` Ankit Jain
  2022-07-19  7:19               ` Juri Lelli
  0 siblings, 1 reply; 11+ messages in thread
From: Ankit Jain @ 2022-07-19  5:30 UTC (permalink / raw)
  To: Juri Lelli
  Cc: srivatsa@csail.mit.edu, LKML, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Sharan Turlapati, Him Kalyan Bordoloi,
	Keerthana Kalyanasundaram, Anish Swaminathan, Srivatsa Bhat



> On 18-Jul-2022, at 6:31 PM, Juri Lelli <juri.lelli@redhat.com> wrote:
> 
> On 18/07/22 07:46, Ankit Jain wrote:
> 
> ...
> 
>> Hi Juri,
>> 
>> Actually, while testing I already included below commits in 4.19-rt :
>> 
>> feff2e65efd8d84cf831668e182b2ce73c604bbb (sched/deadline: Unthrottle PI boosted threads while
>> enqueuing)
>> 46fcc4b00c3cca8adb9b7c9afdd499f64e427135 (sched/deadline: Fix stale throttling on de-/boosted tasks)
>> 2279f540ea7d05f22d2f0c4224319330228586bc (sched/deadline: Fix priority inheritance with multiple)
>> 0e3872499de1a1230cef5221607d71aa09264bd5 (kernel/sched: Remove dl_boosted flag comment)
> 
> Interesting.
> 
> Is the workload you are using to test this easily reproducible? I'd like
> to try that out on my end to check if I see the same (of course the
> issue I was working on goes away with my fix :).
> 
> Best,
> Juri
> 

Hi Juri,

The test with which I am able to hit the issue is as follows:
	• Schedule SCHED_FIFO/55 (sched_priority = 55) tasks running an infinite loop on all isolated cores.
	• Spawn 30-40 docker containers in a loop (docker load, docker run).
	• Immediately after that, schedule SCHED_FIFO/55 (sched_priority = 55) tasks running an infinite loop on all isolated cores again.
	• BUG_ON gets hit almost every time.
System config is as follows:
	• 4.19-rt kernel
	• 40 CPUs (0-1 housekeeping, 2-39 isolated)
	• stalld-1.3.0 with the fixes from the latest version (for task starvation avoidance), "tuned" with the real-time profile
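
The SCHED_FIFO busy-loop tasks in the steps above could be approximated by
something like the following. This is a sketch, not the actual test program:
become_fifo() and spin_forever() are hypothetical helpers, pinning to the
isolated CPUs is left out, and switching to SCHED_FIFO normally requires
CAP_SYS_NICE or a suitable RLIMIT_RTPRIO.

```c
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: switch the calling thread to SCHED_FIFO at the
 * given priority (55 in the steps above). Returns 0 on success or the
 * errno value reported by sched_setscheduler().
 */
int become_fifo(int prio)
{
	struct sched_param sp = { .sched_priority = prio };

	if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
		return errno;
	return 0;
}

/*
 * Busy loop that never sleeps or yields. Under SCHED_FIFO this keeps
 * lower-priority tasks off the CPU it runs on, so only call it on a
 * CPU that is meant to be starved.
 */
void spin_forever(void)
{
	volatile unsigned long counter = 0;

	for (;;)
		counter++;
}
```

A test task would call become_fifo(55) and, if that returns 0, call
spin_forever(); one such task per isolated CPU would roughly match the first
and third steps.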

Thanks.


* Re: [PATCH] sched/deadline: Fix BUG_ON condition for deboosted tasks
  2022-07-19  5:30             ` Ankit Jain
@ 2022-07-19  7:19               ` Juri Lelli
  0 siblings, 0 replies; 11+ messages in thread
From: Juri Lelli @ 2022-07-19  7:19 UTC (permalink / raw)
  To: Ankit Jain
  Cc: srivatsa@csail.mit.edu, LKML, linux-rt-users, Ingo Molnar,
	Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider, Sharan Turlapati, Him Kalyan Bordoloi,
	Keerthana Kalyanasundaram, Anish Swaminathan, Srivatsa Bhat

On 19/07/22 05:30, Ankit Jain wrote:
> 
> 
> > On 18-Jul-2022, at 6:31 PM, Juri Lelli <juri.lelli@redhat.com> wrote:
> > 
> > On 18/07/22 07:46, Ankit Jain wrote:
> > 
> > ...
> > 
> >> Hi Juri,
> >> 
> >> Actually, while testing I already included below commits in 4.19-rt :
> >> 
> >> feff2e65efd8d84cf831668e182b2ce73c604bbb (sched/deadline: Unthrottle PI boosted threads while
> >> enqueuing)
> >> 46fcc4b00c3cca8adb9b7c9afdd499f64e427135 (sched/deadline: Fix stale throttling on de-/boosted tasks)
> >> 2279f540ea7d05f22d2f0c4224319330228586bc (sched/deadline: Fix priority inheritance with multiple)
> >> 0e3872499de1a1230cef5221607d71aa09264bd5 (kernel/sched: Remove dl_boosted flag comment)
> > 
> > Interesting.
> > 
> > Is the workload you are using to test this easily reproducible? I'd like
> > to try that out on my end to check if I see the same (of course the
> > issue I was working on goes away with my fix :).
> > 
> > Best,
> > Juri
> > 
> 
> Hi Juri,
> 
> The test with which I am able to hit the issue is as follows:
> 	• Schedule SCHED_FIFO/55 (sched_priority = 55) tasks running infinite loop on all isolated cores.
> 	• spawn 30-40 docker containers in loop (docker load , docker run)
> 	• Immediately after that schedule SCHED_FIFO/55 (sched_priority = 55) tasks running infinite loop on all isolated cores again.
> 	• BUG_ON gets hit almost every time.
> System config as follows:
> 	• 4.19-rt kernel
> 	• 40 cpu (0-1 housekeeping, 2-39 isol cpus)
> 	• stalld-1.3.0 with the fixes from latest version (for task starvation avoidance), "tuned" with real-time profile 

Thanks for the details. Yeah, I should be able to set this up on my end.
It might just take a bit though, as I have some PTO coming up.
But, I'll get to it eventually.

Thanks again,
Juri

