public inbox for linux-kernel@vger.kernel.org
* [BUG] irqchip/gic-v4.1: sleeping function called from invalid context
@ 2020-06-29  9:39 Zenghui Yu
  2020-06-29 14:01 ` Marc Zyngier
  0 siblings, 1 reply; 3+ messages in thread
From: Zenghui Yu @ 2020-06-29  9:39 UTC
  To: Marc Zyngier, Thomas Gleixner, Jason Cooper
  Cc: Linux Kernel Mailing List, wanghaibin.wang, kuhn.chenqun,
	wangjingyi11

Hi All,

Booting the latest kernel with DEBUG_ATOMIC_SLEEP=y on a GICv4.1 enabled
box, I get the following kernel splat:

[    0.053766] BUG: sleeping function called from invalid context at mm/slab.h:567
[    0.053767] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
[    0.053769] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3+ #23
[    0.053770] Call trace:
[    0.053774]  dump_backtrace+0x0/0x218
[    0.053775]  show_stack+0x2c/0x38
[    0.053777]  dump_stack+0xc4/0x10c
[    0.053779]  ___might_sleep+0xfc/0x140
[    0.053780]  __might_sleep+0x58/0x90
[    0.053782]  slab_pre_alloc_hook+0x7c/0x90
[    0.053783]  kmem_cache_alloc_trace+0x60/0x2f0
[    0.053785]  its_cpu_init+0x6f4/0xe40
[    0.053786]  gic_starting_cpu+0x24/0x38
[    0.053788]  cpuhp_invoke_callback+0xa0/0x710
[    0.053789]  notify_cpu_starting+0xcc/0xd8
[    0.053790]  secondary_start_kernel+0x148/0x200

# ./scripts/faddr2line vmlinux its_cpu_init+0x6f4/0xe40
its_cpu_init+0x6f4/0xe40:
allocate_vpe_l1_table at drivers/irqchip/irq-gic-v3-its.c:2818
(inlined by) its_cpu_init_lpis at drivers/irqchip/irq-gic-v3-its.c:3138
(inlined by) its_cpu_init at drivers/irqchip/irq-gic-v3-its.c:5166

I tried replacing the GFP_KERNEL flag with GFP_ATOMIC for the allocation
in this atomic context, and the splat disappears. But after a quick look
at [*], it does not seem like a good idea to allocate memory within the
CPU hotplug notifier. I really don't know much about this, so please have
a look.
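
For readers unfamiliar with the check behind this splat, here is a minimal
userspace sketch, not kernel code. It assumes only what the trace above
shows: the slab pre-allocation hook calls __might_sleep(), and the debug
check complains when the caller is atomic. The model further assumes that
the check fires only for allocation flags that permit blocking. All names
(model_ctx, model_alloc, MODEL_GFP_*) are hypothetical, and the flag values
are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag bits for this model; NOT the kernel's GFP values. */
#define MODEL_MAY_BLOCK  0x1u              /* allocator is allowed to sleep */
#define MODEL_GFP_KERNEL (MODEL_MAY_BLOCK) /* stands in for GFP_KERNEL */
#define MODEL_GFP_ATOMIC 0x0u              /* stands in for GFP_ATOMIC */

/* Hypothetical snapshot of the calling context. */
struct model_ctx {
	int  preempt_depth; /* > 0 models in_atomic() != 0 */
	bool irqs_off;      /* models irqs_disabled() != 0 */
	int  splats;        /* number of "sleeping function" reports so far */
};

/* True when sleeping would be illegal in this context. */
static bool model_in_atomic(const struct model_ctx *ctx)
{
	return ctx->preempt_depth > 0 || ctx->irqs_off;
}

/*
 * Models the pre-allocation check: report a splat only when the flags
 * allow blocking AND the caller must not sleep. The allocation itself
 * still proceeds, as it does with the debug option enabled.
 */
static void *model_alloc(struct model_ctx *ctx, size_t size, unsigned int flags)
{
	assert(ctx != NULL);
	if ((flags & MODEL_MAY_BLOCK) && model_in_atomic(ctx))
		ctx->splats++;
	return calloc(1, size);
}
```

With preempt_depth = 1 and irqs_off = true, as in the report, a
MODEL_GFP_KERNEL request increments splats while a MODEL_GFP_ATOMIC request
does not. The calling context is unchanged in both cases; only the promise
not to sleep differs.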

[*] 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=11e37d357f6ba7a9af850a872396082cc0a0001f


Thanks,
Zenghui


* Re: [BUG] irqchip/gic-v4.1: sleeping function called from invalid context
  2020-06-29  9:39 [BUG] irqchip/gic-v4.1: sleeping function called from invalid context Zenghui Yu
@ 2020-06-29 14:01 ` Marc Zyngier
  2020-06-30  3:00   ` Zenghui Yu
  0 siblings, 1 reply; 3+ messages in thread
From: Marc Zyngier @ 2020-06-29 14:01 UTC
  To: Zenghui Yu
  Cc: Thomas Gleixner, Jason Cooper, Linux Kernel Mailing List,
	wanghaibin.wang, kuhn.chenqun, wangjingyi11

Hi Zenghui,

On 2020-06-29 10:39, Zenghui Yu wrote:
> Hi All,
> 
> Booting the latest kernel with DEBUG_ATOMIC_SLEEP=y on a GICv4.1 enabled
> box, I get the following kernel splat:
> 
> [    0.053766] BUG: sleeping function called from invalid context at mm/slab.h:567
> [    0.053767] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
> [    0.053769] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3+ #23
> [    0.053770] Call trace:
> [    0.053774]  dump_backtrace+0x0/0x218
> [    0.053775]  show_stack+0x2c/0x38
> [    0.053777]  dump_stack+0xc4/0x10c
> [    0.053779]  ___might_sleep+0xfc/0x140
> [    0.053780]  __might_sleep+0x58/0x90
> [    0.053782]  slab_pre_alloc_hook+0x7c/0x90
> [    0.053783]  kmem_cache_alloc_trace+0x60/0x2f0
> [    0.053785]  its_cpu_init+0x6f4/0xe40
> [    0.053786]  gic_starting_cpu+0x24/0x38
> [    0.053788]  cpuhp_invoke_callback+0xa0/0x710
> [    0.053789]  notify_cpu_starting+0xcc/0xd8
> [    0.053790]  secondary_start_kernel+0x148/0x200
> 
> # ./scripts/faddr2line vmlinux its_cpu_init+0x6f4/0xe40
> its_cpu_init+0x6f4/0xe40:
> allocate_vpe_l1_table at drivers/irqchip/irq-gic-v3-its.c:2818
> (inlined by) its_cpu_init_lpis at drivers/irqchip/irq-gic-v3-its.c:3138
> (inlined by) its_cpu_init at drivers/irqchip/irq-gic-v3-its.c:5166

Let me guess: a system with more than a single CommonLPIAff group?

> I tried replacing the GFP_KERNEL flag with GFP_ATOMIC for the allocation
> in this atomic context, and the splat disappears. But after a quick look
> at [*], it does not seem like a good idea to allocate memory within the
> CPU hotplug notifier. I really don't know much about this, so please have
> a look.
> 
> [*]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=11e37d357f6ba7a9af850a872396082cc0a0001f

The allocation of the cpumask is pretty benign, and could either be
allocated upfront for all RDs (and freed on detecting that we share
the same CommonLPIAff group) or made atomic.
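
The first of these two options can be pictured with a small userspace
model. This is only a sketch under stated assumptions, not the driver's
code: the struct model_rd type, its fields, and the model_* helpers are all
hypothetical, and a single unsigned long stands in for a cpumask. Each
redistributor gets a mask in a context that may sleep. When a CPU later
comes up in atomic context, it either keeps its own mask or frees it and
adopts the one already owned by an online member of the same group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_NR_RDS 4

/* Hypothetical per-redistributor state for this model. */
struct model_rd {
	int            group;     /* stands in for a CommonLPIAff group id */
	unsigned long *mask;      /* stands in for vpe_table_mask */
	bool           owns_mask; /* true if this RD must free the mask */
	bool           online;
};

/* Sleepable context: allocate one mask per RD. Returns 0 or -1. */
static int model_prealloc(struct model_rd *rds, int n)
{
	for (int i = 0; i < n; i++) {
		rds[i].mask = calloc(1, sizeof(*rds[i].mask));
		if (!rds[i].mask) {
			while (i--) {      /* unwind what was allocated */
				free(rds[i].mask);
				rds[i].mask = NULL;
			}
			return -1;
		}
		rds[i].owns_mask = true;
		rds[i].online = false;
	}
	return 0;
}

/* Atomic context: no allocation, only a possible free and a share. */
static void model_rd_online(struct model_rd *rds, int n, int cpu)
{
	assert(cpu >= 0 && cpu < n);
	for (int i = 0; i < n; i++) {
		if (i != cpu && rds[i].online && rds[i].group == rds[cpu].group) {
			free(rds[cpu].mask);          /* drop the spare mask */
			rds[cpu].mask = rds[i].mask;  /* share the group's mask */
			rds[cpu].owns_mask = false;
			break;
		}
	}
	*rds[cpu].mask |= 1UL << cpu;
	rds[cpu].online = true;
}

/* Free every mask exactly once, via its owner. */
static void model_release(struct model_rd *rds, int n)
{
	for (int i = 0; i < n; i++) {
		if (rds[i].owns_mask)
			free(rds[i].mask);
		rds[i].mask = NULL;
		rds[i].owns_mask = false;
	}
}
```

The cost of this approach is one short-lived allocation per redistributor,
which is small for a mask but is the same pattern that becomes wasteful for
larger buffers.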

The much bigger issue is the alloc_pages call just after. Allocating this
upfront probably is the wrong thing to do, as you are likely to allocate
way too much memory, even if you free it quickly afterwards.

At this stage, I'd rather we turn this into an atomic allocation. A notifier
is just another atomic context, and if this fails at such an early stage,
then the CPU is unlikely to continue booting...

Would you like to write a patch for this? Given that you have tested
something, it probably already exists. Or do you want me to do it?

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...


* Re: [BUG] irqchip/gic-v4.1: sleeping function called from invalid context
  2020-06-29 14:01 ` Marc Zyngier
@ 2020-06-30  3:00   ` Zenghui Yu
  0 siblings, 0 replies; 3+ messages in thread
From: Zenghui Yu @ 2020-06-30  3:00 UTC
  To: Marc Zyngier
  Cc: Thomas Gleixner, Jason Cooper, Linux Kernel Mailing List,
	wanghaibin.wang, kuhn.chenqun, wangjingyi11

Hi Marc,

On 2020/6/29 22:01, Marc Zyngier wrote:
> Hi Zenghui,
> 
> On 2020-06-29 10:39, Zenghui Yu wrote:
>> Hi All,
>>
>> Booting the latest kernel with DEBUG_ATOMIC_SLEEP=y on a GICv4.1 enabled
>> box, I get the following kernel splat:
>>
>> [    0.053766] BUG: sleeping function called from invalid context at mm/slab.h:567
>> [    0.053767] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
>> [    0.053769] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3+ #23
>> [    0.053770] Call trace:
>> [    0.053774]  dump_backtrace+0x0/0x218
>> [    0.053775]  show_stack+0x2c/0x38
>> [    0.053777]  dump_stack+0xc4/0x10c
>> [    0.053779]  ___might_sleep+0xfc/0x140
>> [    0.053780]  __might_sleep+0x58/0x90
>> [    0.053782]  slab_pre_alloc_hook+0x7c/0x90
>> [    0.053783]  kmem_cache_alloc_trace+0x60/0x2f0
>> [    0.053785]  its_cpu_init+0x6f4/0xe40
>> [    0.053786]  gic_starting_cpu+0x24/0x38
>> [    0.053788]  cpuhp_invoke_callback+0xa0/0x710
>> [    0.053789]  notify_cpu_starting+0xcc/0xd8
>> [    0.053790]  secondary_start_kernel+0x148/0x200
>>
>> # ./scripts/faddr2line vmlinux its_cpu_init+0x6f4/0xe40
>> its_cpu_init+0x6f4/0xe40:
>> allocate_vpe_l1_table at drivers/irqchip/irq-gic-v3-its.c:2818
>> (inlined by) its_cpu_init_lpis at drivers/irqchip/irq-gic-v3-its.c:3138
>> (inlined by) its_cpu_init at drivers/irqchip/irq-gic-v3-its.c:5166
> 
> Let me guess: a system with more than a single CommonLPIAff group?

I *think* you're right. E.g., when we're allocating vpe_table_mask for
the first CPU of the second CommonLPIAff group.

The truth is that all the GICv4.1 boards I have on hand have only a
single CommonLPIAff group. Just to get the above backtrace, I did some
crazy hacking on my 920 to make it pretend to be v4.1 capable (well,
please ignore me). Hopefully I can get a new GICv4.1 board with more than
one CommonLPIAff group next month and do more tests.

>> I tried replacing the GFP_KERNEL flag with GFP_ATOMIC for the allocation
>> in this atomic context, and the splat disappears. But after a quick look
>> at [*], it does not seem like a good idea to allocate memory within the
>> CPU hotplug notifier. I really don't know much about this, so please have
>> a look.
>>
>> [*]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=11e37d357f6ba7a9af850a872396082cc0a0001f 
>>
> 
> The allocation of the cpumask is pretty benign, and could either be
> allocated upfront for all RDs (and freed on detecting that we share
> the same CommonLPIAff group) or made atomic.
> 
> The much bigger issue is the alloc_pages call just after. Allocating this
> upfront probably is the wrong thing to do, as you are likely to allocate
> way too much memory, even if you free it quickly afterwards.
> 
> At this stage, I'd rather we turn this into an atomic allocation. A notifier
> is just another atomic context, and if this fails at such an early stage,
> then the CPU is unlikely to continue booting...

Got it.

> Would you like to write a patch for this? Given that you have tested
> something, it probably already exists. Or do you want me to do it?

Yes, I have written something like the patch below. I will add a commit
message and send it out today.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 6a5a87fc4601..b66eeca442c4 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2814,7 +2814,7 @@ static int allocate_vpe_l1_table(void)
  	if (val & GICR_VPROPBASER_4_1_VALID)
  		goto out;

-	gic_data_rdist()->vpe_table_mask = kzalloc(sizeof(cpumask_t), GFP_KERNEL);
+	gic_data_rdist()->vpe_table_mask = kzalloc(sizeof(cpumask_t), GFP_ATOMIC);
  	if (!gic_data_rdist()->vpe_table_mask)
  		return -ENOMEM;

@@ -2881,7 +2881,7 @@ static int allocate_vpe_l1_table(void)

  	pr_debug("np = %d, npg = %lld, psz = %d, epp = %d, esz = %d\n",
  		 np, npg, psz, epp, esz);
-	page = alloc_pages(GFP_KERNEL | __GFP_ZERO, get_order(np * PAGE_SIZE));
+	page = alloc_pages(GFP_ATOMIC | __GFP_ZERO, get_order(np * PAGE_SIZE));
  	if (!page)
  		return -ENOMEM;
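
As a side note on the size of the second allocation, the
get_order(np * PAGE_SIZE) argument rounds the request up to a power-of-two
number of pages. The userspace sketch below models that rounding so the
effect on an atomic request is easy to see. It is not the kernel's
implementation: the model_* names are hypothetical, and the 4 KiB page size
is an assumption (the real value depends on the kernel configuration).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for this model: 4 KiB. */
#define MODEL_PAGE_SHIFT 12u
#define MODEL_PAGE_SIZE  ((size_t)1 << MODEL_PAGE_SHIFT)

/* Smallest order such that (MODEL_PAGE_SIZE << order) >= size. */
static unsigned int model_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((MODEL_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Number of pages actually handed out for a request of np pages. */
static size_t model_pages_allocated(unsigned int np)
{
	return (size_t)1 << model_get_order((size_t)np * MODEL_PAGE_SIZE);
}
```

For example, model_pages_allocated(3) yields 4 and model_pages_allocated(5)
yields 8, so a request that is not a power of two can ask the page
allocator for up to nearly twice the pages strictly needed, all of them
physically contiguous.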


Thanks,
Zenghui

