* [BUG] irqchip/gic-v4.1: sleeping function called from invalid context
From: Zenghui Yu @ 2020-06-29 9:39 UTC
To: Marc Zyngier, Thomas Gleixner, Jason Cooper
Cc: Linux Kernel Mailing List, wanghaibin.wang, kuhn.chenqun, wangjingyi11

Hi All,

Booting the latest kernel with DEBUG_ATOMIC_SLEEP=y on a GICv4.1 enabled
box, I get the following kernel splat:

[    0.053766] BUG: sleeping function called from invalid context at mm/slab.h:567
[    0.053767] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
[    0.053769] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3+ #23
[    0.053770] Call trace:
[    0.053774]  dump_backtrace+0x0/0x218
[    0.053775]  show_stack+0x2c/0x38
[    0.053777]  dump_stack+0xc4/0x10c
[    0.053779]  ___might_sleep+0xfc/0x140
[    0.053780]  __might_sleep+0x58/0x90
[    0.053782]  slab_pre_alloc_hook+0x7c/0x90
[    0.053783]  kmem_cache_alloc_trace+0x60/0x2f0
[    0.053785]  its_cpu_init+0x6f4/0xe40
[    0.053786]  gic_starting_cpu+0x24/0x38
[    0.053788]  cpuhp_invoke_callback+0xa0/0x710
[    0.053789]  notify_cpu_starting+0xcc/0xd8
[    0.053790]  secondary_start_kernel+0x148/0x200

# ./scripts/faddr2line vmlinux its_cpu_init+0x6f4/0xe40
its_cpu_init+0x6f4/0xe40:
allocate_vpe_l1_table at drivers/irqchip/irq-gic-v3-its.c:2818
(inlined by) its_cpu_init_lpis at drivers/irqchip/irq-gic-v3-its.c:3138
(inlined by) its_cpu_init at drivers/irqchip/irq-gic-v3-its.c:5166

I've tried to replace GFP_KERNEL flag with GFP_ATOMIC to allocate memory
in this atomic context, and the splat disappears. But after a quick look
at [*], it seems not a good idea to allocate memory within the CPU
hotplug notifier. I really don't know much about it, please have a look.

[*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=11e37d357f6ba7a9af850a872396082cc0a0001f

Thanks,
Zenghui

* Re: [BUG] irqchip/gic-v4.1: sleeping function called from invalid context
From: Marc Zyngier @ 2020-06-29 14:01 UTC
To: Zenghui Yu
Cc: Thomas Gleixner, Jason Cooper, Linux Kernel Mailing List, wanghaibin.wang, kuhn.chenqun, wangjingyi11

Hi Zenghui,

On 2020-06-29 10:39, Zenghui Yu wrote:
> Hi All,
>
> Booting the latest kernel with DEBUG_ATOMIC_SLEEP=y on a GICv4.1 enabled
> box, I get the following kernel splat:
>
> [    0.053766] BUG: sleeping function called from invalid context at mm/slab.h:567
> [    0.053767] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
> [    0.053769] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3+ #23
> [    0.053770] Call trace:
> [    0.053774]  dump_backtrace+0x0/0x218
> [    0.053775]  show_stack+0x2c/0x38
> [    0.053777]  dump_stack+0xc4/0x10c
> [    0.053779]  ___might_sleep+0xfc/0x140
> [    0.053780]  __might_sleep+0x58/0x90
> [    0.053782]  slab_pre_alloc_hook+0x7c/0x90
> [    0.053783]  kmem_cache_alloc_trace+0x60/0x2f0
> [    0.053785]  its_cpu_init+0x6f4/0xe40
> [    0.053786]  gic_starting_cpu+0x24/0x38
> [    0.053788]  cpuhp_invoke_callback+0xa0/0x710
> [    0.053789]  notify_cpu_starting+0xcc/0xd8
> [    0.053790]  secondary_start_kernel+0x148/0x200
>
> # ./scripts/faddr2line vmlinux its_cpu_init+0x6f4/0xe40
> its_cpu_init+0x6f4/0xe40:
> allocate_vpe_l1_table at drivers/irqchip/irq-gic-v3-its.c:2818
> (inlined by) its_cpu_init_lpis at drivers/irqchip/irq-gic-v3-its.c:3138
> (inlined by) its_cpu_init at drivers/irqchip/irq-gic-v3-its.c:5166

Let me guess: a system with more than a single CommonLPIAff group?

> I've tried to replace GFP_KERNEL flag with GFP_ATOMIC to allocate memory
> in this atomic context, and the splat disappears. But after a quick look
> at [*], it seems not a good idea to allocate memory within the CPU
> hotplug notifier. I really don't know much about it, please have a look.
>
> [*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=11e37d357f6ba7a9af850a872396082cc0a0001f

The allocation of the cpumask is pretty benign, and could either be
allocated upfront for all RDs (and freed on detecting that we share
the same CommonLPIAff group) or made atomic.

The much bigger issue is the alloc_pages call just after. Allocating this
upfront probably is the wrong thing to do, as you are likely to allocate
way too much memory, even if you free it quickly afterwards.

At this stage, I'd rather we turn this into an atomic allocation. A notifier
is just another atomic context, and if this fails at such an early stage,
then the CPU is unlikely to continue booting...

Would you like to write a patch for this? Given that you have tested
something, it probably already exists. Or do you want me to do it?

Thanks,

        M.
-- 
Jazz is not dead. It just smells funny...

* Re: [BUG] irqchip/gic-v4.1: sleeping function called from invalid context
From: Zenghui Yu @ 2020-06-30 3:00 UTC
To: Marc Zyngier
Cc: Thomas Gleixner, Jason Cooper, Linux Kernel Mailing List, wanghaibin.wang, kuhn.chenqun, wangjingyi11

Hi Marc,

On 2020/6/29 22:01, Marc Zyngier wrote:
> Hi Zenghui,
>
> On 2020-06-29 10:39, Zenghui Yu wrote:
>> Hi All,
>>
>> Booting the latest kernel with DEBUG_ATOMIC_SLEEP=y on a GICv4.1 enabled
>> box, I get the following kernel splat:
>>
>> [    0.053766] BUG: sleeping function called from invalid context at mm/slab.h:567
>> [    0.053767] in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
>> [    0.053769] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3+ #23
>> [    0.053770] Call trace:
>> [    0.053774]  dump_backtrace+0x0/0x218
>> [    0.053775]  show_stack+0x2c/0x38
>> [    0.053777]  dump_stack+0xc4/0x10c
>> [    0.053779]  ___might_sleep+0xfc/0x140
>> [    0.053780]  __might_sleep+0x58/0x90
>> [    0.053782]  slab_pre_alloc_hook+0x7c/0x90
>> [    0.053783]  kmem_cache_alloc_trace+0x60/0x2f0
>> [    0.053785]  its_cpu_init+0x6f4/0xe40
>> [    0.053786]  gic_starting_cpu+0x24/0x38
>> [    0.053788]  cpuhp_invoke_callback+0xa0/0x710
>> [    0.053789]  notify_cpu_starting+0xcc/0xd8
>> [    0.053790]  secondary_start_kernel+0x148/0x200
>>
>> # ./scripts/faddr2line vmlinux its_cpu_init+0x6f4/0xe40
>> its_cpu_init+0x6f4/0xe40:
>> allocate_vpe_l1_table at drivers/irqchip/irq-gic-v3-its.c:2818
>> (inlined by) its_cpu_init_lpis at drivers/irqchip/irq-gic-v3-its.c:3138
>> (inlined by) its_cpu_init at drivers/irqchip/irq-gic-v3-its.c:5166
>
> Let me guess: a system with more than a single CommonLPIAff group?

I *think* you're right. E.g., when we're allocating vpe_table_mask for
the first CPU of the second CommonLPIAff group.

The truth is that all the GICv4.1 boards I'm having on hand only have a
single CommonLPIAff group. Just to get the above backtrace, I did some
crazy hacking on my 920 and pretend it as v4.1 capable (well, please
ignore me). Hopefully I can get a new GICv4.1 board with more than one
CommonLPIAff group next month and do more tests.

>> I've tried to replace GFP_KERNEL flag with GFP_ATOMIC to allocate memory
>> in this atomic context, and the splat disappears. But after a quick look
>> at [*], it seems not a good idea to allocate memory within the CPU
>> hotplug notifier. I really don't know much about it, please have a look.
>>
>> [*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=11e37d357f6ba7a9af850a872396082cc0a0001f
>
> The allocation of the cpumask is pretty benign, and could either be
> allocated upfront for all RDs (and freed on detecting that we share
> the same CommonLPIAff group) or made atomic.
>
> The much bigger issue is the alloc_pages call just after. Allocating this
> upfront probably is the wrong thing to do, as you are likely to allocate
> way too much memory, even if you free it quickly afterwards.
>
> At this stage, I'd rather we turn this into an atomic allocation. A notifier
> is just another atomic context, and if this fails at such an early stage,
> then the CPU is unlikely to continue booting...

Got it.

> Would you like to write a patch for this? Given that you have tested
> something, it probably already exists. Or do you want me to do it?

Yes, I had written something like below. I will add a commit message
and send it out today.

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 6a5a87fc4601..b66eeca442c4 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2814,7 +2814,7 @@ static int allocate_vpe_l1_table(void)
 	if (val & GICR_VPROPBASER_4_1_VALID)
 		goto out;
 
-	gic_data_rdist()->vpe_table_mask = kzalloc(sizeof(cpumask_t), GFP_KERNEL);
+	gic_data_rdist()->vpe_table_mask = kzalloc(sizeof(cpumask_t), GFP_ATOMIC);
 	if (!gic_data_rdist()->vpe_table_mask)
 		return -ENOMEM;
 
@@ -2881,7 +2881,7 @@ static int allocate_vpe_l1_table(void)
 
 	pr_debug("np = %d, npg = %lld, psz = %d, epp = %d, esz = %d\n",
 		 np, npg, psz, epp, esz);
-	page = alloc_pages(GFP_KERNEL | __GFP_ZERO, get_order(np * PAGE_SIZE));
+	page = alloc_pages(GFP_ATOMIC | __GFP_ZERO, get_order(np * PAGE_SIZE));
 	if (!page)
 		return -ENOMEM;
 

Thanks,
Zenghui
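
Editorial sketch (not part of the thread): the reason the patch above silences the splat can be illustrated with a small userspace model. Everything named model_* below is hypothetical and only mimics the idea behind the ___might_sleep check seen in the call trace: an allocation that is allowed to sleep gets flagged when it is issued from atomic context, while an atomic allocation does not. It is a rough approximation under those assumptions, not the kernel's actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's GFP_KERNEL and GFP_ATOMIC. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/* Models "in_atomic() or irqs_disabled()" for the current CPU. */
static bool model_in_atomic;

/* Counts may-sleep allocations attempted while in atomic context. */
static unsigned int model_splats;

/* In this model only the KERNEL-style flag permits sleeping. */
static bool model_gfp_may_sleep(enum model_gfp flags)
{
	return flags == MODEL_GFP_KERNEL;
}

/* Zeroing allocator that records a "splat" instead of printing a backtrace. */
static void *model_kzalloc(size_t size, enum model_gfp flags)
{
	assert(size > 0);
	if (model_gfp_may_sleep(flags) && model_in_atomic)
		model_splats++;
	return calloc(1, size);
}

/* Models a CPU-starting callback: it runs entirely in atomic context. */
static int model_starting_cpu(enum model_gfp flags)
{
	void *mask;

	model_in_atomic = true;
	mask = model_kzalloc(64, flags);
	model_in_atomic = false;

	if (!mask)
		return -ENOMEM;
	free(mask);
	return 0;
}
```

Calling model_starting_cpu(MODEL_GFP_KERNEL) increments model_splats, whereas model_starting_cpu(MODEL_GFP_ATOMIC) leaves it unchanged. The model deliberately ignores the trade-off raised in the thread: an atomic allocation cannot wait for memory to be reclaimed, so it is more likely to fail.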