* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review [not found] ` <718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net> @ 2024-08-05 3:28 ` Guenter Roeck 2024-08-05 8:56 ` Thomas Gleixner [not found] ` <CAHk-=wiZ7WJQ1y=CwuMwqBxQYtaD8psq+Vxa3r1Z6_ftDZK+hA@mail.gmail.com> 1 sibling, 1 reply; 33+ messages in thread From: Guenter Roeck @ 2024-08-05 3:28 UTC (permalink / raw) To: Greg Kroah-Hartman, stable Cc: patches, linux-kernel, torvalds, akpm, shuah, patches, lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw, rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki, Helge Deller, Parisc List On 8/4/24 11:36, Guenter Roeck wrote: > Hi, > > On 7/31/24 03:03, Greg Kroah-Hartman wrote: >> This is the start of the stable review cycle for the 6.10.3 release. >> There are 809 patches in this series, all will be posted as a response >> to this one. If anyone has any issues with these being applied, please >> let me know. >> >> Responses should be made by Fri, 02 Aug 2024 09:47:47 +0000. >> Anything received after that time might be too late. >> > [ ... ] > >> Rafael J. Wysocki <rafael.j.wysocki@intel.com> >> genirq: Set IRQF_COND_ONESHOT in request_irq() >> > > With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages > > [ 0.000000] ============================================================================= > [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 > [ 0.000000] ----------------------------------------------------------------------------- > > This never stops until the emulation aborts. > > Reverting this patch fixes the problem for me. > > I noticed a similar problem in the mainline kernel but it is either spurious there > or the problem has been fixed. > As a follow-up, the patch below (on top of v6.10.3) "fixes" the problem for me. I guess that suggests some kind of race condition. Added Helge and the parisc mailing list to Cc:. Sorry, I forgot that earlier. Guenter --- diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c index dd53298ef1a5..53a0f654ab56 100644 --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -8,6 +8,7 @@ #define pr_fmt(fmt) "genirq: " fmt +#include <linux/delay.h> #include <linux/irq.h> #include <linux/kthread.h> #include <linux/module.h> @@ -2156,6 +2157,8 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler, struct irq_desc *desc; int retval; + udelay(1); + if (irq == IRQ_NOTCONNECTED) return -ENOTCONN; ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-05 3:28 ` [PATCH 6.10 000/809] 6.10.3-rc3 review Guenter Roeck @ 2024-08-05 8:56 ` Thomas Gleixner 2024-08-05 12:51 ` Thomas Gleixner 2024-08-05 17:42 ` Guenter Roeck 0 siblings, 2 replies; 33+ messages in thread From: Thomas Gleixner @ 2024-08-05 8:56 UTC (permalink / raw) To: Guenter Roeck, Greg Kroah-Hartman, stable Cc: patches, linux-kernel, torvalds, akpm, shuah, patches, lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw, rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki, Helge Deller, Parisc List On Sun, Aug 04 2024 at 20:28, Guenter Roeck wrote: > On 8/4/24 11:36, Guenter Roeck wrote: >>> Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>> genirq: Set IRQF_COND_ONESHOT in request_irq() >>> >> >> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages >> >> [ 0.000000] ============================================================================= >> [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 >> [ 0.000000] ----------------------------------------------------------------------------- Do you have a full boot log? It's unclear to me at which point of the boot process this happens. Is this before or after the secondary CPUs have been brought up? >> This never stops until the emulation aborts. Do you have a recipe how to reproduce? >> Reverting this patch fixes the problem for me. >> >> I noticed a similar problem in the mainline kernel but it is either spurious there >> or the problem has been fixed. >> > > As a follow-up, the patch below (on top of v6.10.3) "fixes" the problem for me. > I guess that suggests some kind of race condition. > > > @@ -2156,6 +2157,8 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler, > struct irq_desc *desc; > int retval; > > + udelay(1); > + > if (irq == IRQ_NOTCONNECTED) > return -ENOTCONN; That all makes absolutely no sense to me. IRQF_COND_ONESHOT has only an effect on shared interrupts, when the interrupt was already requested with IRQF_ONESHOT. If this is really a race then the following must be true: 1) no delay CPU0 CPU1 request_irq(IRQF_ONESHOT) request_irq(IRQF_COND_ONESHOT) 2) delay CPU0 CPU1 request_irq(IRQF_COND_ONESHOT) request_irq(IRQF_ONESHOT) In this case the request on CPU 0 fails with -EBUSY ... Confused tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-05 8:56 ` Thomas Gleixner @ 2024-08-05 12:51 ` Thomas Gleixner 2024-08-05 15:02 ` Guenter Roeck 2024-08-05 17:42 ` Guenter Roeck 1 sibling, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-05 12:51 UTC (permalink / raw) To: Guenter Roeck, Greg Kroah-Hartman, stable Cc: patches, linux-kernel, torvalds, akpm, shuah, patches, lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw, rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki, Helge Deller, Parisc List On Mon, Aug 05 2024 at 10:56, Thomas Gleixner wrote: > If this is really a race then the following must be true: > > 1) no delay > > CPU0 CPU1 > request_irq(IRQF_ONESHOT) > request_irq(IRQF_COND_ONESHOT) > > 2) delay > > CPU0 CPU1 > request_irq(IRQF_COND_ONESHOT) > request_irq(IRQF_ONESHOT) > > In this case the request on CPU 0 fails with -EBUSY ... > > Confused More confusing: Adding a printk() in setup_irq() - using the config, rootfs and the run.sh script from: http://server.roeck-us.net/qemu/parisc64-6.1.5/ results in: [ 0.000000] genirq: 64 flags: 00215600 [ 0.000000] genirq: 65 flags: 00200400 [ 8.110946] genirq: 66 flags: 00200080 IRQF_ONESHOT is 0x2000 which is not set by any of the interrupt requests. IRQF_COND_ONESHOT has only an effect when 1) Interrupt is shared 2) First interrupt request has IRQF_ONESHOT set Neither #1 nor #2 are true, but maybe your current config enables some moar devices than the one on your website. Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-05 12:51 ` Thomas Gleixner @ 2024-08-05 15:02 ` Guenter Roeck 2024-08-05 21:49 ` Thomas Gleixner 0 siblings, 1 reply; 33+ messages in thread From: Guenter Roeck @ 2024-08-05 15:02 UTC (permalink / raw) To: Thomas Gleixner, Greg Kroah-Hartman, stable Cc: patches, linux-kernel, torvalds, akpm, shuah, patches, lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw, rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki, Helge Deller, Parisc List On 8/5/24 05:51, Thomas Gleixner wrote: > On Mon, Aug 05 2024 at 10:56, Thomas Gleixner wrote: >> If this is really a race then the following must be true: >> >> 1) no delay >> >> CPU0 CPU1 >> request_irq(IRQF_ONESHOT) >> request_irq(IRQF_COND_ONESHOT) >> >> 2) delay >> >> CPU0 CPU1 >> request_irq(IRQF_COND_ONESHOT) >> request_irq(IRQF_ONESHOT) >> >> In this case the request on CPU 0 fails with -EBUSY ... >> >> Confused > > More confusing: > > Adding a printk() in setup_irq() - using the config, rootfs and the run.sh > script from: > > http://server.roeck-us.net/qemu/parisc64-6.1.5/ > > results in: > > [ 0.000000] genirq: 64 flags: 00215600 > [ 0.000000] genirq: 65 flags: 00200400 > [ 8.110946] genirq: 66 flags: 00200080 > > IRQF_ONESHOT is 0x2000 which is not set by any of the interrupt > requests. > > IRQF_COND_ONESHOT has only an effect when > > 1) Interrupt is shared > 2) First interrupt request has IRQF_ONESHOT set > > Neither #1 nor #2 are true, but maybe your current config enables some moar > devices than the one on your website. > No, it is pretty much the same, except for a more recent C compiler, and it requires qemu v9.0. See http://server.roeck-us.net/qemu/parisc64-6.10.3/. Debugging shows pretty much the same for me, and any log message added to request_irq() makes the problem go away (or be different), and if the problem is seen it doesn't even get to the third interrupt request. I copied a more complete log to bad.log.gz in above page. Below is yet another "fix" of the problem, just as puzzling as the other "fix". Guenter --- diff --git a/arch/parisc/kernel/time.c b/arch/parisc/kernel/time.c index 9714fbd7c42d..9707914c1a62 100644 --- a/arch/parisc/kernel/time.c +++ b/arch/parisc/kernel/time.c @@ -75,6 +75,8 @@ irqreturn_t __irq_entry timer_interrupt(int irq, void *dev_id) /* Initialize next_tick to the old expected tick time. */ next_tick = cpuinfo->it_value; + pr_info_once("####### First timer interrupt\n"); + /* Calculate how many ticks have elapsed. */ now = mfctl(16); do { ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-05 15:02 ` Guenter Roeck @ 2024-08-05 21:49 ` Thomas Gleixner 2024-08-06 1:16 ` Guenter Roeck 0 siblings, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-05 21:49 UTC (permalink / raw) To: Guenter Roeck, Greg Kroah-Hartman, stable Cc: patches, linux-kernel, torvalds, akpm, shuah, patches, lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw, rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki, Helge Deller, Parisc List On Mon, Aug 05 2024 at 08:02, Guenter Roeck wrote: > On 8/5/24 05:51, Thomas Gleixner wrote: >> IRQF_COND_ONESHOT has only an effect when >> >> 1) Interrupt is shared >> 2) First interrupt request has IRQF_ONESHOT set >> >> Neither #1 nor #2 are true, but maybe your current config enables some moar >> devices than the one on your website. >> > > No, it is pretty much the same, except for a more recent C compiler, and it > requires qemu v9.0. See http://server.roeck-us.net/qemu/parisc64-6.10.3/. > > Debugging shows pretty much the same for me, and any log message added > to request_irq() makes the problem go away (or be different), and if the problem > is seen it doesn't even get to the third interrupt request. I copied a more complete > log to bad.log.gz in above page. At the point where the problem happens is way before the first interrupt is requested, so that definitely excludes any side effect of COND_ONESHOT. It happens right after [ 0.000000] Memory: 991812K/1048576K available (16752K kernel code, 5220K rwdata, 3044K rodata, 760K init, 1424K bss, 56764K reserved, 0K cma-reserved) SNIP [ 0.000000] ** This system shows unhashed kernel memory addresses ** SNIP I.e. the big fat warning about unhashed kernel addresses) In the good case the first interrupt is requested here: [ 0.000000] Memory: 992804K/1048576K available (16532K kernel code, 5096K rwdata, 2984K rodata, 744K init, 1412K bss, 55772K reserved, 0K cma-reserved) SNIP [ 0.000000] ** This system shows unhashed kernel memory addresses ** SNIP [ 0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [ 0.000000] ODEBUG: selftest passed [ 0.000000] rcu: Hierarchical RCU implementation. [ 0.000000] rcu: RCU event tracing is enabled. [ 0.000000] rcu: RCU callback double-/use-after-free debug is enabled. [ 0.000000] rcu: RCU debug extended QS entry/exit. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. [ 0.000000] NR_IRQS: 128 <- This is where the interrupt subsystem is initialized [ 0.000000] genirq: 64: 00215600 <- This is probably the timer interrupt Looking at the backtrace the fail happens from within start_kernel(), i.e. mm_core_init(): slab_err+0x13c/0x190 check_slab+0xf4/0x140 alloc_debug_processing+0x58/0x248 ___slab_alloc+0x22c/0xa38 __slab_alloc.isra.0+0x60/0x88 kmem_cache_alloc_node_noprof+0x148/0x3c0 __kmem_cache_create+0x5d4/0x680 create_boot_cache+0xc4/0x128 new_kmalloc_cache+0x1ac/0x1d8 create_kmalloc_caches+0xac/0x108 kmem_cache_init+0x1cc/0x340 mm_core_init+0x458/0x560 start_kernel+0x9ac/0x11e0 start_parisc+0x188/0x1b0 The interrupt subsystem is initialized way later and request_irq() only works after that point. I'm 100% sure by now that this commit has absolutely nothing to do with the underlying problem. All it does is changing the code size and placement of text, data and .... So I finally managed to reproduce with gcc-13.3.0 from the k.org cross tools (the debian 12.2.2 does not). If I add: --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -1513,6 +1513,8 @@ static int unsigned long flags, thread_mask = 0; int ret, nested, shared = 0; + pr_info("%u: %08x\n", irq, new->flags); + if (!desc) return -EINVAL; it still reproduces. If I change that to: --- a/kernel/irq/manage.c +++ b/kernel/irq/manage.c @@ -1513,6 +1513,8 @@ static int unsigned long flags, thread_mask = 0; int ret, nested, shared = 0; + new->flags &= ~IRQF_COND_ONESHOT; + if (!desc) return -EINVAL; that does neither cure it (unsurprisingly). Reverting the "offending" commit cures it. So I tried your + pr_info_once("####### First timer interrupt\n"); which cures it too. So now where is the difference between the printk() in __setup_irq(), the new->flag mangling, the revert and the printk() in timer interrupt? There is ZERO functional change. There is no race either because at that point everything is single threaded and interrupts are disabled. The only thing which changes is the code and data layout as I can observe when looking at System.map of the builds. I stared at the diffs for a bit, but nothing stood out. Maybe the PARISC people can shed some light on it. This reminds me of the x86 32-bit disaster we debugged a few days ago... Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-05 21:49 ` Thomas Gleixner @ 2024-08-06 1:16 ` Guenter Roeck 0 siblings, 0 replies; 33+ messages in thread From: Guenter Roeck @ 2024-08-06 1:16 UTC (permalink / raw) To: Thomas Gleixner, Greg Kroah-Hartman, stable Cc: patches, linux-kernel, torvalds, akpm, shuah, patches, lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw, rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki, Helge Deller, Parisc List On 8/5/24 14:49, Thomas Gleixner wrote: > On Mon, Aug 05 2024 at 08:02, Guenter Roeck wrote: >> On 8/5/24 05:51, Thomas Gleixner wrote: >>> IRQF_COND_ONESHOT has only an effect when >>> >>> 1) Interrupt is shared >>> 2) First interrupt request has IRQF_ONESHOT set >>> >>> Neither #1 nor #2 are true, but maybe your current config enables some moar >>> devices than the one on your website. >>> >> >> No, it is pretty much the same, except for a more recent C compiler, and it >> requires qemu v9.0. See http://server.roeck-us.net/qemu/parisc64-6.10.3/. >> >> Debugging shows pretty much the same for me, and any log message added >> to request_irq() makes the problem go away (or be different), and if the problem >> is seen it doesn't even get to the third interrupt request. I copied a more complete >> log to bad.log.gz in above page. > > At the point where the problem happens is way before the first interrupt > is requested, so that definitely excludes any side effect of > COND_ONESHOT. > > It happens right after > > [ 0.000000] Memory: 991812K/1048576K available (16752K kernel code, 5220K rwdata, 3044K rodata, 760K init, 1424K bss, 56764K reserved, 0K cma-reserved) > SNIP > [ 0.000000] ** This system shows unhashed kernel memory addresses ** > SNIP > > I.e. the big fat warning about unhashed kernel addresses) > > In the good case the first interrupt is requested here: > > [ 0.000000] Memory: 992804K/1048576K available (16532K kernel code, 5096K rwdata, 2984K rodata, 744K init, 1412K bss, 55772K reserved, 0K cma-reserved) > SNIP > [ 0.000000] ** This system shows unhashed kernel memory addresses ** > SNIP > [ 0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 > [ 0.000000] ODEBUG: selftest passed > [ 0.000000] rcu: Hierarchical RCU implementation. > [ 0.000000] rcu: RCU event tracing is enabled. > [ 0.000000] rcu: RCU callback double-/use-after-free debug is enabled. > [ 0.000000] rcu: RCU debug extended QS entry/exit. > [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies. > > [ 0.000000] NR_IRQS: 128 <- This is where the interrupt > subsystem is initialized > [ 0.000000] genirq: 64: 00215600 <- This is probably the timer interrupt > > Looking at the backtrace the fail happens from within start_kernel(), > i.e. mm_core_init(): > > slab_err+0x13c/0x190 > check_slab+0xf4/0x140 > alloc_debug_processing+0x58/0x248 > ___slab_alloc+0x22c/0xa38 > __slab_alloc.isra.0+0x60/0x88 > kmem_cache_alloc_node_noprof+0x148/0x3c0 > __kmem_cache_create+0x5d4/0x680 > create_boot_cache+0xc4/0x128 > new_kmalloc_cache+0x1ac/0x1d8 > create_kmalloc_caches+0xac/0x108 > kmem_cache_init+0x1cc/0x340 > mm_core_init+0x458/0x560 > start_kernel+0x9ac/0x11e0 > start_parisc+0x188/0x1b0 > > The interrupt subsystem is initialized way later and request_irq() only > works after that point. > > I'm 100% sure by now that this commit has absolutely nothing to do with > the underlying problem. All it does is changing the code size and > placement of text, data and .... > > So I finally managed to reproduce with gcc-13.3.0 from the k.org cross > tools (the debian 12.2.2 does not). > > If I add: > > --- a/kernel/irq/manage.c > +++ b/kernel/irq/manage.c > @@ -1513,6 +1513,8 @@ static int > unsigned long flags, thread_mask = 0; > int ret, nested, shared = 0; > > + pr_info("%u: %08x\n", irq, new->flags); > + > if (!desc) > return -EINVAL; > > it still reproduces. If I change that to: > > --- a/kernel/irq/manage.c > +++ b/kernel/irq/manage.c > @@ -1513,6 +1513,8 @@ static int > unsigned long flags, thread_mask = 0; > int ret, nested, shared = 0; > > + new->flags &= ~IRQF_COND_ONESHOT; > + > if (!desc) > return -EINVAL; > > that does neither cure it (unsurprisingly). > > Reverting the "offending" commit cures it. > > So I tried your > > + pr_info_once("####### First timer interrupt\n"); > > which cures it too. > > So now where is the difference between the printk() in __setup_irq(), > the new->flag mangling, the revert and the printk() in timer interrupt? > > There is ZERO functional change. There is no race either because at that > point everything is single threaded and interrupts are disabled. > > The only thing which changes is the code and data layout as I can > observe when looking at System.map of the builds. I stared at the diffs > for a bit, but nothing stood out. > > Maybe the PARISC people can shed some light on it. > > This reminds me of the x86 32-bit disaster we debugged a few days ago... > Looks like it :-( Thanks for checking ! Guenter ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-05 8:56 ` Thomas Gleixner 2024-08-05 12:51 ` Thomas Gleixner @ 2024-08-05 17:42 ` Guenter Roeck 1 sibling, 0 replies; 33+ messages in thread From: Guenter Roeck @ 2024-08-05 17:42 UTC (permalink / raw) To: Thomas Gleixner, Greg Kroah-Hartman, stable Cc: patches, linux-kernel, torvalds, akpm, shuah, patches, lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw, rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki, Helge Deller, Parisc List On 8/5/24 01:56, Thomas Gleixner wrote: > On Sun, Aug 04 2024 at 20:28, Guenter Roeck wrote: >> On 8/4/24 11:36, Guenter Roeck wrote: >>>> Rafael J. Wysocki <rafael.j.wysocki@intel.com> >>>> genirq: Set IRQF_COND_ONESHOT in request_irq() >>>> >>> >>> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages >>> >>> [ 0.000000] ============================================================================= >>> [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 >>> [ 0.000000] ----------------------------------------------------------------------------- > > Do you have a full boot log? It's unclear to me at which point of the boot > process this happens. Is this before or after the secondary CPUs have > been brought up? > >>> This never stops until the emulation aborts. > > Do you have a recipe how to reproduce? > >>> Reverting this patch fixes the problem for me. >>> >>> I noticed a similar problem in the mainline kernel but it is either spurious there >>> or the problem has been fixed. >>> >> >> As a follow-up, the patch below (on top of v6.10.3) "fixes" the problem for me. >> I guess that suggests some kind of race condition. >> >> >> @@ -2156,6 +2157,8 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler, >> struct irq_desc *desc; >> int retval; >> >> + udelay(1); >> + >> if (irq == IRQ_NOTCONNECTED) >> return -ENOTCONN; > > That all makes absolutely no sense to me. > Same here, really. I can reproduce the problem with v6.10.3, using my configuration, but whatever debugging I add makes the problem disappear. I had seen the same problem on mainline with v6.11-rc1-272-g17712b7ea075. Log is at https://kerneltests.org/builders/qemu-parisc64-master/builds/168/steps/qemubuildcommand/logs/stdio However, I can no longer reproduce it there. What makes it even more weird / odd is that I can bisect the problem between v6.10.2 and v6.10.3 and it points to this commit, but reproducing it outside that chain seems to be all but impossible. Guenter > IRQF_COND_ONESHOT has only an effect on shared interrupts, when the > interrupt was already requested with IRQF_ONESHOT. > > If this is really a race then the following must be true: > > 1) no delay > > CPU0 CPU1 > request_irq(IRQF_ONESHOT) > request_irq(IRQF_COND_ONESHOT) > > 2) delay > > CPU0 CPU1 > request_irq(IRQF_COND_ONESHOT) > request_irq(IRQF_ONESHOT) > > In this case the request on CPU 0 fails with -EBUSY ... > > Confused > > tglx > > ^ permalink raw reply [flat|nested] 33+ messages in thread
[parent not found: <CAHk-=wiZ7WJQ1y=CwuMwqBxQYtaD8psq+Vxa3r1Z6_ftDZK+hA@mail.gmail.com>]
[parent not found: <53b2e1f2-4291-48e5-a668-7cf57d900ecd@suse.cz>]
[parent not found: <87le194kuq.ffs@tglx>]
[parent not found: <90e02d99-37a2-437e-ad42-44b80c4e94f6@suse.cz>]
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review [not found] ` <90e02d99-37a2-437e-ad42-44b80c4e94f6@suse.cz> @ 2024-08-06 23:24 ` Thomas Gleixner 2024-08-07 0:49 ` James Bottomley 2024-08-08 1:07 ` Guenter Roeck 0 siblings, 2 replies; 33+ messages in thread From: Thomas Gleixner @ 2024-08-06 23:24 UTC (permalink / raw) To: Vlastimil Babka, Linus Torvalds, Guenter Roeck Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc Cc+: Helge, parisc ML We're chasing a weird failure which has been tracked down to the placement of the division library functions (I assume they are imported from libgcc). See the thread starting at: https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote: > On 8/6/24 19:33, Thomas Gleixner wrote: >> >> So this change adds 16 bytes to __softirq() which moves the division >> functions up by 16 bytes. That's all it takes to make the stupid go >> away.... > > Heh I was actually wondering if the division is somhow messed up because > maxobj = order_objects() and order_objects() does a division. Now I suspect > it even more. check_slab() calls into that muck, but I checked the disassembly of a working and a broken kernel and the only difference there is the displacement offset when the code calculates the call address, but that's as expected a difference of 16 bytes. Now it becomes interesting. I added a unused function after __do_softirq() into the softirq text section and filled it with ASM nonsense so that it occupies exactly one page. That moves $$divoI, which is what check_slab() calls, exactly one page forward: -0000000041218c70 T $$divoI +0000000041219c70 T $$divoI Guess what happens? If falls on it's nose again. Now with that ASM gunk I can steer the size conveniently. It works up to: 0000000041219c50 T $$divoI and fails for 0000000041219c60 T $$divoI 0000000041219c70 T $$divoI and works again at 0000000041219c80 T $$divoI So I added the following: +extern void testme(void); +extern unsigned int testsize; + +unsigned int testsize = 192; + +void __init testme(void) +{ + pr_info("TESTME: %lu\n", PAGE_SIZE / testsize); +} called that _before_ mm_core_init() from init/main.c and adjusted my ASM hack to make $$divoI be at: 0000000041219c70 T $$divoI again and surprisingly the output is: [ 0.000000] softirq: TESTME: 21 Now I went back to the hppa64 gcc version 12.2.0 again and did the same ASM gunk adjustment so that $$divoI ends up at the offset 0xc70 in the page and the same happens. So it's not a compiler dependent problem. But then I added a testme() call to the error path and get: [ 0.000000] softirq: TESTME: 21 [ 0.000000] ============================================================================= [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 size 192 sorder 0 Now what's wrong? Adding more debug: [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 size 192 sorder 0 21 where the last '21' is the output of the same call which made maxobj go south: static int check_slab(struct kmem_cache *s, struct slab *slab) { int maxobj; @@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache maxobj = order_objects(slab_order(slab), s->size); if (slab->objects > maxobj) { - slab_err(s, slab, "objects %u > max %u", - slab->objects, maxobj); + testme(); + slab_err(s, slab, "objects %u > max %u size %u sorder %u %u", + slab->objects, maxobj, s->size, slab_order(slab), + order_objects(slab_order(slab), s->size)); return 0; } if (slab->inuse > slab->objects) { I don't know and I don't want to know TBH... Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-06 23:24 ` Thomas Gleixner @ 2024-08-07 0:49 ` James Bottomley 2024-08-07 1:38 ` Guenter Roeck 2024-08-07 12:45 ` Thomas Gleixner 2024-08-08 1:07 ` Guenter Roeck 1 sibling, 2 replies; 33+ messages in thread From: James Bottomley @ 2024-08-07 0:49 UTC (permalink / raw) To: Thomas Gleixner, Vlastimil Babka, Linus Torvalds, Guenter Roeck Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc On Wed, 2024-08-07 at 01:24 +0200, Thomas Gleixner wrote: > Cc+: Helge, parisc ML > > We're chasing a weird failure which has been tracked down to the > placement of the division library functions (I assume they are > imported > from libgcc). > > See the thread starting at: > > > https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net > > On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote: > > On 8/6/24 19:33, Thomas Gleixner wrote: > > > > > > So this change adds 16 bytes to __softirq() which moves the > > > division > > > functions up by 16 bytes. That's all it takes to make the stupid > > > go > > > away.... > > > > Heh I was actually wondering if the division is somhow messed up > > because > > maxobj = order_objects() and order_objects() does a division. Now I > > suspect > > it even more. > > check_slab() calls into that muck, but I checked the disassembly of a > working and a broken kernel and the only difference there is the > displacement offset when the code calculates the call address, but > that's as expected a difference of 16 bytes. > > Now it becomes interesting. > > I added a unused function after __do_softirq() into the softirq text > section and filled it with ASM nonsense so that it occupies exactly > one > page. That moves $$divoI, which is what check_slab() calls, exactly > one > page forward: > > -0000000041218c70 T $$divoI > +0000000041219c70 T $$divoI > > Guess what happens? If falls on it's nose again. > > Now with that ASM gunk I can steer the size conveniently. It works up > to: > > 0000000041219c50 T $$divoI > > and fails for > > 0000000041219c60 T $$divoI > 0000000041219c70 T $$divoI > > and works again at > > 0000000041219c80 T $$divoI So just on this, you seem to have proved that only exact multiples of 48 work. In terms of how PA-RISC caching works that's completely nuts ... however, there may be something else at work, like stack frame alignment. > > So I added the following: > > +extern void testme(void); > +extern unsigned int testsize; > + > +unsigned int testsize = 192; > + > +void __init testme(void) > +{ > + pr_info("TESTME: %lu\n", PAGE_SIZE / testsize); > +} > > called that _before_ mm_core_init() from init/main.c and adjusted my > ASM hack to make $$divoI be at: > > 0000000041219c70 T $$divoI > > again and surprisingly the output is: > > [ 0.000000] softirq: TESTME: 21 OK, why is that surprising? 4096/192 is 21 due to integer rounding. > Now I went back to the hppa64 gcc version 12.2.0 again and did the > same ASM gunk adjustment so that $$divoI ends up at the offset 0xc70 > in the page and the same happens. > > So it's not a compiler dependent problem. > > But then I added a testme() call to the error path and get: > > [ 0.000000] softirq: TESTME: 21 > [ 0.000000] > ===================================================================== > ======== > [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 > size 192 sorder 0 > > Now what's wrong? > > Adding more debug: > > [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 > size 192 sorder 0 21 > > where the last '21' is the output of the same call which made maxobj > go > south: > > static int check_slab(struct kmem_cache *s, struct slab *slab) > { > int maxobj; > @@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache > > maxobj = order_objects(slab_order(slab), s->size); > if (slab->objects > maxobj) { > - slab_err(s, slab, "objects %u > max %u", > - slab->objects, maxobj); > + testme(); > + slab_err(s, slab, "objects %u > max %u size %u sorder > %u %u", > + slab->objects, maxobj, s->size, > slab_order(slab), > + order_objects(slab_order(slab), s->size)); > return 0; > } > if (slab->inuse > slab->objects) { > > I don't know and I don't want to know TBH... OK, so you're telling us we have a problem with slab_order on parisc ... that's folio_order, so it smells like a parisc bug with folio_test_large? Unfortuntely I'm a bit pissed in an airport lounge on my way to the UK, so I've lost access to my pa test rig and can't test further for a while. Regards, James ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-07 0:49 ` James Bottomley @ 2024-08-07 1:38 ` Guenter Roeck 2024-08-07 12:45 ` Thomas Gleixner 1 sibling, 0 replies; 33+ messages in thread From: Guenter Roeck @ 2024-08-07 1:38 UTC (permalink / raw) To: James Bottomley, Thomas Gleixner, Vlastimil Babka, Linus Torvalds Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc On 8/6/24 17:49, James Bottomley wrote: > On Wed, 2024-08-07 at 01:24 +0200, Thomas Gleixner wrote: >> Cc+: Helge, parisc ML >> >> We're chasing a weird failure which has been tracked down to the >> placement of the division library functions (I assume they are >> imported >> from libgcc). >> >> See the thread starting at: >> >> >> https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net >> >> On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote: >>> On 8/6/24 19:33, Thomas Gleixner wrote: >>>> >>>> So this change adds 16 bytes to __softirq() which moves the >>>> division >>>> functions up by 16 bytes. That's all it takes to make the stupid >>>> go >>>> away.... >>> >>> Heh I was actually wondering if the division is somhow messed up >>> because >>> maxobj = order_objects() and order_objects() does a division. Now I >>> suspect >>> it even more. >> >> check_slab() calls into that muck, but I checked the disassembly of a >> working and a broken kernel and the only difference there is the >> displacement offset when the code calculates the call address, but >> that's as expected a difference of 16 bytes. >> >> Now it becomes interesting. >> >> I added a unused function after __do_softirq() into the softirq text >> section and filled it with ASM nonsense so that it occupies exactly >> one >> page. That moves $$divoI, which is what check_slab() calls, exactly >> one >> page forward: >> >> -0000000041218c70 T $$divoI >> +0000000041219c70 T $$divoI >> >> Guess what happens? If falls on it's nose again. >> >> Now with that ASM gunk I can steer the size conveniently. It works up >> to: >> >> 0000000041219c50 T $$divoI >> >> and fails for >> >> 0000000041219c60 T $$divoI >> 0000000041219c70 T $$divoI >> >> and works again at >> >> 0000000041219c80 T $$divoI > > So just on this, you seem to have proved that only exact multiples of > 48 work. In terms of how PA-RISC caching works that's completely nuts > ... however, there may be something else at work, like stack frame > alignment. > >> >> So I added the following: >> >> +extern void testme(void); >> +extern unsigned int testsize; >> + >> +unsigned int testsize = 192; >> + >> +void __init testme(void) >> +{ >> + pr_info("TESTME: %lu\n", PAGE_SIZE / testsize); >> +} >> >> called that _before_ mm_core_init() from init/main.c and adjusted my >> ASM hack to make $$divoI be at: >> >> 0000000041219c70 T $$divoI >> >> again and surprisingly the output is: >> >> [ 0.000000] softirq: TESTME: 21 > > OK, why is that surprising? 4096/192 is 21 due to integer rounding. > Problem is that it sometimes wrongly returns 16. But not always. The surprise may be that it is not consistently wrong, even if divU is at the same location. >> Now I went back to the hppa64 gcc version 12.2.0 again and did the >> same ASM gunk adjustment so that $$divoI ends up at the offset 0xc70 >> in the page and the same happens. >> >> So it's not a compiler dependent problem. >> >> But then I added a testme() call to the error path and get: >> >> [ 0.000000] softirq: TESTME: 21 >> [ 0.000000] >> ===================================================================== >> ======== >> [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 >> size 192 sorder 0 >> >> Now what's wrong? >> >> Adding more debug: >> >> [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 >> size 192 sorder 0 21 >> >> where the last '21' is the output of the same call which made maxobj >> go >> south: >> >> static int check_slab(struct kmem_cache *s, struct slab *slab) >> { >> int maxobj; >> @@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache >> >> maxobj = order_objects(slab_order(slab), s->size); >> if (slab->objects > maxobj) { >> - slab_err(s, slab, "objects %u > max %u", >> - slab->objects, maxobj); >> + testme(); >> + slab_err(s, slab, "objects %u > max %u size %u sorder >> %u %u", >> + slab->objects, maxobj, s->size, >> slab_order(slab), >> + order_objects(slab_order(slab), s->size)); >> return 0; >> } >> if (slab->inuse > slab->objects) { >> >> I don't know and I don't want to know TBH... > > OK, so you're telling us we have a problem with slab_order on parisc > ... that's folio_order, so it smells like a parisc bug with > folio_test_large? Unfortuntely I'm a bit pissed in an airport lounge > on my way to the UK, so I've lost access to my pa test rig and can't > test further for a while. > I think the problem is rather that divU, for some unknown reason, sometimes returns bad results. The divU implementation in libgcc isn't exactly easy to understand. There may be some context problem, such as the upper bits of some register not being cleared under some circumstances. Look at my recent commits into arch/parisc for examples how similar problems survived for a long time in the Linux kernel. I'd strongly suspect a qemu emulation problem if it wasn't for those other problems. Still, I think it is very likely that the problem lies in qemu, but we might need a confirmation one way or the other from a test on real hardware. Thanks, Guenter ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-07 0:49 ` James Bottomley 2024-08-07 1:38 ` Guenter Roeck @ 2024-08-07 12:45 ` Thomas Gleixner 1 sibling, 0 replies; 33+ messages in thread From: Thomas Gleixner @ 2024-08-07 12:45 UTC (permalink / raw) To: James Bottomley, Vlastimil Babka, Linus Torvalds, Guenter Roeck Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc On Tue, Aug 06 2024 at 20:49, James Bottomley wrote: > On Wed, 2024-08-07 at 01:24 +0200, Thomas Gleixner wrote: >> static int check_slab(struct kmem_cache *s, struct slab *slab) >> { >> int maxobj; >> @@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache >> >> maxobj = order_objects(slab_order(slab), s->size); >> if (slab->objects > maxobj) { >> - slab_err(s, slab, "objects %u > max %u", >> - slab->objects, maxobj); >> + testme(); >> + slab_err(s, slab, "objects %u > max %u size %u sorder >> %u %u", >> + slab->objects, maxobj, s->size, >> slab_order(slab), >> + order_objects(slab_order(slab), s->size)); >> return 0; >> } >> if (slab->inuse > slab->objects) { >> >> I don't know and I don't want to know TBH... > > OK, so you're telling us we have a problem with slab_order on parisc > ... that's folio_order, so it smells like a parisc bug with > folio_test_large? Unfortuntely I'm a bit pissed in an airport lounge > on my way to the UK, so I've lost access to my pa test rig and can't > test further for a while. The point is that there are two invocations for order_objects(...) in that code. maxobj = order_objects(slab_order(slab), s->size); and the extra one in the slab_err() output: slab_err(s, slab, "objects %u > max %u size %u sorder %u %u", slab->objects, maxobj, s->size,slab_order(slab), order_objects(slab_order(slab), s->size)); >> [ 0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 >> size 192 sorder 0 21 So maxobj = 16 and the second invocation correctly returns 21, if and only if the $$divoI placement is in that weird range. When I move it out of that range then both return 21 as expected. Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-06 23:24 ` Thomas Gleixner 2024-08-07 0:49 ` James Bottomley @ 2024-08-08 1:07 ` Guenter Roeck 2024-08-08 7:48 ` Vlastimil Babka 2024-08-08 9:57 ` Thomas Gleixner 1 sibling, 2 replies; 33+ messages in thread From: Guenter Roeck @ 2024-08-08 1:07 UTC (permalink / raw) To: Thomas Gleixner, Vlastimil Babka, Linus Torvalds Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc On 8/6/24 16:24, Thomas Gleixner wrote: > Cc+: Helge, parisc ML > > We're chasing a weird failure which has been tracked down to the > placement of the division library functions (I assume they are imported > from libgcc). > > See the thread starting at: > > https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net > > On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote: >> On 8/6/24 19:33, Thomas Gleixner wrote: >>> >>> So this change adds 16 bytes to __softirq() which moves the division >>> functions up by 16 bytes. That's all it takes to make the stupid go >>> away.... >> >> Heh I was actually wondering if the division is somhow messed up because >> maxobj = order_objects() and order_objects() does a division. Now I suspect >> it even more. > > check_slab() calls into that muck, but I checked the disassembly of a > working and a broken kernel and the only difference there is the > displacement offset when the code calculates the call address, but > that's as expected a difference of 16 bytes. > > Now it becomes interesting. > > I added a unused function after __do_softirq() into the softirq text > section and filled it with ASM nonsense so that it occupies exactly one > page. That moves $$divoI, which is what check_slab() calls, exactly one > page forward: > With the above added to my tree, I can also play around with the code. Here is the next weird one: diff --git a/mm/slub.c b/mm/slub.c index 4927edec6a8c..b8a33966d858 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab) } maxobj = order_objects(slab_order(slab), s->size); + + pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj); + if (slab->objects > maxobj) { slab_err(s, slab, "objects %u > max %u", slab->objects, maxobj); results in: ##### slab->objects=21 maxobj=21 ============================================================================= BUG kmem_cache_node (Not tainted): objects 21 > max 16 As Thomas noticed, this only happens if the divide assembler code is within a certain address range. Ok, now I am really lost. Guenter ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 1:07 ` Guenter Roeck @ 2024-08-08 7:48 ` Vlastimil Babka 2024-08-08 14:46 ` Guenter Roeck 2024-08-08 9:57 ` Thomas Gleixner 1 sibling, 1 reply; 33+ messages in thread From: Vlastimil Babka @ 2024-08-08 7:48 UTC (permalink / raw) To: Guenter Roeck, Thomas Gleixner, Linus Torvalds Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc On 8/8/24 03:07, Guenter Roeck wrote: > On 8/6/24 16:24, Thomas Gleixner wrote: >> Cc+: Helge, parisc ML >> >> We're chasing a weird failure which has been tracked down to the >> placement of the division library functions (I assume they are imported >> from libgcc). >> >> See the thread starting at: >> >> https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net >> >> On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote: >>> On 8/6/24 19:33, Thomas Gleixner wrote: >>>> >>>> So this change adds 16 bytes to __softirq() which moves the division >>>> functions up by 16 bytes. That's all it takes to make the stupid go >>>> away.... >>> >>> Heh I was actually wondering if the division is somhow messed up because >>> maxobj = order_objects() and order_objects() does a division. Now I suspect >>> it even more. >> >> check_slab() calls into that muck, but I checked the disassembly of a >> working and a broken kernel and the only difference there is the >> displacement offset when the code calculates the call address, but >> that's as expected a difference of 16 bytes. >> >> Now it becomes interesting. >> >> I added a unused function after __do_softirq() into the softirq text >> section and filled it with ASM nonsense so that it occupies exactly one >> page. That moves $$divoI, which is what check_slab() calls, exactly one >> page forward: >> > > With the above added to my tree, I can also play around with the code. > Here is the next weird one: > > diff --git a/mm/slub.c b/mm/slub.c > index 4927edec6a8c..b8a33966d858 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab) > } > > maxobj = order_objects(slab_order(slab), s->size); > + > + pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj); > + > if (slab->objects > maxobj) { > slab_err(s, slab, "objects %u > max %u", > slab->objects, maxobj); > > results in: > > ##### slab->objects=21 maxobj=21 > ============================================================================= > BUG kmem_cache_node (Not tainted): objects 21 > max 16 But is this printed from the same attempt? The pr_info_once() might have printed earlier and then stopped (as it's _once) and the error case might have happened only later, and there was nothing printed in between as the kmalloc caches are created in a loop. > As Thomas noticed, this only happens if the divide assembler code is within a certain > address range. > > Ok, now I am really lost. > > Guenter > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 7:48 ` Vlastimil Babka @ 2024-08-08 14:46 ` Guenter Roeck 0 siblings, 0 replies; 33+ messages in thread From: Guenter Roeck @ 2024-08-08 14:46 UTC (permalink / raw) To: Vlastimil Babka, Thomas Gleixner, Linus Torvalds Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc On 8/8/24 00:48, Vlastimil Babka wrote: > On 8/8/24 03:07, Guenter Roeck wrote: >> On 8/6/24 16:24, Thomas Gleixner wrote: >>> Cc+: Helge, parisc ML >>> >>> We're chasing a weird failure which has been tracked down to the >>> placement of the division library functions (I assume they are imported >>> from libgcc). >>> >>> See the thread starting at: >>> >>> https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net >>> >>> On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote: >>>> On 8/6/24 19:33, Thomas Gleixner wrote: >>>>> >>>>> So this change adds 16 bytes to __softirq() which moves the division >>>>> functions up by 16 bytes. That's all it takes to make the stupid go >>>>> away.... >>>> >>>> Heh I was actually wondering if the division is somhow messed up because >>>> maxobj = order_objects() and order_objects() does a division. Now I suspect >>>> it even more. >>> >>> check_slab() calls into that muck, but I checked the disassembly of a >>> working and a broken kernel and the only difference there is the >>> displacement offset when the code calculates the call address, but >>> that's as expected a difference of 16 bytes. >>> >>> Now it becomes interesting. >>> >>> I added a unused function after __do_softirq() into the softirq text >>> section and filled it with ASM nonsense so that it occupies exactly one >>> page. That moves $$divoI, which is what check_slab() calls, exactly one >>> page forward: >>> >> >> With the above added to my tree, I can also play around with the code. >> Here is the next weird one: >> >> diff --git a/mm/slub.c b/mm/slub.c >> index 4927edec6a8c..b8a33966d858 100644 >> --- a/mm/slub.c >> +++ b/mm/slub.c >> @@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab) >> } >> >> maxobj = order_objects(slab_order(slab), s->size); >> + >> + pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj); >> + >> if (slab->objects > maxobj) { >> slab_err(s, slab, "objects %u > max %u", >> slab->objects, maxobj); >> >> results in: >> >> ##### slab->objects=21 maxobj=21 >> ============================================================================= >> BUG kmem_cache_node (Not tainted): objects 21 > max 16 > > But is this printed from the same attempt? The pr_info_once() might have > printed earlier and then stopped (as it's _once) and the error case might > have happened only later, and there was nothing printed in between as the > kmalloc caches are created in a loop. > No, of course it isn't. Guess it was too late. Sorry for the noise. Here is the updated log, after dropping the _once: ... [ 0.000000] ##### slab->objects=21 maxobj=21 [ 0.000000] ##### slab->objects=25 maxobj=25 [ 0.000000] ##### slab->objects=21 maxobj=16 [ 0.000000] ============================================================================= [ 0.000000] BUG kmalloc-256 (Not tainted): objects 21 > max 16 So this works many times and then suddenly fails. I thought it was the other way, that it failed the very first time. Ok, back to debugging. Thanks! Guenter ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 1:07 ` Guenter Roeck 2024-08-08 7:48 ` Vlastimil Babka @ 2024-08-08 9:57 ` Thomas Gleixner 2024-08-08 14:59 ` Guenter Roeck 2024-08-08 15:53 ` Linus Torvalds 1 sibling, 2 replies; 33+ messages in thread From: Thomas Gleixner @ 2024-08-08 9:57 UTC (permalink / raw) To: Guenter Roeck, Vlastimil Babka, Linus Torvalds Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc On Wed, Aug 07 2024 at 18:07, Guenter Roeck wrote: > On 8/6/24 16:24, Thomas Gleixner wrote: > diff --git a/mm/slub.c b/mm/slub.c > index 4927edec6a8c..b8a33966d858 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab) > } > > maxobj = order_objects(slab_order(slab), s->size); > + > + pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj); > + > if (slab->objects > maxobj) { > slab_err(s, slab, "objects %u > max %u", > slab->objects, maxobj); > > results in: > > ##### slab->objects=21 maxobj=21 > ============================================================================= > BUG kmem_cache_node (Not tainted): objects 21 > max 16 Careful vs. the pr_once(). It's not necessarily the first allocation which trips up. I removed slab_err() in that condition and just printed the data: [ 0.000000] Order: 1 Size: 384 Nobj: 21 Maxobj: 16 21 Inuse: 14 [ 0.000000] Order: 0 Size: 168 Nobj: 24 Maxobj: 16 24 Inuse: 1 [ 0.000000] Order: 1 Size: 320 Nobj: 25 Maxobj: 16 25 Inuse: 18 [ 0.000000] Order: 1 Size: 320 Nobj: 25 Maxobj: 16 25 Inuse: 19 [ 0.000000] Order: 1 Size: 320 Nobj: 25 Maxobj: 16 25 Inuse: 20 [ 0.000000] Order: 0 Size: 160 Nobj: 25 Maxobj: 16 25 Inuse: 5 [ 0.000000] Order: 2 Size: 672 Nobj: 24 Maxobj: 16 24 Inuse: 1 [ 0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 1 [ 0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 2 [ 0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 10 The maxobj column shows the failed result and the result from the second invocation inside of the printk(). Let's wait for PARISC people to run it on actual hardware. Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 9:57 ` Thomas Gleixner @ 2024-08-08 14:59 ` Guenter Roeck 2024-08-08 15:58 ` John David Anglin 2024-08-08 15:53 ` Linus Torvalds 1 sibling, 1 reply; 33+ messages in thread From: Guenter Roeck @ 2024-08-08 14:59 UTC (permalink / raw) To: Thomas Gleixner, Vlastimil Babka, Linus Torvalds Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc On 8/8/24 02:57, Thomas Gleixner wrote: > On Wed, Aug 07 2024 at 18:07, Guenter Roeck wrote: >> On 8/6/24 16:24, Thomas Gleixner wrote: >> diff --git a/mm/slub.c b/mm/slub.c >> index 4927edec6a8c..b8a33966d858 100644 >> --- a/mm/slub.c >> +++ b/mm/slub.c >> @@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab) >> } >> >> maxobj = order_objects(slab_order(slab), s->size); >> + >> + pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj); >> + >> if (slab->objects > maxobj) { >> slab_err(s, slab, "objects %u > max %u", >> slab->objects, maxobj); >> >> results in: >> >> ##### slab->objects=21 maxobj=21 >> ============================================================================= >> BUG kmem_cache_node (Not tainted): objects 21 > max 16 > > Careful vs. the pr_once(). It's not necessarily the first allocation Yes. Thanks for pointing that out. Rookie mistake :-(. > which trips up. I removed slab_err() in that condition and just printed > the data: > > [ 0.000000] Order: 1 Size: 384 Nobj: 21 Maxobj: 16 21 Inuse: 14 > [ 0.000000] Order: 0 Size: 168 Nobj: 24 Maxobj: 16 24 Inuse: 1 > [ 0.000000] Order: 1 Size: 320 Nobj: 25 Maxobj: 16 25 Inuse: 18 > [ 0.000000] Order: 1 Size: 320 Nobj: 25 Maxobj: 16 25 Inuse: 19 > [ 0.000000] Order: 1 Size: 320 Nobj: 25 Maxobj: 16 25 Inuse: 20 > [ 0.000000] Order: 0 Size: 160 Nobj: 25 Maxobj: 16 25 Inuse: 5 > [ 0.000000] Order: 2 Size: 672 Nobj: 24 Maxobj: 16 24 Inuse: 1 > [ 0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 1 > [ 0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 2 > [ 0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 10 > > The maxobj column shows the failed result and the result from the second > invocation inside of the printk(). > > Let's wait for PARISC people to run it on actual hardware. > Agreed. I suspect that there is a carry or upper 32 bit of a register not handled properly, but I have no idea where that might be or why that would only be seen if the div functions are located in a certain address range. Thanks, Guenter ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 14:59 ` Guenter Roeck @ 2024-08-08 15:58 ` John David Anglin 0 siblings, 0 replies; 33+ messages in thread From: John David Anglin @ 2024-08-08 15:58 UTC (permalink / raw) To: Guenter Roeck, Thomas Gleixner, Vlastimil Babka, Linus Torvalds Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc On 2024-08-08 10:59 a.m., Guenter Roeck wrote: >> Let's wait for PARISC people to run it on actual hardware. >> > > Agreed. I suspect that there is a carry or upper 32 bit of a register not > handled properly, but I have no idea where that might be or why that would > only be seen if the div functions are located in a certain address range. I'm doubtful there's a coding issue in $$divoI. The routine was written by HP and it's used on both HP-UX and Linux. The routine can be found in libgcc/config/pa/milli64.S The routine can trap: Divide by zero is trapped. Divide of -2**31 by -1 is trapped for $$divoI but not for $$divI. $$divoI is a millicode routine. Not sure what calls it. gcc doesn't call it. gcc uses $$divI. It appears you are testing a 64-bit kernel. There might be issues calling millicode routines when branch distance exceeds approximately 4 MB. Millicode routines have a special calling sequence. You could try building kernel with -mlong-calls. Kernel will get bugger and slower with this option. Dave -- John David Anglin dave.anglin@bell.net ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 9:57 ` Thomas Gleixner 2024-08-08 14:59 ` Guenter Roeck @ 2024-08-08 15:53 ` Linus Torvalds 2024-08-08 16:12 ` Thomas Gleixner 1 sibling, 1 reply; 33+ messages in thread From: Linus Torvalds @ 2024-08-08 15:53 UTC (permalink / raw) To: Thomas Gleixner Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc On Thu, 8 Aug 2024 at 02:57, Thomas Gleixner <tglx@linutronix.de> wrote: > > Careful vs. the pr_once(). It's not necessarily the first allocation > which trips up. I removed slab_err() in that condition and just printed > the data: > > [ 0.000000] Order: 1 Size: 384 Nobj: 21 Maxobj: 16 21 Inuse: 14 > [ 0.000000] Order: 0 Size: 168 Nobj: 24 Maxobj: 16 24 Inuse: 1 > [ 0.000000] Order: 1 Size: 320 Nobj: 25 Maxobj: 16 25 Inuse: 18 > [ 0.000000] Order: 1 Size: 320 Nobj: 25 Maxobj: 16 25 Inuse: 19 > [ 0.000000] Order: 1 Size: 320 Nobj: 25 Maxobj: 16 25 Inuse: 20 > [ 0.000000] Order: 0 Size: 160 Nobj: 25 Maxobj: 16 25 Inuse: 5 > [ 0.000000] Order: 2 Size: 672 Nobj: 24 Maxobj: 16 24 Inuse: 1 > [ 0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 1 > [ 0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 2 > [ 0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 10 > > The maxobj column shows the failed result and the result from the second > invocation inside of the printk(). Hmm. There's a few patterns there: - the incorrect Maxobj is always 16, with wildly different sizes. - the correct value is always in that 21-25 range and neither of these are particularly common cases for slab objects (well, at least on x86-64). I actually went into the gcc sources to look at the libgcc routines for the hppa $$divU routine, but apart from checking for trivial powers-of-two and for divisions with small divisor values (<=17), all it is ends up being a series of "ds" (divide step) and "addc" instructions. I don't see how that could possibly mess up. It does end up with the final addc in the delay slot of the return, but that's normal parisc behavior (and here by "normal" I mean "it's a really messed up instruction set that did everything wrong, including branch delay slots") I do note that the $$divU function (which is what this all should use) oddly doesn't show up as defined in 'nm' for me when I look at Guenter's vmlinux file. So there's some odd linker thing going on, and it *only* affects the $$div* functions. Thomas' System.map shows some of the same effects, ie it shows $$divoI (signed integer divide with overflow checking), but doesn't show $$divU that is right after it. The reason I was looking was exactly because this should be using $$divU, and clearly code alignment is implicated somehow, but the exact alignment of $$divU wasn't obvious. But it looks like "$$divU" should be somewhere between $$divoI and $$divl_2, and in Guenter's bad case that's 0000000041218c70 T $$divoI 00000000412190d0 T $$divI_2 so *maybe* $$divU is around a page boundary? 0000000041218xxx turning into 0000000041219000? Some ITLB fill issue together with that delayed branch and a qemu bug? Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 15:53 ` Linus Torvalds @ 2024-08-08 16:12 ` Thomas Gleixner 2024-08-08 16:33 ` Linus Torvalds 0 siblings, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-08 16:12 UTC (permalink / raw) To: Linus Torvalds Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc On Thu, Aug 08 2024 at 08:53, Linus Torvalds wrote: > On Thu, 8 Aug 2024 at 02:57, Thomas Gleixner <tglx@linutronix.de> wrote: > Hmm. There's a few patterns there: > > - the incorrect Maxobj is always 16, with wildly different sizes. Which means that the size value is rounded up to the next power of 2 >> [ 0.000000] Order: 1 Size: 384 Nobj: 21 Maxobj: 16 21 Inuse: 14 8192/16 = 512 >> [ 0.000000] Order: 0 Size: 168 Nobj: 24 Maxobj: 16 24 Inuse: 1 4096/16 = 256 >> [ 0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 1 32768/16 = 2048 >> The maxobj column shows the failed result and the result from the second >> invocation inside of the printk(). > I actually went into the gcc sources to look at the libgcc routines > for the hppa $$divU routine, but apart from checking for trivial > powers-of-two and for divisions with small divisor values (<=17), all > it is ends up being a series of "ds" (divide step) and "addc" > instructions. I don't see how that could possibly mess up. It does end > up with the final addc in the delay slot of the return, but that's > normal parisc behavior (and here by "normal" I mean "it's a really > messed up instruction set that did everything wrong, including branch > delay slots") > > I do note that the $$divU function (which is what this all should use) > oddly doesn't show up as defined in 'nm' for me when I look at > Guenter's vmlinux file. So there's some odd linker thing going on, and > it *only* affects the $$div* functions. > > Thomas' System.map shows some of the same effects, ie it shows $$divoI > (signed integer divide with overflow checking), but doesn't show > $$divU that is right after it. The reason I was looking was exactly > because this should be using $$divU, and clearly code alignment is > implicated somehow, but the exact alignment of $$divU wasn't obvious. > > But it looks like "$$divU" should be somewhere between $$divoI and > $$divl_2, and in Guenter's bad case that's > > 0000000041218c70 T $$divoI > 00000000412190d0 T $$divI_2 > > so *maybe* $$divU is around a page boundary? 0000000041218xxx turning > into 0000000041219000? It uses $$divU which is at $$divoI + 0x250. I validated that in the disassembly. Thanks, tglx ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 16:12 ` Thomas Gleixner @ 2024-08-08 16:33 ` Linus Torvalds 2024-08-08 17:48 ` Thomas Gleixner 0 siblings, 1 reply; 33+ messages in thread From: Linus Torvalds @ 2024-08-08 16:33 UTC (permalink / raw) To: Thomas Gleixner Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc On Thu, 8 Aug 2024 at 09:12, Thomas Gleixner <tglx@linutronix.de> wrote: > > > > But it looks like "$$divU" should be somewhere between $$divoI and > > $$divl_2, and in Guenter's bad case that's > > > > 0000000041218c70 T $$divoI > > 00000000412190d0 T $$divI_2 > > > > so *maybe* $$divU is around a page boundary? 0000000041218xxx turning > > into 0000000041219000? > > It uses $$divU which is at $$divoI + 0x250. I validated that in the > disassembly. Well, that does support "maybe we have a page crosser issue", but it's not quite at the delayed branch. Because that would mean that $$divU starts at 0x41218ec0, and that means that there are 80 instructions from the start of $$divU to the end of that 0x41218xxx page. And if I counted instructions right (I don't have a disassembler, so I'm just looking at the libgcc sources), that puts the page crosser not quite at the delayed branch slot, but it does put it somewhere roughly at or around ds temp,arg1,temp /* 29th divide step */ addc retreg,retreg,retreg /* shift retreg with/into carry */ so it's around the last few bits of the result. The ones we get wrong. Which is intriguing, but honestly, I don't see how we could get itlb misses horribly wrong and not crash left and right. The $$divU routine is unusual in that it uses that millicode calling convention, but I think that's just a different register for the return address. And it obviously depends on the carry flag, which is pretty unusual. Maybe if the ITLB fill messes up C, it wouldn't show up in other areas? But the $$divU result error is more than one single bit getting cleared. Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 16:33 ` Linus Torvalds @ 2024-08-08 17:48 ` Thomas Gleixner 2024-08-08 18:19 ` Linus Torvalds 0 siblings, 1 reply; 33+ messages in thread From: Thomas Gleixner @ 2024-08-08 17:48 UTC (permalink / raw) To: Linus Torvalds Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc On Thu, Aug 08 2024 at 09:33, Linus Torvalds wrote: > On Thu, 8 Aug 2024 at 09:12, Thomas Gleixner <tglx@linutronix.de> wrote: >> It uses $$divU which is at $$divoI + 0x250. I validated that in the >> disassembly. > > Well, that does support "maybe we have a page crosser issue", but it's > not quite at the delayed branch. > > Because that would mean that $$divU starts at 0x41218ec0, and that > means that there are 80 instructions from the start of $$divU to the > end of that 0x41218xxx page. > > And if I counted instructions right (I don't have a disassembler, so > I'm just looking at the libgcc sources), that puts the page crosser > not quite at the delayed branch slot, but it does put it somewhere > roughly at or around > > ds temp,arg1,temp /* 29th divide step */ > addc retreg,retreg,retreg /* shift retreg with/into carry */ > > so it's around the last few bits of the result. The ones we get wrong. > > Which is intriguing, but honestly, I don't see how we could get itlb > misses horribly wrong and not crash left and right. Here is the disassembly from my latest crashing debug kernel which shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work. 4121dec0: 37 21 3f ff ldo -1(r25),r1 4121dec4: 08 39 22 00 and,= r25,r1,r0 4121dec8: e8 00 00 88 b,l 4121df14 <$$divoI+0x2a4>,r0 4121decc: b3 20 20 00 addi,tc,= 0,r25,r0 4121ded0: 08 1a 02 5d copy r26,ret1 4121ded4: d3 21 39 f0 extrw,u,= r25,15,16,r1 4121ded8: d3 bd 19 f0 extrw,u ret1,15,16,ret1 4121dedc: 08 39 02 59 or r25,r1,r25 4121dee0: 34 1a 01 98 ldi cc,r26 4121dee4: d3 21 3a f8 extrw,u,= r25,23,8,r1 4121dee8: d3 bd 1a e8 extrw,u ret1,23,24,ret1 4121deec: 08 39 02 59 or r25,r1,r25 4121def0: 34 01 01 54 ldi aa,r1 4121def4: d3 20 3b 7c extrw,u,= r25,27,4,r0 4121def8: d3 bd 1b 64 extrw,u ret1,27,28,ret1 4121defc: 0b 59 22 00 and,= r25,r26,r0 4121df00: d3 bd 1b a2 extrw,u ret1,29,30,ret1 4121df04: 08 39 22 00 and,= r25,r1,r0 4121df08: d3 bd 1b c1 extrw,u ret1,30,31,ret1 4121df0c: e8 40 c0 02 bv,n r0(rp) 4121df10: 08 00 02 40 nop 4121df18: 97 21 00 00 subi 0,r25,r1 4121df1c: 08 20 04 40 ds r0,r1,r0 4121df20: 0b 5a 06 1d add r26,r26,ret1 4121df24: 0b 20 04 41 ds r0,r25,r1 4121df28: 0b bd 07 1d add,c ret1,ret1,ret1 4121df2c: 0b 21 04 41 ds r1,r25,r1 4121df30: 0b bd 07 1d add,c ret1,ret1,ret1 4121df34: 0b 21 04 41 ds r1,r25,r1 4121df38: 0b bd 07 1d add,c ret1,ret1,ret1 4121df3c: 0b 21 04 41 ds r1,r25,r1 4121df40: 0b bd 07 1d add,c ret1,ret1,ret1 4121df44: 0b 21 04 41 ds r1,r25,r1 4121df48: 0b bd 07 1d add,c ret1,ret1,ret1 4121df4c: 0b 21 04 41 ds r1,r25,r1 4121df50: 0b bd 07 1d add,c ret1,ret1,ret1 4121df54: 0b 21 04 41 ds r1,r25,r1 4121df58: 0b bd 07 1d add,c ret1,ret1,ret1 4121df5c: 0b 21 04 41 ds r1,r25,r1 4121df60: 0b bd 07 1d add,c ret1,ret1,ret1 4121df64: 0b 21 04 41 ds r1,r25,r1 4121df68: 0b bd 07 1d add,c ret1,ret1,ret1 4121df6c: 0b 21 04 41 ds r1,r25,r1 4121df70: 0b bd 07 1d add,c ret1,ret1,ret1 4121df74: 0b 21 04 41 ds r1,r25,r1 4121df78: 0b bd 07 1d add,c ret1,ret1,ret1 4121df7c: 0b 21 04 41 ds r1,r25,r1 4121df80: 0b bd 07 1d add,c ret1,ret1,ret1 4121df84: 0b 21 04 41 ds r1,r25,r1 4121df88: 0b bd 07 1d add,c ret1,ret1,ret1 4121df8c: 0b 21 04 41 ds r1,r25,r1 4121df90: 0b bd 07 1d add,c ret1,ret1,ret1 4121df94: 0b 21 04 41 ds r1,r25,r1 4121df98: 0b bd 07 1d add,c ret1,ret1,ret1 4121df9c: 0b 21 04 41 ds r1,r25,r1 4121dfa0: 0b bd 07 1d add,c ret1,ret1,ret1 4121dfa4: 0b 21 04 41 ds r1,r25,r1 4121dfa8: 0b bd 07 1d add,c ret1,ret1,ret1 4121dfac: 0b 21 04 41 ds r1,r25,r1 4121dfb0: 0b bd 07 1d add,c ret1,ret1,ret1 4121dfb4: 0b 21 04 41 ds r1,r25,r1 4121dfb8: 0b bd 07 1d add,c ret1,ret1,ret1 4121dfbc: 0b 21 04 41 ds r1,r25,r1 4121dfc0: 0b bd 07 1d add,c ret1,ret1,ret1 4121dfc4: 0b 21 04 41 ds r1,r25,r1 4121dfc8: 0b bd 07 1d add,c ret1,ret1,ret1 4121dfcc: 0b 21 04 41 ds r1,r25,r1 4121dfd0: 0b bd 07 1d add,c ret1,ret1,ret1 4121dfd4: 0b 21 04 41 ds r1,r25,r1 4121dfd8: 0b bd 07 1d add,c ret1,ret1,ret1 4121dfdc: 0b 21 04 41 ds r1,r25,r1 4121dfe0: 0b bd 07 1d add,c ret1,ret1,ret1 4121dfe4: 0b 21 04 41 ds r1,r25,r1 4121dfe8: 0b bd 07 1d add,c ret1,ret1,ret1 4121dfec: 0b 21 04 41 ds r1,r25,r1 4121dff0: 0b bd 07 1d add,c ret1,ret1,ret1 4121dff4: 0b 21 04 41 ds r1,r25,r1 4121dff8: 0b bd 07 1d add,c ret1,ret1,ret1 4121dffc: 0b 21 04 41 ds r1,r25,r1 4121e000: 0b bd 07 1d add,c ret1,ret1,ret1 4121e004: 0b 21 04 41 ds r1,r25,r1 4121e008: 0b bd 07 1d add,c ret1,ret1,ret1 4121e00c: 0b 21 04 41 ds r1,r25,r1 4121e010: 0b bd 07 1d add,c ret1,ret1,ret1 4121e014: 0b 21 04 41 ds r1,r25,r1 4121e018: 0b bd 07 1d add,c ret1,ret1,ret1 4121e01c: 0b 21 04 41 ds r1,r25,r1 4121e020: e8 40 c0 00 bv r0(rp) 4121e024: 0b bd 07 1d add,c ret1,ret1,ret1 4121e028: f3 20 0c 00 depd,* r0,31,32,r25 4121e02c: 8f 20 61 10 cmpib,> 0,r25,4121e0bc <$$divoI+0x44c> 4121e030: 08 00 02 40 nop 4121e034: e8 19 40 00 blr r25,r0 4121e038: 08 00 02 40 nop 4121e03c: b3 20 20 00 addi,tc,= 0,r25,r0 4121e040: 08 00 02 40 nop 4121e044: e8 40 c0 00 bv r0(rp) 4121e048: 08 1a 02 5d copy r26,ret1 4121e04c: e8 40 c0 00 bv r0(rp) 4121e050: d3 5d 1b c1 extrw,u r26,30,31,ret1 4121e054: e8 00 01 c2 b,l,n 4121e13c <$$divI_16+0x3c>,r0 4121e058: 08 00 02 40 nop 4121e05c: e8 40 c0 00 bv r0(rp) 4121e060: d3 5d 1b a2 extrw,u r26,29,30,ret1 4121e064: e8 00 02 2a b,l,n 4121e180 <$$divI_16+0x80>,r0 4121e068: 08 00 02 40 nop 4121e06c: e8 00 02 aa b,l,n 4121e1c8 <$$divI_16+0xc8>,r0 4121e070: 08 00 02 40 nop 4121e074: e8 00 06 9a b,l,n 4121e3c8 <$$divU_17+0xbc>,r0 4121e078: 08 00 02 40 nop 4121e07c: e8 40 c0 00 bv r0(rp) 4121e080: d3 5d 1b 83 extrw,u r26,28,29,ret1 4121e084: e8 00 07 12 b,l,n 4121e414 <$$divU_17+0x108>,r0 4121e088: 08 00 02 40 nop 4121e08c: e8 00 02 9a b,l,n 4121e1e0 <$$divI_16+0xe0>,r0 4121e090: 08 00 02 40 nop 4121e094: e8 1f 1d 0d b,l 4121df20 <$$divoI+0x2b0>,r0 4121e098: 08 20 04 40 ds r0,r1,r0 4121e09c: e8 00 03 fa b,l,n 4121e2a0 <$$divI_16+0x1a0>,r0 4121e0a0: 08 00 02 40 nop 4121e0a4: e8 1f 1c ed b,l 4121df20 <$$divoI+0x2b0>,r0 4121e0a8: 08 20 04 40 ds r0,r1,r0 4121e0ac: e8 00 07 02 b,l,n 4121e434 <$$divU_17+0x128>,r0 4121e0b0: 08 00 02 40 nop 4121e0b4: e8 00 04 22 b,l,n 4121e2cc <$$divI_16+0x1cc>,r0 4121e0b8: 08 00 02 40 nop 4121e0bc: 0b 3a 04 00 sub r26,r25,r0 4121e0c0: e8 40 c0 00 bv r0(rp) 4121e0c4: 08 00 07 1d add,c r0,r0,ret1 ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 17:48 ` Thomas Gleixner @ 2024-08-08 18:19 ` Linus Torvalds 2024-08-08 20:52 ` Guenter Roeck 2024-09-03 7:54 ` Helge Deller 0 siblings, 2 replies; 33+ messages in thread From: Linus Torvalds @ 2024-08-08 18:19 UTC (permalink / raw) To: Thomas Gleixner Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote: > > Here is the disassembly from my latest crashing debug kernel which > shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work. Looks like I was off by an instruction, it's the 28th divide-step (not 29) that does the page crosser: > 4121dffc: 0b 21 04 41 ds r1,r25,r1 > 4121e000: 0b bd 07 1d add,c ret1,ret1,ret1 but my parisc knowledge is not good enough to even guess at what could go wrong. And I have no actual reason to believe this has *anything* to do with an itlb miss, except for that whole "exact placement seems to matter, and it crosses a page boundary" detail. None of this makes sense. I think we'll have to wait for Helge. It's not like parisc is a huge concern, and for all we know this is all a qemu bug to begin with. Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 18:19 ` Linus Torvalds @ 2024-08-08 20:52 ` Guenter Roeck 2024-08-08 21:50 ` John David Anglin 2024-08-08 22:15 ` Richard Henderson 2024-09-03 7:54 ` Helge Deller 1 sibling, 2 replies; 33+ messages in thread From: Guenter Roeck @ 2024-08-08 20:52 UTC (permalink / raw) To: Linus Torvalds, Thomas Gleixner Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc, Richard Henderson On 8/8/24 11:19, Linus Torvalds wrote: > On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote: >> >> Here is the disassembly from my latest crashing debug kernel which >> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work. > > Looks like I was off by an instruction, it's the 28th divide-step (not > 29) that does the page crosser: > >> 4121dffc: 0b 21 04 41 ds r1,r25,r1 >> 4121e000: 0b bd 07 1d add,c ret1,ret1,ret1 > > but my parisc knowledge is not good enough to even guess at what could go wrong. > > And I have no actual reason to believe this has *anything* to do with > an itlb miss, except for that whole "exact placement seems to matter, > and it crosses a page boundary" detail. > > None of this makes sense. I think we'll have to wait for Helge. It's > not like parisc is a huge concern, and for all we know this is all a > qemu bug to begin with. > Copying Richard Henderson who recently made a number of changes to the parisc/hppa qemu implementation (which unfortunately didn't fix the problem). Guenter ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 20:52 ` Guenter Roeck @ 2024-08-08 21:50 ` John David Anglin 2024-08-08 22:29 ` John David Anglin 2024-08-09 0:50 ` Guenter Roeck 2024-08-08 22:15 ` Richard Henderson 1 sibling, 2 replies; 33+ messages in thread From: John David Anglin @ 2024-08-08 21:50 UTC (permalink / raw) To: Guenter Roeck, Linus Torvalds, Thomas Gleixner Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc, Richard Henderson On 2024-08-08 4:52 p.m., Guenter Roeck wrote: > On 8/8/24 11:19, Linus Torvalds wrote: >> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote: >>> >>> Here is the disassembly from my latest crashing debug kernel which >>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work. >> >> Looks like I was off by an instruction, it's the 28th divide-step (not >> 29) that does the page crosser: >> >>> 4121dffc: 0b 21 04 41 ds r1,r25,r1 >>> 4121e000: 0b bd 07 1d add,c ret1,ret1,ret1 I think this macro might clobber the C/B bits on a ITLB missing: /* This is for ILP32 PA2.0 only. The TLB insertion needs * to extend into I/O space if the address is 0xfXXXXXXX * so we extend the f's into the top word of the pte in * this case */ .macro f_extend pte,tmp extrd,s \pte,42,4,\tmp addi,<> 1,\tmp,%r0 extrd,s \pte,63,25,\pte .endm The addi instruction affects the C/B bits. However, it is only used for 32-bit PA 2.0 kernels. A second tmp register would be needed to change the addi to an add logical. The mode likely problem is the shladd instruction in the following macro in entry.S: .macro L2_ptep pmd,pte,index,va,fault #if CONFIG_PGTABLE_LEVELS == 3 extru_safe \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index #else extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index #endif dep %r0,31,PAGE_SHIFT,\pmd /* clear offset */ #if CONFIG_PGTABLE_LEVELS < 3 copy %r0,\pte #endif ldw,s \index(\pmd),\pmd bb,>=,n \pmd,_PxD_PRESENT_BIT,\fault dep %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */ SHLREG \pmd,PxD_VALUE_SHIFT,\pmd extru_safe \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index dep %r0,31,PAGE_SHIFT,\pmd /* clear offset */ shladd \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ .endm I believe the shladd instruction should be changed to shladd,l (shift left and add logical). Dave -- John David Anglin dave.anglin@bell.net ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 21:50 ` John David Anglin @ 2024-08-08 22:29 ` John David Anglin 2024-08-08 23:33 ` Linus Torvalds 2024-08-09 0:56 ` Guenter Roeck 2024-08-09 0:50 ` Guenter Roeck 1 sibling, 2 replies; 33+ messages in thread From: John David Anglin @ 2024-08-08 22:29 UTC (permalink / raw) To: Guenter Roeck, Linus Torvalds, Thomas Gleixner Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc, Richard Henderson On 2024-08-08 5:50 p.m., John David Anglin wrote: > The mode likely problem is the shladd instruction in the following macro in entry.S: > > .macro L2_ptep pmd,pte,index,va,fault > #if CONFIG_PGTABLE_LEVELS == 3 > extru_safe \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index > #else > extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index > #endif > dep %r0,31,PAGE_SHIFT,\pmd /* clear offset */ > #if CONFIG_PGTABLE_LEVELS < 3 > copy %r0,\pte > #endif > ldw,s \index(\pmd),\pmd > bb,>=,n \pmd,_PxD_PRESENT_BIT,\fault > dep %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */ > SHLREG \pmd,PxD_VALUE_SHIFT,\pmd > extru_safe \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index > dep %r0,31,PAGE_SHIFT,\pmd /* clear offset */ > shladd \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ > .endm > > I believe the shladd instruction should be changed to shladd,l (shift left and add logical). diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S index ab23e61a6f01..1ec60406f841 100644 --- a/arch/parisc/kernel/entry.S +++ b/arch/parisc/kernel/entry.S @@ -399,7 +399,7 @@ SHLREG \pmd,PxD_VALUE_SHIFT,\pmd extru_safe \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index dep %r0,31,PAGE_SHIFT,\pmd /* clear offset */ - shladd \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ + shladd,l \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ .endm /* Look up PTE in a 3-Level scheme. */ Boots okay. Fixing the addi instruction is harder and it would take some time to test. Dave -- John David Anglin dave.anglin@bell.net ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 22:29 ` John David Anglin @ 2024-08-08 23:33 ` Linus Torvalds 2024-08-09 0:33 ` John David Anglin 2024-08-09 0:56 ` Guenter Roeck 1 sibling, 1 reply; 33+ messages in thread From: Linus Torvalds @ 2024-08-08 23:33 UTC (permalink / raw) To: John David Anglin Cc: Guenter Roeck, Thomas Gleixner, Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc, Richard Henderson On Thu, 8 Aug 2024 at 15:30, John David Anglin <dave.anglin@bell.net> wrote: > > > I believe the shladd instruction should be changed to shladd,l (shift left and add logical). > > diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S > index ab23e61a6f01..1ec60406f841 100644 > --- a/arch/parisc/kernel/entry.S > +++ b/arch/parisc/kernel/entry.S > @@ -399,7 +399,7 @@ > - shladd \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ > + shladd,l \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ This doesn't seem wrong, but doesn't RFIR already restore the status word? So even if the itlb fill modifies C/B, I don't see why that should actually matter. But again, parisc is very much not one of the architectures I've ever worked with, so.. Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 23:33 ` Linus Torvalds @ 2024-08-09 0:33 ` John David Anglin 0 siblings, 0 replies; 33+ messages in thread From: John David Anglin @ 2024-08-09 0:33 UTC (permalink / raw) To: Linus Torvalds Cc: Guenter Roeck, Thomas Gleixner, Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc, Richard Henderson On 2024-08-08 7:33 p.m., Linus Torvalds wrote: > On Thu, 8 Aug 2024 at 15:30, John David Anglin <dave.anglin@bell.net> wrote: >>> I believe the shladd instruction should be changed to shladd,l (shift left and add logical). >> diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S >> index ab23e61a6f01..1ec60406f841 100644 >> --- a/arch/parisc/kernel/entry.S >> +++ b/arch/parisc/kernel/entry.S >> @@ -399,7 +399,7 @@ >> - shladd \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ >> + shladd,l \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ > This doesn't seem wrong, but doesn't RFIR already restore the status word? Yes. > > So even if the itlb fill modifies C/B, I don't see why that should > actually matter. Probably won't help unless there's a bug in qemu rfir emulation. Dave -- John David Anglin dave.anglin@bell.net ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 22:29 ` John David Anglin 2024-08-08 23:33 ` Linus Torvalds @ 2024-08-09 0:56 ` Guenter Roeck 1 sibling, 0 replies; 33+ messages in thread From: Guenter Roeck @ 2024-08-09 0:56 UTC (permalink / raw) To: John David Anglin, Linus Torvalds, Thomas Gleixner Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc, Richard Henderson On 8/8/24 15:29, John David Anglin wrote: > On 2024-08-08 5:50 p.m., John David Anglin wrote: >> The mode likely problem is the shladd instruction in the following macro in entry.S: >> >> .macro L2_ptep pmd,pte,index,va,fault >> #if CONFIG_PGTABLE_LEVELS == 3 >> extru_safe \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index >> #else >> extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index >> #endif >> dep %r0,31,PAGE_SHIFT,\pmd /* clear offset */ >> #if CONFIG_PGTABLE_LEVELS < 3 >> copy %r0,\pte >> #endif >> ldw,s \index(\pmd),\pmd >> bb,>=,n \pmd,_PxD_PRESENT_BIT,\fault >> dep %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */ >> SHLREG \pmd,PxD_VALUE_SHIFT,\pmd >> extru_safe \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index >> dep %r0,31,PAGE_SHIFT,\pmd /* clear offset */ >> shladd \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ >> .endm >> >> I believe the shladd instruction should be changed to shladd,l (shift left and add logical). > diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S > index ab23e61a6f01..1ec60406f841 100644 > --- a/arch/parisc/kernel/entry.S > +++ b/arch/parisc/kernel/entry.S > @@ -399,7 +399,7 @@ > SHLREG \pmd,PxD_VALUE_SHIFT,\pmd > extru_safe \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index > dep %r0,31,PAGE_SHIFT,\pmd /* clear offset */ > - shladd \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ > + shladd,l \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ > .endm > > /* Look up PTE in a 3-Level scheme. */ > > Boots okay. Fixing the addi instruction is harder and it would take some time to test. > Odd, it doesn't help for me. Does it crash for you without the above change ? Or, in other words, is divI at the objecting location ? Guenter ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 21:50 ` John David Anglin 2024-08-08 22:29 ` John David Anglin @ 2024-08-09 0:50 ` Guenter Roeck 1 sibling, 0 replies; 33+ messages in thread From: Guenter Roeck @ 2024-08-09 0:50 UTC (permalink / raw) To: John David Anglin, Linus Torvalds, Thomas Gleixner Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc, Richard Henderson On 8/8/24 14:50, John David Anglin wrote: > On 2024-08-08 4:52 p.m., Guenter Roeck wrote: >> On 8/8/24 11:19, Linus Torvalds wrote: >>> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote: >>>> >>>> Here is the disassembly from my latest crashing debug kernel which >>>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work. >>> >>> Looks like I was off by an instruction, it's the 28th divide-step (not >>> 29) that does the page crosser: >>> >>>> 4121dffc: 0b 21 04 41 ds r1,r25,r1 >>>> 4121e000: 0b bd 07 1d add,c ret1,ret1,ret1 > I think this macro might clobber the C/B bits on a ITLB missing: > > /* This is for ILP32 PA2.0 only. The TLB insertion needs > * to extend into I/O space if the address is 0xfXXXXXXX > * so we extend the f's into the top word of the pte in > * this case */ > .macro f_extend pte,tmp > extrd,s \pte,42,4,\tmp > addi,<> 1,\tmp,%r0 > extrd,s \pte,63,25,\pte > .endm > > The addi instruction affects the C/B bits. However, it is only used for 32-bit PA 2.0 kernels. > A second tmp register would be needed to change the addi to an add logical. > > The mode likely problem is the shladd instruction in the following macro in entry.S: > > .macro L2_ptep pmd,pte,index,va,fault > #if CONFIG_PGTABLE_LEVELS == 3 > extru_safe \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index > #else > extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index > #endif > dep %r0,31,PAGE_SHIFT,\pmd /* clear offset */ > #if CONFIG_PGTABLE_LEVELS < 3 > copy %r0,\pte > #endif > ldw,s \index(\pmd),\pmd > bb,>=,n \pmd,_PxD_PRESENT_BIT,\fault > dep %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */ > SHLREG \pmd,PxD_VALUE_SHIFT,\pmd > extru_safe \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index > dep %r0,31,PAGE_SHIFT,\pmd /* clear offset */ > shladd \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */ > .endm > > I believe the shladd instruction should be changed to shladd,l (shift left and add logical). > That doesn't help, at least not in qemu. Guenter ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 20:52 ` Guenter Roeck 2024-08-08 21:50 ` John David Anglin @ 2024-08-08 22:15 ` Richard Henderson 1 sibling, 0 replies; 33+ messages in thread From: Richard Henderson @ 2024-08-08 22:15 UTC (permalink / raw) To: Guenter Roeck, Linus Torvalds, Thomas Gleixner Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller, linux-parisc On 8/9/24 06:52, Guenter Roeck wrote: > On 8/8/24 11:19, Linus Torvalds wrote: >> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote: >>> >>> Here is the disassembly from my latest crashing debug kernel which >>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work. >> >> Looks like I was off by an instruction, it's the 28th divide-step (not >> 29) that does the page crosser: >> >>> 4121dffc: 0b 21 04 41 ds r1,r25,r1 >>> 4121e000: 0b bd 07 1d add,c ret1,ret1,ret1 >> >> but my parisc knowledge is not good enough to even guess at what could go wrong. >> >> And I have no actual reason to believe this has *anything* to do with >> an itlb miss, except for that whole "exact placement seems to matter, >> and it crosses a page boundary" detail. >> >> None of this makes sense. I think we'll have to wait for Helge. It's >> not like parisc is a huge concern, and for all we know this is all a >> qemu bug to begin with. >> > > Copying Richard Henderson who recently made a number of changes to the > parisc/hppa qemu implementation (which unfortunately didn't fix the problem). Wow, that's quite the agile bug you've got there. You can eliminate one class of qemu bug by attempting to reproduce in qemu-linux-user: arrange for the page crossing at the appropriate spot and see if the split between two translation blocks causes carry flag weirdness. If that doesn't reproduce, then I'd be likely to blame something in the exception delivery or return process. Still could be a qemu problem, but it would be something in the system emulation of the exception path. It should be possible to write a small system mode test case for this hypothesis. Ideally the itlb miss handler would be as simple a possible, e.g. computing an identity mapping rather than using real page tables. Only after that would I start digging into the linux kernel's exception paths. r~ ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-08-08 18:19 ` Linus Torvalds 2024-08-08 20:52 ` Guenter Roeck @ 2024-09-03 7:54 ` Helge Deller 2024-09-03 14:13 ` Guenter Roeck 2024-09-03 18:43 ` Linus Torvalds 1 sibling, 2 replies; 33+ messages in thread From: Helge Deller @ 2024-09-03 7:54 UTC (permalink / raw) To: Linus Torvalds, Richard Henderson, Guenter Roeck Cc: Vlastimil Babka, linux-kernel, Linux-MM, linux-parisc, Thomas Gleixner On 8/8/24 20:19, Linus Torvalds wrote: > On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote: >> >> Here is the disassembly from my latest crashing debug kernel which >> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work. > > Looks like I was off by an instruction, it's the 28th divide-step (not > 29) that does the page crosser: > >> 4121dffc: 0b 21 04 41 ds r1,r25,r1 >> 4121e000: 0b bd 07 1d add,c ret1,ret1,ret1 > > but my parisc knowledge is not good enough to even guess at what could go wrong. > > And I have no actual reason to believe this has *anything* to do with > an itlb miss, except for that whole "exact placement seems to matter, > and it crosses a page boundary" detail. Well, you were on the right track :-) Guenters kernel from http://server.roeck-us.net/qemu/parisc64-6.10.3/ boots nicely on my physical C3700 machine, but crashes with Qemu. So, it's not some bug in the kernel ITLB miss handler or other Linux kernel code. Instead it's a Qemu bug, which gets triggered by the page boundary crossing of: 41218ffc: 0b 21 04 41 ds r1,r25,r1 41219000: 0b bd 07 1d add,c ret1,ret1,ret1 During the ITLB miss, the carry bits and the PSW-V-bit (from the divide step) are saved in the IPSW register and restored upon irq return. During packaging the bits there is a qemu coding bug, where we missed to handle the PSW-V-bit as 32-bit value even on a 64-bit CPU. The (copy&pasted) patch below fixes the crash for me. Helge diff --git a/target/hppa/helper.c b/target/hppa/helper.c index b79ddd8184..d4b1a3cd5a 100644 --- a/target/hppa/helper.c +++ b/target/hppa/helper.c @@ -53,7 +53,7 @@ target_ulong cpu_hppa_get_psw(CPUHPPAState *env) } psw |= env->psw_n * PSW_N; - psw |= (env->psw_v < 0) * PSW_V; + psw |= ((env->psw_v >> 31) & 1) * PSW_V; psw |= env->psw | env->psw_xb; return psw; ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-09-03 7:54 ` Helge Deller @ 2024-09-03 14:13 ` Guenter Roeck 2024-09-03 18:43 ` Linus Torvalds 1 sibling, 0 replies; 33+ messages in thread From: Guenter Roeck @ 2024-09-03 14:13 UTC (permalink / raw) To: Helge Deller Cc: Linus Torvalds, Richard Henderson, Vlastimil Babka, linux-kernel, Linux-MM, linux-parisc, Thomas Gleixner On Tue, Sep 03, 2024 at 09:54:19AM +0200, Helge Deller wrote: > On 8/8/24 20:19, Linus Torvalds wrote: > > On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote: > > > > > > Here is the disassembly from my latest crashing debug kernel which > > > shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work. > > > > Looks like I was off by an instruction, it's the 28th divide-step (not > > 29) that does the page crosser: > > > > > 4121dffc: 0b 21 04 41 ds r1,r25,r1 > > > 4121e000: 0b bd 07 1d add,c ret1,ret1,ret1 > > > > but my parisc knowledge is not good enough to even guess at what could go wrong. > > > > And I have no actual reason to believe this has *anything* to do with > > an itlb miss, except for that whole "exact placement seems to matter, > > and it crosses a page boundary" detail. > > Well, you were on the right track :-) > > Guenters kernel from > http://server.roeck-us.net/qemu/parisc64-6.10.3/ > boots nicely on my physical C3700 machine, but crashes with Qemu. > > So, it's not some bug in the kernel ITLB miss handler or other > Linux kernel code. > > Instead it's a Qemu bug, which gets triggered by the page > boundary crossing of: > 41218ffc: 0b 21 04 41 ds r1,r25,r1 > 41219000: 0b bd 07 1d add,c ret1,ret1,ret1 > > During the ITLB miss, the carry bits and the PSW-V-bit > (from the divide step) are saved in the IPSW register and restored > upon irq return. > > During packaging the bits there is a qemu coding bug, where we missed > to handle the PSW-V-bit as 32-bit value even on a 64-bit CPU. > The (copy&pasted) patch below fixes the crash for me. > Yes, that works for me as well. Thanks a lot for the fix! Guenter > Helge > > diff --git a/target/hppa/helper.c b/target/hppa/helper.c > index b79ddd8184..d4b1a3cd5a 100644 > --- a/target/hppa/helper.c > +++ b/target/hppa/helper.c > @@ -53,7 +53,7 @@ target_ulong cpu_hppa_get_psw(CPUHPPAState *env) > } > > psw |= env->psw_n * PSW_N; > - psw |= (env->psw_v < 0) * PSW_V; > + psw |= ((env->psw_v >> 31) & 1) * PSW_V; > psw |= env->psw | env->psw_xb; > > return psw; > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review 2024-09-03 7:54 ` Helge Deller 2024-09-03 14:13 ` Guenter Roeck @ 2024-09-03 18:43 ` Linus Torvalds 1 sibling, 0 replies; 33+ messages in thread From: Linus Torvalds @ 2024-09-03 18:43 UTC (permalink / raw) To: Helge Deller Cc: Richard Henderson, Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM, linux-parisc, Thomas Gleixner On Tue, 3 Sept 2024 at 00:54, Helge Deller <deller@gmx.de> wrote: > > During packaging the bits there is a qemu coding bug, where we missed > to handle the PSW-V-bit as 32-bit value even on a 64-bit CPU. Well, that was a fun bug. And by "fun" I obviously mean "let's hope to never do this again". Linus ^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2024-09-03 18:43 UTC | newest]
Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20240731095022.970699670@linuxfoundation.org>
[not found] ` <718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net>
2024-08-05 3:28 ` [PATCH 6.10 000/809] 6.10.3-rc3 review Guenter Roeck
2024-08-05 8:56 ` Thomas Gleixner
2024-08-05 12:51 ` Thomas Gleixner
2024-08-05 15:02 ` Guenter Roeck
2024-08-05 21:49 ` Thomas Gleixner
2024-08-06 1:16 ` Guenter Roeck
2024-08-05 17:42 ` Guenter Roeck
[not found] ` <CAHk-=wiZ7WJQ1y=CwuMwqBxQYtaD8psq+Vxa3r1Z6_ftDZK+hA@mail.gmail.com>
[not found] ` <53b2e1f2-4291-48e5-a668-7cf57d900ecd@suse.cz>
[not found] ` <87le194kuq.ffs@tglx>
[not found] ` <90e02d99-37a2-437e-ad42-44b80c4e94f6@suse.cz>
2024-08-06 23:24 ` Thomas Gleixner
2024-08-07 0:49 ` James Bottomley
2024-08-07 1:38 ` Guenter Roeck
2024-08-07 12:45 ` Thomas Gleixner
2024-08-08 1:07 ` Guenter Roeck
2024-08-08 7:48 ` Vlastimil Babka
2024-08-08 14:46 ` Guenter Roeck
2024-08-08 9:57 ` Thomas Gleixner
2024-08-08 14:59 ` Guenter Roeck
2024-08-08 15:58 ` John David Anglin
2024-08-08 15:53 ` Linus Torvalds
2024-08-08 16:12 ` Thomas Gleixner
2024-08-08 16:33 ` Linus Torvalds
2024-08-08 17:48 ` Thomas Gleixner
2024-08-08 18:19 ` Linus Torvalds
2024-08-08 20:52 ` Guenter Roeck
2024-08-08 21:50 ` John David Anglin
2024-08-08 22:29 ` John David Anglin
2024-08-08 23:33 ` Linus Torvalds
2024-08-09 0:33 ` John David Anglin
2024-08-09 0:56 ` Guenter Roeck
2024-08-09 0:50 ` Guenter Roeck
2024-08-08 22:15 ` Richard Henderson
2024-09-03 7:54 ` Helge Deller
2024-09-03 14:13 ` Guenter Roeck
2024-09-03 18:43 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox