Re: [PATCH 6.10 000/809] 6.10.3-rc3 review

public inbox for linux-parisc@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
       [not found] ` <718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net>
@ 2024-08-05  3:28   ` Guenter Roeck
  2024-08-05  8:56     ` Thomas Gleixner
       [not found]   ` <CAHk-=wiZ7WJQ1y=CwuMwqBxQYtaD8psq+Vxa3r1Z6_ftDZK+hA@mail.gmail.com>
  1 sibling, 1 reply; 33+ messages in thread
From: Guenter Roeck @ 2024-08-05  3:28 UTC (permalink / raw)
  To: Greg Kroah-Hartman, stable
  Cc: patches, linux-kernel, torvalds, akpm, shuah, patches,
	lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw,
	rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki,
	Helge Deller, Parisc List

On 8/4/24 11:36, Guenter Roeck wrote:
> Hi,
> 
> On 7/31/24 03:03, Greg Kroah-Hartman wrote:
>> This is the start of the stable review cycle for the 6.10.3 release.
>> There are 809 patches in this series, all will be posted as a response
>> to this one.  If anyone has any issues with these being applied, please
>> let me know.
>>
>> Responses should be made by Fri, 02 Aug 2024 09:47:47 +0000.
>> Anything received after that time might be too late.
>>
> [ ... ]
> 
>> Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>      genirq: Set IRQF_COND_ONESHOT in request_irq()
>>
> 
> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages
> 
> [    0.000000] =============================================================================
> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
> [    0.000000] -----------------------------------------------------------------------------
> 
> This never stops until the emulation aborts.
> 
> Reverting this patch fixes the problem for me.
> 
> I noticed a similar problem in the mainline kernel but it is either spurious there
> or the problem has been fixed.
> 

As a follow-up, the patch below (on top of v6.10.3) "fixes" the problem for me.
I guess that suggests some kind of race condition.

Added Helge and the parisc mailing list to Cc:. Sorry, I forgot that earlier.

Guenter

---
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index dd53298ef1a5..53a0f654ab56 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -8,6 +8,7 @@

  #define pr_fmt(fmt) "genirq: " fmt

+#include <linux/delay.h>
  #include <linux/irq.h>
  #include <linux/kthread.h>
  #include <linux/module.h>
@@ -2156,6 +2157,8 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler,
         struct irq_desc *desc;
         int retval;

+       udelay(1);
+
         if (irq == IRQ_NOTCONNECTED)
                 return -ENOTCONN;


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-05  3:28   ` [PATCH 6.10 000/809] 6.10.3-rc3 review Guenter Roeck
@ 2024-08-05  8:56     ` Thomas Gleixner
  2024-08-05 12:51       ` Thomas Gleixner
  2024-08-05 17:42       ` Guenter Roeck
  0 siblings, 2 replies; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-05  8:56 UTC (permalink / raw)
  To: Guenter Roeck, Greg Kroah-Hartman, stable
  Cc: patches, linux-kernel, torvalds, akpm, shuah, patches,
	lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw,
	rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki,
	Helge Deller, Parisc List

On Sun, Aug 04 2024 at 20:28, Guenter Roeck wrote:
> On 8/4/24 11:36, Guenter Roeck wrote:
>>> Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>      genirq: Set IRQF_COND_ONESHOT in request_irq()
>>>
>> 
>> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages
>> 
>> [    0.000000] =============================================================================
>> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>> [    0.000000] -----------------------------------------------------------------------------

Do you have a full boot log? It's unclear to me at which point of the boot
process this happens. Is this before or after the secondary CPUs have
been brought up?

>> This never stops until the emulation aborts.

Do you have a recipe how to reproduce?

>> Reverting this patch fixes the problem for me.
>> 
>> I noticed a similar problem in the mainline kernel but it is either spurious there
>> or the problem has been fixed.
>> 
>
> As a follow-up, the patch below (on top of v6.10.3) "fixes" the problem for me.
> I guess that suggests some kind of race condition.
>
>
> @@ -2156,6 +2157,8 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler,
>          struct irq_desc *desc;
>          int retval;
>
> +       udelay(1);
> +
>          if (irq == IRQ_NOTCONNECTED)
>                  return -ENOTCONN;

That all makes absolutely no sense to me.

IRQF_COND_ONESHOT has only an effect on shared interrupts, when the
interrupt was already requested with IRQF_ONESHOT.

If this is really a race then the following must be true:

1) no delay

   CPU0                                 CPU1
   request_irq(IRQF_ONESHOT)
                                        request_irq(IRQF_COND_ONESHOT)

2) delay

   CPU0                                 CPU1
                                        request_irq(IRQF_COND_ONESHOT)
   request_irq(IRQF_ONESHOT)

   In this case the request on CPU 0 fails with -EBUSY ...

Confused

        tglx



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-05  8:56     ` Thomas Gleixner
@ 2024-08-05 12:51       ` Thomas Gleixner
  2024-08-05 15:02         ` Guenter Roeck
  2024-08-05 17:42       ` Guenter Roeck
  1 sibling, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-05 12:51 UTC (permalink / raw)
  To: Guenter Roeck, Greg Kroah-Hartman, stable
  Cc: patches, linux-kernel, torvalds, akpm, shuah, patches,
	lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw,
	rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki,
	Helge Deller, Parisc List

On Mon, Aug 05 2024 at 10:56, Thomas Gleixner wrote:
> If this is really a race then the following must be true:
>
> 1) no delay
>
>    CPU0                                 CPU1
>    request_irq(IRQF_ONESHOT)
>                                         request_irq(IRQF_COND_ONESHOT)
>
> 2) delay
>
>    CPU0                                 CPU1
>                                         request_irq(IRQF_COND_ONESHOT)
>    request_irq(IRQF_ONESHOT)
>
>    In this case the request on CPU 0 fails with -EBUSY ...
>
> Confused

More confusing:

Adding a printk() in setup_irq() - using the config, rootfs and the run.sh
script from:

  http://server.roeck-us.net/qemu/parisc64-6.1.5/

results in:

[    0.000000] genirq: 64 flags: 00215600
[    0.000000] genirq: 65 flags: 00200400
[    8.110946] genirq: 66 flags: 00200080

IRQF_ONESHOT is 0x2000 which is not set by any of the interrupt
requests.

IRQF_COND_ONESHOT has only an effect when

    1) Interrupt is shared
    2) First interrupt request has IRQF_ONESHOT set

Neither #1 nor #2 are true, but maybe your current config enables some moar
devices than the one on your website.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-05 12:51       ` Thomas Gleixner
@ 2024-08-05 15:02         ` Guenter Roeck
  2024-08-05 21:49           ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: Guenter Roeck @ 2024-08-05 15:02 UTC (permalink / raw)
  To: Thomas Gleixner, Greg Kroah-Hartman, stable
  Cc: patches, linux-kernel, torvalds, akpm, shuah, patches,
	lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw,
	rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki,
	Helge Deller, Parisc List

On 8/5/24 05:51, Thomas Gleixner wrote:
> On Mon, Aug 05 2024 at 10:56, Thomas Gleixner wrote:
>> If this is really a race then the following must be true:
>>
>> 1) no delay
>>
>>     CPU0                                 CPU1
>>     request_irq(IRQF_ONESHOT)
>>                                          request_irq(IRQF_COND_ONESHOT)
>>
>> 2) delay
>>
>>     CPU0                                 CPU1
>>                                          request_irq(IRQF_COND_ONESHOT)
>>     request_irq(IRQF_ONESHOT)
>>
>>     In this case the request on CPU 0 fails with -EBUSY ...
>>
>> Confused
> 
> More confusing:
> 
> Adding a printk() in setup_irq() - using the config, rootfs and the run.sh
> script from:
> 
>    http://server.roeck-us.net/qemu/parisc64-6.1.5/
> 
> results in:
> 
> [    0.000000] genirq: 64 flags: 00215600
> [    0.000000] genirq: 65 flags: 00200400
> [    8.110946] genirq: 66 flags: 00200080
> 
> IRQF_ONESHOT is 0x2000 which is not set by any of the interrupt
> requests.
> 
> IRQF_COND_ONESHOT has only an effect when
> 
>      1) Interrupt is shared
>      2) First interrupt request has IRQF_ONESHOT set
> 
> Neither #1 nor #2 are true, but maybe your current config enables some moar
> devices than the one on your website.
> 

No, it is pretty much the same, except for a more recent C compiler, and it
requires qemu v9.0. See http://server.roeck-us.net/qemu/parisc64-6.10.3/.

Debugging shows pretty much the same for me, and any log message added
to request_irq() makes the problem go away (or be different), and if the problem
is seen it doesn't even get to the third interrupt request. I copied a more complete
log to bad.log.gz in above page.

Below is yet another "fix" of the problem, just as puzzling as the other "fix".

Guenter

---
diff --git a/arch/parisc/kernel/time.c b/arch/parisc/kernel/time.c
index 9714fbd7c42d..9707914c1a62 100644
--- a/arch/parisc/kernel/time.c
+++ b/arch/parisc/kernel/time.c
@@ -75,6 +75,8 @@ irqreturn_t __irq_entry timer_interrupt(int irq, void *dev_id)
         /* Initialize next_tick to the old expected tick time. */
         next_tick = cpuinfo->it_value;

+       pr_info_once("####### First timer interrupt\n");
+
         /* Calculate how many ticks have elapsed. */
         now = mfctl(16);
         do {


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-05  8:56     ` Thomas Gleixner
  2024-08-05 12:51       ` Thomas Gleixner
@ 2024-08-05 17:42       ` Guenter Roeck
  1 sibling, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2024-08-05 17:42 UTC (permalink / raw)
  To: Thomas Gleixner, Greg Kroah-Hartman, stable
  Cc: patches, linux-kernel, torvalds, akpm, shuah, patches,
	lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw,
	rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki,
	Helge Deller, Parisc List

On 8/5/24 01:56, Thomas Gleixner wrote:
> On Sun, Aug 04 2024 at 20:28, Guenter Roeck wrote:
>> On 8/4/24 11:36, Guenter Roeck wrote:
>>>> Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>       genirq: Set IRQF_COND_ONESHOT in request_irq()
>>>>
>>>
>>> With this patch in v6.10.3, all my parisc64 qemu tests get stuck with repeated error messages
>>>
>>> [    0.000000] =============================================================================
>>> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>>> [    0.000000] -----------------------------------------------------------------------------
> 
> Do you have a full boot log? It's unclear to me at which point of the boot
> process this happens. Is this before or after the secondary CPUs have
> been brought up?
> 
>>> This never stops until the emulation aborts.
> 
> Do you have a recipe how to reproduce?
> 
>>> Reverting this patch fixes the problem for me.
>>>
>>> I noticed a similar problem in the mainline kernel but it is either spurious there
>>> or the problem has been fixed.
>>>
>>
>> As a follow-up, the patch below (on top of v6.10.3) "fixes" the problem for me.
>> I guess that suggests some kind of race condition.
>>
>>
>> @@ -2156,6 +2157,8 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler,
>>           struct irq_desc *desc;
>>           int retval;
>>
>> +       udelay(1);
>> +
>>           if (irq == IRQ_NOTCONNECTED)
>>                   return -ENOTCONN;
> 
> That all makes absolutely no sense to me.
> 

Same here, really. I can reproduce the problem with v6.10.3, using my configuration,
but whatever debugging I add makes the problem disappear. I had seen the same problem
on mainline with v6.11-rc1-272-g17712b7ea075. Log is at
https://kerneltests.org/builders/qemu-parisc64-master/builds/168/steps/qemubuildcommand/logs/stdio
However, I can no longer reproduce it there. What makes it even more weird / odd
is that I can bisect the problem between v6.10.2 and v6.10.3 and it points to this
commit, but reproducing it outside that chain seems to be all but impossible.

Guenter

> IRQF_COND_ONESHOT has only an effect on shared interrupts, when the
> interrupt was already requested with IRQF_ONESHOT.
> 
> If this is really a race then the following must be true:
> 
> 1) no delay
> 
>     CPU0                                 CPU1
>     request_irq(IRQF_ONESHOT)
>                                          request_irq(IRQF_COND_ONESHOT)
> 
> 2) delay
> 
>     CPU0                                 CPU1
>                                          request_irq(IRQF_COND_ONESHOT)
>     request_irq(IRQF_ONESHOT)
> 
>     In this case the request on CPU 0 fails with -EBUSY ...
> 
> Confused
> 
>          tglx
> 
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-05 15:02         ` Guenter Roeck
@ 2024-08-05 21:49           ` Thomas Gleixner
  2024-08-06  1:16             ` Guenter Roeck
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-05 21:49 UTC (permalink / raw)
  To: Guenter Roeck, Greg Kroah-Hartman, stable
  Cc: patches, linux-kernel, torvalds, akpm, shuah, patches,
	lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw,
	rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki,
	Helge Deller, Parisc List

On Mon, Aug 05 2024 at 08:02, Guenter Roeck wrote:
> On 8/5/24 05:51, Thomas Gleixner wrote:
>> IRQF_COND_ONESHOT has only an effect when
>> 
>>      1) Interrupt is shared
>>      2) First interrupt request has IRQF_ONESHOT set
>> 
>> Neither #1 nor #2 are true, but maybe your current config enables some moar
>> devices than the one on your website.
>> 
>
> No, it is pretty much the same, except for a more recent C compiler, and it
> requires qemu v9.0. See http://server.roeck-us.net/qemu/parisc64-6.10.3/.
>
> Debugging shows pretty much the same for me, and any log message added
> to request_irq() makes the problem go away (or be different), and if the problem
> is seen it doesn't even get to the third interrupt request. I copied a more complete
> log to bad.log.gz in above page.

At the point where the problem happens is way before the first interrupt
is requested, so that definitely excludes any side effect of
COND_ONESHOT.

It happens right after

[    0.000000] Memory: 991812K/1048576K available (16752K kernel code, 5220K rwdata, 3044K rodata, 760K init, 1424K bss, 56764K reserved, 0K cma-reserved)
               SNIP
[    0.000000] ** This system shows unhashed kernel memory addresses   **
               SNIP

I.e. the big fat warning about unhashed kernel addresses)

In the good case the first interrupt is requested here:

[    0.000000] Memory: 992804K/1048576K available (16532K kernel code, 5096K rwdata, 2984K rodata, 744K init, 1412K bss, 55772K reserved, 0K cma-reserved)
               SNIP
[    0.000000] ** This system shows unhashed kernel memory addresses   **
               SNIP
[    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] ODEBUG: selftest passed
[    0.000000] rcu: Hierarchical RCU implementation.
[    0.000000] rcu:     RCU event tracing is enabled.
[    0.000000] rcu:     RCU callback double-/use-after-free debug is enabled.
[    0.000000] rcu:     RCU debug extended QS entry/exit.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.

[    0.000000] NR_IRQS: 128              <- This is where the interrupt
                                            subsystem is initialized
[    0.000000] genirq: 64: 00215600      <- This is probably the timer interrupt

Looking at the backtrace the fail happens from within start_kernel(),
i.e. mm_core_init():

       slab_err+0x13c/0x190
       check_slab+0xf4/0x140
       alloc_debug_processing+0x58/0x248
       ___slab_alloc+0x22c/0xa38
       __slab_alloc.isra.0+0x60/0x88
       kmem_cache_alloc_node_noprof+0x148/0x3c0
       __kmem_cache_create+0x5d4/0x680
       create_boot_cache+0xc4/0x128
       new_kmalloc_cache+0x1ac/0x1d8
       create_kmalloc_caches+0xac/0x108
       kmem_cache_init+0x1cc/0x340
       mm_core_init+0x458/0x560
       start_kernel+0x9ac/0x11e0
       start_parisc+0x188/0x1b0

The interrupt subsystem is initialized way later and request_irq() only
works after that point.

I'm 100% sure by now that this commit has absolutely nothing to do with
the underlying problem. All it does is changing the code size and
placement of text, data and ....

So I finally managed to reproduce with gcc-13.3.0 from the k.org cross
tools (the debian 12.2.2 does not).

If I add:

--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1513,6 +1513,8 @@ static int
 	unsigned long flags, thread_mask = 0;
 	int ret, nested, shared = 0;
 
+	pr_info("%u: %08x\n", irq, new->flags);
+
 	if (!desc)
 		return -EINVAL;
 
it still reproduces. If I change that to:

--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1513,6 +1513,8 @@ static int
 	unsigned long flags, thread_mask = 0;
 	int ret, nested, shared = 0;
 
+	new->flags &= ~IRQF_COND_ONESHOT;
+
 	if (!desc)
 		return -EINVAL;
 
that does neither cure it (unsurprisingly).

Reverting the "offending" commit cures it. 

So I tried your

     +       pr_info_once("####### First timer interrupt\n");

which cures it too.

So now where is the difference between the printk() in __setup_irq(),
the new->flag mangling, the revert and the printk() in timer interrupt?

There is ZERO functional change. There is no race either because at that
point everything is single threaded and interrupts are disabled.

The only thing which changes is the code and data layout as I can
observe when looking at System.map of the builds. I stared at the diffs
for a bit, but nothing stood out.

Maybe the PARISC people can shed some light on it.

This reminds me of the x86 32-bit disaster we debugged a few days ago...

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-05 21:49           ` Thomas Gleixner
@ 2024-08-06  1:16             ` Guenter Roeck
  0 siblings, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2024-08-06  1:16 UTC (permalink / raw)
  To: Thomas Gleixner, Greg Kroah-Hartman, stable
  Cc: patches, linux-kernel, torvalds, akpm, shuah, patches,
	lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw,
	rwarsow, conor, allen.lkml, broonie, Rafael J. Wysocki,
	Helge Deller, Parisc List

On 8/5/24 14:49, Thomas Gleixner wrote:
> On Mon, Aug 05 2024 at 08:02, Guenter Roeck wrote:
>> On 8/5/24 05:51, Thomas Gleixner wrote:
>>> IRQF_COND_ONESHOT has only an effect when
>>>
>>>       1) Interrupt is shared
>>>       2) First interrupt request has IRQF_ONESHOT set
>>>
>>> Neither #1 nor #2 are true, but maybe your current config enables some moar
>>> devices than the one on your website.
>>>
>>
>> No, it is pretty much the same, except for a more recent C compiler, and it
>> requires qemu v9.0. See http://server.roeck-us.net/qemu/parisc64-6.10.3/.
>>
>> Debugging shows pretty much the same for me, and any log message added
>> to request_irq() makes the problem go away (or be different), and if the problem
>> is seen it doesn't even get to the third interrupt request. I copied a more complete
>> log to bad.log.gz in above page.
> 
> At the point where the problem happens is way before the first interrupt
> is requested, so that definitely excludes any side effect of
> COND_ONESHOT.
> 
> It happens right after
> 
> [    0.000000] Memory: 991812K/1048576K available (16752K kernel code, 5220K rwdata, 3044K rodata, 760K init, 1424K bss, 56764K reserved, 0K cma-reserved)
>                 SNIP
> [    0.000000] ** This system shows unhashed kernel memory addresses   **
>                 SNIP
> 
> I.e. the big fat warning about unhashed kernel addresses)
> 
> In the good case the first interrupt is requested here:
> 
> [    0.000000] Memory: 992804K/1048576K available (16532K kernel code, 5096K rwdata, 2984K rodata, 744K init, 1412K bss, 55772K reserved, 0K cma-reserved)
>                 SNIP
> [    0.000000] ** This system shows unhashed kernel memory addresses   **
>                 SNIP
> [    0.000000] SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> [    0.000000] ODEBUG: selftest passed
> [    0.000000] rcu: Hierarchical RCU implementation.
> [    0.000000] rcu:     RCU event tracing is enabled.
> [    0.000000] rcu:     RCU callback double-/use-after-free debug is enabled.
> [    0.000000] rcu:     RCU debug extended QS entry/exit.
> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
> 
> [    0.000000] NR_IRQS: 128              <- This is where the interrupt
>                                              subsystem is initialized
> [    0.000000] genirq: 64: 00215600      <- This is probably the timer interrupt
> 
> Looking at the backtrace the fail happens from within start_kernel(),
> i.e. mm_core_init():
> 
>         slab_err+0x13c/0x190
>         check_slab+0xf4/0x140
>         alloc_debug_processing+0x58/0x248
>         ___slab_alloc+0x22c/0xa38
>         __slab_alloc.isra.0+0x60/0x88
>         kmem_cache_alloc_node_noprof+0x148/0x3c0
>         __kmem_cache_create+0x5d4/0x680
>         create_boot_cache+0xc4/0x128
>         new_kmalloc_cache+0x1ac/0x1d8
>         create_kmalloc_caches+0xac/0x108
>         kmem_cache_init+0x1cc/0x340
>         mm_core_init+0x458/0x560
>         start_kernel+0x9ac/0x11e0
>         start_parisc+0x188/0x1b0
> 
> The interrupt subsystem is initialized way later and request_irq() only
> works after that point.
> 
> I'm 100% sure by now that this commit has absolutely nothing to do with
> the underlying problem. All it does is changing the code size and
> placement of text, data and ....
> 
> So I finally managed to reproduce with gcc-13.3.0 from the k.org cross
> tools (the debian 12.2.2 does not).
> 
> If I add:
> 
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1513,6 +1513,8 @@ static int
>   	unsigned long flags, thread_mask = 0;
>   	int ret, nested, shared = 0;
>   
> +	pr_info("%u: %08x\n", irq, new->flags);
> +
>   	if (!desc)
>   		return -EINVAL;
>   
> it still reproduces. If I change that to:
> 
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1513,6 +1513,8 @@ static int
>   	unsigned long flags, thread_mask = 0;
>   	int ret, nested, shared = 0;
>   
> +	new->flags &= ~IRQF_COND_ONESHOT;
> +
>   	if (!desc)
>   		return -EINVAL;
>   
> that does neither cure it (unsurprisingly).
> 
> Reverting the "offending" commit cures it.
> 
> So I tried your
> 
>       +       pr_info_once("####### First timer interrupt\n");
> 
> which cures it too.
> 
> So now where is the difference between the printk() in __setup_irq(),
> the new->flag mangling, the revert and the printk() in timer interrupt?
> 
> There is ZERO functional change. There is no race either because at that
> point everything is single threaded and interrupts are disabled.
> 
> The only thing which changes is the code and data layout as I can
> observe when looking at System.map of the builds. I stared at the diffs
> for a bit, but nothing stood out.
> 
> Maybe the PARISC people can shed some light on it.
> 
> This reminds me of the x86 32-bit disaster we debugged a few days ago...
> 

Looks like it :-(

Thanks for checking !

Guenter


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
       [not found]         ` <90e02d99-37a2-437e-ad42-44b80c4e94f6@suse.cz>
@ 2024-08-06 23:24           ` Thomas Gleixner
  2024-08-07  0:49             ` James Bottomley
  2024-08-08  1:07             ` Guenter Roeck
  0 siblings, 2 replies; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-06 23:24 UTC (permalink / raw)
  To: Vlastimil Babka, Linus Torvalds, Guenter Roeck
  Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc

Cc+: Helge, parisc ML

We're chasing a weird failure which has been tracked down to the
placement of the division library functions (I assume they are imported
from libgcc).

See the thread starting at:

  https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net

On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
> On 8/6/24 19:33, Thomas Gleixner wrote:
>> 
>> So this change adds 16 bytes to __softirq() which moves the division
>> functions up by 16 bytes. That's all it takes to make the stupid go
>> away....
>
> Heh I was actually wondering if the division is somhow messed up because
> maxobj = order_objects() and order_objects() does a division. Now I suspect
> it even more.

check_slab() calls into that muck, but I checked the disassembly of a
working and a broken kernel and the only difference there is the
displacement offset when the code calculates the call address, but
that's as expected a difference of 16 bytes.

Now it becomes interesting.

I added a unused function after __do_softirq() into the softirq text
section and filled it with ASM nonsense so that it occupies exactly one
page. That moves $$divoI, which is what check_slab() calls, exactly one
page forward:

    -0000000041218c70 T $$divoI
    +0000000041219c70 T $$divoI

Guess what happens? If falls on it's nose again.

Now with that ASM gunk I can steer the size conveniently. It works up
to:

    0000000041219c50 T $$divoI

and fails for

    0000000041219c60 T $$divoI
    0000000041219c70 T $$divoI

and works again at

    0000000041219c80 T $$divoI

So I added the following:

+extern void testme(void);
+extern unsigned int testsize;
+
+unsigned int testsize = 192;
+
+void __init testme(void)
+{
+	pr_info("TESTME: %lu\n", PAGE_SIZE / testsize);
+}

called that _before_ mm_core_init() from init/main.c and adjusted my ASM
hack to make $$divoI be at:

    0000000041219c70 T $$divoI

again and surprisingly the output is:

    [    0.000000] softirq: TESTME: 21

Now I went back to the hppa64 gcc version 12.2.0 again and did the same
ASM gunk adjustment so that $$divoI ends up at the offset 0xc70 in the
page and the same happens.

So it's not a compiler dependent problem.

But then I added a testme() call to the error path and get:

[    0.000000] softirq: TESTME: 21
[    0.000000] =============================================================================
[    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 size 192 sorder 0

Now what's wrong?

Adding more debug:

[    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16 size 192 sorder 0 21

where the last '21' is the output of the same call which made maxobj go
south:

 static int check_slab(struct kmem_cache *s, struct slab *slab)
 {
 	int maxobj;
@@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache

 	maxobj = order_objects(slab_order(slab), s->size);
 	if (slab->objects > maxobj) {
-		slab_err(s, slab, "objects %u > max %u",
-			slab->objects, maxobj);
+		testme();
+		slab_err(s, slab, "objects %u > max %u size %u sorder %u %u",
+			 slab->objects, maxobj, s->size, slab_order(slab),
+			 order_objects(slab_order(slab), s->size));
 		return 0;
 	}
 	if (slab->inuse > slab->objects) {

I don't know and I don't want to know TBH...

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-06 23:24           ` Thomas Gleixner
@ 2024-08-07  0:49             ` James Bottomley
  2024-08-07  1:38               ` Guenter Roeck
  2024-08-07 12:45               ` Thomas Gleixner
  2024-08-08  1:07             ` Guenter Roeck
  1 sibling, 2 replies; 33+ messages in thread
From: James Bottomley @ 2024-08-07  0:49 UTC (permalink / raw)
  To: Thomas Gleixner, Vlastimil Babka, Linus Torvalds, Guenter Roeck
  Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc

On Wed, 2024-08-07 at 01:24 +0200, Thomas Gleixner wrote:
> Cc+: Helge, parisc ML
> 
> We're chasing a weird failure which has been tracked down to the
> placement of the division library functions (I assume they are
> imported
> from libgcc).
> 
> See the thread starting at:
> 
>  
> https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net
> 
> On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
> > On 8/6/24 19:33, Thomas Gleixner wrote:
> > > 
> > > So this change adds 16 bytes to __softirq() which moves the
> > > division
> > > functions up by 16 bytes. That's all it takes to make the stupid
> > > go
> > > away....
> > 
> > Heh I was actually wondering if the division is somhow messed up
> > because
> > maxobj = order_objects() and order_objects() does a division. Now I
> > suspect
> > it even more.
> 
> check_slab() calls into that muck, but I checked the disassembly of a
> working and a broken kernel and the only difference there is the
> displacement offset when the code calculates the call address, but
> that's as expected a difference of 16 bytes.
> 
> Now it becomes interesting.
> 
> I added a unused function after __do_softirq() into the softirq text
> section and filled it with ASM nonsense so that it occupies exactly
> one
> page. That moves $$divoI, which is what check_slab() calls, exactly
> one
> page forward:
> 
>     -0000000041218c70 T $$divoI
>     +0000000041219c70 T $$divoI
> 
> Guess what happens? If falls on it's nose again.
> 
> Now with that ASM gunk I can steer the size conveniently. It works up
> to:
> 
>     0000000041219c50 T $$divoI
> 
> and fails for
> 
>     0000000041219c60 T $$divoI
>     0000000041219c70 T $$divoI
> 
> and works again at
> 
>     0000000041219c80 T $$divoI

So just on this, you seem to have proved that only exact multiples of
48 work.  In terms of how PA-RISC caching works that's completely nuts
... however, there may be something else at work, like stack frame
alignment.

> 
> So I added the following:
> 
> +extern void testme(void);
> +extern unsigned int testsize;
> +
> +unsigned int testsize = 192;
> +
> +void __init testme(void)
> +{
> +       pr_info("TESTME: %lu\n", PAGE_SIZE / testsize);
> +}
> 
> called that _before_ mm_core_init() from init/main.c and adjusted my
> ASM hack to make $$divoI be at:
> 
>     0000000041219c70 T $$divoI
> 
> again and surprisingly the output is:
> 
>     [    0.000000] softirq: TESTME: 21

OK, why is that surprising?  4096/192 is 21 due to integer rounding.

> Now I went back to the hppa64 gcc version 12.2.0 again and did the
> same ASM gunk adjustment so that $$divoI ends up at the offset 0xc70
> in the page and the same happens.
> 
> So it's not a compiler dependent problem.
> 
> But then I added a testme() call to the error path and get:
> 
> [    0.000000] softirq: TESTME: 21
> [    0.000000]
> =====================================================================
> ========
> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
> size 192 sorder 0
> 
> Now what's wrong?
> 
> Adding more debug:
> 
> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
> size 192 sorder 0 21
> 
> where the last '21' is the output of the same call which made maxobj
> go
> south:
> 
>  static int check_slab(struct kmem_cache *s, struct slab *slab)
>  {
>         int maxobj;
> @@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache
>  
>         maxobj = order_objects(slab_order(slab), s->size);
>         if (slab->objects > maxobj) {
> -               slab_err(s, slab, "objects %u > max %u",
> -                       slab->objects, maxobj);
> +               testme();
> +               slab_err(s, slab, "objects %u > max %u size %u sorder
> %u %u",
> +                        slab->objects, maxobj, s->size,
> slab_order(slab),
> +                        order_objects(slab_order(slab), s->size));
>                 return 0;
>         }
>         if (slab->inuse > slab->objects) {
> 
> I don't know and I don't want to know TBH...

OK, so you're telling us we have a problem with slab_order on parisc
... that's folio_order, so it smells like a parisc bug with
folio_test_large?  Unfortuntely I'm a bit pissed in an airport lounge
on my way to the UK, so I've lost access to my pa test rig and can't
test further for a while.

Regards,

James


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-07  0:49             ` James Bottomley
@ 2024-08-07  1:38               ` Guenter Roeck
  2024-08-07 12:45               ` Thomas Gleixner
  1 sibling, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2024-08-07  1:38 UTC (permalink / raw)
  To: James Bottomley, Thomas Gleixner, Vlastimil Babka, Linus Torvalds
  Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc

On 8/6/24 17:49, James Bottomley wrote:
> On Wed, 2024-08-07 at 01:24 +0200, Thomas Gleixner wrote:
>> Cc+: Helge, parisc ML
>>
>> We're chasing a weird failure which has been tracked down to the
>> placement of the division library functions (I assume they are
>> imported
>> from libgcc).
>>
>> See the thread starting at:
>>
>>   
>> https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net
>>
>> On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
>>> On 8/6/24 19:33, Thomas Gleixner wrote:
>>>>
>>>> So this change adds 16 bytes to __softirq() which moves the
>>>> division
>>>> functions up by 16 bytes. That's all it takes to make the stupid
>>>> go
>>>> away....
>>>
>>> Heh I was actually wondering if the division is somhow messed up
>>> because
>>> maxobj = order_objects() and order_objects() does a division. Now I
>>> suspect
>>> it even more.
>>
>> check_slab() calls into that muck, but I checked the disassembly of a
>> working and a broken kernel and the only difference there is the
>> displacement offset when the code calculates the call address, but
>> that's as expected a difference of 16 bytes.
>>
>> Now it becomes interesting.
>>
>> I added a unused function after __do_softirq() into the softirq text
>> section and filled it with ASM nonsense so that it occupies exactly
>> one
>> page. That moves $$divoI, which is what check_slab() calls, exactly
>> one
>> page forward:
>>
>>      -0000000041218c70 T $$divoI
>>      +0000000041219c70 T $$divoI
>>
>> Guess what happens? If falls on it's nose again.
>>
>> Now with that ASM gunk I can steer the size conveniently. It works up
>> to:
>>
>>      0000000041219c50 T $$divoI
>>
>> and fails for
>>
>>      0000000041219c60 T $$divoI
>>      0000000041219c70 T $$divoI
>>
>> and works again at
>>
>>      0000000041219c80 T $$divoI
> 
> So just on this, you seem to have proved that only exact multiples of
> 48 work.  In terms of how PA-RISC caching works that's completely nuts
> ... however, there may be something else at work, like stack frame
> alignment.
> 
>>
>> So I added the following:
>>
>> +extern void testme(void);
>> +extern unsigned int testsize;
>> +
>> +unsigned int testsize = 192;
>> +
>> +void __init testme(void)
>> +{
>> +       pr_info("TESTME: %lu\n", PAGE_SIZE / testsize);
>> +}
>>
>> called that _before_ mm_core_init() from init/main.c and adjusted my
>> ASM hack to make $$divoI be at:
>>
>>      0000000041219c70 T $$divoI
>>
>> again and surprisingly the output is:
>>
>>      [    0.000000] softirq: TESTME: 21
> 
> OK, why is that surprising?  4096/192 is 21 due to integer rounding.
> 

Problem is that it sometimes wrongly returns 16. But not always.
The surprise may be that it is not consistently wrong, even if
divU is at the same location.

>> Now I went back to the hppa64 gcc version 12.2.0 again and did the
>> same ASM gunk adjustment so that $$divoI ends up at the offset 0xc70
>> in the page and the same happens.
>>
>> So it's not a compiler dependent problem.
>>
>> But then I added a testme() call to the error path and get:
>>
>> [    0.000000] softirq: TESTME: 21
>> [    0.000000]
>> =====================================================================
>> ========
>> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>> size 192 sorder 0
>>
>> Now what's wrong?
>>
>> Adding more debug:
>>
>> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>> size 192 sorder 0 21
>>
>> where the last '21' is the output of the same call which made maxobj
>> go
>> south:
>>
>>   static int check_slab(struct kmem_cache *s, struct slab *slab)
>>   {
>>          int maxobj;
>> @@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache
>>   
>>          maxobj = order_objects(slab_order(slab), s->size);
>>          if (slab->objects > maxobj) {
>> -               slab_err(s, slab, "objects %u > max %u",
>> -                       slab->objects, maxobj);
>> +               testme();
>> +               slab_err(s, slab, "objects %u > max %u size %u sorder
>> %u %u",
>> +                        slab->objects, maxobj, s->size,
>> slab_order(slab),
>> +                        order_objects(slab_order(slab), s->size));
>>                  return 0;
>>          }
>>          if (slab->inuse > slab->objects) {
>>
>> I don't know and I don't want to know TBH...
> 
> OK, so you're telling us we have a problem with slab_order on parisc
> ... that's folio_order, so it smells like a parisc bug with
> folio_test_large?  Unfortuntely I'm a bit pissed in an airport lounge
> on my way to the UK, so I've lost access to my pa test rig and can't
> test further for a while.
> 

I think the problem is rather that divU, for some unknown reason, sometimes returns
bad results. The divU implementation in libgcc isn't exactly easy to understand.
There may be some context problem, such as the upper bits of some register not
being cleared under some circumstances. Look at my recent commits into
arch/parisc for examples how similar problems survived for a long time in the
Linux kernel.

I'd strongly suspect a qemu emulation problem if it wasn't for those other
problems. Still, I think it is very likely that the problem lies in qemu,
but we might need a confirmation one way or the other from a test on real
hardware.

Thanks,
Guenter


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-07  0:49             ` James Bottomley
  2024-08-07  1:38               ` Guenter Roeck
@ 2024-08-07 12:45               ` Thomas Gleixner
  1 sibling, 0 replies; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-07 12:45 UTC (permalink / raw)
  To: James Bottomley, Vlastimil Babka, Linus Torvalds, Guenter Roeck
  Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc

On Tue, Aug 06 2024 at 20:49, James Bottomley wrote:
> On Wed, 2024-08-07 at 01:24 +0200, Thomas Gleixner wrote:
>>  static int check_slab(struct kmem_cache *s, struct slab *slab)
>>  {
>>         int maxobj;
>> @@ -1386,8 +1388,10 @@ static int check_slab(struct kmem_cache
>>  
>>         maxobj = order_objects(slab_order(slab), s->size);
>>         if (slab->objects > maxobj) {
>> -               slab_err(s, slab, "objects %u > max %u",
>> -                       slab->objects, maxobj);
>> +               testme();
>> +               slab_err(s, slab, "objects %u > max %u size %u sorder
>> %u %u",
>> +                        slab->objects, maxobj, s->size,
>> slab_order(slab),
>> +                        order_objects(slab_order(slab), s->size));
>>                 return 0;
>>         }
>>         if (slab->inuse > slab->objects) {
>> 
>> I don't know and I don't want to know TBH...
>
> OK, so you're telling us we have a problem with slab_order on parisc
> ... that's folio_order, so it smells like a parisc bug with
> folio_test_large?  Unfortuntely I'm a bit pissed in an airport lounge
> on my way to the UK, so I've lost access to my pa test rig and can't
> test further for a while.

The point is that there are two invocations for order_objects(...) in
that code.

	maxobj = order_objects(slab_order(slab), s->size);

and the extra one in the slab_err() output:

        slab_err(s, slab, "objects %u > max %u size %u sorder %u %u",
		 slab->objects, maxobj, s->size,slab_order(slab),
                 order_objects(slab_order(slab), s->size));

>> [    0.000000] BUG kmem_cache_node (Not tainted): objects 21 > max 16
>> size 192 sorder 0 21

So maxobj = 16 and the second invocation correctly returns 21, if and
only if the $$divoI placement is in that weird range.

When I move it out of that range then both return 21 as expected.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-06 23:24           ` Thomas Gleixner
  2024-08-07  0:49             ` James Bottomley
@ 2024-08-08  1:07             ` Guenter Roeck
  2024-08-08  7:48               ` Vlastimil Babka
  2024-08-08  9:57               ` Thomas Gleixner
  1 sibling, 2 replies; 33+ messages in thread
From: Guenter Roeck @ 2024-08-08  1:07 UTC (permalink / raw)
  To: Thomas Gleixner, Vlastimil Babka, Linus Torvalds
  Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc

On 8/6/24 16:24, Thomas Gleixner wrote:
> Cc+: Helge, parisc ML
> 
> We're chasing a weird failure which has been tracked down to the
> placement of the division library functions (I assume they are imported
> from libgcc).
> 
> See the thread starting at:
> 
>    https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net
> 
> On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
>> On 8/6/24 19:33, Thomas Gleixner wrote:
>>>
>>> So this change adds 16 bytes to __softirq() which moves the division
>>> functions up by 16 bytes. That's all it takes to make the stupid go
>>> away....
>>
>> Heh I was actually wondering if the division is somhow messed up because
>> maxobj = order_objects() and order_objects() does a division. Now I suspect
>> it even more.
> 
> check_slab() calls into that muck, but I checked the disassembly of a
> working and a broken kernel and the only difference there is the
> displacement offset when the code calculates the call address, but
> that's as expected a difference of 16 bytes.
> 
> Now it becomes interesting.
> 
> I added a unused function after __do_softirq() into the softirq text
> section and filled it with ASM nonsense so that it occupies exactly one
> page. That moves $$divoI, which is what check_slab() calls, exactly one
> page forward:
> 

With the above added to my tree, I can also play around with the code.
Here is the next weird one:

diff --git a/mm/slub.c b/mm/slub.c
index 4927edec6a8c..b8a33966d858 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
         }

         maxobj = order_objects(slab_order(slab), s->size);
+
+       pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj);
+
         if (slab->objects > maxobj) {
                 slab_err(s, slab, "objects %u > max %u",
                         slab->objects, maxobj);

results in:

##### slab->objects=21 maxobj=21
=============================================================================
BUG kmem_cache_node (Not tainted): objects 21 > max 16

As Thomas noticed, this only happens if the divide assembler code is within a certain
address range.

Ok, now I am really lost.

Guenter


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08  1:07             ` Guenter Roeck
@ 2024-08-08  7:48               ` Vlastimil Babka
  2024-08-08 14:46                 ` Guenter Roeck
  2024-08-08  9:57               ` Thomas Gleixner
  1 sibling, 1 reply; 33+ messages in thread
From: Vlastimil Babka @ 2024-08-08  7:48 UTC (permalink / raw)
  To: Guenter Roeck, Thomas Gleixner, Linus Torvalds
  Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc

On 8/8/24 03:07, Guenter Roeck wrote:
> On 8/6/24 16:24, Thomas Gleixner wrote:
>> Cc+: Helge, parisc ML
>> 
>> We're chasing a weird failure which has been tracked down to the
>> placement of the division library functions (I assume they are imported
>> from libgcc).
>> 
>> See the thread starting at:
>> 
>>    https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net
>> 
>> On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
>>> On 8/6/24 19:33, Thomas Gleixner wrote:
>>>>
>>>> So this change adds 16 bytes to __softirq() which moves the division
>>>> functions up by 16 bytes. That's all it takes to make the stupid go
>>>> away....
>>>
>>> Heh I was actually wondering if the division is somhow messed up because
>>> maxobj = order_objects() and order_objects() does a division. Now I suspect
>>> it even more.
>> 
>> check_slab() calls into that muck, but I checked the disassembly of a
>> working and a broken kernel and the only difference there is the
>> displacement offset when the code calculates the call address, but
>> that's as expected a difference of 16 bytes.
>> 
>> Now it becomes interesting.
>> 
>> I added a unused function after __do_softirq() into the softirq text
>> section and filled it with ASM nonsense so that it occupies exactly one
>> page. That moves $$divoI, which is what check_slab() calls, exactly one
>> page forward:
>> 
> 
> With the above added to my tree, I can also play around with the code.
> Here is the next weird one:
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 4927edec6a8c..b8a33966d858 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
>          }
> 
>          maxobj = order_objects(slab_order(slab), s->size);
> +
> +       pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj);
> +
>          if (slab->objects > maxobj) {
>                  slab_err(s, slab, "objects %u > max %u",
>                          slab->objects, maxobj);
> 
> results in:
> 
> ##### slab->objects=21 maxobj=21
> =============================================================================
> BUG kmem_cache_node (Not tainted): objects 21 > max 16

But is this printed from the same attempt? The pr_info_once() might have
printed earlier and then stopped (as it's _once) and the error case might
have happened only later, and there was nothing printed in between as the
kmalloc caches are created in a loop.

> As Thomas noticed, this only happens if the divide assembler code is within a certain
> address range.
> 
> Ok, now I am really lost.
> 
> Guenter
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08  1:07             ` Guenter Roeck
  2024-08-08  7:48               ` Vlastimil Babka
@ 2024-08-08  9:57               ` Thomas Gleixner
  2024-08-08 14:59                 ` Guenter Roeck
  2024-08-08 15:53                 ` Linus Torvalds
  1 sibling, 2 replies; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-08  9:57 UTC (permalink / raw)
  To: Guenter Roeck, Vlastimil Babka, Linus Torvalds
  Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc

On Wed, Aug 07 2024 at 18:07, Guenter Roeck wrote:
> On 8/6/24 16:24, Thomas Gleixner wrote:
> diff --git a/mm/slub.c b/mm/slub.c
> index 4927edec6a8c..b8a33966d858 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
>          }
>
>          maxobj = order_objects(slab_order(slab), s->size);
> +
> +       pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj);
> +
>          if (slab->objects > maxobj) {
>                  slab_err(s, slab, "objects %u > max %u",
>                          slab->objects, maxobj);
>
> results in:
>
> ##### slab->objects=21 maxobj=21
> =============================================================================
> BUG kmem_cache_node (Not tainted): objects 21 > max 16

Careful vs. the pr_once(). It's not necessarily the first allocation
which trips up. I removed slab_err() in that condition and just printed
the data:

[    0.000000] Order: 1 Size:  384 Nobj: 21 Maxobj: 16 21 Inuse: 14
[    0.000000] Order: 0 Size:  168 Nobj: 24 Maxobj: 16 24 Inuse:  1
[    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 18
[    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 19
[    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 20
[    0.000000] Order: 0 Size:  160 Nobj: 25 Maxobj: 16 25 Inuse:  5
[    0.000000] Order: 2 Size:  672 Nobj: 24 Maxobj: 16 24 Inuse:  1
[    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  1
[    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  2
[    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 10

The maxobj column shows the failed result and the result from the second
invocation inside of the printk().

Let's wait for PARISC people to run it on actual hardware.

Thanks,

        tglx






^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08  7:48               ` Vlastimil Babka
@ 2024-08-08 14:46                 ` Guenter Roeck
  0 siblings, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2024-08-08 14:46 UTC (permalink / raw)
  To: Vlastimil Babka, Thomas Gleixner, Linus Torvalds
  Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc

On 8/8/24 00:48, Vlastimil Babka wrote:
> On 8/8/24 03:07, Guenter Roeck wrote:
>> On 8/6/24 16:24, Thomas Gleixner wrote:
>>> Cc+: Helge, parisc ML
>>>
>>> We're chasing a weird failure which has been tracked down to the
>>> placement of the division library functions (I assume they are imported
>>> from libgcc).
>>>
>>> See the thread starting at:
>>>
>>>     https://lore.kernel.org/all/718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net
>>>
>>> On Tue, Aug 06 2024 at 21:25, Vlastimil Babka wrote:
>>>> On 8/6/24 19:33, Thomas Gleixner wrote:
>>>>>
>>>>> So this change adds 16 bytes to __softirq() which moves the division
>>>>> functions up by 16 bytes. That's all it takes to make the stupid go
>>>>> away....
>>>>
>>>> Heh I was actually wondering if the division is somhow messed up because
>>>> maxobj = order_objects() and order_objects() does a division. Now I suspect
>>>> it even more.
>>>
>>> check_slab() calls into that muck, but I checked the disassembly of a
>>> working and a broken kernel and the only difference there is the
>>> displacement offset when the code calculates the call address, but
>>> that's as expected a difference of 16 bytes.
>>>
>>> Now it becomes interesting.
>>>
>>> I added a unused function after __do_softirq() into the softirq text
>>> section and filled it with ASM nonsense so that it occupies exactly one
>>> page. That moves $$divoI, which is what check_slab() calls, exactly one
>>> page forward:
>>>
>>
>> With the above added to my tree, I can also play around with the code.
>> Here is the next weird one:
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 4927edec6a8c..b8a33966d858 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
>>           }
>>
>>           maxobj = order_objects(slab_order(slab), s->size);
>> +
>> +       pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj);
>> +
>>           if (slab->objects > maxobj) {
>>                   slab_err(s, slab, "objects %u > max %u",
>>                           slab->objects, maxobj);
>>
>> results in:
>>
>> ##### slab->objects=21 maxobj=21
>> =============================================================================
>> BUG kmem_cache_node (Not tainted): objects 21 > max 16
> 
> But is this printed from the same attempt? The pr_info_once() might have
> printed earlier and then stopped (as it's _once) and the error case might
> have happened only later, and there was nothing printed in between as the
> kmalloc caches are created in a loop.
> 

No, of course it isn't. Guess it was too late. Sorry for the noise.
Here is the updated log, after dropping the _once:

...
[    0.000000] ##### slab->objects=21 maxobj=21
[    0.000000] ##### slab->objects=25 maxobj=25
[    0.000000] ##### slab->objects=21 maxobj=16
[    0.000000] =============================================================================
[    0.000000] BUG kmalloc-256 (Not tainted): objects 21 > max 16

So this works many times and then suddenly fails. I thought it was
the other way, that it failed the very first time. Ok, back to debugging.

Thanks!
Guenter


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08  9:57               ` Thomas Gleixner
@ 2024-08-08 14:59                 ` Guenter Roeck
  2024-08-08 15:58                   ` John David Anglin
  2024-08-08 15:53                 ` Linus Torvalds
  1 sibling, 1 reply; 33+ messages in thread
From: Guenter Roeck @ 2024-08-08 14:59 UTC (permalink / raw)
  To: Thomas Gleixner, Vlastimil Babka, Linus Torvalds
  Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc

On 8/8/24 02:57, Thomas Gleixner wrote:
> On Wed, Aug 07 2024 at 18:07, Guenter Roeck wrote:
>> On 8/6/24 16:24, Thomas Gleixner wrote:
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 4927edec6a8c..b8a33966d858 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1385,6 +1385,9 @@ static int check_slab(struct kmem_cache *s, struct slab *slab)
>>           }
>>
>>           maxobj = order_objects(slab_order(slab), s->size);
>> +
>> +       pr_info_once("##### slab->objects=%u maxobj=%u\n", slab->objects, maxobj);
>> +
>>           if (slab->objects > maxobj) {
>>                   slab_err(s, slab, "objects %u > max %u",
>>                           slab->objects, maxobj);
>>
>> results in:
>>
>> ##### slab->objects=21 maxobj=21
>> =============================================================================
>> BUG kmem_cache_node (Not tainted): objects 21 > max 16
> 
> Careful vs. the pr_once(). It's not necessarily the first allocation

Yes. Thanks for pointing that out. Rookie mistake :-(.

> which trips up. I removed slab_err() in that condition and just printed
> the data:
> 
> [    0.000000] Order: 1 Size:  384 Nobj: 21 Maxobj: 16 21 Inuse: 14
> [    0.000000] Order: 0 Size:  168 Nobj: 24 Maxobj: 16 24 Inuse:  1
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 18
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 19
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 20
> [    0.000000] Order: 0 Size:  160 Nobj: 25 Maxobj: 16 25 Inuse:  5
> [    0.000000] Order: 2 Size:  672 Nobj: 24 Maxobj: 16 24 Inuse:  1
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  1
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  2
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 10
> 
> The maxobj column shows the failed result and the result from the second
> invocation inside of the printk().
> 
> Let's wait for PARISC people to run it on actual hardware.
> 

Agreed. I suspect that there is a carry or upper 32 bit of a register not
handled properly, but I have no idea where that might be or why that would
only be seen if the div functions are located in a certain address range.

Thanks,
Guenter


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08  9:57               ` Thomas Gleixner
  2024-08-08 14:59                 ` Guenter Roeck
@ 2024-08-08 15:53                 ` Linus Torvalds
  2024-08-08 16:12                   ` Thomas Gleixner
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2024-08-08 15:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM,
	Helge Deller, linux-parisc

On Thu, 8 Aug 2024 at 02:57, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Careful vs. the pr_once(). It's not necessarily the first allocation
> which trips up. I removed slab_err() in that condition and just printed
> the data:
>
> [    0.000000] Order: 1 Size:  384 Nobj: 21 Maxobj: 16 21 Inuse: 14
> [    0.000000] Order: 0 Size:  168 Nobj: 24 Maxobj: 16 24 Inuse:  1
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 18
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 19
> [    0.000000] Order: 1 Size:  320 Nobj: 25 Maxobj: 16 25 Inuse: 20
> [    0.000000] Order: 0 Size:  160 Nobj: 25 Maxobj: 16 25 Inuse:  5
> [    0.000000] Order: 2 Size:  672 Nobj: 24 Maxobj: 16 24 Inuse:  1
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  1
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  2
> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse: 10
>
> The maxobj column shows the failed result and the result from the second
> invocation inside of the printk().

Hmm. There's a few patterns there:

 - the incorrect Maxobj is always 16, with wildly different sizes.

 - the correct value is always in that 21-25 range

and neither of these are particularly common cases for slab objects
(well, at least on x86-64).

I actually went into the gcc sources to look at the libgcc routines
for the hppa $$divU routine, but apart from checking for trivial
powers-of-two and for divisions with small divisor values (<=17), all
it is ends up being a series of "ds" (divide step) and "addc"
instructions. I don't see how that could possibly mess up. It does end
up with the final addc in the delay slot of the return, but that's
normal parisc behavior (and here by "normal" I mean "it's a really
messed up instruction set that did everything wrong, including branch
delay slots")

I do note that the $$divU function (which is what this all should use)
oddly doesn't show up as defined in 'nm' for me when I look at
Guenter's vmlinux file. So there's some odd linker thing going on, and
it *only* affects the $$div* functions.

Thomas' System.map shows some of the same effects, ie it shows $$divoI
(signed integer divide with overflow checking), but doesn't show
$$divU that is right after it. The reason I was looking was exactly
because this should be using $$divU, and clearly code alignment is
implicated somehow, but the exact alignment of $$divU wasn't obvious.

But it looks like "$$divU" should be somewhere between $$divoI and
$$divl_2, and in Guenter's bad case that's

  0000000041218c70 T $$divoI
  00000000412190d0 T $$divI_2

so *maybe* $$divU is around a page boundary? 0000000041218xxx turning
into 0000000041219000?

Some ITLB fill issue together with that delayed branch and a qemu bug?

                Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 14:59                 ` Guenter Roeck
@ 2024-08-08 15:58                   ` John David Anglin
  0 siblings, 0 replies; 33+ messages in thread
From: John David Anglin @ 2024-08-08 15:58 UTC (permalink / raw)
  To: Guenter Roeck, Thomas Gleixner, Vlastimil Babka, Linus Torvalds
  Cc: linux-kernel, Linux-MM, Helge Deller, linux-parisc

On 2024-08-08 10:59 a.m., Guenter Roeck wrote:
>> Let's wait for PARISC people to run it on actual hardware.
>>
>
> Agreed. I suspect that there is a carry or upper 32 bit of a register not
> handled properly, but I have no idea where that might be or why that would
> only be seen if the div functions are located in a certain address range.
I'm doubtful there's a coding issue in $$divoI.  The routine was written by HP and
it's used on both HP-UX and Linux.  The routine can be found in libgcc/config/pa/milli64.S

The routine can trap:
    Divide by zero is trapped.
    Divide of -2**31 by -1 is trapped for $$divoI but not for $$divI.

$$divoI is a millicode routine.  Not sure what calls it.  gcc doesn't call it.  gcc uses $$divI.

It appears you are testing a 64-bit kernel.  There might be issues calling millicode routines
when branch distance exceeds approximately 4 MB.  Millicode routines have a special
calling sequence.  You could try building kernel with -mlong-calls. Kernel will get bugger
and slower with this option.

Dave

-- 
John David Anglin  dave.anglin@bell.net

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 15:53                 ` Linus Torvalds
@ 2024-08-08 16:12                   ` Thomas Gleixner
  2024-08-08 16:33                     ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-08 16:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM,
	Helge Deller, linux-parisc

On Thu, Aug 08 2024 at 08:53, Linus Torvalds wrote:
> On Thu, 8 Aug 2024 at 02:57, Thomas Gleixner <tglx@linutronix.de> wrote:
> Hmm. There's a few patterns there:
>
>  - the incorrect Maxobj is always 16, with wildly different sizes.

Which means that the size value is rounded up to the next power of 2

>> [    0.000000] Order: 1 Size:  384 Nobj: 21 Maxobj: 16 21 Inuse: 14

   8192/16 = 512

>> [    0.000000] Order: 0 Size:  168 Nobj: 24 Maxobj: 16 24 Inuse:  1

   4096/16 = 256

>> [    0.000000] Order: 3 Size: 1536 Nobj: 21 Maxobj: 16 21 Inuse:  1

  32768/16 = 2048

>> The maxobj column shows the failed result and the result from the second
>> invocation inside of the printk().

> I actually went into the gcc sources to look at the libgcc routines
> for the hppa $$divU routine, but apart from checking for trivial
> powers-of-two and for divisions with small divisor values (<=17), all
> it is ends up being a series of "ds" (divide step) and "addc"
> instructions. I don't see how that could possibly mess up. It does end
> up with the final addc in the delay slot of the return, but that's
> normal parisc behavior (and here by "normal" I mean "it's a really
> messed up instruction set that did everything wrong, including branch
> delay slots")
>
> I do note that the $$divU function (which is what this all should use)
> oddly doesn't show up as defined in 'nm' for me when I look at
> Guenter's vmlinux file. So there's some odd linker thing going on, and
> it *only* affects the $$div* functions.
>
> Thomas' System.map shows some of the same effects, ie it shows $$divoI
> (signed integer divide with overflow checking), but doesn't show
> $$divU that is right after it. The reason I was looking was exactly
> because this should be using $$divU, and clearly code alignment is
> implicated somehow, but the exact alignment of $$divU wasn't obvious.
>
> But it looks like "$$divU" should be somewhere between $$divoI and
> $$divl_2, and in Guenter's bad case that's
>
>   0000000041218c70 T $$divoI
>   00000000412190d0 T $$divI_2
>
> so *maybe* $$divU is around a page boundary? 0000000041218xxx turning
> into 0000000041219000?

It uses $$divU which is at $$divoI + 0x250. I validated that in the
disassembly.

Thanks,

        tglx


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 16:12                   ` Thomas Gleixner
@ 2024-08-08 16:33                     ` Linus Torvalds
  2024-08-08 17:48                       ` Thomas Gleixner
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2024-08-08 16:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM,
	Helge Deller, linux-parisc

On Thu, 8 Aug 2024 at 09:12, Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > But it looks like "$$divU" should be somewhere between $$divoI and
> > $$divl_2, and in Guenter's bad case that's
> >
> >   0000000041218c70 T $$divoI
> >   00000000412190d0 T $$divI_2
> >
> > so *maybe* $$divU is around a page boundary? 0000000041218xxx turning
> > into 0000000041219000?
>
> It uses $$divU which is at $$divoI + 0x250. I validated that in the
> disassembly.

Well, that does support "maybe we have a page crosser issue", but it's
not quite at the delayed branch.

Because that would mean that $$divU starts at 0x41218ec0, and that
means that there are 80 instructions from the start of $$divU to the
end of that 0x41218xxx page.

And if I counted instructions right (I don't have a disassembler, so
I'm just looking at the libgcc sources), that puts the page crosser
not quite at the delayed branch slot, but it does put it somewhere
roughly at or around

        ds      temp,arg1,temp          /* 29th divide step */
        addc    retreg,retreg,retreg    /* shift retreg with/into carry */

so it's around the last few bits of the result. The ones we get wrong.

Which is intriguing, but honestly, I don't see how we could get itlb
misses horribly wrong and not crash left and right.

The $$divU routine is unusual in that it uses that millicode calling
convention, but I think that's just a different register for the
return address.

And it obviously depends on the carry flag, which is pretty unusual.
Maybe if the ITLB fill messes up C, it wouldn't show up in other
areas? But the $$divU result error is more than one single bit getting
cleared.

              Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 16:33                     ` Linus Torvalds
@ 2024-08-08 17:48                       ` Thomas Gleixner
  2024-08-08 18:19                         ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Gleixner @ 2024-08-08 17:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM,
	Helge Deller, linux-parisc

On Thu, Aug 08 2024 at 09:33, Linus Torvalds wrote:
> On Thu, 8 Aug 2024 at 09:12, Thomas Gleixner <tglx@linutronix.de> wrote:
>> It uses $$divU which is at $$divoI + 0x250. I validated that in the
>> disassembly.
>
> Well, that does support "maybe we have a page crosser issue", but it's
> not quite at the delayed branch.
>
> Because that would mean that $$divU starts at 0x41218ec0, and that
> means that there are 80 instructions from the start of $$divU to the
> end of that 0x41218xxx page.
>
> And if I counted instructions right (I don't have a disassembler, so
> I'm just looking at the libgcc sources), that puts the page crosser
> not quite at the delayed branch slot, but it does put it somewhere
> roughly at or around
>
>         ds      temp,arg1,temp          /* 29th divide step */
>         addc    retreg,retreg,retreg    /* shift retreg with/into carry */
>
> so it's around the last few bits of the result. The ones we get wrong.
>
> Which is intriguing, but honestly, I don't see how we could get itlb
> misses horribly wrong and not crash left and right.

Here is the disassembly from my latest crashing debug kernel which
shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work. 

    4121dec0:	37 21 3f ff 	ldo -1(r25),r1
    4121dec4:	08 39 22 00 	and,= r25,r1,r0
    4121dec8:	e8 00 00 88 	b,l 4121df14 <$$divoI+0x2a4>,r0
    4121decc:	b3 20 20 00 	addi,tc,= 0,r25,r0
    4121ded0:	08 1a 02 5d 	copy r26,ret1
    4121ded4:	d3 21 39 f0 	extrw,u,= r25,15,16,r1
    4121ded8:	d3 bd 19 f0 	extrw,u ret1,15,16,ret1
    4121dedc:	08 39 02 59 	or r25,r1,r25
    4121dee0:	34 1a 01 98 	ldi cc,r26
    4121dee4:	d3 21 3a f8 	extrw,u,= r25,23,8,r1
    4121dee8:	d3 bd 1a e8 	extrw,u ret1,23,24,ret1
    4121deec:	08 39 02 59 	or r25,r1,r25
    4121def0:	34 01 01 54 	ldi aa,r1
    4121def4:	d3 20 3b 7c 	extrw,u,= r25,27,4,r0
    4121def8:	d3 bd 1b 64 	extrw,u ret1,27,28,ret1
    4121defc:	0b 59 22 00 	and,= r25,r26,r0
    4121df00:	d3 bd 1b a2 	extrw,u ret1,29,30,ret1
    4121df04:	08 39 22 00 	and,= r25,r1,r0
    4121df08:	d3 bd 1b c1 	extrw,u ret1,30,31,ret1
    4121df0c:	e8 40 c0 02 	bv,n r0(rp)
    4121df10:	08 00 02 40 	nop
    4121df18:	97 21 00 00 	subi 0,r25,r1
    4121df1c:	08 20 04 40 	ds r0,r1,r0
    4121df20:	0b 5a 06 1d 	add r26,r26,ret1
    4121df24:	0b 20 04 41 	ds r0,r25,r1
    4121df28:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df2c:	0b 21 04 41 	ds r1,r25,r1
    4121df30:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df34:	0b 21 04 41 	ds r1,r25,r1
    4121df38:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df3c:	0b 21 04 41 	ds r1,r25,r1
    4121df40:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df44:	0b 21 04 41 	ds r1,r25,r1
    4121df48:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df4c:	0b 21 04 41 	ds r1,r25,r1
    4121df50:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df54:	0b 21 04 41 	ds r1,r25,r1
    4121df58:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df5c:	0b 21 04 41 	ds r1,r25,r1
    4121df60:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df64:	0b 21 04 41 	ds r1,r25,r1
    4121df68:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df6c:	0b 21 04 41 	ds r1,r25,r1
    4121df70:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df74:	0b 21 04 41 	ds r1,r25,r1
    4121df78:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df7c:	0b 21 04 41 	ds r1,r25,r1
    4121df80:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df84:	0b 21 04 41 	ds r1,r25,r1
    4121df88:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df8c:	0b 21 04 41 	ds r1,r25,r1
    4121df90:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df94:	0b 21 04 41 	ds r1,r25,r1
    4121df98:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121df9c:	0b 21 04 41 	ds r1,r25,r1
    4121dfa0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfa4:	0b 21 04 41 	ds r1,r25,r1
    4121dfa8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfac:	0b 21 04 41 	ds r1,r25,r1
    4121dfb0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfb4:	0b 21 04 41 	ds r1,r25,r1
    4121dfb8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfbc:	0b 21 04 41 	ds r1,r25,r1
    4121dfc0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfc4:	0b 21 04 41 	ds r1,r25,r1
    4121dfc8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfcc:	0b 21 04 41 	ds r1,r25,r1
    4121dfd0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfd4:	0b 21 04 41 	ds r1,r25,r1
    4121dfd8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfdc:	0b 21 04 41 	ds r1,r25,r1
    4121dfe0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfe4:	0b 21 04 41 	ds r1,r25,r1
    4121dfe8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dfec:	0b 21 04 41 	ds r1,r25,r1
    4121dff0:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dff4:	0b 21 04 41 	ds r1,r25,r1
    4121dff8:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121dffc:	0b 21 04 41 	ds r1,r25,r1
    4121e000:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121e004:	0b 21 04 41 	ds r1,r25,r1
    4121e008:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121e00c:	0b 21 04 41 	ds r1,r25,r1
    4121e010:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121e014:	0b 21 04 41 	ds r1,r25,r1
    4121e018:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121e01c:	0b 21 04 41 	ds r1,r25,r1
    4121e020:	e8 40 c0 00 	bv r0(rp)
    4121e024:	0b bd 07 1d 	add,c ret1,ret1,ret1
    4121e028:	f3 20 0c 00 	depd,* r0,31,32,r25
    4121e02c:	8f 20 61 10 	cmpib,> 0,r25,4121e0bc <$$divoI+0x44c>
    4121e030:	08 00 02 40 	nop
    4121e034:	e8 19 40 00 	blr r25,r0
    4121e038:	08 00 02 40 	nop
    4121e03c:	b3 20 20 00 	addi,tc,= 0,r25,r0
    4121e040:	08 00 02 40 	nop
    4121e044:	e8 40 c0 00 	bv r0(rp)
    4121e048:	08 1a 02 5d 	copy r26,ret1
    4121e04c:	e8 40 c0 00 	bv r0(rp)
    4121e050:	d3 5d 1b c1 	extrw,u r26,30,31,ret1
    4121e054:	e8 00 01 c2 	b,l,n 4121e13c <$$divI_16+0x3c>,r0
    4121e058:	08 00 02 40 	nop
    4121e05c:	e8 40 c0 00 	bv r0(rp)
    4121e060:	d3 5d 1b a2 	extrw,u r26,29,30,ret1
    4121e064:	e8 00 02 2a 	b,l,n 4121e180 <$$divI_16+0x80>,r0
    4121e068:	08 00 02 40 	nop
    4121e06c:	e8 00 02 aa 	b,l,n 4121e1c8 <$$divI_16+0xc8>,r0
    4121e070:	08 00 02 40 	nop
    4121e074:	e8 00 06 9a 	b,l,n 4121e3c8 <$$divU_17+0xbc>,r0
    4121e078:	08 00 02 40 	nop
    4121e07c:	e8 40 c0 00 	bv r0(rp)
    4121e080:	d3 5d 1b 83 	extrw,u r26,28,29,ret1
    4121e084:	e8 00 07 12 	b,l,n 4121e414 <$$divU_17+0x108>,r0
    4121e088:	08 00 02 40 	nop
    4121e08c:	e8 00 02 9a 	b,l,n 4121e1e0 <$$divI_16+0xe0>,r0
    4121e090:	08 00 02 40 	nop
    4121e094:	e8 1f 1d 0d 	b,l 4121df20 <$$divoI+0x2b0>,r0
    4121e098:	08 20 04 40 	ds r0,r1,r0
    4121e09c:	e8 00 03 fa 	b,l,n 4121e2a0 <$$divI_16+0x1a0>,r0
    4121e0a0:	08 00 02 40 	nop
    4121e0a4:	e8 1f 1c ed 	b,l 4121df20 <$$divoI+0x2b0>,r0
    4121e0a8:	08 20 04 40 	ds r0,r1,r0
    4121e0ac:	e8 00 07 02 	b,l,n 4121e434 <$$divU_17+0x128>,r0
    4121e0b0:	08 00 02 40 	nop
    4121e0b4:	e8 00 04 22 	b,l,n 4121e2cc <$$divI_16+0x1cc>,r0
    4121e0b8:	08 00 02 40 	nop
    4121e0bc:	0b 3a 04 00 	sub r26,r25,r0
    4121e0c0:	e8 40 c0 00 	bv r0(rp)
    4121e0c4:	08 00 07 1d 	add,c r0,r0,ret1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 17:48                       ` Thomas Gleixner
@ 2024-08-08 18:19                         ` Linus Torvalds
  2024-08-08 20:52                           ` Guenter Roeck
  2024-09-03  7:54                           ` Helge Deller
  0 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2024-08-08 18:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, Vlastimil Babka, linux-kernel, Linux-MM,
	Helge Deller, linux-parisc

On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Here is the disassembly from my latest crashing debug kernel which
> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.

Looks like I was off by an instruction, it's the 28th divide-step (not
29) that does the page crosser:

>     4121dffc:   0b 21 04 41     ds r1,r25,r1
>     4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1

but my parisc knowledge is not good enough to even guess at what could go wrong.

And I have no actual reason to believe this has *anything* to do with
an itlb miss, except for that whole "exact placement seems to matter,
and it crosses a page boundary" detail.

None of this makes sense. I think we'll have to wait for Helge. It's
not like parisc is a huge concern, and for all we know this is all a
qemu bug to begin with.

          Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 18:19                         ` Linus Torvalds
@ 2024-08-08 20:52                           ` Guenter Roeck
  2024-08-08 21:50                             ` John David Anglin
  2024-08-08 22:15                             ` Richard Henderson
  2024-09-03  7:54                           ` Helge Deller
  1 sibling, 2 replies; 33+ messages in thread
From: Guenter Roeck @ 2024-08-08 20:52 UTC (permalink / raw)
  To: Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller,
	linux-parisc, Richard Henderson

On 8/8/24 11:19, Linus Torvalds wrote:
> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Here is the disassembly from my latest crashing debug kernel which
>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
> 
> Looks like I was off by an instruction, it's the 28th divide-step (not
> 29) that does the page crosser:
> 
>>      4121dffc:   0b 21 04 41     ds r1,r25,r1
>>      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
> 
> but my parisc knowledge is not good enough to even guess at what could go wrong.
> 
> And I have no actual reason to believe this has *anything* to do with
> an itlb miss, except for that whole "exact placement seems to matter,
> and it crosses a page boundary" detail.
> 
> None of this makes sense. I think we'll have to wait for Helge. It's
> not like parisc is a huge concern, and for all we know this is all a
> qemu bug to begin with.
> 

Copying Richard Henderson who recently made a number of changes to the
parisc/hppa qemu implementation (which unfortunately didn't fix the problem).

Guenter


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 20:52                           ` Guenter Roeck
@ 2024-08-08 21:50                             ` John David Anglin
  2024-08-08 22:29                               ` John David Anglin
  2024-08-09  0:50                               ` Guenter Roeck
  2024-08-08 22:15                             ` Richard Henderson
  1 sibling, 2 replies; 33+ messages in thread
From: John David Anglin @ 2024-08-08 21:50 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller,
	linux-parisc, Richard Henderson

On 2024-08-08 4:52 p.m., Guenter Roeck wrote:
> On 8/8/24 11:19, Linus Torvalds wrote:
>> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>
>>> Here is the disassembly from my latest crashing debug kernel which
>>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
>>
>> Looks like I was off by an instruction, it's the 28th divide-step (not
>> 29) that does the page crosser:
>>
>>>      4121dffc:   0b 21 04 41     ds r1,r25,r1
>>>      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
I think this macro might clobber the C/B bits on a ITLB missing:

         /* This is for ILP32 PA2.0 only.  The TLB insertion needs
          * to extend into I/O space if the address is 0xfXXXXXXX
          * so we extend the f's into the top word of the pte in
          * this case */
         .macro          f_extend        pte,tmp
         extrd,s         \pte,42,4,\tmp
         addi,<>         1,\tmp,%r0
         extrd,s         \pte,63,25,\pte
         .endm

The addi instruction affects the C/B bits.  However, it is only used for 32-bit PA 2.0 kernels.
A second tmp register would be needed to change the addi to an add logical.

The mode likely problem is the shladd instruction in the following macro in entry.S:

         .macro          L2_ptep pmd,pte,index,va,fault
#if CONFIG_PGTABLE_LEVELS == 3
         extru_safe      \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
#else
         extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
#endif
         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
#if CONFIG_PGTABLE_LEVELS < 3
         copy            %r0,\pte
#endif
         ldw,s           \index(\pmd),\pmd
         bb,>=,n         \pmd,_PxD_PRESENT_BIT,\fault
         dep             %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
         SHLREG          \pmd,PxD_VALUE_SHIFT,\pmd
         extru_safe      \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
         shladd          \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
         .endm

I believe the shladd instruction should be changed to shladd,l (shift left and add logical).

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 20:52                           ` Guenter Roeck
  2024-08-08 21:50                             ` John David Anglin
@ 2024-08-08 22:15                             ` Richard Henderson
  1 sibling, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-08-08 22:15 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller,
	linux-parisc

On 8/9/24 06:52, Guenter Roeck wrote:
> On 8/8/24 11:19, Linus Torvalds wrote:
>> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>
>>> Here is the disassembly from my latest crashing debug kernel which
>>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
>>
>> Looks like I was off by an instruction, it's the 28th divide-step (not
>> 29) that does the page crosser:
>>
>>>      4121dffc:   0b 21 04 41     ds r1,r25,r1
>>>      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
>>
>> but my parisc knowledge is not good enough to even guess at what could go wrong.
>>
>> And I have no actual reason to believe this has *anything* to do with
>> an itlb miss, except for that whole "exact placement seems to matter,
>> and it crosses a page boundary" detail.
>>
>> None of this makes sense. I think we'll have to wait for Helge. It's
>> not like parisc is a huge concern, and for all we know this is all a
>> qemu bug to begin with.
>>
> 
> Copying Richard Henderson who recently made a number of changes to the
> parisc/hppa qemu implementation (which unfortunately didn't fix the problem).

Wow, that's quite the agile bug you've got there.

You can eliminate one class of qemu bug by attempting to reproduce in qemu-linux-user: 
arrange for the page crossing at the appropriate spot and see if the split between two 
translation blocks causes carry flag weirdness.

If that doesn't reproduce, then I'd be likely to blame something in the exception delivery 
or return process.  Still could be a qemu problem, but it would be something in the system 
emulation of the exception path.

It should be possible to write a small system mode test case for this hypothesis.  Ideally 
the itlb miss handler would be as simple a possible, e.g. computing an identity mapping 
rather than using real page tables.

Only after that would I start digging into the linux kernel's exception paths.

r~

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 21:50                             ` John David Anglin
@ 2024-08-08 22:29                               ` John David Anglin
  2024-08-08 23:33                                 ` Linus Torvalds
  2024-08-09  0:56                                 ` Guenter Roeck
  2024-08-09  0:50                               ` Guenter Roeck
  1 sibling, 2 replies; 33+ messages in thread
From: John David Anglin @ 2024-08-08 22:29 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller,
	linux-parisc, Richard Henderson

On 2024-08-08 5:50 p.m., John David Anglin wrote:
> The mode likely problem is the shladd instruction in the following macro in entry.S:
>
>         .macro          L2_ptep pmd,pte,index,va,fault
> #if CONFIG_PGTABLE_LEVELS == 3
>         extru_safe      \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
> #else
>         extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
> #endif
>         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
> #if CONFIG_PGTABLE_LEVELS < 3
>         copy            %r0,\pte
> #endif
>         ldw,s           \index(\pmd),\pmd
>         bb,>=,n         \pmd,_PxD_PRESENT_BIT,\fault
>         dep             %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
>         SHLREG          \pmd,PxD_VALUE_SHIFT,\pmd
>         extru_safe      \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
>         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
>         shladd          \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
>         .endm
>
> I believe the shladd instruction should be changed to shladd,l (shift left and add logical).
diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
index ab23e61a6f01..1ec60406f841 100644
--- a/arch/parisc/kernel/entry.S
+++ b/arch/parisc/kernel/entry.S
@@ -399,7 +399,7 @@
      SHLREG        \pmd,PxD_VALUE_SHIFT,\pmd
      extru_safe    \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
      dep        %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
-    shladd        \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
+    shladd,l    \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
      .endm

      /* Look up PTE in a 3-Level scheme. */

Boots okay.  Fixing the addi instruction is harder and it would take some time to test.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 22:29                               ` John David Anglin
@ 2024-08-08 23:33                                 ` Linus Torvalds
  2024-08-09  0:33                                   ` John David Anglin
  2024-08-09  0:56                                 ` Guenter Roeck
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2024-08-08 23:33 UTC (permalink / raw)
  To: John David Anglin
  Cc: Guenter Roeck, Thomas Gleixner, Vlastimil Babka, linux-kernel,
	Linux-MM, Helge Deller, linux-parisc, Richard Henderson

On Thu, 8 Aug 2024 at 15:30, John David Anglin <dave.anglin@bell.net> wrote:
>
> > I believe the shladd instruction should be changed to shladd,l (shift left and add logical).
>
> diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
> index ab23e61a6f01..1ec60406f841 100644
> --- a/arch/parisc/kernel/entry.S
> +++ b/arch/parisc/kernel/entry.S
> @@ -399,7 +399,7 @@
> -    shladd        \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
> +    shladd,l    \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */

This doesn't seem wrong, but doesn't RFIR already restore the status word?

So even if the itlb fill modifies C/B, I don't see why that should
actually matter.

But again, parisc is very much not one of the architectures I've ever
worked with, so..

          Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 23:33                                 ` Linus Torvalds
@ 2024-08-09  0:33                                   ` John David Anglin
  0 siblings, 0 replies; 33+ messages in thread
From: John David Anglin @ 2024-08-09  0:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Thomas Gleixner, Vlastimil Babka, linux-kernel,
	Linux-MM, Helge Deller, linux-parisc, Richard Henderson

On 2024-08-08 7:33 p.m., Linus Torvalds wrote:
> On Thu, 8 Aug 2024 at 15:30, John David Anglin <dave.anglin@bell.net> wrote:
>>> I believe the shladd instruction should be changed to shladd,l (shift left and add logical).
>> diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
>> index ab23e61a6f01..1ec60406f841 100644
>> --- a/arch/parisc/kernel/entry.S
>> +++ b/arch/parisc/kernel/entry.S
>> @@ -399,7 +399,7 @@
>> -    shladd        \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
>> +    shladd,l    \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
> This doesn't seem wrong, but doesn't RFIR already restore the status word?
Yes.
>
> So even if the itlb fill modifies C/B, I don't see why that should
> actually matter.
Probably won't help unless there's a bug in qemu rfir emulation.

Dave

-- 
John David Anglin  dave.anglin@bell.net


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 21:50                             ` John David Anglin
  2024-08-08 22:29                               ` John David Anglin
@ 2024-08-09  0:50                               ` Guenter Roeck
  1 sibling, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2024-08-09  0:50 UTC (permalink / raw)
  To: John David Anglin, Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller,
	linux-parisc, Richard Henderson

On 8/8/24 14:50, John David Anglin wrote:
> On 2024-08-08 4:52 p.m., Guenter Roeck wrote:
>> On 8/8/24 11:19, Linus Torvalds wrote:
>>> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>>>>
>>>> Here is the disassembly from my latest crashing debug kernel which
>>>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
>>>
>>> Looks like I was off by an instruction, it's the 28th divide-step (not
>>> 29) that does the page crosser:
>>>
>>>>      4121dffc:   0b 21 04 41     ds r1,r25,r1
>>>>      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
> I think this macro might clobber the C/B bits on a ITLB missing:
> 
>          /* This is for ILP32 PA2.0 only.  The TLB insertion needs
>           * to extend into I/O space if the address is 0xfXXXXXXX
>           * so we extend the f's into the top word of the pte in
>           * this case */
>          .macro          f_extend        pte,tmp
>          extrd,s         \pte,42,4,\tmp
>          addi,<>         1,\tmp,%r0
>          extrd,s         \pte,63,25,\pte
>          .endm
> 
> The addi instruction affects the C/B bits.  However, it is only used for 32-bit PA 2.0 kernels.
> A second tmp register would be needed to change the addi to an add logical.
> 
> The mode likely problem is the shladd instruction in the following macro in entry.S:
> 
>          .macro          L2_ptep pmd,pte,index,va,fault
> #if CONFIG_PGTABLE_LEVELS == 3
>          extru_safe      \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
> #else
>          extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
> #endif
>          dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
> #if CONFIG_PGTABLE_LEVELS < 3
>          copy            %r0,\pte
> #endif
>          ldw,s           \index(\pmd),\pmd
>          bb,>=,n         \pmd,_PxD_PRESENT_BIT,\fault
>          dep             %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
>          SHLREG          \pmd,PxD_VALUE_SHIFT,\pmd
>          extru_safe      \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
>          dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
>          shladd          \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
>          .endm
> 
> I believe the shladd instruction should be changed to shladd,l (shift left and add logical).
> 

That doesn't help, at least not in qemu.

Guenter


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 22:29                               ` John David Anglin
  2024-08-08 23:33                                 ` Linus Torvalds
@ 2024-08-09  0:56                                 ` Guenter Roeck
  1 sibling, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2024-08-09  0:56 UTC (permalink / raw)
  To: John David Anglin, Linus Torvalds, Thomas Gleixner
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, Helge Deller,
	linux-parisc, Richard Henderson

On 8/8/24 15:29, John David Anglin wrote:
> On 2024-08-08 5:50 p.m., John David Anglin wrote:
>> The mode likely problem is the shladd instruction in the following macro in entry.S:
>>
>>         .macro          L2_ptep pmd,pte,index,va,fault
>> #if CONFIG_PGTABLE_LEVELS == 3
>>         extru_safe      \va,31-ASM_PMD_SHIFT,ASM_BITS_PER_PMD,\index
>> #else
>>         extru_safe \va,31-ASM_PGDIR_SHIFT,ASM_BITS_PER_PGD,\index
>> #endif
>>         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
>> #if CONFIG_PGTABLE_LEVELS < 3
>>         copy            %r0,\pte
>> #endif
>>         ldw,s           \index(\pmd),\pmd
>>         bb,>=,n         \pmd,_PxD_PRESENT_BIT,\fault
>>         dep             %r0,31,PxD_FLAG_SHIFT,\pmd /* clear flags */
>>         SHLREG          \pmd,PxD_VALUE_SHIFT,\pmd
>>         extru_safe      \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
>>         dep             %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
>>         shladd          \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
>>         .endm
>>
>> I believe the shladd instruction should be changed to shladd,l (shift left and add logical).
> diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S
> index ab23e61a6f01..1ec60406f841 100644
> --- a/arch/parisc/kernel/entry.S
> +++ b/arch/parisc/kernel/entry.S
> @@ -399,7 +399,7 @@
>       SHLREG        \pmd,PxD_VALUE_SHIFT,\pmd
>       extru_safe    \va,31-PAGE_SHIFT,ASM_BITS_PER_PTE,\index
>       dep        %r0,31,PAGE_SHIFT,\pmd  /* clear offset */
> -    shladd        \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
> +    shladd,l    \index,BITS_PER_PTE_ENTRY,\pmd,\pmd /* pmd is now pte */
>       .endm
> 
>       /* Look up PTE in a 3-Level scheme. */
> 
> Boots okay.  Fixing the addi instruction is harder and it would take some time to test.
> 

Odd, it doesn't help for me. Does it crash for you without the above change ?
Or, in other words, is divI at the objecting location ?

Guenter



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-08-08 18:19                         ` Linus Torvalds
  2024-08-08 20:52                           ` Guenter Roeck
@ 2024-09-03  7:54                           ` Helge Deller
  2024-09-03 14:13                             ` Guenter Roeck
  2024-09-03 18:43                             ` Linus Torvalds
  1 sibling, 2 replies; 33+ messages in thread
From: Helge Deller @ 2024-09-03  7:54 UTC (permalink / raw)
  To: Linus Torvalds, Richard Henderson, Guenter Roeck
  Cc: Vlastimil Babka, linux-kernel, Linux-MM, linux-parisc,
	Thomas Gleixner

On 8/8/24 20:19, Linus Torvalds wrote:
> On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> Here is the disassembly from my latest crashing debug kernel which
>> shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
>
> Looks like I was off by an instruction, it's the 28th divide-step (not
> 29) that does the page crosser:
>
>>      4121dffc:   0b 21 04 41     ds r1,r25,r1
>>      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
>
> but my parisc knowledge is not good enough to even guess at what could go wrong.
>
> And I have no actual reason to believe this has *anything* to do with
> an itlb miss, except for that whole "exact placement seems to matter,
> and it crosses a page boundary" detail.

Well, you were on the right track :-)

Guenters kernel from
http://server.roeck-us.net/qemu/parisc64-6.10.3/
boots nicely on my physical C3700 machine, but crashes with Qemu.

So, it's not some bug in the kernel ITLB miss handler or other
Linux kernel code.

Instead it's a Qemu bug, which gets triggered by the page
boundary crossing of:
    41218ffc:   0b 21 04 41     ds r1,r25,r1
    41219000:   0b bd 07 1d     add,c ret1,ret1,ret1

During the ITLB miss, the carry bits and the PSW-V-bit
(from the divide step) are saved in the IPSW register and restored
upon irq return.

During packaging the bits there is a qemu coding bug, where we missed
to handle the PSW-V-bit as 32-bit value even on a 64-bit CPU.
The (copy&pasted) patch below fixes the crash for me.

Helge

diff --git a/target/hppa/helper.c b/target/hppa/helper.c
index b79ddd8184..d4b1a3cd5a 100644
--- a/target/hppa/helper.c
+++ b/target/hppa/helper.c
@@ -53,7 +53,7 @@ target_ulong cpu_hppa_get_psw(CPUHPPAState *env)
      }

      psw |= env->psw_n * PSW_N;
-    psw |= (env->psw_v < 0) * PSW_V;
+    psw |= ((env->psw_v >> 31) & 1) * PSW_V;
      psw |= env->psw | env->psw_xb;

      return psw;


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-09-03  7:54                           ` Helge Deller
@ 2024-09-03 14:13                             ` Guenter Roeck
  2024-09-03 18:43                             ` Linus Torvalds
  1 sibling, 0 replies; 33+ messages in thread
From: Guenter Roeck @ 2024-09-03 14:13 UTC (permalink / raw)
  To: Helge Deller
  Cc: Linus Torvalds, Richard Henderson, Vlastimil Babka, linux-kernel,
	Linux-MM, linux-parisc, Thomas Gleixner

On Tue, Sep 03, 2024 at 09:54:19AM +0200, Helge Deller wrote:
> On 8/8/24 20:19, Linus Torvalds wrote:
> > On Thu, 8 Aug 2024 at 10:48, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > 
> > > Here is the disassembly from my latest crashing debug kernel which
> > > shifts it up a couple of pages. Add 0x10 or sub 0x20 to make it work.
> > 
> > Looks like I was off by an instruction, it's the 28th divide-step (not
> > 29) that does the page crosser:
> > 
> > >      4121dffc:   0b 21 04 41     ds r1,r25,r1
> > >      4121e000:   0b bd 07 1d     add,c ret1,ret1,ret1
> > 
> > but my parisc knowledge is not good enough to even guess at what could go wrong.
> > 
> > And I have no actual reason to believe this has *anything* to do with
> > an itlb miss, except for that whole "exact placement seems to matter,
> > and it crosses a page boundary" detail.
> 
> Well, you were on the right track :-)
> 
> Guenters kernel from
> http://server.roeck-us.net/qemu/parisc64-6.10.3/
> boots nicely on my physical C3700 machine, but crashes with Qemu.
> 
> So, it's not some bug in the kernel ITLB miss handler or other
> Linux kernel code.
> 
> Instead it's a Qemu bug, which gets triggered by the page
> boundary crossing of:
>    41218ffc:   0b 21 04 41     ds r1,r25,r1
>    41219000:   0b bd 07 1d     add,c ret1,ret1,ret1
> 
> During the ITLB miss, the carry bits and the PSW-V-bit
> (from the divide step) are saved in the IPSW register and restored
> upon irq return.
> 
> During packaging the bits there is a qemu coding bug, where we missed
> to handle the PSW-V-bit as 32-bit value even on a 64-bit CPU.
> The (copy&pasted) patch below fixes the crash for me.
> 

Yes, that works for me as well. Thanks a lot for the fix!

Guenter

> Helge
> 
> diff --git a/target/hppa/helper.c b/target/hppa/helper.c
> index b79ddd8184..d4b1a3cd5a 100644
> --- a/target/hppa/helper.c
> +++ b/target/hppa/helper.c
> @@ -53,7 +53,7 @@ target_ulong cpu_hppa_get_psw(CPUHPPAState *env)
>      }
> 
>      psw |= env->psw_n * PSW_N;
> -    psw |= (env->psw_v < 0) * PSW_V;
> +    psw |= ((env->psw_v >> 31) & 1) * PSW_V;
>      psw |= env->psw | env->psw_xb;
> 
>      return psw;
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6.10 000/809] 6.10.3-rc3 review
  2024-09-03  7:54                           ` Helge Deller
  2024-09-03 14:13                             ` Guenter Roeck
@ 2024-09-03 18:43                             ` Linus Torvalds
  1 sibling, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2024-09-03 18:43 UTC (permalink / raw)
  To: Helge Deller
  Cc: Richard Henderson, Guenter Roeck, Vlastimil Babka, linux-kernel,
	Linux-MM, linux-parisc, Thomas Gleixner

On Tue, 3 Sept 2024 at 00:54, Helge Deller <deller@gmx.de> wrote:
>
> During packaging the bits there is a qemu coding bug, where we missed
> to handle the PSW-V-bit as 32-bit value even on a 64-bit CPU.

Well, that was a fun bug.

And by "fun" I obviously mean "let's hope to never do this again".

               Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2024-09-03 18:43 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20240731095022.970699670@linuxfoundation.org>
     [not found] ` <718b8afe-222f-4b3a-96d3-93af0e4ceff1@roeck-us.net>
2024-08-05  3:28   ` [PATCH 6.10 000/809] 6.10.3-rc3 review Guenter Roeck
2024-08-05  8:56     ` Thomas Gleixner
2024-08-05 12:51       ` Thomas Gleixner
2024-08-05 15:02         ` Guenter Roeck
2024-08-05 21:49           ` Thomas Gleixner
2024-08-06  1:16             ` Guenter Roeck
2024-08-05 17:42       ` Guenter Roeck
     [not found]   ` <CAHk-=wiZ7WJQ1y=CwuMwqBxQYtaD8psq+Vxa3r1Z6_ftDZK+hA@mail.gmail.com>
     [not found]     ` <53b2e1f2-4291-48e5-a668-7cf57d900ecd@suse.cz>
     [not found]       ` <87le194kuq.ffs@tglx>
     [not found]         ` <90e02d99-37a2-437e-ad42-44b80c4e94f6@suse.cz>
2024-08-06 23:24           ` Thomas Gleixner
2024-08-07  0:49             ` James Bottomley
2024-08-07  1:38               ` Guenter Roeck
2024-08-07 12:45               ` Thomas Gleixner
2024-08-08  1:07             ` Guenter Roeck
2024-08-08  7:48               ` Vlastimil Babka
2024-08-08 14:46                 ` Guenter Roeck
2024-08-08  9:57               ` Thomas Gleixner
2024-08-08 14:59                 ` Guenter Roeck
2024-08-08 15:58                   ` John David Anglin
2024-08-08 15:53                 ` Linus Torvalds
2024-08-08 16:12                   ` Thomas Gleixner
2024-08-08 16:33                     ` Linus Torvalds
2024-08-08 17:48                       ` Thomas Gleixner
2024-08-08 18:19                         ` Linus Torvalds
2024-08-08 20:52                           ` Guenter Roeck
2024-08-08 21:50                             ` John David Anglin
2024-08-08 22:29                               ` John David Anglin
2024-08-08 23:33                                 ` Linus Torvalds
2024-08-09  0:33                                   ` John David Anglin
2024-08-09  0:56                                 ` Guenter Roeck
2024-08-09  0:50                               ` Guenter Roeck
2024-08-08 22:15                             ` Richard Henderson
2024-09-03  7:54                           ` Helge Deller
2024-09-03 14:13                             ` Guenter Roeck
2024-09-03 18:43                             ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox