* Re: Kernel Oops on alpha with kernel version >=6.9.x [not found] ` <8e365392-0e0d-4d78-bae7-69f27c8350f9@paulmck-laptop> @ 2024-12-04 22:22 ` Magnus Lindholm 2024-12-05 15:39 ` John Paul Adrian Glaubitz 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-04 22:22 UTC (permalink / raw) To: paulmck; +Cc: rcu, linux-alpha I've been looking a bit closer at the RCU problem on Alpha, in the case with the bug related to interface-renaming after the changes in the networking code the code fails with an invalid pointer reference. From the stack trace one can conclude that this happens when using synchronize_rcu_expedited() in stead of synchronize_rcu_normal(). The use of rcu_normal can be enforced by setting kernel parameter rcupdate.rcu_normal=1 at boot. This makes recent kernels boot again on my Alphas, a simple enough workaround for now. The code fails inside work-queue handler wait_rcu_exp_gp() when its trying to call rcu_exp_sel_wait_wake(). looking at the code generated from the compiler the call to rcu_exp_sel_wait_wake() appears to be inline-optimized, so no actual call to this function. If I add some bogus-code (i.e a print call that references the address of a local variable, something that the compiler can't optimize away) before the call to rcu_exp_sel_wait_wake(), the code works! The same effect is achieved by declaring the local variable as volatile. I've also noted a similar behavior in the scsi driver code, where unloading of a scsi driver kernel module (in my case qla1280) will trigger a kernel Oops. As in the example above, this can be mitigated by adding a reference to local variables. When doing "rmmod qla1280" scsi_host_dev_release() calls rcu_barrier(). In this function call I noticed that the stack was somehow corrupted and the return address to scsi_host_dev_release() was overwritten. The stack corruption occurs in the "for_each_possible_cpu(cpu)" loop inside rcu_barrier(). Below are stack dumps from before/after the for_each_possible_cpu loop. The call to scsi_host_dev_release disappears in stack trace since its return address (fffffc0000b6a3ec) is replaced by a '1' and at the of the call to rcu_barrier(). We get a kernel Oops since the $ra=1 is used as return address. In both RCU cases above, stack corruption occurs and the sections that cause problems involve the use of kernel threads so concurrency might be an issue here. Since the RCU code works on other platforms and can be "fixed" on Alpha as well just by declaring certain variables as volatile (or by other means making sure that they are not optimized away from the code) can this be a compiler issue on alpha or is it the result of not taking proper measures, in the code, to account for the weak memory model on Alpha? Or a combination of the two? /Magnus Lindholm Stack traces showing the corrupted stack frames: ---------------------------------------------------------------- rcu: inside rcu_barrier 5 CPU: 1 UID: 0 PID: 1430 Comm: rmmod Not tainted 6.12.1-gentoo #43 fffffc000987fc88 fffffc0000e66440 fffffc00003a8bc8 0000000000000000 fffffc0000e667b0 fffffc000480b5d8 fffffc0000b6a3ec fffffc0004a2a000 fffffc0004a2a240 fffffc000480b5d8 0000000000000000 fffffffc00502068 0000020001043480 00000200010422a0 0000000000000000 0000000000000000 fffffc0000b68efc fffffc0004a2a240 fffffc0006319300 0000000000000000 fffffc0000b2ed80 fffffc0004a2a240 fffffc0000b9d278 0000000000000000 Trace: [<fffffc00003a8bc8>] rcu_barrier+0x1f8/0x580 [<fffffc0000b6a3ec>] scsi_host_dev_release+0xac/0x1cc [<fffffc0000b68efc>] device_release+0x148/0x218 [<fffffc0000b2ed80>] kobject_put+0x1d0/0x270 [<fffffc00007cac3c>] put_device+0x1c/0x30 [<fffffc00007f47cc>] scsi_host_put+0x1c/0x30 [<fffffc00007554a4>] pci_device_remove+0x34/0x90 [<fffffc00007d5c04>] device_remove+0x64/0xb0 [<fffffc00007d7694>] device_release_driver_internal+0x294/0x380 [<fffffc00007d783c>] driver_detach+0x7c/0x110 [<fffffc00007d5240>] bus_remove_driver+0xa0/0x150 [<fffffc00007d80c4>] driver_unregister+0x44/0xa0 [<fffffc00007552f8>] pci_unregister_driver+0x38/0xd0 [<fffffc00003bbb7c>] sys_delete_module+0x19c/0x320 [<fffffc0000310d34>] entSys+0xa4/0xc0 rcu: inside rcu_barrier 6 CPU: 1 UID: 0 PID: 1430 Comm: rmmod Not tainted 6.12.1-gentoo #43 fffffc000987fc88 fffffc0000e66440 fffffc00003a8c44 0000000000000002 fffffc0000e667b0 fffffc0000e44240 0000000000000001 fffffc0004a2a000 fffffc0004a2a240 fffffc000480b5d8 0000000000000000 fffffffc00502068 0000020001043480 00000200010422a0 0000000000000000 0000000000000000 fffffc0000b68efc fffffc0004a2a240 fffffc0006319300 0000000000000000 fffffc0000b2ed80 fffffc0004a2a240 fffffc0000b9d278 0000000000000000 Trace: [<fffffc00003a8c44>] rcu_barrier+0x274/0x580 [<fffffc0000b68efc>] device_release+0x148/0x218 [<fffffc0000b2ed80>] kobject_put+0x1d0/0x270 [<fffffc00007cac3c>] put_device+0x1c/0x30 [<fffffc00007f47cc>] scsi_host_put+0x1c/0x30 [<fffffc00007554a4>] pci_device_remove+0x34/0x90 [<fffffc00007d5c04>] device_remove+0x64/0xb0 [<fffffc00007d7694>] device_release_driver_internal+0x294/0x380 [<fffffc00007d783c>] driver_detach+0x7c/0x110 [<fffffc00007d5240>] bus_remove_driver+0xa0/0x150 [<fffffc00007d80c4>] driver_unregister+0x44/0xa0 [<fffffc00007552f8>] pci_unregister_driver+0x38/0xd0 [<fffffc00003bbb7c>] sys_delete_module+0x19c/0x320 [<fffffc0000310d34>] entSys+0xa4/0xc0 Unable to handle kernel paging request at virtual address 0000000000000000 CPU 1 rmmod(1430): Oops -1 pc = [<0000000000000000>] ra = [<0000000000000001>] ps = 0000 Not tainted pc is at 0x0 ra is at 0x1 v0 = 0000000000000007 t0 = fffffc0000ec7aa8 t1 = ffffffffffffffff t2 = fffffc0000e65df0 t3 = 00000000000026f0 t4 = 00000000000028f1 t5 = 00000000000c2e20 t6 = 00000000000c2e68 t7 = fffffc000987c000 s0 = fffffc0004a2a000 s1 = fffffc0004a2a240 s2 = fffffc000480b5d8 s3 = 0000000000000000 s4 = fffffffc00502068 s5 = 0000020001043480 s6 = 00000200010422a0 a0 = 0000000000000000 a1 = 0000000000000001 a2 = 00000000000028f0 a3 = fffffc000987fa38 a4 = 0000000000000000 a5 = 0000000000000000 t8 = 00000000000c2e20 t9 = ffffffffffffffec t10= 0000000000000001 t11= 00000001000024f0 pv = fffffc000038a1f0 at = 0000000000000000 gp = fffffc0000eb7aa8 sp = 00000000183e6a07 Disabling lock debugging due to kernel taint Trace: [<fffffc0000b68efc>] device_release+0x148/0x218 [<fffffc0000b2ed80>] kobject_put+0x1d0/0x270 [<fffffc00007cac3c>] put_device+0x1c/0x30 [<fffffc00007f47cc>] scsi_host_put+0x1c/0x30 [<fffffc00007554a4>] pci_device_remove+0x34/0x90 [<fffffc00007d5c04>] device_remove+0x64/0xb0 [<fffffc00007d7694>] device_release_driver_internal+0x294/0x380 [<fffffc00007d783c>] driver_detach+0x7c/0x110 [<fffffc00007d5240>] bus_remove_driver+0xa0/0x150 [<fffffc00007d80c4>] driver_unregister+0x44/0xa0 [<fffffc00007552f8>] pci_unregister_driver+0x38/0xd0 [<fffffc00003bbb7c>] sys_delete_module+0x19c/0x320 [<fffffc0000310d34>] entSys+0xa4/0xc0 Below are the changes I made to the kernel source in order mitigate the stack corruption problem this is not really a fix but it can be of use to gain further knowledge on whats really going on: ------------------------------------------------------------------------------------ diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index ff98233d4aa5..8241313404f7 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4553,7 +4553,7 @@ static void rcu_barrier_handler(void *cpu_in) */ void rcu_barrier(void) { - uintptr_t cpu; + volatile uintptr_t cpu; unsigned long flags; unsigned long gseq; struct rcu_data *rdp; diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index fb664d3a01c9..afba0ebc80e4 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -477,7 +477,7 @@ static inline void sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp) */ static void wait_rcu_exp_gp(struct kthread_work *wp) { - struct rcu_exp_work *rewp; + volatile struct rcu_exp_work *rewp; rewp = container_of(wp, struct rcu_exp_work, rew_work); rcu_exp_sel_wait_wake(rewp->rew_s); @@ -705,6 +705,7 @@ static void rcu_exp_wait_wake(unsigned long s) */ static void rcu_exp_sel_wait_wake(unsigned long s) { + pr_warn("inside rcu_exp_sel_wait_wake, %llx\n",(void*)s); /* Initialize the rcu_node tree in preparation for the wait. */ sync_rcu_exp_select_cpus(); On Sun, Dec 1, 2024 at 6:04 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Sun, Dec 01, 2024 at 11:09:10AM +0100, Magnus Lindholm wrote: > > On Sun, Dec 1, 2024 at 5:31 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > Does booting with the "rcupdate.rcu_normal=1" kernel boot parameter > > > also suppress the problem? > > > > setting rcupdate.rcu_normal=1 also suppresses the problem. I guess this makes > > RCU code not do synchronize_rcu_normal() in stead of the full > > synchronize_rcu_expedited() which is where I get the kernel Oops. > > Exactly, though the effect is that any call to synchronize_rcu_expedited() > instead results in a call to synchronize_rcu(). > > Which means that you can work around this problem without having to > carry patches and without having to slow down network configuration for > everyone else. ;-) > > > > That "pc =" down below is the program counter? If so, I am at a loss > > > as to what RCU could do to make it be zero. > > > > No sure why this happens, if the RCU code is passing around pointers to > > worker function and this somehow ends up being a null pointer on the Alpha? > > Are frame pointers enabled on your setup? If not, could you please > enable them and reproduce the problem? Could you also please try > building and reproducing with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y? > > Thanx, Paul ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-04 22:22 ` Kernel Oops on alpha with kernel version >=6.9.x Magnus Lindholm @ 2024-12-05 15:39 ` John Paul Adrian Glaubitz 2024-12-05 17:02 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: John Paul Adrian Glaubitz @ 2024-12-05 15:39 UTC (permalink / raw) To: Magnus Lindholm, paulmck; +Cc: rcu, linux-alpha Hi Magnus, On Wed, 2024-12-04 at 23:22 +0100, Magnus Lindholm wrote: > I've been looking a bit closer at the RCU problem on Alpha, in the > case with the bug > related to interface-renaming after the changes in the networking code > the code fails > with an invalid pointer reference. From the stack trace one can > conclude that this > happens when using synchronize_rcu_expedited() in stead of > synchronize_rcu_normal(). > The use of rcu_normal can be enforced by setting kernel parameter > rcupdate.rcu_normal=1 > at boot. This makes recent kernels boot again on my Alphas, a simple enough > workaround for now. > > The code fails inside work-queue handler wait_rcu_exp_gp() when its > trying to call > rcu_exp_sel_wait_wake(). looking at the code generated from the > compiler the call > to rcu_exp_sel_wait_wake() appears to be inline-optimized, so no > actual call to this > function. If I add some bogus-code (i.e a print call that references > the address of a > local variable, something that the compiler can't optimize away) > before the call to > rcu_exp_sel_wait_wake(), the code works! The same effect is achieved > by declaring > the local variable as volatile. > > I've also noted a similar behavior in the scsi driver code, where > unloading of a scsi > driver kernel module (in my case qla1280) will trigger a kernel Oops. As in the > example above, this can be mitigated by adding a reference to local variables. > When doing "rmmod qla1280" scsi_host_dev_release() calls rcu_barrier(). In this > function call I noticed that the stack was somehow corrupted and the > return address > to scsi_host_dev_release() was overwritten. The stack corruption occurs in the > "for_each_possible_cpu(cpu)" loop inside rcu_barrier(). Below are stack dumps > from before/after the for_each_possible_cpu loop. The call to > scsi_host_dev_release > disappears in stack trace since its return address (fffffc0000b6a3ec) > is replaced > by a '1' and at the of the call to rcu_barrier(). We get a kernel Oops > since the $ra=1 is used as return address. > > In both RCU cases above, stack corruption occurs and the sections that cause > problems involve the use of kernel threads so concurrency might be an > issue here. > Since the RCU code works on other platforms and can be "fixed" on Alpha as well > just by declaring certain variables as volatile (or by other means > making sure that > they are not optimized away from the code) can this be a compiler issue on > alpha or is it the result of not taking proper measures, in the code, > to account for the > weak memory model on Alpha? Or a combination of the two? > > > /Magnus Lindholm > > > Stack traces showing the corrupted stack frames: > ---------------------------------------------------------------- > > rcu: inside rcu_barrier 5 > CPU: 1 UID: 0 PID: 1430 Comm: rmmod Not tainted 6.12.1-gentoo #43 > fffffc000987fc88 fffffc0000e66440 fffffc00003a8bc8 0000000000000000 > fffffc0000e667b0 fffffc000480b5d8 fffffc0000b6a3ec fffffc0004a2a000 > fffffc0004a2a240 fffffc000480b5d8 0000000000000000 fffffffc00502068 > 0000020001043480 00000200010422a0 0000000000000000 0000000000000000 > fffffc0000b68efc fffffc0004a2a240 fffffc0006319300 0000000000000000 > fffffc0000b2ed80 fffffc0004a2a240 fffffc0000b9d278 0000000000000000 > Trace: > [<fffffc00003a8bc8>] rcu_barrier+0x1f8/0x580 > [<fffffc0000b6a3ec>] scsi_host_dev_release+0xac/0x1cc > [<fffffc0000b68efc>] device_release+0x148/0x218 > [<fffffc0000b2ed80>] kobject_put+0x1d0/0x270 > [<fffffc00007cac3c>] put_device+0x1c/0x30 > [<fffffc00007f47cc>] scsi_host_put+0x1c/0x30 > [<fffffc00007554a4>] pci_device_remove+0x34/0x90 > [<fffffc00007d5c04>] device_remove+0x64/0xb0 > [<fffffc00007d7694>] device_release_driver_internal+0x294/0x380 > [<fffffc00007d783c>] driver_detach+0x7c/0x110 > [<fffffc00007d5240>] bus_remove_driver+0xa0/0x150 > [<fffffc00007d80c4>] driver_unregister+0x44/0xa0 > [<fffffc00007552f8>] pci_unregister_driver+0x38/0xd0 > [<fffffc00003bbb7c>] sys_delete_module+0x19c/0x320 > [<fffffc0000310d34>] entSys+0xa4/0xc0 > > > rcu: inside rcu_barrier 6 > CPU: 1 UID: 0 PID: 1430 Comm: rmmod Not tainted 6.12.1-gentoo #43 > fffffc000987fc88 fffffc0000e66440 fffffc00003a8c44 0000000000000002 > fffffc0000e667b0 fffffc0000e44240 0000000000000001 fffffc0004a2a000 > fffffc0004a2a240 fffffc000480b5d8 0000000000000000 fffffffc00502068 > 0000020001043480 00000200010422a0 0000000000000000 0000000000000000 > fffffc0000b68efc fffffc0004a2a240 fffffc0006319300 0000000000000000 > fffffc0000b2ed80 fffffc0004a2a240 fffffc0000b9d278 0000000000000000 > Trace: > [<fffffc00003a8c44>] rcu_barrier+0x274/0x580 > [<fffffc0000b68efc>] device_release+0x148/0x218 > [<fffffc0000b2ed80>] kobject_put+0x1d0/0x270 > [<fffffc00007cac3c>] put_device+0x1c/0x30 > [<fffffc00007f47cc>] scsi_host_put+0x1c/0x30 > [<fffffc00007554a4>] pci_device_remove+0x34/0x90 > [<fffffc00007d5c04>] device_remove+0x64/0xb0 > [<fffffc00007d7694>] device_release_driver_internal+0x294/0x380 > [<fffffc00007d783c>] driver_detach+0x7c/0x110 > [<fffffc00007d5240>] bus_remove_driver+0xa0/0x150 > [<fffffc00007d80c4>] driver_unregister+0x44/0xa0 > [<fffffc00007552f8>] pci_unregister_driver+0x38/0xd0 > [<fffffc00003bbb7c>] sys_delete_module+0x19c/0x320 > [<fffffc0000310d34>] entSys+0xa4/0xc0 > > > Unable to handle kernel paging request at virtual address 0000000000000000 > CPU 1 > rmmod(1430): Oops -1 > pc = [<0000000000000000>] ra = [<0000000000000001>] ps = 0000 Not tainted > pc is at 0x0 > ra is at 0x1 > v0 = 0000000000000007 t0 = fffffc0000ec7aa8 t1 = ffffffffffffffff > t2 = fffffc0000e65df0 t3 = 00000000000026f0 t4 = 00000000000028f1 > t5 = 00000000000c2e20 t6 = 00000000000c2e68 t7 = fffffc000987c000 > s0 = fffffc0004a2a000 s1 = fffffc0004a2a240 s2 = fffffc000480b5d8 > s3 = 0000000000000000 s4 = fffffffc00502068 s5 = 0000020001043480 > s6 = 00000200010422a0 > a0 = 0000000000000000 a1 = 0000000000000001 a2 = 00000000000028f0 > a3 = fffffc000987fa38 a4 = 0000000000000000 a5 = 0000000000000000 > t8 = 00000000000c2e20 t9 = ffffffffffffffec t10= 0000000000000001 > t11= 00000001000024f0 pv = fffffc000038a1f0 at = 0000000000000000 > gp = fffffc0000eb7aa8 sp = 00000000183e6a07 > Disabling lock debugging due to kernel taint > Trace: > [<fffffc0000b68efc>] device_release+0x148/0x218 > [<fffffc0000b2ed80>] kobject_put+0x1d0/0x270 > [<fffffc00007cac3c>] put_device+0x1c/0x30 > [<fffffc00007f47cc>] scsi_host_put+0x1c/0x30 > [<fffffc00007554a4>] pci_device_remove+0x34/0x90 > [<fffffc00007d5c04>] device_remove+0x64/0xb0 > [<fffffc00007d7694>] device_release_driver_internal+0x294/0x380 > [<fffffc00007d783c>] driver_detach+0x7c/0x110 > [<fffffc00007d5240>] bus_remove_driver+0xa0/0x150 > [<fffffc00007d80c4>] driver_unregister+0x44/0xa0 > [<fffffc00007552f8>] pci_unregister_driver+0x38/0xd0 > [<fffffc00003bbb7c>] sys_delete_module+0x19c/0x320 > [<fffffc0000310d34>] entSys+0xa4/0xc0 > > > Below are the changes I made to the kernel source in order mitigate > the stack corruption problem > this is not really a fix but it can be of use to gain further > knowledge on whats really going on: > ------------------------------------------------------------------------------------ > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index ff98233d4aa5..8241313404f7 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -4553,7 +4553,7 @@ static void rcu_barrier_handler(void *cpu_in) > */ > void rcu_barrier(void) > { > - uintptr_t cpu; > + volatile uintptr_t cpu; > unsigned long flags; > unsigned long gseq; > struct rcu_data *rdp; > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h > index fb664d3a01c9..afba0ebc80e4 100644 > --- a/kernel/rcu/tree_exp.h > +++ b/kernel/rcu/tree_exp.h > @@ -477,7 +477,7 @@ static inline void > sync_rcu_exp_select_cpus_flush_work(struct rcu_node *rnp) > */ > static void wait_rcu_exp_gp(struct kthread_work *wp) > { > - struct rcu_exp_work *rewp; > + volatile struct rcu_exp_work *rewp; > > rewp = container_of(wp, struct rcu_exp_work, rew_work); > rcu_exp_sel_wait_wake(rewp->rew_s); > @@ -705,6 +705,7 @@ static void rcu_exp_wait_wake(unsigned long s) > */ > static void rcu_exp_sel_wait_wake(unsigned long s) > { > + pr_warn("inside rcu_exp_sel_wait_wake, %llx\n",(void*)s); > /* Initialize the rcu_node tree in preparation for the wait. */ > sync_rcu_exp_select_cpus(); I wonder whether these RCU bugs are related to the SMP issue on alpha that's been introduced/uncovered by f2f84b05e02b7710a201f0017b3272ad7ef703d1 [1]. In particular, one suggestion was to run the RCU stress test and see if the results will tell anything about issues that might be related to the SMP problem. Adrian > [1] https://bugzilla.kernel.org/show_bug.cgi?id=213143 -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer `. `' Physicist `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-05 15:39 ` John Paul Adrian Glaubitz @ 2024-12-05 17:02 ` Magnus Lindholm 2024-12-06 15:39 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-05 17:02 UTC (permalink / raw) To: John Paul Adrian Glaubitz; +Cc: paulmck, rcu, linux-alpha > > I wonder whether these RCU bugs are related to the SMP issue on alpha that's > been introduced/uncovered by f2f84b05e02b7710a201f0017b3272ad7ef703d1 [1]. > > In particular, one suggestion was to run the RCU stress test and see if the > results will tell anything about issues that might be related to the SMP > problem. > Hi, I'll take a look. I my case the bug is 100% reproducible, that is every time I unload the kernel module or boot scripts rename the net interface I hit the bug. If I mitigate the bug with the changes to the code the problem goes away, at least to the extent that I haven't run into it again. That said, I'm unsure as to the root cause of my problem so this is interesting. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-05 17:02 ` Magnus Lindholm @ 2024-12-06 15:39 ` Magnus Lindholm 2024-12-06 17:05 ` John Paul Adrian Glaubitz 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-06 15:39 UTC (permalink / raw) To: John Paul Adrian Glaubitz; +Cc: paulmck, rcu, linux-alpha Some updates on this: rcu: inside rcu_barrier begin loop first pass cpu index=0 CPU: 1 UID: 0 PID: 1433 Comm: rmmod Not tainted 6.12.1-gentoo #75 fffffc000a217c88 fffffc0000e62440 fffffc00003a8be8 0000000000000000 fffffc0000e627b0 fffffc0000e42240 fffffc0000b67c98 fffffc0004a18000 here return address to scsi_host_dev_release is still intact on stack rcu: inside rcu_barrier end loop first pass CPU: 1 UID: 0 PID: 1433 Comm: rmmod Not tainted 6.12.1-gentoo #75 fffffc000a217c88 fffffc0000e62440 fffffc00003a8f64 0000000000000000 fffffc0000e627b0 fffffc0000e42240 0000000000000000 fffffc0004a18000 at the end of first pass in loop, return address in replaced by 0 rcu: inside rcu_barrier begin loop second pass cpu index=1 CPU: 1 UID: 0 PID: 1433 Comm: rmmod Not tainted 6.12.1-gentoo #75 fffffc000a217c88 fffffc0000e62440 fffffc00003a8be8 0000000000000001 fffffc0000e627b0 fffffc0000e42240 0000000000000000 fffffc0004a18000 rcu: inside rcu_barrier end loop (second pass) CPU: 1 UID: 0 PID: 1433 Comm: rmmod Not tainted 6.12.1-gentoo #75 fffffc000a217c88 fffffc0000e62440 fffffc00003a8f64 0000000000000001 fffffc0000e627b0 fffffc0000e42240 0000000000000001 fffffc0004a18000 ant the end of second pass return address is replaced by 1. It looks like the variable used as loop counter is the value put on the stack overwriting the return value for scsi_host_dev_release. When adding a reference to the address of this variable or when it is declared volatile, stack corruption does NOT occur. When examining the disassembly of the code generated from kernel/rcu/tree.o the most significant difference I can see is that in the case of a corrupted stack the frame pointer register $fp is used to hold a reference to the loop count variable but in the case with no stack corruption a regular "saved register" is used for the reference. Is it possible that the frame pointer is somehow altered during the execution of the code? not really sure how linux/alpha/gcc treats the frame pointer. I've tried altering -fomit-frame-pointer/-f-no-omit-frame-pointer but so far not getting anywhere with that... /Magnus Return address to scsi_host_dev_release: [<fffffc0000b67c98>] scsi_host_dev_release+0xac/0x1cc ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-06 15:39 ` Magnus Lindholm @ 2024-12-06 17:05 ` John Paul Adrian Glaubitz 2024-12-07 12:33 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: John Paul Adrian Glaubitz @ 2024-12-06 17:05 UTC (permalink / raw) To: Magnus Lindholm; +Cc: paulmck, rcu, linux-alpha Hi Magnus, On Fri, 2024-12-06 at 16:39 +0100, Magnus Lindholm wrote: > It looks like the variable used as loop counter is the value put on the stack > overwriting the return value for scsi_host_dev_release. When adding > a reference to the address of this variable or when it is declared > volatile, stack > corruption does NOT occur. > > When examining the disassembly of the code generated from kernel/rcu/tree.o > the most significant difference I can see is that in the case of a > corrupted stack > the frame pointer register $fp is used to hold a reference to the loop > count variable > but in the case with no stack corruption a regular "saved register" is > used for the > reference. Is it possible that the frame pointer is somehow altered > during the execution > of the code? not really sure how linux/alpha/gcc treats the frame pointer. I've > tried altering -fomit-frame-pointer/-f-no-omit-frame-pointer but so > far not getting > anywhere with that... Could this maybe a compiler bug? What about building the kernel with an older GCC version from [1]? Adrian > [1] https://mirrors.edge.kernel.org/pub/tools/crosstool/ -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer `. `' Physicist `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-06 17:05 ` John Paul Adrian Glaubitz @ 2024-12-07 12:33 ` Magnus Lindholm 2024-12-07 12:39 ` John Paul Adrian Glaubitz 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-07 12:33 UTC (permalink / raw) To: John Paul Adrian Glaubitz; +Cc: paulmck, rcu, linux-alpha > > Could this maybe a compiler bug? What about building the kernel with an older GCC version from [1]? > > Adrian > > > [1] https://mirrors.edge.kernel.org/pub/tools/crosstool/ Hi, I'll try to dig up an older build environment and see what happens. Still if it is a compiler bug, I wonder what makes RCU/kernel code trigger this and not other code? I also notices that the interface-renaming rcu problem which occurs in a different part of the RCU code, when "fixing" according to the patch supplied previously in this thread, the code I hit the bug again if I revert from using CFLAGS=-02 to just using CFLAGS=-0. So kernel optimization flags can trigger this bug it seems. /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-07 12:33 ` Magnus Lindholm @ 2024-12-07 12:39 ` John Paul Adrian Glaubitz 2024-12-07 17:33 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: John Paul Adrian Glaubitz @ 2024-12-07 12:39 UTC (permalink / raw) To: Magnus Lindholm; +Cc: paulmck, rcu, linux-alpha Hi Magnus, On Sat, 2024-12-07 at 13:33 +0100, Magnus Lindholm wrote: > > I'll try to dig up an older build environment and see what happens. I would suggest to just cross-compile the kernel from x86_64 using the pre-compiled toolchains from kernel.org that I linked to. > Still if it is a compiler bug, > I wonder what makes RCU/kernel code trigger this and not other code? Well, compiler bugs can trigger the most unusual symptoms and issues, so I wouldn't be too surprised. > I also notices that the interface-renaming rcu problem which occurs in > a different part of the > RCU code, when "fixing" according to the patch supplied previously in > this thread, the code > I hit the bug again if I revert from using CFLAGS=-02 to just using > CFLAGS=-0. So kernel > optimization flags can trigger this bug it seems. Which would even make the compiler more likely to be the suspect. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer `. `' Physicist `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-07 12:39 ` John Paul Adrian Glaubitz @ 2024-12-07 17:33 ` Magnus Lindholm 2024-12-07 18:38 ` John Paul Adrian Glaubitz 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-07 17:33 UTC (permalink / raw) To: John Paul Adrian Glaubitz; +Cc: paulmck, rcu, linux-alpha Hi again, Just a short update: Regarding the SCSI module unload problem, building vanilla kernel-6.12.3: alpha-linux-gcc (Gentoo 14.2.1_p20241026 p3): does NOT work alpha-linux-gcc (GCC) 8.5.0: Works!! alpha-linux-gcc (GCC) 9.5.0: Works!! Regarding the network interface rename/rcu_expedited() problem I've tried gcc-8.5.0, gcc-9.5.0 and Gentoo 14.2.1_p20241026 p3 same results for all three versions, even though the fix is similar, using different compiler versions does not fix the issue. /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-07 17:33 ` Magnus Lindholm @ 2024-12-07 18:38 ` John Paul Adrian Glaubitz 2024-12-08 9:43 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: John Paul Adrian Glaubitz @ 2024-12-07 18:38 UTC (permalink / raw) To: Magnus Lindholm; +Cc: paulmck, rcu, linux-alpha Hi Magnus, On Sat, 2024-12-07 at 18:33 +0100, Magnus Lindholm wrote: > Just a short update: > > Regarding the SCSI module unload problem, building vanilla kernel-6.12.3: > > alpha-linux-gcc (Gentoo 14.2.1_p20241026 p3): does NOT work > alpha-linux-gcc (GCC) 8.5.0: Works!! > alpha-linux-gcc (GCC) 9.5.0: Works!! Interesting. Have you verified that this diagnosis is 100% reproducible? I'm asking since since I have now performed some compiler tests with older GCC versions and the SMP bug [1] is not always reproducible for me. Changing the GCC version does not make any difference. > Regarding the network interface rename/rcu_expedited() problem I've > tried gcc-8.5.0, gcc-9.5.0 and Gentoo 14.2.1_p20241026 p3 same > results for all three versions, even though the fix is similar, using > different compiler versions does not fix the issue. What about here? Is that 100% reproducible? And did you try bisecting this? Adrian > [1] https://bugzilla.kernel.org/show_bug.cgi?id=213143 -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer `. `' Physicist `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-07 18:38 ` John Paul Adrian Glaubitz @ 2024-12-08 9:43 ` Magnus Lindholm 2024-12-08 21:39 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-08 9:43 UTC (permalink / raw) To: John Paul Adrian Glaubitz; +Cc: paulmck, rcu, linux-alpha On Sat, Dec 7, 2024 at 7:38 PM John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote: > > Hi Magnus, > > On Sat, 2024-12-07 at 18:33 +0100, Magnus Lindholm wrote: > > Just a short update: > > > > Regarding the SCSI module unload problem, building vanilla kernel-6.12.3: > > > > alpha-linux-gcc (Gentoo 14.2.1_p20241026 p3): does NOT work > > alpha-linux-gcc (GCC) 8.5.0: Works!! > > alpha-linux-gcc (GCC) 9.5.0: Works!! > > Interesting. Have you verified that this diagnosis is 100% reproducible? I'm sticking to the cross-build compilers provided by kernel.org, this is what I have so far: 14.2.0 does NOT work 10.5.0 does NOT work 10.1.0 does NOT work 9.5.0 works! 8.5.0 works! I've run this in a couple of iterations, on my system, this appears to be 100% consistent. I'm using vanilla kernel source tree 6.12.3 for the tests and the qla1280 scsi driver as an example for the module unload test case. > I'm asking since since I have now performed some compiler tests with older > GCC versions and the SMP bug [1] is not always reproducible for me. > > Changing the GCC version does not make any difference. > > Regarding the network interface rename/rcu_expedited() problem I've > > tried gcc-8.5.0, gcc-9.5.0 and Gentoo 14.2.1_p20241026 p3 same > > results for all three versions, even though the fix is similar, using > > different compiler versions does not fix the issue. > > What about here? Is that 100% reproducible? And did you try bisecting this? It seems consistent also here, but I haven't really tested it enough yet I'll get back to this one again, still working on the module unload case. /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-08 9:43 ` Magnus Lindholm @ 2024-12-08 21:39 ` Magnus Lindholm 2024-12-08 23:18 ` Michael Cree 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-08 21:39 UTC (permalink / raw) To: John Paul Adrian Glaubitz; +Cc: paulmck, rcu, linux-alpha some updates: I've been building gcc from git and this is what I've seen so far: (GCC) 14.2.1 20241208 does NOT work (GCC) 15.0.0 20241208 (experimental): works! Seems like this bug gets fixed in the 15.x branch! I need to find the commit that fixes this... /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-08 21:39 ` Magnus Lindholm @ 2024-12-08 23:18 ` Michael Cree 2024-12-08 23:31 ` John Paul Adrian Glaubitz 2024-12-09 8:05 ` Magnus Lindholm 0 siblings, 2 replies; 62+ messages in thread From: Michael Cree @ 2024-12-08 23:18 UTC (permalink / raw) To: Magnus Lindholm; +Cc: John Paul Adrian Glaubitz, paulmck, rcu, linux-alpha On Sun, Dec 08, 2024 at 10:39:30PM +0100, Magnus Lindholm wrote: > some updates: > > I've been building gcc from git and this is what I've seen so far: > (GCC) 14.2.1 20241208 does NOT work > (GCC) 15.0.0 20241208 (experimental): works! > > Seems like this bug gets fixed in the 15.x branch! > I need to find the commit that fixes this... Maybe, maybe not. Building 6.11.11 with alpha-linux-gnu-gcc-9 (Debian 9.3.0-22) fixed the null pointer access in the scsi subsystem, and finally I have a bootable system! (Tested on ES45.) But I now get a null pointer access when network driver loaded: [ 34.501935] e100 0001:02:04.0 enP1p2s4: NIC Link is Up 100 Mbps Full Duplex [ 40.692361] Unable to handle kernel paging request at virtual address 0000000000000000 [ 40.799783] CPU 1 [ 40.799783] kworker/1:2(158): Oops -1 Cheers, Michael ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-08 23:18 ` Michael Cree @ 2024-12-08 23:31 ` John Paul Adrian Glaubitz 2024-12-09 8:11 ` Magnus Lindholm 2024-12-09 8:05 ` Magnus Lindholm 1 sibling, 1 reply; 62+ messages in thread From: John Paul Adrian Glaubitz @ 2024-12-08 23:31 UTC (permalink / raw) To: Michael Cree, Magnus Lindholm; +Cc: linux-alpha, Maciej W.Rozycki Hi Magnus and Michael, On Mon, 2024-12-09 at 12:18 +1300, Michael Cree wrote: > On Sun, Dec 08, 2024 at 10:39:30PM +0100, Magnus Lindholm wrote: > > some updates: > > > > I've been building gcc from git and this is what I've seen so far: > > (GCC) 14.2.1 20241208 does NOT work > > (GCC) 15.0.0 20241208 (experimental): works! > > > > Seems like this bug gets fixed in the 15.x branch! > > I need to find the commit that fixes this... > > Maybe, maybe not. Building 6.11.11 with alpha-linux-gnu-gcc-9 > (Debian 9.3.0-22) fixed the null pointer access in the scsi > subsystem, and finally I have a bootable system! (Tested on ES45.) > But I now get a null pointer access when network driver loaded: > > [ 34.501935] e100 0001:02:04.0 enP1p2s4: NIC Link is Up 100 Mbps Full Duplex > [ 40.692361] Unable to handle kernel paging request at virtual address 0000000000000000 > [ 40.799783] CPU 1 > [ 40.799783] kworker/1:2(158): Oops -1 Maciej (CC'ed) is currently working on improving the Alpha backend in GCC and he would need access to a real alpha machine to test his patches. Would either of you be able to help him? I have taken my XP1000 out of storage yesterday, but unfortunately it won't power on at the moment as the power supply needs to be serviced. I have a 433au somewhere in my basement as well, but I haven't located it yet. But it would also be better if Maciej could use a faster machine for his tests. Thanks, Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer `. `' Physicist `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-08 23:31 ` John Paul Adrian Glaubitz @ 2024-12-09 8:11 ` Magnus Lindholm 2024-12-12 23:23 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-09 8:11 UTC (permalink / raw) To: John Paul Adrian Glaubitz; +Cc: Michael Cree, linux-alpha, Maciej W.Rozycki > Maciej (CC'ed) is currently working on improving the Alpha backend in GCC and he would > need access to a real alpha machine to test his patches. > > Would either of you be able to help him? I have taken my XP1000 out of storage yesterday, > but unfortunately it won't power on at the moment as the power supply needs to be serviced. I can probably set something up on one of my systems, I'm traveling this week but I can get back to you guys off-list regarding this. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-09 8:11 ` Magnus Lindholm @ 2024-12-12 23:23 ` Magnus Lindholm 0 siblings, 0 replies; 62+ messages in thread From: Magnus Lindholm @ 2024-12-12 23:23 UTC (permalink / raw) To: John Paul Adrian Glaubitz; +Cc: Michael Cree, linux-alpha, Maciej W.Rozycki Hi again, I did a bisec on gcc and found that if commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b is NOT applied will generate faulty code in the RCU implementation on alpha. The code fails since the when the fram pointer register is used for accessing local variables the stack gets corrupted. When the commit is applied, which it is in GCC-15.0.0 from 25th of june 2024 the gcc compiler will generate good code on alpha which mitigates both the rcu_expedited() bug (network interface renaming) as well as the scsi module unload bug. Not sure but GCC versions from 10.x to 14.x seem to suffer from this. Maybe this commit can be back ported to those versions? /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-08 23:18 ` Michael Cree 2024-12-08 23:31 ` John Paul Adrian Glaubitz @ 2024-12-09 8:05 ` Magnus Lindholm 2024-12-16 22:10 ` Michael Cree 1 sibling, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-09 8:05 UTC (permalink / raw) To: Michael Cree, Magnus Lindholm, John Paul Adrian Glaubitz, paulmck, rcu, linux-alpha > Maybe, maybe not. Building 6.11.11 with alpha-linux-gnu-gcc-9 > (Debian 9.3.0-22) fixed the null pointer access in the scsi > subsystem, and finally I have a bootable system! (Tested on ES45.) > But I now get a null pointer access when network driver loaded: Very interesting! Can you provide a full stack dump? interested to see if this is the same issue as on my system. > > [ 34.501935] e100 0001:02:04.0 enP1p2s4: NIC Link is Up 100 Mbps Full Duplex > [ 40.692361] Unable to handle kernel paging request at virtual address 0000000000000000 > [ 40.799783] CPU 1 > [ 40.799783] kworker/1:2(158): Oops -1 > Have you tried passing rcupdate.rcu_normal=1 as boot parameter to the kernel? That allowed me to bypass the network interface problem on my system. > Cheers, > Michael ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-09 8:05 ` Magnus Lindholm @ 2024-12-16 22:10 ` Michael Cree 2024-12-17 6:23 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: Michael Cree @ 2024-12-16 22:10 UTC (permalink / raw) To: Magnus Lindholm; +Cc: John Paul Adrian Glaubitz, paulmck, rcu, linux-alpha On Mon, Dec 09, 2024 at 09:05:46AM +0100, Magnus Lindholm wrote: > > Maybe, maybe not. Building 6.11.11 with alpha-linux-gnu-gcc-9 > > (Debian 9.3.0-22) fixed the null pointer access in the scsi > > subsystem, and finally I have a bootable system! (Tested on ES45.) > > But I now get a null pointer access when network driver loaded: > > Very interesting! Can you provide a full stack dump? interested to > see if this is the same issue as on my system. [ 33.144514] e100 0001:02:04.0 enP1p2s4: NIC Link is Up 100 Mbps Full Duplex [ 36.878887] Unable to handle kernel paging request at virtual address 0000000000000000 [ 36.879863] CPU 0 [ 36.879863] kworker/0:1(10): Oops -1 [ 36.881817] pc = [<0000000000000000>] ra = [<0000000000000001>] ps = 0000 Not tainted [ 36.882793] pc is at 0x0 [ 36.883770] ra is at 0x1 [ 36.884746] v0 = 0000000000000007 t0 = 0000000000000001 t1 = fffffc000091cc80 [ 36.884746] t2 = 0000000000000000 t3 = fffffc00040cca40 t4 = 0000000000000001 [ 36.884746] t5 = 0000000000000001 t6 = 0000000000000000 t7 = fffffc0004124000 [ 36.884746] s0 = fffffc000a037e00 s1 = fffffc000a037f28 s2 = fffffffffffffed8 [ 36.884746] s3 = fffffc0004020b40 s4 = fffffc0001c8d688 s5 = fffffc0004020bc0 [ 36.884746] s6 = fffffc0001c8d680 [ 36.884746] a0 = fffffc0001c5b320 a1 = fffffc000ac14200 a2 = fffffc00040ccac0 [ 36.884746] a3 = fffffc0003df36c0 a4 = fffffc000dfb3d38 a5 = ffffffffffffffff [ 36.884746] t8 = 0000000000000000 t9 = 0000000000000000 t10= 0000000000000001 [ 36.884746] t11= 0000000000000001 pv = fffffc00018b7180 at = 0000000000000001 [ 36.884746] gp = fffffc0001cb08c8 sp = 00000000ab1ca88d [ 36.884746] Disabling lock debugging due to kernel taint [ 36.896465] Trace: [ 36.896465] [<fffffc0001049c8c>] process_scheduled_works+0xdc/0x420 [ 36.898418] [<fffffc000104a740>] worker_thread+0x0/0x3d0 [ 36.899395] [<fffffc000104a8f0>] worker_thread+0x1b0/0x3d0 [ 36.900371] [<fffffc000104a740>] worker_thread+0x0/0x3d0 [ 36.901348] [<fffffc000105705c>] kthread+0x17c/0x1c0 [ 36.902324] [<fffffc000104a740>] worker_thread+0x0/0x3d0 [ 36.903301] [<fffffc0001011198>] ret_from_kernel_thread+0x18/0x20 [ 36.904277] [<fffffc0001056ee0>] kthread+0x0/0x1c0 [ 36.905254] [<fffffc0001011180>] ret_from_kernel_thread+0x0/0x20 [ 36.907207] Code: [ 36.907207] 00000000 [ 36.908184] 00000000 [ 36.909160] 00063301 [ 36.910137] 00000e90 [ 36.911113] 00001111 [ 36.912090] 0000795c > Have you tried passing rcupdate.rcu_normal=1 as boot parameter > to the kernel? That allowed me to bypass the network interface > problem on my system. Haven't as yet, but backtrace above does not go through rcu routines. Maybe I am seeing a different issue. Cheers, Michael. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-16 22:10 ` Michael Cree @ 2024-12-17 6:23 ` Magnus Lindholm 2024-12-18 19:33 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-17 6:23 UTC (permalink / raw) To: Michael Cree, Magnus Lindholm, John Paul Adrian Glaubitz, paulmck, rcu, linux-alpha > [ 36.896465] Trace: > [ 36.896465] [<fffffc0001049c8c>] process_scheduled_works+0xdc/0x420 > [ 36.898418] [<fffffc000104a740>] worker_thread+0x0/0x3d0 > [ 36.899395] [<fffffc000104a8f0>] worker_thread+0x1b0/0x3d0 > [ 36.900371] [<fffffc000104a740>] worker_thread+0x0/0x3d0 > [ 36.901348] [<fffffc000105705c>] kthread+0x17c/0x1c0 > [ 36.902324] [<fffffc000104a740>] worker_thread+0x0/0x3d0 > [ 36.903301] [<fffffc0001011198>] ret_from_kernel_thread+0x18/0x20 > [ 36.904277] [<fffffc0001056ee0>] kthread+0x0/0x1c0 > [ 36.905254] [<fffffc0001011180>] ret_from_kernel_thread+0x0/0x20 Very interesting, both the RCU bugs that I've encountered, kernel kernel threads have been at work just as the stack goes corrupted The stack trace is therefore often incomplete since return addresses have been overwritten by some one of the threads, some information on where this actually happens might therefore be lost on the stack trace. Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-17 6:23 ` Magnus Lindholm @ 2024-12-18 19:33 ` Magnus Lindholm 2024-12-18 20:31 ` Paul E. McKenney 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-18 19:33 UTC (permalink / raw) To: Michael Cree, Magnus Lindholm, John Paul Adrian Glaubitz, paulmck, rcu, linux-alpha Cc: Maciej W. Rozycki Hi again, So, various versions of GCC can trigger/untrigger the RCU bugs that I've been hitting. This may of course still be due to GCC miscompiling some code on alpha but that said I've taken another look at the code involved when this is triggered. From what I've seen so far, this happens when kernel threads are used in the code to access structured data stored on the stack (structs). The quote below is from a comment in the kernel code (include/linux/percpu-defs.h) "s390 and alpha modules require percpu variables to be defined as weak to force the compiler to generate GOT based external references for them. This is necessary because percpu sections will be located outside of the usually addressable area. This definition puts the following two extra restrictions when defining percpu variables. 1. The symbol must be globally unique, even the static ones. 2. Static percpu variables cannot be defined inside a function." Taking these notes into account I've made some minor modifications to the code (briefly described below). The modifications involve declaring some structs previously placed on stack as DEFINE_PER_CPU_SHARED_ALIGNED. This is already done for some strucs in rcu/smp code. Making these modifications fixes the problem and I can build the kernel with GCC versions that previously triggered the bug. The same approach fixed both the network interface-rename bug as well as the scsi module unload bug. in kernel/smp/smp.c ------------------- #if defined(ARCH_NEEDS_WEAK_PER_CPU) smp_call_function_single(...) use csd = this_cpu_ptr(&csd_data) regardless if its called with wait = 0 or 1. Make sure to declare csd_data as "DEFINE_PER_CPU_SHARED_ALIGNED" #endif in kernel/rcu/tree.c -------------------- #use rew_data instead of stack allocated struct static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_exp_work, rew_data); in kernel/rcu/tree_exp.h ------------------------ #if defined(ARCH_NEEDS_WEAK_PER_CPU) void synchronize_rcu_expedited(void) in stead of: struct rcu_exp_work rew; do: #if defined(ARCH_NEEDS_WEAK_PER_CPU) struct rcu_exp_work *rewp; rewp=this_cpu_ptr(&rew_data); rew_data is declared "DEFINE_PER_CPU_SHARED_ALIGNED" #endif /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-18 19:33 ` Magnus Lindholm @ 2024-12-18 20:31 ` Paul E. McKenney 2024-12-18 21:54 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: Paul E. McKenney @ 2024-12-18 20:31 UTC (permalink / raw) To: Magnus Lindholm Cc: Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha, Maciej W. Rozycki On Wed, Dec 18, 2024 at 08:33:09PM +0100, Magnus Lindholm wrote: > Hi again, > > So, various versions of GCC can trigger/untrigger the RCU bugs that > I've been hitting. This may of course still be due to GCC miscompiling > some code on alpha but that said I've taken another look at the code > involved when this is triggered. From what I've seen so far, this > happens when kernel threads are used in the code to access structured > data stored on the stack (structs). The quote below is from a comment > in the kernel code (include/linux/percpu-defs.h) > > "s390 and alpha modules require percpu variables to be defined as weak > to force the compiler to generate GOT based external references for > them. This is necessary because percpu sections will be located > outside of the usually addressable area. This definition puts the > following two extra restrictions when defining percpu variables. > > 1. The symbol must be globally unique, even the static ones. > > 2. Static percpu variables cannot be defined inside a function." > > > Taking these notes into account I've made some minor modifications to > the code (briefly described below). The modifications involve > declaring some structs previously placed on stack as > DEFINE_PER_CPU_SHARED_ALIGNED. This is already done for some strucs in > rcu/smp code. Making these modifications fixes the problem and I can > build the kernel with GCC versions that previously triggered the bug. > The same approach fixed both the network interface-rename bug as well > as the scsi module unload bug. > > in kernel/smp/smp.c > ------------------- > #if defined(ARCH_NEEDS_WEAK_PER_CPU) > smp_call_function_single(...) use csd = this_cpu_ptr(&csd_data) > regardless if its called with wait = 0 or 1. Make sure to declare > csd_data as "DEFINE_PER_CPU_SHARED_ALIGNED" > #endif If I understand your proposed changes correctly, this could increase the amount of time CPUs spend waiting for for the csd_data to become free. This would be a very unwelcome development in some quarters. > in kernel/rcu/tree.c > -------------------- > #use rew_data instead of stack allocated struct > static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_exp_work, rew_data); I am not finding this one. > in kernel/rcu/tree_exp.h > ------------------------ > #if defined(ARCH_NEEDS_WEAK_PER_CPU) > void synchronize_rcu_expedited(void) > > in stead of: > struct rcu_exp_work rew; > > do: > #if defined(ARCH_NEEDS_WEAK_PER_CPU) > struct rcu_exp_work *rewp; > rewp=this_cpu_ptr(&rew_data); > > rew_data is declared "DEFINE_PER_CPU_SHARED_ALIGNED" > > #endif OK, I *can* find this one, thank you. At least assuming you mean the local variable "rew" defined in synchronize_rcu_expedited(). I don't see how this relates to the per-CPU variable restrictions called out above relates here. We do not have any sort of per-CPU variable here, but instead a simple structure allocated on the stack. And a relatively small structure at that. In addition, making this be a per-CPU variable will break if a given CPU invokes a second synchronize_rcu_expedited() before its first call to synchronize_rcu_expedited() returns. You will end up with the second call clobbering that per-CPU variable, which will likely fatally confuse the workqueue when attempting to manage the resulting workqueue handlers. You might need heavy load to make this happen, but I don't see anything preventing it from happening. So what am I missing here? Thanx, Paul ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-18 20:31 ` Paul E. McKenney @ 2024-12-18 21:54 ` Magnus Lindholm 2024-12-18 22:50 ` Paul E. McKenney 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-18 21:54 UTC (permalink / raw) To: paulmck Cc: Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha, Maciej W. Rozycki > If I understand your proposed changes correctly, this could increase > the amount of time CPUs spend waiting for for the csd_data to become free. > This would be a very unwelcome development in some quarters. you're probably right, this might not be the right approach. > preventing it from happening. > > So what am I missing here? Maybe a per-cpu variable is the wrong way to go about it. I'm trying to get a better understanding of why, even quite simple, structs get corrupted when accessed by kernel threads. The compiler may be at fault, but is it possible that there is something else going on, which is specific to alpha and we're seeing more of this on more recent kernels which makes better use of threads? /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-18 21:54 ` Magnus Lindholm @ 2024-12-18 22:50 ` Paul E. McKenney 2024-12-19 22:38 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: Paul E. McKenney @ 2024-12-18 22:50 UTC (permalink / raw) To: Magnus Lindholm Cc: Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha, Maciej W. Rozycki On Wed, Dec 18, 2024 at 10:54:21PM +0100, Magnus Lindholm wrote: > > If I understand your proposed changes correctly, this could increase > > the amount of time CPUs spend waiting for for the csd_data to become free. > > This would be a very unwelcome development in some quarters. > > you're probably right, this might not be the right approach. > > > preventing it from happening. > > > > So what am I missing here? > > Maybe a per-cpu variable is the wrong way to go about it. I'm trying to get a > better understanding of why, even quite simple, structs get corrupted when > accessed by kernel threads. The compiler may be at fault, but is it possible > that there is something else going on, which is specific to alpha and we're > seeing more of this on more recent kernels which makes better use of > threads? Are the s390 guys seeing this problem? Maybe they saw it in the past and did something to work around it. In the past, there have been compiler optimizations that would use a large store to update a smaller variable, and then do small stores to fix up the clobbered areas. These were all supposed to have been removed for C11. Might some linger in Alpha's compilers? Thanx, Paul ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-18 22:50 ` Paul E. McKenney @ 2024-12-19 22:38 ` Magnus Lindholm 2024-12-19 23:03 ` Paul E. McKenney 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-19 22:38 UTC (permalink / raw) To: paulmck Cc: Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha, Maciej W. Rozycki > In the past, there have been compiler optimizations that would use a large > store to update a smaller variable, and then do small stores to fix up the > clobbered areas. These were all supposed to have been removed for C11. > Might some linger in Alpha's compilers? Hi again, Making just these (see below) one-liner changes to tree_exp.h and smp.c respectively, lets me boot kernel 6.12.5 on alpha without passing any special rcu parameters to the kernel and it also lets me load/unload my scsi kernel module. Is the alignment of structs just giving gcc more room to clobber the stack without actually hitting anything important or is it relevant for how kernel threads access structs on weak memory-model architectures? /Magnus +++ kernel/rcu/tree_exp.h 2024-12-19 18:55:59.091893649 +0100 @@ -940,10 +939,10 @@ void synchronize_rcu_expedited(void) { unsigned long flags; - struct rcu_exp_work rew; + struct ____cacheline_aligned_in_smp rcu_exp_work rew; struct rcu_node *rnp; unsigned long s; +++ kernel/smp.c 2024-12-19 19:01:20.592819628 +0100 @@ -631,7 +631,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info, int wait) { call_single_data_t *csd; - call_single_data_t csd_stack = { + struct ____cacheline_aligned_in_smp __call_single_data csd_stack = { .node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, }, }; int this_cpu; ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-19 22:38 ` Magnus Lindholm @ 2024-12-19 23:03 ` Paul E. McKenney 2024-12-20 0:00 ` Maciej W. Rozycki 0 siblings, 1 reply; 62+ messages in thread From: Paul E. McKenney @ 2024-12-19 23:03 UTC (permalink / raw) To: Magnus Lindholm Cc: Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha, Maciej W. Rozycki On Thu, Dec 19, 2024 at 11:38:09PM +0100, Magnus Lindholm wrote: > > In the past, there have been compiler optimizations that would use a large > > store to update a smaller variable, and then do small stores to fix up the > > clobbered areas. These were all supposed to have been removed for C11. > > Might some linger in Alpha's compilers? > > > Hi again, > > Making just these (see below) one-liner changes to tree_exp.h and > smp.c respectively, lets me boot kernel 6.12.5 on alpha without > passing any special rcu parameters to the kernel and it also lets me > load/unload my scsi kernel module. Is the alignment of structs just > giving gcc more room to clobber the stack without actually hitting > anything important or is it relevant for how kernel threads access > structs on weak memory-model architectures? This looks like the aforementioned bug in Alpha's compilers, namely failing to comply with C11's restrictions against compiler-induced data races. I can't say that I am at all excited about accepting these changes because they increase stack size. The best thing would of course be to fix the compiler. If that cannot be done, why not just carry these patches? Thanx, Paul > /Magnus > > +++ kernel/rcu/tree_exp.h 2024-12-19 18:55:59.091893649 +0100 > @@ -940,10 +939,10 @@ > void synchronize_rcu_expedited(void) > { > unsigned long flags; > - struct rcu_exp_work rew; > + struct ____cacheline_aligned_in_smp rcu_exp_work rew; > struct rcu_node *rnp; > unsigned long s; > > > +++ kernel/smp.c 2024-12-19 19:01:20.592819628 +0100 > @@ -631,7 +631,7 @@ > int smp_call_function_single(int cpu, smp_call_func_t func, void *info, > int wait) > { > call_single_data_t *csd; > - call_single_data_t csd_stack = { > + struct ____cacheline_aligned_in_smp __call_single_data csd_stack = { > .node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, }, > }; > int this_cpu; ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-19 23:03 ` Paul E. McKenney @ 2024-12-20 0:00 ` Maciej W. Rozycki 2024-12-27 10:42 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: Maciej W. Rozycki @ 2024-12-20 0:00 UTC (permalink / raw) To: Paul E. McKenney, Magnus Lindholm Cc: Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Thu, 19 Dec 2024, Paul E. McKenney wrote: > > Making just these (see below) one-liner changes to tree_exp.h and > > smp.c respectively, lets me boot kernel 6.12.5 on alpha without > > passing any special rcu parameters to the kernel and it also lets me > > load/unload my scsi kernel module. Is the alignment of structs just > > giving gcc more room to clobber the stack without actually hitting > > anything important or is it relevant for how kernel threads access > > structs on weak memory-model architectures? > > This looks like the aforementioned bug in Alpha's compilers, namely > failing to comply with C11's restrictions against compiler-induced > data races. I can't say that I am at all excited about accepting these > changes because they increase stack size. > > The best thing would of course be to fix the compiler. If that cannot > be done, why not just carry these patches? Right. Magnus, has your kernel been built with compiler options implying BWX support? If not, can you please rebuild it accordingly and see if it changes anything? Also a data race between RMW accesses can't be ruled out even with BWX Alphas, because GCC insists on producing those sequences, as I discovered in the course of implementing said GCC fix for data safety[1]. For BWX use it should be ready to build a working kernel right away, because no unaligned LL/SC emulation is required, so Magnus, can you please try the patchset out in the second step and see if it makes any change? Of course it might break things horribly too, as I still haven't got to verifying the BWX side beyond the assembly pattern match snippets in the GCC testsuite (to be done hopefully in the next couple of weeks). References: [1] "Fix data races with sub-longword accesses on Alpha", <https://inbox.sourceware.org/gcc-patches/alpine.DEB.2.21.2411141652300.9262@angie.orcam.me.uk/> Maciej ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-20 0:00 ` Maciej W. Rozycki @ 2024-12-27 10:42 ` Magnus Lindholm 2024-12-27 11:48 ` John Paul Adrian Glaubitz 2024-12-27 16:30 ` Maciej W. Rozycki 0 siblings, 2 replies; 62+ messages in thread From: Magnus Lindholm @ 2024-12-27 10:42 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha > > The best thing would of course be to fix the compiler. If that cannot > > be done, why not just carry these patches? > > Right. Magnus, has your kernel been built with compiler options implying > BWX support? If not, can you please rebuild it accordingly and see if it > changes anything? > > Also a data race between RMW accesses can't be ruled out even with BWX > Alphas, because GCC insists on producing those sequences, as I discovered > in the course of implementing said GCC fix for data safety[1]. For BWX > use it should be ready to build a working kernel right away, because no > unaligned LL/SC emulation is required, so Magnus, can you please try the > patchset out in the second step and see if it makes any change? > > Of course it might break things horribly too, as I still haven't got to > verifying the BWX side beyond the assembly pattern match snippets in the > GCC testsuite (to be done hopefully in the next couple of weeks). Hi, I've done some more testing last couple of days and it seems like applying the one-liner "fix" to smp.c (alignment of csd_stack in function smp_call_function_single) is sufficient to mitigate both rcu related bugs (the bugs are not even rcu related). I guess it's pretty simple to just carry this patch until we figure out the root cause. Either way, I've tried to get a better understanding of what gcc is doing differently in the two cases: 1) The code generated by gcc from smp.c reserves 96 bytes of stack space but places csd_stack struct on $sp+79. Since sizeof csd_stack is 32 bytes, it seems to me that [($sp+79) & NOT(0x1f) + sizeof(call_single_data_t)] might be greater than "96+$sp" if say, bit 3 and 4 are set in $sp? or am I missing something here? --------------------------- lda sp,-96(sp) ... lda s0,79(sp) ... andnot s0,0x1f,s0 ... stq zero,8(s0) stq zero,0(s0) stq zero,16(s0) stq zero,24(s0) stl t0,8(s0) [.node = CSD_FLAG_LOCK | CSD_TYPE_SYNC] 2) Using cacheline_aligned_in_smp when declaring csd_stack in smp_call_function_single will actually reserve less stack space (80 bytes in stead of 96), csd_stack is referenced directly using $sp. Maybe alignment is just a way to simplify things for gcc and avoid hitting compiler bugs? -------------------------- lda sp,-80(sp) stq zero,48(sp) stq zero,64(sp) stq zero,72(sp) stl t0,56(sp) [.node = CSD_FLAG_LOCK | CSD_TYPE_SYNC] I've made numerous attempts with different versions of GCC, including the most recent git version (with and without the patches from Maciej) and they give similar results, even though the exact amount of stackspace reserved, registers used, and placement of csd_stack struct will differ somewhat. (GCC) 15.0.0 20241225, with Maciej patches applied, will produce the code below: lda sp,-112(sp) lda t1,47(sp) andnot t1,0x1f,t1 ... Which boots and lets me load/unload my scsi kernel moduel, but just adding some debug print statement to smp_call_function_single will again give a kernel null pointer exception. Printing the value of &csd seems to allocate space for csd on the stack instead of keeping it in registers which will later trigger a null pointer excepting when accessed. To me it seems like this just moves around the stack clobbering problem? CPU 1 rmmod(1444): Oops 1 pc = [<fffffc000078e818>] ra = [<fffffc00003dd0f8>] ps = 0000 Not tainted pc is at llist_add_batch+0x8/0x50 ra is at __smp_call_single_queue+0x38/0xa0 v0 = 0000000000000000 t0 = fffffc0000e2b100 t1 = fffffc0000ec4048 t2 = 0000000000000000 t3 = fffffc0000ec4048 t4 = 0000000000000000 t5 = 0000000000000001 t6 = ffffffffffffffec t7 = fffffc0005d4c000 s0 = 0000000000000000 s1 = 0000000000000001 s2 = 0000000000000001 s3 = 0000000000000001 s4 = fffffc0000cd0330 s5 = fffffc000020ee80 s6 = 00000200010422a0 a0 = 0000000000000000 a1 = 0000000000000000 a2 = fffffc000020f100 a3 = fffffc0005d4fa28 a4 = ffff1020ffffff00 a5 = 0000000000000000 t8 = 0000000000000001 t9 = 0000000000000001 t10= 0000000000000000 t11= 0000000000000000 pv = fffffc000078e810 at = 0000000000000000 gp = fffffc0000e9c980 sp = 00000000905861a6 Disabling lock debugging due to kernel taint Trace: [<fffffc00003dd1bc>] generic_exec_single+0x5c/0x150 [<fffffc00003dd3ec>] smp_call_function_single+0x13c/0x220 [<fffffc000082ceec>] device_release+0x3c/0xf0 [<fffffc00003ae178>] rcu_barrier+0x1b8/0x4d0 [<fffffc00003aaa30>] rcu_barrier_handler+0x0/0x120 [<fffffc00003aaa30>] rcu_barrier_handler+0x0/0x120 [<fffffc0000858418>] scsi_host_dev_release+0x58/0x170 [<fffffc000082cf04>] device_release+0x54/0xf0 [<fffffc0000b501f0>] kobject_put+0x90/0x1b0 [<fffffc000082d0fc>] put_device+0x1c/0x30 [<fffffc00008583ac>] scsi_host_put+0x1c/0x30 [<fffffc00007b9694>] pci_device_remove+0x34/0x90 [<fffffc0000838284>] device_remove+0x64/0xb0 [<fffffc0000839d24>] device_release_driver_internal+0x284/0x370 [<fffffc0000839ecc>] driver_detach+0x7c/0x110 [<fffffc00008377e8>] bus_remove_driver+0x98/0x160 [<fffffc000083a754>] driver_unregister+0x44/0xa0 [<fffffc00007b94e8>] pci_unregister_driver+0x38/0xd0 [<fffffc00003be264>] sys_delete_module+0x174/0x2f0 [<fffffc000031095c>] entMM+0x9c/0xc0 [<fffffc0000310d04>] entSys+0xa4/0xc0 [<fffffc0000310d04>] entSys+0xa4/0xc0 Code: f43ffffb 6bfa8001 47ff041f 2ffe0000 a4120000 <--- ldq v0,0(a2) (*first = READ_ONCE(head->first);) 60004000 <--- mb <b4110000> <--- stq v0,0(a1) (new_last->next = first;) 60004000 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-27 10:42 ` Magnus Lindholm @ 2024-12-27 11:48 ` John Paul Adrian Glaubitz 2024-12-27 16:30 ` Maciej W. Rozycki 1 sibling, 0 replies; 62+ messages in thread From: John Paul Adrian Glaubitz @ 2024-12-27 11:48 UTC (permalink / raw) To: Magnus Lindholm, Maciej W. Rozycki Cc: Paul E. McKenney, Michael Cree, rcu, linux-alpha Hi Magnus, On Fri, 2024-12-27 at 11:42 +0100, Magnus Lindholm wrote: > > > I've done some more testing last couple of days and it seems like > applying the one-liner "fix" to smp.c (alignment of csd_stack in > function smp_call_function_single) is sufficient to mitigate both rcu > related bugs (the bugs are not even rcu related). Could you post the patch? I would like to give it a try myself. Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer `. `' Physicist `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-27 10:42 ` Magnus Lindholm 2024-12-27 11:48 ` John Paul Adrian Glaubitz @ 2024-12-27 16:30 ` Maciej W. Rozycki 2024-12-31 10:43 ` Magnus Lindholm 1 sibling, 1 reply; 62+ messages in thread From: Maciej W. Rozycki @ 2024-12-27 16:30 UTC (permalink / raw) To: Magnus Lindholm Cc: Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Fri, 27 Dec 2024, Magnus Lindholm wrote: > The code generated by gcc from smp.c reserves 96 bytes of stack space > but places csd_stack struct on $sp+79. Since sizeof csd_stack is 32 > bytes, it seems to me that [($sp+79) & NOT(0x1f) + > sizeof(call_single_data_t)] might be greater than "96+$sp" if say, bit > 3 and 4 are set in $sp? or am I missing something here? Umm, no. The psABI guarantees 16-byte alignment for the stack pointer, and under this condition (((x - 17) & ~31) + 32 <= x) is guaranteed to be true (except for the overflow case, of course, which does not apply here). > --------------------------- > lda sp,-96(sp) > ... > lda s0,79(sp) > ... > andnot s0,0x1f,s0 > ... > stq zero,8(s0) > stq zero,0(s0) > stq zero,16(s0) > stq zero,24(s0) > stl t0,8(s0) [.node = CSD_FLAG_LOCK | CSD_TYPE_SYNC] So `csd_stack' isn't at ($old_sp + 79) but rather (($old_sp + 79) & ~31), which is just a way to guarantee 32-byte alignment for the data object where the psABI only guarantees 16-byte alignment for stack allocations. > 2) > Using cacheline_aligned_in_smp when declaring csd_stack in > smp_call_function_single will actually reserve less stack space (80 > bytes in stead of 96), csd_stack is referenced directly using $sp. > Maybe alignment is just a way to simplify things for gcc and avoid > hitting compiler bugs? For the Alpha port generic definitions are used, so I wouldn't draw such conclusions. It's just how things work out by default, and objects of the `call_single_data_t' type are aligned to their size, which is 32, while ____cacheline_aligned_in_smp requests alignment to either 32 or 64 bytes, depending on the kernel configuration... > -------------------------- > lda sp,-80(sp) > stq zero,48(sp) > stq zero,64(sp) > stq zero,72(sp) > stl t0,56(sp) [.node = CSD_FLAG_LOCK | CSD_TYPE_SYNC] ... so I don't really know why it causes the alignment to be decreased here back to the psABI value of 16 bytes. > I've made numerous attempts with different versions of GCC, including > the most recent git version (with and without the patches from Maciej) > and they give similar results, even though the exact amount of > stackspace reserved, registers used, and placement of csd_stack struct > will differ somewhat. (GCC) 15.0.0 20241225, with Maciej patches > applied, will produce the code below: > > lda sp,-112(sp) > lda t1,47(sp) > andnot t1,0x1f,t1 > ... > > Which boots and lets me load/unload my scsi kernel moduel, but just > adding some debug print statement to smp_call_function_single will > again give a kernel null pointer exception. Printing the value of &csd > seems to allocate space for csd on the stack instead of keeping it in > registers which will later trigger a null pointer excepting when > accessed. To me it seems like this just moves around the stack > clobbering problem? A code generation bug cannot be ruled out of course. But it might be an ISO C compliance bug too that just happens to trigger for this particular scenario. > CPU 1 > rmmod(1444): Oops 1 > pc = [<fffffc000078e818>] ra = [<fffffc00003dd0f8>] ps = 0000 Not tainted > pc is at llist_add_batch+0x8/0x50 > ra is at __smp_call_single_queue+0x38/0xa0 > v0 = 0000000000000000 t0 = fffffc0000e2b100 t1 = fffffc0000ec4048 > t2 = 0000000000000000 t3 = fffffc0000ec4048 t4 = 0000000000000000 > t5 = 0000000000000001 t6 = ffffffffffffffec t7 = fffffc0005d4c000 > s0 = 0000000000000000 s1 = 0000000000000001 s2 = 0000000000000001 > s3 = 0000000000000001 s4 = fffffc0000cd0330 s5 = fffffc000020ee80 > s6 = 00000200010422a0 > a0 = 0000000000000000 a1 = 0000000000000000 a2 = fffffc000020f100 > a3 = fffffc0005d4fa28 a4 = ffff1020ffffff00 a5 = 0000000000000000 > t8 = 0000000000000001 t9 = 0000000000000001 t10= 0000000000000000 > t11= 0000000000000000 pv = fffffc000078e810 at = 0000000000000000 > gp = fffffc0000e9c980 sp = 00000000905861a6 > Disabling lock debugging due to kernel taint > Trace: > [<fffffc00003dd1bc>] generic_exec_single+0x5c/0x150 > [<fffffc00003dd3ec>] smp_call_function_single+0x13c/0x220 > [<fffffc000082ceec>] device_release+0x3c/0xf0 > [<fffffc00003ae178>] rcu_barrier+0x1b8/0x4d0 > [<fffffc00003aaa30>] rcu_barrier_handler+0x0/0x120 > [<fffffc00003aaa30>] rcu_barrier_handler+0x0/0x120 > [<fffffc0000858418>] scsi_host_dev_release+0x58/0x170 > [<fffffc000082cf04>] device_release+0x54/0xf0 > [<fffffc0000b501f0>] kobject_put+0x90/0x1b0 > [<fffffc000082d0fc>] put_device+0x1c/0x30 > [<fffffc00008583ac>] scsi_host_put+0x1c/0x30 > [<fffffc00007b9694>] pci_device_remove+0x34/0x90 > [<fffffc0000838284>] device_remove+0x64/0xb0 > [<fffffc0000839d24>] device_release_driver_internal+0x284/0x370 > [<fffffc0000839ecc>] driver_detach+0x7c/0x110 > [<fffffc00008377e8>] bus_remove_driver+0x98/0x160 > [<fffffc000083a754>] driver_unregister+0x44/0xa0 > [<fffffc00007b94e8>] pci_unregister_driver+0x38/0xd0 > [<fffffc00003be264>] sys_delete_module+0x174/0x2f0 > [<fffffc000031095c>] entMM+0x9c/0xc0 > [<fffffc0000310d04>] entSys+0xa4/0xc0 > [<fffffc0000310d04>] entSys+0xa4/0xc0 > > Code: > f43ffffb > 6bfa8001 > 47ff041f > 2ffe0000 > a4120000 <--- ldq v0,0(a2) (*first = READ_ONCE(head->first);) > 60004000 <--- mb > <b4110000> <--- stq v0,0(a1) (new_last->next = first;) > 60004000 So `__smp_call_single_queue' is called with NULL `node'. Would you be able to trace it back further, e.g. by adding BUG_ON(!node) to `__smp_call_single_queue' and so on if required, to see where this NULL pointer comes from originally? I do hope such a minimal probe won't disturb code generation enough for this to become a heisenbug. Maciej ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-27 16:30 ` Maciej W. Rozycki @ 2024-12-31 10:43 ` Magnus Lindholm 2025-01-12 23:25 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2024-12-31 10:43 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha > Umm, no. The psABI guarantees 16-byte alignment for the stack pointer, > and under this condition (((x - 17) & ~31) + 32 <= x) is guaranteed to be > true (except for the overflow case, of course, which does not apply here). aha! that explains it! thanks, is the psABI available somewhere? > > Would you be able to trace it back further, e.g. by adding BUG_ON(!node) > to `__smp_call_single_queue' and so on if required, to see where this NULL > pointer comes from originally? I do hope such a minimal probe won't > disturb code generation enough for this to become a heisenbug. > Hi, below are some additional test that I've made, It seems to me that part of the stack is overwritten with the values of other local variables. Previously this affected the return address on the stack causing kernels Oops on function return (see previous mail in thread). In this run it seems like when the pointer *csd in smp_call_function_single is stored on the stack, it gets overwritten by writes to csd_stack.info. The difference here is that I use GCC 15.0.0 20241225 (experimental) instead of gcc (Gentoo 14.2.1_p20241116 p3) 14.2.1. To me, this looks like the same problem but the clobbering just hits a different part of the stack. Below is some debug-output, where I've added some print statements to the code in smp_call_function_single of smp.c. When csd_stack is declared as "struct ____cacheline_aligned_in_smp __call_single_data csd_stack", this first case is the case where the code works: unloading the scsi module: -------------------------------------------------- smp: &csd_stack.info=fffffc000493fd90 &csd=fffffc000493fd98 smp: &csd_stack.info=fffffc000493fd90 &csd=fffffc000493fd98 sd 6:0:1:0: [sdb] Synchronizing SCSI cache rcu: rcu_barrier: cpu=0 smp: &csd_stack.info=fffffc000935bc50 &csd=fffffc000935bc58 rcu: rcu_barrier: cpu=1 smp: &csd_stack.info=fffffc000935bc50 &csd=fffffc000935bc58 rcu: rcu_barrier: cpu=2 smp: &csd_stack.info=fffffc000935bc50 &csd=fffffc000935bc58 smp: generic_exec_single: csd=fffffc000935bc38 cpu=2 smp_cpu=2 Below is the same debug output when csd_stack is declared as "call_single_data_t csd_stack" (i.e. no patch applied). For some reason, in this case, the address of the csd variable is the same as the address of csd_stack.info. If this is really the case, no wonder that a write to csd_stack.info will overwrite the csd pointer. In this case the code fails according to below: unloading the scsi module: ----------------------------------------- smp: &csd_stack.info=fffffc000493fd98 &csd=fffffc000493fd98 smp: smp_call_function_single: not wait smp_cpu=1 sd 6:0:1:0: [sdb] Synchronizing SCSI cache rcu: rcu_barrier: cpu=0 smp: &csd_stack.info=fffffc0006207c58 &csd=fffffc0006207c58 smp: generic_exec_single: csd=fffffc0006207c40 cpu=0 smp_cpu=0 Unable to handle kernel paging request at virtual address 0000000000000008 CPU 0 rmmod(1443): Oops 0 pc = [<fffffc00003dd564>] ra = [<fffffc00003dd558>] ps = 0000 Not tainted pc is at smp_call_function_single+0x204/0x220 ra is at smp_call_function_single+0x1f8/0x220 Below is yet another test, here the code works, csd_stack is declared as "call_single_data_t csd_stack" (i.e. no patch applied). In this example the code works since I've added some extra "dummy variables" on the stack which seems to steer things around enough. Here it's also clear that the address of csd does not overlap with the address of csd_stack.info. test0 and test1 are just the extra local variables that I've added. ----------------------------------------- smp: &csd_stack.info=fffffc000493fd78 &csd=fffffc000493fd90 smp: smp_call_function_single: not wait smp_cpu=1 smp: &test0=fffffc000493fd98 smp: &test1=fffffc000493fd88 sd 6:0:1:0: [sdb] Synchronizing SCSI cache rcu: rcu_barrier: cpu=0 smp: &csd_stack.info=fffffc0009e07c38 &csd=fffffc0009e07c50 smp: &test0=fffffc0009e07c58 smp: &test1=fffffc0009e07c48 smp: generic_exec_single: csd=fffffc0009e07c20 cpu=0 smp_cpu=0 Patch I used to "fix" kernel/smp.c ---------------------------------------------------- +++ kernel/smp.c 2024-12-19 19:01:20.592819628 +0100 @@ -631,7 +631,7 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info, int wait) { call_single_data_t *csd; - call_single_data_t csd_stack = { + struct ____cacheline_aligned_in_smp __call_single_data csd_stack = { .node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, }, }; int this_cpu; /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2024-12-31 10:43 ` Magnus Lindholm @ 2025-01-12 23:25 ` Magnus Lindholm 2025-01-13 0:19 ` Maciej W. Rozycki 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2025-01-12 23:25 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha Hi again. I've been running some more tests, this time with a smp kernel but on a system with just one cpu, seems to me as a bit simpler scenario to analyze. I've added some print statements to smp_call_function_single, just to see what's really going on: pr_warn("smp_call_function_single: %llx %llx size=%d\n",&csd_stack,&csd, sizeof(call_single_data_t)); output is seen below: smp: smp_call_function_single: fffffc000493fc40 fffffc000493fc58 size=32 so, the csd_stack struct is 32-bytes in size but &csd - &csd_stack = 24. This does not make any sense? pr_warn("\n&csd_stack.info=%lx\n&csd=%lx\n",&csd_stack.info,&csd); output according to below: smp: &csd_stack.info=fffffc000493fc58 &csd=fffffc000493fc58 Here csd variable has the same address on the stack as csd_stack.info. Using above information and locking at the disassembly of smp_call_function_single in smp.o I've put together the following table mapping out the stack of smp_call_function_single: $sp+0 ra $sp+8 s0 $sp+16 $sp+24 $sp+32 csd_stack.node 0xfffffc000493fc40 $sp+40 csd_stack.node 0xfffffc000493fc48 $sp+48 csd_stack.func 0xfffffc000493fc50 $sp+56 csd_stack.info 0xfffffc000493fc58 $sp+64 csd 0xfffffc000493fc58 $sp+72 $sp+80 a3 $sp+88 a2 $sp+96 a0 $sp+104 a1 $sp+112 - When requesting csd_stack to be aligned using __attribute__((__aligned__(x))) it seems as if the compiler does not leave enough room above the csd_stack struct. i.e since the exact location of csd_stack depends on the actual value of $sp it is not known at compile time. Seems like gcc does not take this into account. The code works fine if I remove the alignment attribute for csd_stack. Also as previously mentioned, declaring csd_stack as "struct ____cacheline_aligned_in_smp" makes it work, but judging from the disassembly code, this statement has no effect on the alignment of csd_stack, i.e csd_stack is not aligned to anything its simply just placed on the stack, indirectly making it just 16-byte aligned instead of the requested 32-byte alignment. It seems to me that, when used to align variables that reside on the stack, __attribute__((__aligned__(x))) does not work correctly with gcc/alpha/linux. /Magnus On Tue, Dec 31, 2024 at 11:43 AM Magnus Lindholm <linmag7@gmail.com> wrote: > > > Umm, no. The psABI guarantees 16-byte alignment for the stack pointer, > > and under this condition (((x - 17) & ~31) + 32 <= x) is guaranteed to be > > true (except for the overflow case, of course, which does not apply here). > > aha! that explains it! thanks, is the psABI available somewhere? > > > > > Would you be able to trace it back further, e.g. by adding BUG_ON(!node) > > to `__smp_call_single_queue' and so on if required, to see where this NULL > > pointer comes from originally? I do hope such a minimal probe won't > > disturb code generation enough for this to become a heisenbug. > > > Hi, below are some additional test that I've made, > > It seems to me that part of the stack is overwritten with the values > of other local variables. Previously this affected the return address > on the stack causing kernels Oops on function return (see previous > mail in thread). In this run it seems like when the pointer *csd in > smp_call_function_single is stored on the stack, it gets overwritten > by writes to csd_stack.info. The difference here is that I use GCC > 15.0.0 20241225 (experimental) instead of gcc (Gentoo 14.2.1_p20241116 > p3) 14.2.1. To me, this looks like the same problem but the clobbering > just hits a different part of the stack. Below is some debug-output, > where I've added some print statements to the code in > smp_call_function_single of smp.c. When csd_stack is declared as > "struct ____cacheline_aligned_in_smp __call_single_data csd_stack", > this first case is the case where the code works: > > unloading the scsi module: > -------------------------------------------------- > smp: > &csd_stack.info=fffffc000493fd90 > &csd=fffffc000493fd98 > smp: > &csd_stack.info=fffffc000493fd90 > &csd=fffffc000493fd98 > sd 6:0:1:0: [sdb] Synchronizing SCSI cache > rcu: rcu_barrier: cpu=0 > smp: > &csd_stack.info=fffffc000935bc50 > &csd=fffffc000935bc58 > rcu: rcu_barrier: cpu=1 > smp: > &csd_stack.info=fffffc000935bc50 > &csd=fffffc000935bc58 > rcu: rcu_barrier: cpu=2 > smp: > &csd_stack.info=fffffc000935bc50 > &csd=fffffc000935bc58 > smp: generic_exec_single: csd=fffffc000935bc38 cpu=2 smp_cpu=2 > > > > Below is the same debug output when csd_stack is declared as > "call_single_data_t csd_stack" (i.e. no patch applied). For some > reason, in this case, the address of the csd variable is the same as > the address of csd_stack.info. If this is really the case, no wonder > that a write to csd_stack.info will overwrite the csd pointer. In this > case the code fails according to below: > > unloading the scsi module: > ----------------------------------------- > smp: > &csd_stack.info=fffffc000493fd98 > &csd=fffffc000493fd98 > smp: smp_call_function_single: not wait smp_cpu=1 > sd 6:0:1:0: [sdb] Synchronizing SCSI cache > rcu: rcu_barrier: cpu=0 > smp: > &csd_stack.info=fffffc0006207c58 > &csd=fffffc0006207c58 > smp: generic_exec_single: csd=fffffc0006207c40 cpu=0 smp_cpu=0 > Unable to handle kernel paging request at virtual address 0000000000000008 > CPU 0 > rmmod(1443): Oops 0 > pc = [<fffffc00003dd564>] ra = [<fffffc00003dd558>] ps = 0000 Not tainted > pc is at smp_call_function_single+0x204/0x220 > ra is at smp_call_function_single+0x1f8/0x220 > > > > > Below is yet another test, here the code works, csd_stack is declared > as "call_single_data_t csd_stack" (i.e. no patch applied). In this > example the code works since I've added some extra "dummy variables" > on the stack which seems to steer things around enough. Here it's also > clear that the address of csd does not overlap with the address of > csd_stack.info. test0 and test1 are just the extra local variables > that I've added. > > ----------------------------------------- > smp: > &csd_stack.info=fffffc000493fd78 > &csd=fffffc000493fd90 > smp: smp_call_function_single: not wait smp_cpu=1 > smp: &test0=fffffc000493fd98 > smp: &test1=fffffc000493fd88 > sd 6:0:1:0: [sdb] Synchronizing SCSI cache > rcu: rcu_barrier: cpu=0 > smp: > &csd_stack.info=fffffc0009e07c38 > &csd=fffffc0009e07c50 > smp: &test0=fffffc0009e07c58 > smp: &test1=fffffc0009e07c48 > smp: generic_exec_single: csd=fffffc0009e07c20 cpu=0 smp_cpu=0 > > > > > Patch I used to "fix" kernel/smp.c > ---------------------------------------------------- > +++ kernel/smp.c 2024-12-19 19:01:20.592819628 +0100 > @@ -631,7 +631,7 @@ > int smp_call_function_single(int cpu, smp_call_func_t func, void *info, > int wait) > { > call_single_data_t *csd; > - call_single_data_t csd_stack = { > + struct ____cacheline_aligned_in_smp __call_single_data csd_stack = { > .node = { .u_flags = CSD_FLAG_LOCK | CSD_TYPE_SYNC, }, > }; > int this_cpu; > > > > /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-12 23:25 ` Magnus Lindholm @ 2025-01-13 0:19 ` Maciej W. Rozycki 2025-01-13 3:08 ` Maciej W. Rozycki 2025-01-13 5:59 ` Magnus Lindholm 0 siblings, 2 replies; 62+ messages in thread From: Maciej W. Rozycki @ 2025-01-13 0:19 UTC (permalink / raw) To: Magnus Lindholm Cc: Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Mon, 13 Jan 2025, Magnus Lindholm wrote: > I've been running some more tests, this time with a smp kernel but on > a system with just one cpu, seems to me as a bit simpler scenario to > analyze. I've added some print statements to smp_call_function_single, > just to see what's really going on: > > pr_warn("smp_call_function_single: %llx %llx > size=%d\n",&csd_stack,&csd, sizeof(call_single_data_t)); > > output is seen below: > smp: smp_call_function_single: fffffc000493fc40 fffffc000493fc58 size=32 > so, the csd_stack struct is 32-bytes in size but &csd - &csd_stack = > 24. This does not make any sense? Given information supplied previously it does, see below. > When requesting csd_stack to be aligned using > __attribute__((__aligned__(x))) it seems as if the compiler does not > leave enough room above the csd_stack struct. i.e since the exact > location of csd_stack depends on the actual value of $sp it is not > known at compile time. Seems like gcc does not take this into account. > The code works fine if I remove the alignment attribute for csd_stack. > Also as previously mentioned, declaring csd_stack as "struct > ____cacheline_aligned_in_smp" makes it work, but judging from the > disassembly code, this statement has no effect on the alignment of > csd_stack, i.e csd_stack is not aligned to anything its simply just > placed on the stack, indirectly making it just 16-byte aligned instead > of the requested 32-byte alignment. > > It seems to me that, when used to align variables that reside on the > stack, __attribute__((__aligned__(x))) does not work correctly with > gcc/alpha/linux. I smell psABI breakage somewhere causing stack misalignment upframe. It has happened before here and there. It could genuinely be a GCC bug, but I suspect not. I'd rather suspect handcoded assembly or other kind of a manual stack pointer assignment or adjustment made somewhere. Can you please retrieve the value of SP in `smp_call_function_single'? Just something such as: printk("SP: %016lx\n", __builtin_frame_address(0)); should do. If it does show SP as unaligned, then we can dig in deeper. Maciej ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-13 0:19 ` Maciej W. Rozycki @ 2025-01-13 3:08 ` Maciej W. Rozycki 2025-01-13 5:59 ` Magnus Lindholm 1 sibling, 0 replies; 62+ messages in thread From: Maciej W. Rozycki @ 2025-01-13 3:08 UTC (permalink / raw) To: Magnus Lindholm Cc: Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Mon, 13 Jan 2025, Maciej W. Rozycki wrote: > > When requesting csd_stack to be aligned using > > __attribute__((__aligned__(x))) it seems as if the compiler does not > > leave enough room above the csd_stack struct. i.e since the exact > > location of csd_stack depends on the actual value of $sp it is not > > known at compile time. Seems like gcc does not take this into account. > > The code works fine if I remove the alignment attribute for csd_stack. > > Also as previously mentioned, declaring csd_stack as "struct > > ____cacheline_aligned_in_smp" makes it work, but judging from the > > disassembly code, this statement has no effect on the alignment of > > csd_stack, i.e csd_stack is not aligned to anything its simply just > > placed on the stack, indirectly making it just 16-byte aligned instead > > of the requested 32-byte alignment. > > > > It seems to me that, when used to align variables that reside on the > > stack, __attribute__((__aligned__(x))) does not work correctly with > > gcc/alpha/linux. > > I smell psABI breakage somewhere causing stack misalignment upframe. It > has happened before here and there. It could genuinely be a GCC bug, but > I suspect not. I'd rather suspect handcoded assembly or other kind of a > manual stack pointer assignment or adjustment made somewhere. Having actually dug speculatively I can see that the psABI was changed in GCC 3.5 with commit e5e10fb4a350 ("re PR target/14539 (128-bit long double improperly aligned)") back in Mar 2004, when the stack pointer alignment was increased from 8 bytes to 16 bytes, and arch/alpha/kernel/entry.S has various suspicious stack pointer adjustments, starting with SP_OFF which is not a whole multiple of 16. If this psABI change was inevitable (I guess it was), then the kernel side should have been adjusted accordingly. At first glance it seems that arch/alpha/kernel/ptrace.c may have to be updated as well. Signal frame handling code might be worth checking too. Some bugs are good at hiding very well... Maciej ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-13 0:19 ` Maciej W. Rozycki 2025-01-13 3:08 ` Maciej W. Rozycki @ 2025-01-13 5:59 ` Magnus Lindholm 2025-01-13 8:04 ` Maciej W. Rozycki 2025-01-13 16:52 ` Magnus Lindholm 1 sibling, 2 replies; 62+ messages in thread From: Magnus Lindholm @ 2025-01-13 5:59 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha > Can you please retrieve the value of SP in `smp_call_function_single'? > Just something such as: > > printk("SP: %016lx\n", __builtin_frame_address(0)); > I will check! Also wondering if this may be relevant? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-13 5:59 ` Magnus Lindholm @ 2025-01-13 8:04 ` Maciej W. Rozycki 2025-01-13 16:52 ` Magnus Lindholm 1 sibling, 0 replies; 62+ messages in thread From: Maciej W. Rozycki @ 2025-01-13 8:04 UTC (permalink / raw) To: Magnus Lindholm Cc: Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Mon, 13 Jan 2025, Magnus Lindholm wrote: > > Can you please retrieve the value of SP in `smp_call_function_single'? > > Just something such as: > > > > printk("SP: %016lx\n", __builtin_frame_address(0)); > > I will check! Always welcome, but please also see my other follow-up. > Also wondering if this may be relevant? > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660 The fix may have uncovered the stack pointer alignment breakage with the Alpha/Linux port. Maciej ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-13 5:59 ` Magnus Lindholm 2025-01-13 8:04 ` Maciej W. Rozycki @ 2025-01-13 16:52 ` Magnus Lindholm 2025-01-20 13:01 ` Magnus Lindholm 1 sibling, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2025-01-13 16:52 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha Hi, I've sprinkled some stack pointer printouts and its seems like (as suspected) the kernel stack pointer is not 16-byte aligned, at least not in kernel-mode. Example printouts: SP: fffffc00059dfc08 SP: fffffc00059dfe48 SP: fffffc00059dfc08 I found some ML threads that seemed relevant one on stack alignment in NetBSD-alpha after changes in GCC and one on x86_64 with linux/gcc https://mail-index.netbsd.org/port-alpha/2021/07/05/msg001145.html https://patchwork.kernel.org/project/linux-crypto/patch/20170110143340.GA3787@gondor.apana.org.au/ /Magnus On Mon, Jan 13, 2025 at 6:59 AM Magnus Lindholm <linmag7@gmail.com> wrote: > > > Can you please retrieve the value of SP in `smp_call_function_single'? > > Just something such as: > > > > printk("SP: %016lx\n", __builtin_frame_address(0)); > > > > > I will check! > > Also wondering if this may be relevant? > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-13 16:52 ` Magnus Lindholm @ 2025-01-20 13:01 ` Magnus Lindholm 2025-01-20 13:19 ` Maciej W. Rozycki 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2025-01-20 13:01 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha Hi, Attempting to summarize the findings relating to this bug as well as the comments in the mail thread, my understanding is this: In order to conform to the psABI, gcc was changed (back in 2004?) to assume 16-byte stack alignment on Linux/alpha. This seems to be the case for user-mode processes stacks, however not for the stack in kernel-mode. Some simple printouts of stack pointers in kernel mode suggest that the kernel stack is in fact only 8-byte aligned on Linux/alpha. In comparison, I've made similar checks on hppa, sparc and x86_64 and on these platforms the kernel stack seems to be 16-byte aligned (at least). If gcc assumes 16-byte alignment, and the code uses __attribute__((__aligned__(x))) gcc will generate assembly code that may cause stack corruption, if the stack at run-time is in fact only 8-byte aligned. A quick-fix/workaround for this might be to avoid using the __attribute__((__aligned__(x))) directive on variables/structs declared on the stack in the kernel code (at least for alpha), but to really get to the bottom of this, the kernel needs to be fixed so that the kernel stack is in fact always 16-byte aligned on alpha. This means that this bug is not really related to rcu or smp but rather a mismatch between gcc and linux-alpha regarding psABI compliance. /Magnus On Mon, Jan 13, 2025 at 5:52 PM Magnus Lindholm <linmag7@gmail.com> wrote: > > Hi, I've sprinkled some stack pointer printouts and its seems like (as > suspected) the kernel stack pointer is not 16-byte aligned, at least > not in kernel-mode. > > Example printouts: > SP: fffffc00059dfc08 > SP: fffffc00059dfe48 > SP: fffffc00059dfc08 > > I found some ML threads that seemed relevant one on stack alignment in > NetBSD-alpha after changes in GCC and one on x86_64 with linux/gcc > > https://mail-index.netbsd.org/port-alpha/2021/07/05/msg001145.html > > https://patchwork.kernel.org/project/linux-crypto/patch/20170110143340.GA3787@gondor.apana.org.au/ > > > > /Magnus > > On Mon, Jan 13, 2025 at 6:59 AM Magnus Lindholm <linmag7@gmail.com> wrote: > > > > > Can you please retrieve the value of SP in `smp_call_function_single'? > > > Just something such as: > > > > > > printk("SP: %016lx\n", __builtin_frame_address(0)); > > > > > > > > > I will check! > > > > Also wondering if this may be relevant? > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-20 13:01 ` Magnus Lindholm @ 2025-01-20 13:19 ` Maciej W. Rozycki 2025-01-21 13:39 ` Ivan Kokshaysky 0 siblings, 1 reply; 62+ messages in thread From: Maciej W. Rozycki @ 2025-01-20 13:19 UTC (permalink / raw) To: Magnus Lindholm Cc: Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Mon, 20 Jan 2025, Magnus Lindholm wrote: > 8-byte aligned. A quick-fix/workaround for this might be to avoid > using the __attribute__((__aligned__(x))) directive on > variables/structs declared on the stack in the kernel code (at least > for alpha), but to really get to the bottom of this, the kernel needs > to be fixed so that the kernel stack is in fact always 16-byte aligned > on alpha. This means that this bug is not really related to rcu or smp > but rather a mismatch between gcc and linux-alpha regarding psABI > compliance. Correct. Unless someone beats me to it, I'll work on a fix, but I'm busy with other stuff right now, so it may take a few weeks. Maciej ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-20 13:19 ` Maciej W. Rozycki @ 2025-01-21 13:39 ` Ivan Kokshaysky 2025-01-23 18:36 ` Ivan Kokshaysky 0 siblings, 1 reply; 62+ messages in thread From: Ivan Kokshaysky @ 2025-01-21 13:39 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Mon, Jan 20, 2025 at 01:19:23PM +0000, Maciej W. Rozycki wrote: > On Mon, 20 Jan 2025, Magnus Lindholm wrote: > > > 8-byte aligned. A quick-fix/workaround for this might be to avoid > > using the __attribute__((__aligned__(x))) directive on > > variables/structs declared on the stack in the kernel code (at least > > for alpha), but to really get to the bottom of this, the kernel needs > > to be fixed so that the kernel stack is in fact always 16-byte aligned > > on alpha. This means that this bug is not really related to rcu or smp > > but rather a mismatch between gcc and linux-alpha regarding psABI > > compliance. > > Correct. Unless someone beats me to it, I'll work on a fix, but I'm busy > with other stuff right now, so it may take a few weeks. I'll look into that (I owe you for helping me with the binutils patch :) Indeed, SP_OFF in entry.S is the main suspect at the moment. Ivan. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-21 13:39 ` Ivan Kokshaysky @ 2025-01-23 18:36 ` Ivan Kokshaysky 2025-01-23 23:00 ` Magnus Lindholm ` (2 more replies) 0 siblings, 3 replies; 62+ messages in thread From: Ivan Kokshaysky @ 2025-01-23 18:36 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Tue, Jan 21, 2025 at 02:39:12PM +0100, Ivan Kokshaysky wrote: > Indeed, SP_OFF in entry.S is the main suspect at the moment. In fact, it's the odd number of longs (29) in struct pt_regs that makes the stack misaligned by 8 bytes. The patch below works for me - no more oopses in rcu-torture test. Unless I'm missing something, this change shouldn't have any ill effects. Ivan. diff --git a/arch/alpha/include/uapi/asm/ptrace.h b/arch/alpha/include/uapi/asm/ptrace.h index 5ca45934fcbb..d2e8e69a18f1 100644 --- a/arch/alpha/include/uapi/asm/ptrace.h +++ b/arch/alpha/include/uapi/asm/ptrace.h @@ -49,7 +49,7 @@ struct pt_regs { unsigned long r16; unsigned long r17; unsigned long r18; -}; +} __attribute__((aligned(16))); /* GCC expects 16-byte stack alignment */ /* * This is the extended stack used by signal handlers and the context ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-23 18:36 ` Ivan Kokshaysky @ 2025-01-23 23:00 ` Magnus Lindholm 2025-01-23 23:51 ` Michael Cree 2025-01-23 23:57 ` Maciej W. Rozycki 2025-01-24 6:54 ` John Paul Adrian Glaubitz 2 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2025-01-23 23:00 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Maciej W. Rozycki, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha > > In fact, it's the odd number of longs (29) in struct pt_regs that makes > the stack misaligned by 8 bytes. The patch below works for me - no more > oopses in rcu-torture test. > > Unless I'm missing something, this change shouldn't have any ill effects. > > Ivan. Very nice! This seems to do the trick! I've been chasing this bug for quite some time now and with this patch it seems to work! applied to my systems and no problems so far. I was looking at a mail-thread working with 16-bit stack alignment on x86_64 and there the solution seems a lot more complex than a one-liner. /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-23 23:00 ` Magnus Lindholm @ 2025-01-23 23:51 ` Michael Cree 0 siblings, 0 replies; 62+ messages in thread From: Michael Cree @ 2025-01-23 23:51 UTC (permalink / raw) To: Magnus Lindholm Cc: Ivan Kokshaysky, Maciej W. Rozycki, Paul E. McKenney, John Paul Adrian Glaubitz, rcu, linux-alpha On Fri, Jan 24, 2025 at 12:00:41AM +0100, Magnus Lindholm wrote: > > > > In fact, it's the odd number of longs (29) in struct pt_regs that makes > > the stack misaligned by 8 bytes. The patch below works for me - no more > > oopses in rcu-torture test. > > > > Unless I'm missing something, this change shouldn't have any ill effects. > > > > Ivan. > > > Very nice! This seems to do the trick! I've been chasing this bug for > quite some time now and with this patch it seems to work! applied to > my systems and no problems so far. I was looking at a mail-thread > working with 16-bit stack alignment on x86_64 and there the solution > seems a lot more complex than a one-liner. Also installed kernel with the patch on an ES45 system and it has booted without any OOPses and is running nicely! Cheers Michael. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-23 18:36 ` Ivan Kokshaysky 2025-01-23 23:00 ` Magnus Lindholm @ 2025-01-23 23:57 ` Maciej W. Rozycki 2025-01-24 6:06 ` Magnus Lindholm 2025-01-24 10:55 ` Ivan Kokshaysky 2025-01-24 6:54 ` John Paul Adrian Glaubitz 2 siblings, 2 replies; 62+ messages in thread From: Maciej W. Rozycki @ 2025-01-23 23:57 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Thu, 23 Jan 2025, Ivan Kokshaysky wrote: > > Indeed, SP_OFF in entry.S is the main suspect at the moment. > > In fact, it's the odd number of longs (29) in struct pt_regs that makes > the stack misaligned by 8 bytes. The patch below works for me - no more > oopses in rcu-torture test. > > Unless I'm missing something, this change shouldn't have any ill effects. Umm, this is a part of UAPI, and the change in alignment changes the ABI (think padding where `struct pt_regs' has been embedded into another structure), so AFAICT it is a no-no. But the only place I could quickly find this should matter for is this: /* ... and find our stack ... */ lda $30,0x4000 - SIZEOF_PT_REGS($8) which should be straightforward to fix: lda $30,0x4000 - ((SIZEOF_PT_REGS + 15) & ~15)($8) or suchlike. Have I missed anything? Maciej ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-23 23:57 ` Maciej W. Rozycki @ 2025-01-24 6:06 ` Magnus Lindholm 2025-01-24 10:55 ` Ivan Kokshaysky 1 sibling, 0 replies; 62+ messages in thread From: Magnus Lindholm @ 2025-01-24 6:06 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ivan Kokshaysky, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha > Umm, this is a part of UAPI, and the change in alignment changes the ABI > (think padding where `struct pt_regs' has been embedded into another > structure), so AFAICT it is a no-no. > > But the only place I could quickly find this should matter for is this: > > /* ... and find our stack ... */ > lda $30,0x4000 - SIZEOF_PT_REGS($8) > > which should be straightforward to fix: > > lda $30,0x4000 - ((SIZEOF_PT_REGS + 15) & ~15)($8) > > or suchlike. Have I missed anything? > > Maciej > I left my ES40 doing "emerge --update --newuse --deep @world. Which failed and I'm seeing stuff like this in the logs: conftest(29098): unaligned trap at 00000200010005f0: 00000000bb65ed99 28 2 conftest(29098): unaligned trap at 00000200010005f8: 00000000dab6f47f 28 1 Unable to handle kernel paging request at virtual address 535853452d4d4245 CPU 2 cp(19064): Oops 0 pc = [<fffffc00004ac9dc>] ra = [<fffffc0000497630>] ps = 0000 Not tainted pc is at __d_lookup_rcu+0x6c/0x110 ra is at lookup_fast+0x40/0x200 v0 = 535853452d4d4249 t0 = 0000000000000000 t1 = fffffc0003780000 t2 = 000000003a01009f t3 = 6465637379730035 t4 = 0000000000000000 t5 = 0000000000000001 t6 = 0000001400000000 t7 = fffffc00a8e8c000 s0 = fffffc00a8e8fdb0 s1 = 0000000000000001 s2 = fffffc00a8e8fdb0 s3 = 000000011fe45840 s4 = 000002000096408d s5 = 000000011fe46060 s6 = 00000000000003ff a0 = fffffc00fb5260c0 a1 = fffffc00a8e8fdc0 a2 = fffffc00a8e8fdf4 a3 = 0000000000000000 a4 = 0000000000004050 a5 = 00000000021ab34f t8 = fffffc00fb5260c0 t9 = fffffc00a8e8fdf4 t10= 000000146e2931da t11= fffffc000a52b089 pv = fffffc00004ac970 at = fffffc000a52a800 gp = fffffc0000e9c980 sp = 000000001a40800e Disabling lock debugging due to kernel taint Trace: [<fffffc000049cd90>] path_lookupat+0xb0/0x260 [<fffffc000049d620>] filename_lookup+0x90/0x180 [<fffffc000048ce5c>] do_readlinkat.part.0+0x6c/0x200 [<fffffc000043a0e8>] __handle_mm_fault+0xb18/0xd60 [<fffffc000049cf78>] getname_flags+0x38/0x270 [<fffffc0000494bd8>] path_put+0x38/0x50 [<fffffc000048ce34>] do_readlinkat.part.0+0x44/0x200 [<fffffc000031095c>] entMM+0x9c/0xc0 [<fffffc000048d7f0>] sys_readlink+0x30/0x50 [<fffffc0000310d04>] entSys+0xa4/0xc0 /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-23 23:57 ` Maciej W. Rozycki 2025-01-24 6:06 ` Magnus Lindholm @ 2025-01-24 10:55 ` Ivan Kokshaysky 2025-01-24 16:57 ` Magnus Lindholm 2025-01-25 15:35 ` Maciej W. Rozycki 1 sibling, 2 replies; 62+ messages in thread From: Ivan Kokshaysky @ 2025-01-24 10:55 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Thu, Jan 23, 2025 at 11:57:03PM +0000, Maciej W. Rozycki wrote: > On Thu, 23 Jan 2025, Ivan Kokshaysky wrote: > > > > Indeed, SP_OFF in entry.S is the main suspect at the moment. > > > > In fact, it's the odd number of longs (29) in struct pt_regs that makes > > the stack misaligned by 8 bytes. The patch below works for me - no more > > oopses in rcu-torture test. > > > > Unless I'm missing something, this change shouldn't have any ill effects. > > Umm, this is a part of UAPI, and the change in alignment changes the ABI > (think padding where `struct pt_regs' has been embedded into another > structure), so AFAICT it is a no-no. Well, the only userspace applications I can think of that need kernel stack layout are debuggers, but at least alpha gdb doesn't use this header. Doesn't matter, though - padding *after* PAL-saved registers is wrong thing to do. I think it's the reason for oopses that Magnus reported today. A "long" padding memder of pt_regs placed *before* PAL-saved registers would be a proper fix for kernel, but it most likely would break gdb... > But the only place I could quickly find this should matter for is this: > > /* ... and find our stack ... */ > lda $30,0x4000 - SIZEOF_PT_REGS($8) > > which should be straightforward to fix: > > lda $30,0x4000 - ((SIZEOF_PT_REGS + 15) & ~15)($8) > > or suchlike. Have I missed anything? That's the first thing I thought of too, but no, it's just a kernel entry point after the bootloader. The stack pointer of kernel threads is assigned in alpha/kernel/process.c. Particularly, these macros in ptrace.h (non-uapi) are interesting: #define task_pt_regs(task) \ ((struct pt_regs *) (task_stack_page(task) + 2*PAGE_SIZE) - 1) #define current_pt_regs() \ ((struct pt_regs *) ((char *)current_thread_info() + 2*PAGE_SIZE) - 1) I'll try to play with alignment here, but it will take some time. Ivan. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-24 10:55 ` Ivan Kokshaysky @ 2025-01-24 16:57 ` Magnus Lindholm 2025-01-25 15:15 ` Ivan Kokshaysky 2025-01-25 15:35 ` Maciej W. Rozycki 1 sibling, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2025-01-24 16:57 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Maciej W. Rozycki, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha Are there other parts of the code that might unalign the stack, even if the stack is properly aligned to begin with? i.e passing an uneven number of function arguments on the stack or inside interrupt handlers? Alpha does not make use of a separate interrupt stack, right? On stack alignment in "ALPHA Calling Standard": D.3.1 Stack Alignment "This standard requires that stacks be octaword aligned at the time a new procedure is invoked. During the body of a procedure, however, there is no requirement to keep this level of alignment (even though it may be beneficial). This implies that any asynchronous interrupt handlers must properly align the stack before any standard calls are made." For now I've reverted the stack alignment patch and I'm now just applying this patch. Now the ES40 is stable again --- smp.orig 2025-01-24 17:50:18.822326213 +0100 +++ smp.h 2025-01-24 17:51:13.772835769 +0100 @@ -30,8 +30,7 @@ (struct __call_single_data){ .func = (_func), .info = (_info), } /* Use __aligned() to avoid to use 2 cache lines for 1 csd */ -typedef struct __call_single_data call_single_data_t - __aligned(sizeof(struct __call_single_data)); +typedef struct __call_single_data call_single_data_t; #define INIT_CSD(_csd, _func, _info) \ do { \ /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-24 16:57 ` Magnus Lindholm @ 2025-01-25 15:15 ` Ivan Kokshaysky 2025-01-25 17:01 ` Maciej W. Rozycki 2025-01-25 18:07 ` Magnus Lindholm 0 siblings, 2 replies; 62+ messages in thread From: Ivan Kokshaysky @ 2025-01-25 15:15 UTC (permalink / raw) To: Magnus Lindholm Cc: Maciej W. Rozycki, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Fri, Jan 24, 2025 at 05:57:05PM +0100, Magnus Lindholm wrote: > Are there other parts of the code that might unalign the stack, even > if the stack is properly aligned to begin with? i.e passing an uneven > number of function arguments on the stack or inside interrupt > handlers? Alpha does not make use of a separate interrupt stack, > right? Good questions. No, there is no separate interrupt stack, it's always the kernel one. Stack frames from interrupts in user mode are 64-byte aligned though. Interrupts in kernel mode, user mode syscalls and exceptions all use 6 x 64-bit word frames and do not change the stack [mis]alignment. So, what we have now: 1. The "normal" kernel stack is always misaligned by 8, thanks to the sizeof(struct pt_regs); 2. Syscalls and exceptions handlers receive 16-byte aligned stack, as it gets "fixed" by SAVE_ALL macro in entry.S, which pushes the odd number of registers on the stack; 3. Interrupt handlers may, or may not, have got an aligned stack depending on kernel/user mode in which the interrupt had come. Ugh. > On stack alignment in "ALPHA Calling Standard": > D.3.1 Stack Alignment > > "This standard requires that stacks be octaword aligned at the time a > new procedure is invoked. During the body of a procedure, however, > there is no requirement to keep this level of alignment (even though > it may be beneficial). This implies that any asynchronous interrupt > handlers must properly align the stack before any standard calls are > made." I hope we can rely on GCC changing $sp only by multiplies of 16. Magnus, can you please try this variant? (Yes, there is still the UAPI issue that Maciej pointed out, but that's another story.) Ivan. diff --git a/arch/alpha/include/uapi/asm/ptrace.h b/arch/alpha/include/uapi/asm/ptrace.h index 5ca45934fcbb..72ed913a910f 100644 --- a/arch/alpha/include/uapi/asm/ptrace.h +++ b/arch/alpha/include/uapi/asm/ptrace.h @@ -42,6 +42,8 @@ struct pt_regs { unsigned long trap_a0; unsigned long trap_a1; unsigned long trap_a2; +/* This makes the stack 16-byte aligned as GCC expects */ + unsigned long __pad0; /* These are saved by PAL-code: */ unsigned long ps; unsigned long pc; diff --git a/arch/alpha/kernel/asm-offsets.c b/arch/alpha/kernel/asm-offsets.c index 4cfeae42c79a..e9dad60b147f 100644 --- a/arch/alpha/kernel/asm-offsets.c +++ b/arch/alpha/kernel/asm-offsets.c @@ -19,9 +19,13 @@ static void __used foo(void) DEFINE(TI_STATUS, offsetof(struct thread_info, status)); BLANK(); + DEFINE(SP_OFF, offsetof(struct pt_regs, ps)); DEFINE(SIZEOF_PT_REGS, sizeof(struct pt_regs)); BLANK(); + DEFINE(SWITCH_STACK_SIZE, sizeof(struct switch_stack)); + BLANK(); + DEFINE(HAE_CACHE, offsetof(struct alpha_machine_vector, hae_cache)); DEFINE(HAE_REG, offsetof(struct alpha_machine_vector, hae_register)); } diff --git a/arch/alpha/kernel/entry.S b/arch/alpha/kernel/entry.S index dd26062d75b3..6fb38365539d 100644 --- a/arch/alpha/kernel/entry.S +++ b/arch/alpha/kernel/entry.S @@ -15,10 +15,6 @@ .set noat .cfi_sections .debug_frame -/* Stack offsets. */ -#define SP_OFF 184 -#define SWITCH_STACK_SIZE 64 - .macro CFI_START_OSF_FRAME func .align 4 .globl \func ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-25 15:15 ` Ivan Kokshaysky @ 2025-01-25 17:01 ` Maciej W. Rozycki 2025-01-25 17:43 ` Ivan Kokshaysky 2025-01-25 18:07 ` Magnus Lindholm 1 sibling, 1 reply; 62+ messages in thread From: Maciej W. Rozycki @ 2025-01-25 17:01 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Sat, 25 Jan 2025, Ivan Kokshaysky wrote: > > Are there other parts of the code that might unalign the stack, even > > if the stack is properly aligned to begin with? i.e passing an uneven > > number of function arguments on the stack or inside interrupt > > handlers? Alpha does not make use of a separate interrupt stack, > > right? > > Good questions. No, there is no separate interrupt stack, it's always the > kernel one. Stack frames from interrupts in user mode are 64-byte aligned > though. Interrupts in kernel mode, user mode syscalls and exceptions all > use 6 x 64-bit word frames and do not change the stack [mis]alignment. > > So, what we have now: > 1. The "normal" kernel stack is always misaligned by 8, thanks to > the sizeof(struct pt_regs); > 2. Syscalls and exceptions handlers receive 16-byte aligned stack, as it > gets "fixed" by SAVE_ALL macro in entry.S, which pushes the odd number > of registers on the stack; > 3. Interrupt handlers may, or may not, have got an aligned stack depending > on kernel/user mode in which the interrupt had come. > > Ugh. Yeah, just as I observed in my other reply, but notice that syscalls and exceptions handlers typically actually do *not* receive a 16-byte aligned stack now. > > On stack alignment in "ALPHA Calling Standard": > > D.3.1 Stack Alignment > > > > "This standard requires that stacks be octaword aligned at the time a > > new procedure is invoked. During the body of a procedure, however, > > there is no requirement to keep this level of alignment (even though > > it may be beneficial). This implies that any asynchronous interrupt > > handlers must properly align the stack before any standard calls are > > made." > > I hope we can rely on GCC changing $sp only by multiplies of 16. Absolutely, the compiler would be completely broken otherwise. > (Yes, there is still the UAPI issue that Maciej pointed out, but that's > another story.) So we now have two variants to pick from. I wish we could take yours as it's certainly neater, but is it safe enough? I can see arch/alpha/include/uapi/asm/ptrace.h was only incarnated as late as in 2012 with commit 96433f6ee490 ("UAPI: (Scripted) Disintegrate arch/alpha/include/asm") and according to the change heading made in an automated way, with little public discussion, so maybe its existence is actually an accident? Unlike some other platforms we don't expose this `struct pt_regs' via ptrace(2) for PTRACE_GETREGS/PTRACE_SETREGS, which we don't implement. NB, here's a corresponding stack alignment report for your change: start_kernel: SP: fffffc0000dcfe90 do_entInt: SP: fffffc0000dcfc60 copy_thread: SP: fffffc0000dcfc90, regs: fffffc0000dcff10, childregs: fffffc0001837f10, childstack: fffffc0001837ed0 do_page_fault: SP: fffffc0001837bb8 sys_exit_group: SP: fffffc00028bfef0 do_entUnaUser: SP: fffffc0001bcfe68 do_entArith: SP: fffffc0001dfbee0 do_entIF: SP: fffffc0001fafee0 so there's still work to be done for `entMM' and `entUna' exceptions. Maciej ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-25 17:01 ` Maciej W. Rozycki @ 2025-01-25 17:43 ` Ivan Kokshaysky 2025-01-25 18:25 ` Maciej W. Rozycki 0 siblings, 1 reply; 62+ messages in thread From: Ivan Kokshaysky @ 2025-01-25 17:43 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Sat, Jan 25, 2025 at 05:01:42PM +0000, Maciej W. Rozycki wrote: > Yeah, just as I observed in my other reply, but notice that syscalls and > exceptions handlers typically actually do *not* receive a 16-byte aligned > stack now. Interesting. Perhaps these frames are aligned by PAL-code as well, the reference manual wasn't clear about that. > > > On stack alignment in "ALPHA Calling Standard": > > > D.3.1 Stack Alignment > > > > > > "This standard requires that stacks be octaword aligned at the time a > > > new procedure is invoked. During the body of a procedure, however, > > > there is no requirement to keep this level of alignment (even though > > > it may be beneficial). This implies that any asynchronous interrupt > > > handlers must properly align the stack before any standard calls are > > > made." > > > > I hope we can rely on GCC changing $sp only by multiplies of 16. > > Absolutely, the compiler would be completely broken otherwise. > > > (Yes, there is still the UAPI issue that Maciej pointed out, but that's > > another story.) > > So we now have two variants to pick from. I wish we could take yours as > it's certainly neater, but is it safe enough? > > I can see arch/alpha/include/uapi/asm/ptrace.h was only incarnated as > late as in 2012 with commit 96433f6ee490 ("UAPI: (Scripted) Disintegrate > arch/alpha/include/asm") and according to the change heading made in an > automated way, with little public discussion, so maybe its existence is > actually an accident? Unlike some other platforms we don't expose this > `struct pt_regs' via ptrace(2) for PTRACE_GETREGS/PTRACE_SETREGS, which > we don't implement. Yeah, a bit of e-mail desync, sorry :) At the moment I compile gdb with empty asm/ptrace.h just to be 100% sure. > NB, here's a corresponding stack alignment report for your change: > > start_kernel: SP: fffffc0000dcfe90 > do_entInt: SP: fffffc0000dcfc60 > copy_thread: SP: fffffc0000dcfc90, regs: fffffc0000dcff10, childregs: fffffc0001837f10, childstack: fffffc0001837ed0 > do_page_fault: SP: fffffc0001837bb8 > sys_exit_group: SP: fffffc00028bfef0 > do_entUnaUser: SP: fffffc0001bcfe68 > do_entArith: SP: fffffc0001dfbee0 > do_entIF: SP: fffffc0001fafee0 > > so there's still work to be done for `entMM' and `entUna' exceptions. I knew about entUna, I thought it's safe as it only deals with 64-bit data and not going to be changed in future, but missed entMM... I agree, better fix both. Ivan. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-25 17:43 ` Ivan Kokshaysky @ 2025-01-25 18:25 ` Maciej W. Rozycki 2025-01-25 18:59 ` Maciej W. Rozycki 0 siblings, 1 reply; 62+ messages in thread From: Maciej W. Rozycki @ 2025-01-25 18:25 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Sat, 25 Jan 2025, Ivan Kokshaysky wrote: > > Yeah, just as I observed in my other reply, but notice that syscalls and > > exceptions handlers typically actually do *not* receive a 16-byte aligned > > stack now. > > Interesting. Perhaps these frames are aligned by PAL-code as well, > the reference manual wasn't clear about that. I think it just boils down to the amount of exception nesting. > > So we now have two variants to pick from. I wish we could take yours as > > it's certainly neater, but is it safe enough? > > > > I can see arch/alpha/include/uapi/asm/ptrace.h was only incarnated as > > late as in 2012 with commit 96433f6ee490 ("UAPI: (Scripted) Disintegrate > > arch/alpha/include/asm") and according to the change heading made in an > > automated way, with little public discussion, so maybe its existence is > > actually an accident? Unlike some other platforms we don't expose this > > `struct pt_regs' via ptrace(2) for PTRACE_GETREGS/PTRACE_SETREGS, which > > we don't implement. > > Yeah, a bit of e-mail desync, sorry :) No worries. > At the moment I compile gdb with empty asm/ptrace.h just to be 100% sure. I can't see this stuff being used anywhere in Alpha/GDB. > > NB, here's a corresponding stack alignment report for your change: > > > > start_kernel: SP: fffffc0000dcfe90 > > do_entInt: SP: fffffc0000dcfc60 > > copy_thread: SP: fffffc0000dcfc90, regs: fffffc0000dcff10, childregs: fffffc0001837f10, childstack: fffffc0001837ed0 > > do_page_fault: SP: fffffc0001837bb8 > > sys_exit_group: SP: fffffc00028bfef0 > > do_entUnaUser: SP: fffffc0001bcfe68 > > do_entArith: SP: fffffc0001dfbee0 > > do_entIF: SP: fffffc0001fafee0 > > > > so there's still work to be done for `entMM' and `entUna' exceptions. > > I knew about entUna, I thought it's safe as it only deals with 64-bit data > and not going to be changed in future, but missed entMM... > > I agree, better fix both. Well, we may get away with it in many cases, which is obviously why this bug has survived so long, but in principle it is not safe to enter C code with the stack misaligned, so yes, we need to fix all the code paths, also because a nested exception will cause hell to break loose. Here just bumping up the frame size and adjusting offsets in assembly code accordingly so as to account for the empty longword at the bottom of the frame should do, just as I did across my change. Maciej ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-25 18:25 ` Maciej W. Rozycki @ 2025-01-25 18:59 ` Maciej W. Rozycki 2025-01-25 19:48 ` Ivan Kokshaysky 0 siblings, 1 reply; 62+ messages in thread From: Maciej W. Rozycki @ 2025-01-25 18:59 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Sat, 25 Jan 2025, Maciej W. Rozycki wrote: > > Interesting. Perhaps these frames are aligned by PAL-code as well, > > the reference manual wasn't clear about that. > > I think it just boils down to the amount of exception nesting. Ah, actually most of these faults were entered from the user mode and the kernel stack starts at a page boundary, so once the stack frame has been allocated by PALcode and SAVE_ALL combined the stack pointer ends up misaligned. For faults entered from the kernel mode the opposite might be the case. So unless we want to play (and we don't) with FP and the saving and restoration of SP, we just want to keep SP aligned at all times. > > I knew about entUna, I thought it's safe as it only deals with 64-bit data > > and not going to be changed in future, but missed entMM... > > > > I agree, better fix both. > > Well, we may get away with it in many cases, which is obviously why > this bug has survived so long, but in principle it is not safe to enter > C code with the stack misaligned, so yes, we need to fix all the code > paths, also because a nested exception will cause hell to break loose. > > Here just bumping up the frame size and adjusting offsets in assembly > code accordingly so as to account for the empty longword at the bottom > of the frame should do, just as I did across my change. ... or, depending on how you look at it, top of the frame and FAOD in any case the longword closest to the stack pointer will be the empty one. Maciej ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-25 18:59 ` Maciej W. Rozycki @ 2025-01-25 19:48 ` Ivan Kokshaysky 2025-01-25 22:06 ` Maciej W. Rozycki 0 siblings, 1 reply; 62+ messages in thread From: Ivan Kokshaysky @ 2025-01-25 19:48 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Sat, Jan 25, 2025 at 06:59:32PM +0000, Maciej W. Rozycki wrote: > On Sat, 25 Jan 2025, Maciej W. Rozycki wrote: > > > > Interesting. Perhaps these frames are aligned by PAL-code as well, > > > the reference manual wasn't clear about that. > > > > I think it just boils down to the amount of exception nesting. > > Ah, actually most of these faults were entered from the user mode and the > kernel stack starts at a page boundary, so once the stack frame has been > allocated by PALcode and SAVE_ALL combined the stack pointer ends up > misaligned. For faults entered from the kernel mode the opposite might be > the case. Indeed, sounds plausible. > So unless we want to play (and we don't) with FP and the saving > and restoration of SP, we just want to keep SP aligned at all times. > > > I knew about entUna, I thought it's safe as it only deals with 64-bit data > > > and not going to be changed in future, but missed entMM... > > > > > > I agree, better fix both. > > > > Well, we may get away with it in many cases, which is obviously why > > this bug has survived so long, but in principle it is not safe to enter > > C code with the stack misaligned, so yes, we need to fix all the code > > paths, also because a nested exception will cause hell to break loose. > > > > Here just bumping up the frame size and adjusting offsets in assembly > > code accordingly so as to account for the empty longword at the bottom > > of the frame should do, just as I did across my change. > > ... or, depending on how you look at it, top of the frame and FAOD in any > case the longword closest to the stack pointer will be the empty one. Right. So if we agree on my variant, this addition patch is needed. [No problems with gdb, as expected.] Ivan. diff --git a/arch/alpha/kernel/entry.S b/arch/alpha/kernel/entry.S index 6fb38365539d..f4d41b4538c2 100644 --- a/arch/alpha/kernel/entry.S +++ b/arch/alpha/kernel/entry.S @@ -194,8 +194,8 @@ CFI_END_OSF_FRAME entArith CFI_START_OSF_FRAME entMM SAVE_ALL /* save $9 - $15 so the inline exception code can manipulate them. */ - subq $sp, 56, $sp - .cfi_adjust_cfa_offset 56 + subq $sp, 64, $sp + .cfi_adjust_cfa_offset 64 stq $9, 0($sp) stq $10, 8($sp) stq $11, 16($sp) @@ -210,7 +210,7 @@ CFI_START_OSF_FRAME entMM .cfi_rel_offset $13, 32 .cfi_rel_offset $14, 40 .cfi_rel_offset $15, 48 - addq $sp, 56, $19 + addq $sp, 64, $19 /* handle the fault */ lda $8, 0x3fff bic $sp, $8, $8 @@ -223,7 +223,7 @@ CFI_START_OSF_FRAME entMM ldq $13, 32($sp) ldq $14, 40($sp) ldq $15, 48($sp) - addq $sp, 56, $sp + addq $sp, 64, $sp .cfi_restore $9 .cfi_restore $10 .cfi_restore $11 @@ -231,7 +231,7 @@ CFI_START_OSF_FRAME entMM .cfi_restore $13 .cfi_restore $14 .cfi_restore $15 - .cfi_adjust_cfa_offset -56 + .cfi_adjust_cfa_offset -64 /* finish up the syscall as normal. */ br ret_from_sys_call CFI_END_OSF_FRAME entMM @@ -378,8 +378,8 @@ entUnaUser: .cfi_restore $0 .cfi_adjust_cfa_offset -256 SAVE_ALL /* setup normal kernel stack */ - lda $sp, -56($sp) - .cfi_adjust_cfa_offset 56 + lda $sp, -64($sp) + .cfi_adjust_cfa_offset 64 stq $9, 0($sp) stq $10, 8($sp) stq $11, 16($sp) @@ -395,7 +395,7 @@ entUnaUser: .cfi_rel_offset $14, 40 .cfi_rel_offset $15, 48 lda $8, 0x3fff - addq $sp, 56, $19 + addq $sp, 64, $19 bic $sp, $8, $8 jsr $26, do_entUnaUser ldq $9, 0($sp) @@ -405,7 +405,7 @@ entUnaUser: ldq $13, 32($sp) ldq $14, 40($sp) ldq $15, 48($sp) - lda $sp, 56($sp) + lda $sp, 64($sp) .cfi_restore $9 .cfi_restore $10 .cfi_restore $11 @@ -413,7 +413,7 @@ entUnaUser: .cfi_restore $13 .cfi_restore $14 .cfi_restore $15 - .cfi_adjust_cfa_offset -56 + .cfi_adjust_cfa_offset -64 br ret_from_sys_call CFI_END_OSF_FRAME entUna ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-25 19:48 ` Ivan Kokshaysky @ 2025-01-25 22:06 ` Maciej W. Rozycki 2025-01-25 23:02 ` Ivan Kokshaysky 0 siblings, 1 reply; 62+ messages in thread From: Maciej W. Rozycki @ 2025-01-25 22:06 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Sat, 25 Jan 2025, Ivan Kokshaysky wrote: > > > Here just bumping up the frame size and adjusting offsets in assembly > > > code accordingly so as to account for the empty longword at the bottom > > > of the frame should do, just as I did across my change. > > > > ... or, depending on how you look at it, top of the frame and FAOD in any > > case the longword closest to the stack pointer will be the empty one. > > Right. So if we agree on my variant, this addition patch is needed. Erm, that's a good starting point but offsets for the individual register slots need to be updated as well for `do_page_fault' and `do_entUnaUser' to get at the correct ones for those that are at negative indices from the `regs' pointer supplied, i.e. $9 at 8($sp), $10 at 16($sp), etc., and with 0($sp) now unoccupied. Sorry to get it through unclear after all. Maciej ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-25 22:06 ` Maciej W. Rozycki @ 2025-01-25 23:02 ` Ivan Kokshaysky 2025-01-26 14:00 ` Ivan Kokshaysky 0 siblings, 1 reply; 62+ messages in thread From: Ivan Kokshaysky @ 2025-01-25 23:02 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Sat, Jan 25, 2025 at 10:06:38PM +0000, Maciej W. Rozycki wrote: > On Sat, 25 Jan 2025, Ivan Kokshaysky wrote: > > > > > Here just bumping up the frame size and adjusting offsets in assembly > > > > code accordingly so as to account for the empty longword at the bottom > > > > of the frame should do, just as I did across my change. > > > > > > ... or, depending on how you look at it, top of the frame and FAOD in any > > > case the longword closest to the stack pointer will be the empty one. > > > > Right. So if we agree on my variant, this addition patch is needed. > > Erm, that's a good starting point but offsets for the individual register > slots need to be updated as well for `do_page_fault' and `do_entUnaUser' > to get at the correct ones for those that are at negative indices from the > `regs' pointer supplied, i.e. $9 at 8($sp), $10 at 16($sp), etc., and with > 0($sp) now unoccupied. Sorry to get it through unclear after all. Ah, thanks a lot! Should've noticed that myself, but I've booted with the last patch and it didn't crash on me, so I thought the deed is done... I'll fix that tomorrow. Ivan. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-25 23:02 ` Ivan Kokshaysky @ 2025-01-26 14:00 ` Ivan Kokshaysky 2025-01-26 19:15 ` Magnus Lindholm 0 siblings, 1 reply; 62+ messages in thread From: Ivan Kokshaysky @ 2025-01-26 14:00 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Sun, Jan 26, 2025 at 12:02:47AM +0100, Ivan Kokshaysky wrote: > > Erm, that's a good starting point but offsets for the individual register > > slots need to be updated as well for `do_page_fault' and `do_entUnaUser' > > to get at the correct ones for those that are at negative indices from the > > `regs' pointer supplied, i.e. $9 at 8($sp), $10 at 16($sp), etc., and with > > 0($sp) now unoccupied. Sorry to get it through unclear after all. > > Ah, thanks a lot! Should've noticed that myself, but I've booted with the > last patch and it didn't crash on me, so I thought the deed is done... > > I'll fix that tomorrow. Fixed in C, I guess it's easier to review. Ivan. diff --git a/arch/alpha/kernel/traps.c b/arch/alpha/kernel/traps.c index a9a38c80c4a7..035086e19d64 100644 --- a/arch/alpha/kernel/traps.c +++ b/arch/alpha/kernel/traps.c @@ -649,7 +649,7 @@ s_reg_to_mem (unsigned long s_reg) static int unauser_reg_offsets[32] = { R(r0), R(r1), R(r2), R(r3), R(r4), R(r5), R(r6), R(r7), R(r8), /* r9 ... r15 are stored in front of regs. */ - -56, -48, -40, -32, -24, -16, -8, + -64, -56, -48, -40, -32, -24, -16, R(r16), R(r17), R(r18), R(r19), R(r20), R(r21), R(r22), R(r23), R(r24), R(r25), R(r26), R(r27), R(r28), R(gp), diff --git a/arch/alpha/mm/fault.c b/arch/alpha/mm/fault.c index 8c9850437e67..a9816bbc9f34 100644 --- a/arch/alpha/mm/fault.c +++ b/arch/alpha/mm/fault.c @@ -78,8 +78,8 @@ __load_new_mm_context(struct mm_struct *next_mm) /* Macro for exception fixup code to access integer registers. */ #define dpf_reg(r) \ - (((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-16 : \ - (r) <= 18 ? (r)+10 : (r)-10]) + (((unsigned long *)regs)[(r) <= 8 ? (r) : (r) <= 15 ? (r)-17 : \ + (r) <= 18 ? (r)+11 : (r)-10]) asmlinkage void do_page_fault(unsigned long address, unsigned long mmcsr, ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-26 14:00 ` Ivan Kokshaysky @ 2025-01-26 19:15 ` Magnus Lindholm 2025-01-27 11:48 ` Ivan Kokshaysky 0 siblings, 1 reply; 62+ messages in thread From: Magnus Lindholm @ 2025-01-26 19:15 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Maciej W. Rozycki, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha I've applied the patches provided by Ivan, so far both systems are running fine, compiling kernels and such to provide some load. /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-26 19:15 ` Magnus Lindholm @ 2025-01-27 11:48 ` Ivan Kokshaysky 2025-01-27 11:56 ` John Paul Adrian Glaubitz 0 siblings, 1 reply; 62+ messages in thread From: Ivan Kokshaysky @ 2025-01-27 11:48 UTC (permalink / raw) To: Magnus Lindholm Cc: Maciej W. Rozycki, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Sun, Jan 26, 2025 at 08:15:41PM +0100, Magnus Lindholm wrote: > I've applied the patches provided by Ivan, so far both systems are > running fine, compiling kernels and such to provide some load. Thanks, Magnus! I'm going to submit this patchset in a couple of days. Ivan. ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-27 11:48 ` Ivan Kokshaysky @ 2025-01-27 11:56 ` John Paul Adrian Glaubitz 0 siblings, 0 replies; 62+ messages in thread From: John Paul Adrian Glaubitz @ 2025-01-27 11:56 UTC (permalink / raw) To: Ivan Kokshaysky, Magnus Lindholm Cc: Maciej W. Rozycki, Paul E. McKenney, Michael Cree, rcu, linux-alpha Hi Ivan, On Mon, 2025-01-27 at 12:48 +0100, Ivan Kokshaysky wrote: > On Sun, Jan 26, 2025 at 08:15:41PM +0100, Magnus Lindholm wrote: > > I've applied the patches provided by Ivan, so far both systems are > > running fine, compiling kernels and such to provide some load. > > Thanks, Magnus! I'm going to submit this patchset in a couple of days. Can we also get a fix for the SET_PERSONALITY() issue? Thanks, Adrian -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer `. `' Physicist `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-25 15:15 ` Ivan Kokshaysky 2025-01-25 17:01 ` Maciej W. Rozycki @ 2025-01-25 18:07 ` Magnus Lindholm 1 sibling, 0 replies; 62+ messages in thread From: Magnus Lindholm @ 2025-01-25 18:07 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Maciej W. Rozycki, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha > Magnus, can you please try this variant? > I've built the kernel with this patch applied and I'm running it on two separate systems right now and so far so good! I'm putting some load on the systems and letting it run overnight... /Magnus ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-24 10:55 ` Ivan Kokshaysky 2025-01-24 16:57 ` Magnus Lindholm @ 2025-01-25 15:35 ` Maciej W. Rozycki 2025-01-25 17:09 ` Ivan Kokshaysky 1 sibling, 1 reply; 62+ messages in thread From: Maciej W. Rozycki @ 2025-01-25 15:35 UTC (permalink / raw) To: Ivan Kokshaysky Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Fri, 24 Jan 2025, Ivan Kokshaysky wrote: > > > > Indeed, SP_OFF in entry.S is the main suspect at the moment. > > > > > > In fact, it's the odd number of longs (29) in struct pt_regs that makes > > > the stack misaligned by 8 bytes. The patch below works for me - no more > > > oopses in rcu-torture test. > > > > > > Unless I'm missing something, this change shouldn't have any ill effects. > > > > Umm, this is a part of UAPI, and the change in alignment changes the ABI > > (think padding where `struct pt_regs' has been embedded into another > > structure), so AFAICT it is a no-no. > > Well, the only userspace applications I can think of that need kernel > stack layout are debuggers, but at least alpha gdb doesn't use this header. > Doesn't matter, though - padding *after* PAL-saved registers is wrong > thing to do. I think it's the reason for oopses that Magnus reported > today. > > A "long" padding memder of pt_regs placed *before* PAL-saved registers > would be a proper fix for kernel, but it most likely would break gdb... > > > But the only place I could quickly find this should matter for is this: > > > > /* ... and find our stack ... */ > > lda $30,0x4000 - SIZEOF_PT_REGS($8) > > > > which should be straightforward to fix: > > > > lda $30,0x4000 - ((SIZEOF_PT_REGS + 15) & ~15)($8) > > > > or suchlike. Have I missed anything? > > That's the first thing I thought of too, but no, it's just a kernel > entry point after the bootloader. The stack pointer of kernel threads > is assigned in alpha/kernel/process.c. Particularly, these macros > in ptrace.h (non-uapi) are interesting: > > #define task_pt_regs(task) \ > ((struct pt_regs *) (task_stack_page(task) + 2*PAGE_SIZE) - 1) > > #define current_pt_regs() \ > ((struct pt_regs *) ((char *)current_thread_info() + 2*PAGE_SIZE) - 1) > > I'll try to play with alignment here, but it will take some time. So after a crash course in PALcode stack frames I have come up with the following WIP patch that works for me. If things go well, I'll clean it up a little and turn into a proper patch submission. Not that I think I can make the end result particularly pretty, there's no easy way AFAICT. NB with some instrumentation here's what gets reported for stack without: start_kernel: SP: fffffc0000dcfe98 do_entInt: SP: fffffc0000dcfc30 copy_thread: SP: fffffc0000dcfc98, regs: fffffc0000dcff18, childregs: fffffc0001837f18, childstack: fffffc0001837ed8 do_page_fault: SP: fffffc0001837bc8 sys_exit_group: SP: fffffc0002917ef8 do_entUnaUser: SP: fffffc0001f33e70 do_entArith: SP: fffffc0001f33ee8 do_entIF: SP: fffffc000184bee8 and with the patch: start_kernel: SP: fffffc0000dcfe90 do_entInt: SP: fffffc0000dcfc20 copy_thread: SP: fffffc0000dcfc90, regs: fffffc0000dcff18, childregs: fffffc000183bf18, childstack: fffffc000183bed0 do_page_fault: SP: fffffc000183bbc0 sys_exit_group: SP: fffffc00028d3ef0 do_entUnaUser: SP: fffffc000292fe70 do_entArith: SP: fffffc0001d7fee0 do_entIF: SP: fffffc0002827ee0 for the relevant situations (except for `entDbg', but that's analogous and largely unused anyway). Can you guys please give it a try? Maciej --- arch/alpha/kernel/entry.S | 262 +++++++++++++++++++++++++------------------- arch/alpha/kernel/head.S | 4 arch/alpha/kernel/process.c | 12 +- 3 files changed, 160 insertions(+), 118 deletions(-) Index: linux-melmac/arch/alpha/kernel/entry.S =================================================================== --- linux-melmac.orig/arch/alpha/kernel/entry.S +++ linux-melmac/arch/alpha/kernel/entry.S @@ -16,7 +16,7 @@ .cfi_sections .debug_frame /* Stack offsets. */ -#define SP_OFF 184 +#define SP_OFF 192 #define SWITCH_STACK_SIZE 64 .macro CFI_START_OSF_FRAME func @@ -52,80 +52,80 @@ .macro SAVE_ALL subq $sp, SP_OFF, $sp .cfi_adjust_cfa_offset SP_OFF - stq $0, 0($sp) - stq $1, 8($sp) - stq $2, 16($sp) - stq $3, 24($sp) - stq $4, 32($sp) - stq $28, 144($sp) - .cfi_rel_offset $0, 0 - .cfi_rel_offset $1, 8 - .cfi_rel_offset $2, 16 - .cfi_rel_offset $3, 24 - .cfi_rel_offset $4, 32 - .cfi_rel_offset $28, 144 + stq $0, 8($sp) + stq $1, 16($sp) + stq $2, 24($sp) + stq $3, 32($sp) + stq $4, 40($sp) + stq $28, 152($sp) + .cfi_rel_offset $0, 8 + .cfi_rel_offset $1, 16 + .cfi_rel_offset $2, 24 + .cfi_rel_offset $3, 32 + .cfi_rel_offset $4, 40 + .cfi_rel_offset $28, 152 lda $2, alpha_mv - stq $5, 40($sp) - stq $6, 48($sp) - stq $7, 56($sp) - stq $8, 64($sp) - stq $19, 72($sp) - stq $20, 80($sp) - stq $21, 88($sp) + stq $5, 48($sp) + stq $6, 56($sp) + stq $7, 64($sp) + stq $8, 72($sp) + stq $19, 80($sp) + stq $20, 88($sp) + stq $21, 96($sp) ldq $2, HAE_CACHE($2) - stq $22, 96($sp) - stq $23, 104($sp) - stq $24, 112($sp) - stq $25, 120($sp) - stq $26, 128($sp) - stq $27, 136($sp) - stq $2, 152($sp) - stq $16, 160($sp) - stq $17, 168($sp) - stq $18, 176($sp) - .cfi_rel_offset $5, 40 - .cfi_rel_offset $6, 48 - .cfi_rel_offset $7, 56 - .cfi_rel_offset $8, 64 - .cfi_rel_offset $19, 72 - .cfi_rel_offset $20, 80 - .cfi_rel_offset $21, 88 - .cfi_rel_offset $22, 96 - .cfi_rel_offset $23, 104 - .cfi_rel_offset $24, 112 - .cfi_rel_offset $25, 120 - .cfi_rel_offset $26, 128 - .cfi_rel_offset $27, 136 + stq $22, 104($sp) + stq $23, 112($sp) + stq $24, 120($sp) + stq $25, 128($sp) + stq $26, 136($sp) + stq $27, 144($sp) + stq $2, 160($sp) + stq $16, 168($sp) + stq $17, 176($sp) + stq $18, 184($sp) + .cfi_rel_offset $5, 48 + .cfi_rel_offset $6, 56 + .cfi_rel_offset $7, 64 + .cfi_rel_offset $8, 72 + .cfi_rel_offset $19, 80 + .cfi_rel_offset $20, 88 + .cfi_rel_offset $21, 96 + .cfi_rel_offset $22, 104 + .cfi_rel_offset $23, 112 + .cfi_rel_offset $24, 120 + .cfi_rel_offset $25, 128 + .cfi_rel_offset $26, 136 + .cfi_rel_offset $27, 144 .endm .macro RESTORE_ALL lda $19, alpha_mv - ldq $0, 0($sp) - ldq $1, 8($sp) - ldq $2, 16($sp) - ldq $3, 24($sp) - ldq $21, 152($sp) + ldq $0, 8($sp) + ldq $1, 16($sp) + ldq $2, 24($sp) + ldq $3, 32($sp) + ldq $21, 160($sp) ldq $20, HAE_CACHE($19) - ldq $4, 32($sp) - ldq $5, 40($sp) - ldq $6, 48($sp) - ldq $7, 56($sp) + ldq $4, 40($sp) + ldq $5, 48($sp) + ldq $6, 56($sp) + ldq $7, 64($sp) subq $20, $21, $20 - ldq $8, 64($sp) + ldq $8, 72($sp) beq $20, 99f ldq $20, HAE_REG($19) stq $21, HAE_CACHE($19) stq $21, 0($20) -99: ldq $19, 72($sp) - ldq $20, 80($sp) - ldq $21, 88($sp) - ldq $22, 96($sp) - ldq $23, 104($sp) - ldq $24, 112($sp) - ldq $25, 120($sp) - ldq $26, 128($sp) - ldq $27, 136($sp) - ldq $28, 144($sp) +99: ldq $19, 80($sp) + ldq $20, 88($sp) + ldq $21, 96($sp) + ldq $22, 104($sp) + ldq $23, 112($sp) + ldq $24, 120($sp) + ldq $25, 128($sp) + ldq $26, 136($sp) + ldq $27, 144($sp) + ldq $28, 152($sp) addq $sp, SP_OFF, $sp .cfi_restore $0 .cfi_restore $1 @@ -152,6 +152,26 @@ .macro DO_SWITCH_STACK bsr $1, do_switch_stack .cfi_adjust_cfa_offset SWITCH_STACK_SIZE + .cfi_rel_offset $9, 8 + .cfi_rel_offset $10, 16 + .cfi_rel_offset $11, 24 + .cfi_rel_offset $12, 32 + .cfi_rel_offset $13, 40 + .cfi_rel_offset $14, 48 + .cfi_rel_offset $15, 56 +.endm + +.macro DO_SWITCH_STACK_0 + lda $sp, -SWITCH_STACK_SIZE($sp) + .cfi_adjust_cfa_offset -SWITCH_STACK_SIZE + stq $9, 0($sp) + stq $10, 8($sp) + stq $11, 16($sp) + stq $12, 24($sp) + stq $13, 32($sp) + stq $14, 40($sp) + stq $15, 48($sp) + stq $26, 56($sp) .cfi_rel_offset $9, 0 .cfi_rel_offset $10, 8 .cfi_rel_offset $11, 16 @@ -173,6 +193,26 @@ .cfi_adjust_cfa_offset -SWITCH_STACK_SIZE .endm +.macro UNDO_SWITCH_STACK_0 + ldq $9, 0($sp) + ldq $10, 8($sp) + ldq $11, 16($sp) + ldq $12, 24($sp) + ldq $13, 32($sp) + ldq $14, 40($sp) + ldq $15, 48($sp) + ldq $26, 56($sp) + .cfi_restore $9 + .cfi_restore $10 + .cfi_restore $11 + .cfi_restore $12 + .cfi_restore $13 + .cfi_restore $14 + .cfi_restore $15 + lda $sp, SWITCH_STACK_SIZE($sp) + .cfi_adjust_cfa_offset SWITCH_STACK_SIZE +.endm + /* * Non-syscall kernel entry points. */ @@ -182,7 +222,7 @@ CFI_START_OSF_FRAME entInt lda $8, 0x3fff lda $26, ret_from_sys_call bic $sp, $8, $8 - mov $sp, $19 + addq $sp, 8, $19 jsr $31, do_entInt CFI_END_OSF_FRAME entInt @@ -191,15 +231,15 @@ CFI_START_OSF_FRAME entArith lda $8, 0x3fff lda $26, ret_from_sys_call bic $sp, $8, $8 - mov $sp, $18 + addq $sp, 8, $18 jsr $31, do_entArith CFI_END_OSF_FRAME entArith CFI_START_OSF_FRAME entMM SAVE_ALL /* save $9 - $15 so the inline exception code can manipulate them. */ - subq $sp, 56, $sp - .cfi_adjust_cfa_offset 56 + subq $sp, 48, $sp + .cfi_adjust_cfa_offset 48 stq $9, 0($sp) stq $10, 8($sp) stq $11, 16($sp) @@ -227,7 +267,7 @@ CFI_START_OSF_FRAME entMM ldq $13, 32($sp) ldq $14, 40($sp) ldq $15, 48($sp) - addq $sp, 56, $sp + addq $sp, 48, $sp .cfi_restore $9 .cfi_restore $10 .cfi_restore $11 @@ -235,7 +275,7 @@ CFI_START_OSF_FRAME entMM .cfi_restore $13 .cfi_restore $14 .cfi_restore $15 - .cfi_adjust_cfa_offset -56 + .cfi_adjust_cfa_offset -48 /* finish up the syscall as normal. */ br ret_from_sys_call CFI_END_OSF_FRAME entMM @@ -245,7 +285,7 @@ CFI_START_OSF_FRAME entIF lda $8, 0x3fff lda $26, ret_from_sys_call bic $sp, $8, $8 - mov $sp, $17 + addq $sp, 8, $17 jsr $31, do_entIF CFI_END_OSF_FRAME entIF @@ -382,8 +422,8 @@ CFI_START_OSF_FRAME entUna .cfi_restore $0 .cfi_adjust_cfa_offset -256 SAVE_ALL /* setup normal kernel stack */ - lda $sp, -56($sp) - .cfi_adjust_cfa_offset 56 + lda $sp, -48($sp) + .cfi_adjust_cfa_offset 48 stq $9, 0($sp) stq $10, 8($sp) stq $11, 16($sp) @@ -409,7 +449,7 @@ CFI_START_OSF_FRAME entUna ldq $13, 32($sp) ldq $14, 40($sp) ldq $15, 48($sp) - lda $sp, 56($sp) + lda $sp, 48($sp) .cfi_restore $9 .cfi_restore $10 .cfi_restore $11 @@ -417,7 +457,7 @@ CFI_START_OSF_FRAME entUna .cfi_restore $13 .cfi_restore $14 .cfi_restore $15 - .cfi_adjust_cfa_offset -56 + .cfi_adjust_cfa_offset -48 br ret_from_sys_call CFI_END_OSF_FRAME entUna @@ -426,7 +466,7 @@ CFI_START_OSF_FRAME entDbg lda $8, 0x3fff lda $26, ret_from_sys_call bic $sp, $8, $8 - mov $sp, $16 + addq $sp, 8, $16 jsr $31, do_entDbg CFI_END_OSF_FRAME entDbg \f @@ -478,8 +518,8 @@ CFI_END_OSF_FRAME entDbg ldgp $gp, 0($26) blt $0, $syscall_error /* the call failed */ $ret_success: - stq $0, 0($sp) - stq $31, 72($sp) /* a3=0 => no error */ + stq $0, 8($sp) + stq $31, 80($sp) /* a3=0 => no error */ .align 4 .globl ret_from_sys_call @@ -520,15 +560,15 @@ CFI_END_OSF_FRAME entDbg * frame to indicate that a negative return value wasn't an * error number.. */ - ldq $18, 0($sp) /* old syscall nr (zero if success) */ + ldq $18, 8($sp) /* old syscall nr (zero if success) */ beq $18, $ret_success - ldq $19, 72($sp) /* .. and this a3 */ + ldq $19, 80($sp) /* .. and this a3 */ subq $31, $0, $0 /* with error in v0 */ addq $31, 1, $1 /* set a3 for errno return */ - stq $0, 0($sp) + stq $0, 8($sp) mov $31, $26 /* tell "ret_from_sys_call" we can restart */ - stq $1, 72($sp) /* a3 for return */ + stq $1, 80($sp) /* a3 for return */ br ret_from_sys_call /* @@ -559,7 +599,7 @@ CFI_END_OSF_FRAME entDbg br ret_to_user $work_notifysig: - mov $sp, $16 + addq $sp, 8, $16 DO_SWITCH_STACK jsr $26, do_work_pending UNDO_SWITCH_STACK @@ -589,9 +629,9 @@ CFI_END_OSF_FRAME entDbg ldq $16, SP_OFF+24($sp) ldq $17, SP_OFF+32($sp) ldq $18, SP_OFF+40($sp) - ldq $19, 72($sp) - ldq $20, 80($sp) - ldq $21, 88($sp) + ldq $19, 80($sp) + ldq $20, 88($sp) + ldq $21, 96($sp) /* get the system call pointer.. */ lda $1, NR_syscalls($31) @@ -608,8 +648,8 @@ CFI_END_OSF_FRAME entDbg /* check return.. */ blt $0, $strace_error /* the call failed */ $strace_success: - stq $31, 72($sp) /* a3=0 => no error */ - stq $0, 0($sp) /* save return value */ + stq $31, 80($sp) /* a3=0 => no error */ + stq $0, 8($sp) /* save return value */ DO_SWITCH_STACK jsr $26, syscall_trace_leave @@ -618,14 +658,14 @@ CFI_END_OSF_FRAME entDbg .align 3 $strace_error: - ldq $18, 0($sp) /* old syscall nr (zero if success) */ + ldq $18, 8($sp) /* old syscall nr (zero if success) */ beq $18, $strace_success - ldq $19, 72($sp) /* .. and this a3 */ + ldq $19, 80($sp) /* .. and this a3 */ subq $31, $0, $0 /* with error in v0 */ addq $31, 1, $1 /* set a3 for errno return */ - stq $0, 0($sp) - stq $1, 72($sp) /* a3 for return */ + stq $0, 8($sp) + stq $1, 80($sp) /* a3 for return */ DO_SWITCH_STACK mov $18, $9 /* save old syscall number */ @@ -652,14 +692,14 @@ CFI_END_OSF_FRAME entSys do_switch_stack: lda $sp, -SWITCH_STACK_SIZE($sp) .cfi_adjust_cfa_offset SWITCH_STACK_SIZE - stq $9, 0($sp) - stq $10, 8($sp) - stq $11, 16($sp) - stq $12, 24($sp) - stq $13, 32($sp) - stq $14, 40($sp) - stq $15, 48($sp) - stq $26, 56($sp) + stq $9, 8($sp) + stq $10, 16($sp) + stq $11, 24($sp) + stq $12, 32($sp) + stq $13, 40($sp) + stq $14, 48($sp) + stq $15, 56($sp) + stq $26, 64($sp) ret $31, ($1), 1 .cfi_endproc .size do_switch_stack, .-do_switch_stack @@ -670,14 +710,14 @@ CFI_END_OSF_FRAME entSys .cfi_def_cfa $sp, 0 .cfi_register 64, $1 undo_switch_stack: - ldq $9, 0($sp) - ldq $10, 8($sp) - ldq $11, 16($sp) - ldq $12, 24($sp) - ldq $13, 32($sp) - ldq $14, 40($sp) - ldq $15, 48($sp) - ldq $26, 56($sp) + ldq $9, 8($sp) + ldq $10, 16($sp) + ldq $11, 24($sp) + ldq $12, 32($sp) + ldq $13, 40($sp) + ldq $14, 48($sp) + ldq $15, 56($sp) + ldq $26, 64($sp) lda $sp, SWITCH_STACK_SIZE($sp) ret $31, ($1), 1 .cfi_endproc @@ -733,7 +773,7 @@ CFI_END_OSF_FRAME entSys .type alpha_switch_to, @function .cfi_startproc alpha_switch_to: - DO_SWITCH_STACK + DO_SWITCH_STACK_0 ldl $1, TI_STATUS($8) and $1, TS_RESTORE_FP, $3 bne $3, 1f @@ -745,7 +785,7 @@ CFI_END_OSF_FRAME entSys 1: call_pal PAL_swpctx lda $8, 0x3fff - UNDO_SWITCH_STACK + UNDO_SWITCH_STACK_0 bic $sp, $8, $8 mov $17, $0 ret @@ -802,7 +842,7 @@ CFI_END_OSF_FRAME entSys bsr $26, __save_fpu 1: jsr $26, sys_\name - ldq $26, 56($sp) + ldq $26, 64($sp) lda $sp, SWITCH_STACK_SIZE($sp) ret .end alpha_\name @@ -847,6 +887,6 @@ sigreturn_like rt_sigreturn */ lda $0, -ENOSYS unop - stq $0, 0($sp) + stq $0, 8($sp) ret .end alpha_syscall_zero Index: linux-melmac/arch/alpha/kernel/head.S =================================================================== --- linux-melmac.orig/arch/alpha/kernel/head.S +++ linux-melmac/arch/alpha/kernel/head.S @@ -25,8 +25,8 @@ __HEAD 1: ldgp $29,0($27) /* We need to get current_task_info loaded up... */ lda $8,init_thread_union - /* ... and find our stack ... */ - lda $30,0x4000 - SIZEOF_PT_REGS($8) + /* ... and find our stack, ensuring 16-byte alignment ... */ + lda $30,0x4000 - (SIZEOF_PT_REGS + 8)($8) /* ... and then we can start the kernel. */ jsr $26,start_kernel call_pal PAL_halt Index: linux-melmac/arch/alpha/kernel/process.c =================================================================== --- linux-melmac.orig/arch/alpha/kernel/process.c +++ linux-melmac/arch/alpha/kernel/process.c @@ -240,7 +240,8 @@ int copy_thread(struct task_struct *p, c struct pt_regs *regs = current_pt_regs(); struct switch_stack *childstack, *stack; - childstack = ((struct switch_stack *) childregs) - 1; + childstack = + ((struct switch_stack *)((unsigned long *)childregs - 1)) - 1; childti->pcb.ksp = (unsigned long) childstack; childti->pcb.flags = 1; /* set FEN, clear everything else */ childti->status |= TS_SAVED_FP | TS_RESTORE_FP; @@ -248,7 +249,8 @@ int copy_thread(struct task_struct *p, c if (unlikely(args->fn)) { /* kernel thread */ memset(childstack, 0, - sizeof(struct switch_stack) + sizeof(struct pt_regs)); + (sizeof(struct switch_stack) + + sizeof(unsigned long) + sizeof(struct pt_regs))); childstack->r26 = (unsigned long) ret_from_kernel_thread; childstack->r9 = (unsigned long) args->fn; childstack->r10 = (unsigned long) args->fn_arg; @@ -345,7 +347,7 @@ int elf_core_copy_task_fpregs(struct tas * pointer is the 6th saved long on the kernel stack and that the * saved return address is the first long in the frame. This all * holds provided the thread blocked through a call to schedule() ($15 - * is the frame pointer in schedule() and $15 is saved at offset 48 by + * is the frame pointer in schedule() and $15 is saved at offset 56 by * entry.S:do_switch_stack). * * Under heavy swap load I've seen this lose in an ugly way. So do @@ -360,8 +362,8 @@ thread_saved_pc(struct task_struct *t) unsigned long base = (unsigned long)task_stack_page(t); unsigned long fp, sp = task_thread_info(t)->pcb.ksp; - if (sp > base && sp+6*8 < base + 16*1024) { - fp = ((unsigned long*)sp)[6]; + if (sp > base && sp + 7 * 8 < base + 16 * 1024) { + fp = ((unsigned long *)sp)[7]; if (fp > sp && fp < base + 16*1024) return *(unsigned long *)fp; } ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-25 15:35 ` Maciej W. Rozycki @ 2025-01-25 17:09 ` Ivan Kokshaysky 0 siblings, 0 replies; 62+ messages in thread From: Ivan Kokshaysky @ 2025-01-25 17:09 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, John Paul Adrian Glaubitz, rcu, linux-alpha On Sat, Jan 25, 2025 at 03:35:08PM +0000, Maciej W. Rozycki wrote: > On Fri, 24 Jan 2025, Ivan Kokshaysky wrote: > > > > > > Indeed, SP_OFF in entry.S is the main suspect at the moment. > > > > > > > > In fact, it's the odd number of longs (29) in struct pt_regs that makes > > > > the stack misaligned by 8 bytes. The patch below works for me - no more > > > > oopses in rcu-torture test. > > > > > > > > Unless I'm missing something, this change shouldn't have any ill effects. > > > > > > Umm, this is a part of UAPI, and the change in alignment changes the ABI > > > (think padding where `struct pt_regs' has been embedded into another > > > structure), so AFAICT it is a no-no. > > > > Well, the only userspace applications I can think of that need kernel > > stack layout are debuggers, but at least alpha gdb doesn't use this header. > > Doesn't matter, though - padding *after* PAL-saved registers is wrong > > thing to do. I think it's the reason for oopses that Magnus reported > > today. > > > > A "long" padding memder of pt_regs placed *before* PAL-saved registers > > would be a proper fix for kernel, but it most likely would break gdb... > > > > > But the only place I could quickly find this should matter for is this: > > > > > > /* ... and find our stack ... */ > > > lda $30,0x4000 - SIZEOF_PT_REGS($8) > > > > > > which should be straightforward to fix: > > > > > > lda $30,0x4000 - ((SIZEOF_PT_REGS + 15) & ~15)($8) > > > > > > or suchlike. Have I missed anything? > > > > That's the first thing I thought of too, but no, it's just a kernel > > entry point after the bootloader. The stack pointer of kernel threads > > is assigned in alpha/kernel/process.c. Particularly, these macros > > in ptrace.h (non-uapi) are interesting: > > > > #define task_pt_regs(task) \ > > ((struct pt_regs *) (task_stack_page(task) + 2*PAGE_SIZE) - 1) > > > > #define current_pt_regs() \ > > ((struct pt_regs *) ((char *)current_thread_info() + 2*PAGE_SIZE) - 1) > > > > I'll try to play with alignment here, but it will take some time. > > So after a crash course in PALcode stack frames I have come up with the > following WIP patch that works for me. If things go well, I'll clean it > up a little and turn into a proper patch submission. Not that I think I > can make the end result particularly pretty, there's no easy way AFAICT. > > NB with some instrumentation here's what gets reported for stack without: > > start_kernel: SP: fffffc0000dcfe98 > do_entInt: SP: fffffc0000dcfc30 > copy_thread: SP: fffffc0000dcfc98, regs: fffffc0000dcff18, childregs: fffffc0001837f18, childstack: fffffc0001837ed8 > do_page_fault: SP: fffffc0001837bc8 > sys_exit_group: SP: fffffc0002917ef8 > do_entUnaUser: SP: fffffc0001f33e70 > do_entArith: SP: fffffc0001f33ee8 > do_entIF: SP: fffffc000184bee8 > > and with the patch: > > start_kernel: SP: fffffc0000dcfe90 > do_entInt: SP: fffffc0000dcfc20 > copy_thread: SP: fffffc0000dcfc90, regs: fffffc0000dcff18, childregs: fffffc000183bf18, childstack: fffffc000183bed0 > do_page_fault: SP: fffffc000183bbc0 > sys_exit_group: SP: fffffc00028d3ef0 > do_entUnaUser: SP: fffffc000292fe70 > do_entArith: SP: fffffc0001d7fee0 > do_entIF: SP: fffffc0002827ee0 > > for the relevant situations (except for `entDbg', but that's analogous and > largely unused anyway). > > Can you guys please give it a try? Oh. I have little doubt it works, but so much hard work just to keep the pt_regs thing intact? Instead of asking ourselves how come it ended up in uapi? It was commit 96433f6ee49032d7a8b back in 2012 done by some scripting, I believe it was by mistake, because it's the kernel bowels exposed for absolutely no reason. I was going to propose a patch below (I don't think we can remove the file, as it probably break build of packages like linux-libc), and then add padding to pt_regs with exactly the same effect as your patch. Ivan. diff --git a/arch/alpha/include/uapi/asm/ptrace.h b/arch/alpha/include/uapi/asm/ptrace.h index 5ca45934fcbb..6b09e1df343d 100644 --- a/arch/alpha/include/uapi/asm/ptrace.h +++ b/arch/alpha/include/uapi/asm/ptrace.h @@ -2,7 +2,7 @@ #ifndef _UAPI_ASMAXP_PTRACE_H #define _UAPI_ASMAXP_PTRACE_H - +#ifdef __KERNEL__ /* * This struct defines the way the registers are stored on the * kernel stack during a system call or other kernel entry @@ -64,10 +64,7 @@ struct switch_stack { unsigned long r14; unsigned long r15; unsigned long r26; -#ifndef __KERNEL__ - unsigned long fp[32]; /* fp[31] is fpcr */ -#endif }; - +#endif #endif /* _UAPI_ASMAXP_PTRACE_H */ ^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: Kernel Oops on alpha with kernel version >=6.9.x 2025-01-23 18:36 ` Ivan Kokshaysky 2025-01-23 23:00 ` Magnus Lindholm 2025-01-23 23:57 ` Maciej W. Rozycki @ 2025-01-24 6:54 ` John Paul Adrian Glaubitz 2 siblings, 0 replies; 62+ messages in thread From: John Paul Adrian Glaubitz @ 2025-01-24 6:54 UTC (permalink / raw) To: Ivan Kokshaysky, Maciej W. Rozycki Cc: Magnus Lindholm, Paul E. McKenney, Michael Cree, rcu, linux-alpha Hi Ivan, On Thu, 2025-01-23 at 19:36 +0100, Ivan Kokshaysky wrote: > On Tue, Jan 21, 2025 at 02:39:12PM +0100, Ivan Kokshaysky wrote: > > Indeed, SP_OFF in entry.S is the main suspect at the moment. > > In fact, it's the odd number of longs (29) in struct pt_regs that makes > the stack misaligned by 8 bytes. The patch below works for me - no more > oopses in rcu-torture test. > > Unless I'm missing something, this change shouldn't have any ill effects. > > Ivan. > > diff --git a/arch/alpha/include/uapi/asm/ptrace.h b/arch/alpha/include/uapi/asm/ptrace.h > index 5ca45934fcbb..d2e8e69a18f1 100644 > --- a/arch/alpha/include/uapi/asm/ptrace.h > +++ b/arch/alpha/include/uapi/asm/ptrace.h > @@ -49,7 +49,7 @@ struct pt_regs { > unsigned long r16; > unsigned long r17; > unsigned long r18; > -}; > +} __attribute__((aligned(16))); /* GCC expects 16-byte stack alignment */ > > /* > * This is the extended stack used by signal handlers and the context I can confirm that this patch fixes the SMP problem [1] for me. Thanks, Adrian > [1] https://bugzilla.kernel.org/show_bug.cgi?id=213143 -- .''`. John Paul Adrian Glaubitz : :' : Debian Developer `. `' Physicist `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913 ^ permalink raw reply [flat|nested] 62+ messages in thread
* Kernel Oops on alpha with kernel version >=6.9.x @ 2024-11-24 21:47 Magnus Lindholm 0 siblings, 0 replies; 62+ messages in thread From: Magnus Lindholm @ 2024-11-24 21:47 UTC (permalink / raw) To: linux-alpha Hi, First some background: I've been trying to boot recent kernels on my alpha machines. Anything after linux-6.8.12 gives me trouble. After doing a kernel bisect, I found that commit 9187210eee7d87eea37b45ea93454a88681894a4 (net-next-6.9) is where my troubles begin. The problem consists in that the boot process gets stuck when trying to set parameters for network interfaces. The bad commit does make a lot of updates to the network code. When booting the system with kernel 6.12.0 I'm able to boot into single-user mode, but when starting system services one by one I trigger a kernel Oops when the network interface is renamed (see stack dump below). Looking at the changes made by the bad commit, it seems to (among other things) be replacing the locking mechanism (RCU instead of rtnl_lock). The stack dump from the kernel Oops suggests that something is happening in the RCU locking code. I'm no expert on RCU-stuff but I read somewhere that it is done by volatile access on all systems other than DEC Alpha, where a memory barrier instruction is required. This indicates that the change could affect Alpha architecture differently? Inspecting the changes to networking code in the bad commit, particularly the changes made to net/core/dev.c, I put together the patch below. This patch reverts one of the lines changed in the "bad commit" for net/core/dev.c. After reverting the change on just this line, I'm able to boot kernel 6.12.0 on my Alpha ES-40 to full multi-user again. I've tested this on an Alpha ES40 and an UP2000+ and the problem is 100% reproducible on both machines. The patch might not be a real solution to the problem but could be a good place to start looking when figuring out what's really going on. Not sure what is the next step here, it would be interesting to hear if anyone else has seen this or is able to reproduce it? ------------------------------------ Patch to "fix" the problem: ----------------------------------- diff --git a/net/core/dev.c b/net/core/dev.c index 13d00fc10f55..26fda14367e5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1261,7 +1261,7 @@ int dev_change_name(struct net_device *dev, const char *newname) netdev_name_node_del(dev->name_node); - synchronize_net(); + synchronize_rcu(); netdev_name_node_add(net, dev->name_node); -------------------------- dmesg/kernel log: ------------------------- [ 93.431592] tulip 0000:01:02.0 enp1s2: renamed from eth0 [ 93.436475] Unable to handle kernel paging request at virtual address 0000000000000000 [ 93.436475] CPU 1 [ 93.436475] rcu_exp_gp_kthr(17): Oops -1 [ 93.436475] pc = [<0000000000000000>] ra = [<0000000000000000>] ps = 0000 Not tainted [ 93.436475] pc is at 0x0 [ 93.436475] ra is at 0x0 [ 93.436475] v0 = 0000000000000007 t0 = fffffc0000e62440 t1 = 0000000000000001 [ 93.436475] t2 = 0000000000000000 t3 = 0000000000000001 t4 = 0000000000000001 [ 93.436475] t5 = 0000000000000001 t6 = 0000000000000001 t7 = fffffc0003138000 [ 93.436475] s0 = fffffc0000e62440 s1 = fffffc0000ec3a10 s2 = fffffc0000ec3a10 [ 93.436475] s3 = fffffc0000ec3a10 s4 = fffffc00003a90f0 s5 = fffffc0000e62440 [ 93.436475] s6 = 0000000000000000 [ 93.436475] a0 = 0000000000000000 a1 = 0000000000000000 a2 = 0000000000000000 [ 93.436475] a3 = 0000000000000000 a4 = 0000000000000001 a5 = fffffc0000517744 [ 93.436475] t8 = 0000000000000001 t9 = 0000000000000001 t10= fffffc0000e3d320 [ 93.436475] t11= fffffc0000220240 pv = fffffc0000b73210 at = 0000000000000000 [ 93.436475] gp = fffffc0000eb3a10 sp = 00000000ea2ea184 [ 93.436475] Disabling lock debugging due to kernel taint [ 93.436475] Trace: [ 93.436475] [<fffffc00003aee60>] wait_rcu_exp_gp+0x30/0xa0 [ 93.436475] [<fffffc0000b6c200>] __cond_resched+0x30/0x90 [ 93.436475] [<fffffc00003569b8>] kthread_worker_fn+0xc8/0x1f0 [ 93.436475] [<fffffc000035863c>] kthread+0x17c/0x1c0 [ 93.436475] [<fffffc00003568f0>] kthread_worker_fn+0x0/0x1f0 [ 93.436475] [<fffffc0000311128>] ret_from_kernel_thread+0x18/0x20 [ 93.436475] Code: [ 93.436475] 00000000 [ 93.436475] 00000000 [ 93.436475] 00063301 [ 93.436475] 0000077c [ 93.436475] 00001111 [ 93.436475] 000022a2 ^ permalink raw reply related [flat|nested] 62+ messages in thread
end of thread, other threads:[~2025-01-27 11:56 UTC | newest]
Thread overview: 62+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CA+=Fv5R9NG+1SHU9QV9hjmavycHKpnNyerQ=Ei90G98ukRcRJA@mail.gmail.com>
[not found] ` <f1561626-fde6-41c0-9d45-87a6a7b13816@paulmck-laptop>
[not found] ` <CA+=Fv5TMZ1bckob2VZ1AaNwdU3R+5a8REKg4aZR8novx7+dj-g@mail.gmail.com>
[not found] ` <8e365392-0e0d-4d78-bae7-69f27c8350f9@paulmck-laptop>
2024-12-04 22:22 ` Kernel Oops on alpha with kernel version >=6.9.x Magnus Lindholm
2024-12-05 15:39 ` John Paul Adrian Glaubitz
2024-12-05 17:02 ` Magnus Lindholm
2024-12-06 15:39 ` Magnus Lindholm
2024-12-06 17:05 ` John Paul Adrian Glaubitz
2024-12-07 12:33 ` Magnus Lindholm
2024-12-07 12:39 ` John Paul Adrian Glaubitz
2024-12-07 17:33 ` Magnus Lindholm
2024-12-07 18:38 ` John Paul Adrian Glaubitz
2024-12-08 9:43 ` Magnus Lindholm
2024-12-08 21:39 ` Magnus Lindholm
2024-12-08 23:18 ` Michael Cree
2024-12-08 23:31 ` John Paul Adrian Glaubitz
2024-12-09 8:11 ` Magnus Lindholm
2024-12-12 23:23 ` Magnus Lindholm
2024-12-09 8:05 ` Magnus Lindholm
2024-12-16 22:10 ` Michael Cree
2024-12-17 6:23 ` Magnus Lindholm
2024-12-18 19:33 ` Magnus Lindholm
2024-12-18 20:31 ` Paul E. McKenney
2024-12-18 21:54 ` Magnus Lindholm
2024-12-18 22:50 ` Paul E. McKenney
2024-12-19 22:38 ` Magnus Lindholm
2024-12-19 23:03 ` Paul E. McKenney
2024-12-20 0:00 ` Maciej W. Rozycki
2024-12-27 10:42 ` Magnus Lindholm
2024-12-27 11:48 ` John Paul Adrian Glaubitz
2024-12-27 16:30 ` Maciej W. Rozycki
2024-12-31 10:43 ` Magnus Lindholm
2025-01-12 23:25 ` Magnus Lindholm
2025-01-13 0:19 ` Maciej W. Rozycki
2025-01-13 3:08 ` Maciej W. Rozycki
2025-01-13 5:59 ` Magnus Lindholm
2025-01-13 8:04 ` Maciej W. Rozycki
2025-01-13 16:52 ` Magnus Lindholm
2025-01-20 13:01 ` Magnus Lindholm
2025-01-20 13:19 ` Maciej W. Rozycki
2025-01-21 13:39 ` Ivan Kokshaysky
2025-01-23 18:36 ` Ivan Kokshaysky
2025-01-23 23:00 ` Magnus Lindholm
2025-01-23 23:51 ` Michael Cree
2025-01-23 23:57 ` Maciej W. Rozycki
2025-01-24 6:06 ` Magnus Lindholm
2025-01-24 10:55 ` Ivan Kokshaysky
2025-01-24 16:57 ` Magnus Lindholm
2025-01-25 15:15 ` Ivan Kokshaysky
2025-01-25 17:01 ` Maciej W. Rozycki
2025-01-25 17:43 ` Ivan Kokshaysky
2025-01-25 18:25 ` Maciej W. Rozycki
2025-01-25 18:59 ` Maciej W. Rozycki
2025-01-25 19:48 ` Ivan Kokshaysky
2025-01-25 22:06 ` Maciej W. Rozycki
2025-01-25 23:02 ` Ivan Kokshaysky
2025-01-26 14:00 ` Ivan Kokshaysky
2025-01-26 19:15 ` Magnus Lindholm
2025-01-27 11:48 ` Ivan Kokshaysky
2025-01-27 11:56 ` John Paul Adrian Glaubitz
2025-01-25 18:07 ` Magnus Lindholm
2025-01-25 15:35 ` Maciej W. Rozycki
2025-01-25 17:09 ` Ivan Kokshaysky
2025-01-24 6:54 ` John Paul Adrian Glaubitz
2024-11-24 21:47 Magnus Lindholm
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox