All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH Dovetail 6.16 v2 0/2] Dovetail 6.16: Fix some regressions introduced during last forward port
@ 2025-08-22  6:54 Florian Bezdeka
  2025-08-22  6:55 ` [PATCH Dovetail 6.16 v2 1/2] x86/fpu: dovetail: Fix FPU corruption Florian Bezdeka
  2025-08-22  6:55 ` [PATCH Dovetail 6.16 v2 2/2] irq_pipeline: Fix pipelining code for level triggered IRQs Florian Bezdeka
  0 siblings, 2 replies; 6+ messages in thread
From: Florian Bezdeka @ 2025-08-22  6:54 UTC (permalink / raw)
  To: xenomai; +Cc: Philippe Gerum, Jan Kiszka, Florian Bezdeka

Hi all,

seems the last forward port to 6.16 introduced some regressions.
Please review carefully, thanks!

With that applied I have two open problems:
  - Running dmesg kills the system, no idea yet
    Requires at least 3 CPUs on my qemu system to reproduce.
    Update: I know how to fix that, but not fully understood yet.

  - I'm able to reproduce [1] on 6.16 now
    Update: This one is understood now. Xenomai 3 bug.
    Tests already ongoing.

CC: Philippe Gerum <rpm@xenomai.org>
CC: Jan Kiszka <jan.kiszka@siemens.com>

[1] https://lore.kernel.org/xenomai/6fcb3f9830056d99161ea8f50a016b99d98a7c71.camel@siemens.com/T/

Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
---
Changes in v2:
- Patch 1: Reworked as discussed on the list
           Following the same pattern as for arm64 
- Link to v1: https://lore.kernel.org/r/20250820-wip-flo-fixes-for-dovetail-6-15-v1-0-d44f50cb4b0b@siemens.com

---
Florian Bezdeka (2):
      x86/fpu: dovetail: Fix FPU corruption
      irq_pipeline: Fix pipelining code for level triggered IRQs

 arch/x86/include/asm/fpu/sched.h   | 17 ++++-------------
 arch/x86/include/asm/fpu/types.h   | 12 ------------
 arch/x86/include/asm/thread_info.h |  1 +
 arch/x86/kernel/fpu/core.c         | 12 ++++--------
 kernel/irq/chip.c                  | 33 +++++++++++++++++++++++++++++++--
 5 files changed, 40 insertions(+), 35 deletions(-)
---
base-commit: eec111a874f5899e1078ef785ccf29dff986e5a0
change-id: 20250820-wip-flo-fixes-for-dovetail-6-15-1368f7f6e0b1

Best regards,
-- 
Florian Bezdeka <florian.bezdeka@siemens.com>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH Dovetail 6.16 v2 1/2] x86/fpu: dovetail: Fix FPU corruption
  2025-08-22  6:54 [PATCH Dovetail 6.16 v2 0/2] Dovetail 6.16: Fix some regressions introduced during last forward port Florian Bezdeka
@ 2025-08-22  6:55 ` Florian Bezdeka
  2025-09-03 16:07   ` Philippe Gerum
  2025-08-22  6:55 ` [PATCH Dovetail 6.16 v2 2/2] irq_pipeline: Fix pipelining code for level triggered IRQs Florian Bezdeka
  1 sibling, 1 reply; 6+ messages in thread
From: Florian Bezdeka @ 2025-08-22  6:55 UTC (permalink / raw)
  To: xenomai; +Cc: Philippe Gerum, Jan Kiszka, Florian Bezdeka

The kernel splat looked as follows and could be reproduced by running
the Xenomai 3 testsuite over a stress-ng produced load. The problem is
specific to Dovetail 6.16.

[19527.937613] ------------[ cut here ]------------
[19527.937614] WARNING: CPU: 0 PID: 14 at arch/x86/kernel/fpu/core.c:62 x86_task_fpu+0x18/0x30
[19527.937619] Modules linked in: intel_rapl_msr intel_rapl_common intel_pmc_core pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_vsec ghash_clmulni_intel sha512_ssse3 sha1_ssse3 aesni_intel gf128mul rapl pcspkr button evdev serio_raw joydev sg loop efi_pstore fuse dm_mod drm dax configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 sr_mod cdrom ata_generic sd_mod hid_generic hid_hyperv ata_piix psmouse hid hv_netvsc hv_storvsc libata scsi_transport_fc floppy i2c_piix4 i2c_smbus scsi_mod scsi_common [last unloaded: xeno_can]
[19527.937635] CPU: 0 UID: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 6.16.0-hyprv-xenomai-0+ #196 PREEMPT(voluntary)
[19527.937637] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[19527.937637] IRQ stage: Xenomai
[19527.937638] RIP: 0010:x86_task_fpu+0x18/0x30
[19527.937640] Code: 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8d 87 00 20 00 00 f6 47 3e 20 75 05 c3 cc cc cc cc 90 <0f> 0b 90 31 c0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f
[19527.937641] RSP: 0018:ffffba19c0083b60 EFLAGS: 00010002
[19527.937641] RAX: ffffa0a90133c3c0 RBX: 0000000000000001 RCX: ffffa0a93fc5a6e8
[19527.937642] RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffffa0a90133a3c0
[19527.937643] RBP: ffffa0a93fc5a058 R08: 01364d9cb9685197 R09: ffffffff83e406f8
[19527.937643] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[19527.937643] R13: ffffa0a90133a3c0 R14: ffffba19c0c62008 R15: 0000000000000001
[19527.937645] FS:  0000000000000000(0000) GS:ffffa0a9ba664000(0000) knlGS:0000000000000000
[19527.937646] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19527.937646] CR2: 00007f5c694363c0 CR3: 000000003e038004 CR4: 00000000003706f0
[19527.937647] Call Trace:
[19527.937648]  <TASK>
[19527.937649]  fpu__suspend_inband+0x3c/0x50
[19527.937651]  dovetail_context_switch+0x130/0x240
[19527.937652]  ? xntimer_start+0x101/0x240
[19527.937656]  ___xnsched_run+0x1f9/0x4d0
[19527.937659]  run_oob_call+0x87/0x140
[19527.937662]  handle_irq_pipelined_finish+0x188/0x1a0
[19527.937664]  arch_pipeline_entry+0x43/0xf0
[19527.937667]  asm_sysvec_hyperv_stimer0+0x16/0x20
[19527.937669] RIP: 0010:kernel_fpu_end+0x5/0x50
[19527.937670] Code: 57 02 e8 9e 12 42 00 31 c0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <65> 0f b6 05 03 40 57 02 84 c0 75 31 65 c6 05 f7 3f 57 02 01 65 f6
[19527.937671] RSP: 0018:ffffba19c0083d28 EFLAGS: 00000246
[19527.937671] RAX: 000000000000000c RBX: 0000000000000001 RCX: ffffffff842f2260
[19527.937672] RDX: 0000000000000000 RSI: ffffffff84c67450 RDI: ffffffff84c673e0
[19527.937672] RBP: 0000000000000001 R08: ffffffff842f2260 R09: 0000000000000000
[19527.937672] R10: ffffffff84a080c0 R11: 0000000000000001 R12: ffffffff84c67410
[19527.937673] R13: ffffffff84c673e0 R14: 0000000000000040 R15: 0000000000000004
[19527.937675]  blake2s_compress+0x52/0xa0
[19527.937679]  blake2s_update+0x75/0x150
[19527.937681]  add_timer_randomness+0xac/0x1a0
[19527.937684]  scsi_end_request+0x297/0x3a0 [scsi_mod]
[19527.937697]  scsi_io_completion+0x55/0x6a0 [scsi_mod]
[19527.937707]  blk_done_softirq+0x46/0x60
[19527.937709]  handle_softirqs+0xd0/0x430
[19527.937711]  ? __pfx_smpboot_thread_fn+0x10/0x10
[19527.937713]  run_ksoftirqd+0x47/0x60
[19527.937714]  smpboot_thread_fn+0xec/0x220
[19527.937716]  kthread+0xf7/0x240
[19527.937718]  ? __pfx_kthread+0x10/0x10
[19527.937720]  ret_from_fork+0x28c/0x310
[19527.937721]  ? __pfx_kthread+0x10/0x10
[19527.937722]  ret_from_fork_asm+0x1a/0x30
[19527.937725]  </TASK>
[19527.937725] irq event stamp: 1270
[19527.937726] hardirqs last  enabled at (1269): [<ffffffff830b2b1c>] run_ksoftirqd+0x4c/0x60
[19527.937727] hardirqs last disabled at (1270): [<ffffffff83d90d53>] __schedule+0x853/0x1080
[19527.937729] softirqs last  enabled at (1268): [<ffffffff830b29d6>] handle_softirqs+0x346/0x430
[19527.937730] softirqs last disabled at (1215): [<ffffffff830b2b17>] run_ksoftirqd+0x47/0x60
[19527.937731] ---[ end trace 0000000000000000 ]---

The main problem was the x86_task_fpu(tsk) helper, which will deliver
NULL in case CONFIG_X86_DEBUG_FPU is enabled and
(tsk->flags & PF_KTHREAD) is true. We ended up in dereferencing a
NULL pointer.

This problem could be fixed by not using the x86_task_fpu().
But:
  - It seems that the location of struct fpu is under development.
  - Moving the preemption marker from struct fpu to
    struct thread_info::flags synchronizes the amd64 and arm64
    implementation.

Second, the early exit in fpu__resume_inband() was wrong, as that
prevented the FPU restore of in-band kthreads.

Link: https://lore.kernel.org/xenomai/80a48c1ddc3721eb8022b77548028aa8e7371928.camel@siemens.com/T/
Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
---
 arch/x86/include/asm/fpu/sched.h   | 17 ++++-------------
 arch/x86/include/asm/fpu/types.h   | 12 ------------
 arch/x86/include/asm/thread_info.h |  1 +
 arch/x86/kernel/fpu/core.c         | 12 ++++--------
 4 files changed, 9 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
index ba677ffa4fbb..2b66960accd4 100644
--- a/arch/x86/include/asm/fpu/sched.h
+++ b/arch/x86/include/asm/fpu/sched.h
@@ -18,19 +18,9 @@ extern void fpu_flush_thread(void);
 
 #ifdef CONFIG_DOVETAIL
 
-static inline void oob_fpu_set_preempt(struct fpu *fpu)
+static inline bool oob_fpu_preempted(struct thread_info *old_ti)
 {
-	fpu->preempted = true;
-}
-
-static inline void oob_fpu_clear_preempt(struct fpu *fpu)
-{
-	fpu->preempted = false;
-}
-
-static inline bool oob_fpu_preempted(struct fpu *old_fpu)
-{
-	return old_fpu->preempted;
+	return test_thread_flag(TIF_KERNEL_FP_PREEMPTED);
 }
 
 #else
@@ -60,9 +50,10 @@ static inline void switch_fpu(struct task_struct *old, int cpu)
 {
 	if (!test_tsk_thread_flag(old, TIF_NEED_FPU_LOAD) &&
 		cpu_feature_enabled(X86_FEATURE_FPU)) {
+		struct thread_info *old_ti = task_thread_info(old);
 		struct fpu *old_fpu = x86_task_fpu(old);
 		if (!(old->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
-			!oob_fpu_preempted(old_fpu)) {
+			!oob_fpu_preempted(old_ti)) {
 			set_tsk_thread_flag(old, TIF_NEED_FPU_LOAD);
 			save_fpregs_to_fpstate(old_fpu);
 			/*
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index c329a878a581..1c94121acd3d 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -473,18 +473,6 @@ struct fpu {
 	 */
 	unsigned int			last_cpu;
 
-#ifdef CONFIG_DOVETAIL
-	/*
-	 * @preempted:
-	 *
-	 * When Dovetail is enabled, this flag is set for the inband
-	 * task context saved when entering a kernel_fpu_begin/end()
-	 * section before the latter got preempted by an out-of-band
-	 * task.
-	 */
-	bool				preempted : 1;
-#endif
-
 	/*
 	 * @avx512_timestamp:
 	 *
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 77c4c632cef2..53f11b73d0f0 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -94,6 +94,7 @@ struct thread_info {
 #define TIF_NEED_RESCHED_LAZY	4	/* Lazy rescheduling needed */
 #define TIF_SINGLESTEP		5	/* reenable singlestep on user return*/
 #define TIF_SSBD		6	/* Speculative store bypass disable */
+#define TIF_KERNEL_FP_PREEMPTED	7	/* Dovetail: kernel mode FPU section preempted by OOB */
 #define TIF_SPEC_IB		9	/* Indirect branch speculation mitigation */
 #define TIF_SPEC_L1D_FLUSH	10	/* Flush L1D on mm switches (processes) */
 #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7b87a04c0b6e..6d185ad1a996 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -1044,7 +1044,6 @@ static int fpu__drop_kernel_fpstate(unsigned int cpu)
 void fpu__suspend_inband(void)
 {
 	struct fpu *kfpu = this_cpu_read(in_kernel_fpstate);
-	struct task_struct *tsk = current;
 
 	/*
 	 * If kernel_fpu_allowed is false, we are dealing with the
@@ -1054,23 +1053,20 @@ void fpu__suspend_inband(void)
 	if (!this_cpu_read(kernel_fpu_allowed)) {
 		save_fpregs_to_fpstate(kfpu);
 		__cpu_invalidate_fpregs_state();
-		oob_fpu_set_preempt(x86_task_fpu(tsk));
+		set_thread_flag(TIF_KERNEL_FP_PREEMPTED);
 	}
 }
 
 void fpu__resume_inband(void)
 {
 	struct fpu *kfpu = this_cpu_read(in_kernel_fpstate);
-	struct task_struct *tsk = current;
 
-	if (tsk->flags & PF_KTHREAD)
-		return;
-
-	if (oob_fpu_preempted(x86_task_fpu(tsk))) {
+	if (test_and_clear_thread_flag(TIF_KERNEL_FP_PREEMPTED)) {
 		restore_fpregs_from_fpstate(kfpu->fpstate, XFEATURE_MASK_FPSTATE);
 		__cpu_invalidate_fpregs_state();
-		oob_fpu_clear_preempt(x86_task_fpu(tsk));
 	} else {
+		if (current->flags & PF_KTHREAD)
+			return;
 		if (test_thread_flag(TIF_NEED_FPU_LOAD))
 			switch_fpu_return();
 	}

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH Dovetail 6.16 v2 2/2] irq_pipeline: Fix pipelining code for level triggered IRQs
  2025-08-22  6:54 [PATCH Dovetail 6.16 v2 0/2] Dovetail 6.16: Fix some regressions introduced during last forward port Florian Bezdeka
  2025-08-22  6:55 ` [PATCH Dovetail 6.16 v2 1/2] x86/fpu: dovetail: Fix FPU corruption Florian Bezdeka
@ 2025-08-22  6:55 ` Florian Bezdeka
  1 sibling, 0 replies; 6+ messages in thread
From: Florian Bezdeka @ 2025-08-22  6:55 UTC (permalink / raw)
  To: xenomai; +Cc: Philippe Gerum, Jan Kiszka, Florian Bezdeka

During the last forward port to 6.16 the pipelining code was
accidentally removed from handle_level_irq().

The result was the following warning, observed on qemu/kvm:

[    0.532822] ------------[ cut here ]------------
[    0.532823] WARNING: CPU: 0 PID: 0 at ./include/linux/interrupt.h:670 update_process_times+0xc8/0xf0
[    0.532832] Modules linked in:
[    0.532836] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.16.0-hyprv-xenomai-0+ #212 PREEMPT(voluntary)
[    0.532840] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[    0.532841] IRQ stage: Linux
[    0.532843] RIP: 0010:update_process_times+0xc8/0xf0
[    0.532846] Code: e8 fd f1 01 00 84 c0 74 be 8b 05 17 5d d8 01 85 c0 74 1a 65 8b 05 30 13 6a 02 85 c0 75 0f 65 8b 05 25 10 6a 02 85 c0 75 04 900
[    0.532848] RSP: 0000:ffffffff82e03cb0 EFLAGS: 00010046
[    0.532851] RAX: 0000000000000000 RBX: ffff88803e64a800 RCX: 0000000000000000
[    0.532852] RDX: 00000000fffedb0a RSI: ffff88803e64cd80 RDI: ffff88803e64e040
[    0.532854] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
[    0.532855] R10: 2ecf57e6e4d1a59e R11: 0000000000000185 R12: 0000000000000000
[    0.532856] R13: ffff888004c52400 R14: 0000000000000000 R15: ffffffff82e277c0
[    0.532885] FS:  0000000000000000(0000) GS:ffff8880babc2000(0000) knlGS:0000000000000000
[    0.532887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.532888] CR2: ffff888004801000 CR3: 0000000002e38000 CR4: 00000000000000b0
[    0.532889] Call Trace:
[    0.532891]  <TASK>
[    0.532895]  tick_periodic+0x29/0xe0
[    0.532909]  tick_handle_periodic+0x20/0x70
[    0.532914]  timer_interrupt+0x24/0x40
[    0.532918]  __handle_irq_event_percpu+0x9d/0x2e0
[    0.532924]  handle_irq_event+0x58/0xe0
[    0.532928]  handle_level_irq+0x166/0x1e0
[    0.532933]  generic_pipeline_irq_desc+0xc2/0x220
[    0.532937]  arch_handle_irq+0x5d/0x180
[    0.532943]  arch_pipeline_entry+0x38/0xf0
[    0.532950]  asm_common_interrupt+0x22/0x40
[    0.532953] RIP: 0010:unmask_8259A+0x68/0x70
[    0.532956] Code: c1 01 e6 21 0f b6 05 28 13 c1 01 e6 a1 48 c7 c7 b8 4f e4 82 48 89 de e8 06 3a 11 00 48 c7 c7 a0 4f e4 82 e8 da 6b 11 00 55 9d0
[    0.532957] RSP: 0000:ffffffff82e03e90 EFLAGS: 00000246
[    0.532959] RAX: 0000000000000001 RBX: ffffffff81233c31 RCX: ffffffff82e03e54
[    0.532960] RDX: 0000000000000000 RSI: ffffffff82e44fb8 RDI: ffffffff82e44fa0
[    0.532961] RBP: 0000000000000246 R08: 0000000000000001 R09: ffffffff82e286b0
[    0.532962] R10: 5878d26fbb66c2c1 R11: 0000000000000000 R12: ffffffff82e277c0
[    0.532963] R13: ffff88803ffd1f40 R14: ffffffffffffffea R15: ffffffff82e26e30
[    0.532966]  ? unmask_8259A+0x11/0x70
[    0.532976]  ? unmask_8259A+0x11/0x70
[    0.532979]  enable_IR_x2apic+0xf0/0x240
[    0.532986]  x86_64_probe_apic+0xb/0x50
[    0.532990]  apic_intr_mode_init+0x3e/0xf0
[    0.532992]  x86_late_time_init+0x20/0x40
[    0.532995]  start_kernel+0x75a/0x8b0
[    0.533002]  x86_64_start_reservations+0x14/0x30
[    0.533005]  x86_64_start_kernel+0xff/0x110
[    0.533009]  common_startup_64+0x13e/0x148
[    0.533022]  </TASK>

Forward porting the (old) code from 6.15 again to fix this problem.

Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
---
 kernel/irq/chip.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 8a0831193d74..aacacb5869cd 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -846,15 +846,44 @@ static void cond_unmask_irq(struct irq_desc *desc)
  */
 void handle_level_irq(struct irq_desc *desc)
 {
+	int flow;
+
 	guard(hybrid_spinlock)(&desc->lock);
-	mask_ack_irq(desc);
 
-	if (!irq_can_handle(desc))
+	flow = get_flow_step(desc);
+	if (may_start_flow(flow)) {
+		mask_ack_irq(desc);
+
+		if (!irq_can_handle(desc))
+			return;
+	}
+
+	if (should_feed_pipeline(desc, flow)) {
+		if (handle_oob_irq(desc))
+			goto out_unmask;
+		return;
+	}
+
+	if (flow == IRQ_FLOW_FORWARD) {
+		forward_irq_event(desc);
 		return;
+	}
+
+	desc->istate &= ~(IRQS_REPLAY | IRQS_WAITING);
+
+	/*
+	 * If its disabled or no action available keep it masked and get out
+	 * of here
+	 */
+	if (unlikely(!desc->action || irqd_irq_disabled(&desc->irq_data))) {
+		desc->istate |= IRQS_PENDING;
+		return;
+	}
 
 	kstat_incr_irqs_this_cpu(desc);
 	handle_irq_event(desc);
 
+out_unmask:
 	cond_unmask_irq(desc);
 }
 EXPORT_SYMBOL_GPL(handle_level_irq);

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH Dovetail 6.16 v2 1/2] x86/fpu: dovetail: Fix FPU corruption
  2025-08-22  6:55 ` [PATCH Dovetail 6.16 v2 1/2] x86/fpu: dovetail: Fix FPU corruption Florian Bezdeka
@ 2025-09-03 16:07   ` Philippe Gerum
  2025-09-04  7:36     ` Florian Bezdeka
  0 siblings, 1 reply; 6+ messages in thread
From: Philippe Gerum @ 2025-09-03 16:07 UTC (permalink / raw)
  To: Florian Bezdeka; +Cc: xenomai, Jan Kiszka

Florian Bezdeka <florian.bezdeka@siemens.com> writes:

> The kernel splat looked as follows and could be reproduced by running
> the Xenomai 3 testsuite over a stress-ng produced load. The problem is
> specific to Dovetail 6.16.
>
> [19527.937613] ------------[ cut here ]------------
> [19527.937614] WARNING: CPU: 0 PID: 14 at arch/x86/kernel/fpu/core.c:62 x86_task_fpu+0x18/0x30
> [19527.937619] Modules linked in: intel_rapl_msr intel_rapl_common intel_pmc_core pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_vsec ghash_clmulni_intel sha512_ssse3 sha1_ssse3 aesni_intel gf128mul rapl pcspkr button evdev serio_raw joydev sg loop efi_pstore fuse dm_mod drm dax configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 sr_mod cdrom ata_generic sd_mod hid_generic hid_hyperv ata_piix psmouse hid hv_netvsc hv_storvsc libata scsi_transport_fc floppy i2c_piix4 i2c_smbus scsi_mod scsi_common [last unloaded: xeno_can]
> [19527.937635] CPU: 0 UID: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 6.16.0-hyprv-xenomai-0+ #196 PREEMPT(voluntary)
> [19527.937637] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
> [19527.937637] IRQ stage: Xenomai
> [19527.937638] RIP: 0010:x86_task_fpu+0x18/0x30
> [19527.937640] Code: 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8d 87 00 20 00 00 f6 47 3e 20 75 05 c3 cc cc cc cc 90 <0f> 0b 90 31 c0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f
> [19527.937641] RSP: 0018:ffffba19c0083b60 EFLAGS: 00010002
> [19527.937641] RAX: ffffa0a90133c3c0 RBX: 0000000000000001 RCX: ffffa0a93fc5a6e8
> [19527.937642] RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffffa0a90133a3c0
> [19527.937643] RBP: ffffa0a93fc5a058 R08: 01364d9cb9685197 R09: ffffffff83e406f8
> [19527.937643] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> [19527.937643] R13: ffffa0a90133a3c0 R14: ffffba19c0c62008 R15: 0000000000000001
> [19527.937645] FS:  0000000000000000(0000) GS:ffffa0a9ba664000(0000) knlGS:0000000000000000
> [19527.937646] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [19527.937646] CR2: 00007f5c694363c0 CR3: 000000003e038004 CR4: 00000000003706f0
> [19527.937647] Call Trace:
> [19527.937648]  <TASK>
> [19527.937649]  fpu__suspend_inband+0x3c/0x50
> [19527.937651]  dovetail_context_switch+0x130/0x240
> [19527.937652]  ? xntimer_start+0x101/0x240
> [19527.937656]  ___xnsched_run+0x1f9/0x4d0
> [19527.937659]  run_oob_call+0x87/0x140
> [19527.937662]  handle_irq_pipelined_finish+0x188/0x1a0
> [19527.937664]  arch_pipeline_entry+0x43/0xf0
> [19527.937667]  asm_sysvec_hyperv_stimer0+0x16/0x20
> [19527.937669] RIP: 0010:kernel_fpu_end+0x5/0x50
> [19527.937670] Code: 57 02 e8 9e 12 42 00 31 c0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <65> 0f b6 05 03 40 57 02 84 c0 75 31 65 c6 05 f7 3f 57 02 01 65 f6
> [19527.937671] RSP: 0018:ffffba19c0083d28 EFLAGS: 00000246
> [19527.937671] RAX: 000000000000000c RBX: 0000000000000001 RCX: ffffffff842f2260
> [19527.937672] RDX: 0000000000000000 RSI: ffffffff84c67450 RDI: ffffffff84c673e0
> [19527.937672] RBP: 0000000000000001 R08: ffffffff842f2260 R09: 0000000000000000
> [19527.937672] R10: ffffffff84a080c0 R11: 0000000000000001 R12: ffffffff84c67410
> [19527.937673] R13: ffffffff84c673e0 R14: 0000000000000040 R15: 0000000000000004
> [19527.937675]  blake2s_compress+0x52/0xa0
> [19527.937679]  blake2s_update+0x75/0x150
> [19527.937681]  add_timer_randomness+0xac/0x1a0
> [19527.937684]  scsi_end_request+0x297/0x3a0 [scsi_mod]
> [19527.937697]  scsi_io_completion+0x55/0x6a0 [scsi_mod]
> [19527.937707]  blk_done_softirq+0x46/0x60
> [19527.937709]  handle_softirqs+0xd0/0x430
> [19527.937711]  ? __pfx_smpboot_thread_fn+0x10/0x10
> [19527.937713]  run_ksoftirqd+0x47/0x60
> [19527.937714]  smpboot_thread_fn+0xec/0x220
> [19527.937716]  kthread+0xf7/0x240
> [19527.937718]  ? __pfx_kthread+0x10/0x10
> [19527.937720]  ret_from_fork+0x28c/0x310
> [19527.937721]  ? __pfx_kthread+0x10/0x10
> [19527.937722]  ret_from_fork_asm+0x1a/0x30
> [19527.937725]  </TASK>
> [19527.937725] irq event stamp: 1270
> [19527.937726] hardirqs last  enabled at (1269): [<ffffffff830b2b1c>] run_ksoftirqd+0x4c/0x60
> [19527.937727] hardirqs last disabled at (1270): [<ffffffff83d90d53>] __schedule+0x853/0x1080
> [19527.937729] softirqs last  enabled at (1268): [<ffffffff830b29d6>] handle_softirqs+0x346/0x430
> [19527.937730] softirqs last disabled at (1215): [<ffffffff830b2b17>] run_ksoftirqd+0x47/0x60
> [19527.937731] ---[ end trace 0000000000000000 ]---
>
> The main problem was the x86_task_fpu(tsk) helper, which will deliver
> NULL in case CONFIG_X86_DEBUG_FPU is enabled and
> (tsk->flags & PF_KTHREAD) is true. We ended up in dereferencing a
> NULL pointer.
>
> This problem could be fixed by not using the x86_task_fpu().
> But:
>   - It seems that the location of struct fpu is under development.
>   - Moving the preemption marker from struct fpu to
>     struct thread_info::flags synchronizes the amd64 and arm64
>     implementation.
>
> Second, the early exit in fpu__resume_inband() was wrong, as that
> prevented the FPU restore of in-band kthreads.
>
> Link: https://lore.kernel.org/xenomai/80a48c1ddc3721eb8022b77548028aa8e7371928.camel@siemens.com/T/
> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> ---
>  arch/x86/include/asm/fpu/sched.h   | 17 ++++-------------
>  arch/x86/include/asm/fpu/types.h   | 12 ------------
>  arch/x86/include/asm/thread_info.h |  1 +
>  arch/x86/kernel/fpu/core.c         | 12 ++++--------
>  4 files changed, 9 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
> index ba677ffa4fbb..2b66960accd4 100644
> --- a/arch/x86/include/asm/fpu/sched.h
> +++ b/arch/x86/include/asm/fpu/sched.h
> @@ -18,19 +18,9 @@ extern void fpu_flush_thread(void);
>  
>  #ifdef CONFIG_DOVETAIL
>  
> -static inline void oob_fpu_set_preempt(struct fpu *fpu)
> +static inline bool oob_fpu_preempted(struct thread_info *old_ti)
>  {
> -	fpu->preempted = true;
> -}
> -
> -static inline void oob_fpu_clear_preempt(struct fpu *fpu)
> -{
> -	fpu->preempted = false;
> -}
> -
> -static inline bool oob_fpu_preempted(struct fpu *old_fpu)
> -{
> -	return old_fpu->preempted;
> +	return test_thread_flag(TIF_KERNEL_FP_PREEMPTED);
>  }
>  
>  #else
> @@ -60,9 +50,10 @@ static inline void switch_fpu(struct task_struct *old, int cpu)
>  {
>  	if (!test_tsk_thread_flag(old, TIF_NEED_FPU_LOAD) &&
>  		cpu_feature_enabled(X86_FEATURE_FPU)) {
> +		struct thread_info *old_ti = task_thread_info(old);
>  		struct fpu *old_fpu = x86_task_fpu(old);
>  		if (!(old->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
> -			!oob_fpu_preempted(old_fpu)) {
> +			!oob_fpu_preempted(old_ti)) {
>  			set_tsk_thread_flag(old, TIF_NEED_FPU_LOAD);
>  			save_fpregs_to_fpstate(old_fpu);
>  			/*
> diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
> index c329a878a581..1c94121acd3d 100644
> --- a/arch/x86/include/asm/fpu/types.h
> +++ b/arch/x86/include/asm/fpu/types.h
> @@ -473,18 +473,6 @@ struct fpu {
>  	 */
>  	unsigned int			last_cpu;
>  
> -#ifdef CONFIG_DOVETAIL
> -	/*
> -	 * @preempted:
> -	 *
> -	 * When Dovetail is enabled, this flag is set for the inband
> -	 * task context saved when entering a kernel_fpu_begin/end()
> -	 * section before the latter got preempted by an out-of-band
> -	 * task.
> -	 */
> -	bool				preempted : 1;
> -#endif
> -
>  	/*
>  	 * @avx512_timestamp:
>  	 *
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index 77c4c632cef2..53f11b73d0f0 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -94,6 +94,7 @@ struct thread_info {
>  #define TIF_NEED_RESCHED_LAZY	4	/* Lazy rescheduling needed */
>  #define TIF_SINGLESTEP		5	/* reenable singlestep on user return*/
>  #define TIF_SSBD		6	/* Speculative store bypass disable */
> +#define TIF_KERNEL_FP_PREEMPTED	7	/* Dovetail: kernel mode
> FPU section preempted by OOB */

Only the current task may set/clear/read this flag, so the latter should
rather go to the TLF (synchronous) section. As a consequence, non-atomic
ops would be enough to access it.

>  #define TIF_SPEC_IB		9	/* Indirect branch speculation mitigation */
>  #define TIF_SPEC_L1D_FLUSH	10	/* Flush L1D on mm switches (processes) */
>  #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index 7b87a04c0b6e..6d185ad1a996 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -1044,7 +1044,6 @@ static int fpu__drop_kernel_fpstate(unsigned int cpu)
>  void fpu__suspend_inband(void)
>  {
>  	struct fpu *kfpu = this_cpu_read(in_kernel_fpstate);
> -	struct task_struct *tsk = current;
>  
>  	/*
>  	 * If kernel_fpu_allowed is false, we are dealing with the
> @@ -1054,23 +1053,20 @@ void fpu__suspend_inband(void)
>  	if (!this_cpu_read(kernel_fpu_allowed)) {
>  		save_fpregs_to_fpstate(kfpu);
>  		__cpu_invalidate_fpregs_state();
> -		oob_fpu_set_preempt(x86_task_fpu(tsk));
> +		set_thread_flag(TIF_KERNEL_FP_PREEMPTED);
>  	}
>  }
>  
>  void fpu__resume_inband(void)
>  {
>  	struct fpu *kfpu = this_cpu_read(in_kernel_fpstate);
> -	struct task_struct *tsk = current;
>  
> -	if (tsk->flags & PF_KTHREAD)
> -		return;
> -
> -	if (oob_fpu_preempted(x86_task_fpu(tsk))) {
> +	if (test_and_clear_thread_flag(TIF_KERNEL_FP_PREEMPTED)) {

In most cases, we won't need to clear this flag, so a split
test then clear-if-set approach would be more efficient.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH Dovetail 6.16 v2 1/2] x86/fpu: dovetail: Fix FPU corruption
  2025-09-03 16:07   ` Philippe Gerum
@ 2025-09-04  7:36     ` Florian Bezdeka
  2025-09-04  7:54       ` Philippe Gerum
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Bezdeka @ 2025-09-04  7:36 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai, Jan Kiszka

On Wed, 2025-09-03 at 18:07 +0200, Philippe Gerum wrote:
> Florian Bezdeka <florian.bezdeka@siemens.com> writes:
> 
> > The kernel splat looked as follows and could be reproduced by running
> > the Xenomai 3 testsuite over a stress-ng produced load. The problem is
> > specific to Dovetail 6.16.
> > 
> > [19527.937613] ------------[ cut here ]------------
> > [19527.937614] WARNING: CPU: 0 PID: 14 at arch/x86/kernel/fpu/core.c:62 x86_task_fpu+0x18/0x30
> > [19527.937619] Modules linked in: intel_rapl_msr intel_rapl_common intel_pmc_core pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_vsec ghash_clmulni_intel sha512_ssse3 sha1_ssse3 aesni_intel gf128mul rapl pcspkr button evdev serio_raw joydev sg loop efi_pstore fuse dm_mod drm dax configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 sr_mod cdrom ata_generic sd_mod hid_generic hid_hyperv ata_piix psmouse hid hv_netvsc hv_storvsc libata scsi_transport_fc floppy i2c_piix4 i2c_smbus scsi_mod scsi_common [last unloaded: xeno_can]
> > [19527.937635] CPU: 0 UID: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 6.16.0-hyprv-xenomai-0+ #196 PREEMPT(voluntary)
> > [19527.937637] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
> > [19527.937637] IRQ stage: Xenomai
> > [19527.937638] RIP: 0010:x86_task_fpu+0x18/0x30
> > [19527.937640] Code: 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8d 87 00 20 00 00 f6 47 3e 20 75 05 c3 cc cc cc cc 90 <0f> 0b 90 31 c0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f
> > [19527.937641] RSP: 0018:ffffba19c0083b60 EFLAGS: 00010002
> > [19527.937641] RAX: ffffa0a90133c3c0 RBX: 0000000000000001 RCX: ffffa0a93fc5a6e8
> > [19527.937642] RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffffa0a90133a3c0
> > [19527.937643] RBP: ffffa0a93fc5a058 R08: 01364d9cb9685197 R09: ffffffff83e406f8
> > [19527.937643] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
> > [19527.937643] R13: ffffa0a90133a3c0 R14: ffffba19c0c62008 R15: 0000000000000001
> > [19527.937645] FS:  0000000000000000(0000) GS:ffffa0a9ba664000(0000) knlGS:0000000000000000
> > [19527.937646] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [19527.937646] CR2: 00007f5c694363c0 CR3: 000000003e038004 CR4: 00000000003706f0
> > [19527.937647] Call Trace:
> > [19527.937648]  <TASK>
> > [19527.937649]  fpu__suspend_inband+0x3c/0x50
> > [19527.937651]  dovetail_context_switch+0x130/0x240
> > [19527.937652]  ? xntimer_start+0x101/0x240
> > [19527.937656]  ___xnsched_run+0x1f9/0x4d0
> > [19527.937659]  run_oob_call+0x87/0x140
> > [19527.937662]  handle_irq_pipelined_finish+0x188/0x1a0
> > [19527.937664]  arch_pipeline_entry+0x43/0xf0
> > [19527.937667]  asm_sysvec_hyperv_stimer0+0x16/0x20
> > [19527.937669] RIP: 0010:kernel_fpu_end+0x5/0x50
> > [19527.937670] Code: 57 02 e8 9e 12 42 00 31 c0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <65> 0f b6 05 03 40 57 02 84 c0 75 31 65 c6 05 f7 3f 57 02 01 65 f6
> > [19527.937671] RSP: 0018:ffffba19c0083d28 EFLAGS: 00000246
> > [19527.937671] RAX: 000000000000000c RBX: 0000000000000001 RCX: ffffffff842f2260
> > [19527.937672] RDX: 0000000000000000 RSI: ffffffff84c67450 RDI: ffffffff84c673e0
> > [19527.937672] RBP: 0000000000000001 R08: ffffffff842f2260 R09: 0000000000000000
> > [19527.937672] R10: ffffffff84a080c0 R11: 0000000000000001 R12: ffffffff84c67410
> > [19527.937673] R13: ffffffff84c673e0 R14: 0000000000000040 R15: 0000000000000004
> > [19527.937675]  blake2s_compress+0x52/0xa0
> > [19527.937679]  blake2s_update+0x75/0x150
> > [19527.937681]  add_timer_randomness+0xac/0x1a0
> > [19527.937684]  scsi_end_request+0x297/0x3a0 [scsi_mod]
> > [19527.937697]  scsi_io_completion+0x55/0x6a0 [scsi_mod]
> > [19527.937707]  blk_done_softirq+0x46/0x60
> > [19527.937709]  handle_softirqs+0xd0/0x430
> > [19527.937711]  ? __pfx_smpboot_thread_fn+0x10/0x10
> > [19527.937713]  run_ksoftirqd+0x47/0x60
> > [19527.937714]  smpboot_thread_fn+0xec/0x220
> > [19527.937716]  kthread+0xf7/0x240
> > [19527.937718]  ? __pfx_kthread+0x10/0x10
> > [19527.937720]  ret_from_fork+0x28c/0x310
> > [19527.937721]  ? __pfx_kthread+0x10/0x10
> > [19527.937722]  ret_from_fork_asm+0x1a/0x30
> > [19527.937725]  </TASK>
> > [19527.937725] irq event stamp: 1270
> > [19527.937726] hardirqs last  enabled at (1269): [<ffffffff830b2b1c>] run_ksoftirqd+0x4c/0x60
> > [19527.937727] hardirqs last disabled at (1270): [<ffffffff83d90d53>] __schedule+0x853/0x1080
> > [19527.937729] softirqs last  enabled at (1268): [<ffffffff830b29d6>] handle_softirqs+0x346/0x430
> > [19527.937730] softirqs last disabled at (1215): [<ffffffff830b2b17>] run_ksoftirqd+0x47/0x60
> > [19527.937731] ---[ end trace 0000000000000000 ]---
> > 
> > The main problem was the x86_task_fpu(tsk) helper, which will deliver
> > NULL in case CONFIG_X86_DEBUG_FPU is enabled and
> > (tsk->flags & PF_KTHREAD) is true. We ended up in dereferencing a
> > NULL pointer.
> > 
> > This problem could be fixed by not using the x86_task_fpu().
> > But:
> >   - It seems that the location of struct fpu is under development.
> >   - Moving the preemption marker from struct fpu to
> >     struct thread_info::flags synchronizes the amd64 and arm64
> >     implementation.
> > 
> > Second, the early exit in fpu__resume_inband() was wrong, as that
> > prevented the FPU restore of in-band kthreads.
> > 
> > Link: https://lore.kernel.org/xenomai/80a48c1ddc3721eb8022b77548028aa8e7371928.camel@siemens.com/T/
> > Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> > ---
> >  arch/x86/include/asm/fpu/sched.h   | 17 ++++-------------
> >  arch/x86/include/asm/fpu/types.h   | 12 ------------
> >  arch/x86/include/asm/thread_info.h |  1 +
> >  arch/x86/kernel/fpu/core.c         | 12 ++++--------
> >  4 files changed, 9 insertions(+), 33 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
> > index ba677ffa4fbb..2b66960accd4 100644
> > --- a/arch/x86/include/asm/fpu/sched.h
> > +++ b/arch/x86/include/asm/fpu/sched.h
> > @@ -18,19 +18,9 @@ extern void fpu_flush_thread(void);
> >  
> >  #ifdef CONFIG_DOVETAIL
> >  
> > -static inline void oob_fpu_set_preempt(struct fpu *fpu)
> > +static inline bool oob_fpu_preempted(struct thread_info *old_ti)
> >  {
> > -	fpu->preempted = true;
> > -}
> > -
> > -static inline void oob_fpu_clear_preempt(struct fpu *fpu)
> > -{
> > -	fpu->preempted = false;
> > -}
> > -
> > -static inline bool oob_fpu_preempted(struct fpu *old_fpu)
> > -{
> > -	return old_fpu->preempted;
> > +	return test_thread_flag(TIF_KERNEL_FP_PREEMPTED);
> >  }
> >  
> >  #else
> > @@ -60,9 +50,10 @@ static inline void switch_fpu(struct task_struct *old, int cpu)
> >  {
> >  	if (!test_tsk_thread_flag(old, TIF_NEED_FPU_LOAD) &&
> >  		cpu_feature_enabled(X86_FEATURE_FPU)) {
> > +		struct thread_info *old_ti = task_thread_info(old);
> >  		struct fpu *old_fpu = x86_task_fpu(old);
> >  		if (!(old->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
> > -			!oob_fpu_preempted(old_fpu)) {
> > +			!oob_fpu_preempted(old_ti)) {
> >  			set_tsk_thread_flag(old, TIF_NEED_FPU_LOAD);
> >  			save_fpregs_to_fpstate(old_fpu);
> >  			/*
> > diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
> > index c329a878a581..1c94121acd3d 100644
> > --- a/arch/x86/include/asm/fpu/types.h
> > +++ b/arch/x86/include/asm/fpu/types.h
> > @@ -473,18 +473,6 @@ struct fpu {
> >  	 */
> >  	unsigned int			last_cpu;
> >  
> > -#ifdef CONFIG_DOVETAIL
> > -	/*
> > -	 * @preempted:
> > -	 *
> > -	 * When Dovetail is enabled, this flag is set for the inband
> > -	 * task context saved when entering a kernel_fpu_begin/end()
> > -	 * section before the latter got preempted by an out-of-band
> > -	 * task.
> > -	 */
> > -	bool				preempted : 1;
> > -#endif
> > -
> >  	/*
> >  	 * @avx512_timestamp:
> >  	 *
> > diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> > index 77c4c632cef2..53f11b73d0f0 100644
> > --- a/arch/x86/include/asm/thread_info.h
> > +++ b/arch/x86/include/asm/thread_info.h
> > @@ -94,6 +94,7 @@ struct thread_info {
> >  #define TIF_NEED_RESCHED_LAZY	4	/* Lazy rescheduling needed */
> >  #define TIF_SINGLESTEP		5	/* reenable singlestep on user return*/
> >  #define TIF_SSBD		6	/* Speculative store bypass disable */
> > +#define TIF_KERNEL_FP_PREEMPTED	7	/* Dovetail: kernel mode
> > FPU section preempted by OOB */
> 
> Only the current task may set/clear/read this flag, so the latter should
> rather go to the TLF (synchronous) section. As a consequence, non-atomic
> ops would be enough to access it.
> 
> >  #define TIF_SPEC_IB		9	/* Indirect branch speculation mitigation */
> >  #define TIF_SPEC_L1D_FLUSH	10	/* Flush L1D on mm switches (processes) */
> >  #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> > index 7b87a04c0b6e..6d185ad1a996 100644
> > --- a/arch/x86/kernel/fpu/core.c
> > +++ b/arch/x86/kernel/fpu/core.c
> > @@ -1044,7 +1044,6 @@ static int fpu__drop_kernel_fpstate(unsigned int cpu)
> >  void fpu__suspend_inband(void)
> >  {
> >  	struct fpu *kfpu = this_cpu_read(in_kernel_fpstate);
> > -	struct task_struct *tsk = current;
> >  
> >  	/*
> >  	 * If kernel_fpu_allowed is false, we are dealing with the
> > @@ -1054,23 +1053,20 @@ void fpu__suspend_inband(void)
> >  	if (!this_cpu_read(kernel_fpu_allowed)) {
> >  		save_fpregs_to_fpstate(kfpu);
> >  		__cpu_invalidate_fpregs_state();
> > -		oob_fpu_set_preempt(x86_task_fpu(tsk));
> > +		set_thread_flag(TIF_KERNEL_FP_PREEMPTED);
> >  	}
> >  }
> >  
> >  void fpu__resume_inband(void)
> >  {
> >  	struct fpu *kfpu = this_cpu_read(in_kernel_fpstate);
> > -	struct task_struct *tsk = current;
> >  
> > -	if (tsk->flags & PF_KTHREAD)
> > -		return;
> > -
> > -	if (oob_fpu_preempted(x86_task_fpu(tsk))) {
> > +	if (test_and_clear_thread_flag(TIF_KERNEL_FP_PREEMPTED)) {
> 
> In most cases, we won't need to clear this flag, so a split
> test then clear-if-set approach would be more efficient.

I aligned that with the arm64 implementation of the fpu restore. Should
I fix that up for both implementations afterwards? Same applies to the
flag location / access.

> 
> -- 
> Philippe.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH Dovetail 6.16 v2 1/2] x86/fpu: dovetail: Fix FPU corruption
  2025-09-04  7:36     ` Florian Bezdeka
@ 2025-09-04  7:54       ` Philippe Gerum
  0 siblings, 0 replies; 6+ messages in thread
From: Philippe Gerum @ 2025-09-04  7:54 UTC (permalink / raw)
  To: Florian Bezdeka; +Cc: Philippe Gerum, xenomai, Jan Kiszka

Florian Bezdeka <florian.bezdeka@siemens.com> writes:

> On Wed, 2025-09-03 at 18:07 +0200, Philippe Gerum wrote:
>> Florian Bezdeka <florian.bezdeka@siemens.com> writes:
>> 
>> > The kernel splat looked as follows and could be reproduced by running
>> > the Xenomai 3 testsuite over a stress-ng produced load. The problem is
>> > specific to Dovetail 6.16.
>> > 
>> > [19527.937613] ------------[ cut here ]------------
>> > [19527.937614] WARNING: CPU: 0 PID: 14 at arch/x86/kernel/fpu/core.c:62 x86_task_fpu+0x18/0x30
>> > [19527.937619] Modules linked in: intel_rapl_msr intel_rapl_common intel_pmc_core pmt_telemetry pmt_class intel_pmc_ssram_telemetry intel_vsec ghash_clmulni_intel sha512_ssse3 sha1_ssse3 aesni_intel gf128mul rapl pcspkr button evdev serio_raw joydev sg loop efi_pstore fuse dm_mod drm dax configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 sr_mod cdrom ata_generic sd_mod hid_generic hid_hyperv ata_piix psmouse hid hv_netvsc hv_storvsc libata scsi_transport_fc floppy i2c_piix4 i2c_smbus scsi_mod scsi_common [last unloaded: xeno_can]
>> > [19527.937635] CPU: 0 UID: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 6.16.0-hyprv-xenomai-0+ #196 PREEMPT(voluntary)
>> > [19527.937637] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
>> > [19527.937637] IRQ stage: Xenomai
>> > [19527.937638] RIP: 0010:x86_task_fpu+0x18/0x30
>> > [19527.937640] Code: 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8d 87 00 20 00 00 f6 47 3e 20 75 05 c3 cc cc cc cc 90 <0f> 0b 90 31 c0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f
>> > [19527.937641] RSP: 0018:ffffba19c0083b60 EFLAGS: 00010002
>> > [19527.937641] RAX: ffffa0a90133c3c0 RBX: 0000000000000001 RCX: ffffa0a93fc5a6e8
>> > [19527.937642] RDX: 0000000000000000 RSI: 0000000000000007 RDI: ffffa0a90133a3c0
>> > [19527.937643] RBP: ffffa0a93fc5a058 R08: 01364d9cb9685197 R09: ffffffff83e406f8
>> > [19527.937643] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
>> > [19527.937643] R13: ffffa0a90133a3c0 R14: ffffba19c0c62008 R15: 0000000000000001
>> > [19527.937645] FS:  0000000000000000(0000) GS:ffffa0a9ba664000(0000) knlGS:0000000000000000
>> > [19527.937646] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> > [19527.937646] CR2: 00007f5c694363c0 CR3: 000000003e038004 CR4: 00000000003706f0
>> > [19527.937647] Call Trace:
>> > [19527.937648]  <TASK>
>> > [19527.937649]  fpu__suspend_inband+0x3c/0x50
>> > [19527.937651]  dovetail_context_switch+0x130/0x240
>> > [19527.937652]  ? xntimer_start+0x101/0x240
>> > [19527.937656]  ___xnsched_run+0x1f9/0x4d0
>> > [19527.937659]  run_oob_call+0x87/0x140
>> > [19527.937662]  handle_irq_pipelined_finish+0x188/0x1a0
>> > [19527.937664]  arch_pipeline_entry+0x43/0xf0
>> > [19527.937667]  asm_sysvec_hyperv_stimer0+0x16/0x20
>> > [19527.937669] RIP: 0010:kernel_fpu_end+0x5/0x50
>> > [19527.937670] Code: 57 02 e8 9e 12 42 00 31 c0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <65> 0f b6 05 03 40 57 02 84 c0 75 31 65 c6 05 f7 3f 57 02 01 65 f6
>> > [19527.937671] RSP: 0018:ffffba19c0083d28 EFLAGS: 00000246
>> > [19527.937671] RAX: 000000000000000c RBX: 0000000000000001 RCX: ffffffff842f2260
>> > [19527.937672] RDX: 0000000000000000 RSI: ffffffff84c67450 RDI: ffffffff84c673e0
>> > [19527.937672] RBP: 0000000000000001 R08: ffffffff842f2260 R09: 0000000000000000
>> > [19527.937672] R10: ffffffff84a080c0 R11: 0000000000000001 R12: ffffffff84c67410
>> > [19527.937673] R13: ffffffff84c673e0 R14: 0000000000000040 R15: 0000000000000004
>> > [19527.937675]  blake2s_compress+0x52/0xa0
>> > [19527.937679]  blake2s_update+0x75/0x150
>> > [19527.937681]  add_timer_randomness+0xac/0x1a0
>> > [19527.937684]  scsi_end_request+0x297/0x3a0 [scsi_mod]
>> > [19527.937697]  scsi_io_completion+0x55/0x6a0 [scsi_mod]
>> > [19527.937707]  blk_done_softirq+0x46/0x60
>> > [19527.937709]  handle_softirqs+0xd0/0x430
>> > [19527.937711]  ? __pfx_smpboot_thread_fn+0x10/0x10
>> > [19527.937713]  run_ksoftirqd+0x47/0x60
>> > [19527.937714]  smpboot_thread_fn+0xec/0x220
>> > [19527.937716]  kthread+0xf7/0x240
>> > [19527.937718]  ? __pfx_kthread+0x10/0x10
>> > [19527.937720]  ret_from_fork+0x28c/0x310
>> > [19527.937721]  ? __pfx_kthread+0x10/0x10
>> > [19527.937722]  ret_from_fork_asm+0x1a/0x30
>> > [19527.937725]  </TASK>
>> > [19527.937725] irq event stamp: 1270
>> > [19527.937726] hardirqs last  enabled at (1269): [<ffffffff830b2b1c>] run_ksoftirqd+0x4c/0x60
>> > [19527.937727] hardirqs last disabled at (1270): [<ffffffff83d90d53>] __schedule+0x853/0x1080
>> > [19527.937729] softirqs last  enabled at (1268): [<ffffffff830b29d6>] handle_softirqs+0x346/0x430
>> > [19527.937730] softirqs last disabled at (1215): [<ffffffff830b2b17>] run_ksoftirqd+0x47/0x60
>> > [19527.937731] ---[ end trace 0000000000000000 ]---
>> > 
>> > The main problem was the x86_task_fpu(tsk) helper, which will deliver
>> > NULL in case CONFIG_X86_DEBUG_FPU is enabled and
>> > (tsk->flags & PF_KTHREAD) is true. We ended up in dereferencing a
>> > NULL pointer.
>> > 
>> > This problem could be fixed by not using the x86_task_fpu().
>> > But:
>> >   - It seems that the location of struct fpu is under development.
>> >   - Moving the preemption marker from struct fpu to
>> >     struct thread_info::flags synchronizes the amd64 and arm64
>> >     implementation.
>> > 
>> > Second, the early exit in fpu__resume_inband() was wrong, as that
>> > prevented the FPU restore of in-band kthreads.
>> > 
>> > Link: https://lore.kernel.org/xenomai/80a48c1ddc3721eb8022b77548028aa8e7371928.camel@siemens.com/T/
>> > Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
>> > ---
>> >  arch/x86/include/asm/fpu/sched.h   | 17 ++++-------------
>> >  arch/x86/include/asm/fpu/types.h   | 12 ------------
>> >  arch/x86/include/asm/thread_info.h |  1 +
>> >  arch/x86/kernel/fpu/core.c         | 12 ++++--------
>> >  4 files changed, 9 insertions(+), 33 deletions(-)
>> > 
>> > diff --git a/arch/x86/include/asm/fpu/sched.h b/arch/x86/include/asm/fpu/sched.h
>> > index ba677ffa4fbb..2b66960accd4 100644
>> > --- a/arch/x86/include/asm/fpu/sched.h
>> > +++ b/arch/x86/include/asm/fpu/sched.h
>> > @@ -18,19 +18,9 @@ extern void fpu_flush_thread(void);
>> >  
>> >  #ifdef CONFIG_DOVETAIL
>> >  
>> > -static inline void oob_fpu_set_preempt(struct fpu *fpu)
>> > +static inline bool oob_fpu_preempted(struct thread_info *old_ti)
>> >  {
>> > -	fpu->preempted = true;
>> > -}
>> > -
>> > -static inline void oob_fpu_clear_preempt(struct fpu *fpu)
>> > -{
>> > -	fpu->preempted = false;
>> > -}
>> > -
>> > -static inline bool oob_fpu_preempted(struct fpu *old_fpu)
>> > -{
>> > -	return old_fpu->preempted;
>> > +	return test_thread_flag(TIF_KERNEL_FP_PREEMPTED);
>> >  }
>> >  
>> >  #else
>> > @@ -60,9 +50,10 @@ static inline void switch_fpu(struct task_struct *old, int cpu)
>> >  {
>> >  	if (!test_tsk_thread_flag(old, TIF_NEED_FPU_LOAD) &&
>> >  		cpu_feature_enabled(X86_FEATURE_FPU)) {
>> > +		struct thread_info *old_ti = task_thread_info(old);
>> >  		struct fpu *old_fpu = x86_task_fpu(old);
>> >  		if (!(old->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
>> > -			!oob_fpu_preempted(old_fpu)) {
>> > +			!oob_fpu_preempted(old_ti)) {
>> >  			set_tsk_thread_flag(old, TIF_NEED_FPU_LOAD);
>> >  			save_fpregs_to_fpstate(old_fpu);
>> >  			/*
>> > diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
>> > index c329a878a581..1c94121acd3d 100644
>> > --- a/arch/x86/include/asm/fpu/types.h
>> > +++ b/arch/x86/include/asm/fpu/types.h
>> > @@ -473,18 +473,6 @@ struct fpu {
>> >  	 */
>> >  	unsigned int			last_cpu;
>> >  
>> > -#ifdef CONFIG_DOVETAIL
>> > -	/*
>> > -	 * @preempted:
>> > -	 *
>> > -	 * When Dovetail is enabled, this flag is set for the inband
>> > -	 * task context saved when entering a kernel_fpu_begin/end()
>> > -	 * section before the latter got preempted by an out-of-band
>> > -	 * task.
>> > -	 */
>> > -	bool				preempted : 1;
>> > -#endif
>> > -
>> >  	/*
>> >  	 * @avx512_timestamp:
>> >  	 *
>> > diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
>> > index 77c4c632cef2..53f11b73d0f0 100644
>> > --- a/arch/x86/include/asm/thread_info.h
>> > +++ b/arch/x86/include/asm/thread_info.h
>> > @@ -94,6 +94,7 @@ struct thread_info {
>> >  #define TIF_NEED_RESCHED_LAZY	4	/* Lazy rescheduling needed */
>> >  #define TIF_SINGLESTEP		5	/* reenable singlestep on user return*/
>> >  #define TIF_SSBD		6	/* Speculative store bypass disable */
>> > +#define TIF_KERNEL_FP_PREEMPTED	7	/* Dovetail: kernel mode
>> > FPU section preempted by OOB */
>> 
>> Only the current task may set/clear/read this flag, so the latter should
>> rather go to the TLF (synchronous) section. As a consequence, non-atomic
>> ops would be enough to access it.
>> 
>> >  #define TIF_SPEC_IB		9	/* Indirect branch speculation mitigation */
>> >  #define TIF_SPEC_L1D_FLUSH	10	/* Flush L1D on mm switches (processes) */
>> >  #define TIF_USER_RETURN_NOTIFY	11	/* notify kernel of userspace return */
>> > diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
>> > index 7b87a04c0b6e..6d185ad1a996 100644
>> > --- a/arch/x86/kernel/fpu/core.c
>> > +++ b/arch/x86/kernel/fpu/core.c
>> > @@ -1044,7 +1044,6 @@ static int fpu__drop_kernel_fpstate(unsigned int cpu)
>> >  void fpu__suspend_inband(void)
>> >  {
>> >  	struct fpu *kfpu = this_cpu_read(in_kernel_fpstate);
>> > -	struct task_struct *tsk = current;
>> >  
>> >  	/*
>> >  	 * If kernel_fpu_allowed is false, we are dealing with the
>> > @@ -1054,23 +1053,20 @@ void fpu__suspend_inband(void)
>> >  	if (!this_cpu_read(kernel_fpu_allowed)) {
>> >  		save_fpregs_to_fpstate(kfpu);
>> >  		__cpu_invalidate_fpregs_state();
>> > -		oob_fpu_set_preempt(x86_task_fpu(tsk));
>> > +		set_thread_flag(TIF_KERNEL_FP_PREEMPTED);
>> >  	}
>> >  }
>> >  
>> >  void fpu__resume_inband(void)
>> >  {
>> >  	struct fpu *kfpu = this_cpu_read(in_kernel_fpstate);
>> > -	struct task_struct *tsk = current;
>> >  
>> > -	if (tsk->flags & PF_KTHREAD)
>> > -		return;
>> > -
>> > -	if (oob_fpu_preempted(x86_task_fpu(tsk))) {
>> > +	if (test_and_clear_thread_flag(TIF_KERNEL_FP_PREEMPTED)) {
>> 
>> In most cases, we won't need to clear this flag, so a split
>> test then clear-if-set approach would be more efficient.
>
> I aligned that with the arm64 implementation of the fpu restore. Should
> I fix that up for both implementations afterwards? Same applies to the
> flag location / access.
>

Yep, that would make sense to align both.

-- 
Philippe.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-09-04  7:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-22  6:54 [PATCH Dovetail 6.16 v2 0/2] Dovetail 6.16: Fix some regressions introduced during last forward port Florian Bezdeka
2025-08-22  6:55 ` [PATCH Dovetail 6.16 v2 1/2] x86/fpu: dovetail: Fix FPU corruption Florian Bezdeka
2025-09-03 16:07   ` Philippe Gerum
2025-09-04  7:36     ` Florian Bezdeka
2025-09-04  7:54       ` Philippe Gerum
2025-08-22  6:55 ` [PATCH Dovetail 6.16 v2 2/2] irq_pipeline: Fix pipelining code for level triggered IRQs Florian Bezdeka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.