* [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Johannes Nixdorf @ 2024-05-16 16:22 UTC
To: linux-arm-kernel
I noticed frequent FS corruption on my M1 MacBook running Linux after
the Asahi Linux Kernel was updated to 6.9.x (from 6.6.x).
A git bisect pointed me to 2632e2521769 ("arm64: fpsimd: Implement lazy
restore for kernel mode FPSIMD").
This was reproduced with fio's examples/basic-verify.fio (1GB of writing
did not trigger it reliably, 10GB did) on vanilla kernels and
happens on any storage backend behind dm-crypt.
I was advised to report it here on IRC.
This was independently described in [1].
Regards,
Johannes Nixdorf
[1]: https://github.com/tpwrules/nixos-apple-silicon/issues/200
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Marc Zyngier @ 2024-05-16 16:25 UTC
To: Johannes Nixdorf, Ard Biesheuvel, Mark Brown; +Cc: linux-arm-kernel

+ Ard, Broonie

On 2024-05-16 17:22, Johannes Nixdorf wrote:
> I noticed frequent FS corruption on my M1 MacBook running Linux after
> the Asahi Linux Kernel was updated to 6.9.x (from 6.6.x).
>
> A git bisect pointed me to 2632e2521769 ("arm64: fpsimd: Implement lazy
> restore for kernel mode FPSIMD").
>
> This was reproduced with fio's examples/basic-verify.fio (1GB of writing
> did not trigger it reliably, 10GB did) on vanilla kernels and
> happens on any storage backend behind dm-crypt.
>
> I was advised to report it here on IRC.
>
> This was independently described in [1].
>
> Regards,
> Johannes Nixdorf
>
> [1]: https://github.com/tpwrules/nixos-apple-silicon/issues/200

--
Jazz is not dead. It just smells funny...
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Dave Martin @ 2024-05-16 17:16 UTC
To: Marc Zyngier
Cc: Johannes Nixdorf, Ard Biesheuvel, Mark Brown, linux-arm-kernel

On Thu, May 16, 2024 at 05:25:32PM +0100, Marc Zyngier wrote:
> + Ard, Broonie
>
> On 2024-05-16 17:22, Johannes Nixdorf wrote:
> > I noticed frequent FS corruption on my M1 MacBook running Linux after
> > the Asahi Linux Kernel was updated to 6.9.x (from 6.6.x).
> >
> > A git bisect pointed me to 2632e2521769 ("arm64: fpsimd: Implement lazy
> > restore for kernel mode FPSIMD").

It's a while since I worked on this code, and things have moved in the
meantime, but there seems to be an asymmetry between where
fpsimd_bind_state_to_cpu() is called here and where the analogous
fpsimd_bind_task_to_cpu() is called for the regular task state.

Originally, these hooks did the bookkeeping at load-time to record
where the state is loaded. To record this info for the user task state
at sched-in time but to defer it until sched-out time for the kernel
state looks weird to me. I'd be concerned that the state is getting
messed up on the back of an interrupt or similar in the meantime.

I haven't fully understood what the current version of this code is
doing, but that might be a place to start looking...

Cheers
---Dave
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Ard Biesheuvel @ 2024-05-16 17:17 UTC
To: Marc Zyngier, Catalin Marinas, Will Deacon, Mark Rutland
Cc: Johannes Nixdorf, Mark Brown, linux-arm-kernel

On Thu, 16 May 2024 at 18:25, Marc Zyngier <maz@kernel.org> wrote:
>
> + Ard, Broonie
>

Ugh.

This is going to be tricky to track down if it takes 10G of data to reproduce.

For the time being, maybe we should just revert and take the time to
really dig into this?

It appears to revert cleanly, and the performance gain of the
optimization was never quantified in the first place, so perhaps we
should get some numbers too when we bring it back.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Will Deacon @ 2024-05-17 11:37 UTC
To: Ard Biesheuvel
Cc: Marc Zyngier, Catalin Marinas, Mark Rutland, Johannes Nixdorf, Mark Brown, linux-arm-kernel

On Thu, May 16, 2024 at 07:17:00PM +0200, Ard Biesheuvel wrote:
> Ugh.
>
> This is going to be tricky to track down if it takes 10G of data to reproduce.
>
> For the time being, maybe we should just revert and take the time to
> really dig into this?
>
> It appears to revert cleanly, and the performance gain of the
> optimization was never quantified in the first place, so perhaps we
> should get some numbers too when we bring it back.

Given that 6.8 is affected too, I'll queue up the revert with a cc
stable now.

Will
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Mark Brown @ 2024-05-17 11:40 UTC
To: Ard Biesheuvel
Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Mark Rutland, Johannes Nixdorf, linux-arm-kernel

On Thu, May 16, 2024 at 07:17:00PM +0200, Ard Biesheuvel wrote:

> This is going to be tricky to track down if it takes 10G of data to reproduce.

OTOH it's 10G on hardware which does help but yeah.

> For the time being, maybe we should just revert and take the time to
> really dig into this?

> It appears to revert cleanly, and the performance gain of the
> optimization was never quantified in the first place, so perhaps we
> should get some numbers too when we bring it back.

I agree that this is the quickest and simplest thing to do as a fix.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Mark Rutland @ 2024-05-17 11:57 UTC
To: Ard Biesheuvel
Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Johannes Nixdorf, Mark Brown, linux-arm-kernel

On Thu, May 16, 2024 at 07:17:00PM +0200, Ard Biesheuvel wrote:
> Ugh.
>
> This is going to be tricky to track down if it takes 10G of data to reproduce.
>
> For the time being, maybe we should just revert and take the time to
> really dig into this?
>
> It appears to revert cleanly, and the performance gain of the
> optimization was never quantified in the first place, so perhaps we
> should get some numbers too when we bring it back.

I agree that reverting is the right thing to do. The main functional
reason we wanted this was in preparation for PREEMPT_AUTO, and that's
still a way off.

Mark.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Johannes Nixdorf @ 2024-05-16 17:34 UTC
To: Marc Zyngier; +Cc: Ard Biesheuvel, Mark Brown, linux-arm-kernel

On 2024-05-16 17:22, Johannes Nixdorf wrote:
> I noticed frequent FS corruption on my M1 MacBook running Linux after
> the Asahi Linux Kernel was updated to 6.9.x (from 6.6.x).

A small correction: I noticed it at the jump to 6.8.x. The bisect
started with 6.8.9 as the first bad commit.

To make sure, I have now tested it again with the current master at
ea5f6ad9ad96. The bug still reproduces with it.

> This was reproduced with fio's examples/basic-verify.fio (1GB of writing
> did not trigger it reliably, 10GB did) on vanilla kernels and
> happens on any storage backend behind dm-crypt.

I reproduced it with the following script in the initramfs since I did
not have a working keyboard with vanilla kernels:

    fallocate -l $((1024 * 1024 * 1024)) disk.img
    losetup -f disk.img
    echo test | cryptsetup luksFormat -q /dev/loop0
    echo test | cryptsetup open /dev/loop0 test
    fio /verify.fio

And the following verify.fio:

    [write-and-verify]
    loops=10
    rw=randwrite
    bs=4k
    direct=1
    ioengine=libaio
    iodepth=16
    verify=crc32c
    filename=/dev/mapper/test

Regards,
Johannes Nixdorf
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Johannes Nixdorf @ 2024-05-21 6:22 UTC
To: Marc Zyngier; +Cc: Ard Biesheuvel, Mark Brown, linux-arm-kernel, Janne Grunau

Bad news: I hit the bug again with 2632e2521769 ("arm64: fpsimd: Implement
lazy restore for kernel mode FPSIMD") reverted during prolonged interactive
usage with the downstream Asahi Linux kernel.

This prompted me to adjust the reproducer to be closer to the desktop use
case, which then also found aefbab8e77eb ("arm64: fpsimd: Preserve/restore
kernel mode NEON at context switch"). With the vanilla kernel before the
commit, or with that commit reverted on the Asahi Linux kernel, the new
reproducer also sees no bug, and interactive usage seems fine.

The old reproducer used a loop device backed by the initramfs' tmpfs. The
new reproducer now uses the actual hardware nvme device as backend:

init:
    dev=/dev/nvme0n1p7
    dm_name=test
    dm_dev=/dev/mapper/${dm_name}
    echo test | cryptsetup luksFormat -q ${dev}
    echo test | cryptsetup open ${dev} test
    mkfs.ext4 ${dm_dev}
    mount ${dm_dev} /target
    cd /target
    fio /verify.fio

verify.fio:
    [write-and-verify]
    loops=10
    rw=randwrite
    bs=4k
    direct=1
    ioengine=libaio
    iodepth=16
    verify=crc32c
    size=1G

CC: +jannau, as I previously told him aefbab8e77eb was safe after
reverting it for the wrong reasons, and then finding out it was not
needed after retesting with my old reproducer.

Kind Regards,
Johannes Nixdorf
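The two verify.fio job files in this thread differ only in their last line: the loop-device reproducer points fio at the dm-crypt device with `filename=/dev/mapper/test`, while the NVMe reproducer runs inside the mounted filesystem and uses `size=1G`. A minimal sketch of a helper that writes either variant is shown here; the function name `write_verify_job` and its two arguments are hypothetical, and only the job options themselves are taken from the messages above.

```shell
#!/bin/sh
# write_verify_job OUT TARGET
# Hypothetical helper (not part of fio): writes the job file used by the
# reproducers in this thread to OUT. TARGET is the one line that differs,
# e.g. "filename=/dev/mapper/test" or "size=1G".
write_verify_job() {
    out=$1
    target=$2
    cat > "$out" <<EOF
[write-and-verify]
loops=10
rw=randwrite
bs=4k
direct=1
ioengine=libaio
iodepth=16
verify=crc32c
$target
EOF
}
```

For the first reproducer this would be called as `write_verify_job /verify.fio filename=/dev/mapper/test`, and for the second as `write_verify_job /verify.fio size=1G`, before running `fio /verify.fio` as in the scripts above.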
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Ard Biesheuvel @ 2024-05-21 8:55 UTC
To: Johannes Nixdorf; +Cc: Marc Zyngier, Mark Brown, linux-arm-kernel, Janne Grunau

On Tue, 21 May 2024 at 08:22, Johannes Nixdorf <mixi@shadowice.org> wrote:
>
> Bad news: I hit the bug again with 2632e2521769 ("arm64: fpsimd: Implement
> lazy restore for kernel mode FPSIMD") reverted during prolonged interactive
> usage with the downstream Asahi Linux kernel.
>
> This prompted me to adjust the reproducer to be closer to the desktop use
> case, which then also found aefbab8e77eb ("arm64: fpsimd: Preserve/restore
> kernel mode NEON at context switch"). With the vanilla kernel before the
> commit or that commit reverted on the Asahi Linux kernel the new reproducer
> also sees no bug, and interactive usage seems fine.
>
> The old reproducer used a loop device backed by the initramfs' tmpfs. The
> new reproducer now uses the actual hardware nvme device as backend:
>
> init:
> dev=/dev/nvme0n1p7
> dm_name=test
> dm_dev=/dev/mapper/${dm_name}
> echo test | cryptsetup luksFormat -q ${dev}
> echo test | cryptsetup open ${dev} test
> mkfs.ext4 ${dm_dev}
> mount ${dm_dev} /target
> cd /target
> fio /verify.fio
>
> verify.fio:
> [write-and-verify]
> loops=10
> rw=randwrite
> bs=4k
> direct=1
> ioengine=libaio
> iodepth=16
> verify=crc32c
> size=1G
>

Thanks for these instructions - I am trying to reproduce but no 'luck' yet.

Could you please share your complete kernel config (based on the
vanilla kernel at commit aefbab8e77eb), and a dump of the FIO output on
a failure case?
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Mark Brown @ 2024-05-21 12:56 UTC
To: Ard Biesheuvel
Cc: Johannes Nixdorf, Marc Zyngier, linux-arm-kernel, Janne Grunau

On Tue, May 21, 2024 at 10:55:42AM +0200, Ard Biesheuvel wrote:

> Thanks for these instructions - I am trying to reproduce but no 'luck' yet.

I've also been having a hard time getting things to show. I also have
an extension to fp-stress which adds coverage of kernel mode crypto that
I'll probably polish a little and post today, as things stand it'll need
some hacking out of yield points in the kernel to do anything useful
though.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Will Deacon @ 2024-05-21 18:34 UTC
To: Johannes Nixdorf
Cc: Marc Zyngier, Ard Biesheuvel, Mark Brown, linux-arm-kernel, Janne Grunau

Hi Johannes,

On Tue, May 21, 2024 at 08:22:08AM +0200, Johannes Nixdorf wrote:
> Bad news: I hit the bug again with 2632e2521769 ("arm64: fpsimd: Implement
> lazy restore for kernel mode FPSIMD") reverted during prolonged interactive
> usage with the downstream Asahi Linux kernel.

Damn, but thanks for the update. I have to ask, but are you absolutely
sure this was with 2632e2521769 reverted? If you're able to double-check
that, it would be great, since we're having trouble reproducing the
issue.

> This prompted me to adjust the reproducer to be closer to the desktop use
> case, which then also found aefbab8e77eb ("arm64: fpsimd: Preserve/restore
> kernel mode NEON at context switch"). With the vanilla kernel before the
> commit or that commit reverted on the Asahi Linux kernel the new reproducer
> also sees no bug, and interactive usage seems fine.

I've already reverted 2632e2521769 ("arm64: fpsimd: Implement lazy
restore for kernel mode FPSIMD"), so it sounds like I should revert
aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at
context switch") as well while we work to reproduce the issue.

Will
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Mark Brown @ 2024-05-21 18:44 UTC
To: Will Deacon
Cc: Johannes Nixdorf, Marc Zyngier, Ard Biesheuvel, linux-arm-kernel, Janne Grunau

On Tue, May 21, 2024 at 07:34:45PM +0100, Will Deacon wrote:

> I've already reverted 2632e2521769 ("arm64: fpsimd: Implement lazy
> restore for kernel mode FPSIMD"), so it sounds like I should revert
> aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at
> context switch") as well while we work to reproduce the issue.

For paranoia I'd be tempted to revert 9b19700e623f9 ("arm64: fpsimd:
Drop unneeded 'busy' flag") as well since we're not sure where exactly
the issue is, that'd drop the entire series. It's hopefully excessive
but the exact bisection doesn't seem 100% clear and obviously we missed
something.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Ard Biesheuvel @ 2024-05-21 18:57 UTC
To: Mark Brown
Cc: Will Deacon, Johannes Nixdorf, Marc Zyngier, linux-arm-kernel, Janne Grunau

On Tue, 21 May 2024 at 20:44, Mark Brown <broonie@kernel.org> wrote:
>
> On Tue, May 21, 2024 at 07:34:45PM +0100, Will Deacon wrote:
>
> > I've already reverted 2632e2521769 ("arm64: fpsimd: Implement lazy
> > restore for kernel mode FPSIMD"), so it sounds like I should revert
> > aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at
> > context switch") as well while we work to reproduce the issue.
>
> For paranoia I'd be tempted to revert 9b19700e623f9 ("arm64: fpsimd:
> Drop unneeded 'busy' flag") as well since we're not sure where exactly
> the issue is, that'd drop the entire series. It's hopefully excessive
> but the exact bisection doesn't seem 100% clear and obviously we missed
> something.

We'll need to revert this ccm change as well, but I'll need to do that manually.

88c6d50f649b2805bbdfe0b616892f93db47e4fa

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88c6d50f649b2805bbdfe0b616892f93db47e4fa
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Janne Grunau @ 2024-05-21 20:06 UTC
To: Will Deacon, Johannes Nixdorf
Cc: Marc Zyngier, Ard Biesheuvel, Mark Brown, linux-arm-kernel, asahi

Hej,

On Tue, May 21, 2024, at 20:34, Will Deacon wrote:
> Hi Johannes,
>
> On Tue, May 21, 2024 at 08:22:08AM +0200, Johannes Nixdorf wrote:
>> Bad news: I hit the bug again with 2632e2521769 ("arm64: fpsimd: Implement
>> lazy restore for kernel mode FPSIMD") reverted during prolonged interactive
>> usage with the downstream Asahi Linux kernel.
>
> Damn, but thanks for the update. I have to ask, but are you absolutely
> sure this was with 2632e2521769 reverted? If you're able to double-check
> that, it would be great, since we're having trouble reproducing the
> issue.

I can reproduce the issue with v6.8 and 2632e2521769 reverted on M1
(t8103). Reproduction with 2632e2521769 reverted is harder. I've seen
multiple fio runs without verification errors, while with plain v6.8
verification errors are hit after a few seconds.

Running SIMD intense workloads in user space apparently increases the
reproduction odds. When running AV1 decoding using dav1d in parallel,
errors appear faster. Errors manifest either in changed decoder
output, fio verification errors or both. I'm using `dav1d -i
sample.ivf --muxer xxh3 -o -` here as user space SIMD payload but I'd
assume the exact SIMD user space code doesn't matter as long as it
runs on all CPU cores.

>> This prompted me to adjust the reproducer to be closer to the desktop use
>> case, which then also found aefbab8e77eb ("arm64: fpsimd: Preserve/restore
>> kernel mode NEON at context switch"). With the vanilla kernel before the
>> commit or that commit reverted on the Asahi Linux kernel the new reproducer
>> also sees no bug, and interactive usage seems fine.
>
> I've already reverted 2632e2521769 ("arm64: fpsimd: Implement lazy
> restore for kernel mode FPSIMD"), so it sounds like I should revert
> aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at
> context switch") as well while we work to reproduce the issue.

v6.8 with 2632e2521769 and aefbab8e77eb reverted no longer reproduces
errors. AV1 decoding produces a stable hash as expected and fio does
not report any verification errors.

best regards
Janne
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Mark Brown @ 2024-05-21 20:21 UTC
To: Janne Grunau
Cc: Will Deacon, Johannes Nixdorf, Marc Zyngier, Ard Biesheuvel, linux-arm-kernel, asahi

On Tue, May 21, 2024 at 10:06:36PM +0200, Janne Grunau wrote:

> Running SIMD intense workloads in user space apparently increase the
> reproduction odds. When running AV1 decoding using dav1d in parallel
> errors appear faster. Errors manifest either in changed decoder
> output, fio verification errors or both. I'm using `dav1d -i
> sample.ivf --muxer xxh3 -o -` here as user space SIMD payload but I'd
> assume the exact SIMD user space code doesn't matter as long as it
> runs on all CPU cores.

There's fp-stress in tools/testing/selftests/arm64/fp which will run two
copies of fpsimd-test (from the same directory) per core in parallel
looking for corruption in the FPSIMD registers. Specify '-t -1' and
it'll run for ever.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Janne Grunau @ 2024-05-21 21:23 UTC
To: Mark Brown
Cc: Will Deacon, Johannes Nixdorf, Marc Zyngier, Ard Biesheuvel, linux-arm-kernel, asahi

On Tue, May 21, 2024, at 22:21, Mark Brown wrote:
> There's fp-stress in tools/testing/selftests/arm64/fp which will run two
> copies of fpsimd-test (from the same directory) per core in parallel
> looking for corruption in the FPSIMD registers. Specify '-t -1' and
> it'll run for ever.

It's hard (impossible) to reproduce just with fio and fp-stress: 5
consecutive fio runs (each 40-45 seconds) completed without a
verification error in fio or a Mismatch from fpsimd-test.

With AV1 decoding in parallel, each fio run shows at least one of
decoding mismatch, fio verification error or mismatches from
fpsimd-test as below:

| # FPSIMD-6-0: Mismatch: PID=2110, iteration=9989281, reg=0
| # FPSIMD-6-0: Expected [3e0800103e0840103e0880103e08c010]
| # FPSIMD-6-0: Got [f0f3cf35dea2f3ea41a13a27d8d9369b]
| # Sending signals, timeout remaining: -1
| ...
| # Sending signals, timeout remaining: -1
| # FPSIMD-7-0: Mismatch: PID=2112, iteration=15635931, reg=6
| # FPSIMD-7-0: Expected [400806b0400846b0400886b04008c6b0]
| # FPSIMD-7-0: Got [b24b366b6b3b470ce763389ad425a33d]
| # FPSIMD-4-0: Mismatch: PID=2106, iteration=13371905, reg=0
| # FPSIMD-4-0: Expected [3a0800103a0840103a0880103a08c010]
| # FPSIMD-4-0: Got [b9a7d504554cc797724fab09aa988e2f]
| # Sending signals, timeout remaining: -1
| ...
| # Sending signals, timeout remaining: -1
| # FPSIMD-5-0: Mismatch: PID=2108, iteration=14880477, reg=0
| # FPSIMD-5-0: Expected [3c0800d03c0840d03c0880d03c08c0d0]
| # FPSIMD-5-0: Got [c550b9a0f0947cd38aa17241e129f9a6]
| # FPSIMD-7-1: Mismatch: PID=2113, iteration=17682959, reg=0
| # FPSIMD-7-1: Expected [410800f0410840f0410880f04108c0f0]
| # FPSIMD-7-1: Got [98a43b8ae2157b6b30497714c52bf6d6]
| # FPSIMD-2-0: Mismatch: PID=2102, iteration=16482263, reg=0
| # FPSIMD-2-0: Expected [3608007036084070360880703608c070]
| # FPSIMD-2-0: Got [1cf724cdaf8f997338ee499a5f1f33e7]

In the majority of cases the mismatch is reported for reg=0.

Running just fp-stress and AV1 decoding without fio reports no errors.

The fio testing probably caused segfaults in sddm / kwin on the test
system using llvmpipe.

best regards
Janne
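To check the "majority of cases is reg=0" observation on a longer run, the Mismatch lines can be tallied per register. The following is a minimal sketch that assumes the fp-stress output has been saved to a file and that Mismatch lines keep the `reg=<n>` field shown in the output above; the function name `summarize_mismatches` is hypothetical and not part of the selftests.

```shell
#!/bin/sh
# summarize_mismatches: read fp-stress output on stdin and print one
# "reg=<n> <count>" line per register that appears in a Mismatch line.
# Assumes the "Mismatch: PID=..., iteration=..., reg=<n>" format seen
# in this thread; other line types are ignored.
summarize_mismatches() {
    awk '
        /Mismatch:/ {
            # Find the reg=<n> field, wherever it sits on the line.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^reg=[0-9]+$/)
                    count[$i]++
        }
        END {
            for (r in count)
                print r, count[r]
        }
    ' | sort
}
```

With the output saved to a file (here called `fp-stress.log` purely as an example name), `summarize_mismatches < fp-stress.log` prints the per-register counts. Lines carrying a leading `| ` as quoted above are handled the same way, since the register field is matched by content rather than by position.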
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
From: Mark Brown @ 2024-05-22 11:14 UTC
To: Janne Grunau
Cc: Will Deacon, Johannes Nixdorf, Marc Zyngier, Ard Biesheuvel, linux-arm-kernel, asahi

On Tue, May 21, 2024 at 11:23:10PM +0200, Janne Grunau wrote:
> On Tue, May 21, 2024, at 22:21, Mark Brown wrote:

> It's hard (impossible) to reproduce just with fio and fp-stress. 5
> consecutive fio runs (each 40-45 seconds) without verification error
> in fio or Mismatch from fpsimd-test.

Huh, that's interesting. I guess the I/O or multiprocess stuff that AV1
decode implies makes the scheduling more interesting?

> In the majority of cases the mismatch is reported for reg=0.

That's about what I'd expect for a "we loaded the wrong register values"
error.
Thread overview: 18+ messages (newest: 2024-05-22 11:14 UTC)
2024-05-16 16:22 [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD") Johannes Nixdorf
2024-05-16 16:25 ` Marc Zyngier
2024-05-16 17:16   ` Dave Martin
2024-05-16 17:17   ` Ard Biesheuvel
2024-05-17 11:37     ` Will Deacon
2024-05-17 11:40     ` Mark Brown
2024-05-17 11:57     ` Mark Rutland
2024-05-16 17:34   ` Johannes Nixdorf
2024-05-21  6:22   ` Johannes Nixdorf
2024-05-21  8:55     ` Ard Biesheuvel
2024-05-21 12:56       ` Mark Brown
2024-05-21 18:34     ` Will Deacon
2024-05-21 18:44       ` Mark Brown
2024-05-21 18:57         ` Ard Biesheuvel
2024-05-21 20:06       ` Janne Grunau
2024-05-21 20:21         ` Mark Brown
2024-05-21 21:23           ` Janne Grunau
2024-05-22 11:14             ` Mark Brown