* [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
@ 2024-05-16 16:22 Johannes Nixdorf
2024-05-16 16:25 ` Marc Zyngier
0 siblings, 1 reply; 18+ messages in thread
From: Johannes Nixdorf @ 2024-05-16 16:22 UTC (permalink / raw)
To: linux-arm-kernel
I noticed frequent FS corruption on my M1 MacBook running Linux after
the Asahi Linux Kernel was updated to 6.9.x (from 6.6.x).
A git bisect pointed me to 2632e2521769 ("arm64: fpsimd: Implement lazy
restore for kernel mode FPSIMD").
This was reproduced with fio's examples/basic-verify.fio (writing 1GB did
not trigger it reliably, writing 10GB did) on vanilla kernels and
happens on any storage backend behind dm-crypt.
I was advised on IRC to report it here.
This was independently described in [1].
Regards,
Johannes Nixdorf
[1]: https://github.com/tpwrules/nixos-apple-silicon/issues/200
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-16 16:22 [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD") Johannes Nixdorf
@ 2024-05-16 16:25 ` Marc Zyngier
2024-05-16 17:16 ` Dave Martin
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Marc Zyngier @ 2024-05-16 16:25 UTC (permalink / raw)
To: Johannes Nixdorf, Ard Biesheuvel, Mark Brown; +Cc: linux-arm-kernel
+ Ard, Broonie
On 2024-05-16 17:22, Johannes Nixdorf wrote:
> I noticed frequent FS corruption on my M1 MacBook running Linux after
> the Asahi Linux Kernel was updated to 6.9.x (from 6.6.x).
>
> A git bisect pointed me to 2632e2521769 ("arm64: fpsimd: Implement lazy
> restore for kernel mode FPSIMD").
>
> This was reproduced with fio's examples/basic-verify.fio (writing 1GB did
> not trigger it reliably, writing 10GB did) on vanilla kernels and
> happens on any storage backend behind dm-crypt.
>
> I was advised to report it here on IRC.
>
> This was independently described in [1].
>
> Regards,
> Johannes Nixdorf
>
> [1]: https://github.com/tpwrules/nixos-apple-silicon/issues/200
--
Jazz is not dead. It just smells funny...
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-16 16:25 ` Marc Zyngier
@ 2024-05-16 17:16 ` Dave Martin
2024-05-16 17:17 ` Ard Biesheuvel
` (2 subsequent siblings)
3 siblings, 0 replies; 18+ messages in thread
From: Dave Martin @ 2024-05-16 17:16 UTC (permalink / raw)
To: Marc Zyngier
Cc: Johannes Nixdorf, Ard Biesheuvel, Mark Brown, linux-arm-kernel
On Thu, May 16, 2024 at 05:25:32PM +0100, Marc Zyngier wrote:
> + Ard, Broonie
>
> On 2024-05-16 17:22, Johannes Nixdorf wrote:
> > I noticed frequent FS corruption on my M1 MacBook running Linux after
> > the Asahi Linux Kernel was updated to 6.9.x (from 6.6.x).
> >
> > A git bisect pointed me to 2632e2521769 ("arm64: fpsimd: Implement lazy
> > restore for kernel mode FPSIMD").
It's a while since I worked on this code, and things have moved in the
meantime, but there seems to be an asymmetry between where
fpsimd_bind_state_to_cpu() is called here and where the analogous
fpsimd_bind_task_to_cpu() is called for the regular task state.
Originally, these hooks did the bookkeeping at load-time to record where
the state is loaded. To record this info for the user task state at
sched-in time but to defer it until sched-out time for the kernel state
looks weird to me. I'd be concerned that the state is getting messed
up on the back of an interrupt or similar in the meantime.
I haven't fully understood what the current version of this code is
doing, but that might be a place to start looking...
Cheers
---Dave
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-16 16:25 ` Marc Zyngier
2024-05-16 17:16 ` Dave Martin
@ 2024-05-16 17:17 ` Ard Biesheuvel
2024-05-17 11:37 ` Will Deacon
` (2 more replies)
2024-05-16 17:34 ` Johannes Nixdorf
2024-05-21 6:22 ` Johannes Nixdorf
3 siblings, 3 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2024-05-16 17:17 UTC (permalink / raw)
To: Marc Zyngier, Catalin Marinas, Will Deacon, Mark Rutland
Cc: Johannes Nixdorf, Mark Brown, linux-arm-kernel
On Thu, 16 May 2024 at 18:25, Marc Zyngier <maz@kernel.org> wrote:
>
> + Ard, Broonie
>
Ugh.
This is going to be tricky to track down if it takes 10G of data to reproduce.
For the time being, maybe we should just revert and take the time to
really dig into this?
It appears to revert cleanly, and the performance gain of the
optimization was never quantified in the first place, so perhaps we
should get some numbers too when we bring it back.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-16 16:25 ` Marc Zyngier
2024-05-16 17:16 ` Dave Martin
2024-05-16 17:17 ` Ard Biesheuvel
@ 2024-05-16 17:34 ` Johannes Nixdorf
2024-05-21 6:22 ` Johannes Nixdorf
3 siblings, 0 replies; 18+ messages in thread
From: Johannes Nixdorf @ 2024-05-16 17:34 UTC (permalink / raw)
To: Marc Zyngier; +Cc: Ard Biesheuvel, Mark Brown, linux-arm-kernel
On 2024-05-16 17:22, Johannes Nixdorf wrote:
> I noticed frequent FS corruption on my M1 MacBook running Linux after
> the Asahi Linux Kernel was updated to 6.9.x (from 6.6.x).
A small correction: I first noticed it at the jump to 6.8.x. The bisect started
with 6.8.9 as the initial bad revision.
To make sure, I tested again with current master at ea5f6ad9ad96. The
bug still reproduces there.
> This was reproduced with fio's examples/basic-verify.fio (writing 1GB did
> not trigger it reliably, writing 10GB did) on vanilla kernels and
> happens on any storage backend behind dm-crypt.
I reproduced it with the following script in the initramfs since I did not
have a working keyboard with vanilla kernels:
    fallocate -l $((1024 * 1024 * 1024)) disk.img
    losetup -f disk.img
    echo test | cryptsetup luksFormat -q /dev/loop0
    echo test | cryptsetup open /dev/loop0 test
    fio /verify.fio
And the following verify.fio:
    [write-and-verify]
    loops=10
    rw=randwrite
    bs=4k
    direct=1
    ioengine=libaio
    iodepth=16
    verify=crc32c
    filename=/dev/mapper/test
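A minimal sketch of the same loop-device steps with explicit cleanup, not taken from the thread. The variables IMG, NAME, JOB and DRY_RUN and the function names are hypothetical; it assumes that fio exits non-zero when verification fails and that NAME matches the mapper name used in verify.fio.

```shell
#!/bin/sh
# Sketch: loop-device reproducer with explicit cleanup.
# IMG, NAME, JOB and DRY_RUN are hypothetical knobs, not part of the report.
IMG=${IMG:-disk.img}
NAME=${NAME:-test}
JOB=${JOB:-/verify.fio}

# Print the command when DRY_RUN is set, execute it otherwise.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_loop() {
    run fallocate -l $((1024 * 1024 * 1024)) "$IMG" || return 1
    if [ -n "${DRY_RUN:-}" ]; then
        loopdev=/dev/loopN        # placeholder: nothing is attached
        echo "+ losetup --find --show $IMG"
    else
        # --show prints the device that was picked, so loop0 is not assumed.
        loopdev=$(losetup --find --show "$IMG") || return 1
    fi
    echo test | run cryptsetup luksFormat -q "$loopdev"
    echo test | run cryptsetup open "$loopdev" "$NAME"
    run fio "$JOB"
    status=$?                     # assumed non-zero when fio reports errors
    run cryptsetup close "$NAME"
    run losetup -d "$loopdev"
    return "$status"
}
```

Calling `DRY_RUN=1 repro_loop` only lists the commands; calling `repro_loop` as root runs them and returns fio's exit status after tearing down the mapping and the loop device.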
Regards,
Johannes Nixdorf
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-16 17:17 ` Ard Biesheuvel
@ 2024-05-17 11:37 ` Will Deacon
2024-05-17 11:40 ` Mark Brown
2024-05-17 11:57 ` Mark Rutland
2 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2024-05-17 11:37 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Marc Zyngier, Catalin Marinas, Mark Rutland, Johannes Nixdorf,
Mark Brown, linux-arm-kernel
On Thu, May 16, 2024 at 07:17:00PM +0200, Ard Biesheuvel wrote:
> On Thu, 16 May 2024 at 18:25, Marc Zyngier <maz@kernel.org> wrote:
> >
> > + Ard, Broonie
> >
>
> Ugh.
>
> This is going to be tricky to track down if it takes 10G of data to reproduce.
>
> For the time being, maybe we should just revert and take the time to
> really dig into this?
>
> It appears to revert cleanly, and the performance gain of the
> optimization was never quantified in the first place, so perhaps we
> should get some numbers too when we bring it back.
Given that 6.8 is affected too, I'll queue up the revert with a cc stable
now.
Will
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-16 17:17 ` Ard Biesheuvel
2024-05-17 11:37 ` Will Deacon
@ 2024-05-17 11:40 ` Mark Brown
2024-05-17 11:57 ` Mark Rutland
2 siblings, 0 replies; 18+ messages in thread
From: Mark Brown @ 2024-05-17 11:40 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Mark Rutland,
Johannes Nixdorf, linux-arm-kernel
On Thu, May 16, 2024 at 07:17:00PM +0200, Ard Biesheuvel wrote:
> This is going to be tricky to track down if it takes 10G of data to reproduce.
OTOH it's 10G on hardware, which does help, but yeah.
> For the time being, maybe we should just revert and take the time to
> really dig into this?
> It appears to revert cleanly, and the performance gain of the
> optimization was never quantified in the first place, so perhaps we
> should get some numbers too when we bring it back.
I agree that this is the quickest and simplest thing to do as a fix.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-16 17:17 ` Ard Biesheuvel
2024-05-17 11:37 ` Will Deacon
2024-05-17 11:40 ` Mark Brown
@ 2024-05-17 11:57 ` Mark Rutland
2 siblings, 0 replies; 18+ messages in thread
From: Mark Rutland @ 2024-05-17 11:57 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Johannes Nixdorf,
Mark Brown, linux-arm-kernel
On Thu, May 16, 2024 at 07:17:00PM +0200, Ard Biesheuvel wrote:
> On Thu, 16 May 2024 at 18:25, Marc Zyngier <maz@kernel.org> wrote:
> >
> > + Ard, Broonie
> >
>
> Ugh.
>
> This is going to be tricky to track down if it takes 10G of data to reproduce.
>
> For the time being, maybe we should just revert and take the time to
> really dig into this?
>
> It appears to revert cleanly, and the performance gain of the
> optimization was never quantified in the first place, so perhaps we
> should get some numbers too when we bring it back.
I agree that reverting is the right thing to do.
The main functional reason we wanted this was in preparation for
PREEMPT_AUTO, and that's still a way off.
Mark.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-16 16:25 ` Marc Zyngier
` (2 preceding siblings ...)
2024-05-16 17:34 ` Johannes Nixdorf
@ 2024-05-21 6:22 ` Johannes Nixdorf
2024-05-21 8:55 ` Ard Biesheuvel
2024-05-21 18:34 ` Will Deacon
3 siblings, 2 replies; 18+ messages in thread
From: Johannes Nixdorf @ 2024-05-21 6:22 UTC (permalink / raw)
To: Marc Zyngier; +Cc: Ard Biesheuvel, Mark Brown, linux-arm-kernel, Janne Grunau
Bad news: I hit the bug again with 2632e2521769 ("arm64: fpsimd: Implement
lazy restore for kernel mode FPSIMD") reverted during prolonged interactive
usage with the downstream Asahi Linux kernel.
This prompted me to adjust the reproducer to be closer to the desktop use
case, which then also pointed to aefbab8e77eb ("arm64: fpsimd: Preserve/restore
kernel mode NEON at context switch"). With the vanilla kernel before that
commit, or with that commit reverted on the Asahi Linux kernel, the new
reproducer sees no bug, and interactive usage seems fine.
The old reproducer used a loop device backed by the initramfs' tmpfs. The
new reproducer now uses the actual hardware nvme device as backend:
init:
    dev=/dev/nvme0n1p7
    dm_name=test
    dm_dev=/dev/mapper/${dm_name}
    echo test | cryptsetup luksFormat -q ${dev}
    echo test | cryptsetup open ${dev} test
    mkfs.ext4 ${dm_dev}
    mount ${dm_dev} /target
    cd /target
    fio /verify.fio
verify.fio:
    [write-and-verify]
    loops=10
    rw=randwrite
    bs=4k
    direct=1
    ioengine=libaio
    iodepth=16
    verify=crc32c
    size=1G
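Since a single fio pass does not always hit the bug, repeating the job until the first failure can help. The following is only a sketch under the assumption that fio exits non-zero on a verification error; `run_until_fail` is a hypothetical helper name, not something used in this thread.

```shell
#!/bin/sh
# run_until_fail MAX CMD...: repeat CMD until it fails or MAX runs pass.
# Hypothetical helper; prints which run failed, if any.
run_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on run $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max runs"
    return 0
}
```

For example, `run_until_fail 20 fio /verify.fio` from within /target would stop at the first failing pass.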
CC: +jannau, as I previously told him aefbab8e77eb was safe: I had reverted it
for the wrong reasons, and then found that the revert was not needed after
retesting with my old reproducer.
Kind Regards,
Johannes Nixdorf
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-21 6:22 ` Johannes Nixdorf
@ 2024-05-21 8:55 ` Ard Biesheuvel
2024-05-21 12:56 ` Mark Brown
2024-05-21 18:34 ` Will Deacon
1 sibling, 1 reply; 18+ messages in thread
From: Ard Biesheuvel @ 2024-05-21 8:55 UTC (permalink / raw)
To: Johannes Nixdorf; +Cc: Marc Zyngier, Mark Brown, linux-arm-kernel, Janne Grunau
On Tue, 21 May 2024 at 08:22, Johannes Nixdorf <mixi@shadowice.org> wrote:
>
> Bad news: I hit the bug again with 2632e2521769 ("arm64: fpsimd: Implement
> lazy restore for kernel mode FPSIMD") reverted during prolonged interactive
> usage with the downstream Asahi Linux kernel.
>
> This prompted me to adjust the reproducer to be closer to the desktop use
> case, which then also found aefbab8e77eb ("arm64: fpsimd: Preserve/restore
> kernel mode NEON at context switch"). With the vanilla kernel before the
> commit or that commit reverted on the Asahi Linux kernel the new reproducer
> also sees no bug, and interactive usage seems fine.
>
> The old reproducer used a loop device backed by the initramfs' tmpfs. The
> new reproducer now uses the actual hardware nvme device as backend:
>
> init:
> dev=/dev/nvme0n1p7
> dm_name=test
> dm_dev=/dev/mapper/${dm_name}
> echo test | cryptsetup luksFormat -q ${dev}
> echo test | cryptsetup open ${dev} test
> mkfs.ext4 ${dm_dev}
> mount ${dm_dev} /target
> cd /target
> fio /verify.fio
>
> verify.fio:
> [write-and-verify]
> loops=10
> rw=randwrite
> bs=4k
> direct=1
> ioengine=libaio
> iodepth=16
> verify=crc32c
> size=1G
>
Thanks for these instructions - I am trying to reproduce but no 'luck' yet.
Could you please share your complete kernel config (based on the
vanilla kernel at commit aefbab8e77eb), and a dump of the FIO output
on a failure case?
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-21 8:55 ` Ard Biesheuvel
@ 2024-05-21 12:56 ` Mark Brown
0 siblings, 0 replies; 18+ messages in thread
From: Mark Brown @ 2024-05-21 12:56 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Johannes Nixdorf, Marc Zyngier, linux-arm-kernel, Janne Grunau
On Tue, May 21, 2024 at 10:55:42AM +0200, Ard Biesheuvel wrote:
> Thanks for these instructions - I am trying to reproduce but no 'luck' yet.
I've also been having a hard time getting things to show. I also have
an extension to fp-stress which adds coverage of kernel mode crypto that
I'll probably polish a little and post today. As things stand, it'll need
some hacking out of yield points in the kernel to do anything useful
though.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-21 6:22 ` Johannes Nixdorf
2024-05-21 8:55 ` Ard Biesheuvel
@ 2024-05-21 18:34 ` Will Deacon
2024-05-21 18:44 ` Mark Brown
2024-05-21 20:06 ` Janne Grunau
1 sibling, 2 replies; 18+ messages in thread
From: Will Deacon @ 2024-05-21 18:34 UTC (permalink / raw)
To: Johannes Nixdorf
Cc: Marc Zyngier, Ard Biesheuvel, Mark Brown, linux-arm-kernel,
Janne Grunau
Hi Johannes,
On Tue, May 21, 2024 at 08:22:08AM +0200, Johannes Nixdorf wrote:
> Bad news: I hit the bug again with 2632e2521769 ("arm64: fpsimd: Implement
> lazy restore for kernel mode FPSIMD") reverted during prolonged interactive
> usage with the downstream Asahi Linux kernel.
Damn, but thanks for the update. I have to ask, but are you absolutely
sure this was with 2632e2521769 reverted? If you're able to double-check
that, it would be great, since we're having trouble reproducing the
issue.
> This prompted me to adjust the reproducer to be closer to the desktop use
> case, which then also found aefbab8e77eb ("arm64: fpsimd: Preserve/restore
> kernel mode NEON at context switch"). With the vanilla kernel before the
> commit or that commit reverted on the Asahi Linux kernel the new reproducer
> also sees no bug, and interactive usage seems fine.
I've already reverted 2632e2521769 ("arm64: fpsimd: Implement lazy
restore for kernel mode FPSIMD"), so it sounds like I should revert
aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at
context switch") as well while we work to reproduce the issue.
Will
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-21 18:34 ` Will Deacon
@ 2024-05-21 18:44 ` Mark Brown
2024-05-21 18:57 ` Ard Biesheuvel
2024-05-21 20:06 ` Janne Grunau
1 sibling, 1 reply; 18+ messages in thread
From: Mark Brown @ 2024-05-21 18:44 UTC (permalink / raw)
To: Will Deacon
Cc: Johannes Nixdorf, Marc Zyngier, Ard Biesheuvel, linux-arm-kernel,
Janne Grunau
On Tue, May 21, 2024 at 07:34:45PM +0100, Will Deacon wrote:
> I've already reverted 2632e2521769 ("arm64: fpsimd: Implement lazy
> restore for kernel mode FPSIMD"), so it sounds like I should revert
> aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at
> context switch") as well while we work to reproduce the issue.
For paranoia I'd be tempted to revert 9b19700e623f9 ("arm64: fpsimd:
Drop unneeded 'busy' flag") as well, since we're not sure where exactly
the issue is; that'd drop the entire series. It's hopefully excessive,
but the exact bisection doesn't seem 100% clear and obviously we missed
something.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-21 18:44 ` Mark Brown
@ 2024-05-21 18:57 ` Ard Biesheuvel
0 siblings, 0 replies; 18+ messages in thread
From: Ard Biesheuvel @ 2024-05-21 18:57 UTC (permalink / raw)
To: Mark Brown
Cc: Will Deacon, Johannes Nixdorf, Marc Zyngier, linux-arm-kernel,
Janne Grunau
On Tue, 21 May 2024 at 20:44, Mark Brown <broonie@kernel.org> wrote:
>
> On Tue, May 21, 2024 at 07:34:45PM +0100, Will Deacon wrote:
>
> > I've already reverted 2632e2521769 ("arm64: fpsimd: Implement lazy
> > restore for kernel mode FPSIMD"), so it sounds like I should revert
> > aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at
> > context switch") as well while we work to reproduce the issue.
>
> For paranoia I'd be tempted to revert 9b19700e623f9 ("arm64: fpsimd:
> Drop unneeded 'busy' flag") as well since we're not sure where exactly
> the issue is, that'd drop the entire series. It's hopefully excessive
> but the exact bisection doesn't seem 100% clear and obviously we missed
> something.
We'll need to revert this ccm change as well, but I'll need to do that manually.
88c6d50f649b2805bbdfe0b616892f93db47e4fa
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=88c6d50f649b2805bbdfe0b616892f93db47e4fa
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-21 18:34 ` Will Deacon
2024-05-21 18:44 ` Mark Brown
@ 2024-05-21 20:06 ` Janne Grunau
2024-05-21 20:21 ` Mark Brown
1 sibling, 1 reply; 18+ messages in thread
From: Janne Grunau @ 2024-05-21 20:06 UTC (permalink / raw)
To: Will Deacon, Johannes Nixdorf
Cc: Marc Zyngier, Ard Biesheuvel, Mark Brown, linux-arm-kernel, asahi
Hej,
On Tue, May 21, 2024, at 20:34, Will Deacon wrote:
> Hi Johannes,
>
> On Tue, May 21, 2024 at 08:22:08AM +0200, Johannes Nixdorf wrote:
>> Bad news: I hit the bug again with 2632e2521769 ("arm64: fpsimd: Implement
>> lazy restore for kernel mode FPSIMD") reverted during prolonged interactive
>> usage with the downstream Asahi Linux kernel.
>
> Damn, but thanks for the update. I have to ask, but are you absolutely
> sure this was with 2632e2521769 reverted? If you're able to double-check
> that, it would be great, since we're having trouble reproducing the
> issue.
I can reproduce the issue with v6.8 and 2632e2521769 reverted on M1 (t8103).
Reproduction with 2632e2521769 reverted is harder: I've seen multiple fio
runs without verification errors, while with plain v6.8 verification errors
are hit after a few seconds.
Running SIMD-intensive workloads in user space apparently increases the
reproduction odds. When running AV1 decoding using dav1d in parallel,
errors appear faster. Errors manifest in changed decoder output, fio
verification errors, or both. I'm using `dav1d -i sample.ivf --muxer xxh3
-o -` here as the user space SIMD payload, but I'd assume the exact SIMD
user space code doesn't matter as long as it runs on all CPU cores.
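One way to turn "changed decoder output" into a pass/fail signal is to run the same command several times and compare its output with the first run. This is a sketch only: `check_stable` is a hypothetical helper, and it assumes the command prints nothing but its hash on stdout.

```shell
#!/bin/sh
# check_stable RUNS CMD...: run CMD RUNS times and compare stdout with run 1.
# Hypothetical helper. Returns 1 on a differing run, 2 if CMD itself fails.
check_stable() {
    runs=$1
    shift
    ref=$("$@") || return 2
    i=2
    while [ "$i" -le "$runs" ]; do
        out=$("$@") || return 2
        if [ "$out" != "$ref" ]; then
            echo "run $i differs from run 1"
            return 1
        fi
        i=$((i + 1))
    done
    echo "stable over $runs runs"
    return 0
}
```

While fio is running, something like `check_stable 10 dav1d -i sample.ivf --muxer xxh3 -o -` could be started once per CPU core in the background.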
>> This prompted me to adjust the reproducer to be closer to the desktop use
>> case, which then also found aefbab8e77eb ("arm64: fpsimd: Preserve/restore
>> kernel mode NEON at context switch"). With the vanilla kernel before the
>> commit or that commit reverted on the Asahi Linux kernel the new reproducer
>> also sees no bug, and interactive usage seems fine.
>
> I've already reverted 2632e2521769 ("arm64: fpsimd: Implement lazy
> restore for kernel mode FPSIMD"), so it sounds like I should revert
> aefbab8e77eb ("arm64: fpsimd: Preserve/restore kernel mode NEON at
> context switch") as well while we work to reproduce the issue.
v6.8 with 2632e2521769 and aefbab8e77eb reverted no longer reproduces the
errors. AV1 decoding produces a stable hash as expected, and fio does not
report any verification errors.
best regards
Janne
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-21 20:06 ` Janne Grunau
@ 2024-05-21 20:21 ` Mark Brown
2024-05-21 21:23 ` Janne Grunau
0 siblings, 1 reply; 18+ messages in thread
From: Mark Brown @ 2024-05-21 20:21 UTC (permalink / raw)
To: Janne Grunau
Cc: Will Deacon, Johannes Nixdorf, Marc Zyngier, Ard Biesheuvel,
linux-arm-kernel, asahi
On Tue, May 21, 2024 at 10:06:36PM +0200, Janne Grunau wrote:
> Running SIMD intense workloads in user space apparently increase the
> reproduction odds. When running AV1 decoding using dav1d in parallel
> errors appear faster. Errors manifest either in changed decoder
> output, fio verification errors or both. I'm using `dav1d -i
> sample.ivf --muxer xxh3 -o -` here as user space SIMD payload but I'd
> assume the exact SIMD user space code doesn't matter as long as it
> runs on all CPU cores.
There's fp-stress in tools/testing/selftests/arm64/fp which will run two
copies of fpsimd-test (from the same directory) per core in parallel
looking for corruption in the FPSIMD registers. Specify '-t -1' and
it'll run for ever.
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-21 20:21 ` Mark Brown
@ 2024-05-21 21:23 ` Janne Grunau
2024-05-22 11:14 ` Mark Brown
0 siblings, 1 reply; 18+ messages in thread
From: Janne Grunau @ 2024-05-21 21:23 UTC (permalink / raw)
To: Mark Brown
Cc: Will Deacon, Johannes Nixdorf, Marc Zyngier, Ard Biesheuvel,
linux-arm-kernel, asahi
On Tue, May 21, 2024, at 22:21, Mark Brown wrote:
> On Tue, May 21, 2024 at 10:06:36PM +0200, Janne Grunau wrote:
>
>> Running SIMD intense workloads in user space apparently increase the
>> reproduction odds. When running AV1 decoding using dav1d in parallel
>> errors appear faster. Errors manifest either in changed decoder
>> output, fio verification errors or both. I'm using `dav1d -i
>> sample.ivf --muxer xxh3 -o -` here as user space SIMD payload but I'd
>> assume the exact SIMD user space code doesn't matter as long as it
>> runs on all CPU cores.
>
> There's fp-stress in tools/testing/selftests/arm64/fp which will run two
> copies of fpsimd-test (from the same directory) per core in parallel
> looking for corruption in the FPSIMD registers. Specify '-t -1' and
> it'll run for ever.
It's hard (impossible) to reproduce with just fio and fp-stress. Five
consecutive fio runs (40-45 seconds each) completed without a verification
error in fio or a mismatch from fpsimd-test.
With AV1 decoding in parallel, each fio run shows at least one of a decoding
mismatch, a fio verification error, or mismatches from fpsimd-test, as below:
| # FPSIMD-6-0: Mismatch: PID=2110, iteration=9989281, reg=0
| # FPSIMD-6-0: Expected [3e0800103e0840103e0880103e08c010]
| # FPSIMD-6-0: Got [f0f3cf35dea2f3ea41a13a27d8d9369b]
| # Sending signals, timeout remaining: -1
| ...
| # Sending signals, timeout remaining: -1
| # FPSIMD-7-0: Mismatch: PID=2112, iteration=15635931, reg=6
| # FPSIMD-7-0: Expected [400806b0400846b0400886b04008c6b0]
| # FPSIMD-7-0: Got [b24b366b6b3b470ce763389ad425a33d]
| # FPSIMD-4-0: Mismatch: PID=2106, iteration=13371905, reg=0
| # FPSIMD-4-0: Expected [3a0800103a0840103a0880103a08c010]
| # FPSIMD-4-0: Got [b9a7d504554cc797724fab09aa988e2f]
| # Sending signals, timeout remaining: -1
| ...
| # Sending signals, timeout remaining: -1
| # FPSIMD-5-0: Mismatch: PID=2108, iteration=14880477, reg=0
| # FPSIMD-5-0: Expected [3c0800d03c0840d03c0880d03c08c0d0]
| # FPSIMD-5-0: Got [c550b9a0f0947cd38aa17241e129f9a6]
| # FPSIMD-7-1: Mismatch: PID=2113, iteration=17682959, reg=0
| # FPSIMD-7-1: Expected [410800f0410840f0410880f04108c0f0]
| # FPSIMD-7-1: Got [98a43b8ae2157b6b30497714c52bf6d6]
| # FPSIMD-2-0: Mismatch: PID=2102, iteration=16482263, reg=0
| # FPSIMD-2-0: Expected [3608007036084070360880703608c070]
| # FPSIMD-2-0: Got [1cf724cdaf8f997338ee499a5f1f33e7]
In the majority of cases the mismatch is reported for reg=0.
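To check the register distribution on a longer log, the mismatch lines can be tallied per register. A rough sketch, assuming the fp-stress output has been saved to a file and keeps the `reg=N` format shown above; `tally_regs` is a hypothetical name.

```shell
#!/bin/sh
# tally_regs: read fp-stress output on stdin and count "Mismatch" lines per
# register number, most frequent first. Hypothetical helper.
tally_regs() {
    grep 'Mismatch:' \
        | sed -n 's/.*reg=\([0-9][0-9]*\).*/\1/p' \
        | sort -n \
        | uniq -c \
        | sort -k1,1nr -k2,2n \
        | awk '{ print "reg=" $2 ": " $1 }'
}
```

With the log saved as, say, fp-stress.log, `tally_regs < fp-stress.log` prints one `reg=N: count` line per register.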
Running just fp-stress and AV1 decoding without fio reports no errors.
The fio testing probably also caused segfaults in sddm / kwin on the test system, which uses llvmpipe.
best regards
Janne
* Re: [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD")
2024-05-21 21:23 ` Janne Grunau
@ 2024-05-22 11:14 ` Mark Brown
0 siblings, 0 replies; 18+ messages in thread
From: Mark Brown @ 2024-05-22 11:14 UTC (permalink / raw)
To: Janne Grunau
Cc: Will Deacon, Johannes Nixdorf, Marc Zyngier, Ard Biesheuvel,
linux-arm-kernel, asahi
On Tue, May 21, 2024 at 11:23:10PM +0200, Janne Grunau wrote:
> On Tue, May 21, 2024, at 22:21, Mark Brown wrote:
> It's hard (impossible) to reproduce just with fio and fp-stress. 5
> consecutive fio runs (each 40-45 seconds) without verification error
> in fio or Mismatch from fpsimd-test.
Huh, that's interesting. I guess the I/O or multiprocess stuff that AV1
decode implies makes the scheduling more interesting?
> In the majority of cases the mismatch is reported for reg=0.
That's about what I'd expect for a "we loaded the wrong register values"
error.
end of thread, other threads:[~2024-05-22 11:14 UTC | newest]
Thread overview: 18+ messages
2024-05-16 16:22 [BUG] dm-crypt broken after 2632e2521769 ("arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD") Johannes Nixdorf
2024-05-16 16:25 ` Marc Zyngier
2024-05-16 17:16 ` Dave Martin
2024-05-16 17:17 ` Ard Biesheuvel
2024-05-17 11:37 ` Will Deacon
2024-05-17 11:40 ` Mark Brown
2024-05-17 11:57 ` Mark Rutland
2024-05-16 17:34 ` Johannes Nixdorf
2024-05-21 6:22 ` Johannes Nixdorf
2024-05-21 8:55 ` Ard Biesheuvel
2024-05-21 12:56 ` Mark Brown
2024-05-21 18:34 ` Will Deacon
2024-05-21 18:44 ` Mark Brown
2024-05-21 18:57 ` Ard Biesheuvel
2024-05-21 20:06 ` Janne Grunau
2024-05-21 20:21 ` Mark Brown
2024-05-21 21:23 ` Janne Grunau
2024-05-22 11:14 ` Mark Brown