All of lore.kernel.org
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
	Veronika Kabatova <vkabatov@redhat.com>,
	Will Deacon <will@kernel.org>,
	CKI Project <cki-project@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Memory Management <mm-qe@redhat.com>,
	skt-results-master@redhat.com, Jeff Bastian <jbastian@redhat.com>,
	Jan Stancek <jstancek@redhat.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	rjw@rjwysocki.net, lenb@kernel.org, guohanjun@huawei.com,
	sudeep.holla@arm.com, ardb@kernel.org, lv.zheng@intel.com,
	tony.luck@intel.com
Subject: Re: ❌ FAIL: Test report for kernel 5.13.0-rc7 (arm-next, 8ab9b1a9)
Date: Tue, 29 Jun 2021 17:35:43 +0100	[thread overview]
Message-ID: <20210629163543.GA12361@arm.com> (raw)
In-Reply-To: <14ca6f72-9b0f-ebd7-9cf8-a5d6190c8e5d@arm.com>

On Tue, Jun 29, 2021 at 04:14:55PM +0100, Robin Murphy wrote:
> On 2021-06-29 15:44, Lorenzo Pieralisi wrote:
> > On Tue, Jun 29, 2021 at 12:48:14PM +0100, Robin Murphy wrote:
> > > [ +ACPI audience ]
> > > 
> > > On 2021-06-25 12:15, Robin Murphy wrote:
> > > > On 2021-06-25 12:09, Catalin Marinas wrote:
> > > > > On Fri, Jun 25, 2021 at 12:02:52PM +0100, Robin Murphy wrote:
> > > > > > On 2021-06-25 10:52, Veronika Kabatova wrote:
> > > > > > [...]
> > > > > > > > >           ❌ stress: stress-ng
> > > > > > > > 
> > > > > > > > Oh no, this looks like another alignment fault in memcpy:
> > > > > > > > 
> > > > > > > > [13330.651903] Unable to handle kernel paging request at
> > > > > > > > virtual address ffff8000534705ff
[...]
> > > > > > > > [13330.652218] Call trace:
> > > > > > > > [13330.652221]  __memcpy+0x168/0x250
> > > > > > > > [13330.652225]  acpi_data_show+0x5c/0x8c
> > > > > > > > [13330.652232]  sysfs_kf_bin_read+0x78/0xa0
> > > > > > > > [13330.652238]  kernfs_file_read_iter+0x9c/0x1a4
> > > > > > > > [13330.652241]  kernfs_fop_read_iter+0x34/0x50
> > > > > > > > [13330.652244]  new_sync_read+0xdc/0x154
> > > > > > > > [13330.652253]  vfs_read+0x158/0x1e4
> > > > > > > > [13330.652260]  ksys_read+0x64/0xec
> > > > > > > > [13330.652266]  __arm64_sys_read+0x28/0x34
> > > > > > > > [13330.652273]  invoke_syscall+0x50/0x120
> > > > > > > > [13330.652280]  el0_svc_common.constprop.0+0x4c/0xd4
> > > > > > > > [13330.652284]  do_el0_svc+0x30/0x9c
> > > > > > > > [13330.652286]  el0_svc+0x2c/0x54
> > > > > > > > [13330.652294]  el0t_64_sync_handler+0x1a4/0x1b0
> > > > > > > > [13330.652296]  el0t_64_sync+0x19c/0x1a0
> > > > > > > > [13330.652303] Code: a984346c a9c4342c f1010042 54fffee8 (a97c3c8e)
> > > > > > > > [13330.652307] ---[ end trace 227d4380f57145d4 ]---
> > > > > > > > 
> > > > > > > > So maybe this issue isn't limited to weird modules, after all...
> > > > > > > 
> > > > > > > It ran on the machine from the same set that we were able to reproduce
> > > > > > > it on previously. If you or anyone else have an idea on how
> > > > > > > to stabilize the reproducibility or have a debug patch we'll be happy to try it.
> > > > > > 
> > > > > > Possibly it depends on the individual machines' firmware exactly how the
> > > > > > relevant bits of their ACPI tables are aligned in memory?
> > > > > > 
> > > > > > I've started digging into that callstack - it may not be a "weird module"
> > > > > > but it's definitely crusty ACPI code... a238317ce818 ("ACPI: Clean up
> > > > > > acpi_os_map/unmap_memory() to eliminate __iomem.") looks frankly a bit
> > > > > > questionable in its decision to blithely cast away __iomem, but then the
> > > > > > rationale in aafc65c731fe ("ACPI: add arm64 to the platforms that use
> > > > > > ioremap") seems particularly dubious on top of that (especially
> > > > > > given this end result).
[...]
> > > After picking through the UEFI spec I think I've now got a clearer picture
> > > of what's happening, but I'm not sure where it goes from here...
> > > 
> > > The spec implies that it *is* legitimate for runtime-loaded ACPI tables to
> > > lie outside the EFI memory map, and that case they must be assumed to be
> > > uncached, so the behaviour of acpi_os_ioremap() is correct.
> > 
> > I'd agree with the reasoning, it would be good to pinpoint whether
> > that's what actually triggers the issue.
> > 
> > I'd like to replicate it if possible (it is TX2 HW but firmware
> > config is likely to differ from the HW I have at hand), the
> > test command line that triggers the fault would be useful as
> > a starting point.
> > 
> > Furthermore, is this a v5.13-rc* regression ? If so it would be
> > good to bisect it - I can't recollect arm64 changes that could
> > have introduced this regression in the last cycle but I may have
> > missed something.
> 
> The actual change which has brought this to light is the update to arm64's
> memcpy() routine for 5.13 - the new version is more aggressive at making
> unaligned loads from the source buffer, so now triggers alignment faults
> more readily when (wrongly) used on iomem mappings in places that were
> getting away with it by chance under the previous implementation (see also
> [1], for example).

I wouldn't revert any of the memcpy() stuff as it just uncovered an
existing bug in how the ACPI tables are handled. Could we actually hit
a similar issue with C code parsing the ACPI tables?

Is there a way to map the ACPI tables as Normal Noncacheable
(ioremap_wc)? Presumably no-one sane would place ACPI tables in memory
that's sensitive to the access size.

-- 
Catalin

WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
	Veronika Kabatova <vkabatov@redhat.com>,
	Will Deacon <will@kernel.org>,
	CKI Project <cki-project@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Memory Management <mm-qe@redhat.com>,
	skt-results-master@redhat.com, Jeff Bastian <jbastian@redhat.com>,
	Jan Stancek <jstancek@redhat.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	rjw@rjwysocki.net, lenb@kernel.org, guohanjun@huawei.com,
	sudeep.holla@arm.com, ardb@kernel.org, lv.zheng@intel.com,
	tony.luck@intel.com
Subject: Re: ❌ FAIL: Test report for kernel 5.13.0-rc7 (arm-next, 8ab9b1a9)
Date: Tue, 29 Jun 2021 17:35:43 +0100	[thread overview]
Message-ID: <20210629163543.GA12361@arm.com> (raw)
In-Reply-To: <14ca6f72-9b0f-ebd7-9cf8-a5d6190c8e5d@arm.com>

On Tue, Jun 29, 2021 at 04:14:55PM +0100, Robin Murphy wrote:
> On 2021-06-29 15:44, Lorenzo Pieralisi wrote:
> > On Tue, Jun 29, 2021 at 12:48:14PM +0100, Robin Murphy wrote:
> > > [ +ACPI audience ]
> > > 
> > > On 2021-06-25 12:15, Robin Murphy wrote:
> > > > On 2021-06-25 12:09, Catalin Marinas wrote:
> > > > > On Fri, Jun 25, 2021 at 12:02:52PM +0100, Robin Murphy wrote:
> > > > > > On 2021-06-25 10:52, Veronika Kabatova wrote:
> > > > > > [...]
> > > > > > > > >           ❌ stress: stress-ng
> > > > > > > > 
> > > > > > > > Oh no, this looks like another alignment fault in memcpy:
> > > > > > > > 
> > > > > > > > [13330.651903] Unable to handle kernel paging request at
> > > > > > > > virtual address ffff8000534705ff
[...]
> > > > > > > > [13330.652218] Call trace:
> > > > > > > > [13330.652221]  __memcpy+0x168/0x250
> > > > > > > > [13330.652225]  acpi_data_show+0x5c/0x8c
> > > > > > > > [13330.652232]  sysfs_kf_bin_read+0x78/0xa0
> > > > > > > > [13330.652238]  kernfs_file_read_iter+0x9c/0x1a4
> > > > > > > > [13330.652241]  kernfs_fop_read_iter+0x34/0x50
> > > > > > > > [13330.652244]  new_sync_read+0xdc/0x154
> > > > > > > > [13330.652253]  vfs_read+0x158/0x1e4
> > > > > > > > [13330.652260]  ksys_read+0x64/0xec
> > > > > > > > [13330.652266]  __arm64_sys_read+0x28/0x34
> > > > > > > > [13330.652273]  invoke_syscall+0x50/0x120
> > > > > > > > [13330.652280]  el0_svc_common.constprop.0+0x4c/0xd4
> > > > > > > > [13330.652284]  do_el0_svc+0x30/0x9c
> > > > > > > > [13330.652286]  el0_svc+0x2c/0x54
> > > > > > > > [13330.652294]  el0t_64_sync_handler+0x1a4/0x1b0
> > > > > > > > [13330.652296]  el0t_64_sync+0x19c/0x1a0
> > > > > > > > [13330.652303] Code: a984346c a9c4342c f1010042 54fffee8 (a97c3c8e)
> > > > > > > > [13330.652307] ---[ end trace 227d4380f57145d4 ]---
> > > > > > > > 
> > > > > > > > So maybe this issue isn't limited to weird modules, after all...
> > > > > > > 
> > > > > > > It ran on the machine from the same set that we were able to reproduce
> > > > > > > it on previously. If you or anyone else have an idea on how
> > > > > > > to stabilize the reproducibility or have a debug patch we'll be happy to try it.
> > > > > > 
> > > > > > Possibly it depends on the individual machines' firmware exactly how the
> > > > > > relevant bits of their ACPI tables are aligned in memory?
> > > > > > 
> > > > > > I've started digging into that callstack - it may not be a "weird module"
> > > > > > but it's definitely crusty ACPI code... a238317ce818 ("ACPI: Clean up
> > > > > > acpi_os_map/unmap_memory() to eliminate __iomem.") looks frankly a bit
> > > > > > questionable in its decision to blithely cast away __iomem, but then the
> > > > > > rationale in aafc65c731fe ("ACPI: add arm64 to the platforms that use
> > > > > > ioremap") seems particularly dubious on top of that (especially
> > > > > > given this end result).
[...]
> > > After picking through the UEFI spec I think I've now got a clearer picture
> > > of what's happening, but I'm not sure where it goes from here...
> > > 
> > > The spec implies that it *is* legitimate for runtime-loaded ACPI tables to
> > > lie outside the EFI memory map, and that case they must be assumed to be
> > > uncached, so the behaviour of acpi_os_ioremap() is correct.
> > 
> > I'd agree with the reasoning, it would be good to pinpoint whether
> > that's what actually triggers the issue.
> > 
> > I'd like to replicate it if possible (it is TX2 HW but firmware
> > config is likely to differ from the HW I have at hand), the
> > test command line that triggers the fault would be useful as
> > a starting point.
> > 
> > Furthermore, is this a v5.13-rc* regression ? If so it would be
> > good to bisect it - I can't recollect arm64 changes that could
> > have introduced this regression in the last cycle but I may have
> > missed something.
> 
> The actual change which has brought this to light is the update to arm64's
> memcpy() routine for 5.13 - the new version is more aggressive at making
> unaligned loads from the source buffer, so now triggers alignment faults
> more readily when (wrongly) used on iomem mappings in places that were
> getting away with it by chance under the previous implementation (see also
> [1], for example).

I wouldn't revert any of the memcpy() stuff as it just uncovered an
existing bug in how the ACPI tables are handled. Could we actually hit
a similar issue with C code parsing the ACPI tables?

Is there a way to map the ACPI tables as Normal Noncacheable
(ioremap_wc)? Presumably no-one sane would place ACPI tables in memory
that's sensitive to the access size.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-06-29 16:35 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-24 19:46 ❌ FAIL: Test report for kernel 5.13.0-rc7 (arm-next, 8ab9b1a9) CKI Project
2021-06-25  8:39 ` Will Deacon
     [not found]   ` <CA+tGwn=rP_hAMLLtoy_s90ZzBjfMggu7T2Qj8HyFfGh1BGUoRA@mail.gmail.com>
2021-06-25 11:02     ` Robin Murphy
2021-06-25 11:09       ` Catalin Marinas
2021-06-25 11:15         ` Robin Murphy
2021-06-29 11:48           ` Robin Murphy
2021-06-29 11:48             ` Robin Murphy
2021-06-29 14:44             ` Lorenzo Pieralisi
2021-06-29 14:44               ` Lorenzo Pieralisi
2021-06-29 15:14               ` Robin Murphy
2021-06-29 15:14                 ` Robin Murphy
2021-06-29 16:35                 ` Catalin Marinas [this message]
2021-06-29 16:35                   ` Catalin Marinas
2021-06-30 10:37                   ` Lorenzo Pieralisi
2021-06-30 10:37                     ` Lorenzo Pieralisi
2021-06-30 11:17                     ` Robin Murphy
2021-06-30 11:17                       ` Robin Murphy
2021-06-30 13:22                       ` Ard Biesheuvel
2021-06-30 15:49                         ` Lorenzo Pieralisi
2021-06-30 15:49                           ` Lorenzo Pieralisi
2021-06-30 18:18                           ` Ard Biesheuvel
2021-07-05 16:17                             ` Lorenzo Pieralisi
2021-07-05 16:17                               ` Lorenzo Pieralisi
2021-07-16 16:16                               ` Ard Biesheuvel
2021-07-16 16:26                                 ` Lorenzo Pieralisi
2021-07-16 16:26                                   ` Lorenzo Pieralisi
2021-07-22 12:38                                   ` Veronika Kabatova
2021-07-22 13:51                                     ` Robin Murphy
2021-07-22 13:51                                       ` Robin Murphy
2021-07-22 18:23                                       ` Veronika Kabatova
2021-06-29 17:03                 ` Veronika Kabatova
2021-06-29 17:27                   ` Robin Murphy
2021-06-29 17:27                     ` Robin Murphy
2021-06-29 17:44                     ` Veronika Kabatova
2022-03-04 19:31                     ` ??? " Aristeu Rozanski
2022-03-04 19:39                     ` Aristeu Rozanski
2022-03-04 20:00                       ` Robin Murphy
2022-03-07 10:19                       ` Lorenzo Pieralisi
2022-03-07 19:01                         ` Aristeu Rozanski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210629163543.GA12361@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=ardb@kernel.org \
    --cc=cki-project@redhat.com \
    --cc=guohanjun@huawei.com \
    --cc=jbastian@redhat.com \
    --cc=jstancek@redhat.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=lv.zheng@intel.com \
    --cc=mark.rutland@arm.com \
    --cc=mm-qe@redhat.com \
    --cc=rjw@rjwysocki.net \
    --cc=robin.murphy@arm.com \
    --cc=skt-results-master@redhat.com \
    --cc=sudeep.holla@arm.com \
    --cc=tony.luck@intel.com \
    --cc=vkabatov@redhat.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.