Re: ❌ FAIL: Test report for kernel 5.13.0-rc7 (arm-next, 8ab9b1a9)

public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed

From: Robin Murphy <robin.murphy@arm.com>
To: ACPI Devel Maling List <linux-acpi@vger.kernel.org>
Cc: Veronika Kabatova <vkabatov@redhat.com>,
	Will Deacon <will@kernel.org>,
	CKI Project <cki-project@redhat.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Memory Management <mm-qe@redhat.com>,
	skt-results-master@redhat.com, Jeff Bastian <jbastian@redhat.com>,
	Jan Stancek <jstancek@redhat.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	rjw@rjwysocki.net, lenb@kernel.org, guohanjun@huawei.com,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	sudeep.holla@arm.com, ardb@kernel.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	lv.zheng@intel.com, tony.luck@intel.com
Subject: Re: ❌ FAIL: Test report for kernel 5.13.0-rc7 (arm-next, 8ab9b1a9)
Date: Tue, 29 Jun 2021 12:48:14 +0100	[thread overview]
Message-ID: <8a28663f-6541-6ff4-3de0-b140e3f8a5b9@arm.com> (raw)
In-Reply-To: <48b23351-3dba-bec8-242f-3c918ae55708@arm.com>

[ +ACPI audience ]

On 2021-06-25 12:15, Robin Murphy wrote:
> On 2021-06-25 12:09, Catalin Marinas wrote:
>> On Fri, Jun 25, 2021 at 12:02:52PM +0100, Robin Murphy wrote:
>>> On 2021-06-25 10:52, Veronika Kabatova wrote:
>>> [...]
>>>>>>          ❌ stress: stress-ng
>>>>>
>>>>> Oh no, this looks like another alignment fault in memcpy:
>>>>>
>>>>> [13330.651903] Unable to handle kernel paging request at virtual 
>>>>> address ffff8000534705ff
>>>>> [13330.651914] Mem abort info:
>>>>> [13330.651918]   ESR = 0x96000021
>>>>> [13330.651922]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>> [13330.651928]   SET = 0, FnV = 0
>>>>> [13330.651931]   EA = 0, S1PTW = 0
>>>>> [13330.651933]   FSC = 0x21: alignment fault
>>>>> [13330.651938] Data abort info:
>>>>> [13330.651940]   ISV = 0, ISS = 0x00000021
>>>>> [13330.651941]   CM = 0, WnR = 0
>>>>> [13330.651943] swapper pgtable: 4k pages, 48-bit VAs, 
>>>>> pgdp=00000000f3e6b000
>>>>> [13330.651945] [ffff8000534705ff] pgd=1000008ffcfff003, 
>>>>> p4d=1000008ffcfff003, pud=100000088e57d003, pmd=10000008d0aeb003, 
>>>>> pte=006800008021370f
>>>>> [13330.651956] Internal error: Oops: 96000021 [#1] SMP
>>>>> [13330.651961] Modules linked in: unix_diag binfmt_misc fcrypt 
>>>>> sm4_generic crc32_generic md4 michael_mic nhpoly1305_neon 
>>>>> nhpoly1305 poly1305_generic libpoly1305 poly1305_neon rmd160 
>>>>> sha3_generic sm3_generic streebog_generic wp512 blowfish_generic 
>>>>> blowfish_common cast5_generic des_generic libdes chacha_generic 
>>>>> chacha_neon libchacha camellia_generic cast6_generic cast_common 
>>>>> serpent_generic twofish_generic twofish_common dm_thin_pool 
>>>>> dm_persistent_data dm_bio_prison nvme nvme_core ipmi_watchdog 
>>>>> ipmi_poweroff loop tun af_key crypto_user scsi_transport_iscsi 
>>>>> xt_multiport ip_gre ip_tunnel gre overlay xt_CONNSECMARK xt_SECMARK 
>>>>> nft_counter xt_state xt_conntrack nft_compat ah6 ah4 nft_objref 
>>>>> nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables 
>>>>> nfnetlink jfs sctp ip6_udp_tunnel udp_tunnel dm_log_writes 
>>>>> dm_flakey rfkill mlx5_ib ib_uverbs ib_core sunrpc coresight_etm4x 
>>>>> i2c_smbus coresight_replicator coresight_tpiu coresight_tmc joydev 
>>>>> mlx5_core acpi_ipmi psample ipmi_ssif mlxfw !
>>>>>    ipmi_devintf
>>>>> [13330.652076]  ipmi_msghandler coresight_funnel thunderx2_pmu 
>>>>> coresight vfat fat fuse zram ip_tables xfs ast crct10dif_ce 
>>>>> i2c_algo_bit ghash_ce drm_vram_helper drm_kms_helper syscopyarea 
>>>>> sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm 
>>>>> gpio_xlp i2c_xlp9xx uas usb_storage aes_neon_bs [last unloaded: nvmet]
>>>>> [13330.652123] CPU: 115 PID: 188446 Comm: stress-ng Tainted: 
>>>>> G           OEL    5.13.0-rc7 #1
>>>>> [13330.652129] Hardware name: HPE Apollo 70             
>>>>> /C01_APACHE_MB         , BIOS L50_5.13_1.15 05/08/2020
>>>>> [13330.652133] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO BTYPE=--)
>>>>> [13330.652139] pc : __memcpy+0x168/0x250
>>>>> [13330.652150] lr : memory_read_from_buffer+0x58/0x80
>>>>> [13330.652161] sp : ffff800063ef3c20
>>>>> [13330.652163] x29: ffff800063ef3c20 x28: ffff0008b1380000 x27: 
>>>>> 0000000000000000
>>>>> [13330.652170] x26: 0000000000000000 x25: 0000000000000000 x24: 
>>>>> ffff00080a960fe0
>>>>> [13330.652176] x23: ffff800063ef3d28 x22: 000000000000063f x21: 
>>>>> ffff800063ef3c88
>>>>> [13330.652181] x20: 000000000000063f x19: 000000000000063f x18: 
>>>>> 0000000000000000
>>>>> [13330.652186] x17: 0000000000000000 x16: 0000000000000000 x15: 
>>>>> 0000000000000000
>>>>> [13330.652191] x14: 0000000000000000 x13: 0000000000000000 x12: 
>>>>> 0000000000000000
>>>>> [13330.652196] x11: 0000000000000000 x10: 0000000000000000 x9 : 
>>>>> 0000000000000000
>>>>> [13330.652200] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 
>>>>> 0000000000000000
>>>>> [13330.652206] x5 : ffff000d0fb0063f x4 : ffff80005347063f x3 : 
>>>>> ffff000d0fb005c0
>>>>> [13330.652212] x2 : ffffffffffffffef x1 : ffff800053470600 x0 : 
>>>>> ffff000d0fb00000
>>>>> [13330.652218] Call trace:
>>>>> [13330.652221]  __memcpy+0x168/0x250
>>>>> [13330.652225]  acpi_data_show+0x5c/0x8c
>>>>> [13330.652232]  sysfs_kf_bin_read+0x78/0xa0
>>>>> [13330.652238]  kernfs_file_read_iter+0x9c/0x1a4
>>>>> [13330.652241]  kernfs_fop_read_iter+0x34/0x50
>>>>> [13330.652244]  new_sync_read+0xdc/0x154
>>>>> [13330.652253]  vfs_read+0x158/0x1e4
>>>>> [13330.652260]  ksys_read+0x64/0xec
>>>>> [13330.652266]  __arm64_sys_read+0x28/0x34
>>>>> [13330.652273]  invoke_syscall+0x50/0x120
>>>>> [13330.652280]  el0_svc_common.constprop.0+0x4c/0xd4
>>>>> [13330.652284]  do_el0_svc+0x30/0x9c
>>>>> [13330.652286]  el0_svc+0x2c/0x54
>>>>> [13330.652294]  el0t_64_sync_handler+0x1a4/0x1b0
>>>>> [13330.652296]  el0t_64_sync+0x19c/0x1a0
>>>>> [13330.652303] Code: a984346c a9c4342c f1010042 54fffee8 (a97c3c8e)
>>>>> [13330.652307] ---[ end trace 227d4380f57145d4 ]---
>>>>>
>>>>> So maybe this issue isn't limited to weird modules, after all...
>>>>>
>>>>
>>>> It ran on the machine from the same set that we were able to reproduce
>>>> it on previously. If you or anyone else have an idea on how to 
>>>> stabilize
>>>> the reproducibility or have a debug patch we'll be happy to try it.
>>>
>>> Possibly it depends on the individual machines' firmware exactly how the
>>> relevant bits of their ACPI tables are aligned in memory?
>>>
>>> I've started digging into that callstack - it may not be a "weird 
>>> module"
>>> but it's definitely crusty ACPI code... a238317ce818 ("ACPI: Clean up
>>> acpi_os_map/unmap_memory() to eliminate __iomem.") looks frankly a bit
>>> questionable in its decision to blithely cast away __iomem, but then the
>>> rationale in aafc65c731fe ("ACPI: add arm64 to the platforms that use
>>> ioremap") seems particularly dubious on top of that (especially given 
>>> this
>>> end result).
>>>
>>> At a wild guess, I'm wondering if this may be sufficient:
>>>
>>> ----->8-----
>>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>>> index 327e1b4eb6b0..f5d26b102fbe 100644
>>> --- a/drivers/acpi/osl.c
>>> +++ b/drivers/acpi/osl.c
>>> @@ -277,7 +277,7 @@ acpi_map_lookup_virt(void __iomem *virt, 
>>> acpi_size size)
>>>          return NULL;
>>>   }
>>>
>>> -#if defined(CONFIG_IA64) || defined(CONFIG_ARM64)
>>> +#if defined(CONFIG_IA64)
>>>   /* ioremap will take care of cache attributes */
>>>   #define should_use_kmap(pfn)   0
>>>   #else
>>> -----8<-----
>>
>> I thought the same but shouldn't acpi_os_ioremap() map it with the right
>> attributes? It uses the EFI maps to check what kind of memory this is.
> 
> Oh crikey, I missed that branch of the rabbit hole... I guess that must 
> mean that the tables being poked here are *not* covered by the EFI 
> memory map, so page_is_ram() is unlikely to help either :(

After picking through the UEFI spec I think I've now got a clearer 
picture of what's happening, but I'm not sure where it goes from here...

The spec implies that it *is* legitimate for runtime-loaded ACPI tables 
to lie outside the EFI memory map, and that case they must be assumed to 
be uncached, so the behaviour of acpi_os_ioremap() is correct. Given the 
definition of uncached for arm64 memory types though, that means that 
callers of acpi_os_map_memory() still have to be prepared to get an 
__iomem pointer back even if they know they're mapping a table rather 
than some random bit of MMIO for an AML method.

Therefore in this case it seems the blame lies partway between 
acpi_os_map_memory() for casting away __iomem and acpi_data_show() for 
letting an arbitrary offset lead to an arbitrarily-aligned memcpy(), but 
I don't know what the best way to fix it is. Either way I've satisfied 
myself that it's not an issue with the arm64 code itself - I do wonder 
whether this might also be a problem on IA-64 given 
ACPI_MISALIGNMENT_NOT_SUPPORTED, and I guess RISC-V may have alignment 
concerns as well.

Robin.

next      parent reply	other threads:[~2021-06-29 11:48 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cki.6A65B499FE.46BPQ6DJTC@redhat.com>
     [not found] ` <20210625083918.GA2736@willie-the-truck>
     [not found]   ` <CA+tGwn=rP_hAMLLtoy_s90ZzBjfMggu7T2Qj8HyFfGh1BGUoRA@mail.gmail.com>
     [not found]     ` <31ffe8fc-f5ee-2858-26c5-0fd8bdd68702@arm.com>
     [not found]       ` <20210625110944.GB20835@arm.com>
     [not found]         ` <48b23351-3dba-bec8-242f-3c918ae55708@arm.com>
2021-06-29 11:48           ` Robin Murphy [this message]
2021-06-29 14:44             ` ❌ FAIL: Test report for kernel 5.13.0-rc7 (arm-next, 8ab9b1a9) Lorenzo Pieralisi
2021-06-29 15:14               ` Robin Murphy
2021-06-29 16:35                 ` Catalin Marinas
2021-06-30 10:37                   ` Lorenzo Pieralisi
2021-06-30 11:17                     ` Robin Murphy
2021-06-30 13:22                       ` Ard Biesheuvel
2021-06-30 15:49                         ` Lorenzo Pieralisi
2021-06-30 18:18                           ` Ard Biesheuvel
2021-07-05 16:17                             ` Lorenzo Pieralisi
2021-07-16 16:16                               ` Ard Biesheuvel
2021-07-16 16:26                                 ` Lorenzo Pieralisi
2021-07-22 12:38                                   ` Veronika Kabatova
2021-07-22 13:51                                     ` Robin Murphy
2021-07-22 18:23                                       ` Veronika Kabatova
2021-06-29 17:03                 ` Veronika Kabatova
2021-06-29 17:27                   ` Robin Murphy
2021-06-29 17:44                     ` Veronika Kabatova

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8a28663f-6541-6ff4-3de0-b140e3f8a5b9@arm.com \
    --to=robin.murphy@arm.com \
    --cc=ardb@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=cki-project@redhat.com \
    --cc=guohanjun@huawei.com \
    --cc=jbastian@redhat.com \
    --cc=jstancek@redhat.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=lv.zheng@intel.com \
    --cc=mark.rutland@arm.com \
    --cc=mm-qe@redhat.com \
    --cc=rjw@rjwysocki.net \
    --cc=skt-results-master@redhat.com \
    --cc=sudeep.holla@arm.com \
    --cc=tony.luck@intel.com \
    --cc=vkabatov@redhat.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox