From: matoro <matoro_mailinglist_kernel@matoro.tk>
To: John David Anglin <dave.anglin@bell.net>
Cc: Vidra.Jonas@seznam.cz, linux-parisc@vger.kernel.org,
John David Anglin <dave@parisc-linux.org>,
Helge Deller <deller@gmx.de>
Subject: Re: [PATCH] parisc: Try to fix random segmentation faults in package builds
Date: Tue, 04 Jun 2024 11:07:27 -0400
Message-ID: <52c0dfa7e2054d883bd66da7ab2e68b8@matoro.tk>
In-Reply-To: <7345472b8bfa050ec2b86df5f69f99a4@matoro.tk>
On 2024-05-30 01:00, matoro wrote:
> On 2024-05-29 12:33, John David Anglin wrote:
>> On 2024-05-29 11:54 a.m., matoro wrote:
>>> On 2024-05-09 13:10, John David Anglin wrote:
>>>> On 2024-05-08 4:52 p.m., John David Anglin wrote:
>>>>>> with no accompanying stack trace and then the BMC would restart the
>>>>>> whole machine automatically. These were infrequent enough that the
>>>>>> segfaults were the bigger problem, but after applying this patch on top
>>>>>> of 6.8, this changed the dynamic. It seems to occur during builds with
>>>>>> varying I/O loads. For example, I was able to build gcc fine, with no
>>>>>> segfaults, but I was unable to build perl, a much smaller build,
>>>>>> without crashing the machine. I did not observe any segfaults over the
>>>>>> day or 2 I ran this patch, but that's not an unheard-of stretch of
>>>>>> time even without it, and I am being forced to revert because of the panics.
>>>>> Looks like there is a problem with 6.8. I'll do some testing with it.
>>>> So far, I haven't seen any panics with 6.8.9 but I have seen some random
>>>> segmentation faults
>>>> in the gcc testsuite. I looked at one ld fault in some detail. 18
>>>> contiguous words in the elf_link_hash_entry
>>>> struct were zeroed starting with the last word in the bfd_link_hash_entry
>>>> struct causing the fault.
>>>> The section pointer was zeroed.
>>>>
>>>> 18 words is a rather strange number of words to corrupt and corruption
>>>> doesn't seem related
>>>> to object structure. In any case, it is not page related.
>>>>
>>>> It's really hard to tell how this happens. The corrupt object was at a
>>>> slightly different location
>>>> than it is when ld is run under gdb. Can't duplicate in gdb.
>>>>
>>>> Dave
>>>
>>> Dave, not sure how much testing you have done with current mainline
>>> kernels, but I've had to temporarily give up on 6.8 and 6.9 for now, as
>>> most heavy builds quickly hit that kernel panic. 6.6 does not seem to have
>>> the problem though. The patch from this thread does not seem to have made
>>> a difference one way or the other w.r.t. segfaults.
>> My latest patch is looking good. I have 6 days of testing on c8000 (1 GHz
>> PA8800) with 6.8.10 and 6.8.11, and I haven't had any random segmentation
>> faults. System has been building debian packages. In addition, it has
>> been building and testing gcc. It's on its third gcc build and check with
>> patch.
>>
>> The latest version uses lpa_user() with fallback to page table search in
>> flush_cache_page_if_present() to obtain physical page address.
>> It revises copy_to_user_page() and copy_from_user_page() to flush kernel
>> mapping with tmpalias flushes. copy_from_user_page()
>> was missing kernel mapping flush. flush_cache_vmap() and
>> flush_cache_vunmap() are moved into cache.c. TLB is now flushed before
>> cache flush to inhibit move-in in these routines. flush_cache_vmap() now
>> handles small VM_IOREMAP flushes instead of flushing
>> entire cache. This latter change is an optimization.
>>
>> If random faults are still present, I believe we will have to give up
>> trying to optimize flush_cache_mm() and flush_cache_range() and
>> flush the whole cache in these routines.
>>
>> Some work would be needed to backport my current patch to longterm kernels
>> because of folio changes in 6.8.
>>
>> Dave
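
For my own understanding, here is how I read the lookup order described
above, as a toy userspace model and not anything resembling the real patch.
The names resolve_physical, lpa_user_model, page_table_model and
flush_page_model are made up for illustration, and the 4 KiB page size is an
assumption. Only the ordering (fast translation first, page table search as
fallback, TLB purge before cache flush) is taken from the description.

```python
from typing import Callable, Dict, List, Optional

PAGE_SHIFT = 12  # assumption: 4 KiB pages, for illustration only


def resolve_physical(
    vaddr: int,
    lpa_user_model: Callable[[int], Optional[int]],
    page_table_model: Dict[int, int],
) -> Optional[int]:
    """Return a physical page address, or None if the page is not present."""
    # First try the fast translation (a stand-in for lpa_user()).
    phys = lpa_user_model(vaddr)
    if phys is not None:
        return phys
    # Fall back to a page table search keyed by virtual page number.
    pfn = page_table_model.get(vaddr >> PAGE_SHIFT)
    if pfn is None:
        return None
    return pfn << PAGE_SHIFT


def flush_page_model(
    vaddr: int,
    lpa_user_model: Callable[[int], Optional[int]],
    page_table_model: Dict[int, int],
) -> List[str]:
    """Record the order of operations for flushing one page."""
    steps: List[str] = []
    phys = resolve_physical(vaddr, lpa_user_model, page_table_model)
    if phys is None:
        return steps  # nothing mapped, so nothing to flush
    steps.append(f"purge_tlb {vaddr:#x}")   # TLB first, to inhibit move-in
    steps.append(f"flush_cache {phys:#x}")  # then the cache flush
    return steps
```

Calling flush_page_model() with a translation function that always returns
None exercises the fallback path; an address with no page table entry yields
an empty step list.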
>
> Thanks a ton Dave, I've applied this on top of 6.9.2 and also think I'm
> seeing improvement! No panics yet. I have a couple of weeks' worth of package
> testing to catch up on, so I'll report if I see anything!

I've seen a few warnings in dmesg while testing, although I didn't see any
immediately corresponding failures. Are they dangerous?

[Sun Jun 2 18:46:29 2024] ------------[ cut here ]------------
[Sun Jun 2 18:46:29 2024] WARNING: CPU: 0 PID: 26808 at arch/parisc/kernel/cache.c:624 flush_cache_page_if_present+0x1a4/0x330
[Sun Jun 2 18:46:29 2024] Modules linked in: raw_diag tcp_diag inet_diag netlink_diag unix_diag nfnetlink overlay loop nfsv4 dns_resolver nfs lockd grace sunrpc netfs autofs4 binfmt_misc sr_mod ohci_pci cdrom ehci_pci ohci_hcd ehci_hcd tg3 pata_cmd64x usbcore ipmi_si hwmon usb_common libata libphy ipmi_devintf nls_base ipmi_msghandler
[Sun Jun 2 18:46:29 2024] CPU: 0 PID: 26808 Comm: bash Tainted: G W 6.9.3-gentoo-parisc64 #1
[Sun Jun 2 18:46:29 2024] Hardware name: 9000/800/rp3440
[Sun Jun 2 18:46:29 2024] YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[Sun Jun 2 18:46:29 2024] PSW: 00001000000001101111100100001111 Tainted: G W
[Sun Jun 2 18:46:29 2024] r00-03 000000ff0806f90f 000000004106b280 00000000402090bc 000000005160c6a0
[Sun Jun 2 18:46:29 2024] r04-07 0000000040f99a80 00000000f96da000 00000001659a2360 000000000800000f
[Sun Jun 2 18:46:29 2024] r08-11 0000000c0063f89c 0000000000000000 000000004ce09e9c 000000005160c5a8
[Sun Jun 2 18:46:29 2024] r12-15 000000004ce09eb0 00000000414ebd70 0000000041687768 0000000041646830
[Sun Jun 2 18:46:29 2024] r16-19 00000000516333c0 0000000001200000 00000001c36be780 0000000000000003
[Sun Jun 2 18:46:29 2024] r20-23 0000000000001a46 000000000f584000 ffffffffc0000000 000000000000000f
[Sun Jun 2 18:46:29 2024] r24-27 0000000000000000 000000000800000f 000000004ce09ea0 0000000040f99a80
[Sun Jun 2 18:46:29 2024] r28-31 0000000000000000 000000005160c720 000000005160c750 0000000000000000
[Sun Jun 2 18:46:29 2024] sr00-03 00000000052be800 00000000052be800 0000000000000000 00000000052be800
[Sun Jun 2 18:46:29 2024] sr04-07 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[Sun Jun 2 18:46:29 2024] IASQ: 0000000000000000 0000000000000000 IAOQ: 0000000040209104 0000000040209108
[Sun Jun 2 18:46:29 2024] IIR: 03ffe01f ISR: 0000000010240000 IOR: 0000003382609ea0
[Sun Jun 2 18:46:29 2024] CPU: 0 CR30: 00000000516333c0 CR31: fffffff0f0e05ee0
[Sun Jun 2 18:46:29 2024] ORIG_R28: 000000005160c7b0
[Sun Jun 2 18:46:29 2024] IAOQ[0]: flush_cache_page_if_present+0x1a4/0x330
[Sun Jun 2 18:46:29 2024] IAOQ[1]: flush_cache_page_if_present+0x1a8/0x330
[Sun Jun 2 18:46:29 2024] RP(r2): flush_cache_page_if_present+0x15c/0x330
[Sun Jun 2 18:46:29 2024] Backtrace:
[Sun Jun 2 18:46:29 2024] [<000000004020afb8>] flush_cache_mm+0x1a8/0x1c8
[Sun Jun 2 18:46:29 2024] [<000000004023cf3c>] copy_mm+0x2a8/0xfd0
[Sun Jun 2 18:46:29 2024] [<0000000040241040>] copy_process+0x1684/0x26e8
[Sun Jun 2 18:46:29 2024] [<0000000040242218>] kernel_clone+0xcc/0x754
[Sun Jun 2 18:46:29 2024] [<0000000040242908>] __do_sys_clone+0x68/0x80
[Sun Jun 2 18:46:29 2024] [<0000000040242d14>] sys_clone+0x30/0x60
[Sun Jun 2 18:46:29 2024] [<0000000040203fbc>] syscall_exit+0x0/0x10
[Sun Jun 2 18:46:29 2024] ---[ end trace 0000000000000000 ]---
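
In case it helps anyone triaging similar reports, here is a minimal sketch
for pulling the source location and the symbol offsets out of a trace like
the one above. The regexes and the function names parse_warning and
parse_backtrace are my own and not part of any kernel tool; they only cover
the line shapes seen in this trace.

```python
import re
from typing import List, Optional, Tuple

# Matches e.g. "WARNING: CPU: 0 PID: 26808 at path/file.c:624 symbol+0x1a4/0x330".
# \s+ is used between fields so a mail-wrapped line still matches.
WARNING_RE = re.compile(
    r"WARNING: CPU: (\d+) PID: (\d+) at\s+(\S+):(\d+)\s+"
    r"([\w.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
)
# Matches e.g. "[<000000004020afb8>] flush_cache_mm+0x1a8/0x1c8".
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
)


def parse_warning(log: str) -> Optional[dict]:
    """Extract CPU, PID, file:line and symbol offset from the WARNING line."""
    m = WARNING_RE.search(log)
    if m is None:
        return None
    cpu, pid, path, line, sym, off, size = m.groups()
    return {
        "cpu": int(cpu),
        "pid": int(pid),
        "file": path,
        "line": int(line),
        "symbol": sym,
        "offset": int(off, 16),
        "size": int(size, 16),
    }


def parse_backtrace(log: str) -> List[Tuple[int, str, int]]:
    """Return (address, symbol, offset) for each backtrace frame."""
    return [
        (int(addr, 16), sym, int(off, 16))
        for addr, sym, off, _size in FRAME_RE.findall(log)
    ]
```

Feeding the saved dmesg text to parse_warning() gives the file and line to
look at in the source tree, and parse_backtrace() gives the call chain in the
order the kernel printed it.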