From: matoro <matoro_mailinglist_kernel@matoro.tk>
To: John David Anglin <dave.anglin@bell.net>
Cc: Vidra.Jonas@seznam.cz, linux-parisc@vger.kernel.org,
John David Anglin <dave@parisc-linux.org>,
Helge Deller <deller@gmx.de>
Subject: Re: [PATCH] parisc: Try to fix random segmentation faults in package builds
Date: Thu, 30 May 2024 01:00:18 -0400
Message-ID: <7345472b8bfa050ec2b86df5f69f99a4@matoro.tk>
In-Reply-To: <16d8c07c-9fbe-4e81-b1f1-3127ab05410a@bell.net>

On 2024-05-29 12:33, John David Anglin wrote:
> On 2024-05-29 11:54 a.m., matoro wrote:
>> On 2024-05-09 13:10, John David Anglin wrote:
>>> On 2024-05-08 4:52 p.m., John David Anglin wrote:
>>>>> with no accompanying stack trace and then the BMC would restart the
>>>>> whole machine automatically. These were infrequent enough that the
>>>>> segfaults were the bigger problem, but after applying this patch on top
>>>>> of 6.8, this changed the dynamic. It seems to occur during builds with
>>>>> varying I/O loads. For example, I was able to build gcc fine, with no
>>>>> segfaults, but I was unable to build perl, a much smaller build, without
>>>>> crashing the machine. I did not observe any segfaults over the day or 2
>>>>> I ran this patch, but that's not an unheard-of stretch of
>>>>> time even without it, and I am being forced to revert because of the panics.
>>>> Looks like there is a problem with 6.8. I'll do some testing with it.
>>> So far, I haven't seen any panics with 6.8.9 but I have seen some random
>>> segmentation faults in the gcc testsuite. I looked at one ld fault in some
>>> detail. 18 contiguous words in the elf_link_hash_entry struct were zeroed
>>> starting with the last word in the bfd_link_hash_entry struct causing the
>>> fault. The section pointer was zeroed.
>>>
>>> 18 words is a rather strange number of words to corrupt and corruption
>>> doesn't seem related to object structure. In any case, it is not page
>>> related.
>>>
>>> It's really hard to tell how this happens. The corrupt object was at a
>>> slightly different location than it is when ld is run under gdb. Can't
>>> duplicate in gdb.
>>>
>>> Dave
>>
>> Dave, not sure how much testing you have done with current mainline
>> kernels, but I've had to temporarily give up on 6.8 and 6.9 for now, as
>> most heavy builds quickly hit that kernel panic. 6.6 does not seem to have
>> the problem though. The patch from this thread does not seem to have made
>> a difference one way or the other w.r.t. segfaults.
> My latest patch is looking good. I have 6 days of testing on c8000 (1 GHz
> PA8800) with 6.8.10 and 6.8.11, and I haven't had any random segmentation
> faults. System has been building debian packages. In addition, it has been
> building and testing gcc. It's on its third gcc build and check with patch.
>
> The latest version uses lpa_user() with fallback to page table search in
> flush_cache_page_if_present() to obtain physical page address. It revises
> copy_to_user_page() and copy_from_user_page() to flush kernel mapping with
> tmpalias flushes. copy_from_user_page() was missing kernel mapping flush.
> flush_cache_vmap() and flush_cache_vunmap() are moved into cache.c. TLB is
> now flushed before cache flush to inhibit move-in in these routines.
> flush_cache_vmap() now handles small VM_IOREMAP flushes instead of flushing
> entire cache. This latter change is an optimization.
>
> If random faults are still present, I believe we will have to give up trying
> to optimize flush_cache_mm() and flush_cache_range() and flush the whole
> cache in these routines.
>
> Some work would be needed to backport my current patch to longterm kernels
> because of folio changes in 6.8.
>
> Dave

Thanks a ton, Dave. I've applied this on top of 6.9.2 and I also think I'm
seeing improvement! No panics yet. I have a couple of weeks' worth of package
testing to catch up on, so I'll report back if I see anything.
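
The lookup strategy described in the quoted message (try a fast address
probe first, fall back to a page table search) and the flush ordering (TLB
before cache) can be modelled in user space. What follows is only a toy
sketch under stated assumptions, not the patch itself: every name in it
(model_mm, model_probe, model_walk, model_phys_page, model_flush_vmap_range)
is hypothetical, and it calls no real kernel function. It just records the
order of events in a small log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_NPAGES 8

struct model_mm {
    uint32_t pfn[MODEL_NPAGES];     /* toy "page table"; 0 means unmapped */
    bool probe_ok[MODEL_NPAGES];    /* does the fast probe work for this page? */
    char log[16];                   /* event log: P, W, T, C */
    size_t log_len;
};

static void model_log(struct model_mm *mm, char ev)
{
    if (mm->log_len + 1 < sizeof(mm->log)) {
        mm->log[mm->log_len++] = ev;
        mm->log[mm->log_len] = '\0';
    }
}

/* Fast path, standing in for an lpa_user()-style probe. Returns 0 on failure. */
static uint64_t model_probe(struct model_mm *mm, uint64_t vaddr)
{
    size_t idx = (size_t)(vaddr >> MODEL_PAGE_SHIFT);

    model_log(mm, 'P');
    if (idx >= MODEL_NPAGES || !mm->probe_ok[idx] || mm->pfn[idx] == 0)
        return 0;
    return (uint64_t)mm->pfn[idx] << MODEL_PAGE_SHIFT;
}

/* Slow path, standing in for a page table search. Returns 0 if unmapped. */
static uint64_t model_walk(struct model_mm *mm, uint64_t vaddr)
{
    size_t idx = (size_t)(vaddr >> MODEL_PAGE_SHIFT);

    model_log(mm, 'W');
    if (idx >= MODEL_NPAGES || mm->pfn[idx] == 0)
        return 0;
    return (uint64_t)mm->pfn[idx] << MODEL_PAGE_SHIFT;
}

/* Physical page address: try the probe, fall back to the table walk. */
static uint64_t model_phys_page(struct model_mm *mm, uint64_t vaddr)
{
    uint64_t phys = model_probe(mm, vaddr);

    if (phys == 0)
        phys = model_walk(mm, vaddr);
    return phys;
}

/* Flush ordering from the message: TLB first, then cache. */
static void model_flush_vmap_range(struct model_mm *mm)
{
    model_log(mm, 'T');   /* drop translations so nothing is moved back in */
    model_log(mm, 'C');   /* then flush the cache lines */
}
```

In this model a page whose probe fails but which is present in the table is
still resolved through the walk, and the log shows the TLB event ('T')
before the cache event ('C').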