Linux PARISC architecture development
From: matoro <matoro_mailinglist_kernel@matoro.tk>
To: John David Anglin <dave.anglin@bell.net>
Cc: Vidra.Jonas@seznam.cz, linux-parisc@vger.kernel.org,
	John David Anglin <dave@parisc-linux.org>,
	Helge Deller <deller@gmx.de>
Subject: Re: [PATCH] parisc: Try to fix random segmentation faults in package builds
Date: Thu, 30 May 2024 01:00:18 -0400
Message-ID: <7345472b8bfa050ec2b86df5f69f99a4@matoro.tk>
In-Reply-To: <16d8c07c-9fbe-4e81-b1f1-3127ab05410a@bell.net>

On 2024-05-29 12:33, John David Anglin wrote:
> On 2024-05-29 11:54 a.m., matoro wrote:
>> On 2024-05-09 13:10, John David Anglin wrote:
>>> On 2024-05-08 4:52 p.m., John David Anglin wrote:
>>>>> with no accompanying stack trace, and then the BMC would restart the
>>>>> whole machine automatically.  These were infrequent enough that the
>>>>> segfaults were the bigger problem, but applying this patch on top of
>>>>> 6.8 changed the dynamic.  The panic seems to occur during builds with
>>>>> varying I/O loads.  For example, I was able to build gcc fine, with no
>>>>> segfaults, but I was unable to build perl, a much smaller build,
>>>>> without crashing the machine.  I did not observe any segfaults over
>>>>> the day or two I ran this patch, but that's not an unheard-of stretch
>>>>> of time even without it, and I am being forced to revert because of
>>>>> the panics.
>>>> Looks like there is a problem with 6.8.  I'll do some testing with it.
>>> So far, I haven't seen any panics with 6.8.9, but I have seen some
>>> random segmentation faults in the gcc testsuite.  I looked at one ld
>>> fault in some detail: 18 contiguous words in the elf_link_hash_entry
>>> struct were zeroed, starting with the last word of the embedded
>>> bfd_link_hash_entry struct.  The zeroed section pointer is what caused
>>> the fault.
>>> 
>>> 18 words is a rather strange amount to corrupt, and the corruption
>>> doesn't seem related to the object's structure.  In any case, it is not
>>> page-related.
>>> 
>>> It's really hard to tell how this happens.  The corrupt object was at a
>>> slightly different location than it is when ld is run under gdb, so I
>>> can't duplicate the fault in gdb.
>>> 
>>> Dave
>> 
>> Dave, not sure how much testing you have done with current mainline
>> kernels, but I've had to give up on 6.8 and 6.9 for now, as most heavy
>> builds quickly hit that kernel panic.  6.6 does not seem to have the
>> problem, though.  The patch from this thread does not seem to have made
>> a difference one way or the other w.r.t. the segfaults.
> My latest patch is looking good.  I have 6 days of testing on a c8000
> (1 GHz PA8800) with 6.8.10 and 6.8.11, and I haven't had any random
> segmentation faults.  The system has been building Debian packages and,
> in addition, building and testing gcc; it's on its third gcc build and
> check with the patch.
> 
> The latest version uses lpa_user() with a fallback to a page table
> search in flush_cache_page_if_present() to obtain the physical page
> address.  It revises copy_to_user_page() and copy_from_user_page() to
> flush the kernel mapping with tmpalias flushes; copy_from_user_page()
> was missing the kernel mapping flush entirely.  flush_cache_vmap() and
> flush_cache_vunmap() are moved into cache.c.  The TLB is now flushed
> before the cache flush in these routines to inhibit move-in.
> flush_cache_vmap() now handles small VM_IOREMAP flushes instead of
> flushing the entire cache; this last change is an optimization.
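
For reference, a rough sketch of the lookup order as I read the
description above, not the literal patch: lpa_user() (the hardware LPA
translation) is tried first, with a software page table search as the
fallback.  I'm assuming lpa_user() yields 0 when no translation exists.
get_ptep(), flush_user_cache_page() and flush_dcache_page_asm() are
pre-existing helpers in arch/parisc/kernel/cache.c; the control flow
below is illustrative:

  #include <linux/mm.h>
  #include <asm/cacheflush.h>

  static void flush_cache_page_if_present(struct vm_area_struct *vma,
  					  unsigned long vmaddr)
  {
  	/* Ask the MMU for the physical address; assume 0 == no translation. */
  	unsigned long physaddr = lpa_user(vmaddr);

  	if (!physaddr) {
  		/* Fall back to searching the page tables. */
  		pte_t *ptep = get_ptep(vma->vm_mm, vmaddr);

  		if (!ptep || !pte_present(*ptep))
  			return;	/* page not mapped, nothing to flush */
  		physaddr = PFN_PHYS(pte_pfn(*ptep));
  	}

  	/* Flush the user mapping, then the kernel (tmpalias) mapping. */
  	flush_user_cache_page(vma, vmaddr);
  	flush_dcache_page_asm(physaddr, vmaddr);
  }

The point of trying lpa_user() first is that it avoids the page table
walk entirely whenever the hardware already holds a valid translation.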
> 
> If random faults are still present, I believe we will have to give up
> trying to optimize flush_cache_mm() and flush_cache_range() and simply
> flush the whole cache in these routines.
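
A minimal sketch of that fallback, assuming nothing beyond the existing
flush_cache_all() routine (the bodies here are illustrative, not an
actual change):

  /* Give up on per-page optimization: purge everything on any mm or
   * range flush.  Correct on aliasing caches, but slow for large
   * working sets. */
  void flush_cache_mm(struct mm_struct *mm)
  {
  	flush_cache_all();
  }

  void flush_cache_range(struct vm_area_struct *vma,
  			 unsigned long start, unsigned long end)
  {
  	flush_cache_all();
  }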
> 
> Some work would be needed to backport my current patch to longterm kernels 
> because of folio changes in 6.8.
> 
> Dave

Thanks a ton Dave, I've applied this on top of 6.9.2 and I also think I'm
seeing improvement!  No panics yet.  I have a couple of weeks' worth of
package testing to catch up on, so I'll report if I see anything!


Thread overview: 18+ messages
2024-05-05 16:58 [PATCH] parisc: Try to fix random segmentation faults in package builds John David Anglin
2024-05-08  8:54 ` Vidra.Jonas
2024-05-08 15:23   ` John David Anglin
2024-05-08 19:18     ` matoro
2024-05-08 20:52       ` John David Anglin
2024-05-08 23:51         ` matoro
2024-05-09  1:21           ` John David Anglin
2024-05-09 17:10         ` John David Anglin
2024-05-29 15:54           ` matoro
2024-05-29 16:33             ` John David Anglin
2024-05-30  5:00               ` matoro [this message]
2024-06-04 15:07                 ` matoro
2024-06-04 17:08                   ` John David Anglin
2024-06-10 19:52                     ` matoro
2024-06-10 20:17                       ` John David Anglin
2024-06-26  6:12                         ` matoro
2024-06-26 15:44                           ` John David Anglin
2024-05-12  6:57     ` Vidra.Jonas
