linux-fsdevel.vger.kernel.org archive mirror
From: "Renaud Métrich" <rmetrich@redhat.com>
To: Luis Chamberlain <mcgrof@kernel.org>,
	Oleksandr Natalenko <oleksandr@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Huang Ying <ying.huang@intel.com>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>,
	Will Deacon <will@kernel.org>,
	"Guilherme G . Piccoli" <gpiccoli@igalia.com>,
	Laurent Dufour <ldufour@linux.ibm.com>,
	Stephen Kitt <steve@sk2.org>, Rob Herring <robh@kernel.org>,
	Joel Savitz <jsavitz@redhat.com>,
	"Eric W . Biederman" <ebiederm@xmission.com>,
	Kees Cook <keescook@chromium.org>,
	Xiaoming Ni <nixiaoming@huawei.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Grzegorz Halat <ghalat@redhat.com>, Qi Guo <qguo@redhat.com>
Subject: Re: [PATCH] core_pattern: add CPU specifier
Date: Thu, 8 Sep 2022 08:45:38 +0200
Message-ID: <b1673cd8-dd6d-8b50-6c5a-c715f368f12d@redhat.com>
In-Reply-To: <Yxi+dQkuV2zdBzk3@bombadil.infradead.org>



Hello,

I have been working closely with Oleksandr on a couple of cases where
customers saw segfaults in various processes, including basic tools
("grep", "cut", etc.) that normally never crash.

The coredumps of course showed nothing useful: from userland's
perspective nothing was wrong, apart from a bad pointer that could not
be explained.

Memory testing (e.g. Memtest86+) and CPU testing (usually with tools
from the hardware vendor) never showed any hardware issue either, even
though there was one, probably because triggering it required special
conditions, such as a specific load and/or thermal conditions.

Troubleshooting such cases takes several weeks or even months before we
have enough evidence that the OS is not at fault, and it is always a
struggle.

Usually, once we start getting kernel crashes, we are relieved, because
a kernel crash reports the CPU the task was running on, and that
information has so far always seemed reliable enough to point to a
faulty CPU. The cases where no kernel crash could be observed are only
solved after asking the customer to replace hardware components, which
is difficult to justify since it usually costs the customer money and
time.

I hope such a feature will be helpful for everybody doing Linux support.

Renaud.

On 9/7/22 at 17:53, Luis Chamberlain wrote:
> On Sat, Sep 03, 2022 at 08:43:30AM +0200, Oleksandr Natalenko wrote:
>> Statistically, in a large deployment regular segfaults may indicate a CPU issue.
> Can you elaborate on this? How common is this observed to be true? Are
> there any public findings or bugs where it showed this?
>
>    Luis
>


