public inbox for linux-edac@vger.kernel.org
From: Borislav Petkov <bp@alien8.de>
To: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	tony.luck@intel.com, x86@kernel.org,
	Smita.KoralahalliChannabasappa@amd.com, mpatocka@redhat.com
Subject: Re: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks
Date: Wed, 26 Oct 2022 22:12:15 +0200
Message-ID: <Y1mUn/xvx1RYPqAQ@zn.tnic>
In-Reply-To: <Y1mOEfEM6MdnV8CX@yaz-fattaah>

On Wed, Oct 26, 2022 at 07:44:17PM +0000, Yazen Ghannam wrote:
> 1) Apply the patch I submitted as a simple fix/workaround for the presented
> symptom. I tried to keep it small and well described to be a stable backport.
> Obviously I wrote it without knowing the shared kobject behavior isn't ideal.

We'll see.

> 2) Address the shared kobject thing.
>    Here are some options:
>    a. Only set up the thresholding kobject on a single CPU per "AMD Node".
>    Technically MCA Bank 4 is "shared" on legacy systems. But AFAICT from
>    looking at old BKDG docs, in practice only the "Node Base Core" can access
>    the registers. This behavior is controlled by a bit in NB which BIOS is
>    supposed to set. Maybe some BIOSes don't do this, but I think that's a
>    "broken BIOS on legacy system" issue if so.

I guess we can do that. And I even think we have some code which finds
out which core is the NBC...

/me greps a bit:

ah, there it is: get_nbc_for_node() in arch/x86/kernel/cpu/mce/inject.c.
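
For illustration only, here is a userspace sketch of how option (a) could
pick one CPU per node. It assumes the NBC lookup boils down to "CPUs per
node times node ID" and that CPUs are numbered contiguously node by node;
I have not checked that against the current body of get_nbc_for_node().
struct topo and all function names below are hypothetical, not kernel
symbols.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical topology description. The kernel derives these values
 * from CPUID and its own per-CPU data, not from a struct like this.
 */
struct topo {
	unsigned int cores_per_socket;	/* cores in one package */
	unsigned int threads_per_core;	/* SMT siblings per core */
	unsigned int nodes_per_socket;	/* "AMD Nodes" in one package */
};

/* Number of logical CPUs that belong to one node. */
static unsigned int cpus_per_node(const struct topo *t)
{
	assert(t->nodes_per_socket > 0);
	return (t->cores_per_socket * t->threads_per_core) / t->nodes_per_socket;
}

/*
 * Assumption: CPUs are numbered contiguously node by node, so the
 * Node Base Core is simply the first CPU of its node.
 */
static unsigned int nbc_for_node(const struct topo *t, unsigned int node_id)
{
	return cpus_per_node(t) * node_id;
}

/* Node that a given logical CPU belongs to, under the same assumption. */
static unsigned int node_of_cpu(const struct topo *t, unsigned int cpu)
{
	return cpu / cpus_per_node(t);
}

/* Option (a): only the NBC would set up the shared bank 4 kobject. */
static bool cpu_sets_up_bank4(const struct topo *t, unsigned int cpu)
{
	return cpu == nbc_for_node(t, node_of_cpu(t, cpu));
}
```

With 16 cores per socket, no SMT and two nodes per socket, this would pick
CPUs 0 and 8 and skip every other CPU. Systems where the BIOS did not
restrict bank 4 access to the NBC are not modelled here.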


>    b. Disable the MCA Thresholding interface for Families before 0x17.

Can't. It is user-visible and you don't know for sure whether someone is
using it or not.

Believe me, I have been wanting to disable this thing forever. I've
never heard of anyone using it and all the energy we put in it was for
nothing. :-\

We could try to deprecate it, though, make it default=n in Kconfig and
see who complains. And after a couple of releases, kill it.

>    This is an undocumented interface, 

Of course it is documented - it is in the old BKDGs.

> and I don't know if anyone is using it on older systems.

Yap.

>    The issue we're discussing here started because of a splat during
>    suspend/resume/CPU hotplug. In disable_err_thresholding(), we disable
>    MCA Thresholding for bank 4 on Family 15h, so there's some precedent.
>
>    c. Do nothing at the moment. I *really* want to clean up the MCA
>    Thresholding interface, and the shared kobject thing may get resolved
>    in that.

Clean it up how exactly?

Put it behind a Kconfig item, disable it and remove it after a while?

:-)

If so, I wouldn't mind. No one's using this. At least I haven't heard of
a single bug report or of a use case. Only when CPU hotplug explodes and
that thing is involved, only then.

Might as well remove it. And then remove it in the hardware too. RAS
folks would love to get rid of some of that crap which takes up verif
resources for no good reason.

:-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
