public inbox for linux-edac@vger.kernel.org
From: Yazen Ghannam <yazen.ghannam@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	tony.luck@intel.com, x86@kernel.org,
	Smita.KoralahalliChannabasappa@amd.com, mpatocka@redhat.com
Subject: Re: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks
Date: Wed, 26 Oct 2022 19:44:17 +0000
Message-ID: <Y1mOEfEM6MdnV8CX@yaz-fattaah>
In-Reply-To: <Y1l8nx1KnTFP1xKj@zn.tnic>

On Wed, Oct 26, 2022 at 08:29:51PM +0200, Borislav Petkov wrote:
> On Wed, Oct 26, 2022 at 03:39:15PM +0000, Yazen Ghannam wrote:
> > What's the issue with my original patch?
> 
> Do you see it?
> 
> > @@ -1258,10 +1258,10 @@ static void __threshold_remove_blocks(struct threshold_bank *b)
> >         struct threshold_block *pos = NULL;
> >         struct threshold_block *tmp = NULL;
> >  
> > -       kobject_del(b->kobj);
> > +       kobject_put(b->kobj);
> >  
> >         list_for_each_entry_safe(pos, tmp, &b->blocks->miscj, miscj)
> > -               kobject_del(&pos->kobj);
> > +               kobject_put(b->kobj);
> 
> You're basically putting the parent as many times as there are elements
> on the ->miscj list.
> 
> Basically what Greg doesn't like.
> 
> He and I first need to talk over whether my gross hack of grafting
> the bank4 kobject hierarchy from CPU0 onto the other CPUs on the node is
> even viable, so stay tuned...
> 
> > I think this is the simplest way to fix the current implementation.
> > But we should probably get rid of this kobject sharing idea in light
> > of Greg's comments.
> 
> You said it. :)
> 
> Or maybe do a better one.
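
The counting described in the quoted review can be checked with a small
userspace model. This is only a sketch: struct obj, obj_init(), obj_put() and
the two remove_*() helpers are hypothetical names, not the kernel's kobject
API, and the model says nothing about which references the real code takes
elsewhere. It only shows that the quoted hunk drops the parent's count once
plus once per list entry, and never touches the children.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a refcounted object; NOT struct kobject. */
struct obj {
	int refcount;	/* signed so that an underflow shows up as a negative value */
	int released;	/* how many times the count reached zero */
};

static void obj_init(struct obj *o, int initial)
{
	assert(o != NULL);
	o->refcount = initial;
	o->released = 0;
}

static void obj_put(struct obj *o)
{
	assert(o != NULL);
	if (--o->refcount == 0)
		o->released++;
}

/* Same shape as the quoted hunk: one put on the parent, then one more per child. */
static void remove_putting_parent(struct obj *parent, struct obj *children, size_t n)
{
	(void)children;		/* the children are never put in this variant */
	obj_put(parent);
	for (size_t i = 0; i < n; i++)
		obj_put(parent);
}

/* Alternative shape: each child drops its own count, the parent is put once. */
static void remove_putting_children(struct obj *parent, struct obj *children, size_t n)
{
	for (size_t i = 0; i < n; i++)
		obj_put(&children[i]);
	obj_put(parent);
}
```

In this model, with three children the first variant lowers the parent's count
by four. Whether that is balanced depends entirely on how many references the
parent held to begin with, which is the part the model leaves as an input.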

Right, so can we do the following two things?

1) Apply the patch I submitted as a simple fix/workaround for the reported
symptom. I tried to keep it small and well described so that it is suitable for
a stable backport. Obviously, I wrote it before learning that the shared
kobject behavior isn't ideal.

2) Address the shared kobject thing.
   Here are some options:
   a. Only set up the thresholding kobject on a single CPU per "AMD Node".
   Technically MCA Bank 4 is "shared" on legacy systems. But AFAICT from
   looking at old BKDG docs, in practice only the "Node Base Core" can access
   the registers. This behavior is controlled by a bit in the NB which the BIOS
   is supposed to set. Maybe some BIOSes don't do this, but if so, I think
   that's a "broken BIOS on legacy system" issue.
   b. Disable the MCA Thresholding interface for Families before 0x17. This is
   an undocumented interface, and I don't know if anyone is using it on older
   systems. The issue we're discussing here started because of a splat during
   suspend/resume/CPU hotplug. In disable_err_thresholding(), we disable MCA
   Thresholding for bank 4 on Family 15h, so there's some precedent.
   c. Do nothing at the moment. I *really* want to clean up the MCA
   Thresholding interface, and the shared kobject thing may get resolved in
   that.
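
As a rough illustration of option (a), here is a userspace sketch of picking a
single owner CPU per node. Everything in it is an assumption made for the
example: the cpu_to_node array, the helper names, and in particular treating
the lowest-numbered CPU on a node as the stand-in for the "Node Base Core".
It is not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true if no lower-numbered CPU maps to the same node,
 * i.e. this CPU is the one that would own the modelled bank 4 object.
 */
static bool is_first_cpu_on_node(const int *cpu_to_node, size_t ncpus, size_t cpu)
{
	assert(cpu < ncpus);
	for (size_t i = 0; i < cpu; i++)
		if (cpu_to_node[i] == cpu_to_node[cpu])
			return false;
	return true;
}

/* Counts how many CPUs would set up the modelled thresholding object. */
static size_t count_bank4_owners(const int *cpu_to_node, size_t ncpus)
{
	size_t owners = 0;

	for (size_t cpu = 0; cpu < ncpus; cpu++)
		if (is_first_cpu_on_node(cpu_to_node, ncpus, cpu))
			owners++;
	return owners;
}
```

With one owner per node there is nothing to share between CPUs, so the
question of how many times to put a shared parent would not come up in this
model.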

What do you think?

Thanks,
Yazen

