public inbox for linux-edac@vger.kernel.org
From: Greg KH <gregkh@linuxfoundation.org>
To: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	tony.luck@intel.com, x86@kernel.org,
	Smita.KoralahalliChannabasappa@amd.com, mpatocka@redhat.com
Subject: Re: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks
Date: Wed, 15 Jun 2022 08:33:50 +0200
Message-ID: <Yql9TqFtebd2h9Z9@kroah.com>
In-Reply-To: <20220614174346.3648305-1-yazen.ghannam@amd.com>

On Tue, Jun 14, 2022 at 05:43:46PM +0000, Yazen Ghannam wrote:
> AMD systems from Family 10h to 16h share MCA bank 4 across multiple CPUs.
> Therefore, the threshold_bank structure for bank 4, and its threshold_block
> structures, will be initialized once at boot time. And the kobject for the
> shared bank will be added to each of the CPUs that share it. Furthermore,
> the threshold_blocks for the shared bank will be added again to the bank's
> kobject. These additions will increase the refcount for the bank's kobject.
> 
> For example, a shared bank with two blocks and shared across two CPUs will
> be set up like this:
> 
> CPU0 init
>   bank create and add; bank refcount = 1; threshold_create_bank()
>     block 0 init and add; bank refcount = 2; allocate_threshold_blocks()
>     block 1 init and add; bank refcount = 3; allocate_threshold_blocks()
> CPU1 init
>   bank add; bank refcount = 3; threshold_create_bank()
>     block 0 add; bank refcount = 4; __threshold_add_blocks()
>     block 1 add; bank refcount = 5; __threshold_add_blocks()
> 
> Currently in threshold_remove_bank(), if the bank is shared then
> __threshold_remove_blocks() is called. Here the shared bank's kobject and
> the bank's blocks' kobjects are deleted. This is done on the first call
> even while the structures are still shared. Subsequent calls from other
> CPUs that share the structures will attempt to delete the kobjects.
> 
> During kobject_del(), kobject->sd is removed. If the kobject is not part of
> a kset with default_groups, then subsequent kobject_del() calls seem safe
> even with kobject->sd == NULL.
> 
> Originally, the AMD MCA thresholding structures did not use default_groups.
> And so the above behavior was not apparent.
> 
> However, a recent change implemented default_groups for the thresholding
> structures. Therefore, kobject_del() will go down the sysfs_remove_groups()
> code path. In this case, the first kobject_del() may succeed and remove
> kobject->sd. But subsequent kobject_del() calls will give a WARNing in
> kernfs_remove_by_name_ns() since kobject->sd == NULL.
> 
> Use kobject_put() on the shared bank's kobject when "removing" blocks. This
> decrements the bank's refcount while keeping kobjects enabled until the
> bank is no longer shared. At that point, kobject_put() will be called on
> the blocks, which drives their refcount to 0, deletes them, and also
> decrements the bank's refcount. Finally, kobject_put() will be called
> on the bank, driving its refcount to 0 and deleting it.
> 
> With this patch and the example above:
> 
> CPU1 shutdown
>   bank is shared; bank refcount = 5; threshold_remove_bank()
>     block 0 put parent bank; bank refcount = 4; __threshold_remove_blocks()
>     block 1 put parent bank; bank refcount = 3; __threshold_remove_blocks()
> CPU0 shutdown
>   bank is no longer shared; bank refcount = 3; threshold_remove_bank()
>     block 0 put block; bank refcount = 2; deallocate_threshold_blocks()
>     block 1 put block; bank refcount = 1; deallocate_threshold_blocks()
>   put bank; bank refcount = 0; threshold_remove_bank()
> 
> Fixes: 7f99cb5e6039 ("x86/CPU/AMD: Use default_groups in kobj_type")

The bug predates that commit; the commit just exposed the root problem
here, so odds are this fix should be backported further, right?

thanks,

greg k-h
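
For readers following the refcount walk in the quoted example, it can be
replayed with a small userspace model. This is only a sketch under stated
assumptions, not kernel code: struct toy_kobj and the toy_*() helpers are
hypothetical stand-ins for struct kobject, kobject_get(), kobject_put() and
kobject_add(), and only the reference count and the parent link are modelled
(no sysfs, no locking).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kobject: refcount and parent only. */
struct toy_kobj {
	int refcount;
	int released;
	struct toy_kobj *parent;
};

static void toy_init(struct toy_kobj *k)
{
	k->refcount = 1;
	k->released = 0;
	k->parent = NULL;
}

static void toy_get(struct toy_kobj *k)
{
	k->refcount++;
}

/* Dropping the last reference releases the object and its parent ref. */
static void toy_put(struct toy_kobj *k)
{
	if (!k)
		return;
	assert(k->refcount > 0);
	if (--k->refcount == 0) {
		k->released = 1;
		toy_put(k->parent);
	}
}

/* Adding a child under a parent takes a reference on the parent. */
static void toy_add(struct toy_kobj *child, struct toy_kobj *parent)
{
	child->parent = parent;
	toy_get(parent);
}

/* Replays the two-CPU, two-block example; returns the bank's final refcount. */
static int toy_replay_example(struct toy_kobj *bank, struct toy_kobj blocks[2])
{
	int i;

	/* CPU0 init: create the bank, then init and add both blocks. */
	toy_init(bank);				/* bank refcount = 1 */
	for (i = 0; i < 2; i++) {
		toy_init(&blocks[i]);
		toy_add(&blocks[i], bank);	/* bank refcount = 2, then 3 */
	}

	/* CPU1 init: the existing blocks are added to the bank again. */
	for (i = 0; i < 2; i++)
		toy_add(&blocks[i], bank);	/* bank refcount = 4, then 5 */
	assert(bank->refcount == 5);

	/* CPU1 shutdown, bank still shared: only put the parent bank. */
	for (i = 0; i < 2; i++)
		toy_put(bank);			/* bank refcount = 4, then 3 */
	assert(bank->refcount == 3);

	/* CPU0 shutdown, last user: put the blocks, then the bank. */
	for (i = 0; i < 2; i++)
		toy_put(&blocks[i]);		/* bank refcount = 2, then 1 */
	toy_put(bank);				/* bank refcount = 0 */

	return bank->refcount;
}
```

Calling toy_replay_example() with a bank and a two-element block array
should leave all three objects released and the bank's count at zero,
mirroring the sequence in the patch description. The model deliberately
leaves out kobject_del(), so it does not reproduce the WARNing itself.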


Thread overview: 13+ messages
2022-06-14 17:43 [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks Yazen Ghannam
2022-06-15  6:33 ` Greg KH [this message]
2022-06-15 13:51   ` Yazen Ghannam
2022-10-26 10:16     ` Borislav Petkov
2022-10-26 12:04       ` Greg KH
2022-10-26 15:39         ` Yazen Ghannam
2022-10-26 18:29           ` Borislav Petkov
2022-10-26 19:44             ` Yazen Ghannam
2022-10-26 20:12               ` Borislav Petkov
2022-11-02  2:36                 ` Yazen Ghannam
2022-08-12 21:14 ` Mateusz Jończyk
2022-08-13 10:09   ` Borislav Petkov
2022-08-13 12:04     ` Mateusz Jończyk
