public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way
Date: Mon, 8 Apr 2019 19:51:42 +0200	[thread overview]
Message-ID: <20190408175142.GK15689@zn.tnic> (raw)
In-Reply-To: <20190408141205.12376-3-Yazen.Ghannam@amd.com>

On Mon, Apr 08, 2019 at 02:12:16PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Current AMD systems have unique MCA banks per logical CPU even though
> the type of the banks may all align to the same bank number. Each CPU
> will have control of a set of MCA banks in the hardware and these are
> not shared with other CPUs.
> 
> For example, bank 0 may be the Load-Store Unit on every logical CPU, but
> each bank 0 is a unique structure in the hardware. In other words, there
> isn't a *single* Load-Store Unit at MCA bank 0 that all logical CPUs
> share.
> 
> This idea extends even to non-core MCA banks. For example, CPU0 and CPU4
> may see a Unified Memory Controller at bank 15, but each CPU is actually
> seeing a unique hardware structure that is not shared with other CPUs.
> 
> Because the MCA banks are all unique hardware structures, it would be
> good to control them in a more granular way. For example, if there is a
> known issue with the Floating Point Unit on CPU5 and a user wishes to
> disable an error type on the Floating Point Unit, then it would be good
> to do this only for CPU5 rather than all CPUs.
> 
> Also, future AMD systems may have heterogeneous MCA banks. Meaning the
> bank numbers may not necessarily represent the same types between CPUs.
> For example, bank 20 visible to CPU0 may be a Unified Memory Controller
> and bank 20 visible to CPU4 may be a Coherent Slave. So granular control
> will be even more necessary should the user wish to control specific MCA
> banks.
> 
> Split the device attributes from struct mce_bank leaving only the MCA
> bank control fields.
> 
> Make struct mce_banks[] per_cpu in order to have more granular control
> over individual MCA banks in the hardware.
> 
> Allocate the device attributes statically based on the maximum number of
> MCA banks supported. The sysfs interface will use as many as needed per
> CPU. Currently, this is set to mca_cfg.banks, but will be changed to a
> per_cpu bank count in a future patch.
> 
> Allocate the MCA control bits dynamically. Use the maximum number of MCA
> banks supported for now. This will be changed to a per_cpu bank count in
> a future patch.
> 
> Redo the sysfs store/show functions to handle the per_cpu mce_banks[].
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 77 ++++++++++++++++++++++------------
>  1 file changed, 51 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 8d0d1e8425db..14583c5c6e12 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -64,16 +64,21 @@ static DEFINE_MUTEX(mce_sysfs_mutex);
>  
>  DEFINE_PER_CPU(unsigned, mce_exception_count);
>  
> +struct mce_bank {
> +	u64	ctl;	/* subevents to enable */
> +	bool	init;	/* initialise bank? */

Keep that vertical alignment as of that of the members if mce_bank_dev
below.

> +};
> +static DEFINE_PER_CPU_READ_MOSTLY(struct mce_bank*, mce_banks);

Space between mce_bank and *.

> +
>  #define ATTR_LEN               16
>  /* One object for each MCE bank, shared by all CPUs */
> -struct mce_bank {
> -	u64			ctl;			/* subevents to enable */
> -	bool			init;			/* initialise bank? */
> +struct mce_bank_dev {
>  	struct device_attribute	attr;			/* device attribute */
>  	char			attrname[ATTR_LEN];	/* attribute name */
> +	u8			bank;			/* bank number */
>  };
> +static struct mce_bank_dev mce_bank_devs[MAX_NR_BANKS];

What bothers me here is the connection between the mce_bank and the
mce_bank_dev: it is simply not there.

Why isn't there a

	struct mce_bank_dev *dev;

in struct mce_bank?

Because - and correct me if I'm wrong here - but I think if we do
per-CPU banks, then we need to selectively point from each mce_bank to
its corresponding mce_bank_dev descriptor so that you have the proper
names.

For example, if bank3 on CPU5 is not present/disabled/N/A/whatever, then
you need to not initialize the that sysfs file there and have:

/sys/devices/system/machinecheck/machinecheck5/
├── bank0
├── bank1
├── bank10
├── bank11
├── bank12
├── bank13
├── bank14
├── bank15
├── bank16
├── bank17
├── bank18
├── bank19
├── bank2
├── bank20
├── bank21
├── bank22
		<--- bank 3 is not there because unsupported.
├── bank4
├── bank5
├── bank6
├── bank7
├── bank8
├── bank9


Which means that mce_device_create() should learn to be able to create
non-contiguous per-CPU bank sysfs files so that you'll have to iterate
over the per-CPU struct mce_banks array and use only those mce_bank_dev
* pointers which represent present banks on this CPU only.

Yes, no, am I way off?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

  reply	other threads:[~2019-04-08 17:51 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-08 14:12 [PATCH RESEND 0/5] Handle MCA banks in a per_cpu way Ghannam, Yazen
2019-04-08 14:12 ` [PATCH RESEND 1/5] x86/MCE: Make struct mce_banks[] static Ghannam, Yazen
2019-04-08 14:12 ` [PATCH RESEND 2/5] x86/MCE: Handle MCA controls in a per_cpu way Ghannam, Yazen
2019-04-08 17:51   ` Borislav Petkov [this message]
2019-04-08 18:55     ` Ghannam, Yazen
2019-04-09 20:34       ` Borislav Petkov
2019-04-10 16:36         ` Ghannam, Yazen
2019-04-10 16:40           ` Borislav Petkov
2019-04-10 16:58             ` Ghannam, Yazen
2019-04-10 17:25               ` Borislav Petkov
2019-04-10 19:41                 ` Ghannam, Yazen
2019-04-10 20:04                   ` Borislav Petkov
2019-04-08 14:12 ` [PATCH RESEND 4/5] x86/MCE: Make number of MCA banks per_cpu Ghannam, Yazen
2019-04-08 20:26   ` Luck, Tony
2019-04-08 20:34     ` Borislav Petkov
2019-04-08 20:42       ` Luck, Tony
2019-04-08 20:50         ` Borislav Petkov
2019-04-08 22:48           ` Ghannam, Yazen
2019-04-08 23:23             ` Luck, Tony
2019-04-08 23:37               ` Ghannam, Yazen
2019-04-09 11:38             ` Borislav Petkov
2019-04-08 14:12 ` [PATCH RESEND 3/5] x86/MCE/AMD: Don't cache block addresses on SMCA systems Ghannam, Yazen
2019-04-08 14:12 ` [PATCH RESEND 5/5] x86/MCE: Save MCA control bits that get set in hardware Ghannam, Yazen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190408175142.GK15689@zn.tnic \
    --to=bp@alien8.de \
    --cc=Yazen.Ghannam@amd.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox