From: Paul Fertser <fercerpav@gmail.com>
To: "Winiarska, Iwona" <iwona.winiarska@intel.com>
Cc: "linux@roeck-us.net" <linux@roeck-us.net>,
"jae.hyun.yoo@linux.intel.com" <jae.hyun.yoo@linux.intel.com>,
"Rudolph, Patrick" <patrick.rudolph@9elements.com>,
"pierre-louis.bossart@linux.dev" <pierre-louis.bossart@linux.dev>,
"Solanki, Naresh" <naresh.solanki@9elements.com>,
"jdelvare@suse.com" <jdelvare@suse.com>,
"fr0st61te@gmail.com" <fr0st61te@gmail.com>,
"linux-hwmon@vger.kernel.org" <linux-hwmon@vger.kernel.org>,
"stable@vger.kernel.org" <stable@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"openbmc@lists.ozlabs.org" <openbmc@lists.ozlabs.org>,
"joel@jms.id.au" <joel@jms.id.au>
Subject: Re: [PATCH] hwmon: (peci/dimmtemp) Do not provide fake thresholds data
Date: Mon, 27 Jan 2025 21:54:21 +0300
Message-ID: <Z5fWXfm+bDhGlFIi@home.paul.comp>
In-Reply-To: <71b63aa1646af4ae30b59f6d70f3daaeb983b6f8.camel@intel.com>
Hi Iwona,

Thank you for the review. Please see inline.
On Mon, Jan 27, 2025 at 04:40:52PM +0000, Winiarska, Iwona wrote:
> On Thu, 2025-01-23 at 15:20 +0300, Paul Fertser wrote:
> > When an Icelake or Sapphire Rapids CPU isn't providing the maximum and
> > critical thresholds for particular DIMM the driver should return an
> > error to the userspace instead of giving it stale (best case) or wrong
> > (the structure contains all zeros after kzalloc() call) data.
> >
> > The issue can be reproduced by binding the peci driver while the host is
> > fully booted and idle, this makes PECI interaction unreliable enough.
> >
> > Fixes: 73bc1b885dae ("hwmon: peci: Add dimmtemp driver")
> > Fixes: 621995b6d795 ("hwmon: (peci/dimmtemp) Add Sapphire Rapids support")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Paul Fertser <fercerpav@gmail.com>
>
> Did you have a chance to test it with OpenBMC dbus-sensors?
Using OpenBMC dbus-sensors is exactly the reason why I'm sending this
patch, so yes, I tested it before and after the change.
> In general, the change looks okay to me, but since it modifies the behavior
> (applications will need to handle this, and returning an error will happen more
> often) we need to confirm that it does not cause any regressions for userspace.
The change is prompted by the current behaviour, which is unacceptably
bad: every now and then, while powering on the host for the first time,
the BMC happens to request one of the memory thresholds at the wrong
time (e.g. when UEFI is busy doing something that prevents normal PECI
operation). The unfixed kernel code then returns zero, and dbus-sensors
happily uses that as a threshold value, which later results in bogus
critical over-temperature events for the affected DIMM (as its normal
temperature is always above zero). It was relatively easy to reproduce
on an IceLake-based system.
I consider the current behaviour (in case of PECI timeouts when
requesting DIMM temperature thresholds) to be so broken that changing
it to do the right thing can only do good. The non-failure case is not
affected by this patch.
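To illustrate the difference in behaviour, here is a minimal userspace
sketch. It is not the driver code: fake_peci_read(), struct
dimm_thresholds, get_max_old() and get_max_new() are hypothetical names
made up for this example, and the 85000 millidegree value is arbitrary.
It only shows the idea of propagating the read error instead of handing
back whatever is in the zero-initialised structure.

```c
#include <errno.h>

/* Hypothetical stand-in for a PECI threshold read: it times out while
 * the host is busy and otherwise yields an arbitrary example value. */
static int fake_peci_read(int host_busy, long *raw)
{
	if (host_busy)
		return -ETIMEDOUT;
	*raw = 85000; /* millidegrees Celsius, made up for the example */
	return 0;
}

/* Hypothetical per-DIMM cache, all zeros right after allocation. */
struct dimm_thresholds {
	long temp_max;
};

/* Behaviour described as broken: a failed read is ignored and the
 * cached (possibly still zero) value is reported as if it were valid. */
static int get_max_old(struct dimm_thresholds *t, int host_busy, long *val)
{
	long raw;

	if (fake_peci_read(host_busy, &raw) == 0)
		t->temp_max = raw;
	*val = t->temp_max;
	return 0;
}

/* Intended behaviour: a failed read is returned to the caller and
 * neither the cache nor the output value is touched. */
static int get_max_new(struct dimm_thresholds *t, int host_busy, long *val)
{
	long raw;
	int ret = fake_peci_read(host_busy, &raw);

	if (ret)
		return ret;
	t->temp_max = raw;
	*val = raw;
	return 0;
}
```

With a zeroed structure and a busy host, get_max_old() reports success
with a threshold of 0, while get_max_new() returns -ETIMEDOUT and leaves
the caller's value alone.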
That said, for sensible operation a dbus-sensors change is indeed
needed, and I now have a patch pending upstream review[0] that handles
those errors by retrying until success. Without that patch the daemon
would just load with those thresholds missing, but I think it's better
to have thresholds missing than to have them at zero, producing a
critical error right away.
[0] https://gerrit.openbmc.org/c/openbmc/dbus-sensors/+/77500/
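
The retry idea on the userspace side can be sketched roughly as follows.
This is not the dbus-sensors patch (which is C++ and lives at the link
above); struct fake_sensor, read_threshold() and load_threshold() are
hypothetical names, and the attempt limit is an assumption added only so
the example terminates. The point is that a failed read leaves the
threshold unset rather than recording a bogus value.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sensor that fails a given number of times before it
 * starts returning its threshold. */
struct fake_sensor {
	int failures_left;
	long threshold_mdeg;
};

/* Hypothetical read: returns a negative errno on failure, 0 on success. */
static int read_threshold(struct fake_sensor *s, long *out)
{
	if (s->failures_left > 0) {
		s->failures_left--;
		return -ETIMEDOUT;
	}
	*out = s->threshold_mdeg;
	return 0;
}

/* Retry the read up to max_attempts times (the bound is an assumption
 * of this sketch). On failure *out is left untouched, so the caller can
 * treat the threshold as missing instead of using a stale or zero value.
 * A real daemon would wait between attempts; that is omitted here. */
static bool load_threshold(struct fake_sensor *s, int max_attempts, long *out)
{
	for (int i = 0; i < max_attempts; i++) {
		if (read_threshold(s, out) == 0)
			return true;
	}
	return false;
}
```

A caller would only publish the threshold when load_threshold() returns
true and otherwise keep the sensor without that threshold.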
--
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!
mailto:fercerpav@gmail.com