From: "Srivatsa S. Bhat" <srivatsa@csail.mit.edu>
To: "Datta, Shubhrajyoti" <shubhrajyoti.datta@amd.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
"git (AMD-Xilinx)" <git@amd.com>,
"ptsm@linux.microsoft.com" <ptsm@linux.microsoft.com>,
"shubhrajyoti.datta@gmail.com" <shubhrajyoti.datta@gmail.com>,
Borislav Petkov <bp@alien8.de>, Tony Luck <tony.luck@intel.com>
Subject: Re: [PATCH] EDAC/versal: Report PFN and page offset for DDR errors
Date: Tue, 28 Apr 2026 00:19:12 +0530 [thread overview]
Message-ID: <ae-vqFb1GvTkj9e0@csail.mit.edu> (raw)
In-Reply-To: <DS2PR12MB9821117564B73D6C805727BB81362@DS2PR12MB9821.namprd12.prod.outlook.com>
On Mon, Apr 27, 2026 at 06:27:36AM +0000, Datta, Shubhrajyoti wrote:
> Public
>
> > -----Original Message-----
> > From: Srivatsa S. Bhat <srivatsa@csail.mit.edu>
> > Sent: Wednesday, April 22, 2026 9:20 AM
> > To: Datta, Shubhrajyoti <shubhrajyoti.datta@amd.com>
> > Cc: linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org; git (AMD-Xilinx)
> > <git@amd.com>; ptsm@linux.microsoft.com; shubhrajyoti.datta@gmail.com;
> > Borislav Petkov <bp@alien8.de>; Tony Luck <tony.luck@intel.com>
> > Subject: Re: [PATCH] EDAC/versal: Report PFN and page offset for DDR errors
> >
> > Caution: This message originated from an External Source. Use proper caution
> > when opening attachments, clicking links, or responding.
> >
> >
> > On Wed, Apr 15, 2026 at 11:32:39AM +0530, Shubhrajyoti Datta wrote:
> > > Currently, DDRMC correctable and uncorrectable error events are
> > > reported to EDAC with page frame number (pfn) and offset set to zero.
> > > This information is not useful to locate the address for memory errors.
> > >
> > > Compute the physical address from the error information and extract
> > > the page frame number and offset before calling edac_mc_handle_error().
> > > This provides the actual memory location information to the userspace.
> > >
> > > Fixes: 6f15b178cd63 ("EDAC/versal: Add a Xilinx Versal memory
> > > controller driver")
> > > Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
> > > ---
> > >
> > > drivers/edac/versal_edac.c | 36 +++++++++++++++++-------------------
> > > 1 file changed, 17 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/drivers/edac/versal_edac.c b/drivers/edac/versal_edac.c
> > > index 5a43b5d43ca2..18045f96610e 100644
> > > --- a/drivers/edac/versal_edac.c
> > > +++ b/drivers/edac/versal_edac.c
> > > @@ -414,34 +414,32 @@ static unsigned long convert_to_physical(struct
> > > edac_priv *priv, union ecc_error static void handle_error(struct
> > > mem_ctl_info *mci, struct ecc_status *stat) {
> >
> > [...]
> >
> > > if (stat->error_type == XDDR_ERR_TYPE_CE) {
> >
> > [...]
> >
> > > + } else if (stat->error_type == XDDR_ERR_TYPE_UE) {
> >
> > [...]
> > > + } else {
> > > + return;
> >
> > I like the cleanup contributed by this patch (in terms of reducing code
> > duplication) in addition to the actual fix. However, this patch also introduces a
> > subtle behavior change - the existing code calls
> > memset() to clear out the ecc_status struct unconditionally, but this patch
> > doesn't call memset if the error type is not CE or UE (i.e., in the early return
> > path).
> >
> > Was this change intentional? Wouldn't it potentially cause stale data to be left
> > over in the ecc_status struct, affecting future reuse?
>
> Hi Srivatsa,
> Thanks for the review.
>
> The early return in handle_error is intentionally safe.
> get_error_info is always called before handle_error:
>
> if (get_error_info(priv))
> return;
> handle_error(mci, &priv->stat);
>
> get_error_info already guards against non-CE/UE events:
>
> if (!eccr0_ceval && !eccr1_ceval && !eccr0_ueval && !eccr1_ueval)
> return 1;
>
> So handle_error is only ever reached when at least one CE or UE
> error is present, making the else { return; } branch unreachable
> in practice. The removed memset in that path was overprotective
> dead code.
>
> For future events, get_error_info always reloads the data fresh
> before handle_error runs, so there is no risk of stale data.
>
I see, thank you for the clarification!
> I can add a comment above handle_error noting this precondition if
> that would help future readers.
>
Sure, that would be great! Along with that, how about also removing
the else { return; } branch if it is actually unreachable?
Regards,
Srivatsa
Microsoft Linux Systems Group
prev parent reply other threads:[~2026-04-27 18:49 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-15 6:02 [PATCH] EDAC/versal: Report PFN and page offset for DDR errors Shubhrajyoti Datta
2026-04-16 5:32 ` Prasanna Kumar T S M
2026-04-22 3:49 ` Srivatsa S. Bhat
2026-04-27 6:27 ` Datta, Shubhrajyoti
2026-04-27 18:49 ` Srivatsa S. Bhat [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ae-vqFb1GvTkj9e0@csail.mit.edu \
--to=srivatsa@csail.mit.edu \
--cc=bp@alien8.de \
--cc=git@amd.com \
--cc=linux-edac@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ptsm@linux.microsoft.com \
--cc=shubhrajyoti.datta@amd.com \
--cc=shubhrajyoti.datta@gmail.com \
--cc=tony.luck@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox