From: Borislav Petkov <bp@amd64.org>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>,
	Ingo Molnar <mingo@elte.hu>,
	EDAC devel <linux-edac@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/3] EDAC: Convert AMD EDAC pieces to use RAS printk buffer
Date: Tue, 27 Mar 2012 19:06:55 +0200
Message-ID: <20120327170655.GB7937@aftab>
In-Reply-To: <20120312180359.GA8214@aftab>

On Mon, Mar 12, 2012 at 07:03:59PM +0100, Borislav Petkov wrote:
> On Mon, Mar 12, 2012 at 04:59:37PM +0000, Luck, Tony wrote:
> > > Sounds better, especially the close-on-exit part. Please elaborate on
> > > the races...
> > 
> > Errors are happening asynchronously to everything. Race looks like:
> > 
> > Daemon exits (or is killed)
> >    <<<< race begins here
> > kernel close routine called
> > close routine updates your global variable
> >    <<<< race ends here
> 
> Well, in that case, we're going to miss logging a single error, or log
> it incomplete.
> 
> Unless, we make the global variable atomic and make the daemon zero it
> as the first action it does when it starts going away. If it is killed,
> then we probably need some sanity-checking functionality which checks
> periodically whether the daemon is still alive ...
> 
> This probably needs more meditation.

Ok, hm, how about we add a timer which runs for a safe period of say...
a couple of minutes after the error has been logged into the buffer.

Before it expires, we expect the userspace daemon to come in and consume
the information. We test for that either explicitly, by checking whether
it wrote to some file, or implicitly, by checking whether the buffer got
emptied in the meantime (the exact method is still TBD).

In any case, if during the safe period of time we haven't received
confirmation from userspace that the item has been consumed, we switch
irreversibly back to the kernel log buffer and reissue the decoded info
through printk.

This way we

* don't introduce a device file with a ->close

* remain race-agnostic: either the timeout has happened and userspace
hasn't consumed the decoded data, or it worked just fine and we continue
on with our merry error collection.

If other errors happen while the timer is running, we log them as usual
and restart the timer to give the newest error an equal chance. Error
size shouldn't overflow the buffer because we're reserving 4 pages per
CPU currently and this can easily be enlarged...
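
To make the above a bit more concrete, here is a rough userspace sketch
of the timer logic, not actual kernel code. All names (ras_state,
ras_log_error, ras_consumed, ras_tick) and the 120 second SAFE_PERIOD are
made up for illustration; time is passed in as a plain number instead of
using a real kernel timer, and locking is ignored completely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* "A couple of minutes"; the exact value is an assumption. */
#define SAFE_PERIOD 120

enum sink { SINK_RAS_BUFFER, SINK_PRINTK };

struct ras_state {
	enum sink sink;		/* where decoded errors currently go */
	bool timer_armed;	/* is the safe-period timer running? */
	long deadline;		/* absolute time at which it expires */
	int pending;		/* records in the buffer, not yet consumed */
	int printed;		/* records issued through the kernel log */
};

/* Log one decoded error at time 'now'. */
static void ras_log_error(struct ras_state *s, long now)
{
	assert(s != NULL);

	if (s->sink == SINK_PRINTK) {
		/* We already fell back: this would be a plain printk(). */
		s->printed++;
		return;
	}

	s->pending++;
	/* (Re)start the timer so the newest error gets the full period. */
	s->timer_armed = true;
	s->deadline = now + SAFE_PERIOD;
}

/* Userspace confirmed consumption (explicitly or by emptying the buffer). */
static void ras_consumed(struct ras_state *s)
{
	assert(s != NULL);

	s->pending = 0;
	s->timer_armed = false;
}

/* Timer check at time 'now': fall back to printk if nobody consumed. */
static void ras_tick(struct ras_state *s, long now)
{
	assert(s != NULL);

	if (!s->timer_armed || now < s->deadline)
		return;

	/* Timeout: switch irreversibly and reissue what is still pending. */
	s->sink = SINK_PRINTK;
	s->printed += s->pending;
	s->pending = 0;
	s->timer_armed = false;
}
```

Note that nothing ever sets the sink back to SINK_RAS_BUFFER, which is the
"irreversibly" part. A real implementation would still have to decide how
the "consumed" confirmation arrives, since that is TBD above.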

Hmm, thoughts..?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551
