From: Jack Steiner <steiner@sgi.com>
To: linux-ia64@vger.kernel.org
Subject: Re: [PATCH 2.6.10-rc2] Drop SALINFO_TIMER_DELAY to one minute
Date: Wed, 01 Dec 2004 13:29:07 +0000
Message-ID: <20041201132907.GA6181@sgi.com>
In-Reply-To: <10903.1101872210@kao2.melbourne.sgi.com>
On Wed, Dec 01, 2004 at 02:36:50PM +1100, Keith Owens wrote:
> Experience with recoverable MCA events shows that a poll interval of 5
> minutes for new MCA/INIT records is a bit too long. Drop the poll
> interval to one minute.
I'm not convinced that shortening the delay is the right solution.
We are testing OS recovery from double-bit memory errors using an error
injection program. The sequence is:
- a user program injects a double-bit error into memory
- user accesses the memory
- platform causes an MCA due to bad ECC in memory
- cpu goes to PAL -> SAL -> OS_MCA
- OS_MCA recovers the error
- OS aborts the user program & logs an error (not sure of
the exact sequence here)
- OS exits from OS_MCA -> SAL -> PAL -> OS
- (this more-or-less works!!)
The MCA record is still held in SAL. Because of potential deadlock
situations, on the call to OS_MCA the MCA error record is not logged
and cleared.
After the error is recovered, neither the OS nor SAL raises an
interrupt to indicate that the OS should log and clear the MCA
record from SAL. The error record remains in SAL until the
next poll by SALINFO.
The SAL Spec & Error Handling Guide are fuzzy about how this error
should be processed (at least I can't find it). At least some
of the descriptions are obsolete - they assume the OS will log &
clear the error as part of OS_MCA handling. As mentioned before, there
are potential deadlock issues in doing this.
It seems to me that either the OS or SAL should do something (e.g.,
raise an interrupt) to cause the MCA error to be logged/cleared as quickly
as possible. Waiting for the next poll interval does not seem like
the right solution. If too many MCAs (recovered or not) occur
before the next poll interval, error state will be lost.
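
To illustrate the last point, here is a rough userspace model (not kernel
code). It assumes that SAL can hold at most `capacity` unread records and
that a record arriving while the store is full is dropped; the real SAL
behavior may differ. The names lost_with_polling and capacity are
hypothetical, and times are in seconds rather than jiffies.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count records lost when the OS only drains the firmware store from a
 * periodic poll.  event_times must be sorted in ascending order.
 * Assumption: a full store drops new records instead of overwriting.
 */
size_t lost_with_polling(const unsigned *event_times, size_t n,
                         unsigned poll_interval, size_t capacity)
{
    size_t held = 0;    /* unread records currently in the store */
    size_t lost = 0;    /* records dropped because the store was full */
    unsigned next_poll = poll_interval;

    assert(poll_interval > 0);

    for (size_t i = 0; i < n; i++) {
        unsigned t = event_times[i];

        /* Any poll that fired at or before this event drained the store. */
        if (next_poll <= t) {
            held = 0;
            next_poll = (t / poll_interval + 1) * poll_interval;
        }

        if (held < capacity)
            held++;
        else
            lost++;
    }
    return lost;
}
```

With events at 10, 20, 70, 80 and 130 seconds and capacity 1, this model
loses 4 records at a 300 second poll and 2 records at a 60 second poll.
A shorter interval reduces the loss but does not remove it, which is why
a notification after recovery looks preferable to faster polling.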
>
> Signed-off-by: Keith Owens <kaos@sgi.com>
>
> Index: linux/arch/ia64/kernel/salinfo.c
> ===================================================================
> --- linux.orig/arch/ia64/kernel/salinfo.c Tue Oct 19 07:54:40 2004
> +++ linux/arch/ia64/kernel/salinfo.c Wed Dec 1 14:29:16 2004
> @@ -230,8 +230,8 @@ salinfo_log_wakeup(int type, u8 *buffer,
> }
> }
>
> -/* Check for outstanding MCA/INIT records every 5 minutes (arbitrary) */
> -#define SALINFO_TIMER_DELAY (5*60*HZ)
> +/* Check for outstanding MCA/INIT records every minute (arbitrary) */
> +#define SALINFO_TIMER_DELAY (60*HZ)
> static struct timer_list salinfo_timer;
>
> static void
>
--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.