From: Linas Vepstas <linas@austin.ibm.com>
To: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Matthew Wilcox <matthew@wil.cx>,
Linus Torvalds <torvalds@osdl.org>,
Jeff Garzik <jgarzik@pobox.com>,
Linux Kernel list <linux-kernel@vger.kernel.org>,
linux-pci@atrey.karlin.mff.cuni.cz, linux-ia64@vger.kernel.org,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
"Luck, Tony" <tony.luck@intel.com>
Subject: Re: [PATCH/RFC] I/O-check interface for driver's error handling
Date: Wed, 2 Mar 2005 11:44:38 -0600 [thread overview]
Message-ID: <20050302174438.GH1220@austin.ibm.com> (raw)
In-Reply-To: <422524B1.10405@jp.fujitsu.com>
On Wed, Mar 02, 2005 at 11:28:01AM +0900, Hidetoshi Seto was heard to remark:
>
> Note that here is a difficulty: the MCA handler on some arch would run on
> special context - MCA environment. In other words, since some MCA handler
> would be called by non-maskable interrupt(e.g. NMI), so it's difficult to
> call some driver's callback using protected kernel locks from MCA context.
>
> Therefore what MCA handler could do is just indicates a error was there,
> by something like status flag which drivers can refer. And after possible
> deley, we would be able to call callbacks.
FWIW, many device drivers do I/O in an interrupt context (e.g. from
timer interrupts), which is why error recovery needs to run in a
separate thread from error detection.
On ppc64, here's what I currently do:
-- check for pci error (call firmware & ask it)
-- if (error) put event on a workqueue
-- workqueue handler sends out a notifier_call_chain
to anyone who cares.
Thus, the error recovery thread runs in its own thread.
Right now, the above code is arch-specific, but I don't see any
reason we couldn't make the event, the workqueue and the
notifier_call chain arch-generic.
Below is some "pseudocode" version (mentally substitute
"pci error event" for every occurance of "eeh"). Its got some
ppc64-specific crud in there that we have to fix to make it
truly generic (I just cut and pasted from current code).
Would a cleaned up version of this code be suitable for a
arch-generic pci error recovery framework? Seto, would
this be useful to you?
--linas
===================================================
Header file:
/**
* EEH Notifier event flags.
* Freeze -- pci slot is frozen, no i/o is possible
*/
#define EEH_NOTIFY_FREEZE 1
/** EEH event -- structure holding pci slot data that describes
* a change in the isolation status of a PCI slot. A pointer
* to this struct is passed as the data pointer in a notify callback.
*/
struct eeh_event {
struct list_head list;
struct pci_dev *dev;
int reset_state;
int time_unavail;
};
/** Register to find out about EEH events. */
int eeh_register_notifier(struct notifier_block *nb);
int eeh_unregister_notifier(struct notifier_block *nb);
===================================================
C file:
/* EEH event workqueue setup. */
static spinlock_t eeh_eventlist_lock = SPIN_LOCK_UNLOCKED;
LIST_HEAD(eeh_eventlist);
static void eeh_event_handler(void *);
DECLARE_WORK(eeh_event_wq, eeh_event_handler, NULL);
static struct notifier_block *eeh_notifier_chain;
/**
* eeh_register_notifier - Register to find out about EEH events.
* @nb: notifier block to callback on events
*/
int eeh_register_notifier(struct notifier_block *nb)
{
return notifier_chain_register(&eeh_notifier_chain, nb);
}
/**
* eeh_unregister_notifier - Unregister to an EEH event notifier.
* @nb: notifier block to callback on events
*/
int eeh_unregister_notifier(struct notifier_block *nb)
{
return notifier_chain_unregister(&eeh_notifier_chain, nb);
}
/**
* queue up a pci error event to be dispatched to all listeners
* of the pci error notifier call chain. This routine is safe to call
* within an interrupt context. The actual event delivery
* will be from a workque thread.
*/
void eeh_queue_failure(struct pci_dev *dev)
{
struct eeh_event *event;
event = kmalloc(sizeof(*event), GFP_ATOMIC);
if (event == NULL) {
printk (KERN_ERR "EEH: out of memory, event not handled\n");
return 1;
}
event->dev = dev;
event->reset_state = rets[0];
event->time_unavail = rets[2];
/* We may be called in an interrupt context */
spin_lock_irqsave(&eeh_eventlist_lock, flags);
list_add(&event->list, &eeh_eventlist);
spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
/* Most EEH events are due to device driver bugs. Having
* a stack trace will help the device-driver authors figure
* out what happened. So print that out. */
if (rets[0] != 5) dump_stack();
schedule_work(&eeh_event_wq);
}
/**
* eeh_event_handler - dispatch EEH events. The detection of a frozen
* slot can occur inside an interrupt, where it can be hard to do
* anything about it. The goal of this routine is to pull these
* detection events out of the context of the interrupt handler, and
* re-dispatch them for processing at a later time in a normal context.
*
* @dummy - unused
*/
static void eeh_event_handler(void *dummy)
{
unsigned long flags;
struct eeh_event *event;
while (1) {
spin_lock_irqsave(&eeh_eventlist_lock, flags);
event = NULL;
if (!list_empty(&eeh_eventlist)) {
event = list_entry(eeh_eventlist.next, struct eeh_event, list);
list_del(&event->list);
}
spin_unlock_irqrestore(&eeh_eventlist_lock, flags);
if (event == NULL)
break;
if (event->reset_state != 5) {
printk(KERN_INFO "EEH: MMIO failure (%d), notifiying device "
"%s %s\n", event->reset_state,
pci_name(event->dev), pci_pretty_name(event->dev));
}
__get_cpu_var(slot_resets)++;
notifier_call_chain (&eeh_notifier_chain,
EEH_NOTIFY_FREEZE, event);
pci_dev_put(event->dev);
kfree(event);
}
}
next prev parent reply other threads:[~2005-03-02 17:49 UTC|newest]
Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-03-01 8:33 [PATCH/RFC] I/O-check interface for driver's error handling Hidetoshi Seto
2005-03-01 14:42 ` Matthew Wilcox
2005-03-01 19:27 ` Linas Vepstas
2005-03-01 19:37 ` Linus Torvalds
2005-03-02 6:13 ` Hidetoshi Seto
2005-03-02 19:20 ` Linas Vepstas
2005-03-04 2:03 ` Hidetoshi Seto
2005-03-04 16:46 ` Linas Vepstas
2005-03-01 16:37 ` Jeff Garzik
2005-03-01 16:49 ` Linus Torvalds
2005-03-01 16:59 ` Matthew Wilcox
2005-03-01 17:10 ` Jesse Barnes
2005-03-01 18:33 ` Linas Vepstas
2005-03-01 22:27 ` Benjamin Herrenschmidt
2005-03-02 20:02 ` Linas Vepstas
2005-03-02 22:46 ` Benjamin Herrenschmidt
2005-03-02 23:37 ` Linas Vepstas
2005-03-01 22:23 ` Benjamin Herrenschmidt
2005-03-02 3:13 ` Hidetoshi Seto
2005-03-04 13:54 ` Pavel Machek
2005-03-04 17:50 ` Jesse Barnes
2005-03-04 22:37 ` Benjamin Herrenschmidt
2005-03-04 22:57 ` Pavel Machek
2005-03-04 23:03 ` Benjamin Herrenschmidt
2005-03-04 23:18 ` Pavel Machek
2005-03-04 23:27 ` Benjamin Herrenschmidt
2005-03-02 2:28 ` Hidetoshi Seto
2005-03-02 17:44 ` Linas Vepstas [this message]
2005-03-02 18:03 ` linux-os
2005-03-02 22:40 ` Benjamin Herrenschmidt
2005-03-04 2:21 ` Hidetoshi Seto
2005-03-01 22:20 ` Benjamin Herrenschmidt
2005-03-02 18:22 ` Linas Vepstas
2005-03-02 18:41 ` Jesse Barnes
2005-03-02 19:46 ` Linas Vepstas
2005-03-02 22:43 ` Benjamin Herrenschmidt
2005-03-02 22:41 ` Benjamin Herrenschmidt
2005-03-02 23:30 ` Linas Vepstas
2005-03-02 23:40 ` Jesse Barnes
2005-03-01 19:17 ` Linas Vepstas
2005-03-01 22:15 ` Benjamin Herrenschmidt
2005-03-01 17:19 ` Andi Kleen
2005-03-01 18:08 ` Linus Torvalds
2005-03-01 18:45 ` Andi Kleen
2005-03-01 18:59 ` Linas Vepstas
2005-03-01 22:26 ` Benjamin Herrenschmidt
2005-03-01 22:24 ` Benjamin Herrenschmidt
2005-03-04 12:40 ` Hidetoshi Seto
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20050302174438.GH1220@austin.ibm.com \
--to=linas@austin.ibm.com \
--cc=benh@kernel.crashing.org \
--cc=jgarzik@pobox.com \
--cc=linux-ia64@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@atrey.karlin.mff.cuni.cz \
--cc=matthew@wil.cx \
--cc=seto.hidetoshi@jp.fujitsu.com \
--cc=tony.luck@intel.com \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox