From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: linux-ia64@vger.kernel.org
Subject: Re: [RFC] I/O error handling for userspace
Date: Mon, 06 Dec 2004 12:42:42 +0000 [thread overview]
Message-ID: <41B453C2.5050203@jp.fujitsu.com> (raw)
In-Reply-To: <200412030831.25662.jbarnes@engr.sgi.com>
Jesse Barnes wrote:
> On Friday, December 3, 2004 8:31 am, Jesse Barnes wrote:
>
>>This patch adds support for sending a SIGBUS to a userspace application
>>using /proc/bus/pci to drive a device if an I/O error occurs. We're using
>>this in house for the X server's BIOS emulator and it seems to be working
>>well.
>>
>>The idea is to track mmaped /proc/bus/pci regions so that the machine check
>>handler is able to properly determine which process is responsible for any
>>faults that occur (ia64 is interesting in that the error may not occur in
>>the process context that actually generated the bad reference). If a match
>>is found, a SIGBUS is sent to the process, along with the address that
>>caused the fault. The machine check record is then cleared and recovery
>>takes place (the assumption is that the signal to userspace is a sufficient
>>record of the error).
Cool!
BTW I have some short comments.
> ------------------------------------------------------------------------
>
> === arch/ia64/kernel/mca.c 1.71 vs edited ==> --- 1.71/arch/ia64/kernel/mca.c 2004-11-11 10:04:30 -08:00
> +++ edited/arch/ia64/kernel/mca.c 2004-12-02 14:30:33 -08:00
> @@ -871,21 +873,69 @@
> void
> ia64_mca_ucmc_handler(void)
> {
> + struct io_range *range, *tmp;
> + unsigned long io_addr = 0;
> pal_processor_state_info_t *psp = (pal_processor_state_info_t *)
> &ia64_sal_to_os_handoff_state.proc_state_param;
> - int recover;
> + int recover = 0;
> + ia64_err_rec_t *curr_record;
>
> /* Get the MCA error record and log it */
> ia64_mca_log_sal_error_record(SAL_INFO_TYPE_MCA);
>
> - /* TLB error is only exist in this SAL error record */
> - recover = (psp->tc && !(psp->cc || psp->bc || psp->rc || psp->uc))
> - /* other error recovery */
> - || (ia64_mca_ucmc_extension
> - && ia64_mca_ucmc_extension(
> - IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_MCA),
> - &ia64_sal_to_os_handoff_state,
> - &ia64_os_to_sal_handoff_state));
Where the ia64_mca_ucmc_extension had go? ... vanished? :-(
> + /* TLB errors are fixed up before we get here, so recover */
> + if (psp->tc) {
> + recover = 1;
> + goto return_to_sal;
> + }
> +
I'm not sure because I hadn't get any logs on crazy error situation, but
isn't it possible that an error log have both of tc and another(cc|bc|rc|uc)?
Why we could omit "!(psp->cc || psp->bc || psp->rc || psp->uc)" ?
> + /*
> + * If it's not a bus check with a valid target identifier,
> + * we don't have a chance.
> + */
> + if (!psp->bc) {
> + recover = 0;
> + goto return_to_sal;
> + }
> +
> + /*
> + * If we can't get this lock, we can't safely look at the list,
> + * so give up.
> + */
> + if (!spin_trylock(&io_range_list_lock)) {
> + recover = 0;
> + goto return_to_sal;
> + }
> +
> + curr_record = IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_MCA);
> + io_addr = curr_record->proc_err.info->target_identifier;
> +
> + /*
> + * See if an I/O error occured in a previously registered range
> + */
> + list_for_each_entry_safe(range, tmp, &pci_io_ranges, range_list) {
> + if (range->start <= io_addr && io_addr <= range->end) {
> + struct siginfo siginfo;
> + struct task_struct *owner = NULL;
> + recover = 1;
> + siginfo.si_signo = SIGBUS;
> + siginfo.si_code = BUS_ADRERR;
> + siginfo.si_addr = (void *) io_addr;
> + owner = find_task_by_pid(range->owner);
> + if (owner)
> + force_sig_info(SIGBUS, &siginfo, owner);
> + else {
> + /*
> + * need to free memory too, is that safe
> + * here?
> + */
> + list_del(&range->range_list);
> + }
> + }
> + }
> + spin_unlock(&io_range_list_lock);
> +
> + return_to_sal:
force_sig_info() takes spinlock in it... I think calling this isn't safe on MCA.
The rest part of the patch (for legacies) seems to be functionally-independent.
I don't have enough ideas for legacy-maps, but I guess you have done good work.
Thanks,
H.Seto
next prev parent reply other threads:[~2004-12-06 12:42 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2004-12-03 16:31 [RFC] I/O error handling for userspace Jesse Barnes
2004-12-03 16:43 ` Jesse Barnes
2004-12-06 12:42 ` Hidetoshi Seto [this message]
2004-12-06 16:13 ` Jesse Barnes
2004-12-06 16:59 ` Jesse Barnes
2004-12-06 17:05 ` Jesse Barnes
2004-12-06 22:56 ` Jesse Barnes
2004-12-06 23:51 ` Keith Owens
2004-12-07 0:38 ` Keith Owens
2004-12-07 0:40 ` Jesse Barnes
2004-12-07 1:29 ` Keith Owens
2004-12-07 1:36 ` Jesse Barnes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=41B453C2.5050203@jp.fujitsu.com \
--to=seto.hidetoshi@jp.fujitsu.com \
--cc=linux-ia64@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox