Re: [RFC] I/O error handling for userspace

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: linux-ia64@vger.kernel.org
Subject: Re: [RFC] I/O error handling for userspace
Date: Mon, 06 Dec 2004 12:42:42 +0000	[thread overview]
Message-ID: <41B453C2.5050203@jp.fujitsu.com> (raw)
In-Reply-To: <200412030831.25662.jbarnes@engr.sgi.com>

Jesse Barnes wrote:
> On Friday, December 3, 2004 8:31 am, Jesse Barnes wrote:
> 
>>This patch adds support for sending a SIGBUS to a userspace application
>>using /proc/bus/pci to drive a device if an I/O error occurs.  We're using
>>this in house for the X server's BIOS emulator and it seems to be working
>>well.
>>
>>The idea is to track mmaped /proc/bus/pci regions so that the machine check
>>handler is able to properly determine which process is responsible for any
>>faults that occur (ia64 is interesting in that the error may not occur in
>>the process context that actually generated the bad reference).  If a match
>>is found, a SIGBUS is sent to the process, along with the address that
>>caused the fault.  The machine check record is then cleared and recovery
>>takes place (the assumption is that the signal to userspace is a sufficient
>>record of the error).

Cool!
BTW I have some short comments.

> ------------------------------------------------------------------------
> 
> === arch/ia64/kernel/mca.c 1.71 vs edited ==> --- 1.71/arch/ia64/kernel/mca.c	2004-11-11 10:04:30 -08:00
> +++ edited/arch/ia64/kernel/mca.c	2004-12-02 14:30:33 -08:00
> @@ -871,21 +873,69 @@
>  void
>  ia64_mca_ucmc_handler(void)
>  {
> +	struct io_range *range, *tmp;
> +	unsigned long io_addr = 0;
>  	pal_processor_state_info_t *psp = (pal_processor_state_info_t *)
>  		&ia64_sal_to_os_handoff_state.proc_state_param;
> -	int recover; 
> +	int recover = 0;
> +	ia64_err_rec_t *curr_record;
>  
>  	/* Get the MCA error record and log it */
>  	ia64_mca_log_sal_error_record(SAL_INFO_TYPE_MCA);
>  
> -	/* TLB error is only exist in this SAL error record */
> -	recover = (psp->tc && !(psp->cc || psp->bc || psp->rc || psp->uc))
> -	/* other error recovery */
> -	   || (ia64_mca_ucmc_extension 
> -		&& ia64_mca_ucmc_extension(
> -			IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_MCA),
> -			&ia64_sal_to_os_handoff_state,
> -			&ia64_os_to_sal_handoff_state)); 

Where the ia64_mca_ucmc_extension had go? ... vanished? :-(


> + 	/* TLB errors are fixed up before we get here, so recover */
> + 	if (psp->tc) {
> + 		recover = 1;
> + 		goto return_to_sal;
> + 	}
> +

I'm not sure because I hadn't get any logs on crazy error situation, but
isn't it possible that an error log have both of tc and another(cc|bc|rc|uc)?
Why we could omit "!(psp->cc || psp->bc || psp->rc || psp->uc)" ?


> + 	/*
> + 	 * If it's not a bus check with a valid target identifier,
> + 	 * we don't have a chance.
> + 	 */
> + 	if (!psp->bc) {
> + 		recover = 0;
> + 		goto return_to_sal;
> + 	}
> +
> + 	/*
> + 	 * If we can't get this lock, we can't safely look at the list,
> + 	 * so give up.
> + 	 */
> + 	if (!spin_trylock(&io_range_list_lock)) {
> + 		recover = 0;
> + 		goto return_to_sal;
> + 	}
> +
> + 	curr_record = IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_MCA);
> +	io_addr = curr_record->proc_err.info->target_identifier;
> +
> + 	/*
> + 	 * See if an I/O error occured in a previously registered range
> + 	 */
> + 	list_for_each_entry_safe(range, tmp, &pci_io_ranges, range_list) {
> + 		if (range->start <= io_addr && io_addr <= range->end) {
> + 			struct siginfo siginfo;
> + 			struct task_struct *owner = NULL;
> + 			recover = 1;
> + 			siginfo.si_signo = SIGBUS;
> + 			siginfo.si_code = BUS_ADRERR;
> + 			siginfo.si_addr  = (void *) io_addr;
> + 			owner = find_task_by_pid(range->owner);
> + 			if (owner)
> + 				force_sig_info(SIGBUS, &siginfo, owner);
> + 			else {
> + 				/*
> + 				 * need to free memory too, is that safe
> + 				 * here?
> + 				 */
> + 				list_del(&range->range_list);
> + 			}
> + 		}
> + 	}
> + 	spin_unlock(&io_range_list_lock);
> +
> + return_to_sal:

force_sig_info() takes spinlock in it... I think calling this isn't safe on MCA.


The rest part of the patch (for legacies) seems to be functionally-independent.
I don't have enough ideas for legacy-maps, but I guess you have done good work.


Thanks,
H.Seto

next prev parent reply	other threads:[~2004-12-06 12:42 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-12-03 16:31 [RFC] I/O error handling for userspace Jesse Barnes
2004-12-03 16:43 ` Jesse Barnes
2004-12-06 12:42 ` Hidetoshi Seto [this message]
2004-12-06 16:13 ` Jesse Barnes
2004-12-06 16:59 ` Jesse Barnes
2004-12-06 17:05 ` Jesse Barnes
2004-12-06 22:56 ` Jesse Barnes
2004-12-06 23:51 ` Keith Owens
2004-12-07  0:38 ` Keith Owens
2004-12-07  0:40 ` Jesse Barnes
2004-12-07  1:29 ` Keith Owens
2004-12-07  1:36 ` Jesse Barnes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41B453C2.5050203@jp.fujitsu.com \
    --to=seto.hidetoshi@jp.fujitsu.com \
    --cc=linux-ia64@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.