Re: [RFC] I/O error handling for userspace

public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed

From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: linux-ia64@vger.kernel.org
Subject: Re: [RFC] I/O error handling for userspace
Date: Mon, 06 Dec 2004 12:42:42 +0000	[thread overview]
Message-ID: <41B453C2.5050203@jp.fujitsu.com> (raw)
In-Reply-To: <200412030831.25662.jbarnes@engr.sgi.com>

Jesse Barnes wrote:
> On Friday, December 3, 2004 8:31 am, Jesse Barnes wrote:
> 
>>This patch adds support for sending a SIGBUS to a userspace application
>>using /proc/bus/pci to drive a device if an I/O error occurs.  We're using
>>this in house for the X server's BIOS emulator and it seems to be working
>>well.
>>
>>The idea is to track mmaped /proc/bus/pci regions so that the machine check
>>handler is able to properly determine which process is responsible for any
>>faults that occur (ia64 is interesting in that the error may not occur in
>>the process context that actually generated the bad reference).  If a match
>>is found, a SIGBUS is sent to the process, along with the address that
>>caused the fault.  The machine check record is then cleared and recovery
>>takes place (the assumption is that the signal to userspace is a sufficient
>>record of the error).

Cool!
BTW I have some short comments.

> ------------------------------------------------------------------------
> 
> === arch/ia64/kernel/mca.c 1.71 vs edited ==> --- 1.71/arch/ia64/kernel/mca.c	2004-11-11 10:04:30 -08:00
> +++ edited/arch/ia64/kernel/mca.c	2004-12-02 14:30:33 -08:00
> @@ -871,21 +873,69 @@
>  void
>  ia64_mca_ucmc_handler(void)
>  {
> +	struct io_range *range, *tmp;
> +	unsigned long io_addr = 0;
>  	pal_processor_state_info_t *psp = (pal_processor_state_info_t *)
>  		&ia64_sal_to_os_handoff_state.proc_state_param;
> -	int recover; 
> +	int recover = 0;
> +	ia64_err_rec_t *curr_record;
>  
>  	/* Get the MCA error record and log it */
>  	ia64_mca_log_sal_error_record(SAL_INFO_TYPE_MCA);
>  
> -	/* TLB error is only exist in this SAL error record */
> -	recover = (psp->tc && !(psp->cc || psp->bc || psp->rc || psp->uc))
> -	/* other error recovery */
> -	   || (ia64_mca_ucmc_extension 
> -		&& ia64_mca_ucmc_extension(
> -			IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_MCA),
> -			&ia64_sal_to_os_handoff_state,
> -			&ia64_os_to_sal_handoff_state)); 

Where the ia64_mca_ucmc_extension had go? ... vanished? :-(


> + 	/* TLB errors are fixed up before we get here, so recover */
> + 	if (psp->tc) {
> + 		recover = 1;
> + 		goto return_to_sal;
> + 	}
> +

I'm not sure because I hadn't get any logs on crazy error situation, but
isn't it possible that an error log have both of tc and another(cc|bc|rc|uc)?
Why we could omit "!(psp->cc || psp->bc || psp->rc || psp->uc)" ?


> + 	/*
> + 	 * If it's not a bus check with a valid target identifier,
> + 	 * we don't have a chance.
> + 	 */
> + 	if (!psp->bc) {
> + 		recover = 0;
> + 		goto return_to_sal;
> + 	}
> +
> + 	/*
> + 	 * If we can't get this lock, we can't safely look at the list,
> + 	 * so give up.
> + 	 */
> + 	if (!spin_trylock(&io_range_list_lock)) {
> + 		recover = 0;
> + 		goto return_to_sal;
> + 	}
> +
> + 	curr_record = IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_MCA);
> +	io_addr = curr_record->proc_err.info->target_identifier;
> +
> + 	/*
> + 	 * See if an I/O error occured in a previously registered range
> + 	 */
> + 	list_for_each_entry_safe(range, tmp, &pci_io_ranges, range_list) {
> + 		if (range->start <= io_addr && io_addr <= range->end) {
> + 			struct siginfo siginfo;
> + 			struct task_struct *owner = NULL;
> + 			recover = 1;
> + 			siginfo.si_signo = SIGBUS;
> + 			siginfo.si_code = BUS_ADRERR;
> + 			siginfo.si_addr  = (void *) io_addr;
> + 			owner = find_task_by_pid(range->owner);
> + 			if (owner)
> + 				force_sig_info(SIGBUS, &siginfo, owner);
> + 			else {
> + 				/*
> + 				 * need to free memory too, is that safe
> + 				 * here?
> + 				 */
> + 				list_del(&range->range_list);
> + 			}
> + 		}
> + 	}
> + 	spin_unlock(&io_range_list_lock);
> +
> + return_to_sal:

force_sig_info() takes spinlock in it... I think calling this isn't safe on MCA.


The rest part of the patch (for legacies) seems to be functionally-independent.
I don't have enough ideas for legacy-maps, but I guess you have done good work.


Thanks,
H.Seto

next prev parent reply	other threads:[~2004-12-06 12:42 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-12-03 16:31 [RFC] I/O error handling for userspace Jesse Barnes
2004-12-03 16:43 ` Jesse Barnes
2004-12-06 12:42 ` Hidetoshi Seto [this message]
2004-12-06 16:13 ` Jesse Barnes
2004-12-06 16:59 ` Jesse Barnes
2004-12-06 17:05 ` Jesse Barnes
2004-12-06 22:56 ` Jesse Barnes
2004-12-06 23:51 ` Keith Owens
2004-12-07  0:38 ` Keith Owens
2004-12-07  0:40 ` Jesse Barnes
2004-12-07  1:29 ` Keith Owens
2004-12-07  1:36 ` Jesse Barnes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=41B453C2.5050203@jp.fujitsu.com \
    --to=seto.hidetoshi@jp.fujitsu.com \
    --cc=linux-ia64@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox