netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: linas@austin.ibm.com (Linas Vepstas)
To: Ananda Raju <Ananda.Raju@neterion.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>,
	linux-kernel@vger.kernel.org, linux-pci@atrey.karlin.mff.cuni.cz,
	netdev@vger.kernel.org, Jeff Garzik <jgarzik@pobox.com>,
	Andrew Morton <akpm@osdl.org>
Subject: Re: [PATCH] s2io: add PCI error recovery support
Date: Thu, 26 Oct 2006 17:51:55 -0500	[thread overview]
Message-ID: <20061026225155.GH6360@austin.ibm.com> (raw)
In-Reply-To: <78C9135A3D2ECE4B8162EBDCE82CAD77DC1ED7@nekter>

Hi.

On Thu, Oct 26, 2006 at 05:56:34AM -0400, Ananda Raju wrote:
> Hi, 
> Can you try attached patch. The attached patch is simple. We set card
> state as down in error_detecct() so that all entry points return error
> and don't proceed further.
> 
> In slot_reset() we do s2io_card_down() will reset adapter. 
> In io_resume() we bringup the driver. 

Simplicity is always better. However, some questions/comments:

> @@ -4175,6 +4186,10 @@ static irqreturn_t s2io_isr(int irq, voi
>  	mac_info_t *mac_control;
>  	struct config_param *config;
>  
> +	if (atomic_read(&sp->card_state) == CARD_DOWN) {
> +		return IRQ_NONE;
> +	}

I used 

    if ((sp->pdev->error_state != pci_channel_io_normal)

here for a reason: the pdev->error_state is set even in an interrupt
context, that is, it gets set even if interrups are disabled, and
so it represents the actual state immediately. By contrast, the
error callbacks do not get called until possibly much later, 
and so sp->card_state = CARD_DOWN might not get set for a while.

If, for any reason, e.g. some obscure corner case, the s2io 
generates zillions of interupts, this could result in a soft-lockup.
I actually saw this in the symbios device driver, which will
regenerate an interrupt until its acknowledged -- and so it 
sat there, spinning. :-(

I was returning IRQ_HANDLED instead of IRQ_NONE, so as to avoid
falling into handle_bad_irq() or report_bad_irq(). I haven't 
seen this happen on s2io, but thought it would still be wise.

If this can't happen, then there's no problem here.

> +/**
> + * s2io_io_slot_reset - called after the pci bus has been reset.
> + * @pdev: Pointer to PCI device
> + *
> + * Restart the card from scratch, as if from a cold-boot.
> + */
> +static pci_ers_result_t s2io_io_slot_reset(struct pci_dev *pdev)
> +{

At this point, the card has just experienced a hardware reset,
(the #RST wire was held low for 250 millisecs, followed by
a settle time of 2 seconds, followed by whatever BIOS thinks
it needed to do, followed by a restore of the pci config space
to what it was after a cold boot. So the card is in a "fresh"
state; in theory its identitcal to a cold boot. So ... 
are you sure you want to "down" at this point? 

> +		s2io_card_down(sp);
> +		sp->device_close_flag = TRUE;	/* Device is shut down. */


One problem I'm having is that the watchdog timer sometimes
pops and tries to reset the card before s2io_card_down()
has a chance to run. I fixed this ... 

======================
So -- just for grins, I thought to myself, "Maybe I can make 
s2io be the first adapter ever to fully recover without 
a hard reset of the card."

The idea is simple: 

1) enable MMIO,
2) call s2io_card_down()
3) enable DMA
4) cal s2io_card_up()

I have a patch that does this, but then hit a few more snags.
I haven't yet nailed down all the trouble spots, maybe tommorrow.

--linas


  reply	other threads:[~2006-10-26 22:51 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-26  9:56 [PATCH] s2io: add PCI error recovery support Ananda Raju
2006-10-26 22:51 ` Linas Vepstas [this message]
  -- strict thread matches above, loose matches on Subject: below --
2007-03-07  0:42 Ramkrishna Vepa
2007-03-05 22:33 Ramkrishna Vepa
2007-03-16 19:49 ` Linas Vepstas
2007-03-16 19:58   ` Ramkrishna Vepa
2007-02-15 23:08 Linas Vepstas
2006-10-27 11:35 Ananda Raju
2006-10-27 19:32 ` Linas Vepstas
2006-10-25  6:29 Ananda Raju
2006-10-25 15:11 ` Linas Vepstas
2006-10-25 20:55   ` Linas Vepstas
2006-10-24 21:54 Linas Vepstas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20061026225155.GH6360@austin.ibm.com \
    --to=linas@austin.ibm.com \
    --cc=Ananda.Raju@neterion.com \
    --cc=akpm@osdl.org \
    --cc=jgarzik@pobox.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@atrey.karlin.mff.cuni.cz \
    --cc=netdev@vger.kernel.org \
    --cc=wenxiong@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).