From: Ben Hutchings <bhutchings@solarflare.com>
To: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org, linux-net-drivers@solarflare.com
Subject: [PATCH 7/7] sfc: Improve NIC internal error recovery
Date: Wed, 04 Mar 2009 20:01:57 +0000 [thread overview]
Message-ID: <1236196917.3140.34.camel@achroite> (raw)
In-Reply-To: <1236196272.3140.15.camel@achroite>
Make the error count a per-NIC variable.
Reset this the count after an hour if it has not reached the critical value.
Set the critical value back to 5.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
drivers/net/sfc/falcon.c | 23 +++++++++++++++++++----
1 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/net/sfc/falcon.c b/drivers/net/sfc/falcon.c
index 82c10f4..2ae51fd 100644
--- a/drivers/net/sfc/falcon.c
+++ b/drivers/net/sfc/falcon.c
@@ -39,11 +39,16 @@
* @next_buffer_table: First available buffer table id
* @pci_dev2: The secondary PCI device if present
* @i2c_data: Operations and state for I2C bit-bashing algorithm
+ * @int_error_count: Number of internal errors seen recently
+ * @int_error_expire: Time at which error count will be expired
*/
struct falcon_nic_data {
unsigned next_buffer_table;
struct pci_dev *pci_dev2;
struct i2c_algo_bit_data i2c_data;
+
+ unsigned int_error_count;
+ unsigned long int_error_expire;
};
/**************************************************************************
@@ -119,8 +124,12 @@ MODULE_PARM_DESC(rx_xon_thresh_bytes, "RX fifo XON threshold");
#define FALCON_EVQ_SIZE 4096
#define FALCON_EVQ_MASK (FALCON_EVQ_SIZE - 1)
-/* Max number of internal errors. After this resets will not be performed */
-#define FALCON_MAX_INT_ERRORS 4
+/* If FALCON_MAX_INT_ERRORS internal errors occur within
+ * FALCON_INT_ERROR_EXPIRE seconds, we consider the NIC broken and
+ * disable it.
+ */
+#define FALCON_INT_ERROR_EXPIRE 3600
+#define FALCON_MAX_INT_ERRORS 5
/* We poll for events every FLUSH_INTERVAL ms, and check FLUSH_POLL_COUNT times
*/
@@ -1374,7 +1383,6 @@ static irqreturn_t falcon_fatal_interrupt(struct efx_nic *efx)
efx_oword_t *int_ker = efx->irq_status.addr;
efx_oword_t fatal_intr;
int error, mem_perr;
- static int n_int_errors;
falcon_read(efx, &fatal_intr, FATAL_INTR_REG_KER);
error = EFX_OWORD_FIELD(fatal_intr, INT_KER_ERROR);
@@ -1401,7 +1409,14 @@ static irqreturn_t falcon_fatal_interrupt(struct efx_nic *efx)
pci_clear_master(nic_data->pci_dev2);
falcon_disable_interrupts(efx);
- if (++n_int_errors < FALCON_MAX_INT_ERRORS) {
+ /* Count errors and reset or disable the NIC accordingly */
+ if (nic_data->int_error_count == 0 ||
+ time_after(jiffies, nic_data->int_error_expire)) {
+ nic_data->int_error_count = 0;
+ nic_data->int_error_expire =
+ jiffies + FALCON_INT_ERROR_EXPIRE * HZ;
+ }
+ if (++nic_data->int_error_count < FALCON_MAX_INT_ERRORS) {
EFX_ERR(efx, "SYSTEM ERROR - reset scheduled\n");
efx_schedule_reset(efx, RESET_TYPE_INT_ERROR);
} else {
--
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
next prev parent reply other threads:[~2009-03-04 20:02 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-03-04 19:51 [PATCH 1/7] sfc: Fix efx_ethtool_nway_result() to use clause 45 MDIO registers Ben Hutchings
2009-03-04 19:52 ` [PATCH 2/7] sfc: Reject packets from the kernel TX queue during a loopback self-test Ben Hutchings
2009-03-05 1:55 ` David Miller
2009-03-04 19:52 ` [PATCH 3/7] sfc: Clean up properly on reset failure paths Ben Hutchings
2009-03-05 1:55 ` David Miller
2009-03-04 19:53 ` [PATCH 4/7] sfc: Clear I2C adapter structure in falcon_remove_nic() Ben Hutchings
2009-03-05 1:55 ` David Miller
2009-03-04 19:53 ` [PATCH 5/7] sfc: Don't wake TX queues while they're being flushed Ben Hutchings
2009-03-05 1:55 ` David Miller
2009-03-04 20:01 ` [PATCH 6/7] sfc: Fix search for flush completion events Ben Hutchings
2009-03-05 1:55 ` David Miller
2009-03-04 20:01 ` Ben Hutchings [this message]
2009-03-05 1:55 ` [PATCH 7/7] sfc: Improve NIC internal error recovery David Miller
2009-03-05 1:54 ` [PATCH 1/7] sfc: Fix efx_ethtool_nway_result() to use clause 45 MDIO registers David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1236196917.3140.34.camel@achroite \
--to=bhutchings@solarflare.com \
--cc=davem@davemloft.net \
--cc=linux-net-drivers@solarflare.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.