* [PATCH 2/6] forcedeth: new ethtool stat "tx_timeout" to account for tx_timeouts
From: David Decotigny @ 2011-05-19 0:14 UTC
To: David S. Miller, Joe Perches, Szymon Janc, netdev, linux-kernel
Cc: kernel-net-upstream, Sameer Nanda, David Decotigny
In-Reply-To: <1305764080-24853-1-git-send-email-decot@google.com>
From: Sameer Nanda <snanda@google.com>
This change publishes a new ethtool stat, tx_timeout, that counts the
number of times the tx_timeout callback was triggered.
Signed-off-by: David Decotigny <decot@google.com>
---
drivers/net/forcedeth.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 895471d..112dc0b 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -632,6 +632,7 @@ static const struct nv_ethtool_str nv_estats_str[] = {
{ "rx_packets" },
{ "rx_errors_total" },
{ "tx_errors_total" },
+ { "tx_timeout" },
/* version 2 stats */
{ "tx_deferral" },
@@ -672,6 +673,7 @@ struct nv_ethtool_stats {
u64 rx_packets;
u64 rx_errors_total;
u64 tx_errors_total;
+ u64 tx_timeout;
/* version 2 stats */
u64 tx_deferral;
@@ -2526,6 +2528,8 @@ static void nv_tx_timeout(struct net_device *dev)
spin_lock_irq(&np->lock);
+ np->estats.tx_timeout++;
+
/* 1) stop tx engine */
nv_stop_tx(dev);
--
1.7.3.1
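The new counter only reports correctly because nv_estats_str[] and struct nv_ethtool_stats are parallel tables. nv_get_ethtool_stats() (see patch 6/6) copies the struct out with a single memcpy, so the name at index i must describe the u64 at index i. That is why the patch adds "tx_timeout" and the tx_timeout field at the same position. Below is a minimal userspace sketch of that pattern. The struct is trimmed to a few fields, and stat_names, record_tx_timeout(), copy_stats() and stat_index() are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Trimmed, hypothetical stand-in for nv_estats_str[]: one name per
 * u64 counter, in the same order as the struct below. */
static const char *const stat_names[] = {
	"rx_packets",
	"rx_errors_total",
	"tx_errors_total",
	"tx_timeout",		/* new entry, same slot as the new field */
	"tx_deferral",
};

#define NUM_STATS (sizeof(stat_names) / sizeof(stat_names[0]))

/* Trimmed stand-in for struct nv_ethtool_stats: only u64 members, so
 * it can be copied out as a flat array. */
struct ethtool_stats_model {
	uint64_t rx_packets;
	uint64_t rx_errors_total;
	uint64_t tx_errors_total;
	uint64_t tx_timeout;
	uint64_t tx_deferral;
};

/* Catch a name/field count mismatch at compile time. */
_Static_assert(sizeof(struct ethtool_stats_model) == NUM_STATS * sizeof(uint64_t),
	       "every u64 counter needs exactly one name");

/* What the patch adds to nv_tx_timeout(): bump the counter
 * (the driver does this with np->lock held). */
static void record_tx_timeout(struct ethtool_stats_model *s)
{
	s->tx_timeout++;
}

/* Mirrors the memcpy in nv_get_ethtool_stats(): buffer[i] is the
 * value reported under stat_names[i]. */
static void copy_stats(const struct ethtool_stats_model *s, uint64_t *buffer)
{
	assert(s != NULL && buffer != NULL);
	memcpy(buffer, s, NUM_STATS * sizeof(uint64_t));
}

/* Return the buffer index for a stat name, or -1 if unknown. */
static int stat_index(const char *name)
{
	for (size_t i = 0; i < NUM_STATS; i++)
		if (strcmp(stat_names[i], name) == 0)
			return (int)i;
	return -1;
}
```

With the real driver, the counter should then be listed by `ethtool -S <iface>` next to the other statistics.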
* [PATCH 4/6] forcedeth: Acknowledge only interrupts that are being processed
From: David Decotigny @ 2011-05-19 0:14 UTC
To: David S. Miller, Joe Perches, Szymon Janc, netdev, linux-kernel
Cc: kernel-net-upstream, Mike Ditto, David Decotigny
In-Reply-To: <1305764080-24853-1-git-send-email-decot@google.com>
From: Mike Ditto <mditto@google.com>
This avoids a race in which we could accidentally acknowledge an
interrupt that we didn't notice and won't immediately process. This is
based solely on code inspection; it is not known whether there was an
actual bug here.
Signed-off-by: David Decotigny <decot@google.com>
---
drivers/net/forcedeth.c | 13 ++++++++-----
1 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 7a6aa08..17e79de 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -3403,7 +3403,8 @@ static irqreturn_t nv_nic_irq_tx(int foo, void *data)
for (i = 0;; i++) {
events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQ_TX_ALL;
- writel(NVREG_IRQ_TX_ALL, base + NvRegMSIXIrqStatus);
+ writel(events, base + NvRegMSIXIrqStatus);
+ netdev_dbg(dev, "%s: tx irq: %08x\n", dev->name, events);
if (!(events & np->irqmask))
break;
@@ -3514,7 +3515,8 @@ static irqreturn_t nv_nic_irq_rx(int foo, void *data)
for (i = 0;; i++) {
events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQ_RX_ALL;
- writel(NVREG_IRQ_RX_ALL, base + NvRegMSIXIrqStatus);
+ writel(events, base + NvRegMSIXIrqStatus);
+ netdev_dbg(dev, "%s: rx irq: %08x\n", dev->name, events);
if (!(events & np->irqmask))
break;
@@ -3558,7 +3560,8 @@ static irqreturn_t nv_nic_irq_other(int foo, void *data)
for (i = 0;; i++) {
events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQ_OTHER;
- writel(NVREG_IRQ_OTHER, base + NvRegMSIXIrqStatus);
+ writel(events, base + NvRegMSIXIrqStatus);
+ netdev_dbg(dev, "%s: irq: %08x\n", dev->name, events);
if (!(events & np->irqmask))
break;
@@ -3622,10 +3625,10 @@ static irqreturn_t nv_nic_irq_test(int foo, void *data)
if (!(np->msi_flags & NV_MSI_X_ENABLED)) {
events = readl(base + NvRegIrqStatus) & NVREG_IRQSTAT_MASK;
- writel(NVREG_IRQ_TIMER, base + NvRegIrqStatus);
+ writel(events & NVREG_IRQ_TIMER, base + NvRegIrqStatus);
} else {
events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQSTAT_MASK;
- writel(NVREG_IRQ_TIMER, base + NvRegMSIXIrqStatus);
+ writel(events & NVREG_IRQ_TIMER, base + NvRegMSIXIrqStatus);
}
pci_push(base);
if (!(events & NVREG_IRQ_TIMER))
--
1.7.3.1
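The race in patch 4/6 only exists if the status register is write-1-to-clear. That is an assumption the commit message implies rather than states. Under it, acknowledging with a fixed mask such as NVREG_IRQ_TX_ALL also clears any bit that was raised between the readl() and the writel(), whereas writing back only `events` leaves that bit pending for the next loop iteration. The userspace simulation below models that window. The bit values, struct w1c_reg and the helpers are hypothetical stand-ins, not forcedeth code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout for the sketch only. */
#define IRQ_TX_DONE 0x1u
#define IRQ_TX_ERR  0x2u
#define IRQ_TX_ALL  (IRQ_TX_DONE | IRQ_TX_ERR)

/* Simulated write-1-to-clear status register. */
struct w1c_reg {
	uint32_t status;
};

static uint32_t reg_read(const struct w1c_reg *r)
{
	return r->status;
}

/* Writing 1 to a bit clears it; writing 0 leaves it alone. */
static void reg_write(struct w1c_reg *r, uint32_t val)
{
	r->status &= ~val;
}

/*
 * One pass of the handler loop. late_event models an interrupt the
 * hardware raises after the read but before the acknowledgement.
 * Returns the events this pass actually saw.
 */
static uint32_t handler_pass(struct w1c_reg *r, uint32_t late_event,
			     int ack_only_seen)
{
	uint32_t events;

	assert((late_event & ~IRQ_TX_ALL) == 0);
	events = reg_read(r) & IRQ_TX_ALL;

	r->status |= late_event;		/* the race window */

	if (ack_only_seen)
		reg_write(r, events);		/* patched behaviour */
	else
		reg_write(r, IRQ_TX_ALL);	/* original behaviour */
	return events;
}
```

With the original acknowledgement the late IRQ_TX_ERR bit is cleared without ever being seen. With the patched one it is still set, and the next pass picks it up.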
* [PATCH 5/6] forcedeth: Add messages to indicate using MSI or MSI-X
From: David Decotigny @ 2011-05-19 0:14 UTC
To: David S. Miller, Joe Perches, Szymon Janc, netdev, linux-kernel
Cc: kernel-net-upstream, Mike Ditto, David Decotigny
In-Reply-To: <1305764080-24853-1-git-send-email-decot@google.com>
From: Mike Ditto <mditto@google.com>
This adds a few debug messages to indicate whether PCIe interrupts are
signaled with MSI or MSI-X.
Signed-off-by: David Decotigny <decot@google.com>
---
drivers/net/forcedeth.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 17e79de..2712ddc 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -3745,6 +3745,7 @@ static int nv_request_irq(struct net_device *dev, int intr_test)
writel(0, base + NvRegMSIXMap0);
writel(0, base + NvRegMSIXMap1);
}
+ netdev_info(dev, "forcedeth: MSI-X enabled\n");
}
}
if (ret != 0 && np->msi_flags & NV_MSI_CAPABLE) {
@@ -3766,6 +3767,7 @@ static int nv_request_irq(struct net_device *dev, int intr_test)
writel(0, base + NvRegMSIMap1);
/* enable msi vector 0 */
writel(NVREG_MSI_VECTOR_0_ENABLED, base + NvRegMSIIrqMask);
+ netdev_info(dev, "forcedeth: MSI enabled\n");
}
}
if (ret != 0) {
--
1.7.3.1
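The two new messages sit at the ends of the MSI-X and MSI branches of nv_request_irq(). The visible context includes `if (ret != 0 && np->msi_flags & NV_MSI_CAPABLE)` and a final `if (ret != 0)`. Judging from that, the function tries MSI-X first, then MSI, and presumably falls back to a legacy INTx request last. The sketch below models only that ordering. The enum, struct irq_caps, pick_irq_mode() and the try callback are hypothetical, and the messages omit the "forcedeth: " prefix for the reason Ben gives later in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum irq_mode { IRQ_NONE, IRQ_MSIX, IRQ_MSI, IRQ_LEGACY };

struct irq_caps {
	bool msix_capable;
	bool msi_capable;
};

/* Hypothetical hook: returns 0 on success, like request_irq(). */
typedef int (*try_fn)(enum irq_mode mode, void *ctx);

/* Try MSI-X, then MSI, then legacy INTx; log which one stuck. */
static enum irq_mode pick_irq_mode(const struct irq_caps *caps,
				   try_fn try_mode, void *ctx)
{
	assert(caps != NULL && try_mode != NULL);

	if (caps->msix_capable && try_mode(IRQ_MSIX, ctx) == 0) {
		printf("MSI-X enabled\n");
		return IRQ_MSIX;
	}
	if (caps->msi_capable && try_mode(IRQ_MSI, ctx) == 0) {
		printf("MSI enabled\n");
		return IRQ_MSI;
	}
	if (try_mode(IRQ_LEGACY, ctx) == 0)
		return IRQ_LEGACY;
	return IRQ_NONE;
}

/* Test helper: fails every mode whose bit is set in *(unsigned *)ctx. */
static int fail_some(enum irq_mode mode, void *ctx)
{
	unsigned int fail_mask = *(unsigned int *)ctx;

	return (fail_mask >> mode) & 1u ? -1 : 0;
}
```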
* [PATCH 6/6] forcedeth: Fix a race during rmmod of forcedeth
From: David Decotigny @ 2011-05-19 0:14 UTC
To: David S. Miller, Joe Perches, Szymon Janc, netdev, linux-kernel
Cc: kernel-net-upstream, Salman Qazi, David Decotigny
In-Reply-To: <1305764080-24853-1-git-send-email-decot@google.com>
From: Salman Qazi <sqazi@google.com>
The race is between del_timer_sync() and nv_do_stats_poll() when the
latter is called through nv_get_ethtool_stats(). To prevent it, we
introduce mutual exclusion between nv_get_ethtool_stats() and
del_timer_sync(). Note that we don't put the mutual exclusion in
nv_do_stats_poll() itself: it is also the timer callback, which
del_timer_sync() waits for, so taking the lock there would deadlock.
Signed-off-by: David Decotigny <decot@google.com>
---
drivers/net/forcedeth.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 2712ddc..2121cea 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -3921,6 +3921,10 @@ static void nv_poll_controller(struct net_device *dev)
}
#endif
+/* No locking is needed as long as this is in the timer
+ * callback. However, any other callers must call this
+ * function with np->lock held.
+ */
static void nv_do_stats_poll(unsigned long data)
{
struct net_device *dev = (struct net_device *) data;
@@ -4553,12 +4557,17 @@ static int nv_get_sset_count(struct net_device *dev, int sset)
static void nv_get_ethtool_stats(struct net_device *dev, struct ethtool_stats *estats, u64 *buffer)
{
+ unsigned long flags;
struct fe_priv *np = netdev_priv(dev);
+ spin_lock_irqsave(&np->lock, flags);
+
/* update stats */
nv_do_stats_poll((unsigned long)dev);
memcpy(buffer, &np->estats, nv_get_sset_count(dev, ETH_SS_STATS)*sizeof(u64));
+
+ spin_unlock_irqrestore(&np->lock, flags);
}
static int nv_link_test(struct net_device *dev)
@@ -5176,13 +5185,13 @@ static int nv_close(struct net_device *dev)
spin_lock_irq(&np->lock);
np->in_shutdown = 1;
+ del_timer_sync(&np->stats_poll);
spin_unlock_irq(&np->lock);
nv_napi_disable(dev);
synchronize_irq(np->pci_dev->irq);
del_timer_sync(&np->oom_kick);
del_timer_sync(&np->nic_poll);
- del_timer_sync(&np->stats_poll);
netif_stop_queue(dev);
spin_lock_irq(&np->lock);
--
1.7.3.1
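The commit message does not spell out how the ethtool path races with del_timer_sync(). Here is one plausible reading. Assume nv_do_stats_poll() re-arms stats_poll with mod_timer() unless np->in_shutdown is set (its body is not shown in this hunk). Then the bad interleaving runs as follows:

- The ethtool path sees in_shutdown still clear.
- nv_close() sets in_shutdown and deletes the timer.
- The ethtool path re-arms the timer anyway, so it can fire after the module is gone.

The single-threaded model below replays the three possible placements of nv_close() relative to the ethtool path's check and re-arm. struct model and all helpers are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

struct model {
	bool in_shutdown;
	bool timer_armed;
};

/* Ethtool path, step 1: decide whether to re-arm the stats timer. */
static bool poll_should_rearm(const struct model *m)
{
	return !m->in_shutdown;
}

/* Ethtool path, step 2: the mod_timer() equivalent. */
static void poll_rearm(struct model *m, bool decision)
{
	if (decision)
		m->timer_armed = true;
}

/* nv_close(): set in_shutdown, then the del_timer_sync() equivalent. */
static void close_path(struct model *m)
{
	m->in_shutdown = true;
	m->timer_armed = false;
}

/*
 * Replay one interleaving. close_at selects where close_path() runs:
 * 0 = before the check, 1 = between check and re-arm (only possible
 * without the lock), 2 = after the re-arm.
 * Returns true if the timer is still armed once both paths finished.
 */
static bool timer_left_armed(int close_at)
{
	struct model m = { .in_shutdown = false, .timer_armed = true };
	bool decision;

	assert(close_at >= 0 && close_at <= 2);
	if (close_at == 0)
		close_path(&m);
	decision = poll_should_rearm(&m);
	if (close_at == 1)
		close_path(&m);
	poll_rearm(&m, decision);
	if (close_at == 2)
		close_path(&m);
	return m.timer_armed;
}
```

The patch takes np->lock around nv_do_stats_poll() in nv_get_ethtool_stats() and calls del_timer_sync() under the same lock in nv_close(). That rules out placement 1, which is the only one that leaves the timer armed.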
* [PATCH 3/6] forcedeth: allow to silence tx_timeout debug messages
From: David Decotigny @ 2011-05-19 0:14 UTC
To: David S. Miller, Joe Perches, Szymon Janc, netdev, linux-kernel
Cc: kernel-net-upstream, Sameer Nanda, David Decotigny
In-Reply-To: <1305764080-24853-1-git-send-email-decot@google.com>
From: Sameer Nanda <snanda@google.com>
This adds a new module parameter, "debug_tx_timeout", that controls
most of the debug messages printed on a TX timeout. These messages
don't have a high enough signal-to-noise ratio for production systems
and, at ~30kB logged per timeout, they tend to feed a cascade effect
when the system is already under stress (memory pressure, disk, etc.).
By default the parameter is clear, so these messages are not
displayed.
Signed-off-by: David Decotigny <decot@google.com>
---
drivers/net/forcedeth.c | 91 ++++++++++++++++++++++++++--------------------
1 files changed, 51 insertions(+), 40 deletions(-)
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 112dc0b..7a6aa08 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -896,6 +896,11 @@ enum {
static int dma_64bit = NV_DMA_64BIT_ENABLED;
/*
+ * Debug output control for tx_timeout
+ */
+static bool debug_tx_timeout = false;
+
+/*
* Crossover Detection
* Realtek 8201 phy + some OEM boards do not work properly.
*/
@@ -2473,7 +2478,6 @@ static void nv_tx_timeout(struct net_device *dev)
u32 status;
union ring_type put_tx;
int saved_tx_limit;
- int i;
if (np->msi_flags & NV_MSI_X_ENABLED)
status = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQSTAT_MASK;
@@ -2482,47 +2486,51 @@ static void nv_tx_timeout(struct net_device *dev)
netdev_info(dev, "Got tx_timeout. irq: %08x\n", status);
- netdev_info(dev, "Ring at %lx\n", (unsigned long)np->ring_addr);
- netdev_info(dev, "Dumping tx registers\n");
- for (i = 0; i <= np->register_size; i += 32) {
- netdev_info(dev,
- "%3x: %08x %08x %08x %08x %08x %08x %08x %08x\n",
- i,
- readl(base + i + 0), readl(base + i + 4),
- readl(base + i + 8), readl(base + i + 12),
- readl(base + i + 16), readl(base + i + 20),
- readl(base + i + 24), readl(base + i + 28));
- }
- netdev_info(dev, "Dumping tx ring\n");
- for (i = 0; i < np->tx_ring_size; i += 4) {
- if (!nv_optimized(np)) {
- netdev_info(dev,
- "%03x: %08x %08x // %08x %08x // %08x %08x // %08x %08x\n",
- i,
- le32_to_cpu(np->tx_ring.orig[i].buf),
- le32_to_cpu(np->tx_ring.orig[i].flaglen),
- le32_to_cpu(np->tx_ring.orig[i+1].buf),
- le32_to_cpu(np->tx_ring.orig[i+1].flaglen),
- le32_to_cpu(np->tx_ring.orig[i+2].buf),
- le32_to_cpu(np->tx_ring.orig[i+2].flaglen),
- le32_to_cpu(np->tx_ring.orig[i+3].buf),
- le32_to_cpu(np->tx_ring.orig[i+3].flaglen));
- } else {
+ if (unlikely(debug_tx_timeout)) {
+ int i;
+
+ netdev_info(dev, "Ring at %lx\n", (unsigned long)np->ring_addr);
+ netdev_info(dev, "Dumping tx registers\n");
+ for (i = 0; i <= np->register_size; i += 32) {
netdev_info(dev,
- "%03x: %08x %08x %08x // %08x %08x %08x // %08x %08x %08x // %08x %08x %08x\n",
+ "%3x: %08x %08x %08x %08x %08x %08x %08x %08x\n",
i,
- le32_to_cpu(np->tx_ring.ex[i].bufhigh),
- le32_to_cpu(np->tx_ring.ex[i].buflow),
- le32_to_cpu(np->tx_ring.ex[i].flaglen),
- le32_to_cpu(np->tx_ring.ex[i+1].bufhigh),
- le32_to_cpu(np->tx_ring.ex[i+1].buflow),
- le32_to_cpu(np->tx_ring.ex[i+1].flaglen),
- le32_to_cpu(np->tx_ring.ex[i+2].bufhigh),
- le32_to_cpu(np->tx_ring.ex[i+2].buflow),
- le32_to_cpu(np->tx_ring.ex[i+2].flaglen),
- le32_to_cpu(np->tx_ring.ex[i+3].bufhigh),
- le32_to_cpu(np->tx_ring.ex[i+3].buflow),
- le32_to_cpu(np->tx_ring.ex[i+3].flaglen));
+ readl(base + i + 0), readl(base + i + 4),
+ readl(base + i + 8), readl(base + i + 12),
+ readl(base + i + 16), readl(base + i + 20),
+ readl(base + i + 24), readl(base + i + 28));
+ }
+ netdev_info(dev, "Dumping tx ring\n");
+ for (i = 0; i < np->tx_ring_size; i += 4) {
+ if (!nv_optimized(np)) {
+ netdev_info(dev,
+ "%03x: %08x %08x // %08x %08x // %08x %08x // %08x %08x\n",
+ i,
+ le32_to_cpu(np->tx_ring.orig[i].buf),
+ le32_to_cpu(np->tx_ring.orig[i].flaglen),
+ le32_to_cpu(np->tx_ring.orig[i+1].buf),
+ le32_to_cpu(np->tx_ring.orig[i+1].flaglen),
+ le32_to_cpu(np->tx_ring.orig[i+2].buf),
+ le32_to_cpu(np->tx_ring.orig[i+2].flaglen),
+ le32_to_cpu(np->tx_ring.orig[i+3].buf),
+ le32_to_cpu(np->tx_ring.orig[i+3].flaglen));
+ } else {
+ netdev_info(dev,
+ "%03x: %08x %08x %08x // %08x %08x %08x // %08x %08x %08x // %08x %08x %08x\n",
+ i,
+ le32_to_cpu(np->tx_ring.ex[i].bufhigh),
+ le32_to_cpu(np->tx_ring.ex[i].buflow),
+ le32_to_cpu(np->tx_ring.ex[i].flaglen),
+ le32_to_cpu(np->tx_ring.ex[i+1].bufhigh),
+ le32_to_cpu(np->tx_ring.ex[i+1].buflow),
+ le32_to_cpu(np->tx_ring.ex[i+1].flaglen),
+ le32_to_cpu(np->tx_ring.ex[i+2].bufhigh),
+ le32_to_cpu(np->tx_ring.ex[i+2].buflow),
+ le32_to_cpu(np->tx_ring.ex[i+2].flaglen),
+ le32_to_cpu(np->tx_ring.ex[i+3].bufhigh),
+ le32_to_cpu(np->tx_ring.ex[i+3].buflow),
+ le32_to_cpu(np->tx_ring.ex[i+3].flaglen));
+ }
}
}
@@ -6006,6 +6014,9 @@ module_param(phy_cross, int, 0);
MODULE_PARM_DESC(phy_cross, "Phy crossover detection for Realtek 8201 phy is enabled by setting to 1 and disabled by setting to 0.");
module_param(phy_power_down, int, 0);
MODULE_PARM_DESC(phy_power_down, "Power down phy and disable link when interface is down (1), or leave phy powered up (0).");
+module_param(debug_tx_timeout, bool, false);
+MODULE_PARM_DESC(debug_tx_timeout,
+ "Dump tx related registers and ring when tx_timeout happens");
MODULE_AUTHOR("Manfred Spraul <manfred@colorfullife.com>");
MODULE_DESCRIPTION("Reverse Engineered nForce ethernet driver");
--
1.7.3.1
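The dump now guarded by debug_tx_timeout has two parts:

- one log line per 32 bytes (eight 32-bit words) of register space;
- one log line per four TX descriptors.

Together these make up most of the ~30kB per timeout mentioned in the commit message. The sketch below reproduces only the gated register-dump loop in userspace, writing to a FILE * instead of the kernel log. debug_tx_timeout_model and dump_regs() are hypothetical names, and the register contents are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the module parameter; false by default, as in the patch. */
static bool debug_tx_timeout_model = false;

/*
 * Print nwords 32-bit registers, eight per line, in the driver's
 * "%3x: %08x ..." layout, but only when the debug flag is set.
 * Returns the number of lines printed.
 */
static int dump_regs(FILE *log, const uint32_t *regs, size_t nwords)
{
	int lines = 0;

	assert(log != NULL && regs != NULL);
	if (!debug_tx_timeout_model)
		return 0;		/* default: stay quiet */

	for (size_t w = 0; w < nwords; w += 8) {
		fprintf(log, "%3zx:", w * 4);	/* byte offset of the row */
		for (size_t k = w; k < w + 8 && k < nwords; k++)
			fprintf(log, " %08x", (unsigned int)regs[k]);
		fputc('\n', log);
		lines++;
	}
	return lines;
}
```

Because the parameter is registered with permission 0, it has no sysfs entry. With the real module it can therefore only be set at load time, e.g. `modprobe forcedeth debug_tx_timeout=1`.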
* Re: [stable submission] vmxnet3: Fix inconsistent LRO state after initialization
From: Greg KH @ 2011-05-19 0:27 UTC
To: Thomas Jarosch; +Cc: netdev, stable
In-Reply-To: <4DD2536E.1000008@intra2net.com>
On Tue, May 17, 2011 at 12:52:30PM +0200, Thomas Jarosch wrote:
> Hi greg k-h,
>
> please include commit ebde6f8acba92abfc203585198a54f47e83e2cd0
> "vmxnet3: Fix inconsistent LRO state after initialization"
>
> in 2.6.37 / 2.6.38 stable.
>
>
> Kernel 2.6.32 first included the vmxnet3 driver and I've checked
> that the patch applies to 2.6.32.40 and 2.6.35.13, so it might be
> worth including it in all maintained kernels.
Now queued up.
thanks,
greg k-h
_______________________________________________
stable mailing list
stable@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/stable
* Re: [PATCH 2/6] forcedeth: new ethtool stat "tx_timeout" to account for tx_timeouts
From: Stephen Hemminger @ 2011-05-19 0:59 UTC
To: David Decotigny
Cc: David S. Miller, Joe Perches, Szymon Janc, netdev, linux-kernel,
kernel-net-upstream, Sameer Nanda
In-Reply-To: <1305764080-24853-2-git-send-email-decot@google.com>
On Wed, 18 May 2011 17:14:36 -0700
David Decotigny <decot@google.com> wrote:
> From: Sameer Nanda <snanda@google.com>
>
> This change publishes a new ethtool stat, tx_timeout, that counts the
> number of times the tx_timeout callback was triggered.
>
>
> Signed-off-by: David Decotigny <decot@google.com>
Since this is generic, maybe it should be done generically rather than
through ethtool; that way tools and administrators don't have to look
for something special.
Something like:
--- a/include/linux/netdevice.h 2011-05-18 17:40:15.901691265 -0700
+++ b/include/linux/netdevice.h 2011-05-18 17:56:11.731742792 -0700
@@ -571,6 +571,8 @@ struct netdev_queue {
* please use this field instead of dev->trans_start
*/
unsigned long trans_start;
+
+ unsigned long trans_timeout;
} ____cacheline_aligned_in_smp;
static inline int netdev_queue_numa_node_read(const struct netdev_queue *q)
--- a/net/core/net-sysfs.c 2011-05-18 17:50:54.540403456 -0700
+++ b/net/core/net-sysfs.c 2011-05-18 17:57:47.136747867 -0700
@@ -788,7 +788,6 @@ net_rx_queue_update_kobjects(struct net_
#endif
}
-#ifdef CONFIG_XPS
/*
* netdev_queue sysfs structures and functions.
*/
@@ -834,6 +833,17 @@ static const struct sysfs_ops netdev_que
.store = netdev_queue_attr_store,
};
+static ssize_t show_trans_timeout(struct netdev_queue *queue,
+ struct netdev_queue_attribute *attribute,
+ char *buf)
+{
+ return sprintf(buf, "%lu", queue->trans_timeout);
+}
+
+static struct netdev_queue_attribute queue_trans_timeout =
+ __ATTR(tx_timeout, S_IRUGO, show_trans_timeout, NULL);
+
+#ifdef CONFIG_XPS
static inline unsigned int get_netdev_queue_index(struct netdev_queue *queue)
{
struct net_device *dev = queue->dev;
@@ -1043,9 +1053,13 @@ error:
static struct netdev_queue_attribute xps_cpus_attribute =
__ATTR(xps_cpus, S_IRUGO | S_IWUSR, show_xps_map, store_xps_map);
+#endif /* CONFIG_XPS */
static struct attribute *netdev_queue_default_attrs[] = {
+ &queue_trans_timeout.attr,
+#ifdef CONFIG_XPS
&xps_cpus_attribute.attr,
+#endif
NULL
};
@@ -1125,7 +1139,6 @@ static int netdev_queue_add_kobject(stru
return error;
}
-#endif /* CONFIG_XPS */
int
netdev_queue_update_kobjects(struct net_device *net, int old_num, int new_num)
--- a/net/sched/sch_generic.c 2011-05-18 17:45:07.740756564 -0700
+++ b/net/sched/sch_generic.c 2011-05-18 17:48:18.474761735 -0700
@@ -245,6 +245,7 @@ static void dev_watchdog(unsigned long a
if (netif_tx_queue_stopped(txq) &&
time_after(jiffies, (trans_start +
dev->watchdog_timeo))) {
+ ++txq->trans_timeout;
some_queue_timedout = 1;
break;
}
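Stephen's alternative bumps a per-queue counter from dev_watchdog() when a stopped queue's trans_start is older than watchdog_timeo. It then exposes that counter as a tx_timeout attribute in each TX queue's sysfs directory. The age check relies on the kernel's wraparound-safe time_after() on jiffies. Below is a userspace sketch of that check. TIME_AFTER mirrors the kernel macro's signed-difference trick, while struct queue_model and watchdog_scan() are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Same signed-difference trick as the kernel's time_after(a, b). */
#define TIME_AFTER(a, b) ((long)((b) - (a)) < 0)

struct queue_model {
	bool stopped;
	unsigned long trans_start;	/* jiffies of last transmit */
	unsigned long trans_timeout;	/* counter from the proposed patch */
};

/*
 * Model of the dev_watchdog() check: the first stopped queue whose
 * last transmit is older than timeo gets its counter bumped, and the
 * scan stops there, like the "break" in the original loop.
 * Returns true if a timeout was detected.
 */
static bool watchdog_scan(struct queue_model *q, size_t nq,
			  unsigned long jiffies, unsigned long timeo)
{
	assert(q != NULL || nq == 0);
	for (size_t i = 0; i < nq; i++) {
		if (q[i].stopped &&
		    TIME_AFTER(jiffies, q[i].trans_start + timeo)) {
			++q[i].trans_timeout;
			return true;
		}
	}
	return false;
}
```

The signed difference keeps the comparison correct across wraparound. Suppose trans_start is just below ULONG_MAX and jiffies has already wrapped to a small value. The elapsed time is still measured correctly. If the patch went in as proposed, the counter would presumably be readable from a tx_timeout file under /sys/class/net/<iface>/queues/tx-<n>/.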
* linux-next: manual merge of the wireless tree with the net tree
From: Stephen Rothwell @ 2011-05-19 1:02 UTC
To: John W. Linville
Cc: linux-next, linux-kernel, Jiri Pirko, David Miller, netdev,
Javier Lopez
Hi John,
Today's linux-next merge of the wireless tree got a conflict in
drivers/net/wireless/mac80211_hwsim.c between commit 1c5cae815d19 ("net:
call dev_alloc_name from register_netdevice") from the net tree and
commit 444c7896bf5b ("mac80211_hwsim driver support userspace frame
tx/rx") from the wireless tree.
Just context changes. I fixed it up (see below) and can carry the fix as
necessary.
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
diff --cc drivers/net/wireless/mac80211_hwsim.c
index 9d4a40e,d8ec575..0000000
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@@ -1519,8 -1880,23 +1878,16 @@@ static int __init init_mac80211_hwsim(v
if (err < 0)
goto failed_mon;
-
- err = register_netdevice(hwsim_mon);
- if (err < 0)
- goto failed_mon;
-
- rtnl_unlock();
-
+ err = hwsim_init_netlink();
+ if (err < 0)
+ goto failed_nl;
+
return 0;
+ failed_nl:
+ printk(KERN_DEBUG "mac80211_hwsim: failed initializing netlink\n");
+ return err;
+
failed_mon:
rtnl_unlock();
free_netdev(hwsim_mon);
* Re: ip_vs_ftp causing ip_vs oops on module load.
From: Simon Horman @ 2011-05-19 1:10 UTC
To: Dave Jones; +Cc: netdev, Wensong Zhang
In-Reply-To: <20110518201915.GB20475@redhat.com>
On Wed, May 18, 2011 at 04:19:15PM -0400, Dave Jones wrote:
> I get this oops from ip_vs_ftp..
>
> general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> last sysfs file: /sys/module/nf_nat/refcnt
> CPU 3
> Modules linked in: ip_vs(+) libcrc32c nf_nat nfsd lockd nfs_acl auth_rpcgss sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_hda_codec_realtek ppdev snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm microcode edac_core snd_timer k10temp snd pcspkr usb_debug edac_mce_amd soundcore snd_page_alloc sp5100_tco i2c_piix4 parport_pc parport wmi r8169 mii lm63 ipv6 pata_acpi firewire_ohci ata_generic firewire_core crc_itu_t pata_atiixp floppy radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: nf_nat]
>
> Pid: 1366, comm: modprobe Not tainted 2.6.39-rc7+ #15 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H
> RIP: 0010:[<ffffffff8107bddb>] [<ffffffff8107bddb>] notifier_chain_register+0xb/0x2a
> RSP: 0018:ffff880114139e68 EFLAGS: 00010206
> RAX: 2f736e74656e2f74 RBX: ffffffffa04265d0 RCX: 0000000000000003
> RDX: 00000000656e6567 RSI: ffffffffa04265d0 RDI: ffffffffa04235d8
> RBP: ffff880114139e68 R08: ffff880114139df8 R09: 0000000000000001
> R10: 0000000000000001 R11: 00000000000001cc R12: ffffffffa0432106
> R13: 0000000000000000 R14: 0000000000007f0d R15: 0000000000410e40
> FS: 00007f2aaf242720(0000) GS:ffff88012a800000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007f2aaea0100f CR3: 000000011424f000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process modprobe (pid: 1366, threadinfo ffff880114138000, task ffff8801146cc7a0)
> Stack:
> ffff880114139e78 ffffffff8107be36 ffff880114139ec8 ffffffff81403058
> 0000000000000000 0000000000000000 ffff880114139ea8 0000000000000000
> ffffffffa0432106 0000000000000000 0000000000007f0d 0000000000410e40
> Call Trace:
> [<ffffffff8107be36>] raw_notifier_chain_register+0xe/0x10
> [<ffffffff81403058>] register_netdevice_notifier+0x2d/0x1b6
> [<ffffffffa0432106>] ? ip_vs_conn_init+0x106/0x106 [ip_vs]
> [<ffffffffa04322c7>] ip_vs_control_init+0xa5/0xce [ip_vs]
> [<ffffffffa0432106>] ? ip_vs_conn_init+0x106/0x106 [ip_vs]
> [<ffffffffa0432116>] ip_vs_init+0x10/0x11c [ip_vs]
> [<ffffffff81002099>] do_one_initcall+0x7f/0x13a
> [<ffffffff81096524>] sys_init_module+0x132/0x281
> [<ffffffff814cc702>] system_call_fastpath+0x16/0x1b
> Code: 07 ff c8 89 43 48 eb 08 48 89 df e8 dc 95 44 00 4c 89 e6 48 89 df e8 a7 a5 44 00 5b 41 5c 5d c3 55 48 89 e5 66 66 66 66 90 eb 0c <8b> 50 10 39 56 10 7f 0c 48 8d 78 08 48 8b 07 48 85 c0 75 ec 48
> RIP [<ffffffff8107bddb>] notifier_chain_register+0xb/0x2a
> RSP <ffff880114139e68>
> ---[ end trace e90d7053ad1a7a5b ]---
>
>
> This script replicates the bug.
> (it usually oopses after just a few loops)
>
> #!/bin/sh
> while [ 1 ];
> do
> modprobe ip_vs_ftp
> modprobe -r ip_vs_ftp
> done
>
> Looks like something isn't getting cleaned up on module exit, and we
> fall over it the next time the module gets loaded?
Thanks Dave, I will look into this.
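The oops is inside notifier_chain_register(), which walks a singly linked list of notifier_block entries sorted by priority. The faulting instruction (8b 50 10, a load from 0x10(%rax)) dereferences RAX = 0x2f736e74656e2f74. In little-endian order those bytes read as the ASCII string "t/netns/", so the walk followed a next pointer into memory holding text rather than a notifier_block. That fits Dave's guess that something registered at module load is not unregistered on exit, though the thread does not confirm it. Below is a userspace model of the sorted insert and the matching unregister. struct nb_model and the chain_* helpers are hypothetical and only mimic the kernel's behaviour.

```c
#include <assert.h>
#include <stddef.h>

struct nb_model {
	int priority;
	struct nb_model *next;
};

/* Insert n before the first entry with a lower priority; equal
 * priorities keep registration order. */
static void chain_register(struct nb_model **nl, struct nb_model *n)
{
	assert(n != NULL);
	while (*nl != NULL) {
		if (n->priority > (*nl)->priority)
			break;
		nl = &(*nl)->next;
	}
	n->next = *nl;
	*nl = n;
}

/* Unlink n; returns 0 on success, -1 if it was not on the chain. */
static int chain_unregister(struct nb_model **nl, struct nb_model *n)
{
	while (*nl != NULL) {
		if (*nl == n) {
			*nl = n->next;
			return 0;
		}
		nl = &(*nl)->next;
	}
	return -1;
}

static int chain_length(const struct nb_model *head)
{
	int len = 0;

	for (; head != NULL; head = head->next)
		len++;
	return len;
}
```

If a module's exit path skipped the unregister step, its entry would stay linked after the module memory is freed. The next registration's walk would then read a stale next pointer, as in the oops above.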
* RE: packet received in a wrong rx-queue?
From: Jon Zhou @ 2011-05-19 1:26 UTC
To: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F250D748939@MILEXCH2.ds.jdsu.net>
[-- Attachment #1: Type: text/plain, Size: 1488 bytes --]
Can anyone help check the traffic file?
Thanks
jon
> -----Original Message-----
> From: Jon Zhou [mailto:Jon.Zhou@jdsu.com]
> Sent: Wednesday, May 18, 2011 5:01 PM
> To: e1000-devel@lists.sourceforge.net; netdev@vger.kernel.org
> Subject: [E1000-devel] packet received in a wrong rx-queue?
>
> There are 2 packets in the traffic:
>
> #1 create PDP context request, IPV4--UDP--GTP, src_ip=A, dst_ip=B,
> src_port=C, dst_port=D
>
> #2 create PDP context response, IPV4--UDP--GTP, src_ip=B, dst_ip=A,
> src_port=D, dst_port=C
>
> I expected both of them to be received in the same rx-queue, but
> they are not.
> Is there anything I need to check?
>
> ethtool -i eth5
> driver: ixgbe
> version: 3.3.9-NAPI
> firmware-version: 0.9-3
> bus-info: 0000:08:00.1
>
> regards
> jon
>
>
>
>
>
> -----------------------------------------------------------------------
> -------
> What Every C/C++ and Fortran developer Should Know!
> Read this article and learn how Intel has extended the reach of its
> next-generation tools to help Windows* and Linux* C/C++ and Fortran
> developers boost performance applications - including clusters.
> http://p.sf.net/sfu/intel-dev2devmay
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel® Ethernet, visit
> http://communities.intel.com/community/wired
[-- Attachment #2: UpdatePdpContextTransaction12-WithCreatePdpDeletePdp.pcap --]
[-- Type: application/octet-stream, Size: 1187 bytes --]
* Re: packet received in a wrong rx-queue?
From: David Miller @ 2011-05-19 1:32 UTC
To: Jon.Zhou; +Cc: e1000-devel, netdev
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F250D749114@MILEXCH2.ds.jdsu.net>
From: Jon Zhou <Jon.Zhou@jdsu.com>
Date: Wed, 18 May 2011 18:26:07 -0700
> Can anyone help check the traffic file?
I told you yesterday that this behavior is expected.
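David's answer is that this is expected. The thread does not say how the queues were configured. Assuming the ixgbe queues are chosen by receive-side scaling (RSS) with the usual Toeplitz hash, a likely reason is that the hash is not symmetric. Swapping source and destination addresses (and ports) changes the input bit string, so a request and its response generally hash to different values. The sketch below implements the Toeplitz hash in C with the 40-byte key from Microsoft's RSS verification examples. toeplitz() and rss_hash_v4() are hypothetical helper names, and the key is not necessarily the one ixgbe programs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 40-byte key used in Microsoft's RSS hash verification examples. */
static const uint8_t rss_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash: for every set input bit (MSB first), XOR in the
 * 32-bit window of the key that starts at that bit position. */
static uint32_t toeplitz(const uint8_t *key, size_t key_len,
			 const uint8_t *data, size_t len)
{
	uint32_t window, hash = 0;

	assert(key_len >= 4 && len + 4 <= key_len);
	window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
		 (uint32_t)key[2] << 8 | key[3];
	for (size_t i = 0; i < len; i++) {
		for (int b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* slide the window one bit, pulling in the next key bit */
			window <<= 1;
			if (key[i + 4] & (1u << b))
				window |= 1;
		}
	}
	return hash;
}

/* IPv4 input in network byte order: src addr, dst addr, and, if
 * with_ports is set, src port and dst port. */
static uint32_t rss_hash_v4(const uint8_t src[4], const uint8_t dst[4],
			    uint16_t sport, uint16_t dport, int with_ports)
{
	uint8_t in[12];

	for (int i = 0; i < 4; i++) {
		in[i] = src[i];
		in[4 + i] = dst[i];
	}
	in[8] = (uint8_t)(sport >> 8);
	in[9] = (uint8_t)(sport & 0xff);
	in[10] = (uint8_t)(dport >> 8);
	in[11] = (uint8_t)(dport & 0xff);
	return toeplitz(rss_key, sizeof(rss_key), in, with_ports ? 12 : 8);
}
```

Hashing the same address pair in both directions gives different values, whether or not the ports are included. The NIC then typically maps the hash to a queue through an indirection table, so different hashes usually, though not always, land on different queues. Keeping both directions on one queue generally needs a symmetric key or explicit flow steering rules.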
* Re: [PATCH 5/6] forcedeth: Add messages to indicate using MSI or MSI-X
From: Ben Hutchings @ 2011-05-19 1:34 UTC
To: David Decotigny
Cc: David S. Miller, Joe Perches, Szymon Janc, netdev, linux-kernel,
kernel-net-upstream, Mike Ditto
In-Reply-To: <1305764080-24853-5-git-send-email-decot@google.com>
On Wed, 2011-05-18 at 17:14 -0700, David Decotigny wrote:
> From: Mike Ditto <mditto@google.com>
>
> This adds a few debug messages to indicate whether PCIe interrupts are
> signaled with MSI or MSI-X.
>
>
> Signed-off-by: David Decotigny <decot@google.com>
> ---
> drivers/net/forcedeth.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
> index 17e79de..2712ddc 100644
> --- a/drivers/net/forcedeth.c
> +++ b/drivers/net/forcedeth.c
> @@ -3745,6 +3745,7 @@ static int nv_request_irq(struct net_device *dev, int intr_test)
> writel(0, base + NvRegMSIXMap0);
> writel(0, base + NvRegMSIXMap1);
> }
> + netdev_info(dev, "forcedeth: MSI-X enabled\n");
[...]
netdev_info() and similar logging functions already include the driver
name.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
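Ben's two review points are closely related. netdev_info() and netdev_dbg() already prefix each message with the driver name, the parent device's bus name and the interface name. The "forcedeth: " text and the "%s: " dev->name argument therefore repeat information that is already printed. The helper below is a userspace imitation of that prefix, assuming a "<driver> <bus-id>: <ifname>: " layout. fake_netdev_printk(), the bus id and the interface name are made up for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for netdev_info(): formats
 * "<driver> <bus-id>: <ifname>: <message>" into buf.
 * Returns the total length, or a negative value on error.
 */
static int fake_netdev_printk(char *buf, size_t len, const char *driver,
			      const char *bus_id, const char *ifname,
			      const char *fmt, ...)
{
	va_list ap;
	int n, m;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "%s %s: %s: ", driver, bus_id, ifname);
	if (n < 0 || (size_t)n >= len)
		return n;
	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - n, fmt, ap);
	va_end(ap);
	return m < 0 ? m : n + m;
}
```

With the patch as posted, the MSI-X message would read "forcedeth 0000:00:08.0: eth0: forcedeth: MSI-X enabled", with the driver name twice. Dropping the prefix from the format string removes the duplicate. The same reasoning applies to passing dev->name to netdev_dbg().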
* Re: [PATCH 4/6] forcedeth: Acknowledge only interrupts that are being processed
From: Ben Hutchings @ 2011-05-19 1:35 UTC
To: David Decotigny
Cc: David S. Miller, Joe Perches, Szymon Janc, netdev, linux-kernel,
kernel-net-upstream, Mike Ditto
In-Reply-To: <1305764080-24853-4-git-send-email-decot@google.com>
On Wed, 2011-05-18 at 17:14 -0700, David Decotigny wrote:
> From: Mike Ditto <mditto@google.com>
>
> This avoids a race in which we could accidentally acknowledge an
> interrupt that we didn't notice and won't immediately process. This is
> based solely on code inspection; it is not known whether there was an
> actual bug here.
>
>
> Signed-off-by: David Decotigny <decot@google.com>
> ---
> drivers/net/forcedeth.c | 13 ++++++++-----
> 1 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
> index 7a6aa08..17e79de 100644
> --- a/drivers/net/forcedeth.c
> +++ b/drivers/net/forcedeth.c
> @@ -3403,7 +3403,8 @@ static irqreturn_t nv_nic_irq_tx(int foo, void *data)
>
> for (i = 0;; i++) {
> events = readl(base + NvRegMSIXIrqStatus) & NVREG_IRQ_TX_ALL;
> - writel(NVREG_IRQ_TX_ALL, base + NvRegMSIXIrqStatus);
> + writel(events, base + NvRegMSIXIrqStatus);
> + netdev_dbg(dev, "%s: tx irq: %08x\n", dev->name, events);
[...]
netdev_dbg() and related logging functions already include the device
name, too. :-)
Ben.
* [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
From: Benoit Sigoure @ 2011-05-19 2:22 UTC
To: davem, kuznet, pekkas, jmorris, yoshfuji, kaber, hagen,
eric.dumazet, alexander.zimmermann
Cc: netdev, linux-kernel, Benoit Sigoure
In-Reply-To: <20110518.155200.801089483916944725.davem@davemloft.net>
Prior to this patch, Linux always used 3 seconds (a compile-time
constant) as the initial RTO. Draft RFC 2988bis-02 proposes lowering
this to 1 second and, in case of a timeout during the TCP 3WHS,
reverting the RTO to 3 seconds when data transmission begins.
This patch implements this behavior but keeps the default initial RTO
at 3 seconds instead of the 1 second suggested in the draft RFC. This
way, in a default configuration, the behavior of Linux's TCP is
unchanged.
This patch also adds 2 knobs to tweak the initial RTO:
- tcp_initial_rto: initial RTO used during the 3WHS (default remains
unchanged: 3 seconds). This was previously a compile-time constant.
- tcp_initial_fallback_rto: the RTO to fall back to if a timeout occurs
during the 3WHS, with a default value of 3 seconds too, as per the
draft RFC.
Signed-off-by: Benoit Sigoure <tsunanet@gmail.com>
---
On Wed, May 18, 2011 at 12:52 PM, David Miller <davem@davemloft.net> wrote:
> I'll just as easily accept, right now, a patch which lowers
> the initial RTO to 1 second and adds the 3 second RTO fallback.
Here's a first attempt at a patch that implements the behavior described in
the draft RFC. I have only compiled it so far; if you would like to move
forward with this approach, I'll go ahead and test it on a real server.
I'm not sure whether COUNTER_TRIES in syncookies.c should be based on
sysctl_tcp_initial_rto or sysctl_tcp_initial_fallback_rto, if we're going
to take the first one down to 1s...
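The behaviour the patch targets fits in a few lines. The sketch below is a userspace model in milliseconds. It assumes plain doubling of the SYN timer and no usable RTT sample, and it ignores TCP_RTO_MAX and the jiffies conversion the real sysctls use. struct rto_cfg and the functions are hypothetical names, not the patch's. It also evaluates the new COUNTER_TRIES expression, sysctl_tcp_initial_rto/HZ + 1, which gives back the old constant 4 at the 3-second default and drops to 2 at 1 second.

```c
#include <assert.h>
#include <stdbool.h>

struct rto_cfg {
	unsigned int initial_ms;	/* cf. tcp_initial_rto */
	unsigned int fallback_ms;	/* cf. tcp_initial_fallback_rto */
};

/* Retransmission timer for the n-th transmission of the SYN
 * (n = 0 is the original SYN), doubling each time. */
static unsigned int syn_timeout_ms(const struct rto_cfg *c, unsigned int n)
{
	assert(n < 16);		/* keep the shift well inside 32 bits */
	return c->initial_ms << n;
}

/* RTO to start data transfer with, assuming no usable RTT sample:
 * if the handshake needed a SYN retransmission and the initial RTO is
 * below the fallback, use the fallback instead (draft 2988bis). */
static unsigned int rto_after_handshake(const struct rto_cfg *c,
					bool syn_was_retransmitted)
{
	if (syn_was_retransmitted && c->initial_ms < c->fallback_ms)
		return c->fallback_ms;
	return c->initial_ms;
}

/* The patch's COUNTER_TRIES, with the RTO given in milliseconds. */
static unsigned int counter_tries(const struct rto_cfg *c)
{
	return c->initial_ms / 1000 + 1;
}
```

With the draft's values (1000 ms initial, 3000 ms fallback), a lost SYN is retried after 1 s instead of 3 s. A connection that needed such a retry then starts data transfer with the conservative 3 s RTO. With both knobs at 3000 ms nothing changes, which matches the patch's goal of keeping the default behaviour.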
Documentation/networking/ip-sysctl.txt | 19 +++++++++++++++++++
include/net/tcp.h | 4 +++-
net/ipv4/syncookies.c | 2 +-
net/ipv4/sysctl_net_ipv4.c | 20 ++++++++++++++++++++
net/ipv4/tcp.c | 4 ++--
net/ipv4/tcp_input.c | 13 +++++++++----
net/ipv4/tcp_ipv4.c | 6 +++---
net/ipv4/tcp_minisocks.c | 6 +++---
net/ipv4/tcp_output.c | 2 +-
net/ipv4/tcp_timer.c | 10 ++++++----
net/ipv6/syncookies.c | 2 +-
net/ipv6/tcp_ipv6.c | 6 +++---
12 files changed, 71 insertions(+), 23 deletions(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index d3d653a..590042c 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -384,6 +384,25 @@ tcp_retries2 - INTEGER
RFC 1122 recommends at least 100 seconds for the timeout,
which corresponds to a value of at least 8.
+tcp_initial_rto - INTEGER
+ This value sets the initial retransmit timeout (in milliseconds),
+ that is, how long the kernel will wait before retransmitting the
+ initial SYN packet.
+
+ RFC 1122 says that this SHOULD be 3000 milliseconds, which is the
+ default. Note that draft RFC 2988bis-02 says that this SHOULD be
+ 1000 milliseconds, which might become the default value in future
+ versions.
+
+tcp_initial_fallback_rto - INTEGER
+ This value sets the initial retransmit timeout (in milliseconds)
+ to use after completing a three-way handshake during which the
+ initial SYN packet had to be retransmitted after waiting for
+ tcp_initial_rto milliseconds.
+
+ Draft RFC 2988bis-02 says that this MUST be 3000 milliseconds,
+ which is the default.
+
tcp_rfc1337 - BOOLEAN
If set, the TCP stack behaves conforming to RFC1337. If unset,
we are not conforming to RFC, but prevent TCP TIME_WAIT
diff --git a/include/net/tcp.h b/include/net/tcp.h
index cda30ea..c974242 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -213,6 +213,8 @@ extern int sysctl_tcp_syn_retries;
extern int sysctl_tcp_synack_retries;
extern int sysctl_tcp_retries1;
extern int sysctl_tcp_retries2;
+extern int sysctl_tcp_initial_rto; /* in jiffies */
+extern int sysctl_tcp_initial_fallback_rto; /* in jiffies */
extern int sysctl_tcp_orphan_retries;
extern int sysctl_tcp_syncookies;
extern int sysctl_tcp_retrans_collapse;
@@ -295,7 +297,7 @@ static inline void tcp_synq_overflow(struct sock *sk)
static inline int tcp_synq_no_recent_overflow(const struct sock *sk)
{
unsigned long last_overflow = tcp_sk(sk)->rx_opt.ts_recent_stamp;
- return time_after(jiffies, last_overflow + TCP_TIMEOUT_INIT);
+ return time_after(jiffies, last_overflow + sysctl_tcp_initial_rto);
}
extern struct proto tcp_prot;
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 8b44c6d..b035968 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -186,7 +186,7 @@ __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp)
* sysctl_tcp_retries1. It's a rather complicated formula (exponential
* backoff) to compute at runtime so it's currently hardcoded here.
*/
-#define COUNTER_TRIES 4
+#define COUNTER_TRIES (sysctl_tcp_initial_rto/HZ + 1)
/*
* Check if a ack sequence number is a valid syncookie.
* Return the decoded mss if it is, or 0 if not.
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 321e6e8..abe8cfc 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -30,6 +30,8 @@ static int tcp_adv_win_scale_min = -31;
static int tcp_adv_win_scale_max = 31;
static int ip_ttl_min = 1;
static int ip_ttl_max = 255;
+static int tcp_min_rto = TCP_RTO_MIN;
+static int tcp_max_rto = TCP_RTO_MAX;
/* Update system visible IP port range */
static void set_local_port_range(int range[2])
@@ -247,6 +249,24 @@ static struct ctl_table ipv4_table[] = {
.proc_handler = proc_dointvec
},
{
+ .procname = "tcp_initial_rto",
+ .data = &sysctl_tcp_initial_rto,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_ms_jiffies,
+ .extra1 = &tcp_min_rto,
+ .extra2 = &tcp_max_rto,
+ },
+ {
+ .procname = "tcp_initial_fallback_rto",
+ .data = &sysctl_tcp_initial_fallback_rto,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_ms_jiffies,
+ .extra1 = &tcp_min_rto,
+ .extra2 = &tcp_max_rto,
+ },
+ {
.procname = "tcp_fin_timeout",
.data = &sysctl_tcp_fin_timeout,
.maxlen = sizeof(int),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b22d450..e9e7c3f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2352,7 +2352,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
case TCP_DEFER_ACCEPT:
/* Translate value in seconds to number of retransmits */
icsk->icsk_accept_queue.rskq_defer_accept =
- secs_to_retrans(val, TCP_TIMEOUT_INIT / HZ,
+ secs_to_retrans(val, sysctl_tcp_initial_rto / HZ,
TCP_RTO_MAX / HZ);
break;
@@ -2539,7 +2539,7 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
break;
case TCP_DEFER_ACCEPT:
val = retrans_to_secs(icsk->icsk_accept_queue.rskq_defer_accept,
- TCP_TIMEOUT_INIT / HZ, TCP_RTO_MAX / HZ);
+ sysctl_tcp_initial_rto / HZ, TCP_RTO_MAX / HZ);
break;
case TCP_WINDOW_CLAMP:
val = tp->window_clamp;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index bef9f04..513cf7a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -868,6 +868,11 @@ static void tcp_init_metrics(struct sock *sk)
{
struct tcp_sock *tp = tcp_sk(sk);
struct dst_entry *dst = __sk_dst_get(sk);
+ /* If we had to retransmit anything during the 3WHS,
+ * use the initial fallback RTO.
+ */
+ int init_rto = inet_csk(sk)->icsk_retransmits ?
+ sysctl_tcp_initial_fallback_rto : sysctl_tcp_initial_rto;
if (dst == NULL)
goto reset;
@@ -890,7 +895,7 @@ static void tcp_init_metrics(struct sock *sk)
if (dst_metric(dst, RTAX_RTT) == 0)
goto reset;
- if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (TCP_TIMEOUT_INIT << 3))
+ if (!tp->srtt && dst_metric_rtt(dst, RTAX_RTT) < (init_rto << 3))
goto reset;
/* Initial rtt is determined from SYN,SYN-ACK.
@@ -916,7 +921,7 @@ static void tcp_init_metrics(struct sock *sk)
tp->mdev_max = tp->rttvar = max(tp->mdev, tcp_rto_min(sk));
}
tcp_set_rto(sk);
- if (inet_csk(sk)->icsk_rto < TCP_TIMEOUT_INIT && !tp->rx_opt.saw_tstamp) {
+ if (inet_csk(sk)->icsk_rto < init_rto && !tp->rx_opt.saw_tstamp) {
reset:
/* Play conservative. If timestamps are not
* supported, TCP will fail to recalculate correct
@@ -924,8 +929,8 @@ reset:
*/
if (!tp->rx_opt.saw_tstamp && tp->srtt) {
tp->srtt = 0;
- tp->mdev = tp->mdev_max = tp->rttvar = TCP_TIMEOUT_INIT;
- inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
+ tp->mdev = tp->mdev_max = tp->rttvar = init_rto;
+ inet_csk(sk)->icsk_rto = init_rto;
}
}
tp->snd_cwnd = tcp_init_cwnd(tp, dst);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f7e6c2c..21920e6 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1383,7 +1383,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
want_cookie)
goto drop_and_free;
- inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
+ inet_csk_reqsk_queue_hash_add(sk, req, sysctl_tcp_initial_rto);
return 0;
drop_and_release:
@@ -1834,8 +1834,8 @@ static int tcp_v4_init_sock(struct sock *sk)
tcp_init_xmit_timers(sk);
tcp_prequeue_init(tp);
- icsk->icsk_rto = TCP_TIMEOUT_INIT;
- tp->mdev = TCP_TIMEOUT_INIT;
+ icsk->icsk_rto = sysctl_tcp_initial_rto;
+ tp->mdev = sysctl_tcp_initial_rto;
/* So many TCP implementations out there (incorrectly) count the
* initial SYN frame in their delayed-ACK and congestion control
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 80b1f80..c63ffa0 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -472,8 +472,8 @@ struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req,
tcp_init_wl(newtp, treq->rcv_isn);
newtp->srtt = 0;
- newtp->mdev = TCP_TIMEOUT_INIT;
- newicsk->icsk_rto = TCP_TIMEOUT_INIT;
+ newtp->mdev = sysctl_tcp_initial_rto;
+ newicsk->icsk_rto = sysctl_tcp_initial_rto;
newtp->packets_out = 0;
newtp->retrans_out = 0;
@@ -582,7 +582,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
* it can be estimated (approximately)
* from another data.
*/
- tmp_opt.ts_recent_stamp = get_seconds() - ((TCP_TIMEOUT_INIT/HZ)<<req->retrans);
+ tmp_opt.ts_recent_stamp = get_seconds() - ((sysctl_tcp_initial_rto/HZ)<<req->retrans);
paws_reject = tcp_paws_reject(&tmp_opt, th->rst);
}
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 17388c7..e34b0f6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2599,7 +2599,7 @@ static void tcp_connect_init(struct sock *sk)
tp->rcv_wup = 0;
tp->copied_seq = 0;
- inet_csk(sk)->icsk_rto = TCP_TIMEOUT_INIT;
+ inet_csk(sk)->icsk_rto = sysctl_tcp_initial_rto;
inet_csk(sk)->icsk_retransmits = 0;
tcp_clear_retrans(tp);
}
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index ecd44b0..47fa600 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -29,6 +29,8 @@ int sysctl_tcp_keepalive_probes __read_mostly = TCP_KEEPALIVE_PROBES;
int sysctl_tcp_keepalive_intvl __read_mostly = TCP_KEEPALIVE_INTVL;
int sysctl_tcp_retries1 __read_mostly = TCP_RETR1;
int sysctl_tcp_retries2 __read_mostly = TCP_RETR2;
+int sysctl_tcp_initial_rto __read_mostly = TCP_TIMEOUT_INIT;
+int sysctl_tcp_initial_fallback_rto __read_mostly = TCP_TIMEOUT_INIT;
int sysctl_tcp_orphan_retries __read_mostly;
int sysctl_tcp_thin_linear_timeouts __read_mostly;
@@ -135,8 +137,8 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
/* This function calculates a "timeout" which is equivalent to the timeout of a
* TCP connection after "boundary" unsuccessful, exponentially backed-off
- * retransmissions with an initial RTO of TCP_RTO_MIN or TCP_TIMEOUT_INIT if
- * syn_set flag is set.
+ * retransmissions with an initial RTO of TCP_RTO_MIN or
+ * sysctl_tcp_initial_rto if syn_set flag is set.
*/
static bool retransmits_timed_out(struct sock *sk,
unsigned int boundary,
@@ -144,7 +146,7 @@ static bool retransmits_timed_out(struct sock *sk,
bool syn_set)
{
unsigned int linear_backoff_thresh, start_ts;
- unsigned int rto_base = syn_set ? TCP_TIMEOUT_INIT : TCP_RTO_MIN;
+ unsigned int rto_base = syn_set ? sysctl_tcp_initial_rto : TCP_RTO_MIN;
if (!inet_csk(sk)->icsk_retransmits)
return false;
@@ -495,7 +497,7 @@ out_unlock:
static void tcp_synack_timer(struct sock *sk)
{
inet_csk_reqsk_queue_prune(sk, TCP_SYNQ_INTERVAL,
- TCP_TIMEOUT_INIT, TCP_RTO_MAX);
+ sysctl_tcp_initial_rto, TCP_RTO_MAX);
}
void tcp_syn_ack_timeout(struct sock *sk, struct request_sock *req)
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 352c260..f8a07a8 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -45,7 +45,7 @@ static __u16 const msstab[] = {
* sysctl_tcp_retries1. It's a rather complicated formula (exponential
* backoff) to compute at runtime so it's currently hardcoded here.
*/
-#define COUNTER_TRIES 4
+#define COUNTER_TRIES (sysctl_tcp_initial_rto/HZ + 1)
static inline struct sock *get_cookie_sock(struct sock *sk, struct sk_buff *skb,
struct request_sock *req,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 4f49e5d..7e791e6 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1349,7 +1349,7 @@ have_isn:
want_cookie)
goto drop_and_free;
- inet6_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
+ inet6_csk_reqsk_queue_hash_add(sk, req, sysctl_tcp_initial_rto);
return 0;
drop_and_release:
@@ -1957,8 +1957,8 @@ static int tcp_v6_init_sock(struct sock *sk)
tcp_init_xmit_timers(sk);
tcp_prequeue_init(tp);
- icsk->icsk_rto = TCP_TIMEOUT_INIT;
- tp->mdev = TCP_TIMEOUT_INIT;
+ icsk->icsk_rto = sysctl_tcp_initial_rto;
+ tp->mdev = sysctl_tcp_initial_rto;
/* So many TCP implementations out there (incorrectly) count the
* initial SYN frame in their delayed-ACK and congestion control
--
1.7.0.4
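The do_tcp_setsockopt()/do_tcp_getsockopt() hunks above also change the base of the TCP_DEFER_ACCEPT conversion between seconds and retransmit counts, from TCP_TIMEOUT_INIT / HZ to sysctl_tcp_initial_rto / HZ. Below is a minimal user-space sketch of that conversion, modelled on secs_to_retrans() and retrans_to_secs() in net/ipv4/tcp.c. The sketch_ names mark these as illustrative copies, not the kernel functions themselves, and TCP_RTO_MAX is assumed to be 120 seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed TCP_RTO_MAX of 120 seconds, expressed in seconds. */
#define SKETCH_RTO_MAX_SECS 120

/*
 * Seconds -> number of SYN-ACK retransmits. "timeout" is the initial RTO in
 * whole seconds; it doubles after each retransmit, capped at rto_max.
 */
uint8_t sketch_secs_to_retrans(int seconds, int timeout, int rto_max)
{
	uint8_t res = 0;

	assert(timeout >= 0 && rto_max > 0);
	if (seconds > 0) {
		int period = timeout;	/* total time covered so far */

		res = 1;
		while (seconds > period && res < 255) {
			res++;
			timeout <<= 1;
			if (timeout > rto_max)
				timeout = rto_max;
			period += timeout;
		}
	}
	return res;
}

/* Number of retransmits -> seconds (the direction used by getsockopt). */
int sketch_retrans_to_secs(uint8_t retrans, int timeout, int rto_max)
{
	int period = 0;

	if (retrans > 0) {
		period = timeout;
		while (--retrans) {
			timeout <<= 1;
			if (timeout > rto_max)
				timeout = rto_max;
			period += timeout;
		}
	}
	return period;
}
```

In this sketch, TCP_DEFER_ACCEPT=10 with a 3 s base maps to 3 retransmits and reads back as 21 s; with a 1 s base it maps to 4 retransmits and reads back as 15 s. Because the division by HZ truncates, any initial RTO below 1000 ms gives a base of 0 s, and the conversion then saturates at 255 retransmits.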
^ permalink raw reply related
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
From: David Miller @ 2011-05-19 2:36 UTC (permalink / raw)
To: tsunanet
Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet,
alexander.zimmermann, netdev, linux-kernel
In-Reply-To: <1305771744-83951-1-git-send-email-tsunanet@gmail.com>
From: Benoit Sigoure <tsunanet@gmail.com>
Date: Wed, 18 May 2011 19:22:24 -0700
> Prior to this patch, Linux would always use 3 seconds (compile-time
> constant) as the initial RTO. Draft RFC 2988bis-02 proposes to tune
> this down to 1 second and, in case of a timeout during the TCP 3WHS,
> revert the RTO back up to 3 seconds when data transmission begins.
We just had a discussion where it was determined that changes to
these settings are "network specific" and therefore that if it
is appropriate at all (I'm still not convinced) it is only suitable
as a routing metric.
^ permalink raw reply
* Re: [PATCH 1/2] forcedeth: make module parameters readable in /sys/module
From: Bill Fink @ 2011-05-19 3:09 UTC (permalink / raw)
To: Stephen Hemminger
Cc: David Decotigny, David S. Miller, Joe Perches, Szymon Janc,
netdev, linux-kernel, kernel-net-upstream
In-Reply-To: <20110518150346.508d6406@nehalam>
On Wed, 18 May 2011, Stephen Hemminger wrote:
> On Wed, 18 May 2011 14:09:59 -0700
> David Decotigny <decot@google.com> wrote:
>
> > This change publishes the values of the module parameters in
> > /sys/module.
>
> Although this makes more info for developer, it also means more
> stuff in sysfs taking more memory and not providing any real value
> that can't be found by looking at the /etc/modprobe.d for any parameters
> user might have set.
As a user, I find having the module parameter info in
/sys/module/driver/parameters/* extremely useful at times.
In tracking down weird problems I can for example do:
grep . /sys/module/driver/parameters/*
to get a snapshot of module parameters. I can then compare
this with other similar systems to see if there are any
differences of note that might be significant (anything
from differences due to kernel versions to just forgetting
some change I made).
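A C equivalent of that grep, as a minimal sketch: it walks a parameters directory such as /sys/module/forcedeth/parameters (the directory is only an example) and prints path:value for each file in name order, roughly matching the output of `grep . dir/*`. The function name dump_module_params() is hypothetical, and it assumes each parameter file holds a single line, which is how sysfs exposes module parameters.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Skip ".", ".." and other dotfiles. */
static int not_dotfile(const struct dirent *d)
{
	return d->d_name[0] != '.';
}

/*
 * Print "path:value" for every readable, non-empty file in dir, sorted by
 * name. Returns the number of lines printed, or -1 if dir can't be scanned.
 */
int dump_module_params(const char *dir, FILE *out)
{
	struct dirent **names;
	int n, i, printed = 0;

	assert(dir != NULL && out != NULL);
	n = scandir(dir, &names, not_dotfile, alphasort);
	if (n < 0)
		return -1;
	for (i = 0; i < n; i++) {
		char path[PATH_MAX];
		char value[256];
		FILE *f;

		snprintf(path, sizeof(path), "%s/%s", dir, names[i]->d_name);
		f = fopen(path, "r");
		if (f != NULL) {
			/* parameter files are assumed to hold one line */
			if (fgets(value, sizeof(value), f) != NULL) {
				value[strcspn(value, "\n")] = '\0';
				if (value[0] != '\0') {	/* grep . skips empty lines */
					fprintf(out, "%s:%s\n", path, value);
					printed++;
				}
			}
			fclose(f);
		}
		free(names[i]);
	}
	free(names);
	return printed;
}
```

Redirecting the output of two such dumps to files and diffing them gives the cross-machine comparison described above.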
I don't see that it adds any significant overhead
on the system either.
-Bill
^ permalink raw reply
* Re: [PATCH] IPv6 transmit hashing for bonding driver
From: John @ 2011-05-19 3:25 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: netdev
In-Reply-To: <26860.1305680256@death>
On 5/17/2011 5:57 PM, Jay Vosburgh wrote:
> It would also be useful to include an update to bonding.txt to
> describe the IPv6 algorithm; I'd word that something like the following
> (filling in the missing bits) for the layer3+4 section, applying similar
> changes to the layer2+3 section:
>
Thanks for the feedback. This is a good point, I will take care of this too.
>
> Style nit: I don't believe the outermost parentheses are
> necessary. Since you do this twice, perhaps make a small inline
> function to handle it.
>
The outer parentheses are definitely not required; I will remove those.
I did speak with Andy Gospodarek about breaking out all of the hashing
methods into separate functions. I'll give that some more thought.
>
> For fragmented datagrams, the above will keep all fragments
> together, which is good, but are there other header types that should be
> skipped over to find the UDP/TCP header for hashing purposes?
>
This is a good question, and I'm not too sure how to proceed. There are
other headers that can sit between the IPv6 header and the upper
protocol payload (hop-by-hop, destination options, routing, fragment,
AH, ESP, mobility), and the current implementation would handle any of
those being present by ignoring the upper protocol data and only hashing
on the source and destination IPv6 addresses.
I was trying to avoid loops, but one would be required to process the
headers. Additionally there would need to be code (or a table) that
knows how to process each header type, and that may require maintenance
any time a new header option becomes popular.
It's definitely doable, though. Any thoughts?
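One way that loop could look, as a minimal sketch over a raw packet buffer rather than an sk_buff: it follows the Next Header chain past Hop-by-Hop, Routing, Destination Options, Mobility and AH headers, and gives up (so the caller would hash only the addresses) on a Fragment header, ESP, No Next Header, or a truncated buffer. Bailing out on every Fragment header, first fragment included, keeps all fragments of a datagram on the same slave. The function name, the MAX_EXT_HDRS cap and the return convention are assumptions for illustration; the kernel's existing ipv6_skip_exthdr() helper in net/ipv6/exthdrs_core.c may already cover much of this and is worth checking first.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN 40
#define MAX_EXT_HDRS 8	/* arbitrary cap: bounds the loop on hostile input */

/*
 * Returns 0 and fills *proto / *l4_off if an upper-layer header with room
 * for two 16-bit ports is found; returns -1 if the caller should fall back
 * to hashing on the IPv6 addresses only.
 */
int ipv6_find_l4(const uint8_t *pkt, size_t len, uint8_t *proto, size_t *l4_off)
{
	size_t off = IPV6_HDR_LEN;
	uint8_t nexthdr;
	int i;

	if (len < IPV6_HDR_LEN)
		return -1;
	nexthdr = pkt[6];	/* Next Header field of the fixed header */

	for (i = 0; i < MAX_EXT_HDRS; i++) {
		size_t hlen;

		switch (nexthdr) {
		case 0:		/* Hop-by-Hop Options */
		case 43:	/* Routing */
		case 60:	/* Destination Options */
		case 135:	/* Mobility */
			if (off + 2 > len)
				return -1;
			/* Hdr Ext Len: 8-octet units, not counting the first 8 */
			hlen = ((size_t)pkt[off + 1] + 1) * 8;
			break;
		case 51:	/* AH: Payload Len in 4-octet units, minus 2 */
			if (off + 2 > len)
				return -1;
			hlen = ((size_t)pkt[off + 1] + 2) * 4;
			break;
		case 44:	/* Fragment: addresses only, keep fragments together */
		case 50:	/* ESP: the rest is encrypted */
		case 59:	/* No Next Header */
			return -1;
		default:	/* treat as upper-layer protocol (TCP, UDP, ...) */
			if (off + 4 > len)	/* need both ports */
				return -1;
			*proto = nexthdr;
			*l4_off = off;
			return 0;
		}
		nexthdr = pkt[off];	/* first byte of each ext header */
		off += hlen;
		if (off > len)
			return -1;
	}
	return -1;
}
```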
John
^ permalink raw reply
* Re: ip_vs_ftp causing ip_vs oops on module load.
From: Simon Horman @ 2011-05-19 3:26 UTC (permalink / raw)
To: Dave Jones; +Cc: netdev, Wensong Zhang
In-Reply-To: <20110519011045.GF16688@verge.net.au>
On Thu, May 19, 2011 at 10:10:46AM +0900, Simon Horman wrote:
> On Wed, May 18, 2011 at 04:19:15PM -0400, Dave Jones wrote:
> > I get this oops from ip_vs_ftp..
> >
> > general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > last sysfs file: /sys/module/nf_nat/refcnt
> > CPU 3
> > Modules linked in: ip_vs(+) libcrc32c nf_nat nfsd lockd nfs_acl auth_rpcgss sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables snd_hda_codec_realtek ppdev snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm microcode edac_core snd_timer k10temp snd pcspkr usb_debug edac_mce_amd soundcore snd_page_alloc sp5100_tco i2c_piix4 parport_pc parport wmi r8169 mii lm63 ipv6 pata_acpi firewire_ohci ata_generic firewire_core crc_itu_t pata_atiixp floppy radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: nf_nat]
> >
> > Pid: 1366, comm: modprobe Not tainted 2.6.39-rc7+ #15 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H
> > RIP: 0010:[<ffffffff8107bddb>] [<ffffffff8107bddb>] notifier_chain_register+0xb/0x2a
> > RSP: 0018:ffff880114139e68 EFLAGS: 00010206
> > RAX: 2f736e74656e2f74 RBX: ffffffffa04265d0 RCX: 0000000000000003
> > RDX: 00000000656e6567 RSI: ffffffffa04265d0 RDI: ffffffffa04235d8
> > RBP: ffff880114139e68 R08: ffff880114139df8 R09: 0000000000000001
> > R10: 0000000000000001 R11: 00000000000001cc R12: ffffffffa0432106
> > R13: 0000000000000000 R14: 0000000000007f0d R15: 0000000000410e40
> > FS: 00007f2aaf242720(0000) GS:ffff88012a800000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00007f2aaea0100f CR3: 000000011424f000 CR4: 00000000000006e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process modprobe (pid: 1366, threadinfo ffff880114138000, task ffff8801146cc7a0)
> > Stack:
> > ffff880114139e78 ffffffff8107be36 ffff880114139ec8 ffffffff81403058
> > 0000000000000000 0000000000000000 ffff880114139ea8 0000000000000000
> > ffffffffa0432106 0000000000000000 0000000000007f0d 0000000000410e40
> > Call Trace:
> > [<ffffffff8107be36>] raw_notifier_chain_register+0xe/0x10
> > [<ffffffff81403058>] register_netdevice_notifier+0x2d/0x1b6
> > [<ffffffffa0432106>] ? ip_vs_conn_init+0x106/0x106 [ip_vs]
> > [<ffffffffa04322c7>] ip_vs_control_init+0xa5/0xce [ip_vs]
> > [<ffffffffa0432106>] ? ip_vs_conn_init+0x106/0x106 [ip_vs]
> > [<ffffffffa0432116>] ip_vs_init+0x10/0x11c [ip_vs]
> > [<ffffffff81002099>] do_one_initcall+0x7f/0x13a
> > [<ffffffff81096524>] sys_init_module+0x132/0x281
> > [<ffffffff814cc702>] system_call_fastpath+0x16/0x1b
> > Code: 07 ff c8 89 43 48 eb 08 48 89 df e8 dc 95 44 00 4c 89 e6 48 89 df e8 a7 a5 44 00 5b 41 5c 5d c3 55 48 89 e5 66 66 66 66 90 eb 0c <8b> 50 10 39 56 10 7f 0c 48 8d 78 08 48 8b 07 48 85 c0 75 ec 48
> > RIP [<ffffffff8107bddb>] notifier_chain_register+0xb/0x2a
> > RSP <ffff880114139e68>
> > ---[ end trace e90d7053ad1a7a5b ]---
> >
> >
> > This script replicates the bug.
> > (it usually oopses after just a few loops)
> >
> > #!/bin/sh
> > while [ 1 ];
> > do
> > modprobe ip_vs_ftp
> > modprobe -r ip_vs_ftp
> > done
> >
> > Looks like something isn't getting cleaned up on module exit
> > that we fall over when we encounter it next time it gets loaded ?
>
> Thanks Dave, I will look into this.
Hi Dave,
I'm not having much luck reproducing this in KVM.
I will try this evening on real hardware.
Just to make sure we are testing the same thing, are you using Linus's tree?
^ permalink raw reply
* RE: packet received in a wrong rx-queue?
From: Jon Zhou @ 2011-05-19 3:22 UTC (permalink / raw)
To: David Miller; +Cc: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org
In-Reply-To: <20110518.213225.1546509125052154115.davem@davemloft.net>
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, May 19, 2011 9:32 AM
> To: Jon Zhou
> Cc: e1000-devel@lists.sourceforge.net; netdev@vger.kernel.org
> Subject: Re: packet received in a wrong rx-queue?
>
> From: Jon Zhou <Jon.Zhou@jdsu.com>
> Date: Wed, 18 May 2011 18:26:07 -0700
>
> > Anyone can help to check the traffic file?
>
> I told you yesterday that this behavior is expected.
From the 82599 datasheet, the hash algorithm consists of src/dst IP, src/dst port and protocol.
Why does it give a different hash value for the same IP/port pair?
^ permalink raw reply
* Re: packet received in a wrong rx-queue?
From: David Miller @ 2011-05-19 3:36 UTC (permalink / raw)
To: Jon.Zhou; +Cc: e1000-devel, netdev
In-Reply-To: <4A6A2125329CFD4D8CC40C9E8ABCAB9F250D7491A8@MILEXCH2.ds.jdsu.net>
From: Jon Zhou <Jon.Zhou@jdsu.com>
Date: Wed, 18 May 2011 20:22:07 -0700
> From the 82599 datasheet, the hash algorithm consists of src/dst
> IP, src/dst port and protocol. Why does it give a different hash
> value for the same IP/port pair?
The same reason why feeding different sets of discrete 32-bit and
16-bit values to a cryptographic hash results in a different final
hash value.
Our software RPS/RFS implementation used to have this quality too.
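To make this concrete, assume the 82599 uses the Toeplitz hash common to RSS-capable NICs. Its input is the ordered byte string src addr, dst addr, src port, dst port, so swapping source and destination (the two directions of one connection, if that is what the capture shows) generally produces a different hash and therefore possibly a different rx-queue. The sketch below uses the widely published sample key and test vector from Microsoft's RSS documentation; the key the driver actually programs into the 82599 may differ, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Microsoft's sample 40-byte RSS key (assumed; real NIC keys are programmable). */
static const uint8_t rss_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash: for every set input bit, XOR in the 32-bit key window. */
uint32_t toeplitz_hash(const uint8_t *key, size_t klen, const uint8_t *in, size_t len)
{
	uint32_t result = 0;
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | key[3];
	size_t i;
	int b;

	assert(klen >= len + 4);	/* window slides one key bit per input bit */
	for (i = 0; i < len; i++) {
		for (b = 7; b >= 0; b--) {
			if (in[i] & (1u << b))
				result ^= window;
			/* shift in the next key bit (MSB first) */
			window = (window << 1) | ((key[i + 4] >> b) & 1u);
		}
	}
	return result;
}

/* Store the low nbytes of v in big-endian (network) order. */
static void put_be(uint8_t *p, uint32_t v, int nbytes)
{
	while (nbytes--)
		*p++ = (uint8_t)(v >> (8 * nbytes));
}

/* IPv4 RSS input: saddr, daddr, then (optionally) sport, dport. */
uint32_t rss_ipv4(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport,
		  int with_ports)
{
	uint8_t in[12];

	put_be(in, saddr, 4);
	put_be(in + 4, daddr, 4);
	put_be(in + 8, sport, 2);
	put_be(in + 10, dport, 2);
	return toeplitz_hash(rss_key, sizeof(rss_key), in, with_ports ? 12 : 8);
}
```

For source 66.9.149.187:2794 and destination 161.142.100.80:1766, the published results for this key are 0x323e8fc2 (addresses only) and 0x51ccc178 (addresses and ports). Hashing the reverse direction with the same function gives a different value, which is the asymmetry described above.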
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
^ permalink raw reply
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
From: tsuna @ 2011-05-19 3:56 UTC (permalink / raw)
To: David Miller
Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet,
alexander.zimmermann, netdev, linux-kernel
In-Reply-To: <20110518.223622.1525088601595365235.davem@davemloft.net>
On Wed, May 18, 2011 at 7:36 PM, David Miller <davem@davemloft.net> wrote:
> From: Benoit Sigoure <tsunanet@gmail.com>
> Date: Wed, 18 May 2011 19:22:24 -0700
>
>> Prior to this patch, Linux would always use 3 seconds (compile-time
>> constant) as the initial RTO. Draft RFC 2988bis-02 proposes to tune
>> this down to 1 second and, in case of a timeout during the TCP 3WHS,
>> revert the RTO back up to 3 seconds when data transmission begins.
>
> We just had a discussion where it was determined that changes to
> these settings are "network specific" and therefore that if it
> is appropriate at all (I'm still not convinced) it is only suitable
> as a routing metric.
Fair enough. I'll take another stab at it and see if I can change
this to be on a per network basis. Do I need any patch that's not yet
in Linus' tree? I'm referring to this:
On Tue, May 17, 2011 at 5:20 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Adding many knobs to each clone had a huge cost on previous kernels.
> (Think some machines have millions entries in IP route cache), this used
> quite a lot of memory.
>
> With latest David work, we'll consume less ram, because we can now share
> settings, instead of copying them on each dst entry.
If this has already been merged then it sounds like I should have
everything I need..?
--
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com
^ permalink raw reply
* Re: [PATCH] tcp: Implement a two-level initial RTO as per draft RFC 2988bis-02.
From: David Miller @ 2011-05-19 4:14 UTC (permalink / raw)
To: tsunanet
Cc: kuznet, pekkas, jmorris, yoshfuji, kaber, hagen, eric.dumazet,
alexander.zimmermann, netdev, linux-kernel
In-Reply-To: <BANLkTinEN_=jSCw4qR1PqtWaQ+07OMq7tg@mail.gmail.com>
From: tsuna <tsunanet@gmail.com>
Date: Wed, 18 May 2011 20:56:33 -0700
> On Wed, May 18, 2011 at 7:36 PM, David Miller <davem@davemloft.net> wrote:
>> From: Benoit Sigoure <tsunanet@gmail.com>
>> Date: Wed, 18 May 2011 19:22:24 -0700
>>
>>> Prior to this patch, Linux would always use 3 seconds (compile-time
>>> constant) as the initial RTO. Draft RFC 2988bis-02 proposes to tune
>>> this down to 1 second and, in case of a timeout during the TCP 3WHS,
>>> revert the RTO back up to 3 seconds when data transmission begins.
>>
>> We just had a discussion where it was determined that changes to
>> these settings are "network specific" and therefore that if it
>> is appropriate at all (I'm still not convinced) it is only suitable
>> as a routing metric.
>
> Fair enough. I'll take another stab at it and see if I can change
> this to be on a per network basis. Do I need any patch that's not yet
> in Linus' tree? I'm referring to this:
Keep in mind another thing I do not like about this knob.
The IETF draft has a requirement that we fallback to 3 seconds if the
initial RTO is 1 second.
Nothing in your facilities ensure this, or provide a way for the
kernel to make sure this is the case.
And for other values of initial RTO, what fallback is appropriate?
As a result of all of this, I do not really think this is something
the user should control at all.
I really would rather see the initial RTO be static and be set to 1
with fallback RTO of 3.
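A minimal sketch of the static scheme proposed here (1 s initial RTO, 3 s fallback), assuming "fallback" means the RTO adopted once data transfer begins if the handshake saw any retransmission, as the draft is summarised earlier in this thread. The selection roughly mirrors the icsk_retransmits test that the patch adds to tcp_init_metrics(), but with constants instead of sysctls. The names are hypothetical, and the backoff helper ignores the TCP_RTO_MAX cap for simplicity.

```c
#include <assert.h>

/* Values from draft RFC 2988bis-02 as described in this thread (ms). */
#define INITIAL_RTO_MS  1000UL
#define FALLBACK_RTO_MS 3000UL

/* RTO once established: fall back to 3 s if the 3WHS needed a retransmit. */
unsigned long established_rto_ms(unsigned int handshake_retransmits)
{
	return handshake_retransmits ? FALLBACK_RTO_MS : INITIAL_RTO_MS;
}

/*
 * Time (ms after the first SYN) at which SYN number n (0-based) is sent,
 * doubling the RTO after each timeout, with no upper cap (illustrative).
 */
unsigned long syn_send_time_ms(unsigned int n)
{
	unsigned long t = 0, rto = INITIAL_RTO_MS;
	unsigned int i;

	assert(n <= 20);	/* keep the doubling well inside unsigned long */
	for (i = 0; i < n; i++) {
		t += rto;
		rto *= 2;
	}
	return t;
}
```

With a 1 s initial RTO the SYN retransmits go out at 1 s, 3 s and 7 s, compared with 3 s, 9 s and 21 s under the old 3 s constant; a connection whose handshake needed any of them then starts data transfer with the conservative 3 s RTO.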
^ permalink raw reply