Netdev List
 help / color / mirror / Atom feed
* Re: [git patches] net driver fixes
From: David Miller @ 2008-01-23 11:01 UTC (permalink / raw)
  To: jeff; +Cc: netdev
In-Reply-To: <20080123100518.GA12590@havoc.gtf.org>

From: Jeff Garzik <jeff@garzik.org>
Date: Wed, 23 Jan 2008 05:05:18 -0500

> master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-davem

Pulled into net-2.6, thanks Jeff.

^ permalink raw reply

* Re: 2.6.24-rc8 ppp regression
From: David Miller @ 2008-01-23 10:58 UTC (permalink / raw)
  To: max; +Cc: netdev
In-Reply-To: <20080123093509.GA4718@stro.at>

From: maximilian attems <max@stro.at>
Date: Wed, 23 Jan 2008 10:35:09 +0100

> Jan 22 23:23:54 dual kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1

Already fixed by:

[NEIGH]: Revert 'Fix race between neigh_parms_release and neightbl_fill_parms'

Commit 9cd40029423701c376391da59d2c6469672b4bed (Fix race between
neigh_parms_release and neightbl_fill_parms) introduced device
reference counting regressions for several people, see:

	http://bugzilla.kernel.org/show_bug.cgi?id=9778

for example.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index cc8a2f1..29b8ee4 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1316,6 +1316,8 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 			*p = parms->next;
 			parms->dead = 1;
 			write_unlock_bh(&tbl->lock);
+			if (parms->dev)
+				dev_put(parms->dev);
 			call_rcu(&parms->rcu_head, neigh_rcu_free_parms);
 			return;
 		}
@@ -1326,8 +1328,6 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 
 void neigh_parms_destroy(struct neigh_parms *parms)
 {
-	if (parms->dev)
-		dev_put(parms->dev);
 	kfree(parms);
 }
 

^ permalink raw reply related

* Re: [PATCH 2.6.25 1/1]S2io: Multiqueue network device support implementation
From: Andi Kleen @ 2008-01-23 10:58 UTC (permalink / raw)
  To: Sreenivasa Honnur; +Cc: netdev, jeff, support
In-Reply-To: <Pine.GSO.4.10.10801221942530.24414-100000@guinness>

Sreenivasa Honnur <Sreenivasa.Honnur@neterion.com> writes:

> Multiqueue netwrok device support implementation.
> - Added a loadable parameter "multiq" to enable/disable multiqueue support,
>   by default it is disabled.
> - skb->queue_mapping is not used for queue/fifo selection. FIFO iselection is
>   based on IP-TOS value, 0x0-0xF TOS values are mapped to 8 FIFOs.

Standard way to use that would be using skb->priority

But I'm surprised you bother with TOS for multi queues at all. TOS
isn't a too important mechanism.

I would have thought the primary use case would be per CPU TX completion
interrupts. With that the queue should be selected based on
the the current CPU.

-Andi

^ permalink raw reply

* Re: Assertions in latest kernels
From: David Miller @ 2008-01-23 10:57 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: krkumar2, netdev
In-Reply-To: <Pine.LNX.4.64.0801231043220.31652@kivilampi-30.cs.helsinki.fi>

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed, 23 Jan 2008 11:17:05 +0200 (EET)

> What a strange thing that it has been super quiet on this front
> until now everybody is seeing it, could there be something unrelated
> to TCP which has broken it all recently?

I think it is simply "capture effect".

One person reports the problem, and then others see it and
say "me too" because they were previously too busy to report
their instance of the problem.

Another thing is that a developer is sometimes working on something
and they do not want to pollute the bug reporting if the code they are
writing is what is causing the bug via some indirect corruptions or
whatever :-)

^ permalink raw reply

* Re: 2.6.24-rc8 ppp regression
From: Ben Hutchings @ 2008-01-23 10:53 UTC (permalink / raw)
  To: maximilian attems; +Cc: netdev
In-Reply-To: <20080123093509.GA4718@stro.at>

maximilian attems wrote:
> 
> when killing a wvdial usb modem session:
> 
> Jan 22 23:23:03 dual pppd[7941]: Terminating on signal 15
> Jan 22 23:23:03 dual pppd[7941]: Connect time 92.2 minutes.
> Jan 22 23:23:03 dual pppd[7941]: Sent 1322316 bytes, received 8587156 bytes.
> Jan 22 23:23:03 dual pppd[7941]: Connection terminated.
> Jan 22 23:23:05 dual kernel: usb 3-2: USB disconnect, address 5
> Jan 22 23:23:05 dual kernel: option 3-2:1.0: device disconnected
> Jan 22 23:23:05 dual kernel: option1 ttyUSB1: GSM modem (1-port) converter now disconnected from ttyUSB1
> Jan 22 23:23:05 dual kernel: option 3-2:1.1: device disconnected
> Jan 22 23:23:13 dual kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
> Jan 22 23:23:44 dual last message repeated 3 times
> Jan 22 23:23:54 dual kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1

This is a known bug which doesn't just affect PPP:
<http://bugzilla.kernel.org/show_bug.cgi?id=9778>.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.

^ permalink raw reply

* Re: Assertions in latest kernels
From: Ilpo Järvinen @ 2008-01-23 10:49 UTC (permalink / raw)
  To: David Miller; +Cc: Krishna Kumar2, Netdev
In-Reply-To: <Pine.LNX.4.64.0801231140120.31652@kivilampi-30.cs.helsinki.fi>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2953 bytes --]

On Wed, 23 Jan 2008, Ilpo Järvinen wrote:

> On Wed, 23 Jan 2008, Krishna Kumar2 wrote:
> 
> > Hi Ilpo,
> > 
> > > It's almost impossible to know which of these is the main cause and the
> > > first occuring due to reasons I'll not copy here. What a strange thing
> > > that it has been super quiet on this front until now everybody is seeing
> > > it, could there be something unrelated to TCP which has broken it all
> > > recently?
> > 
> > I have been getting this for atleast 3 weeks but I was quiet since those
> > were kernels that I had modified.
> 
> Since you can easily reproduce it, lets just figure out what's causing it 
> hard way, rather than digging the endless git-logs... :-)

Hmm, perhaps it could be something related to this (and some untested 
path somewhere which is now exposed):

commit 4a55b553f691abadaa63570dfc714e20913561c1
Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Date:   Thu Dec 20 20:36:03 2007 -0800

    [TCP]: Fix TSO deferring

Dave, what do you think? Wouldn't explain the one -rc only report though 
from Denys. Another one I'm a bit unsure of is this:

commit 757c32944b80fd95542bd66f06032ab773034d53
Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Date:   Thu Jan 3 20:39:01 2008 -0800

    [TCP]: Perform setting of common control fields in one place

->sacked field is cleared in tcp_retransmit_skb due to a subtle change, 
which might be buggy.... However, I find it rather unlikely that this 
would explain Kumar's case. Anyway, here's the one that reverts the 
problematic part of it. ...and this is net-2.6.25 as well so it won't 
solve Denys' case either.

-- 
 i.

--
[PATCH] [TCP]: Revert part of "[TCP]: Perform setting of common control..."

This commit reverts part of 757c32944b80fd95542bd66f06032ab773034d53
([TCP]: Perform setting of common control fields in one place)
because it's not valid to clear ->sacked field like that without
additional precautions with counters, mostly lost_out, sacked_out
should not be set due to reneging which kicks in much earlier than
this.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_output.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 648340f..f6cbc1f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1887,10 +1887,12 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	    (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN) &&
 	    tp->snd_una == (TCP_SKB_CB(skb)->end_seq - 1)) {
 		if (!pskb_trim(skb, 0)) {
-			/* Reuse, even though it does some unnecessary work */
-			tcp_init_nondata_skb(skb, TCP_SKB_CB(skb)->end_seq - 1,
-					     TCP_SKB_CB(skb)->flags);
+			TCP_SKB_CB(skb)->seq = TCP_SKB_CB(skb)->end_seq - 1;
+			skb_shinfo(skb)->gso_segs = 1;
+			skb_shinfo(skb)->gso_size = 0;
+			skb_shinfo(skb)->gso_type = 0;
 			skb->ip_summed = CHECKSUM_NONE;
+			skb->csum = 0;
 		}
 	}
 
-- 
1.5.2.2

^ permalink raw reply related

* Re: Assertions in latest kernels
From: Krishna Kumar2 @ 2008-01-23 10:38 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, Netdev
In-Reply-To: <Pine.LNX.4.64.0801231140120.31652@kivilampi-30.cs.helsinki.fi>

ilpo.jarvinen@helsinki.fi wrote on 01/23/2008 03:23:24 PM:

> > There were couple of patch apply failures in .c files which I fixed by
> > hand.
> > But when compiling, I got these errors (I am using DM's 2.6.24-rc7
kernel,
> > net-2.6.25.git):
>
> Well, that's annoying, you didn't mention net-2.6.25 back then, it sure
> is incompatible with it like already the patch title said... :-)

I did mention that in my first mail:

"Bits are unmodified 2.6.24-rc7 bits downloaded today (Jan 23rd)".

I tried it hoping it would not change a lot, but it didn't work
obviously :)

> ...This is either due to mismerge or due your modifications.

Again, it was due to wrong patch bits (to reiterate, I am using
unmodified bits without any modifications), but let's move ahead.
Your patch in this mail applied cleanly for me, and I am compiling
now.

- KK


^ permalink raw reply

* Re: [PATCH] pci-skeleton: Misc fixes to build neatly
From: Jike Song @ 2008-01-23 10:16 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev, linux-kernel
In-Reply-To: <47970FF9.7010009@garzik.org>

On 1/23/08, Jeff Garzik <jeff@garzik.org> wrote:
>
> ACK but git-am (everybody's patch import tool) says the patch is corrupted
>
Hi Jeff,

Thank you very much for your acknowledgement!  This is my first patch
for Linux kernel, sorry for the corruption.  I'll resend it ASAP.

Regards,
Jike

^ permalink raw reply

* [git patches] net driver fixes
From: Jeff Garzik @ 2008-01-23 10:05 UTC (permalink / raw)
  To: David Miller; +Cc: netdev


Francois noted that these warranted promotion from net-2.6.25.git to
current 2.6.24-rc.

NOTE:  These changesets were cherry-picked from net-2.6.25, without any
modifications.  Any future rebase or merge should hopefully notice this
automatically.



Please pull from 'upstream-davem' branch of
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-davem

to receive the following updates:

 drivers/net/sis190.c |   21 ++++++++++-----------
 1 files changed, 10 insertions(+), 11 deletions(-)

Francois Romieu (4):
      sis190: add cmos ram access code for the SiS19x/968 chipset pair
      sis190: remove duplicate INIT_WORK
      sis190: mdio operation failure is not correctly detected
      sis190: scheduling while atomic error

diff --git a/drivers/net/sis190.c b/drivers/net/sis190.c
index 7eab072..b570402 100644
--- a/drivers/net/sis190.c
+++ b/drivers/net/sis190.c
@@ -372,7 +372,7 @@ static void __mdio_cmd(void __iomem *ioaddr, u32 ctl)
 		msleep(1);
 	}
 
-	if (i > 999)
+	if (i > 99)
 		printk(KERN_ERR PFX "PHY command failed !\n");
 }
 
@@ -847,10 +847,8 @@ static void sis190_soft_reset(void __iomem *ioaddr)
 {
 	SIS_W32(IntrControl, 0x8000);
 	SIS_PCI_COMMIT();
-	msleep(1);
 	SIS_W32(IntrControl, 0x0);
 	sis190_asic_down(ioaddr);
-	msleep(1);
 }
 
 static void sis190_hw_start(struct net_device *dev)
@@ -1041,8 +1039,6 @@ static int sis190_open(struct net_device *dev)
 	if (rc < 0)
 		goto err_free_rx_1;
 
-	INIT_WORK(&tp->phy_task, sis190_phy_task);
-
 	sis190_request_timer(dev);
 
 	rc = request_irq(dev->irq, sis190_interrupt, IRQF_SHARED, dev->name, dev);
@@ -1549,28 +1545,31 @@ static int __devinit sis190_get_mac_addr_from_eeprom(struct pci_dev *pdev,
 }
 
 /**
- *	sis190_get_mac_addr_from_apc - Get MAC address for SiS965 model
+ *	sis190_get_mac_addr_from_apc - Get MAC address for SiS96x model
  *	@pdev: PCI device
  *	@dev:  network device to get address for
  *
- *	SiS965 model, use APC CMOS RAM to store MAC address.
+ *	SiS96x model, use APC CMOS RAM to store MAC address.
  *	APC CMOS RAM is accessed through ISA bridge.
  *	MAC address is read into @net_dev->dev_addr.
  */
 static int __devinit sis190_get_mac_addr_from_apc(struct pci_dev *pdev,
 						  struct net_device *dev)
 {
+	static const u16 __devinitdata ids[] = { 0x0965, 0x0966, 0x0968 };
 	struct sis190_private *tp = netdev_priv(dev);
 	struct pci_dev *isa_bridge;
 	u8 reg, tmp8;
-	int i;
+	unsigned int i;
 
 	net_probe(tp, KERN_INFO "%s: Read MAC address from APC.\n",
 		  pci_name(pdev));
 
-	isa_bridge = pci_get_device(PCI_VENDOR_ID_SI, 0x0965, NULL);
-	if (!isa_bridge)
-		isa_bridge = pci_get_device(PCI_VENDOR_ID_SI, 0x0966, NULL);
+	for (i = 0; i < ARRAY_SIZE(ids); i++) {
+		isa_bridge = pci_get_device(PCI_VENDOR_ID_SI, ids[i], NULL);
+		if (isa_bridge)
+			break;
+	}
 
 	if (!isa_bridge) {
 		net_probe(tp, KERN_INFO "%s: Can not find ISA bridge.\n",

^ permalink raw reply related

* Re: [PATCH] pci-skeleton: Misc fixes to build neatly
From: Jeff Garzik @ 2008-01-23  9:59 UTC (permalink / raw)
  To: Jike Song; +Cc: netdev, linux-kernel
In-Reply-To: <df9815e70801212216v79ff8137kccc6a94ddb96c818@mail.gmail.com>

Jike Song wrote:
> Hello Jeff,
> 
> The pci-skeleton.c has several problems with compilation, such as missing args
> when calling synchronize_irq(). Fix it.
> 
> Signed-off-by: Jike Song <albcamus@gmail.com>
> ---
>  drivers/net/pci-skeleton.c |   49 ++++++++++++++++++++++---------------------
>  1 files changed, 25 insertions(+), 24 deletions(-)

ACK but git-am (everybody's patch import tool) says the patch is corrupted



^ permalink raw reply

* Re: Assertions in latest kernels
From: Ilpo Järvinen @ 2008-01-23  9:53 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: David Miller, Netdev
In-Reply-To: <OF6C895B68.E4A75EFD-ON652573D9.0034E821-652573D9.0034BF2E@in.ibm.com>

On Wed, 23 Jan 2008, Krishna Kumar2 wrote:

> Hi Ilpo,
> 
> > It's almost impossible to know which of these is the main cause and the
> > first occuring due to reasons I'll not copy here. What a strange thing
> > that it has been super quiet on this front until now everybody is seeing
> > it, could there be something unrelated to TCP which has broken it all
> > recently?
> 
> I have been getting this for atleast 3 weeks but I was quiet since those
> were kernels that I had modified.

Since you can easily reproduce it, lets just figure out what's causing it 
hard way, rather than digging the endless git-logs... :-)

> > Good thing is that you seem to be able to reproduce it were nicely.
> > Please try with this beauty below... Hopefully got it correctly matching
> > to mainline, in contrast to version I sent Dave Y., there's some added
> > candy which catches highest_sack corruption as well, at least it compiles
> 
> > already :-).
> 
> There were couple of patch apply failures in .c files which I fixed by
> hand.
> But when compiling, I got these errors (I am using DM's 2.6.24-rc7 kernel,
> net-2.6.25.git):

Well, that's annoying, you didn't mention net-2.6.25 back then, it sure 
is incompatible with it like already the patch title said... :-)


> net/ipv4/tcp_output.c: In function 'tcp_push_one':
> net/ipv4/tcp_output.c:1573: error: 'tp' undeclared (first use in this
> function)
> net/ipv4/tcp_output.c:1573: error: (Each undeclared identifier is reported
> only once
> net/ipv4/tcp_output.c:1573: error: for each function it appears in.)

...This is either due to mismerge or due your modifications.

Lets re-iterate, compiled ok for me tcp_{ipv4,input,output}.o...


-- 
 i.

[PATCH] [TCP]: debug S+L (for net-2.5.26 / mm, incompatible with mainline)

---
 include/net/tcp.h     |    8 +++-
 net/ipv4/tcp_input.c  |   10 +++++
 net/ipv4/tcp_ipv4.c   |  107 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_output.c |   21 +++++++---
 4 files changed, 139 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de4ea3..0685035 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -272,6 +272,8 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
 #define TCP_ADD_STATS_BH(field, val)	SNMP_ADD_STATS_BH(tcp_statistics, field, val)
 #define TCP_ADD_STATS_USER(field, val)	SNMP_ADD_STATS_USER(tcp_statistics, field, val)
 
+extern void			tcp_verify_wq(struct sock *sk);
+
 extern void			tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void			tcp_shutdown (struct sock *sk, int how);
@@ -768,7 +770,11 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
 }
 
 /* Use define here intentionally to get WARN_ON location shown at the caller */
-#define tcp_verify_left_out(tp)	WARN_ON(tcp_left_out(tp) > tp->packets_out)
+#define tcp_verify_left_out(tp)	\
+	do { \
+		WARN_ON(tcp_left_out(tp) > tp->packets_out); \
+		tcp_verify_wq((struct sock *)tp); \
+	} while(0)
 
 extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
 extern __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fa2c85c..cdacf70 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2645,6 +2645,10 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	if (do_lost || (tcp_is_fack(tp) && tcp_head_timedout(sk)))
 		tcp_update_scoreboard(sk, fast_rexmit);
 	tcp_cwnd_down(sk, flag);
+
+	WARN_ON(tcp_write_queue_head(sk) == NULL);
+	WARN_ON(!tp->packets_out);
+
 	tcp_xmit_retransmit_queue(sk);
 }
 
@@ -2848,6 +2852,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets)
 		tcp_clear_all_retrans_hints(tp);
 	}
 
+	tcp_verify_left_out(tp);
+
 	if (skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
 		flag |= FLAG_SACK_RENEGING;
 
@@ -3175,6 +3181,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	prior_fackets = tp->fackets_out;
 	prior_in_flight = tcp_packets_in_flight(tp);
 
+	tcp_verify_left_out(tp);
+
 	if (!(flag & FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
 		/* Window is constant, pure forward advance.
 		 * No more checks are required.
@@ -3237,6 +3245,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
 		dst_confirm(sk->sk_dst_cache);
 
+	tcp_verify_left_out(tp);
+
 	return 1;
 
 no_queue:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 9aea88b..c95682e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -108,6 +108,113 @@ struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
 	.lhash_wait  = __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait),
 };
 
+void tcp_print_queue(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+	char s[50+1];
+	char h[50+1];
+	int idx = 0;
+	int i;
+
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+
+		for (i = 0; i < tcp_skb_pcount(skb); i++) {
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
+				s[idx] = 'S';
+				if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+					s[idx] = 'B';
+
+			} else if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST) {
+				s[idx] = 'L';
+			} else {
+				s[idx] = ' ';
+			}
+			if (s[idx] != ' ' && skb->len < tp->mss_cache)
+				s[idx] += 'a' - 'A';
+
+			if (i == 0) {
+				if (skb == tcp_highest_sack(sk))
+					h[idx] = 'h';
+				else
+					h[idx] = '+';
+			} else {
+				h[idx] = '-';
+			}
+
+			if (++idx >= 50) {
+				s[idx] = 0;
+				h[idx] = 0;
+				printk(KERN_ERR "TCP wq(s) %s\n", s);
+				printk(KERN_ERR "TCP wq(h) %s\n", h);
+				idx = 0;
+			}
+		}
+	}
+	if (idx) {
+		s[idx] = '<';
+		s[idx+1] = 0;
+		h[idx] = '<';
+		h[idx+1] = 0;
+		printk(KERN_ERR "TCP wq(s) %s\n", s);
+		printk(KERN_ERR "TCP wq(h) %s\n", h);
+	}
+	printk(KERN_ERR "l%u s%u f%u p%u seq: su%u hs%u sn%u\n",
+		tp->lost_out, tp->sacked_out, tp->fackets_out, tp->packets_out,
+		tp->snd_una, tcp_highest_sack_seq(tp), tp->snd_nxt);
+}
+
+void tcp_verify_wq(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 lost = 0;
+	u32 sacked = 0;
+	u32 packets = 0;
+	u32 fackets = 0;
+	struct sk_buff *skb;
+
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
+			sacked += tcp_skb_pcount(skb);
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+				printk(KERN_ERR "Sacked bitmap S+L: %u %u-%u/%u\n",
+					TCP_SKB_CB(skb)->sacked,
+					TCP_SKB_CB(skb)->end_seq - tp->snd_una,
+					TCP_SKB_CB(skb)->seq - tp->snd_una,
+					tp->snd_una);
+			fackets = sacked;
+		}
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+			lost += tcp_skb_pcount(skb);
+
+		packets += tcp_skb_pcount(skb);
+	}
+
+	WARN_ON(lost != tp->lost_out);
+	WARN_ON(tcp_is_sack(tp) && (sacked != tp->sacked_out));
+	WARN_ON(packets != tp->packets_out);
+	WARN_ON(fackets != tp->fackets_out);
+
+	if ((lost != tp->lost_out) ||
+	    (tcp_is_sack(tp) && (sacked != tp->sacked_out)) ||
+	    (packets != tp->packets_out) ||
+	    (fackets != tp->fackets_out)) {
+		printk(KERN_ERR "P: %u L: %u vs %u S: %u vs %u F: %u vs %u w: %u-%u (%u)\n",
+			tp->packets_out,
+			lost, tp->lost_out,
+			sacked, tp->sacked_out,
+			fackets, tp->fackets_out,
+			tp->snd_una, tp->snd_nxt,
+			tp->rx_opt.sack_ok);
+		tcp_print_queue(sk);
+	}
+}
+
 static int tcp_v4_get_port(struct sock *sk, unsigned short snum)
 {
 	return inet_csk_get_port(&tcp_hashinfo, sk, snum,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 89f0188..648340f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -779,10 +779,9 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 			tp->lost_out -= diff;
 
 		/* Adjust Reno SACK estimate. */
-		if (tcp_is_reno(tp) && diff > 0) {
+		if (tcp_is_reno(tp) && diff > 0)
 			tcp_dec_pcount_approx_int(&tp->sacked_out, diff);
-			tcp_verify_left_out(tp);
-		}
+
 		tcp_adjust_fackets_out(sk, skb, diff);
 	}
 
@@ -790,6 +789,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 	skb_header_release(buff);
 	tcp_insert_write_queue_after(skb, buff, sk);
 
+	tcp_verify_left_out(tp);
+
 	return 0;
 }
 
@@ -1463,6 +1464,7 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
 	} else if (result > 0) {
 		sent_pkts = 1;
 	}
+	tcp_verify_left_out(tp);
 
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
@@ -1764,6 +1766,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb,
 	tcp_clear_retrans_hints_partial(tp);
 
 	sk_wmem_free_skb(sk, next_skb);
+	tcp_verify_left_out(tp);
 }
 
 /* Do a simple retransmit without using the backoff mechanisms in
@@ -1795,13 +1798,13 @@ void tcp_simple_retransmit(struct sock *sk)
 		}
 	}
 
+	tcp_verify_left_out(tp);
+
 	tcp_clear_all_retrans_hints(tp);
 
 	if (!lost)
 		return;
 
-	tcp_verify_left_out(tp);
-
 	/* Don't muck with the congestion window here.
 	 * Reason is that we do not increase amount of _data_
 	 * in network, but units changed and effective
@@ -1970,8 +1973,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 			 * packet to be MSS sized and all the
 			 * packet counting works out.
 			 */
-			if (tcp_packets_in_flight(tp) >= tp->snd_cwnd)
+			if (tcp_packets_in_flight(tp) >= tp->snd_cwnd) {
+				tcp_verify_left_out(tp);
 				return;
+			}
 
 			if (sacked & TCPCB_LOST) {
 				if (!(sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))) {
@@ -1997,6 +2002,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		}
 	}
 
+	tcp_verify_left_out(tp);
+
 	/* OK, demanded retransmission is finished. */
 
 	/* Forward retransmissions are possible only during Recovery. */
@@ -2054,6 +2061,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 
 		NET_INC_STATS_BH(LINUX_MIB_TCPFORWARDRETRANS);
 	}
+
+	tcp_verify_left_out(tp);
 }
 
 /* Send a fin.  The caller locks the socket for us.  This cannot be
-- 
1.5.2.2


^ permalink raw reply related

* 2.6.24-rc8 ppp regression
From: maximilian attems @ 2008-01-23  9:35 UTC (permalink / raw)
  To: netdev


when killing a wvdial usb modem session:

Jan 22 23:23:03 dual pppd[7941]: Terminating on signal 15
Jan 22 23:23:03 dual pppd[7941]: Connect time 92.2 minutes.
Jan 22 23:23:03 dual pppd[7941]: Sent 1322316 bytes, received 8587156 bytes.
Jan 22 23:23:03 dual pppd[7941]: Connection terminated.
Jan 22 23:23:05 dual kernel: usb 3-2: USB disconnect, address 5
Jan 22 23:23:05 dual kernel: option 3-2:1.0: device disconnected
Jan 22 23:23:05 dual kernel: option1 ttyUSB1: GSM modem (1-port) converter now disconnected from ttyUSB1
Jan 22 23:23:05 dual kernel: option 3-2:1.1: device disconnected
Jan 22 23:23:13 dual kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
Jan 22 23:23:44 dual last message repeated 3 times
Jan 22 23:23:54 dual kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1


2.6.24-rc7 works fine, not yet bisected, will do later in the evening.

-- 
maks

^ permalink raw reply

* Re: [PATCH 12/12 net-2.6.25] [NETNS]: Add namespace for ICMP replying code.
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2008-01-23  9:30 UTC (permalink / raw)
  To: mathieu.lacage; +Cc: davem, den, containers, netdev, yoshfuji
In-Reply-To: <1201079789.10778.26.camel@mistigri.inria.fr>

In article <1201079789.10778.26.camel@mistigri.inria.fr> (at Wed, 23 Jan 2008 10:16:29 +0100), Mathieu Lacage <mathieu.lacage@sophia.inria.fr> says:

> I have been following the netns patches on this ML for a while but I
> still have not figured out in which tree the patches fed to David Miller
> are applied. I have attempted to grep the public trees 'davem/net-2.6'
> and 'davem/net-2.6.25' but without much success so far. Is there a
> public git tree I can clone which contains all the netns patches which
> David Miller state are 'Applied' ?

I'm cloning from
	git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25.git

There may be some time-lag.

--yoshfuji

^ permalink raw reply

* Re: Assertions in latest kernels
From: Ilpo Järvinen @ 2008-01-23  9:17 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: David Miller, Netdev
In-Reply-To: <OF331AFFAC.CEF61228-ON652573D9.002CF47B-652573D9.002E0A84@in.ibm.com>

On Wed, 23 Jan 2008, Krishna Kumar2 wrote:

> David Miller <davem@davemloft.net> wrote on 01/23/2008 01:27:23 PM:
> 
> > > iperf with multiple threads almost always gets these 4, *especially*
> when I
> > > do some batching :).
> > >
> > > static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int
> flag)
> > > {
> > >       ...
> > >       if (WARN_ON(!tp->sacked_out && tp->fackets_out))
> > >             tp->fackets_out = 0;
> > >       ...
> > > }
> >
> > Does this assertion show up first or do you get the other TCP
> > ones first?  It might be important, in that if you get the
> > others ones first that corrupted state might be what leads to
> > this one.
> 
> Hi Dave,
> 
> I looked at my *old* messages file and found this assert (2506) was first
> to hit (atleast in
> two messages file). It hit 5 times, then I got a different one that I had
> not reported earlier:
> 
> "KERNEL: assertion (packets <= tp->packets_out) failed at
> net/ipv4/tcp_input.c (2139)"
> 
> (though this was hidden in my report under the panic for tcp_input.c:2528.
> 
> Then another two thousand times of the 2506 asserts.
> 
> Today I installed the latest untouched kernel, rebooted system and got the
> following errors
> in sequence, but no 2506 errors (which I have always got when running
> batching in the last
> 2-3 weeks):
> 
> Jan 22 02:07:55 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:07:56 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:07:56 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:07:57 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:07:58 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:07:59 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:07:59 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:00 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:01 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:01 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:02 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:03 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:03 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:04 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:05 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:06 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:06 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:07 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:1767
> Jan 22 02:08:07 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:08 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:09 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:10 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:10 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:11 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:12 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:12 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:13 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:14 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:15 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:15 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:1767
> Jan 22 02:08:16 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:1767
> Jan 22 02:08:16 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:1767
> Jan 22 02:08:17 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:18 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:18 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> Jan 22 02:08:19 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
> Jan 22 02:08:19 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
> 
> and so on for another 700 counts.
>
> The unique asserts are:
>       1767: tcp_verify_left_out (from tcp_entry_frto)
>       2169: tcp_verify_left_out (from tcp_mark_head_lost)
>       2528: tcp_verify_left_out (from tcp_fastretrans_alert)
>       3063: tcp_verify_left_out (from tcp_process_frto)
> (where 2169 seems to preceed any other asserts)

Once you get one of these, you'll get a large number of them, maybe I 
should just change it to WARN_ON_ONCE to stop confusing people with the 
rest.

> The other two asserts that I got only with batching are:
>       2139: BUG_TRAP(packets <= tp->packets_out); (in tcp_mark_head_lost)
>       2506: WARN_ON(!tp->sacked_out && tp->fackets_out) (in
> tcp_fastretrans_alert)
> (where 2506 always seems to preceed any other asserts).

It's almost impossible to know which of these is the main cause and the 
first occuring due to reasons I'll not copy here. What a strange thing
that it has been super quiet on this front until now everybody is seeing 
it, could there be something unrelated to TCP which has broken it all 
recently?

Good thing is that you seem to be able to reproduce it were nicely.
Please try with this beauty below... Hopefully got it correctly matching 
to mainline, in contrast to version I sent Dave Y., there's some added 
candy which catches highest_sack corruption as well, at least it compiles 
already :-).

-- 
 i.

[PATCH] TCP: debug S+L (for mainline, not compatible with mm or net-2.6.25)

---
 include/net/tcp.h     |    8 +++-
 net/ipv4/tcp_input.c  |   10 ++++
 net/ipv4/tcp_ipv4.c   |  123 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_output.c |   21 ++++++--
 4 files changed, 155 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index cb5b033..9ea62f9 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -272,6 +272,8 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
 #define TCP_ADD_STATS_BH(field, val)	SNMP_ADD_STATS_BH(tcp_statistics, field, val)
 #define TCP_ADD_STATS_USER(field, val)	SNMP_ADD_STATS_USER(tcp_statistics, field, val)
 
+extern void			tcp_verify_wq(struct sock *sk);
+
 extern void			tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void			tcp_shutdown (struct sock *sk, int how);
@@ -769,7 +771,11 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
 }
 
 /* Use define here intentionally to get WARN_ON location shown at the caller */
-#define tcp_verify_left_out(tp)	WARN_ON(tcp_left_out(tp) > tp->packets_out)
+#define tcp_verify_left_out(tp)	\
+	do { \
+		WARN_ON(tcp_left_out(tp) > tp->packets_out); \
+		tcp_verify_wq((struct sock *)tp); \
+	} while(0)
 
 extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
 extern __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b39f0d8..1e4db05 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2529,6 +2529,10 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	if (do_lost || tcp_head_timedout(sk))
 		tcp_update_scoreboard(sk);
 	tcp_cwnd_down(sk, flag);
+
+	WARN_ON(tcp_write_queue_head(sk) == NULL);
+	WARN_ON(!tp->packets_out);
+
 	tcp_xmit_retransmit_queue(sk);
 }
 
@@ -2744,6 +2748,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p,
 		tcp_clear_all_retrans_hints(tp);
 	}
 
+	tcp_verify_left_out(tp);
+
 	if (flag & FLAG_ACKED) {
 		u32 pkts_acked = prior_packets - tp->packets_out;
 		const struct tcp_congestion_ops *ca_ops
@@ -3070,6 +3076,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	prior_fackets = tp->fackets_out;
 	prior_in_flight = tcp_packets_in_flight(tp);
 
+	tcp_verify_left_out(tp);
+
 	if (!(flag&FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
 		/* Window is constant, pure forward advance.
 		 * No more checks are required.
@@ -3131,6 +3139,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag&FLAG_NOT_DUP))
 		dst_confirm(sk->sk_dst_cache);
 
+	tcp_verify_left_out(tp);
+
 	return 1;
 
 no_queue:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 652c323..9c56153 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -108,6 +108,129 @@ struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
 	.lhash_wait  = __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait),
 };
 
+void tcp_print_queue(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+	char s[50+1];
+	char h[50+1];
+	int idx = 0;
+	int i;
+
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+
+		for (i = 0; i < tcp_skb_pcount(skb); i++) {
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
+				s[idx] = 'S';
+				if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+					s[idx] = 'B';
+
+			} else if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST) {
+				s[idx] = 'L';
+			} else {
+				s[idx] = ' ';
+			}
+			if (s[idx] != ' ' && skb->len < tp->mss_cache)
+				s[idx] += 'a' - 'A';
+
+			if (i == 0) {
+				if ((TCP_SKB_CB(skb)->seq == tp->highest_sack) &&
+				    (tp->fastpath_skb_hint == skb))
+					h[idx] = 'x';
+				else if (TCP_SKB_CB(skb)->seq == tp->highest_sack)
+					h[idx] = 'h';
+				else if (tp->fastpath_skb_hint == skb)
+					h[idx] = 'f';
+				else
+					h[idx] = '+';
+			} else {
+				h[idx] = '-';
+			}
+
+			if (++idx >= 50) {
+				s[idx] = 0;
+				h[idx] = 0;
+				printk(KERN_ERR "TCP wq(s) %s\n", s);
+				printk(KERN_ERR "TCP wq(h) %s\n", h);
+				idx = 0;
+			}
+		}
+	}
+	if (idx) {
+		s[idx] = '<';
+		s[idx+1] = 0;
+		h[idx] = '<';
+		h[idx+1] = 0;
+		printk(KERN_ERR "TCP wq(s) %s\n", s);
+		printk(KERN_ERR "TCP wq(h) %s\n", h);
+	}
+	printk(KERN_ERR "l%u s%u f%u p%u seq: su%u hs%u sn%u\n",
+		tp->lost_out, tp->sacked_out, tp->fackets_out, tp->packets_out,
+		tp->snd_una, tp->highest_sack, tp->snd_nxt);
+}
+
+void tcp_verify_wq(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 lost = 0;
+	u32 sacked = 0;
+	u32 packets = 0;
+	u32 fackets = 0;
+	int hint_failed_at = -1;
+	int highest_sack_corrupt = 0;
+	struct sk_buff *skb;
+
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+
+		if (WARN_ON((tp->fastpath_skb_hint == skb) &&
+			    packets != tp->fastpath_cnt_hint))
+			hint_failed_at = packets;
+
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
+			sacked += tcp_skb_pcount(skb);
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+				printk(KERN_ERR "Sacked bitmap S+L: %u %u-%u/%u\n",
+					TCP_SKB_CB(skb)->sacked,
+					TCP_SKB_CB(skb)->end_seq - tp->snd_una,
+					TCP_SKB_CB(skb)->seq - tp->snd_una,
+					tp->snd_una);
+			fackets = sacked;
+			if (WARN_ON(after(TCP_SKB_CB(skb)->seq, tp->highest_sack)))
+				highest_sack_corrupt = 1;
+		}
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+			lost += tcp_skb_pcount(skb);
+
+		packets += tcp_skb_pcount(skb);
+	}
+
+	WARN_ON(lost != tp->lost_out);
+	WARN_ON(tcp_is_sack(tp) && (sacked != tp->sacked_out));
+	WARN_ON(packets != tp->packets_out);
+	WARN_ON(fackets != tp->fackets_out);
+
+	if ((lost != tp->lost_out) ||
+	    (tcp_is_sack(tp) && (sacked != tp->sacked_out)) ||
+	    (packets != tp->packets_out) ||
+	    (fackets != tp->fackets_out) ||
+	    (hint_failed_at != -1) ||
+	    highest_sack_corrupt) {
+		printk(KERN_ERR "P: %u L: %u vs %u S: %u vs %u F: %u vs %u H: %u vs %u w: %u-%u (%u, %u)\n",
+			tp->packets_out,
+			lost, tp->lost_out,
+			sacked, tp->sacked_out,
+			fackets, tp->fackets_out,
+			hint_failed_at, tp->fastpath_cnt_hint,
+			tp->snd_una, tp->snd_nxt,
+			tp->rx_opt.sack_ok, highest_sack_corrupt);
+		tcp_print_queue(sk);
+	}
+}
+
 static int tcp_v4_get_port(struct sock *sk, unsigned short snum)
 {
 	return inet_csk_get_port(&tcp_hashinfo, sk, snum,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f4c1eef..cce7531 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -768,10 +768,9 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 			tp->lost_out -= diff;
 
 		/* Adjust Reno SACK estimate. */
-		if (tcp_is_reno(tp) && diff > 0) {
+		if (tcp_is_reno(tp) && diff > 0)
 			tcp_dec_pcount_approx_int(&tp->sacked_out, diff);
-			tcp_verify_left_out(tp);
-		}
+
 		tcp_adjust_fackets_out(tp, skb, diff);
 	}
 
@@ -779,6 +778,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 	skb_header_release(buff);
 	tcp_insert_write_queue_after(skb, buff, sk);
 
+	tcp_verify_left_out(tp);
+
 	return 0;
 }
 
@@ -1439,6 +1440,7 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
 	} else if (result > 0) {
 		sent_pkts = 1;
 	}
+	tcp_verify_left_out(tp);
 
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
@@ -1760,6 +1762,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 
 		sk_stream_free_skb(sk, next_skb);
 	}
+	tcp_verify_left_out(tp);
 }
 
 /* Do a simple retransmit without using the backoff mechanisms in
@@ -1791,13 +1794,13 @@ void tcp_simple_retransmit(struct sock *sk)
 		}
 	}
 
+	tcp_verify_left_out(tp);
+
 	tcp_clear_all_retrans_hints(tp);
 
 	if (!lost)
 		return;
 
-	tcp_verify_left_out(tp);
-
 	/* Don't muck with the congestion window here.
 	 * Reason is that we do not increase amount of _data_
 	 * in network, but units changed and effective
@@ -1966,8 +1969,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 			 * packet to be MSS sized and all the
 			 * packet counting works out.
 			 */
-			if (tcp_packets_in_flight(tp) >= tp->snd_cwnd)
+			if (tcp_packets_in_flight(tp) >= tp->snd_cwnd) {
+				tcp_verify_left_out(tp);
 				return;
+			}
 
 			if (sacked & TCPCB_LOST) {
 				if (!(sacked&(TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))) {
@@ -1993,6 +1998,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		}
 	}
 
+	tcp_verify_left_out(tp);
+
 	/* OK, demanded retransmission is finished. */
 
 	/* Forward retransmissions are possible only during Recovery. */
@@ -2050,6 +2057,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 
 		NET_INC_STATS_BH(LINUX_MIB_TCPFORWARDRETRANS);
 	}
+
+	tcp_verify_left_out(tp);
 }
 
 
-- 
1.5.2.2


^ permalink raw reply related

* Re: [PATCH 12/12 net-2.6.25] [NETNS]: Add namespace for ICMP replying code.
From: Mathieu Lacage @ 2008-01-23  9:16 UTC (permalink / raw)
  To: David Miller
  Cc: containers-qjLDD68F18O7TbgM5vRIOg, den-GEFAQzZX7r8dnm+yROfE0A,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20080122.235130.84789113.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

hi,

On Tue, 2008-01-22 at 23:51 -0800, David Miller wrote:
> From: "Denis V. Lunev" <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
> Date: Wed, 23 Jan 2008 10:46:27 +0300
> 
> > All needed API is done, the namespace is available when required from the
> > device on the DST entry from the incoming packet. So, just replace init_net
> > with proper namespace.
> > 
> > Other protocols will follow.
> > 
> > Signed-off-by: Denis V. Lunev <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
> 
> Applied, thanks.

I have been following the netns patches on this ML for a while but I
still have not figured out in which tree the patches fed to David Miller
are applied. I have attempted to grep the public trees 'davem/net-2.6'
and 'davem/net-2.6.25' but without much success so far. Is there a
public git tree I can clone which contains all the netns patches which
David Miller state are 'Applied' ?

thank you,
Mathieu

^ permalink raw reply

* Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings
From: Ilpo Järvinen @ 2008-01-23  8:41 UTC (permalink / raw)
  To: Dave Young; +Cc: LKML, Netdev, Andrew Morton
In-Reply-To: <a8e1da0801222344l101c2b90s363821c26bc8cb55@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 9210 bytes --]

On Wed, 23 Jan 2008, Dave Young wrote:

> On Jan 23, 2008 3:41 PM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> >
> > On Tue, 22 Jan 2008, David Miller wrote:
> >
> > > From: "Dave Young" <hidave.darkstar@gmail.com>
> > > Date: Wed, 23 Jan 2008 09:44:30 +0800
> > >
> > > > On Jan 22, 2008 6:47 PM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> > > > > [PATCH] [TCP]: debug S+L
> > > >
> > > > Thanks, If there's new findings I will let you know.
> > >
> > > Thanks for helping with this bug Dave.
> >
> > I noticed btw that there thing might (is likely to) spuriously trigger at
> > WARN_ON(sacked != tp->sacked_out); because those won't be equal when SACK
> > is not enabled. If that does happen too often, I send a fixed patch for
> > it, yet, the fact that I print print tp->rx_opt.sack_ok allows
> > identification of those cases already as it's zero when SACK is not
> > enabled.
> >
> > Just ask if you need the updated debug patch.
> 
> Thanks,  please send, I would like to get it.

There you go. I fixed non-SACK case by adding tcp_is_sack checks there and 
also added two verifys to tcp_ack to see if there's corruption outside of 
TCP.

-- 
 i.

[PATCH] [TCP]: debug S+L

---
 include/net/tcp.h     |    8 +++-
 net/ipv4/tcp_input.c  |   10 +++++
 net/ipv4/tcp_ipv4.c   |  101 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_output.c |   21 +++++++---
 4 files changed, 133 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de4ea3..0685035 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -272,6 +272,8 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
 #define TCP_ADD_STATS_BH(field, val)	SNMP_ADD_STATS_BH(tcp_statistics, field, val)
 #define TCP_ADD_STATS_USER(field, val)	SNMP_ADD_STATS_USER(tcp_statistics, field, val)
 
+extern void			tcp_verify_wq(struct sock *sk);
+
 extern void			tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void			tcp_shutdown (struct sock *sk, int how);
@@ -768,7 +770,11 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
 }
 
 /* Use define here intentionally to get WARN_ON location shown at the caller */
-#define tcp_verify_left_out(tp)	WARN_ON(tcp_left_out(tp) > tp->packets_out)
+#define tcp_verify_left_out(tp)	\
+	do { \
+		WARN_ON(tcp_left_out(tp) > tp->packets_out); \
+		tcp_verify_wq((struct sock *)tp); \
+	} while(0)
 
 extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
 extern __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fa2c85c..cdacf70 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2645,6 +2645,10 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	if (do_lost || (tcp_is_fack(tp) && tcp_head_timedout(sk)))
 		tcp_update_scoreboard(sk, fast_rexmit);
 	tcp_cwnd_down(sk, flag);
+
+	WARN_ON(tcp_write_queue_head(sk) == NULL);
+	WARN_ON(!tp->packets_out);
+
 	tcp_xmit_retransmit_queue(sk);
 }
 
@@ -2848,6 +2852,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets)
 		tcp_clear_all_retrans_hints(tp);
 	}
 
+	tcp_verify_left_out(tp);
+
 	if (skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
 		flag |= FLAG_SACK_RENEGING;
 
@@ -3175,6 +3181,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	prior_fackets = tp->fackets_out;
 	prior_in_flight = tcp_packets_in_flight(tp);
 
+	tcp_verify_left_out(tp);
+
 	if (!(flag & FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
 		/* Window is constant, pure forward advance.
 		 * No more checks are required.
@@ -3237,6 +3245,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
 		dst_confirm(sk->sk_dst_cache);
 
+	tcp_verify_left_out(tp);
+
 	return 1;
 
 no_queue:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 9aea88b..7e8ab40 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -108,6 +108,107 @@ struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
 	.lhash_wait  = __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait),
 };
 
+void tcp_print_queue(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+	char s[50+1];
+	char h[50+1];
+	int idx = 0;
+	int i;
+
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+
+		for (i = 0; i < tcp_skb_pcount(skb); i++) {
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
+				s[idx] = 'S';
+				if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+					s[idx] = 'B';
+
+			} else if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST) {
+				s[idx] = 'L';
+			} else {
+				s[idx] = ' ';
+			}
+			if (s[idx] != ' ' && skb->len < tp->mss_cache)
+				s[idx] += 'a' - 'A';
+
+			if (i == 0) {
+				if (TCP_SKB_CB(skb)->seq == tcp_highest_sack_seq(tp))
+					h[idx] = 'h';
+				else
+					h[idx] = '+';
+			} else {
+				h[idx] = '-';
+			}
+
+			if (++idx >= 50) {
+				s[idx] = 0;
+				h[idx] = 0;
+				printk(KERN_ERR "TCP wq(s) %s\n", s);
+				printk(KERN_ERR "TCP wq(h) %s\n", h);
+				idx = 0;
+			}
+		}
+	}
+	if (idx) {
+		s[idx] = '<';
+		s[idx+1] = 0;
+		h[idx] = '<';
+		h[idx+1] = 0;
+		printk(KERN_ERR "TCP wq(s) %s\n", s);
+		printk(KERN_ERR "TCP wq(h) %s\n", h);
+	}
+	printk(KERN_ERR "l%u s%u f%u p%u seq: su%u hs%u sn%u\n",
+		tp->lost_out, tp->sacked_out, tp->fackets_out, tp->packets_out,
+		tp->snd_una, tcp_highest_sack_seq(tp), tp->snd_nxt);
+}
+
+void tcp_verify_wq(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 lost = 0;
+	u32 sacked = 0;
+	u32 packets = 0;
+	struct sk_buff *skb;
+
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
+			sacked += tcp_skb_pcount(skb);
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+			printk(KERN_ERR "Sacked bitmap S+L: %u %u-%u/%u\n",
+				TCP_SKB_CB(skb)->sacked,
+				TCP_SKB_CB(skb)->end_seq - tp->snd_una,
+				TCP_SKB_CB(skb)->seq - tp->snd_una,
+				tp->snd_una);
+		}
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+			lost += tcp_skb_pcount(skb);
+
+		packets += tcp_skb_pcount(skb);
+	}
+
+	WARN_ON(lost != tp->lost_out);
+	WARN_ON(tcp_is_sack(tp) && (sacked != tp->sacked_out));
+	WARN_ON(packets != tp->packets_out);
+	if ((lost != tp->lost_out) ||
+	    (tcp_is_sack(tp) && (sacked != tp->sacked_out)) ||
+	    (packets != tp->packets_out)) {
+		printk(KERN_ERR "P: %u L: %u vs %u S: %u vs %u w: %u-%u (%u)\n",
+			tp->packets_out,
+			lost, tp->lost_out,
+			sacked, tp->sacked_out,
+			tp->snd_una, tp->snd_nxt,
+		       	tp->rx_opt.sack_ok);
+		tcp_print_queue(sk);
+	}
+}
+
 static int tcp_v4_get_port(struct sock *sk, unsigned short snum)
 {
 	return inet_csk_get_port(&tcp_hashinfo, sk, snum,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 89f0188..648340f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -779,10 +779,9 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 			tp->lost_out -= diff;
 
 		/* Adjust Reno SACK estimate. */
-		if (tcp_is_reno(tp) && diff > 0) {
+		if (tcp_is_reno(tp) && diff > 0)
 			tcp_dec_pcount_approx_int(&tp->sacked_out, diff);
-			tcp_verify_left_out(tp);
-		}
+
 		tcp_adjust_fackets_out(sk, skb, diff);
 	}
 
@@ -790,6 +789,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 	skb_header_release(buff);
 	tcp_insert_write_queue_after(skb, buff, sk);
 
+	tcp_verify_left_out(tp);
+
 	return 0;
 }
 
@@ -1463,6 +1464,7 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
 	} else if (result > 0) {
 		sent_pkts = 1;
 	}
+	tcp_verify_left_out(tp);
 
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
@@ -1764,6 +1766,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb,
 	tcp_clear_retrans_hints_partial(tp);
 
 	sk_wmem_free_skb(sk, next_skb);
+	tcp_verify_left_out(tp);
 }
 
 /* Do a simple retransmit without using the backoff mechanisms in
@@ -1795,13 +1798,13 @@ void tcp_simple_retransmit(struct sock *sk)
 		}
 	}
 
+	tcp_verify_left_out(tp);
+
 	tcp_clear_all_retrans_hints(tp);
 
 	if (!lost)
 		return;
 
-	tcp_verify_left_out(tp);
-
 	/* Don't muck with the congestion window here.
 	 * Reason is that we do not increase amount of _data_
 	 * in network, but units changed and effective
@@ -1970,8 +1973,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 			 * packet to be MSS sized and all the
 			 * packet counting works out.
 			 */
-			if (tcp_packets_in_flight(tp) >= tp->snd_cwnd)
+			if (tcp_packets_in_flight(tp) >= tp->snd_cwnd) {
+				tcp_verify_left_out(tp);
 				return;
+			}
 
 			if (sacked & TCPCB_LOST) {
 				if (!(sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))) {
@@ -1997,6 +2002,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		}
 	}
 
+	tcp_verify_left_out(tp);
+
 	/* OK, demanded retransmission is finished. */
 
 	/* Forward retransmissions are possible only during Recovery. */
@@ -2054,6 +2061,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 
 		NET_INC_STATS_BH(LINUX_MIB_TCPFORWARDRETRANS);
 	}
+
+	tcp_verify_left_out(tp);
 }
 
 /* Send a fin.  The caller locks the socket for us.  This cannot be
-- 
1.5.2.2

^ permalink raw reply related

* Re: Assertions in latest kernels
From: Krishna Kumar2 @ 2008-01-23  8:22 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20080122.235723.52174015.davem@davemloft.net>

David Miller <davem@davemloft.net> wrote on 01/23/2008 01:27:23 PM:

> > iperf with multiple threads almost always gets these 4, *especially*
when I
> > do some batching :).
> >
> > static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int
flag)
> > {
> >       ...
> >       if (WARN_ON(!tp->sacked_out && tp->fackets_out))
> >             tp->fackets_out = 0;
> >       ...
> > }
>
> Does this assertion show up first or do you get the other TCP
> ones first?  It might be important, in that if you get the
> others ones first that corrupted state might be what leads to
> this one.

Hi Dave,

I looked at my *old* messages file and found this assert (2506) was first
to hit (atleast in
two messages file). It hit 5 times, then I got a different one that I had
not reported earlier:

"KERNEL: assertion (packets <= tp->packets_out) failed at
net/ipv4/tcp_input.c (2139)"

(though this was hidden in my report under the panic for tcp_input.c:2528.

Then another two thousand times of the 2506 asserts.

Today I installed the latest untouched kernel, rebooted system and got the
following errors
in sequence, but no 2506 errors (which I have always got when running
batching in the last
2-3 weeks):

Jan 22 02:07:55 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:07:56 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:07:56 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:07:57 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:07:58 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:07:59 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:07:59 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:00 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:01 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:01 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:02 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:03 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:03 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:04 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:05 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:06 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:06 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:07 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:1767
Jan 22 02:08:07 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:08 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:09 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:10 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:10 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:11 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:12 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:12 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:13 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:14 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:15 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:15 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:1767
Jan 22 02:08:16 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:1767
Jan 22 02:08:16 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:1767
Jan 22 02:08:17 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:18 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:18 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169
Jan 22 02:08:19 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2528
Jan 22 02:08:19 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2169

and so on for another 700 counts.

The unique asserts are:
      1767: tcp_verify_left_out (from tcp_entry_frto)
      2169: tcp_verify_left_out (from tcp_mark_head_lost)
      2528: tcp_verify_left_out (from tcp_fastretrans_alert)
      3063: tcp_verify_left_out (from tcp_process_frto)
(where 2169 seems to preceed any other asserts)

The other two asserts that I got only with batching are:
      2139: BUG_TRAP(packets <= tp->packets_out); (in tcp_mark_head_lost)
      2506: WARN_ON(!tp->sacked_out && tp->fackets_out) (in
tcp_fastretrans_alert)
(where 2506 always seems to preceed any other asserts).

thanks,

- KK


^ permalink raw reply

* Re: Assertions in latest kernels
From: David Miller @ 2008-01-23  7:57 UTC (permalink / raw)
  To: krkumar2; +Cc: netdev
In-Reply-To: <OFC47DE839.9B2F85A5-ON652573D9.002BA994-652573D9.002B38BA@in.ibm.com>

From: Krishna Kumar2 <krkumar2@in.ibm.com>
Date: Wed, 23 Jan 2008 13:22:05 +0530

> Sorry for the duplicate bug report, I just saw that Dave Young had already
> reported three of the
> 4 assertions that I was getting. The only different assert I am getting is:
> 
> iperf with multiple threads almost always gets these 4, *especially* when I
> do some batching :).
> 
> static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
> {
>       ...
>       if (WARN_ON(!tp->sacked_out && tp->fackets_out))
>             tp->fackets_out = 0;
>       ...
> }

Does this assertion show up first or do you get the other TCP
ones first?  It might be important, in that if you get the
others ones first that corrupted state might be what leads to
this one.


^ permalink raw reply

* Re: Assertions in latest kernels
From: Krishna Kumar2 @ 2008-01-23  7:52 UTC (permalink / raw)
  To: netdev
In-Reply-To: <OFBE734C49.5B16C8E9-ON652573D9.002548B8-652573D9.0026407B@in.ibm.com>

Sorry for the duplicate bug report, I just saw that Dave Young had already
reported three of the
4 assertions that I was getting. The only different assert I am getting is:

iperf with multiple threads almost always gets these 4, *especially* when I
do some batching :).

static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
{
      ...
      if (WARN_ON(!tp->sacked_out && tp->fackets_out))
            tp->fackets_out = 0;
      ...
}

Jan 13 20:57:08 elm3b39 kernel: ------------[ cut here ]------------
Jan 13 20:57:08 elm3b39 kernel: Badness at net/ipv4/tcp_input.c:2506
Jan 13 20:57:08 elm3b39 kernel: NIP: c0000000003a0358 LR: c0000000003a0808 CTR: c0000000003d7d90
Jan 13 20:57:08 elm3b39 kernel: REGS: c0000000e60f6e00 TRAP: 0700   Not tainted  (2.6.24-rc6)
Jan 13 20:57:08 elm3b39 kernel: MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24024088  XER: 00000010
Jan 13 20:57:08 elm3b39 kernel: TASK = c0000001d20d1060[0] 'swapper' THREAD: c0000000e60f4000 CPU: 5
Jan 13 20:57:08 elm3b39 kernel: GPR00: 0000000000000001 c0000000e60f7080 c0000000005f1f00 c0000000dcccd280
Jan 13 20:57:08 elm3b39 kernel: GPR04: 000000009845ceac 0000000000000000 00000000000000fb 00000000000000f7
Jan 13 20:57:08 elm3b39 kernel: GPR08: 0000000000000000 0000000000000001 0000000000000000 c0000000003d8230
Jan 13 20:57:08 elm3b39 kernel: GPR12: 0000000044000022 c000000000554380 0000000000000000 0000000003ab69d3
Jan 13 20:57:08 elm3b39 kernel: GPR16: 00000000000000fa 0000000000000008 0000000000000008 c0000000dcccd348
Jan 13 20:57:08 elm3b39 kernel: GPR20: 000000009845ceac 0000000000000001 0000000000000000 0000000000000004
Jan 13 20:57:08 elm3b39 kernel: GPR24: 0000000044004082 0000000024004082 c0000000dcccd280 000000000000050e
Jan 13 20:57:08 elm3b39 kernel: GPR28: 0000000000000003 00000000000000f7 c0000000005ba670 000000000000050e
Jan 13 20:57:08 elm3b39 kernel: NIP [c0000000003a0358] .tcp_ack+0xa98/0x1fe0
Jan 13 20:57:08 elm3b39 kernel: LR [c0000000003a0808] .tcp_ack+0xf48/0x1fe0
Jan 13 20:57:08 elm3b39 kernel: Call Trace:
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f7080] [c0000000003a0808] .tcp_ack+0xf48/0x1fe0 (unreliable)
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f71e0] [c0000000003a5a24] .tcp_rcv_established+0x4d4/0x8f0
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f7290] [c0000000003add54] .tcp_v4_do_rcv+0x134/0x2a0
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f7350] [c0000000003b013c] .tcp_v4_rcv+0x75c/0x8d0
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f7410] [c000000000389fa0] .ip_local_deliver+0xf0/0x2d0
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f74a0] [c00000000038a4b4] .ip_rcv+0x334/0x6d0
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f7570] [c000000000358e6c] .netif_receive_skb+0x32c/0x630
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f7650] [d00000000040d27c] .ipoib_ib_handle_rx_wc+0x1dc/0x3a0 [ib_ipoib]
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f7700] [d00000000040ea6c] .ipoib_poll+0x24c/0x2b0 [ib_ipoib]
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f77e0] [c00000000035c830] .net_rx_action+0x1f0/0x2a0
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f78a0] [c00000000005af78] .__do_softirq+0xe8/0x1e0
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f7960] [c00000000000c4a4] .do_softirq+0x64/0xa0
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f79e0] [c00000000005b134] .irq_exit+0x74/0x90
Jan 13 20:57:08 elm3b39 kernel: [c0000000e60f7a60] [c00000000000cd90] .do_IRQ+0xe0/0x120
Jan 13 20:57:09 elm3b39 kernel: [c0000000e60f7ae0] [c000000000004780] hardware_interrupt_entry+0x18/0x98
Jan 13 20:57:09 elm3b39 kernel: --- Exception: 501 at .pseries_dedicated_idle_sleep+0xe8/0x1c0
Jan 13 20:57:09 elm3b39 kernel:     LR = .pseries_dedicated_idle_sleep+0xe0/0x1c0
Jan 13 20:57:09 elm3b39 kernel: [c0000000e60f7dd0] [0000000000000000] .__start+0x4000000000000000/0x8 (unreliable)
Jan 13 20:57:09 elm3b39 kernel: [c0000000e60f7e70] [c000000000011fec] .cpu_idle+0x13c/0x250
Jan 13 20:57:09 elm3b39 kernel: [c0000000e60f7f00] [c00000000002bd5c] .start_secondary+0x14c/0x190
Jan 13 20:57:09 elm3b39 kernel: [c0000000e60f7f90] [c000000000008364] .start_secondary_prolog+0xc/0x10
Jan 13 20:57:09 elm3b39 kernel: Instruction dump:
Jan 13 20:57:09 elm3b39 kernel: 815a055c 39200000 38000000 2f8a0000 409e0020 801a0560 2f800000 7d200026
Jan 13 20:57:09 elm3b39 kernel: 5529fffe 69290001 79290620 7d204b78 <0b090000> 2fa00000 409e1464 7f280120


^ permalink raw reply

* Re: [PATCH 12/12 net-2.6.25] [NETNS]: Add namespace for ICMP replying code.
From: David Miller @ 2008-01-23  7:51 UTC (permalink / raw)
  To: den; +Cc: netdev, devel, containers
In-Reply-To: <1201074387-7366-3-git-send-email-den@openvz.org>

From: "Denis V. Lunev" <den@openvz.org>
Date: Wed, 23 Jan 2008 10:46:27 +0300

> All needed API is done, the namespace is available when required from the
> device on the DST entry from the incoming packet. So, just replace init_net
> with proper namespace.
> 
> Other protocols will follow.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 11/12 net-2.6.25] [NETNS]: Routing cache virtualization.
From: David Miller @ 2008-01-23  7:51 UTC (permalink / raw)
  To: den; +Cc: netdev, devel, containers
In-Reply-To: <1201074387-7366-2-git-send-email-den@openvz.org>

From: "Denis V. Lunev" <den@openvz.org>
Date: Wed, 23 Jan 2008 10:46:26 +0300

> Basically, this piece looks relatively easy. Namespace is already available
> on the dst entry via device and the device is safe to dereferrence. Compare
> it with one of a searcher and skip entry if appropriate.
> 
> The only exception is ip_rt_frag_needed. So, add namespace parameter to it.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>

Applied.

^ permalink raw reply

* Re: [PATCH 10/12 net-2.6.25] [NETNS]: Correct namespace for connect-time routing.
From: David Miller @ 2008-01-23  7:51 UTC (permalink / raw)
  To: den; +Cc: netdev, devel, containers
In-Reply-To: <1201074387-7366-1-git-send-email-den@openvz.org>

From: "Denis V. Lunev" <den@openvz.org>
Date: Wed, 23 Jan 2008 10:46:25 +0300

> ip_route_connect and ip_route_newports are a part of routing API presented to
> the socket layer. The namespace is available inside them through a socket.
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>

Applied.

^ permalink raw reply

* [PATCH 10/12 net-2.6.25] [NETNS]: Correct namespace for connect-time routing.
From: Denis V. Lunev @ 2008-01-23  7:46 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <479612BE.8030409@openvz.org>

ip_route_connect and ip_route_newports are a part of routing API presented to
the socket layer. The namespace is available inside them through a socket.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 include/net/route.h |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index d9b876a..1985d82 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -33,6 +33,7 @@
 #include <linux/ip.h>
 #include <linux/cache.h>
 #include <linux/security.h>
+#include <net/sock.h>
 
 #ifndef __KERNEL__
 #warning This file is not supposed to be used outside of kernel.
@@ -157,8 +158,9 @@ static inline int ip_route_connect(struct rtable **rp, __be32 dst,
 					 .dport = dport } } };
 
 	int err;
+	struct net *net = sk->sk_net;
 	if (!dst || !src) {
-		err = __ip_route_output_key(&init_net, rp, &fl);
+		err = __ip_route_output_key(net, rp, &fl);
 		if (err)
 			return err;
 		fl.fl4_dst = (*rp)->rt_dst;
@@ -167,7 +169,7 @@ static inline int ip_route_connect(struct rtable **rp, __be32 dst,
 		*rp = NULL;
 	}
 	security_sk_classify_flow(sk, &fl);
-	return ip_route_output_flow(&init_net, rp, &fl, sk, flags);
+	return ip_route_output_flow(net, rp, &fl, sk, flags);
 }
 
 static inline int ip_route_newports(struct rtable **rp, u8 protocol,
@@ -184,7 +186,7 @@ static inline int ip_route_newports(struct rtable **rp, u8 protocol,
 		ip_rt_put(*rp);
 		*rp = NULL;
 		security_sk_classify_flow(sk, &fl);
-		return ip_route_output_flow(&init_net, rp, &fl, sk, 0);
+		return ip_route_output_flow(sk->sk_net, rp, &fl, sk, 0);
 	}
 	return 0;
 }
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 12/12 net-2.6.25] [NETNS]: Add namespace for ICMP replying code.
From: Denis V. Lunev @ 2008-01-23  7:46 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <479612BE.8030409@openvz.org>

All needed API is done, the namespace is available when required from the
device on the DST entry from the incoming packet. So, just replace init_net
with proper namespace.

Other protocols will follow.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 net/ipv4/icmp.c      |   21 +++++++++++++--------
 net/ipv4/ip_output.c |    2 +-
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 052b278..a6c092c 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -404,7 +404,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 						.tos = RT_TOS(ip_hdr(skb)->tos) } },
 				    .proto = IPPROTO_ICMP };
 		security_skb_classify_flow(skb, &fl);
-		if (ip_route_output_key(&init_net, &rt, &fl))
+		if (ip_route_output_key(rt->u.dst.dev->nd_net, &rt, &fl))
 			goto out_unlock;
 	}
 	if (icmpv4_xrlim_allow(rt, icmp_param->data.icmph.type,
@@ -436,9 +436,11 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 	struct ipcm_cookie ipc;
 	__be32 saddr;
 	u8  tos;
+	struct net *net;
 
 	if (!rt)
 		goto out;
+	net = rt->u.dst.dev->nd_net;
 
 	/*
 	 *	Find the original header. It is expected to be valid, of course.
@@ -514,7 +516,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 		struct net_device *dev = NULL;
 
 		if (rt->fl.iif && sysctl_icmp_errors_use_inbound_ifaddr)
-			dev = dev_get_by_index(&init_net, rt->fl.iif);
+			dev = dev_get_by_index(net, rt->fl.iif);
 
 		if (dev) {
 			saddr = inet_select_addr(dev, 0, RT_SCOPE_LINK);
@@ -569,7 +571,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 		struct rtable *rt2;
 
 		security_skb_classify_flow(skb_in, &fl);
-		if (__ip_route_output_key(&init_net, &rt, &fl))
+		if (__ip_route_output_key(net, &rt, &fl))
 			goto out_unlock;
 
 		/* No need to clone since we're just using its address. */
@@ -591,14 +593,14 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 		if (xfrm_decode_session_reverse(skb_in, &fl, AF_INET))
 			goto out_unlock;
 
-		if (inet_addr_type(&init_net, fl.fl4_src) == RTN_LOCAL)
-			err = __ip_route_output_key(&init_net, &rt2, &fl);
+		if (inet_addr_type(net, fl.fl4_src) == RTN_LOCAL)
+			err = __ip_route_output_key(net, &rt2, &fl);
 		else {
 			struct flowi fl2 = {};
 			struct dst_entry *odst;
 
 			fl2.fl4_dst = fl.fl4_src;
-			if (ip_route_output_key(&init_net, &rt2, &fl2))
+			if (ip_route_output_key(net, &rt2, &fl2))
 				goto out_unlock;
 
 			/* Ugh! */
@@ -666,6 +668,9 @@ static void icmp_unreach(struct sk_buff *skb)
 	int hash, protocol;
 	struct net_protocol *ipprot;
 	u32 info = 0;
+	struct net *net;
+
+	net = skb->dst->dev->nd_net;
 
 	/*
 	 *	Incomplete header ?
@@ -696,7 +701,7 @@ static void icmp_unreach(struct sk_buff *skb)
 							 "and DF set.\n",
 					       NIPQUAD(iph->daddr));
 			} else {
-				info = ip_rt_frag_needed(&init_net, iph,
+				info = ip_rt_frag_needed(net, iph,
 						     ntohs(icmph->un.frag.mtu));
 				if (!info)
 					goto out;
@@ -734,7 +739,7 @@ static void icmp_unreach(struct sk_buff *skb)
 	 */
 
 	if (!sysctl_icmp_ignore_bogus_error_responses &&
-	    inet_addr_type(&init_net, iph->daddr) == RTN_BROADCAST) {
+	    inet_addr_type(net, iph->daddr) == RTN_BROADCAST) {
 		if (net_ratelimit())
 			printk(KERN_WARNING "%u.%u.%u.%u sent an invalid ICMP "
 					    "type %u, code %u "
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 6a5b839..4fad239 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1377,7 +1377,7 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
 						 .dport = tcp_hdr(skb)->source } },
 				    .proto = sk->sk_protocol };
 		security_skb_classify_flow(skb, &fl);
-		if (ip_route_output_key(&init_net, &rt, &fl))
+		if (ip_route_output_key(sk->sk_net, &rt, &fl))
 			return;
 	}
 
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 11/12 net-2.6.25] [NETNS]: Routing cache virtualization.
From: Denis V. Lunev @ 2008-01-23  7:46 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <479612BE.8030409@openvz.org>

Basically, this piece looks relatively easy. Namespace is already available
on the dst entry via device and the device is safe to dereferrence. Compare
it with one of a searcher and skip entry if appropriate.

The only exception is ip_rt_frag_needed. So, add namespace parameter to it.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 include/net/route.h |    2 +-
 net/ipv4/icmp.c     |    2 +-
 net/ipv4/route.c    |   21 ++++++++++++++++-----
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 1985d82..4eabf00 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -115,7 +115,7 @@ extern int		__ip_route_output_key(struct net *, struct rtable **, const struct f
 extern int		ip_route_output_key(struct net *, struct rtable **, struct flowi *flp);
 extern int		ip_route_output_flow(struct net *, struct rtable **rp, struct flowi *flp, struct sock *sk, int flags);
 extern int		ip_route_input(struct sk_buff*, __be32 dst, __be32 src, u8 tos, struct net_device *devin);
-extern unsigned short	ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu);
+extern unsigned short	ip_rt_frag_needed(struct net *net, struct iphdr *iph, unsigned short new_mtu);
 extern void		ip_rt_send_redirect(struct sk_buff *skb);
 
 extern unsigned		inet_addr_type(struct net *net, __be32 addr);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index c04aac5..052b278 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -696,7 +696,7 @@ static void icmp_unreach(struct sk_buff *skb)
 							 "and DF set.\n",
 					       NIPQUAD(iph->daddr));
 			} else {
-				info = ip_rt_frag_needed(iph,
+				info = ip_rt_frag_needed(&init_net, iph,
 						     ntohs(icmph->un.frag.mtu));
 				if (!info)
 					goto out;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 87076c6..07dd295 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -648,6 +648,11 @@ static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
 		(fl1->iif ^ fl2->iif)) == 0;
 }
 
+static inline int compare_netns(struct rtable *rt1, struct rtable *rt2)
+{
+	return rt1->u.dst.dev->nd_net == rt2->u.dst.dev->nd_net;
+}
+
 /*
  * Perform a full scan of hash table and free all entries.
  * Can be called by a softirq or a process.
@@ -961,7 +966,7 @@ restart:
 
 	spin_lock_bh(rt_hash_lock_addr(hash));
 	while ((rth = *rthp) != NULL) {
-		if (compare_keys(&rth->fl, &rt->fl)) {
+		if (compare_keys(&rth->fl, &rt->fl) && compare_netns(rth, rt)) {
 			/* Put it first */
 			*rthp = rth->u.dst.rt_next;
 			/*
@@ -1415,7 +1420,8 @@ static __inline__ unsigned short guess_mtu(unsigned short old_mtu)
 	return 68;
 }
 
-unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu)
+unsigned short ip_rt_frag_needed(struct net *net, struct iphdr *iph,
+				 unsigned short new_mtu)
 {
 	int i;
 	unsigned short old_mtu = ntohs(iph->tot_len);
@@ -1438,7 +1444,8 @@ unsigned short ip_rt_frag_needed(struct iphdr *iph, unsigned short new_mtu)
 			    rth->rt_dst  == daddr &&
 			    rth->rt_src  == iph->saddr &&
 			    rth->fl.iif == 0 &&
-			    !(dst_metric_locked(&rth->u.dst, RTAX_MTU))) {
+			    !(dst_metric_locked(&rth->u.dst, RTAX_MTU)) &&
+			    rth->u.dst.dev->nd_net == net) {
 				unsigned short mtu = new_mtu;
 
 				if (new_mtu < 68 || new_mtu >= old_mtu) {
@@ -2049,7 +2056,9 @@ int ip_route_input(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	struct rtable * rth;
 	unsigned	hash;
 	int iif = dev->ifindex;
+	struct net *net;
 
+	net = skb->dev->nd_net;
 	tos &= IPTOS_RT_MASK;
 	hash = rt_hash(daddr, saddr, iif);
 
@@ -2061,7 +2070,8 @@ int ip_route_input(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		    rth->fl.iif == iif &&
 		    rth->fl.oif == 0 &&
 		    rth->fl.mark == skb->mark &&
-		    rth->fl.fl4_tos == tos) {
+		    rth->fl.fl4_tos == tos &&
+		    rth->u.dst.dev->nd_net == net) {
 			dst_use(&rth->u.dst, jiffies);
 			RT_CACHE_STAT_INC(in_hit);
 			rcu_read_unlock();
@@ -2459,7 +2469,8 @@ int __ip_route_output_key(struct net *net, struct rtable **rp,
 		    rth->fl.oif == flp->oif &&
 		    rth->fl.mark == flp->mark &&
 		    !((rth->fl.fl4_tos ^ flp->fl4_tos) &
-			    (IPTOS_RT_MASK | RTO_ONLINK))) {
+			    (IPTOS_RT_MASK | RTO_ONLINK)) &&
+		    rth->u.dst.dev->nd_net == net) {
 			dst_use(&rth->u.dst, jiffies);
 			RT_CACHE_STAT_INC(out_hit);
 			rcu_read_unlock_bh();
-- 
1.5.3.rc5


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox