Netdev List
* [PATCH net-2.6.25 0/10] Make fragments live in net namespaces
From: Pavel Emelyanov @ 2008-01-22 13:52 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel

The overall design I propose is to keep the hash table
global and tag each inet_frag_queue with its net. Since the
fragments hash is going to be resizable, it is OK to keep
fragments from different namespaces in one hash.

To speed up eviction, the LRU list is made per namespace.

As far as the CTL-tuned variables are concerned, the
timeout and thresholds are made per namespace, since
they make sense per namespace, but the secret rebuild
interval is read-only in sub-namespaces.

Since the fragment management code is consolidated for ipv4
and ipv6, I convert both in one go. The conntrack_reasm
netns-ization is not done: we have to make at least the
core netfilter per namespace first, but this reasm code
is patched to keep working in the initial namespace.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
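
For readers skimming the archive, the design above (one global hash,
each queue tagged with its namespace) can be sketched in user space as
follows. This is only an illustration and not code from the series: the
names frag_queue, frag_insert, frag_find and FRAG_HASHSZ, as well as the
single-integer key, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAG_HASHSZ 64			/* hypothetical table size */

struct net { int id; };			/* stand-in for a network namespace */

struct frag_queue {
	struct frag_queue *next;	/* hash chain */
	struct net *net;		/* namespace tag */
	uint32_t id;			/* simplified lookup key */
};

/* One table shared by all namespaces. */
static struct frag_queue *frag_hash[FRAG_HASHSZ];

static unsigned int frag_hashfn(uint32_t id)
{
	return id % FRAG_HASHSZ;
}

static void frag_insert(struct frag_queue *q)
{
	unsigned int h;

	assert(q != NULL && q->net != NULL);
	h = frag_hashfn(q->id);
	q->next = frag_hash[h];
	frag_hash[h] = q;
}

/* A queue matches only if both the key and the namespace match. */
static struct frag_queue *frag_find(struct net *net, uint32_t id)
{
	struct frag_queue *q;

	for (q = frag_hash[frag_hashfn(id)]; q; q = q->next)
		if (q->id == id && q->net == net)
			return q;
	return NULL;
}
```

Two namespaces can then hold queues with the same key in the same
bucket without seeing each other's fragments.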

^ permalink raw reply

* Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
From: Jarek Poplawski @ 2008-01-22 12:29 UTC (permalink / raw)
  To: jamal; +Cc: David Miller, netdev, slavon, kaber
In-Reply-To: <1201002127.4443.32.camel@localhost>

On Tue, Jan 22, 2008 at 06:42:07AM -0500, jamal wrote:
...
> Jarek,
> 
> That looks different from the suggestion from Dave.

Hmm..., I'm not sure whether you mean my suggestion or yours here, but you
are right anyway...

> May I throw in another bone? Theoretically I can see why it would be a
> really bad idea to walk 50K estimators every time you delete one, which
> is horrible if you are trying to destroy, say, 50K of them and gets
> worse as the number of schedulers with 50K classes goes up.
> 
> But I am wondering why a simpler list couldn't be walked, meaning:
> 
> In gen_kill_estimator(), instead of:
> 
> for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {
> 
> Would deriving a better initial index be a big improvement?
> for (e = elist[est->interval].list; e; e = e->next) {

Maybe I'm missing something, but there could still be a lot of this walking,
and IMHO any such long wait with BHs disabled is hard to accept given
current memory sizes and the price of low latency. And time currently
seems to be even more precious here: RCU can't even free any
gen_estimator memory while such a large qdisc with classes is deleted.

Thanks,
Jarek P.

^ permalink raw reply

* forcedeth oops
From: Andrew Brooks @ 2008-01-22 11:54 UTC (permalink / raw)
  To: netdev

Hello

I'm getting an oops in forcedeth whenever I shut down, details below.

I've tried kernel 2.6.16.59 and the latest forcedeth.c from nvidia.com
which is package-1.23 version-0.62 date-2007/04/27.

How can I download the latest forcedeth.c (including the 2008-01-13 patches)?
It's not in the latest snapshot, linux-2.6.24-rc8.

Also, why is the version on nvidia.com not just older than the one in
the kernel, but apparently forked from it back in May 2006?  Has there
been independent development on each version?  They should be the same!

Here's the diff:
<  *    0.56: 22 Mar 2006: Additional ethtool and moduleparam support.
<  *    0.57: 14 May 2006: Moved mac address writes to nv_probe and nv_remove.
<  *    0.58: 20 May 2006: Optimized rx and tx data paths.
<  *    0.59: 31 May 2006: Added support for sideband management unit.
<  *    0.60: 31 May 2006: Added support for recoverable error.
<  *    0.61: 18 Jul 2006: Added support for suspend/resume.
<  *    0.62: 16 Jan 2007: Fixed statistics, mgmt communication, and low phy speed on S5.
---
>  *    0.56: 22 Mar 2006: Additional ethtool config and moduleparam support.
>  *    0.57: 14 May 2006: Mac address set in probe/remove and order corrections.
>  *    0.58: 30 Oct 2006: Added support for sideband management unit.
>  *    0.59: 30 Oct 2006: Added support for recoverable error.
>  *    0.60: 20 Jan 2007: Code optimizations for rings, rx & tx data paths, and stats.


Here are the details of the oops:
md: md0 switched to read-only mode.
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
f8ccdd55
*pde = 36c6a001
Oops: 0000 [#1]
SMP
Modules linked in: nvidia ... forcedeth ... sata_nv
CPU: 1
EIP:
EFLAGS: 00010286 (2.6.16.59 #1)
EIP is at nv_suspend+0x85/0x350 [forcedeth]
eax:
esi:
ds:
Process reboot
Stack:
Call Trace:
show_stack_log
show_registers
die
do_page_fault
error_code
nv_reboot_handler
notifier_call_chain
kernel_restart_prepare
kernel_restart
sys_reboot
sysenter_past_esp
Code: 8b 8c 3a 98 01 00 00 01 c8 8b ...
INIT: no more processes left in this runlevel


Andrew

^ permalink raw reply

* Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
From: jamal @ 2008-01-22 11:42 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, netdev, slavon, kaber
In-Reply-To: <20080122072152.GA977@ff.dom.local>

On Tue, 2008-22-01 at 08:21 +0100, Jarek Poplawski wrote:
> On 22-01-2008 01:29, David Miller wrote:
> ...
> > Fix this right, make a structure like:
> > 
> > struct kernel_gnet_stats_rate_est {
> > 	struct gnet_stats_rate_est	est;
> > 	void				*gen_estimator;
> > }
> > 
> > And update all the code as needed.
> 
> Thanks!
>  I'll try this...

Jarek,

That looks different from the suggestion from Dave.

May I throw in another bone? Theoretically I can see why it would be a
really bad idea to walk 50K estimators every time you delete one, which
is horrible if you are trying to destroy, say, 50K of them and gets
worse as the number of schedulers with 50K classes goes up.

But I am wondering why a simpler list couldn't be walked, meaning:

In gen_kill_estimator(), instead of:

for (idx=0; idx <= EST_MAX_INTERVAL; idx++) {

Would deriving a better initial index be a big improvement?
for (e = elist[est->interval].list; e; e = e->next) {
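
To make the question concrete, here is a rough user-space sketch of
starting the walk at the estimator's own interval instead of scanning
every interval. It is not the kernel code: struct est, est_add and
est_kill are made-up names, and EST_MAX_INTERVAL is set to 5 here only
as an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define EST_MAX_INTERVAL 5	/* assumed value for this sketch */

struct est {
	struct est *next;
	int interval;		/* index into elist[] */
};

/* One singly linked list per interval. */
static struct est *elist[EST_MAX_INTERVAL + 1];

static void est_add(struct est *e)
{
	assert(e->interval >= 0 && e->interval <= EST_MAX_INTERVAL);
	e->next = elist[e->interval];
	elist[e->interval] = e;
}

/*
 * Unlink e by walking only the list for its own interval.
 * Returns the number of nodes visited, to show the cost.
 */
static int est_kill(struct est *e)
{
	struct est **pp;
	int visited = 0;

	for (pp = &elist[e->interval]; *pp; pp = &(*pp)->next) {
		visited++;
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			break;
		}
	}
	return visited;
}
```

Note that the walk is still linear in the number of estimators that
share one interval, so this only helps when they are spread over
several intervals.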


cheers,
jamal


^ permalink raw reply

* Re: [PATCH] bluetooth : move children of connection device to NULL before connection down
From: Marcel Holtmann @ 2008-01-22 11:39 UTC (permalink / raw)
  To: Dave Young
  Cc: David Miller, netdev, linux-kernel, bluez-devel, cornelia.huck,
	gombasg, htejun, viro, kay.sievers, greg
In-Reply-To: <a8e1da0801220024u30e9c814va74e44252fc8b11e@mail.gmail.com>

Hi Dave,

> could you tell something more about your coding style?
> I would like to submit patches about bluetooth according to your style
> later if I have any.
> 
> Maybe you could put it on the bluez web site or anywhere.

It closely follows the kernel coding style as laid out in the kernel
documentation. However, there are some minor style things that I am
going to enforce from time to time, like having the container_of or
get_user_data calls at the top of the variable declarations. This has
never been formalized as far as I recall, but from my point of view it
makes the code clearer and easier to read.

At other times I like an extra empty line to visually separate
different code blocks. Some people might agree with me on this, others
might disagree. It is purely a personal preference for one way over the
other.

When it comes to indentation, placement of braces, etc., it is 100% the
kernel coding style and nothing else. If not, then it needs fixing and
is an oversight from the old days.

Regards

Marcel



^ permalink raw reply

* [Bug 9750] dev: avoid a race that triggers assertion failure
From: Matti Linnanvuori @ 2008-01-22 11:27 UTC (permalink / raw)
  To: netdev, jgarzik; +Cc: bugme-daemon

From: Matti Linnanvuori <mattilinnanvuori@yahoo.com>

There is a race in Linux kernel file net/core/dev.c, function dev_close.
The function calls function dev_deactivate, which calls function
dev_watchdog_down that deletes the watchdog timer. However, after that, a
driver can call netif_carrier_on, which calls function
__netdev_watchdog_up that can add the watchdog timer again. Function
unregister_netdevice calls function dev_shutdown that traps the bug
!timer_pending(&dev->watchdog_timer).
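
The ordering problem can be modelled in a few lines of user-space C.
This is a simplified sketch, not kernel code: it assumes that the
driver's carrier notification re-arms the watchdog only while the
device is still marked as started, and all model_* names are invented
for the example.

```c
#include <assert.h>
#include <stdbool.h>

struct model_dev {
	bool start;		/* models __LINK_STATE_START */
	bool timer_pending;	/* models the watchdog timer */
};

/* Driver reports carrier: re-arms the watchdog only while started. */
static void model_carrier_on(struct model_dev *dev)
{
	if (dev->start)
		dev->timer_pending = true;
}

/* Models dev_deactivate() deleting the watchdog timer. */
static void model_deactivate(struct model_dev *dev)
{
	dev->timer_pending = false;
}

/* Old order: deactivate first, clear START afterwards. */
static bool close_old_order(void)
{
	struct model_dev dev = { true, true };

	model_deactivate(&dev);
	model_carrier_on(&dev);		/* slips in before START is cleared */
	dev.start = false;
	return dev.timer_pending;	/* timer is armed again */
}

/* New order: clear START first, deactivate afterwards. */
static bool close_new_order(void)
{
	struct model_dev dev = { true, true };

	dev.start = false;
	assert(!dev.start);
	model_carrier_on(&dev);		/* no effect, device not started */
	model_deactivate(&dev);
	model_carrier_on(&dev);		/* still no effect */
	return dev.timer_pending;	/* timer stays deleted */
}
```

In this model the old order can leave the timer pending after close,
while the new order cannot.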

Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com>

---

--- linux-2.6.23.8/net/core/dev.c    2007-11-16 20:14:27.000000000 +0200
+++ linux-2.6.23.15/net/core/dev.c    2008-01-22 13:16:12.347125794 +0200
@@ -1013,8 +1013,6 @@ int dev_close(struct net_device *dev)
      */
     raw_notifier_call_chain(&netdev_chain, NETDEV_GOING_DOWN, dev);
 
-    dev_deactivate(dev);
-
     clear_bit(__LINK_STATE_START, &dev->state);
 
     /* Synchronize to scheduled poll. We cannot touch poll list,
@@ -1029,6 +1027,8 @@ int dev_close(struct net_device *dev)
         msleep(1);
     }
 
+    dev_deactivate(dev);
+
     /*
      *    Call the device specific close. This cannot fail.
      *    Only if device is UP






^ permalink raw reply

* [DST]: shrinks sizeof(struct rtable) by 64 bytes on x86_64
From: Eric Dumazet @ 2008-01-22 10:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org

On x86_64, sizeof(struct rtable) is 0x148, which is rounded up to 0x180
bytes by SLAB allocator.

We can reduce this to exactly 0x140 bytes, without alignment overhead,
and store 12 struct rtable per PAGE instead of 10.

rate_tokens is currently defined as an "unsigned long", while its content
should not exceed 6*HZ. It can safely be converted to an unsigned int.
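
The bound can be seen in a stand-alone version of the token bucket.
The following is only a user-space sketch modelled on xrlim_allow();
struct bucket and bucket_allow are hypothetical names, and the time
source is passed in as a plain counter instead of jiffies.

```c
#include <assert.h>

#define BURST_FACTOR 6	/* mirrors XRLIM_BURST_FACTOR */

struct bucket {
	unsigned long last;	/* time of the last update, in ticks */
	unsigned int tokens;	/* capped at BURST_FACTOR * timeout */
};

/* Returns 1 if a message may be sent at time now, 0 otherwise. */
static int bucket_allow(struct bucket *b, unsigned long now,
			unsigned int timeout)
{
	unsigned long token = b->tokens;
	unsigned long cap = (unsigned long)BURST_FACTOR * timeout;
	int rc = 0;

	/* Earn one token per elapsed tick, then clamp to the burst cap. */
	token += now - b->last;
	b->last = now;
	if (token > cap)
		token = cap;
	if (token >= timeout) {
		token -= timeout;
		rc = 1;
	}
	/* The stored value never exceeds the cap, so it fits in 32 bits. */
	assert(token <= cap);
	b->tokens = (unsigned int)token;
	return rc;
}
```

Whatever the elapsed time, the stored value is clamped before it is
written back, which is why a narrower type is enough for the field.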

Moving tclassid right after rate_tokens fills the 4-byte hole and
saves 8 bytes in 'struct dst_entry', which in turn saves 8
bytes in 'struct rtable'.
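
The arithmetic can be checked with a small user-space sketch. The two
structs below are heavily simplified stand-ins, not the real
dst_entry layout, and the 64-byte alignment and 4096-byte page are
assumptions chosen because they reproduce the numbers quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified "before" layout: the 32-bit field sits between pointers. */
struct before {
	unsigned long	rate_last;
	unsigned long	rate_tokens;
	void		*neighbour;
	uint32_t	tclassid;	/* 4 bytes of padding follow on LP64 */
	void		*ops;
};

/* Simplified "after" layout: two 32-bit fields share one 8-byte slot. */
struct after {
	unsigned long	rate_last;
	unsigned int	rate_tokens;
	uint32_t	tclassid;
	void		*neighbour;
	void		*ops;
};

/* Round size up to the next multiple of align. */
static size_t round_up_to(size_t size, size_t align)
{
	assert(align != 0);
	return (size + align - 1) / align * align;
}

/* How many aligned objects of this size fit in one page. */
static size_t objs_per_page(size_t size, size_t align, size_t page)
{
	return page / round_up_to(size, align);
}
```

With these assumptions, 0x148 bytes round up to 0x180 and give 10
objects per page, while 0x140 bytes need no rounding and give 12.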

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

diff --git a/include/net/dst.h b/include/net/dst.h
index c45dcc3..e3ac7d0 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -56,7 +56,11 @@ struct dst_entry
 	struct dst_entry	*path;
 
 	unsigned long		rate_last;	/* rate limiting for ICMP */
-	unsigned long		rate_tokens;
+	unsigned int		rate_tokens;
+
+#ifdef CONFIG_NET_CLS_ROUTE
+	__u32			tclassid;
+#endif
 
 	struct neighbour	*neighbour;
 	struct hh_cache		*hh;
@@ -65,10 +69,6 @@ struct dst_entry
 	int			(*input)(struct sk_buff*);
 	int			(*output)(struct sk_buff*);
 
-#ifdef CONFIG_NET_CLS_ROUTE
-	__u32			tclassid;
-#endif
-
 	struct  dst_ops	        *ops;
 		
 	unsigned long		lastuse;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 7ed8c50..1dbe89c 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -275,18 +275,19 @@ static inline void icmp_xmit_unlock(void)
 #define XRLIM_BURST_FACTOR 6
 int xrlim_allow(struct dst_entry *dst, int timeout)
 {
-	unsigned long now;
+	unsigned long now, token = dst->rate_tokens;
 	int rc = 0;
 
 	now = jiffies;
-	dst->rate_tokens += now - dst->rate_last;
+	token += now - dst->rate_last;
 	dst->rate_last = now;
-	if (dst->rate_tokens > XRLIM_BURST_FACTOR * timeout)
-		dst->rate_tokens = XRLIM_BURST_FACTOR * timeout;
-	if (dst->rate_tokens >= timeout) {
-		dst->rate_tokens -= timeout;
+	if (token > XRLIM_BURST_FACTOR * timeout)
+		token = XRLIM_BURST_FACTOR * timeout;
+	if (token >= timeout) {
+		token -= timeout;
 		rc = 1;
 	}
+	dst->rate_tokens = token;
 	return rc;
 }
 

^ permalink raw reply related

* Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings
From: Ilpo Järvinen @ 2008-01-22 10:47 UTC (permalink / raw)
  To: Dave Young; +Cc: LKML, David Miller, Netdev, Andrew Morton
In-Reply-To: <a8e1da0801220109v6bf8931ev50f2210402c3ba41@mail.gmail.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 11373 bytes --]

On Tue, 22 Jan 2008, Dave Young wrote:

> On Jan 22, 2008 12:37 PM, Dave Young <hidave.darkstar@gmail.com> wrote:
> >
> > On Jan 22, 2008 5:14 AM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> > >
> > > On Mon, 21 Jan 2008, Dave Young wrote:
> > >
> > > > Please see the kernel messages following,(trigged while using some qemu session)
> > > > BTW, seems there's some e100 error message as well.
> > > >
> > > > PCI: Setting latency timer of device 0000:00:1b.0 to 64
> > > > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > > > e100: Copyright(c) 1999-2006 Intel Corporation
> > > > ACPI: PCI Interrupt 0000:03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > > > modprobe:2331 conflicting cache attribute efaff000-efb00000 uncached<->default
> > > > e100: 0000:03:08.0: e100_probe: Cannot map device registers, aborting.
> > > > ACPI: PCI interrupt for device 0000:03:08.0 disabled
> > > > e100: probe of 0000:03:08.0 failed with error -12
> > > > eth0:  setting full-duplex.
> > > > ------------[ cut here ]------------
> > > > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > > > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw intel_agp button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore dcdbas 3c59x pcspkr snd_page_alloc
> > > > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> > > >  [<c0132100>] ? printk+0x0/0x20
> > > >  [<c0131834>] warn_on_slowpath+0x54/0x80
> > > >  [<c03e8df8>] ? ip_finish_output+0x128/0x2e0
> > > >  [<c03e9527>] ? ip_output+0xe7/0x100
> > > >  [<c03e8a88>] ? ip_local_out+0x18/0x20
> > > >  [<c03e991c>] ? ip_queue_xmit+0x3dc/0x470
> > > >  [<c043641e>] ? _spin_unlock_irqrestore+0x5e/0x70
> > > >  [<c0186be1>] ? check_pad_bytes+0x61/0x80
> > > >  [<c03f6031>] tcp_mark_head_lost+0x121/0x150
> > > >  [<c03f60ac>] tcp_update_scoreboard+0x4c/0x170
> > > >  [<c03f6e0a>] tcp_fastretrans_alert+0x48a/0x6b0
> > > >  [<c03f7d93>] tcp_ack+0x1b3/0x3a0
> > > >  [<c03fa14b>] tcp_rcv_established+0x3eb/0x710
> > > >  [<c04015c5>] tcp_v4_do_rcv+0xe5/0x100
> > > >  [<c0401bbb>] tcp_v4_rcv+0x5db/0x660
> > >
> > > Doh, once more these S+L things..., the rest are symptom of the first
> > > problem.
> >
> > What is the S+L thing? Could you explain a bit?

It means that one of the skbs is both SACKed and marked as LOST at the
same time in the counters (might be due to a miscount of lost/sacked_out
too, not necessarily in the ->sacked bits). Such a state is logically
invalid because it would mean that the sender thinks that the same packet
both reached the receiver and is lost in the network.

Traditionally TCP has just silently "corrected" over-estimates
(sacked_out+lost_out > packets_out). I changed this a couple of releases ago
because those over-estimates are often due to bugs that should be fixed
(there have been a couple of them, but it has been very quiet on this front
for a long time, months or even half a year already; but I might have broken
something with the early Dec changes).

These problems may originate from a bug that occurred a number of ACKs
before the WARN_ON triggered, so they are a bit tricky to track;
those WARN_ONs serve just for alerting purposes and usually do not point
out where the bug actually occurred.

I usually just ask people to include an exhaustive verifier which compares
the ->sacked bitmaps with the sacked/lost_out counters and reports immediately
when the problem shows up, rather than waiting for the cheaper S+L check we do
in the WARN_ON to trigger. I tried to collect a tracking patch from the
previous efforts (hopefully got it right after modifications).
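
The difference between the cheap check and the exhaustive one can be
shown with a small user-space model. This is only a sketch under
simplifying assumptions: every segment counts as one packet, and the
SEG_* flags, struct counters and both function names are invented for
the example rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define SEG_SACKED 0x1	/* hypothetical flag: receiver SACKed the segment */
#define SEG_LOST   0x2	/* hypothetical flag: sender marked it as lost */

struct counters {
	unsigned int packets_out;
	unsigned int sacked_out;
	unsigned int lost_out;
};

/* Cheap check: only notices an over-estimate in the counters. */
static int cheap_check_ok(const struct counters *c)
{
	return c->sacked_out + c->lost_out <= c->packets_out;
}

/*
 * Exhaustive check: recount from the per-segment flags and compare
 * with the counters. Returns the number of problems found.
 */
static int exhaustive_check(const unsigned char *flags, size_t n,
			    const struct counters *c)
{
	unsigned int sacked = 0, lost = 0;
	int problems = 0;
	size_t i;

	assert(flags != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (flags[i] & SEG_SACKED)
			sacked++;
		if (flags[i] & SEG_LOST)
			lost++;
		if ((flags[i] & (SEG_SACKED | SEG_LOST)) ==
		    (SEG_SACKED | SEG_LOST))
			problems++;	/* S+L on a single segment */
	}
	problems += (sacked != c->sacked_out);
	problems += (lost != c->lost_out);
	problems += (n != c->packets_out);
	return problems;
}
```

A queue can pass the cheap check while the exhaustive one already
reports a problem, which is the reason for running the latter when
hunting for the origin of the bug.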

> > > I'm a bit worried about its
> > > reproducibility if it takes this far to see it...
> > >
> 
> It's triggered again on my PC, just while using firefox.

...Good, then there's some chance to catch it.

-- 
 i.

[PATCH] [TCP]: debug S+L

---
 include/net/tcp.h     |    8 +++-
 net/ipv4/tcp_input.c  |    6 +++
 net/ipv4/tcp_ipv4.c   |  101 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_output.c |   21 +++++++---
 4 files changed, 129 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 7de4ea3..0685035 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -272,6 +272,8 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
 #define TCP_ADD_STATS_BH(field, val)	SNMP_ADD_STATS_BH(tcp_statistics, field, val)
 #define TCP_ADD_STATS_USER(field, val)	SNMP_ADD_STATS_USER(tcp_statistics, field, val)
 
+extern void			tcp_verify_wq(struct sock *sk);
+
 extern void			tcp_v4_err(struct sk_buff *skb, u32);
 
 extern void			tcp_shutdown (struct sock *sk, int how);
@@ -768,7 +770,11 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
 }
 
 /* Use define here intentionally to get WARN_ON location shown at the caller */
-#define tcp_verify_left_out(tp)	WARN_ON(tcp_left_out(tp) > tp->packets_out)
+#define tcp_verify_left_out(tp)	\
+	do { \
+		WARN_ON(tcp_left_out(tp) > tp->packets_out); \
+		tcp_verify_wq((struct sock *)tp); \
+	} while(0)
 
 extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
 extern __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fa2c85c..0bda0e1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2645,6 +2645,10 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	if (do_lost || (tcp_is_fack(tp) && tcp_head_timedout(sk)))
 		tcp_update_scoreboard(sk, fast_rexmit);
 	tcp_cwnd_down(sk, flag);
+
+	WARN_ON(tcp_write_queue_head(sk) == NULL);
+	WARN_ON(!tp->packets_out);
+
 	tcp_xmit_retransmit_queue(sk);
 }
 
@@ -2848,6 +2852,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, int prior_fackets)
 		tcp_clear_all_retrans_hints(tp);
 	}
 
+	tcp_verify_left_out(tp);
+
 	if (skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED))
 		flag |= FLAG_SACK_RENEGING;
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 9aea88b..21f5888 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -108,6 +108,107 @@ struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
 	.lhash_wait  = __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait),
 };
 
+void tcp_print_queue(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+	char s[50+1];
+	char h[50+1];
+	int idx = 0;
+	int i;
+
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+
+		for (i = 0; i < tcp_skb_pcount(skb); i++) {
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
+				s[idx] = 'S';
+				if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+					s[idx] = 'B';
+
+			} else if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST) {
+				s[idx] = 'L';
+			} else {
+				s[idx] = ' ';
+			}
+			if (s[idx] != ' ' && skb->len < tp->mss_cache)
+				s[idx] += 'a' - 'A';
+
+			if (i == 0) {
+				if (TCP_SKB_CB(skb)->seq == tcp_highest_sack_seq(tp))
+					h[idx] = 'h';
+				else
+					h[idx] = '+';
+			} else {
+				h[idx] = '-';
+			}
+
+			if (++idx >= 50) {
+				s[idx] = 0;
+				h[idx] = 0;
+				printk(KERN_ERR "TCP wq(s) %s\n", s);
+				printk(KERN_ERR "TCP wq(h) %s\n", h);
+				idx = 0;
+			}
+		}
+	}
+	if (idx) {
+		s[idx] = '<';
+		s[idx+1] = 0;
+		h[idx] = '<';
+		h[idx+1] = 0;
+		printk(KERN_ERR "TCP wq(s) %s\n", s);
+		printk(KERN_ERR "TCP wq(h) %s\n", h);
+	}
+	printk(KERN_ERR "l%u s%u f%u p%u seq: su%u hs%u sn%u\n",
+		tp->lost_out, tp->sacked_out, tp->fackets_out, tp->packets_out,
+		tp->snd_una, tcp_highest_sack_seq(tp), tp->snd_nxt);
+}
+
+void tcp_verify_wq(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 lost = 0;
+	u32 sacked = 0;
+	u32 packets = 0;
+	struct sk_buff *skb;
+
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
+			sacked += tcp_skb_pcount(skb);
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+				printk(KERN_ERR "Sacked bitmap S+L: %u %u-%u/%u\n",
+					TCP_SKB_CB(skb)->sacked,
+					TCP_SKB_CB(skb)->end_seq - tp->snd_una,
+					TCP_SKB_CB(skb)->seq - tp->snd_una,
+					tp->snd_una);
+		}
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+			lost += tcp_skb_pcount(skb);
+
+		packets += tcp_skb_pcount(skb);
+	}
+
+	WARN_ON(lost != tp->lost_out);
+	WARN_ON(sacked != tp->sacked_out);
+	WARN_ON(packets != tp->packets_out);
+	if ((lost != tp->lost_out) ||
+	    (sacked != tp->sacked_out) ||
+	    (packets != tp->packets_out)) {
+		printk(KERN_ERR "P: %u L: %u vs %u S: %u vs %u w: %u-%u (%u)\n",
+			tp->packets_out,
+			lost, tp->lost_out,
+			sacked, tp->sacked_out,
+			tp->snd_una, tp->snd_nxt,
+		       	tp->rx_opt.sack_ok);
+		tcp_print_queue(sk);
+	}
+}
+
 static int tcp_v4_get_port(struct sock *sk, unsigned short snum)
 {
 	return inet_csk_get_port(&tcp_hashinfo, sk, snum,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 89f0188..648340f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -779,10 +779,9 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 			tp->lost_out -= diff;
 
 		/* Adjust Reno SACK estimate. */
-		if (tcp_is_reno(tp) && diff > 0) {
+		if (tcp_is_reno(tp) && diff > 0)
 			tcp_dec_pcount_approx_int(&tp->sacked_out, diff);
-			tcp_verify_left_out(tp);
-		}
+
 		tcp_adjust_fackets_out(sk, skb, diff);
 	}
 
@@ -790,6 +789,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len,
 	skb_header_release(buff);
 	tcp_insert_write_queue_after(skb, buff, sk);
 
+	tcp_verify_left_out(tp);
+
 	return 0;
 }
 
@@ -1463,6 +1464,7 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
 	} else if (result > 0) {
 		sent_pkts = 1;
 	}
+	tcp_verify_left_out(tp);
 
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
@@ -1764,6 +1766,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb,
 	tcp_clear_retrans_hints_partial(tp);
 
 	sk_wmem_free_skb(sk, next_skb);
+	tcp_verify_left_out(tp);
 }
 
 /* Do a simple retransmit without using the backoff mechanisms in
@@ -1795,13 +1798,13 @@ void tcp_simple_retransmit(struct sock *sk)
 		}
 	}
 
+	tcp_verify_left_out(tp);
+
 	tcp_clear_all_retrans_hints(tp);
 
 	if (!lost)
 		return;
 
-	tcp_verify_left_out(tp);
-
 	/* Don't muck with the congestion window here.
 	 * Reason is that we do not increase amount of _data_
 	 * in network, but units changed and effective
@@ -1970,8 +1973,10 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 			 * packet to be MSS sized and all the
 			 * packet counting works out.
 			 */
-			if (tcp_packets_in_flight(tp) >= tp->snd_cwnd)
+			if (tcp_packets_in_flight(tp) >= tp->snd_cwnd) {
+				tcp_verify_left_out(tp);
 				return;
+			}
 
 			if (sacked & TCPCB_LOST) {
 				if (!(sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))) {
@@ -1997,6 +2002,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		}
 	}
 
+	tcp_verify_left_out(tp);
+
 	/* OK, demanded retransmission is finished. */
 
 	/* Forward retransmissions are possible only during Recovery. */
@@ -2054,6 +2061,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 
 		NET_INC_STATS_BH(LINUX_MIB_TCPFORWARDRETRANS);
 	}
+
+	tcp_verify_left_out(tp);
 }
 
 /* Send a fin.  The caller locks the socket for us.  This cannot be
-- 
1.5.2.2

^ permalink raw reply related

* Re: [GIT PULL] [IPV6,IPV4]: Fix several sparse warnings.
From: David Miller @ 2008-01-22 10:44 UTC (permalink / raw)
  To: dada1; +Cc: yoshfuji, netdev
In-Reply-To: <20080122110312.10445043.dada1@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Tue, 22 Jan 2008 11:03:12 +0100

> On Tue, 22 Jan 2008 18:56:32 +0900 (JST)
> YOSHIFUJI Hideaki / 吉藤英明  <yoshfuji@linux-ipv6.org> wrote:
> 
> > @@ -418,7 +418,7 @@ out:
> >  
> >  void udp_err(struct sk_buff *skb, u32 info)
> >  {
> > -	return __udp4_lib_err(skb, info, udp_hash);
> > +	__udp4_lib_err(skb, info, udp_hash);
> >  }
> 
> Hum... On this one, I would say Sparse is picky, not to say buggy :(

Agreed, but making this change is harmless :-)

^ permalink raw reply

* Re: [GIT PULL] [IPV6,IPV4]: Fix several sparse warnings.
From: David Miller @ 2008-01-22 10:42 UTC (permalink / raw)
  To: yoshfuji; +Cc: netdev
In-Reply-To: <20080122.185632.09970660.yoshfuji@linux-ipv6.org>

From: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Date: Tue, 22 Jan 2008 18:56:32 +0900 (JST)

> Dave, please consider pulling following changes on top of net-2.6.25 tree:
> 	git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-dev.git net-2.6-dev-20080122

Pulled, thank you.

^ permalink raw reply

* Re: [PATCH 2/3] virtio: Net header needs gso_hdr_len
From: Herbert Xu @ 2008-01-22 10:36 UTC (permalink / raw)
  To: Rusty Russell; +Cc: netdev, virtualization
In-Reply-To: <200801161519.03339.rusty@rustcorp.com.au>

On Wed, Jan 16, 2008 at 03:19:03PM +1100, Rusty Russell wrote:
> > > It's far easier to deal with GSO if we don't have to parse the packet
> > > to figure out the header length.  Add the field to the virtio_net_hdr
> > > struct (and fix the spaces that somehow crept in there).
>
> > Why do we need this? When receiving GSO packets from an untrusted
> > source the network stack will fill in the transport header offset
> > after verifying that the headers are sane.
> 
> Thanks for clarifying; it simplifies things.

Actually now that I've tried your test program I can see that this
field exists not because of GSO, but because of SG.  It tells you
how many bytes you want to put in the skb head as opposed to the
frag array.

So this field is fine with me as long as it is named as such to
avoid confusion since it really has nothing to do with GSO as you
also need it for SG with large MTUs.

I think this is more flexible than the Xen approach where this is
essentially hard-coded to 64 bytes.
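
As a rough illustration of what such a field controls, the split
between the linear head and the fragment array could be computed like
this. It is a user-space sketch only: struct rx_split, split_packet and
the frag_size parameter are hypothetical and not part of virtio or the
skb API.

```c
#include <assert.h>
#include <stddef.h>

struct rx_split {
	size_t head_len;	/* bytes copied into the linear head */
	size_t frag_len;	/* bytes left for the fragment array */
	size_t nr_frags;	/* fragments needed for frag_len */
};

/* Split total_len bytes using the header length given by the sender. */
static struct rx_split split_packet(size_t total_len, size_t hdr_len,
				    size_t frag_size)
{
	struct rx_split s;

	assert(frag_size > 0);
	/* Never put more in the head than the packet actually holds. */
	s.head_len = hdr_len < total_len ? hdr_len : total_len;
	s.frag_len = total_len - s.head_len;
	/* Round up: a partly filled fragment still takes one slot. */
	s.nr_frags = (s.frag_len + frag_size - 1) / frag_size;
	return s;
}
```

For example, a 9000-byte packet with a 64-byte head and 4096-byte
fragments needs three fragments, with or without GSO.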

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [IPV4] ip_gre: should take care of CONFIG_IPV6_MODULE
From: David Miller @ 2008-01-22 10:22 UTC (permalink / raw)
  To: kaber; +Cc: dada1, netdev
In-Reply-To: <4795A786.5080606@trash.net>

From: Patrick McHardy <kaber@trash.net>
Date: Tue, 22 Jan 2008 09:21:26 +0100

> Eric Dumazet wrote:
> > If IPV6 is configured as a module, GRE code misses some IPV6 parts.
> 
> 
> I believe this is intentional to avoid a runtime dependency on ipv6.
> Fixing this without pulling in the ipv6 module would be preferable.

Unfortunately this is true.

The only symbol it really needs that isn't provided statically
is icmpv6_send() which is very unfortunate.

Other things it wants like ipv6_addr_type() are already provided
statically in net/ipv6/addrconf_core.c and the appropriate
net/ipv6/Makefile:obj-y rules.

^ permalink raw reply

* Re: RFC: igb: Intel 82575 gigabit ethernet driver (take #3)
From: Jeff Garzik @ 2008-01-22 10:05 UTC (permalink / raw)
  To: Kok, Auke
  Cc: NetDev, Arjan van de Ven, Jesse Brandeburg, Ronciak, John,
	Andrew Morton
In-Reply-To: <4786AB0C.6010202@intel.com>

Kok, Auke wrote:
> All,
> 
> here is the third version of the igb (82575) ethernet controller driver. This
> driver was previously posted 2007-07-13 and 2007-12-11. Many comments received
> were addressed:
> 
> - removed indirection wrappers in the same way as e1000e and ixgbe.
> - cleaned up largely against sparse, checkpatch
> - removed module parameters and moved functionality to ethtool ioctls
> - new NAPI API rewrites
> - by default the driver runs in multiqueue mode with 2 to 40 RX queues enabled.
> 
> and specifically in this version:
> 
> - register macros were condensed for readability
> - fixed namespace collisions by renaming functions to igb_*
> 
> Since the driver is still too large (although the patch shrunk from 558k to 416k
> to 407k, almost 38% of its size) to post to this list I am attaching the bzipped
> patch here. You can get the same driver alternatively from here:
> 
> http://foo-projects.org/~sofar/0001-igb-PCI-Express-82575-Gigabit-Ethernet-driver.patch
> [407k]
> http://foo-projects.org/~sofar/0001-igb-PCI-Express-82575-Gigabit-Ethernet-driver.patch.bz2
> [74k]
> 
> or through git:
>     git://lost.foo-projects.org/~ahkok/git/linux-2.6 #igb
> 
> 
> There are several concerns still open for this driver:
> - hardware code is still a large API. we're expecting more hardware to be
> supported by this driver in the future. The API has already been scrubbed but we
> anticipate that the remaining hooks will be used in the future.
> - The register defines are still named "E1000_" as they are mostly identical to
> the e1000 chipsets (igb register space is a superset of most recent e1000 register
> sets).
> 
> 
> Please review,
> 
> 
> Cheers,
> 
> Auke
> 
> ---
> 
> From 4ec9e52f44de0c1c41265c5f326b573643f24da7 Mon Sep 17 00:00:00 2001
> From: Auke Kok <auke-jan.h.kok@intel.com>
> Date: Thu, 10 Jan 2008 14:55:46 -0800
> Subject: [PATCH] igb: PCI-Express 82575 Gigabit Ethernet driver
> 
> We are pleased to announce a new Gigabit Ethernet product and its
> driver to the linux community. This product is the Intel(R) 82575
> Gigabit Ethernet adapter family. Physical adapters will be available
> to the public soon. These adapters come in 2- and 4-port versions
> (copper PHY) currently. Other variants will be available later.
> 
> The 82575 chipset supports significantly different features that
> warrant a new driver. The descriptor format is (just like the
> ixgbe driver) different. The device can use multiple MSI-X vectors
> and multiple queues for both send and receive. This allows us to
> optimize some of the driver code specifically as well compared to
> the e1000-supported devices.
> 
> This version of the igb driver no longer uses fake netdevices and
> incorporates napi_struct members for each ring to do the multi-
> queue polling. multi-queue is enabled by default and the driver
> supports NAPI mode only.
> 
> All the namespace collisions should be gone in this version too. The
> register macros have been condensed to improve readability.
> 
> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
> ---
>  drivers/net/Kconfig             |   22 +
>  drivers/net/Makefile            |    1 +
>  drivers/net/igb/Makefile        |   37 +
>  drivers/net/igb/e1000_82575.c   | 1269 ++++++++++++
>  drivers/net/igb/e1000_82575.h   |  150 ++
>  drivers/net/igb/e1000_defines.h |  772 ++++++++
>  drivers/net/igb/e1000_hw.h      |  599 ++++++
>  drivers/net/igb/e1000_mac.c     | 1505 ++++++++++++++
>  drivers/net/igb/e1000_mac.h     |   98 +
>  drivers/net/igb/e1000_nvm.c     |  605 ++++++
>  drivers/net/igb/e1000_nvm.h     |   40 +
>  drivers/net/igb/e1000_phy.c     | 1807 +++++++++++++++++
>  drivers/net/igb/e1000_phy.h     |   98 +
>  drivers/net/igb/e1000_regs.h    |  270 +++
>  drivers/net/igb/igb.h           |  300 +++
>  drivers/net/igb/igb_ethtool.c   | 1927 ++++++++++++++++++
>  drivers/net/igb/igb_main.c      | 4138 +++++++++++++++++++++++++++++++++++++++
>  17 files changed, 13638 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/net/igb/Makefile
>  create mode 100644 drivers/net/igb/e1000_82575.c
>  create mode 100644 drivers/net/igb/e1000_82575.h
>  create mode 100644 drivers/net/igb/e1000_defines.h
>  create mode 100644 drivers/net/igb/e1000_hw.h
>  create mode 100644 drivers/net/igb/e1000_mac.c
>  create mode 100644 drivers/net/igb/e1000_mac.h
>  create mode 100644 drivers/net/igb/e1000_nvm.c
>  create mode 100644 drivers/net/igb/e1000_nvm.h
>  create mode 100644 drivers/net/igb/e1000_phy.c
>  create mode 100644 drivers/net/igb/e1000_phy.h
>  create mode 100644 drivers/net/igb/e1000_regs.h
>  create mode 100644 drivers/net/igb/igb.h
>  create mode 100644 drivers/net/igb/igb_ethtool.c
>  create mode 100644 drivers/net/igb/igb_main.c

applied



^ permalink raw reply

* Re: [GIT PULL] [IPV6,IPV4]: Fix several sparse warnings.
From: Eric Dumazet @ 2008-01-22 10:03 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明; +Cc: davem, netdev
In-Reply-To: <20080122.185632.09970660.yoshfuji@linux-ipv6.org>

On Tue, 22 Jan 2008 18:56:32 +0900 (JST)
YOSHIFUJI Hideaki / 吉藤英明  <yoshfuji@linux-ipv6.org> wrote:


> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index cb2411c..ecd9d91 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -418,7 +418,7 @@ out:
>  
>  void udp_err(struct sk_buff *skb, u32 info)
>  {
> -	return __udp4_lib_err(skb, info, udp_hash);
> +	__udp4_lib_err(skb, info, udp_hash);
>  }

Hum... On this one, I would say Sparse is picky, not to say buggy :(
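
For context on the warning itself: ISO C (C99 6.8.6.4) says a return statement with an expression shall not appear in a function whose return type is void, while GCC accepts the construct as an extension. A minimal sketch of the pattern follows; `lib_err()`, `wrapper_err()` and `err_call_count()` are hypothetical stand-ins, not the kernel functions.

```c
#include <assert.h>

static int err_calls;	/* counts helper invocations, for demonstration only */

/* Hypothetical stand-in for a void helper such as __udp4_lib_err(). */
static void lib_err(int info)
{
	assert(info >= 0);
	err_calls++;
}

/*
 * Flagged form, shown only as a comment:
 *
 *	void wrapper_err(int info) { return lib_err(info); }
 *
 * Fixed form: call the helper and fall off the end of the function.
 */
void wrapper_err(int info)
{
	lib_err(info);
}

/* Accessor so a caller can observe that the helper really ran. */
int err_call_count(void)
{
	return err_calls;
}
```

Both forms behave the same at run time; the change only removes the void-valued return expression.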

^ permalink raw reply

* [GIT PULL] [IPV6,IPV4]: Fix several sparse warnings.
From: YOSHIFUJI Hideaki / 吉藤英明 @ 2008-01-22  9:56 UTC (permalink / raw)
  To: davem; +Cc: yoshfuji, netdev

Dave, please consider pulling the following changes on top of the net-2.6.25 tree:
	git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-dev.git net-2.6-dev-20080122

Thank you.

HEADLINES
---------

    [IPV4] UDP,UDPLITE: Sparse: {__udp4_lib,udp,udplite}_err() are of void.
    [IPV6] UDP,UDPLITE: Sparse: {__udp6_lib,udp,udplite}_err() are of void.
    [IPV6] UDPLITE: Sparse: Declare non-static symbols in header.
    [IPV6] ADDRLABEL: Sparse: Make several functions static.
    [IPV6]: Sparse: Declare non-static ipv6_{route,icmp,frag}_sysctl_init() in header.
    [IPV6] ADDRCONF: Sparse: Make inet6_dump_addr() code paths more straight-forward.
    [IPV6] NDISC: Sparse: Use different variable name for local use.

DIFFSTAT
--------

 include/net/ipv6.h         |    4 ++++
 net/ipv4/udp.c             |    2 +-
 net/ipv4/udplite.c         |    2 +-
 net/ipv6/addrconf.c        |   38 ++++++++++++++++++--------------------
 net/ipv6/addrlabel.c       |   20 ++++++++++----------
 net/ipv6/af_inet6.c        |    2 --
 net/ipv6/ndisc.c           |   10 +++++-----
 net/ipv6/sysctl_net_ipv6.c |    3 ---
 net/ipv6/udp.c             |    2 +-
 net/ipv6/udp_impl.h        |    1 +
 net/ipv6/udplite.c         |    2 +-
 11 files changed, 42 insertions(+), 44 deletions(-)

CHANGESETS
----------

commit 9c14555fec7d209c90ae5079c59dc9a338620fd7
Author: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date:   Tue Jan 22 17:05:31 2008 +0900

    [IPV4] UDP,UDPLITE: Sparse: {__udp4_lib,udp,udplite}_err() are of void.
    
    Fix following sparse warnings:
    | net/ipv4/udp.c:421:2: warning: returning void-valued expression
    | net/ipv4/udplite.c:38:2: warning: returning void-valued expression
    
    Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cb2411c..ecd9d91 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -418,7 +418,7 @@ out:
 
 void udp_err(struct sk_buff *skb, u32 info)
 {
-	return __udp4_lib_err(skb, info, udp_hash);
+	__udp4_lib_err(skb, info, udp_hash);
 }
 
 /*
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index f5baeb3..001b881 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -35,7 +35,7 @@ static int udplite_rcv(struct sk_buff *skb)
 
 static void udplite_err(struct sk_buff *skb, u32 info)
 {
-	return __udp4_lib_err(skb, info, udplite_hash);
+	__udp4_lib_err(skb, info, udplite_hash);
 }
 
 static	struct net_protocol udplite_protocol = {

---
commit feafbe254cd11496370192a08dbdc1d0ddda226f
Author: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date:   Tue Jan 22 17:09:55 2008 +0900

    [IPV6] UDP,UDPLITE: Sparse: {__udp6_lib,udp,udplite}_err() are of void.
    
    Fix following sparse warnings:
    | net/ipv6/udp.c:262:2: warning: returning void-valued expression
    | net/ipv6/udplite.c:29:2: warning: returning void-valued expression
    
    Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index bf58aca..bd4b9df 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -259,7 +259,7 @@ static __inline__ void udpv6_err(struct sk_buff *skb,
 				 struct inet6_skb_parm *opt, int type,
 				 int code, int offset, __be32 info     )
 {
-	return __udp6_lib_err(skb, opt, type, code, offset, info, udp_hash);
+	__udp6_lib_err(skb, opt, type, code, offset, info, udp_hash);
 }
 
 int udpv6_queue_rcv_skb(struct sock * sk, struct sk_buff *skb)
diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c
index 39f0705..87d4202 100644
--- a/net/ipv6/udplite.c
+++ b/net/ipv6/udplite.c
@@ -26,7 +26,7 @@ static void udplitev6_err(struct sk_buff *skb,
 			  struct inet6_skb_parm *opt,
 			  int type, int code, int offset, __be32 info)
 {
-	return __udp6_lib_err(skb, opt, type, code, offset, info, udplite_hash);
+	__udp6_lib_err(skb, opt, type, code, offset, info, udplite_hash);
 }
 
 static struct inet6_protocol udplitev6_protocol = {

---
commit ce97db1c7fa125b3f24a3d424a6373824a0bca37
Author: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date:   Tue Jan 22 17:25:46 2008 +0900

    [IPV6] UDPLITE: Sparse: Declare non-static symbols in header.
    
    Fix the following sparse warnings:
    | net/ipv6/udplite.c:45:14: warning: symbol 'udplitev6_prot' was not declared. Should it be static?
    | net/ipv6/udplite.c:80:12: warning: symbol 'udplitev6_init' was not declared. Should it be static?
    | net/ipv6/udplite.c:99:6: warning: symbol 'udplitev6_exit' was not declared. Should it be static?
    
    Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
index 2d3fda6..21be3a8 100644
--- a/net/ipv6/udp_impl.h
+++ b/net/ipv6/udp_impl.h
@@ -5,6 +5,7 @@
 #include <net/protocol.h>
 #include <net/addrconf.h>
 #include <net/inet_common.h>
+#include <net/transp_v6.h>
 
 extern int  	__udp6_lib_rcv(struct sk_buff *, struct hlist_head [], int );
 extern void 	__udp6_lib_err(struct sk_buff *, struct inet6_skb_parm *,

---
commit c70651db4683cdaec05d83b91b6a53560f045a27
Author: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date:   Tue Jan 22 17:12:50 2008 +0900

    [IPV6] ADDRLABEL: Sparse: Make several functions static.
    
    Fix following sparse warnings:
    | net/ipv6/addrlabel.c:172:25: warning: symbol 'ip6addrlbl_alloc' was not declared. Should it be static?
    | net/ipv6/addrlabel.c:219:5: warning: symbol '__ip6addrlbl_add' was not declared. Should it be static?
    | net/ipv6/addrlabel.c:260:5: warning: symbol 'ip6addrlbl_add' was not declared. Should it be static?
    | net/ipv6/addrlabel.c:285:5: warning: symbol '__ip6addrlbl_del' was not declared. Should it be static?
    | net/ipv6/addrlabel.c:311:5: warning: symbol 'ip6addrlbl_del' was not declared. Should it be static?
    
    Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
index 6f1ca60..3867412 100644
--- a/net/ipv6/addrlabel.c
+++ b/net/ipv6/addrlabel.c
@@ -169,9 +169,9 @@ u32 ipv6_addr_label(const struct in6_addr *addr, int type, int ifindex)
 }
 
 /* allocate one entry */
-struct ip6addrlbl_entry *ip6addrlbl_alloc(const struct in6_addr *prefix,
-					  int prefixlen, int ifindex,
-					  u32 label)
+static struct ip6addrlbl_entry *ip6addrlbl_alloc(const struct in6_addr *prefix,
+						 int prefixlen, int ifindex,
+						 u32 label)
 {
 	struct ip6addrlbl_entry *newp;
 	int addrtype;
@@ -216,7 +216,7 @@ struct ip6addrlbl_entry *ip6addrlbl_alloc(const struct in6_addr *prefix,
 }
 
 /* add a label */
-int __ip6addrlbl_add(struct ip6addrlbl_entry *newp, int replace)
+static int __ip6addrlbl_add(struct ip6addrlbl_entry *newp, int replace)
 {
 	int ret = 0;
 
@@ -257,8 +257,8 @@ out:
 }
 
 /* add a label */
-int ip6addrlbl_add(const struct in6_addr *prefix, int prefixlen,
-		       int ifindex, u32 label, int replace)
+static int ip6addrlbl_add(const struct in6_addr *prefix, int prefixlen,
+			  int ifindex, u32 label, int replace)
 {
 	struct ip6addrlbl_entry *newp;
 	int ret = 0;
@@ -282,8 +282,8 @@ int ip6addrlbl_add(const struct in6_addr *prefix, int prefixlen,
 }
 
 /* remove a label */
-int __ip6addrlbl_del(const struct in6_addr *prefix, int prefixlen,
-			  int ifindex)
+static int __ip6addrlbl_del(const struct in6_addr *prefix, int prefixlen,
+			    int ifindex)
 {
 	struct ip6addrlbl_entry *p = NULL;
 	struct hlist_node *pos, *n;
@@ -308,8 +308,8 @@ int __ip6addrlbl_del(const struct in6_addr *prefix, int prefixlen,
 	return ret;
 }
 
-int ip6addrlbl_del(const struct in6_addr *prefix, int prefixlen,
-		       int ifindex)
+static int ip6addrlbl_del(const struct in6_addr *prefix, int prefixlen,
+			  int ifindex)
 {
 	struct in6_addr prefix_buf;
 	int ret;

---
commit 50207356bc5026b53dbf99a3e86c28a683ae6745
Author: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date:   Tue Jan 22 17:18:38 2008 +0900

    [IPV6]: Sparse: Declare non-static ipv6_{route,icmp,frag}_sysctl_init() in header.
    
    Fix the following sparse warnings:
    | net/ipv6/route.c:2491:18: warning: symbol 'ipv6_route_sysctl_init' was not declared. Should it be static?
    | net/ipv6/icmp.c:922:18: warning: symbol 'ipv6_icmp_sysctl_init' was not declared. Should it be static?
    | net/ipv6/reassembly.c:628:6: warning: symbol 'ipv6_frag_sysctl_init' was not declared. Should it be static?
    
    Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index c8e8cb2..3712cae 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -586,6 +586,10 @@ extern int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
 			 int __user *optlen);
 
 #ifdef CONFIG_PROC_FS
+extern struct ctl_table *ipv6_icmp_sysctl_init(struct net *net);
+extern void ipv6_frag_sysctl_init(struct net *net);
+extern struct ctl_table *ipv6_route_sysctl_init(struct net *net);
+
 extern int  ac6_proc_init(void);
 extern void ac6_proc_exit(void);
 extern int  raw6_proc_init(void);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 3150c4b..6738a7b 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -72,8 +72,6 @@ MODULE_LICENSE("GPL");
 static struct list_head inetsw6[SOCK_MAX];
 static DEFINE_SPINLOCK(inetsw6_lock);
 
-void ipv6_frag_sysctl_init(struct net *net);
-
 static __inline__ struct ipv6_pinfo *inet6_sk_generic(struct sock *sk)
 {
 	const int offset = sk->sk_prot->obj_size - sizeof(struct ipv6_pinfo);
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 5e0af4d..7197eb7 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -14,9 +14,6 @@
 #include <net/addrconf.h>
 #include <net/inet_frag.h>
 
-extern struct ctl_table *ipv6_route_sysctl_init(struct net *net);
-extern struct ctl_table *ipv6_icmp_sysctl_init(struct net *net);
-
 static ctl_table ipv6_table_template[] = {
 	{
 		.ctl_name	= NET_IPV6_ROUTE,

---
commit 2aa6b4e605b700e10943afd9f34cd0527304f3a3
Author: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date:   Tue Jan 22 17:29:40 2008 +0900

    [IPV6] ADDRCONF: Sparse: Make inet6_dump_addr() code paths more straight-forward.
    
    Fix the following sparse warning:
    | net/ipv6/addrconf.c:3384:2: warning: context imbalance in 'inet6_dump_addr' - different lock contexts for basic block
    
    Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index aba7b5d..e40213d 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3335,11 +3335,11 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
 			     ifa = ifa->if_next, ip_idx++) {
 				if (ip_idx < s_ip_idx)
 					continue;
-				if ((err = inet6_fill_ifaddr(skb, ifa,
-				    NETLINK_CB(cb->skb).pid,
-				    cb->nlh->nlmsg_seq, RTM_NEWADDR,
-				    NLM_F_MULTI)) <= 0)
-					goto done;
+				err = inet6_fill_ifaddr(skb, ifa,
+							NETLINK_CB(cb->skb).pid,
+							cb->nlh->nlmsg_seq,
+							RTM_NEWADDR,
+							NLM_F_MULTI);
 			}
 			break;
 		case MULTICAST_ADDR:
@@ -3348,11 +3348,11 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
 			     ifmca = ifmca->next, ip_idx++) {
 				if (ip_idx < s_ip_idx)
 					continue;
-				if ((err = inet6_fill_ifmcaddr(skb, ifmca,
-				    NETLINK_CB(cb->skb).pid,
-				    cb->nlh->nlmsg_seq, RTM_GETMULTICAST,
-				    NLM_F_MULTI)) <= 0)
-					goto done;
+				err = inet6_fill_ifmcaddr(skb, ifmca,
+							  NETLINK_CB(cb->skb).pid,
+							  cb->nlh->nlmsg_seq,
+							  RTM_GETMULTICAST,
+							  NLM_F_MULTI);
 			}
 			break;
 		case ANYCAST_ADDR:
@@ -3361,11 +3361,11 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
 			     ifaca = ifaca->aca_next, ip_idx++) {
 				if (ip_idx < s_ip_idx)
 					continue;
-				if ((err = inet6_fill_ifacaddr(skb, ifaca,
-				    NETLINK_CB(cb->skb).pid,
-				    cb->nlh->nlmsg_seq, RTM_GETANYCAST,
-				    NLM_F_MULTI)) <= 0)
-					goto done;
+				err = inet6_fill_ifacaddr(skb, ifaca,
+							  NETLINK_CB(cb->skb).pid,
+							  cb->nlh->nlmsg_seq,
+							  RTM_GETANYCAST,
+							  NLM_F_MULTI);
 			}
 			break;
 		default:
@@ -3373,14 +3373,12 @@ static int inet6_dump_addr(struct sk_buff *skb, struct netlink_callback *cb,
 		}
 		read_unlock_bh(&idev->lock);
 		in6_dev_put(idev);
+
+		if (err <= 0)
+			break;
 cont:
 		idx++;
 	}
-done:
-	if (err <= 0) {
-		read_unlock_bh(&idev->lock);
-		in6_dev_put(idev);
-	}
 	cb->args[0] = idx;
 	cb->args[1] = ip_idx;
 	return skb->len;
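
The shape of the change above (one lock/unlock pair per loop iteration, with the error check placed after the unlock) can be sketched in userspace as follows. This is a simplified model, not the addrconf code: the counter-based `model_lock()`/`model_unlock()` helpers and `dump_items()` are hypothetical names, and `fail_at` only simulates a fill function returning an error.

```c
#include <assert.h>

static int lock_depth;	/* models the lock state: 0 means unlocked */

static void model_lock(void)
{
	lock_depth++;
}

static void model_unlock(void)
{
	assert(lock_depth > 0);	/* unlocking an unheld lock is a bug */
	lock_depth--;
}

/*
 * Walk ndevs items; the item at index fail_at simulates a fill error.
 * Every path through the loop body unlocks exactly once, so no separate
 * "done:" label has to repeat the unlock on the error path.
 */
int dump_items(int ndevs, int fail_at)
{
	int idx;
	int err = 1;

	for (idx = 0; idx < ndevs; idx++) {
		model_lock();
		err = (idx == fail_at) ? 0 : 1;
		model_unlock();

		if (err <= 0)
			break;
	}
	return idx;
}

int current_lock_depth(void)
{
	return lock_depth;
}
```

With a single unlock site, a checker can see that the lock context is identical on the success and error paths.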

---
commit b825a8d0e0d210bffeec948b1790c3be4c3c5448
Author: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Date:   Tue Jan 22 17:32:53 2008 +0900

    [IPV6] NDISC: Sparse: Use different variable name for local use.
    
    Fix the following sparse warnings:
    | net/ipv6/ndisc.c:1300:21: warning: symbol 'opt' shadows an earlier one
    | net/ipv6/ndisc.c:1078:7: originally declared here
    
    Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index e1554ba..92b6775 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1297,11 +1297,11 @@ skip_defrtr:
 	}
 
 	if (ndopts.nd_useropts) {
-		struct nd_opt_hdr *opt;
-		for (opt = ndopts.nd_useropts;
-		     opt;
-		     opt = ndisc_next_useropt(opt, ndopts.nd_useropts_end)) {
-				ndisc_ra_useropt(skb, opt);
+		struct nd_opt_hdr *p;
+		for (p = ndopts.nd_useropts;
+		     p;
+		     p = ndisc_next_useropt(p, ndopts.nd_useropts_end)) {
+			ndisc_ra_useropt(skb, p);
 		}
 	}
 

---

-- 
YOSHIFUJI Hideaki @ USAGI Project  <yoshfuji@linux-ipv6.org>
GPG-FP  : 9022 65EB 1ECF 3AD1 0BDF  80D8 4807 F894 E062 0EEA

^ permalink raw reply related

* Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings
From: Dave Young @ 2008-01-22  9:18 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: LKML, David Miller, Netdev, Andrew Morton
In-Reply-To: <a8e1da0801220109v6bf8931ev50f2210402c3ba41@mail.gmail.com>

On Jan 22, 2008 5:09 PM, Dave Young <hidave.darkstar@gmail.com> wrote:
>
> On Jan 22, 2008 12:37 PM, Dave Young <hidave.darkstar@gmail.com> wrote:
> >
> > On Jan 22, 2008 5:14 AM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> > >
> > > On Mon, 21 Jan 2008, Dave Young wrote:
> > >
> > > > Please see the kernel messages following,(trigged while using some qemu session)
> > > > BTW, seems there's some e100 error message as well.
> > > >
> > > > PCI: Setting latency timer of device 0000:00:1b.0 to 64
> > > > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > > > e100: Copyright(c) 1999-2006 Intel Corporation
> > > > ACPI: PCI Interrupt 0000:03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > > > modprobe:2331 conflicting cache attribute efaff000-efb00000 uncached<->default
> > > > e100: 0000:03:08.0: e100_probe: Cannot map device registers, aborting.
> > > > ACPI: PCI interrupt for device 0000:03:08.0 disabled
> > > > e100: probe of 0000:03:08.0 failed with error -12
> > > > eth0:  setting full-duplex.
> > > > ------------[ cut here ]------------
> > > > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > > > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw intel_agp button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore dcdbas 3c59x pcspkr snd_page_alloc
> > > > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> > > >  [<c0132100>] ? printk+0x0/0x20
> > > >  [<c0131834>] warn_on_slowpath+0x54/0x80
> > > >  [<c03e8df8>] ? ip_finish_output+0x128/0x2e0
> > > >  [<c03e9527>] ? ip_output+0xe7/0x100
> > > >  [<c03e8a88>] ? ip_local_out+0x18/0x20
> > > >  [<c03e991c>] ? ip_queue_xmit+0x3dc/0x470
> > > >  [<c043641e>] ? _spin_unlock_irqrestore+0x5e/0x70
> > > >  [<c0186be1>] ? check_pad_bytes+0x61/0x80
> > > >  [<c03f6031>] tcp_mark_head_lost+0x121/0x150
> > > >  [<c03f60ac>] tcp_update_scoreboard+0x4c/0x170
> > > >  [<c03f6e0a>] tcp_fastretrans_alert+0x48a/0x6b0
> > > >  [<c03f7d93>] tcp_ack+0x1b3/0x3a0
> > > >  [<c03fa14b>] tcp_rcv_established+0x3eb/0x710
> > > >  [<c04015c5>] tcp_v4_do_rcv+0xe5/0x100
> > > >  [<c0401bbb>] tcp_v4_rcv+0x5db/0x660
> > >
> > > Doh, once more these S+L things..., the rest are symptom of the first
> > > problem.
> >
> > What is the S+L thing? Could you explain a bit?
> >
> > >
> > > What is strange is that it doesn't show up until now, the last TCP
> > > changes that could have some significance are from early Dec/Nov. Is
> > > there some reason why you haven't seen this before this (e.g., not
> > > tested with similar cfg or so)?
> >
> > Hmm, don't know how to answer ...
> >
> >
> > I'm a bit worried about its
> > > reproducability if it takes this far to see it...
> > >
>
> It triggered again on my PC, just while using Firefox.

This may be related to the e100 error; I will apply Jiri Slaby's
e100-iomap-mem-accesses patch and test again.
>
> > >
> > > --
> > >  i.
> > >
> >
>

^ permalink raw reply

* Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings
From: Dave Young @ 2008-01-22  9:09 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: LKML, David Miller, Netdev, Andrew Morton
In-Reply-To: <a8e1da0801212037uaa34a10xc2239ac7309a4ed0@mail.gmail.com>

On Jan 22, 2008 12:37 PM, Dave Young <hidave.darkstar@gmail.com> wrote:
>
> On Jan 22, 2008 5:14 AM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> >
> > On Mon, 21 Jan 2008, Dave Young wrote:
> >
> > > Please see the kernel messages following,(trigged while using some qemu session)
> > > BTW, seems there's some e100 error message as well.
> > >
> > > PCI: Setting latency timer of device 0000:00:1b.0 to 64
> > > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > > e100: Copyright(c) 1999-2006 Intel Corporation
> > > ACPI: PCI Interrupt 0000:03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > > modprobe:2331 conflicting cache attribute efaff000-efb00000 uncached<->default
> > > e100: 0000:03:08.0: e100_probe: Cannot map device registers, aborting.
> > > ACPI: PCI interrupt for device 0000:03:08.0 disabled
> > > e100: probe of 0000:03:08.0 failed with error -12
> > > eth0:  setting full-duplex.
> > > ------------[ cut here ]------------
> > > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw intel_agp button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore dcdbas 3c59x pcspkr snd_page_alloc
> > > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> > >  [<c0132100>] ? printk+0x0/0x20
> > >  [<c0131834>] warn_on_slowpath+0x54/0x80
> > >  [<c03e8df8>] ? ip_finish_output+0x128/0x2e0
> > >  [<c03e9527>] ? ip_output+0xe7/0x100
> > >  [<c03e8a88>] ? ip_local_out+0x18/0x20
> > >  [<c03e991c>] ? ip_queue_xmit+0x3dc/0x470
> > >  [<c043641e>] ? _spin_unlock_irqrestore+0x5e/0x70
> > >  [<c0186be1>] ? check_pad_bytes+0x61/0x80
> > >  [<c03f6031>] tcp_mark_head_lost+0x121/0x150
> > >  [<c03f60ac>] tcp_update_scoreboard+0x4c/0x170
> > >  [<c03f6e0a>] tcp_fastretrans_alert+0x48a/0x6b0
> > >  [<c03f7d93>] tcp_ack+0x1b3/0x3a0
> > >  [<c03fa14b>] tcp_rcv_established+0x3eb/0x710
> > >  [<c04015c5>] tcp_v4_do_rcv+0xe5/0x100
> > >  [<c0401bbb>] tcp_v4_rcv+0x5db/0x660
> >
> > Doh, once more these S+L things..., the rest are symptom of the first
> > problem.
>
> What is the S+L thing? Could you explain a bit?
>
> >
> > What is strange is that it doesn't show up until now, the last TCP
> > changes that could have some significance are from early Dec/Nov. Is
> > there some reason why you haven't seen this before this (e.g., not
> > tested with similar cfg or so)?
>
> Hmm, don't know how to answer ...
>
>
> I'm a bit worried about its
> > reproducability if it takes this far to see it...
> >

It triggered again on my PC, just while using Firefox.

> >
> > --
> >  i.
> >
>

^ permalink raw reply

* [PATCH] SCTP: Fix kernel panic while received AUTH chunk with BAD shared key identifier
From: Wei Yongjun @ 2008-01-22  8:29 UTC (permalink / raw)
  To: netdev; +Cc: lksctp-developers, Vlad Yasevich

If SCTP-AUTH is enabled, receiving an AUTH chunk with a bad shared key
identifier causes a kernel panic.

To reproduce:
Step 1: enable /proc/sys/net/sctp/auth_enable.
Step 2: connect to an auth-capable SCTP server so that an association is
established between the endpoints, then send an AUTH chunk with a bad
shared key identifier. The SCTP server panics after receiving that AUTH chunk.

SCTP client                   SCTP server
  INIT         ---------->   
    (with auth capable)
               <----------    INIT-ACK
                              (with auth capable)
  COOKIE-ECHO  ---------->
               <----------    COOKIE-ACK
  AUTH         ---------->


AUTH chunk is like this:
  AUTH chunk
    Chunk type: AUTH (15)
    Chunk flags: 0x00
    Chunk length: 28
    Shared key identifier: 10
    HMAC identifier: SHA-1 (1)
    HMAC: 0000000000000000000000000000000000000000

kernel panic message:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000005
printing eip: c8f5de2e *pde = 07bc6067 *pte = 00000000
Oops: 0000 [#1] SMP
Modules linked in: sha256_generic md5 sctp ipv6 dm_mirror dm_mod sbs sbshc battery lp snd_ens1371 sg gameport snd_rawmidi snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss floppy snd_mixer_oss ide_cd snd_pcm cdrom serio_raw ac snd_timer snd button pcnet32 soundcore mii snd_page_alloc parport_pc parport i2c_piix4 i2c_core pcspkr mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd

Pid: 0, comm: swapper Not tainted (2.6.24-rc8 #1)
EIP: 0060:[<c8f5de2e>] EFLAGS: 00010202 CPU: 0
EIP is at sctp_auth_asoc_create_secret+0xe9/0x1a1 [sctp]
EAX: 00000056 EBX: c701a940 ECX: c701ab00 EDX: 00000001
ESI: c7ae9444 EDI: fffffffe EBP: c701a940 ESP: c0756cc0
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c0756000 task=c06d63a0 task.ti=c070f000)
Stack: 00000020 00000020 c7ae9444 c701ab00 c701ab00 c701a940 c0756da8 c701a948
       c7ae8000 c7ad1e48 c7bee300 c7ad1e40 c8f5e183 c04058c0 c38b9bc0 00010246
       c7ad1e48 c7ad1e48 c0756da8 00000014 c0460992 0000007b 0000007b 00000014
Call Trace:
 [<c8f5e183>] sctp_auth_calculate_hmac+0x5a/0x126 [sctp]
 [<c04058c0>] apic_timer_interrupt+0x28/0x30
 [<c0460992>] kmemdup+0x14/0x33
 [<c8f46157>] sctp_sf_authenticate+0x126/0x160 [sctp]
 [<c8f4a068>] sctp_sf_eat_auth+0x13c/0x159 [sctp]
 [<c8f5d32c>] sctp_cname+0x0/0x38 [sctp]
 [<c8f4a835>] sctp_do_sm+0xb4/0x103f [sctp]
 [<c8f4e639>] sctp_assoc_bh_rcv+0xc1/0xf4 [sctp]
 [<c8f52b77>] sctp_inq_push+0x2a/0x2d [sctp]
 [<c8f5d24b>] sctp_rcv+0x5c3/0x6a4 [sctp]
 [<c0425241>] try_to_wake_up+0x3bb/0x3c5
 [<c042256f>] find_busiest_group+0x204/0x5f3
 [<c05dd7be>] ip_local_deliver_finish+0xda/0x17d
 [<c05dd6c5>] ip_rcv_finish+0x2c5/0x2e4
 [<c05dd91d>] ip_rcv+0x0/0x237
 [<c05c13f1>] netif_receive_skb+0x328/0x392
 [<c05c37c4>] process_backlog+0x5c/0x9a
 [<c05c32d2>] net_rx_action+0x8d/0x163
 [<c0432db7>] run_timer_softirq+0x2f/0x156
 [<c042fdd3>] __do_softirq+0x5d/0xc1
 [<c0406f38>] do_softirq+0x59/0xa8
 [<c0441e6b>] tick_handle_periodic+0x17/0x5c
 [<c041ae2a>] smp_apic_timer_interrupt+0x74/0x80
 [<c0403c87>] default_idle+0x0/0x3e
 [<c0403c87>] default_idle+0x0/0x3e
 [<c04058c0>] apic_timer_interrupt+0x28/0x30
 [<c0403c87>] default_idle+0x0/0x3e
 [<c0403cb3>] default_idle+0x2c/0x3e
 [<c0403571>] cpu_idle+0x92/0xab
 [<c07148ea>] start_kernel+0x2f7/0x2ff
 [<c07140e0>] unknown_bootoption+0x0/0x195
 =======================
Code: 89 6c 24 14 89 54 24 10 78 08 89 6c 24 10 89 54 24 14 8b 74 24 08 8b 4c 24 10 8b 5c 24 14 8b 56 0c 8b 41 04 03 43 04 85 d2 74 03 <03> 42 04 8b 54 24 04 e8 eb fe ff ff 85 c0 89 44 24 18 0f 84 84
EIP: [<c8f5de2e>] sctp_auth_asoc_create_secret+0xe9/0x1a1 [sctp] SS:ESP 0068:c0756cc0
Kernel panic - not syncing: Fatal exception in interrupt


This patch fixes the problem by returning the key only when its identifier
matches, and NULL otherwise.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>

--- a/net/sctp/auth.c	2008-01-21 00:03:25.000000000 -0500
+++ b/net/sctp/auth.c	2008-01-21 21:31:47.000000000 -0500
@@ -420,15 +420,15 @@ struct sctp_shared_key *sctp_auth_get_sh
 				const struct sctp_association *asoc,
 				__u16 key_id)
 {
-	struct sctp_shared_key *key = NULL;
+	struct sctp_shared_key *key;
 
 	/* First search associations set of endpoint pair shared keys */
 	key_for_each(key, &asoc->endpoint_shared_keys) {
 		if (key->key_id == key_id)
-			break;
+			return key;
 	}
 
-	return key;
+	return NULL;
 }
 
 /*
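
Why the pre-patch loop can hand back a non-NULL pointer when nothing matches: assuming key_for_each() wraps a list_for_each_entry()-style iterator over a circular list, the cursor ends the loop pointing at the list head rather than at NULL. The sketch below models that with a sentinel node; `struct key_node`, `lookup_break()` and `lookup_return()` are hypothetical names. In the kernel the head is a bare list head, so the leftover cursor is not even a valid key.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified circular list: the sentinel head carries no real key. */
struct key_node {
	struct key_node *next;
	int key_id;
};

/* Pre-patch shape: break on a match, return the cursor afterwards. */
struct key_node *lookup_break(struct key_node *head, int key_id)
{
	struct key_node *pos;

	assert(head != NULL);
	for (pos = head->next; pos != head; pos = pos->next) {
		if (pos->key_id == key_id)
			break;
	}
	return pos;	/* equals head, not NULL, when nothing matched */
}

/* Post-patch shape: return from inside the loop, NULL after it. */
struct key_node *lookup_return(struct key_node *head, int key_id)
{
	struct key_node *pos;

	assert(head != NULL);
	for (pos = head->next; pos != head; pos = pos->next) {
		if (pos->key_id == key_id)
			return pos;
	}
	return NULL;
}
```

A caller that tests the result against NULL only gets a reliable answer from the second form.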



^ permalink raw reply

* Re: [PATCH] bluetooth : move children of connection device to NULL before connection down
From: Dave Young @ 2008-01-22  8:24 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: David Miller, netdev, linux-kernel, bluez-devel, cornelia.huck,
	gombasg, htejun, viro, kay.sievers, greg
In-Reply-To: <1200982696.7978.148.camel@aeonflux>

On Jan 22, 2008 2:18 PM, Marcel Holtmann <marcel@holtmann.org> wrote:
> Hi Dave,
>
> > > Add people missed in cc-list.
> >
> > Thanks Dave for your continued efforts on Bluetooth bugs like this.
> >
> > Marcel, are you going to review/ACK/integrate/push-upstream/whatever
> > any of these Bluetooth patches?
> >
> > It hasn't been getting much love from you as of late, you are one of
> > the listed maintainers, and I don't want to lose any of Dave's
> > valuable bug fixing work.

Thanks.

>
> I will be fully back in business next week. Just got stuck in a project
> that needed 200% of my time to get it going.
>
> > Or should I just handle it all directly?
>
> I followed the list only a little bit, but from what I have seen is that
> Dave is doing a great job in tracking all issues down to the real cause.

Thanks.

>
> I had a look at his last patch and after review, I agree that this is a
> possible solution. I only have two nitpicks about the coding style. So
> in del_conn the struct device declaration should be made after the
> struct hci_conn assignment from the container and I would put an extra
> empty line before the devel_del, put_device block. Nitpicks only.

Marcel, could you tell me more about your coding style?
I would like to follow your style in any Bluetooth patches I submit
later.

Maybe you could put it on the BlueZ web site or somewhere similar.

>
> Right now I can't think of any side effects by this patch. Actually I
> only see an improvement with this patch. So please take it directly and
> starting with next week, I gonna make sure that they are handled again
> properly by me.
>
> Regards
>
> Marcel
>
>
>

^ permalink raw reply

* Re: [IPV4] ip_gre: should take care of CONFIG_IPV6_MODULE
From: Patrick McHardy @ 2008-01-22  8:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Linux Netdev List
In-Reply-To: <4795A096.60904@cosmosbay.com>

Eric Dumazet wrote:
> If IPV6 is configured as a module, GRE code misses some IPV6 parts.


I believe this is intentional to avoid a runtime dependency on ipv6.
Fixing this without pulling in the ipv6 module would be preferable.
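
For readers unfamiliar with the two macros: a Kconfig tristate option set to "m" defines CONFIG_IPV6_MODULE rather than CONFIG_IPV6, so a plain `#ifdef CONFIG_IPV6` block is compiled out in that case. A minimal sketch of the difference, with the "=m" case simulated by a local `#define` and with hypothetical function names:

```c
#include <assert.h>

/* Model of "CONFIG_IPV6=m": only the _MODULE macro is defined. */
#define CONFIG_IPV6_MODULE 1

/* Guard as used before the proposed patch. */
int old_guard_enabled(void)
{
#ifdef CONFIG_IPV6
	return 1;
#else
	return 0;
#endif
}

/* Guard as used in the proposed patch. */
int new_guard_enabled(void)
{
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
	return 1;
#else
	return 0;
#endif
}
```

The wider guard compiles the IPv6 branches in, which is exactly what creates the dependency on ipv6 symbols mentioned above.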


^ permalink raw reply

* [IPV4] ip_gre: should take care of CONFIG_IPV6_MODULE
From: Eric Dumazet @ 2008-01-22  7:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 120 bytes --]

If IPV6 is configured as a module, GRE code misses some IPV6 parts.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>


[-- Attachment #2: gre.patch --]
[-- Type: text/plain, Size: 1380 bytes --]

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 4b93f32..beaf450 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -40,7 +40,7 @@
 #include <net/inet_ecn.h>
 #include <net/xfrm.h>
 
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 #include <net/ipv6.h>
 #include <net/ip6_fib.h>
 #include <net/ip6_route.h>
@@ -705,7 +705,7 @@ static int ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 			if ((dst = rt->rt_gateway) == 0)
 				goto tx_error_icmp;
 		}
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 		else if (skb->protocol == htons(ETH_P_IPV6)) {
 			struct in6_addr *addr6;
 			int addr_type;
@@ -778,7 +778,7 @@ static int ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 			goto tx_error;
 		}
 	}
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 	else if (skb->protocol == htons(ETH_P_IPV6)) {
 		struct rt6_info *rt6 = (struct rt6_info*)skb->dst;
 
@@ -851,7 +851,7 @@ static int ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 	if ((iph->ttl = tiph->ttl) == 0) {
 		if (skb->protocol == htons(ETH_P_IP))
 			iph->ttl = old_iph->ttl;
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 		else if (skb->protocol == htons(ETH_P_IPV6))
 			iph->ttl = ((struct ipv6hdr*)old_iph)->hop_limit;
 #endif

^ permalink raw reply related

* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: David Miller @ 2008-01-22  7:32 UTC (permalink / raw)
  To: yanmin_zhang; +Cc: dada1, rick.jones2, netdev
In-Reply-To: <1200984752.3151.261.camel@ymzhang>

From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Date: Tue, 22 Jan 2008 14:52:32 +0800

> I double-checked it and they are queued to socket A. If I define a
> different local port for netperf, packets will be queued to socket
> B.

This does not prove the kernel is buggy.

If netperf is binding to devices, that could make the kernel consider
the 0.0.0.0 bound socket equally preferable to the 127.0.0.1 bound
one.  When preference is equal, the first socket in the list is
chosen.

The algorithm is in net/ipv4/udp.c:__udp4_lib_lookup(), you
can look for yourself.  It uses a scoring system to decide
which socket to match.  Binding to a specific device gives
the score two points, so does binding to a specific local
address.
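
The scoring rule described above can be sketched as a small userspace model. This is not the code in net/ipv4/udp.c: `struct sock_model`, `score_sock()` and `lookup_sock()` are hypothetical names, and the real lookup takes more fields into account. The sketch keeps only the two rules stated here: two points each for a specific local address and a specific device, with ties going to the first socket in the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sock_model {
	uint32_t rcv_saddr;	/* 0 means bound to 0.0.0.0 (any address) */
	int bound_dev_if;	/* 0 means not bound to a device */
	uint16_t port;		/* local port */
};

/* Returns -1 if the socket cannot match, otherwise its score. */
static int score_sock(const struct sock_model *sk, uint32_t daddr,
		      uint16_t dport, int dif)
{
	int score = 0;

	if (sk->port != dport)
		return -1;
	if (sk->rcv_saddr) {
		if (sk->rcv_saddr != daddr)
			return -1;
		score += 2;	/* bound to a specific local address */
	}
	if (sk->bound_dev_if) {
		if (sk->bound_dev_if != dif)
			return -1;
		score += 2;	/* bound to a specific device */
	}
	return score;
}

/* Picks the best-scoring socket; on a tie the earlier entry wins. */
const struct sock_model *lookup_sock(const struct sock_model *socks, size_t n,
				     uint32_t daddr, uint16_t dport, int dif)
{
	const struct sock_model *best = NULL;
	int best_score = -1;
	size_t i;

	assert(socks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int score = score_sock(&socks[i], daddr, dport, dif);

		if (score > best_score) {	/* strictly greater: first wins ties */
			best = &socks[i];
			best_score = score;
		}
	}
	return best;
}
```

In this model a socket bound to 0.0.0.0 plus a device scores the same as one bound to 127.0.0.1 without a device, so whichever comes first in the list receives the packet.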


^ permalink raw reply

* Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
From: Jarek Poplawski @ 2008-01-22  7:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, slavon, kaber, hadi
In-Reply-To: <20080122072152.GA977@ff.dom.local>

On Tue, Jan 22, 2008 at 08:21:52AM +0100, Jarek Poplawski wrote:
...

Part 2 of mini RFC:

HTB changes to use new, faster gen_estimator functions.

This is done against 2.6.24-rc8-mm1. 

Thanks,
Jarek P.

---

diff -Nurp 2.6.24-rc8-mm1-/net/sched/sch_htb.c 2.6.24-rc8-mm1+/net/sched/sch_htb.c
--- 2.6.24-rc8-mm1-/net/sched/sch_htb.c	2008-01-19 17:54:49.000000000 +0100
+++ 2.6.24-rc8-mm1+/net/sched/sch_htb.c	2008-01-22 00:00:31.000000000 +0100
@@ -127,6 +127,7 @@ struct htb_class {
 	int prio;		/* For parent to leaf return possible here */
 	int quantum;		/* we do backup. Finally full replacement  */
 				/* of un.leaf originals should be done. */
+	unsigned long	gen_estimator;	/* ngen_new_estimator() data */
 };
 
 static inline long L2T(struct htb_class *cl, struct qdisc_rate_table *rate,
@@ -1195,7 +1196,7 @@ static void htb_destroy_class(struct Qdi
 		BUG_TRAP(cl->un.leaf.q);
 		qdisc_destroy(cl->un.leaf.q);
 	}
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	ngen_kill_estimator(&cl->gen_estimator);
 	qdisc_put_rtab(cl->rate);
 	qdisc_put_rtab(cl->ceil);
 
@@ -1348,9 +1349,10 @@ static int htb_change_class(struct Qdisc
 		if ((cl = kzalloc(sizeof(*cl), GFP_KERNEL)) == NULL)
 			goto failure;
 
-		gen_new_estimator(&cl->bstats, &cl->rate_est,
-				  &sch->dev->queue_lock,
-				  tca[TCA_RATE-1] ? : &est.rta);
+		ngen_new_estimator(&cl->bstats, &cl->rate_est,
+				   &sch->dev->queue_lock,
+				   tca[TCA_RATE-1] ? : &est.rta,
+				   &cl->gen_estimator);
 		cl->refcnt = 1;
 		INIT_LIST_HEAD(&cl->sibling);
 		INIT_HLIST_NODE(&cl->hlist);
@@ -1404,9 +1406,10 @@ static int htb_change_class(struct Qdisc
 			      parent ? &parent->children : &q->root);
 	} else {
 		if (tca[TCA_RATE-1])
-			gen_replace_estimator(&cl->bstats, &cl->rate_est,
-					      &sch->dev->queue_lock,
-					      tca[TCA_RATE-1]);
+			ngen_replace_estimator(&cl->bstats, &cl->rate_est,
+					       &sch->dev->queue_lock,
+					       tca[TCA_RATE-1],
+					       &cl->gen_estimator);
 		sch_tree_lock(sch);
 	}
 

^ permalink raw reply

* Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
From: Jarek Poplawski @ 2008-01-22  7:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, slavon, kaber, hadi
In-Reply-To: <20080121.162918.148129860.davem@davemloft.net>

On 22-01-2008 01:29, David Miller wrote:
...
> Fix this right, make a structure like:
> 
> struct kernel_gnet_stats_rate_est {
> 	struct gnet_stats_rate_est	est;
> 	void				*gen_estimator;
> }
> 
> And update all the code as needed.

Thanks! I'll try this...

...But, as a matter of fact, I had thought about something similar (of
course much worse), and I was afraid of making quite a lot of changes
at once and maybe missing something again, like here. So, maybe one more
tiny RFC here...

I've tried this from the other side: add an alternative, new API of
gen_estimator functions first, and then do the rest of the changes
without hurry. This doesn't look very nice, but IMHO it should be safer
(especially considering my 'knowledge' of this code and the current
changes). There is this not very nice additional parameter used e.g. in
ngen_new_estimator(), but it seems that, after all the changes, it
should be easier to see how and where this could be optimized. (And,
after all, this new pointer shouldn't be used very often, so it could
sit a bit further away.)

Anyway, if you don't like this idea, let me know and I'll try yours.
It will only need more time.

This is done against 2.6.24-rc8-mm1 with the 3/3 cosmetic patch applied.
I'll soon send part 2: the htb patch to use this.

Regards,
Jarek P.

---

 include/net/gen_stats.h  |   11 +++++
 net/core/gen_estimator.c |   99 ++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 101 insertions(+), 9 deletions(-)


diff -Nurp 2.6.24-rc8-mm1-p3-/include/net/gen_stats.h 2.6.24-rc8-mm1-p3+/include/net/gen_stats.h
--- 2.6.24-rc8-mm1-p3-/include/net/gen_stats.h	2007-10-09 22:31:38.000000000 +0200
+++ 2.6.24-rc8-mm1-p3+/include/net/gen_stats.h	2008-01-22 00:13:48.000000000 +0100
@@ -46,4 +46,15 @@ extern int gen_replace_estimator(struct 
 				 struct gnet_stats_rate_est *rate_est,
 				 spinlock_t *stats_lock, struct rtattr *opt);
 
+extern int ngen_new_estimator(struct gnet_stats_basic *bstats,
+			      struct gnet_stats_rate_est *rate_est,
+			      spinlock_t *stats_lock, struct rtattr *opt,
+			      unsigned long *pgen_estimator);
+extern void ngen_kill_estimator(unsigned long *pgen_estimator);
+
+extern int ngen_replace_estimator(struct gnet_stats_basic *bstats,
+				  struct gnet_stats_rate_est *rate_est,
+				  spinlock_t *stats_lock, struct rtattr *opt,
+				  unsigned long *pgen_estimator);
+
 #endif
diff -Nurp 2.6.24-rc8-mm1-p3-/net/core/gen_estimator.c 2.6.24-rc8-mm1-p3+/net/core/gen_estimator.c
--- 2.6.24-rc8-mm1-p3-/net/core/gen_estimator.c	2008-01-22 00:01:30.000000000 +0100
+++ 2.6.24-rc8-mm1-p3+/net/core/gen_estimator.c	2008-01-22 00:22:37.000000000 +0100
@@ -140,26 +140,30 @@ skip:
 }
 
 /**
- * gen_new_estimator - create a new rate estimator
+ * ngen_new_estimator - create a new rate estimator (new version)
  * @bstats: basic statistics
  * @rate_est: rate estimator statistics
  * @stats_lock: statistics lock
  * @opt: rate estimator configuration TLV
+ * @pgen_estimator: pointer to return ngen_new_estimator data
  *
  * Creates a new rate estimator with &bstats as source and &rate_est
  * as destination. A new timer with the interval specified in the
  * configuration TLV is created. Upon each interval, the latest statistics
  * will be read from &bstats and the estimated rate will be stored in
  * &rate_est with the statistics lock grabed during this period.
+ * Called directly for pgen_estimator and possibility of fast kill
+ * or indirectly by gen_new_estimator.
  *
- * Returns 0 on success or a negative error code.
+ * Returns 0 and data pointed by &pgen_estimator on success
+ * or a negative error code.
  *
  * NOTE: Called under rtnl_mutex
  */
-int gen_new_estimator(struct gnet_stats_basic *bstats,
-		      struct gnet_stats_rate_est *rate_est,
-		      spinlock_t *stats_lock,
-		      struct rtattr *opt)
+int ngen_new_estimator(struct gnet_stats_basic *bstats,
+		       struct gnet_stats_rate_est *rate_est,
+		       spinlock_t *stats_lock, struct rtattr *opt,
+		       unsigned long *pgen_estimator)
 {
 	struct gen_estimator *est;
 	struct gnet_estimator *parm = RTA_DATA(opt);
@@ -184,6 +188,7 @@ int gen_new_estimator(struct gnet_stats_
 	est->avbps = rate_est->bps<<5;
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
+	*pgen_estimator = (unsigned long)est;
 
 	if (!elist[idx].timer.function) {
 		INIT_LIST_HEAD(&elist[idx].list);
@@ -197,6 +202,32 @@ int gen_new_estimator(struct gnet_stats_
 	return 0;
 }
 
+/**
+ * gen_new_estimator - create a new rate estimator 
+ * @bstats: basic statistics
+ * @rate_est: rate estimator statistics
+ * @stats_lock: statistics lock
+ * @opt: rate estimator configuration TLV
+ *
+ * Creates a new rate estimator with &bstats as source and &rate_est
+ * as destination. A new timer with the interval specified in the
+ * configuration TLV is created. Upon each interval, the latest statistics
+ * will be read from &bstats and the estimated rate will be stored in
+ * &rate_est with the statistics lock grabed during this period.
+ *
+ * Returns 0 on success or a negative error code.
+ *
+ * NOTE: Called under rtnl_mutex
+ */
+int gen_new_estimator(struct gnet_stats_basic *bstats,
+		       struct gnet_stats_rate_est *rate_est,
+		       spinlock_t *stats_lock, struct rtattr *opt)
+{
+	unsigned long dump;
+
+	return ngen_new_estimator(bstats, rate_est, stats_lock, opt, &dump);
+}
+
 static void __gen_kill_estimator(struct rcu_head *head)
 {
 	struct gen_estimator *e = container_of(head,
@@ -209,8 +240,7 @@ static void __gen_kill_estimator(struct 
  * @bstats: basic statistics
  * @rate_est: rate estimator statistics
  *
- * Removes the rate estimator specified by &bstats and &rate_est
- * and deletes the timer.
+ * Removes the rate estimator specified by &bstats and &rate_est.
  *
  * NOTE: Called under rtnl_mutex
  */
@@ -241,6 +271,32 @@ void gen_kill_estimator(struct gnet_stat
 }
 
 /**
+ * ngen_kill_estimator - remove a rate estimator (new version)
+ * @pgen_estimator: gen_estimator data got from ngen_new_estimator
+ *
+ * Removes the rate estimator specified by &pgen_estimator
+ * and replaces it with 0.
+ *
+ * NOTE: Called under rtnl_mutex
+ */
+void ngen_kill_estimator(unsigned long *pgen_estimator)
+{
+	if (pgen_estimator && *pgen_estimator) {
+		struct gen_estimator *e;
+		
+		e = (struct gen_estimator *)*pgen_estimator;
+		*pgen_estimator = 0;
+
+		write_lock_bh(&est_lock);
+		e->bstats = NULL;
+		write_unlock_bh(&est_lock);
+
+		list_del_rcu(&e->list);
+		call_rcu(&e->e_rcu, __gen_kill_estimator);
+	}
+}
+
+/**
  * gen_replace_estimator - replace rate estimator configuration
  * @bstats: basic statistics
  * @rate_est: rate estimator statistics
@@ -260,7 +316,32 @@ int gen_replace_estimator(struct gnet_st
 	return gen_new_estimator(bstats, rate_est, stats_lock, opt);
 }
 
+/**
+ * ngen_replace_estimator - replace rate estimator configuration (new version)
+ * @bstats: basic statistics
+ * @rate_est: rate estimator statistics
+ * @stats_lock: statistics lock
+ * @opt: rate estimator configuration TLV
+ * @pgen_estimator: gen_estimator data got from ngen_new_estimator
+ *
+ * Replaces the configuration of a rate estimator by calling
+ * ngen_kill_estimator and ngen_new_estimator.
+ *
+ * Returns 0 on success or a negative error code.
+ */
+int ngen_replace_estimator(struct gnet_stats_basic *bstats,
+			   struct gnet_stats_rate_est *rate_est,
+			   spinlock_t *stats_lock, struct rtattr *opt,
+		       	   unsigned long *pgen_estimator)
+{
+	ngen_kill_estimator(pgen_estimator);
+	return ngen_new_estimator(bstats, rate_est, stats_lock, opt,
+				  pgen_estimator);
+}
 
-EXPORT_SYMBOL(gen_kill_estimator);
 EXPORT_SYMBOL(gen_new_estimator);
+EXPORT_SYMBOL(gen_kill_estimator);
 EXPORT_SYMBOL(gen_replace_estimator);
+EXPORT_SYMBOL(ngen_new_estimator);
+EXPORT_SYMBOL(ngen_kill_estimator);
+EXPORT_SYMBOL(ngen_replace_estimator);
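
The difference between the two kill paths in the patch above can be shown
with a minimal userspace sketch. Everything named sketch_* is hypothetical,
the list is a plain doubly linked list, and the locking and RCU deferral of
the real code are left out. It only shows why keeping the handle returned at
creation time removes the list walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct gen_estimator. */
struct sketch_est {
	struct sketch_est *prev, *next; /* circular list through sketch_head */
	const void *bstats;             /* key the old-style kill searches for */
};

/* List head; the list is empty when it points at itself. */
static struct sketch_est sketch_head = { &sketch_head, &sketch_head, NULL };

/*
 * Create an estimator and return an opaque handle to it.
 * unsigned long mirrors the patch; uintptr_t would be the portable
 * choice in userspace.
 */
static int sketch_new(const void *bstats, unsigned long *handle)
{
	struct sketch_est *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->bstats = bstats;
	e->next = sketch_head.next; /* insert at the head of the list */
	e->prev = &sketch_head;
	sketch_head.next->prev = e;
	sketch_head.next = e;
	*handle = (unsigned long)e;
	return 0;
}

static void sketch_unlink_free(struct sketch_est *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	free(e);
}

/* Old style: O(n) search by key. Returns the number of nodes visited. */
static size_t sketch_kill_by_search(const void *bstats)
{
	struct sketch_est *e;
	size_t visited = 0;

	for (e = sketch_head.next; e != &sketch_head; e = e->next) {
		visited++;
		if (e->bstats == bstats) {
			sketch_unlink_free(e);
			break; /* e is freed, do not touch it again */
		}
	}
	return visited;
}

/* New style: O(1) via the handle, which is cleared so a repeat is a no-op. */
static void sketch_kill_by_handle(unsigned long *handle)
{
	if (handle && *handle) {
		struct sketch_est *e = (struct sketch_est *)*handle;

		*handle = 0;
		sketch_unlink_free(e);
	}
}

/* Count the estimators currently on the list. */
static size_t sketch_count(void)
{
	const struct sketch_est *e;
	size_t n = 0;

	for (e = sketch_head.next; e != &sketch_head; e = e->next)
		n++;
	return n;
}
```

In this sketch a caller that mixes both styles must not reuse a handle after
the same estimator was removed by search, since the handle would then point
at freed memory.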

^ permalink raw reply

* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: Eric Dumazet @ 2008-01-22  7:13 UTC (permalink / raw)
  To: Zhang, Yanmin; +Cc: David Miller, rick.jones2, netdev
In-Reply-To: <1200984664.3151.253.camel@ymzhang>

Zhang, Yanmin wrote:
> On Mon, 2008-01-21 at 22:22 -0800, David Miller wrote:
>> From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
>> Date: Tue, 22 Jan 2008 14:07:19 +0800
>>
>>> I am wondering if UDP stack in kernel has a bug.
>> If one server binds to INADDR_ANY with port N, then any other socket
>> can be bound to a specific IP address with port N.  When packets
>> come in destined for port N, the delivery will be prioritized
>> to whichever socket has the more specific and matching binding.
> What does 'more specific' mean here? I assume 127.0.0.1 should be
> prioritized before 0.0.0.0 which means packets should be queued to
> 127.0.0.1 firstly.

vi +278 net/ipv4/udp.c

                         int score = (sk->sk_family == PF_INET ? 1 : 0);
                         if (inet->rcv_saddr) {
                                 if (inet->rcv_saddr != daddr)
                                         continue;
                                 score+=2;
                         }
                         if (inet->daddr) {
                                 if (inet->daddr != saddr)
                                         continue;
                                 score+=2;
                         }
                         if (inet->dport) {
                                 if (inet->dport != sport)
                                         continue;
                                 score+=2;
                         }
                         if (sk->sk_bound_dev_if) {
                                 if (sk->sk_bound_dev_if != dif)
                                         continue;
                                 score+=2;
                         }

So in your case, the socket bound to 127.0.0.1 should have a better score (+2)
than the other one, unless the other one got a score >= that because of another
match (rcv_saddr set or bound to an interface).






^ permalink raw reply

