Netdev List


* Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings
From: Dave Young @ 2008-01-22  9:09 UTC
  To: Ilpo Järvinen; +Cc: LKML, David Miller, Netdev, Andrew Morton
In-Reply-To: <a8e1da0801212037uaa34a10xc2239ac7309a4ed0@mail.gmail.com>

On Jan 22, 2008 12:37 PM, Dave Young <hidave.darkstar@gmail.com> wrote:
>
> On Jan 22, 2008 5:14 AM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
> >
> > On Mon, 21 Jan 2008, Dave Young wrote:
> >
> > > Please see the following kernel messages (triggered while using a qemu session).
> > > BTW, there seems to be an e100 error message as well.
> > >
> > > PCI: Setting latency timer of device 0000:00:1b.0 to 64
> > > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > > e100: Copyright(c) 1999-2006 Intel Corporation
> > > ACPI: PCI Interrupt 0000:03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > > modprobe:2331 conflicting cache attribute efaff000-efb00000 uncached<->default
> > > e100: 0000:03:08.0: e100_probe: Cannot map device registers, aborting.
> > > ACPI: PCI interrupt for device 0000:03:08.0 disabled
> > > e100: probe of 0000:03:08.0 failed with error -12
> > > eth0:  setting full-duplex.
> > > ------------[ cut here ]------------
> > > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw intel_agp button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore dcdbas 3c59x pcspkr snd_page_alloc
> > > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> > >  [<c0132100>] ? printk+0x0/0x20
> > >  [<c0131834>] warn_on_slowpath+0x54/0x80
> > >  [<c03e8df8>] ? ip_finish_output+0x128/0x2e0
> > >  [<c03e9527>] ? ip_output+0xe7/0x100
> > >  [<c03e8a88>] ? ip_local_out+0x18/0x20
> > >  [<c03e991c>] ? ip_queue_xmit+0x3dc/0x470
> > >  [<c043641e>] ? _spin_unlock_irqrestore+0x5e/0x70
> > >  [<c0186be1>] ? check_pad_bytes+0x61/0x80
> > >  [<c03f6031>] tcp_mark_head_lost+0x121/0x150
> > >  [<c03f60ac>] tcp_update_scoreboard+0x4c/0x170
> > >  [<c03f6e0a>] tcp_fastretrans_alert+0x48a/0x6b0
> > >  [<c03f7d93>] tcp_ack+0x1b3/0x3a0
> > >  [<c03fa14b>] tcp_rcv_established+0x3eb/0x710
> > >  [<c04015c5>] tcp_v4_do_rcv+0xe5/0x100
> > >  [<c0401bbb>] tcp_v4_rcv+0x5db/0x660
> >
> > Doh, once more these S+L things...; the rest are symptoms of the first
> > problem.
>
> What is the S+L thing? Could you explain a bit?
>
> >
> > What is strange is that it didn't show up until now; the last TCP
> > changes that could have some significance are from early Dec/Nov. Is
> > there some reason why you haven't seen this before (e.g., not
> > tested with a similar config)?
>
> Hmm, I don't know how to answer...
>
>
> > I'm a bit worried about its
> > reproducibility if it takes this long to see it...
> >

It triggered again on my PC, just while using Firefox.

> >
> > --
> >  i.
> >
>
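
Dave's question about "S+L" is not answered in this part of the thread. Assuming Ilpo means the TCP scoreboard counters sacked_out (S) and lost_out (L), the invariant being checked is that S + L never exceeds packets_out: a segment in flight should be counted at most once, as either SACKed or lost. A minimal user-space sketch of that check follows; the struct and helper names are hypothetical stand-ins, not the kernel's tcp_sock or its helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the three tcp_sock counters involved. */
struct tcp_counters_sketch {
	unsigned int packets_out; /* segments sent but not yet cumulatively ACKed */
	unsigned int sacked_out;  /* "S" (assumed): segments the peer has SACKed */
	unsigned int lost_out;    /* "L" (assumed): segments marked as lost */
};

/* S + L: segments the sender no longer counts as being in the network. */
static unsigned int left_out(const struct tcp_counters_sketch *c)
{
	assert(c != NULL);
	return c->sacked_out + c->lost_out;
}

/* The invariant: S + L <= packets_out. A violation means some segment
 * was counted twice, i.e. the scoreboard has become inconsistent. */
static bool left_out_ok(const struct tcp_counters_sketch *c)
{
	return left_out(c) <= c->packets_out;
}
```

Under that reading, the WARNING in tcp_mark_head_lost() would mean the counters had already become inconsistent, which fits Ilpo's remark that the remaining messages are symptoms of the first problem.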


* [PATCH] SCTP: Fix kernel panic while received AUTH chunk with BAD shared key identifier
From: Wei Yongjun @ 2008-01-22  8:29 UTC
  To: netdev; +Cc: lksctp-developers, Vlad Yasevich

If SCTP-AUTH is enabled, receiving an AUTH chunk with a bad shared key
identifier causes a kernel panic.

Test as follows:
step 1: enable /proc/sys/net/sctp/auth_enable
step 2: connect to the SCTP server with AUTH capability. Once the
association is established between the endpoints, send an AUTH chunk
with a bad shared key identifier; the SCTP server panics after
receiving that AUTH chunk.

SCTP client                   SCTP server
  INIT         ---------->   
    (with auth capable)
               <----------    INIT-ACK
                              (with auth capable)
  COOKIE-ECHO  ---------->
               <----------    COOKIE-ACK
  AUTH         ---------->


The AUTH chunk looks like this:
  AUTH chunk
    Chunk type: AUTH (15)
    Chunk flags: 0x00
    Chunk length: 28
    Shared key identifier: 10
    HMAC identifier: SHA-1 (1)
    HMAC: 0000000000000000000000000000000000000000
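
The chunk length of 28 in the dump is the 4-byte chunk header (type, flags, length) plus the 2-byte shared key identifier, the 2-byte HMAC identifier and the 20-byte SHA-1 HMAC. The sketch below lays out those bytes in network byte order, following the AUTH chunk format of RFC 4895. The function and macro names are illustrative only, and it just builds the chunk: actually sending it would also need the SCTP common header and an established association, which are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AUTH_CHUNK_TYPE_SKETCH 15 /* chunk type from the dump above */
#define AUTH_HMAC_SHA1_SKETCH  1  /* HMAC identifier from the dump above */
#define SHA1_DIGEST_LEN        20

/* Lay out an AUTH chunk with an all-zero (bogus) HMAC.
 * Returns the chunk length, or 0 if buf is too small. */
static size_t build_auth_chunk(uint8_t *buf, size_t len, uint16_t key_id)
{
	/* header + shared key id + HMAC id + SHA-1 HMAC = 4 + 2 + 2 + 20 */
	size_t chunk_len = 4 + 2 + 2 + SHA1_DIGEST_LEN;

	assert(buf != NULL);
	if (len < chunk_len)
		return 0;
	buf[0] = AUTH_CHUNK_TYPE_SKETCH;
	buf[1] = 0;                              /* chunk flags */
	buf[2] = (uint8_t)(chunk_len >> 8);      /* chunk length, big-endian */
	buf[3] = (uint8_t)(chunk_len & 0xff);
	buf[4] = (uint8_t)(key_id >> 8);         /* shared key identifier */
	buf[5] = (uint8_t)(key_id & 0xff);
	buf[6] = 0;                              /* HMAC identifier: SHA-1 */
	buf[7] = AUTH_HMAC_SHA1_SKETCH;
	memset(buf + 8, 0, SHA1_DIGEST_LEN);     /* all-zero HMAC */
	return chunk_len;
}
```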

kernel panic message:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000005
printing eip: c8f5de2e *pde = 07bc6067 *pte = 00000000
Oops: 0000 [#1] SMP
Modules linked in: sha256_generic md5 sctp ipv6 dm_mirror dm_mod sbs sbshc battery lp snd_ens1371 sg gameport snd_rawmidi snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss floppy snd_mixer_oss ide_cd snd_pcm cdrom serio_raw ac snd_timer snd button pcnet32 soundcore mii snd_page_alloc parport_pc parport i2c_piix4 i2c_core pcspkr mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd

Pid: 0, comm: swapper Not tainted (2.6.24-rc8 #1)
EIP: 0060:[<c8f5de2e>] EFLAGS: 00010202 CPU: 0
EIP is at sctp_auth_asoc_create_secret+0xe9/0x1a1 [sctp]
EAX: 00000056 EBX: c701a940 ECX: c701ab00 EDX: 00000001
ESI: c7ae9444 EDI: fffffffe EBP: c701a940 ESP: c0756cc0
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c0756000 task=c06d63a0 task.ti=c070f000)
Stack: 00000020 00000020 c7ae9444 c701ab00 c701ab00 c701a940 c0756da8 c701a948
       c7ae8000 c7ad1e48 c7bee300 c7ad1e40 c8f5e183 c04058c0 c38b9bc0 00010246
       c7ad1e48 c7ad1e48 c0756da8 00000014 c0460992 0000007b 0000007b 00000014
Call Trace:
 [<c8f5e183>] sctp_auth_calculate_hmac+0x5a/0x126 [sctp]
 [<c04058c0>] apic_timer_interrupt+0x28/0x30
 [<c0460992>] kmemdup+0x14/0x33
 [<c8f46157>] sctp_sf_authenticate+0x126/0x160 [sctp]
 [<c8f4a068>] sctp_sf_eat_auth+0x13c/0x159 [sctp]
 [<c8f5d32c>] sctp_cname+0x0/0x38 [sctp]
 [<c8f4a835>] sctp_do_sm+0xb4/0x103f [sctp]
 [<c8f4e639>] sctp_assoc_bh_rcv+0xc1/0xf4 [sctp]
 [<c8f52b77>] sctp_inq_push+0x2a/0x2d [sctp]
 [<c8f5d24b>] sctp_rcv+0x5c3/0x6a4 [sctp]
 [<c0425241>] try_to_wake_up+0x3bb/0x3c5
 [<c042256f>] find_busiest_group+0x204/0x5f3
 [<c05dd7be>] ip_local_deliver_finish+0xda/0x17d
 [<c05dd6c5>] ip_rcv_finish+0x2c5/0x2e4
 [<c05dd91d>] ip_rcv+0x0/0x237
 [<c05c13f1>] netif_receive_skb+0x328/0x392
 [<c05c37c4>] process_backlog+0x5c/0x9a
 [<c05c32d2>] net_rx_action+0x8d/0x163
 [<c0432db7>] run_timer_softirq+0x2f/0x156
 [<c042fdd3>] __do_softirq+0x5d/0xc1
 [<c0406f38>] do_softirq+0x59/0xa8
 [<c0441e6b>] tick_handle_periodic+0x17/0x5c
 [<c041ae2a>] smp_apic_timer_interrupt+0x74/0x80
 [<c0403c87>] default_idle+0x0/0x3e
 [<c0403c87>] default_idle+0x0/0x3e
 [<c04058c0>] apic_timer_interrupt+0x28/0x30
 [<c0403c87>] default_idle+0x0/0x3e
 [<c0403cb3>] default_idle+0x2c/0x3e
 [<c0403571>] cpu_idle+0x92/0xab
 [<c07148ea>] start_kernel+0x2f7/0x2ff
 [<c07140e0>] unknown_bootoption+0x0/0x195
 =======================
Code: 89 6c 24 14 89 54 24 10 78 08 89 6c 24 10 89 54 24 14 8b 74 24 08 8b 4c 24 10 8b 5c 24 14 8b 56 0c 8b 41 04 03 43 04 85 d2 74 03 <03> 42 04 8b 54 24 04 e8 eb fe ff ff 85 c0 89 44 24 18 0f 84 84
EIP: [<c8f5de2e>] sctp_auth_asoc_create_secret+0xe9/0x1a1 [sctp] SS:ESP 0068:c0756cc0
Kernel panic - not syncing: Fatal exception in interrupt


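This patch fixes the problem. sctp_auth_get_shkey() iterates with
key_for_each(); when no key matches, the loop cursor is not NULL after
the loop ends but a bogus pointer derived from the list head, which the
caller then uses as a real key. Return the matching key from inside the
loop, and NULL otherwise.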

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>

--- a/net/sctp/auth.c	2008-01-21 00:03:25.000000000 -0500
+++ b/net/sctp/auth.c	2008-01-21 21:31:47.000000000 -0500
@@ -420,15 +420,15 @@ struct sctp_shared_key *sctp_auth_get_sh
 				const struct sctp_association *asoc,
 				__u16 key_id)
 {
-	struct sctp_shared_key *key = NULL;
+	struct sctp_shared_key *key;
 
 	/* First search associations set of endpoint pair shared keys */
 	key_for_each(key, &asoc->endpoint_shared_keys) {
 		if (key->key_id == key_id)
-			break;
+			return key;
 	}
 
-	return key;
+	return NULL;
 }
 
 /*



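The pattern behind the fix, sketched in user space: iterate the list and return the match from inside the loop, falling back to NULL. The list helpers and shared_key_sketch below are simplified, hypothetical stand-ins for the kernel's list_head machinery and struct sctp_shared_key. The sketch walks raw list nodes instead of using a list_for_each_entry()-style cursor, so it never forms the head-derived pointer that the original code could return.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for the kernel list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical stand-in for struct sctp_shared_key. */
struct shared_key_sketch {
	struct list_head key_list;
	unsigned short key_id;
};

/* Fixed lookup pattern: return the match immediately, NULL if none.
 * With an entry cursor, running off the end leaves the cursor at
 * container_of(head, ...), which is not NULL; hence the old bug. */
static struct shared_key_sketch *get_shkey(struct list_head *head,
					   unsigned short key_id)
{
	struct list_head *pos;

	assert(head != NULL);
	for (pos = head->next; pos != head; pos = pos->next) {
		struct shared_key_sketch *key =
			container_of(pos, struct shared_key_sketch, key_list);

		if (key->key_id == key_id)
			return key;
	}
	return NULL;
}
```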

* Re: [PATCH] bluetooth : move children of connection device to NULL before connection down
From: Dave Young @ 2008-01-22  8:24 UTC
  To: Marcel Holtmann
  Cc: David Miller, netdev, linux-kernel, bluez-devel, cornelia.huck,
	gombasg, htejun, viro, kay.sievers, greg
In-Reply-To: <1200982696.7978.148.camel@aeonflux>

On Jan 22, 2008 2:18 PM, Marcel Holtmann <marcel@holtmann.org> wrote:
> Hi Dave,
>
> > > Adding people who were missed in the Cc list.
> >
> > Thanks Dave for your continued efforts on Bluetooth bugs like this.
> >
> > Marcel, are you going to review/ACK/integrate/push-upstream/whatever
> > any of these Bluetooth patches?
> >
> > It hasn't been getting much love from you as of late, you are one of
> > the listed maintainers, and I don't want to lose any of Dave's
> > valuable bug fixing work.

Thanks.

>
> I will be fully back in business next week. Just got stuck in a project
> that needed 200% of my time to get it going.
>
> > Or should I just handle it all directly?
>
> I followed the list only a little bit, but from what I have seen,
> Dave is doing a great job of tracking all issues down to the real cause.

Thanks.

>
> I had a look at his last patch and after review, I agree that this is a
> possible solution. I only have two nitpicks about the coding style: in
> del_conn, the struct device declaration should come after the
> struct hci_conn assignment from the container, and I would put an extra
> empty line before the device_del, put_device block. Nitpicks only.

Marcel, could you tell me something more about your coding style?
I would like any Bluetooth patches I submit later to follow your
style.

Maybe you could put it on the BlueZ web site or somewhere else.

>
> Right now I can't think of any side effects of this patch. Actually I
> only see an improvement with this patch. So please take it directly, and
> starting next week, I am going to make sure that they are handled
> properly by me again.
>
> Regards
>
> Marcel
>
>
>


* Re: [IPV4] ip_gre: should take care of CONFIG_IPV6_MODULE
From: Patrick McHardy @ 2008-01-22  8:21 UTC
  To: Eric Dumazet; +Cc: David S. Miller, Linux Netdev List
In-Reply-To: <4795A096.60904@cosmosbay.com>

Eric Dumazet wrote:
> If IPV6 is configured as a module, the GRE code misses some IPV6 parts.

I believe this is intentional, to avoid a runtime dependency on ipv6.
Fixing this without pulling in the ipv6 module would be preferable.


* [IPV4] ip_gre: should take care of CONFIG_IPV6_MODULE
From: Eric Dumazet @ 2008-01-22  7:51 UTC
  To: David S. Miller; +Cc: Linux Netdev List


If IPV6 is configured as a module, the GRE code misses some IPV6 parts.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>



diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 4b93f32..beaf450 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -40,7 +40,7 @@
 #include <net/inet_ecn.h>
 #include <net/xfrm.h>
 
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 #include <net/ipv6.h>
 #include <net/ip6_fib.h>
 #include <net/ip6_route.h>
@@ -705,7 +705,7 @@ static int ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 			if ((dst = rt->rt_gateway) == 0)
 				goto tx_error_icmp;
 		}
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 		else if (skb->protocol == htons(ETH_P_IPV6)) {
 			struct in6_addr *addr6;
 			int addr_type;
@@ -778,7 +778,7 @@ static int ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 			goto tx_error;
 		}
 	}
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 	else if (skb->protocol == htons(ETH_P_IPV6)) {
 		struct rt6_info *rt6 = (struct rt6_info*)skb->dst;
 
@@ -851,7 +851,7 @@ static int ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 	if ((iph->ttl = tiph->ttl) == 0) {
 		if (skb->protocol == htons(ETH_P_IP))
 			iph->ttl = old_iph->ttl;
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 		else if (skb->protocol == htons(ETH_P_IPV6))
 			iph->ttl = ((struct ipv6hdr*)old_iph)->hop_limit;
 #endif

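Why the plain #ifdef misses the modular case: the kernel's generated config header defines CONFIG_FOO for a built-in (=y) option and CONFIG_FOO_MODULE for a modular (=m) one. The compile-time sketch below simulates CONFIG_IPV6=m by defining the macro by hand (an assumption for the demo; in a kernel build it comes from the generated config header) and compares the old and new guards.

```c
#include <assert.h> /* assert(), for runtime checks of the helpers below */

/* Demo assumption: pretend the kernel was configured with CONFIG_IPV6=m. */
#define CONFIG_IPV6_MODULE 1

/* Guard used by ip_gre.c before the patch: true only for built-in IPv6. */
#ifdef CONFIG_IPV6
#define OLD_GUARD 1
#else
#define OLD_GUARD 0
#endif

/* Guard introduced by the patch: true for built-in or modular IPv6. */
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
#define NEW_GUARD 1
#else
#define NEW_GUARD 0
#endif

/* With IPv6 as a module, only the new guard compiles the IPv6 branches in. */
_Static_assert(OLD_GUARD == 0 && NEW_GUARD == 1,
	       "a plain #ifdef CONFIG_IPV6 does not see modular IPv6");

static int old_guard(void) { return OLD_GUARD; }
static int new_guard(void) { return NEW_GUARD; }
```

As Patrick McHardy notes in his reply in this thread, compiling those branches in when IPv6 is modular is not free: ip_gre may then pick up a runtime dependency on the ipv6 module.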

* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: David Miller @ 2008-01-22  7:32 UTC
  To: yanmin_zhang; +Cc: dada1, rick.jones2, netdev
In-Reply-To: <1200984752.3151.261.camel@ymzhang>

From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Date: Tue, 22 Jan 2008 14:52:32 +0800

> I double-checked it and they are queued to socket A. If I define a
> different local port for netperf, packets will be queued to socket
> B.

This does not prove the kernel is buggy.

If netperf is binding to devices, that could make the kernel consider
the 0.0.0.0 bound socket equally preferable to the 127.0.0.1 bound
one.  When preference is equal, the first socket in the list is
chosen.

The algorithm is in net/ipv4/udp.c:__udp4_lib_lookup(); you
can look for yourself.  It uses a scoring system to decide
which socket to match.  Binding to a specific device adds
two points to the score, as does binding to a specific local
address.


* Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
From: Jarek Poplawski @ 2008-01-22  7:26 UTC
  To: David Miller; +Cc: netdev, slavon, kaber, hadi
In-Reply-To: <20080122072152.GA977@ff.dom.local>

On Tue, Jan 22, 2008 at 08:21:52AM +0100, Jarek Poplawski wrote:
...

Part 2 of mini RFC:

HTB changes to use the new, faster gen_estimator functions.

This is done against 2.6.24-rc8-mm1. 

Thanks,
Jarek P.

---

diff -Nurp 2.6.24-rc8-mm1-/net/sched/sch_htb.c 2.6.24-rc8-mm1+/net/sched/sch_htb.c
--- 2.6.24-rc8-mm1-/net/sched/sch_htb.c	2008-01-19 17:54:49.000000000 +0100
+++ 2.6.24-rc8-mm1+/net/sched/sch_htb.c	2008-01-22 00:00:31.000000000 +0100
@@ -127,6 +127,7 @@ struct htb_class {
 	int prio;		/* For parent to leaf return possible here */
 	int quantum;		/* we do backup. Finally full replacement  */
 				/* of un.leaf originals should be done. */
+	unsigned long	gen_estimator;	/* ngen_new_estimator() data */
 };
 
 static inline long L2T(struct htb_class *cl, struct qdisc_rate_table *rate,
@@ -1195,7 +1196,7 @@ static void htb_destroy_class(struct Qdi
 		BUG_TRAP(cl->un.leaf.q);
 		qdisc_destroy(cl->un.leaf.q);
 	}
-	gen_kill_estimator(&cl->bstats, &cl->rate_est);
+	ngen_kill_estimator(&cl->gen_estimator);
 	qdisc_put_rtab(cl->rate);
 	qdisc_put_rtab(cl->ceil);
 
@@ -1348,9 +1349,10 @@ static int htb_change_class(struct Qdisc
 		if ((cl = kzalloc(sizeof(*cl), GFP_KERNEL)) == NULL)
 			goto failure;
 
-		gen_new_estimator(&cl->bstats, &cl->rate_est,
-				  &sch->dev->queue_lock,
-				  tca[TCA_RATE-1] ? : &est.rta);
+		ngen_new_estimator(&cl->bstats, &cl->rate_est,
+				   &sch->dev->queue_lock,
+				   tca[TCA_RATE-1] ? : &est.rta,
+				   &cl->gen_estimator);
 		cl->refcnt = 1;
 		INIT_LIST_HEAD(&cl->sibling);
 		INIT_HLIST_NODE(&cl->hlist);
@@ -1404,9 +1406,10 @@ static int htb_change_class(struct Qdisc
 			      parent ? &parent->children : &q->root);
 	} else {
 		if (tca[TCA_RATE-1])
-			gen_replace_estimator(&cl->bstats, &cl->rate_est,
-					      &sch->dev->queue_lock,
-					      tca[TCA_RATE-1]);
+			ngen_replace_estimator(&cl->bstats, &cl->rate_est,
+					       &sch->dev->queue_lock,
+					       tca[TCA_RATE-1],
+					       &cl->gen_estimator);
 		sch_tree_lock(sch);
 	}
 


* Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
From: Jarek Poplawski @ 2008-01-22  7:21 UTC
  To: David Miller; +Cc: netdev, slavon, kaber, hadi
In-Reply-To: <20080121.162918.148129860.davem@davemloft.net>

On 22-01-2008 01:29, David Miller wrote:
...
> Fix this right, make a structure like:
> 
> struct kernel_gnet_stats_rate_est {
> 	struct gnet_stats_rate_est	est;
> 	void				*gen_estimator;
> };
> 
> And update all the code as needed.

Thanks! I'll try this...

...But, as a matter of fact, I had thought about something similar (of
course much worse), and I was afraid of making quite a lot of changes
at once and maybe again missing something, as happened here. So, maybe
one more tiny RFC here...

I've approached this from the other side: first add an alternative, new
API of gen_estimator functions, and then make the rest of the changes
without hurry. This doesn't look very nice, but IMHO it should be safer
(especially considering my 'knowledge' of this code and the current
changes). There is this not very nice additional parameter used e.g. in
ngen_new_estimator(), but after all the changes it should be more
visible how and where this could be optimized. (And, after all, this
new pointer shouldn't be used very often, so it could sit a bit further
away.)

Anyway, if you don't like this idea, let me know and I'll try yours.
It will just need more time.

This is done against 2.6.24-rc8-mm1 with the 3/3 cosmetic patch applied.
I'll soon send part 2: the HTB patch that uses this.

Regards,
Jarek P.

---

 include/net/gen_stats.h  |   11 +++++
 net/core/gen_estimator.c |   99 ++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 101 insertions(+), 9 deletions(-)


diff -Nurp 2.6.24-rc8-mm1-p3-/include/net/gen_stats.h 2.6.24-rc8-mm1-p3+/include/net/gen_stats.h
--- 2.6.24-rc8-mm1-p3-/include/net/gen_stats.h	2007-10-09 22:31:38.000000000 +0200
+++ 2.6.24-rc8-mm1-p3+/include/net/gen_stats.h	2008-01-22 00:13:48.000000000 +0100
@@ -46,4 +46,15 @@ extern int gen_replace_estimator(struct 
 				 struct gnet_stats_rate_est *rate_est,
 				 spinlock_t *stats_lock, struct rtattr *opt);
 
+extern int ngen_new_estimator(struct gnet_stats_basic *bstats,
+			      struct gnet_stats_rate_est *rate_est,
+			      spinlock_t *stats_lock, struct rtattr *opt,
+			      unsigned long *pgen_estimator);
+extern void ngen_kill_estimator(unsigned long *pgen_estimator);
+
+extern int ngen_replace_estimator(struct gnet_stats_basic *bstats,
+				  struct gnet_stats_rate_est *rate_est,
+				  spinlock_t *stats_lock, struct rtattr *opt,
+				  unsigned long *pgen_estimator);
+
 #endif
diff -Nurp 2.6.24-rc8-mm1-p3-/net/core/gen_estimator.c 2.6.24-rc8-mm1-p3+/net/core/gen_estimator.c
--- 2.6.24-rc8-mm1-p3-/net/core/gen_estimator.c	2008-01-22 00:01:30.000000000 +0100
+++ 2.6.24-rc8-mm1-p3+/net/core/gen_estimator.c	2008-01-22 00:22:37.000000000 +0100
@@ -140,26 +140,30 @@ skip:
 }
 
 /**
- * gen_new_estimator - create a new rate estimator
+ * ngen_new_estimator - create a new rate estimator (new version)
  * @bstats: basic statistics
  * @rate_est: rate estimator statistics
  * @stats_lock: statistics lock
  * @opt: rate estimator configuration TLV
+ * @pgen_estimator: pointer to return ngen_new_estimator data
  *
  * Creates a new rate estimator with &bstats as source and &rate_est
  * as destination. A new timer with the interval specified in the
  * configuration TLV is created. Upon each interval, the latest statistics
  * will be read from &bstats and the estimated rate will be stored in
  * &rate_est with the statistics lock grabed during this period.
+ * Called directly for pgen_estimator and possibility of fast kill
+ * or indirectly by gen_new_estimator.
  *
- * Returns 0 on success or a negative error code.
+ * Returns 0 and data pointed by &pgen_estimator on success
+ * or a negative error code.
  *
  * NOTE: Called under rtnl_mutex
  */
-int gen_new_estimator(struct gnet_stats_basic *bstats,
-		      struct gnet_stats_rate_est *rate_est,
-		      spinlock_t *stats_lock,
-		      struct rtattr *opt)
+int ngen_new_estimator(struct gnet_stats_basic *bstats,
+		       struct gnet_stats_rate_est *rate_est,
+		       spinlock_t *stats_lock, struct rtattr *opt,
+		       unsigned long *pgen_estimator)
 {
 	struct gen_estimator *est;
 	struct gnet_estimator *parm = RTA_DATA(opt);
@@ -184,6 +188,7 @@ int gen_new_estimator(struct gnet_stats_
 	est->avbps = rate_est->bps<<5;
 	est->last_packets = bstats->packets;
 	est->avpps = rate_est->pps<<10;
+	*pgen_estimator = (unsigned long)est;
 
 	if (!elist[idx].timer.function) {
 		INIT_LIST_HEAD(&elist[idx].list);
@@ -197,6 +202,32 @@ int gen_new_estimator(struct gnet_stats_
 	return 0;
 }
 
+/**
+ * gen_new_estimator - create a new rate estimator 
+ * @bstats: basic statistics
+ * @rate_est: rate estimator statistics
+ * @stats_lock: statistics lock
+ * @opt: rate estimator configuration TLV
+ *
+ * Creates a new rate estimator with &bstats as source and &rate_est
+ * as destination. A new timer with the interval specified in the
+ * configuration TLV is created. Upon each interval, the latest statistics
+ * will be read from &bstats and the estimated rate will be stored in
+ * &rate_est with the statistics lock grabed during this period.
+ *
+ * Returns 0 on success or a negative error code.
+ *
+ * NOTE: Called under rtnl_mutex
+ */
+int gen_new_estimator(struct gnet_stats_basic *bstats,
+		       struct gnet_stats_rate_est *rate_est,
+		       spinlock_t *stats_lock, struct rtattr *opt)
+{
+	unsigned long dump;
+
+	return ngen_new_estimator(bstats, rate_est, stats_lock, opt, &dump);
+}
+
 static void __gen_kill_estimator(struct rcu_head *head)
 {
 	struct gen_estimator *e = container_of(head,
@@ -209,8 +240,7 @@ static void __gen_kill_estimator(struct 
  * @bstats: basic statistics
  * @rate_est: rate estimator statistics
  *
- * Removes the rate estimator specified by &bstats and &rate_est
- * and deletes the timer.
+ * Removes the rate estimator specified by &bstats and &rate_est.
  *
  * NOTE: Called under rtnl_mutex
  */
@@ -241,6 +271,32 @@ void gen_kill_estimator(struct gnet_stat
 }
 
 /**
+ * ngen_kill_estimator - remove a rate estimator (new version)
+ * @pgen_estimator: gen_estimator data got from ngen_new_estimator
+ *
+ * Removes the rate estimator specified by &pgen_estimator
+ * and replaces it with 0.
+ *
+ * NOTE: Called under rtnl_mutex
+ */
+void ngen_kill_estimator(unsigned long *pgen_estimator)
+{
+	if (pgen_estimator && *pgen_estimator) {
+		struct gen_estimator *e;
+		
+		e = (struct gen_estimator *)*pgen_estimator;
+		*pgen_estimator = 0;
+
+		write_lock_bh(&est_lock);
+		e->bstats = NULL;
+		write_unlock_bh(&est_lock);
+
+		list_del_rcu(&e->list);
+		call_rcu(&e->e_rcu, __gen_kill_estimator);
+	}
+}
+
+/**
  * gen_replace_estimator - replace rate estimator configuration
  * @bstats: basic statistics
  * @rate_est: rate estimator statistics
@@ -260,7 +316,32 @@ int gen_replace_estimator(struct gnet_st
 	return gen_new_estimator(bstats, rate_est, stats_lock, opt);
 }
 
+/**
+ * ngen_replace_estimator - replace rate estimator configuration (new version)
+ * @bstats: basic statistics
+ * @rate_est: rate estimator statistics
+ * @stats_lock: statistics lock
+ * @opt: rate estimator configuration TLV
+ * @pgen_estimator: gen_estimator data got from ngen_new_estimator
+ *
+ * Replaces the configuration of a rate estimator by calling
+ * ngen_kill_estimator and ngen_new_estimator.
+ *
+ * Returns 0 on success or a negative error code.
+ */
+int ngen_replace_estimator(struct gnet_stats_basic *bstats,
+			   struct gnet_stats_rate_est *rate_est,
+			   spinlock_t *stats_lock, struct rtattr *opt,
+			   unsigned long *pgen_estimator)
+{
+	ngen_kill_estimator(pgen_estimator);
+	return ngen_new_estimator(bstats, rate_est, stats_lock, opt,
+				  pgen_estimator);
+}
 
-EXPORT_SYMBOL(gen_kill_estimator);
 EXPORT_SYMBOL(gen_new_estimator);
+EXPORT_SYMBOL(gen_kill_estimator);
 EXPORT_SYMBOL(gen_replace_estimator);
+EXPORT_SYMBOL(ngen_new_estimator);
+EXPORT_SYMBOL(ngen_kill_estimator);
+EXPORT_SYMBOL(ngen_replace_estimator);

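The idea behind the ngen_* variants, reduced to a user-space sketch: gen_kill_estimator() has to find the estimator by its (bstats, rate_est) pair, modelled here as a scan over one list, while ngen_kill_estimator() takes the estimator straight from the handle that ngen_new_estimator() stored in the caller's unsigned long, then zeroes it so a second kill is a no-op. The est_* names and the single global list are hypothetical simplifications; the real code keeps one list per interval, clears bstats under est_lock and frees through RCU, none of which is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct gen_estimator (doubly linked). */
struct est_sketch {
	struct est_sketch *prev, *next;
	const void *bstats;
	const void *rate_est;
};

/* Sentinel head of a single circular list (the kernel keeps several). */
static struct est_sketch est_head = { &est_head, &est_head, NULL, NULL };

/* Like ngen_new_estimator(): link a new estimator, hand back a handle. */
static int est_new(const void *bstats, const void *rate_est,
		   unsigned long *handle)
{
	struct est_sketch *e;

	assert(handle != NULL);
	e = calloc(1, sizeof(*e));
	if (!e)
		return -1;
	e->bstats = bstats;
	e->rate_est = rate_est;
	e->next = est_head.next;
	e->prev = &est_head;
	est_head.next->prev = e;
	est_head.next = e;
	/* Mirrors the patch's cast; uintptr_t would be the portable type. */
	*handle = (unsigned long)e;
	return 0;
}

/* O(1) unlink, like list_del_rcu() minus the RCU grace period. */
static void est_unlink(struct est_sketch *e)
{
	assert(e != NULL && e != &est_head);
	e->prev->next = e->next;
	e->next->prev = e->prev;
	free(e);
}

/* Old style: scan for every estimator matching (bstats, rate_est). */
static int est_kill_by_search(const void *bstats, const void *rate_est)
{
	struct est_sketch *e, *n;
	int killed = 0;

	for (e = est_head.next; e != &est_head; e = n) {
		n = e->next;
		if (e->bstats == bstats && e->rate_est == rate_est) {
			est_unlink(e);
			killed++;
		}
	}
	return killed;
}

/* New style, like ngen_kill_estimator(): no search, then zero the handle. */
static void est_kill_by_handle(unsigned long *handle)
{
	if (handle && *handle) {
		struct est_sketch *e = (struct est_sketch *)*handle;

		*handle = 0;
		est_unlink(e);
	}
}

static int est_count(void)
{
	int n = 0;

	for (const struct est_sketch *e = est_head.next; e != &est_head; e = e->next)
		n++;
	return n;
}
```

David Miller's suggested struct kernel_gnet_stats_rate_est, quoted in Jarek's reply, would carry the same pointer next to the statistics instead of passing it as an extra parameter.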

* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: Eric Dumazet @ 2008-01-22  7:13 UTC
  To: Zhang, Yanmin; +Cc: David Miller, rick.jones2, netdev
In-Reply-To: <1200984664.3151.253.camel@ymzhang>

Zhang, Yanmin a écrit :
> On Mon, 2008-01-21 at 22:22 -0800, David Miller wrote:
>> From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
>> Date: Tue, 22 Jan 2008 14:07:19 +0800
>>
>>> I am wondering if the UDP stack in the kernel has a bug.
>> If one server binds to INADDR_ANY with port N, then any other socket
>> can be bound to a specific IP address with port N.  When packets
>> come in destined for port N, the delivery will be prioritized
>> to whichever socket has the more specific and matching binding.
> What does 'more specific' mean here? I assume 127.0.0.1 should be
> prioritized over 0.0.0.0, which means packets should be queued to
> the 127.0.0.1 socket first.

vi +278 net/ipv4/udp.c

                         int score = (sk->sk_family == PF_INET ? 1 : 0);
                         if (inet->rcv_saddr) {
                                 if (inet->rcv_saddr != daddr)
                                         continue;
                                 score+=2;
                         }
                         if (inet->daddr) {
                                 if (inet->daddr != saddr)
                                         continue;
                                 score+=2;
                         }
                         if (inet->dport) {
                                 if (inet->dport != sport)
                                         continue;
                                 score+=2;
                         }
                         if (sk->sk_bound_dev_if) {
                                 if (sk->sk_bound_dev_if != dif)
                                         continue;
                                 score+=2;
                         }

So in your case, the socket bound to 127.0.0.1 should have a better score (+2)
than the other one, unless the other one gets an equal or higher score because
of another match (rcv_saddr set or bound to an interface).
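
A user-space model of the scoring loop quoted above, to make the tie-breaking concrete. The socket fields are flattened into a hypothetical struct; all sockets are assumed to sit on the destination port already (the real __udp4_lib_lookup() only walks the hash chain for that port); addresses are plain integers; and the highest score wins, with ties going to the earlier socket, as David Miller describes. Plugging in the scenario from Zhang's report (netperf's socket bound to 0.0.0.0:12384 and then connect()ed to 127.0.0.1:12384, netserver's socket bound to 127.0.0.1:12384), the model suggests the connected socket collects +2 for daddr and +2 for dport and so outscores the merely bound one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened view of the fields the scoring loop reads. */
struct udp_sock_sketch {
	int is_inet;        /* sk_family == PF_INET */
	uint32_t rcv_saddr; /* bound local address, 0 = INADDR_ANY */
	uint32_t daddr;     /* connected peer address, 0 = unconnected */
	uint16_t dport;     /* connected peer port, 0 = unconnected */
	int bound_dev_if;   /* 0 = not bound to a device */
};

/* Score one socket for a packet saddr:sport -> daddr on device dif.
 * Returns -1 when a set field does not match (the "continue" cases). */
static int udp_score(const struct udp_sock_sketch *sk, uint32_t saddr,
		     uint16_t sport, uint32_t daddr, int dif)
{
	int score;

	assert(sk != NULL);
	score = sk->is_inet ? 1 : 0;
	if (sk->rcv_saddr) {
		if (sk->rcv_saddr != daddr)
			return -1;
		score += 2;
	}
	if (sk->daddr) {
		if (sk->daddr != saddr)
			return -1;
		score += 2;
	}
	if (sk->dport) {
		if (sk->dport != sport)
			return -1;
		score += 2;
	}
	if (sk->bound_dev_if) {
		if (sk->bound_dev_if != dif)
			return -1;
		score += 2;
	}
	return score;
}

/* Index of the first socket with the strictly highest score, or -1. */
static int udp_pick(const struct udp_sock_sketch *socks, int n, uint32_t saddr,
		    uint16_t sport, uint32_t daddr, int dif)
{
	int best = -1, best_score = -1;

	for (int i = 0; i < n; i++) {
		int s = udp_score(&socks[i], saddr, sport, daddr, dif);

		if (s > best_score) {
			best_score = s;
			best = i;
		}
	}
	return best;
}
```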







* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: Zhang, Yanmin @ 2008-01-22  6:52 UTC
  To: Eric Dumazet; +Cc: Rick Jones, netdev, David Miller
In-Reply-To: <47958CC8.9060609@cosmosbay.com>


On Tue, 2008-01-22 at 07:27 +0100, Eric Dumazet wrote:
> Zhang, Yanmin a écrit :
> > On Tue, 2008-01-22 at 13:24 +0800, Zhang, Yanmin wrote:
> >> On Mon, 2008-01-14 at 09:46 -0800, Rick Jones wrote:
> >>>>> *) netperf/netserver support CPU affinity within themselves with the 
> >>>>> global -T option to netperf.  Is the result with taskset much different? 
> >>>>>   The equivalent to the above would be to run netperf with:
> >>>>>
> >>>>> ./netperf -T 0,7 ..
> >>>> I checked the source code and didn't find this option.
> >>>> I use netperf V2.3 (I found the number in the makefile).
> >>> Indeed, that version pre-dates the -T option.  If you weren't already 
> >>> chasing a regression I'd suggest an upgrade to 2.4.mumble.  Once you are 
> >>> at a point where changing another variable won't muddle things you may 
> >>> want to consider upgrading.
> >>>
> >>> happy benchmarking,
> >> Rick,
> >>
> >> I found that my UDP_RR test just loops inside netperf instead of ping-ponging between
> >> netserver and netperf. Is that correct? TCP_RR is OK.
> >>
> >> #./netserver
> >> #./netperf -t UDP_RR -l 60 -H 127.0.0.1 -i 30,3 -I 99,5 -- -P 12384 -r 1,1
> > I dug into netperf and netserver.
> > 
> > netperf binds IP 0 and port 12384 to its own socket. netserver binds IP
> > 127.0.0.1 and port 12384 to its own socket. Then netperf calls connect() to set up the
> > server address 127.0.0.1 and port 12384, and starts sending UDP packets, but all packets
> > netperf sends are received by netperf itself. netserver doesn't receive any packets.
> > 
> > I think either netperf's bind should fail, or netperf shouldn't get the packets it sends,
> > because netserver has already bound port 12384.
> > 
> > I am wondering if the UDP stack in the kernel has a bug.
> 
> If:
> - socket A is bound to 0.0.0.0:12384 and
> - socket B is bound to 127.0.0.1:12384,
> 
> then packets sent to 127.0.0.1:12384 should be queued for socket B.
> 
> If they are queued to socket A, as you believe is currently happening, then yes,
> there is a bug in the kernel.
I double-checked it and they are queued to socket A. If I define a different local port
for netperf, packets will be queued to socket B.

-yanmin




* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: Zhang, Yanmin @ 2008-01-22  6:51 UTC
  To: David Miller; +Cc: rick.jones2, netdev
In-Reply-To: <20080121.222214.184161381.davem@davemloft.net>

On Mon, 2008-01-21 at 22:22 -0800, David Miller wrote:
> From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
> Date: Tue, 22 Jan 2008 14:07:19 +0800
> 
> > I am wondering if the UDP stack in the kernel has a bug.
> 
> If one server binds to INADDR_ANY with port N, then any other socket
> can be bound to a specific IP address with port N.  When packets
> come in destined for port N, the delivery will be prioritized
> to whichever socket has the more specific and matching binding.
What does 'more specific' mean here? I assume 127.0.0.1 should be
prioritized over 0.0.0.0, which means packets should be queued to
the 127.0.0.1 socket first.

> 
> So the kernel is fine.
But the kernel currently queues the packets to the 0.0.0.0 socket.

> 
> Netperf just needs to be more careful in order to handle this kind of
> case more cleanly.
It would be better if the kernel behaved more reasonably.

-yanmin




* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: Eric Dumazet @ 2008-01-22  6:27 UTC
  To: Zhang, Yanmin; +Cc: Rick Jones, netdev, David Miller
In-Reply-To: <1200982039.3151.120.camel@ymzhang>

Zhang, Yanmin a écrit :
> On Tue, 2008-01-22 at 13:24 +0800, Zhang, Yanmin wrote:
>> On Mon, 2008-01-14 at 09:46 -0800, Rick Jones wrote:
>>>>> *) netperf/netserver support CPU affinity within themselves with the 
>>>>> global -T option to netperf.  Is the result with taskset much different? 
>>>>>   The equivalent to the above would be to run netperf with:
>>>>>
>>>>> ./netperf -T 0,7 ..
>>>> I checked the source code and didn't find this option.
>>>> I use netperf V2.3 (I found the number in the makefile).
>>> Indeed, that version pre-dates the -T option.  If you weren't already 
>>> chasing a regression I'd suggest an upgrade to 2.4.mumble.  Once you are 
>>> at a point where changing another variable won't muddle things you may 
>>> want to consider upgrading.
>>>
>>> happy benchmarking,
>> Rick,
>>
>> I found that my UDP_RR test just loops inside netperf instead of ping-ponging between
>> netserver and netperf. Is that correct? TCP_RR is OK.
>>
>> #./netserver
>> #./netperf -t UDP_RR -l 60 -H 127.0.0.1 -i 30,3 -I 99,5 -- -P 12384 -r 1,1
> I dug into netperf and netserver.
> 
> netperf binds IP 0 and port 12384 to its own socket. netserver binds IP
> 127.0.0.1 and port 12384 to its own socket. Then netperf calls connect() to set up the
> server address 127.0.0.1 and port 12384, and starts sending UDP packets, but all packets
> netperf sends are received by netperf itself. netserver doesn't receive any packets.
> 
> I think either netperf's bind should fail, or netperf shouldn't get the packets it sends,
> because netserver has already bound port 12384.
> 
> I am wondering if the UDP stack in the kernel has a bug.

If:
- socket A is bound to 0.0.0.0:12384 and
- socket B is bound to 127.0.0.1:12384,

then packets sent to 127.0.0.1:12384 should be queued for socket B.

If they are queued to socket A, as you believe is currently happening, then yes,
there is a bug in the kernel.

> 
> TCP_RR testing hasn't such issue.
> 
> -yanmin
> 



* Re: [PATCH] bluetooth : move children of connection device to NULL before connection down
From: David Miller @ 2008-01-22  6:26 UTC
  To: marcel
  Cc: hidave.darkstar, netdev, linux-kernel, bluez-devel, cornelia.huck,
	gombasg, htejun, viro, kay.sievers, greg
In-Reply-To: <1200982696.7978.148.camel@aeonflux>

From: Marcel Holtmann <marcel@holtmann.org>
Date: Tue, 22 Jan 2008 07:18:16 +0100

> Right now I can't think of any side effects of this patch; I actually
> see only an improvement with it. So please take it directly, and
> starting next week I'm going to make sure that they are handled
> properly by me again.

Excellent, I'll do that.

Thanks for the feedback.

^ permalink raw reply

* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: David Miller @ 2008-01-22  6:22 UTC (permalink / raw)
  To: yanmin_zhang; +Cc: rick.jones2, netdev
In-Reply-To: <1200982039.3151.120.camel@ymzhang>

From: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Date: Tue, 22 Jan 2008 14:07:19 +0800

> I am wondering whether the UDP stack in the kernel has a bug.

If one server binds to INADDR_ANY with port N, then any other socket
can be bound to a specific IP address with port N.  When packets
come in destined for port N, the delivery will be prioritized
to whichever socket has the more specific and matching binding.

So the kernel is fine.

Netperf just needs to be more careful in order to handle this kind of
case more cleanly.
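A simplified model of the "most specific binding wins" rule may help. It is a sketch under stated assumptions: the struct, the one-point-per-matching-field scoring and the function names are all hypothetical, and the real lookup in net/ipv4/udp.c uses its own weights and hash tables.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of a UDP socket's lookup keys (host byte
 * order).  A zero field means "wildcard" or "not connected". */
struct udp_sock_model {
	uint32_t rcv_saddr;	/* bound local address, 0 = INADDR_ANY */
	uint16_t num;		/* bound local port */
	uint32_t daddr;		/* connected peer address, 0 = unconnected */
	uint16_t dport;		/* connected peer port, 0 = unconnected */
};

/* Score socket sk against a datagram saddr:sport -> daddr:dport.
 * -1 means "cannot match"; each matching non-wildcard field adds 1. */
static int match_score(const struct udp_sock_model *sk,
		       uint32_t saddr, uint16_t sport,
		       uint32_t daddr, uint16_t dport)
{
	int score = 0;

	if (sk->num != dport)
		return -1;
	if (sk->rcv_saddr) {
		if (sk->rcv_saddr != daddr)
			return -1;
		score++;
	}
	if (sk->daddr) {
		if (sk->daddr != saddr)
			return -1;
		score++;
	}
	if (sk->dport) {
		if (sk->dport != sport)
			return -1;
		score++;
	}
	return score;
}

/* Index of the highest-scoring socket (the first one wins ties), or -1. */
static int udp_lookup_model(const struct udp_sock_model *socks, int n,
			    uint32_t saddr, uint16_t sport,
			    uint32_t daddr, uint16_t dport)
{
	int best = -1, best_score = -1;

	for (int i = 0; i < n; i++) {
		int s = match_score(&socks[i], saddr, sport, daddr, dport);

		if (s > best_score) {
			best_score = s;
			best = i;
		}
	}
	return best;
}
```

In this model, with socket A bound to 0.0.0.0:12384 and socket B bound to 127.0.0.1:12384, a datagram from 127.0.0.1:40000 to 127.0.0.1:12384 scores 0 for A and 1 for B, so B receives it. If A has also connect()ed to 127.0.0.1:12384 (the netperf setup described in this thread), its own datagram from 127.0.0.1:12384 to 127.0.0.1:12384 scores 2 for A against 1 for B, which would be consistent with netperf receiving its own packets.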

^ permalink raw reply

* Re: [PATCH] bluetooth : move children of connection device to NULL before connection down
From: Marcel Holtmann @ 2008-01-22  6:18 UTC (permalink / raw)
  To: David Miller
  Cc: hidave.darkstar, netdev, linux-kernel, bluez-devel, cornelia.huck,
	gombasg, htejun, viro, kay.sievers, greg
In-Reply-To: <20080121.031417.171098597.davem@davemloft.net>

Hi Dave,

> > Add people missed in cc-list.
> 
> Thanks Dave for your continued efforts on Bluetooth bugs like this.
> 
> Marcel, are you going to review/ACK/integrate/push-upstream/whatever
> any of these Bluetooth patches?
> 
> It hasn't been getting much love from you as of late, you are one of
> the listed maintainers, and I don't want to lose any of Dave's
> valuable bug fixing work.

I will be fully back in business next week. Just got stuck in a project
that needed 200% of my time to get it going.

> Or should I just handle it all directly?

I followed the list only a little bit, but from what I have seen, Dave
is doing a great job of tracking all issues down to the real cause.

I had a look at his last patch and after review, I agree that this is a
possible solution. I only have two nitpicks about the coding style: in
del_conn the struct device declaration should come after the struct
hci_conn assignment from the container, and I would put an extra empty
line before the device_del, put_device block. Nitpicks only.

Right now I can't think of any side effects of this patch; I actually
see only an improvement with it. So please take it directly, and
starting next week I'm going to make sure that they are handled
properly by me again.

Regards

Marcel

^ permalink raw reply

* [PATCH] pci-skeleton: Misc fixes to build neatly
From: Jike Song @ 2008-01-22  6:16 UTC (permalink / raw)
  To: jeff; +Cc: netdev, linux-kernel

Hello Jeff,

pci-skeleton.c has several compilation problems, such as a missing
argument in the call to synchronize_irq(). Fix them.
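Most hunks below replace `dev->priv` with `netdev_priv(dev)`. A rough userspace model of why that works, assuming (as the patch's own comment "dev->priv/tp zeroed and aligned in alloc_etherdev" suggests) that the private area is carved out of the same allocation, right after the device structure and rounded up to an alignment boundary. `fake_net_device`, `FAKE_NETDEV_ALIGN` and the helper functions are hypothetical; this is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net_device (not the kernel layout). */
struct fake_net_device {
	char name[16];
	unsigned long base_addr;
};

/* Hypothetical alignment for the private area, modelled on NETDEV_ALIGN. */
#define FAKE_NETDEV_ALIGN 32u

/* Offset of the private area: sizeof(device) rounded up to the alignment. */
static size_t fake_priv_offset(void)
{
	return (sizeof(struct fake_net_device) + FAKE_NETDEV_ALIGN - 1) &
	       ~(size_t)(FAKE_NETDEV_ALIGN - 1);
}

/* One zeroed allocation holding the device, padding and private area,
 * loosely following what alloc_etherdev() does. */
static struct fake_net_device *fake_alloc_netdev(size_t sizeof_priv)
{
	return calloc(1, fake_priv_offset() + sizeof_priv);
}

/* netdev_priv()-style accessor: no stored pointer, just a fixed offset. */
static void *fake_netdev_priv(struct fake_net_device *dev)
{
	return (char *)dev + fake_priv_offset();
}
```

With this layout the private pointer can be computed from `dev` alone, so a driver does not need to rely on a separately stored pointer field.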

Signed-off-by: Jike Song <albcamus@gmail.com>
---
 drivers/net/pci-skeleton.c |   49 ++++++++++++++++++++++---------------------
 1 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/net/pci-skeleton.c b/drivers/net/pci-skeleton.c
index ed402e0..fffc49b 100644
--- a/drivers/net/pci-skeleton.c
+++ b/drivers/net/pci-skeleton.c
@@ -541,7 +541,7 @@ static void netdrv_hw_start (struct net_device *dev);
 #define NETDRV_W32_F(reg, val32)	do { writel ((val32), ioaddr + (reg)); readl (ioaddr + (reg)); } while (0)


-#if MMIO_FLUSH_AUDIT_COMPLETE
+#ifdef MMIO_FLUSH_AUDIT_COMPLETE

 /* write MMIO register */
 #define NETDRV_W8(reg, val8)	writeb ((val8), ioaddr + (reg))
@@ -603,7 +603,7 @@ static int __devinit netdrv_init_board (struct pci_dev *pdev,
 		return -ENOMEM;
 	}
 	SET_NETDEV_DEV(dev, &pdev->dev);
-	tp = dev->priv;
+	tp = netdev_priv(dev);

 	/* enable device (incl. PCI PM wakeup), and bus-mastering */
 	rc = pci_enable_device (pdev);
@@ -759,7 +759,7 @@ static int __devinit netdrv_init_one (struct pci_dev *pdev,
 		return i;
 	}

-	tp = dev->priv;
+	tp = netdev_priv(dev);

 	assert (ioaddr != NULL);
 	assert (dev != NULL);
@@ -783,7 +783,7 @@ static int __devinit netdrv_init_one (struct pci_dev *pdev,
 	dev->base_addr = (unsigned long) ioaddr;

 	/* dev->priv/tp zeroed and aligned in alloc_etherdev */
-	tp = dev->priv;
+	tp = netdev_priv(dev);

 	/* note: tp->chipset set in netdrv_init_board */
 	tp->drv_flags = PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
@@ -841,7 +841,7 @@ static void __devexit netdrv_remove_one (struct pci_dev *pdev)

 	assert (dev != NULL);

-	np = dev->priv;
+	np = netdev_priv(dev);
 	assert (np != NULL);

 	unregister_netdev (dev);
@@ -974,7 +974,7 @@ static void mdio_sync (void *mdio_addr)

 static int mdio_read (struct net_device *dev, int phy_id, int location)
 {
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	void *mdio_addr = tp->mmio_addr + Config4;
 	int mii_cmd = (0xf6 << 10) | (phy_id << 5) | location;
 	int retval = 0;
@@ -1017,7 +1017,7 @@ static int mdio_read (struct net_device *dev, int phy_id, int location)
 static void mdio_write (struct net_device *dev, int phy_id, int location,
 			int value)
 {
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	void *mdio_addr = tp->mmio_addr + Config4;
 	int mii_cmd =
 	    (0x5002 << 16) | (phy_id << 23) | (location << 18) | value;
@@ -1060,7 +1060,7 @@ static void mdio_write (struct net_device *dev, int phy_id, int location,

 static int netdrv_open (struct net_device *dev)
 {
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	int retval;
 #ifdef NETDRV_DEBUG
 	void *ioaddr = tp->mmio_addr;
@@ -1121,7 +1121,7 @@ static int netdrv_open (struct net_device *dev)
 /* Start the hardware at open or resume. */
 static void netdrv_hw_start (struct net_device *dev)
 {
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	void *ioaddr = tp->mmio_addr;
 	u32 i;

@@ -1191,7 +1191,7 @@ static void netdrv_hw_start (struct net_device *dev)
 /* Initialize the Rx and Tx rings, along with various 'dev' bits. */
 static void netdrv_init_ring (struct net_device *dev)
 {
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	int i;

 	DPRINTK ("ENTER\n");
@@ -1213,7 +1213,7 @@ static void netdrv_init_ring (struct net_device *dev)
 static void netdrv_timer (unsigned long data)
 {
 	struct net_device *dev = (struct net_device *) data;
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	void *ioaddr = tp->mmio_addr;
 	int next_tick = 60 * HZ;
 	int mii_lpa;
@@ -1252,9 +1252,10 @@ static void netdrv_timer (unsigned long data)
 }


-static void netdrv_tx_clear (struct netdrv_private *tp)
+static void netdrv_tx_clear (struct net_device *dev)
 {
 	int i;
+	struct netdrv_private *tp = netdev_priv(dev);

 	atomic_set (&tp->cur_tx, 0);
 	atomic_set (&tp->dirty_tx, 0);
@@ -1278,7 +1279,7 @@ static void netdrv_tx_clear (struct netdrv_private *tp)

 static void netdrv_tx_timeout (struct net_device *dev)
 {
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	void *ioaddr = tp->mmio_addr;
 	int i;
 	u8 tmp8;
@@ -1311,7 +1312,7 @@ static void netdrv_tx_timeout (struct net_device *dev)
 	/* Stop a shared interrupt from scavenging while we are. */
 	spin_lock_irqsave (&tp->lock, flags);

-	netdrv_tx_clear (tp);
+	netdrv_tx_clear (dev);

 	spin_unlock_irqrestore (&tp->lock, flags);

@@ -1325,7 +1326,7 @@ static void netdrv_tx_timeout (struct net_device *dev)

 static int netdrv_start_xmit (struct sk_buff *skb, struct net_device *dev)
 {
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	void *ioaddr = tp->mmio_addr;
 	int entry;

@@ -1525,7 +1526,7 @@ static void netdrv_rx_interrupt (struct net_device *dev,
 		DPRINTK ("%s:  netdrv_rx() status %4.4x, size %4.4x,"
 			 " cur %4.4x.\n", dev->name, rx_status,
 			 rx_size, cur_rx);
-#if NETDRV_DEBUG > 2
+#if defined(NETDRV_DEBUG) && (NETDRV_DEBUG > 2)
 		{
 			int i;
 			DPRINTK ("%s: Frame contents ", dev->name);
@@ -1648,7 +1649,7 @@ static void netdrv_weird_interrupt (struct net_device *dev,
 static irqreturn_t netdrv_interrupt (int irq, void *dev_instance)
 {
 	struct net_device *dev = (struct net_device *) dev_instance;
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	int boguscnt = max_interrupt_work;
 	void *ioaddr = tp->mmio_addr;
 	int status = 0, link_changed = 0; /* avoid bogus "uninit" warning */
@@ -1711,7 +1712,7 @@ static irqreturn_t netdrv_interrupt (int irq, void *dev_instance)

 static int netdrv_close (struct net_device *dev)
 {
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	void *ioaddr = tp->mmio_addr;
 	unsigned long flags;

@@ -1738,10 +1739,10 @@ static int netdrv_close (struct net_device *dev)

 	spin_unlock_irqrestore (&tp->lock, flags);

-	synchronize_irq ();
+	synchronize_irq (dev->irq);
 	free_irq (dev->irq, dev);

-	netdrv_tx_clear (tp);
+	netdrv_tx_clear (dev);

 	pci_free_consistent(tp->pci_dev, RX_BUF_TOT_LEN,
 			    tp->rx_ring, tp->rx_ring_dma);
@@ -1762,7 +1763,7 @@ static int netdrv_close (struct net_device *dev)

 static int netdrv_ioctl (struct net_device *dev, struct ifreq *rq, int cmd)
 {
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	struct mii_ioctl_data *data = if_mii(rq);
 	unsigned long flags;
 	int rc = 0;
@@ -1805,7 +1806,7 @@ static int netdrv_ioctl (struct net_device *dev, struct ifreq *rq, int cmd)

 static void netdrv_set_rx_mode (struct net_device *dev)
 {
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	void *ioaddr = tp->mmio_addr;
 	u32 mc_filter[2];	/* Multicast hash filter */
 	int i, rx_mode;
@@ -1862,7 +1863,7 @@ static void netdrv_set_rx_mode (struct net_device *dev)
 static int netdrv_suspend (struct pci_dev *pdev, pm_message_t state)
 {
 	struct net_device *dev = pci_get_drvdata (pdev);
-	struct netdrv_private *tp = dev->priv;
+	struct netdrv_private *tp = netdev_priv(dev);
 	void *ioaddr = tp->mmio_addr;
 	unsigned long flags;

@@ -1892,7 +1893,7 @@ static int netdrv_suspend (struct pci_dev *pdev, pm_message_t state)
 static int netdrv_resume (struct pci_dev *pdev)
 {
 	struct net_device *dev = pci_get_drvdata (pdev);
-	struct netdrv_private *tp = dev->priv;
+	/*struct netdrv_private *tp = netdev_priv(dev);*/

 	if (!netif_running(dev))
 		return 0;
-- 
1.5.3.4
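The two preprocessor changes in this patch differ subtly. Under `#if`, an identifier that is not a macro is replaced by 0 (GCC can warn about this with `-Wundef`), whereas `#ifdef` only asks whether the macro is defined, so a macro defined as 0 still selects the `#ifdef` branch. A small sketch with hypothetical macro names:

```c
#include <assert.h>

/* Hypothetical macros, for illustration only. */
#define AUDIT_FLAG_ZERO 0	/* defined, but with the value 0 */
/* UNDEFINED_AUDIT and UNDEFINED_DEBUG are deliberately never defined. */

/* #if on an identifier that is not a macro: it is treated as 0. */
static int if_on_undefined_macro(void)
{
#if UNDEFINED_AUDIT
	return 1;
#else
	return 0;
#endif
}

/* #if evaluates the value: a macro defined as 0 selects #else. */
static int if_on_zero_macro(void)
{
#if AUDIT_FLAG_ZERO
	return 1;
#else
	return 0;
#endif
}

/* #ifdef only tests definedness: a macro defined as 0 still counts. */
static int ifdef_on_zero_macro(void)
{
#ifdef AUDIT_FLAG_ZERO
	return 1;
#else
	return 0;
#endif
}

/* The defined() guard, as used for NETDRV_DEBUG: false when absent. */
static int guarded_debug_level(void)
{
#if defined(UNDEFINED_DEBUG) && (UNDEFINED_DEBUG > 2)
	return 1;
#else
	return 0;
#endif
}
```

So `#if MMIO_FLUSH_AUDIT_COMPLETE` and `#ifdef MMIO_FLUSH_AUDIT_COMPLETE` only behave differently if that macro is ever defined as 0.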

^ permalink raw reply related

* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: Zhang, Yanmin @ 2008-01-22  6:07 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, David Miller
In-Reply-To: <1200979482.3151.103.camel@ymzhang>

On Tue, 2008-01-22 at 13:24 +0800, Zhang, Yanmin wrote:
> On Mon, 2008-01-14 at 09:46 -0800, Rick Jones wrote:
> > >>*) netperf/netserver support CPU affinity within themselves with the 
> > >>global -T option to netperf.  Is the result with taskset much different? 
> > >>   The equivalent to the above would be to run netperf with:
> > >>
> > >>./netperf -T 0,7 ..
> > > 
> > > I checked the source codes and didn't find this option.
> > > I use netperf V2.3 (I found the number in the makefile).
> > 
> > Indeed, that version pre-dates the -T option.  If you weren't already 
> > chasing a regression I'd suggest an upgrade to 2.4.mumble.  Once you are 
> > at a point where changing another variable won't muddle things you may 
> > want to consider upgrading.
> > 
> > happy benchmarking,
> Rick,
> 
> I found that my UDP_RR test just loops within netperf instead of ping-ponging between
> netserver and netperf. Is that correct? TCP_RR is OK.
> 
> #./netserver
> #./netperf -t UDP_RR -l 60 -H 127.0.0.1 -i 30,3 -I 99,5 -- -P 12384 -r 1,1
I dug into netperf and netserver.

netperf binds IP 0 and port 12384 to its own socket. netserver binds IP
127.0.0.1 and port 12384 to its own socket. Then netperf calls connect() to set up the server
127.0.0.1 and port 12384. netperf then starts sending UDP packets, but all packets netperf
sends are received by netperf itself. netserver doesn't receive any packets.

I think netperf's bind should fail, or netperf shouldn't receive the packets it sends, because
netserver has already bound port 12384.

I am wondering whether the UDP stack in the kernel has a bug.

TCP_RR testing doesn't have this issue.

-yanmin



^ permalink raw reply

* Re: questions on NAPI processing latency and dropped network packets
From: Eric Dumazet @ 2008-01-22  5:46 UTC (permalink / raw)
  To: Chris Friesen; +Cc: netdev, linux-kernel
In-Reply-To: <479529DF.5030707@nortel.com>

Chris Friesen wrote:
> Eric Dumazet wrote:
>> Chris Friesen wrote:
>>
>>> I've done some further digging, and it appears that one of the 
>>> problems we may be facing is very high instantaneous traffic rates.
>>>
>>> Instrumentation showed up to 222K packets/sec for short periods (at 
>>> least 1.1 ms, possibly longer), although the long-term average is 
>>> down around 14-16K packets/sec.
>>
>>
>> Instrumentation done where exactly ?
> 
> I added some code to e1000_clean_rx_irq() to track rx_fifo drops, total 
> packets received, and an accurate timestamp.
> 
> If rx_fifo errors changed, it would dump the information.
> 
>>> Is there anything else we can do to minimize the latency of network 
>>> packet processing and avoid having to crank the rx ring size up so high?
> 
>> You have some tasks that disable softirqs too long. Sometimes, bumping 
>> RX ring size is OK (but you will still have delays), sometimes it is 
>> not an option, since 4096 is the limit on current hardware.
> 
> I added some instrumentation to take timestamps in __do_softirq() as 
> well.  Based on these timestamps, I can see the following code sequence:
> 
> 2374604616 usec, start processing softirqs in __do_softirq()
> 2374610337 usec, log values in e1000_clean_rx_irq()
> 2374611411 usec, log values in e1000_clean_rx_irq()
> 
> In between the successive calls to e1000_clean_rx_irq() the rx_fifo 
> counts went up.
> 
> Does anyone have any patchsets to track down what softirqs are taking a 
> long time, and/or who's disabling softirqs?
> 

Not for linux-2.6.10, unfortunately.

Check net/ipv4/route.c, where many improvements can be made, especially if you
have a large route cache:

grep . /proc/sys/net/ipv4/route/*
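A back-of-the-envelope sketch relating the figures quoted above: how many packets arrive during a softirq stall at the peak rate, and how long a stall an RX ring of a given size can absorb. It assumes a constant arrival rate and ignores any packets drained during the interval; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Packets arriving at a constant rate of pps packets/second during a
 * stall of stall_us microseconds (rounded down). */
static uint64_t packets_during_stall(uint64_t pps, uint64_t stall_us)
{
	return pps * stall_us / 1000000u;
}

/* Longest stall in microseconds (rounded down) that ring_size free RX
 * descriptors can absorb at pps packets/second before overflowing. */
static uint64_t max_stall_us(uint64_t ring_size, uint64_t pps)
{
	return ring_size * 1000000u / pps;
}
```

Treating the 5721 usec between the start of __do_softirq() (2374604616) and the first e1000_clean_rx_irq() log (2374610337) as a stall, `packets_during_stall(222000, 5721)` gives 1270 packets, while `max_stall_us(4096, 222000)` gives 18450 usec for a 4096-entry ring at that peak rate.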

^ permalink raw reply

* Re: Netperf TCP_RR(loopback) 10% regression in 2.6.24-rc6, comparing with 2.6.22
From: Zhang, Yanmin @ 2008-01-22  5:24 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev
In-Reply-To: <478B9FE0.3040801@hp.com>

On Mon, 2008-01-14 at 09:46 -0800, Rick Jones wrote:
> >>*) netperf/netserver support CPU affinity within themselves with the 
> >>global -T option to netperf.  Is the result with taskset much different? 
> >>   The equivalent to the above would be to run netperf with:
> >>
> >>./netperf -T 0,7 ..
> > 
> > I checked the source codes and didn't find this option.
> > I use netperf V2.3 (I found the number in the makefile).
> 
> Indeed, that version pre-dates the -T option.  If you weren't already 
> chasing a regression I'd suggest an upgrade to 2.4.mumble.  Once you are 
> at a point where changing another variable won't muddle things you may 
> want to consider upgrading.
> 
> happy benchmarking,
Rick,

I found that my UDP_RR test just loops within netperf instead of ping-ponging between
netserver and netperf. Is that correct? TCP_RR is OK.

#./netserver
#./netperf -t UDP_RR -l 60 -H 127.0.0.1 -i 30,3 -I 99,5 -- -P 12384 -r 1,1

Thanks,
-yanmin



^ permalink raw reply

* Re: 2.6.24-rc8-mm1 : net tcp_input.c warnings
From: Dave Young @ 2008-01-22  4:37 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: LKML, David Miller, Netdev, Andrew Morton
In-Reply-To: <Pine.LNX.4.64.0801212302170.29700@kivilampi-30.cs.helsinki.fi>

On Jan 22, 2008 5:14 AM, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
>
> On Mon, 21 Jan 2008, Dave Young wrote:
>
> > Please see the following kernel messages (triggered while using a qemu session).
> > BTW, seems there's some e100 error message as well.
> >
> > PCI: Setting latency timer of device 0000:00:1b.0 to 64
> > e100: Intel(R) PRO/100 Network Driver, 3.5.23-k4-NAPI
> > e100: Copyright(c) 1999-2006 Intel Corporation
> > ACPI: PCI Interrupt 0000:03:08.0[A] -> GSI 20 (level, low) -> IRQ 20
> > modprobe:2331 conflicting cache attribute efaff000-efb00000 uncached<->default
> > e100: 0000:03:08.0: e100_probe: Cannot map device registers, aborting.
> > ACPI: PCI interrupt for device 0000:03:08.0 disabled
> > e100: probe of 0000:03:08.0 failed with error -12
> > eth0:  setting full-duplex.
> > ------------[ cut here ]------------
> > WARNING: at net/ipv4/tcp_input.c:2169 tcp_mark_head_lost+0x121/0x150()
> > Modules linked in: snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss eeprom e100 psmouse snd_hda_intel snd_pcm snd_timer btusb rtc_cmos thermal bluetooth rtc_core serio_raw intel_agp button processor sg snd rtc_lib i2c_i801 evdev agpgart soundcore dcdbas 3c59x pcspkr snd_page_alloc
> > Pid: 0, comm: swapper Not tainted 2.6.24-rc8-mm1 #4
> >  [<c0132100>] ? printk+0x0/0x20
> >  [<c0131834>] warn_on_slowpath+0x54/0x80
> >  [<c03e8df8>] ? ip_finish_output+0x128/0x2e0
> >  [<c03e9527>] ? ip_output+0xe7/0x100
> >  [<c03e8a88>] ? ip_local_out+0x18/0x20
> >  [<c03e991c>] ? ip_queue_xmit+0x3dc/0x470
> >  [<c043641e>] ? _spin_unlock_irqrestore+0x5e/0x70
> >  [<c0186be1>] ? check_pad_bytes+0x61/0x80
> >  [<c03f6031>] tcp_mark_head_lost+0x121/0x150
> >  [<c03f60ac>] tcp_update_scoreboard+0x4c/0x170
> >  [<c03f6e0a>] tcp_fastretrans_alert+0x48a/0x6b0
> >  [<c03f7d93>] tcp_ack+0x1b3/0x3a0
> >  [<c03fa14b>] tcp_rcv_established+0x3eb/0x710
> >  [<c04015c5>] tcp_v4_do_rcv+0xe5/0x100
> >  [<c0401bbb>] tcp_v4_rcv+0x5db/0x660
>
> Doh, once more these S+L things..., the rest are symptom of the first
> problem.

What is the S+L thing? Could you explain a bit?

>
> What is strange is that it doesn't show up until now, the last TCP
> changes that could have some significance are from early Dec/Nov. Is
> there some reason why you haven't seen this before this (e.g., not
> tested with similar cfg or so)?

Hmm, I don't know how to answer that...

> I'm a bit worried about its
> reproducibility if it takes this long to see it...
>
>
> --
>  i.
>

^ permalink raw reply

* Re: sky2: call sky2_set_multicast from sky2_restart?
From: Stephen Hemminger @ 2008-01-22  4:02 UTC (permalink / raw)
  To: Tom Burns; +Cc: netdev
In-Reply-To: <671c20540801181130w4ec704acu42a89a78e8b84cb3@mail.gmail.com>

On Fri, 18 Jan 2008 14:30:52 -0500
"Tom Burns" <tom.i.burns@gmail.com> wrote:

> Hi List,
> 
> Throughout the sky2.c file any call to sky2_up(dev) is followed (soon
> enough) by a call to sky2_set_multicast(dev).  Should this not also be
> the case when it is called in sky2_restart()?
> 
> Cheers,
> Tom Burns

It probably should be done, but not worth forcing into 2.6.24 at this late date.

^ permalink raw reply

* Re: [PATCH 2/4] dsmark: get rid of trivial function
From: David Miller @ 2008-01-22  3:59 UTC (permalink / raw)
  To: stephen.hemminger; +Cc: netdev
In-Reply-To: <20080121194751.31fd42a0@speedy>

From: Stephen Hemminger <stephen.hemminger@vyatta.com>
Date: Mon, 21 Jan 2008 19:47:51 -0800

> >  
> >  	indices = RTA_GET_U16(tb[TCA_DSMARK_INDICES-1]);
> > -	if (!indices || !dsmark_valid_indices(indices))
> > +
> > +	if (hweight32(indices) != 1)
> >  		goto errout;
> 
> Come on Dave, that is a step backwards.

Absolutely not.

> So you took a two-instruction thing that any programmer who ever had
> one of those technical trick interviews would surely understand, and
> made it call a function...  Seems like the thing you would counsel
> others against.

It's counting bits, "hamming weight" is a count of bits.

That is more understandable to me than:

	Oh BTW, power of two values also just so happen to
	have only 1 bit set.

Testing for a power of two obfuscates the meaning of the
test.  It doesn't want a power-of-two, it wants a bitmask
with only one bit set.
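To make the "count the bits" reading concrete, here is a portable userspace sketch of a single-bit test built on a population count. `hweight32()` is kernel-only, so `popcount32()` below is a hypothetical stand-in using Kernighan's bit-clearing loop. Note that this check also rejects 0 on its own, whereas the removed `dsmark_valid_indices()` loop needed the caller's `!indices` guard because it never terminates for 0 (0 >> 1 is still 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's hweight32(): count set bits
 * by repeatedly clearing the lowest one (Kernighan's method). */
static unsigned int popcount32(uint32_t w)
{
	unsigned int count = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		count++;
	}
	return count;
}

/* dsmark wants a bitmask with exactly one bit set; 0 has no bits set,
 * so it is rejected without a separate check. */
static bool indices_ok(uint16_t indices)
{
	return popcount32(indices) == 1;
}
```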

^ permalink raw reply

* Re: [BNX2]: Fix driver software flag namespace.
From: David Miller @ 2008-01-22  3:51 UTC (permalink / raw)
  To: mchan; +Cc: netdev
In-Reply-To: <1200970820.10010.58.camel@dell>

From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 21 Jan 2008 19:00:20 -0800

> [BNX2]: Fix driver phy_flags name space.
> 
> Prefix "bp->phy_flags" names with BNX2_PHY_FLAG_* for consistency.
> 
> Signed-off-by: Michael Chan <mchan@broadcom.com>

Applied, thanks Michael.

^ permalink raw reply

* Re: [PATCH 2/4] dsmark: get rid of trivial function
From: Stephen Hemminger @ 2008-01-22  3:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20080121.022223.85874858.davem@davemloft.net>

On Mon, 21 Jan 2008 02:22:23 -0800 (PST)
David Miller <davem@davemloft.net> wrote:

> From: Patrick McHardy <kaber@trash.net>
> Date: Mon, 21 Jan 2008 01:16:32 +0100
> 
> > Stephen Hemminger wrote:
> > > Replace loop in dsmark_valid_indices with equivalent bit math.
> > > 
> > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > > 
> > > --- a/net/sched/sch_dsmark.c	2008-01-20 13:07:58.000000000 -0800
> > > +++ b/net/sched/sch_dsmark.c	2008-01-20 13:22:54.000000000 -0800
> > > @@ -45,13 +45,8 @@ struct dsmark_qdisc_data {
> > >  
> > >  static inline int dsmark_valid_indices(u16 indices)
> > >  {
> > > -	while (indices != 1) {
> > > -		if (indices & 1)
> > > -			return 0;
> > > -		indices >>= 1;
> > > -	}
> > > -
> > > -	return 1;
> > > +	/* Must have only one bit set */
> > > +	return (indices & (indices - 1)) == 0;
> > 
> > hweight seems easier to understand, it took me a bit
> > to realize that the comment matches the code :)
> 
> Sounds good.  Here is what I ended up checking in.
> 
> [PKT_SCHED] dsmark: Use hweight32() instead of convoluted loop.
> 
> Based upon a patch by Stephen Hemminger and suggestions
> from Patrick McHardy.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
> index a9732ae..d96eaf0 100644
> --- a/net/sched/sch_dsmark.c
> +++ b/net/sched/sch_dsmark.c
> @@ -10,6 +10,7 @@
>  #include <linux/errno.h>
>  #include <linux/skbuff.h>
>  #include <linux/rtnetlink.h>
> +#include <linux/bitops.h>
>  #include <net/pkt_sched.h>
>  #include <net/dsfield.h>
>  #include <net/inet_ecn.h>
> @@ -43,17 +44,6 @@ struct dsmark_qdisc_data {
>  	int			set_tc_index;
>  };
>  
> -static inline int dsmark_valid_indices(u16 indices)
> -{
> -	while (indices != 1) {
> -		if (indices & 1)
> -			return 0;
> -		indices >>= 1;
> -	}
> -
> -	return 1;
> -}
> -
>  static inline int dsmark_valid_index(struct dsmark_qdisc_data *p, u16 index)
>  {
>  	return (index <= p->indices && index > 0);
> @@ -348,7 +338,8 @@ static int dsmark_init(struct Qdisc *sch, struct rtattr *opt)
>  		goto errout;
>  
>  	indices = RTA_GET_U16(tb[TCA_DSMARK_INDICES-1]);
> -	if (!indices || !dsmark_valid_indices(indices))
> +
> +	if (hweight32(indices) != 1)
>  		goto errout;

Come on Dave, that is a step backwards.
So you took a two-instruction thing that any programmer who ever had one of those
technical trick interviews would surely understand, and made it call a function...
Seems like the thing you would counsel others against.

Please use !is_power_of_2(indices) instead.
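A small sketch comparing the three forms discussed in this thread over every 16-bit value: the removed loop (with the caller's zero guard), an `is_power_of_2()`-style test, and a bit count. `old_valid()`, `pow2_valid()`, `bitcount_valid()` and `count_disagreements()` are hypothetical names, and `__builtin_popcount` (a GCC/Clang builtin) stands in for `hweight32()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The removed dsmark_valid_indices() loop plus the caller's !indices
 * guard; without the guard, 0 would loop forever. */
static bool old_valid(uint16_t indices)
{
	if (!indices)
		return false;
	while (indices != 1) {
		if (indices & 1)
			return false;
		indices >>= 1;
	}
	return true;
}

/* is_power_of_2()-style test: nonzero, and clearing the lowest set
 * bit leaves nothing. */
static bool pow2_valid(uint16_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* hweight32()-style test via the compiler's popcount builtin. */
static bool bitcount_valid(uint16_t n)
{
	return __builtin_popcount(n) == 1;
}

/* Number of 16-bit values on which the three checks disagree. */
static unsigned int count_disagreements(void)
{
	unsigned int bad = 0;

	for (uint32_t v = 0; v <= 0xffff; v++) {
		bool a = old_valid((uint16_t)v);
		bool b = pow2_valid((uint16_t)v);
		bool c = bitcount_valid((uint16_t)v);

		if (a != b || b != c)
			bad++;
	}
	return bad;
}
```

All three accept exactly the same values, so the choice between them is purely about readability.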

^ permalink raw reply

* Re: [PATCH 3/4] bonding: Fix work rearming
From: Makito SHIOKAWA @ 2008-01-22  3:35 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev, Makito SHIOKAWA
In-Reply-To: <20080121133338.GH1789@ff.dom.local>

> No: you mentioned about treating new_value == 0 like new_value < 0
> with 'if (new_value <= 0)', and I didn't understand this idea...
I'm sorry for the misunderstanding. What I meant is that, instead of
PATCH 3/4, one could simply change 'if (new_value < 0)' to
'if (new_value <= 0)' and reject miimon == 0. (Of course, in that case
you would not be able to stop the MII monitor on its own.)

> Alas I don't understand the reason of this change in bond_main()...
> Some comment?
It is no longer necessary; thanks.


Signed-off-by: Makito SHIOKAWA <mshiokawa@miraclelinux.com>
---
  drivers/net/bonding/bond_main.c  |    7 ++-----
  drivers/net/bonding/bond_sysfs.c |   19 +++++++++++++++++--
  2 files changed, 19 insertions(+), 7 deletions(-)

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2801,8 +2801,7 @@ void bond_loadbalance_arp_mon(struct wor
  	}

  re_arm:
-	if (bond->params.arp_interval)
-		queue_delayed_work(bond->wq, &bond->lb_arp_work, delta_in_ticks);
+	queue_delayed_work(bond->wq, &bond->lb_arp_work, delta_in_ticks);
  out:
  	read_unlock(&bond->lock);
  }
@@ -3058,9 +3057,7 @@ void bond_activebackup_arp_mon(struct wo
  	}

  re_arm:
-	if (bond->params.arp_interval) {
-		queue_delayed_work(bond->wq, &bond->ab_arp_work, delta_in_ticks);
-	}
+	queue_delayed_work(bond->wq, &bond->ab_arp_work, delta_in_ticks);
  out:
  	read_unlock(&bond->lock);
  }
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -644,6 +644,15 @@ static ssize_t bonding_store_arp_interva
  	       ": %s: Setting ARP monitoring interval to %d.\n",
  	       bond->dev->name, new_value);
  	bond->params.arp_interval = new_value;
+	if (bond->params.arp_interval == 0 && (bond->dev->flags & IFF_UP)) {
+		printk(KERN_INFO DRV_NAME
+		       ": %s: Disabling ARP monitoring.\n",
+		       bond->dev->name);
+		if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
+			cancel_delayed_work_sync(&bond->ab_arp_work);
+		else
+			cancel_delayed_work_sync(&bond->lb_arp_work);
+	}
  	if (bond->params.miimon) {
  		printk(KERN_INFO DRV_NAME
  		       ": %s: ARP monitoring cannot be used with MII monitoring. "
@@ -658,7 +667,7 @@ static ssize_t bonding_store_arp_interva
  		       "but no ARP targets have been specified.\n",
  		       bond->dev->name);
  	}
-	if (bond->dev->flags & IFF_UP) {
+	if (bond->params.arp_interval && (bond->dev->flags & IFF_UP)) {
  		/* If the interface is up, we may need to fire off
  		 * the ARP timer.  If the interface is down, the
  		 * timer will get fired off when the open function
@@ -997,6 +1006,12 @@ static ssize_t bonding_store_miimon(stru
  		       ": %s: Setting MII monitoring interval to %d.\n",
  		       bond->dev->name, new_value);
  		bond->params.miimon = new_value;
+		if (bond->params.miimon == 0 && (bond->dev->flags & IFF_UP)) {
+			printk(KERN_INFO DRV_NAME
+			       ": %s: Disabling MII monitoring...\n",
+			       bond->dev->name);
+			cancel_delayed_work_sync(&bond->mii_work);
+		}
  		if(bond->params.updelay)
  			printk(KERN_INFO DRV_NAME
  			      ": %s: Note: Updating updelay (to %d) "
@@ -1026,7 +1041,7 @@ static ssize_t bonding_store_miimon(stru
  				cancel_delayed_work_sync(&bond->lb_arp_work);
  		}

-		if (bond->dev->flags & IFF_UP) {
+		if (bond->params.miimon && (bond->dev->flags & IFF_UP)) {
  			/* If the interface is up, we may need to fire off
  			 * the MII timer. If the interface is down, the
  			 * timer will get fired off when the open function


-- 
Makito SHIOKAWA
MIRACLE LINUX CORPORATION
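The intended cancel/rearm behaviour of this patch can be summarised with a small userspace model: writing 0 cancels a running monitor when the interface is up, and only a non-zero interval (re)arms it. The struct, fields and function are hypothetical stand-ins for `bond->params`, `IFF_UP` and the `queue_delayed_work()`/`cancel_delayed_work_sync()` calls, not the bonding driver's code. The `reject_zero` flag models the alternative mentioned at the top of this message ('new_value <= 0').

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one bonding link monitor (ARP or MII). */
struct monitor_model {
	int interval;	/* stands in for arp_interval / miimon */
	bool dev_up;	/* stands in for (bond->dev->flags & IFF_UP) */
	bool armed;	/* true while the delayed work is queued */
};

/* Model of the sysfs store path with this patch applied.  With
 * reject_zero set, 0 is refused and the monitor cannot be stopped. */
static int store_interval(struct monitor_model *m, int new_value,
			  bool reject_zero)
{
	if (new_value < 0 || (reject_zero && new_value == 0))
		return -EINVAL;

	m->interval = new_value;
	if (m->interval == 0 && m->dev_up)
		m->armed = false;	/* like cancel_delayed_work_sync() */
	if (m->interval && m->dev_up)
		m->armed = true;	/* like queue_delayed_work() */
	return 0;
}
```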

^ permalink raw reply
