Netdev List
 help / color / mirror / Atom feed
* TCP cache performance
From: Tom Quetchenbach @ 2008-01-08  1:22 UTC (permalink / raw)
  To: netdev; +Cc: Lachlan Andrew

I've been continuing off and on to investigate TCP performance issues.
As has been noted before on this list, loss and subsequent processing
can lead to spikes in the measured RTT which confuse delay-based
congestion control algorithms.

I've done some experiments that indicate that cache size is a
significant limiting factor here. My desktop machine with a 2.4 GHz Core
Duo and 4 MB cache quite noticeably outperforms our experiment servers,
which have two dual-core Xeons at 2.66 GHz but only 512 KB cache. At 400
Mbps with 40ms round-trip delay and 1024-packet buffer the desktop
behaves fairly normally, although there is still a large RTT spike at
the start of the flow due to slow-start. The servers show large RTT
spikes at each loss event, as well as some timeouts.

This suggests that efforts to improve TCP performance should focus on
cache usage rather than just processing time.

Plots of cwnd, RTT, and CPU load are available here:

512K cache:
http://wil-ns.cs.caltech.edu/benchmark.tmp/265/2flow--ALG=illinois-BUF=1024-BUF_tgt=1333,1.0-BW=400M-GAP=150-LEN=600-RTT=40--1/

4M cache:
http://wil-ns.cs.caltech.edu/benchmark.tmp/266/2flow--ALG=illinois-BUF=1024-BUF_tgt=1333,1.0-BW=400M-GAP=150-LEN=600-RTT=40--1/

Tests were done with net-2.6 (2.6.23.1 gives similar results though)
using tcp_probe to capture data.

-Tom


^ permalink raw reply

* Re: Top 10 kernel oopses for the week ending January 5th, 2008
From: Kevin Winchester @ 2008-01-08  1:19 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, Arjan van de Ven, Linux Kernel Mailing List,
	Linus Torvalds, Andrew Morton, NetDev
In-Reply-To: <20080107174431.GC27741@fieldses.org>

J. Bruce Fields wrote:
> On Sat, Jan 05, 2008 at 09:39:35PM +0000, Al Viro wrote:
>> On Sat, Jan 05, 2008 at 01:06:17PM -0800, Arjan van de Ven wrote:
>>> The http://www.kerneloops.org website collects kernel oops and 
>>> warning reports from various mailing lists and bugzillas as well 
>>> as with a client users can install to auto-submit oopses. Below 
>>> is a top 10 list of the oopses collected in the last 7 days. 
>>> (Reports prior to 2.6.23 have been omitted in collecting the top 
>>> 10)
>>> 
>>> This week, a total of 49 oopses and warnings have been reported,
>>>  compared to 53 reports in the previous week.
>> FWIW, people moaning about the lack of entry-level kernel work 
>> would do well by decoding those to the level of "this place in this
>>  function, called from <here>, with so-and-so variable being
>> <this>" and posting the results.  As skills go, it's far more
>> useful than "how to trim the trailing whitespace" and the rest of 
>> checkpatch.pl-inspired crap that got so popular lately...
> 
> Is there any good basic documentation on this to point people at?
> 

I would second this question.  I see people "decode" oops on lkml often enough, but I've never been entirely sure how its done.  Is it somewhere in Documentation?

-- 
Kevin Winchester

^ permalink raw reply

* Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()
From: David Stevens @ 2008-01-08  1:18 UTC (permalink / raw)
  To: Brian Haley
  Cc: David Miller, netdev@vger.kernel.org, netdev-owner,
	YOSHIFUJI Hideaki
In-Reply-To: <47825B50.2060200@hp.com>

Brian,
        Looks good to me.

                        +-DLS


Acked-by: David L Stevens <dlstevens@us.ibm.com>

> How about the simple patch below?  I just removed the ENINVAL check from 

> my original patch, but it accomplishes the same thing.
...
> 
> Signed-off-by: Brian Haley <brian.haley@hp.com>
> ---
> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
> index 2ed689a..5d4245a 100644
> --- a/net/ipv6/datagram.c
> +++ b/net/ipv6/datagram.c
> @@ -123,11 +123,11 @@ ipv4_connected:
>              goto out;
>           }
>           sk->sk_bound_dev_if = usin->sin6_scope_id;
> -         if (!sk->sk_bound_dev_if &&
> -             (addr_type & IPV6_ADDR_MULTICAST))
> -            fl.oif = np->mcast_oif;
>        }
> 
> +      if (!sk->sk_bound_dev_if && (addr_type & IPV6_ADDR_MULTICAST))
> +         sk->sk_bound_dev_if = np->mcast_oif;
> +
>        /* Connect to link-local address requires an interface */
>        if (!sk->sk_bound_dev_if) {
>           err = -EINVAL;


^ permalink raw reply

* [PATCH 001/001] ipv4: enable use of 240/4 address space
From: Vince Fuller @ 2008-01-08  1:10 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, vaf

from Vince Fuller <vaf@vaf.net>

This set of diffs modify the 2.6.20 kernel to enable use of the 240/4
(aka "class-E") address space as consistent with the Internet Draft
draft-fuller-240space-00.txt.

Signed-off-by: Vince Fuller <vaf@vaf.net>

---

--- include/linux/in.h.orig	2007-04-12 10:16:20.000000000 -0700
+++ include/linux/in.h	2008-01-07 16:54:38.000000000 -0800
@@ -215,8 +215,16 @@ struct sockaddr_in {
 #define	IN_MULTICAST(a)		IN_CLASSD(a)
 #define IN_MULTICAST_NET	0xF0000000
 
+#define IN_CLASSE(a)		((((long int) (a)) & 0xf0000000) == 0xf0000000)
+#define	IN_CLASSE_NET		0xffffff00
+#define	IN_CLASSE_NSHIFT	8
+#define	IN_CLASSE_HOST		(0xffffffff & ~IN_CLASSE_NET)
+
+/* 
+ * these are no longer used
 #define	IN_EXPERIMENTAL(a)	((((long int) (a)) & 0xf0000000) == 0xf0000000)
 #define	IN_BADCLASS(a)		IN_EXPERIMENTAL((a))
+*/
 
 /* Address to accept any incoming messages. */
 #define	INADDR_ANY		((unsigned long int) 0x00000000)
--- net/ipv4/devinet.c.orig	2007-04-12 10:16:23.000000000 -0700
+++ net/ipv4/devinet.c	2008-01-07 16:55:59.000000000 -0800
@@ -594,6 +594,8 @@ static __inline__ int inet_abc_len(__be3
 			rc = 16;
 		else if (IN_CLASSC(haddr))
 			rc = 24;
+		else if (IN_CLASSE(haddr))
+			rc = 24;
 	}
 
   	return rc;
--- net/ipv4/fib_frontend.c.orig	2007-06-07 10:47:08.000000000 -0700
+++ net/ipv4/fib_frontend.c	2008-01-07 16:55:59.000000000 -0800
@@ -152,7 +152,7 @@ unsigned inet_addr_type(__be32 addr)
 	struct fib_result	res;
 	unsigned ret = RTN_BROADCAST;
 
-	if (ZERONET(addr) || BADCLASS(addr))
+	if (ZERONET(addr) || addr == INADDR_BROADCAST)
 		return RTN_BROADCAST;
 	if (MULTICAST(addr))
 		return RTN_MULTICAST;
--- net/ipv4/ipconfig.c.orig	2007-04-12 10:16:23.000000000 -0700
+++ net/ipv4/ipconfig.c	2008-01-07 16:55:59.000000000 -0800
@@ -379,6 +379,8 @@ static int __init ic_defaults(void)
 			ic_netmask = htonl(IN_CLASSB_NET);
 		else if (IN_CLASSC(ntohl(ic_myaddr)))
 			ic_netmask = htonl(IN_CLASSC_NET);
+		else if (IN_CLASSE(ntohl(ic_myaddr)))
+			ic_netmask = htonl(IN_CLASSE_NET);
 		else {
 			printk(KERN_ERR "IP-Config: Unable to guess netmask for address %u.%u.%u.%u\n",
 				NIPQUAD(ic_myaddr));
--- net/ipv4/route.c.orig	2007-04-12 10:16:24.000000000 -0700
+++ net/ipv4/route.c	2008-01-07 16:55:59.000000000 -0800
@@ -1140,7 +1140,7 @@ void ip_rt_redirect(__be32 old_gw, __be3
 		return;
 
 	if (new_gw == old_gw || !IN_DEV_RX_REDIRECTS(in_dev)
-	    || MULTICAST(new_gw) || BADCLASS(new_gw) || ZERONET(new_gw))
+	    || MULTICAST(new_gw) || new_gw == INADDR_BROADCAST || ZERONET(new_gw))
 		goto reject_redirect;
 
 	if (!IN_DEV_SHARED_MEDIA(in_dev)) {
@@ -1617,7 +1617,7 @@ static int ip_route_input_mc(struct sk_b
 	if (in_dev == NULL)
 		return -EINVAL;
 
-	if (MULTICAST(saddr) || BADCLASS(saddr) || LOOPBACK(saddr) ||
+	if (MULTICAST(saddr) || saddr == INADDR_BROADCAST || LOOPBACK(saddr) ||
 	    skb->protocol != htons(ETH_P_IP))
 		goto e_inval;
 
@@ -1935,7 +1935,7 @@ static int ip_route_input_slow(struct sk
 	   by fib_lookup.
 	 */
 
-	if (MULTICAST(saddr) || BADCLASS(saddr) || LOOPBACK(saddr))
+	if (MULTICAST(saddr) || saddr == INADDR_BROADCAST || LOOPBACK(saddr))
 		goto martian_source;
 
 	if (daddr == htonl(0xFFFFFFFF) || (saddr == 0 && daddr == 0))
@@ -1947,7 +1947,7 @@ static int ip_route_input_slow(struct sk
 	if (ZERONET(saddr))
 		goto martian_source;
 
-	if (BADCLASS(daddr) || ZERONET(daddr) || LOOPBACK(daddr))
+	if (ZERONET(daddr) || LOOPBACK(daddr))
 		goto martian_destination;
 
 	/*
@@ -2171,7 +2171,7 @@ static inline int __mkroute_output(struc
 		res->type = RTN_BROADCAST;
 	else if (MULTICAST(fl->fl4_dst))
 		res->type = RTN_MULTICAST;
-	else if (BADCLASS(fl->fl4_dst) || ZERONET(fl->fl4_dst))
+	else if (ZERONET(fl->fl4_dst))
 		return -EINVAL;
 
 	if (dev_out->flags & IFF_LOOPBACK)
@@ -2391,7 +2391,7 @@ static int ip_route_output_slow(struct r
 	if (oldflp->fl4_src) {
 		err = -EINVAL;
 		if (MULTICAST(oldflp->fl4_src) ||
-		    BADCLASS(oldflp->fl4_src) ||
+		    oldflp->fl4_src == INADDR_BROADCAST ||
 		    ZERONET(oldflp->fl4_src))
 			goto out;
 

^ permalink raw reply

* Re: WARNING: at kernel/softirq.c:139 local_bh_enable()
From: Kok, Auke @ 2008-01-08  1:09 UTC (permalink / raw)
  To: Jayakrishnan.Chathu; +Cc: linux-kernel, NetDev
In-Reply-To: <2198383E1141814486F0B881B3260CF501D6BB66@daebe103.NOE.Nokia.com>

Jayakrishnan.Chathu@nokia.com wrote:
> I am running 2.6.23 kernel on a DUAL core and QUAD core i386 boxes and
> after everyboot, when the ethernet traffic starts i get this warning.
> 
> All the ports in the system are e1000 and i am using the kernel e1000
> driver.

[added netdev to the Cc:]

can you repro this with 2.6.24-rc7? What distro are you using? Is your distro
running a link-monitoring tool of some sorts?

Auke



> 
> Jan  7 22:31:00 localhost [warning] WARNING: at kernel/softirq.c:139
> local_bh_enable() 
> Jan  7 22:31:00 localhost [warning] [<c012bd0f>]
> local_bh_enable+0x49/0xa9 
> Jan  7 22:31:00 localhost [warning] [<c039ba1a>]
> dev_queue_xmit+0x26c/0x275 
> Jan  7 22:31:00 localhost [warning] [<c03cdf6c>] arp_xmit+0x4d/0x51 
> Jan  7 22:31:00 localhost [warning] [<c03cd9f6>] arp_solicit+0x156/0x174
> 
> Jan  7 22:31:00 localhost [warning] [<c03a047f>]
> neigh_timer_handler+0x1e0/0x224 
> Jan  7 22:31:00 localhost [warning] [<c012f820>]
> run_timer_softirq+0x113/0x172 
> Jan  7 22:31:00 localhost [warning] [<c013b042>] WARNING: at
> kernel/softirq.c:139 local_bh_enable() 
> Jan  7 22:31:00 localhost [warning] hrtimer_interrupt+0x19c/0x1c4 
> Jan  7 22:31:00 localhost [warning] [<c014002a>]  [<c012bd0f>]
> local_bh_enable+0x49/0xa9 
> Jan  7 22:31:00 localhost [warning] [<c039ba1a>]
> dev_queue_xmit+0x26c/0x275 
> Jan  7 22:31:00 localhost [warning] [<c03a0c05>]
> neigh_resolve_output+0x12c/0x15e 
> Jan  7 22:31:00 localhost [warning] [<c03a0881>]
> neigh_update+0x246/0x2cb 
> Jan  7 22:31:00 localhost [warning] [<c039fb21>] neigh_lookup+0xa9/0xb3 
> Jan  7 22:31:00 localhost [warning] [<c03ce410>] arp_process+0x43c/0x477
> 
> Jan  7 22:31:00 localhost [warning] [<c0120b73>]
> enqueue_task_fair+0x2d/0x30 
> Jan  7 22:31:00 localhost [warning] tick_sched_timer+0x0/0xba 
> Jan  7 22:31:00 localhost [warning] [<c03ce554>] arp_rcv+0x104/0x119 
> Jan  7 22:31:00 localhost [warning] [<c03a029f>]  [<c039bda6>]
> netif_receive_skb+0x1c5/0x1de 
> Jan  7 22:31:00 localhost [warning] [<f897a61d>]
> e1000_clean_rx_irq+0x40e/0x4ca [e1000] 
> Jan  7 22:31:00 localhost [warning] [<c013bdc6>]
> getnstimeofday+0x36/0x10c 
> Jan  7 22:31:00 localhost [warning] neigh_timer_handler+0x0/0x224 
> Jan  7 22:31:00 localhost [warning] [<c012be12>] __do_softirq+0x60/0xc1 
> Jan  7 22:31:00 localhost [warning] [<f8979e34>] e1000_clean+0x74/0x119
> [e1000] 
> Jan  7 22:31:00 localhost [warning] [<c039bf03>]  [<c012bea4>]
> net_rx_action+0x5a/0xd3 
> Jan  7 22:31:00 localhost [warning] [<c012be12>] __do_softirq+0x60/0xc1 
> Jan  7 22:31:00 localhost [warning] do_softirq+0x31/0x35 
> Jan  7 22:31:00 localhost [warning] [<c012bea4>] do_softirq+0x31/0x35 
> Jan  7 22:31:00 localhost [warning] [<c012bf03>] irq_exit+0x38/0x6b 
> Jan  7 22:31:00 localhost [warning] [<c0106a1e>]  [<c012bf03>]
> do_IRQ+0x80/0x93 
> Jan  7 22:31:00 localhost [warning] irq_exit+0x38/0x6b 
> Jan  7 22:31:00 localhost [warning] [<c01057b7>]
> common_interrupt+0x23/0x28 
> Jan  7 22:31:00 localhost [warning] [<c01600d8>]  [<c011a34d>]
> get_swap_page+0xe7/0x215 
> Jan  7 22:31:00 localhost [warning] [<c0103232>]
> mwait_idle_with_hints+0x34/0x38 Jan  7 22:31:00 localhost [warning]
> [<c0103236>] mwait_idle+0x0/0xa 
> Jan  7 22:31:00 localhost [warning] [<c01030f2>] cpu_idle+0x98/0xb9 
> Jan  7 22:31:00 localhost [warning] smp_apic_timer_interrupt+0x2c/0x35
> Jan  7 22:31:00 localhost [warning] ======================= 
> Jan  7 22:31:00 localhost [warning] [<c0105874>]
> apic_timer_interrupt+0x28/0x30 
> Jan  7 22:31:00 localhost [warning] [<c01600d8>]
> get_swap_page+0xe7/0x215 
> Jan  7 22:31:00 localhost [warning] [<c0103232>]
> mwait_idle_with_hints+0x34/0x38 
> Jan  7 22:31:00 localhost [warning] [<c0103236>] mwait_idle+0x0/0xa 
> Jan  7 22:31:00 localhost [warning] [<c01030f2>] cpu_idle+0x98/0xb9 
> Jan  7 22:31:00 localhost [warning] =======================
> 
> 
> Thanks
> Jayakrishnan Chathu
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply

* [PATCH] intel ethernet drivers: update MAINTAINERS
From: Auke Kok @ 2008-01-08  0:54 UTC (permalink / raw)
  To: jeff, davem; +Cc: netdev

Unfortunately Jeb decided to move away from our group. We wish
Jeb good luck with his new group!

Reordered people a bit so most active team members are on top.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
---

 MAINTAINERS |   18 ++++++++----------
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index efdef65..087b871 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1982,29 +1982,27 @@ L:	netdev@vger.kernel.org
 S:	Maintained
 
 INTEL PRO/100 ETHERNET SUPPORT
-P:	John Ronciak
-M:	john.ronciak@intel.com
+P:	Auke Kok
+M:	auke-jan.h.kok@intel.com
 P:	Jesse Brandeburg
 M:	jesse.brandeburg@intel.com
 P:	Jeff Kirsher
 M:	jeffrey.t.kirsher@intel.com
-P:	Auke Kok
-M:	auke-jan.h.kok@intel.com
+P:	John Ronciak
+M:	john.ronciak@intel.com
 L:	e1000-devel@lists.sourceforge.net
 W:	http://sourceforge.net/projects/e1000/
 S:	Supported
 
 INTEL PRO/1000 GIGABIT ETHERNET SUPPORT
-P:	Jeb Cramer
-M:	cramerj@intel.com
-P:	John Ronciak
-M:	john.ronciak@intel.com
+P:	Auke Kok
+M:	auke-jan.h.kok@intel.com
 P:	Jesse Brandeburg
 M:	jesse.brandeburg@intel.com
 P:	Jeff Kirsher
 M:	jeffrey.t.kirsher@intel.com
-P:	Auke Kok
-M:	auke-jan.h.kok@intel.com
+P:	John Ronciak
+M:	john.ronciak@intel.com
 L:	e1000-devel@lists.sourceforge.net
 W:	http://sourceforge.net/projects/e1000/
 S:	Supported


^ permalink raw reply related

* Re: [PATCH 3/4] [XFRM]: Kill some bloat
From: Andi Kleen @ 2008-01-08  0:23 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, ilpo.jarvinen, netdev, acme, paul.moore, latten
In-Reply-To: <20080105.231658.168081302.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> Similarly I question just about any inline usage at all in *.c files

Don't forget the .h files. Especially a lot of stuff in tcp.h should
be probably in some .c file and not be inline.

-Andi

^ permalink raw reply

* [PATCH] [NET] kaweth was forgotten in msec switchover of usb_start_wait_urb
From: Russell Dill @ 2008-01-07 23:20 UTC (permalink / raw)
  To: Michael Zappe, Stephane Alnet, Brad Hards, Oliver Neukum; +Cc: netdev

Back in 2.6.12-pre, usb_start_wait_urb was switched over to take
milliseconds instead of jiffies. kaweth.c was never updated to match.

Signed-off-by: Russ Dill <Russ.Dill@asu.edu>
---
 drivers/net/usb/kaweth.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/usb/kaweth.c b/drivers/net/usb/kaweth.c
index 58a53a6..569ad8b 100644
--- a/drivers/net/usb/kaweth.c
+++ b/drivers/net/usb/kaweth.c
@@ -70,7 +70,7 @@
 #define KAWETH_TX_TIMEOUT		(5 * HZ)
 #define KAWETH_SCRATCH_SIZE		32
 #define KAWETH_FIRMWARE_BUF_SIZE	4096
-#define KAWETH_CONTROL_TIMEOUT		(30 * HZ)
+#define KAWETH_CONTROL_TIMEOUT		(30000)

 #define KAWETH_STATUS_BROKEN		0x0000001
 #define KAWETH_STATUS_CLOSING		0x0000002
--
1.5.3.4

^ permalink raw reply related

* Re: [PATCH] via-velocity big-endian support
From: Francois Romieu @ 2008-01-07 23:18 UTC (permalink / raw)
  To: Al Viro; +Cc: netdev, jgarzik, linux, Brian Hall, Lennert Buytenhek,
	Jay Cliburn
In-Reply-To: <20071228184008.GN27894@ZenIV.linux.org.uk>

Al Viro <viro@ZenIV.linux.org.uk> :
[...]
> Point taken...
> 
> * kill multibyte bitfields
> * annotate
> * add missing conversions
> * fix a couple of brainos in zerocopy stuff (fortunately, it's ifdef'ed out)
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Thanks a lot. After the usual xmas delay, split and review, it is
available at:

git://git.kernel.org/pub/scm/linux/kernel/git/romieu/netdev-2.6.git velocity

I have shortened the end-of-line comments associated with several #define
but the end result is otherwise the same.

Remarks for the random reader below.

> ---
>  drivers/net/via-velocity.c |   62 ++++++-------
>  drivers/net/via-velocity.h |  220 +++++++++++++++++++-------------------------
>  2 files changed, 125 insertions(+), 157 deletions(-)
> 
> diff --git a/drivers/net/via-velocity.c b/drivers/net/via-velocity.c
> index 35cd65d..3842516 100644
> --- a/drivers/net/via-velocity.c
> +++ b/drivers/net/via-velocity.c
[...]
> @@ -777,7 +776,7 @@ static void velocity_init_registers(struct velocity_info *vptr,
>  
>  		vptr->int_mask = INT_MASK_DEF;
>  
> -		writel(cpu_to_le32(vptr->rd_pool_dma), &regs->RDBaseLo);
> +		writel(vptr->rd_pool_dma, &regs->RDBaseLo);
>  		writew(vptr->options.numrx - 1, &regs->RDCSize);
>  		mac_rx_queue_run(regs);
>  		mac_rx_queue_wake(regs);

The cpu_to_le32 was probably wrong in the first place.

[...]
> @@ -785,7 +784,7 @@ static void velocity_init_registers(struct velocity_info *vptr,
>  		writew(vptr->options.numtx - 1, &regs->TDCSize);
>  
>  		for (i = 0; i < vptr->num_txq; i++) {
> -			writel(cpu_to_le32(vptr->td_pool_dma[i]), &(regs->TDBaseLo[i]));
> +			writel(vptr->td_pool_dma[i], &regs->TDBaseLo[i]);
>  			mac_tx_queue_run(regs, i);
>  		}
>  

Same thing.

[...]
> @@ -1421,7 +1420,7 @@ static int velocity_rx_srv(struct velocity_info *vptr, int status)
>  		/*
>  		 *	Don't drop CE or RL error frame although RXOK is off
>  		 */
> -		if ((rd->rdesc0.RSR & RSR_RXOK) || (!(rd->rdesc0.RSR & RSR_RXOK) && (rd->rdesc0.RSR & (RSR_CE | RSR_RL)))) {
> +		if (rd->rdesc0.RSR & (RSR_RXOK | RSR_CE | RSR_RL)) {
>  			if (velocity_receive_frame(vptr, rd_curr) < 0)
>  				stats->rx_dropped++;
>  		} else {

This one was fun.

[...]
> @@ -2122,22 +2121,23 @@ static int velocity_xmit(struct sk_buff *skb, struct net_device *dev)
>  			tdinfo->nskb_dma = 0;
>  			tdinfo->skb_dma[i] = pci_map_single(vptr->pdev, skb->data, skb->len - skb->data_len, PCI_DMA_TODEVICE);
>  
> -			td_ptr->tdesc0.pktsize = pktlen;
> +			td_ptr->tdesc0.len = len;
>  
>  			/* FIXME: support 48bit DMA later */
>  			td_ptr->td_buf[i].pa_low = cpu_to_le32(tdinfo->skb_dma);
>  			td_ptr->td_buf[i].pa_high = 0;
> -			td_ptr->td_buf[i].bufsize = skb->len->skb->data_len;
> +			td_ptr->td_buf[i].size =
> +				cpu_to_le16(skb->len->skb->data_len);

skb->len->skb->data_len...

Still a bit broken but it is endian clean now. Fine.

-- 
Ueimor

^ permalink raw reply

* Re: sunhme x86 SMP issues
From: Jarek Poplawski @ 2008-01-07 23:01 UTC (permalink / raw)
  To: Costa Tsaousis; +Cc: netdev
In-Reply-To: <477369C9.7030405@tsaousis.gr>

Costa Tsaousis wrote, On 12/27/2007 10:00 AM:

> Merry Christams,
> 
> I would like to report incompatibilities of the sunhme driver with x86 SMP.
> 


Hi Costa,

It seems your report has to wait for better times... I'm not driver's
expert, but maybe you could try some of these:

- since 'the list' seems to be busy now, opening a bug report at
bugzilla.kernel.org (copy most of these information and lspci -vvv)
looks like the best way to not forget about this,

- I'd a look at the code and it seems it uses quite coarse locking,
so it could be SMP unfriendly by design...; could you try with
kernel boot parameter: noirqbalance (or read about SMP issues e.g.
for another card Documentation/networking/cxgb.txt).

Regards,
Jarek P.

^ permalink raw reply

* Re: [git patches] net driver fixes
From: David Miller @ 2008-01-07 21:56 UTC (permalink / raw)
  To: mroos; +Cc: jeff, netdev, shemminger
In-Reply-To: <Pine.SOC.4.64.0801071443390.28077@math.ut.ee>

From: Meelis Roos <mroos@linux.ee>
Date: Mon, 7 Jan 2008 14:44:03 +0200 (EET)

> > > JG> A couple [minorly] notable wireless bug fixes, and plenty of viro fixes
> > > JG> for obscure issues :)
> > > 
> > > What about the tulip NAPI fix from Stephen Hemminger? Without this, my tulip
> > > is hosed easily.
> > > 
> > > The thread where I reported it was "Badness at net/core/dev.c:2199", around
> > > Dec 16.
> > 
> > That's going up in the first post-Xmas batch.
> 
> Well, it's 2.6.24-rc7 already - any news?

I put this into my net-2.6 tree last night since Jeff asked
me to look over critical networking driver stuff for a little
while.

^ permalink raw reply

* 3c509: PnP resource management fix
From: Krzysztof Helt @ 2008-01-07 21:44 UTC (permalink / raw)
  To: netdev

From: Krzysztof Helt <krzysztof.h1@wp.pl>

In order to release PnP resources a card type must be set to EL3_PNP.
Previously, it was never set hence the PnP resources were not
released and device was left in incorrect state.

Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>

---

The type value is set only if the card is enabled as PnP device.

--- linux-2.6.24/drivers/net/3c509.c	2007-12-25 23:20:20.000000000 +0100
+++ linux-new/drivers/net/3c509.c	2007-12-26 12:18:08.000000000 +0100
@@ -385,6 +385,7 @@ static int __init el3_probe(int card_idx
 #if defined(__ISAPNP__)
 	static int pnp_cards;
 	struct pnp_dev *idev = NULL;
+	int pnp_found = 0;
 
 	if (nopnp == 1)
 		goto no_pnp;
@@ -430,6 +431,7 @@ __again:
 			pnp_cards++;
 
 			netdev_boot_setup_check(dev);
+			pnp_found = 1;
 			goto found;
 		}
 	}
@@ -560,6 +562,8 @@ no_pnp:
 	lp = netdev_priv(dev);
 #if defined(__ISAPNP__)
 	lp->dev = &idev->dev;
+	if (pnp_found)
+		lp->type = EL3_PNP;
 #endif
 	err = el3_common_init(dev);
 

^ permalink raw reply

* Re: [RFC PATCH v2 1/2] NET: Clone the sk_buff 'iif' field in __skb_clone()
From: James Morris @ 2008-01-07 21:11 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, David S. Miller
In-Reply-To: <20080107174741.13488.93482.stgit@flek.americas.hpqcorp.net>

On Mon, 7 Jan 2008, Paul Moore wrote:

> Both NetLabel and SELinux (other LSMs may grow to use it as well) rely on the
> 'iif' field to determine the receiving network interface of inbound packets.
> Unfortunately, at present this field is not preserved across a skb clone
> operation which can lead to garbage values if the cloned skb is sent back
> through the network stack.  This patch corrects this problem by properly
> copying the 'iif' field in __skb_clone() and removing the 'iif' field
> assignment from skb_act_clone() since it is no longer needed.
> 
> Also, while we are here, put the assignments in the same order as the offsets
> to reduce cacheline bounces.
> 
> Signed-off-by: Paul Moore <paul.moore@hp.com>

Dave, perhaps this one should pushed to Linus now as a bugfix?

> ---
> 
>  include/net/sch_generic.h |    1 -
>  net/core/skbuff.c         |   11 ++++++-----
>  2 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index c926551..4c3b351 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -325,7 +325,6 @@ static inline struct sk_buff *skb_act_clone(struct sk_buff *skb, gfp_t gfp_mask)
>  		n->tc_verd = SET_TC_VERD(n->tc_verd, 0);
>  		n->tc_verd = CLR_TC_OK2MUNGE(n->tc_verd);
>  		n->tc_verd = CLR_TC_MUNGED(n->tc_verd);
> -		n->iif = skb->iif;
>  	}
>  	return n;
>  }
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 5b4ce9b..b628377 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -416,16 +416,17 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
>  	C(len);
>  	C(data_len);
>  	C(mac_len);
> -	n->cloned = 1;
>  	n->hdr_len = skb->nohdr ? skb_headroom(skb) : skb->hdr_len;
> +	n->cloned = 1;
>  	n->nohdr = 0;
>  	n->destructor = NULL;
> -	C(truesize);
> -	atomic_set(&n->users, 1);
> -	C(head);
> -	C(data);
> +	C(iif);
>  	C(tail);
>  	C(end);
> +	C(head);
> +	C(data);
> +	C(truesize);
> +	atomic_set(&n->users, 1);
>  
>  	atomic_inc(&(skb_shinfo(skb)->dataref));
>  	skb->cloned = 1;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
James Morris
<jmorris@namei.org>

^ permalink raw reply

* Re: [IPV4] ROUTE: Avoid sparse warnings
From: Herbert Xu @ 2008-01-07 20:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, viro
In-Reply-To: <47828FB5.1030303@cosmosbay.com>

On Mon, Jan 07, 2008 at 09:46:45PM +0100, Eric Dumazet wrote:
>
> Well, we call rcu_read_unlock_bh()/rcu_read_lock_bh() for each bucket, 
> empty or not, before and after patch, so we dont change latency.

Oh I see.  Your patch looks good then.  But we still need a solution
in general unless we're to avoid all uses of conditional locking.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [PATCH net-2.6.25][NETNS][IPV6] add ipv6 structure for netns
From: Daniel Lezcano @ 2008-01-07 20:48 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2: add-ipv6-for-netns.patch --]
[-- Type: text/x-patch, Size: 1325 bytes --]

Subject: add ipv6 structure for netns
From: Daniel Lezcano <dlezcano@fr.ibm.com>

Like the ipv4 part, this patch adds an ipv6 structure in the net structure
to aggregate the different resources to make ipv6 per namespace.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/net_namespace.h |    4 ++++
 include/net/netns/ipv6.h    |   10 ++++++++++
 2 files changed, 14 insertions(+)

Index: net-2.6.25/include/net/net_namespace.h
===================================================================
--- net-2.6.25.orig/include/net/net_namespace.h
+++ net-2.6.25/include/net/net_namespace.h
@@ -11,6 +11,7 @@
 #include <net/netns/unix.h>
 #include <net/netns/packet.h>
 #include <net/netns/ipv4.h>
+#include <net/netns/ipv6.h>
 
 struct proc_dir_entry;
 struct net_device;
@@ -48,6 +49,9 @@ struct net {
 	struct netns_packet	packet;
 	struct netns_unix	unx;
 	struct netns_ipv4	ipv4;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	struct netns_ipv6	ipv6;
+#endif
 };
 
 #ifdef CONFIG_NET
Index: net-2.6.25/include/net/netns/ipv6.h
===================================================================
--- /dev/null
+++ net-2.6.25/include/net/netns/ipv6.h
@@ -0,0 +1,10 @@
+/*
+ * ipv6 in net namespaces
+ */
+
+#ifndef __NETNS_IPV6_H__
+#define __NETNS_IPV6_H__
+
+struct netns_ipv6 {
+};
+#endif

^ permalink raw reply

* [PATCH net-2.6.25][NETNS][IPV6] make a subsystem for af_inet6
From: Daniel Lezcano @ 2008-01-07 20:44 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 5 bytes --]







[-- Attachment #2: make-af-inet6-a-subsystem.patch --]
[-- Type: text/x-patch, Size: 2042 bytes --]

Subject: make a subsystem for af_inet6
From: Daniel Lezcano <dlezcano@fr.ibm.com>

This patch add a network namespace subsystem for the af_inet6 module. 
It does nothing right now, but one of its purpose is to receive the 
different variables for sysctl in order to initialize them.

When the sysctl variable will be moved to the network namespace structure,
they will be no longer initialized as global static variables, so we must
find a place to initialize them. Because the sysctl can be disabled, it 
has no sense to store them in the sysctl_net_ipv6 file.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 net/ipv6/af_inet6.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Index: net-2.6.25/net/ipv6/af_inet6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/af_inet6.c
+++ net-2.6.25/net/ipv6/af_inet6.c
@@ -719,6 +719,21 @@ static void cleanup_ipv6_mibs(void)
 	snmp_mib_free((void **)udplite_stats_in6);
 }
 
+static int inet6_net_init(struct net *net)
+{
+	return 0;
+}
+
+static void inet6_net_exit(struct net *net)
+{
+	return;
+}
+
+static struct pernet_operations inet6_net_ops = {
+	.init = inet6_net_init,
+	.exit = inet6_net_exit,
+};
+
 static int __init inet6_init(void)
 {
 	struct sk_buff *dummy_skb;
@@ -782,6 +797,10 @@ static int __init inet6_init(void)
 	 *	able to communicate via both network protocols.
 	 */
 
+	err = register_pernet_subsys(&inet6_net_ops);
+	if (err)
+		goto register_pernet_fail;
+
 #ifdef CONFIG_SYSCTL
 	ipv6_sysctl_register();
 #endif
@@ -898,6 +917,8 @@ icmp_fail:
 #ifdef CONFIG_SYSCTL
 	ipv6_sysctl_unregister();
 #endif
+	unregister_pernet_subsys(&inet6_net_ops);
+register_pernet_fail:
 	cleanup_ipv6_mibs();
 out_unregister_sock:
 	sock_unregister(PF_INET6);
@@ -953,6 +974,7 @@ static void __exit inet6_exit(void)
 #ifdef CONFIG_SYSCTL
 	ipv6_sysctl_unregister();
 #endif
+	unregister_pernet_subsys(&inet6_net_ops);
 	cleanup_ipv6_mibs();
 	proto_unregister(&rawv6_prot);
 	proto_unregister(&udplitev6_prot);

^ permalink raw reply

* make ipv6_sysctl_register to return a value
From: Daniel Lezcano @ 2008-01-07 20:39 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 0 bytes --]



[-- Attachment #2: ipv6-sysctl-register-return-value.patch --]
[-- Type: text/x-patch, Size: 2093 bytes --]

Subject: make ipv6_sysctl_register to return a value
From: Daniel Lezcano <dlezcano@fr.ibm.com>

This patch makes the function ipv6_sysctl_register to return a
value. The af_inet6 init function is now able to handle an error
and catch it from the initialization of the sysctl.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 include/net/ipv6.h         |    2 +-
 net/ipv6/af_inet6.c        |    5 ++++-
 net/ipv6/sysctl_net_ipv6.c |    6 +++++-
 3 files changed, 10 insertions(+), 3 deletions(-)

Index: net-2.6.25/include/net/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/ipv6.h
+++ net-2.6.25/include/net/ipv6.h
@@ -620,7 +620,7 @@ static inline int snmp6_unregister_dev(s
 extern ctl_table ipv6_route_table[];
 extern ctl_table ipv6_icmp_table[];
 
-extern void ipv6_sysctl_register(void);
+extern int ipv6_sysctl_register(void);
 extern void ipv6_sysctl_unregister(void);
 #endif
 
Index: net-2.6.25/net/ipv6/af_inet6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/af_inet6.c
+++ net-2.6.25/net/ipv6/af_inet6.c
@@ -783,7 +783,9 @@ static int __init inet6_init(void)
 	 */
 
 #ifdef CONFIG_SYSCTL
-	ipv6_sysctl_register();
+	err = ipv6_sysctl_register();
+	if (err)
+		goto sysctl_fail;
 #endif
 	err = icmpv6_init(&inet6_family_ops);
 	if (err)
@@ -897,6 +899,7 @@ ndisc_fail:
 icmp_fail:
 #ifdef CONFIG_SYSCTL
 	ipv6_sysctl_unregister();
+sysctl_fail:
 #endif
 	cleanup_ipv6_mibs();
 out_unregister_sock:
Index: net-2.6.25/net/ipv6/sysctl_net_ipv6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/sysctl_net_ipv6.c
+++ net-2.6.25/net/ipv6/sysctl_net_ipv6.c
@@ -90,9 +90,13 @@ static struct ctl_path ipv6_ctl_path[] =
 
 static struct ctl_table_header *ipv6_sysctl_header;
 
-void ipv6_sysctl_register(void)
+int ipv6_sysctl_register(void)
 {
 	ipv6_sysctl_header = register_sysctl_paths(ipv6_ctl_path, ipv6_table);
+	if (!ipv6_sysctl_header)
+		return -ENOMEM;
+
+	return 0;
 }
 
 void ipv6_sysctl_unregister(void)

^ permalink raw reply

* Re: [IPV4] ROUTE: Avoid sparse warnings
From: Eric Dumazet @ 2008-01-07 20:46 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, netdev, viro
In-Reply-To: <20080107203845.GA29803@gondor.apana.org.au>

Herbert Xu a écrit :
> On Mon, Jan 07, 2008 at 02:56:24PM +0100, Eric Dumazet wrote:
>> AFAIK, this patch reduces complexity and text size.
> 
> But if we had loads of empty hash buckets couldn't this potentially
> increase latency by disabling BH longer than before?

Well, we call rcu_read_unlock_bh()/rcu_read_lock_bh() for each bucket, empty 
or not, before and after patch, so we dont change latency.




^ permalink raw reply

* Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)
From: Jay Vosburgh @ 2008-01-07 20:40 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Krzysztof Oledzki, Herbert Xu, Andrew Morton, bugme-daemon,
	shemminger, davem, netdev
In-Reply-To: <20080107202626.GC8728@gospo.usersys.redhat.com>

Andy Gospodarek <andy@greyhouse.net> wrote:

>On Mon, Jan 07, 2008 at 06:57:25PM +0100, Krzysztof Oledzki wrote:
>> 
>> 
>> On Wed, 19 Dec 2007, Andy Gospodarek wrote:
>> 
>> >On Tue, Dec 18, 2007 at 08:53:39PM +0100, Krzysztof Oledzki wrote:
>> >>
>> >>
>> >>On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>> >>
>> >>>On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:
>> >>>>
>> >>>>
>> >>>>On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>> >>>>
>> >>>>>On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>>On Wed, 12 Dec 2007, Jay Vosburgh wrote:
>> >>>>>>
>> >>>>>>>Herbert Xu <herbert@gondor.apana.org.au> wrote:
>> >>>>>>>
>> >>>>>>>>>diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>> >>>>>>>>>drivers/net/bonding/bond_sysfs.c
>> >>>>>>>>>--- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>> >>>>>>>>>+++ a/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>>>@@ -1111,8 +1111,6 @@ static ssize_t bonding_store_primary(str
>> >>>>>>>>>out:
>> >>>>>>>>>    write_unlock_bh(&bond->lock);
>> >>>>>>>>>
>> >>>>>>>>>-       rtnl_unlock();
>> >>>>>>>>>-
>> >>>>>>>>
>> >>>>>>>>Looking at the changeset that added this perhaps the intention
>> >>>>>>>>is to hold the lock? If so we should add an rtnl_lock to the start
>> >>>>>>>>of the function.
>> >>>>>>>
>> >>>>>>>	Yes, this function needs to hold locks, and more than just
>> >>>>>>>what's there now.  I believe the following should be correct; I 
>> >>>>>>>haven't
>> >>>>>>>tested it, though (I'm supposedly on vacation right now).
>> >>>>>>>
>> >>>>>>>	The following change should be correct for the
>> >>>>>>>bonding_store_primary case discussed in this thread, and also 
>> >>>>>>>corrects
>> >>>>>>>the bonding_store_active case which performs similar functions.
>> >>>>>>>
>> >>>>>>>	The bond_change_active_slave and bond_select_active_slave
>> >>>>>>>functions both require rtnl, bond->lock for read and curr_slave_lock
>> >>>>>>>for
>> >>>>>>>write_bh, and no other locks.  This is so that the lower level
>> >>>>>>>mode-specific functions can release locks down to just rtnl in order 
>> >>>>>>>to
>> >>>>>>>call, e.g., dev_set_mac_address with the locks it expects (rtnl 
>> >>>>>>>only).
>> >>>>>>>
>> >>>>>>>Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
>> >>>>>>>
>> >>>>>>>diff --git a/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>b/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>index 11b76b3..28a2d80 100644
>> >>>>>>>--- a/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>+++ b/drivers/net/bonding/bond_sysfs.c
>> >>>>>>>@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct
>> >>>>>>>device
>> >>>>>>>*d,
>> >>>>>>>	struct slave *slave;
>> >>>>>>>	struct bonding *bond = to_bond(d);
>> >>>>>>>
>> >>>>>>>-	write_lock_bh(&bond->lock);
>> >>>>>>>+	rtnl_lock();
>> >>>>>>>+	read_lock(&bond->lock);
>> >>>>>>>+	write_lock_bh(&bond->curr_slave_lock);
>> >>>>>>>+
>> >>>>>>>	if (!USES_PRIMARY(bond->params.mode)) {
>> >>>>>>>		printk(KERN_INFO DRV_NAME
>> >>>>>>>		       ": %s: Unable to set primary slave; %s is in 
>> >>>>>>>		       mode
>> >>>>>>>		       %d\n",
>> >>>>>>>@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct
>> >>>>>>>device
>> >>>>>>>*d,
>> >>>>>>>		}
>> >>>>>>>	}
>> >>>>>>>out:
>> >>>>>>>-	write_unlock_bh(&bond->lock);
>> >>>>>>>-
>> >>>>>>>+	write_unlock_bh(&bond->curr_slave_lock);
>> >>>>>>>+	read_unlock(&bond->lock);
>> >>>>>>>	rtnl_unlock();
>> >>>>>>>
>> >>>>>>>	return count;
>> >>>>>>>@@ -1190,7 +1193,8 @@ static ssize_t 
>> >>>>>>>bonding_store_active_slave(struct
>> >>>>>>>device *d,
>> >>>>>>>	struct bonding *bond = to_bond(d);
>> >>>>>>>
>> >>>>>>>	rtnl_lock();
>> >>>>>>>-	write_lock_bh(&bond->lock);
>> >>>>>>>+	read_lock(&bond->lock);
>> >>>>>>>+	write_lock_bh(&bond->curr_slave_lock);
>> >>>>>>>
>> >>>>>>>	if (!USES_PRIMARY(bond->params.mode)) {
>> >>>>>>>		printk(KERN_INFO DRV_NAME
>> >>>>>>>@@ -1247,7 +1251,8 @@ static ssize_t 
>> >>>>>>>bonding_store_active_slave(struct
>> >>>>>>>device *d,
>> >>>>>>>		}
>> >>>>>>>	}
>> >>>>>>>out:
>> >>>>>>>-	write_unlock_bh(&bond->lock);
>> >>>>>>>+	write_unlock_bh(&bond->curr_slave_lock);
>> >>>>>>>+	read_unlock(&bond->lock);
>> >>>>>>>	rtnl_unlock();
>> >>>>>>>
>> >>>>>>>	return count;
>> >>>>>>
>> >>>>>>Vanilla 2.6.24-rc5 plus this patch:
>> >>>>>>
>> >>>>>>=========================================================
>> >>>>>>[ INFO: possible irq lock inversion dependency detected ]
>> >>>>>>2.6.24-rc5 #1
>> >>>>>>---------------------------------------------------------
>> >>>>>>events/0/9 just changed the state of lock:
>> >>>>>>(&mc->mca_lock){-+..}, at: [<c0411c7a>] 
>> >>>>>>mld_ifc_timer_expire+0x130/0x1fb
>> >>>>>>but this lock took another, soft-read-irq-unsafe lock in the past:
>> >>>>>>(&bond->lock){-.--}
>> >>>>>>
>> >>>>>>and interrupts could create inverse lock ordering between them.
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>Grrr, I should have seen that -- sorry.  Try your luck with this 
>> >>>>>instead:
>> >>>><CUT>
>> >>>>
>> >>>>No luck.
>> >>>>
>> >>>
>> >>>
>> >>>I'm guessing if we go back to using a write-lock for bond->lock this
>> >>>will go back to working again, but I'm not totally convinced since there
>> >>>are plenty of places where we used a read-lock with it.
>> >>
>> >>Should I check this patch or rather, based on a future discussion, wait
>> >>for another version?
>> >>
>> >>>
>> >>>diff --git a/drivers/net/bonding/bond_sysfs.c
>> >>>b/drivers/net/bonding/bond_sysfs.c
>> >>>index 11b76b3..635b857 100644
>> >>>--- a/drivers/net/bonding/bond_sysfs.c
>> >>>+++ b/drivers/net/bonding/bond_sysfs.c
>> >>>@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device
>> >>>*d,
>> >>>	struct slave *slave;
>> >>>	struct bonding *bond = to_bond(d);
>> >>>
>> >>>+	rtnl_lock();
>> >>>	write_lock_bh(&bond->lock);
>> >>>+	write_lock_bh(&bond->curr_slave_lock);
>> >>>+
>> >>>	if (!USES_PRIMARY(bond->params.mode)) {
>> >>>		printk(KERN_INFO DRV_NAME
>> >>>		       ": %s: Unable to set primary slave; %s is in mode
>> >>>		       %d\n",
>> >>>@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device
>> >>>*d,
>> >>>		}
>> >>>	}
>> >>>out:
>> >>>+	write_unlock_bh(&bond->curr_slave_lock);
>> >>>	write_unlock_bh(&bond->lock);
>> >>>-
>> >>>	rtnl_unlock();
>> >>>
>> >>>	return count;
>> >>>@@ -1191,6 +1194,7 @@ static ssize_t bonding_store_active_slave(struct
>> >>>device *d,
>> >>>
>> >>>	rtnl_lock();
>> >>>	write_lock_bh(&bond->lock);
>> >>>+	write_lock_bh(&bond->curr_slave_lock);
>> >>>
>> >>>	if (!USES_PRIMARY(bond->params.mode)) {
>> >>>		printk(KERN_INFO DRV_NAME
>> >>>@@ -1247,6 +1251,7 @@ static ssize_t bonding_store_active_slave(struct
>> >>>device *d,
>> >>>		}
>> >>>	}
>> >>>out:
>> >>>+	write_unlock_bh(&bond->curr_slave_lock);
>> >>>	write_unlock_bh(&bond->lock);
>> >>>	rtnl_unlock();
>> >>>
>> >>
>> >>
>> >>Best regards,
>> >>
>> >>					Krzysztof Olędzki
>> >
>> >For now, I prefer Jay's original patch -- with the read_locks (rather
>> >than read/write_lock_bh) and the added rtnl_lock.  There is still a
>> >lockdep issue that we need to sort-out, but this patch is needed first.
>> 
>> This bug has not been fixed yet as it still exists in 2.6.24-rc7. Any 
>> chances to cure it before 2.6.24-final?
>> 
>> Best regards,
>> 
>> 				Krzysztof Olędzki
>
>Krzysztof,
>
>I doubt the lockdep issue will be fixed, but the patch Jay posted and I
>acked needs to be included in 2.6.24.

	I'm (finally) back from vacation and am working on the lock
problem right now; there are a couple of other changes that need to go
in (in addition to what was posted previously).  One is a spurious RTNL
warning, the other is a similar 'wrong lock' type of problem that arises
during module unload.

	I should have a patch set for this posted in a couple of hours.

>I played around with the locking when setting the multicast list and I
>can make the lockdep issue go away, but I need to be sure that it's OK
>to switch it to a read-lock from a write-lock (and I don't really think
>it is).

	I haven't looked at the lockdep problem yet.  If you want to be
brave and post your working patch for the lockdep thing, I might be able
to crush your hopes that it's ok.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [IPV4] ROUTE: Avoid sparse warnings
From: Herbert Xu @ 2008-01-07 20:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, viro
In-Reply-To: <20080107145624.05918cea.dada1@cosmosbay.com>

On Mon, Jan 07, 2008 at 02:56:24PM +0100, Eric Dumazet wrote:
> 
> AFAIK, this patch reduces complexity and text size.

But if we had loads of empty hash buckets couldn't this potentially
increase latency by disabling BH longer than before?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)
From: Andy Gospodarek @ 2008-01-07 20:26 UTC (permalink / raw)
  To: Krzysztof Oledzki
  Cc: Jay Vosburgh, Herbert Xu, Andrew Morton, bugme-daemon, shemminger,
	davem, netdev
In-Reply-To: <Pine.LNX.4.64.0801071854140.6696@bizon.gios.gov.pl>

On Mon, Jan 07, 2008 at 06:57:25PM +0100, Krzysztof Oledzki wrote:
> 
> 
> On Wed, 19 Dec 2007, Andy Gospodarek wrote:
> 
> >On Tue, Dec 18, 2007 at 08:53:39PM +0100, Krzysztof Oledzki wrote:
> >>
> >>
> >>On Fri, 14 Dec 2007, Andy Gospodarek wrote:
> >>
> >>>On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:
> >>>>
> >>>>
> >>>>On Fri, 14 Dec 2007, Andy Gospodarek wrote:
> >>>>
> >>>>>On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:
> >>>>>>
> >>>>>>
> >>>>>>On Wed, 12 Dec 2007, Jay Vosburgh wrote:
> >>>>>>
> >>>>>>>Herbert Xu <herbert@gondor.apana.org.au> wrote:
> >>>>>>>
> >>>>>>>>>diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
> >>>>>>>>>drivers/net/bonding/bond_sysfs.c
> >>>>>>>>>--- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
> >>>>>>>>>+++ a/drivers/net/bonding/bond_sysfs.c
> >>>>>>>>>@@ -1111,8 +1111,6 @@ static ssize_t bonding_store_primary(str
> >>>>>>>>>out:
> >>>>>>>>>    write_unlock_bh(&bond->lock);
> >>>>>>>>>
> >>>>>>>>>-       rtnl_unlock();
> >>>>>>>>>-
> >>>>>>>>
> >>>>>>>>Looking at the changeset that added this perhaps the intention
> >>>>>>>>is to hold the lock? If so we should add an rtnl_lock to the start
> >>>>>>>>of the function.
> >>>>>>>
> >>>>>>>	Yes, this function needs to hold locks, and more than just
> >>>>>>>what's there now.  I believe the following should be correct; I 
> >>>>>>>haven't
> >>>>>>>tested it, though (I'm supposedly on vacation right now).
> >>>>>>>
> >>>>>>>	The following change should be correct for the
> >>>>>>>bonding_store_primary case discussed in this thread, and also 
> >>>>>>>corrects
> >>>>>>>the bonding_store_active case which performs similar functions.
> >>>>>>>
> >>>>>>>	The bond_change_active_slave and bond_select_active_slave
> >>>>>>>functions both require rtnl, bond->lock for read and curr_slave_lock
> >>>>>>>for
> >>>>>>>write_bh, and no other locks.  This is so that the lower level
> >>>>>>>mode-specific functions can release locks down to just rtnl in order 
> >>>>>>>to
> >>>>>>>call, e.g., dev_set_mac_address with the locks it expects (rtnl 
> >>>>>>>only).
> >>>>>>>
> >>>>>>>Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
> >>>>>>>
> >>>>>>>diff --git a/drivers/net/bonding/bond_sysfs.c
> >>>>>>>b/drivers/net/bonding/bond_sysfs.c
> >>>>>>>index 11b76b3..28a2d80 100644
> >>>>>>>--- a/drivers/net/bonding/bond_sysfs.c
> >>>>>>>+++ b/drivers/net/bonding/bond_sysfs.c
> >>>>>>>@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct
> >>>>>>>device
> >>>>>>>*d,
> >>>>>>>	struct slave *slave;
> >>>>>>>	struct bonding *bond = to_bond(d);
> >>>>>>>
> >>>>>>>-	write_lock_bh(&bond->lock);
> >>>>>>>+	rtnl_lock();
> >>>>>>>+	read_lock(&bond->lock);
> >>>>>>>+	write_lock_bh(&bond->curr_slave_lock);
> >>>>>>>+
> >>>>>>>	if (!USES_PRIMARY(bond->params.mode)) {
> >>>>>>>		printk(KERN_INFO DRV_NAME
> >>>>>>>		       ": %s: Unable to set primary slave; %s is in 
> >>>>>>>		       mode
> >>>>>>>		       %d\n",
> >>>>>>>@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct
> >>>>>>>device
> >>>>>>>*d,
> >>>>>>>		}
> >>>>>>>	}
> >>>>>>>out:
> >>>>>>>-	write_unlock_bh(&bond->lock);
> >>>>>>>-
> >>>>>>>+	write_unlock_bh(&bond->curr_slave_lock);
> >>>>>>>+	read_unlock(&bond->lock);
> >>>>>>>	rtnl_unlock();
> >>>>>>>
> >>>>>>>	return count;
> >>>>>>>@@ -1190,7 +1193,8 @@ static ssize_t 
> >>>>>>>bonding_store_active_slave(struct
> >>>>>>>device *d,
> >>>>>>>	struct bonding *bond = to_bond(d);
> >>>>>>>
> >>>>>>>	rtnl_lock();
> >>>>>>>-	write_lock_bh(&bond->lock);
> >>>>>>>+	read_lock(&bond->lock);
> >>>>>>>+	write_lock_bh(&bond->curr_slave_lock);
> >>>>>>>
> >>>>>>>	if (!USES_PRIMARY(bond->params.mode)) {
> >>>>>>>		printk(KERN_INFO DRV_NAME
> >>>>>>>@@ -1247,7 +1251,8 @@ static ssize_t 
> >>>>>>>bonding_store_active_slave(struct
> >>>>>>>device *d,
> >>>>>>>		}
> >>>>>>>	}
> >>>>>>>out:
> >>>>>>>-	write_unlock_bh(&bond->lock);
> >>>>>>>+	write_unlock_bh(&bond->curr_slave_lock);
> >>>>>>>+	read_unlock(&bond->lock);
> >>>>>>>	rtnl_unlock();
> >>>>>>>
> >>>>>>>	return count;
> >>>>>>
> >>>>>>Vanilla 2.6.24-rc5 plus this patch:
> >>>>>>
> >>>>>>=========================================================
> >>>>>>[ INFO: possible irq lock inversion dependency detected ]
> >>>>>>2.6.24-rc5 #1
> >>>>>>---------------------------------------------------------
> >>>>>>events/0/9 just changed the state of lock:
> >>>>>>(&mc->mca_lock){-+..}, at: [<c0411c7a>] 
> >>>>>>mld_ifc_timer_expire+0x130/0x1fb
> >>>>>>but this lock took another, soft-read-irq-unsafe lock in the past:
> >>>>>>(&bond->lock){-.--}
> >>>>>>
> >>>>>>and interrupts could create inverse lock ordering between them.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>Grrr, I should have seen that -- sorry.  Try your luck with this 
> >>>>>instead:
> >>>><CUT>
> >>>>
> >>>>No luck.
> >>>>
> >>>
> >>>
> >>>I'm guessing if we go back to using a write-lock for bond->lock this
> >>>will go back to working again, but I'm not totally convinced since there
> >>>are plenty of places where we used a read-lock with it.
> >>
> >>Should I check this patch or rather, based on a future discussion, wait
> >>for another version?
> >>
> >>>
> >>>diff --git a/drivers/net/bonding/bond_sysfs.c
> >>>b/drivers/net/bonding/bond_sysfs.c
> >>>index 11b76b3..635b857 100644
> >>>--- a/drivers/net/bonding/bond_sysfs.c
> >>>+++ b/drivers/net/bonding/bond_sysfs.c
> >>>@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device
> >>>*d,
> >>>	struct slave *slave;
> >>>	struct bonding *bond = to_bond(d);
> >>>
> >>>+	rtnl_lock();
> >>>	write_lock_bh(&bond->lock);
> >>>+	write_lock_bh(&bond->curr_slave_lock);
> >>>+
> >>>	if (!USES_PRIMARY(bond->params.mode)) {
> >>>		printk(KERN_INFO DRV_NAME
> >>>		       ": %s: Unable to set primary slave; %s is in mode
> >>>		       %d\n",
> >>>@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device
> >>>*d,
> >>>		}
> >>>	}
> >>>out:
> >>>+	write_unlock_bh(&bond->curr_slave_lock);
> >>>	write_unlock_bh(&bond->lock);
> >>>-
> >>>	rtnl_unlock();
> >>>
> >>>	return count;
> >>>@@ -1191,6 +1194,7 @@ static ssize_t bonding_store_active_slave(struct
> >>>device *d,
> >>>
> >>>	rtnl_lock();
> >>>	write_lock_bh(&bond->lock);
> >>>+	write_lock_bh(&bond->curr_slave_lock);
> >>>
> >>>	if (!USES_PRIMARY(bond->params.mode)) {
> >>>		printk(KERN_INFO DRV_NAME
> >>>@@ -1247,6 +1251,7 @@ static ssize_t bonding_store_active_slave(struct
> >>>device *d,
> >>>		}
> >>>	}
> >>>out:
> >>>+	write_unlock_bh(&bond->curr_slave_lock);
> >>>	write_unlock_bh(&bond->lock);
> >>>	rtnl_unlock();
> >>>
> >>
> >>
> >>Best regards,
> >>
> >>					Krzysztof Olędzki
> >
> >For now, I prefer Jay's original patch -- with the read_locks (rather
> >than read/write_lock_bh) and the added rtnl_lock.  There is still a
> >lockdep issue that we need to sort-out, but this patch is needed first.
> 
> This bug has not been fixed yet as it still exists in 2.6.24-rc7. Any 
> chances to cure it before 2.6.24-final?
> 
> Best regards,
> 
> 				Krzysztof Olędzki

Krzysztof,

I doubt the lockdep issue will be fixed, but the patch Jay posted and I
acked needs to be included in 2.6.24.

I played around with the locking when setting the multicast list and I
can make the lockdep issue go away, but I need to be sure that it's OK
to switch it to a read-lock from a write-lock (and I don't really think
it is).

-andy


^ permalink raw reply

* Re: 2.6.24-rc6 oops in net_tx_action
From: linux @ 2008-01-07 19:13 UTC (permalink / raw)
  To: davem, linux; +Cc: jgarzik, linux-kernel, netdev, romieu, shemminger
In-Reply-To: <20080107.002342.10040673.davem@davemloft.net>

>> Thanks!  I got the patch from
>> http://marc.info/?l=linux-netdev&m=119756785219214
>> (Which didn't make it into -rc7; please fix!)
>> and am recompiling now.

> Jeff is busy so he's asked me to pick up the more important
> driver bug fixes that get posted.
>
> I'll push this around, thanks.

Much obliged.  It's only 11 hours of uptime, but no problems so far,
even trying abusive things like "ping -f -l64 -s8000".

^ permalink raw reply

* [IPV4] ROUTE: ip_rt_dump() is unecessary slow
From: Eric Dumazet @ 2008-01-07 18:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org

I noticed "ip route list cache x.y.z.t" can be *very* slow.

While strace-ing -T it I also noticed that first part of route cache is
fetched quite fast :


recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.000047>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.000042>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740 <0.000055>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.000043>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732 <0.000053>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708 <0.000052>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680 <0.000041>

while the part at the end of the table is more expensive:

recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656 <0.003857>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.003891>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.003765>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700 <0.003879>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676 <0.003797>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724 <0.003856>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2  \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.003848>

The following patch corrects this performance/latency problem, removing quadratic behavior.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 10915bb..b7aa6dd 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2755,11 +2755,10 @@ int ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb)
 	int idx, s_idx;
 
 	s_h = cb->args[0];
+	if (s_h < 0)
+		s_h = 0;
 	s_idx = idx = cb->args[1];
-	for (h = 0; h <= rt_hash_mask; h++) {
-		if (h < s_h) continue;
-		if (h > s_h)
-			s_idx = 0;
+	for (h = s_h; h <= rt_hash_mask; h++) {
 		rcu_read_lock_bh();
 		for (rt = rcu_dereference(rt_hash_table[h].chain), idx = 0; rt;
 		     rt = rcu_dereference(rt->u.dst.rt_next), idx++) {
@@ -2776,6 +2775,7 @@ int ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb)
 			dst_release(xchg(&skb->dst, NULL));
 		}
 		rcu_read_unlock_bh();
+		s_idx = 0;
 	}
 
 done:


^ permalink raw reply related

* Re: [Bugme-new] [Bug 9543] New: RTNL: assertion failed at net/ipv6/addrconf.c (2164)/RTNL: assertion failed at net/ipv4/devinet.c (1055)
From: Krzysztof Oledzki @ 2008-01-07 17:57 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Jay Vosburgh, Herbert Xu, Andrew Morton, bugme-daemon, shemminger,
	davem, netdev
In-Reply-To: <20071219144208.GB8728@gospo.usersys.redhat.com>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 6817 bytes --]



On Wed, 19 Dec 2007, Andy Gospodarek wrote:

> On Tue, Dec 18, 2007 at 08:53:39PM +0100, Krzysztof Oledzki wrote:
>>
>>
>> On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>>
>>> On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:
>>>>
>>>>
>>>> On Fri, 14 Dec 2007, Andy Gospodarek wrote:
>>>>
>>>>> On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:
>>>>>>
>>>>>>
>>>>>> On Wed, 12 Dec 2007, Jay Vosburgh wrote:
>>>>>>
>>>>>>> Herbert Xu <herbert@gondor.apana.org.au> wrote:
>>>>>>>
>>>>>>>>> diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>>>>>>>>> drivers/net/bonding/bond_sysfs.c
>>>>>>>>> --- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
>>>>>>>>> +++ a/drivers/net/bonding/bond_sysfs.c
>>>>>>>>> @@ -1111,8 +1111,6 @@ static ssize_t bonding_store_primary(str
>>>>>>>>> out:
>>>>>>>>>     write_unlock_bh(&bond->lock);
>>>>>>>>>
>>>>>>>>> -       rtnl_unlock();
>>>>>>>>> -
>>>>>>>>
>>>>>>>> Looking at the changeset that added this perhaps the intention
>>>>>>>> is to hold the lock? If so we should add an rtnl_lock to the start
>>>>>>>> of the function.
>>>>>>>
>>>>>>> 	Yes, this function needs to hold locks, and more than just
>>>>>>> what's there now.  I believe the following should be correct; I haven't
>>>>>>> tested it, though (I'm supposedly on vacation right now).
>>>>>>>
>>>>>>> 	The following change should be correct for the
>>>>>>> bonding_store_primary case discussed in this thread, and also corrects
>>>>>>> the bonding_store_active case which performs similar functions.
>>>>>>>
>>>>>>> 	The bond_change_active_slave and bond_select_active_slave
>>>>>>> functions both require rtnl, bond->lock for read and curr_slave_lock
>>>>>>> for
>>>>>>> write_bh, and no other locks.  This is so that the lower level
>>>>>>> mode-specific functions can release locks down to just rtnl in order to
>>>>>>> call, e.g., dev_set_mac_address with the locks it expects (rtnl only).
>>>>>>>
>>>>>>> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
>>>>>>>
>>>>>>> diff --git a/drivers/net/bonding/bond_sysfs.c
>>>>>>> b/drivers/net/bonding/bond_sysfs.c
>>>>>>> index 11b76b3..28a2d80 100644
>>>>>>> --- a/drivers/net/bonding/bond_sysfs.c
>>>>>>> +++ b/drivers/net/bonding/bond_sysfs.c
>>>>>>> @@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct
>>>>>>> device
>>>>>>> *d,
>>>>>>> 	struct slave *slave;
>>>>>>> 	struct bonding *bond = to_bond(d);
>>>>>>>
>>>>>>> -	write_lock_bh(&bond->lock);
>>>>>>> +	rtnl_lock();
>>>>>>> +	read_lock(&bond->lock);
>>>>>>> +	write_lock_bh(&bond->curr_slave_lock);
>>>>>>> +
>>>>>>> 	if (!USES_PRIMARY(bond->params.mode)) {
>>>>>>> 		printk(KERN_INFO DRV_NAME
>>>>>>> 		       ": %s: Unable to set primary slave; %s is in mode
>>>>>>> 		       %d\n",
>>>>>>> @@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct
>>>>>>> device
>>>>>>> *d,
>>>>>>> 		}
>>>>>>> 	}
>>>>>>> out:
>>>>>>> -	write_unlock_bh(&bond->lock);
>>>>>>> -
>>>>>>> +	write_unlock_bh(&bond->curr_slave_lock);
>>>>>>> +	read_unlock(&bond->lock);
>>>>>>> 	rtnl_unlock();
>>>>>>>
>>>>>>> 	return count;
>>>>>>> @@ -1190,7 +1193,8 @@ static ssize_t bonding_store_active_slave(struct
>>>>>>> device *d,
>>>>>>> 	struct bonding *bond = to_bond(d);
>>>>>>>
>>>>>>> 	rtnl_lock();
>>>>>>> -	write_lock_bh(&bond->lock);
>>>>>>> +	read_lock(&bond->lock);
>>>>>>> +	write_lock_bh(&bond->curr_slave_lock);
>>>>>>>
>>>>>>> 	if (!USES_PRIMARY(bond->params.mode)) {
>>>>>>> 		printk(KERN_INFO DRV_NAME
>>>>>>> @@ -1247,7 +1251,8 @@ static ssize_t bonding_store_active_slave(struct
>>>>>>> device *d,
>>>>>>> 		}
>>>>>>> 	}
>>>>>>> out:
>>>>>>> -	write_unlock_bh(&bond->lock);
>>>>>>> +	write_unlock_bh(&bond->curr_slave_lock);
>>>>>>> +	read_unlock(&bond->lock);
>>>>>>> 	rtnl_unlock();
>>>>>>>
>>>>>>> 	return count;
>>>>>>
>>>>>> Vanilla 2.6.24-rc5 plus this patch:
>>>>>>
>>>>>> =========================================================
>>>>>> [ INFO: possible irq lock inversion dependency detected ]
>>>>>> 2.6.24-rc5 #1
>>>>>> ---------------------------------------------------------
>>>>>> events/0/9 just changed the state of lock:
>>>>>> (&mc->mca_lock){-+..}, at: [<c0411c7a>] mld_ifc_timer_expire+0x130/0x1fb
>>>>>> but this lock took another, soft-read-irq-unsafe lock in the past:
>>>>>> (&bond->lock){-.--}
>>>>>>
>>>>>> and interrupts could create inverse lock ordering between them.
>>>>>>
>>>>>>
>>>>>
>>>>> Grrr, I should have seen that -- sorry.  Try your luck with this instead:
>>>> <CUT>
>>>>
>>>> No luck.
>>>>
>>>
>>>
>>> I'm guessing if we go back to using a write-lock for bond->lock this
>>> will go back to working again, but I'm not totally convinced since there
>>> are plenty of places where we used a read-lock with it.
>>
>> Should I check this patch or rather, based on a future discussion, wait
>> for another version?
>>
>>>
>>> diff --git a/drivers/net/bonding/bond_sysfs.c
>>> b/drivers/net/bonding/bond_sysfs.c
>>> index 11b76b3..635b857 100644
>>> --- a/drivers/net/bonding/bond_sysfs.c
>>> +++ b/drivers/net/bonding/bond_sysfs.c
>>> @@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device
>>> *d,
>>> 	struct slave *slave;
>>> 	struct bonding *bond = to_bond(d);
>>>
>>> +	rtnl_lock();
>>> 	write_lock_bh(&bond->lock);
>>> +	write_lock_bh(&bond->curr_slave_lock);
>>> +
>>> 	if (!USES_PRIMARY(bond->params.mode)) {
>>> 		printk(KERN_INFO DRV_NAME
>>> 		       ": %s: Unable to set primary slave; %s is in mode
>>> 		       %d\n",
>>> @@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device
>>> *d,
>>> 		}
>>> 	}
>>> out:
>>> +	write_unlock_bh(&bond->curr_slave_lock);
>>> 	write_unlock_bh(&bond->lock);
>>> -
>>> 	rtnl_unlock();
>>>
>>> 	return count;
>>> @@ -1191,6 +1194,7 @@ static ssize_t bonding_store_active_slave(struct
>>> device *d,
>>>
>>> 	rtnl_lock();
>>> 	write_lock_bh(&bond->lock);
>>> +	write_lock_bh(&bond->curr_slave_lock);
>>>
>>> 	if (!USES_PRIMARY(bond->params.mode)) {
>>> 		printk(KERN_INFO DRV_NAME
>>> @@ -1247,6 +1251,7 @@ static ssize_t bonding_store_active_slave(struct
>>> device *d,
>>> 		}
>>> 	}
>>> out:
>>> +	write_unlock_bh(&bond->curr_slave_lock);
>>> 	write_unlock_bh(&bond->lock);
>>> 	rtnl_unlock();
>>>
>>
>>
>> Best regards,
>>
>> 					Krzysztof Olędzki
>
> For now, I prefer Jay's original patch -- with the read_locks (rather
> than read/write_lock_bh) and the added rtnl_lock.  There is still a
> lockdep issue that we need to sort-out, but this patch is needed first.

This bug has not been fixed yet as it still exists in 2.6.24-rc7. Any 
chances to cure it before 2.6.24-final?

Best regards,

 				Krzysztof Olędzki

^ permalink raw reply

* [PATCH] AX25: nuke user trigable printks
From: maximilian attems @ 2008-01-07 17:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, maximilian attems

sfuzz trigerrs any of those printk easily.
things that should have gone in early 2.5.x aka years ago
should not be thrown so easily to the user.

Signed-off-by: maximilian attems <max@stro.at>
---
 net/ax25/af_ax25.c |   15 +++------------
 1 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 8378afd..62ab6bd 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1111,18 +1111,13 @@ static int __must_check ax25_connect(struct socket *sock,
 
 	if (addr_len == sizeof(struct sockaddr_ax25)) {
 		/* support for this will go away in early 2.5.x */
-		printk(KERN_WARNING "ax25_connect(): %s uses obsolete socket structure\n",
-			current->comm);
+		;
 	}
 	else if (addr_len != sizeof(struct full_sockaddr_ax25)) {
 		/* support for old structure may go away some time */
 		if ((addr_len < sizeof(struct sockaddr_ax25) + sizeof(ax25_address) * 6) ||
-		    (addr_len > sizeof(struct full_sockaddr_ax25))) {
+		    (addr_len > sizeof(struct full_sockaddr_ax25)))
 			return -EINVAL;
-		}
-
-		printk(KERN_WARNING "ax25_connect(): %s uses old (6 digipeater) socket structure.\n",
-			current->comm);
 	}
 
 	if (fsa->fsa_ax25.sax25_family != AF_AX25)
@@ -1468,8 +1463,7 @@ static int ax25_sendmsg(struct kiocb *iocb, struct socket *sock,
 		}
 
 		if (addr_len == sizeof(struct sockaddr_ax25)) {
-			printk(KERN_WARNING "ax25_sendmsg(): %s uses obsolete socket structure\n",
-				current->comm);
+			;
 		}
 		else if (addr_len != sizeof(struct full_sockaddr_ax25)) {
 			/* support for old structure may go away some time */
@@ -1478,9 +1472,6 @@ static int ax25_sendmsg(struct kiocb *iocb, struct socket *sock,
 				err = -EINVAL;
 				goto out;
 			}
-
-			printk(KERN_WARNING "ax25_sendmsg(): %s uses old (6 digipeater) socket structure.\n",
-				current->comm);
 		}
 
 		if (addr_len > sizeof(struct sockaddr_ax25) && usax->sax25_ndigis != 0) {
-- 
debian.1.5.3.7.1-dirty


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox