Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net-2.6 6/6] caif: Bugfix - use MSG_TRUNC in receive
From: sjur.brandeland @ 2010-05-21 12:16 UTC (permalink / raw)
  To: netdev, davem; +Cc: sjurbr, linus.walleij, marcel, Sjur Braendeland
In-Reply-To: <1274444172-31969-5-git-send-email-sjur.brandeland@stericsson.com>

From: Sjur Braendeland <sjur.brandeland@stericsson.com>

Fixed handling when skb don't fit in user buffer,
instead of returning -EMSGSIZE, the buffer is truncated (just
as unix seqpakcet does).

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
---
 net/caif/caif_socket.c |   47 ++++++++++++++++++-----------------------------
 1 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index 691a571..3d0e095 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -292,53 +292,42 @@ static void caif_check_flow_release(struct sock *sk)
 			caif_flow_ctrl(sk, CAIF_MODEMCMD_FLOW_ON_REQ);
 	}
 }
+
 /*
- * Copied from sock.c:sock_queue_rcv_skb(), and added check that user buffer
- * has sufficient size.
+ * Copied from unix_dgram_recvmsg, but removed credit checks,
+ * changed locking, address handling and added MSG_TRUNC.
  */
-
 static int caif_seqpkt_recvmsg(struct kiocb *iocb, struct socket *sock,
-				struct msghdr *m, size_t buf_len, int flags)
+				struct msghdr *m, size_t len, int flags)
 
 {
 	struct sock *sk = sock->sk;
 	struct sk_buff *skb;
-	int ret = 0;
-	int len;
+	int ret;
+	int copylen;
 
-	if (unlikely(!buf_len))
-		return -EINVAL;
+	ret = -EOPNOTSUPP;
+	if (m->msg_flags&MSG_OOB)
+		goto read_error;
 
 	skb = skb_recv_datagram(sk, flags, 0 , &ret);
 	if (!skb)
 		goto read_error;
-
-	len = skb->len;
-
-	if (skb && skb->len > buf_len && !(flags & MSG_PEEK)) {
-		len = buf_len;
-		/*
-		 * Push skb back on receive queue if buffer too small.
-		 * This has a built-in race where multi-threaded receive
-		 * may get packet in wrong order, but multiple read does
-		 * not really guarantee ordered delivery anyway.
-		 * Let's optimize for speed without taking locks.
-		 */
-
-		skb_queue_head(&sk->sk_receive_queue, skb);
-		ret = -EMSGSIZE;
-		goto read_error;
+	copylen = skb->len;
+	if (len < copylen) {
+		m->msg_flags |= MSG_TRUNC;
+		copylen = len;
 	}
 
-	ret = skb_copy_datagram_iovec(skb, 0, m->msg_iov, len);
+	ret = skb_copy_datagram_iovec(skb, 0, m->msg_iov, copylen);
 	if (ret)
-		goto read_error;
+		goto out_free;
 
+	ret = (flags & MSG_TRUNC) ? skb->len : copylen;
+out_free:
 	skb_free_datagram(sk, skb);
-
 	caif_check_flow_release(sk);
-
-	return len;
+	return ret;
 
 read_error:
 	return ret;
-- 
1.6.3.3


^ permalink raw reply related

* [PATCH] rtnetlink: Fix error handling in do_setlink()
From: David Howells @ 2010-05-21 12:25 UTC (permalink / raw)
  To: netdev; +Cc: David Howells, Chris Wright, David S. Miller

Commit c02db8c6290bb992442fec1407643c94cc414375:

	Author:  Chris Wright <chrisw@sous-sol.org>
	Date:    Sun May 16 01:05:45 2010 -0700
	Subject: rtnetlink: make SR-IOV VF interface symmetric

adds broken error handling to do_setlink() in net/core/rtnetlink.c.  The
problem is the following chunk of code:

	if (tb[IFLA_VFINFO_LIST]) {
		struct nlattr *attr;
		int rem;
		nla_for_each_nested(attr, tb[IFLA_VFINFO_LIST], rem) {
			if (nla_type(attr) != IFLA_VF_INFO)
  ---->				goto errout;
			err = do_setvfinfo(dev, attr);
			if (err < 0)
				goto errout;
			modified = 1;
		}
	}

which can get to errout without setting err, resulting in the following error:

net/core/rtnetlink.c: In function 'do_setlink':
net/core/rtnetlink.c:904: warning: 'err' may be used uninitialized in this function

Change the code to return -EINVAL in this case.  Note that this might not be
the appropriate error though.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chris Wright <chrisw@sous-sol.org>
cc: David S. Miller <davem@davemloft.net>
---

 net/core/rtnetlink.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)


diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 31e85d3..56b4157 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1030,8 +1030,10 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 		struct nlattr *attr;
 		int rem;
 		nla_for_each_nested(attr, tb[IFLA_VFINFO_LIST], rem) {
-			if (nla_type(attr) != IFLA_VF_INFO)
+			if (nla_type(attr) != IFLA_VF_INFO) {
+				err = -EINVAL;
 				goto errout;
+			}
 			err = do_setvfinfo(dev, attr);
 			if (err < 0)
 				goto errout;


^ permalink raw reply related

* Re: bnx2x + SFP+ DA/2.6.33.3: Got bad status 0x0 when reading from SFP+ EEPROM -> SFP+ module is not initialized
From: Eilon Greenstein @ 2010-05-21 12:58 UTC (permalink / raw)
  To: Krzysztof Olędzki; +Cc: Rick Jones, Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF5A19E.5010003@ans.pl>

On Thu, 2010-05-20 at 13:54 -0700, Krzysztof Olędzki wrote:
> On 2010-05-20 22:25, Rick Jones wrote:
> > Some simple/simplistic thoughts/questions...
> >
> > Has the DAC been used successfully prior to this?
> 
> Yes. It was successfully used to connect two HP switches, before I 
> received SFP+ SR modules, that allowed me to put the switches into 
> distanced rooms.

Krzysztof,

I would like to check in what speed the I2C is running at, and while we
are there, to make sure it is at low speed (100KHz and not 400KHz). Can
you please add this patch and let me know what you see:

diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
index ff70be8..6e05f99 100644
--- a/drivers/net/bnx2x_link.c
+++ b/drivers/net/bnx2x_link.c
@@ -2766,6 +2766,21 @@ static u8
bnx2x_8727_read_sfp_module_eeprom(struct link_params *params,
                      MDIO_PMA_REG_SFP_TWO_WIRE_CTRL,
                      &val);
 
+       /* Make sure we are working in 100KHz */
+       bnx2x_cl45_read(bp, port,
+                     PORT_HW_CFG_XGXS_EXT_PHY_TYPE_BCM8727,
+                     ext_phy_addr,
+                     MDIO_PMA_DEVAD,
+                     8005,
+                     &val);
+       BNX2X_ERR("I2C 2W speed 0x%x, bit 8 %d\n", val, val & 8);
+       bnx2x_cl45_write(bp, port,
+                      ext_phy_type,
+                      ext_phy_addr,
+                      MDIO_PMA_DEVAD,
+                      8005,
+                      val & ~8);
+
        /* Set the read command byte count */
        bnx2x_cl45_write(bp, port,
                       ext_phy_type,

Since we are already off for the weekend, I couldn't find a machine with
this kind of board to test it for myself. On top of that, I don't have
this type of DAC cable, so I need your help in this debug process.

Thanks,
Eilon




^ permalink raw reply related

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Neil Horman @ 2010-05-21 12:59 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Neil Horman, David Miller, eric.dumazet, bmb, tgraf, netdev
In-Reply-To: <20100521110847.GA29878@gondor.apana.org.au>

On Fri, May 21, 2010 at 09:08:47PM +1000, Herbert Xu wrote:
> On Fri, May 21, 2010 at 06:51:28AM -0400, Neil Horman wrote:
> >
> > I read it, we just described the issue in diferent terms.
> 
> Yeah I wasn't clear enough with my email without an actual patch :)
> 
No worries, its not an easy issue to describe :)

> Thanks for testing Neil!
Thanks for the patch, I'll let you know how it works out in just a bit here!
Neil

> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] net-2.6 : V2 - fix dev_get_valid_name
From: Daniel Lezcano @ 2010-05-21 13:10 UTC (permalink / raw)
  To: opurdila; +Cc: davem, netdev
In-Reply-To: <1274299939-28132-1-git-send-email-daniel.lezcano@free.fr>

On 05/19/2010 10:12 PM, Daniel Lezcano wrote:
> the commit:
>
> commit d90310243fd750240755e217c5faa13e24f41536
> Author: Octavian Purdila<opurdila@ixiacom.com>
> Date:   Wed Nov 18 02:36:59 2009 +0000
>
>      net: device name allocation cleanups
>
> introduced a bug when there is a hash collision making impossible
> to rename a device with eth%d. This bug is very hard to reproduce
> and appears rarely.
>
> The problem is coming from we don't pass a temporary buffer to
> __dev_alloc_name but 'dev->name' which is modified by the function.
>
> A detailed explanation is here:
>
> http://marc.info/?l=linux-netdev&m=127417784011987&w=2
>
> Changelog:
>   V2 : replaced strings comparison by pointers comparison
>
> Signed-off-by: Daniel Lezcano<daniel.lezcano@free.fr>
> ---
>    

Octavian, are you ok with this patch ?

^ permalink raw reply

* Re: bnx2x + SFP+ DA/2.6.33.3: Got bad status 0x0 when reading from SFP+ EEPROM -> SFP+ module is not initialized
From: Krzysztof Olędzki @ 2010-05-21 13:19 UTC (permalink / raw)
  To: eilong; +Cc: Rick Jones, Michael Chan, netdev@vger.kernel.org
In-Reply-To: <1274446713.11082.3.camel@lb-tlvb-eilong.il.broadcom.com>

On 2010-05-21 14:58, Eilon Greenstein wrote:
> On Thu, 2010-05-20 at 13:54 -0700, Krzysztof Olędzki wrote:
>> On 2010-05-20 22:25, Rick Jones wrote:
>>> Some simple/simplistic thoughts/questions...
>>>
>>> Has the DAC been used successfully prior to this?
>>
>> Yes. It was successfully used to connect two HP switches, before I
>> received SFP+ SR modules, that allowed me to put the switches into
>> distanced rooms.
>
> Krzysztof,
>
> I would like to check in what speed the I2C is running at, and while we
> are there, to make sure it is at low speed (100KHz and not 400KHz). Can
> you please add this patch and let me know what you see:

Sure.

>
> diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c

<CUT>

[bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
[bnx2x_8727_read_sfp_module_eeprom:2774(eth5)]I2C 2W speed 0x0, bit 8 0

> Since we are already off for the weekend, I couldn't find a machine with
> this kind of board to test it for myself. On top of that, I don't have
> this type of DAC cable, so I need your help in this debug process.

No problem. There is no need to ask for a help a person who depends on 
your support. ;) I would like to solve it as soon as possible so I'm 
willing to test everything you send me. ;)

best regards,

			Krzysztof Olędzki


^ permalink raw reply

* Re: [PATCH] net-2.6 : V2 - fix dev_get_valid_name
From: Octavian Purdila @ 2010-05-21 13:25 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: davem, netdev
In-Reply-To: <4BF68635.4070202@free.fr>

On Friday 21 May 2010 16:10:13 you wrote:
> On 05/19/2010 10:12 PM, Daniel Lezcano wrote:
> > the commit:
> >
> > commit d90310243fd750240755e217c5faa13e24f41536
> > Author: Octavian Purdila<opurdila@ixiacom.com>
> > Date:   Wed Nov 18 02:36:59 2009 +0000
> >
> >      net: device name allocation cleanups
> >
> > introduced a bug when there is a hash collision making impossible
> > to rename a device with eth%d. This bug is very hard to reproduce
> > and appears rarely.
> >
> > The problem is coming from we don't pass a temporary buffer to
> > __dev_alloc_name but 'dev->name' which is modified by the function.
> >
> > A detailed explanation is here:
> >
> > http://marc.info/?l=linux-netdev&m=127417784011987&w=2
> >
> > Changelog:
> >   V2 : replaced strings comparison by pointers comparison
> >
> > Signed-off-by: Daniel Lezcano<daniel.lezcano@free.fr>
> > ---
> 
> Octavian, are you ok with this patch ?
> 

Yes, all looks good to me, thanks for the fix.

Reviewed-by: Octavian Purdila <opurdila@ixiacom.com>


^ permalink raw reply

* [PATCH] iproute2: Add dsfield as alias for tos for ip rules
From: Arnd Hannemann @ 2010-05-21 14:10 UTC (permalink / raw)
  To: shemminger; +Cc: netdev, Arnd Hannemann

From: Arnd Hannemann <arnd@rhea.(none)>

Get ip rule parsing of "dsfield" in sync with ip route parsing and manual page.

Signed-off-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de>
---
 ip/iprule.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/ip/iprule.c b/ip/iprule.c
index 7140375..cbf8226 100644
--- a/ip/iprule.c
+++ b/ip/iprule.c
@@ -270,7 +270,8 @@ static int iprule_modify(int cmd, int argc, char **argv)
 			if (get_u32(&pref, *argv, 0))
 				invarg("preference value is invalid\n", *argv);
 			addattr32(&req.n, sizeof(req), FRA_PRIORITY, pref);
-		} else if (strcmp(*argv, "tos") == 0) {
+		} else if (strcmp(*argv, "tos") == 0 ||
+			   matches(*argv, "dsfield") == 0) {
 			__u32 tos;
 			NEXT_ARG();
 			if (rtnl_dsfield_a2n(&tos, *argv))
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH] net: add additional lock to qdisc to increase throughput
From: Eric Dumazet @ 2010-05-21 15:08 UTC (permalink / raw)
  To: Alexander Duyck, David Miller; +Cc: netdev
In-Reply-To: <20100323202553.21598.10754.stgit@gitlad.jf.intel.com>

Le mardi 23 mars 2010 à 13:25 -0700, Alexander Duyck a écrit :
> The qdisc layer shows a significant issue when you start transmitting from
> multiple CPUs.  The issue is that the transmit rate drops significantly, and I
> believe it is due to the fact that the spinlock is shared between the 1
> dequeue, and n-1 enqueue cpu threads.  In order to improve this situation I am
> adding one additional lock which will need to be obtained during the enqueue
> portion of the path.  This essentially allows sch_direct_xmit to jump to
> near the head of the line when attempting to obtain the lock after
> completing a transmit.
> 
> Running the script below I saw an increase from 200K packets per second to
> 1.07M packets per second as a result of this patch.
> 
> for j in `seq 0 15`; do
> 	for i in `seq 0 7`; do
> 		netperf -H <ip> -t UDP_STREAM -l 600 -N -T $i -- -m 6 &
> 	done
> done
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
>  include/net/sch_generic.h |    3 ++-
>  net/core/dev.c            |    7 ++++++-
>  net/sched/sch_generic.c   |    1 +
>  3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 67dc08e..f5088a9 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -51,7 +51,6 @@ struct Qdisc {
>  	struct list_head	list;
>  	u32			handle;
>  	u32			parent;
> -	atomic_t		refcnt;
>  	struct gnet_stats_rate_est	rate_est;
>  	int			(*reshape_fail)(struct sk_buff *skb,
>  					struct Qdisc *q);
> @@ -65,6 +64,8 @@ struct Qdisc {
>  	struct netdev_queue	*dev_queue;
>  	struct Qdisc		*next_sched;
>  
> +	atomic_t		refcnt;
> +	spinlock_t		enqueue_lock;
>  	struct sk_buff		*gso_skb;
>  	/*
>  	 * For performance sake on SMP, we put highly modified fields at the end
> diff --git a/net/core/dev.c b/net/core/dev.c
> index a03aab4..8a2d508 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2006,8 +2006,10 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  	spinlock_t *root_lock = qdisc_lock(q);
>  	int rc;
>  
> +	spin_lock(&q->enqueue_lock);
>  	spin_lock(root_lock);
>  	if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
> +		spin_unlock(&q->enqueue_lock);
>  		kfree_skb(skb);
>  		rc = NET_XMIT_DROP;
>  	} else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
> @@ -2018,7 +2020,9 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  		 * xmit the skb directly.
>  		 */
>  		__qdisc_update_bstats(q, skb->len);
> -		if (sch_direct_xmit(skb, q, dev, txq, root_lock))
> +		spin_unlock(&q->enqueue_lock);
> +		rc = sch_direct_xmit(skb, q, dev, txq, root_lock);
> +		if (rc)
>  			__qdisc_run(q);
>  		else
>  			clear_bit(__QDISC_STATE_RUNNING, &q->state);
> @@ -2026,6 +2030,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
>  		rc = NET_XMIT_SUCCESS;
>  	} else {
>  		rc = qdisc_enqueue_root(skb, q);
> +		spin_unlock(&q->enqueue_lock);
>  		qdisc_run(q);
>  	}
>  	spin_unlock(root_lock);
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 5173c1e..4b5e125 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -538,6 +538,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
>  	sch = (struct Qdisc *) QDISC_ALIGN((unsigned long) p);
>  	sch->padded = (char *) sch - (char *) p;
>  
> +	spin_lock_init(&sch->enqueue_lock);
>  	INIT_LIST_HEAD(&sch->list);
>  	skb_queue_head_init(&sch->q);
>  	sch->ops = ops;
> 

Hi Alexander

What about following patch instead, adding an heuristic to avoid an
extra atomic operation in fast path ?

I also try to release the 'busylock' after the main lock to favor the
worker as much as possible. It must be released before main lock only
before a call to __qdisc_run()

Another refinement would be to make the spinning done outside of BH
disabled context, because we currently delay TX completion (if NAPI is
used by driver) that can make room in device TX ring buffer. This would
allow cpus to process incoming traffic too...

Thanks

[PATCH] net: add additional lock to qdisc to increase throughput

When many cpus compete for sending frames on a given qdisc, the qdisc
spinlock suffers from very high contention.

The cpu owning __QDISC_STATE_RUNNING bit has same priority to acquire
the lock, and cannot dequeue packets fast enough, since it must wait for
this lock for each dequeued packet.

One solution to this problem is to force all cpus spinning on a second
lock before trying to get the main lock, when/if they see
__QDISC_STATE_RUNNING already set.

The owning cpu then compete with at most one other cpu for the main
lock, allowing for higher dequeueing rate.

Based on a previous patch from Alexander Duyck. I added the heuristic to
avoid the atomic in fast path, and put the new lock far away from the
cache line used by the dequeue worker. Also try to release the busylock
lock as late as possible.

Tests with following script gave a boost from ~50.000 pps to ~600.000
pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver.
(A single netperf flow can reach ~800.000 pps on this platform)

for j in `seq 0 3`; do
  for i in `seq 0 7`; do
    netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 &
  done
done


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sch_generic.h |    3 ++-
 net/core/dev.c            |   29 +++++++++++++++++++++++++----
 net/sched/sch_generic.c   |    1 +
 3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 03ca5d8..2d55649 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -73,7 +73,8 @@ struct Qdisc {
 	struct sk_buff_head	q;
 	struct gnet_stats_basic_packed bstats;
 	struct gnet_stats_queue	qstats;
-	struct rcu_head     rcu_head;
+	struct rcu_head		rcu_head;
+	spinlock_t		busylock;
 };
 
 struct Qdisc_class_ops {
diff --git a/net/core/dev.c b/net/core/dev.c
index 6c82065..1b313eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2040,6 +2040,16 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 {
 	spinlock_t *root_lock = qdisc_lock(q);
 	int rc;
+	bool contended = test_bit(__QDISC_STATE_RUNNING, &q->state);
+
+	/*
+	 * Heuristic to force contended enqueues to serialize on a
+	 * separate lock before trying to get qdisc main lock.
+	 * This permits __QDISC_STATE_RUNNING owner to get the lock more often
+	 * and dequeue packets faster.
+	 */
+	if (unlikely(contended))
+		spin_lock(&q->busylock);
 
 	spin_lock(root_lock);
 	if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
@@ -2055,19 +2065,30 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		if (!(dev->priv_flags & IFF_XMIT_DST_RELEASE))
 			skb_dst_force(skb);
 		__qdisc_update_bstats(q, skb->len);
-		if (sch_direct_xmit(skb, q, dev, txq, root_lock))
+		if (sch_direct_xmit(skb, q, dev, txq, root_lock)) {
+			if (unlikely(contended)) {
+				spin_unlock(&q->busylock);
+				contended = false;
+			}
 			__qdisc_run(q);
-		else
+		} else
 			clear_bit(__QDISC_STATE_RUNNING, &q->state);
 
 		rc = NET_XMIT_SUCCESS;
 	} else {
 		skb_dst_force(skb);
 		rc = qdisc_enqueue_root(skb, q);
-		qdisc_run(q);
+		if (!test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {
+			if (unlikely(contended)) {
+				spin_unlock(&q->busylock);
+				contended = false;
+			}
+			__qdisc_run(q);
+		}
 	}
 	spin_unlock(root_lock);
-
+	if (unlikely(contended))
+		spin_unlock(&q->busylock);
 	return rc;
 }
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a63029e..0a44290 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -543,6 +543,7 @@ struct Qdisc *qdisc_alloc(struct netdev_queue *dev_queue,
 
 	INIT_LIST_HEAD(&sch->list);
 	skb_queue_head_init(&sch->q);
+	spin_lock_init(&sch->busylock);
 	sch->ops = ops;
 	sch->enqueue = ops->enqueue;
 	sch->dequeue = ops->dequeue;




^ permalink raw reply related

* Re: [PATCH 1/3] cgroups: Add an API to attach a task to current task's cgroup
From: Sridhar Samudrala @ 2010-05-21 15:09 UTC (permalink / raw)
  To: Paul Menage
  Cc: Sridhar Samudrala, Michael S. Tsirkin, netdev,
	kvm@vger.kernel.org, lkml
In-Reply-To: <AANLkTinsrFoLVKDFM5pcKcL_6MvAzhR6IzbNmWKh3BDh@mail.gmail.com>

On 5/20/2010 3:22 PM, Paul Menage wrote:
> On Tue, May 18, 2010 at 5:04 PM, Sridhar Samudrala
> <samudrala.sridhar@gmail.com>  wrote:
>    
>> Add a new kernel API to attach a task to current task's cgroup
>> in all the active hierarchies.
>>
>> Signed-off-by: Sridhar Samudrala<sri@us.ibm.com>
>>      
> Reviewed-by: Paul Menage<menage@google.com>
>
> It would be more efficient to just attach directly to current->cgroups
> rather than potentially creating/destroying one css_set for each
> hierarchy until we've completely converged on current->cgroups - but
> that would require a bunch of refactoring of the guts of
> cgroup_attach_task() to ensure that the right can_attach()/attach()
> callbacks are made. That doesn't really seem worthwhile right now for
> the initial use, that I imagine isn't going to be
> performance-sensitive.
>    
Yes. In our use-case, this will be called only once per guest interface 
when the guest comes up.
Hope you or someone more familiar with cgroups subsystem can optimize 
this function later.

Thanks
Sridhar


^ permalink raw reply

* Re: linux-next: manual merge of the driver-core tree with the net tree
From: Greg KH @ 2010-05-21 15:13 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Miller, sfr, linux-next, linux-kernel, therbert, netdev
In-Reply-To: <m1pr0pwvgk.fsf@fess.ebiederm.org>

On Fri, May 21, 2010 at 12:43:55AM -0700, Eric W. Biederman wrote:
> David Miller <davem@davemloft.net> writes:
> 
> > From: ebiederm@xmission.com (Eric W. Biederman)
> > Date: Thu, 20 May 2010 23:46:22 -0700
> >
> >> It looks right, except perhaps the RPS code looks like it will cause a
> >> build failure with sysfs disabled, but that has nothing to do with your
> >> changes.
> >
> > CONFIG_RPS depends upon CONFIG_SMP && CONFIG_SYSFS, so no that build
> > failure is not possible.
> 
> That's the bit I'm missing.  My apologies if I caused any unnecessary worry.

I've now fixed this all up in my tree and have pushed it out.
I'll also go push the patches themselves to Linus later today.

thanks,

greg k-h

^ permalink raw reply

* [PATCH] [RFC] Add ip route {save,restore}
From: Dan Smith @ 2010-05-21 15:14 UTC (permalink / raw)
  To: shemminger; +Cc: netdev

This patch adds save and restore commands to "ip route". Save dumps
the RTNL stream to stdout which can then be passed to restore later.
This may be helpful in some normal situations, and will allow C/R to
migrate the routing information in userspace.  Tweaking of the stream
can be done by userspace helpers to convert between versions and adjust
things like device indexes when restoring routes in a different
environment.

By factoring out some of the common bits of print_route() into
filter_nlmsg(), the "save" command can use the same selection logic
as "list," allowing the caller to save only specific routes as
necessary.

Signed-off-by: Dan Smith <danms@us.ibm.com>
---
 ip/iproute.c |  169 +++++++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 127 insertions(+), 42 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 8252e18..3049975 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -23,6 +23,7 @@
 #include <netinet/ip.h>
 #include <arpa/inet.h>
 #include <linux/in_route.h>
+#include <errno.h>
 
 #include "rt_names.h"
 #include "utils.h"
@@ -32,7 +33,11 @@
 #define RTAX_RTTVAR RTAX_HOPS
 #endif
 
-
+enum list_action {
+	IPROUTE_LIST,
+	IPROUTE_FLUSH,
+	IPROUTE_SAVE,
+};
 static const char *mx_names[RTAX_MAX+1] = {
 	[RTAX_MTU]	= "mtu",
 	[RTAX_WINDOW]	= "window",
@@ -53,6 +58,8 @@ static void usage(void) __attribute__((noreturn));
 static void usage(void)
 {
 	fprintf(stderr, "Usage: ip route { list | flush } SELECTOR\n");
+	fprintf(stderr, "       ip route save SELECTOR\n");
+	fprintf(stderr, "       ip route restore\n");
 	fprintf(stderr, "       ip route get ADDRESS [ from ADDRESS iif STRING ]\n");
 	fprintf(stderr, "                            [ oif STRING ]  [ tos TOS ]\n");
 	fprintf(stderr, "       ip route { add | del | change | append | replace | monitor } ROUTE\n");
@@ -115,46 +122,16 @@ static int flush_update(void)
 	return 0;
 }
 
-int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int filter_nlmsg(struct nlmsghdr *n, struct rtattr **tb, int host_len)
 {
-	FILE *fp = (FILE*)arg;
 	struct rtmsg *r = NLMSG_DATA(n);
-	int len = n->nlmsg_len;
-	struct rtattr * tb[RTA_MAX+1];
-	char abuf[256];
 	inet_prefix dst;
 	inet_prefix src;
-	inet_prefix prefsrc;
 	inet_prefix via;
-	int host_len = -1;
-	static int ip6_multiple_tables;
+	inet_prefix prefsrc;
 	__u32 table;
-	SPRINT_BUF(b1);
-	static int hz;
-
-	if (n->nlmsg_type != RTM_NEWROUTE && n->nlmsg_type != RTM_DELROUTE) {
-		fprintf(stderr, "Not a route: %08x %08x %08x\n",
-			n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
-		return 0;
-	}
-	if (filter.flushb && n->nlmsg_type != RTM_NEWROUTE)
-		return 0;
-	len -= NLMSG_LENGTH(sizeof(*r));
-	if (len < 0) {
-		fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
-		return -1;
-	}
-
-	if (r->rtm_family == AF_INET6)
-		host_len = 128;
-	else if (r->rtm_family == AF_INET)
-		host_len = 32;
-	else if (r->rtm_family == AF_DECnet)
-		host_len = 16;
-	else if (r->rtm_family == AF_IPX)
-		host_len = 80;
+	static int ip6_multiple_tables;
 
-	parse_rtattr(tb, RTA_MAX, RTM_RTA(r), len);
 	table = rtm_get_table(r, tb);
 
 	if (r->rtm_family == AF_INET6 && table != RT_TABLE_MAIN)
@@ -281,6 +258,56 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	    *(int*)RTA_DATA(tb[RTA_PRIORITY]) == -1)
 		return 0;
 
+	return 1;
+}
+
+int calc_host_len(struct rtmsg *r)
+{
+	if (r->rtm_family == AF_INET6)
+		return 128;
+	else if (r->rtm_family == AF_INET)
+		return 32;
+	else if (r->rtm_family == AF_DECnet)
+		return 16;
+	else if (r->rtm_family == AF_IPX)
+		return 80;
+	else
+		return -1;
+}
+
+int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+{
+	FILE *fp = (FILE*)arg;
+	struct rtmsg *r = NLMSG_DATA(n);
+	int len = n->nlmsg_len;
+	struct rtattr * tb[RTA_MAX+1];
+	char abuf[256];
+	int host_len = -1;
+	__u32 table;
+	SPRINT_BUF(b1);
+	static int hz;
+
+	if (n->nlmsg_type != RTM_NEWROUTE && n->nlmsg_type != RTM_DELROUTE) {
+		fprintf(stderr, "Not a route: %08x %08x %08x\n",
+			n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
+		return 0;
+	}
+	if (filter.flushb && n->nlmsg_type != RTM_NEWROUTE)
+		return 0;
+	len -= NLMSG_LENGTH(sizeof(*r));
+	if (len < 0) {
+		fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
+		return -1;
+	}
+
+	host_len = calc_host_len(r);
+
+ 	parse_rtattr(tb, RTA_MAX, RTM_RTA(r), len);
+	table = rtm_get_table(r, tb);
+
+	if (!filter_nlmsg(n, tb, host_len))
+		return 0;
+
 	if (filter.flushb) {
 		struct nlmsghdr *fn;
 		if (NLMSG_ALIGN(filter.flushp) + n->nlmsg_len > filter.flushe) {
@@ -1041,17 +1068,51 @@ static int iproute_flush_cache(void)
 	return 0;
 }
 
+int save_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+{
+	int ret;
+	int len = n->nlmsg_len;
+	struct rtmsg *r = NLMSG_DATA(n);
+	struct rtattr *tb[RTA_MAX+1];
+	int host_len = -1;
+
+	if (isatty(STDOUT_FILENO)) {
+		fprintf(stderr, "Not sending binary stream to stdout\n");
+		return -1;
+	}
+
+	host_len = calc_host_len(r);
+	len -= NLMSG_LENGTH(sizeof(*r));
+ 	parse_rtattr(tb, RTA_MAX, RTM_RTA(r), len);
+
+	if (!filter_nlmsg(n, tb, host_len))
+		return 0;
+
+	ret = write(STDOUT_FILENO, n, n->nlmsg_len);
+	if ((ret > 0) && (ret != n->nlmsg_len)) {
+		fprintf(stderr, "Short write while saving nlmsg\n");
+		ret = -EIO;
+	}
+
+	return ret == n->nlmsg_len ? 0 : ret;
+}
 
-static int iproute_list_or_flush(int argc, char **argv, int flush)
+static int iproute_list_flush_or_save(int argc, char **argv, int action)
 {
 	int do_ipv6 = preferred_family;
 	char *id = NULL;
 	char *od = NULL;
+	rtnl_filter_t filter_fn;
+
+	if (action == IPROUTE_SAVE)
+		filter_fn = save_route;
+	else
+		filter_fn = print_route;
 
 	iproute_reset_filter();
 	filter.tb = RT_TABLE_MAIN;
 
-	if (flush && argc <= 0) {
+	if ((action == IPROUTE_FLUSH) && argc <= 0) {
 		fprintf(stderr, "\"ip route flush\" requires arguments.\n");
 		return -1;
 	}
@@ -1201,7 +1262,7 @@ static int iproute_list_or_flush(int argc, char **argv, int flush)
 		}
 	}
 
-	if (flush) {
+	if (action == IPROUTE_FLUSH) {
 		int round = 0;
 		char flushb[4096-512];
 		time_t start = time(0);
@@ -1226,7 +1287,7 @@ static int iproute_list_or_flush(int argc, char **argv, int flush)
 				exit(1);
 			}
 			filter.flushed = 0;
-			if (rtnl_dump_filter(&rth, print_route, stdout, NULL, NULL) < 0) {
+			if (rtnl_dump_filter(&rth, filter_fn, stdout, NULL, NULL) < 0) {
 				fprintf(stderr, "Flush terminated\n");
 				exit(1);
 			}
@@ -1269,7 +1330,7 @@ static int iproute_list_or_flush(int argc, char **argv, int flush)
 		}
 	}
 
-	if (rtnl_dump_filter(&rth, print_route, stdout, NULL, NULL) < 0) {
+	if (rtnl_dump_filter(&rth, filter_fn, stdout, NULL, NULL) < 0) {
 		fprintf(stderr, "Dump terminated\n");
 		exit(1);
 	}
@@ -1436,6 +1497,26 @@ int iproute_get(int argc, char **argv)
 	exit(0);
 }
 
+int restore_handler(struct sockaddr_nl *nl, struct nlmsghdr *n, void *arg)
+{
+	int ret;
+
+	n->nlmsg_flags |= NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;
+
+	ll_init_map(&rth);
+
+	ret = rtnl_talk(&rth, n, 0, 0, n, NULL, NULL);
+	if ((ret < 0) && (errno == EEXIST))
+		ret = 0;
+
+	return ret;
+}
+
+int iproute_restore(void)
+{
+	exit(rtnl_from_file(stdin, &restore_handler, NULL));
+}
+
 void iproute_reset_filter()
 {
 	memset(&filter, 0, sizeof(filter));
@@ -1446,7 +1527,7 @@ void iproute_reset_filter()
 int do_iproute(int argc, char **argv)
 {
 	if (argc < 1)
-		return iproute_list_or_flush(0, NULL, 0);
+		return iproute_list_flush_or_save(0, NULL, IPROUTE_LIST);
 
 	if (matches(*argv, "add") == 0)
 		return iproute_modify(RTM_NEWROUTE, NLM_F_CREATE|NLM_F_EXCL,
@@ -1471,11 +1552,15 @@ int do_iproute(int argc, char **argv)
 				      argc-1, argv+1);
 	if (matches(*argv, "list") == 0 || matches(*argv, "show") == 0
 	    || matches(*argv, "lst") == 0)
-		return iproute_list_or_flush(argc-1, argv+1, 0);
+		return iproute_list_flush_or_save(argc-1, argv+1, IPROUTE_LIST);
 	if (matches(*argv, "get") == 0)
 		return iproute_get(argc-1, argv+1);
 	if (matches(*argv, "flush") == 0)
-		return iproute_list_or_flush(argc-1, argv+1, 1);
+		return iproute_list_flush_or_save(argc-1, argv+1, IPROUTE_FLUSH);
+	if (matches(*argv, "save") == 0)
+		return iproute_list_flush_or_save(argc-1, argv+1, IPROUTE_SAVE);
+	if (matches(*argv, "restore") == 0)
+		return iproute_restore();
 	if (matches(*argv, "help") == 0)
 		usage();
 	fprintf(stderr, "Command \"%s\" is unknown, try \"ip route help\".\n", *argv);
-- 
1.7.0.4


^ permalink raw reply related

* Re: [PATCH] [RFC] Add ip route {save,restore}
From: Stephen Hemminger @ 2010-05-21 15:26 UTC (permalink / raw)
  To: Dan Smith; +Cc: netdev
In-Reply-To: <1274454863-5146-1-git-send-email-danms@us.ibm.com>

On Fri, 21 May 2010 08:14:23 -0700
Dan Smith <danms@us.ibm.com> wrote:

> This patch adds save and restore commands to "ip route". Save dumps
> the RTNL stream to stdout which can then be passed to restore later.
> This may be helpful in some normal situations, and will allow C/R to
> migrate the routing information in userspace.  Tweaking of the stream
> can be done by userspace helpers to convert between versions and adjust
> things like device indexes when restoring routes in a different
> environment.
> 
> By factoring out some of the common bits of print_route() into
> filter_nlmsg(), the "save" command can use the same selection logic
> as "list," allowing the caller to save only specific routes as
> necessary.
> 
> Signed-off-by: Dan Smith <danms@us.ibm.com>

Looks good, please add man page and doc entry for this in final
version.

^ permalink raw reply

* bug fix patch lost: git problem or just incorrect merge?
From: James Bottomley @ 2010-05-21 15:41 UTC (permalink / raw)
  To: Linus Torvalds, David Miller; +Cc: linux-kernel, netdev, linux-scsi

The patch in question is this one (upstream for a while):

commit d7d05548a62c87ee55b0c81933669177f885aa8d
Author: Mike Christie <michaelc@cs.wisc.edu>
Date:   Wed Mar 31 14:41:35 2010 -0500

    [SCSI] iscsi_tcp: fix relogin/shutdown hang

It's a simple one line change in iscsi_tcp.c (diff clipped):

--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -599,7 +599,7 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
-       if (sock->sk->sk_sleep && waitqueue_active(sock->sk->sk_sleep)) {
+       if (sock->sk->sk_sleep) {

It was killed by this merge commit in the net-next tree:

commit 278554bd6579206921f5d8a523649a7a57f8850d
Merge: 5a147e8 cea0d76
Author: David S. Miller <davem@davemloft.net>
Date:   Wed May 12 00:05:35 2010 -0700

However, the curious thing is that git seems to have lost trace of the
missing patch entirely:  if I try to find it in linus' tree with a git
log -- drivers/scsi/iscsi_tcp.c, it doesn't show up.  The merge commit
which killed it does list iscsi_tcp.c as a file conflict, but git show
on that commit doesn't list that file in the resolution diff ... even
though this is where it actually got killed.

Is this a git problem ... or is it just a mismerge in the net tree?

Either way, of course, we need the patch back ...

James

^ permalink raw reply

* Re: [RFC PATCH] net: add additional lock to qdisc to increase enqueue/dequeue fairness
From: Stephen Hemminger @ 2010-05-21 15:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, alexander.h.duyck, netdev
In-Reply-To: <1269382380.2915.40.camel@edumazet-laptop>

On Tue, 23 Mar 2010 23:13:00 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le mardi 23 mars 2010 à 14:45 -0700, David Miller a écrit :
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Tue, 23 Mar 2010 21:54:27 +0100
> > 
> > > Quite frankly, the real problem in this case is not the reduced
> > > throughput, but fact that one cpu can stay a long time doing the xmits
> > > to device, of skb queued by other cpus. This can hurt latencies a lot,
> > > for real time threads for example...
> > > 
> > > I wonder if ticket spinlocks are not the problem. Maybe we want a
> > > variant of spinlocks, so that cpu doing transmits can get the lock
> > > before other cpus...
> > 
> > I want to note that things operate the way they do now
> > intentionally.
> > 
> > Herbert Xu and Jamal Hadi Salim were active in this area
> > about 4 years ago.
> 
> Yes, but ticket spinlocks were added after their work (in 2008 - 2.6.25
> if I remember well) and change things.
> 
> We want cpu owning __QDISC_STATE_RUNNING being able to re-get the lock
> as fast as possible. Alexander results can show the possible speedup.

What about having a special function (spin_lock_greedy?) that just ignores
the ticket mechanism and always assumes it has right to next ticket.



-- 

^ permalink raw reply

* Namespaces and devices
From: Martín Ferrari @ 2010-05-21 16:27 UTC (permalink / raw)
  To: netdev; +Cc: Mathieu Lacage

Hi,

Sorry if this is a dumb question, but I couldn't find any
documentation that matches the current behaviour... So I don't know if
what I see is what is intended, or if it's a bug.

I would like to know what is the exact behaviour re. devices when a
netns is destroyed, and which kind of devices can be moved.

According to http://lxc.sourceforge.net/network/configuration.php,
devices assigned to a netns should move to the main netns when the
former is destroyed. What I see is that the devices are deleted, at
least for veth and dummy devices. I also see a bug I previously
reported that caused an oops in some cases.

Also, I have read somewhere (now I cannot find it) that supposedly, I
should be able to move real devices to a netns, but I always get
Invalid argument errors.

Thanks.

-- 
Martín Ferrari

^ permalink raw reply

* ixgbe and SRIOV failure in driver?
From: Fischer, Anna @ 2010-05-21 16:32 UTC (permalink / raw)
  To: e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org

I am running a system with 3 Intel NICs. Two of them are 82598 devices, and one is a SRIOV capable 82599.

All devices use the ixgbe driver. What happens (I believe) now is that when the driver loads at first, it sees the 82598 first (because of its position in the PCI tree) and then it says "Device not IOV capable - switching off IOV."

So then it switches into non-IOV mode, and I can never enable SRIOV on my 82599, because the driver does not enable it any more for further devices.

So to get around this issue, I tried to use pciback.hide to hide the 82598 devices from the OS. That way I was hoping that the driver would switch on SRIOV on my 82599. However, then I got a kernel panic on boot (see below).

I am running Xen 4 and the Dom0 kernel is a 2.6.31 kernel.

Any help would be greatly appreciated.

Thanks,
Anna

Starting udev: (XEN) ioapic_guest_write: apic=0, pin=16, irq=16
(XEN) ioapic_guest_write: new_entry=0001a9a8
(XEN) ioapic_guest_write: old_entry=0000a9a8 pirq=16
(XEN) ioapic_guest_write: Attempt to modify IO-APIC pin for in-use IRQ!
(XEN) printk: 11 messages suppressed.
(XEN) mm.c:859:d0 Error getting mfn 40000 (pfn 5555555555555555) from L1 entry 3
(XEN) mm.c:859:d0 Error getting mfn 40001 (pfn 5555555555555555) from L1 entry 3
(XEN) mm.c:859:d0 Error getting mfn 40002 (pfn 5555555555555555) from L1 entry 3
(XEN) mm.c:859:d0 Error getting mfn 40003 (pfn 5555555555555555) from L1 entry 3
[   23.067972] BUG: unable to handle kernel paging request at ffffc90020e10010
[   23.067985] IP: [<ffffffffa00c6652>] ixgbe_init_ops_generic+0x12/0x1c0 [ixgb]
[   23.068004] PGD 220a067 PUD 220b067 PMD 3e841067 PTE 0
[   23.068004] Oops: 0000 [#1] SMP 
[   23.068004] last sysfs file: /sys/devices/pci0000:00/0000:00:1d.1/usb6/dev
[   23.068004] CPU 0 
[   23.068004] Modules linked in: ixgbe(+) e1000e usb_storage ahci libata sd_mod
[   23.068004] Pid: 1172, comm: modprobe Not tainted 2.6.31.13 #2 Montevina Plam
[   23.068004] RIP: e030:[<ffffffffa00c6652>]  [<ffffffffa00c6652>] ixgbe_init_]
[   23.068004] RSP: e02b:ffff88003d981c78  EFLAGS: 00010246
[   23.068004] RAX: ffff88003c1ed550 RBX: ffff88003c1ed550 RCX: 0000000080050008
[   23.068004] RDX: ffffc90020e00000 RSI: ffff88003c1ed7f0 RDI: ffff88003c1ed540
[   23.068004] RBP: ffff88003d981c78 R08: 0000000000000030 R09: 00000000fffffffb
[   23.068004] R10: 0000000000000000 R11: 800000000000056b R12: ffff88003c1ed540
[   23.068004] R13: ffff88003c1ed740 R14: 00000000fffffffb R15: ffff88003c1ec000
[   23.068004] FS:  00007f24a1a736e0(0000) GS:ffffc90000000000(0000) knlGS:00000
[   23.068004] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[   23.068004] CR2: ffffc90020e10010 CR3: 000000003c183000 CR4: 0000000000002660
[   23.068004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.068004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   23.068004] Process modprobe (pid: 1172, threadinfo ffff88003d980000, task f)
[   23.068004] Stack:
[   23.068004]  ffff88003d981ca8 ffffffffa00cf7f6 ffff88003c1ed540 ffff88003c1e0
[   23.068004] <0> ffff88003c1ed540 00000000fffffffb ffff88003d981cc8 ffffffffa8
[   23.068004] <0> 00000000fffffffb ffff88003fab5000 ffff88003d981d48 ffffffffa7
[   23.068004] Call Trace:
[   23.068004]  [<ffffffffa00cf7f6>] ixgbe_init_ops_82599+0x36/0x1a0 [ixgbe]
[   23.068004]  [<ffffffffa00c8818>] ixgbe_init_shared_code+0x48/0x50 [ixgbe]
[   23.068004]  [<ffffffffa00d6b37>] ixgbe_probe+0x317/0x1380 [ixgbe]
[   23.068004]  [<ffffffff812147b0>] ? pci_match_id+0x1f/0x43
[   23.068004]  [<ffffffff812147e6>] local_pci_probe+0x12/0x16
[   23.068004]  [<ffffffff812152fc>] pci_device_probe+0x5c/0x88
[   23.068004]  [<ffffffff812e23e4>] ? driver_sysfs_add+0x4d/0x73
[   23.068004]  [<ffffffff812e2535>] driver_probe_device+0xb2/0x136
[   23.068004]  [<ffffffff812e260d>] __driver_attach+0x54/0x77
[   23.068004]  [<ffffffff812e25b9>] ? __driver_attach+0x0/0x77
[   23.068004]  [<ffffffff812e25b9>] ? __driver_attach+0x0/0x77
[   23.068004]  [<ffffffff812e1bb3>] bus_for_each_dev+0x49/0x78
[   23.068004]  [<ffffffff812e2395>] driver_attach+0x1c/0x1e
[   23.068004]  [<ffffffff812e16c2>] bus_add_driver+0xba/0x220
[   23.068004]  [<ffffffff812e299a>] driver_register+0x9e/0x115
[   23.068004]  [<ffffffff81215574>] __pci_register_driver+0x50/0xad
[   23.068004]  [<ffffffffa00eb000>] ? ixgbe_init_module+0x0/0x56 [ixgbe]
[   23.068004]  [<ffffffffa00eb054>] ixgbe_init_module+0x54/0x56 [ixgbe]
[   23.068004]  [<ffffffff8100a0e7>] do_one_initcall+0x59/0x159
[   23.068004]  [<ffffffff81095311>] sys_init_module+0xcb/0x1fe
[   23.068004]  [<ffffffff81033cc2>] system_call_fastpath+0x16/0x1b
[   23.068004] Code: 5d c9 c3 48 83 c4 08 b8 ff ff ff ff 5b 41 5c 41 5d c9 c3 6 
[   23.068004] RIP  [<ffffffffa00c6652>] ixgbe_init_ops_generic+0x12/0x1c0 [ixg]
[   23.068004]  RSP <ffff88003d981c78>
[   23.068004] CR2: ffffc90020e10010
[   23.068004] ---[ end trace a981c9cfc87a2d66 ]---

------------------------------------------------------------------------------

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel&#174; Ethernet, visit http://communities.intel.com/community/wired

^ permalink raw reply

* Re: tun: Use netif_receive_skb instead of netif_rx
From: Neil Horman @ 2010-05-21 16:40 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, eric.dumazet, bmb, tgraf, nhorman, netdev
In-Reply-To: <20100521110847.GA29878@gondor.apana.org.au>

On Fri, May 21, 2010 at 09:08:47PM +1000, Herbert Xu wrote:
> On Fri, May 21, 2010 at 06:51:28AM -0400, Neil Horman wrote:
> >
> > I read it, we just described the issue in diferent terms.
> 
> Yeah I wasn't clear enough with my email without an actual patch :)
> 
> Thanks for testing Neil!

So a bit of bad news.  I think you're patch is correct herbert but its still not
working.  When I initially started messing with this it was on 2.6.32.  But
since you're patches I (naturally) started testing them on 2.6.34.  Whereas in
2.6.32 the tasks classid value for the net_cls cgroup subsys was set properly,
in 2.6.34 its not.  It appears since we started using the cgroup subsys
registration calls, we never registered an attach method, so we never set the
tasks classid, so the comparison always fails.

There may also be an issue with the setting of the classid (possible use of the
wrong subsys id value when grabbing our cgroup_subsys_state), but I'm checking
on that now.

Best
Neil

> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> 

^ permalink raw reply

* Re: bnx2x + SFP+ DA/2.6.33.3: Got bad status 0x0 when reading from SFP+ EEPROM -> SFP+ module is not initialized
From: Krzysztof Olędzki @ 2010-05-21 16:43 UTC (permalink / raw)
  To: eilong; +Cc: Rick Jones, Michael Chan, netdev@vger.kernel.org
In-Reply-To: <4BF68854.5060409@ans.pl>

On 2010-05-21 15:19, Krzysztof Olędzki wrote:
> On 2010-05-21 14:58, Eilon Greenstein wrote:
>> On Thu, 2010-05-20 at 13:54 -0700, Krzysztof Olędzki wrote:
>>> On 2010-05-20 22:25, Rick Jones wrote:
>>>> Some simple/simplistic thoughts/questions...
>>>>
>>>> Has the DAC been used successfully prior to this?
>>>
>>> Yes. It was successfully used to connect two HP switches, before I
>>> received SFP+ SR modules, that allowed me to put the switches into
>>> distanced rooms.
>>
>> Krzysztof,
>>
>> I would like to check in what speed the I2C is running at, and while we
>> are there, to make sure it is at low speed (100KHz and not 400KHz). Can
>> you please add this patch and let me know what you see:
> 
> Sure.
> 
>>
>> diff --git a/drivers/net/bnx2x_link.c b/drivers/net/bnx2x_link.c
> 
> <CUT>
> 
> [bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
> [bnx2x_8727_read_sfp_module_eeprom:2774(eth5)]I2C 2W speed 0x0, bit 8 0
> 
>> Since we are already off for the weekend, I couldn't find a machine with
>> this kind of board to test it for myself. On top of that, I don't have
>> this type of DAC cable, so I need your help in this debug process.
> 
> No problem. There is no need to ask for a help a person who depends on
> your support. ;) I would like to solve it as soon as possible so I'm
> willing to test everything you send me. ;)

Hello again,

To make sure the NICs are OK I made a test and plugged a SFP+ SR module (HP J9150A):

[bnx2x_8727_handle_mod_abs:4639(eth4)]MOD_ABS indication show module is present
[bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
[bnx2x_8727_read_sfp_module_eeprom:2835(eth4)]Got bad status 0x0 when reading from SFP+ EEPROM
[bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
[bnx2x_8727_read_sfp_module_eeprom:2835(eth4)]Got bad status 0x0 when reading from SFP+ EEPROM
[bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
[bnx2x_8727_read_sfp_module_eeprom:2835(eth4)]Got bad status 0x0 when reading from SFP+ EEPROM
[bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
[bnx2x_8727_read_sfp_module_eeprom:2835(eth4)]Got bad status 0x0 when reading from SFP+ EEPROM
[bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
[bnx2x_8727_read_sfp_module_eeprom:2835(eth4)]Got bad status 0x0 when reading from SFP+ EEPROM
[bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
[bnx2x_wait_for_sfp_module_initialized:3138(eth4)]SFP+ module initialization took 25 ms
[bnx2x_sfp_module_detection:3197(eth4)]SFP+ module plugged in/out detected on port 0
[bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
[bnx2x_get_edc_mode:2930(eth4)]Optic module detected
[bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
[bnx2x_get_edc_mode:2954(eth4)]EDC mode is set to 0x44
[bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
[bnx2x_8727_read_sfp_module_eeprom:2774(eth4)]I2C 2W speed 0x0, bit 8 0
bnx2x: Warning: Unqualified SFP+ module detected on eth4, Port 0 from MergeOptics GmbH part number TRX10GVP2010CA01
[bnx2x_sfp_module_detection:3205(eth4)]Module verification failed!!

So, both NICs and SFP+ DA cables are OK. The only problem we have is to make them talk together. :|

Best regards,

			Krzysztof Olędzki

^ permalink raw reply

* Re: [RFC PATCH] net: add additional lock to qdisc to increase enqueue/dequeue fairness
From: Eric Dumazet @ 2010-05-21 16:44 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, alexander.h.duyck, netdev
In-Reply-To: <20100521084349.0d6f8f9a@nehalam>

Le vendredi 21 mai 2010 à 08:43 -0700, Stephen Hemminger a écrit :

> What about having a special function (spin_lock_greedy?) that just ignores
> the ticket mechanism and always assumes it has right to next ticket.
> 

I am not sure we want to do this, for a single use case...
Adding a new lock primitive to linux should be really really motivated.

x86 ticket spinlocks are nice because we are sure no cpu is going to
stay forever in this code. For other arches, I dont know how this is
coped.

I thought about this before re-working on this patch, but found it was
easier to slightly change Alexander code, ie adding a regular spinlock
for the slowpath.

We could use cmpxchg() and manipulate several bits at once in fast path.
( __QDISC_STATE_RUNNING, __QDISC_STATE_LOCKED ... ) but making the crowd
of cpus spin on the same bits/cacheline than dequeue worker would
definitely slowdown the worker.

^ permalink raw reply

* [PATCH] Add ip route {save,restore}
From: Dan Smith @ 2010-05-21 16:45 UTC (permalink / raw)
  To: shemminger; +Cc: netdev

This patch adds save and restore commands to "ip route". Save dumps
the RTNL stream to stdout which can then be passed to restore later.
This may be helpful in some normal situations, and will allow C/R to
migrate the routing information in userspace.  Tweaking of the stream
can be done by userspace helpers to convert between versions and adjust
things like device indexes when restoring routes in a different
environment.

By factoring out some of the common bits of print_route() into
filter_nlmsg(), the "save" command can use the same selection logic
as "list," allowing the caller to save only specific routes as
necessary.

The only change since the RFC is the addition of manpage and doc
material.

Signed-off-by: Dan Smith <danms@us.ibm.com>
---
 doc/ip-cref.tex |   35 +++++++++++
 ip/iproute.c    |  169 +++++++++++++++++++++++++++++++++++++++++--------------
 man/man8/ip.8   |   22 +++++++
 3 files changed, 184 insertions(+), 42 deletions(-)

diff --git a/doc/ip-cref.tex b/doc/ip-cref.tex
index 056e3bf..d8fed66 100644
--- a/doc/ip-cref.tex
+++ b/doc/ip-cref.tex
@@ -1675,6 +1675,41 @@ information about this route is shown:
 \item \verb|used| --- the number of lookups of this route since its creation.
 \end{itemize}
 
+\subsection{{\tt ip route save} -- save routing tables}
+\label{IP-ROUTE-SAVE}
+
+\paragraph{Description:} this command saves the contents of the routing
+tables or the route(s) selected by some criteria to standard output.
+
+\paragraph{Arguments:} \verb|ip route save| has the same arguments as
+\verb|ip route show|.
+
+\paragraph{Example:} This saves all the routes to the {\tt saved\_routes}
+file:
+\begin{verbatim}
+dan@caffeine:~ # ip route save > saved_routes
+\end{verbatim}
+
+\paragraph{Output format:} The format of the data stream provided by
+\verb|ip route save| is that of \verb|rtnetlink|.  See
+\verb|rtnetlink(7)| for more information.
+
+\subsection{{\tt ip route restore} -- restore routing tables}
+\label{IP-ROUTE-RESTORE}
+
+\paragraph{Description:} this command restores the contents of the routing
+tables according to a data stream as provided by \verb|ip route save| via
+standard input.  Note that any routes already in the table are left unchanged.
+Any routes in the input stream that already exist in the tables are ignored.
+
+\paragraph{Arguments:} This command takes no arguments.
+
+\paragraph{Example:} This restores all routes that were saved to the
+{\tt saved\_routes} file:
+
+\begin{verbatim}
+dan@caffeine:~ # ip route restore < saved_routes
+\end{verbatim}
 
 \subsection{{\tt ip route flush} --- flush routing tables}
 \label{IP-ROUTE-FLUSH}
diff --git a/ip/iproute.c b/ip/iproute.c
index 8252e18..3049975 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -23,6 +23,7 @@
 #include <netinet/ip.h>
 #include <arpa/inet.h>
 #include <linux/in_route.h>
+#include <errno.h>
 
 #include "rt_names.h"
 #include "utils.h"
@@ -32,7 +33,11 @@
 #define RTAX_RTTVAR RTAX_HOPS
 #endif
 
-
+enum list_action {
+	IPROUTE_LIST,
+	IPROUTE_FLUSH,
+	IPROUTE_SAVE,
+};
 static const char *mx_names[RTAX_MAX+1] = {
 	[RTAX_MTU]	= "mtu",
 	[RTAX_WINDOW]	= "window",
@@ -53,6 +58,8 @@ static void usage(void) __attribute__((noreturn));
 static void usage(void)
 {
 	fprintf(stderr, "Usage: ip route { list | flush } SELECTOR\n");
+	fprintf(stderr, "       ip route save SELECTOR\n");
+	fprintf(stderr, "       ip route restore\n");
 	fprintf(stderr, "       ip route get ADDRESS [ from ADDRESS iif STRING ]\n");
 	fprintf(stderr, "                            [ oif STRING ]  [ tos TOS ]\n");
 	fprintf(stderr, "       ip route { add | del | change | append | replace | monitor } ROUTE\n");
@@ -115,46 +122,16 @@ static int flush_update(void)
 	return 0;
 }
 
-int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+int filter_nlmsg(struct nlmsghdr *n, struct rtattr **tb, int host_len)
 {
-	FILE *fp = (FILE*)arg;
 	struct rtmsg *r = NLMSG_DATA(n);
-	int len = n->nlmsg_len;
-	struct rtattr * tb[RTA_MAX+1];
-	char abuf[256];
 	inet_prefix dst;
 	inet_prefix src;
-	inet_prefix prefsrc;
 	inet_prefix via;
-	int host_len = -1;
-	static int ip6_multiple_tables;
+	inet_prefix prefsrc;
 	__u32 table;
-	SPRINT_BUF(b1);
-	static int hz;
-
-	if (n->nlmsg_type != RTM_NEWROUTE && n->nlmsg_type != RTM_DELROUTE) {
-		fprintf(stderr, "Not a route: %08x %08x %08x\n",
-			n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
-		return 0;
-	}
-	if (filter.flushb && n->nlmsg_type != RTM_NEWROUTE)
-		return 0;
-	len -= NLMSG_LENGTH(sizeof(*r));
-	if (len < 0) {
-		fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
-		return -1;
-	}
-
-	if (r->rtm_family == AF_INET6)
-		host_len = 128;
-	else if (r->rtm_family == AF_INET)
-		host_len = 32;
-	else if (r->rtm_family == AF_DECnet)
-		host_len = 16;
-	else if (r->rtm_family == AF_IPX)
-		host_len = 80;
+	static int ip6_multiple_tables;
 
-	parse_rtattr(tb, RTA_MAX, RTM_RTA(r), len);
 	table = rtm_get_table(r, tb);
 
 	if (r->rtm_family == AF_INET6 && table != RT_TABLE_MAIN)
@@ -281,6 +258,56 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	    *(int*)RTA_DATA(tb[RTA_PRIORITY]) == -1)
 		return 0;
 
+	return 1;
+}
+
+int calc_host_len(struct rtmsg *r)
+{
+	if (r->rtm_family == AF_INET6)
+		return 128;
+	else if (r->rtm_family == AF_INET)
+		return 32;
+	else if (r->rtm_family == AF_DECnet)
+		return 16;
+	else if (r->rtm_family == AF_IPX)
+		return 80;
+	else
+		return -1;
+}
+
+int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+{
+	FILE *fp = (FILE*)arg;
+	struct rtmsg *r = NLMSG_DATA(n);
+	int len = n->nlmsg_len;
+	struct rtattr * tb[RTA_MAX+1];
+	char abuf[256];
+	int host_len = -1;
+	__u32 table;
+	SPRINT_BUF(b1);
+	static int hz;
+
+	if (n->nlmsg_type != RTM_NEWROUTE && n->nlmsg_type != RTM_DELROUTE) {
+		fprintf(stderr, "Not a route: %08x %08x %08x\n",
+			n->nlmsg_len, n->nlmsg_type, n->nlmsg_flags);
+		return 0;
+	}
+	if (filter.flushb && n->nlmsg_type != RTM_NEWROUTE)
+		return 0;
+	len -= NLMSG_LENGTH(sizeof(*r));
+	if (len < 0) {
+		fprintf(stderr, "BUG: wrong nlmsg len %d\n", len);
+		return -1;
+	}
+
+	host_len = calc_host_len(r);
+
+ 	parse_rtattr(tb, RTA_MAX, RTM_RTA(r), len);
+	table = rtm_get_table(r, tb);
+
+	if (!filter_nlmsg(n, tb, host_len))
+		return 0;
+
 	if (filter.flushb) {
 		struct nlmsghdr *fn;
 		if (NLMSG_ALIGN(filter.flushp) + n->nlmsg_len > filter.flushe) {
@@ -1041,17 +1068,51 @@ static int iproute_flush_cache(void)
 	return 0;
 }
 
+int save_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
+{
+	int ret;
+	int len = n->nlmsg_len;
+	struct rtmsg *r = NLMSG_DATA(n);
+	struct rtattr *tb[RTA_MAX+1];
+	int host_len = -1;
+
+	if (isatty(STDOUT_FILENO)) {
+		fprintf(stderr, "Not sending binary stream to stdout\n");
+		return -1;
+	}
+
+	host_len = calc_host_len(r);
+	len -= NLMSG_LENGTH(sizeof(*r));
+ 	parse_rtattr(tb, RTA_MAX, RTM_RTA(r), len);
+
+	if (!filter_nlmsg(n, tb, host_len))
+		return 0;
+
+	ret = write(STDOUT_FILENO, n, n->nlmsg_len);
+	if ((ret > 0) && (ret != n->nlmsg_len)) {
+		fprintf(stderr, "Short write while saving nlmsg\n");
+		ret = -EIO;
+	}
+
+	return ret == n->nlmsg_len ? 0 : ret;
+}
 
-static int iproute_list_or_flush(int argc, char **argv, int flush)
+static int iproute_list_flush_or_save(int argc, char **argv, int action)
 {
 	int do_ipv6 = preferred_family;
 	char *id = NULL;
 	char *od = NULL;
+	rtnl_filter_t filter_fn;
+
+	if (action == IPROUTE_SAVE)
+		filter_fn = save_route;
+	else
+		filter_fn = print_route;
 
 	iproute_reset_filter();
 	filter.tb = RT_TABLE_MAIN;
 
-	if (flush && argc <= 0) {
+	if ((action == IPROUTE_FLUSH) && argc <= 0) {
 		fprintf(stderr, "\"ip route flush\" requires arguments.\n");
 		return -1;
 	}
@@ -1201,7 +1262,7 @@ static int iproute_list_or_flush(int argc, char **argv, int flush)
 		}
 	}
 
-	if (flush) {
+	if (action == IPROUTE_FLUSH) {
 		int round = 0;
 		char flushb[4096-512];
 		time_t start = time(0);
@@ -1226,7 +1287,7 @@ static int iproute_list_or_flush(int argc, char **argv, int flush)
 				exit(1);
 			}
 			filter.flushed = 0;
-			if (rtnl_dump_filter(&rth, print_route, stdout, NULL, NULL) < 0) {
+			if (rtnl_dump_filter(&rth, filter_fn, stdout, NULL, NULL) < 0) {
 				fprintf(stderr, "Flush terminated\n");
 				exit(1);
 			}
@@ -1269,7 +1330,7 @@ static int iproute_list_or_flush(int argc, char **argv, int flush)
 		}
 	}
 
-	if (rtnl_dump_filter(&rth, print_route, stdout, NULL, NULL) < 0) {
+	if (rtnl_dump_filter(&rth, filter_fn, stdout, NULL, NULL) < 0) {
 		fprintf(stderr, "Dump terminated\n");
 		exit(1);
 	}
@@ -1436,6 +1497,26 @@ int iproute_get(int argc, char **argv)
 	exit(0);
 }
 
+int restore_handler(struct sockaddr_nl *nl, struct nlmsghdr *n, void *arg)
+{
+	int ret;
+
+	n->nlmsg_flags |= NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;
+
+	ll_init_map(&rth);
+
+	ret = rtnl_talk(&rth, n, 0, 0, n, NULL, NULL);
+	if ((ret < 0) && (errno == EEXIST))
+		ret = 0;
+
+	return ret;
+}
+
+int iproute_restore(void)
+{
+	exit(rtnl_from_file(stdin, &restore_handler, NULL));
+}
+
 void iproute_reset_filter()
 {
 	memset(&filter, 0, sizeof(filter));
@@ -1446,7 +1527,7 @@ void iproute_reset_filter()
 int do_iproute(int argc, char **argv)
 {
 	if (argc < 1)
-		return iproute_list_or_flush(0, NULL, 0);
+		return iproute_list_flush_or_save(0, NULL, IPROUTE_LIST);
 
 	if (matches(*argv, "add") == 0)
 		return iproute_modify(RTM_NEWROUTE, NLM_F_CREATE|NLM_F_EXCL,
@@ -1471,11 +1552,15 @@ int do_iproute(int argc, char **argv)
 				      argc-1, argv+1);
 	if (matches(*argv, "list") == 0 || matches(*argv, "show") == 0
 	    || matches(*argv, "lst") == 0)
-		return iproute_list_or_flush(argc-1, argv+1, 0);
+		return iproute_list_flush_or_save(argc-1, argv+1, IPROUTE_LIST);
 	if (matches(*argv, "get") == 0)
 		return iproute_get(argc-1, argv+1);
 	if (matches(*argv, "flush") == 0)
-		return iproute_list_or_flush(argc-1, argv+1, 1);
+		return iproute_list_flush_or_save(argc-1, argv+1, IPROUTE_FLUSH);
+	if (matches(*argv, "save") == 0)
+		return iproute_list_flush_or_save(argc-1, argv+1, IPROUTE_SAVE);
+	if (matches(*argv, "restore") == 0)
+		return iproute_restore();
 	if (matches(*argv, "help") == 0)
 		usage();
 	fprintf(stderr, "Command \"%s\" is unknown, try \"ip route help\".\n", *argv);
diff --git a/man/man8/ip.8 b/man/man8/ip.8
index a5d2915..8158f5e 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -132,6 +132,13 @@ tentative " | " deprecated " | " dadfailed " | " temporary " ]"
 .I  SELECTOR
 
 .ti -8
+.BR "ip route save"
+.I SELECTOR
+
+.ti -8
+.BR "ip route restore"
+
+.ti -8
 .B  ip route get
 .IR ADDRESS " [ "
 .BI from " ADDRESS " iif " STRING"
@@ -1867,6 +1874,21 @@ however, no packets are actually sent.  With the
 argument, the kernel pretends that a packet arrived from this interface
 and searches for a path to forward the packet.
 
+.SS ip route save - save routing table information to stdout
+this command behaves like
+.BR "ip route show"
+except that the output is raw data suitable for passing to
+.BR "ip route restore" .
+
+.SS ip route restore - restore routing table information from stdin
+this command expects to read a data stream as returned from
+.BR "ip route save" .
+It will attempt to restore the routing table information exactly as
+it was at the time of the save, so any translation of information
+in the stream (such as device indexes) must be done first.  Any existing
+routes are left unchanged.  Any routes specified in the data stream that
+already exist in the table will be ignored.
+
 .SH ip rule - routing policy database management
 
 .BR "Rule" s
-- 
1.7.0.4


^ permalink raw reply related

* Re: bug fix patch lost: git problem or just incorrect merge?
From: Linus Torvalds @ 2010-05-21 16:45 UTC (permalink / raw)
  To: James Bottomley; +Cc: David Miller, linux-kernel, netdev, linux-scsi
In-Reply-To: <1274456515.9022.14.camel@mulgrave.site>

On Fri, 21 May 2010, James Bottomley wrote:
>
> The patch in question is this one (upstream for a while):
> 
> commit d7d05548a62c87ee55b0c81933669177f885aa8d
> Author: Mike Christie <michaelc@cs.wisc.edu>
> Date:   Wed Mar 31 14:41:35 2010 -0500
> 
>     [SCSI] iscsi_tcp: fix relogin/shutdown hang
> 
> It's a simple one line change in iscsi_tcp.c (diff clipped):
> 
> --- a/drivers/scsi/iscsi_tcp.c
> +++ b/drivers/scsi/iscsi_tcp.c
> @@ -599,7 +599,7 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
> -       if (sock->sk->sk_sleep && waitqueue_active(sock->sk->sk_sleep)) {
> +       if (sock->sk->sk_sleep) {
> 
> It was killed by this merge commit in the net-next tree:
> 
> commit 278554bd6579206921f5d8a523649a7a57f8850d
> Merge: 5a147e8 cea0d76
> Author: David S. Miller <davem@davemloft.net>
> Date:   Wed May 12 00:05:35 2010 -0700
> 
> However, the curious thing is that git seems to have lost trace of the
> missing patch entirely

No, it's there, and the bug is that David doesn't do merges well.

One of the reasons I ask people to let me merge is that it results in 
cleaner history to not have criss-cross merges. And another is that I'm 
pretty good at it, and letting me make merges also means that I am more 
aware of problem spots.

> if I try to find it in linus' tree with a git log -- 
> drivers/scsi/iscsi_tcp.c, it doesn't show up.

That is because "git log" will see the merge, see that _all_ history came 
from the other side, and ignore the side that was ignored. Because 
clearly, that other side didn't actually contribute anything.

Now, the _reason_ that other side didn't contribute anything is that David 
messed up the merge, and took just his own sides changes.

> The merge commit which killed it does list iscsi_tcp.c as a file 
> conflict

Yes. I re-did the merge, and the result looks like this (cut-and-paste 
whitespace damage, I'm just illustrating he point):

	diff --cc drivers/scsi/iscsi_tcp.c
	index 9eae04a,02143af..0000000
	--- a/drivers/scsi/iscsi_tcp.c
	+++ b/drivers/scsi/iscsi_tcp.c
	@@@ -599,9 -599,9 +599,13 @@@ static void iscsi_sw_tcp_conn_stop(stru
	        set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx);
	        write_unlock_bh(&tcp_sw_conn->sock->sk->sk_callback_lock);

	++<<<<<<< HEAD
	 +      if (sk_sleep(sock->sk) && waitqueue_active(sk_sleep(sock->sk))) {
	++=======
	+       if (sock->sk->sk_sleep) {
	++>>>>>>> cea0d76
	                sock->sk->sk_err = EIO;
	 -              wake_up_interruptible(sock->sk->sk_sleep);
	 +              wake_up_interruptible(sk_sleep(sock->sk));
	        }

	        iscsi_conn_stop(cls_conn, flag);

and David picked his side of things, not your side.

The _correct_ merge would have been to take both changes, as is quite 
obvious if you do

	gitk 5a147e8...cea0d76 drivers/scsi/iscsi_tcp.c

and see that the conflict comes because:

 - one side (David's) changed

	sock->sk->sk_sleep

   into

	sk_sleep(sock->sk)

   in commit aa395145165cb06a0d0885221bbe0ce4a564391d

 - the other side (your) removed the 'waitqueue_active()' part in commit 
   d7d05548a62c87ee55b0c81933669177f885aa8d.

So the end result _should_ have been this merge resolution:

	diff --cc drivers/scsi/iscsi_tcp.c
	index 9eae04a,02143af..0000000
	--- a/drivers/scsi/iscsi_tcp.c
	+++ b/drivers/scsi/iscsi_tcp.c
	@@@ -599,9 -599,9 +599,9 @@@ static void iscsi_sw_tcp_conn_stop(stru
	        set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx);
	        write_unlock_bh(&tcp_sw_conn->sock->sk->sk_callback_lock);

	-       if (sk_sleep(sock->sk) && waitqueue_active(sk_sleep(sock->sk))) {
	 -      if (sock->sk->sk_sleep) {
	++      if (sk_sleep(sock->sk)) {
	                sock->sk->sk_err = EIO;
	 -              wake_up_interruptible(sock->sk->sk_sleep);
	 +              wake_up_interruptible(sk_sleep(sock->sk));
	        }

	        iscsi_conn_stop(cls_conn, flag);

but David just picked his side entirely. And that is also the reason for:

> but git show on that commit doesn't list that file in the resolution 
> diff ... even though this is where it actually got killed.

A merge diff ("combined diff") only shows conflicts as defined by "you 
resolved it to something that didn't match either side". That's a _real_ 
conflict. If you just end up picking one side, there is no diff.

> Is this a git problem ... or is it just a mismerge in the net tree?

So it's a mis-merge. You can see the commit with

	gitk v2.6.34.. --full-history drivers/scsi/iscsi_tcp.c

which doesn't do the "ignore the side of a merge that didn't bring any new 
data in". Or, with any recent git, you can use "--simplify-merges" instead 
of full-history, which only simplifies trivial merges where neither side 
really touched things at all.

If you do that, you'll also see why git doesn't show the uninteresting 
side of a merge by default. Just for fun, compare the graphs of

	gitk v2.6.34.. drivers/scsi/iscsi_tcp.c
	gitk v2.6.34.. --simplify-merges drivers/scsi/iscsi_tcp.c
	gitk v2.6.34.. --full-history drivers/scsi/iscsi_tcp.c

and ask yourself: do you normally want to see _all_ the history, even 
stuff that didn't end up affecting the end result?

> Either way, of course, we need the patch back ...

I'll fix it up.

			Linus

^ permalink raw reply

* Re: bug fix patch lost: git problem or just incorrect merge?
From: Linus Torvalds @ 2010-05-21 17:04 UTC (permalink / raw)
  To: James Bottomley; +Cc: David Miller, linux-kernel, netdev, linux-scsi
In-Reply-To: <alpine.LFD.2.00.1005210931010.4243@i5.linux-foundation.org>

On Fri, 21 May 2010, Linus Torvalds wrote:
> 
> > Either way, of course, we need the patch back ...
> 
> I'll fix it up.

Hmm. Pushed that out as appended, since that is the correct resolve.

HOWEVER - the code still doesn't actually make any sense. It does

	if (sk_sleep(sock->sk)) {

and that sk_sleep() today is an inline function that just does

	return &sk->sk_wq->wait;

and testing the result of an address-of operation for NULL is almost 
certainly totally non-sensical. Sure, it _might_ work (maybe 'wait' is the 
first element in the 'sk_wq' data structure, and sk_wq is NULL), but that 
kind of code is always total and utterl crap regardless.

So I pushed it out because I had done the resolve already, and it's no 
worse than it was before, but it's still a steaming buggy pile of shit.

It being iscsi, I can't bring myself to care. But somebody who does, 
should really look at it. The most likely resolution is to remove the test 
entirely, since I don't think it's ever valid to have a socket that 
doesn't have a sk_wq (there's a _lot_ of unconditional use of sk_sleep()).

		Linus

^ permalink raw reply

* Re: bug fix patch lost: git problem or just incorrect merge?
From: James Bottomley @ 2010-05-21 17:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: David Miller, linux-kernel, netdev, linux-scsi
In-Reply-To: <alpine.LFD.2.00.1005210947440.4243@i5.linux-foundation.org>

On Fri, 2010-05-21 at 10:04 -0700, Linus Torvalds wrote:
> 
> On Fri, 21 May 2010, Linus Torvalds wrote:
> > 
> > > Either way, of course, we need the patch back ...
> > 
> > I'll fix it up.
> 
> Hmm. Pushed that out as appended, since that is the correct resolve.

Thanks!

> HOWEVER - the code still doesn't actually make any sense. It does
> 
> 	if (sk_sleep(sock->sk)) {
> 
> and that sk_sleep() today is an inline function that just does
> 
> 	return &sk->sk_wq->wait;
> 
> and testing the result of an address-of operation for NULL is almost 
> certainly totally non-sensical. Sure, it _might_ work (maybe 'wait' is the 
> first element in the 'sk_wq' data structure, and sk_wq is NULL), but that 
> kind of code is always total and utterl crap regardless.
> 
> So I pushed it out because I had done the resolve already, and it's no 
> worse than it was before, but it's still a steaming buggy pile of shit.

Yes, the problem was caused by this patch

commit 43815482370c510c569fd18edb57afcb0fa8cab6
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Thu Apr 29 11:01:49 2010 +0000

    net: sock_def_readable() and friends RCU conversion

Which moved sk_sleep() from returning the pointer to the waitqueue,
which may or may not be assigned to returning a pointer to an internal
waitqueue in the socket, which, obviously, can never be null.

I suspect what iscsi should be doing is always sending the wakeup ... in
which case with your resolution, the code is operating correctly even if
the form is suboptimal.

> It being iscsi, I can't bring myself to care. But somebody who does, 
> should really look at it. The most likely resolution is to remove the test 
> entirely, since I don't think it's ever valid to have a socket that 
> doesn't have a sk_wq (there's a _lot_ of unconditional use of sk_sleep()).

I'll have Mike look at it, but I think just removing the if() will be
the correct resolution.

James

^ permalink raw reply

* Re: [RFC PATCH] net: add additional lock to qdisc to increase enqueue/dequeue fairness
From: Stephen Hemminger @ 2010-05-21 17:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, alexander.h.duyck, netdev
In-Reply-To: <1274460275.2439.469.camel@edumazet-laptop>

On Fri, 21 May 2010 18:44:35 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le vendredi 21 mai 2010 à 08:43 -0700, Stephen Hemminger a écrit :
> 
> > What about having a special function (spin_lock_greedy?) that just ignores
> > the ticket mechanism and always assumes it has right to next ticket.
> > 
> 
> I am not sure we want to do this, for a single use case...
> Adding a new lock primitive to linux should be really really motivated.
> 
> x86 ticket spinlocks are nice because we are sure no cpu is going to
> stay forever in this code. For other arches, I dont know how this is
> coped.
> 
> I thought about this before re-working on this patch, but found it was
> easier to slightly change Alexander code, ie adding a regular spinlock
> for the slowpath.
> 
> We could use cmpxchg() and manipulate several bits at once in fast path.
> ( __QDISC_STATE_RUNNING, __QDISC_STATE_LOCKED ... ) but making the crowd
> of cpus spin on the same bits/cacheline than dequeue worker would
> definitely slowdown the worker.

Your solution is okay, but it is a special case performance hack
and we seem to be getting more and more of these lately.
This is a general problem of any producer/consumer case;
other code will have same problem (like disk requests) 
and can't use the same solution.

Maybe the RT and scheduler folks have some input because it does
seem like the same problem has occurred in other contexts before.

-- 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox