Netdev List
 help / color / mirror / Atom feed
* [PATCH] net: use synchronize_rcu_expedited()
From: Eric Dumazet @ 2011-05-24  9:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Paul E. McKenney

synchronize_rcu() is very slow in various situations (HZ=100,
CONFIG_NO_HZ=y, CONFIG_PREEMPT=n)

Extract from my (mostly idle) 8 core machine :

 synchronize_rcu() in 99985 us
 synchronize_rcu() in 79982 us
 synchronize_rcu() in 87612 us
 synchronize_rcu() in 79827 us
 synchronize_rcu() in 109860 us
 synchronize_rcu() in 98039 us
 synchronize_rcu() in 89841 us
 synchronize_rcu() in 79842 us
 synchronize_rcu() in 80151 us
 synchronize_rcu() in 119833 us
 synchronize_rcu() in 99858 us
 synchronize_rcu() in 73999 us
 synchronize_rcu() in 79855 us
 synchronize_rcu() in 79853 us


When we hold RTNL mutex, we would like to spend some cpu cycles but not
block too long other processes waiting for this mutex.

We also want to setup/dismantle network features as fast as possible at
boot/shutdown time.

This patch makes synchronize_net() call the expedited version if RTNL is
locked.

synchronize_rcu_expedited() typical delay is about 20 us on my machine.

 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 20 us
 synchronize_rcu_expedited() in 16 us
 synchronize_rcu_expedited() in 20 us
 synchronize_rcu_expedited() in 18 us
 synchronize_rcu_expedited() in 18 us


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Ben Greear <greearb@candelatech.com>
---
 net/core/dev.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index bcb05cb..ec11d75 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5954,7 +5954,10 @@ EXPORT_SYMBOL(free_netdev);
 void synchronize_net(void)
 {
 	might_sleep();
-	synchronize_rcu();
+	if (rtnl_is_locked())
+		synchronize_rcu_expedited();
+	else
+		synchronize_rcu();
 }
 EXPORT_SYMBOL(synchronize_net);
 



^ permalink raw reply related

* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Michael S. Tsirkin @ 2011-05-24  9:12 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	lguest-uLR06cmDAlY/bJ5BZ2RsiQ, Shirley Ma,
	kvm-u79uwXL29TY76Z2rM5mHXA, Carsten Otte,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, netdev-u79uwXL29TY76Z2rM5mHXA, Martin Schwidefsky,
	linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <OFEED174CD.D3C9A726-ON6525789A.002661D5-6525789A.002B26ED-xthvdsQ13ZrQT0dZR+AlfA@public.gmane.org>

On Tue, May 24, 2011 at 01:24:15PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote on 05/23/2011 04:49:00 PM:
> 
> > > To do this properly, we should really be using the actual number of sg
> > > elements needed, but we'd have to do most of xmit_skb beforehand so we
> > > know how many.
> > >
> > > Cheers,
> > > Rusty.
> >
> > Maybe I'm confused here.  The problem isn't the failing
> > add_buf for the given skb IIUC.  What we are trying to do here is stop
> > the queue *before xmit_skb fails*. We can't look at the
> > number of fragments in the current skb - the next one can be
> > much larger.  That's why we check capacity after xmit_skb,
> > not before it, right?
> 
> Maybe Rusty means it is a simpler model to free the amount
> of space that this xmit needs. We will still fail anyway
> at some time but it is unlikely, since earlier iteration
> freed up atleast the space that it was going to use.

Not sure I nderstand.  We can't know space is freed in the previous
iteration as buffers might not have been used by then.

> The
> code could become much simpler:
> 
> start_xmit()
> {
> {
>         num_sgs = get num_sgs for this skb;
> 
>         /* Free enough pending old buffers to enable queueing this one */
>         free_old_xmit_skbs(vi, num_sgs * 2);     /* ?? */
> 
>         if (virtqueue_get_capacity() < num_sgs) {
>                 netif_stop_queue(dev);
>                 if (virtqueue_enable_cb_delayed(vi->svq) ||
>                     free_old_xmit_skbs(vi, num_sgs)) {
>                         /* Nothing freed up, or not enough freed up */
>                         kfree_skb(skb);
>                         return NETDEV_TX_OK;

This packet drop is what we wanted to avoid.


>                 }
>                 netif_start_queue(dev);
>                 virtqueue_disable_cb(vi->svq);
>         }
> 
>         /* xmit_skb cannot fail now, also pass 'num_sgs' */
>         xmit_skb(vi, skb, num_sgs);
>         virtqueue_kick(vi->svq);
> 
>         skb_orphan(skb);
>         nf_reset(skb);
> 
>         return NETDEV_TX_OK;
> }
> 
> We could even return TX_BUSY since that makes the dequeue
> code more efficient. See dev_dequeue_skb() - you can skip a
> lot of code (and avoid taking locks) to check if the queue
> is already stopped but that code runs only if you return
> TX_BUSY in the earlier iteration.
> 
> BTW, shouldn't the check in start_xmit be:
> 	if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> 		...
> 	}
> 
> Thanks,
> 
> - KK

I thought we used to do basically this but other devices moved to a
model where they stop *before* queueing fails, so we did too.

-- 
MST

^ permalink raw reply

* Re: [PATCH] ethtool: ETHTOOL_SFEATURES: remove NETIF_F_COMPAT return
From: Michał Mirosław @ 2011-05-24  9:14 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, netdev
In-Reply-To: <20110519100331.GA25103@rere.qmqm.pl>

On Thu, May 19, 2011 at 12:03:31PM +0200, Michał Mirosław wrote:
> On Mon, May 16, 2011 at 02:09:58PM -0400, David Miller wrote:
> > You guys really need to sort this out properly.
> > Please resubmit whatever final solution is agreed upon.
> I noticed that v2.6.39 was tagged today. We should definitely remove
> NETIF_F_COMPAT so it won't bite us in the future. The other patch that
> fixes ethtool_ops->set_flags compatibility is a bugfix, so it should go
> in - if we decide that the SFEATURES compatibility should be removed
> it won't matter.
> 
> Ben, do you agree?

Ping?

http://patchwork.ozlabs.org/patch/95552/
(this is a bugfix, so should go to stable)

http://patchwork.ozlabs.org/patch/95753/
(removes ETHTOOL_F_COMPAT; this we need to decide on)

Best Regards,
Michał Mirosław

^ permalink raw reply

* [PATCH v4 1/1] igmp: call ip_mc_clear_src() only when we have no users of ip_mc_list
From: Veaceslav Falico @ 2011-05-24  9:15 UTC (permalink / raw)
  To: David Stevens
  Cc: David Miller, jmorris, kaber, kuznet, linux-kernel, mmarek,
	netdev, netdev-owner, pekkas, yoshfuji
In-Reply-To: <OF8F2A5C61.8060D0CC-ON88257899.00610B06-88257899.006123E3@us.ibm.com>

In igmp_group_dropped() we call ip_mc_clear_src(), which resets the number
of source filters per mulitcast. However, igmp_group_dropped() is also
called on NETDEV_DOWN, NETDEV_PRE_TYPE_CHANGE and NETDEV_UNREGISTER, which
means that the group might get added back on NETDEV_UP, NETDEV_REGISTER and
NETDEV_POST_TYPE_CHANGE respectively, leaving us with broken source
filters.

To fix that, we must clear the source filters only when there are no users
in the ip_mc_list, i.e. in ip_mc_dec_group() and on device destroy.

Acked-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>

---
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 1fd3d9c..57ca93a 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1169,20 +1169,18 @@ static void igmp_group_dropped(struct ip_mc_list *im)
 
 	if (!in_dev->dead) {
 		if (IGMP_V1_SEEN(in_dev))
-			goto done;
+			return;
 		if (IGMP_V2_SEEN(in_dev)) {
 			if (reporter)
 				igmp_send_report(in_dev, im, IGMP_HOST_LEAVE_MESSAGE);
-			goto done;
+			return;
 		}
 		/* IGMPv3 */
 		igmpv3_add_delrec(in_dev, im);
 
 		igmp_ifc_event(in_dev);
 	}
-done:
 #endif
-	ip_mc_clear_src(im);
 }
 
 static void igmp_group_added(struct ip_mc_list *im)
@@ -1319,6 +1317,7 @@ void ip_mc_dec_group(struct in_device *in_dev, __be32 addr)
 				*ip = i->next_rcu;
 				in_dev->mc_count--;
 				igmp_group_dropped(i);
+				ip_mc_clear_src(i);
 
 				if (!in_dev->dead)
 					ip_rt_multicast_event(in_dev);
@@ -1428,7 +1427,8 @@ void ip_mc_destroy_dev(struct in_device *in_dev)
 		in_dev->mc_list = i->next_rcu;
 		in_dev->mc_count--;
 
-		igmp_group_dropped(i);
+		/* We've dropped the groups in ip_mc_down already */
+		ip_mc_clear_src(i);
 		ip_ma_put(i);
 	}
 }

^ permalink raw reply related

* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Krishna Kumar2 @ 2011-05-24  9:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christian Borntraeger, Carsten Otte, habanero, Heiko Carstens,
	kvm, lguest, linux-kernel, linux-s390, linux390, netdev,
	Rusty Russell, Martin Schwidefsky, steved, Tom Lendacky,
	virtualization, Shirley Ma
In-Reply-To: <20110524091255.GB16886@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/24/2011 02:42:55 PM:

> > > > To do this properly, we should really be using the actual number of
sg
> > > > elements needed, but we'd have to do most of xmit_skb beforehand so
we
> > > > know how many.
> > > >
> > > > Cheers,
> > > > Rusty.
> > >
> > > Maybe I'm confused here.  The problem isn't the failing
> > > add_buf for the given skb IIUC.  What we are trying to do here is
stop
> > > the queue *before xmit_skb fails*. We can't look at the
> > > number of fragments in the current skb - the next one can be
> > > much larger.  That's why we check capacity after xmit_skb,
> > > not before it, right?
> >
> > Maybe Rusty means it is a simpler model to free the amount
> > of space that this xmit needs. We will still fail anyway
> > at some time but it is unlikely, since earlier iteration
> > freed up atleast the space that it was going to use.
>
> Not sure I nderstand.  We can't know space is freed in the previous
> iteration as buffers might not have been used by then.

Yes, the first few iterations may not have freed up space, but
later ones should. The amount of free space should increase
from then on, especially since we try to free double of what
we consume.

> > The
> > code could become much simpler:
> >
> > start_xmit()
> > {
> > {
> >         num_sgs = get num_sgs for this skb;
> >
> >         /* Free enough pending old buffers to enable queueing this one
*/
> >         free_old_xmit_skbs(vi, num_sgs * 2);     /* ?? */
> >
> >         if (virtqueue_get_capacity() < num_sgs) {
> >                 netif_stop_queue(dev);
> >                 if (virtqueue_enable_cb_delayed(vi->svq) ||
> >                     free_old_xmit_skbs(vi, num_sgs)) {
> >                         /* Nothing freed up, or not enough freed up */
> >                         kfree_skb(skb);
> >                         return NETDEV_TX_OK;
>
> This packet drop is what we wanted to avoid.

Please see below on returning NETDEV_TX_BUSY.

>
> >                 }
> >                 netif_start_queue(dev);
> >                 virtqueue_disable_cb(vi->svq);
> >         }
> >
> >         /* xmit_skb cannot fail now, also pass 'num_sgs' */
> >         xmit_skb(vi, skb, num_sgs);
> >         virtqueue_kick(vi->svq);
> >
> >         skb_orphan(skb);
> >         nf_reset(skb);
> >
> >         return NETDEV_TX_OK;
> > }
> >
> > We could even return TX_BUSY since that makes the dequeue
> > code more efficient. See dev_dequeue_skb() - you can skip a
> > lot of code (and avoid taking locks) to check if the queue
> > is already stopped but that code runs only if you return
> > TX_BUSY in the earlier iteration.
> >
> > BTW, shouldn't the check in start_xmit be:
> >    if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> >       ...
> >    }
> >
> > Thanks,
> >
> > - KK
>
> I thought we used to do basically this but other devices moved to a
> model where they stop *before* queueing fails, so we did too.

I am not sure of why it was changed, since returning TX_BUSY
seems more efficient IMHO. qdisc_restart() handles requeue'd
packets much better than a stopped queue, as a significant
part of this code is skipped if gso_skb is present (qdisc
will eventually start dropping packets when tx_queue_len is
exceeded anyway).

Thanks,

- KK


^ permalink raw reply

* [PATCH] ethtool: Pause frame reporting fixes
From: Esa-Pekka Pyökkimies @ 2011-05-24 10:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: bhutchings@solarflare.com

Hello. Our software needs to modify pause parameters dynamically.
For this to work reliably the SUPPORTED_Pause and SUPPORTED_Asym_Pause
flags need to be set correctly by the driver. As it turns out, some drivers
leave them at zero even though they implement pause frames.
This patch adds reporting for these flags to ethtool.
For some reason the current code reports ADVERTISED_Pause
but not SUPPORTED_Pause. Also there was a small bug
in the reporting for ADVERTISED_Pause.

Esa-Pekka Pyokkimies, Software Specialist, Stonesoft

 ethtool.c |   19 ++++++++++++++++---
 1 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 34fe107..cf6c4b2 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -1193,6 +1193,19 @@ static void dump_supported(struct ethtool_cmd *ep)
 		did1++; fprintf(stdout, "10000baseT/Full ");
 	}
 	fprintf(stdout, "\n");
+        
+	fprintf(stdout, "	Supported pause frame use: ");
+	if (mask & SUPPORTED_Pause) {
+		if (mask & SUPPORTED_Asym_Pause)
+			fprintf(stdout, "Receive-only\n");
+		else
+			fprintf(stdout, "Symmetric\n");
+	} else {
+		if (mask & SUPPORTED_Asym_Pause)
+			fprintf(stdout, "Transmit-only\n");
+		else
+			fprintf(stdout, "No\n");
+        }
 
 	fprintf(stdout, "	Supports auto-negotiation: ");
 	if (mask & SUPPORTED_Autoneg)
@@ -1255,10 +1268,10 @@ static void dump_advertised(struct ethtool_cmd *ep,
 
 	fprintf(stdout, "	%s pause frame use: ", prefix);
 	if (mask & ADVERTISED_Pause) {
-		fprintf(stdout, "Symmetric");
 		if (mask & ADVERTISED_Asym_Pause)
-			fprintf(stdout, " Receive-only");
-		fprintf(stdout, "\n");
+			fprintf(stdout, "Receive-only\n");
+                else
+			fprintf(stdout, "Symmetric\n");
 	} else {
 		if (mask & ADVERTISED_Asym_Pause)
 			fprintf(stdout, "Transmit-only\n");

^ permalink raw reply related

* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Michael S. Tsirkin @ 2011-05-24 11:29 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	lguest-uLR06cmDAlY/bJ5BZ2RsiQ, Shirley Ma,
	kvm-u79uwXL29TY76Z2rM5mHXA, Carsten Otte,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, netdev-u79uwXL29TY76Z2rM5mHXA, Martin Schwidefsky,
	linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <OF69E520FD.340352AC-ON6525789A.003308A2-6525789A.0033F2DF-xthvdsQ13ZrQT0dZR+AlfA@public.gmane.org>

On Tue, May 24, 2011 at 02:57:43PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote on 05/24/2011 02:42:55 PM:
> 
> > > > > To do this properly, we should really be using the actual number of
> sg
> > > > > elements needed, but we'd have to do most of xmit_skb beforehand so
> we
> > > > > know how many.
> > > > >
> > > > > Cheers,
> > > > > Rusty.
> > > >
> > > > Maybe I'm confused here.  The problem isn't the failing
> > > > add_buf for the given skb IIUC.  What we are trying to do here is
> stop
> > > > the queue *before xmit_skb fails*. We can't look at the
> > > > number of fragments in the current skb - the next one can be
> > > > much larger.  That's why we check capacity after xmit_skb,
> > > > not before it, right?
> > >
> > > Maybe Rusty means it is a simpler model to free the amount
> > > of space that this xmit needs. We will still fail anyway
> > > at some time but it is unlikely, since earlier iteration
> > > freed up atleast the space that it was going to use.
> >
> > Not sure I nderstand.  We can't know space is freed in the previous
> > iteration as buffers might not have been used by then.
> 
> Yes, the first few iterations may not have freed up space, but
> later ones should. The amount of free space should increase
> from then on, especially since we try to free double of what
> we consume.

Hmm. This is only an upper limit on the # of entries in the queue.
Assume that vq size is 4 and we transmit 4 enties without
getting anything in the used ring. The next transmit will fail.

So I don't really see why it's unlikely that we reach the packet
drop code with your patch.

> > > The
> > > code could become much simpler:
> > >
> > > start_xmit()
> > > {
> > > {
> > >         num_sgs = get num_sgs for this skb;
> > >
> > >         /* Free enough pending old buffers to enable queueing this one
> */
> > >         free_old_xmit_skbs(vi, num_sgs * 2);     /* ?? */
> > >
> > >         if (virtqueue_get_capacity() < num_sgs) {
> > >                 netif_stop_queue(dev);
> > >                 if (virtqueue_enable_cb_delayed(vi->svq) ||
> > >                     free_old_xmit_skbs(vi, num_sgs)) {
> > >                         /* Nothing freed up, or not enough freed up */
> > >                         kfree_skb(skb);
> > >                         return NETDEV_TX_OK;
> >
> > This packet drop is what we wanted to avoid.
> 
> Please see below on returning NETDEV_TX_BUSY.
> 
> >
> > >                 }
> > >                 netif_start_queue(dev);
> > >                 virtqueue_disable_cb(vi->svq);
> > >         }
> > >
> > >         /* xmit_skb cannot fail now, also pass 'num_sgs' */
> > >         xmit_skb(vi, skb, num_sgs);
> > >         virtqueue_kick(vi->svq);
> > >
> > >         skb_orphan(skb);
> > >         nf_reset(skb);
> > >
> > >         return NETDEV_TX_OK;
> > > }
> > >
> > > We could even return TX_BUSY since that makes the dequeue
> > > code more efficient. See dev_dequeue_skb() - you can skip a
> > > lot of code (and avoid taking locks) to check if the queue
> > > is already stopped but that code runs only if you return
> > > TX_BUSY in the earlier iteration.
> > >
> > > BTW, shouldn't the check in start_xmit be:
> > >    if (likely(!free_old_xmit_skbs(vi, 2+MAX_SKB_FRAGS))) {
> > >       ...
> > >    }
> > >
> > > Thanks,
> > >
> > > - KK
> >
> > I thought we used to do basically this but other devices moved to a
> > model where they stop *before* queueing fails, so we did too.
> 
> I am not sure of why it was changed, since returning TX_BUSY
> seems more efficient IMHO.
> qdisc_restart() handles requeue'd
> packets much better than a stopped queue, as a significant
> part of this code is skipped if gso_skb is present

I think this is the argument:
http://www.mail-archive.com/virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org/msg06364.html


> (qdisc
> will eventually start dropping packets when tx_queue_len is
> exceeded anyway).
> 
> Thanks,
> 
> - KK

tx_queue_len is a pretty large buffer so maybe no.
I think the packet drops from the scheduler queue can also be
done intelligently (e.g. with CHOKe) which should
work better than dropping a random packet?

-- 
MST

^ permalink raw reply

* [PATCH net-2.6] bnx2x: fix inverted condition
From: Dmitry Kravkov @ 2011-05-24 12:06 UTC (permalink / raw)
  To: davem, netdev; +Cc: Eilon Greenstein


Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_cmn.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_cmn.c b/drivers/net/bnx2x/bnx2x_cmn.c
index d5bd35b..2890443 100644
--- a/drivers/net/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/bnx2x/bnx2x_cmn.c
@@ -2675,7 +2675,7 @@ alloc_mem_err:
 	 * Min size diferent for TPA and non-TPA queues
 	 */
 	if (ring_size < (fp->disable_tpa ?
-				MIN_RX_SIZE_TPA : MIN_RX_SIZE_NONTPA)) {
+				MIN_RX_SIZE_NONTPA : MIN_RX_SIZE_TPA)) {
 			/* release memory allocated for this queue */
 			bnx2x_free_fp_mem_at(bp, index);
 			return -ENOMEM;
-- 
1.7.2.2





^ permalink raw reply related

* [PATCH] seqlock: get rid of SEQLOCK_UNLOCKED
From: Eric Dumazet @ 2011-05-24 12:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, David Miller, netdev, Thomas Gleixner

All static seqlock should be initialized with the lockdep friendly
__SEQLOCK_UNLOCKED() macro.

Remove legacy SEQLOCK_UNLOCKED() macro.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: David Miller <davem@davemloft.net>
---
 arch/ia64/kernel/time.c         |    2 +-
 arch/x86/kernel/vsyscall_64.c   |    2 +-
 include/linux/seqlock.h         |    3 ---
 net/ipv4/inet_connection_sock.c |    2 +-
 4 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c
index 04440cc..85118df 100644
--- a/arch/ia64/kernel/time.c
+++ b/arch/ia64/kernel/time.c
@@ -36,7 +36,7 @@
 static cycle_t itc_get_cycles(struct clocksource *cs);
 
 struct fsyscall_gtod_data_t fsyscall_gtod_data = {
-	.lock = SEQLOCK_UNLOCKED,
+	.lock = __SEQLOCK_UNLOCKED(fsyscall_gtod_data.lock),
 };
 
 struct itc_jitter_data_t itc_jitter_data;
diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c
index dcbb28c..59be48d 100644
--- a/arch/x86/kernel/vsyscall_64.c
+++ b/arch/x86/kernel/vsyscall_64.c
@@ -59,7 +59,7 @@ int __vgetcpu_mode __section_vgetcpu_mode;
 
 struct vsyscall_gtod_data __vsyscall_gtod_data __section_vsyscall_gtod_data =
 {
-	.lock = SEQLOCK_UNLOCKED,
+	.lock = __SEQLOCK_UNLOCKED(__vsyscall_gtod_data.lock),
 	.sysctl_enabled = 1,
 };
 
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 06d6964..e981189 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -41,9 +41,6 @@ typedef struct {
 #define __SEQLOCK_UNLOCKED(lockname) \
 		 { 0, __SPIN_LOCK_UNLOCKED(lockname) }
 
-#define SEQLOCK_UNLOCKED \
-		 __SEQLOCK_UNLOCKED(old_style_seqlock_init)
-
 #define seqlock_init(x)					\
 	do {						\
 		(x)->sequence = 0;			\
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 61fac4c..c14d88a 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -33,7 +33,7 @@ EXPORT_SYMBOL(inet_csk_timer_bug_msg);
  * This struct holds the first and last local port number.
  */
 struct local_ports sysctl_local_ports __read_mostly = {
-	.lock = SEQLOCK_UNLOCKED,
+	.lock = __SEQLOCK_UNLOCKED(sysctl_local_ports.lock),
 	.range = { 32768, 61000 },
 };
 

^ permalink raw reply related

* [PATCH 1/1] IPVS : bug in ip_vs_ftp, same list heaad used in all netns.
From: Hans Schillstrom @ 2011-05-24 12:11 UTC (permalink / raw)
  To: horms, ja, wensong, lvs-devel, netdev, netfilter-devel
  Cc: hans, Hans Schillstrom

When ip_vs was adapted to netns the ftp application was not adapted
in a correct way.
However this is a fix to avoid kernel errors. In the long term another solution
might be chosen.  I.e the ports that the ftp appl, uses should be per netns.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 include/net/ip_vs.h            |    3 ++-
 net/netfilter/ipvs/ip_vs_ftp.c |   27 +++++++++++++++++++--------
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 4fff432..481f856 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -797,7 +797,8 @@ struct netns_ipvs {
 	struct list_head	rs_table[IP_VS_RTAB_SIZE];
 	/* ip_vs_app */
 	struct list_head	app_list;
-
+	/* ip_vs_ftp */
+	struct ip_vs_app	*ftp_app;
 	/* ip_vs_proto */
 	#define IP_VS_PROTO_TAB_SIZE	32	/* must be power of 2 */
 	struct ip_vs_proto_data *proto_data_table[IP_VS_PROTO_TAB_SIZE];
diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index 6b5dd6d..af63553 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -411,25 +411,35 @@ static struct ip_vs_app ip_vs_ftp = {
 static int __net_init __ip_vs_ftp_init(struct net *net)
 {
 	int i, ret;
-	struct ip_vs_app *app = &ip_vs_ftp;
+	struct ip_vs_app *app;
+	struct netns_ipvs *ipvs = net_ipvs(net);
+
+	app = kmemdup(&ip_vs_ftp, sizeof(struct ip_vs_app), GFP_KERNEL);
+	if (!app)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&app->a_list);
+	INIT_LIST_HEAD(&app->incs_list);
+	ipvs->ftp_app = app;
 
 	ret = register_ip_vs_app(net, app);
 	if (ret)
-		return ret;
+		goto err_exit;
 
 	for (i=0; i<IP_VS_APP_MAX_PORTS; i++) {
 		if (!ports[i])
 			continue;
 		ret = register_ip_vs_app_inc(net, app, app->protocol, ports[i]);
 		if (ret)
-			break;
+			goto err_unreg;
 		pr_info("%s: loaded support on port[%d] = %d\n",
 			app->name, i, ports[i]);
 	}
+	return 0;
 
-	if (ret)
-		unregister_ip_vs_app(net, app);

^ permalink raw reply related

* Re: linux-next: build failure after merge of the final tree
From: Greg KH @ 2011-05-24 12:48 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Mike Frysinger, Linus, linux-next, linux-kernel, David S. Miller,
	netdev, Andrew Morton, Mel Gorman, linux-mm, Alexander Viro,
	linux-fsdevel, Paul E. McKenney, Dipankar Sarma
In-Reply-To: <20110524135930.bb4c5506.sfr@canb.auug.org.au>

On Tue, May 24, 2011 at 01:59:30PM +1000, Stephen Rothwell wrote:
> Hi Greg,
> 
> On Mon, 23 May 2011 19:51:51 -0700 Greg KH <greg@kroah.com> wrote:
> >
> > On Mon, May 23, 2011 at 10:06:40PM -0400, Mike Frysinger wrote:
> > > On Fri, May 20, 2011 at 02:18, Stephen Rothwell wrote:
> > > > Caused by commit e66eed651fd1 ("list: remove prefetching from regular list
> > > > iterators").
> > > >
> > > > I added the following patch for today:
> > > 
> > > probably should get added to whatever tree that commit is coming from
> > > so we dont have bisect hell ?
> > > 
> > > more failures:
> > > drivers/usb/host/isp1362-hcd.c: In function 'isp1362_write_ptd':
> > > drivers/usb/host/isp1362-hcd.c:355: error: implicit declaration of
> > > function 'prefetch'
> > > drivers/usb/host/isp1362-hcd.c: In function 'isp1362_read_ptd':
> > > drivers/usb/host/isp1362-hcd.c:377: error: implicit declaration of
> > > function 'prefetchw'
> > > make[3]: *** [drivers/usb/host/isp1362-hcd.o] Error 1
> > > 
> > > drivers/usb/musb/musb_core.c: In function 'musb_write_fifo':
> > > drivers/usb/musb/musb_core.c:219: error: implicit declaration of
> > > function 'prefetch'
> > > make[3]: *** [drivers/usb/musb/musb_core.o] Error 1
> > > 
> > > although it seems like it should be fairly trivial to look at the
> > > funcs in linux/prefetch.h, grep the tree, and find a pretty good list
> > > of the files that are missing the include
> > 
> > How did this not show up in linux-next?  Where did the patch that caused
> > this show up from?
> > 
> > totally confused,
> 
> :-)
> 
> sfr said above:
> > Caused by commit e66eed651fd1 ("list: remove prefetching from regular
> > list iterators").
> 
> The cause was a patch from Linus ...

Ah, ok, that makes more sense, sorry for the noise.

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Krishna Kumar2 @ 2011-05-24 12:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	lguest-uLR06cmDAlY/bJ5BZ2RsiQ, Shirley Ma,
	kvm-u79uwXL29TY76Z2rM5mHXA, Carsten Otte,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, netdev-u79uwXL29TY76Z2rM5mHXA, Martin Schwidefsky,
	linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <20110524112901.GB17087-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

"Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote on 05/24/2011 04:59:39 PM:

> > > > Maybe Rusty means it is a simpler model to free the amount
> > > > of space that this xmit needs. We will still fail anyway
> > > > at some time but it is unlikely, since earlier iteration
> > > > freed up atleast the space that it was going to use.
> > >
> > > Not sure I nderstand.  We can't know space is freed in the previous
> > > iteration as buffers might not have been used by then.
> >
> > Yes, the first few iterations may not have freed up space, but
> > later ones should. The amount of free space should increase
> > from then on, especially since we try to free double of what
> > we consume.
>
> Hmm. This is only an upper limit on the # of entries in the queue.
> Assume that vq size is 4 and we transmit 4 enties without
> getting anything in the used ring. The next transmit will fail.
>
> So I don't really see why it's unlikely that we reach the packet
> drop code with your patch.

I was assuming 256 entries :) I will try to get some
numbers to see how often it is true tomorrow.

> > I am not sure of why it was changed, since returning TX_BUSY
> > seems more efficient IMHO.
> > qdisc_restart() handles requeue'd
> > packets much better than a stopped queue, as a significant
> > part of this code is skipped if gso_skb is present
>
> I think this is the argument:
> http://www.mail-archive.com/virtualization-cunTk1MwBs/ROKNJybVBZg@public.gmane.org
> foundation.org/msg06364.html

Thanks for digging up that thread! Yes, that one skb would get
sent first ahead of possibly higher priority skbs. However,
from a performance point, TX_BUSY code skips a lot of checks
and code for all subsequent packets till the device is
restarted. I can test performance with both cases and report
what I find (the requeue code has become very simple and clean
from "horribly complex", thanks to Herbert and Dave).

> > (qdisc
> > will eventually start dropping packets when tx_queue_len is
>
> tx_queue_len is a pretty large buffer so maybe no.

I remember seeing tons of drops (pfifo_fast_enqueue) when
xmit returns TX_BUSY.

> I think the packet drops from the scheduler queue can also be
> done intelligently (e.g. with CHOKe) which should
> work better than dropping a random packet?

I am not sure of that - choke_enqueue checks against a random
skb to drop current skb, and also during congestion. But for
my "sample driver xmit", returning TX_BUSY could still allow
to be used with CHOKe.

thanks,

- KK

^ permalink raw reply

* Re: [PATCH] be2net: hash key for rss-config cmd not set
From: Flavio Leitner @ 2011-05-24 13:12 UTC (permalink / raw)
  To: Sathya Perla; +Cc: netdev
In-Reply-To: <edb3e5b8-a38f-4efe-b267-9b5b332f16a4@exht1.ad.emulex.com>

On 05/24/2011 03:29 AM, Sathya Perla wrote:
> A non-zero, non-descript value is needed as the hash key. The hash variable was left un-initialized; but sometimes it gets a zero value
> and hashing is not effective. The constant value used now (not of any significance) seems to work fine.
> 
> Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
> ---
>  drivers/net/benet/be_cmds.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
> index 2463b1c..81654ae 100644
> --- a/drivers/net/benet/be_cmds.c
> +++ b/drivers/net/benet/be_cmds.c
> @@ -1703,7 +1703,8 @@ int be_cmd_rss_config(struct be_adapter *adapter, u8 *rsstable, u16 table_size)
>  {
>  	struct be_mcc_wrb *wrb;
>  	struct be_cmd_req_rss_config *req;
> -	u32 myhash[10];
> +	u32 myhash[10] = {0x0123, 0x4567, 0x89AB, 0xCDEF, 0x01EF,
> +			0x0123, 0x4567, 0x89AB, 0xCDEF, 0x01EF};
>  	int status;
>  
>  	if (mutex_lock_interruptible(&adapter->mbox_lock))

I know we have the changelog saying that it's not of any significance,
but would be nice to have the same thing in the source as well.

fbl

^ permalink raw reply

* Re: 2.6.38.x, 2.6.39 sfq? kernel panic in sfq_enqueue
From: Eric Dumazet @ 2011-05-24 13:17 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: netdev, hadi
In-Reply-To: <d3a37593d507f6e16c4e7cbf121cdd6b@visp.net.lb>

Le mardi 24 mai 2011 à 01:35 +0300, Denys Fedoryshchenko a écrit :
> On Mon, 23 May 2011 17:05:46 +0200, Eric Dumazet wrote:
> > Le lundi 23 mai 2011 à 17:27 +0300, Denys Fedoryshchenko a écrit :
> >
> >> > Thanks !
> >>  By objdump or he must recompile kernel with DEBUG_INFO and use gdb?
> >>
> >>
> >>
> >
> >
> > Just objdump of sch_sfq.o (or sch_sfq.ko) file, (to keep existing
> > offsets), thanks
>  Sorry for delay, i had to contact person who run that kernel :)
> 
>  www.nuclearcat.com/files/disasm.txt
>  www.nuclearcat.com/files/sch_sfq.ko
> 
> 

Thanks !

BTW, does this person have another bug trace to feed us ?

I tried to reproduce it here but failed.

full dmesg might help too 



^ permalink raw reply

* Re: 2.6.38.x, 2.6.39 sfq? kernel panic in sfq_enqueue
From: Denys Fedoryshchenko @ 2011-05-24 13:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, hadi
In-Reply-To: <1306243069.3026.34.camel@edumazet-laptop>

 On Tue, 24 May 2011 15:17:49 +0200, Eric Dumazet wrote:
> Le mardi 24 mai 2011 à 01:35 +0300, Denys Fedoryshchenko a écrit :
>> On Mon, 23 May 2011 17:05:46 +0200, Eric Dumazet wrote:
>> > Le lundi 23 mai 2011 à 17:27 +0300, Denys Fedoryshchenko a écrit :
>> >
>> >> > Thanks !
>> >>  By objdump or he must recompile kernel with DEBUG_INFO and use 
>> gdb?
>> >>
>> >>
>> >>
>> >
>> >
>> > Just objdump of sch_sfq.o (or sch_sfq.ko) file, (to keep existing
>> > offsets), thanks
>>  Sorry for delay, i had to contact person who run that kernel :)
>>
>>  www.nuclearcat.com/files/disasm.txt
>>  www.nuclearcat.com/files/sch_sfq.ko
>>
>>
>
> Thanks !
>
> BTW, does this person have another bug trace to feed us ?
>
> I tried to reproduce it here but failed.
>
> full dmesg might help too
 We can try to compile also kernel with debug info and run gdb on it 
 maybe, i will ask him about that.


^ permalink raw reply

* Re: [PATCH v3] CDC NCM: release interfaces fix in unbind()
From: Oliver Neukum @ 2011-05-24 13:27 UTC (permalink / raw)
  To: David Miller
  Cc: alexey.orishko-Re5JQEeQqe8AvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	gregkh-l3A5Bk7waGM, stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz,
	alexey.orishko-0IS4wlFg1OjSUeElwK9/Pw
In-Reply-To: <20110523.174647.1404735155326791304.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

Am Montag, 23. Mai 2011, 23:46:47 schrieb David Miller:
> From: Alexey Orishko <alexey.orishko-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Date: Sat, 21 May 2011 02:42:16 +0200
> 
> > Changes:
> > - claim slave/data interface during bind() and release
> >  interfaces in unbind() unconditionally
> > - in case of error during bind(), release claimed data
> >  interface in the same function
> > - remove obsolited "*_claimed" entries from driver context
> > 
> > Signed-off-by: Alexey Orishko <alexey.orishko-0IS4wlFg1OjSUeElwK9/Pw@public.gmane.org>
> 
> Oliver, this look good to you?

I'd like to see intfdata set to NULL before an interface is unclaimed.

	Regards
		Oliver

-- 
- - - 
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg) 
Maxfeldstraße 5                         
90409 Nürnberg 
Germany 
- - - 
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH v3] CDC NCM: release interfaces fix in unbind()
From: Alexey ORISHKO @ 2011-05-24 13:32 UTC (permalink / raw)
  To: Oliver Neukum, David Miller
  Cc: netdev@vger.kernel.org, linux-usb@vger.kernel.org, gregkh@suse.de,
	stern@rowland.harvard.edu
In-Reply-To: <201105241527.30270.oneukum@suse.de>

> From: Oliver Neukum [mailto:oneukum@suse.de]
> >
> > Oliver, this look good to you?
> 
> I'd like to see intfdata set to NULL before an interface is unclaimed.
> 
> 	Regards
> 		Oliver

I'll update it shortly and post it.

BR Alexey

^ permalink raw reply

* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Michael S. Tsirkin @ 2011-05-24 13:52 UTC (permalink / raw)
  To: Krishna Kumar2
  Cc: habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8,
	lguest-uLR06cmDAlY/bJ5BZ2RsiQ, Shirley Ma,
	kvm-u79uwXL29TY76Z2rM5mHXA, Carsten Otte,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, netdev-u79uwXL29TY76Z2rM5mHXA, Martin Schwidefsky,
	linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <OF2D4A7890.FFA91132-ON6525789A.0043E0AC-6525789A.00464690-xthvdsQ13ZrQT0dZR+AlfA@public.gmane.org>

On Tue, May 24, 2011 at 06:20:35PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote on 05/24/2011 04:59:39 PM:
> 
> > > > > Maybe Rusty means it is a simpler model to free the amount
> > > > > of space that this xmit needs. We will still fail anyway
> > > > > at some time but it is unlikely, since earlier iteration
> > > > > freed up atleast the space that it was going to use.
> > > >
> > > > Not sure I nderstand.  We can't know space is freed in the previous
> > > > iteration as buffers might not have been used by then.
> > >
> > > Yes, the first few iterations may not have freed up space, but
> > > later ones should. The amount of free space should increase
> > > from then on, especially since we try to free double of what
> > > we consume.
> >
> > Hmm. This is only an upper limit on the # of entries in the queue.
> > Assume that vq size is 4 and we transmit 4 enties without
> > getting anything in the used ring. The next transmit will fail.
> >
> > So I don't really see why it's unlikely that we reach the packet
> > drop code with your patch.
> 
> I was assuming 256 entries :) I will try to get some
> numbers to see how often it is true tomorrow.

That would depend on how fast the hypervisor is.
Try doing something to make hypervisor slower than the guest.  I don't
think we need measurements to realize that with the host being slower
than guest that would happen a lot, though.

> > > I am not sure of why it was changed, since returning TX_BUSY
> > > seems more efficient IMHO.
> > > qdisc_restart() handles requeue'd
> > > packets much better than a stopped queue, as a significant
> > > part of this code is skipped if gso_skb is present
> >
> > I think this is the argument:
> > http://www.mail-archive.com/virtualization-cunTk1MwBs/ROKNJybVBZg@public.gmane.org
> > foundation.org/msg06364.html
> 
> Thanks for digging up that thread! Yes, that one skb would get
> sent first ahead of possibly higher priority skbs. However,
> from a performance point, TX_BUSY code skips a lot of checks
> and code for all subsequent packets till the device is
> restarted. I can test performance with both cases and report
> what I find (the requeue code has become very simple and clean
> from "horribly complex", thanks to Herbert and Dave).

Cc Herbert, and try to convince him :)

> > > (qdisc
> > > will eventually start dropping packets when tx_queue_len is
> >
> > tx_queue_len is a pretty large buffer so maybe no.
> 
> I remember seeing tons of drops (pfifo_fast_enqueue) when
> xmit returns TX_BUSY.
> 
> > I think the packet drops from the scheduler queue can also be
> > done intelligently (e.g. with CHOKe) which should
> > work better than dropping a random packet?
> 
> I am not sure of that - choke_enqueue checks against a random
> skb to drop current skb, and also during congestion. But for
> my "sample driver xmit", returning TX_BUSY could still allow
> to be used with CHOKe.
> 
> thanks,
> 
> - KK

^ permalink raw reply

* Re: [RFC Patch] bonding: move to net/ directory
From: Américo Wang @ 2011-05-24 14:00 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Linux Kernel Network Developers, David Miller, Jay Vosburgh
In-Reply-To: <20110523151336.GF21309@gospo.rdu.redhat.com>

On Mon, May 23, 2011 at 11:13 PM, Andy Gospodarek <andy@greyhouse.net> wrote:
> On Mon, May 23, 2011 at 08:45:14PM +0800, Américo Wang wrote:
>> Hello, Jay, Andy,
>>
>> Is there any peculiar reason that bonding code has to stay
>> in drivers/net/ directory?
>>
>> Since bonding and bridge are somewhat similar, both of
>> which are used to "bond" two or more devices into one,
>> and bridge code is already in net/bridge/, so I think it also
>> makes sense to move bonding code into net/bonding/ too.
>>
>> This could also help to grep the source more easily. :)
>>
>> What do you think about the patch below?
>> (Note, this patch is against net-next-2.6.)
>>
>
> I would rather keep the code for bonding in drivers/net since it is
> really a pure device (though not directly tied to any specific
> hardware) rather than a device + forwarding or management code.

Is this a reason strong enough to leave it to drivers/net/ ?
I think it is generic enough to be moved to net/ directory... :-/

>
> It has bothered me for a long time that the code just to manage the VLAN
> and bridge devices sits outside of drivers/net, but I've never proposed
> a patch to move the files as I suspect the maintainers of that code
> would like to keep it all together.  Maybe it is time to do that.
>

You mean move net/8021q/ to drivers/net/8021q/ ?

Thanks!

^ permalink raw reply

* bridge netfilter output bug on 2.6.39
From: Stephen Hemminger @ 2011-05-24 14:41 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev

Got this bug report against 2.6.39.  Looks like ip_fragment() is now
getting confused when called from bridge netfilter. Probably related to
the changes to do ip_options_compile for the bridge input path.

https://bugzilla.kernel.org/show_bug.cgi?id=35672

May 23 02:04:24 lxc kernel: [99498.329036] BUG: unable to handle kernel NULL
pointer dereference at 00000004
May 23 02:04:24 lxc kernel: [99498.330017] IP: [<c143d6bf>] dst_mtu+0xb/0x1c
May 23 02:04:24 lxc kernel: [99498.330017] *pdpt = 000000001fb55001 *pde =
0000000000000000
May 23 02:04:24 lxc kernel: [99498.330017] Oops: 0000 [#1] SMP
May 23 02:04:24 lxc kernel: [99498.330017] last sysfs file:
/sys/devices/virtual/vc/vcsa8/uevent
May 23 02:04:24 lxc kernel: [99498.330017] Modules linked in: lp ppdev
parport_pc parport fuse firewire_ohci firewire_core crc_itu_t intel_agp
intel_gtt
May 23 02:04:24 lxc kernel: [99498.330017]
May 23 02:04:24 lxc kernel: [99498.330017] Pid: 0, comm: swapper Not tainted
2.6.39-lxc #2 .   .  /IP35 Pro XE(Intel P35-ICH9R)
May 23 02:04:24 lxc kernel: [99498.330017] EIP: 0060:[<c143d6bf>] EFLAGS:
00010246 CPU: 0
May 23 02:04:24 lxc kernel: [99498.330017] EIP is at dst_mtu+0xb/0x1c
May 23 02:04:24 lxc kernel: [99498.330017] EAX: 00000000 EBX: e90b6b40 ECX:
effc981c EDX: effc9000
May 23 02:04:24 lxc kernel: [99498.330017] ESI: c1a0d84e EDI: dda6331e EBP:
f080bb44 ESP: f080bb44
May 23 02:04:24 lxc kernel: [99498.330017]  DS: 007b ES: 007b FS: 00d8 GS: 0000
SS: 0068
May 23 02:04:24 lxc kernel: [99498.330017] Process swapper (pid: 0, ti=f080a000
task=c172b7e0 task.ti=c1724000)
May 23 02:04:24 lxc kernel: [99498.330017] Stack:
May 23 02:04:24 lxc kernel: [99498.330017]  f080bb8c c143e20d 00000004 f080bb88
c141aab2 c14b46db effc9000 00000014
May 23 02:04:24 lxc kernel: [99498.330017]  c14b8a44 effc9000 e90b6b40 00000014
effc981c e90b6b58 cd472800 e90b6b40
May 23 02:04:24 lxc kernel: [99498.330017]  c14b8a44 dda6331e f080bb98 c14b8aa0
e90b6b40 f080bba8 c14b881a e90b6b40
May 23 02:04:24 lxc kernel: [99498.330017] Call Trace:
May 23 02:04:24 lxc kernel: [99498.330017]  [<c143e20d>] ip_fragment+0xb5/0x66c
May 23 02:04:24 lxc kernel: [99498.330017]  [<c141aab2>] ?
nf_hook_slow+0x43/0xd1
May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b46db>] ? br_flood+0x83/0x83
May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b8a44>] ?
br_parse_ip_options+0x1b0/0x1b0
May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b8a44>] ?
br_parse_ip_options+0x1b0/0x1b0
May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b8aa0>]
br_nf_dev_queue_xmit+0x5c/0x68

^ permalink raw reply

* Re: [patch 1/1] net: convert %p usage to %pK
From: Stephen Hemminger @ 2011-05-24 14:43 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, joe, mingo, akpm, netdev, drosenberg, a.p.zijlstra,
	eparis, eugeneteo, jmorris, kees.cook, tgraf
In-Reply-To: <20110524.035801.1555795213632087107.davem@davemloft.net>

On Tue, 24 May 2011 03:58:01 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 24 May 2011 09:45:01 +0200
> 
> > Le mardi 24 mai 2011 à 00:35 -0700, Joe Perches a écrit :
> > 
> >> I think it's be better without the casts
> >> using the standard kernel.h macros.
> >> 
> >> 	void *ptr;
> >> 
> >> 	ptr = maybe_hide_ptr(sk);
> >> 	r->id.idiag_cookie[0] = lower_32_bits(ptr);
> >> 	r->id.idiag_cookie[1] = upper_32_bits(ptr);
> >> 
> > 
> > I am not sure I want to patch lower_32_bits() and upper_32_bits() for
> > this.
> > 
> > They dont work on pointers, but on "numbers", according to kerneldoc
> > Andrew wrote years ago. gcc agrees :
> > 
> > net/ipv4/inet_diag.c: In function ‘inet_csk_diag_fill’:
> > net/ipv4/inet_diag.c:119: warning: cast from pointer to integer of different size
> > net/ipv4/inet_diag.c:120: error: invalid operands to binary >>
> > make[1]: *** [net/ipv4/inet_diag.o] Error 1
> 
> Also you can't do this, the "cookie" is used by the kernel future
> lookups to find sockets.
> 
> The kernel pointer is part of the API, so sorry you can't "hide"
> kernel pointers in this case without really breaking user visible
> things.
> [Error decoding BASE64]

Couldn't the pointer be replaced by a socket index?

-- 

^ permalink raw reply

* Re: [RFC Patch] bonding: move to net/ directory
From: Andy Gospodarek @ 2011-05-24 15:03 UTC (permalink / raw)
  To: Américo Wang
  Cc: Andy Gospodarek, Linux Kernel Network Developers, David Miller,
	Jay Vosburgh
In-Reply-To: <BANLkTikT2HJyGb_4BOF+Ds1AAcJV+HVQ-w@mail.gmail.com>

On Tue, May 24, 2011 at 10:00:23PM +0800, Américo Wang wrote:
> On Mon, May 23, 2011 at 11:13 PM, Andy Gospodarek <andy@greyhouse.net> wrote:
> > On Mon, May 23, 2011 at 08:45:14PM +0800, Américo Wang wrote:
> >> Hello, Jay, Andy,
> >>
> >> Is there any peculiar reason that bonding code has to stay
> >> in drivers/net/ directory?
> >>
> >> Since bonding and bridge are somewhat similar, both of
> >> which are used to "bond" two or more devices into one,
> >> and bridge code is already in net/bridge/, so I think it also
> >> makes sense to move bonding code into net/bonding/ too.
> >>
> >> This could also help to grep the source more easily. :)
> >>
> >> What do you think about the patch below?
> >> (Note, this patch is against net-next-2.6.)
> >>
> >
> > I would rather keep the code for bonding in drivers/net since it is
> > really a pure device (though not directly tied to any specific
> > hardware) rather than a device + forwarding or management code.
> 
> Is this a reason strong enough to leave it to drivers/net/ ?
> I think it is generic enough to be moved to net/ directory... :-/
> 

I think the distinction is an important one and is one of the main
reasons why I would like to see bonding stay in drivers/net.

> >
> > It has bothered me for a long time that the code just to manage the VLAN
> > and bridge devices sits outside of drivers/net, but I've never proposed
> > a patch to move the files as I suspect the maintainers of that code
> > would like to keep it all together.  Maybe it is time to do that.
> >
> 
> You mean move net/8021q/ to drivers/net/8021q/ ?
> 

No.  I'm talking about the parts in the bridging and vlan code that
specifically setup devices, not all of the code.  I would be happier
if code that created objects of type net_device_ops all lived in
drivers/net.  Then all the drivers (real, stacked, or virtual) are in
the same place.

It has not bothered me enough to consider posting patches, but you
should consider it if it bothers you.

^ permalink raw reply

* Re: [PATCH 1/3] vlan: Do not support clearing VLAN_FLAG_REORDER_HDR
From: Ben Greear @ 2011-05-24 15:17 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Eric W. Biederman, David Miller, shemminger, jpirko, xiaosuo,
	netdev, kaber, fubar, eric.dumazet, andy, jesse
In-Reply-To: <4DDB5A3F.8050007@gmail.com>

On 05/24/2011 12:11 AM, Nicolas de Pesloüan wrote:
> Le 24/05/2011 06:20, Ben Greear a écrit :
> <snip>
>
>> Then I think we should put it back with pf_packet logic. Possibly with
>> a per-socket option to disable this and send it as only aux data if that
>> is more efficient.
>
> Why would we need a per-socket option for that?

I'd want the tag available at all times, but if it is a performance win to
put the tag in 'aux data', then a socket could opt for that method.  Otherwise,
the tag should always be inline in the packet.

>> If it turns out the NIC is not stripping VLAN tags for whatever reason,
>> we might be able to optimize things so that it never does the HW emulation
>> so that it never has to un-do it later.
>
> It might be very tricky to avoid any do-undo-redo situation. I will latter try and describe all the possible situations: with/without hw-accel, with/without a
> protocol handler registered on the parent interface, with/without a child interface build on top of a particular parent interface and with the corresponding
> VLAN ID...

You could optimize for some common cases, but leave fix-up code in place to add/remove the
tag properly before handing the packet to user-space, filters, etc.

Thanks,
Ben

>
> Nicolas.


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* [PATCH v4] CDC NCM: release interfaces fix in unbind()
From: Alexey Orishko @ 2011-05-24 15:26 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	oliver-GvhC2dPhHPQdnm+yROfE0A
  Cc: linux-usb-u79uwXL29TY76Z2rM5mHXA, gregkh-l3A5Bk7waGM,
	Alexey Orishko

Changes:
- claim slave/data interface during bind() and release
 interfaces in unbind() unconditionally
- in case of error during bind(), release claimed data
 interface in the same function
- remove obsolited "*_claimed" entries from driver context

Signed-off-by: Alexey Orishko <alexey.orishko-0IS4wlFg1OjSUeElwK9/Pw@public.gmane.org>
---
 drivers/net/usb/cdc_ncm.c |   73 +++++++++++++++-----------------------------
 1 files changed, 25 insertions(+), 48 deletions(-)

diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index 4ab557d..cdd3ae4 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -54,7 +54,7 @@
 #include <linux/usb/usbnet.h>
 #include <linux/usb/cdc.h>
 
-#define	DRIVER_VERSION				"06-May-2011"
+#define	DRIVER_VERSION				"24-May-2011"
 
 /* CDC NCM subclass 3.2.1 */
 #define USB_CDC_NCM_NDP16_LENGTH_MIN		0x10
@@ -134,8 +134,6 @@ struct cdc_ncm_ctx {
 	u16 tx_ndp_modulus;
 	u16 tx_seq;
 	u16 connected;
-	u8 data_claimed;
-	u8 control_claimed;
 };
 
 static void cdc_ncm_tx_timeout(unsigned long arg);
@@ -460,17 +458,6 @@ static void cdc_ncm_free(struct cdc_ncm_ctx *ctx)
 
 	del_timer_sync(&ctx->tx_timer);
 
-	if (ctx->data_claimed) {
-		usb_set_intfdata(ctx->data, NULL);
-		usb_driver_release_interface(driver_of(ctx->intf), ctx->data);
-	}
-
-	if (ctx->control_claimed) {
-		usb_set_intfdata(ctx->control, NULL);
-		usb_driver_release_interface(driver_of(ctx->intf),
-								ctx->control);
-	}
-
 	if (ctx->tx_rem_skb != NULL) {
 		dev_kfree_skb_any(ctx->tx_rem_skb);
 		ctx->tx_rem_skb = NULL;
@@ -495,7 +482,7 @@ static int cdc_ncm_bind(struct usbnet *dev, struct usb_interface *intf)
 
 	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
 	if (ctx == NULL)
-		goto error;
+		return -ENODEV;
 
 	memset(ctx, 0, sizeof(*ctx));
 
@@ -568,46 +555,36 @@ advance:
 
 	/* check if we got everything */
 	if ((ctx->control == NULL) || (ctx->data == NULL) ||
-	    (ctx->ether_desc == NULL))
+	    (ctx->ether_desc == NULL) || (ctx->control != intf))
 		goto error;
 
 	/* claim interfaces, if any */
-	if (ctx->data != intf) {
-		temp = usb_driver_claim_interface(driver, ctx->data, dev);
-		if (temp)
-			goto error;
-		ctx->data_claimed = 1;
-	}
-
-	if (ctx->control != intf) {
-		temp = usb_driver_claim_interface(driver, ctx->control, dev);
-		if (temp)
-			goto error;
-		ctx->control_claimed = 1;
-	}
+	temp = usb_driver_claim_interface(driver, ctx->data, dev);
+	if (temp)
+		goto error;
 
 	iface_no = ctx->data->cur_altsetting->desc.bInterfaceNumber;
 
 	/* reset data interface */
 	temp = usb_set_interface(dev->udev, iface_no, 0);
 	if (temp)
-		goto error;
+		goto error2;
 
 	/* initialize data interface */
 	if (cdc_ncm_setup(ctx))
-		goto error;
+		goto error2;
 
 	/* configure data interface */
 	temp = usb_set_interface(dev->udev, iface_no, 1);
 	if (temp)
-		goto error;
+		goto error2;
 
 	cdc_ncm_find_endpoints(ctx, ctx->data);
 	cdc_ncm_find_endpoints(ctx, ctx->control);
 
 	if ((ctx->in_ep == NULL) || (ctx->out_ep == NULL) ||
 	    (ctx->status_ep == NULL))
-		goto error;
+		goto error2;
 
 	dev->net->ethtool_ops = &cdc_ncm_ethtool_ops;
 
@@ -617,7 +594,7 @@ advance:
 
 	temp = usbnet_get_ethernet_addr(dev, ctx->ether_desc->iMACAddress);
 	if (temp)
-		goto error;
+		goto error2;
 
 	dev_info(&dev->udev->dev, "MAC-Address: "
 				"0x%02x:0x%02x:0x%02x:0x%02x:0x%02x:0x%02x\n",
@@ -642,38 +619,38 @@ advance:
 	ctx->tx_speed = ctx->rx_speed = 0;
 	return 0;
 
+error2:
+	usb_set_intfdata(ctx->control, NULL);
+	usb_set_intfdata(ctx->data, NULL);
+	usb_driver_release_interface(driver, ctx->data);
 error:
 	cdc_ncm_free((struct cdc_ncm_ctx *)dev->data[0]);
 	dev->data[0] = 0;
-	dev_info(&dev->udev->dev, "Descriptor failure\n");
+	dev_info(&dev->udev->dev, "bind() failure\n");
 	return -ENODEV;
 }
 
 static void cdc_ncm_unbind(struct usbnet *dev, struct usb_interface *intf)
 {
 	struct cdc_ncm_ctx *ctx = (struct cdc_ncm_ctx *)dev->data[0];
-	struct usb_driver *driver;
+	struct usb_driver *driver = driver_of(intf);
 
 	if (ctx == NULL)
 		return;		/* no setup */
 
-	driver = driver_of(intf);
-
-	usb_set_intfdata(ctx->data, NULL);
-	usb_set_intfdata(ctx->control, NULL);
-	usb_set_intfdata(ctx->intf, NULL);
-
-	/* release interfaces, if any */
-	if (ctx->data_claimed) {
+	/* disconnect master --> disconnect slave */
+	if (intf == ctx->control && ctx->data) {
+		usb_set_intfdata(ctx->data, NULL);
 		usb_driver_release_interface(driver, ctx->data);
-		ctx->data_claimed = 0;
-	}
+		ctx->data = NULL;
 
-	if (ctx->control_claimed) {
+	} else if (intf == ctx->data && ctx->control) {
+		usb_set_intfdata(ctx->control, NULL);
 		usb_driver_release_interface(driver, ctx->control);
-		ctx->control_claimed = 0;
+		ctx->control = NULL;
 	}
 
+	usb_set_intfdata(ctx->intf, NULL);
 	cdc_ncm_free(ctx);
 }
 
-- 
1.7.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: bridge netfilter output bug on 2.6.39
From: Eric Dumazet @ 2011-05-24 15:39 UTC (permalink / raw)
  To: Stephen Hemminger, David Miller; +Cc: Herbert Xu, netdev
In-Reply-To: <20110524074156.58eb30f8@nehalam>

Le mardi 24 mai 2011 à 07:41 -0700, Stephen Hemminger a écrit :
> Got this bug report against 2.6.39.  Looks like ip_fragment() is now
> getting confused when called from bridge netfilter. Probably related to
> the changes to do ip_options_compile for the bridge input path.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=35672
> 
> May 23 02:04:24 lxc kernel: [99498.329036] BUG: unable to handle kernel NULL
> pointer dereference at 00000004
> May 23 02:04:24 lxc kernel: [99498.330017] IP: [<c143d6bf>] dst_mtu+0xb/0x1c
> May 23 02:04:24 lxc kernel: [99498.330017] *pdpt = 000000001fb55001 *pde =
> 0000000000000000
> May 23 02:04:24 lxc kernel: [99498.330017] Oops: 0000 [#1] SMP
> May 23 02:04:24 lxc kernel: [99498.330017] last sysfs file:
> /sys/devices/virtual/vc/vcsa8/uevent
> May 23 02:04:24 lxc kernel: [99498.330017] Modules linked in: lp ppdev
> parport_pc parport fuse firewire_ohci firewire_core crc_itu_t intel_agp
> intel_gtt
> May 23 02:04:24 lxc kernel: [99498.330017]
> May 23 02:04:24 lxc kernel: [99498.330017] Pid: 0, comm: swapper Not tainted
> 2.6.39-lxc #2 .   .  /IP35 Pro XE(Intel P35-ICH9R)
> May 23 02:04:24 lxc kernel: [99498.330017] EIP: 0060:[<c143d6bf>] EFLAGS:
> 00010246 CPU: 0
> May 23 02:04:24 lxc kernel: [99498.330017] EIP is at dst_mtu+0xb/0x1c
> May 23 02:04:24 lxc kernel: [99498.330017] EAX: 00000000 EBX: e90b6b40 ECX:
> effc981c EDX: effc9000
> May 23 02:04:24 lxc kernel: [99498.330017] ESI: c1a0d84e EDI: dda6331e EBP:
> f080bb44 ESP: f080bb44
> May 23 02:04:24 lxc kernel: [99498.330017]  DS: 007b ES: 007b FS: 00d8 GS: 0000
> SS: 0068
> May 23 02:04:24 lxc kernel: [99498.330017] Process swapper (pid: 0, ti=f080a000
> task=c172b7e0 task.ti=c1724000)
> May 23 02:04:24 lxc kernel: [99498.330017] Stack:
> May 23 02:04:24 lxc kernel: [99498.330017]  f080bb8c c143e20d 00000004 f080bb88
> c141aab2 c14b46db effc9000 00000014
> May 23 02:04:24 lxc kernel: [99498.330017]  c14b8a44 effc9000 e90b6b40 00000014
> effc981c e90b6b58 cd472800 e90b6b40
> May 23 02:04:24 lxc kernel: [99498.330017]  c14b8a44 dda6331e f080bb98 c14b8aa0
> e90b6b40 f080bba8 c14b881a e90b6b40
> May 23 02:04:24 lxc kernel: [99498.330017] Call Trace:
> May 23 02:04:24 lxc kernel: [99498.330017]  [<c143e20d>] ip_fragment+0xb5/0x66c
> May 23 02:04:24 lxc kernel: [99498.330017]  [<c141aab2>] ?
> nf_hook_slow+0x43/0xd1
> May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b46db>] ? br_flood+0x83/0x83
> May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b8a44>] ?
> br_parse_ip_options+0x1b0/0x1b0
> May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b8a44>] ?
> br_parse_ip_options+0x1b0/0x1b0
> May 23 02:04:24 lxc kernel: [99498.330017]  [<c14b8aa0>]
> br_nf_dev_queue_xmit+0x5c/0x68
> --

I would say its more likely a problem with dst metrics changes

In this crash, we dereference a NULL dst->_metrics 'pointer' in
dst_metric_raw(dst, RTAX_MTU);

Hmm, it seems __dst_destroy_metrics_generic() doesnt add the
DST_METRICS_READ_ONLY flag ?

[PATCH] net: fix __dst_destroy_metrics_generic()

dst_default_metrics is readonly, we dont want to kfree() it later.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
---
 net/core/dst.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index 81a4fa1..1badc98 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -315,7 +315,7 @@ void __dst_destroy_metrics_generic(struct dst_entry *dst, unsigned long old)
 {
 	unsigned long prev, new;
 
-	new = (unsigned long) dst_default_metrics;
+	new = ((unsigned long) dst_default_metrics) | DST_METRICS_READ_ONLY;
 	prev = cmpxchg(&dst->_metrics, old, new);
 	if (prev == old)
 		kfree(__DST_METRICS_PTR(old));



^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox