Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Michael S. Tsirkin @ 2011-05-25  6:07 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <8739k3k1fb.fsf-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>

On Wed, May 25, 2011 at 11:05:04AM +0930, Rusty Russell wrote:
> On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > I do understand how it seems a waste to leave direct space
> > in the ring while we might in practice have space
> > due to indirect. Didn't come up with a nice way to
> > solve this yet - but 'no worse than now :)'
> 
> Let's just make it "bool free_old_xmit_skbs(unsigned int max)".  max ==
> 2 for the normal xmit path, so we're low latency but we keep ahead on
> average.  max == -1 for the "we're out of capacity, we may have to stop
> the queue".
> 
> That keeps it simple and probably the right thing...
> 
> Thanks,
> Rusty.

Hmm I'm not sure I got it, need to think about this.
I'd like to go back and document how my design was supposed to work.
This really should have been in commit log or even a comment.
I thought we need a min, not a max.
We start with this:

	while ((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS) &&
		(skb = get_buf)))
		kfree_skb(skb);
	return !c;

This is clean and simple, right? And it's exactly asking for what we need.

But this way we always keep a lot of memory in skbs even when rate of
communication is low.

So we add the min parameter:

	int n = 0;

	while ((((c = (virtqueue_get_capacity(vq) < 2 + MAX_SKB_FRAGS)) ||
		 n++ < min) && (skb = get_buf)))
		kfree_skb(skb);
	return !c;


on the normal path min == 2 so we're low latency but we keep ahead on
average. min == 0 for the "we're out of capacity, we may have to stop
the queue".

Does the above make sense at all?

-- 
MST

^ permalink raw reply

* Re: Kernel crash after using new Intel NIC (igb)
From: Arun Sharma @ 2011-05-25  6:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Arun Sharma, Maximilian Engelhardt, linux-kernel, netdev,
	StuStaNet Vorstand
In-Reply-To: <1306291469.3305.11.camel@edumazet-laptop>

On Wed, May 25, 2011 at 04:44:29AM +0200, Eric Dumazet wrote:
> 
> Hmm, thanks for the report. Are you running x86 or another arch ?
> 

This was on x86.

> We probably need some sort of memory barrier.
> 
> However, locking this central lock makes the thing too slow, I'll try to
> use an atomic_inc_return on p->refcnt instead. (and then lock
> unused_peers.lock if we got a 0->1 transition)

Another possibility is to do the list_empty() check twice. Once without
taking the lock and again with the spinlock held.

 -Arun

^ permalink raw reply

* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Michael S. Tsirkin @ 2011-05-25  5:50 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <8762ozk1qd.fsf-8n+1lVoiYb80n/F98K4Iww@public.gmane.org>

On Wed, May 25, 2011 at 10:58:26AM +0930, Rusty Russell wrote:
> On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> > > Can we hit problems with OOM?  Sure, but no worse than now...
> > > The problem is that this "virtqueue_get_capacity()" returns the worst
> > > case, not the normal case.  So using it is deceptive.
> > > 
> > 
> > Maybe just document this?
> 
> Yes, but also by renaming virtqueue_get_capacity().  Takes it from a 3
> to a 6 on the API hard-to-misuse scale.
> 
> How about, virtqueue_min_capacity()?  Makes the reader realize something
> weird is going on.

Absolutely. Great idea.

> > I still believe capacity really needs to be decided
> > at the virtqueue level, not in the driver.
> > E.g. with indirect each skb uses a single entry: freeing
> > 1 small skb is always enough to have space for a large one.
> > 
> > I do understand how it seems a waste to leave direct space
> > in the ring while we might in practice have space
> > due to indirect. Didn't come up with a nice way to
> > solve this yet - but 'no worse than now :)'
> 
> Agreed.
> 
> > > > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
> > > > sure we have enough space in the buffer. Another way to do
> > > > that is with a define :).
> > > 
> > > To do this properly, we should really be using the actual number of sg
> > > elements needed, but we'd have to do most of xmit_skb beforehand so we
> > > know how many.
> > > 
> > > Cheers,
> > > Rusty.
> > 
> > Maybe I'm confused here.  The problem isn't the failing
> > add_buf for the given skb IIUC.  What we are trying to do here is stop
> > the queue *before xmit_skb fails*. We can't look at the
> > number of fragments in the current skb - the next one can be
> > much larger.  That's why we check capacity after xmit_skb,
> > not before it, right?
> 
> No, I was confused...  More coffee!
> 
> Thanks,
> Rusty.

^ permalink raw reply

* PLEASE I NEED YOUR ASSISTANCE????
From: Mr. LEUNG Cheung @ 2011-05-25  5:06 UTC (permalink / raw)


Hello,

I am contacting you because I need your assistance to transfer $22,500,000.00 USD from Hong Kong to your country. 

Your share will be 40% while I will be entitled to 60% as the initiator of the business. 

Thank you,
Mr. LEUNG Cheung

^ permalink raw reply

* Re: VLAN test cases in 2.6.38.7
From: Ben Greear @ 2011-05-25  3:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20110524.224720.67532722991776785.davem@davemloft.net>

On 05/24/2011 07:47 PM, David Miller wrote:
> From: Ben Greear<greearb@candelatech.com>
> Date: Tue, 24 May 2011 15:52:52 -0700
>
>> Either my ubridge code is broken, or 2.6.38.7+ doesn't send
>> tp_vlan_tci
>> properly.  All I ever see is zero for that field.
>
> You need to put the af_packet socket into TPACKET_V2 format.

Actually, that doesn't seem needed.  I just needed to give more
space to the cmsg buffer to account for the cmsg struct overhead.
(And I had a bunch of other bugs, but that was the only one
that directly affected the tci field.)

I think that TPACKET_V2 stuff might be only for reading
memmapped messages or something like that?

It looks like for any flexible user-space code, we are just going
to have to use recvmesg instead of recvfrom, and add the extra checks
for cmesg data.  I'll post code when I get it working better so
the next person to try this might have an easier time :P

I think we will need to add the has-vlan-tci flag to the cmesg
data, and probably also remove that masking of the CFI bit.

I've no idea about socket filters, but maybe someone else can
deal with that :)

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: VLAN test cases in 2.6.38.7
From: David Miller @ 2011-05-25  2:47 UTC (permalink / raw)
  To: greearb; +Cc: netdev
In-Reply-To: <4DDC36C4.80408@candelatech.com>

From: Ben Greear <greearb@candelatech.com>
Date: Tue, 24 May 2011 15:52:52 -0700

> Either my ubridge code is broken, or 2.6.38.7+ doesn't send
> tp_vlan_tci
> properly.  All I ever see is zero for that field.

You need to put the af_packet socket into TPACKET_V2 format.

^ permalink raw reply

* Re: Kernel crash after using new Intel NIC (igb)
From: Eric Dumazet @ 2011-05-25  2:44 UTC (permalink / raw)
  To: Arun Sharma
  Cc: Maximilian Engelhardt, linux-kernel, netdev, StuStaNet Vorstand
In-Reply-To: <20110524213327.GA3917@dev1756.snc6.facebook.com>

Le mardi 24 mai 2011 à 14:33 -0700, Arun Sharma a écrit :
> On Thu, May 12, 2011 at 11:15:53PM +0200, Eric Dumazet wrote:
> > 
> > Probably not.
> > 
> > What gives slub_nomerge=1   for you ?
> > 
> 
> It took me a while to get a new kernel on a large enough sample
> of machines to get some data.
> 
> Like you observed in the other thread, this is unlikely to be a random
> memory corruption.
> 
> The panics stopped after we moved the list_empty() check under the lock.
> 
> --- a/net/ipv4/inetpeer.c
> +++ b/net/ipv4/inetpeer.c
> @@ -154,11 +154,11 @@ void __init inet_initpeers(void)
>  /* Called with or without local BH being disabled. */
>  static void unlink_from_unused(struct inet_peer *p)
>  {
> +	spin_lock_bh(&unused_peers.lock);
>  	if (!list_empty(&p->unused)) {
> -		spin_lock_bh(&unused_peers.lock);
>  		list_del_init(&p->unused);
> -		spin_unlock_bh(&unused_peers.lock);
>  	}
> +	spin_unlock_bh(&unused_peers.lock);
>  }
>  
>  static int addr_compare(const struct inetpeer_addr *a,
> 
> The idea being that the list gets corrupted under some kind of a race
> condition. Two threads racing on list_empty() and executing
> list_del_init() seems harmless.
> 
> There is probably a different race condition that is mitigated by doing
> the list_empty() check under the lock.
> 

Hmm, thanks for the report. Are you running x86 or another arch ?

We probably need some sort of memory barrier.

However, locking this central lock makes the thing too slow, I'll try to
use an atomic_inc_return on p->refcnt instead. (and then lock
unused_peers.lock if we got a 0->1 transition)

I am testing following patch :


diff --git a/net/ipv4/inetpeer.c b/net/ipv4/inetpeer.c
index 9df4e63..43aacbf 100644
--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -154,11 +154,9 @@ void __init inet_initpeers(void)
 /* Called with or without local BH being disabled. */
 static void unlink_from_unused(struct inet_peer *p)
 {
-	if (!list_empty(&p->unused)) {
-		spin_lock_bh(&unused_peers.lock);
-		list_del_init(&p->unused);
-		spin_unlock_bh(&unused_peers.lock);
-	}
+	spin_lock_bh(&unused_peers.lock);
+	list_del_init(&p->unused);
+	spin_unlock_bh(&unused_peers.lock);
 }
 
 static int addr_compare(const struct inetpeer_addr *a,
@@ -213,10 +211,11 @@ static int addr_compare(const struct inetpeer_addr *a,
  * We exit from this function if number of links exceeds PEER_MAXDEPTH
  */
 static struct inet_peer *lookup_rcu(const struct inetpeer_addr *daddr,
-				    struct inet_peer_base *base)
+				    struct inet_peer_base *base,
+				    int *newrefcnt)
 {
 	struct inet_peer *u = rcu_dereference(base->root);
-	int count = 0;
+	int old, new, count = 0;
 
 	while (u != peer_avl_empty) {
 		int cmp = addr_compare(daddr, &u->daddr);
@@ -226,8 +225,16 @@ static struct inet_peer *lookup_rcu(const struct inetpeer_addr *daddr,
 			 * distinction between an unused entry (refcnt=0) and
 			 * a freed one.
 			 */
-			if (unlikely(!atomic_add_unless(&u->refcnt, 1, -1)))
-				u = NULL;
+			while (1) {
+				old = atomic_read(&u->refcnt);
+				if (old == -1)
+					return NULL;
+				new = old + 1;
+				if (atomic_cmpxchg(&u->refcnt,
+						   old, new) == old)
+					break;
+			}
+			*newrefcnt = new;
 			return u;
 		}
 		if (cmp == -1)
@@ -465,14 +472,14 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
 	struct inet_peer_base *base = family_to_base(daddr->family);
 	struct inet_peer *p;
 	unsigned int sequence;
-	int invalidated;
+	int invalidated, newrefcnt = 0;
 
 	/* Look up for the address quickly, lockless.
 	 * Because of a concurrent writer, we might not find an existing entry.
 	 */
 	rcu_read_lock();
 	sequence = read_seqbegin(&base->lock);
-	p = lookup_rcu(daddr, base);
+	p = lookup_rcu(daddr, base, &newrefcnt);
 	invalidated = read_seqretry(&base->lock, sequence);
 	rcu_read_unlock();
 
@@ -480,7 +487,8 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
 		/* The existing node has been found.
 		 * Remove the entry from unused list if it was there.
 		 */
-		unlink_from_unused(p);
+		if (newrefcnt == 1)
+			unlink_from_unused(p);
 		return p;
 	}
 
@@ -494,10 +502,11 @@ struct inet_peer *inet_getpeer(struct inetpeer_addr *daddr, int create)
 	write_seqlock_bh(&base->lock);
 	p = lookup(daddr, stack, base);
 	if (p != peer_avl_empty) {
-		atomic_inc(&p->refcnt);
+		newrefcnt = atomic_inc_return(&p->refcnt);
 		write_sequnlock_bh(&base->lock);
 		/* Remove the entry from unused list if it was there. */
-		unlink_from_unused(p);
+		if (newrefcnt == 1)
+			unlink_from_unused(p);
 		return p;
 	}
 	p = create ? kmem_cache_alloc(peer_cachep, GFP_ATOMIC) : NULL;

^ permalink raw reply related

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
From: Neil Horman @ 2011-05-25  2:20 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: =?us-ascii?B?PT9JU08tODg1OS0xP1E/Tmljb2xhc19kZV9QZXNsbz1GQ2FuPz0=?=,
	Andy Gospodarek, netdev, David S. Miller
In-Reply-To: <20007.1306283668@death>

On Tue, May 24, 2011 at 05:34:28PM -0700, Jay Vosburgh wrote:
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> >On Tue, May 24, 2011 at 02:12:54PM -0700, Jay Vosburgh wrote:
> >> Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:
> >> 
> >> >Le 24/05/2011 22:37, Neil Horman a écrit :
> >> >
> >> >>>>> +		return -EINVAL;
> >> >>>
> >> >>> This will turn a warning into an error.
> >> >>>
> >> >> Yes, because it should have been an error all along.
> >> >>
> >> >>> This warning existed for long, but never caused the bonding setup to
> >> >>> fail. This patch cause some regression for user space. For example,
> >> >>> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
> >> >>> before enslaving, because this was never required.
> >> >>>
> >> >> Thats not a regression, thats the kernel returning an error where it should have
> >> >> done so all along.  Just because a utility got away with it for awhile and it
> >> >> didn't always cause a lockup, doesn't grandfather that application in to a
> >> >> situation where the kernel has to support its broken behavior in perpituity.
> >> >>
> >> >> Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
> >> >> patch doesn't touch, so ifenslave is currently unaffected (although I should
> >> >> look in the ioctl path to see if we have already added such a check, lest you be
> >> >> able to deadlock your system as previously indicated using that tool).
> >> >
> >> >Unfortunately, no. Recent versions of ifenslave-2.6 on Debian don't use
> >> >ioctl (ifenslave binary) anymore, but only sysfs.
> >> >
> >> >Documentation/bonding.txt should be updated to reflect this change.
> >> >pr_warning should be changed to pr_ err.
> >> >Bonding version should be bumped.
> >> >
> >> >Anyway, I will fix this package, but I suspect there exist many user
> >> >scripts that don't ensure bond is up before enslaving.
> >> 
> >> 	I looked at sysconfig (as supplied with opensuse) and it uses
> >> sysfs, and does set the master device up first.  The other potential
> >> user that comes to mind is that OFED at one point had a script to set up
> >> bonding for Infiniband devices.  I don't know if this is still the case,
> >> nor do I know if it set the bond device up before enslaving.
> >> 
> >> 	Generally speaking, though, in the long run I think it should be
> >> permissible to change any bonding option when the bond is down (even to
> >> values that make no sense in context, e.g., setting the primary to a
> >> device not currently enslaved).  My rationale here is that some options
> >> are very difficult to modify when the bond is up (e.g., changing the
> >> mode), and now some other set is precluded when the bond is down.  The
> >> init scripts already have repeat logic in them; this just makes things
> >> more complicated.
> >> 
> >> 	There should be a state wherein any option can be changed (well,
> >> maybe not max_bonds), and that should be the down state.  A subset can
> >> also be changed while up.  I'd be happy to be able to change all options
> >> while the bond is up, too, but that seems pretty hard to do.
> >> 
> >> 	How much harder is it to fix the locking and permit the action
> >> in question here?
> >> 
> >In this case, to just hack something in place is pretty easy, I can just
> >initalize the spinlocks for all cases in the bond_create path.  But to do in any
> >sort of sensical way is much harder, since the code is written such that you
> >initialize various relevant data structures based on the mode of the bond,
> >which, as you indicated above, you want the right to change up until the point
> >where you ifup the bonded interface. 
> >
> >The whole thing is predicated on the notion that
> >transitioning from the down to up state is the gating factor to initializing the
> >current configuration.  What might work is an in-between state in which you commit
> >and initialize a bond based on the current configuation.  Doing so would allow
> >you to (re-)initialize a bond configuration in a safe state.  Only after
> >commiting a configuration could you enslave devices or ifup the bond.  Once up,
> >further commits would be non-permissable until the bond was brought down again.
> >Of course, this would also require changing the semantics of the user space
> >tools.
> 
> 	One alternative is to simply permit all option changes while the
> bond is down, then commit then as a set when the bond is brought up (or
> fail the open or adjust options if the options are invalid).  More or
> less what you suggest, except that the "commit" is the ifup.
> 
Well, thats rather the problem, by making the ifup be the commit step, you deny
the 'in between phase' where the configuration is applied, but prior to the
interface being up, so its safe to enslave devices.

> 	Even that is a substantial rework of how things are done now,
> since as you point out, options today take effect more or less in real
> time.
> 
Agreed its both a large undertaking, and it still requires user space
co-ordination.

> >This also begs the question, is it or is it not safe to enslave devices while
> >the bond is down?  Clearly from the bug report its unsafe, and I don't know what
> >other (if any) conditions exist that cause problems when doing this (be that a
> >deadlock, panic or simply undefined or unexpected behavior).  If its really
> >unsafe, then issuing a warning seems incorrect, we shouldn't allow user space to
> >cause things like this, and as such, we should return an error.  If it is safe
> >(generally) and this is an isolated bug, then we should probably remove the
> >warning.  But to just issue a vague 'This might do bad things' warning seems
> >wrong in either case.
> 
> 	Agreed.  I think it (conceptually) should be safe to add or
> remove slaves when the bond is down, and bonding shouldn't complain
> about it.
> 
Ok, thats good to know then.

> 	Ok, I looked through the log, and originally enslaving while
> down via sysfs was disallowed when that code was added.
> 
> 	Later, enslaving while down was enabled in a patch for
> Infiniband.  Reading the changelog explains why; for IB, enslaving while
> up messed up the multicast group addresses:
> 
> commit 6b1bf096508c870889c2be63c7757a04d72116fe
> Author: Moni Shoua <monis@voltaire.com>
<snip commit log>
> 
> 	Assuming this situation hasn't changed (and I'm not sure how it
> could, because the first IB slave causes a type change), I don't believe
> we can disallow enslaving while the bond is down because IB depends on
> it.
> 
Ok, thats also good to know.  Thank you for doing the research on that.

> >I'll respin the patch to just initialize the spinlocks in the morning, if thats
> >what will fix the deadlock, but it really seems like the wrong way to go to me.
> >If enslaving devices to a bond while its down has been known to cause problems,
> >then we shouldn't allow it and we should update the user space tools to
> >understand and handle that.
> 
> 	This is the first I've heard of it causing a panic; looking
> back, I can also see some issues with the warning itself (because it
> happens prior to the rtnl_trylock, the message may repeat a lot if rtnl
> is contended), so I'm happy to see the warning go away either way.
> 
Copy that, it sounds like the warning isn't actually needed (i.e. enslaving with
the bond down should be safe, and this panic is an outlier condition).  I'll
respin the patch with the spinlock initialized in the create path and remove the
warning removed in the AM

Thanks!
Neil

> 	-J
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 

^ permalink raw reply

* [PATCHv2] net: Define enum for the bits used in features.
From: Mahesh Bandewar @ 2011-05-25  1:56 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Mahesh Bandewar, Tom Herbert, Michał Mirosław,
	Stephen Hemminger
In-Reply-To: <1306263162-2022-1-git-send-email-maheshb@google.com>

Little bit cleanup by defining enum for all bits used. Also use those enum
values to redefine flags.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
Changes since v1:
 Split the patch into two pieces.

 include/linux/netdevice.h |  100 +++++++++++++++++++++++++++++++-------------
 1 files changed, 70 insertions(+), 30 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ca333e7..b4520b2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -51,6 +51,7 @@
 #ifdef CONFIG_DCB
 #include <net/dcbnl.h>
 #endif
+#include <linux/netdev_features.h>
 
 struct vlan_group;
 struct netpoll_info;
@@ -981,6 +982,49 @@ struct net_device_ops {
 };
 
 /*
+ * Net device feature bits; if you change something,
+ * also update netdev_features_strings[] in ethtool.c
+ */
+enum netdev_features {
+	SG_BIT,			/* Scatter/gather IO. */
+	IP_CSUM_BIT,		/* Can checksum TCP/UDP over IPv4. */
+	NO_CSUM_BIT,		/* Does not require checksum. F.e. loopack. */
+	HW_CSUM_BIT,		/* Can checksum all the packets. */
+	IPV6_CSUM_BIT,		/* Can checksum TCP/UDP over IPV6 */
+	HIGHDMA_BIT,		/* Can DMA to high memory. */
+	FRAGLIST_BIT,		/* Scatter/gather IO. */
+	HW_VLAN_TX_BIT,		/* Transmit VLAN hw acceleration */
+	HW_VLAN_RX_BIT,		/* Receive VLAN hw acceleration */
+	HW_VLAN_FILTER_BIT,	/* Receive filtering on VLAN */
+	VLAN_CHALLENGED_BIT,	/* Device cannot handle VLAN packets */
+	GSO_BIT,		/* Enable software GSO. */
+	LLTX_BIT,		/* LockLess TX - deprecated. Please */
+				/* do not use LLTX in new drivers */
+	NETNS_LOCAL_BIT,	/* Does not change network namespaces */
+	GRO_BIT,		/* Generic receive offload */
+	LRO_BIT,		/* large receive offload */
+	RESERVED16_BIT,		/* the GSO_MASK reserved bit 16 */
+	RESERVED17_BIT,		/* the GSO_MASK reserved bit 17 */
+	RESERVED18_BIT,		/* the GSO_MASK reserved bit 18 */
+	RESERVED19_BIT,		/* the GSO_MASK reserved bit 19 */
+	RESERVED20_BIT,		/* the GSO_MASK reserved bit 20 */
+	RESERVED21_BIT,		/* the GSO_MASK reserved bit 21 */
+	RESERVED22_BIT,		/* the GSO_MASK reserved bit 22 */
+	RESERVED23_BIT,		/* the GSO_MASK reserved bit 23 */
+	FCOE_CRC_BIT,		/* FCoE CRC32 */
+	SCTP_CSUM_BIT,		/* SCTP checksum offload */
+	FCOE_MTU_BIT,		/* Supports max FCoE MTU, 2158 bytes*/
+	NTUPLE_BIT,		/* N-tuple filters supported */
+	RXHASH_BIT,		/* Receive hashing offload */
+	RXCSUM_BIT,		/* Receive checksumming offload */
+	NOCACHE_COPY_BIT,	/* Use no-cache copyfromuser */
+	LOOPBACK_BIT,		/* Enable loopback */
+
+	/* Add you bit above this */
+	ND_FEATURE_NUM_BITS	/* (LAST VALUE) Total bits in use */
+};
+
+/*
  *	The DEVICE structure.
  *	Actually, this whole structure is a big mistake.  It mixes I/O
  *	data with strictly "high-level" data, and it has to know about
@@ -1035,36 +1079,32 @@ struct net_device {
 	/* mask of features inheritable by VLAN devices */
 	u32			vlan_features;
 
-	/* Net device feature bits; if you change something,
-	 * also update netdev_features_strings[] in ethtool.c */
-
-#define NETIF_F_SG		1	/* Scatter/gather IO. */
-#define NETIF_F_IP_CSUM		2	/* Can checksum TCP/UDP over IPv4. */
-#define NETIF_F_NO_CSUM		4	/* Does not require checksum. F.e. loopack. */
-#define NETIF_F_HW_CSUM		8	/* Can checksum all the packets. */
-#define NETIF_F_IPV6_CSUM	16	/* Can checksum TCP/UDP over IPV6 */
-#define NETIF_F_HIGHDMA		32	/* Can DMA to high memory. */
-#define NETIF_F_FRAGLIST	64	/* Scatter/gather IO. */
-#define NETIF_F_HW_VLAN_TX	128	/* Transmit VLAN hw acceleration */
-#define NETIF_F_HW_VLAN_RX	256	/* Receive VLAN hw acceleration */
-#define NETIF_F_HW_VLAN_FILTER	512	/* Receive filtering on VLAN */
-#define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
-#define NETIF_F_GSO		2048	/* Enable software GSO. */
-#define NETIF_F_LLTX		4096	/* LockLess TX - deprecated. Please */
-					/* do not use LLTX in new drivers */
-#define NETIF_F_NETNS_LOCAL	8192	/* Does not change network namespaces */
-#define NETIF_F_GRO		16384	/* Generic receive offload */
-#define NETIF_F_LRO		32768	/* large receive offload */
-
-/* the GSO_MASK reserves bits 16 through 23 */
-#define NETIF_F_FCOE_CRC	(1 << 24) /* FCoE CRC32 */
-#define NETIF_F_SCTP_CSUM	(1 << 25) /* SCTP checksum offload */
-#define NETIF_F_FCOE_MTU	(1 << 26) /* Supports max FCoE MTU, 2158 bytes*/
-#define NETIF_F_NTUPLE		(1 << 27) /* N-tuple filters supported */
-#define NETIF_F_RXHASH		(1 << 28) /* Receive hashing offload */
-#define NETIF_F_RXCSUM		(1 << 29) /* Receive checksumming offload */
-#define NETIF_F_NOCACHE_COPY	(1 << 30) /* Use no-cache copyfromuser */
-#define NETIF_F_LOOPBACK	(1 << 31) /* Enable loopback */
+#define BIT2FLAG(bit)		(1 << (bit))
+
+#define NETIF_F_SG		BIT2FLAG(SG_BIT)
+#define NETIF_F_IP_CSUM		BIT2FLAG(IP_CSUM_BIT)
+#define NETIF_F_NO_CSUM		BIT2FLAG(NO_CSUM_BIT)
+#define NETIF_F_HW_CSUM		BIT2FLAG(HW_CSUM_BIT)
+#define NETIF_F_IPV6_CSUM	BIT2FLAG(IPV6_CSUM_BIT)
+#define NETIF_F_HIGHDMA		BIT2FLAG(HIGHDMA_BIT)
+#define NETIF_F_FRAGLIST	BIT2FLAG(FRAGLIST_BIT)
+#define NETIF_F_HW_VLAN_TX	BIT2FLAG(HW_VLAN_TX_BIT)
+#define NETIF_F_HW_VLAN_RX	BIT2FLAG(HW_VLAN_RX_BIT)
+#define NETIF_F_HW_VLAN_FILTER	BIT2FLAG(HW_VLAN_FILTER_BIT)
+#define NETIF_F_VLAN_CHALLENGED	BIT2FLAG(VLAN_CHALLENGED_BIT)
+#define NETIF_F_GSO		BIT2FLAG(GSO_BIT)
+#define NETIF_F_LLTX		BIT2FLAG(LLTX_BIT)
+#define NETIF_F_NETNS_LOCAL	BIT2FLAG(NETNS_LOCAL_BIT)
+#define NETIF_F_GRO		BIT2FLAG(GRO_BIT)
+#define NETIF_F_LRO		BIT2FLAG(LRO_BIT)
+#define NETIF_F_FCOE_CRC	BIT2FLAG(FCOE_CRC_BIT)
+#define NETIF_F_SCTP_CSUM	BIT2FLAG(SCTP_CSUM_BIT)
+#define NETIF_F_FCOE_MTU	BIT2FLAG(FCOE_MTU_BIT)
+#define NETIF_F_NTUPLE		BIT2FLAG(NTUPLE_BIT)
+#define NETIF_F_RXHASH		BIT2FLAG(RXHASH_BIT)
+#define NETIF_F_RXCSUM		BIT2FLAG(RXCSUM_BIT)
+#define NETIF_F_NOCACHE_COPY	BIT2FLAG(NOCACHE_COPY_BIT)
+#define NETIF_F_LOOPBACK	BIT2FLAG(LOOPBACK_BIT)
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
-- 
1.7.3.1


^ permalink raw reply related

* [PATCHv2] net: Abstract features usage.
From: Mahesh Bandewar @ 2011-05-25  1:55 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Mahesh Bandewar, Tom Herbert, Michał Mirosław,
	Stephen Hemminger
In-Reply-To: <1306263162-2022-1-git-send-email-maheshb@google.com>

Define macros to set/clear/test bits for feature set usage. This will eliminate
the direct use of these fields and enable future ease in managing these fields.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
Changes since v1:
 Split the patch into two pieces.

 include/linux/netdev_features.h |   64 +++++++++++++++++++++++++++++++++++++++
 include/linux/netdevice.h       |    8 +++++
 2 files changed, 72 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netdev_features.h

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
new file mode 100644
index 0000000..3043c4d
--- /dev/null
+++ b/include/linux/netdev_features.h
@@ -0,0 +1,64 @@
+#ifndef _NETDEV_FEATURES_H
+#define _NETDEV_FEATURES_H
+
+/* Forward declarations */
+struct net_device;
+
+typedef unsigned long *nd_feature_t;
+
+static inline void _nd_set_feature(u32 *old_field,
+		unsigned long *new_field, int bit)
+{
+	if (bit < 32)
+		*old_field |= (1 << bit);
+	set_bit(bit, new_field);
+}
+
+static inline void _nd_clear_feature(u32 *old_field,
+		unsigned long *new_field, int bit)
+{
+	if (bit < 32)
+		*old_field &= ~(1 << bit);
+
+	clear_bit(bit, new_field);
+}
+
+static inline bool _nd_test_feature(u32 old_field,
+		unsigned long *new_field, int bit)
+{
+	if (bit < 32)
+		return (old_field & (1 << bit)) == 1;	
+
+	return test_bit(bit, new_field) == 1;
+}
+
+#define netdev_set_active_feature(dev, bit)	\
+	_nd_set_feature(&(dev)->features, (dev)->active_feature, (bit))
+#define netdev_clear_active_feature(dev, bit)	\
+	_nd_clear_feature(&(dev)->features, (dev)->active_feature, (bit))
+#define netdev_test_active_feature(dev, bit)	\
+	_nd_test_feature((dev)->features, (dev)->active_feature, (bit))
+
+#define netdev_set_offered_feature(dev, bit)	\
+	_nd_set_feature(&(dev)->hw_features, (dev)->offered_feature, (bit))
+#define netdev_clear_offered_feature(dev, bit)	\
+	_nd_clear_feature(&(dev)->hw_features, (dev)->offered_feature, (bit))
+#define netdev_test_offered_feature(dev, bit)	\
+	_nd_test_feature((dev)->hw_features, (dev)->offered_feature, (bit))
+
+#define netdev_set_vlan_feature(dev, bit)	\
+	_nd_set_feature(&(dev)->vlan_features, (dev)->vlan_feature, (bit))
+#define netdev_clear_vlan_feature(dev, bit)	\
+	_nd_clear_feature(&(dev)->vlan_features, (dev)->vlan_feature, (bit))
+#define netdev_test_vlan_feature(dev, bit)	\
+	_nd_test_feature((dev)->vlan_features, (dev)->vlan_feature, (bit))
+
+#define netdev_set_wanted_feature(dev, bit)	\
+	_nd_set_feature(&(dev)->wanted_features, (dev)->wanted_feature, (bit))
+#define netdev_clear_wanted_feature(dev, bit)	\
+	_nd_clear_feature(&(dev)->wanted_features, (dev)->wanted_feature, (bit))
+#define netdev_test_wanted_feature(dev, bit)	\
+	_nd_test_feature((dev)->wanted_features, (dev)->wanted_feature, (bit))
+
+
+#endif	/* __NETDEV_FEATURES_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b4520b2..e565ceb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1079,6 +1079,14 @@ struct net_device {
 	/* mask of features inheritable by VLAN devices */
 	u32			vlan_features;
 
+#define DEV_FEATURE_WORDS	BITS_TO_LONGS(ND_FEATURE_NUM_BITS)
+#define DEV_FEATURE_BITS	(DEV_FEATURE_WORDS * BITS_PER_LONG)
+
+	DECLARE_BITMAP(active_feature, DEV_FEATURE_BITS);
+	DECLARE_BITMAP(offered_feature, DEV_FEATURE_BITS);
+	DECLARE_BITMAP(wanted_feature, DEV_FEATURE_BITS);
+	DECLARE_BITMAP(vlan_feature, DEV_FEATURE_BITS);
+
 #define BIT2FLAG(bit)		(1 << (bit))
 
 #define NETIF_F_SG		BIT2FLAG(SG_BIT)
-- 
1.7.3.1


^ permalink raw reply related

* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Rusty Russell @ 2011-05-25  1:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <20110523111900.GB27212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> I do understand how it seems a waste to leave direct space
> in the ring while we might in practice have space
> due to indirect. Didn't come up with a nice way to
> solve this yet - but 'no worse than now :)'

Let's just make it "bool free_old_xmit_skbs(unsigned int max)".  max ==
2 for the normal xmit path, so we're low latency but we keep ahead on
average.  max == -1 for the "we're out of capacity, we may have to stop
the queue".

That keeps it simple and probably the right thing...

Thanks,
Rusty.

^ permalink raw reply

* Re: [PATCHv2 10/14] virtio_net: limit xmit polling
From: Rusty Russell @ 2011-05-25  1:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Krishna Kumar, Carsten Otte, lguest-uLR06cmDAlY/bJ5BZ2RsiQ,
	Shirley Ma, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-s390-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	habanero-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, Heiko Carstens,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	steved-r/Jw6+rmf7HQT0dZR+AlfA, Christian Borntraeger,
	Tom Lendacky, Martin Schwidefsky, linux390-tA70FqPdS9bQT0dZR+AlfA
In-Reply-To: <20110523111900.GB27212-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Mon, 23 May 2011 14:19:00 +0300, "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Mon, May 23, 2011 at 11:37:15AM +0930, Rusty Russell wrote:
> > Can we hit problems with OOM?  Sure, but no worse than now...
> > The problem is that this "virtqueue_get_capacity()" returns the worst
> > case, not the normal case.  So using it is deceptive.
> > 
> 
> Maybe just document this?

Yes, but also by renaming virtqueue_get_capacity().  Takes it from a 3
to a 6 on the API hard-to-misuse scale.

How about, virtqueue_min_capacity()?  Makes the reader realize something
weird is going on.

> I still believe capacity really needs to be decided
> at the virtqueue level, not in the driver.
> E.g. with indirect each skb uses a single entry: freeing
> 1 small skb is always enough to have space for a large one.
> 
> I do understand how it seems a waste to leave direct space
> in the ring while we might in practice have space
> due to indirect. Didn't come up with a nice way to
> solve this yet - but 'no worse than now :)'

Agreed.

> > > I just wanted to localize the 2+MAX_SKB_FRAGS logic that tries to make
> > > sure we have enough space in the buffer. Another way to do
> > > that is with a define :).
> > 
> > To do this properly, we should really be using the actual number of sg
> > elements needed, but we'd have to do most of xmit_skb beforehand so we
> > know how many.
> > 
> > Cheers,
> > Rusty.
> 
> Maybe I'm confused here.  The problem isn't the failing
> add_buf for the given skb IIUC.  What we are trying to do here is stop
> the queue *before xmit_skb fails*. We can't look at the
> number of fragments in the current skb - the next one can be
> much larger.  That's why we check capacity after xmit_skb,
> not before it, right?

No, I was confused...  More coffee!

Thanks,
Rusty.

^ permalink raw reply

* Re: VLAN test cases in 2.6.38.7
From: Ben Greear @ 2011-05-25  0:34 UTC (permalink / raw)
  To: netdev
In-Reply-To: <4DDC36C4.80408@candelatech.com>

On 05/24/2011 03:52 PM, Ben Greear wrote:
> On 05/24/2011 12:08 PM, Ben Greear wrote:
>> On 05/24/2011 11:49 AM, Ben Greear wrote:
>>>
>>> I wrote a simple bridge that uses packet sockets to read/write
>>> from network devices. I'll upload this code somewhere when
>>> I get it a bit more presentable.
>>>
>>> Machine info:
>>>
>>> Linux lf0300-demo 2.6.38.7+ #14 SMP Mon May 23 10:31:45 PDT 2011 i686
>>> i686 i386 GNU/Linux
>>>
>>> Interface A: eth1
>>> driver: igb
>>> version: 3.0.19
>>> firmware-version: 1.2-1
>>> bus-info: 0000:01:00.0
>>>
>>> Interface B: eth3:
>>> driver: igb
>>> version: 3.0.19
>>> firmware-version: 1.2-1
>>> bus-info: 0000:01:00.1
>>>
>>>
>>> If no vlans are on eth1 and eth3, then it bridges fine, with vlan
>>> headers inline. But, if you add a VLAN to eth1, it stops working,
>>> probably because pkt tag is then un-stripped. I was generating on
>>> vlan 7, and created vlan 9, btw.
>>>
>>> I'll see if I can figure out how to use aux-data next...
>>
>> Why does the aux-data mask out the CFI bit? Shouldn't
>> we just pass the 16-bit VLAN tag un-modified to user-space?
>
> Either my ubridge code is broken, or 2.6.38.7+ doesn't send tp_vlan_tci
> properly. All I ever see is zero for that field.
>
> Test case is:
>
> set up VLAN traffic generator on VLAN 7 on separate machine.
> udp traffic, 56kbps, 1024 byte udp payload.
>
> Set up ubridge on test machine:
> ifconfig eth1 promisc
> ifconfig eth3 promisc
> # Make NIC go into stripping mode.
> ip link add link eth1 up name eth1.9 type vlan id 9
> ./do_test.pl eth1 eth3
>
>
> Expect that at least tci is != 0.
>
> The ubridge code is here:
> http://www.candelatech.com/~greearb/misc/ubridge.tar.gz

Ok, was my bug.  That test case passes now...will upload
the fixed code tomorrow.

Ben

>
> Thanks,
> Ben
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* Re: [GIT PULL] Namespace file descriptors for 2.6.40
From: Valdis.Kletnieks @ 2011-05-25  0:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric W. Biederman, James Bottomley, Linus Torvalds, linux-kernel,
	Linux Containers, netdev, Geert Uytterhoeven
In-Reply-To: <20110524071628.GA31612@elte.hu>

[-- Attachment #1: Type: text/plain, Size: 953 bytes --]

On Tue, 24 May 2011 09:16:28 +0200, Ingo Molnar said:
> * Eric W. Biederman <ebiederm@xmission.com> wrote:
> > My gut feel says we should really implement an
> > include/asm-generic/unistd-common.h to include all new system calls.
> >
> > That way there would be only one file to touch instead of 50. Certainly it
> > works for include/asm-generic/unistd.h for the architectures that use it. 
> > And all we really need is just a little abstraction on that concept.
>
> I suppose that could be tried, although in practice it would probably be
> somewhat complex due to the various compat syscall handling differences.

Can somebody fill us newcomers in on the arch-aeology of why some syscalls have
different numbers on different archs? I know it's partially because some simply
didn't implement some syscalls so there were numbering mismatches, but would it
have been *that* hard to wire all of those skipped syscalls up to one stub
'return -ENOSYS'?


[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
From: Jay Vosburgh @ 2011-05-25  0:34 UTC (permalink / raw)
  To: Neil Horman
  Cc: =?us-ascii?B?PT9JU08tODg1OS0xP1E/Tmljb2xhc19kZV9QZXNsbz1GQ2FuPz0=?=,
	Andy Gospodarek, netdev, David S. Miller
In-Reply-To: <20110524231813.GA2350@neilslaptop.think-freely.org>

Neil Horman <nhorman@tuxdriver.com> wrote:

>On Tue, May 24, 2011 at 02:12:54PM -0700, Jay Vosburgh wrote:
>> Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:
>> 
>> >Le 24/05/2011 22:37, Neil Horman a écrit :
>> >
>> >>>>> +		return -EINVAL;
>> >>>
>> >>> This will turn a warning into an error.
>> >>>
>> >> Yes, because it should have been an error all along.
>> >>
>> >>> This warning existed for long, but never caused the bonding setup to
>> >>> fail. This patch cause some regression for user space. For example,
>> >>> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
>> >>> before enslaving, because this was never required.
>> >>>
>> >> Thats not a regression, thats the kernel returning an error where it should have
>> >> done so all along.  Just because a utility got away with it for awhile and it
>> >> didn't always cause a lockup, doesn't grandfather that application in to a
>> >> situation where the kernel has to support its broken behavior in perpituity.
>> >>
>> >> Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
>> >> patch doesn't touch, so ifenslave is currently unaffected (although I should
>> >> look in the ioctl path to see if we have already added such a check, lest you be
>> >> able to deadlock your system as previously indicated using that tool).
>> >
>> >Unfortunately, no. Recent versions of ifenslave-2.6 on Debian don't use
>> >ioctl (ifenslave binary) anymore, but only sysfs.
>> >
>> >Documentation/bonding.txt should be updated to reflect this change.
>> >pr_warning should be changed to pr_ err.
>> >Bonding version should be bumped.
>> >
>> >Anyway, I will fix this package, but I suspect there exist many user
>> >scripts that don't ensure bond is up before enslaving.
>> 
>> 	I looked at sysconfig (as supplied with opensuse) and it uses
>> sysfs, and does set the master device up first.  The other potential
>> user that comes to mind is that OFED at one point had a script to set up
>> bonding for Infiniband devices.  I don't know if this is still the case,
>> nor do I know if it set the bond device up before enslaving.
>> 
>> 	Generally speaking, though, in the long run I think it should be
>> permissible to change any bonding option when the bond is down (even to
>> values that make no sense in context, e.g., setting the primary to a
>> device not currently enslaved).  My rationale here is that some options
>> are very difficult to modify when the bond is up (e.g., changing the
>> mode), and now some other set is precluded when the bond is down.  The
>> init scripts already have repeat logic in them; this just makes things
>> more complicated.
>> 
>> 	There should be a state wherein any option can be changed (well,
>> maybe not max_bonds), and that should be the down state.  A subset can
>> also be changed while up.  I'd be happy to be able to change all options
>> while the bond is up, too, but that seems pretty hard to do.
>> 
>> 	How much harder is it to fix the locking and permit the action
>> in question here?
>> 
>In this case, to just hack something in place is pretty easy, I can just
>initalize the spinlocks for all cases in the bond_create path.  But to do in any
>sort of sensical way is much harder, since the code is written such that you
>initialize various relevant data structures based on the mode of the bond,
>which, as you indicated above, you want the right to change up until the point
>where you ifup the bonded interface. 
>
>The whole thing is predicated on the notion that
>transitioning from the down to up state is the gating factor to initializing the
>current configuration.  What might work is an in-between state in which you commit
>and initialize a bond based on the current configuation.  Doing so would allow
>you to (re-)initialize a bond configuration in a safe state.  Only after
>commiting a configuration could you enslave devices or ifup the bond.  Once up,
>further commits would be non-permissable until the bond was brought down again.
>Of course, this would also require changing the semantics of the user space
>tools.

	One alternative is to simply permit all option changes while the
bond is down, then commit then as a set when the bond is brought up (or
fail the open or adjust options if the options are invalid).  More or
less what you suggest, except that the "commit" is the ifup.

	Even that is a substantial rework of how things are done now,
since as you point out, options today take effect more or less in real
time.

>This also begs the question, is it or is it not safe to enslave devices while
>the bond is down?  Clearly from the bug report its unsafe, and I don't know what
>other (if any) conditions exist that cause problems when doing this (be that a
>deadlock, panic or simply undefined or unexpected behavior).  If its really
>unsafe, then issuing a warning seems incorrect, we shouldn't allow user space to
>cause things like this, and as such, we should return an error.  If it is safe
>(generally) and this is an isolated bug, then we should probably remove the
>warning.  But to just issue a vague 'This might do bad things' warning seems
>wrong in either case.

	Agreed.  I think it (conceptually) should be safe to add or
remove slaves when the bond is down, and bonding shouldn't complain
about it.

	Ok, I looked through the log, and originally enslaving while
down via sysfs was disallowed when that code was added.

	Later, enslaving while down was enabled in a patch for
Infiniband.  Reading the changelog explains why; for IB, enslaving while
up messed up the multicast group addresses:

commit 6b1bf096508c870889c2be63c7757a04d72116fe
Author: Moni Shoua <monis@voltaire.com>
Date:   Tue Oct 9 19:43:40 2007 -0700

    net/bonding: Enable IP multicast for bonding IPoIB devices
    
    Allow to enslave devices when the bonding device is not up. Over the discussion
    held at the previous post this seemed to be the most clean way to go, where it
    is not expected to cause instabilities.
    
    Normally, the bonding driver is UP before any enslavement takes place.
    Once a netdevice is UP, the network stack acts to have it join some multicast groups
    (eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device
    type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code
    computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called
    where for multicast joins taking place after the enslavement another ip_xxx_mc_map()
    is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND)
    
    Signed-off-by: Moni Shoua <monis at voltaire.com>
    Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
    Acked-by: Jay Vosburgh <fubar@us.ibm.com>
    Signed-off-by: Jeff Garzik <jeff@garzik.org>

	Assuming this situation hasn't changed (and I'm not sure how it
could, because the first IB slave causes a type change), I don't believe
we can disallow enslaving while the bond is down because IB depends on
it.

>I'll respin the patch to just initialize the spinlocks in the morning, if thats
>what will fix the deadlock, but it really seems like the wrong way to go to me.
>If enslaving devices to a bond while its down has been known to cause problems,
>then we shouldn't allow it and we should update the user space tools to
>understand and handle that.

	This is the first I've heard of it causing a panic; looking
back, I can also see some issues with the warning itself (because it
happens prior to the rtnl_trylock, the message may repeat a lot if rtnl
is contended), so I'm happy to see the warning go away either way.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH 0/9] strict user copy checks on x86_64
From: H. Peter Anvin @ 2011-05-24 23:33 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Andrew Morton, linux-kernel, linux-wireless, netdev,
	Intel Linux Wireless, linux-scsi, x86, Ingo Molnar
In-Reply-To: <4DDC232F.7080005@codeaurora.org>

On 05/24/2011 02:29 PM, Stephen Boyd wrote:
> 
> It looks like 1, 2, 4, 6, and 7 got picked up. Should I resend the left
> over patches with appropriate acked-bys and tags? Would it be
> appropriate to push this through your tree?
> 

I was first going to think I'd pick up 8 and 9 in tip, but since 9 is
cross-architecture, Andrew's tree might be better.

Acked-by: H. Peter Anvin <hpa@zytor.com>

	-hpa

^ permalink raw reply

* Re: [PATCH] net: Abstract features usage.
From: Mahesh Bandewar @ 2011-05-24 23:25 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev, Tom Herbert
In-Reply-To: <20110524161138.55aed6b1@nehalam>

On Tue, May 24, 2011 at 4:11 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Tue, 24 May 2011 16:04:20 -0700
> Mahesh Bandewar <maheshb@google.com> wrote:
>
>> On Tue, May 24, 2011 at 2:29 PM, Stephen Hemminger
>> <shemminger@vyatta.com> wrote:
>> > On Tue, 24 May 2011 11:52:42 -0700
>> > Mahesh Bandewar <maheshb@google.com> wrote:
>> >
>> >> Define macros to set/clear/test bits for feature set usage. This will eliminate
>> >> the direct use of these fields and enable future ease in managing these fields.
>> >>
>> >> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>> >> ---
>> >>  include/linux/netdev_features.h |  137 +++++++++++++++++++++++++++++++++++++++
>> >>  include/linux/netdevice.h       |   35 ++---------
>> >>  2 files changed, 142 insertions(+), 30 deletions(-)
>> >>  create mode 100644 include/linux/netdev_features.h
>> >>
>> >> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
>> >> new file mode 100644
>> >> index 0000000..97bf8c4
>> >> --- /dev/null
>> >> +++ b/include/linux/netdev_features.h
>> >> @@ -0,0 +1,137 @@
>> >> +#ifndef      _NETDEV_FEATURES_H
>> >> +#define      _NETDEV_FEATURES_H
>> >> +
>> >> +/* Forward declarations */
>> >> +struct net_device;
>> >> +
>> >> +typedef      unsigned long *nd_feature_t
>> >
>> > typedef'ing a pointer is strongly discouraged by kernel coding style.
>> > You need to use another way such as open coding it.
>> >
>> The idea here is to have a typedef that we can use to avoid future
>> changes all over the code if we decide to change the way feature bits
>> are stored. So if this is bitmap it will be defined as "unsigned long
>> *" and if it were to be u64 then it's typedef-ed as "u64". So in one
>> case it's a pointer and in other case it's not! Now how do we handle
>> both these cases?
>
> It is okay to add a typedef for a fixed width type.
> Just not adding the indirection
> because it leads to confusion.
>
> Read "Chapter 5: Typedefs" in Documentation/CodingStyle
>
Yes, this usage matches the (a) case in the documentation. This
typedef is to hide the object but enable APIs to pass it around for
reference and it _should_ only be altered by using the helper APIs
from this file. If we find a certain API is missing, it should be
added here instead of doing some local magic.

Thanks,
--mahesh..
>
>
> --
>

^ permalink raw reply

* Re: [PATCH] bonding: prevent deadlock on slave store with alb mode
From: Neil Horman @ 2011-05-24 23:18 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: =?ISO-8859-1?Q?Nicolas_de_Peslo=FCan?=, Andy Gospodarek, netdev,
	David S. Miller
In-Reply-To: <12577.1306271574@death>

On Tue, May 24, 2011 at 02:12:54PM -0700, Jay Vosburgh wrote:
> Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:
> 
> >Le 24/05/2011 22:37, Neil Horman a écrit :
> >
> >>>>> +		return -EINVAL;
> >>>
> >>> This will turn a warning into an error.
> >>>
> >> Yes, because it should have been an error all along.
> >>
> >>> This warning existed for long, but never caused the bonding setup to
> >>> fail. This patch cause some regression for user space. For example,
> >>> current ifenslave-2.6 package in Debian doesn't ensure bond is UP
> >>> before enslaving, because this was never required.
> >>>
> >> Thats not a regression, thats the kernel returning an error where it should have
> >> done so all along.  Just because a utility got away with it for awhile and it
> >> didn't always cause a lockup, doesn't grandfather that application in to a
> >> situation where the kernel has to support its broken behavior in perpituity.
> >>
> >> Besides, iirc, the ifsenslave utility still uses the ioctl path, which this
> >> patch doesn't touch, so ifenslave is currently unaffected (although I should
> >> look in the ioctl path to see if we have already added such a check, lest you be
> >> able to deadlock your system as previously indicated using that tool).
> >
> >Unfortunately, no. Recent versions of ifenslave-2.6 on Debian don't use
> >ioctl (ifenslave binary) anymore, but only sysfs.
> >
> >Documentation/bonding.txt should be updated to reflect this change.
> >pr_warning should be changed to pr_ err.
> >Bonding version should be bumped.
> >
> >Anyway, I will fix this package, but I suspect there exist many user
> >scripts that don't ensure bond is up before enslaving.
> 
> 	I looked at sysconfig (as supplied with opensuse) and it uses
> sysfs, and does set the master device up first.  The other potential
> user that comes to mind is that OFED at one point had a script to set up
> bonding for Infiniband devices.  I don't know if this is still the case,
> nor do I know if it set the bond device up before enslaving.
> 
> 	Generally speaking, though, in the long run I think it should be
> permissible to change any bonding option when the bond is down (even to
> values that make no sense in context, e.g., setting the primary to a
> device not currently enslaved).  My rationale here is that some options
> are very difficult to modify when the bond is up (e.g., changing the
> mode), and now some other set is precluded when the bond is down.  The
> init scripts already have repeat logic in them; this just makes things
> more complicated.
> 
> 	There should be a state wherein any option can be changed (well,
> maybe not max_bonds), and that should be the down state.  A subset can
> also be changed while up.  I'd be happy to be able to change all options
> while the bond is up, too, but that seems pretty hard to do.
> 
> 	How much harder is it to fix the locking and permit the action
> in question here?
> 
In this case, to just hack something in place is pretty easy, I can just
initalize the spinlocks for all cases in the bond_create path.  But to do in any
sort of sensical way is much harder, since the code is written such that you
initialize various relevant data structures based on the mode of the bond,
which, as you indicated above, you want the right to change up until the point
where you ifup the bonded interface. 

The whole thing is predicated on the notion that
transitioning from the down to up state is the gating factor to initializing the
current configuration.  What might work is an in-between state in which you commit
and initialize a bond based on the current configuation.  Doing so would allow
you to (re-)initialize a bond configuration in a safe state.  Only after
commiting a configuration could you enslave devices or ifup the bond.  Once up,
further commits would be non-permissable until the bond was brought down again.
Of course, this would also require changing the semantics of the user space
tools.

This also begs the question, is it or is it not safe to enslave devices while
the bond is down?  Clearly from the bug report its unsafe, and I don't know what
other (if any) conditions exist that cause problems when doing this (be that a
deadlock, panic or simply undefined or unexpected behavior).  If its really
unsafe, then issuing a warning seems incorrect, we shouldn't allow user space to
cause things like this, and as such, we should return an error.  If it is safe
(generally) and this is an isolated bug, then we should probably remove the
warning.  But to just issue a vague 'This might do bad things' warning seems
wrong in either case.

I'll respin the patch to just initialize the spinlocks in the morning, if thats
what will fix the deadlock, but it really seems like the wrong way to go to me.
If enslaving devices to a bond while its down has been known to cause problems,
then we shouldn't allow it and we should update the user space tools to
understand and handle that.
Neil

> 	-J
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 

^ permalink raw reply

* Re: [PATCH] net: Abstract features usage.
From: Mahesh Bandewar @ 2011-05-24 23:11 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: David Miller, netdev, Tom Herbert
In-Reply-To: <BANLkTinpNJz-6y1hvw6uzDJsixHHLEuA-w@mail.gmail.com>

On Tue, May 24, 2011 at 2:49 PM, Michał Mirosław <mirqus@gmail.com> wrote:
> W dniu 24 maja 2011 22:29 użytkownik Mahesh Bandewar
> <maheshb@google.com> napisał:
>> On Tue, May 24, 2011 at 12:37 PM, Michał Mirosław <mirqus@gmail.com> wrote:
>>>
>>> 2011/5/24 Mahesh Bandewar <maheshb@google.com>:
>>> > Define macros to set/clear/test bits for feature set usage. This will eliminate
>>> > the direct use of these fields and enable future ease in managing these fields.
>>> >
>>> > Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>>> > ---
>>> >  include/linux/netdev_features.h |  137 +++++++++++++++++++++++++++++++++++++++
>>> >  include/linux/netdevice.h       |   35 ++---------
>>> >  2 files changed, 142 insertions(+), 30 deletions(-)
>>> >  create mode 100644 include/linux/netdev_features.h
>>> >
>>> > diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
>>> > new file mode 100644
>>> > index 0000000..97bf8c4
>>> > --- /dev/null
>>> > +++ b/include/linux/netdev_features.h
>>> > @@ -0,0 +1,137 @@
>>> > +#ifndef        _NETDEV_FEATURES_H
>>> > +#define        _NETDEV_FEATURES_H
>>> > +
>>> > +/* Forward declarations */
>>> > +struct net_device;
>>> > +
>>> > +typedef        unsigned long *nd_feature_t;
>>> > +
>>> > +/* Net device feature bits; if you change something,
>>> > + * also update netdev_features_strings[] in ethtool.c */
>>> > +enum netdev_features {
>>> > +       SG_BIT,                 /* Scatter/gather IO. */
>>> [...]
>>>
>>> Please split this change (introducing enum + converting NETIF_F_*
>>> defines to use it). This part is a nice cleanup, but I think the
>>> bitmap idea is still not worth the trouble until u64 runs out.
>> What will we achieve by this split? May be I'm not getting your point
>> about split. Do you want to see the cleanup part as a separate patch?
>> These enums and NETIF_F_* values have to be in sync.
>
> Exactly. You already did the cleanup (defining enum and NETIF_F_*
> #defines), so just split it into separate patch. This is a
> prerequisite to any change to the type holding features from u32 to
> either u64 or bitmap. And this will separate the cleanup from your
> proposed idea.
>
Alright, will split the change into two pieces.

>> The macros that I have defined are expecting this bit numbers rather
>> than the flag value.
>> Whether it is u64 or bitmap, the changes should be limited to this
>> file. Going forward that can be achieved by
>> simply changing few lines in these macros / inlines.
>
> One thing I don't like in this implementation is that it leaves old
> fields around - so you'll have two code templates in the kernel doing
> the same thing (unless you fix all users, but then you won't need old
> fields anyway).
>
The older fields give us longer time frame to make these changes where
ever necessary. Also enable us to make these changes in stages. Once
all the changes are in place, the older-fields along with the code
instrumentation will be removed. So yes, this file / macros / inlines
will be touched at the end of the completion of the last phase.

Thanks,
--mahesh..

> Best Regards,
> Michał Mirosław
>

^ permalink raw reply

* Re: [PATCH] net: Abstract features usage.
From: Stephen Hemminger @ 2011-05-24 23:11 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: David Miller, netdev, Tom Herbert
In-Reply-To: <BANLkTin48HKVRn6kG8SCZNfhyWVVhkBArg@mail.gmail.com>

On Tue, 24 May 2011 16:04:20 -0700
Mahesh Bandewar <maheshb@google.com> wrote:

> On Tue, May 24, 2011 at 2:29 PM, Stephen Hemminger
> <shemminger@vyatta.com> wrote:
> > On Tue, 24 May 2011 11:52:42 -0700
> > Mahesh Bandewar <maheshb@google.com> wrote:
> >
> >> Define macros to set/clear/test bits for feature set usage. This will eliminate
> >> the direct use of these fields and enable future ease in managing these fields.
> >>
> >> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
> >> ---
> >>  include/linux/netdev_features.h |  137 +++++++++++++++++++++++++++++++++++++++
> >>  include/linux/netdevice.h       |   35 ++---------
> >>  2 files changed, 142 insertions(+), 30 deletions(-)
> >>  create mode 100644 include/linux/netdev_features.h
> >>
> >> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> >> new file mode 100644
> >> index 0000000..97bf8c4
> >> --- /dev/null
> >> +++ b/include/linux/netdev_features.h
> >> @@ -0,0 +1,137 @@
> >> +#ifndef      _NETDEV_FEATURES_H
> >> +#define      _NETDEV_FEATURES_H
> >> +
> >> +/* Forward declarations */
> >> +struct net_device;
> >> +
> >> +typedef      unsigned long *nd_feature_t
> >
> > typedef'ing a pointer is strongly discouraged by kernel coding style.
> > You need to use another way such as open coding it.
> >
> The idea here is to have a typedef that we can use to avoid future
> changes all over the code if we decide to change the way feature bits
> are stored. So if this is bitmap it will be defined as "unsigned long
> *" and if it were to be u64 then it's typedef-ed as "u64". So in one
> case it's a pointer and in other case it's not! Now how do we handle
> both these cases?

It is okay to add a typedef for a fixed width type. 
Just not adding the indirection
because it leads to confusion.

Read "Chapter 5: Typedefs" in Documentation/CodingStyle



-- 

^ permalink raw reply

* Re: [PATCH] net: Abstract features usage.
From: Mahesh Bandewar @ 2011-05-24 23:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev, Tom Herbert
In-Reply-To: <20110524142928.105d66ef@nehalam>

On Tue, May 24, 2011 at 2:29 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
> On Tue, 24 May 2011 11:52:42 -0700
> Mahesh Bandewar <maheshb@google.com> wrote:
>
>> Define macros to set/clear/test bits for feature set usage. This will eliminate
>> the direct use of these fields and enable future ease in managing these fields.
>>
>> Signed-off-by: Mahesh Bandewar <maheshb@google.com>
>> ---
>>  include/linux/netdev_features.h |  137 +++++++++++++++++++++++++++++++++++++++
>>  include/linux/netdevice.h       |   35 ++---------
>>  2 files changed, 142 insertions(+), 30 deletions(-)
>>  create mode 100644 include/linux/netdev_features.h
>>
>> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
>> new file mode 100644
>> index 0000000..97bf8c4
>> --- /dev/null
>> +++ b/include/linux/netdev_features.h
>> @@ -0,0 +1,137 @@
>> +#ifndef      _NETDEV_FEATURES_H
>> +#define      _NETDEV_FEATURES_H
>> +
>> +/* Forward declarations */
>> +struct net_device;
>> +
>> +typedef      unsigned long *nd_feature_t
>
> typedef'ing a pointer is strongly discouraged by kernel coding style.
> You need to use another way such as open coding it.
>
The idea here is to have a typedef that we can use to avoid future
changes all over the code if we decide to change the way feature bits
are stored. So if this is bitmap it will be defined as "unsigned long
*" and if it were to be u64 then it's typedef-ed as "u64". So in one
case it's a pointer and in other case it's not! Now how do we handle
both these cases?

Thanks,
--mahesh..

> --
>

^ permalink raw reply

* Re: VLAN test cases in 2.6.38.7
From: Ben Greear @ 2011-05-24 22:52 UTC (permalink / raw)
  To: netdev
In-Reply-To: <4DDC0236.4090800@candelatech.com>

On 05/24/2011 12:08 PM, Ben Greear wrote:
> On 05/24/2011 11:49 AM, Ben Greear wrote:
>>
>> I wrote a simple bridge that uses packet sockets to read/write
>> from network devices. I'll upload this code somewhere when
>> I get it a bit more presentable.
>>
>> Machine info:
>>
>> Linux lf0300-demo 2.6.38.7+ #14 SMP Mon May 23 10:31:45 PDT 2011 i686
>> i686 i386 GNU/Linux
>>
>> Interface A: eth1
>> driver: igb
>> version: 3.0.19
>> firmware-version: 1.2-1
>> bus-info: 0000:01:00.0
>>
>> Interface B: eth3:
>> driver: igb
>> version: 3.0.19
>> firmware-version: 1.2-1
>> bus-info: 0000:01:00.1
>>
>>
>> If no vlans are on eth1 and eth3, then it bridges fine, with vlan
>> headers inline. But, if you add a VLAN to eth1, it stops working,
>> probably because pkt tag is then un-stripped. I was generating on
>> vlan 7, and created vlan 9, btw.
>>
>> I'll see if I can figure out how to use aux-data next...
>
> Why does the aux-data mask out the CFI bit? Shouldn't
> we just pass the 16-bit VLAN tag un-modified to user-space?

Either my ubridge code is broken, or 2.6.38.7+ doesn't send tp_vlan_tci
properly.  All I ever see is zero for that field.

Test case is:

set up VLAN traffic generator on VLAN 7 on separate machine.
udp traffic, 56kbps, 1024 byte udp payload.

Set up ubridge on test machine:
ifconfig eth1 promisc
ifconfig eth3 promisc
# Make NIC go into stripping mode.
ip link add link eth1 up name eth1.9 type vlan id 9
./do_test.pl eth1 eth3


Expect that at least tci is != 0.

The ubridge code is here:
http://www.candelatech.com/~greearb/misc/ubridge.tar.gz

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* 2.6.39: ipv6 tcp connections hanging on transmit
From: Jeremy Fitzhardinge @ 2011-05-24 22:09 UTC (permalink / raw)
  To: NetDev; +Cc: Linux Kernel Mailing List

When I use ssh over ipv6 to connect to my machines, it works fine for
command-line interaction until there's a large output (ps alxw, ps -l,
etc), whereupon the connection hangs.  It appears to be about when it
would send an MTU-sized packet.

Logging in with ssh over ipv4 works fine.

The outputting processes terminate OK, but they leave queued data in the
socket:

# netstat
Proto Recv-Q Send-Q Local Address               Foreign Address             State      
[...]
tcp        0   5776 2001:470:a816:0:230:48f:ssh 2001:470:a816:0:e45e::47865 ESTABLISHED 
tcp        0  26688 2001:470:a816:0:230:48f:ssh 2001:470:a816:0:e45e::55428 ESTABLISHED 
tcp        0      0 2001:470:a816:0:230:48f:ssh 2001:470:a816:0:e45e::55426 ESTABLISHED 
tcp        0   5856 2001:470:a816:0:230:48f:ssh 2001:470:a816:0:e45e::55434 ESTABLISHED 

and the corresponding ssh sessions are hung forever.

The path to the host is over openvpn bridged to the host's lan.  The
path MTU is 1500, verified with "traceroute6 --mtu" and "ping -s 1500"
works.

The network device is a bridge containing a single ethernet device: an
Intel 82574L using the e1000e driver.

I think this regression appeared in .38/.39, but unfortunately I haven't
had a chance to go back and systematically test older kernels yet.

Any clues, suggestions, further information I can provide, etc?

Thanks,
    J

^ permalink raw reply

* Re: [PATCH] ethtool: ETHTOOL_SFEATURES: remove NETIF_F_COMPAT return
From: Michał Mirosław @ 2011-05-24 21:59 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, netdev
In-Reply-To: <20110524.153930.610330240390616957.davem@davemloft.net>

On Tue, May 24, 2011 at 03:39:30PM -0400, David Miller wrote:
> From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> Date: Tue, 24 May 2011 11:14:37 +0200
> 
> > On Thu, May 19, 2011 at 12:03:31PM +0200, Michał Mirosław wrote:
> >> On Mon, May 16, 2011 at 02:09:58PM -0400, David Miller wrote:
> >> > You guys really need to sort this out properly.
> >> > Please resubmit whatever final solution is agreed upon.
> >> I noticed that v2.6.39 was tagged today. We should definitely remove
> >> NETIF_F_COMPAT so it won't bite us in the future. The other patch that
> >> fixes ethtool_ops->set_flags compatibility is a bugfix, so it should go
> >> in - if we decide that the SFEATURES compatibility should be removed
> >> it won't matter.
> >> 
> >> Ben, do you agree?
> > Ping?
> > http://patchwork.ozlabs.org/patch/95552/
> > (this is a bugfix, so should go to stable)
> > 
> > http://patchwork.ozlabs.org/patch/95753/
> > (removes ETHTOOL_F_COMPAT; this we need to decide on)
> You and Ben are still arguing over details.
> 
> I want fresh versions of these patches (yes, both of them) once
> all of the issues are resolved.

We could just wait for 2.6.40 and pretend this code part never existed. ;-)

I'll rebase the first patch tomorrow. Without it the compatibility in
ETHTOOL_SFEATURES for non-converted drivers is busted wrt set_flags.

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: linux-next: build failure after merge of the final tree
From: Stephen Rothwell @ 2011-05-24 21:52 UTC (permalink / raw)
  To: Greg KH
  Cc: Mike Frysinger, Linus, linux-next, linux-kernel, David S. Miller,
	netdev, Andrew Morton, Mel Gorman, linux-mm, Alexander Viro,
	linux-fsdevel, Paul E. McKenney, Dipankar Sarma
In-Reply-To: <20110524124833.GB31776@kroah.com>

[-- Attachment #1: Type: text/plain, Size: 599 bytes --]

Hi Greg,

On Tue, 24 May 2011 05:48:33 -0700 Greg KH <greg@kroah.com> wrote:
>
> On Tue, May 24, 2011 at 01:59:30PM +1000, Stephen Rothwell wrote:
> > 
> > The cause was a patch from Linus ...
> 
> Ah, ok, that makes more sense, sorry for the noise.

And it doesn't show up in many builds because musb depends on ARM ||
(BF54x && !BF544) || (BF52x && !BF522 && !BF523).  So it probably appears
in some of the overnight builds, but not the ones I do while creating
linux-next.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox