netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bonding: Fix corrupted queue_mapping
@ 2012-06-08  5:05 Tom Herbert
  2012-06-08  5:46 ` David Miller
  2012-06-08  5:57 ` Eric Dumazet
  0 siblings, 2 replies; 17+ messages in thread
From: Tom Herbert @ 2012-06-08  5:05 UTC (permalink / raw)
  To: davem, netdev

In the transmit path of the bonding driver, skb->cb is used to
stash the skb->queue_mapping so that the bonding device can set its
own queue mapping.  This value becomes corrupted since the skb->cb is
also used in __dev_xmit_skb.

When transmitting through bonding driver, bond_select_queue is
called from dev_queue_xmit.  In bond_select_queue the original
skb->queue_mapping is copied into skb->cb (via bond_queue_mapping)
and skb->queue_mapping is overwritten with the bond driver queue.
Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes
the packet length into skb->cb, thereby overwriting the stashed
queue mappping.  In bond_dev_queue_xmit (called from hard_start_xmit),
the queue mapping for the skb is set to the stashed value which is now
the skb length and hence is an invalid queue for the slave device.

Fix is to set bond_queue_mapping to skb->cb +
sizeof((struct qdisc_skb_cb)

Signed-off-by: Tom Herbert <therbert@google.com>
---
 drivers/net/bonding/bond_main.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 2ee8cf9..044c1c0 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -76,6 +76,7 @@
 #include <net/route.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
+#include <net/sch_generic.h>
 #include "bonding.h"
 #include "bond_3ad.h"
 #include "bond_alb.h"
@@ -381,7 +382,8 @@ struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr)
 	return next;
 }
 
-#define bond_queue_mapping(skb) (*(u16 *)((skb)->cb))
+#define bond_queue_mapping(skb) (*(u16 *)((skb)->cb + \
+    sizeof(struct qdisc_skb_cb)))
 
 /**
  * bond_dev_queue_xmit - Prepare skb for xmit.
-- 
1.7.7.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  5:05 [PATCH] bonding: Fix corrupted queue_mapping Tom Herbert
@ 2012-06-08  5:46 ` David Miller
  2012-06-08  5:57 ` Eric Dumazet
  1 sibling, 0 replies; 17+ messages in thread
From: David Miller @ 2012-06-08  5:46 UTC (permalink / raw)
  To: therbert; +Cc: netdev

From: Tom Herbert <therbert@google.com>
Date: Thu, 7 Jun 2012 22:05:42 -0700 (PDT)

> @@ -381,7 +382,8 @@ struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr)
>  	return next;
>  }
>  
> -#define bond_queue_mapping(skb) (*(u16 *)((skb)->cb))
> +#define bond_queue_mapping(skb) (*(u16 *)((skb)->cb + \
> +    sizeof(struct qdisc_skb_cb)))
>  
>  /**
>   * bond_dev_queue_xmit - Prepare skb for xmit.

I know it's a little bit more work, but please declare a proper
datastructure which shows explicitly what's going on, like Infiniband
does in drivers/infiniband/ulp/ipoib/ipoib.h

struct bond_skb_cb {
	struct qdisc_skb_cb	qdisc_cb;
	u16			queue_mapping;
};

Actually, this probably means there is also a conflict and thus
queue mapping corruption possible for bonded infiniband. :-/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  5:05 [PATCH] bonding: Fix corrupted queue_mapping Tom Herbert
  2012-06-08  5:46 ` David Miller
@ 2012-06-08  5:57 ` Eric Dumazet
  2012-06-08  6:02   ` David Miller
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2012-06-08  5:57 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev

On Thu, 2012-06-07 at 22:05 -0700, Tom Herbert wrote:
> In the transmit path of the bonding driver, skb->cb is used to
> stash the skb->queue_mapping so that the bonding device can set its
> own queue mapping.  This value becomes corrupted since the skb->cb is
> also used in __dev_xmit_skb.
> 
> When transmitting through bonding driver, bond_select_queue is
> called from dev_queue_xmit.  In bond_select_queue the original
> skb->queue_mapping is copied into skb->cb (via bond_queue_mapping)
> and skb->queue_mapping is overwritten with the bond driver queue.
> Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes
> the packet length into skb->cb, thereby overwriting the stashed
> queue mappping.  In bond_dev_queue_xmit (called from hard_start_xmit),
> the queue mapping for the skb is set to the stashed value which is now
> the skb length and hence is an invalid queue for the slave device.
> 
> Fix is to set bond_queue_mapping to skb->cb +
> sizeof((struct qdisc_skb_cb)
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  drivers/net/bonding/bond_main.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 2ee8cf9..044c1c0 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -76,6 +76,7 @@
>  #include <net/route.h>
>  #include <net/net_namespace.h>
>  #include <net/netns/generic.h>
> +#include <net/sch_generic.h>
>  #include "bonding.h"
>  #include "bond_3ad.h"
>  #include "bond_alb.h"
> @@ -381,7 +382,8 @@ struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr)
>  	return next;
>  }
>  
> -#define bond_queue_mapping(skb) (*(u16 *)((skb)->cb))
> +#define bond_queue_mapping(skb) (*(u16 *)((skb)->cb + \
> +    sizeof(struct qdisc_skb_cb)))
>  
>  /**
>   * bond_dev_queue_xmit - Prepare skb for xmit.


Sorry this wont work in all cases.

Some qdisc also use skb->cb[]

maybe :

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 55ce96b..47cbfa2 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -220,6 +220,7 @@ struct tcf_proto {
 
 struct qdisc_skb_cb {
 	unsigned int		pkt_len;
+	unsigned int		bond_queue_mapping;
 	unsigned char		data[24];
 };
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  5:57 ` Eric Dumazet
@ 2012-06-08  6:02   ` David Miller
  2012-06-08  6:11     ` Eric Dumazet
  2012-06-08  6:17     ` Eric Dumazet
  0 siblings, 2 replies; 17+ messages in thread
From: David Miller @ 2012-06-08  6:02 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 08 Jun 2012 07:57:37 +0200

> On Thu, 2012-06-07 at 22:05 -0700, Tom Herbert wrote:
>> In the transmit path of the bonding driver, skb->cb is used to
>> stash the skb->queue_mapping so that the bonding device can set its
>> own queue mapping.  This value becomes corrupted since the skb->cb is
>> also used in __dev_xmit_skb.
>> 
>> When transmitting through bonding driver, bond_select_queue is
>> called from dev_queue_xmit.  In bond_select_queue the original
>> skb->queue_mapping is copied into skb->cb (via bond_queue_mapping)
>> and skb->queue_mapping is overwritten with the bond driver queue.
>> Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes
>> the packet length into skb->cb, thereby overwriting the stashed
>> queue mappping.  In bond_dev_queue_xmit (called from hard_start_xmit),
>> the queue mapping for the skb is set to the stashed value which is now
>> the skb length and hence is an invalid queue for the slave device.
>> 
>> Fix is to set bond_queue_mapping to skb->cb +
>> sizeof((struct qdisc_skb_cb)
>> 
>> Signed-off-by: Tom Herbert <therbert@google.com>
>> ---
>>  drivers/net/bonding/bond_main.c |    4 +++-
>>  1 files changed, 3 insertions(+), 1 deletions(-)
>> 
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 2ee8cf9..044c1c0 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -76,6 +76,7 @@
>>  #include <net/route.h>
>>  #include <net/net_namespace.h>
>>  #include <net/netns/generic.h>
>> +#include <net/sch_generic.h>
>>  #include "bonding.h"
>>  #include "bond_3ad.h"
>>  #include "bond_alb.h"
>> @@ -381,7 +382,8 @@ struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr)
>>  	return next;
>>  }
>>  
>> -#define bond_queue_mapping(skb) (*(u16 *)((skb)->cb))
>> +#define bond_queue_mapping(skb) (*(u16 *)((skb)->cb + \
>> +    sizeof(struct qdisc_skb_cb)))
>>  
>>  /**
>>   * bond_dev_queue_xmit - Prepare skb for xmit.
> 
> 
> Sorry this wont work in all cases.
> 
> Some qdisc also use skb->cb[]

Hmmm, isn't that what qdisc_skb_cb is for?  And even private data is
explicitly allocated:

>  	unsigned char		data[24];

there. :-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  6:02   ` David Miller
@ 2012-06-08  6:11     ` Eric Dumazet
  2012-06-08  6:15       ` David Miller
  2012-06-08  6:17     ` Eric Dumazet
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2012-06-08  6:11 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev

On Thu, 2012-06-07 at 23:02 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 08 Jun 2012 07:57:37 +0200
> 
> > On Thu, 2012-06-07 at 22:05 -0700, Tom Herbert wrote:
> >> In the transmit path of the bonding driver, skb->cb is used to
> >> stash the skb->queue_mapping so that the bonding device can set its
> >> own queue mapping.  This value becomes corrupted since the skb->cb is
> >> also used in __dev_xmit_skb.
> >> 
> >> When transmitting through bonding driver, bond_select_queue is
> >> called from dev_queue_xmit.  In bond_select_queue the original
> >> skb->queue_mapping is copied into skb->cb (via bond_queue_mapping)
> >> and skb->queue_mapping is overwritten with the bond driver queue.
> >> Subsequently in dev_queue_xmit, __dev_xmit_skb is called which writes
> >> the packet length into skb->cb, thereby overwriting the stashed
> >> queue mappping.  In bond_dev_queue_xmit (called from hard_start_xmit),
> >> the queue mapping for the skb is set to the stashed value which is now
> >> the skb length and hence is an invalid queue for the slave device.
> >> 
> >> Fix is to set bond_queue_mapping to skb->cb +
> >> sizeof((struct qdisc_skb_cb)
> >> 
> >> Signed-off-by: Tom Herbert <therbert@google.com>
> >> ---
> >>  drivers/net/bonding/bond_main.c |    4 +++-
> >>  1 files changed, 3 insertions(+), 1 deletions(-)
> >> 
> >> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >> index 2ee8cf9..044c1c0 100644
> >> --- a/drivers/net/bonding/bond_main.c
> >> +++ b/drivers/net/bonding/bond_main.c
> >> @@ -76,6 +76,7 @@
> >>  #include <net/route.h>
> >>  #include <net/net_namespace.h>
> >>  #include <net/netns/generic.h>
> >> +#include <net/sch_generic.h>
> >>  #include "bonding.h"
> >>  #include "bond_3ad.h"
> >>  #include "bond_alb.h"
> >> @@ -381,7 +382,8 @@ struct vlan_entry *bond_next_vlan(struct bonding *bond, struct vlan_entry *curr)
> >>  	return next;
> >>  }
> >>  
> >> -#define bond_queue_mapping(skb) (*(u16 *)((skb)->cb))
> >> +#define bond_queue_mapping(skb) (*(u16 *)((skb)->cb + \
> >> +    sizeof(struct qdisc_skb_cb)))
> >>  
> >>  /**
> >>   * bond_dev_queue_xmit - Prepare skb for xmit.
> > 
> > 
> > Sorry this wont work in all cases.
> > 
> > Some qdisc also use skb->cb[]
> 
> Hmmm, isn't that what qdisc_skb_cb is for?  And even private data is
> explicitly allocated:
> 
> >  	unsigned char		data[24];
> 
> there. :-)
> 

Yes, but some other layers can use the same trick so it might collide.

Inserting the bond field in qdisc_skb_cb (level0) is safer.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  6:11     ` Eric Dumazet
@ 2012-06-08  6:15       ` David Miller
  2012-06-08  6:47         ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2012-06-08  6:15 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 08 Jun 2012 08:11:21 +0200

> On Thu, 2012-06-07 at 23:02 -0700, David Miller wrote:
>> Hmmm, isn't that what qdisc_skb_cb is for?  And even private data is
>> explicitly allocated:
>> 
>> >  	unsigned char		data[24];
>> 
>> there. :-)
>> 
> 
> Yes, but some other layers can use the same trick so it might collide.
> 
> Inserting the bond field in qdisc_skb_cb (level0) is safer.

Do you suggest that Infiniband does the same thing? :-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  6:02   ` David Miller
  2012-06-08  6:11     ` Eric Dumazet
@ 2012-06-08  6:17     ` Eric Dumazet
  2012-06-08  6:22       ` David Miller
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2012-06-08  6:17 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev

On Thu, 2012-06-07 at 23:02 -0700, David Miller wrote:

> Hmmm, isn't that what qdisc_skb_cb is for?  And even private data is
> explicitly allocated:
> 
> >  	unsigned char		data[24];
> 
> there. :-)
> 

By the way, I notice data[] is not aligned on a long on 64bit arches.

This might break net/sched/sch_netem.c on some arches, since
time_to_send is a u64.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  6:17     ` Eric Dumazet
@ 2012-06-08  6:22       ` David Miller
  0 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2012-06-08  6:22 UTC (permalink / raw)
  To: eric.dumazet; +Cc: therbert, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 08 Jun 2012 08:17:18 +0200

> On Thu, 2012-06-07 at 23:02 -0700, David Miller wrote:
> 
>> Hmmm, isn't that what qdisc_skb_cb is for?  And even private data is
>> explicitly allocated:
>> 
>> >  	unsigned char		data[24];
>> 
>> there. :-)
>> 
> 
> By the way, I notice data[] is not aligned on a long on 64bit arches.
> 
> This might break net/sched/sch_netem.c on some arches, since
> time_to_send is a u64.

Looks like we'll get the bonding queue mapping and fix this alignment
bug for free then :-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  6:15       ` David Miller
@ 2012-06-08  6:47         ` Eric Dumazet
  2012-06-08  7:23           ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2012-06-08  6:47 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev

On Thu, 2012-06-07 at 23:15 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 08 Jun 2012 08:11:21 +0200
> 
> > On Thu, 2012-06-07 at 23:02 -0700, David Miller wrote:
> >> Hmmm, isn't that what qdisc_skb_cb is for?  And even private data is
> >> explicitly allocated:
> >> 
> >> >  	unsigned char		data[24];
> >> 
> >> there. :-)
> >> 
> > 
> > Yes, but some other layers can use the same trick so it might collide.
> > 
> > Inserting the bond field in qdisc_skb_cb (level0) is safer.
> 
> Do you suggest that Infiniband does the same thing? :-)

I wonder if another way to solve this is not letting ndo_select_queue()
method the responsibility to call skb_set_queue_mapping() itself ?

(ie removing skb_set_queue_mapping() done in dev_pick_tx())

bonding would not have to save/restore skb queue mapping ?

Partial patch : (we have to audit all ndo_select_queue()

diff --git a/net/core/dev.c b/net/core/dev.c
index cd09819..c6c92d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2368,6 +2368,7 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 
 	if (dev->real_num_tx_queues == 1)
 		queue_index = 0;
+		skb_set_queue_mapping(skb, queue_index);
 	else if (ops->ndo_select_queue) {
 		queue_index = ops->ndo_select_queue(dev, skb);
 		queue_index = dev_cap_txqueue(dev, queue_index);
@@ -2391,9 +2392,9 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 					sk_tx_queue_set(sk, queue_index);
 			}
 		}
+		skb_set_queue_mapping(skb, queue_index);
 	}
 
-	skb_set_queue_mapping(skb, queue_index);
 	return netdev_get_tx_queue(dev, queue_index);
 }
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  6:47         ` Eric Dumazet
@ 2012-06-08  7:23           ` Eric Dumazet
  2012-06-08  7:42             ` John Fastabend
                               ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Eric Dumazet @ 2012-06-08  7:23 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev

On Fri, 2012-06-08 at 08:47 +0200, Eric Dumazet wrote:
> On Thu, 2012-06-07 at 23:15 -0700, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Fri, 08 Jun 2012 08:11:21 +0200
> > 
> > > On Thu, 2012-06-07 at 23:02 -0700, David Miller wrote:
> > >> Hmmm, isn't that what qdisc_skb_cb is for?  And even private data is
> > >> explicitly allocated:
> > >> 
> > >> >  	unsigned char		data[24];
> > >> 
> > >> there. :-)
> > >> 
> > > 
> > > Yes, but some other layers can use the same trick so it might collide.
> > > 
> > > Inserting the bond field in qdisc_skb_cb (level0) is safer.
> > 
> > Do you suggest that Infiniband does the same thing? :-)
> 
> I wonder if another way to solve this is not letting ndo_select_queue()
> method the responsibility to call skb_set_queue_mapping() itself ?
> 
> (ie removing skb_set_queue_mapping() done in dev_pick_tx())
> 
> bonding would not have to save/restore skb queue mapping ?
> 
> Partial patch : (we have to audit all ndo_select_queue()
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index cd09819..c6c92d5 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2368,6 +2368,7 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
>  
>  	if (dev->real_num_tx_queues == 1)
>  		queue_index = 0;
> +		skb_set_queue_mapping(skb, queue_index);
>  	else if (ops->ndo_select_queue) {
>  		queue_index = ops->ndo_select_queue(dev, skb);
>  		queue_index = dev_cap_txqueue(dev, queue_index);
> @@ -2391,9 +2392,9 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
>  					sk_tx_queue_set(sk, queue_index);
>  			}
>  		}
> +		skb_set_queue_mapping(skb, queue_index);
>  	}
>  
> -	skb_set_queue_mapping(skb, queue_index);
>  	return netdev_get_tx_queue(dev, queue_index);
>  }
>  
> 


I must say I dont understand dev_pick_tx() anymore.

It seems to ignore skb->queue_mapping (unless device provides its own
ndo_select_queue() and this functions is aware of skb->queue_mapping, as
correctly done in ixgbe)

So commit fff3269907897ee (tcp: reflect SYN queue_mapping into SYNACK
packets) works on ixgbe, but probably not on other multiqueue devices.

This sounds like a regression to me.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  7:23           ` Eric Dumazet
@ 2012-06-08  7:42             ` John Fastabend
  2012-06-08  7:48               ` Eric Dumazet
  2012-06-08  7:46             ` Eric Dumazet
  2012-06-08 15:04             ` Tom Herbert
  2 siblings, 1 reply; 17+ messages in thread
From: John Fastabend @ 2012-06-08  7:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, therbert, netdev

On 6/8/2012 12:23 AM, Eric Dumazet wrote:
> On Fri, 2012-06-08 at 08:47 +0200, Eric Dumazet wrote:
>> On Thu, 2012-06-07 at 23:15 -0700, David Miller wrote:
>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>> Date: Fri, 08 Jun 2012 08:11:21 +0200
>>>
>>>> On Thu, 2012-06-07 at 23:02 -0700, David Miller wrote:
>>>>> Hmmm, isn't that what qdisc_skb_cb is for?  And even private data is
>>>>> explicitly allocated:
>>>>>
>>>>>>   	unsigned char		data[24];
>>>>>
>>>>> there. :-)
>>>>>
>>>>
>>>> Yes, but some other layers can use the same trick so it might collide.
>>>>
>>>> Inserting the bond field in qdisc_skb_cb (level0) is safer.
>>>
>>> Do you suggest that Infiniband does the same thing? :-)
>>
>> I wonder if another way to solve this is not letting ndo_select_queue()
>> method the responsibility to call skb_set_queue_mapping() itself ?
>>
>> (ie removing skb_set_queue_mapping() done in dev_pick_tx())
>>
>> bonding would not have to save/restore skb queue mapping ?
>>
>> Partial patch : (we have to audit all ndo_select_queue()
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index cd09819..c6c92d5 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2368,6 +2368,7 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
>>
>>   	if (dev->real_num_tx_queues == 1)
>>   		queue_index = 0;
>> +		skb_set_queue_mapping(skb, queue_index);
>>   	else if (ops->ndo_select_queue) {
>>   		queue_index = ops->ndo_select_queue(dev, skb);
>>   		queue_index = dev_cap_txqueue(dev, queue_index);
>> @@ -2391,9 +2392,9 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
>>   					sk_tx_queue_set(sk, queue_index);
>>   			}
>>   		}
>> +		skb_set_queue_mapping(skb, queue_index);
>>   	}
>>
>> -	skb_set_queue_mapping(skb, queue_index);
>>   	return netdev_get_tx_queue(dev, queue_index);
>>   }
>>
>>
>
>
> I must say I dont understand dev_pick_tx() anymore.
>
> It seems to ignore skb->queue_mapping (unless device provides its own
> ndo_select_queue() and this functions is aware of skb->queue_mapping, as
> correctly done in ixgbe)
>
> So commit fff3269907897ee (tcp: reflect SYN queue_mapping into SYNACK
> packets) works on ixgbe, but probably not on other multiqueue devices.
>
> This sounds like a regression to me.
>
>
>

Well it would get picked up via skb_tx_hash(),

         else if (ops->ndo_select_queue) {
		[...]
         } else {
                 struct sock *sk = skb->sk;
                 queue_index = sk_tx_queue_get(sk);

                 if (queue_index < 0 || skb->ooo_okay ||
                     queue_index >= dev->real_num_tx_queues) {
                         int old_index = queue_index;

                         queue_index = get_xps_queue(dev, skb);
                         if (queue_index < 0)
                                 queue_index = skb_tx_hash(dev, skb);
	[...]


So think this might be OK.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  7:23           ` Eric Dumazet
  2012-06-08  7:42             ` John Fastabend
@ 2012-06-08  7:46             ` Eric Dumazet
  2012-06-08 15:04             ` Tom Herbert
  2 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2012-06-08  7:46 UTC (permalink / raw)
  To: David Miller; +Cc: therbert, netdev

On Fri, 2012-06-08 at 09:24 +0200, Eric Dumazet wrote:

> 
> I must say I dont understand dev_pick_tx() anymore.
> 
> It seems to ignore skb->queue_mapping (unless device provides its own
> ndo_select_queue() and this functions is aware of skb->queue_mapping, as
> correctly done in ixgbe)
> 
> So commit fff3269907897ee (tcp: reflect SYN queue_mapping into SYNACK
> packets) works on ixgbe, but probably not on other multiqueue devices.
> 
> This sounds like a regression to me.

Oh well, its done in skb_tx_hash(), after a few indirections, and if
skb->sk is NULL.

Which happens to be true in net-next for SYNACKS after commit
90ba9b1986b5ac (tcp: tcp_make_synack() can use alloc_skb())

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  7:42             ` John Fastabend
@ 2012-06-08  7:48               ` Eric Dumazet
  2012-06-08  7:52                 ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2012-06-08  7:48 UTC (permalink / raw)
  To: John Fastabend; +Cc: David Miller, therbert, netdev

On Fri, 2012-06-08 at 00:42 -0700, John Fastabend wrote:
> On 6/8/2012 12:23 AM, Eric Dumazet wrote:
> > On Fri, 2012-06-08 at 08:47 +0200, Eric Dumazet wrote:
> >> On Thu, 2012-06-07 at 23:15 -0700, David Miller wrote:
> >>> From: Eric Dumazet <eric.dumazet@gmail.com>
> >>> Date: Fri, 08 Jun 2012 08:11:21 +0200
> >>>
> >>>> On Thu, 2012-06-07 at 23:02 -0700, David Miller wrote:
> >>>>> Hmmm, isn't that what qdisc_skb_cb is for?  And even private data is
> >>>>> explicitly allocated:
> >>>>>
> >>>>>>   	unsigned char		data[24];
> >>>>>
> >>>>> there. :-)
> >>>>>
> >>>>
> >>>> Yes, but some other layers can use the same trick so it might collide.
> >>>>
> >>>> Inserting the bond field in qdisc_skb_cb (level0) is safer.
> >>>
> >>> Do you suggest that Infiniband does the same thing? :-)
> >>
> >> I wonder if another way to solve this is not letting ndo_select_queue()
> >> method the responsibility to call skb_set_queue_mapping() itself ?
> >>
> >> (ie removing skb_set_queue_mapping() done in dev_pick_tx())
> >>
> >> bonding would not have to save/restore skb queue mapping ?
> >>
> >> Partial patch : (we have to audit all ndo_select_queue()
> >>
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index cd09819..c6c92d5 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -2368,6 +2368,7 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
> >>
> >>   	if (dev->real_num_tx_queues == 1)
> >>   		queue_index = 0;
> >> +		skb_set_queue_mapping(skb, queue_index);
> >>   	else if (ops->ndo_select_queue) {
> >>   		queue_index = ops->ndo_select_queue(dev, skb);
> >>   		queue_index = dev_cap_txqueue(dev, queue_index);
> >> @@ -2391,9 +2392,9 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
> >>   					sk_tx_queue_set(sk, queue_index);
> >>   			}
> >>   		}
> >> +		skb_set_queue_mapping(skb, queue_index);
> >>   	}
> >>
> >> -	skb_set_queue_mapping(skb, queue_index);
> >>   	return netdev_get_tx_queue(dev, queue_index);
> >>   }
> >>
> >>
> >
> >
> > I must say I dont understand dev_pick_tx() anymore.
> >
> > It seems to ignore skb->queue_mapping (unless device provides its own
> > ndo_select_queue() and this functions is aware of skb->queue_mapping, as
> > correctly done in ixgbe)
> >
> > So commit fff3269907897ee (tcp: reflect SYN queue_mapping into SYNACK
> > packets) works on ixgbe, but probably not on other multiqueue devices.
> >
> > This sounds like a regression to me.
> >
> >
> >
> 
> Well it would get picked up via skb_tx_hash(),
> 
>          else if (ops->ndo_select_queue) {
> 		[...]
>          } else {
>                  struct sock *sk = skb->sk;
>                  queue_index = sk_tx_queue_get(sk);
> 
>                  if (queue_index < 0 || skb->ooo_okay ||
>                      queue_index >= dev->real_num_tx_queues) {
>                          int old_index = queue_index;
> 
>                          queue_index = get_xps_queue(dev, skb);
>                          if (queue_index < 0)
>                                  queue_index = skb_tx_hash(dev, skb);
> 	[...]
> 
> 
> So think this might be OK.

Yes, it sounds like sk setting (sk->sk_tx_queue_mapping) has precedence
over skb->queue_mapping.

Not sure how it works for UDP workload for example.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  7:48               ` Eric Dumazet
@ 2012-06-08  7:52                 ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2012-06-08  7:52 UTC (permalink / raw)
  To: John Fastabend; +Cc: David Miller, therbert, netdev

On Fri, 2012-06-08 at 09:49 +0200, Eric Dumazet wrote:

> Yes, it sounds like sk setting (sk->sk_tx_queue_mapping) has precedence
> over skb->queue_mapping.
> 
> Not sure how it works for UDP workload for example.

Unconnected UDP sockets dont have sk_dst_cache, so
sk->sk_tx_queue_mapping stay to -1, so everything seems good.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08  7:23           ` Eric Dumazet
  2012-06-08  7:42             ` John Fastabend
  2012-06-08  7:46             ` Eric Dumazet
@ 2012-06-08 15:04             ` Tom Herbert
  2012-06-08 15:11               ` Eric Dumazet
  2 siblings, 1 reply; 17+ messages in thread
From: Tom Herbert @ 2012-06-08 15:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

> I must say I dont understand dev_pick_tx() anymore.
>
> It seems to ignore skb->queue_mapping (unless device provides its own
> ndo_select_queue() and this functions is aware of skb->queue_mapping, as
> correctly done in ixgbe)
>
> So commit fff3269907897ee (tcp: reflect SYN queue_mapping into SYNACK
> packets) works on ixgbe, but probably not on other multiqueue devices.
>
> This sounds like a regression to me.
>
Maybe the fundamental issue is that the queue mappings only allow for
one level of multi queue device.  It might be better if bonding didn't
have one and dev_pick_tx did the right thin (use xps on bonding
maybe).

Tom

>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08 15:04             ` Tom Herbert
@ 2012-06-08 15:11               ` Eric Dumazet
  2012-06-08 16:16                 ` John Fastabend
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2012-06-08 15:11 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev

On Fri, 2012-06-08 at 08:04 -0700, Tom Herbert wrote:

> Maybe the fundamental issue is that the queue mappings only allow for
> one level of multi queue device.  It might be better if bonding didn't
> have one and dev_pick_tx did the right thin (use xps on bonding
> maybe).

bonding misuses multiqueue infrastructure to divert frames on selected
slaves, or maybe I am wrong.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] bonding: Fix corrupted queue_mapping
  2012-06-08 15:11               ` Eric Dumazet
@ 2012-06-08 16:16                 ` John Fastabend
  0 siblings, 0 replies; 17+ messages in thread
From: John Fastabend @ 2012-06-08 16:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, David Miller, netdev

On 6/8/2012 8:11 AM, Eric Dumazet wrote:
> On Fri, 2012-06-08 at 08:04 -0700, Tom Herbert wrote:
>
>> Maybe the fundamental issue is that the queue mappings only allow for
>> one level of multi queue device.  It might be better if bonding didn't
>> have one and dev_pick_tx did the right thin (use xps on bonding
>> maybe).
>
> bonding misuses multiqueue infrastructure to divert frames on selected
> slaves, or maybe I am wrong.
>

This is right see bond_slave_override() here the slaves queue_ids
are mapped to skb->queue_mapping via this TX_QUEUE_OVERRIDE param.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-06-08 16:16 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-06-08  5:05 [PATCH] bonding: Fix corrupted queue_mapping Tom Herbert
2012-06-08  5:46 ` David Miller
2012-06-08  5:57 ` Eric Dumazet
2012-06-08  6:02   ` David Miller
2012-06-08  6:11     ` Eric Dumazet
2012-06-08  6:15       ` David Miller
2012-06-08  6:47         ` Eric Dumazet
2012-06-08  7:23           ` Eric Dumazet
2012-06-08  7:42             ` John Fastabend
2012-06-08  7:48               ` Eric Dumazet
2012-06-08  7:52                 ` Eric Dumazet
2012-06-08  7:46             ` Eric Dumazet
2012-06-08 15:04             ` Tom Herbert
2012-06-08 15:11               ` Eric Dumazet
2012-06-08 16:16                 ` John Fastabend
2012-06-08  6:17     ` Eric Dumazet
2012-06-08  6:22       ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).