Re: RFS seems to have incompatiblities with bridged vlans

All of lore.kernel.org
 help / color / mirror / Atom feed

From: John Fastabend <john.r.fastabend@intel.com>
To: "tim.gardner@canonical.com" <tim.gardner@canonical.com>
Cc: Tom Herbert <therbert@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Stephen Hemminger <shemminger@vyatta.com>,
	Peter Lieven <pl@dlh.net>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: RFS seems to have incompatiblities with bridged vlans
Date: Tue, 08 Jun 2010 18:08:43 -0700	[thread overview]
Message-ID: <4C0EE99B.8030300@intel.com> (raw)
In-Reply-To: <4C0ED931.6030402@canonical.com>

Tim Gardner wrote:
> On 06/08/2010 05:00 PM, Tom Herbert wrote:
>> How about only checking against dev->num_rx_queues when that value is
>> greater than one.  Since bonding device is calling alloc_netdev, it is
>> not going to set the queue mapping, but dev->num_rx_queues will be one
>> in that case (this handles any intermediate driver that does do
>> multiple queues).
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 6f330ce..30ab66d 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2270,7 +2270,7 @@ static int get_rps_cpu(struct net_device *dev,
>> struct sk_buff *skb,
>>                  u16 v16[2];
>>          } ports;
>>
>> -       if (skb_rx_queue_recorded(skb)) {
>> +       if (skb_rx_queue_recorded(skb)&&  dev->num_rx_queues>  1) {
>>                  u16 index = skb_get_rx_queue(skb);
>>                  if (unlikely(index>= dev->num_rx_queues)) {
>>                          if (net_ratelimit()) {
>>


Problem with this is it doesn't address mis-aligned num_rx_queues.  For example 
with the bonding driver defaulting to 16 queues now. We could end up with a base 
driver with 16+ queues and a bond with 16.  Then we have the same issue again.

  eth0  -------> bond / bridge ---------> vlan.id
(nbrxq=64)      (nbrxq=16)              (nbrxq=X)

>>
>> On Tue, Jun 8, 2010 at 7:18 AM, Eric Dumazet<eric.dumazet@gmail.com>  wrote:
>>> Le lundi 07 juin 2010 à 15:30 -0700, John Fastabend a écrit :
>>>
>>>> There is always a possibility that the underlying device sets the
>>>> queue_mapping to be greater then num_cpus.  Also I suspect the same
>>>> issue exists with bonding devices.  Maybe something like the following
>>>> is worth while? compile tested only,
>>>>
>>>> [PATCH] 8021q: vlan reassigns dev without check queue_mapping
>>>>
>>>> recv path reassigns skb->dev without sanity checking the
>>>> queue_mapping field.  This can result in the queue_mapping
>>>> field being set incorrectly if the new dev supports less
>>>> queues then the underlying device.
>>>>
>>>> This patch just resets the queue_mapping to 0 which should
>>>> resolve this issue?  Any thoughts?
>>>>
>>>> The same issue could happen on bonding devices as well.
>>>>
>>>> Signed-off-by: John Fastabend<john.r.fastabend@intel.com>
>>>> ---
>>>>
>>>>    net/8021q/vlan_core.c |    6 ++++++
>>>>    1 files changed, 6 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
>>>> index bd537fc..ad309f8 100644
>>>> --- a/net/8021q/vlan_core.c
>>>> +++ b/net/8021q/vlan_core.c
>>>> @@ -21,6 +21,9 @@ int __vlan_hwaccel_rx(struct sk_buff *skb, struct
>>>> vlan_group *grp,
>>>>        if (!skb->dev)
>>>>                goto drop;
>>>>
>>>> +     if (unlikely(skb->queue_mapping>= skb->dev->real_num_tx_queues))
>>>> +             skb_set_queue_mapping(skb, 0);
>>>> +
>>>>        return (polling ? netif_receive_skb(skb) : netif_rx(skb));
>>>>
>>>>    drop:
>>>> @@ -93,6 +96,9 @@ vlan_gro_common(struct napi_struct *napi, struct
>>>> vlan_group *grp,
>>>>        if (!skb->dev)
>>>>                goto drop;
>>>>
>>>> +     if (unlikely(skb->queue_mapping>= skb->dev->real_num_tx_queues))
>>>> +             skb_set_queue_mapping(skb, 0);
>>>> +
>>>>        for (p = napi->gro_list; p; p = p->next) {
>>>>                NAPI_GRO_CB(p)->same_flow =
>>>>                        p->dev == skb->dev&&  !compare_ether_header(
>>>> --
>>> Only a workaround, added in hot path in a otherwise 'good' driver
>>> (multiqueue enabled and ready)

Agreed thanks!

>>>
>>> eth0  ------->  bond / bridge --------->  vlan.id
>>> (nbtxq=8)      (ntxbq=1)        (nbtxq=X)
>>>
>>> X is capped to 1 because of bond/bridge, while bond has no "queue"
>>> (LLTX driver)
>>>
>>> Solutions :
>>>
>>> 1) queue_mapping could be silently tested in get_rps_cpu()...
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 6f330ce..3a3f7f6 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -2272,14 +2272,11 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
>>>
>>>         if (skb_rx_queue_recorded(skb)) {
>>>                 u16 index = skb_get_rx_queue(skb);
>>> -               if (unlikely(index>= dev->num_rx_queues)) {
>>> -                       if (net_ratelimit()) {
>>> -                               pr_warning("%s received packet on queue "
>>> -                                       "%u, but number of RX queues is %u\n",
>>> -                                       dev->name, index, dev->num_rx_queues);
>>> -                       }
>>> -                       goto done;
>>> -               }
>>> +               if (WARN_ONCE(index>= dev->num_rx_queues,
>>> +                               KERN_WARNING "%s received packet on queue %u, "
>>> +                               "but number of RX queues is %u\n",
>>> +                               dev->name, index, dev->num_rx_queues))
>>> +                       index %= dev->num_rx_queues;
>>>                 rxqueue = dev->_rx + index;
>>>         } else
>>>                 rxqueue = dev->_rx;
>>>
>>>

Looks good to me.

>>>
>>> 2) bond/bridge should setup more queues, just in case.
>>>    We probably need to be able to make things more dynamic,
>>>    (propagate nbtxq between layers) but not for 2.6.35
>>>
>>>

The bonding driver is already multiq per Andy Gospodarek's patch commit bb1d912, 
but unless the bond and bridge devices use the max num_rx_queues of there 
underlying devices this could still go wrong.

The bonding driver would possibly need to increase num_rx_queues and 
num_tx_queues when a device is enslaved or be set to some maximum at init for 
this to work right.

>>>
>>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>>> index 5e12462..ce813dd 100644
>>> --- a/drivers/net/bonding/bond_main.c
>>> +++ b/drivers/net/bonding/bond_main.c
>>> @@ -5012,8 +5012,8 @@ int bond_create(struct net *net, const char *name)
>>>
>>>         rtnl_lock();
>>>
>>> -       bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
>>> -                               bond_setup);
>>> +       bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
>>> +                                  bond_setup, max(64, nr_cpu_ids));
>>>         if (!bond_dev) {
>>>                 pr_err("%s: eek! can't alloc netdev!\n", name);
>>>                 rtnl_unlock();
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> Huh, so you guys are looking at the same issue (only my issue is RPS). 
> See http://marc.info/?l=linux-netdev&m=127603240621028&w=2. I'm in favor 
> of dropping the warning when no queues have been allocated.
> 
> How about this (see attached).

Prefer Eric's patch see first comment.

Thanks,
John

> 
> rtg
> 
>

next prev parent reply	other threads:[~2010-06-09  1:08 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-07 20:36 RFS seems to have incompatiblities with bridged vlans Peter Lieven
2010-06-07 21:59 ` Stephen Hemminger
2010-06-07 22:19   ` Eric Dumazet
2010-06-07 22:30   ` John Fastabend
2010-06-07 23:13     ` John Fastabend
2010-06-08 14:18     ` Eric Dumazet
2010-06-08 23:00       ` Tom Herbert
2010-06-08 23:58         ` Tim Gardner
2010-06-09  1:08           ` John Fastabend [this message]
2010-06-09  1:52             ` Tom Herbert
2010-06-09  4:42             ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4C0EE99B.8030300@intel.com \
    --to=john.r.fastabend@intel.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pl@dlh.net \
    --cc=shemminger@vyatta.com \
    --cc=therbert@google.com \
    --cc=tim.gardner@canonical.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.