From: "Nicolas de Pesloüan" <nicolas.2p.debian@gmail.com>
To: Jiri Pirko <jpirko@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>,
David Miller <davem@davemloft.net>,
kaber@trash.net, eric.dumazet@gmail.com, netdev@vger.kernel.org,
shemminger@linux-foundation.org, andy@greyhouse.net
Subject: Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
Date: Sat, 19 Feb 2011 14:18:00 +0100
Message-ID: <4D5FC308.9020507@gmail.com>
In-Reply-To: <20110219112842.GE2782@psychotron.redhat.com>
On 19/02/2011 12:28, Jiri Pirko wrote:
> Sat, Feb 19, 2011 at 12:08:31PM CET, jpirko@redhat.com wrote:
>> Sat, Feb 19, 2011 at 11:56:23AM CET, nicolas.2p.debian@gmail.com wrote:
>>> On 19/02/2011 09:05, Jiri Pirko wrote:
>>>> This patch converts bonding to use rx_handler. Results in cleaner
>>>> __netif_receive_skb() with far fewer exceptions needed. Also
>>>> bond-specific work is moved into bond code.
>>>>
>>>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>>>
>>>> v1->v2:
>>>> using skb_iif instead of new input_dev to remember original
>>>> device
>>>> v2->v3:
>>>> set orig_dev = skb->dev if skb_iif is set
>>>>
>>>
>>> Why do we need to let the rx_handlers call netif_rx() or __netif_receive_skb()?
>>>
>>> Bonding used to be handled with very little overhead, simply replacing
>>> skb->dev with skb->dev->master. Time has passed and we eventually
>>> added a lot of special processing for bonding into __netif_receive_skb(),
>>> but the overhead remained very light.
>>>
>>> Calling netif_rx() (or __netif_receive_skb()) to allow nesting would probably lead to some overhead.
>>>
>>> Can't we, instead, loop inside __netif_receive_skb(), and deliver
>>> whatever needs to be delivered, to whoever needs it, inside the loop?
>>>
>>> rx_handler = rcu_dereference(skb->dev->rx_handler);
>>> while (rx_handler) {
>>> 	/* ... */
>>> 	orig_dev = skb->dev;
>>> 	skb = rx_handler(skb);
>>> 	/* ... */
>>> 	rx_handler = (skb->dev != orig_dev) ? rcu_dereference(skb->dev->rx_handler) : NULL;
>>> }
>>>
>>> This would reduce the overhead, while still allowing nesting: vlan on
>>> top of bonding, bridge on top of bonding, ...
>>
>> I see your point. Makes sense to me. But the loop would have to include
>> at least processing of ptype_all too. I'm going to cook a follow-up
>> patch.
>>
>
> DRAFT (doesn't modify rx_handlers):
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 4ebf7fe..e5dba47 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3115,6 +3115,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
> {
> struct packet_type *ptype, *pt_prev;
> rx_handler_func_t *rx_handler;
> + struct net_device *dev;
> struct net_device *orig_dev;
> struct net_device *null_or_dev;
> int ret = NET_RX_DROP;
> @@ -3129,7 +3130,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
> if (netpoll_receive_skb(skb))
> return NET_RX_DROP;
>
> - __this_cpu_inc(softnet_data.processed);
> + skb->skb_iif = skb->dev->ifindex;
> + orig_dev = skb->dev;
orig_dev should be set inside the loop, to reflect the "previously crossed device", while following
the path:
eth0 -> bond0 -> br0.
First step inside the loop:
orig_dev = eth0
skb->dev = bond0 (at the end of the loop).
Second step inside the loop:
orig_dev = bond0
skb->dev = br0 (at the end of the loop).
This would allow exact-match delivery to bond0 if someone binds there.
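To make the per-round update concrete, a minimal user-space model (not kernel code). The fake_dev/fake_skb types, the to_master() handler and the MAX_HOPS guard are hypothetical stand-ins; the handler signature only mirrors the skb-returning rx_handler discussed in this thread. orig_dev is refreshed at the top of every round, so the eth0 -> bond0 -> br0 walk records the pairs (eth0, bond0) and (bond0, br0):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct net_device / struct sk_buff. */
struct fake_skb;

struct fake_dev {
	const char *name;
	struct fake_skb *(*rx_handler)(struct fake_skb *skb);
	struct fake_dev *master;	/* device the handler hands the skb to */
};

struct fake_skb {
	struct fake_dev *dev;
};

/* Stands in for the bonding/bridge rx_handlers: move the skb one level up.
 * Only installed on devices that have a master. */
static struct fake_skb *to_master(struct fake_skb *skb)
{
	skb->dev = skb->dev->master;
	return skb;
}

#define MAX_HOPS 8	/* hypothetical loop guard, not in the draft */

/* Walks the rx_handler chain. For each round, records orig_dev (the device
 * crossed in that round) and skb->dev at the end of the round.
 * Returns the number of rounds that moved the skb to another device. */
static int walk(struct fake_skb *skb, const char *orig[], const char *next[])
{
	struct fake_dev *orig_dev;
	int rounds = 0;

	while (skb->dev->rx_handler && rounds < MAX_HOPS) {
		orig_dev = skb->dev;		/* set inside the loop */
		skb = skb->dev->rx_handler(skb);
		if (!skb || skb->dev == orig_dev)
			break;			/* consumed, or no new round */
		orig[rounds] = orig_dev->name;
		next[rounds] = skb->dev->name;
		rounds++;
	}
	return rounds;
}
```

Building br0, bond0 (master br0) and eth0 (master bond0), with to_master() installed on the two lower devices, and calling walk() on an skb whose dev is eth0 should give 2 rounds and leave skb->dev on br0.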
> +
> skb_reset_network_header(skb);
> skb_reset_transport_header(skb);
> skb->mac_len = skb->network_header - skb->mac_header;
> @@ -3138,12 +3141,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>
> rcu_read_lock();
>
> - if (!skb->skb_iif) {
> - skb->skb_iif = skb->dev->ifindex;
> - orig_dev = skb->dev;
> - } else {
> - orig_dev = dev_get_by_index_rcu(dev_net(skb->dev), skb->skb_iif);
> - }
I like the fact that it removes the above part.
> +another_round:
> + __this_cpu_inc(softnet_data.processed);
> + dev = skb->dev;
>
> #ifdef CONFIG_NET_CLS_ACT
> if (skb->tc_verd & TC_NCLS) {
> @@ -3153,7 +3153,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
> #endif
>
> list_for_each_entry_rcu(ptype, &ptype_all, list) {
> - if (!ptype->dev || ptype->dev == skb->dev) {
> + if (!ptype->dev || ptype->dev == dev) {
> if (pt_prev)
> ret = deliver_skb(skb, pt_prev, orig_dev);
> pt_prev = ptype;
Inside the loop, we should only do exact match delivery, for &ptype_all and for
&ptype_base[ntohs(type) & PTYPE_HASH_MASK]:
list_for_each_entry_rcu(ptype, &ptype_all, list) {
-	if (!ptype->dev || ptype->dev == dev) {
+	if (ptype->dev == dev) {
		if (pt_prev)
			ret = deliver_skb(skb, pt_prev, orig_dev);
		pt_prev = ptype;
	}
}
list_for_each_entry_rcu(ptype,
		&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
	if (ptype->type == type &&
-	    (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
+	    (ptype->dev == skb->dev)) {
		if (pt_prev)
			ret = deliver_skb(skb, pt_prev, orig_dev);
		pt_prev = ptype;
	}
}
After leaving the loop, we can do wildcard delivery, if skb is not NULL.
list_for_each_entry_rcu(ptype, &ptype_all, list) {
-	if (!ptype->dev || ptype->dev == dev) {
+	if (!ptype->dev) {
		if (pt_prev)
			ret = deliver_skb(skb, pt_prev, orig_dev);
		pt_prev = ptype;
	}
}
list_for_each_entry_rcu(ptype,
		&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
-	if (ptype->type == type &&
-	    (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
+	if (ptype->type == type && !ptype->dev) {
		if (pt_prev)
			ret = deliver_skb(skb, pt_prev, orig_dev);
		pt_prev = ptype;
	}
}
This would reduce the number of tests inside the list_for_each_entry_rcu() loops. And because we
match only ptype->dev == dev inside the loop and !ptype->dev outside the loop, this should avoid
duplicate delivery.
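A small model of the delivery rule above, assuming each protocol handler is reduced to the name of the device it binds to (NULL for a wildcard). fake_ptype, deliver() and deliver_draft() are hypothetical names; the matching conditions follow the ptype_all tests quoted above. Over the path eth0 -> bond0 -> br0, the wildcard handler gets the packet once with the proposed split, but once per round with the draft's "!ptype->dev || ptype->dev == dev" test inside the loop:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct packet_type: only the bound device
 * matters here; dev == NULL means wildcard (bound to no device). */
struct fake_ptype {
	const char *dev;	/* bound device name, or NULL for wildcard */
	int delivered;		/* how many copies this handler received */
};

/* Proposed scheme: exact-match delivery on every round of the loop,
 * wildcard delivery only once, after the loop. path[] lists the devices
 * crossed, e.g. eth0 -> bond0 -> br0. */
static void deliver(struct fake_ptype *pt, size_t npt,
		    const char *const path[], size_t hops)
{
	size_t h, i;

	for (h = 0; h < hops; h++)		/* inside the loop */
		for (i = 0; i < npt; i++)
			if (pt[i].dev && strcmp(pt[i].dev, path[h]) == 0)
				pt[i].delivered++;

	for (i = 0; i < npt; i++)		/* after the loop */
		if (!pt[i].dev)
			pt[i].delivered++;
}

/* Draft scheme: "!ptype->dev || ptype->dev == dev" tested on every round. */
static void deliver_draft(struct fake_ptype *pt, size_t npt,
			  const char *const path[], size_t hops)
{
	size_t h, i;

	for (h = 0; h < hops; h++)
		for (i = 0; i < npt; i++)
			if (!pt[i].dev || strcmp(pt[i].dev, path[h]) == 0)
				pt[i].delivered++;
}
```

With one handler bound to each of eth0, bond0 and br0 plus one wildcard handler, deliver() gives every handler exactly one copy, while deliver_draft() gives the wildcard handler three.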
Also, for performance reasons, exact-match protocol handler lists might be moved from ptype_base or
ptype_all to a per-net_device list. That way, the list_for_each_entry_rcu() inside the loop would be
empty if no protocol handler binds to the current dev.
Inside the loop:
list_for_each_entry_rcu(ptype, &dev->ptype_all, list) {
	if (pt_prev)
		ret = deliver_skb(skb, pt_prev, orig_dev);
	pt_prev = ptype;
}
list_for_each_entry_rcu(ptype,
		&dev->ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
	if (ptype->type == type) {
		if (pt_prev)
			ret = deliver_skb(skb, pt_prev, orig_dev);
		pt_prev = ptype;
	}
}
Outside the loop:
list_for_each_entry_rcu(ptype, &ptype_all, list) {
	if (pt_prev)
		ret = deliver_skb(skb, pt_prev, orig_dev);
	pt_prev = ptype;
}
list_for_each_entry_rcu(ptype,
		&ptype_base[ntohs(type) & PTYPE_HASH_MASK], list) {
	if (ptype->type == type) {
		if (pt_prev)
			ret = deliver_skb(skb, pt_prev, orig_dev);
		pt_prev = ptype;
	}
}
This would require several changes into ptype_all and ptype_base handling, but should be faster.
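A rough sketch of the per-device list idea, assuming hypothetical per-device handler arrays (the exact[] field below) in place of the dev->ptype_all / dev->ptype_base lists suggested above. Handlers are plain integer ids and the rx_handler chain is reduced to a master pointer. The inner loop needs no device test at all, and a device with no bound handler costs only an empty loop:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PT 4

/* Hypothetical model: each device carries its own list of exact-match
 * handler ids; wildcard handlers stay on one global list. */
struct fake_dev {
	const char *name;
	int exact[MAX_PT];		/* ids of handlers bound to this device */
	size_t nexact;
	struct fake_dev *master;	/* next device up the stack, or NULL */
};

/* Walks dev -> dev->master -> ... delivering each device's own handlers,
 * then the global wildcard handlers once. Writes delivered ids into out[]
 * (caller must size it) and returns how many were delivered. */
static size_t deliver_per_dev(struct fake_dev *dev, const int *wild,
			      size_t nwild, int *out)
{
	size_t n = 0, i;

	for (; dev; dev = dev->master)		/* inside the loop */
		for (i = 0; i < dev->nexact; i++)
			out[n++] = dev->exact[i];	/* no ptype->dev test */

	for (i = 0; i < nwild; i++)		/* outside the loop */
		out[n++] = wild[i];

	return n;
}
```

With bond0 holding handler 7, eth0 and br0 holding none, and wildcard ids {1, 2}, walking from eth0 delivers 7, then 1 and 2.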
> @@ -3167,7 +3167,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
> ncls:
> #endif
>
> - rx_handler = rcu_dereference(skb->dev->rx_handler);
> + rx_handler = rcu_dereference(dev->rx_handler);
> if (rx_handler) {
> if (pt_prev) {
> ret = deliver_skb(skb, pt_prev, orig_dev);
> @@ -3176,6 +3176,8 @@ ncls:
> skb = rx_handler(skb);
> if (!skb)
> goto out;
> + if (dev != skb->dev)
I would use "if (skb->dev != dev)" for clarity, because skb->dev is expected to have changed, not dev.
> + goto another_round;
> }
>
> if (vlan_tx_tag_present(skb)) {
>
Nicolas.