From: Jiri Pirko <jpirko@redhat.com>
To: "Nicolas de Pesloüan" <nicolas.2p.debian@gmail.com>
Cc: David Miller <davem@davemloft.net>,
kaber@trash.net, eric.dumazet@gmail.com, netdev@vger.kernel.org,
shemminger@linux-foundation.org, fubar@us.ibm.com,
andy@greyhouse.net
Subject: Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
Date: Sat, 26 Feb 2011 15:58:05 +0100 [thread overview]
Message-ID: <20110226145804.GC2783@psychotron.redhat.com> (raw)
In-Reply-To: <4D68E31E.7060807@gmail.com>
Sat, Feb 26, 2011 at 12:25:18PM CET, nicolas.2p.debian@gmail.com wrote:
>Le 26/02/2011 08:14, Jiri Pirko a écrit :
>>Sat, Feb 26, 2011 at 12:46:53AM CET, nicolas.2p.debian@gmail.com wrote:
>>>Le 23/02/2011 20:05, Jiri Pirko a écrit :
>>>>This patch converts bonding to use rx_handler. Results in cleaner
>>>>__netif_receive_skb() with much less exceptions needed. Also
>>>>bond-specific work is moved into bond code.
>>>>
>>>>Did performance test using pktgen and counting incoming packets by
>>>>iptables. No regression noted.
>>>>
>>>>Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>>>
>>>>v1->v2:
>>>> using skb_iif instead of new input_dev to remember original
>>>> device
>>>>
>>>>v2->v3:
>>>> do another loop in case skb->dev is changed. That way orig_dev
>>>> core can be left untouched.
>>>
>>>Hi Jiri,
>>>
>>>Eventually taking enough time for a review.
>>>
>>>I think we should split this change :
>>>
>>>1/ Change __netif_receive_skb() to call rx_handler for diverted net_device, until rx_handler is NULL.
>>>
>>>2/ Convert currently existing rx_handlers (bridge and macvlan) to use
>>>this new "loop" feature, removing the need to call netif_rx() inside
>>>their respective rx_handler and also removing the associated
>>>overhead.
>>
>>This might not be possible. Macvlan uses result of called netif_rx for
>>counting, bridge calls netdev_receive_skb via NF_HOOK. Nevertheless,
>>this can be eventually handled later, not as a part of this patch.
>
>Yes, I agree. Step 2 and step 3 can be swapped.
>
>Anyway, we need to describe the options given to a rx_handler:
>
>- Return skb unchanged. This would cause normal delivery (ptype->dev == NULL or ptype->dev == skb->dev).
>- Return skb->dev changed. __netif_receive_skb() will loop to the new
>device. This would cause extact match delivery only (ptype->dev !=
>NULL and ptype->dev == one of the orig_dev).
>- Manage the skb another way and return NULL. This would stop any
>protocol handlers to receive the skb, except if the rx_handler
>arrange to re-inject the skb somewhere.
>
>>>3/ Convert bonding to use rx_handlers.
>>>
>>>Also, on step 1, we definitely need to clarify what orig_dev should be.
>>>
>>>I now think that orig_dev should be "the device one level below the
>>>current one" or NULL if current device was not diverted from another
>>>one. It means that we should keep an array of crossed (diverted)
>>>devices and the associated orig_dev. This array would be used to pass
>>>the right orig_dev to protocol handlers, depending on the device they
>>>register on :
>>
>>I constructed the patch in the way origdev is the same in all situations
>>as before the patch. I think that this decision can be ommitted at the
>>moment.
>
>Agreed, event if the current handling of orig_dev is far from bullet
>proof and needs to be clarified at some time.
>
>>>eth0 -> bond0 -> br0
>>>
>>>A protocol handler registered on bond0 would receive eth0 as orig_dev.
>>>A protocol handler registered on br0 would receive bond0 as orig_dev.
>>>
>>>[snip]
>>>
>>>>@@ -3167,32 +3135,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>
>>>[snip]
>>>
>>>>+another_round:
>>>>+
>>>>+ __this_cpu_inc(softnet_data.processed);
>>>>+
>>>> #ifdef CONFIG_NET_CLS_ACT
>>>> if (skb->tc_verd& TC_NCLS) {
>>>> skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
>>>>@@ -3209,8 +3157,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>> #endif
>>>>
>>>> list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>>- if (ptype->dev == null_or_orig || ptype->dev == skb->dev ||
>>>>- ptype->dev == orig_dev) {
>>>>+ if (!ptype->dev || ptype->dev == skb->dev) {
>>>> if (pt_prev)
>>>> ret = deliver_skb(skb, pt_prev, orig_dev);
>>>> pt_prev = ptype;
>>>>@@ -3224,16 +3171,20 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>> ncls:
>>>> #endif
>>>>
>>>
>>>Why do you loop to ptype_all before calling rx_handler ?
>>>
>>>I don't understand why ptype_all and ptype_base are not handled at
>>>the same place in current __netif_receive_skb() but I think we should
>>>take the opportunity to change that, unless someone know of a good
>>>reason not to do so.
>>
>>Again, the patch tries to do as little changes as it can. So this stays
>>the same as before. In case you want to change it, feel free to submit
>>patch doing that as follow-on.
>
>The point here is that bridge and macvlan handling used to be after
>the ptype_all loop (hence the place you inserted the call to
>rx_handler last summer), but the bonding part is currently before the
>ptype_all loop.
>
>Moving bonding handling after the ptype_all loop will cause the ptype_all loop to be run twice:
>- first time, with skb->dev == eth0 and orig_dev == eth0.
>- second time, with skb->dev == bond0 and orig_dev == eth0.
>
>The first time currently does not exists. And because bonding wasn't
>given a chance yet to decide that the frame should be dropped, the
>packet will always be delivered to eth0, causing duplicate
>deliveries. Note that this is probably true for bridge and macvlan
>too, and that those duplicate deliveries probably already exists.
Yes, and in fact that was what I like about this patch, that then
deliveries are simillar to bridge.
>
>Also, delivering skb inside a loop that may change the skb (skb->dev
>at least) is guaranteed to produce strange behaviors.
>
>Can someone, knowing the history of
>ptype_all/ptype_base/bridge/macvlan/bonding/vlan handling in
>__netif_receive_skb(), comment on this?
>
>Are there any reasons not to process ptype_all and ptype_base at the
>same location, at the end of __netif_receive_skb(), and to manage all
>divert features (bridge/macvlan/bonding/vlan) before?
That is very good set of questions. Would like to hear answers too.
>
> Nicolas.
next prev parent reply other threads:[~2011-02-26 14:58 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-02-18 13:25 [patch net-next-2.6] net: convert bonding to use rx_handler Jiri Pirko
2011-02-18 13:29 ` Eric Dumazet
2011-02-18 14:14 ` Jiri Pirko
2011-02-18 14:27 ` Eric Dumazet
2011-02-18 14:46 ` Patrick McHardy
2011-02-18 14:58 ` Jiri Pirko
2011-02-18 15:50 ` Patrick McHardy
2011-02-18 16:14 ` Eric Dumazet
2011-02-18 18:47 ` Jiri Pirko
2011-02-18 19:17 ` Eric Dumazet
2011-02-18 19:28 ` Jiri Pirko
2011-02-18 19:58 ` Eric Dumazet
2011-02-18 20:03 ` Jiri Pirko
2011-02-18 20:06 ` David Miller
2011-02-18 20:13 ` Jiri Pirko
2011-02-18 20:58 ` [patch net-next-2.6 V2] " Jiri Pirko
2011-02-18 23:06 ` Jay Vosburgh
2011-02-19 7:44 ` Jiri Pirko
2011-02-19 8:05 ` [patch net-next-2.6 V3] " Jiri Pirko
2011-02-19 8:37 ` Eric Dumazet
2011-02-19 8:58 ` Jiri Pirko
2011-02-19 9:22 ` Eric Dumazet
2011-02-19 10:56 ` Nicolas de Pesloüan
2011-02-19 11:08 ` Jiri Pirko
2011-02-19 11:28 ` Jiri Pirko
2011-02-19 13:18 ` Nicolas de Pesloüan
2011-02-19 13:46 ` Jiri Pirko
2011-02-19 14:32 ` Nicolas de Pesloüan
2011-02-19 20:27 ` Nicolas de Pesloüan
2011-02-20 10:36 ` Jiri Pirko
2011-02-20 12:12 ` Nicolas de Pesloüan
2011-02-20 15:07 ` Jiri Pirko
2011-02-21 23:20 ` Nicolas de Pesloüan
2011-02-26 14:24 ` Nicolas de Pesloüan
2011-02-26 19:42 ` Jay Vosburgh
2011-02-27 12:58 ` Jiri Pirko
2011-02-27 20:44 ` Nicolas de Pesloüan
2011-02-27 23:22 ` David Miller
2011-02-28 7:07 ` Jiri Pirko
2011-02-28 7:30 ` David Miller
2011-02-28 9:22 ` Jiri Pirko
2011-02-28 9:35 ` Eric Dumazet
2011-02-28 9:55 ` [patch net-next-2.6] net: convert bonding to use rx_handler - second part Jiri Pirko
2011-02-28 18:49 ` [patch net-next-2.6 V3] net: convert bonding to use rx_handler David Miller
2011-02-23 19:05 ` Jiri Pirko
2011-02-25 23:46 ` Nicolas de Pesloüan
2011-02-26 7:14 ` Jiri Pirko
2011-02-26 11:25 ` Nicolas de Pesloüan
2011-02-26 14:58 ` Jiri Pirko [this message]
2011-02-27 14:17 ` Nicolas de Pesloüan
2011-02-27 20:06 ` Jiri Pirko
2011-02-27 20:59 ` Nicolas de Pesloüan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110226145804.GC2783@psychotron.redhat.com \
--to=jpirko@redhat.com \
--cc=andy@greyhouse.net \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=fubar@us.ibm.com \
--cc=kaber@trash.net \
--cc=netdev@vger.kernel.org \
--cc=nicolas.2p.debian@gmail.com \
--cc=shemminger@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).