From: "Nicolas de Pesloüan" <nicolas.2p.debian@gmail.com>
To: Jiri Pirko <jpirko@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>,
netdev@vger.kernel.org, davem@davemloft.net,
shemminger@linux-foundation.org, kaber@trash.net,
fubar@us.ibm.com, eric.dumazet@gmail.com
Subject: Re: [patch net-next-2.6] net: reinject arps into bonding slave instead of master
Date: Wed, 09 Mar 2011 23:18:41 +0100 [thread overview]
Message-ID: <4D77FCC1.5050904@gmail.com> (raw)
In-Reply-To: <20110309171100.GA2842@psychotron.redhat.com>
Le 09/03/2011 18:11, Jiri Pirko a écrit :
> Wed, Mar 09, 2011 at 04:28:34PM CET, nicolas.2p.debian@gmail.com wrote:
>> Le 09/03/2011 16:09, Jiri Pirko a écrit :
>>> Wed, Mar 09, 2011 at 03:49:49PM CET, nicolas.2p.debian@gmail.com wrote:
>>>> Le 09/03/2011 08:45, Jiri Pirko a écrit :
>>>>> Tue, Mar 08, 2011 at 10:44:37PM CET, nicolas.2p.debian@gmail.com wrote:
>>>>>> Le 08/03/2011 14:42, Andy Gospodarek a écrit :
>>>>>>> I'm pretty sure this patch will have the same catastrophic problem your
>>>>>>> last one did. By cloning and setting skb2->dev = orig_dev you just
>>>>>>> inserted a frame identical to the one we received right back into the
>>>>>>> stack. It only took a few minutes for my box to melt as one frame on
>>>>>>> the wire will cause an infinite number of frames to be received by the
>>>>>>> stack.
>>>>>>
>>>>>> I agree with Andy. We still keep one reinject (netif_rx), which is
>>>>>> probably better that two (__netif_receive_skb), but not enough.
>>>>>>
>>>>>> I really think we need a general framework for late delivery of final
>>>>>> packets to packet handler registered somewhere in the rx_handler
>>>>>> path.
>>>>>>
>>>>>> Jiri, is this patch the one you announced as "I have some kind nice
>>>>>> solution in mind and I'm going to submit that as a patch later (too
>>>>>> many patches are in the wind atm)" ?
>>>>>
>>>>>
>>>>> I did not had time to verify my thought yet but I think that the only
>>>>> think needed against my original patch (bonding: move processing of recv
>>>>> handlers into handle_frame()) is ro remove vlan_on_bond_hook, period.
>>>>>
>>>>> Because all incoming arps are seen by bond_handle_frame =>
>>>>> bond->recv_probe , even vlan ones - that would make eth0-bond0-bond0.5
>>>>> work and eth0-bond0-br0-bond0.5 as well. But again, need to verify this.
>>>>
>>>> All incoming ARPs are seen by bond_handle_frame, even vlan ones, definitely true.
>>>>
>>>> But as some of them *are* vlan ones, you will have to untag those "by hand" inside bond_handle_frame.
>>>
>>
>>> Hmm. For hw vlan accel this would not be needed.
>>
>> Agreed.
>>
>>> But for non-hw-vlan-accel the frame is wrapped when going thru handle_frame. And yes, in that
>>> case untagging would be necessary. Unless the vlan untagging happening now in ptype_base handler
>>> is moved in rx path somewhere before __netif_receive_skb.
>>
>> Can't it me moved not before, but inside __netif_receive_skb, as a rx_handler?
>
> I don't think so - rx_handler is not right place to untag. rx_handler is
> per device. untaging should happen idealy on realdev skb injection. That
> way the rx path for hw-vlan-accel and non-he-vlan-accel would be very
> similar.
I don't understand your argument about "rx_handler is per device". A vlan interface is by design
built on top a parent (physical or logical) interface. Why can't we use the rx_handler of the parent
device to untag?
Untagging the frame very early (before __netif_receive_skb() would work if the value of skb->dev, at
the time the frame enter __netif_receive_skb, is the parent interface for the vlan interface: eth0
-> eth0.100. This is exactly what happens for hw-accel vlan frame and it sounds logical at first to
do it the same way for non-hw-accel device.
But for all others setups, where there exist some net_devices before the "untagging" one, you would
face some troubles. For example, with eth0+eth1 -> br0 -> br0.100, you cannot untag before entering
__netif_receive_skb. If you do so, the bridge would receive untagged frame and if the frame is not
for the local host, the bridge would forward an untagged frame while it is expected to forward a
tagged one. Even if the bridge is in a position to know the frame *was* tagged, we cannot expect the
bridge to do special processing to handle this situation. Doing so would break layering.
On another setup, eth0 -> eth0.100 -> br0 -> eth1.200 -> eth1, on the opposite, we expect the bridge
to receive and to forward an untagged frame, that has been untagged while crossing eth0.100 and will
be tagged while crossing eth1.200.
If we want to avoid reinjection as much as possible (and I advocate for that, you know :-)), then
the only place to untag is in the rx_header for the parent device of the vlan device. I still don't
understand the problem you see with this proposal.
And the only remaining discrepancy would be in hw-accel vs non-hw-accel. A protocol handler
registered on eth0 -> eth0.100 would receive:
- a tagged frame for non-hw-accel.
- an untagged frame for hw-accel.
I think it is already true today, so I don't consider this a problem. We might arrange to fix this
discrepancy later, if someone consider this a problem.
Nicolas.
>> At the time we setup eth0.100, we can arrange for a vlan_untag
>> rx_handler to be registered on eth0. This is exactly the way it works
>> now for macvlan. Should it be possible (and desirable) for vlan too?
>>
>> This might re-open the discussion about the need for several
>> rx_handlers per interface, but that is another story.
>>
>> Also, moving it before __netif_receive_skb would cause every protocol
>> handlers to receive the frame untagged. At least protocol sniffers
>> registered at ptype_all may want to receive the tagged frame, for
>> diagnostic purpose.
>
> Thats how its done for hw-vlan-accel. No problem here.
>
>>
>> That would result in two things:
>>>
>>> 1) Bond would be able to scope vlan packets
>>> 2) The rx path for hw-vlan-accel and non-hw-vlan-accel would be the same
>>> (should be desirable + one reinjection would be avoided for
>>> non-hw-vlan-accel)
>>
>> Agreed, but moving it inside __netif_receive_skb should have the same
>> effects. If we move non-hw-accel-vlan to a rx_handler, the skb would
>> be COW before being untagged only if shared. This sounds good to me.
>>
>> Nicolas.
>
next prev parent reply other threads:[~2011-03-09 22:18 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-05 10:29 [patch net-next-2.6 0/8] mostly bonding rx path changes Jiri Pirko
2011-03-05 10:29 ` [patch net-next-2.6 1/8] af_packet: use skb->skb_iif instead of orig_dev->ifindex Jiri Pirko
2011-03-05 14:03 ` Nicolas de Pesloüan
2011-03-05 10:29 ` [patch net-next-2.6 2/8] bonding: register slave pointer for rx_handler Jiri Pirko
2011-03-05 14:06 ` Nicolas de Pesloüan
2011-03-05 14:27 ` Jiri Pirko
2011-03-05 14:38 ` Nicolas de Pesloüan
2011-03-05 10:29 ` [patch net-next-2.6 3/8] net: get rid of multiple bond-related netdevice->priv_flags Jiri Pirko
2011-03-05 14:14 ` Nicolas de Pesloüan
2011-03-05 14:37 ` Ben Hutchings
2011-03-05 14:46 ` Nicolas de Pesloüan
2011-03-05 10:29 ` [patch net-next-2.6 4/8] bonding: wrap slave state work Jiri Pirko
2011-03-05 15:21 ` Nicolas de Pesloüan
2011-03-07 9:58 ` Jiri Pirko
2011-03-07 19:55 ` Nicolas de Pesloüan
2011-03-08 7:18 ` Jiri Pirko
2011-03-08 21:23 ` Nicolas de Pesloüan
2011-03-05 10:29 ` [patch net-next-2.6 5/8] bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag Jiri Pirko
2011-03-05 14:18 ` Nicolas de Pesloüan
2011-03-05 10:29 ` [patch net-next-2.6 6/8] bonding: move processing of recv handlers into handle_frame() Jiri Pirko
2011-03-05 14:33 ` Nicolas de Pesloüan
2011-03-05 14:43 ` Jiri Pirko
2011-03-05 14:50 ` Nicolas de Pesloüan
2011-03-06 12:24 ` Nicolas de Pesloüan
2011-03-06 13:34 ` Jiri Pirko
2011-03-06 14:25 ` Nicolas de Pesloüan
2011-03-06 16:32 ` Michał Mirosław
2011-03-06 17:37 ` Nicolas de Pesloüan
2011-03-07 12:51 ` [patch net-next-2.6] net: reinject arps into bonding slave instead of master Jiri Pirko
2011-03-07 14:32 ` Andy Gospodarek
2011-03-07 20:12 ` Nicolas de Pesloüan
2011-03-07 21:19 ` Jiri Pirko
2011-03-07 21:30 ` Nicolas de Pesloüan
2011-03-07 22:43 ` Andy Gospodarek
2011-03-07 23:09 ` Nicolas de Pesloüan
2011-03-08 2:43 ` Andy Gospodarek
2011-03-08 21:34 ` Nicolas de Pesloüan
2011-03-08 7:13 ` Jiri Pirko
2011-03-08 13:42 ` Andy Gospodarek
2011-03-08 21:44 ` Nicolas de Pesloüan
2011-03-09 7:45 ` Jiri Pirko
2011-03-09 14:49 ` Nicolas de Pesloüan
2011-03-09 15:09 ` Jiri Pirko
2011-03-09 15:28 ` Nicolas de Pesloüan
2011-03-09 17:11 ` Jiri Pirko
2011-03-09 22:18 ` Nicolas de Pesloüan [this message]
2011-03-10 6:48 ` Jiri Pirko
2011-03-10 20:44 ` Nicolas de Pesloüan
2011-03-10 20:52 ` Jiri Pirko
2011-03-10 21:05 ` Jiri Pirko
2011-03-09 20:51 ` Jiri Pirko
2011-03-09 13:33 ` Neil Horman
2011-03-05 10:29 ` [patch net-next-2.6 7/8] net: introduce rx_handler results and logic around that Jiri Pirko
2011-03-05 12:48 ` Ben Hutchings
2011-03-05 14:52 ` Nicolas de Pesloüan
2011-03-05 14:54 ` Jiri Pirko
2011-03-05 15:06 ` Nicolas de Pesloüan
2011-03-05 15:13 ` [patch net-next-2.6] net: comment rx_handler results Jiri Pirko
2011-03-05 15:27 ` Nicolas de Pesloüan
2011-03-05 15:37 ` Jiri Pirko
2011-03-05 15:50 ` Nicolas de Pesloüan
2011-03-06 20:00 ` [PATCH net-next-2.6] net: enhance the documentation for rx_handler Nicolas de Pesloüan
2011-03-07 9:54 ` Jiri Pirko
2011-03-07 16:36 ` Stephen Hemminger
2011-03-07 20:01 ` [PATCH net-next-2.6 V2] " Nicolas de Pesloüan
2011-03-05 15:31 ` [patch net-next-2.6 7/8 v2] net: introduce rx_handler results and logic around that Jiri Pirko
2011-03-05 10:29 ` [patch net-next-2.6 8/8] net: get rid of orig_dev parameter of packet handlers Jiri Pirko
2011-03-05 15:05 ` Nicolas de Pesloüan
2011-03-05 15:15 ` Jiri Pirko
2011-03-05 15:32 ` [patch net-next-2.6 8/8 v2] " Jiri Pirko
2011-03-05 16:56 ` Nicolas de Pesloüan
2011-03-05 22:07 ` Jiri Pirko
2011-03-05 22:18 ` Nicolas de Pesloüan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D77FCC1.5050904@gmail.com \
--to=nicolas.2p.debian@gmail.com \
--cc=andy@greyhouse.net \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=fubar@us.ibm.com \
--cc=jpirko@redhat.com \
--cc=kaber@trash.net \
--cc=netdev@vger.kernel.org \
--cc=shemminger@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).