netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] core: Untag packets after rx_handler has run.
@ 2014-09-04 18:40 Vladislav Yasevich
  2014-09-04 19:05 ` Jiri Pirko
  0 siblings, 1 reply; 7+ messages in thread
From: Vladislav Yasevich @ 2014-09-04 18:40 UTC (permalink / raw)
  To: netdev
  Cc: Vladislav Yasevich, Jiri Pirko, Florian Zumbiehl, Eric Dumazet,
	Matthew Rosato

Currently, we attempt to remove the vlan informaion from the packet
before passing it to the rx_handler.  In most situations this works
just fine since the rx_handlers are usually installed for the
lower device and thus vlan device isn't found.  However, macvtap
device is a bit different as it installs an rx_handler on top
of a macvlan device.  As a result, if someone was define a vlan
device on top of a macvap (for the purposes of enabling a VM
to use vlans with macvtap), then the current code will result
in passing an untagged packet to the macvtap rx_handler and the
VM will not receive tagged traffic.

This patch moves the untaggin code to after the rx_handler for
the current device has been called.  This still works for the
existing rx_handlers (like bonding/teaming/bridging/etc) and
also makes vlans on top of macvtap work as before.

Fixes: 6acf54f1cf (macvtap: Add support of packet capture on macvtap device)
Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
 net/core/dev.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index ab9a165..54691d1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3642,17 +3642,6 @@ ncls:
 	if (pfmemalloc && !skb_pfmemalloc_protocol(skb))
 		goto drop;
 
-	if (vlan_tx_tag_present(skb)) {
-		if (pt_prev) {
-			ret = deliver_skb(skb, pt_prev, orig_dev);
-			pt_prev = NULL;
-		}
-		if (vlan_do_receive(&skb))
-			goto another_round;
-		else if (unlikely(!skb))
-			goto unlock;
-	}
-
 	rx_handler = rcu_dereference(skb->dev->rx_handler);
 	if (rx_handler) {
 		if (pt_prev) {
@@ -3674,6 +3663,17 @@ ncls:
 		}
 	}
 
+	if (vlan_tx_tag_present(skb)) {
+		if (pt_prev) {
+			ret = deliver_skb(skb, pt_prev, orig_dev);
+			pt_prev = NULL;
+		}
+		if (vlan_do_receive(&skb))
+			goto another_round;
+		else if (unlikely(!skb))
+			goto unlock;
+	}
+
 	if (unlikely(vlan_tx_tag_present(skb))) {
 		if (vlan_tx_tag_get_id(skb))
 			skb->pkt_type = PACKET_OTHERHOST;
-- 
1.9.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net] core: Untag packets after rx_handler has run.
  2014-09-04 18:40 [PATCH net] core: Untag packets after rx_handler has run Vladislav Yasevich
@ 2014-09-04 19:05 ` Jiri Pirko
  2014-09-04 19:29   ` Vlad Yasevich
  0 siblings, 1 reply; 7+ messages in thread
From: Jiri Pirko @ 2014-09-04 19:05 UTC (permalink / raw)
  To: Vladislav Yasevich
  Cc: netdev, Vladislav Yasevich, Florian Zumbiehl, Eric Dumazet,
	Matthew Rosato

Thu, Sep 04, 2014 at 08:40:43PM CEST, vyasevich@gmail.com wrote:
>Currently, we attempt to remove the vlan informaion from the packet
>before passing it to the rx_handler.  In most situations this works
>just fine since the rx_handlers are usually installed for the
>lower device and thus vlan device isn't found.  However, macvtap
>device is a bit different as it installs an rx_handler on top
>of a macvlan device.  As a result, if someone was define a vlan
>device on top of a macvap (for the purposes of enabling a VM
>to use vlans with macvtap), then the current code will result
>in passing an untagged packet to the macvtap rx_handler and the
>VM will not receive tagged traffic.

skb->vlan_tci is set. macvlan should work with that to pass the frame
correctly. This should be handled in macvtap code.

btw can you give me an example of setup where rx_handler is set for
macvlan device? I wonder what is could be.

>
>This patch moves the untaggin code to after the rx_handler for
>the current device has been called.  This still works for the
>existing rx_handlers (like bonding/teaming/bridging/etc) and
>also makes vlans on top of macvtap work as before.
>
>Fixes: 6acf54f1cf (macvtap: Add support of packet capture on macvtap device)
>Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
>Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>---
> net/core/dev.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index ab9a165..54691d1 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -3642,17 +3642,6 @@ ncls:
> 	if (pfmemalloc && !skb_pfmemalloc_protocol(skb))
> 		goto drop;
> 
>-	if (vlan_tx_tag_present(skb)) {
>-		if (pt_prev) {
>-			ret = deliver_skb(skb, pt_prev, orig_dev);
>-			pt_prev = NULL;
>-		}
>-		if (vlan_do_receive(&skb))
>-			goto another_round;
>-		else if (unlikely(!skb))
>-			goto unlock;
>-	}
>-
> 	rx_handler = rcu_dereference(skb->dev->rx_handler);
> 	if (rx_handler) {
> 		if (pt_prev) {
>@@ -3674,6 +3663,17 @@ ncls:
> 		}
> 	}
> 
>+	if (vlan_tx_tag_present(skb)) {
>+		if (pt_prev) {
>+			ret = deliver_skb(skb, pt_prev, orig_dev);
>+			pt_prev = NULL;
>+		}
>+		if (vlan_do_receive(&skb))
>+			goto another_round;
>+		else if (unlikely(!skb))
>+			goto unlock;
>+	}
>+

nack. This will definitelly break several stacked setups.


> 	if (unlikely(vlan_tx_tag_present(skb))) {
> 		if (vlan_tx_tag_get_id(skb))
> 			skb->pkt_type = PACKET_OTHERHOST;
>-- 
>1.9.3
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] core: Untag packets after rx_handler has run.
  2014-09-04 19:05 ` Jiri Pirko
@ 2014-09-04 19:29   ` Vlad Yasevich
  2014-09-04 20:43     ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Vlad Yasevich @ 2014-09-04 19:29 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, Vladislav Yasevich, Florian Zumbiehl, Eric Dumazet,
	Matthew Rosato

On 09/04/2014 03:05 PM, Jiri Pirko wrote:
> Thu, Sep 04, 2014 at 08:40:43PM CEST, vyasevich@gmail.com wrote:
>> Currently, we attempt to remove the vlan informaion from the packet
>> before passing it to the rx_handler.  In most situations this works
>> just fine since the rx_handlers are usually installed for the
>> lower device and thus vlan device isn't found.  However, macvtap
>> device is a bit different as it installs an rx_handler on top
>> of a macvlan device.  As a result, if someone was define a vlan
>> device on top of a macvap (for the purposes of enabling a VM
>> to use vlans with macvtap), then the current code will result
>> in passing an untagged packet to the macvtap rx_handler and the
>> VM will not receive tagged traffic.
> 
> skb->vlan_tci is set. macvlan should work with that to pass the frame
> correctly. This should be handled in macvtap code.

OK.  Consider a configuration.

               vlan10
vlan10           |
  |         VM1:eth0
  v          |
macvtap0 <---+
  |
  V
eth0

On the host, vlan10 can't really send and receive traffic.  It's only
there to add a vlan filter to eth0 so that packets marked with vlan10
can be received.

When traffic is processed by __netif_receive_skb_core(), skb->dev is eth0 and we
have the tci, so that when vlan_do_receive() is called, we can't find the vlan
device and call the rx_handler macvlan_handle_frame().
That handler calls netif_rx() with skb->dev set to macvtap0.

This time through the receive path, vlan_tci is still set and we
do find the vlan device which is on top of macvtap0, so we set the tci to 0
and then pass it to the rx_handler macvtap_handle_frame().

As a result, we pass an untagged frame to the VM.

> 
> btw can you give me an example of setup where rx_handler is set for
> macvlan device? I wonder what is could be.
> 
>>
>> This patch moves the untaggin code to after the rx_handler for
>> the current device has been called.  This still works for the
>> existing rx_handlers (like bonding/teaming/bridging/etc) and
>> also makes vlans on top of macvtap work as before.
>>
>> Fixes: 6acf54f1cf (macvtap: Add support of packet capture on macvtap device)
>> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>> ---
>> net/core/dev.c | 22 +++++++++++-----------
>> 1 file changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index ab9a165..54691d1 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3642,17 +3642,6 @@ ncls:
>> 	if (pfmemalloc && !skb_pfmemalloc_protocol(skb))
>> 		goto drop;
>>
>> -	if (vlan_tx_tag_present(skb)) {
>> -		if (pt_prev) {
>> -			ret = deliver_skb(skb, pt_prev, orig_dev);
>> -			pt_prev = NULL;
>> -		}
>> -		if (vlan_do_receive(&skb))
>> -			goto another_round;
>> -		else if (unlikely(!skb))
>> -			goto unlock;
>> -	}
>> -
>> 	rx_handler = rcu_dereference(skb->dev->rx_handler);
>> 	if (rx_handler) {
>> 		if (pt_prev) {
>> @@ -3674,6 +3663,17 @@ ncls:
>> 		}
>> 	}
>>
>> +	if (vlan_tx_tag_present(skb)) {
>> +		if (pt_prev) {
>> +			ret = deliver_skb(skb, pt_prev, orig_dev);
>> +			pt_prev = NULL;
>> +		}
>> +		if (vlan_do_receive(&skb))
>> +			goto another_round;
>> +		else if (unlikely(!skb))
>> +			goto unlock;
>> +	}
>> +
> 
> nack. This will definitelly break several stacked setups.

Which ones?  The only thing I can see that would behave differently
is something like:

    vlan0      bridge0
     |           |
     +-------- eth0

In this case, the old code would give an untagged packet to the bridge
and the new code would give a tagged packet.

This set-up is a bit ambiguous.  Remove the vlan, and bridge gets a tagged
traffic even though the vlan has no relationship to the bridge.

I've tested a couple of different stacked setups and they all seem to work.

Thanks
-vlad

> 
> 
>> 	if (unlikely(vlan_tx_tag_present(skb))) {
>> 		if (vlan_tx_tag_get_id(skb))
>> 			skb->pkt_type = PACKET_OTHERHOST;
>> -- 
>> 1.9.3
>>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] core: Untag packets after rx_handler has run.
  2014-09-04 19:29   ` Vlad Yasevich
@ 2014-09-04 20:43     ` Alexei Starovoitov
  2014-09-04 21:01       ` Vlad Yasevich
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2014-09-04 20:43 UTC (permalink / raw)
  To: Vlad Yasevich
  Cc: Jiri Pirko, netdev, Vladislav Yasevich, Florian Zumbiehl,
	Eric Dumazet, Matthew Rosato

On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote:
> > nack. This will definitelly break several stacked setups.
> 
> Which ones?  The only thing I can see that would behave differently
> is something like:
> 
>     vlan0      bridge0
>      |           |
>      +-------- eth0
> 
> In this case, the old code would give an untagged packet to the bridge
> and the new code would give a tagged packet.
> 
> This set-up is a bit ambiguous.  Remove the vlan, and bridge gets a tagged
> traffic even though the vlan has no relationship to the bridge.
> 
> I've tested a couple of different stacked setups and they all seem to work.

2nd nack.
It will break user space, including our setup that has:
 vlanX     OVS
   |        |
   +------ eth0

vlan device has IP assigned and all tagged traffic goes through the stack
and into control plane process. ovs datapath keeps managing eth0 with
all other vlans.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] core: Untag packets after rx_handler has run.
  2014-09-04 20:43     ` Alexei Starovoitov
@ 2014-09-04 21:01       ` Vlad Yasevich
  2014-09-04 21:54         ` Alexei Starovoitov
  0 siblings, 1 reply; 7+ messages in thread
From: Vlad Yasevich @ 2014-09-04 21:01 UTC (permalink / raw)
  To: Alexei Starovoitov, Vlad Yasevich
  Cc: Jiri Pirko, netdev, Florian Zumbiehl, Eric Dumazet,
	Matthew Rosato

On 09/04/2014 04:43 PM, Alexei Starovoitov wrote:
> On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote:
>>> nack. This will definitelly break several stacked setups.
>>
>> Which ones?  The only thing I can see that would behave differently
>> is something like:
>>
>>     vlan0      bridge0
>>      |           |
>>      +-------- eth0
>>
>> In this case, the old code would give an untagged packet to the bridge
>> and the new code would give a tagged packet.
>>
>> This set-up is a bit ambiguous.  Remove the vlan, and bridge gets a tagged
>> traffic even though the vlan has no relationship to the bridge.
>>
>> I've tested a couple of different stacked setups and they all seem to work.
> 
> 2nd nack.
> It will break user space, including our setup that has:
>  vlanX     OVS
>    |        |
>    +------ eth0
> 
> vlan device has IP assigned and all tagged traffic goes through the stack
> and into control plane process. ovs datapath keeps managing eth0 with
> all other vlans.
> 

Did you specially configure OVS to pass the traffic up the stack?  I see
OVS will only pass LOOPBACK packets.  All others it seems to consume.

Can the same be accomplished with a tagged internal port?

The reason I am asking, is I am trying to figure out if this is
a valid config.  It seems very hard to get right and seems to work almost
by accident at times.  For example, in the bridge scenario I described.
vlan and bridge have to share a mac address for that work.

-vlad

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] core: Untag packets after rx_handler has run.
  2014-09-04 21:01       ` Vlad Yasevich
@ 2014-09-04 21:54         ` Alexei Starovoitov
  2014-09-04 23:48           ` Vlad Yasevich
  0 siblings, 1 reply; 7+ messages in thread
From: Alexei Starovoitov @ 2014-09-04 21:54 UTC (permalink / raw)
  To: vyasevic
  Cc: Vlad Yasevich, Jiri Pirko, netdev@vger.kernel.org,
	Florian Zumbiehl, Eric Dumazet, Matthew Rosato

On Thu, Sep 4, 2014 at 2:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:
> On 09/04/2014 04:43 PM, Alexei Starovoitov wrote:
>> On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote:
>>>> nack. This will definitelly break several stacked setups.
>>>
>>> Which ones?  The only thing I can see that would behave differently
>>> is something like:
>>>
>>>     vlan0      bridge0
>>>      |           |
>>>      +-------- eth0
>>>
>>> In this case, the old code would give an untagged packet to the bridge
>>> and the new code would give a tagged packet.
>>>
>>> This set-up is a bit ambiguous.  Remove the vlan, and bridge gets a tagged
>>> traffic even though the vlan has no relationship to the bridge.
>>>
>>> I've tested a couple of different stacked setups and they all seem to work.
>>
>> 2nd nack.
>> It will break user space, including our setup that has:
>>  vlanX     OVS
>>    |        |
>>    +------ eth0
>>
>> vlan device has IP assigned and all tagged traffic goes through the stack
>> and into control plane process. ovs datapath keeps managing eth0 with
>> all other vlans.
>>
>
> Did you specially configure OVS to pass the traffic up the stack?  I see
> OVS will only pass LOOPBACK packets.  All others it seems to consume.
>
> Can the same be accomplished with a tagged internal port?

our ovs config is not using internal port. vlan device is used as
control interface and should be independent of ovs datapath.
Theoretically it may be possible to use ovs for both, but very dangerous,
when control and data are going through the same datapath.
Any ovs programming mistake will kill control plane and whole
hypervisor will become inaccessible.

> The reason I am asking, is I am trying to figure out if this is
> a valid config.  It seems very hard to get right and seems to work almost
> by accident at times.  For example, in the bridge scenario I described.
> vlan and bridge have to share a mac address for that work.

I think it's not valid vs invalid config.
this was the behavior of vlan devices for long time. vlan was parsed
and send to vlan_dev _before_ rx_handler. I suspect there is more
than one user app that is relying on that.
I can change our stuff to do something different, but I think we
should not be breaking vlan behavior for others.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] core: Untag packets after rx_handler has run.
  2014-09-04 21:54         ` Alexei Starovoitov
@ 2014-09-04 23:48           ` Vlad Yasevich
  0 siblings, 0 replies; 7+ messages in thread
From: Vlad Yasevich @ 2014-09-04 23:48 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Vlad Yasevich, Jiri Pirko, netdev@vger.kernel.org,
	Florian Zumbiehl, Eric Dumazet, Matthew Rosato

On 09/04/2014 05:54 PM, Alexei Starovoitov wrote:
> On Thu, Sep 4, 2014 at 2:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:
>> On 09/04/2014 04:43 PM, Alexei Starovoitov wrote:
>>> On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote:
>>>>> nack. This will definitelly break several stacked setups.
>>>>
>>>> Which ones?  The only thing I can see that would behave differently
>>>> is something like:
>>>>
>>>>     vlan0      bridge0
>>>>      |           |
>>>>      +-------- eth0
>>>>
>>>> In this case, the old code would give an untagged packet to the bridge
>>>> and the new code would give a tagged packet.
>>>>
>>>> This set-up is a bit ambiguous.  Remove the vlan, and bridge gets a tagged
>>>> traffic even though the vlan has no relationship to the bridge.
>>>>
>>>> I've tested a couple of different stacked setups and they all seem to work.
>>>
>>> 2nd nack.
>>> It will break user space, including our setup that has:
>>>  vlanX     OVS
>>>    |        |
>>>    +------ eth0
>>>
>>> vlan device has IP assigned and all tagged traffic goes through the stack
>>> and into control plane process. ovs datapath keeps managing eth0 with
>>> all other vlans.
>>>
>>
>> Did you specially configure OVS to pass the traffic up the stack?  I see
>> OVS will only pass LOOPBACK packets.  All others it seems to consume.
>>
>> Can the same be accomplished with a tagged internal port?
> 
> our ovs config is not using internal port. vlan device is used as
> control interface and should be independent of ovs datapath.
> Theoretically it may be possible to use ovs for both, but very dangerous,
> when control and data are going through the same datapath.
> Any ovs programming mistake will kill control plane and whole
> hypervisor will become inaccessible.
> 
>> The reason I am asking, is I am trying to figure out if this is
>> a valid config.  It seems very hard to get right and seems to work almost
>> by accident at times.  For example, in the bridge scenario I described.
>> vlan and bridge have to share a mac address for that work.
> 
> I think it's not valid vs invalid config.
> this was the behavior of vlan devices for long time. vlan was parsed
> and send to vlan_dev _before_ rx_handler. I suspect there is more
> than one user app that is relying on that.
> I can change our stuff to do something different, but I think we
> should not be breaking vlan behavior for others.
> 

I see.  So vlan device always appears to take precedence over the rx_handler
if they are at the same level and we can't break this.

OK, this means that to solve this we have to expose the vlan filtering
API on macvtap devices as well.

Thanks
-vlad

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-09-04 23:48 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-09-04 18:40 [PATCH net] core: Untag packets after rx_handler has run Vladislav Yasevich
2014-09-04 19:05 ` Jiri Pirko
2014-09-04 19:29   ` Vlad Yasevich
2014-09-04 20:43     ` Alexei Starovoitov
2014-09-04 21:01       ` Vlad Yasevich
2014-09-04 21:54         ` Alexei Starovoitov
2014-09-04 23:48           ` Vlad Yasevich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).