[PATCH] net: take care of bonding in build_skb_flow_key

netdev.vger.kernel.org archive mirror

* [PATCH] net: take care of bonding in build_skb_flow_key (v4)
@ 2016-01-21  5:32 Wengang Wang
  2016-01-21  8:35 ` Jiri Pirko
  0 siblings, 1 reply; 6+ messages in thread
From: Wengang Wang @ 2016-01-21  5:32 UTC
  To: netdev; +Cc: wen.gang.wang, sd, jay.vosburgh, zyjzyj2000

In a bonding setup, we determine the fragment size according to the MTU and
PMTU associated with the bonding master. If the slave finds that the fragment
size is too big, it drops the fragment and calls ip_rt_update_pmtu(),
passing _skb_ and _pmtu_, to try to update the path MTU.
The problem is that the device ip_rt_update_pmtu() actually tries to update
is the slave (skb->dev), not the master. Since no PMTU change happens on the
master, the fragment size for later packets doesn't change, so all later
fragments/packets are dropped too.

The fix is to let build_skb_flow_key() translate the device index from the
bonding slave to the master. That makes the master the target device whose
PMTU ip_rt_update_pmtu() tries to update.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 net/ipv4/route.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 85f184e..7e766b5 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -524,10 +524,19 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
 {
 	const struct iphdr *iph = ip_hdr(skb);
 	int oif = skb->dev->ifindex;
+	struct net_device *master;
 	u8 tos = RT_TOS(iph->tos);
 	u8 prot = iph->protocol;
 	u32 mark = skb->mark;

+	if (netif_is_bond_slave(skb->dev)) {
+		rcu_read_lock();
+		master = netdev_master_upper_dev_get_rcu(skb->dev);
+		if (master)
+			oif = master->ifindex;
+		rcu_read_unlock();
+	}
+
 	__build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
 }

-- 
2.1.0
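
As a rough illustration of the problem described in the commit message, here
is a toy userspace model (not kernel code). It assumes, purely for
illustration, that learned PMTU values live in a small table keyed by output
interface index; the names model_pmtu_table, model_update_pmtu() and
model_lookup_mtu() are hypothetical and do not exist in the kernel. It shows
that a value recorded under the slave's ifindex is not seen by a lookup keyed
on the master's ifindex.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not kernel data structures. */
#define MODEL_TABLE_SIZE 8

struct model_pmtu_entry {
	int oif;		/* interface index the entry was recorded under */
	unsigned int pmtu;	/* learned path MTU */
};

struct model_pmtu_table {
	struct model_pmtu_entry entries[MODEL_TABLE_SIZE];
	size_t count;
};

/* Record (or overwrite) a PMTU under @oif. Returns 0, or -1 if the table is full. */
static int model_update_pmtu(struct model_pmtu_table *t, int oif,
			     unsigned int pmtu)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < t->count; i++) {
		if (t->entries[i].oif == oif) {
			t->entries[i].pmtu = pmtu;
			return 0;
		}
	}
	if (t->count == MODEL_TABLE_SIZE)
		return -1;
	t->entries[t->count].oif = oif;
	t->entries[t->count].pmtu = pmtu;
	t->count++;
	return 0;
}

/* Return the PMTU learned for @oif, or @dev_mtu if nothing was learned. */
static unsigned int model_lookup_mtu(const struct model_pmtu_table *t,
				     int oif, unsigned int dev_mtu)
{
	size_t i;

	assert(t != NULL);
	for (i = 0; i < t->count; i++) {
		if (t->entries[i].oif == oif)
			return t->entries[i].pmtu;
	}
	return dev_mtu;
}
```

In this model, with a slave at ifindex 2 and a master at ifindex 5, recording
1400 under index 2 leaves a lookup on index 5 at the device MTU, while
recording it under index 5 lets that lookup return 1400.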


* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v4)
  2016-01-21  5:32 [PATCH] net: take care of bonding in build_skb_flow_key (v4) Wengang Wang
@ 2016-01-21  8:35 ` Jiri Pirko
  2016-01-22  4:21   ` Wengang Wang
  0 siblings, 1 reply; 6+ messages in thread
From: Jiri Pirko @ 2016-01-21  8:35 UTC
  To: Wengang Wang; +Cc: netdev, sd, jay.vosburgh, zyjzyj2000

Thu, Jan 21, 2016 at 06:32:58AM CET, wen.gang.wang@oracle.com wrote:
>+	if (netif_is_bond_slave(skb->dev)) {
>+		rcu_read_lock();
>+		master = netdev_master_upper_dev_get_rcu(skb->dev);
>+		if (master)
>+			oif = master->ifindex;
>+		rcu_read_unlock();
>+	}

This is certainly not correct as it should not be bond-specific but
rather generic. Note that you may have bond over bond or bridge over
bond or other scenarios, which this patch ignores.
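
To make the objection concrete, here is a small userspace sketch (not kernel
code) of what the v4 patch's logic does on a stacked setup. struct toy_dev
and toy_oif_v4_patch() are hypothetical stand-ins: is_bond_slave models
netif_is_bond_slave() and master models the result of
netdev_master_upper_dev_get_rcu().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct net_device. */
struct toy_dev {
	int ifindex;
	bool is_bond_slave;	/* models netif_is_bond_slave() */
	struct toy_dev *master;	/* models the master upper device, or NULL */
};

/* Mirrors the v4 patch: resolve one level up, and only for bond slaves. */
static int toy_oif_v4_patch(const struct toy_dev *dev)
{
	int oif;

	assert(dev != NULL);
	oif = dev->ifindex;
	if (dev->is_bond_slave && dev->master)
		oif = dev->master->ifindex;
	return oif;
}
```

With eth0 (ifindex 2) enslaved to bond0 (ifindex 5), which is in turn a port
of a bridge br0 (ifindex 7), this returns 5 for eth0 rather than 7. For a
port of a non-bond master type, it returns the port's own ifindex.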


* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v4)
  2016-01-21  8:35 ` Jiri Pirko
@ 2016-01-22  4:21   ` Wengang Wang
  2016-01-22  6:52     ` Jiri Pirko
  0 siblings, 1 reply; 6+ messages in thread
From: Wengang Wang @ 2016-01-22  4:21 UTC
  To: Jiri Pirko; +Cc: netdev, sd, jay.vosburgh, zyjzyj2000



On 2016-01-21 16:35, Jiri Pirko wrote:
> Thu, Jan 21, 2016 at 06:32:58AM CET, wen.gang.wang@oracle.com wrote:
>> +	if (netif_is_bond_slave(skb->dev)) {
>> +		rcu_read_lock();
>> +		master = netdev_master_upper_dev_get_rcu(skb->dev);
>> +		if (master)
>> +			oif = master->ifindex;
>> +		rcu_read_unlock();
>> +	}
> This is certainly not correct as it should not be bond-specific but
> rather generic.

Then what would you suggest to fix it?
> Note that you may have bond over bond or bridge over
> bond or other scenarios, which this patch ignores.
I don't think bond over bond is a good configuration. Do you have a real 
use case for that configuration?

thanks,
wengang


* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v4)
  2016-01-22  4:21   ` Wengang Wang
@ 2016-01-22  6:52     ` Jiri Pirko
  2016-01-22  8:00       ` Wengang Wang
  2016-01-26  7:45       ` zhuyj
  0 siblings, 2 replies; 6+ messages in thread
From: Jiri Pirko @ 2016-01-22  6:52 UTC
  To: Wengang Wang; +Cc: netdev, sd, jay.vosburgh, zyjzyj2000

Fri, Jan 22, 2016 at 05:21:28AM CET, wen.gang.wang@oracle.com wrote:
>
>
>On 2016-01-21 16:35, Jiri Pirko wrote:
>>Thu, Jan 21, 2016 at 06:32:58AM CET, wen.gang.wang@oracle.com wrote:
>>>+	if (netif_is_bond_slave(skb->dev)) {
>>>+		rcu_read_lock();
>>>+		master = netdev_master_upper_dev_get_rcu(skb->dev);
>>>+		if (master)
>>>+			oif = master->ifindex;
>>>+		rcu_read_unlock();
>>>+	}
>>This is certainly not correct as it should not be bond-specific but
>>rather generic.
>
>Then what would you suggest to fix it?
>>Note that you may have bond over bond or bridge over
>>bond or other scenarios, which this patch ignores.
>I don't think bond over bond is a good configuration. Do you have a real use
>case for that configuration?

Stacking of multiple master devices is absolutely common.

You have to go in the upper tree all the way up, for all master device
types.


>
>thanks,
>wengang
>
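
A minimal userspace sketch of "going all the way up" might look like the
following. It is not kernel code: struct sketch_dev and
sketch_topmost_ifindex() are hypothetical names, and the master pointer
stands in for what each call to netdev_master_upper_dev_get_rcu() would
return in the kernel (where the walk would also need to run under
rcu_read_lock(), as in the patch).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct net_device. */
struct sketch_dev {
	int ifindex;
	struct sketch_dev *master;	/* master upper device, or NULL at the top */
};

/*
 * Follow the master chain until a device with no master is reached and
 * return that device's ifindex. No device-type check is made, so bond,
 * bridge and any other master type are treated alike.
 */
static int sketch_topmost_ifindex(const struct sketch_dev *dev)
{
	assert(dev != NULL);
	while (dev->master)
		dev = dev->master;
	return dev->ifindex;
}
```

For eth0 (ifindex 2) under bond0 (ifindex 5) under br0 (ifindex 7), the walk
yields 7 from any starting point in the chain; a device with no master yields
its own ifindex.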


* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v4)
  2016-01-22  6:52     ` Jiri Pirko
@ 2016-01-22  8:00       ` Wengang Wang
  2016-01-26  7:45       ` zhuyj
  1 sibling, 0 replies; 6+ messages in thread
From: Wengang Wang @ 2016-01-22  8:00 UTC
  To: Jiri Pirko; +Cc: netdev, sd, jay.vosburgh, zyjzyj2000



On 2016-01-22 14:52, Jiri Pirko wrote:
> Fri, Jan 22, 2016 at 05:21:28AM CET, wen.gang.wang@oracle.com wrote:
>>
>> On 2016-01-21 16:35, Jiri Pirko wrote:
>>> Thu, Jan 21, 2016 at 06:32:58AM CET, wen.gang.wang@oracle.com wrote:
>>>> +	if (netif_is_bond_slave(skb->dev)) {
>>>> +		rcu_read_lock();
>>>> +		master = netdev_master_upper_dev_get_rcu(skb->dev);
>>>> +		if (master)
>>>> +			oif = master->ifindex;
>>>> +		rcu_read_unlock();
>>>> +	}
>>> This is certainly not correct as it should not be bond-specific but
>>> rather generic.
>> Then what would you suggest to fix it?
>>> Note that you may have bond over bond or bridge over
>>> bond or other scenarios, which this patch ignores.
>> I don't think bond over bond is a good configuration. Do you have a real use
>> case for that configuration?
> Stacking of multiple master devices is absolutely common.
>
> You have to go in the upper tree all the way up, for all master device
> types.
Yep, that would make the code better. I can do it.

thanks,
wengang

>
>> thanks,
>> wengang
>>


* Re: [PATCH] net: take care of bonding in build_skb_flow_key (v4)
  2016-01-22  6:52     ` Jiri Pirko
  2016-01-22  8:00       ` Wengang Wang
@ 2016-01-26  7:45       ` zhuyj
  1 sibling, 0 replies; 6+ messages in thread
From: zhuyj @ 2016-01-26  7:45 UTC
  To: Jiri Pirko, Wengang Wang; +Cc: netdev, sd, jay.vosburgh, zhuyj

On 01/22/2016 02:52 PM, Jiri Pirko wrote:
> Fri, Jan 22, 2016 at 05:21:28AM CET, wen.gang.wang@oracle.com wrote:
>>
>> On 2016-01-21 16:35, Jiri Pirko wrote:
>>> Thu, Jan 21, 2016 at 06:32:58AM CET, wen.gang.wang@oracle.com wrote:
>>>> +	if (netif_is_bond_slave(skb->dev)) {
>>>> +		rcu_read_lock();
>>>> +		master = netdev_master_upper_dev_get_rcu(skb->dev);
>>>> +		if (master)
>>>> +			oif = master->ifindex;
>>>> +		rcu_read_unlock();
>>>> +	}
>>> This is certainly not correct as it should not be bond-specific but
>>> rather generic.
>> Then what would you suggest to fix it?
>>> Note that you may have bond over bond or bridge over
>>> bond or other scenarios, which this patch ignores.
>> I don't think bond over bond is a good configuration. Do you have a real use
>> case for that configuration?
> Stacking of multiple master devices is absolutely common.
>
> You have to go in the upper tree all the way up, for all master device
> types.
I am not sure whether the following would work.
It is just a test patch.

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 85f184e..12b4982 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -523,10 +523,19 @@ static void build_skb_flow_key(struct flowi4 *fl4, const struct sk_buff *skb,
                                const struct sock *sk)
  {
         const struct iphdr *iph = ip_hdr(skb);
-       int oif = skb->dev->ifindex;
+       struct net_device *master = NULL;
         u8 tos = RT_TOS(iph->tos);
         u8 prot = iph->protocol;
         u32 mark = skb->mark;
+       int oif = skb->dev->ifindex;
+
+       if (skb->dev->flags & IFF_SLAVE) {
+               rcu_read_lock();
+               master = skb_dst(skb)->dev;
+               if (master)
+                       oif = master->ifindex;
+               rcu_read_unlock();
+       }

         __build_flow_key(fl4, sk, iph, oif, tos, prot, mark, 0);
  }

Thanks a lot.
Zhu Yanjun
>
>
>> thanks,
>> wengang
>>

