* [PATCH V2] ipvlan: fix ipvlan MTU limits
@ 2018-01-10 7:21 liuqifa
2018-01-11 2:09 ` Mahesh Bandewar (महेश बंडेवार)
0 siblings, 1 reply; 9+ messages in thread
From: liuqifa @ 2018-01-10 7:21 UTC (permalink / raw)
To: davem, dsahern, maheshb, mschiffer, idosch, fw, kjlx,
girish.moodalbail, sainath.grandhi
Cc: netdev, weiyongjun1, chenweilong, maowenan, wangyufen, yuehaibing,
liujian56, liuqifa, liuzhe24, wangkefeng.wang, dingtianhong
From: Keefe Liu <liuqifa@huawei.com>
The MTU of ipvlan interface should not bigger than the phy device, When we
run following scripts, we will find there are some problems.
Step1:
ip link add link eth0 name ipv1 type ipvlan mode l2
ip netns add net1
ip link set dev ipv1 netns net1
Step2:
ip netns exec net1 ip link set dev ipv1 mtu 1501
RTNETLINK answers: Invalid argument
dmesg info: "ipv1: Invalid MTU 1501 requested, hw max 1500"
Step3:
ip link set dev eth0 mtu 1600
ip netns exec net1 ip link set dev ipv1 mtu 1501
RTNETLINK answers: Invalid argument
dmesg info: "ipv1: Invalid MTU 1501 requested, hw max 1500"
Step4:
ip link set dev eth0 mtu 1400
ip netns exec net1 ip link set dev ipv1 mtu 1500
The result of Step2 is we expected, but the result of Step3 and Step4
are not.
This patch set ipvlan's MTU ranges from ETH_MIN_MTU to phy device's maximum
MTU. When we create a new ipvlan device, its MTU will be set to the same
with phy device if we not specify the mtu value. When we change the ipvlan
device's MTU, ipvlan_change_mtu() will make sure the new MTU no bigger than
the phy device's MTU.
When we change the phy device's MTU, ipvlan devices attached to this
phy device will check their MTU value. If ipvlan's MTU bigger than phy's,
ipvlan's MTU will be set to the same with phy device, otherwise ipvlan's
MTU remains unchanged.
Signed-off-by: Keefe Liu <liuqifa@huawei.com>
---
drivers/net/ipvlan/ipvlan_main.c | 36 ++++++++++++++++++++++++++++--------
1 file changed, 28 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 30cb803..2ba071a 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -34,11 +34,6 @@ struct ipvlan_netns {
.l3mdev_l3_rcv = ipvlan_l3_rcv,
};
-static void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev)
-{
- ipvlan->dev->mtu = dev->mtu;
-}
-
static int ipvlan_register_nf_hook(struct net *net)
{
struct ipvlan_netns *vnet = net_generic(net, ipvlan_netid);
@@ -380,12 +375,24 @@ static int ipvlan_get_iflink(const struct net_device *dev)
return ipvlan->phy_dev->ifindex;
}
+static int ipvlan_change_mtu(struct net_device *dev, int new_mtu)
+{
+ struct ipvl_dev *ipvlan = netdev_priv(dev);
+
+ if (ipvlan->phy_dev->mtu < new_mtu)
+ return -EINVAL;
+
+ dev->mtu = new_mtu;
+ return 0;
+}
+
static const struct net_device_ops ipvlan_netdev_ops = {
.ndo_init = ipvlan_init,
.ndo_uninit = ipvlan_uninit,
.ndo_open = ipvlan_open,
.ndo_stop = ipvlan_stop,
.ndo_start_xmit = ipvlan_start_xmit,
+ .ndo_change_mtu = ipvlan_change_mtu,
.ndo_fix_features = ipvlan_fix_features,
.ndo_change_rx_flags = ipvlan_change_rx_flags,
.ndo_set_rx_mode = ipvlan_set_multicast_mac_filter,
@@ -581,10 +588,18 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
}
}
+ if (!tb[IFLA_MTU])
+ dev->mtu = phy_dev->mtu;
+ else if (dev->mtu > phy_dev->mtu)
+ return -EINVAL;
+
+ /* MTU range: 68 - phy_dev->max_mtu */
+ dev->min_mtu = ETH_MIN_MTU;
+ dev->max_mtu = phy_dev->max_mtu;
+
ipvlan->phy_dev = phy_dev;
ipvlan->dev = dev;
ipvlan->sfeatures = IPVLAN_FEATURES;
- ipvlan_adjust_mtu(ipvlan, phy_dev);
INIT_LIST_HEAD(&ipvlan->addrs);
/* TODO Probably put random address here to be presented to the
@@ -680,6 +695,8 @@ void ipvlan_link_setup(struct net_device *dev)
{
ether_setup(dev);
+ dev->min_mtu = 0;
+ dev->max_mtu = ETH_MAX_MTU;
dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
dev->priv_flags |= IFF_UNICAST_FLT | IFF_NO_QUEUE;
dev->netdev_ops = &ipvlan_netdev_ops;
@@ -775,8 +792,11 @@ static int ipvlan_device_event(struct notifier_block *unused,
break;
case NETDEV_CHANGEMTU:
- list_for_each_entry(ipvlan, &port->ipvlans, pnode)
- ipvlan_adjust_mtu(ipvlan, dev);
+ list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
+ if (ipvlan->dev->mtu <= dev->mtu)
+ continue;
+ dev_set_mtu(ipvlan->dev, dev->mtu);
+ }
break;
case NETDEV_CHANGEADDR:
--
1.8.3.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH V2] ipvlan: fix ipvlan MTU limits
2018-01-10 7:21 [PATCH V2] ipvlan: fix ipvlan MTU limits liuqifa
@ 2018-01-11 2:09 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-11 11:25 ` Jiri Benc
0 siblings, 1 reply; 9+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-01-11 2:09 UTC (permalink / raw)
To: liuqifa
Cc: David Miller, dsahern, mschiffer, idosch, fw, kjlx,
girish.moodalbail, sainath.grandhi, linux-netdev, weiyongjun1,
chenweilong, maowenan, wangyufen, yuehaibing, liujian56, liuzhe24,
wangkefeng.wang, dingtianhong
On Tue, Jan 9, 2018 at 11:21 PM, <liuqifa@huawei.com> wrote:
> From: Keefe Liu <liuqifa@huawei.com>
>
> The MTU of ipvlan interface should not bigger than the phy device, When we
> run following scripts, we will find there are some problems.
> Step1:
> ip link add link eth0 name ipv1 type ipvlan mode l2
> ip netns add net1
> ip link set dev ipv1 netns net1
> Step2:
> ip netns exec net1 ip link set dev ipv1 mtu 1501
> RTNETLINK answers: Invalid argument
> dmesg info: "ipv1: Invalid MTU 1501 requested, hw max 1500"
> Step3:
> ip link set dev eth0 mtu 1600
> ip netns exec net1 ip link set dev ipv1 mtu 1501
> RTNETLINK answers: Invalid argument
> dmesg info: "ipv1: Invalid MTU 1501 requested, hw max 1500"
> Step4:
> ip link set dev eth0 mtu 1400
> ip netns exec net1 ip link set dev ipv1 mtu 1500
> The result of Step2 is we expected, but the result of Step3 and Step4
> are not.
>
> This patch set ipvlan's MTU ranges from ETH_MIN_MTU to phy device's maximum
> MTU. When we create a new ipvlan device, its MTU will be set to the same
> with phy device if we not specify the mtu value. When we change the ipvlan
> device's MTU, ipvlan_change_mtu() will make sure the new MTU no bigger than
> the phy device's MTU.
>
> When we change the phy device's MTU, ipvlan devices attached to this
> phy device will check their MTU value. If ipvlan's MTU bigger than phy's,
> ipvlan's MTU will be set to the same with phy device, otherwise ipvlan's
> MTU remains unchanged.
>
> Signed-off-by: Keefe Liu <liuqifa@huawei.com>
> ---
> drivers/net/ipvlan/ipvlan_main.c | 36 ++++++++++++++++++++++++++++--------
> 1 file changed, 28 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 30cb803..2ba071a 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -34,11 +34,6 @@ struct ipvlan_netns {
> .l3mdev_l3_rcv = ipvlan_l3_rcv,
> };
>
> -static void ipvlan_adjust_mtu(struct ipvl_dev *ipvlan, struct net_device *dev)
> -{
> - ipvlan->dev->mtu = dev->mtu;
> -}
> -
> static int ipvlan_register_nf_hook(struct net *net)
> {
> struct ipvlan_netns *vnet = net_generic(net, ipvlan_netid);
> @@ -380,12 +375,24 @@ static int ipvlan_get_iflink(const struct net_device *dev)
> return ipvlan->phy_dev->ifindex;
> }
>
> +static int ipvlan_change_mtu(struct net_device *dev, int new_mtu)
> +{
> + struct ipvl_dev *ipvlan = netdev_priv(dev);
> +
> + if (ipvlan->phy_dev->mtu < new_mtu)
> + return -EINVAL;
> +
> + dev->mtu = new_mtu;
> + return 0;
> +}
> +
> static const struct net_device_ops ipvlan_netdev_ops = {
> .ndo_init = ipvlan_init,
> .ndo_uninit = ipvlan_uninit,
> .ndo_open = ipvlan_open,
> .ndo_stop = ipvlan_stop,
> .ndo_start_xmit = ipvlan_start_xmit,
> + .ndo_change_mtu = ipvlan_change_mtu,
> .ndo_fix_features = ipvlan_fix_features,
> .ndo_change_rx_flags = ipvlan_change_rx_flags,
> .ndo_set_rx_mode = ipvlan_set_multicast_mac_filter,
> @@ -581,10 +588,18 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> }
> }
>
> + if (!tb[IFLA_MTU])
> + dev->mtu = phy_dev->mtu;
> + else if (dev->mtu > phy_dev->mtu)
> + return -EINVAL;
> +
> + /* MTU range: 68 - phy_dev->max_mtu */
> + dev->min_mtu = ETH_MIN_MTU;
> + dev->max_mtu = phy_dev->max_mtu;
> +
> ipvlan->phy_dev = phy_dev;
> ipvlan->dev = dev;
> ipvlan->sfeatures = IPVLAN_FEATURES;
> - ipvlan_adjust_mtu(ipvlan, phy_dev);
> INIT_LIST_HEAD(&ipvlan->addrs);
>
> /* TODO Probably put random address here to be presented to the
> @@ -680,6 +695,8 @@ void ipvlan_link_setup(struct net_device *dev)
> {
> ether_setup(dev);
>
> + dev->min_mtu = 0;
> + dev->max_mtu = ETH_MAX_MTU;
> dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_TX_SKB_SHARING);
> dev->priv_flags |= IFF_UNICAST_FLT | IFF_NO_QUEUE;
> dev->netdev_ops = &ipvlan_netdev_ops;
> @@ -775,8 +792,11 @@ static int ipvlan_device_event(struct notifier_block *unused,
> break;
>
> case NETDEV_CHANGEMTU:
> - list_for_each_entry(ipvlan, &port->ipvlans, pnode)
> - ipvlan_adjust_mtu(ipvlan, dev);
> + list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
> + if (ipvlan->dev->mtu <= dev->mtu)
> + continue;
> + dev_set_mtu(ipvlan->dev, dev->mtu);
> + }
Please look at the suggestion that I had given you in the earlier
thread. This approach will cause regression. Currently if the master
has MTU set to 1500, so all it's slave will get 1500 mtu. If master
changes from 1500 -> 1600, all slaves used to get the same. But in the
above patch if master changes from 1500 -> 1600, all slaves stay at
1500 and another operation need to be performed per slave to change
mtu. Your approach probably works when master is changing from 1600 ->
1500.
Also remember when master changes its mtu, you have to reset max_mtu
for slaves with the new value.
I still prefer the approach I had mentioned that uses 'mtu_adj'. In
that approach you can leave those slaves which have changed their mtu
to be lower than masters' but if master's mtu changes to larger value
all other slaves will get updated mtu leaving behind the slaves who
have opted to change their mtu on their own. Also the same thing is
true when mtu get reduced at master.
> break;
>
> case NETDEV_CHANGEADDR:
> --
> 1.8.3.1
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH V2] ipvlan: fix ipvlan MTU limits
2018-01-11 2:09 ` Mahesh Bandewar (महेश बंडेवार)
@ 2018-01-11 11:25 ` Jiri Benc
2018-01-11 16:59 ` Mahesh Bandewar (महेश बंडेवार)
0 siblings, 1 reply; 9+ messages in thread
From: Jiri Benc @ 2018-01-11 11:25 UTC (permalink / raw)
To: Mahesh Bandewar (महेश बंडेवार)
Cc: liuqifa, David Miller, dsahern, mschiffer, idosch, fw, kjlx,
girish.moodalbail, sainath.grandhi, linux-netdev, weiyongjun1,
chenweilong, maowenan, wangyufen, yuehaibing, liujian56, liuzhe24,
wangkefeng.wang, dingtianhong
On Wed, 10 Jan 2018 18:09:50 -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
> I still prefer the approach I had mentioned that uses 'mtu_adj'. In
> that approach you can leave those slaves which have changed their mtu
> to be lower than masters' but if master's mtu changes to larger value
> all other slaves will get updated mtu leaving behind the slaves who
> have opted to change their mtu on their own. Also the same thing is
> true when mtu get reduced at master.
The problem with this magic behavior is, well, that it's magic. There's
no way to tell what happens with a given slave when the master's MTU
gets changed just by looking at the current configuration. There's also
no way to switch the magic behavior back on once the slave's MTU is
changed.
At minimum, you'd need some kind of indication that the slave's MTU is
following the master. And a way to toggle this back.
Keefe's patch is much saner, the behavior is completely deterministic.
Jiri
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] ipvlan: fix ipvlan MTU limits
2018-01-11 11:25 ` Jiri Benc
@ 2018-01-11 16:59 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-11 18:35 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-12 8:34 ` Jiri Benc
0 siblings, 2 replies; 9+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-01-11 16:59 UTC (permalink / raw)
To: Jiri Benc
Cc: liuqifa, David Miller, dsahern, mschiffer, idosch, fw, kjlx,
girish.moodalbail, sainath.grandhi, linux-netdev, weiyongjun1,
chenweilong, maowenan, wangyufen, yuehaibing, liujian56, liuzhe24,
wangkefeng.wang, dingtianhong
On Thu, Jan 11, 2018 at 3:25 AM, Jiri Benc <jbenc@redhat.com> wrote:
> On Wed, 10 Jan 2018 18:09:50 -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
>> I still prefer the approach I had mentioned that uses 'mtu_adj'. In
>> that approach you can leave those slaves which have changed their mtu
>> to be lower than masters' but if master's mtu changes to larger value
>> all other slaves will get updated mtu leaving behind the slaves who
>> have opted to change their mtu on their own. Also the same thing is
>> true when mtu get reduced at master.
>
> The problem with this magic behavior is, well, that it's magic. There's
> no way to tell what happens with a given slave when the master's MTU
> gets changed just by looking at the current configuration. There's also
> no way to switch the magic behavior back on once the slave's MTU is
> changed.
>
I guess the logic would be as simple as - if mtu_adj for a slave is
set to 0, then it's
following master otherwise not. By setting different mtu for a slave, you will
set this mtu_adj a positive number which would mean it's not following master.
So it's subjected to clamping when masters' mtu is reducing but should stay
otherwise. Also when slave decides to follow master again, it can set the mtu
to be same as masters' (making mtu_adj == 0) and then it would start following
master again.
Whether it's magic or not, it's the current behavior and I know several use
cases depend on this behavior which would be broken otherwise. The
approach I proposed keeps that going for those who depend on that while
adds an ability to set mtu per slave for the use case mentioned in this
patch-set too.
> At minimum, you'd need some kind of indication that the slave's MTU is
> following the master. And a way to toggle this back.
>
> Keefe's patch is much saner, the behavior is completely deterministic.
>
> Jiri
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] ipvlan: fix ipvlan MTU limits
2018-01-11 16:59 ` Mahesh Bandewar (महेश बंडेवार)
@ 2018-01-11 18:35 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-12 8:34 ` Jiri Benc
1 sibling, 0 replies; 9+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-01-11 18:35 UTC (permalink / raw)
To: Jiri Benc
Cc: liuqifa, David Miller, dsahern, mschiffer, idosch, fw, kjlx,
girish.moodalbail, sainath.grandhi, linux-netdev, weiyongjun1,
chenweilong, maowenan, wangyufen, yuehaibing, liujian56, liuzhe24,
wangkefeng.wang, dingtianhong
On Thu, Jan 11, 2018 at 8:59 AM, Mahesh Bandewar (महेश बंडेवार)
<maheshb@google.com> wrote:
> On Thu, Jan 11, 2018 at 3:25 AM, Jiri Benc <jbenc@redhat.com> wrote:
>> On Wed, 10 Jan 2018 18:09:50 -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
>>> I still prefer the approach I had mentioned that uses 'mtu_adj'. In
>>> that approach you can leave those slaves which have changed their mtu
>>> to be lower than masters' but if master's mtu changes to larger value
>>> all other slaves will get updated mtu leaving behind the slaves who
>>> have opted to change their mtu on their own. Also the same thing is
>>> true when mtu get reduced at master.
>>
>> The problem with this magic behavior is, well, that it's magic. There's
>> no way to tell what happens with a given slave when the master's MTU
>> gets changed just by looking at the current configuration. There's also
>> no way to switch the magic behavior back on once the slave's MTU is
>> changed.
>>
> I guess the logic would be as simple as - if mtu_adj for a slave is
> set to 0, then it's
> following master otherwise not. By setting different mtu for a slave, you will
> set this mtu_adj a positive number which would mean it's not following master.
> So it's subjected to clamping when masters' mtu is reducing but should stay
> otherwise. Also when slave decides to follow master again, it can set the mtu
> to be same as masters' (making mtu_adj == 0) and then it would start following
> master again.
>
> Whether it's magic or not, it's the current behavior and I know several use
> cases depend on this behavior which would be broken otherwise. The
> approach I proposed keeps that going for those who depend on that while
> adds an ability to set mtu per slave for the use case mentioned in this
> patch-set too.
>
Actually we can just have boolean val per slave indicating it's changed locally
instead of maintaining the mtu diff which is useless at this moment.
>> At minimum, you'd need some kind of indication that the slave's MTU is
>> following the master. And a way to toggle this back.
>>
>> Keefe's patch is much saner, the behavior is completely deterministic.
>>
>> Jiri
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] ipvlan: fix ipvlan MTU limits
2018-01-11 16:59 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-11 18:35 ` Mahesh Bandewar (महेश बंडेवार)
@ 2018-01-12 8:34 ` Jiri Benc
2018-01-12 8:48 ` Jiri Benc
1 sibling, 1 reply; 9+ messages in thread
From: Jiri Benc @ 2018-01-12 8:34 UTC (permalink / raw)
To: Mahesh Bandewar (महेश बंडेवार)
Cc: liuqifa, David Miller, dsahern, mschiffer, idosch, fw, kjlx,
girish.moodalbail, sainath.grandhi, linux-netdev, weiyongjun1,
chenweilong, maowenan, wangyufen, yuehaibing, liujian56, liuzhe24,
wangkefeng.wang, dingtianhong
On Thu, 11 Jan 2018 08:59:58 -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
> I guess the logic would be as simple as - if mtu_adj for a slave is
> set to 0, then it's
> following master otherwise not. By setting different mtu for a slave, you will
> set this mtu_adj a positive number which would mean it's not following master.
> So it's subjected to clamping when masters' mtu is reducing but should stay
> otherwise. Also when slave decides to follow master again, it can set the mtu
> to be same as masters' (making mtu_adj == 0) and then it would start following
> master again.
How can the mtu_adj value be queried and set from user space?
> Whether it's magic or not, it's the current behavior and I know several use
> cases depend on this behavior which would be broken otherwise. The
> approach I proposed keeps that going for those who depend on that while
> adds an ability to set mtu per slave for the use case mentioned in this
> patch-set too.
I don't think this works currently. When someone (does not have to be
you, it can be a management software running in background) sets the
MTU to the current value, the magic behavior is lost without any way to
restore it (unless I'm missing a way to restore it, see my question
above). So any user that depends on the magic behavior is broken anyway
even now.
Jiri
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] ipvlan: fix ipvlan MTU limits
2018-01-12 8:34 ` Jiri Benc
@ 2018-01-12 8:48 ` Jiri Benc
2018-01-12 17:50 ` Mahesh Bandewar (महेश बंडेवार)
0 siblings, 1 reply; 9+ messages in thread
From: Jiri Benc @ 2018-01-12 8:48 UTC (permalink / raw)
To: Mahesh Bandewar (महेश बंडेवार)
Cc: liuqifa, David Miller, dsahern, mschiffer, idosch, fw, kjlx,
girish.moodalbail, sainath.grandhi, linux-netdev, weiyongjun1,
chenweilong, maowenan, wangyufen, yuehaibing, liujian56, liuzhe24,
wangkefeng.wang, dingtianhong
On Fri, 12 Jan 2018 09:34:13 +0100, Jiri Benc wrote:
> I don't think this works currently. When someone (does not have to be
> you, it can be a management software running in background) sets the
> MTU to the current value, the magic behavior is lost without any way to
> restore it (unless I'm missing a way to restore it, see my question
> above). So any user that depends on the magic behavior is broken anyway
> even now.
Upon further inspection, it seems that currently, slaves always follow
master's MTU without a way to change it. Tough situation. Even
implementing user space toggleable mtu_adj could break users in the way
I described. But it seems to be the lesser evil, at least there would
be a way to unbreak the scripts with one line addition.
But it's absolute must to have this visible to the user space and
changeable. Something like this:
# ip a
123: ipvlan0: <FLAGS> mtu 1500 (auto) qdisc ...
# ip l s ipvlan0 mtu 1400
# ip a
123: ipvlan0: <FLAGS> mtu 1400 qdisc ...
# ip l s ipvlan0 mtu auto
# ip a
123: ipvlan0: <FLAGS> mtu 1500 (auto) qdisc ...
Jiri
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] ipvlan: fix ipvlan MTU limits
2018-01-12 8:48 ` Jiri Benc
@ 2018-01-12 17:50 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-12 18:10 ` Jiri Benc
0 siblings, 1 reply; 9+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-01-12 17:50 UTC (permalink / raw)
To: Jiri Benc
Cc: liuqifa, David Miller, dsahern, mschiffer, idosch, fw, kjlx,
girish.moodalbail, sainath.grandhi, linux-netdev, weiyongjun1,
chenweilong, maowenan, wangyufen, yuehaibing, liujian56, liuzhe24,
wangkefeng.wang, dingtianhong
On Fri, Jan 12, 2018 at 12:48 AM, Jiri Benc <jbenc@redhat.com> wrote:
> On Fri, 12 Jan 2018 09:34:13 +0100, Jiri Benc wrote:
>> I don't think this works currently. When someone (does not have to be
>> you, it can be a management software running in background) sets the
>> MTU to the current value, the magic behavior is lost without any way to
>> restore it (unless I'm missing a way to restore it, see my question
>> above). So any user that depends on the magic behavior is broken anyway
>> even now.
>
> Upon further inspection, it seems that currently, slaves always follow
> master's MTU without a way to change it. Tough situation. Even
> implementing user space toggleable mtu_adj could break users in the way
> I described. But it seems to be the lesser evil, at least there would
> be a way to unbreak the scripts with one line addition.
>
> But it's absolute must to have this visible to the user space and
> changeable. Something like this:
>
> # ip a
> 123: ipvlan0: <FLAGS> mtu 1500 (auto) qdisc ...
> # ip l s ipvlan0 mtu 1400
> # ip a
> 123: ipvlan0: <FLAGS> mtu 1400 qdisc ...
> # ip l s ipvlan0 mtu auto
> # ip a
> 123: ipvlan0: <FLAGS> mtu 1500 (auto) qdisc ...
>
(Looks like you missed the last update I mentioned)
Here is the approach in details -
(a) At slave creation time - it's exactly how it's done currently
where slave copies masters' mtu. at the same time max_mtu of the slave
is set as the current mtu of the master.
(b) If slave updates mtu - ipvlan_change_mtu() will be called and the
slave's mtu will get set and it will set a flag indicating that slave
has changed it's mtu (dissociation from master if the mtu is different
from masters'). If slave mtu is set same as masters' then this flag
will get reset-ed indicating it wants to follow master (current
behavior).
(c) When master updates mtu - ipvlan_adj_mtu() gets called where all
slaves' max_mtu changes to the master's mtu value (clamping applies
for the all slaves which are not following master). All the slaves
which are following master (flag per slave) will get this new mtu.
Another consequence of this is that slaves' flag might get reset-ed if
the master's mtu is reduced to the value that was set earlier for the
slave (and it will start following slave).
The above should work? The user-space can query the mtu of the slave
device just like any other device. I was thinking about 'mtu_adj' with
some additional future extention but for now; we can live with a flag
on the slave device(s).
thanks,
--mahesh..
> Jiri
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH V2] ipvlan: fix ipvlan MTU limits
2018-01-12 17:50 ` Mahesh Bandewar (महेश बंडेवार)
@ 2018-01-12 18:10 ` Jiri Benc
0 siblings, 0 replies; 9+ messages in thread
From: Jiri Benc @ 2018-01-12 18:10 UTC (permalink / raw)
To: Mahesh Bandewar (महेश बंडेवार)
Cc: liuqifa, David Miller, dsahern, mschiffer, idosch, fw, kjlx,
girish.moodalbail, sainath.grandhi, linux-netdev, weiyongjun1,
chenweilong, maowenan, wangyufen, yuehaibing, liujian56, liuzhe24,
wangkefeng.wang, dingtianhong
On Fri, 12 Jan 2018 09:50:35 -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
> (Looks like you missed the last update I mentioned)
I did not miss it. The proposed behavior is inconsistent and has no
clear pattern (I used the word "magic" for that). I guess examples will
help more. See below.
> Here is the approach in details -
> (a) At slave creation time - it's exactly how it's done currently
> where slave copies masters' mtu. at the same time max_mtu of the slave
> is set as the current mtu of the master.
> (b) If slave updates mtu - ipvlan_change_mtu() will be called and the
> slave's mtu will get set and it will set a flag indicating that slave
> has changed it's mtu (dissociation from master if the mtu is different
> from masters'). If slave mtu is set same as masters' then this flag
> will get reset-ed indicating it wants to follow master (current
> behavior).
Consider these two cases:
# ip l a link eth0 type ipvlan
# ip l s ipvlan0 mtu 1400
# ip l s eth0 mtu 1400
# ip l s eth0 mtu 1500
Now MTU of ipvlan0 is 1400.
# ip l a link eth0 type ipvlan
# ip l s eth0 mtu 1400
# ip l s ipvlan0 mtu 1400
# ip l s eth0 mtu 1500
Now MTU of ipvlan0 is 1500.
See why I call that behavior "magic"? Before the last step, it's
impossible to tell from the user space what will happen. And no, don't
propose to artificially cover the first example by forcing the auto
follow flag, it doesn't help. It just moves the magic behavior to
different scenarios.
> (c) When master updates mtu - ipvlan_adj_mtu() gets called where all
> slaves' max_mtu changes to the master's mtu value (clamping applies
> for the all slaves which are not following master). All the slaves
> which are following master (flag per slave) will get this new mtu.
> Another consequence of this is that slaves' flag might get reset-ed if
> the master's mtu is reduced to the value that was set earlier for the
> slave (and it will start following slave).
Okay, you're actually proposing that. So, another example:
# ip l a link eth0 type ipvlan
# ip l s ipvlan0 mtu 1400
# ip l s eth0 mtu 1400
# ip l s eth0 mtu 1500
Now MTU of ipvlan0 is 1500.
# ip l a link eth0 type ipvlan
# ip l s ipvlan0 mtu 1400
# ip l s eth0 mtu 1600
# ip l s eth0 mtu 1500
Now MTU of ipvlan0 is 1400. There's still no consistency here.
> The above should work? The user-space can query the mtu of the slave
> device just like any other device. I was thinking about 'mtu_adj' with
> some additional future extention but for now; we can live with a flag
> on the slave device(s).
It absolutely does not. Never introduce magic behavior, it just forces
you to add more layers of hacks later.
Really, the only way is to introduce user visible and user changeable
flag. Yes, it's more work. But that's something you'll have to swallow.
Introducing hacky behavior is not the way to go. We try hard to
preserve user space compatibility which is pretty much impossible if
you introduce magic hacky behavior. Do it right from the start.
Jiri
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-01-12 18:10 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-10 7:21 [PATCH V2] ipvlan: fix ipvlan MTU limits liuqifa
2018-01-11 2:09 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-11 11:25 ` Jiri Benc
2018-01-11 16:59 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-11 18:35 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-12 8:34 ` Jiri Benc
2018-01-12 8:48 ` Jiri Benc
2018-01-12 17:50 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-12 18:10 ` Jiri Benc
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).