Netdev List
 help / color / mirror / Atom feed
* Re: RCU callback crashes
From: Cong Wang @ 2017-12-21  0:03 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, netdev@vger.kernel.org
In-Reply-To: <CAM_iQpVPUifm3rcXu8SP9ShSmm7z9z+8UjppdY_AxMYQwHE9YQ@mail.gmail.com>

On Wed, Dec 20, 2017 at 10:31 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Dec 20, 2017 at 10:17 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>> I guess it is q->miniqp which is freed in qdisc_graft() without properly
>> waiting for rcu readers?
>
> It is probably so, the call_rcu_bh(&miniq_old->rcu, mini_qdisc_rcu_func)
> in the end of mini_qdisc_pair_swap() is invoked on miniq_old->rcu,
> but miniq is being freed, no rcu barrier waits for it...
>
> You can try to add a rcu_barrier_bh() at the end to see if this crash
> is gone, but I don't think people like adding yet another rcu barrier...

Hi, Jakub

Can you test the following fix? I am not a fan of rcu barrier but we
already have one so...

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 876fab2604b8..1b68fedea124 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1240,6 +1240,8 @@ void mini_qdisc_pair_swap(struct mini_Qdisc_pair *miniqp,

        if (!tp_head) {
                RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
+               /* Wait for existing flying RCU callback before being freed. */
+               rcu_barrier_bh();
                return;
        }

^ permalink raw reply related

* Re: linux-next: build failure after merge of the net-next tree
From: Jakub Kicinski @ 2017-12-20 23:59 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: David Miller, Networking, Linux-Next Mailing List,
	Linux Kernel Mailing List, Daniel Borkmann
In-Reply-To: <20171221104304.38f94a00@canb.auug.org.au>

On Thu, 21 Dec 2017 10:43:04 +1100, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the net-next tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
> 
> drivers/net/netdevsim/bpf.c: In function 'nsim_bpf_setup_tc_block_cb':
> drivers/net/netdevsim/bpf.c:127:7: error: 'TC_CLSBPF_REPLACE' undeclared (first use in this function)
>   case TC_CLSBPF_REPLACE:
>        ^
> drivers/net/netdevsim/bpf.c:129:7: error: 'TC_CLSBPF_ADD' undeclared (first use in this function)
>   case TC_CLSBPF_ADD:
>        ^
> drivers/net/netdevsim/bpf.c:131:7: error: 'TC_CLSBPF_DESTROY' undeclared (first use in this function)
>   case TC_CLSBPF_DESTROY:
>        ^
> 
> Caused by commit
> 
>   31d3ad832948 ("netdevsim: add bpf offload support")
> 
> interacting with commit
> 
>   102740bd9436 ("cls_bpf: fix offload assumptions after callback conversion")
> 
> from the net tree.
> 
> I applied the following merge fix patch:
> 
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Thu, 21 Dec 2017 10:18:46 +1100
> Subject: [PATCH] netdevsim: fix up for "cls_bpf: fix offload assumptions after
>  callback conversion"
> 
> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Hi Stephen, sorry about those merges today.  The proper fix is queued
up in patchwork for when net-next is merged in to net:

http://patchwork.ozlabs.org/patch/851063/

I will CC you on patches which may cause/fix merge trouble in the
future, often the information about how to resolve the conflict is
not part of the commit message.

Sorry I didn't think about CCing you earlier!

^ permalink raw reply

* linux-next: build failure after merge of the net-next tree
From: Stephen Rothwell @ 2017-12-20 23:43 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
	Jakub Kicinski, Daniel Borkmann

Hi all,

After merging the net-next tree, today's linux-next build (x86_64
allmodconfig) failed like this:

drivers/net/netdevsim/bpf.c: In function 'nsim_bpf_setup_tc_block_cb':
drivers/net/netdevsim/bpf.c:127:7: error: 'TC_CLSBPF_REPLACE' undeclared (first use in this function)
  case TC_CLSBPF_REPLACE:
       ^
drivers/net/netdevsim/bpf.c:129:7: error: 'TC_CLSBPF_ADD' undeclared (first use in this function)
  case TC_CLSBPF_ADD:
       ^
drivers/net/netdevsim/bpf.c:131:7: error: 'TC_CLSBPF_DESTROY' undeclared (first use in this function)
  case TC_CLSBPF_DESTROY:
       ^

Caused by commit

  31d3ad832948 ("netdevsim: add bpf offload support")

interacting with commit

  102740bd9436 ("cls_bpf: fix offload assumptions after callback conversion")

from the net tree.

I applied the following merge fix patch:

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Thu, 21 Dec 2017 10:18:46 +1100
Subject: [PATCH] netdevsim: fix up for "cls_bpf: fix offload assumptions after
 callback conversion"

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 drivers/net/netdevsim/bpf.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netdevsim/bpf.c b/drivers/net/netdevsim/bpf.c
index 7da814686ad9..afaf980bbbe7 100644
--- a/drivers/net/netdevsim/bpf.c
+++ b/drivers/net/netdevsim/bpf.c
@@ -123,16 +123,10 @@ int nsim_bpf_setup_tc_block_cb(enum tc_setup_type type,
 	if (prog && !prog->aux->offload && !ns->bpf_tc_non_bound_accept)
 		return -EOPNOTSUPP;
 
-	switch (cls_bpf->command) {
-	case TC_CLSBPF_REPLACE:
-		return nsim_bpf_offload(ns, prog, true);
-	case TC_CLSBPF_ADD:
-		return nsim_bpf_offload(ns, prog, false);
-	case TC_CLSBPF_DESTROY:
-		return nsim_bpf_offload(ns, NULL, true);
-	default:
+	if (cls_bpf->command != TC_CLSBPF_OFFLOAD)
 		return -EOPNOTSUPP;
-	}
+
+	return nsim_bpf_offload(ns, prog, cls_bpf->oldprog);
 }
 
 int nsim_bpf_disable_tc(struct netdevsim *ns)
-- 
2.15.0

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply related

* Re: [PATCH] net: Revert "net_sched: no need to free qdisc in RCU callback"
From: John Fastabend @ 2017-12-20 23:40 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: xiyou.wangcong, jiri, davem, netdev, eric.dumazet
In-Reply-To: <20171220135959.3ff075ac@cakuba.netronome.com>

On 12/20/2017 01:59 PM, Jakub Kicinski wrote:
> On Wed, 20 Dec 2017 12:09:19 -0800, John Fastabend wrote:
>> RCU grace period is needed for lockless qdiscs added in the commit
>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>>
>> It is needed now that qdiscs may be lockless otherwise we risk
>> free'ing a qdisc that is still in use from datapath. Additionally,
>> push list cleanup into RCU callback. Otherwise we risk the datapath
>> adding skbs during removal.
>>
>> Fixes: c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array")
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> 
> Seems like this revert may be too heavy handed:
> 
> # ./tools/testing/selftests/bpf/test_offload.py --log /tmp/log
> Test destruction of generic XDP...
> Test TC non-offloaded...
> Test TC non-offloaded isn't getting bound...
> Test TC offloads are off by default...
> Test TC offload by default...
> Test TC cBPF bytcode tries offload by default...
> Test TC cBPF unbound bytecode doesn't offload...
> Test TC offloads work...
> FAIL: TC filter did not load with TC offloads enabled
> 
> And it's triggering:
> 
> WARNING: CPU: 15 PID: 1853 at ../drivers/net/netdevsim/bpf.c:372 nsim_bpf_uninit+0x2e/0x41 [netdevsim]
> 
> Which is:
> 
>    368	void nsim_bpf_uninit(struct netdevsim *ns)
>    369	{
>    370		WARN_ON(!list_empty(&ns->bpf_bound_progs));
>    371		WARN_ON(ns->xdp_prog);
>>> 372		WARN_ON(ns->bpf_offloaded);
>    373	}
> 
> (Meaning the offload was not stopped by the stack before ndo_uninit.)
> 

Dang. So offload code depends on destroy being called on a qdisc
to in turn destroy the filters and unbind any offloads.

I was hoping I could get away with tearing down live qdiscs without
too much work. Looks like not.

Note the fixes tag was bogus nothing is actually broken in current
code until a lockless qdisc with classes shows up.

.John

^ permalink raw reply

* Re: [net] Revert "net: core: maybe return -EEXIST in __dev_alloc_name"
From: Rasmus Villemoes @ 2017-12-20 23:37 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Johannes Berg, netdev@vger.kernel.org, Jouni Malinen,
	Johannes Berg, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <87vah29a1m.fsf@concordia.ellerman.id.au>

On Tue, Dec 19 2017, Michael Ellerman <michael@concordia.ellerman.id.au> wrote:

> Hi Johannes,
>
>> From: Johannes Berg <johannes.berg@intel.com>
>> 
>> This reverts commit d6f295e9def0; some userspace (in the case
>
> This revert seems to have broken networking on one of my powerpc
> machines, according to git bisect.
>
> The symptom is DHCP fails and I don't get a link, I didn't dig any
> further than that. I can if it's helpful.
>
> I think the problem is that 87c320e51519 ("net: core: dev_get_valid_name
> is now the same as dev_alloc_name_ns") only makes sense while
> d6f295e9def0 remains in the tree.

I'm sorry about all of this, I really didn't think there would be such
consequences of changing an errno return. Indeed, d6f29 was preparation
for unifying the two functions that do the exact same thing (and how we
ever got into that situation is somewhat unclear), except for
their behaviour in the case the requested name already exists. So one of
the two interfaces had to change its return value, and as I wrote, I
thought EEXIST was the saner choice when an explicit name (no %d) had
been requested.

> ie. before the entire series, dev_get_valid_name() would return EEXIST,
> and that was retained when 87c320e51519 was merged, but now that
> d6f295e9def0 has been reverted dev_get_valid_name() is returning ENFILE.
>
> I can get the network up again if I also revert 87c320e51519 ("net:
> core: dev_get_valid_name is now the same as dev_alloc_name_ns"), or with
> the gross patch below.

I don't think changing -ENFILE to -EEXIST would be right either, since
dev_get_valid_name() used to be able to return both (-EEXIST in the case
where there's no %d, -ENFILE in the case where we end up calling
dev_alloc_name_ns()). If anything, we could do the check for the old
-EEXIST condition first, and then call dev_alloc_name_ns(). But I'm also
fine with reverting.

Again, sorry :(

Rasmus

^ permalink raw reply

* Re: [PATCH] net: Revert "net_sched: no need to free qdisc in RCU callback"
From: John Fastabend @ 2017-12-20 23:34 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jiri Pirko, David Miller, Jakub Kicinski,
	Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <CAM_iQpVQF5MosmXqhfxtnH9kh96MEHd0-kO8SO3TbCpywOzv+g@mail.gmail.com>

On 12/20/2017 03:23 PM, Cong Wang wrote:
> On Wed, Dec 20, 2017 at 3:05 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> On 12/20/2017 02:41 PM, Cong Wang wrote:
>>> On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend
>>> <john.fastabend@gmail.com> wrote:
>>>> RCU grace period is needed for lockless qdiscs added in the commit
>>>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>>>>
>>>> It is needed now that qdiscs may be lockless otherwise we risk
>>>> free'ing a qdisc that is still in use from datapath. Additionally,
>>>> push list cleanup into RCU callback. Otherwise we risk the datapath
>>>> adding skbs during removal.
>>>
>>> What about qdisc_graft() -> dev_deactivate() -> synchronize_net() ?
>>> It doesn't work with your "lockless" patches?
>>>
>>
>> Well this is only in the 'parent == NULL' case otherwise we call
>> cops->graft(). Most sch_* seem to use qdisc_replace and this uses
>> sch_tree_lock().
>>
>> The only converted qdisc mq and mqprio at this point don't care
>> though and do their own dev_deactivate/activate. So its not fixing
>> anything in the above mentioned commit.
> 
> Sure, removing a class does not impact the whole device,
> but removing the root qdisc does.
> 
> After your "lockless", skb_array_consume_bh() is called in
> pfifo_fast_reset() and ptr_ring_cleanup() is called in
> pfifo_fast_destroy(), assuming skb_array is not buggy, what race
> do we have here with datapath?
> 

None at the moment.

> 
>>
>> I still think it will need to be done eventually. If it resolves
>> the miniq case it seems like a good idea. Although per Jakub's comment
>> perhaps I pulled too much into the RCU handler.
> 
> The case Jakub reported is a RCU callback missing a rcu
> barrier. I don't understand why you keep believing it is RCU
> readers on datapath.> 
> Not even to mention ingress is not affected by your "lockless"
> thing.
> 

I was thinking about the case where we want a lockless qdisc
with classes. Doing the qdisc destroy after a grace period would
solve this. Also we could start to cleanup a lot of the locking
and extra bits around 'running' qdisc and such by doing a clean
xchg on the qdisc layer. It seems that a dev_activate/deactivate
just to install a new qdisc is not needed.

Anyways future work. However if it resolves the miniq issue, as
Jiri indicated, seems like a clean fix. Although Jakub's issue
with the patch would need to be addressed. Seems he gets a WARN_ON
if the offload is not disabled but the device is unitialized.

.John

^ permalink raw reply

* Re: [PATCH] net: Revert "net_sched: no need to free qdisc in RCU callback"
From: Cong Wang @ 2017-12-20 23:23 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jiri Pirko, David Miller, Jakub Kicinski,
	Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <d6892431-21bc-254f-cc29-acf28d00af20@gmail.com>

On Wed, Dec 20, 2017 at 3:05 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 12/20/2017 02:41 PM, Cong Wang wrote:
>> On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend
>> <john.fastabend@gmail.com> wrote:
>>> RCU grace period is needed for lockless qdiscs added in the commit
>>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>>>
>>> It is needed now that qdiscs may be lockless otherwise we risk
>>> free'ing a qdisc that is still in use from datapath. Additionally,
>>> push list cleanup into RCU callback. Otherwise we risk the datapath
>>> adding skbs during removal.
>>
>> What about qdisc_graft() -> dev_deactivate() -> synchronize_net() ?
>> It doesn't work with your "lockless" patches?
>>
>
> Well this is only in the 'parent == NULL' case otherwise we call
> cops->graft(). Most sch_* seem to use qdisc_replace and this uses
> sch_tree_lock().
>
> The only converted qdisc mq and mqprio at this point don't care
> though and do their own dev_deactivate/activate. So its not fixing
> anything in the above mentioned commit.

Sure, removing a class does not impact the whole device,
but removing the root qdisc does.

After your "lockless", skb_array_consume_bh() is called in
pfifo_fast_reset() and ptr_ring_cleanup() is called in
pfifo_fast_destroy(), assuming skb_array is not buggy, what race
do we have here with datapath?


>
> I still think it will need to be done eventually. If it resolves
> the miniq case it seems like a good idea. Although per Jakub's comment
> perhaps I pulled too much into the RCU handler.

The case Jakub reported is a RCU callback missing a rcu
barrier. I don't understand why you keep believing it is RCU
readers on datapath.

Not even to mention ingress is not affected by your "lockless"
thing.

^ permalink raw reply

* [PATCH net-next] phylink: avoid attaching more than one PHY
From: Russell King @ 2017-12-20 23:23 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli; +Cc: netdev

Attaching more than one PHY to phylink is bad news, as we store a
pointer to the PHY in a single location. Error out if more than one
PHY is attempted to be attached.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/phy/phylink.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index db5d5726ced9..82166a26f5c6 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -726,6 +726,9 @@ int phylink_connect_phy(struct phylink *pl, struct phy_device *phy)
 		     phy_interface_mode_is_8023z(pl->link_interface))))
 		return -EINVAL;
 
+	if (pl->phydev)
+		return -EBUSY;
+
 	/* Use PHY device/driver interface */
 	if (pl->link_interface == PHY_INTERFACE_MODE_NA) {
 		pl->link_interface = phy->interface;
-- 
2.7.4

^ permalink raw reply related

* [PATCH net] phylink: ensure AN is enabled
From: Russell King @ 2017-12-20 23:21 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli; +Cc: netdev

Ensure that we mark AN as enabled at boot time, rather than leaving
it disabled.  This is noticable if your SFP module is fiber, and you
it supports faster speeds than 1G with 2.5G support in place.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/phy/phylink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index e30339fca5cf..db5d5726ced9 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -567,6 +567,7 @@ struct phylink *phylink_create(struct net_device *ndev,
 	pl->link_config.pause = MLO_PAUSE_AN;
 	pl->link_config.speed = SPEED_UNKNOWN;
 	pl->link_config.duplex = DUPLEX_UNKNOWN;
+	pl->link_config.an_enabled = true;
 	pl->ops = ops;
 	__set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH net] phylink: ensure the PHY interface mode is appropriately set
From: Russell King @ 2017-12-20 23:21 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli; +Cc: netdev

When setting the ethtool settings, ensure that the validated PHY
interface mode is propagated to the current link settings, so that
2500BaseX can be selected.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
 drivers/net/phy/phylink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index f7a777475762..e30339fca5cf 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1136,6 +1136,7 @@ int phylink_ethtool_ksettings_set(struct phylink *pl,
 	mutex_lock(&pl->state_mutex);
 	/* Configure the MAC to match the new settings */
 	linkmode_copy(pl->link_config.advertising, our_kset.link_modes.advertising);
+	pl->link_config.interface = config.interface;
 	pl->link_config.speed = our_kset.base.speed;
 	pl->link_config.duplex = our_kset.base.duplex;
 	pl->link_config.an_enabled = our_kset.base.autoneg != AUTONEG_DISABLE;
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v2] net: ibm: emac: support RGMII-[RX|TX]ID phymode
From: Benjamin Herrenschmidt @ 2017-12-20 22:04 UTC (permalink / raw)
  To: Christian Lamparter, David Miller; +Cc: netdev, andrew, christophe.jaillet
In-Reply-To: <5154461.jTBeEfsTQW@debian64>

On Wed, 2017-12-20 at 22:07 +0100, Christian Lamparter wrote:
> 
> > will read this and say "Oh the function tests against these weird
> > PHY_MODE_* aliases, but the helper function phy_interface_mode_is_rgmii()
> > tests against PHY_INTERFACE_MODE_*, what is going on?"
> > 
> > I hate to do this to you, but please get rid of these confusing and
> > completely unnecessary PHY_MODE_* CPP aliases first, and just use the
> > proper PHY_INTERFACE_MODE_* values consistently.
> 
> Yeah, I can do that. no problem. 
> 
> Question is, should I also replace the rgmii_mode_name() with phy_modes() too?
> 
> The only user of rgmii_mode_name() is this notice printk in rgmii_attach():
> <http://elixir.free-electrons.com/linux/latest/source/drivers/net/ethernet/ibm/emac/rgmii.c#L117>

Yup, this is all pre-hisorical gunk, feel free to replace it.

Cheers,
Ben.

^ permalink raw reply

* Re: [PATCH] net: Revert "net_sched: no need to free qdisc in RCU callback"
From: John Fastabend @ 2017-12-20 23:05 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jiri Pirko, David Miller, Jakub Kicinski,
	Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <CAM_iQpVuLzmiEkQGJwZcrZQsAziTAbzyaszbVjLe_YeQxvYpBg@mail.gmail.com>

On 12/20/2017 02:41 PM, Cong Wang wrote:
> On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> RCU grace period is needed for lockless qdiscs added in the commit
>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>>
>> It is needed now that qdiscs may be lockless otherwise we risk
>> free'ing a qdisc that is still in use from datapath. Additionally,
>> push list cleanup into RCU callback. Otherwise we risk the datapath
>> adding skbs during removal.
> 
> What about qdisc_graft() -> dev_deactivate() -> synchronize_net() ?
> It doesn't work with your "lockless" patches?
> 

Well this is only in the 'parent == NULL' case otherwise we call
cops->graft(). Most sch_* seem to use qdisc_replace and this uses
sch_tree_lock().

The only converted qdisc mq and mqprio at this point don't care
though and do their own dev_deactivate/activate. So its not fixing
anything in the above mentioned commit. 

I still think it will need to be done eventually. If it resolves
the miniq case it seems like a good idea. Although per Jakub's comment
perhaps I pulled too much into the RCU handler.

^ permalink raw reply

* linux-next: manual merge of the net-next tree with the net tree
From: Stephen Rothwell @ 2017-12-20 22:59 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
	Jakub Kicinski

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/netronome/nfp/bpf/main.c

between commit:

  d3f89b98e391 ("nfp: bpf: keep track of the offloaded program")

from the net tree and commit:

  bd0b2e7fe611 ("net: xdp: make the stack take care of the tear down")

from the net-next tree.

I fixed it up (the latter seems to be a fix for the same problem as the
former, so I just reverted the former by hand) and can carry the fix as
necessary. This is now fixed as far as linux-next is concerned, but any
non trivial conflicts should be mentioned to your upstream maintainer
when your tree is submitted for merging.  You may also want to consider
cooperating with the maintainer of the conflicting tree to minimise any
particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

^ permalink raw reply

* [PATCH v5 iproute2 net-next] erspan: add erspan version II support
From: William Tu @ 2017-12-20 22:49 UTC (permalink / raw)
  To: netdev

The patch adds support for configuring the erspan v2, for both
ipv4 and ipv6 erspan implementation.  Three additional fields
are added: 'erspan_ver' for distinguishing v1 or v2, 'erspan_dir'
for specifying direction of the mirrored traffic, and 'erspan_hwid'
for users to set ERSPAN engine ID within a system.

As for manpage, the ERSPAN descriptions used to be under GRE, IPIP,
SIT Type paragraph.  Since IP6GRE/IP6GRETAP also supports ERSPAN,
the patch removes the old one, creates a separate ERSPAN paragrah,
and adds an example.

Signed-off-by: William Tu <u9012063@gmail.com>
---
change in v5:
  - update the "Usage: " in c file, so
    # ip link help erspan
    shows the new options
change in v4:                                                                   
  - use matches instead of strcmp on ingress/egress                             
change in v3:                                                                   
  - change erspan_dir 0/1 to "in[gress]/e[gress]"                               
  - update manpage                                                              
change in v2:                                                                   
  - fix typo ETH_P_ERSPAN2                                                      
  - fix space and indent 
---
 include/uapi/linux/if_ether.h  |  1 +
 include/uapi/linux/if_tunnel.h |  3 ++
 ip/link_gre.c                  | 70 ++++++++++++++++++++++++++++++--
 ip/link_gre6.c                 | 71 ++++++++++++++++++++++++++++++--
 man/man8/ip-link.8.in          | 92 ++++++++++++++++++++++++++++++++++++------
 5 files changed, 218 insertions(+), 19 deletions(-)

diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index 2eb529a90250..133567bf2e04 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -47,6 +47,7 @@
 #define ETH_P_PUP	0x0200		/* Xerox PUP packet		*/
 #define ETH_P_PUPAT	0x0201		/* Xerox PUP Addr Trans packet	*/
 #define ETH_P_TSN	0x22F0		/* TSN (IEEE 1722) packet	*/
+#define ETH_P_ERSPAN2	0x22EB		/* ERSPAN version 2 (type III)	*/
 #define ETH_P_IP	0x0800		/* Internet Protocol packet	*/
 #define ETH_P_X25	0x0805		/* CCITT X.25			*/
 #define ETH_P_ARP	0x0806		/* Address Resolution packet	*/
diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 38cdf90692f8..ecdc76669cfd 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -137,6 +137,9 @@ enum {
 	IFLA_GRE_IGNORE_DF,
 	IFLA_GRE_FWMARK,
 	IFLA_GRE_ERSPAN_INDEX,
+	IFLA_GRE_ERSPAN_VER,
+	IFLA_GRE_ERSPAN_DIR,
+	IFLA_GRE_ERSPAN_HWID,
 	__IFLA_GRE_MAX,
 };
 
diff --git a/ip/link_gre.c b/ip/link_gre.c
index 43cb1af6196a..65ad8bad4b82 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -44,7 +44,11 @@ static void print_usage(FILE *f)
 		"                            [ [no]encap-csum6 ]\n"
 		"                            [ [no]encap-remcsum ]\n"
 		"                            [ fwmark MARK ]\n"
+		"                            [ erspan_ver version ]\n"
 		"                            [ erspan IDX ]\n"
+		"                            [ erspan_dir { ingress | egress } ]\n"
+		"                            [ erspan_hwid hwid ]\n"
+		"                            [ external ]\n"
 		"\n"
 		"Where: ADDR := { IP_ADDRESS | any }\n"
 		"       TOS  := { NUMBER | inherit }\n"
@@ -98,6 +102,9 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 	__u8 ignore_df = 0;
 	__u32 fwmark = 0;
 	__u32 erspan_idx = 0;
+	__u8 erspan_ver = 0;
+	__u8 erspan_dir = 0;
+	__u16 erspan_hwid = 0;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 		if (rtnl_talk(&rth, &req.n, &answer) < 0) {
@@ -179,6 +186,15 @@ get_failed:
 		if (greinfo[IFLA_GRE_ERSPAN_INDEX])
 			erspan_idx = rta_getattr_u32(greinfo[IFLA_GRE_ERSPAN_INDEX]);
 
+		if (greinfo[IFLA_GRE_ERSPAN_VER])
+			erspan_ver = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_VER]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_DIR])
+			erspan_dir = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_DIR]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_HWID])
+			erspan_hwid = rta_getattr_u16(greinfo[IFLA_GRE_ERSPAN_HWID]);
+
 		free(answer);
 	}
 
@@ -343,6 +359,24 @@ get_failed:
 				invarg("invalid erspan index\n", *argv);
 			if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
 				invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
+		} else if (strcmp(*argv, "erspan_ver") == 0) {
+			NEXT_ARG();
+			if (get_u8(&erspan_ver, *argv, 0))
+				invarg("invalid erspan version\n", *argv);
+			if (erspan_ver != 1 && erspan_ver != 2)
+				invarg("erspan version must be 1 or 2\n", *argv);
+		} else if (strcmp(*argv, "erspan_dir") == 0) {
+			NEXT_ARG();
+			if (matches(*argv, "ingress") == 0)
+				erspan_dir = 0;
+			else if (matches(*argv, "egress") == 0)
+				erspan_dir = 1;
+			else
+				invarg("Invalid erspan direction.", *argv);
+		} else if (strcmp(*argv, "erspan_hwid") == 0) {
+			NEXT_ARG();
+			if (get_u16(&erspan_hwid, *argv, 0))
+				invarg("invalid erspan hwid\n", *argv);
 		} else
 			usage();
 		argc--; argv++;
@@ -374,8 +408,15 @@ get_failed:
 		addattr_l(n, 1024, IFLA_GRE_TTL, &ttl, 1);
 		addattr_l(n, 1024, IFLA_GRE_TOS, &tos, 1);
 		addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
-		if (erspan_idx != 0)
-			addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+		if (erspan_ver) {
+			addattr8(n, 1024, IFLA_GRE_ERSPAN_VER, erspan_ver);
+			if (erspan_ver == 1 && erspan_idx != 0) {
+				addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+			} else if (erspan_ver == 2) {
+				addattr8(n, 1024, IFLA_GRE_ERSPAN_DIR, erspan_dir);
+				addattr16(n, 1024, IFLA_GRE_ERSPAN_HWID, erspan_hwid);
+			}
+		}
 	} else {
 		addattr_l(n, 1024, IFLA_GRE_COLLECT_METADATA, NULL, 0);
 	}
@@ -514,7 +555,30 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 	if (tb[IFLA_GRE_ERSPAN_INDEX]) {
 		__u32 erspan_idx = rta_getattr_u32(tb[IFLA_GRE_ERSPAN_INDEX]);
 
-		fprintf(f, "erspan_index %u ", erspan_idx);
+		print_uint(PRINT_ANY, "erspan_index", "erspan_index %u ", erspan_idx);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_VER]) {
+		__u8 erspan_ver = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_VER]);
+
+		print_uint(PRINT_ANY, "erspan_ver", "erspan_ver %u ", erspan_ver);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_DIR]) {
+		__u8 erspan_dir = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_DIR]);
+
+		if (erspan_dir == 0)
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir ingress ", NULL);
+		else
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir egress ", NULL);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_HWID]) {
+		__u16 erspan_hwid = rta_getattr_u16(tb[IFLA_GRE_ERSPAN_HWID]);
+
+		print_hex(PRINT_ANY, "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
 	}
 
 	if (tb[IFLA_GRE_ENCAP_TYPE] &&
diff --git a/ip/link_gre6.c b/ip/link_gre6.c
index 2cb46ca116d0..cb621806261c 100644
--- a/ip/link_gre6.c
+++ b/ip/link_gre6.c
@@ -52,7 +52,11 @@ static void print_usage(FILE *f)
 		"                                  [ [no]encap-csum ]\n"
 		"                                  [ [no]encap-csum6 ]\n"
 		"                                  [ [no]encap-remcsum ]\n"
+		"                                  [ erspan_ver version ]\n"
 		"                                  [ erspan IDX ]\n"
+		"                                  [ erspan_dir { ingress | egress } ]\n"
+		"                                  [ erspan_hwid hwid ]\n"
+		"                                  [ external ]\n"
 		"\n"
 		"Where: ADDR      := IPV6_ADDRESS\n"
 		"       TTL       := { 0..255 } (default=%d)\n"
@@ -109,6 +113,9 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 	int len;
 	__u32 fwmark = 0;
 	__u32 erspan_idx = 0;
+	__u8 erspan_ver = 0;
+	__u8 erspan_dir = 0;
+	__u16 erspan_hwid = 0;
 
 	if (!(n->nlmsg_flags & NLM_F_CREATE)) {
 		if (rtnl_talk(&rth, &req.n, &answer) < 0) {
@@ -191,6 +198,15 @@ get_failed:
 		if (greinfo[IFLA_GRE_ERSPAN_INDEX])
 			erspan_idx = rta_getattr_u32(greinfo[IFLA_GRE_ERSPAN_INDEX]);
 
+		if (greinfo[IFLA_GRE_ERSPAN_VER])
+			erspan_ver = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_VER]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_DIR])
+			erspan_dir = rta_getattr_u8(greinfo[IFLA_GRE_ERSPAN_DIR]);
+
+		if (greinfo[IFLA_GRE_ERSPAN_HWID])
+			erspan_hwid = rta_getattr_u16(greinfo[IFLA_GRE_ERSPAN_HWID]);
+
 		free(answer);
 	}
 
@@ -389,6 +405,24 @@ get_failed:
 				invarg("invalid erspan index\n", *argv);
 			if (erspan_idx & ~((1<<20) - 1) || erspan_idx == 0)
 				invarg("erspan index must be > 0 and <= 20-bit\n", *argv);
+		} else if (strcmp(*argv, "erspan_ver") == 0) {
+			NEXT_ARG();
+			if (get_u8(&erspan_ver, *argv, 0))
+				invarg("invalid erspan version\n", *argv);
+			if (erspan_ver != 1 && erspan_ver != 2)
+				invarg("erspan version must be 1 or 2\n", *argv);
+		} else if (strcmp(*argv, "erspan_dir") == 0) {
+			NEXT_ARG();
+			if (matches(*argv, "ingress") == 0)
+				erspan_dir = 0;
+			else if (matches(*argv, "egress") == 0)
+				erspan_dir = 1;
+			else
+				invarg("Invalid erspan direction.", *argv);
+		} else if (strcmp(*argv, "erspan_hwid") == 0) {
+			NEXT_ARG();
+			if (get_u16(&erspan_hwid, *argv, 0))
+				invarg("invalid erspan hwid\n", *argv);
 		} else
 			usage();
 		argc--; argv++;
@@ -408,9 +442,15 @@ get_failed:
 		addattr_l(n, 1024, IFLA_GRE_FLOWINFO, &flowinfo, 4);
 		addattr32(n, 1024, IFLA_GRE_FLAGS, flags);
 		addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
-		if (erspan_idx != 0)
-			addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
-
+		if (erspan_ver) {
+			addattr8(n, 1024, IFLA_GRE_ERSPAN_VER, erspan_ver);
+			if (erspan_ver == 1 && erspan_idx != 0) {
+				addattr32(n, 1024, IFLA_GRE_ERSPAN_INDEX, erspan_idx);
+			} else {
+				addattr8(n, 1024, IFLA_GRE_ERSPAN_DIR, erspan_dir);
+				addattr16(n, 1024, IFLA_GRE_ERSPAN_HWID, erspan_hwid);
+			}
+		}
 		addattr16(n, 1024, IFLA_GRE_ENCAP_TYPE, encaptype);
 		addattr16(n, 1024, IFLA_GRE_ENCAP_FLAGS, encapflags);
 		addattr16(n, 1024, IFLA_GRE_ENCAP_SPORT, htons(encapsport));
@@ -587,7 +627,30 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 
 	if (tb[IFLA_GRE_ERSPAN_INDEX]) {
 		__u32 erspan_idx = rta_getattr_u32(tb[IFLA_GRE_ERSPAN_INDEX]);
-		fprintf(f, "erspan_index %u ", erspan_idx);
+		print_uint(PRINT_ANY, "erspan_index", "erspan_index %u ", erspan_idx);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_VER]) {
+		__u8 erspan_ver = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_VER]);
+
+		print_uint(PRINT_ANY, "erspan_ver", "erspan_ver %u ", erspan_ver);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_DIR]) {
+		__u8 erspan_dir = rta_getattr_u8(tb[IFLA_GRE_ERSPAN_DIR]);
+
+		if (erspan_dir == 0)
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir ingress ", NULL);
+		else
+			print_string(PRINT_ANY, "erspan_dir",
+				     "erspan_dir egress ", NULL);
+	}
+
+	if (tb[IFLA_GRE_ERSPAN_HWID]) {
+		__u16 erspan_hwid = rta_getattr_u16(tb[IFLA_GRE_ERSPAN_HWID]);
+
+		print_hex(PRINT_ANY, "erspan_hwid", "erspan_hwid 0x%x ", erspan_hwid);
 	}
 
 	if (tb[IFLA_GRE_ENCAP_TYPE] &&
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 9e9a5f0d2cef..0086b3dfa09d 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -665,13 +665,13 @@ keyword.
 .in -8
 
 .TP
-GRE, IPIP, SIT, ERSPAN Type Support
+GRE, IPIP, SIT Type Support
 For a link of types
-.I GRE/IPIP/SIT/ERSPAN
+.I GRE/IPIP/SIT
 the following additional arguments are supported:
 
 .BI "ip link add " DEVICE
-.BR type " { " gre " | " ipip " | " sit " | " erspan " }"
+.BR type " { " gre " | " ipip " | " sit " }"
 .BI " remote " ADDR " local " ADDR
 [
 .BR encap " { " fou " | " gue " | " none " }"
@@ -685,8 +685,6 @@ the following additional arguments are supported:
 .I " [no]encap-remcsum "
 ] [
 .I " mode " { ip6ip | ipip | mplsip | any } "
-] [
-.BR erspan " \fIIDX "
 ]
 
 .in +8
@@ -731,13 +729,6 @@ MPLS-Over-IPv4, "any" indicates IPv6, IPv4 or MPLS Over IPv4. Supported for
 SIT where the default is "ip6ip" and IPIP where the default is "ipip".
 IPv6-Over-IPv4 is not supported for IPIP.
 
-.sp
-.BR erspan " \fIIDX "
-- specifies the ERSPAN index field.
-.IR IDX
-indicates a 20 bit index/port number associated with the ERSPAN
-traffic's source port and direction.
-
 .in -8
 
 .TP
@@ -883,6 +874,76 @@ the following additional arguments are supported:
 - specifies the mode (datagram or connected) to use.
 
 .TP
+ERSPAN Type Support
+For a link of type
+.I ERSPAN/IP6ERSPAN
+the following additional arguments are supported:
+
+.BI "ip link add " DEVICE
+.BR type " { " erspan " | " ip6erspan " }"
+.BI remote " ADDR " local " ADDR " seq
+.RB key
+.I KEY
+.BR erspan_ver " \fIversion "
+[
+.BR erspan " \fIIDX "
+] [
+.BR erspan_dir " { " \fIingress " | " \fIegress " }"
+] [
+.BR erspan_hwid " \fIhwid "
+] [
+.RB external
+]
+
+.in +8
+.sp
+.BI  remote " ADDR "
+- specifies the remote address of the tunnel.
+
+.sp
+.BI  local " ADDR "
+- specifies the fixed local address for tunneled packets.
+It must be an address on another interface on this host.
+
+.sp
+.BR erspan_ver " \fIversion "
+- specifies the ERSPAN version number.
+.IR version
+indicates the ERSPAN version to be created: 1 for version 1 (type II)
+or 2 for version 2 (type III).
+
+.sp
+.BR erspan " \fIIDX "
+- specifies the ERSPAN v1 index field.
+.IR IDX
+indicates a 20 bit index/port number associated with the ERSPAN
+traffic's source port and direction.
+
+.sp
+.BR erspan_dir " { " \fIingress " | " \fIegress " }"
+- specifies the ERSPAN v2 mirrored traffic's direction.
+
+.sp
+.BR erspan_hwid " \fIhwid "
+- an unique identifier of an ERSPAN v2 engine within a system.
+.IR hwid
+is a 6-bit value for users to configure.
+
+.sp
+.BR external
+- make this tunnel externally controlled (or not, which is the default).
+In the kernel, this is referred to as collect metadata mode.  This flag is
+mutually exclusive with the
+.BR remote ,
+.BR local ,
+.BR erspan_ver ,
+.BR erspan ,
+.BR erspan_dir " and " erspan_hwid
+options.
+
+.in -8
+
+.TP
 GENEVE Type Support
 For a link of type
 .I GENEVE
@@ -2062,6 +2123,13 @@ ip link add link wpan0 lowpan0 type lowpan
 Creates a 6LoWPAN interface named lowpan0 on the underlying
 IEEE 802.15.4 device wpan0.
 .RE
+.PP
+ip link add dev ip6erspan11 type ip6erspan seq key 102
+local fc00:100::2 remote fc00:100::1
+erspan_ver 2 erspan_dir ingress erspan_hwid 17
+.RS 4
+Creates a IP6ERSPAN version 2 interface named ip6erspan00.
+.RE
 
 .SH SEE ALSO
 .br
-- 
2.7.4

^ permalink raw reply related

* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
From: Willem de Bruijn @ 2017-12-20 22:44 UTC (permalink / raw)
  To: Andreas Hartmann
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development
In-Reply-To: <f0f959dc-9c6f-c82b-b245-4aedf057e992@01019freenet.de>

On Wed, Dec 20, 2017 at 10:56 AM, Andreas Hartmann
<andihartmann@01019freenet.de> wrote:
> On 12/18/2017 at 06:11 PM Andreas Hartmann wrote:
>> On 12/17/2017 at 11:33 PM Willem de Bruijn wrote:
> [...]
>>> I have been able to reproduce the hang by sending a UFO packet
>>> between two guests running v4.13 on a host running v4.15-rc1.
>>>
>>> The vhost_net_ubuf_ref refcount indeed hits overflow (-1) from
>>> vhost_zerocopy_callback being called for each segment of a
>>> segmented UFO skb. This refcount is decremented then on each
>>> segment, but incremented only once for the entire UFO skb.
>>>
>>> Before v4.14, these packets would be converted in skb_segment to
>>> regular copy packets with skb_orphan_frags and the callback function
>>> called once at this point. v4.14 added support for reference counted
>>> zerocopy skb that can pass through skb_orphan_frags unmodified and
>>> have their zerocopy state safely cloned with skb_zerocopy_clone.
>>>
>>> The call to skb_zerocopy_clone must come after skb_orphan_frags
>>> to limit cloning of this state to those skbs that can do so safely.
>>>
>>> Please try a host with the following patch. This fixes it for me. I intend to
>>> send it to net.
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index a592ca025fc4..d2d985418819 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>                                               SKBTX_SHARED_FRAG;
>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>> -                       goto err;
>>>
>>>                 while (pos < offset + len) {
>>>                         if (i >= nfrags) {
>>> @@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>                                 goto err;
>>> +                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>> +                               goto err;
>>>
>>>                         *nskb_frag = *frag;
>>>                         __skb_frag_ref(nskb_frag);
>>>
>>>
>>> This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
>>> in the frags[] array. I will follow-up with a patch to net-next that only
>>> checks once per skb:
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index 466581cf4cdc..a293a33604ec 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                 skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
>>>                                               SKBTX_SHARED_FRAG;
>>> -               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
>>> +               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>> +                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
>>>                         goto err;
>>>
>>>                 while (pos < offset + len) {
>>> @@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>
>>>                                 BUG_ON(!nfrags);
>>>
>>> +                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
>>> +                                   skb_zerocopy_clone(nskb, frag_skb,
>>> +                                                      GFP_ATOMIC))
>>> +                                       goto err;
>>> +
>>>                                 list_skb = list_skb->next;
>>>                         }
>>>
>>> @@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>>>                                 goto err;
>>>                         }
>>>
>>> -                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>> -                               goto err;
>>> -
>>
>> I'm currently testing this one.
>>
>
> Test is in progress. I'm testing w/ 4.14.7, which already contains "net:
> accept UFO datagrams from tuntap and packet".
>
> At first, I tested an unpatched 4.14.7 - the problem (no more killable
> qemu-process) did occur promptly on shutdown of the machine. This was
> expected.
>
> Next, I applied the above patch (the second one). Until now, I didn't
> face any problem any more on shutdown of VMs. Looks promising.

Thanks for testing.

I sent the first, simpler, one to net together with another fix.

  http://patchwork.ozlabs.org/patch/851715/

^ permalink raw reply

* Re: [PATCH] net: Revert "net_sched: no need to free qdisc in RCU callback"
From: Cong Wang @ 2017-12-20 22:41 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jiri Pirko, David Miller, Jakub Kicinski,
	Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <20171220200919.6233.48192.stgit@john-Precision-Tower-5810>

On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> RCU grace period is needed for lockless qdiscs added in the commit
> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array").
>
> It is needed now that qdiscs may be lockless otherwise we risk
> free'ing a qdisc that is still in use from datapath. Additionally,
> push list cleanup into RCU callback. Otherwise we risk the datapath
> adding skbs during removal.

What about qdisc_graft() -> dev_deactivate() -> synchronize_net() ?
It doesn't work with your "lockless" patches?

^ permalink raw reply

* Re: RCU callback crashes
From: Cong Wang @ 2017-12-20 22:38 UTC (permalink / raw)
  To: John Fastabend; +Cc: Jakub Kicinski, Jiri Pirko, netdev@vger.kernel.org
In-Reply-To: <92220030-e9f1-1963-00b6-05f37abb82ee@gmail.com>

On Wed, Dec 20, 2017 at 12:23 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> I'm trying to see how removing that rcu grace period was safe in the
> first place. The datapath is using rcu_read critical section to protect
> the qdisc but the control path (a) doesn't use rcu grace period and (b)
> doesn't use the qidisc lock. Going to go get a coffee and I'll think
> about it a bit more. Any ideas Cong?

qdisc_graft() -> dev_deactivate() -> synchronize_net() is the reason
you want to find?

Also, you can try `git log` to see why it was introduced in the beginning,
here it is:

commit 5d944c640b4ae5f37c537acf491c2f0eb89fa0d6
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Wed Mar 31 07:06:04 2010 +0000

    gen_estimator: deadlock fix

^ permalink raw reply

* [PATCH net 2/2] skbuff: skb_copy_ubufs must release uarg even without user frags
From: Willem de Bruijn @ 2017-12-20 22:37 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn
In-Reply-To: <20171220223750.27795-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

skb_copy_ubufs creates a private copy of frags[] to release its hold
on user frags, then calls uarg->callback to notify the owner.

Call uarg->callback even when no frags exist. This edge case can
happen when zerocopy_sg_from_iter finds enough room in skb_headlen
to copy all the data.

Fixes: 3ece782693c4 ("sock: skb_copy_ubufs support for compound pages")
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/core/skbuff.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index edf40ac0cd07..a3cb0be4c6f3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1178,7 +1178,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 	u32 d_off;
 
 	if (!num_frags)
-		return 0;
+		goto release;
 
 	if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
 		return -EINVAL;
@@ -1238,6 +1238,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 	__skb_fill_page_desc(skb, new_frags - 1, head, 0, d_off);
 	skb_shinfo(skb)->nr_frags = new_frags;
 
+release:
 	skb_zcopy_clear(skb, false);
 	return 0;
 }
-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply related

* [PATCH net 1/2] skbuff: orphan frags before zerocopy clone
From: Willem de Bruijn @ 2017-12-20 22:37 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn
In-Reply-To: <20171220223750.27795-1-willemdebruijn.kernel@gmail.com>

From: Willem de Bruijn <willemb@google.com>

Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate
calls to skb_uarg(skb)->callback for the same data.

skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb
with each segment. This is only safe for uargs that do refcounting,
which is those that pass skb_orphan_frags without dropping their
shared frags. For others, skb_orphan_frags drops the user frags and
sets the uarg to NULL, after which sock_zerocopy_clone has no effect.

Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback
calls for the same data causing the vhost_net_ubuf_ref_>refcount to
drop below zero.

Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com>
Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY")
Reported-by: Andreas Hartmann <andihartmann@01019freenet.de>
Reported-by: David Hill <dhill@redhat.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>

---

This fix causes skb_zerocopy_clone to be called for each frag in the
array. I will follow-up with a patch to net-next that will call both
skb_orphan_frags and skb_zerocopy_clone once per skb only.
---
 net/core/skbuff.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a592ca025fc4..edf40ac0cd07 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 
 		skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
 					      SKBTX_SHARED_FRAG;
-		if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
-			goto err;
 
 		while (pos < offset + len) {
 			if (i >= nfrags) {
@@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 
 			if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
 				goto err;
+			if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+				goto err;
 
 			*nskb_frag = *frag;
 			__skb_frag_ref(nskb_frag);
-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply related

* [PATCH net 0/2] zerocopy fixes
From: Willem de Bruijn @ 2017-12-20 22:37 UTC (permalink / raw)
  To: netdev; +Cc: davem, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

The removal of UFO hardware offload support exposed a bug in
segmentation of zerocopy skbs.

The reference counting mechanism for msg_zerocopy was incorrectly
applied to vhost_net zerocopy packets.

The other issue observed through analysis. We do not cook skbs
with skb_zcopy(skb) but no frags, but this is possible in principle.
Correctly call skb_zcopy_clear on those.

Willem de Bruijn (2):
  skbuff: orphan frags before zerocopy clone
  skbuff: skb_copy_ubufs must release uarg even without user frags

 net/core/skbuff.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

-- 
2.15.1.620.gb9897f4670-goog

^ permalink raw reply

* Re: [RFC PATCH] virtio_net: Extend virtio to use VF datapath when available
From: Jakub Kicinski @ 2017-12-20 22:33 UTC (permalink / raw)
  To: Sridhar Samudrala; +Cc: mst, stephen, netdev, virtualization, alexander.duyck
In-Reply-To: <1513644036-45230-1-git-send-email-sridhar.samudrala@intel.com>

On Mon, 18 Dec 2017 16:40:36 -0800, Sridhar Samudrala wrote:
> +static int virtio_netdev_event(struct notifier_block *this,
> +			       unsigned long event, void *ptr)
> +{
> +	struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
> +
> +	/* Skip our own events */
> +	if (event_dev->netdev_ops == &virtnet_netdev)
> +		return NOTIFY_DONE;

I wonder how does this work WRT loop prevention.  What if I have two
virtio devices with the same MAC, what is preventing them from claiming
each other?  Is it only the check above (not sure what is "own" in
comment referring to)?

I'm worried the check above will not stop virtio from enslaving hyperv
interfaces and vice versa, potentially leading to a loop, no?  There is
also the fact that it would be preferable to share the code between
paravirt drivers, to avoid duplicated bugs.

My suggestion during the previous discussion was to create a paravirt
bond device, which will explicitly tell the OS which interfaces to
bond, regardless of which driver they're using.  Could be some form of
ACPI/FW driver too, I don't know enough about x86 FW to suggest
something fully fleshed :(

^ permalink raw reply

* [PATCH] hv_netvsc: update VF after name has changed.
From: Stephen Hemminger @ 2017-12-20 22:33 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

Since commit 6123c66854c1 ("netvsc: delay setup of VF device")
the automatic bring up of the VF is delayed to allow userspace (udev)
a chance to rename the device. This delay is problematic because
it delays boot and may not be long enough for some cases.

Instead, use the rename can be used to trigger the next step
in setup to happen immediately.

The VF initialization sequence now looks like:
   * hotplug causes VF network device probe to create network device
   * netvsc notifier joins VF with netvsc and schedules VF to be
     setup after timer expires
   * udev in userspace renames device
   * if netvsc notifier detects rename, it can cancel timer
     and do immediate setup

The delay can also be increased to allow for slower rules.
Still need the delayed work to handle the case where rename is
not done.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index c5584c2d440e..7771d6c54b55 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -49,7 +49,7 @@
 #define RING_SIZE_MIN		64
 
 #define LINKCHANGE_INT (2 * HZ)
-#define VF_TAKEOVER_INT (HZ / 10)
+#define VF_TAKEOVER_INT (HZ / 4)
 
 static unsigned int ring_size __ro_after_init = 128;
 module_param(ring_size, uint, S_IRUGO);
@@ -1894,6 +1894,26 @@ static int netvsc_vf_changed(struct net_device *vf_netdev)
 	return NOTIFY_OK;
 }
 
+/* If VF was renamed, then udev (or other tool) has done its
+ * work and the VF can be brought up.
+ */
+static int netvsc_vf_renamed(struct net_device *vf_netdev)
+{
+	struct net_device_context *ndev_ctx;
+	struct net_device *ndev;
+
+	ndev = get_netvsc_byref(vf_netdev);
+	if (!ndev)
+		return NOTIFY_DONE;
+
+	ndev_ctx = netdev_priv(ndev);
+
+	if (cancel_delayed_work(&ndev_ctx->vf_takeover))
+		__netvsc_vf_setup(ndev, vf_netdev);
+
+	return NOTIFY_OK;
+}
+
 static int netvsc_unregister_vf(struct net_device *vf_netdev)
 {
 	struct net_device *ndev;
@@ -2110,6 +2130,8 @@ static int netvsc_netdev_event(struct notifier_block *this,
 	case NETDEV_UP:
 	case NETDEV_DOWN:
 		return netvsc_vf_changed(event_dev);
+	case NETDEV_CHANGENAME:
+		return netvsc_vf_renamed(event_dev);
 	default:
 		return NOTIFY_DONE;
 	}
-- 
2.11.0

^ permalink raw reply related

* Re: RCU callback crashes
From: Cong Wang @ 2017-12-20 22:25 UTC (permalink / raw)
  To: John Fastabend; +Cc: Jiri Pirko, Jakub Kicinski, netdev@vger.kernel.org
In-Reply-To: <d82f1fdf-5d7b-db1b-6efb-85ff957f8a38@gmail.com>

On Wed, Dec 20, 2017 at 12:14 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Hi,
>
> Just sent a patch to complete qdisc_destroy from rcu callback. This
> is needed to resolve a race with the lockless qdisc patches.
>
> But I guess it should fix the miniq issue as well?


If you ever look into tools/testing/selftests/bpf/test_offload.py, it is
ingress qdisc. Don't know why you keep believing it is related
to your patches.

^ permalink raw reply

* Re: [PATCH v5 3/6] perf: implement pmu perf_kprobe
From: Song Liu @ 2017-12-20 22:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, mingo@redhat.com, David Miller,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Daniel Borkmann, Kernel Team
In-Reply-To: <20171220212554.v4obwcp25hee556y@hirez.programming.kicks-ass.net>


> On Dec 20, 2017, at 1:25 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Wed, Dec 20, 2017 at 06:10:11PM +0000, Song Liu wrote:
>> I think there is one more thing to change:
> 
> OK, folded that too; it should all be at:
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core
> 
> Can you verify it all looks/works right?

Thanks Peter! The patches look right. And they work as expected in my tests. 

Best,
Song

^ permalink raw reply

* [PATCH v3 2/3] net: ibm: emac: replace custom PHY_MODE_* macros
From: Christian Lamparter @ 2017-12-20 22:01 UTC (permalink / raw)
  To: netdev; +Cc: David S . Miller, Andrew Lunn, Christophe Jaillet
In-Reply-To: <a9482f4f4037f6eb732de327290a432539648bcd.1513806256.git.chunkeey@gmail.com>

The ibm_emac driver predates the PHY_INTERFACE_MODE_*
enums by a few years.

And while the driver has been retrofitted to use the PHYLIB,
the old definitions have stuck around to this day.

This patch replaces all occurences of PHY_MODE_* with
the respective equivalent PHY_INTERFACE_MODE_* enum.
And finally, it purges the old macros for good.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>

---
Extra change in zmii.c to make checkpatch happy.
---
 drivers/net/ethernet/ibm/emac/core.c  | 20 +++++++++---------
 drivers/net/ethernet/ibm/emac/emac.h  | 13 ------------
 drivers/net/ethernet/ibm/emac/phy.c   | 10 ++++-----
 drivers/net/ethernet/ibm/emac/rgmii.c | 20 +++++++++---------
 drivers/net/ethernet/ibm/emac/zmii.c  | 38 +++++++++++++++++------------------
 5 files changed, 44 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 7feff2450ed6..27f592da7cfc 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -199,18 +199,18 @@ static void __emac_set_multicast_list(struct emac_instance *dev);
 
 static inline int emac_phy_supports_gige(int phy_mode)
 {
-	return  phy_mode == PHY_MODE_GMII ||
-		phy_mode == PHY_MODE_RGMII ||
-		phy_mode == PHY_MODE_SGMII ||
-		phy_mode == PHY_MODE_TBI ||
-		phy_mode == PHY_MODE_RTBI;
+	return  phy_mode == PHY_INTERFACE_MODE_GMII ||
+		phy_mode == PHY_INTERFACE_MODE_RGMII ||
+		phy_mode == PHY_INTERFACE_MODE_SGMII ||
+		phy_mode == PHY_INTERFACE_MODE_TBI ||
+		phy_mode == PHY_INTERFACE_MODE_RTBI;
 }
 
 static inline int emac_phy_gpcs(int phy_mode)
 {
-	return  phy_mode == PHY_MODE_SGMII ||
-		phy_mode == PHY_MODE_TBI ||
-		phy_mode == PHY_MODE_RTBI;
+	return  phy_mode == PHY_INTERFACE_MODE_SGMII ||
+		phy_mode == PHY_INTERFACE_MODE_TBI ||
+		phy_mode == PHY_INTERFACE_MODE_RTBI;
 }
 
 static inline void emac_tx_enable(struct emac_instance *dev)
@@ -2865,7 +2865,7 @@ static int emac_init_config(struct emac_instance *dev)
 	/* PHY mode needs some decoding */
 	dev->phy_mode = of_get_phy_mode(np);
 	if (dev->phy_mode < 0)
-		dev->phy_mode = PHY_MODE_NA;
+		dev->phy_mode = PHY_INTERFACE_MODE_NA;
 
 	/* Check EMAC version */
 	if (of_device_is_compatible(np, "ibm,emac4sync")) {
@@ -3168,7 +3168,7 @@ static int emac_probe(struct platform_device *ofdev)
 	printk(KERN_INFO "%s: EMAC-%d %pOF, MAC %pM\n",
 	       ndev->name, dev->cell_index, np, ndev->dev_addr);
 
-	if (dev->phy_mode == PHY_MODE_SGMII)
+	if (dev->phy_mode == PHY_INTERFACE_MODE_SGMII)
 		printk(KERN_NOTICE "%s: in SGMII mode\n", ndev->name);
 
 	if (dev->phy.address >= 0)
diff --git a/drivers/net/ethernet/ibm/emac/emac.h b/drivers/net/ethernet/ibm/emac/emac.h
index 5afcc27ceebb..bc14dcf27b6b 100644
--- a/drivers/net/ethernet/ibm/emac/emac.h
+++ b/drivers/net/ethernet/ibm/emac/emac.h
@@ -104,19 +104,6 @@ struct emac_regs {
 	} u1;
 };
 
-/*
- * PHY mode settings (EMAC <-> ZMII/RGMII bridge <-> PHY)
- */
-#define PHY_MODE_NA	PHY_INTERFACE_MODE_NA
-#define PHY_MODE_MII	PHY_INTERFACE_MODE_MII
-#define PHY_MODE_RMII	PHY_INTERFACE_MODE_RMII
-#define PHY_MODE_SMII	PHY_INTERFACE_MODE_SMII
-#define PHY_MODE_RGMII	PHY_INTERFACE_MODE_RGMII
-#define PHY_MODE_TBI	PHY_INTERFACE_MODE_TBI
-#define PHY_MODE_GMII	PHY_INTERFACE_MODE_GMII
-#define PHY_MODE_RTBI	PHY_INTERFACE_MODE_RTBI
-#define PHY_MODE_SGMII	PHY_INTERFACE_MODE_SGMII
-
 /* EMACx_MR0 */
 #define EMAC_MR0_RXI			0x80000000
 #define EMAC_MR0_TXI			0x40000000
diff --git a/drivers/net/ethernet/ibm/emac/phy.c b/drivers/net/ethernet/ibm/emac/phy.c
index 874949faa424..71ee13c34192 100644
--- a/drivers/net/ethernet/ibm/emac/phy.c
+++ b/drivers/net/ethernet/ibm/emac/phy.c
@@ -96,7 +96,7 @@ int emac_mii_reset_gpcs(struct mii_phy *phy)
 	if ((val & BMCR_ISOLATE) && limit > 0)
 		gpcs_phy_write(phy, MII_BMCR, val & ~BMCR_ISOLATE);
 
-	if (limit > 0 && phy->mode == PHY_MODE_SGMII) {
+	if (limit > 0 && phy->mode == PHY_INTERFACE_MODE_SGMII) {
 		/* Configure GPCS interface to recommended setting for SGMII */
 		gpcs_phy_write(phy, 0x04, 0x8120); /* AsymPause, FDX */
 		gpcs_phy_write(phy, 0x07, 0x2801); /* msg_pg, toggle */
@@ -313,16 +313,16 @@ static int cis8201_init(struct mii_phy *phy)
 	epcr &= ~EPCR_MODE_MASK;
 
 	switch (phy->mode) {
-	case PHY_MODE_TBI:
+	case PHY_INTERFACE_MODE_TBI:
 		epcr |= EPCR_TBI_MODE;
 		break;
-	case PHY_MODE_RTBI:
+	case PHY_INTERFACE_MODE_RTBI:
 		epcr |= EPCR_RTBI_MODE;
 		break;
-	case PHY_MODE_GMII:
+	case PHY_INTERFACE_MODE_GMII:
 		epcr |= EPCR_GMII_MODE;
 		break;
-	case PHY_MODE_RGMII:
+	case PHY_INTERFACE_MODE_RGMII:
 	default:
 		epcr |= EPCR_RGMII_MODE;
 	}
diff --git a/drivers/net/ethernet/ibm/emac/rgmii.c b/drivers/net/ethernet/ibm/emac/rgmii.c
index 9a1c06f2471d..5db225831e81 100644
--- a/drivers/net/ethernet/ibm/emac/rgmii.c
+++ b/drivers/net/ethernet/ibm/emac/rgmii.c
@@ -52,25 +52,25 @@
 /* RGMII bridge supports only GMII/TBI and RGMII/RTBI PHYs */
 static inline int rgmii_valid_mode(int phy_mode)
 {
-	return  phy_mode == PHY_MODE_GMII ||
-		phy_mode == PHY_MODE_MII ||
-		phy_mode == PHY_MODE_RGMII ||
-		phy_mode == PHY_MODE_TBI ||
-		phy_mode == PHY_MODE_RTBI;
+	return  phy_mode == PHY_INTERFACE_MODE_GMII ||
+		phy_mode == PHY_INTERFACE_MODE_MII ||
+		phy_mode == PHY_INTERFACE_MODE_RGMII ||
+		phy_mode == PHY_INTERFACE_MODE_TBI ||
+		phy_mode == PHY_INTERFACE_MODE_RTBI;
 }
 
 static inline u32 rgmii_mode_mask(int mode, int input)
 {
 	switch (mode) {
-	case PHY_MODE_RGMII:
+	case PHY_INTERFACE_MODE_RGMII:
 		return RGMII_FER_RGMII(input);
-	case PHY_MODE_TBI:
+	case PHY_INTERFACE_MODE_TBI:
 		return RGMII_FER_TBI(input);
-	case PHY_MODE_GMII:
+	case PHY_INTERFACE_MODE_GMII:
 		return RGMII_FER_GMII(input);
-	case PHY_MODE_MII:
+	case PHY_INTERFACE_MODE_MII:
 		return RGMII_FER_MII(input);
-	case PHY_MODE_RTBI:
+	case PHY_INTERFACE_MODE_RTBI:
 		return RGMII_FER_RTBI(input);
 	default:
 		BUG();
diff --git a/drivers/net/ethernet/ibm/emac/zmii.c b/drivers/net/ethernet/ibm/emac/zmii.c
index 89c42d362292..fdcc734541fe 100644
--- a/drivers/net/ethernet/ibm/emac/zmii.c
+++ b/drivers/net/ethernet/ibm/emac/zmii.c
@@ -49,20 +49,20 @@
  */
 static inline int zmii_valid_mode(int mode)
 {
-	return  mode == PHY_MODE_MII ||
-		mode == PHY_MODE_RMII ||
-		mode == PHY_MODE_SMII ||
-		mode == PHY_MODE_NA;
+	return  mode == PHY_INTERFACE_MODE_MII ||
+		mode == PHY_INTERFACE_MODE_RMII ||
+		mode == PHY_INTERFACE_MODE_SMII ||
+		mode == PHY_INTERFACE_MODE_NA;
 }
 
 static inline const char *zmii_mode_name(int mode)
 {
 	switch (mode) {
-	case PHY_MODE_MII:
+	case PHY_INTERFACE_MODE_MII:
 		return "MII";
-	case PHY_MODE_RMII:
+	case PHY_INTERFACE_MODE_RMII:
 		return "RMII";
-	case PHY_MODE_SMII:
+	case PHY_INTERFACE_MODE_SMII:
 		return "SMII";
 	default:
 		BUG();
@@ -72,11 +72,11 @@ static inline const char *zmii_mode_name(int mode)
 static inline u32 zmii_mode_mask(int mode, int input)
 {
 	switch (mode) {
-	case PHY_MODE_MII:
+	case PHY_INTERFACE_MODE_MII:
 		return ZMII_FER_MII(input);
-	case PHY_MODE_RMII:
+	case PHY_INTERFACE_MODE_RMII:
 		return ZMII_FER_RMII(input);
-	case PHY_MODE_SMII:
+	case PHY_INTERFACE_MODE_SMII:
 		return ZMII_FER_SMII(input);
 	default:
 		return 0;
@@ -106,27 +106,27 @@ int zmii_attach(struct platform_device *ofdev, int input, int *mode)
 	 * Please, always specify PHY mode in your board port to avoid
 	 * any surprises.
 	 */
-	if (dev->mode == PHY_MODE_NA) {
-		if (*mode == PHY_MODE_NA) {
+	if (dev->mode == PHY_INTERFACE_MODE_NA) {
+		if (*mode == PHY_INTERFACE_MODE_NA) {
 			u32 r = dev->fer_save;
 
 			ZMII_DBG(dev, "autodetecting mode, FER = 0x%08x" NL, r);
 
 			if (r & (ZMII_FER_MII(0) | ZMII_FER_MII(1)))
-				dev->mode = PHY_MODE_MII;
+				dev->mode = PHY_INTERFACE_MODE_MII;
 			else if (r & (ZMII_FER_RMII(0) | ZMII_FER_RMII(1)))
-				dev->mode = PHY_MODE_RMII;
+				dev->mode = PHY_INTERFACE_MODE_RMII;
 			else
-				dev->mode = PHY_MODE_SMII;
-		} else
+				dev->mode = PHY_INTERFACE_MODE_SMII;
+		} else {
 			dev->mode = *mode;
-
+		}
 		printk(KERN_NOTICE "%pOF: bridge in %s mode\n",
 		       ofdev->dev.of_node,
 		       zmii_mode_name(dev->mode));
 	} else {
 		/* All inputs must use the same mode */
-		if (*mode != PHY_MODE_NA && *mode != dev->mode) {
+		if (*mode != PHY_INTERFACE_MODE_NA && *mode != dev->mode) {
 			printk(KERN_ERR
 			       "%pOF: invalid mode %d specified for input %d\n",
 			       ofdev->dev.of_node, *mode, input);
@@ -246,7 +246,7 @@ static int zmii_probe(struct platform_device *ofdev)
 
 	mutex_init(&dev->lock);
 	dev->ofdev = ofdev;
-	dev->mode = PHY_MODE_NA;
+	dev->mode = PHY_INTERFACE_MODE_NA;
 
 	rc = -ENXIO;
 	if (of_address_to_resource(np, 0, &regs)) {
-- 
2.15.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox