Netdev List


* Re: [PATCH net] net: qualcomm: emac: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14  5:01 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, timur, yang.wei9
In-Reply-To: <1549986597-4837-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Tue, 12 Feb 2019 23:49:57 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in emac_mac_tx_process() when
> skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied to net-next.
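
For readers unfamiliar with the distinction these patches rely on, the
following is a minimal userspace model in C, not kernel code. It assumes
(as the patch descriptions imply) that freeing an skb through the
"kfree" path is reported to drop profilers as a drop, while the "consume"
path is not. All model_* names and the drop_events counter are
hypothetical stand-ins introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for an skb; not the kernel's struct sk_buff. */
struct model_skb {
	bool freed;
};

/* What a drop profiler (dropwatch, perf) would count in this model. */
static unsigned int drop_events;

/* Models dev_kfree_skb_irq(): the free is reported as a drop. */
static void model_kfree_skb_irq(struct model_skb *skb)
{
	assert(skb != NULL && !skb->freed);
	skb->freed = true;
	drop_events++;
}

/* Models dev_consume_skb_irq(): the free is a normal completion. */
static void model_consume_skb_irq(struct model_skb *skb)
{
	assert(skb != NULL && !skb->freed);
	skb->freed = true;
}

/*
 * Models a TX-completion handler that releases n transmitted packets.
 * Returns how many drop events the profiler saw for this batch.
 */
static unsigned int model_tx_complete(struct model_skb *ring, size_t n,
				      bool use_consume)
{
	unsigned int before = drop_events;
	size_t i;

	for (i = 0; i < n; i++) {
		if (use_consume)
			model_consume_skb_irq(&ring[i]);
		else
			model_kfree_skb_irq(&ring[i]);
	}
	return drop_events - before;
}
```

In this model, completing three successfully transmitted packets with the
"kfree" variant yields three spurious drop events, while the "consume"
variant yields none, which is the behaviour the patches aim for.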


* Re: [PATCH net] net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14  5:01 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, yang.wei9
In-Reply-To: <1549986773-4974-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Tue, 12 Feb 2019 23:52:53 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called in mace_interrupt() when skb
> xmit done. It makes drop profiles(dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied to net-next.


* Re: [PATCH net] net: atheros: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14  5:01 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, jcliburn, chris.snook, yang.wei9
In-Reply-To: <1549986705-4915-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Tue, 12 Feb 2019 23:51:45 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called when skb xmit done. It makes
> drop profiles(dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied.


* Re: [PATCH net] net: neterion: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
From: David Miller @ 2019-02-14  5:00 UTC (permalink / raw)
  To: albin_yang; +Cc: netdev, jdmason, yang.wei9
In-Reply-To: <1549986451-4780-1-git-send-email-albin_yang@163.com>

From: Yang Wei <albin_yang@163.com>
Date: Tue, 12 Feb 2019 23:47:31 +0800

> From: Yang Wei <yang.wei9@zte.com.cn>
> 
> dev_consume_skb_irq() should be called when skb xmit done. It makes
> drop profiles(dropwatch, perf) more friendly.
> 
> Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>

Applied to net-next.


* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
From: Andrew Lunn @ 2019-02-14  4:50 UTC (permalink / raw)
  To: David Miller; +Cc: dave.anglin, linux, vivien.didelot, f.fainelli, netdev
In-Reply-To: <20190213.204731.2262809689964875254.davem@davemloft.net>

> Ok, all done.

Thanks

> Should I queue just this one for -stable?  I didn't queue up Heiner's change for
> -stable because it fixes a 5.0-rcX regression.

Yes please.

    Andrew


* Re: [RFC PATCH net-next 2/5] net: 8021q: vlan_dev: add vid tag for uc and mc address lists
From: Florian Fainelli @ 2019-02-14  4:49 UTC (permalink / raw)
  To: Ivan Khoronzhuk, davem, linux-omap, netdev, linux-kernel, jiri,
	andrew
In-Reply-To: <20190213161715.GA32249@khorivan>



On February 13, 2019 8:17:16 AM PST, Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> wrote:
>On Tue, Jan 22, 2019 at 03:12:41PM +0200, Ivan Khoronzhuk wrote:
>>On Mon, Jan 21, 2019 at 03:37:41PM -0800, Florian Fainelli wrote:
>>>On 12/4/18 3:42 PM, Ivan Khoronzhuk wrote:
>>>>On Tue, Dec 04, 2018 at 11:49:27AM -0800, Florian Fainelli wrote:
>>
>>[...]
>>
>>>
>>>Ivan, based on the recent submission I copied you on [1], it sounds like
>>>we want to move ahead with your proposal to extend netdev_hw_addr with a
>>>vid member.
>>>
>>>On second thought, your approach is good and if we enclose the vid
>>>member within an #if IS_ENABLED(CONFIG_VLAN_8021Q) we should be good for
>>>most foreseeable use cases, if not, we can always introduce a variable
>>>size/defined context in the future.
>>>
>>>Can you resubmit this patch series as non-RFC in the next few days so I
>>>can also repost mine [1] and take advantage of these changes for
>>>multicast over VLAN when VLAN filtering is globally enabled on the device?
>>>
>>>[1]: https://www.spinics.net/lists/netdev/msg544722.html
>>>
>>>Thanks!
>>
>>Yes, sure. I can start to do that in several days.
>>Just a little busy right now.
>>
>>Just before doing this, maybe some comments could be added as it has more
>>attention now. Meanwhile I can send alternative variant but based on
>>real dev splitting addresses between vlans. In this approach it leaves address
>>space w/o vid extension but requires more changes to vlan core. Drawback here
>>that to change one address alg traverses all related vlan addresses, it can be
>>cpu/time wasteful, if it's done regularly, but saves memory....
>>
>>Basically it's implemented locally in cpsw and requires more changes to move
>>it as some vlan core auxiliary functions to be reused. But it can work only
>>with vlans directly on top of real dev, which is fixable.
>>
>>Core function here:
>>__hw_addr_ref_sync_dev
>>it is called only for address the link of which was increased/decreased, thus
>>update made only on one address, comparing it for every vlan dev.
>>
>>It was added with this patch:
>>[1] net: core: dev_addr_lists: add auxiliary func to handle reference 
>>address update e7946760de5852f32
>>
>>And used by this patch:
>>[2] net: ethernet: ti: cpsw: fix vlan mcast 15180eca569bfe1d4d
>>
>>So, idea is to move [2] to be vlan core auxiliary function to be reused
>>by NIC drivers.
>>
>>But potentially it can bring a little more changes I assume:
>>
>>1) add priv_flag |= IFF_IV_FLT (independent vlan filtering). It allows to reuse
>>this flag for further changes, probably for per vlan allmulti or so.
>>
>>2) real dev has to have complete list for vlans, not only their vids, but also
>>all vlandevs in device chain above it. So changes in add_vid can be required.
>>Vlan core can assign vlan dev pointer to real device only after it's completely
>>initialized. And for propagation reasons it requires every device in
>>infrastructure to be aware. That seems doable, but depends not only on me.
>>
>>3) Move code from [2] to be auxiliary vlan core API for setting mc and uc.
>>From this patch only one function is cpsw specific: cpsw_set_mc(). The rest can
>>be applicable on every NIC supporting IFF_IV_FLT.
>>
>>4) Move code from link below to do the same but for uc addresses:
>>https://git.linaro.org/people/ivan.khoronzhuk/tsn_kernel.git/commit/?h=ucast_vlan_fix&id=ebc88a7d8758759322d9ff88f25f8bac51ce7219
>>here only one func cpsw specific: cpsw_set_uc()
>>the rest can be generic.
>>
>>As third alternative, we can think about how to reduce memory for addresses by
>>reusing them or else, but this is as continuation of addr+vid approach, and API
>>probably would be the same.
>>
>>Then all this can be compared for proper decision.
>
>
>Hi Florian,
>
>After several more investigations and tries, it is probably better to leave
>this idea as is.

Thank you for keeping the thread alive. Does that mean you are going to
resubmit this patch series as-is (rebased), or are you saying that you are
abandoning the idea and leaving the situation the way it is in cpsw?

>
>Here are actually several explanations for this:
>1) Even if we assume that we can get access to vlan devices in the above ndev
>tree (we can), that doesn't guarantee that receive vlan filters are set
>replicating this structure. For example, a bond device can have one active
>slave but both of them in the tree having vid set; in this case addresses are
>synced only with the active slave, and no filters should be applied to the
>non-active slave. This can be achieved only if each address has vid context.
>
>2) According to 1), the rx filters device structure can be created while
>mc_sync() in each rx_mode(), and then used as orthogonal info. I've tried and
>it looks not cool and consumes memory anyway, and even if it's less it's still
>not very scalable. (+ no normal signal "in complex structure case" when address
>should be updated to avoid redundant cpu cycles). Not sure it can have
>practical results and be universal enough.
>
>3) Assuming that every device in the tree (bond, team or else) is legal to
>modify its own address space, the real end device cannot be sure the vlan
>device address spaces reflect the vid addresses that the device tree wants
>from it. According to this, each address in the address space must hold its
>own context at every device, and this context is comparable with address size.
>>-- Regards,
>>Ivan Khoronzhuk

-- 
Florian
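
As a rough illustration of the addr+vid idea discussed in this message, the
sketch below shows an address entry that carries a VLAN ID behind a
compile-time switch, plus a lookup that matches on both fields. It is a
userspace model under stated assumptions: struct model_hw_addr, model_find()
and MODEL_VLAN_8021Q_ENABLED are hypothetical names standing in for
netdev_hw_addr and IS_ENABLED(CONFIG_VLAN_8021Q); this is not the layout or
API of the actual patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_ALEN 6
/* Stands in for IS_ENABLED(CONFIG_VLAN_8021Q) in this userspace model. */
#define MODEL_VLAN_8021Q_ENABLED 1

/* Hypothetical address entry: MAC address plus optional vid context. */
struct model_hw_addr {
	uint8_t addr[MODEL_ETH_ALEN];
#if MODEL_VLAN_8021Q_ENABLED
	uint16_t vid;
#endif
};

/*
 * Look up an entry by address and, when VLAN support is compiled in,
 * by vid as well. Returns NULL when no entry matches.
 */
static const struct model_hw_addr *
model_find(const struct model_hw_addr *list, size_t n,
	   const uint8_t *addr, uint16_t vid)
{
	size_t i;

	assert(list != NULL || n == 0);
	(void)vid; /* unused when VLAN support is compiled out */

	for (i = 0; i < n; i++) {
		if (memcmp(list[i].addr, addr, MODEL_ETH_ALEN) != 0)
			continue;
#if MODEL_VLAN_8021Q_ENABLED
		if (list[i].vid != vid)
			continue;
#endif
		return &list[i];
	}
	return NULL;
}
```

With the vid stored next to the address, the same multicast address can
appear once per VLAN, and a driver can tell the entries apart without
walking the vlan devices above it. The cost is the extra per-address
storage that the thread is weighing.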


* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
From: David Miller @ 2019-02-14  4:47 UTC (permalink / raw)
  To: andrew; +Cc: dave.anglin, linux, vivien.didelot, f.fainelli, netdev
In-Reply-To: <20190214020723.GE24589@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Thu, 14 Feb 2019 03:07:23 +0100

> On Mon, Feb 11, 2019 at 01:40:21PM -0500, John David Anglin wrote:
>> The GPIO interrupt controller on the espressobin board only supports edge interrupts.
>> If one enables the use of hardware interrupts in the device tree for the 88E6341, it is
>> possible to miss an edge.  When this happens, the INTn pin on the Marvell switch is
>> stuck low and no further interrupts occur.
>> 
>> I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is
>> a race in handling device interrupts (e.g. PHY link interrupts).  Some interrupts are
>> directly cleared by reading the Global 1 status register.  However, the device interrupt
>> flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced.
>> This is done by reading the relevant SERDES and PHY status register.
>> 
>> The code only services interrupts whose status bit is set at the time of reading its status
>> register.  If an interrupt event occurs after its status is read and before all interrupts
>> are serviced, then this event will not be serviced and the INTn output pin will remain low.
>> 
>> This is not a problem with polling or level interrupts since the handler will be called
>> again to process the event.  However, it's a big problem when using edge interrupts.
>> 
>> The fix presented here is to add a loop around the code servicing switch interrupts.  If
>> any pending interrupts remain after the current set has been handled, we loop and process
>> the new set.  If there are no pending interrupts after servicing, we are sure that INTn has
>> gone high and we will get an edge when a new event occurs.
>> 
>> Tested on espressobin board.
>> 
>> Signed-off-by:  John David Anglin <dave.anglin@bell.net>
> 
> Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
> 
> Tested-by: Andrew Lunn <andrew@lunn.ch>
> 
> David, please ensure that Heiner's patch:
> 
> net: phy: fix interrupt handling in non-started states
> 
> is applied first. Otherwise we can get into an interrupt storm.

Ok, all done.

Should I queue just this one for -stable?  I didn't queue up Heiner's change for
-stable because it fixes a 5.0-rcX regression.
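
The fix described in the quoted patch (loop until no interrupt is pending)
can be sketched with a small userspace model in C. This is not the
mv88e6xxx driver code: struct model_switch, the late_event field and the
model_* functions are hypothetical, and the model simply injects one new
event while the handler is servicing the previous set, which is the window
the patch description identifies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a switch with latched interrupt status bits. */
struct model_switch {
	uint16_t pending;      /* status bits currently asserted */
	uint16_t late_event;   /* event that fires during servicing */
	unsigned int serviced; /* number of events handled so far */
};

static uint16_t model_read_status(const struct model_switch *sw)
{
	return sw->pending;
}

/* Service every bit that was set when the status was sampled. */
static void model_service(struct model_switch *sw, uint16_t status)
{
	int bit;

	for (bit = 0; bit < 16; bit++) {
		if (status & (1u << bit)) {
			sw->pending &= (uint16_t)~(1u << bit);
			sw->serviced++;
		}
	}
	/* A new event arrives after the status was sampled. */
	sw->pending |= sw->late_event;
	sw->late_event = 0;
}

/* Old behaviour: one pass. Returns 1 if INTn would stay asserted. */
static int model_irq_single_pass(struct model_switch *sw)
{
	model_service(sw, model_read_status(sw));
	return sw->pending != 0;
}

/* Fixed behaviour: keep looping until nothing is pending. */
static int model_irq_loop(struct model_switch *sw)
{
	uint16_t status;

	while ((status = model_read_status(sw)) != 0)
		model_service(sw, status);

	assert(sw->pending == 0);
	return sw->pending != 0;
}
```

In the single-pass variant the late event leaves the line asserted, so an
edge-triggered controller never sees another edge. The looping variant
returns only once the line has been released.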


* Re: [PATCH net] net: phy: fix interrupt handling in non-started states
From: David Miller @ 2019-02-14  4:44 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, netdev, linux
In-Reply-To: <25e86edc-0b88-8c03-b692-776e971331f2@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Tue, 12 Feb 2019 19:56:15 +0100

> phylib enables interrupts before phy_start() has been called, and if
> we receive an interrupt in a non-started state, the interrupt handler
> returns IRQ_NONE. This causes problems with at least one Marvell chip
> as reported by Andrew.
> Fix this by handling interrupts the same as in phy_mac_interrupt(),
> basically always running the phylib state machine. It knows when it
> has to do something and when not.
> This change allows to handle interrupts gracefully even if they
> occur in a non-started state.
> 
> Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
> Reported-by: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied, thanks Heiner.


* Re: [PATCH 2/2] doc: add phylink documentation to the networking book
From: Randy Dunlap @ 2019-02-14  4:39 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Russell King, linux-doc, netdev, David S. Miller, Jonathan Corbet
In-Reply-To: <20190214043217.GB20024@lunn.ch>

On 2/13/19 8:32 PM, Andrew Lunn wrote:
>>> +For information describing the SFP cage in DT, please see the binding
>>> +documentation in the kernel source tree
>>> +``Documentation/devicetree/bindings/net/sff,sfp.txt``
>> oh, so SFP means "Small Form-factor Pluggable".
>>
>> I see that this source file:
>> ./drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1902:
>>
>> seems to imply that SFP means "single function per port (SFP) mode":
> 
> Hi Randy
> 
> rfc5513 might be relevant.
> 
> 	Andrew
> 

Definitely.  like WAD.  :)

thanks.
-- 
~Randy


* Re: [RFC bpf-next 0/7] net: flow_dissector: trigger BPF hook when called from eth_get_headlen
From: Alexei Starovoitov @ 2019-02-14  4:39 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Willem de Bruijn, Stanislav Fomichev, Network Development,
	David Miller, Alexei Starovoitov, Daniel Borkmann, simon.horman,
	Willem de Bruijn
In-Reply-To: <20190212170232.GB10595@mini-arch>

On Tue, Feb 12, 2019 at 09:02:32AM -0800, Stanislav Fomichev wrote:
> On 02/05, Stanislav Fomichev wrote:
> > On 02/05, Alexei Starovoitov wrote:
> > > On Tue, Feb 05, 2019 at 07:56:19PM -0800, Stanislav Fomichev wrote:
> > > > On 02/05, Alexei Starovoitov wrote:
> > > > > On Tue, Feb 05, 2019 at 04:59:31PM -0800, Stanislav Fomichev wrote:
> > > > > > On 02/05, Alexei Starovoitov wrote:
> > > > > > > On Tue, Feb 05, 2019 at 12:40:03PM -0800, Stanislav Fomichev wrote:
> > > > > > > > On 02/05, Willem de Bruijn wrote:
> > > > > > > > > On Tue, Feb 5, 2019 at 12:57 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Currently, when eth_get_headlen calls flow dissector, it doesn't pass any
> > > > > > > > > > skb. Because we use passed skb to lookup associated networking namespace
> > > > > > > > > > to find whether we have a BPF program attached or not, we always use
> > > > > > > > > > C-based flow dissector in this case.
> > > > > > > > > >
> > > > > > > > > > The goal of this patch series is to add new networking namespace argument
> > > > > > > > > > to the eth_get_headlen and make BPF flow dissector programs be able to
> > > > > > > > > > work in the skb-less case.
> > > > > > > > > >
> > > > > > > > > > The series goes like this:
> > > > > > > > > > 1. introduce __init_skb and __init_skb_shinfo; those will be used to
> > > > > > > > > >    initialize temporary skb
> > > > > > > > > > 2. introduce skb_net which can be used to get networking namespace
> > > > > > > > > >    associated with an skb
> > > > > > > > > > 3. add new optional network namespace argument to __skb_flow_dissect and
> > > > > > > > > >    plumb through the callers
> > > > > > > > > > 4. add new __flow_bpf_dissect which constructs temporary on-stack skb
> > > > > > > > > >    (using __init_skb) and calls BPF flow dissector program
> > > > > > > > > 
> > > > > > > > > The main concern I see with this series is this cost of skb zeroing
> > > > > > > > > for every packet in the device driver receive routine, *independent*
> > > > > > > > > from the real skb allocation and zeroing which will likely happen
> > > > > > > > > later.
> > > > > > > > Yes, plus ~200 bytes on the stack for the callers.
> > > > > > > > 
> > > > > > > > Not sure how visible this zeroing though, I can probably try to get some
> > > > > > > > numbers from BPF_PROG_TEST_RUN (running current version vs running with
> > > > > > > > on-stack skb).
> > > > > > > 
> > > > > > > imo extra 256 byte memset for every packet is non starter.
> > > > > > We can put pre-allocated/initialized skbs without data into percpu or even
> > > > > > use pcpu_freelist_pop/pcpu_freelist_push to make sure we don't have to think
> > > > > > about having multiple percpu for irq/softirq/process contexts.
> > > > > > Any concerns with that approach?
> > > > > > Any other possible concerns with the overall series?
> > > > > 
> > > > > I'm missing why the whole thing is needed.
> > > > > You're saying:
> > > > > " make BPF flow dissector programs be able to work in the skb-less case".
> > > > > What does it mean specifically?
> > > > > The only non-skb case is XDP.
> > > > > Are you saying you want flow_dissector prog to be run in XDP?
> > > > eth_get_headlen that drivers call on RX path on a chunk of data to
> > > > guesstimate the length of the headers calls flow dissector without an skb
> > > > (__skb_flow_dissect was a weird interface where it accepts skb or
> > > > data+len). Right now, there is no way to trigger BPF flow dissector
> > > > for this case (we don't have an skb to get associated namespace/etc/etc).
> > > > The patch series tries to fix that to make sure that we always trigger
> > > > BPF program if it's attached to a device's namespace.
> > > 
> > > then why not to create flow_dissector prog type that works without skb?
> > > Why do you need to fake an skb?
> > > XDP progs work just fine without it.
> > What's the advantage of having another prog type? In this case we would have
> > to write the same flow dissector program twice: first time against __sk_buff
> > interface, second time against xdp_md.
> > By using fake skb, we make the same flow dissector __sk_buff BPF program
> > work in both contexts without a rewrite to an xdp interface (I don't
> > think users should care whether flow dissector was called from "xdp" vs skb
> > context; and we're sort of stuck with __sk_buff interface already).
> Should I follow up with v2 where I address memset(,,256) for each packet?
> Or you still have some questions/doubts/suggestions regarding the problem
> I'm trying to solve?

Sorry for the delay. I'm still thinking about what the path forward is here.

That 'stuck with __sk_buff' is what bothers me.
It's an indication that api wasn't thought through if first thing
it needs is this fake skb hack.
If bpf_flow.c is a realistic example of such flow dissector prog
it means that real skb fields are accessed.
In particular skb->vlan_proto, skb->protocol.
These fields in case of 'fake skb' will not be set, since eth_type_trans()
isn't called yet.
So either flow_dissector needs a real __sk_buff and all of its fields
should be real or it's a different flow_dissector prog type that
needs ctx->data, ctx->data_end, ctx->flow_keys only.
Either way going with fake skb is incorrect, since bpf_flow.c example
will be broken and for program writers it will be hard to figure why
it's broken.
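
To make the second option above more concrete, here is a minimal userspace
sketch in C of a dissector that depends only on data, data_end and
flow_keys, and reads the EtherType (including one VLAN tag) from the packet
bytes instead of from skb fields. The struct and function names
(model_flow_ctx, model_flow_keys, model_dissect) are hypothetical and do
not correspond to an existing BPF context or to bpf_flow.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ETH_HLEN    14
#define MODEL_VLAN_HLEN   4
#define MODEL_ETH_P_8021Q 0x8100

/* Hypothetical output of the dissector (values in host byte order). */
struct model_flow_keys {
	uint16_t n_proto; /* EtherType after any VLAN tag */
	uint16_t nhoff;   /* offset of the network header */
};

/* Hypothetical skb-less context: packet bounds and the keys only. */
struct model_flow_ctx {
	const uint8_t *data;
	const uint8_t *data_end;
	struct model_flow_keys *flow_keys;
};

/* Returns 0 on success, -1 if the packet is too short. */
static int model_dissect(struct model_flow_ctx *ctx)
{
	const uint8_t *p;
	uint16_t proto;

	assert(ctx != NULL && ctx->flow_keys != NULL);
	p = ctx->data;

	/* Bounds check before touching the Ethernet header. */
	if (ctx->data_end - p < MODEL_ETH_HLEN)
		return -1;
	proto = (uint16_t)((p[12] << 8) | p[13]);
	p += MODEL_ETH_HLEN;

	/* Parse a single 802.1Q tag from the packet itself. */
	if (proto == MODEL_ETH_P_8021Q) {
		if (ctx->data_end - p < MODEL_VLAN_HLEN)
			return -1;
		proto = (uint16_t)((p[2] << 8) | p[3]);
		p += MODEL_VLAN_HLEN;
	}

	ctx->flow_keys->n_proto = proto;
	ctx->flow_keys->nhoff = (uint16_t)(p - ctx->data);
	return 0;
}
```

Because nothing here relies on skb->protocol or skb->vlan_proto, a program
written this way would behave the same before and after eth_type_trans()
has run, which is the concern raised about the fake-skb approach.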



* Re: [PATCH 2/2] doc: add phylink documentation to the networking book
From: Andrew Lunn @ 2019-02-14  4:32 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Russell King, linux-doc, netdev, David S. Miller, Jonathan Corbet
In-Reply-To: <f002402d-fb27-f697-f07d-de3cdff41f40@infradead.org>

> > +For information describing the SFP cage in DT, please see the binding
> > +documentation in the kernel source tree
> > +``Documentation/devicetree/bindings/net/sff,sfp.txt``
> oh, so SFP means "Small Form-factor Pluggable".
> 
> I see that this source file:
> ./drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1902:
> 
> seems to imply that SFP means "single function per port (SFP) mode":

Hi Randy

rfc5513 might be relevant.

	Andrew


* Re: [PATCH bpf-next v11 0/7] bpf: add BPF_LWT_ENCAP_IP option to bpf_lwt_push_encap
From: Alexei Starovoitov @ 2019-02-14  4:21 UTC (permalink / raw)
  To: David Ahern
  Cc: Peter Oskolkov, Alexei Starovoitov, Daniel Borkmann, netdev,
	Peter Oskolkov, Willem de Bruijn
In-Reply-To: <3772c82a-6959-9f8a-9273-0adcbdbcf631@gmail.com>

On Wed, Feb 13, 2019 at 08:44:51PM -0700, David Ahern wrote:
> On 2/13/19 7:39 PM, Alexei Starovoitov wrote:
> > On Wed, Feb 13, 2019 at 05:46:26PM -0700, David Ahern wrote:
> >> On 2/13/19 12:53 PM, Peter Oskolkov wrote:
> >>> This patchset implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
> >>> BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
> >>> and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
> >>> to packets (e.g. IP/GRE, GUE, IPIP).
> >>>
> >>> This is useful when thousands of different short-lived flows should be
> >>> encapped, each with different and dynamically determined destination.
> >>> Although lwtunnels can be used in some of these scenarios, the ability
> >>> to dynamically generate encap headers adds more flexibility, e.g.
> >>> when routing depends on the state of the host (reflected in global bpf
> >>> maps).
> >>>
> >>
> >>
> >> For the set:
> >> Reviewed-by: David Ahern <dsahern@gmail.com>
> > 
> > Applied. Thanks everyone!
> > 
> 
> Looks like a cleanup round is needed.
> 
> I changed the routes to fail with unreachable:
> 
> @@ -179,16 +175,16 @@
>  	ip -netns ${NS3} tunnel add gre_dev mode gre remote ${IPv4_1} local ${IPv4_GRE} ttl 255
>  	ip -netns ${NS3} link set gre_dev up
>  	ip -netns ${NS3} addr add ${IPv4_GRE} dev gre_dev
> -	ip -netns ${NS1} route add ${IPv4_GRE}/32 dev veth5 via ${IPv4_6}
> -	ip -netns ${NS2} route add ${IPv4_GRE}/32 dev veth7 via ${IPv4_8}
> +	ip -netns ${NS1} route add unreachable ${IPv4_GRE}/32
> +	ip -netns ${NS2} route add unreachable ${IPv4_GRE}/32
> 
> 
>  	# configure IPv6 GRE device in NS3, and a route to it via the "bottom" route
>  	ip -netns ${NS3} -6 tunnel add name gre6_dev mode ip6gre remote ${IPv6_1} local ${IPv6_GRE} ttl 255
>  	ip -netns ${NS3} link set gre6_dev up
>  	ip -netns ${NS3} -6 addr add ${IPv6_GRE} nodad dev gre6_dev
> -	ip -netns ${NS1} -6 route add ${IPv6_GRE}/128 dev veth5 via ${IPv6_6}
> -	ip -netns ${NS2} -6 route add ${IPv6_GRE}/128 dev veth7 via ${IPv6_8}
> +	ip -netns ${NS1} -6 route add unreachable ${IPv6_GRE}/128
> +	ip -netns ${NS2} -6 route add unreachable ${IPv6_GRE}/128
> 
>  	# rp_filter gets confused by what these tests are doing, so disable it
>  	ip netns exec ${NS1} sysctl -wq net.ipv4.conf.all.rp_filter=0
> @@ -220,7 +216,6 @@
> 
> 
> and then removed all of the set -e and exit 1's in the script (really
> should let all of the tests run versus bailing on the first failure).
> 
> With kmemleak enabled I see a lot of suspected memory leaks - some may
> not be related to this change but it is triggering the suspected leak:

argh. Thanks a lot for catching it.
Let's figure out the fix quickly.
If it's too intrusive we can revert and reapply.
I'm not going to send a pull-req to Dave with a known issue like this.



* Re: [PATCH net 2/2] net: phy: fix potential race in the phylib state machine
From: Florian Fainelli @ 2019-02-14  4:13 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn, David Miller
  Cc: Russell King - ARM Linux, netdev@vger.kernel.org
In-Reply-To: <1094ff3a-0d7a-dc96-8a19-a5102e08fa79@gmail.com>



On 2/13/2019 11:12 AM, Heiner Kallweit wrote:
> Russell reported the following race in the phylib state machine
> (quoting from his mail):
> 
> if (phy_polling_mode(phydev) && phy_is_started(phydev))
> 	phy_queue_state_machine(phydev, PHY_STATE_TIME);
> 
> state = PHY_UP
> thread 0			thread 1
> 				phy_disconnect()
> 				+-phy_is_started()
> phy_is_started()                |
> 				`-phy_stop()
> 				  +-phydev->state = PHY_HALTED
> 				  `-phy_stop_machine()
> 				    `-cancel_delayed_work_sync()
> phy_queue_state_machine()
> `-mod_delayed_work()
> 
> At this point, the phydev->state_queue() has been added back onto the
> system workqueue despite phy_stop_machine() having been called and
> cancel_delayed_work_sync() called on it.
> 
> Fix this by protecting the complete operation in thread 0.
> 
> Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
> Reported-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>

-- 
Florian
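
The ordering problem in the quoted diagram, and why doing the check and the
queueing under one lock closes it, can be sketched with a small sequential
model in C. It is only a model: it does not run real threads, the
lock_held flag merely asserts that critical sections do not nest, and all
model_* names are hypothetical rather than phylib functions.

```c
#include <assert.h>
#include <stdbool.h>

enum model_state { MODEL_PHY_HALTED, MODEL_PHY_UP };

/* Hypothetical stand-in for the relevant parts of a PHY device. */
struct model_phy {
	enum model_state state;
	bool lock_held;   /* models the device lock */
	bool work_queued; /* models the delayed state-machine work */
};

static void model_lock(struct model_phy *p)
{
	assert(!p->lock_held);
	p->lock_held = true;
}

static void model_unlock(struct model_phy *p)
{
	assert(p->lock_held);
	p->lock_held = false;
}

/*
 * "Thread 0" path: the started check and the queueing happen inside
 * one critical section, so a stop cannot slip in between them.
 */
static bool model_queue_if_started(struct model_phy *p)
{
	bool queued = false;

	model_lock(p);
	if (p->state == MODEL_PHY_UP) {
		p->work_queued = true;
		queued = true;
	}
	model_unlock(p);
	return queued;
}

/* "Thread 1" path: halt under the lock, then cancel pending work. */
static void model_stop(struct model_phy *p)
{
	model_lock(p);
	p->state = MODEL_PHY_HALTED;
	model_unlock(p);
	p->work_queued = false; /* models cancel_delayed_work_sync() */
}
```

With the lock covering the whole check-and-queue step, only two orderings
remain: the work is queued before the stop and then cancelled, or the stop
runs first and the check fails. In neither case is work left queued after
the stop completes.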


* Re: [PATCH net 1/2] net: phy: don't use locking in phy_is_started
From: Florian Fainelli @ 2019-02-14  4:13 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn, David Miller
  Cc: Russell King - ARM Linux, netdev@vger.kernel.org
In-Reply-To: <2e6abca8-6a60-a7f0-b3e3-0d55fbebd4fc@gmail.com>



On 2/13/2019 11:11 AM, Heiner Kallweit wrote:
> Russell suggested to remove the locking from phy_is_started() because
> the read is atomic anyway and actually the locking may be more
> misleading.
> 
> Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
> Suggested-by: Russell King - ARM Linux admin <linux@armlinux.org.uk>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian


* Re: [PATCH net] net: phy: fix interrupt handling in non-started states
From: Florian Fainelli @ 2019-02-14  4:10 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn, David Miller
  Cc: netdev@vger.kernel.org, Russell King - ARM Linux
In-Reply-To: <25e86edc-0b88-8c03-b692-776e971331f2@gmail.com>



On 2/12/2019 10:56 AM, Heiner Kallweit wrote:
> phylib enables interrupts before phy_start() has been called, and if
> we receive an interrupt in a non-started state, the interrupt handler
> returns IRQ_NONE. This causes problems with at least one Marvell chip
> as reported by Andrew.
> Fix this by handling interrupts the same as in phy_mac_interrupt(),
> basically always running the phylib state machine. It knows when it
> has to do something and when not.
> This change allows to handle interrupts gracefully even if they
> occur in a non-started state.
> 
> Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
> Reported-by: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian


* Re: [PATCH net-next 2/2] net: hns3: add fixup handle for hns3 driver
From: Florian Fainelli @ 2019-02-14  4:08 UTC (permalink / raw)
  To: Jian Shen, andrew, hkallweit1, davem; +Cc: netdev, linux-kernel, linuxarm
In-Reply-To: <1550118667-119947-3-git-send-email-shenjian15@huawei.com>



On 2/13/2019 8:31 PM, Jian Shen wrote:
> The default LED configuration of the Marvell 88E1510 does not suit
> the hns3 driver; this patch fixes it.
> 
> Signed-off-by: Jian Shen <shenjian15@huawei.com>
> ---
>  .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c   | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
> index 84f2878..4c8346e 100644
> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
> @@ -2,6 +2,7 @@
>  // Copyright (c) 2016-2017 Hisilicon Limited.
>  
>  #include <linux/etherdevice.h>
> +#include <linux/marvell_phy.h>
>  #include <linux/kernel.h>
>  
>  #include "hclge_cmd.h"
> @@ -125,6 +126,13 @@ static int hclge_mdio_read(struct mii_bus *bus, int phyid, int regnum)
>  	return le16_to_cpu(mdio_cmd->data_rd);
>  }
>  
> +static int hclge_phy_marvell_fixup(struct phy_device *phydev)
> +{
> +	phydev->dev_flags |= MARVELL_PHY_M1510_HNS3_LEDS;
> +
> +	return 0;
> +}
> +
>  int hclge_mac_mdio_config(struct hclge_dev *hdev)
>  {
>  	struct hclge_mac *mac = &hdev->hw.mac;
> @@ -168,6 +176,15 @@ int hclge_mac_mdio_config(struct hclge_dev *hdev)
>  	mac->phydev = phydev;
>  	mac->mdio_bus = mdio_bus;
>  
> +	/* register the PHY board fixup (for Marvell 88E1510) */
> +	ret = phy_register_fixup_for_uid(MARVELL_PHY_ID_88E1510,
> +					 MARVELL_PHY_ID_MASK,
> +					 hclge_phy_marvell_fixup);
> +	/* we can live without it, so just issue a warning */
> +	if (ret)
> +		dev_warn(&hdev->pdev->dev,
> +			 "Cannot register PHY board fixup\n");

You don't need to register a fixup for passing your flags, you can do
that at the time you attach to the PHY:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/phy.h#n945


> +
>  	return 0;
>  }
>  
> @@ -240,6 +257,8 @@ void hclge_mac_disconnect_phy(struct hnae3_handle *handle)
>  	if (!phydev)
>  		return;
>  
> +	phy_unregister_fixup_for_uid(MARVELL_PHY_ID_88E1510,
> +				     MARVELL_PHY_ID_MASK);
>  	phy_disconnect(phydev);
>  }
>  
> 

-- 
Florian


* Re: [PATCH net-next 1/2] net: phy: marvell: add new m88e1510 LED configuration
From: Florian Fainelli @ 2019-02-14  4:06 UTC (permalink / raw)
  To: Jian Shen, andrew, hkallweit1, davem; +Cc: netdev, linux-kernel, linuxarm
In-Reply-To: <1550118667-119947-2-git-send-email-shenjian15@huawei.com>



On 2/13/2019 8:31 PM, Jian Shen wrote:
> The default m88e1510 LED configuration is 0x1177, which uses LED[0]
> for 1000M link, LED[1] for 100M link, and LED[2] for activity.
> But for our boards, we want to use 0x1040, which uses LED[0] for
> link, and LED[1] for activity.
> 
> This patch adds a new m88e1510 LED configuration for it.

There appears to be a precedent with the DNS323 flag that was defined
for the same purpose, but this unfortunately does not scale: we cannot
have every new platform come up with its own LED configuration without
having a more structured approach to representing the LED configuration.

Maybe we can encode the desired LED behavior in a more generic way and
utilize the 32 flag bits available to denote a selection, e.g.:

MARVELL_PHY_FLAG_LED0_100M	BIT(3)
MARVELL_PHY_FLAG_LED0_1000M	BIT(4)

etc.

or maybe even better would be to expose the LEDs using the standard LEDs
class subsystem and allow configuring different triggers. We have some
amount of support for PHY LEDs already in tree, but AFAIR what we do not
have support for is a "hardware blinking" trigger which those LEDs are.
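
As a rough sketch of the first suggestion (generic per-LED flag bits), the
userspace C below decodes a dev_flags word into a per-LED description. The
two MARVELL_PHY_FLAG_LED0_* names and bit positions are the ones proposed
above; MODEL_BIT, MODEL_PHY_FLAG_LED1_ACTIVITY, struct model_led_cfg and
model_decode_led_flags() are hypothetical additions for illustration, and
no mapping to actual register values is implied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_BIT(n) (1u << (n))

/* Existing dev_flags values quoted in the patch below. */
#define MARVELL_PHY_M1145_FLAGS_RESISTANCE 0x00000001u
#define MARVELL_PHY_M1118_DNS323_LEDS      0x00000002u

/* Generic LED selection bits as proposed in this message. */
#define MARVELL_PHY_FLAG_LED0_100M   MODEL_BIT(3)
#define MARVELL_PHY_FLAG_LED0_1000M  MODEL_BIT(4)
/* Hypothetical extra bit, added only to round out the example. */
#define MODEL_PHY_FLAG_LED1_ACTIVITY MODEL_BIT(5)

#define MODEL_LED_FLAGS_MASK (MARVELL_PHY_FLAG_LED0_100M | \
			      MARVELL_PHY_FLAG_LED0_1000M | \
			      MODEL_PHY_FLAG_LED1_ACTIVITY)

/* Hypothetical decoded view of the requested LED behaviour. */
struct model_led_cfg {
	bool led0_100m;
	bool led0_1000m;
	bool led1_activity;
};

static struct model_led_cfg model_decode_led_flags(uint32_t dev_flags)
{
	struct model_led_cfg cfg;

	/* The generic bits must not collide with the existing flags. */
	assert((MODEL_LED_FLAGS_MASK &
		(MARVELL_PHY_M1145_FLAGS_RESISTANCE |
		 MARVELL_PHY_M1118_DNS323_LEDS)) == 0);

	cfg.led0_100m = (dev_flags & MARVELL_PHY_FLAG_LED0_100M) != 0;
	cfg.led0_1000m = (dev_flags & MARVELL_PHY_FLAG_LED0_1000M) != 0;
	cfg.led1_activity = (dev_flags & MODEL_PHY_FLAG_LED1_ACTIVITY) != 0;
	return cfg;
}
```

The point of such an encoding is that a MAC driver describes the behaviour
it wants per LED, and the PHY driver decides how to program it, instead of
each platform adding its own board-specific flag and register constant.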

> 
> Signed-off-by: Jian Shen <shenjian15@huawei.com>
> ---
>  drivers/net/phy/marvell.c   | 22 +++++++++++++++++++++-
>  include/linux/marvell_phy.h |  1 +
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> index 3ccba37..c195286 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -128,6 +128,10 @@
>  #define MII_PHY_LED_CTRL	        16
>  #define MII_88E1121_PHY_LED_DEF		0x0030
>  #define MII_88E1510_PHY_LED_DEF		0x1177
> +#define MII_88E1510_PHY_HNS3_LED_DEF	0x1040
> +
> +#define MII_88E1510_PHY_LED_POLARITY_CTRL	0x11
> +#define MII_88E1510_PHY_HNS3_LED_POLARITY	0x4415
>  
>  #define MII_M1011_PHY_STATUS		0x11
>  #define MII_M1011_PHY_STATUS_1000	0x8000
> @@ -619,12 +623,19 @@ static void marvell_config_led(struct phy_device *phydev)
>  		def_config = MII_88E1121_PHY_LED_DEF;
>  		break;
>  	/* Default PHY LED config:
> +	 * For hns3:
> +	 * LED[0] .. Link
> +	 * LED[1] .. Activity
> +	 * For others:
>  	 * LED[0] .. 1000Mbps Link
>  	 * LED[1] .. 100Mbps Link
>  	 * LED[2] .. Blink, Activity
>  	 */
>  	case MARVELL_PHY_FAMILY_ID(MARVELL_PHY_ID_88E1510):
> -		def_config = MII_88E1510_PHY_LED_DEF;
> +		if (phydev->dev_flags & MARVELL_PHY_M1510_HNS3_LEDS)
> +			def_config = MII_88E1510_PHY_HNS3_LED_DEF;
> +		else
> +			def_config = MII_88E1510_PHY_LED_DEF;
>  		break;
>  	default:
>  		return;
> @@ -634,6 +645,15 @@ static void marvell_config_led(struct phy_device *phydev)
>  			      def_config);
>  	if (err < 0)
>  		phydev_warn(phydev, "Fail to config marvell phy LED.\n");
> +
> +	if (phydev->dev_flags & MARVELL_PHY_M1510_HNS3_LEDS) {
> +		err = phy_write_paged(phydev, MII_MARVELL_LED_PAGE,
> +				      MII_88E1510_PHY_LED_POLARITY_CTRL,
> +				      MII_88E1510_PHY_HNS3_LED_POLARITY);
> +		if (err < 0)
> +			phydev_warn(phydev,
> +				    "Fail to config marvell phy LED polarity.\n");
> +	}
>  }
>  
>  static int marvell_config_init(struct phy_device *phydev)
> diff --git a/include/linux/marvell_phy.h b/include/linux/marvell_phy.h
> index 1eb6f24..99e0bbb 100644
> --- a/include/linux/marvell_phy.h
> +++ b/include/linux/marvell_phy.h
> @@ -32,5 +32,6 @@
>  /* struct phy_device dev_flags definitions */
>  #define MARVELL_PHY_M1145_FLAGS_RESISTANCE	0x00000001
>  #define MARVELL_PHY_M1118_DNS323_LEDS		0x00000002
> +#define MARVELL_PHY_M1510_HNS3_LEDS		0x00000004
>  
>  #endif /* _MARVELL_PHY_H */
> 

-- 
Florian

* Re: [PATCH 2/2] doc: add phylink documentation to the networking book
From: Randy Dunlap @ 2019-02-14  4:00 UTC (permalink / raw)
  To: Russell King, linux-doc, netdev; +Cc: David S. Miller, Jonathan Corbet
In-Reply-To: <E1gr376-0007ea-NV@rmk-PC.armlinux.org.uk>

On 2/5/19 7:58 AM, Russell King wrote:
> Add some phylink documentation to the networking book detailing how
> to convert network drivers from phylib to phylink.
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> ---
> Version 2 adds the "Modes of operation" section, as it appears mvpp2 is
> non-conformant (which is, unfortunately, causing problems in certain
> circumstances.)
> 
>  Documentation/networking/index.rst       |   1 +
>  Documentation/networking/sfp-phylink.rst | 268 +++++++++++++++++++++++++++++++
>  2 files changed, 269 insertions(+)
>  create mode 100644 Documentation/networking/sfp-phylink.rst
> 

> diff --git a/Documentation/networking/sfp-phylink.rst b/Documentation/networking/sfp-phylink.rst
> new file mode 100644
> index 000000000000..78a577c9d8a3
> --- /dev/null
> +++ b/Documentation/networking/sfp-phylink.rst
> @@ -0,0 +1,268 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=======
> +phylink
> +=======
> +
> +Overview
> +========
> +
> +phylink is a mechanism to support hot-pluggable networking modules
> +without needing to re-initialise the adapter on hot-plug events.
> +
> +phylink supports conventional phylib-based setups, fixed link setups
> +and SFP modules at present.

Please tell what SFP means.
It would also be nice if net/phy/Kconfig told what SFP means.

> +
> +Modes of operation
> +==================
> +
> +phylink has several modes of operation, which depend on the firmware
> +settings.
> +
> +1. PHY mode
> +
> +   In PHY mode, we use phylib to read the current link settings from
> +   the PHY, and pass them to the MAC driver.  We expect the MAC driver
> +   to configure exactly the modes that are specified without any
> +   negotiation being enabled on the link.
> +
> +2. Fixed mode
> +
> +   Fixed mode is the same as PHY mode as far as the MAC driver is
> +   concerned.
> +
> +3. In-band mode
> +
> +   In-band mode is used with 802.3z, SGMII and similar interface modes

should "with" be "when"?

> +   are used, and we are expecting to use the and honor the in-band

eh?                                  ^^^^^^^^^^^^^^^^^^^^

> +   negotiation or control word sent across the serdes channel.
> +
> +By example, what this means is that:
> +
> +.. code-block:: none
> +
> +  &eth {
> +    phy = <&phy>;
> +    phy-mode = "sgmii";
> +  };
> +
> +does not use in-band SGMII signalling.  The PHY is expected to follow
> +exactly the settings given to it in its :c:func:`mac_config` function.
> +The link should be forced up or down appropriately in the
> +:c:func:`mac_link_up` and :c:func:`mac_link_down` functions.
> +
> +.. code-block:: none
> +
> +  &eth {
> +    managed = "in-band-status";
> +    phy = <&phy>;
> +    phy-mode = "sgmii";
> +  };
> +
> +uses in-band mode, where results from the PHYs negotiation are passed

                                             PHY's

> +to the MAC through the SGMII control word, and the MAC is expected to
> +acknowledge the control word.  The :c:func:`mac_link_up` and
> +:c:func:`mac_link_down` functions must not force the MAC side link
> +up and down.
> +
> +Rough guide to converting a network driver to sfp/phylink
> +=========================================================
> +
> +This guide briefly describes how to convert a network driver from
> +phylib to the sfp/phylink support.  Please send patches to improve
> +this documentation.
> +
> +1. Optionally split the network driver's phylib update function into
> +   three parts dealing with link-down, link-up and reconfiguring the
> +   MAC settings. This can be done as a separate preparation commit.
> +
> +   An example of this preparation can be found in git commit fc548b991fb0.
> +
> +2. Replace::
> +
> +	select FIXED_PHY
> +	select PHYLIB
> +
> +   with::
> +
> +	select PHYLINK
> +
> +   in the driver's Kconfig stanza.
> +
> +3. Add::
> +
> +	#include <linux/phylink.h>
> +
> +   to the driver's list of header files.
> +
> +4. Add::
> +
> +	struct phylink *phylink;
> +
> +   to the driver's private data structure.  We shall refer to the
> +   driver's private data pointer as ``priv`` below, and the driver's
> +   private data structure as ``struct foo_priv``.
> +
> +5. Replace the following functions:
> +
> +   .. flat-table::
> +    :header-rows: 1
> +    :widths: 1 1
> +    :stub-columns: 0
> +
> +    * - Original function
> +      - Replacement function
> +    * - phy_start(phydev)
> +      - phylink_start(priv->phylink)
> +    * - phy_stop(phydev)
> +      - phylink_stop(priv->phylink)
> +    * - phy_mii_ioctl(phydev, ifr, cmd)
> +      - phylink_mii_ioctl(priv->phylink, ifr, cmd)
> +    * - phy_ethtool_get_wol(phydev, wol)
> +      - phylink_ethtool_get_wol(priv->phylink, wol)
> +    * - phy_ethtool_set_wol(phydev, wol)
> +      - phylink_ethtool_set_wol(priv->phylink, wol)
> +    * - phy_disconnect(phydev)
> +      - phylink_disconnect_phy(priv->phylink)
> +
> +   Please note that some of these functions must be called under the
> +   rtnl lock, and will warn if not. This will normally be the case,
> +   except if these are called from the driver suspend/resume paths.
> +
> +6. Add/replace ksettings get/set methods with:
> +
> +   .. code-block:: c
> +
> +    static int foo_ethtool_set_link_ksettings(struct net_device *dev,
> +					     const struct ethtool_link_ksettings *cmd)
> +    {
> +	struct foo_priv *priv = netdev_priv(dev);
> +
> +	return phylink_ethtool_ksettings_set(priv->phylink, cmd);
> +    }
> +
> +    static int foo_ethtool_get_link_ksettings(struct net_device *dev,
> +					     struct ethtool_link_ksettings *cmd)
> +    {
> +	struct foo_priv *priv = netdev_priv(dev);
> +
> +	return phylink_ethtool_ksettings_get(priv->phylink, cmd);
> +    }
> +
> +7. Replace the call to:
> +
> +	phy_dev = of_phy_connect(dev, node, link_func, flags, phy_interface)

add ending ';' above.

> +
> +   and associated code with a call to:
> +
> +	err = phylink_of_phy_connect(priv->phylink, node, flags)

ditto.

> +
> +   For the most part, ``flags`` can be zero, these flags are passed to

                                          zero;

> +   the of_phy_attach() inside this function call if a PHY is specified
> +   in the DT node ``node``.
> +
> +   ``node`` should be the DT node which contains the network phy property,
> +   fixed link properties, and will also contain the sfp property.
> +
> +   The setup of fixed links should also be removed; these are handled
> +   natively by phylink.

      internally?

> +
> +   of_phy_connect() was also passed a function pointer for link updates.
> +   This function is replaced by a different form of MAC updates
> +   described below in (8).
> +
> +   Manipulation of the PHY's supported/advertised happens within phylink

                          PHYs

> +   based on the validate callback, see below in (8).
> +
> +   Note that the driver no longer needs to store the ``phy_interface``,
> +   and also note that ``phy_interface`` becomes a dynamic property,
> +   just like the speed, duplex etc settings.

                                  etc.

> +
> +   Finally, note that the MAC driver has no direct access to the PHY
> +   anymore; that is because in the phylink model, the PHY can be
> +   dynamic.
> +
> +8. Add a :c:type:`struct phylink_mac_ops <phylink_mac_ops>` instance to
> +   the driver, which is a table of function pointers, and implement
> +   these functions. The old link update function for
> +   :c:func:`of_phy_connect` becomes three methods: :c:func:`mac_link_up`,
> +   :c:func:`mac_link_down`, and :c:func:`mac_config`. If step 1 was
> +   performed, then the functionality will have been split there.
> +
> +   It is important that if in-band negotiation is used,
> +   :c:func:`mac_link_up` and :c:func:`mac_link_down` do not prevent the
> +   in-band negotiation from completing, since these functions are called
> +   when the in-band link state changes - otherwise the link will never
> +   come up.
> +
> +   The :c:func:`validate` method should mask the supplied supported mask,
> +   and ``state->advertising`` with the supported ethtool link modes.
> +   These are the new ethtool link modes, so bitmask operations must be
> +   used. For an example, see drivers/net/ethernet/marvell/mvneta.c.
> +
> +   The :c:func:`mac_link_state` method is used to read the link state
> +   from the MAC, and report back the settings that the MAC is currently
> +   using. This is particularly important for in-band negotiation
> +   methods such as 1000base-X and SGMII.
> +
> +   The :c:func:`mac_config` method is used to update the MAC with the
> +   requested state, and must avoid unnecessarily taking the link down
> +   when making changes to the MAC configuration.  This means the
> +   function should modify the state and only take the link down when
> +   absolutely necessary to change the MAC configuration.  An example
> +   of how to do this can be found in :c:func:`mvneta_mac_config` in
> +   drivers/net/ethernet/marvell/mvneta.c.
> +
> +   For further information on these methods, please see the inline
> +   documentation in :c:type:`struct phylink_mac_ops <phylink_mac_ops>`.
> +
> +9. Remove calls to of_parse_phandle() for the PHY,
> +   of_phy_register_fixed_link() for fixed links etc from the probe

                                                   etc.

> +   function, and replace with:
> +
> +   .. code-block:: c
> +
> +	struct phylink *phylink;
> +
> +	phylink = phylink_create(dev, node, phy_mode, &phylink_ops);
> +	if (IS_ERR(phylink)) {
> +		err = PTR_ERR(phylink);
> +		fail probe;
> +	}
> +
> +	priv->phylink = phylink;
> +
> +   and arrange to destroy the phylink in the probe failure path as
> +   appropriate and the removal path too by calling:
> +
> +   .. code-block:: c
> +
> +	phylink_destroy(priv->phylink);
> +
> +10. Arrange for MAC link state interrupts to be forwarded into
> +    phylink, via:
> +
> +    .. code-block:: c
> +
> +	phylink_mac_change(priv->phylink, link_is_up);
> +
> +    where ``link_is_up`` is true if the link is currently up or false
> +    otherwise.
> +
> +11. Verify that the driver does not call::
> +
> +	netif_carrier_on()
> +	netif_carrier_off()
> +
> +   as these will interfere with phylink's tracking of the link state,
> +   and cause phylink to omit calls via the :c:func:`mac_link_up` and
> +   :c:func:`mac_link_down` methods.
> +
> +Network drivers should call phylink_stop() and phylink_start() via their
> +suspend/resume paths, which ensures that the appropriate
> +:c:type:`struct phylink_mac_ops <phylink_mac_ops>` methods are called
> +as necessary.
> +
> +For information describing the SFP cage in DT, please see the binding
> +documentation in the kernel source tree
> +``Documentation/devicetree/bindings/net/sff,sfp.txt``

Oh, so SFP means "Small Form-factor Pluggable".

I see that this source file:
./drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1902:

seems to imply that SFP means "single function per port (SFP) mode":

	dev_err(&pf->pdev->dev,
		"VF %d requested polling mode: this feature is supported only when the device is running in single function per port (SFP) mode\n",
		 vf->vf_id);


Good job overall.  Thanks.

-- 
~Randy

* [PATCH net-next 2/2] net: hns3: add fixup handle for hns3 driver
From: Jian Shen @ 2019-02-14  4:31 UTC (permalink / raw)
  To: andrew, f.fainelli, hkallweit1, davem; +Cc: netdev, linux-kernel, linuxarm
In-Reply-To: <1550118667-119947-1-git-send-email-shenjian15@huawei.com>

The default LED configuration of the Marvell 88E1510 does not suit the
hns3 driver. This patch registers a PHY fixup to change it.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
---
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c   | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
index 84f2878..4c8346e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c
@@ -2,6 +2,7 @@
 // Copyright (c) 2016-2017 Hisilicon Limited.
 
 #include <linux/etherdevice.h>
+#include <linux/marvell_phy.h>
 #include <linux/kernel.h>
 
 #include "hclge_cmd.h"
@@ -125,6 +126,13 @@ static int hclge_mdio_read(struct mii_bus *bus, int phyid, int regnum)
 	return le16_to_cpu(mdio_cmd->data_rd);
 }
 
+static int hclge_phy_marvell_fixup(struct phy_device *phydev)
+{
+	phydev->dev_flags |= MARVELL_PHY_M1510_HNS3_LEDS;
+
+	return 0;
+}
+
 int hclge_mac_mdio_config(struct hclge_dev *hdev)
 {
 	struct hclge_mac *mac = &hdev->hw.mac;
@@ -168,6 +176,15 @@ int hclge_mac_mdio_config(struct hclge_dev *hdev)
 	mac->phydev = phydev;
 	mac->mdio_bus = mdio_bus;
 
+	/* register the PHY board fixup (for Marvell 88E1510) */
+	ret = phy_register_fixup_for_uid(MARVELL_PHY_ID_88E1510,
+					 MARVELL_PHY_ID_MASK,
+					 hclge_phy_marvell_fixup);
+	/* we can live without it, so just issue a warning */
+	if (ret)
+		dev_warn(&hdev->pdev->dev,
+			 "Cannot register PHY board fixup\n");
+
 	return 0;
 }
 
@@ -240,6 +257,8 @@ void hclge_mac_disconnect_phy(struct hnae3_handle *handle)
 	if (!phydev)
 		return;
 
+	phy_unregister_fixup_for_uid(MARVELL_PHY_ID_88E1510,
+				     MARVELL_PHY_ID_MASK);
 	phy_disconnect(phydev);
 }
 
-- 
1.9.1


* [PATCH net-next 1/2] net: phy: marvell: add new m88e1510 LED configuration
From: Jian Shen @ 2019-02-14  4:31 UTC (permalink / raw)
  To: andrew, f.fainelli, hkallweit1, davem; +Cc: netdev, linux-kernel, linuxarm
In-Reply-To: <1550118667-119947-1-git-send-email-shenjian15@huawei.com>

The default m88e1510 LED configuration is 0x1177, which uses LED[0]
for 1000M link, LED[1] for 100M link, and LED[2] for activity.
For our boards, we want to use 0x1040 instead, which uses LED[0] for
link and LED[1] for activity.

This patch adds a new m88e1510 LED configuration for that purpose.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
---
 drivers/net/phy/marvell.c   | 22 +++++++++++++++++++++-
 include/linux/marvell_phy.h |  1 +
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 3ccba37..c195286 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -128,6 +128,10 @@
 #define MII_PHY_LED_CTRL	        16
 #define MII_88E1121_PHY_LED_DEF		0x0030
 #define MII_88E1510_PHY_LED_DEF		0x1177
+#define MII_88E1510_PHY_HNS3_LED_DEF	0x1040
+
+#define MII_88E1510_PHY_LED_POLARITY_CTRL	0x11
+#define MII_88E1510_PHY_HNS3_LED_POLARITY	0x4415
 
 #define MII_M1011_PHY_STATUS		0x11
 #define MII_M1011_PHY_STATUS_1000	0x8000
@@ -619,12 +623,19 @@ static void marvell_config_led(struct phy_device *phydev)
 		def_config = MII_88E1121_PHY_LED_DEF;
 		break;
 	/* Default PHY LED config:
+	 * For hns3:
+	 * LED[0] .. Link
+	 * LED[1] .. Activity
+	 * For others:
 	 * LED[0] .. 1000Mbps Link
 	 * LED[1] .. 100Mbps Link
 	 * LED[2] .. Blink, Activity
 	 */
 	case MARVELL_PHY_FAMILY_ID(MARVELL_PHY_ID_88E1510):
-		def_config = MII_88E1510_PHY_LED_DEF;
+		if (phydev->dev_flags & MARVELL_PHY_M1510_HNS3_LEDS)
+			def_config = MII_88E1510_PHY_HNS3_LED_DEF;
+		else
+			def_config = MII_88E1510_PHY_LED_DEF;
 		break;
 	default:
 		return;
@@ -634,6 +645,15 @@ static void marvell_config_led(struct phy_device *phydev)
 			      def_config);
 	if (err < 0)
 		phydev_warn(phydev, "Fail to config marvell phy LED.\n");
+
+	if (phydev->dev_flags & MARVELL_PHY_M1510_HNS3_LEDS) {
+		err = phy_write_paged(phydev, MII_MARVELL_LED_PAGE,
+				      MII_88E1510_PHY_LED_POLARITY_CTRL,
+				      MII_88E1510_PHY_HNS3_LED_POLARITY);
+		if (err < 0)
+			phydev_warn(phydev,
+				    "Fail to config marvell phy LED polarity.\n");
+	}
 }
 
 static int marvell_config_init(struct phy_device *phydev)
diff --git a/include/linux/marvell_phy.h b/include/linux/marvell_phy.h
index 1eb6f24..99e0bbb 100644
--- a/include/linux/marvell_phy.h
+++ b/include/linux/marvell_phy.h
@@ -32,5 +32,6 @@
 /* struct phy_device dev_flags definitions */
 #define MARVELL_PHY_M1145_FLAGS_RESISTANCE	0x00000001
 #define MARVELL_PHY_M1118_DNS323_LEDS		0x00000002
+#define MARVELL_PHY_M1510_HNS3_LEDS		0x00000004
 
 #endif /* _MARVELL_PHY_H */
-- 
1.9.1


* [PATCH net-next 0/2] net: phy: add new led configuration for marvell m88e1510
From: Jian Shen @ 2019-02-14  4:31 UTC (permalink / raw)
  To: andrew, f.fainelli, hkallweit1, davem; +Cc: netdev, linux-kernel, linuxarm

Currently, the m88e1510 PHY driver uses LED[0] and LED[1] for link and
LED[2] for activity. This is incompatible with some boards, which use
LED[1] for activity. This patchset adds a new LED configuration for
the HNS3 driver.

Jian Shen (2):
  net: phy: marvell: add new m88e1510 LED configuration
  net: hns3: add fixup handle for hns3 driver

 .../ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c    | 19 +++++++++++++++++++
 drivers/net/phy/marvell.c                          | 22 +++++++++++++++++++++-
 include/linux/marvell_phy.h                        |  1 +
 3 files changed, 41 insertions(+), 1 deletion(-)

-- 
1.9.1


* Re: [PATCH bpf-next v11 0/7] bpf: add BPF_LWT_ENCAP_IP option to bpf_lwt_push_encap
From: David Ahern @ 2019-02-14  3:44 UTC (permalink / raw)
  To: Peter Oskolkov
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann, netdev,
	Peter Oskolkov, Willem de Bruijn
In-Reply-To: <20190214023916.fu6ymperb4lqi632@ast-mbp>

On 2/13/19 7:39 PM, Alexei Starovoitov wrote:
> On Wed, Feb 13, 2019 at 05:46:26PM -0700, David Ahern wrote:
>> On 2/13/19 12:53 PM, Peter Oskolkov wrote:
>>> This patchset implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
>>> BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
>>> and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
>>> to packets (e.g. IP/GRE, GUE, IPIP).
>>>
>>> This is useful when thousands of different short-lived flows should be
>>> encapped, each with different and dynamically determined destination.
>>> Although lwtunnels can be used in some of these scenarios, the ability
>>> to dynamically generate encap headers adds more flexibility, e.g.
>>> when routing depends on the state of the host (reflected in global bpf
>>> maps).
>>>
>>
>>
>> For the set:
>> Reviewed-by: David Ahern <dsahern@gmail.com>
> 
> Applied. Thanks everyone!
> 

Looks like a cleanup round is needed.

I changed the routes to fail with unreachable:

@@ -179,16 +175,16 @@
 	ip -netns ${NS3} tunnel add gre_dev mode gre remote ${IPv4_1} local ${IPv4_GRE} ttl 255
 	ip -netns ${NS3} link set gre_dev up
 	ip -netns ${NS3} addr add ${IPv4_GRE} dev gre_dev
-	ip -netns ${NS1} route add ${IPv4_GRE}/32 dev veth5 via ${IPv4_6}
-	ip -netns ${NS2} route add ${IPv4_GRE}/32 dev veth7 via ${IPv4_8}
+	ip -netns ${NS1} route add unreachable ${IPv4_GRE}/32
+	ip -netns ${NS2} route add unreachable ${IPv4_GRE}/32


 	# configure IPv6 GRE device in NS3, and a route to it via the "bottom" route
 	ip -netns ${NS3} -6 tunnel add name gre6_dev mode ip6gre remote ${IPv6_1} local ${IPv6_GRE} ttl 255
 	ip -netns ${NS3} link set gre6_dev up
 	ip -netns ${NS3} -6 addr add ${IPv6_GRE} nodad dev gre6_dev
-	ip -netns ${NS1} -6 route add ${IPv6_GRE}/128 dev veth5 via ${IPv6_6}
-	ip -netns ${NS2} -6 route add ${IPv6_GRE}/128 dev veth7 via ${IPv6_8}
+	ip -netns ${NS1} -6 route add unreachable ${IPv6_GRE}/128
+	ip -netns ${NS2} -6 route add unreachable ${IPv6_GRE}/128

 	# rp_filter gets confused by what these tests are doing, so disable it
 	ip netns exec ${NS1} sysctl -wq net.ipv4.conf.all.rp_filter=0
@@ -220,7 +216,6 @@


and then removed all of the set -e and exit 1's in the script (really
should let all of the tests run versus bailing on the first failure).
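
To make the "let all of the tests run" idea concrete, here is a rough
sketch of one way a script could record failures and keep going. It is
not taken from the selftest itself: run_test, TESTS_RUN, TESTS_FAILED
and the test names are hypothetical, and "true"/"false" stand in for
real test commands.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual selftest: record each failure and
# keep going instead of relying on "set -e" / "exit 1".

TESTS_RUN=0
TESTS_FAILED=0

# run_test NAME COMMAND [ARGS...]: run one test and tally the result.
run_test()
{
	name="$1"
	shift
	TESTS_RUN=$((TESTS_RUN + 1))
	if "$@"; then
		echo "PASS: ${name}"
	else
		echo "FAIL: ${name}"
		TESTS_FAILED=$((TESTS_FAILED + 1))
	fi
}

# "true" and "false" are placeholders for real test commands.
run_test "egress IPv4" true
run_test "egress IPv6" false
run_test "ingress IPv4" true

echo "${TESTS_FAILED} of ${TESTS_RUN} tests failed"
```

A real script would presumably finish by exiting non-zero when
TESTS_FAILED is not 0, so that the caller still sees the failure after
every test has had a chance to run.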

With kmemleak enabled I see a lot of suspected memory leaks - some may
not be related to this change but it is triggering the suspected leak:


unreferenced object 0xffff88813407a9c0 (size 160):
  comm "ping", pid 1040, jiffies 4294800240 (age 130.536s)
  hex dump (first 32 bytes):
    00 60 23 28 81 88 ff ff 80 d7 23 82 ff ff ff ff  .`#(......#.....
    c1 7f c8 81 ff ff ff ff 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<00000000dec307f3>] dst_alloc+0x89/0xc1
    [<0000000037c7c09a>] rt_dst_alloc+0x57/0xd4
    [<00000000850d146d>] ip_route_output_key_hash_rcu+0x57a/0x64d
    [<0000000059f3e271>] ip_route_output_key_hash+0x6e/0x98
    [<0000000093465e72>] ip_route_output_flow+0x1e/0x47
    [<000000007eee78d9>] raw_sendmsg+0x551/0xbd8
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff8881280bdf00 (size 224):
  comm "ping", pid 1040, jiffies 4294800240 (age 130.536s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 80 96 27 81 88 ff ff 40 da a2 27 81 88 ff ff  ...'....@..'....
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<0000000043c55b9b>] __alloc_skb+0x66/0x1b9
    [<00000000e92d2e81>] __ip_append_data+0x44f/0xa88
    [<00000000c0fa4285>] ip_append_data.part.19+0xa4/0xb7
    [<000000008437d83b>] ip_append_data+0x22/0x28
    [<0000000010065ae2>] raw_sendmsg+0xaff/0xbd8
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff888127ad6c00 (size 1024):
  comm "ping", pid 1040, jiffies 4294800240 (age 130.545s)
  hex dump (first 32 bytes):
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<000000006cd7857e>] __kmalloc_track_caller+0xfe/0x13f
    [<00000000ebec8a26>] __kmalloc_reserve.isra.17+0x2d/0x6d
    [<00000000827ebff0>] pskb_expand_head+0xcc/0x2d1
    [<0000000034bfc15f>] skb_cow_head+0xae/0xb7
    [<000000008f8c30fc>] bpf_lwt_push_ip_encap+0xb1/0x34c
    [<000000007add911b>] bpf_lwt_xmit_push_encap+0x1d/0x29
    [<00000000eecbf798>] ___bpf_prog_run+0xbc3/0x1757
    [<000000002d199add>] __bpf_prog_run32+0x42/0x58
    [<00000000d0c1f29b>] run_lwt_bpf.constprop.4+0xff/0x2e6
    [<00000000b9ad5b04>] bpf_xmit+0x3d/0xef
    [<000000003ba220bf>] lwtunnel_xmit+0xc7/0xeb
    [<00000000abfa977f>] ip_finish_output2+0x5b6/0x5e7
    [<000000005a652edf>] ip_finish_output+0x17f/0x191
    [<00000000cded5fe2>] ip_output+0x58/0x88
unreferenced object 0xffff888131020f00 (size 224):
  comm "ping", pid 1040, jiffies 4294800240 (age 130.545s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 80 96 27 81 88 ff ff 40 da a2 27 81 88 ff ff  ...'....@..'....
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<0000000043c55b9b>] __alloc_skb+0x66/0x1b9
    [<00000000e92d2e81>] __ip_append_data+0x44f/0xa88
    [<00000000c0fa4285>] ip_append_data.part.19+0xa4/0xb7
    [<000000008437d83b>] ip_append_data+0x22/0x28
    [<0000000010065ae2>] raw_sendmsg+0xaff/0xbd8
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff88812779cc00 (size 1024):
  comm "ping", pid 1040, jiffies 4294800240 (age 130.545s)
  hex dump (first 32 bytes):
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<000000006cd7857e>] __kmalloc_track_caller+0xfe/0x13f
    [<00000000ebec8a26>] __kmalloc_reserve.isra.17+0x2d/0x6d
    [<00000000827ebff0>] pskb_expand_head+0xcc/0x2d1
    [<0000000034bfc15f>] skb_cow_head+0xae/0xb7
    [<000000008f8c30fc>] bpf_lwt_push_ip_encap+0xb1/0x34c
    [<000000007add911b>] bpf_lwt_xmit_push_encap+0x1d/0x29
    [<00000000eecbf798>] ___bpf_prog_run+0xbc3/0x1757
    [<000000002d199add>] __bpf_prog_run32+0x42/0x58
    [<00000000d0c1f29b>] run_lwt_bpf.constprop.4+0xff/0x2e6
    [<00000000b9ad5b04>] bpf_xmit+0x3d/0xef
    [<000000003ba220bf>] lwtunnel_xmit+0xc7/0xeb
    [<00000000abfa977f>] ip_finish_output2+0x5b6/0x5e7
    [<000000005a652edf>] ip_finish_output+0x17f/0x191
    [<00000000cded5fe2>] ip_output+0x58/0x88
unreferenced object 0xffff888131abf980 (size 1632):
  comm "ping6", pid 1041, jiffies 4294801264 (age 129.529s)
  hex dump (first 32 bytes):
    00 00 00 00 7f 00 00 06 00 00 00 00 00 00 3a 00  ..............:.
    0a 00 07 41 00 00 00 00 00 00 00 00 00 00 00 00  ...A............
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<000000006287ee1f>] sk_prot_alloc.isra.27+0x30/0xb4
    [<00000000f5b12125>] sk_alloc+0x2e/0x1aa
    [<00000000c2b12d1b>] inet6_create+0x1ae/0x3a7
    [<000000000a3125dc>] __sock_create+0x1c1/0x22a
    [<00000000039d3cb8>] sock_create+0x30/0x32
    [<000000000556b08a>] __sys_socket+0x3d/0xb3
    [<000000007e47d085>] __x64_sys_socket+0x1a/0x1e
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff888127683f00 (size 224):
  comm "softirq", pid 0, jiffies 4294801264 (age 129.529s)
  hex dump (first 32 bytes):
    00 60 23 28 81 88 ff ff 00 2f 30 35 81 88 ff ff  .`#(...../05....
    c1 7f c8 81 ff ff ff ff 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<00000000dec307f3>] dst_alloc+0x89/0xc1
    [<000000002651f911>] ip6_dst_alloc+0x25/0x63
    [<000000008608d224>] ip6_pol_route+0x201/0x2ae
    [<000000000084a9eb>] ip6_pol_route_output+0x19/0x1b
    [<000000005d385680>] fib6_rule_lookup+0xe7/0x12c
    [<000000001a93c416>] ip6_route_output_flags+0xc5/0xd1
    [<0000000023cbe9f3>] ip6_dst_lookup_tail+0x1a3/0x364
    [<000000003afeb57e>] ip6_dst_lookup_flow+0x47/0x9b
    [<00000000938dbf6b>] rawv6_sendmsg+0x45f/0xdfc
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
unreferenced object 0xffff88812fb34f00 (size 224):
  comm "ping6", pid 1041, jiffies 4294801264 (age 129.530s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 80 96 27 81 88 ff ff 80 f9 ab 31 81 88 ff ff  ...'.......1....
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<0000000043c55b9b>] __alloc_skb+0x66/0x1b9
    [<0000000014f706bf>] __ip6_append_data+0x57c/0xc62
    [<0000000016ddc7e9>] ip6_append_data+0x135/0x148
    [<00000000210a1bd5>] rawv6_sendmsg+0xb19/0xdfc
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff8881276bdc00 (size 1024):
  comm "ping6", pid 1041, jiffies 4294801264 (age 129.537s)
  hex dump (first 32 bytes):
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<000000006cd7857e>] __kmalloc_track_caller+0xfe/0x13f
    [<00000000ebec8a26>] __kmalloc_reserve.isra.17+0x2d/0x6d
    [<00000000827ebff0>] pskb_expand_head+0xcc/0x2d1
    [<0000000034bfc15f>] skb_cow_head+0xae/0xb7
    [<000000008f8c30fc>] bpf_lwt_push_ip_encap+0xb1/0x34c
    [<000000007add911b>] bpf_lwt_xmit_push_encap+0x1d/0x29
    [<00000000eecbf798>] ___bpf_prog_run+0xbc3/0x1757
    [<000000002d199add>] __bpf_prog_run32+0x42/0x58
    [<00000000d0c1f29b>] run_lwt_bpf.constprop.4+0xff/0x2e6
    [<00000000b9ad5b04>] bpf_xmit+0x3d/0xef
    [<000000003ba220bf>] lwtunnel_xmit+0xc7/0xeb
    [<000000004c67bd98>] ip6_finish_output2+0x477/0x494
    [<00000000bf642298>] ip6_finish_output+0x106/0x110
    [<00000000365d3055>] ip6_output+0x87/0xbf
unreferenced object 0xffff8881278cbf00 (size 224):
  comm "ping6", pid 1041, jiffies 4294801264 (age 129.537s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 80 96 27 81 88 ff ff 80 f9 ab 31 81 88 ff ff  ...'.......1....
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<0000000043c55b9b>] __alloc_skb+0x66/0x1b9
    [<0000000014f706bf>] __ip6_append_data+0x57c/0xc62
    [<0000000016ddc7e9>] ip6_append_data+0x135/0x148
    [<00000000210a1bd5>] rawv6_sendmsg+0xb19/0xdfc
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff8881268fdc00 (size 1024):
  comm "ping6", pid 1041, jiffies 4294801264 (age 129.537s)
  hex dump (first 32 bytes):
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<000000006cd7857e>] __kmalloc_track_caller+0xfe/0x13f
    [<00000000ebec8a26>] __kmalloc_reserve.isra.17+0x2d/0x6d
    [<00000000827ebff0>] pskb_expand_head+0xcc/0x2d1
    [<0000000034bfc15f>] skb_cow_head+0xae/0xb7
    [<000000008f8c30fc>] bpf_lwt_push_ip_encap+0xb1/0x34c
    [<000000007add911b>] bpf_lwt_xmit_push_encap+0x1d/0x29
    [<00000000eecbf798>] ___bpf_prog_run+0xbc3/0x1757
    [<000000002d199add>] __bpf_prog_run32+0x42/0x58
    [<00000000d0c1f29b>] run_lwt_bpf.constprop.4+0xff/0x2e6
    [<00000000b9ad5b04>] bpf_xmit+0x3d/0xef
    [<000000003ba220bf>] lwtunnel_xmit+0xc7/0xeb
    [<000000004c67bd98>] ip6_finish_output2+0x477/0x494
    [<00000000bf642298>] ip6_finish_output+0x106/0x110
    [<00000000365d3055>] ip6_output+0x87/0xbf
unreferenced object 0xffff888127157e80 (size 128):
  comm "ip", pid 1157, jiffies 4294810718 (age 120.100s)
  hex dump (first 32 bytes):
    06 00 04 00 00 00 00 00 02 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<00000000bd21c202>] __kmalloc+0x102/0x143
    [<00000000ac289f37>] lwtunnel_state_alloc+0x1a/0x1c
    [<000000002acf5ea0>] bpf_build_state+0x8c/0x16a
    [<00000000e1c85c0f>] lwtunnel_build_state+0x10a/0x148
    [<00000000187eb239>] ip6_route_info_create+0x2ae/0x823
    [<0000000017a89b3a>] ip6_route_add+0x1a/0x4e
    [<00000000de590240>] inet6_rtm_newroute+0x62/0x80
    [<000000009356b68b>] rtnetlink_rcv_msg+0x22d/0x273
    [<000000003da74356>] netlink_rcv_skb+0x8b/0xd9
    [<000000005349126a>] rtnetlink_rcv+0x15/0x17
    [<00000000d24a54ac>] netlink_unicast+0x118/0x1b1
    [<0000000096e0cc4e>] netlink_sendmsg+0x328/0x34d
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<000000003872075e>] ___sys_sendmsg+0x1ad/0x238
unreferenced object 0xffff8881271d1180 (size 64):
  comm "ip", pid 1157, jiffies 4294810718 (age 120.100s)
  hex dump (first 32 bytes):
    74 65 73 74 5f 6c 77 74 5f 69 70 5f 65 6e 63 61  test_lwt_ip_enca
    70 2e 6f 3a 5b 65 6e 63 61 70 5f 67 72 65 36 5d  p.o:[encap_gre6]
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<000000006cd7857e>] __kmalloc_track_caller+0xfe/0x13f
    [<000000004a28667f>] kmemdup+0x20/0x35
    [<00000000f0fe083f>] bpf_parse_prog+0x77/0xc3
    [<000000005e53fa11>] bpf_build_state+0x108/0x16a
    [<00000000e1c85c0f>] lwtunnel_build_state+0x10a/0x148
    [<00000000187eb239>] ip6_route_info_create+0x2ae/0x823
    [<0000000017a89b3a>] ip6_route_add+0x1a/0x4e
    [<00000000de590240>] inet6_rtm_newroute+0x62/0x80
    [<000000009356b68b>] rtnetlink_rcv_msg+0x22d/0x273
    [<000000003da74356>] netlink_rcv_skb+0x8b/0xd9
    [<000000005349126a>] rtnetlink_rcv+0x15/0x17
    [<00000000d24a54ac>] netlink_unicast+0x118/0x1b1
    [<0000000096e0cc4e>] netlink_sendmsg+0x328/0x34d
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
unreferenced object 0xffff88813189ea40 (size 1432):
  comm "ping", pid 1159, jiffies 4294810742 (age 120.076s)
  hex dump (first 32 bytes):
    00 00 00 00 ac 10 01 64 00 00 00 00 00 00 01 00  .......d........
    02 00 07 41 00 00 00 00 00 00 00 00 00 00 00 00  ...A............
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<000000006287ee1f>] sk_prot_alloc.isra.27+0x30/0xb4
    [<00000000f5b12125>] sk_alloc+0x2e/0x1aa
    [<0000000048f46e3a>] inet_create+0x1ab/0x32e
    [<000000000a3125dc>] __sock_create+0x1c1/0x22a
    [<00000000039d3cb8>] sock_create+0x30/0x32
    [<000000000556b08a>] __sys_socket+0x3d/0xb3
    [<000000007e47d085>] __x64_sys_socket+0x1a/0x1e
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff8881270f2f00 (size 224):
  comm "ping", pid 1159, jiffies 4294810743 (age 120.084s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 30 af 26 81 88 ff ff 40 ea 89 31 81 88 ff ff  .0.&....@..1....
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<0000000043c55b9b>] __alloc_skb+0x66/0x1b9
    [<00000000e92d2e81>] __ip_append_data+0x44f/0xa88
    [<00000000c0fa4285>] ip_append_data.part.19+0xa4/0xb7
    [<000000008437d83b>] ip_append_data+0x22/0x28
    [<0000000010065ae2>] raw_sendmsg+0xaff/0xbd8
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff8881336dac00 (size 1024):
  comm "ping", pid 1159, jiffies 4294810743 (age 120.084s)
  hex dump (first 32 bytes):
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<000000006cd7857e>] __kmalloc_track_caller+0xfe/0x13f
    [<00000000ebec8a26>] __kmalloc_reserve.isra.17+0x2d/0x6d
    [<00000000827ebff0>] pskb_expand_head+0xcc/0x2d1
    [<0000000034bfc15f>] skb_cow_head+0xae/0xb7
    [<000000008f8c30fc>] bpf_lwt_push_ip_encap+0xb1/0x34c
    [<000000007add911b>] bpf_lwt_xmit_push_encap+0x1d/0x29
    [<00000000eecbf798>] ___bpf_prog_run+0xbc3/0x1757
    [<0000000055e881cc>] __bpf_prog_run64+0x42/0x58
    [<00000000d0c1f29b>] run_lwt_bpf.constprop.4+0xff/0x2e6
    [<00000000b9ad5b04>] bpf_xmit+0x3d/0xef
    [<000000003ba220bf>] lwtunnel_xmit+0xc7/0xeb
    [<00000000abfa977f>] ip_finish_output2+0x5b6/0x5e7
    [<000000005a652edf>] ip_finish_output+0x17f/0x191
    [<00000000cded5fe2>] ip_output+0x58/0x88
unreferenced object 0xffff8881283d0f00 (size 224):
  comm "ping", pid 1159, jiffies 4294810743 (age 120.084s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 30 af 26 81 88 ff ff 40 ea 89 31 81 88 ff ff  .0.&....@..1....
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<0000000043c55b9b>] __alloc_skb+0x66/0x1b9
    [<00000000e92d2e81>] __ip_append_data+0x44f/0xa88
    [<00000000c0fa4285>] ip_append_data.part.19+0xa4/0xb7
    [<000000008437d83b>] ip_append_data+0x22/0x28
    [<0000000010065ae2>] raw_sendmsg+0xaff/0xbd8
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff8881272edc00 (size 1024):
  comm "ping", pid 1159, jiffies 4294810743 (age 120.093s)
  hex dump (first 32 bytes):
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<000000006cd7857e>] __kmalloc_track_caller+0xfe/0x13f
    [<00000000ebec8a26>] __kmalloc_reserve.isra.17+0x2d/0x6d
    [<00000000827ebff0>] pskb_expand_head+0xcc/0x2d1
    [<0000000034bfc15f>] skb_cow_head+0xae/0xb7
    [<000000008f8c30fc>] bpf_lwt_push_ip_encap+0xb1/0x34c
    [<000000007add911b>] bpf_lwt_xmit_push_encap+0x1d/0x29
    [<00000000eecbf798>] ___bpf_prog_run+0xbc3/0x1757
    [<0000000055e881cc>] __bpf_prog_run64+0x42/0x58
    [<00000000d0c1f29b>] run_lwt_bpf.constprop.4+0xff/0x2e6
    [<00000000b9ad5b04>] bpf_xmit+0x3d/0xef
    [<000000003ba220bf>] lwtunnel_xmit+0xc7/0xeb
    [<00000000abfa977f>] ip_finish_output2+0x5b6/0x5e7
    [<000000005a652edf>] ip_finish_output+0x17f/0x191
    [<00000000cded5fe2>] ip_output+0x58/0x88
unreferenced object 0xffff88813355d980 (size 1632):
  comm "ping6", pid 1160, jiffies 4294811768 (age 119.068s)
  hex dump (first 32 bytes):
    00 00 00 00 7f 00 00 06 00 00 00 00 00 00 3a 00  ..............:.
    0a 00 07 41 00 00 00 00 00 00 00 00 00 00 00 00  ...A............
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<000000006287ee1f>] sk_prot_alloc.isra.27+0x30/0xb4
    [<00000000f5b12125>] sk_alloc+0x2e/0x1aa
    [<00000000c2b12d1b>] inet6_create+0x1ae/0x3a7
    [<000000000a3125dc>] __sock_create+0x1c1/0x22a
    [<00000000039d3cb8>] sock_create+0x30/0x32
    [<000000000556b08a>] __sys_socket+0x3d/0xb3
    [<000000007e47d085>] __x64_sys_socket+0x1a/0x1e
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff8881282b1f00 (size 224):
  comm "softirq", pid 0, jiffies 4294811768 (age 119.068s)
  hex dump (first 32 bytes):
    00 10 1c 28 81 88 ff ff 40 ee 25 28 81 88 ff ff  ...(....@.%(....
    c1 7f c8 81 ff ff ff ff 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<00000000dec307f3>] dst_alloc+0x89/0xc1
    [<000000002651f911>] ip6_dst_alloc+0x25/0x63
    [<000000008608d224>] ip6_pol_route+0x201/0x2ae
    [<000000000084a9eb>] ip6_pol_route_output+0x19/0x1b
    [<000000005d385680>] fib6_rule_lookup+0xe7/0x12c
    [<000000001a93c416>] ip6_route_output_flags+0xc5/0xd1
    [<0000000023cbe9f3>] ip6_dst_lookup_tail+0x1a3/0x364
    [<000000003afeb57e>] ip6_dst_lookup_flow+0x47/0x9b
    [<00000000938dbf6b>] rawv6_sendmsg+0x45f/0xdfc
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
unreferenced object 0xffff88812744bf00 (size 224):
  comm "ping6", pid 1160, jiffies 4294811768 (age 119.076s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 30 af 26 81 88 ff ff 80 d9 55 33 81 88 ff ff  .0.&......U3....
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<0000000043c55b9b>] __alloc_skb+0x66/0x1b9
    [<0000000014f706bf>] __ip6_append_data+0x57c/0xc62
    [<0000000016ddc7e9>] ip6_append_data+0x135/0x148
    [<00000000210a1bd5>] rawv6_sendmsg+0xb19/0xdfc
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff888127371c00 (size 1024):
  comm "ping6", pid 1160, jiffies 4294811769 (age 119.075s)
  hex dump (first 32 bytes):
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<000000006cd7857e>] __kmalloc_track_caller+0xfe/0x13f
    [<00000000ebec8a26>] __kmalloc_reserve.isra.17+0x2d/0x6d
    [<00000000827ebff0>] pskb_expand_head+0xcc/0x2d1
    [<0000000034bfc15f>] skb_cow_head+0xae/0xb7
    [<000000008f8c30fc>] bpf_lwt_push_ip_encap+0xb1/0x34c
    [<000000007add911b>] bpf_lwt_xmit_push_encap+0x1d/0x29
    [<00000000eecbf798>] ___bpf_prog_run+0xbc3/0x1757
    [<0000000055e881cc>] __bpf_prog_run64+0x42/0x58
    [<00000000d0c1f29b>] run_lwt_bpf.constprop.4+0xff/0x2e6
    [<00000000b9ad5b04>] bpf_xmit+0x3d/0xef
    [<000000003ba220bf>] lwtunnel_xmit+0xc7/0xeb
    [<000000004c67bd98>] ip6_finish_output2+0x477/0x494
    [<00000000bf642298>] ip6_finish_output+0x106/0x110
    [<00000000365d3055>] ip6_output+0x87/0xbf
unreferenced object 0xffff88812723cf00 (size 224):
  comm "softirq", pid 0, jiffies 4294811770 (age 119.074s)
  hex dump (first 32 bytes):
    00 10 1c 28 81 88 ff ff 40 ee 25 28 81 88 ff ff  ...(....@.%(....
    c1 7f c8 81 ff ff ff ff 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<00000000dec307f3>] dst_alloc+0x89/0xc1
    [<000000002651f911>] ip6_dst_alloc+0x25/0x63
    [<000000008608d224>] ip6_pol_route+0x201/0x2ae
    [<000000000084a9eb>] ip6_pol_route_output+0x19/0x1b
    [<000000005d385680>] fib6_rule_lookup+0xe7/0x12c
    [<000000001a93c416>] ip6_route_output_flags+0xc5/0xd1
    [<0000000023cbe9f3>] ip6_dst_lookup_tail+0x1a3/0x364
    [<000000003afeb57e>] ip6_dst_lookup_flow+0x47/0x9b
    [<00000000938dbf6b>] rawv6_sendmsg+0x45f/0xdfc
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
unreferenced object 0xffff8881273d3f00 (size 224):
  comm "ping6", pid 1160, jiffies 4294811770 (age 119.084s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 30 af 26 81 88 ff ff 80 d9 55 33 81 88 ff ff  .0.&......U3....
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<0000000025231f88>] kmem_cache_alloc+0xd8/0x1fa
    [<0000000043c55b9b>] __alloc_skb+0x66/0x1b9
    [<0000000014f706bf>] __ip6_append_data+0x57c/0xc62
    [<0000000016ddc7e9>] ip6_append_data+0x135/0x148
    [<00000000210a1bd5>] rawv6_sendmsg+0xb19/0xdfc
    [<00000000f564ad0b>] inet_sendmsg+0x3f/0x82
    [<00000000a0a71539>] sock_sendmsg_nosec+0x18/0x2f
    [<0000000025dbe598>] __sys_sendto+0x102/0x143
    [<000000000f989e54>] __x64_sys_sendto+0x28/0x2c
    [<00000000520e974d>] do_syscall_64+0x5c/0x6e
    [<00000000413f2b33>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
    [<000000005a49f8d9>] 0xffffffffffffffff
unreferenced object 0xffff88812825bc00 (size 1024):
  comm "ping6", pid 1160, jiffies 4294811771 (age 119.083s)
  hex dump (first 32 bytes):
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
    5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
  backtrace:
    [<00000000c4c19340>] kmemleak_alloc+0x70/0x94
    [<000000003251fc6a>] slab_post_alloc_hook+0x47/0x5c
    [<000000006cd7857e>] __kmalloc_track_caller+0xfe/0x13f
    [<00000000ebec8a26>] __kmalloc_reserve.isra.17+0x2d/0x6d
    [<00000000827ebff0>] pskb_expand_head+0xcc/0x2d1
    [<0000000034bfc15f>] skb_cow_head+0xae/0xb7
    [<000000008f8c30fc>] bpf_lwt_push_ip_encap+0xb1/0x34c
    [<000000007add911b>] bpf_lwt_xmit_push_encap+0x1d/0x29
    [<00000000eecbf798>] ___bpf_prog_run+0xbc3/0x1757
    [<0000000055e881cc>] __bpf_prog_run64+0x42/0x58
    [<00000000d0c1f29b>] run_lwt_bpf.constprop.4+0xff/0x2e6
    [<00000000b9ad5b04>] bpf_xmit+0x3d/0xef
    [<000000003ba220bf>] lwtunnel_xmit+0xc7/0xeb
    [<000000004c67bd98>] ip6_finish_output2+0x477/0x494
    [<00000000bf642298>] ip6_finish_output+0x106/0x110
    [<00000000365d3055>] ip6_output+0x87/0xbf
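
A report this long is easier to compare across test runs when reduced to
totals. The following is a rough C sketch, assuming the report text was
captured from /sys/kernel/debug/kmemleak and uses the
"unreferenced object 0x... (size N):" header format seen above. The names
ex_kmemleak_summary and struct ex_leak_summary are hypothetical and exist
only for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical summary type: number of leaked objects and their total size. */
struct ex_leak_summary {
    unsigned int objects;
    unsigned long long bytes;
};

/*
 * Walk a kmemleak report held in memory and add up every
 * "unreferenced object 0x<addr> (size <n>):" header line.
 * Backtrace and hex dump lines are ignored.
 */
struct ex_leak_summary ex_kmemleak_summary(const char *report)
{
    struct ex_leak_summary s = { 0, 0 };
    const char *key = "unreferenced object ";
    const char *p = report;

    while ((p = strstr(p, key)) != NULL) {
        unsigned long long addr, size;

        /* Both the address and the size must parse for the line to count. */
        if (sscanf(p, "unreferenced object 0x%llx (size %llu):",
                   &addr, &size) == 2) {
            s.objects++;
            s.bytes += size;
        }
        p += strlen(key); /* continue searching after this header */
    }
    return s;
}
```

Comparing the totals before and after a candidate fix gives a quick
indication of whether the skb and expanded-head allocations are still
being lost.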

^ permalink raw reply

* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
From: David Chang @ 2019-02-14  2:45 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Realtek linux nic maintainers, netdev, Martti Laaksonen
In-Reply-To: <856b3a75-5daf-6ce8-7fa3-0405e3cefe97@gmail.com>

Hi Heiner,

On Feb 05, 2019 at 19:50:30 +0100, Heiner Kallweit wrote:
> Hi David,
> 
> meanwhile there's the following bug report matching what you reported.
> It's even the same chip version (RTL8168h).
> https://bugzilla.redhat.com/show_bug.cgi?id=1671958
> 
> Symptom there is also a significant number of rx_missed packets.
> Could you try what I mentioned there last:
> Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
> end of rtl_hw_start_8168h_1() being disabled.

After disabling the ASPM call that you mentioned, we finally got a
positive test result, and the rx_missed errors were gone. Without
the patch, the receive side goes back to bad performance.

kernel: r8169: loading out-of-tree module taints kernel.
kernel: r8169: module verification failed: signature and/or required key missing - tainting kernel
kernel: libphy: r8169: probed
kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, ec:8e:b5:5a:2c:f5, XID 54100880, IRQ 128
kernel: r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
kernel: r8169 0000:01:00.0 enp1s0: renamed from eth0
kernel: Generic PHY r8169-100:00: attached PHY driver [Generic PHY] (mii_bus:phy_addr=r8169-100:00, irq=IGNORE)
kernel: r8169 0000:01:00.0 enp1s0: Link is Up - 1Gbps/Full - flow control off

NIC statistics:
     tx_packets: 1653804
     rx_packets: 1555966
     tx_errors: 0
     rx_errors: 0
     rx_missed: 0
     align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     unicast: 1555884
     broadcast: 78
     multicast: 4
     tx_aborted: 0
     tx_underrun: 0
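
For anyone repeating this check, here is a minimal C sketch of pulling one
counter out of captured "ethtool -S <if>" output. The function name
ex_ethtool_counter is hypothetical, and the parser assumes the
"name: value" layout shown above. It is an illustration only, not part of
ethtool or the driver.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up "name: value" in ethtool -S style text.
 * Returns 0 and stores the value in *out when the counter is found,
 * -1 when it is absent or its value does not parse.
 */
int ex_ethtool_counter(const char *text, const char *name,
                       unsigned long long *out)
{
    size_t nlen = strlen(name);
    const char *line = text;

    while (line && *line) {
        const char *p = line;

        /* Skip the indentation ethtool puts in front of each counter. */
        while (*p == ' ' || *p == '\t')
            p++;

        /* Require an exact name match followed directly by ':'. */
        if (strncmp(p, name, nlen) == 0 && p[nlen] == ':')
            return sscanf(p + nlen + 1, "%llu", out) == 1 ? 0 : -1;

        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```

Sampling rx_missed before and after an iperf run and comparing the two
values shows whether the receive path is still dropping frames.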

iperf receive:
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.x.x.x, port 55516
[  5] local 10.x.x.x port 5201 connected to 10.x.x.x port 58172
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   108 MBytes   906 Mbits/sec
[  5]   1.00-2.00   sec   112 MBytes   941 Mbits/sec
[  5]   2.00-3.00   sec   112 MBytes   940 Mbits/sec
[  5]   3.00-4.00   sec   112 MBytes   941 Mbits/sec
[  5]   4.00-5.00   sec   112 MBytes   941 Mbits/sec
[  5]   5.00-6.00   sec   112 MBytes   942 Mbits/sec
[  5]   6.00-7.00   sec   112 MBytes   939 Mbits/sec
[  5]   7.00-8.00   sec   112 MBytes   941 Mbits/sec
[  5]   8.00-9.00   sec   112 MBytes   938 Mbits/sec
[  5]   9.00-10.00  sec   112 MBytes   941 Mbits/sec
[  5]  10.00-11.00  sec   112 MBytes   941 Mbits/sec
[...]
[  5]  50.00-51.00  sec   112 MBytes   941 Mbits/sec
[  5]  51.00-52.00  sec   112 MBytes   941 Mbits/sec
[  5]  52.00-53.00  sec   112 MBytes   942 Mbits/sec
[  5]  53.00-54.00  sec   112 MBytes   941 Mbits/sec
[  5]  54.00-55.00  sec   111 MBytes   934 Mbits/sec
[  5]  55.00-56.00  sec   112 MBytes   942 Mbits/sec
[  5]  56.00-57.00  sec   112 MBytes   937 Mbits/sec
[  5]  57.00-58.00  sec   112 MBytes   941 Mbits/sec
[  5]  58.00-59.00  sec   111 MBytes   932 Mbits/sec
[  5]  59.00-60.00  sec   112 MBytes   942 Mbits/sec
[  5]  60.00-60.04  sec  4.06 MBytes   939 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-60.04  sec  6.57 GBytes   940 Mbits/sec                  receiver

regards,
David

> 
> Heiner
> 
> 
> On 31.01.2019 03:32, David Chang wrote:
> > Hi,
> > 
> > We had a similar case here.
> > - Realtek r8169 receive performance regression in kernel 4.19
> >   https://bugzilla.suse.com/show_bug.cgi?id=1119649
> > 
> > kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> > The major symptom is a high rx_missed count.
> > 
> > 
> > On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
> >> Hi Peter,
> >>
> >> recently I had somebody where pcie_aspm=off for whatever reason didn't
> >> do the trick. Can you also check with pcie_aspm.policy=performance?
> > 
> > We will give it a try later.
> > 
> >> And please check with "ethtool -S <if>" whether the chip statistics
> >> show a significant number of errors.
> >>
> >> If this doesn't help you may have to bisect to find the offending commit.
> > 
> > We tried falling back the driver to a few previous commits, as follows,
> > but with no luck.
> > 
> > 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> > 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> > a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> > 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> > e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
> > 
> > Thanks,
> > David Chang
> > 
> >>
> >> Heiner
> >>
> >>
> >> On 30.01.2019 10:59, Peter Ceiley wrote:
> >>> Hi Heiner,
> >>>
> >>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
> >>> and this made no difference.
> >>>
> >>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
> >>> subsequently loaded the module in the running 4.19.18 kernel. I can
> >>> confirm that this immediately resolved the issue and access to the NFS
> >>> shares operated as expected.
> >>>
> >>> I presume this means it is an issue with the r8169 driver included in
> >>> 4.19 onwards?
> >>>
> >>> To answer your last questions:
> >>>
> >>> Base Board Information
> >>>     Manufacturer: Alienware
> >>>     Product Name: 0PGRP5
> >>>     Version: A02
> >>>
> >>> ... and yes, the RTL8168 is the onboard network chip.
> >>>
> >>> Regards,
> >>>
> >>> Peter.
> >>>
> >>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>
> >>>> Hi Peter,
> >>>>
> >>>> I think the vendor driver doesn't enable ASPM per default.
> >>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
> >>>> A few older systems seem to have issues with ASPM. What kind of
> >>>> system / mainboard are you using? Is the RTL8168 the onboard
> >>>> network chip?
> >>>>
> >>>> Rgds, Heiner
> >>>>
> >>>>
> >>>> On 29.01.2019 07:20, Peter Ceiley wrote:
> >>>>> Hi Heiner,
> >>>>>
> >>>>> Thanks, I'll do some more testing. It might not be the driver - I
> >>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
> >>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
> >>>>> a good idea.
> >>>>>
> >>>>> Cheers,
> >>>>>
> >>>>> Peter.
> >>>>>
> >>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Peter,
> >>>>>>
> >>>>>> at a first glance it doesn't look like a typical driver issue.
> >>>>>> What you could do:
> >>>>>>
> >>>>>> - Test the r8169.c from 4.18 on top of 4.19.
> >>>>>>
> >>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
> >>>>>>
> >>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
> >>>>>>
> >>>>>> Any specific reason why you think root cause is in the driver and not
> >>>>>> elsewhere in the network subsystem?
> >>>>>>
> >>>>>> Heiner
> >>>>>>
> >>>>>>
> >>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
> >>>>>>> Hi Heiner,
> >>>>>>>
> >>>>>>> Thanks for getting back to me.
> >>>>>>>
> >>>>>>> No, I don't use jumbo packets.
> >>>>>>>
> >>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
> >>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
> >>>>>>> establishing a connection and is most notable, for example, on my
> >>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
> >>>>>>> larger directories) to list the contents of each directory. Once a
> >>>>>>> transfer begins on a file, I appear to get good bandwidth.
> >>>>>>>
> >>>>>>> I'm unsure of the best scientific data to provide you in order to
> >>>>>>> troubleshoot this issue. Running the following
> >>>>>>>
> >>>>>>>     netstat -s |grep retransmitted
> >>>>>>>
> >>>>>>> shows a steady increase in retransmitted segments each time I list the
> >>>>>>> contents of a remote directory, for example, running 'ls' on a
> >>>>>>> directory containing 345 media files did the following using kernel
> >>>>>>> 4.19.18:
> >>>>>>>
> >>>>>>> increased retransmitted segments by 21 and the 'time' command showed
> >>>>>>> the following:
> >>>>>>>     real    0m19.867s
> >>>>>>>     user    0m0.012s
> >>>>>>>     sys    0m0.036s
> >>>>>>>
> >>>>>>> The same command shows no retransmitted segments running kernel
> >>>>>>> 4.18.16 and 'time' showed:
> >>>>>>>     real    0m0.300s
> >>>>>>>     user    0m0.004s
> >>>>>>>     sys    0m0.007s
> >>>>>>>
> >>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
> >>>>>>>
> >>>>>>> dmesg XID:
> >>>>>>> [    2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
> >>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
> >>>>>>>
> >>>>>>> # lspci -vv
> >>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> >>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
> >>>>>>>     Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> >>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> >>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>     Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>     Interrupt: pin A routed to IRQ 19
> >>>>>>>     Region 0: I/O ports at d000 [size=256]
> >>>>>>>     Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
> >>>>>>>     Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
> >>>>>>>     Capabilities: [40] Power Management version 3
> >>>>>>>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
> >>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
> >>>>>>>         Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>     Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
> >>>>>>>         Address: 0000000000000000  Data: 0000
> >>>>>>>     Capabilities: [70] Express (v2) Endpoint, MSI 01
> >>>>>>>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s
> >>>>>>> <512ns, L1 <64us
> >>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
> >>>>>>> SlotPowerLimit 10.000W
> >>>>>>>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>             MaxPayload 128 bytes, MaxReadReq 4096 bytes
> >>>>>>>         DevSta:    CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
> >>>>>>> Latency L0s unlimited, L1 <64us
> >>>>>>>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> >>>>>>>         LnkCtl:    ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
> >>>>>>>             TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>         DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
> >>>>>>> OBFF Via message/WAKE#
> >>>>>>>              AtomicOpsCap: 32bit- 64bit- 128bitCAS-
> >>>>>>>         DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
> >>>>>>> OBFF Disabled
> >>>>>>>              AtomicOpsCtl: ReqEn-
> >>>>>>>         LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
> >>>>>>>              Transmit Margin: Normal Operating Range,
> >>>>>>> EnterModifiedCompliance- ComplianceSOS-
> >>>>>>>              Compliance De-emphasis: -6dB
> >>>>>>>         LnkSta2: Current De-emphasis Level: -6dB,
> >>>>>>> EqualizationComplete-, EqualizationPhase1-
> >>>>>>>              EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
> >>>>>>>     Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
> >>>>>>>         Vector table: BAR=4 offset=00000000
> >>>>>>>         PBA: BAR=4 offset=00000800
> >>>>>>>     Capabilities: [d0] Vital Product Data
> >>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
> >>>>>>>         Not readable
> >>>>>>>     Capabilities: [100 v1] Advanced Error Reporting
> >>>>>>>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >>>>>>>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
> >>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >>>>>>>         CESta:    RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
> >>>>>>>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
> >>>>>>>         AERCap:    First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
> >>>>>>> ECRCChkCap+ ECRCChkEn-
> >>>>>>>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
> >>>>>>>         HeaderLog: 00000000 00000000 00000000 00000000
> >>>>>>>     Capabilities: [140 v1] Virtual Channel
> >>>>>>>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
> >>>>>>>         Arb:    Fixed- WRR32- WRR64- WRR128-
> >>>>>>>         Ctrl:    ArbSelect=Fixed
> >>>>>>>         Status:    InProgress-
> >>>>>>>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >>>>>>>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
> >>>>>>>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> >>>>>>>             Status:    NegoPending- InProgress-
> >>>>>>>     Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
> >>>>>>>     Capabilities: [170 v1] Latency Tolerance Reporting
> >>>>>>>         Max snoop latency: 71680ns
> >>>>>>>         Max no snoop latency: 71680ns
> >>>>>>>     Kernel driver in use: r8169
> >>>>>>>     Kernel modules: r8169
> >>>>>>>
> >>>>>>> Please let me know if you have any other ideas in terms of testing.
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>> Peter.
> >>>>>>>
> >>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I have been experiencing very poor network performance since Kernel
> >>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
> >>>>>>>>>
> >>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
> >>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
> >>>>>>>>> 4.20.4 & 4.19.18).
> >>>>>>>>>
> >>>>>>>>> If someone could guide me in the right direction, I'm happy to help
> >>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
> >>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
> >>>>>>>>> differ in that I still have a network connection. I have attempted to
> >>>>>>>>> reload the driver on a running system, but this does not improve the
> >>>>>>>>> situation.
> >>>>>>>>>
> >>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
> >>>>>>>>>
> >>>>>>>>> lshw shows:
> >>>>>>>>>        description: Ethernet interface
> >>>>>>>>>        product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
> >>>>>>>>>        vendor: Realtek Semiconductor Co., Ltd.
> >>>>>>>>>        physical id: 0
> >>>>>>>>>        bus info: pci@0000:03:00.0
> >>>>>>>>>        logical name: enp3s0
> >>>>>>>>>        version: 0c
> >>>>>>>>>        serial:
> >>>>>>>>>        size: 1Gbit/s
> >>>>>>>>>        capacity: 1Gbit/s
> >>>>>>>>>        width: 64 bits
> >>>>>>>>>        clock: 33MHz
> >>>>>>>>>        capabilities: pm msi pciexpress msix vpd bus_master cap_list
> >>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
> >>>>>>>>> 1000bt-fd autonegotiation
> >>>>>>>>>        configuration: autonegotiation=on broadcast=yes driver=r8169
> >>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
> >>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
> >>>>>>>>>        resources: irq:19 ioport:d000(size=256)
> >>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
> >>>>>>>>>
> >>>>>>>>> Kind Regards,
> >>>>>>>>>
> >>>>>>>>> Peter.
> >>>>>>>>>
> >>>>>>>> Hi Peter,
> >>>>>>>>
> >>>>>>>> the description "poor network performance" is quite vague, therefore:
> >>>>>>>>
> >>>>>>>> - Can you provide any measurements?
> >>>>>>>> - iperf results before and after
> >>>>>>>> - statistics about dropped packets (rx and/or tx)
> >>>>>>>> - Do you use jumbo packets?
> >>>>>>>>
> >>>>>>>> Also helpful would be the "lspci -vv" output for the network card and
> >>>>>>>> the dmesg output line with the chip XID.
> >>>>>>>>
> >>>>>>>> Heiner
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> > 
> 
> 

^ permalink raw reply

* [PATCH net-next v5] ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs
From: Callum Sinclair @ 2019-02-14  2:44 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, nikolay, netdev, linux-kernel
  Cc: nicolas.dichtel, Callum Sinclair

Add a way to clear the multicast forwarding cache on a socket without
having to either remove the entries one by one using the delete-entry
socket option or destroy and recreate the multicast socket.

The new socket option MRT_FLUSH takes any combination of the following
four flags:

MRT_FLUSH_MFC clears all non-static mfc entries and the unresolved cache
MRT_FLUSH_MFC_STATIC clears all static mfc entries
MRT_FLUSH_VIFS clears all non-static interfaces
MRT_FLUSH_VIFS_STATIC clears all static interfaces
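
For illustration only, a minimal userspace sketch of how a routing daemon
might call the new option. It assumes fd is a raw IGMP socket on which
MRT_INIT has already succeeded. The option number and flag values are
copied from this patch; MRT_BASE is 200 in the existing uapi header.
mrt_flush() and SKETCH_FLUSH_ALL are hypothetical names used only here,
and the assert on unknown bits is a choice of this sketch (the patch
itself does not validate the flag word).

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Values copied from the patch; guarded in case the installed uapi
 * headers already provide them.
 */
#ifndef MRT_BASE
#define MRT_BASE 200
#endif
#ifndef MRT_FLUSH
#define MRT_FLUSH		(MRT_BASE + 12)
#define MRT_FLUSH_MFC		1	/* non-static mfc entries + unresolved queue */
#define MRT_FLUSH_MFC_STATIC	2	/* static mfc entries */
#define MRT_FLUSH_VIFS		4	/* non-static vifs */
#define MRT_FLUSH_VIFS_STATIC	8	/* static vifs */
#endif

/* Hypothetical convenience mask, not part of the patch. */
#define SKETCH_FLUSH_ALL (MRT_FLUSH_MFC | MRT_FLUSH_MFC_STATIC | \
			  MRT_FLUSH_VIFS | MRT_FLUSH_VIFS_STATIC)

/* Hypothetical helper: flush the selected entries on a multicast routing
 * socket. Returns the setsockopt() result (0 on success, -1 with errno
 * set on failure). The kernel side expects an int-sized option value.
 */
int mrt_flush(int fd, int flags)
{
	/* Sketch-only precondition: reject bits the patch does not define. */
	assert((flags & ~SKETCH_FLUSH_ALL) == 0);
	return setsockopt(fd, IPPROTO_IP, MRT_FLUSH, &flags, sizeof(flags));
}
```

A caller that wants the same effect as closing and reopening the socket
would pass MRT_FLUSH_MFC | MRT_FLUSH_VIFS, which leaves static entries
and static vifs in place.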

Callum Sinclair (1):
  ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs

 include/uapi/linux/mroute.h  |  9 ++++-
 include/uapi/linux/mroute6.h |  9 ++++-
 net/ipv4/ipmr.c              | 75 +++++++++++++++++++++-------------
 net/ipv6/ip6mr.c             | 78 +++++++++++++++++++++++-------------
 4 files changed, 115 insertions(+), 56 deletions(-)

-- 
2.20.1

^ permalink raw reply

* [PATCH net-next v5] ipmr: ip6mr: Create new sockopt to clear mfc cache or vifs
From: Callum Sinclair @ 2019-02-14  2:44 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, nikolay, netdev, linux-kernel
  Cc: nicolas.dichtel, Callum Sinclair
In-Reply-To: <20190214024418.21490-1-callum.sinclair@alliedtelesis.co.nz>

Currently the only way to clear the forwarding cache is to delete the
entries one by one using the MRT_DEL_MFC socket option or to destroy and
recreate the socket.

Create a new socket option which, through optional flags, can clear any
combination of multicast entries (static or non-static) and multicast
vifs (static or non-static).

Calling the new socket option MRT_FLUSH with the flags MRT_FLUSH_MFC and
MRT_FLUSH_VIFS clears all entries and vifs on the socket except the
static ones.

Signed-off-by: Callum Sinclair <callum.sinclair@alliedtelesis.co.nz>
---
v1 -> v2:
  Implemented additional flags for static entries
v2 -> v3:
  Cleaned up flag logic so any combination of routes can be cleared.
  Fixed style errors
  Fixed incorrect flag values
v3 -> v4:
  Fixed style errors
  Fixed incorrect flag (MRT_FLUSH was used instead of MRT_FLUSH_VIFS)
v4 -> v5:
  Only clear the unresolved queue when MRT_FLUSH_MFC flag is set.
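
For readers following the flag logic, a small standalone sketch of the
per-entry skip condition used in mroute_clean_tables() in this patch. It
is not kernel code: should_flush() and flush_filter_examples() are
hypothetical names, and the flag values are copied from the patch. An
entry is flushed only when the flag matching its kind (static or
non-static) is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values copied from the patch. */
#ifndef MRT_FLUSH_MFC
#define MRT_FLUSH_MFC		1
#define MRT_FLUSH_MFC_STATIC	2
#define MRT_FLUSH_VIFS		4
#define MRT_FLUSH_VIFS_STATIC	8
#endif

/* Hypothetical helper: true when an entry should be flushed. This is the
 * negation of the "continue" condition in the patch: a static entry
 * needs static_flag, a non-static entry needs dyn_flag.
 */
static bool should_flush(bool is_static, int flags, int dyn_flag,
			 int static_flag)
{
	return (flags & (is_static ? static_flag : dyn_flag)) != 0;
}

/* Worked examples for the two in-kernel callers touched by the patch. */
static void flush_filter_examples(void)
{
	/* Socket teardown passes MRT_FLUSH_VIFS | MRT_FLUSH_MFC. */
	int teardown = MRT_FLUSH_VIFS | MRT_FLUSH_MFC;
	/* Table free passes all four flags. */
	int all = MRT_FLUSH_VIFS | MRT_FLUSH_VIFS_STATIC |
		  MRT_FLUSH_MFC | MRT_FLUSH_MFC_STATIC;

	/* Teardown removes non-static entries and keeps static ones. */
	assert(should_flush(false, teardown, MRT_FLUSH_MFC, MRT_FLUSH_MFC_STATIC));
	assert(!should_flush(true, teardown, MRT_FLUSH_MFC, MRT_FLUSH_MFC_STATIC));
	assert(should_flush(false, teardown, MRT_FLUSH_VIFS, MRT_FLUSH_VIFS_STATIC));
	assert(!should_flush(true, teardown, MRT_FLUSH_VIFS, MRT_FLUSH_VIFS_STATIC));

	/* Table free removes everything. */
	assert(should_flush(true, all, MRT_FLUSH_MFC, MRT_FLUSH_MFC_STATIC));
	assert(should_flush(true, all, MRT_FLUSH_VIFS, MRT_FLUSH_VIFS_STATIC));
}
```

The teardown case corresponds to the old all == false behaviour and the
table-free case to all == true; the remaining flag combinations are what
the new socket option makes reachable from userspace.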

 include/uapi/linux/mroute.h  |  9 ++++-
 include/uapi/linux/mroute6.h |  9 ++++-
 net/ipv4/ipmr.c              | 75 +++++++++++++++++++++-------------
 net/ipv6/ip6mr.c             | 78 +++++++++++++++++++++++-------------
 4 files changed, 115 insertions(+), 56 deletions(-)

diff --git a/include/uapi/linux/mroute.h b/include/uapi/linux/mroute.h
index 5d37a9ccce63..11c8c1fc1124 100644
--- a/include/uapi/linux/mroute.h
+++ b/include/uapi/linux/mroute.h
@@ -28,12 +28,19 @@
 #define MRT_TABLE	(MRT_BASE+9)	/* Specify mroute table ID		*/
 #define MRT_ADD_MFC_PROXY	(MRT_BASE+10)	/* Add a (*,*|G) mfc entry	*/
 #define MRT_DEL_MFC_PROXY	(MRT_BASE+11)	/* Del a (*,*|G) mfc entry	*/
-#define MRT_MAX		(MRT_BASE+11)
+#define MRT_FLUSH	(MRT_BASE+12)	/* Flush all mfc entries and/or vifs	*/
+#define MRT_MAX		(MRT_BASE+12)
 
 #define SIOCGETVIFCNT	SIOCPROTOPRIVATE	/* IP protocol privates */
 #define SIOCGETSGCNT	(SIOCPROTOPRIVATE+1)
 #define SIOCGETRPF	(SIOCPROTOPRIVATE+2)
 
+/* MRT_FLUSH optional flags */
+#define MRT_FLUSH_MFC	1	/* Flush multicast entries */
+#define MRT_FLUSH_MFC_STATIC	2	/* Flush static multicast entries */
+#define MRT_FLUSH_VIFS	4	/* Flush multicast vifs */
+#define MRT_FLUSH_VIFS_STATIC	8	/* Flush static multicast vifs */
+
 #define MAXVIFS		32
 typedef unsigned long vifbitmap_t;	/* User mode code depends on this lot */
 typedef unsigned short vifi_t;
diff --git a/include/uapi/linux/mroute6.h b/include/uapi/linux/mroute6.h
index 9999cc006390..ac84ef11b29c 100644
--- a/include/uapi/linux/mroute6.h
+++ b/include/uapi/linux/mroute6.h
@@ -31,12 +31,19 @@
 #define MRT6_TABLE	(MRT6_BASE+9)	/* Specify mroute table ID		*/
 #define MRT6_ADD_MFC_PROXY	(MRT6_BASE+10)	/* Add a (*,*|G) mfc entry	*/
 #define MRT6_DEL_MFC_PROXY	(MRT6_BASE+11)	/* Del a (*,*|G) mfc entry	*/
-#define MRT6_MAX	(MRT6_BASE+11)
+#define MRT6_FLUSH	(MRT6_BASE+12)	/* Flush all mfc entries and/or vifs	*/
+#define MRT6_MAX	(MRT6_BASE+12)
 
 #define SIOCGETMIFCNT_IN6	SIOCPROTOPRIVATE	/* IP protocol privates */
 #define SIOCGETSGCNT_IN6	(SIOCPROTOPRIVATE+1)
 #define SIOCGETRPF	(SIOCPROTOPRIVATE+2)
 
+/* MRT6_FLUSH optional flags */
+#define MRT6_FLUSH_MFC	1	/* Flush multicast entries */
+#define MRT6_FLUSH_MFC_STATIC	2	/* Flush static multicast entries */
+#define MRT6_FLUSH_VIFS	4	/* Flush multicast vifs */
+#define MRT6_FLUSH_VIFS_STATIC	8	/* Flush static multicast vifs */
+
 #define MAXMIFS		32
 typedef unsigned long mifbitmap_t;	/* User mode code depends on this lot */
 typedef unsigned short mifi_t;
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index e536970557dd..53869779af74 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -110,7 +110,7 @@ static int ipmr_cache_report(struct mr_table *mrt,
 static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc,
 				 int cmd);
 static void igmpmsg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt);
-static void mroute_clean_tables(struct mr_table *mrt, bool all);
+static void mroute_clean_tables(struct mr_table *mrt, int flags);
 static void ipmr_expire_process(struct timer_list *t);
 
 #ifdef CONFIG_IP_MROUTE_MULTIPLE_TABLES
@@ -415,7 +415,8 @@ static struct mr_table *ipmr_new_table(struct net *net, u32 id)
 static void ipmr_free_table(struct mr_table *mrt)
 {
 	del_timer_sync(&mrt->ipmr_expire_timer);
-	mroute_clean_tables(mrt, true);
+	mroute_clean_tables(mrt, MRT_FLUSH_VIFS | MRT_FLUSH_VIFS_STATIC |
+					  MRT_FLUSH_MFC | MRT_FLUSH_MFC_STATIC);
 	rhltable_destroy(&mrt->mfc_hash);
 	kfree(mrt);
 }
@@ -1296,7 +1297,7 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
 }
 
 /* Close the multicast socket, and clear the vif tables etc */
-static void mroute_clean_tables(struct mr_table *mrt, bool all)
+static void mroute_clean_tables(struct mr_table *mrt, int flags)
 {
 	struct net *net = read_pnet(&mrt->net);
 	struct mr_mfc *c, *tmp;
@@ -1305,35 +1306,44 @@ static void mroute_clean_tables(struct mr_table *mrt, bool all)
 	int i;
 
 	/* Shut down all active vif entries */
-	for (i = 0; i < mrt->maxvif; i++) {
-		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
-			continue;
-		vif_delete(mrt, i, 0, &list);
+	if (flags & (MRT_FLUSH_VIFS | MRT_FLUSH_VIFS_STATIC)) {
+		for (i = 0; i < mrt->maxvif; i++) {
+			if (((mrt->vif_table[i].flags & VIFF_STATIC) &&
+			     !(flags & MRT_FLUSH_VIFS_STATIC)) ||
+			    (!(mrt->vif_table[i].flags & VIFF_STATIC) && !(flags & MRT_FLUSH_VIFS)))
+				continue;
+			vif_delete(mrt, i, 0, &list);
+		}
+		unregister_netdevice_many(&list);
 	}
-	unregister_netdevice_many(&list);
 
 	/* Wipe the cache */
-	list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
-		if (!all && (c->mfc_flags & MFC_STATIC))
-			continue;
-		rhltable_remove(&mrt->mfc_hash, &c->mnode, ipmr_rht_params);
-		list_del_rcu(&c->list);
-		cache = (struct mfc_cache *)c;
-		call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
-					      mrt->id);
-		mroute_netlink_event(mrt, cache, RTM_DELROUTE);
-		mr_cache_put(c);
-	}
-
-	if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
-		spin_lock_bh(&mfc_unres_lock);
-		list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
-			list_del(&c->list);
+	if (flags & (MRT_FLUSH_MFC | MRT_FLUSH_MFC_STATIC)) {
+		list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
+			if (((c->mfc_flags & MFC_STATIC) && !(flags & MRT_FLUSH_MFC_STATIC)) ||
+			    (!(c->mfc_flags & MFC_STATIC) && !(flags & MRT_FLUSH_MFC)))
+				continue;
+			rhltable_remove(&mrt->mfc_hash, &c->mnode, ipmr_rht_params);
+			list_del_rcu(&c->list);
 			cache = (struct mfc_cache *)c;
+			call_ipmr_mfc_entry_notifiers(net, FIB_EVENT_ENTRY_DEL, cache,
+						      mrt->id);
 			mroute_netlink_event(mrt, cache, RTM_DELROUTE);
-			ipmr_destroy_unres(mrt, cache);
+			mr_cache_put(c);
+		}
+	}
+
+	if (flags & MRT_FLUSH_MFC) {
+		if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
+			spin_lock_bh(&mfc_unres_lock);
+			list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
+				list_del(&c->list);
+				cache = (struct mfc_cache *)c;
+				mroute_netlink_event(mrt, cache, RTM_DELROUTE);
+				ipmr_destroy_unres(mrt, cache);
+			}
+			spin_unlock_bh(&mfc_unres_lock);
 		}
-		spin_unlock_bh(&mfc_unres_lock);
 	}
 }
 
@@ -1354,7 +1364,7 @@ static void mrtsock_destruct(struct sock *sk)
 						    NETCONFA_IFINDEX_ALL,
 						    net->ipv4.devconf_all);
 			RCU_INIT_POINTER(mrt->mroute_sk, NULL);
-			mroute_clean_tables(mrt, false);
+			mroute_clean_tables(mrt, MRT_FLUSH_VIFS | MRT_FLUSH_MFC);
 		}
 	}
 	rtnl_unlock();
@@ -1479,6 +1489,17 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
 					   sk == rtnl_dereference(mrt->mroute_sk),
 					   parent);
 		break;
+	case MRT_FLUSH:
+		if (optlen != sizeof(val)) {
+			ret = -EINVAL;
+			break;
+		}
+		if (get_user(val, (int __user *)optval)) {
+			ret = -EFAULT;
+			break;
+		}
+		mroute_clean_tables(mrt, val);
+		break;
 	/* Control PIM assert. */
 	case MRT_ASSERT:
 		if (optlen != sizeof(val)) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index cc01aa3f2b5e..b67a7c1e3615 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -97,7 +97,7 @@ static void mr6_netlink_event(struct mr_table *mrt, struct mfc6_cache *mfc,
 static void mrt6msg_netlink_event(struct mr_table *mrt, struct sk_buff *pkt);
 static int ip6mr_rtm_dumproute(struct sk_buff *skb,
 			       struct netlink_callback *cb);
-static void mroute_clean_tables(struct mr_table *mrt, bool all);
+static void mroute_clean_tables(struct mr_table *mrt, int flags);
 static void ipmr_expire_process(struct timer_list *t);
 
 #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES
@@ -393,7 +393,8 @@ static struct mr_table *ip6mr_new_table(struct net *net, u32 id)
 static void ip6mr_free_table(struct mr_table *mrt)
 {
 	del_timer_sync(&mrt->ipmr_expire_timer);
-	mroute_clean_tables(mrt, true);
+	mroute_clean_tables(mrt, MRT6_FLUSH_VIFS | MRT6_FLUSH_VIFS_STATIC |
+					  MRT6_FLUSH_MFC | MRT6_FLUSH_MFC_STATIC);
 	rhltable_destroy(&mrt->mfc_hash);
 	kfree(mrt);
 }
@@ -1496,42 +1497,51 @@ static int ip6mr_mfc_add(struct net *net, struct mr_table *mrt,
  *	Close the multicast socket, and clear the vif tables etc
  */
 
-static void mroute_clean_tables(struct mr_table *mrt, bool all)
+static void mroute_clean_tables(struct mr_table *mrt, int flags)
 {
 	struct mr_mfc *c, *tmp;
 	LIST_HEAD(list);
 	int i;
 
 	/* Shut down all active vif entries */
-	for (i = 0; i < mrt->maxvif; i++) {
-		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
-			continue;
-		mif6_delete(mrt, i, 0, &list);
+	if (flags & (MRT6_FLUSH_VIFS | MRT6_FLUSH_VIFS_STATIC)) {
+		for (i = 0; i < mrt->maxvif; i++) {
+			if (((mrt->vif_table[i].flags & VIFF_STATIC) &&
+			     !(flags & MRT6_FLUSH_VIFS_STATIC)) ||
+			    (!(mrt->vif_table[i].flags & VIFF_STATIC) && !(flags & MRT6_FLUSH_VIFS)))
+				continue;
+			mif6_delete(mrt, i, 0, &list);
+		}
+		unregister_netdevice_many(&list);
 	}
-	unregister_netdevice_many(&list);
 
 	/* Wipe the cache */
-	list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
-		if (!all && (c->mfc_flags & MFC_STATIC))
-			continue;
-		rhltable_remove(&mrt->mfc_hash, &c->mnode, ip6mr_rht_params);
-		list_del_rcu(&c->list);
-		call_ip6mr_mfc_entry_notifiers(read_pnet(&mrt->net),
-					       FIB_EVENT_ENTRY_DEL,
-					       (struct mfc6_cache *)c, mrt->id);
-		mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE);
-		mr_cache_put(c);
+	if (flags & (MRT6_FLUSH_MFC | MRT6_FLUSH_MFC_STATIC)) {
+		list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
+			if (((c->mfc_flags & MFC_STATIC) && !(flags & MRT6_FLUSH_MFC_STATIC)) ||
+			    (!(c->mfc_flags & MFC_STATIC) && !(flags & MRT6_FLUSH_MFC)))
+				continue;
+			rhltable_remove(&mrt->mfc_hash, &c->mnode, ip6mr_rht_params);
+			list_del_rcu(&c->list);
+			call_ip6mr_mfc_entry_notifiers(read_pnet(&mrt->net),
+						       FIB_EVENT_ENTRY_DEL,
+										   (struct mfc6_cache *)c, mrt->id);
+			mr6_netlink_event(mrt, (struct mfc6_cache *)c, RTM_DELROUTE);
+			mr_cache_put(c);
+		}
 	}
 
-	if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
-		spin_lock_bh(&mfc_unres_lock);
-		list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
-			list_del(&c->list);
-			mr6_netlink_event(mrt, (struct mfc6_cache *)c,
-					  RTM_DELROUTE);
-			ip6mr_destroy_unres(mrt, (struct mfc6_cache *)c);
+	if (flags & MRT6_FLUSH_MFC) {
+		if (atomic_read(&mrt->cache_resolve_queue_len) != 0) {
+			spin_lock_bh(&mfc_unres_lock);
+			list_for_each_entry_safe(c, tmp, &mrt->mfc_unres_queue, list) {
+				list_del(&c->list);
+				mr6_netlink_event(mrt, (struct mfc6_cache *)c,
+						  RTM_DELROUTE);
+				ip6mr_destroy_unres(mrt, (struct mfc6_cache *)c);
+			}
+			spin_unlock_bh(&mfc_unres_lock);
 		}
-		spin_unlock_bh(&mfc_unres_lock);
 	}
 }
 
@@ -1587,7 +1597,7 @@ int ip6mr_sk_done(struct sock *sk)
 						     NETCONFA_IFINDEX_ALL,
 						     net->ipv6.devconf_all);
 
-			mroute_clean_tables(mrt, false);
+			mroute_clean_tables(mrt, MRT6_FLUSH_VIFS | MRT6_FLUSH_MFC);
 			err = 0;
 			break;
 		}
@@ -1703,6 +1713,20 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 		rtnl_unlock();
 		return ret;
 
+	case MRT6_FLUSH:
+	{
+		int flags;
+
+		if (optlen != sizeof(flags))
+			return -EINVAL;
+		if (get_user(flags, (int __user *)optval))
+			return -EFAULT;
+		rtnl_lock();
+		mroute_clean_tables(mrt, flags);
+		rtnl_unlock();
+		return 0;
+	}
+
 	/*
 	 *	Control PIM assert (to activate pim will activate assert)
 	 */
-- 
2.20.1


^ permalink raw reply related
