Netdev List


* [PATCH 0/3] kallsyms: don't leak address
From: Tobin C. Harding @ 2017-12-17 23:53 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Tobin C. Harding, Steven Rostedt, Tycho Andersen, Linus Torvalds,
	Kees Cook, Andrew Morton, Daniel Borkmann, Masahiro Yamada,
	Alexei Starovoitov, linux-kernel, Network Development

This set plugs a kernel address leak that occurs if kallsyms symbol
look up fails. This set was prompted by a leaking address found using
scripts/leaking_addresses.pl on a PowerPC machine in the wild.

Patch set does not change behaviour when KALLSYMS is not defined
(suggested by Linus).

RFC has been in flight for 3 weeks with no negative response.

Patch 1 - return error code if symbol look up fails.
Patch 2 - print <no-symbol> to buffer if symbol look up returns an error.
Patch 3 - maintain current behaviour in ftrace.

Patch 3 (the ftrace stuff) is untested.
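
The behaviour described for patches 1 and 2 can be sketched in userspace C. This is only an illustration, not the code from the series: the symbol table, `demo_lookup()` and `demo_format_symbol()` are hypothetical names, and the real kallsyms data structures look different. The point is that a failed lookup reports an error and the formatter then prints a fixed placeholder instead of the raw address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol table entry (not the kernel's kallsyms layout). */
struct demo_sym {
	unsigned long start;
	unsigned long size;
	const char *name;
};

static const struct demo_sym demo_syms[] = {
	{ 0x1000, 0x100, "demo_init" },
	{ 0x1100, 0x080, "demo_exit" },
};

/* Fill buf with "name+0xoff/0xsize" and return 0, or return -1 if no
 * symbol covers addr. buf is left untouched on failure. */
static int demo_lookup(unsigned long addr, char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < sizeof(demo_syms) / sizeof(demo_syms[0]); i++) {
		const struct demo_sym *s = &demo_syms[i];

		if (addr >= s->start && addr - s->start < s->size) {
			snprintf(buf, len, "%s+0x%lx/0x%lx",
				 s->name, addr - s->start, s->size);
			return 0;
		}
	}
	return -1;
}

/* On lookup failure, print a placeholder rather than the address. */
void demo_format_symbol(unsigned long addr, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (demo_lookup(addr, buf, len) < 0)
		snprintf(buf, len, "<no-symbol>");
}
```

A caller that still wants the address on failure (as patch 3 describes for ftrace) would check the lookup's return value itself instead of using the placeholder path.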

thanks,
Tobin.

Tobin C. Harding (3):
  kallsyms: don't leak address when symbol not found
  vsprintf: print <no-symbol> if symbol not found
  trace: print address if symbol not found

 include/linux/kernel.h           |  2 ++
 kernel/kallsyms.c                |  6 ++++--
 kernel/trace/trace.h             | 24 ++++++++++++++++++++++++
 kernel/trace/trace_events_hist.c |  6 +++---
 lib/vsprintf.c                   | 18 +++++++++++++++---
 5 files changed, 48 insertions(+), 8 deletions(-)

-- 
2.7.4

* Re: Linux 4.14 - regression: broken tun/tap / bridge network with virtio - bisected
From: Willem de Bruijn @ 2017-12-17 22:33 UTC (permalink / raw)
  To: Andreas Hartmann
  Cc: Michal Kubecek, Jason Wang, David Miller, Network Development
In-Reply-To: <c2044fb8-a2ce-241f-b1ce-054ac70a327d@01019freenet.de>

On Fri, Dec 15, 2017 at 1:05 AM, Andreas Hartmann
<andihartmann@01019freenet.de> wrote:
> On 12/14/2017 at 11:17 PM Willem de Bruijn wrote:
>>>> Well, the patch does not fix hanging VMs, which have been shutdown and
>>>> can't be killed any more.
>>>> Because of the stack trace
>>>>
>>>> [<ffffffffc0d0e3c5>] vhost_net_ubuf_put_and_wait+0x35/0x60 [vhost_net]
>>>> [<ffffffffc0d0f264>] vhost_net_ioctl+0x304/0x870 [vhost_net]
>>>> [<ffffffff9b25460f>] do_vfs_ioctl+0x8f/0x5c0
>>>> [<ffffffff9b254bb4>] SyS_ioctl+0x74/0x80
>>>> [<ffffffff9b00365b>] do_syscall_64+0x5b/0x100
>>>> [<ffffffff9b78e7ab>] entry_SYSCALL64_slow_path+0x25/0x25
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>>
>>>> I was hoping, that the problems could be related - but that seems not to
>>>> be true.
>>>
>>> However, it turned out, that reverting the complete patchset "Remove UDP
>>> Fragmentation Offload support" prevent hanging qemu processes.
>>
>> That implies a combination of UFO and vhost zerocopy. Disabling
>> experimental_zcopytx in vhost_net will probably work around the bug
>> then.

I have been able to reproduce the hang by sending a UFO packet
between two guests running v4.13 on a host running v4.15-rc1.

The vhost_net_ubuf_ref refcount indeed underflows (to -1) because
vhost_zerocopy_callback is called for each segment of a
segmented UFO skb. The refcount is then decremented on each
segment, but incremented only once for the entire UFO skb.

Before v4.14, these packets would be converted in skb_segment to
regular copy packets with skb_orphan_frags and the callback function
called once at this point. v4.14 added support for reference counted
zerocopy skb that can pass through skb_orphan_frags unmodified and
have their zerocopy state safely cloned with skb_zerocopy_clone.

The call to skb_zerocopy_clone must come after skb_orphan_frags
to limit cloning of this state to those skbs that can do so safely.
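
The imbalance described above can be modelled with a small userspace sketch. Everything here is hypothetical (`struct demo_ubuf_ref`, `demo_zerocopy_callback()`, `demo_send_ufo()`), and the starting count of 1 used in the note that follows is an assumption made for the illustration, not a statement about the vhost_net code.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-device zerocopy reference counter. */
struct demo_ubuf_ref {
	int refcount;
	int callbacks;	/* how many times the completion callback ran */
};

/* Completion callback: drops one reference each time it runs. */
static void demo_zerocopy_callback(struct demo_ubuf_ref *ref)
{
	ref->callbacks++;
	ref->refcount--;
}

/*
 * Model sending one UFO skb that is split into nsegs segments.
 * The reference is taken once for the whole skb. If
 * per_segment_callback is nonzero, the callback fires once per
 * segment (the imbalance described above); otherwise it fires once.
 * Returns the resulting refcount.
 */
int demo_send_ufo(struct demo_ubuf_ref *ref, int nsegs,
		  int per_segment_callback)
{
	int i;

	assert(ref && nsegs > 0);
	ref->refcount++;		/* one increment per UFO skb */

	if (per_segment_callback) {
		for (i = 0; i < nsegs; i++)
			demo_zerocopy_callback(ref);
	} else {
		demo_zerocopy_callback(ref);
	}
	return ref->refcount;
}
```

Starting from a count of 1, a three-segment skb with per-segment callbacks ends at 1 + 1 - 3 = -1, while a single callback leaves the count back at 1.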

Please try a host with the following patch. This fixes it for me. I intend to
send it to net.

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a592ca025fc4..d2d985418819 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3654,8 +3654,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
                                              SKBTX_SHARED_FRAG;
-               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
-                       goto err;

                while (pos < offset + len) {
                        if (i >= nfrags) {
@@ -3681,6 +3679,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                        if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
                                goto err;
+                       if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+                               goto err;

                        *nskb_frag = *frag;
                        __skb_frag_ref(nskb_frag);


This is relatively inefficient, as it calls skb_zerocopy_clone for each frag
in the frags[] array. I will follow up with a patch to net-next that only
checks once per skb:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 466581cf4cdc..a293a33604ec 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3662,7 +3662,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
                                              SKBTX_SHARED_FRAG;
-               if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
+               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
+                   skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
                        goto err;

                while (pos < offset + len) {
@@ -3676,6 +3677,11 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,

                                BUG_ON(!nfrags);

+                               if (skb_orphan_frags(frag_skb, GFP_ATOMIC) ||
+                                   skb_zerocopy_clone(nskb, frag_skb,
+                                                      GFP_ATOMIC))
+                                       goto err;
+
                                list_skb = list_skb->next;
                        }

@@ -3687,9 +3693,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
                                goto err;
                        }

-                       if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
-                               goto err;
-

I'll also send to net-next

(1) a patch to convert the vhost_net_ubuf_ref refcnt to refcount_t

(2) a patch to skb_zerocopy_clone to warn on clone if not
     sock_zerocopy_callback

> I already tested it w/ options vhost_net experimental_zcopytx=0 - but
> this didn't "resolve" anything. See
> https://www.mail-archive.com/netdev@vger.kernel.org/msg203197.html
>
> Therefore, I think your following thoughts are lapsed unfortunately,
> aren't they?

That experiment was perhaps run before commit 0c19f846d582 ("net:
accept UFO datagrams from tuntap and packet") and hit the other UFO
bug.

* [PATCH] net: ibm: emac: support RGMII-[RX|TX]ID phymode
From: Christian Lamparter @ 2017-12-17 21:51 UTC (permalink / raw)
  To: netdev; +Cc: David S . Miller, Andrew Lunn, Christophe Jaillet

The RGMII spec allows compliance for devices that implement an internal
delay on TXC and/or RXC inside the transmitter. This patch adds the
necessary RGMII_[RX|TX]ID mode code to handle such PHYs with the
emac driver.

Signed-off-by: Christian Lamparter <chunkeey@gmail.com>
---
 drivers/net/ethernet/ibm/emac/core.c  | 3 +++
 drivers/net/ethernet/ibm/emac/emac.h  | 3 +++
 drivers/net/ethernet/ibm/emac/rgmii.c | 9 +++++++++
 3 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/ibm/emac/core.c b/drivers/net/ethernet/ibm/emac/core.c
index 7feff2450ed6..820173bee168 100644
--- a/drivers/net/ethernet/ibm/emac/core.c
+++ b/drivers/net/ethernet/ibm/emac/core.c
@@ -201,6 +201,9 @@ static inline int emac_phy_supports_gige(int phy_mode)
 {
 	return  phy_mode == PHY_MODE_GMII ||
 		phy_mode == PHY_MODE_RGMII ||
+		phy_mode == PHY_MODE_RGMII_ID ||
+		phy_mode == PHY_MODE_RGMII_RXID ||
+		phy_mode == PHY_MODE_RGMII_TXID ||
 		phy_mode == PHY_MODE_SGMII ||
 		phy_mode == PHY_MODE_TBI ||
 		phy_mode == PHY_MODE_RTBI;
diff --git a/drivers/net/ethernet/ibm/emac/emac.h b/drivers/net/ethernet/ibm/emac/emac.h
index 5afcc27ceebb..8c6d2af7281b 100644
--- a/drivers/net/ethernet/ibm/emac/emac.h
+++ b/drivers/net/ethernet/ibm/emac/emac.h
@@ -112,6 +112,9 @@ struct emac_regs {
 #define PHY_MODE_RMII	PHY_INTERFACE_MODE_RMII
 #define PHY_MODE_SMII	PHY_INTERFACE_MODE_SMII
 #define PHY_MODE_RGMII	PHY_INTERFACE_MODE_RGMII
+#define PHY_MODE_RGMII_ID	PHY_INTERFACE_MODE_RGMII_ID
+#define PHY_MODE_RGMII_RXID	PHY_INTERFACE_MODE_RGMII_RXID
+#define PHY_MODE_RGMII_TXID	PHY_INTERFACE_MODE_RGMII_TXID
 #define PHY_MODE_TBI	PHY_INTERFACE_MODE_TBI
 #define PHY_MODE_GMII	PHY_INTERFACE_MODE_GMII
 #define PHY_MODE_RTBI	PHY_INTERFACE_MODE_RTBI
diff --git a/drivers/net/ethernet/ibm/emac/rgmii.c b/drivers/net/ethernet/ibm/emac/rgmii.c
index c4a1ac38bba8..7963adffbb1c 100644
--- a/drivers/net/ethernet/ibm/emac/rgmii.c
+++ b/drivers/net/ethernet/ibm/emac/rgmii.c
@@ -55,6 +55,9 @@ static inline int rgmii_valid_mode(int phy_mode)
 	return  phy_mode == PHY_MODE_GMII ||
 		phy_mode == PHY_MODE_MII ||
 		phy_mode == PHY_MODE_RGMII ||
+		phy_mode == PHY_MODE_RGMII_ID ||
+		phy_mode == PHY_MODE_RGMII_RXID ||
+		phy_mode == PHY_MODE_RGMII_TXID ||
 		phy_mode == PHY_MODE_TBI ||
 		phy_mode == PHY_MODE_RTBI;
 }
@@ -63,6 +66,9 @@ static inline const char *rgmii_mode_name(int mode)
 {
 	switch (mode) {
 	case PHY_MODE_RGMII:
+	case PHY_MODE_RGMII_ID:
+	case PHY_MODE_RGMII_RXID:
+	case PHY_MODE_RGMII_TXID:
 		return "RGMII";
 	case PHY_MODE_TBI:
 		return "TBI";
@@ -81,6 +87,9 @@ static inline u32 rgmii_mode_mask(int mode, int input)
 {
 	switch (mode) {
 	case PHY_MODE_RGMII:
+	case PHY_MODE_RGMII_ID:
+	case PHY_MODE_RGMII_RXID:
+	case PHY_MODE_RGMII_TXID:
 		return RGMII_FER_RGMII(input);
 	case PHY_MODE_TBI:
 		return RGMII_FER_TBI(input);
-- 
2.15.1
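
The pattern in this patch, treating the internal-delay variants exactly like plain RGMII wherever the MAC side only cares about the bus type, can be sketched in standalone C. The enum and function names below are hypothetical and the mode list is an illustrative subset; the real values come from the kernel's phy_interface_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative subset of interface modes (hypothetical names). */
enum demo_phy_mode {
	DEMO_MODE_MII,
	DEMO_MODE_GMII,
	DEMO_MODE_RGMII,
	DEMO_MODE_RGMII_ID,	/* internal delay on both RXC and TXC */
	DEMO_MODE_RGMII_RXID,	/* internal delay on RXC only */
	DEMO_MODE_RGMII_TXID,	/* internal delay on TXC only */
	DEMO_MODE_TBI,
};

/* All four RGMII flavours share the same handling on the MAC side. */
bool demo_is_rgmii(enum demo_phy_mode mode)
{
	switch (mode) {
	case DEMO_MODE_RGMII:
	case DEMO_MODE_RGMII_ID:
	case DEMO_MODE_RGMII_RXID:
	case DEMO_MODE_RGMII_TXID:
		return true;
	default:
		return false;
	}
}

/* Name used for reporting: the delay variant is not distinguished. */
const char *demo_mode_name(enum demo_phy_mode mode)
{
	if (demo_is_rgmii(mode))
		return "RGMII";

	switch (mode) {
	case DEMO_MODE_MII:
		return "MII";
	case DEMO_MODE_GMII:
		return "GMII";
	case DEMO_MODE_TBI:
		return "TBI";
	default:
		return "unknown";
	}
}
```

Folding the variants into one predicate is a design choice of this sketch; the patch itself lists the extra cases explicitly in each helper.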

* pull-request: bpf 2017-12-17
From: Daniel Borkmann @ 2017-12-17 20:06 UTC (permalink / raw)
  To: davem; +Cc: daniel, ast, netdev

Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a corner case in generic XDP where non-linear skbs with
   enough tailroom were not being linearized, from Song.

2) Fix BPF JIT bugs in s390x and ppc64 to not recache skb data when
   BPF context is not skb, from Daniel.

3) Fix a BPF JIT bug in sparc64 where recaching skb data after helper
   call would use the wrong register for the skb, from Daniel.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit 8c8f67a46f2bf33556ad12a1971734047b60831a:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf (2017-12-13 17:30:04 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to c1b08ebe5003ae291470cb6e26923628ab19606f:

  Merge branch 'bpf-jit-fixes' (2017-12-15 09:19:37 -0800)

----------------------------------------------------------------
Alexei Starovoitov (1):
      Merge branch 'bpf-jit-fixes'

Daniel Borkmann (5):
      bpf, s390x: do not reload skb pointers in non-skb context
      bpf, ppc64: do not reload skb pointers in non-skb context
      bpf: guarantee r1 to be ctx in case of bpf_helper_changes_pkt_data
      bpf, sparc: fix usage of wrong reg for load_skb_regs after call
      bpf: add test case for ld_abs and helper changing pkt data

Song Liu (1):
      xdp: linearize skb in netif_receive_generic_xdp()

 arch/powerpc/net/bpf_jit_comp64.c           |  6 ++--
 arch/s390/net/bpf_jit_comp.c                | 11 ++++----
 arch/sparc/net/bpf_jit_comp_64.c            |  6 ++--
 kernel/bpf/verifier.c                       |  6 ++++
 lib/test_bpf.c                              | 43 +++++++++++++++++++++++++++++
 net/core/dev.c                              |  2 +-
 tools/testing/selftests/bpf/test_verifier.c | 24 ++++++++++++++++
 7 files changed, 87 insertions(+), 11 deletions(-)

* Re: [PATCH] trace: reenable preemption if we modify the ip
From: Daniel Borkmann @ 2017-12-17 19:49 UTC (permalink / raw)
  To: Josef Bacik, netdev, mhiramat, ast, darrick.wong, linux-kernel
  Cc: Josef Bacik
In-Reply-To: <1513392177-10298-1-git-send-email-josef@toxicpanda.com>

On 12/16/2017 03:42 AM, Josef Bacik wrote:
> From: Josef Bacik <jbacik@fb.com>
> 
> Things got moved around between the original bpf_override_return patches
> and the final version, and now the ftrace kprobe dispatcher assumes if
> you modified the ip that you also enabled preemption.  Make a comment of
> this and enable preemption, this fixes the lockdep splat that happened
> when using this feature.
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>

Applied to bpf-next with Fixes tag, thanks Josef.

* Re: [PATCH bpf-next] nfp: set flags in the correct member of netdev_bpf
From: Daniel Borkmann @ 2017-12-17 19:43 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: oss-drivers, alexei.starovoitov
In-Reply-To: <20171216002913.22278-1-jakub.kicinski@netronome.com>

On 12/16/2017 01:29 AM, Jakub Kicinski wrote:
> netdev_bpf.flags is the input member for installing the program.
> netdev_bpf.prog_flags is the output member for querying.  Set
> the correct one on query.
> 
> Fixes: 92f0292b35a0 ("net: xdp: report flags program was installed with on query")
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Yep, netdevsim had this correct. :) Applied to bpf-next, thanks Jakub!

* Re: [PATCH bpf-next] libbpf: fix Makefile exit code if libelf not found
From: Daniel Borkmann @ 2017-12-17 19:41 UTC (permalink / raw)
  To: Jakub Kicinski, netdev; +Cc: oss-drivers, alexei.starovoitov
In-Reply-To: <20171216001930.21836-1-jakub.kicinski@netronome.com>

On 12/16/2017 01:19 AM, Jakub Kicinski wrote:
> /bin/sh's exit does not recognize -1 as a number, leading to
> the following error message:
> 
> /bin/sh: 1: exit: Illegal number: -1
> 
> Use 1 as the exit code.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

Applied to bpf-next, thanks Jakub!

* Re: [PATCH bpf-next 00/13] bpf: introduce function calls
From: Daniel Borkmann @ 2017-12-17 19:38 UTC (permalink / raw)
  To: Alexei Starovoitov, David S . Miller
  Cc: John Fastabend, Edward Cree, Jakub Kicinski, netdev, kernel-team
In-Reply-To: <20171215015517.409513-1-ast@kernel.org>

On 12/15/2017 02:55 AM, Alexei Starovoitov wrote:
> First of all huge thank you to Daniel, John, Jakub, Edward and others who
> reviewed multiple iterations of this patch set over the last many months
> and to Dave and others who gave critical feedback during netconf/netdev.
> 
> The patch is solid enough and we thought through numerous corner cases,
> but it's not the end. More followups with code reorg and features to follow.
> 
> TLDR: Allow arbitrary function calls from bpf function to another bpf function.
> 
> Since the beginning of bpf all bpf programs were represented as a single function
> and program authors were forced to use always_inline for all functions
> in their C code. That was causing llvm to unnecessarily inflate the code size
> and forcing developers to move code to header files with little code reuse.
> 
> With a bit of additional complexity teach verifier to recognize
> arbitrary function calls from one bpf function to another as long as
> all of functions are presented to the verifier as a single bpf program.
> Extended program layout:
> ..
> r1 = ..    // arg1
> r2 = ..    // arg2
> call pc+1  // function call pc-relative
> exit
> .. = r1    // access arg1
> .. = r2    // access arg2
> ..
> call pc+20 // second level of function call
> ...
> 
> It allows for better optimized code and finally allows to introduce
> the core bpf libraries that can be reused in different projects,
> since programs are no longer limited by single elf file.
> With function calls bpf can be compiled into multiple .o files.
> 
> This patch is the first step. It detects programs that contain
> multiple functions and checks that calls between them are valid.
> It splits the sequence of bpf instructions (one program) into a set
> of bpf functions that call each other. Calls to only known
> functions are allowed. Since all functions are presented to
> the verifier at once conceptually it is 'static linking'.
> 
> Future plans:
> - introduce BPF_PROG_TYPE_LIBRARY and allow a set of bpf functions
>   to be loaded into the kernel that can be later linked to other
>   programs with concrete program types. Aka 'dynamic linking'.
> 
> - introduce function pointer type and indirect calls to allow
>   bpf functions call other dynamically loaded bpf functions while
>   the caller bpf function is already executing. Aka 'runtime linking'.
>   This will be more generic and more flexible alternative
>   to bpf_tail_calls.
> 
> FAQ:
> Q: Interpreter and JIT changes mean that new instruction is introduced ?
> A: No. The call instruction technically stays the same. Now it can call
>    both kernel helpers and other bpf functions.
>    Calling convention stays the same as well.
>    From uapi point of view the call insn got new 'relocation' BPF_PSEUDO_CALL
>    similar to BPF_PSEUDO_MAP_FD 'relocation' of bpf_ldimm64 insn.
> 
> Q: What had to change on LLVM side?
> A: Trivial LLVM patch to allow calls was applied to upcoming 6.0 release:
>    https://reviews.llvm.org/rL318614
>    with few bugfixes as well.
>    Make sure to build the latest llvm to have bpf_call support.
> 
> More details in the patches.

Series applied to bpf-next, thanks Alexei!
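
The extended program layout quoted above (arguments in r1/r2, a pc-relative call, then exit) can be mimicked with a toy interpreter. This is a deliberately simplified sketch and not the BPF instruction set: the opcodes, `struct demo_insn` and `demo_run()` are all hypothetical, registers are shared between caller and callee, and the only idea carried over is that the call target is computed relative to the instruction following the call.

```c
#include <assert.h>
#include <stddef.h>

enum demo_op { DEMO_MOV_IMM, DEMO_ADD_REG, DEMO_CALL_REL, DEMO_EXIT };

struct demo_insn {
	enum demo_op op;
	int dst;	/* destination register index */
	int src;	/* source register index (DEMO_ADD_REG only) */
	int imm;	/* immediate, or pc-relative offset for a call */
};

#define DEMO_NREGS	4
#define DEMO_MAX_DEPTH	8

/* Run prog from insn 0. Returns r0 at the outermost exit, or -1 if
 * the program runs off the end or nests calls too deeply. */
long demo_run(const struct demo_insn *prog, size_t n)
{
	long reg[DEMO_NREGS] = { 0 };
	size_t ret_stack[DEMO_MAX_DEPTH];
	size_t depth = 0, pc = 0;

	while (pc < n) {
		const struct demo_insn *in = &prog[pc];

		assert(in->dst >= 0 && in->dst < DEMO_NREGS);
		assert(in->src >= 0 && in->src < DEMO_NREGS);

		switch (in->op) {
		case DEMO_MOV_IMM:
			reg[in->dst] = in->imm;
			pc++;
			break;
		case DEMO_ADD_REG:
			reg[in->dst] += reg[in->src];
			pc++;
			break;
		case DEMO_CALL_REL:
			if (depth == DEMO_MAX_DEPTH)
				return -1;
			ret_stack[depth++] = pc + 1;	/* resume after the call */
			/* offset is relative to the next instruction */
			pc = (size_t)((long)pc + 1 + in->imm);
			break;
		case DEMO_EXIT:
			if (depth == 0)
				return reg[0];	/* outermost exit */
			pc = ret_stack[--depth];
			break;
		}
	}
	return -1;
}
```

With this encoding, a "call pc+1" at index 2 skips the exit at index 3 and lands on the callee at index 4, which matches the shape of the layout in the cover letter.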

* Re: [RFC net-next] sfp/phylink: move module EEPROM ethtool access into netdev core ethtool
From: Russell King - ARM Linux @ 2017-12-17 19:26 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Florian Fainelli, David S. Miller, netdev
In-Reply-To: <20171217182922.GB29596@lunn.ch>

On Sun, Dec 17, 2017 at 07:29:22PM +0100, Andrew Lunn wrote:
> On Sun, Dec 17, 2017 at 02:48:27PM +0000, Russell King wrote:
> > Provide a pointer to the SFP bus in struct net_device, so that the
> > ethtool module EEPROM methods can access the SFP directly, rather
> > than needing every user to provide a hook for it.
> > 
> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> > ---
> > Questions:
> > 1. Is it worth adding a pointer to struct net_device for these two
> >    methods, rather than having multiple duplicate veneers to vector
> >    the ethtool module EEPROM ioctls through to the SFP bus layer?
> > 
> > 2. Should this allow network/phy drivers to override the default -
> >    the code is currently structured to allow phy drivers to override
> >    network drivers implementations, which seems the wrong way around.
> 
> Hi Russell
> 
> Looking at drivers which implement reading the EEPROM, very few of
> them expose the i2c bus, as a linux i2c bus. They seem to send
> commands off to the firmware, and have it return a block of data.  So
> converting to using the generic SFP code is not going to be too easy.
> 
> Probably the low-hanging fruit is to expose a few library-like functions
> for parsing the EEPROM data. As you said, there seem to be a few bugs
> in the drivers with respect to actually interpreting the data. So
> having one central implementation, without bugs, would be good.
> 
> Rather than adding the sfp bus to net_device, I think phylink will get
> more use. And the default implementation of these methods can look at
> the phylink to see if there is an sfp device.

You can't layer phylink on top of phylib on top of phylink for the
situation where we have a SFP cage connected to a PHY - that's the
problem.

SFP needs to know what is happening with the net device, and when
to enable or disable the laser, and although there's notifiers for
the netdev up/down, using that is far from ideal in this case - to
do so, SFP would need the reverse phandle in DT so it knows which
network device it's associated with.  I've already been there with
a previous iteration of the SFP code.

> We are unlikely to be
> able to replace phydev with phylink, but maybe all new 10Gbps PHY and
> fibre modules not hidden behind firmware could use phylink? So having
> phylink in net_device could make sense. There has been a move to
> remove phydev from the drivers private structure and use the one in
> net_device. Maybe we should do the same for phylink?

I would suggest you read the patch that adds SFP support to the 88x3310
PHY driver - that case makes no use of phylink.  As I mention above,
it's not possible to layer phylink on top of phylib.  Not only would
that lead to nested locks, but phy drivers do not have the knowledge
necessary to know when to make various phylink calls, as phylib
drivers have no clue when the network device comes up/goes down for
example.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

* Re: [PATCH] openvswitch: Trim off padding before L3 conntrack processing
From: Pravin Shelar @ 2017-12-17 19:22 UTC (permalink / raw)
  To: Ed Swierk
  Cc: ovs-dev, Linux Kernel Network Developers, Lance Richardson,
	Benjamin Warren, Keith Holleman
In-Reply-To: <CAO_EM_mDa17fQx+--RDSyr1_qKSW7ZKxtt+p3J9yt23F8m_F1w@mail.gmail.com>

On Thu, Dec 14, 2017 at 12:05 PM, Ed Swierk <eswierk@skyportsystems.com> wrote:
> On Wed, Dec 13, 2017 at 4:58 PM, Pravin Shelar <pshelar@ovn.org> wrote:
>> On Tue, Dec 12, 2017 at 8:17 AM, Ed Swierk <eswierk@skyportsystems.com> wrote:
>>> A short IPv4 packet may have up to 6 bytes of padding following the IP
>>> payload when received on an Ethernet device.
>>>
>>> In the normal IPv4 receive path, ip_rcv() trims the packet to
>>> ip_hdr->tot_len before invoking NF_INET_PRE_ROUTING hooks (including
>>> conntrack). Then any subsequent L3+ processing steps, like
>>> nf_checksum(), use skb->len as the length of the packet, rather than
>>> referring back to ip_hdr->tot_len. In the IPv6 receive path, ip6_rcv()
>>> does the same using ipv6_hdr->payload_len.
>>>
>>> In the OVS conntrack receive path, this trimming does not occur, so
>>> the checksum verification in tcp_header() fails, printing "nf_ct_tcp:
>>> bad TCP checksum". Extra zero bytes don't affect the checksum, but the
>>> length in the IP pseudoheader does. That length is based on skb->len,
>>> and without trimming, it doesn't match the length the sender used when
>>> computing the checksum.
>>>
>>> With this change, OVS conntrack trims IPv4 and IPv6 packets prior to
>>> L3 processing.
>>>
>>> Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
>>> ---
>>>  net/openvswitch/conntrack.c | 17 +++++++++++++++++
>>>  1 file changed, 17 insertions(+)
>>>
>>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>>> index d558e882ca0c..3a7c9215c431 100644
>>> --- a/net/openvswitch/conntrack.c
>>> +++ b/net/openvswitch/conntrack.c
>>> @@ -1105,12 +1105,29 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
>>>                    const struct ovs_conntrack_info *info)
>>>  {
>>>         int nh_ofs;
>>> +       unsigned int nh_len;
>>>         int err;
>>>
>>>         /* The conntrack module expects to be working at L3. */
>>>         nh_ofs = skb_network_offset(skb);
>>>         skb_pull_rcsum(skb, nh_ofs);
>>>
>>> +       /* Trim to L3 length since nf_checksum() doesn't expect padding. */
>> Can you explore if nf_checksum can be changed to avoid the padding?
>
> The nf_ip_checksum() and nf_ip6_checksum() helper functions can easily
> be changed to avoid the padding.
>
> My worry is that conntrack is just one of many netfilter hooks that
> perform L3+ processing, and may assume that once skb->data points to
> the L3 header, skb->len reflects the length of the L3 header and
> payload. For example, in nf_conntrack_ftp.c, help() uses skb->len to
> determine the length of the FTP payload and the TCP sequence number of
> the next packet; this would be thrown off by lower-layer padding.
>
> br_netfilter, a cousin of OVS, has always preserved this
> assumption--like ip_rcv() and ip6_rcv(), br_validate_ipv4() and
> br_validate_ipv6() trim the skb to the L3 length before they invoke
> NF_INET_PRE_ROUTING hooks. Modifying OVS to fit the mold seems more
> straightforward than changing this assumption.
>
We could avoid extra processing in the fast path; that's why I wanted to
explore this. But if it is too complex, I am fine with this patch.

>>> +       switch (skb->protocol) {
>>> +       case htons(ETH_P_IP):
>>> +               nh_len = ntohs(ip_hdr(skb)->tot_len);
>>> +               break;
>>> +       case htons(ETH_P_IPV6):
>>> +               nh_len = ntohs(ipv6_hdr(skb)->payload_len)
>>> +                       + sizeof(struct ipv6hdr);
>>> +               break;
>>> +       default:
>>> +               nh_len = skb->len;
>>> +       }
>>> +       err = pskb_trim_rcsum(skb, nh_len);
>>> +       if (err)
>>> +               return err;
>>> +
>> In case of error skb needs to be freed.
>
> Thanks, I will fix this.
>
> --Ed
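
The length calculation in the quoted switch statement can be tried out on raw bytes in userspace. The sketch below is an illustration under stated assumptions: `demo_trimmed_len()` and the `DEMO_*` constants are hypothetical names, the buffer is assumed to start at the L3 header, and a claimed length larger than the buffer simply leaves the length unchanged. The field offsets (IPv4 total length at bytes 2-3, IPv6 payload length at bytes 4-5, 40-byte fixed IPv6 header) are the standard ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_P_IP		0x0800
#define DEMO_ETH_P_IPV6		0x86DD
#define DEMO_IPV4_MIN_HDR_LEN	20
#define DEMO_IPV6_HDR_LEN	40

/* Read a big-endian 16-bit field. */
static uint16_t demo_be16(const uint8_t *p)
{
	return (uint16_t)(((unsigned int)p[0] << 8) | p[1]);
}

/*
 * Return the number of bytes to keep so that the buffer ends where
 * the L3 header says the packet ends. l3 points at the network
 * header and len counts everything after it, padding included.
 * Returns 0 if the buffer is too short to hold the header, and len
 * unchanged for other protocols or when there is nothing to trim.
 */
size_t demo_trimmed_len(uint16_t ethertype, const uint8_t *l3, size_t len)
{
	size_t l3_len;

	assert(l3 != NULL);

	switch (ethertype) {
	case DEMO_ETH_P_IP:
		if (len < DEMO_IPV4_MIN_HDR_LEN)
			return 0;
		l3_len = demo_be16(l3 + 2);	/* total length */
		break;
	case DEMO_ETH_P_IPV6:
		if (len < DEMO_IPV6_HDR_LEN)
			return 0;
		/* payload length excludes the fixed header */
		l3_len = (size_t)demo_be16(l3 + 4) + DEMO_IPV6_HDR_LEN;
		break;
	default:
		return len;
	}
	return l3_len < len ? l3_len : len;
}
```

For a 40-byte IPv4 packet carried in a minimum-size Ethernet frame (46 bytes of payload), this yields 40, which corresponds to the "up to 6 bytes of padding" case in the commit message.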

* Re: [RFC net-next] sfp/phylink: move module EEPROM ethtool access into netdev core ethtool
From: Andrew Lunn @ 2017-12-17 18:29 UTC (permalink / raw)
  To: Russell King; +Cc: Florian Fainelli, David S. Miller, netdev
In-Reply-To: <E1eQaEt-000329-8b@rmk-PC.armlinux.org.uk>

On Sun, Dec 17, 2017 at 02:48:27PM +0000, Russell King wrote:
> Provide a pointer to the SFP bus in struct net_device, so that the
> ethtool module EEPROM methods can access the SFP directly, rather
> than needing every user to provide a hook for it.
> 
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
> ---
> Questions:
> 1. Is it worth adding a pointer to struct net_device for these two
>    methods, rather than having multiple duplicate veneers to vector
>    the ethtool module EEPROM ioctls through to the SFP bus layer?
> 
> 2. Should this allow network/phy drivers to override the default -
>    the code is currently structured to allow phy drivers to override
>    network drivers implementations, which seems the wrong way around.

Hi Russell

Looking at drivers which implement reading the EEPROM, very few of
them expose the i2c bus, as a linux i2c bus. They seem to send
commands off to the firmware, and have it return a block of data.  So
converting to using the generic SFP code is not going to be too easy.

Probably the low-hanging fruit is to expose a few library-like functions
for parsing the EEPROM data. As you said, there seem to be a few bugs
in the drivers with respect to actually interpreting the data. So
having one central implementation, without bugs, would be good.

Rather than adding the sfp bus to net_device, I think phylink will get
more use. And the default implementation of these methods can look at
the phylink to see if there is an sfp device. We are unlikely to be
able to replace phydev with phylink, but maybe all new 10Gbps PHY and
fibre modules not hidden behind firmware could use phylink? So having
phylink in net_device could make sense. There has been a move to
remove phydev from the drivers private structure and use the one in
net_device. Maybe we should do the same for phylink?

    Andrew

* Re: [RFC net-next] sfp/phylink: move module EEPROM ethtool access into netdev core ethtool
From: Russell King - ARM Linux @ 2017-12-17 17:06 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev
In-Reply-To: <E1eQaEt-000329-8b@rmk-PC.armlinux.org.uk>

> Questions:
> 1. Is it worth adding a pointer to struct net_device for these two
>    methods, rather than having multiple duplicate veneers to vector
>    the ethtool module EEPROM ioctls through to the SFP bus layer?
> 
> 2. Should this allow network/phy drivers to override the default -
>    the code is currently structured to allow phy drivers to override
>    network drivers implementations, which seems the wrong way around.

I should also mention that there's another place that having the
sfp bus pointer in the network device comes in handy - the case
where we have a SFP module connected to a PHY rather than the MAC.

In this case, phylink itself is not used to link the SFP to the
netdev, and phylib doesn't provide the necessary hooks into the PHY
driver for the PHY driver to know when the network device comes up
or goes down.  SFP needs to know that to assert/deassert the TX
DISABLE signal to disable the module laser.

Having the net_device structure contain a pointer to the SFP bus
allows phylink or network drivers to directly inform SFP of the
state of the network device, without needing intermediaries to
forward the state.
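
To make that concrete, a small user-space model of the idea, not the
actual kernel code: every demo_* name is hypothetical, and the start/stop
helpers only stand in for whatever the SFP layer does to drive
TX_DISABLE.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the SFP bus; only tracks the TX_DISABLE signal. */
struct demo_sfp_bus {
	int tx_disable;	/* 1 = TX_DISABLE asserted, laser off */
};

/* Stand-in for struct net_device with the proposed pointer. */
struct demo_net_device {
	struct demo_sfp_bus *sfp_bus;	/* NULL when no SFP is attached */
	int up;
};

/* Network device came up: deassert TX_DISABLE. */
static void demo_sfp_start(struct demo_sfp_bus *bus)
{
	bus->tx_disable = 0;
}

/* Network device went down: assert TX_DISABLE. */
static void demo_sfp_stop(struct demo_sfp_bus *bus)
{
	bus->tx_disable = 1;
}

/* Open path: the SFP is told directly, with no intermediary. */
static void demo_dev_open(struct demo_net_device *dev)
{
	dev->up = 1;
	if (dev->sfp_bus)
		demo_sfp_start(dev->sfp_bus);
}

/* Stop path: mirror image of the open path. */
static void demo_dev_stop(struct demo_net_device *dev)
{
	dev->up = 0;
	if (dev->sfp_bus)
		demo_sfp_stop(dev->sfp_bus);
}
```

The NULL check is what keeps devices without an SFP unaffected, whether
the module hangs off the MAC or off a PHY.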

It's possible that this may not be the best approach - the only setup
I'm aware of at present that has the "mac <-> phy <-> sfp" setup is
the Macchiatobin, but if other phys are involved, it may be better
if instead of having PHY drivers having to add support for SFP, we
instead do it in phylib.  The counter argument to that is that SFP
likely needs more in-depth knowledge of the PHY than the generic
phylib parts could know about.

The patches as they currently stand are in my "phy" branch, browsable
via:

 http://git.armlinux.org.uk/cgit/linux-arm.git/log/?h=phy

specifically:

 sfp: use netdev sfp_bus for start/stop
 net: phy: Add SFP support to Marvell 10G PHY driver

The last patch does not (yet) take into account the RX_LOS signal
when determining the link state, which it ought to do to avoid false
link assertions, as can happen when there's noise pickup by the
detector.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up

^ permalink raw reply

* Re: [patch net-next] mlxsw: spectrum: Add "spectrum" prefix macro
From: Joe Perches @ 2017-12-17 16:25 UTC (permalink / raw)
  To: Jiri Pirko, netdev; +Cc: davem, arkadis, idosch, mlxsw
In-Reply-To: <20171217161534.2446-1-jiri@resnulli.us>

On Sun, 2017-12-17 at 17:15 +0100, Jiri Pirko wrote:
> From: Arkadi Sharshevsky <arkadis@mellanox.com>
> 
> Add "spectrum" string prefix macro for error strings.
[]
> diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
[]
> @@ -4168,13 +4168,11 @@ mlxsw_sp_master_lag_check(struct mlxsw_sp *mlxsw_sp,
>  	u16 lag_id;
>  
>  	if (mlxsw_sp_lag_index_get(mlxsw_sp, lag_dev, &lag_id) != 0) {
> -		NL_SET_ERR_MSG(extack,
> -			       "spectrum: Exceeded number of supported LAG devices");
> +		NL_SET_ERR_MSG(extack, MLXSW_SP_PREFIX "Exceeded number of supported LAG devices");

Perhaps use NL_SET_ERR_MSG_MOD instead.

etc...
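
For illustration, a user-space sketch of the idea behind a module-name
prefix macro: the prefix is glued on by string-literal concatenation, so
no per-driver prefix define is needed. The demo_* names are hypothetical
stand-ins rather than the kernel definitions, and DEMO_MODNAME is an
assumed value for what kbuild would normally supply as KBUILD_MODNAME.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the extended ack: just the message pointer. */
struct demo_extack {
	const char *msg;
};

/* Assumption: hard-coded here; kbuild would provide the module name. */
#define DEMO_MODNAME "mlxsw_spectrum"

/* Store a message if the caller passed an extack at all. */
#define DEMO_SET_ERR_MSG(extack, m) do {		\
	struct demo_extack *e_ = (extack);		\
	if (e_)						\
		e_->msg = (m);				\
} while (0)

/* Same, but with "<module name>: " prepended at compile time. */
#define DEMO_SET_ERR_MSG_MOD(extack, m)			\
	DEMO_SET_ERR_MSG((extack), DEMO_MODNAME ": " m)

/* Returns 1 if a LAG id is available, 0 (with a message) otherwise. */
static int demo_lag_check(struct demo_extack *extack, int lag_ids_left)
{
	if (lag_ids_left == 0) {
		DEMO_SET_ERR_MSG_MOD(extack,
				     "Exceeded number of supported LAG devices");
		return 0;
	}
	return 1;
}
```

Note that with this approach the visible prefix follows the module name,
so it would not necessarily read "spectrum: " as in the patch.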

^ permalink raw reply

* [patch net] mlxsw: spectrum_router: Remove batch neighbour deletion causing FW bug
From: Jiri Pirko @ 2017-12-17 16:16 UTC (permalink / raw)
  To: netdev; +Cc: davem, petrm, idosch, mlxsw

From: Petr Machata <petrm@mellanox.com>

This reverts commit 63dd00fa3e524c27cc0509190084ab147ecc8ae2.

RAUHT DELETE_ALL seems to trigger a bug in FW. This manifests as later
calls to RAUHT ADD of an IPv6 neighbor failing with a "bad parameter"
error code.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Fixes: 63dd00fa3e52 ("mlxsw: spectrum_router: Add batch neighbour deletion")
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 72ef4f8..be657b8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -2436,25 +2436,16 @@ static void mlxsw_sp_neigh_fini(struct mlxsw_sp *mlxsw_sp)
 	rhashtable_destroy(&mlxsw_sp->router->neigh_ht);
 }
 
-static int mlxsw_sp_neigh_rif_flush(struct mlxsw_sp *mlxsw_sp,
-				    const struct mlxsw_sp_rif *rif)
-{
-	char rauht_pl[MLXSW_REG_RAUHT_LEN];
-
-	mlxsw_reg_rauht_pack(rauht_pl, MLXSW_REG_RAUHT_OP_WRITE_DELETE_ALL,
-			     rif->rif_index, rif->addr);
-	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rauht), rauht_pl);
-}
-
 static void mlxsw_sp_neigh_rif_gone_sync(struct mlxsw_sp *mlxsw_sp,
 					 struct mlxsw_sp_rif *rif)
 {
 	struct mlxsw_sp_neigh_entry *neigh_entry, *tmp;
 
-	mlxsw_sp_neigh_rif_flush(mlxsw_sp, rif);
 	list_for_each_entry_safe(neigh_entry, tmp, &rif->neigh_list,
-				 rif_list_node)
+				 rif_list_node) {
+		mlxsw_sp_neigh_entry_update(mlxsw_sp, neigh_entry, false);
 		mlxsw_sp_neigh_entry_destroy(mlxsw_sp, neigh_entry);
+	}
 }
 
 enum mlxsw_sp_nexthop_type {
-- 
2.9.5

^ permalink raw reply related

* [patch net-next] mlxsw: spectrum: Add "spectrum" prefix macro
From: Jiri Pirko @ 2017-12-17 16:15 UTC (permalink / raw)
  To: netdev; +Cc: davem, arkadis, idosch, mlxsw

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Add "spectrum" string prefix macro for error strings.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 23 ++++++++++-------------
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |  2 ++
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index d373df7..57e5ab4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4168,13 +4168,11 @@ mlxsw_sp_master_lag_check(struct mlxsw_sp *mlxsw_sp,
 	u16 lag_id;
 
 	if (mlxsw_sp_lag_index_get(mlxsw_sp, lag_dev, &lag_id) != 0) {
-		NL_SET_ERR_MSG(extack,
-			       "spectrum: Exceeded number of supported LAG devices");
+		NL_SET_ERR_MSG(extack, MLXSW_SP_PREFIX "Exceeded number of supported LAG devices");
 		return false;
 	}
 	if (lag_upper_info->tx_type != NETDEV_LAG_TX_TYPE_HASH) {
-		NL_SET_ERR_MSG(extack,
-			       "spectrum: LAG device using unsupported Tx type");
+		NL_SET_ERR_MSG(extack, MLXSW_SP_PREFIX "LAG device using unsupported Tx type");
 		return false;
 	}
 	return true;
@@ -4416,15 +4414,14 @@ static int mlxsw_sp_netdevice_port_upper_event(struct net_device *lower_dev,
 		    !netif_is_lag_master(upper_dev) &&
 		    !netif_is_bridge_master(upper_dev) &&
 		    !netif_is_ovs_master(upper_dev)) {
-			NL_SET_ERR_MSG(extack,
-				       "spectrum: Unknown upper device type");
+			NL_SET_ERR_MSG(extack, MLXSW_SP_PREFIX "Unknown upper device type");
 			return -EINVAL;
 		}
 		if (!info->linking)
 			break;
 		if (netdev_has_any_upper_dev(upper_dev)) {
 			NL_SET_ERR_MSG(extack,
-				       "spectrum: Enslaving a port to a device that already has an upper device is not supported");
+				       MLXSW_SP_PREFIX "Enslaving a port to a device that already has an upper device is not supported");
 			return -EINVAL;
 		}
 		if (netif_is_lag_master(upper_dev) &&
@@ -4433,23 +4430,23 @@ static int mlxsw_sp_netdevice_port_upper_event(struct net_device *lower_dev,
 			return -EINVAL;
 		if (netif_is_lag_master(upper_dev) && vlan_uses_dev(dev)) {
 			NL_SET_ERR_MSG(extack,
-				       "spectrum: Master device is a LAG master and this device has a VLAN");
+				       MLXSW_SP_PREFIX "Master device is a LAG master and this device has a VLAN");
 			return -EINVAL;
 		}
 		if (netif_is_lag_port(dev) && is_vlan_dev(upper_dev) &&
 		    !netif_is_lag_master(vlan_dev_real_dev(upper_dev))) {
 			NL_SET_ERR_MSG(extack,
-				       "spectrum: Can not put a VLAN on a LAG port");
+				       MLXSW_SP_PREFIX "Can not put a VLAN on a LAG port");
 			return -EINVAL;
 		}
 		if (netif_is_ovs_master(upper_dev) && vlan_uses_dev(dev)) {
 			NL_SET_ERR_MSG(extack,
-				       "spectrum: Master device is an OVS master and this device has a VLAN");
+				       MLXSW_SP_PREFIX "Master device is an OVS master and this device has a VLAN");
 			return -EINVAL;
 		}
 		if (netif_is_ovs_port(dev) && is_vlan_dev(upper_dev)) {
 			NL_SET_ERR_MSG(extack,
-				       "spectrum: Can not put a VLAN on an OVS port");
+				       MLXSW_SP_PREFIX "Can not put a VLAN on an OVS port");
 			return -EINVAL;
 		}
 		break;
@@ -4561,13 +4558,13 @@ static int mlxsw_sp_netdevice_port_vlan_event(struct net_device *vlan_dev,
 	case NETDEV_PRECHANGEUPPER:
 		upper_dev = info->upper_dev;
 		if (!netif_is_bridge_master(upper_dev)) {
-			NL_SET_ERR_MSG(extack, "spectrum: VLAN devices only support bridge and VRF uppers");
+			NL_SET_ERR_MSG(extack, MLXSW_SP_PREFIX "VLAN devices only support bridge and VRF uppers");
 			return -EINVAL;
 		}
 		if (!info->linking)
 			break;
 		if (netdev_has_any_upper_dev(upper_dev)) {
-			NL_SET_ERR_MSG(extack, "spectrum: Enslaving a port to a device that already has an upper device is not supported");
+			NL_SET_ERR_MSG(extack, MLXSW_SP_PREFIX "Enslaving a port to a device that already has an upper device is not supported");
 			return -EINVAL;
 		}
 		break;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index a0adcd8..36a0335 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -66,6 +66,8 @@
 #define MLXSW_SP_KVD_LINEAR_SIZE 98304 /* entries */
 #define MLXSW_SP_KVD_GRANULARITY 128
 
+#define MLXSW_SP_PREFIX "spectrum: "
+
 struct mlxsw_sp_port;
 struct mlxsw_sp_rif;
 
-- 
2.9.5

^ permalink raw reply related

* Re: BUG: KASAN: use-after-free in fib_table_flush
From: Ido Schimmel @ 2017-12-17 16:07 UTC (permalink / raw)
  To: Fengguang Wu, alexander.h.duyck
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Linus Torvalds, David Ahern, Jiri Pirko, Ido Schimmel,
	linux-kernel, lkp
In-Reply-To: <20171217125557.ikoybonovlw25u4e@wfg-t540p.sh.intel.com>

+Alexander

On Sun, Dec 17, 2017 at 08:55:57PM +0800, Fengguang Wu wrote:
> Hello,
> 
> FYI this happens in mainline kernel 4.15.0-rc3.
> It looks like a new regression.
> 
> It occurs in 4 out of 28 boots.
> 
> [  166.090516] ==================================================================
> [  166.092419] BUG: KASAN: use-after-free in fib_table_flush+0x76c/0x870:
> 						fib_table_flush at net/ipv4/fib_trie.c:1868
> [  166.092907] Read of size 8 at addr ffff880012fc0b18 by task kworker/u2:3/173
> [  166.093402]
> [  166.093528] CPU: 0 PID: 173 Comm: kworker/u2:3 Not tainted 4.15.0-rc3 #31
> [  166.094018] Workqueue: netns cleanup_net
> [  166.094298] Call Trace:
> [  166.094489]  print_address_description+0xa6/0x370:
> 						print_address_description at mm/kasan/report.c:253
> [  166.094867]  ? fib_table_flush+0x76c/0x870:
> 						fib_table_flush at net/ipv4/fib_trie.c:1868
> [  166.095159]  kasan_report+0x226/0x330:
> 						kasan_report_error at mm/kasan/report.c:352
> 						 (inlined by) kasan_report at mm/kasan/report.c:409
> [  166.095420]  fib_table_flush+0x76c/0x870:
> 						fib_table_flush at net/ipv4/fib_trie.c:1868
> [  166.095698]  ? fib_table_flush_external+0x5a0/0x5a0:
> 						fib_table_flush at net/ipv4/fib_trie.c:1836
> [  166.096067]  ? ip_fib_net_exit+0x94/0x360:
> 						ip_fib_net_exit at net/ipv4/fib_frontend.c:1313 (discriminator 16)
> [  166.096350]  ip_fib_net_exit+0x228/0x360:
> 						ip_fib_net_exit at net/ipv4/fib_frontend.c:1316
> [  166.096629]  ? ip_fib_net_exit+0x360/0x360:
> 						fib_net_exit at net/ipv4/fib_frontend.c:1355
> [  166.096930]  ops_exit_list+0xa8/0x160
> [  166.097233]  cleanup_net+0x414/0x860:
> 						cleanup_net at net/core/net_namespace.c:483 (discriminator 9)
> [  166.097487]  ? net_drop_ns+0x80/0x80:
> 						cleanup_net at net/core/net_namespace.c:439
> [  166.097748]  ? kvm_sched_clock_read+0x5/0x10:
> 						kvm_sched_clock_read at arch/x86/kernel/kvmclock.c:101
> [  166.098051]  ? native_sched_clock_from_tsc+0x40/0x70:
> 						__preempt_count_dec_and_test at arch/x86/include/asm/preempt.h:91
> 						 (inlined by) cyc2ns_read_end at arch/x86/kernel/tsc.c:81
> 						 (inlined by) cycles_2_ns at arch/x86/kernel/tsc.c:135
> 						 (inlined by) native_sched_clock_from_tsc at arch/x86/kernel/tsc.c:219
> [  166.098399]  ? sched_clock_cpu+0xf/0x70:
> 						sched_clock_cpu at kernel/sched/clock.c:363
> [  166.098672]  ? __lock_acquire+0x3b2/0x1fc0
> [  166.099054]  ? lock_downgrade+0x6a0/0x6a0:
> 						lock_release at kernel/locking/lockdep.c:4013
> [  166.099337]  ? lock_acquire+0x117/0x260:
> 						get_current at arch/x86/include/asm/current.h:15
> 						 (inlined by) lock_acquire at kernel/locking/lockdep.c:4006
> [  166.099609]  ? process_one_work+0x70f/0x11c0:
> 						process_one_work at kernel/workqueue.c:2087
> [  166.099938]  process_one_work+0x791/0x11c0:
> 						process_one_work at kernel/workqueue.c:2118
> [  166.100229]  ? kvm_sched_clock_read+0x5/0x10:
> 						kvm_sched_clock_read at arch/x86/kernel/kvmclock.c:101
> [  166.100532]  ? sched_clock+0x2d/0x40:
> 						paravirt_sched_clock at arch/x86/include/asm/paravirt.h:174
> 						 (inlined by) sched_clock at arch/x86/kernel/tsc.c:227
> [  166.100792]  ? cancel_delayed_work_sync+0x20/0x20:
> 						process_one_work at kernel/workqueue.c:2014
> [  166.101123]  worker_thread+0xe8/0x1070:
> 						__read_once_size at include/linux/compiler.h:183
> 						 (inlined by) list_empty at include/linux/list.h:203
> 						 (inlined by) worker_thread at kernel/workqueue.c:2247
> [  166.101392]  ? __kthread_parkme+0x164/0x230:
> 						__kthread_parkme at kernel/kthread.c:188
> [  166.101689]  ? process_one_work+0x11c0/0x11c0:
> 						worker_thread at kernel/workqueue.c:2189
> [  166.102006]  kthread+0x2fd/0x400:
> 						kthread at kernel/kthread.c:238
> [  166.102240]  ? kthread_create_on_node+0xf0/0xf0:
> 						kthread at kernel/kthread.c:198
> [  166.102561]  ret_from_fork+0x1f/0x30:
> 						ret_from_fork at arch/x86/entry/entry_64.S:447
> [  166.102855]
> [  166.102972] Allocated by task 1907:
> [  166.103235]  __kmalloc+0xf6/0x1a0:
> 						__kmalloc at mm/slub.c:3765
> [  166.103475]  fib_trie_table+0xe8/0x240:
> 						fib_trie_table at net/ipv4/fib_trie.c:2081
> [  166.103748]  fib_net_init+0x1bc/0x570:
> 						fib4_rules_init at net/ipv4/fib_frontend.c:59
> 						 (inlined by) ip_fib_net_init at net/ipv4/fib_frontend.c:1287
> 						 (inlined by) fib_net_init at net/ipv4/fib_frontend.c:1335
> [  166.104032]  ops_init+0x1c0/0x360:
> 						ops_init at net/core/net_namespace.c:119
> [  166.104269]  setup_net+0x23c/0x530:
> 						setup_net at net/core/net_namespace.c:296
> [  166.104512]  copy_net_ns+0x170/0x350:
> 						copy_net_ns at net/core/net_namespace.c:420
> [  166.104779]  create_new_namespaces+0x343/0x730:
> 						create_new_namespaces at kernel/nsproxy.c:107
> [  166.105091]  unshare_nsproxy_namespaces+0xa1/0x150:
> 						unshare_nsproxy_namespaces at kernel/nsproxy.c:206 (discriminator 4)
> [  166.105427]  SyS_unshare+0x338/0x6c0
> [  166.105682]  do_syscall_64+0x21f/0xb80:
> 						do_syscall_64 at arch/x86/entry/common.c:285
> [  166.105954]  return_from_SYSCALL_64+0x0/0x65:
> 						return_from_SYSCALL_64 at arch/x86/entry/entry_64.S:259
> [  166.106253]
> [  166.106367] Freed by task 11:
> [  166.106581]  kfree+0x102/0x1d0:
> 						slab_free at mm/slub.c:2973
> 						 (inlined by) kfree at mm/slub.c:3899
> [  166.106838]  rcu_do_batch+0x331/0x7f0:
> 						rcu_lock_release at include/linux/rcupdate.h:249
> 						 (inlined by) __rcu_reclaim at kernel/rcu/rcu.h:196
> 						 (inlined by) rcu_do_batch at kernel/rcu/tree.c:2758
> [  166.107102]  rcu_cpu_kthread+0x12a/0x160:
> 						rcu_preempt_do_callbacks at kernel/rcu/tree_plugin.h:687
> 						 (inlined by) rcu_kthread_do_work at kernel/rcu/tree_plugin.h:1142
> 						 (inlined by) rcu_cpu_kthread at kernel/rcu/tree_plugin.h:1184
> [  166.107381]  smpboot_thread_fn+0x3c1/0x820:
> 						smpboot_thread_fn at kernel/smpboot.c:164
> [  166.107669]  kthread+0x2fd/0x400:
> 						kthread at kernel/kthread.c:238
> [  166.107928]  ret_from_fork+0x1f/0x30:
> 						ret_from_fork at arch/x86/entry/entry_64.S:447
> [  166.108181]
> [  166.108295] The buggy address belongs to the object at ffff880012fc0ae0
> [  166.108295]  which belongs to the cache kmalloc-64 of size 64
> [  166.109179] The buggy address is located 56 bytes inside of
> [  166.109179]  64-byte region [ffff880012fc0ae0, ffff880012fc0b20)

Hi Alexander,

Note that CONFIG_IP_MULTIPLE_TABLES is disabled, so both the main and
local tables are allocated during init and also share the same trie.

I think that what happens is that ip_fib_net_exit() frees the main table
and its trie via an RCU callback which is scheduled before the local
table is iterated over, thus resulting in a use-after-free.

I can reliably trigger the bug by adding synchronize_rcu() at the end of
each iteration of the loop.

The problem goes away if we iterate over the tables in reverse order, which
is symmetric to fib4_rules_init().

What do you think?
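
To illustrate the ordering problem, here is a small user-space model. It
is only a sketch under stated assumptions: the demo_* names are
hypothetical, "freed" is a flag rather than a real kfree(), and the RCU
callback is modelled as running immediately, which matches the
synchronize_rcu() experiment above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NTABLES 2

struct demo_trie {
	int freed;	/* flag instead of free(), keeps the demo well-defined */
};

struct demo_table {
	struct demo_trie *trie;	/* trie walked when this table is flushed */
	int owns_trie;		/* 1 if freeing the table frees the trie too */
};

/* Layout: index 0 is "main" (owns the trie), index 1 is "local" (shares it). */
static void demo_init(struct demo_table *tables, struct demo_trie *trie)
{
	trie->freed = 0;
	tables[0].trie = trie;
	tables[0].owns_trie = 1;
	tables[1].trie = trie;
	tables[1].owns_trie = 0;
}

/* Flush one table; returns 1 if it touched an already freed trie. */
static int demo_flush(const struct demo_table *tb)
{
	return tb->trie->freed;
}

/* Free one table, as if its RCU callback ran right away. */
static void demo_free(struct demo_table *tb)
{
	if (tb->owns_trie)
		tb->trie->freed = 1;
}

/* Tear down all tables; returns the number of use-after-free accesses. */
static int demo_net_exit(struct demo_table *tables, int reverse)
{
	int i, bad = 0;

	for (i = 0; i < DEMO_NTABLES; i++) {
		int idx = reverse ? DEMO_NTABLES - 1 - i : i;

		bad += demo_flush(&tables[idx]);
		demo_free(&tables[idx]);
	}
	return bad;
}
```

In this model the forward walk reports one bad access (the local table
is flushed after the main table's trie is gone), while the reverse walk
reports none.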

^ permalink raw reply

* Re: [patch iproute2] tc: implement filter block sharing to ingress and clsact qdiscs
From: Jiri Pirko @ 2017-12-17 16:05 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew, vivien.didelot,
	f.fainelli, michael.chan, ganeshgr, saeedm, matanb, leonro,
	idosch, jakub.kicinski, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel
In-Reply-To: <20171216101251.69e607f6@xeon-e3>

Sat, Dec 16, 2017 at 07:12:51PM CET, stephen@networkplumber.org wrote:
>On Wed, 13 Dec 2017 16:13:57 +0100
>Jiri Pirko <jiri@resnulli.us> wrote:
>
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>
>This needs to wait until block sharing makes it into net-next upstream.

Sure. I like to send the userspace patch alongside the kernel
patchset so the reviewers have the full view. I hope you don't mind.

^ permalink raw reply

* [PATCH 4/4] bcm63xx_enet: use platform device id directly for miibus name
From: Jonas Gorski @ 2017-12-17 16:02 UTC (permalink / raw)
  To: netdev, linux-mips
  Cc: Ralf Baechle, David S. Miller, Florian Fainelli,
	bcm-kernel-feedback-list
In-Reply-To: <20171217160255.30342-1-jonas.gorski@gmail.com>

Directly use the platform device for generating the miibus name. This
removes the last user of bcm_enet_priv::mac_id and we can remove the
field.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
---
 drivers/net/ethernet/broadcom/bcm63xx_enet.c | 3 +--
 drivers/net/ethernet/broadcom/bcm63xx_enet.h | 3 ---
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index d4519c621d08..1fbbbabe7588 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -1750,7 +1750,6 @@ static int bcm_enet_probe(struct platform_device *pdev)
 	dev->irq = priv->irq = res_irq->start;
 	priv->irq_rx = res_irq_rx->start;
 	priv->irq_tx = res_irq_tx->start;
-	priv->mac_id = pdev->id;
 
 	priv->mac_clk = devm_clk_get(&pdev->dev, "enet");
 	if (IS_ERR(priv->mac_clk)) {
@@ -1818,7 +1817,7 @@ static int bcm_enet_probe(struct platform_device *pdev)
 		bus->priv = priv;
 		bus->read = bcm_enet_mdio_read_phylib;
 		bus->write = bcm_enet_mdio_write_phylib;
-		sprintf(bus->id, "%s-%d", pdev->name, priv->mac_id);
+		sprintf(bus->id, "%s-%d", pdev->name, pdev->id);
 
 		/* only probe bus where we think the PHY is, because
 		 * the mdio read operation return 0 instead of 0xffff
diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.h b/drivers/net/ethernet/broadcom/bcm63xx_enet.h
index 5a66728d4776..1d3c917eb830 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.h
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.h
@@ -193,9 +193,6 @@ struct bcm_enet_mib_counters {
 
 struct bcm_enet_priv {
 
-	/* mac id (from platform device id) */
-	int mac_id;
-
 	/* base remapped address of device */
 	void __iomem *base;
 
-- 
2.13.2

^ permalink raw reply related

* [PATCH 3/4] bcm63xx_enet: remove pointless mac_id check
From: Jonas Gorski @ 2017-12-17 16:02 UTC (permalink / raw)
  To: netdev, linux-mips
  Cc: Ralf Baechle, David S. Miller, Florian Fainelli,
	bcm-kernel-feedback-list
In-Reply-To: <20171217160255.30342-1-jonas.gorski@gmail.com>

Enabling the ephy clock for mac 1 is harmless, and the actual usage of
the ephy is not restricted to mac 0, so we might as well remove the
check.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
---
 drivers/net/ethernet/broadcom/bcm63xx_enet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index e603a6fe6349..d4519c621d08 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -1787,7 +1787,7 @@ static int bcm_enet_probe(struct platform_device *pdev)
 		priv->tx_chan = pd->tx_chan;
 	}
 
-	if (priv->mac_id == 0 && priv->has_phy && !priv->use_external_mii) {
+	if (priv->has_phy && !priv->use_external_mii) {
 		/* using internal PHY, enable clock */
 		priv->phy_clk = devm_clk_get(&pdev->dev, "ephy");
 		if (IS_ERR(priv->phy_clk)) {
-- 
2.13.2

^ permalink raw reply related

* [PATCH 2/4] bcm63xx_enet: use platform data for dma channel numbers
From: Jonas Gorski @ 2017-12-17 16:02 UTC (permalink / raw)
  To: netdev, linux-mips
  Cc: Ralf Baechle, David S. Miller, Florian Fainelli,
	bcm-kernel-feedback-list
In-Reply-To: <20171217160255.30342-1-jonas.gorski@gmail.com>

To reduce the reliance on device ids, pass the dma channel numbers to
the enet devices as platform data.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
---
 arch/mips/bcm63xx/dev-enet.c                          |  8 ++++++++
 arch/mips/include/asm/mach-bcm63xx/bcm63xx_dev_enet.h |  4 ++++
 drivers/net/ethernet/broadcom/bcm63xx_enet.c          | 11 ++---------
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/mips/bcm63xx/dev-enet.c b/arch/mips/bcm63xx/dev-enet.c
index e8284771d620..07b4c65a88a4 100644
--- a/arch/mips/bcm63xx/dev-enet.c
+++ b/arch/mips/bcm63xx/dev-enet.c
@@ -265,6 +265,14 @@ int __init bcm63xx_enet_register(int unit,
 		dpd->dma_chan_width = ENETDMA_CHAN_WIDTH;
 	}
 
+	if (unit == 0) {
+		dpd->rx_chan = 0;
+		dpd->tx_chan = 1;
+	} else {
+		dpd->rx_chan = 2;
+		dpd->tx_chan = 3;
+	}
+
 	ret = platform_device_register(pdev);
 	if (ret)
 		return ret;
diff --git a/arch/mips/include/asm/mach-bcm63xx/bcm63xx_dev_enet.h b/arch/mips/include/asm/mach-bcm63xx/bcm63xx_dev_enet.h
index c0bd47444cff..da39e4d326ba 100644
--- a/arch/mips/include/asm/mach-bcm63xx/bcm63xx_dev_enet.h
+++ b/arch/mips/include/asm/mach-bcm63xx/bcm63xx_dev_enet.h
@@ -55,6 +55,10 @@ struct bcm63xx_enet_platform_data {
 
 	/* DMA descriptor shift */
 	unsigned int dma_desc_shift;
+
+	/* dma channel ids */
+	int rx_chan;
+	int tx_chan;
 };
 
 /*
diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index 5a5886345da2..e603a6fe6349 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -1752,15 +1752,6 @@ static int bcm_enet_probe(struct platform_device *pdev)
 	priv->irq_tx = res_irq_tx->start;
 	priv->mac_id = pdev->id;
 
-	/* get rx & tx dma channel id for this mac */
-	if (priv->mac_id == 0) {
-		priv->rx_chan = 0;
-		priv->tx_chan = 1;
-	} else {
-		priv->rx_chan = 2;
-		priv->tx_chan = 3;
-	}
-
 	priv->mac_clk = devm_clk_get(&pdev->dev, "enet");
 	if (IS_ERR(priv->mac_clk)) {
 		ret = PTR_ERR(priv->mac_clk);
@@ -1792,6 +1783,8 @@ static int bcm_enet_probe(struct platform_device *pdev)
 		priv->dma_chan_width = pd->dma_chan_width;
 		priv->dma_has_sram = pd->dma_has_sram;
 		priv->dma_desc_shift = pd->dma_desc_shift;
+		priv->rx_chan = pd->rx_chan;
+		priv->tx_chan = pd->tx_chan;
 	}
 
 	if (priv->mac_id == 0 && priv->has_phy && !priv->use_external_mii) {
-- 
2.13.2

^ permalink raw reply related

* [PATCH 1/4] bcm63xx_enet: just use "enet" as the clock name
From: Jonas Gorski @ 2017-12-17 16:02 UTC (permalink / raw)
  To: netdev, linux-mips
  Cc: Ralf Baechle, David S. Miller, Florian Fainelli,
	bcm-kernel-feedback-list
In-Reply-To: <20171217160255.30342-1-jonas.gorski@gmail.com>

Now that we have the individual clocks available as "enet" we
don't need to rely on the device id for them anymore.

Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com>
---
 drivers/net/ethernet/broadcom/bcm63xx_enet.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index d9346e2ac720..5a5886345da2 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -1716,7 +1716,6 @@ static int bcm_enet_probe(struct platform_device *pdev)
 	struct bcm63xx_enet_platform_data *pd;
 	struct resource *res_mem, *res_irq, *res_irq_rx, *res_irq_tx;
 	struct mii_bus *bus;
-	const char *clk_name;
 	int i, ret;
 
 	if (!bcm_enet_shared_base[0])
@@ -1757,14 +1756,12 @@ static int bcm_enet_probe(struct platform_device *pdev)
 	if (priv->mac_id == 0) {
 		priv->rx_chan = 0;
 		priv->tx_chan = 1;
-		clk_name = "enet0";
 	} else {
 		priv->rx_chan = 2;
 		priv->tx_chan = 3;
-		clk_name = "enet1";
 	}
 
-	priv->mac_clk = devm_clk_get(&pdev->dev, clk_name);
+	priv->mac_clk = devm_clk_get(&pdev->dev, "enet");
 	if (IS_ERR(priv->mac_clk)) {
 		ret = PTR_ERR(priv->mac_clk);
 		goto out;
-- 
2.13.2

^ permalink raw reply related

* [PATCH 0/4] bcm63xx_enet: remove mac_id usage
From: Jonas Gorski @ 2017-12-17 16:02 UTC (permalink / raw)
  To: netdev, linux-mips
  Cc: Ralf Baechle, David S. Miller, Florian Fainelli,
	bcm-kernel-feedback-list

This patchset aims at reducing the platform device id number usage with
the target of making it eventually possible to probe the driver through OF.

Run-tested on BCM6358.

Since the patches touch mostly net/, they should go through net-next.

Jonas Gorski (4):
  bcm63xx_enet: just use "enet" as the clock name
  bcm63xx_enet: use platform data for dma channel numbers
  bcm63xx_enet: remove pointless mac_id check
  bcm63xx_enet: use platform device id directly for miibus name

 arch/mips/bcm63xx/dev-enet.c                        |  8 ++++++++
 .../include/asm/mach-bcm63xx/bcm63xx_dev_enet.h     |  4 ++++
 drivers/net/ethernet/broadcom/bcm63xx_enet.c        | 21 +++++----------------
 drivers/net/ethernet/broadcom/bcm63xx_enet.h        |  3 ---
 4 files changed, 17 insertions(+), 19 deletions(-)

-- 
1.9.1

^ permalink raw reply

* Re: [PATCH net-next 2/2 v8] net: ethernet: Add a driver for Gemini gigabit ethernet
From: Linus Walleij @ 2017-12-17 15:49 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Michal Miroslaw, Janos Laube, Paulius Zaleckas, Linux ARM,
	Hans Ulli Kroll, Florian Fainelli, Tobias Waldvogel
In-Reply-To: <20171211.141651.2190843744682664766.davem@davemloft.net>

On Mon, Dec 11, 2017 at 8:16 PM, David Miller <davem@davemloft.net> wrote:
> From: Linus Walleij <linus.walleij@linaro.org>

>> +if NET_VENDOR_CORTINA
>> +
>> +config GEMINI_ETHERNET
>> +     tristate "Gemini Gigabit Ethernet support"
>> +     depends on ARCH_GEMINI
>> +     depends on OF
>> +     select PHYLIB
>> +     select CRC32
>> +     ---help---
>> +       This driver supports StorLink SL351x (Gemini) dual Gigabit Ethernet.
>
> Make this driver buildable anywhere, you don't use any platform architecture
> specific features.

I pushed the recent v9 set where I remove the dep on ARCH_GEMINI
and the autobuilders complain a lot about the use of dma_to_pfn()
which turns out to be an ARM thing from <asm/dma-mapping.h>
included from <linux/dma-mapping.h>.

I will try switching to functions from the generic dma-mapping API
and fix it up and send a v10.

Yours,
Linus Walleij

^ permalink raw reply

* [RFC net-next] sfp/phylink: move module EEPROM ethtool access into netdev core ethtool
From: Russell King @ 2017-12-17 14:48 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David S. Miller; +Cc: netdev

Provide a pointer to the SFP bus in struct net_device, so that the
ethtool module EEPROM methods can access the SFP directly, rather
than needing every user to provide a hook for it.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
Questions:
1. Is it worth adding a pointer to struct net_device for these two
   methods, rather than having multiple duplicate veneers to vector
   the ethtool module EEPROM ioctls through to the SFP bus layer?

2. Should this allow network/phy drivers to override the default -
   the code is currently structured to allow phy drivers to override
   network drivers implementations, which seems the wrong way around.

 drivers/net/phy/phylink.c             | 28 ----------------------------
 drivers/net/phy/sfp-bus.c             |  6 ++----
 include/linux/netdevice.h             |  3 +++
 include/linux/phylink.h               |  3 ---
 net/core/ethtool.c                    |  7 +++++++
 5 files changed, 12 insertions(+), 35 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index db5d5726ced9..0f59d7149a61 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -1247,34 +1247,6 @@ int phylink_ethtool_set_pauseparam(struct phylink *pl,
 }
 EXPORT_SYMBOL_GPL(phylink_ethtool_set_pauseparam);
 
-int phylink_ethtool_get_module_info(struct phylink *pl,
-				    struct ethtool_modinfo *modinfo)
-{
-	int ret = -EOPNOTSUPP;
-
-	WARN_ON(!lockdep_rtnl_is_held());
-
-	if (pl->sfp_bus)
-		ret = sfp_get_module_info(pl->sfp_bus, modinfo);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(phylink_ethtool_get_module_info);
-
-int phylink_ethtool_get_module_eeprom(struct phylink *pl,
-				      struct ethtool_eeprom *ee, u8 *buf)
-{
-	int ret = -EOPNOTSUPP;
-
-	WARN_ON(!lockdep_rtnl_is_held());
-
-	if (pl->sfp_bus)
-		ret = sfp_get_module_eeprom(pl->sfp_bus, ee, buf);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(phylink_ethtool_get_module_eeprom);
-
 /**
  * phylink_ethtool_get_eee_err() - read the energy efficient ethernet error
  *   counter
diff --git a/drivers/net/phy/sfp-bus.c b/drivers/net/phy/sfp-bus.c
index 1356dba0d9d3..4d61099b1357 100644
--- a/drivers/net/phy/sfp-bus.c
+++ b/drivers/net/phy/sfp-bus.c
@@ -321,6 +321,7 @@ static int sfp_register_bus(struct sfp_bus *bus)
 	}
 	if (bus->started)
 		bus->socket_ops->start(bus->sfp);
+	bus->netdev->sfp_bus = bus;
 	bus->registered = true;
 	return 0;
 }
@@ -335,6 +336,7 @@ static void sfp_unregister_bus(struct sfp_bus *bus)
 		if (bus->phydev && ops && ops->disconnect_phy)
 			ops->disconnect_phy(bus->upstream);
 	}
+	bus->netdev->sfp_bus = NULL;
 	bus->registered = false;
 }
 
@@ -350,8 +352,6 @@ static void sfp_unregister_bus(struct sfp_bus *bus)
  */
 int sfp_get_module_info(struct sfp_bus *bus, struct ethtool_modinfo *modinfo)
 {
-	if (!bus->registered)
-		return -ENOIOCTLCMD;
 	return bus->socket_ops->module_info(bus->sfp, modinfo);
 }
 EXPORT_SYMBOL_GPL(sfp_get_module_info);
@@ -370,8 +370,6 @@ EXPORT_SYMBOL_GPL(sfp_get_module_info);
 int sfp_get_module_eeprom(struct sfp_bus *bus, struct ethtool_eeprom *ee,
 			  u8 *data)
 {
-	if (!bus->registered)
-		return -ENOIOCTLCMD;
 	return bus->socket_ops->module_eeprom(bus->sfp, ee, data);
 }
 EXPORT_SYMBOL_GPL(sfp_get_module_eeprom);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ef789e1d679e..99a0a155c319 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -57,6 +57,7 @@ struct device;
 struct phy_device;
 struct dsa_port;
 
+struct sfp_bus;
 /* 802.11 specific */
 struct wireless_dev;
 /* 802.15.4 specific */
@@ -1644,6 +1645,7 @@ enum netdev_priv_flags {
  *	@priomap:	XXX: need comments on this one
  *	@phydev:	Physical device may attach itself
  *			for hardware timestamping
+ *	@sfp_bus:	attached &struct sfp_bus structure.
  *
  *	@qdisc_tx_busylock: lockdep class annotating Qdisc->busylock spinlock
  *	@qdisc_running_key: lockdep class annotating Qdisc->running seqcount
@@ -1922,6 +1924,7 @@ struct net_device {
 	struct netprio_map __rcu *priomap;
 #endif
 	struct phy_device	*phydev;
+	struct sfp_bus		*sfp_bus;
 	struct lock_class_key	*qdisc_tx_busylock;
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index bd137c273d38..618fa5e83564 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -211,9 +211,6 @@ void phylink_ethtool_get_pauseparam(struct phylink *,
 				    struct ethtool_pauseparam *);
 int phylink_ethtool_set_pauseparam(struct phylink *,
 				   struct ethtool_pauseparam *);
-int phylink_ethtool_get_module_info(struct phylink *, struct ethtool_modinfo *);
-int phylink_ethtool_get_module_eeprom(struct phylink *,
-				      struct ethtool_eeprom *, u8 *);
 int phylink_get_eee_err(struct phylink *);
 int phylink_ethtool_get_eee(struct phylink *, struct ethtool_eee *);
 int phylink_ethtool_set_eee(struct phylink *, struct ethtool_eee *);
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f8fcf450a36e..86a6b3d05116 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -22,6 +22,7 @@
 #include <linux/bitops.h>
 #include <linux/uaccess.h>
 #include <linux/vmalloc.h>
+#include <linux/sfp.h>
 #include <linux/slab.h>
 #include <linux/rtnetlink.h>
 #include <linux/sched/signal.h>
@@ -2217,6 +2218,9 @@ static int __ethtool_get_module_info(struct net_device *dev,
 	const struct ethtool_ops *ops = dev->ethtool_ops;
 	struct phy_device *phydev = dev->phydev;
 
+	if (dev->sfp_bus)
+		return sfp_get_module_info(dev->sfp_bus, modinfo);
+
 	if (phydev && phydev->drv && phydev->drv->module_info)
 		return phydev->drv->module_info(phydev, modinfo);
 
@@ -2251,6 +2255,9 @@ static int __ethtool_get_module_eeprom(struct net_device *dev,
 	const struct ethtool_ops *ops = dev->ethtool_ops;
 	struct phy_device *phydev = dev->phydev;
 
+	if (dev->sfp_bus)
+		return sfp_get_module_eeprom(dev->sfp_bus, ee, data);
+
 	if (phydev && phydev->drv && phydev->drv->module_eeprom)
 		return phydev->drv->module_eeprom(phydev, ee, data);
 
-- 
2.7.4
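
For readers following the ethtool hunks above, the resulting lookup order
can be modelled outside the kernel. This is only a sketch under stated
assumptions: the *_model types and the sample callbacks are hypothetical
stand-ins for struct net_device, struct sfp_bus, the PHY driver and
struct ethtool_ops, and the final fallback to the driver's ethtool op
(with -EOPNOTSUPP when nothing answers) is assumed from the unchanged
remainder of __ethtool_get_module_info(), which this diff does not show.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only what the lookup touches. */
struct modinfo { int type; };

struct sfp_bus_model     { int (*module_info)(struct modinfo *mi); };
struct phy_drv_model     { int (*module_info)(struct modinfo *mi); };
struct ethtool_ops_model { int (*get_module_info)(struct modinfo *mi); };

struct netdev_model {
	struct sfp_bus_model *sfp_bus;       /* set on bus register, NULL on unregister */
	struct phy_drv_model *phy_drv;       /* models dev->phydev->drv */
	const struct ethtool_ops_model *ops; /* models dev->ethtool_ops */
};

/* Sample callbacks: each tags the result so the caller can see who answered. */
int sfp_info(struct modinfo *mi) { mi->type = 1; return 0; }
int phy_info(struct modinfo *mi) { mi->type = 2; return 0; }
int ops_info(struct modinfo *mi) { mi->type = 3; return 0; }

/* Lookup order after the patch: SFP bus, then PHY driver, then driver op. */
int get_module_info(struct netdev_model *dev, struct modinfo *mi)
{
	if (dev->sfp_bus)
		return dev->sfp_bus->module_info(mi);

	if (dev->phy_drv && dev->phy_drv->module_info)
		return dev->phy_drv->module_info(mi);

	/* Assumed fallback, not part of the hunks shown above. */
	if (dev->ops && dev->ops->get_module_info)
		return dev->ops->get_module_info(mi);

	return -EOPNOTSUPP;
}
```

Because sfp_register_bus() sets the pointer and sfp_unregister_bus() clears
it, a NULL sfp_bus takes over the role of the removed bus->registered
checks: an unregistered bus is simply skipped and the request falls through
to the PHY driver or the MAC driver.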
