Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net] datagram: When peeking datagrams with offset < 0 don't skip empty skbs
From: Paolo Abeni @ 2017-08-16 20:20 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Matthew Dawson, Network Development, Macieira, Thiago
In-Reply-To: <CAF=yD-K8FMicoSS6u-0r_J0p0fTyn4GNwhXn7_gRSSMYmAzw5g@mail.gmail.com>

On Wed, 2017-08-16 at 11:18 -0400, Willem de Bruijn wrote:
> > If I read the above correctly, you are arguining in favor of the
> > addittional flag version, right?
> 
> I was. Though if we are going to thread the argument from the caller
> to __skb_try_recv_from_queue to avoid rereading sk->sk_peek_off,
> on second thought it might be simpler to do it through off:
[...]
> This, of course, requires restricting sk_peek_off to protect against overflow.

Ok, even if I'm not 100% sure overall this will be simpler when adding
also the overflow check.

> If I'm not mistaken, the test in udp_recvmsg currently incorrectly sets
> peeking to false when peeking at offset zero:
> 
>         peeking = off = sk_peek_offset(sk, flags);

I think you are right, does not look correct.

> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -2408,9 +2408,7 @@ EXPORT_SYMBOL(__sk_mem_reclaim);
> > 
> >  int sk_set_peek_off(struct sock *sk, int val)
> >  {
> > -       if (val < 0)
> > -               return -EINVAL;
> > -
> > +       /* a negative value will disable peeking with offset */
> >         sk->sk_peek_off = val;
> >         return 0;
> >  }
> 
> Separate patch to net-next?

Agreed.

Paolo

^ permalink raw reply

* [net-next:master 1065/1071] sockmap.c:undefined reference to `strp_done'
From: kbuild test robot @ 2017-08-16 20:17 UTC (permalink / raw)
  To: John Fastabend; +Cc: kbuild-all, netdev

[-- Attachment #1: Type: text/plain, Size: 1241 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   cf9d01405925e3f8144c99d7bf7b184449794066
commit: 174a79ff9515f400b9a6115643dafd62a635b7e6 [1065/1071] bpf: sockmap with sk redirect support
config: i386-randconfig-n0-201733 (attached as .config)
compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4
reproduce:
        git checkout 174a79ff9515f400b9a6115643dafd62a635b7e6
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/bpf/sockmap.o: In function `smap_gc_work':
>> sockmap.c:(.text+0x28f): undefined reference to `strp_done'
   kernel/bpf/sockmap.o: In function `smap_data_ready':
>> sockmap.c:(.text+0x4cc): undefined reference to `strp_data_ready'
   kernel/bpf/sockmap.o: In function `smap_release_sock':
>> sockmap.c:(.text+0x54c): undefined reference to `strp_stop'
   kernel/bpf/sockmap.o: In function `sock_map_ctx_update_elem.isra.6':
   sockmap.c:(.text+0x9f1): undefined reference to `strp_stop'
>> sockmap.c:(.text+0xa58): undefined reference to `strp_init'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29285 bytes --]

^ permalink raw reply

* Re: [PATCH net] ipv6: fix NULL dereference in ip6_route_dev_notify()
From: Cong Wang @ 2017-08-16 20:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1502912256.4936.150.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, Aug 16, 2017 at 12:37 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2017-08-16 at 12:15 -0700, Eric Dumazet wrote:
>> On Wed, 2017-08-16 at 11:50 -0700, Cong Wang wrote:
>> > On Tue, Aug 15, 2017 at 4:09 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > > From: Eric Dumazet <edumazet@google.com>
>> > >
>> > > Based on a syzkaller report [1], I found that a per cpu allocation
>> > > failure in snmp6_alloc_dev() would then lead to NULL dereference in
>> > > ip6_route_dev_notify().
>> > >
>> > > It seems this is a very old bug, thus no Fixes tag in this submission.
>> >
>> >
>> > It should be introduced by my commit which introduces these
>> > in6_dev_put().
>> >
>>
>> Sorry, which commit exactly ?
>>
>> I got the issue on 4.3 up to latest 4.12
>
> Oh you're right, this was because I had backported
> 242d3a49a2a1a71d8eb9f953db1bcaa9d698ce00 into my trees a while back...
>
> So the bug was only added in 4.12

Yes this is correct.

^ permalink raw reply

* Re: [PATCH v2] net: inet: diag: expose sockets cgroup classid
From: Cong Wang @ 2017-08-16 20:15 UTC (permalink / raw)
  To: Levin, Alexander (Sasha Levin)
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20170816201707.xbpquud3ar3uarst@sasha-lappy>

On Wed, Aug 16, 2017 at 1:13 PM, Levin, Alexander (Sasha Levin)
<alexander.levin@verizon.com> wrote:
> Ping?

I guess you missed the comment saying you need to check
the return value of nla_put_u32().

^ permalink raw reply

* Re: [PATCH v2] net: inet: diag: expose sockets cgroup classid
From: Levin, Alexander (Sasha Levin) @ 2017-08-16 20:13 UTC (permalink / raw)
  To: davem@davemloft.net
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	xiyou.wangcong@gmail.com
In-Reply-To: <20170727181352.18547-1-alexander.levin@verizon.com>

Ping?

On Thu, Jul 27, 2017 at 02:13:52PM -0400, Sasha Levin wrote:
>This is useful for directly looking up a task based on class id rather than
>having to scan through all open file descriptors.
>
>Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
>---
>
>Changes in V2:
> - Addressed comments from Cong Wang (use nla_put_u32())
>
> include/uapi/linux/inet_diag.h |  1 +
> net/ipv4/inet_diag.c           | 10 ++++++++++
> 2 files changed, 11 insertions(+)
>
>diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
>index bbe201047df6..678496897a68 100644
>--- a/include/uapi/linux/inet_diag.h
>+++ b/include/uapi/linux/inet_diag.h
>@@ -142,6 +142,7 @@ enum {
> 	INET_DIAG_PAD,
> 	INET_DIAG_MARK,
> 	INET_DIAG_BBRINFO,
>+	INET_DIAG_CLASS_ID,
> 	__INET_DIAG_MAX,
> };
>
>diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
>index 3828b3a805cd..2c2445d4bb58 100644
>--- a/net/ipv4/inet_diag.c
>+++ b/net/ipv4/inet_diag.c
>@@ -274,6 +274,16 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
> 			goto errout;
> 	}
>
>+	if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
>+		u32 classid = 0;
>+
>+#ifdef CONFIG_SOCK_CGROUP_DATA
>+		classid = sock_cgroup_classid(&sk->sk_cgrp_data);
>+#endif
>+
>+		nla_put_u32(skb, INET_DIAG_CLASS_ID, classid);
>+	}
>+
> out:
> 	nlmsg_end(skb, nlh);
> 	return 0;
>-- 
>2.11.0

-- 

Thanks,
Sasha

^ permalink raw reply

* [net-next:master 1065/1071] net//core/filter.c:1881:8: error: implicit declaration of function '__sock_map_lookup_elem'
From: kbuild test robot @ 2017-08-16 20:04 UTC (permalink / raw)
  To: John Fastabend; +Cc: kbuild-all, netdev

[-- Attachment #1: Type: text/plain, Size: 1817 bytes --]

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   cf9d01405925e3f8144c99d7bf7b184449794066
commit: 174a79ff9515f400b9a6115643dafd62a635b7e6 [1065/1071] bpf: sockmap with sk redirect support
config: cris-etrax-100lx_v2_defconfig (attached as .config)
compiler: cris-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 174a79ff9515f400b9a6115643dafd62a635b7e6
        # save the attached .config to linux build tree
        make.cross ARCH=cris 

All error/warnings (new ones prefixed by >>):

   net//core/filter.c: In function 'do_sk_redirect_map':
>> net//core/filter.c:1881:8: error: implicit declaration of function '__sock_map_lookup_elem' [-Werror=implicit-function-declaration]
      sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
           ^~~~~~~~~~~~~~~~~~~~~~
>> net//core/filter.c:1881:6: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
         ^
   cc1: some warnings being treated as errors

vim +/__sock_map_lookup_elem +1881 net//core/filter.c

  1874	
  1875	struct sock *do_sk_redirect_map(void)
  1876	{
  1877		struct redirect_info *ri = this_cpu_ptr(&redirect_info);
  1878		struct sock *sk = NULL;
  1879	
  1880		if (ri->map) {
> 1881			sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
  1882	
  1883			ri->ifindex = 0;
  1884			ri->map = NULL;
  1885			/* we do not clear flags for future lookup */
  1886		}
  1887	
  1888		return sk;
  1889	}
  1890	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 8731 bytes --]

^ permalink raw reply

* Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
From: Bjorn Helgaas @ 2017-08-16 20:02 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Ding Tianhong, mark.rutland, gabriele.paoloni, asit.k.mallick,
	catalin.marinas, will.deacon, linuxarm, alexander.duyck,
	ashok.raj, eric.dumazet, jeffrey.t.kirsher, linux-pci, ganeshgr,
	Bob.Shaw, leedom, patrick.j.cramer, bhelgaas, werner,
	linux-arm-kernel, amira, netdev, linux-kernel, David.Laight,
	Suravee.Suthikulpanit, robin.murphy, davem, l.stach
In-Reply-To: <20170816193303.GA14147@ulmo>

On Wed, Aug 16, 2017 at 09:33:03PM +0200, Thierry Reding wrote:
> On Tue, Aug 15, 2017 at 12:03:31PM -0500, Bjorn Helgaas wrote:
> > On Tue, Aug 15, 2017 at 11:24:48PM +0800, Ding Tianhong wrote:
> > > Eric report a oops when booting the system after applying
> > > the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
> > > ...
> > 
> > > It looks like the pci_find_pcie_root_port() was trying to
> > > find the Root Port for the PCI device which is the Root
> > > Port already, it will return NULL and trigger the problem,
> > > so check the highest_pcie_bridge to fix thie problem.
> > 
> > The problem was actually with a Root Complex Integrated Endpoint that
> > has no upstream PCIe device:
> > 
> >   00:05.2 System peripheral: Intel Corporation Device 0e2a (rev 04)
> >         Subsystem: Intel Corporation Device 0e2a
> >         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
> >                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> >                         ExtTag- RBE- FLReset-
> >                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported+
> >                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
> >                         MaxPayload 128 bytes, MaxReadReq 128 bytes
> 
> I've started seeing this crash on Tegra K1 as well. Here's the device
> for which it oopses:
> 
> 00:02.0 PCI bridge: NVIDIA Corporation TegraK1 PCIe x1 Bridge (rev a1) (prog-if 00 [Normal decode])
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 391
>         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>         I/O behind bridge: 00001000-00001fff [size=4K]
>         Memory behind bridge: 13000000-130fffff [size=1M]
>         Prefetchable memory behind bridge: 0000000020000000-00000000200fffff [size=1M]
>         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>         BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
>                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>         Capabilities: [40] Subsystem: NVIDIA Corporation TegraK1 PCIe x1 Bridge
>         Capabilities: [48] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
>                 Address: 000000fcfffff000  Data: 0000
>         Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
>                 Mapping Address Base: 00000000fee00000
>         Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0
>                         ExtTag+ RBE+
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>                 LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s, Exit Latency L0s <512ns
>                         ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
>                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
>                         Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
>                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
>                         Control: AttnInd Off, PwrInd On, Power- Interlock-
>                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>                         Changed: MRL- PresDet+ LinkState+
>                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
>                 RootCap: CRSVisible-
>                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>                 DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
>                          AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
>                          AtomicOpsCtl: ReqEn- EgressBlck-
>                 LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
>                          EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>         Kernel driver in use: pcieport
> 
> > > Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
> > 
> > This also
> > 
> > Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
> > 
> > which added pci_find_pcie_root_port().  Prior to this Relaxed Ordering
> > series, we only used pci_find_pcie_root_port() in a Chelsio quirk that
> > only applied to non-integrated endpoints, so we didn't trip over the
> > bug.
> > 
> > > Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > > Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> > > ---
> > >  drivers/pci/pci.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index af0cc34..7e2022f 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
> > >  		bridge = pci_upstream_bridge(bridge);
> > >  	}
> > >  
> > > -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> > > +	if (highest_pcie_bridge &&
> > > +	    pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> > >  		return NULL;
> > >  
> > >  	return highest_pcie_bridge;
> > > -- 
> > 
> > I think structuring the fix as follows is a little more readable:
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index af0cc3456dc1..587cd7623ed8 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -522,10 +522,11 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
> >  		bridge = pci_upstream_bridge(bridge);
> >  	}
> >  
> > -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> > -		return NULL;
> > +	if (highest_pcie_bridge &&
> > +	    pci_pcie_type(highest_pcie_bridge) == PCI_EXP_TYPE_ROOT_PORT)
> > +		return highest_pcie_bridge;
> >  
> > -	return highest_pcie_bridge;
> > +	return NULL;
> >  }
> >  EXPORT_SYMBOL(pci_find_pcie_root_port);
> 
> In case of Tegra, dev actually points to the root port. Now if I read
> the above code correctly, highest_pcie_bridge will still be NULL in that
> case, which in turn will return NULL from pci_find_pcie_root_port(). But
> shouldn't it really return dev?
> 
> The patch that I used to fix the issue is this:
> 
> --->8---
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 2c712dcfd37d..dd56c1c05614 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -514,7 +514,7 @@ EXPORT_SYMBOL(pci_find_resource);
>   */
>  struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>  {
> -       struct pci_dev *bridge, *highest_pcie_bridge = NULL;
> +       struct pci_dev *bridge, *highest_pcie_bridge = dev;
>  
>         bridge = pci_upstream_bridge(dev);
>         while (bridge && pci_is_pcie(bridge)) {
> --->8---
> 
> That works correctly if this function ends up being called on the PCIe
> root port, though perhaps that's not what this function is supposed to
> do. It's somewhat unclear from the kerneldoc what the function should
> be doing when called on a root port device itself.

Your fix looks right to me.

^ permalink raw reply

* Re: [PATCH net-next 1/3] VMCI: only load on VMware hypervisor
From: Jorgen S. Hansen @ 2017-08-16 19:44 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	gregkh@linuxfoundation.org, devel@linuxdriverproject.org,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, George Zhang,
	Michal Kubecek, Asias He, Stefan Hajnoczi, Vitaly Kuznetsov,
	Cathy Avery, jasowang@redhat.com, Rolf Neugebauer, Dave Scott,
	Marcelo Cerri, apw@canonical.com
In-Reply-To: <KL1P15301MB00083C97259FF2A700C78442BF8D0@KL1P15301MB0008.APCP153.PROD.OUTLOOK.COM>

> On Aug 16, 2017, at 12:13 AM, Dexuan Cui <decui@microsoft.com> wrote:
> 
> 
> Without the patch, vmw_vsock_vmci_transport.ko and vmw_vmci.ko can
> automatically load when an application creates an AF_VSOCK socket.
> 
> This is the expected good behavior on VMware hypervisor, but as we
> are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we
> should make sure vmw_vsock_vmci_transport.ko doesn't load on Hyper-V,
> otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko
> and hv_sock.ko try to call vsock_core_init() on Hyper-V.

The VMCI driver (vmw_vmci.ko) is used both by the VMware guest support (VMware Tools primarily) and by our Workstation product. Always disabling the VMCI driver on Hyper-V means that user won’t be able to run Workstation nested in Linux VMs on Hyper-V. Since the VMCI driver itself isn’t the problem here, maybe we could move the check to vmw_vsock_vmci_transport.ko? Ideally, there should be some way for a user to have access to both protocols, but for now disabling the VMCI socket transport for Hyper-V (possibly with a module option to skip that check and always load it) but leaving the VMCI driver functional would be better,

Thanks,
Jorgen

^ permalink raw reply

* [PATCH net v2] net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set
From: Roopa Prabhu @ 2017-08-16 19:38 UTC (permalink / raw)
  To: davem; +Cc: netdev, fw, dsa, idaifish, syzkaller, dvyukov, eric.dumazet

From: Roopa Prabhu <roopa@cumulusnetworks.com>

Syzkaller hit 'general protection fault in fib_dump_info' bug on
commit 4.13-rc5..

Guilty file: net/ipv4/fib_semantics.c

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 2808 Comm: syz-executor0 Not tainted 4.13.0-rc5 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff880078562700 task.stack: ffff880078110000
RIP: 0010:fib_dump_info+0x388/0x1170 net/ipv4/fib_semantics.c:1314
RSP: 0018:ffff880078117010 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 00000000000000fe RCX: 0000000000000002
RDX: 0000000000000006 RSI: ffff880078117084 RDI: 0000000000000030
RBP: ffff880078117268 R08: 000000000000000c R09: ffff8800780d80c8
R10: 0000000058d629b4 R11: 0000000067fce681 R12: 0000000000000000
R13: ffff8800784bd540 R14: ffff8800780d80b5 R15: ffff8800780d80a4
FS:  00000000022fa940(0000) GS:ffff88007fc00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004387d0 CR3: 0000000079135000 CR4: 00000000000006f0
Call Trace:
  inet_rtm_getroute+0xc89/0x1f50 net/ipv4/route.c:2766
  rtnetlink_rcv_msg+0x288/0x680 net/core/rtnetlink.c:4217
  netlink_rcv_skb+0x340/0x470 net/netlink/af_netlink.c:2397
  rtnetlink_rcv+0x28/0x30 net/core/rtnetlink.c:4223
  netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
  netlink_unicast+0x4c4/0x6e0 net/netlink/af_netlink.c:1291
  netlink_sendmsg+0x8c4/0xca0 net/netlink/af_netlink.c:1854
  sock_sendmsg_nosec net/socket.c:633 [inline]
  sock_sendmsg+0xca/0x110 net/socket.c:643
  ___sys_sendmsg+0x779/0x8d0 net/socket.c:2035
  __sys_sendmsg+0xd1/0x170 net/socket.c:2069
  SYSC_sendmsg net/socket.c:2080 [inline]
  SyS_sendmsg+0x2d/0x50 net/socket.c:2076
  entry_SYSCALL_64_fastpath+0x1a/0xa5
  RIP: 0033:0x4512e9
  RSP: 002b:00007ffc75584cc8 EFLAGS: 00000216 ORIG_RAX:
  000000000000002e
  RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00000000004512e9
  RDX: 0000000000000000 RSI: 0000000020f2cfc8 RDI: 0000000000000003
  RBP: 000000000000000e R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000216 R12: fffffffffffffffe
  R13: 0000000000718000 R14: 0000000020c44ff0 R15: 0000000000000000
  Code: 00 0f b6 8d ec fd ff ff 48 8b 85 f0 fd ff ff 88 48 17 48 8b 45
  28 48 8d 78 30 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03
  <0f>
  b6 04 02 84 c0 74 08 3c 03 0f 8e cb 0c 00 00 48 8b 45 28 44
  RIP: fib_dump_info+0x388/0x1170 net/ipv4/fib_semantics.c:1314 RSP:
  ffff880078117010
---[ end trace 254a7af28348f88b ]---

This patch adds a res->fi NULL check.

example run:
$ip route get 0.0.0.0 iif virt1-0
broadcast 0.0.0.0 dev lo
    cache <local,brd> iif virt1-0

$ip route get 0.0.0.0 iif virt1-0 fibmatch
RTNETLINK answers: No route to host

Reported-by: idaifish <idaifish@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: b61798130f1b ("net: ipv4: RTM_GETROUTE: return matched fib result when requested")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
v2 : change error code to -EHOSTUNREACH : suggestion from david ahern

 net/ipv4/route.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 7effa62..468d6a3 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2763,14 +2763,21 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 	if (rtm->rtm_flags & RTM_F_LOOKUP_TABLE)
 		table_id = rt->rt_table_id;
 
-	if (rtm->rtm_flags & RTM_F_FIB_MATCH)
+	if (rtm->rtm_flags & RTM_F_FIB_MATCH) {
+		if (!res.fi) {
+			err = fib_props[res.type].error;
+			if (!err)
+				err = -EHOSTUNREACH;
+			goto errout_free;
+		}
 		err = fib_dump_info(skb, NETLINK_CB(in_skb).portid,
 				    nlh->nlmsg_seq, RTM_NEWROUTE, table_id,
 				    rt->rt_type, res.prefix, res.prefixlen,
 				    fl4.flowi4_tos, res.fi, 0);
-	else
+	} else {
 		err = rt_fill_info(net, dst, src, table_id, &fl4, skb,
 				   NETLINK_CB(in_skb).portid, nlh->nlmsg_seq);
+	}
 	if (err < 0)
 		goto errout_free;
 
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH net] ipv6: fix NULL dereference in ip6_route_dev_notify()
From: Eric Dumazet @ 2017-08-16 19:37 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, netdev
In-Reply-To: <1502910934.4936.142.camel@edumazet-glaptop3.roam.corp.google.com>

On Wed, 2017-08-16 at 12:15 -0700, Eric Dumazet wrote:
> On Wed, 2017-08-16 at 11:50 -0700, Cong Wang wrote:
> > On Tue, Aug 15, 2017 at 4:09 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > >
> > > Based on a syzkaller report [1], I found that a per cpu allocation
> > > failure in snmp6_alloc_dev() would then lead to NULL dereference in
> > > ip6_route_dev_notify().
> > >
> > > It seems this is a very old bug, thus no Fixes tag in this submission.
> > 
> > 
> > It should be introduced by my commit which introduces these
> > in6_dev_put().
> > 
> 
> Sorry, which commit exactly ?
> 
> I got the issue on 4.3 up to latest 4.12

Oh you're right, this was because I had backported
242d3a49a2a1a71d8eb9f953db1bcaa9d698ce00 into my trees a while back...

So the bug was only added in 4.12

^ permalink raw reply

* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: John Fastabend @ 2017-08-16 19:34 UTC (permalink / raw)
  To: Eric Dumazet, David Miller; +Cc: daniel, ast, tgraf, netdev, tom
In-Reply-To: <1502911033.4936.144.camel@edumazet-glaptop3.roam.corp.google.com>

On 08/16/2017 12:17 PM, Eric Dumazet wrote:
> On Wed, 2017-08-16 at 12:13 -0700, David Miller wrote:
>> From: John Fastabend <john.fastabend@gmail.com>
>> Date: Wed, 16 Aug 2017 12:06:36 -0700
>>
>>> On 08/16/2017 11:35 AM, David Miller wrote:
>>>> From: David Miller <davem@davemloft.net>
>>>> Date: Wed, 16 Aug 2017 11:28:19 -0700 (PDT)
>>>>
>>>>> From: John Fastabend <john.fastabend@gmail.com>
>>>>> Date: Tue, 15 Aug 2017 22:30:15 -0700
>>>>>
>>>>>> This series implements a sockmap and socket redirect helper for BPF
>>>>>> using a model similar to XDP netdev redirect.
>>>>>
>>>>> Series applied, thanks John.
>>>>>
>>>>
>>>> We get a legit warning from gcc due to these changes:
>>>>
>>>> kernel/bpf/sockmap.c: In function ‘smap_state_change’:
>>>> kernel/bpf/sockmap.c:156:21: warning: ‘psock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>>>>   struct smap_psock *psock;
>>>>
>>>> It's the default switch case in this function, psock is not initialized
>>>> for sure at this point.
>>>>
>>>
>>> I missed this with older gcc4 it seems. Fixed now and compiling without errors
>>> now with gcc4/5. Below is the diff and all verifier/sockmap tests pass. Want a
>>> v2 I presume?
>>
>> I already pushed out v1, so you'll need to send me a fixup patch.
> 
> I also have a build error.
> 
> $ git grep -n __sock_map_lookup_elem
> include/linux/bpf.h:316:struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
> kernel/bpf/sockmap.c:558:struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
> net/core/filter.c:1881:         sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
> 
> 
> 
> $ make ...
> ...
> net/core/filter.c: In function ‘do_sk_redirect_map’:
> net/core/filter.c:1881:3: error: implicit declaration of function ‘__sock_map_lookup_elem’ [-Werror=implicit-function-declaration]
>    sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
>    ^
> net/core/filter.c:1881:6: warning: assignment makes pointer from integer without a cast [enabled by default]
>    sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
>       ^
> cc1: some warnings being treated as errors
> make[2]: *** [net/core/filter.o] Error 1
> make[2]: *** Waiting for unfinished jobs....
> 
> 

Thanks Eric, I'll have a fix shortly.

^ permalink raw reply

* Re: [PATCH net RESEND] PCI: fix oops when try to find Root Port for a PCI device
From: Thierry Reding @ 2017-08-16 19:33 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ding Tianhong, mark.rutland, gabriele.paoloni, asit.k.mallick,
	catalin.marinas, will.deacon, linuxarm, alexander.duyck,
	ashok.raj, eric.dumazet, jeffrey.t.kirsher, linux-pci, ganeshgr,
	Bob.Shaw, leedom, patrick.j.cramer, bhelgaas, werner,
	linux-arm-kernel, amira, netdev, linux-kernel, David.Laight,
	Suravee.Suthikulpanit, robin.murphy, davem, l.stach
In-Reply-To: <20170815170331.GA4099@bhelgaas-glaptop.roam.corp.google.com>

[-- Attachment #1: Type: text/plain, Size: 8419 bytes --]

On Tue, Aug 15, 2017 at 12:03:31PM -0500, Bjorn Helgaas wrote:
> On Tue, Aug 15, 2017 at 11:24:48PM +0800, Ding Tianhong wrote:
> > Eric report a oops when booting the system after applying
> > the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
> > ...
> 
> > It looks like the pci_find_pcie_root_port() was trying to
> > find the Root Port for the PCI device which is the Root
> > Port already, it will return NULL and trigger the problem,
> > so check the highest_pcie_bridge to fix thie problem.
> 
> The problem was actually with a Root Complex Integrated Endpoint that
> has no upstream PCIe device:
> 
>   00:05.2 System peripheral: Intel Corporation Device 0e2a (rev 04)
>         Subsystem: Intel Corporation Device 0e2a
>         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag- RBE- FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported+
>                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 128 bytes, MaxReadReq 128 bytes

I've started seeing this crash on Tegra K1 as well. Here's the device
for which it oopses:

00:02.0 PCI bridge: NVIDIA Corporation TegraK1 PCIe x1 Bridge (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 391
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00001000-00001fff [size=4K]
        Memory behind bridge: 13000000-130fffff [size=1M]
        Prefetchable memory behind bridge: 0000000020000000-00000000200fffff [size=1M]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Subsystem: NVIDIA Corporation TegraK1 PCIe x1 Bridge
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable+ Count=1/2 Maskable- 64bit+
                Address: 000000fcfffff000  Data: 0000
        Capabilities: [60] HyperTransport: MSI Mapping Enable- Fixed-
                Mapping Address Base: 00000000fee00000
        Capabilities: [80] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag+ RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s, Exit Latency L0s <512ns
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #0, PowerLimit 0.000W; Interlock- NoCompl-
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Off, PwrInd On, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd-
                         AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                         AtomicOpsCtl: ReqEn- EgressBlck-
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Kernel driver in use: pcieport

> > Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
> 
> This also
> 
> Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
> 
> which added pci_find_pcie_root_port().  Prior to this Relaxed Ordering
> series, we only used pci_find_pcie_root_port() in a Chelsio quirk that
> only applied to non-integrated endpoints, so we didn't trip over the
> bug.
> 
> > Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> > ---
> >  drivers/pci/pci.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index af0cc34..7e2022f 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -522,7 +522,8 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
> >  		bridge = pci_upstream_bridge(bridge);
> >  	}
> >  
> > -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> > +	if (highest_pcie_bridge &&
> > +	    pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> >  		return NULL;
> >  
> >  	return highest_pcie_bridge;
> > -- 
> 
> I think structuring the fix as follows is a little more readable:
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index af0cc3456dc1..587cd7623ed8 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -522,10 +522,11 @@ struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
>  		bridge = pci_upstream_bridge(bridge);
>  	}
>  
> -	if (pci_pcie_type(highest_pcie_bridge) != PCI_EXP_TYPE_ROOT_PORT)
> -		return NULL;
> +	if (highest_pcie_bridge &&
> +	    pci_pcie_type(highest_pcie_bridge) == PCI_EXP_TYPE_ROOT_PORT)
> +		return highest_pcie_bridge;
>  
> -	return highest_pcie_bridge;
> +	return NULL;
>  }
>  EXPORT_SYMBOL(pci_find_pcie_root_port);

In case of Tegra, dev actually points to the root port. Now if I read
the above code correctly, highest_pcie_bridge will still be NULL in that
case, which in turn will return NULL from pci_find_pcie_root_port(). But
shouldn't it really return dev?

The patch that I used to fix the issue is this:

--->8---
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2c712dcfd37d..dd56c1c05614 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -514,7 +514,7 @@ EXPORT_SYMBOL(pci_find_resource);
  */
 struct pci_dev *pci_find_pcie_root_port(struct pci_dev *dev)
 {
-       struct pci_dev *bridge, *highest_pcie_bridge = NULL;
+       struct pci_dev *bridge, *highest_pcie_bridge = dev;
 
        bridge = pci_upstream_bridge(dev);
        while (bridge && pci_is_pcie(bridge)) {
--->8---

That works correctly if this function ends up being called on the PCIe
root port, though perhaps that's not what this function is supposed to
do. It's somewhat unclear from the kerneldoc what the function should
be doing when called on a root port device itself.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related

* [PATCH net-next] ipv4: convert dst_metrics.refcnt from atomic_t to refcount_t
From: Eric Dumazet @ 2017-08-16 19:24 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/dst.h        |    2 +-
 net/core/dst.c           |    6 +++---
 net/ipv4/fib_semantics.c |    2 +-
 net/ipv4/route.c         |    4 ++--
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index f73611ec401754d4f52b5310a24da53566dafce6..dd38177c3a61f5c4e48be9d57d4d10d6b7d14672 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -107,7 +107,7 @@ struct dst_entry {
 
 struct dst_metrics {
 	u32		metrics[RTAX_MAX];
-	atomic_t	refcnt;
+	refcount_t	refcnt;
 };
 extern const struct dst_metrics dst_default_metrics;
 
diff --git a/net/core/dst.c b/net/core/dst.c
index 00aa972ad1a1a451c24f3f8211243ad35c19433a..d6ead757c25895da01eb61bc9636c7e9b3cdfb3e 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -55,7 +55,7 @@ const struct dst_metrics dst_default_metrics = {
 	 * We really want to avoid false sharing on this variable, and catch
 	 * any writes on it.
 	 */
-	.refcnt = ATOMIC_INIT(1),
+	.refcnt = REFCOUNT_INIT(1),
 };
 
 void dst_init(struct dst_entry *dst, struct dst_ops *ops,
@@ -213,7 +213,7 @@ u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old)
 		struct dst_metrics *old_p = (struct dst_metrics *)__DST_METRICS_PTR(old);
 		unsigned long prev, new;
 
-		atomic_set(&p->refcnt, 1);
+		refcount_set(&p->refcnt, 1);
 		memcpy(p->metrics, old_p->metrics, sizeof(p->metrics));
 
 		new = (unsigned long) p;
@@ -225,7 +225,7 @@ u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old)
 			if (prev & DST_METRICS_READ_ONLY)
 				p = NULL;
 		} else if (prev & DST_METRICS_REFCOUNTED) {
-			if (atomic_dec_and_test(&old_p->refcnt))
+			if (refcount_dec_and_test(&old_p->refcnt))
 				kfree(old_p);
 		}
 	}
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index d521caf57385fa05f76036708057b95052330cb1..dd2ac3ed82ac86f8b417852233dfea69711e982d 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1090,7 +1090,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
 			kfree(fi);
 			return ERR_PTR(err);
 		}
-		atomic_set(&fi->fib_metrics->refcnt, 1);
+		refcount_set(&fi->fib_metrics->refcnt, 1);
 	} else {
 		fi->fib_metrics = (struct dst_metrics *)&dst_default_metrics;
 	}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d400c05431063fc7bdd15b83ab540acc86decb3d..872b4cb136d3fa0cda403836cc83a156a65310a3 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1398,7 +1398,7 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
 	struct dst_metrics *p = (struct dst_metrics *)DST_METRICS_PTR(dst);
 	struct rtable *rt = (struct rtable *) dst;
 
-	if (p != &dst_default_metrics && atomic_dec_and_test(&p->refcnt))
+	if (p != &dst_default_metrics && refcount_dec_and_test(&p->refcnt))
 		kfree(p);
 
 	if (!list_empty(&rt->rt_uncached)) {
@@ -1456,7 +1456,7 @@ static void rt_set_nexthop(struct rtable *rt, __be32 daddr,
 		dst_init_metrics(&rt->dst, fi->fib_metrics->metrics, true);
 		if (fi->fib_metrics != &dst_default_metrics) {
 			rt->dst._metrics |= DST_METRICS_REFCOUNTED;
-			atomic_inc(&fi->fib_metrics->refcnt);
+			refcount_inc(&fi->fib_metrics->refcnt);
 		}
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		rt->dst.tclassid = nh->nh_tclassid;

^ permalink raw reply related

* Re: [PATCH net V3] openvswitch: fix skb_panic due to the incorrect actions attrlen
From: Pravin Shelar @ 2017-08-16 19:21 UTC (permalink / raw)
  To: Liping Zhang
  Cc: David S. Miller, Linux Kernel Network Developers, Liping Zhang,
	Neil McKee
In-Reply-To: <20170816053007.13991-1-zlpnobody@163.com>

On Tue, Aug 15, 2017 at 10:30 PM, Liping Zhang <zlpnobody@163.com> wrote:
> From: Liping Zhang <zlpnobody@gmail.com>
>
> For sw_flow_actions, the actions_len only represents the kernel part's
> size, and when we dump the actions to the userspace, we will do the
> convertions, so it's true size may become bigger than the actions_len.
>
> But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
> to alloc the skbuff, so the user_skb's size may become insufficient and
> oops will happen like this:
>   skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
>   ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:<NULL>
>   ------------[ cut here ]------------
>   kernel BUG at net/core/skbuff.c:129!
>   [...]
>   Call Trace:
>    <IRQ>
>    [<ffffffff8148be82>] skb_put+0x43/0x44
>    [<ffffffff8148fabf>] skb_zerocopy+0x6c/0x1f4
>    [<ffffffffa0290d36>] queue_userspace_packet+0x3a3/0x448 [openvswitch]
>    [<ffffffffa0292023>] ovs_dp_upcall+0x30/0x5c [openvswitch]
>    [<ffffffffa028d435>] output_userspace+0x132/0x158 [openvswitch]
>    [<ffffffffa01e6890>] ? ip6_rcv_finish+0x74/0x77 [ipv6]
>    [<ffffffffa028e277>] do_execute_actions+0xcc1/0xdc8 [openvswitch]
>    [<ffffffffa028e3f2>] ovs_execute_actions+0x74/0x106 [openvswitch]
>    [<ffffffffa0292130>] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
>    [<ffffffffa0292b77>] ? key_extract+0x63c/0x8d5 [openvswitch]
>    [<ffffffffa029848b>] ovs_vport_receive+0xa1/0xc3 [openvswitch]
>   [...]
>
> Also we can find that the actions_len is much little than the orig_len:
>   crash> struct sw_flow_actions 0xffff8812f539d000
>   struct sw_flow_actions {
>     rcu = {
>       next = 0xffff8812f5398800,
>       func = 0xffffe3b00035db32
>     },
>     orig_len = 1384,
>     actions_len = 592,
>     actions = 0xffff8812f539d01c
>   }
>
> So as a quick fix, use the orig_len instead of the actions_len to alloc
> the user_skb.
>
> Last, this oops happened on our system running a relative old kernel, but
> the same risk still exists on the mainline, since we use the wrong
> actions_len from the beginning.
>
> Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
> Cc: Neil McKee <neil.mckee@inmon.com>
> Signed-off-by: Liping Zhang <zlpnobody@gmail.com>

Looks good.
Acked-by: Pravin B Shelar <pshelar@ovn.org>

^ permalink raw reply

* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: Eric Dumazet @ 2017-08-16 19:17 UTC (permalink / raw)
  To: David Miller; +Cc: john.fastabend, daniel, ast, tgraf, netdev, tom
In-Reply-To: <20170816.121305.1824274693019781357.davem@davemloft.net>

On Wed, 2017-08-16 at 12:13 -0700, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Wed, 16 Aug 2017 12:06:36 -0700
> 
> > On 08/16/2017 11:35 AM, David Miller wrote:
> >> From: David Miller <davem@davemloft.net>
> >> Date: Wed, 16 Aug 2017 11:28:19 -0700 (PDT)
> >> 
> >>> From: John Fastabend <john.fastabend@gmail.com>
> >>> Date: Tue, 15 Aug 2017 22:30:15 -0700
> >>>
> >>>> This series implements a sockmap and socket redirect helper for BPF
> >>>> using a model similar to XDP netdev redirect.
> >>>
> >>> Series applied, thanks John.
> >>>
> >> 
> >> We get a legit warning from gcc due to these changes:
> >> 
> >> kernel/bpf/sockmap.c: In function ‘smap_state_change’:
> >> kernel/bpf/sockmap.c:156:21: warning: ‘psock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >>   struct smap_psock *psock;
> >> 
> >> It's the default switch case in this function, psock is not initialized
> >> for sure at this point.
> >> 
> > 
> > I missed this with older gcc4 it seems. Fixed now and compiling without errors
> > now with gcc4/5. Below is the diff and all verifier/sockmap tests pass. Want a
> > v2 I presume?
> 
> I already pushed out v1, so you'll need to send me a fixup patch.

I also have a build error.

$ git grep -n __sock_map_lookup_elem
include/linux/bpf.h:316:struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
kernel/bpf/sockmap.c:558:struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key)
net/core/filter.c:1881:         sk = __sock_map_lookup_elem(ri->map, ri->ifindex);



$ make ...
...
net/core/filter.c: In function ‘do_sk_redirect_map’:
net/core/filter.c:1881:3: error: implicit declaration of function ‘__sock_map_lookup_elem’ [-Werror=implicit-function-declaration]
   sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
   ^
net/core/filter.c:1881:6: warning: assignment makes pointer from integer without a cast [enabled by default]
   sk = __sock_map_lookup_elem(ri->map, ri->ifindex);
      ^
cc1: some warnings being treated as errors
make[2]: *** [net/core/filter.o] Error 1
make[2]: *** Waiting for unfinished jobs....

^ permalink raw reply

* Re: [PATCH net] ipv6: fix NULL dereference in ip6_route_dev_notify()
From: Eric Dumazet @ 2017-08-16 19:15 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, netdev
In-Reply-To: <CAM_iQpXjhUWPC1goetn2bwG=SPeqWcz+bZvMVWOD_2cbDcG-qw@mail.gmail.com>

On Wed, 2017-08-16 at 11:50 -0700, Cong Wang wrote:
> On Tue, Aug 15, 2017 at 4:09 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Based on a syzkaller report [1], I found that a per cpu allocation
> > failure in snmp6_alloc_dev() would then lead to NULL dereference in
> > ip6_route_dev_notify().
> >
> > It seems this is a very old bug, thus no Fixes tag in this submission.
> 
> 
> It should be introduced by my commit which introduces these
> in6_dev_put().
> 

Sorry, which commit exactly ?

I got the issue on 4.3 up to latest 4.12

^ permalink raw reply

* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: David Miller @ 2017-08-16 19:13 UTC (permalink / raw)
  To: john.fastabend; +Cc: daniel, ast, tgraf, netdev, tom
In-Reply-To: <599497BC.6010205@gmail.com>

From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 16 Aug 2017 12:06:36 -0700

> On 08/16/2017 11:35 AM, David Miller wrote:
>> From: David Miller <davem@davemloft.net>
>> Date: Wed, 16 Aug 2017 11:28:19 -0700 (PDT)
>> 
>>> From: John Fastabend <john.fastabend@gmail.com>
>>> Date: Tue, 15 Aug 2017 22:30:15 -0700
>>>
>>>> This series implements a sockmap and socket redirect helper for BPF
>>>> using a model similar to XDP netdev redirect.
>>>
>>> Series applied, thanks John.
>>>
>> 
>> We get a legit warning from gcc due to these changes:
>> 
>> kernel/bpf/sockmap.c: In function ‘smap_state_change’:
>> kernel/bpf/sockmap.c:156:21: warning: ‘psock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>>   struct smap_psock *psock;
>> 
>> It's the default switch case in this function, psock is not initialized
>> for sure at this point.
>> 
> 
> I missed this with older gcc4 it seems. Fixed now and compiling without errors
> now with gcc4/5. Below is the diff and all verifier/sockmap tests pass. Want a
> v2 I presume?

I already pushed out v1, so you'll need to send me a fixup patch.

^ permalink raw reply

* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: John Fastabend @ 2017-08-16 19:06 UTC (permalink / raw)
  To: David Miller; +Cc: daniel, ast, tgraf, netdev, tom
In-Reply-To: <20170816.113501.1961184995639656444.davem@davemloft.net>

On 08/16/2017 11:35 AM, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 16 Aug 2017 11:28:19 -0700 (PDT)
> 
>> From: John Fastabend <john.fastabend@gmail.com>
>> Date: Tue, 15 Aug 2017 22:30:15 -0700
>>
>>> This series implements a sockmap and socket redirect helper for BPF
>>> using a model similar to XDP netdev redirect.
>>
>> Series applied, thanks John.
>>
> 
> We get a legit warning from gcc due to these changes:
> 
> kernel/bpf/sockmap.c: In function ‘smap_state_change’:
> kernel/bpf/sockmap.c:156:21: warning: ‘psock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>   struct smap_psock *psock;
> 
> It's the default switch case in this function, psock is not initialized
> for sure at this point.
> 

I missed this with older gcc4 it seems. Fixed now and compiling without errors
now with gcc4/5. Below is the diff and all verifier/sockmap tests pass. Want a
v2 I presume?

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 792f0ad..f7e5e6c 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -188,6 +188,9 @@ static void smap_state_change(struct sock *sk)
                        smap_release_sock(sk);
                break;
        default:
+               psock = smap_psock_sk(sk);
+               if (unlikely(!psock))
+                       break;
                smap_report_sk_error(psock, EPIPE);
                break;
        }

^ permalink raw reply related

* Re: [PATCH net-next 1/3] VMCI: only load on VMware hypervisor
From: David Miller @ 2017-08-16 18:54 UTC (permalink / raw)
  To: decui
  Cc: mkubecek, jasowang, georgezhang, sthemmin, rolf.neugebauer,
	marcelo.cerri, asias, dan.carpenter, olaf, haiyangz, dave.scott,
	stefanha, apw, netdev, linux-kernel, gregkh, joe, devel, vkuznets,
	jhansen
In-Reply-To: <KL1P15301MB0008FB89237C3BB7A6D2BB39BF820@KL1P15301MB0008.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com>
Date: Wed, 16 Aug 2017 18:51:36 +0000

> It looks typically modern Linux distros have CONFIG_HYPERVISOR_GUEST=y
> by default

It doesn't matter what any distribution does or does not do.

People are going to do 'randconfig' builds over thousands and
thousands of configuration combinations to test your changes,
and those tests will fail.

> Do you want me to submit a v2 for this patch with the Kconfig change?

Of course, there is no way I can apply your series as-is.

^ permalink raw reply

* RE: [PATCH net-next 1/3] VMCI: only load on VMware hypervisor
From: Dexuan Cui @ 2017-08-16 18:51 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, gregkh@linuxfoundation.org,
	devel@linuxdriverproject.org, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, georgezhang@vmware.com, jhansen@vmware.com,
	mkubecek@suse.cz, asias@redhat.com, stefanha@redhat.com,
	vkuznets@redhat.com, cavery@redhat.com, jasowang@redhat.com,
	rolf.neugebauer@docker.com, dave.scott@docker.com,
	"marcelo.cerri@can
In-Reply-To: <20170816.110635.1062040564700704290.davem@davemloft.net>

> From: David Miller [mailto:davem@davemloft.net]
> Sent: Wednesday, August 16, 2017 11:07
> 
> From: Dexuan Cui <decui@microsoft.com>
> Date: Tue, 15 Aug 2017 22:13:29 +0000
> 
> > +	/*
> > +	 * Check if we are running on VMware's hypervisor and bail out
> > +	 * if we are not.
> > +	 */
> > +	if (x86_hyper != &x86_hyper_vmware)
> > +		return -ENODEV;
> 
> This symbol is only available when CONFIG_HYPERVISOR_GUEST is defined.
> But this driver does not have a Kconfig dependency on that symbol so
> the build can fail in some configurations.

Hi David,
It looks typically modern Linux distros have CONFIG_HYPERVISOR_GUEST=y
by default, but I agree here we should make the dependency explicit:

--- a/drivers/misc/vmw_vmci/Kconfig
+++ b/drivers/misc/vmw_vmci/Kconfig
@@ -4,7 +4,7 @@

 config VMWARE_VMCI
        tristate "VMware VMCI Driver"
-       depends on X86 && PCI
+       depends on X86 && PCI && HYPERVISOR_GUEST
        help
          This is VMware's Virtual Machine Communication Interface.  It enables
          high-speed communication between host and guest in a virtual

And it looks it's not a bad thing to add the dependency, because some
existing VMWare drivers have had the dependency on CONFIG_HYPERVISOR_GUEST=y:
drivers/input/mouse/vmmouse.c (MOUSE_PS2_VMMOUSE)
drivers/misc/vmw_balloon.c (VMWARE_BALLOON)

Do you want me to submit a v2 for this patch with the Kconfig change?

-- Dexuan

^ permalink raw reply

* Re: [PATCH net] ipv6: fix NULL dereference in ip6_route_dev_notify()
From: Cong Wang @ 2017-08-16 18:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1502795391.4936.60.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, Aug 15, 2017 at 4:09 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Based on a syzkaller report [1], I found that a per cpu allocation
> failure in snmp6_alloc_dev() would then lead to NULL dereference in
> ip6_route_dev_notify().
>
> It seems this is a very old bug, thus no Fixes tag in this submission.

It should be introduced by my commit which introduces these
in6_dev_put().

>
> Let's add in6_dev_put_clear() helper, as we will probably use
> it elsewhere (once available/present in net-next)
>

I don't understand why not just check ->rt6i_idev against NULL
for -net?

Ideally, we should let notifiers handle this, when we fail in one
of those notifiers, we should only rollback those succeeded
rather than all of them, but this may require more changes.

Thanks.

^ permalink raw reply

* Re: [PATCH V4 net-next 21/21] net-next/hinic: Add select_queue and netpoll
From: David Miller @ 2017-08-16 18:47 UTC (permalink / raw)
  To: aviad.krawczyk
  Cc: linux-kernel, netdev, bc.y, victor.gissin, zhaochen6, tony.qu
In-Reply-To: <d5f45e39dc7d5ccb0bcb6bd5e2ebea7d6ffaba83.1502883967.git.aviad.krawczyk@huawei.com>

From: Aviad Krawczyk <aviad.krawczyk@huawei.com>
Date: Wed, 16 Aug 2017 20:03:06 +0800

> +static u16 hinic_select_queue(struct net_device *netdev, struct sk_buff *skb,
> +			      void *accel_priv,
> +			      select_queue_fallback_t fallback)
> +{
> +	u16 qid;
> +
> +	if (skb_rx_queue_recorded(skb))
> +		qid = skb_get_rx_queue(skb);
> +	else
> +		qid = fallback(netdev, skb);
> +
> +	return qid;
> +}

This is a NOP, do not implement this function unless you absolutely need to
do something custom in your driver, and you do not.

^ permalink raw reply

* Re: [PATCH net-next 0/2] nfp: process MTU updates from firmware flower app
From: David Miller @ 2017-08-16 18:36 UTC (permalink / raw)
  To: simon.horman; +Cc: jakub.kicinski, netdev, oss-drivers
In-Reply-To: <1502869064-25842-1-git-send-email-simon.horman@netronome.com>

From: Simon Horman <simon.horman@netronome.com>
Date: Wed, 16 Aug 2017 09:37:42 +0200

> The first patch of this series moves processing of control messages from a
> BH handler to a workqueue. That change makes it safe to process MTU
> updates from the firmware which is added by the second patch of this
> series.

Series applied, thanks.

^ permalink raw reply

* Re: [net-next PATCH] bpf: devmap: remove unnecessary value size check
From: David Miller @ 2017-08-16 18:35 UTC (permalink / raw)
  To: john.fastabend; +Cc: netdev
In-Reply-To: <20170816063512.14925.40390.stgit@john-Precision-Tower-5810>

From: John Fastabend <john.fastabend@gmail.com>
Date: Tue, 15 Aug 2017 23:35:12 -0700

> In the devmap alloc map logic we check to ensure that the sizeof the
> values are not greater than KMALLOC_MAX_SIZE. But, in the dev map case
> we ensure the value size is 4bytes earlier in the function because all
> values should be netdev ifindex values.
> 
> The second check is harmless but is not needed so remove it.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Applied.

^ permalink raw reply

* Re: [net-next PATCH 00/10] BPF: sockmap and sk redirect support
From: David Miller @ 2017-08-16 18:35 UTC (permalink / raw)
  To: john.fastabend; +Cc: daniel, ast, tgraf, netdev, tom
In-Reply-To: <20170816.112819.1162151697604053267.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 16 Aug 2017 11:28:19 -0700 (PDT)

> From: John Fastabend <john.fastabend@gmail.com>
> Date: Tue, 15 Aug 2017 22:30:15 -0700
> 
>> This series implements a sockmap and socket redirect helper for BPF
>> using a model similar to XDP netdev redirect.
> 
> Series applied, thanks John.
> 

We get a legit warning from gcc due to these changes:

kernel/bpf/sockmap.c: In function ‘smap_state_change’:
kernel/bpf/sockmap.c:156:21: warning: ‘psock’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  struct smap_psock *psock;

It's the default switch case in this function, psock is not initialized
for sure at this point.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox