Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [bpf-next PATCH v2 10/18] bpf: add verifier tests for BPF_PROG_TYPE_SK_MSG
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192355.8039.74913.stgit@john-Precision-Tower-5810>

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:55 -0700

> Test read and writes for BPF_PROG_TYPE_SK_MSG.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [bpf-next PATCH v2 09/18] bpf: add map tests for BPF_PROG_TYPE_SK_MSG
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192350.8039.60544.stgit@john-Precision-Tower-5810>

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:50 -0700

> Add map tests to attach BPF_PROG_TYPE_SK_MSG types to a sockmap.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [bpf-next PATCH v2 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data
From: David Miller @ 2018-03-15 18:42 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192345.8039.27774.stgit@john-Precision-Tower-5810>

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:45 -0700

> Currently, if a bpf sk msg program is run the program
> can only parse data that the (start,end) pointers already
> consumed. For sendmsg hooks this is likely the first
> scatterlist element. For sendpage this will be the range
> (0,0) because the data is shared with userspace and by
> default we want to avoid allowing userspace to modify
> data while (or after) BPF verdict is being decided.
> 
> To support pulling in additional bytes for parsing use
> a new helper bpf_sk_msg_pull(start, end, flags) which
> works similar to cls tc logic. This helper will attempt
> to point the data start pointer at 'start' bytes offest
> into msg and data end pointer at 'end' bytes offset into
> message.
 ...
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Yeah this looks really nice.

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [bpf-next PATCH v2 07/18] bpf: sockmap, add msg_cork_bytes() helper
From: David Miller @ 2018-03-15 18:41 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192339.8039.54134.stgit@john-Precision-Tower-5810>

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:39 -0700

> In the case where we need a specific number of bytes before a
> verdict can be assigned, even if the data spans multiple sendmsg
> or sendfile calls. The BPF program may use msg_cork_bytes().
> 
> The extreme case is a user can call sendmsg repeatedly with
> 1-byte msg segments. Obviously, this is bad for performance but
> is still valid. If the BPF program needs N bytes to validate
> a header it can use msg_cork_bytes to specify N bytes and the
> BPF program will not be called again until N bytes have been
> accumulated. The infrastructure will attempt to coalesce data
> if possible so in many cases (most my use cases at least) the
> data will be in a single scatterlist element with data pointers
> pointing to start/end of the element. However, this is dependent
> on available memory so is not guaranteed. So BPF programs must
> validate data pointer ranges, but this is the case anyways to
> convince the verifier the accesses are valid.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [bpf-next PATCH v2 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper
From: David Miller @ 2018-03-15 18:41 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192334.8039.98220.stgit@john-Precision-Tower-5810>

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:34 -0700

> A single sendmsg or sendfile system call can contain multiple logical
> messages that a BPF program may want to read and apply a verdict. But,
> without an apply_bytes helper any verdict on the data applies to all
> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
> care to read the first N bytes of a msg. If the payload is large say
> MB or even GB setting up and calling the BPF program repeatedly for
> all bytes, even though the verdict is already known, creates
> unnecessary overhead.
> 
> To allow BPF programs to control how many bytes a given verdict
> applies to we implement a bpf_msg_apply_bytes() helper. When called
> from within a BPF program this sets a counter, internal to the
> BPF infrastructure, that applies the last verdict to the next N
> bytes. If the N is smaller than the current data being processed
> from a sendmsg/sendfile call, the first N bytes will be sent and
> the BPF program will be re-run with start_data pointing to the N+1
> byte. If N is larger than the current data being processed the
> BPF verdict will be applied to multiple sendmsg/sendfile calls
> until N bytes are consumed.
> 
> Note1 if a socket closes with apply_bytes counter non-zero this
> is not a problem because data is not being buffered for N bytes
> and is sent as its received.
> 
> Note2 if this is operating in the sendpage context the data
> pointers may be zeroed after this call if the apply walks beyond
> a msg_pull_data() call specified data range. (helper implemented
> shortly in this series).
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* Re: [bpf-next PATCH v2 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
From: David Miller @ 2018-03-15 18:41 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192329.8039.75277.stgit@john-Precision-Tower-5810>

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:29 -0700

> This implements a BPF ULP layer to allow policy enforcement and
> monitoring at the socket layer. In order to support this a new
> program type BPF_PROG_TYPE_SK_MSG is used to run the policy at
> the sendmsg/sendpage hook. To attach the policy to sockets a
> sockmap is used with a new program attach type BPF_SK_MSG_VERDICT.
 ...
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* [pci PATCH v7 1/5] pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
From: Alexander Duyck @ 2018-03-15 18:41 UTC (permalink / raw)
  To: bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180315183449.3102.64791.stgit@localhost.localdomain>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch adds a common configuration function called
pci_sriov_configure_simple that will allow for managing VFs on devices
where the PF is not capable of managing VF resources.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

v5: New patch replacing pci_sriov_configure_unmanaged with
      pci_sriov_configure_simple
    Dropped bits related to autoprobe changes
v6: Defined pci_sriov_configure_simple as NULL if IOV is disabled
v7: Updated pci_sriov_configure_simple to drop need for err value
    Fixed comment explaining why pci_sriov_configure_simple is NULL

 drivers/pci/iov.c   |   31 +++++++++++++++++++++++++++++++
 include/linux/pci.h |    3 +++
 2 files changed, 34 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 677924ae0350..3e0a7fdff3e9 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -807,3 +807,34 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
 	return dev->sriov->total_VFs;
 }
 EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
+
+/**
+ * pci_sriov_configure_simple - helper to configure unmanaged SR-IOV
+ * @dev: the PCI device
+ * @nr_virtfn: number of virtual functions to enable, 0 to disable
+ *
+ * Used to provide generic enable/disable SR-IOV option for devices
+ * that do not manage the VFs generated by their driver. Return value
+ * is negative on error, or number of VFs allocated on success.
+ */
+int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn)
+{
+	might_sleep();
+
+	if (!dev->is_physfn)
+		return -ENODEV;
+
+	if (pci_vfs_assigned(dev)) {
+		pci_warn(dev,
+			 "Cannot modify SR-IOV while VFs are assigned\n");
+		return -EPERM;
+	}
+
+	if (!nr_virtfn) {
+		sriov_disable(dev);
+		return 0;
+	}
+
+	return sriov_enable(dev, nr_virtfn) ? : nr_virtfn;
+}
+EXPORT_SYMBOL_GPL(pci_sriov_configure_simple);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 024a1beda008..992202449829 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1953,6 +1953,7 @@ static inline void pci_mmcfg_late_init(void) { }
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
+int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe);
 #else
@@ -1980,6 +1981,8 @@ static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
 { return 0; }
 static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
 { return 0; }
+/* this is expected to be used as a function pointer, just define as NULL */
+#define pci_sriov_configure_simple NULL
 static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 { return 0; }
 static inline void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe) { }

^ permalink raw reply related

* Re: [bpf-next PATCH v2 03/18] net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG
From: David Miller @ 2018-03-15 18:40 UTC (permalink / raw)
  To: john.fastabend; +Cc: ast, daniel, davejwatson, netdev
In-Reply-To: <20180312192319.8039.61347.stgit@john-Precision-Tower-5810>

From: John Fastabend <john.fastabend@gmail.com>
Date: Mon, 12 Mar 2018 12:23:19 -0700

> When calling do_tcp_sendpages() from in kernel and we know the data
> has no references from user side we can omit SKBTX_SHARED_FRAG flag.
> This patch adds an internal flag, NO_SKBTX_SHARED_FRAG that can be used
> to omit setting SKBTX_SHARED_FRAG.
> 
> The flag is not exposed to userspace because the sendpage call from
> the splice logic masks out all bits except MSG_MORE.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* [pci PATCH v7 0/5] Add support for unmanaged SR-IOV
From: Alexander Duyck @ 2018-03-15 18:40 UTC (permalink / raw)
  To: bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw

This series is meant to add support for SR-IOV on devices when the VFs are
not managed by the kernel. Examples of recent patches attempting to do this
include:
virto - https://patchwork.kernel.org/patch/10241225/
pci-stub - https://patchwork.kernel.org/patch/10109935/
vfio - https://patchwork.kernel.org/patch/10103353/
uio - https://patchwork.kernel.org/patch/9974031/

Since this is quickly blowing up into a multi-driver problem it is probably
best to implement this solution as generically as possible.

This series is an attempt to do that. What we do with this patch set is
provide a generic framework to enable SR-IOV in the case that the PF driver
doesn't support managing the VFs itself.

I based my patch set originally on the patch by Mark Rustad but there isn't
much left after going through and cleaning out the bits that were no longer
needed, and after incorporating the feedback from David Miller. At this point
the only items to be fully reused was his patch description which is now
present in patch 3 of the set.

This solution is limited in scope to just adding support for devices that
provide no functionality for SR-IOV other than allocating the VFs by
calling pci_enable_sriov. Previous sets had included patches for VFIO, but
for now I am dropping that as the scope of that work is larger then I
think I can take on at this time.

v2: Reduced scope back to just virtio_pci and vfio-pci
    Broke into 3 patch set from single patch
    Changed autoprobe behavior to always set when num_vfs is set non-zero
v3: Updated Documentation to clarify when sriov_unmanaged_autoprobe is used
    Wrapped vfio_pci_sriov_configure to fix build errors w/o SR-IOV in kernel
v4: Dropped vfio-pci patch
    Added ena and nvme to drivers now using pci_sriov_configure_unmanaged
    Dropped pci_disable_sriov call in virtio_pci to be consistent with ena
v5: Dropped sriov_unmanaged_autoprobe and pci_sriov_conifgure_unmanaged
    Added new patch that enables pci_sriov_configure_simple
    Updated drivers to use pci_sriov_configure_simple
v6: Defined pci_sriov_configure_simple as NULL when SR-IOV is not enabled
    Updated drivers to drop "#ifdef" checks for IOV
    Added pci-pf-stub as place for PF-only drivers to add support
v7: Dropped pci_id table explanation from pci-pf-stub driver
    Updated pci_sriov_configure_simple to drop need for err value
    Fixed comment explaining why pci_sriov_configure_simple is NULL

Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Liang-Min Wang <liang-min.wang@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>

---

Alexander Duyck (5):
      pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
      virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
      ena: Migrate over to unmanaged SR-IOV support
      nvme: Migrate over to unmanaged SR-IOV support
      pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs

 drivers/net/ethernet/amazon/ena/ena_netdev.c |   28 -------------
 drivers/nvme/host/pci.c                      |   20 ----------
 drivers/pci/Kconfig                          |   12 ++++++
 drivers/pci/Makefile                         |    2 +
 drivers/pci/iov.c                            |   31 +++++++++++++++
 drivers/pci/pci-pf-stub.c                    |   54 ++++++++++++++++++++++++++
 drivers/virtio/virtio_pci_common.c           |    1 
 include/linux/pci.h                          |    3 +
 include/linux/pci_ids.h                      |    2 +
 9 files changed, 107 insertions(+), 46 deletions(-)
 create mode 100644 drivers/pci/pci-pf-stub.c

--

^ permalink raw reply

* Re: [PATCH v2] sctp: Fix double free in sctp_sendmsg_to_asoc
From: David Miller @ 2018-03-15 18:32 UTC (permalink / raw)
  To: nhorman; +Cc: linux-sctp, netdev, lucien.xin
In-Reply-To: <20180312181525.21774-1-nhorman@tuxdriver.com>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Mon, 12 Mar 2018 14:15:25 -0400

> syzbot/kasan detected a double free in sctp_sendmsg_to_asoc:
> BUG: KASAN: use-after-free in sctp_association_free+0x7b7/0x930
> net/sctp/associola.c:332
> Read of size 8 at addr ffff8801d8006ae0 by task syzkaller914861/4202
 ...
> This was introduced by commit:
> f84af33 sctp: factor out sctp_sendmsg_to_asoc from sctp_sendmsg
> 
> As the newly refactored function moved the wait_for_sndbuf call to a
> point after the association was connected, allowing for peeloff events
> to occur, which in turn caused wait_for_sndbuf to return -EPIPE which
> was not caught by the logic that determines if an association should be
> freed or not.
> 
> Fix it the easy way by returning the ordering of
> sctp_primitive_ASSOCIATE and sctp_wait_for_sndbuf to the old order, to
> ensure that EPIPE will not happen.
> 
> Tested by myself using the syzbot reproducers with positive results
> 
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: davem@davemloft.net
> CC: Xin Long <lucien.xin@gmail.com>
> Reported-by: syzbot+a4e4112c3aff00c8cfd8@syzkaller.appspotmail.com

Applied, thanks Neil.

^ permalink raw reply

* Re: [PATCH] [v2] Bluetooth: btrsi: rework dependencies
From: Marcel Holtmann @ 2018-03-15 18:30 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Johan Hedberg, Kalle Valo, Sebastian Reichel, Amitkumar Karwar,
	Siva Rebbagondla, Linux Bluetooth mailing list, LKML,
	linux-wireless, netdev
In-Reply-To: <20180315165050.2211236-1-arnd@arndb.de>

Hi Arnd,

> The linkage between the bluetooth driver and the wireless
> driver is not defined properly, leading to build problems
> such as:
> 
> warning: (BT_HCIRSI) selects RSI_COEX which has unmet direct dependencies (NETDEVICES && WLAN && WLAN_VENDOR_RSI && BT_HCIRSI && RSI_91X)
> drivers/net/wireless/rsi/rsi_91x_main.o: In function `rsi_read_pkt':
> (.text+0x205): undefined reference to `rsi_bt_ops'
> 
> As the dependency is actually the reverse (RSI_91X uses
> the BT_RSI driver, not the other way round), this changes
> the dependency to match, and enables the bluetooth driver
> from the RSI_COEX symbol.
> 
> Fixes: 38aa4da50483 ("Bluetooth: btrsi: add new rsi bluetooth driver")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> v2: Pick a different from v1
> ---
> drivers/bluetooth/Kconfig        | 4 +---
> drivers/net/wireless/rsi/Kconfig | 4 +++-
> 2 files changed, 4 insertions(+), 4 deletions(-)

Acked-by: Marcel Holtmann <marcel@holtmann.org>

Since I think Kalle still has to take it through his tree until the btrsi driver makes it into net-next.

Regards

Marcel

^ permalink raw reply

* Re: [PATCH v2] net: ethernet: ti: cpsw: add check for in-band mode setting with RGMII PHY interface
From: Florian Fainelli @ 2018-03-15 18:29 UTC (permalink / raw)
  To: Grygorii Strashko, SZ Lin (林上智),
	David S. Miller
  Cc: Schuyler Patton, Ivan Khoronzhuk, Keerthy, Sekhar Nori,
	linux-omap, netdev, linux-kernel
In-Reply-To: <eddbe437-8a9f-c7df-4e68-8a45c6d89cdb@ti.com>

On 03/15/2018 11:18 AM, Grygorii Strashko wrote:
> 
> 
> On 03/15/2018 12:39 PM, Grygorii Strashko wrote:
>>
>>
>> On 03/15/2018 11:56 AM, SZ Lin (林上智) wrote:
>>> According to AM335x TRM[1] 14.3.6.2, AM437x TRM[2] 15.3.6.2 and
>>> DRA7 TRM[3] 24.11.4.8.7.3.3, in-band mode in EXT_EN(bit18) register
>>> is only
>>> available when PHY is configured in RGMII mode with 10Mbps speed. It
>>> will
>>> cause some networking issues without RGMII mode, such as carrier sense
>>> errors and low throughput. TI also mentioned this issue in their
>>> forum[4].
>>>
>>> This patch adds the check mechanism for PHY interface with RGMII
>>> interface
>>> type, the in-band mode can only be set in RGMII mode with 10Mbps speed.
>>>
>>> References:
>>> [1]: https://www.ti.com/lit/ug/spruh73p/spruh73p.pdf
>>> [2]: http://www.ti.com/lit/ug/spruhl7h/spruhl7h.pdf
>>> [3]: http://www.ti.com/lit/ug/spruic2b/spruic2b.pdf
>>> [4]: https://e2e.ti.com/support/arm/sitara_arm/f/791/p/640765/2392155
>>>
>>> Suggested-by: Holsety Chen (陳憲輝) <Holsety.Chen@moxa.com>
>>> Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
>>> Signed-off-by: Schuyler Patton <spatton@ti.com>
>>> ---
>>> Changes from v1:
>>> - Use phy_interface_is_rgmii helper function
>>> - Remove blank line
>>>
>>
>> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
>>
> 
> Also could this be marked as stable material 4.9+?

This is not how it works for networking changes, just make sure you
provide a "Fixes:" tag, and David would usually take care of queueing
the change to -stable accordingly:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/netdev-FAQ.txt#n148
-- 
Florian

^ permalink raw reply

* Re: [PATCH] net: drivers/net: Remove unnecessary skb_copy_expand OOM messages
From: David Miller @ 2018-03-15 18:28 UTC (permalink / raw)
  To: joe
  Cc: inaky.perez-gonzalez, linux-wimax, johannes, pablo, kadlec, fw,
	kvalo, balbi, gregkh, netdev, linux-kernel, linux-usb,
	linux-wireless, netfilter-devel, coreteam
In-Reply-To: <782fa75ab2e443d275f19d97f2f3a073b6074517.1520867179.git.joe@perches.com>

From: Joe Perches <joe@perches.com>
Date: Mon, 12 Mar 2018 08:07:12 -0700

> skb_copy_expand without __GFP_NOWARN already does a dump_stack
> on OOM so these messages are redundant.
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Ok, applied to net-next, thanks.

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH 03/15] ice: Start hardware initialization
From: Venkataramanan, Anirudh @ 2018-03-15 18:27 UTC (permalink / raw)
  To: shannon.nelson@oracle.com, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org
In-Reply-To: <f07d8b58-aea8-c7a9-fc93-ac6d220ade37@oracle.com>

[-- Attachment #1: Type: text/plain, Size: 1733 bytes --]

On Thu, 2018-03-15 at 10:00 -0700, Shannon Nelson wrote:
> On 3/14/2018 3:05 PM, Venkataramanan, Anirudh wrote:
> > On Mon, 2018-03-12 at 19:05 -0700, Shannon Nelson wrote:
> > > On 3/9/2018 9:21 AM, Anirudh Venkataramanan wrote:
> 
> 
> > > > +
> > > > +/**
> > > > + * ice_read_sr_aq - Read Shadow RAM.
> > > > + * @hw: pointer to the HW structure
> > > > + * @offset: offset in words from module start
> > > > + * @words: number of words to read
> > > > + * @data: buffer for words reads from Shadow RAM
> > > > + * @last_command: tells the AdminQ that this is the last
> > > > command
> > > > + *
> > > > + * Reads 16-bit word buffers from the Shadow RAM using the
> > > > admin
> > > > command.
> > > > + */
> > > > +static enum ice_status
> > > > +ice_read_sr_aq(struct ice_hw *hw, u32 offset, u16 words, u16
> > > > *data,
> > > > +	       bool last_command)
> > > > +{
> > > > +	enum ice_status status;
> > > > +
> > > > +	status = ice_check_sr_access_params(hw, offset,
> > > > words);
> > > > +	if (!status)
> > > > +		status = ice_aq_read_nvm(hw, 0, 2 * offset, 2
> > > > *
> > > > words, data,
> > > 
> > > Why the doubling of offset and words?  If this is some general
> > > adjustment made for the AQ interface, it should be made in
> > > ice_aq_read_nvm().  If not, then some explanation is needed here.
> > 
> > ice_read_sr_aq expects a word offset and size in words. The
> > ice_aq_read_nvm interface expects offset and size in bytes. The
> > doubling is a conversion from word offset/size to byte offset/size.
> 
> In that case, this might be a good place for a small comment for
> readers 
> like me who don't have the spec available.

Sure thing! :-)

> 
> sln
> 

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 3302 bytes --]

^ permalink raw reply

* Re: [net-next 0/9][pull request] 40GbE Intel Wired LAN Driver Updates 2018-03-14
From: David Miller @ 2018-03-15 18:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180314204448.13878-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 14 Mar 2018 13:44:39 -0700

> This series contains updates to i40e and i40evf only.

Series applied, thanks Jeff.

^ permalink raw reply

* Re: [PATCH v2] net: ethernet: ti: cpsw: add check for in-band mode setting with RGMII PHY interface
From: Grygorii Strashko @ 2018-03-15 18:18 UTC (permalink / raw)
  To: SZ Lin (林上智), David S. Miller
  Cc: Schuyler Patton, Ivan Khoronzhuk, Keerthy, Sekhar Nori,
	linux-omap, netdev, linux-kernel
In-Reply-To: <b866b57c-2e15-978b-e1e5-5b2c7aa0a06a@ti.com>



On 03/15/2018 12:39 PM, Grygorii Strashko wrote:
> 
> 
> On 03/15/2018 11:56 AM, SZ Lin (林上智) wrote:
>> According to AM335x TRM[1] 14.3.6.2, AM437x TRM[2] 15.3.6.2 and
>> DRA7 TRM[3] 24.11.4.8.7.3.3, in-band mode in EXT_EN(bit18) register is 
>> only
>> available when PHY is configured in RGMII mode with 10Mbps speed. It will
>> cause some networking issues without RGMII mode, such as carrier sense
>> errors and low throughput. TI also mentioned this issue in their 
>> forum[4].
>>
>> This patch adds the check mechanism for PHY interface with RGMII 
>> interface
>> type, the in-band mode can only be set in RGMII mode with 10Mbps speed.
>>
>> References:
>> [1]: https://www.ti.com/lit/ug/spruh73p/spruh73p.pdf
>> [2]: http://www.ti.com/lit/ug/spruhl7h/spruhl7h.pdf
>> [3]: http://www.ti.com/lit/ug/spruic2b/spruic2b.pdf
>> [4]: https://e2e.ti.com/support/arm/sitara_arm/f/791/p/640765/2392155
>>
>> Suggested-by: Holsety Chen (陳憲輝) <Holsety.Chen@moxa.com>
>> Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
>> Signed-off-by: Schuyler Patton <spatton@ti.com>
>> ---
>> Changes from v1:
>> - Use phy_interface_is_rgmii helper function
>> - Remove blank line
>>
> 
> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
> 

Also could this be marked as stable material 4.9+?

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [bug, bisected] pfifo_fast causes packet reordering
From: Jakob Unterwurzacher @ 2018-03-15 18:08 UTC (permalink / raw)
  To: John Fastabend, Dave Taht
  Cc: netdev, linux-kernel, David S. Miller, linux-can@vger.kernel.org,
	Martin Elshuber
In-Reply-To: <95844480-d020-9000-53ef-0da8b965ce6e@gmail.com>

On 14.03.18 05:03, John Fastabend wrote:
> On 03/13/2018 11:35 AM, Dave Taht wrote:
>> On Tue, Mar 13, 2018 at 11:24 AM, Jakob Unterwurzacher
>> <jakob.unterwurzacher@theobroma-systems.com> wrote:
>>> During stress-testing our "ucan" USB/CAN adapter SocketCAN driver on Linux
>>> v4.16-rc4-383-ged58d66f60b3 we observed that a small fraction of packets are
>>> delivered out-of-order.
>>>
> 
> Is the stress-testing tool available somewhere? What type of packets
> are being sent?


I have reproduced it using two USB network cards connected to each 
other. The test tool sends UDP packets containing a counter and listens 
on the other interface, it is available at
https://github.com/jakob-tsd/pfifo_stress/blob/master/pfifo_stress.py

Here is what I get:

root@rk3399-q7:~# ./pfifo_stress.py
[...]
expected ctr 0xcdc, received 0xcdd
expected ctr 0xcde, received 0xcdc
expected ctr 0xcdd, received 0xcde
expected ctr 0xe3c, received 0xe3d
expected ctr 0xe3e, received 0xe3c
expected ctr 0xe3d, received 0xe3e
expected ctr 0x1097, received 0x1098
expected ctr 0x1099, received 0x1097
expected ctr 0x1098, received 0x1099
expected ctr 0x17c0, received 0x17c1
expected ctr 0x17c2, received 0x17c0
[...]

Best regards,
Jakob

^ permalink raw reply

* Re: [PATCH net-next] rds: tcp: must use spin_lock_irq* and not spin_lock_bh with rds_tcp_conn_lock
From: Santosh Shilimkar @ 2018-03-15 17:52 UTC (permalink / raw)
  To: Sowmini Varadhan, netdev; +Cc: davem
In-Reply-To: <1521111266-148947-1-git-send-email-sowmini.varadhan@oracle.com>

On 3/15/2018 3:54 AM, Sowmini Varadhan wrote:
> rds_tcp_connection allocation/free management has the potential to be
> called from __rds_conn_create after IRQs have been disabled, so
> spin_[un]lock_bh cannot be used with rds_tcp_conn_lock.
> 
> Bottom-halves that need to synchronize for critical sections protected
> by rds_tcp_conn_lock should instead use rds_destroy_pending() correctly.
> 
> Reported-by: syzbot+c68e51bb5e699d3f8d91@syzkaller.appspotmail.com
> Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
>         netns/module teardown and rds connection/workq management")
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> ---
Thanks Sowmini for the WARN_ON() discussion off-list.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

^ permalink raw reply

* Re: Fw: [Bug 199109] New: pptp: kernel printk "recursion detected", and then reboot itself
From: Guillaume Nault @ 2018-03-15 17:51 UTC (permalink / raw)
  To: xuheng333, Stephen Hemminger; +Cc: xeb, netdev
In-Reply-To: <20180314080217.1f461de3@xeon-e3>

On Wed, Mar 14, 2018 at 08:02:17AM -0700, Stephen Hemminger wrote:
> 
> 
> Begin forwarded message:
> 
> Date: Wed, 14 Mar 2018 06:56:09 +0000
> From: bugzilla-daemon@bugzilla.kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 199109] New: pptp: kernel printk "recursion detected", and then reboot itself
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=199109
> 
>             Bug ID: 199109
>            Summary: pptp: kernel printk "recursion detected", and then
>                     reboot itself
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 4.9.77
>           Hardware: Mips32
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: high
>           Priority: P1
>          Component: IPV4
>           Assignee: stephen@networkplumber.org
>           Reporter: xuheng333@zoho.com
>         Regression: No
> 
> Created attachment 274715
>   --> https://bugzilla.kernel.org/attachment.cgi?id=274715&action=edit  
> system log
> 
> Use openwrt LEDE, (gcc version 5.5.0 (OpenWrt GCC 5.5.0 r5932-9c4fe10) ).
> When use pptp, make WAN(eth1) down and then make WAN(eth1) up, pptp will redial
> itself. Do this in a loop.
> 
> After a while, kernel print "recursion detected" and "net_ratelimit: 49422
> callbacks suppressed", print this many times, then system reboot.
> 
> When add more printk in driver "driver/net/ppp/ppp_generic.c", fond it fall in
> loop in "static void __ppp_xmit_process(struct ppp *ppp)", which called by
> "static void __ppp_channel_push(struct channel *pch)". 
> __ppp_channel_push() locked ppp->xmit_recursion, and loop for long time( while
> (!ppp->xmit_pending && (skb = skb_dequeue(&ppp->file.xq))) ppp_send_frame(ppp,
> skb); ). Some other thread want to lock ppp->xmit_recursion, but failed.
> 
> 
Hi Xuheng,

Are you sure that your PPTP packets aren't routed back to your PPP
interface? What's the IP address of the PPTP server?

Regards,

Guillaume

^ permalink raw reply

* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
From: Daniel Borkmann @ 2018-03-15 17:48 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Liran Alon, davem, netdev, linux-kernel, idan.brown, Yuval Shaia
In-Reply-To: <20180315175444.02d70f23@halley>

On 03/15/2018 04:54 PM, Shmulik Ladkani wrote:
> On Thu, 15 Mar 2018 16:13:39 +0100 Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 03/15/2018 01:50 PM, Shmulik Ladkani wrote:
>>>
>>> It would be beneficial to have the mark preserved when skb is injected
>>> to the slave device's rx path (especially when it's on the same netns).  
>>
>> Right, I think also here the easiest would be to have a BPF_F_PRESERVE_MARK
>> flag to opt-in in general case (xnet/non-xnet)
> 
> Sounds okay to me.
> 
>> But lets presume for a sec you would _not_ scrub it, then how are users
>> supposed to make use of this? The feature/bug may not be critical enough
>> (well, otherwise it wouldn't have been like this for long time) for stable,
>> so to write an app relying on it the behavior will change from kernel A to
>> kernel B, where you need to end up having a full blown veth run-time test
>> in order to figure it out before you can use it, not really useful either.
> 
> Let's assume BPF_F_PRESERVE_MARK is a feature then, which is available only
> in new kernels.
> As said, this flag will not be honored by older kernels.
> 
> But your "run-time test" argument is true for every new flag-bit
> introduced to bpf functions, for example:
>  BPF_F_SEQ_NUMBER was added after other skb_set_tunnel_key flags,
>  Same for BPF_F_INVALIDATE_HASH (skb_store_bytes), BPF_F_MARK_ENFORCE
>  (l4_csum_replace) and others.
> 
> With every flag addition, the flag mask validation in the corresponding
> bpf function has been relaxed to support it.
> 
> Why is BPF_F_PRESERVE_MARK any different from any previous flag addition?

Not really different than other BPF helpers, but you can simply figure it out
via BPF_PROG_TEST_RUN by calling bpf_redirect() helper on some fake ifindex
and check the return value whether it's shot or redirect. This is definitely
trivial to do and doesn't require any devs to set up compared to otherwise
full blown setup. E.g. test_verifier uses this for the test case array it
contains.

Cheers,
Daniel

^ permalink raw reply

* Re: [Intel-wired-lan] [RFC PATCH 0/2] net:setup XPS mapping for each online CPU
From: Alexander Duyck @ 2018-03-15 17:47 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Eric Dumazet, netdev, intel-wired-lan, David Miller
In-Reply-To: <1521134431.2681.26.camel@redhat.com>

On Thu, Mar 15, 2018 at 10:20 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> Hi,
>
> On Thu, 2018-03-15 at 15:59 +0000, Eric Dumazet wrote:
>> On Thu, Mar 15, 2018 at 8:51 AM Paolo Abeni <pabeni@redhat.com> wrote:
>> > I'm sorry, I do not follow. AFAICS with unconnected sockets without XPS
>> > we always hit the netdev_pick_tx()/skb_tx_hash()/skb_flow_dissect()
>> > overhead in xmit path.
>>
>> Then fix this if you want, instead of fixing one NIC only, or by enforcing
>> XPS by all NIC.
>>
>> For unconnected sockets, picking the TX queue based on current cpu is good,
>> we do not have to enforce ordering as much as possible.
>>
>> (pfifo_fast no longer can enforce it anyway)
>
> Thank you for the prompt reply.
>
> I'm double checking to avoid misinterpretation on my side: are you
> suggesting to plug a CPU-based selection logic for unconnected sockets
> in netdev_pick_tx() or to cook patches like 2/2 for all the relevant
> NICs?
>
> Thanks!
>
> Paolo

We just need to watch out for any possible side effects. For example
using XPS on a virtualization host has been problematic as you end up
with the traffic getting reordered if the VM jumps from CPU to CPU.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
From: Stephen Hemminger @ 2018-03-15 17:40 UTC (permalink / raw)
  To: Mohammed Gamal
  Cc: netdev, sthemmin, otubo, linux-kernel, devel, vkuznets, davem
In-Reply-To: <1521131053.30828.1.camel@redhat.com>

On Thu, 15 Mar 2018 17:24:13 +0100
Mohammed Gamal <mgamal@redhat.com> wrote:

> On Wed, 2018-03-14 at 10:22 +0100, Mohammed Gamal wrote:
> > On Tue, 2018-03-13 at 12:35 -0700, Stephen Hemminger wrote:  
> > > On Tue, 13 Mar 2018 20:06:50 +0100
> > > Mohammed Gamal <mgamal@redhat.com> wrote:
> > >   
> > > > Dring high network traffic changes to network interface
> > > > parameters
> > > > such as number of channels or MTU can cause a kernel panic with a
> > > > NULL
> > > > pointer dereference. This is due to netvsc_device_remove() being
> > > > called and deallocating the channel ring buffers, which can then
> > > > be
> > > > accessed by netvsc_send_pkt() before they're allocated on calling
> > > > netvsc_device_add()
> > > > 
> > > > The patch fixes this problem by checking the channel state and
> > > > returning
> > > > ENODEV if not yet opened. We also move the call to
> > > > hv_ringbuf_avail_percent()
> > > > which may access the uninitialized ring buffer.
> > > > 
> > > > Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
> > > > ---
> > > >  drivers/net/hyperv/netvsc.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/net/hyperv/netvsc.c
> > > > b/drivers/net/hyperv/netvsc.c
> > > > index 0265d70..44a8358 100644
> > > > --- a/drivers/net/hyperv/netvsc.c
> > > > +++ b/drivers/net/hyperv/netvsc.c
> > > > @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt(
> > > >  	struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > > > packet->q_idx);
> > > >  	u64 req_id;
> > > >  	int ret;
> > > > -	u32 ring_avail = hv_ringbuf_avail_percent(&out_channel-  
> > > > > outbound);  
> > > > 
> > > > +	u32 ring_avail;
> > > >  
> > > >  	nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
> > > >  	if (skb)
> > > > @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt(
> > > >  
> > > >  	req_id = (ulong)skb;
> > > >  
> > > > -	if (out_channel->rescind)
> > > > +	if (out_channel->rescind || out_channel->state !=
> > > > CHANNEL_OPENED_STATE)
> > > >  		return -ENODEV;
> > > >  
> > > >  	if (packet->page_buf_cnt) {
> > > > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt(
> > > >  				       VMBUS_DATA_PACKET_FLAG_CO
> > > > MP
> > > > LETION_REQUESTED);
> > > >  	}
> > > >  
> > > > +	ring_avail = hv_ringbuf_avail_percent(&out_channel-  
> > > > > outbound);  
> > > > 
> > > >  	if (ret == 0) {
> > > >  		atomic_inc_return(&nvchan->queue_sends);
> > > >    
> > > 
> > > Thanks for your patch. Yes there are races with the current update
> > > logic. The root cause goes higher up in the flow; the send queues
> > > should
> > > be stopped before netvsc_device_remove is called. Solving it where
> > > you tried
> > > to is racy and not going to work reliably.
> > > 
> > > Network patches should go to netdev@vger.kernel.org
> > > 
> > > You can't move the ring_avail check until after the
> > > vmbus_sendpacket
> > > because
> > > that will break the flow control logic.
> > >   
> > 
> > Why? I don't see ring_avail being used before that point.  
> 
> Ah, stupid me. vmbus_sendpacket() will write to the ring buffer and
> that means that ring_avail value will be different than the expected.
> 
> >   
> > > Instead, you should just move the avail_read check until just after
> > > the existing rescind
> > > check.
> > > 
> > > Also, you shouldn't need to check for OPENED_STATE, just rescind is
> > > enough.  
> > 
> > That rarely mitigated the race. channel->rescind flag is set on vmbus
> > exit - called on module unload - and when a rescind offer is received
> > from the host, which AFAICT doesn't happen on every call to
> > netvsc_device_remove, so it's quite possible that the ringbuffer is
> > accessed before it's allocated again on channel open and hence the
> > check for OPENED_STAT - which is only set after all vmbus data is
> > initialized.
> >   
> 
> Perhaps I haven't been clear enough. The NULL pointer dereference
> happens in the call to hv_ringbuf_avail_percent() which is used to
> calculate ring_avail. 
> 
> So we need to stop the queues before calling it if the channel's ring
> buffers haven't been allocated yet, but OTOH we should only stop the
> queues based upon the value of ring_avail, so this leads into a chicken
> and egg situation. 
> 
> Is my observation here correct? Please correct me if I am wrong,
> Stephen.

This is a far more drastic work of the shutdown logic which I am still working
on. The current netvsc driver is not doing a good job of handling concurrent
send/receives during ring changes.


From a22da18b41ad5024340dddcc989d918987836f6d Mon Sep 17 00:00:00 2001
From: Stephen Hemminger <sthemmin@microsoft.com>
Date: Tue, 6 Feb 2018 15:05:19 -0800
Subject: [PATCH] hv_netvsc: common detach logic

Make common function for detaching internals of device
during changes to MTU and RSS. Make sure no more packets
are transmitted and all packets have been received before
doing device teardown.

Changes transmit enabling logic so that transmit queues are disabled
during the period when lower device is being changed. And enabled
only after sub channels are setup. This avoids issue where it could
be that a packet was being sent while subchannel was not initialized.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/hyperv_net.h   |   1 -
 drivers/net/hyperv/netvsc.c       |  20 +--
 drivers/net/hyperv/netvsc_drv.c   | 268 +++++++++++++++++++++-----------------
 drivers/net/hyperv/rndis_filter.c |  14 +-
 4 files changed, 160 insertions(+), 143 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index cd538d5a7986..32861036c3fc 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -212,7 +212,6 @@ void netvsc_channel_cb(void *context);
 int netvsc_poll(struct napi_struct *napi, int budget);
 
 void rndis_set_subchannel(struct work_struct *w);
-bool rndis_filter_opened(const struct netvsc_device *nvdev);
 int rndis_filter_open(struct netvsc_device *nvdev);
 int rndis_filter_close(struct netvsc_device *nvdev);
 struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 8c95a3797b2f..31b1c6c430bb 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -573,8 +573,6 @@ void netvsc_device_remove(struct hv_device *device)
 		= rtnl_dereference(net_device_ctx->nvdev);
 	int i;
 
-	cancel_work_sync(&net_device->subchan_work);
-
 	netvsc_revoke_buf(device, net_device);
 
 	RCU_INIT_POINTER(net_device_ctx->nvdev, NULL);
@@ -661,14 +659,18 @@ static void netvsc_send_tx_complete(struct netvsc_device *net_device,
 	queue_sends =
 		atomic_dec_return(&net_device->chan_table[q_idx].queue_sends);
 
-	if (net_device->destroy && queue_sends == 0)
-		wake_up(&net_device->wait_drain);
+	if (unlikely(net_device->destroy)) {
+		if (queue_sends == 0)
+			wake_up(&net_device->wait_drain);
+	} else {
+		struct netdev_queue *txq = netdev_get_tx_queue(ndev, q_idx);
 
-	if (netif_tx_queue_stopped(netdev_get_tx_queue(ndev, q_idx)) &&
-	    (hv_ringbuf_avail_percent(&channel->outbound) > RING_AVAIL_PERCENT_HIWATER ||
-	     queue_sends < 1)) {
-		netif_tx_wake_queue(netdev_get_tx_queue(ndev, q_idx));
-		ndev_ctx->eth_stats.wake_queue++;
+		if (netif_tx_queue_stopped(txq) &&
+		    (hv_ringbuf_avail_percent(&channel->outbound) > RING_AVAIL_PERCENT_HIWATER ||
+		     queue_sends < 1)) {
+			netif_tx_wake_queue(txq);
+			ndev_ctx->eth_stats.wake_queue++;
+		}
 	}
 }
 
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index faea0be18924..721fac7cad81 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -46,7 +46,8 @@
 
 #include "hyperv_net.h"
 
-#define RING_SIZE_MIN		64
+#define RING_SIZE_MIN	64
+#define RETRY_MAX	2000
 
 #define LINKCHANGE_INT (2 * HZ)
 #define VF_TAKEOVER_INT (HZ / 10)
@@ -123,10 +124,8 @@ static int netvsc_open(struct net_device *net)
 	}
 
 	rdev = nvdev->extension;
-	if (!rdev->link_state) {
+	if (!rdev->link_state)
 		netif_carrier_on(net);
-		netif_tx_wake_all_queues(net);
-	}
 
 	if (vf_netdev) {
 		/* Setting synthetic device up transparently sets
@@ -142,36 +141,25 @@ static int netvsc_open(struct net_device *net)
 	return 0;
 }
 
-static int netvsc_close(struct net_device *net)
+static int netvsc_wait_until_empty(struct netvsc_device *nvdev)
 {
-	struct net_device_context *net_device_ctx = netdev_priv(net);
-	struct net_device *vf_netdev
-		= rtnl_dereference(net_device_ctx->vf_netdev);
-	struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
-	int ret = 0;
-	u32 aread, i, msec = 10, retry = 0, retry_max = 20;
-	struct vmbus_channel *chn;
-
-	netif_tx_disable(net);
-
-	/* No need to close rndis filter if it is removed already */
-	if (!nvdev)
-		goto out;
-
-	ret = rndis_filter_close(nvdev);
-	if (ret != 0) {
-		netdev_err(net, "unable to close device (ret %d).\n", ret);
-		return ret;
-	}
+	unsigned int retry = 0;
+	int i;
 
 	/* Ensure pending bytes in ring are read */
-	while (true) {
-		aread = 0;
+	for (;;) {
+		u32 aread = 0;
+
 		for (i = 0; i < nvdev->num_chn; i++) {
-			chn = nvdev->chan_table[i].channel;
+			struct vmbus_channel *chn
+				= nvdev->chan_table[i].channel;
+
 			if (!chn)
 				continue;
 
+			/* make sure receive not running now */
+			napi_synchronize(&nvdev->chan_table[i].napi);
+
 			aread = hv_get_bytes_to_read(&chn->inbound);
 			if (aread)
 				break;
@@ -181,22 +169,40 @@ static int netvsc_close(struct net_device *net)
 				break;
 		}
 
-		retry++;
-		if (retry > retry_max || aread == 0)
-			break;
+		if (aread == 0)
+			return 0;
 
-		msleep(msec);
+		if (++retry > RETRY_MAX)
+			return -ETIMEDOUT;
 
-		if (msec < 1000)
-			msec *= 2;
+		usleep_range(1000, 2000);
 	}
+}
 
-	if (aread) {
-		netdev_err(net, "Ring buffer not empty after closing rndis\n");
-		ret = -ETIMEDOUT;
+static int netvsc_close(struct net_device *net)
+{
+	struct net_device_context *net_device_ctx = netdev_priv(net);
+	struct net_device *vf_netdev
+		= rtnl_dereference(net_device_ctx->vf_netdev);
+	struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
+	int ret;
+
+	netif_tx_disable(net);
+
+	/* No need to close rndis filter if it is removed already */
+	if (!nvdev)
+		return 0;
+
+	ret = rndis_filter_close(nvdev);
+	if (ret != 0) {
+		netdev_err(net, "unable to close device (ret %d).\n", ret);
+		return ret;
 	}
 
-out:
+	ret = netvsc_wait_until_empty(nvdev);
+	if (ret)
+		netdev_err(net, "Ring buffer not empty after closing rndis\n");
+
 	if (vf_netdev)
 		dev_close(vf_netdev);
 
@@ -845,16 +851,76 @@ static void netvsc_get_channels(struct net_device *net,
 	}
 }
 
+static int netvsc_detach(struct net_device *ndev,
+			 struct netvsc_device *nvdev)
+{
+	struct net_device_context *ndev_ctx = netdev_priv(ndev);
+	struct hv_device *hdev = ndev_ctx->device_ctx;
+	int ret;
+
+	/* Don't try continuing to try and setup sub channels */
+	if (cancel_work_sync(&nvdev->subchan_work))
+		nvdev->num_chn = 1;
+
+	/* If device was up (receiving) then shutdown */
+	if (netif_running(ndev)) {
+		netif_tx_disable(ndev);
+
+		ret = rndis_filter_close(nvdev);
+		if (ret) {
+			netdev_err(ndev,
+				   "unable to close device (ret %d).\n", ret);
+			return ret;
+		}
+
+		ret = netvsc_wait_until_empty(nvdev);
+		if (ret) {
+			netdev_err(ndev,
+				   "Ring buffer not empty after closing rndis\n");
+			return ret;
+		}
+	}
+
+	rndis_filter_device_remove(hdev, nvdev);
+	return 0;
+}
+
+static int netvsc_attach(struct net_device *ndev,
+			 struct netvsc_device_info *dev_info)
+{
+	struct net_device_context *ndev_ctx = netdev_priv(ndev);
+	struct hv_device *hdev = ndev_ctx->device_ctx;
+	struct netvsc_device *nvdev;
+	struct rndis_device *rdev;
+	int ret;
+
+	nvdev = rndis_filter_device_add(hdev, dev_info);
+	if (IS_ERR(nvdev))
+		return PTR_ERR(nvdev);
+
+	netif_carrier_off(ndev);
+
+	if (netif_running(ndev)) {
+		ret = rndis_filter_open(nvdev);
+		if (ret)
+			return ret;
+
+		rdev = nvdev->extension;
+		if (!rdev->link_state)
+			netif_carrier_on(ndev);
+	}
+
+	return 0;
+}
+
 static int netvsc_set_channels(struct net_device *net,
 			       struct ethtool_channels *channels)
 {
 	struct net_device_context *net_device_ctx = netdev_priv(net);
-	struct hv_device *dev = net_device_ctx->device_ctx;
 	struct netvsc_device *nvdev = rtnl_dereference(net_device_ctx->nvdev);
 	unsigned int orig, count = channels->combined_count;
 	struct netvsc_device_info device_info;
-	bool was_opened;
-	int ret = 0;
+	int ret;
 
 	/* We do not support separate count for rx, tx, or other */
 	if (count == 0 ||
@@ -871,9 +937,6 @@ static int netvsc_set_channels(struct net_device *net,
 		return -EINVAL;
 
 	orig = nvdev->num_chn;
-	was_opened = rndis_filter_opened(nvdev);
-	if (was_opened)
-		rndis_filter_close(nvdev);
 
 	memset(&device_info, 0, sizeof(device_info));
 	device_info.num_chn = count;
@@ -882,28 +945,17 @@ static int netvsc_set_channels(struct net_device *net,
 	device_info.recv_sections = nvdev->recv_section_cnt;
 	device_info.recv_section_size = nvdev->recv_section_size;
 
-	rndis_filter_device_remove(dev, nvdev);
+	ret = netvsc_detach(net, nvdev);
+	if (ret)
+		return ret;
 
-	nvdev = rndis_filter_device_add(dev, &device_info);
-	if (IS_ERR(nvdev)) {
-		ret = PTR_ERR(nvdev);
+	ret = netvsc_attach(net, &device_info);
+	if (ret) {
 		device_info.num_chn = orig;
-		nvdev = rndis_filter_device_add(dev, &device_info);
-
-		if (IS_ERR(nvdev)) {
-			netdev_err(net, "restoring channel setting failed: %ld\n",
-				   PTR_ERR(nvdev));
-			return ret;
-		}
+		if (netvsc_attach(net, &device_info))
+			netdev_err(net, "restoring channel setting failed\n");
 	}
 
-	if (was_opened)
-		rndis_filter_open(nvdev);
-
-	/* We may have missed link change notifications */
-	net_device_ctx->last_reconfig = 0;
-	schedule_delayed_work(&net_device_ctx->dwork, 0);
-
 	return ret;
 }
 
@@ -969,10 +1021,8 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 	struct net_device_context *ndevctx = netdev_priv(ndev);
 	struct net_device *vf_netdev = rtnl_dereference(ndevctx->vf_netdev);
 	struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev);
-	struct hv_device *hdev = ndevctx->device_ctx;
 	int orig_mtu = ndev->mtu;
 	struct netvsc_device_info device_info;
-	bool was_opened;
 	int ret = 0;
 
 	if (!nvdev || nvdev->destroy)
@@ -985,11 +1035,6 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 			return ret;
 	}
 
-	netif_device_detach(ndev);
-	was_opened = rndis_filter_opened(nvdev);
-	if (was_opened)
-		rndis_filter_close(nvdev);
-
 	memset(&device_info, 0, sizeof(device_info));
 	device_info.num_chn = nvdev->num_chn;
 	device_info.send_sections = nvdev->send_section_cnt;
@@ -997,35 +1042,27 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
 	device_info.recv_sections = nvdev->recv_section_cnt;
 	device_info.recv_section_size = nvdev->recv_section_size;
 
-	rndis_filter_device_remove(hdev, nvdev);
+	ret = netvsc_detach(ndev, nvdev);
+	if (ret)
+		goto rollback_vf;
 
 	ndev->mtu = mtu;
 
-	nvdev = rndis_filter_device_add(hdev, &device_info);
-	if (IS_ERR(nvdev)) {
-		ret = PTR_ERR(nvdev);
-
-		/* Attempt rollback to original MTU */
-		ndev->mtu = orig_mtu;
-		nvdev = rndis_filter_device_add(hdev, &device_info);
-
-		if (vf_netdev)
-			dev_set_mtu(vf_netdev, orig_mtu);
-
-		if (IS_ERR(nvdev)) {
-			netdev_err(ndev, "restoring mtu failed: %ld\n",
-				   PTR_ERR(nvdev));
-			return ret;
-		}
-	}
+	ret = netvsc_attach(ndev, &device_info);
+	if (ret)
+		goto rollback;
 
-	if (was_opened)
-		rndis_filter_open(nvdev);
+	return 0;
 
-	netif_device_attach(ndev);
+rollback:
+	/* Attempt rollback to original MTU */
+	ndev->mtu = orig_mtu;
 
-	/* We may have missed link change notifications */
-	schedule_delayed_work(&ndevctx->dwork, 0);
+	if (netvsc_attach(ndev, &device_info))
+		netdev_err(ndev, "restoring mtu failed\n");
+rollback_vf:
+	if (vf_netdev)
+		dev_set_mtu(vf_netdev, orig_mtu);
 
 	return ret;
 }
@@ -1531,11 +1568,9 @@ static int netvsc_set_ringparam(struct net_device *ndev,
 {
 	struct net_device_context *ndevctx = netdev_priv(ndev);
 	struct netvsc_device *nvdev = rtnl_dereference(ndevctx->nvdev);
-	struct hv_device *hdev = ndevctx->device_ctx;
 	struct netvsc_device_info device_info;
 	struct ethtool_ringparam orig;
 	u32 new_tx, new_rx;
-	bool was_opened;
 	int ret = 0;
 
 	if (!nvdev || nvdev->destroy)
@@ -1560,34 +1595,18 @@ static int netvsc_set_ringparam(struct net_device *ndev,
 	device_info.recv_sections = new_rx;
 	device_info.recv_section_size = nvdev->recv_section_size;
 
-	netif_device_detach(ndev);
-	was_opened = rndis_filter_opened(nvdev);
-	if (was_opened)
-		rndis_filter_close(nvdev);
-
-	rndis_filter_device_remove(hdev, nvdev);
-
-	nvdev = rndis_filter_device_add(hdev, &device_info);
-	if (IS_ERR(nvdev)) {
-		ret = PTR_ERR(nvdev);
+	ret = netvsc_detach(ndev, nvdev);
+	if (ret)
+		return ret;
 
+	ret = netvsc_attach(ndev, &device_info);
+	if (ret) {
 		device_info.send_sections = orig.tx_pending;
 		device_info.recv_sections = orig.rx_pending;
-		nvdev = rndis_filter_device_add(hdev, &device_info);
-		if (IS_ERR(nvdev)) {
-			netdev_err(ndev, "restoring ringparam failed: %ld\n",
-				   PTR_ERR(nvdev));
-			return ret;
-		}
-	}
-
-	if (was_opened)
-		rndis_filter_open(nvdev);
-	netif_device_attach(ndev);
 
-	/* We may have missed link change notifications */
-	ndevctx->last_reconfig = 0;
-	schedule_delayed_work(&ndevctx->dwork, 0);
+		if (netvsc_attach(ndev, &device_info))
+			netdev_err(ndev, "restoring ringparam failed");
+	}
 
 	return ret;
 }
@@ -2072,8 +2091,8 @@ static int netvsc_probe(struct hv_device *dev,
 static int netvsc_remove(struct hv_device *dev)
 {
 	struct net_device_context *ndev_ctx;
-	struct net_device *vf_netdev;
-	struct net_device *net;
+	struct net_device *vf_netdev, *net;
+	struct netvsc_device *nvdev;
 
 	net = hv_get_drvdata(dev);
 	if (net == NULL) {
@@ -2083,8 +2102,6 @@ static int netvsc_remove(struct hv_device *dev)
 
 	ndev_ctx = netdev_priv(net);
 
-	netif_device_detach(net);
-
 	cancel_delayed_work_sync(&ndev_ctx->dwork);
 
 	/*
@@ -2092,14 +2109,19 @@ static int netvsc_remove(struct hv_device *dev)
 	 * removed. Also blocks mtu and channel changes.
 	 */
 	rtnl_lock();
+
 	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
 	if (vf_netdev)
 		netvsc_unregister_vf(vf_netdev);
 
+	nvdev = rtnl_dereference(ndev_ctx->nvdev);
+	if (nvdev) {
+		cancel_work_sync(&nvdev->subchan_work);
+		rndis_filter_device_remove(dev, nvdev);
+	}
+
 	unregister_netdevice(net);
 
-	rndis_filter_device_remove(dev,
-				   rtnl_dereference(ndev_ctx->nvdev));
 	rtnl_unlock();
 
 	hv_set_drvdata(dev, NULL);
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index a9746f9fbf4b..3ccb0c6263a8 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1122,6 +1122,7 @@ void rndis_set_subchannel(struct work_struct *w)
 	for (i = 0; i < VRSS_SEND_TAB_SIZE; i++)
 		ndev_ctx->tx_table[i] = i % nvdev->num_chn;
 
+	netif_tx_wake_all_queues(ndev);
 	rtnl_unlock();
 	return;
 
@@ -1132,6 +1133,7 @@ void rndis_set_subchannel(struct work_struct *w)
 
 	nvdev->max_chn = 1;
 	nvdev->num_chn = 1;
+	netif_wake_queue(ndev);
 unlock:
 	rtnl_unlock();
 }
@@ -1326,6 +1328,8 @@ struct netvsc_device *rndis_filter_device_add(struct hv_device *dev,
 
 	if (net_device->num_chn > 1)
 		schedule_work(&net_device->subchan_work);
+	else
+		netif_wake_queue(net);
 
 out:
 	/* if unavailable, just proceed with one queue */
@@ -1346,9 +1350,6 @@ void rndis_filter_device_remove(struct hv_device *dev,
 {
 	struct rndis_device *rndis_dev = net_dev->extension;
 
-	/* Don't try and setup sub channels if about to halt */
-	cancel_work_sync(&net_dev->subchan_work);
-
 	/* Halt and release the rndis device */
 	rndis_filter_halt_device(net_dev, rndis_dev);
 
@@ -1372,10 +1373,3 @@ int rndis_filter_close(struct netvsc_device *nvdev)
 
 	return rndis_filter_close_device(nvdev->extension);
 }
-
-bool rndis_filter_opened(const struct netvsc_device *nvdev)
-{
-	const struct rndis_device *dev = nvdev->extension;
-
-	return dev->state == RNDIS_DEV_DATAINITIALIZED;
-}
-- 
2.16.1

^ permalink raw reply related

* Re: [PATCH v2] net: ethernet: ti: cpsw: add check for in-band mode setting with RGMII PHY interface
From: Grygorii Strashko @ 2018-03-15 17:39 UTC (permalink / raw)
  To: SZ Lin (林上智)
  Cc: Schuyler Patton, David S. Miller, Ivan Khoronzhuk, Keerthy,
	Sekhar Nori, linux-omap, netdev, linux-kernel
In-Reply-To: <20180315165603.30471-1-sz.lin@moxa.com>



On 03/15/2018 11:56 AM, SZ Lin (林上智) wrote:
> According to AM335x TRM[1] 14.3.6.2, AM437x TRM[2] 15.3.6.2 and
> DRA7 TRM[3] 24.11.4.8.7.3.3, in-band mode in EXT_EN(bit18) register is only
> available when PHY is configured in RGMII mode with 10Mbps speed. It will
> cause some networking issues without RGMII mode, such as carrier sense
> errors and low throughput. TI also mentioned this issue in their forum[4].
> 
> This patch adds the check mechanism for PHY interface with RGMII interface
> type, the in-band mode can only be set in RGMII mode with 10Mbps speed.
> 
> References:
> [1]: https://www.ti.com/lit/ug/spruh73p/spruh73p.pdf
> [2]: http://www.ti.com/lit/ug/spruhl7h/spruhl7h.pdf
> [3]: http://www.ti.com/lit/ug/spruic2b/spruic2b.pdf
> [4]: https://e2e.ti.com/support/arm/sitara_arm/f/791/p/640765/2392155
> 
> Suggested-by: Holsety Chen (陳憲輝) <Holsety.Chen@moxa.com>
> Signed-off-by: SZ Lin (林上智) <sz.lin@moxa.com>
> Signed-off-by: Schuyler Patton <spatton@ti.com>
> ---
> Changes from v1:
> - Use phy_interface_is_rgmii helper function
> - Remove blank line
> 

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>

>   drivers/net/ethernet/ti/cpsw.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index 1b1b78fdc138..b2b30c9df037 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -1014,7 +1014,8 @@ static void _cpsw_adjust_link(struct cpsw_slave *slave,
>   		/* set speed_in input in case RMII mode is used in 100Mbps */
>   		if (phy->speed == 100)
>   			mac_control |= BIT(15);
> -		else if (phy->speed == 10)
> +		/* in band mode only works in 10Mbps RGMII mode */
> +		else if ((phy->speed == 10) && phy_interface_is_rgmii(phy))
>   			mac_control |= BIT(18); /* In Band mode */
>   
>   		if (priv->rx_pause)
> 

-- 
regards,
-grygorii

^ permalink raw reply

* Re: [Intel-wired-lan] [RFC PATCH 2/2] ixgbe: setup XPS via netif_set_xps()
From: Alexander Duyck @ 2018-03-15 17:22 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Netdev, Eric Dumazet, intel-wired-lan, David S. Miller
In-Reply-To: <1521133538.2681.18.camel@redhat.com>

On Thu, Mar 15, 2018 at 10:05 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> Hi,
>
> On Thu, 2018-03-15 at 09:43 -0700, Alexander Duyck wrote:
>> On Thu, Mar 15, 2018 at 8:08 AM, Paolo Abeni <pabeni@redhat.com> wrote:
>> > Before this commit, ixgbe with the default setting lacks XPS mapping
>> > for CPUs id greater than the number of tx queues.
>> >
>> > As a consequence the xmit path for such CPUs experience a relevant cost
>> > in __netdev_pick_tx, mainly due to skb_tx_hash(), as reported by the perf
>> > tool:
>> >
>> > 7.55%--netdev_pick_tx
>> >         |
>> >         --6.92%--__netdev_pick_tx
>> >                   |
>> >                   --6.35%--__skb_tx_hash
>> >                             |
>> >                             --5.94%--__skb_get_hash
>> >                                       |
>> >                                       --3.22%--__skb_flow_dissect
>> >
>> > in the following  scenario:
>> >
>> > ethtool -L em1 combined 1
>> > taskset 2 netperf -H 192.168.1.1 -t UDP_STREAM -- -m 1
>> > MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.101.1 () port 0 AF_INET
>> > Socket  Message  Elapsed      Messages
>> > Size    Size     Time         Okay Errors   Throughput
>> > bytes   bytes    secs            #      #   10^6bits/sec
>> >
>> > 212992       1   10.00     11497225      0       9.20
>> >
>> > After this commit the perf tool reports:
>> >
>> > 0.85%--__netdev_pick_tx
>> >
>> > and netperf reports:
>> >
>> > MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.101.1 () port 0 AF_INET
>> > Socket  Message  Elapsed      Messages
>> > Size    Size     Time         Okay Errors   Throughput
>> > bytes   bytes    secs            #      #   10^6bits/sec
>> >
>> > 212992       1   10.00     12736058      0      10.19
>> >
>> > roughly +10% in xmit tput.
>> >
>> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>
>> I think we shouldn't be configuring XPS if number of Tx or Rx queues
>> is less than the number of CPUs, or ATR is not enabled.
>
> Thank you for the feedback!
>
> Please note the currently the ixgbe driver is enabling XPS regardless
> of the above considerations.
>
>> Really the XPS bits are only really supposed to be used with the ATR
>> functionality enabled. If we don't have enough queues for a 1:1
>> mapping we should probably not be programming XPS since ATR isn't
>> going to function right anyway.
>
> uhm... I don't know the details of ATR, but apparently it is for TCP
> only, while the use-case I'm referring to is plain (no tunnel)
> unconnected UDP traffic. Am I missing something?

No. Basically the ATR/XPS bits is an overreach. The original code had
the driver just using the incoming CPU to select a Tx queue via the
ndo_select_queue. I pushed it out to XPS in order to try to get the
drivers to avoid using ndo_select_queue and provide the user with a
way to at least manually disable the Tx side of it.

For now I would say we should not have the driver configuring the XPS
map if it cannot assume a 1:1 mapping. As-is there are a number of
features where having this functionality enabled doesn't make sense.
In those cases we leave cpu as -1 in ixgbe_alloc_q_vector, and leave
the affinity mask as all 0s. It might make sense to just update the
code there in the case of ixgbe so that we don't update the XPS map or
the q_vector->affinity_mask if we cannot achieve a 1:1 mapping. As is
I would say the code is probably in need of updates since
ixgbe_alloc_q_vector doesn't handle the case where we might have a
non-linear CPU layout.

ATR is a feature that has been on my list of things to fix sometime in
the near future, but I haven't had the time as I have been pulled into
too many other efforts. Ideally we should be moving away from ATR and
instead looking at doing something like supporting ndo_rx_flow_steer.

Thanks.

- Alex

^ permalink raw reply

* Re: [RFC PATCH 0/2] net:setup XPS mapping for each online CPU
From: Paolo Abeni @ 2018-03-15 17:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David Miller, Jeff Kirsher, intel-wired-lan,
	Alexander Duyck
In-Reply-To: <CANn89iLJOVuN-1j+9_qr_LaU7AE6COHdnp4Rc7fe9p6_s=hCCA@mail.gmail.com>

Hi,

On Thu, 2018-03-15 at 15:59 +0000, Eric Dumazet wrote:
> On Thu, Mar 15, 2018 at 8:51 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > I'm sorry, I do not follow. AFAICS with unconnected sockets without XPS
> > we always hit the netdev_pick_tx()/skb_tx_hash()/skb_flow_dissect()
> > overhead in xmit path.
> 
> Then fix this if you want, instead of fixing one NIC only, or by enforcing
> XPS by all NIC.
> 
> For unconnected sockets, picking the TX queue based on current cpu is good,
> we do not have to enforce ordering as much as possible.
> 
> (pfifo_fast no longer can enforce it anyway)

Thank you for the prompt reply. 

I'm double checking to avoid misinterpretation on my side: are you
suggesting to plug a CPU-based selection logic for unconnected sockets
in netdev_pick_tx() or to cook patches like 2/2 for all the relevant
NICs? 

Thanks!

Paolo

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox