Netdev List

Netdev List
 help / color / mirror / Atom feed

* pull request: bluetooth 2018-05-07
From: Johan Hedberg @ 2018-05-07 15:05 UTC (permalink / raw)
  To: davem; +Cc: linux-bluetooth, netdev

[-- Attachment #1: Type: text/plain, Size: 1189 bytes --]

Hi Dave,

Here are a few more Bluetooth fixes for the 4.17 kernel, all for the
btusb driver. Two relate to the needs_reset_resume table, and one is a
revert of a patch for Atheros 1525/QCA6174 which caused a regression for
some people.

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit 2cb5fb1454ef4990f44f3070226ee29201bd5c87:

  MAINTAINERS: add myself as SCTP co-maintainer (2018-04-29 22:49:45 -0400)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth.git for-upstream

for you to fetch changes up to 596b07a9a22656493726edf1739569102bd3e136:

  Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table (2018-04-30 10:56:04 +0200)

----------------------------------------------------------------
Hans de Goede (3):
      Revert "Bluetooth: btusb: Fix quirk for Atheros 1525/QCA6174"
      Bluetooth: btusb: Only check needs_reset_resume DMI table for QCA rome chipsets
      Bluetooth: btusb: Add Dell XPS 13 9360 to btusb_needs_reset_resume_table

 drivers/bluetooth/btusb.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: Locking in network code
From: Stephen Hemminger @ 2018-05-07 14:48 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Jacob S. Moroni, Netdev
In-Reply-To: <CAKgT0UddSGs-d0cbQV4YN8RLEqa478C7eG3HNFf1Y-yivWPUFw@mail.gmail.com>

On Sun, 6 May 2018 09:16:26 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> On Sun, May 6, 2018 at 6:43 AM, Jacob S. Moroni <mail@jakemoroni.com> wrote:
> > Hello,
> >
> > I have a stupid question regarding which variant of spin_lock to use
> > throughout the network stack, and inside RX handlers specifically.
> >
> > It's my understanding that skbuffs are normally passed into the stack
> > from soft IRQ context if the device is using NAPI, and hard IRQ
> > context if it's not using NAPI (and I guess process context too if the
> > driver does it's own workqueue thing).
> >
> > So, that means that handlers registered with netdev_rx_handler_register
> > may end up being called from any context.  
> 
> I am pretty sure the Rx handlers are all called from softirq context.
> The hard IRQ will just call netif_rx which will queue the packet up to
> be handles in the soft IRQ later.

The only exception is the netpoll code which runs stack in hardirq context.

> > However, the RX handler in the macvlan code calls ip_check_defrag,
> > which could eventually lead to a call to ip_defrag, which ends
> > up taking a regular spin_lock around the call to ip_frag_queue.
> >
> > Is this a risk of deadlock, and if not, why?
> >
> > What if you're running a system with one CPU and a packet fragment
> > arrives on a NAPI interface, then, while the spin_lock is held,
> > another fragment somehow arrives on another interface which does
> > its processing in hard IRQ context?
> >
> > --
> >   Jacob S. Moroni
> >   mail@jakemoroni.com  
> 
> Take a look at the netif_rx code and it should answer most of your
> questions. Basically everything is handed off from the hard IRQ to the
> soft IRQ via a backlog queue and then handled in net_rx_action.
> 
> - Alex

^ permalink raw reply

* Re: [iproute2-next  1/1] tipc: Add support to set and get MTU for UDP bearer
From: Stephen Hemminger @ 2018-05-07 14:46 UTC (permalink / raw)
  To: GhantaKrishnamurthy MohanKrishna
  Cc: tipc-discussion, jon.maloy, maloy, ying.xue, netdev, davem,
	dsahern
In-Reply-To: <1525691673-22966-1-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>

On Mon, 7 May 2018 13:14:33 +0200
GhantaKrishnamurthy MohanKrishna         <mohan.krishna.ghanta.krishnamurthy@ericsson.com> wrote:

> +				struct opt *opts)
> +{
> +	struct opt *opt;
> +
> +	if (!(opt = get_opt(opts, "media"))) {

You don't need to have assignment in conditional context in this case.
Please split the assignment and the if.

^ permalink raw reply

* Re: [PATCH iproute2] rdma: fix header files
From: Stephen Hemminger @ 2018-05-07 14:45 UTC (permalink / raw)
  To: David Ahern; +Cc: swise, netdev
In-Reply-To: <424b2502-e79e-fad7-255d-37a4ddd24068@gmail.com>

On Sat, 5 May 2018 09:17:31 -0600
David Ahern <dsahern@gmail.com> wrote:

> On 5/4/18 10:58 PM, Stephen Hemminger wrote:
> > On Fri, 4 May 2018 16:13:07 -0600
> > David Ahern <dsahern@gmail.com> wrote:
> >   
> >> On 5/4/18 3:56 PM, Stephen Hemminger wrote:  
> >>> All user api headers in iproute2 should be in include/uapi
> >>> so that script can be used to put correct sanitized kernel headers
> >>> there. And the header files for rdma must be a complete set; if one
> >>> header file includes another, all must be present.
> >>>
> >>> This fixes build on older distributions, and Windows Services
> >>> for Linux.
> >>>
> >>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> >>> ---
> >>>  include/uapi/rdma/ib_user_sa.h                |   77 ++
> >>>  include/uapi/rdma/ib_user_verbs.h             | 1210 +++++++++++++++++
> >>>  .../uapi/rdma/rdma_netlink.h                  |   13 +
> >>>  .../uapi/rdma/rdma_user_cm.h                  |    6 +-
> >>>  4 files changed, 1303 insertions(+), 3 deletions(-)
> >>>  create mode 100644 include/uapi/rdma/ib_user_sa.h
> >>>  create mode 100644 include/uapi/rdma/ib_user_verbs.h
> >>>  rename {rdma/include => include}/uapi/rdma/rdma_netlink.h (95%)
> >>>  rename {rdma/include => include}/uapi/rdma/rdma_user_cm.h (98%)
> >>>     
> >>
> >> Stephen:
> >>
> >> Per a recent discussion the RDMA folks need to take ownership of the
> >> uapi files. RDMA features do not hit Dave's net-next tree so the rdma
> >> code can never hit iproute2-next during a dev cycle.  
> > 
> > I want all uapi headers in include/uapi because it avoids possible overlap problems,
> > During the linux-net/linus release cycle they should match what is Linus's tree.
> > 
> > During the net-next they can come from two sources.
> >   
> 
> That creates extra work for me for no reason.
> 
> You state above "user api headers in iproute2 should be in include/uapi
> so that script can be used to put correct sanitized kernel headers there."
> 
> With RDMA's development cycle that will *never* happen. With the
> exception of RDMA, all iproute2 features go through net-next and it is
> the right tree to pull updates from. Every time I sync from net-next the
> header files for rdma will have to be ignored, so they will never be
> updated through this mechanism which means the stated goal is not
> achievable.
> 
> As for linux-next, I will not sync header files to a tree that
> disappears; it breaks all traceability. Further, it seems to me that it
> does not really solve the problem. I forget the steps now but RDMA
> features have to hit some development tree before going to Linus, so
> there will be a delay with the headers.
> 
> Back in March we discussed options. iproute2 is nothing more than a
> delivery vehicle for rdmatool. Since it breaks everything else about the
> iproute2 and net-next association, the simplest option for everyone is
> for the rdma group to control syncing their own headers and putting them
> under rdma directory.

There are three different issues here:
1. What directory should rdma files be in. It really doesn't matter that much
   so lets keep them in rdma/incude/uapi.
2. For current master branch the set of include files does not match Linus's current
   tree.
3. The header files for rdma do not include all the referenced headers. My rule has
   been if foo.c includes uapi/foo.h and foo.h includes bar.h then bar.h must also
   be in the include/uapi directory.

I will spin a new patch addressing #2 and #3.

^ permalink raw reply

* [net-next 5/6] fm10k: warn if the stat size is unknown
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index 0565d6f795e5..2bb7b71cf460 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -247,6 +247,8 @@ static void __fm10k_add_ethtool_stats(u64 **data, void *pointer,
 			*((*data)++) = *(u8 *)p;
 			break;
 		default:
+			WARN_ONCE(1, "unexpected stat size for %s",
+				  stats[i].stat_string);
 			*((*data)++) = 0;
 		}
 	}
-- 
2.14.3

^ permalink raw reply related

* [net-next 4/6] fm10k: use macro to avoid passing the array and size separately
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Avoid potential bugs with fm10k_add_stat_strings and
fm10k_add_ethtool_stats by using a macro to calculate the ARRAY_SIZE
when passing. This helps ensure that the size is always correct.

Note that it assumes we only pass static const fm10k_stat arrays, and
that evaluation of the argument won't have side effects.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 48 +++++++++++-------------
 1 file changed, 21 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index fa7c026fe874..0565d6f795e5 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -134,8 +134,8 @@ enum {
 static const char fm10k_prv_flags[FM10K_PRV_FLAG_LEN][ETH_GSTRING_LEN] = {
 };
 
-static void fm10k_add_stat_strings(u8 **p, const struct fm10k_stats stats[],
-				   const unsigned int size, ...)
+static void __fm10k_add_stat_strings(u8 **p, const struct fm10k_stats stats[],
+				     const unsigned int size, ...)
 {
 	unsigned int i;
 
@@ -149,31 +149,28 @@ static void fm10k_add_stat_strings(u8 **p, const struct fm10k_stats stats[],
 	}
 }
 
+#define fm10k_add_stat_strings(p, stats, ...) \
+	__fm10k_add_stat_strings(p, stats, ARRAY_SIZE(stats), ## __VA_ARGS__)
+
 static void fm10k_get_stat_strings(struct net_device *dev, u8 *data)
 {
 	struct fm10k_intfc *interface = netdev_priv(dev);
 	unsigned int i;
 
-	fm10k_add_stat_strings(&data, fm10k_gstrings_net_stats,
-			       FM10K_NETDEV_STATS_LEN);
+	fm10k_add_stat_strings(&data, fm10k_gstrings_net_stats);
 
-	fm10k_add_stat_strings(&data, fm10k_gstrings_global_stats,
-			       FM10K_GLOBAL_STATS_LEN);
+	fm10k_add_stat_strings(&data, fm10k_gstrings_global_stats);
 
-	fm10k_add_stat_strings(&data, fm10k_gstrings_mbx_stats,
-			       FM10K_MBX_STATS_LEN);
+	fm10k_add_stat_strings(&data, fm10k_gstrings_mbx_stats);
 
 	if (interface->hw.mac.type != fm10k_mac_vf)
-		fm10k_add_stat_strings(&data, fm10k_gstrings_pf_stats,
-				       FM10K_PF_STATS_LEN);
+		fm10k_add_stat_strings(&data, fm10k_gstrings_pf_stats);
 
 	for (i = 0; i < interface->hw.mac.max_queues; i++) {
 		fm10k_add_stat_strings(&data, fm10k_gstrings_queue_stats,
-				       FM10K_QUEUE_STATS_LEN,
 				       "tx", i);
 
 		fm10k_add_stat_strings(&data, fm10k_gstrings_queue_stats,
-				       FM10K_QUEUE_STATS_LEN,
 				       "rx", i);
 	}
 }
@@ -219,9 +216,9 @@ static int fm10k_get_sset_count(struct net_device *dev, int sset)
 	}
 }
 
-static void fm10k_add_ethtool_stats(u64 **data, void *pointer,
-				    const struct fm10k_stats stats[],
-				    const unsigned int size)
+static void __fm10k_add_ethtool_stats(u64 **data, void *pointer,
+				      const struct fm10k_stats stats[],
+				      const unsigned int size)
 {
 	unsigned int i;
 	char *p;
@@ -255,6 +252,9 @@ static void fm10k_add_ethtool_stats(u64 **data, void *pointer,
 	}
 }
 
+#define fm10k_add_ethtool_stats(data, pointer, stats) \
+	__fm10k_add_ethtool_stats(data, pointer, stats, ARRAY_SIZE(stats))
+
 static void fm10k_get_ethtool_stats(struct net_device *netdev,
 				    struct ethtool_stats __always_unused *stats,
 				    u64 *data)
@@ -265,20 +265,16 @@ static void fm10k_get_ethtool_stats(struct net_device *netdev,
 
 	fm10k_update_stats(interface);
 
-	fm10k_add_ethtool_stats(&data, net_stats, fm10k_gstrings_net_stats,
-				FM10K_NETDEV_STATS_LEN);
+	fm10k_add_ethtool_stats(&data, net_stats, fm10k_gstrings_net_stats);
 
-	fm10k_add_ethtool_stats(&data, interface, fm10k_gstrings_global_stats,
-				FM10K_GLOBAL_STATS_LEN);
+	fm10k_add_ethtool_stats(&data, interface, fm10k_gstrings_global_stats);
 
 	fm10k_add_ethtool_stats(&data, &interface->hw.mbx,
-				fm10k_gstrings_mbx_stats,
-				FM10K_MBX_STATS_LEN);
+				fm10k_gstrings_mbx_stats);
 
 	if (interface->hw.mac.type != fm10k_mac_vf) {
 		fm10k_add_ethtool_stats(&data, interface,
-					fm10k_gstrings_pf_stats,
-					FM10K_PF_STATS_LEN);
+					fm10k_gstrings_pf_stats);
 	}
 
 	for (i = 0; i < interface->hw.mac.max_queues; i++) {
@@ -286,13 +282,11 @@ static void fm10k_get_ethtool_stats(struct net_device *netdev,
 
 		ring = interface->tx_ring[i];
 		fm10k_add_ethtool_stats(&data, ring,
-					fm10k_gstrings_queue_stats,
-					FM10K_QUEUE_STATS_LEN);
+					fm10k_gstrings_queue_stats);
 
 		ring = interface->rx_ring[i];
 		fm10k_add_ethtool_stats(&data, ring,
-					fm10k_gstrings_queue_stats,
-					FM10K_QUEUE_STATS_LEN);
+					fm10k_gstrings_queue_stats);
 	}
 }
 
-- 
2.14.3

^ permalink raw reply related

* [net-next 6/6] fm10k: don't protect fm10k_queue_mac_request by fm10k_host_mbx_ready
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

We don't actually need to check if the host mbx is ready when queuing
MAC requests, because these are not handled by a special queue which
queues up requests until the mailbox is capable of handling them.

Pull these requests outside the fm10k_host_mbx_ready() check, as it is
not necessary.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 0dc9f2dbc1ad..929f538d28bc 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -1537,12 +1537,12 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 
 	glort = l2_accel->dglort + 1 + i;
 
-	if (fm10k_host_mbx_ready(interface)) {
+	if (fm10k_host_mbx_ready(interface))
 		hw->mac.ops.update_xcast_mode(hw, glort,
 					      FM10K_XCAST_MODE_NONE);
-		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
-					hw->mac.default_vid, true);
-	}
+
+	fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+				hw->mac.default_vid, true);
 
 	for (vid = fm10k_find_next_vlan(interface, 0);
 	     vid < VLAN_N_VID;
@@ -1583,12 +1583,12 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
 
 	glort = l2_accel->dglort + 1 + i;
 
-	if (fm10k_host_mbx_ready(interface)) {
+	if (fm10k_host_mbx_ready(interface))
 		hw->mac.ops.update_xcast_mode(hw, glort,
 					      FM10K_XCAST_MODE_NONE);
-		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
-					hw->mac.default_vid, false);
-	}
+
+	fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+				hw->mac.default_vid, false);
 
 	for (vid = fm10k_find_next_vlan(interface, 0);
 	     vid < VLAN_N_VID;
-- 
2.14.3

^ permalink raw reply related

* [net-next 3/6] fm10k: use variadic arguments to fm10k_add_stat_strings
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Instead of using a fixed prefix string we setup before each call to
fm10k_add_stat_strings, modify the helper to take variadic arguments and
pass them to vsnprintf. This requires changing the fm10k_stat strings to
take % format specifiers where necessary, but the resulting code is much
simpler.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 53 ++++++++++++------------
 1 file changed, 27 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index f4cad9ffdc79..fa7c026fe874 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -6,6 +6,11 @@
 #include "fm10k.h"
 
 struct fm10k_stats {
+	/* The stat_string is expected to be a format string formatted using
+	 * vsnprintf by fm10k_add_stat_strings. Every member of a stats array
+	 * should use the same format specifiers as they will be formatted
+	 * using the same variadic arguments.
+	 */
 	char stat_string[ETH_GSTRING_LEN];
 	int sizeof_stat;
 	int stat_offset;
@@ -93,15 +98,13 @@ static const struct fm10k_stats fm10k_gstrings_mbx_stats[] = {
 	FM10K_MBX_STAT("mbx_rx_mbmem_pushed", rx_mbmem_pushed),
 };
 
-#define FM10K_QUEUE_STAT(_name, _stat) { \
-	.stat_string = _name, \
-	.sizeof_stat = FIELD_SIZEOF(struct fm10k_ring, _stat), \
-	.stat_offset = offsetof(struct fm10k_ring, _stat) \
-}
+/* per-queue ring statistics */
+#define FM10K_QUEUE_STAT(_name, _stat) \
+	FM10K_STAT_FIELDS(struct fm10k_ring, _name, _stat)
 
 static const struct fm10k_stats fm10k_gstrings_queue_stats[] = {
-	FM10K_QUEUE_STAT("packets", stats.packets),
-	FM10K_QUEUE_STAT("bytes", stats.bytes),
+	FM10K_QUEUE_STAT("%s_queue_%u_packets", stats.packets),
+	FM10K_QUEUE_STAT("%s_queue_%u_bytes", stats.bytes),
 };
 
 #define FM10K_GLOBAL_STATS_LEN ARRAY_SIZE(fm10k_gstrings_global_stats)
@@ -131,16 +134,18 @@ enum {
 static const char fm10k_prv_flags[FM10K_PRV_FLAG_LEN][ETH_GSTRING_LEN] = {
 };
 
-static void fm10k_add_stat_strings(u8 **p, const char *prefix,
-				   const struct fm10k_stats stats[],
-				   const unsigned int size)
+static void fm10k_add_stat_strings(u8 **p, const struct fm10k_stats stats[],
+				   const unsigned int size, ...)
 {
 	unsigned int i;
 
 	for (i = 0; i < size; i++) {
-		snprintf(*p, ETH_GSTRING_LEN, "%s%s",
-			 prefix, stats[i].stat_string);
+		va_list args;
+
+		va_start(args, size);
+		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
 		*p += ETH_GSTRING_LEN;
+		va_end(args);
 	}
 }
 
@@ -149,31 +154,27 @@ static void fm10k_get_stat_strings(struct net_device *dev, u8 *data)
 	struct fm10k_intfc *interface = netdev_priv(dev);
 	unsigned int i;
 
-	fm10k_add_stat_strings(&data, "", fm10k_gstrings_net_stats,
+	fm10k_add_stat_strings(&data, fm10k_gstrings_net_stats,
 			       FM10K_NETDEV_STATS_LEN);
 
-	fm10k_add_stat_strings(&data, "", fm10k_gstrings_global_stats,
+	fm10k_add_stat_strings(&data, fm10k_gstrings_global_stats,
 			       FM10K_GLOBAL_STATS_LEN);
 
-	fm10k_add_stat_strings(&data, "", fm10k_gstrings_mbx_stats,
+	fm10k_add_stat_strings(&data, fm10k_gstrings_mbx_stats,
 			       FM10K_MBX_STATS_LEN);
 
 	if (interface->hw.mac.type != fm10k_mac_vf)
-		fm10k_add_stat_strings(&data, "", fm10k_gstrings_pf_stats,
+		fm10k_add_stat_strings(&data, fm10k_gstrings_pf_stats,
 				       FM10K_PF_STATS_LEN);
 
 	for (i = 0; i < interface->hw.mac.max_queues; i++) {
-		char prefix[ETH_GSTRING_LEN];
-
-		snprintf(prefix, ETH_GSTRING_LEN, "tx_queue_%u_", i);
-		fm10k_add_stat_strings(&data, prefix,
-				       fm10k_gstrings_queue_stats,
-				       FM10K_QUEUE_STATS_LEN);
+		fm10k_add_stat_strings(&data, fm10k_gstrings_queue_stats,
+				       FM10K_QUEUE_STATS_LEN,
+				       "tx", i);
 
-		snprintf(prefix, ETH_GSTRING_LEN, "rx_queue_%u_", i);
-		fm10k_add_stat_strings(&data, prefix,
-				       fm10k_gstrings_queue_stats,
-				       FM10K_QUEUE_STATS_LEN);
+		fm10k_add_stat_strings(&data, fm10k_gstrings_queue_stats,
+				       FM10K_QUEUE_STATS_LEN,
+				       "rx", i);
 	}
 }
 
-- 
2.14.3

^ permalink raw reply related

* [net-next 1/6] fm10k: setup VLANs for l2 accelerated macvlan interfaces
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

We have support for accelerating macvlan devices via the
.ndo_dfwd_add_station() netdev op. These accelerated macvlan MAC
addresses are stored in the l2_accel structure, separate from the
unicast or multicast address lists.

If a VLAN is added on top of the macvlan device by the stack, traffic
will not properly flow to the macvlan. This occurs because we fail to
setup the VLANs for l2_accel MAC addresses.

In the non-offloaded case the MAC address is added to the unicast
address list, and thus the normal setup for enabling VLANs works as
expected.

We also need to add VLANs marked from .ndo_vlan_rx_add_vid() into the
l2_accel MAC addresses. Otherwise, VLAN traffic will not properly be
received by the VLAN devices attached to the offloaded macvlan devices.

Fix this by adding necessary logic to setup VLANs not only for the
unicast and multicast addresses, but also the l2_accel list. We need
similar logic in dfwd_add_station, dfwd_del_station, fm10k_update_vid,
and fm10k_restore_rx_state.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 50 ++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index c879af72bbf5..0dc9f2dbc1ad 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -907,7 +907,9 @@ static int fm10k_mc_vlan_unsync(struct net_device *netdev,
 static int fm10k_update_vid(struct net_device *netdev, u16 vid, bool set)
 {
 	struct fm10k_intfc *interface = netdev_priv(netdev);
+	struct fm10k_l2_accel *l2_accel = interface->l2_accel;
 	struct fm10k_hw *hw = &interface->hw;
+	u16 glort;
 	s32 err;
 	int i;
 
@@ -975,6 +977,22 @@ static int fm10k_update_vid(struct net_device *netdev, u16 vid, bool set)
 	if (err)
 		goto err_out;
 
+	/* Update L2 accelerated macvlan addresses */
+	if (l2_accel) {
+		for (i = 0; i < l2_accel->size; i++) {
+			struct net_device *sdev = l2_accel->macvlan[i];
+
+			if (!sdev)
+				continue;
+
+			glort = l2_accel->dglort + 1 + i;
+
+			fm10k_queue_mac_request(interface, glort,
+						sdev->dev_addr,
+						vid, set);
+		}
+	}
+
 	/* set VLAN ID prior to syncing/unsyncing the VLAN */
 	interface->vid = vid + (set ? VLAN_N_VID : 0);
 
@@ -1214,6 +1232,22 @@ void fm10k_restore_rx_state(struct fm10k_intfc *interface)
 
 		fm10k_queue_mac_request(interface, glort,
 					hw->mac.addr, vid, true);
+
+		/* synchronize macvlan addresses */
+		if (l2_accel) {
+			for (i = 0; i < l2_accel->size; i++) {
+				struct net_device *sdev = l2_accel->macvlan[i];
+
+				if (!sdev)
+					continue;
+
+				glort = l2_accel->dglort + 1 + i;
+
+				fm10k_queue_mac_request(interface, glort,
+							sdev->dev_addr,
+							vid, true);
+			}
+		}
 	}
 
 	/* update xcast mode before synchronizing addresses if host's mailbox
@@ -1430,7 +1464,7 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 	struct fm10k_dglort_cfg dglort = { 0 };
 	struct fm10k_hw *hw = &interface->hw;
 	int size = 0, i;
-	u16 glort;
+	u16 vid, glort;
 
 	/* The hardware supported by fm10k only filters on the destination MAC
 	 * address. In order to avoid issues we only support offloading modes
@@ -1510,6 +1544,12 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
 					hw->mac.default_vid, true);
 	}
 
+	for (vid = fm10k_find_next_vlan(interface, 0);
+	     vid < VLAN_N_VID;
+	     vid = fm10k_find_next_vlan(interface, vid))
+		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+					vid, true);
+
 	fm10k_mbx_unlock(interface);
 
 	return sdev;
@@ -1522,8 +1562,8 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
 	struct fm10k_dglort_cfg dglort = { 0 };
 	struct fm10k_hw *hw = &interface->hw;
 	struct net_device *sdev = priv;
+	u16 vid, glort;
 	int i;
-	u16 glort;
 
 	if (!l2_accel)
 		return;
@@ -1550,6 +1590,12 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
 					hw->mac.default_vid, false);
 	}
 
+	for (vid = fm10k_find_next_vlan(interface, 0);
+	     vid < VLAN_N_VID;
+	     vid = fm10k_find_next_vlan(interface, vid))
+		fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+					vid, false);
+
 	fm10k_mbx_unlock(interface);
 
 	/* record removal */
-- 
2.14.3

^ permalink raw reply related

* [net-next 2/6] fm10k: reduce duplicate fm10k_stat macro code
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20180507144521.6979-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Share some of the code for setting up fm10k_stat macros by implementing
an FM10K_STAT_FIELDS macro which we can use when setting up the type
specific macros.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 28 ++++++++++++------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index eeac2b75a195..f4cad9ffdc79 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -11,12 +11,16 @@ struct fm10k_stats {
 	int stat_offset;
 };
 
-#define FM10K_NETDEV_STAT(_net_stat) { \
-	.stat_string = #_net_stat, \
-	.sizeof_stat = FIELD_SIZEOF(struct net_device_stats, _net_stat), \
-	.stat_offset = offsetof(struct net_device_stats, _net_stat) \
+#define FM10K_STAT_FIELDS(_type, _name, _stat) { \
+	.stat_string = _name, \
+	.sizeof_stat = FIELD_SIZEOF(_type, _stat), \
+	.stat_offset = offsetof(_type, _stat) \
 }
 
+/* netdevice statistics */
+#define FM10K_NETDEV_STAT(_net_stat) \
+	FM10K_STAT_FIELDS(struct net_device_stats, #_net_stat, _net_stat)
+
 static const struct fm10k_stats fm10k_gstrings_net_stats[] = {
 	FM10K_NETDEV_STAT(tx_packets),
 	FM10K_NETDEV_STAT(tx_bytes),
@@ -34,11 +38,9 @@ static const struct fm10k_stats fm10k_gstrings_net_stats[] = {
 
 #define FM10K_NETDEV_STATS_LEN	ARRAY_SIZE(fm10k_gstrings_net_stats)
 
-#define FM10K_STAT(_name, _stat) { \
-	.stat_string = _name, \
-	.sizeof_stat = FIELD_SIZEOF(struct fm10k_intfc, _stat), \
-	.stat_offset = offsetof(struct fm10k_intfc, _stat) \
-}
+/* General interface statistics */
+#define FM10K_STAT(_name, _stat) \
+	FM10K_STAT_FIELDS(struct fm10k_intfc, _name, _stat)
 
 static const struct fm10k_stats fm10k_gstrings_global_stats[] = {
 	FM10K_STAT("tx_restart_queue", restart_queue),
@@ -75,11 +77,9 @@ static const struct fm10k_stats fm10k_gstrings_pf_stats[] = {
 	FM10K_STAT("nodesc_drop", stats.nodesc_drop.count),
 };
 
-#define FM10K_MBX_STAT(_name, _stat) { \
-	.stat_string = _name, \
-	.sizeof_stat = FIELD_SIZEOF(struct fm10k_mbx_info, _stat), \
-	.stat_offset = offsetof(struct fm10k_mbx_info, _stat) \
-}
+/* mailbox statistics */
+#define FM10K_MBX_STAT(_name, _stat) \
+	FM10K_STAT_FIELDS(struct fm10k_mbx_info, _name, _stat)
 
 static const struct fm10k_stats fm10k_gstrings_mbx_stats[] = {
 	FM10K_MBX_STAT("mbx_tx_busy", tx_busy),
-- 
2.14.3

^ permalink raw reply related

* [net-next 0/6][pull request] 100GbE Intel Wired LAN Driver Updates 2018-05-07
From: Jeff Kirsher @ 2018-05-07 14:45 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains updates to fm10k only.

Jake provides all the changes in the series, starting with adding
support for accelerated MACVLAN devices.  Reduced code duplication by
implementing a macro to be used when setting up the type specific
macros.  Avoided potential bugs with stats by using a macro to calculate
the array size when passing to ensure that the size is correct.

The following are changes since commit 90278871d4b0da39c84fc9aa4929b0809dc7cf3c:
  Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE

Jacob Keller (6):
  fm10k: setup VLANs for l2 accelerated macvlan interfaces
  fm10k: reduce duplicate fm10k_stat macro code
  fm10k: use variadic arguments to fm10k_add_stat_strings
  fm10k: use macro to avoid passing the array and size separately
  fm10k: warn if the stat size is unknown
  fm10k: don't protect fm10k_queue_mac_request by fm10k_host_mbx_ready

 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 115 +++++++++++------------
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c  |  62 ++++++++++--
 2 files changed, 110 insertions(+), 67 deletions(-)

-- 
2.14.3

^ permalink raw reply

* Re: [PATCH net-next v2 1/3] ipv4: support sport and dport in RTM_GETROUTE
From: Roopa Prabhu @ 2018-05-07 14:42 UTC (permalink / raw)
  To: David Ahern; +Cc: David Miller, netdev, Nikolay Aleksandrov, Ido Schimmel
In-Reply-To: <a6027834-d056-5ff9-82e7-52d4ba2e90fb@cumulusnetworks.com>

On Sun, May 6, 2018 at 6:46 PM, David Ahern <dsa@cumulusnetworks.com> wrote:
> On 5/6/18 6:59 PM, Roopa Prabhu wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> This is a followup to fib rules sport, dport match support.
>> Having them supported in getroute makes it easier to test
>> fib rule lookups. Used by fib rule self tests. Before this patch
>> getroute used same skb to pass through the route lookup and
>> for the netlink getroute reply msg. This patch allocates separate
>> skb's to keep flow dissector happy.
>>
>> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
>> ---
>>  include/uapi/linux/rtnetlink.h |   2 +
>>  net/ipv4/route.c               | 151 ++++++++++++++++++++++++++++++-----------
>>  2 files changed, 115 insertions(+), 38 deletions(-)
>>
>> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>> index 9b15005..630ecf4 100644
>> --- a/include/uapi/linux/rtnetlink.h
>> +++ b/include/uapi/linux/rtnetlink.h
>> @@ -327,6 +327,8 @@ enum rtattr_type_t {
>>       RTA_PAD,
>>       RTA_UID,
>>       RTA_TTL_PROPAGATE,
>> +     RTA_SPORT,
>> +     RTA_DPORT,
>
> If you are going to add sport and dport because of the potential for FIB
> rules, you need to add ip-proto as well. I realize existing code assumed
> UDP, but the FIB rules cover any IP proto. Yes, I know this makes the
> change much larger to generate tcp, udp as well as iphdr options; the
> joys of new features. ;-)


:) sure..like i mentioned in the cover letter..., i was thinking of
submitting follow up patches for more ip_proto.
since i will be spinning v3, let me see if i can include that too.



>
> I also suggest a comment that these new RTA attributes are used for
> GETROUTE only.


sure

>
> And you need to add the new entries to rtm_ipv4_policy.
>
>
>>       __RTA_MAX
>>  };

ack,

>>
>> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
>> index 1412a7b..e91ed62 100644
>> --- a/net/ipv4/route.c
>> +++ b/net/ipv4/route.c
>> @@ -2568,11 +2568,10 @@ struct rtable *ip_route_output_flow(struct net *net, struct flowi4 *flp4,
>>  EXPORT_SYMBOL_GPL(ip_route_output_flow);
>>
>>  /* called with rcu_read_lock held */
>> -static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
>> -                     struct flowi4 *fl4, struct sk_buff *skb, u32 portid,
>> -                     u32 seq)
>> +static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
>> +                     struct rtable *rt, u32 table_id, struct flowi4 *fl4,
>> +                     struct sk_buff *skb, u32 portid, u32 seq)
>>  {
>> -     struct rtable *rt = skb_rtable(skb);
>>       struct rtmsg *r;
>>       struct nlmsghdr *nlh;
>>       unsigned long expires = 0;
>> @@ -2651,6 +2650,14 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
>>                       from_kuid_munged(current_user_ns(), fl4->flowi4_uid)))
>>               goto nla_put_failure;
>>
>> +     if (fl4->fl4_sport &&
>> +         nla_put_be16(skb, RTA_SPORT, fl4->fl4_sport))
>> +             goto nla_put_failure;
>> +
>> +     if (fl4->fl4_dport &&
>> +         nla_put_be16(skb, RTA_DPORT, fl4->fl4_dport))
>> +             goto nla_put_failure;
>
> Why return the attributes to the user? I can't see any value in that.
> UID option is not returned either so there is precedence.

hmm..i do see UID returned just 2 lines above. :)

In the least i think it will confirm that the kernel did see your attributes :).


>
>
>> +
>>       error = rt->dst.error;
>>
>>       if (rt_is_input_route(rt)) {
>> @@ -2668,7 +2675,7 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
>>                       }
>>               } else
>>  #endif
>> -                     if (nla_put_u32(skb, RTA_IIF, skb->dev->ifindex))
>> +                     if (nla_put_u32(skb, RTA_IIF, fl4->flowi4_iif))
>>                               goto nla_put_failure;
>>       }
>>
>> @@ -2683,35 +2690,86 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
>>       return -EMSGSIZE;
>>  }
>>
>> +static int nla_get_port(struct nlattr *attr, __be16 *port)
>> +{
>> +     int p = nla_get_be16(attr);
>
> __be16 p;
>
>> +
>> +     if (p <= 0 || p >= 0xffff)
>> +             return -EINVAL;
>
> This check is not needed by definition of be16.

ack, will fix the kbuild sparse warning also for both v4 and v6.


>
>> +
>> +     *port = p;
>> +     return 0;
>> +}
>> +
>> +static int inet_rtm_getroute_reply(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>> +                                __be32 dst, __be32 src, struct flowi4 *fl4,
>> +                                struct rtable *rt, struct fib_result *res)
>> +{
>> +     struct net *net = sock_net(in_skb->sk);
>> +     struct rtmsg *rtm = nlmsg_data(nlh);
>> +     u32 table_id = RT_TABLE_MAIN;
>> +     struct sk_buff *skb;
>> +     int err = 0;
>> +
>> +     skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
>> +     if (!skb) {
>> +             err = -ENOMEM;
>> +             return err;
>> +     }
>
> just 'return -ENOMEM' and without the {}.

yep

>
>
>> +
>> +     if (rtm->rtm_flags & RTM_F_LOOKUP_TABLE)
>> +             table_id = res->table ? res->table->tb_id : 0;
>> +
>> +     if (rtm->rtm_flags & RTM_F_FIB_MATCH)
>> +             err = fib_dump_info(skb, NETLINK_CB(in_skb).portid,
>> +                                 nlh->nlmsg_seq, RTM_NEWROUTE, table_id,
>> +                                 rt->rt_type, res->prefix, res->prefixlen,
>> +                                 fl4->flowi4_tos, res->fi, 0);
>> +     else
>> +             err = rt_fill_info(net, dst, src, rt, table_id,
>> +                                fl4, skb, NETLINK_CB(in_skb).portid,
>> +                                nlh->nlmsg_seq);
>> +     if (err < 0)
>> +             goto errout;
>> +
>> +     return rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
>> +
>> +errout:
>> +     kfree_skb(skb);
>> +     return err;
>> +}
>> +
>>  static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
>>                            struct netlink_ext_ack *extack)
>>  {
>>       struct net *net = sock_net(in_skb->sk);
>> -     struct rtmsg *rtm;
>>       struct nlattr *tb[RTA_MAX+1];
>> +     __be16 sport = 0, dport = 0;
>>       struct fib_result res = {};
>>       struct rtable *rt = NULL;
>> +     struct sk_buff *skb;
>> +     struct rtmsg *rtm;
>>       struct flowi4 fl4;
>> +     struct iphdr *iph;
>> +     struct udphdr *udph;
>>       __be32 dst = 0;
>>       __be32 src = 0;
>> +     kuid_t uid;
>>       u32 iif;
>>       int err;
>>       int mark;
>> -     struct sk_buff *skb;
>> -     u32 table_id = RT_TABLE_MAIN;
>> -     kuid_t uid;
>>
>>       err = nlmsg_parse(nlh, sizeof(*rtm), tb, RTA_MAX, rtm_ipv4_policy,
>>                         extack);
>>       if (err < 0)
>> -             goto errout;
>> +             return err;
>>
>>       rtm = nlmsg_data(nlh);
>>
>>       skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
>>       if (!skb) {
>>               err = -ENOBUFS;
>> -             goto errout;
>> +             return err;
>
> just return -ENOBUFS
>

yes

>
>>       }
>>
>>       /* Reserve room for dummy headers, this skb can pass
>

^ permalink raw reply

* Re: [bpf-next v2 8/9] bpf: Provide helper to do forwarding lookups in kernel FIB table
From: David Ahern @ 2018-05-07 14:26 UTC (permalink / raw)
  To: Daniel Borkmann, Jesper Dangaard Brouer
  Cc: netdev, borkmann, ast, davem, shm, roopa, toke, john.fastabend
In-Reply-To: <247aa1c4-e860-db72-fbb4-9da6c3a3f84c@iogearbox.net>

On 5/7/18 8:10 AM, Daniel Borkmann wrote:
> On 05/07/2018 03:35 PM, Jesper Dangaard Brouer wrote:
>> On Thu,  3 May 2018 19:54:31 -0700 David Ahern <dsahern@gmail.com> wrote:
>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 6877426c23a6..cf0d27acf1d1 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>> [...]
>>> +static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
>>> +	.func		= bpf_xdp_fib_lookup,
>>> +	.gpl_only	= true,
>>
>> Is it a deliberate choice to require BPF-progs using this helper to be
>> GPL licensed?
>>
>> Asking as this seems to be the first network related helper with this
>> requirement, while this is typical for tracing related helpers.
> 
> Good point, we should remove that. In networking it's only the perf event
> output helpers tying into tracing bits. After all, if you do a route lookup
> via netlink from user space there's no such restriction at all.
> 

Networking symbols are typically exported GPL for modules. The person
writing the code and exporting GPL is specifying a desire that only GPL
licensed modules can link to the symbol.

Given the common analogy of modules and bpf programs, why can't a writer
of a bpf helper specify a preference that only GPL licensed programs
leverage a BPF helper?

^ permalink raw reply

* Re: [PATCH] net: 8390: Fix possible data races in __ei_get_stats
From: Eric Dumazet @ 2018-05-07 14:15 UTC (permalink / raw)
  To: Jia-Ju Bai, davem, fthain, joe; +Cc: netdev, linux-kernel
In-Reply-To: <20180507140809.28847-1-baijiaju1990@gmail.com>



On 05/07/2018 07:08 AM, Jia-Ju Bai wrote:
> The write operations to "dev->stats" are protected by 
> the spinlock on line 862-864, but the read operations to
> this data on line 858 and 867 are not protected by the spinlock.
> Thus, there may exist data races for "dev->stats".
> 
> To fix the data races, the read operations to "dev->stats" are 
> protected by the spinlock, and a local variable is used for return.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
> ---
>  drivers/net/ethernet/8390/lib8390.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/8390/lib8390.c b/drivers/net/ethernet/8390/lib8390.c
> index c9c55c9eab9f..198952247d30 100644
> --- a/drivers/net/ethernet/8390/lib8390.c
> +++ b/drivers/net/ethernet/8390/lib8390.c
> @@ -852,19 +852,25 @@ static struct net_device_stats *__ei_get_stats(struct net_device *dev)
>  	unsigned long ioaddr = dev->base_addr;
>  	struct ei_device *ei_local = netdev_priv(dev);
>  	unsigned long flags;
> +	struct net_device_stats *stats;
> +
> +	spin_lock_irqsave(&ei_local->page_lock, flags);
>  
>  	/* If the card is stopped, just return the present stats. */
> -	if (!netif_running(dev))
> -		return &dev->stats;
> +	if (!netif_running(dev)) {
> +		stats = &dev->stats;
> +		spin_unlock_irqrestore(&ei_local->page_lock, flags);
> +		return stats;
> +	}
>  
> -	spin_lock_irqsave(&ei_local->page_lock, flags);
>  	/* Read the counter registers, assuming we are in page 0. */
>  	dev->stats.rx_frame_errors  += ei_inb_p(ioaddr + EN0_COUNTER0);
>  	dev->stats.rx_crc_errors    += ei_inb_p(ioaddr + EN0_COUNTER1);
>  	dev->stats.rx_missed_errors += ei_inb_p(ioaddr + EN0_COUNTER2);
> +	stats = &dev->stats;
>  	spin_unlock_irqrestore(&ei_local->page_lock, flags);
>  
> -	return &dev->stats;
> +	return stats;
>  }
>  
>  /*
> 

dev->stats is not a pointer, it is an array embedded in the 
struct net_device

So this patch is not needed, since dev->stats can not change.

^ permalink raw reply

* Re: [bpf-next v2 8/9] bpf: Provide helper to do forwarding lookups in kernel FIB table
From: Daniel Borkmann @ 2018-05-07 14:10 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern
  Cc: netdev, borkmann, ast, davem, shm, roopa, toke, john.fastabend
In-Reply-To: <20180507153552.5063a4b2@redhat.com>

On 05/07/2018 03:35 PM, Jesper Dangaard Brouer wrote:
> On Thu,  3 May 2018 19:54:31 -0700 David Ahern <dsahern@gmail.com> wrote:
> 
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 6877426c23a6..cf0d27acf1d1 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
> [...]
>> +static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
>> +	.func		= bpf_xdp_fib_lookup,
>> +	.gpl_only	= true,
> 
> Is it a deliberate choice to require BPF-progs using this helper to be
> GPL licensed?
> 
> Asking as this seems to be the first network related helper with this
> requirement, while this is typical for tracing related helpers.

Good point, we should remove that. In networking it's only the perf event
output helpers tying into tracing bits. After all, if you do a route lookup
via netlink from user space there's no such restriction at all.

^ permalink raw reply

* [PATCH] net: 8390: Fix possible data races in __ei_get_stats
From: Jia-Ju Bai @ 2018-05-07 14:08 UTC (permalink / raw)
  To: davem, fthain, joe; +Cc: netdev, linux-kernel, Jia-Ju Bai

The write operations to "dev->stats" are protected by 
the spinlock on line 862-864, but the read operations to
this data on line 858 and 867 are not protected by the spinlock.
Thus, there may exist data races for "dev->stats".

To fix the data races, the read operations to "dev->stats" are 
protected by the spinlock, and a local variable is used for return.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 drivers/net/ethernet/8390/lib8390.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/8390/lib8390.c b/drivers/net/ethernet/8390/lib8390.c
index c9c55c9eab9f..198952247d30 100644
--- a/drivers/net/ethernet/8390/lib8390.c
+++ b/drivers/net/ethernet/8390/lib8390.c
@@ -852,19 +852,25 @@ static struct net_device_stats *__ei_get_stats(struct net_device *dev)
 	unsigned long ioaddr = dev->base_addr;
 	struct ei_device *ei_local = netdev_priv(dev);
 	unsigned long flags;
+	struct net_device_stats *stats;
+
+	spin_lock_irqsave(&ei_local->page_lock, flags);
 
 	/* If the card is stopped, just return the present stats. */
-	if (!netif_running(dev))
-		return &dev->stats;
+	if (!netif_running(dev)) {
+		stats = &dev->stats;
+		spin_unlock_irqrestore(&ei_local->page_lock, flags);
+		return stats;
+	}
 
-	spin_lock_irqsave(&ei_local->page_lock, flags);
 	/* Read the counter registers, assuming we are in page 0. */
 	dev->stats.rx_frame_errors  += ei_inb_p(ioaddr + EN0_COUNTER0);
 	dev->stats.rx_crc_errors    += ei_inb_p(ioaddr + EN0_COUNTER1);
 	dev->stats.rx_missed_errors += ei_inb_p(ioaddr + EN0_COUNTER2);
+	stats = &dev->stats;
 	spin_unlock_irqrestore(&ei_local->page_lock, flags);
 
-	return &dev->stats;
+	return stats;
 }
 
 /*
-- 
2.17.0

^ permalink raw reply related

* RE: [PATCH net-next] flow_dissector: do not rely on implicit casts
From: Jon Maloy @ 2018-05-07 13:54 UTC (permalink / raw)
  To: Paolo Abeni, netdev@vger.kernel.org; +Cc: David S. Miller
In-Reply-To: <163b504b851217e600d9b48aa7cba76b376a675a.1525687522.git.pabeni@redhat.com>

Acked-by: Jon Maloy <jon.maloy@ericsson.com>


> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Paolo Abeni
> Sent: Monday, May 07, 2018 06:06
> To: netdev@vger.kernel.org
> Cc: David S. Miller <davem@davemloft.net>
> Subject: [PATCH net-next] flow_dissector: do not rely on implicit casts
> 
> This change fixes a couple of type mismatch reported by the sparse tool,
> explicitly using the requested type for the offending arguments.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  include/net/tipc.h        | 4 ++--
>  net/core/flow_dissector.c | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/tipc.h b/include/net/tipc.h index
> 07670ec022a7..f0e7e6bc1bef 100644
> --- a/include/net/tipc.h
> +++ b/include/net/tipc.h
> @@ -44,11 +44,11 @@ struct tipc_basic_hdr {
>  	__be32 w[4];
>  };
> 
> -static inline u32 tipc_hdr_rps_key(struct tipc_basic_hdr *hdr)
> +static inline __be32 tipc_hdr_rps_key(struct tipc_basic_hdr *hdr)
>  {
>  	u32 w0 = ntohl(hdr->w[0]);
>  	bool keepalive_msg = (w0 & KEEPALIVE_MSG_MASK) ==
> KEEPALIVE_MSG_MASK;
> -	int key;
> +	__be32 key;
> 
>  	/* Return source node identity as key */
>  	if (likely(!keepalive_msg))
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index
> 030d4ca177fb..4fc1e84d77ec 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -1316,7 +1316,7 @@ u32 skb_get_poff(const struct sk_buff *skb)  {
>  	struct flow_keys_basic keys;
> 
> -	if (!skb_flow_dissect_flow_keys_basic(skb, &keys, 0, 0, 0, 0, 0))
> +	if (!skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0))
>  		return 0;
> 
>  	return __skb_get_poff(skb, skb->data, &keys, skb_headlen(skb));
> --
> 2.14.3

^ permalink raw reply

* Re: [bpf-next v2 8/9] bpf: Provide helper to do forwarding lookups in kernel FIB table
From: Jesper Dangaard Brouer @ 2018-05-07 13:35 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, borkmann, ast, davem, shm, roopa, toke, john.fastabend,
	brouer
In-Reply-To: <20180504025432.23451-9-dsahern@gmail.com>


On Thu,  3 May 2018 19:54:31 -0700 David Ahern <dsahern@gmail.com> wrote:

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 6877426c23a6..cf0d27acf1d1 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
[...]
> +static const struct bpf_func_proto bpf_xdp_fib_lookup_proto = {
> +	.func		= bpf_xdp_fib_lookup,
> +	.gpl_only	= true,

Is it a deliberate choice to require BPF-progs using this helper to be
GPL licensed?

Asking as this seems to be the first network related helper with this
requirement, while this is typical for tracing related helpers.


> +	.ret_type	= RET_INTEGER,
> +	.arg1_type      = ARG_PTR_TO_CTX,
> +	.arg2_type      = ARG_PTR_TO_MEM,
> +	.arg3_type      = ARG_CONST_SIZE,
> +	.arg4_type	= ARG_ANYTHING,
> +};

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH net-next] net-next/hinic: add pci device ids for 25ge and 100ge card
From: Zhao Chen @ 2018-05-07 13:21 UTC (permalink / raw)
  To: davem
  Cc: linux-kernel, netdev, aviad.krawczyk, zhaochen6, tony.qu,
	yin.yinshi, luoshaokai

This patch adds PCI device IDs to support 25GE and 100GE card:

1. Add device id 0x0201 for HINIC 100GE dual port card.
2. Add device id 0x0200 for HINIC 25GE dual port card.
3. Macro of device id 0x1822 is modified for HINIC 25GE quad port card.

Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
---
 drivers/net/ethernet/huawei/hinic/hinic_main.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/huawei/hinic/hinic_main.c b/drivers/net/ethernet/huawei/hinic/hinic_main.c
index eb53bd93065e..5b122728dcb4 100644
--- a/drivers/net/ethernet/huawei/hinic/hinic_main.c
+++ b/drivers/net/ethernet/huawei/hinic/hinic_main.c
@@ -51,7 +51,9 @@ static unsigned int rx_weight = 64;
 module_param(rx_weight, uint, 0644);
 MODULE_PARM_DESC(rx_weight, "Number Rx packets for NAPI budget (default=64)");
 
-#define PCI_DEVICE_ID_HI1822_PF         0x1822
+#define HINIC_DEV_ID_QUAD_PORT_25GE     0x1822
+#define HINIC_DEV_ID_DUAL_PORT_25GE     0x0200
+#define HINIC_DEV_ID_DUAL_PORT_100GE    0x0201
 
 #define HINIC_WQ_NAME                   "hinic_dev"
 
@@ -1097,7 +1099,9 @@ static void hinic_remove(struct pci_dev *pdev)
 }
 
 static const struct pci_device_id hinic_pci_table[] = {
-	{ PCI_VDEVICE(HUAWEI, PCI_DEVICE_ID_HI1822_PF), 0},
+	{ PCI_VDEVICE(HUAWEI, HINIC_DEV_ID_QUAD_PORT_25GE), 0},
+	{ PCI_VDEVICE(HUAWEI, HINIC_DEV_ID_DUAL_PORT_25GE), 0},
+	{ PCI_VDEVICE(HUAWEI, HINIC_DEV_ID_DUAL_PORT_100GE), 0},
 	{ 0, 0}
 };
 MODULE_DEVICE_TABLE(pci, hinic_pci_table);
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net] vhost: Use kzalloc() to allocate vhost_msg_node
From: Dmitry Vyukov @ 2018-05-07 13:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Kevin Easton, Jason Wang, KVM list, virtualization, netdev, LKML,
	syzkaller-bugs
In-Reply-To: <20180507155534-mutt-send-email-mst@kernel.org>

On Mon, May 7, 2018 at 3:03 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Apr 27, 2018 at 11:45:02AM -0400, Kevin Easton wrote:
>> The struct vhost_msg within struct vhost_msg_node is copied to userspace,
>> so it should be allocated with kzalloc() to ensure all structure padding
>> is zeroed.
>>
>> Signed-off-by: Kevin Easton <kevin@guarana.org>
>> Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
>> ---
>>  drivers/vhost/vhost.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index f3bd8e9..1b84dcff 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -2339,7 +2339,7 @@ EXPORT_SYMBOL_GPL(vhost_disable_notify);
>>  /* Create a new message. */
>>  struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
>>  {
>> -     struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
>> +     struct vhost_msg_node *node = kzalloc(sizeof *node, GFP_KERNEL);
>>       if (!node)
>>               return NULL;
>>       node->vq = vq;
>
>
> Let's just init the msg though.
>
> OK it seems this is the best we can do for now,
> we need a new feature bit to fix it for 32 bit
> userspace on 64 bit kernels.
>
> Does the following help?

Hi Michael,

You can ask reporter (syzbot) to test:
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
https://github.com/google/syzkaller/blob/master/docs/syzbot.md#kmsan-bugs


> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index f3bd8e9..58d9aec 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2342,6 +2342,9 @@ struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
>         struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
>         if (!node)
>                 return NULL;
> +
> +       /* Make sure all padding within the structure is initialized. */
> +       memset(&node->msg, 0, sizeof node->msg);
>         node->vq = vq;
>         node->msg.type = type;
>         return node;
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180507155534-mutt-send-email-mst%40kernel.org.
> For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* [PATCH net 2/2] net: aquantia: Limit number of vectors to actually allocated irqs
From: Igor Russkikh @ 2018-05-07 13:10 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1525360773.git.igor.russkikh@aquantia.com>

Driver should use pci_alloc_irq_vectors return value to correct number
of allocated vectors and napi instances. Otherwise it'll panic later
in pci_irq_vector.

Driver also should allow more than one MSI vectors to be allocated.

Error return path from pci_alloc_irq_vectors is also fixed to revert
resources in a correct sequence when error happens.

Reported-by: Long, Nicholas <nicholas.a.long@baesystems.com>
Fixes: 23ee07a ("net: aquantia: Cleanup pci functions module")
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c      |  1 +
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h      |  1 +
 drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 20 ++++++++++----------
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index 720760d..1a1a638 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -95,6 +95,7 @@ void aq_nic_cfg_start(struct aq_nic_s *self)
 	/*rss rings */
 	cfg->vecs = min(cfg->aq_hw_caps->vecs, AQ_CFG_VECS_DEF);
 	cfg->vecs = min(cfg->vecs, num_online_cpus());
+	cfg->vecs = min(cfg->vecs, self->irqvecs);
 	/* cfg->vecs should be power of 2 for RSS */
 	if (cfg->vecs >= 8U)
 		cfg->vecs = 8U;
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
index 219b550..faa533a 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.h
@@ -80,6 +80,7 @@ struct aq_nic_s {
 
 	struct pci_dev *pdev;
 	unsigned int msix_entry_mask;
+	u32 irqvecs;
 };
 
 static inline struct device *aq_nic_get_dev(struct aq_nic_s *self)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
index ecc6306..a50e08b 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c
@@ -267,16 +267,16 @@ static int aq_pci_probe(struct pci_dev *pdev,
 	numvecs = min(numvecs, num_online_cpus());
 	/*enable interrupts */
 #if !AQ_CFG_FORCE_LEGACY_INT
-	err = pci_alloc_irq_vectors(self->pdev, numvecs, numvecs,
-				    PCI_IRQ_MSIX);
-
-	if (err < 0) {
-		err = pci_alloc_irq_vectors(self->pdev, 1, 1,
-					    PCI_IRQ_MSI | PCI_IRQ_LEGACY);
-		if (err < 0)
-			goto err_hwinit;
+	numvecs = pci_alloc_irq_vectors(self->pdev, 1, numvecs,
+					PCI_IRQ_MSIX | PCI_IRQ_MSI |
+					PCI_IRQ_LEGACY);
+
+	if (numvecs < 0) {
+		err = numvecs;
+		goto err_hwinit;
 	}
 #endif
+	self->irqvecs = numvecs;
 
 	/* net device init */
 	aq_nic_cfg_start(self);
@@ -298,9 +298,9 @@ static int aq_pci_probe(struct pci_dev *pdev,
 	kfree(self->aq_hw);
 err_ioremap:
 	free_netdev(ndev);
-err_pci_func:
-	pci_release_regions(pdev);
 err_ndev:
+	pci_release_regions(pdev);
+err_pci_func:
 	pci_disable_device(pdev);
 	return err;
 }
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 1/2] net: aquantia: driver should correctly declare vlan_features bits
From: Igor Russkikh @ 2018-05-07 13:10 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh
In-Reply-To: <cover.1525360773.git.igor.russkikh@aquantia.com>

In particular, not reporting SG forced skbs to be linear for vlan
interfaces over atlantic NIC.

With this fix it is possible to enable SG feature on device and
therefore optimize performance.

Reported-by: Ma Yuying <yuma@redhat.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 drivers/net/ethernet/aquantia/atlantic/aq_nic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index 32f6d2e..720760d 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -246,6 +246,8 @@ void aq_nic_ndev_init(struct aq_nic_s *self)

 	self->ndev->hw_features |= aq_hw_caps->hw_features;
 	self->ndev->features = aq_hw_caps->hw_features;
+	self->ndev->vlan_features |= NETIF_F_HW_CSUM | NETIF_F_RXCSUM |
+				     NETIF_F_RXHASH | NETIF_F_SG | NETIF_F_LRO;
 	self->ndev->priv_flags = aq_hw_caps->hw_priv_flags;
 	self->ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;

-- 
2.7.4

^ permalink raw reply related

* [PATCH net 0/2] Aquantia various patches 2018-05
From: Igor Russkikh @ 2018-05-07 13:10 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, David Arcari, Pavel Belous, Igor Russkikh

These are two patches covering issues found during test cycles:

First is that driver should declare valid vlan_features
Second fix is about correct allocation of MSI interrupts on some systems.

Igor Russkikh (2):
  net: aquantia: driver should correctly declare vlan_features bits
  net: aquantia: Limit number of vectors to actually allocated irqs

 drivers/net/ethernet/aquantia/atlantic/aq_nic.c      |  3 +++
 drivers/net/ethernet/aquantia/atlantic/aq_nic.h      |  1 +
 drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c | 20 ++++++++++----------
 3 files changed, 14 insertions(+), 10 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: Jesper Dangaard Brouer @ 2018-05-07 13:09 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Alexei Starovoitov, Daniel Borkmann, Björn Töpel,
	Karlsson, Magnus, Alexander Duyck, Alexander Duyck,
	John Fastabend, Alexei Starovoitov, Willem de Bruijn,
	Michael S. Tsirkin, Network Development, Björn Töpel,
	michael.lundkvist, Brandeburg, Jesse, Singhai, Anjali,
	Zhang, Qi Z, brouer
In-Reply-To: <CAJ8uoz2L+3dw5OOHJJ0CyV-y5w1ooNAjKHzHAcAiKhTjAq-vzQ@mail.gmail.com>

On Mon, 7 May 2018 11:13:58 +0200
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:

> On Sat, May 5, 2018 at 2:34 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Fri, May 04, 2018 at 01:22:17PM +0200, Magnus Karlsson wrote:  
> >> On Fri, May 4, 2018 at 1:38 AM, Alexei Starovoitov
> >> <alexei.starovoitov@gmail.com> wrote:  
> >> > On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:  
> >> >> On 05/02/2018 01:01 PM, Björn Töpel wrote:  
> >> >> > From: Björn Töpel <bjorn.topel@intel.com>
> >> >> >
> >> >> > This patch set introduces a new address family called AF_XDP that is
> >> >> > optimized for high performance packet processing and, in upcoming
> >> >> > patch sets, zero-copy semantics. In this patch set, we have removed
> >> >> > all zero-copy related code in order to make it smaller, simpler and
> >> >> > hopefully more review friendly. This patch set only supports copy-mode
> >> >> > for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
> >> >> > for RX using the XDP_DRV path. Zero-copy support requires XDP and
> >> >> > driver changes that Jesper Dangaard Brouer is working on. Some of his
> >> >> > work has already been accepted. We will publish our zero-copy support
> >> >> > for RX and TX on top of his patch sets at a later point in time.  
> >> >>
> >> >> +1, would be great to see it land this cycle. Saw few minor nits here
> >> >> and there but nothing to hold it up, for the series:
> >> >>
> >> >> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> >> >>
> >> >> Thanks everyone!  
> >> >
> >> > Great stuff!
> >> >
> >> > Applied to bpf-next, with one condition.
> >> > Upcoming zero-copy patches for both RX and TX need to be posted
> >> > and reviewed within this release window.
> >> > If netdev community as a whole won't be able to agree on the zero-copy
> >> > bits we'd need to revert this feature before the next merge window.  
> >>
> >> Thanks everyone for reviewing this. Highly appreciated.
> >>
> >> Just so we understand the purpose correctly:
> >>
> >> 1: Do you want to see the ZC patches in order to verify that the user
> >> space API holds? If so, we can produce an additional RFC  patch set
> >> using a big chunk of code that we had in RFC V1. We are not proud of
> >> this code since it is clunky, but it hopefully proves the point with
> >> the uapi being the same.
> >>
> >> 2: And/Or are you worried about us all (the netdev community) not
> >> agreeing on a way to implement ZC internally in the drivers and the
> >> XDP infrastructure? This is not going to be possible to finish during
> >> this cycle since we do not like the implementation we had in RFC V1.
> >> Too intrusive and now we also have nicer abstractions from Jesper that
> >> we can use and extend to provide a (hopefully) much cleaner and less
> >> intrusive solution.  
> >
> > short answer: both.
> >
> > Cleanliness and performance of the ZC code is not as important as
> > getting API right. The main concern that during ZC review process
> > we will find out that existing API has issues, so we have to
> > do this exercise before the merge window.
> > And RFC won't fly. Send the patches for real. They have to go
> > through the proper code review. The hackers of netdev community
> > can accept a partial, or a bit unclean, or slightly inefficient
> > implementation, since it can be and will be improved later,
> > but API we cannot change once it goes into official release.
> >
> > Here is the example of API concern:
> > this patch set added shared umem concept. It sounds good in theory,
> > but will it perform well with ZC ? Earlier RFCs didn't have that
> > feature. If it won't perform well than it shouldn't be in the tree.
> > The key reason to let AF_XDP into the tree is its performance promise.
> > If it doesn't perform we should rip it out and redesign.  
> 
> That is a fair point. We will try to produce patch sets for zero-copy
> RX and TX using the latest interfaces within this merge window. Just
> note that we will focus on this for the next week(s) instead of the
> review items that you and Daniel Borkmann submitted. If we get those
> patch sets out in time and we agree that they are a possible way
> forward, then we produce patches with your fixes. It was mainly small
> items, so should be quick.

I would like to see that you create a new xdp_mem_type for this new
zero-copy type. This will allow other XDP redirect methods/types (e.g.
devmap and cpumap) to react appropriately when receiving a zero-copy
frame.

For devmap, I'm hoping we can allow/support using the ndo_xdp_xmit call
without (first) copying (into a newly allocated page).  By arguing that
if an xsk-userspace app modify a frame it's not allowed to, then it is
simply a bug in the program. (Note, this would also allow using
ndo_xdp_xmit call for TX from xsk-userspace).

For cpumap, it is hard to avoid a copy, but I'm hoping we could delay
the copy (and alloc of mem dest area) until on the remote CPU.  This is
already the principle of cpumap; of moving the allocation of the SKB to
the remote CPU.

For ZC to interact with XDP redirect-core and return API, the zero-copy
memory type/allocator, need to provide an area for the xdp_frame data
to be stored in (as we cannot allow using top-of-frame like
non-zero-copy variants), and extend xdp_frame with an ZC umem-id.
I imagine we can avoid any dynamic allocations, as we upfront (at bind
and XDP_UMEM_REG time) know the number of frames.  (e.g. pre-alloc in
xdp_umem_reg() call, and have xdp_umem_get_xdp_frame lookup func).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net] vhost: Use kzalloc() to allocate vhost_msg_node
From: Michael S. Tsirkin @ 2018-05-07 13:03 UTC (permalink / raw)
  To: Kevin Easton
  Cc: Jason Wang, kvm, virtualization, netdev, linux-kernel,
	syzkaller-bugs
In-Reply-To: <20180427154502.GA22544@la.guarana.org>

On Fri, Apr 27, 2018 at 11:45:02AM -0400, Kevin Easton wrote:
> The struct vhost_msg within struct vhost_msg_node is copied to userspace,
> so it should be allocated with kzalloc() to ensure all structure padding
> is zeroed.
> 
> Signed-off-by: Kevin Easton <kevin@guarana.org>
> Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com
> ---
>  drivers/vhost/vhost.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index f3bd8e9..1b84dcff 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2339,7 +2339,7 @@ EXPORT_SYMBOL_GPL(vhost_disable_notify);
>  /* Create a new message. */
>  struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
>  {
> -	struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
> +	struct vhost_msg_node *node = kzalloc(sizeof *node, GFP_KERNEL);
>  	if (!node)
>  		return NULL;
>  	node->vq = vq;


Let's just init the msg though.

OK it seems this is the best we can do for now,
we need a new feature bit to fix it for 32 bit
userspace on 64 bit kernels.

Does the following help?

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index f3bd8e9..58d9aec 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2342,6 +2342,9 @@ struct vhost_msg_node *vhost_new_msg(struct vhost_virtqueue *vq, int type)
 	struct vhost_msg_node *node = kmalloc(sizeof *node, GFP_KERNEL);
 	if (!node)
 		return NULL;
+
+	/* Make sure all padding within the structure is initialized. */
+	memset(&node->msg, 0, sizeof node->msg);
 	node->vq = vq;
 	node->msg.type = type;
 	return node;

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox