Netdev List
 help / color / mirror / Atom feed
* Re: [RFC PATCH bpf-next v2 0/4] Implement bpf queue/stack maps
From: Joe Stringer @ 2018-09-07  6:27 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: mauricio.vasquez, ast, daniel, netdev, Joe Stringer
In-Reply-To: <20180907001317.fj7f6fg6ihljompp@ast-mbp.dhcp.thefacebook.com>

On Thu, 6 Sep 2018 at 17:13, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> bpf_map_pop_elem() is trying to do lookup_and_delete and preserve
> validity of value without races.
> With pcpu_freelist I don't think there is a solution.
> We can have this queue/stack map without prealloc and use kmalloc/kfree
> back and forth. Performance will not be as great, but for your use case,
> I suspect, it will be good enough.
> The key issue with kmalloc/kfree is unbounded time of rcu callbacks.
> If somebody starts doing push/pop for every packet, the rcu subsystem
> will struggle and nothing we can do about it.
>
> The only way I could think of to resolve this problem is to reuse
> the logic that Joe is working on for socket lookups inside the program.
> Joe,
> how is that going? Could you repost the latest patches?

I can rebase & send them out. Was just wanting to get a little more testing in.

Cheers,
Joe

^ permalink raw reply

* [PATCH v3 0/2] net: ethernet: i40e/i40evf fix build error
From: Wang Dongsheng @ 2018-09-07 11:19 UTC (permalink / raw)
  To: jeffrey.t.kirsher, sergei.shtylyov
  Cc: jacob.e.keller, davem, intel-wired-lan, netdev, linux-kernel,
	Wang Dongsheng

i40e: function __i40e_add_stat_strings can never be inlined because it uses
variable argument lists.

i40evf: A risk of "Multiple definition" from cross-including.

I am working on aarch64 platform and unfortunately the config file is MODULE.
So I missed the buildin test... Sorry for pushing this patch to v3 version...

Now I tested this patch with buildin and modules.

Tested on: x86_64, make ARCH=i386

Modules section .text:
i40e: 00019380 <__i40e_add_stat_strings>:
i40evf: 00006b00 <__i40e_add_stat_strings>:

Buildin:
i40e: c351ca60 <__i40e_add_stat_strings>:
i40evf: c354f2c0 <__i40e_add_stat_strings>:


Wang Dongsheng (2):
  net: ethernet: i40e: fix build error
  net: ethernet: i40evf: fix underlying build error

 .../net/ethernet/intel/i40e/i40e_ethtool.c    | 24 ++++++++++++++++++
 .../ethernet/intel/i40e/i40e_ethtool_stats.h  | 25 ++-----------------
 .../intel/i40evf/i40e_ethtool_stats.h         | 23 +----------------
 .../ethernet/intel/i40evf/i40evf_ethtool.c    | 24 ++++++++++++++++++
 4 files changed, 51 insertions(+), 45 deletions(-)

-- 
2.18.0

^ permalink raw reply

* [PATCH v3 2/2] net: ethernet: i40evf: fix underlying build error
From: Wang Dongsheng @ 2018-09-07 11:19 UTC (permalink / raw)
  To: jeffrey.t.kirsher, sergei.shtylyov
  Cc: jacob.e.keller, davem, intel-wired-lan, netdev, linux-kernel,
	Wang Dongsheng
In-Reply-To: <1536319175-3660-1-git-send-email-dongsheng.wang@hxt-semitech.com>

Can't have non-inline function in a header file.
There is a risk of "Multiple definition" from cross-including.

Tested on: x86_64, make ARCH=i386

Modules section .text:
i40e: 00019380 <__i40e_add_stat_strings>:
i40evf: 00006b00 <__i40e_add_stat_strings>:

Buildin section .text:
i40e: c351ca60 <__i40e_add_stat_strings>:
i40evf: c354f2c0 <__i40e_add_stat_strings>:

Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
---
V3: add static 
---
 .../intel/i40evf/i40e_ethtool_stats.h         | 23 +-----------------
 .../ethernet/intel/i40evf/i40evf_ethtool.c    | 24 +++++++++++++++++++
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
index 60b595dd8c39..62ab67a77753 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
@@ -181,29 +181,8 @@ i40evf_add_queue_stats(u64 **data, struct i40e_ring *ring)
 	*data += size;
 }
 
-/**
- * __i40e_add_stat_strings - copy stat strings into ethtool buffer
- * @p: ethtool supplied buffer
- * @stats: stat definitions array
- * @size: size of the stats array
- *
- * Format and copy the strings described by stats into the buffer pointed at
- * by p.
- **/
 static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
-				    const unsigned int size, ...)
-{
-	unsigned int i;
-
-	for (i = 0; i < size; i++) {
-		va_list args;
-
-		va_start(args, size);
-		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
-		*p += ETH_GSTRING_LEN;
-		va_end(args);
-	}
-}
+				    const unsigned int size, ...);
 
 /**
  * 40e_add_stat_strings - copy stat strings into ethtool buffer
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index 9319971c5c92..eb2e910bc3a1 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -171,6 +171,30 @@ static void i40evf_get_priv_flag_strings(struct net_device *netdev, u8 *data)
 	}
 }
 
+/**
+ * __i40e_add_stat_strings - copy stat strings into ethtool buffer
+ * @p: ethtool supplied buffer
+ * @stats: stat definitions array
+ * @size: size of the stats array
+ *
+ * Format and copy the strings described by stats into the buffer pointed at
+ * by p.
+ **/
+static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
+				    const unsigned int size, ...)
+{
+	unsigned int i;
+
+	for (i = 0; i < size; i++) {
+		va_list args;
+
+		va_start(args, size);
+		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
+		*p += ETH_GSTRING_LEN;
+		va_end(args);
+	}
+}
+
 /**
  * i40evf_get_stat_strings - Get stat strings
  * @netdev: network interface device structure
-- 
2.18.0

^ permalink raw reply related

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
From: Sinan Kaya @ 2018-09-07  6:40 UTC (permalink / raw)
  To: Daniel Drake, bhelgaas
  Cc: linux-pci, linux, nouveau, linux-pm, peter, kherbst,
	andriy.shevchenko, rafael.j.wysocki, keith.busch, mika.westerberg,
	jonathan.derrick, kugel, davem, hkallweit1, netdev, nic_swsd
In-Reply-To: <20180907053614.6540-1-drake@endlessm.com>

On 9/6/2018 10:36 PM, Daniel Drake wrote:
> +	if (pci_dev->class == PCI_CLASS_BRIDGE_PCI << 8)
> +		pci_setup_bridge_mmio_pref(pci_dev);

This should probably some kind of a quirk rather than default
for the listed card as it sounds like you are dealing with
broken hardware.

^ permalink raw reply

* Re: [Xen-devel] [PATCH] xen-netfront: wait xenbus state change when load module manually
From: Juergen Gross @ 2018-09-07 11:30 UTC (permalink / raw)
  To: Jiri Slaby, Boris Ostrovsky, Xiao Liang, David Miller
  Cc: netdev, linux-kernel, xen-devel
In-Reply-To: <172007cc-6a82-99dd-2d1e-5103a6c1b9b0@suse.cz>

On 07/09/18 13:06, Jiri Slaby wrote:
> On 08/24/2018, 04:26 PM, Boris Ostrovsky wrote:
>> On 08/24/2018 07:26 AM, Juergen Gross wrote:
>>> On 24/08/18 13:12, Jiri Slaby wrote:
>>>> On 07/30/2018, 10:18 AM, Xiao Liang wrote:
>>>>> On 07/29/2018 11:30 PM, David Miller wrote:
>>>>>> From: Xiao Liang <xiliang@redhat.com>
>>>>>> Date: Fri, 27 Jul 2018 17:56:08 +0800
>>>>>>
>>>>>>> @@ -1330,6 +1331,11 @@ static struct net_device
>>>>>>> *xennet_create_dev(struct xenbus_device *dev)
>>>>>>>       netif_carrier_off(netdev);
>>>>>>>         xenbus_switch_state(dev, XenbusStateInitialising);
>>>>>>> +    wait_event(module_load_q,
>>>>>>> +               xenbus_read_driver_state(dev->otherend) !=
>>>>>>> +               XenbusStateClosed &&
>>>>>>> +               xenbus_read_driver_state(dev->otherend) !=
>>>>>>> +               XenbusStateUnknown);
>>>>>>>       return netdev;
>>>>>>>      exit:
>>>>>> What performs the wakeups that will trigger for this sleep site?
>>>>> In my understanding, backend leaving closed/unknow state can trigger the
>>>>> wakeups. I mean to make sure both sides are ready for creating connection.
>>>> While backporting this to 4.12, I was surprised by the commit the same
>>>> as Boris and David.
>>>>
>>>> So I assume the explanation is that wake_up_all of module_unload_q in
>>>> netback_changed wakes also all the processes waiting on module_load_q?
>>>> If so, what makes sure that module_unload_q is queued and the process is
>>>> the same as for module_load_q?
>>> How could it? Either the thread is waiting on module_unload_q _or_ on
>>> module_load_q. It can't wait on two queues at the same time.
>>>
>>>> To me, it looks rather error-prone. Unless it is erroneous now, at least
>>>> for future changes. Wouldn't it make sense to wake up module_load_q
>>>> along with module_unload_q in netback_changed? Or drop module_load_q
>>>> completely and use only module_unload_q (i.e. in xennet_create_dev too)?
>>> To me this looks just wrong. A thread waiting on module_load_q won't be
>>> woken up again.
>>>
>>> I'd drop module_load_q in favor of module_unload_q.
>>
>>
>> Yes, use single queue, but rename it to something more neutral. module_wq?
> 
> Can somebody who is actually using the module fix this, please?

Already at it.


Juergen

^ permalink raw reply

* Re: [PATCH] net/sock: move memory_allocated over to percpu_counter variables
From: Eric Dumazet @ 2018-09-07  7:21 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Herbert Xu, David Miller, Neil Horman, Marcelo Ricardo Leitner,
	Vladislav Yasevich, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	linux-crypto, LKML, linux-sctp, netdev, linux-decnet-user,
	kernel-team, Yuchung Cheng, Neal Cardwell
In-Reply-To: <CANn89iKJcgMWb2Kmk6L9k=NkfBUKZ6BwriWr3O+N5Y0u5dy=9g@mail.gmail.com>

On Fri, Sep 7, 2018 at 12:03 AM Eric Dumazet <edumazet@google.com> wrote:

> Problem is : we have platforms with more than 100 cpus, and
> sk_memory_allocated() cost will be too expensive,
> especially if the host is under memory pressure, since all cpus will
> touch their private counter.
>
> per cpu variables do not really scale, they were ok 10 years ago when
> no more than 16 cpus were the norm.
>
> I would prefer change TCP to not aggressively call
> __sk_mem_reduce_allocated() from tcp_write_timer()
>
> Ideally only tcp_retransmit_timer() should attempt to reduce forward
> allocations, after recurring timeout.
>
> Note that after 20c64d5cd5a2bdcdc8982a06cb05e5e1bd851a3d ("net: avoid
> sk_forward_alloc overflows")
> we have better control over sockets having huge forward allocations.
>
> Something like :

Or something less risky :

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 7fdf222a0bdfe9775970082f6b5dcdcc82b2ae1a..0aee80b6966cb2898e46350c761f9eb431ff1206
100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -604,7 +604,8 @@ void tcp_write_timer_handler(struct sock *sk)
        }

 out:
-       sk_mem_reclaim(sk);
+       if (tcp_under_memory_pressure(sk))
+               sk_mem_reclaim(sk);
 }

 static void tcp_write_timer(struct timer_list *t)

^ permalink raw reply

* [PATCH] xen/netfront: fix waiting for xenbus state change
From: Juergen Gross @ 2018-09-07 12:21 UTC (permalink / raw)
  To: linux-kernel, xen-devel, netdev
  Cc: boris.ostrovsky, davem, Juergen Gross, stable

Commit 822fb18a82aba ("xen-netfront: wait xenbus state change when load
module manually") added a new wait queue to wait on for a state change
when the module is loaded manually. Unfortunately there is no wakeup
anywhere to stop that waiting.

Instead of introducing a new wait queue rename the existing
module_unload_q to module_wq and use it for both purposes (loading and
unloading).

As any state change of the backend might be intended to stop waiting
do the wake_up_all() in any case when netback_changed() is called.

Fixes: 822fb18a82aba ("xen-netfront: wait xenbus state change when load module manually")
Cc: <stable@vger.kernel.org> #4.18
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/net/xen-netfront.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 73f596a90c69..9407acbd19a9 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -87,8 +87,7 @@ struct netfront_cb {
 /* IRQ name is queue name with "-tx" or "-rx" appended */
 #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
 
-static DECLARE_WAIT_QUEUE_HEAD(module_load_q);
-static DECLARE_WAIT_QUEUE_HEAD(module_unload_q);
+static DECLARE_WAIT_QUEUE_HEAD(module_wq);
 
 struct netfront_stats {
 	u64			packets;
@@ -1332,11 +1331,11 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	netif_carrier_off(netdev);
 
 	xenbus_switch_state(dev, XenbusStateInitialising);
-	wait_event(module_load_q,
-			   xenbus_read_driver_state(dev->otherend) !=
-			   XenbusStateClosed &&
-			   xenbus_read_driver_state(dev->otherend) !=
-			   XenbusStateUnknown);
+	wait_event(module_wq,
+		   xenbus_read_driver_state(dev->otherend) !=
+		   XenbusStateClosed &&
+		   xenbus_read_driver_state(dev->otherend) !=
+		   XenbusStateUnknown);
 	return netdev;
 
  exit:
@@ -2010,15 +2009,14 @@ static void netback_changed(struct xenbus_device *dev,
 
 	dev_dbg(&dev->dev, "%s\n", xenbus_strstate(backend_state));
 
+	wake_up_all(&module_wq);
+
 	switch (backend_state) {
 	case XenbusStateInitialising:
 	case XenbusStateInitialised:
 	case XenbusStateReconfiguring:
 	case XenbusStateReconfigured:
-		break;
-
 	case XenbusStateUnknown:
-		wake_up_all(&module_unload_q);
 		break;
 
 	case XenbusStateInitWait:
@@ -2034,12 +2032,10 @@ static void netback_changed(struct xenbus_device *dev,
 		break;
 
 	case XenbusStateClosed:
-		wake_up_all(&module_unload_q);
 		if (dev->state == XenbusStateClosed)
 			break;
 		/* Missed the backend's CLOSING state -- fallthrough */
 	case XenbusStateClosing:
-		wake_up_all(&module_unload_q);
 		xenbus_frontend_closed(dev);
 		break;
 	}
@@ -2147,14 +2143,14 @@ static int xennet_remove(struct xenbus_device *dev)
 
 	if (xenbus_read_driver_state(dev->otherend) != XenbusStateClosed) {
 		xenbus_switch_state(dev, XenbusStateClosing);
-		wait_event(module_unload_q,
+		wait_event(module_wq,
 			   xenbus_read_driver_state(dev->otherend) ==
 			   XenbusStateClosing ||
 			   xenbus_read_driver_state(dev->otherend) ==
 			   XenbusStateUnknown);
 
 		xenbus_switch_state(dev, XenbusStateClosed);
-		wait_event(module_unload_q,
+		wait_event(module_wq,
 			   xenbus_read_driver_state(dev->otherend) ==
 			   XenbusStateClosed ||
 			   xenbus_read_driver_state(dev->otherend) ==
-- 
2.16.4

^ permalink raw reply related

* Re: [PATCH 3/4] of: Convert to using %pOFn instead of device_node.name
From: Thierry Reding @ 2018-09-07 12:29 UTC (permalink / raw)
  To: Rob Herring
  Cc: Frank Rowand, devicetree, linux-kernel, Andrew Lunn,
	Florian Fainelli, netdev
In-Reply-To: <20180828155254.10709-4-robh@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 2503 bytes --]

On Tue, Aug 28, 2018 at 10:52:53AM -0500, Rob Herring wrote:
> In preparation to remove the node name pointer from struct device_node,
> convert printf users to use the %pOFn format specifier.
> 
> Cc: Frank Rowand <frowand.list@gmail.com>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: devicetree@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Signed-off-by: Rob Herring <robh@kernel.org>
> ---
>  drivers/of/device.c   |  4 ++--
>  drivers/of/of_mdio.c  | 12 ++++++------
>  drivers/of/of_numa.c  |  4 ++--
>  drivers/of/overlay.c  |  4 ++--
>  drivers/of/platform.c |  8 ++++----
>  drivers/of/unittest.c | 12 ++++++------
>  6 files changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index 5957cd4fa262..daa075d87317 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -219,7 +219,7 @@ static ssize_t of_device_get_modalias(struct device *dev, char *str, ssize_t len
>  		return -ENODEV;
>  
>  	/* Name & Type */
> -	csize = snprintf(str, len, "of:N%sT%s", dev->of_node->name,
> +	csize = snprintf(str, len, "of:N%pOFnT%s", dev->of_node,
>  			 dev->of_node->type);
>  	tsize = csize;
>  	len -= csize;

This seems to cause the modalias to be improperly constructed. As a
consequence, automatic module loading at boot time is now broken. I
think the reason why this fails is because vsnprintf() will skip all
alpha-numeric characters after a call to pointer(). Presumably this
is meant to be a generic way of skipping whatever specifiers we throw
at it.

Unfortunately for the case of OF modaliases, this means that the 'T'
character gets eaten, so we end up with something like this:

	# udevadm info /sys/bus/platform/devices/54200000.dc
	[...]
	E: MODALIAS=of:Ndc<NULL>Cnvidia,tegra124-dc
	[...]

instead of this:

	# udevadm info /sys/bus/platform/devices/54200000.dc
	[...]
	E: MODALIAS=of:NdcT<NULL>Cnvidia,tegra124-dc
	[...]

Everything is back to normal if I revert this patch. However, since
that's obviously not what we want, I think perhaps what we need is a
way for pointer() (and its implementations) to report back how many
characters in the format string it consumed so that we can support
these kinds of back-to-back strings.

If nobody else has the time I can look into coding up a fix, but in the
meantime it might be best to back this one out until we can handle the
OF modalias format string.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH 1/7] fix hnode refcounting
From: Jamal Hadi Salim @ 2018-09-07 12:33 UTC (permalink / raw)
  To: Al Viro; +Cc: netdev, Cong Wang, Jiri Pirko, stable
In-Reply-To: <612ba054-370d-d118-b439-c68ea466eec9@mojatatu.com>

To clarify with an example i used to test
your patches:

#0 add ingress filter
$TC qdisc add dev $P ingress
#1 add filter
$TC filter add dev $P parent ffff: protocol ip prio 10 \
u32 match ip protocol 1 0xff
#2 display
$TC filter ls dev $P parent ffff:

#3 try to delete root
$TC filter delete dev $P parent ffff: protocol ip prio 10 \
handle 800: u32

#4 nothing changes..
$TC filter ls dev $P parent ffff:
#5 delete filter
$TC filter delete dev $P parent ffff: protocol ip prio 10 \
handle 800:0:800 u32
#6 filter gone but hash table still there..
$TC filter ls dev $P parent ffff:
#7 delete tp
$TC filter delete dev $P parent ffff: protocol ip prio 10 \
u32
#8 now it is gone..
$TC filter ls dev $P parent ffff:

your patches show #6 filter as still active.
We want it to look like #8

Hope this helps.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH v3 2/2] net: ethernet: i40evf: fix underlying build error
From: Wang, Dongsheng @ 2018-09-07 12:34 UTC (permalink / raw)
  To: jeffrey.t.kirsher@intel.com, sergei.shtylyov@cogentembedded.com
  Cc: jacob.e.keller@intel.com, davem@davemloft.net,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, carolyn.wyborny@intel.com
In-Reply-To: <1536319175-3660-3-git-send-email-dongsheng.wang@hxt-semitech.com>

Hello Jacob,

Since Carolyn' team is working this, I think we don't need this patch
anymore because this header file is only for ethtool.c.

The cross-including scenario simply doesn't exist.

If you agree, please ignore this patch and just only review the [1/2].


Cheers,
Dongsheng

On 9/7/2018 7:19 PM, Wang, Dongsheng wrote:
> Can't have non-inline function in a header file.
> There is a risk of "Multiple definition" from cross-including.
>
> Tested on: x86_64, make ARCH=i386
>
> Modules section .text:
> i40e: 00019380 <__i40e_add_stat_strings>:
> i40evf: 00006b00 <__i40e_add_stat_strings>:
>
> Buildin section .text:
> i40e: c351ca60 <__i40e_add_stat_strings>:
> i40evf: c354f2c0 <__i40e_add_stat_strings>:
>
> Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
> ---
> V3: add static 
> ---
>  .../intel/i40evf/i40e_ethtool_stats.h         | 23 +-----------------
>  .../ethernet/intel/i40evf/i40evf_ethtool.c    | 24 +++++++++++++++++++
>  2 files changed, 25 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
> index 60b595dd8c39..62ab67a77753 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
> +++ b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
> @@ -181,29 +181,8 @@ i40evf_add_queue_stats(u64 **data, struct i40e_ring *ring)
>  	*data += size;
>  }
>  
> -/**
> - * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> - * @p: ethtool supplied buffer
> - * @stats: stat definitions array
> - * @size: size of the stats array
> - *
> - * Format and copy the strings described by stats into the buffer pointed at
> - * by p.
> - **/
>  static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> -				    const unsigned int size, ...)
> -{
> -	unsigned int i;
> -
> -	for (i = 0; i < size; i++) {
> -		va_list args;
> -
> -		va_start(args, size);
> -		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
> -		*p += ETH_GSTRING_LEN;
> -		va_end(args);
> -	}
> -}
> +				    const unsigned int size, ...);
>  
>  /**
>   * 40e_add_stat_strings - copy stat strings into ethtool buffer
> diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
> index 9319971c5c92..eb2e910bc3a1 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
> @@ -171,6 +171,30 @@ static void i40evf_get_priv_flag_strings(struct net_device *netdev, u8 *data)
>  	}
>  }
>  
> +/**
> + * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> + * @p: ethtool supplied buffer
> + * @stats: stat definitions array
> + * @size: size of the stats array
> + *
> + * Format and copy the strings described by stats into the buffer pointed at
> + * by p.
> + **/
> +static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> +				    const unsigned int size, ...)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < size; i++) {
> +		va_list args;
> +
> +		va_start(args, size);
> +		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
> +		*p += ETH_GSTRING_LEN;
> +		va_end(args);
> +	}
> +}
> +
>  /**
>   * i40evf_get_stat_strings - Get stat strings
>   * @netdev: network interface device structure

^ permalink raw reply

* Re: [PATCH v3 1/2] net: ethernet: i40e: fix build error
From: Wang, Dongsheng @ 2018-09-07 12:40 UTC (permalink / raw)
  To: jeffrey.t.kirsher@intel.com, sergei.shtylyov@cogentembedded.com
  Cc: jacob.e.keller@intel.com, davem@davemloft.net,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, carolyn.wyborny@intel.com
In-Reply-To: <1536319175-3660-2-git-send-email-dongsheng.wang@hxt-semitech.com>

Hello all,

The i40e_ethtool_stats.h is just included by i40e/i40e_ethtool.c. So the
static doesn't make any affect. And Carolyn's team is working rebuild
i40e and i40evf.

Cheers,
Dongsheng

On 9/7/2018 7:19 PM, Wang, Dongsheng wrote:
> Remove "inline" from __i40e_add_stat_strings and move the function.
>
> In file included from
> drivers/net/ethernet/intel/i40e/i40e_ethtool.c:9:0:
> drivers/net/ethernet/intel/i40e/i40e_ethtool.c: In function
> __i40e_add_stat_string:
> drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h:193:20: error:
> function __i40e_add_stat_strings can never be inlined because it uses
> variable argument lists
>  static inline void __i40e_add_stat_strings(u8 **p, const struct
> 					    i40e_stats stats[],
>
> Tested on: x86_64, make ARCH=i386
>
> Modules section .text:
> i40e: 00019380 <__i40e_add_stat_strings>:
> i40evf: 00006b00 <__i40e_add_stat_strings>:
>
> Buildin section .text:
> i40e: c351ca60 <__i40e_add_stat_strings>:
> i40evf: c354f2c0 <__i40e_add_stat_strings>:
>
> Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
> ---
> V3: add static 
> V2: Move function
> ---
>  .../net/ethernet/intel/i40e/i40e_ethtool.c    | 24 ++++++++++++++++++
>  .../ethernet/intel/i40e/i40e_ethtool_stats.h  | 25 ++-----------------
>  2 files changed, 26 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index d7d3974beca2..34121a72f2e7 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -1821,6 +1821,30 @@ static void i40e_get_ethtool_stats(struct net_device *netdev,
>  		  "ethtool stats count mismatch!");
>  }
>  
> +/**
> + * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> + * @p: ethtool supplied buffer
> + * @stats: stat definitions array
> + * @size: size of the stats array
> + *
> + * Format and copy the strings described by stats into the buffer pointed at
> + * by p.
> + **/
> +static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> +				    const unsigned int size, ...)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < size; i++) {
> +		va_list args;
> +
> +		va_start(args, size);
> +		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
> +		*p += ETH_GSTRING_LEN;
> +		va_end(args);
> +	}
> +}
> +
>  /**
>   * i40e_get_stat_strings - copy stat strings into supplied buffer
>   * @netdev: the netdev to collect strings for
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> index bba1cb0b658f..553b0d720839 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> @@ -181,29 +181,8 @@ i40e_add_queue_stats(u64 **data, struct i40e_ring *ring)
>  	*data += size;
>  }
>  
> -/**
> - * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> - * @p: ethtool supplied buffer
> - * @stats: stat definitions array
> - * @size: size of the stats array
> - *
> - * Format and copy the strings described by stats into the buffer pointed at
> - * by p.
> - **/
> -static inline void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> -				    const unsigned int size, ...)
> -{
> -	unsigned int i;
> -
> -	for (i = 0; i < size; i++) {
> -		va_list args;
> -
> -		va_start(args, size);
> -		vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
> -		*p += ETH_GSTRING_LEN;
> -		va_end(args);
> -	}
> -}
> +static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> +				    const unsigned int size, ...);
>  
>  /**
>   * 40e_add_stat_strings - copy stat strings into ethtool buffer

^ permalink raw reply

* [PATCH v2 1/4] i40e: clean zero-copy XDP Tx ring on shutdown/reset
From: Björn Töpel @ 2018-09-07  8:18 UTC (permalink / raw)
  To: ast, daniel, jeffrey.t.kirsher, intel-wired-lan, jakub.kicinski
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev
In-Reply-To: <20180907081848.5438-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

When the zero-copy enabled XDP Tx ring is torn down, due to
configuration changes, outstandning frames on the hardware descriptor
ring are queued on the completion ring.

The completion ring has a back-pressure mechanism that will guarantee
that there is sufficient space on the ring.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 17 +++++++----
 .../ethernet/intel/i40e/i40e_txrx_common.h    |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 30 +++++++++++++++++++
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 37bd4e50ccde..7f85d4ba8b54 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -636,13 +636,18 @@ void i40e_clean_tx_ring(struct i40e_ring *tx_ring)
 	unsigned long bi_size;
 	u16 i;
 
-	/* ring already cleared, nothing to do */
-	if (!tx_ring->tx_bi)
-		return;
+	if (ring_is_xdp(tx_ring) && tx_ring->xsk_umem) {
+		i40e_xsk_clean_tx_ring(tx_ring);
+	} else {
+		/* ring already cleared, nothing to do */
+		if (!tx_ring->tx_bi)
+			return;
 
-	/* Free all the Tx ring sk_buffs */
-	for (i = 0; i < tx_ring->count; i++)
-		i40e_unmap_and_free_tx_resource(tx_ring, &tx_ring->tx_bi[i]);
+		/* Free all the Tx ring sk_buffs */
+		for (i = 0; i < tx_ring->count; i++)
+			i40e_unmap_and_free_tx_resource(tx_ring,
+							&tx_ring->tx_bi[i]);
+	}
 
 	bi_size = sizeof(struct i40e_tx_buffer) * tx_ring->count;
 	memset(tx_ring->tx_bi, 0, bi_size);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index b5afd479a9c5..29c68b29d36f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -87,4 +87,6 @@ static inline void i40e_arm_wb(struct i40e_ring *tx_ring,
 	}
 }
 
+void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
+
 #endif /* I40E_TXRX_COMMON_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 2ebfc78bbd09..99116277c4d2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -830,3 +830,33 @@ int i40e_xsk_async_xmit(struct net_device *dev, u32 queue_id)
 
 	return 0;
 }
+
+/**
+ * i40e_xsk_clean_xdp_ring - Clean the XDP Tx ring on shutdown
+ * @xdp_ring: XDP Tx ring
+ **/
+void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
+{
+	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
+	struct xdp_umem *umem = tx_ring->xsk_umem;
+	struct i40e_tx_buffer *tx_bi;
+	u32 xsk_frames = 0;
+
+	while (ntc != ntu) {
+		tx_bi = &tx_ring->tx_bi[ntc];
+
+		if (tx_bi->xdpf)
+			i40e_clean_xdp_tx_buffer(tx_ring, tx_bi);
+		else
+			xsk_frames++;
+
+		tx_bi->xdpf = NULL;
+
+		ntc++;
+		if (ntc > tx_ring->count)
+			ntc = 0;
+	}
+
+	if (xsk_frames)
+		xsk_umem_complete_tx(umem, xsk_frames);
+}
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 0/4] i40e AF_XDP zero-copy buffer leak fixes
From: Björn Töpel @ 2018-09-07  8:18 UTC (permalink / raw)
  To: ast, daniel, jeffrey.t.kirsher, intel-wired-lan, jakub.kicinski
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev

From: Björn Töpel <bjorn.topel@intel.com>

NB! The v1 was sent via the bpf-next tree. This time the series is
routed via JeffK's Intel Wired tree to minimize the risk for i40e
merge conflicts.

This series addresses an AF_XDP zero-copy issue that buffers passed
from userspace to the kernel was leaked when the hardware descriptor
ring was torn down.

The patches fixes the i40e AF_XDP zero-copy implementation.

Thanks to Jakub Kicinski for pointing this out!

Some background for folks that don't know the details: A zero-copy
capable driver picks buffers off the fill ring and places them on the
hardware Rx ring to be completed at a later point when DMA is
complete. Similar on the Tx side; The driver picks buffers off the Tx
ring and places them on the Tx hardware ring.

In the typical flow, the Rx buffer will be placed onto an Rx ring
(completed to the user), and the Tx buffer will be placed on the
completion ring to notify the user that the transfer is done.

However, if the driver needs to tear down the hardware rings for some
reason (interface goes down, reconfiguration and such), the userspace
buffers cannot be leaked. They have to be reused or completed back to
userspace.

The implementation does the following:

* Outstanding Tx descriptors will be passed to the completion
  ring. The Tx code has back-pressure mechanism in place, so that
  enough empty space in the completion ring is guaranteed.

* Outstanding Rx descriptors are temporarily stored on a stash/reuse
  queue. The reuse queue is based on Jakub's RFC. When/if the HW rings
  comes up again, entries from the stash are used to re-populate the
  ring.

* When AF_XDP ZC is enabled, disallow changing the number of hardware
  descriptors via ethtool. Otherwise, the size of the stash/reuse
  queue can grow unbounded.

Going forward, introducing a "zero-copy allocator" analogous to Jesper
Brouer's page pool would be a more robust and reuseable solution.

v1->v2: Address kbuild "undefined symbols" error when building with
        !CONFIG_XDP_SOCKETS.

Thanks!
Björn


Björn Töpel (3):
  i40e: clean zero-copy XDP Tx ring on shutdown/reset
  i40e: clean zero-copy XDP Rx ring on shutdown/reset
  i40e: disallow changing the number of descriptors when AF_XDP is on

Jakub Kicinski (1):
  net: xsk: add a simple buffer reuse queue

 .../net/ethernet/intel/i40e/i40e_ethtool.c    |   9 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  21 ++-
 .../ethernet/intel/i40e/i40e_txrx_common.h    |   4 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 152 +++++++++++++++++-
 include/net/xdp_sock.h                        |  69 ++++++++
 net/xdp/xdp_umem.c                            |   2 +
 net/xdp/xsk_queue.c                           |  55 +++++++
 net/xdp/xsk_queue.h                           |   3 +
 8 files changed, 299 insertions(+), 16 deletions(-)

-- 
2.17.1

^ permalink raw reply

* [PATCH v2 2/4] net: xsk: add a simple buffer reuse queue
From: Björn Töpel @ 2018-09-07  8:18 UTC (permalink / raw)
  To: ast, daniel, jeffrey.t.kirsher, intel-wired-lan, jakub.kicinski
  Cc: magnus.karlsson, magnus.karlsson, netdev
In-Reply-To: <20180907081848.5438-1-bjorn.topel@gmail.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>

XSK UMEM is strongly single producer single consumer so reuse of
frames is challenging.  Add a simple "stash" of FILL packets to
reuse for drivers to optionally make use of.  This is useful
when driver has to free (ndo_stop) or resize a ring with an active
AF_XDP ZC socket.

v2: Fixed build issues for !CONFIG_XDP_SOCKETS.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/xdp_sock.h | 69 ++++++++++++++++++++++++++++++++++++++++++
 net/xdp/xdp_umem.c     |  2 ++
 net/xdp/xsk_queue.c    | 55 +++++++++++++++++++++++++++++++++
 net/xdp/xsk_queue.h    |  3 ++
 4 files changed, 129 insertions(+)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 932ca0dad6f3..70a115bea4f4 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -21,6 +21,12 @@ struct xdp_umem_page {
 	dma_addr_t dma;
 };
 
+struct xdp_umem_fq_reuse {
+	u32 nentries;
+	u32 length;
+	u64 handles[];
+};
+
 struct xdp_umem {
 	struct xsk_queue *fq;
 	struct xsk_queue *cq;
@@ -37,6 +43,7 @@ struct xdp_umem {
 	struct page **pgs;
 	u32 npgs;
 	struct net_device *dev;
+	struct xdp_umem_fq_reuse *fq_reuse;
 	u16 queue_id;
 	bool zc;
 	spinlock_t xsk_list_lock;
@@ -75,6 +82,10 @@ void xsk_umem_discard_addr(struct xdp_umem *umem);
 void xsk_umem_complete_tx(struct xdp_umem *umem, u32 nb_entries);
 bool xsk_umem_consume_tx(struct xdp_umem *umem, dma_addr_t *dma, u32 *len);
 void xsk_umem_consume_tx_done(struct xdp_umem *umem);
+struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries);
+struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
+					  struct xdp_umem_fq_reuse *newq);
+void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq);
 
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
@@ -85,6 +96,35 @@ static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
 {
 	return umem->pages[addr >> PAGE_SHIFT].dma + (addr & (PAGE_SIZE - 1));
 }
+
+/* Reuse-queue aware version of FILL queue helpers */
+static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
+{
+	struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
+
+	if (!rq->length)
+		return xsk_umem_peek_addr(umem, addr);
+
+	*addr = rq->handles[rq->length - 1];
+	return addr;
+}
+
+static inline void xsk_umem_discard_addr_rq(struct xdp_umem *umem)
+{
+	struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
+
+	if (!rq->length)
+		xsk_umem_discard_addr(umem);
+	else
+		rq->length--;
+}
+
+static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
+{
+	struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
+
+	rq->handles[rq->length++] = addr;
+}
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -128,6 +168,21 @@ static inline void xsk_umem_consume_tx_done(struct xdp_umem *umem)
 {
 }
 
+static inline struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries)
+{
+	return NULL;
+}
+
+static inline struct xdp_umem_fq_reuse *xsk_reuseq_swap(
+	struct xdp_umem *umem,
+	struct xdp_umem_fq_reuse *newq)
+{
+	return NULL;
+}
+static inline void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq)
+{
+}
+
 static inline char *xdp_umem_get_data(struct xdp_umem *umem, u64 addr)
 {
 	return NULL;
@@ -137,6 +192,20 @@ static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem *umem, u64 addr)
 {
 	return 0;
 }
+
+static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
+{
+	return NULL;
+}
+
+static inline void xsk_umem_discard_addr_rq(struct xdp_umem *umem)
+{
+}
+
+static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
+{
+}
+
 #endif /* CONFIG_XDP_SOCKETS */
 
 #endif /* _LINUX_XDP_SOCK_H */
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index b3b632c5aeae..555427b3e0fe 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -165,6 +165,8 @@ static void xdp_umem_release(struct xdp_umem *umem)
 		umem->cq = NULL;
 	}
 
+	xsk_reuseq_destroy(umem);
+
 	xdp_umem_unpin_pages(umem);
 
 	task = get_pid_task(umem->pid, PIDTYPE_PID);
diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
index 2dc1384d9f27..b66504592d9b 100644
--- a/net/xdp/xsk_queue.c
+++ b/net/xdp/xsk_queue.c
@@ -3,7 +3,9 @@
  * Copyright(c) 2018 Intel Corporation.
  */
 
+#include <linux/log2.h>
 #include <linux/slab.h>
+#include <linux/overflow.h>
 
 #include "xsk_queue.h"
 
@@ -62,3 +64,56 @@ void xskq_destroy(struct xsk_queue *q)
 	page_frag_free(q->ring);
 	kfree(q);
 }
+
+struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries)
+{
+	struct xdp_umem_fq_reuse *newq;
+
+	/* Check for overflow */
+	if (nentries > (u32)roundup_pow_of_two(nentries))
+		return NULL;
+	nentries = roundup_pow_of_two(nentries);
+
+	newq = kvmalloc(struct_size(newq, handles, nentries), GFP_KERNEL);
+	if (!newq)
+		return NULL;
+	memset(newq, 0, offsetof(typeof(*newq), handles));
+
+	newq->nentries = nentries;
+	return newq;
+}
+EXPORT_SYMBOL_GPL(xsk_reuseq_prepare);
+
+struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
+					  struct xdp_umem_fq_reuse *newq)
+{
+	struct xdp_umem_fq_reuse *oldq = umem->fq_reuse;
+
+	if (!oldq) {
+		umem->fq_reuse = newq;
+		return NULL;
+	}
+
+	if (newq->nentries < oldq->length)
+		return newq;
+
+	memcpy(newq->handles, oldq->handles,
+	       array_size(oldq->length, sizeof(u64)));
+	newq->length = oldq->length;
+
+	umem->fq_reuse = newq;
+	return oldq;
+}
+EXPORT_SYMBOL_GPL(xsk_reuseq_swap);
+
+void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq)
+{
+	kvfree(rq);
+}
+EXPORT_SYMBOL_GPL(xsk_reuseq_free);
+
+void xsk_reuseq_destroy(struct xdp_umem *umem)
+{
+	xsk_reuseq_free(umem->fq_reuse);
+	umem->fq_reuse = NULL;
+}
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 82252cccb4e0..bcb5cbb40419 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -258,4 +258,7 @@ void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask);
 struct xsk_queue *xskq_create(u32 nentries, bool umem_queue);
 void xskq_destroy(struct xsk_queue *q_ops);
 
+/* Executed by the core when the entire UMEM gets freed */
+void xsk_reuseq_destroy(struct xdp_umem *umem);
+
 #endif /* _LINUX_XSK_QUEUE_H */
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 3/4] i40e: clean zero-copy XDP Rx ring on shutdown/reset
From: Björn Töpel @ 2018-09-07  8:18 UTC (permalink / raw)
  To: ast, daniel, jeffrey.t.kirsher, intel-wired-lan, jakub.kicinski
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev
In-Reply-To: <20180907081848.5438-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

Outstanding Rx descriptors are temporarily stored on a stash/reuse
queue. When/if the HW rings comes up again, entries from the stash are
used to re-populate the ring.

The latter required some restructuring of the allocation scheme for
the AF_XDP zero-copy implementation. There is now a fast, and a slow
allocation. The "fast allocation" is used from the fast-path and
obtains free buffers from the fill ring and the internal recycle
mechanism. The "slow allocation" is only used in ring setup, and
obtains buffers from the fill ring and the stash (if any).

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   4 +-
 .../ethernet/intel/i40e/i40e_txrx_common.h    |   1 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 100 ++++++++++++++++--
 3 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 7f85d4ba8b54..740ea58ba938 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1355,8 +1355,10 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 		rx_ring->skb = NULL;
 	}
 
-	if (rx_ring->xsk_umem)
+	if (rx_ring->xsk_umem) {
+		i40e_xsk_clean_rx_ring(rx_ring);
 		goto skip_free;
+	}
 
 	/* Free all the Rx ring sk_buffs */
 	for (i = 0; i < rx_ring->count; i++) {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index 29c68b29d36f..8d46acff6f2e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -87,6 +87,7 @@ static inline void i40e_arm_wb(struct i40e_ring *tx_ring,
 	}
 }
 
+void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring);
 void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
 
 #endif /* I40E_TXRX_COMMON_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 99116277c4d2..e4b62e871afc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -140,6 +140,7 @@ static void i40e_xsk_umem_dma_unmap(struct i40e_vsi *vsi, struct xdp_umem *umem)
 static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
 				u16 qid)
 {
+	struct xdp_umem_fq_reuse *reuseq;
 	bool if_running;
 	int err;
 
@@ -156,6 +157,12 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
 			return -EBUSY;
 	}
 
+	reuseq = xsk_reuseq_prepare(vsi->rx_rings[0]->count);
+	if (!reuseq)
+		return -ENOMEM;
+
+	xsk_reuseq_free(xsk_reuseq_swap(umem, reuseq));
+
 	err = i40e_xsk_umem_dma_map(vsi, umem);
 	if (err)
 		return err;
@@ -353,16 +360,46 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring *rx_ring,
 }
 
 /**
- * i40e_alloc_rx_buffers_zc - Allocates a number of Rx buffers
+ * i40e_alloc_buffer_slow_zc - Allocates an i40e_rx_buffer
  * @rx_ring: Rx ring
- * @count: The number of buffers to allocate
+ * @bi: Rx buffer to populate
  *
- * This function allocates a number of Rx buffers and places them on
- * the Rx ring.
+ * This function allocates an Rx buffer. The buffer can come from fill
+ * queue, or via the reuse queue.
  *
  * Returns true for a successful allocation, false otherwise
  **/
-bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
+static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
+				      struct i40e_rx_buffer *bi)
+{
+	struct xdp_umem *umem = rx_ring->xsk_umem;
+	u64 handle, hr;
+
+	if (!xsk_umem_peek_addr_rq(umem, &handle)) {
+		rx_ring->rx_stats.alloc_page_failed++;
+		return false;
+	}
+
+	handle &= rx_ring->xsk_umem->chunk_mask;
+
+	hr = umem->headroom + XDP_PACKET_HEADROOM;
+
+	bi->dma = xdp_umem_get_dma(umem, handle);
+	bi->dma += hr;
+
+	bi->addr = xdp_umem_get_data(umem, handle);
+	bi->addr += hr;
+
+	bi->handle = handle + umem->headroom;
+
+	xsk_umem_discard_addr_rq(umem);
+	return true;
+}
+
+static __always_inline bool __i40e_alloc_rx_buffers_zc(
+	struct i40e_ring *rx_ring, u16 count,
+	bool alloc(struct i40e_ring *rx_ring,
+		   struct i40e_rx_buffer *bi))
 {
 	u16 ntu = rx_ring->next_to_use;
 	union i40e_rx_desc *rx_desc;
@@ -372,7 +409,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
 	rx_desc = I40E_RX_DESC(rx_ring, ntu);
 	bi = &rx_ring->rx_bi[ntu];
 	do {
-		if (!i40e_alloc_buffer_zc(rx_ring, bi)) {
+		if (!alloc(rx_ring, bi)) {
 			ok = false;
 			goto no_buffers;
 		}
@@ -404,6 +441,38 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
 	return ok;
 }
 
+/**
+ * i40e_alloc_rx_buffers_zc - Allocates a number of Rx buffers
+ * @rx_ring: Rx ring
+ * @count: The number of buffers to allocate
+ *
+ * This function allocates a number of Rx buffers from the reuse queue
+ * or fill ring and places them on the Rx ring.
+ *
+ * Returns true for a successful allocation, false otherwise
+ **/
+bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
+{
+	return __i40e_alloc_rx_buffers_zc(rx_ring, count,
+					  i40e_alloc_buffer_slow_zc);
+}
+
+/**
+ * i40e_alloc_rx_buffers_fast_zc - Allocates a number of Rx buffers
+ * @rx_ring: Rx ring
+ * @count: The number of buffers to allocate
+ *
+ * This function allocates a number of Rx buffers from the fill ring
+ * or the internal recycle mechanism and places them on the Rx ring.
+ *
+ * Returns true for a successful allocation, false otherwise
+ **/
+static bool i40e_alloc_rx_buffers_fast_zc(struct i40e_ring *rx_ring, u16 count)
+{
+	return __i40e_alloc_rx_buffers_zc(rx_ring, count,
+					  i40e_alloc_buffer_zc);
+}
+
 /**
  * i40e_get_rx_buffer_zc - Return the current Rx buffer
  * @rx_ring: Rx ring
@@ -571,8 +640,8 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 
 		if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
 			failure = failure ||
-				  !i40e_alloc_rx_buffers_zc(rx_ring,
-							    cleaned_count);
+				  !i40e_alloc_rx_buffers_fast_zc(rx_ring,
+								 cleaned_count);
 			cleaned_count = 0;
 		}
 
@@ -831,6 +900,21 @@ int i40e_xsk_async_xmit(struct net_device *dev, u32 queue_id)
 	return 0;
 }
 
+void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring)
+{
+	u16 i;
+
+	for (i = 0; i < rx_ring->count; i++) {
+		struct i40e_rx_buffer *rx_bi = &rx_ring->rx_bi[i];
+
+		if (!rx_bi->addr)
+			continue;
+
+		xsk_umem_fq_reuse(rx_ring->xsk_umem, rx_bi->handle);
+		rx_bi->addr = NULL;
+	}
+}
+
 /**
  * i40e_xsk_clean_xdp_ring - Clean the XDP Tx ring on shutdown
  * @xdp_ring: XDP Tx ring
-- 
2.17.1

^ permalink raw reply related

* [PATCH v2 4/4] i40e: disallow changing the number of descriptors when AF_XDP is on
From: Björn Töpel @ 2018-09-07  8:18 UTC (permalink / raw)
  To: ast, daniel, jeffrey.t.kirsher, intel-wired-lan, jakub.kicinski
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, netdev
In-Reply-To: <20180907081848.5438-1-bjorn.topel@gmail.com>

From: Björn Töpel <bjorn.topel@intel.com>

When an AF_XDP UMEM is attached to any of the Rx rings, we disallow a
user to change the number of descriptors via e.g. "ethtool -G IFNAME".

Otherwise, the size of the stash/reuse queue can grow unbounded, which
would result in OOM or leaking userspace buffers.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 .../net/ethernet/intel/i40e/i40e_ethtool.c    |  9 +++++++-
 .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 22 +++++++++++++++++++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index d7d3974beca2..3cd2c88c72f8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -5,7 +5,7 @@
 
 #include "i40e.h"
 #include "i40e_diag.h"
-
+#include "i40e_txrx_common.h"
 #include "i40e_ethtool_stats.h"
 
 #define I40E_PF_STAT(_name, _stat) \
@@ -1493,6 +1493,13 @@ static int i40e_set_ringparam(struct net_device *netdev,
 	    (new_rx_count == vsi->rx_rings[0]->count))
 		return 0;
 
+	/* If there is a AF_XDP UMEM attached to any of Rx rings,
+	 * disallow changing the number of descriptors -- regardless
+	 * if the netdev is running or not.
+	 */
+	if (i40e_xsk_any_rx_ring_enabled(vsi))
+		return -EBUSY;
+
 	while (test_and_set_bit(__I40E_CONFIG_BUSY, pf->state)) {
 		timeout--;
 		if (!timeout)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index 8d46acff6f2e..09809dffe399 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -89,5 +89,6 @@ static inline void i40e_arm_wb(struct i40e_ring *tx_ring,
 
 void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring);
 void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
+bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi);
 
 #endif /* I40E_TXRX_COMMON_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index e4b62e871afc..119f59ec7cc0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -944,3 +944,25 @@ void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
 	if (xsk_frames)
 		xsk_umem_complete_tx(umem, xsk_frames);
 }
+
+/**
+ * i40e_xsk_any_rx_ring_enabled - Checks whether any of the Rx rings
+ * has AF_XDP UMEM attached
+ * @vsi: vsi
+ *
+ * Returns true if any of the Rx rings has an AF_XDP UMEM attached
+ **/
+bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi)
+{
+	int i;
+
+	if (!vsi->xsk_umems)
+		return false;
+
+	for (i = 0; i < vsi->num_queue_pairs; i++) {
+		if (vsi->xsk_umems[i])
+			return true;
+	}
+
+	return false;
+}
-- 
2.17.1

^ permalink raw reply related

* Re: Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-09-07 13:00 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, virtualization, linux-kernel, netdev, virtio-dev, wexu,
	jfreimann
In-Reply-To: <20180907012225.GA32677@debian>

On Fri, Sep 07, 2018 at 09:22:25AM +0800, Tiwei Bie wrote:
> On Mon, Aug 27, 2018 at 05:00:40PM +0300, Michael S. Tsirkin wrote:
> > Are there still plans to test the performance with vost pmd?
> > vhost doesn't seem to show a performance gain ...
> > 
> 
> I tried some performance tests with vhost PMD. In guest, the
> XDP program will return XDP_DROP directly. And in host, testpmd
> will do txonly fwd.
> 
> When burst size is 1 and packet size is 64 in testpmd and
> testpmd needs to iterate 5 Tx queues (but only the first two
> queues are enabled) to prepare and inject packets, I got ~12%
> performance boost (5.7Mpps -> 6.4Mpps). And if the vhost PMD
> is faster (e.g. just need to iterate the first two queues to
> prepare and inject packets), then I got similar performance
> for both rings (~9.9Mpps) (packed ring's performance can be
> lower, because it's more complicated in driver.)
> 
> I think packed ring makes vhost PMD faster, but it doesn't make
> the driver faster. In packed ring, the ring is simplified, and
> the handling of the ring in vhost (device) is also simplified,
> but things are not simplified in driver, e.g. although there is
> no desc table in the virtqueue anymore, driver still needs to
> maintain a private desc state table (which is still managed as
> a list in this patch set) to support the out-of-order desc
> processing in vhost (device).
> 
> I think this patch set is mainly to make the driver have a full
> functional support for the packed ring, which makes it possible
> to leverage the packed ring feature in vhost (device). But I'm
> not sure whether there is any other better idea, I'd like to
> hear your thoughts. Thanks!

Just this: Jens seems to report a nice gain with virtio and
vhost pmd across the board. Try to compare virtio and
virtio pmd to see what does pmd do better?


> 
> > 
> > On Wed, Jul 11, 2018 at 10:27:06AM +0800, Tiwei Bie wrote:
> > > Hello everyone,
> > > 
> > > This patch set implements packed ring support in virtio driver.
> > > 
> > > Some functional tests have been done with Jason's
> > > packed ring implementation in vhost:
> > > 
> > > https://lkml.org/lkml/2018/7/3/33
> > > 
> > > Both of ping and netperf worked as expected.
> > > 
> > > v1 -> v2:
> > > - Use READ_ONCE() to read event off_wrap and flags together (Jason);
> > > - Add comments related to ccw (Jason);
> > > 
> > > RFC (v6) -> v1:
> > > - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
> > >   when event idx is off (Jason);
> > > - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
> > > - Test the state of the desc at used_idx instead of last_used_idx
> > >   in virtqueue_enable_cb_delayed_packed() (Jason);
> > > - Save wrap counter (as part of queue state) in the return value
> > >   of virtqueue_enable_cb_prepare_packed();
> > > - Refine the packed ring definitions in uapi;
> > > - Rebase on the net-next tree;
> > > 
> > > RFC v5 -> RFC v6:
> > > - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
> > > - Define wrap counter as bool (Jason);
> > > - Use ALIGN() in vring_init_packed() (Jason);
> > > - Avoid using pointer to track `next` in detach_buf_packed() (Jason);
> > > - Add comments for barriers (Jason);
> > > - Don't enable RING_PACKED on ccw for now (noticed by Jason);
> > > - Refine the memory barrier in virtqueue_poll();
> > > - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
> > > - Remove the hacks in virtqueue_enable_cb_prepare_packed();
> > > 
> > > RFC v4 -> RFC v5:
> > > - Save DMA addr, etc in desc state (Jason);
> > > - Track used wrap counter;
> > > 
> > > RFC v3 -> RFC v4:
> > > - Make ID allocation support out-of-order (Jason);
> > > - Various fixes for EVENT_IDX support;
> > > 
> > > RFC v2 -> RFC v3:
> > > - Split into small patches (Jason);
> > > - Add helper virtqueue_use_indirect() (Jason);
> > > - Just set id for the last descriptor of a list (Jason);
> > > - Calculate the prev in virtqueue_add_packed() (Jason);
> > > - Fix/improve desc suppression code (Jason/MST);
> > > - Refine the code layout for XXX_split/packed and wrappers (MST);
> > > - Fix the comments and API in uapi (MST);
> > > - Remove the BUG_ON() for indirect (Jason);
> > > - Some other refinements and bug fixes;
> > > 
> > > RFC v1 -> RFC v2:
> > > - Add indirect descriptor support - compile test only;
> > > - Add event suppression supprt - compile test only;
> > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > - Avoid using '%' operator (Jason);
> > > - Rename free_head -> next_avail_idx (Jason);
> > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > - Some other refinements and bug fixes;
> > > 
> > > Thanks!
> > > 
> > > Tiwei Bie (5):
> > >   virtio: add packed ring definitions
> > >   virtio_ring: support creating packed ring
> > >   virtio_ring: add packed ring support
> > >   virtio_ring: add event idx support in packed ring
> > >   virtio_ring: enable packed ring
> > > 
> > >  drivers/s390/virtio/virtio_ccw.c   |   14 +
> > >  drivers/virtio/virtio_ring.c       | 1365 ++++++++++++++++++++++------
> > >  include/linux/virtio_ring.h        |    8 +-
> > >  include/uapi/linux/virtio_config.h |    3 +
> > >  include/uapi/linux/virtio_ring.h   |   43 +
> > >  5 files changed, 1157 insertions(+), 276 deletions(-)
> > > 
> > > -- 
> > > 2.18.0
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > 

^ permalink raw reply

* Re: [PATCH] ethernet: renesas: convert to SPDX identifiers
From: Sergei Shtylyov @ 2018-09-07  9:05 UTC (permalink / raw)
  To: Kuninori Morimoto, David S. Miller, Mark Brown
  Cc: Geert Uytterhoeven, Robin Murphy, netdev, linux-renesas-soc
In-Reply-To: <875zziqdem.wl-kuninori.morimoto.gx@renesas.com>

Hello!

On 9/7/2018 5:02 AM, Kuninori Morimoto wrote:

> From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
> 
> This patch updates license to use SPDX-License-Identifier
> instead of verbose license text.
> 
> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
> ---
>   drivers/net/ethernet/renesas/Kconfig    | 1 +
>   drivers/net/ethernet/renesas/Makefile   | 1 +
>   drivers/net/ethernet/renesas/ravb_ptp.c | 6 +-----
>   3 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/renesas/Kconfig b/drivers/net/ethernet/renesas/Kconfig
> index f3f7477..bb0ebdf 100644
> --- a/drivers/net/ethernet/renesas/Kconfig
> +++ b/drivers/net/ethernet/renesas/Kconfig
> @@ -1,3 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
>   #
>   # Renesas device configuration
>   #

    There was no license at all on this file...

> diff --git a/drivers/net/ethernet/renesas/Makefile b/drivers/net/ethernet/renesas/Makefile
> index a05102a..f21ab8c 100644
> --- a/drivers/net/ethernet/renesas/Makefile
> +++ b/drivers/net/ethernet/renesas/Makefile
> @@ -1,3 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
>   #
>   # Makefile for the Renesas device drivers.
>   #

    Likewise.

> diff --git a/drivers/net/ethernet/renesas/ravb_ptp.c b/drivers/net/ethernet/renesas/ravb_ptp.c
> index eede70e..0721b5c 100644
> --- a/drivers/net/ethernet/renesas/ravb_ptp.c
> +++ b/drivers/net/ethernet/renesas/ravb_ptp.c
> @@ -1,13 +1,9 @@
> +// SPDX-License-Identifier: GPL-2.0+
>   /* PTP 1588 clock using the Renesas Ethernet AVB
>    *
>    * Copyright (C) 2013-2015 Renesas Electronics Corporation
>    * Copyright (C) 2015 Renesas Solutions Corp.
>    * Copyright (C) 2015-2016 Cogent Embedded, Inc. <source@cogentembedded.com>
> - *
> - *  This program is free software; you can redistribute it and/or modify
> - *  it under the terms of the GNU General Public License as published by
> - *  the Free Software Foundation; either version 2 of the License, or
> - *  (at your option) any later version.
>    */
>   
>   #include "ravb.h"

    This part looks valid, don't know why this file was missed by Wolfram who 
did the previous patch...

MBR, Sergei

^ permalink raw reply

* Re: [PATCH] xen/netfront: fix waiting for xenbus state change
From: Boris Ostrovsky @ 2018-09-07 13:46 UTC (permalink / raw)
  To: Juergen Gross, linux-kernel, xen-devel, netdev; +Cc: davem, stable
In-Reply-To: <20180907122130.30810-1-jgross@suse.com>

On 09/07/2018 08:21 AM, Juergen Gross wrote:
> Commit 822fb18a82aba ("xen-netfront: wait xenbus state change when load
> module manually") added a new wait queue to wait on for a state change
> when the module is loaded manually. Unfortunately there is no wakeup
> anywhere to stop that waiting.
>
> Instead of introducing a new wait queue rename the existing
> module_unload_q to module_wq and use it for both purposes (loading and
> unloading).
>
> As any state change of the backend might be intended to stop waiting
> do the wake_up_all() in any case when netback_changed() is called.
>
> Fixes: 822fb18a82aba ("xen-netfront: wait xenbus state change when load module manually")
> Cc: <stable@vger.kernel.org> #4.18
> Signed-off-by: Juergen Gross <jgross@suse.com>


Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

^ permalink raw reply

* Re: [PATCH net-next v2 3/5] virtio_ring: add packed ring support
From: Michael S. Tsirkin @ 2018-09-07 13:49 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, virtualization, linux-kernel, netdev, virtio-dev, wexu,
	jfreimann
In-Reply-To: <20180711022711.7090-4-tiwei.bie@intel.com>

On Wed, Jul 11, 2018 at 10:27:09AM +0800, Tiwei Bie wrote:
> This commit introduces the support (without EVENT_IDX) for
> packed ring.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>  drivers/virtio/virtio_ring.c | 495 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 487 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c4f8abc7445a..f317b485ba54 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -55,12 +55,21 @@
>  #define END_USE(vq)
>  #endif
>  
> +#define _VRING_DESC_F_AVAIL(b)	((__u16)(b) << 7)
> +#define _VRING_DESC_F_USED(b)	((__u16)(b) << 15)

Underscore followed by an upper case letter isn't a good idea
for a symbol. And it's not nice that this does not
use the VRING_DESC_F_USED from the header.
How about ((b) ? VRING_DESC_F_USED : 0) instead?
Is produced code worse then?

> +
>  struct vring_desc_state {
>  	void *data;			/* Data for callback. */
>  	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
>  };
>  
>  struct vring_desc_state_packed {
> +	void *data;			/* Data for callback. */
> +	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */

Include vring_desc_state for these?

> +	int num;			/* Descriptor list length. */
> +	dma_addr_t addr;		/* Buffer DMA addr. */
> +	u32 len;			/* Buffer length. */
> +	u16 flags;			/* Descriptor flags. */

This seems only to be used for indirect. Check indirect field above
instead and drop this.

>  	int next;			/* The next desc state. */

Packing things into 16 bit integers will help reduce
cache pressure here.

Also, this is only used for unmap, so when dma API is not used,
maintaining addr and len list management is unnecessary. How about we
maintain addr/len in a separate array, avoiding accesses on unmap in that case?

Also, lots of architectures have a nop unmap in the DMA API.
See a proposed patch at the end for optimizing that case.

>  };
>  
> @@ -660,7 +669,6 @@ static bool virtqueue_poll_split(struct virtqueue *_vq, unsigned last_used_idx)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
> -	virtio_mb(vq->weak_barriers);
>  	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
>  }

why is this changing the split queue implementation?


>  
> @@ -757,6 +765,72 @@ static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
>  		& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
>  }
>  
> +static void vring_unmap_state_packed(const struct vring_virtqueue *vq,
> +				     struct vring_desc_state_packed *state)
> +{
> +	u16 flags;
> +
> +	if (!vring_use_dma_api(vq->vq.vdev))
> +		return;
> +
> +	flags = state->flags;
> +
> +	if (flags & VRING_DESC_F_INDIRECT) {
> +		dma_unmap_single(vring_dma_dev(vq),
> +				 state->addr, state->len,
> +				 (flags & VRING_DESC_F_WRITE) ?
> +				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +	} else {
> +		dma_unmap_page(vring_dma_dev(vq),
> +			       state->addr, state->len,
> +			       (flags & VRING_DESC_F_WRITE) ?
> +			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +	}
> +}
> +
> +static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> +				   struct vring_packed_desc *desc)
> +{
> +	u16 flags;
> +
> +	if (!vring_use_dma_api(vq->vq.vdev))
> +		return;
> +
> +	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);

I see no reason to use virtioXX wrappers for the packed ring.
That's there to support legacy devices. Just use leXX.

> +
> +	if (flags & VRING_DESC_F_INDIRECT) {
> +		dma_unmap_single(vring_dma_dev(vq),
> +				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
> +				 virtio32_to_cpu(vq->vq.vdev, desc->len),
> +				 (flags & VRING_DESC_F_WRITE) ?
> +				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +	} else {
> +		dma_unmap_page(vring_dma_dev(vq),
> +			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
> +			       virtio32_to_cpu(vq->vq.vdev, desc->len),
> +			       (flags & VRING_DESC_F_WRITE) ?
> +			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +	}
> +}
> +
> +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> +						       unsigned int total_sg,
> +						       gfp_t gfp)
> +{
> +	struct vring_packed_desc *desc;
> +
> +	/*
> +	 * We require lowmem mappings for the descriptors because
> +	 * otherwise virt_to_phys will give us bogus addresses in the
> +	 * virtqueue.

Where is virt_to_phys used? I don't see it in this patch.

> +	 */
> +	gfp &= ~__GFP_HIGHMEM;
> +
> +	desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
> +
> +	return desc;
> +}
> +
>  static inline int virtqueue_add_packed(struct virtqueue *_vq,
>  				       struct scatterlist *sgs[],
>  				       unsigned int total_sg,
> @@ -766,47 +840,449 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>  				       void *ctx,
>  				       gfp_t gfp)
>  {
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	struct vring_packed_desc *desc;
> +	struct scatterlist *sg;
> +	unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> +	__virtio16 uninitialized_var(head_flags), flags;
> +	u16 head, avail_wrap_counter, id, curr;
> +	bool indirect;
> +
> +	START_USE(vq);
> +
> +	BUG_ON(data == NULL);
> +	BUG_ON(ctx && vq->indirect);
> +
> +	if (unlikely(vq->broken)) {
> +		END_USE(vq);
> +		return -EIO;
> +	}
> +
> +#ifdef DEBUG
> +	{
> +		ktime_t now = ktime_get();
> +
> +		/* No kick or get, with .1 second between?  Warn. */
> +		if (vq->last_add_time_valid)
> +			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> +					    > 100);
> +		vq->last_add_time = now;
> +		vq->last_add_time_valid = true;
> +	}
> +#endif

Add incline helpers for this debug stuff rather than
duplicate it from split ring?


> +
> +	BUG_ON(total_sg == 0);
> +
> +	head = vq->next_avail_idx;
> +	avail_wrap_counter = vq->avail_wrap_counter;
> +
> +	if (virtqueue_use_indirect(_vq, total_sg))
> +		desc = alloc_indirect_packed(_vq, total_sg, gfp);
> +	else {
> +		desc = NULL;
> +		WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
> +	}


Apparently this attempts to treat indirect descriptors same as
direct ones. This is how it is in the split ring, but not in
the packed one. I think you want two separate functions, for
direct and indirect.

> +
> +	if (desc) {
> +		/* Use a single buffer which doesn't continue */
> +		indirect = true;
> +		/* Set up rest to use this indirect table. */
> +		i = 0;
> +		descs_used = 1;
> +	} else {
> +		indirect = false;
> +		desc = vq->vring_packed.desc;
> +		i = head;
> +		descs_used = total_sg;
> +	}
> +
> +	if (vq->vq.num_free < descs_used) {
> +		pr_debug("Can't add buf len %i - avail = %i\n",
> +			 descs_used, vq->vq.num_free);
> +		/* FIXME: for historical reasons, we force a notify here if
> +		 * there are outgoing parts to the buffer.  Presumably the
> +		 * host should service the ring ASAP. */
> +		if (out_sgs)
> +			vq->notify(&vq->vq);
> +		if (indirect)
> +			kfree(desc);
> +		END_USE(vq);
> +		return -ENOSPC;
> +	}
> +
> +	id = vq->free_head;
> +	BUG_ON(id == vq->vring_packed.num);
> +
> +	curr = id;
> +	for (n = 0; n < out_sgs + in_sgs; n++) {
> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +			dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> +					       DMA_TO_DEVICE : DMA_FROM_DEVICE);
> +			if (vring_mapping_error(vq, addr))
> +				goto unmap_release;
> +
> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> +				  (n < out_sgs ? 0 : VRING_DESC_F_WRITE) |
> +				  _VRING_DESC_F_AVAIL(vq->avail_wrap_counter) |
> +				  _VRING_DESC_F_USED(!vq->avail_wrap_counter));

Spec says:
	The VIRTQ_DESC_F_WRITE flags bit is the only valid flag for descriptors in the
	indirect table.

All this logic isn't needed for indirect.

Also, why re-calculate avail/used flags every time? They only change
when you wrap around.


> +			if (!indirect && i == head)
> +				head_flags = flags;
> +			else
> +				desc[i].flags = flags;
> +
> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> +			i++;
> +			if (!indirect) {
> +				if (vring_use_dma_api(_vq->vdev)) {
> +					vq->desc_state_packed[curr].addr = addr;
> +					vq->desc_state_packed[curr].len =
> +						sg->length;
> +					vq->desc_state_packed[curr].flags =
> +						virtio16_to_cpu(_vq->vdev,
> +								flags);
> +				}
> +				curr = vq->desc_state_packed[curr].next;
> +
> +				if (i >= vq->vring_packed.num) {
> +					i = 0;
> +					vq->avail_wrap_counter ^= 1;
> +				}
> +			}
> +		}
> +	}
> +
> +	prev = (i > 0 ? i : vq->vring_packed.num) - 1;
> +	desc[prev].id = cpu_to_virtio16(_vq->vdev, id);

Is it easier to write this out in all descriptors, to avoid the need to
calculate prev?

> +
> +	/* Last one doesn't continue. */
> +	if (total_sg == 1)
> +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> +	else
> +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev,
> +						~VRING_DESC_F_NEXT);

Wouldn't it be easier to avoid setting VRING_DESC_F_NEXT
in the first place?
	 if (n != in_sgs - 1 && n != out_sgs - 1)

must be better than writing descriptor an extra time.

> +
> +	if (indirect) {
> +		/* Now that the indirect table is filled in, map it. */
> +		dma_addr_t addr = vring_map_single(
> +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
> +			DMA_TO_DEVICE);
> +		if (vring_mapping_error(vq, addr))
> +			goto unmap_release;
> +
> +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> +				      _VRING_DESC_F_AVAIL(avail_wrap_counter) |
> +				      _VRING_DESC_F_USED(!avail_wrap_counter));
> +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev,
> +								   addr);
> +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> +				total_sg * sizeof(struct vring_packed_desc));
> +		vq->vring_packed.desc[head].id = cpu_to_virtio16(_vq->vdev, id);
> +
> +		if (vring_use_dma_api(_vq->vdev)) {
> +			vq->desc_state_packed[id].addr = addr;
> +			vq->desc_state_packed[id].len = total_sg *
> +					sizeof(struct vring_packed_desc);
> +			vq->desc_state_packed[id].flags =
> +					virtio16_to_cpu(_vq->vdev, head_flags);
> +		}
> +	}
> +
> +	/* We're using some buffers from the free list. */
> +	vq->vq.num_free -= descs_used;
> +
> +	/* Update free pointer */
> +	if (indirect) {
> +		n = head + 1;
> +		if (n >= vq->vring_packed.num) {
> +			n = 0;
> +			vq->avail_wrap_counter ^= 1;
> +		}
> +		vq->next_avail_idx = n;
> +		vq->free_head = vq->desc_state_packed[id].next;
> +	} else {
> +		vq->next_avail_idx = i;
> +		vq->free_head = curr;
> +	}
> +
> +	/* Store token and indirect buffer state. */
> +	vq->desc_state_packed[id].num = descs_used;
> +	vq->desc_state_packed[id].data = data;
> +	if (indirect)
> +		vq->desc_state_packed[id].indir_desc = desc;
> +	else
> +		vq->desc_state_packed[id].indir_desc = ctx;
> +
> +	/* A driver MUST NOT make the first descriptor in the list
> +	 * available before all subsequent descriptors comprising
> +	 * the list are made available. */
> +	virtio_wmb(vq->weak_barriers);
> +	vq->vring_packed.desc[head].flags = head_flags;
> +	vq->num_added += descs_used;
> +
> +	pr_debug("Added buffer head %i to %p\n", head, vq);
> +	END_USE(vq);
> +
> +	return 0;
> +
> +unmap_release:
> +	err_idx = i;
> +	i = head;
> +
> +	for (n = 0; n < total_sg; n++) {
> +		if (i == err_idx)
> +			break;
> +		vring_unmap_desc_packed(vq, &desc[i]);
> +		i++;
> +		if (!indirect && i >= vq->vring_packed.num)
> +			i = 0;
> +	}
> +
> +	vq->avail_wrap_counter = avail_wrap_counter;
> +
> +	if (indirect)
> +		kfree(desc);
> +
> +	END_USE(vq);
>  	return -EIO;
>  }
>  
>  static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
>  {
> -	return false;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	u16 flags;
> +	bool needs_kick;
> +	u32 snapshot;
> +
> +	START_USE(vq);
> +	/* We need to expose the new flags value before checking notification
> +	 * suppressions. */
> +	virtio_mb(vq->weak_barriers);
> +
> +	snapshot = READ_ONCE(*(u32 *)vq->vring_packed.device);
> +	flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;

What are all these hard-coded things? Also either we do READ_ONCE
everywhere or nowhere. Why is this place special? Any why read 32 bit
if you only want the 16?

And doesn't sparse complain about cast to __virtio16?

> +
> +#ifdef DEBUG
> +	if (vq->last_add_time_valid) {
> +		WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> +					      vq->last_add_time)) > 100);
> +	}
> +	vq->last_add_time_valid = false;
> +#endif
> +
> +	needs_kick = (flags != VRING_EVENT_F_DISABLE);
> +	END_USE(vq);
> +	return needs_kick;
> +}
> +
> +static void detach_buf_packed(struct vring_virtqueue *vq,
> +			      unsigned int id, void **ctx)
> +{
> +	struct vring_desc_state_packed *state = NULL;
> +	struct vring_packed_desc *desc;
> +	unsigned int curr, i;
> +
> +	/* Clear data ptr. */
> +	vq->desc_state_packed[id].data = NULL;
> +
> +	curr = id;
> +	for (i = 0; i < vq->desc_state_packed[id].num; i++) {
> +		state = &vq->desc_state_packed[curr];
> +		vring_unmap_state_packed(vq, state);
> +		curr = state->next;
> +	}
> +
> +	BUG_ON(state == NULL);
> +	vq->vq.num_free += vq->desc_state_packed[id].num;
> +	state->next = vq->free_head;
> +	vq->free_head = id;
> +
> +	if (vq->indirect) {
> +		u32 len;
> +
> +		/* Free the indirect table, if any, now that it's unmapped. */
> +		desc = vq->desc_state_packed[id].indir_desc;
> +		if (!desc)
> +			return;
> +
> +		if (vring_use_dma_api(vq->vq.vdev)) {
> +			len = vq->desc_state_packed[id].len;
> +			for (i = 0; i < len / sizeof(struct vring_packed_desc);
> +					i++)
> +				vring_unmap_desc_packed(vq, &desc[i]);
> +		}
> +		kfree(desc);
> +		vq->desc_state_packed[id].indir_desc = NULL;
> +	} else if (ctx) {
> +		*ctx = vq->desc_state_packed[id].indir_desc;
> +	}
> +}
> +
> +static inline bool is_used_desc_packed(const struct vring_virtqueue *vq,
> +				       u16 idx, bool used_wrap_counter)
> +{
> +	u16 flags;
> +	bool avail, used;
> +
> +	flags = virtio16_to_cpu(vq->vq.vdev,
> +				vq->vring_packed.desc[idx].flags);
> +	avail = !!(flags & VRING_DESC_F_AVAIL);
> +	used = !!(flags & VRING_DESC_F_USED);
> +
> +	return avail == used && used == used_wrap_counter;

I think that you don't need to look at avail flag to detect a used
descriptor. The reason device writes it is to avoid confusing
*device* next time descriptor wraps.

>  }
>  
>  static inline bool more_used_packed(const struct vring_virtqueue *vq)
>  {
> -	return false;
> +	return is_used_desc_packed(vq, vq->last_used_idx,
> +			vq->used_wrap_counter);
>  }
>  
>  static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>  					  unsigned int *len,
>  					  void **ctx)
>  {
> -	return NULL;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	u16 last_used, id;
> +	void *ret;
> +
> +	START_USE(vq);
> +
> +	if (unlikely(vq->broken)) {
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	if (!more_used_packed(vq)) {
> +		pr_debug("No more buffers in queue\n");
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	/* Only get used elements after they have been exposed by host. */
> +	virtio_rmb(vq->weak_barriers);
> +
> +	last_used = vq->last_used_idx;
> +	id = virtio16_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
> +	*len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
> +
> +	if (unlikely(id >= vq->vring_packed.num)) {
> +		BAD_RING(vq, "id %u out of range\n", id);
> +		return NULL;
> +	}
> +	if (unlikely(!vq->desc_state_packed[id].data)) {
> +		BAD_RING(vq, "id %u is not a head!\n", id);
> +		return NULL;
> +	}
> +
> +	vq->last_used_idx += vq->desc_state_packed[id].num;
> +	if (vq->last_used_idx >= vq->vring_packed.num) {
> +		vq->last_used_idx -= vq->vring_packed.num;
> +		vq->used_wrap_counter ^= 1;
> +	}
> +
> +	/* detach_buf_packed clears data, so grab it now. */
> +	ret = vq->desc_state_packed[id].data;
> +	detach_buf_packed(vq, id, ctx);
> +
> +#ifdef DEBUG
> +	vq->last_add_time_valid = false;
> +#endif
> +
> +	END_USE(vq);
> +	return ret;
>  }
>  
>  static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
>  {
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	if (vq->event_flags_shadow != VRING_EVENT_F_DISABLE) {
> +		vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
> +		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> +							vq->event_flags_shadow);
> +	}
>  }
>  
>  static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
>  {
> -	return 0;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	START_USE(vq);
> +
> +	/* We optimistically turn back on interrupts, then check if there was
> +	 * more to do. */
> +
> +	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> +		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> +		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> +							vq->event_flags_shadow);
> +	}
> +
> +	END_USE(vq);
> +	return vq->last_used_idx | ((u16)vq->used_wrap_counter << 15);
>  }
>  
> -static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> +static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned off_wrap)
>  {
> -	return false;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	bool wrap_counter;
> +	u16 used_idx;
> +
> +	wrap_counter = off_wrap >> 15;
> +	used_idx = off_wrap & ~(1 << 15);
> +
> +	return is_used_desc_packed(vq, used_idx, wrap_counter);

These >> 15 << 15 all over the place duplicate info.
Pls put 15 in the header.

Also can you maintain the counters properly shifted?
Then just use them.


>  }
>  
>  static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>  {
> -	return false;
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	START_USE(vq);
> +
> +	/* We optimistically turn back on interrupts, then check if there was
> +	 * more to do. */
> +
> +	if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> +		vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> +		vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> +							vq->event_flags_shadow);
> +		/* We need to enable interrupts first before re-checking
> +		 * for more used buffers. */
> +		virtio_mb(vq->weak_barriers);
> +	}
> +
> +	if (more_used_packed(vq)) {
> +		END_USE(vq);
> +		return false;
> +	}
> +
> +	END_USE(vq);
> +	return true;
>  }
>  
>  static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
>  {
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	unsigned int i;
> +	void *buf;
> +
> +	START_USE(vq);
> +
> +	for (i = 0; i < vq->vring_packed.num; i++) {
> +		if (!vq->desc_state_packed[i].data)
> +			continue;
> +		/* detach_buf clears data, so grab it now. */
> +		buf = vq->desc_state_packed[i].data;
> +		detach_buf_packed(vq, i, NULL);
> +		END_USE(vq);
> +		return buf;
> +	}
> +	/* That should have freed everything. */
> +	BUG_ON(vq->vq.num_free != vq->vring_packed.num);
> +
> +	END_USE(vq);
>  	return NULL;
>  }
>  
> @@ -1083,6 +1559,9 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
> +	/* We need to enable interrupts first before re-checking
> +	 * for more used buffers. */
> +	virtio_mb(vq->weak_barriers);
>  	return vq->packed ? virtqueue_poll_packed(_vq, last_used_idx) :
>  			    virtqueue_poll_split(_vq, last_used_idx);
>  }

Possible optimization for when dma API is in use:

--->

dma: detecting nop unmap

drivers need to maintain the dma address for unmap purposes,
but these cycles are wasted when unmap callback is not
defined. Add an API for drivers to check that and avoid
unmap completely. Debug builds still have unmap.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----

diff --git a/include/linux/dma-debug.h b/include/linux/dma-debug.h
index a785f2507159..38b2c27c8449 100644
--- a/include/linux/dma-debug.h
+++ b/include/linux/dma-debug.h
@@ -42,6 +42,11 @@ extern void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
 extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 				 size_t size, int direction, bool map_single);
 
+static inline bool has_debug_dma_unmap(struct device *dev)
+{
+	return true;
+}
+
 extern void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
 			     int nents, int mapped_ents, int direction);
 
@@ -121,6 +126,11 @@ static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
 {
 }
 
+static inline bool has_debug_dma_unmap(struct device *dev)
+{
+	return false;
+}
+
 static inline void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
 				    int nents, int mapped_ents, int direction)
 {
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 1db6a6b46d0d..656f3e518166 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -241,6 +241,14 @@ static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
 	return addr;
 }
 
+static inline bool has_dma_unmap(struct device *dev)
+{
+	const struct dma_map_ops *ops = get_dma_ops(dev);
+
+	return ops->unmap_page || ops->unmap_sg || ops->unmap_resource ||
+		has_dma_unmap(dev);
+}
+
 static inline void dma_unmap_single_attrs(struct device *dev, dma_addr_t addr,
 					  size_t size,
 					  enum dma_data_direction dir,

^ permalink raw reply related

* Re: [PATCH net-next] net: sched: cls_flower: dump offload count value
From: Jakub Kicinski @ 2018-09-07  9:11 UTC (permalink / raw)
  To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem
In-Reply-To: <1536248243-24092-1-git-send-email-vladbu@mellanox.com>

On Thu,  6 Sep 2018 18:37:23 +0300, Vlad Buslov wrote:
> Change flower in_hw_count type to fixed-size u32 and dump it as
> TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared
> blocks and re-offload functionality.
> 
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Acked-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/sch_generic.h    | 2 +-
>  include/uapi/linux/pkt_cls.h | 2 ++
>  net/sched/cls_flower.c       | 5 ++++-
>  3 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index a6d00093f35e..d68ac55539a5 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -362,7 +362,7 @@ static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
>  }
>  
>  static inline void
> -tc_cls_offload_cnt_update(struct tcf_block *block, unsigned int *cnt,
> +tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
>  			  u32 *flags, bool add)
>  {
>  	if (add) {
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index be382fb0592d..2824fb7ed1c9 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -483,6 +483,8 @@ enum {
>  	TCA_FLOWER_KEY_ENC_OPTS,
>  	TCA_FLOWER_KEY_ENC_OPTS_MASK,
>  
> +	TCA_FLOWER_IN_HW_COUNT,		/* be32 */

Why be32?

I wish there was a good way to share this attribute between
classifiers :( 

>  	__TCA_FLOWER_MAX,
>  };
>  
> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index 6fd9bdd93796..4b8dd37dd4f8 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -98,7 +98,7 @@ struct cls_fl_filter {
>  	struct list_head list;
>  	u32 handle;
>  	u32 flags;
> -	unsigned int in_hw_count;
> +	u32 in_hw_count;
>  	struct rcu_work rwork;
>  	struct net_device *hw_dev;
>  };
> @@ -1880,6 +1880,9 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
>  	if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
>  		goto nla_put_failure;
>  
> +	if (nla_put_u32(skb, TCA_FLOWER_IN_HW_COUNT, f->in_hw_count))
> +		goto nla_put_failure;
> +
>  	if (tcf_exts_dump(skb, &f->exts))
>  		goto nla_put_failure;
>  

^ permalink raw reply

* Re: [PATCH net-next v2 1/5] virtio: add packed ring definitions
From: Michael S. Tsirkin @ 2018-09-07 13:51 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, virtualization, linux-kernel, netdev, virtio-dev, wexu,
	jfreimann
In-Reply-To: <20180711022711.7090-2-tiwei.bie@intel.com>

On Wed, Jul 11, 2018 at 10:27:07AM +0800, Tiwei Bie wrote:
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>  include/uapi/linux/virtio_config.h |  3 +++
>  include/uapi/linux/virtio_ring.h   | 43 ++++++++++++++++++++++++++++++
>  2 files changed, 46 insertions(+)
> 
> diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> index 449132c76b1c..1196e1c1d4f6 100644
> --- a/include/uapi/linux/virtio_config.h
> +++ b/include/uapi/linux/virtio_config.h
> @@ -75,6 +75,9 @@
>   */
>  #define VIRTIO_F_IOMMU_PLATFORM		33
>  
> +/* This feature indicates support for the packed virtqueue layout. */
> +#define VIRTIO_F_RING_PACKED		34
> +
>  /*
>   * Does the device support Single Root I/O Virtualization?
>   */
> diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> index 6d5d5faa989b..0254a2ba29cf 100644
> --- a/include/uapi/linux/virtio_ring.h
> +++ b/include/uapi/linux/virtio_ring.h
> @@ -44,6 +44,10 @@
>  /* This means the buffer contains a list of buffer descriptors. */
>  #define VRING_DESC_F_INDIRECT	4
>  
> +/* Mark a descriptor as available or used. */
> +#define VRING_DESC_F_AVAIL	(1ul << 7)
> +#define VRING_DESC_F_USED	(1ul << 15)
> +
>  /* The Host uses this in used->flags to advise the Guest: don't kick me when
>   * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
>   * will still kick if it's out of buffers. */
> @@ -53,6 +57,17 @@
>   * optimization.  */
>  #define VRING_AVAIL_F_NO_INTERRUPT	1
>  
> +/* Enable events. */
> +#define VRING_EVENT_F_ENABLE	0x0
> +/* Disable events. */
> +#define VRING_EVENT_F_DISABLE	0x1
> +/*
> + * Enable events for a specific descriptor
> + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
> + * Only valid if VIRTIO_RING_F_EVENT_IDX has been negotiated.
> + */
> +#define VRING_EVENT_F_DESC	0x2
> +
>  /* We support indirect buffer descriptors */
>  #define VIRTIO_RING_F_INDIRECT_DESC	28
>  

These are for the packed ring, right? Pls prefix accordingly.
Also, you likely need macros for the wrap counters.

> @@ -171,4 +186,32 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
>  	return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
>  }
>  
> +struct vring_packed_desc_event {
> +	/* Descriptor Ring Change Event Offset/Wrap Counter. */
> +	__virtio16 off_wrap;
> +	/* Descriptor Ring Change Event Flags. */
> +	__virtio16 flags;
> +};
> +
> +struct vring_packed_desc {
> +	/* Buffer Address. */
> +	__virtio64 addr;
> +	/* Buffer Length. */
> +	__virtio32 len;
> +	/* Buffer ID. */
> +	__virtio16 id;
> +	/* The flags depending on descriptor type. */
> +	__virtio16 flags;
> +};

Don't use __virtioXX types, just __leXX ones.

> +
> +struct vring_packed {
> +	unsigned int num;
> +
> +	struct vring_packed_desc *desc;
> +
> +	struct vring_packed_desc_event *driver;
> +
> +	struct vring_packed_desc_event *device;
> +};
> +
>  #endif /* _UAPI_LINUX_VIRTIO_RING_H */
> -- 
> 2.18.0

^ permalink raw reply

* Re: [PATCH] ethernet: renesas: convert to SPDX identifiers
From: Geert Uytterhoeven @ 2018-09-07  9:11 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Kuninori Morimoto, David S. Miller, Mark Brown, Robin Murphy,
	netdev, Linux-Renesas
In-Reply-To: <d55d4d25-9966-cf34-7258-97009a9fa5ce@cogentembedded.com>

Hi Sergei,

On Fri, Sep 7, 2018 at 11:05 AM Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:
> On 9/7/2018 5:02 AM, Kuninori Morimoto wrote:
> > From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
> >
> > This patch updates license to use SPDX-License-Identifier
> > instead of verbose license text.
> >
> > Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
> > ---
> >   drivers/net/ethernet/renesas/Kconfig    | 1 +
> >   drivers/net/ethernet/renesas/Makefile   | 1 +
> >   drivers/net/ethernet/renesas/ravb_ptp.c | 6 +-----
> >   3 files changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/renesas/Kconfig b/drivers/net/ethernet/renesas/Kconfig
> > index f3f7477..bb0ebdf 100644
> > --- a/drivers/net/ethernet/renesas/Kconfig
> > +++ b/drivers/net/ethernet/renesas/Kconfig
> > @@ -1,3 +1,4 @@
> > +# SPDX-License-Identifier: GPL-2.0
> >   #
> >   # Renesas device configuration
> >   #
>
>     There was no license at all on this file...

So it's licensed under GPL-2.0.
Cfr. e.g. commit 7328c8f48d1895b3 ("PCI: Add SPDX GPL-2.0 when no license
was specified")

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH] ethernet: renesas: convert to SPDX identifiers
From: Sergei Shtylyov @ 2018-09-07  9:16 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Kuninori Morimoto, David S. Miller, Mark Brown, Robin Murphy,
	netdev, Linux-Renesas
In-Reply-To: <CAMuHMdVJ+3cX5hMntJ=YJwA48gju+PEcsh-NQmpDRMCnAVG=zw@mail.gmail.com>

On 9/7/2018 12:11 PM, Geert Uytterhoeven wrote:

>> On 9/7/2018 5:02 AM, Kuninori Morimoto wrote:
>>> From: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
>>>
>>> This patch updates license to use SPDX-License-Identifier
>>> instead of verbose license text.
>>>
>>> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
>>> ---
>>>    drivers/net/ethernet/renesas/Kconfig    | 1 +
>>>    drivers/net/ethernet/renesas/Makefile   | 1 +
>>>    drivers/net/ethernet/renesas/ravb_ptp.c | 6 +-----
>>>    3 files changed, 3 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/renesas/Kconfig b/drivers/net/ethernet/renesas/Kconfig
>>> index f3f7477..bb0ebdf 100644
>>> --- a/drivers/net/ethernet/renesas/Kconfig
>>> +++ b/drivers/net/ethernet/renesas/Kconfig
>>> @@ -1,3 +1,4 @@
>>> +# SPDX-License-Identifier: GPL-2.0
>>>    #
>>>    # Renesas device configuration
>>>    #
>>
>>      There was no license at all on this file...
> 
> So it's licensed under GPL-2.0.
> Cfr. e.g. commit 7328c8f48d1895b3 ("PCI: Add SPDX GPL-2.0 when no license
> was specified")

    OK, but then does the patch description match what is being done?

> Gr{oetje,eeting}s,
> 
>                          Geert

MBR, Sergei

^ permalink raw reply

* GREETINGS FROM TRACY WILLIAM
From: Miss Tracy William @ 2018-09-07  9:20 UTC (permalink / raw)

In-Reply-To: <1094400474.1597284.1536312012984.ref@mail.yahoo.com>



Hello Dear,
how are you today,I hope you are doing great.
It is my great pleasure to
contact you and i hope you don't mind,I was just surfing
through the Internet search when i found your email address,I want to make a new
and special friend,I hope you don't mind.

My name is Tracy William,I am from the United States of America but
presently I live
and work in England,
I will give you pictures and details of me as soon as i hear from you
bye
Tracy

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox