Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] net: dsa: b53: Utilize common helpers for u64/MAC
From: Andrew Lunn @ 2017-01-04 22:07 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, davem, vivien.didelot, volodymyr.bendiuga
In-Reply-To: <20170104215320.19293-1-f.fainelli@gmail.com>

On Wed, Jan 04, 2017 at 01:53:20PM -0800, Florian Fainelli wrote:
> Utilize the two functions recently introduced: u64_to_ether() and
> ether_to_u64() instead of our own versions.

:-)

And i expect these have been tested on big endian systems. I never
thought to look if one of the other drivers already had them.

> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] tools: psock_lib: tighten conditions checked in sock_setfilter
From: Daniel Borkmann @ 2017-01-04 22:16 UTC (permalink / raw)
  To: Sowmini Varadhan, linux-kselftest, netdev; +Cc: willemb, davem, shuah
In-Reply-To: <3aa068fa482f7cf5381957e9a3ea58550822d1d1.1483555162.git.sowmini.varadhan@oracle.com>

On 01/04/2017 07:45 PM, Sowmini Varadhan wrote:
> The bpf_prog used in sock_setfilter() only attempts to check for
> ip pktlen, and verifies that the contents of the 80'th packet in
> the ethernet frame is 'a' or 'b'.  Thus many non-udp packets
> could incorrectly pass through this filter and cause incorrect
> test results.
>
> This commit hardens the conditions checked by the filter so
> that only UDP/IPv4 packets with the matching length and test-character
> will be permitted by the filter. The filter has been cleaned up
> to explicitly use the BPF macros to make it more readable.
>
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
> ---
> v2: commit comment edited based on Willem de Bruijn review
> v3: Shuah Khan nit.
>
>   tools/testing/selftests/net/psock_lib.h |   29 ++++++++++++++++++++++-------
>   1 files changed, 22 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/net/psock_lib.h b/tools/testing/selftests/net/psock_lib.h
> index 24bc7ec..9e5553a 100644
> --- a/tools/testing/selftests/net/psock_lib.h
> +++ b/tools/testing/selftests/net/psock_lib.h
> @@ -27,6 +27,7 @@
>   #include <string.h>
>   #include <arpa/inet.h>
>   #include <unistd.h>
> +#include <netinet/udp.h>
>
>   #define DATA_LEN			100
>   #define DATA_CHAR			'a'
> @@ -40,14 +41,28 @@
>
>   static __maybe_unused void sock_setfilter(int fd, int lvl, int optnum)
>   {
> +	uint16_t ip_len = DATA_LEN +
> +			  sizeof(struct iphdr) +
> +			  sizeof(struct udphdr);
> +	/* the filter below checks for all of the following conditions that
> +	 * are based on the contents of create_payload()
> +	 *  ether type 0x800 and
> +	 *  ip proto udp     and
> +	 *  ip len == ip_len and
> +	 *  udp[38] == 'a' or udp[38] == 'b'
> +	 */
>   	struct sock_filter bpf_filter[] = {
> -		{ 0x80, 0, 0, 0x00000000 },  /* LD  pktlen		      */
> -		{ 0x35, 0, 4, DATA_LEN   },  /* JGE DATA_LEN  [f goto nomatch]*/
> -		{ 0x30, 0, 0, 0x00000050 },  /* LD  ip[80]		      */
> -		{ 0x15, 1, 0, DATA_CHAR  },  /* JEQ DATA_CHAR   [t goto match]*/
> -		{ 0x15, 0, 1, DATA_CHAR_1},  /* JEQ DATA_CHAR_1 [t goto match]*/
> -		{ 0x06, 0, 0, 0x00000060 },  /* RET match	              */
> -		{ 0x06, 0, 0, 0x00000000 },  /* RET no match		      */
> +		BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),	/* LD ethertype */
> +		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETH_P_IP, 0, 8),
> +		BPF_STMT(BPF_LD|BPF_B|BPF_ABS, 23),	/* LD ip_proto */
> +		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, IPPROTO_UDP, 0, 6),
> +		BPF_STMT(BPF_LD|BPF_H|BPF_ABS, 16),	/* LD ip_len */
> +		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ip_len, 0, 4),
> +		BPF_STMT(BPF_LD|BPF_B|BPF_ABS, 80),	/* LD udp[38] */
> +		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DATA_CHAR, 1, 0),
> +		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DATA_CHAR_1, 0, 1),
> +		BPF_STMT(BPF_RET | BPF_K, ~0),		/* match */
> +		BPF_STMT(BPF_RET | BPF_K, 0)		/* no match */

Just reading up on the thread, sorry to jump in late. Can't you just
use the generated code from bpf_asm (tools/net/) and add the asm program
as a comment above? Something like we do in net/core/ptp_classifier.c +13.
As it stands it makes it a bit harder to parse / less readable with macros
actually. Rest seems fine, thanks.

>   	};
>   	struct sock_fprog bpf_prog;
>
>

^ permalink raw reply

* Re: [PATCH net-next v2] tcp: provide timestamps for partial writes
From: Willem de Bruijn @ 2017-01-04 22:22 UTC (permalink / raw)
  To: Soheil Hassas Yeganeh
  Cc: David Miller, Network Development, Soheil Hassas Yeganeh,
	Willem de Bruijn, Eric Dumazet, Neal Cardwell, Martin KaFai Lau
In-Reply-To: <20170104161934.26849-1-soheil.kdev@gmail.com>

On Wed, Jan 4, 2017 at 11:19 AM, Soheil Hassas Yeganeh
<soheil.kdev@gmail.com> wrote:
> From: Soheil Hassas Yeganeh <soheil@google.com>
>
> For TCP sockets, TX timestamps are only captured when the user data
> is successfully and fully written to the socket. In many cases,
> however, TCP writes can be partial for which no timestamp is
> collected.
>
> Collect timestamps whenever any user data is (fully or partially)
> copied into the socket. Pass tcp_write_queue_tail to tcp_tx_timestamp
> instead of the local skb pointer since it can be set to NULL on
> the error path.
>
> Note that tcp_write_queue_tail can be NULL, even if bytes have been
> copied to the socket. This is because acknowledgements are being
> processed in tcp_sendmsg(), and by the time tcp_tx_timestamp is
> called tcp_write_queue_tail can be NULL. For such cases, this patch
> does not collect any timestamps (i.e., it is best-effort).
>
> This patch is written with suggestions from Willem de Bruijn and
> Eric Dumazet.
>
> Change-log V1 -> V2:
>         - Use sockc.tsflags instead of sk->sk_tsflags.
>         - Use the same code path for normal writes and errors.
>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> Cc: Martin KaFai Lau <kafai@fb.com>

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] tools: psock_lib: tighten conditions checked in sock_setfilter
From: Sowmini Varadhan @ 2017-01-04 22:22 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: linux-kselftest, netdev, willemb, davem, shuah
In-Reply-To: <586D7437.1050708@iogearbox.net>

On (01/04/17 23:16), Daniel Borkmann wrote:
> 
> Just reading up on the thread, sorry to jump in late. Can't you just
> use the generated code from bpf_asm (tools/net/) and add the asm program
> as a comment above? Something like we do in net/core/ptp_classifier.c +13.

I was actually using the example from the BSD bpf(4) man page, 
and expanding on that one.. 
  https://www.freebsd.org/cgi/man.cgi?query=bpf&sektion=4&manpath=FreeBSD+4.7-RELEASE
(I could not find the equivalent linux man page).

It was a lot easier to parse than the existing code .

> As it stands it makes it a bit harder to parse / less readable with macros
> actually. Rest seems fine, thanks.

You think the earlier code was readable? I had to use
gcc -E, with help from the bpf(4) page, to make sense of it.

--Sowmini

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] tools: psock_lib: tighten conditions checked in sock_setfilter
From: Daniel Borkmann @ 2017-01-04 22:26 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: linux-kselftest, netdev, willemb, davem, shuah
In-Reply-To: <20170104222224.GA31756@oracle.com>

On 01/04/2017 11:22 PM, Sowmini Varadhan wrote:
> On (01/04/17 23:16), Daniel Borkmann wrote:
>>
>> Just reading up on the thread, sorry to jump in late. Can't you just
>> use the generated code from bpf_asm (tools/net/) and add the asm program
>> as a comment above? Something like we do in net/core/ptp_classifier.c +13.
>
> I was actually using the example from the BSD bpf(4) man page,
> and expanding on that one..
>    https://www.freebsd.org/cgi/man.cgi?query=bpf&sektion=4&manpath=FreeBSD+4.7-RELEASE
> (I could not find the equivalent linux man page).
>
> It was a lot easier to parse than the existing code .

cBPF with its tooling is all documented here:

   Documentation/networking/filter.txt

>> As it stands it makes it a bit harder to parse / less readable with macros
>> actually. Rest seems fine, thanks.
>
> You think the earlier code was readable? I had to use
> gcc -E, with help from the bpf(4) page, to make sense of it.
>
> --Sowmini
>
>

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] tools: psock_lib: tighten conditions checked in sock_setfilter
From: Shuah Khan @ 2017-01-04 22:37 UTC (permalink / raw)
  To: Sowmini Varadhan, linux-kselftest, netdev
  Cc: daniel, willemb, davem, Shuah Khan
In-Reply-To: <3aa068fa482f7cf5381957e9a3ea58550822d1d1.1483555162.git.sowmini.varadhan@oracle.com>

On 01/04/2017 11:45 AM, Sowmini Varadhan wrote:
> The bpf_prog used in sock_setfilter() only attempts to check for
> ip pktlen, and verifies that the contents of the 80'th packet in
> the ethernet frame is 'a' or 'b'.  Thus many non-udp packets
> could incorrectly pass through this filter and cause incorrect
> test results.
> 
> This commit hardens the conditions checked by the filter so
> that only UDP/IPv4 packets with the matching length and test-character
> will be permitted by the filter. The filter has been cleaned up
> to explicitly use the BPF macros to make it more readable.
> 
> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
> ---
> v2: commit comment edited based on Willem de Bruijn review
> v3: Shuah Khan nit.
> 
>  tools/testing/selftests/net/psock_lib.h |   29 ++++++++++++++++++++++-------
>  1 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/testing/selftests/net/psock_lib.h b/tools/testing/selftests/net/psock_lib.h
> index 24bc7ec..9e5553a 100644
> --- a/tools/testing/selftests/net/psock_lib.h
> +++ b/tools/testing/selftests/net/psock_lib.h
> @@ -27,6 +27,7 @@
>  #include <string.h>
>  #include <arpa/inet.h>
>  #include <unistd.h>
> +#include <netinet/udp.h>
>  
>  #define DATA_LEN			100
>  #define DATA_CHAR			'a'
> @@ -40,14 +41,28 @@
>  
>  static __maybe_unused void sock_setfilter(int fd, int lvl, int optnum)
>  {
> +	uint16_t ip_len = DATA_LEN +
> +			  sizeof(struct iphdr) +
> +			  sizeof(struct udphdr);
> +	/* the filter below checks for all of the following conditions that
> +	 * are based on the contents of create_payload()
> +	 *  ether type 0x800 and
> +	 *  ip proto udp     and
> +	 *  ip len == ip_len and
> +	 *  udp[38] == 'a' or udp[38] == 'b'
> +	 */

Looks like you have to do v4 anyway, please make sure your comment
block is one of the acceptable formats based on coding style:

https://marc.info/?l=linux-crypto-vger&m=146799837129319&w=2

>  	struct sock_filter bpf_filter[] = {
> -		{ 0x80, 0, 0, 0x00000000 },  /* LD  pktlen		      */
> -		{ 0x35, 0, 4, DATA_LEN   },  /* JGE DATA_LEN  [f goto nomatch]*/
> -		{ 0x30, 0, 0, 0x00000050 },  /* LD  ip[80]		      */
> -		{ 0x15, 1, 0, DATA_CHAR  },  /* JEQ DATA_CHAR   [t goto match]*/
> -		{ 0x15, 0, 1, DATA_CHAR_1},  /* JEQ DATA_CHAR_1 [t goto match]*/
> -		{ 0x06, 0, 0, 0x00000060 },  /* RET match	              */
> -		{ 0x06, 0, 0, 0x00000000 },  /* RET no match		      */
> +		BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),	/* LD ethertype */
> +		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETH_P_IP, 0, 8),
> +		BPF_STMT(BPF_LD|BPF_B|BPF_ABS, 23),	/* LD ip_proto */
> +		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, IPPROTO_UDP, 0, 6),
> +		BPF_STMT(BPF_LD|BPF_H|BPF_ABS, 16),	/* LD ip_len */
> +		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ip_len, 0, 4),
> +		BPF_STMT(BPF_LD|BPF_B|BPF_ABS, 80),	/* LD udp[38] */
> +		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DATA_CHAR, 1, 0),
> +		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, DATA_CHAR_1, 0, 1),
> +		BPF_STMT(BPF_RET | BPF_K, ~0),		/* match */
> +		BPF_STMT(BPF_RET | BPF_K, 0)		/* no match */
>  	};
>  	struct sock_fprog bpf_prog;
>  
> 

^ permalink raw reply

* Re: [net-next PATCH 5/6] i40e: Add TX and RX support in switchdev mode.
From: Samudrala, Sridhar @ 2017-01-04 22:46 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Alexander Duyck, John Fastabend, Anjali Singhai Jain,
	jakub.kicinski, intel-wired-lan, Linux Netdev List
In-Reply-To: <CAJ3xEMjGuKKge+_hqZPOg2fp3f50TdxGtbKFXwnCZ70rPxmuyw@mail.gmail.com>



On 1/3/2017 3:03 PM, Or Gerlitz wrote:
> On Fri, Dec 30, 2016 at 7:04 PM, Samudrala, Sridhar
> <sridhar.samudrala@intel.com> wrote:
>> On 12/30/2016 7:31 AM, Or Gerlitz wrote:
>>> Are you exposing switchdev ops for the representators? didn't see that
>>> or maybe it's in the 4th patch which didn't make it to the list?
>> Not at this time. In the future patches when we offload fdb/vlan
>> functionality, we could use switchdev ops.
> but wait, this is the switchdev mode... even before doing any
> offloading, you want (need) your representor netdevices to have the
> same HW ID marking they are all ports of the same ASIC, this you can
> do with the switchdev parent ID attribute.
OK. I will add switchdev_port_attr_get() with PORT_PARENT_ID support in v3.

Thanks
Sridhar

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] tools: psock_lib: tighten conditions checked in sock_setfilter
From: Sowmini Varadhan @ 2017-01-04 22:48 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: linux-kselftest, netdev, willemb, davem, shuah
In-Reply-To: <586D7692.4000604@iogearbox.net>

On (01/04/17 23:26), Daniel Borkmann wrote:
> 
> >>As it stands it makes it a bit harder to parse / less readable with macros
> >>actually. Rest seems fine, thanks.

Usually macros are there (a) as an abstraction so you 
dont have to hard-code things, and, (b) to make things 
more readable. (maybe that's why the 1992 VJ paper on 
BPF came up with these macros?)

I think we differ on code-aesthetics (not correctness) here. 
It was not immediately obvious to me that "0x15 is actually 
BPF_JMP + BPF_JEQ + BPF_K" etc, when I wanted to extend
the bpf_prog to harden the checks in the existing code.

Would it be ok to leave the extremely subjective 
"make this more readable" part for you to tackle later? 
Or I can just drop patch1, and you can fix it to your 
satisfaction later.

--Sowmini

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] tools: psock_lib: tighten conditions checked in sock_setfilter
From: Sowmini Varadhan @ 2017-01-04 22:49 UTC (permalink / raw)
  To: Shuah Khan; +Cc: linux-kselftest, netdev, daniel, willemb, davem
In-Reply-To: <d3e75147-475a-823c-3530-5150dd24a63d@kernel.org>

On (01/04/17 15:37), Shuah Khan wrote:
> Looks like you have to do v4 anyway, please make sure your comment
> block is one of the acceptable formats based on coding style:

I'm not sure about that.  I can just keep patch 2.

thanks,
--Sowmini

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] tools: psock_lib: tighten conditions checked in sock_setfilter
From: Sowmini Varadhan @ 2017-01-04 22:55 UTC (permalink / raw)
  To: Shuah Khan; +Cc: linux-kselftest, netdev, daniel, willemb, davem
In-Reply-To: <d3e75147-475a-823c-3530-5150dd24a63d@kernel.org>

On (01/04/17 15:37), Shuah Khan wrote:
> > +	/* the filter below checks for all of the following conditions that
> > +	 * are based on the contents of create_payload()
> > +	 *  ether type 0x800 and
> > +	 *  ip proto udp     and
> > +	 *  ip len == ip_len and
> > +	 *  udp[38] == 'a' or udp[38] == 'b'
> > +	 */
> 
> Looks like you have to do v4 anyway, please make sure your comment
> block is one of the acceptable formats based on coding style:
> 
> https://marc.info/?l=linux-crypto-vger&m=146799837129319&w=2

BTW, the above is conformant with the comment style required for
networking:

https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt

which seems to be used in psock_fanout.c and reuseport_bpf.c as well.

Thanks
--Sowmini

^ permalink raw reply

* [RFC PATCH next] ipv6: do not send RTM_DELADDR for tentative addresses
From: Mahesh Bandewar @ 2017-01-04 23:01 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Mahesh Bandewar, Mahesh Bandewar, Hideaki YOSHIFUJI,
	Patrick McHardy, Hannes Frederic Sowa

From: Mahesh Bandewar <maheshb@google.com>

RTM_NEWADDR notification is sent when IFA_F_TENTATIVE is cleared from
the address. So if the address is added and deleted before DAD probes
completes, the RTM_DELADDR will be sent for which there was no
RTM_NEWADDR causing asymmetry in notification. However if the same
logic is used while sending RTM_DELADDR notification, this asymmetry
can be avoided.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/addrconf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c1e124bc8e1e..ac9bd5620f81 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4888,6 +4888,13 @@ static void inet6_ifa_notify(int event, struct inet6_ifaddr *ifa)
 	struct net *net = dev_net(ifa->idev->dev);
 	int err = -ENOBUFS;
 
+	/* Don't send DELADDR notification for TENTATIVE address,
+	 * since NEWADDR notification is sent only after removing
+	 * TENTATIVE flag.
+	 */
+	if (ifa->flags & IFA_F_TENTATIVE && event == RTM_DELADDR)
+		return;
+
 	skb = nlmsg_new(inet6_ifaddr_msgsize(), GFP_ATOMIC);
 	if (!skb)
 		goto errout;
-- 
2.11.0.390.gc69c2f50cf-goog

^ permalink raw reply related

* Re: [PATCH 2/2] PCI: lock each enable/disable num_vfs operation in sysfs
From: Gavin Shan @ 2017-01-04 23:11 UTC (permalink / raw)
  To: Tantilov, Emil S
  Cc: Gavin Shan, linux-pci@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org, Duyck, Alexander H,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <87618083B2453E4A8714035B62D67992507430EB@FMSMSX105.amr.corp.intel.com>

On Wed, Jan 04, 2017 at 04:00:20PM +0000, Tantilov, Emil S wrote:
>>On Tue, Jan 03, 2017 at 04:48:31PM -0800, Emil Tantilov wrote:
>>>Enabling/disabling SRIOV via sysfs by echo-ing multiple values
>>>simultaneously:
>>>
>>>echo 63 > /sys/class/net/ethX/device/sriov_numvfs&
>>>echo 63 > /sys/class/net/ethX/device/sriov_numvfs
>>>
>>>sleep 5
>>>
>>>echo 0 > /sys/class/net/ethX/device/sriov_numvfs&
>>>echo 0 > /sys/class/net/ethX/device/sriov_numvfs
>>>
>>>Results in the following bug:
>>>
>>>kernel BUG at drivers/pci/iov.c:495!
>>>invalid opcode: 0000 [#1] SMP
>>>CPU: 1 PID: 8050 Comm: bash Tainted: G   W   4.9.0-rc7-net-next #2092
>>>RIP: 0010:[<ffffffff813b1647>]
>>>	  [<ffffffff813b1647>] pci_iov_release+0x57/0x60
>>>
>>>Call Trace:
>>> [<ffffffff81391726>] pci_release_dev+0x26/0x70
>>> [<ffffffff8155be6e>] device_release+0x3e/0xb0
>>> [<ffffffff81365ee7>] kobject_cleanup+0x67/0x180
>>> [<ffffffff81365d9d>] kobject_put+0x2d/0x60
>>> [<ffffffff8155bc27>] put_device+0x17/0x20
>>> [<ffffffff8139c08a>] pci_dev_put+0x1a/0x20
>>> [<ffffffff8139cb6b>] pci_get_dev_by_id+0x5b/0x90
>>> [<ffffffff8139cca5>] pci_get_subsys+0x35/0x40
>>> [<ffffffff8139ccc8>] pci_get_device+0x18/0x20
>>> [<ffffffff8139ccfb>] pci_get_domain_bus_and_slot+0x2b/0x60
>>> [<ffffffff813b09e7>] pci_iov_remove_virtfn+0x57/0x180
>>> [<ffffffff813b0b95>] pci_disable_sriov+0x65/0x140
>>> [<ffffffffa00a1af7>] ixgbe_disable_sriov+0xc7/0x1d0 [ixgbe]
>>> [<ffffffffa00a1e9d>] ixgbe_pci_sriov_configure+0x3d/0x170 [ixgbe]
>>> [<ffffffff8139d28c>] sriov_numvfs_store+0xdc/0x130
>>>...
>>>RIP  [<ffffffff813b1647>] pci_iov_release+0x57/0x60
>>>
>>>Use the existing mutex lock to protect each enable/disable operation.
>>>
>>>CC: Alexander Duyck <alexander.h.duyck@intel.com>
>>>Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
>>
>>Emil, It's going to change semantics of pci_enable_sriov() and pci_disable_sriov().
>>They can be invoked when writing to the sysfs entry, or loading PF's
>>driver. With the change applied, the lock (pf->sriov->lock) isn't acquired and released
>>in the PF's driver loading path.
>
>The enablement of SRIOV on driver load is done via deprecated module parameter.
>Perhaps we can just remove it, although there are probably still people that use it
>and may not be happy if we get rid of it. 
>

Yeah, some drivers are still using the interface. So we cannot affect it until
it can be droped.

>>I think the reasonable way would be adding a flag in "struct sriov", to
>>indicate someone is accessing the IOV capability through sysfs file. With this, the
>>code returns with "-EBUSY" immediately for contenders. With it, nothing is going
>>to be changed in PF's driver loading path.
>
>Flag is what I initially had in mind, but did not want to add extra locking if we 
>can make use of the existing.
>

The problem is sriov->lock wasn't introduced to protect the whole IOV capability.
Instead, it protects the allocation of virtual bus (if needed). In your patch,
it will be used to protect the whole IOV capability, ensure accessing the IOV
capability exclusively. So the usage of this lock is changed. 

     code extracted from pci.h:

     struct pci_sriov {
            :
            struct mutex lock;      /* lock for VF bus */
            :
     }

The lock was introduced by commit d1b054da8 ("PCI: initialize and release SR-IOV
capability"). If I'm correct enough, I don't think this lock is needed when
pci_enable_sriov() or pci_disable_sriov() are called in driver because of module
parameters. I don't see the usage case calling pci_disable_sriov() while previous
pci_enable_sriov() isn't finished yet. Also, it's not needed in EEH subsystem.
So I think the lock can be dropped, then it can be used to protect sysfs path. 

>>Also, there are some minor comments as below and I guess most of them won't
>>be applied if you take my suggestion eventually. However, I'm trying to make
>>the comments complete.
>
>Thanks a lot for reviewing!
>
>>
>>>---
>>> drivers/pci/pci-sysfs.c |   24 +++++++++++++++++-------
>>> 1 file changed, 17 insertions(+), 7 deletions(-)
>>>
>>>diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>>>index 0666287..5b54cf5 100644
>>>--- a/drivers/pci/pci-sysfs.c
>>>+++ b/drivers/pci/pci-sysfs.c
>>>@@ -472,7 +472,9 @@ static ssize_t sriov_numvfs_store(struct device *dev,
>>> 				  const char *buf, size_t count)
>>> {
>>> 	struct pci_dev *pdev = to_pci_dev(dev);
>>>+	struct pci_sriov *iov = pdev->sriov;
>>> 	int ret;
>>>+
>>
>>Unnecessary change.
>>
>>> 	u16 num_vfs;
>>>
>>> 	ret = kstrtou16(buf, 0, &num_vfs);
>>>@@ -482,38 +484,46 @@ static ssize_t sriov_numvfs_store(struct device
>>*dev,
>>> 	if (num_vfs > pci_sriov_get_totalvfs(pdev))
>>> 		return -ERANGE;
>>>
>>>+	mutex_lock(&iov->dev->sriov->lock);
>>>+
>>> 	if (num_vfs == pdev->sriov->num_VFs)
>>>-		return count;		/* no change */
>>>+		goto exit;
>>>
>>> 	/* is PF driver loaded w/callback */
>>> 	if (!pdev->driver || !pdev->driver->sriov_configure) {
>>> 		dev_info(&pdev->dev, "Driver doesn't support SRIOV
>>configuration via sysfs\n");
>>>-		return -ENOSYS;
>>>+		ret = -EINVAL;
>>>+		goto exit;
>>
>>Why we need change the error code here?
>
>checkpatch was complaining about the use of the ENOSYS error code being specific
>and even though it was not my patch introducing it I had to change it to shut it up.
>

Right, it's reserved for attempt to call nonexisting syscall, but I think ENOENT
might be more indicative than EINVAL in this specific case?

>>> 	}
>>>
>>> 	if (num_vfs == 0) {
>>> 		/* disable VFs */
>>> 		ret = pdev->driver->sriov_configure(pdev, 0);
>>>-		if (ret < 0)
>>>-			return ret;
>>>-		return count;
>>>+		goto exit;
>>> 	}
>>>
>>> 	/* enable VFs */
>>> 	if (pdev->sriov->num_VFs) {
>>> 		dev_warn(&pdev->dev, "%d VFs already enabled. Disable before enabling %d VFs\n",
>>> 			 pdev->sriov->num_VFs, num_vfs);
>>>-		return -EBUSY;
>>>+		ret = -EBUSY;
>>>+		goto exit;
>>> 	}
>>>
>>> 	ret = pdev->driver->sriov_configure(pdev, num_vfs);
>>> 	if (ret < 0)
>>>-		return ret;
>>>+		goto exit;
>>>
>>> 	if (ret != num_vfs)
>>> 		dev_warn(&pdev->dev, "%d VFs requested; only %d enabled\n",
>>> 			 num_vfs, ret);
>>>
>>>+exit:
>>>+	mutex_unlock(&iov->dev->sriov->lock);
>>>+
>>>+	if (ret < 0)
>>>+		return ret;
>>>+
>>> 	return count;
>>
>>The code might be clearer if @ret is returned here. In that case, We need
>>set it properly in error paths.
>
>I played with different ways to handle this and this seemed the least intrusive.
>

Ok, both should be fine.

Thanks,
Gavin

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] tools: psock_lib: tighten conditions checked in sock_setfilter
From: Daniel Borkmann @ 2017-01-04 22:59 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: linux-kselftest, netdev, willemb, davem, shuah
In-Reply-To: <20170104224848.GB31756@oracle.com>

On 01/04/2017 11:48 PM, Sowmini Varadhan wrote:
> On (01/04/17 23:26), Daniel Borkmann wrote:
[...]
>>>> As it stands it makes it a bit harder to parse / less readable with macros
>>>> actually. Rest seems fine, thanks.
>
> Usually macros are there (a) as an abstraction so you
> dont have to hard-code things, and, (b) to make things
> more readable. (maybe that's why the 1992 VJ paper on
> BPF came up with these macros?)
>
> I think we differ on code-aesthetics (not correctness) here.
> It was not immediately obvious to me that "0x15 is actually
> BPF_JMP + BPF_JEQ + BPF_K" etc, when I wanted to extend
> the bpf_prog to harden the checks in the existing code.
>
> Would it be ok to leave the extremely subjective
> "make this more readable" part for you to tackle later?
> Or I can just drop patch1, and you can fix it to your
> satisfaction later.

I think we're talking past each other (?), my suggestion
from my original email was to use bpf_asm and paste the
(human readable) program as a comment above as done also
elsewhere. But just leave it as it is then, no big deal
either.

^ permalink raw reply

* Re: [PATCH v3 net-next 1/2] tools: psock_lib: tighten conditions checked in sock_setfilter
From: Shuah Khan @ 2017-01-04 23:26 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: linux-kselftest, netdev, daniel, willemb, davem, Shuah Khan
In-Reply-To: <20170104225554.GD31756@oracle.com>

On 01/04/2017 03:55 PM, Sowmini Varadhan wrote:
> On (01/04/17 15:37), Shuah Khan wrote:
>>> +	/* the filter below checks for all of the following conditions that
>>> +	 * are based on the contents of create_payload()
>>> +	 *  ether type 0x800 and
>>> +	 *  ip proto udp     and
>>> +	 *  ip len == ip_len and
>>> +	 *  udp[38] == 'a' or udp[38] == 'b'
>>> +	 */
>>
>> Looks like you have to do v4 anyway, please make sure your comment
>> block is one of the acceptable formats based on coding style:
>>
>> https://marc.info/?l=linux-crypto-vger&m=146799837129319&w=2
> 
> BTW, the above is conformant with the comment style required for
> networking:
> 
> https://www.kernel.org/doc/Documentation/networking/netdev-FAQ.txt
> 
> which seems to be used in psock_fanout.c and reuseport_bpf.c as well.

I would like to see the comment blocks in selftest consistent with the
Kernel coding style.

> 
> Thanks
> --Sowmini
> 

Could you please split this patch into two. Hardening part in one and
the cleanup in a separate patch. This way I can get the hardening fix
into 4.10 in my next Kselftest update. Cleanup patch can go in later.

thanks,
-- Shuah

^ permalink raw reply

* RE: [PATCH] scm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct {compat_}cmsghdr))
From: YUAN Linyu @ 2017-01-04 23:45 UTC (permalink / raw)
  To: David Miller, cugyly@163.com; +Cc: netdev@vger.kernel.org
In-Reply-To: <20170104.132452.646592152519338774.davem@davemloft.net>

Thanks

> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, January 05, 2017 2:25 AM
> To: cugyly@163.com
> Cc: netdev@vger.kernel.org; YUAN Linyu
> Subject: Re: [PATCH] scm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct
> {compat_}cmsghdr))
> 
> From: yuan linyu <cugyly@163.com>
> Date: Tue,  3 Jan 2017 20:42:17 +0800
> 
> > From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
> >
> > sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) already aligned.
> > remove use CMSG_ALIGN(sizeof(struct cmsghdr)) and
> > CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) keep code
> consistent.
> >
> > Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
> 
> Applied, and I added the following commit which will make sure our
> analysis is accurate.
> 
> ====================
> [PATCH] net: Assert at build time the assumptions we make about the CMSG
> header.
> 
> It must always be the case that CMSG_ALIGN(sizeof(hdr)) == sizeof(hdr).
> 
> Otherwise there are missing adjustments in the various calculations
> that parse and build these things.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/compat.c | 3 +++
>  net/socket.c | 2 ++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/net/compat.c b/net/compat.c
> index 4e27dd1..ba3ac72 100644
> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -130,6 +130,9 @@ int cmsghdr_from_user_compat_to_kern(struct
> msghdr *kmsg, struct sock *sk,
>  	__kernel_size_t kcmlen, tmp;
>  	int err = -EFAULT;
> 
> +	BUILD_BUG_ON(sizeof(struct compat_cmsghdr) !=
> +		     CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)));
> +
>  	kcmlen = 0;
>  	kcmsg_base = kcmsg = (struct cmsghdr *)stackbuf;
>  	ucmsg = CMSG_COMPAT_FIRSTHDR(kmsg);
> diff --git a/net/socket.c b/net/socket.c
> index 8487bf1..5f3b5a2 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -1948,6 +1948,8 @@ static int ___sys_sendmsg(struct socket *sock,
> struct user_msghdr __user *msg,
>  		ctl_buf = msg_sys->msg_control;
>  		ctl_len = msg_sys->msg_controllen;
>  	} else if (ctl_len) {
> +		BUILD_BUG_ON(sizeof(struct cmsghdr) !=
> +			     CMSG_ALIGN(sizeof(struct cmsghdr)));
>  		if (ctl_len > sizeof(ctl)) {
>  			ctl_buf = sock_kmalloc(sock->sk, ctl_len, GFP_KERNEL);
>  			if (ctl_buf == NULL)
> --
> 2.4.11

^ permalink raw reply

* [PATCH net-next] liquidio: fix wrong information about channels reported to ethtool
From: Felix Manlunas @ 2017-01-05  0:18 UTC (permalink / raw)
  To: davem; +Cc: netdev, raghu.vatsavayi, derek.chickles, satananda.burla

From: Weilin Chang <weilin.chang@cavium.com>

Information reported to ethtool about channels is sometimes wrong for PF,
and always wrong for VF.  Fix them by getting the information from the
right fields from the right structs.

Signed-off-by: Weilin Chang <weilin.chang@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
---
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
index b00c300..50384ce 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
@@ -296,12 +296,16 @@ lio_ethtool_get_channels(struct net_device *dev,
 		rx_count = CFG_GET_NUM_RXQS_NIC_IF(conf6x, lio->ifidx);
 		tx_count = CFG_GET_NUM_TXQS_NIC_IF(conf6x, lio->ifidx);
 	} else if (OCTEON_CN23XX_PF(oct)) {
-		struct octeon_config *conf23 = CHIP_CONF(oct, cn23xx_pf);
 
-		max_rx = CFG_GET_OQ_MAX_Q(conf23);
-		max_tx = CFG_GET_IQ_MAX_Q(conf23);
-		rx_count = CFG_GET_NUM_RXQS_NIC_IF(conf23, lio->ifidx);
-		tx_count = CFG_GET_NUM_TXQS_NIC_IF(conf23, lio->ifidx);
+		max_rx = oct->sriov_info.num_pf_rings;
+		max_tx = oct->sriov_info.num_pf_rings;
+		rx_count = lio->linfo.num_rxpciq;
+		tx_count = lio->linfo.num_txpciq;
+	} else if (OCTEON_CN23XX_VF(oct)) {
+		max_tx = oct->sriov_info.rings_per_vf;
+		max_rx = oct->sriov_info.rings_per_vf;
+		rx_count = lio->linfo.num_rxpciq;
+		tx_count = lio->linfo.num_txpciq;
 	}
 
 	channel->max_rx = max_rx;

^ permalink raw reply related

* RE: [PATCH] stmmac: Enable Clause 45 PHYs in GMAC4 (eQOS)
From: Kweh, Hock Leong @ 2017-01-05  1:37 UTC (permalink / raw)
  To: Joao Pinto, davem@davemloft.net; +Cc: netdev@vger.kernel.org
In-Reply-To: <14201d58c172f7a65b03510d36d55170f4d16cd5.1483540094.git.jpinto@synopsys.com>

> -----Original Message-----
> From: Joao Pinto [mailto:Joao.Pinto@synopsys.com]
> Sent: Wednesday, January 04, 2017 10:36 PM
> To: davem@davemloft.net
> Cc: Kweh, Hock Leong <hock.leong.kweh@intel.com>; netdev@vger.kernel.org;
> Joao Pinto <Joao.Pinto@synopsys.com>
> Subject: [PATCH] stmmac: Enable Clause 45 PHYs in GMAC4 (eQOS)
> 
> The eQOS IP Core (best known in stmmac as gmac4) has a register that must be
> set if using a Clause 45 PHY. If this register is not set, the PHY won't work.
> This patch will have no impact in setups using Clause 22 PHYs.
> 
> Signed-off-by: Joao Pinto <jpinto@synopsys.com>

Hi Joao,

This is not working on our environment. We are using the 4-ETH-4-MGB-101 plugin card.

Regards,
Wilson

> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> index b0344c2..676ae3c 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
> @@ -41,6 +41,7 @@
>  #define MII_GMAC4_GOC_SHIFT		2
>  #define MII_GMAC4_WRITE			(1 << MII_GMAC4_GOC_SHIFT)
>  #define MII_GMAC4_READ			(3 << MII_GMAC4_GOC_SHIFT)
> +#define MII_CLAUSE45_PHY		(1 << 1)
> 
>  static int stmmac_mdio_busy_wait(void __iomem *ioaddr, unsigned int
> mii_addr)  { @@ -125,7 +126,7 @@ static int stmmac_mdio_write(struct
> mii_bus *bus, int phyaddr, int phyreg,
>  	value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
>  		& priv->hw->mii.clk_csr_mask;
>  	if (priv->plat->has_gmac4)
> -		value |= MII_GMAC4_WRITE;
> +		value |= MII_GMAC4_WRITE | MII_CLAUSE45_PHY;
>  	else
>  		value |= MII_WRITE;
> 
> --
> 2.9.3

^ permalink raw reply

* [PATCH net-next] packet: fix panic in __packet_set_timestamp on tpacket_v3 in tx mode
From: Daniel Borkmann @ 2017-01-05  1:34 UTC (permalink / raw)
  To: davem; +Cc: sowmini.varadhan, willemb, netdev, Daniel Borkmann

When TX timestamping is in use with TPACKET_V3's TX ring, then we'll
hit the BUG() in __packet_set_timestamp() when ring buffer slot is
returned to user space via tpacket_destruct_skb(). This is due to v3
being assumed as unreachable here, but since 7f953ab2ba46 ("af_packet:
TX_RING support for TPACKET_V3") it's not anymore. Fix it by filling
the timestamp back into the ring slot.

Fixes: 7f953ab2ba46 ("af_packet: TX_RING support for TPACKET_V3")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 net/packet/af_packet.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 7e39087..ddbda25 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -481,6 +481,9 @@ static __u32 __packet_set_timestamp(struct packet_sock *po, void *frame,
 		h.h2->tp_nsec = ts.tv_nsec;
 		break;
 	case TPACKET_V3:
+		h.h3->tp_sec = ts.tv_sec;
+		h.h3->tp_nsec = ts.tv_nsec;
+		break;
 	default:
 		WARN(1, "TPACKET version not supported.\n");
 		BUG();
-- 
2.5.5

^ permalink raw reply related

* Re: [PATCH net-next] packet: fix panic in __packet_set_timestamp on tpacket_v3 in tx mode
From: Sowmini Varadhan @ 2017-01-05  2:10 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, willemb, netdev
In-Reply-To: <1483580068-13854-1-git-send-email-daniel@iogearbox.net>

On (01/05/17 02:34), Daniel Borkmann wrote:
> When TX timestamping is in use with TPACKET_V3's TX ring, then we'll
> hit the BUG() in __packet_set_timestamp() when ring buffer slot is
> returned to user space via tpacket_destruct_skb(). This is due to v3
> being assumed as unreachable here, but since 7f953ab2ba46 ("af_packet:
> TX_RING support for TPACKET_V3") it's not anymore. Fix it by filling
> the timestamp back into the ring slot.

Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>

--Sowmini

^ permalink raw reply

* Re: [net PATCH] net: virtio: cap mtu when XDP programs are running
From: Jason Wang @ 2017-01-05  3:09 UTC (permalink / raw)
  To: John Fastabend, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel
In-Reply-To: <586D458F.5050705@gmail.com>



On 2017年01月05日 02:57, John Fastabend wrote:
> [...]
>
>> On 2017年01月04日 00:48, John Fastabend wrote:
>>> On 17-01-02 10:14 PM, Jason Wang wrote:
>>>> On 2017年01月03日 06:30, John Fastabend wrote:
>>>>> XDP programs can not consume multiple pages so we cap the MTU to
>>>>> avoid this case. Virtio-net however only checks the MTU at XDP
>>>>> program load and does not block MTU changes after the program
>>>>> has loaded.
>>>>>
>>>>> This patch sets/clears the max_mtu value at XDP load/unload time.
>>>>>
>>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>>>> ---
> [...]
>
>>> OK so this logic is a bit too simply. When it resets the max_mtu I guess it
>>> needs to read the mtu via
>>>
>>>      virtio_cread16(vdev, ...)
>>>
>>> or we may break the negotiated mtu.
>> Yes, this is a problem (even use ETH_MAX_MTU). We may need a method to notify
>> the device about the mtu in this case which is not supported by virtio now.
> Note this is not really a XDP specific problem. The guest can change the MTU
> after init time even without XDP which I assume should ideally result in a
> notification if the MTU is negotiated.

Yes, Michael, do you think we need add some mechanism to notify host 
about MTU change in this case?

Thanks

^ permalink raw reply

* Re: [RFC PATCH] virtio_net: XDP support for adjust_head
From: Jason Wang @ 2017-01-05  3:10 UTC (permalink / raw)
  To: John Fastabend, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel
In-Reply-To: <586D45DF.401@gmail.com>



On 2017年01月05日 02:58, John Fastabend wrote:
> [...]
>
>>> @@ -393,34 +397,39 @@ static u32 do_xdp_prog(struct virtnet_info *vi,
>>>                   struct bpf_prog *xdp_prog,
>>>                   void *data, int len)
>>>    {
>>> -    int hdr_padded_len;
>>>        struct xdp_buff xdp;
>>> -    void *buf;
>>>        unsigned int qp;
>>>        u32 act;
>>>    +
>>>        if (vi->mergeable_rx_bufs) {
>>> -        hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
>>> -        xdp.data = data + hdr_padded_len;
>>> +        int desc_room = sizeof(struct virtio_net_hdr_mrg_rxbuf);
>>> +
>>> +        /* Allow consuming headroom but reserve enough space to push
>>> +         * the descriptor on if we get an XDP_TX return code.
>>> +         */
>>> +        xdp.data_hard_start = data - vi->headroom + desc_room;
>>> +        xdp.data = data + desc_room;
>>>            xdp.data_end = xdp.data + (len - vi->hdr_len);
>>> -        buf = data;
>>>        } else { /* small buffers */
>>>            struct sk_buff *skb = data;
>>>    -        xdp.data = skb->data;
>>> +        xdp.data_hard_start = skb->data;
>>> +        xdp.data = skb->data + vi->headroom;
>>>            xdp.data_end = xdp.data + len;
>>> -        buf = skb->data;
>>>        }
>>>          act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>        switch (act) {
>>>        case XDP_PASS:
>>> +        if (!vi->mergeable_rx_bufs)
>>> +            __skb_pull((struct sk_buff *) data,
>>> +                   xdp.data - xdp.data_hard_start);
>> Instead of doing things here and virtnet_xdp_xmit(). How about always making
>> skb->data point to the buffer head like:
>>
>> 1) reserve headroom in add_recvbuf_small()
>> 2) skb_push(xdp->data - xdp_data_hard_start, skb) if we detect xdp->data was
>> modified afer bpf_prog_run_xdp()
>>
>> Then there's no special code in either XDP_PASS or XDP_TX?
>>
> Alternatively moving the pull into the receive_small XDP handler also
> removes the special case. I'll submit a patch shortly let me know what
> you think.
>

I'm ok with this.

Thanks

^ permalink raw reply

* [net PATCH 1/2] virtio_net: cap mtu when XDP programs are running
From: John Fastabend @ 2017-01-05  3:11 UTC (permalink / raw)
  To: jasowang, mst; +Cc: john.r.fastabend, netdev, john.fastabend

XDP programs can not consume multiple pages so we cap the MTU to
avoid this case. Virtio-net however only checks the MTU at XDP
program load and does not block MTU changes after the program
has loaded.

This patch sets/clears the max_mtu value at XDP load/unload time.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4a10500..261103d9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1699,11 +1699,28 @@ static void virtnet_init_settings(struct net_device *dev)
 	.set_settings = virtnet_set_settings,
 };
 
+#define MIN_MTU ETH_MIN_MTU
+#define MAX_MTU ETH_MAX_MTU
+
+static unsigned long int virtnet_xdp_mtu(struct bpf_prog *prog,
+					 struct virtnet_info *vi)
+{
+	if (!prog && virtio_has_feature(vi->vdev, VIRTIO_NET_F_MTU))
+		return virtio_cread16(vi->vdev,
+				      offsetof(struct virtio_net_config, mtu));
+	else if (!prog)
+		return ETH_MAX_MTU;
+	else if (vi->mergeable_rx_bufs)
+		return PAGE_SIZE - sizeof(struct padded_vnet_hdr);
+	else
+		return GOOD_PACKET_LEN;
+}
+
 static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 {
-	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct bpf_prog *old_prog;
+	unsigned long int max_sz;
 	u16 xdp_qp = 0, curr_qp;
 	int i, err;
 
@@ -1720,6 +1737,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 		return -EINVAL;
 	}
 
+	max_sz = virtnet_xdp_mtu(prog, vi);
 	if (dev->mtu > max_sz) {
 		netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
 		return -EINVAL;
@@ -1748,6 +1766,9 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 			virtnet_set_queues(vi, curr_qp);
 			return PTR_ERR(prog);
 		}
+		dev->max_mtu = max_sz;
+	} else {
+		dev->max_mtu = ETH_MAX_MTU;
 	}
 
 	vi->xdp_queue_pairs = xdp_qp;
@@ -2133,9 +2154,6 @@ static bool virtnet_validate_features(struct virtio_device *vdev)
 	return true;
 }
 
-#define MIN_MTU ETH_MIN_MTU
-#define MAX_MTU ETH_MAX_MTU
-
 static int virtnet_probe(struct virtio_device *vdev)
 {
 	int i, err;

^ permalink raw reply related

* [net PATCH 2/2] virtio_net: use dev_kfree_skb for small buffer XDP receive
From: John Fastabend @ 2017-01-05  3:11 UTC (permalink / raw)
  To: jasowang, mst; +Cc: john.r.fastabend, netdev, john.fastabend
In-Reply-To: <20170105031118.2636.82374.stgit@john-Precision-Tower-5810>

In the small buffer case during driver unload we currently use
put_page instead of dev_kfree_skb. Resolve this by adding a check
for virtnet mode when checking XDP queue type. Also name the
function so that the code reads correctly to match the additional
check.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 261103d9..a224d3e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1911,8 +1911,12 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 			put_page(vi->rq[i].alloc_frag.page);
 }
 
-static bool is_xdp_queue(struct virtnet_info *vi, int q)
+static bool is_xdp_raw_buffer_queue(struct virtnet_info *vi, int q)
 {
+	/* For small receive mode always use kfree_skb variants */
+	if (!vi->mergeable_rx_bufs)
+		return false;
+
 	if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
 		return false;
 	else if (q < vi->curr_queue_pairs)
@@ -1929,7 +1933,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->sq[i].vq;
 		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-			if (!is_xdp_queue(vi, i))
+			if (!is_xdp_raw_buffer_queue(vi, i))
 				dev_kfree_skb(buf);
 			else
 				put_page(virt_to_head_page(buf));

^ permalink raw reply related

* Re: [net PATCH 1/2] virtio_net: cap mtu when XDP programs are running
From: John Fastabend @ 2017-01-05  3:14 UTC (permalink / raw)
  To: jasowang, mst; +Cc: john.r.fastabend, netdev
In-Reply-To: <20170105031118.2636.82374.stgit@john-Precision-Tower-5810>

On 17-01-04 07:11 PM, John Fastabend wrote:
> XDP programs can not consume multiple pages so we cap the MTU to
> avoid this case. Virtio-net however only checks the MTU at XDP
> program load and does not block MTU changes after the program
> has loaded.
> 
> This patch sets/clears the max_mtu value at XDP load/unload time.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |   26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)


[...]

>  static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  {
> -	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	struct bpf_prog *old_prog;
> +	unsigned long int max_sz;
>  	u16 xdp_qp = 0, curr_qp;
>  	int i, err;
>  
> @@ -1720,6 +1737,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  		return -EINVAL;
>  	}
>  
> +	max_sz = virtnet_xdp_mtu(prog, vi);
>  	if (dev->mtu > max_sz) {
>  		netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
>  		return -EINVAL;
> @@ -1748,6 +1766,9 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  			virtnet_set_queues(vi, curr_qp);
>  			return PTR_ERR(prog);
>  		}
> +		dev->max_mtu = max_sz;
> +	} else {
> +		dev->max_mtu = ETH_MAX_MTU;
>  	}

oops v2 needed. The else branch is not required anymore. :/

^ permalink raw reply

* Re: [net PATCH 1/2] virtio_net: cap mtu when XDP programs are running
From: Michael S. Tsirkin @ 2017-01-05  3:18 UTC (permalink / raw)
  To: John Fastabend; +Cc: jasowang, john.r.fastabend, netdev
In-Reply-To: <20170105031118.2636.82374.stgit@john-Precision-Tower-5810>

On Wed, Jan 04, 2017 at 07:11:18PM -0800, John Fastabend wrote:
> XDP programs can not consume multiple pages so we cap the MTU to
> avoid this case. Virtio-net however only checks the MTU at XDP
> program load and does not block MTU changes after the program
> has loaded.

Do drivers really have to tweak max mtu all the time?
Seems strange, I would say drivers just report device caps
and net core enforces rules.
Can't net core do these checks?

> 
> This patch sets/clears the max_mtu value at XDP load/unload time.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |   26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4a10500..261103d9 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1699,11 +1699,28 @@ static void virtnet_init_settings(struct net_device *dev)
>  	.set_settings = virtnet_set_settings,
>  };
>  
> +#define MIN_MTU ETH_MIN_MTU
> +#define MAX_MTU ETH_MAX_MTU
> +
> +static unsigned long int virtnet_xdp_mtu(struct bpf_prog *prog,
> +					 struct virtnet_info *vi)
> +{
> +	if (!prog && virtio_has_feature(vi->vdev, VIRTIO_NET_F_MTU))
> +		return virtio_cread16(vi->vdev,
> +				      offsetof(struct virtio_net_config, mtu));
> +	else if (!prog)
> +		return ETH_MAX_MTU;
> +	else if (vi->mergeable_rx_bufs)
> +		return PAGE_SIZE - sizeof(struct padded_vnet_hdr);
> +	else
> +		return GOOD_PACKET_LEN;
> +}
> +
>  static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  {
> -	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	struct bpf_prog *old_prog;
> +	unsigned long int max_sz;
>  	u16 xdp_qp = 0, curr_qp;
>  	int i, err;
>  
> @@ -1720,6 +1737,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  		return -EINVAL;
>  	}
>  
> +	max_sz = virtnet_xdp_mtu(prog, vi);
>  	if (dev->mtu > max_sz) {
>  		netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
>  		return -EINVAL;
> @@ -1748,6 +1766,9 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  			virtnet_set_queues(vi, curr_qp);
>  			return PTR_ERR(prog);
>  		}
> +		dev->max_mtu = max_sz;
> +	} else {
> +		dev->max_mtu = ETH_MAX_MTU;
>  	}
>  
>  	vi->xdp_queue_pairs = xdp_qp;
> @@ -2133,9 +2154,6 @@ static bool virtnet_validate_features(struct virtio_device *vdev)
>  	return true;
>  }
>  
> -#define MIN_MTU ETH_MIN_MTU
> -#define MAX_MTU ETH_MAX_MTU
> -
>  static int virtnet_probe(struct virtio_device *vdev)
>  {
>  	int i, err;

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox