Netdev List
 help / color / mirror / Atom feed
* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-19  7:51 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <6a6bac84-1d29-2740-1636-d3adb26b6bcc@nvidia.com>

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/18/2019, 10:16:20 (UTC+00:00)

> Have you tried using NFS on a board with this ethernet controller?

I'm having some issues setting up the NFS server in order to replicate 
so this may take some time.

Are you able to add some debug in stmmac_init_rx_buffers() to see what's 
the buffer address ?

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH AUTOSEL 5.2 226/249] selftests: bpf: fix inlines in test_lwt_seg6local
From: Jiri Benc @ 2019-07-19  7:54 UTC (permalink / raw)
  To: David Miller
  Cc: sashal, linux-kernel, stable, yhs, daniel, linux-kselftest,
	netdev, bpf, clang-built-linux
In-Reply-To: <20190718.115534.1778444973119064345.davem@davemloft.net>

On Thu, 18 Jul 2019 11:55:34 -0700 (PDT), David Miller wrote:
> It has a significant impact on automated testing which lots of
> individuals and entities perform, therefore I think it very much is
> -stable material.

Okay.

Thanks,

 Jiri

^ permalink raw reply

* Re: [PATCH net,v4 1/4] net: openvswitch: rename flow_stats to sw_flow_stats
From: Jiri Pirko @ 2019-07-19  7:58 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev, jakub.kicinski, pshelar
In-Reply-To: <20190718175931.13529-2-pablo@netfilter.org>

Thu, Jul 18, 2019 at 07:59:28PM CEST, pablo@netfilter.org wrote:
>There is a flow_stats structure defined in include/net/flow_offload.h
>and a follow up patch adds #include <net/flow_offload.h> to
>net/sch_generic.h.
>
>This breaks compilation since OVS codebase includes net/sock.h which
>pulls in linux/filter.h which includes net/sch_generic.h.
>
>In file included from ./include/net/sch_generic.h:18:0,
>                 from ./include/linux/filter.h:25,
>                 from ./include/net/sock.h:59,
>                 from ./include/linux/tcp.h:19,
>                 from net/openvswitch/datapath.c:24
>
>This definition takes precedence to OVS since it is placed in the
>networking core, so rename flow_stats in OVS to sw_flow_stats since
>this structure is contained in the sw_flow object.
>
>Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net,v4 4/4] net: flow_offload: add flow_block structure and use it
From: Jiri Pirko @ 2019-07-19  8:01 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev, jakub.kicinski, pshelar
In-Reply-To: <20190718175931.13529-5-pablo@netfilter.org>

Thu, Jul 18, 2019 at 07:59:31PM CEST, pablo@netfilter.org wrote:
>This object stores the flow block callbacks that are attached to this
>block. Update flow_block_cb_lookup() to take this new object.
>
>This patch restores the block sharing feature.
>
>Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions")
>Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Stefano Garzarella @ 2019-07-19  8:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190718072741-mutt-send-email-mst@kernel.org>

On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > the length in the packet header.
> > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > >
> > > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > >
> > > > > So how does it work right now? If an app
> > > > > does sendmsg with a 64K buffer and the other
> > > > > side publishes 4K buffers - does it just stall?
> > > >
> > > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > >
> > > > After this series, we will queue up to 64K packets and then it will be split in
> > > > the TX worker, depending on the size of the buffers available in the
> > > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > > for now we postponed it)
> > >
> > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > Why is it done like this?
> > 
> > Honestly, I don't know the exact reasons for this design, but I suppose
> > that the idea was to have only one worker that uses the vring, and
> > multiple user threads that enqueue packets in the list.
> > This can simplify the code and we can put the user threads to sleep if
> > we don't have "credit" available (this means that the receiver doesn't
> > have space to receive the packet).
> 
> 
> I think you mean the reverse: even without credits you can copy from
> user and queue up data, then process it without waking up the user
> thread.

I checked the code better, but it doesn't seem to do that.
The .sendmsg callback of af_vsock, check if the transport has space
(virtio-vsock transport returns the credit available). If there is no
space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.

When the transport receives an update of credit available on the other
peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
sleeping, that will queue the new packet.

So, in the current implementation, the TX worker doesn't check the
credit available, it only sends the packets.

> Does it help though? It certainly adds up work outside of
> user thread context which means it's not accounted for
> correctly.

I can try to xmit the packet directly in the user thread context, to see
the improvements.

> 
> Maybe we want more VQs. Would help improve parallelism. The question
> would then become how to map sockets to VQs. With a simple hash
> it's easy to create collisions ...

Yes, more VQs can help but the map question is not simple to answer.
Maybe we can do an hash on the (cid, port) or do some kind of estimation
of queue utilization and try to balance.
Should the mapping be unique?

> 
> 
> > 
> > What are the drawbacks in your opinion?
> > 
> > 
> > Thanks,
> > Stefano
> 
> - More pressure on scheduler
> - Increased latency
> 

Thanks,
Stefano

^ permalink raw reply

* [PATCH v2] vrf: make sure skb->data contains ip header to make routing
From: Peter Kosyh @ 2019-07-19  8:11 UTC (permalink / raw)
  To: David Ahern; +Cc: davem, Peter Kosyh, Shrijeet Mukherjee, netdev, linux-kernel
In-Reply-To: <213bada2-fe81-3c14-1506-11abf0f3ca22@cumulusnetworks.com>

vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
using ip/ipv6 addresses, but don't make sure the header is available
in skb->data[] (skb_headlen() is less then header size).

Case:

1) igb driver from intel.
2) Packet size is greater then 255.
3) MPLS forwards to VRF device.

So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound()
functions.

Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
---
 drivers/net/vrf.c | 58 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 54edf8956a25..6e84328bdd40 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -165,23 +165,29 @@ static int vrf_ip6_local_out(struct net *net, struct sock *sk,
 static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 					   struct net_device *dev)
 {
-	const struct ipv6hdr *iph = ipv6_hdr(skb);
+	const struct ipv6hdr *iph;
 	struct net *net = dev_net(skb->dev);
-	struct flowi6 fl6 = {
-		/* needed to match OIF rule */
-		.flowi6_oif = dev->ifindex,
-		.flowi6_iif = LOOPBACK_IFINDEX,
-		.daddr = iph->daddr,
-		.saddr = iph->saddr,
-		.flowlabel = ip6_flowinfo(iph),
-		.flowi6_mark = skb->mark,
-		.flowi6_proto = iph->nexthdr,
-		.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF,
-	};
+	struct flowi6 fl6;
 	int ret = NET_XMIT_DROP;
 	struct dst_entry *dst;
 	struct dst_entry *dst_null = &net->ipv6.ip6_null_entry->dst;
 
+	if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct ipv6hdr)))
+		goto err;
+
+	iph = ipv6_hdr(skb);
+
+	memset(&fl6, 0, sizeof(fl6));
+	/* needed to match OIF rule */
+	fl6.flowi6_oif = dev->ifindex;
+	fl6.flowi6_iif = LOOPBACK_IFINDEX;
+	fl6.daddr = iph->daddr;
+	fl6.saddr = iph->saddr;
+	fl6.flowlabel = ip6_flowinfo(iph);
+	fl6.flowi6_mark = skb->mark;
+	fl6.flowi6_proto = iph->nexthdr;
+	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
+
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (dst == dst_null)
 		goto err;
@@ -237,21 +243,27 @@ static int vrf_ip_local_out(struct net *net, struct sock *sk,
 static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 					   struct net_device *vrf_dev)
 {
-	struct iphdr *ip4h = ip_hdr(skb);
+	struct iphdr *ip4h;
 	int ret = NET_XMIT_DROP;
-	struct flowi4 fl4 = {
-		/* needed to match OIF rule */
-		.flowi4_oif = vrf_dev->ifindex,
-		.flowi4_iif = LOOPBACK_IFINDEX,
-		.flowi4_tos = RT_TOS(ip4h->tos),
-		.flowi4_flags = FLOWI_FLAG_ANYSRC | FLOWI_FLAG_SKIP_NH_OIF,
-		.flowi4_proto = ip4h->protocol,
-		.daddr = ip4h->daddr,
-		.saddr = ip4h->saddr,
-	};
+	struct flowi4 fl4;
 	struct net *net = dev_net(vrf_dev);
 	struct rtable *rt;
 
+	if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct iphdr)))
+		goto err;
+
+	ip4h = ip_hdr(skb);
+
+	memset(&fl4, 0, sizeof(fl4));
+	/* needed to match OIF rule */
+	fl4.flowi4_oif = vrf_dev->ifindex;
+	fl4.flowi4_iif = LOOPBACK_IFINDEX;
+	fl4.flowi4_tos = RT_TOS(ip4h->tos);
+	fl4.flowi4_flags = FLOWI_FLAG_ANYSRC | FLOWI_FLAG_SKIP_NH_OIF;
+	fl4.flowi4_proto = ip4h->protocol;
+	fl4.daddr = ip4h->daddr;
+	fl4.saddr = ip4h->saddr;
+
 	rt = ip_route_output_flow(net, &fl4, NULL);
 	if (IS_ERR(rt))
 		goto err;
-- 
2.11.0


^ permalink raw reply related

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Jason Wang @ 2019-07-19  8:21 UTC (permalink / raw)
  To: Stefano Garzarella, Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, kvm
In-Reply-To: <20190719080832.7hoeus23zjyrx3cc@steredhat>


On 2019/7/19 下午4:08, Stefano Garzarella wrote:
> On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
>> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
>>> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>>> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
>>>>> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>>>>> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
>>>>>>> If the packets to sent to the guest are bigger than the buffer
>>>>>>> available, we can split them, using multiple buffers and fixing
>>>>>>> the length in the packet header.
>>>>>>> This is safe since virtio-vsock supports only stream sockets.
>>>>>>>
>>>>>>> Signed-off-by: Stefano Garzarella<sgarzare@redhat.com>
>>>>>> So how does it work right now? If an app
>>>>>> does sendmsg with a 64K buffer and the other
>>>>>> side publishes 4K buffers - does it just stall?
>>>>> Before this series, the 64K (or bigger) user messages was split in 4K packets
>>>>> (fixed in the code) and queued in an internal list for the TX worker.
>>>>>
>>>>> After this series, we will queue up to 64K packets and then it will be split in
>>>>> the TX worker, depending on the size of the buffers available in the
>>>>> vring. (The idea was to allow EWMA or a configuration of the buffers size, but
>>>>> for now we postponed it)
>>>> Got it. Using workers for xmit is IMHO a bad idea btw.
>>>> Why is it done like this?
>>> Honestly, I don't know the exact reasons for this design, but I suppose
>>> that the idea was to have only one worker that uses the vring, and
>>> multiple user threads that enqueue packets in the list.
>>> This can simplify the code and we can put the user threads to sleep if
>>> we don't have "credit" available (this means that the receiver doesn't
>>> have space to receive the packet).
>> I think you mean the reverse: even without credits you can copy from
>> user and queue up data, then process it without waking up the user
>> thread.
> I checked the code better, but it doesn't seem to do that.
> The .sendmsg callback of af_vsock, check if the transport has space
> (virtio-vsock transport returns the credit available). If there is no
> space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
>
> When the transport receives an update of credit available on the other
> peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
> sleeping, that will queue the new packet.
>
> So, in the current implementation, the TX worker doesn't check the
> credit available, it only sends the packets.
>
>> Does it help though? It certainly adds up work outside of
>> user thread context which means it's not accounted for
>> correctly.
> I can try to xmit the packet directly in the user thread context, to see
> the improvements.


It will then looks more like what virtio-net (and other networking 
device) did.


>
>> Maybe we want more VQs. Would help improve parallelism. The question
>> would then become how to map sockets to VQs. With a simple hash
>> it's easy to create collisions ...
> Yes, more VQs can help but the map question is not simple to answer.
> Maybe we can do an hash on the (cid, port) or do some kind of estimation
> of queue utilization and try to balance.
> Should the mapping be unique?


It sounds to me you want some kind of fair queuing? We've already had 
several qdiscs that do this.

So if we use the kernel networking xmit path, all those issues could be 
addressed.

Thanks

>

^ permalink raw reply

* [PATCH] usbnet: smsc75xx: Merge memcpy + le32_to_cpus to get_unaligned_le32
From: Chuhong Yuan @ 2019-07-19  8:27 UTC (permalink / raw)
  Cc: Steve Glendinning, David S . Miller, netdev, linux-usb,
	linux-kernel, Chuhong Yuan

Merge the combo use of memcpy and le32_to_cpus.
Use get_unaligned_le32 instead.
This simplifies the code.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/usb/smsc75xx.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
index 1417a22962a1..7fac9db5380d 100644
--- a/drivers/net/usb/smsc75xx.c
+++ b/drivers/net/usb/smsc75xx.c
@@ -661,8 +661,7 @@ static void smsc75xx_status(struct usbnet *dev, struct urb *urb)
 		return;
 	}
 
-	memcpy(&intdata, urb->transfer_buffer, 4);
-	le32_to_cpus(&intdata);
+	intdata = get_unaligned_le32(urb->transfer_buffer);
 
 	netif_dbg(dev, link, dev->net, "intdata: 0x%08X\n", intdata);
 
@@ -2181,12 +2180,10 @@ static int smsc75xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 		struct sk_buff *ax_skb;
 		unsigned char *packet;
 
-		memcpy(&rx_cmd_a, skb->data, sizeof(rx_cmd_a));
-		le32_to_cpus(&rx_cmd_a);
+		rx_cmd_a = get_unaligned_le32(skb->data);
 		skb_pull(skb, 4);
 
-		memcpy(&rx_cmd_b, skb->data, sizeof(rx_cmd_b));
-		le32_to_cpus(&rx_cmd_b);
+		rx_cmd_b = get_unaligned_le32(skb->data);
 		skb_pull(skb, 4 + RXW_PADDING);
 
 		packet = skb->data;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
From: Stefano Garzarella @ 2019-07-19  8:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190718083105-mutt-send-email-mst@kernel.org>

On Thu, Jul 18, 2019 at 08:33:40AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 09:52:41AM +0200, Stefano Garzarella wrote:
> > On Wed, Jul 17, 2019 at 5:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> > > > Since now we are able to split packets, we can avoid limiting
> > > > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > > > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > > > packet size.
> > > >
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >
> > >
> > > OK so this is kind of like GSO where we are passing
> > > 64K packets to the vsock and then split at the
> > > low level.
> > 
> > Exactly, something like that in the Host->Guest path, instead in the
> > Guest->Host we use the entire 64K packet.
> > 
> > Thanks,
> > Stefano
> 
> btw two allocations for each packet isn't great. How about
> allocating the struct linearly with the data?

Are you referring to the kzalloc() to allocate the 'struct
virtio_vsock_pkt', followed by the kmalloc() to allocate the buffer?

Actually they don't look great, I will try to do a single allocation.

> And all buffers are same length for you - so you can actually
> do alloc_pages.

Yes, also Jason suggested it and we decided to postpone since we will
try to reuse the virtio-net where it comes for free.

> Allocating/freeing pages in a batch should also be considered.

For the allocation of guest rx buffers we do some kind of batching (we
refill the queue when it reaches the half), but only it this case :(

I'll try to do more alloc/free batching.

Thanks,
Stefano

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-19  8:37 UTC (permalink / raw)
  To: Jose Abreu, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <BN8PR12MB3266960A104A7CDBB4E59192D3CB0@BN8PR12MB3266.namprd12.prod.outlook.com>


On 19/07/2019 08:51, Jose Abreu wrote:
> From: Jon Hunter <jonathanh@nvidia.com>
> Date: Jul/18/2019, 10:16:20 (UTC+00:00)
> 
>> Have you tried using NFS on a board with this ethernet controller?
> 
> I'm having some issues setting up the NFS server in order to replicate 
> so this may take some time.

If that's the case, we may wish to consider reverting this for now as it
is preventing our board from booting. Appears to revert cleanly on top
of mainline.

> Are you able to add some debug in stmmac_init_rx_buffers() to see what's 
> the buffer address ?

If you have a debug patch you would like me to apply and test with I
can. However, it is best you prepare the patch as maybe I will not dump
the appropriate addresses.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Stefano Garzarella @ 2019-07-19  8:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, netdev, linux-kernel, Stefan Hajnoczi,
	David S. Miller, virtualization, kvm
In-Reply-To: <fcd19719-e5a9-adad-1e6c-c84487187088@redhat.com>

On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
> 
> On 2019/7/19 下午4:08, Stefano Garzarella wrote:
> > On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > > > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<mst@redhat.com>  wrote:
> > > > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
> > > > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > > > the length in the packet header.
> > > > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > > > > 
> > > > > > > > Signed-off-by: Stefano Garzarella<sgarzare@redhat.com>
> > > > > > > So how does it work right now? If an app
> > > > > > > does sendmsg with a 64K buffer and the other
> > > > > > > side publishes 4K buffers - does it just stall?
> > > > > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > > > > 
> > > > > > After this series, we will queue up to 64K packets and then it will be split in
> > > > > > the TX worker, depending on the size of the buffers available in the
> > > > > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > > > > for now we postponed it)
> > > > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > > > Why is it done like this?
> > > > Honestly, I don't know the exact reasons for this design, but I suppose
> > > > that the idea was to have only one worker that uses the vring, and
> > > > multiple user threads that enqueue packets in the list.
> > > > This can simplify the code and we can put the user threads to sleep if
> > > > we don't have "credit" available (this means that the receiver doesn't
> > > > have space to receive the packet).
> > > I think you mean the reverse: even without credits you can copy from
> > > user and queue up data, then process it without waking up the user
> > > thread.
> > I checked the code better, but it doesn't seem to do that.
> > The .sendmsg callback of af_vsock, check if the transport has space
> > (virtio-vsock transport returns the credit available). If there is no
> > space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
> > 
> > When the transport receives an update of credit available on the other
> > peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
> > sleeping, that will queue the new packet.
> > 
> > So, in the current implementation, the TX worker doesn't check the
> > credit available, it only sends the packets.
> > 
> > > Does it help though? It certainly adds up work outside of
> > > user thread context which means it's not accounted for
> > > correctly.
> > I can try to xmit the packet directly in the user thread context, to see
> > the improvements.
> 
> 
> It will then looks more like what virtio-net (and other networking device)
> did.

I'll try ASAP, the changes should not be too complicated... I hope :)

> 
> 
> > 
> > > Maybe we want more VQs. Would help improve parallelism. The question
> > > would then become how to map sockets to VQs. With a simple hash
> > > it's easy to create collisions ...
> > Yes, more VQs can help but the map question is not simple to answer.
> > Maybe we can do an hash on the (cid, port) or do some kind of estimation
> > of queue utilization and try to balance.
> > Should the mapping be unique?
> 
> 
> It sounds to me you want some kind of fair queuing? We've already had
> several qdiscs that do this.

Thanks for pointing it out!

> 
> So if we use the kernel networking xmit path, all those issues could be
> addressed.

One more point to AF_VSOCK + net-stack, but we have to evaluate possible
drawbacks in using the net-stack. (e.g. more latency due to the complexity
of the net-stack?)

Thanks,
Stefano

^ permalink raw reply

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-19  8:44 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <bc9ab3c5-b1b9-26d4-7b73-01474328eafa@nvidia.com>

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/19/2019, 09:37:49 (UTC+00:00)

> 
> On 19/07/2019 08:51, Jose Abreu wrote:
> > From: Jon Hunter <jonathanh@nvidia.com>
> > Date: Jul/18/2019, 10:16:20 (UTC+00:00)
> > 
> >> Have you tried using NFS on a board with this ethernet controller?
> > 
> > I'm having some issues setting up the NFS server in order to replicate 
> > so this may take some time.
> 
> If that's the case, we may wish to consider reverting this for now as it
> is preventing our board from booting. Appears to revert cleanly on top
> of mainline.
> 
> > Are you able to add some debug in stmmac_init_rx_buffers() to see what's 
> > the buffer address ?
> 
> If you have a debug patch you would like me to apply and test with I
> can. However, it is best you prepare the patch as maybe I will not dump
> the appropriate addresses.
> 
> Cheers
> Jon
> 
> -- 
> nvpublic

Send me full boot log please.

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-19  8:49 UTC (permalink / raw)
  To: Jose Abreu, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <BN8PR12MB3266989D15E017A789E14282D3CB0@BN8PR12MB3266.namprd12.prod.outlook.com>


On 19/07/2019 09:44, Jose Abreu wrote:
> From: Jon Hunter <jonathanh@nvidia.com>
> Date: Jul/19/2019, 09:37:49 (UTC+00:00)
> 
>>
>> On 19/07/2019 08:51, Jose Abreu wrote:
>>> From: Jon Hunter <jonathanh@nvidia.com>
>>> Date: Jul/18/2019, 10:16:20 (UTC+00:00)
>>>
>>>> Have you tried using NFS on a board with this ethernet controller?
>>>
>>> I'm having some issues setting up the NFS server in order to replicate 
>>> so this may take some time.
>>
>> If that's the case, we may wish to consider reverting this for now as it
>> is preventing our board from booting. Appears to revert cleanly on top
>> of mainline.
>>
>>> Are you able to add some debug in stmmac_init_rx_buffers() to see what's 
>>> the buffer address ?
>>
>> If you have a debug patch you would like me to apply and test with I
>> can. However, it is best you prepare the patch as maybe I will not dump
>> the appropriate addresses.
>>
>> Cheers
>> Jon
>>
>> -- 
>> nvpublic
> 
> Send me full boot log please.

Please see: https://paste.debian.net/1092277/

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Jason Wang @ 2019-07-19  8:51 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Michael S. Tsirkin, netdev, linux-kernel, Stefan Hajnoczi,
	David S. Miller, virtualization, kvm
In-Reply-To: <20190719083920.67qo2umpthz454be@steredhat>


On 2019/7/19 下午4:39, Stefano Garzarella wrote:
> On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
>> On 2019/7/19 下午4:08, Stefano Garzarella wrote:
>>> On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
>>>> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
>>>>> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>>>>> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
>>>>>>> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>>>>>>> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
>>>>>>>>> If the packets to sent to the guest are bigger than the buffer
>>>>>>>>> available, we can split them, using multiple buffers and fixing
>>>>>>>>> the length in the packet header.
>>>>>>>>> This is safe since virtio-vsock supports only stream sockets.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Stefano Garzarella<sgarzare@redhat.com>
>>>>>>>> So how does it work right now? If an app
>>>>>>>> does sendmsg with a 64K buffer and the other
>>>>>>>> side publishes 4K buffers - does it just stall?
>>>>>>> Before this series, the 64K (or bigger) user messages was split in 4K packets
>>>>>>> (fixed in the code) and queued in an internal list for the TX worker.
>>>>>>>
>>>>>>> After this series, we will queue up to 64K packets and then it will be split in
>>>>>>> the TX worker, depending on the size of the buffers available in the
>>>>>>> vring. (The idea was to allow EWMA or a configuration of the buffers size, but
>>>>>>> for now we postponed it)
>>>>>> Got it. Using workers for xmit is IMHO a bad idea btw.
>>>>>> Why is it done like this?
>>>>> Honestly, I don't know the exact reasons for this design, but I suppose
>>>>> that the idea was to have only one worker that uses the vring, and
>>>>> multiple user threads that enqueue packets in the list.
>>>>> This can simplify the code and we can put the user threads to sleep if
>>>>> we don't have "credit" available (this means that the receiver doesn't
>>>>> have space to receive the packet).
>>>> I think you mean the reverse: even without credits you can copy from
>>>> user and queue up data, then process it without waking up the user
>>>> thread.
>>> I checked the code better, but it doesn't seem to do that.
>>> The .sendmsg callback of af_vsock, check if the transport has space
>>> (virtio-vsock transport returns the credit available). If there is no
>>> space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
>>>
>>> When the transport receives an update of credit available on the other
>>> peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
>>> sleeping, that will queue the new packet.
>>>
>>> So, in the current implementation, the TX worker doesn't check the
>>> credit available, it only sends the packets.
>>>
>>>> Does it help though? It certainly adds up work outside of
>>>> user thread context which means it's not accounted for
>>>> correctly.
>>> I can try to xmit the packet directly in the user thread context, to see
>>> the improvements.
>>
>> It will then looks more like what virtio-net (and other networking device)
>> did.
> I'll try ASAP, the changes should not be too complicated... I hope :)
>
>>
>>>> Maybe we want more VQs. Would help improve parallelism. The question
>>>> would then become how to map sockets to VQs. With a simple hash
>>>> it's easy to create collisions ...
>>> Yes, more VQs can help but the map question is not simple to answer.
>>> Maybe we can do an hash on the (cid, port) or do some kind of estimation
>>> of queue utilization and try to balance.
>>> Should the mapping be unique?
>>
>> It sounds to me you want some kind of fair queuing? We've already had
>> several qdiscs that do this.
> Thanks for pointing it out!
>
>> So if we use the kernel networking xmit path, all those issues could be
>> addressed.
> One more point to AF_VSOCK + net-stack, but we have to evaluate possible
> drawbacks in using the net-stack. (e.g. more latency due to the complexity
> of the net-stack?)


Yes, we need benchmark the performance. But as we've noticed, current 
vsock implementation is not efficient, and for stream socket, the 
overhead should be minimal. The most important thing is to avoid 
reinventing things that has already existed.

Thanks


>
> Thanks,
> Stefano

^ permalink raw reply

* [PATCH bpf] selftests/bpf: fix sendmsg6_prog on s390
From: Ilya Leoshkevich @ 2019-07-19  9:06 UTC (permalink / raw)
  To: bpf, netdev; +Cc: gor, heiko.carstens, rdna, Ilya Leoshkevich

"sendmsg6: rewrite IP & port (C)" fails on s390, because the code in
sendmsg_v6_prog() assumes that (ctx->user_ip6[0] & 0xFFFF) refers to
leading IPv6 address digits, which is not the case on big-endian
machines.

Since checking bitwise operations doesn't seem to be the point of the
test, replace two short comparisons with a single int comparison.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 tools/testing/selftests/bpf/progs/sendmsg6_prog.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/sendmsg6_prog.c b/tools/testing/selftests/bpf/progs/sendmsg6_prog.c
index 5aeaa284fc47..a68062820410 100644
--- a/tools/testing/selftests/bpf/progs/sendmsg6_prog.c
+++ b/tools/testing/selftests/bpf/progs/sendmsg6_prog.c
@@ -41,8 +41,7 @@ int sendmsg_v6_prog(struct bpf_sock_addr *ctx)
 	}
 
 	/* Rewrite destination. */
-	if ((ctx->user_ip6[0] & 0xFFFF) == bpf_htons(0xFACE) &&
-	     ctx->user_ip6[0] >> 16 == bpf_htons(0xB00C)) {
+	if (ctx->user_ip6[0] == bpf_htonl(0xFACEB00C)) {
 		ctx->user_ip6[0] = bpf_htonl(DST_REWRITE_IP6_0);
 		ctx->user_ip6[1] = bpf_htonl(DST_REWRITE_IP6_1);
 		ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
-- 
2.21.0


^ permalink raw reply related

* [PATCH] ax88179_178a: Merge memcpy + le32_to_cpus to get_unaligned_le32
From: Chuhong Yuan @ 2019-07-19  9:07 UTC (permalink / raw)
  Cc: David S . Miller, linux-usb, netdev, linux-kernel, Chuhong Yuan

Merge the combo use of memcpy and le32_to_cpus.
Use get_unaligned_le32 instead.
This simplifies the code.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/usb/ax88179_178a.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index 0bc457ba8574..72d165114b67 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1366,8 +1366,7 @@ static int ax88179_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 		return 0;
 
 	skb_trim(skb, skb->len - 4);
-	memcpy(&rx_hdr, skb_tail_pointer(skb), 4);
-	le32_to_cpus(&rx_hdr);
+	rx_hdr = get_unaligned_le32(skb_tail_pointer(skb));
 
 	pkt_cnt = (u16)rx_hdr;
 	hdr_off = (u16)(rx_hdr >> 16);
-- 
2.20.1


^ permalink raw reply related

* [PATCH bpf v3] bpf: fix narrower loads on s390
From: Ilya Leoshkevich @ 2019-07-19  9:18 UTC (permalink / raw)
  To: bpf, netdev, ys114321, alexei.starovoitov
  Cc: gor, heiko.carstens, Ilya Leoshkevich

The very first check in test_pkt_md_access is failing on s390, which
happens because loading a part of a struct __sk_buff field produces
an incorrect result.

The preprocessed code of the check is:

{
	__u8 tmp = *((volatile __u8 *)&skb->len +
		((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
	if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2;
};

clang generates the following code for it:

      0:	71 21 00 03 00 00 00 00	r2 = *(u8 *)(r1 + 3)
      1:	61 31 00 00 00 00 00 00	r3 = *(u32 *)(r1 + 0)
      2:	57 30 00 00 00 00 00 ff	r3 &= 255
      3:	5d 23 00 1d 00 00 00 00	if r2 != r3 goto +29 <LBB0_10>

Finally, verifier transforms it to:

  0: (61) r2 = *(u32 *)(r1 +104)
  1: (bc) w2 = w2
  2: (74) w2 >>= 24
  3: (bc) w2 = w2
  4: (54) w2 &= 255
  5: (bc) w2 = w2

The problem is that when verifier emits the code to replace a partial
load of a struct __sk_buff field (*(u8 *)(r1 + 3)) with a full load of
struct sk_buff field (*(u32 *)(r1 + 104)), an optional shift and a
bitwise AND, it assumes that the machine is little endian and
incorrectly decides to use a shift.

Adjust shift count calculation to account for endianness.

Fixes: 31fd85816dbe ("bpf: permits narrower load from bpf program context fields")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---

v1 -> v2: extract endianness-dependent code into a separate function
v2 -> v3: rename the function and move it to where it belongs

 include/linux/filter.h | 13 +++++++++++++
 kernel/bpf/verifier.c  |  4 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ff65d22cf336..92c6e31fb008 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -24,6 +24,7 @@
 
 #include <net/sch_generic.h>
 
+#include <asm/byteorder.h>
 #include <uapi/linux/filter.h>
 #include <uapi/linux/bpf.h>
 
@@ -747,6 +748,18 @@ bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
 	return size <= size_default && (size & (size - 1)) == 0;
 }
 
+static inline u8
+bpf_ctx_narrow_load_shift(u32 off, u32 size, u32 size_default)
+{
+	u8 load_off = off & (size_default - 1);
+
+#ifdef __LITTLE_ENDIAN
+	return load_off * 8;
+#else
+	return (size_default - (load_off + size)) * 8;
+#endif
+}
+
 #define bpf_ctx_wide_access_ok(off, size, type, field)			\
 	(size == sizeof(__u64) &&					\
 	off >= offsetof(type, field) &&					\
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5900cbb966b1..c84d83f86141 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8616,8 +8616,8 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		}
 
 		if (is_narrower_load && size < target_size) {
-			u8 shift = (off & (size_default - 1)) * 8;
-
+			u8 shift = bpf_ctx_narrow_load_shift(off, size,
+							     size_default);
 			if (ctx_field_size <= 4) {
 				if (shift)
 					insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Stefano Garzarella @ 2019-07-19  9:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, netdev, linux-kernel, Stefan Hajnoczi,
	David S. Miller, virtualization, kvm
In-Reply-To: <53da84b9-184f-1377-0582-ab7cf42ebdb6@redhat.com>

On Fri, Jul 19, 2019 at 04:51:00PM +0800, Jason Wang wrote:
> 
> On 2019/7/19 下午4:39, Stefano Garzarella wrote:
> > On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
> > > On 2019/7/19 下午4:08, Stefano Garzarella wrote:
> > > > On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > > > > > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<mst@redhat.com>  wrote:
> > > > > > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > > > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
> > > > > > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > > > > > the length in the packet header.
> > > > > > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Stefano Garzarella<sgarzare@redhat.com>
> > > > > > > > > So how does it work right now? If an app
> > > > > > > > > does sendmsg with a 64K buffer and the other
> > > > > > > > > side publishes 4K buffers - does it just stall?
> > > > > > > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > > > > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > > > > > > 
> > > > > > > > After this series, we will queue up to 64K packets and then it will be split in
> > > > > > > > the TX worker, depending on the size of the buffers available in the
> > > > > > > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > > > > > > for now we postponed it)
> > > > > > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > > > > > Why is it done like this?
> > > > > > Honestly, I don't know the exact reasons for this design, but I suppose
> > > > > > that the idea was to have only one worker that uses the vring, and
> > > > > > multiple user threads that enqueue packets in the list.
> > > > > > This can simplify the code and we can put the user threads to sleep if
> > > > > > we don't have "credit" available (this means that the receiver doesn't
> > > > > > have space to receive the packet).
> > > > > I think you mean the reverse: even without credits you can copy from
> > > > > user and queue up data, then process it without waking up the user
> > > > > thread.
> > > > I checked the code better, but it doesn't seem to do that.
> > > > The .sendmsg callback of af_vsock, check if the transport has space
> > > > (virtio-vsock transport returns the credit available). If there is no
> > > > space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
> > > > 
> > > > When the transport receives an update of credit available on the other
> > > > peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
> > > > sleeping, that will queue the new packet.
> > > > 
> > > > So, in the current implementation, the TX worker doesn't check the
> > > > credit available, it only sends the packets.
> > > > 
> > > > > Does it help though? It certainly adds up work outside of
> > > > > user thread context which means it's not accounted for
> > > > > correctly.
> > > > I can try to xmit the packet directly in the user thread context, to see
> > > > the improvements.
> > > 
> > > It will then looks more like what virtio-net (and other networking device)
> > > did.
> > I'll try ASAP, the changes should not be too complicated... I hope :)
> > 
> > > 
> > > > > Maybe we want more VQs. Would help improve parallelism. The question
> > > > > would then become how to map sockets to VQs. With a simple hash
> > > > > it's easy to create collisions ...
> > > > Yes, more VQs can help but the map question is not simple to answer.
> > > > Maybe we can do an hash on the (cid, port) or do some kind of estimation
> > > > of queue utilization and try to balance.
> > > > Should the mapping be unique?
> > > 
> > > It sounds to me you want some kind of fair queuing? We've already had
> > > several qdiscs that do this.
> > Thanks for pointing it out!
> > 
> > > So if we use the kernel networking xmit path, all those issues could be
> > > addressed.
> > One more point to AF_VSOCK + net-stack, but we have to evaluate possible
> > drawbacks in using the net-stack. (e.g. more latency due to the complexity
> > of the net-stack?)
> 
> 
> Yes, we need benchmark the performance. But as we've noticed, current vsock
> implementation is not efficient, and for stream socket, the overhead should
> be minimal. The most important thing is to avoid reinventing things that has
> already existed.

Got it. I completely agree with you, and I want to avoid reinventing things
(surely in a worse way).

But the idea (suggested also by Micheal) to discover how fast can go a
new protocol separate from the networking stack, is quite attractive :)

Thanks,
Stefano

^ permalink raw reply

* Re: [PATCH] qca8k: enable port flow control
From: Niklas Cassel @ 2019-07-19  9:31 UTC (permalink / raw)
  To: xiaofeis
  Cc: davem, vkoul, netdev, andrew, linux-arm-msm, bjorn.andersson,
	vivien.didelot, f.fainelli, xiazha
In-Reply-To: <1563504791-43398-1-git-send-email-xiaofeis@codeaurora.org>

On Fri, Jul 19, 2019 at 10:53:11AM +0800, xiaofeis wrote:
> Set phy device advertising to enable MAC flow control.
> 
> Change-Id: Ibf0f554b072fc73136ec9f7ffb90c20b25a4faae
> Signed-off-by: Xiaofei Shen <xiaofeis@codeaurora.org>
> ---
>  drivers/net/dsa/qca8k.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
> index d93be14..95ac081 100644
> --- a/drivers/net/dsa/qca8k.c
> +++ b/drivers/net/dsa/qca8k.c
> @@ -1,7 +1,7 @@
>  /*
>   * Copyright (C) 2009 Felix Fietkau <nbd@nbd.name>
>   * Copyright (C) 2011-2012 Gabor Juhos <juhosg@openwrt.org>
> - * Copyright (c) 2015, The Linux Foundation. All rights reserved.
> + * Copyright (c) 2015, 2019, The Linux Foundation. All rights reserved.
>   * Copyright (c) 2016 John Crispin <john@phrozen.org>
>   *
>   * This program is free software; you can redistribute it and/or modify
> @@ -800,6 +800,8 @@
>  	qca8k_port_set_status(priv, port, 1);
>  	priv->port_sts[port].enabled = 1;
>  
> +	phy->advertising |= (ADVERTISED_Pause | ADVERTISED_Asym_Pause);

Drop the unnecessary parentheses.

Question for DSA maintainers: shouldn't this be implemented in the
dsa_switch_ops phylink_validate callback, like it's done for other
dsa drivers?


Kind regards,
Niklas

> +
>  	return 0;
>  }
>  
> -- 
> 1.9.1
> 

^ permalink raw reply

* Re: [PATCH] libertas_tf: Use correct channel range in lbtf_geo_init
From: kbuild test robot @ 2019-07-19  9:32 UTC (permalink / raw)
  To: YueHaibing
  Cc: kbuild-all, kvalo, davem, lkundrak, derosier, linux-kernel,
	netdev, linux-wireless, YueHaibing
In-Reply-To: <20190716132534.11256-1-yuehaibing@huawei.com>

[-- Attachment #1: Type: text/plain, Size: 1603 bytes --]

Hi YueHaibing,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.2 next-20190718]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/YueHaibing/libertas_tf-Use-correct-channel-range-in-lbtf_geo_init/20190718-011728
config: arm64-allmodconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/net/wireless/marvell/libertas_tf/cmd.c: In function 'lbtf_geo_init':
>> drivers/net/wireless/marvell/libertas_tf/cmd.c:68:17: error: 'range' is a pointer; did you mean to use '->'?
     for (ch = range.start; ch < range.end; ch++)
                    ^
                    ->
   drivers/net/wireless/marvell/libertas_tf/cmd.c:68:35: error: 'range' is a pointer; did you mean to use '->'?
     for (ch = range.start; ch < range.end; ch++)
                                      ^
                                      ->

vim +68 drivers/net/wireless/marvell/libertas_tf/cmd.c

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 66191 bytes --]

^ permalink raw reply

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-19 10:25 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <4db855e4-1d59-d30b-154c-e7a2aa1c9047@nvidia.com>

[-- Attachment #1: Type: text/plain, Size: 1789 bytes --]

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/19/2019, 09:49:10 (UTC+00:00)

> 
> On 19/07/2019 09:44, Jose Abreu wrote:
> > From: Jon Hunter <jonathanh@nvidia.com>
> > Date: Jul/19/2019, 09:37:49 (UTC+00:00)
> > 
> >>
> >> On 19/07/2019 08:51, Jose Abreu wrote:
> >>> From: Jon Hunter <jonathanh@nvidia.com>
> >>> Date: Jul/18/2019, 10:16:20 (UTC+00:00)
> >>>
> >>>> Have you tried using NFS on a board with this ethernet controller?
> >>>
> >>> I'm having some issues setting up the NFS server in order to replicate 
> >>> so this may take some time.
> >>
> >> If that's the case, we may wish to consider reverting this for now as it
> >> is preventing our board from booting. Appears to revert cleanly on top
> >> of mainline.
> >>
> >>> Are you able to add some debug in stmmac_init_rx_buffers() to see what's 
> >>> the buffer address ?
> >>
> >> If you have a debug patch you would like me to apply and test with I
> >> can. However, it is best you prepare the patch as maybe I will not dump
> >> the appropriate addresses.
> >>
> >> Cheers
> >> Jon
> >>
> >> -- 
> >> nvpublic
> > 
> > Send me full boot log please.
> 
> Please see: https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.debian.net_1092277_&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=iHahNPEIegk1merE1utjRvC8Xoz5jQlNb1VRzPHk4-4&s=4UTbo8miS4M-PmGNup4OXgJOosgvJQZm9wcvWYjJs7k&e= 
> 
> Cheers
> Jon
> 
> -- 
> nvpublic

Thanks. Can you add attached patch and check if WARN is triggered ? And 
it would be good to know whether this is boot specific crash or just 
doesn't work at all, i.e. not using NFS to mount rootfs and instead 
manually configure interface and send/receive packets.

---
Thanks,
Jose Miguel Abreu

[-- Attachment #2: 0001-net-stmmac-Add-page-sanity-check.patch --]
[-- Type: application/octet-stream, Size: 1393 bytes --]

From d495620feccf24dc54218219c4c7f79c8696ecaa Mon Sep 17 00:00:00 2001
Message-Id: <d495620feccf24dc54218219c4c7f79c8696ecaa.1563531731.git.joabreu@synopsys.com>
From: Jose Abreu <joabreu@synopsys.com>
Date: Fri, 19 Jul 2019 12:21:44 +0200
Subject: [PATCH net] net: stmmac: Add page sanity check

Add a WARN_ON() when page is NULL.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>

---
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Jose Abreu <joabreu@synopsys.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: netdev@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 5f1294ce0216..eac6920301e9 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3350,6 +3350,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
 		entry = next_entry;
 		buf = &rx_q->buf_pool[entry];
 
+		WARN_ON(!buf->page);
+
 		if (priv->extend_desc)
 			p = (struct dma_desc *)(rx_q->dma_erx + entry);
 		else
-- 
2.7.4


^ permalink raw reply related

* Re: [PATCH] be2net: fix adapter->big_page_size miscaculation
From: kbuild test robot @ 2019-07-19 10:32 UTC (permalink / raw)
  To: Qian Cai
  Cc: kbuild-all, davem, sathya.perla, ajit.khaparde,
	sriharsha.basavapatna, somnath.kotur, arnd, dhowells, hpa, netdev,
	linux-arch, linux-kernel, Qian Cai
In-Reply-To: <1562959401-19815-1-git-send-email-cai@lca.pw>

[-- Attachment #1: Type: text/plain, Size: 2951 bytes --]

Hi Qian,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.2 next-20190719]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Qian-Cai/be2net-fix-adapter-big_page_size-miscaculation/20190713-191644
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=ia64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_rx_cqs_create':
>> drivers/net/ethernet/emulex/benet/be_main.c:3138:33: error: implicit declaration of function '__get_order'; did you mean 'get_order'? [-Werror=implicit-function-declaration]
     adapter->big_page_size = (1 << __get_order(rx_frag_size)) * PAGE_SIZE;
                                    ^~~~~~~~~~~
                                    get_order
   cc1: some warnings being treated as errors

vim +3138 drivers/net/ethernet/emulex/benet/be_main.c

  3116	
  3117	static int be_rx_cqs_create(struct be_adapter *adapter)
  3118	{
  3119		struct be_queue_info *eq, *cq;
  3120		struct be_rx_obj *rxo;
  3121		int rc, i;
  3122	
  3123		adapter->num_rss_qs =
  3124				min(adapter->num_evt_qs, adapter->cfg_num_rx_irqs);
  3125	
  3126		/* We'll use RSS only if atleast 2 RSS rings are supported. */
  3127		if (adapter->num_rss_qs < 2)
  3128			adapter->num_rss_qs = 0;
  3129	
  3130		adapter->num_rx_qs = adapter->num_rss_qs + adapter->need_def_rxq;
  3131	
  3132		/* When the interface is not capable of RSS rings (and there is no
  3133		 * need to create a default RXQ) we'll still need one RXQ
  3134		 */
  3135		if (adapter->num_rx_qs == 0)
  3136			adapter->num_rx_qs = 1;
  3137	
> 3138		adapter->big_page_size = (1 << __get_order(rx_frag_size)) * PAGE_SIZE;
  3139		for_all_rx_queues(adapter, rxo, i) {
  3140			rxo->adapter = adapter;
  3141			cq = &rxo->cq;
  3142			rc = be_queue_alloc(adapter, cq, RX_CQ_LEN,
  3143					    sizeof(struct be_eth_rx_compl));
  3144			if (rc)
  3145				return rc;
  3146	
  3147			u64_stats_init(&rxo->stats.sync);
  3148			eq = &adapter->eq_obj[i % adapter->num_evt_qs].q;
  3149			rc = be_cmd_cq_create(adapter, cq, eq, false, 3);
  3150			if (rc)
  3151				return rc;
  3152		}
  3153	
  3154		dev_info(&adapter->pdev->dev,
  3155			 "created %d RX queue(s)\n", adapter->num_rx_qs);
  3156		return 0;
  3157	}
  3158	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 54343 bytes --]

^ permalink raw reply

* [patch net-next rfc 2/7] net: introduce name_node struct to be used in hashlist
From: Jiri Pirko @ 2019-07-19 11:00 UTC (permalink / raw)
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/linux/netdevice.h | 10 +++-
 net/core/dev.c            | 96 +++++++++++++++++++++++++++++++--------
 2 files changed, 86 insertions(+), 20 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 88292953aa6f..74f99f127b0e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -918,6 +918,12 @@ struct dev_ifalias {
 struct devlink;
 struct tlsdev_ops;
 
+struct netdev_name_node {
+	struct hlist_node hlist;
+	struct net_device *dev;
+	char *name;
+};
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1551,7 +1557,7 @@ enum netdev_priv_flags {
  *		(i.e. as seen by users in the "Space.c" file).  It is the name
  *		of the interface.
  *
- *	@name_hlist: 	Device name hash chain, please keep it close to name[]
+ *	@name_node:	Name hashlist node
  *	@ifalias:	SNMP alias
  *	@mem_end:	Shared memory end
  *	@mem_start:	Shared memory start
@@ -1761,7 +1767,7 @@ enum netdev_priv_flags {
 
 struct net_device {
 	char			name[IFNAMSIZ];
-	struct hlist_node	name_hlist;
+	struct netdev_name_node	*name_node;
 	struct dev_ifalias	__rcu *ifalias;
 	/*
 	 *	I/O specific fields
diff --git a/net/core/dev.c b/net/core/dev.c
index fc676b2610e3..ad0d42fbdeee 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -228,6 +228,66 @@ static inline void rps_unlock(struct softnet_data *sd)
 #endif
 }
 
+static struct netdev_name_node *netdev_name_node_alloc(struct net_device *dev,
+						       char *name)
+{
+	struct netdev_name_node *name_node;
+
+	name_node = kzalloc(sizeof(*name_node), GFP_KERNEL);
+	if (!name_node)
+		return NULL;
+	name_node->dev = dev;
+	name_node->name = name;
+	return name_node;
+}
+
+static struct netdev_name_node *
+netdev_name_node_head_alloc(struct net_device *dev)
+{
+	return netdev_name_node_alloc(dev, dev->name);
+}
+
+static void netdev_name_node_free(struct netdev_name_node *name_node)
+{
+	kfree(name_node);
+}
+
+static void netdev_name_node_add(struct net *net,
+				 struct netdev_name_node *name_node)
+{
+	hlist_add_head_rcu(&name_node->hlist,
+			   dev_name_hash(net, name_node->name));
+}
+
+static void netdev_name_node_del(struct netdev_name_node *name_node)
+{
+	hlist_del_rcu(&name_node->hlist);
+}
+
+static struct netdev_name_node *netdev_name_node_lookup(struct net *net,
+							const char *name)
+{
+	struct hlist_head *head = dev_name_hash(net, name);
+	struct netdev_name_node *name_node;
+
+	hlist_for_each_entry(name_node, head, hlist)
+		if (!strcmp(name_node->name, name))
+			return name_node;
+	return NULL;
+}
+
+static struct netdev_name_node *netdev_name_node_lookup_rcu(struct net *net,
+							    const char *name)
+{
+	struct hlist_head *head = dev_name_hash(net, name);
+	struct netdev_name_node *name_node;
+
+	hlist_for_each_entry_rcu(name_node, head, hlist)
+		if (!strcmp(name_node->name, name))
+			return name_node;
+	return NULL;
+}
+
 /* Device list insertion */
 static void list_netdevice(struct net_device *dev)
 {
@@ -237,7 +297,7 @@ static void list_netdevice(struct net_device *dev)
 
 	write_lock_bh(&dev_base_lock);
 	list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
-	hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name));
+	netdev_name_node_add(net, dev->name_node);
 	hlist_add_head_rcu(&dev->index_hlist,
 			   dev_index_hash(net, dev->ifindex));
 	write_unlock_bh(&dev_base_lock);
@@ -255,7 +315,7 @@ static void unlist_netdevice(struct net_device *dev)
 	/* Unlink dev from the device chain */
 	write_lock_bh(&dev_base_lock);
 	list_del_rcu(&dev->dev_list);
-	hlist_del_rcu(&dev->name_hlist);
+	netdev_name_node_del(dev->name_node);
 	hlist_del_rcu(&dev->index_hlist);
 	write_unlock_bh(&dev_base_lock);
 
@@ -733,14 +793,10 @@ EXPORT_SYMBOL_GPL(dev_fill_metadata_dst);
 
 struct net_device *__dev_get_by_name(struct net *net, const char *name)
 {
-	struct net_device *dev;
-	struct hlist_head *head = dev_name_hash(net, name);
+	struct netdev_name_node *node_name;
 
-	hlist_for_each_entry(dev, head, name_hlist)
-		if (!strncmp(dev->name, name, IFNAMSIZ))
-			return dev;
-
-	return NULL;
+	node_name = netdev_name_node_lookup(net, name);
+	return node_name ? node_name->dev : NULL;
 }
 EXPORT_SYMBOL(__dev_get_by_name);
 
@@ -758,14 +814,10 @@ EXPORT_SYMBOL(__dev_get_by_name);
 
 struct net_device *dev_get_by_name_rcu(struct net *net, const char *name)
 {
-	struct net_device *dev;
-	struct hlist_head *head = dev_name_hash(net, name);
-
-	hlist_for_each_entry_rcu(dev, head, name_hlist)
-		if (!strncmp(dev->name, name, IFNAMSIZ))
-			return dev;
+	struct netdev_name_node *node_name;
 
-	return NULL;
+	node_name = netdev_name_node_lookup_rcu(net, name);
+	return node_name ? node_name->dev : NULL;
 }
 EXPORT_SYMBOL(dev_get_by_name_rcu);
 
@@ -1232,13 +1284,13 @@ int dev_change_name(struct net_device *dev, const char *newname)
 	netdev_adjacent_rename_links(dev, oldname);
 
 	write_lock_bh(&dev_base_lock);
-	hlist_del_rcu(&dev->name_hlist);
+	netdev_name_node_del(dev->name_node);
 	write_unlock_bh(&dev_base_lock);
 
 	synchronize_rcu();
 
 	write_lock_bh(&dev_base_lock);
-	hlist_add_head_rcu(&dev->name_hlist, dev_name_hash(net, dev->name));
+	netdev_name_node_add(net, dev->name_node);
 	write_unlock_bh(&dev_base_lock);
 
 	ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
@@ -8206,6 +8258,8 @@ static void rollback_registered_many(struct list_head *head)
 		dev_uc_flush(dev);
 		dev_mc_flush(dev);
 
+		netdev_name_node_free(dev->name_node);
+
 		if (dev->netdev_ops->ndo_uninit)
 			dev->netdev_ops->ndo_uninit(dev);
 
@@ -8648,6 +8702,10 @@ int register_netdevice(struct net_device *dev)
 	if (ret < 0)
 		goto out;
 
+	dev->name_node = netdev_name_node_head_alloc(dev);
+	if (!dev->name_node)
+		goto out;
+
 	/* Init, if this function is available */
 	if (dev->netdev_ops->ndo_init) {
 		ret = dev->netdev_ops->ndo_init(dev);
@@ -8767,6 +8825,8 @@ int register_netdevice(struct net_device *dev)
 	return ret;
 
 err_uninit:
+	if (dev->name_node)
+		netdev_name_node_free(dev->name_node);
 	if (dev->netdev_ops->ndo_uninit)
 		dev->netdev_ops->ndo_uninit(dev);
 	if (dev->priv_destructor)
-- 
2.21.0


^ permalink raw reply related

* [patch net-next rfc 1/7] net: procfs: use index hashlist instead of name hashlist
From: Jiri Pirko @ 2019-07-19 11:00 UTC (permalink / raw)
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw
In-Reply-To: <20190719110029.29466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Name hashlist is going to be used for more than just dev->name, so use
rather index hashlist for iteration over net_device instances.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/net-procfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index 36347933ec3a..6bbd06f7dc7d 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -20,8 +20,8 @@ static inline struct net_device *dev_from_same_bucket(struct seq_file *seq, loff
 	struct hlist_head *h;
 	unsigned int count = 0, offset = get_offset(*pos);
 
-	h = &net->dev_name_head[get_bucket(*pos)];
-	hlist_for_each_entry_rcu(dev, h, name_hlist) {
+	h = &net->dev_index_head[get_bucket(*pos)];
+	hlist_for_each_entry_rcu(dev, h, index_hlist) {
 		if (++count == offset)
 			return dev;
 	}
-- 
2.21.0


^ permalink raw reply related

* [patch net-next rfc 0/7] net: introduce alternative names for network interfaces
From: Jiri Pirko @ 2019-07-19 11:00 UTC (permalink / raw)
  To: netdev
  Cc: davem, jakub.kicinski, sthemmin, dsahern, dcbw, mkubecek, andrew,
	parav, saeedm, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

In the past, there was repeatedly discussed the IFNAMSIZ (16) limit for
netdevice name length. Now when we have PF and VF representors
with port names like "pfXvfY", it became quite common to hit this limit:
0123456789012345
enp131s0f1npf0vf6
enp131s0f1npf0vf22

Udev cannot rename these interfaces out-of-the-box and user needs to
create custom rules to handle them.

Also, udev has multiple schemes of netdev names. From udev code:
 * Type of names:
 *   b<number>                             - BCMA bus core number
 *   c<bus_id>                             - bus id of a grouped CCW or CCW device,
 *                                           with all leading zeros stripped [s390]
 *   o<index>[n<phys_port_name>|d<dev_port>]
 *                                         - on-board device index number
 *   s<slot>[f<function>][n<phys_port_name>|d<dev_port>]
 *                                         - hotplug slot index number
 *   x<MAC>                                - MAC address
 *   [P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>]
 *                                         - PCI geographical location
 *   [P<domain>]p<bus>s<slot>[f<function>][u<port>][..][c<config>][i<interface>]
 *                                         - USB port number chain
 *   v<slot>                               - VIO slot number (IBM PowerVM)
 *   a<vendor><model>i<instance>           - Platform bus ACPI instance id
 *   i<addr>n<phys_port_name>              - Netdevsim bus address and port name

One device can be often renamed by multiple patterns at the
same time (e.g. pci address/mac).

This patchset introduces alternative names for network interfaces.
Main goal is to:
1) Overcome the IFNAMSIZ limitation
2) Allow to have multiple names at the same time (multiple udev patterns)
3) Allow to use alternative names as handle for commands

See following examples.

$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff

-> Add alternative names for dummy0:

$ ip link altname add dummy0 name someothername
$ ip link altname add dummy0 name someotherveryveryveryverylongname
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname someothername
    altname someotherveryveryveryverylongname
  
-> Add bridge brx, add it's alternative name and use alternative names to
   do enslavement.

$ ip link add name brx type bridge
$ ip link altname add brx name mypersonalsuperspecialbridge
$ ip link set someotherveryveryveryverylongname master mypersonalsuperspecialbridge
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname someothername
    altname someotherveryveryveryverylongname
4: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname mypersonalsuperspecialbridge

-> Add ipv4 address to the bridge using alternative name:
    
$ ip addr add 192.168.0.1/24 dev mypersonalsuperspecialbridge
$ ip addr show mypersonalsuperspecialbridge     
4: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname mypersonalsuperspecialbridge
    inet 192.168.0.1/24 scope global brx
       valid_lft forever preferred_lft forever

-> Delete one of dummy0 alternative names:

$ ip link altname del someotherveryveryveryverylongname    
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop master brx state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname someothername
4: brx: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 7e:a2:d4:b8:91:7a brd ff:ff:ff:ff:ff:ff
    altname mypersonalsuperspecialbridge

TODO:
- notifications for alternative names add/removal
- sanitization of add/del cmds (similar to get link)
- test more usecases and write selftests
- extend support for other netlink ifaces (ovs for example)
- add sysfs symlink altname->basename?

Jiri Pirko (7):
  net: procfs: use index hashlist instead of name hashlist
  net: introduce name_node struct to be used in hashlist
  net: rtnetlink: add commands to add and delete alternative ifnames
  net: rtnetlink: put alternative names to getlink message
  net: rtnetlink: unify the code in __rtnl_newlink get dev with the rest
  net: rtnetlink: introduce helper to get net_device instance by ifname
  net: rtnetlink: add possibility to use alternative names as message
    handle

 include/linux/netdevice.h      |  14 ++-
 include/uapi/linux/if.h        |   1 +
 include/uapi/linux/if_link.h   |   3 +
 include/uapi/linux/rtnetlink.h |   7 ++
 net/core/dev.c                 | 152 ++++++++++++++++++++++----
 net/core/net-procfs.c          |   4 +-
 net/core/rtnetlink.c           | 192 +++++++++++++++++++++++++++++----
 security/selinux/nlmsgtab.c    |   4 +-
 8 files changed, 334 insertions(+), 43 deletions(-)

-- 
2.21.0


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox