Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-19  8:44 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <bc9ab3c5-b1b9-26d4-7b73-01474328eafa@nvidia.com>

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/19/2019, 09:37:49 (UTC+00:00)

> 
> On 19/07/2019 08:51, Jose Abreu wrote:
> > From: Jon Hunter <jonathanh@nvidia.com>
> > Date: Jul/18/2019, 10:16:20 (UTC+00:00)
> > 
> >> Have you tried using NFS on a board with this ethernet controller?
> > 
> > I'm having some issues setting up the NFS server in order to replicate 
> > so this may take some time.
> 
> If that's the case, we may wish to consider reverting this for now as it
> is preventing our board from booting. Appears to revert cleanly on top
> of mainline.
> 
> > Are you able to add some debug in stmmac_init_rx_buffers() to see what's 
> > the buffer address ?
> 
> If you have a debug patch you would like me to apply and test with I
> can. However, it is best you prepare the patch as maybe I will not dump
> the appropriate addresses.
> 
> Cheers
> Jon
> 
> -- 
> nvpublic

Send me full boot log please.

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Stefano Garzarella @ 2019-07-19  8:39 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, netdev, linux-kernel, Stefan Hajnoczi,
	David S. Miller, virtualization, kvm
In-Reply-To: <fcd19719-e5a9-adad-1e6c-c84487187088@redhat.com>

On Fri, Jul 19, 2019 at 04:21:52PM +0800, Jason Wang wrote:
> 
> On 2019/7/19 下午4:08, Stefano Garzarella wrote:
> > On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > > > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<mst@redhat.com>  wrote:
> > > > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
> > > > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > > > the length in the packet header.
> > > > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > > > > 
> > > > > > > > Signed-off-by: Stefano Garzarella<sgarzare@redhat.com>
> > > > > > > So how does it work right now? If an app
> > > > > > > does sendmsg with a 64K buffer and the other
> > > > > > > side publishes 4K buffers - does it just stall?
> > > > > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > > > > 
> > > > > > After this series, we will queue up to 64K packets and then it will be split in
> > > > > > the TX worker, depending on the size of the buffers available in the
> > > > > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > > > > for now we postponed it)
> > > > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > > > Why is it done like this?
> > > > Honestly, I don't know the exact reasons for this design, but I suppose
> > > > that the idea was to have only one worker that uses the vring, and
> > > > multiple user threads that enqueue packets in the list.
> > > > This can simplify the code and we can put the user threads to sleep if
> > > > we don't have "credit" available (this means that the receiver doesn't
> > > > have space to receive the packet).
> > > I think you mean the reverse: even without credits you can copy from
> > > user and queue up data, then process it without waking up the user
> > > thread.
> > I checked the code better, but it doesn't seem to do that.
> > The .sendmsg callback of af_vsock, check if the transport has space
> > (virtio-vsock transport returns the credit available). If there is no
> > space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
> > 
> > When the transport receives an update of credit available on the other
> > peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
> > sleeping, that will queue the new packet.
> > 
> > So, in the current implementation, the TX worker doesn't check the
> > credit available, it only sends the packets.
> > 
> > > Does it help though? It certainly adds up work outside of
> > > user thread context which means it's not accounted for
> > > correctly.
> > I can try to xmit the packet directly in the user thread context, to see
> > the improvements.
> 
> 
> It will then looks more like what virtio-net (and other networking device)
> did.

I'll try ASAP, the changes should not be too complicated... I hope :)

> 
> 
> > 
> > > Maybe we want more VQs. Would help improve parallelism. The question
> > > would then become how to map sockets to VQs. With a simple hash
> > > it's easy to create collisions ...
> > Yes, more VQs can help but the map question is not simple to answer.
> > Maybe we can do an hash on the (cid, port) or do some kind of estimation
> > of queue utilization and try to balance.
> > Should the mapping be unique?
> 
> 
> It sounds to me you want some kind of fair queuing? We've already had
> several qdiscs that do this.

Thanks for pointing it out!

> 
> So if we use the kernel networking xmit path, all those issues could be
> addressed.

One more point to AF_VSOCK + net-stack, but we have to evaluate possible
drawbacks in using the net-stack. (e.g. more latency due to the complexity
of the net-stack?)

Thanks,
Stefano

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-19  8:37 UTC (permalink / raw)
  To: Jose Abreu, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <BN8PR12MB3266960A104A7CDBB4E59192D3CB0@BN8PR12MB3266.namprd12.prod.outlook.com>


On 19/07/2019 08:51, Jose Abreu wrote:
> From: Jon Hunter <jonathanh@nvidia.com>
> Date: Jul/18/2019, 10:16:20 (UTC+00:00)
> 
>> Have you tried using NFS on a board with this ethernet controller?
> 
> I'm having some issues setting up the NFS server in order to replicate 
> so this may take some time.

If that's the case, we may wish to consider reverting this for now as it
is preventing our board from booting. Appears to revert cleanly on top
of mainline.

> Are you able to add some debug in stmmac_init_rx_buffers() to see what's 
> the buffer address ?

If you have a debug patch you would like me to apply and test with I
can. However, it is best you prepare the patch as maybe I will not dump
the appropriate addresses.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH v4 5/5] vsock/virtio: change the maximum packet size allowed
From: Stefano Garzarella @ 2019-07-19  8:29 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190718083105-mutt-send-email-mst@kernel.org>

On Thu, Jul 18, 2019 at 08:33:40AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 09:52:41AM +0200, Stefano Garzarella wrote:
> > On Wed, Jul 17, 2019 at 5:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jul 17, 2019 at 01:30:30PM +0200, Stefano Garzarella wrote:
> > > > Since now we are able to split packets, we can avoid limiting
> > > > their sizes to VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE.
> > > > Instead, we can use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max
> > > > packet size.
> > > >
> > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > >
> > >
> > > OK so this is kind of like GSO where we are passing
> > > 64K packets to the vsock and then split at the
> > > low level.
> > 
> > Exactly, something like that in the Host->Guest path, instead in the
> > Guest->Host we use the entire 64K packet.
> > 
> > Thanks,
> > Stefano
> 
> btw two allocations for each packet isn't great. How about
> allocating the struct linearly with the data?

Are you referring to the kzalloc() to allocate the 'struct
virtio_vsock_pkt', followed by the kmalloc() to allocate the buffer?

Actually they don't look great, I will try to do a single allocation.

> And all buffers are same length for you - so you can actually
> do alloc_pages.

Yes, also Jason suggested it and we decided to postpone since we will
try to reuse the virtio-net where it comes for free.

> Allocating/freeing pages in a batch should also be considered.

For the allocation of guest rx buffers we do some kind of batching (we
refill the queue when it reaches the half), but only it this case :(

I'll try to do more alloc/free batching.

Thanks,
Stefano

^ permalink raw reply

* [PATCH] usbnet: smsc75xx: Merge memcpy + le32_to_cpus to get_unaligned_le32
From: Chuhong Yuan @ 2019-07-19  8:27 UTC (permalink / raw)
  Cc: Steve Glendinning, David S . Miller, netdev, linux-usb,
	linux-kernel, Chuhong Yuan

Merge the combo use of memcpy and le32_to_cpus.
Use get_unaligned_le32 instead.
This simplifies the code.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/usb/smsc75xx.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c
index 1417a22962a1..7fac9db5380d 100644
--- a/drivers/net/usb/smsc75xx.c
+++ b/drivers/net/usb/smsc75xx.c
@@ -661,8 +661,7 @@ static void smsc75xx_status(struct usbnet *dev, struct urb *urb)
 		return;
 	}
 
-	memcpy(&intdata, urb->transfer_buffer, 4);
-	le32_to_cpus(&intdata);
+	intdata = get_unaligned_le32(urb->transfer_buffer);
 
 	netif_dbg(dev, link, dev->net, "intdata: 0x%08X\n", intdata);
 
@@ -2181,12 +2180,10 @@ static int smsc75xx_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
 		struct sk_buff *ax_skb;
 		unsigned char *packet;
 
-		memcpy(&rx_cmd_a, skb->data, sizeof(rx_cmd_a));
-		le32_to_cpus(&rx_cmd_a);
+		rx_cmd_a = get_unaligned_le32(skb->data);
 		skb_pull(skb, 4);
 
-		memcpy(&rx_cmd_b, skb->data, sizeof(rx_cmd_b));
-		le32_to_cpus(&rx_cmd_b);
+		rx_cmd_b = get_unaligned_le32(skb->data);
 		skb_pull(skb, 4 + RXW_PADDING);
 
 		packet = skb->data;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Jason Wang @ 2019-07-19  8:21 UTC (permalink / raw)
  To: Stefano Garzarella, Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, kvm
In-Reply-To: <20190719080832.7hoeus23zjyrx3cc@steredhat>


On 2019/7/19 下午4:08, Stefano Garzarella wrote:
> On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
>> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
>>> On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>>> On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
>>>>> On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin<mst@redhat.com>  wrote:
>>>>>> On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
>>>>>>> If the packets to sent to the guest are bigger than the buffer
>>>>>>> available, we can split them, using multiple buffers and fixing
>>>>>>> the length in the packet header.
>>>>>>> This is safe since virtio-vsock supports only stream sockets.
>>>>>>>
>>>>>>> Signed-off-by: Stefano Garzarella<sgarzare@redhat.com>
>>>>>> So how does it work right now? If an app
>>>>>> does sendmsg with a 64K buffer and the other
>>>>>> side publishes 4K buffers - does it just stall?
>>>>> Before this series, the 64K (or bigger) user messages was split in 4K packets
>>>>> (fixed in the code) and queued in an internal list for the TX worker.
>>>>>
>>>>> After this series, we will queue up to 64K packets and then it will be split in
>>>>> the TX worker, depending on the size of the buffers available in the
>>>>> vring. (The idea was to allow EWMA or a configuration of the buffers size, but
>>>>> for now we postponed it)
>>>> Got it. Using workers for xmit is IMHO a bad idea btw.
>>>> Why is it done like this?
>>> Honestly, I don't know the exact reasons for this design, but I suppose
>>> that the idea was to have only one worker that uses the vring, and
>>> multiple user threads that enqueue packets in the list.
>>> This can simplify the code and we can put the user threads to sleep if
>>> we don't have "credit" available (this means that the receiver doesn't
>>> have space to receive the packet).
>> I think you mean the reverse: even without credits you can copy from
>> user and queue up data, then process it without waking up the user
>> thread.
> I checked the code better, but it doesn't seem to do that.
> The .sendmsg callback of af_vsock, check if the transport has space
> (virtio-vsock transport returns the credit available). If there is no
> space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.
>
> When the transport receives an update of credit available on the other
> peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
> sleeping, that will queue the new packet.
>
> So, in the current implementation, the TX worker doesn't check the
> credit available, it only sends the packets.
>
>> Does it help though? It certainly adds up work outside of
>> user thread context which means it's not accounted for
>> correctly.
> I can try to xmit the packet directly in the user thread context, to see
> the improvements.


It will then looks more like what virtio-net (and other networking 
device) did.


>
>> Maybe we want more VQs. Would help improve parallelism. The question
>> would then become how to map sockets to VQs. With a simple hash
>> it's easy to create collisions ...
> Yes, more VQs can help but the map question is not simple to answer.
> Maybe we can do an hash on the (cid, port) or do some kind of estimation
> of queue utilization and try to balance.
> Should the mapping be unique?


It sounds to me you want some kind of fair queuing? We've already had 
several qdiscs that do this.

So if we use the kernel networking xmit path, all those issues could be 
addressed.

Thanks

>

^ permalink raw reply

* [PATCH v2] vrf: make sure skb->data contains ip header to make routing
From: Peter Kosyh @ 2019-07-19  8:11 UTC (permalink / raw)
  To: David Ahern; +Cc: davem, Peter Kosyh, Shrijeet Mukherjee, netdev, linux-kernel
In-Reply-To: <213bada2-fe81-3c14-1506-11abf0f3ca22@cumulusnetworks.com>

vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
using ip/ipv6 addresses, but don't make sure the header is available
in skb->data[] (skb_headlen() is less then header size).

Case:

1) igb driver from intel.
2) Packet size is greater then 255.
3) MPLS forwards to VRF device.

So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound()
functions.

Signed-off-by: Peter Kosyh <p.kosyh@gmail.com>
---
 drivers/net/vrf.c | 58 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 54edf8956a25..6e84328bdd40 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -165,23 +165,29 @@ static int vrf_ip6_local_out(struct net *net, struct sock *sk,
 static netdev_tx_t vrf_process_v6_outbound(struct sk_buff *skb,
 					   struct net_device *dev)
 {
-	const struct ipv6hdr *iph = ipv6_hdr(skb);
+	const struct ipv6hdr *iph;
 	struct net *net = dev_net(skb->dev);
-	struct flowi6 fl6 = {
-		/* needed to match OIF rule */
-		.flowi6_oif = dev->ifindex,
-		.flowi6_iif = LOOPBACK_IFINDEX,
-		.daddr = iph->daddr,
-		.saddr = iph->saddr,
-		.flowlabel = ip6_flowinfo(iph),
-		.flowi6_mark = skb->mark,
-		.flowi6_proto = iph->nexthdr,
-		.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF,
-	};
+	struct flowi6 fl6;
 	int ret = NET_XMIT_DROP;
 	struct dst_entry *dst;
 	struct dst_entry *dst_null = &net->ipv6.ip6_null_entry->dst;
 
+	if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct ipv6hdr)))
+		goto err;
+
+	iph = ipv6_hdr(skb);
+
+	memset(&fl6, 0, sizeof(fl6));
+	/* needed to match OIF rule */
+	fl6.flowi6_oif = dev->ifindex;
+	fl6.flowi6_iif = LOOPBACK_IFINDEX;
+	fl6.daddr = iph->daddr;
+	fl6.saddr = iph->saddr;
+	fl6.flowlabel = ip6_flowinfo(iph);
+	fl6.flowi6_mark = skb->mark;
+	fl6.flowi6_proto = iph->nexthdr;
+	fl6.flowi6_flags = FLOWI_FLAG_SKIP_NH_OIF;
+
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (dst == dst_null)
 		goto err;
@@ -237,21 +243,27 @@ static int vrf_ip_local_out(struct net *net, struct sock *sk,
 static netdev_tx_t vrf_process_v4_outbound(struct sk_buff *skb,
 					   struct net_device *vrf_dev)
 {
-	struct iphdr *ip4h = ip_hdr(skb);
+	struct iphdr *ip4h;
 	int ret = NET_XMIT_DROP;
-	struct flowi4 fl4 = {
-		/* needed to match OIF rule */
-		.flowi4_oif = vrf_dev->ifindex,
-		.flowi4_iif = LOOPBACK_IFINDEX,
-		.flowi4_tos = RT_TOS(ip4h->tos),
-		.flowi4_flags = FLOWI_FLAG_ANYSRC | FLOWI_FLAG_SKIP_NH_OIF,
-		.flowi4_proto = ip4h->protocol,
-		.daddr = ip4h->daddr,
-		.saddr = ip4h->saddr,
-	};
+	struct flowi4 fl4;
 	struct net *net = dev_net(vrf_dev);
 	struct rtable *rt;
 
+	if (!pskb_may_pull(skb, ETH_HLEN + sizeof(struct iphdr)))
+		goto err;
+
+	ip4h = ip_hdr(skb);
+
+	memset(&fl4, 0, sizeof(fl4));
+	/* needed to match OIF rule */
+	fl4.flowi4_oif = vrf_dev->ifindex;
+	fl4.flowi4_iif = LOOPBACK_IFINDEX;
+	fl4.flowi4_tos = RT_TOS(ip4h->tos);
+	fl4.flowi4_flags = FLOWI_FLAG_ANYSRC | FLOWI_FLAG_SKIP_NH_OIF;
+	fl4.flowi4_proto = ip4h->protocol;
+	fl4.daddr = ip4h->daddr;
+	fl4.saddr = ip4h->saddr;
+
 	rt = ip_route_output_flow(net, &fl4, NULL);
 	if (IS_ERR(rt))
 		goto err;
-- 
2.11.0


^ permalink raw reply related

* Re: [PATCH v4 4/5] vhost/vsock: split packets to send using multiple buffers
From: Stefano Garzarella @ 2019-07-19  8:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Stefan Hajnoczi, David S. Miller,
	virtualization, Jason Wang, kvm
In-Reply-To: <20190718072741-mutt-send-email-mst@kernel.org>

On Thu, Jul 18, 2019 at 07:35:46AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 18, 2019 at 11:37:30AM +0200, Stefano Garzarella wrote:
> > On Thu, Jul 18, 2019 at 10:13 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Thu, Jul 18, 2019 at 09:50:14AM +0200, Stefano Garzarella wrote:
> > > > On Wed, Jul 17, 2019 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > On Wed, Jul 17, 2019 at 01:30:29PM +0200, Stefano Garzarella wrote:
> > > > > > If the packets to sent to the guest are bigger than the buffer
> > > > > > available, we can split them, using multiple buffers and fixing
> > > > > > the length in the packet header.
> > > > > > This is safe since virtio-vsock supports only stream sockets.
> > > > > >
> > > > > > Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> > > > >
> > > > > So how does it work right now? If an app
> > > > > does sendmsg with a 64K buffer and the other
> > > > > side publishes 4K buffers - does it just stall?
> > > >
> > > > Before this series, the 64K (or bigger) user messages was split in 4K packets
> > > > (fixed in the code) and queued in an internal list for the TX worker.
> > > >
> > > > After this series, we will queue up to 64K packets and then it will be split in
> > > > the TX worker, depending on the size of the buffers available in the
> > > > vring. (The idea was to allow EWMA or a configuration of the buffers size, but
> > > > for now we postponed it)
> > >
> > > Got it. Using workers for xmit is IMHO a bad idea btw.
> > > Why is it done like this?
> > 
> > Honestly, I don't know the exact reasons for this design, but I suppose
> > that the idea was to have only one worker that uses the vring, and
> > multiple user threads that enqueue packets in the list.
> > This can simplify the code and we can put the user threads to sleep if
> > we don't have "credit" available (this means that the receiver doesn't
> > have space to receive the packet).
> 
> 
> I think you mean the reverse: even without credits you can copy from
> user and queue up data, then process it without waking up the user
> thread.

I checked the code better, but it doesn't seem to do that.
The .sendmsg callback of af_vsock, check if the transport has space
(virtio-vsock transport returns the credit available). If there is no
space, it put the thread to sleep on the 'sk_sleep(sk)' wait_queue.

When the transport receives an update of credit available on the other
peer, it calls 'sk->sk_write_space(sk)' that wakes up the thread
sleeping, that will queue the new packet.

So, in the current implementation, the TX worker doesn't check the
credit available, it only sends the packets.

> Does it help though? It certainly adds up work outside of
> user thread context which means it's not accounted for
> correctly.

I can try to xmit the packet directly in the user thread context, to see
the improvements.

> 
> Maybe we want more VQs. Would help improve parallelism. The question
> would then become how to map sockets to VQs. With a simple hash
> it's easy to create collisions ...

Yes, more VQs can help but the map question is not simple to answer.
Maybe we can do an hash on the (cid, port) or do some kind of estimation
of queue utilization and try to balance.
Should the mapping be unique?

> 
> 
> > 
> > What are the drawbacks in your opinion?
> > 
> > 
> > Thanks,
> > Stefano
> 
> - More pressure on scheduler
> - Increased latency
> 

Thanks,
Stefano

^ permalink raw reply

* Re: [PATCH net,v4 4/4] net: flow_offload: add flow_block structure and use it
From: Jiri Pirko @ 2019-07-19  8:01 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev, jakub.kicinski, pshelar
In-Reply-To: <20190718175931.13529-5-pablo@netfilter.org>

Thu, Jul 18, 2019 at 07:59:31PM CEST, pablo@netfilter.org wrote:
>This object stores the flow block callbacks that are attached to this
>block. Update flow_block_cb_lookup() to take this new object.
>
>This patch restores the block sharing feature.
>
>Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions")
>Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH net,v4 1/4] net: openvswitch: rename flow_stats to sw_flow_stats
From: Jiri Pirko @ 2019-07-19  7:58 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev, jakub.kicinski, pshelar
In-Reply-To: <20190718175931.13529-2-pablo@netfilter.org>

Thu, Jul 18, 2019 at 07:59:28PM CEST, pablo@netfilter.org wrote:
>There is a flow_stats structure defined in include/net/flow_offload.h
>and a follow up patch adds #include <net/flow_offload.h> to
>net/sch_generic.h.
>
>This breaks compilation since OVS codebase includes net/sock.h which
>pulls in linux/filter.h which includes net/sch_generic.h.
>
>In file included from ./include/net/sch_generic.h:18:0,
>                 from ./include/linux/filter.h:25,
>                 from ./include/net/sock.h:59,
>                 from ./include/linux/tcp.h:19,
>                 from net/openvswitch/datapath.c:24
>
>This definition takes precedence to OVS since it is placed in the
>networking core, so rename flow_stats in OVS to sw_flow_stats since
>this structure is contained in the sw_flow object.
>
>Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply

* Re: [PATCH AUTOSEL 5.2 226/249] selftests: bpf: fix inlines in test_lwt_seg6local
From: Jiri Benc @ 2019-07-19  7:54 UTC (permalink / raw)
  To: David Miller
  Cc: sashal, linux-kernel, stable, yhs, daniel, linux-kselftest,
	netdev, bpf, clang-built-linux
In-Reply-To: <20190718.115534.1778444973119064345.davem@davemloft.net>

On Thu, 18 Jul 2019 11:55:34 -0700 (PDT), David Miller wrote:
> It has a significant impact on automated testing which lots of
> individuals and entities perform, therefore I think it very much is
> -stable material.

Okay.

Thanks,

 Jiri

^ permalink raw reply

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-19  7:51 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org
  Cc: Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai,
	linux-tegra
In-Reply-To: <6a6bac84-1d29-2740-1636-d3adb26b6bcc@nvidia.com>

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/18/2019, 10:16:20 (UTC+00:00)

> Have you tried using NFS on a board with this ethernet controller?

I'm having some issues setting up the NFS server in order to replicate 
so this may take some time.

Are you able to add some debug in stmmac_init_rx_buffers() to see what's 
the buffer address ?

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* [PATCH] net: lan78xx: Merge memcpy + lexx_to_cpus to get_unaligned_lexx
From: Chuhong Yuan @ 2019-07-19  7:36 UTC (permalink / raw)
  Cc: Woojung Huh, Microchip Linux Driver Support, David S . Miller,
	netdev, linux-usb, linux-kernel, Chuhong Yuan

Merge the combo use of memcpy and lexx_to_cpus.
Use get_unaligned_lexx instead.
This simplifies the code.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
---
 drivers/net/usb/lan78xx.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 3d92ea6fcc02..9c33b35bd155 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -1258,8 +1258,7 @@ static void lan78xx_status(struct lan78xx_net *dev, struct urb *urb)
 		return;
 	}
 
-	memcpy(&intdata, urb->transfer_buffer, 4);
-	le32_to_cpus(&intdata);
+	intdata = get_unaligned_le32(urb->transfer_buffer);
 
 	if (intdata & INT_ENP_PHY_INT) {
 		netif_dbg(dev, link, dev->net, "PHY INTR: 0x%08x\n", intdata);
@@ -3105,16 +3104,13 @@ static int lan78xx_rx(struct lan78xx_net *dev, struct sk_buff *skb)
 		struct sk_buff *skb2;
 		unsigned char *packet;
 
-		memcpy(&rx_cmd_a, skb->data, sizeof(rx_cmd_a));
-		le32_to_cpus(&rx_cmd_a);
+		rx_cmd_a = get_unaligned_le32(skb->data);
 		skb_pull(skb, sizeof(rx_cmd_a));
 
-		memcpy(&rx_cmd_b, skb->data, sizeof(rx_cmd_b));
-		le32_to_cpus(&rx_cmd_b);
+		rx_cmd_b = get_unaligned_le32(skb->data);
 		skb_pull(skb, sizeof(rx_cmd_b));
 
-		memcpy(&rx_cmd_c, skb->data, sizeof(rx_cmd_c));
-		le16_to_cpus(&rx_cmd_c);
+		rx_cmd_c = get_unaligned_le16(skb->data);
 		skb_pull(skb, sizeof(rx_cmd_c));
 
 		packet = skb->data;
-- 
2.20.1


^ permalink raw reply related

* Re: [RFC PATCH 0/5] PTP: add support for Intel's TGPIO controller
From: Felipe Balbi @ 2019-07-19  7:35 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Richard Cochran, netdev, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, x86, linux-kernel,
	Christopher S . Hall
In-Reply-To: <20190718195040.GL25635@lunn.ch>

[-- Attachment #1: Type: text/plain, Size: 1019 bytes --]


Hi,

Andrew Lunn <andrew@lunn.ch> writes:
> On Tue, Jul 16, 2019 at 10:20:33AM +0300, Felipe Balbi wrote:
>> TGPIO is a new IP which allows for time synchronization between systems
>> without any other means of synchronization such as PTP or NTP. The
>> driver is implemented as part of the PTP framework since its features
>> covered most of what this controller can do.
>
> Hi Felipe
>
> Given the name TGPIO, can it also be used for plain old boring GPIO?

not really, no. This is a misnomer, IMHO :-) We can only assert output
pulses at specified intervals or capture a timestamp of an external
signal.

> Does there need to be some sort of mux between GPIO and TGPIO? And an
> interface into the generic GPIO core?

no

> Also, is this always embedded into a SoC? Or could it actually be in a
> discrete NIC?

Technically, this could be done as a discrete, but it isn't. In any
case, why does that matter? From a linux-point of view, we have a device
driver either way.

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* BUG: unable to handle kernel paging request in corrupted (2)
From: syzbot @ 2019-07-19  7:28 UTC (permalink / raw)
  To: linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    49d05fe2 ipv6: rt6_check should return NULL if 'from' is N..
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=104b5f70600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=87305c3ca9c25c70
dashboard link: https://syzkaller.appspot.com/bug?extid=08b7a2c58acdfa12c82d
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=143a78f4600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+08b7a2c58acdfa12c82d@syzkaller.appspotmail.com

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
BUG: unable to handle page fault for address: 00000000ffffffff
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 9ad32067 P4D 9ad32067 PUD 0
Oops: 0010 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9920 Comm: syz-executor.1 Not tainted 5.2.0+ #91
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
BUG: kernel NULL pointer dereference, address: 0000000000000002
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 9ad32067 P4D 9ad32067 PUD 9ad33067 PMD 0
Oops: 0010 [#2] PREEMPT SMP KASAN
CPU: 0 PID: 9920 Comm: syz-executor.1 Not tainted 5.2.0+ #91
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:0x2
Code: Bad RIP value.
RSP: 0000:ffff888092932a20 EFLAGS: 00010086
RAX: 000000000000002d RBX: ffff888092932a40 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff815c1016 RDI: ffffed1012526536
RBP: ffffffff81724d28 R08: 000000000000002d R09: ffffed1015d044fa
R10: ffffed1015d044f9 R11: ffff8880ae8227cf R12: ffffffff81b3e334
R13: 0000000000000010 R14: 0000000000000000 R15: 1ffff1101252654b
FS:  000055555572a940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd8 CR3: 000000009c4d1000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH] [net-next] netfilter: bridge: make NF_TABLES_BRIDGE tristate
From: Arnd Bergmann @ 2019-07-19  6:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jozsef Kadlecsik, Florian Westphal, Roopa Prabhu,
	Nikolay Aleksandrov, David S. Miller, wenxu, netfilter-devel,
	coreteam, bridge, Networking, Linux Kernel Mailing List
In-Reply-To: <20190719063749.45io5pxcxrlmrqqn@salvia>

On Fri, Jul 19, 2019 at 8:37 AM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> On Thu, Jul 18, 2019 at 09:01:10PM +0200, Pablo Neira Ayuso wrote:
> > On Wed, Jul 10, 2019 at 10:08:20AM +0200, Arnd Bergmann wrote:
> > > The new nft_meta_bridge code fails to link as built-in when NF_TABLES
> > > is a loadable module.
> > >
> > > net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_eval':
> > > nft_meta_bridge.c:(.text+0x1e8): undefined reference to `nft_meta_get_eval'
> > > net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_init':
> > > nft_meta_bridge.c:(.text+0x468): undefined reference to `nft_meta_get_init'
> > > nft_meta_bridge.c:(.text+0x49c): undefined reference to `nft_parse_register'
> > > nft_meta_bridge.c:(.text+0x4cc): undefined reference to `nft_validate_register_store'
> > > net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_exit':
> > > nft_meta_bridge.c:(.exit.text+0x14): undefined reference to `nft_unregister_expr'
> > > net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_init':
> > > nft_meta_bridge.c:(.init.text+0x14): undefined reference to `nft_register_expr'
> > > net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x60): undefined reference to `nft_meta_get_dump'
> > > net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x88): undefined reference to `nft_meta_set_eval'
> > >
> > > This can happen because the NF_TABLES_BRIDGE dependency itself is just a
> > > 'bool'.  Make the symbol a 'tristate' instead so Kconfig can propagate the
> > > dependencies correctly.
> >
> > Hm. Something breaks here. Investigating. Looks like bridge support is
> > gone after this, nft fails to register the filter chain type:
> >
> > # nft add table bridge x
> > # nft add chain bridge x y { type filter hook input priority 0\; }
> > Error: Could not process rule: No such file or directory
>
> Found it. It seems this patch is needed, on top of your patch.

Right, makes sense.

> I can just squash this chunk into your original patch and push it out
> if you're OK witht this.

Yes, please do.

      Arnd

^ permalink raw reply

* Re: [PATCH] [net-next] netfilter: bridge: make NF_TABLES_BRIDGE tristate
From: Pablo Neira Ayuso @ 2019-07-19  6:37 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jozsef Kadlecsik, Florian Westphal, Roopa Prabhu,
	Nikolay Aleksandrov, David S. Miller, wenxu, netfilter-devel,
	coreteam, bridge, netdev, linux-kernel
In-Reply-To: <20190718190110.akn54iwb2mui72cd@salvia>

[-- Attachment #1: Type: text/plain, Size: 1940 bytes --]

On Thu, Jul 18, 2019 at 09:01:10PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Jul 10, 2019 at 10:08:20AM +0200, Arnd Bergmann wrote:
> > The new nft_meta_bridge code fails to link as built-in when NF_TABLES
> > is a loadable module.
> > 
> > net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_eval':
> > nft_meta_bridge.c:(.text+0x1e8): undefined reference to `nft_meta_get_eval'
> > net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_init':
> > nft_meta_bridge.c:(.text+0x468): undefined reference to `nft_meta_get_init'
> > nft_meta_bridge.c:(.text+0x49c): undefined reference to `nft_parse_register'
> > nft_meta_bridge.c:(.text+0x4cc): undefined reference to `nft_validate_register_store'
> > net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_exit':
> > nft_meta_bridge.c:(.exit.text+0x14): undefined reference to `nft_unregister_expr'
> > net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_init':
> > nft_meta_bridge.c:(.init.text+0x14): undefined reference to `nft_register_expr'
> > net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x60): undefined reference to `nft_meta_get_dump'
> > net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x88): undefined reference to `nft_meta_set_eval'
> > 
> > This can happen because the NF_TABLES_BRIDGE dependency itself is just a
> > 'bool'.  Make the symbol a 'tristate' instead so Kconfig can propagate the
> > dependencies correctly.
> 
> Hm. Something breaks here. Investigating. Looks like bridge support is
> gone after this, nft fails to register the filter chain type:
> 
> # nft add table bridge x
> # nft add chain bridge x y { type filter hook input priority 0\; }
> Error: Could not process rule: No such file or directory

Found it. It seems this patch is needed, on top of your patch.

I can just squash this chunk into your original patch and push it out
if you're OK witht this.

Thanks.

[-- Attachment #2: x.patch --]
[-- Type: text/x-diff, Size: 523 bytes --]

diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index 3fd540b2c6ba..b5d5d071d765 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -193,7 +193,7 @@ static inline void nft_chain_filter_inet_init(void) {}
 static inline void nft_chain_filter_inet_fini(void) {}
 #endif /* CONFIG_NF_TABLES_IPV6 */
 
-#ifdef CONFIG_NF_TABLES_BRIDGE
+#if IS_ENABLED(CONFIG_NF_TABLES_BRIDGE)
 static unsigned int
 nft_do_chain_bridge(void *priv,
 		    struct sk_buff *skb,

^ permalink raw reply related

* [PATCH] net-ipv6-ndisc: add support for RFC7710 RA Captive Portal Identifier
From: Maciej Żenczykowski @ 2019-07-19  6:30 UTC (permalink / raw)
  To: Maciej Żenczykowski, David S . Miller
  Cc: netdev, Lorenzo Colitti, Remin Nguyen Van, Alexey I . Froloff

From: Maciej Żenczykowski <maze@google.com>

This is trivial since we already have support for the entirely
identical (from the kernel's point of view) RDNSS and DNSSL that
also contain opaque data that needs to be passed down to userspace.

As specified in RFC7710, Captive Portal option contains a URL.
8-bit identifier of the option type as assigned by the IANA is 37.
This option should also be treated as userland.

Hence, treat ND option 37 as userland (Captive Portal support)

See:
  https://tools.ietf.org/html/rfc7710
  https://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml

Fixes: e35f30c131a56
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Remin Nguyen Van <reminv@google.com>
Cc: Alexey I. Froloff <raorn@raorn.name>
---
 include/net/ndisc.h | 1 +
 net/ipv6/ndisc.c    | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 366150053043..b2f715ca0567 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -40,6 +40,7 @@ enum {
 	ND_OPT_RDNSS = 25,		/* RFC5006 */
 	ND_OPT_DNSSL = 31,		/* RFC6106 */
 	ND_OPT_6CO = 34,		/* RFC6775 */
+	ND_OPT_CAPTIVE_PORTAL = 37,	/* RFC7710 */
 	__ND_OPT_MAX
 };

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 083cc1c94cd3..53caf59c591e 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -196,6 +196,7 @@ static inline int ndisc_is_useropt(const struct net_device *dev,
 {
 	return opt->nd_opt_type == ND_OPT_RDNSS ||
 		opt->nd_opt_type == ND_OPT_DNSSL ||
+		opt->nd_opt_type == ND_OPT_CAPTIVE_PORTAL ||
 		ndisc_ops_is_useropt(dev, opt->nd_opt_type);
 }

-- 
2.22.0.657.g960e92d24f-goog

^ permalink raw reply related

* Re: [PATCH] net: ethernet: mediatek: Add MT7628/88 SoC support
From: kbuild test robot @ 2019-07-19  5:52 UTC (permalink / raw)
  To: Stefan Roese
  Cc: kbuild-all, netdev, linux-mediatek, René van Dorst,
	Sean Wang, Felix Fietkau, John Crispin
In-Reply-To: <20190717110243.14240-1-sr@denx.de>

[-- Attachment #1: Type: text/plain, Size: 1267 bytes --]

Hi Stefan,

I love your patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on next-20190718]
[cannot apply to v5.2]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Stefan-Roese/net-ethernet-mediatek-Add-MT7628-88-SoC-support/20190719-020931
config: arm64-allmodconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 7.4.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.4.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):


vim +649 drivers/net//ethernet/mediatek/mtk_eth_soc.c

   646	
   647	static int txd_to_idx(struct mtk_tx_ring *ring, struct mtk_tx_dma *dma)
   648	{
 > 649		return ((u32)dma - (u32)ring->dma) / sizeof(*dma);
   650	}
   651	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 66257 bytes --]

^ permalink raw reply

* KASAN: slab-out-of-bounds Write in check_noncircular
From: syzbot @ 2019-07-19  5:48 UTC (permalink / raw)
  To: ast, daniel, john.fastabend, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    22051d9c Merge tag 'platform-drivers-x86-v5.3-2' of git://..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14090a34600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=135cb826ac59d7fc
dashboard link: https://syzkaller.appspot.com/bug?extid=e2416b38b581ad58bc1e
compiler:       clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1397afe0600000

The bug was bisected to:

commit e9db4ef6bf4ca9894bb324c76e01b8f1a16b2650
Author: John Fastabend <john.fastabend@gmail.com>
Date:   Sat Jun 30 13:17:47 2018 +0000

     bpf: sockhash fix omitted bucket lock in sock_close

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=131928f4600000
final crash:    https://syzkaller.appspot.com/x/report.txt?x=109928f4600000
console output: https://syzkaller.appspot.com/x/log.txt?x=171928f4600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e2416b38b581ad58bc1e@syzkaller.appspotmail.com
Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close")

==================================================================
BUG: KASAN: slab-out-of-bounds in check_noncircular+0x91/0x560  
/kernel/locking/lockdep.c:1722
Write of size 56 at addr ffff88809752a1a0 by task syz-executor.2/9504

CPU: 1 PID: 9504 Comm: syz-executor.2 Not tainted 5.2.0+ #34
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:

Allocated by task 2258096832:
------------[ cut here ]------------
kernel BUG at mm/slab.c:4179!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 9504 Comm: syz-executor.2 Not tainted 5.2.0+ #34
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:__check_heap_object+0xcb/0xd0 /mm/slab.c:4203
Code: 4c 89 d1 4d 89 c8 e8 34 a6 07 00 5b 41 5e 5d c3 49 8b 73 58 41 0f b6  
d0 48 c7 c7 0f eb 7e 88 4c 89 d1 4d 89 c8 e8 d5 a6 07 00 <0f> 0b 0f 1f 00  
55 48 89 e5 53 48 83 ff 10 0f 84 90 00 00 00 48 85
RSP: 0018:ffff8880975297a0 EFLAGS: 00010046
RAX: 0000000000000fc5 RBX: 00000000000011e0 RCX: 000000000000000c
RDX: 000000000000000c RSI: 0000000000000002 RDI: 0000000000000001
RBP: ffff8880975297b0 R08: 0000000000000000 R09: fffff940004ba941
R10: ffff8880975298a0 R11: ffff8880aa5918c0 R12: ffff8880975298a2
R13: 01fffc0000010200 R14: ffff8880975286c0 R15: ffff8880975298a0
FS:  0000555556816940(0000) GS:ffff8880aeb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff8aff8f88 CR3: 000000008586d000 CR4: 00000000001406e0
Call Trace:
Modules linked in:
---[ end trace 35842f070e95906d ]---
RIP: 0010:__check_heap_object+0xcb/0xd0 /mm/slab.c:4203
Code: 4c 89 d1 4d 89 c8 e8 34 a6 07 00 5b 41 5e 5d c3 49 8b 73 58 41 0f b6  
d0 48 c7 c7 0f eb 7e 88 4c 89 d1 4d 89 c8 e8 d5 a6 07 00 <0f> 0b 0f 1f 00  
55 48 89 e5 53 48 83 ff 10 0f 84 90 00 00 00 48 85
RSP: 0018:ffff8880975297a0 EFLAGS: 00010046
RAX: 0000000000000fc5 RBX: 00000000000011e0 RCX: 000000000000000c
RDX: 000000000000000c RSI: 0000000000000002 RDI: 0000000000000001
RBP: ffff8880975297b0 R08: 0000000000000000 R09: fffff940004ba941
R10: ffff8880975298a0 R11: ffff8880aa5918c0 R12: ffff8880975298a2
R13: 01fffc0000010200 R14: ffff8880975286c0 R15: ffff8880975298a0
FS:  0000555556816940(0000) GS:ffff8880aeb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffff8aff8f88 CR3: 000000008586d000 CR4: 00000000001406e0


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH net] be2net: Synchronize be_update_queues with dev_watchdog
From: Benjamin Poirier @ 2019-07-19  5:26 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: David Miller, Ajit Khaparde, Sathya Perla, Somnath Kotur,
	Sriharsha Basavapatna, Saeed Mahameed, Firo Yang, netdev
In-Reply-To: <42269a37-0353-29c8-ce13-51cb2feeb9af@gmail.com>

On 2019/07/18 10:23, Florian Fainelli wrote:
> On 7/17/19 6:42 PM, Benjamin Poirier wrote:
> > As pointed out by Firo Yang, a netdev tx timeout may trigger just before an
> > ethtool set_channels operation is started. be_tx_timeout(), which dumps
> > some queue structures, is not written to run concurrently with
> > be_update_queues(), which frees/allocates those queues structures. Add some
> > synchronization between the two.
> > 
> > Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com>
> > Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
> 
> Would not moving the netif_tx_disable() in be_close() further up in the
> function resolve that problem as well?

Thanks for your review Florian,

No, netif_tx_disable() doesn't provide mutual exclusion with
dev_watchdog(). You can have:

cpu0                               cpu1
\ dev_watchdog
       \ netif_tx_lock
              \ be_tx_timeout
                     ...
                                   \ be_set_channels
                                          \ be_update_queues
                                                 \ netif_carrier_off
                                                 \ netif_tx_disable
                                                 ...
                                                 \ be_clear_queues
                     still running in
                     be_tx_timeout(),
                     boom!

^ permalink raw reply

* Re: [PATCH iproute2 net-next v5 1/5] etf: Add skip_sock_check
From: Stephen Hemminger @ 2019-07-19  5:12 UTC (permalink / raw)
  To: Vedang Patel
  Cc: netdev, jhs, xiyou.wangcong, jiri, vinicius.gomes,
	leandro.maciel.dorileo, jakub.kicinski, m-karicheri2, dsahern
In-Reply-To: <1563479743-8371-1-git-send-email-vedang.patel@intel.com>

On Thu, 18 Jul 2019 12:55:39 -0700
Vedang Patel <vedang.patel@intel.com> wrote:

> -	print_string(PRINT_ANY, "deadline_mode", "deadline_mode %s",
> +	print_string(PRINT_ANY, "deadline_mode", "deadline_mode %s ",
>  				(qopt->flags & TC_ETF_DEADLINE_MODE_ON) ? "on" : "off");
> +	print_string(PRINT_ANY, "skip_sock_check", "skip_sock_check %s",
> +				(qopt->flags & TC_ETF_SKIP_SOCK_CHECK) ? "on" : "off");

These should really be boolean options in JSON, not string values.

^ permalink raw reply

* [PATCH AUTOSEL 5.2 143/171] rds: Accept peer connection reject messages due to incompatible version
From: Sasha Levin @ 2019-07-19  3:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Gerd Rausch, Zhu Yanjun, Santosh Shilimkar, Sasha Levin, netdev,
	linux-rdma
In-Reply-To: <20190719035643.14300-1-sashal@kernel.org>

From: Gerd Rausch <gerd.rausch@oracle.com>

[ Upstream commit 8c6166cfc9cd48e93d9176561e50b63cef4330d5 ]

Prior to
commit d021fabf525ff ("rds: rdma: add consumer reject")

function "rds_rdma_cm_event_handler_cmn" would always honor a rejected
connection attempt by issuing a "rds_conn_drop".

The commit mentioned above added a "break", eliminating
the "fallthrough" case and made the "rds_conn_drop" rather conditional:

Now it only happens if a "consumer defined" reject (i.e. "rdma_reject")
carries an integer-value of "1" inside "private_data":

  if (!conn)
    break;
    err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
    if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
      pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
              &conn->c_laddr, &conn->c_faddr);
              conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
              rds_conn_drop(conn);
    }
    rdsdebug("Connection rejected: %s\n",
             rdma_reject_msg(cm_id, event->status));
    break;
    /* FALLTHROUGH */
A number of issues are worth mentioning here:
   #1) Previous versions of the RDS code simply rejected a connection
       by calling "rdma_reject(cm_id, NULL, 0);"
       So the value of the payload in "private_data" will not be "1",
       but "0".

   #2) Now the code has become dependent on host byte order and sizing.
       If one peer is big-endian, the other is little-endian,
       or there's a difference in sizeof(int) (e.g. ILP64 vs LP64),
       the *err check does not work as intended.

   #3) There is no check for "len" to see if the data behind *err is even valid.
       Luckily, it appears that the "rdma_reject(cm_id, NULL, 0)" will always
       carry 148 bytes of zeroized payload.
       But that should probably not be relied upon here.

   #4) With the added "break;",
       we might as well drop the misleading "/* FALLTHROUGH */" comment.

This commit does _not_ address issue #2, as the sender would have to
agree on a byte order as well.

Here is the sequence of messages in this observed error-scenario:
   Host-A is pre-QoS changes (excluding the commit mentioned above)
   Host-B is post-QoS changes (including the commit mentioned above)

   #1 Host-B
      issues a connection request via function "rds_conn_path_transition"
      connection state transitions to "RDS_CONN_CONNECTING"

   #2 Host-A
      rejects the incompatible connection request (from #1)
      It does so by calling "rdma_reject(cm_id, NULL, 0);"

   #3 Host-B
      receives an "RDMA_CM_EVENT_REJECTED" event (from #2)
      But since the code is changed in the way described above,
      it won't drop the connection here, simply because "*err == 0".

   #4 Host-A
      issues a connection request

   #5 Host-B
      receives an "RDMA_CM_EVENT_CONNECT_REQUEST" event
      and ends up calling "rds_ib_cm_handle_connect".
      But since the state is already in "RDS_CONN_CONNECTING"
      (as of #1) it will end up issuing a "rdma_reject" without
      dropping the connection:
         if (rds_conn_state(conn) == RDS_CONN_CONNECTING) {
             /* Wait and see - our connect may still be succeeding */
             rds_ib_stats_inc(s_ib_connect_raced);
         }
         goto out;

   #6 Host-A
      receives an "RDMA_CM_EVENT_REJECTED" event (from #5),
      drops the connection and tries again (goto #4) until it gives up.

Tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/rds/rdma_transport.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index 46bce8389066..9db455d02255 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -112,7 +112,9 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
 		if (!conn)
 			break;
 		err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
-		if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
+		if (!err ||
+		    (err && len >= sizeof(*err) &&
+		     ((*err) <= RDS_RDMA_REJ_INCOMPAT))) {
 			pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
 				&conn->c_laddr, &conn->c_faddr);
 			conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
@@ -122,7 +124,6 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
 		rdsdebug("Connection rejected: %s\n",
 			 rdma_reject_msg(cm_id, event->status));
 		break;
-		/* FALLTHROUGH */
 	case RDMA_CM_EVENT_ADDR_ERROR:
 	case RDMA_CM_EVENT_ROUTE_ERROR:
 	case RDMA_CM_EVENT_CONNECT_ERROR:
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.2 150/171] net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn
From: Sasha Levin @ 2019-07-19  3:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Aya Levin, Feras Daoud, Saeed Mahameed, Sasha Levin, netdev,
	linux-rdma
In-Reply-To: <20190719035643.14300-1-sashal@kernel.org>

From: Aya Levin <ayal@mellanox.com>

[ Upstream commit ef1ce7d7b67b46661091c7ccc0396186b7a247ef ]

Check return value from mlx5e_attach_netdev, add error path on failure.

Fixes: 48935bbb7ae8 ("net/mlx5e: IPoIB, Add netdevice profile skeleton")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 9ca492b430d8..603d294757b4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -698,7 +698,9 @@ static int mlx5_rdma_setup_rn(struct ib_device *ibdev, u8 port_num,
 
 	prof->init(mdev, netdev, prof, ipriv);
 
-	mlx5e_attach_netdev(epriv);
+	err = mlx5e_attach_netdev(epriv);
+	if (err)
+		goto detach;
 	netif_carrier_off(netdev);
 
 	/* set rdma_netdev func pointers */
@@ -714,6 +716,11 @@ static int mlx5_rdma_setup_rn(struct ib_device *ibdev, u8 port_num,
 
 	return 0;
 
+detach:
+	prof->cleanup(epriv);
+	if (ipriv->sub_interface)
+		return err;
+	mlx5e_destroy_mdev_resources(mdev);
 destroy_ht:
 	mlx5i_pkey_qpn_ht_cleanup(netdev);
 	return err;
-- 
2.20.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.2 155/171] net/mlx5: E-Switch, Fix default encap mode
From: Sasha Levin @ 2019-07-19  3:56 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Maor Gottlieb, Roi Dayan, Saeed Mahameed, Sasha Levin, netdev,
	linux-rdma
In-Reply-To: <20190719035643.14300-1-sashal@kernel.org>

From: Maor Gottlieb <maorg@mellanox.com>

[ Upstream commit 9a64144d683a4395f57562d90247c61a0bf5105f ]

Encap mode is related to switchdev mode only. Move the init of
the encap mode to eswitch_offloads. Before this change, we reported
that eswitch supports encap, even tough the device was in non
SRIOV mode.

Fixes: 7768d1971de67 ('net/mlx5: E-Switch, Add control for encapsulation')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c          | 5 -----
 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 7 +++++++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 6a921e24cd5e..e9339e7d6a18 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1882,11 +1882,6 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 	esw->enabled_vports = 0;
 	esw->mode = SRIOV_NONE;
 	esw->offloads.inline_mode = MLX5_INLINE_MODE_NONE;
-	if (MLX5_CAP_ESW_FLOWTABLE_FDB(dev, reformat) &&
-	    MLX5_CAP_ESW_FLOWTABLE_FDB(dev, decap))
-		esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_BASIC;
-	else
-		esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE;
 
 	dev->priv.eswitch = esw;
 	return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 47b446d30f71..c2beadc41c40 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -1840,6 +1840,12 @@ int esw_offloads_init(struct mlx5_eswitch *esw, int vf_nvports,
 {
 	int err;
 
+	if (MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, reformat) &&
+	    MLX5_CAP_ESW_FLOWTABLE_FDB(esw->dev, decap))
+		esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_BASIC;
+	else
+		esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE;
+
 	err = esw_offloads_steering_init(esw, vf_nvports, total_nvports);
 	if (err)
 		return err;
@@ -1901,6 +1907,7 @@ void esw_offloads_cleanup(struct mlx5_eswitch *esw)
 	esw_offloads_devcom_cleanup(esw);
 	esw_offloads_unload_all_reps(esw, num_vfs);
 	esw_offloads_steering_cleanup(esw);
+	esw->offloads.encap = DEVLINK_ESWITCH_ENCAP_MODE_NONE;
 }
 
 static int esw_mode_from_devlink(u16 mode, u16 *mlx5_mode)
-- 
2.20.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox