Netdev List
 help / color / mirror / Atom feed
* Re: [virtio-dev] Re: [RFC] vhost: introduce mdev based hardware vhost backend
From: Jason Wang @ 2018-04-11  2:08 UTC (permalink / raw)
  To: Liang, Cunming, Paolo Bonzini, Bie, Tiwei
  Cc: mst@redhat.com, alex.williamson@redhat.com, ddutile@redhat.com,
	Duyck, Alexander H, virtio-dev@lists.oasis-open.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	Daly, Dan, Wang, Zhihong, Tan, Jianfeng, Wang, Xiao W
In-Reply-To: <D0158A423229094DA7ABF71CF2FA0DA34E9222F4@SHSMSX104.ccr.corp.intel.com>



On 2018年04月10日 17:23, Liang, Cunming wrote:
>
>> -----Original Message-----
>> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
>> Sent: Tuesday, April 10, 2018 3:52 PM
>> To: Bie, Tiwei <tiwei.bie@intel.com>; Jason Wang <jasowang@redhat.com>
>> Cc: mst@redhat.com; alex.williamson@redhat.com; ddutile@redhat.com;
>> Duyck, Alexander H <alexander.h.duyck@intel.com>; virtio-dev@lists.oasis-
>> open.org; linux-kernel@vger.kernel.org; kvm@vger.kernel.org;
>> virtualization@lists.linux-foundation.org; netdev@vger.kernel.org; Daly, Dan
>> <dan.daly@intel.com>; Liang, Cunming <cunming.liang@intel.com>; Wang,
>> Zhihong <zhihong.wang@intel.com>; Tan, Jianfeng <jianfeng.tan@intel.com>;
>> Wang, Xiao W <xiao.w.wang@intel.com>
>> Subject: Re: [virtio-dev] Re: [RFC] vhost: introduce mdev based hardware
>> vhost backend
>>
>> On 10/04/2018 06:57, Tiwei Bie wrote:
>>>> So you just move the abstraction layer from qemu to kernel, and you
>>>> still need different drivers in kernel for different device
>>>> interfaces of accelerators. This looks even more complex than leaving
>>>> it in qemu. As you said, another idea is to implement userspace vhost
>>>> backend for accelerators which seems easier and could co-work with
>>>> other parts of qemu without inventing new type of messages.
>>> I'm not quite sure. Do you think it's acceptable to add various vendor
>>> specific hardware drivers in QEMU?
>> I think so.  We have vendor-specific quirks, and at some point there was an
>> idea of using quirks to implement (vendor-specific) live migration support for
>> assigned devices.
> Vendor-specific quirks of accessing VGA is a small portion. Other major portions are still handled by guest driver.
>
> While in this case, when saying various vendor specific drivers in QEMU, it says QEMU takes over and provides the entire user space device drivers. Some parts are even not relevant to vhost, they're basic device function enabling. Moreover, it could be different kinds of devices(network/block/...) under vhost. No matter # of vendors or # of types, total LOC is not small.
>
> The idea is to avoid introducing these extra complexity out of QEMU, keeping vhost adapter simple. As vhost protocol is de factor standard, it leverages kernel device driver to provide the diversity. Changing once in QEMU, then it supports multi-vendor devices whose drivers naturally providing kernel driver there.

Let me clarify my question, it's not qemu vs kenrel but userspace vs 
kernel. It could be a library which could be linked to qemu. Doing it in 
userspace have the following obvious advantages:

- attack surface was limited to userspace
- easier to be maintained (compared to kernel driver)
- easier to be extended without introducing new userspace/kernel interfaces
- not tied to a specific operating system

If we want to do it in the kernel, need to consider to unify code 
between mdev device driver and generic driver. For net driver, maybe we 
can even consider to do it on top of exist drivers.

>
> If QEMU is going to build a user space driver framework there, we're open mind on that, even leveraging DPDK as the underlay library. Looking forward to more others' comments from community.

I'm doing this now by implementing vhost inside qemu IOThreads. Hope I 
can post RFC in few months.

Thanks

> Steve
>
>> Paolo

^ permalink raw reply

* [PATCH] intel: i40evf: Replace GFP_ATOMIC with GFP_KERNEL in i40evf_add_vlan
From: Jia-Ju Bai @ 2018-04-11  2:08 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: intel-wired-lan, netdev, linux-kernel, Jia-Ju Bai

i40evf_add_vlan() is never called in atomic context.

i40evf_add_vlan() is only called by i40evf_vlan_rx_add_vid(), 
which is only set as ".ndo_vlan_rx_add_vid" in struct net_device_ops.
".ndo_vlan_rx_add_vid" is not called in atomic context.

Despite never getting called from atomic context, 
i40evf_add_vlan() calls kzalloc() with GFP_ATOMIC, 
which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL, 
which can sleep and improve the possibility of sucessful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 1825d95..04b2b9c 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -770,7 +770,7 @@ i40evf_vlan_filter *i40evf_add_vlan(struct i40evf_adapter *adapter, u16 vlan)
 
 	f = i40evf_find_vlan(adapter, vlan);
 	if (!f) {
-		f = kzalloc(sizeof(*f), GFP_ATOMIC);
+		f = kzalloc(sizeof(*f), GFP_KERNEL);
 		if (!f)
 			goto clearout;
 
-- 
1.9.1

^ permalink raw reply related

* Re: [virtio-dev] Re: [RFC] vhost: introduce mdev based hardware vhost backend
From: Stefan Hajnoczi @ 2018-04-11  2:01 UTC (permalink / raw)
  To: Liang, Cunming
  Cc: Paolo Bonzini, Bie, Tiwei, Jason Wang, mst@redhat.com,
	alex.williamson@redhat.com, ddutile@redhat.com,
	Duyck, Alexander H, virtio-dev@lists.oasis-open.org,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	Daly, Dan, Wang, Zhihong, Tan, Jianfeng, Wang, Xiao W
In-Reply-To: <D0158A423229094DA7ABF71CF2FA0DA34E9222F4@SHSMSX104.ccr.corp.intel.com>

[-- Attachment #1: Type: text/plain, Size: 401 bytes --]

On Tue, Apr 10, 2018 at 09:23:53AM +0000, Liang, Cunming wrote:
> If QEMU is going to build a user space driver framework there, we're open mind on that, even leveraging DPDK as the underlay library. Looking forward to more others' comments from community.

There is already an NVMe VFIO driver in QEMU (see block/nvme.c).  So in
principle there's no reason against userspace drivers in QEMU.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply

* [PATCH] dec: tulip: de4x5: Replace mdelay with usleep_range in de4x5_hw_init
From: Jia-Ju Bai @ 2018-04-11  2:00 UTC (permalink / raw)
  To: davem, dhowells, stephen, johannes.berg, arvind.yadav.cs
  Cc: netdev, linux-parisc, linux-kernel, Jia-Ju Bai

de4x5_hw_init() is never called in atomic context.

de4x5_hw_init() is only called by de4x5_pci_probe(), which is only 
set as ".probe" in struct pci_driver.

Despite never getting called from atomic context, de4x5_hw_init() 
calls mdelay() to busily wait.
This is not necessary and can be replaced with usleep_range() to 
avoid busy waiting.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 drivers/net/ethernet/dec/tulip/de4x5.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/dec/tulip/de4x5.c b/drivers/net/ethernet/dec/tulip/de4x5.c
index 0affee9..3fb0119 100644
--- a/drivers/net/ethernet/dec/tulip/de4x5.c
+++ b/drivers/net/ethernet/dec/tulip/de4x5.c
@@ -1107,7 +1107,7 @@ static int (*dc_infoblock[])(struct net_device *dev, u_char, u_char *) = {
 	pdev = to_pci_dev (gendev);
 	pci_write_config_byte(pdev, PCI_CFDA_PSM, WAKEUP);
     }
-    mdelay(10);
+    usleep(10000, 11000);
 
     RESET_DE4X5;
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH] net: dsa: b53: Replace mdelay with msleep in b53_switch_reset_gpio
From: Jia-Ju Bai @ 2018-04-11  1:51 UTC (permalink / raw)
  To: f.fainelli, andrew, vivien.didelot; +Cc: netdev, linux-kernel, Jia-Ju Bai

b53_switch_reset_gpio() is never called in atomic context.

The call chain ending up at b53_switch_reset_gpio() is:
[1] b53_switch_reset_gpio() <- b53_switch_reset() <- 
	b53_reset_switch() <- b53_setup()

b53_switch_reset_gpio() is set as ".setup" in struct dsa_switch_ops.
This function is not called in atomic context.

Despite never getting called from atomic context, b53_switch_reset_gpio()
calls mdelay() to busily wait.
This is not necessary and can be replaced with msleep() to
avoid busy waiting.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 drivers/net/dsa/b53/b53_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 274f367..e070ff6 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -597,10 +597,10 @@ static void b53_switch_reset_gpio(struct b53_device *dev)
 	/* Reset sequence: RESET low(50ms)->high(20ms)
 	 */
 	gpio_set_value(gpio, 0);
-	mdelay(50);
+	msleep(50);
 
 	gpio_set_value(gpio, 1);
-	mdelay(20);
+	msleep(20);
 
 	dev->current_page = 0xff;
 }
-- 
1.9.1

^ permalink raw reply related

* Re: XDP_TX for virtio_net not working in recent kernel?
From: Jason Wang @ 2018-04-11  1:50 UTC (permalink / raw)
  To: Daniel Borkmann, Kimitoshi Takahashi; +Cc: xdp-newbies, netdev
In-Reply-To: <1d5849ae-a9ec-93af-e1b5-e7a368193c7a@iogearbox.net>

[-- Attachment #1: Type: text/plain, Size: 2680 bytes --]



On 2018年04月10日 18:17, Daniel Borkmann wrote:
> [ +jason +netdev ]
>
> On 04/10/2018 12:49 AM, Kimitoshi Takahashi wrote:
>> Hello,
>>
>> I'm totally newbie.
>>
>> I want to try some examples under the samples/bpf of the kernel tree using the virtio_net driver in kvm, in order to get familiar with the XDP. However, it seems to me that for kernels newer than linux-4.15, the xdp2 and xdp_tx_tunnel are not working in the case of virtio_net.
>>
>> Can somebody help me out to make them working?
>>
>>
>> 1) For linux-4.11.12, linux-4.12.14, and linux-4.14.31, xdp2 and xdp_tx_tunnel are working.
>>
>> [xdp2]
>> 10.0.0.27# ./xdp2 2 (or ./xdp -N 2)
>>
>> 10.0.0.254# hping3 10.0.0.27 -p 5000 -2
>>
>> 10.0.0.254# tcpdump -nei kbr0 udp port 5000
>> 06:18:37.677070 2e:c9:83:fb:5b:f3 > 52:54:00:11:00:1b, ethertype IPv4 (0x0800), length 42: 10.0.0.254.2982 > 10.0.0.27.5000: UDP, length 0
>> 06:18:37.677265 52:54:00:11:00:1b > 2e:c9:83:fb:5b:f3, ethertype IPv4 (0x0800), length 42: 10.0.0.254.2982 > 10.0.0.27.5000: UDP, length 0
>>
>> I can see the UDP packets with MAC addresses swapped.
>>
>> [xdp_tx_tunnel]
>>
>> 10.0.0.254# telnet 10.1.2.1 80
>>
>> 10.0.0.27# ./xdp_tx_iptunnel -i 2 -a 10.1.2.1 -p 80 -s 10.0.0.27 -d 10.0.0.24 -m 52:54:00:11:00:18
>>
>> 10.0.0.24# tcpdump -nei any proto 4
>> 22:08:20.510354  In 52:54:00:11:00:1b ethertype IPv4 (0x0800), length 96: 10.0.0.27 > 10.0.0.24: 10.0.0.254.60826 > 10.1.2.1.80: Flags [S], seq 3224377720, win 29200, options [mss 1460,sackOK,TS val 217801953 ecr 0,nop,wscale 7], length 0 (ipip-proto-4)
>>
>> I can see the encapsulated SYN packet that is forwarded by the 10.0.0.27.
>>
>> 2) For linux-4.15, linux-4.15.5, linux-4.15.13 and linux-4.16.1, xdp2 and xdp_tx_tunnel are NOT working.
>>
>> [xdp2]
>> 10.0.0.27# ./xdp2 -N 2
>>
>> 10.0.0.254# hping3 10.0.0.27 -p 5000 -2
>>
>> 10.0.0.254# tcpdump -nei br0 udp port 5000
>> 06:45:34.810046 2e:c9:83:fb:5b:f3 > 52:54:00:11:00:1b, ethertype IPv4 (0x0800), length 42: 10.0.0.254.2346 > 10.0.0.27.5000: UDP, length 0
>> 06:45:35.810176 2e:c9:83:fb:5b:f3 > 52:54:00:11:00:1b, ethertype IPv4 (0x0800), length 42: 10.0.0.254.2347 > 10.0.0.27.5000: UDP, length 0
>>
>> Only the outgoing packets can be seen.
>>
>> [xdp_tx_tunnel]
>>
>> 10.0.0.254# telnet 10.1.2.1 80
>>
>> 10.0.0.27# ./xdp_tx_iptunnel -i 2 -a 10.1.2.1 -p 80 -s 10.0.0.27 -d 10.0.0.24 -m 52:54:00:11:00:18
>>
>> 10.0.0.24# tcpdump -nei any proto 4
>>
>> I can NOT see the encapsulated SYN packet that is forwarded by the 10.0.0.27.
>>
>>
>> -- 
>> Sincerely,
>> Takahashi Kimitoshi

I suspect a kick is missed in the XDP_TX case.

Could you please try the attached patch to see if it fixes the issue?

Thanks

[-- Attachment #2: 0001-virtio-net-add-missing-virtqueue-kick-when-flushing-.patch --]
[-- Type: text/x-patch, Size: 1506 bytes --]

>From f37d74d65671e1a1f1a0c4dd59bf1df69b47f30a Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Mon, 22 Jan 2018 17:07:03 +0800
Subject: [PATCH] virtio-net: add missing virtqueue kick when flushing packets

We tends to batch submitting packets during XDP_TX. This requires to
kick virtqueue after a batch, we tried to do it through
xdp_do_flush_map() which only makes sense for devmap not XDP_TX. So
explicitly kick the virtqueue in this case.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2337460..d8e1aea 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1269,7 +1269,9 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct receive_queue *rq =
 		container_of(napi, struct receive_queue, napi);
-	unsigned int received;
+	struct virtnet_info *vi = rq->vq->vdev->priv;
+	struct send_queue *sq;
+	unsigned int received, qp;
 	bool xdp_xmit = false;
 
 	virtnet_poll_cleantx(rq);
@@ -1280,8 +1282,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	if (received < budget)
 		virtqueue_napi_complete(napi, rq->vq, received);
 
-	if (xdp_xmit)
+	if (xdp_xmit) {
+		qp = vi->curr_queue_pairs - vi->xdp_queue_pairs +
+		     smp_processor_id();
+		sq = &vi->sq[qp];
+		virtqueue_kick(sq->vq);
 		xdp_do_flush_map();
+	}
 
 	return received;
 }
-- 
2.7.4


^ permalink raw reply related

* [PATCH 2/2] net: can: sja1000: Replace mdelay with usleep_range in pcan_add_channels
From: Jia-Ju Bai @ 2018-04-11  1:43 UTC (permalink / raw)
  To: wg, mkl; +Cc: linux-can, netdev, linux-kernel, Jia-Ju Bai

pcan_add_channels() is never called in atomic context.

pcan_add_channels() is only called by pcan_probe(), which is only set as 
".probe" in struct pcmcia_driver.

Despite never getting called from atomic context, pcan_add_channels()
calls mdelay() to busily wait.
This is not necessary and can be replaced with usleep_range() to
avoid busy waiting.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 drivers/net/can/sja1000/peak_pcmcia.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/sja1000/peak_pcmcia.c b/drivers/net/can/sja1000/peak_pcmcia.c
index dd56133..1a1fbf3 100644
--- a/drivers/net/can/sja1000/peak_pcmcia.c
+++ b/drivers/net/can/sja1000/peak_pcmcia.c
@@ -530,7 +530,7 @@ static int pcan_add_channels(struct pcan_pccard *card)
 	pcan_write_reg(card, PCC_CCR, ccr);
 
 	/* wait 2ms before unresetting channels */
-	mdelay(2);
+	usleep_range(2000, 3000);
 
 	ccr &= ~PCC_CCR_RST_ALL;
 	pcan_write_reg(card, PCC_CCR, ccr);
-- 
1.9.1

^ permalink raw reply related

* [PATCH 1/2] net: can: sja1000: Replace mdelay with usleep_range in peak_pci_probe
From: Jia-Ju Bai @ 2018-04-11  1:42 UTC (permalink / raw)
  To: wg, mkl; +Cc: linux-can, netdev, linux-kernel, Jia-Ju Bai

peak_pci_probe() is never called in atomic context.

peak_pci_probe() is set as ".probe" in struct pci_driver.

Despite never getting called from atomic context, peak_pci_probe()
calls mdelay() to busily wait.
This is not necessary and can be replaced with usleep_range() to
avoid busy waiting.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 drivers/net/can/sja1000/peak_pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/sja1000/peak_pci.c b/drivers/net/can/sja1000/peak_pci.c
index 131026f..48cf821 100644
--- a/drivers/net/can/sja1000/peak_pci.c
+++ b/drivers/net/can/sja1000/peak_pci.c
@@ -608,7 +608,7 @@ static int peak_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	writeb(0x00, cfg_base + PITA_GPIOICR);
 	/* Toggle reset */
 	writeb(0x05, cfg_base + PITA_MISC + 3);
-	mdelay(5);
+	usleep_range(5000, 6000);
 	/* Leave parport mux mode */
 	writeb(0x04, cfg_base + PITA_MISC + 3);
 
-- 
1.9.1

^ permalink raw reply related

* RE: [virtio-dev] Re: [RFC] vhost: introduce mdev based hardware vhost backend
From: Tian, Kevin @ 2018-04-11  1:38 UTC (permalink / raw)
  To: Liang, Cunming, Michael S. Tsirkin
  Cc: Duyck, Alexander H, virtio-dev@lists.oasis-open.org,
	kvm@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Wang, Xiao W,
	ddutile@redhat.com, Tan, Jianfeng, Wang, Zhihong, Paolo Bonzini
In-Reply-To: <D0158A423229094DA7ABF71CF2FA0DA34E9226AF@SHSMSX104.ccr.corp.intel.com>

> From: Liang, Cunming
> Sent: Tuesday, April 10, 2018 10:24 PM
> 
[...]
> >
> > As others said, we do not need to go overeboard. A couple of small
> vendor-
> > specific quirks in qemu isn't a big deal.
> 
> It's quite challenge to identify it's small or not, there's no uniform metric.
> 
> It's only dependent on QEMU itself, that's the obvious benefit. Tradeoff is
> to build the entire device driver. We don't object to do that in QEMU, but
> wanna make sure to understand the boundary size clearly.
> 

It might be helpful if you can post some sample code using proposed 
framework - then people have a clear feeling about what size is talked 
about here and whether it falls into the concept of 'small quirks'.

Thanks
Kevin

^ permalink raw reply

* [PATCH 2/2] staging: irda: Replace mdelay with usleep_range in irda_usb_probe
From: Jia-Ju Bai @ 2018-04-11  1:33 UTC (permalink / raw)
  To: samuel, gregkh, davem, johan, arvind.yadav.cs
  Cc: netdev, devel, linux-kernel, Jia-Ju Bai

irda_usb_probe() is never called in atomic context.

irda_usb_probe() is only set as ".probe" in struct usb_driver.

Despite never getting called from atomic context, irda_usb_probe()
calls mdelay() to busily wait.
This is not necessary and can be replaced with usleep_range() to
avoid busy waiting.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 drivers/staging/irda/drivers/irda-usb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/irda/drivers/irda-usb.c b/drivers/staging/irda/drivers/irda-usb.c
index 723e49b..6ff5b08 100644
--- a/drivers/staging/irda/drivers/irda-usb.c
+++ b/drivers/staging/irda/drivers/irda-usb.c
@@ -1710,7 +1710,7 @@ static int irda_usb_probe(struct usb_interface *intf,
 			pr_debug("usb_control_msg failed %d\n", ret);
 			goto err_out_3;
 		} else {
-			mdelay(10);
+			usleep_range(10000, 11000);
 		}
 	}
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH 1/2] staging: irda: Replace mdelay with usleep_range in stir421x_fw_upload
From: Jia-Ju Bai @ 2018-04-11  1:29 UTC (permalink / raw)
  To: samuel, gregkh, davem, johan, arvind.yadav.cs
  Cc: netdev, devel, linux-kernel, Jia-Ju Bai

stir421x_fw_upload() is never called in atomic context.

The call chain ending up at stir421x_fw_upload() is:
[1] stir421x_fw_upload() <- stir421x_patch_device() <- irda_usb_probe()

irda_usb_probe() is set as ".probe" in struct usb_driver.
This function is not called in atomic context.

Despite never getting called from atomic context, stir421x_fw_upload()
calls mdelay() to busily wait.
This is not necessary and can be replaced with usleep_range() to
avoid busy waiting.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
---
 drivers/staging/irda/drivers/irda-usb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/irda/drivers/irda-usb.c b/drivers/staging/irda/drivers/irda-usb.c
index 723e49b..c6c8c2c 100644
--- a/drivers/staging/irda/drivers/irda-usb.c
+++ b/drivers/staging/irda/drivers/irda-usb.c
@@ -1050,7 +1050,7 @@ static int stir421x_fw_upload(struct irda_usb_cb *self,
 		if (ret < 0)
 			break;
 
-		mdelay(10);
+		usleep_range(10000, 11000);
 	}
 
 	kfree(patch_block);
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH v2 0/2] vhost: fix vhost_vq_access_ok() log check
From: Stefan Hajnoczi @ 2018-04-11  1:21 UTC (permalink / raw)
  To: David Miller
  Cc: jasowang, virtualization, syzkaller-bugs, mst, torvalds, kvm,
	linux-kernel, netdev
In-Reply-To: <20180410.105043.1798364865868187298.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1182 bytes --]

On Tue, Apr 10, 2018 at 10:50:43AM -0400, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Tue, 10 Apr 2018 14:40:10 +0800
> 
> > On 2018年04月10日 13:26, Stefan Hajnoczi wrote:
> >> v2:
> >>   * Rewrote the conditional to make the vq access check clearer [Linus]
> >>   * Added Patch 2 to make the return type consistent and harder to misuse
> >>   * [Linus]
> >>
> >> The first patch fixes the vhost virtqueue access check which was
> >> recently
> >> broken.  The second patch replaces the int return type with bool to
> >> prevent
> >> future bugs.
> >>
> >> Stefan Hajnoczi (2):
> >>    vhost: fix vhost_vq_access_ok() log check
> >>    vhost: return bool from *_access_ok() functions
> >>
> >>   drivers/vhost/vhost.h |  4 +--
> >>   drivers/vhost/vhost.c | 70
> >>   ++++++++++++++++++++++++++-------------------------
> >>   2 files changed, 38 insertions(+), 36 deletions(-)
> >>
> > 
> > Acked-by: Jason Wang <jasowang@redhat.com>
> 
> This sereis doesn't apply cleanly to the net tree, please respin
> and add the appropriate Acks and Fixes: tags that Michael and Jason
> have provided.

Sorry, will fix.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply

* Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework
From: Michael S. Tsirkin @ 2018-04-11  1:21 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Sridhar Samudrala, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh, jiri
In-Reply-To: <20180410142608.50f15b45@xeon-e3>

On Tue, Apr 10, 2018 at 02:26:08PM -0700, Stephen Hemminger wrote:
> On Tue, 10 Apr 2018 11:59:50 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> 
> > Use the registration/notification framework supported by the generic
> > bypass infrastructure.
> > 
> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > ---
> 
> Thanks for doing this.  Your current version has couple show stopper
> issues.
> 
> First, the slave device is instantly taking over the slave.
> This doesn't allow udev/systemd to do its device rename of the slave
> device. Netvsc uses a delayed work to workaround this.
> 
> Secondly, the select queue needs to call queue selection in VF.
> The bonding/teaming logic doesn't work well for UDP flows.
> Commit b3bf5666a510 ("hv_netvsc: defer queue selection to VF")
> fixed this performance problem.
> 
> Lastly, more indirection is bad in current climate.

Well right now netvsc does an indirect call to the PT device,
does it not? If you really want max performance when PT
is in use you need to do the reverse and have PT forward to netvsc.

> I am not completely adverse to this but it needs to be fast, simple
> and completely transparent.

^ permalink raw reply

* [PATCHv2] mISDN: Remove VLAs
From: Laura Abbott @ 2018-04-11  1:04 UTC (permalink / raw)
  To: Karsten Keil, David S. Miller
  Cc: Laura Abbott, Kees Cook, netdev, linux-kernel, kernel-hardening


There's an ongoing effort to remove VLAs[1] from the kernel to eventually
turn on -Wvla. Remove the VLAs from the mISDN code by switching to using
kstrdup in one place and using an upper bound in another.

Signed-off-by: Laura Abbott <labbott@redhat.com>
---
v2: Switch to a tighter upper bound so we are allocating a more
reasonable amount on the stack (300). This is based on previous checks
against this value.
---
 drivers/isdn/mISDN/dsp_hwec.c   |  8 +++++---
 drivers/isdn/mISDN/l1oip_core.c | 14 +++++++++++---
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/isdn/mISDN/dsp_hwec.c b/drivers/isdn/mISDN/dsp_hwec.c
index a6e87076acc2..5336bbdbfdc5 100644
--- a/drivers/isdn/mISDN/dsp_hwec.c
+++ b/drivers/isdn/mISDN/dsp_hwec.c
@@ -68,12 +68,12 @@ void dsp_hwec_enable(struct dsp *dsp, const char *arg)
 		goto _do;
 
 	{
-		char _dup[len + 1];
 		char *dup, *tok, *name, *val;
 		int tmp;
 
-		strcpy(_dup, arg);
-		dup = _dup;
+		dup = kstrdup(arg, GFP_ATOMIC);
+		if (!dup)
+			return;
 
 		while ((tok = strsep(&dup, ","))) {
 			if (!strlen(tok))
@@ -89,6 +89,8 @@ void dsp_hwec_enable(struct dsp *dsp, const char *arg)
 					deftaps = tmp;
 			}
 		}
+
+		kfree(dup);
 	}
 
 _do:
diff --git a/drivers/isdn/mISDN/l1oip_core.c b/drivers/isdn/mISDN/l1oip_core.c
index 21d50e4cc5e1..b05022f94f18 100644
--- a/drivers/isdn/mISDN/l1oip_core.c
+++ b/drivers/isdn/mISDN/l1oip_core.c
@@ -279,7 +279,7 @@ l1oip_socket_send(struct l1oip *hc, u8 localcodec, u8 channel, u32 chanmask,
 		  u16 timebase, u8 *buf, int len)
 {
 	u8 *p;
-	u8 frame[len + 32];
+	u8 frame[MAX_DFRAME_LEN_L1 + 32];
 	struct socket *socket = NULL;
 
 	if (debug & DEBUG_L1OIP_MSG)
@@ -902,7 +902,11 @@ handle_dmsg(struct mISDNchannel *ch, struct sk_buff *skb)
 		p = skb->data;
 		l = skb->len;
 		while (l) {
-			ll = (l < L1OIP_MAX_PERFRAME) ? l : L1OIP_MAX_PERFRAME;
+			/*
+			 * This is technically bounded by L1OIP_MAX_PERFRAME but
+			 * MAX_DFRAME_LEN_L1 < L1OIP_MAX_PERFRAME
+			 */
+			ll = (l < MAX_DFRAME_LEN_L1) ? l : MAX_DFRAME_LEN_L1;
 			l1oip_socket_send(hc, 0, dch->slot, 0,
 					  hc->chan[dch->slot].tx_counter++, p, ll);
 			p += ll;
@@ -1140,7 +1144,11 @@ handle_bmsg(struct mISDNchannel *ch, struct sk_buff *skb)
 		p = skb->data;
 		l = skb->len;
 		while (l) {
-			ll = (l < L1OIP_MAX_PERFRAME) ? l : L1OIP_MAX_PERFRAME;
+			/*
+			 * This is technically bounded by L1OIP_MAX_PERFRAME but
+			 * MAX_DFRAME_LEN_L1 < L1OIP_MAX_PERFRAME
+			 */
+			ll = (l < MAX_DFRAME_LEN_L1) ? l : MAX_DFRAME_LEN_L1;
 			l1oip_socket_send(hc, hc->codec, bch->slot, 0,
 					  hc->chan[bch->slot].tx_counter, p, ll);
 			hc->chan[bch->slot].tx_counter += ll;
-- 
2.14.3

^ permalink raw reply related

* [PATCH] net/tls: Remove VLA usage
From: Kees Cook @ 2018-04-11  0:52 UTC (permalink / raw)
  To: Dave Watson
  Cc: linux-kernel, netdev, Ilya Lesokhin, Aviad Yehezkel,
	David S. Miller

In the quest to remove VLAs from the kernel[1], this replaces the VLA
size with the only possible size used in the code, and adds a mechanism
to double-check future IV sizes.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 net/tls/tls_sw.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 4dc766b03f00..71e79597f940 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -41,6 +41,8 @@
 #include <net/strparser.h>
 #include <net/tls.h>
 
+#define MAX_IV_SIZE	TLS_CIPHER_AES_GCM_128_IV_SIZE
+
 static int tls_do_decryption(struct sock *sk,
 			     struct scatterlist *sgin,
 			     struct scatterlist *sgout,
@@ -673,7 +675,7 @@ static int decrypt_skb(struct sock *sk, struct sk_buff *skb,
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
-	char iv[TLS_CIPHER_AES_GCM_128_SALT_SIZE + tls_ctx->rx.iv_size];
+	char iv[TLS_CIPHER_AES_GCM_128_SALT_SIZE + MAX_IV_SIZE];
 	struct scatterlist sgin_arr[MAX_SKB_FRAGS + 2];
 	struct scatterlist *sgin = &sgin_arr[0];
 	struct strp_msg *rxm = strp_msg(skb);
@@ -1094,6 +1096,12 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
 		goto free_priv;
 	}
 
+	/* Sanity-check the IV size for stack allocations. */
+	if (iv_size > MAX_IV_SIZE) {
+		rc = -EINVAL;
+		goto free_priv;
+	}
+
 	cctx->prepend_size = TLS_HEADER_SIZE + nonce_size;
 	cctx->tag_size = tag_size;
 	cctx->overhead_size = cctx->prepend_size + cctx->tag_size;
-- 
2.7.4


-- 
Kees Cook
Pixel Security

^ permalink raw reply related

* Re: [PATCH] make net_gso_ok return false when gso_type is zero(invalid)
From: Wenhua Shi @ 2018-04-11  0:51 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: David Miller, netdev, linux-kernel
In-Reply-To: <20180410163216.GB3661@localhost.localdomain>

2018-04-10 18:32 GMT+02:00 Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>:
> On Sun, Apr 08, 2018 at 08:41:21PM +0200, Wenhua Shi wrote:
>> 2018-04-08 18:51 GMT+02:00 David Miller <davem@davemloft.net>:
>> >
>> > From: Wenhua Shi <march511@gmail.com>
>> > Date: Fri,  6 Apr 2018 03:43:39 +0200
>> >
>> > > Signed-off-by: Wenhua Shi <march511@gmail.com>
>> >
>> > This precondition should be made impossible instead of having to do
>> > an extra check everywhere that this helper is invoked, many of which
>> > are in fast paths.
>>
>> I believe the precondition you said is quite true. In my situation, I
>> have to disable GSO for some packet and I notice that it leads to a
>> worse performance (slower than 1Mbps, was almost 800Mbps).
>>
>> Here's the hook I use on debian 9.4, kernel version 4.9:
>
> There is quite a distance between 4.9 and net/net-next. Did you test
> on a more recent kernel too?
>
> Note that TCP stack now works with GSO being always on.
> 0a6b2a1dc2a2 ("tcp: switch to GSO being always on")
>

I've tried testing on the Fedora rawhide channel. The kernel version
is 4.17.0. Detail information is attached.

Without the hook

    [root@fedora-s-1vcpu-1gb-sfo1-01 testing]# iperf -c
myanothernormalmachine -d
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    ------------------------------------------------------------
    Client connecting to myanothernormalmachine, TCP port 5001
    TCP window size: 85.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 107.170.240.XXX port 44692 connected with
104.131.148.XXX port 5001
    [  5] local 107.170.240.XXX port 5001 connected with
104.131.148.XXX port 53978
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  1.04 GBytes   892 Mbits/sec
    [  5]  0.0-10.0 sec   757 MBytes   638 Mbits/sec

With the hook

    [root@fedora-s-1vcpu-1gb-sfo1-01 testing]# iperf -c
myanothernormalmachine -d
    ------------------------------------------------------------
    Server listening on TCP port 5001
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    ------------------------------------------------------------
    Client connecting to myanothernormalmachine, TCP port 5001
    TCP window size: 85.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 107.170.240.XXX port 44694 connected with
104.131.148.XXX port 5001
    [  5] local 107.170.240.XXX port 5001 connected with
104.131.148.XXX port 53980
    [ ID] Interval       Transfer     Bandwidth
    [  5]  0.0-10.0 sec  1.04 GBytes   894 Mbits/sec
    [  3]  0.0-13.5 sec   170 KBytes   103 Kbits/sec

Kernel

    [root@fedora-s-1vcpu-1gb-sfo1-01 testing]# uname -a
    Linux fedora-s-1vcpu-1gb-sfo1-01.localdomain
4.17.0-0.rc0.git5.2.fc29.x86_64 #1 SMP Mon Apr 9 17:16:30 UTC 2018
x86_64 x86_64 x86_64 GNU/Linux

Hook Source Code

    [root@fedora-s-1vcpu-1gb-sfo1-01 testing]# cat testing.c
    #include <linux/kernel.h>
    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/netfilter.h>
    #include <linux/netfilter_ipv4.h>
    #include <linux/netfilter_ipv6.h>
    #include <linux/skbuff.h>
    #include <linux/tcp.h>
    #include <linux/ip.h>

    unsigned int hook_outgoing(
            void * priv,
            struct sk_buff * skb,
            const struct nf_hook_state * state)
    {
            printk(KERN_INFO "Hook working...\n");
            /* for some reason I have to disable GSO */
            skb_gso_reset(skb);

            /* The following won't work any more. */
            // skb->sk->sk_gso_type = ~0;

            return NF_ACCEPT;

    }

    static struct nf_hook_ops hook =
    {
            .hook = hook_outgoing,
            .pf = PF_INET,
            .hooknum = NF_INET_POST_ROUTING,
            .priority = NF_IP_PRI_LAST,
    };

    static int __init init_testing(void)
    {
            nf_register_net_hook(&init_net, &hook);
            return 0;
    }

    static void __exit exit_testing(void)
    {
            nf_unregister_net_hook(&init_net, &hook);
    }

    MODULE_LICENSE("GPL");
    module_init(init_testing);
    module_exit(exit_testing);




It turns out the problem exists and my previous bypassing trick is not
working any more. I'm now testing whether the patch is working for the
latest net-next branch.

^ permalink raw reply

* [PATCH] netfilter: ftp helper: Support \n and \r terminators for PORT
From: Scott Parlane @ 2018-04-11  0:11 UTC (permalink / raw)
  To: pablo, kadlec, fw, netfilter-devel, coreteam, netdev; +Cc: Scott Parlane

The current code requires \r as the terminator directly
after the last digit, and RFC959 indicates CRLF as the
the terminator.

However, we have seen in the wild a client sending
packets with only \n

Signed-off-by: Scott Parlane <scott.parlane@alliedtelesis.co.nz>
---
 net/netfilter/nf_conntrack_ftp.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/net/netfilter/nf_conntrack_ftp.c b/net/netfilter/nf_conntrack_ftp.c
index b666959f17..e15bf351f6 100644
--- a/net/netfilter/nf_conntrack_ftp.c
+++ b/net/netfilter/nf_conntrack_ftp.c
@@ -56,21 +56,23 @@ unsigned int (*nf_nat_ftp_hook)(struct sk_buff *skb,
 EXPORT_SYMBOL_GPL(nf_nat_ftp_hook);
 
 static int try_rfc959(const char *, size_t, struct nf_conntrack_man *,
-		      char, unsigned int *);
+		      char, char, unsigned int *);
 static int try_rfc1123(const char *, size_t, struct nf_conntrack_man *,
-		       char, unsigned int *);
+		       char, char, unsigned int *);
 static int try_eprt(const char *, size_t, struct nf_conntrack_man *,
-		    char, unsigned int *);
+		    char, char, unsigned int *);
 static int try_epsv_response(const char *, size_t, struct nf_conntrack_man *,
-			     char, unsigned int *);
+			     char, char, unsigned int *);
 
 static struct ftp_search {
 	const char *pattern;
 	size_t plen;
 	char skip;
 	char term;
+	char alt_term;
 	enum nf_ct_ftp_type ftptype;
-	int (*getnum)(const char *, size_t, struct nf_conntrack_man *, char, unsigned int *);
+	int (*getnum)(const char *, size_t, struct nf_conntrack_man *, char,
+		      char, unsigned int *);
 } search[IP_CT_DIR_MAX][2] = {
 	[IP_CT_DIR_ORIGINAL] = {
 		{
@@ -78,6 +80,7 @@ static struct ftp_search {
 			.plen		= sizeof("PORT") - 1,
 			.skip		= ' ',
 			.term		= '\r',
+			.alt_term	= '\n',
 			.ftptype	= NF_CT_FTP_PORT,
 			.getnum		= try_rfc959,
 		},
@@ -119,7 +122,7 @@ get_ipv6_addr(const char *src, size_t dlen, struct in6_addr *dst, u_int8_t term)
 }
 
 static int try_number(const char *data, size_t dlen, u_int32_t array[],
-		      int array_size, char sep, char term)
+		      int array_size, char sep, char term, char alt_term)
 {
 	u_int32_t i, len;
 
@@ -136,7 +139,9 @@ static int try_number(const char *data, size_t dlen, u_int32_t array[],
 			/* Unexpected character; true if it's the
 			   terminator (or we don't care about one)
 			   and we're finished. */
-			if ((*data == term || !term) && i == array_size - 1)
+			if ((*data == term || (alt_term &&
+					       *data == alt_term) || !term) &&
+			    i == array_size - 1)
 				return len;
 
 			pr_debug("Char %u (got %u nums) `%u' unexpected\n",
@@ -152,12 +157,12 @@ static int try_number(const char *data, size_t dlen, u_int32_t array[],
 /* Returns 0, or length of numbers: 192,168,1,1,5,6 */
 static int try_rfc959(const char *data, size_t dlen,
 		      struct nf_conntrack_man *cmd, char term,
-		      unsigned int *offset)
+		      char alt_term, unsigned int *offset)
 {
 	int length;
 	u_int32_t array[6];
 
-	length = try_number(data, dlen, array, 6, ',', term);
+	length = try_number(data, dlen, array, 6, ',', term, alt_term);
 	if (length == 0)
 		return 0;
 
@@ -179,7 +184,7 @@ static int try_rfc959(const char *data, size_t dlen,
  */
 static int try_rfc1123(const char *data, size_t dlen,
 		       struct nf_conntrack_man *cmd, char term,
-		       unsigned int *offset)
+		       char alt_term, unsigned int *offset)
 {
 	int i;
 	for (i = 0; i < dlen; i++)
@@ -191,7 +196,7 @@ static int try_rfc1123(const char *data, size_t dlen,
 
 	*offset += i;
 
-	return try_rfc959(data + i, dlen - i, cmd, 0, offset);
+	return try_rfc959(data + i, dlen - i, cmd, 0, 0, offset);
 }
 
 /* Grab port: number up to delimiter */
@@ -222,7 +227,7 @@ static int get_port(const char *data, int start, size_t dlen, char delim,
 
 /* Returns 0, or length of numbers: |1|132.235.1.2|6275| or |2|3ffe::1|6275| */
 static int try_eprt(const char *data, size_t dlen, struct nf_conntrack_man *cmd,
-		    char term, unsigned int *offset)
+		    char term, char alt_term, unsigned int *offset)
 {
 	char delim;
 	int length;
@@ -251,7 +256,8 @@ static int try_eprt(const char *data, size_t dlen, struct nf_conntrack_man *cmd,
 		u_int32_t array[4];
 
 		/* Now we have IP address. */
-		length = try_number(data + 3, dlen - 3, array, 4, '.', delim);
+		length = try_number(data + 3, dlen - 3, array, 4, '.', delim,
+				    0);
 		if (length != 0)
 			cmd->u3.ip = htonl((array[0] << 24) | (array[1] << 16)
 					   | (array[2] << 8) | array[3]);
@@ -271,7 +277,7 @@ static int try_eprt(const char *data, size_t dlen, struct nf_conntrack_man *cmd,
 /* Returns 0, or length of numbers: |||6446| */
 static int try_epsv_response(const char *data, size_t dlen,
 			     struct nf_conntrack_man *cmd, char term,
-			     unsigned int *offset)
+			     char alt_term, unsigned int *offset)
 {
 	char delim;
 
@@ -289,12 +295,13 @@ static int try_epsv_response(const char *data, size_t dlen,
 static int find_pattern(const char *data, size_t dlen,
 			const char *pattern, size_t plen,
 			char skip, char term,
+			char alt_term,
 			unsigned int *numoff,
 			unsigned int *numlen,
 			struct nf_conntrack_man *cmd,
 			int (*getnum)(const char *, size_t,
 				      struct nf_conntrack_man *, char,
-				      unsigned int *))
+				      char, unsigned int *))
 {
 	size_t i = plen;
 
@@ -337,7 +344,7 @@ static int find_pattern(const char *data, size_t dlen,
 	pr_debug("Skipped up to `%c'!\n", skip);
 
 	*numoff = i;
-	*numlen = getnum(data + i, dlen - i, cmd, term, numoff);
+	*numlen = getnum(data + i, dlen - i, cmd, term, alt_term, numoff);
 	if (!*numlen)
 		return -1;
 
@@ -461,6 +468,7 @@ skip_nl_seq:
 				     search[dir][i].plen,
 				     search[dir][i].skip,
 				     search[dir][i].term,
+				     search[dir][i].alt_term,
 				     &matchoff, &matchlen,
 				     &cmd,
 				     search[dir][i].getnum);
-- 
2.16.1

^ permalink raw reply related

* Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework
From: Stephen Hemminger @ 2018-04-10 23:59 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Michael S. Tsirkin,
	Jakub Kicinski, Sridhar Samudrala, virtualization, Netdev,
	David Miller
In-Reply-To: <CADGSJ22rVsC0TDTd6OKVnwbx0ExoQ8xWXBMumKB-OFH4sX=yaQ@mail.gmail.com>

On Tue, 10 Apr 2018 16:44:47 -0700
Siwei Liu <loseweigh@gmail.com> wrote:

> On Tue, Apr 10, 2018 at 4:28 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Tue, Apr 10, 2018 at 02:26:08PM -0700, Stephen Hemminger wrote:  
> >> On Tue, 10 Apr 2018 11:59:50 -0700
> >> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> >>  
> >> > Use the registration/notification framework supported by the generic
> >> > bypass infrastructure.
> >> >
> >> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> >> > ---  
> >>
> >> Thanks for doing this.  Your current version has couple show stopper
> >> issues.
> >>
> >> First, the slave device is instantly taking over the slave.
> >> This doesn't allow udev/systemd to do its device rename of the slave
> >> device. Netvsc uses a delayed work to workaround this.  
> >
> > Interesting. Does this mean udev must act within a specific time window
> > then?  
> 
> Sighs, lots of hacks. Why propgating this from driver to a common
> module. We really need a clean solution.
> 

I had a patch to wait for udev to do the rename and go from there
but davem rejected it.

^ permalink raw reply

* Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework
From: Siwei Liu @ 2018-04-10 23:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stephen Hemminger, Sridhar Samudrala, David Miller, Netdev,
	virtualization, virtio-dev, Brandeburg, Jesse, Alexander Duyck,
	Jakub Kicinski, Jason Wang, Jiri Pirko
In-Reply-To: <20180411022807-mutt-send-email-mst@kernel.org>

On Tue, Apr 10, 2018 at 4:28 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Tue, Apr 10, 2018 at 02:26:08PM -0700, Stephen Hemminger wrote:
>> On Tue, 10 Apr 2018 11:59:50 -0700
>> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>>
>> > Use the registration/notification framework supported by the generic
>> > bypass infrastructure.
>> >
>> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> > ---
>>
>> Thanks for doing this.  Your current version has couple show stopper
>> issues.
>>
>> First, the slave device is instantly taking over the slave.
>> This doesn't allow udev/systemd to do its device rename of the slave
>> device. Netvsc uses a delayed work to workaround this.
>
> Interesting. Does this mean udev must act within a specific time window
> then?

Sighs, lots of hacks. Why propgating this from driver to a common
module. We really need a clean solution.

-Siwei


>
>> Secondly, the select queue needs to call queue selection in VF.
>> The bonding/teaming logic doesn't work well for UDP flows.
>> Commit b3bf5666a510 ("hv_netvsc: defer queue selection to VF")
>> fixed this performance problem.
>>
>> Lastly, more indirection is bad in current climate.
>>
>> I am not completely adverse to this but it needs to be fast, simple
>> and completely transparent.

^ permalink raw reply

* Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework
From: Michael S. Tsirkin @ 2018-04-10 23:28 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Sridhar Samudrala, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh, jiri
In-Reply-To: <20180410142608.50f15b45@xeon-e3>

On Tue, Apr 10, 2018 at 02:26:08PM -0700, Stephen Hemminger wrote:
> On Tue, 10 Apr 2018 11:59:50 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
> 
> > Use the registration/notification framework supported by the generic
> > bypass infrastructure.
> > 
> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> > ---
> 
> Thanks for doing this.  Your current version has couple show stopper
> issues.
> 
> First, the slave device is instantly taking over the slave.
> This doesn't allow udev/systemd to do its device rename of the slave
> device. Netvsc uses a delayed work to workaround this.

Interesting. Does this mean udev must act within a specific time window
then?

> Secondly, the select queue needs to call queue selection in VF.
> The bonding/teaming logic doesn't work well for UDP flows.
> Commit b3bf5666a510 ("hv_netvsc: defer queue selection to VF")
> fixed this performance problem.
> 
> Lastly, more indirection is bad in current climate.
> 
> I am not completely adverse to this but it needs to be fast, simple
> and completely transparent.

^ permalink raw reply

* Re: [PATCH] net/mlx5: remove some extraneous spaces in indentations
From: Jason Gunthorpe @ 2018-04-10 23:27 UTC (permalink / raw)
  To: Colin Ian King
  Cc: Saeed Mahameed, Matan Barak, Leon Romanovsky, netdev, linux-rdma,
	kernel-janitors, linux-kernel
In-Reply-To: <20180410232254.mg7stooh5tytv6m2@ziepe.ca>

On Tue, Apr 10, 2018 at 05:22:54PM -0600, Jason Gunthorpe wrote:
> On Mon, Apr 09, 2018 at 01:43:36PM +0100, Colin Ian King wrote:
> > From: Colin Ian King <colin.king@canonical.com>
> > 
> > There are several lines where there is an extraneous space causing
> > indentation misalignment. Remove them.
> > 
> > Cleans up Cocconelle warnings:
> > 
> > ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:409:3-18: code aligned
> > with following code on line 410
> > ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:415:3-18: code aligned
> > with following code on line 416
> > ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:421:3-18: code aligned
> > with following code on line 422
> > 
> > Signed-off-by: Colin Ian King <colin.king@canonical.com>
> >  drivers/net/ethernet/mellanox/mlx5/core/qp.c | 18 +++++++++---------
> >  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> applied to for-next thanks

Oh wait, this is for netdev, not rdma sorry. Never mind, DaveM should
pick it up.

Jason

^ permalink raw reply

* Re: [PATCH] net/mlx5: remove some extraneous spaces in indentations
From: Jason Gunthorpe @ 2018-04-10 23:22 UTC (permalink / raw)
  To: Colin Ian King
  Cc: Saeed Mahameed, Matan Barak, Leon Romanovsky, netdev, linux-rdma,
	kernel-janitors, linux-kernel
In-Reply-To: <20180409124336.29274-1-colin.king@canonical.com>

On Mon, Apr 09, 2018 at 01:43:36PM +0100, Colin Ian King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> There are several lines where there is an extraneous space causing
> indentation misalignment. Remove them.
> 
> Cleans up Cocconelle warnings:
> 
> ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:409:3-18: code aligned
> with following code on line 410
> ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:415:3-18: code aligned
> with following code on line 416
> ./drivers/net/ethernet/mellanox/mlx5/core/qp.c:421:3-18: code aligned
> with following code on line 422
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/qp.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)

applied to for-next thanks

Jason

^ permalink raw reply

* Re: [PATCH bpf v4] bpf/tracing: fix a deadlock in perf_event_detach_bpf_prog
From: Daniel Borkmann @ 2018-04-10 23:14 UTC (permalink / raw)
  To: Yonghong Song, ast, netdev; +Cc: kernel-team
In-Reply-To: <20180410163732.2818747-1-yhs@fb.com>

On 04/10/2018 06:37 PM, Yonghong Song wrote:
> syzbot reported a possible deadlock in perf_event_detach_bpf_prog.
> The error details:
>   ======================================================
>   WARNING: possible circular locking dependency detected
>   4.16.0-rc7+ #3 Not tainted
>   ------------------------------------------------------
>   syz-executor7/24531 is trying to acquire lock:
>    (bpf_event_mutex){+.+.}, at: [<000000008a849b07>] perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
> 
>   but task is already holding lock:
>    (&mm->mmap_sem){++++}, at: [<0000000038768f87>] vm_mmap_pgoff+0x198/0x280 mm/util.c:353
> 
>   which lock already depends on the new lock.
> 
>   the existing dependency chain (in reverse order) is:
> 
[...]
> 
> The bug is introduced by Commit f371b304f12e ("bpf/tracing: allow
> user space to query prog array on the same tp") where copy_to_user,
> which requires mm->mmap_sem, is called inside bpf_event_mutex lock.
> At the same time, during perf_event file descriptor close,
> mm->mmap_sem is held first and then subsequent
> perf_event_detach_bpf_prog needs bpf_event_mutex lock.
> Such a senario caused a deadlock.
> 
> As suggested by Daniel, moving copy_to_user out of the
> bpf_event_mutex lock should fix the problem.
> 
> Fixes: f371b304f12e ("bpf/tracing: allow user space to query prog array on the same tp")
> Reported-by: syzbot+dc5ca0e4c9bfafaf2bae@syzkaller.appspotmail.com
> Signed-off-by: Yonghong Song <yhs@fb.com>

Applied to bpf tree, thanks Yonghong!

^ permalink raw reply

* Re: [PATCH] selftests: bpf: update .gitignore with missing generated files
From: Daniel Borkmann @ 2018-04-10 23:14 UTC (permalink / raw)
  To: Anders Roxell, ast, shuah; +Cc: netdev, linux-kernel, linux-kselftest
In-Reply-To: <20180410122421.25393-1-anders.roxell@linaro.org>

On 04/10/2018 02:24 PM, Anders Roxell wrote:
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>

Applied to bpf tree, thanks Anders!

^ permalink raw reply

* Re: [RFC PATCH net-next v6 4/4] netvsc: refactor notifier/event handling code to use the bypass framework
From: Samudrala, Sridhar @ 2018-04-10 22:56 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: alexander.h.duyck, virtio-dev, jiri, mst, kubakici, netdev,
	virtualization, loseweigh, davem
In-Reply-To: <20180410142608.50f15b45@xeon-e3>

On 4/10/2018 2:26 PM, Stephen Hemminger wrote:
> On Tue, 10 Apr 2018 11:59:50 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> Use the registration/notification framework supported by the generic
>> bypass infrastructure.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> ---
> Thanks for doing this.  Your current version has couple show stopper
> issues.
>
> First, the slave device is instantly taking over the slave.
> This doesn't allow udev/systemd to do its device rename of the slave
> device. Netvsc uses a delayed work to workaround this.

OK. I guess you are referring to the dev_set_mtu() and dev_open() calls that are
made in bypass_slave_register() and you want to defer them to be done after
a delay.  I could avoid these calls in case of netvsc based on bypass_ops.


>
> Secondly, the select queue needs to call queue selection in VF.
> The bonding/teaming logic doesn't work well for UDP flows.
> Commit b3bf5666a510 ("hv_netvsc: defer queue selection to VF")
> fixed this performance problem.

netvsc should not be using bypass_select_queue() as  that ndo op gets used
only with 3-netdev model.
Anyway, will look into updating bypass_select_queue() based on your fix.

>
> Lastly, more indirection is bad in current climate.
>
> I am not completely adverse to this but it needs to be fast, simple
> and completely transparent.

Not sure we can avoid this indirection if we want to commonize the code,  but use
different models for virtio-net and netvsc.

On the other hand, these patches avoid calls to get_netvsc_bymac() and
get_netvsc_by_ref() that go through all the devices for all the netdev events.
netvsc lookups should be much faster.

Thanks
Sridhar

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox