Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v12 3/5] dt-bindings: can: tcan4x5x: Add DT bindings for TCAN4x5X driver
From: Marc Kleine-Budde @ 2019-08-16  9:38 UTC (permalink / raw)
  To: Dan Murphy, wg, davem; +Cc: linux-can, netdev, linux-kernel
In-Reply-To: <20190509161109.10499-3-dmurphy@ti.com>


[-- Attachment #1.1: Type: text/plain, Size: 3240 bytes --]

On 5/9/19 6:11 PM, Dan Murphy wrote:
> DT binding documentation for TI TCAN4x5x driver.
> 
> Signed-off-by: Dan Murphy <dmurphy@ti.com>
> ---
> 
> v12 - No changes - https://lore.kernel.org/patchwork/patch/1052300/
> 
> v11 - No changes - https://lore.kernel.org/patchwork/patch/1051178/
> v10 - No changes - https://lore.kernel.org/patchwork/patch/1050488/
> v9 - No Changes - https://lore.kernel.org/patchwork/patch/1050118/
> v8 - No Changes - https://lore.kernel.org/patchwork/patch/1047981/
> v7 - Made device state optional - https://lore.kernel.org/patchwork/patch/1047218/
> v6 - No changes - https://lore.kernel.org/patchwork/patch/1042445/
> 
>  .../devicetree/bindings/net/can/tcan4x5x.txt  | 37 +++++++++++++++++++
>  1 file changed, 37 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/can/tcan4x5x.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/can/tcan4x5x.txt b/Documentation/devicetree/bindings/net/can/tcan4x5x.txt
> new file mode 100644
> index 000000000000..c388f7d9feb1
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/can/tcan4x5x.txt
> @@ -0,0 +1,37 @@
> +Texas Instruments TCAN4x5x CAN Controller
> +================================================
> +
> +This file provides device node information for the TCAN4x5x interface contains.
> +
> +Required properties:
> +	- compatible: "ti,tcan4x5x"
> +	- reg: 0
> +	- #address-cells: 1
> +	- #size-cells: 0
> +	- spi-max-frequency: Maximum frequency of the SPI bus the chip can
> +			     operate at should be less than or equal to 18 MHz.
> +	- data-ready-gpios: Interrupt GPIO for data and error reporting.
> +	- device-wake-gpios: Wake up GPIO to wake up the TCAN device.
> +
> +See Documentation/devicetree/bindings/net/can/m_can.txt for additional
> +required property details.
> +
> +Optional properties:
> +	- reset-gpios: Hardwired output GPIO. If not defined then software
> +		       reset.
> +	- device-state-gpios: Input GPIO that indicates if the device is in
> +			      a sleep state or if the device is active.
> +
> +Example:
> +tcan4x5x: tcan4x5x@0 {
> +		compatible = "ti,tcan4x5x";
> +		reg = <0>;
> +		#address-cells = <1>;
> +		#size-cells = <1>;
> +		spi-max-frequency = <10000000>;
> +		bosch,mram-cfg = <0x0 0 0 32 0 0 1 1>;
> +		data-ready-gpios = <&gpio1 14 GPIO_ACTIVE_LOW>;

Can you convert this into a proper interrupt property? E.g.:

>                 interrupt-parent = <&gpio4>;
>                 interrupts = <13 0x2>;

See:
https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bindings/net/can/microchip,mcp251x.txt#L21
https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/tree/drivers/net/can/spi/mcp251x.c?h=mcp251x#n945

> +		device-state-gpios = <&gpio3 21 GPIO_ACTIVE_HIGH>;
> +		device-wake-gpios = <&gpio1 15 GPIO_ACTIVE_HIGH>;
> +		reset-gpios = <&gpio1 27 GPIO_ACTIVE_LOW>;
> +};

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Thomas Gleixner @ 2019-08-16  9:59 UTC (permalink / raw)
  To: Jordan Glover
  Cc: Alexei Starovoitov, Andy Lutomirski, Daniel Colascione, Song Liu,
	Kees Cook, Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Lorenz Bauer, Jann Horn, Greg KH, Linux API,
	LSM List
In-Reply-To: <fkD3fs46a1YnR4lh0tEG-g3tDnDcyZuzji7bAUR9wujPLLl75ZhI8Yk-H1jZpSugO7qChVeCwxAMmxLdeoF2QFS3ZzuYlh7zmeZOmhDJxww=@protonmail.ch>

On Fri, 16 Aug 2019, Jordan Glover wrote:
> "systemd --user" service? Trying to do so will fail with:
> "Failed to apply ambient capabilities (before UID change): Operation not permitted"
> 
> I think it's crucial to clear that point to avoid confusion in this discussion
> where people are talking about different things.
> 
> On the other hand running "systemd --system" service with:
> 
> User=nobody
> AmbientCapabilities=CAP_NET_ADMIN
> 
> is perfectly legit and clears some security concerns as only privileged user
> can start such service.

While we are at it, can we please stop looking at this from a systemd only
perspective. There is a world outside of systemd.

Thanks,

	tglx

^ permalink raw reply

* RE: [PATCH net-next] r8152: divide the tx and rx bottom functions
From: Hayes Wang @ 2019-08-16 10:04 UTC (permalink / raw)
  To: Eric Dumazet, netdev@vger.kernel.org
  Cc: nic_swsd, linux-kernel@vger.kernel.org
In-Reply-To: <a262d73b-0e91-7610-c88f-9670cc6fd18d@gmail.com>

Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: Friday, August 16, 2019 5:27 PM
[...]
> Maybe you would avoid messing with a tasklet (we really try to get rid
> of tasklets in general) using two NAPI, one for TX, one for RX.
> 
> Some drivers already use two NAPI, it is fine.
> 
> This might avoid the ugly dance in r8152_poll(),
> calling napi_schedule(napi) after napi_complete_done() !

The reason is that the USB device couldn't control
the interrupt of USB controller. That is, I couldn't
disable the interrupt before napi_schedule and
enable it after napi_complete_done. If the callback
function occurs during the following timing, it is
possible no one would schedule the napi again.

static int r8152_poll(struct napi_struct *napi, int budget)
{
	struct r8152 *tp = container_of(napi, struct r8152, napi);
	int work_done;

	work_done = rx_bottom(tp, budget);
	bottom_half(tp);

	--> callback occurs here and try to call napi_schedule

	napi_complete_done(napi, work_done)

That is, no tx or rx could be handled unless
something trigger napi_schedule.


Best Regards,
Hayes




^ permalink raw reply

* [PATCH v2] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Jian-Hong Pan @ 2019-08-16 10:09 UTC (permalink / raw)
  To: Yan-Hsuan Chuang, Kalle Valo, David S . Miller
  Cc: linux-wireless, netdev, linux-kernel, linux, Jian-Hong Pan
In-Reply-To: <CAPpJ_edibR0bxO0Pg=NAaRU8fGYheyN8NTv-gVyTDCJhE-iG5Q@mail.gmail.com>

There is a mass of jobs between spin lock and unlock in the hardware
IRQ which will occupy much time originally. To make system work more
efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
reduce the time in hardware IRQ.

Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
---
v2:
 Change the spin_lock_irqsave/unlock_irqrestore to spin_lock/unlock in
 rtw_pci_interrupt_handler. Because the interrupts are already disabled
 in the hardware interrupt handler.

 drivers/net/wireless/realtek/rtw88/pci.c | 33 +++++++++++++++++++-----
 1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index 00ef229552d5..0740140d7e46 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -866,12 +866,28 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq, void *dev)
 {
 	struct rtw_dev *rtwdev = dev;
 	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
-	u32 irq_status[4];
 
 	spin_lock(&rtwpci->irq_lock);
 	if (!rtwpci->irq_enabled)
 		goto out;
 
+	/* disable RTW PCI interrupt to avoid more interrupts before the end of
+	 * thread function
+	 */
+	rtw_pci_disable_interrupt(rtwdev, rtwpci);
+out:
+	spin_unlock(&rtwpci->irq_lock);
+
+	return IRQ_WAKE_THREAD;
+}
+
+static irqreturn_t rtw_pci_interrupt_threadfn(int irq, void *dev)
+{
+	struct rtw_dev *rtwdev = dev;
+	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
+	unsigned long flags;
+	u32 irq_status[4];
+
 	rtw_pci_irq_recognized(rtwdev, rtwpci, irq_status);
 
 	if (irq_status[0] & IMR_MGNTDOK)
@@ -891,8 +907,11 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq, void *dev)
 	if (irq_status[0] & IMR_ROK)
 		rtw_pci_rx_isr(rtwdev, rtwpci, RTW_RX_QUEUE_MPDU);
 
-out:
-	spin_unlock(&rtwpci->irq_lock);
+	/* all of the jobs for this interrupt have been done */
+	spin_lock_irqsave(&rtwpci->irq_lock, flags);
+	if (rtw_flag_check(rtwdev, RTW_FLAG_RUNNING))
+		rtw_pci_enable_interrupt(rtwdev, rtwpci);
+	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
 
 	return IRQ_HANDLED;
 }
@@ -1152,8 +1171,10 @@ static int rtw_pci_probe(struct pci_dev *pdev,
 		goto err_destroy_pci;
 	}
 
-	ret = request_irq(pdev->irq, &rtw_pci_interrupt_handler,
-			  IRQF_SHARED, KBUILD_MODNAME, rtwdev);
+	ret = devm_request_threaded_irq(rtwdev->dev, pdev->irq,
+					rtw_pci_interrupt_handler,
+					rtw_pci_interrupt_threadfn,
+					IRQF_SHARED, KBUILD_MODNAME, rtwdev);
 	if (ret) {
 		ieee80211_unregister_hw(hw);
 		goto err_destroy_pci;
@@ -1192,7 +1213,7 @@ static void rtw_pci_remove(struct pci_dev *pdev)
 	rtw_pci_disable_interrupt(rtwdev, rtwpci);
 	rtw_pci_destroy(rtwdev, pdev);
 	rtw_pci_declaim(rtwdev, pdev);
-	free_irq(rtwpci->pdev->irq, rtwdev);
+	devm_free_irq(rtwdev->dev, rtwpci->pdev->irq, rtwdev);
 	rtw_core_deinit(rtwdev);
 	ieee80211_free_hw(hw);
 }
-- 
2.20.1


^ permalink raw reply related

* [PATCH bpf-next] libbpf: remove zc variable as it is not used
From: Magnus Karlsson @ 2019-08-16 10:26 UTC (permalink / raw)
  To: magnus.karlsson, bjorn.topel, ast, daniel, netdev; +Cc: bpf, yhs

The zc is not used in the xsk part of libbpf, so let us remove it. Not
good to have dead code lying around.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reported-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/xsk.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 680e630..9687da9 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -65,7 +65,6 @@ struct xsk_socket {
 	int xsks_map_fd;
 	__u32 queue_id;
 	char ifname[IFNAMSIZ];
-	bool zc;
 };
 
 struct xsk_nl_info {
@@ -608,8 +607,6 @@ int xsk_socket__create(struct xsk_socket **xsk_ptr, const char *ifname,
 		goto out_mmap_tx;
 	}
 
-	xsk->zc = opts.flags & XDP_OPTIONS_ZEROCOPY;
-
 	if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
 		err = xsk_setup_xdp_prog(xsk);
 		if (err)
-- 
2.7.4


^ permalink raw reply related

* [PATCH v2] tun: fix use-after-free when register netdev failed
From: Yang Yingliang @ 2019-08-16 11:00 UTC (permalink / raw)
  To: netdev
  Cc: jasowang, eric.dumazet, xiyou.wangcong, davem, yangyingliang,
	weiyongjun1

I got a UAF repport in tun driver when doing fuzzy test:

[  466.269490] ==================================================================
[  466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0
[  466.271806] Read of size 8 at addr ffff888372139250 by task tun-test/2699
[  466.271810]
[  466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 5.3.0-rc1-00001-g5a9433db2614-dirty #427
[  466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[  466.271838] Call Trace:
[  466.271858]  dump_stack+0xca/0x13e
[  466.271871]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271890]  print_address_description+0x79/0x440
[  466.271906]  ? vprintk_func+0x5e/0xf0
[  466.271920]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271935]  __kasan_report+0x15c/0x1df
[  466.271958]  ? tun_chr_read_iter+0x2ca/0x2d0
[  466.271976]  kasan_report+0xe/0x20
[  466.271987]  tun_chr_read_iter+0x2ca/0x2d0
[  466.272013]  do_iter_readv_writev+0x4b7/0x740
[  466.272032]  ? default_llseek+0x2d0/0x2d0
[  466.272072]  do_iter_read+0x1c5/0x5e0
[  466.272110]  vfs_readv+0x108/0x180
[  466.299007]  ? compat_rw_copy_check_uvector+0x440/0x440
[  466.299020]  ? fsnotify+0x888/0xd50
[  466.299040]  ? __fsnotify_parent+0xd0/0x350
[  466.299064]  ? fsnotify_first_mark+0x1e0/0x1e0
[  466.304548]  ? vfs_write+0x264/0x510
[  466.304569]  ? ksys_write+0x101/0x210
[  466.304591]  ? do_preadv+0x116/0x1a0
[  466.304609]  do_preadv+0x116/0x1a0
[  466.309829]  do_syscall_64+0xc8/0x600
[  466.309849]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.309861] RIP: 0033:0x4560f9
[  466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
[  466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000127
[  466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 00000000004560f9
[  466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 0000000000000003
[  466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 0000000000000000
[  466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040cb10
[  466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 0000000000000000
[  466.323057]
[  466.323064] Allocated by task 2605:
[  466.335165]  save_stack+0x19/0x80
[  466.336240]  __kasan_kmalloc.constprop.8+0xa0/0xd0
[  466.337755]  kmem_cache_alloc+0xe8/0x320
[  466.339050]  getname_flags+0xca/0x560
[  466.340229]  user_path_at_empty+0x2c/0x50
[  466.341508]  vfs_statx+0xe6/0x190
[  466.342619]  __do_sys_newstat+0x81/0x100
[  466.343908]  do_syscall_64+0xc8/0x600
[  466.345303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.347034]
[  466.347517] Freed by task 2605:
[  466.348471]  save_stack+0x19/0x80
[  466.349476]  __kasan_slab_free+0x12e/0x180
[  466.350726]  kmem_cache_free+0xc8/0x430
[  466.351874]  putname+0xe2/0x120
[  466.352921]  filename_lookup+0x257/0x3e0
[  466.354319]  vfs_statx+0xe6/0x190
[  466.355498]  __do_sys_newstat+0x81/0x100
[  466.356889]  do_syscall_64+0xc8/0x600
[  466.358037]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  466.359567]
[  466.360050] The buggy address belongs to the object at ffff888372139100
[  466.360050]  which belongs to the cache names_cache of size 4096
[  466.363735] The buggy address is located 336 bytes inside of
[  466.363735]  4096-byte region [ffff888372139100, ffff88837213a100)
[  466.367179] The buggy address belongs to the page:
[  466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0
[  466.371582] flags: 0x2fffff80010200(slab|head)
[  466.372910] raw: 002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00
[  466.375209] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[  466.377778] page dumped because: kasan: bad access detected
[  466.379730]
[  466.380288] Memory state around the buggy address:
[  466.381844]  ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.384009]  ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.388257]                                                  ^
[  466.390234]  ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.392512]  ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  466.394667] ==================================================================

tun_chr_read_iter() accessed the memory which freed by free_netdev()
called by tun_set_iff():

        CPUA                                     CPUB
    tun_set_iff()
      alloc_netdev_mqs()
      tun_attach()
                                            tun_chr_read_iter()
                                              tun_get()
      register_netdevice() <-- inject error
      tun_detach_all()
        synchronize_net()
                                              tun_do_read()
                                                tun_ring_recv()
                                                  schedule()
      free_netdev()
        netdev_freemem()
                                              tun_put()
                                                dev_put() <-- UAF

Move tun_set_real_num_queues() out of tun_attach() and call it
before register_netdevice() in tun_set_iff().

Call tun_attach() after register_netdevice() to make sure tfile->tun
is not published until the netdevice is registered. So the read/write
thread can not use the tun pointer that may freed by free_netdev().
(The tun and dev pointer are allocated by alloc_netdev_mqs(), they can
be freed by netdev_freemem().)

---
Changes in v2:
 - add a param in tun_set_real_num_queues()
 - move tun_set_real_num_queues() out of tun_attach()
 - call tun_set_real_num_queues() before register_netdevice()
 - call tun_attach() after register_netdevice()
---

Cc: Yang Yingliang <yangyingliang@huawei.com>
Fixes: eb0fb363f920 ("tuntap: attach queue 0 before registering netdevice")
Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 drivers/net/tun.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index db16d7a13e00..a19f864c5f8d 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -626,10 +626,11 @@ static inline bool tun_not_capable(struct tun_struct *tun)
 		!ns_capable(net->user_ns, CAP_NET_ADMIN);
 }
 
-static void tun_set_real_num_queues(struct tun_struct *tun)
+static void tun_set_real_num_queues(struct tun_struct *tun,
+				    unsigned int numqueues)
 {
-	netif_set_real_num_tx_queues(tun->dev, tun->numqueues);
-	netif_set_real_num_rx_queues(tun->dev, tun->numqueues);
+	netif_set_real_num_tx_queues(tun->dev, numqueues);
+	netif_set_real_num_rx_queues(tun->dev, numqueues);
 }
 
 static void tun_disable_queue(struct tun_struct *tun, struct tun_file *tfile)
@@ -708,7 +709,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
 		tun_flow_delete_by_queue(tun, tun->numqueues + 1);
 		/* Drop read queue */
 		tun_queue_purge(tfile);
-		tun_set_real_num_queues(tun);
+		tun_set_real_num_queues(tun, tun->numqueues);
 	} else if (tfile->detached && clean) {
 		tun = tun_enable_queue(tfile);
 		sock_put(&tfile->sk);
@@ -873,7 +874,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 	rcu_assign_pointer(tfile->tun, tun);
 	rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
 	tun->numqueues++;
-	tun_set_real_num_queues(tun);
 out:
 	return err;
 }
@@ -2734,6 +2734,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		if (err < 0)
 			return err;
 
+		tun_set_real_num_queues(tun, tun->numqueues);
+
 		if (tun->flags & IFF_MULTI_QUEUE &&
 		    (tun->numqueues + tun->numdisabled > 1)) {
 			/* One or more queue has already been attached, no need
@@ -2828,14 +2830,18 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 			      (ifr->ifr_flags & TUN_FEATURES);
 
 		INIT_LIST_HEAD(&tun->disabled);
-		err = tun_attach(tun, file, false, ifr->ifr_flags & IFF_NAPI,
-				 ifr->ifr_flags & IFF_NAPI_FRAGS);
-		if (err < 0)
-			goto err_free_flow;
+
+		tun_set_real_num_queues(tun, tun->numqueues + 1);
 
 		err = register_netdevice(tun->dev);
 		if (err < 0)
-			goto err_detach;
+			/* register_netdevice() already called tun_free_netdev() */
+			goto err_free_dev;
+
+		err = tun_attach(tun, file, false, ifr->ifr_flags & IFF_NAPI,
+				 ifr->ifr_flags & IFF_NAPI_FRAGS);
+		if (err < 0)
+			goto err_unregister;
 	}
 
 	netif_carrier_on(tun->dev);
@@ -2851,14 +2857,10 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	strcpy(ifr->ifr_name, tun->dev->name);
 	return 0;
 
-err_detach:
-	tun_detach_all(dev);
-	/* register_netdevice() already called tun_free_netdev() */
-	goto err_free_dev;
+err_unregister:
+	unregister_netdevice(dev);
+	return err;
 
-err_free_flow:
-	tun_flow_uninit(tun);
-	security_tun_dev_free_security(tun->security);
 err_free_stat:
 	free_percpu(tun->pcpu_stats);
 err_free_dev:
@@ -2979,6 +2981,8 @@ static int tun_set_queue(struct file *file, struct ifreq *ifr)
 			goto unlock;
 		ret = tun_attach(tun, file, false, tun->flags & IFF_NAPI,
 				 tun->flags & IFF_NAPI_FRAGS);
+		if (!ret)
+			tun_set_real_num_queues(tun, tun->numqueues);
 	} else if (ifr->ifr_flags & IFF_DETACH_QUEUE) {
 		tun = rtnl_dereference(tfile->tun);
 		if (!tun || !(tun->flags & IFF_MULTI_QUEUE) || tfile->detached)
-- 
2.17.1


^ permalink raw reply related

* RE: [PATCH v2] rtw88: pci: Move a mass of jobs in hw IRQ to soft IRQ
From: Tony Chuang @ 2019-08-16 10:44 UTC (permalink / raw)
  To: Jian-Hong Pan, Kalle Valo, David S . Miller
  Cc: linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux@endlessm.com
In-Reply-To: <20190816100903.7549-1-jian-hong@endlessm.com>

> From: Jian-Hong Pan
> 
> There is a mass of jobs between spin lock and unlock in the hardware
> IRQ which will occupy much time originally. To make system work more
> efficiently, this patch moves the jobs to the soft IRQ (bottom half) to
> reduce the time in hardware IRQ.
> 
> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> ---
> v2:
>  Change the spin_lock_irqsave/unlock_irqrestore to spin_lock/unlock in
>  rtw_pci_interrupt_handler. Because the interrupts are already disabled
>  in the hardware interrupt handler.
> 
>  drivers/net/wireless/realtek/rtw88/pci.c | 33 +++++++++++++++++++-----
>  1 file changed, 27 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/wireless/realtek/rtw88/pci.c
> b/drivers/net/wireless/realtek/rtw88/pci.c
> index 00ef229552d5..0740140d7e46 100644
> --- a/drivers/net/wireless/realtek/rtw88/pci.c
> +++ b/drivers/net/wireless/realtek/rtw88/pci.c
> @@ -866,12 +866,28 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq,
> void *dev)
>  {
>  	struct rtw_dev *rtwdev = dev;
>  	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> -	u32 irq_status[4];
> 
>  	spin_lock(&rtwpci->irq_lock);
>  	if (!rtwpci->irq_enabled)
>  		goto out;
> 
> +	/* disable RTW PCI interrupt to avoid more interrupts before the end of
> +	 * thread function
> +	 */
> +	rtw_pci_disable_interrupt(rtwdev, rtwpci);

So basically it's to prevent back-to-back interrupts.

Nothing wrong about it, I just wondering why we don't like
back-to-back interrupts. Does it means that those interrupts
fired between irq_handler and threadfin would increase
much more time to consume them.

> +out:
> +	spin_unlock(&rtwpci->irq_lock);
> +
> +	return IRQ_WAKE_THREAD;
> +}
> +
> +static irqreturn_t rtw_pci_interrupt_threadfn(int irq, void *dev)
> +{
> +	struct rtw_dev *rtwdev = dev;
> +	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
> +	unsigned long flags;
> +	u32 irq_status[4];
> +
>  	rtw_pci_irq_recognized(rtwdev, rtwpci, irq_status);
> 
>  	if (irq_status[0] & IMR_MGNTDOK)
> @@ -891,8 +907,11 @@ static irqreturn_t rtw_pci_interrupt_handler(int irq,
> void *dev)
>  	if (irq_status[0] & IMR_ROK)
>  		rtw_pci_rx_isr(rtwdev, rtwpci, RTW_RX_QUEUE_MPDU);
> 
> -out:
> -	spin_unlock(&rtwpci->irq_lock);
> +	/* all of the jobs for this interrupt have been done */
> +	spin_lock_irqsave(&rtwpci->irq_lock, flags);

I suggest to protect the ISRs. Because next patches will require
to check if the TX DMA path is empty. This means I will also add
this rtwpci->irq_lock to the TX path, and check if the skb_queue
does not have any pending SKBs not DMAed successfully.

> +	if (rtw_flag_check(rtwdev, RTW_FLAG_RUNNING))

Why check the flag here? Is there any racing or something?
Otherwise it looks to break the symmetry.

> +		rtw_pci_enable_interrupt(rtwdev, rtwpci);
> +	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
> 
>  	return IRQ_HANDLED;
>  }
> @@ -1152,8 +1171,10 @@ static int rtw_pci_probe(struct pci_dev *pdev,
>  		goto err_destroy_pci;
>  	}
> 
> -	ret = request_irq(pdev->irq, &rtw_pci_interrupt_handler,
> -			  IRQF_SHARED, KBUILD_MODNAME, rtwdev);
> +	ret = devm_request_threaded_irq(rtwdev->dev, pdev->irq,
> +					rtw_pci_interrupt_handler,
> +					rtw_pci_interrupt_threadfn,
> +					IRQF_SHARED, KBUILD_MODNAME, rtwdev);
>  	if (ret) {
>  		ieee80211_unregister_hw(hw);
>  		goto err_destroy_pci;
> @@ -1192,7 +1213,7 @@ static void rtw_pci_remove(struct pci_dev *pdev)
>  	rtw_pci_disable_interrupt(rtwdev, rtwpci);
>  	rtw_pci_destroy(rtwdev, pdev);
>  	rtw_pci_declaim(rtwdev, pdev);
> -	free_irq(rtwpci->pdev->irq, rtwdev);
> +	devm_free_irq(rtwdev->dev, rtwpci->pdev->irq, rtwdev);
>  	rtw_core_deinit(rtwdev);
>  	ieee80211_free_hw(hw);
>  }
> --
> 2.20.1

Yan-Hsuan


^ permalink raw reply

* Re: [PATCH v2 00/10] Add definition for the number of standard PCI BARs
From: Andrew Murray @ 2019-08-16 10:51 UTC (permalink / raw)
  To: Denis Efremov
  Cc: Bjorn Helgaas, linux-kernel, linux-pci, Sebastian Ott,
	Gerald Schaefer, H. Peter Anvin, Giuseppe Cavallaro,
	Alexandre Torgue, Matt Porter, Alexandre Bounine, Peter Jones,
	Bartlomiej Zolnierkiewicz, Cornelia Huck, Alex Williamson,
	Jose Abreu, kvm, linux-fbdev, netdev, x86, linux-s390
In-Reply-To: <20190816092437.31846-1-efremov@linux.com>

On Fri, Aug 16, 2019 at 12:24:27PM +0300, Denis Efremov wrote:
> Code that iterates over all standard PCI BARs typically uses
> PCI_STD_RESOURCE_END, but this is error-prone because it requires
> "i <= PCI_STD_RESOURCE_END" rather than something like
> "i < PCI_STD_NUM_BARS". We could add such a definition and use it the same
> way PCI_SRIOV_NUM_BARS is used. There is already the definition
> PCI_BAR_COUNT for s390 only. Thus, this patchset introduces it globally.
> 
> Changes in v2:
>   - Reverse checks in pci_iomap_range,pci_iomap_wc_range.
>   - Refactor loops in vfio_pci to keep PCI_STD_RESOURCES.
>   - Add 2 new patches to replace the magic constant with new define.
>   - Split net patch in v1 to separate stmmac and dwc-xlgmac patches.
> 
> Denis Efremov (10):
>   PCI: Add define for the number of standard PCI BARs
>   s390/pci: Loop using PCI_STD_NUM_BARS
>   x86/PCI: Loop using PCI_STD_NUM_BARS
>   stmmac: pci: Loop using PCI_STD_NUM_BARS
>   net: dwc-xlgmac: Loop using PCI_STD_NUM_BARS
>   rapidio/tsi721: Loop using PCI_STD_NUM_BARS
>   efifb: Loop using PCI_STD_NUM_BARS
>   vfio_pci: Loop using PCI_STD_NUM_BARS
>   PCI: hv: Use PCI_STD_NUM_BARS
>   PCI: Use PCI_STD_NUM_BARS
> 
>  arch/s390/include/asm/pci.h                      |  5 +----
>  arch/s390/include/asm/pci_clp.h                  |  6 +++---
>  arch/s390/pci/pci.c                              | 16 ++++++++--------
>  arch/s390/pci/pci_clp.c                          |  6 +++---
>  arch/x86/pci/common.c                            |  2 +-
>  drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c |  4 ++--
>  drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c   |  2 +-
>  drivers/pci/controller/pci-hyperv.c              | 10 +++++-----
>  drivers/pci/pci.c                                | 11 ++++++-----
>  drivers/pci/quirks.c                             |  4 ++--
>  drivers/rapidio/devices/tsi721.c                 |  2 +-
>  drivers/vfio/pci/vfio_pci.c                      | 11 +++++++----
>  drivers/vfio/pci/vfio_pci_config.c               | 10 ++++++----
>  drivers/vfio/pci/vfio_pci_private.h              |  4 ++--
>  drivers/video/fbdev/efifb.c                      |  2 +-
>  include/linux/pci.h                              |  2 +-
>  include/uapi/linux/pci_regs.h                    |  1 +
>  17 files changed, 51 insertions(+), 47 deletions(-)

I've come across a few more places where this change can be made. There
may be multiple instances in the same file, but only the first is shown
below:

drivers/misc/pci_endpoint_test.c:       for (bar = BAR_0; bar <= BAR_5; bar++) {
drivers/net/ethernet/intel/e1000/e1000_main.c:          for (i = BAR_1; i <= BAR_5; i++) {
drivers/net/ethernet/intel/ixgb/ixgb_main.c:    for (i = BAR_1; i <= BAR_5; i++) {
drivers/pci/controller/dwc/pci-dra7xx.c:        for (bar = BAR_0; bar <= BAR_5; bar++)
drivers/pci/controller/dwc/pci-layerscape-ep.c: for (bar = BAR_0; bar <= BAR_5; bar++)
drivers/pci/controller/dwc/pcie-artpec6.c:      for (bar = BAR_0; bar <= BAR_5; bar++)
drivers/pci/controller/dwc/pcie-designware-plat.c:      for (bar = BAR_0; bar <= BAR_5; bar++)
drivers/pci/endpoint/functions/pci-epf-test.c:  for (bar = BAR_0; bar <= BAR_5; bar++) {
include/linux/pci-epc.h:        u64     bar_fixed_size[BAR_5 + 1];
drivers/scsi/pm8001/pm8001_hwi.c:       for (bar = 0; bar < 6; bar++) {
drivers/scsi/pm8001/pm8001_init.c:      for (bar = 0; bar < 6; bar++) {
drivers/ata/sata_nv.c:  for (bar = 0; bar < 6; bar++)
drivers/video/fbdev/core/fbmem.c:       for (idx = 0, bar = 0; bar < PCI_ROM_RESOURCE; bar++) {
drivers/staging/gasket/gasket_core.c:   for (i = 0; i < GASKET_NUM_BARS; i++) {
drivers/tty/serial/8250/8250_pci.c:     for (i = 0; i < PCI_NUM_BAR_RESOURCES; i++) { <-----------

It looks like BARs are often iterated with PCI_NUM_BAR_RESOURCES, there
are a load of these too found with:

git grep PCI_ROM_RESOURCE | grep "< "

I'm happy to share patches if preferred.

Thanks,

Andrew Murray

> 
> -- 
> 2.21.0
> 

^ permalink raw reply

* Re: [PATCH net] tunnel: fix dev null pointer dereference when send pkg larger than mtu in collect_md mode
From: Hangbin Liu @ 2019-08-16 10:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Stefano Brivio, wenxu, Alexei Starovoitov,
	David S . Miller
In-Reply-To: <fe103dee-bba8-1e0d-83b2-f91c2c37089d@gmail.com>

On Fri, Aug 16, 2019 at 10:23:55AM +0200, Eric Dumazet wrote:
> 
> 
> On 8/16/19 5:24 AM, Hangbin Liu wrote:
> > Hi Eric,
> > 
> > Thanks for the review.
> > On Thu, Aug 15, 2019 at 11:16:58AM +0200, Eric Dumazet wrote:
> >>> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> >>> index 38c02bb62e2c..c6713c7287df 100644
> >>> --- a/net/ipv4/ip_tunnel.c
> >>> +++ b/net/ipv4/ip_tunnel.c
> >>> @@ -597,6 +597,9 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
> >>>  		goto tx_error;
> >>>  	}
> >>>  
> >>> +	if (skb_dst(skb) && !skb_dst(skb)->dev)
> >>> +		skb_dst(skb)->dev = rt->dst.dev;
> >>> +
> >>
> >>
> >> IMO this looks wrong.
> >> This dst seems shared. 
> > 
> > If the dst is shared, it may cause some problem. Could you point me where the
> > dst may be shared possibly?
> >
> 
> dst are inherently shared.
> 
> This is why we have a refcount on them.
> 
> Only when the dst has been allocated by the current thread we can make changes on them.
> 

OK, I see now.

Then how about fix the issue in __icmp_send and decode_session{4,6}. The
fix in there is safe, as in __icmp_send() we only want to get net,
dev_net(skb_in->dev) could also do the work, just as icmp6_send() does.

For decode_session{4,6} the oif is also not needed in this scenario as this
is called by xfrm_decode_session_reverse(), we only need the skb_iif
fl4->flowi4_oif = reverse ? skb->skb_iif : oif;

I also need to check more code in OVS..

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 1510e951f451..95d803543df5 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -582,7 +582,11 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,

        if (!rt)
                goto out;
-       net = dev_net(rt->dst.dev);
+
+       if (skb_in->dev)
+               net = dev_net(skb_in->dev);
+       else
+               goto out;

        /*
         *      Find the original header. It is expected to be valid, of course.
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 8ca637a72697..ec94f5795ea4 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -3269,7 +3269,7 @@ decode_session4(struct sk_buff *skb, struct flowi *fl, bool reverse)
        struct flowi4 *fl4 = &fl->u.ip4;
        int oif = 0;

-       if (skb_dst(skb))
+       if (skb_dst(skb) && skb_dst(skb)->dev)
                oif = skb_dst(skb)->dev->ifindex;

        memset(fl4, 0, sizeof(struct flowi4));
@@ -3387,7 +3387,7 @@ decode_session6(struct sk_buff *skb, struct flowi *fl, bool reverse)

        nexthdr = nh[nhoff];

-       if (skb_dst(skb))
+       if (skb_dst(skb) && skb_dst(skb)->dev)
                oif = skb_dst(skb)->dev->ifindex;

        memset(fl6, 0, sizeof(struct flowi6));


Thanks
Hangbin

^ permalink raw reply related

* Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
From: Jordan Glover @ 2019-08-16 11:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alexei Starovoitov, Andy Lutomirski, Daniel Colascione, Song Liu,
	Kees Cook, Networking, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Lorenz Bauer, Jann Horn, Greg KH, Linux API,
	LSM List
In-Reply-To: <alpine.DEB.2.21.1908161158490.1873@nanos.tec.linutronix.de>

On Friday, August 16, 2019 9:59 AM, Thomas Gleixner <tglx@linutronix.de> wrote:

> On Fri, 16 Aug 2019, Jordan Glover wrote:
>
> > "systemd --user" service? Trying to do so will fail with:
> > "Failed to apply ambient capabilities (before UID change): Operation not permitted"
> > I think it's crucial to clear that point to avoid confusion in this discussion
> > where people are talking about different things.
> > On the other hand running "systemd --system" service with:
> > User=nobody
> > AmbientCapabilities=CAP_NET_ADMIN
> > is perfectly legit and clears some security concerns as only privileged user
> > can start such service.
>
> While we are at it, can we please stop looking at this from a systemd only
> perspective. There is a world outside of systemd.
>
> Thanks,
>
> tglx

If you define:

"systemd --user" == unprivileged process started by unprivileged user
"systemd --system" == process started by privileged user but run as another
user which keeps some of parent user privileges and drops others

you can get rid of "systemd" from the equation.

"systemd --user" was the example provided by Alexei when asked about the usecase
but his description didn't match what it does so it's not obvious what the real
usecase is. I'm sure there can be many more examples and systemd isn't important
here in particular beside to understand this specific example.

Jordan

^ permalink raw reply

* Re: [EXT] Re: [PATCH net-next 1/1] Added BASE-T1 PHY support to PHY Subsystem
From: Christian Herber @ 2019-08-16 11:56 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190815155613.GE15291@lunn.ch>

On 15.08.2019 17:56, Andrew Lunn wrote:
> Caution: EXT Email
> 
> On Thu, Aug 15, 2019 at 03:32:29PM +0000, Christian Herber wrote:
>> BASE-T1 is a category of Ethernet PHYs.
>> They use a single copper pair for transmission.
>> This patch add basic support for this category of PHYs.
>> It coveres the discovery of abilities and basic configuration.
>> It includes setting fixed speed and enabling auto-negotiation.
>> BASE-T1 devices should always Clause-45 managed.
>> Therefore, this patch extends phy-c45.c.
>> While for some functions like auto-neogtiation different registers are
>> used, the layout of these registers is the same for the used fields.
>> Thus, much of the logic of basic Clause-45 devices can be reused.
>>
>> Signed-off-by: Christian Herber <christian.herber@nxp.com>
>> ---
>>   drivers/net/phy/phy-c45.c    | 113 +++++++++++++++++++++++++++++++----
>>   drivers/net/phy/phy-core.c   |   4 +-
>>   include/uapi/linux/ethtool.h |   2 +
>>   include/uapi/linux/mdio.h    |  21 +++++++
>>   4 files changed, 129 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
>> index b9d4145781ca..9ff0b8c785de 100644
>> --- a/drivers/net/phy/phy-c45.c
>> +++ b/drivers/net/phy/phy-c45.c
>> @@ -8,13 +8,23 @@
>>   #include <linux/mii.h>
>>   #include <linux/phy.h>
>>
>> +#define IS_100BASET1(phy) (linkmode_test_bit( \
>> +                        ETHTOOL_LINK_MODE_100baseT1_Full_BIT, \
>> +                        (phy)->supported))
>> +#define IS_1000BASET1(phy) (linkmode_test_bit( \
>> +                         ETHTOOL_LINK_MODE_1000baseT1_Full_BIT, \
>> +                         (phy)->supported))
> 
> Hi Christian
> 
> We already have the flag phydev->is_gigabit_capable. Maybe add a flag
> phydev->is_t1_capable
> 
>> +
>> +static u32 get_aneg_ctrl(struct phy_device *phydev);
>> +static u32 get_aneg_stat(struct phy_device *phydev);
> 
> No forward declarations please. Put the code in the right order so
> they are not needed.
> 
> Thanks
> 
>       Andrew
> 

Hi Andrew,

thanks for feedback. The use of an additional flag is a good proposal.
I was hesitant to touch the phydev structure.
I will add this along with removing the forward declaration in v2.

Regards,

Christian

^ permalink raw reply

* Re: [EXT] Re: [PATCH net-next 1/1] Added BASE-T1 PHY support to PHY Subsystem
From: Christian Herber @ 2019-08-16 12:05 UTC (permalink / raw)
  To: Heiner Kallweit, Andrew Lunn
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Florian Fainelli
In-Reply-To: <2ca68436-8e49-b0b2-2460-4fcac3094b09@gmail.com>

On 15.08.2019 18:34, Heiner Kallweit wrote:
> Caution: EXT Email
> 
> On 15.08.2019 17:56, Andrew Lunn wrote:
>> On Thu, Aug 15, 2019 at 03:32:29PM +0000, Christian Herber wrote:
>>> BASE-T1 is a category of Ethernet PHYs.
>>> They use a single copper pair for transmission.
>>> This patch add basic support for this category of PHYs.
>>> It coveres the discovery of abilities and basic configuration.
>>> It includes setting fixed speed and enabling auto-negotiation.
>>> BASE-T1 devices should always Clause-45 managed.
>>> Therefore, this patch extends phy-c45.c.
>>> While for some functions like auto-neogtiation different registers are
>>> used, the layout of these registers is the same for the used fields.
>>> Thus, much of the logic of basic Clause-45 devices can be reused.
>>>
>>> Signed-off-by: Christian Herber <christian.herber@nxp.com>
>>> ---
>>>   drivers/net/phy/phy-c45.c    | 113 +++++++++++++++++++++++++++++++----
>>>   drivers/net/phy/phy-core.c   |   4 +-
>>>   include/uapi/linux/ethtool.h |   2 +
>>>   include/uapi/linux/mdio.h    |  21 +++++++
>>>   4 files changed, 129 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/net/phy/phy-c45.c b/drivers/net/phy/phy-c45.c
>>> index b9d4145781ca..9ff0b8c785de 100644
>>> --- a/drivers/net/phy/phy-c45.c
>>> +++ b/drivers/net/phy/phy-c45.c
>>> @@ -8,13 +8,23 @@
>>>   #include <linux/mii.h>
>>>   #include <linux/phy.h>
>>>
>>> +#define IS_100BASET1(phy) (linkmode_test_bit( \
>>> +                       ETHTOOL_LINK_MODE_100baseT1_Full_BIT, \
>>> +                       (phy)->supported))
>>> +#define IS_1000BASET1(phy) (linkmode_test_bit( \
>>> +                        ETHTOOL_LINK_MODE_1000baseT1_Full_BIT, \
>>> +                        (phy)->supported))
>>
>> Hi Christian
>>
>> We already have the flag phydev->is_gigabit_capable. Maybe add a flag
>> phydev->is_t1_capable
>>
>>> +
>>> +static u32 get_aneg_ctrl(struct phy_device *phydev);
>>> +static u32 get_aneg_stat(struct phy_device *phydev);
>>
>> No forward declarations please. Put the code in the right order so
>> they are not needed.
>>
>> Thanks
>>
>>       Andrew
>>
> 
> For whatever reason I don't have the original mail in my netdev inbox (yet).
> 
> +       if (IS_100BASET1(phydev) || IS_1000BASET1(phydev))
> +               ctrl = MDIO_AN_BT1_CTRL;
> 
> Code like this could be problematic once a PHY supports one of the T1 modes
> AND normal modes. Then normal modes would be unusable.
> 
> I think this scenario isn't completely hypothetical. See the Aquantia
> AQCS109 that supports normal modes and (proprietary) 1000Base-T2.
> 
> Maybe we need separate versions of the generic functions for T1.
> Then it would be up to the PHY driver to decide when to use which
> version.
> 
> Heiner
> 

Integrating this with the existing driver or creating a new on is an 
interesting question. I came to the conclusion that it is most efficient 
to integrate to avoid all to much copy paste code.

So far, I am not aware of any device that supports T1 and something else 
at the same time. From a HW perspective, I also consider this quite 
unlikely. In the unlikely case that such a device comes up, support from 
the genphy driver would be limited to the BASE-T1 modes. But i would 
rather create the special case for the special device and cater the 
current mainstream support to the mainstream devices.

I think this boils down to a general strategy for the PHY framework, as 
this can happen for other classes of devices also like NGBASE-T1, 
MultiGBASE-T and future unknown devices. For now, I think the 
architecture is sufficiently scalable with a single c45 genphy driver

Christian

^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: relicense bpf_helpers.h and bpf_endian.h
From: Jesper Dangaard Brouer @ 2019-08-16 12:10 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: brouer, bpf, netdev, andrii.nakryiko, kernel-team,
	Michael Holzheu, Naveen N . Rao, David S . Miller,
	Michal Rostecki, John Fastabend, Sargun Dhillon
In-Reply-To: <20190816054543.2215626-1-andriin@fb.com>

On Thu, 15 Aug 2019 22:45:43 -0700
Andrii Nakryiko <andriin@fb.com> wrote:

> bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper
> definitions essential to almost every BPF program. Which makes them
> useful not just for selftests. To be able to expose them as part of
> libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR
> BSD-2-Clause. This patch updates licensing of those two files.

I've already ACKed this, and is fine with (LGPL-2.1 OR BSD-2-Clause).

I just want to understand, why "BSD-2-Clause" and not "Apache-2.0" ?

The original argument was that this needed to be compatible with
"Apache-2.0", then why not simply add this in the "OR" ?

> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Hechao Li <hechaol@fb.com>
> Acked-by: Martin KaFai Lau <kafai@fb.com>
> Acked-by: Andrey Ignatov <rdna@fb.com>
> Acked-by: Yonghong Song <yhs@fb.com>
> Acked-by: Lawrence Brakmo <brakmo@fb.com>
> Acked-by: Adam Barth <arb@fb.com>
> Acked-by: Roman Gushchin <guro@fb.com>
> Acked-by: Josef Bacik <jbacik@fb.com>
> Acked-by: Joe Stringer <joe@wand.net.nz>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Acked-by: David Ahern <dsahern@gmail.com>
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Confirming I acked this.

> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
> Acked-by: Lorenz Bauer <lmb@cloudflare.com>
> Acked-by: Adrian Ratiu <adrian.ratiu@collabora.com>
> Acked-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
> Acked-by: Willem de Bruijn <willemb@google.com>
> Acked-by: Petar Penkov <ppenkov@google.com>
> Acked-by: Teng Qin <palmtenor@gmail.com>
> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Michal Rostecki <mrostecki@opensuse.org>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Sargun Dhillon <sargun@sargun.me>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  tools/testing/selftests/bpf/bpf_endian.h  | 2 +-
>  tools/testing/selftests/bpf/bpf_helpers.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/bpf_endian.h b/tools/testing/selftests/bpf/bpf_endian.h
> index 05f036df8a4c..ff3593b0ae03 100644
> --- a/tools/testing/selftests/bpf/bpf_endian.h
> +++ b/tools/testing/selftests/bpf/bpf_endian.h
> @@ -1,4 +1,4 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
>  #ifndef __BPF_ENDIAN__
>  #define __BPF_ENDIAN__
>  
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 8b503ea142f0..6c4930bc6e2e 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -1,4 +1,4 @@
> -/* SPDX-License-Identifier: GPL-2.0 */
> +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
>  #ifndef __BPF_HELPERS_H
>  #define __BPF_HELPERS_H
>  



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* r8169: Performance regression and latency instability
From: Juliana Rodrigueiro @ 2019-08-16 12:09 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, hkallweit1

Greetings!

During migration from kernel 3.14 to 4.19, we noticed a regression on 
the network performance. Under the exact same circumstances, the 
standard deviation of the latency is more than double than before on the 
Realtek RTL8111/8168B (10ec:8168) using the r8169 driver.

Kernel 3.14:
     # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O 
STDDEV_LATENCY -m 64K -d Send
     313.37

Kernel 4.19:
     # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O 
STDDEV_LATENCY -m 64K -d Send
     632.96

In contrast, we noticed small improvements in performance with other 
non-Realtek network cards (igb, tg3). Which suggested a possible driver 
related bug.

However after bisecting the code, I ended up with the following patch, 
which was introduced in kernel 4.17 and modifies net/ipv4:

     commit 0a6b2a1dc2a2105f178255fe495eb914b09cb37a
     Author: Eric Dumazet <edumazet@google.com>
     Date:   Mon Feb 19 11:56:47 2018 -0800

         tcp: switch to GSO being always on

Could you please help me to clarify, should GSO be always on on my 
device? Or does it just affect TCP? According to ethtool it is always 
off, "ethtool -K eth0 gso on" has no effect, unless I switch SG on.

     # ethtool -k eth0
     Offload parameters for eth0:
     Cannot get device udp large send offload settings: Operation not 
supported
     rx-checksumming: on
     tx-checksumming: off
     scatter-gather: off
     tcp-segmentation-offload: off
     udp-fragmentation-offload: off
     generic-segmentation-offload: off
     generic-receive-offload: on
     large-receive-offload: off

I validated that reverting "tcp: switch to GSO being always on" 
successfully brings back the better performance for the r8169 driver.

I'm sure that reverting that commit is not the optimal solution, so I 
would like to kindly ask for help to shed some light in this issue.

Best regards,
Juliana.

^ permalink raw reply

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
From: Mark Brown @ 2019-08-16 12:18 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli,
	linux-spi, netdev
In-Reply-To: <20190816004449.10100-4-olteanv@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 332 bytes --]

On Fri, Aug 16, 2019 at 03:44:41AM +0300, Vladimir Oltean wrote:

> @@ -842,6 +843,9 @@ struct spi_transfer {
>  
>  	u32		effective_speed_hz;
>  
> +	struct ptp_system_timestamp *ptp_sts;
> +	unsigned int	ptp_sts_word_offset;
> +

You've not documented these fields at all so it's not clear what the
intended usage is.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup
From: Mark Brown @ 2019-08-16 12:21 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli,
	linux-spi, netdev
In-Reply-To: <20190816004449.10100-5-olteanv@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 700 bytes --]

On Fri, Aug 16, 2019 at 03:44:42AM +0300, Vladimir Oltean wrote:
> This patch addresses some cosmetic issues:
> - Alignment
> - Typos
> - (Non-)use of BIT() and GENMASK() macros
> - Unused definitions
> - Unused includes
> - Abuse of ternary operator in detriment of readability
> - Reduce indentation level

This is difficult to review since there's a bunch of largely unrelated
changes all munged into one patch.  It'd be better to split this up so
each change makes one kind of fix, and better to do this separately to
the rest of the series.  In particular having alignment changes along
with other changes hurts reviewability as it's less immediately clear
what's a like for liken substitution.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next, 2/6] PCI: hv: Add a Hyper-V PCI mini driver for software backchannel interface
From: Vitaly Kuznetsov @ 2019-08-16 12:27 UTC (permalink / raw)
  To: Haiyang Zhang, sashal@kernel.org, davem@davemloft.net,
	saeedm@mellanox.com, leon@kernel.org, eranbe@mellanox.com,
	lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	linux-pci@vger.kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org
  Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	linux-kernel@vger.kernel.org
In-Reply-To: <1565809632-39138-3-git-send-email-haiyangz@microsoft.com>

Haiyang Zhang <haiyangz@microsoft.com> writes:

> This mini driver is a helper driver allows other drivers to
> have a common interface with the Hyper-V PCI frontend driver.
>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  MAINTAINERS                              |  1 +
>  drivers/pci/Kconfig                      |  1 +
>  drivers/pci/controller/Kconfig           |  7 ++++
>  drivers/pci/controller/Makefile          |  1 +
>  drivers/pci/controller/pci-hyperv-mini.c | 70 ++++++++++++++++++++++++++++++++
>  drivers/pci/controller/pci-hyperv.c      | 12 ++++--
>  include/linux/hyperv.h                   | 30 ++++++++++----
>  7 files changed, 111 insertions(+), 11 deletions(-)
>  create mode 100644 drivers/pci/controller/pci-hyperv-mini.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e352550..c4962b9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7453,6 +7453,7 @@ F:	drivers/hid/hid-hyperv.c
>  F:	drivers/hv/
>  F:	drivers/input/serio/hyperv-keyboard.c
>  F:	drivers/pci/controller/pci-hyperv.c
> +F:	drivers/pci/controller/pci-hyperv-mini.c
>  F:	drivers/net/hyperv/
>  F:	drivers/scsi/storvsc_drv.c
>  F:	drivers/uio/uio_hv_generic.c
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 2ab9240..bb852f5 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -182,6 +182,7 @@ config PCI_LABEL
>  config PCI_HYPERV
>          tristate "Hyper-V PCI Frontend"
>          depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
> +	select PCI_HYPERV_MINI
>          help
>            The PCI device frontend driver allows the kernel to import arbitrary
>            PCI devices from a PCI backend to support PCI driver domains.
> diff --git a/drivers/pci/controller/Kconfig b/drivers/pci/controller/Kconfig
> index fe9f9f1..8e31cba 100644
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -281,5 +281,12 @@ config VMD
>  	  To compile this driver as a module, choose M here: the
>  	  module will be called vmd.
>  
> +config PCI_HYPERV_MINI
> +	tristate "Hyper-V PCI Mini"
> +	depends on X86 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && X86_64
> +	help
> +	  The Hyper-V PCI Mini is a helper driver allows other drivers to
> +	  have a common interface with the Hyper-V PCI frontend driver.
> +

Out of pure curiosity, why not just export this interface from
PCI_HYPERV directly? Why do we need this stub?

>  source "drivers/pci/controller/dwc/Kconfig"
>  endmenu
> diff --git a/drivers/pci/controller/Makefile b/drivers/pci/controller/Makefile
> index d56a507..77e0132 100644
> --- a/drivers/pci/controller/Makefile
> +++ b/drivers/pci/controller/Makefile
> @@ -4,6 +4,7 @@ obj-$(CONFIG_PCIE_CADENCE_HOST) += pcie-cadence-host.o
>  obj-$(CONFIG_PCIE_CADENCE_EP) += pcie-cadence-ep.o
>  obj-$(CONFIG_PCI_FTPCI100) += pci-ftpci100.o
>  obj-$(CONFIG_PCI_HYPERV) += pci-hyperv.o
> +obj-$(CONFIG_PCI_HYPERV_MINI) += pci-hyperv-mini.o
>  obj-$(CONFIG_PCI_MVEBU) += pci-mvebu.o
>  obj-$(CONFIG_PCI_AARDVARK) += pci-aardvark.o
>  obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o
> diff --git a/drivers/pci/controller/pci-hyperv-mini.c b/drivers/pci/controller/pci-hyperv-mini.c
> new file mode 100644
> index 0000000..9b6cd1c
> --- /dev/null
> +++ b/drivers/pci/controller/pci-hyperv-mini.c
> @@ -0,0 +1,70 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) Microsoft Corporation.
> + *
> + * Author:
> + *   Haiyang Zhang <haiyangz@microsoft.com>
> + *
> + * This mini driver is a helper driver allows other drivers to
> + * have a common interface with the Hyper-V PCI frontend driver.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/hyperv.h>
> +
> +struct hyperv_pci_block_ops hvpci_block_ops;
> +EXPORT_SYMBOL(hvpci_block_ops);
> +
> +int hyperv_read_cfg_blk(struct pci_dev *dev, void *buf, unsigned int buf_len,
> +			unsigned int block_id, unsigned int *bytes_returned)
> +{
> +	if (!hvpci_block_ops.read_block)
> +		return -EOPNOTSUPP;
> +
> +	return hvpci_block_ops.read_block(dev, buf, buf_len, block_id,
> +					  bytes_returned);
> +}
> +EXPORT_SYMBOL(hyperv_read_cfg_blk);
> +
> +int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
> +			 unsigned int block_id)
> +{
> +	if (!hvpci_block_ops.write_block)
> +		return -EOPNOTSUPP;
> +
> +	return hvpci_block_ops.write_block(dev, buf, len, block_id);
> +}
> +EXPORT_SYMBOL(hyperv_write_cfg_blk);
> +
> +int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
> +				void (*block_invalidate)(void *context,
> +							 u64 block_mask))
> +{
> +	if (!hvpci_block_ops.reg_blk_invalidate)
> +		return -EOPNOTSUPP;
> +
> +	return hvpci_block_ops.reg_blk_invalidate(dev, context,
> +						  block_invalidate);
> +}
> +EXPORT_SYMBOL(hyperv_reg_block_invalidate);
> +
> +static void __exit exit_hv_pci_mini(void)
> +{
> +	pr_info("unloaded\n");
> +}
> +
> +static int __init init_hv_pci_mini(void)
> +{
> +	pr_info("loaded\n");
> +
> +	return 0;
> +}
> +
> +module_init(init_hv_pci_mini);
> +module_exit(exit_hv_pci_mini);
> +
> +MODULE_DESCRIPTION("Hyper-V PCI Mini");
> +MODULE_LICENSE("GPL v2");
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 57adeca..9c93ac2 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -983,7 +983,6 @@ int hv_read_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
>  	*bytes_returned = comp_pkt.bytes_returned;
>  	return 0;
>  }
> -EXPORT_SYMBOL(hv_read_config_block);
>  
>  /**
>   * hv_pci_write_config_compl() - Invoked when a response packet for a write
> @@ -1070,7 +1069,6 @@ int hv_write_config_block(struct pci_dev *pdev, void *buf, unsigned int len,
>  
>  	return 0;
>  }
> -EXPORT_SYMBOL(hv_write_config_block);
>  
>  /**
>   * hv_register_block_invalidate() - Invoked when a config block invalidation
> @@ -1101,7 +1099,6 @@ int hv_register_block_invalidate(struct pci_dev *pdev, void *context,
>  	return 0;
>  
>  }
> -EXPORT_SYMBOL(hv_register_block_invalidate);
>  
>  /* Interrupt management hooks */
>  static void hv_int_desc_free(struct hv_pci_dev *hpdev,
> @@ -3045,10 +3042,19 @@ static int hv_pci_remove(struct hv_device *hdev)
>  static void __exit exit_hv_pci_drv(void)
>  {
>  	vmbus_driver_unregister(&hv_pci_drv);
> +
> +	hvpci_block_ops.read_block = NULL;
> +	hvpci_block_ops.write_block = NULL;
> +	hvpci_block_ops.reg_blk_invalidate = NULL;
>  }
>  
>  static int __init init_hv_pci_drv(void)
>  {
> +	/* Initialize PCI block r/w interface */
> +	hvpci_block_ops.read_block = hv_read_config_block;
> +	hvpci_block_ops.write_block = hv_write_config_block;
> +	hvpci_block_ops.reg_blk_invalidate = hv_register_block_invalidate;
> +
>  	return vmbus_driver_register(&hv_pci_drv);
>  }
>  
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 9d37f8c..2afe6fd 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1579,18 +1579,32 @@ struct vmpacket_descriptor *
>  	    pkt = hv_pkt_iter_next(channel, pkt))
>  
>  /*
> - * Functions for passing data between SR-IOV PF and VF drivers.  The VF driver
> + * Interface for passing data between SR-IOV PF and VF drivers. The VF driver
>   * sends requests to read and write blocks. Each block must be 128 bytes or
>   * smaller. Optionally, the VF driver can register a callback function which
>   * will be invoked when the host says that one or more of the first 64 block
>   * IDs is "invalid" which means that the VF driver should reread them.
>   */
>  #define HV_CONFIG_BLOCK_SIZE_MAX 128
> -int hv_read_config_block(struct pci_dev *dev, void *buf, unsigned int buf_len,
> -			 unsigned int block_id, unsigned int *bytes_returned);
> -int hv_write_config_block(struct pci_dev *dev, void *buf, unsigned int len,
> -			  unsigned int block_id);
> -int hv_register_block_invalidate(struct pci_dev *dev, void *context,
> -				 void (*block_invalidate)(void *context,
> -							  u64 block_mask));
> +
> +int hyperv_read_cfg_blk(struct pci_dev *dev, void *buf, unsigned int buf_len,
> +			unsigned int block_id, unsigned int *bytes_returned);
> +int hyperv_write_cfg_blk(struct pci_dev *dev, void *buf, unsigned int len,
> +			 unsigned int block_id);
> +int hyperv_reg_block_invalidate(struct pci_dev *dev, void *context,
> +				void (*block_invalidate)(void *context,
> +							 u64 block_mask));
> +
> +struct hyperv_pci_block_ops {
> +	int (*read_block)(struct pci_dev *dev, void *buf, unsigned int buf_len,
> +			  unsigned int block_id, unsigned int *bytes_returned);
> +	int (*write_block)(struct pci_dev *dev, void *buf, unsigned int len,
> +			   unsigned int block_id);
> +	int (*reg_blk_invalidate)(struct pci_dev *dev, void *context,
> +				  void (*block_invalidate)(void *context,
> +							   u64 block_mask));
> +};
> +
> +extern struct hyperv_pci_block_ops hvpci_block_ops;
> +
>  #endif /* _HYPERV_H */

-- 
Vitaly

^ permalink raw reply

* Re: WARNING in tracepoint_probe_register_prio (3)
From: syzbot @ 2019-08-16 12:32 UTC (permalink / raw)
  To: antoine.tenart, ard.biesheuvel, baruch, bigeasy, davem, gregkh,
	gustavo, jeyu, linux-kernel, mathieu.desnoyers, maxime.chevallier,
	mingo, netdev, paulmck, paulmck, rmk+kernel, rostedt,
	syzkaller-bugs, tglx
In-Reply-To: <000000000000ab6f84056c786b93@google.com>

syzbot has bisected this bug to:

commit ecb9f80db23a7ab09b46b298b404e41dd7aff6e6
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Aug 13 08:00:25 2019 +0000

     net/mvpp2: Replace tasklet with softirq hrtimer

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13ffb9ee600000
start commit:   ecb9f80d net/mvpp2: Replace tasklet with softirq hrtimer
git tree:       net-next
final crash:    https://syzkaller.appspot.com/x/report.txt?x=100079ee600000
console output: https://syzkaller.appspot.com/x/log.txt?x=17ffb9ee600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d4cf1ffb87d590d7
dashboard link: https://syzkaller.appspot.com/bug?extid=774fddf07b7ab29a1e55
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11b02a22600000

Reported-by: syzbot+774fddf07b7ab29a1e55@syzkaller.appspotmail.com
Fixes: ecb9f80db23a ("net/mvpp2: Replace tasklet with softirq hrtimer")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply

* Re: r8169: Performance regression and latency instability
From: Eric Dumazet @ 2019-08-16 12:35 UTC (permalink / raw)
  To: Juliana Rodrigueiro, netdev; +Cc: edumazet, hkallweit1
In-Reply-To: <72898d5b-9424-0bcd-3d8a-fc2e2dd0dbf1@intra2net.com>



On 8/16/19 2:09 PM, Juliana Rodrigueiro wrote:
> Greetings!
> 
> During migration from kernel 3.14 to 4.19, we noticed a regression on the network performance. Under the exact same circumstances, the standard deviation of the latency is more than double than before on the Realtek RTL8111/8168B (10ec:8168) using the r8169 driver.
> 
> Kernel 3.14:
>     # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O STDDEV_LATENCY -m 64K -d Send
>     313.37
> 
> Kernel 4.19:
>     # netperf -v 2 -P 0 -H <netserver-IP>,4 -I 99,5 -t omni -l 1 -- -O STDDEV_LATENCY -m 64K -d Send
>     632.96
> 
> In contrast, we noticed small improvements in performance with other non-Realtek network cards (igb, tg3). Which suggested a possible driver related bug.
> 
> However after bisecting the code, I ended up with the following patch, which was introduced in kernel 4.17 and modifies net/ipv4:
> 
>     commit 0a6b2a1dc2a2105f178255fe495eb914b09cb37a
>     Author: Eric Dumazet <edumazet@google.com>
>     Date:   Mon Feb 19 11:56:47 2018 -0800
> 
>         tcp: switch to GSO being always on
> 
> Could you please help me to clarify, should GSO be always on on my device? Or does it just affect TCP? According to ethtool it is always off, "ethtool -K eth0 gso on" has no effect, unless I switch SG on.
> 
>     # ethtool -k eth0
>     Offload parameters for eth0:
>     Cannot get device udp large send offload settings: Operation not supported
>     rx-checksumming: on
>     tx-checksumming: off
>     scatter-gather: off
>     tcp-segmentation-offload: off
>     udp-fragmentation-offload: off
>     generic-segmentation-offload: off
>     generic-receive-offload: on
>     large-receive-offload: off
> 
> I validated that reverting "tcp: switch to GSO being always on" successfully brings back the better performance for the r8169 driver.
> 
> I'm sure that reverting that commit is not the optimal solution, so I would like to kindly ask for help to shed some light in this issue.

Hi Juliana

I am sure that all commits done in TCP stack can show a regression on a particular
combination of packet sizes, MTU size, NIC, and measured metric.

Basically if your NIC does not support SG and TSO, there is a possibility
that the changes we did to enter the era of 100Gbit and 200Gbit NIC might
hurt a bit.

Lack of SG means that the lower stack might have to perform memory  allocations
to perform the segmentation and this might be slow (or even fail) under memory pressure.

I have no idea why you can even turn on SG, if it is turned off by default.

Please give us more information on the NIC

ethtool -i eth0 ; ifconfig eth0

Possibly try to use a recent ethtool, it seems yours is pretty old.

I also see this relevant commit : I have no idea why SG would have any relation with TSO.

commit a7eb6a4f2560d5ae64bfac98d79d11378ca2de6c
Author: Holger Hoffstätte <holger@applied-asynchrony.com>
Date:   Fri Aug 9 00:02:40 2019 +0200

    r8169: fix performance issue on RTL8168evl
    
    Disabling TSO but leaving SG active results is a significant
    performance drop. Therefore disable also SG on RTL8168evl.
    This restores the original performance.
    
    Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO")
    Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com>
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index b2a275d8504c..912bd41eaa1b 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -6898,9 +6898,9 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
        /* RTL8168e-vl has a HW issue with TSO */
        if (tp->mac_version == RTL_GIGA_MAC_VER_34) {
-               dev->vlan_features &= ~NETIF_F_ALL_TSO;
-               dev->hw_features &= ~NETIF_F_ALL_TSO;
-               dev->features &= ~NETIF_F_ALL_TSO;
+               dev->vlan_features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
+               dev->hw_features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
+               dev->features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG);
        }
 
        dev->hw_features |= NETIF_F_RXALL;


^ permalink raw reply related

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
From: Vladimir Oltean @ 2019-08-16 12:35 UTC (permalink / raw)
  To: Mark Brown
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev
In-Reply-To: <20190816121837.GD4039@sirena.co.uk>

Hi Mark,

On Fri, 16 Aug 2019 at 15:18, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Aug 16, 2019 at 03:44:41AM +0300, Vladimir Oltean wrote:
>
> > @@ -842,6 +843,9 @@ struct spi_transfer {
> >
> >       u32             effective_speed_hz;
> >
> > +     struct ptp_system_timestamp *ptp_sts;
> > +     unsigned int    ptp_sts_word_offset;
> > +
>
> You've not documented these fields at all so it's not clear what the
> intended usage is.

Thanks for looking into this.
Indeed I didn't document them as the patch is part of a RFC and I
thought the purpose was more clear from the context (cover letter
etc).
If I do ever send a patchset for submission I will document the newly
introduced fields properly.
So let me clarify:
The SPI slave device driver is populating these fields to indicate to
the controller driver that it wants word number @ptp_sts_word_offset
from the tx buffer snapshotted. The controller driver is supposed to
put the snapshot into the @ptp_sts field, which is a pointer to a
memory location under the control of the SPI slave device driver.
It is ok if the ptp_sts pointer is NULL (no need to check), because
the API for taking snapshots already checks for that.
At the moment there is yet no proposed mechanism for the SPI slave
driver to ensure that the controller will really act upon this
request. That would be really nice to have, since some SPI slave
devices are time-sensitive and warning early is a good way to prevent
unnecessary troubleshooting.

Regards,
-Vladimir

^ permalink raw reply

* Re: [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup
From: Vladimir Oltean @ 2019-08-16 12:37 UTC (permalink / raw)
  To: Mark Brown
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev
In-Reply-To: <20190816122103.GE4039@sirena.co.uk>

Hi Mark,

On Fri, 16 Aug 2019 at 15:21, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Aug 16, 2019 at 03:44:42AM +0300, Vladimir Oltean wrote:
> > This patch addresses some cosmetic issues:
> > - Alignment
> > - Typos
> > - (Non-)use of BIT() and GENMASK() macros
> > - Unused definitions
> > - Unused includes
> > - Abuse of ternary operator in detriment of readability
> > - Reduce indentation level
>
> This is difficult to review since there's a bunch of largely unrelated
> changes all munged into one patch.  It'd be better to split this up so
> each change makes one kind of fix, and better to do this separately to
> the rest of the series.  In particular having alignment changes along
> with other changes hurts reviewability as it's less immediately clear
> what's a like for liken substitution.

Yes, the diff of this patch looks relatively bad. But I don't know if
splitting it in more patches isn't in fact going to pollute the git
history, so I can just as well drop it.

Regards,
-Vladimir

^ permalink raw reply

* Re: linux-next: manual merge of the net-next tree with the kbuild tree
From: Stephen Rothwell @ 2019-08-16 12:39 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David Miller, Networking, Masahiro Yamada,
	Linux Next Mailing List, Linux Kernel Mailing List, Kees Cook,
	Andrii Nakryiko, Daniel Borkmann
In-Reply-To: <20190816160128.36e5cc4e@canb.auug.org.au>

[-- Attachment #1: Type: text/plain, Size: 638 bytes --]

Hi all,

On Fri, 16 Aug 2019 16:01:28 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Thu, 15 Aug 2019 22:21:29 -0700 Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > Thanks, Stephen! Looks good except one minor issue below.  
> 
> Thanks for checking.
> 
> > >   vmlinux_link()
> > >   {
> > >  +      info LD ${2}    
> > 
> > This needs to be ${1}.  
> 
> At least its only an information message and doesn't affect the build.
> I will fix my resolution for Monday.

I also fixed it up in today's linux-next (just so people aren't
suprised and report it :-)).
-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: WARNING in tracepoint_probe_register_prio (3)
From: Sebastian Andrzej Siewior @ 2019-08-16 12:41 UTC (permalink / raw)
  To: syzbot
  Cc: antoine.tenart, ard.biesheuvel, baruch, davem, gregkh, gustavo,
	jeyu, linux-kernel, mathieu.desnoyers, maxime.chevallier, mingo,
	netdev, paulmck, paulmck, rmk+kernel, rostedt, syzkaller-bugs,
	tglx
In-Reply-To: <000000000000479a1705903b2dc9@google.com>

On 2019-08-16 05:32:00 [-0700], syzbot wrote:
> syzbot has bisected this bug to:
> 
> commit ecb9f80db23a7ab09b46b298b404e41dd7aff6e6
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Tue Aug 13 08:00:25 2019 +0000
> 
>     net/mvpp2: Replace tasklet with softirq hrtimer
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13ffb9ee600000

that bisect is wrong. That warning triggered once and this commit was
the top most one in net-next at the time…

Sebastian

^ permalink raw reply

* Re: linux-next: Signed-off-by missing for commits in the net-next tree
From: Chris Mason @ 2019-08-16 12:45 UTC (permalink / raw)
  To: Andy Grover
  Cc: Gerd Rausch, Stephen Rothwell, David Miller, Networking,
	Linux Next Mailing List, Linux Kernel Mailing List, Andy Grover,
	Chris Mason
In-Reply-To: <e85146f3-93a0-b23f-6a6e-11e42815946d@groveronline.com>

On 16 Aug 2019, at 5:15, Andy Grover wrote:

> On 8/16/19 3:06 PM, Gerd Rausch wrote:
>> Hi,
>>
>> Just added the e-mail addresses I found using a simple "google 
>> search",
>> in order to reach out to the original authors of these commits:
>> Chris Mason and Andy Grover.
>>
>> I'm hoping they still remember their work from 7-8 years ago.
>
> Yes looks like what I was working on. What did you need from me? It's
> too late to amend the commitlogs...

Same question ;)  The missing signed-off-by is a mistake, but from the 
point of view of the DCO, these patches are totally fine by me.

-chris

^ permalink raw reply

* Unable to create htb tc classes more than 64K
From: Akshat Kakkar @ 2019-08-16 12:48 UTC (permalink / raw)
  To: netfilter-devel, lartc, netdev

I want to have around 1 Million htb tc classes.
The simple structure of htb tc class, allow having only 64K classes at once.
But, it is possible to make it more hierarchical using hierarchy of
qdisc and classes.
For this I tried something like this

tc qdisc add dev eno2 root handle 100: htb
tc class add dev eno2 parent 100: classid 100:1 htb rate 100Mbps
tc class add dev eno2 parent 100: classid 100:2 htb rate 100Mbps

tc qdisc add dev eno2 parent 100:1 handle 1: htb
tc class add dev eno2 parent 1: classid 1:10 htb rate 100kbps
tc class add dev eno2 parent 1: classid 1:20 htb rate 300kbps

tc qdisc add dev eno2 parent 100:2 handle 2: htb
tc class add dev eno2 parent 2: classid 2:10 htb rate 100kbps
tc class add dev eno2 parent 2: classid 2:20 htb rate 300kbps

What I want is something like:
tc filter add dev eno2 parent 100: protocol ip prio 1 handle
0x00000001 fw flowid 1:10
tc filter add dev eno2 parent 100: protocol ip prio 1 handle
0x00000002 fw flowid 1:20
tc filter add dev eno2 parent 100: protocol ip prio 1 handle
0x00000003 fw flowid 2:10
tc filter add dev eno2 parent 100: protocol ip prio 1 handle
0x00000004 fw flowid 2:20

But I am unable to shape my traffic by any of 1:10, 1:20, 2:10 or 2:20.

Can you please suggest, where is it going wrong?
Is it not possible altogether?

-Akshat

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox