Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next v1] net/tls: Set count of SG entries if sk_alloc_sg returns -ENOSPC
From: Sabrina Dubroca @ 2018-09-06 11:22 UTC (permalink / raw)
  To: Vakul Garg; +Cc: netdev, borisp, aviadye, davejwatson, davem, doronrk
In-Reply-To: <20180905162743.10145-1-vakul.garg@nxp.com>

2018-09-05, 21:57:43 +0530, Vakul Garg wrote:
> tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
> function sk_alloc_sg(). In case the number of SG entries hit
> MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
> current SG index to '0'. This leads to calling of function
> tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
> kernel crash. To fix this, set the number of SG elements to the number
> of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
> returns -ENOSPC.
> 
> Signed-off-by: Vakul Garg <vakul.garg@nxp.com>

What commit introduced this bug? Can you add Fixes: tag? And if it's a
bug present in the net tree, the fix should go into the net tree as
well.


Another thing I've noticed a few times with your patch submissions:
the clock on the system you're using for git-send-email seems to be
set something like 5 hours and 18 minutes in the future. Could you fix
it?  It makes your email appear out of order.

Date: Wed,  5 Sep 2018 21:57:43 +0530

Received: from lti.ap.freescale.net (14.143.30.134) by
        AM0PR04MB4243.eurprd04.prod.outlook.com (2603:10a6:208:66::29) with Microsoft
        SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
        15.20.1101.17; Wed, 5 Sep 2018 11:09:20 +0000

-- 
Sabrina

^ permalink raw reply

* Re: [PATCH] [RFC v2] Drop all 00-INDEX files from Documentation/
From: Jonathan Corbet @ 2018-09-06 15:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mark Rutland, linux-mips, linux-fbdev, x86, kvm, linux-doc,
	Peter Zijlstra, James Hogan, Henrik Austad, Will Deacon,
	dri-devel, Masahiro Yamada, Jan Kandziora, Paul Mackerras,
	Henrik Austad, Pavel Machek, H. Peter Anvin, Evgeniy Polyakov,
	linux-s390, Ian Kent, linux-security-module, Paul Moore,
	Michael Ellerman, Helge Deller, Radim Krčmář,
	James E.J. Bottomley
In-Reply-To: <20180904095908.13298b3d@gandalf.local.home>

On Tue, 4 Sep 2018 09:59:08 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 4 Sep 2018 13:30:30 +0200
> Pavel Machek <pavel@ucw.cz> wrote:
> 
> > I'd say this is still quite valueable, and it might be worth fixing,
> > rather then removing completely.  
> 
> I agree. Perhaps we should have a 00-DESCRIPTION file in each
> directory, and each file could start with a:
> 
>  DESCRIPTION: <one line description here>
> 
> and then these files could be generated by those that have these tags.

I really don't want to hack up yet another documentation syntax and
processing scheme.  We already have one that does all of this and more.
That energy would be far better spent bringing the docs into the RST
hierarchy, IMO.

Thanks,

jon  (who is increasingly inclined to apply this patch)
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* Re: [PATCH] wil6210: fix unsigned cid comparison with >= 0
From: Kalle Valo @ 2018-09-06 16:00 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: Maya Erez, David S. Miller, linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	wil6210-Rm6X0d1/PG5y9aJCnZT0Uw, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Gustavo A. R. Silva
In-Reply-To: <20180829175018.GA3776-L1vi/lXTdts+Va1GwOuvDg@public.gmane.org>

"Gustavo A. R. Silva" <gustavo-L1vi/lXTdts+Va1GwOuvDg@public.gmane.org> wrote:

> The comparison of cid >= 0 is always true because cid is of type u8
> (8 bits, unsigned).
> 
> Fix this by removing such comparison and updating the type of
> variable cid to u8 in the caller function.
> 
> Addresses-Coverity-ID: 1473079 ("Unsigned compared against 0")
> Fixes: b9010f105f21 ("wil6210: add FT roam support for AP and station")
> Signed-off-by: Gustavo A. R. Silva <gustavo-L1vi/lXTdts+Va1GwOuvDg@public.gmane.org>
> Signed-off-by: Kalle Valo <kvalo-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>

Patch applied to ath-next branch of ath.git, thanks.

49925f247016 wil6210: fix unsigned cid comparison with >= 0

-- 
https://patchwork.kernel.org/patch/10580739/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH] [RFC v2] Drop all 00-INDEX files from Documentation/
From: Steven Rostedt @ 2018-09-06 16:01 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Mark Rutland, linux-mips, linux-fbdev, x86, kvm, linux-doc,
	Peter Zijlstra, James Hogan, Henrik Austad, Will Deacon,
	dri-devel, Masahiro Yamada, Jan Kandziora, Paul Mackerras,
	Henrik Austad, Pavel Machek, H. Peter Anvin, Evgeniy Polyakov,
	linux-s390, Ian Kent, linux-security-module, Paul Moore,
	Michael Ellerman, Helge Deller, Radim Krčmář,
	James E.J. Bottomley
In-Reply-To: <20180906095804.5ab2716f@lwn.net>

On Thu, 6 Sep 2018 09:58:04 -0600
Jonathan Corbet <corbet@lwn.net> wrote:

> Thanks,
> 
> jon  (who is increasingly inclined to apply this patch)

As Colin Kaepernick now says... "Just do it!"

;-)

-- Steve
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply

* Re: [PATCH 1/1] ath10k: avoid possible memory access violation
From: Kalle Valo @ 2018-09-06 16:04 UTC (permalink / raw)
  To: K.T.VIJAYAKUMAAR
  Cc: davem, ath10k, linux-wireless, netdev, linux-kernel, cpgs,
	vijay.bvb
In-Reply-To: <1533292805-9709-1-git-send-email-vijay.bvb@samsung.com>

"K.T.VIJAYAKUMAAR" <vijay.bvb@samsung.com> wrote:

> array "ctl_power_table" access index "pream" is initialized with -1 and
> is raised as a static analysis tool issue.
> [drivers\net\wireless\ath\ath10k\wmi.c:4719] ->
> [drivers\net\wireless\ath\ath10k\wmi.c:4730]: (error) Array index -1 is
> out of bounds.
> 
> Since the "pream" index for accessing ctl_power_table array is initialized
> with -1, there is a chance of memory access violation for the cases below.
> 1) wmi_pdev_tpc_final_table_event change frequency is between 2483 and 5180
> 2) pream_idx is out of the enumeration ranges of wmi_tpc_pream_2ghz,
> wmi_tpc_pream_5ghz
> 
> Signed-off-by: K.T.VIJAYAKUMAAR <vijay.bvb@samsung.com>
> [kvalo@codeaurora.org: clean up the warning message]
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

97c69a70dc2c ath10k: avoid possible memory access violation

-- 
https://patchwork.kernel.org/patch/10554929/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [PATCH 1/1] ath10k: avoid possible memory access violation
From: Kalle Valo @ 2018-09-06 16:04 UTC (permalink / raw)
  To: K.T.VIJAYAKUMAAR
  Cc: netdev, linux-wireless, linux-kernel, ath10k, cpgs, vijay.bvb,
	davem
In-Reply-To: <1533292805-9709-1-git-send-email-vijay.bvb@samsung.com>

"K.T.VIJAYAKUMAAR" <vijay.bvb@samsung.com> wrote:

> array "ctl_power_table" access index "pream" is initialized with -1 and
> is raised as a static analysis tool issue.
> [drivers\net\wireless\ath\ath10k\wmi.c:4719] ->
> [drivers\net\wireless\ath\ath10k\wmi.c:4730]: (error) Array index -1 is
> out of bounds.
> 
> Since the "pream" index for accessing ctl_power_table array is initialized
> with -1, there is a chance of memory access violation for the cases below.
> 1) wmi_pdev_tpc_final_table_event change frequency is between 2483 and 5180
> 2) pream_idx is out of the enumeration ranges of wmi_tpc_pream_2ghz,
> wmi_tpc_pream_5ghz
> 
> Signed-off-by: K.T.VIJAYAKUMAAR <vijay.bvb@samsung.com>
> [kvalo@codeaurora.org: clean up the warning message]
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

97c69a70dc2c ath10k: avoid possible memory access violation

-- 
https://patchwork.kernel.org/patch/10554929/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* [PATCH] r8169: inform about CLKRUN protocol issue when behind a CardBus bridge
From: Maciej S. Szmigiero @ 2018-09-06 16:10 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David S. Miller; +Cc: netdev, linux-kernel

It turns out that at least some r8169 CardBus cards don't operate correctly
when CLKRUN protocol is enabled - the symptoms are recurring timeouts
during PHY reads / writes and a very high packet drop rate.
This is true of at least RTL8169sc/8110sc (XID 18000000) chip in
Sunrich C-160 CardBus NIC.

Such behavior was observed on two separate laptops, the first one has
TI PCIxx12 CardBus bridge, while the second one has Ricoh RL5c476II.

Setting CLKRUN_En bit in CONFIG 3 register via an EEPROM write didn't
improve things in either case (this is probably why it wasn't set by the
card manufacturer).
The only way to fix the issue was to disable the CLKRUN protocol either
in the CardBus bridge (only possible in the TI one) or in the southbridge.

Since the problem takes some time to debug let's warn people that have
the suspect configuration (Conventional PCI r8169 NIC behind a CardBus
bridge) so they know what they can do if they encounter it.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
---
 drivers/net/ethernet/realtek/r8169.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index b08d51bf7a20..b935a18358cb 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -7254,6 +7254,22 @@ static int rtl_jumbo_max(struct rtl8169_private *tp)
 	}
 }
 
+static void rtl_pci_cardbus_check(struct pci_dev *pdev)
+{
+	struct pci_dev *parent = pdev;
+
+	while ((parent = pci_upstream_bridge(parent)) != NULL) {
+		if (parent->hdr_type != PCI_HEADER_TYPE_CARDBUS)
+			continue;
+
+		dev_info(&pdev->dev,
+			 "device is behind a CardBus bridge\n");
+		dev_info(&pdev->dev,
+			 "in case of erratic or no operation try disabling CLKRUN protocol in the CardBus bridge or in the southbridge\n");
+		break;
+	}
+}
+
 static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	const struct rtl_cfg_info *cfg = rtl_cfg_infos + ent->driver_data;
@@ -7305,8 +7321,10 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	tp->mmio_addr = pcim_iomap_table(pdev)[region];
 
-	if (!pci_is_pcie(pdev))
+	if (!pci_is_pcie(pdev)) {
 		dev_info(&pdev->dev, "not PCI Express\n");
+		rtl_pci_cardbus_check(pdev);
+	}
 
 	/* Identify chip attached to board */
 	rtl8169_get_mac_version(tp, cfg->default_ver);
-- 
2.17.0

^ permalink raw reply related

* KASAN: use-after-free Read in _decode_session6
From: syzbot @ 2018-09-06 16:41 UTC (permalink / raw)
  To: davem, herbert, kuznet, linux-kernel, netdev, steffen.klassert,
	syzkaller-bugs, yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    b36fdc6853a3 Merge tag 'gpio-v4.19-2' of git://git.kernel...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1556a396400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4c7e83258d6e0156
dashboard link: https://syzkaller.appspot.com/bug?extid=e8c1d30881266e47eb33
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14d42021400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13d09f1e400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e8c1d30881266e47eb33@syzkaller.appspotmail.com

audit: type=1400 audit(1536202560.246:9): avc:  denied  { prog_run } for   
pid=4752 comm="syz-executor726"  
scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023  
tcontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023 tclass=bpf  
permissive=1
==================================================================
BUG: KASAN: use-after-free in _decode_session6+0x1331/0x14e0  
net/ipv6/xfrm6_policy.c:161
Read of size 1 at addr ffff8801bb87e97f by task syz-executor726/4752

CPU: 0 PID: 4752 Comm: syz-executor726 Not tainted 4.19.0-rc2+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
  __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
  _decode_session6+0x1331/0x14e0 net/ipv6/xfrm6_policy.c:161
  __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2299
  xfrm_decode_session include/net/xfrm.h:1232 [inline]
  vti6_tnl_xmit+0x3fc/0x1bb1 net/ipv6/ip6_vti.c:542
  __netdev_start_xmit include/linux/netdevice.h:4287 [inline]
  netdev_start_xmit include/linux/netdevice.h:4296 [inline]
  xmit_one net/core/dev.c:3216 [inline]
  dev_hard_start_xmit+0x272/0xc10 net/core/dev.c:3232
  __dev_queue_xmit+0x2ab2/0x3870 net/core/dev.c:3802
  dev_queue_xmit+0x17/0x20 net/core/dev.c:3835
  __bpf_tx_skb net/core/filter.c:2012 [inline]
  __bpf_redirect_common net/core/filter.c:2050 [inline]
  __bpf_redirect+0x5b7/0xae0 net/core/filter.c:2057
  ____bpf_clone_redirect net/core/filter.c:2090 [inline]
  bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2062
  bpf_prog_c39d1ba309a769f7+0xd1e/0x1000

Allocated by task 4752:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  __do_kmalloc_node mm/slab.c:3682 [inline]
  __kmalloc_node_track_caller+0x47/0x70 mm/slab.c:3696
  __kmalloc_reserve.isra.41+0x3a/0xe0 net/core/skbuff.c:137
  pskb_expand_head+0x230/0x10e0 net/core/skbuff.c:1463
  skb_ensure_writable+0x3dd/0x640 net/core/skbuff.c:5129
  __bpf_try_make_writable net/core/filter.c:1633 [inline]
  bpf_try_make_writable net/core/filter.c:1639 [inline]
  bpf_try_make_head_writable net/core/filter.c:1647 [inline]
  ____bpf_clone_redirect net/core/filter.c:2084 [inline]
  bpf_clone_redirect+0x14a/0x490 net/core/filter.c:2062
  bpf_prog_c39d1ba309a769f7+0xd1e/0x1000

Freed by task 4752:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xd9/0x210 mm/slab.c:3813
  skb_free_head+0x99/0xc0 net/core/skbuff.c:550
  skb_release_data+0x6a4/0x880 net/core/skbuff.c:570
  skb_release_all+0x4a/0x60 net/core/skbuff.c:627
  __kfree_skb net/core/skbuff.c:641 [inline]
  kfree_skb+0x19d/0x4e0 net/core/skbuff.c:659
  vti6_tnl_xmit+0x387/0x1bb1 net/ipv6/ip6_vti.c:561
  __netdev_start_xmit include/linux/netdevice.h:4287 [inline]
  netdev_start_xmit include/linux/netdevice.h:4296 [inline]
  xmit_one net/core/dev.c:3216 [inline]
  dev_hard_start_xmit+0x272/0xc10 net/core/dev.c:3232
  __dev_queue_xmit+0x2ab2/0x3870 net/core/dev.c:3802
  dev_queue_xmit+0x17/0x20 net/core/dev.c:3835
  __bpf_tx_skb net/core/filter.c:2012 [inline]
  __bpf_redirect_common net/core/filter.c:2050 [inline]
  __bpf_redirect+0x5b7/0xae0 net/core/filter.c:2057
  ____bpf_clone_redirect net/core/filter.c:2090 [inline]
  bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2062
  bpf_prog_c39d1ba309a769f7+0xd1e/0x1000

The buggy address belongs to the object at ffff8801bb87e780
  which belongs to the cache kmalloc-512 of size 512
The buggy address is located 511 bytes inside of
  512-byte region [ffff8801bb87e780, ffff8801bb87e980)
The buggy address belongs to the page:
page:ffffea0006ee1f80 count:1 mapcount:0 mapping:ffff8801dac00940 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0006eefd48 ffffea00073f5f08 ffff8801dac00940
raw: 0000000000000000 ffff8801bb87e000 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801bb87e800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801bb87e880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801bb87e900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                 ^
  ffff8801bb87e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801bb87ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH net-next 11/11] vhost_net: batch submitting XDP buffers to underlayer sockets
From: Michael S. Tsirkin @ 2018-09-06 16:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180906040526.22518-12-jasowang@redhat.com>

On Thu, Sep 06, 2018 at 12:05:26PM +0800, Jason Wang wrote:
> This patch implements XDP batching for vhost_net. The idea is first to
> try to do userspace copy and build XDP buff directly in vhost. Instead
> of submitting the packet immediately, vhost_net will batch them in an
> array and submit every 64 (VHOST_NET_BATCH) packets to the under layer
> sockets through msg_control of sendmsg().
> 
> When XDP is enabled on the TUN/TAP, TUN/TAP can process XDP inside a
> loop without caring GUP thus it can do batch map flushing. When XDP is
> not enabled or not supported, the underlayer socket need to build skb
> and pass it to network core. The batched packet submission allows us
> to do batching like netif_receive_skb_list() in the future.
> 
> This saves lots of indirect calls for better cache utilization. For
> the case that we can't so batching e.g when sndbuf is limited or
> packet size is too large, we will go for usual one packet per
> sendmsg() way.
> 
> Doing testpmd on various setups gives us:
> 
> Test                /+pps%
> XDP_DROP on TAP     /+44.8%
> XDP_REDIRECT on TAP /+29%
> macvtap (skb)       /+26%
> 
> Netperf tests shows obvious improvements for small packet transmission:
> 
> size/session/+thu%/+normalize%
>    64/     1/   +2%/    0%
>    64/     2/   +3%/   +1%
>    64/     4/   +7%/   +5%
>    64/     8/   +8%/   +6%
>   256/     1/   +3%/    0%
>   256/     2/  +10%/   +7%
>   256/     4/  +26%/  +22%
>   256/     8/  +27%/  +23%
>   512/     1/   +3%/   +2%
>   512/     2/  +19%/  +14%
>   512/     4/  +43%/  +40%
>   512/     8/  +45%/  +41%
>  1024/     1/   +4%/    0%
>  1024/     2/  +27%/  +21%
>  1024/     4/  +38%/  +73%
>  1024/     8/  +15%/  +24%
>  2048/     1/  +10%/   +7%
>  2048/     2/  +16%/  +12%
>  2048/     4/    0%/   +2%
>  2048/     8/    0%/   +2%
>  4096/     1/  +36%/  +60%
>  4096/     2/  -11%/  -26%
>  4096/     4/    0%/  +14%
>  4096/     8/    0%/   +4%
> 16384/     1/   -1%/   +5%
> 16384/     2/    0%/   +2%
> 16384/     4/    0%/   -3%
> 16384/     8/    0%/   +4%
> 65535/     1/    0%/  +10%
> 65535/     2/    0%/   +8%
> 65535/     4/    0%/   +1%
> 65535/     8/    0%/   +3%
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c | 164 ++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 151 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index fb01ce6d981c..1dd4239cbff8 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -116,6 +116,7 @@ struct vhost_net_virtqueue {
>  	 * For RX, number of batched heads
>  	 */
>  	int done_idx;
> +	int batched_xdp;

Pls add a comment documenting what does this new field do.

>  	/* an array of userspace buffers info */
>  	struct ubuf_info *ubuf_info;
>  	/* Reference counting for outstanding ubufs.
> @@ -123,6 +124,7 @@ struct vhost_net_virtqueue {
>  	struct vhost_net_ubuf_ref *ubufs;
>  	struct ptr_ring *rx_ring;
>  	struct vhost_net_buf rxq;
> +	struct xdp_buff xdp[VHOST_NET_BATCH];

This is a bit too much to have inline. 64 entries 48 bytes each, and we
have 2 of these structures, that's already >4K.

>  };
>  
>  struct vhost_net {
> @@ -338,6 +340,11 @@ static bool vhost_sock_zcopy(struct socket *sock)
>  		sock_flag(sock->sk, SOCK_ZEROCOPY);
>  }
>  
> +static bool vhost_sock_xdp(struct socket *sock)
> +{
> +	return sock_flag(sock->sk, SOCK_XDP);

what if an xdp program is attached while this processing
is going on? Flag value will be wrong - is this safe
and why? Pls add a comment.

> +}
> +
>  /* In case of DMA done not in order in lower device driver for some reason.
>   * upend_idx is used to track end of used idx, done_idx is used to track head
>   * of used idx. Once lower device DMA done contiguously, we will signal KVM
> @@ -444,10 +451,36 @@ static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
>  	nvq->done_idx = 0;
>  }
>  
> +static void vhost_tx_batch(struct vhost_net *net,
> +			   struct vhost_net_virtqueue *nvq,
> +			   struct socket *sock,
> +			   struct msghdr *msghdr)
> +{
> +	struct tun_msg_ctl ctl = {
> +		.type = nvq->batched_xdp << 16 | TUN_MSG_PTR,
> +		.ptr = nvq->xdp,
> +	};
> +	int err;
> +
> +	if (nvq->batched_xdp == 0)
> +		goto signal_used;
> +
> +	msghdr->msg_control = &ctl;
> +	err = sock->ops->sendmsg(sock, msghdr, 0);
> +	if (unlikely(err < 0)) {
> +		vq_err(&nvq->vq, "Fail to batch sending packets\n");
> +		return;
> +	}
> +
> +signal_used:
> +	vhost_net_signal_used(nvq);
> +	nvq->batched_xdp = 0;
> +}
> +
>  static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
>  				    struct vhost_net_virtqueue *nvq,
>  				    unsigned int *out_num, unsigned int *in_num,
> -				    bool *busyloop_intr)
> +				    struct msghdr *msghdr, bool *busyloop_intr)
>  {
>  	struct vhost_virtqueue *vq = &nvq->vq;
>  	unsigned long uninitialized_var(endtime);
> @@ -455,8 +488,9 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
>  				  out_num, in_num, NULL, NULL);
>  
>  	if (r == vq->num && vq->busyloop_timeout) {
> +		/* Flush batched packets first */
>  		if (!vhost_sock_zcopy(vq->private_data))
> -			vhost_net_signal_used(nvq);
> +			vhost_tx_batch(net, nvq, vq->private_data, msghdr);
>  		preempt_disable();
>  		endtime = busy_clock() + vq->busyloop_timeout;
>  		while (vhost_can_busy_poll(endtime)) {
> @@ -512,7 +546,7 @@ static int get_tx_bufs(struct vhost_net *net,
>  	struct vhost_virtqueue *vq = &nvq->vq;
>  	int ret;
>  
> -	ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, busyloop_intr);
> +	ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, msg, busyloop_intr);
>  
>  	if (ret < 0 || ret == vq->num)
>  		return ret;
> @@ -540,6 +574,83 @@ static bool tx_can_batch(struct vhost_virtqueue *vq, size_t total_len)
>  	       !vhost_vq_avail_empty(vq->dev, vq);
>  }
>  
> +#define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)

I wonder whether NET_IP_ALIGN make sense for XDP.

> +
> +static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
> +			       struct iov_iter *from)
> +{
> +	struct vhost_virtqueue *vq = &nvq->vq;
> +	struct socket *sock = vq->private_data;
> +	struct page_frag *alloc_frag = &current->task_frag;
> +	struct virtio_net_hdr *gso;
> +	struct xdp_buff *xdp = &nvq->xdp[nvq->batched_xdp];
> +	size_t len = iov_iter_count(from);
> +	int headroom = vhost_sock_xdp(sock) ? XDP_PACKET_HEADROOM : 0;
> +	int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +	int pad = SKB_DATA_ALIGN(VHOST_NET_RX_PAD + headroom + nvq->sock_hlen);
> +	int sock_hlen = nvq->sock_hlen;
> +	void *buf;
> +	int copied;
> +
> +	if (unlikely(len < nvq->sock_hlen))
> +		return -EFAULT;
> +
> +	if (SKB_DATA_ALIGN(len + pad) +
> +	    SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
> +		return -ENOSPC;
> +
> +	buflen += SKB_DATA_ALIGN(len + pad);
> +	alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
> +	if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
> +		return -ENOMEM;
> +
> +	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> +
> +	/* We store two kinds of metadata in the header which will be
> +	 * used for XDP_PASS to do build_skb():
> +	 * offset 0: buflen
> +	 * offset sizeof(int): vnet header

Define a struct for the metadata then?


> +	 */
> +	copied = copy_page_from_iter(alloc_frag->page,
> +				     alloc_frag->offset + sizeof(int),
> +				     sock_hlen, from);
> +	if (copied != sock_hlen)
> +		return -EFAULT;
> +
> +	gso = (struct virtio_net_hdr *)(buf + sizeof(int));
> +
> +	if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
> +	    vhost16_to_cpu(vq, gso->csum_start) +
> +	    vhost16_to_cpu(vq, gso->csum_offset) + 2 >
> +	    vhost16_to_cpu(vq, gso->hdr_len)) {
> +		gso->hdr_len = cpu_to_vhost16(vq,
> +			       vhost16_to_cpu(vq, gso->csum_start) +
> +			       vhost16_to_cpu(vq, gso->csum_offset) + 2);
> +
> +		if (vhost16_to_cpu(vq, gso->hdr_len) > len)
> +			return -EINVAL;
> +	}
> +
> +	len -= sock_hlen;
> +	copied = copy_page_from_iter(alloc_frag->page,
> +				     alloc_frag->offset + pad,
> +				     len, from);
> +	if (copied != len)
> +		return -EFAULT;
> +
> +	xdp->data_hard_start = buf;
> +	xdp->data = buf + pad;
> +	xdp->data_end = xdp->data + len;
> +	*(int *)(xdp->data_hard_start) = buflen;
> +
> +	get_page(alloc_frag->page);
> +	alloc_frag->offset += buflen;
> +
> +	++nvq->batched_xdp;
> +
> +	return 0;
> +}
> +
>  static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  {
>  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> @@ -556,10 +667,14 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  	size_t len, total_len = 0;
>  	int err;
>  	int sent_pkts = 0;
> +	bool bulking = (sock->sk->sk_sndbuf == INT_MAX);

What does bulking mean?

>  
>  	for (;;) {
>  		bool busyloop_intr = false;
>  
> +		if (nvq->done_idx == VHOST_NET_BATCH)
> +			vhost_tx_batch(net, nvq, sock, &msg);
> +
>  		head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
>  				   &busyloop_intr);
>  		/* On error, stop handling until the next kick. */
> @@ -577,14 +692,34 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  			break;
>  		}
>  
> -		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
> -		vq->heads[nvq->done_idx].len = 0;
> -
>  		total_len += len;
> -		if (tx_can_batch(vq, total_len))
> -			msg.msg_flags |= MSG_MORE;
> -		else
> -			msg.msg_flags &= ~MSG_MORE;
> +
> +		/* For simplicity, TX batching is only enabled if
> +		 * sndbuf is unlimited.

What if sndbuf changes while this processing is going on?

> +		 */
> +		if (bulking) {
> +			err = vhost_net_build_xdp(nvq, &msg.msg_iter);
> +			if (!err) {
> +				goto done;
> +			} else if (unlikely(err != -ENOSPC)) {
> +				vhost_tx_batch(net, nvq, sock, &msg);
> +				vhost_discard_vq_desc(vq, 1);
> +				vhost_net_enable_vq(net, vq);
> +				break;
> +			}
> +
> +			/* We can't build XDP buff, go for single
> +			 * packet path but let's flush batched
> +			 * packets.
> +			 */
> +			vhost_tx_batch(net, nvq, sock, &msg);
> +			msg.msg_control = NULL;
> +		} else {
> +			if (tx_can_batch(vq, total_len))
> +				msg.msg_flags |= MSG_MORE;
> +			else
> +				msg.msg_flags &= ~MSG_MORE;
> +		}
>  
>  		/* TODO: Check specific error and bomb out unless ENOBUFS? */
>  		err = sock->ops->sendmsg(sock, &msg, len);
> @@ -596,15 +731,17 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
>  		if (err != len)
>  			pr_debug("Truncated TX packet: len %d != %zd\n",
>  				 err, len);
> -		if (++nvq->done_idx >= VHOST_NET_BATCH)
> -			vhost_net_signal_used(nvq);
> +done:
> +		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
> +		vq->heads[nvq->done_idx].len = 0;
> +		++nvq->done_idx;
>  		if (vhost_exceeds_weight(++sent_pkts, total_len)) {
>  			vhost_poll_queue(&vq->poll);
>  			break;
>  		}
>  	}
>  
> -	vhost_net_signal_used(nvq);
> +	vhost_tx_batch(net, nvq, sock, &msg);
>  }
>  
>  static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
> @@ -1111,6 +1248,7 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>  		n->vqs[i].ubuf_info = NULL;
>  		n->vqs[i].upend_idx = 0;
>  		n->vqs[i].done_idx = 0;
> +		n->vqs[i].batched_xdp = 0;
>  		n->vqs[i].vhost_hlen = 0;
>  		n->vqs[i].sock_hlen = 0;
>  		n->vqs[i].rx_ring = NULL;
> -- 
> 2.17.1

^ permalink raw reply

* Re: [PATCH net-next 08/11] tun: switch to new type of msg_control
From: Michael S. Tsirkin @ 2018-09-06 16:54 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180906040526.22518-9-jasowang@redhat.com>

On Thu, Sep 06, 2018 at 12:05:23PM +0800, Jason Wang wrote:
> This patch introduces to a new tun/tap specific msg_control:
> 
> #define TUN_MSG_UBUF 1
> #define TUN_MSG_PTR  2
> struct tun_msg_ctl {
>        int type;
>        void *ptr;
> };
> 
> This allows us to pass different kinds of msg_control through
> sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF) which will
> be used by the existed vhost_net zerocopy code. The second is XDP
> buff, which allows vhost_net to pass XDP buff to TUN. This could be
> used to implement accepting an array of XDP buffs from vhost_net in
> the following patches.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

At this point, do we want to just add a new sock opt for tap's
benefit? Seems cleaner than (ab)using sendmsg.

> ---
>  drivers/net/tap.c      | 18 ++++++++++++------
>  drivers/net/tun.c      |  6 +++++-
>  drivers/vhost/net.c    |  7 +++++--
>  include/linux/if_tun.h |  7 +++++++
>  4 files changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/tap.c b/drivers/net/tap.c
> index f0f7cd977667..7996ed7cbf18 100644
> --- a/drivers/net/tap.c
> +++ b/drivers/net/tap.c
> @@ -619,7 +619,7 @@ static inline struct sk_buff *tap_alloc_skb(struct sock *sk, size_t prepad,
>  #define TAP_RESERVE HH_DATA_OFF(ETH_HLEN)
>  
>  /* Get packet from user space buffer */
> -static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m,
> +static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
>  			    struct iov_iter *from, int noblock)
>  {
>  	int good_linear = SKB_MAX_HEAD(TAP_RESERVE);
> @@ -663,7 +663,7 @@ static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m,
>  	if (unlikely(len < ETH_HLEN))
>  		goto err;
>  
> -	if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) {
> +	if (msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY)) {
>  		struct iov_iter i;
>  
>  		copylen = vnet_hdr.hdr_len ?
> @@ -724,11 +724,11 @@ static ssize_t tap_get_user(struct tap_queue *q, struct msghdr *m,
>  	tap = rcu_dereference(q->tap);
>  	/* copy skb_ubuf_info for callback when skb has no error */
>  	if (zerocopy) {
> -		skb_shinfo(skb)->destructor_arg = m->msg_control;
> +		skb_shinfo(skb)->destructor_arg = msg_control;
>  		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
>  		skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
> -	} else if (m && m->msg_control) {
> -		struct ubuf_info *uarg = m->msg_control;
> +	} else if (msg_control) {
> +		struct ubuf_info *uarg = msg_control;
>  		uarg->callback(uarg, false);
>  	}
>  
> @@ -1150,7 +1150,13 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
>  		       size_t total_len)
>  {
>  	struct tap_queue *q = container_of(sock, struct tap_queue, sock);
> -	return tap_get_user(q, m, &m->msg_iter, m->msg_flags & MSG_DONTWAIT);
> +	struct tun_msg_ctl *ctl = m->msg_control;
> +
> +	if (ctl && ctl->type != TUN_MSG_UBUF)
> +		return -EINVAL;
> +
> +	return tap_get_user(q, ctl ? ctl->ptr : NULL, &m->msg_iter,
> +			    m->msg_flags & MSG_DONTWAIT);
>  }
>  
>  static int tap_recvmsg(struct socket *sock, struct msghdr *m,
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index ff1cbf3ebd50..c839a4bdcbd9 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2429,11 +2429,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>  	int ret;
>  	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
>  	struct tun_struct *tun = tun_get(tfile);
> +	struct tun_msg_ctl *ctl = m->msg_control;
>  
>  	if (!tun)
>  		return -EBADFD;
>  
> -	ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
> +	if (ctl && ctl->type != TUN_MSG_UBUF)
> +		return -EINVAL;
> +
> +	ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter,
>  			   m->msg_flags & MSG_DONTWAIT,
>  			   m->msg_flags & MSG_MORE);
>  	tun_put(tun);
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 4e656f89cb22..fb01ce6d981c 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -620,6 +620,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  		.msg_controllen = 0,
>  		.msg_flags = MSG_DONTWAIT,
>  	};
> +	struct tun_msg_ctl ctl;
>  	size_t len, total_len = 0;
>  	int err;
>  	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
> @@ -664,8 +665,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
>  			ubuf->ctx = nvq->ubufs;
>  			ubuf->desc = nvq->upend_idx;
>  			refcount_set(&ubuf->refcnt, 1);
> -			msg.msg_control = ubuf;
> -			msg.msg_controllen = sizeof(ubuf);
> +			msg.msg_control = &ctl;
> +			ctl.type = TUN_MSG_UBUF;
> +			ctl.ptr = ubuf;
> +			msg.msg_controllen = sizeof(ctl);
>  			ubufs = nvq->ubufs;
>  			atomic_inc(&ubufs->refcount);
>  			nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
> diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
> index 3d2996dc7d85..ba46dced1f38 100644
> --- a/include/linux/if_tun.h
> +++ b/include/linux/if_tun.h
> @@ -19,6 +19,13 @@
>  
>  #define TUN_XDP_FLAG 0x1UL
>  
> +#define TUN_MSG_UBUF 1
> +#define TUN_MSG_PTR  2

Looks like TUN_MSG_PTR should be pushed out to a follow-up patch?

> +struct tun_msg_ctl {
> +	int type;
> +	void *ptr;
> +};
> +

type actually includes a size. Why not two short fields then?


>  #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
>  struct socket *tun_get_socket(struct file *);
>  struct ptr_ring *tun_get_tx_ring(struct file *file);
> -- 
> 2.17.1

^ permalink raw reply

* Re: [PATCH net-next 01/11] net: sock: introduce SOCK_XDP
From: Michael S. Tsirkin @ 2018-09-06 16:56 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180906040526.22518-2-jasowang@redhat.com>

On Thu, Sep 06, 2018 at 12:05:16PM +0800, Jason Wang wrote:
> This patch introduces a new sock flag - SOCK_XDP. This will be used
> for notifying the upper layer that XDP program is attached on the
> lower socket, and requires for extra headroom.
> 
> TUN will be the first user.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

In fact vhost is the 1st user, right? So this can be
pushed out to become patch 10/11.

> ---
>  drivers/net/tun.c  | 19 +++++++++++++++++++
>  include/net/sock.h |  1 +
>  2 files changed, 20 insertions(+)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index ebd07ad82431..2c548bd20393 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -869,6 +869,9 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>  		tun_napi_init(tun, tfile, napi);
>  	}
>  
> +	if (rtnl_dereference(tun->xdp_prog))
> +		sock_set_flag(&tfile->sk, SOCK_XDP);
> +
>  	tun_set_real_num_queues(tun);
>  
>  	/* device is allowed to go away first, so no need to hold extra
> @@ -1241,13 +1244,29 @@ static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>  		       struct netlink_ext_ack *extack)
>  {
>  	struct tun_struct *tun = netdev_priv(dev);
> +	struct tun_file *tfile;
>  	struct bpf_prog *old_prog;
> +	int i;
>  
>  	old_prog = rtnl_dereference(tun->xdp_prog);
>  	rcu_assign_pointer(tun->xdp_prog, prog);
>  	if (old_prog)
>  		bpf_prog_put(old_prog);
>  
> +	for (i = 0; i < tun->numqueues; i++) {
> +		tfile = rtnl_dereference(tun->tfiles[i]);
> +		if (prog)
> +			sock_set_flag(&tfile->sk, SOCK_XDP);
> +		else
> +			sock_reset_flag(&tfile->sk, SOCK_XDP);
> +	}
> +	list_for_each_entry(tfile, &tun->disabled, next) {
> +		if (prog)
> +			sock_set_flag(&tfile->sk, SOCK_XDP);
> +		else
> +			sock_reset_flag(&tfile->sk, SOCK_XDP);
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 433f45fc2d68..38cae35f6e16 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -800,6 +800,7 @@ enum sock_flags {
>  	SOCK_SELECT_ERR_QUEUE, /* Wake select on error queue */
>  	SOCK_RCU_FREE, /* wait rcu grace period in sk_destruct() */
>  	SOCK_TXTIME,
> +	SOCK_XDP, /* XDP is attached */
>  };
>  
>  #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
> -- 
> 2.17.1

^ permalink raw reply

* Re: [PATCH net-next 02/11] tuntap: switch to use XDP_PACKET_HEADROOM
From: Michael S. Tsirkin @ 2018-09-06 16:57 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180906040526.22518-3-jasowang@redhat.com>

On Thu, Sep 06, 2018 at 12:05:17PM +0800, Jason Wang wrote:
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

any idea why didn't we do this straight away?

> ---
>  drivers/net/tun.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 2c548bd20393..d3677a544b56 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -113,7 +113,6 @@ do {								\
>  } while (0)
>  #endif
>  
> -#define TUN_HEADROOM 256
>  #define TUN_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
>  
>  /* TUN device flags */
> @@ -1654,7 +1653,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  	rcu_read_lock();
>  	xdp_prog = rcu_dereference(tun->xdp_prog);
>  	if (xdp_prog)
> -		pad += TUN_HEADROOM;
> +		pad += XDP_PACKET_HEADROOM;
>  	buflen += SKB_DATA_ALIGN(len + pad);
>  	rcu_read_unlock();
>  
> -- 
> 2.17.1

^ permalink raw reply

* Re: [PATCH net-next 03/11] tuntap: enable bh early during processing XDP
From: Michael S. Tsirkin @ 2018-09-06 17:02 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180906040526.22518-4-jasowang@redhat.com>

On Thu, Sep 06, 2018 at 12:05:18PM +0800, Jason Wang wrote:
> This patch move the bh enabling a little bit earlier, this will be
> used for factoring out the core XDP logic of tuntap.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

no reason to disable bh for non-xdp things.

> ---
>  drivers/net/tun.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index d3677a544b56..372caf7d67d9 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1726,22 +1726,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  			goto err_xdp;
>  		}
>  	}
> +	rcu_read_unlock();
> +	local_bh_enable();
>  
>  	skb = build_skb(buf, buflen);
> -	if (!skb) {
> -		rcu_read_unlock();
> -		local_bh_enable();
> +	if (!skb)
>  		return ERR_PTR(-ENOMEM);
> -	}
>  
>  	skb_reserve(skb, pad - delta);
>  	skb_put(skb, len);
>  	get_page(alloc_frag->page);
>  	alloc_frag->offset += buflen;
>  
> -	rcu_read_unlock();
> -	local_bh_enable();
> -
>  	return skb;
>  
>  err_redirect:
> -- 
> 2.17.1

^ permalink raw reply

* [PATCH net-next] net: stmmac: Enable TC Ops for GMAC >= 4
From: Jose Abreu @ 2018-09-06 12:29 UTC (permalink / raw)
  To: netdev
  Cc: Jose Abreu, David S. Miller, Joao Pinto, Giuseppe Cavallaro,
	Alexandre Torgue

GMAC >= 4 also supports CBS. Lets enable the TC Ops for these versions.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
 drivers/net/ethernet/stmicro/stmmac/hwif.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/hwif.c b/drivers/net/ethernet/stmicro/stmmac/hwif.c
index d9a34a4d08b3..81b966a8261b 100644
--- a/drivers/net/ethernet/stmicro/stmmac/hwif.c
+++ b/drivers/net/ethernet/stmicro/stmmac/hwif.c
@@ -133,7 +133,7 @@ static const struct stmmac_hwif_entry {
 		.mac = &dwmac4_ops,
 		.hwtimestamp = &stmmac_ptp,
 		.mode = NULL,
-		.tc = NULL,
+		.tc = &dwmac510_tc_ops,
 		.setup = dwmac4_setup,
 		.quirks = stmmac_dwmac4_quirks,
 	}, {
@@ -150,7 +150,7 @@ static const struct stmmac_hwif_entry {
 		.mac = &dwmac410_ops,
 		.hwtimestamp = &stmmac_ptp,
 		.mode = &dwmac4_ring_mode_ops,
-		.tc = NULL,
+		.tc = &dwmac510_tc_ops,
 		.setup = dwmac4_setup,
 		.quirks = NULL,
 	}, {
@@ -167,7 +167,7 @@ static const struct stmmac_hwif_entry {
 		.mac = &dwmac410_ops,
 		.hwtimestamp = &stmmac_ptp,
 		.mode = &dwmac4_ring_mode_ops,
-		.tc = NULL,
+		.tc = &dwmac510_tc_ops,
 		.setup = dwmac4_setup,
 		.quirks = NULL,
 	}, {
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next 04/11] tuntap: simplify error handling in tun_build_skb()
From: Michael S. Tsirkin @ 2018-09-06 17:14 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180906040526.22518-5-jasowang@redhat.com>

On Thu, Sep 06, 2018 at 12:05:19PM +0800, Jason Wang wrote:
> There's no need to duplicate page get logic in each action. So this
> patch tries to get page and calculate the offset before processing XDP
> actions, and undo them when meet errors (we don't care the performance
> on errors). This will be used for factoring out XDP logic.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I see some issues with this one.

> ---
>  drivers/net/tun.c | 37 ++++++++++++++++---------------------
>  1 file changed, 16 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 372caf7d67d9..f8cdcfa392c3 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1642,7 +1642,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  				     int len, int *skb_xdp)
>  {
>  	struct page_frag *alloc_frag = &current->task_frag;
> -	struct sk_buff *skb;
> +	struct sk_buff *skb = NULL;
>  	struct bpf_prog *xdp_prog;
>  	int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  	unsigned int delta = 0;
> @@ -1668,6 +1668,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  	if (copied != len)
>  		return ERR_PTR(-EFAULT);
>  
> +	get_page(alloc_frag->page);
> +	alloc_frag->offset += buflen;
> +

This adds an atomic op on XDP_DROP which is a data path
operation for some workloads.

>  	/* There's a small window that XDP may be set after the check
>  	 * of xdp_prog above, this should be rare and for simplicity
>  	 * we do XDP on skb in case the headroom is not enough.
> @@ -1695,23 +1698,15 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  
>  		switch (act) {
>  		case XDP_REDIRECT:
> -			get_page(alloc_frag->page);
> -			alloc_frag->offset += buflen;
>  			err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
>  			xdp_do_flush_map();
>  			if (err)
> -				goto err_redirect;
> -			rcu_read_unlock();
> -			local_bh_enable();
> -			return NULL;
> +				goto err_xdp;
> +			goto out;
>  		case XDP_TX:
> -			get_page(alloc_frag->page);
> -			alloc_frag->offset += buflen;
>  			if (tun_xdp_tx(tun->dev, &xdp) < 0)
> -				goto err_redirect;
> -			rcu_read_unlock();
> -			local_bh_enable();
> -			return NULL;
> +				goto err_xdp;
> +			goto out;
>  		case XDP_PASS:
>  			delta = orig_data - xdp.data;
>  			len = xdp.data_end - xdp.data;
> @@ -1730,23 +1725,23 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  	local_bh_enable();
>  
>  	skb = build_skb(buf, buflen);
> -	if (!skb)
> -		return ERR_PTR(-ENOMEM);
> +	if (!skb) {
> +		skb = ERR_PTR(-ENOMEM);
> +		goto out;

So goto out will skip put_page, and we did
do get_page above. Seems wrong. You should
goto err_skb or something like this.


> +	}
>  
>  	skb_reserve(skb, pad - delta);
>  	skb_put(skb, len);
> -	get_page(alloc_frag->page);
> -	alloc_frag->offset += buflen;
>  
>  	return skb;
>  
> -err_redirect:
> -	put_page(alloc_frag->page);
>  err_xdp:
> +	alloc_frag->offset -= buflen;
> +	put_page(alloc_frag->page);
> +out:

Out here isn't an error at all, is it?  You should not mix return and
error handling IMHO.



>  	rcu_read_unlock();
>  	local_bh_enable();
> -	this_cpu_inc(tun->pcpu_stats->rx_dropped);

Doesn't this break rx_dropped accounting?

> -	return NULL;
> +	return skb;
>  }
>  
>  /* Get packet from user space buffer */
> -- 
> 2.17.1

^ permalink raw reply

* Re: [PATCH net-next 05/11] tuntap: tweak on the path of non-xdp case in tun_build_skb()
From: Michael S. Tsirkin @ 2018-09-06 17:16 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180906040526.22518-6-jasowang@redhat.com>

On Thu, Sep 06, 2018 at 12:05:20PM +0800, Jason Wang wrote:
> If we're sure not to go native XDP, there's no need for several things
> like bh and rcu stuffs.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

True...

> ---
>  drivers/net/tun.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index f8cdcfa392c3..389aa0727cc6 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1675,10 +1675,12 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  	 * of xdp_prog above, this should be rare and for simplicity
>  	 * we do XDP on skb in case the headroom is not enough.
>  	 */
> -	if (hdr->gso_type || !xdp_prog)
> +	if (hdr->gso_type || !xdp_prog) {
>  		*skb_xdp = 1;
> -	else
> -		*skb_xdp = 0;
> +		goto build;
> +	}
> +
> +	*skb_xdp = 0;
>  
>  	local_bh_disable();
>  	rcu_read_lock();
> @@ -1724,6 +1726,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  	rcu_read_unlock();
>  	local_bh_enable();
>  
> +build:

But this is spaghetti code. Please just put common
code into functions and call them, don't goto.

>  	skb = build_skb(buf, buflen);
>  	if (!skb) {
>  		skb = ERR_PTR(-ENOMEM);
> -- 
> 2.17.1

^ permalink raw reply

* RE: [PATCH v2 1/2] net: ethernet: i40e: fix build error
From: Keller, Jacob E @ 2018-09-06 17:16 UTC (permalink / raw)
  To: Wang, Dongsheng, Kirsher, Jeffrey T,
	sergei.shtylyov@cogentembedded.com
  Cc: davem@davemloft.net, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <eab442369533452a9fc138e310aa57b9@HXTBJIDCEMVIW02.hxtcorp.net>



> -----Original Message-----
> From: Wang, Dongsheng [mailto:dongsheng.wang@hxt-semitech.com]
> Sent: Wednesday, September 05, 2018 8:36 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; sergei.shtylyov@cogentembedded.com
> Cc: davem@davemloft.net; intel-wired-lan@lists.osuosl.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 1/2] net: ethernet: i40e: fix build error
> 
> Yes,  but not only i40e. igb/vf, ixgb/vf also share same code. If we change any of
> them, means we need to broken the whole layout of driver/net/ethernet/intel/ .
> Obviously we can't put header files to $src/include/net. :|
> 
> Cheers,
> Dongsheng
> 
> 

I'm more worried about how it interacts with modules. For example, we could have i40e and i40evf share some code, but then wouldn't one of them become dependent on the other? i.e. you'd have to load i40e in order to successfully load i40evf? Or you'd have to have some sort of common glue module which you load first, and then load i40e and i40evf after? This also creates some interactions with out-of-tree modules which make it difficult. It would be nice if we could share the code in some way that still resulted in allowing each module to be separate...

And yes, I igb/vf and ixgbe/vf and i40e/vf all share some code though i40e is the most shared, comparatively. It's not an easy undertaking, which is why it's not been done before. For fm10k, the same driver handles both the PF and VF, but I know that users weren't happy about having the driver change a lot when fixing/changing the PF, and thus thinking they might need to update their VF too much.

Thanks,
Jake

^ permalink raw reply

* Re: [PATCH net-next 06/11] tuntap: split out XDP logic
From: Michael S. Tsirkin @ 2018-09-06 17:21 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20180906040526.22518-7-jasowang@redhat.com>

On Thu, Sep 06, 2018 at 12:05:21PM +0800, Jason Wang wrote:
> This patch split out XDP logic into a single function. This make it to
> be reused by XDP batching path in the following patch.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/tun.c | 84 ++++++++++++++++++++++++++++-------------------
>  1 file changed, 51 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 389aa0727cc6..21b125020b3b 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1635,6 +1635,44 @@ static bool tun_can_build_skb(struct tun_struct *tun, struct tun_file *tfile,
>  	return true;
>  }
>  
> +static u32 tun_do_xdp(struct tun_struct *tun,
> +		      struct tun_file *tfile,
> +		      struct bpf_prog *xdp_prog,
> +		      struct xdp_buff *xdp,
> +		      int *err)
> +{
> +	u32 act = bpf_prog_run_xdp(xdp_prog, xdp);
> +
> +	switch (act) {
> +	case XDP_REDIRECT:
> +		*err = xdp_do_redirect(tun->dev, xdp, xdp_prog);
> +		xdp_do_flush_map();
> +		if (*err)
> +			break;
> +		goto out;
> +	case XDP_TX:
> +		*err = tun_xdp_tx(tun->dev, xdp);
> +		if (*err < 0)
> +			break;
> +		*err = 0;
> +		goto out;
> +	case XDP_PASS:
> +		goto out;

Do we need goto? why not just return?

> +	default:
> +		bpf_warn_invalid_xdp_action(act);
> +		/* fall through */
> +	case XDP_ABORTED:
> +		trace_xdp_exception(tun->dev, xdp_prog, act);
> +		/* fall through */
> +	case XDP_DROP:
> +		break;
> +	}
> +
> +	put_page(virt_to_head_page(xdp->data_hard_start));

put here because caller does get_page :( Not pretty.
I'd move this out to the caller.

> +out:
> +	return act;

How about combining err and act? err is < 0 XDP_PASS is > 0.
No need for pointers then.

> +}
> +
>  static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  				     struct tun_file *tfile,
>  				     struct iov_iter *from,
> @@ -1645,10 +1683,10 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  	struct sk_buff *skb = NULL;
>  	struct bpf_prog *xdp_prog;
>  	int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> -	unsigned int delta = 0;
>  	char *buf;
>  	size_t copied;
> -	int err, pad = TUN_RX_PAD;
> +	int pad = TUN_RX_PAD;
> +	int err = 0;
>  
>  	rcu_read_lock();
>  	xdp_prog = rcu_dereference(tun->xdp_prog);
> @@ -1685,9 +1723,8 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  	local_bh_disable();
>  	rcu_read_lock();
>  	xdp_prog = rcu_dereference(tun->xdp_prog);
> -	if (xdp_prog && !*skb_xdp) {
> +	if (xdp_prog) {
>  		struct xdp_buff xdp;
> -		void *orig_data;
>  		u32 act;
>  
>  		xdp.data_hard_start = buf;
> @@ -1695,33 +1732,14 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  		xdp_set_data_meta_invalid(&xdp);
>  		xdp.data_end = xdp.data + len;
>  		xdp.rxq = &tfile->xdp_rxq;
> -		orig_data = xdp.data;
> -		act = bpf_prog_run_xdp(xdp_prog, &xdp);
> -
> -		switch (act) {
> -		case XDP_REDIRECT:
> -			err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
> -			xdp_do_flush_map();
> -			if (err)
> -				goto err_xdp;
> -			goto out;
> -		case XDP_TX:
> -			if (tun_xdp_tx(tun->dev, &xdp) < 0)
> -				goto err_xdp;
> -			goto out;
> -		case XDP_PASS:
> -			delta = orig_data - xdp.data;
> -			len = xdp.data_end - xdp.data;
> -			break;
> -		default:
> -			bpf_warn_invalid_xdp_action(act);
> -			/* fall through */
> -		case XDP_ABORTED:
> -			trace_xdp_exception(tun->dev, xdp_prog, act);
> -			/* fall through */
> -		case XDP_DROP:
> +		act = tun_do_xdp(tun, tfile, xdp_prog, &xdp, &err);
> +		if (err)
>  			goto err_xdp;
> -		}
> +		if (act != XDP_PASS)
> +			goto out;

likely?

> +
> +		pad = xdp.data - xdp.data_hard_start;
> +		len = xdp.data_end - xdp.data;
>  	}
>  	rcu_read_unlock();
>  	local_bh_enable();
> @@ -1729,18 +1747,18 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  build:
>  	skb = build_skb(buf, buflen);
>  	if (!skb) {
> +		put_page(alloc_frag->page);
>  		skb = ERR_PTR(-ENOMEM);
>  		goto out;
>  	}
>  
> -	skb_reserve(skb, pad - delta);
> +	skb_reserve(skb, pad);
>  	skb_put(skb, len);
>  
>  	return skb;
>  
>  err_xdp:
> -	alloc_frag->offset -= buflen;
> -	put_page(alloc_frag->page);
> +	this_cpu_inc(tun->pcpu_stats->rx_dropped);


This fixes bug in previous patch which dropped it. OK :)

>  out:
>  	rcu_read_unlock();
>  	local_bh_enable();
> -- 
> 2.17.1

^ permalink raw reply

* Re: KASAN: slab-out-of-bounds Read in _decode_session6
From: Alexei Starovoitov @ 2018-09-06 17:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: syzbot, ast, daniel, davem, dvyukov, herbert, kuznet,
	linux-kernel, netdev, steffen.klassert, syzkaller-bugs, yoshfuji
In-Reply-To: <c8285592-29a7-2827-8c9a-d8cc0cf099e8@gmail.com>

On Thu, Sep 06, 2018 at 12:00:26AM -0700, Eric Dumazet wrote:
> 
> 
> On 09/05/2018 08:17 PM, syzbot wrote:
> > syzbot has found a reproducer for the following crash on:
> > 
> > HEAD commit:    b36fdc6853a3 Merge tag 'gpio-v4.19-2' of git://git.kernel...
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=164938d1400000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=4c7e83258d6e0156
> > dashboard link: https://syzkaller.appspot.com/bug?extid=acffccec848dc13fe459
> > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=115f172e400000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=16399be1400000
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+acffccec848dc13fe459@syzkaller.appspotmail.com
> > 
> > IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
> > IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
> > IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> > 8021q: adding VLAN 0 to HW filter on device team0
> > ==================================================================
> > BUG: KASAN: slab-out-of-bounds in _decode_session6+0x1331/0x14e0 net/ipv6/xfrm6_policy.c:161
> > Read of size 1 at addr ffff8801d4a67f07 by task syz-executor092/4673
> > 
> > CPU: 1 PID: 4673 Comm: syz-executor092 Not tainted 4.19.0-rc2+ #223
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
> >  print_address_description+0x6c/0x20b mm/kasan/report.c:256
> >  kasan_report_error mm/kasan/report.c:354 [inline]
> >  kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
> >  __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
> >  _decode_session6+0x1331/0x14e0 net/ipv6/xfrm6_policy.c:161
> >  __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2299
> >  xfrm_decode_session include/net/xfrm.h:1232 [inline]
> >  vti6_tnl_xmit+0x3fc/0x1bb1 net/ipv6/ip6_vti.c:542
> >  __netdev_start_xmit include/linux/netdevice.h:4287 [inline]
> >  netdev_start_xmit include/linux/netdevice.h:4296 [inline]
> >  xmit_one net/core/dev.c:3216 [inline]
> >  dev_hard_start_xmit+0x272/0xc10 net/core/dev.c:3232
> >  __dev_queue_xmit+0x2ab2/0x3870 net/core/dev.c:3802
> >  dev_queue_xmit+0x17/0x20 net/core/dev.c:3835
> >  __bpf_tx_skb net/core/filter.c:2012 [inline]
> >  __bpf_redirect_common net/core/filter.c:2050 [inline]
> >  __bpf_redirect+0x5b7/0xae0 net/core/filter.c:2057
> >  ____bpf_clone_redirect net/core/filter.c:2090 [inline]
> >  bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2062
> >  bpf_prog_c39d1ba309a769f7+0xe9e/0x1000
> > 
> > Allocated by task 4673:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
> >  set_track mm/kasan/kasan.c:460 [inline]
> >  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
> >  __do_kmalloc_node mm/slab.c:3682 [inline]
> >  __kmalloc_node_track_caller+0x47/0x70 mm/slab.c:3696
> >  __kmalloc_reserve.isra.41+0x3a/0xe0 net/core/skbuff.c:137
> >  pskb_expand_head+0x230/0x10e0 net/core/skbuff.c:1463
> >  skb_ensure_writable+0x3dd/0x640 net/core/skbuff.c:5129
> >  __bpf_try_make_writable net/core/filter.c:1633 [inline]
> >  bpf_try_make_writable net/core/filter.c:1639 [inline]
> >  bpf_try_make_head_writable net/core/filter.c:1647 [inline]
> >  ____bpf_clone_redirect net/core/filter.c:2084 [inline]
> >  bpf_clone_redirect+0x14a/0x490 net/core/filter.c:2062
> >  bpf_prog_c39d1ba309a769f7+0xe9e/0x1000
> > 
> > Freed by task 3286:
> >  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
> >  set_track mm/kasan/kasan.c:460 [inline]
> >  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
> >  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
> >  __cache_free mm/slab.c:3498 [inline]
> >  kfree+0xd9/0x210 mm/slab.c:3813
> >  load_elf_binary+0x2569/0x5610 fs/binfmt_elf.c:1118
> >  search_binary_handler+0x17d/0x570 fs/exec.c:1653
> >  exec_binprm fs/exec.c:1695 [inline]
> >  __do_execve_file.isra.35+0x15ff/0x2460 fs/exec.c:1819
> >  do_execveat_common fs/exec.c:1866 [inline]
> >  do_execve fs/exec.c:1883 [inline]
> >  __do_sys_execve fs/exec.c:1964 [inline]
> >  __se_sys_execve fs/exec.c:1959 [inline]
> >  __x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
> >  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > The buggy address belongs to the object at ffff8801d4a67d00
> >  which belongs to the cache kmalloc-512 of size 512
> > The buggy address is located 7 bytes to the right of
> >  512-byte region [ffff8801d4a67d00, ffff8801d4a67f00)
> > The buggy address belongs to the page:
> > page:ffffea00075299c0 count:1 mapcount:0 mapping:ffff8801dac00940 index:0x0
> > flags: 0x2fffc0000000100(slab)
> > raw: 02fffc0000000100 ffffea0007529988 ffffea0007529a48 ffff8801dac00940
> > raw: 0000000000000000 ffff8801d4a67080 0000000100000006 0000000000000000
> > page dumped because: kasan: bad access detected
> > 
> > Memory state around the buggy address:
> >  ffff8801d4a67e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >  ffff8801d4a67e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >> ffff8801d4a67f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >                    ^
> >  ffff8801d4a67f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >  ffff8801d4a68000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ==================================================================
> > 
> 
> 
> What about :
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index aecdeba052d3f0ff3d4f0a33ec36891f9738052c..a662f59786bd0677850c1c60a2c92faa6fb6c5bb 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2081,7 +2081,7 @@ BPF_CALL_3(bpf_clone_redirect, struct sk_buff *, skb, u32, ifindex, u64, flags)
>          * here, we need to free the just generated clone to unclone once
>          * again.
>          */
> -       ret = bpf_try_make_head_writable(skb);
> +       ret = bpf_try_make_head_writable(clone);

This part is fine. I think the bug is in _decode_session6,
but I have a hard time reproducing the issue, so will appreciate
if somebody can test the following patch:

>From 291f80f212461670d1e0140d06eee3071cf3e1ee Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov <ast@kernel.org>
Date: Thu, 6 Sep 2018 10:23:29 -0700
Subject: [PATCH] net/xfrm: fix out-of-bounds packet access

BUG: KASAN: slab-out-of-bounds in _decode_session6+0x1331/0x14e0
net/ipv6/xfrm6_policy.c:161
Read of size 1 at addr ffff8801d882eec7 by task syz-executor1/6667
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
  __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
  _decode_session6+0x1331/0x14e0 net/ipv6/xfrm6_policy.c:161
  __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2299
  xfrm_decode_session include/net/xfrm.h:1232 [inline]
  vti6_tnl_xmit+0x3c3/0x1bc1 net/ipv6/ip6_vti.c:542
  __netdev_start_xmit include/linux/netdevice.h:4313 [inline]
  netdev_start_xmit include/linux/netdevice.h:4322 [inline]
  xmit_one net/core/dev.c:3217 [inline]
  dev_hard_start_xmit+0x272/0xc10 net/core/dev.c:3233
  __dev_queue_xmit+0x2ab2/0x3870 net/core/dev.c:3803
  dev_queue_xmit+0x17/0x20 net/core/dev.c:3836

Reported-by: syzbot+acffccec848dc13fe459@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 net/ipv6/xfrm6_policy.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/xfrm6_policy.c b/net/ipv6/xfrm6_policy.c
index ef3defaf43b9..d35bcf92969c 100644
--- a/net/ipv6/xfrm6_policy.c
+++ b/net/ipv6/xfrm6_policy.c
@@ -146,8 +146,8 @@ _decode_session6(struct sk_buff *skb, struct flowi *fl, int reverse)
 	fl6->daddr = reverse ? hdr->saddr : hdr->daddr;
 	fl6->saddr = reverse ? hdr->daddr : hdr->saddr;
 
-	while (nh + offset + 1 < skb->data ||
-	       pskb_may_pull(skb, nh + offset + 1 - skb->data)) {
+	while (nh + offset + sizeof(*exthdr) < skb->data ||
+	       pskb_may_pull(skb, nh + offset + sizeof(*exthdr) - skb->data)) {
 		nh = skb_network_header(skb);
 		exthdr = (struct ipv6_opt_hdr *)(nh + offset);
 
-- 
2.17.1

^ permalink raw reply related

* Re: [iproute PATCH] ip-route: Fix segfault with many nexthops
From: Phil Sutter @ 2018-09-06 12:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20180904171544.9337-1-phil@nwl.cc>

Hi,

On Tue, Sep 04, 2018 at 07:15:44PM +0200, Phil Sutter wrote:
[...]
> diff --git a/ip/iproute.c b/ip/iproute.c
> index 30833414a3f7f..9e5ae48c0715c 100644
> --- a/ip/iproute.c
> +++ b/ip/iproute.c
[...]
> @@ -1036,15 +1044,18 @@ static int parse_nexthops(struct nlmsghdr *n, struct rtmsg *r,
>  		memset(rtnh, 0, sizeof(*rtnh));
>  		rtnh->rtnh_len = sizeof(*rtnh);
>  		rta->rta_len += rtnh->rtnh_len;
> -		if (parse_one_nh(n, r, rta, rtnh, &argc, &argv)) {
> +		if (parse_one_nh(n, r, rta, 1024, rtnh, &argc, &argv)) {
>  			fprintf(stderr, "Error: cannot parse nexthop\n");
>  			exit(-1);
>  		}
>  		rtnh = RTNH_NEXT(rtnh);
>  	}
>  
> +		return 0;
> +

This line got added by accident, I'll respin.

Sorry, Phil

^ permalink raw reply

* Re: [PATCH v3 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace
From: Leon Romanovsky @ 2018-09-06 12:56 UTC (permalink / raw)
  To: Arseny Maslennikov
  Cc: Stephen Hemminger, linux-rdma, Doug Ledford, Jason Gunthorpe,
	netdev
In-Reply-To: <20180906072656.GB13034@cello>

[-- Attachment #1: Type: text/plain, Size: 1586 bytes --]

On Thu, Sep 06, 2018 at 10:26:56AM +0300, Arseny Maslennikov wrote:
> On Wed, Sep 05, 2018 at 04:47:27PM +0100, Stephen Hemminger wrote:
> > On Mon,  3 Sep 2018 19:13:16 +0300
> > Arseny Maslennikov <ar@cs.msu.ru> wrote:
> >
> > > +	if (ndev->dev_id == ndev->dev_port) {
> > > +		netdev_info_once(ndev,
> > > +			"\"%s\" wants to know my dev_id. "
> > > +			"Should it look at dev_port instead?\n",
> > > +			current->comm);
> > > +		netdev_info_once(ndev,
> > > +			"See Documentation/ABI/testing/sysfs-class-net for more info.\n");
> > > +	}
> >
> > Single line message is sufficient.
> > Also don't break strings in messages.
> >
>
> OK, will fix in v4.
>
>
> (Sorry if the following is too off-topic here)
> Multi-line messages in separate printk calls can be racy, I get that.
> But I'd like to hear some reasoning behind the style decision to not
> break a long string into many string literals. (I'll most certainly not
> be alone in this, Documentation/process/ does not mention reasons, only
> the requirements themselves)
>
> The only drawback I currently see is that breaking a long message into
> multiple string literals makes it impossible to git grep the kernel tree
> for the whole message text.
> However, splitting a long line this way allows us to nicely wrap the
> code at 80 columns, which is a readability boon.
>
> Are there any other reasons to avoid that? Except maybe matters of taste. :)

AFAIK "grep" is the reason.

>
> > > +	}
> > > +
> > > +	ret = sprintf(buf, "%#x\n", ndev->dev_id);
> > > +
> > > +	return ret;
> >
> > Why not?
> > 	return sprintf...



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* [PATCH net] net/ipv6: fix incorrect fib6 gateway info after do redirect
From: Hangbin Liu @ 2018-09-06 12:57 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern, David S. Miller, Hangbin Liu

When receive a redirect message and call rt6_do_redirect(), we allocate
a new rt6_info and set new flags and gateway info, but not update these
info to fib6_info.

Then if a user try to get the route info via `ip route get`, he will still
get the old default gateway, because inet6_rtm_getroute() get gateway info
from fib6_info.

Fixes: 23fb93a4d3f11 ("net/ipv6: Cleanup exception and cache route handling")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/ipv6/route.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 18e00ce..3d367c9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3446,6 +3446,10 @@ static void rt6_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_bu
 		goto out;
 	}
 
+	/* Update fib6_info from rt6_info */
+	from->fib6_flags = rt->rt6i_flags;
+	from->fib6_nh.nh_gw = rt->rt6i_gateway;
+
 	netevent.old = &rt->dst;
 	netevent.new = &nrt->dst;
 	netevent.daddr = &msg->dest;
-- 
2.5.5

^ permalink raw reply related

* Re: [PATCH iproute2] bridge/mdb: fix missing new line when show bridge mdb
From: Stephen Hemminger @ 2018-09-06 13:00 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev, David Ahern
In-Reply-To: <1536118423-20604-1-git-send-email-liuhangbin@gmail.com>

On Wed,  5 Sep 2018 11:33:43 +0800
Hangbin Liu <liuhangbin@gmail.com> wrote:

> The bridge mdb show is broken on current iproute2. e.g.
> ]# bridge mdb show
> 34: br0  veth0_br  224.1.1.2  temp 34: br0  veth0_br  224.1.1.1  temp
> 
> After fix:
> ]# bridge mdb show
> 34: br0  veth0_br  224.1.1.2  temp
> 34: br0  veth0_br  224.1.1.1  temp
> 
> Reported-by: Ying Xu <yinxu@redhat.com>
> Fixes: c7c1a1ef51aea ("bridge: colorize output and use JSON print library")
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  bridge/mdb.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/bridge/mdb.c b/bridge/mdb.c
> index f38dc67..d89c065 100644
> --- a/bridge/mdb.c
> +++ b/bridge/mdb.c
> @@ -107,6 +107,10 @@ static void br_print_router_ports(FILE *f, struct rtattr *attr,
>  			fprintf(f, "%s ", port_ifname);
>  		}
>  	}
> +
> +	if (!is_json_context() && !show_stats)
> +		fprintf(f, "\n");
> +
>  	close_json_array(PRINT_JSON, NULL);
>  }
>  
> @@ -164,6 +168,10 @@ static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e,
>  		print_string(PRINT_ANY, "timer", " %s",
>  			     format_timer(timer));
>  	}
> +
> +	if (!is_json_context())
> +		fprintf(f, "\n");
> +
>  	close_json_object();
>  }
>  

Thanks for catching this.

Now that there is a json print library, the preferred pattern for
this is:
	print_string(PRINT_FP, NULL, "\n", NULL);

I plan to introduce a helper
	print_fp(...)

and it would be easier if all places were consistent.

^ permalink raw reply

* Re: [PATCH v3 3/3] IB/ipoib: Log sysfs 'dev_id' accesses from userspace
From: Leon Romanovsky @ 2018-09-06 13:03 UTC (permalink / raw)
  To: Arseny Maslennikov; +Cc: linux-rdma, Doug Ledford, Jason Gunthorpe, netdev
In-Reply-To: <20180906070433.GA13034@cello>

[-- Attachment #1: Type: text/plain, Size: 3732 bytes --]

On Thu, Sep 06, 2018 at 10:04:33AM +0300, Arseny Maslennikov wrote:
> On Wed, Sep 05, 2018 at 04:50:35PM +0300, Leon Romanovsky wrote:
> > On Mon, Sep 03, 2018 at 07:13:16PM +0300, Arseny Maslennikov wrote:
> > > Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
> > > ---
> > >  drivers/infiniband/ulp/ipoib/ipoib_main.c | 38 +++++++++++++++++++++++
> > >  1 file changed, 38 insertions(+)
> > >
> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > index 30f840f874b3..7386e5bde3d3 100644
> > > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > > @@ -2386,6 +2386,42 @@ int ipoib_add_pkey_attr(struct net_device *dev)
> > >  	return device_create_file(&dev->dev, &dev_attr_pkey);
> > >  }
> > >
> > > +/*
> > > + * We erroneously exposed the iface's port number in the dev_id
> > > + * sysfs field long after dev_port was introduced for that purpose[1],
> > > + * and we need to stop everyone from relying on that.
> > > + * Let's overload the shower routine for the dev_id file here
> > > + * to gently bring the issue up.
> > > + *
> > > + * [1] https://www.spinics.net/lists/netdev/msg272123.html
> > > + */
> > > +static ssize_t dev_id_show(struct device *dev,
> > > +			   struct device_attribute *attr, char *buf)
> > > +{
> > > +	struct net_device *ndev = to_net_dev(dev);
> > > +	ssize_t ret = -EINVAL;
> > > +
> > > +	if (ndev->dev_id == ndev->dev_port) {
> > > +		netdev_info_once(ndev,
> > > +			"\"%s\" wants to know my dev_id. "
> > > +			"Should it look at dev_port instead?\n",
> > > +			current->comm);
> > > +		netdev_info_once(ndev,
> > > +			"See Documentation/ABI/testing/sysfs-class-net for more info.\n");
> > > +	}
> > > +
> > > +	ret = sprintf(buf, "%#x\n", ndev->dev_id);
> > > +
> > > +	return ret;
> > > +}
> > > +static DEVICE_ATTR_RO(dev_id);
> > > +
> >
> > I don't see this field among exposed by IPoIB, why should we expose it now?
> >
>
> To deviate from standard netdev behaviour, which only prints the
> field out. Doug wanted this to also print a deprecation message, and
> netdev (obviously) does not do that. See below.
>
> > > +int ipoib_intercept_dev_id_attr(struct net_device *dev)
> > > +{
> > > +	device_remove_file(&dev->dev, &dev_attr_dev_id);
> > > +	return device_create_file(&dev->dev, &dev_attr_dev_id);
> >
> > Why isn't enough to rely on netdev code?
> >
>
> Netdev code relies on macros around a *static* function 'netdev_show',
> which is defined in net/core/net-sysfs.c; it is not listed in any header
> files, and the macros aren't as well. This all leads me to believe it
> was not really meant to be used from outside net/core/net-sysfs.
>
> The only way we could use any netdev code here is to set up our own
> handler (again), printk() a message, then call netdev_show — but we have
> no access to it.
>
> Of course, it also may be that I'm terribly missing a clue.

Thanks,

IMHO, the end result of adequate Doug's request is a little bit too much.
I don't think that it justifies such remove->create construction.

Personal opinion.

>
> > > +}
> > > +
> > >  static struct net_device *ipoib_add_port(const char *format,
> > >  					 struct ib_device *hca, u8 port)
> > >  {
> > > @@ -2427,6 +2463,8 @@ static struct net_device *ipoib_add_port(const char *format,
> > >  	 */
> > >  	ndev->priv_destructor = ipoib_intf_free;
> > >
> > > +	if (ipoib_intercept_dev_id_attr(ndev))
> > > +		goto sysfs_failed;
> > >  	if (ipoib_cm_add_mode_attr(ndev))
> > >  		goto sysfs_failed;
> > >  	if (ipoib_add_pkey_attr(ndev))
> > > --
> > > 2.19.0.rc1
> > >
>
>



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* [PATCH net-next v2 1/7] net: aquantia: fix hw_atl_utils_fw_upload_dwords
From: Igor Russkikh @ 2018-09-06 13:05 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Igor Russkikh, Yana Esina, Nikita Danilov
In-Reply-To: <cover.1536233536.git.igor.russkikh@aquantia.com>

From: Yana Esina <yana.esina@aquantia.com>

This patch fixes the upload function, which worked incorrectly with
some chips.

Signed-off-by: Yana Esina <yana.esina@aquantia.com>
Signed-off-by: Nikita Danilov <nikita.danilov@aquantia.com>
Tested-by: Nikita Danilov <nikita.danilov@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
---
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c |  8 +++++
 .../ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h |  3 ++
 .../aquantia/atlantic/hw_atl/hw_atl_llh_internal.h | 13 ++++++++
 .../aquantia/atlantic/hw_atl/hw_atl_utils.c        | 36 +++++++++++++++-------
 .../aquantia/atlantic/hw_atl/hw_atl_utils.h        |  5 +++
 .../aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c   |  5 +++
 6 files changed, 59 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c
index 10ba035dadb1..be0a3a90dfad 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.c
@@ -1460,3 +1460,11 @@ void hw_atl_reg_glb_cpu_scratch_scp_set(struct aq_hw_s *aq_hw,
 	aq_hw_write_reg(aq_hw, HW_ATL_GLB_CPU_SCRATCH_SCP_ADR(scratch_scp),
 			glb_cpu_scratch_scp);
 }
+
+void hw_atl_mcp_up_force_intr_set(struct aq_hw_s *aq_hw, u32 up_force_intr)
+{
+	aq_hw_write_reg_bit(aq_hw, HW_ATL_MCP_UP_FORCE_INTERRUPT_ADR,
+			    HW_ATL_MCP_UP_FORCE_INTERRUPT_MSK,
+			    HW_ATL_MCP_UP_FORCE_INTERRUPT_SHIFT,
+			    up_force_intr);
+}
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h
index dfb426f2dc2c..7056c7342afc 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh.h
@@ -698,4 +698,7 @@ void hw_atl_msm_reg_wr_strobe_set(struct aq_hw_s *aq_hw, u32 reg_wr_strobe);
 /* set pci register reset disable */
 void hw_atl_pci_pci_reg_res_dis_set(struct aq_hw_s *aq_hw, u32 pci_reg_res_dis);
 
+/* set uP Force Interrupt */
+void hw_atl_mcp_up_force_intr_set(struct aq_hw_s *aq_hw, u32 up_force_intr);
+
 #endif /* HW_ATL_LLH_H */
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh_internal.h b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh_internal.h
index e0cf70120f1d..716674a9b729 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh_internal.h
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_llh_internal.h
@@ -2387,4 +2387,17 @@
 #define HW_ATL_GLB_CPU_SCRATCH_SCP_ADR(scratch_scp) \
 	(0x00000300u + (scratch_scp) * 0x4)
 
+/* register address for bitfield uP Force Interrupt */
+#define HW_ATL_MCP_UP_FORCE_INTERRUPT_ADR 0x00000404
+/* bitmask for bitfield uP Force Interrupt */
+#define HW_ATL_MCP_UP_FORCE_INTERRUPT_MSK 0x00000002
+/* inverted bitmask for bitfield uP Force Interrupt */
+#define HW_ATL_MCP_UP_FORCE_INTERRUPT_MSKN 0xFFFFFFFD
+/* lower bit position of bitfield uP Force Interrupt */
+#define HW_ATL_MCP_UP_FORCE_INTERRUPT_SHIFT 1
+/* width of bitfield uP Force Interrupt */
+#define HW_ATL_MCP_UP_FORCE_INTERRUPT_WIDTH 1
+/* default value of bitfield uP Force Interrupt */
+#define HW_ATL_MCP_UP_FORCE_INTERRUPT_DEFAULT 0x0
+
 #endif /* HW_ATL_LLH_INTERNAL_H */
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
index c965e65d07db..1926532bd1af 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c
@@ -325,17 +325,31 @@ static int hw_atl_utils_fw_upload_dwords(struct aq_hw_s *self, u32 a, u32 *p,
 		err = -ETIME;
 		goto err_exit;
 	}
+	if (IS_CHIP_FEATURE(REVISION_B1)) {
+		u32 offset = 0;
+
+		for (; offset < cnt; ++offset) {
+			aq_hw_write_reg(self, 0x328, p[offset]);
+			aq_hw_write_reg(self, 0x32C,
+					(0x80000000 | (0xFFFF & (offset * 4))));
+			hw_atl_mcp_up_force_intr_set(self, 1);
+			/* 1000 times by 10us = 10ms */
+			AQ_HW_WAIT_FOR((aq_hw_read_reg(self,
+						       0x32C) & 0xF0000000) !=
+				       0x80000000,
+				       10, 1000);
+		}
+	} else {
+		u32 offset = 0;
 
-	aq_hw_write_reg(self, 0x00000208U, a);
-
-	for (++cnt; --cnt;) {
-		u32 i = 0U;
+		aq_hw_write_reg(self, 0x208, a);
 
-		aq_hw_write_reg(self, 0x0000020CU, *(p++));
-		aq_hw_write_reg(self, 0x00000200U, 0xC000U);
+		for (; offset < cnt; ++offset) {
+			aq_hw_write_reg(self, 0x20C, p[offset]);
+			aq_hw_write_reg(self, 0x200, 0xC000);
 
-		for (i = 1024U;
-			(0x100U & aq_hw_read_reg(self, 0x00000200U)) && --i;) {
+			AQ_HW_WAIT_FOR((aq_hw_read_reg(self, 0x200U) &
+					0x100) == 0, 10, 1000);
 		}
 	}
 
@@ -399,7 +413,7 @@ struct aq_hw_atl_utils_fw_rpc_tid_s {
 
 #define hw_atl_utils_fw_rpc_init(_H_) hw_atl_utils_fw_rpc_wait(_H_, NULL)
 
-static int hw_atl_utils_fw_rpc_call(struct aq_hw_s *self, unsigned int rpc_size)
+int hw_atl_utils_fw_rpc_call(struct aq_hw_s *self, unsigned int rpc_size)
 {
 	int err = 0;
 	struct aq_hw_atl_utils_fw_rpc_tid_s sw;
@@ -423,8 +437,8 @@ static int hw_atl_utils_fw_rpc_call(struct aq_hw_s *self, unsigned int rpc_size)
 	return err;
 }
 
-static int hw_atl_utils_fw_rpc_wait(struct aq_hw_s *self,
-				    struct hw_aq_atl_utils_fw_rpc **rpc)
+int hw_atl_utils_fw_rpc_wait(struct aq_hw_s *self,
+			     struct hw_aq_atl_utils_fw_rpc **rpc)
 {
 	int err = 0;
 	struct aq_hw_atl_utils_fw_rpc_tid_s sw;
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
index b875590efcbd..505c8a2abd9c 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.h
@@ -319,6 +319,11 @@ struct aq_stats_s *hw_atl_utils_get_hw_stats(struct aq_hw_s *self);
 int hw_atl_utils_fw_downld_dwords(struct aq_hw_s *self, u32 a,
 				  u32 *p, u32 cnt);
 
+int hw_atl_utils_fw_rpc_call(struct aq_hw_s *self, unsigned int rpc_size);
+
+int hw_atl_utils_fw_rpc_wait(struct aq_hw_s *self,
+			     struct hw_aq_atl_utils_fw_rpc **rpc);
+
 extern const struct aq_fw_ops aq_fw_1x_ops;
 extern const struct aq_fw_ops aq_fw_2x_ops;
 
diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
index e37943760a58..6300d94c9ff0 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils_fw2x.c
@@ -21,6 +21,7 @@
 
 #define HW_ATL_FW2X_MPI_EFUSE_ADDR	0x364
 #define HW_ATL_FW2X_MPI_MBOX_ADDR	0x360
+#define HW_ATL_FW2X_MPI_RPC_ADDR        0x334
 
 #define HW_ATL_FW2X_MPI_CONTROL_ADDR	0x368
 #define HW_ATL_FW2X_MPI_CONTROL2_ADDR	0x36C
@@ -40,6 +41,10 @@ static int aq_fw2x_init(struct aq_hw_s *self)
 	AQ_HW_WAIT_FOR(0U != (self->mbox_addr =
 			aq_hw_read_reg(self, HW_ATL_FW2X_MPI_MBOX_ADDR)),
 		       1000U, 10U);
+	AQ_HW_WAIT_FOR(0U != (self->rpc_addr =
+		       aq_hw_read_reg(self, HW_ATL_FW2X_MPI_RPC_ADDR)),
+		       1000U, 100U);
+
 	return err;
 }
 
-- 
2.7.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox