* Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races
From: Jose Abreu @ 2018-09-03 15:22 UTC (permalink / raw)
To: Jerome Brunet, Jose Abreu, netdev
Cc: Martin Blumenstingl, David S. Miller, Joao Pinto,
Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <70c8a824911deb73fa9dba2c0354ee4ed6623af8.camel@baylibre.com>
On 03-09-2018 15:10, Jerome Brunet wrote:
> On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
>> On 03-09-2018 11:16, Jerome Brunet wrote:
>>> No notable change. Rx is fine but Tx:
>>> [ 5] 3.00-4.00 sec 3.55 MBytes 29.8 Mbits/sec 51 12.7 KBytes
>>>
>>> I suppose the problem has something to do with the retries. When doing the Tx test
>>> alone, we don't have such a thing and throughput is where we expect it to be.
>> Yeah, I just remembered you are not using GMAC4 so it wouldn't
>> make a difference. Is your version 3.710? If so please try adding
>> the following compatible to your DT bindings "snps,dwmac-3.710".
> According to the documentation, it is a 3.70a but I learned (the hard way) not to
> trust the documentation too much. Is there any way to make sure which version we
> have? Like a register to read?
It should be dumped at probe by a string like this one:
"User ID: 0xXY, Synopsys ID: 0xXZ"
>
> Out of curiosity, I changed the compatible to "snps,dwmac-3.710" anyway. For
> some reason, the MDIO bus failed to register with this. Since it is not the
> documented version, I did not check why.
No, you can't replace it; you need to add it. So it should look like this:
compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac",
"snps,dwmac-3.710";
Thanks and Best Regards,
Jose Miguel Abreu
>
>>> By the way, your mailer (and its auto 80 column rule I suppose) made the patch
>>> below a bit harder to apply
>> Sorry. Next time I will send as attachment.
> No worries
>
>> Thanks and Best Regards,
>> Jose Miguel Abreu
>
^ permalink raw reply
* [PATCH net] net/mlx5: Fix SQ offset in QPs with small RQ
From: Tariq Toukan @ 2018-09-03 15:06 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Eran Ben Elisha, Saeed Mahameed, Alaa Hleihel,
Tariq Toukan
Correct the formula for calculating the RQ page remainder,
which should be in byte granularity. The result will be
non-zero only for RQs smaller than PAGE_SIZE, as an RQ size
is a power of 2.
Divide this by the SQ stride (MLX5_SEND_WQE_BB) to get the
SQ offset in strides granularity.
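
The arithmetic described above can be sketched in userspace as follows. This is only an illustration: the names rq_byte_size() and sq_strides_offset_sketch() and the constants SKETCH_PAGE_SIZE (4096) and SKETCH_SEND_WQE_BB (64) are assumptions made for the example, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the driver uses PAGE_SIZE and
 * MLX5_SEND_WQE_BB from its own headers. */
#define SKETCH_PAGE_SIZE   4096u
#define SKETCH_SEND_WQE_BB 64u

/* RQ size in bytes: (1 << log_size) entries of (1 << log_stride) bytes. */
static uint32_t rq_byte_size(uint32_t log_stride, uint32_t log_size)
{
	assert(log_stride + log_size < 32);
	return 1u << (log_stride + log_size);
}

/* SQ offset, in SQ strides, behind an RQ that shares the first page. */
uint32_t sq_strides_offset_sketch(uint32_t log_stride, uint32_t log_size)
{
	/* Byte-granularity remainder: non-zero only when the RQ is
	 * smaller than a page, since its size is a power of 2. */
	uint32_t rq_pg_remainder =
		rq_byte_size(log_stride, log_size) % SKETCH_PAGE_SIZE;

	return rq_pg_remainder / SKETCH_SEND_WQE_BB;
}
```

Under these assumptions, an RQ of 16 entries with a 64-byte stride occupies 1024 bytes, so the SQ starts 16 strides into the page; an RQ of a full page or more yields an offset of 0.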
Fixes: d7037ad73daa ("net/mlx5: Fix QP fragmented buffer allocation")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/wq.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Hi Dave,
Please queue for -stable v4.18.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/wq.c b/drivers/net/ethernet/mellanox/mlx5/core/wq.c
index 86478a6b99c5..c8c315eb5128 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/wq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/wq.c
@@ -139,14 +139,15 @@ int mlx5_wq_qp_create(struct mlx5_core_dev *mdev, struct mlx5_wq_param *param,
struct mlx5_wq_ctrl *wq_ctrl)
{
u32 sq_strides_offset;
+ u32 rq_pg_remainder;
int err;
mlx5_fill_fbc(MLX5_GET(qpc, qpc, log_rq_stride) + 4,
MLX5_GET(qpc, qpc, log_rq_size),
&wq->rq.fbc);
- sq_strides_offset =
- ((wq->rq.fbc.frag_sz_m1 + 1) % PAGE_SIZE) / MLX5_SEND_WQE_BB;
+ rq_pg_remainder = mlx5_wq_cyc_get_byte_size(&wq->rq) % PAGE_SIZE;
+ sq_strides_offset = rq_pg_remainder / MLX5_SEND_WQE_BB;
mlx5_fill_fbc_offset(ilog2(MLX5_SEND_WQE_BB),
MLX5_GET(qpc, qpc, log_sq_size),
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH net-next 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races
From: Jose Abreu @ 2018-09-03 15:19 UTC (permalink / raw)
To: Jerome Brunet, Jose Abreu, netdev
Cc: Martin Blumenstingl, David S. Miller, Joao Pinto,
Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <542e586ab7740fb5c694faa0b2bbf829264d62f3.camel@baylibre.com>
[-- Attachment #1: Type: text/plain, Size: 1185 bytes --]
On 03-09-2018 15:07, Jerome Brunet wrote:
>
> You had it in what you sent in the RFT, but this is different.
Yeah, I had to fix the logic where tx queues != rx queues...
>
> Like with the RFT, the network breakdown we had is no longer reproduced.
> However this patch wrecks the Rx throughput (680MBps -> 35MBps)
Damn, that's low. And I can't reproduce it here :/
Strange, because I barely messed around with the RX path...
Can you try the attached patch on top of this one, please?
>
> BTW, this patch and the RFT assume that 4ae0169fd1b3 ("net: stmmac: Do not keep
> rearming the coalesce timer in stmmac_xmit") is still applied but I believe
> David reverted the patch.
>
> If you still need this change, you should include it back in your changeset.
Yes I know it was reverted but -net was not merged into -next yet...
Thanks and Best Regards,
Jose Miguel Abreu
>
>> Thanks and Best Regards,
>> Jose Miguel Abreu
>> ---
>> drivers/net/ethernet/stmicro/stmmac/common.h | 4 +-
>> drivers/net/ethernet/stmicro/stmmac/stmmac.h | 7 +-
>> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 177 +++++++++++++++-------
>> 3 files changed, 126 insertions(+), 62 deletions(-)
>
[-- Attachment #2: fixup.patch --]
[-- Type: text/x-patch, Size: 668 bytes --]
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 14f890f2a970..3c7cfda80433 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2247,10 +2247,8 @@ static void stmmac_tx_timer(struct timer_list *t)
struct stmmac_tx_queue *tx_q = from_timer(tx_q, t, txtimer);
struct stmmac_priv *priv = tx_q->priv_data;
- if (napi_schedule_prep(&tx_q->napi)) {
- stmmac_disable_dma_irq(priv, priv->ioaddr, tx_q->queue_index);
+ if (napi_schedule_prep(&tx_q->napi))
__napi_schedule(&tx_q->napi);
- }
tx_q->tx_timer_active = 0;
}
^ permalink raw reply related
* KASAN: use-after-free Read in sock_i_ino
From: syzbot @ 2018-09-03 19:31 UTC (permalink / raw)
To: davem, jon.maloy, linux-kernel, netdev, syzkaller-bugs,
tipc-discussion, ying.xue
Hello,
syzbot found the following crash on:
HEAD commit: dc6417949297 Merge branch 'net_sched-reject-unknown-tcfa_a..
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=13e9e70e400000
kernel config: https://syzkaller.appspot.com/x/.config?x=531a917630d2a492
dashboard link: https://syzkaller.appspot.com/bug?extid=48804b87c16588ad491d
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1324fda6400000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+48804b87c16588ad491d@syzkaller.appspotmail.com
IPv6: ADDRCONF(NETDEV_UP): veth1: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
==================================================================
BUG: KASAN: use-after-free in sock_i_ino+0x94/0xa0 net/core/sock.c:1921
Read of size 8 at addr ffff8801ba5e40b0 by task syz-executor0/5008
CPU: 0 PID: 5008 Comm: syz-executor0 Not tainted 4.19.0-rc1+ #77
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
sock_i_ino+0x94/0xa0 net/core/sock.c:1921
tipc_sk_fill_sock_diag+0x3be/0xdb0 net/tipc/socket.c:3316
__tipc_add_sock_diag+0x22f/0x360 net/tipc/diag.c:62
tipc_nl_sk_walk+0x122/0x1d0 net/tipc/socket.c:3250
tipc_diag_dump+0x24/0x30 net/tipc/diag.c:73
netlink_dump+0x519/0xd50 net/netlink/af_netlink.c:2233
__netlink_dump_start+0x4f1/0x6f0 net/netlink/af_netlink.c:2329
netlink_dump_start include/linux/netlink.h:213 [inline]
tipc_sock_diag_handler_dump+0x28e/0x3d0 net/tipc/diag.c:91
__sock_diag_cmd net/core/sock_diag.c:232 [inline]
sock_diag_rcv_msg+0x31d/0x410 net/core/sock_diag.c:263
netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:274
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:621 [inline]
sock_sendmsg+0xd5/0x120 net/socket.c:631
___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
__sys_sendmsg+0x11d/0x290 net/socket.c:2152
__do_sys_sendmsg net/socket.c:2161 [inline]
__se_sys_sendmsg net/socket.c:2159 [inline]
__x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457089
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fa11f7e0c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fa11f7e16d4 RCX: 0000000000457089
RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000006
RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d4570 R14: 00000000004c8d49 R15: 0000000000000000
Allocated by task 5008:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
kmem_cache_alloc+0x12e/0x710 mm/slab.c:3554
sock_alloc_inode+0x1d/0x260 net/socket.c:244
alloc_inode+0x63/0x190 fs/inode.c:210
new_inode_pseudo+0x71/0x1a0 fs/inode.c:903
sock_alloc+0x41/0x270 net/socket.c:547
__sock_create+0x175/0x940 net/socket.c:1239
sock_create net/socket.c:1315 [inline]
__sys_socket+0x106/0x260 net/socket.c:1345
__do_sys_socket net/socket.c:1354 [inline]
__se_sys_socket net/socket.c:1352 [inline]
__x64_sys_socket+0x73/0xb0 net/socket.c:1352
do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 5007:
save_stack+0x43/0xd0 mm/kasan/kasan.c:448
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
__cache_free mm/slab.c:3498 [inline]
kmem_cache_free+0x86/0x280 mm/slab.c:3756
sock_destroy_inode+0x51/0x60 net/socket.c:272
destroy_inode+0x159/0x200 fs/inode.c:267
evict+0x5d5/0x990 fs/inode.c:575
iput_final fs/inode.c:1547 [inline]
iput+0x5fa/0xa00 fs/inode.c:1573
dentry_unlink_inode+0x461/0x5e0 fs/dcache.c:374
__dentry_kill+0x44c/0x7a0 fs/dcache.c:566
dentry_kill+0xc9/0x5a0 fs/dcache.c:685
dput.part.26+0x66b/0x7a0 fs/dcache.c:846
dput+0x15/0x20 fs/dcache.c:828
__fput+0x4d4/0xa40 fs/file_table.c:291
____fput+0x15/0x20 fs/file_table.c:309
task_work_run+0x1e8/0x2a0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:193 [inline]
exit_to_usermode_loop+0x318/0x380 arch/x86/entry/common.c:166
prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
do_syscall_64+0x6be/0x820 arch/x86/entry/common.c:293
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff8801ba5e4040
which belongs to the cache sock_inode_cache(17:syz0) of size 984
The buggy address is located 112 bytes inside of
984-byte region [ffff8801ba5e4040, ffff8801ba5e4418)
The buggy address belongs to the page:
page:ffffea0006e97900 count:1 mapcount:0 mapping:ffff8801d09a3780
index:0xffff8801ba5e4ffd
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0006e93e08 ffffea0006e97988 ffff8801d09a3780
raw: ffff8801ba5e4ffd ffff8801ba5e4040 0000000100000003 ffff8801b1d94a40
page dumped because: kasan: bad access detected
page->mem_cgroup:ffff8801b1d94a40
Memory state around the buggy address:
ffff8801ba5e3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
ffff8801ba5e4000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
> ffff8801ba5e4080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8801ba5e4100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8801ba5e4180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
^ permalink raw reply
* Re: [PATCH v7 1/4] gpiolib: Pass bitmaps, not integer arrays, to get/set array
From: Geert Uytterhoeven @ 2018-09-03 15:07 UTC (permalink / raw)
To: Janusz Krzysztofik
Cc: Linus Walleij, Jonathan Corbet, Miguel Ojeda Sandonis,
peter.korsgaard, Peter Rosin, Ulf Hansson, Andrew Lunn,
Florian Fainelli, David S. Miller, Dominik Brodowski, Greg KH,
Kishon Vijay Abraham I, Lars-Peter Clausen, Michael Hennerich,
Jonathan Cameron, Hartmut Knaack, Peter Meerwald, Jiri Slaby,
Willy Tarreau, open list:DOCUMENTATION
In-Reply-To: <20180902120144.6855-2-jmkrzyszt@gmail.com>
Hi Janusz,
On Sun, Sep 2, 2018 at 2:01 PM Janusz Krzysztofik <jmkrzyszt@gmail.com> wrote:
> Most users of get/set array functions iterate consecutive bits of data,
> usually a single integer, while processing array of results obtained
> from, or building an array of values to be passed to those functions.
> Save time wasted on those iterations by changing the functions' API to
> accept bitmaps.
>
> All current users are updated as well.
>
> More benefits from the change are expected as soon as planned support
> for accepting/passing those bitmaps directly from/to respective GPIO
> chip callbacks if applicable is implemented.
>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Miguel Ojeda Sandonis <miguel.ojeda.sandonis@gmail.com>
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Sebastien Bourdelin <sebastien.bourdelin@savoirfairelinux.com>
> Cc: Lukas Wunner <lukas@wunner.de>
> Cc: Peter Korsgaard <peter.korsgaard@barco.com>
> Cc: Peter Rosin <peda@axentia.se>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Florian Fainelli <f.fainelli@gmail.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Rojhalat Ibrahim <imr@rtschenk.de>
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Cc: Russell King <rmk+kernel@armlinux.org.uk>
> Cc: Kishon Vijay Abraham I <kishon@ti.com>
> Cc: Tony Lindgren <tony@atomide.com>
> Cc: Lars-Peter Clausen <lars@metafoo.de>
> Cc: Michael Hennerich <Michael.Hennerich@analog.com>
> Cc: Jonathan Cameron <jic23@kernel.org>
> Cc: Hartmut Knaack <knaack.h@gmx.de>
> Cc: Peter Meerwald-Stadler <pmeerw@pmeerw.net>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Jiri Slaby <jslaby@suse.com>
> Cc: Yegor Yefremov <yegorslists@googlemail.com>
> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com>
> Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
> --- a/drivers/auxdisplay/hd44780.c
> +++ b/drivers/auxdisplay/hd44780.c
> @@ -62,17 +62,12 @@ static void hd44780_strobe_gpio(struct hd44780 *hd)
> /* write to an LCD panel register in 8 bit GPIO mode */
> static void hd44780_write_gpio8(struct hd44780 *hd, u8 val, unsigned int rs)
> {
> - int values[10]; /* for DATA[0-7], RS, RW */
> - unsigned int i, n;
> -
> - for (i = 0; i < 8; i++)
> - values[PIN_DATA0 + i] = !!(val & BIT(i));
> - values[PIN_CTRL_RS] = rs;
> - n = 9;
> - if (hd->pins[PIN_CTRL_RW]) {
> - values[PIN_CTRL_RW] = 0;
> - n++;
> - }
> + DECLARE_BITMAP(values, 10); /* for DATA[0-7], RS, RW */
> + unsigned int n;
> +
> + *values = val;
Given DECLARE_BITMAP() creates an array, the above line looks a bit funny now.
IMHO, either you use
unsigned long values;
values = val;
__assign_bit(8, &values, rs);
or
DECLARE_BITMAP(values, 10);
values[0] = val;
__assign_bit(8, values, rs);
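
For readers without a kernel tree at hand, the second variant can be approximated in userspace roughly as below. sk_assign_bit() is a hypothetical stand-in for the kernel's __assign_bit(), and hd44780_pack_sketch() is an invented name; the sketch only shows how DATA[0-7] and RS end up in one word.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical userspace stand-in for the kernel's __assign_bit(). */
static void sk_assign_bit(unsigned int nr, unsigned long *addr, bool value)
{
	unsigned long mask = 1UL << (nr % SK_BITS_PER_LONG);
	unsigned long *p = addr + nr / SK_BITS_PER_LONG;

	if (value)
		*p |= mask;
	else
		*p &= ~mask;
}

/* Pack DATA[0-7] into bits 0-7 and RS into bit 8 of a one-word bitmap. */
unsigned long hd44780_pack_sketch(uint8_t val, bool rs)
{
	unsigned long values[1] = { 0 };

	assert(8 < SK_BITS_PER_LONG);	/* RS must fit in the first word */
	values[0] = val;		/* explicit array indexing, not *values */
	sk_assign_bit(8, values, rs);
	return values[0];
}
```

With val = 0xA5 and rs = 1, the resulting word is 0x1A5.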
Nevertheless, for hd44780.c:
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* [PATCH net-next] net: usbnet: mark expected switch fall-through
From: Gustavo A. R. Silva @ 2018-09-03 18:48 UTC (permalink / raw)
To: Oliver Neukum, David S. Miller
Cc: netdev, linux-usb, linux-kernel, Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
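
As a minimal illustration of the annotation (not the usbnet code itself; the enum and function names are made up for the example), a deliberate fall-through is marked like this so that -Wimplicit-fallthrough does not flag it:

```c
/* Hypothetical states, loosely modelled on a Tx/Rx cleanup path. */
enum sk_entry_state { SK_TX_DONE, SK_RX_CLEANUP, SK_OTHER };

/* Returns the number of cleanup steps performed for the given state. */
int sk_cleanup_steps(enum sk_entry_state state)
{
	int steps = 0;

	switch (state) {
	case SK_TX_DONE:
		steps++;	/* Tx-only step, e.g. releasing a scatterlist */
		/* fall through */
	case SK_RX_CLEANUP:
		steps++;	/* shared step, e.g. releasing the URB and skb */
		break;
	default:
		break;
	}

	return steps;
}
```

The Tx case performs its own step and then the shared one, so it reports two steps while the Rx case reports one.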
Addresses-Coverity-ID: 1077614 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
drivers/net/usb/usbnet.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 770aa62..73aa333 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1527,6 +1527,7 @@ static void usbnet_bh (struct timer_list *t)
continue;
case tx_done:
kfree(entry->urb->sg);
+ /* fall through */
case rx_cleanup:
usb_free_urb (entry->urb);
dev_kfree_skb (skb);
--
2.7.4
^ permalink raw reply related
* Re: [PATCH v7 1/4] gpiolib: Pass bitmaps, not integer arrays, to get/set array
From: Geert Uytterhoeven @ 2018-09-03 14:24 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Janusz Krzysztofik, Linus Walleij, Jonathan Corbet,
Miguel Ojeda Sandonis, peter.korsgaard, Peter Rosin, Ulf Hansson,
Andrew Lunn, Florian Fainelli, David S. Miller, Dominik Brodowski,
Greg KH, Kishon Vijay Abraham I, Lars-Peter Clausen,
Michael Hennerich, Jonathan Cameron, Hartmut Knaack,
Peter Meerwald, Jiri Slaby, Willy Tarreau
In-Reply-To: <20180903043129.GA17856@bombadil.infradead.org>
On Mon, Sep 3, 2018 at 6:31 AM Matthew Wilcox <willy@infradead.org> wrote:
> > +++ b/drivers/auxdisplay/hd44780.c
> > @@ -62,17 +62,12 @@ static void hd44780_strobe_gpio(struct hd44780 *hd)
> > /* write to an LCD panel register in 8 bit GPIO mode */
> > static void hd44780_write_gpio8(struct hd44780 *hd, u8 val, unsigned int rs)
> > {
> > - int values[10]; /* for DATA[0-7], RS, RW */
> > - unsigned int i, n;
> > -
> > - for (i = 0; i < 8; i++)
> > - values[PIN_DATA0 + i] = !!(val & BIT(i));
> > - values[PIN_CTRL_RS] = rs;
> > - n = 9;
> > - if (hd->pins[PIN_CTRL_RW]) {
> > - values[PIN_CTRL_RW] = 0;
> > - n++;
> > - }
> > + DECLARE_BITMAP(values, 10); /* for DATA[0-7], RS, RW */
> > + unsigned int n;
> > +
> > + *values = val;
> > + __assign_bit(8, values, rs);
> > + n = hd->pins[PIN_CTRL_RW] ? 10 : 9;
>
> Doesn't this assume little endian bitmaps? Has anyone tested this on
> big-endian machines?
include/linux/bitops.h:
static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
bool value)
{
if (value)
__set_bit(nr, addr);
else
__clear_bit(nr, addr);
}
include/asm-generic/bitops/non-atomic.h:
static inline void __set_bit(int nr, volatile unsigned long *addr)
{
unsigned long mask = BIT_MASK(nr);
unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
*p |= mask;
}
include/linux/bits.h:
#define BIT_MASK(nr) (1UL << ((nr) % BITS_PER_LONG))
Looks like native endianness to me.
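
A small userspace sketch of the same idea, with SK_-prefixed names that are assumptions standing in for the kernel macros: because the bit is set by shifting a value inside an unsigned long, the word ends up with the same numeric value on any byte order. Byte order would only matter if the bitmap memory were reinterpreted as bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SK_BIT_MASK(nr)  (1UL << ((nr) % SK_BITS_PER_LONG))
#define SK_BIT_WORD(nr)  ((nr) / SK_BITS_PER_LONG)

/* Set bit nr in a bitmap made of nwords unsigned longs. */
void sk_set_bit(size_t nr, unsigned long *addr, size_t nwords)
{
	assert(SK_BIT_WORD(nr) < nwords);
	addr[SK_BIT_WORD(nr)] |= SK_BIT_MASK(nr);
}

/* Read a bit back through the same word-based view. */
int sk_test_bit(size_t nr, const unsigned long *addr)
{
	return (int)((addr[SK_BIT_WORD(nr)] >> (nr % SK_BITS_PER_LONG)) & 1UL);
}
```

Setting bit 8 makes the first word equal to 0x100 regardless of how that word is laid out in memory, which is why assigning an 8-bit value to the first word and then setting bit 8 composes as intended.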
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* [PATCH v2 net-next] packet: add sockopt to ignore outgoing packets
From: Vincent Whitchurch @ 2018-09-03 14:23 UTC (permalink / raw)
To: davem; +Cc: netdev, willemb, Vincent Whitchurch
Currently, the only way to ignore outgoing packets on a packet socket is
via the BPF filter. With MSG_ZEROCOPY, packets that are looped into
AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even
if the filter run from packet_rcv() would reject them. So the presence
of a packet socket on the interface takes away the benefits of
MSG_ZEROCOPY, even if the packet socket is not interested in outgoing
packets. (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily
cloned, but the cost for that is much lower.)
Add a socket option to allow AF_PACKET sockets to ignore outgoing
packets to solve this. Note that the *BSDs already have something
similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT.
The first intended user is lldpd.
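
From userspace, the option would presumably be used along these lines. This is a sketch assuming a kernel with this patch applied; packet_ignore_outgoing_sketch() is a made-up helper name, and the fallback definitions only mirror the value 23 added to if_packet.h below and the usual SOL_PACKET level.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Fallbacks for older headers; 23 matches the value added by this patch. */
#ifndef PACKET_IGNORE_OUTGOING
#define PACKET_IGNORE_OUTGOING 23
#endif
#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif

/*
 * Ask the kernel not to deliver outgoing packets to the AF_PACKET
 * socket fd. Returns 0 on success, -1 with errno set on failure.
 */
int packet_ignore_outgoing_sketch(int fd, int enable)
{
	int val = enable;

	/* The patch accepts only 0 or 1; reject anything else early. */
	if (val < 0 || val > 1) {
		errno = EINVAL;
		return -1;
	}

	return setsockopt(fd, SOL_PACKET, PACKET_IGNORE_OUTGOING,
			  &val, sizeof(val));
}
```

A daemon that only listens, such as an LLDP receiver, would call this once right after creating its packet socket.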
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
---
v2: Stricter value validation.
Moved ignore check out of skb_loop_sk().
include/linux/netdevice.h | 1 +
include/uapi/linux/if_packet.h | 1 +
net/core/dev.c | 3 +++
net/packet/af_packet.c | 17 +++++++++++++++++
4 files changed, 22 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ca5ab98053c8..8ef14d9edc58 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2317,6 +2317,7 @@ static inline struct sk_buff *call_gro_receive_sk(gro_receive_sk_t cb,
struct packet_type {
__be16 type; /* This is really htons(ether_type). */
+ bool ignore_outgoing;
struct net_device *dev; /* NULL is wildcarded here */
int (*func) (struct sk_buff *,
struct net_device *,
diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index 67b61d91d89b..467b654bd4c7 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -57,6 +57,7 @@ struct sockaddr_ll {
#define PACKET_QDISC_BYPASS 20
#define PACKET_ROLLOVER_STATS 21
#define PACKET_FANOUT_DATA 22
+#define PACKET_IGNORE_OUTGOING 23
#define PACKET_FANOUT_HASH 0
#define PACKET_FANOUT_LB 1
diff --git a/net/core/dev.c b/net/core/dev.c
index 325fc5088370..09dcf190c081 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1970,6 +1970,9 @@ void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
rcu_read_lock();
again:
list_for_each_entry_rcu(ptype, ptype_list, list) {
+ if (ptype->ignore_outgoing)
+ continue;
+
/* Never send packets back to the socket
* they originated from - MvS (miquels@drinkel.ow.org)
*/
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 5610061e7f2e..23336498eb9f 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3805,6 +3805,20 @@ packet_setsockopt(struct socket *sock, int level, int optname, char __user *optv
return fanout_set_data(po, optval, optlen);
}
+ case PACKET_IGNORE_OUTGOING:
+ {
+ int val;
+
+ if (optlen != sizeof(val))
+ return -EINVAL;
+ if (copy_from_user(&val, optval, sizeof(val)))
+ return -EFAULT;
+ if (val < 0 || val > 1)
+ return -EINVAL;
+
+ po->prot_hook.ignore_outgoing = !!val;
+ return 0;
+ }
case PACKET_TX_HAS_OFF:
{
unsigned int val;
@@ -3928,6 +3942,9 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
((u32)po->fanout->flags << 24)) :
0);
break;
+ case PACKET_IGNORE_OUTGOING:
+ val = po->prot_hook.ignore_outgoing;
+ break;
case PACKET_ROLLOVER_STATS:
if (!po->rollover)
return -EINVAL;
--
2.11.0
^ permalink raw reply related
* Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
From: Christian Brauner @ 2018-09-03 14:22 UTC (permalink / raw)
To: Kirill Tkhai
Cc: netdev, linux-kernel, davem, kuznet, yoshfuji, pombredanne,
kstewart, gregkh, dsahern, fw, lucien.xin, jakub.kicinski, jbenc,
nicolas.dichtel
In-Reply-To: <2319a029-7aca-b7aa-2e8f-4dfdeedcb6df@virtuozzo.com>
On Mon, Sep 03, 2018 at 04:41:45PM +0300, Kirill Tkhai wrote:
> On 01.09.2018 04:34, Christian Brauner wrote:
> > On Thu, Aug 30, 2018 at 04:45:45PM +0200, Christian Brauner wrote:
> >> On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
> >>> On 29.08.2018 21:13, Christian Brauner wrote:
> >>>> Hi Kirill,
> >>>>
> >>>> Thanks for the question!
> >>>>
> >>>> On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
> >>>>> Hi, Christian,
> >>>>>
> >>>>> On 29.08.2018 02:18, Christian Brauner wrote:
> >>>>>> From: Christian Brauner <christian@brauner.io>
> >>>>>>
> >>>>>> Hey,
> >>>>>>
> >>>>>> A while back we introduced and enabled IFLA_IF_NETNSID in
> >>>>>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> >>>>>> to significant performance increases since it allows userspace to avoid
> >>>>>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> >>>>>> interfaces from the netns associated with the netns_fd. Especially when a
> >>>>>> lot of network namespaces are in use, using setns() becomes increasingly
> >>>>>> problematic when performance matters.
> >>>>>
> >>>>> could you please give a real example, when setns()+socket(AF_NETLINK) cause
> >>>>> problems with the performance? You should do this only once on application
> >>>>> startup, and then you have created netlink sockets in any net namespaces you
> >>>>> need. What is the problem here?
> >>>>
> >>>> So we have a daemon (LXD) that is often running thousands of containers.
> >>>> When users issue a lxc list request against the daemon it returns a list
> >>>> of all containers including all of the interfaces and addresses for each
> >>>> container. To retrieve those addresses we currently rely on setns() +
> >>>> getifaddrs() for each of those containers. That has horrible
> >>>> performance.
> >>>
> >>> Could you please provide some numbers showing that setns()
> >>> introduces a significant performance decrease in the application?
> >>
> >> Sure, might take a few days++ though since I'm traveling.
> >
> > Hey Kirill,
> >
> > As promised here's some code [1] that compares performance. I basically
> > did a setns() to the network namespace and called getifaddrs() and
> > compared this to the scenario where I use the newly introduced property.
> > I did this 1 million times and calculated the mean getifaddrs()
> > retrieval time based on that.
> > My patch cuts the time in half.
> >
> > brauner@wittgenstein:~/netns_getifaddrs$ ./getifaddrs_perf 0 1178
> > Mean time in microseconds (netnsid): 81
> > Mean time in microseconds (setns): 162
> >
> > Christian
> >
> > I'm only appending the main file since the netns_getifaddrs() code I
> > used is pretty long:
> >
> > [1]:
> >
> > #define _GNU_SOURCE
> > #define __STDC_FORMAT_MACROS
> > #include <fcntl.h>
> > #include <inttypes.h>
> > #include <linux/types.h>
> > #include <sched.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <sys/stat.h>
> > #include <sys/time.h>
> > #include <sys/types.h>
> > #include <unistd.h>
> >
> > #include "netns_getifaddrs.h"
> > #include "print_getifaddrs.h"
> >
> > #define ITERATIONS 1000000
> > #define SEC_TO_MICROSEC(x) (1000000 * (x))
> >
> > int main(int argc, char *argv[])
> > {
> > int i, ret;
> > __s32 netns_id;
> > pid_t netns_pid;
> > char path[1024];
> > intmax_t times[ITERATIONS];
> > struct timeval t1, t2;
> > intmax_t time_in_mcs;
> > int fret = EXIT_FAILURE;
> > intmax_t sum = 0;
> > int host_netns_fd = -1, netns_fd = -1;
> >
> > struct ifaddrs *ifaddrs = NULL;
> >
> > if (argc != 3)
> > goto on_error;
> >
> > netns_id = atoi(argv[1]);
> > netns_pid = atoi(argv[2]);
> > printf("%d\n", netns_id);
> > printf("%d\n", netns_pid);
> >
> > for (i = 0; i < ITERATIONS; i++) {
> > ret = gettimeofday(&t1, NULL);
> > if (ret < 0)
> > goto on_error;
> >
> > ret = netns_getifaddrs(&ifaddrs, netns_id);
> > freeifaddrs(ifaddrs);
> > if (ret < 0)
> > goto on_error;
> >
> > ret = gettimeofday(&t2, NULL);
> > if (ret < 0)
> > goto on_error;
> >
> > time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> > (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> > times[i] = time_in_mcs;
> > }
> >
> > for (i = 0; i < ITERATIONS; i++)
> > sum += times[i];
> >
> > printf("Mean time in microseconds (netnsid): %ju\n",
> > sum / ITERATIONS);
> >
> > ret = snprintf(path, sizeof(path), "/proc/%d/ns/net", netns_pid);
> > if (ret < 0 || (size_t)ret >= sizeof(path))
> > goto on_error;
> >
> > netns_fd = open(path, O_RDONLY | O_CLOEXEC);
> > if (netns_fd < 0)
> > goto on_error;
> >
> > host_netns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
> > if (host_netns_fd < 0)
> > goto on_error;
> >
> > memset(times, 0, sizeof(times));
> > for (i = 0; i < ITERATIONS; i++) {
> > ret = gettimeofday(&t1, NULL);
> > if (ret < 0)
> > goto on_error;
> >
> > ret = setns(netns_fd, CLONE_NEWNET);
> > if (ret < 0)
> > goto on_error;
> >
> > ret = getifaddrs(&ifaddrs);
> > freeifaddrs(ifaddrs);
> > if (ret < 0)
> > goto on_error;
> >
> > ret = gettimeofday(&t2, NULL);
> > if (ret < 0)
> > goto on_error;
> >
> > ret = setns(host_netns_fd, CLONE_NEWNET);
> > if (ret < 0)
> > goto on_error;
> >
> > time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> > (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> > times[i] = time_in_mcs;
> > }
> >
> > for (i = 0; i < ITERATIONS; i++)
> > sum += times[i];
> >
> > printf("Mean time in microseconds (setns): %ju\n",
> > sum / ITERATIONS);
> >
> > fret = EXIT_SUCCESS;
> >
> > on_error:
> > if (netns_fd >= 0)
> > close(netns_fd);
> >
> > if (host_netns_fd >= 0)
> > close(host_netns_fd);
> >
> > exit(fret);
> > }
>
> But this is a synthetic test, while I asked about a real workflow.
> Is this a real problem for lxd, and is there an observed performance
> decrease?
As you can see in this mail, I explicitly stated that it is a real
performance issue we see with LXD. You asked for numbers; I gave you
numbers by writing a test program just per your request. The benefit of
this "synthetic" case is that it allows us to clearly see the
performance benefit. Expecting me to hack all of this into LXD just to
get some perf numbers that will show the exact same thing per your
request is - and I hope I'm not being unreasonable here - expecting a
bit much.
>
> I see there is already nsid use in existing code, but I have to say
> that adding new types of variables as system call arguments makes it
> less modular. When you request RTM_GETADDR for a specific nsid, this
> obligates the kernel to make everything unchangeable during the call,
> doesn't it?
>
> We may look at existing code as an example of the problems this may cause.
> Look at do_setlink(). There are many different types of variables,
> and all of them should be dereferenced atomically. So the whole function
> is executed under the global rtnl. And this causes delays in other config
> places which are sensitive to rtnl. So adding more dimensions to RTM_GETADDR
> may turn it into the same overloaded function as do_setlink() is. And one
> day, when we reach the state where we must rework all of this, we won't
> be able to do so. I'm not sure it is not already too late.
>
> I mention this because it's possible we should consider another
> approach to rtnl communication in general, and stop overloading it.
While I sympathize with your concerns this all seems very vague. There
is a real-world use case that is solved by this patchset.
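
For anyone who wants to repeat this kind of comparison, a timing helper could look roughly like the sketch below. It is not the program quoted above: mean_elapsed_us(), bench_fn and sk_count_call() are hypothetical names, and it uses clock_gettime(CLOCK_MONOTONIC) with a per-call accumulator so that each measured variant starts from zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Workload under test; returns < 0 on failure. */
typedef int (*bench_fn)(void *arg);

/* Trivial example workload: counts how often it was called. */
int sk_count_call(void *arg)
{
	(*(int *)arg)++;
	return 0;
}

/*
 * Run fn(arg) `iterations` times and return the mean wall-clock time
 * per call in microseconds, or -1 on error.
 */
int64_t mean_elapsed_us(bench_fn fn, void *arg, int iterations)
{
	int64_t sum_ns = 0;	/* local, so every measurement starts at 0 */
	struct timespec t1, t2;
	int i;

	if (iterations <= 0)
		return -1;

	for (i = 0; i < iterations; i++) {
		if (clock_gettime(CLOCK_MONOTONIC, &t1) < 0)
			return -1;
		if (fn(arg) < 0)
			return -1;
		if (clock_gettime(CLOCK_MONOTONIC, &t2) < 0)
			return -1;

		sum_ns += (int64_t)(t2.tv_sec - t1.tv_sec) * 1000000000LL +
			  (t2.tv_nsec - t1.tv_nsec);
	}

	return sum_ns / iterations / 1000;
}
```

Each variant (for instance a netnsid-based lookup and a setns()-based one) would be wrapped in its own bench_fn and measured with a separate call, so neither result depends on the other.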
^ permalink raw reply
* Re: [PATCH v7 3/4] gpiolib: Pass array info to get/set array functions
From: Geert Uytterhoeven @ 2018-09-03 14:21 UTC (permalink / raw)
To: Janusz Krzysztofik
Cc: Linus Walleij, Jonathan Corbet, Miguel Ojeda Sandonis,
peter.korsgaard, Peter Rosin, Ulf Hansson, Andrew Lunn,
Florian Fainelli, David S. Miller, Dominik Brodowski, Greg KH,
Kishon Vijay Abraham I, Lars-Peter Clausen, Michael Hennerich,
Jonathan Cameron, Hartmut Knaack, Peter Meerwald, Jiri Slaby,
Willy Tarreau, open list:DOCUMENTATION
In-Reply-To: <20180902120144.6855-4-jmkrzyszt@gmail.com>
On Sun, Sep 2, 2018 at 2:01 PM Janusz Krzysztofik <jmkrzyszt@gmail.com> wrote:
> In order to make use of array info obtained from gpiod_get_array() and
> speed up processing of arrays matching single GPIO chip layout, that
> information must be passed to get/set array functions. Extend the
> functions' API with that additional parameter and update all users.
> Pass NULL if a user bulids an array itself from single GPIOs.
builds
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
^ permalink raw reply
* Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races
From: Jerome Brunet @ 2018-09-03 14:10 UTC (permalink / raw)
To: Jose Abreu, netdev
Cc: Martin Blumenstingl, David S. Miller, Joao Pinto,
Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <678dfc4b-9945-186b-33b5-dd2c32984e6f@synopsys.com>
On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
> On 03-09-2018 11:16, Jerome Brunet wrote:
> > No notable change. Rx is fine but Tx:
> > [ 5] 3.00-4.00 sec 3.55 MBytes 29.8 Mbits/sec 51 12.7 KBytes
> >
> > I suppose the problem as something to do with the retries. When doing Tx test
> > alone, we don't have such a things a throughput where we expect it to be.
>
> Yeah, I just remembered you are not using GMAC4 so it wouldn't
> make a difference. Is your version 3.710? If so please try adding
> the following compatible to your DT bindings "snps,dwmac-3.710".
According to the documentation, it is a 3.70a, but I have learned (the hard
way) not to trust the documentation too much. Is there any way to check which
version we have, such as a register to read?
Out of curiosity, I changed the compatible to "snps,dwmac-3.710" anyway. For
some reason, the MDIO bus failed to register with this. Since it is not the
documented version, I did not check why.
>
> >
> > By the way, your mailer (and its auto 80 column rule I suppose) made the patch
> > below a bit harder to apply
>
> Sorry. Next time I will send as attachment.
No worries
>
> Thanks and Best Regards,
> Jose Miguel Abreu
^ permalink raw reply
* Re: [PATCH net-next 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races
From: Jerome Brunet @ 2018-09-03 14:07 UTC (permalink / raw)
To: Jose Abreu, netdev
Cc: Martin Blumenstingl, David S. Miller, Joao Pinto,
Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <573f2a5833c3baa0120e0b83d302d8edcaac7cfc.1535981331.git.joabreu@synopsys.com>
On Mon, 2018-09-03 at 14:35 +0100, Jose Abreu wrote:
> This follows David Miller advice and tries to fix coalesce timer in
> multi-queue scenarios.
>
> We are now using per-queue coalesce values and per-queue TX timer.
>
> Coalesce timer default values was changed to 1ms and the coalesce frames
> to 25.
>
> Tested in B2B setup between XGMAC2 and GMAC5.
>
> Signed-off-by: Jose Abreu <joabreu@synopsys.com>
> Cc: Jerome Brunet <jbrunet@baylibre.com>
> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Joao Pinto <jpinto@synopsys.com>
> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Cc: Alexandre Torgue <alexandre.torgue@st.com>
> ---
> Jerome,
>
> Can I have your Tested-by in this patch?
You had it on what you sent in the RFT, but this is different.
Like with the RFT, the network breakdown we had is no longer reproduced.
However, this patch wrecks the Rx throughput (680MBps -> 35MBps)
BTW, this patch and the RFT assume that 4ae0169fd1b3 ("net: stmmac: Do not keep
rearming the coalesce timer in stmmac_xmit") is still applied but I believe
David reverted the patch.
If you still need this change, you should include it back in your changeset.
>
> Thanks and Best Regards,
> Jose Miguel Abreu
> ---
> drivers/net/ethernet/stmicro/stmmac/common.h | 4 +-
> drivers/net/ethernet/stmicro/stmmac/stmmac.h | 7 +-
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 177 +++++++++++++++-------
> 3 files changed, 126 insertions(+), 62 deletions(-)
^ permalink raw reply
* Re: phys_port_id in switchdev mode?
From: Marcelo Ricardo Leitner @ 2018-09-03 13:55 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Florian Fainelli, Or Gerlitz, Simon Horman, Andy Gospodarek,
mchan@broadcom.com, Jiri Pirko, Alexander Duyck, Frederick Botha,
nick viljoen, netdev@vger.kernel.org
In-Reply-To: <20180901133412.2939466a@cakuba.netronome.com>
On Sat, Sep 01, 2018 at 01:34:12PM +0200, Jakub Kicinski wrote:
> On Fri, 31 Aug 2018 17:13:22 -0300, Marcelo Ricardo Leitner wrote:
> > On Tue, Aug 28, 2018 at 08:43:51PM +0200, Jakub Kicinski wrote:
> > > Ugh, CC: netdev..
> > >
> > > On Tue, 28 Aug 2018 20:05:39 +0200, Jakub Kicinski wrote:
> > > > Hi!
> > > >
> > > > I wonder if we can use phys_port_id in switchdev to group together
> > > > interfaces of a single PCI PF? Here is the problem:
> >
> > On Mellanox cards, this is already possible via phys_switch_id, as
> > each PF has its own phys_switch_id. So all VFs with a given
> > phys_switch_id belong to the PF with that same phys_switch_id.
>
> You mean Connect-X4 and on, Connect-X3 also shares PF between ports.
Yes, ConnectX-3 shares the PF between ports but doesn't support switchdev
mode.
I see the issue now. I was still considering the external ports as
uplink representors.
^ permalink raw reply
* [PATCH] can: peak_canfd: fix spelling mistake in fall-through annotation
From: Gustavo A. R. Silva @ 2018-09-03 18:09 UTC (permalink / raw)
To: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller
Cc: linux-can, netdev, linux-kernel, Gustavo A. R. Silva
Replace "fallthough" with a proper "fall through" annotation.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
drivers/net/can/peak_canfd/peak_pciefd_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/peak_canfd/peak_pciefd_main.c b/drivers/net/can/peak_canfd/peak_pciefd_main.c
index c458d5f..e4f4d65a 100644
--- a/drivers/net/can/peak_canfd/peak_pciefd_main.c
+++ b/drivers/net/can/peak_canfd/peak_pciefd_main.c
@@ -668,7 +668,7 @@ static int pciefd_can_probe(struct pciefd_board *pciefd)
pciefd_can_writereg(priv, CANFD_CLK_SEL_80MHZ,
PCIEFD_REG_CAN_CLK_SEL);
- /* fallthough */
+ /* fall through */
case CANFD_CLK_SEL_80MHZ:
priv->ucan.can.clock.freq = 80 * 1000 * 1000;
break;
--
2.7.4
^ permalink raw reply related
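As background on why the spelling of the annotation matters: compilers and
static checkers that warn about implicit fall-through (for example GCC with
-Wimplicit-fallthrough) recognise the comment by pattern, and a misspelled
"fallthough" does not match. The sketch below is only loosely modelled on the
switch touched by the patch; every name in it (example_clk_sel,
example_clock_freq, forced_default) is hypothetical and not taken from the
peak_canfd driver.

```c
#include <stdint.h>

/* Hypothetical clock selections, for illustration only. */
enum example_clk_sel {
	EX_CLK_SEL_20MHZ,
	EX_CLK_SEL_80MHZ,
	EX_CLK_SEL_UNKNOWN,
};

/*
 * Return the clock frequency in Hz for a selection. Unknown selections are
 * forced to the 80 MHz default; *forced_default records whether that
 * happened.
 */
uint32_t example_clock_freq(enum example_clk_sel sel, int *forced_default)
{
	uint32_t freq = 0;

	*forced_default = 0;

	switch (sel) {
	case EX_CLK_SEL_20MHZ:
		freq = 20 * 1000 * 1000;
		break;
	default:
		/* Stand-in for reprogramming the hardware to the default. */
		*forced_default = 1;
		/* fall through */
	case EX_CLK_SEL_80MHZ:
		freq = 80 * 1000 * 1000;
		break;
	}

	return freq;
}
```

Calling example_clock_freq(EX_CLK_SEL_UNKNOWN, &forced) takes the default
branch, sets forced, and then deliberately continues into the 80 MHz case.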
* [PATCH 9/9] nfp: Do not call pcie_print_link_status()
From: Alexandru Gagniuc @ 2018-09-03 18:02 UTC (permalink / raw)
To: linux-pci, bhelgaas
Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
Alexandru Gagniuc, Ariel Elior, everest-linux-l2, David S. Miller,
Michael Chan, Ganesh Goudar, Jeff Kirsher, Tariq Toukan,
Saeed Mahameed, Leon Romanovsky, Jakub Kicinski,
Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
linux-rdma, oss-drivers
In-Reply-To: <20180903180242.14504-1-mr.nuke.me@gmail.com>
This is now done by the PCI core to warn of sub-optimal bandwidth.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
index c8d0b1016a64..87dde0f787e9 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c
@@ -1328,7 +1328,6 @@ struct nfp_cpp *nfp_cpp_from_nfp6000_pcie(struct pci_dev *pdev)
/* Finished with card initialization. */
dev_info(&pdev->dev,
"Netronome Flow Processor NFP4000/NFP6000 PCIe Card Probe\n");
- pcie_print_link_status(pdev);
nfp = kzalloc(sizeof(*nfp), GFP_KERNEL);
if (!nfp) {
--
2.17.1
^ permalink raw reply related
* [PATCH 8/9] net/mlx5: Do not call pcie_print_link_status()
From: Alexandru Gagniuc @ 2018-09-03 18:02 UTC (permalink / raw)
To: linux-pci, bhelgaas
Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
Alexandru Gagniuc, Ariel Elior, everest-linux-l2, David S. Miller,
Michael Chan, Ganesh Goudar, Jeff Kirsher, Tariq Toukan,
Saeed Mahameed, Leon Romanovsky, Jakub Kicinski,
Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
linux-rdma, oss-drivers
In-Reply-To: <20180903180242.14504-1-mr.nuke.me@gmail.com>
This is now done by the PCI core to warn of sub-optimal bandwidth.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index cf3e4a659052..888af98694f8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1056,10 +1056,6 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv,
dev_info(&pdev->dev, "firmware version: %d.%d.%d\n", fw_rev_maj(dev),
fw_rev_min(dev), fw_rev_sub(dev));
- /* Only PFs hold the relevant PCIe information for this query */
- if (mlx5_core_is_pf(dev))
- pcie_print_link_status(dev->pdev);
-
/* on load removing any previous indication of internal error, device is
* up
*/
--
2.17.1
^ permalink raw reply related
* [PATCH 7/9] net/mlx4: Do not call pcie_print_link_status()
From: Alexandru Gagniuc @ 2018-09-03 18:02 UTC (permalink / raw)
To: linux-pci, bhelgaas
Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
Alexandru Gagniuc, Ariel Elior, everest-linux-l2, David S. Miller,
Michael Chan, Ganesh Goudar, Jeff Kirsher, Tariq Toukan,
Saeed Mahameed, Leon Romanovsky, Jakub Kicinski,
Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
linux-rdma, oss-drivers
In-Reply-To: <20180903180242.14504-1-mr.nuke.me@gmail.com>
This is now done by the PCI core to warn of sub-optimal bandwidth.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index d2d59444f562..9902fa3a2c13 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -3525,13 +3525,6 @@ static int mlx4_load_one(struct pci_dev *pdev, int pci_dev_data,
}
}
- /* check if the device is functioning at its maximum possible speed.
- * No return code for this call, just warn the user in case of PCI
- * express device capabilities are under-satisfied by the bus.
- */
- if (!mlx4_is_slave(dev))
- pcie_print_link_status(dev->persist->pdev);
-
/* In master functions, the communication channel must be initialized
* after obtaining its address from fw */
if (mlx4_is_master(dev)) {
--
2.17.1
^ permalink raw reply related
* [PATCH 6/9] ixgbe: Do not call pcie_print_link_status()
From: Alexandru Gagniuc @ 2018-09-03 18:02 UTC (permalink / raw)
To: linux-pci, bhelgaas
Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
Alexandru Gagniuc, Ariel Elior, everest-linux-l2, David S. Miller,
Michael Chan, Ganesh Goudar, Jeff Kirsher, Tariq Toukan,
Saeed Mahameed, Leon Romanovsky, Jakub Kicinski,
Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
linux-rdma, oss-drivers
In-Reply-To: <20180903180242.14504-1-mr.nuke.me@gmail.com>
This is now done by the PCI core to warn of sub-optimal bandwidth.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 26 -------------------
1 file changed, 26 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 9a23d33a47ed..9663419e0ceb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -241,28 +241,6 @@ static inline bool ixgbe_pcie_from_parent(struct ixgbe_hw *hw)
}
}
-static void ixgbe_check_minimum_link(struct ixgbe_adapter *adapter,
- int expected_gts)
-{
- struct ixgbe_hw *hw = &adapter->hw;
- struct pci_dev *pdev;
-
- /* Some devices are not connected over PCIe and thus do not negotiate
- * speed. These devices do not have valid bus info, and thus any report
- * we generate may not be correct.
- */
- if (hw->bus.type == ixgbe_bus_type_internal)
- return;
-
- /* determine whether to use the parent device */
- if (ixgbe_pcie_from_parent(&adapter->hw))
- pdev = adapter->pdev->bus->parent->self;
- else
- pdev = adapter->pdev;
-
- pcie_print_link_status(pdev);
-}
-
static void ixgbe_service_event_schedule(struct ixgbe_adapter *adapter)
{
if (!test_bit(__IXGBE_DOWN, &adapter->state) &&
@@ -10792,10 +10770,6 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
break;
}
- /* don't check link if we failed to enumerate functions */
- if (expected_gts > 0)
- ixgbe_check_minimum_link(adapter, expected_gts);
-
err = ixgbe_read_pba_string_generic(hw, part_str, sizeof(part_str));
if (err)
strlcpy(part_str, "Unknown", sizeof(part_str));
--
2.17.1
^ permalink raw reply related
* [PATCH 4/9] cxgb4: Do not call pcie_print_link_status()
From: Alexandru Gagniuc @ 2018-09-03 18:02 UTC (permalink / raw)
To: linux-pci, bhelgaas
Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
Alexandru Gagniuc, Ariel Elior, everest-linux-l2, David S. Miller,
Michael Chan, Ganesh Goudar, Jeff Kirsher, Tariq Toukan,
Saeed Mahameed, Leon Romanovsky, Jakub Kicinski,
Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
linux-rdma, oss-drivers
In-Reply-To: <20180903180242.14504-1-mr.nuke.me@gmail.com>
This is now done by the PCI core to warn of sub-optimal bandwidth.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 961e3087d1d3..1deb68c99a63 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -5782,9 +5782,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
free_msix_info(adapter);
}
- /* check for PCI Express bandwidth capabiltites */
- pcie_print_link_status(pdev);
-
err = init_rss(adapter);
if (err)
goto out_free_dev;
--
2.17.1
^ permalink raw reply related
* [PATCH 2/9] bnx2x: Do not call pcie_print_link_status()
From: Alexandru Gagniuc @ 2018-09-03 18:02 UTC (permalink / raw)
To: linux-pci, bhelgaas
Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
Alexandru Gagniuc, Ariel Elior, everest-linux-l2, David S. Miller,
Michael Chan, Ganesh Goudar, Jeff Kirsher, Tariq Toukan,
Saeed Mahameed, Leon Romanovsky, Jakub Kicinski,
Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
linux-rdma, oss-drivers
In-Reply-To: <20180903180242.14504-1-mr.nuke.me@gmail.com>
This is now done by the PCI core to warn of sub-optimal bandwidth.
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 71362b7f6040..9bd0852d9a66 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -14100,7 +14100,6 @@ static int bnx2x_init_one(struct pci_dev *pdev,
board_info[ent->driver_data].name,
(CHIP_REV(bp) >> 12) + 'A', (CHIP_METAL(bp) >> 4),
dev->base_addr, bp->pdev->irq, dev->dev_addr);
- pcie_print_link_status(bp->pdev);
bnx2x_register_phc(bp);
--
2.17.1
^ permalink raw reply related
* [PATCH 0/9] Export PCIe bandwidth via sysfs
From: Alexandru Gagniuc @ 2018-09-03 18:02 UTC (permalink / raw)
To: linux-pci, bhelgaas
Cc: keith.busch, alex_gagniuc, austin_bolen, shyam_iyer,
Alexandru Gagniuc, Ariel Elior, everest-linux-l2, David S. Miller,
Michael Chan, Ganesh Goudar, Jeff Kirsher, Tariq Toukan,
Saeed Mahameed, Leon Romanovsky, Jakub Kicinski,
Dirk van der Merwe, netdev, linux-kernel, intel-wired-lan,
linux-rdma, oss-drivers
This is a follow-on series to
Commit 2d1ce5ec2117 ("PCI: Check for PCIe Link downtraining")
The remaining issue was that some PCIe drivers print link status directly,
sometimes resulting in duplicate system log messages with degraded links.
From my understanding, the maintainers of these drivers are fine with
removing the duplicate prints as long as the bandwidth information is
readily available. sysfs seemed to be the consensus.
Example:
$ cat /sys/bus/pci/devices/0000:b1:00.0/available_bandwidth
7.876 Gb/s
Alexandru Gagniuc (9):
PCI: sysfs: Export available PCIe bandwidth
bnx2x: Do not call pcie_print_link_status()
bnxt_en: Do not call pcie_print_link_status()
cxgb4: Do not call pcie_print_link_status()
fm10k: Do not call pcie_print_link_status()
ixgbe: Do not call pcie_print_link_status()
net/mlx4: Do not call pcie_print_link_status()
net/mlx5: Do not call pcie_print_link_status()
nfp: Do not call pcie_print_link_status()
.../net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 -
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 1 -
.../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 3 ---
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 3 ---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 26 -------------------
drivers/net/ethernet/mellanox/mlx4/main.c | 7 -----
.../net/ethernet/mellanox/mlx5/core/main.c | 4 ---
.../netronome/nfp/nfpcore/nfp6000_pcie.c | 1 -
drivers/pci/pci-sysfs.c | 13 ++++++++++
9 files changed, 13 insertions(+), 46 deletions(-)
--
2.17.1
^ permalink raw reply
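To make the "7.876 Gb/s" figure in the example above easier to interpret, here
is a minimal sketch of how a usable link bandwidth can be derived from the
negotiated speed and width. It assumes the usual PCIe line encodings (8b/10b
at 2.5 and 5 GT/s, 128b/130b at 8 and 16 GT/s) and integer arithmetic in Mb/s.
The names example_pcie_speed, example_lane_mbps and example_link_mbps are
hypothetical, and the cover letter does not say which link produced the value,
so treating it as a single 8 GT/s lane is an assumption.

```c
#include <stdint.h>

/* Hypothetical speed identifiers, for illustration only. */
enum example_pcie_speed {
	EX_PCIE_2_5GT,
	EX_PCIE_5_0GT,
	EX_PCIE_8_0GT,
	EX_PCIE_16_0GT,
};

/* Usable bandwidth of one lane in Mb/s after line-encoding overhead. */
uint32_t example_lane_mbps(enum example_pcie_speed speed)
{
	switch (speed) {
	case EX_PCIE_2_5GT:
		return 2500 * 8 / 10;      /* 8b/10b encoding */
	case EX_PCIE_5_0GT:
		return 5000 * 8 / 10;      /* 8b/10b encoding */
	case EX_PCIE_8_0GT:
		return 8000 * 128 / 130;   /* 128b/130b encoding */
	case EX_PCIE_16_0GT:
		return 16000 * 128 / 130;  /* 128b/130b encoding */
	}
	return 0;
}

/* Usable bandwidth of a link with the given number of lanes, in Mb/s. */
uint32_t example_link_mbps(enum example_pcie_speed speed, uint32_t width)
{
	return example_lane_mbps(speed) * width;
}
```

Under these assumptions, example_link_mbps(EX_PCIE_8_0GT, 1) gives 7876 Mb/s,
which matches the 7.876 Gb/s shown in the sysfs example.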
* [PATCH] vhost: fix VHOST_GET_BACKEND_FEATURES ioctl request definition
From: Gleb Fotengauer-Malinovskiy @ 2018-09-03 17:59 UTC (permalink / raw)
To: Michael S. Tsirkin, Jason Wang, David S. Miller, kvm,
virtualization, netdev, linux-kernel
The _IOC_READ flag fits this ioctl request better, because the request
only writes to userspace and does not read from it.
See NOTEs in include/uapi/asm-generic/ioctl.h for more information.
Fixes: 429711aec282 ("vhost: switch to use new message format")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
---
include/uapi/linux/vhost.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index b1e22c40c4b6..84c3de89696a 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -176,7 +176,7 @@ struct vhost_memory {
#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
#define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
-#define VHOST_GET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x26, __u64)
+#define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
/* VHOST_NET specific defines */
--
glebfm
^ permalink raw reply related
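The direction bits referred to in the patch above can be inspected from
userspace. The following is a small sketch, assuming a Linux system with the
uapi headers installed; the EXAMPLE_* request numbers and the helper names are
hypothetical stand-ins, not the real vhost definitions. It shows that _IOR()
encodes _IOC_READ (the kernel writes the result to userspace) while _IOW()
encodes _IOC_WRITE (the kernel reads the argument from userspace).

```c
#include <stdint.h>
#include <linux/ioctl.h>  /* _IOR, _IOW, _IOC_DIR, _IOC_SIZE, _IOC_READ, _IOC_WRITE */

/* Hypothetical magic byte and request numbers, for illustration only. */
#define EXAMPLE_MAGIC        0xAF
#define EXAMPLE_SET_FEATURES _IOW(EXAMPLE_MAGIC, 0x25, uint64_t)
#define EXAMPLE_GET_FEATURES _IOR(EXAMPLE_MAGIC, 0x26, uint64_t)

/* Non-zero if the request is encoded as returning data to userspace. */
int example_request_returns_data(unsigned int req)
{
	return (_IOC_DIR(req) & _IOC_READ) != 0;
}

/* Non-zero if the request is encoded as taking data from userspace. */
int example_request_takes_data(unsigned int req)
{
	return (_IOC_DIR(req) & _IOC_WRITE) != 0;
}
```

With these definitions, the "get" request reports that it returns data and
does not take any, and the "set" request reports the opposite. Tools that
decode ioctl arguments from the request number rely on this encoding.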
* [PATCH net-next 2/2] net: stmmac: Fixup the tail addr setting in xmit path
From: Jose Abreu @ 2018-09-03 13:35 UTC (permalink / raw)
To: netdev
Cc: Jose Abreu, David S. Miller, Joao Pinto, Giuseppe Cavallaro,
Alexandre Torgue
In-Reply-To: <cover.1535981331.git.joabreu@synopsys.com>
Currently we are always setting the tail address of descriptor list to
the end of the pre-allocated list.
According to the databook this is not correct. The tail address should point
to the last available descriptor + 1, which means we have to update the
tail address every time we call the xmit function.
This should have no impact on older versions of the MAC, but newer
versions have DMA features which allow the IP to fetch
descriptors in advance and in a non-sequential order, so it's critical
that we set the tail address correctly.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1fca66ad6b17..14f890f2a970 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2224,8 +2224,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
stmmac_init_tx_chan(priv, priv->ioaddr, priv->plat->dma_cfg,
tx_q->dma_tx_phy, chan);
- tx_q->tx_tail_addr = tx_q->dma_tx_phy +
- (DMA_TX_SIZE * sizeof(struct dma_desc));
+ tx_q->tx_tail_addr = tx_q->dma_tx_phy;
stmmac_set_tx_tail_ptr(priv, priv->ioaddr,
tx_q->tx_tail_addr, chan);
}
@@ -3015,6 +3014,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
+ tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx * sizeof(*desc));
stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
@@ -3235,6 +3235,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
stmmac_enable_dma_transmission(priv, priv->ioaddr);
+ tx_q->tx_tail_addr = tx_q->dma_tx_phy + (tx_q->cur_tx * sizeof(*desc));
stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
--
2.7.4
^ permalink raw reply related
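The tail address arithmetic in the patch above can be sketched in isolation.
This is a minimal illustration only: struct example_desc and
example_tail_addr are hypothetical, the 16-byte descriptor layout is an
assumption, and cur_tx is taken to be the index of the next free slot (that
is, the last queued descriptor + 1), which is how the patch uses it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte DMA descriptor, for illustration only. */
struct example_desc {
	uint32_t des0;
	uint32_t des1;
	uint32_t des2;
	uint32_t des3;
};

/*
 * Bus address the tail pointer should hold once descriptors up to
 * (cur_tx - 1) have been queued: ring base plus cur_tx descriptors.
 */
uint32_t example_tail_addr(uint32_t ring_base, uint32_t cur_tx,
			   uint32_t ring_size)
{
	/* cur_tx is an index into the ring, so it must stay in range. */
	assert(cur_tx < ring_size);

	return ring_base + cur_tx * (uint32_t)sizeof(struct example_desc);
}
```

With a ring based at 0x1000, an empty ring yields a tail of 0x1000 (matching
the new initial value set in stmmac_init_dma_engine), and after three
descriptors have been queued the tail becomes 0x1030.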
* [PATCH net-next 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races
From: Jose Abreu @ 2018-09-03 13:35 UTC (permalink / raw)
To: netdev
Cc: Jose Abreu, Jerome Brunet, Martin Blumenstingl, David S. Miller,
Joao Pinto, Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <cover.1535981331.git.joabreu@synopsys.com>
This follows David Miller's advice and tries to fix the coalesce timer in
multi-queue scenarios.
We are now using per-queue coalesce values and a per-queue TX timer.
The default coalesce timer was changed to 1ms and the default coalesce
frame count to 25.
Tested in B2B setup between XGMAC2 and GMAC5.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
---
Jerome,
Can I have your Tested-by in this patch?
Thanks and Best Regards,
Jose Miguel Abreu
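For reviewers, the per-queue decision on when to request a TX completion
interrupt (the tx_ic logic added to stmmac_xmit and stmmac_tso_xmit in the
diff below) can be read as the standalone sketch that follows. The function
name example_want_tx_irq and its parameters are hypothetical; only the
branch structure is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the frame being queued should raise a TX completion
 * interrupt. *count_frames is the per-queue running descriptor count,
 * coal_frames the coalesce threshold, nfrags the number of fragments.
 */
bool example_want_tx_irq(uint32_t *count_frames, uint32_t coal_frames,
			 uint32_t nfrags)
{
	uint32_t descs = nfrags + 1;

	*count_frames += descs;

	/* Frame-based coalescing disabled: rely on the timer alone. */
	if (!coal_frames)
		return false;

	/* A single frame larger than the threshold always interrupts. */
	if (descs > coal_frames)
		return true;

	/* Interrupt when this frame crossed a multiple of the threshold. */
	return (*count_frames % coal_frames) < descs;
}
```

With the new default of 25 frames and single-descriptor frames, the 25th,
50th, ... frame on a queue requests an interrupt; everything in between is
left to the per-queue timer.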
---
drivers/net/ethernet/stmicro/stmmac/common.h | 4 +-
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 7 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 177 +++++++++++++++-------
3 files changed, 126 insertions(+), 62 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 1854f270ad66..b1b305f8f414 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -258,10 +258,10 @@ struct stmmac_safety_stats {
#define MAX_DMA_RIWT 0xff
#define MIN_DMA_RIWT 0x20
/* Tx coalesce parameters */
-#define STMMAC_COAL_TX_TIMER 40000
+#define STMMAC_COAL_TX_TIMER 1000
#define STMMAC_MAX_COAL_TX_TICK 100000
#define STMMAC_TX_MAX_FRAMES 256
-#define STMMAC_TX_FRAMES 64
+#define STMMAC_TX_FRAMES 25
/* Packets types */
enum packets_types {
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac.h b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
index 76649adf8fb0..957030cfb833 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac.h
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac.h
@@ -48,6 +48,9 @@ struct stmmac_tx_info {
/* Frequently used values are kept adjacent for cache effect */
struct stmmac_tx_queue {
+ u32 tx_count_frames;
+ int tx_timer_active;
+ struct timer_list txtimer;
u32 queue_index;
struct stmmac_priv *priv_data;
struct dma_extended_desc *dma_etx ____cacheline_aligned_in_smp;
@@ -59,6 +62,7 @@ struct stmmac_tx_queue {
dma_addr_t dma_tx_phy;
u32 tx_tail_addr;
u32 mss;
+ struct napi_struct napi ____cacheline_aligned_in_smp;
};
struct stmmac_rx_queue {
@@ -109,15 +113,12 @@ struct stmmac_pps_cfg {
struct stmmac_priv {
/* Frequently used values are kept adjacent for cache effect */
- u32 tx_count_frames;
u32 tx_coal_frames;
u32 tx_coal_timer;
- bool tx_timer_armed;
int tx_coalesce;
int hwts_tx_en;
bool tx_path_in_lpi_mode;
- struct timer_list txtimer;
bool tso;
unsigned int dma_buf_sz;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index ff1ffb46198a..1fca66ad6b17 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -148,6 +148,7 @@ static void stmmac_verify_args(void)
static void stmmac_disable_all_queues(struct stmmac_priv *priv)
{
u32 rx_queues_cnt = priv->plat->rx_queues_to_use;
+ u32 tx_queues_cnt = priv->plat->tx_queues_to_use;
u32 queue;
for (queue = 0; queue < rx_queues_cnt; queue++) {
@@ -155,6 +156,12 @@ static void stmmac_disable_all_queues(struct stmmac_priv *priv)
napi_disable(&rx_q->napi);
}
+
+ for (queue = 0; queue < tx_queues_cnt; queue++) {
+ struct stmmac_tx_queue *tx_q = &priv->tx_queue[queue];
+
+ napi_disable(&tx_q->napi);
+ }
}
/**
@@ -164,6 +171,7 @@ static void stmmac_disable_all_queues(struct stmmac_priv *priv)
static void stmmac_enable_all_queues(struct stmmac_priv *priv)
{
u32 rx_queues_cnt = priv->plat->rx_queues_to_use;
+ u32 tx_queues_cnt = priv->plat->tx_queues_to_use;
u32 queue;
for (queue = 0; queue < rx_queues_cnt; queue++) {
@@ -171,6 +179,12 @@ static void stmmac_enable_all_queues(struct stmmac_priv *priv)
napi_enable(&rx_q->napi);
}
+
+ for (queue = 0; queue < tx_queues_cnt; queue++) {
+ struct stmmac_tx_queue *tx_q = &priv->tx_queue[queue];
+
+ napi_enable(&tx_q->napi);
+ }
}
/**
@@ -1843,18 +1857,18 @@ static void stmmac_dma_operation_mode(struct stmmac_priv *priv)
* @queue: TX queue index
* Description: it reclaims the transmit resources after transmission completes.
*/
-static void stmmac_tx_clean(struct stmmac_priv *priv, u32 queue)
+static int stmmac_tx_clean(struct stmmac_priv *priv, int limit, u32 queue)
{
struct stmmac_tx_queue *tx_q = &priv->tx_queue[queue];
unsigned int bytes_compl = 0, pkts_compl = 0;
- unsigned int entry;
+ unsigned int entry, count = 0;
netif_tx_lock(priv->dev);
priv->xstats.tx_clean++;
entry = tx_q->dirty_tx;
- while (entry != tx_q->cur_tx) {
+ while ((entry != tx_q->cur_tx) && (count < limit)) {
struct sk_buff *skb = tx_q->tx_skbuff[entry];
struct dma_desc *p;
int status;
@@ -1870,6 +1884,8 @@ static void stmmac_tx_clean(struct stmmac_priv *priv, u32 queue)
if (unlikely(status & tx_dma_own))
break;
+ count++;
+
/* Make sure descriptor fields are read after reading
* the own bit.
*/
@@ -1938,6 +1954,8 @@ static void stmmac_tx_clean(struct stmmac_priv *priv, u32 queue)
mod_timer(&priv->eee_ctrl_timer, STMMAC_LPI_T(eee_timer));
}
netif_tx_unlock(priv->dev);
+
+ return count;
}
/**
@@ -2034,7 +2052,6 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
u32 channels_to_check = tx_channel_count > rx_channel_count ?
tx_channel_count : rx_channel_count;
u32 chan;
- bool poll_scheduled = false;
int status[max_t(u32, MTL_MAX_TX_QUEUES, MTL_MAX_RX_QUEUES)];
/* Make sure we never check beyond our status buffer. */
@@ -2055,11 +2072,8 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
if (likely(status[chan] & handle_rx)) {
struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
- if (likely(napi_schedule_prep(&rx_q->napi))) {
- stmmac_disable_dma_irq(priv, priv->ioaddr, chan);
+ if (likely(napi_schedule_prep(&rx_q->napi)))
__napi_schedule(&rx_q->napi);
- poll_scheduled = true;
- }
}
}
@@ -2067,22 +2081,12 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
* If we didn't schedule poll, see if any DMA channel (used by tx) has a
* completed transmission, if so, call stmmac_poll (once).
*/
- if (!poll_scheduled) {
- for (chan = 0; chan < tx_channel_count; chan++) {
- if (status[chan] & handle_tx) {
- /* It doesn't matter what rx queue we choose
- * here. We use 0 since it always exists.
- */
- struct stmmac_rx_queue *rx_q =
- &priv->rx_queue[0];
+ for (chan = 0; chan < tx_channel_count; chan++) {
+ if (status[chan] & handle_tx) {
+ struct stmmac_tx_queue *tx_q = &priv->tx_queue[chan];
- if (likely(napi_schedule_prep(&rx_q->napi))) {
- stmmac_disable_dma_irq(priv,
- priv->ioaddr, chan);
- __napi_schedule(&rx_q->napi);
- }
- break;
- }
+ if (likely(napi_schedule_prep(&tx_q->napi)))
+ __napi_schedule(&tx_q->napi);
}
}
@@ -2241,13 +2245,15 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
*/
static void stmmac_tx_timer(struct timer_list *t)
{
- struct stmmac_priv *priv = from_timer(priv, t, txtimer);
- u32 tx_queues_count = priv->plat->tx_queues_to_use;
- u32 queue;
+ struct stmmac_tx_queue *tx_q = from_timer(tx_q, t, txtimer);
+ struct stmmac_priv *priv = tx_q->priv_data;
- /* let's scan all the tx queues */
- for (queue = 0; queue < tx_queues_count; queue++)
- stmmac_tx_clean(priv, queue);
+ if (napi_schedule_prep(&tx_q->napi)) {
+ stmmac_disable_dma_irq(priv, priv->ioaddr, tx_q->queue_index);
+ __napi_schedule(&tx_q->napi);
+ }
+
+ tx_q->tx_timer_active = 0;
}
/**
@@ -2260,11 +2266,17 @@ static void stmmac_tx_timer(struct timer_list *t)
*/
static void stmmac_init_tx_coalesce(struct stmmac_priv *priv)
{
+ u32 tx_channel_count = priv->plat->tx_queues_to_use;
+ u32 chan;
+
priv->tx_coal_frames = STMMAC_TX_FRAMES;
priv->tx_coal_timer = STMMAC_COAL_TX_TIMER;
- timer_setup(&priv->txtimer, stmmac_tx_timer, 0);
- priv->txtimer.expires = STMMAC_COAL_TIMER(priv->tx_coal_timer);
- add_timer(&priv->txtimer);
+
+ for (chan = 0; chan < tx_channel_count; chan++) {
+ struct stmmac_tx_queue *tx_q = &priv->tx_queue[chan];
+
+ timer_setup(&tx_q->txtimer, stmmac_tx_timer, 0);
+ }
}
static void stmmac_set_rings_length(struct stmmac_priv *priv)
@@ -2592,6 +2604,7 @@ static void stmmac_hw_teardown(struct net_device *dev)
static int stmmac_open(struct net_device *dev)
{
struct stmmac_priv *priv = netdev_priv(dev);
+ u32 chan;
int ret;
stmmac_check_ether_addr(priv);
@@ -2688,7 +2701,9 @@ static int stmmac_open(struct net_device *dev)
if (dev->phydev)
phy_stop(dev->phydev);
- del_timer_sync(&priv->txtimer);
+ for (chan = 0; chan < priv->plat->tx_queues_to_use; chan++)
+ del_timer_sync(&priv->tx_queue[chan].txtimer);
+
stmmac_hw_teardown(dev);
init_error:
free_dma_desc_resources(priv);
@@ -2708,6 +2723,7 @@ static int stmmac_open(struct net_device *dev)
static int stmmac_release(struct net_device *dev)
{
struct stmmac_priv *priv = netdev_priv(dev);
+ u32 chan;
if (priv->eee_enabled)
del_timer_sync(&priv->eee_ctrl_timer);
@@ -2722,7 +2738,8 @@ static int stmmac_release(struct net_device *dev)
stmmac_disable_all_queues(priv);
- del_timer_sync(&priv->txtimer);
+ for (chan = 0; chan < priv->plat->tx_queues_to_use; chan++)
+ del_timer_sync(&priv->tx_queue[chan].txtimer);
/* Free the IRQ lines */
free_irq(dev->irq, dev);
@@ -2828,6 +2845,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
int tmp_pay_len = 0;
u32 pay_len, mss;
u8 proto_hdr_len;
+ bool tx_ic;
int i;
tx_q = &priv->tx_queue[queue];
@@ -2936,12 +2954,17 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
priv->xstats.tx_tso_nfrags += nfrags;
/* Manage tx mitigation */
- priv->tx_count_frames += nfrags + 1;
- if (likely(priv->tx_coal_frames > priv->tx_count_frames)) {
- mod_timer(&priv->txtimer,
- STMMAC_COAL_TIMER(priv->tx_coal_timer));
- } else {
- priv->tx_count_frames = 0;
+ tx_q->tx_count_frames += nfrags + 1;
+ if (!priv->tx_coal_frames)
+ tx_ic = false;
+ else if ((nfrags + 1) > priv->tx_coal_frames)
+ tx_ic = true;
+ else if ((tx_q->tx_count_frames % priv->tx_coal_frames) < (nfrags + 1))
+ tx_ic = true;
+ else
+ tx_ic = false;
+
+ if (tx_ic) {
stmmac_set_tx_ic(priv, desc);
priv->xstats.tx_set_ic_bit++;
}
@@ -2994,6 +3017,12 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
+ if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
+ tx_q->tx_timer_active = 1;
+ mod_timer(&tx_q->txtimer,
+ STMMAC_COAL_TIMER(priv->tx_coal_timer));
+ }
+
return NETDEV_TX_OK;
dma_map_err:
@@ -3024,6 +3053,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
struct stmmac_tx_queue *tx_q;
unsigned int enh_desc;
unsigned int des;
+ bool tx_ic;
tx_q = &priv->tx_queue[queue];
@@ -3146,17 +3176,19 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
* This approach takes care about the fragments: desc is the first
* element in case of no SG.
*/
- priv->tx_count_frames += nfrags + 1;
- if (likely(priv->tx_coal_frames > priv->tx_count_frames) &&
- !priv->tx_timer_armed) {
- mod_timer(&priv->txtimer,
- STMMAC_COAL_TIMER(priv->tx_coal_timer));
- priv->tx_timer_armed = true;
- } else {
- priv->tx_count_frames = 0;
+ tx_q->tx_count_frames += nfrags + 1;
+ if (!priv->tx_coal_frames)
+ tx_ic = false;
+ else if ((nfrags + 1) > priv->tx_coal_frames)
+ tx_ic = true;
+ else if ((tx_q->tx_count_frames % priv->tx_coal_frames) < (nfrags + 1))
+ tx_ic = true;
+ else
+ tx_ic = false;
+
+ if (tx_ic) {
stmmac_set_tx_ic(priv, desc);
priv->xstats.tx_set_ic_bit++;
- priv->tx_timer_armed = false;
}
skb_tx_timestamp(skb);
@@ -3202,8 +3234,15 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
netdev_tx_sent_queue(netdev_get_tx_queue(dev, queue), skb->len);
stmmac_enable_dma_transmission(priv, priv->ioaddr);
+
stmmac_set_tx_tail_ptr(priv, priv->ioaddr, tx_q->tx_tail_addr, queue);
+ if (priv->tx_coal_timer && !tx_q->tx_timer_active) {
+ tx_q->tx_timer_active = 1;
+ mod_timer(&tx_q->txtimer,
+ STMMAC_COAL_TIMER(priv->tx_coal_timer));
+ }
+
return NETDEV_TX_OK;
dma_map_err:
@@ -3517,22 +3556,16 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
* Description :
* To look at the incoming frames and clear the tx resources.
*/
-static int stmmac_poll(struct napi_struct *napi, int budget)
+static int stmmac_rx_poll(struct napi_struct *napi, int budget)
{
struct stmmac_rx_queue *rx_q =
container_of(napi, struct stmmac_rx_queue, napi);
struct stmmac_priv *priv = rx_q->priv_data;
- u32 tx_count = priv->plat->tx_queues_to_use;
u32 chan = rx_q->queue_index;
int work_done = 0;
- u32 queue;
priv->xstats.napi_poll++;
- /* check all the queues */
- for (queue = 0; queue < tx_count; queue++)
- stmmac_tx_clean(priv, queue);
-
work_done = stmmac_rx(priv, budget, rx_q->queue_index);
if (work_done < budget) {
napi_complete_done(napi, work_done);
@@ -3541,6 +3574,24 @@ static int stmmac_poll(struct napi_struct *napi, int budget)
return work_done;
}
+static int stmmac_tx_poll(struct napi_struct *napi, int budget)
+{
+ struct stmmac_tx_queue *tx_q =
+ container_of(napi, struct stmmac_tx_queue, napi);
+ struct stmmac_priv *priv = tx_q->priv_data;
+ u32 chan = tx_q->queue_index;
+ int work_done = 0;
+
+ priv->xstats.napi_poll++;
+
+ work_done = stmmac_tx_clean(priv, budget, chan);
+ if (work_done < budget) {
+ napi_complete_done(napi, work_done);
+ stmmac_enable_dma_irq(priv, priv->ioaddr, chan);
+ }
+ return work_done;
+}
+
/**
* stmmac_tx_timeout
* @dev : Pointer to net device structure
@@ -4328,10 +4379,17 @@ int stmmac_dvr_probe(struct device *device,
for (queue = 0; queue < priv->plat->rx_queues_to_use; queue++) {
struct stmmac_rx_queue *rx_q = &priv->rx_queue[queue];
- netif_napi_add(ndev, &rx_q->napi, stmmac_poll,
+ netif_napi_add(ndev, &rx_q->napi, stmmac_rx_poll,
(8 * priv->plat->rx_queues_to_use));
}
+ for (queue = 0; queue < priv->plat->tx_queues_to_use; queue++) {
+ struct stmmac_tx_queue *tx_q = &priv->tx_queue[queue];
+
+ netif_napi_add(ndev, &tx_q->napi, stmmac_tx_poll,
+ (8 * priv->plat->tx_queues_to_use));
+ }
+
mutex_init(&priv->lock);
/* If a specific clk_csr value is passed from the platform
@@ -4380,6 +4438,11 @@ int stmmac_dvr_probe(struct device *device,
netif_napi_del(&rx_q->napi);
}
+ for (queue = 0; queue < priv->plat->tx_queues_to_use; queue++) {
+ struct stmmac_tx_queue *tx_q = &priv->tx_queue[queue];
+
+ netif_napi_del(&tx_q->napi);
+ }
error_hw_init:
destroy_workqueue(priv->wq);
error_wq:
--
2.7.4
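
Reviewer note: the "Manage tx mitigation" hunks in stmmac_tso_xmit() and
stmmac_xmit() above decide, per packet, whether to set the
interrupt-on-completion bit. Below is a minimal stand-alone sketch of that
decision for experimenting outside the kernel. The helper name tx_needs_ic()
and its signature are hypothetical and not part of the patch. count_frames
stands in for tx_q->tx_count_frames and coal_frames for priv->tx_coal_frames.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the driver): mirrors the tx_ic decision
 * made in the patch. The running counter is never reset; the modulo
 * detects when this packet's descriptors reach or cross a multiple of
 * the threshold.
 */
static bool tx_needs_ic(uint32_t *count_frames, uint32_t coal_frames,
			uint32_t nfrags)
{
	uint32_t ndesc = nfrags + 1;	/* fragments plus the head */

	*count_frames += ndesc;

	if (!coal_frames)		/* frame-based coalescing disabled */
		return false;
	if (ndesc > coal_frames)	/* one packet exceeds the threshold */
		return true;

	/* True when this packet reached or crossed a multiple of coal_frames. */
	return (*count_frames % coal_frames) < ndesc;
}
```

With a threshold of 25 and single-descriptor packets, this returns true on
every 25th call. With multi-descriptor packets it returns true for the packet
whose descriptors reach or step over a multiple of 25.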
* [PATCH net-next 0/2] net: stmmac: Coalesce and tail addr fixes
From: Jose Abreu @ 2018-09-03 13:35 UTC (permalink / raw)
To: netdev
Cc: Jose Abreu, Jerome Brunet, Martin Blumenstingl, David S. Miller,
Joao Pinto, Giuseppe Cavallaro, Alexandre Torgue
This series contains the fix for the coalesce timer and a fix for the tail
address setting that impacts XGMAC2 operation.
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Jose Abreu (2):
net: stmmac: Rework coalesce timer and fix multi-queue races
net: stmmac: Fixup the tail addr setting in xmit path
drivers/net/ethernet/stmicro/stmmac/common.h | 4 +-
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 7 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 182 +++++++++++++++-------
3 files changed, 129 insertions(+), 64 deletions(-)
--
2.7.4
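
Reviewer note: in patch 1 of this series, the xmit paths arm the per-queue
coalesce timer only when tx_coal_timer is non-zero and the queue's
tx_timer_active flag is clear. The user-space model below sketches that
arm-once behaviour. The struct and function names are hypothetical. It also
assumes the flag is cleared once the timer has run, which is handled in a part
of the patch not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one Tx queue; not the driver's struct. */
struct txq_model {
	bool timer_active;	/* stands in for tx_q->tx_timer_active */
	unsigned int arms;	/* how many times the timer was armed */
};

/* Models the check made after the tail pointer update in the xmit paths. */
static void txq_model_xmit(struct txq_model *q, uint32_t coal_timer)
{
	if (coal_timer && !q->timer_active) {
		q->timer_active = true;
		q->arms++;	/* stands in for mod_timer() */
	}
}

/* Assumption: the flag is cleared after the timer has fired. */
static void txq_model_timer_done(struct txq_model *q)
{
	q->timer_active = false;
}
```

In this model a burst of transmits arms the timer once, and the next arm
happens only after txq_model_timer_done() has run, so the deadline is not
pushed back by every new packet.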