* Re: [PATCH 1/7] fix hnode refcounting
From: Jamal Hadi Salim @ 2018-09-07 12:13 UTC (permalink / raw)
To: Al Viro; +Cc: netdev, Cong Wang, Jiri Pirko, stable
In-Reply-To: <20180907023529.GV19965@ZenIV.linux.org.uk>
On 2018-09-06 10:35 p.m., Al Viro wrote:
> On Thu, Sep 06, 2018 at 06:21:09AM -0400, Jamal Hadi Salim wrote:
[..]
>
> Argh... Unfortunately, there's this: in u32_delete() we have
> if (root_ht) {
> if (root_ht->refcnt > 1) {
> *last = false;
> goto ret;
> }
> if (root_ht->refcnt == 1) {
> if (!ht_empty(root_ht)) {
> *last = false;
> goto ret;
> }
> }
> }
> and that would need to be updated.
It is not detrimental as you have it right now but
you are right an adjustment is needed...
Deleting of a root directly should not be allowed. But
you can flush a whole tp. Consider this:
--
sudo tc qdisc add dev $P ingress
sudo tc filter add dev $P parent ffff: protocol ip prio 10 \
u32 match ip protocol 1 0xff
Which creates root ht 800
You shouldnt be allowed to do this:
--
tc filter delete dev $P parent ffff: protocol ip prio 10 handle 800: u32
---
But you can delete the tp entirely as such:
---
tc filter delete dev $P parent ffff: protocol ip prio 10 u32
--
The later will go via the destroy() path and flush all filters.
You should also be able to delete individual filters. ex:
$tc filter del dev $P parent ffff: prio 10 handle 800:0:800 u32
Where that code you are referring to is important is when
the last filter deleted - we need the caller to know
and it destroys root.
i.e you should return last=true when the last filter is
deleted so root gets auto deleted (just like it was autocreated)
> However, that logics is bloody odd
> to start with. First of all, root_ht has come from
> struct tc_u_hnode *root_ht = rtnl_dereference(tp->root);
> and the only place where it's ever modified is
> rcu_assign_pointer(tp->root, root_ht);
> in u32_init(), where we'd bloody well checked that root_ht is non-NULL
> (see
> if (root_ht == NULL)
> return -ENOBUFS;
> upstream of that place) and where that assignment is inevitable on the
> way to returning 0. No matter what, if tp has passed u32_init() it
> will have non-NULL ->root, forever. And there is no way for tcf_proto
> to be seen outside of tcf_proto_create() without ->init() having returned
> 0 - it gets freed before anyone sees it.
>
Yes, the check for root_ht is not necessary - but the check for the
last filter (and testing for last) is needed.
> So this 'if (root_ht)' can't be false. What's more, what the hell is the
> whole thing checking? We are in u32_delete(). It's called (as ->delete())
> from tfilter_del_notify(), which is called from tc_del_tfilter(). If we
> return 0 with *last true, we follow up calling tcf_proto_destroy().
> OK, let's look at the logics in there:
> * if there are links to root hnode => false
> * if there's no links to root hnode and it has knodes => false
> (BTW, if we ever get there with root_ht->refcnt < 1, we are obviously screwed)
> * if there is a tcf_proto sharing tp->data => false (i.e. any filters
> with different prio - don't bother)
> * if tp is the only one with reference to tp->data and there are *any*
> knodes => false.
>
> Any extra links can come only from knodes in a non-empty hnode. And it's not
> a common case. Shouldn't thIe whole thing be
> * shared tp->data => false
> * any non-empty hnode => false
> instead? Perhaps even with the knode counter in tp->data, avoiding any loops
> in there, as well as the entire ht_empty()...
>
> Now, in the very beginning of u32_delete() we have this:
> struct tc_u_hnode *ht = arg;
>
> if (ht == NULL)
> goto out;
> OK, but the call of ->delete() is
> err = tp->ops->delete(tp, fh, last, extack);
> and arg == NULL seen in u32_delete() means fh == NULL in tfilter_del_notify().
> Which is called in
> if (!fh) {
> ...
> } else {
> bool last;
>
> err = tfilter_del_notify(net, skb, n, tp, block,
> q, parent, fh, false, &last,
> extack);
> How can we ever get there with NULL fh?
>
Try:
tc filter delete dev $P parent ffff: protocol ip prio 10 u32
tcm handle is 0, so will hit that code path.
> The whole thing makes very little sense; looks like it used to live in
> u32_destroy() prior to commit 763dbf6328e41 ("net_sched: move the empty tp
> check from ->destroy() to ->delete()"), but looking at the rationale in
> that commit... I don't see how it fixes anything - sure, now we remove
> tcf_proto from the list before calling ->destroy(). Without any RCU delays
> in between. How could it possibly solve any issues with ->classify()
> called in parallel with ->destroy()? cls_u32 (at least these days)
> does try to survive u32_destroy() in parallel with u32_classify();
> if any other classifiers do not, they are still broken and that commit
> has not done anything for them.
>
> Anyway, adjusting 1/7 for that is trivial, but I would really like to
> understand what that code is doing... Comments?
>
Refer to above.
cheers,
jamal
^ permalink raw reply
* network device suspend/resume
From: Lakshmi @ 2018-09-07 11:57 UTC (permalink / raw)
To: Oliver Neukum, netdev
Hi,
I am bringing kernel bugzilla bug here 196399
https://bugzilla.kernel.org/show_bug.cgi?id=196399
This issue occurred last time two months ago and I wonder if it appears
again. Can you confirm if there is any issue related to network device
suspend/resume.
last instance is here
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4376/fi-skl-6600u/igt@gem_exec_suspend@basic-s4-devices.html
==================================================================
0b95:7720 ASIX Electronics Corp. AX88772
USB network card
driver seems to be usbnet
==================================================================
Bug Report: 196399
We have found out that since at least 4.11-rc1, some machines in the
Intel GFX CI lab have been generating the following warning when
suspending to s4 (suspend to disk):
[ 287.212825] ------------[ cut here ]------------
[ 287.212829] WARNING: CPU: 0 PID: 3165 at net/sched/sch_generic.c:316
dev_watchdog+0x218/0x220
[ 287.212830] Modules linked in: mcs7830 usbnet mii snd_hda_codec_hdmi
snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel
snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core snd_pcm
i2c_designware_platform i2c_designware_core mei_me mei prime_numbers
i2c_hid pinctrl_sunrisepoint pinctrl_intel
[ 287.212864] CPU: 0 PID: 3165 Comm: gem_exec_suspen Tainted: G U
4.12.0-CI-CI_DRM_2829+ #1
[ 287.212865] Hardware name: Dell Inc. XPS 13 9360/093TW6, BIOS 1.3.2
01/18/2017
[ 287.212867] task: ffff8801b4084f40 task.stack: ffffc900001d8000
[ 287.212869] RIP: 0010:dev_watchdog+0x218/0x220
[ 287.212870] RSP: 0018:ffff88027e403e38 EFLAGS: 00010292
[ 287.212872] RAX: 000000000000005a RBX: 0000000000000000 RCX:
0000000000000000
[ 287.212874] RDX: 0000000000000002 RSI: ffffffff81cbcf89 RDI:
ffffffff81c9c627
[ 287.212875] RBP: ffff88027e403e68 R08: 0000000000000000 R09:
0000000000000001
[ 287.212876] R10: 0000000028e9c215 R11: 0000000000000000 R12:
ffff88026e08a848
[ 287.212877] R13: 0000000000000000 R14: ffff88026e050020 R15:
0000000000000001
[ 287.212878] FS: 00007f345056a8c0(0000) GS:ffff88027e400000(0000)
knlGS:0000000000000000
[ 287.212880] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 287.212881] CR2: 00000000008d7008 CR3: 00000001b4314000 CR4:
00000000003406f0
[ 287.212882] Call Trace:
[ 287.212883] <IRQ>
[ 287.212886] ? qdisc_rcu_free+0x40/0x40
[ 287.212888] ? qdisc_rcu_free+0x40/0x40
[ 287.212891] call_timer_fn+0x8e/0x370
[ 287.212894] ? qdisc_rcu_free+0x40/0x40
[ 287.212896] expire_timers+0x150/0x1f0
[ 287.212899] run_timer_softirq+0x7c/0x160
[ 287.212903] __do_softirq+0x116/0x4a0
[ 287.212906] irq_exit+0xa9/0xc0
[ 287.212909] smp_apic_timer_interrupt+0x38/0x50
[ 287.212912] apic_timer_interrupt+0x90/0xa0
[ 287.212914] RIP: 0010:delay_tsc+0x33/0xc0
[ 287.212916] RSP: 0018:ffffc900001dbcd8 EFLAGS: 00000286 ORIG_RAX:
ffffffffffffff10
[ 287.212918] RAX: 0000000080000000 RBX: 00000005964f23a0 RCX:
0000000000000001
[ 287.212919] RDX: 0000000080000001 RSI: ffffffff81c8e23a RDI:
00000000ffffffff
[ 287.212920] RBP: ffffc900001dbcf8 R08: 0000000000000000 R09:
0000000000000001
[ 287.212921] R10: 0000000000000000 R11: 0000000000000000 R12:
000000059633478e
[ 287.212922] R13: 0000000000249f13 R14: 0000000000000000 R15:
ffff880272eac008
[ 287.212924] </IRQ>
[ 287.212929] ? delay_tsc+0x6b/0xc0
[ 287.212932] __delay+0xa/0x10
[ 287.212934] __const_udelay+0x31/0x40
[ 287.212936] hibernation_debug_sleep+0x20/0x30
[ 287.212938] hibernation_snapshot+0x2bc/0x5f0
[ 287.212940] hibernate+0x159/0x2f0
[ 287.212943] state_store+0xe0/0xf0
[ 287.212947] kobj_attr_store+0xf/0x20
[ 287.212949] sysfs_kf_write+0x40/0x50
[ 287.212951] kernfs_fop_write+0x130/0x1b0
[ 287.212955] __vfs_write+0x23/0x120
[ 287.212957] ? rcu_read_lock_sched_held+0x75/0x80
[ 287.212959] ? rcu_sync_lockdep_assert+0x2a/0x50
[ 287.212961] ? __sb_start_write+0xfa/0x1f0
[ 287.212964] vfs_write+0xc5/0x1d0
[ 287.212966] ? trace_hardirqs_on_caller+0xe7/0x1c0
[ 287.212969] SyS_write+0x44/0xb0
[ 287.212972] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 287.212973] RIP: 0033:0x7f344ed4a4a0
[ 287.212974] RSP: 002b:00007ffef50dfaa8 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 287.212977] RAX: ffffffffffffffda RBX: ffffffff81470683 RCX:
00007f344ed4a4a0
[ 287.212978] RDX: 0000000000000004 RSI: 000000000041d211 RDI:
0000000000000006
[ 287.212979] RBP: ffffc900001dbf88 R08: 00000000008d6a50 R09:
0000000000000000
[ 287.212980] R10: 0000000000000000 R11: 0000000000000246 R12:
000000000041d211
[ 287.212981] R13: 0000000000000006 R14: 0000000000000000 R15:
0000000000000000
[ 287.212984] ? __this_cpu_preempt_check+0x13/0x20
[ 287.212988] Code: 63 8e 18 04 00 00 eb 93 4c 89 f7 c6 05 77 5c 77 00
01 e8 dc 7f fd ff 89 d9 48 89 c2 4c 89 f6 48 c7 c7 18 f4 cf 81 e8 f1 c4
9d ff <0f> ff eb c3 0f 1f 40 00 48 c7 47 08 00 00 00 00 55 48 c7 07 00
[ 287.213051] ---[ end trace b6016dcc7544a681 ]---
This is caught while running the intel-gpu-tools test named
'igt@gem_exec_suspend@basic-s4-devices' on the following machines:
- Intel Kaby Lake-R RVP: Failure rate 123/135 run(s) (91%), last
occurence:
https://intel-gfx-ci.01.org/CI/CI_DRM_2828/fi-kbl-r/igt@gem_exec_suspend@basic-s4-devices.html
- Intel Kaby Lake i7-7560u: Failure rate 196/305 run(s) (64%), last
occurence:
https://intel-gfx-ci.01.org/CI/CI_DRM_2827/fi-kbl-7560u/igt@gem_exec_suspend@basic-s4-devices.html
- Intel Skylake i7-6600u: Failure rate 23/75 run(s) (30%), last
occurence:
https://intel-gfx-ci.01.org/CI/CI_DRM_2824/fi-skl-6600u/igt@gem_exec_suspend@basic-s4-devices.html
- Intel Sandy Bridge i7-2600: Failure rate 10/293 run(s) (3%), last
occurence:
https://intel-gfx-ci.01.org/CI/CI_DRM_2816/fi-snb-2600/igt@gem_exec_suspend@basic-s4-devices.html
We have plenty of other machines that do not trigger this warning at all.
The bug used to live in fd.o's bugzilla, but it had no business being
there: https://bugs.freedesktop.org/show_bug.cgi?id=100125
Let me know if I can help in some ways.
=====================================================================
Lakshmi.
---------------------------------------------------------------------
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki
Business Identity Code: 0357606 - 4
Domiciled in Helsinki
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
^ permalink raw reply
* Re: [PATCH] ethernet:netronome:nfp:move spin_lock_bh to spin_lock in tasklet
From: Jakub Kicinski @ 2018-09-07 16:36 UTC (permalink / raw)
To: jun qian
Cc: Dirk van der Merwe, Daniel Borkmann, Quentin Monnet, oss-drivers,
netdev, linux-kernel
In-Reply-To: <20180907162117.47361-1-hangdianqj@163.com>
On Fri, 7 Sep 2018 09:21:17 -0700, jun qian wrote:
> As you are already in a tasklet, it is unnecessary to call spin_lock_bh.
>
> Signed-off-by: jun qian <hangdianqj@163.com>
FWIW you should put spaces after the colons. It's generally a good
practice to look at the prefix previous authors used for a given piece
of code with
git log -- $file_path
This would be a better subject:
nfp: replace spin_lock_bh with spin_lock in tasklet callback
^ permalink raw reply
* Re: [PATCH] ethernet:netronome:nfp:move spin_lock_bh to spin_lock in tasklet
From: Jakub Kicinski @ 2018-09-07 16:33 UTC (permalink / raw)
To: jun qian
Cc: Dirk van der Merwe, Daniel Borkmann, Quentin Monnet, oss-drivers,
netdev, linux-kernel
In-Reply-To: <20180907162117.47361-1-hangdianqj@163.com>
On Fri, 7 Sep 2018 09:21:17 -0700, jun qian wrote:
> As you are already in a tasklet, it is unnecessary to call spin_lock_bh.
>
> Signed-off-by: jun qian <hangdianqj@163.com>
We had more of those at some point, I wonder if you(r tool) can find
them :)
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
^ permalink raw reply
* Network device suspend/resume
From: Lakshmi @ 2018-09-07 11:49 UTC (permalink / raw)
To: Oliver Neukum, netdev
Hi,
I am bringing kernel bugzilla bug here
https://bugzilla.kernel.org/show_bug.cgi?id=196399
This issue occured 2 months ago and we didn't see this again. Wondering
if that appears again. Can you confirm if there is any bug in network
suspend/resume in the case.
One more instance here
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_4376/fi-skl-6600u/igt@gem_exec_suspend@basic-s4-devices.html
================================================================================
Network device is
0b95:7720 ASIX Electronics Corp. AX88772
USB network card
driver seems to be usbnet
=================================================================================
Bug 196399:
We have found out that since at least 4.11-rc1, some machines in the
Intel GFX CI lab have been generating the following warning when
suspending to s4 (suspend to disk):
[ 287.212825] ------------[ cut here ]------------
[ 287.212829] WARNING: CPU: 0 PID: 3165 at net/sched/sch_generic.c:316
dev_watchdog+0x218/0x220
[ 287.212830] Modules linked in: mcs7830 usbnet mii snd_hda_codec_hdmi
snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal
intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel
snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core snd_pcm
i2c_designware_platform i2c_designware_core mei_me mei prime_numbers
i2c_hid pinctrl_sunrisepoint pinctrl_intel
[ 287.212864] CPU: 0 PID: 3165 Comm: gem_exec_suspen Tainted: G U
4.12.0-CI-CI_DRM_2829+ #1
[ 287.212865] Hardware name: Dell Inc. XPS 13 9360/093TW6, BIOS 1.3.2
01/18/2017
[ 287.212867] task: ffff8801b4084f40 task.stack: ffffc900001d8000
[ 287.212869] RIP: 0010:dev_watchdog+0x218/0x220
[ 287.212870] RSP: 0018:ffff88027e403e38 EFLAGS: 00010292
[ 287.212872] RAX: 000000000000005a RBX: 0000000000000000 RCX:
0000000000000000
[ 287.212874] RDX: 0000000000000002 RSI: ffffffff81cbcf89 RDI:
ffffffff81c9c627
[ 287.212875] RBP: ffff88027e403e68 R08: 0000000000000000 R09:
0000000000000001
[ 287.212876] R10: 0000000028e9c215 R11: 0000000000000000 R12:
ffff88026e08a848
[ 287.212877] R13: 0000000000000000 R14: ffff88026e050020 R15:
0000000000000001
[ 287.212878] FS: 00007f345056a8c0(0000) GS:ffff88027e400000(0000)
knlGS:0000000000000000
[ 287.212880] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 287.212881] CR2: 00000000008d7008 CR3: 00000001b4314000 CR4:
00000000003406f0
[ 287.212882] Call Trace:
[ 287.212883] <IRQ>
[ 287.212886] ? qdisc_rcu_free+0x40/0x40
[ 287.212888] ? qdisc_rcu_free+0x40/0x40
[ 287.212891] call_timer_fn+0x8e/0x370
[ 287.212894] ? qdisc_rcu_free+0x40/0x40
[ 287.212896] expire_timers+0x150/0x1f0
[ 287.212899] run_timer_softirq+0x7c/0x160
[ 287.212903] __do_softirq+0x116/0x4a0
[ 287.212906] irq_exit+0xa9/0xc0
[ 287.212909] smp_apic_timer_interrupt+0x38/0x50
[ 287.212912] apic_timer_interrupt+0x90/0xa0
[ 287.212914] RIP: 0010:delay_tsc+0x33/0xc0
[ 287.212916] RSP: 0018:ffffc900001dbcd8 EFLAGS: 00000286 ORIG_RAX:
ffffffffffffff10
[ 287.212918] RAX: 0000000080000000 RBX: 00000005964f23a0 RCX:
0000000000000001
[ 287.212919] RDX: 0000000080000001 RSI: ffffffff81c8e23a RDI:
00000000ffffffff
[ 287.212920] RBP: ffffc900001dbcf8 R08: 0000000000000000 R09:
0000000000000001
[ 287.212921] R10: 0000000000000000 R11: 0000000000000000 R12:
000000059633478e
[ 287.212922] R13: 0000000000249f13 R14: 0000000000000000 R15:
ffff880272eac008
[ 287.212924] </IRQ>
[ 287.212929] ? delay_tsc+0x6b/0xc0
[ 287.212932] __delay+0xa/0x10
[ 287.212934] __const_udelay+0x31/0x40
[ 287.212936] hibernation_debug_sleep+0x20/0x30
[ 287.212938] hibernation_snapshot+0x2bc/0x5f0
[ 287.212940] hibernate+0x159/0x2f0
[ 287.212943] state_store+0xe0/0xf0
[ 287.212947] kobj_attr_store+0xf/0x20
[ 287.212949] sysfs_kf_write+0x40/0x50
[ 287.212951] kernfs_fop_write+0x130/0x1b0
[ 287.212955] __vfs_write+0x23/0x120
[ 287.212957] ? rcu_read_lock_sched_held+0x75/0x80
[ 287.212959] ? rcu_sync_lockdep_assert+0x2a/0x50
[ 287.212961] ? __sb_start_write+0xfa/0x1f0
[ 287.212964] vfs_write+0xc5/0x1d0
[ 287.212966] ? trace_hardirqs_on_caller+0xe7/0x1c0
[ 287.212969] SyS_write+0x44/0xb0
[ 287.212972] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 287.212973] RIP: 0033:0x7f344ed4a4a0
[ 287.212974] RSP: 002b:00007ffef50dfaa8 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 287.212977] RAX: ffffffffffffffda RBX: ffffffff81470683 RCX:
00007f344ed4a4a0
[ 287.212978] RDX: 0000000000000004 RSI: 000000000041d211 RDI:
0000000000000006
[ 287.212979] RBP: ffffc900001dbf88 R08: 00000000008d6a50 R09:
0000000000000000
[ 287.212980] R10: 0000000000000000 R11: 0000000000000246 R12:
000000000041d211
[ 287.212981] R13: 0000000000000006 R14: 0000000000000000 R15:
0000000000000000
[ 287.212984] ? __this_cpu_preempt_check+0x13/0x20
[ 287.212988] Code: 63 8e 18 04 00 00 eb 93 4c 89 f7 c6 05 77 5c 77 00
01 e8 dc 7f fd ff 89 d9 48 89 c2 4c 89 f6 48 c7 c7 18 f4 cf 81 e8 f1 c4
9d ff <0f> ff eb c3 0f 1f 40 00 48 c7 47 08 00 00 00 00 55 48 c7 07 00
[ 287.213051] ---[ end trace b6016dcc7544a681 ]---
This is caught while running the intel-gpu-tools test named
'igt@gem_exec_suspend@basic-s4-devices' on the following machines:
- Intel Kaby Lake-R RVP: Failure rate 123/135 run(s) (91%), last
occurence:
https://intel-gfx-ci.01.org/CI/CI_DRM_2828/fi-kbl-r/igt@gem_exec_suspend@basic-s4-devices.html
- Intel Kaby Lake i7-7560u: Failure rate 196/305 run(s) (64%), last
occurence:
https://intel-gfx-ci.01.org/CI/CI_DRM_2827/fi-kbl-7560u/igt@gem_exec_suspend@basic-s4-devices.html
- Intel Skylake i7-6600u: Failure rate 23/75 run(s) (30%), last
occurence:
https://intel-gfx-ci.01.org/CI/CI_DRM_2824/fi-skl-6600u/igt@gem_exec_suspend@basic-s4-devices.html
- Intel Sandy Bridge i7-2600: Failure rate 10/293 run(s) (3%), last
occurence:
https://intel-gfx-ci.01.org/CI/CI_DRM_2816/fi-snb-2600/igt@gem_exec_suspend@basic-s4-devices.html
We have plenty of other machines that do not trigger this warning at all.
The bug used to live in fd.o's bugzilla, but it had no business being
there: https://bugs.freedesktop.org/show_bug.cgi?id=100125
Let me know if I can help in some ways.
---------------------------------------------------------------------
Intel Finland Oy
Registered Address: PL 281, 00181 Helsinki
Business Identity Code: 0357606 - 4
Domiciled in Helsinki
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
^ permalink raw reply
* Re: [PATCH v2 net-next 3/4] net: make listified RX functions return number of good packets
From: Eric Dumazet @ 2018-09-07 11:43 UTC (permalink / raw)
To: Edward Cree, Eric Dumazet, davem; +Cc: linux-net-drivers, netdev
In-Reply-To: <e184f2c1-fe28-ad6e-460d-950d8a363852@solarflare.com>
On 09/07/2018 03:44 AM, Edward Cree wrote:
>
> Any suggestions on how to construct a test that will?
>
Say 50 concurrent netperf -t TCP_RR -- -r 8000,8000
This way you have a mix of GRO-candidates, and non GRO ones (pure acks)
GRO sizes would be reasonable (not full size GRO packets).
^ permalink raw reply
* [PATCH] ethernet:netronome:nfp:move spin_lock_bh to spin_lock in tasklet
From: jun qian @ 2018-09-07 16:21 UTC (permalink / raw)
To: Jakub Kicinski, Dirk van der Merwe, Daniel Borkmann,
Quentin Monnet
Cc: oss-drivers, netdev, linux-kernel, jun qian
As you are already in a tasklet, it is unnecessary to call spin_lock_bh.
Signed-off-by: jun qian <hangdianqj@163.com>
---
drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index a8b9fbab5f73..084c983ec3c2 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2075,10 +2075,10 @@ static void nfp_ctrl_poll(unsigned long arg)
{
struct nfp_net_r_vector *r_vec = (void *)arg;
- spin_lock_bh(&r_vec->lock);
+ spin_lock(&r_vec->lock);
nfp_net_tx_complete(r_vec->tx_ring, 0);
__nfp_ctrl_tx_queued(r_vec);
- spin_unlock_bh(&r_vec->lock);
+ spin_unlock(&r_vec->lock);
nfp_ctrl_rx(r_vec);
--
2.17.1
^ permalink raw reply related
* Re: [PATCH net-next 11/11] vhost_net: batch submitting XDP buffers to underlayer sockets
From: Michael S. Tsirkin @ 2018-09-07 16:13 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <ffb802e6-1505-e01e-f6d5-11cde8dace9b@redhat.com>
On Fri, Sep 07, 2018 at 03:41:52PM +0800, Jason Wang wrote:
> > > @@ -556,10 +667,14 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > size_t len, total_len = 0;
> > > int err;
> > > int sent_pkts = 0;
> > > + bool bulking = (sock->sk->sk_sndbuf == INT_MAX);
> > What does bulking mean?
>
> The name is misleading, it means whether we can do batching. For simplicity,
> I disable batching is sndbuf is not INT_MAX.
But what does batching have to do with sndbuf?
> > > for (;;) {
> > > bool busyloop_intr = false;
> > > + if (nvq->done_idx == VHOST_NET_BATCH)
> > > + vhost_tx_batch(net, nvq, sock, &msg);
> > > +
> > > head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
> > > &busyloop_intr);
> > > /* On error, stop handling until the next kick. */
> > > @@ -577,14 +692,34 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
> > > break;
> > > }
> > > - vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
> > > - vq->heads[nvq->done_idx].len = 0;
> > > -
> > > total_len += len;
> > > - if (tx_can_batch(vq, total_len))
> > > - msg.msg_flags |= MSG_MORE;
> > > - else
> > > - msg.msg_flags &= ~MSG_MORE;
> > > +
> > > + /* For simplicity, TX batching is only enabled if
> > > + * sndbuf is unlimited.
> > What if sndbuf changes while this processing is going on?
>
> We will get the correct sndbuf in the next run of handle_tx(). I think this
> is safe.
If it's safe why bother with special-casing INT_MAX?
--
MST
^ permalink raw reply
* [PATCH v3 1/2] net: ethernet: i40e: fix build error
From: Wang Dongsheng @ 2018-09-07 11:19 UTC (permalink / raw)
To: jeffrey.t.kirsher, sergei.shtylyov
Cc: jacob.e.keller, davem, intel-wired-lan, netdev, linux-kernel,
Wang Dongsheng
In-Reply-To: <1536319175-3660-1-git-send-email-dongsheng.wang@hxt-semitech.com>
Remove "inline" from __i40e_add_stat_strings and move the function.
In file included from
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:9:0:
drivers/net/ethernet/intel/i40e/i40e_ethtool.c: In function
__i40e_add_stat_string:
drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h:193:20: error:
function __i40e_add_stat_strings can never be inlined because it uses
variable argument lists
static inline void __i40e_add_stat_strings(u8 **p, const struct
i40e_stats stats[],
Tested on: x86_64, make ARCH=i386
Modules section .text:
i40e: 00019380 <__i40e_add_stat_strings>:
i40evf: 00006b00 <__i40e_add_stat_strings>:
Buildin section .text:
i40e: c351ca60 <__i40e_add_stat_strings>:
i40evf: c354f2c0 <__i40e_add_stat_strings>:
Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
---
V3: add static
V2: Move function
---
.../net/ethernet/intel/i40e/i40e_ethtool.c | 24 ++++++++++++++++++
.../ethernet/intel/i40e/i40e_ethtool_stats.h | 25 ++-----------------
2 files changed, 26 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index d7d3974beca2..34121a72f2e7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1821,6 +1821,30 @@ static void i40e_get_ethtool_stats(struct net_device *netdev,
"ethtool stats count mismatch!");
}
+/**
+ * __i40e_add_stat_strings - copy stat strings into ethtool buffer
+ * @p: ethtool supplied buffer
+ * @stats: stat definitions array
+ * @size: size of the stats array
+ *
+ * Format and copy the strings described by stats into the buffer pointed at
+ * by p.
+ **/
+static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
+ const unsigned int size, ...)
+{
+ unsigned int i;
+
+ for (i = 0; i < size; i++) {
+ va_list args;
+
+ va_start(args, size);
+ vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
+ *p += ETH_GSTRING_LEN;
+ va_end(args);
+ }
+}
+
/**
* i40e_get_stat_strings - copy stat strings into supplied buffer
* @netdev: the netdev to collect strings for
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
index bba1cb0b658f..553b0d720839 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
@@ -181,29 +181,8 @@ i40e_add_queue_stats(u64 **data, struct i40e_ring *ring)
*data += size;
}
-/**
- * __i40e_add_stat_strings - copy stat strings into ethtool buffer
- * @p: ethtool supplied buffer
- * @stats: stat definitions array
- * @size: size of the stats array
- *
- * Format and copy the strings described by stats into the buffer pointed at
- * by p.
- **/
-static inline void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
- const unsigned int size, ...)
-{
- unsigned int i;
-
- for (i = 0; i < size; i++) {
- va_list args;
-
- va_start(args, size);
- vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
- *p += ETH_GSTRING_LEN;
- va_end(args);
- }
-}
+static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
+ const unsigned int size, ...);
/**
* 40e_add_stat_strings - copy stat strings into ethtool buffer
--
2.18.0
^ permalink raw reply related
* Re: [PATCH v3 2/2] net: ethernet: i40evf: fix underlying build error
From: Sergei Shtylyov @ 2018-09-07 15:33 UTC (permalink / raw)
To: Wang Dongsheng, jeffrey.t.kirsher
Cc: jacob.e.keller, davem, intel-wired-lan, netdev, linux-kernel
In-Reply-To: <1536319175-3660-3-git-send-email-dongsheng.wang@hxt-semitech.com>
On 09/07/2018 02:19 PM, Wang Dongsheng wrote:
> Can't have non-inline function in a header file.
> There is a risk of "Multiple definition" from cross-including.
>
> Tested on: x86_64, make ARCH=i386
>
> Modules section .text:
> i40e: 00019380 <__i40e_add_stat_strings>:
> i40evf: 00006b00 <__i40e_add_stat_strings>:
>
> Buildin section .text:
> i40e: c351ca60 <__i40e_add_stat_strings>:
> i40evf: c354f2c0 <__i40e_add_stat_strings>:
>
> Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
> ---
> V3: add static
> ---
> .../intel/i40evf/i40e_ethtool_stats.h | 23 +-----------------
> .../ethernet/intel/i40evf/i40evf_ethtool.c | 24 +++++++++++++++++++
> 2 files changed, 25 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
> index 60b595dd8c39..62ab67a77753 100644
> --- a/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
> +++ b/drivers/net/ethernet/intel/i40evf/i40e_ethtool_stats.h
> @@ -181,29 +181,8 @@ i40evf_add_queue_stats(u64 **data, struct i40e_ring *ring)
> *data += size;
> }
>
> -/**
> - * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> - * @p: ethtool supplied buffer
> - * @stats: stat definitions array
> - * @size: size of the stats array
> - *
> - * Format and copy the strings described by stats into the buffer pointed at
> - * by p.
> - **/
> static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
There's no point to keeping *static* function in the header file (unless it's
also *inline*).
> - const unsigned int size, ...)
> -{
> - unsigned int i;
> -
> - for (i = 0; i < size; i++) {
> - va_list args;
> -
> - va_start(args, size);
> - vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
> - *p += ETH_GSTRING_LEN;
> - va_end(args);
> - }
> -}
> + const unsigned int size, ...);
>
> /**
> * 40e_add_stat_strings - copy stat strings into ethtool buffer
[...]
MBR, Sergei
^ permalink raw reply
* Re: [PATCH v2 net-next 0/4] net: batched receive in GRO path
From: Edward Cree @ 2018-09-07 10:51 UTC (permalink / raw)
To: Eric Dumazet, davem; +Cc: linux-net-drivers, netdev
In-Reply-To: <01bc9bf5-1780-2650-958f-961bd24b8c26@gmail.com>
On 07/09/18 03:32, Eric Dumazet wrote:
> Your performance numbers are not convincing, since TCP stream test should
> get nominal GRO gains.
I'm not quite sure what you mean here, could you explain a bit more?
> Adding this complexity and icache pressure needs more experimental results.
>
> What about RPC workloads (eg 100 concurrent netperf -t TCP_RR -- -r 8000,8000 )
I'll try that. Any other tests that would be worthwhile?
-Ed
^ permalink raw reply
* RE: linux-next: build failure after merge of the net-next tree
From: Keller, Jacob E @ 2018-09-07 15:30 UTC (permalink / raw)
To: Stephen Rothwell, David Miller, Networking
Cc: Linux-Next Mailing List, Linux Kernel Mailing List,
Kirsher, Jeffrey T, Bowers, AndrewX
In-Reply-To: <20180907102055.001777b8@canb.auug.org.au>
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Stephen Rothwell
> Sent: Thursday, September 06, 2018 5:21 PM
> To: David Miller <davem@davemloft.net>; Networking
> <netdev@vger.kernel.org>
> Cc: Linux-Next Mailing List <linux-next@vger.kernel.org>; Linux Kernel Mailing
> List <linux-kernel@vger.kernel.org>; Keller, Jacob E <jacob.e.keller@intel.com>;
> Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; Bowers, AndrewX
> <andrewx.bowers@intel.com>
> Subject: Re: linux-next: build failure after merge of the net-next tree
>
> Hi all,
>
> On Mon, 3 Sep 2018 09:47:02 +1000 Stephen Rothwell <sfr@canb.auug.org.au>
> wrote:
> >
> > After merging the net-next tree, today's linux-next build (powerpc
> > ppc64_defconfig) failed like this:
> >
> > In file included from drivers/net/ethernet/intel/i40e/i40e_ethtool.c:9:
> > drivers/net/ethernet/intel/i40e/i40e_ethtool.c: In function
> '__i40e_add_stat_strings':
> > drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h:193:20: error: function
> '__i40e_add_stat_strings' can never be inlined because it uses variable argument
> lists
> > static inline void __i40e_add_stat_strings(u8 **p, const struct i40e_stats
> stats[],
> > ^~~~~~~~~~~~~~~~~~~~~~~
> >
> > Caused by commit
> >
> > 8fd75c58a09a ("i40e: move ethtool stats boiler plate code to
> i40e_ethtool_stats.h")
> >
> > It is not clear this patch has any value anyway as the moved functions
> > are only used in the file they were moved from.
> >
> > I reverted that commit for today.
> >
> > The same problem would exist in drivers/net/ethernet/intel/i40evf (where
> > a lot of code is duplicated from drivers/net/ethernet/intel/i40e) except
> > that this function is not declared inline there.
> > Luckily, i40e_ethtool_stats.h is only included my one file
> > drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c, otherwise there
> > would be multiple copies of __i40e_add_stat_strings().
> >
> > Surely there is some scope for factoring out some common code between
> > these two drivers?
>
> Ping?
>
> --
> Cheers,
> Stephen Rothwell
There's some discussion about this going on in the intel-wired-lan mailing list.
Thanks,
Jake
^ permalink raw reply
* RE: [PATCH v3 1/2] net: ethernet: i40e: fix build error
From: Keller, Jacob E @ 2018-09-07 15:27 UTC (permalink / raw)
To: Wang Dongsheng, Kirsher, Jeffrey T,
sergei.shtylyov@cogentembedded.com
Cc: davem@davemloft.net, intel-wired-lan@lists.osuosl.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <1536319175-3660-2-git-send-email-dongsheng.wang@hxt-semitech.com>
> -----Original Message-----
> From: Wang Dongsheng [mailto:dongsheng.wang@hxt-semitech.com]
> Sent: Friday, September 07, 2018 4:20 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
> sergei.shtylyov@cogentembedded.com
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; davem@davemloft.net; intel-
> wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Wang Dongsheng <dongsheng.wang@hxt-
> semitech.com>
> Subject: [PATCH v3 1/2] net: ethernet: i40e: fix build error
>
> Remove "inline" from __i40e_add_stat_strings and move the function.
>
> In file included from
> drivers/net/ethernet/intel/i40e/i40e_ethtool.c:9:0:
> drivers/net/ethernet/intel/i40e/i40e_ethtool.c: In function
> __i40e_add_stat_string:
> drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h:193:20: error:
> function __i40e_add_stat_strings can never be inlined because it uses
> variable argument lists
> static inline void __i40e_add_stat_strings(u8 **p, const struct
> i40e_stats stats[],
>
> Tested on: x86_64, make ARCH=i386
>
> Modules section .text:
> i40e: 00019380 <__i40e_add_stat_strings>:
> i40evf: 00006b00 <__i40e_add_stat_strings>:
>
> Buildin section .text:
> i40e: c351ca60 <__i40e_add_stat_strings>:
> i40evf: c354f2c0 <__i40e_add_stat_strings>:
>
> Signed-off-by: Wang Dongsheng <dongsheng.wang@hxt-semitech.com>
> ---
> V3: add static
> V2: Move function
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
This patch fixes the build issue, and is fine with me.
Thanks,
Jake
> ---
> .../net/ethernet/intel/i40e/i40e_ethtool.c | 24 ++++++++++++++++++
> .../ethernet/intel/i40e/i40e_ethtool_stats.h | 25 ++-----------------
> 2 files changed, 26 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index d7d3974beca2..34121a72f2e7 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -1821,6 +1821,30 @@ static void i40e_get_ethtool_stats(struct net_device
> *netdev,
> "ethtool stats count mismatch!");
> }
>
> +/**
> + * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> + * @p: ethtool supplied buffer
> + * @stats: stat definitions array
> + * @size: size of the stats array
> + *
> + * Format and copy the strings described by stats into the buffer pointed at
> + * by p.
> + **/
> +static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> + const unsigned int size, ...)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < size; i++) {
> + va_list args;
> +
> + va_start(args, size);
> + vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
> + *p += ETH_GSTRING_LEN;
> + va_end(args);
> + }
> +}
> +
> /**
> * i40e_get_stat_strings - copy stat strings into supplied buffer
> * @netdev: the netdev to collect strings for
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> index bba1cb0b658f..553b0d720839 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> @@ -181,29 +181,8 @@ i40e_add_queue_stats(u64 **data, struct i40e_ring
> *ring)
> *data += size;
> }
>
> -/**
> - * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> - * @p: ethtool supplied buffer
> - * @stats: stat definitions array
> - * @size: size of the stats array
> - *
> - * Format and copy the strings described by stats into the buffer pointed at
> - * by p.
> - **/
> -static inline void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> - const unsigned int size, ...)
> -{
> - unsigned int i;
> -
> - for (i = 0; i < size; i++) {
> - va_list args;
> -
> - va_start(args, size);
> - vsnprintf(*p, ETH_GSTRING_LEN, stats[i].stat_string, args);
> - *p += ETH_GSTRING_LEN;
> - va_end(args);
> - }
> -}
> +static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> + const unsigned int size, ...);
>
> /**
> * 40e_add_stat_strings - copy stat strings into ethtool buffer
> --
> 2.18.0
^ permalink raw reply
* RE: [PATCH v3 2/2] net: ethernet: i40evf: fix underlying build error
From: Wyborny, Carolyn @ 2018-09-07 15:26 UTC (permalink / raw)
To: Wang, Dongsheng, Kirsher, Jeffrey T,
sergei.shtylyov@cogentembedded.com
Cc: Keller, Jacob E, davem@davemloft.net,
intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org
In-Reply-To: <77898840b49847a7a835b20b383e21a2@HXTBJIDCEMVIW02.hxtcorp.net>
> -----Original Message-----
> From: Wang, Dongsheng [mailto:dongsheng.wang@hxt-semitech.com]
> Sent: Friday, September 07, 2018 5:34 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
> sergei.shtylyov@cogentembedded.com
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; davem@davemloft.net;
> intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; Wyborny, Carolyn <carolyn.wyborny@intel.com>
> Subject: Re: [PATCH v3 2/2] net: ethernet: i40evf: fix underlying build error
>
> Hello Jacob,
>
> Since Carolyn' team is working this, I think we don't need this patch
> anymore because this header file is only for ethtool.c.
> [..]
Hello Dongsheng,
The commonization effort we're working on is prioritizing our newest drivers. The i40e work is still being scoped, so we should fix this problem as needed now and not wait.
I apologize for any miscommunication. Was trying to let people know that we aware of the issue and are trying to make progress in that direction.
Thanks,
Carolyn
^ permalink raw reply
* Re: [PATCH v2 net-next 3/4] net: make listified RX functions return number of good packets
From: Edward Cree @ 2018-09-07 10:44 UTC (permalink / raw)
To: Eric Dumazet, davem; +Cc: linux-net-drivers, netdev
In-Reply-To: <2ed84271-06da-b5c2-3f23-940357880002@gmail.com>
On 07/09/18 03:27, Eric Dumazet wrote:
> On 09/06/2018 07:26 AM, Edward Cree wrote:
>> Signed-off-by: Edward Cree <ecree@solarflare.com>
> Lack of changelog here ?
>
> I do not know what is a good packet.
The comment on netif_receive_skb_list() defines this as "skbs for which
netif_receive_skb() would have returned %NET_RX_SUCCESS". But I shall put
that into the changelog as well.
> You are adding a lot of conditional expressions, that cpu
> will mispredict quite often.
I don't see an alternative, since this is needed by patch #4 and I daresay
there are other drivers that will also want to get NET_RX_SUCCESS-like
information (possibly from regular receives as well as GRO; I'm not quite
sure why sfc only cares about the latter).
> Typical micro benchmarks wont really notice.
Any suggestions on how to construct a test that will?
^ permalink raw reply
* [PATCH net-next] cxgb4: impose mandatory VLAN usage when non-zero TAG ID
From: Ganesh Goudar @ 2018-09-07 10:29 UTC (permalink / raw)
To: netdev, davem; +Cc: nirranjan, indranil, dt, Casey Leedom, Ganesh Goudar
From: Casey Leedom <leedom@chelsio.com>
When a non-zero VLAN Tag ID is passed to t4_set_vlan_acl()
then impose mandatory VLAN Usage with that VLAN ID.
I.e any other VLAN ID should result in packets getting
dropped.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 3 +++
drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 1 +
2 files changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 5fe5d16..6f1bd7b 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -10210,6 +10210,9 @@ int t4_set_vlan_acl(struct adapter *adap, unsigned int mbox, unsigned int vf,
vlan_cmd.en_to_len16 = cpu_to_be32(enable | FW_LEN16(vlan_cmd));
/* Drop all packets that donot match vlan id */
vlan_cmd.dropnovlan_fm = FW_ACL_VLAN_CMD_FM_F;
+ vlan_cmd.dropnovlan_fm = (enable
+ ? (FW_ACL_VLAN_CMD_DROPNOVLAN_F |
+ FW_ACL_VLAN_CMD_FM_F) : 0);
if (enable != 0) {
vlan_cmd.nvlan = 1;
vlan_cmd.vlanid[0] = cpu_to_be16(vlan);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index 5dc6c41..6d2bc87 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -2464,6 +2464,7 @@ struct fw_acl_vlan_cmd {
#define FW_ACL_VLAN_CMD_DROPNOVLAN_S 7
#define FW_ACL_VLAN_CMD_DROPNOVLAN_V(x) ((x) << FW_ACL_VLAN_CMD_DROPNOVLAN_S)
+#define FW_ACL_VLAN_CMD_DROPNOVLAN_F FW_ACL_VLAN_CMD_DROPNOVLAN_V(1U)
#define FW_ACL_VLAN_CMD_FM_S 6
#define FW_ACL_VLAN_CMD_FM_M 0x1
--
2.1.0
^ permalink raw reply related
* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
From: Peter Wu @ 2018-09-07 15:05 UTC (permalink / raw)
To: Daniel Drake
Cc: mika.westerberg-VuQAYsv1563Yd54FQh9/CA,
linux-pm-u79uwXL29TY76Z2rM5mHXA, linux-pci-u79uwXL29TY76Z2rM5mHXA,
rafael.j.wysocki-ral2JQCrhuEAvxtiuMwx3w,
nic_swsd-Rasf1IRRPZFBDgjK7y7TUQ,
keith.busch-ral2JQCrhuEAvxtiuMwx3w, netdev-u79uwXL29TY76Z2rM5mHXA,
nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
andriy.shevchenko-VuQAYsv1563Yd54FQh9/CA,
linux-6IF/jdPJHihWk0Htik3J/w, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
jonathan.derrick-ral2JQCrhuEAvxtiuMwx3w,
hkallweit1-Re5JQEeQqe8AvxtiuMwx3w
In-Reply-To: <20180907053614.6540-1-drake-6IF/jdPJHihWk0Htik3J/w@public.gmane.org>
On Fri, Sep 07, 2018 at 01:36:14PM +0800, Daniel Drake wrote:
<..>
> Thomas Martitz reports that this workaround also solves an issue where
> the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> after S3 suspend/resume.
Where was this claimed? It is not stated in the linked bug:
(https://bugs.freedesktop.org/show_bug.cgi?id=105760
> On resume, reprogram the PCI bridge prefetch registers, including the
> magic register mentioned above.
>
> This matches Win10 behaviour, which also rewrites these registers
> during S3 resume (checked with qemu tracing).
Windows 10 unconditionally rewrites these registers (BAR, I/O Base +
Limit, Memory Base + Limit, etc. from top to bottom), see annotations:
https://www.spinics.net/lists/linux-pci/msg75856.html
Linux has a generic "restore" operation that works backwards from the
end of the PCI config space to the beginning, see
pci_restore_config_space. Do you have a dmesg where you see the
"restoring config space at offset" messages?
Would it be reasonable to unconditionally write these registers in
pci_restore_config_dword, like Windows does?
Kind regards,
Peter
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
^ permalink raw reply
* Re: [PATCH 3/4] of: Convert to using %pOFn instead of device_node.name
From: Rob Herring @ 2018-09-07 14:58 UTC (permalink / raw)
To: Thierry Reding
Cc: Frank Rowand, devicetree, linux-kernel@vger.kernel.org,
Andrew Lunn, Florian Fainelli, netdev
In-Reply-To: <20180907122928.GA5821@ulmo>
On Fri, Sep 7, 2018 at 7:29 AM Thierry Reding <thierry.reding@gmail.com> wrote:
>
> On Tue, Aug 28, 2018 at 10:52:53AM -0500, Rob Herring wrote:
> > In preparation to remove the node name pointer from struct device_node,
> > convert printf users to use the %pOFn format specifier.
> >
> > Cc: Frank Rowand <frowand.list@gmail.com>
> > Cc: Andrew Lunn <andrew@lunn.ch>
> > Cc: Florian Fainelli <f.fainelli@gmail.com>
> > Cc: devicetree@vger.kernel.org
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Rob Herring <robh@kernel.org>
> > ---
> > drivers/of/device.c | 4 ++--
> > drivers/of/of_mdio.c | 12 ++++++------
> > drivers/of/of_numa.c | 4 ++--
> > drivers/of/overlay.c | 4 ++--
> > drivers/of/platform.c | 8 ++++----
> > drivers/of/unittest.c | 12 ++++++------
> > 6 files changed, 22 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/of/device.c b/drivers/of/device.c
> > index 5957cd4fa262..daa075d87317 100644
> > --- a/drivers/of/device.c
> > +++ b/drivers/of/device.c
> > @@ -219,7 +219,7 @@ static ssize_t of_device_get_modalias(struct device *dev, char *str, ssize_t len
> > return -ENODEV;
> >
> > /* Name & Type */
> > - csize = snprintf(str, len, "of:N%sT%s", dev->of_node->name,
> > + csize = snprintf(str, len, "of:N%pOFnT%s", dev->of_node,
> > dev->of_node->type);
> > tsize = csize;
> > len -= csize;
>
> This seems to cause the modalias to be improperly constructed. As a
> consequence, automatic module loading at boot time is now broken. I
> think the reason why this fails is because vsnprintf() will skip all
> alpha-numeric characters after a call to pointer(). Presumably this
> is meant to be a generic way of skipping whatever specifiers we throw
> at it.
>
> Unfortunately for the case of OF modaliases, this means that the 'T'
> character gets eaten, so we end up with something like this:
>
> # udevadm info /sys/bus/platform/devices/54200000.dc
> [...]
> E: MODALIAS=of:Ndc<NULL>Cnvidia,tegra124-dc
> [...]
>
> instead of this:
>
> # udevadm info /sys/bus/platform/devices/54200000.dc
> [...]
> E: MODALIAS=of:NdcT<NULL>Cnvidia,tegra124-dc
> [...]
Oops. Thanks for finding this.
> Everything is back to normal if I revert this patch. However, since
> that's obviously not what we want, I think perhaps what we need is a
> way for pointer() (and its implementations) to report back how many
> characters in the format string it consumed so that we can support
> these kinds of back-to-back strings.
I don't think we can change it because if I have something like
%pOFMoreWords and then later on want to add 'M' as a new modifier we'd
break any existing cases with 'M'. Of course, I could search the tree
for that case and find unused characters, but that seems fragile
(though silently throwing away the characters does too).
> If nobody else has the time I can look into coding up a fix, but in the
> meantime it might be best to back this one out until we can handle the
> OF modalias format string.
There's an easy fix though. Just replace the 'T' with a '%c'.
I found one other case in the clock code and one in soundbus (which I
haven't posted yet).
Rob
^ permalink raw reply
* Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()
From: Wolfgang Walter @ 2018-09-07 9:53 UTC (permalink / raw)
To: Steffen Klassert; +Cc: netdev, Wei Wang
In-Reply-To: <20180831065024.GG23674@gauss3.secunet.de>
Hello,
didn't respond as I've been on vacation.
Am Freitag, 31. August 2018, 08:50:24 schrieb Steffen Klassert:
> On Thu, Aug 30, 2018 at 08:53:50PM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > kernels > 4.12 do not work on one of our main routers. They crash as soon
> > as ipsec-tunnels are configured and ipsec-traffic actually flows.
>
> Can you please send the backtrace of this crash?
>
I'll try today. The oops quickly disappears because other problems arising
from it pop up. The machine crashes and no logs are logged. I try to make foto
or try to log to the serial console.
At the moment I only see that there is xfrm_???? stuff in the call trace as
xfrm_lookup, xfrm_route_????, and it is while routing a packet.
With later kernels (4.18.5) the machine seems to crash without a call trace on
console.
Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
^ permalink raw reply
* Re: [PATCH net-next 04/11] tuntap: simplify error handling in tun_build_skb()
From: Michael S. Tsirkin @ 2018-09-07 14:17 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <e1f17f51-5f99-df18-4185-6c6e4dfaba89@redhat.com>
On Fri, Sep 07, 2018 at 11:22:00AM +0800, Jason Wang wrote:
> > > @@ -1668,6 +1668,9 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
> > > if (copied != len)
> > > return ERR_PTR(-EFAULT);
> > > + get_page(alloc_frag->page);
> > > + alloc_frag->offset += buflen;
> > > +
> > This adds an atomic op on XDP_DROP which is a data path
> > operation for some workloads.
>
> Yes, I have patch on top to amortize this, the idea is to have a very big
> refcount once after the frag was allocated and maintain a bias and decrease
> them all when allocating new frags.'
Why bother with refcounting for a drop though? It should be simple.
--
MST
^ permalink raw reply
* Re: [PATCH net-next 06/11] tuntap: split out XDP logic
From: Michael S. Tsirkin @ 2018-09-07 14:16 UTC (permalink / raw)
To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization
In-Reply-To: <84ae17c1-f34b-610d-a5a1-ba8dce1732f3@redhat.com>
On Fri, Sep 07, 2018 at 11:29:34AM +0800, Jason Wang wrote:
> > > + if (act != XDP_PASS)
> > > + goto out;
> > likely?
>
> It depends on the XDP program, so I tend not to use it.
Point is XDP_PASS is already slow.
--
MST
^ permalink raw reply
* Re: [PATCH net-next v2 4/5] virtio_ring: add event idx support in packed ring
From: Michael S. Tsirkin @ 2018-09-07 14:10 UTC (permalink / raw)
To: Tiwei Bie
Cc: jasowang, virtualization, linux-kernel, netdev, virtio-dev, wexu,
jfreimann
In-Reply-To: <20180711022711.7090-5-tiwei.bie@intel.com>
On Wed, Jul 11, 2018 at 10:27:10AM +0800, Tiwei Bie wrote:
> This commit introduces the EVENT_IDX support in packed ring.
>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
Besides the usual comment about hard-coded constants like <<15:
does this actually do any good for performance? We don't
have to if we do not want to.
> ---
> drivers/virtio/virtio_ring.c | 73 ++++++++++++++++++++++++++++++++----
> 1 file changed, 65 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index f317b485ba54..f79a1e17f7d1 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1050,7 +1050,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - u16 flags;
> + u16 new, old, off_wrap, flags, wrap_counter, event_idx;
> bool needs_kick;
> u32 snapshot;
>
> @@ -1059,9 +1059,19 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> * suppressions. */
> virtio_mb(vq->weak_barriers);
>
> + old = vq->next_avail_idx - vq->num_added;
> + new = vq->next_avail_idx;
> + vq->num_added = 0;
> +
> snapshot = READ_ONCE(*(u32 *)vq->vring_packed.device);
> + off_wrap = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot & 0xffff));
> flags = virtio16_to_cpu(_vq->vdev, (__virtio16)(snapshot >> 16)) & 0x3;
some kind of struct union would be helpful to make this readable.
>
> + wrap_counter = off_wrap >> 15;
> + event_idx = off_wrap & ~(1 << 15);
> + if (wrap_counter != vq->avail_wrap_counter)
> + event_idx -= vq->vring_packed.num;
> +
> #ifdef DEBUG
> if (vq->last_add_time_valid) {
> WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> @@ -1070,7 +1080,10 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> vq->last_add_time_valid = false;
> #endif
>
> - needs_kick = (flags != VRING_EVENT_F_DISABLE);
> + if (flags == VRING_EVENT_F_DESC)
> + needs_kick = vring_need_event(event_idx, new, old);
> + else
> + needs_kick = (flags != VRING_EVENT_F_DISABLE);
> END_USE(vq);
> return needs_kick;
> }
> @@ -1185,6 +1198,15 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> ret = vq->desc_state_packed[id].data;
> detach_buf_packed(vq, id, ctx);
>
> + /* If we expect an interrupt for the next entry, tell host
> + * by writing event index and flush out the write before
> + * the read in the next get_buf call. */
> + if (vq->event_flags_shadow == VRING_EVENT_F_DESC)
> + virtio_store_mb(vq->weak_barriers,
> + &vq->vring_packed.driver->off_wrap,
> + cpu_to_virtio16(_vq->vdev, vq->last_used_idx |
> + ((u16)vq->used_wrap_counter << 15)));
> +
> #ifdef DEBUG
> vq->last_add_time_valid = false;
> #endif
> @@ -1213,8 +1235,18 @@ static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> /* We optimistically turn back on interrupts, then check if there was
> * more to do. */
>
> + if (vq->event) {
> + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> + vq->last_used_idx |
> + ((u16)vq->used_wrap_counter << 15));
> + /* We need to update event offset and event wrap
> + * counter first before updating event flags. */
> + virtio_wmb(vq->weak_barriers);
> + }
> +
> if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> + VRING_EVENT_F_ENABLE;
> vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> vq->event_flags_shadow);
> }
> @@ -1238,22 +1270,47 @@ static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned off_wrap)
> static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 bufs, used_idx, wrap_counter;
>
> START_USE(vq);
>
> /* We optimistically turn back on interrupts, then check if there was
> * more to do. */
>
> + if (vq->event) {
> + /* TODO: tune this threshold */
> + bufs = (vq->vring_packed.num - _vq->num_free) * 3 / 4;
> + wrap_counter = vq->used_wrap_counter;
> +
> + used_idx = vq->last_used_idx + bufs;
> + if (used_idx >= vq->vring_packed.num) {
> + used_idx -= vq->vring_packed.num;
> + wrap_counter ^= 1;
> + }
> +
> + vq->vring_packed.driver->off_wrap = cpu_to_virtio16(_vq->vdev,
> + used_idx | (wrap_counter << 15));
> +
> + /* We need to update event offset and event wrap
> + * counter first before updating event flags. */
> + virtio_wmb(vq->weak_barriers);
> + } else {
> + used_idx = vq->last_used_idx;
> + wrap_counter = vq->used_wrap_counter;
> + }
> +
> if (vq->event_flags_shadow == VRING_EVENT_F_DISABLE) {
> - vq->event_flags_shadow = VRING_EVENT_F_ENABLE;
> + vq->event_flags_shadow = vq->event ? VRING_EVENT_F_DESC :
> + VRING_EVENT_F_ENABLE;
> vq->vring_packed.driver->flags = cpu_to_virtio16(_vq->vdev,
> vq->event_flags_shadow);
> - /* We need to enable interrupts first before re-checking
> - * for more used buffers. */
> - virtio_mb(vq->weak_barriers);
> }
>
> - if (more_used_packed(vq)) {
> + /* We need to update event suppression structure first
> + * before re-checking for more used buffers. */
> + virtio_mb(vq->weak_barriers);
> +
mb is expensive. We should not do it if we changed nothing.
> + if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
> END_USE(vq);
> return false;
> }
> --
> 2.18.0
^ permalink raw reply
* Re: [PATCH net-next] net: sched: cls_flower: dump offload count value
From: Vlad Buslov @ 2018-09-07 9:28 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem
In-Reply-To: <20180907111113.100dfa97@cakuba>
On Fri 07 Sep 2018 at 09:11, Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
> On Thu, 6 Sep 2018 18:37:23 +0300, Vlad Buslov wrote:
>> Change flower in_hw_count type to fixed-size u32 and dump it as
>> TCA_FLOWER_IN_HW_COUNT. This change is necessary to properly test shared
>> blocks and re-offload functionality.
>>
>> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>> Acked-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>> include/net/sch_generic.h | 2 +-
>> include/uapi/linux/pkt_cls.h | 2 ++
>> net/sched/cls_flower.c | 5 ++++-
>> 3 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
>> index a6d00093f35e..d68ac55539a5 100644
>> --- a/include/net/sch_generic.h
>> +++ b/include/net/sch_generic.h
>> @@ -362,7 +362,7 @@ static inline void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
>> }
>>
>> static inline void
>> -tc_cls_offload_cnt_update(struct tcf_block *block, unsigned int *cnt,
>> +tc_cls_offload_cnt_update(struct tcf_block *block, u32 *cnt,
>> u32 *flags, bool add)
>> {
>> if (add) {
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index be382fb0592d..2824fb7ed1c9 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -483,6 +483,8 @@ enum {
>> TCA_FLOWER_KEY_ENC_OPTS,
>> TCA_FLOWER_KEY_ENC_OPTS_MASK,
>>
>> + TCA_FLOWER_IN_HW_COUNT, /* be32 */
>
> Why be32?
This comment is wrong.
Thanks for catching this.
>
> I wish there was a good way to share this attribute between
> classifiers :(
>
>> __TCA_FLOWER_MAX,
>> };
>>
>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> index 6fd9bdd93796..4b8dd37dd4f8 100644
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -98,7 +98,7 @@ struct cls_fl_filter {
>> struct list_head list;
>> u32 handle;
>> u32 flags;
>> - unsigned int in_hw_count;
>> + u32 in_hw_count;
>> struct rcu_work rwork;
>> struct net_device *hw_dev;
>> };
>> @@ -1880,6 +1880,9 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, void *fh,
>> if (f->flags && nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags))
>> goto nla_put_failure;
>>
>> + if (nla_put_u32(skb, TCA_FLOWER_IN_HW_COUNT, f->in_hw_count))
>> + goto nla_put_failure;
>> +
>> if (tcf_exts_dump(skb, &f->exts))
>> goto nla_put_failure;
>>
^ permalink raw reply
* Re: [PATCH net-next v2 2/5] virtio_ring: support creating packed ring
From: Michael S. Tsirkin @ 2018-09-07 14:03 UTC (permalink / raw)
To: Tiwei Bie
Cc: jasowang, virtualization, linux-kernel, netdev, virtio-dev, wexu,
jfreimann
In-Reply-To: <20180711022711.7090-3-tiwei.bie@intel.com>
On Wed, Jul 11, 2018 at 10:27:08AM +0800, Tiwei Bie wrote:
> This commit introduces the support for creating packed ring.
> All split ring specific functions are added _split suffix.
> Some necessary stubs for packed ring are also added.
>
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
I'd rather have a patch just renaming split functions, then
add all packed stuff in as a separate patch on top.
> ---
> drivers/virtio/virtio_ring.c | 801 +++++++++++++++++++++++------------
> include/linux/virtio_ring.h | 8 +-
> 2 files changed, 546 insertions(+), 263 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 814b395007b2..c4f8abc7445a 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -60,11 +60,15 @@ struct vring_desc_state {
> struct vring_desc *indir_desc; /* Indirect descriptor, if any. */
> };
>
> +struct vring_desc_state_packed {
> + int next; /* The next desc state. */
So this can go away with IN_ORDER?
> +};
> +
> struct vring_virtqueue {
> struct virtqueue vq;
>
> - /* Actual memory layout for this queue */
> - struct vring vring;
> + /* Is this a packed ring? */
> + bool packed;
>
> /* Can we use weak barriers? */
> bool weak_barriers;
> @@ -86,11 +90,39 @@ struct vring_virtqueue {
> /* Last used index we've seen. */
> u16 last_used_idx;
>
> - /* Last written value to avail->flags */
> - u16 avail_flags_shadow;
> + union {
> + /* Available for split ring */
> + struct {
> + /* Actual memory layout for this queue. */
> + struct vring vring;
>
> - /* Last written value to avail->idx in guest byte order */
> - u16 avail_idx_shadow;
> + /* Last written value to avail->flags */
> + u16 avail_flags_shadow;
> +
> + /* Last written value to avail->idx in
> + * guest byte order. */
> + u16 avail_idx_shadow;
> + };
Name this field split so it's easier to detect misuse of e.g.
packed fields in split code?
> +
> + /* Available for packed ring */
> + struct {
> + /* Actual memory layout for this queue. */
> + struct vring_packed vring_packed;
> +
> + /* Driver ring wrap counter. */
> + bool avail_wrap_counter;
> +
> + /* Device ring wrap counter. */
> + bool used_wrap_counter;
> +
> + /* Index of the next avail descriptor. */
> + u16 next_avail_idx;
> +
> + /* Last written value to driver->flags in
> + * guest byte order. */
> + u16 event_flags_shadow;
> + };
> + };
>
> /* How to notify other side. FIXME: commonalize hcalls! */
> bool (*notify)(struct virtqueue *vq);
> @@ -110,11 +142,24 @@ struct vring_virtqueue {
> #endif
>
> /* Per-descriptor state. */
> - struct vring_desc_state desc_state[];
> + union {
> + struct vring_desc_state desc_state[1];
> + struct vring_desc_state_packed desc_state_packed[1];
> + };
> };
>
> #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
>
> +static inline bool virtqueue_use_indirect(struct virtqueue *_vq,
> + unsigned int total_sg)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + /* If the host supports indirect descriptor tables, and we have multiple
> + * buffers, then go indirect. FIXME: tune this threshold */
> + return (vq->indirect && total_sg > 1 && vq->vq.num_free);
> +}
> +
> /*
> * Modern virtio devices have feature bits to specify whether they need a
> * quirk and bypass the IOMMU. If not there, just use the DMA API.
> @@ -200,8 +245,17 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
> cpu_addr, size, direction);
> }
>
> -static void vring_unmap_one(const struct vring_virtqueue *vq,
> - struct vring_desc *desc)
> +static int vring_mapping_error(const struct vring_virtqueue *vq,
> + dma_addr_t addr)
> +{
> + if (!vring_use_dma_api(vq->vq.vdev))
> + return 0;
> +
> + return dma_mapping_error(vring_dma_dev(vq), addr);
> +}
> +
> +static void vring_unmap_one_split(const struct vring_virtqueue *vq,
> + struct vring_desc *desc)
> {
> u16 flags;
>
> @@ -225,17 +279,9 @@ static void vring_unmap_one(const struct vring_virtqueue *vq,
> }
> }
>
> -static int vring_mapping_error(const struct vring_virtqueue *vq,
> - dma_addr_t addr)
> -{
> - if (!vring_use_dma_api(vq->vq.vdev))
> - return 0;
> -
> - return dma_mapping_error(vring_dma_dev(vq), addr);
> -}
> -
> -static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> - unsigned int total_sg, gfp_t gfp)
> +static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> + unsigned int total_sg,
> + gfp_t gfp)
> {
> struct vring_desc *desc;
> unsigned int i;
> @@ -256,14 +302,14 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> return desc;
> }
>
> -static inline int virtqueue_add(struct virtqueue *_vq,
> - struct scatterlist *sgs[],
> - unsigned int total_sg,
> - unsigned int out_sgs,
> - unsigned int in_sgs,
> - void *data,
> - void *ctx,
> - gfp_t gfp)
> +static inline int virtqueue_add_split(struct virtqueue *_vq,
> + struct scatterlist *sgs[],
> + unsigned int total_sg,
> + unsigned int out_sgs,
> + unsigned int in_sgs,
> + void *data,
> + void *ctx,
> + gfp_t gfp)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> struct scatterlist *sg;
> @@ -299,10 +345,8 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>
> head = vq->free_head;
>
> - /* If the host supports indirect descriptor tables, and we have multiple
> - * buffers, then go indirect. FIXME: tune this threshold */
> - if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> - desc = alloc_indirect(_vq, total_sg, gfp);
> + if (virtqueue_use_indirect(_vq, total_sg))
> + desc = alloc_indirect_split(_vq, total_sg, gfp);
> else {
> desc = NULL;
> WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> @@ -423,7 +467,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> for (n = 0; n < total_sg; n++) {
> if (i == err_idx)
> break;
> - vring_unmap_one(vq, &desc[i]);
> + vring_unmap_one_split(vq, &desc[i]);
> i = virtio16_to_cpu(_vq->vdev, vq->vring.desc[i].next);
> }
>
> @@ -434,6 +478,355 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> return -EIO;
> }
>
> +static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 new, old;
> + bool needs_kick;
> +
> + START_USE(vq);
> + /* We need to expose available array entries before checking avail
> + * event. */
> + virtio_mb(vq->weak_barriers);
> +
> + old = vq->avail_idx_shadow - vq->num_added;
> + new = vq->avail_idx_shadow;
> + vq->num_added = 0;
> +
> +#ifdef DEBUG
> + if (vq->last_add_time_valid) {
> + WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> + vq->last_add_time)) > 100);
> + }
> + vq->last_add_time_valid = false;
> +#endif
> +
> + if (vq->event) {
> + needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev, vring_avail_event(&vq->vring)),
> + new, old);
> + } else {
> + needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
> + }
> + END_USE(vq);
> + return needs_kick;
> +}
> +
> +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> + void **ctx)
> +{
> + unsigned int i, j;
> + __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> +
> + /* Clear data ptr. */
> + vq->desc_state[head].data = NULL;
> +
> + /* Put back on free list: unmap first-level descriptors and find end */
> + i = head;
> +
> + while (vq->vring.desc[i].flags & nextflag) {
> + vring_unmap_one_split(vq, &vq->vring.desc[i]);
> + i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
> + vq->vq.num_free++;
> + }
> +
> + vring_unmap_one_split(vq, &vq->vring.desc[i]);
> + vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
> + vq->free_head = head;
> +
> + /* Plus final descriptor */
> + vq->vq.num_free++;
> +
> + if (vq->indirect) {
> + struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
> + u32 len;
> +
> + /* Free the indirect table, if any, now that it's unmapped. */
> + if (!indir_desc)
> + return;
> +
> + len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);
> +
> + BUG_ON(!(vq->vring.desc[head].flags &
> + cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> + BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> +
> + for (j = 0; j < len / sizeof(struct vring_desc); j++)
> + vring_unmap_one_split(vq, &indir_desc[j]);
> +
> + kfree(indir_desc);
> + vq->desc_state[head].indir_desc = NULL;
> + } else if (ctx) {
> + *ctx = vq->desc_state[head].indir_desc;
> + }
> +}
> +
> +static inline bool more_used_split(const struct vring_virtqueue *vq)
> +{
> + return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> +}
> +
> +static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> + unsigned int *len,
> + void **ctx)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + void *ret;
> + unsigned int i;
> + u16 last_used;
> +
> + START_USE(vq);
> +
> + if (unlikely(vq->broken)) {
> + END_USE(vq);
> + return NULL;
> + }
> +
> + if (!more_used_split(vq)) {
> + pr_debug("No more buffers in queue\n");
> + END_USE(vq);
> + return NULL;
> + }
> +
> + /* Only get used array entries after they have been exposed by host. */
> + virtio_rmb(vq->weak_barriers);
> +
> + last_used = (vq->last_used_idx & (vq->vring.num - 1));
> + i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
> + *len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
> +
> + if (unlikely(i >= vq->vring.num)) {
> + BAD_RING(vq, "id %u out of range\n", i);
> + return NULL;
> + }
> + if (unlikely(!vq->desc_state[i].data)) {
> + BAD_RING(vq, "id %u is not a head!\n", i);
> + return NULL;
> + }
> +
> + /* detach_buf_split clears data, so grab it now. */
> + ret = vq->desc_state[i].data;
> + detach_buf_split(vq, i, ctx);
> + vq->last_used_idx++;
> + /* If we expect an interrupt for the next entry, tell host
> + * by writing event index and flush out the write before
> + * the read in the next get_buf call. */
> + if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
> + virtio_store_mb(vq->weak_barriers,
> + &vring_used_event(&vq->vring),
> + cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
> +
> +#ifdef DEBUG
> + vq->last_add_time_valid = false;
> +#endif
> +
> + END_USE(vq);
> + return ret;
> +}
> +
> +static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> + vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> + if (!vq->event)
> + vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> + }
> +}
> +
> +static unsigned virtqueue_enable_cb_prepare_split(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 last_used_idx;
> +
> + START_USE(vq);
> +
> + /* We optimistically turn back on interrupts, then check if there was
> + * more to do. */
> + /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> + * either clear the flags bit or point the event index at the next
> + * entry. Always do both to keep code simple. */
> + if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> + vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> + if (!vq->event)
> + vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> + }
> + vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
> + END_USE(vq);
> + return last_used_idx;
> +}
> +
> +static bool virtqueue_poll_split(struct virtqueue *_vq, unsigned last_used_idx)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + virtio_mb(vq->weak_barriers);
> + return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
> +}
> +
> +static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + u16 bufs;
> +
> + START_USE(vq);
> +
> + /* We optimistically turn back on interrupts, then check if there was
> + * more to do. */
> + /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> + * either clear the flags bit or point the event index at the next
> + * entry. Always update the event index to keep code simple. */
> + if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> + vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> + if (!vq->event)
> + vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> + }
> + /* TODO: tune this threshold */
> + bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
> +
> + virtio_store_mb(vq->weak_barriers,
> + &vring_used_event(&vq->vring),
> + cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
> +
> + if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
> + END_USE(vq);
> + return false;
> + }
> +
> + END_USE(vq);
> + return true;
> +}
> +
> +static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> + unsigned int i;
> + void *buf;
> +
> + START_USE(vq);
> +
> + for (i = 0; i < vq->vring.num; i++) {
> + if (!vq->desc_state[i].data)
> + continue;
> + /* detach_buf clears data, so grab it now. */
> + buf = vq->desc_state[i].data;
> + detach_buf_split(vq, i, NULL);
> + vq->avail_idx_shadow--;
> + vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
> + END_USE(vq);
> + return buf;
> + }
> + /* That should have freed everything. */
> + BUG_ON(vq->vq.num_free != vq->vring.num);
> +
> + END_USE(vq);
> + return NULL;
> +}
> +
> +/*
> + * The layout for the packed ring is a continuous chunk of memory
> + * which looks like this.
> + *
> + * struct vring_packed {
> + * // The actual descriptors (16 bytes each)
> + * struct vring_packed_desc desc[num];
> + *
> + * // Padding to the next align boundary.
> + * char pad[];
> + *
> + * // Driver Event Suppression
> + * struct vring_packed_desc_event driver;
> + *
> + * // Device Event Suppression
> + * struct vring_packed_desc_event device;
> + * };
> + */
Why not just allocate event structures separately?
Is it a win to have them share a cache line for some reason?
> +static inline void vring_init_packed(struct vring_packed *vr, unsigned int num,
> + void *p, unsigned long align)
> +{
> + vr->num = num;
> + vr->desc = p;
> + vr->driver = (void *)ALIGN(((uintptr_t)p +
> + sizeof(struct vring_packed_desc) * num), align);
> + vr->device = vr->driver + 1;
> +}
What's all this about alignment? Where does it come from?
> +
> +static inline unsigned vring_size_packed(unsigned int num, unsigned long align)
> +{
> + return ((sizeof(struct vring_packed_desc) * num + align - 1)
> + & ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
> +}
> +
> +static inline int virtqueue_add_packed(struct virtqueue *_vq,
> + struct scatterlist *sgs[],
> + unsigned int total_sg,
> + unsigned int out_sgs,
> + unsigned int in_sgs,
> + void *data,
> + void *ctx,
> + gfp_t gfp)
> +{
> + return -EIO;
> +}
> +
> +static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
> +{
> + return false;
> +}
> +
> +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> +{
> + return false;
> +}
> +
> +static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> + unsigned int *len,
> + void **ctx)
> +{
> + return NULL;
> +}
> +
> +static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
> +{
> +}
> +
> +static unsigned virtqueue_enable_cb_prepare_packed(struct virtqueue *_vq)
> +{
> + return 0;
> +}
> +
> +static bool virtqueue_poll_packed(struct virtqueue *_vq, unsigned last_used_idx)
> +{
> + return false;
> +}
> +
> +static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
> +{
> + return false;
> +}
> +
> +static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
> +{
> + return NULL;
> +}
> +
> +static inline int virtqueue_add(struct virtqueue *_vq,
> + struct scatterlist *sgs[],
> + unsigned int total_sg,
> + unsigned int out_sgs,
> + unsigned int in_sgs,
> + void *data,
> + void *ctx,
> + gfp_t gfp)
> +{
> + struct vring_virtqueue *vq = to_vvq(_vq);
> +
> + return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
> + in_sgs, data, ctx, gfp) :
> + virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
> + in_sgs, data, ctx, gfp);
> +}
> +
> /**
> * virtqueue_add_sgs - expose buffers to other end
> * @vq: the struct virtqueue we're talking about.
> @@ -550,34 +943,9 @@ EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
> bool virtqueue_kick_prepare(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - u16 new, old;
> - bool needs_kick;
>
> - START_USE(vq);
> - /* We need to expose available array entries before checking avail
> - * event. */
> - virtio_mb(vq->weak_barriers);
> -
> - old = vq->avail_idx_shadow - vq->num_added;
> - new = vq->avail_idx_shadow;
> - vq->num_added = 0;
> -
> -#ifdef DEBUG
> - if (vq->last_add_time_valid) {
> - WARN_ON(ktime_to_ms(ktime_sub(ktime_get(),
> - vq->last_add_time)) > 100);
> - }
> - vq->last_add_time_valid = false;
> -#endif
> -
> - if (vq->event) {
> - needs_kick = vring_need_event(virtio16_to_cpu(_vq->vdev, vring_avail_event(&vq->vring)),
> - new, old);
> - } else {
> - needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
> - }
> - END_USE(vq);
> - return needs_kick;
> + return vq->packed ? virtqueue_kick_prepare_packed(_vq) :
> + virtqueue_kick_prepare_split(_vq);
> }
> EXPORT_SYMBOL_GPL(virtqueue_kick_prepare);
>
> @@ -625,58 +993,9 @@ bool virtqueue_kick(struct virtqueue *vq)
> }
> EXPORT_SYMBOL_GPL(virtqueue_kick);
>
> -static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> - void **ctx)
> -{
> - unsigned int i, j;
> - __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> -
> - /* Clear data ptr. */
> - vq->desc_state[head].data = NULL;
> -
> - /* Put back on free list: unmap first-level descriptors and find end */
> - i = head;
> -
> - while (vq->vring.desc[i].flags & nextflag) {
> - vring_unmap_one(vq, &vq->vring.desc[i]);
> - i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next);
> - vq->vq.num_free++;
> - }
> -
> - vring_unmap_one(vq, &vq->vring.desc[i]);
> - vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head);
> - vq->free_head = head;
> -
> - /* Plus final descriptor */
> - vq->vq.num_free++;
> -
> - if (vq->indirect) {
> - struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
> - u32 len;
> -
> - /* Free the indirect table, if any, now that it's unmapped. */
> - if (!indir_desc)
> - return;
> -
> - len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);
> -
> - BUG_ON(!(vq->vring.desc[head].flags &
> - cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> - BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> -
> - for (j = 0; j < len / sizeof(struct vring_desc); j++)
> - vring_unmap_one(vq, &indir_desc[j]);
> -
> - kfree(indir_desc);
> - vq->desc_state[head].indir_desc = NULL;
> - } else if (ctx) {
> - *ctx = vq->desc_state[head].indir_desc;
> - }
> -}
> -
> static inline bool more_used(const struct vring_virtqueue *vq)
> {
> - return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> + return vq->packed ? more_used_packed(vq) : more_used_split(vq);
> }
>
> /**
> @@ -699,57 +1018,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> void **ctx)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - void *ret;
> - unsigned int i;
> - u16 last_used;
>
> - START_USE(vq);
> -
> - if (unlikely(vq->broken)) {
> - END_USE(vq);
> - return NULL;
> - }
> -
> - if (!more_used(vq)) {
> - pr_debug("No more buffers in queue\n");
> - END_USE(vq);
> - return NULL;
> - }
> -
> - /* Only get used array entries after they have been exposed by host. */
> - virtio_rmb(vq->weak_barriers);
> -
> - last_used = (vq->last_used_idx & (vq->vring.num - 1));
> - i = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].id);
> - *len = virtio32_to_cpu(_vq->vdev, vq->vring.used->ring[last_used].len);
> -
> - if (unlikely(i >= vq->vring.num)) {
> - BAD_RING(vq, "id %u out of range\n", i);
> - return NULL;
> - }
> - if (unlikely(!vq->desc_state[i].data)) {
> - BAD_RING(vq, "id %u is not a head!\n", i);
> - return NULL;
> - }
> -
> - /* detach_buf clears data, so grab it now. */
> - ret = vq->desc_state[i].data;
> - detach_buf(vq, i, ctx);
> - vq->last_used_idx++;
> - /* If we expect an interrupt for the next entry, tell host
> - * by writing event index and flush out the write before
> - * the read in the next get_buf call. */
> - if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
> - virtio_store_mb(vq->weak_barriers,
> - &vring_used_event(&vq->vring),
> - cpu_to_virtio16(_vq->vdev, vq->last_used_idx));
> -
> -#ifdef DEBUG
> - vq->last_add_time_valid = false;
> -#endif
> -
> - END_USE(vq);
> - return ret;
> + return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> + virtqueue_get_buf_ctx_split(_vq, len, ctx);
> }
> EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
>
> @@ -771,12 +1042,10 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> - if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> - vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> - if (!vq->event)
> - vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> - }
> -
> + if (vq->packed)
> + virtqueue_disable_cb_packed(_vq);
> + else
> + virtqueue_disable_cb_split(_vq);
> }
> EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
>
> @@ -795,23 +1064,9 @@ EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
> unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - u16 last_used_idx;
>
> - START_USE(vq);
> -
> - /* We optimistically turn back on interrupts, then check if there was
> - * more to do. */
> - /* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> - * either clear the flags bit or point the event index at the next
> - * entry. Always do both to keep code simple. */
> - if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> - vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> - if (!vq->event)
> - vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> - }
> - vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
> - END_USE(vq);
> - return last_used_idx;
> + return vq->packed ? virtqueue_enable_cb_prepare_packed(_vq) :
> + virtqueue_enable_cb_prepare_split(_vq);
> }
> EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>
> @@ -828,8 +1083,8 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> - virtio_mb(vq->weak_barriers);
> - return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
> + return vq->packed ? virtqueue_poll_packed(_vq, last_used_idx) :
> + virtqueue_poll_split(_vq, last_used_idx);
> }
> EXPORT_SYMBOL_GPL(virtqueue_poll);
>
> @@ -867,34 +1122,9 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb);
> bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - u16 bufs;
>
> - START_USE(vq);
> -
> - /* We optimistically turn back on interrupts, then check if there was
> - * more to do. */
> - /* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> - * either clear the flags bit or point the event index at the next
> - * entry. Always update the event index to keep code simple. */
> - if (vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT) {
> - vq->avail_flags_shadow &= ~VRING_AVAIL_F_NO_INTERRUPT;
> - if (!vq->event)
> - vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> - }
> - /* TODO: tune this threshold */
> - bufs = (u16)(vq->avail_idx_shadow - vq->last_used_idx) * 3 / 4;
> -
> - virtio_store_mb(vq->weak_barriers,
> - &vring_used_event(&vq->vring),
> - cpu_to_virtio16(_vq->vdev, vq->last_used_idx + bufs));
> -
> - if (unlikely((u16)(virtio16_to_cpu(_vq->vdev, vq->vring.used->idx) - vq->last_used_idx) > bufs)) {
> - END_USE(vq);
> - return false;
> - }
> -
> - END_USE(vq);
> - return true;
> + return vq->packed ? virtqueue_enable_cb_delayed_packed(_vq) :
> + virtqueue_enable_cb_delayed_split(_vq);
> }
> EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
>
> @@ -909,27 +1139,9 @@ EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
> void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> {
> struct vring_virtqueue *vq = to_vvq(_vq);
> - unsigned int i;
> - void *buf;
>
> - START_USE(vq);
> -
> - for (i = 0; i < vq->vring.num; i++) {
> - if (!vq->desc_state[i].data)
> - continue;
> - /* detach_buf clears data, so grab it now. */
> - buf = vq->desc_state[i].data;
> - detach_buf(vq, i, NULL);
> - vq->avail_idx_shadow--;
> - vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
> - END_USE(vq);
> - return buf;
> - }
> - /* That should have freed everything. */
> - BUG_ON(vq->vq.num_free != vq->vring.num);
> -
> - END_USE(vq);
> - return NULL;
> + return vq->packed ? virtqueue_detach_unused_buf_packed(_vq) :
> + virtqueue_detach_unused_buf_split(_vq);
> }
> EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
>
> @@ -954,7 +1166,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> EXPORT_SYMBOL_GPL(vring_interrupt);
>
> struct virtqueue *__vring_new_virtqueue(unsigned int index,
> - struct vring vring,
> + union vring_union vring,
> + bool packed,
> struct virtio_device *vdev,
> bool weak_barriers,
> bool context,
> @@ -962,19 +1175,22 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> void (*callback)(struct virtqueue *),
> const char *name)
> {
> - unsigned int i;
> struct vring_virtqueue *vq;
> + unsigned int num, i;
> + size_t size;
>
> - vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
> - GFP_KERNEL);
> + num = packed ? vring.vring_packed.num : vring.vring_split.num;
> + size = packed ? num * sizeof(struct vring_desc_state_packed) :
> + num * sizeof(struct vring_desc_state);
> +
> + vq = kmalloc(sizeof(*vq) + size, GFP_KERNEL);
> if (!vq)
> return NULL;
>
> - vq->vring = vring;
> vq->vq.callback = callback;
> vq->vq.vdev = vdev;
> vq->vq.name = name;
> - vq->vq.num_free = vring.num;
> + vq->vq.num_free = num;
> vq->vq.index = index;
> vq->we_own_ring = false;
> vq->queue_dma_addr = 0;
> @@ -983,9 +1199,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> vq->weak_barriers = weak_barriers;
> vq->broken = false;
> vq->last_used_idx = 0;
> - vq->avail_flags_shadow = 0;
> - vq->avail_idx_shadow = 0;
> vq->num_added = 0;
> + vq->packed = packed;
> list_add_tail(&vq->vq.list, &vdev->vqs);
> #ifdef DEBUG
> vq->in_use = false;
> @@ -996,19 +1211,48 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> !context;
> vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
>
> + if (vq->packed) {
> + vq->vring_packed = vring.vring_packed;
> + vq->next_avail_idx = 0;
> + vq->avail_wrap_counter = 1;
> + vq->used_wrap_counter = 1;
> + vq->event_flags_shadow = 0;
> +
> + memset(vq->desc_state_packed, 0,
> + num * sizeof(struct vring_desc_state_packed));
> +
> + /* Put everything in free lists. */
> + vq->free_head = 0;
> + for (i = 0; i < num-1; i++)
> + vq->desc_state_packed[i].next = i + 1;
> + } else {
> + vq->vring = vring.vring_split;
> + vq->avail_flags_shadow = 0;
> + vq->avail_idx_shadow = 0;
> +
> + /* Put everything in free lists. */
> + vq->free_head = 0;
> + for (i = 0; i < num-1; i++)
> + vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> +
> + memset(vq->desc_state, 0,
> + num * sizeof(struct vring_desc_state));
> + }
> +
> /* No callback? Tell other side not to bother us. */
> if (!callback) {
> - vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> - if (!vq->event)
> - vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
> + if (packed) {
> + vq->event_flags_shadow = VRING_EVENT_F_DISABLE;
> + vq->vring_packed.driver->flags = cpu_to_virtio16(vdev,
> + vq->event_flags_shadow);
> + } else {
> + vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> + if (!vq->event)
> + vq->vring.avail->flags = cpu_to_virtio16(vdev,
> + vq->avail_flags_shadow);
> + }
> }
>
> - /* Put everything in free lists. */
> - vq->free_head = 0;
> - for (i = 0; i < vring.num-1; i++)
> - vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> - memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
> -
> return &vq->vq;
> }
> EXPORT_SYMBOL_GPL(__vring_new_virtqueue);
> @@ -1055,6 +1299,12 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
> }
> }
>
> +static inline int
> +__vring_size(unsigned int num, unsigned long align, bool packed)
> +{
> + return packed ? vring_size_packed(num, align) : vring_size(num, align);
> +}
> +
> struct virtqueue *vring_create_virtqueue(
> unsigned int index,
> unsigned int num,
> @@ -1071,7 +1321,8 @@ struct virtqueue *vring_create_virtqueue(
> void *queue = NULL;
> dma_addr_t dma_addr;
> size_t queue_size_in_bytes;
> - struct vring vring;
> + union vring_union vring;
> + bool packed;
>
> /* We assume num is a power of 2. */
> if (num & (num - 1)) {
> @@ -1079,9 +1330,13 @@ struct virtqueue *vring_create_virtqueue(
> return NULL;
> }
>
> + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> +
> /* TODO: allocate each queue chunk individually */
> - for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
> - queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> + for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
> + num /= 2) {
> + queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> + packed),
> &dma_addr,
> GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
> if (queue)
> @@ -1093,17 +1348,21 @@ struct virtqueue *vring_create_virtqueue(
>
> if (!queue) {
> /* Try to get a single page. You are my only hope! */
> - queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> + queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> + packed),
> &dma_addr, GFP_KERNEL|__GFP_ZERO);
> }
> if (!queue)
> return NULL;
>
> - queue_size_in_bytes = vring_size(num, vring_align);
> - vring_init(&vring, num, queue, vring_align);
> + queue_size_in_bytes = __vring_size(num, vring_align, packed);
> + if (packed)
> + vring_init_packed(&vring.vring_packed, num, queue, vring_align);
> + else
> + vring_init(&vring.vring_split, num, queue, vring_align);
>
> - vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> - notify, callback, name);
> + vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> + context, notify, callback, name);
> if (!vq) {
> vring_free_queue(vdev, queue_size_in_bytes, queue,
> dma_addr);
> @@ -1129,10 +1388,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> void (*callback)(struct virtqueue *vq),
> const char *name)
> {
> - struct vring vring;
> - vring_init(&vring, num, pages, vring_align);
> - return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> - notify, callback, name);
> + union vring_union vring;
> + bool packed;
> +
> + packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> + if (packed)
> + vring_init_packed(&vring.vring_packed, num, pages, vring_align);
> + else
> + vring_init(&vring.vring_split, num, pages, vring_align);
vring_init in the UAPI header is more or less a bug.
I'd just stop using it, keep it around for legacy userspace.
> +
> + return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> + context, notify, callback, name);
> }
> EXPORT_SYMBOL_GPL(vring_new_virtqueue);
>
> @@ -1142,7 +1408,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>
> if (vq->we_own_ring) {
> vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> - vq->vring.desc, vq->queue_dma_addr);
> + vq->packed ? (void *)vq->vring_packed.desc :
> + (void *)vq->vring.desc,
> + vq->queue_dma_addr);
> }
> list_del(&_vq->list);
> kfree(vq);
> @@ -1184,7 +1452,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
>
> struct vring_virtqueue *vq = to_vvq(_vq);
>
> - return vq->vring.num;
> + return vq->packed ? vq->vring_packed.num : vq->vring.num;
> }
> EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
>
> @@ -1227,6 +1495,10 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
>
> BUG_ON(!vq->we_own_ring);
>
> + if (vq->packed)
> + return vq->queue_dma_addr + ((char *)vq->vring_packed.driver -
> + (char *)vq->vring_packed.desc);
> +
> return vq->queue_dma_addr +
> ((char *)vq->vring.avail - (char *)vq->vring.desc);
> }
> @@ -1238,11 +1510,16 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
>
> BUG_ON(!vq->we_own_ring);
>
> + if (vq->packed)
> + return vq->queue_dma_addr + ((char *)vq->vring_packed.device -
> + (char *)vq->vring_packed.desc);
> +
> return vq->queue_dma_addr +
> ((char *)vq->vring.used - (char *)vq->vring.desc);
> }
> EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
>
> +/* Only available for split ring */
> const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> {
> return &to_vvq(vq)->vring;
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> index fab02133a919..992142b35f55 100644
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
> struct virtio_device;
> struct virtqueue;
>
> +union vring_union {
> + struct vring vring_split;
> + struct vring_packed vring_packed;
> +};
> +
> /*
> * Creates a virtqueue and allocates the descriptor ring. If
> * may_reduce_num is set, then this may allocate a smaller ring than
> @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
>
> /* Creates a virtqueue with a custom layout. */
> struct virtqueue *__vring_new_virtqueue(unsigned int index,
> - struct vring vring,
> + union vring_union vring,
> + bool packed,
> struct virtio_device *vdev,
> bool weak_barriers,
> bool ctx,
> --
> 2.18.0
^ permalink raw reply
* GREETINGS FROM TRACY WILLIAM
From: Miss Tracy William @ 2018-09-07 9:20 UTC (permalink / raw)
In-Reply-To: <1094400474.1597284.1536312012984.ref@mail.yahoo.com>
Hello Dear,
how are you today,I hope you are doing great.
It is my great pleasure to
contact you and i hope you don't mind,I was just surfing
through the Internet search when i found your email address,I want to make a new
and special friend,I hope you don't mind.
My name is Tracy William,I am from the United States of America but
presently I live
and work in England,
I will give you pictures and details of me as soon as i hear from you
bye
Tracy
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox