* Re: [PATCH net-next] net: sched: always disable bh when taking tcf_lock
From: David Miller @ 2018-08-19 17:51 UTC (permalink / raw)
To: vladbu; +Cc: netdev, jhs, xiyou.wangcong, jiri, ast, daniel
In-Reply-To: <1534272376-7830-1-git-send-email-vladbu@mellanox.com>
From: Vlad Buslov <vladbu@mellanox.com>
Date: Tue, 14 Aug 2018 21:46:16 +0300
> Recently, ops->init() and ops->dump() of all actions were modified to
> always obtain tcf_lock when accessing private action state. Actions that
> don't depend on tcf_lock for synchronization with their data path use
> non-bh locking API. However, tcf_lock is also used to protect rate
> estimator stats in softirq context by timer callback.
>
> Change ops->init() and ops->dump() of all actions to disable bh when using
> tcf_lock to prevent deadlock reported by following lockdep warning:
...
> Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
> warning regarding possible irq lock inversion dependency between tcf_lock,
> and psample_groups_lock that is taken when holding tcf_lock in sample init:
...
> In order to prevent potential lock inversion dependency between tcf_lock
> and psample_groups_lock, extract call to psample_group_get() from tcf_lock
> protected section in sample action init function.
>
> Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
> Fixes: 764e9a24480f ("net: sched: act_vlan: remove dependency on rtnl lock")
> Fixes: 729e01260989 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
> Fixes: d77284956656 ("net: sched: act_sample: remove dependency on rtnl lock")
> Fixes: e8917f437006 ("net: sched: act_gact: remove dependency on rtnl lock")
> Fixes: b6a2b971c0b0 ("net: sched: act_csum: remove dependency on rtnl lock")
> Fixes: 2142236b4584 ("net: sched: act_bpf: remove dependency on rtnl lock")
> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Applied, thanks Vlad.
^ permalink raw reply
* Re: how to (cross)connect two (physical) eth ports for ping test?
From: Roman Mashak @ 2018-08-19 15:55 UTC (permalink / raw)
To: Robert P. J. Day; +Cc: Linux kernel netdev mailing list
In-Reply-To: <alpine.LFD.2.21.1808181332210.7716@localhost.localdomain>
"Robert P. J. Day" <rpjday@crashcourse.ca> writes:
> (i'm sure this has been explained many times before, so a link
> covering this will almost certainly do just fine.)
>
> i want to loop one physical ethernet port into another, and just
> ping the daylights from one to the other for stress testing. my fedora
> laptop doesn't actually have two unused ethernet ports, so i just want
> to emulate this by slapping a couple startech USB/net adapters into
> two empty USB ports, setting this up, then doing it all over again
> monday morning on the actual target system, which does have multiple
> ethernet ports.
[...]
I used this in the past to test a dual-port NIC over a loopback cable; you
will need to adjust the script:
#!/bin/bash -x
ip="sudo $HOME/bin/ip"
eth1=192.168.2.100
eth2=192.168.2.101
dev1=eth1
dev2=eth2
dev1mac=00:1b:21:9b:24:b4
dev2mac=00:1b:21:9b:24:b5
# fake client interfaces and addresses
dev=dummy0
dev_mac=00:00:00:00:00:11
# max fake clients supported for simulation
maxusers=3
## Create dummy device
## Accepted parameters:
## $1 - devname
## $2 - devmac
## $3 - subnet (e.g. 10.10.10)
## $4 - max number of IP addresses to create on interface
setup_dummy()
{
    # sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward"
    # Enable tc hardware offload
    # ethtool -K $SGW_DEV hw-tc-offload on
    $ip link add $1 address $2 type dummy
    $ip link set $1 up
    for i in `seq 1 $4`;
    do
        $ip addr add $3.$i/32 dev $1
    done
}
## Delete dummy device
## Accepted parameters:
## $1 - devname
delete_dummy()
{
    $ip link del $1 type dummy
}
setup_network()
{
    # Send traffic eth3 <-> eth4 over loopback cable, where both interfaces
    # eth3 and eth4 are in the same subnet.
    #
    # We assume that NetworkManager is not running and eth3/eth4 are configured
    # via /etc/network/interfaces:
    #
    # 192.168.1.100/32 dev eth3
    # 192.168.1.101/32 dev eth4
    #
    # Specify source IP address when sending the traffic:
    # ping -I 192.168.1.100 192.168.1.101
    #
    #
    $ip neigh add $eth2 lladdr $dev2mac nud permanent dev $dev1
    $ip neigh add $eth1 lladdr $dev1mac nud permanent dev $dev2
    $ip route add table main $eth1 dev $dev2
    $ip route add table main $eth2 dev $dev1
    $ip rule add from all lookup local pref 100
    $ip rule del pref 0
    $ip rule add from $eth2 to $eth1 iif $dev1 lookup local pref 1
    $ip rule add from $eth1 to $eth2 iif $dev2 lookup local pref 2
    $ip rule add from $eth2 to $eth1 lookup main pref 3
    $ip rule add from $eth1 to $eth2 lookup main pref 4
    # $ip rule add from 10.10.10.0/24 to $eth1 iif $dev1 lookup local pref 5
    # $ip rule add from 10.10.10.0/24 to $eth2 iif $dev2 lookup local pref 6
    # $ip rule add from $eth1 to 10.10.10.0/24 iif $dev2 lookup local pref 7
    # $ip rule add from $eth2 to 10.10.10.0/24 iif $dev1 lookup local pref 8
}
restore_network()
{
    # FIX: hangs connections
    $ip rule flush
    $ip rule add priority 32767 lookup default
}
#delete_dummy dummy0
#delete_dummy dummy1
#setup_dummy dummy0 00:00:00:00:00:11 10.10.10 3
#setup_dummy dummy1 00:00:00:00:00:22 20.20.20 3
setup_network
^ permalink raw reply
* [GIT] Networking
From: David Miller @ 2018-08-19 18:37 UTC (permalink / raw)
To: torvalds; +Cc: akpm, netdev, linux-kernel
1) Fix races in IPVS, from Tan Hu.
2) Missing unbind in matchall classifier, from Hangbin Liu.
3) Missing act_ife action release, from Vlad Buslov.
4) Cure lockdep splats in ila, from Cong Wang.
5) veth queue leak on link delete, from Toshiaki Makita.
6) Disable isdn's IIOCDBGVAR ioctl, it exposes kernel addresses.
From Kees Cook.
7) RCU usage fixup in XDP, from Tariq Toukan.
8) Two TCP ULP fixes from Daniel Borkmann.
9) r8169 needs REALTEK_PHY as a Kconfig dependency, from Heiner
Kallweit.
10) Always take tcf_lock with BH disabled, otherwise we can deadlock
with rate estimator code paths. From Vlad Buslov.
11) Don't use MSI-X on RTL8106e r8169 chips, they don't resume
properly. From Jian-Hong Pan.
Please pull, thanks a lot!
The following changes since commit d01e12dd3f4227f1be5d7c5bffa7b8240787bec1:
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal (2018-08-16 10:21:18 -0700)
are available in the Git repository at:
gitolite@ra.kernel.org:/pub/scm/linux/kernel/git/davem/net.git
for you to fetch changes up to e2948e5af8eeb6c945000772b7613b0323a0a203:
ip6_vti: fix creating fallback tunnel device for vti6 (2018-08-19 11:26:39 -0700)
----------------------------------------------------------------
Alexei Starovoitov (1):
Merge branch 'sockmap-ulp-fixes'
Arnd Bergmann (1):
net: lan743x_ptp: convert to ktime_get_clocktai_ts64
Cong Wang (1):
ila: make lockdep happy again
Daniel Borkmann (6):
tcp, ulp: add alias for all ulp modules
tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach
bpf, sockmap: fix leakage of smap_psock_map_entry
bpf, sockmap: fix map elem deletion race with smap_stop_sock
bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist
bpf: fix redirect to map under tail calls
David S. Miller (2):
Merge git://git.kernel.org/.../pablo/nf
Merge git://git.kernel.org/.../bpf/bpf
Dmitry V. Levin (1):
netfilter: uapi: fix linux/netfilter/nf_osf.h userspace compilation errors
Fabrizio Castro (1):
dt-bindings: net: ravb: Add support for r8a774a1 SoC
Florian Westphal (5):
netfilter: ip6t_rpfilter: set F_IFACE for linklocal addresses
netfilter: fix memory leaks on netlink_dump_start error
netfilter: nf_tables: fix register ordering
netfilter: nf_tables: don't prevent event handler from device cleanup on netns exit
netfilter: conntrack: fix removal of conntrack entries when l4tracker is removed
Haishuang Yan (3):
ip6_vti: simplify stats handling in vti6_xmit
ip_vti: fix a null pointer deferrence when create vti fallback tunnel
ip6_vti: fix creating fallback tunnel device for vti6
Hangbin Liu (1):
cls_matchall: fix tcf_unbind_filter missing
Harsha Sharma (1):
netfilter: nft_ct: make l3 protocol field optional for timeout object
Heiner Kallweit (1):
r8169: add missing Kconfig dependency
Ivan Khoronzhuk (1):
Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100Mb
Jesper Dangaard Brouer (1):
samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM
Jian-Hong Pan (1):
r8169: don't use MSI-X on RTL8106e
Kees Cook (1):
isdn: Disable IIOCDBGVAR
Lad, Prabhakar (1):
net: dsa: add support for ksz9897 ethernet switch
Matteo Croce (2):
jiffies: add utility function to calculate delta in ms
ipvs: don't show negative times in ip_vs_conn
Michal Hocko (1):
netfilter: x_tables: do not fail xt_alloc_table_info too easilly
Máté Eckl (2):
netfilter: doc: Add nf_tables part in tproxy.txt
netfilter: nft_tproxy: Fix missing-braces warning
Pablo Neira Ayuso (1):
netfilter: nft_dynset: allow dynamic updates of non-anonymous set
Taehee Yoo (1):
netfilter: nft_set: fix allocation size overflow in privsize callback.
Tan Hu (1):
ipvs: fix race between ip_vs_conn_new() and ip_vs_del_dest()
Tariq Toukan (1):
net/xdp: Fix suspicious RCU usage warning
Toshiaki Makita (1):
veth: Free queues on link delete
Vlad Buslov (2):
net: sched: act_ife: always release ife action on init error
net: sched: always disable bh when taking tcf_lock
Yonghong Song (2):
bpf: fix a rcu usage warning in bpf_prog_array_copy_core()
tools/bpf: fix bpf selftest test_cgroup_storage failure
Yuval Shaia (1):
net/mlx5e: Delete unneeded function argument
Documentation/devicetree/bindings/net/dsa/ksz.txt | 4 ++-
Documentation/devicetree/bindings/net/renesas,ravb.txt | 3 ++-
Documentation/networking/ti-cpsw.txt | 11 ++++----
Documentation/networking/tproxy.txt | 34 +++++++++++++++++++-----
drivers/isdn/i4l/isdn_common.c | 8 +-----
drivers/net/dsa/microchip/ksz_common.c | 9 +++++++
drivers/net/dsa/microchip/ksz_spi.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c | 4 +--
drivers/net/ethernet/microchip/lan743x_ptp.c | 3 +--
drivers/net/ethernet/realtek/Kconfig | 1 +
drivers/net/ethernet/realtek/r8169.c | 9 ++++---
drivers/net/veth.c | 70 ++++++++++++++++++++++++--------------------------
include/linux/filter.h | 3 ++-
include/linux/jiffies.h | 5 ++++
include/linux/spinlock.h | 17 +++++++++---
include/net/netfilter/nf_tables.h | 6 ++---
include/net/tcp.h | 4 +++
include/trace/events/xdp.h | 5 ++--
include/uapi/linux/netfilter/nfnetlink_osf.h | 2 ++
include/uapi/linux/netfilter/xt_osf.h | 2 --
kernel/bpf/core.c | 2 +-
kernel/bpf/cpumap.c | 2 ++
kernel/bpf/devmap.c | 1 +
kernel/bpf/sockmap.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------
kernel/bpf/verifier.c | 21 ---------------
kernel/bpf/xskmap.c | 1 +
lib/bucket_locks.c | 11 +++++---
net/core/filter.c | 68 ++++++++++++++++++++++--------------------------
net/core/xdp.c | 14 +++-------
net/ipv4/ip_vti.c | 3 ++-
net/ipv4/tcp_ulp.c | 4 ++-
net/ipv6/ip6_vti.c | 16 ++++--------
net/ipv6/netfilter/ip6t_rpfilter.c | 12 ++++++++-
net/netfilter/ipvs/ip_vs_conn.c | 22 ++++++++++------
net/netfilter/ipvs/ip_vs_core.c | 15 ++++++++---
net/netfilter/nf_conntrack_netlink.c | 26 ++++++++++++-------
net/netfilter/nf_conntrack_proto.c | 15 +++++++----
net/netfilter/nf_tables_api.c | 38 +++++++++++++++++----------
net/netfilter/nfnetlink_acct.c | 29 ++++++++++-----------
net/netfilter/nft_chain_filter.c | 14 +++++-----
net/netfilter/nft_ct.c | 7 ++---
net/netfilter/nft_dynset.c | 2 --
net/netfilter/nft_set_bitmap.c | 6 ++---
net/netfilter/nft_set_hash.c | 8 +++---
net/netfilter/nft_set_rbtree.c | 4 +--
net/netfilter/nft_tproxy.c | 4 ++-
net/netfilter/x_tables.c | 7 +----
net/sched/act_bpf.c | 10 ++++----
net/sched/act_csum.c | 10 ++++----
net/sched/act_gact.c | 10 ++++----
net/sched/act_ife.c | 8 ++----
net/sched/act_mirred.c | 16 ++++++------
net/sched/act_sample.c | 25 ++++++++++--------
net/sched/act_tunnel_key.c | 10 ++++----
net/sched/act_vlan.c | 10 ++++----
net/sched/cls_matchall.c | 2 ++
net/tls/tls_main.c | 1 +
samples/bpf/xdp_redirect_cpu_user.c | 3 ++-
samples/bpf/xdp_rxq_info_user.c | 3 ++-
tools/testing/selftests/bpf/test_cgroup_storage.c | 1 +
60 files changed, 430 insertions(+), 352 deletions(-)
^ permalink raw reply
* Re: [PATCH 2/2] ip6_vti: fix creating fallback tunnel device for vti6
From: David Miller @ 2018-08-19 18:27 UTC (permalink / raw)
To: yanhaishuang; +Cc: steffen.klassert, kuznet, netdev, linux-kernel
In-Reply-To: <1534662305-16734-2-git-send-email-yanhaishuang@cmss.chinamobile.com>
From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sun, 19 Aug 2018 15:05:05 +0800
> When fb_tunnels_only_for_init_net is set to 1, don't create a fallback
> tunnel device for vti6 when a new namespace is created.
>
> Tested:
> [root@builder2 ~]# modprobe ip6_tunnel
> [root@builder2 ~]# modprobe ip6_vti
> [root@builder2 ~]# echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
> [root@builder2 ~]# unshare -n
> [root@builder2 ~]# ip link
> 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group
> default qlen 1000
> link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Applied.
^ permalink raw reply
* Re: [PATCH 1/2] ip_vti: fix a null pointer deferrence when create vti fallback tunnel
From: David Miller @ 2018-08-19 18:27 UTC (permalink / raw)
To: yanhaishuang; +Cc: steffen.klassert, kuznet, netdev, linux-kernel, edumazet
In-Reply-To: <1534662305-16734-1-git-send-email-yanhaishuang@cmss.chinamobile.com>
From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Sun, 19 Aug 2018 15:05:04 +0800
> After setting fb_tunnels_only_for_init_net to 1, itn->fb_tunnel_dev will
> be NULL and will cause the following crash:
...
> Reproduce:
> echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
> modprobe ip_vti
> unshare -n
>
> Fixes: 79134e6ce2c9 (net: do not create fallback tunnels for non-default
> namespaces)
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Applied, but please format your Fixes: tag properly next time.
Do not split up a Fixes tag into multiple lines, no matter how long it
is. And enclose the commit header text in both parentheses and double
quotes, not just parentheses. Like ("blah blah blah"), thank you.
^ permalink raw reply
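The formatting rule in the reply above (one line, subject wrapped in both parentheses and double quotes) can be checked mechanically. The following is a minimal sketch, not an existing kernel tool: fixes_tag_ok() is a hypothetical helper, and the minimum of 12 hex digits for the abbreviated commit id is an assumption taken from common kernel practice rather than from this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when `line` looks like
 *   Fixes: <12+ hex digits> ("subject")
 * on a single line. */
bool fixes_tag_ok(const char *line)
{
	static const char prefix[] = "Fixes: ";
	size_t n, len;

	assert(line != NULL);

	/* A tag that was wrapped reaches us either with an embedded
	 * newline or as a fragment without the closing ") ; reject both. */
	if (strchr(line, '\n'))
		return false;
	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return false;
	line += sizeof(prefix) - 1;

	/* Abbreviated commit id: at least 12 hex digits (assumed minimum). */
	for (n = 0; isxdigit((unsigned char)line[n]); n++)
		;
	if (n < 12)
		return false;
	line += n;

	/* The subject must open with  ("  and close with  ")  */
	if (strncmp(line, " (\"", 3) != 0)
		return false;
	line += 3;
	len = strlen(line);

	/* At least one subject character before the closing ") */
	return len >= 3 && strcmp(line + len - 2, "\")") == 0;
}
```

Called on each trailer line of a commit message, this accepts the single-line quoted form and rejects the first line of the wrapped, parenthesis-only tag quoted in the patch above.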
* Re: [PATCH v2 net] r8169: don't use MSI-X on RTL8106e
From: David Miller @ 2018-08-19 18:01 UTC (permalink / raw)
To: jian-hong; +Cc: hkallweit1, nic_swsd, netdev, linux-kernel, linux
In-Reply-To: <20180817050735.3367-1-jian-hong@endlessm.com>
From: Jian-Hong Pan <jian-hong@endlessm.com>
Date: Fri, 17 Aug 2018 13:07:35 +0800
> Found the ethernet network on ASUS X441UAR doesn't come back on resume
> from suspend when using MSI-X. The chip is RTL8106e - version 39.
...
> Here is the ethernet controller in detail:
...
> Falling back to MSI fixes the issue.
>
> Fixes: 6c6aa15fdea5 ("r8169: improve interrupt handling")
> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com>
> ---
> Changes in v2:
> - Make the commit message shorter
> - Add "Fixes" tag in the commit message
I'm going to apply this for now, and queue it up for -stable.
If we hear back from Realtek on something we can do to make MSI-X
work on these chips, we can deal with it as a follow-up.
Thanks.
^ permalink raw reply
* Re: [PATCH] net: lan743x_ptp: convert to ktime_get_clocktai_ts64
From: David Miller @ 2018-08-19 17:58 UTC (permalink / raw)
To: arnd; +Cc: bryan.whitehead, UNGLinuxDriver, yuehaibing, netdev, linux-kernel
In-Reply-To: <20180815175040.3736548-1-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Wed, 15 Aug 2018 19:49:49 +0200
> timekeeping_clocktai64() has been renamed to ktime_get_clocktai_ts64()
> for consistency with the other ktime_get_* access functions.
>
> Rename the new caller that has come up as well.
>
> Question: this is the only ptp driver that sets the hardware time
> to the current system time in TAI. Why does it do that?
>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Deciding whether PTP drivers should set the hardware time at boot
to the current system time is a separate discussion from using
the new name for the timekeeping_clocktai64() interface, so I'm applying
this.
Thanks Arnd.
^ permalink raw reply
* Re: [RFC 0/1] Appletalk AARP probe broken by receipt of own broadcasts.
From: Andrew Lunn @ 2018-08-19 14:41 UTC (permalink / raw)
To: Craig McGeachie; +Cc: David S. Miller, netdev, Craig McGeachie
In-Reply-To: <d3f31cc2-3bfa-2298-3743-3290fd9a6d2a@gmail.com>
> I run inside Virtualbox with the Realtek PCIe GBE Family Controller.
>
> Assuming I'm reading /sys/class/net/enp0s3/driver correctly, it's using the
> e1000 driver.
Hi Craig
Ah. And how do you connect to the network? Please run some tcpdumps
and collect packets at various points. Make sure your network setup is
not duplicating packets, in particular, any bridges you might have in
order to connect the segments together.
> However, it might not be the ethernet driver's fault. I've been a bit loose
> with terminology. Appletalk AARP probe packets aren't ethernet broadcasts as
> such; they're multicast packets, via the psnap driver, to hardware address
> 09:00:07:ff:ff:ff.
Basically, the same question applies for Multicast as for Broadcast.
I'm pretty sure the interface should not receive the packet it
transmitted itself. But if something on the network has duplicated the
packet, it will receive the duplicate. So before we add a filter,
let's understand where the packets are coming from.
Andrew
^ permalink raw reply
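The multicast versus broadcast distinction discussed above is visible in the destination address itself. As a small illustration (the function names here are made up for this sketch; the kernel has its own helpers in include/linux/etherdevice.h), an Ethernet address is a group address when the least significant bit of its first octet is set, and broadcast is the all-ones special case:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when the group (I/G) bit is set: the least significant bit
 * of the first octet. Broadcast addresses also satisfy this. */
bool mac_is_multicast(const uint8_t mac[6])
{
	assert(mac != NULL);
	return (mac[0] & 0x01) != 0;
}

/* True only for ff:ff:ff:ff:ff:ff. */
bool mac_is_broadcast(const uint8_t mac[6])
{
	static const uint8_t bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

	assert(mac != NULL);
	return memcmp(mac, bcast, sizeof(bcast)) == 0;
}
```

For the AARP probe address 09:00:07:ff:ff:ff, the first octet 0x09 has its low bit set, so the frame is multicast but not broadcast.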
* Re: [PATCH net-next] net: mvneta: fix mvneta_config_rss on armada 3700
From: David Miller @ 2018-08-19 17:49 UTC (permalink / raw)
To: andrew
Cc: thomas.petazzoni, netdev, Jisheng.Zhang, linux-kernel,
linux-arm-kernel
In-Reply-To: <20180810222335.GD11955@lunn.ch>
From: Andrew Lunn <andrew@lunn.ch>
Date: Sat, 11 Aug 2018 00:23:35 +0200
> Please can you queue up:
>
> Fixes: 7a86f05faf11 ("net: ethernet: mvneta: Fix napi structure mixup on armada 3700")
>
> and this patch for stable.
Since these are now both in Linus's tree, done.
But, if one thinks the change belongs in -stable, target 'net' always.
^ permalink raw reply
* Re: [PATCH v2 06/29] mtd: Add support for reading MTD devices via the nvmem API
From: Boris Brezillon @ 2018-08-19 16:46 UTC (permalink / raw)
To: Alban, Srinivas Kandagatla
Cc: Bartosz Golaszewski, Jonathan Corbet, Sekhar Nori, Kevin Hilman,
Russell King, Arnd Bergmann, Greg Kroah-Hartman, David Woodhouse,
Brian Norris, Marek Vasut, Richard Weinberger, Grygorii Strashko,
David S . Miller, Naren, Mauro Carvalho Chehab, Andrew Morton,
Lukas Wunner, Dan Carpenter, Florian Fainelli, Ivan
In-Reply-To: <20180819133106.0420df5f@tock>
On Sun, 19 Aug 2018 13:31:06 +0200
Alban <albeu@free.fr> wrote:
> On Fri, 17 Aug 2018 18:27:20 +0200
> Boris Brezillon <boris.brezillon@bootlin.com> wrote:
>
> > Hi Bartosz,
> >
> > On Fri, 10 Aug 2018 10:05:03 +0200
> > Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> >
> > > From: Alban Bedel <albeu@free.fr>
> > >
> > > Allow drivers that use the nvmem API to read data stored on MTD devices.
> > > For this the mtd devices are registered as read-only NVMEM providers.
> > >
> > > Signed-off-by: Alban Bedel <albeu@free.fr>
> > > [Bartosz:
> > > - use the managed variant of nvmem_register(),
> > > - set the nvmem name]
> > > Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> >
> > What happened to the 2 other patches of Alban's series? I'd really
> > like the DT case to be handled/agreed on in the same patchset, but
> > IIRC, Alban and Srinivas disagreed on how this should be represented.
> > I hope this time we'll come to an agreement, because the MTD <-> NVMEM
> > glue has been floating around for quite some time...
>
> These other patches were to fix what I consider a fundamental flaw in
> the generic NVMEM bindings, however we couldn't agree on this point.
> Bartosz later contacted me to take over this series and I suggested to
> just change the MTD NVMEM binding to use a compatible string on the
> NVMEM cells as an alternative solution to fix the clash with the old
> style MTD partition.
>
> However all this has no impact on the code needed to add NVMEM support
> to MTD, so the above patch didn't change at all.
It does have an impact on the supported binding though.
nvmem->dev.of_node is automatically assigned to mtd->dev.of_node, which
means people will be able to define their NVMEM cells directly under
the MTD device and reference them from other nodes (even if it's not
documented), and as you said, it conflicts with the old MTD partition
bindings. So we'd better agree on this binding before merging this
patch.
I see several options:
1/ provide a way to tell the NVMEM framework not to use parent->of_node
even if it's != NULL. This way we really don't support defining
NVMEM cells in the DT, and also don't support referencing the nvmem
device using a phandle.
2/ define a new binding where all nvmem-cells are placed in an
"nvmem" subnode (just like we have this "partitions" subnode for
partitions), and then add a config->of_node field so that the
nvmem provider can explicitly specify the DT node representing the
nvmem device. We'll also need to set this field to ERR_PTR(-ENOENT)
in case this node does not exist so that the nvmem framework knows
that it should not assign nvmem->dev.of_node to parent->of_node.
3/ only declare partitions as nvmem providers. This would solve the
problem we have with partitions defined in the DT since
defining sub-partitions in the DT is not (yet?) supported and
partition nodes are supposed to be leaf nodes. Still, I'm not a big
fan of this solution because it will prevent us from supporting
sub-partitions if we ever want/need to.
4/ Add a ->of_xlate() hook that would be called if present by the
framework instead of using the default parsing we have right now.
5/ Tell the nvmem framework the name of the subnode containing nvmem
cell definitions (if NULL that means cells are directly defined
under the nvmem provider node). We would set it to "nvmem-cells" (or
whatever you like) for the MTD case.
There are probably other options (some were proposed by Alban and
Srinivas already), but I'd like to get this sorted out before we merge
this patch.
Alban, Srinivas, any opinion?
^ permalink raw reply
* [PATCH net-next v8 7/7] net: vhost: make busyloop_intr more accurate
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC (permalink / raw)
To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
The patch uses vhost_has_work_pending() to check if
the specified handler is scheduled, because in most cases
vhost_has_work() returns true when the other side's handler is added
to the worker list. Use vhost_has_work_pending() instead of
vhost_has_work().
Topology:
[Host] ->linux bridge -> tap vhost-net ->[Guest]
TCP_STREAM (netperf):
* Without the patch: 38035.39 Mbps, 3.37 us mean latency
* With the patch: 38409.44 Mbps, 3.34 us mean latency
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
drivers/vhost/net.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index db63ae2..b6939ef 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -487,10 +487,8 @@ static void vhost_net_busy_poll(struct vhost_net *net,
endtime = busy_clock() + busyloop_timeout;
while (vhost_can_busy_poll(endtime)) {
- if (vhost_has_work(&net->dev)) {
- *busyloop_intr = true;
+ if (vhost_has_work(&net->dev))
break;
- }
if ((sock_has_rx_data(sock) &&
!vhost_vq_avail_empty(&net->dev, rvq)) ||
@@ -513,6 +511,11 @@ static void vhost_net_busy_poll(struct vhost_net *net,
!vhost_has_work_pending(&net->dev, VHOST_NET_VQ_RX))
vhost_net_enable_vq(net, rvq);
+ if (vhost_has_work_pending(&net->dev,
+ poll_rx ?
+ VHOST_NET_VQ_RX: VHOST_NET_VQ_TX))
+ *busyloop_intr = true;
+
mutex_unlock(&vq->mutex);
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next v8 6/7] net: vhost: disable rx wakeup during tx busypoll
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC (permalink / raw)
To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
In handle_tx, the busy poll now calls vhost_net_disable_vq() and
vhost_net_enable_vq() on the rx vq, because we are already polling
the sock ourselves. This can improve performance.
This was suggested by Toshiaki Makita and Jason Wang.
If the rx handler is already scheduled, we do not re-enable the vq,
because it is not necessary. We do not do this in the last 'else'
branch: if we receive data but cannot queue the rx handler (the rx
vring is full), we still enable the vq. This avoids the case where
the guest consumes data so the vring is no longer full, but the vq
is still disabled and the rx vq cannot be woken up to receive more
data.
Topology:
[Host] ->linux bridge -> tap vhost-net ->[Guest]
TCP_STREAM (netperf):
* Without the patch: 37598.20 Mbps, 3.43 us mean latency
* With the patch: 38035.39 Mbps, 3.37 us mean latency
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
drivers/vhost/net.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 23d7ffc..db63ae2 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -480,6 +480,9 @@ static void vhost_net_busy_poll(struct vhost_net *net,
busyloop_timeout = poll_rx ? rvq->busyloop_timeout:
tvq->busyloop_timeout;
+ if (!poll_rx)
+ vhost_net_disable_vq(net, rvq);
+
preempt_disable();
endtime = busy_clock() + busyloop_timeout;
@@ -506,6 +509,10 @@ static void vhost_net_busy_poll(struct vhost_net *net,
else /* On tx here, sock has no rx data. */
vhost_enable_notify(&net->dev, rvq);
+ if (!poll_rx &&
+ !vhost_has_work_pending(&net->dev, VHOST_NET_VQ_RX))
+ vhost_net_enable_vq(net, rvq);
+
mutex_unlock(&vq->mutex);
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next v8 5/7] net: vhost: introduce bitmap for vhost_poll
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC (permalink / raw)
To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
The bitmap in vhost_dev lets us check whether the
specified poll is scheduled. The next two patches
rely on it.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
drivers/vhost/net.c | 11 +++++++++--
drivers/vhost/vhost.c | 17 +++++++++++++++--
drivers/vhost/vhost.h | 7 ++++++-
3 files changed, 30 insertions(+), 5 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1eff72d..23d7ffc 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1135,8 +1135,15 @@ static int vhost_net_open(struct inode *inode, struct file *f)
}
vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
- vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev);
- vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev);
+ vhost_poll_init(n->poll + VHOST_NET_VQ_TX,
+ handle_tx_net,
+ VHOST_NET_VQ_TX,
+ EPOLLOUT, dev);
+
+ vhost_poll_init(n->poll + VHOST_NET_VQ_RX,
+ handle_rx_net,
+ VHOST_NET_VQ_RX,
+ EPOLLIN, dev);
f->private_data = n;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a1c06e7..dc88a60 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -186,7 +186,7 @@ void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
/* Init poll structure */
void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
- __poll_t mask, struct vhost_dev *dev)
+ __u8 poll_id, __poll_t mask, struct vhost_dev *dev)
{
init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
init_poll_funcptr(&poll->table, vhost_poll_func);
@@ -194,6 +194,7 @@ void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
poll->dev = dev;
poll->wqh = NULL;
+ poll->poll_id = poll_id;
vhost_work_init(&poll->work, fn);
}
EXPORT_SYMBOL_GPL(vhost_poll_init);
@@ -276,8 +277,16 @@ bool vhost_has_work(struct vhost_dev *dev)
}
EXPORT_SYMBOL_GPL(vhost_has_work);
+bool vhost_has_work_pending(struct vhost_dev *dev, int poll_id)
+{
+ return !llist_empty(&dev->work_list) &&
+ test_bit(poll_id, dev->work_pending);
+}
+EXPORT_SYMBOL_GPL(vhost_has_work_pending);
+
void vhost_poll_queue(struct vhost_poll *poll)
{
+ set_bit(poll->poll_id, poll->dev->work_pending);
vhost_work_queue(poll->dev, &poll->work);
}
EXPORT_SYMBOL_GPL(vhost_poll_queue);
@@ -354,6 +363,7 @@ static int vhost_worker(void *data)
if (!node)
schedule();
+ bitmap_zero(dev->work_pending, VHOST_DEV_MAX_VQ);
node = llist_reverse_order(node);
/* make sure flag is seen after deletion */
smp_wmb();
@@ -420,6 +430,8 @@ void vhost_dev_init(struct vhost_dev *dev,
struct vhost_virtqueue *vq;
int i;
+ BUG_ON(nvqs > VHOST_DEV_MAX_VQ);
+
dev->vqs = vqs;
dev->nvqs = nvqs;
mutex_init(&dev->mutex);
@@ -428,6 +440,7 @@ void vhost_dev_init(struct vhost_dev *dev,
dev->iotlb = NULL;
dev->mm = NULL;
dev->worker = NULL;
+ bitmap_zero(dev->work_pending, VHOST_DEV_MAX_VQ);
init_llist_head(&dev->work_list);
init_waitqueue_head(&dev->wait);
INIT_LIST_HEAD(&dev->read_list);
@@ -445,7 +458,7 @@ void vhost_dev_init(struct vhost_dev *dev,
vhost_vq_reset(dev, vq);
if (vq->handle_kick)
vhost_poll_init(&vq->poll, vq->handle_kick,
- EPOLLIN, dev);
+ i, EPOLLIN, dev);
}
}
EXPORT_SYMBOL_GPL(vhost_dev_init);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 6c844b9..60b6f6d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -30,6 +30,7 @@ struct vhost_poll {
wait_queue_head_t *wqh;
wait_queue_entry_t wait;
struct vhost_work work;
+ __u8 poll_id;
__poll_t mask;
struct vhost_dev *dev;
};
@@ -37,9 +38,10 @@ struct vhost_poll {
void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
bool vhost_has_work(struct vhost_dev *dev);
+bool vhost_has_work_pending(struct vhost_dev *dev, int poll_id);
void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
- __poll_t mask, struct vhost_dev *dev);
+ __u8 id, __poll_t mask, struct vhost_dev *dev);
int vhost_poll_start(struct vhost_poll *poll, struct file *file);
void vhost_poll_stop(struct vhost_poll *poll);
void vhost_poll_flush(struct vhost_poll *poll);
@@ -152,6 +154,8 @@ struct vhost_msg_node {
struct list_head node;
};
+#define VHOST_DEV_MAX_VQ 128
+
struct vhost_dev {
struct mm_struct *mm;
struct mutex mutex;
@@ -159,6 +163,7 @@ struct vhost_dev {
int nvqs;
struct eventfd_ctx *log_ctx;
struct llist_head work_list;
+ DECLARE_BITMAP(work_pending, VHOST_DEV_MAX_VQ);
struct task_struct *worker;
struct vhost_umem *umem;
struct vhost_umem *iotlb;
--
1.8.3.1
^ permalink raw reply related
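The bookkeeping introduced by the patch above can be modelled in userspace. This is a simplified, single-threaded sketch under stated assumptions: every model_* name is hypothetical, a plain counter stands in for the llist work_list, and the atomic bit operations and memory barriers of the real code are left out. It only shows how a per-poll bit separates "some work is queued" from "this particular handler is queued".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAX_VQ	128	/* mirrors VHOST_DEV_MAX_VQ in the patch */
#define BITS_PER_WORD	(8 * sizeof(unsigned long))
#define MODEL_WORDS	((MODEL_MAX_VQ + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct model_dev {
	unsigned long work_pending[MODEL_WORDS];
	int queued;	/* stands in for the llist work_list */
};

void model_dev_init(struct model_dev *dev)
{
	memset(dev, 0, sizeof(*dev));
}

/* Like vhost_poll_queue(): mark the poll id, then queue the work. */
void model_poll_queue(struct model_dev *dev, unsigned int poll_id)
{
	assert(poll_id < MODEL_MAX_VQ);
	dev->work_pending[poll_id / BITS_PER_WORD] |=
		1UL << (poll_id % BITS_PER_WORD);
	dev->queued++;
}

/* Like vhost_has_work(): true for any queued work, whoever owns it. */
bool model_has_work(const struct model_dev *dev)
{
	return dev->queued > 0;
}

/* Like vhost_has_work_pending(): list non-empty AND this id's bit set. */
bool model_has_work_pending(const struct model_dev *dev, unsigned int poll_id)
{
	unsigned long word;

	assert(poll_id < MODEL_MAX_VQ);
	word = dev->work_pending[poll_id / BITS_PER_WORD];
	return dev->queued > 0 && ((word >> (poll_id % BITS_PER_WORD)) & 1UL);
}

/* Like one vhost_worker() pass: take the whole list, zero the bitmap. */
void model_worker_run(struct model_dev *dev)
{
	memset(dev->work_pending, 0, sizeof(dev->work_pending));
	dev->queued = 0;
}
```

After queueing poll id 0 only, model_has_work() is true while model_has_work_pending() is true for id 0 and false for id 1. That is the distinction the later patches in the series depend on when they ask about one specific handler.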
* [PATCH net-next v8 4/7] net: vhost: add rx busy polling in tx path
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC (permalink / raw)
To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
This patch improves the guest receive performance.
On the handle_tx side, we poll the sock receive queue at the
same time. handle_rx does the same.
We set poll-us=100us and use netperf to test throughput
and mean latency. When running the tests, the vhost-net kthread
of that VM is always at 100% CPU. The commands are shown below.
Rx performance is greatly improved by this patch. There is no
notable performance change on tx with this series though. This
patch is useful for bi-directional traffic.
netperf -H IP -t TCP_STREAM -l 20 -- -O "THROUGHPUT, THROUGHPUT_UNITS, MEAN_LATENCY"
Topology:
[Host] ->linux bridge -> tap vhost-net ->[Guest]
TCP_STREAM:
* Without the patch: 19842.95 Mbps, 6.50 us mean latency
* With the patch: 37598.20 Mbps, 3.43 us mean latency
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
drivers/vhost/net.c | 33 +++++++++++++--------------------
1 file changed, 13 insertions(+), 20 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 453c061..1eff72d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -510,31 +510,24 @@ static void vhost_net_busy_poll(struct vhost_net *net,
}
static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
- struct vhost_net_virtqueue *nvq,
+ struct vhost_net_virtqueue *tnvq,
unsigned int *out_num, unsigned int *in_num,
bool *busyloop_intr)
{
- struct vhost_virtqueue *vq = &nvq->vq;
- unsigned long uninitialized_var(endtime);
- int r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+ struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
+ struct vhost_virtqueue *rvq = &rnvq->vq;
+ struct vhost_virtqueue *tvq = &tnvq->vq;
+
+ int r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
out_num, in_num, NULL, NULL);
- if (r == vq->num && vq->busyloop_timeout) {
- if (!vhost_sock_zcopy(vq->private_data))
- vhost_net_signal_used(nvq);
- preempt_disable();
- endtime = busy_clock() + vq->busyloop_timeout;
- while (vhost_can_busy_poll(endtime)) {
- if (vhost_has_work(vq->dev)) {
- *busyloop_intr = true;
- break;
- }
- if (!vhost_vq_avail_empty(vq->dev, vq))
- break;
- cpu_relax();
- }
- preempt_enable();
- r = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+ if (r == tvq->num && tvq->busyloop_timeout) {
+ if (!vhost_sock_zcopy(tvq->private_data))
+ vhost_net_signal_used(tnvq);
+
+ vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, false);
+
+ r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
out_num, in_num, NULL, NULL);
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next v8 3/7] net: vhost: factor out busy polling logic to vhost_net_busy_poll()
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC (permalink / raw)
To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Factor out the generic busy polling logic so that it can be
used in the tx path in the next patch. With this patch,
qemu can also set a different busyloop_timeout for the rx queue.
To avoid duplicated code, introduce the helper functions:
* sock_has_rx_data (changed from sk_has_rx_data)
* vhost_net_busy_poll_try_queue
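The shape of the busy-poll loop that this patch factors out can be sketched in user-space C. This is only an illustration under stated assumptions, not the vhost code: `struct sim_dev`, `sim_clock()` and `sim_busy_poll()` are hypothetical names, the clock is simulated so that each read advances one tick, and counter wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated device for the sketch; every field is hypothetical. */
struct sim_dev {
	unsigned long now;	/* current tick, advanced by each clock read */
	unsigned long data_at;	/* tick at which rx data or tx descriptors show up */
	unsigned long work_at;	/* tick at which other work is queued */
};

/* Stand-in for busy_clock(): returns the current tick, then advances it. */
unsigned long sim_clock(struct sim_dev *d)
{
	return d->now++;
}

/*
 * Spin until the deadline passes, other work is queued, or data arrives.
 * Returns true if data became available. *interrupted is set when the
 * loop stopped because other work was pending (compare busyloop_intr).
 * Counter wraparound is ignored to keep the sketch short.
 */
bool sim_busy_poll(struct sim_dev *d, unsigned long timeout, bool *interrupted)
{
	unsigned long endtime;

	assert(d != NULL && interrupted != NULL);
	*interrupted = false;
	endtime = sim_clock(d) + timeout;

	while (sim_clock(d) < endtime) {
		if (d->now >= d->work_at) {
			*interrupted = true;
			return false;
		}
		if (d->now >= d->data_at)
			return true;
	}
	return false;
}
```

As in the patch, the pending-work check comes before the data check, so a queued work item ends the spin even if data arrives on the same tick.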
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
drivers/vhost/net.c | 111 +++++++++++++++++++++++++++++++++-------------------
1 file changed, 71 insertions(+), 40 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 32c1b52..453c061 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -440,6 +440,75 @@ static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
nvq->done_idx = 0;
}
+static int sock_has_rx_data(struct socket *sock)
+{
+ if (unlikely(!sock))
+ return 0;
+
+ if (sock->ops->peek_len)
+ return sock->ops->peek_len(sock);
+
+ return skb_queue_empty(&sock->sk->sk_receive_queue);
+}
+
+static void vhost_net_busy_poll_try_queue(struct vhost_net *net,
+ struct vhost_virtqueue *vq)
+{
+ if (!vhost_vq_avail_empty(&net->dev, vq)) {
+ vhost_poll_queue(&vq->poll);
+ } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
+ vhost_disable_notify(&net->dev, vq);
+ vhost_poll_queue(&vq->poll);
+ }
+}
+
+static void vhost_net_busy_poll(struct vhost_net *net,
+ struct vhost_virtqueue *rvq,
+ struct vhost_virtqueue *tvq,
+ bool *busyloop_intr,
+ bool poll_rx)
+{
+ unsigned long busyloop_timeout;
+ unsigned long endtime;
+ struct socket *sock;
+ struct vhost_virtqueue *vq = poll_rx ? tvq : rvq;
+
+ mutex_lock_nested(&vq->mutex, poll_rx ? VHOST_NET_VQ_TX: VHOST_NET_VQ_RX);
+ vhost_disable_notify(&net->dev, vq);
+ sock = rvq->private_data;
+
+ busyloop_timeout = poll_rx ? rvq->busyloop_timeout:
+ tvq->busyloop_timeout;
+
+ preempt_disable();
+ endtime = busy_clock() + busyloop_timeout;
+
+ while (vhost_can_busy_poll(endtime)) {
+ if (vhost_has_work(&net->dev)) {
+ *busyloop_intr = true;
+ break;
+ }
+
+ if ((sock_has_rx_data(sock) &&
+ !vhost_vq_avail_empty(&net->dev, rvq)) ||
+ !vhost_vq_avail_empty(&net->dev, tvq))
+ break;
+
+ cpu_relax();
+ }
+
+ preempt_enable();
+
+ if (poll_rx)
+ vhost_net_busy_poll_try_queue(net, tvq);
+ else if (sock_has_rx_data(sock))
+ vhost_net_busy_poll_try_queue(net, rvq);
+ else /* On tx here, sock has no rx data. */
+ vhost_enable_notify(&net->dev, rvq);
+
+ mutex_unlock(&vq->mutex);
+}
+
static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
struct vhost_net_virtqueue *nvq,
unsigned int *out_num, unsigned int *in_num,
@@ -753,16 +822,6 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
return len;
}
-static int sk_has_rx_data(struct sock *sk)
-{
- struct socket *sock = sk->sk_socket;
-
- if (sock->ops->peek_len)
- return sock->ops->peek_len(sock);
-
- return skb_queue_empty(&sk->sk_receive_queue);
-}
-
static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
bool *busyloop_intr)
{
@@ -770,41 +829,13 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
struct vhost_virtqueue *rvq = &rnvq->vq;
struct vhost_virtqueue *tvq = &tnvq->vq;
- unsigned long uninitialized_var(endtime);
int len = peek_head_len(rnvq, sk);
- if (!len && tvq->busyloop_timeout) {
+ if (!len && rvq->busyloop_timeout) {
/* Flush batched heads first */
vhost_net_signal_used(rnvq);
/* Both tx vq and rx socket were polled here */
- mutex_lock_nested(&tvq->mutex, VHOST_NET_VQ_TX);
- vhost_disable_notify(&net->dev, tvq);
-
- preempt_disable();
- endtime = busy_clock() + tvq->busyloop_timeout;
-
- while (vhost_can_busy_poll(endtime)) {
- if (vhost_has_work(&net->dev)) {
- *busyloop_intr = true;
- break;
- }
- if ((sk_has_rx_data(sk) &&
- !vhost_vq_avail_empty(&net->dev, rvq)) ||
- !vhost_vq_avail_empty(&net->dev, tvq))
- break;
- cpu_relax();
- }
-
- preempt_enable();
-
- if (!vhost_vq_avail_empty(&net->dev, tvq)) {
- vhost_poll_queue(&tvq->poll);
- } else if (unlikely(vhost_enable_notify(&net->dev, tvq))) {
- vhost_disable_notify(&net->dev, tvq);
- vhost_poll_queue(&tvq->poll);
- }
-
- mutex_unlock(&tvq->mutex);
+ vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, true);
len = peek_head_len(rnvq, sk);
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next v8 2/7] net: vhost: replace magic number of lock annotation
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC (permalink / raw)
To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Use the VHOST_NET_VQ_XXX as a subclass for mutex_lock_nested.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/net.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 367d802..32c1b52 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -712,7 +712,7 @@ static void handle_tx(struct vhost_net *net)
struct vhost_virtqueue *vq = &nvq->vq;
struct socket *sock;
- mutex_lock(&vq->mutex);
+ mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_TX);
sock = vq->private_data;
if (!sock)
goto out;
@@ -777,7 +777,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
/* Flush batched heads first */
vhost_net_signal_used(rnvq);
/* Both tx vq and rx socket were polled here */
- mutex_lock_nested(&tvq->mutex, 1);
+ mutex_lock_nested(&tvq->mutex, VHOST_NET_VQ_TX);
vhost_disable_notify(&net->dev, tvq);
preempt_disable();
@@ -919,7 +919,7 @@ static void handle_rx(struct vhost_net *net)
__virtio16 num_buffers;
int recv_pkts = 0;
- mutex_lock_nested(&vq->mutex, 0);
+ mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_RX);
sock = vq->private_data;
if (!sock)
goto out;
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next v8 1/7] net: vhost: lock the vqs one by one
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC (permalink / raw)
To: jasowang, mst, makita.toshiaki; +Cc: virtualization, netdev, Tonghao Zhang
In-Reply-To: <1534680686-3108-1-git-send-email-xiangxia.m.yue@gmail.com>
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
This patch changes the locking from taking all vq locks
at the same time to locking the vqs one by one. It will
be used by the next patch to avoid a deadlock.
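The difference between holding every vq lock at once and taking them one at a time can be sketched as below. This is a toy model only: `struct toy_lock`, `struct toy_dev`, `TOY_MAX_VQ` and the `toy_*` functions are hypothetical, and the "lock" merely records whether it is held so that the peak number of simultaneously held locks can be observed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_VQ 4	/* hypothetical queue count for the sketch */

/* Toy lock that only records whether it is held; not a real mutex. */
struct toy_lock {
	int held;
};

struct toy_dev {
	struct toy_lock vq_lock[TOY_MAX_VQ];
	int vq_meta[TOY_MAX_VQ];	/* per-queue state to reset */
	int nvqs;
	int held_now;			/* locks currently held */
	int held_peak;			/* most locks ever held at once */
};

void toy_lock_acquire(struct toy_dev *d, int i)
{
	assert(!d->vq_lock[i].held);
	d->vq_lock[i].held = 1;
	if (++d->held_now > d->held_peak)
		d->held_peak = d->held_now;
}

void toy_lock_release(struct toy_dev *d, int i)
{
	assert(d->vq_lock[i].held);
	d->vq_lock[i].held = 0;
	d->held_now--;
}

/* Reset every queue while holding only that queue's lock. */
void toy_meta_reset(struct toy_dev *d)
{
	int i;

	assert(d != NULL && d->nvqs <= TOY_MAX_VQ);
	for (i = 0; i < d->nvqs; ++i) {
		toy_lock_acquire(d, i);
		d->vq_meta[i] = 0;
		toy_lock_release(d, i);
	}
}
```

After `toy_meta_reset()` the peak count stays at one, which is the property the series presumably relies on: a path that already holds one vq lock never has to wait behind a caller holding all of them.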
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/vhost.c | 24 +++++++-----------------
1 file changed, 7 insertions(+), 17 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a502f1a..a1c06e7 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -294,8 +294,11 @@ static void vhost_vq_meta_reset(struct vhost_dev *d)
{
int i;
- for (i = 0; i < d->nvqs; ++i)
+ for (i = 0; i < d->nvqs; ++i) {
+ mutex_lock(&d->vqs[i]->mutex);
__vhost_vq_meta_reset(d->vqs[i]);
+ mutex_unlock(&d->vqs[i]->mutex);
+ }
}
static void vhost_vq_reset(struct vhost_dev *dev,
@@ -890,20 +893,6 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
#define vhost_get_used(vq, x, ptr) \
vhost_get_user(vq, x, ptr, VHOST_ADDR_USED)
-static void vhost_dev_lock_vqs(struct vhost_dev *d)
-{
- int i = 0;
- for (i = 0; i < d->nvqs; ++i)
- mutex_lock_nested(&d->vqs[i]->mutex, i);
-}
-
-static void vhost_dev_unlock_vqs(struct vhost_dev *d)
-{
- int i = 0;
- for (i = 0; i < d->nvqs; ++i)
- mutex_unlock(&d->vqs[i]->mutex);
-}
-
static int vhost_new_umem_range(struct vhost_umem *umem,
u64 start, u64 size, u64 end,
u64 userspace_addr, int perm)
@@ -953,7 +942,10 @@ static void vhost_iotlb_notify_vq(struct vhost_dev *d,
if (msg->iova <= vq_msg->iova &&
msg->iova + msg->size - 1 > vq_msg->iova &&
vq_msg->type == VHOST_IOTLB_MISS) {
+ mutex_lock(&node->vq->mutex);
vhost_poll_queue(&node->vq->poll);
+ mutex_unlock(&node->vq->mutex);
+
list_del(&node->node);
kfree(node);
}
@@ -985,7 +977,6 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
int ret = 0;
mutex_lock(&dev->mutex);
- vhost_dev_lock_vqs(dev);
switch (msg->type) {
case VHOST_IOTLB_UPDATE:
if (!dev->iotlb) {
@@ -1019,7 +1010,6 @@ static int vhost_process_iotlb_msg(struct vhost_dev *dev,
break;
}
- vhost_dev_unlock_vqs(dev);
mutex_unlock(&dev->mutex);
return ret;
--
1.8.3.1
^ permalink raw reply related
* Re: [PATCH v2 06/29] mtd: Add support for reading MTD devices via the nvmem API
From: Alban @ 2018-08-19 11:31 UTC (permalink / raw)
To: Boris Brezillon
Cc: Aban Bedel, Bartosz Golaszewski, Jonathan Corbet, Sekhar Nori,
Kevin Hilman, Russell King, Arnd Bergmann, Greg Kroah-Hartman,
David Woodhouse, Brian Norris, Marek Vasut, Richard Weinberger,
Grygorii Strashko, David S . Miller, Srinivas Kandagatla, Naren,
Mauro Carvalho Chehab, Andrew Morton, Lukas Wunner,
Dan Carpenter <dan.c
In-Reply-To: <20180817182720.6a6e5e8e@bbrezillon>
On Fri, 17 Aug 2018 18:27:20 +0200
Boris Brezillon <boris.brezillon@bootlin.com> wrote:
> Hi Bartosz,
>
> On Fri, 10 Aug 2018 10:05:03 +0200
> Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> > From: Alban Bedel <albeu@free.fr>
> >
> > Allow drivers that use the nvmem API to read data stored on MTD devices.
> > For this the mtd devices are registered as read-only NVMEM providers.
> >
> > Signed-off-by: Alban Bedel <albeu@free.fr>
> > [Bartosz:
> > - use the managed variant of nvmem_register(),
> > - set the nvmem name]
> > Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
>
> What happened to the 2 other patches of Alban's series? I'd really
> like the DT case to be handled/agreed on in the same patchset, but
> IIRC, Alban and Srinivas disagreed on how this should be represented.
> I hope this time we'll come to an agreement, because the MTD <-> NVMEM
> glue has been floating around for quite some time...
These other patches were to fix what I consider a fundamental flaw in
the generic NVMEM bindings, however we couldn't agree on this point.
Bartosz later contacted me to take over this series and I suggested to
just change the MTD NVMEM binding to use a compatible string on the
NVMEM cells as an alternative solution to fix the clash with the old
style MTD partition.
However all this has no impact on the code needed to add NVMEM support
to MTD, so the above patch didn't change at all.
Alban
^ permalink raw reply
* [PATCH net-next v8 0/7] net: vhost: improve performance when enable busyloop
From: xiangxia.m.yue @ 2018-08-19 12:11 UTC (permalink / raw)
To: jasowang, mst, makita.toshiaki; +Cc: netdev, virtualization
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
This patch series improves guest receive performance.
On the handle_tx side, we poll the sock receive queue
at the same time. handle_rx does the same.
For performance reports, see patches 4, 6 and 7.
Tonghao Zhang (7):
net: vhost: lock the vqs one by one
net: vhost: replace magic number of lock annotation
net: vhost: factor out busy polling logic to vhost_net_busy_poll()
net: vhost: add rx busy polling in tx path
net: vhost: introduce bitmap for vhost_poll
net: vhost: disable rx wakeup during tx busypoll
net: vhost: make busyloop_intr more accurate
drivers/vhost/net.c | 169 +++++++++++++++++++++++++++++++-------------------
drivers/vhost/vhost.c | 41 ++++++------
drivers/vhost/vhost.h | 7 ++-
3 files changed, 133 insertions(+), 84 deletions(-)
--
1.8.3.1
^ permalink raw reply
* Re: how to (cross)connect two (physical) eth ports for ping test?
From: Robert P. J. Day @ 2018-08-19 8:29 UTC (permalink / raw)
To: Willy Tarreau; +Cc: Andrew Lunn, Linux kernel netdev mailing list
In-Reply-To: <20180818204520.GC8729@1wt.eu>
On Sat, 18 Aug 2018, Willy Tarreau wrote:
> On Sat, Aug 18, 2018 at 09:10:25PM +0200, Andrew Lunn wrote:
> > On Sat, Aug 18, 2018 at 01:39:50PM -0400, Robert P. J. Day wrote:
> > >
> > > (i'm sure this has been explained many times before, so a link
> > > covering this will almost certainly do just fine.)
> > >
> > > i want to loop one physical ethernet port into another, and just
> > > ping the daylights from one to the other for stress testing. my fedora
> > > laptop doesn't actually have two unused ethernet ports, so i just want
> > > to emulate this by slapping a couple startech USB/net adapters into
> > > two empty USB ports, setting this up, then doing it all over again
> > > monday morning on the actual target system, which does have multiple
> > > ethernet ports.
> > >
> > > so if someone can point me to the recipe, that would be great and
> > > you can stop reading.
> > >
> > > as far as my tentative solution goes, i assume i need to put at
> > > least one of the physical ports in a network namespace via "ip netns",
> > > then ping from the netns to the root namespace. or, going one step
> > > further, perhaps putting both interfaces into two new namespaces, and
> > > setting up forwarding.
> >
> > Namespaces is a good solution. Something like this should work:
> >
> > ip netns add namespace1
> > ip netns add namespace2
> >
> > ip link set eth1 netns namespace1
> > ip link set eth2 netns namespace2
> >
> > ip netns exec namespace1 \
> > ip addr add 10.42.42.42/24 dev eth1
> >
> > ip netns exec namespace1 \
> > ip link set eth1 up
> >
> > ip netns exec namespace2 \
> > ip addr add 10.42.42.24/24 dev eth2
> >
> > ip netns exec namespace2 \
> > ip link set eth2 up
> >
> > ip netns exec namespace1 \
> > ping 10.42.42.24
> >
> > You might also want to consider iperf3 for stress testing, depending
> > on the sort of stress you need.
>
> FWIW I have a setup somewhere involving ip rule + ip route which
> achieves the same without involving namespaces. It's a bit hackish
> but sometimes convenient. I can dig if someone is interested.
sure, i'm interested ... always educational to see different
solutions.
rday
--
========================================================================
Robert P. J. Day Ottawa, Ontario, CANADA
http://crashcourse.ca/dokuwiki
Twitter: http://twitter.com/rpjday
LinkedIn: http://ca.linkedin.com/in/rpjday
========================================================================
^ permalink raw reply
* [PATCH 2/2] ip6_vti: fix creating fallback tunnel device for vti6
From: Haishuang Yan @ 2018-08-19 7:05 UTC (permalink / raw)
To: Steffen Klassert, David S. Miller, Alexey Kuznetsov
Cc: netdev, linux-kernel, Haishuang Yan
In-Reply-To: <1534662305-16734-1-git-send-email-yanhaishuang@cmss.chinamobile.com>
When fb_tunnels_only_for_init_net is set to 1, don't create the fallback
tunnel device for vti6 when a new namespace is created.
Tested:
[root@builder2 ~]# modprobe ip6_tunnel
[root@builder2 ~]# modprobe ip6_vti
[root@builder2 ~]# echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
[root@builder2 ~]# unshare -n
[root@builder2 ~]# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
net/ipv6/ip6_vti.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index c72ae3a..3b9f39f 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -1114,6 +1114,8 @@ static int __net_init vti6_init_net(struct net *net)
ip6n->tnls[0] = ip6n->tnls_wc;
ip6n->tnls[1] = ip6n->tnls_r_l;
+ if (!net_has_fallback_tunnels(net))
+ return 0;
err = -ENOMEM;
ip6n->fb_tnl_dev = alloc_netdev(sizeof(struct ip6_tnl), "ip6_vti0",
NET_NAME_UNKNOWN, vti6_dev_setup);
--
1.8.3.1
^ permalink raw reply related
* [PATCH 1/2] ip_vti: fix a null pointer deferrence when create vti fallback tunnel
From: Haishuang Yan @ 2018-08-19 7:05 UTC (permalink / raw)
To: Steffen Klassert, David S. Miller, Alexey Kuznetsov
Cc: netdev, linux-kernel, Haishuang Yan, Eric Dumazet
After setting fb_tunnels_only_for_init_net to 1, itn->fb_tunnel_dev will
be NULL, which causes the following crash:
[ 2742.849298] BUG: unable to handle kernel NULL pointer dereference at 0000000000000941
[ 2742.851380] PGD 800000042c21a067 P4D 800000042c21a067 PUD 42aaed067 PMD 0
[ 2742.852818] Oops: 0002 [#1] SMP PTI
[ 2742.853570] CPU: 7 PID: 2484 Comm: unshare Kdump: loaded Not tainted 4.18.0-rc8+ #2
[ 2742.855163] Hardware name: Fedora Project OpenStack Nova, BIOS seabios-1.7.5-11.el7 04/01/2014
[ 2742.856970] RIP: 0010:vti_init_net+0x3a/0x50 [ip_vti]
[ 2742.858034] Code: 90 83 c0 48 c7 c2 20 a1 83 c0 48 89 fb e8 6e 3b f6 ff 85 c0 75 22 8b 0d f4 19 00 00 48 8b 93 00 14 00 00 48 8b 14 ca 48 8b 12 <c6> 82 41 09 00 00 04 c6 82 38 09 00 00 45 5b c3 66 0f 1f 44 00 00
[ 2742.861940] RSP: 0018:ffff9be28207fde0 EFLAGS: 00010246
[ 2742.863044] RAX: 0000000000000000 RBX: ffff8a71ebed4980 RCX: 0000000000000013
[ 2742.864540] RDX: 0000000000000000 RSI: 0000000000000013 RDI: ffff8a71ebed4980
[ 2742.866020] RBP: ffff8a71ea717000 R08: ffffffffc083903c R09: ffff8a71ea717000
[ 2742.867505] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a71ebed4980
[ 2742.868987] R13: 0000000000000013 R14: ffff8a71ea5b49c0 R15: 0000000000000000
[ 2742.870473] FS: 00007f02266c9740(0000) GS:ffff8a71ffdc0000(0000) knlGS:0000000000000000
[ 2742.872143] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2742.873340] CR2: 0000000000000941 CR3: 000000042bc20006 CR4: 00000000001606e0
[ 2742.874821] Call Trace:
[ 2742.875358] ops_init+0x38/0xf0
[ 2742.876078] setup_net+0xd9/0x1f0
[ 2742.876789] copy_net_ns+0xb7/0x130
[ 2742.877538] create_new_namespaces+0x11a/0x1d0
[ 2742.878525] unshare_nsproxy_namespaces+0x55/0xa0
[ 2742.879526] ksys_unshare+0x1a7/0x330
[ 2742.880313] __x64_sys_unshare+0xe/0x20
[ 2742.881131] do_syscall_64+0x5b/0x180
[ 2742.881933] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reproduce:
echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
modprobe ip_vti
unshare -n
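The failure mode and the guard that avoids it can be sketched in user-space C. The names below (`struct toy_tunnel_dev`, `struct toy_tunnel_net`, `toy_fb_tunnel_init()`, `toy_init_net()`) are hypothetical stand-ins for the fallback device, the per-namespace tunnel state, and the init functions; this is an illustration of the pattern, not the ip_vti code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fallback tunnel device. */
struct toy_tunnel_dev {
	int ready;
};

/* Hypothetical stand-in for the per-namespace tunnel state. */
struct toy_tunnel_net {
	struct toy_tunnel_dev *fb_tunnel_dev;	/* NULL when no fallback device exists */
};

void toy_fb_tunnel_init(struct toy_tunnel_dev *dev)
{
	dev->ready = 1;	/* would fault if dev were NULL */
}

/* Per-namespace init: only touch the fallback device if it exists. */
int toy_init_net(struct toy_tunnel_net *itn)
{
	assert(itn != NULL);
	if (itn->fb_tunnel_dev)
		toy_fb_tunnel_init(itn->fb_tunnel_dev);
	return 0;
}
```

With the guard in place, a namespace without a fallback device initializes successfully instead of writing through a NULL pointer.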
Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
---
net/ipv4/ip_vti.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 3f091cc..f38cb21 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -438,7 +438,8 @@ static int __net_init vti_init_net(struct net *net)
if (err)
return err;
itn = net_generic(net, vti_net_id);
- vti_fb_tunnel_init(itn->fb_tunnel_dev);
+ if (itn->fb_tunnel_dev)
+ vti_fb_tunnel_init(itn->fb_tunnel_dev);
return 0;
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH][net-next] vxlan: reduce dirty cache line in vxlan_find_mac
From: Li RongQing @ 2018-08-19 3:36 UTC (permalink / raw)
To: netdev
vxlan_find_mac() unconditionally sets f->used for every packet.
This causes a cache miss for every packet, since remote, hlist
and used of vxlan_fdb share the same cacheline.
With this change, f->used is set only if it is not equal to jiffies.
This gives up to 5% speed-up with small packets.
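The check-before-write idea can be sketched in user-space C as follows. This is a minimal illustration, not the vxlan code: `struct fdb_entry`, `fdb_touch()` and the `now` parameter are hypothetical stand-ins for `struct vxlan_fdb`, the update in `vxlan_find_mac()` and `jiffies`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vxlan_fdb; only the timestamp matters here. */
struct fdb_entry {
	unsigned long used;	/* last-use timestamp, analogous to f->used */
};

/*
 * Update the timestamp only when it would change. Returns 1 if a store
 * was issued and 0 if it was skipped, so a caller can count the writes.
 */
int fdb_touch(struct fdb_entry *f, unsigned long now)
{
	assert(f != NULL);
	if (f->used == now)
		return 0;	/* same tick: leave the cache line clean */
	f->used = now;
	return 1;
}
```

Many lookups that hit the same entry within one tick then issue at most one store, so the shared cache line is dirtied once per tick rather than once per packet.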
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
drivers/net/vxlan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ababba37d735..e5d236595206 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -464,7 +464,7 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
struct vxlan_fdb *f;
f = __vxlan_find_mac(vxlan, mac, vni);
- if (f)
+ if (f && f->used != jiffies)
f->used = jiffies;
return f;
--
2.16.2
^ permalink raw reply related