From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Hangbin Liu <haliu@redhat.com>,
Jan Tluka <jtluka@redhat.com>,
Lance Richardson <lrichard@redhat.com>,
"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.4 35/73] vti: flush x-netns xfrm cache when vti interface is removed
Date: Wed, 28 Sep 2016 11:05:05 +0200 [thread overview]
Message-ID: <20160928090437.171333198@linuxfoundation.org> (raw)
In-Reply-To: <20160928090434.509091655@linuxfoundation.org>
4.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lance Richardson <lrichard@redhat.com>
[ Upstream commit a5d0dc810abf3d6b241777467ee1d6efb02575fc ]
When executing the script included below, the netns delete operation
hangs with the following message (repeated at 10 second intervals):
kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
This occurs because a reference to the lo interface in the "secure" netns
is still held by a dst entry in the xfrm bundle cache in the init netns.
Address this problem by garbage collecting the tunnel netns flow cache
when a cross-namespace vti interface receives a NETDEV_DOWN notification.
A more detailed description of the problem scenario (referencing commands
in the script below):
(1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
The vti_test interface is created in the init namespace. vti_tunnel_init()
attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
setting the tunnel net to &init_net.
(2) ip link set vti_test netns secure
The vti_test interface is moved to the "secure" netns. Note that
the associated struct ip_tunnel still has tunnel->net set to &init_net.
(3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1
The first packet sent using the vti device causes xfrm_lookup() to be
called as follows:
dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);
Note that tunnel->net is the init namespace, while skb_dst(skb) references
the vti_test interface in the "secure" namespace. The returned dst
references an interface in the init namespace.
Also note that the first parameter to xfrm_lookup() determines which flow
cache is used to store the computed xfrm bundle, so after xfrm_lookup()
returns there will be a cached bundle in the init namespace flow cache
with a dst referencing a device in the "secure" namespace.
(4) ip netns del secure
Kernel begins to delete the "secure" namespace. At some point the
vti_test interface is deleted, at which point dst_ifdown() changes
the dst->dev in the cached xfrm bundle flow from vti_test to lo (still
in the "secure" namespace however).
Since nothing has happened to cause the init namespace's flow cache
to be garbage collected, this dst remains attached to the flow cache,
so the kernel loops waiting for the last reference to lo to go away.
<Begin script>
ip link add br1 type bridge
ip link set dev br1 up
ip addr add dev br1 1.1.1.1/8
ip netns add secure
ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
ip link set vti_test netns secure
ip netns exec secure ip link set vti_test up
ip netns exec secure ip link s lo up
ip netns exec secure ip addr add dev lo 192.168.100.1/24
ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
ip xfrm policy flush
ip xfrm state flush
ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
proto esp mode tunnel mark 1
ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
proto esp mode tunnel mark 1
ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1
ip netns del secure
<End script>
Reported-by: Hangbin Liu <haliu@redhat.com>
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/ip_vti.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -540,6 +540,33 @@ static struct rtnl_link_ops vti_link_ops
.get_link_net = ip_tunnel_get_link_net,
};
+static bool is_vti_tunnel(const struct net_device *dev)
+{
+ return dev->netdev_ops == &vti_netdev_ops;
+}
+
+static int vti_device_event(struct notifier_block *unused,
+ unsigned long event, void *ptr)
+{
+ struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+ struct ip_tunnel *tunnel = netdev_priv(dev);
+
+ if (!is_vti_tunnel(dev))
+ return NOTIFY_DONE;
+
+ switch (event) {
+ case NETDEV_DOWN:
+ if (!net_eq(tunnel->net, dev_net(dev)))
+ xfrm_garbage_collect(tunnel->net);
+ break;
+ }
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block vti_notifier_block __read_mostly = {
+ .notifier_call = vti_device_event,
+};
+
static int __init vti_init(void)
{
const char *msg;
@@ -547,6 +574,8 @@ static int __init vti_init(void)
pr_info("IPv4 over IPsec tunneling driver\n");
+ register_netdevice_notifier(&vti_notifier_block);
+
msg = "tunnel device";
err = register_pernet_device(&vti_net_ops);
if (err < 0)
@@ -579,6 +608,7 @@ xfrm_proto_ah_failed:
xfrm_proto_esp_failed:
unregister_pernet_device(&vti_net_ops);
pernet_dev_failed:
+ unregister_netdevice_notifier(&vti_notifier_block);
pr_err("vti init: failed to register %s\n", msg);
return err;
}
@@ -590,6 +620,7 @@ static void __exit vti_fini(void)
xfrm4_protocol_deregister(&vti_ah4_protocol, IPPROTO_AH);
xfrm4_protocol_deregister(&vti_esp4_protocol, IPPROTO_ESP);
unregister_pernet_device(&vti_net_ops);
+ unregister_netdevice_notifier(&vti_notifier_block);
}
module_init(vti_init);
next prev parent reply other threads:[~2016-09-28 9:07 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20160928090623uscas1p1076bd85a3fd981ed5a1284f5bebb1bbf@uscas1p1.samsung.com>
2016-09-28 9:04 ` [PATCH 4.4 00/73] 4.4.23-stable review Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 01/73] include/linux/kernel.h: change abs() macro so it uses consistent return type Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 02/73] Fix build warning in kernel/cpuset.c Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 03/73] reiserfs: fix "new_insert_key may be used uninitialized ..." Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 09/73] crypto: arm64/aes-ctr - fix NULL dereference in tail processing Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 10/73] crypto: arm/aes-ctr " Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 11/73] crypto: skcipher - Fix blkcipher walk OOM crash Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 12/73] crypto: echainiv - Replace chaining with multiplication Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 13/73] ocfs2/dlm: fix race between convert and migration Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 14/73] ocfs2: fix start offset to ocfs2_zero_range_for_truncate() Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 15/73] kbuild: Do not run modules_install and install in paralel Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 16/73] Makefile: revert "Makefile: Document ability to make file.lst and file.S" partially Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 17/73] tools: Support relative directory path for O= Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 18/73] kbuild: forbid kernel directory to contain spaces and colons Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 19/73] Kbuild: disable maybe-uninitialized warning for CONFIG_PROFILE_ALL_BRANCHES Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 20/73] gcov: disable -Wmaybe-uninitialized warning Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 21/73] Disable "maybe-uninitialized" warning globally Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 23/73] Makefile: Mute warning for __builtin_return_address(>0) for tracing only Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 24/73] net: caif: fix misleading indentation Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 26/73] [media] am437x-vfpe: fix typo in vpfe_get_app_input_index Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 27/73] ath9k: fix misleading indentation Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 28/73] iwlegacy: avoid warning about missing braces Greg Kroah-Hartman
2016-09-28 9:04 ` [PATCH 4.4 29/73] Staging: iio: adc: fix indent on break statement Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 30/73] nouveau: fix nv40_perfctr_next() cleanup regression Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 31/73] megaraid: fix null pointer check in megasas_detach_one() Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 32/73] bonding: Fix bonding crash Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 33/73] Revert "af_unix: Fix splice-bind deadlock" Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 34/73] af_unix: split u->readlock into two: iolock and bindlock Greg Kroah-Hartman
2016-09-28 9:05 ` Greg Kroah-Hartman [this message]
2016-09-28 9:05 ` [PATCH 4.4 36/73] net/irda: handle iriap_register_lsap() allocation failure Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 37/73] tipc: fix NULL pointer dereference in shutdown() Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 38/73] net/mlx5: Added missing check of msg length in verifying its signature Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 39/73] net: dsa: bcm_sf2: Fix race condition while unmasking interrupts Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 40/73] Revert "phy: IRQ cannot be shared" Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 41/73] net: smc91x: fix SMC accesses Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 42/73] bridge: re-introduce fix parsing of MLDv2 reports Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 43/73] pwm: Mark all devices as "might sleep" Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 44/73] autofs races Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 45/73] autofs: use dentry flags to block walks during expire Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 46/73] xfs: prevent dropping ioend completions during buftarg wait Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 47/73] fsnotify: add a way to stop queueing events on group shutdown Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 48/73] fanotify: fix list corruption in fanotify_get_response() Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 49/73] fix fault_in_multipages_...() on architectures with no-op access_ok() Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 50/73] mtd: maps: sa1100-flash: potential NULL dereference Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 51/73] mtd: pmcmsp-flash: Allocating too much in init_msp_flash() Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 52/73] power: reset: hisi-reboot: Unmap region obtained by of_iomap Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 53/73] fix memory leaks in tracing_buffers_splice_read() Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 54/73] tracing: Move mutex to protect against resetting of seq data Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 55/73] mm: delete unnecessary and unsafe init_tlb_ubc() Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 56/73] can: flexcan: fix resume function Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 57/73] nl80211: validate number of probe response CSA counters Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 58/73] btrfs: ensure that file descriptor used with subvol ioctls is a dir Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 59/73] i2c-eg20t: fix race between i2c init and interrupt enable Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 60/73] i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 61/73] MIPS: Fix pre-r6 emulation FPU initialisation Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 63/73] MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 64/73] MIPS: Remove compact branch policy Kconfig entries Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 65/73] MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...) Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 66/73] MIPS: Add a missing ".set pop" in an early commit Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 67/73] MIPS: paravirt: Fix undefined reference to smp_bootstrap Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 68/73] PM / hibernate: Restore processor state before using per-CPU variables Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 69/73] PM / hibernate: Fix rtree_next_node() to avoid walking off list ends Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 70/73] power_supply: tps65217-charger: fix missing platform_set_drvdata() Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 71/73] power: supply: max17042_battery: fix model download bug Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 72/73] qxl: check for kmap failures Greg Kroah-Hartman
2016-09-28 9:05 ` [PATCH 4.4 73/73] hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() Greg Kroah-Hartman
2016-09-28 16:45 ` [PATCH 4.4 00/73] 4.4.23-stable review Shuah Khan
2016-09-28 22:43 ` Guenter Roeck
[not found] ` <57ec0f9e.07ddc20a.146f7.4be3@mx.google.com>
2016-09-29 9:01 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160928090437.171333198@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=davem@davemloft.net \
--cc=haliu@redhat.com \
--cc=jtluka@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lrichard@redhat.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).