[PATCH net] Fix an intermittent pr_emerg warning about lo becoming free.

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Krister Johansen <kjlx@templeofstupid.com>
To: davem@davemloft.net, eric.dumazet@gmail.com
Cc: netdev@vger.kernel.org
Subject: [PATCH net] Fix an intermittent pr_emerg warning about lo becoming free.
Date: Thu, 8 Jun 2017 13:12:38 -0700	[thread overview]
Message-ID: <20170608201238.GC2553@templeofstupid.com> (raw)

It looks like this:

Message from syslogd@flamingo at Apr 26 00:45:00 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4

They seem to coincide with net namespace teardown.

The message is emitted by netdev_wait_allrefs().

Forced a kdump in netdev_run_todo, but found that the refcount on the lo
device was already 0 at the time we got to the panic.

Used bcc to check the blocking in netdev_run_todo.  The only places
where we're off cpu there are in the rcu_barrier() and msleep() calls.
That behavior is expected.  The msleep time coincides with the amount of
time we spend waiting for the refcount to reach zero; the rcu_barrier()
wait times are not excessive.

After looking through the list of callbacks that the netdevice notifiers
invoke in this path, it appears that the dst_dev_event is the most
interesting.  The dst_ifdown path places a hold on the loopback_dev as
part of releasing the dev associated with the original dst cache entry.
Most of our notifier callbacks are straight-forward, but this one a)
looks complex, and b) places a hold on the network interface in
question.

I constructed a new bcc script that watches various events in the
liftime of a dst cache entry.  Note that dst_ifdown will take a hold on
the loopback device until the invalidated dst entry gets freed.

[      __dst_free] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282115677036183
    __dst_free
    rcu_nocb_kthread
    kthread
    ret_from_fork
---
[     dst_gc_task] completed in          4: start: 1282115783057516 end 1282115783062462
[     dst_gc_task] completed in          5: start: 1282116447054101 end 1282116447059392
[     dst_gc_task] completed in          5: start: 1282118111030391 end 1282118111036041
[     dst_gc_task] completed in          6: start: 1282121247074938 end 1282121247081141
[     dst_gc_task] completed in          5: start: 1282126815053252 end 1282126815058751
[     dst_gc_task] completed in          6: start: 1282135007055673 end 1282135007061877
[     dst_gc_task] completed in          6: start: 1282145759021745 end 1282145759027830
[     dst_release] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282147838674257
    dst_release
    ip_queue_xmit
    tcp_transmit_skb
    tcp_write_xmit
    __tcp_push_pending_frames
    tcp_send_fin
    tcp_close
    inet_release
    sock_release
    sock_close
    __fput
    ____fput
    task_work_run
    do_exit
    do_group_exit
    SyS_exit_group
    do_syscall_64
    return_from_SYSCALL_64
[      dst_ifdown] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282148067377680
    dst_ifdown
    notifier_call_chain
    raw_notifier_call_chain
    call_netdevice_notifiers_info
    netdev_run_todo
    kretprobe_trampoline
    default_device_exit_batch
    ops_exit_list.isra.4
    cleanup_net
    process_one_work
    worker_thread
    kthread
    ret_from_fork
---
[      dst_ifdown] on DST: ffff883ccabb7900 IF lo invoked at 1282148067411269
    dst_ifdown
    notifier_call_chain
    raw_notifier_call_chain
    call_netdevice_notifiers_info
    netdev_run_todo
    kretprobe_trampoline
    default_device_exit_batch
    ops_exit_list.isra.4
    cleanup_net
    process_one_work
    worker_thread
    kthread
    ret_from_fork
---
<...>
[     dst_destroy] on DST: ffff883ccabb7900 IF lo invoked at 1282160607074022
    dst_destroy
    kretprobe_trampoline
    worker_thread
    kthread
    ret_from_fork
---
[     dst_gc_task] completed in         30: start: 1282160607047161 end 1282160607077572

The way this works is that if there's still a reference on the dst entry
at the time we try to free it, it gets placed in the gc list by
__dst_free and the dst_destroy() call is invoked by the gc task once the
refcount is 0.  If the gc task processes a 10th or less of its entries
on a single pass, it inreases the amount of time it waits between gc
intervals.

Looking at the gc_task intervals, they started at 663ms when we invoked
__dst_free().  After that, they increased to 1663, 3136, 5567, 8191,
10751, and 14848.  The release that set the refcnt to 0 on our dst entry
occurred after the gc_task was enqueued for 14 second interval so we had
to wait longer than the warning time in wait_allrefs in order for the
dst entry to get free'd and the hold on 'lo' to be released.

A simple solution to this problem is to have dst_dev_event() reset the
gc timer, which causes us to process this list shortly after the
gc_mutex is relased when dst_dev_event() completes.

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
---
 net/core/dst.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/core/dst.c b/net/core/dst.c
index 6192f11..13ba4a0 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -469,6 +469,20 @@ static int dst_dev_event(struct notifier_block *this, unsigned long event,
 		spin_lock_bh(&dst_garbage.lock);
 		dst = dst_garbage.list;
 		dst_garbage.list = NULL;
+		/* The code in dst_ifdown places a hold on the loopback device.
+		 * If the gc entry processing is set to expire after a lengthy
+		 * interval, this hold can cause netdev_wait_allrefs() to hang
+		 * out and wait for a long time -- until the the loopback
+		 * interface is released.  If we're really unlucky, it'll emit
+		 * pr_emerg messages to console too.  Reset the interval here,
+		 * so dst cleanups occur in a more timely fashion.
+		 */
+		if (dst_garbage.timer_inc > DST_GC_INC) {
+			dst_garbage.timer_inc = DST_GC_INC;
+			dst_garbage.timer_expires = DST_GC_MIN;
+			mod_delayed_work(system_wq, &dst_gc_work,
+					 dst_garbage.timer_expires);
+		}
 		spin_unlock_bh(&dst_garbage.lock);

 		if (last)
-- 
2.7.4

next             reply	other threads:[~2017-06-08 20:12 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-08 20:12 Krister Johansen [this message]
2017-06-08 21:06 ` [PATCH net] Fix an intermittent pr_emerg warning about lo becoming free Eric Dumazet
2017-06-08 21:41   ` Wei Wang
2017-06-09 16:28 ` David Miller
2017-06-09 18:18 ` Cong Wang
2017-06-09 18:43   ` Krister Johansen
2017-06-09 21:48     ` Cong Wang

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:6192f11 dfblob:13ba4a0 )
 OR (
bs:"[PATCH net] Fix an intermittent pr_emerg warning about lo becoming free." )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170608201238.GC2553@templeofstupid.com \
    --to=kjlx@templeofstupid.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.