From mboxrd@z Thu Jan 1 00:00:00 1970 From: Fengguang Wu Subject: Re: [PATCH] tcp: fix possible socket refcount problem Date: Tue, 21 Aug 2012 10:07:50 +0800 Message-ID: <20120821020750.GA12612@localhost> References: <20120818021918.GA6499@localhost> <1345380682.5158.201.camel@edumazet-glaptop> <1345458166.5158.316.camel@edumazet-glaptop> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: David Miller , networking To: Eric Dumazet Return-path: Received: from mga11.intel.com ([192.55.52.93]:34883 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754206Ab2HUCHx (ORCPT ); Mon, 20 Aug 2012 22:07:53 -0400 Content-Disposition: inline In-Reply-To: <1345458166.5158.316.camel@edumazet-glaptop> Sender: netdev-owner@vger.kernel.org List-ID: I'm sure it fixed the problem: up to now, there are 264 boots (each taking 1.5 hours) without a single failure. Thanks for the quick fix! On Mon, Aug 20, 2012 at 12:22:46PM +0200, Eric Dumazet wrote: > From: Eric Dumazet > > Commit 6f458dfb40 (tcp: improve latencies of timer triggered events) > added bug leading to following trace : > > [ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000 > [ 2866.131726] > [ 2866.132188] ========================= > [ 2866.132281] [ BUG: held lock freed! ] > [ 2866.132281] 3.6.0-rc1+ #622 Not tainted > [ 2866.132281] ------------------------- > [ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there! > [ 2866.132281] (sk_lock-AF_INET-RPC){+.+...}, at: [] tcp_sendmsg+0x29/0xcc6 > [ 2866.132281] 4 locks held by kworker/0:1/652: > [ 2866.132281] #0: (rpciod){.+.+.+}, at: [] process_one_work+0x1de/0x47f > [ 2866.132281] #1: ((&task->u.tk_work)){+.+.+.}, at: [] process_one_work+0x1de/0x47f > [ 2866.132281] #2: (sk_lock-AF_INET-RPC){+.+...}, at: [] tcp_sendmsg+0x29/0xcc6 > [ 2866.132281] #3: (&icsk->icsk_retransmit_timer){+.-...}, at: [] run_timer_softirq+0x1ad/0x35f > [ 2866.132281] > [ 2866.132281] stack backtrace: > [ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622 > [ 2866.132281] Call Trace: > [ 2866.132281] [] debug_check_no_locks_freed+0x112/0x159 > [ 2866.132281] [] ? __sk_free+0xfd/0x114 > [ 2866.132281] [] kmem_cache_free+0x6b/0x13a > [ 2866.132281] [] __sk_free+0xfd/0x114 > [ 2866.132281] [] sk_free+0x1c/0x1e > [ 2866.132281] [] tcp_write_timer+0x51/0x56 > [ 2866.132281] [] run_timer_softirq+0x218/0x35f > [ 2866.132281] [] ? run_timer_softirq+0x1ad/0x35f > [ 2866.132281] [] ? rb_commit+0x58/0x85 > [ 2866.132281] [] ? tcp_write_timer_handler+0x148/0x148 > [ 2866.132281] [] __do_softirq+0xcb/0x1f9 > [ 2866.132281] [] ? _raw_spin_unlock+0x29/0x2e > [ 2866.132281] [] call_softirq+0x1c/0x30 > [ 2866.132281] [] do_softirq+0x4a/0xa6 > [ 2866.132281] [] irq_exit+0x51/0xad > [ 2866.132281] [] do_IRQ+0x9d/0xb4 > [ 2866.132281] [] common_interrupt+0x6f/0x6f > [ 2866.132281] [] ? sched_clock_cpu+0x58/0xd1 > [ 2866.132281] [] ? _raw_spin_unlock_irqrestore+0x4c/0x56 > [ 2866.132281] [] mod_timer+0x178/0x1a9 > [ 2866.132281] [] sk_reset_timer+0x19/0x26 > [ 2866.132281] [] tcp_rearm_rto+0x99/0xa4 > [ 2866.132281] [] tcp_event_new_data_sent+0x6e/0x70 > [ 2866.132281] [] tcp_write_xmit+0x7de/0x8e4 > [ 2866.132281] [] ? __alloc_skb+0xa0/0x1a1 > [ 2866.132281] [] __tcp_push_pending_frames+0x2e/0x8a > [ 2866.132281] [] tcp_sendmsg+0xb32/0xcc6 > [ 2866.132281] [] inet_sendmsg+0xaa/0xd5 > [ 2866.132281] [] ? inet_autobind+0x5f/0x5f > [ 2866.132281] [] ? trace_clock_local+0x9/0xb > [ 2866.132281] [] sock_sendmsg+0xa3/0xc4 > [ 2866.132281] [] ? rb_reserve_next_event+0x26f/0x2d5 > [ 2866.132281] [] ? native_sched_clock+0x29/0x6f > [ 2866.132281] [] ? sched_clock+0x9/0xd > [ 2866.132281] [] ? trace_clock_local+0x9/0xb > [ 2866.132281] [] kernel_sendmsg+0x37/0x43 > [ 2866.132281] [] xs_send_kvec+0x77/0x80 > [ 2866.132281] [] xs_sendpages+0x6f/0x1a0 > [ 2866.132281] [] ? try_to_del_timer_sync+0x55/0x61 > [ 2866.132281] [] xs_tcp_send_request+0x55/0xf1 > [ 2866.132281] [] xprt_transmit+0x89/0x1db > [ 2866.132281] [] ? call_connect+0x3c/0x3c > [ 2866.132281] [] call_transmit+0x1c5/0x20e > [ 2866.132281] [] __rpc_execute+0x6f/0x225 > [ 2866.132281] [] ? call_connect+0x3c/0x3c > [ 2866.132281] [] rpc_async_schedule+0x28/0x34 > [ 2866.132281] [] process_one_work+0x24d/0x47f > [ 2866.132281] [] ? process_one_work+0x1de/0x47f > [ 2866.132281] [] ? __rpc_execute+0x225/0x225 > [ 2866.132281] [] worker_thread+0x236/0x317 > [ 2866.132281] [] ? process_scheduled_works+0x2f/0x2f > [ 2866.132281] [] kthread+0x9a/0xa2 > [ 2866.132281] [] kernel_thread_helper+0x4/0x10 > [ 2866.132281] [] ? retint_restore_args+0x13/0x13 > [ 2866.132281] [] ? __init_kthread_worker+0x5a/0x5a > [ 2866.132281] [] ? gs_change+0x13/0x13 > [ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000 > [ 2866.309689] ============================================================================= > [ 2866.310254] BUG TCP (Not tainted): Object already free > [ 2866.310254] ----------------------------------------------------------------------------- > [ 2866.310254] > > The bug comes from the fact that timer set in sk_reset_timer() can run > before we actually do the sock_hold(). socket refcount reaches zero and > we free the socket too soon. > > timer handler is not allowed to reduce socket refcnt if socket is owned > by the user, or we need to change sk_reset_timer() implementation. > > We should take a reference on the socket in case TCP_DELACK_TIMER_DEFERRED > or TCP_DELACK_TIMER_DEFERRED bit are set in tsq_flags > > Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED > was used instead of TCP_DELACK_TIMER_DEFERRED. > > For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED, > even if not fired from a timer. > > Reported-by: Fengguang Wu > Tested-by: Fengguang Wu > Signed-off-by: Eric Dumazet > --- > net/ipv4/tcp_ipv4.c | 8 +++++--- > net/ipv4/tcp_output.c | 14 +++++++++----- > net/ipv4/tcp_timer.c | 6 ++++-- > 3 files changed, 18 insertions(+), 10 deletions(-) > diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c > index 7678237..6278a11 100644 > --- a/net/ipv4/tcp_ipv4.c > +++ b/net/ipv4/tcp_ipv4.c > @@ -417,10 +417,12 @@ void tcp_v4_err(struct sk_buff *icmp_skb, u32 info) > > if (code == ICMP_FRAG_NEEDED) { /* PMTU discovery (RFC1191) */ > tp->mtu_info = info; > - if (!sock_owned_by_user(sk)) > + if (!sock_owned_by_user(sk)) { > tcp_v4_mtu_reduced(sk); > - else > - set_bit(TCP_MTU_REDUCED_DEFERRED, &tp->tsq_flags); > + } else { > + if (!test_and_set_bit(TCP_MTU_REDUCED_DEFERRED, &tp->tsq_flags)) > + sock_hold(sk); > + } > goto out; > } > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index 20dfd89..d046326 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -910,14 +910,18 @@ void tcp_release_cb(struct sock *sk) > if (flags & (1UL << TCP_TSQ_DEFERRED)) > tcp_tsq_handler(sk); > > - if (flags & (1UL << TCP_WRITE_TIMER_DEFERRED)) > + if (flags & (1UL << TCP_WRITE_TIMER_DEFERRED)) { > tcp_write_timer_handler(sk); > - > - if (flags & (1UL << TCP_DELACK_TIMER_DEFERRED)) > + __sock_put(sk); > + } > + if (flags & (1UL << TCP_DELACK_TIMER_DEFERRED)) { > tcp_delack_timer_handler(sk); > - > - if (flags & (1UL << TCP_MTU_REDUCED_DEFERRED)) > + __sock_put(sk); > + } > + if (flags & (1UL << TCP_MTU_REDUCED_DEFERRED)) { > sk->sk_prot->mtu_reduced(sk); > + __sock_put(sk); > + } > } > EXPORT_SYMBOL(tcp_release_cb); > > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c > index 6df36ad..b774a03 100644 > --- a/net/ipv4/tcp_timer.c > +++ b/net/ipv4/tcp_timer.c > @@ -252,7 +252,8 @@ static void tcp_delack_timer(unsigned long data) > inet_csk(sk)->icsk_ack.blocked = 1; > NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOCKED); > /* deleguate our work to tcp_release_cb() */ > - set_bit(TCP_WRITE_TIMER_DEFERRED, &tcp_sk(sk)->tsq_flags); > + if (!test_and_set_bit(TCP_DELACK_TIMER_DEFERRED, &tcp_sk(sk)->tsq_flags)) > + sock_hold(sk); > } > bh_unlock_sock(sk); > sock_put(sk); > @@ -481,7 +482,8 @@ static void tcp_write_timer(unsigned long data) > tcp_write_timer_handler(sk); > } else { > /* deleguate our work to tcp_release_cb() */ > - set_bit(TCP_WRITE_TIMER_DEFERRED, &tcp_sk(sk)->tsq_flags); > + if (!test_and_set_bit(TCP_WRITE_TIMER_DEFERRED, &tcp_sk(sk)->tsq_flags)) > + sock_hold(sk); > } > bh_unlock_sock(sk); > sock_put(sk); >