netdev.vger.kernel.org archive mirror
* [PATCH net] dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
@ 2018-01-25 17:43 Alexey Kodanev
  2018-01-25 18:03 ` Eric Dumazet
From: Alexey Kodanev @ 2018-01-25 17:43 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, David Miller, dccp, Alexey Kodanev

The ccid2_hc_tx_rto_expire() timer callback always restarts the
timer and can run indefinitely (unless it is stopped from outside).
After commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at
dismantle time"), which moved sk_stop_timer() to sk_destruct(),
this started to happen quite often. The timer prevents the socket
from being released; as a result, sk_destruct() is never called.

Found with LTP/dccp_ipsec tests running on the bonding device,
which later couldn't be unloaded after the tests were completed:

  unregister_netdevice: waiting for bond0 to become free. Usage count = 148

Fixes: 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
 net/dccp/ccids/ccid2.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/dccp/ccids/ccid2.c b/net/dccp/ccids/ccid2.c
index 1c75cd1..92d016e 100644
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -140,6 +140,9 @@ static void ccid2_hc_tx_rto_expire(struct timer_list *t)
 
 	ccid2_pr_debug("RTO_EXPIRE\n");
 
+	if (sk->sk_state == DCCP_CLOSED)
+		goto out;
+
 	/* back-off timer */
 	hc->tx_rto <<= 1;
 	if (hc->tx_rto > DCCP_RTO_MAX)
-- 
1.7.1
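
Editorial sketch of the behaviour the patch addresses. This is a
user-space model, not kernel code: every model_* name is hypothetical,
and the reference counting is only assumed to mirror sk_reset_timer()
taking a socket reference when it arms a timer that was not pending
and the callback dropping one reference on exit. Under those
assumptions, a callback that always re-arms never lets the count reach
zero, while the DCCP_CLOSED check from the patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the socket state and RTO limit. */
enum model_state { MODEL_OPEN, MODEL_CLOSED };
#define MODEL_RTO_MAX 64u

struct model_sock {
    enum model_state state;
    int refcnt;          /* stands in for the socket reference count */
    bool timer_pending;  /* true while the RTO timer is armed */
    unsigned int rto;
};

static void model_sock_put(struct model_sock *sk)
{
    assert(sk->refcnt > 0);
    sk->refcnt--;
}

/* Assumed semantics: arming a non-pending timer takes a reference. */
static void model_reset_timer(struct model_sock *sk)
{
    if (!sk->timer_pending) {
        sk->timer_pending = true;
        sk->refcnt++;
    }
}

/* One timer expiry; check_closed selects the patched behaviour. */
static void model_rto_expire(struct model_sock *sk, bool check_closed)
{
    sk->timer_pending = false;           /* the timer has fired */

    if (check_closed && sk->state == MODEL_CLOSED)
        goto out;                        /* patched: do not re-arm */

    sk->rto <<= 1;                       /* back-off timer */
    if (sk->rto > MODEL_RTO_MAX)
        sk->rto = MODEL_RTO_MAX;
    model_reset_timer(sk);               /* re-arm, taking a new reference */
out:
    model_sock_put(sk);                  /* drop the fired timer's reference */
}

/* Close the socket, let the timer fire up to `fires` times, and
 * return the remaining reference count. */
static int model_run(bool check_closed, int fires)
{
    struct model_sock sk = { MODEL_CLOSED, 1, false, 1 };

    model_reset_timer(&sk);              /* timer armed while in use */
    model_sock_put(&sk);                 /* owner drops its reference */
    for (int i = 0; i < fires && sk.timer_pending; i++)
        model_rto_expire(&sk, check_closed);
    return sk.refcnt;
}
```

In this model, model_run(false, n) leaves one reference outstanding
for any n, and model_run(true, n) returns zero after the first expiry.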


* Re: [PATCH net] dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
  2018-01-25 17:43 [PATCH net] dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state Alexey Kodanev
@ 2018-01-25 18:03 ` Eric Dumazet
  2018-01-26 12:02   ` Alexey Kodanev
From: Eric Dumazet @ 2018-01-25 18:03 UTC (permalink / raw)
  To: Alexey Kodanev, netdev; +Cc: Eric Dumazet, David Miller, dccp

On Thu, 2018-01-25 at 20:43 +0300, Alexey Kodanev wrote:
> The ccid2_hc_tx_rto_expire() timer callback always restarts the
> timer and can run indefinitely (unless it is stopped from outside).
> After commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at
> dismantle time"), which moved sk_stop_timer() to sk_destruct(),
> this started to happen quite often. The timer prevents the socket
> from being released; as a result, sk_destruct() is never called.
> 
> Found with LTP/dccp_ipsec tests running on the bonding device,
> which later couldn't be unloaded after the tests were completed:
> 
>   unregister_netdevice: waiting for bond0 to become free. Usage count = 148
> 
> Fixes: 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---

I understand your fix, but not why commit 120e9dabaf55 is the bug origin.

Looks like this has always been buggy: the timer logic should have
checked the socket state from day 0.

I did not move sk_stop_timer() to sk_destruct().


* Re: [PATCH net] dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
  2018-01-25 18:03 ` Eric Dumazet
@ 2018-01-26 12:02   ` Alexey Kodanev
From: Alexey Kodanev @ 2018-01-26 12:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, David Miller, dccp

On 01/25/2018 09:03 PM, Eric Dumazet wrote:
> On Thu, 2018-01-25 at 20:43 +0300, Alexey Kodanev wrote:
>> The ccid2_hc_tx_rto_expire() timer callback always restarts the
>> timer and can run indefinitely (unless it is stopped from outside).
>> After commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at
>> dismantle time"), which moved sk_stop_timer() to sk_destruct(),
>> this started to happen quite often. The timer prevents the socket
>> from being released; as a result, sk_destruct() is never called.
>>
>> Found with LTP/dccp_ipsec tests running on the bonding device,
>> which later couldn't be unloaded after the tests were completed:
>>
>>   unregister_netdevice: waiting for bond0 to become free. Usage count = 148
>>
>> Fixes: 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time")
>> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
>> ---
> 
> I understand your fix, but not why commit 120e9dabaf55 is the bug origin.
> 
> Looks like this has always been buggy: the timer logic should have
> checked the socket state from day 0.

Hi Eric,

Agreed, I'll change the Fixes tag to the initial commit id. I referenced
commit 120e9dabaf55 because it moved ccid_hc_tx_delete() (and with it
sk_stop_timer()) from dccp_destroy_sock() to sk_destruct(), and only after
that change did the chance of the timer never stopping increase significantly.

> 
> I did not move sk_stop_timer() to sk_destruct()
> 

ccid_hc_tx_delete() includes sk_stop_timer():


ccid_hc_tx_delete()
    ccid2_hc_tx_exit(struct sock *sk)
        sk_stop_timer(sk, &hc->tx_rtotimer);

    ccid3_hc_tx_exit(struct sock *sk)
        sk_stop_timer(sk, &hc->tx_no_feedback_timer);


Thanks,
Alexey
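
Editorial sketch of the ordering issue described in the reply above.
Again a user-space model with hypothetical toy_* names. It assumes that
a pending timer pins one socket reference, that the destructor only
runs when the count reaches zero, and that the pending timer stands in
for one that always re-arms itself. With the timer stop done at close
time the count can reach zero; with the stop done only inside the
destructor, the destructor is never reached.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_sock {
    int refcnt;          /* owner reference plus one per pending timer */
    bool timer_pending;  /* models a timer that keeps re-arming */
    bool destructed;     /* set once the destructor has run */
};

static void toy_sock_put(struct toy_sock *sk);

/* Assumed semantics: stopping a pending timer drops its reference. */
static void toy_stop_timer(struct toy_sock *sk)
{
    if (sk->timer_pending) {
        sk->timer_pending = false;
        toy_sock_put(sk);
    }
}

/* Destructor: the only place the timer is stopped when the stop is
 * deferred to dismantle time. */
static void toy_destruct(struct toy_sock *sk)
{
    toy_stop_timer(sk);
    sk->destructed = true;
}

static void toy_sock_put(struct toy_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0)
        toy_destruct(sk);
}

/* Close a socket that has one owner reference and one armed timer.
 * Returns true if the destructor ran. */
static bool toy_close(bool stop_in_destroy_sock)
{
    struct toy_sock sk = { 2, true, false };

    if (stop_in_destroy_sock)
        toy_stop_timer(&sk);   /* stop at close time */
    toy_sock_put(&sk);         /* owner drops its reference */
    return sk.destructed;
}
```

Here toy_close(true) reaches the destructor, while toy_close(false)
leaves the timer's reference outstanding and never does.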

