* [PATCH net] dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
@ 2018-01-25 17:43 Alexey Kodanev
From: Alexey Kodanev @ 2018-01-25 17:43 UTC
To: netdev; +Cc: Eric Dumazet, David Miller, dccp, Alexey Kodanev
ccid2_hc_tx_rto_expire() timer callback always restarts the timer
again and can run indefinitely (unless it is stopped outside), and
after commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at
dismantle time"), which moved sk_stop_timer() to sk_destruct(),
this started to happen quite often. The timer prevents releasing
the socket; as a result, sk_destruct() won't be called.

Found with LTP/dccp_ipsec tests running on the bonding device,
which later couldn't be unloaded after the tests were completed:

  unregister_netdevice: waiting for bond0 to become free. Usage count = 148

Fixes: 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
net/dccp/ccids/ccid2.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/net/dccp/ccids/ccid2.c b/net/dccp/ccids/ccid2.c
index 1c75cd1..92d016e 100644
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -140,6 +140,9 @@ static void ccid2_hc_tx_rto_expire(struct timer_list *t)
 
 	ccid2_pr_debug("RTO_EXPIRE\n");
 
+	if (sk->sk_state == DCCP_CLOSED)
+		goto out;
+
 	/* back-off timer */
 	hc->tx_rto <<= 1;
 	if (hc->tx_rto > DCCP_RTO_MAX)
--
1.7.1
* Re: [PATCH net] dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
@ 2018-01-25 18:03 ` Eric Dumazet
From: Eric Dumazet @ 2018-01-25 18:03 UTC
To: Alexey Kodanev, netdev; +Cc: Eric Dumazet, David Miller, dccp
On Thu, 2018-01-25 at 20:43 +0300, Alexey Kodanev wrote:
> ccid2_hc_tx_rto_expire() timer callback always restarts the timer
> again and can run indefinitely (unless it is stopped outside), and
> after commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at
> dismantle time"), which moved sk_stop_timer() to sk_destruct(),
> this started to happen quite often. The timer prevents releasing
> the socket, as a result, sk_destruct() won't be called.
>
> Found with LTP/dccp_ipsec tests running on the bonding device,
> which later couldn't be unloaded after the tests were completed:
>
> unregister_netdevice: waiting for bond0 to become free. Usage count = 148
>
> Fixes: 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time")
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
I understand your fix, but not why commit 120e9dabaf55 is the origin of the bug.

Looks like this has always been buggy: the timer logic should have checked
the socket state from day 0.

I did not move sk_stop_timer() to sk_destruct()
* Re: [PATCH net] dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
@ 2018-01-26 12:02 ` Alexey Kodanev
From: Alexey Kodanev @ 2018-01-26 12:02 UTC
To: Eric Dumazet; +Cc: netdev, David Miller, dccp
On 01/25/2018 09:03 PM, Eric Dumazet wrote:
> On Thu, 2018-01-25 at 20:43 +0300, Alexey Kodanev wrote:
>> ccid2_hc_tx_rto_expire() timer callback always restarts the timer
>> again and can run indefinitely (unless it is stopped outside), and
>> after commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at
>> dismantle time"), which moved sk_stop_timer() to sk_destruct(),
>> this started to happen quite often. The timer prevents releasing
>> the socket, as a result, sk_destruct() won't be called.
>>
>> Found with LTP/dccp_ipsec tests running on the bonding device,
>> which later couldn't be unloaded after the tests were completed:
>>
>> unregister_netdevice: waiting for bond0 to become free. Usage count = 148
>>
>> Fixes: 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time")
>> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
>> ---
>
> I understand your fix, but not why commit 120e9dabaf55 is bug origin.
>
> Looks like this always had been buggy : Timer logic should have checked
> socket state from day 0.

Hi Eric,

Agreed, I'll change the Fixes tag to the initial commit id. I added commit
120e9dabaf55 because it moved ccid_hc_tx_delete() (and with it sk_stop_timer())
from dccp_destroy_sock() to sk_destruct(), and only after this change did the
chances that the timer won't stop increase significantly.

>
> I did not move sk_stop_timer() to sk_destruct()
>
ccid_hc_tx_delete() includes sk_stop_timer():

  ccid_hc_tx_delete()
    ccid2_hc_tx_exit(struct sock *sk)
      sk_stop_timer(sk, &hc->tx_rtotimer);

    ccid3_hc_tx_exit(struct sock *sk)
      sk_stop_timer(sk, &hc->tx_no_feedback_timer);

Thanks,
Alexey