* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
2020-04-23 7:35 ` [PATCH net-next v2] " Cambda Zhu
@ 2020-04-23 13:40 ` Eric Dumazet
2020-04-23 14:46 ` Cambda Zhu
2020-04-23 13:43 ` Willem de Bruijn
2020-04-24 8:06 ` [PATCH net-next v3] net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX Cambda Zhu
2 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2020-04-23 13:40 UTC (permalink / raw)
To: Cambda Zhu, netdev, Eric Dumazet; +Cc: Dust Li, Tony Lu
On 4/23/20 12:35 AM, Cambda Zhu wrote:
> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
> option has same behavior as TCP_LINGER2, except the tp->linger2 value
> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
> with CAP_NET_ADMIN.
>
> As a server, different sockets may need different FIN-WAIT timeout and
> in most cases the system default value will be used. The timeout can
> be adjusted by setting TCP_LINGER2 but cannot be greater than the
> system default value. If one socket needs a timeout greater than the
> default, we have to adjust the sysctl which affects all sockets using
> the system default value. And if we want to adjust it for just one
> socket and keep the original value for others, all the other sockets
> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
> set greater tp->linger2 than the default for one socket and keep
> the sysctl_tcp_fin_timeout unchanged.
>
> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
> ---
> Changes in v2:
> - Add int overflow check.
>
> include/uapi/linux/capability.h | 1 +
> include/uapi/linux/tcp.h | 1 +
> net/ipv4/tcp.c | 11 +++++++++++
> 3 files changed, 13 insertions(+)
>
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index 272dc69fa080..0e30c9756a04 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
> /* Allow multicasting */
> /* Allow read/write of device-specific registers */
> /* Allow activation of ATM control sockets */
> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>
> #define CAP_NET_ADMIN 12
>
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index f2acb2566333..e21e0ce98ca1 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -128,6 +128,7 @@ enum {
> #define TCP_CM_INQ TCP_INQ
>
> #define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */
> +#define TCP_FORCE_LINGER2 38 /* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>
>
> #define TCP_REPAIR_ON 1
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6d87de434377..d8cd1fd66bc1 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
> tcp_enable_tx_delay();
> tp->tcp_tx_delay = val;
> break;
> + case TCP_FORCE_LINGER2:
> + if (val < 0)
> + tp->linger2 = -1;
> + else if (val > INT_MAX / HZ)
> + err = -EINVAL;
> + else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
> + !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> + tp->linger2 = 0;
> + else
> + tp->linger2 = val * HZ;
> + break;
> default:
> err = -ENOPROTOOPT;
> break;
>
INT_MAX looks quite
Anyway, I do not think we need a new socket option, since really it will need documentation and add more confusion.
net->ipv4.sysctl_tcp_fin_timeout is the default value for sockets which have tp->linger2 cleared.
Fact that it has been used to cap TCP_LINGER2 was probably a mistake.
What about adding a new define and simply let TCP_LINGER2 use it ?
Really there is no point trying to allow hours or even days for FIN timeout,
and no point limiting a socket from having a value between net->ipv4.sysctl_tcp_fin_timeout and 2 minutes,
at least from security perspective, these values seem legal as far as TCP specs are concerned.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index dcf9a72eeaa6912202e8a1ca6cf800f7401bf517..8dceea9ae87712613243a48edddd83d9ac629295 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -126,6 +126,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
* to combine FIN-WAIT-2 timeout with
* TIME-WAIT timer.
*/
+#define TCP_FIN_TIMEOUT_MAX (120 * HZ) /* max TCP_LINGER2 value (two minutes) */
#define TCP_DELACK_MAX ((unsigned)(HZ/5)) /* maximal time to delay before sending an ACK */
#if HZ >= 100
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6d87de434377e3741314772e5fd866de1c599108..a723fec8704ba4dff235818622e52d67ec9ef489 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3035,7 +3035,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
case TCP_LINGER2:
if (val < 0)
tp->linger2 = -1;
- else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ)
+ else if (val > TCP_FIN_TIMEOUT_MAX / HZ)
tp->linger2 = 0;
else
tp->linger2 = val * HZ;
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
2020-04-23 13:40 ` Eric Dumazet
@ 2020-04-23 14:46 ` Cambda Zhu
2020-04-23 16:59 ` Eric Dumazet
0 siblings, 1 reply; 15+ messages in thread
From: Cambda Zhu @ 2020-04-23 14:46 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Dust Li, Tony Lu
> On Apr 23, 2020, at 21:40, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 4/23/20 12:35 AM, Cambda Zhu wrote:
>> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
>> option has same behavior as TCP_LINGER2, except the tp->linger2 value
>> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
>> with CAP_NET_ADMIN.
>>
>> As a server, different sockets may need different FIN-WAIT timeout and
>> in most cases the system default value will be used. The timeout can
>> be adjusted by setting TCP_LINGER2 but cannot be greater than the
>> system default value. If one socket needs a timeout greater than the
>> default, we have to adjust the sysctl which affects all sockets using
>> the system default value. And if we want to adjust it for just one
>> socket and keep the original value for others, all the other sockets
>> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
>> set greater tp->linger2 than the default for one socket and keep
>> the sysctl_tcp_fin_timeout unchanged.
>>
>> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
>> ---
>> Changes in v2:
>> - Add int overflow check.
>>
>> include/uapi/linux/capability.h | 1 +
>> include/uapi/linux/tcp.h | 1 +
>> net/ipv4/tcp.c | 11 +++++++++++
>> 3 files changed, 13 insertions(+)
>>
>> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
>> index 272dc69fa080..0e30c9756a04 100644
>> --- a/include/uapi/linux/capability.h
>> +++ b/include/uapi/linux/capability.h
>> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
>> /* Allow multicasting */
>> /* Allow read/write of device-specific registers */
>> /* Allow activation of ATM control sockets */
>> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>
>> #define CAP_NET_ADMIN 12
>>
>> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
>> index f2acb2566333..e21e0ce98ca1 100644
>> --- a/include/uapi/linux/tcp.h
>> +++ b/include/uapi/linux/tcp.h
>> @@ -128,6 +128,7 @@ enum {
>> #define TCP_CM_INQ TCP_INQ
>>
>> #define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */
>> +#define TCP_FORCE_LINGER2 38 /* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>
>>
>> #define TCP_REPAIR_ON 1
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 6d87de434377..d8cd1fd66bc1 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>> tcp_enable_tx_delay();
>> tp->tcp_tx_delay = val;
>> break;
>> + case TCP_FORCE_LINGER2:
>> + if (val < 0)
>> + tp->linger2 = -1;
>> + else if (val > INT_MAX / HZ)
>> + err = -EINVAL;
>> + else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
>> + !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>> + tp->linger2 = 0;
>> + else
>> + tp->linger2 = val * HZ;
>> + break;
>> default:
>> err = -ENOPROTOOPT;
>> break;
>>
>
> INT_MAX looks quite
>
> Anyway, I do not think we need a new socket option, since really it will need documentation and add more confusion.
>
> net->ipv4.sysctl_tcp_fin_timeout is the default value for sockets which have tp->linger2 cleared.
>
> Fact that it has been used to cap TCP_LINGER2 was probably a mistake.
>
> What about adding a new define and simply let TCP_LINGER2 use it ?
>
> Really there is no point trying to allow hours or even days for FIN timeout,
> and no point limiting a socket from having a value between net->ipv4.sysctl_tcp_fin_timeout and 2 minutes,
> at least from security perspective, these values seem legal as far as TCP specs are concerned.
>
>
I also think using sysctl_tcp_fin_timeout to cap TCP_LINGER2 is probably a mistake,
and adding a new define for TCP_LINGER2 is a good idea. I have considered the solution
and found it may have some compatibility issues. Whatever it’s a mistake or not, a
system administrator may have used it to limit the max timeout for all sockets. And
when I think about the behavior of TCP_LINGER2, I'm afraid the server may also rely on
the behavior so the server can set any value greater than sysctl_tcp_fin_timeout and
result in the same timeout.
Maybe my worry is unnecessary. If so, I think your suggestion is better.
Regards,
Cambda
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index dcf9a72eeaa6912202e8a1ca6cf800f7401bf517..8dceea9ae87712613243a48edddd83d9ac629295 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -126,6 +126,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
> * to combine FIN-WAIT-2 timeout with
> * TIME-WAIT timer.
> */
> +#define TCP_FIN_TIMEOUT_MAX (120 * HZ) /* max TCP_LINGER2 value (two minutes) */
>
> #define TCP_DELACK_MAX ((unsigned)(HZ/5)) /* maximal time to delay before sending an ACK */
> #if HZ >= 100
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6d87de434377e3741314772e5fd866de1c599108..a723fec8704ba4dff235818622e52d67ec9ef489 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3035,7 +3035,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
> case TCP_LINGER2:
> if (val < 0)
> tp->linger2 = -1;
> - else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ)
> + else if (val > TCP_FIN_TIMEOUT_MAX / HZ)
> tp->linger2 = 0;
> else
> tp->linger2 = val * HZ;
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
2020-04-23 14:46 ` Cambda Zhu
@ 2020-04-23 16:59 ` Eric Dumazet
2020-04-24 2:56 ` Cambda Zhu
0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2020-04-23 16:59 UTC (permalink / raw)
To: Cambda Zhu, Eric Dumazet; +Cc: netdev, Dust Li, Tony Lu
On 4/23/20 7:46 AM, Cambda Zhu wrote:
>
>
>> On Apr 23, 2020, at 21:40, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 4/23/20 12:35 AM, Cambda Zhu wrote:
>>> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
>>> option has same behavior as TCP_LINGER2, except the tp->linger2 value
>>> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
>>> with CAP_NET_ADMIN.
>>>
>>> As a server, different sockets may need different FIN-WAIT timeout and
>>> in most cases the system default value will be used. The timeout can
>>> be adjusted by setting TCP_LINGER2 but cannot be greater than the
>>> system default value. If one socket needs a timeout greater than the
>>> default, we have to adjust the sysctl which affects all sockets using
>>> the system default value. And if we want to adjust it for just one
>>> socket and keep the original value for others, all the other sockets
>>> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
>>> set greater tp->linger2 than the default for one socket and keep
>>> the sysctl_tcp_fin_timeout unchanged.
>>>
>>> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
>>> ---
>>> Changes in v2:
>>> - Add int overflow check.
>>>
>>> include/uapi/linux/capability.h | 1 +
>>> include/uapi/linux/tcp.h | 1 +
>>> net/ipv4/tcp.c | 11 +++++++++++
>>> 3 files changed, 13 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
>>> index 272dc69fa080..0e30c9756a04 100644
>>> --- a/include/uapi/linux/capability.h
>>> +++ b/include/uapi/linux/capability.h
>>> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
>>> /* Allow multicasting */
>>> /* Allow read/write of device-specific registers */
>>> /* Allow activation of ATM control sockets */
>>> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>>
>>> #define CAP_NET_ADMIN 12
>>>
>>> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
>>> index f2acb2566333..e21e0ce98ca1 100644
>>> --- a/include/uapi/linux/tcp.h
>>> +++ b/include/uapi/linux/tcp.h
>>> @@ -128,6 +128,7 @@ enum {
>>> #define TCP_CM_INQ TCP_INQ
>>>
>>> #define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */
>>> +#define TCP_FORCE_LINGER2 38 /* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>>
>>>
>>> #define TCP_REPAIR_ON 1
>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>> index 6d87de434377..d8cd1fd66bc1 100644
>>> --- a/net/ipv4/tcp.c
>>> +++ b/net/ipv4/tcp.c
>>> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>>> tcp_enable_tx_delay();
>>> tp->tcp_tx_delay = val;
>>> break;
>>> + case TCP_FORCE_LINGER2:
>>> + if (val < 0)
>>> + tp->linger2 = -1;
>>> + else if (val > INT_MAX / HZ)
>>> + err = -EINVAL;
>>> + else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
>>> + !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>>> + tp->linger2 = 0;
>>> + else
>>> + tp->linger2 = val * HZ;
>>> + break;
>>> default:
>>> err = -ENOPROTOOPT;
>>> break;
>>>
>>
>> INT_MAX looks quite
>>
>> Anyway, I do not think we need a new socket option, since really it will need documentation and add more confusion.
>>
>> net->ipv4.sysctl_tcp_fin_timeout is the default value for sockets which have tp->linger2 cleared.
>>
>> Fact that it has been used to cap TCP_LINGER2 was probably a mistake.
>>
>> What about adding a new define and simply let TCP_LINGER2 use it ?
>>
>> Really there is no point trying to allow hours or even days for FIN timeout,
>> and no point limiting a socket from having a value between net->ipv4.sysctl_tcp_fin_timeout and 2 minutes,
>> at least from security perspective, these values seem legal as far as TCP specs are concerned.
>>
>>
>
> I also think using sysctl_tcp_fin_timeout to cap TCP_LINGER2 is probably a mistake,
> and adding a new define for TCP_LINGER2 is a good idea. I have considered the solution
> and found it may have some compatibility issues. Whatever it’s a mistake or not, a
> system administrator may have used it to limit the max timeout for all sockets. And
> when I think about the behavior of TCP_LINGER2, I'm afraid the server may also rely on
> the behavior so the server can set any value greater than sysctl_tcp_fin_timeout and
> result in the same timeout.
>
> Maybe my worry is unnecessary. If so, I think your suggestion is better.
>
Ṕlease send a v3 with a proper changelog then, thanks !
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
2020-04-23 16:59 ` Eric Dumazet
@ 2020-04-24 2:56 ` Cambda Zhu
2020-04-24 4:23 ` Eric Dumazet
0 siblings, 1 reply; 15+ messages in thread
From: Cambda Zhu @ 2020-04-24 2:56 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, Dust Li, Tony Lu
> On Apr 24, 2020, at 00:59, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 4/23/20 7:46 AM, Cambda Zhu wrote:
>>
>>
>>> On Apr 23, 2020, at 21:40, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>>
>>>
>>> On 4/23/20 12:35 AM, Cambda Zhu wrote:
>>>> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
>>>> option has same behavior as TCP_LINGER2, except the tp->linger2 value
>>>> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
>>>> with CAP_NET_ADMIN.
>>>>
>>>> As a server, different sockets may need different FIN-WAIT timeout and
>>>> in most cases the system default value will be used. The timeout can
>>>> be adjusted by setting TCP_LINGER2 but cannot be greater than the
>>>> system default value. If one socket needs a timeout greater than the
>>>> default, we have to adjust the sysctl which affects all sockets using
>>>> the system default value. And if we want to adjust it for just one
>>>> socket and keep the original value for others, all the other sockets
>>>> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
>>>> set greater tp->linger2 than the default for one socket and keep
>>>> the sysctl_tcp_fin_timeout unchanged.
>>>>
>>>> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
>>>> ---
>>>> Changes in v2:
>>>> - Add int overflow check.
>>>>
>>>> include/uapi/linux/capability.h | 1 +
>>>> include/uapi/linux/tcp.h | 1 +
>>>> net/ipv4/tcp.c | 11 +++++++++++
>>>> 3 files changed, 13 insertions(+)
>>>>
>>>> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
>>>> index 272dc69fa080..0e30c9756a04 100644
>>>> --- a/include/uapi/linux/capability.h
>>>> +++ b/include/uapi/linux/capability.h
>>>> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
>>>> /* Allow multicasting */
>>>> /* Allow read/write of device-specific registers */
>>>> /* Allow activation of ATM control sockets */
>>>> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>>>
>>>> #define CAP_NET_ADMIN 12
>>>>
>>>> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
>>>> index f2acb2566333..e21e0ce98ca1 100644
>>>> --- a/include/uapi/linux/tcp.h
>>>> +++ b/include/uapi/linux/tcp.h
>>>> @@ -128,6 +128,7 @@ enum {
>>>> #define TCP_CM_INQ TCP_INQ
>>>>
>>>> #define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */
>>>> +#define TCP_FORCE_LINGER2 38 /* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>>>
>>>>
>>>> #define TCP_REPAIR_ON 1
>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>> index 6d87de434377..d8cd1fd66bc1 100644
>>>> --- a/net/ipv4/tcp.c
>>>> +++ b/net/ipv4/tcp.c
>>>> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>>>> tcp_enable_tx_delay();
>>>> tp->tcp_tx_delay = val;
>>>> break;
>>>> + case TCP_FORCE_LINGER2:
>>>> + if (val < 0)
>>>> + tp->linger2 = -1;
>>>> + else if (val > INT_MAX / HZ)
>>>> + err = -EINVAL;
>>>> + else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
>>>> + !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>>>> + tp->linger2 = 0;
>>>> + else
>>>> + tp->linger2 = val * HZ;
>>>> + break;
>>>> default:
>>>> err = -ENOPROTOOPT;
>>>> break;
>>>>
>>>
>>> INT_MAX looks quite
>>>
>>> Anyway, I do not think we need a new socket option, since really it will need documentation and add more confusion.
>>>
>>> net->ipv4.sysctl_tcp_fin_timeout is the default value for sockets which have tp->linger2 cleared.
>>>
>>> Fact that it has been used to cap TCP_LINGER2 was probably a mistake.
>>>
>>> What about adding a new define and simply let TCP_LINGER2 use it ?
>>>
>>> Really there is no point trying to allow hours or even days for FIN timeout,
>>> and no point limiting a socket from having a value between net->ipv4.sysctl_tcp_fin_timeout and 2 minutes,
>>> at least from security perspective, these values seem legal as far as TCP specs are concerned.
>>>
>>>
>>
>> I also think using sysctl_tcp_fin_timeout to cap TCP_LINGER2 is probably a mistake,
>> and adding a new define for TCP_LINGER2 is a good idea. I have considered the solution
>> and found it may have some compatibility issues. Whatever it’s a mistake or not, a
>> system administrator may have used it to limit the max timeout for all sockets. And
>> when I think about the behavior of TCP_LINGER2, I'm afraid the server may also rely on
>> the behavior so the server can set any value greater than sysctl_tcp_fin_timeout and
>> result in the same timeout.
>>
>> Maybe my worry is unnecessary. If so, I think your suggestion is better.
>>
>
> Ṕlease send a v3 with a proper changelog then, thanks !
If the value between sysctl_tcp_fin_timeout and 2 minutes is legal, should we set tp->linger2
to TCP_FIN_TIMEOUT_MAX or return an error instead of set it to zero which means the default value?
Regards,
Cambda
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
2020-04-24 2:56 ` Cambda Zhu
@ 2020-04-24 4:23 ` Eric Dumazet
0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2020-04-24 4:23 UTC (permalink / raw)
To: Cambda Zhu, Eric Dumazet; +Cc: netdev, Dust Li, Tony Lu
On 4/23/20 7:56 PM, Cambda Zhu wrote:
>
> If the value between sysctl_tcp_fin_timeout and 2 minutes is legal, should we set tp->linger2
> to TCP_FIN_TIMEOUT_MAX or return an error instead of set it to zero which means the default value?
I would not send an error.
This might break some applications that could consider an error as a serious one.
(You can reuse my patch basically)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
2020-04-23 7:35 ` [PATCH net-next v2] " Cambda Zhu
2020-04-23 13:40 ` Eric Dumazet
@ 2020-04-23 13:43 ` Willem de Bruijn
2020-04-23 15:02 ` Cambda Zhu
2020-04-24 8:06 ` [PATCH net-next v3] net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX Cambda Zhu
2 siblings, 1 reply; 15+ messages in thread
From: Willem de Bruijn @ 2020-04-23 13:43 UTC (permalink / raw)
To: Cambda Zhu; +Cc: netdev, Eric Dumazet, Dust Li, Tony Lu
On Thu, Apr 23, 2020 at 3:36 AM Cambda Zhu <cambda@linux.alibaba.com> wrote:
>
> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
> option has same behavior as TCP_LINGER2, except the tp->linger2 value
> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
> with CAP_NET_ADMIN.
>
> As a server, different sockets may need different FIN-WAIT timeout and
> in most cases the system default value will be used. The timeout can
> be adjusted by setting TCP_LINGER2 but cannot be greater than the
> system default value. If one socket needs a timeout greater than the
> default, we have to adjust the sysctl which affects all sockets using
> the system default value. And if we want to adjust it for just one
> socket and keep the original value for others, all the other sockets
> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
> set greater tp->linger2 than the default for one socket and keep
> the sysctl_tcp_fin_timeout unchanged.
>
> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
> ---
> Changes in v2:
> - Add int overflow check.
>
> include/uapi/linux/capability.h | 1 +
> include/uapi/linux/tcp.h | 1 +
> net/ipv4/tcp.c | 11 +++++++++++
> 3 files changed, 13 insertions(+)
>
> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
> index 272dc69fa080..0e30c9756a04 100644
> --- a/include/uapi/linux/capability.h
> +++ b/include/uapi/linux/capability.h
> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
> /* Allow multicasting */
> /* Allow read/write of device-specific registers */
> /* Allow activation of ATM control sockets */
> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>
> #define CAP_NET_ADMIN 12
>
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index f2acb2566333..e21e0ce98ca1 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -128,6 +128,7 @@ enum {
> #define TCP_CM_INQ TCP_INQ
>
> #define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */
> +#define TCP_FORCE_LINGER2 38 /* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>
>
> #define TCP_REPAIR_ON 1
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 6d87de434377..d8cd1fd66bc1 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
> tcp_enable_tx_delay();
> tp->tcp_tx_delay = val;
> break;
> + case TCP_FORCE_LINGER2:
> + if (val < 0)
> + tp->linger2 = -1;
> + else if (val > INT_MAX / HZ)
> + err = -EINVAL;
> + else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
> + !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> + tp->linger2 = 0;
Instead of silently falling back to LINGER2 behavior for unprivileged
users, I would fail without privileges, similar to
SO_(SND|RCV)BUFFORCE.
Also, those have capable instead of ns_capable. If there is risk to
system integrity, that is the right choice.
Slight aside, if the original setsockopt had checked optval ==
sizeof(int), we could have added a variant of different size (say,
with an additional flags field), instead of having to create a new
socket option.
> + else
> + tp->linger2 = val * HZ;
> + break;
> default:
> err = -ENOPROTOOPT;
> break;
> --
> 2.16.6
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net-next v2] net: Add TCP_FORCE_LINGER2 to TCP setsockopt
2020-04-23 13:43 ` Willem de Bruijn
@ 2020-04-23 15:02 ` Cambda Zhu
0 siblings, 0 replies; 15+ messages in thread
From: Cambda Zhu @ 2020-04-23 15:02 UTC (permalink / raw)
To: Willem de Bruijn; +Cc: netdev, Eric Dumazet, Dust Li, Tony Lu
> On Apr 23, 2020, at 21:43, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Apr 23, 2020 at 3:36 AM Cambda Zhu <cambda@linux.alibaba.com> wrote:
>>
>> This patch adds a new TCP socket option named TCP_FORCE_LINGER2. The
>> option has same behavior as TCP_LINGER2, except the tp->linger2 value
>> can be greater than sysctl_tcp_fin_timeout if the user_ns is capable
>> with CAP_NET_ADMIN.
>>
>> As a server, different sockets may need different FIN-WAIT timeout and
>> in most cases the system default value will be used. The timeout can
>> be adjusted by setting TCP_LINGER2 but cannot be greater than the
>> system default value. If one socket needs a timeout greater than the
>> default, we have to adjust the sysctl which affects all sockets using
>> the system default value. And if we want to adjust it for just one
>> socket and keep the original value for others, all the other sockets
>> have to set TCP_LINGER2. But with TCP_FORCE_LINGER2, the net admin can
>> set greater tp->linger2 than the default for one socket and keep
>> the sysctl_tcp_fin_timeout unchanged.
>>
>> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
>> ---
>> Changes in v2:
>> - Add int overflow check.
>>
>> include/uapi/linux/capability.h | 1 +
>> include/uapi/linux/tcp.h | 1 +
>> net/ipv4/tcp.c | 11 +++++++++++
>> 3 files changed, 13 insertions(+)
>>
>> diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
>> index 272dc69fa080..0e30c9756a04 100644
>> --- a/include/uapi/linux/capability.h
>> +++ b/include/uapi/linux/capability.h
>> @@ -199,6 +199,7 @@ struct vfs_ns_cap_data {
>> /* Allow multicasting */
>> /* Allow read/write of device-specific registers */
>> /* Allow activation of ATM control sockets */
>> +/* Allow setting TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>
>> #define CAP_NET_ADMIN 12
>>
>> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
>> index f2acb2566333..e21e0ce98ca1 100644
>> --- a/include/uapi/linux/tcp.h
>> +++ b/include/uapi/linux/tcp.h
>> @@ -128,6 +128,7 @@ enum {
>> #define TCP_CM_INQ TCP_INQ
>>
>> #define TCP_TX_DELAY 37 /* delay outgoing packets by XX usec */
>> +#define TCP_FORCE_LINGER2 38 /* Set TCP_LINGER2 regardless of sysctl_tcp_fin_timeout */
>>
>>
>> #define TCP_REPAIR_ON 1
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 6d87de434377..d8cd1fd66bc1 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -3149,6 +3149,17 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
>> tcp_enable_tx_delay();
>> tp->tcp_tx_delay = val;
>> break;
>> + case TCP_FORCE_LINGER2:
>> + if (val < 0)
>> + tp->linger2 = -1;
>> + else if (val > INT_MAX / HZ)
>> + err = -EINVAL;
>> + else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ &&
>> + !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>> + tp->linger2 = 0;
>
> Instead of silently falling back to LINGER2 behavior for unprivileged
> users, I would fail without privileges, similar to
> SO_(SND|RCV)BUFFORCE.
>
Yes, I have considered failing without privileges too. It’s clearer
but the user may have to set both of TCP_LINGER2 and TCP_FORCE_LINGER2
to set the timeout as TCP_FORCE_LINGER2 not introduced.
> Also, those have capable instead of ns_capable. If there is risk to
> system integrity, that is the right choice.
>
I think both are ok, but the sysctl_tcp_fin_timeout is in net ns.
> Slight aside, if the original setsockopt had checked optval ==
> sizeof(int), we could have added a variant of different size (say,
> with an additional flags field), instead of having to create a new
> socket option.
>
Maybe it’s a little weird… I don’t know. :)
Regards,
Cambda
>> + else
>> + tp->linger2 = val * HZ;
>> + break;
>> default:
>> err = -ENOPROTOOPT;
>> break;
>> --
>> 2.16.6
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH net-next v3] net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX
2020-04-23 7:35 ` [PATCH net-next v2] " Cambda Zhu
2020-04-23 13:40 ` Eric Dumazet
2020-04-23 13:43 ` Willem de Bruijn
@ 2020-04-24 8:06 ` Cambda Zhu
2020-05-01 22:12 ` David Miller
2 siblings, 1 reply; 15+ messages in thread
From: Cambda Zhu @ 2020-04-24 8:06 UTC (permalink / raw)
To: netdev, Eric Dumazet; +Cc: Dust Li, Tony Lu, Cambda Zhu
This patch changes the behavior of TCP_LINGER2 about its limit. The
sysctl_tcp_fin_timeout used to be the limit of TCP_LINGER2 but now it's
only the default value. A new macro named TCP_FIN_TIMEOUT_MAX is added
as the limit of TCP_LINGER2, which is 2 minutes.
Since TCP_LINGER2 used sysctl_tcp_fin_timeout as the default value
and the limit in the past, the system administrator cannot set the
default value for most of sockets and let some sockets have a greater
timeout. It might be a mistake that let the sysctl to be the limit of
the TCP_LINGER2. Maybe we can add a new sysctl to set the max of
TCP_LINGER2, but FIN-WAIT-2 timeout is usually no need to be too long
and 2 minutes are legal considering TCP specs.
Changes in v3:
- Remove the new socket option and change the TCP_LINGER2 behavior so
that the timeout can be set to value between sysctl_tcp_fin_timeout
and 2 minutes.
Changes in v2:
- Add int overflow check for the new socket option.
Changes in v1:
- Add a new socket option to set timeout greater than
sysctl_tcp_fin_timeout.
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
---
include/net/tcp.h | 1 +
net/ipv4/tcp.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5fa9eacd965a..cde8a1f75caa 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -126,6 +126,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
* to combine FIN-WAIT-2 timeout with
* TIME-WAIT timer.
*/
+#define TCP_FIN_TIMEOUT_MAX (120 * HZ) /* max TCP_LINGER2 value (two minutes) */
#define TCP_DELACK_MAX ((unsigned)(HZ/5)) /* maximal time to delay before sending an ACK */
#if HZ >= 100
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6d87de434377..8c1250103959 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3035,8 +3035,8 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
case TCP_LINGER2:
if (val < 0)
tp->linger2 = -1;
- else if (val > net->ipv4.sysctl_tcp_fin_timeout / HZ)
- tp->linger2 = 0;
+ else if (val > TCP_FIN_TIMEOUT_MAX / HZ)
+ tp->linger2 = TCP_FIN_TIMEOUT_MAX;
else
tp->linger2 = val * HZ;
break;
--
2.16.6
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v3] net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX
2020-04-24 8:06 ` [PATCH net-next v3] net: Replace the limit of TCP_LINGER2 with TCP_FIN_TIMEOUT_MAX Cambda Zhu
@ 2020-05-01 22:12 ` David Miller
0 siblings, 0 replies; 15+ messages in thread
From: David Miller @ 2020-05-01 22:12 UTC (permalink / raw)
To: cambda; +Cc: netdev, eric.dumazet, dust.li, tonylu
From: Cambda Zhu <cambda@linux.alibaba.com>
Date: Fri, 24 Apr 2020 16:06:16 +0800
> This patch changes the behavior of TCP_LINGER2 about its limit. The
> sysctl_tcp_fin_timeout used to be the limit of TCP_LINGER2 but now it's
> only the default value. A new macro named TCP_FIN_TIMEOUT_MAX is added
> as the limit of TCP_LINGER2, which is 2 minutes.
>
> Since TCP_LINGER2 used sysctl_tcp_fin_timeout as the default value
> and the limit in the past, the system administrator cannot set the
> default value for most of sockets and let some sockets have a greater
> timeout. It might be a mistake that let the sysctl to be the limit of
> the TCP_LINGER2. Maybe we can add a new sysctl to set the max of
> TCP_LINGER2, but FIN-WAIT-2 timeout is usually no need to be too long
> and 2 minutes are legal considering TCP specs.
>
> Changes in v3:
> - Remove the new socket option and change the TCP_LINGER2 behavior so
> that the timeout can be set to value between sysctl_tcp_fin_timeout
> and 2 minutes.
>
> Changes in v2:
> - Add int overflow check for the new socket option.
>
> Changes in v1:
> - Add a new socket option to set timeout greater than
> sysctl_tcp_fin_timeout.
>
> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Applied, thank you.
^ permalink raw reply [flat|nested] 15+ messages in thread