netdev.vger.kernel.org archive mirror
* [PATCH net] sctp: fix race on protocol/netns initialization
@ 2015-09-09 20:03 Marcelo Ricardo Leitner
  2015-09-09 20:30 ` Vlad Yasevich
  2015-09-10  0:16 ` [PATCH net] " David Miller
  0 siblings, 2 replies; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-09-09 20:03 UTC
  To: netdev; +Cc: Vlad Yasevich, Neil Horman, linux-sctp

Consider the case where the sctp module is not loaded and is being
requested because a user is creating an SCTP socket.

During initialization, sctp will add the new protocol type and then
initialize pernet subsys:

        status = sctp_v4_protosw_init();
        if (status)
                goto err_protosw_init;

        status = sctp_v6_protosw_init();
        if (status)
                goto err_v6_protosw_init;

        status = register_pernet_subsys(&sctp_net_ops);

The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
is possible for userspace to create SCTP sockets as if the module were
already fully loaded. If that happens, one possible effect is that
net->sctp.local_addr_list gets readers earlier than expected, and
sctp_net_init() takes no precautions while dealing with that list, which
can lead to a panic. The damage is not limited to that, as
sctp_init_sock() will copy a bunch of blank/partially initialized values
from net->sctp.

The race happens like this:

     CPU 0                           |  CPU 1
  socket()                           |
   __sock_create                     | socket()
    inet_create                      |  __sock_create
     list_for_each_entry_rcu(        |
        answer, &inetsw[sock->type], |
        list) {                      |   inet_create
      /* no hits */                  |
     if (unlikely(err)) {            |
      ...                            |
      request_module()               |
      /* socket creation is blocked  |
       * until the module is fully   |
       * loaded                      |
       */                            |
       sctp_init                     |
        sctp_v4_protosw_init         |
         inet_register_protosw       |
          list_add_rcu(&p->list,     |
                       last_perm);   |
                                     |  list_for_each_entry_rcu(
                                     |     answer, &inetsw[sock->type],
        sctp_v6_protosw_init         |     list) {
                                     |     /* hit, so assumes protocol
                                     |      * is already loaded
                                     |      */
                                     |  /* socket creation continues
                                     |   * before netns is initialized
                                     |   */
        register_pernet_subsys       |
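
The interleaving above can be replayed in a small single-threaded
userspace model. This is only a sketch under stated assumptions: every
model_* name is hypothetical, the single rto_initial field stands in for
the whole of net->sctp, and 3000 is just an illustrative default. It
shows that a socket created between protosw registration and netns
initialization copies zeroed defaults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-netns SCTP defaults (net->sctp). */
struct model_netns {
	int rto_initial;	/* one default that each new socket copies */
};

/* Hypothetical stand-in for a freshly created SCTP socket. */
struct model_sock {
	int rto_initial;
};

/* Models the inetsw[] entry: true once the protocol is visible. */
static bool model_protosw_registered;

/* Models sctp_v4_protosw_init(): socket() lookups start to hit. */
static void model_protosw_init(void)
{
	model_protosw_registered = true;
}

/* Models the pernet init that fills in the defaults. */
static void model_net_init(struct model_netns *net)
{
	net->rto_initial = 3000;	/* illustrative value only */
}

/* Models socket(): it proceeds as soon as the protocol lookup hits. */
static int model_socket(const struct model_netns *net, struct model_sock *sk)
{
	assert(net != NULL && sk != NULL);

	if (!model_protosw_registered)
		return -1;	/* no hit: would go to request_module() */

	/* Copies whatever is in the netns, initialized or not. */
	sk->rto_initial = net->rto_initial;
	return 0;
}
```

Calling model_protosw_init(), then model_socket(), then model_net_init()
reproduces the CPU 1 path: the early socket ends up with rto_initial == 0,
while a socket created after model_net_init() gets the intended default.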

Inverting the initialization order between register_pernet_subsys() and
sctp_v4_protosw_init() is not possible because register_pernet_subsys()
will create a control sctp socket, so the protocol must already be
visible by then. Deferring the socket creation to a workqueue is not
good, especially because we lose the ability to handle its errors.

So the fix then is to invert the initialization order inside
register_pernet_subsys() so that the control socket is created last,
and also to block socket creation if netns initialization has not yet
been performed.

Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/protocol.c | 18 +++++++++++-------
 net/sctp/socket.c   |  4 ++++
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 4345790ad3266c353eeac5398593c2a9ce4effda..d8f78165768a75f93f4ce4120dd5475b6a623aaf 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1271,12 +1271,6 @@ static int __net_init sctp_net_init(struct net *net)
 
 	sctp_dbg_objcnt_init(net);
 
-	/* Initialize the control inode/socket for handling OOTB packets.  */
-	if ((status = sctp_ctl_sock_init(net))) {
-		pr_err("Failed to initialize the SCTP control sock\n");
-		goto err_ctl_sock_init;
-	}
-
 	/* Initialize the local address list. */
 	INIT_LIST_HEAD(&net->sctp.local_addr_list);
 	spin_lock_init(&net->sctp.local_addr_lock);
@@ -1284,11 +1278,21 @@ static int __net_init sctp_net_init(struct net *net)
 
 	/* Initialize the address event list */
 	INIT_LIST_HEAD(&net->sctp.addr_waitq);
-	INIT_LIST_HEAD(&net->sctp.auto_asconf_splist);
 	spin_lock_init(&net->sctp.addr_wq_lock);
 	net->sctp.addr_wq_timer.expires = 0;
 	setup_timer(&net->sctp.addr_wq_timer, sctp_addr_wq_timeout_handler,
 		    (unsigned long)net);
+	/* sctp_init_sock() will use this to know that netns is
+	 * nearly all initialized but already good to go.
+	 */
+	INIT_LIST_HEAD(&net->sctp.auto_asconf_splist);
+
+	/* Initialize the control inode/socket for handling OOTB packets.  */
+	status = sctp_ctl_sock_init(net);
+	if (status) {
+		pr_err("Failed to initialize the SCTP control sock\n");
+		goto err_ctl_sock_init;
+	}
 
 	return 0;
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 17bef01b9aa3e7f75328d39fc976f9e80d641e92..45b94deec93d0c7c1612a16922348cf2a7e65ec5 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3993,6 +3993,10 @@ static int sctp_init_sock(struct sock *sk)
 
 	pr_debug("%s: sk:%p\n", __func__, sk);
 
+	/* Validate if netns is already initialized. */
+	if (!net->sctp.auto_asconf_splist.prev)
+		return -ENOPROTOOPT;
+
 	sp = sctp_sk(sk);
 
 	/* Initialize the SCTP per socket area.  */
-- 
2.4.3


* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-09 20:03 [PATCH net] sctp: fix race on protocol/netns initialization Marcelo Ricardo Leitner
@ 2015-09-09 20:30 ` Vlad Yasevich
  2015-09-09 21:06   ` Marcelo Ricardo Leitner
  2015-09-10  0:16 ` [PATCH net] " David Miller
  1 sibling, 1 reply; 18+ messages in thread
From: Vlad Yasevich @ 2015-09-09 20:30 UTC
  To: Marcelo Ricardo Leitner, netdev; +Cc: Neil Horman, linux-sctp

On 09/09/2015 04:03 PM, Marcelo Ricardo Leitner wrote:
> Consider sctp module is unloaded and is being requested because an user
> is creating a sctp socket.
> 
> During initialization, sctp will add the new protocol type and then
> initialize pernet subsys:
> 
>         status = sctp_v4_protosw_init();
>         if (status)
>                 goto err_protosw_init;
> 
>         status = sctp_v6_protosw_init();
>         if (status)
>                 goto err_v6_protosw_init;
> 
>         status = register_pernet_subsys(&sctp_net_ops);
> 
> The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
> is possible for userspace to create SCTP sockets like if the module is
> already fully loaded. If that happens, one of the possible effects is
> that we will have readers for net->sctp.local_addr_list list earlier
> than expected and sctp_net_init() does not take precautions while
> dealing with that list, leading to a potential panic but not limited to
> that, as sctp_sock_init() will copy a bunch of blank/partially
> initialized values from net->sctp.
> 
> The race happens like this:
> 
>      CPU 0                           |  CPU 1
>   socket()                           |
>    __sock_create                     | socket()
>     inet_create                      |  __sock_create
>      list_for_each_entry_rcu(        |
>         answer, &inetsw[sock->type], |
>         list) {                      |   inet_create
>       /* no hits */                  |
>      if (unlikely(err)) {            |
>       ...                            |
>       request_module()               |
>       /* socket creation is blocked  |
>        * the module is fully loaded  |
>        */                            |
>        sctp_init                     |
>         sctp_v4_protosw_init         |
>          inet_register_protosw       |
>           list_add_rcu(&p->list,     |
>                        last_perm);   |
>                                      |  list_for_each_entry_rcu(
>                                      |     answer, &inetsw[sock->type],
>         sctp_v6_protosw_init         |     list) {
>                                      |     /* hit, so assumes protocol
>                                      |      * is already loaded
>                                      |      */
>                                      |  /* socket creation continues
>                                      |   * before netns is initialized
>                                      |   */
>         register_pernet_subsys       |
> 
> Inverting the initialization order between register_pernet_subsys() and
> sctp_v4_protosw_init() is not possible because register_pernet_subsys()
> will create a control sctp socket, so the protocol must be already
> visible by then. Deferring the socket creation to a work-queue is not
> good specially because we loose the ability to handle its errors.
> 
> So the fix then is to invert the initialization order inside
> register_pernet_subsys() so that the control socket is created by last
> and also block socket creation if netns initialization wasn't yet
> performed.
> 

Not sure how much I like that...  Wouldn't it be better
to pull the control socket initialization stuff out into its
own function that does something like

for_each_net_rcu()
	init_control_socket(net, ...)


Or maybe even pull the control socket creation
stuff completely into its own per-net operations structure
and initialize it after the protosw stuff has been done.
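
A rough userspace sketch of the first suggestion, with the error
handling that such a loop would need: create the control socket in each
namespace and, if one fails, tear down the ones already created. All
model_* names and the fail_ctl_sock knob are hypothetical; this is not
the kernel API, only an illustration of the unwinding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a network namespace. */
struct model_net {
	int ctl_sock_ready;	/* 1 once the control socket exists */
	int fail_ctl_sock;	/* test knob: force creation to fail */
};

/* Models creating the control socket in a single namespace. */
static int model_ctl_sock_init(struct model_net *net)
{
	if (net->fail_ctl_sock)
		return -1;
	net->ctl_sock_ready = 1;
	return 0;
}

/* Models destroying the control socket in a single namespace. */
static void model_ctl_sock_exit(struct model_net *net)
{
	net->ctl_sock_ready = 0;
}

/*
 * Create the control socket in every namespace. On the first failure,
 * undo the namespaces already handled, in reverse order, and report
 * the error to the caller.
 */
static int model_ctl_sock_init_all(struct model_net *nets, size_t count)
{
	size_t i;
	int err;

	assert(nets != NULL);

	for (i = 0; i < count; i++) {
		err = model_ctl_sock_init(&nets[i]);
		if (err)
			goto unwind;
	}
	return 0;

unwind:
	/* nets[i] itself failed, so only nets[0..i-1] need undoing. */
	while (i-- > 0)
		model_ctl_sock_exit(&nets[i]);
	return err;
}
```

The point of the sketch is that the caller still gets a single return
value to act on, at the cost of an explicit rollback loop.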

-vlad



* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-09 20:30 ` Vlad Yasevich
@ 2015-09-09 21:06   ` Marcelo Ricardo Leitner
  2015-09-10 13:24     ` Vlad Yasevich
  0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-09-09 21:06 UTC
  To: Vlad Yasevich, netdev; +Cc: Neil Horman, linux-sctp

On 09-09-2015 17:30, Vlad Yasevich wrote:
> On 09/09/2015 04:03 PM, Marcelo Ricardo Leitner wrote:
>> Consider sctp module is unloaded and is being requested because an user
>> is creating a sctp socket.
>>
>> During initialization, sctp will add the new protocol type and then
>> initialize pernet subsys:
>>
>>          status = sctp_v4_protosw_init();
>>          if (status)
>>                  goto err_protosw_init;
>>
>>          status = sctp_v6_protosw_init();
>>          if (status)
>>                  goto err_v6_protosw_init;
>>
>>          status = register_pernet_subsys(&sctp_net_ops);
>>
>> The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
>> is possible for userspace to create SCTP sockets like if the module is
>> already fully loaded. If that happens, one of the possible effects is
>> that we will have readers for net->sctp.local_addr_list list earlier
>> than expected and sctp_net_init() does not take precautions while
>> dealing with that list, leading to a potential panic but not limited to
>> that, as sctp_sock_init() will copy a bunch of blank/partially
>> initialized values from net->sctp.
>>
>> The race happens like this:
>>
>>       CPU 0                           |  CPU 1
>>    socket()                           |
>>     __sock_create                     | socket()
>>      inet_create                      |  __sock_create
>>       list_for_each_entry_rcu(        |
>>          answer, &inetsw[sock->type], |
>>          list) {                      |   inet_create
>>        /* no hits */                  |
>>       if (unlikely(err)) {            |
>>        ...                            |
>>        request_module()               |
>>        /* socket creation is blocked  |
>>         * the module is fully loaded  |
>>         */                            |
>>         sctp_init                     |
>>          sctp_v4_protosw_init         |
>>           inet_register_protosw       |
>>            list_add_rcu(&p->list,     |
>>                         last_perm);   |
>>                                       |  list_for_each_entry_rcu(
>>                                       |     answer, &inetsw[sock->type],
>>          sctp_v6_protosw_init         |     list) {
>>                                       |     /* hit, so assumes protocol
>>                                       |      * is already loaded
>>                                       |      */
>>                                       |  /* socket creation continues
>>                                       |   * before netns is initialized
>>                                       |   */
>>          register_pernet_subsys       |
>>
>> Inverting the initialization order between register_pernet_subsys() and
>> sctp_v4_protosw_init() is not possible because register_pernet_subsys()
>> will create a control sctp socket, so the protocol must be already
>> visible by then. Deferring the socket creation to a work-queue is not
>> good specially because we loose the ability to handle its errors.
>>
>> So the fix then is to invert the initialization order inside
>> register_pernet_subsys() so that the control socket is created by last
>> and also block socket creation if netns initialization wasn't yet
>> performed.
>>
>
> not sure how much I like that...  Wouldn't it be better
> to pull the control socket initialization stuff out into its
> own function that does something like
>
> for_each_net_rcu()
> 	init_control_socket(net, ...)
>
>
> Or may be even pull the control socket creation
> stuff completely into its own per-net ops operations structure
> and initialize it after the the protosw stuff has been done.
>
> -vlad

I'm afraid error handling won't be easy then.

But still, the control socket is not really the problem, because we
don't care (much?) if it contains zeroed values, and the panic happens
only if you call connect() on it. I moved it solely because of the
protection on sctp_init_sock().

The real problem is new sockets created by a user application while
the module is still loading, because even if they don't trigger the
panic, they may not be fully functional due to improper values being
loaded. I can't see other good ways to protect sctp_init_sock() from
that early call (as in, prior to netns initialization).

I used the list pointer because it is NULL, as that memory is entirely
zeroed when allocated, and, after initialization, it is never NULL
again. It works like a lock/condition without using an extra field.
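
The list-pointer check can be illustrated with a userspace model. This
is a sketch, not the kernel code: the model_* names are hypothetical,
calloc() stands in for the zeroed netns allocation, and the model is
single-threaded, so it says nothing about memory ordering between CPUs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Minimal doubly linked list head, shaped like the kernel's list_head. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

/* Hypothetical stand-in for net->sctp. */
struct model_netns_sctp {
	int rto_initial;
	struct model_list_head auto_asconf_splist;
};

/* Same effect as INIT_LIST_HEAD(): an empty list points at itself. */
static void model_init_list_head(struct model_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Models sctp_net_init(): the sentinel list is set up after the defaults. */
static void model_net_init(struct model_netns_sctp *ns)
{
	ns->rto_initial = 3000;	/* illustrative value only */
	model_init_list_head(&ns->auto_asconf_splist);
}

/* Models the guard added to sctp_init_sock(). */
static int model_init_sock(const struct model_netns_sctp *ns, int *sk_rto)
{
	assert(ns != NULL && sk_rto != NULL);

	/* A zeroed netns has prev == NULL; an initialized one never does. */
	if (!ns->auto_asconf_splist.prev)
		return -ENOPROTOOPT;

	*sk_rto = ns->rto_initial;
	return 0;
}
```

On a zeroed structure model_init_sock() returns -ENOPROTOOPT and leaves
the socket untouched; after model_net_init() it succeeds and copies the
initialized default.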

   Marcelo



* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-09 20:03 [PATCH net] sctp: fix race on protocol/netns initialization Marcelo Ricardo Leitner
  2015-09-09 20:30 ` Vlad Yasevich
@ 2015-09-10  0:16 ` David Miller
  2015-09-10 12:54   ` Marcelo Ricardo Leitner
  1 sibling, 1 reply; 18+ messages in thread
From: David Miller @ 2015-09-10  0:16 UTC
  To: marcelo.leitner; +Cc: netdev, vyasevich, nhorman, linux-sctp

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Wed,  9 Sep 2015 17:03:01 -0300

> So the fix then is to invert the initialization order inside
> register_pernet_subsys() so that the control socket is created by last
> and also block socket creation if netns initialization wasn't yet
> performed.

If we really need to, we could make ->create() fail with -EAFNOSUPPORT
if kern==1 until the protocol is fully set up.

Or, instead of failing, we could make such ->create() calls block
until the control sock init is complete or fails.
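
The blocking variant could look roughly like the following userspace
sketch, which uses a pthread mutex and condition variable as a stand-in
for whatever kernel primitive would actually be chosen (a completion or
wait queue is an assumption here, not something stated in this thread).
All model_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical gate that ->create() callers would wait on. */
struct model_proto_gate {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;	/* true once protocol init has finished */
	int status;	/* 0 on success, negative error otherwise */
};

static void model_gate_init(struct model_proto_gate *g)
{
	pthread_mutex_init(&g->lock, NULL);
	pthread_cond_init(&g->cond, NULL);
	g->done = false;
	g->status = 0;
}

/* Called by the init path when setup completes or fails. */
static void model_gate_complete(struct model_proto_gate *g, int status)
{
	pthread_mutex_lock(&g->lock);
	g->done = true;
	g->status = status;
	pthread_cond_broadcast(&g->cond);	/* wake every waiter */
	pthread_mutex_unlock(&g->lock);
}

/*
 * Called at the start of the modelled ->create(): sleeps until init has
 * finished, then returns the init status so that create can fail cleanly.
 */
static int model_gate_wait(struct model_proto_gate *g)
{
	int status;

	assert(g != NULL);

	pthread_mutex_lock(&g->lock);
	while (!g->done)
		pthread_cond_wait(&g->cond, &g->lock);
	status = g->status;
	pthread_mutex_unlock(&g->lock);
	return status;
}
```

A caller that arrives after model_gate_complete() returns immediately
with the recorded status; one that arrives earlier sleeps until the init
path reports success or failure.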

We have actually several visibility issues wrt. control sockets on
protocol init, in general.

For example, such control sockets can briefly be hashed and visible
to socket dumps and packet input.

A lot of really tricky issues involved here.


* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-10  0:16 ` [PATCH net] " David Miller
@ 2015-09-10 12:54   ` Marcelo Ricardo Leitner
  2015-09-10 13:02     ` David Laight
  0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-09-10 12:54 UTC
  To: David Miller; +Cc: netdev, vyasevich, nhorman, linux-sctp

On 09-09-2015 21:16, David Miller wrote:
> From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Date: Wed,  9 Sep 2015 17:03:01 -0300
>
>> So the fix then is to invert the initialization order inside
>> register_pernet_subsys() so that the control socket is created by last
>> and also block socket creation if netns initialization wasn't yet
>> performed.
>
> If we really need to we could make ->create() fail with -EAFNOSUPPORT
> if kern==1 until the protocol is fully setup.
>
> Or, instead of failing, we could make such ->create() calls block
> until the control sock init is complete or fails.

I guess I should have written that paragraph in another order, perhaps like:
So the fix then is to deny any sctp socket creation until netns
initialization is sufficiently done. And due to that, we have to
initialize the control socket as the last step in netns initialization,
as now it can't be created earlier anymore.

Is the intention clearer this way?

And my emphasis on userspace sockets was to highlight that a random user
could trigger it, but yes, both kernel and userspace users are affected
by the issue.

Strictly speaking, we would have to block ->create() not until the
control socket init is done but until the protocol is fully loaded. With
this patch, that condition holds once net->sctp.auto_asconf_splist is
initialized. But to block until then, instead of just denying, we would
need some other mechanism.

That would be better from the (sctp) user's point of view, but then such
a solution may better belong to another layer and protect all
protocols at once. (I checked and couldn't find other protocols at risk
like sctp.)

> We have actually several visibility issues wrt. control sockets on
> protocol init, in general.
>
> For example, such control sockets can briefly be hashed and visible
> to socket dumps and packet input.
>
> A lot of really tricky issues involved here.

Agreed, but do these still apply after that explanation of the
paragraph/the solution? I had no intention of visiting these issues with
this patch; they are left unchanged, but I can if a better solution for
the original issue calls for it.

   Marcelo


* RE: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-10 12:54   ` Marcelo Ricardo Leitner
@ 2015-09-10 13:02     ` David Laight
  2015-09-10 14:36       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 18+ messages in thread
From: David Laight @ 2015-09-10 13:02 UTC
  To: 'Marcelo Ricardo Leitner', David Miller
  Cc: netdev@vger.kernel.org, vyasevich@gmail.com,
	nhorman@tuxdriver.com, linux-sctp@vger.kernel.org

From: Marcelo Ricardo Leitner
> Sent: 10 September 2015 13:54
> On 09-09-2015 21:16, David Miller wrote:
> > From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > Date: Wed,  9 Sep 2015 17:03:01 -0300
> >
> >> So the fix then is to invert the initialization order inside
> >> register_pernet_subsys() so that the control socket is created by last
> >> and also block socket creation if netns initialization wasn't yet
> >> performed.
> >
> > If we really need to we could make ->create() fail with -EAFNOSUPPORT
> > if kern==1 until the protocol is fully setup.
> >
> > Or, instead of failing, we could make such ->create() calls block
> > until the control sock init is complete or fails.
> 
> I guess I should have written that paragraph in another order, perhaps like:
> So the fix then is to deny any sctp socket creation until netns
> initialization is sufficiently done. And due to that, we have to
> initialize the control socket as last step in netns initialization, as
> now it can't be created earlier anymore.
> 
> Is it clearer on the intention?
> 
> And my emphasis on userspace sockets was to highlight that a random user
> could trigger it, but yes both users are affected by the issue.
> 
> Strictly speaking, we would have to block ->create() not until the
> control socket init is done but until the protocol is fully loaded. Such
> condition, with this patch, is after net->sctp.auto_asconf_splist is
> initialized. But for blocking until instead of just denying, we would
> need some other mechanism.
> 
> It would be better from the (sctp) user point of view but then such
> solution may better belong to another layer instead and protect all
> protocols at once. (I checked and couldn't find other protocols at risk
> like sctp)

Given that the first ->create() blocks while the protocol code loads
it really wouldn't be right to error a subsequent ->create() because
the load hasn't completed.

I hold a semaphore across sock_create_kern() because of issues with sctp.
(Current kernels are nowhere near as bad as really old ones though.)

	David


* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-09 21:06   ` Marcelo Ricardo Leitner
@ 2015-09-10 13:24     ` Vlad Yasevich
  2015-09-10 14:22       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 18+ messages in thread
From: Vlad Yasevich @ 2015-09-10 13:24 UTC
  To: Marcelo Ricardo Leitner, netdev; +Cc: Neil Horman, linux-sctp

On 09/09/2015 05:06 PM, Marcelo Ricardo Leitner wrote:
> On 09-09-2015 17:30, Vlad Yasevich wrote:
>> On 09/09/2015 04:03 PM, Marcelo Ricardo Leitner wrote:
>>> Consider sctp module is unloaded and is being requested because an user
>>> is creating a sctp socket.
>>>
>>> During initialization, sctp will add the new protocol type and then
>>> initialize pernet subsys:
>>>
>>>          status = sctp_v4_protosw_init();
>>>          if (status)
>>>                  goto err_protosw_init;
>>>
>>>          status = sctp_v6_protosw_init();
>>>          if (status)
>>>                  goto err_v6_protosw_init;
>>>
>>>          status = register_pernet_subsys(&sctp_net_ops);
>>>
>>> The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
>>> is possible for userspace to create SCTP sockets like if the module is
>>> already fully loaded. If that happens, one of the possible effects is
>>> that we will have readers for net->sctp.local_addr_list list earlier
>>> than expected and sctp_net_init() does not take precautions while
>>> dealing with that list, leading to a potential panic but not limited to
>>> that, as sctp_sock_init() will copy a bunch of blank/partially
>>> initialized values from net->sctp.
>>>
>>> The race happens like this:
>>>
>>>       CPU 0                           |  CPU 1
>>>    socket()                           |
>>>     __sock_create                     | socket()
>>>      inet_create                      |  __sock_create
>>>       list_for_each_entry_rcu(        |
>>>          answer, &inetsw[sock->type], |
>>>          list) {                      |   inet_create
>>>        /* no hits */                  |
>>>       if (unlikely(err)) {            |
>>>        ...                            |
>>>        request_module()               |
>>>        /* socket creation is blocked  |
>>>         * the module is fully loaded  |
>>>         */                            |
>>>         sctp_init                     |
>>>          sctp_v4_protosw_init         |
>>>           inet_register_protosw       |
>>>            list_add_rcu(&p->list,     |
>>>                         last_perm);   |
>>>                                       |  list_for_each_entry_rcu(
>>>                                       |     answer, &inetsw[sock->type],
>>>          sctp_v6_protosw_init         |     list) {
>>>                                       |     /* hit, so assumes protocol
>>>                                       |      * is already loaded
>>>                                       |      */
>>>                                       |  /* socket creation continues
>>>                                       |   * before netns is initialized
>>>                                       |   */
>>>          register_pernet_subsys       |
>>>
>>> Inverting the initialization order between register_pernet_subsys() and
>>> sctp_v4_protosw_init() is not possible because register_pernet_subsys()
>>> will create a control sctp socket, so the protocol must be already
>>> visible by then. Deferring the socket creation to a work-queue is not
>>> good specially because we loose the ability to handle its errors.
>>>
>>> So the fix then is to invert the initialization order inside
>>> register_pernet_subsys() so that the control socket is created by last
>>> and also block socket creation if netns initialization wasn't yet
>>> performed.
>>>
>>
>> not sure how much I like that...  Wouldn't it be better
>> to pull the control socket initialization stuff out into its
>> own function that does something like
>>
>> for_each_net_rcu()
>>     init_control_socket(net, ...)
>>
>>
>> Or may be even pull the control socket creation
>> stuff completely into its own per-net ops operations structure
>> and initialize it after the the protosw stuff has been done.
>>
>> -vlad
> 
> I'm afraid error handling won't be easy then.
> 
> But still, the control socket is not really the problem, because we don't care (much?) if
> it contains zeroed values and the panic happens only if you call connect() on it. I moved
> it solely because of the protection on sctp_init_sock().
> 
> The real problem is new sockets created by an user application while module is still
> loading, because even if them don't trigger the panic, they may not be fully functional
> due to improper values loaded. Can't see other good ways to protect sctp_init_sock() from
> that early call (as in, prior to netns initialization).

Right, I understand what the problem really is.  Like you said, the simple fix is to
reorder the sctp defaults initialization with protosw registration.  However, that's
not possible because the control socket is created in the sctp defaults initialization code
and needs protosw to be registered (chicken and egg issue).

What I am saying is that it is kind of strange to create the control socket during protocol
default initialization.  The control socket has nothing really to do with defaults.  So,
we could pull it out of the defaults initialization (sctp_net_init()) code and into its
own initialization path.

Then you can order sctp_net_init() such that it happens first, then protosw registration
happens, then control socket initialization happens, then inet protocol registration happens.

This way, we are always guaranteed that by the time user calls socket(), protocol
defaults are fully initialized.
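
The ordering described above can be sketched as a userspace model with
goto-style unwinding. The step names and all model_* identifiers are
hypothetical; each step only flips a flag, and fail_at is a test knob
for forcing one step to fail.

```c
#include <assert.h>
#include <stddef.h>

/* The four stages, in the proposed order. */
enum model_step {
	STEP_DEFAULTS,		/* sctp_net_init(): per-net defaults */
	STEP_PROTOSW,		/* protosw registration: socket() can hit */
	STEP_CTL_SOCK,		/* control socket creation */
	STEP_INET_PROTO,	/* inet protocol registration */
	STEP_COUNT
};

struct model_state {
	int done[STEP_COUNT];
	int fail_at;		/* step forced to fail, or -1 for none */
};

static int model_do(struct model_state *s, enum model_step step)
{
	if (s->fail_at == (int)step)
		return -1;
	s->done[step] = 1;
	return 0;
}

static void model_undo(struct model_state *s, enum model_step step)
{
	s->done[step] = 0;
}

/* Runs the stages in order; on failure, undoes the completed ones. */
static int model_sctp_init(struct model_state *s)
{
	int err;

	assert(s != NULL);

	err = model_do(s, STEP_DEFAULTS);
	if (err)
		goto out;
	err = model_do(s, STEP_PROTOSW);
	if (err)
		goto err_protosw;
	err = model_do(s, STEP_CTL_SOCK);
	if (err)
		goto err_ctl_sock;
	err = model_do(s, STEP_INET_PROTO);
	if (err)
		goto err_inet_proto;
	return 0;

err_inet_proto:
	model_undo(s, STEP_CTL_SOCK);
err_ctl_sock:
	model_undo(s, STEP_PROTOSW);
err_protosw:
	model_undo(s, STEP_DEFAULTS);
out:
	return err;
}
```

Because STEP_DEFAULTS always precedes STEP_PROTOSW in this model, any
socket() call that finds the protocol also finds initialized defaults,
and a control socket failure still unwinds through a single error path.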

> 
> I used the list pointer because that's null as that memory is entirely zeroed when alloced
> and, after initialization, it's never null again. Works like a lock/condition without
> using an extra field.
> 

I understand this as well.  What I don't particularly like is that we are re-using
a list without really stating why it's now done this way.  Additionally, it's not really
the last thing that happens, so it seems kind of hacky...  If we need to add new
per-net initializers, we now need to make sure that the code is put in the right
place.  I'd just really like to have a cleaner solution...

-vlad

>   Marcelo
> 
>>> Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace")
>>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> ---
>>>   net/sctp/protocol.c | 18 +++++++++++-------
>>>   net/sctp/socket.c   |  4 ++++
>>>   2 files changed, 15 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
>>> index
>>> 4345790ad3266c353eeac5398593c2a9ce4effda..d8f78165768a75f93f4ce4120dd5475b6a623aaf 100644
>>> --- a/net/sctp/protocol.c
>>> +++ b/net/sctp/protocol.c
>>> @@ -1271,12 +1271,6 @@ static int __net_init sctp_net_init(struct net *net)
>>>
>>>       sctp_dbg_objcnt_init(net);
>>>
>>> -    /* Initialize the control inode/socket for handling OOTB packets.  */
>>> -    if ((status = sctp_ctl_sock_init(net))) {
>>> -        pr_err("Failed to initialize the SCTP control sock\n");
>>> -        goto err_ctl_sock_init;
>>> -    }
>>> -
>>>       /* Initialize the local address list. */
>>>       INIT_LIST_HEAD(&net->sctp.local_addr_list);
>>>       spin_lock_init(&net->sctp.local_addr_lock);
>>> @@ -1284,11 +1278,21 @@ static int __net_init sctp_net_init(struct net *net)
>>>
>>>       /* Initialize the address event list */
>>>       INIT_LIST_HEAD(&net->sctp.addr_waitq);
>>> -    INIT_LIST_HEAD(&net->sctp.auto_asconf_splist);
>>>       spin_lock_init(&net->sctp.addr_wq_lock);
>>>       net->sctp.addr_wq_timer.expires = 0;
>>>       setup_timer(&net->sctp.addr_wq_timer, sctp_addr_wq_timeout_handler,
>>>               (unsigned long)net);
>>> +    /* sctp_init_sock() will use this to know that netns is
>>> +     * nearly all initialized but already good to go.
>>> +     */
>>> +    INIT_LIST_HEAD(&net->sctp.auto_asconf_splist);
>>> +
>>> +    /* Initialize the control inode/socket for handling OOTB packets.  */
>>> +    status = sctp_ctl_sock_init(net);
>>> +    if (status) {
>>> +        pr_err("Failed to initialize the SCTP control sock\n");
>>> +        goto err_ctl_sock_init;
>>> +    }
>>>
>>>       return 0;
>>>
>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>> index
>>> 17bef01b9aa3e7f75328d39fc976f9e80d641e92..45b94deec93d0c7c1612a16922348cf2a7e65ec5 100644
>>> --- a/net/sctp/socket.c
>>> +++ b/net/sctp/socket.c
>>> @@ -3993,6 +3993,10 @@ static int sctp_init_sock(struct sock *sk)
>>>
>>>       pr_debug("%s: sk:%p\n", __func__, sk);
>>>
>>> +    /* Validate if netns is already initialized. */
>>> +    if (!net->sctp.auto_asconf_splist.prev)
>>> +        return -ENOPROTOOPT;
>>> +
>>>       sp = sctp_sk(sk);
>>>
>>>       /* Initialize the SCTP per socket area.  */
>>>
>>
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-10 13:24     ` Vlad Yasevich
@ 2015-09-10 14:22       ` Marcelo Ricardo Leitner
  2015-09-10 15:50         ` Vlad Yasevich
  0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-09-10 14:22 UTC (permalink / raw)
  To: Vlad Yasevich, netdev; +Cc: Neil Horman, linux-sctp

On 10-09-2015 10:24, Vlad Yasevich wrote:
> On 09/09/2015 05:06 PM, Marcelo Ricardo Leitner wrote:
>> On 09-09-2015 17:30, Vlad Yasevich wrote:
>>> On 09/09/2015 04:03 PM, Marcelo Ricardo Leitner wrote:
>>>> Consider sctp module is unloaded and is being requested because an user
>>>> is creating a sctp socket.
>>>>
>>>> During initialization, sctp will add the new protocol type and then
>>>> initialize pernet subsys:
>>>>
>>>>           status = sctp_v4_protosw_init();
>>>>           if (status)
>>>>                   goto err_protosw_init;
>>>>
>>>>           status = sctp_v6_protosw_init();
>>>>           if (status)
>>>>                   goto err_v6_protosw_init;
>>>>
>>>>           status = register_pernet_subsys(&sctp_net_ops);
>>>>
>>>> The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
>>>> is possible for userspace to create SCTP sockets like if the module is
>>>> already fully loaded. If that happens, one of the possible effects is
>>>> that we will have readers for net->sctp.local_addr_list list earlier
>>>> than expected and sctp_net_init() does not take precautions while
>>>> dealing with that list, leading to a potential panic but not limited to
>>>> that, as sctp_sock_init() will copy a bunch of blank/partially
>>>> initialized values from net->sctp.
>>>>
>>>> The race happens like this:
>>>>
>>>>        CPU 0                           |  CPU 1
>>>>     socket()                           |
>>>>      __sock_create                     | socket()
>>>>       inet_create                      |  __sock_create
>>>>        list_for_each_entry_rcu(        |
>>>>           answer, &inetsw[sock->type], |
>>>>           list) {                      |   inet_create
>>>>         /* no hits */                  |
>>>>        if (unlikely(err)) {            |
>>>>         ...                            |
>>>>         request_module()               |
>>>>         /* socket creation is blocked  |
>>>>          * the module is fully loaded  |
>>>>          */                            |
>>>>          sctp_init                     |
>>>>           sctp_v4_protosw_init         |
>>>>            inet_register_protosw       |
>>>>             list_add_rcu(&p->list,     |
>>>>                          last_perm);   |
>>>>                                        |  list_for_each_entry_rcu(
>>>>                                        |     answer, &inetsw[sock->type],
>>>>           sctp_v6_protosw_init         |     list) {
>>>>                                        |     /* hit, so assumes protocol
>>>>                                        |      * is already loaded
>>>>                                        |      */
>>>>                                        |  /* socket creation continues
>>>>                                        |   * before netns is initialized
>>>>                                        |   */
>>>>           register_pernet_subsys       |
>>>>
>>>> Inverting the initialization order between register_pernet_subsys() and
>>>> sctp_v4_protosw_init() is not possible because register_pernet_subsys()
>>>> will create a control sctp socket, so the protocol must be already
>>>> visible by then. Deferring the socket creation to a work-queue is not
>>>> good specially because we loose the ability to handle its errors.
>>>>
>>>> So the fix then is to invert the initialization order inside
>>>> register_pernet_subsys() so that the control socket is created by last
>>>> and also block socket creation if netns initialization wasn't yet
>>>> performed.
>>>>
>>>
>>> not sure how much I like that...  Wouldn't it be better
>>> to pull the control socket initialization stuff out into its
>>> own function that does something like
>>>
>>> for_each_net_rcu()
>>>      init_control_socket(net, ...)
>>>
>>>
>>> Or may be even pull the control socket creation
>>> stuff completely into its own per-net ops operations structure
>>> and initialize it after the the protosw stuff has been done.
>>>
>>> -vlad
>>
>> I'm afraid error handling won't be easy then.
>>
>> But still, the control socket is not really the problem, because we don't care (much?) if
>> it contains zeroed values and the panic happens only if you call connect() on it. I moved
>> it solely because of the protection on sctp_init_sock().
>>
>> The real problem is new sockets created by an user application while module is still
>> loading, because even if them don't trigger the panic, they may not be fully functional
>> due to improper values loaded. Can't see other good ways to protect sctp_init_sock() from
>> that early call (as in, prior to netns initialization).
>
> Right, I understand what the problem really is.  Like you said, the simple fix is to
> reorder the sctp defaults initialization with protosw registration.  However, that's
> not possible because control socket is created in the sctp defaults initialization code
> and needs protosw to be registered (chicken and egg issue).

Yes, same page then, cool.

> What I am saying is that it is kind of strange to create control socket during protocol
> default initialization.  The control socket has nothing  really to do with defaults.  So,
> we could pull it out of the defaults initialization (sctp_net_init()) code and into its
> own initialization path.

I don't really see sctp_net_init() as a pure defaults initialization 
routine. It's the callback for new netns's, so it should initialize 
anything needed for a new netns, no?

> Then you can order sctp_net_init() such that it happens first, then protosw registration
> happens, then control socket initialization happens, then inet protocol registration happens.
>
> This way, we are always guaranteed that by the time user calls socket(), protocol
> defaults are fully initialized.

Okay, that works for the module loading stage, but then how would we handle 
new netns's? We have to create the control socket per netns and AFAICT 
sctp_net_init() is the only hook called when a new netns is being created.

Then if we move it to a workqueue that is scheduled by sctp_net_init(), we 
lose the ability to handle its errors by propagating them through the 
sctp_net_init() return value, which is not good.

>> I used the list pointer because that's null as that memory is entirely zeroed when alloced
>> and, after initialization, it's never null again. Works like a lock/condition without
>> using an extra field.
>>
>
> I understand this a well.  What I don't particularly like is that we are re-using
> a list without really stating why it's now done this way.  Additionally, it's not really
> the last that happens so it's seems kind of hacky...  If we need to add new
> per-net initializers, we now need to make sure that the code is put in the right
> place.  I'd just really like to have a cleaner solution...

Ok, got you. We could add a dedicated flag/bit for that then, if reusing 
the list is not clear enough. Or, as we are discussing in the other part 
of the thread, we could make it block and wait for the initialization, 
probably using some wait_queue. I'm still thinking about something along 
these lines, likely at a layer below sctp.

Thanks,
Marcelo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-10 13:02     ` David Laight
@ 2015-09-10 14:36       ` Marcelo Ricardo Leitner
  2015-09-10 15:03         ` David Laight
  0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-09-10 14:36 UTC (permalink / raw)
  To: David Laight, David Miller
  Cc: netdev@vger.kernel.org, vyasevich@gmail.com,
	nhorman@tuxdriver.com, linux-sctp@vger.kernel.org

On 10-09-2015 10:02, David Laight wrote:
> From: Marcelo Ricardo Leitner
>> Sent: 10 September 2015 13:54
>> On 09-09-2015 21:16, David Miller wrote:
>>> From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> Date: Wed,  9 Sep 2015 17:03:01 -0300
>>>
>>>> So the fix then is to invert the initialization order inside
>>>> register_pernet_subsys() so that the control socket is created by last
>>>> and also block socket creation if netns initialization wasn't yet
>>>> performed.
>>>
>>> If we really need to we could make ->create() fail with -EAFNOSUPPORT
>>> if kern==1 until the protocol is fully setup.
>>>
>>> Or, instead of failing, we could make such ->create() calls block
>>> until the control sock init is complete or fails.
>>
>> I guess I should have written that paragraph in another order, perhaps like:
>> So the fix then is to deny any sctp socket creation until netns
>> initialization is sufficiently done. And due to that, we have to
>> initialize the control socket as last step in netns initialization, as
>> now it can't be created earlier anymore.
>>
>> Is it clearer on the intention?
>>
>> And my emphasis on userspace sockets was to highlight that a random user
>> could trigger it, but yes both users are affected by the issue.
>>
>> Strictly speaking, we would have to block ->create() not until the
>> control socket init is done but until the protocol is fully loaded. Such
>> condition, with this patch, is after net->sctp.auto_asconf_splist is
>> initialized. But for blocking until instead of just denying, we would
>> need some other mechanism.
>>
>> It would be better from the (sctp) user point of view but then such
>> solution may better belong to another layer instead and protect all
>> protocols at once. (I checked and couldn't find other protocols at risk
>> like sctp)
>
> Given that the first ->create() blocks while the protocol code loads
> it really wouldn't be right to error a subsequent ->create() because
> the load hasn't completed.

Can't say I don't agree with you, but at the same time, there are other 
temporary errors that can happen and that the user should just retry. 
This would be just another condition in a trade-off for avoiding complexity.

> I hold a semaphore across sock_create_kern() because of issues with sctp.
> (Current kernels are nowhere near as bad as really old ones though.)

Oh, that's not good to hear. I'll experiment with that later, try to 
catch some bugs. :)

   Marcelo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-10 14:36       ` Marcelo Ricardo Leitner
@ 2015-09-10 15:03         ` David Laight
  0 siblings, 0 replies; 18+ messages in thread
From: David Laight @ 2015-09-10 15:03 UTC (permalink / raw)
  To: 'Marcelo Ricardo Leitner', David Miller
  Cc: netdev@vger.kernel.org, vyasevich@gmail.com,
	nhorman@tuxdriver.com, linux-sctp@vger.kernel.org

From: Marcelo Ricardo 
> Sent: 10 September 2015 15:36
...
> > Given that the first ->create() blocks while the protocol code loads
> > it really wouldn't be right to error a subsequent ->create() because
> > the load hasn't completed.
> 
> Can't say I don't agree with you, but at the same time, there are other
> temporary errors that can happen and that the user should just retry.
> This would be just another condition in a trade off for avoiding complexity.

We do retry, but the delay messes up our test scripts :-(

> > I hold a semaphore across sock_create_kern() because of issues with sctp.
> > (Current kernels are nowhere near as bad as really old ones though.)
> 
> Oh, that's not good to hear. I'll experiment with that later, try to
> catch some bugs. :)

I mean REALLY old - like 2.6.12 (FC3).
I'm pretty sure I've seen oops as well as create failing.

We don't create enough sockets for the semaphore to be a problem.

OTOH I've a current problem with a customer using RHEL5.8 (basically 2.6.18).
They might manage to move to RHEL6 (2.6.32) - but that could take a year or two.
RH might be pulling some of the SCTP fixes, but I doubt they get priority.

	David

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-10 14:22       ` Marcelo Ricardo Leitner
@ 2015-09-10 15:50         ` Vlad Yasevich
  2015-09-10 16:24           ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 18+ messages in thread
From: Vlad Yasevich @ 2015-09-10 15:50 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner, netdev; +Cc: Neil Horman, linux-sctp

On 09/10/2015 10:22 AM, Marcelo Ricardo Leitner wrote:
> On 10-09-2015 10:24, Vlad Yasevich wrote:
>> On 09/09/2015 05:06 PM, Marcelo Ricardo Leitner wrote:
>>> On 09-09-2015 17:30, Vlad Yasevich wrote:
>>>> On 09/09/2015 04:03 PM, Marcelo Ricardo Leitner wrote:
>>>>> Consider sctp module is unloaded and is being requested because an user
>>>>> is creating a sctp socket.
>>>>>
>>>>> During initialization, sctp will add the new protocol type and then
>>>>> initialize pernet subsys:
>>>>>
>>>>>           status = sctp_v4_protosw_init();
>>>>>           if (status)
>>>>>                   goto err_protosw_init;
>>>>>
>>>>>           status = sctp_v6_protosw_init();
>>>>>           if (status)
>>>>>                   goto err_v6_protosw_init;
>>>>>
>>>>>           status = register_pernet_subsys(&sctp_net_ops);
>>>>>
>>>>> The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
>>>>> is possible for userspace to create SCTP sockets like if the module is
>>>>> already fully loaded. If that happens, one of the possible effects is
>>>>> that we will have readers for net->sctp.local_addr_list list earlier
>>>>> than expected and sctp_net_init() does not take precautions while
>>>>> dealing with that list, leading to a potential panic but not limited to
>>>>> that, as sctp_sock_init() will copy a bunch of blank/partially
>>>>> initialized values from net->sctp.
>>>>>
>>>>> The race happens like this:
>>>>>
>>>>>        CPU 0                           |  CPU 1
>>>>>     socket()                           |
>>>>>      __sock_create                     | socket()
>>>>>       inet_create                      |  __sock_create
>>>>>        list_for_each_entry_rcu(        |
>>>>>           answer, &inetsw[sock->type], |
>>>>>           list) {                      |   inet_create
>>>>>         /* no hits */                  |
>>>>>        if (unlikely(err)) {            |
>>>>>         ...                            |
>>>>>         request_module()               |
>>>>>         /* socket creation is blocked  |
>>>>>          * the module is fully loaded  |
>>>>>          */                            |
>>>>>          sctp_init                     |
>>>>>           sctp_v4_protosw_init         |
>>>>>            inet_register_protosw       |
>>>>>             list_add_rcu(&p->list,     |
>>>>>                          last_perm);   |
>>>>>                                        |  list_for_each_entry_rcu(
>>>>>                                        |     answer, &inetsw[sock->type],
>>>>>           sctp_v6_protosw_init         |     list) {
>>>>>                                        |     /* hit, so assumes protocol
>>>>>                                        |      * is already loaded
>>>>>                                        |      */
>>>>>                                        |  /* socket creation continues
>>>>>                                        |   * before netns is initialized
>>>>>                                        |   */
>>>>>           register_pernet_subsys       |
>>>>>
>>>>> Inverting the initialization order between register_pernet_subsys() and
>>>>> sctp_v4_protosw_init() is not possible because register_pernet_subsys()
>>>>> will create a control sctp socket, so the protocol must be already
>>>>> visible by then. Deferring the socket creation to a work-queue is not
>>>>> good specially because we loose the ability to handle its errors.
>>>>>
>>>>> So the fix then is to invert the initialization order inside
>>>>> register_pernet_subsys() so that the control socket is created by last
>>>>> and also block socket creation if netns initialization wasn't yet
>>>>> performed.
>>>>>
>>>>
>>>> not sure how much I like that...  Wouldn't it be better
>>>> to pull the control socket initialization stuff out into its
>>>> own function that does something like
>>>>
>>>> for_each_net_rcu()
>>>>      init_control_socket(net, ...)
>>>>
>>>>
>>>> Or may be even pull the control socket creation
>>>> stuff completely into its own per-net ops operations structure
>>>> and initialize it after the the protosw stuff has been done.
>>>>
>>>> -vlad
>>>
>>> I'm afraid error handling won't be easy then.
>>>
>>> But still, the control socket is not really the problem, because we don't care (much?) if
>>> it contains zeroed values and the panic happens only if you call connect() on it. I moved
>>> it solely because of the protection on sctp_init_sock().
>>>
>>> The real problem is new sockets created by an user application while module is still
>>> loading, because even if them don't trigger the panic, they may not be fully functional
>>> due to improper values loaded. Can't see other good ways to protect sctp_init_sock() from
>>> that early call (as in, prior to netns initialization).
>>
>> Right, I understand what the problem really is.  Like you said, the simple fix is to
>> reorder the sctp defaults initialization with protosw registration.  However, that's
>> not possible because control socket is created in the sctp defaults initialization code
>> and needs protosw to be registered (chicken and egg issue).
> 
> Yes, same page then, cool.
> 
>> What I am saying is that it is kind of strange to create control socket during protocol
>> default initialization.  The control socket has nothing  really to do with defaults.  So,
>> we could pull it out of the defaults initialization (sctp_net_init()) code and into its
>> own initialization path.
> 
> I don't really see sctp_net_init() as a pure defaults initialization routine. It's the
> callback for new netns's, so it should initialize anything needed for a new netns, no?
> 
>> Then you can order sctp_net_init() such that it happens first, then protosw registration
>> happens, then control socket initialization happens, then inet protocol registration
>> happens.
>>
>> This way, we are always guaranteed that by the time user calls socket(), protocol
>> defaults are fully initialized.
> 
> Okay, that works for module loading stage, but then how would we handle new netns's? We
> have to create the control socket per netns and AFAICT sctp_net_init() is the only hook
> called when a new netns is being created.
> 
> Then if we move it a workqueue that is scheduled by sctp_net_init(), we loose the ability
> to handle its errors by propagating through sctp_net_init() return value, not good.

Here is kind of what I had in mind.  It's incomplete and completely untested (not even
compiled), but good enough to describe the idea:

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 59e8035..970bdce 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1166,7 +1166,7 @@ static void sctp_v4_del_protocol(void)
        unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
 }

-static int __net_init sctp_net_init(struct net *net)
+static int __net_init sctp_net_defaults_init(struct net *net)
 {
        int status;

@@ -1259,12 +1259,6 @@ static int __net_init sctp_net_init(struct net *net)

        sctp_dbg_objcnt_init(net);

-       /* Initialize the control inode/socket for handling OOTB packets.  */
-       if ((status = sctp_ctl_sock_init(net))) {
-               pr_err("Failed to initialize the SCTP control sock\n");
-               goto err_ctl_sock_init;
-       }
-
        /* Initialize the local address list. */
        INIT_LIST_HEAD(&net->sctp.local_addr_list);
        spin_lock_init(&net->sctp.local_addr_lock);
@@ -1291,15 +1285,12 @@ err_sysctl_register:
        return status;
 }

-static void __net_exit sctp_net_exit(struct net *net)
+static void __net_exit sctp_net_defaults_exit(struct net *net)
 {
        /* Free the local address list */
        sctp_free_addr_wq(net);
        sctp_free_local_addr_list(net);

-       /* Free the control endpoint.  */
-       inet_ctl_sock_destroy(net->sctp.ctl_sock);
-
        sctp_dbg_objcnt_exit(net);

        sctp_proc_exit(net);
@@ -1307,9 +1298,31 @@ static void __net_exit sctp_net_exit(struct net *net)
        sctp_sysctl_net_unregister(net);
 }

-static struct pernet_operations sctp_net_ops = {
-       .init = sctp_net_init,
-       .exit = sctp_net_exit,
+static struct pernet_operations sctp_defaults_ops = {
+       .init = sctp_net_defaults_init,
+       .exit = sctp_net_defaults_exit,
+};
+
+static int __net_init sctp_net_ctrlsock_init(struct net *net)
+{
+       int status;
+
+       /* Initialize the control inode/socket for handling OOTB packets.  */
+       if ((status = sctp_ctl_sock_init(net)))
+               pr_err("Failed to initialize the SCTP control sock\n");
+
+       return status;
+}
+
+static void __net_exit sctp_net_ctrlsock_exit(struct net *net)
+{
+       /* Free the control endpoint.  */
+       inet_ctl_sock_destroy(net->sctp.ctl_sock);
+}
+
+static struct pernet_operations sctp_ctrlsock_ops = {
+       .init = sctp_net_ctrlsock_init,
+       .exit = sctp_net_ctrlsock_exit,
 };
 /* Initialize the universe into something sensible.  */
@@ -1442,6 +1455,10 @@ static __init int sctp_init(void)
        sctp_v4_pf_init();
        sctp_v6_pf_init();

+       status = register_pernet_subsys(&sctp_defaults_ops);
+       if (status)
+               goto err_register_default_ops;
+
        status = sctp_v4_protosw_init();

        if (status)
@@ -1451,9 +1468,9 @@ static __init int sctp_init(void)
        if (status)
                goto err_v6_protosw_init;

-       status = register_pernet_subsys(&sctp_net_ops);
+       status = register_pernet_subsys(&sctp_ctrlsock_ops);
        if (status)
-               goto err_register_pernet_subsys;
+               goto err_register_ctrlsock_ops;

        status = sctp_v4_add_protocol();
        if (status)


> 
>>> I used the list pointer because that's null as that memory is entirely zeroed when alloced
>>> and, after initialization, it's never null again. Works like a lock/condition without
>>> using an extra field.
>>>
>>
>> I understand this a well.  What I don't particularly like is that we are re-using
>> a list without really stating why it's now done this way.  Additionally, it's not really
>> the last that happens so it's seems kind of hacky...  If we need to add new
>> per-net initializers, we now need to make sure that the code is put in the right
>> place.  I'd just really like to have a cleaner solution...
> 
> Ok, got you. We could add a dedicated flag/bit for that then, if reusing the list is not
> clear enough. Or, as we are discussing on the other part of thread, we could make it block
> and wait for the initialization, probably using some wait_queue. I'm still thinking on
> something this way, likely something more below than sctp then.
> 

I think if we do the above, the second process calling socket() will either find
the protosw or will try to load the module also.  I think either is OK: after
request_module() returns, we'll look at the protosw and will find things.

-vlad

> Thanks,
> Marcelo
> 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-10 15:50         ` Vlad Yasevich
@ 2015-09-10 16:24           ` Marcelo Ricardo Leitner
  2015-09-10 18:35             ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-09-10 16:24 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, Neil Horman, linux-sctp

On Thu, Sep 10, 2015 at 11:50:06AM -0400, Vlad Yasevich wrote:
> On 09/10/2015 10:22 AM, Marcelo Ricardo Leitner wrote:
> > On 10-09-2015 10:24, Vlad Yasevich wrote:
...
> >> Then you can order sctp_net_init() such that it happens first, then protosw registration
> >> happens, then control socket initialization happens, then inet protocol registration
> >> happens.
> >>
> >> This way, we are always guaranteed that by the time user calls socket(), protocol
> >> defaults are fully initialized.
> > 
> > Okay, that works for module loading stage, but then how would we handle new netns's? We
> > have to create the control socket per netns and AFAICT sctp_net_init() is the only hook
> > called when a new netns is being created.
> > 
> > Then if we move it a workqueue that is scheduled by sctp_net_init(), we loose the ability
> > to handle its errors by propagating through sctp_net_init() return value, not good.
> 
> Here is kind of what I had in mind.  It's incomplete and completely untested (not even
> compiled), but good enough to describe the idea:
...

Ohh, ok now I get it, thanks. If having two pernet_subsys for a given
module is fine, that works for me. It's clearer and has no moment of
temporary failure.

I can finish this patch if everybody agrees with it.

> >>> I used the list pointer because that's null as that memory is entirely zeroed when alloced
> >>> and, after initialization, it's never null again. Works like a lock/condition without
> >>> using an extra field.
> >>>
> >>
> >> I understand this a well.  What I don't particularly like is that we are re-using
> >> a list without really stating why it's now done this way.  Additionally, it's not really
> >> the last that happens so it's seems kind of hacky...  If we need to add new
> >> per-net initializers, we now need to make sure that the code is put in the right
> >> place.  I'd just really like to have a cleaner solution...
> > 
> > Ok, got you. We could add a dedicated flag/bit for that then, if reusing the list is not
> > clear enough. Or, as we are discussing on the other part of thread, we could make it block
> > and wait for the initialization, probably using some wait_queue. I'm still thinking on
> > something this way, likely something more below than sctp then.
> > 
> 
> I think if we don the above, the second process calling socket() will either find the
> the protosw or will try to load the module also.  I think either is ok after
> request_module returns we'll look at the protosw and will find find things.

Seems so, yes. Nice.

  Marcelo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-10 16:24           ` Marcelo Ricardo Leitner
@ 2015-09-10 18:35             ` Marcelo Ricardo Leitner
  2015-09-10 18:47               ` Marcelo Ricardo Leitner
  2015-09-10 19:14               ` Vlad Yasevich
  0 siblings, 2 replies; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-09-10 18:35 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, Neil Horman, linux-sctp

On Thu, Sep 10, 2015 at 01:24:54PM -0300, Marcelo Ricardo Leitner wrote:
> On Thu, Sep 10, 2015 at 11:50:06AM -0400, Vlad Yasevich wrote:
> > On 09/10/2015 10:22 AM, Marcelo Ricardo Leitner wrote:
> > > On 10-09-2015 10:24, Vlad Yasevich wrote:
> ...
> > >> Then you can order sctp_net_init() such that it happens first, then protosw registration
> > >> happens, then control socket initialization happens, then inet protocol registration
> > >> happens.
> > >>
> > >> This way, we are always guaranteed that by the time user calls socket(), protocol
> > >> defaults are fully initialized.
> > > 
> > > Okay, that works for module loading stage, but then how would we handle new netns's? We
> > > have to create the control socket per netns and AFAICT sctp_net_init() is the only hook
> > > called when a new netns is being created.
> > > 
> > > Then if we move it a workqueue that is scheduled by sctp_net_init(), we loose the ability
> > > to handle its errors by propagating through sctp_net_init() return value, not good.
> > 
> > Here is kind of what I had in mind.  It's incomplete and completely untested (not even
> > compiled), but good enough to describe the idea:
> ...
> 
> Ohh, ok now I get it, thanks. If having two pernet_subsys for a given
> module is fine, that works for me. It's clearer and has no moment of
> temporary failure.
> 
> I can finish this patch if everybody agrees with it.
> 
> > >>> I used the list pointer because that's null as that memory is entirely zeroed when alloced
> > >>> and, after initialization, it's never null again. Works like a lock/condition without
> > >>> using an extra field.
> > >>>
> > >>
> > > >> I understand this as well.  What I don't particularly like is that we are re-using
> > > >> a list without really stating why it's now done this way.  Additionally, it's not really
> > > >> the last thing that happens, so it seems kind of hacky...  If we need to add new
> > > >> per-net initializers, we now need to make sure that the code is put in the right
> > > >> place.  I'd just really like to have a cleaner solution...
> > > > 
> > > > Ok, got you. We could add a dedicated flag/bit for that then, if reusing the list is not
> > > > clear enough. Or, as we are discussing in the other part of the thread, we could make it block
> > > > and wait for the initialization, probably using some wait_queue. I'm still thinking about
> > > > something along those lines, likely at a layer below sctp.
> > > 
> > 
> > > I think if we do the above, the second process calling socket() will either find
> > > the protosw or will try to load the module also.  I think either is ok: after
> > > request_module returns we'll look at the protosw and will find things.
> 
> Seems so, yes. Nice.

I tested it and something is not right. I finished your patch and
tested it with a flooder like:
 # for j in {1..5}; do for i in {1234..1280}; do \
       sctp_darn -H 192.168.122.147 -P $j$i -l & done & done

The system didn't crash, but I got:
[1] 13507
[2] 13508
[3] 13510
[4] 13513
[5] 13517
[mrl@localhost ~]$ sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn listening...
sctp_darn listening...
sctp_darn listening...
sctp_darn listening...
...
sctp_darn listening...
sctp_darn listening...
sctp_darn listening...
sctp_darn listening...
sctp_darn: failed to create socket:  Socket type not supported.
sctp_darn listening...
sctp_darn listening...

And with this applied:
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -289,6 +289,7 @@ lookup_protocol:
        if (unlikely(err)) {
                if (try_loading_module < 2) {
                        rcu_read_unlock();
+                       printk("%p loading proto module\n", sock);
                        /*
                         * Be more specific, e.g. net-pf-2-proto-132-type-1
                         * (net-pf-PF_INET-proto-IPPROTO_SCTP-type-SOCK_STREAM)
@@ -303,6 +304,8 @@ lookup_protocol:
                        else
                                request_module("net-pf-%d-proto-%d",
                                               PF_INET, protocol);
+
+                       printk("%p done loading proto module\n", sock);
                        goto lookup_protocol;
                } else
                        goto out_rcu_unlock;


During that test, it showed:
[  732.261730] ffff8800da2ce300 loading proto module
               ^^^^^^^^^^^^^^^^ (1)    (first printed line)
[  732.262077] ffff8800da2ca680 loading proto module
[  732.262285] ffff8800da2c8b00 loading proto module
[  732.262421] ffff8800da2ccd00 loading proto module
[  732.263763] ffff8800da2c8580 loading proto module
[  732.265872] ffff8800da2cd280 loading proto module
[  732.270517] ffff8800da2cf900 loading proto module
[  732.270626] ffff8800da2c9080 loading proto module
[  732.272170] ffff8800da2c8580 done loading proto module
[  732.273042] ffff8800da2c8580 loading proto module
[  732.277248] ffff8800da2cf380 loading proto module
[  732.278495] ffff8800da2ca680 done loading proto module
[  732.278916] ffff8800da2cd280 done loading proto module
[  732.278918] ffff8800da2cd280 loading proto module
...
[  732.281171] ffff8800da2ce300 done loading proto module
               ^^^^^^^^^^^^^^^^ (1)
[  732.281173] ffff8800da2ce300 loading proto module
               ^^^^^^^^^^^^^^^^ (1)
...
[  732.321299] ffff880034995800 loading proto module
[  732.461481] ffff880034995800 done loading proto module
[  732.461482] ffff880034995800 loading proto module
[  732.461483] ffff880034995800 done loading proto module
               ^^^^^^^^^^^^^^^^   an attempt that tried both aliases
                                  quickly and returned before sctp was
                                  initialized
...
[  732.489413] sctp: Hash tables configured (established 1638 bind 1638)
...
[  732.634034] ffff8800da2ce300 done loading proto module
               ^^^^^^^^^^^^^^^^ (1)  nearly 400ms later

Also got:
$ dmesg | grep runaw
[  732.439598] request_module: runaway loop modprobe net-pf-2-proto-132-type-5
[  732.442017] request_module: runaway loop modprobe net-pf-2-proto-132
[  732.449195] request_module: runaway loop modprobe net-pf-2-proto-132-type-5
[  732.451946] request_module: runaway loop modprobe net-pf-2-proto-132
[  732.458970] request_module: runaway loop modprobe net-pf-2-proto-132-type-5
[  732.460380] request_module: runaway loop modprobe net-pf-2-proto-132-type-5

Those socket() calls failed (and some others too), because of this check in kernel/kmod.c:
__request_module()
{
...
        max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT);
        atomic_inc(&kmod_concurrent);
        if (atomic_read(&kmod_concurrent) > max_modprobes) {
                /* We may be blaming an innocent here, but unlikely */
                if (kmod_loop_msg < 5) {
                        printk(KERN_ERR
                               "request_module: runaway loop modprobe %s\n",
                               module_name);
                        kmod_loop_msg++;
                }
                atomic_dec(&kmod_concurrent);
                return -ENOMEM;
        }
...
}
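
To make the failure mode concrete, here is a minimal userspace sketch of
that kind of concurrency cap. It is not the kernel code: the sketch_* names
and the cap of 2 are made up for illustration, and it only assumes C11
atomics. A caller that arrives while the cap is already reached gets -ENOMEM
right away instead of waiting, which, as far as I can tell, is why
inet_create() then retries the lookup, finds nothing and fails with
"Socket type not supported".

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Assumed cap for the sketch; the kernel derives its own limit. */
#define SKETCH_MAX_MODPROBES 2

/* Number of callers currently inside the "modprobe" section. */
static atomic_int sketch_concurrent;

/*
 * Try to enter the section.
 * Returns 0 on success or -ENOMEM when the cap is already reached.
 */
static int sketch_request_enter(void)
{
	/* atomic_fetch_add() returns the old value, so old + 1 is our slot. */
	if (atomic_fetch_add(&sketch_concurrent, 1) + 1 > SKETCH_MAX_MODPROBES) {
		/* Over the cap: undo our increment and fail immediately. */
		atomic_fetch_sub(&sketch_concurrent, 1);
		return -ENOMEM;
	}
	return 0;
}

/* Leave the section, freeing one slot for another caller. */
static void sketch_request_exit(void)
{
	atomic_fetch_sub(&sketch_concurrent, 1);
}
```

Note that nothing here makes a rejected caller wait for the callers that
did get in, so the cap limits pressure but does not serialize anything.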

FWIW, my test system has 8 threads.

It seems that request_module will not serialize it as we wanted and we
would be putting unexpected pressure on it, yet it fixes the original
issue. Maybe we can place a semaphore at inet_create(), protecting the
request_module()s so only one socket can do it at a time and, after it
is released, whoever was blocked on it re-checks if the module isn't
already loaded before attempting again. It makes the loading of
different modules slower, though, but I'm not sure if that's really a
problem. Not many modules are loaded at the same time like that. What do
you think? 
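
A rough userspace sketch of that idea, with a pthread mutex standing in
for the semaphore. All sketch_* names are hypothetical and
sketch_load_module() merely stands in for request_module(); the part that
matters is the re-check after the lock is taken.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_mutex_t sketch_load_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool sketch_proto_registered; /* stands in for the inetsw lookup */
static int sketch_load_count;               /* protected by sketch_load_lock */

/* Hypothetical stand-in for request_module(): "registers" the protocol. */
static void sketch_load_module(void)
{
	sketch_load_count++;
	atomic_store(&sketch_proto_registered, true);
}

/* Returns 0 once the protocol is available, -1 otherwise. */
static int sketch_create_socket(void)
{
	/* Fast path: protocol already registered, no lock needed. */
	if (atomic_load(&sketch_proto_registered))
		return 0;

	pthread_mutex_lock(&sketch_load_lock);
	/* Re-check: another caller may have loaded it while we were blocked. */
	if (!atomic_load(&sketch_proto_registered))
		sketch_load_module();
	pthread_mutex_unlock(&sketch_load_lock);

	return atomic_load(&sketch_proto_registered) ? 0 : -1;
}
```

With this shape, only the first caller triggers a load; everyone blocked on
the lock finds the protocol on the re-check and skips the second attempt.
The cost is the one mentioned above: one global lock also delays loads of
unrelated protocols.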

  Marcelo


* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-10 18:35             ` Marcelo Ricardo Leitner
@ 2015-09-10 18:47               ` Marcelo Ricardo Leitner
  2015-09-10 19:14               ` Vlad Yasevich
  1 sibling, 0 replies; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-09-10 18:47 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, Neil Horman, linux-sctp

On Thu, Sep 10, 2015 at 03:35:20PM -0300, Marcelo Ricardo Leitner wrote:
> On Thu, Sep 10, 2015 at 01:24:54PM -0300, Marcelo Ricardo Leitner wrote:
> > On Thu, Sep 10, 2015 at 11:50:06AM -0400, Vlad Yasevich wrote:
> > > On 09/10/2015 10:22 AM, Marcelo Ricardo Leitner wrote:
> > > > On 10-09-2015 10:24, Vlad Yasevich wrote:
> > ...
> > > >> Then you can order sctp_net_init() such that it happens first, then protosw registration
> > > >> happens, then control socket initialization happens, then inet protocol registration
> > > >> happens.
> > > >>
> > > >> This way, we are always guaranteed that by the time user calls socket(), protocol
> > > >> defaults are fully initialized.
> > > > 
> > > > Okay, that works for module loading stage, but then how would we handle new netns's? We
> > > > have to create the control socket per netns and AFAICT sctp_net_init() is the only hook
> > > > called when a new netns is being created.
> > > > 
> > > > > Then if we move it to a workqueue that is scheduled by sctp_net_init(), we lose the ability
> > > > > to handle its errors by propagating them through sctp_net_init()'s return value, not good.
> > > 
> > > Here is kind of what I had in mind.  It's incomplete and completely untested (not even
> > > compiled), but good enough to describe the idea:
> > ...
> > 
> > Ohh, ok now I get it, thanks. If having two pernet_subsys for a given
> > module is fine, that works for me. It's clearer and has no moment of
> > temporary failure.
> > 
> > I can finish this patch if everybody agrees with it.
> > 
> > > >>> I used the list pointer because that's null as that memory is entirely zeroed when alloced
> > > >>> and, after initialization, it's never null again. Works like a lock/condition without
> > > >>> using an extra field.
> > > >>>
> > > >>
> > > > >> I understand this as well.  What I don't particularly like is that we are re-using
> > > > >> a list without really stating why it's now done this way.  Additionally, it's not really
> > > > >> the last thing that happens, so it seems kind of hacky...  If we need to add new
> > > > >> per-net initializers, we now need to make sure that the code is put in the right
> > > > >> place.  I'd just really like to have a cleaner solution...
> > > > > 
> > > > > Ok, got you. We could add a dedicated flag/bit for that then, if reusing the list is not
> > > > > clear enough. Or, as we are discussing in the other part of the thread, we could make it block
> > > > > and wait for the initialization, probably using some wait_queue. I'm still thinking about
> > > > > something along those lines, likely at a layer below sctp.
> > > > 
> > > 
> > > > I think if we do the above, the second process calling socket() will either find
> > > > the protosw or will try to load the module also.  I think either is ok: after
> > > > request_module returns we'll look at the protosw and will find things.
> > 
> > Seems so, yes. Nice.
> 
> I tested it and something is not right. I finished your patch and
> tested it with a flooder like:

This is the patch I used. Mostly just fixed a few typos and added error handling.

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 4345790ad326..8930046eaa1b 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1178,7 +1178,7 @@ static void sctp_v4_del_protocol(void)
 	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
 }
 
-static int __net_init sctp_net_init(struct net *net)
+static int __net_init sctp_defaults_init(struct net *net)
 {
 	int status;
 
@@ -1271,12 +1271,6 @@ static int __net_init sctp_net_init(struct net *net)
 
 	sctp_dbg_objcnt_init(net);
 
-	/* Initialize the control inode/socket for handling OOTB packets.  */
-	if ((status = sctp_ctl_sock_init(net))) {
-		pr_err("Failed to initialize the SCTP control sock\n");
-		goto err_ctl_sock_init;
-	}
-
 	/* Initialize the local address list. */
 	INIT_LIST_HEAD(&net->sctp.local_addr_list);
 	spin_lock_init(&net->sctp.local_addr_lock);
@@ -1292,9 +1286,6 @@ static int __net_init sctp_net_init(struct net *net)
 
 	return 0;
 
-err_ctl_sock_init:
-	sctp_dbg_objcnt_exit(net);
-	sctp_proc_exit(net);
 err_init_proc:
 	cleanup_sctp_mibs(net);
 err_init_mibs:
@@ -1303,15 +1294,12 @@ err_sysctl_register:
 	return status;
 }
 
-static void __net_exit sctp_net_exit(struct net *net)
+static void __net_exit sctp_defaults_exit(struct net *net)
 {
 	/* Free the local address list */
 	sctp_free_addr_wq(net);
 	sctp_free_local_addr_list(net);
 
-	/* Free the control endpoint.  */
-	inet_ctl_sock_destroy(net->sctp.ctl_sock);
-
 	sctp_dbg_objcnt_exit(net);
 
 	sctp_proc_exit(net);
@@ -1319,9 +1307,32 @@ static void __net_exit sctp_net_exit(struct net *net)
 	sctp_sysctl_net_unregister(net);
 }
 
-static struct pernet_operations sctp_net_ops = {
-	.init = sctp_net_init,
-	.exit = sctp_net_exit,
+static struct pernet_operations sctp_defaults_ops = {
+	.init = sctp_defaults_init,
+	.exit = sctp_defaults_exit,
+};
+
+static int __net_init sctp_ctrlsock_init(struct net *net)
+{
+	int status;
+
+	/* Initialize the control inode/socket for handling OOTB packets.  */
+	status = sctp_ctl_sock_init(net);
+	if (status)
+		pr_err("Failed to initialize the SCTP control sock\n");
+
+	return status;
+}
+
+static void __net_exit sctp_ctrlsock_exit(struct net *net)
+{
+	/* Free the control endpoint.  */
+	inet_ctl_sock_destroy(net->sctp.ctl_sock);
+}
+
+static struct pernet_operations sctp_ctrlsock_ops = {
+	.init = sctp_ctrlsock_init,
+	.exit = sctp_ctrlsock_exit,
 };
 
 /* Initialize the universe into something sensible.  */
@@ -1454,8 +1465,11 @@ static __init int sctp_init(void)
 	sctp_v4_pf_init();
 	sctp_v6_pf_init();
 
-	status = sctp_v4_protosw_init();
+	status = register_pernet_subsys(&sctp_defaults_ops);
+	if (status)
+		goto err_register_defaults;
 
+	status = sctp_v4_protosw_init();
 	if (status)
 		goto err_protosw_init;
 
@@ -1463,9 +1477,9 @@ static __init int sctp_init(void)
 	if (status)
 		goto err_v6_protosw_init;
 
-	status = register_pernet_subsys(&sctp_net_ops);
+	status = register_pernet_subsys(&sctp_ctrlsock_ops);
 	if (status)
-		goto err_register_pernet_subsys;
+		goto err_register_ctrlsock;
 
 	status = sctp_v4_add_protocol();
 	if (status)
@@ -1481,12 +1495,14 @@ out:
 err_v6_add_protocol:
 	sctp_v4_del_protocol();
 err_add_protocol:
-	unregister_pernet_subsys(&sctp_net_ops);
-err_register_pernet_subsys:
+	unregister_pernet_subsys(&sctp_ctrlsock_ops);
+err_register_ctrlsock:
 	sctp_v6_protosw_exit();
 err_v6_protosw_init:
 	sctp_v4_protosw_exit();
 err_protosw_init:
+	unregister_pernet_subsys(&sctp_defaults_ops);
+err_register_defaults:
 	sctp_v4_pf_exit();
 	sctp_v6_pf_exit();
 	sctp_sysctl_unregister();
@@ -1519,12 +1535,14 @@ static __exit void sctp_exit(void)
 	sctp_v6_del_protocol();
 	sctp_v4_del_protocol();
 
-	unregister_pernet_subsys(&sctp_net_ops);
+	unregister_pernet_subsys(&sctp_ctrlsock_ops);
 
 	/* Free protosw registrations */
 	sctp_v6_protosw_exit();
 	sctp_v4_protosw_exit();
 
+	unregister_pernet_subsys(&sctp_defaults_ops);
+
 	/* Unregister with socket layer. */
 	sctp_v6_pf_exit();
 	sctp_v4_pf_exit();


* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-10 18:35             ` Marcelo Ricardo Leitner
  2015-09-10 18:47               ` Marcelo Ricardo Leitner
@ 2015-09-10 19:14               ` Vlad Yasevich
  2015-09-10 19:42                 ` Marcelo Ricardo Leitner
  1 sibling, 1 reply; 18+ messages in thread
From: Vlad Yasevich @ 2015-09-10 19:14 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: netdev, Neil Horman, linux-sctp

On 09/10/2015 02:35 PM, Marcelo Ricardo Leitner wrote:
> On Thu, Sep 10, 2015 at 01:24:54PM -0300, Marcelo Ricardo Leitner wrote:
>> On Thu, Sep 10, 2015 at 11:50:06AM -0400, Vlad Yasevich wrote:
>>> On 09/10/2015 10:22 AM, Marcelo Ricardo Leitner wrote:
>>>> On 10-09-2015 10:24, Vlad Yasevich wrote:
>> ...
>>>>> Then you can order sctp_net_init() such that it happens first, then protosw registration
>>>>> happens, then control socket initialization happens, then inet protocol registration
>>>>> happens.
>>>>>
>>>>> This way, we are always guaranteed that by the time user calls socket(), protocol
>>>>> defaults are fully initialized.
>>>>
>>>> Okay, that works for module loading stage, but then how would we handle new netns's? We
>>>> have to create the control socket per netns and AFAICT sctp_net_init() is the only hook
>>>> called when a new netns is being created.
>>>>
>>>> Then if we move it to a workqueue that is scheduled by sctp_net_init(), we lose the ability
>>>> to handle its errors by propagating them through sctp_net_init()'s return value, not good.
>>>
>>> Here is kind of what I had in mind.  It's incomplete and completely untested (not even
>>> compiled), but good enough to describe the idea:
>> ...
>>
>> Ohh, ok now I get it, thanks. If having two pernet_subsys for a given
>> module is fine, that works for me. It's clearer and has no moment of
>> temporary failure.
>>
>> I can finish this patch if everybody agrees with it.
>>
>>>>>> I used the list pointer because that's null as that memory is entirely zeroed when alloced
>>>>>> and, after initialization, it's never null again. Works like a lock/condition without
>>>>>> using an extra field.
>>>>>>
>>>>>
>>>>> I understand this as well.  What I don't particularly like is that we are re-using
>>>>> a list without really stating why it's now done this way.  Additionally, it's not really
>>>>> the last thing that happens, so it seems kind of hacky...  If we need to add new
>>>>> per-net initializers, we now need to make sure that the code is put in the right
>>>>> place.  I'd just really like to have a cleaner solution...
>>>>
>>>> Ok, got you. We could add a dedicated flag/bit for that then, if reusing the list is not
>>>> clear enough. Or, as we are discussing in the other part of the thread, we could make it block
>>>> and wait for the initialization, probably using some wait_queue. I'm still thinking about
>>>> something along those lines, likely at a layer below sctp.
>>>>
>>>
>>> I think if we do the above, the second process calling socket() will either find
>>> the protosw or will try to load the module also.  I think either is ok: after
>>> request_module returns we'll look at the protosw and will find things.
>>
>> Seems so, yes. Nice.
> 
> I tested it and something is not right. I finished your patch and
> tested it with a flooder like:
>  # for j in {1..5}; do for i in {1234..1280}; do \
>        sctp_darn -H 192.168.122.147 -P $j$i -l & done & done
> 
... snip...
> 
> It seems that request_module will not serialize it as we wanted and we
> would be putting unexpected pressure on it, yet it fixes the original
> issue.

So, wouldn't the same issue exist when running the above with DCCP sockets?

> Maybe we can place a semaphore at inet_create(), protecting the
> request_module()s so only one socket can do it at a time and, after it
> is released, whoever was blocked on it re-checks if the module isn't
> already loaded before attempting again. It makes the loading of
> different modules slower, though, but I'm not sure if that's really a
> problem. Not many modules are loaded at the same time like that. What do
> you think? 

I think this is a different issue.  The fact that we keep trying to probe
the same module is silly.  Maybe a per-proto semaphore, so that SCTP doesn't
block DCCP, for example.

-vlad

> 
>   Marcelo
> 


* Re: [PATCH net] sctp: fix race on protocol/netns initialization
  2015-09-10 19:14               ` Vlad Yasevich
@ 2015-09-10 19:42                 ` Marcelo Ricardo Leitner
  2015-09-10 20:31                   ` [PATCH net v2] " Marcelo Ricardo Leitner
  0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-09-10 19:42 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, Neil Horman, linux-sctp

On 10-09-2015 16:14, Vlad Yasevich wrote:
> On 09/10/2015 02:35 PM, Marcelo Ricardo Leitner wrote:
>> On Thu, Sep 10, 2015 at 01:24:54PM -0300, Marcelo Ricardo Leitner wrote:
>>> On Thu, Sep 10, 2015 at 11:50:06AM -0400, Vlad Yasevich wrote:
>>>> On 09/10/2015 10:22 AM, Marcelo Ricardo Leitner wrote:
>>>>> On 10-09-2015 10:24, Vlad Yasevich wrote:
>>> ...
>>>>>> Then you can order sctp_net_init() such that it happens first, then protosw registration
>>>>>> happens, then control socket initialization happens, then inet protocol registration
>>>>>> happens.
>>>>>>
>>>>>> This way, we are always guaranteed that by the time user calls socket(), protocol
>>>>>> defaults are fully initialized.
>>>>>
>>>>> Okay, that works for module loading stage, but then how would we handle new netns's? We
>>>>> have to create the control socket per netns and AFAICT sctp_net_init() is the only hook
>>>>> called when a new netns is being created.
>>>>>
>>>>> Then if we move it to a workqueue that is scheduled by sctp_net_init(), we lose the ability
>>>>> to handle its errors by propagating them through sctp_net_init()'s return value, not good.
>>>>
>>>> Here is kind of what I had in mind.  It's incomplete and completely untested (not even
>>>> compiled), but good enough to describe the idea:
>>> ...
>>>
>>> Ohh, ok now I get it, thanks. If having two pernet_subsys for a given
>>> module is fine, that works for me. It's clearer and has no moment of
>>> temporary failure.
>>>
>>> I can finish this patch if everybody agrees with it.
>>>
>>>>>>> I used the list pointer because that's null as that memory is entirely zeroed when alloced
>>>>>>> and, after initialization, it's never null again. Works like a lock/condition without
>>>>>>> using an extra field.
>>>>>>>
>>>>>>
>>>>>> I understand this as well.  What I don't particularly like is that we are re-using
>>>>>> a list without really stating why it's now done this way.  Additionally, it's not really
>>>>>> the last thing that happens, so it seems kind of hacky...  If we need to add new
>>>>>> per-net initializers, we now need to make sure that the code is put in the right
>>>>>> place.  I'd just really like to have a cleaner solution...
>>>>>
>>>>> Ok, got you. We could add a dedicated flag/bit for that then, if reusing the list is not
>>>>> clear enough. Or, as we are discussing in the other part of the thread, we could make it block
>>>>> and wait for the initialization, probably using some wait_queue. I'm still thinking about
>>>>> something along those lines, likely at a layer below sctp.
>>>>>
>>>>
>>>> I think if we do the above, the second process calling socket() will either find
>>>> the protosw or will try to load the module also.  I think either is ok: after
>>>> request_module returns we'll look at the protosw and will find things.
>>>
>>> Seems so, yes. Nice.
>>
>> I tested it and something is not right. I finished your patch and
>> tested it with a flooder like:
>>   # for j in {1..5}; do for i in {1234..1280}; do \
>>         sctp_darn -H 192.168.122.147 -P $j$i -l & done & done
>>
> ... snip...
>>
>> It seems that request_module will not serialize it as we wanted and we
>> would be putting unexpected pressure on it, yet it fixes the original
>> issue.
>
> So, wouldn't the same issue exist when running the above with DCCP sockets?

Pretty much, yes.

>> Maybe we can place a semaphore at inet_create(), protecting the
>> request_module()s so only one socket can do it at a time and, after it
>> is released, whoever was blocked on it re-checks if the module isn't
>> already loaded before attempting again. It makes the loading of
>> different modules slower, though, but I'm not sure if that's really a
>> problem. Not many modules are loaded at the same time like that. What do
>> you think?
>
> I think this is a different issue.  The fact that we keep trying to probe

Agreed. I'll post this one as v2 while we continue with the 
request_module part.

> the same module is silly.  Maybe a per-proto semaphore, so that SCTP doesn't
> block DCCP, for example.

Can be, yes. It just has to be dynamic, otherwise we would need something
like 256 semaphores that sit unused for most of the system's lifetime.
I'll see what I can do here.
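
Just to illustrate what "dynamic" could look like, here is a small userspace
sketch that allocates a per-protocol lock on first use. Everything named
sketch_* is hypothetical, pthread mutexes stand in for kernel semaphores, and
the table still holds 256 pointers (though no locks) up front. It also never
frees a lock, which a real implementation would have to deal with.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define SKETCH_NPROTO 256

/* Protects the pointer table below, not the module loads themselves. */
static pthread_mutex_t sketch_table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t *sketch_proto_lock[SKETCH_NPROTO];

/*
 * Return the lock for a protocol number, allocating it on first use.
 * Returns NULL for an out-of-range protocol or on allocation failure.
 */
static pthread_mutex_t *sketch_get_proto_lock(unsigned int proto)
{
	pthread_mutex_t *m;

	if (proto >= SKETCH_NPROTO)
		return NULL;

	pthread_mutex_lock(&sketch_table_lock);
	m = sketch_proto_lock[proto];
	if (!m) {
		m = malloc(sizeof(*m));
		if (m && pthread_mutex_init(m, NULL) != 0) {
			free(m);
			m = NULL;
		}
		/* Stays NULL on failure, so a later call can retry. */
		sketch_proto_lock[proto] = m;
	}
	pthread_mutex_unlock(&sketch_table_lock);

	return m;
}
```

A caller would take the lock for, say, protocol 132 (SCTP) around its
request_module() attempt, while a caller for protocol 33 (DCCP) gets a
different lock and is not blocked by it.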

Thanks,
Marcelo


* [PATCH net v2] sctp: fix race on protocol/netns initialization
  2015-09-10 19:42                 ` Marcelo Ricardo Leitner
@ 2015-09-10 20:31                   ` Marcelo Ricardo Leitner
  2015-09-11 22:00                     ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Ricardo Leitner @ 2015-09-10 20:31 UTC (permalink / raw)
  To: netdev; +Cc: Vlad Yasevich, Neil Horman, linux-sctp, David Laight,
	David Miller

Consider the case where the sctp module is unloaded and is being requested
because a user is creating an sctp socket.

During initialization, sctp will add the new protocol type and then
initialize pernet subsys:

        status = sctp_v4_protosw_init();
        if (status)
                goto err_protosw_init;

        status = sctp_v6_protosw_init();
        if (status)
                goto err_v6_protosw_init;

        status = register_pernet_subsys(&sctp_net_ops);

The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
is possible for userspace to create SCTP sockets as if the module were
already fully loaded. If that happens, one of the possible effects is
that we will have readers for net->sctp.local_addr_list list earlier
than expected and sctp_net_init() does not take precautions while
dealing with that list, leading to a potential panic but not limited to
that, as sctp_sock_init() will copy a bunch of blank/partially
initialized values from net->sctp.

The race happens like this:

     CPU 0                           |  CPU 1
  socket()                           |
   __sock_create                     | socket()
    inet_create                      |  __sock_create
     list_for_each_entry_rcu(        |
        answer, &inetsw[sock->type], |
        list) {                      |   inet_create
      /* no hits */                  |
     if (unlikely(err)) {            |
      ...                            |
      request_module()               |
      /* socket creation is blocked  |
       * until the module is loaded  |
       */                            |
       sctp_init                     |
        sctp_v4_protosw_init         |
         inet_register_protosw       |
          list_add_rcu(&p->list,     |
                       last_perm);   |
                                     |  list_for_each_entry_rcu(
                                     |     answer, &inetsw[sock->type],
        sctp_v6_protosw_init         |     list) {
                                     |     /* hit, so assumes protocol
                                     |      * is already loaded
                                     |      */
                                     |  /* socket creation continues
                                     |   * before netns is initialized
                                     |   */
        register_pernet_subsys       |

Simply inverting the initialization order between
register_pernet_subsys() and sctp_v4_protosw_init() is not possible
because register_pernet_subsys() will create a control sctp socket, so
the protocol must be already visible by then. Deferring the socket
creation to a work-queue is not good, especially because we lose the
ability to handle its errors.

So, as suggested by Vlad, the fix is to split netns initialization into
two stages: defaults and control socket, so that the defaults are
already loaded by the time we register the protocol, while control socket
initialization is kept at the same moment it is today.

Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace")
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---

Notes:
    As discussed, a supplementary fix that serializes socket
    creation/module loading and avoids failures when creating sockets
    while the module is still being loaded is welcome, but it is not
    required for this patch and will be posted separately.
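
For reference, the ordering described in the commit message and its matching
unwind path can be modelled in plain userspace C as below. This is only a
sketch: the sketch_* names, the step tags and the fail_at injection are all
made up, and the three steps stand in for registering the defaults, the
protosw entries and the control socket, in that order.

```c
#include <assert.h>
#include <string.h>

enum { STEP_DEFAULTS, STEP_PROTOSW, STEP_CTRLSOCK };

/* Trace of what ran: upper case = init step, lower case = its undo. */
static char sketch_log[16];

static void sketch_note(char tag)
{
	size_t n = strlen(sketch_log);

	if (n + 1 < sizeof(sketch_log)) {
		sketch_log[n] = tag;
		sketch_log[n + 1] = '\0';
	}
}

/* Run one init step, failing if it is the one selected by fail_at. */
static int sketch_step(int step, int fail_at, char tag)
{
	if (step == fail_at)
		return -1;
	sketch_note(tag);
	return 0;
}

/* Defaults first, then protosw, then control socket; unwind in reverse. */
static int sketch_init(int fail_at)
{
	int status;

	sketch_log[0] = '\0';

	status = sketch_step(STEP_DEFAULTS, fail_at, 'D');
	if (status)
		goto err_defaults;

	status = sketch_step(STEP_PROTOSW, fail_at, 'P');
	if (status)
		goto err_protosw;

	status = sketch_step(STEP_CTRLSOCK, fail_at, 'C');
	if (status)
		goto err_ctrlsock;

	return 0;

err_ctrlsock:
	sketch_note('p');	/* undo protosw */
err_protosw:
	sketch_note('d');	/* undo defaults */
err_defaults:
	return status;
}
```

The property being modelled is that by the time 'P' makes the protocol
visible, 'D' has already completed, and that a failure at any step undoes
only the steps that actually ran.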

 net/sctp/protocol.c | 64 ++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 41 insertions(+), 23 deletions(-)

diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 4345790ad3266c353eeac5398593c2a9ce4effda..8930046eaa1b9023c3d06ab7875a19c42b7d3a57 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1178,7 +1178,7 @@ static void sctp_v4_del_protocol(void)
 	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
 }
 
-static int __net_init sctp_net_init(struct net *net)
+static int __net_init sctp_defaults_init(struct net *net)
 {
 	int status;
 
@@ -1271,12 +1271,6 @@ static int __net_init sctp_net_init(struct net *net)
 
 	sctp_dbg_objcnt_init(net);
 
-	/* Initialize the control inode/socket for handling OOTB packets.  */
-	if ((status = sctp_ctl_sock_init(net))) {
-		pr_err("Failed to initialize the SCTP control sock\n");
-		goto err_ctl_sock_init;
-	}
-
 	/* Initialize the local address list. */
 	INIT_LIST_HEAD(&net->sctp.local_addr_list);
 	spin_lock_init(&net->sctp.local_addr_lock);
@@ -1292,9 +1286,6 @@ static int __net_init sctp_net_init(struct net *net)
 
 	return 0;
 
-err_ctl_sock_init:
-	sctp_dbg_objcnt_exit(net);
-	sctp_proc_exit(net);
 err_init_proc:
 	cleanup_sctp_mibs(net);
 err_init_mibs:
@@ -1303,15 +1294,12 @@ err_sysctl_register:
 	return status;
 }
 
-static void __net_exit sctp_net_exit(struct net *net)
+static void __net_exit sctp_defaults_exit(struct net *net)
 {
 	/* Free the local address list */
 	sctp_free_addr_wq(net);
 	sctp_free_local_addr_list(net);
 
-	/* Free the control endpoint.  */
-	inet_ctl_sock_destroy(net->sctp.ctl_sock);
-
 	sctp_dbg_objcnt_exit(net);
 
 	sctp_proc_exit(net);
@@ -1319,9 +1307,32 @@ static void __net_exit sctp_net_exit(struct net *net)
 	sctp_sysctl_net_unregister(net);
 }
 
-static struct pernet_operations sctp_net_ops = {
-	.init = sctp_net_init,
-	.exit = sctp_net_exit,
+static struct pernet_operations sctp_defaults_ops = {
+	.init = sctp_defaults_init,
+	.exit = sctp_defaults_exit,
+};
+
+static int __net_init sctp_ctrlsock_init(struct net *net)
+{
+	int status;
+
+	/* Initialize the control inode/socket for handling OOTB packets.  */
+	status = sctp_ctl_sock_init(net);
+	if (status)
+		pr_err("Failed to initialize the SCTP control sock\n");
+
+	return status;
+}
+
+static void __net_exit sctp_ctrlsock_exit(struct net *net)
+{
+	/* Free the control endpoint.  */
+	inet_ctl_sock_destroy(net->sctp.ctl_sock);
+}
+
+static struct pernet_operations sctp_ctrlsock_ops = {
+	.init = sctp_ctrlsock_init,
+	.exit = sctp_ctrlsock_exit,
 };
 
 /* Initialize the universe into something sensible.  */
@@ -1454,8 +1465,11 @@ static __init int sctp_init(void)
 	sctp_v4_pf_init();
 	sctp_v6_pf_init();
 
-	status = sctp_v4_protosw_init();
+	status = register_pernet_subsys(&sctp_defaults_ops);
+	if (status)
+		goto err_register_defaults;
 
+	status = sctp_v4_protosw_init();
 	if (status)
 		goto err_protosw_init;
 
@@ -1463,9 +1477,9 @@ static __init int sctp_init(void)
 	if (status)
 		goto err_v6_protosw_init;
 
-	status = register_pernet_subsys(&sctp_net_ops);
+	status = register_pernet_subsys(&sctp_ctrlsock_ops);
 	if (status)
-		goto err_register_pernet_subsys;
+		goto err_register_ctrlsock;
 
 	status = sctp_v4_add_protocol();
 	if (status)
@@ -1481,12 +1495,14 @@ out:
 err_v6_add_protocol:
 	sctp_v4_del_protocol();
 err_add_protocol:
-	unregister_pernet_subsys(&sctp_net_ops);
-err_register_pernet_subsys:
+	unregister_pernet_subsys(&sctp_ctrlsock_ops);
+err_register_ctrlsock:
 	sctp_v6_protosw_exit();
 err_v6_protosw_init:
 	sctp_v4_protosw_exit();
 err_protosw_init:
+	unregister_pernet_subsys(&sctp_defaults_ops);
+err_register_defaults:
 	sctp_v4_pf_exit();
 	sctp_v6_pf_exit();
 	sctp_sysctl_unregister();
@@ -1519,12 +1535,14 @@ static __exit void sctp_exit(void)
 	sctp_v6_del_protocol();
 	sctp_v4_del_protocol();
 
-	unregister_pernet_subsys(&sctp_net_ops);
+	unregister_pernet_subsys(&sctp_ctrlsock_ops);
 
 	/* Free protosw registrations */
 	sctp_v6_protosw_exit();
 	sctp_v4_protosw_exit();
 
+	unregister_pernet_subsys(&sctp_defaults_ops);
+
 	/* Unregister with socket layer. */
 	sctp_v6_pf_exit();
 	sctp_v4_pf_exit();
-- 
2.4.3


* Re: [PATCH net v2] sctp: fix race on protocol/netns initialization
  2015-09-10 20:31                   ` [PATCH net v2] " Marcelo Ricardo Leitner
@ 2015-09-11 22:00                     ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2015-09-11 22:00 UTC (permalink / raw)
  To: marcelo.leitner; +Cc: netdev, vyasevich, nhorman, linux-sctp, David.Laight

From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Thu, 10 Sep 2015 17:31:15 -0300

> Consider the case where the sctp module is unloaded and is being requested
> because a user is creating an sctp socket.
> 
> During initialization, sctp will add the new protocol type and then
> initialize pernet subsys:
 ...
> The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
> is possible for userspace to create SCTP sockets as if the module were
> already fully loaded. If that happens, one possible effect is that
> net->sctp.local_addr_list gets readers earlier than expected, and
> sctp_net_init() takes no precautions while dealing with that list, which
> can lead to a panic. The damage is not limited to that, as
> sctp_sock_init() will also copy a bunch of blank/partially initialized
> values from net->sctp.
> 
> The race happens like this:
 ...
> Simply inverting the initialization order between
> register_pernet_subsys() and sctp_v4_protosw_init() is not possible
> because register_pernet_subsys() will create a control sctp socket, so
> the protocol must already be visible by then. Deferring the socket
> creation to a work-queue is not good, especially because we lose the
> ability to handle its errors.
> 
> So, as suggested by Vlad, the fix is to split netns initialization into
> two stages: defaults and control socket, so that the defaults are
> already loaded by the time we register the protocol, while control socket
> initialization is kept at the same point where it is today.
> 
> Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace")
> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

Applied and queued up for -stable, thanks.
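
For readers following the ordering argument in the quoted changelog, here is a minimal userspace sketch of the two-stage initialization and its goto-based unwinding. It is not kernel code: `struct init_model`, `enum fail_step`, `model_init()` and `invariant_holds()` are hypothetical names made up for illustration, and the boolean flags merely stand in for the effects of register_pernet_subsys(&sctp_defaults_ops), sctp_v{4,6}_protosw_init() and register_pernet_subsys(&sctp_ctrlsock_ops).

```c
#include <stdbool.h>

/* Hypothetical model state; each flag stands in for one init stage. */
struct init_model {
	bool defaults_ready;   /* per-netns defaults are initialized */
	bool protosw_visible;  /* userspace can create SCTP sockets */
	bool ctrlsock_ready;   /* per-netns control socket exists */
};

/* Which stage should fail in this simulated run (assumption for testing). */
enum fail_step { FAIL_NONE, FAIL_DEFAULTS, FAIL_PROTOSW, FAIL_CTRLSOCK };

/* Property the split aims for: no visible protocol without defaults. */
static bool invariant_holds(const struct init_model *m)
{
	return !m->protosw_visible || m->defaults_ready;
}

/* Mirrors the patched ordering: defaults, then protosw, then ctrlsock. */
static int model_init(struct init_model *m, enum fail_step fail)
{
	int status;

	status = (fail == FAIL_DEFAULTS) ? -1 : 0;
	if (status)
		goto err_register_defaults;
	m->defaults_ready = true;

	status = (fail == FAIL_PROTOSW) ? -1 : 0;
	if (status)
		goto err_protosw_init;
	m->protosw_visible = true;

	status = (fail == FAIL_CTRLSOCK) ? -1 : 0;
	if (status)
		goto err_register_ctrlsock;
	m->ctrlsock_ready = true;

	return 0;

	/* Unwind in reverse order, so the protocol is hidden before the
	 * defaults are torn down.
	 */
err_register_ctrlsock:
	m->protosw_visible = false;
err_protosw_init:
	m->defaults_ready = false;
err_register_defaults:
	return status;
}
```

Calling `model_init()` with each `fail_step` value and checking `invariant_holds()` afterwards exercises every unwind label. The sketch only models ordering; it says nothing about the locking or RCU details of the real code.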

end of thread (last message: 2015-09-11 22:00 UTC)

Thread overview: 18+ messages
2015-09-09 20:03 [PATCH net] sctp: fix race on protocol/netns initialization Marcelo Ricardo Leitner
2015-09-09 20:30 ` Vlad Yasevich
2015-09-09 21:06   ` Marcelo Ricardo Leitner
2015-09-10 13:24     ` Vlad Yasevich
2015-09-10 14:22       ` Marcelo Ricardo Leitner
2015-09-10 15:50         ` Vlad Yasevich
2015-09-10 16:24           ` Marcelo Ricardo Leitner
2015-09-10 18:35             ` Marcelo Ricardo Leitner
2015-09-10 18:47               ` Marcelo Ricardo Leitner
2015-09-10 19:14               ` Vlad Yasevich
2015-09-10 19:42                 ` Marcelo Ricardo Leitner
2015-09-10 20:31                   ` [PATCH net v2] " Marcelo Ricardo Leitner
2015-09-11 22:00                     ` David Miller
2015-09-10  0:16 ` [PATCH net] " David Miller
2015-09-10 12:54   ` Marcelo Ricardo Leitner
2015-09-10 13:02     ` David Laight
2015-09-10 14:36       ` Marcelo Ricardo Leitner
2015-09-10 15:03         ` David Laight
