All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Tranchetti <stranche@codeaurora.org>
To: davem@davemloft.net, netdev@vger.kernel.org
Cc: Sean Tranchetti <stranche@codeaurora.org>,
	Pravin B Shelar <pshelar@ovn.org>,
	Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Subject: [PATCH net] genetlink: take netlink table lock when (un)registering
Date: Fri, 26 Jun 2020 18:31:03 -0600	[thread overview]
Message-ID: <1593217863-2964-1-git-send-email-stranche@codeaurora.org> (raw)

A potential deadlock can occur during registering or unregistering a new
generic netlink family between the main nl_table_lock and the cb_lock where
each thread wants the lock held by the other, as demonstrated below.

1) Thread 1 is performing a netlink_bind() operation on a socket. As part
   of this call, it will call netlink_lock_table(), incrementing the
   nl_table_users count to 1.
2) Thread 2 is registering (or unregistering) a genl_family via the
   genl_(un)register_family() API. The cb_lock semaphore will be taken for
   writing.
3) Thread 1 will call genl_bind() as part of the bind operation to handle
   subscribing to GENL multicast groups at the request of the user. It will
   attempt to take the cb_lock semaphore for reading, but it will fail and
   be scheduled away, waiting for Thread 2 to finish the write.
4) Thread 2 will call netlink_table_grab() during the (un)registration
   call. However, as Thread 1 has incremented nl_table_users, it will not
   be able to proceed, and both threads will be stuck waiting for the
   other.

To avoid this scenario, the locks should be acquired in the same order by
both threads. Since both the register and unregister functions need to take
the nl_table_lock in their processing, it makes sense to explicitly acquire
them before they lock the genl_mutex and the cb_lock. In unregistering, no
other change is needed aside from this locking change.

Registering a family requires more ancilary operations, such as memory
allocation. Unfortunately, much of this allocation must be performed inside
of the genl locks to ensure internal synchronization, so they must also be
performed inside of the nl_table_lock where sleeping is not allowed. As a
result, the allocation types must be changed to GFP_ATOMIC.

Fixes: def3117493ea ("genl: Allow concurrent genl callbacks")
Cc: Pravin B Shelar <pshelar@ovn.org>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
---
 net/netlink/genetlink.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 55ee680..79e3b1b 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -155,14 +155,14 @@ static int genl_allocate_reserve_groups(int n_groups, int *first_id)
 			size_t nlen = new_longs * sizeof(unsigned long);
 
 			if (mc_groups == &mc_group_start) {
-				new_groups = kzalloc(nlen, GFP_KERNEL);
+				new_groups = kzalloc(nlen, GFP_ATOMIC);
 				if (!new_groups)
 					return -ENOMEM;
 				mc_groups = new_groups;
 				*mc_groups = mc_group_start;
 			} else {
 				new_groups = krealloc(mc_groups, nlen,
-						      GFP_KERNEL);
+						      GFP_ATOMIC);
 				if (!new_groups)
 					return -ENOMEM;
 				mc_groups = new_groups;
@@ -229,7 +229,6 @@ static int genl_validate_assign_mc_groups(struct genl_family *family)
 	if (family->netnsok) {
 		struct net *net;
 
-		netlink_table_grab();
 		rcu_read_lock();
 		for_each_net_rcu(net) {
 			err = __netlink_change_ngroups(net->genl_sock,
@@ -245,10 +244,9 @@ static int genl_validate_assign_mc_groups(struct genl_family *family)
 			}
 		}
 		rcu_read_unlock();
-		netlink_table_ungrab();
 	} else {
-		err = netlink_change_ngroups(init_net.genl_sock,
-					     mc_groups_longs * BITS_PER_LONG);
+		err = __netlink_change_ngroups(init_net.genl_sock,
+					       mc_groups_longs * BITS_PER_LONG);
 	}
 
 	if (groups_allocated && err) {
@@ -264,7 +262,6 @@ static void genl_unregister_mc_groups(const struct genl_family *family)
 	struct net *net;
 	int i;
 
-	netlink_table_grab();
 	rcu_read_lock();
 	for_each_net_rcu(net) {
 		for (i = 0; i < family->n_mcgrps; i++)
@@ -328,6 +325,10 @@ int genl_register_family(struct genl_family *family)
 	if (err)
 		return err;
 
+	/* Acquire netlink table lock before any GENL-specific locks to ensure
+	 * sync with any netlink operations making calls into the GENL code.
+	 */
+	netlink_table_grab();
 	genl_lock_all();
 
 	if (genl_family_find_byname(family->name)) {
@@ -354,7 +355,7 @@ int genl_register_family(struct genl_family *family)
 	if (family->maxattr && !family->parallel_ops) {
 		family->attrbuf = kmalloc_array(family->maxattr + 1,
 						sizeof(struct nlattr *),
-						GFP_KERNEL);
+						GFP_ATOMIC);
 		if (family->attrbuf == NULL) {
 			err = -ENOMEM;
 			goto errout_locked;
@@ -363,7 +364,7 @@ int genl_register_family(struct genl_family *family)
 		family->attrbuf = NULL;
 
 	family->id = idr_alloc_cyclic(&genl_fam_idr, family,
-				      start, end + 1, GFP_KERNEL);
+				      start, end + 1, GFP_ATOMIC);
 	if (family->id < 0) {
 		err = family->id;
 		goto errout_free;
@@ -374,6 +375,7 @@ int genl_register_family(struct genl_family *family)
 		goto errout_remove;
 
 	genl_unlock_all();
+	netlink_table_ungrab();
 
 	/* send all events */
 	genl_ctrl_event(CTRL_CMD_NEWFAMILY, family, NULL, 0);
@@ -389,6 +391,7 @@ int genl_register_family(struct genl_family *family)
 	kfree(family->attrbuf);
 errout_locked:
 	genl_unlock_all();
+	netlink_table_ungrab();
 	return err;
 }
 EXPORT_SYMBOL(genl_register_family);
@@ -403,13 +406,21 @@ int genl_register_family(struct genl_family *family)
  */
 int genl_unregister_family(const struct genl_family *family)
 {
+	/* Acquire netlink table lock before any GENL-specific locks to ensure
+	 * sync with any netlink operations making calls into the GENL code.
+	 */
+	netlink_table_grab();
 	genl_lock_all();
 
 	if (!genl_family_find_byid(family->id)) {
 		genl_unlock_all();
+		netlink_table_ungrab();
 		return -ENOENT;
 	}
 
+	/* Netlink table lock will be removed by this function. No other
+	 * functions that require it should be placed after this point.
+	 */
 	genl_unregister_mc_groups(family);
 
 	idr_remove(&genl_fam_idr, family->id);
-- 
1.9.1


             reply	other threads:[~2020-06-27  0:31 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-27  0:31 Sean Tranchetti [this message]
2020-06-27  7:58 ` [genetlink] 9eb5d9390b: BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/rwsem.c kernel test robot
2020-06-27  7:58   ` kernel test robot
2020-06-27 18:55 ` [PATCH net] genetlink: take netlink table lock when (un)registering Cong Wang
2020-06-29 20:18   ` stranche
2020-06-30  7:00     ` Johannes Berg
2020-06-28  0:26 ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1593217863-2964-1-git-send-email-stranche@codeaurora.org \
    --to=stranche@codeaurora.org \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=pshelar@ovn.org \
    --cc=subashab@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.