From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B897BC06510 for ; Tue, 2 Jul 2019 19:45:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 8796021850 for ; Tue, 2 Jul 2019 19:45:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726824AbfGBTp1 (ORCPT ); Tue, 2 Jul 2019 15:45:27 -0400 Received: from Chamillionaire.breakpoint.cc ([193.142.43.52]:44660 "EHLO Chamillionaire.breakpoint.cc" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726635AbfGBTp1 (ORCPT ); Tue, 2 Jul 2019 15:45:27 -0400 Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.89) (envelope-from ) id 1hiOiS-0007EE-OH; Tue, 02 Jul 2019 21:45:24 +0200 From: Florian Westphal To: Cc: Florian Westphal , Thomas Jarosch , Juliana Rodrigueiro Subject: [PATCH nf] netfilter: nfnetlink: avoid deadlock due to synchronous request_module Date: Tue, 2 Jul 2019 21:41:40 +0200 Message-Id: <20190702194140.23764-1-fw@strlen.de> X-Mailer: git-send-email 2.21.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: netfilter-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netfilter-devel@vger.kernel.org Thomas and Juliana report a deadlock when running: (rmmod nf_conntrack_netlink/xfrm_user) conntrack -e NEW -E & modprobe -v xfrm_user They provided following analysis: conntrack -e NEW -E netlink_bind() netlink_lock_table() -> increases "nl_table_users" nfnetlink_bind() # does not unlock the table as it's locked by netlink_bind() __request_module() call_usermodehelper_exec() This triggers "modprobe nf_conntrack_netlink" from kernel, netlink_bind() won't return until modprobe process is done. "modprobe xfrm_user": xfrm_user_init() register_pernet_subsys() -> grab pernet_ops_rwsem .. netlink_table_grab() calls schedule() as "nl_table_users" is non-zero so modprobe is blocked because netlink_bind() increased nl_table_users while also holding pernet_ops_rwsem. "modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink: ctnetlink_init() register_pernet_subsys() -> blocks on "pernet_ops_rwsem" thanks to xfrm_user module both modprobe processes wait on one another -- neither can make progress. Switch netlink_bind() to "nowait" modprobe -- this releases the netlink table lock, which then allows both modprobe instances to complete. Reported-by: Thomas Jarosch Reported-by: Juliana Rodrigueiro Signed-off-by: Florian Westphal --- net/netfilter/nfnetlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index 92077d459109..4abbb452cf6c 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -578,7 +578,7 @@ static int nfnetlink_bind(struct net *net, int group) ss = nfnetlink_get_subsys(type << 8); rcu_read_unlock(); if (!ss) - request_module("nfnetlink-subsys-%d", type); + request_module_nowait("nfnetlink-subsys-%d", type); return 0; } #endif -- 2.21.0