From: Krister Johansen <kjlx@templeofstupid.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Krister Johansen <kjlx@templeofstupid.com>,
	Jamal Hadi Salim <jhs@mojatatu.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.
Date: Mon, 3 Oct 2016 23:39:08 -0700
Message-ID: <20161004063908.GB2638@templeofstupid.com>
In-Reply-To: <CAM_iQpX1a1=3PmMAwMRDXpL3qjAskQ990FdxgyCg4T5p4=8G7Q@mail.gmail.com>

Hi Cong,

Thanks for the feedback.

On Mon, Oct 03, 2016 at 11:22:33AM -0700, Cong Wang wrote:
> On Sat, Oct 1, 2016 at 8:13 PM, Krister Johansen
> <kjlx@templeofstupid.com> wrote:
> > A tc_action_ops structure is visible as soon as it is placed in the
> > act_base list.  When tcf_register_action adds an item to this list and
> > drops act_mod_lock, registration is not complete until
> > register_pernet_subsys() finishes.
> 
> Hmm, good catch, but does the fix have to be so complicated?

There were two reasons that the patch I submitted was more complicated
than your proposal.  The first is simply my own lack of knowledge.  I
didn't see many other net subsystems that held locks across the call to
register_pernet_subsys().  I avoided doing so out of caution / paranoia.

The other reason for blocking if a register_pernet_subsys() was already
pending is the behavior of this code when the lookup fails.  The code in
tcf_action_init_1() calls request_module() when tc_lookup_action_n()
fails.  In the cases that I observed, this could lead to hundreds of
modprobe processes running for essentially the same few modules.  Only
one of these calls will succeed.

Since the call to request_module() will sleep until the modprobe process
exits, it didn't seem unreasonable to block other threads in the same
code path.  Instead of blocking on a redundant modprobe call, it blocks
pending the completion of a modprobe that's already in progress.
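
To make the idea concrete, here is a rough userspace sketch of the
lookup states involved.  It is not the code from the patch: every name
below (toy_ops, toy_lookup_kind, and so on) is invented for
illustration, the counter stands in for request_module(), and the
"pending" case only reports that a caller would have to wait instead of
actually sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for tc_action_ops; not the kernel structure. */
struct toy_ops {
	const char *kind;
	bool pernet_pending;	/* on the list, but registration unfinished */
};

enum toy_lookup { TOY_FOUND, TOY_PENDING, TOY_MISSING };

#define TOY_MAX 8
static struct toy_ops *toy_base[TOY_MAX];	/* plays the role of act_base */
static size_t toy_count;
static int toy_modprobe_calls;	/* counts simulated request_module() calls */

/* Make the ops visible on the list before its setup has completed. */
static void toy_register_begin(struct toy_ops *ops)
{
	assert(ops != NULL && toy_count < TOY_MAX);
	ops->pernet_pending = true;
	toy_base[toy_count++] = ops;
}

/* Mark the ops as fully initialized. */
static void toy_register_finish(struct toy_ops *ops)
{
	assert(ops != NULL);
	ops->pernet_pending = false;
}

static enum toy_lookup toy_lookup_kind(const char *kind)
{
	for (size_t i = 0; i < toy_count; i++) {
		if (strcmp(toy_base[i]->kind, kind) == 0)
			return toy_base[i]->pernet_pending ?
				TOY_PENDING : TOY_FOUND;
	}
	return TOY_MISSING;
}

/* Returns true if the caller may use the action right now. */
static bool toy_action_init(const char *kind)
{
	switch (toy_lookup_kind(kind)) {
	case TOY_FOUND:
		return true;
	case TOY_PENDING:
		/* A real implementation would sleep until the load finishes. */
		return false;
	case TOY_MISSING:
	default:
		toy_modprobe_calls++;	/* stands in for request_module() */
		return false;
	}
}
```

The point of the third state is that a caller who finds a
partially registered entry neither uses it nor starts another module
load for it.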

I admit that the patch I submitted didn't close this window entirely,
but in the tests that I ran I was able to see the number of concurrent
modprobe processes go from dozens down to just a few.

> How about moving register_pernet_subsys() under act_mod_lock?
> Similar is needed for unregister too of course. This also means
> we need to convert act_mod_lock to a mutex which allows blocking.
> Fortunately, we don't have to take act_mod_lock in any atomic context.

If it's permissible to hold act_mod_lock across the call to
register_pernet_subsys(), then perhaps this could be simplified to
use mutex_lock_interruptible() rather than RCU locking.  The blocking
lock would prevent other operations from triggering a modprobe until
the outstanding load completes.  However, the downside is that any
request_module() would block all other lookups.  My attempt to get
around that problem was to record on the action ops whether a pernet
operation was in progress.
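
As a rough illustration of that downside, the userspace sketch below
holds a single global lock for the whole of a registration and shows
that a lookup for an unrelated, already registered action cannot
proceed in the meantime.  This is a toy model, not kernel code: the
demo_* names are invented, the lock is a C11 atomic_flag used as a
try-lock, and DEMO_WOULD_BLOCK marks the point where a real mutex_lock()
would sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy try-lock standing in for a mutex-based act_mod_lock. */
static atomic_flag demo_mod_lock = ATOMIC_FLAG_INIT;

static bool demo_trylock(void)
{
	return !atomic_flag_test_and_set(&demo_mod_lock);
}

static void demo_unlock(void)
{
	atomic_flag_clear(&demo_mod_lock);
}

#define DEMO_MAX 8
static const char *demo_base[DEMO_MAX];	/* registered action kinds */
static size_t demo_count;

enum demo_result { DEMO_FOUND, DEMO_MISSING, DEMO_WOULD_BLOCK };

static enum demo_result demo_lookup(const char *kind)
{
	enum demo_result res = DEMO_MISSING;

	if (!demo_trylock())
		return DEMO_WOULD_BLOCK;	/* a real mutex_lock() sleeps here */
	for (size_t i = 0; i < demo_count; i++) {
		if (strcmp(demo_base[i], kind) == 0) {
			res = DEMO_FOUND;
			break;
		}
	}
	demo_unlock();
	return res;
}

/*
 * Register `kind` with the lock held for the entire operation.  `during`,
 * if non-NULL, runs where the slow per-namespace setup would take place,
 * so it observes what other callers would see at that moment.
 */
static bool demo_register(const char *kind, void (*during)(void))
{
	assert(kind != NULL);
	if (!demo_trylock())
		return false;
	if (demo_count == DEMO_MAX) {
		demo_unlock();
		return false;
	}
	demo_base[demo_count++] = kind;
	if (during)
		during();
	demo_unlock();
	return true;
}

/* Helper for the usage example: look up "mirred" mid-registration. */
static enum demo_result demo_seen;

static void demo_probe_mirred(void)
{
	demo_seen = demo_lookup("mirred");
}
```

Registering "mirred" first and then calling
demo_register("gact", demo_probe_mirred) leaves demo_seen set to
DEMO_WOULD_BLOCK, even though "mirred" was fully registered before the
second registration began.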

> Please try the attached patch. I also convert the read path to RCU
> to avoid a possible deadlock. A quick test shows no lockdep splat.

I'll give it a try, but it may take me a few days to report back with
results.  In the meantime, let's try to reach consensus on an acceptable
solution.

Thanks again,

-K
