From: Vlad Buslov <vladbu@nvidia.com>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Linux Kernel Network Developers" <netdev@vger.kernel.org>,
	"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
	"David Miller" <davem@davemloft.net>,
	"Jamal Hadi Salim" <jhs@mojatatu.com>,
	"Jiri Pirko" <jiri@resnulli.us>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>
Subject: Re: [PATCH RFC 2/4] net: sched: fix err handler in tcf_action_init()
Date: Sat, 3 Apr 2021 13:01:14 +0300
Message-ID: <ygnhy2dzadqt.fsf@nvidia.com>
In-Reply-To: <CAM_iQpXRfHQ=Hzhon=ggjPJGjfS1CCkM6iV8oJ3iHZiTpnJFmw@mail.gmail.com>


On Sat 03 Apr 2021 at 02:14, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Mar 31, 2021 at 9:41 AM Vlad Buslov <vladbu@nvidia.com> wrote:
>>
>> With recent changes that separated action module load from action
>> initialization, the tcf_action_init() error handling code was modified
>> to manually release the loaded modules if loading/initialization of any
>> further action in the same batch failed. Consider the case when all modules
>> loaded successfully and some of the actions were initialized before one of
>> them failed in its init handler. In this case, for all previous actions the
>> module will be released twice by the error handler: first by the loop
>> that manually calls module_put() for all ops, and a second time by the
>> action destroy code that puts the module after destroying the action.
>
> This is really strange. Isn't tc_action_load_ops() paired with module_put()
> under 'err_mod'? And the one in tcf_action_destroy() paired with
> tcf_action_init_1()? Is it the one below which causes the imbalance?
>
>         /* module count goes up only when brand new policy is created
>          * if it exists and is only bound to in a_o->init() then
>          * ACT_P_CREATED is not returned (a zero is).
>          */
>         if (err != ACT_P_CREATED)
>                 module_put(a_o->owner);

This problem is not related to the reference counting imbalance on
action change, which is addressed in the previous commit. The issue is
that tcf_action_init_1() doesn't take another reference to the module.
It expects the caller to get the reference before calling init and
"takes over" the reference in case of success (i.e. the action instance
now owns the reference, which will be released when the action instance
is destroyed).
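
To make the hand-off concrete, here is a minimal userspace sketch of that
contract. It is not kernel code: every toy_* name is hypothetical and
only models the rule described above (the caller takes the reference,
init takes it over on success, destroy releases it).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct module and struct tc_action. */
struct toy_module { int refcnt; };
struct toy_action { struct toy_module *owner; };

/* Takes NO reference of its own. On success the action becomes the
 * owner of the reference the caller already holds.
 */
static int toy_action_init(struct toy_action *act, struct toy_module *mod,
			   int should_fail)
{
	assert(mod->refcnt > 0);	/* caller must already hold a reference */
	if (should_fail)
		return -1;		/* reference stays with the caller */
	act->owner = mod;		/* reference now belongs to the action */
	return 0;
}

/* Releases the reference the action took over, if any. */
static void toy_action_destroy(struct toy_action *act)
{
	if (act->owner) {
		act->owner->refcnt--;
		act->owner = NULL;
	}
}

/* Caller side: get, init, and put only when init did not take over.
 * Returns the module refcount after the whole create/destroy cycle.
 */
static int toy_create_then_destroy(int should_fail)
{
	struct toy_module mod = { 0 };
	struct toy_action act = { NULL };

	mod.refcnt++;					/* caller's get */
	if (toy_action_init(&act, &mod, should_fail) < 0)
		mod.refcnt--;				/* failure: caller puts */
	toy_action_destroy(&act);			/* success: destroy puts */
	return mod.refcnt;
}
```

In this model the balance returns to zero on both paths, because exactly
one party releases the reference in each case.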

So, the following happens in the reproduction provided in the commit
message when executing the command "tc actions add action simple
sdata \"1\" index 1 action simple sdata \"2\" index 2":

1. tcf_action_init() is called with a batch of two actions of the same
type; no module references are held and the 'actions' array is empty:

act_simple refcnt balance = 0
actions[] = {}

2. tc_action_load_ops() is called for the first action:

act_simple refcnt balance = +1
actions[] = {}

3. tc_action_load_ops() is called for the second action:

act_simple refcnt balance = +2
actions[] = {}

4. tcf_action_init_1() is called for the first action and succeeds; the
action instance is assigned to the 'actions' array and now owns one of
the two module references:

act_simple refcnt balance = +2
actions[] = { [0]=act1 }

5. tcf_action_init_1() fails for the second action, the 'actions' array
is not changed, goto err:

act_simple refcnt balance = +2
actions[] = { [0]=act1 }

6. tcf_action_destroy() is called for the 'actions' array, the last
reference to the first action is released, and tcf_action_destroy_1()
calls module_put() for the action's module:

act_simple refcnt balance = +1
actions[] = {}

7. The err_mod loop starts iterating over the ops array and executes
module_put() for the first action's ops:

act_simple refcnt balance = 0
actions[] = {}

8. The err_mod loop executes module_put() for the second action's ops:

act_simple refcnt balance = -1
actions[] = {}
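
The sequence above can be replayed with a small userspace model. This is
only a sketch: the toy_* names and the TOY_BATCH constant are
hypothetical, and the skip_taken_over mode (clearing an ops[] slot once
an action owns its reference) is just one assumed way to avoid the
double put, not necessarily how the patch implements it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BATCH 2	/* two actions, as in the reproduction */

struct toy_module { int refcnt; };	/* stand-in for act_simple */
struct toy_action { struct toy_module *owner; };

/* Replays a batch add in which init fails for action number fail_at
 * (0-based) and returns the resulting module refcount balance.
 * skip_taken_over == 0: the err_mod loop puts every ops[] entry.
 * skip_taken_over != 0: entries already owned by an action are skipped.
 */
static int toy_batch_add(int fail_at, int skip_taken_over)
{
	struct toy_module mod = { 0 };
	struct toy_module *ops[TOY_BATCH] = { NULL };
	struct toy_action actions[TOY_BATCH] = { { NULL } };
	int i;

	assert(fail_at >= 0 && fail_at < TOY_BATCH);

	/* load ops for every action in the batch: one reference each */
	for (i = 0; i < TOY_BATCH; i++) {
		mod.refcnt++;
		ops[i] = &mod;
	}

	/* init succeeds for actions before fail_at; each takes over its ref */
	for (i = 0; i < fail_at; i++) {
		actions[i].owner = ops[i];
		if (skip_taken_over)
			ops[i] = NULL;
	}

	/* err: destroy initialized actions; each puts the ref it owns */
	for (i = 0; i < TOY_BATCH; i++) {
		if (actions[i].owner) {
			actions[i].owner->refcnt--;
			actions[i].owner = NULL;
		}
	}

	/* err_mod: put the references still tracked in ops[] */
	for (i = 0; i < TOY_BATCH; i++) {
		if (ops[i])
			ops[i]->refcnt--;
	}

	return mod.refcnt;
}
```

In this model toy_batch_add(1, 0) returns -1, matching the final balance
in the walk-through, while toy_batch_add(1, 1) returns 0.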


The goal of my fix is to not unconditionally release the module
reference for successfully initialized actions, because this is already
handled by the action destroy code. I hope this explanation clarifies
things.

Regards,
Vlad
