From: Brenden Blanco <bblanco@plumgrid.com>
To: Saeed Mahameed <saeedm@dev.mellanox.co.il>
Cc: Tom Herbert <tom@herbertland.com>,
Tariq Toukan <tariqt@mellanox.com>,
"David S. Miller" <davem@davemloft.net>,
Linux Kernel Network Developers <netdev@vger.kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Tariq Toukan <ttoukan.linux@gmail.com>,
Or Gerlitz <gerlitz.or@gmail.com>
Subject: Re: [PATCH] net/mlx4_en: protect ring->xdp_prog with rcu_read_lock
Date: Tue, 30 Aug 2016 18:50:59 -0700
Message-ID: <20160831015058.GA30198@gmail.com>
In-Reply-To: <CALzJLG8WGA5+cbyMd2f8Y2Qpcx=b72r2hSZ7Vn7+N-o3BzBvZw@mail.gmail.com>

On Tue, Aug 30, 2016 at 12:35:58PM +0300, Saeed Mahameed wrote:
> On Mon, Aug 29, 2016 at 8:46 PM, Tom Herbert <tom@herbertland.com> wrote:
> > On Mon, Aug 29, 2016 at 8:55 AM, Brenden Blanco <bblanco@plumgrid.com> wrote:
> >> On Mon, Aug 29, 2016 at 05:59:26PM +0300, Tariq Toukan wrote:
> >>> Hi Brenden,
> >>>
> >>> The solution direction should be XDP specific that does not hurt the
> >>> regular flow.
> >> An rcu_read_lock is _already_ taken for _every_ packet. This is 1/64th of
>
> In other words "let's add new small speed bump, we already have
> plenty ahead, so why not slow down now anyway".
>
> Every single new instruction hurts performance, in this case maybe you
> are right, maybe we won't feel any performance
> impact, but that doesn't mean it is ok to do this.
Actually, I will make a stronger assertion. Unless your .config contains
CONFIG_PREEMPT=y (which most distros do not set) or something like
DEBUG_ATOMIC_SLEEP (which selects PREEMPT_COUNT), the code in this patch
compiles to a no-op. Therefore, adding the protections that you mention
below will be _slower_ than the code already proposed.
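
To make the no-op claim concrete, here is a minimal userspace sketch. It is
not the kernel's rcupdate.h: every model_* name and the MODEL_PREEMPT_COUNT
switch are hypothetical stand-ins, assumed only to mirror the idea that the
read-side markers have empty bodies when preempt counting is configured out.

```c
#include <stddef.h>

/*
 * Hypothetical model, not kernel code. MODEL_PREEMPT_COUNT stands in for
 * a config that enables preempt counting.
 */
#ifdef MODEL_PREEMPT_COUNT
static int model_preempt_count;
static inline void model_rcu_read_lock(void)   { model_preempt_count++; }
static inline void model_rcu_read_unlock(void) { model_preempt_count--; }
#else
/* No preempt counting: the read-side markers have empty bodies. */
static inline void model_rcu_read_lock(void)   { }
static inline void model_rcu_read_unlock(void) { }
#endif

/* Sum packet lengths with the read-side markers around the loop. */
static inline size_t model_poll_rcu(const size_t *len, size_t n)
{
	size_t total = 0;
	size_t i;

	model_rcu_read_lock();
	for (i = 0; i < n; i++)
		total += len[i];
	model_rcu_read_unlock();
	return total;
}

/* The same loop without the markers, for comparison. */
static inline size_t model_poll_norcu(const size_t *len, size_t n)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		total += len[i];
	return total;
}
```

Built without MODEL_PREEMPT_COUNT, an optimizing compiler would be expected
to emit the same code for both functions, which is the property the driver
comparison further down is checking.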
>
>
> >> that.
> >>>
> >>> On 26/08/2016 11:38 PM, Brenden Blanco wrote:
> >>> >Depending on the preempt mode, the bpf_prog stored in xdp_prog may be
> >>> >freed despite the use of call_rcu inside bpf_prog_put. The situation is
> >>> >possible when running in PREEMPT_RCU=y mode, for instance, since the rcu
> >>> >callback for destroying the bpf prog can run even during the bh handling
> >>> >in the mlx4 rx path.
> >>> >
> >>> >Several options were considered before this patch was settled on:
> >>> >
> >>> >Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all
> >>> >of the rings are updated with the new program.
> >>> >This approach has the disadvantage that as the number of rings
> >>> >increases, the speed of update will slow down significantly due to
> >>> >napi_synchronize's msleep(1).
> >>> I prefer this option as it doesn't hurt the data path. A delay in a
> >>> control command can be tolerated.
> >>> >Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh.
> >>> >The action of the bpf_prog_put_bh would be to then call bpf_prog_put
> >>> >later. Those drivers that consume a bpf prog in a bh context (like mlx4)
> >>> >would then use the bpf_prog_put_bh instead when the ring is up. This has
> >>> >the problem of complexity, in maintaining proper refcnts and rcu lists,
> >>> >and would likely be harder to review. In addition, this approach to
> >>> >freeing must be exclusive with other frees of the bpf prog, for instance
> >>> >a _bh prog must not be referenced from a prog array that is consumed by
> >>> >a non-_bh prog.
> >>> >
> >>> >The placement of rcu_read_lock in this patch is functionally the same as
> >>> >putting an rcu_read_lock in napi_poll. Actually doing so could be a
> >>> >potentially controversial change, but would bring the implementation in
> >>> >line with sk_busy_loop (though of course the nature of those two paths
> >>> >is substantially different), and would also avoid future copy/paste
> >>> >problems with future supporters of XDP. Still, this patch does not take
> >>> >that opinionated option.
> >>> So you decided to add a lock for all non-XDP flows, which are 99% of
> >>> the cases.
> >>> We should avoid this.
> >> The whole point of rcu_read_lock architecture is to be taken in the fast
> >> path. There won't be a performance impact from this patch.
> >
> > +1, this is nothing at all like a spinlock and really this should be
> > just like any other rcu like access.
> >
> > Brenden, tracking down how the structure is freed needed a few steps,
> > please make sure the RCU requirements are well documented. Also, I'm
> > still not a fan of using xchg to set the program, seems that a lock
> > could be used in that path.
> >
> > Thanks,
> > Tom
>
> Sorry folks I am with Tariq on this, you can't just add a single
> instruction which is only valid/needed for 1% of the use cases
> to the driver's general data path, even if it was as cheap as one cpu cycle!
How about 0?
$ diff mlx4_en.ko.norcu.s mlx4_en.ko.rcu.s | wc -l
0
>
> Let me try to suggest something:
> instead of taking the rcu_read_lock for the whole
> mlx4_en_process_rx_cq, we can minimize to XDP code path only
> by double checking xdp_prog (non-protected check followed by a
> protected check inside mlx4 XDP critical path).
>
> i.e instead of:
>
> rcu_read_lock();
>
> xdp_prog = ring->xdp_prog;
>
> //__Do lots of non-XDP related stuff__
>
> if (xdp_prog) {
> //Do XDP magic ..
> }
> //__Do more of non-XDP related stuff__
>
> rcu_read_unlock();
>
>
> We can minimize it to XDP critical path only:
>
> //Non protected xdp_prog dereference.
> if (xdp_prog) {
> rcu_read_lock();
> //Protected dereference to ring->xdp_prog
> xdp_prog = ring->xdp_prog;
> if(unlikely(!xdp_prog)) goto unlock;
The addition of this branch and extra deref is now slowing down the xdp
path compared to the current proposal.
> //Do XDP magic ..
>
> unlock:
> rcu_read_unlock();
> }
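
For anyone comparing the two shapes side by side, a minimal userspace sketch
follows. It is not the mlx4 code: struct model_ring, model_load_prog() and
the sketch_rcu_* markers are hypothetical, and the loads counter exists only
to make the extra dereference visible.

```c
#include <stddef.h>

typedef int (*model_prog_fn)(int pkt);

struct model_ring {
	model_prog_fn prog;  /* stands in for ring->xdp_prog */
	unsigned int loads;  /* counts reads of prog, illustration only */
};

/* Empty stand-ins for the read-side markers (hypothetical). */
static inline void sketch_rcu_read_lock(void)   { }
static inline void sketch_rcu_read_unlock(void) { }

/* Read the program pointer and record that a load happened. */
static model_prog_fn model_load_prog(struct model_ring *ring)
{
	ring->loads++;
	return ring->prog;
}

/* Shape of the current proposal: one protected load, one branch. */
static int model_rx_single(struct model_ring *ring, int pkt)
{
	model_prog_fn prog;
	int ret = pkt;

	sketch_rcu_read_lock();
	prog = model_load_prog(ring);
	if (prog)
		ret = prog(pkt);
	sketch_rcu_read_unlock();
	return ret;
}

/* Shape of the suggestion: unprotected check, then protected re-check. */
static int model_rx_double(struct model_ring *ring, int pkt)
{
	model_prog_fn prog;
	int ret = pkt;

	if (model_load_prog(ring)) {
		sketch_rcu_read_lock();
		prog = model_load_prog(ring);
		if (prog)
			ret = prog(pkt);
		sketch_rcu_read_unlock();
	}
	return ret;
}

/* Toy program: reject odd packet ids, pass even ones through. */
static int model_drop_odd(int pkt)
{
	return (pkt & 1) ? -1 : pkt;
}
```

With a program attached, the double-check variant reads the pointer twice
and takes two branches per packet, while the single-lock variant reads it
once. With no program attached, each variant does one load.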

Thread overview: 13+ messages
2016-08-26 20:38 [PATCH] net/mlx4_en: protect ring->xdp_prog with rcu_read_lock Brenden Blanco
2016-08-26 21:01 ` Brenden Blanco
2016-08-29 14:59 ` Tariq Toukan
2016-08-29 15:55 ` Brenden Blanco
2016-08-29 17:46 ` Tom Herbert
2016-08-30 9:35 ` Saeed Mahameed
2016-08-31 1:50 ` Brenden Blanco [this message]
2016-09-01 22:59 ` Saeed Mahameed
2016-09-01 23:30 ` Tom Herbert
2016-09-02 17:50 ` Brenden Blanco
2016-09-02 18:01 ` Brenden Blanco
2016-09-02 18:13 ` Brenden Blanco
2016-09-02 19:14 ` Tom Herbert