netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	jakub.kicinski@netronome.com, xiyou.wangcong@gmail.com,
	jiri@resnulli.us, netdev@vger.kernel.org
Subject: Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds
Date: Wed, 3 Jan 2018 20:34:18 +0200	[thread overview]
Message-ID: <20180103195231-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <127b09e0-2acb-f220-d525-e61ad34da794@gmail.com>

On Wed, Jan 03, 2018 at 09:46:15AM -0800, John Fastabend wrote:
> On 01/03/2018 07:50 AM, Michael S. Tsirkin wrote:
> > On Tue, Jan 02, 2018 at 04:25:03PM -0800, John Fastabend wrote:
> >>>
> >>> More generally, what makes this usage safe?
> >>> Is there a way to formalize it at the API level?
> >>>
> >>
> >> Right I think these are good questions. I think the ptr_ring API should
> >> allow a peek operation to be used without a lock. The user has to ensure
> >> they only use it as a hint and if its dereferenced the user needs to
> >> ensure the object is not free'd from some other codepath while it is
> >> being dereferenced. The existing API seems to match this.
> >>
> >> This is how I used it in pfifo_fast expecting the above to be true. The
> >> API allows for false negatives which _should_ be OK if the user is expecting
> >> this. Alternatively, we could make it false positives if you want and
> >> that would also work for me considering this case is hit very rarely.
> > 
> > By now I'm confused by which are positives and which are negatives :)
> > OK so the guarantees we want would be:
> > 
> > - empty can return false if ring is empty.
> >   caller must re-check with consumer lock taken
> 
> Correct.
> 
> > - if multiple threads call it, only one thread
> 
> What is "it" here, __skb_array_empty() presumably.
> 
> >   is guaranteed that empty will not return true
> >   if ring is non empty.
> >   after detecting that ring is not empty,
> >   this thread shall ....
> > 
> The guarantee is even weaker.
> 
>   - if __skb_array_empty() returns true, either the
>     ring is empty OR a consumer operation is in
>     progress.

Well I'm not sure that's guaranteed actually, in that
in progress isn't all that well defined on multi-core
systems.

E.g. write of NULL could bypass index write and
empty will return true on another CPU even though
both completed on our CPU.

I still think our use is ok, I'm just still having
trouble formulating the exact rules.

> 
> For pfifo_fast this is OK because some thread is making
> progress at this point.
>
>
> Note, even if multiple threads
> are calling __skb_array_empty() there is no guarantee
> any of them will return false even if the ring has
> elements.
>
> > can you help me fill in the blank please?
> > 
> 
> Did that help?
> 
> > 
> > 
> > 
> > 
> >>>>> John, others, could you pls confirm it's not too bad performance-wise?
> >>>>> I'll then split it up properly and re-post.
> >>>>>
> >>>>
> >>>> I haven't benchmarked it but in the dequeue case taking a lock for
> >>>> every priority even when its empty seems unneeded.
> >>>
> >>> Well it does seem to make the code more comprehensible and more robust.
> >>>
> >>
> >> Its a trade-off between performance and robustness.
> >>
> >>> But OK -  I was looking at fixing the unlocked empty API to make sure it
> >>> actually does what it's supposed to. I posted a draft earlier in this
> >>> thread, it needs to be looked at in depth to figure out whether it can
> >>> ever give false negatives or positives, and document the results.
> >>>
> >>>
> >>
> >> I'll look at it. But I still think keeping a lockless version makes sense
> >> for many use cases.
> > 
> > Fine. Just let's try to document what are the guarantees.
> > 
> 
> Yep. Guarantees should be (on ptr_ring implementation and similar
> on skb_array implementation)
> 
>   - if __ptr_ring_empty returns true, the ring may be empty
>     user must use operation with consumer lock to guarantee the
>     ring is empty. Note, when run with concurrent consumer operations
>     it is possible to return true when ring has elements. This
>     occurs when ring consumer head rolls over ring size. Also,
>     producer operations may put object in the ring concurrently
>     making empty check incorrect. To avoid this use locked
>     ptr_ring_empty() API.
> 
>   - if __ptr_ring_empty returns false, the ring may have elements.
>     Possibly, scenario is consumer operation and __ptr_ring_empty
>     operation run concurrently causing false to be returned when
>     ring is empty. To avoid this use locked ptr_ring_empty() API.
> 
> >>>
> >>>>> -->
> >>>>>
> >>>>> net: don't misuse ptr_ring_peek
> >>>>>
> >>>>> ptr_ring_peek only claims to be safe if the result is never
> >>>>> dereferenced, which isn't the case for its use in sch_generic.
> >>>>> Add locked API variants and use the bh one here.
> >>>>>
> >>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >>>>>
> >>>>> ---
> >>>>>
> >>>>
> >>>> [...]
> >>>>
> >>>>> --- a/net/sched/sch_generic.c
> >>>>> +++ b/net/sched/sch_generic.c
> >>>>> @@ -659,7 +659,7 @@ static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
> >>>>>  	for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) {
> >>>>>  		struct skb_array *q = band2list(priv, band);
> >>>>>  
> >>>>> -		skb = __skb_array_peek(q);
> >>>>> +		skb = skb_array_peek_bh(q);
> >>>>
> >>>> Ah I should have added a comment here. For now peek() is only used from
> >>>> locking qdiscs. So peek and consume/produce operations will never happen
> >>>> in parallel. In this case we should never hit the false negative case with
> >>>> my patch or the out of bounds reference without my patch.
> > 
> > OK so this is the part I missed. Can you add a comment please?
> > 
> > 
> 
> Yes, working on a documentation patch set to address misleading and missing
> comments/documentation in qdisc layer now.
> 
> >>>> Doing a peek() op without qdisc lock is a bit problematic anyways. With
> >>>> current code another cpu could consume the skb and free it. Either we
> >>>> can ensure a single consumer runs at a time on an array (not the same as
> >>>> qdisc maybe) or just avoid peek operations in this case. My current plan
> >>>> was to just avoid peek() ops altogether, they seem unnecessary with the
> >>>> types of qdiscs I want to be build.
> >>>>
> >>>> Thanks,
> >>>> John
> >>>
> >>> For the lockless qdisc, for net-next, do we need the patch above until you fix that though?
> >>>
> >>
> >> No, I think after this patch (net: ptr_ring: otherwise safe empty checks...) is
> >> applied we do not need any additional fixes in net-next. Future work will
> >> require the above patch (the one you provided) though so its useful work.
> > 
> > The one that avoids allocating extra memory?
> > 
> 
> Let me try summarize again.
> 
> We need the extra memory slot patch if we expose the
> __ptr_ring_empty/__skb_array_empty API. Otherwise we hit the out
> of bounds issues. And for qdisc layer the __skb_array_empty() API
> call is useful so we should not remove it.
> 
> In addition to the qdisc layer using the __skb_array_empty() API if
> we need a peek() API that works without the qdisc lock then your patch,
> the one adding the locked peek API, will be needed. It is not needed
> though until we have a qdisc that would use it without the lock.
> 
> In summary we keep the extra allocation to make the __skb_array_empty() API
> usable. And hold off exposing the skb_array_peek()/ptr_ring_peek() until we
> have a user for it. We can roll your proposed patch into any series we come
> up with for a new/updated qdisc.
> 
> > 
> >> I'll do another review of the false positive case though to be sure the
> >> current code is OK wrt handling false positives and any potential stalls.
> > 
> > 
> > Thanks!
> > 
> >>>>>  	}
> >>>>>  
> >>>>>  	return skb;
> >>>>>

  reply	other threads:[~2018-01-03 18:34 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-28  3:50 [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds John Fastabend
2018-01-02 16:52 ` David Miller
2018-01-02 17:01   ` Michael S. Tsirkin
2018-01-02 17:17     ` Michael S. Tsirkin
2018-01-02 21:27       ` John Fastabend
2018-01-02 23:12         ` Michael S. Tsirkin
2018-01-03  0:25           ` John Fastabend
2018-01-03 15:50             ` Michael S. Tsirkin
2018-01-03 17:46               ` John Fastabend
2018-01-03 18:34                 ` Michael S. Tsirkin [this message]
2018-01-02 18:33     ` David Miller
2018-01-02 16:53 ` Michael S. Tsirkin
2018-01-02 17:01   ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180103195231-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=davem@davemloft.net \
    --cc=jakub.kicinski@netronome.com \
    --cc=jiri@resnulli.us \
    --cc=john.fastabend@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).