From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Michael S. Tsirkin" Subject: Re: [net-next PATCH] net: ptr_ring: otherwise safe empty checks can overrun array bounds Date: Wed, 3 Jan 2018 20:34:18 +0200 Message-ID: <20180103195231-mutt-send-email-mst@kernel.org> References: <20171228035024.14699.69453.stgit@john-Precision-Tower-5810> <20180102.115219.1101472320429215260.davem@davemloft.net> <20180102190107-mutt-send-email-mst@kernel.org> <20180102191329-mutt-send-email-mst@kernel.org> <149cba1c-e51a-e3fa-8105-34b95ad2e582@gmail.com> <20180103004619-mutt-send-email-mst@kernel.org> <457d384f-5cd1-d52d-f6e9-4c9ebb1f8d09@gmail.com> <20180103165249-mutt-send-email-mst@kernel.org> <127b09e0-2acb-f220-d525-e61ad34da794@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: David Miller , jakub.kicinski@netronome.com, xiyou.wangcong@gmail.com, jiri@resnulli.us, netdev@vger.kernel.org To: John Fastabend Return-path: Received: from mx1.redhat.com ([209.132.183.28]:37268 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751134AbeACSeU (ORCPT ); Wed, 3 Jan 2018 13:34:20 -0500 Content-Disposition: inline In-Reply-To: <127b09e0-2acb-f220-d525-e61ad34da794@gmail.com> Sender: netdev-owner@vger.kernel.org List-ID: On Wed, Jan 03, 2018 at 09:46:15AM -0800, John Fastabend wrote: > On 01/03/2018 07:50 AM, Michael S. Tsirkin wrote: > > On Tue, Jan 02, 2018 at 04:25:03PM -0800, John Fastabend wrote: > >>> > >>> More generally, what makes this usage safe? > >>> Is there a way to formalize it at the API level? > >>> > >> > >> Right I think these are good questions. I think the ptr_ring API should > >> allow a peek operation to be used without a lock. The user has to ensure > >> they only use it as a hint and if its dereferenced the user needs to > >> ensure the object is not free'd from some other codepath while it is > >> being dereferenced. The existing API seems to match this. > >> > >> This is how I used it in pfifo_fast expecting the above to be true. The > >> API allows for false negatives which _should_ be OK if the user is expecting > >> this. Alternatively, we could make it false positives if you want and > >> that would also work for me considering this case is hit very rarely. > > > > By now I'm confused by which are positives and which are negatives :) > > OK so the guarantees we want would be: > > > > - empty can return false if ring is empty. > > caller must re-check with consumer lock taken > > Correct. > > > - if multiple threads call it, only one thread > > What is "it" here, __skb_array_empty() presumably. > > > is guaranteed that empty will not return true > > if ring is non empty. > > after detecting that ring is not empty, > > this thread shall .... > > > The guarantee is even weaker. > > - if __skb_array_empty() returns true, either the > ring is empty OR a consumer operation is in > progress. Well I'm not sure that's guaranteed actually, in that in progress isn't all that well defined on multi-core systems. E.g. write of NULL could bypass index write and empty will return true on another CPU even though both completed on our CPU. I still think our use is ok, I'm just still having trouble formulating the exact rules. > > For pfifo_fast this is OK because some thread is making > progress at this point. > > > Note, even if multiple threads > are calling __skb_array_empty() there is no guarantee > any of them will return false even if the ring has > elements. > > > can you help me fill in the blank please? > > > > Did that help? > > > > > > > > > > >>>>> John, others, could you pls confirm it's not too bad performance-wise? > >>>>> I'll then split it up properly and re-post. > >>>>> > >>>> > >>>> I haven't benchmarked it but in the dequeue case taking a lock for > >>>> every priority even when its empty seems unneeded. > >>> > >>> Well it does seem to make the code more comprehensible and more robust. > >>> > >> > >> Its a trade-off between performance and robustness. > >> > >>> But OK - I was looking at fixing the unlocked empty API to make sure it > >>> actually does what it's supposed to. I posted a draft earlier in this > >>> thread, it needs to be looked at in depth to figure out whether it can > >>> ever give false negatives or positives, and document the results. > >>> > >>> > >> > >> I'll look at it. But I still think keeping a lockless version makes sense > >> for many use cases. > > > > Fine. Just let's try to document what are the guarantees. > > > > Yep. Guarantees should be (on ptr_ring implementation and similar > on skb_array implementation) > > - if __ptr_ring_empty returns true, the ring may be empty > user must use operation with consumer lock to guarantee the > ring is empty. Note, when run with concurrent consumer operations > it is possible to return true when ring has elements. This > occurs when ring consumer head rolls over ring size. Also, > producer operations may put object in the ring concurrently > making empty check incorrect. To avoid this use locked > ptr_ring_empty() API. > > - if __ptr_ring_empty returns false, the ring may have elements. > Possibly, scenario is consumer operation and __ptr_ring_empty > operation run concurrently causing false to be returned when > ring is empty. To avoid this use locked ptr_ring_empty() API. > > >>> > >>>>> --> > >>>>> > >>>>> net: don't misuse ptr_ring_peek > >>>>> > >>>>> ptr_ring_peek only claims to be safe if the result is never > >>>>> dereferenced, which isn't the case for its use in sch_generic. > >>>>> Add locked API variants and use the bh one here. > >>>>> > >>>>> Signed-off-by: Michael S. Tsirkin > >>>>> > >>>>> --- > >>>>> > >>>> > >>>> [...] > >>>> > >>>>> --- a/net/sched/sch_generic.c > >>>>> +++ b/net/sched/sch_generic.c > >>>>> @@ -659,7 +659,7 @@ static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc) > >>>>> for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) { > >>>>> struct skb_array *q = band2list(priv, band); > >>>>> > >>>>> - skb = __skb_array_peek(q); > >>>>> + skb = skb_array_peek_bh(q); > >>>> > >>>> Ah I should have added a comment here. For now peek() is only used from > >>>> locking qdiscs. So peek and consume/produce operations will never happen > >>>> in parallel. In this case we should never hit the false negative case with > >>>> my patch or the out of bounds reference without my patch. > > > > OK so this is the part I missed. Can you add a comment please? > > > > > > Yes, working on a documentation patch set to address misleading and missing > comments/documentation in qdisc layer now. > > >>>> Doing a peek() op without qdisc lock is a bit problematic anyways. With > >>>> current code another cpu could consume the skb and free it. Either we > >>>> can ensure a single consumer runs at a time on an array (not the same as > >>>> qdisc maybe) or just avoid peek operations in this case. My current plan > >>>> was to just avoid peek() ops altogether, they seem unnecessary with the > >>>> types of qdiscs I want to be build. > >>>> > >>>> Thanks, > >>>> John > >>> > >>> For the lockless qdisc, for net-next, do we need the patch above until you fix that though? > >>> > >> > >> No, I think after this patch (net: ptr_ring: otherwise safe empty checks...) is > >> applied we do not need any additional fixes in net-next. Future work will > >> require the above patch (the one you provided) though so its useful work. > > > > The one that avoids allocating extra memory? > > > > Let me try summarize again. > > We need the extra memory slot patch if we expose the > __ptr_ring_empty/__skb_array_empty API. Otherwise we hit the out > of bounds issues. And for qdisc layer the __skb_array_empty() API > call is useful so we should not remove it. > > In addition to the qdisc layer using the __skb_array_empty() API if > we need a peek() API that works without the qdisc lock then your patch, > the one adding the locked peek API, will be needed. It is not needed > though until we have a qdisc that would use it without the lock. > > In summary we keep the extra allocation to make the __skb_array_empty() API > usable. And hold off exposing the skb_array_peek()/ptr_ring_peek() until we > have a user for it. We can roll your proposed patch into any series we come > up with for a new/updated qdisc. > > > > >> I'll do another review of the false positive case though to be sure the > >> current code is OK wrt handling false positives and any potential stalls. > > > > > > Thanks! > > > >>>>> } > >>>>> > >>>>> return skb; > >>>>>