From mboxrd@z Thu Jan 1 00:00:00 1970 From: Zoltan Kiss Subject: Re: RING_HAS_UNCONSUMED_REQUESTS oddness Date: Thu, 6 Mar 2014 21:39:40 +0000 Message-ID: <5318EB1C.10500@citrix.com> References: <5318987C.3030303@citrix.com> <1394121221.13270.10.camel@hastur.hellion.org.uk> <5318A2D8.3090808@citrix.com> <20140306173057.GK11475@deinos.phlegethon.org> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: Received: from mail6.bemta14.messagelabs.com ([193.109.254.103]) by lists.xen.org with esmtp (Exim 4.72) (envelope-from ) id 1WLg14-0000ha-16 for xen-devel@lists.xenproject.org; Thu, 06 Mar 2014 21:39:46 +0000 In-Reply-To: <20140306173057.GK11475@deinos.phlegethon.org> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Tim Deegan Cc: "xen-devel@lists.xenproject.org" , Wei Liu , Ian Campbell List-Id: xen-devel@lists.xenproject.org On 06/03/14 17:30, Tim Deegan wrote: > At 16:31 +0000 on 06 Mar (1394119880), Zoltan Kiss wrote: >> On 06/03/14 15:53, Ian Campbell wrote: >>> On Thu, 2014-03-06 at 15:47 +0000, Zoltan Kiss wrote: >>>> By my understanding, there is no way rsp could be smaller than req, so >>>> there is no point having this. Am I missing something? >>> >>> It happens during wraparound, i.e. after req has wrapped but rsp hasn't >>> yet. >> >> The name of the macro suggest we are interested whether the ring has >> unconsumed requests, and netback uses it that way. The answer to that >> question is req_prod - req_cons. And it works if prod wrapped but cons >> didn't. > > Yes. > >> rsp calculates the number of "consumed but not responded" requests (it >> also works well if req_cons wrapped but rsp_prod_pvt didn't), then >> subtract it from the ring size. > > That is indeed an odd thing to check, since it seems like it could only > be relevant if the request producer overran the response producer. > It's been there in one form or another since the original ring.h, > and RING_REQUEST_CONS_OVERFLOW does something similar. > > I can't remember the original reasoning, and so I'm reluctant to > suggest removing it without some more eyes on the code... I've added the following printk before the "req < rsp" part: if (rsp < req) \ pr_err("req %u rsp %u req_prod %u req_cons %u rsp_prod_pvt %u\n", req, rsp, (_r)->sring->req_prod, (_r)->req_cons, (_r)->rsp_prod_pvt); \ And it gave me such results: xen_netback:xenvif_zerocopy_callback: req 4294967279 rsp 52 req_prod 1770663942 req_cons 1770663959 rsp_prod_pvt 1770663755 So it can happen that req_prod is behind req_cons, sometimes even with 17! But it always happen in this callback of my new grant mapping series, which runs outside the NAPI instance. My theory why this can happen: - callback reads req_prod - frontend writes it - backend picks it up, and consumes those slots - callback reads req_cons So req can be near UINT_MAX if you call this macro outside the backend. The only place where the actual return value of this macro matters is xenvif_tx_build_gops, and it should be correct there. At other places we are only looking for the fact whether the ring has unconsumed requests or not. If prod is smaller than cons, we clearly read a wrong value. I think what we can do: 1. try again until its correct 2. just return a non-zero value, it shouldn't cause too much trouble if we say yes here 3. we can't see rsp_cons, so try to figure out if the ring is full of consumed but not responded requests, and return zero then, otherwise a positive value. That's what we do know. Does this make sense? Should we rather go option 1? Should I post a comment patch to document this, and spare a few hours for future generations? :) Zoli