Date: Mon, 10 Jun 2013 12:43:19 -0400
From: Konrad Rzeszutek Wilk
To: Jan Beulich
Cc: david.vrabel@citrix.com, roger.pau@citrix.com, xen-devel, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] xen/blkback: Check for insane amounts of request on the ring.
Message-ID: <20130610164319.GB24467@phenom.dumpdata.com>
References: <1370375826-7311-1-git-send-email-konrad.wilk@oracle.com> <20130607201140.GA22115@phenom.dumpdata.com> <51B6126302000078000DCC1C@nat28.tlf.novell.com>
In-Reply-To: <51B6126302000078000DCC1C@nat28.tlf.novell.com>

On Mon, Jun 10, 2013 at 04:52:35PM +0100, Jan Beulich wrote:
> >>> On 07.06.13 at 22:11, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jun 04, 2013 at 03:57:06PM -0400, Konrad Rzeszutek Wilk wrote:
> >> +	/* N.B. 'rp', not 'rc'. */
> >> +	if (RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, rp)) {
> >> +		pr_warn(DRV_PFX "Frontend provided bogus ring requests (%d - %d = %d). Halting ring processing on dev=%04x\n",
> >> +			rp, rc, rp - rc, blkif->vbd.pdevice);
> >
> > Hm, I seem to be able to get:
> >
> > [  189.398095] xen-blkback:Frontend provided bogus ring requests (125 - 115 = 10). Halting ring processing on dev=800011
> > or:
> > [  478.558699] xen-blkback:Frontend provided bogus ring requests (95 - 94 = 1). Halting ring processing on dev=800011
> >
> > Which is clearly wrong. Piggybacking on the rsp_prod_pvt does not seem to
> > cut it.
>
> We see that too, but not very frequently. One thing is that
> rsp_prod_pvt doesn't get printed along with rc and rp, thus
> making it not immediately obvious how this can be off in any way.
>
> Among the instance there are cases where the printed
> difference is 32, which makes me wonder whether part of the
> problem is the >= in the macro (we may want > here).
>
> And then we might have been living with some sort of issue in the
> past, because the existing use of the macro just causes the loop
> to be exited, with it getting re-entered subsequently (i.e. at worst
> causing performance issues).

My observation was that rsp_prod_pvt was lagging behind because the
READ requests had not completed yet. In other words, ring processing
was stalled because 'make_response' had not been called yet, which
meant that rsp_prod had not been updated to rsp_prod_pvt (the backend
does not care about that value; only the frontend does).

Going back to the rc and rp check makes the immediate spurious
'insane ring' warnings go away.