From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S939624AbdAIS31 (ORCPT );
	Mon, 9 Jan 2017 13:29:27 -0500
Received: from 2.mo69.mail-out.ovh.net ([178.33.251.80]:57222 "EHLO
	2.mo69.mail-out.ovh.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S939559AbdAIS3W (ORCPT );
	Mon, 9 Jan 2017 13:29:22 -0500
Date: Mon, 9 Jan 2017 19:29:15 +0100
From: Greg Kurz
To: Al Viro
Cc: Tuomas Tynkkynen , linux-fsdevel@vger.kernel.org,
	v9fs-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject: Re: [V9fs-developer] 9pfs hangs since 4.7
Message-ID: <20170109192915.3227d2ee@bahia.lan>
In-Reply-To: <20170107171910.GJ1555@ZenIV.linux.org.uk>
References: <20161124215023.02deb03c@duuni>
	<20170102102035.7d1cf903@duuni>
	<20170102162309.GZ1555@ZenIV.linux.org.uk>
	<20170104013355.4a8923b6@duuni>
	<20170104014753.GE1555@ZenIV.linux.org.uk>
	<20170104220447.74f2265d@duuni>
	<20170104230101.GG1555@ZenIV.linux.org.uk>
	<20170106145235.51630baf@bahia.lan>
	<20170107062647.GB12074@ZenIV.linux.org.uk>
	<20170107161045.742893b1@bahia.lan>
	<20170107171910.GJ1555@ZenIV.linux.org.uk>
X-Mailer: Claws Mail 3.14.1 (GTK+ 2.24.31; x86_64-redhat-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Ovh-Tracer-Id: 15456353921809684878
X-VR-SPAMSTATE: OK
X-VR-SPAMSCORE: -100
X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeelgedrvdeggdduudehucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuqfggjfdpvefjgfevmfevgfenuceurghilhhouhhtmecufedttdenucesvcftvggtihhpihgvnhhtshculddquddttddm
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, 7 Jan 2017 17:19:10 +0000
Al Viro wrote:

> On Sat, Jan 07, 2017 at 04:10:45PM +0100, Greg Kurz wrote:
>
> > > virtqueue_push(), but pdu freeing is delayed until v9fs_flush() gets woken
> > > up.
> > > In the meanwhile, another request arrives into the slot freed by
> > > that virtqueue_push() and we are out of pdus.
> >
> > Indeed. Even if this doesn't seem to be the problem here, I guess this should
> > be fixed.
>
> FWIW, there's something that looks like an off-by-one in
> v9fs_device_realize_common():
>
>     /* initialize pdu allocator */
>     QLIST_INIT(&s->free_list);
>     QLIST_INIT(&s->active_list);
>     for (i = 0; i < (MAX_REQ - 1); i++) {
>         QLIST_INSERT_HEAD(&s->free_list, &s->pdus[i], next);
>         s->pdus[i].s = s;
>         s->pdus[i].idx = i;
>     }
>
> Had been there since the original merge of 9p support into qemu - that code
> had moved around a bit, but it had never inserted s->pdus[MAX_REQ - 1] into
> free list. So your scenario with failing pdu_alloc() is still possible.

Indeed, this (MAX_REQ - 1) thing looks wrong. Thanks for pointing that out.

> In that log the total amount of pending requests has reached 128 for the
> first time right when the requests had stopped being handled and even
> though it had dropped below that shortly after, extra requests being put
> into queue had not been processed at all...
>
> I'm not familiar with qemu guts enough to tell if that's a plausible scenario,
> though... shouldn't subsequent queue insertions (after enough slots had been
> released) simply trigger virtio_queue_notify_vq() again? It *is* a bug
> (if we get a burst filling a previously empty queue all at once, there won't
> be any slots becoming freed), but that's obviously not the case here -
> slots were getting freed, after all.