linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Tuomas Tynkkynen <tuomas@tuxera.com>
Cc: Greg Kurz <groug@kaod.org>,
	linux-fsdevel@vger.kernel.org,
	v9fs-developer@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
Subject: Re: [V9fs-developer] 9pfs hangs since 4.7
Date: Mon, 9 Jan 2017 20:05:47 +0000	[thread overview]
Message-ID: <20170109200547.GN1555@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20170109203931.2315e1cd@duuni>

On Mon, Jan 09, 2017 at 08:39:31PM +0200, Tuomas Tynkkynen wrote:

> Yes, this does seem to be related to this or otherwise MAX_REQ related!
> - Bumping MAX_REQ up to 1024 makes the hang go away (on 4.7).
> - Dropping it to 64 makes the same hang happen on kernels where it worked
>   before (I tried 4.4.x).
> - Doing s/(MAX_REQ - 1)/MAX_REQ/ makes the hang go away.

Note that it's still possible to trigger the same situation with that
off-by-one taken care of; if client sends 64 Treadlink and 64 Tflush
(one for each of those), then follows by another pile of Treadlink (feeding
them in as soon as free slots appear), the server can end up with failing
pdu_alloc() - completion of readlink will release its slot immediately,
but pdu won't get freed until the corresponding flush has gotten the
CPU.

I'm _not_ familiar with scheduling in qemu and I don't quite understand
the mechanism of getting from "handle_9p_output() bailed out with some
requests still not processed" to "no further requests get processed", so
it might be that for some reason triggering the former as described above
won't escalate to the latter, but I wouldn't count upon that.

Another thing I'm very certain about is that 9 0 0 0 108 1 0 1 0 sent by
broken client (Tflush tag = 1 oldtag = 1) will do nasty things to qemu
server.  v9fs_flush() will try to find the pdu of request to cancel,
find its own argument, put itself on its ->complete and yield CPU, expecting
to get back once the victim gets through to pdu_complete().  Since the
victim is itself...

AFAICS, the things client can expect wrt Tflush handling are
	* in no case should Rflush be sent before the reply to request
its trying to cancel
	* the only case when server _may_ not send Rflush is the arrival
of more than one Tflush with the same oldtag; in that case it is allowed
to suppress replies to earlier ones.  If they are not suppressed, replies
should come in the order of Tflush arrivals.
	* if reply to Tflush is sent (see above), it must be Rflush.
	* multiple Tflush with the same oldtag are allowed; Linux kernel
client does not issue those, but other clients might.  As the matter of
fact, Plan 9 kernel client *does* issue those.
	* Tflush to Tflush is no-op; it still needs a reply, and ordering
constraints apply (it can't be sent before the reply to Tflush it's been
refering to, which, in turn, can't be sent before the reply to request
the first Tflush refers to).  Normally such requests are not sent, but
in principle they are allowed.
	* Tflush to request that isn't being processed should be
answered immediately.  The same goes for Tflush refering to itself.
The former is not an error (we might have already sent a reply), but
the latter might be worth a loud warning - clients are definitely not
supposed to do that.  It still needs Rflush in response - Rerror is not
allowed.

      reply	other threads:[~2017-01-09 20:05 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-24 19:50 9pfs hangs since 4.7 Tuomas Tynkkynen
2016-11-29 16:39 ` Eric Van Hensbergen
2016-12-02 20:11   ` Tuomas Tynkkynen
2017-01-02  8:20 ` Tuomas Tynkkynen
2017-01-02 16:23   ` Al Viro
2017-01-03 23:34     ` Tuomas Tynkkynen
2017-01-04  1:47       ` Al Viro
2017-01-04 20:04         ` Tuomas Tynkkynen
2017-01-04 23:01           ` Al Viro
2017-01-06 13:52             ` [V9fs-developer] " Greg Kurz
2017-01-07  6:26               ` Al Viro
2017-01-07 15:10                 ` Greg Kurz
2017-01-07 17:19                   ` Al Viro
2017-01-07 18:15                     ` Al Viro
2017-01-08  5:46                       ` Al Viro
2017-01-10  8:16                         ` Greg Kurz
2017-01-09 18:29                     ` Greg Kurz
2017-01-09 18:39                     ` Tuomas Tynkkynen
2017-01-09 20:05                       ` Al Viro [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170109200547.GN1555@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=groug@kaod.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tuomas@tuxera.com \
    --cc=v9fs-developer@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).