All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@ZenIV.linux.org.uk>
To: Mike Marshall <hubcap@omnibond.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: orangefs: NAK until the ABI is documented (was Re: updated orangefs tree at kernel.org)
Date: Sun, 11 Oct 2015 00:10:55 +0100	[thread overview]
Message-ID: <20151010231054.GH22011@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20151010023157.GE22011@ZenIV.linux.org.uk>

On Sat, Oct 10, 2015 at 03:31:57AM +0100, Al Viro wrote:
> 	* atrocious userland API for downcalls (response to everything
> other than readdir must come in 4-element iovec array passed to writev(),
> no matter where the segment boundaries are; response to readdir must come
> in 5-element iovec array passed to writev(), the boundaries among the
> first 4 segments do not matter, the 5th segment covering exactly the payload).
> IMO that needs to be fixed before we merge the damn thing - it's really too
> ugly to live.  I would really like to hear from somebody familiar with the
> userland side - what responses does it actually submit?  The validation
> kernel-side is basically inexistent, and while I suspect that we could handle
> the actually sent responses much cleaner, the full set of everything that
> would be accepted by the current code is a bloody mess and would be much
> harder to handle in a clean way.  What's more, the response layouts are
> messy, and if it is still possible to change that API, we'd be much better off
> if we cleaned them up.

Another API bogosity: when the 5th segment is present, successful writev()
returns the sum of sizes of the first 4.  Userland side just ignores that -
everything positive is treated as "everything's written".

Another piece of fun: header too large by less than sizeof(long) => print
a warning, leave op->downcall completely uninitialized and proceed.

Below is what the kernel side is actually doing:
	* less than 4 segments or more than 5 => whine and fail
	* if concatenation of the first 4 segments is longer than
16 + sizeof(struct pvfs2_downcall_s) + sizeof(long) => whine and fail
	* if concatenation of the first 4 segments is longer than
16 + sizeof(struct pvfs2_downcall_s) by no more than sizeof(long) => whine
and proceed with garbage.
	* if concatenation of the first 4 segments is no longer than
16 + sizeof(struct pvfs2_downcall_s), it is zero-padded to that size,
then the first 16 bytes are interpreted as 32bit protocol version (ignored)
+ 32bit magic (checked for one specific value) + 64bit tag (used to find the
matching request).  The rest is copied into op->downcall.
	* if the 32bit value 4 bytes into op->downcall is zero and 64bit
value following it is non-zero, the latter is interpreted as the size of
trailer data.
	* if there's no trailer, the 5th segment (if present) is completely
ignored.  Otherwise it must be present and is expected to contain the trailer
data.
	* if the 5th segment is longer than expected trailer data => whine
and fail (note that if the amount of expected trailer data is zero a non-empty
5th segment is quietly ignored).
	* if trailer expected, vmalloc a buffer for it and copy the 5th
segment contents in there; in case of the 5th segment being *shorter* than
expected trailer data, leave the rest of the buffer uninitialized (i.e.
expose the contents of random previously freed pages).
	* if vmalloc fails, act as if status (32bit at offset 5 into
op->downcall) had been -ENOMEM and don't look at the 5th segment at all.
	* in all cases ignore the amount of data taken from the 5th segment;
the return value is the sum of sizes of the first 4.

	What the current userland side appears to sends is 4-segment
version/magic/tag/struct pvfs2_downcall_s + possibly the 5th segment covering
the readdir results.  Trailer size in struct pvfs2_downcall_s actually is
equal to the size of the 5th segment if present and 0 otherwise.
	The 4th segment (struct pvfs2_downcall_s) contains a bunch of garbage
never used by the kernel (pointers and misalignment gaps).
	How much of that is guaranteed is, of course, known only to the
orangefs folks.  I only looked at the current version of the userland code;
hadn't checked the older ones and have no idea what kind of changes might
be planned.

	Folks, userland interfaces need to be documented (and preferably -
contain less ugliness).  At that point I think that we need the damn API
written up and reviewed, or it's a strong NAK.  Once it's in the tree, it's
cast in stone, so we'd better get it right and we need it explicitly
documented.  "Whatever orangefs userland code does" doesn't cut it, especially
since your protocol version check is inexistent.  Digging through the entire
history of your userland code and trying to deduce what would and what would
not be OK to change kernel-side is far past reasonable.

	I'm not asking for the description of semantics for individual requests
and responses (it would be nice to have as well, but that's a separate story),
but the rules for data marshalling are really needed.

  parent reply	other threads:[~2015-10-10 23:10 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-07 20:07 updated orangefs tree at kernel.org Mike Marshall
2015-10-08 21:07 ` Al Viro
2015-10-08 23:03   ` Al Viro
2015-10-09  3:41     ` Al Viro
2015-10-09  6:13       ` Al Viro
2015-10-09 19:22       ` Al Viro
2015-10-10  2:31     ` Al Viro
2015-10-10 20:23       ` Al Viro
2015-10-10 23:10       ` Al Viro [this message]
2015-10-12 16:20         ` orangefs: NAK until the ABI is documented (was Re: updated orangefs tree at kernel.org) Mike Marshall
2015-10-12 19:16           ` Al Viro
2015-10-13 17:46             ` Mike Marshall
2015-10-13 23:34               ` Al Viro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151010231054.GH22011@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=hubcap@omnibond.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.