qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Markus Pargmann <mpa@pengutronix.de>
To: Eric Blake <eblake@redhat.com>
Cc: nbd-general@lists.sourceforge.net, Kevin Wolf <kwolf@redhat.com>,
	"Denis V. Lunev" <den@openvz.org>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Wouter Verhelst <w@uter.be>
Subject: Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension
Date: Mon, 04 Apr 2016 12:32 +0200	[thread overview]
Message-ID: <6085102.BIbixMc99f@adelgunde> (raw)
In-Reply-To: <56F954BC.5010606@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 5410 bytes --]

Hi,

On Monday 28 March 2016 09:58:52 Eric Blake wrote:
> On 03/25/2016 02:49 AM, Wouter Verhelst wrote:
> 
> >> You may also want to add a rule that for all future extensions, any
> >> command that requires data in the server response (other than the server
> >> response to NBD_CMD_READ) must include a 32-bit length as the first
> >> field of its data payload, so that the server reply is always structured.
> > 
> > Right.
> > 
> >> Hmm, I still think it would be hard to write a wireshark dissector for
> >> server replies, since there is no indication whether a given reply will
> >> include data or not.
> > 
> > Well, a wireshark dissector already exists. However, it is very old; it
> > doesn't support the NBD_CMD_TRIM or NBD_CMD_FLUSH commands, it doesn't
> > support the NBD_CMD_FLAG_FUA option to NBD_CMD_WRITE, and it doesn't
> > deal with negotiation at all. It was written when newstyle negotiation
> > didn't exist yet, and scanning for INIT_PASSWD at the time was too hard,
> > according to the guy who wrote it (don't remember who that was).
> 
> And I've never written a wireshark dissector myself, to know how easy or
> hard it would be to extend that work.
> 
> > 
> >> The _client_ knows (well, any well-written client
> >> that uses a different value for 'handle' for every command sent to the
> >> server), because it can read the returned 'handle' field to know what
> >> command it matches to and therefore what data to expect; but a
> >> third-party observer seeing _just_ the server messages has no idea which
> >> server responses should have payload.
> > 
> > It can if it knows enough about the protocol, but granted, that makes it
> > harder for us to extend the protocol without breaking the dissector.
> > 
> >> Scanning the stream and assuming that a new reply starts each time
> >> NBD_REPLY_MAGIC is encountered is a close approximation, but hits
> >> false positives if the data being returned for NBD_CMD_READ contains
> >> the same magic number as part of its contents.
> > 
> > Indeed.
> 
> One benefit of TCP: we can rely on packet boundaries (whether or not
> fragmentation is also happening); I'm not sure if UDP shares the same
> benefits (then again, I'm not even sure if UDP is usable for the NBD
> protocol, since we definitely rely on in-order delivery of packets: a
> read command can definitely return more bytes than even a jumbo frame
> can contain, even though we do allow out-of-order delivery of replies).
>  So if the server always sends each reply as the start of its own packet
> (rather than coalescing multiple replies into a single network packet),
> and the dissector only looks for magic numbers at the start of a packet,
> then any server packet not starting with the magic number must be data
> payload, and you could potentially even avoid the false positives by
> choosing to break data replies into packets by adjusting packet lengths
> by one byte any time the next data chunk would otherwise happen to start
> with the same two bytes as the magic number.  But I haven't actually
> tested any of this, to know if packet coalescing goes on, and I
> certainly don't think it is worth complicating the server to avoid false
> positive magic number detection by splitting read payloads across packet
> boundaries, just for the benefit of wire-sniffing.
> 
> > I like this. However, before committing to it, I would like to see
> > Markus' feedback on that (explicit Cc added, even though he's on the
> > list, too).
> > 
> > We'd also need a way to communicate the ability to speak this protocol
> > from the kernel to userspace before telling the server that the client
> > understands something which maybe its kernel doesn't.
> 
> proto.md already documents that ioctl()s exist for the user space to
> inform the kernel about options sent by the server prior to kicking off
> transmission phase, and that the NBD protocol itself does not go into
> full detail about available ioctl()s (the kernel/userspace coordination
> does not affect the protocol).  It seems fairly straightforward for the
> kernel to supply another ioctl() that userspace can use to learn whether
> it is allowed or forbidden from advertising structured reply status
> during the handshake phase (including the case where the ioctl() is not
> present being treated as the client must not enable structured replies).

Depending on the implementation we may not even need to communicate the
used NBD protocol from userspace to the kernel. We have a different
magic and the magic stays at the beginning of the message so we can
simply use the magic to decide what message we have. Of course that
would need another receive which may be too slow.

For the other direction, giving the userspace information about the
capabilities of the kernel implementation, it may be better to use a
sysfs entry? All current ioctls are used for the exact nbd device it is
used on. But implementation capabilities are a common property over all
nbd devices.

Best Regards,

Markus

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

  reply	other threads:[~2016-04-04 10:32 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-23 14:16 [Qemu-devel] [PATCH 0/2] NBD protocol extensions: WRITE_ZEROES and GET_LBA_STATUS Denis V. Lunev
2016-03-23 14:16 ` [Qemu-devel] [PATCH 1/2] NBD proto: add WRITE_ZEROES extension Denis V. Lunev
2016-03-23 15:14   ` Eric Blake
2016-03-23 17:40     ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-24  7:16     ` [Qemu-devel] " Pavel Borzenkov
2016-03-24  7:36       ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-23 17:21   ` Wouter Verhelst
2016-03-24  7:57     ` Pavel Borzenkov
2016-03-24  8:26       ` Wouter Verhelst
2016-03-24 11:35         ` Pavel Borzenkov
2016-03-24 11:37         ` Paolo Bonzini
2016-03-24 12:31           ` Wouter Verhelst
2016-03-24 14:53         ` Eric Blake
2016-03-23 14:16 ` [Qemu-devel] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension Denis V. Lunev
2016-03-23 16:27   ` Eric Blake
2016-03-24 12:30     ` Pavel Borzenkov
2016-03-24 15:04       ` Eric Blake
2016-03-24 16:36         ` Pavel Borzenkov
2016-03-23 17:58   ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-23 18:14     ` Kevin Wolf
2016-03-24  8:25       ` Pavel Borzenkov
2016-03-24  8:41         ` Wouter Verhelst
2016-03-24 11:36           ` Pavel Borzenkov
2016-03-24 12:32             ` Wouter Verhelst
2016-03-24  8:43     ` Pavel Borzenkov
2016-03-24  9:33       ` Wouter Verhelst
2016-03-24 10:32         ` Alex Bligh
2016-03-24 11:58           ` Paolo Bonzini
2016-03-24 12:17             ` Alex Bligh
2016-03-24 12:32               ` Paolo Bonzini
2016-03-24 13:31                 ` Alex Bligh
2016-03-24 13:32                   ` Paolo Bonzini
2016-03-24 11:55     ` Paolo Bonzini
2016-03-24 12:43       ` Wouter Verhelst
2016-03-24 15:25       ` Eric Blake
2016-03-24 15:33         ` Paolo Bonzini
2016-03-24 15:53           ` Wouter Verhelst
2016-03-24 16:04             ` Eric Blake
2016-03-24 16:07               ` Kevin Wolf
2016-03-24 16:47                 ` Wouter Verhelst
2016-03-29  9:38                   ` Kevin Wolf
2016-03-29  9:53                     ` Wouter Verhelst
2016-03-29 10:25                     ` Paolo Bonzini
2016-03-24 22:08   ` [Qemu-devel] " Eric Blake
2016-03-25  8:49     ` [Qemu-devel] [Nbd] " Wouter Verhelst
2016-03-25  9:01       ` Alex Bligh
2016-03-28 15:58       ` Eric Blake
2016-04-04 10:32         ` Markus Pargmann [this message]
2016-04-04 10:18       ` Markus Pargmann
2016-04-04 16:54         ` Eric Blake
2016-04-04 22:17         ` Wouter Verhelst
2016-04-04 16:40   ` [Qemu-devel] " Eric Blake
2016-04-04 20:16   ` Denis V. Lunev
2016-04-04 20:36     ` [Qemu-devel] [Nbd] " Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6085102.BIbixMc99f@adelgunde \
    --to=mpa@pengutronix.de \
    --cc=den@openvz.org \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=nbd-general@lists.sourceforge.net \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=w@uter.be \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).