From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: "manish.mishra" <manish.mishra@nutanix.com>,
qemu-devel@nongnu.org, prerna.saxena@nutanix.com,
quintela@redhat.com, dgilbert@redhat.com, lsoaresp@redhat.com
Subject: Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel
Date: Tue, 22 Nov 2022 17:31:28 +0000
Message-ID: <Y30HcKdICo+MBttS@redhat.com>
In-Reply-To: <Y30D2MXHVbwCsR2P@x1n>
On Tue, Nov 22, 2022 at 12:16:08PM -0500, Peter Xu wrote:
> On Tue, Nov 22, 2022 at 10:12:25PM +0530, manish.mishra wrote:
> >
> > On 22/11/22 10:03 pm, Peter Xu wrote:
> > > On Tue, Nov 22, 2022 at 11:29:05AM -0500, Peter Xu wrote:
> > > > On Tue, Nov 22, 2022 at 11:10:18AM -0500, Peter Xu wrote:
> > > > > On Tue, Nov 22, 2022 at 09:01:59PM +0530, manish.mishra wrote:
> > > > > > On 22/11/22 8:19 pm, Daniel P. Berrangé wrote:
> > > > > > > On Tue, Nov 22, 2022 at 09:41:01AM -0500, Peter Xu wrote:
> > > > > > > > On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:
> > > > > > > > > On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:
> > > > > > > > > > On Sat, Nov 19, 2022 at 09:36:14AM +0000, manish.mishra wrote:
> > > > > > > > > > > > MSG_PEEK peeks at the channel. The data is treated as
> > > > > > > > > > > unread and the next read shall still return this data. This
> > > > > > > > > > > support is currently added only for socket class. Extra parameter
> > > > > > > > > > > 'flags' is added to io_readv calls to pass extra read flags like
> > > > > > > > > > > MSG_PEEK.
> > > > > > > > > > >
> > > > > > > > > > > > Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
> > > > > > > > > > > > Signed-off-by: manish.mishra <manish.mishra@nutanix.com>
> > > > > > > > > > > ---
> > > > > > > > > > > chardev/char-socket.c | 4 +-
> > > > > > > > > > > include/io/channel.h | 83 +++++++++++++++++++++++++++++
> > > > > > > > > > > io/channel-buffer.c | 1 +
> > > > > > > > > > > io/channel-command.c | 1 +
> > > > > > > > > > > io/channel-file.c | 1 +
> > > > > > > > > > > io/channel-null.c | 1 +
> > > > > > > > > > > io/channel-socket.c | 16 +++++-
> > > > > > > > > > > io/channel-tls.c | 1 +
> > > > > > > > > > > io/channel-websock.c | 1 +
> > > > > > > > > > > io/channel.c | 73 +++++++++++++++++++++++--
> > > > > > > > > > > migration/channel-block.c | 1 +
> > > > > > > > > > > scsi/qemu-pr-helper.c | 2 +-
> > > > > > > > > > > tests/qtest/tpm-emu.c | 2 +-
> > > > > > > > > > > tests/unit/test-io-channel-socket.c | 1 +
> > > > > > > > > > > util/vhost-user-server.c | 2 +-
> > > > > > > > > > > 15 files changed, 179 insertions(+), 11 deletions(-)
> > > > > > > > > > > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > > > > > > > > > > index b76dca9cc1..a06b24766d 100644
> > > > > > > > > > > --- a/io/channel-socket.c
> > > > > > > > > > > +++ b/io/channel-socket.c
> > > > > > > > > > > @@ -406,6 +406,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
> > > > > > > > > > > }
> > > > > > > > > > > #endif /* WIN32 */
> > > > > > > > > > > + qio_channel_set_feature(QIO_CHANNEL(cioc), QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
> > > > > > > > > > > +
> > > > > > > > > > This covers the incoming server side socket.
> > > > > > > > > >
> > > > > > > > > > This also needs to be set in outgoing client side socket in
> > > > > > > > > > qio_channel_socket_connect_async
> > > > > > > > > > Yes, sorry, I considered only the current use case, but as it is a generic one, both should be there. Thanks, I will update it.
> > > > > > > > >
> > > > > > > > > > > @@ -705,7 +718,6 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
> > > > > > > > > > > }
> > > > > > > > > > > #endif /* WIN32 */
> > > > > > > > > > > -
> > > > > > > > > > > #ifdef QEMU_MSG_ZEROCOPY
> > > > > > > > > > > static int qio_channel_socket_flush(QIOChannel *ioc,
> > > > > > > > > > > Error **errp)
> > > > > > > > > > Please remove this unrelated whitespace change.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > @@ -109,6 +117,37 @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
> > > > > > > > > > > return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, NULL, errp);
> > > > > > > > > > > }
> > > > > > > > > > > +int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
> > > > > > > > > > > + const struct iovec *iov,
> > > > > > > > > > > + size_t niov,
> > > > > > > > > > > + Error **errp)
> > > > > > > > > > > +{
> > > > > > > > > > > + ssize_t len = 0;
> > > > > > > > > > > + ssize_t total = iov_size(iov, niov);
> > > > > > > > > > > +
> > > > > > > > > > > + while (len < total) {
> > > > > > > > > > > + len = qio_channel_readv_full(ioc, iov, niov, NULL,
> > > > > > > > > > > + NULL, QIO_CHANNEL_READ_FLAG_MSG_PEEK, errp);
> > > > > > > > > > > +
> > > > > > > > > > > + if (len == QIO_CHANNEL_ERR_BLOCK) {
> > > > > > > > > > > + if (qemu_in_coroutine()) {
> > > > > > > > > > > + qio_channel_yield(ioc, G_IO_IN);
> > > > > > > > > > > + } else {
> > > > > > > > > > > + qio_channel_wait(ioc, G_IO_IN);
> > > > > > > > > > > + }
> > > > > > > > > > > + continue;
> > > > > > > > > > > + }
> > > > > > > > > > > + if (len == 0) {
> > > > > > > > > > > + return 0;
> > > > > > > > > > > + }
> > > > > > > > > > > + if (len < 0) {
> > > > > > > > > > > + return -1;
> > > > > > > > > > > + }
> > > > > > > > > > > + }
> > > > > > > > > > > This will busy wait, burning CPU, whenever a read returns > 0 and < total.
> > > > > > > > > >
> > > > > > > > > > Daniel, I could use MSG_WAITALL too if that works, but then we will lose the opportunity to yield. Or do you have some other idea?
> > > > > > > > How easily could this happen?
> > > > > > > >
> > > > > > > > Another alternative is to just return the partial len to the caller and
> > > > > > > > fall back to the original channel ordering if it happens. Then, if it
> > > > > > > > almost never happens, it'll behave nearly the same as what we want.
> > > > > > > Well we're trying to deal with a bug where a slow and/or unreliable
> > > > > > > network causes channels to arrive in unexpected order. Given we know
> > > > > > > we're having network trouble, I wouldn't want to make more assumptions
> > > > > > > about things happening correctly.
> > > > > > >
> > > > > > >
> > > > > > > With regards,
> > > > > > > Daniel
> > > > > >
> > > > > > Peter, I have seen MSG_PEEK used in combination with MSG_WAITALL, but it looks like, even though the chances are lower, it can still return partial data even with multiple retries in the signal case, so it is not foolproof.
> > > > > >
> > > > > > MSG_WAITALL (since Linux 2.2)
> > > > > > This flag requests that the operation block until the full
> > > > > > request is satisfied. However, the call may still return
> > > > > > less data than requested if a signal is caught, an error
> > > > > > or disconnect occurs, or the next data to be received is
> > > > > > of a different type than that returned. This flag has no
> > > > > > effect for datagram sockets.
> > > > > >
> > > > > > An actual read-ahead will be a little hackish, so just confirming: are we all in agreement to do an actual read-ahead, so that I can send a patch? :)
> > > > > Yet another option is the caller handles partial PEEK and then we can sleep
> > > > > in the migration code before another PEEK attempt until it reaches the full
> > > > > length.
> > > > >
> > > > > Even with that explicit sleep code, IMHO it is cleaner than the read-header
> > > > > flag plus things like the !tls check just to avoid the handshake deadlock
> > > > > itself (and if we go this route we'd better also fully document
> > > > > why !tls, i.e. how the deadlock can happen).
> > > > Nah, I forgot we're in the same condition as in the main thread.. sorry.
> > > >
> > > > Then how about using qemu_co_sleep_ns_wakeable() to replace
> > > > qio_channel_yield() either above, or in the caller?
> > > A better one is qemu_co_sleep_ns(). Off-topic: I'd even think we should
> > > have one qemu_co_sleep_realtime_ns() because currently all callers of
> > > qemu_co_sleep_ns() are for the rt clock.
> > I was not aware of this :), will check it.
> >
> >
> > Yes, that also works, Peter. In that case, should I have a default time or take it from the upper layers? And for live migration, does something on the scale of 1ms work?
>
> Sounds good to me on the migration side. When making it formal we'd also want
> to know what Juan/Dave think.
>
> But let's also wait for Dan's input about this before going forward. If
> the io code wants an _eof() version of PEEK then maybe we'd better do the
> timeout-yield there even if it's not as elegant as G_IO_IN. IIUC it's a matter
> of whether we want to allow the PEEK interface to return a partial len.
I don't think we should add an _eof() version with PEEK, because it's
impossible to implement sanely. If the migration caller wants to busy
wait, or do a coroutine sleep, it can do that.
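
To illustrate the point with plain POSIX sockets (this is not QEMU code, only
a minimal sketch): after a partial MSG_PEEK the unread bytes stay queued, so
the socket keeps reporting readable and a wait on G_IO_IN / POLLIN returns
immediately instead of blocking. A caller that needs a full header therefore
has to pace its own retries. The helper names peek_exact() and readable_now()
are hypothetical, MSG_DONTWAIT is assumed to be available (Linux/BSD), and in
a coroutine the nanosleep() would be replaced by something like the
qemu_co_sleep_ns() mentioned earlier in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: nonzero if poll() reports fd readable right now. */
static int readable_now(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };

    return poll(&p, 1, 0) == 1 && (p.revents & POLLIN);
}

/*
 * Hypothetical helper: peek until 'want' bytes are queued on 'fd',
 * sleeping 'delay_ms' between attempts instead of waiting for POLLIN.
 *
 * Returns 1 if buf holds 'want' peeked (still unread) bytes,
 *         0 if the peer closed with nothing queued,
 *        -1 on error or when 'max_tries' attempts were not enough.
 */
static int peek_exact(int fd, void *buf, size_t want,
                      int delay_ms, int max_tries)
{
    struct timespec delay = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L,
    };

    assert(want > 0 && max_tries > 0);

    for (int i = 0; i < max_tries; i++) {
        ssize_t len = recv(fd, buf, want, MSG_PEEK | MSG_DONTWAIT);

        if (len == (ssize_t)want) {
            return 1;
        }
        if (len == 0) {
            return 0;
        }
        if (len < 0 && errno != EAGAIN && errno != EWOULDBLOCK &&
            errno != EINTR) {
            return -1;
        }
        /*
         * Partial (or no) data. With partial data queued, waiting for
         * POLLIN would return at once, so sleep explicitly before retrying.
         */
        nanosleep(&delay, NULL);
    }
    return -1;
}
```

With a socketpair() where only 2 of 4 expected bytes have been written,
readable_now() stays true after the peek and peek_exact() gives up after
max_tries; once the remaining bytes arrive it returns 1 and a normal read
still sees all 4 bytes.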
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|