qemu-devel.nongnu.org archive mirror
From: "manish.mishra" <manish.mishra@nutanix.com>
To: Peter Xu <peterx@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	qemu-devel@nongnu.org, prerna.saxena@nutanix.com,
	quintela@redhat.com, dgilbert@redhat.com, lsoaresp@redhat.com
Subject: Re: [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel
Date: Tue, 22 Nov 2022 22:12:25 +0530
Message-ID: <00d72719-051f-1fcf-e246-79996349937f@nutanix.com>
In-Reply-To: <Y3z54h+twgVKKZ2t@x1n>


On 22/11/22 10:03 pm, Peter Xu wrote:
> On Tue, Nov 22, 2022 at 11:29:05AM -0500, Peter Xu wrote:
>> On Tue, Nov 22, 2022 at 11:10:18AM -0500, Peter Xu wrote:
>>> On Tue, Nov 22, 2022 at 09:01:59PM +0530, manish.mishra wrote:
>>>> On 22/11/22 8:19 pm, Daniel P. Berrangé wrote:
>>>>> On Tue, Nov 22, 2022 at 09:41:01AM -0500, Peter Xu wrote:
>>>>>> On Tue, Nov 22, 2022 at 02:38:53PM +0530, manish.mishra wrote:
>>>>>>> On 22/11/22 2:30 pm, Daniel P. Berrangé wrote:
>>>>>>>> On Sat, Nov 19, 2022 at 09:36:14AM +0000, manish.mishra wrote:
>>>>>>>>> MSG_PEEK reads from the head of the channel. The data is treated as
>>>>>>>>> unread, and the next read shall still return this data. This
>>>>>>>>> support is currently added only for the socket class. An extra
>>>>>>>>> parameter 'flags' is added to io_readv calls to pass extra read
>>>>>>>>> flags like MSG_PEEK.
>>>>>>>>>
>>>>>>>>> Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
>>>>>>>>> Signed-off-by: manish.mishra <manish.mishra@nutanix.com>
>>>>>>>>> ---
>>>>>>>>>     chardev/char-socket.c               |  4 +-
>>>>>>>>>     include/io/channel.h                | 83 +++++++++++++++++++++++++++++
>>>>>>>>>     io/channel-buffer.c                 |  1 +
>>>>>>>>>     io/channel-command.c                |  1 +
>>>>>>>>>     io/channel-file.c                   |  1 +
>>>>>>>>>     io/channel-null.c                   |  1 +
>>>>>>>>>     io/channel-socket.c                 | 16 +++++-
>>>>>>>>>     io/channel-tls.c                    |  1 +
>>>>>>>>>     io/channel-websock.c                |  1 +
>>>>>>>>>     io/channel.c                        | 73 +++++++++++++++++++++++--
>>>>>>>>>     migration/channel-block.c           |  1 +
>>>>>>>>>     scsi/qemu-pr-helper.c               |  2 +-
>>>>>>>>>     tests/qtest/tpm-emu.c               |  2 +-
>>>>>>>>>     tests/unit/test-io-channel-socket.c |  1 +
>>>>>>>>>     util/vhost-user-server.c            |  2 +-
>>>>>>>>>     15 files changed, 179 insertions(+), 11 deletions(-)
>>>>>>>>> diff --git a/io/channel-socket.c b/io/channel-socket.c
>>>>>>>>> index b76dca9cc1..a06b24766d 100644
>>>>>>>>> --- a/io/channel-socket.c
>>>>>>>>> +++ b/io/channel-socket.c
>>>>>>>>> @@ -406,6 +406,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
>>>>>>>>>         }
>>>>>>>>>     #endif /* WIN32 */
>>>>>>>>> +    qio_channel_set_feature(QIO_CHANNEL(cioc), QIO_CHANNEL_FEATURE_READ_MSG_PEEK);
>>>>>>>>> +
>>>>>>>> This covers the incoming server side socket.
>>>>>>>>
>>>>>>>> This also needs to be set in outgoing client side socket in
>>>>>>>> qio_channel_socket_connect_async
>>>>>>> Yes, sorry, I considered only the current use case, but as this is a generic one, both should be there. Thanks, I will update it.
>>>>>>>
>>>>>>>>> @@ -705,7 +718,6 @@ static ssize_t qio_channel_socket_writev(QIOChannel *ioc,
>>>>>>>>>     }
>>>>>>>>>     #endif /* WIN32 */
>>>>>>>>> -
>>>>>>>>>     #ifdef QEMU_MSG_ZEROCOPY
>>>>>>>>>     static int qio_channel_socket_flush(QIOChannel *ioc,
>>>>>>>>>                                         Error **errp)
>>>>>>>> Please remove this unrelated whitespace change.
>>>>>>>>
>>>>>>>>
>>>>>>>>> @@ -109,6 +117,37 @@ int qio_channel_readv_all_eof(QIOChannel *ioc,
>>>>>>>>>         return qio_channel_readv_full_all_eof(ioc, iov, niov, NULL, NULL, errp);
>>>>>>>>>     }
>>>>>>>>> +int qio_channel_readv_peek_all_eof(QIOChannel *ioc,
>>>>>>>>> +                                   const struct iovec *iov,
>>>>>>>>> +                                   size_t niov,
>>>>>>>>> +                                   Error **errp)
>>>>>>>>> +{
>>>>>>>>> +   ssize_t len = 0;
>>>>>>>>> +   ssize_t total = iov_size(iov, niov);
>>>>>>>>> +
>>>>>>>>> +   while (len < total) {
>>>>>>>>> +       len = qio_channel_readv_full(ioc, iov, niov, NULL,
>>>>>>>>> +                                    NULL, QIO_CHANNEL_READ_FLAG_MSG_PEEK, errp);
>>>>>>>>> +
>>>>>>>>> +       if (len == QIO_CHANNEL_ERR_BLOCK) {
>>>>>>>>> +            if (qemu_in_coroutine()) {
>>>>>>>>> +                qio_channel_yield(ioc, G_IO_IN);
>>>>>>>>> +            } else {
>>>>>>>>> +                qio_channel_wait(ioc, G_IO_IN);
>>>>>>>>> +            }
>>>>>>>>> +            continue;
>>>>>>>>> +       }
>>>>>>>>> +       if (len == 0) {
>>>>>>>>> +           return 0;
>>>>>>>>> +       }
>>>>>>>>> +       if (len < 0) {
>>>>>>>>> +           return -1;
>>>>>>>>> +       }
>>>>>>>>> +   }
>>>>>>>> This will busy wait burning CPU where there is a read > 0 and < total.
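To illustrate the busy wait noted above: once some bytes are queued, the socket reports readable, so waiting for G_IO_IN returns immediately while a peek for the full length still comes back short. A minimal sketch with plain POSIX sockets rather than QIOChannel; `readable_but_short()` is a hypothetical helper for illustration only, and it assumes `MSG_DONTWAIT` is available (Linux/BSD).

```c
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not QEMU API): returns 1 if 'fd' polls as readable
 * while a peek for 'want' bytes still comes back short, 0 otherwise, and
 * -1 on error. A loop that waits for readability and then peeks again
 * would spin for as long as this returns 1.
 */
static int readable_but_short(int fd, size_t want)
{
    char buf[64];
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    ssize_t n;

    if (want > sizeof(buf)) {
        return -1;
    }
    if (poll(&pfd, 1, 0) < 0) {
        return -1;
    }
    if (!(pfd.revents & POLLIN)) {
        return 0;   /* nothing queued: a wait here would really block */
    }
    /* Peek without consuming; the queued bytes stay in the socket. */
    n = recv(fd, buf, want, MSG_PEEK | MSG_DONTWAIT);
    if (n < 0) {
        return -1;
    }
    return n > 0 && (size_t)n < want;
}
```

With two of four expected bytes queued, the helper reports 1: the wait is satisfied but the peek is not, which is the spinning case.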
>>>>>>>>
>>>>>>> Daniel, I could use MSG_WAITALL too if that works, but then we will lose the opportunity to yield. Or do you have some other idea?
>>>>>> How easy would this happen?
>>>>>>
>>>>>> Another alternative is that we just return the partial len to the caller
>>>>>> and fall back to the original channel ordering if it happens.  If it
>>>>>> almost never happens, it will behave nearly the same as what we want.
>>>>> Well we're trying to deal with a bug where the slow and/or unreliable
>>>>> network causes channels to arrive in unexpected order. Given we know
>>>>> we're having network trouble, I wouldn't want to make more assumptions
>>>>> about things happening correctly.
>>>>>
>>>>>
>>>>> With regards,
>>>>> Daniel
>>>>
>>>> Peter, I have seen MSG_PEEK used in combination with MSG_WAITALL, but it looks like, even though the chances are lower, it can still return partial data even with multiple retries in the signal case, so it is not foolproof.
>>>>
>>>> MSG_WAITALL (since Linux 2.2)
>>>>                This flag requests that the operation block until the full
>>>>                request is satisfied.  However, the call may still return
>>>>                less data than requested if a signal is caught, an error
>>>>                or disconnect occurs, or the next data to be received is
>>>>                of a different type than that returned.  This flag has no
>>>>                effect for datagram sockets.
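As a rough illustration of the man page text above, a peek with MSG_WAITALL still has to check the returned length, because a disconnect (or a signal) can end the call early. This is a sketch with plain POSIX sockets, not QIOChannel code; `peek_waitall()` and `enum peek_result` are hypothetical names introduced only for this example.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical classification of a single peek attempt. */
enum peek_result {
    PEEK_ERROR = -1,
    PEEK_EOF = 0,
    PEEK_PARTIAL = 1,
    PEEK_FULL = 2
};

/*
 * Hypothetical helper (not QEMU API): peek up to 'want' bytes with
 * MSG_WAITALL and report whether the full request was satisfied.
 * '*got' receives the number of bytes actually visible.
 */
static enum peek_result peek_waitall(int fd, void *buf, size_t want,
                                     size_t *got)
{
    ssize_t n = recv(fd, buf, want, MSG_PEEK | MSG_WAITALL);

    *got = n > 0 ? (size_t)n : 0;
    if (n < 0) {
        return PEEK_ERROR;
    }
    if (n == 0) {
        return PEEK_EOF;
    }
    /* MSG_WAITALL is only a request: a short count is still possible. */
    return (size_t)n == want ? PEEK_FULL : PEEK_PARTIAL;
}
```

If the peer sends two bytes and then shuts down its write side, the call returns PEEK_PARTIAL with two bytes, so the caller still needs a fallback path.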
>>>>
>>>> An actual read-ahead will be a little hackish, so just confirming that we all agree on doing an actual read-ahead and that I can send a patch? :)
>>> Yet another option is the caller handles partial PEEK and then we can sleep
>>> in the migration code before another PEEK attempt until it reaches the full
>>> length.
>>>
>>> Even with that explicit sleep code IMHO it is cleaner than the read-header
>>> flag plus things like !tls check just to avoid the handshake dead lock
>>> itself (and if to go with this route we'd better also have a full document
>>> on why !tls, aka, how the dead lock can happen).
>> Nah, I forgot we're in the same condition as in the main thread.. sorry.
>>
>> Then how about using qemu_co_sleep_ns_wakeable() to replace
>> qio_channel_yield() either above, or in the caller?
> A better one is qemu_co_sleep_ns().  Off-topic: I'd even think we should
> have one qemu_co_sleep_realtime_ns() because currently all callers of
> qemu_co_sleep_ns() are for the rt clock.

I am not aware of this :), will check it.


Yes, that also works, Peter. In that case, should I use a default sleep time or take it from the upper layers? And for live migration, would something on the order of 1 ms work?
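For reference, the sleep-and-retry idea can be sketched roughly as below with plain POSIX sockets. In QEMU the sleep would be qemu_co_sleep_ns() inside a coroutine rather than nanosleep(); the helper name `peek_full_with_sleep()`, the retry limit, and the use of `MSG_DONTWAIT` are assumptions made only for this illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper (not QEMU API): peek exactly 'want' bytes from 'fd'
 * without consuming them. Returns 1 once all bytes are visible, 0 on EOF
 * with nothing queued, and -1 on error or when 'max_tries' is exhausted.
 */
static int peek_full_with_sleep(int fd, void *buf, size_t want,
                                long sleep_ns, int max_tries)
{
    struct timespec ts = { sleep_ns / 1000000000L, sleep_ns % 1000000000L };

    for (int i = 0; i < max_tries; i++) {
        ssize_t n = recv(fd, buf, want, MSG_PEEK | MSG_DONTWAIT);

        if (n == (ssize_t)want) {
            return 1;
        }
        if (n == 0) {
            return 0;   /* peer closed before any data arrived */
        }
        if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK &&
            errno != EINTR) {
            return -1;
        }
        /* Partial or no data yet: sleep instead of spinning on the peek. */
        nanosleep(&ts, NULL);
    }
    return -1;
}
```

A caller would pass something like 1000000 for a 1 ms interval; whether that value suits live migration is exactly the open question here.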

Thanks

Manish Mishra



Thread overview: 24+ messages
2022-11-19  9:36 [PATCH 1/2] io: Add support for MSG_PEEK for socket channel manish.mishra
2022-11-19  9:36 ` [PATCH 2/2] migration: check magic value for deciding the mapping of channels manish.mishra
2022-11-19  9:36 ` manish.mishra
2022-11-19  9:36 ` [PATCH v3 1/2] io: Add support for MSG_PEEK for socket channel manish.mishra
2022-11-22  9:00   ` Daniel P. Berrangé
2022-11-22  9:08     ` manish.mishra
2022-11-22  9:29       ` Daniel P. Berrangé
2022-11-22  9:40         ` manish.mishra
2022-11-22  9:53           ` Daniel P. Berrangé
2022-11-22 10:13             ` manish.mishra
2022-11-22 10:31               ` Daniel P. Berrangé
2022-11-22 14:41       ` Peter Xu
2022-11-22 14:49         ` Daniel P. Berrangé
2022-11-22 15:31           ` manish.mishra
2022-11-22 16:10             ` Peter Xu
2022-11-22 16:29               ` Peter Xu
2022-11-22 16:33                 ` Peter Xu
2022-11-22 16:42                   ` manish.mishra [this message]
2022-11-22 17:16                     ` Peter Xu
2022-11-22 17:31                       ` Daniel P. Berrangé
2022-11-19  9:36 ` [PATCH v3 2/2] migration: check magic value for deciding the mapping of channels manish.mishra
2022-11-21 21:59   ` Peter Xu
2022-11-22  9:01   ` Daniel P. Berrangé
2022-11-19  9:40 ` [PATCH 1/2] io: Add support for MSG_PEEK for socket channel manish.mishra
