qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org, "Juraj Marcin" <jmarcin@redhat.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH v2 2/3] io/tls: Make qio_channel_tls_bye() always synchronous
Date: Thu, 18 Sep 2025 14:12:13 -0400	[thread overview]
Message-ID: <aMxLfX7Z-Z7l4Heb@x1.local> (raw)
In-Reply-To: <87bjn7vmvv.fsf@suse.de>

On Thu, Sep 18, 2025 at 11:47:00AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > No issue I hit, the change is only from code observation when I am looking
> > at a TLS premature termination issue.
> >
> > qio_channel_tls_bye() API needs to be synchronous.  When it's not, the
> > previous impl will attach an asynchronous task retrying but only until when
> > the channel gets the relevant GIO event. It may be problematic, because the
> > caller of qio_channel_tls_bye() may have invoked channel close() before
> > that, leading to premature termination of the TLS session.
> >
> 
> I'm not super versed on socket APIs, so bear with me: Wouldn't the
> subsequent shutdown() before close() ensure that the io watch gets
> triggered? Assuming we're atomically installing the watch before the
> shutdown() (at the moment, we're not).

I think it won't.

First of all, AFAIU migration_cleanup() must be run in the main thread,
because it can register async tasks like the bye() task, and it registers
against context==NULL (in qio_channel_tls_bye_task(), for example), which I
believe means it'll be registered against the QEMU main loop.

Then, if we do a sequence of this:

  qio_channel_tls_bye()
  shutdown()
  close()

And if we do not yield anywhere within the process, IIUC it means it'll run
in sequence _without_ processing any events on the main loop even if some
events triggered.

So.. I think the close() will see the async task and remove it, never get a
chance to kick it off.

> 
> > Remove the asynchronous handling, instead retry it immediately.  Currently,
> > the only two possible cases that may lead to async task is either INTERRUPT
> > or EAGAIN.  It should be suffice to spin retry as of now, until a solid
> > proof showing that a more complicated retry logic is needed.
> >
> > With that, we can remove the whole async model for the bye task.
> >
> 
> With the bye() being synchronous, do we still have the issue when
> migration fails? I guess it depends on what the answer to my question
> above is...

When migration fails, IMHO it's fine to prematurely terminate the channels,
as I replied to one email that you commented on v1.  But we can discuss, I
am not sure if I missed things alone the lines.

Note, Dan suggested me to change the channel blocking mode as a smaller and
quicker fix, instead of throwing async model away, which seems to be
preferred to keep for any iochannel APIs.  So feel free to ignore this
patch too as of now.  I'll still need to investigate a bit on what would
happen if a concurrent update of fd nonblocking would affect other threads,
though.  In all cases, all results will be reflected in v3, but likely this
patch will be either dropped or replaced.

I know I let you read some of the things that we already planned to throw
away, my apologies. But it's partly your "fault" (to take holidays!). No,
I'm joking. :) It's still good to discuss these.

-- 
Peter Xu



  reply	other threads:[~2025-09-18 18:13 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-11 21:23 [PATCH v2 0/3] migration/tls: Graceful shutdowns for main and postcopy channels Peter Xu
2025-09-11 21:23 ` [PATCH v2 1/3] io/crypto: Move tls premature termination handling into QIO layer Peter Xu
2025-09-12 11:18   ` Daniel P. Berrangé
2025-09-12 15:24     ` Peter Xu
2025-09-15 18:31       ` Daniel P. Berrangé
2025-09-12 12:05   ` Juraj Marcin
2025-09-18 14:12   ` Fabiano Rosas
2025-09-11 21:23 ` [PATCH v2 2/3] io/tls: Make qio_channel_tls_bye() always synchronous Peter Xu
2025-09-12 11:27   ` Daniel P. Berrangé
2025-09-12 15:36     ` Peter Xu
2025-09-15 18:40       ` Daniel P. Berrangé
2025-09-15 20:41         ` Peter Xu
2025-09-18 14:47   ` Fabiano Rosas
2025-09-18 18:12     ` Peter Xu [this message]
2025-09-11 21:23 ` [PATCH v2 3/3] migration: Make migration_has_failed() work even for CANCELLING Peter Xu
2025-09-12 12:07   ` Juraj Marcin
2025-09-18 14:52   ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aMxLfX7Z-Z7l4Heb@x1.local \
    --to=peterx@redhat.com \
    --cc=berrange@redhat.com \
    --cc=farosas@suse.de \
    --cc=jmarcin@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).