All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: "Dr. David Alan Gilbert" <dave@treblig.org>
Cc: Peter Xu <peterx@redhat.com>,
	qemu-devel@nongnu.org, berrange@redhat.com, armbru@redhat.com,
	Claudio Fontana <cfontana@suse.de>, Jim Fehlig <jfehlig@suse.com>
Subject: Re: [PATCH v2 06/18] monitor: Stop removing non-duplicated fds
Date: Wed, 05 Jun 2024 09:31:57 -0300	[thread overview]
Message-ID: <87a5jzh8oy.fsf@suse.de> (raw)
In-Reply-To: <Zl-l0U0Azz8lgIw2@gallifrey>

"Dr. David Alan Gilbert" <dave@treblig.org> writes:

> * Fabiano Rosas (farosas@suse.de) wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Thu, May 23, 2024 at 04:05:36PM -0300, Fabiano Rosas wrote:
>> >> We've been up until now cleaning up any file descriptors that have
>> >> been passed into QEMU and never duplicated[1,2]. A file descriptor
>> >> without duplicates indicates that no part of QEMU has made use of
>> >> it. This approach is starting to show some cracks now that we're
>> >> starting to consume fds from the migration code:
>> >> 
>> >> - Doing cleanup every time the last monitor connection closes works to
>> >>   reap unused fds, but also has the side effect of forcing the
>> >>   management layer to pass the file descriptors again in case of a
>> >>   disconnect/re-connect, if that happened to be the only monitor
>> >>   connection.
>> >> 
>> >>   Another side effect is that removing an fd with qmp_remove_fd() is
>> >>   effectively delayed until the last monitor connection closes.
>> >> 
>> >>   The reliance on mon_refcount is also problematic because it's racy.
>> >> 
>> >> - Checking runstate_is_running() skips the cleanup unless the VM is
>> >>   running and avoids premature cleanup of the fds, but also has the
>> >>   side effect of blocking the legitimate removal of an fd via
>> >>   qmp_remove_fd() if the VM happens to be in another state.
>> >> 
>> >>   This affects qmp_remove_fd() and qmp_query_fdsets() in particular
>> >>   because requesting a removal at a bad time (guest stopped) might
>> >>   cause an fd to never be removed, or to be removed at a much later
>> >>   point in time, causing the query command to continue showing the
>> >>   supposedly removed fd/fdset.
>> >> 
>> >> Note that file descriptors that *have* been duplicated are owned by
>> >> the code that uses them and will be removed after qemu_close() is
>> >> called. Therefore we've decided that the best course of action to
>> >> avoid the undesired side-effects is to stop managing non-duplicated
>> >> file descriptors.
>> >> 
>> >> 1- efb87c1697 ("monitor: Clean up fd sets on monitor disconnect")
>> >> 2- ebe52b592d ("monitor: Prevent removing fd from set during init")
>> >> 
>> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> >> ---
>> >>  monitor/fds.c              | 15 ++++++++-------
>> >>  monitor/hmp.c              |  2 --
>> >>  monitor/monitor-internal.h |  1 -
>> >>  monitor/qmp.c              |  2 --
>> >>  4 files changed, 8 insertions(+), 12 deletions(-)
>> >> 
>> >> diff --git a/monitor/fds.c b/monitor/fds.c
>> >> index 101e21720d..f7b98814fa 100644
>> >> --- a/monitor/fds.c
>> >> +++ b/monitor/fds.c
>> >> @@ -169,6 +169,11 @@ int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp)
>> >>  
>> >>  static void monitor_fdset_free(MonFdset *mon_fdset)
>> >>  {
>> >> +    /*
>> >> +     * Only remove an empty fdset. The fds are owned by the user and
>> >> +     * should have been removed with qmp_remove_fd(). The dup_fds are
>> >> +     * owned by QEMU and should have been removed with qemu_close().
>> >> +     */
>> >>      if (QLIST_EMPTY(&mon_fdset->fds) && QLIST_EMPTY(&mon_fdset->dup_fds)) {
>> >>          QLIST_REMOVE(mon_fdset, next);
>> >>          g_free(mon_fdset);
>> >> @@ -189,9 +194,7 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
>> >>      MonFdsetFd *mon_fdset_fd_next;
>> >>  
>> >>      QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next, mon_fdset_fd_next) {
>> >> -        if ((mon_fdset_fd->removed ||
>> >> -                (QLIST_EMPTY(&mon_fdset->dup_fds) && mon_refcount == 0)) &&
>> >> -                runstate_is_running()) {
>> >> +        if (mon_fdset_fd->removed) {
>> >
>> > I don't know the code well, but I like it.
>> >
>> >>              monitor_fdset_fd_free(mon_fdset_fd);
>> >>          }
>> >>      }
>> >> @@ -206,7 +209,7 @@ void monitor_fdsets_cleanup(void)
>> >>  
>> >>      QEMU_LOCK_GUARD(&mon_fdsets_lock);
>> >>      QLIST_FOREACH_SAFE(mon_fdset, &mon_fdsets, next, mon_fdset_next) {
>> >> -        monitor_fdset_cleanup(mon_fdset);
>> >> +        monitor_fdset_free(mon_fdset);
>> >>      }
>> >>  }
>> >>  
>> >> @@ -479,9 +482,7 @@ void monitor_fdset_dup_fd_remove(int dup_fd)
>> >>              if (mon_fdset_fd_dup->fd == dup_fd) {
>> >>                  QLIST_REMOVE(mon_fdset_fd_dup, next);
>> >>                  g_free(mon_fdset_fd_dup);
>> >> -                if (QLIST_EMPTY(&mon_fdset->dup_fds)) {
>> >> -                    monitor_fdset_cleanup(mon_fdset);
>> >> -                }
>> >> +                monitor_fdset_free(mon_fdset);
>> >
>> > This and above changes are not crystal clear to me where the _cleanup()
>> > does extra check "removed" and clean those fds.
>> >
>> > I think it'll make it easier for me to understand if this one and the next
>> > are squashed together.  But maybe it's simply because I didn't fully
>> > understand.
>> 
>> monitor_fdsets_cleanup() was doing three things previously:
>> 
>> 1- Remove the removed=true fds. This is weird, but ok.
>> 
>> 2- Remove fds from an fdset that has an empty dup_fds list, but only if
>> the guest is not running and the monitor is closed. This is problematic.
>> 
>> 3- Remove/free fdsets that have become empty due to the above
>> removals. This is ok.
>> 
>> This patch covers (2), because that is the only change that has a
>> complex reasoning behind it and we need to document that without
>> conflating it with the rest of the changes.
>
> The ebe52b592d you reference, makes me think that the only reason for the
> 'is not running' was as a way to detect when init had finished; and there
> are certainly better ways to do that (now?).

I agree with the perception, however I can't determine what "init" means
in this scenario. It's not clear from the original change at exactly
which point we're safe to assume fds have been consumed from some
subsystem (block probably). This also clashes with the (new) usage we're
attempting here for migration because the migration code would almost
certainly only consume the fds after this point.

>
> However, the efb87c1697 talks about cleaning up non-dup's on all monitors
> closed, to stop getting left-over fd's that were added but then not used
> (because a disconnect happened between adding and being used).
> In your world when do these get cleaned up?

They stay around until after the reconnection. Then either get removed
via qmp_remove_fd() or when a subsystem dups them and eventually calls
qemu_close().

The initial assumption seems to have been overly conservative, there
will always be a monitor connection around, even if it disconnects
briefly.

Per Daniel's suggestion we're considering a management layer bug if it
passes fds that QEMU never needs to consume. So QEMU will not attempt
any cleanup of perceived unused fds since they could be needed at a
later point (e.g. after qmp_migrate).

>
> Dave
>
>> Since (3) is still a reasonable thing to do, this patch merely keeps it
>> around, but using the helpers introduced in the previous patch.
>> 
>> The next patch removed the need for (1), making the removal immediate
>> instead of delayed. It has it's own much less complex reasoning, which
>> is: "we don't need to wait to remove the fd".
>> 
>> I hope this clarifies a bit. I would prefer if we kept this and the next
>> patch separate to avoid having a single patch that does too many
>> things. I hope to be as explicit as possible with the reason behind
>> these changes to avoid putting future people in the situation that we
>> are in now, i.e. having to guess at the reasons behind this code.
>> 
>> If you still think we should squash or if you have more questions, let
>> me know.
>> 
>> >>                  return;
>> >>              }
>> >>          }
>> >> diff --git a/monitor/hmp.c b/monitor/hmp.c
>> >> index 69c1b7e98a..460e8832f6 100644
>> >> --- a/monitor/hmp.c
>> >> +++ b/monitor/hmp.c
>> >> @@ -1437,11 +1437,9 @@ static void monitor_event(void *opaque, QEMUChrEvent event)
>> >>              monitor_resume(mon);
>> >>          }
>> >>          qemu_mutex_unlock(&mon->mon_lock);
>> >> -        mon_refcount++;
>> >>          break;
>> >>  
>> >>      case CHR_EVENT_CLOSED:
>> >> -        mon_refcount--;
>> >>          monitor_fdsets_cleanup();
>> >>          break;
>> >>  
>> >> diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
>> >> index 252de85681..cb628f681d 100644
>> >> --- a/monitor/monitor-internal.h
>> >> +++ b/monitor/monitor-internal.h
>> >> @@ -168,7 +168,6 @@ extern bool qmp_dispatcher_co_shutdown;
>> >>  extern QmpCommandList qmp_commands, qmp_cap_negotiation_commands;
>> >>  extern QemuMutex monitor_lock;
>> >>  extern MonitorList mon_list;
>> >> -extern int mon_refcount;
>> >>  
>> >>  extern HMPCommand hmp_cmds[];
>> >>  
>> >> diff --git a/monitor/qmp.c b/monitor/qmp.c
>> >> index a239945e8d..5e538f34c0 100644
>> >> --- a/monitor/qmp.c
>> >> +++ b/monitor/qmp.c
>> >> @@ -466,7 +466,6 @@ static void monitor_qmp_event(void *opaque, QEMUChrEvent event)
>> >>          data = qmp_greeting(mon);
>> >>          qmp_send_response(mon, data);
>> >>          qobject_unref(data);
>> >> -        mon_refcount++;
>> >>          break;
>> >>      case CHR_EVENT_CLOSED:
>> >>          /*
>> >> @@ -479,7 +478,6 @@ static void monitor_qmp_event(void *opaque, QEMUChrEvent event)
>> >>          json_message_parser_destroy(&mon->parser);
>> >>          json_message_parser_init(&mon->parser, handle_qmp_command,
>> >>                                   mon, NULL);
>> >> -        mon_refcount--;
>> >>          monitor_fdsets_cleanup();
>> >>          break;
>> >>      case CHR_EVENT_BREAK:
>> >
>> > I like this too when mon_refcount can be dropped.  It turns out I like this
>> > patch and the next a lot, and I hope nothing will break.
>> >
>> > Aside, you seem to have forgot removal of the "int mon_refcount" in
>> > monitor.c.
>> 
>> Yes, I'll fix that. Thanks.


  reply	other threads:[~2024-06-05 12:33 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-23 19:05 [PATCH v2 00/18] migration/mapped-ram: Add direct-io support Fabiano Rosas
2024-05-23 19:05 ` [PATCH v2 01/18] migration: Fix file migration with fdset Fabiano Rosas
2024-05-24 10:51   ` Prasad Pandit
2024-05-24 12:30     ` Fabiano Rosas
2024-05-25  6:16       ` Prasad Pandit
2024-05-30 16:11   ` Peter Xu
2024-05-31 14:58     ` Fabiano Rosas
2024-06-03 10:20   ` Daniel P. Berrangé
2024-05-23 19:05 ` [PATCH v2 02/18] tests/qtest/migration: Fix file migration offset check Fabiano Rosas
2024-05-30 16:14   ` Peter Xu
2024-06-03 10:21   ` Daniel P. Berrangé
2024-05-23 19:05 ` [PATCH v2 03/18] tests/qtest/migration: Add a precopy file test with fdset Fabiano Rosas
2024-05-30 16:18   ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 04/18] monitor: Drop monitor_fdset_dup_fd_add() Fabiano Rosas
2024-06-03 10:26   ` Daniel P. Berrangé
2024-05-23 19:05 ` [PATCH v2 05/18] monitor: Introduce monitor_fdset_*free Fabiano Rosas
2024-05-30 20:03   ` Peter Xu
2024-05-31 15:01     ` Fabiano Rosas
2024-05-23 19:05 ` [PATCH v2 06/18] monitor: Stop removing non-duplicated fds Fabiano Rosas
2024-05-30 21:05   ` Peter Xu
2024-05-31 15:25     ` Fabiano Rosas
2024-05-31 15:56       ` Peter Xu
2024-06-04 23:40       ` Dr. David Alan Gilbert
2024-06-05 12:31         ` Fabiano Rosas [this message]
2024-05-23 19:05 ` [PATCH v2 07/18] monitor: Simplify fdset and fd removal Fabiano Rosas
2024-05-31 15:58   ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 08/18] monitor: Report errors from monitor_fdset_dup_fd_add Fabiano Rosas
2024-05-30 21:08   ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 09/18] io: Stop using qemu_open_old in channel-file Fabiano Rosas
2024-05-30 21:10   ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 10/18] migration: Add direct-io parameter Fabiano Rosas
2024-05-30 21:12   ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 11/18] migration/multifd: Add direct-io support Fabiano Rosas
2024-05-30 21:35   ` Peter Xu
2024-05-31 15:27     ` Fabiano Rosas
2024-05-23 19:05 ` [PATCH v2 12/18] tests/qtest/migration: Add tests for file migration with direct-io Fabiano Rosas
2024-05-23 19:05 ` [PATCH v2 13/18] monitor: fdset: Match against O_DIRECT Fabiano Rosas
2024-05-30 21:41   ` Peter Xu
2024-05-31 15:42     ` Fabiano Rosas
2024-05-31 15:58       ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 14/18] migration: Add documentation for fdset with multifd + file Fabiano Rosas
2024-06-04 20:46   ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 15/18] tests/qtest/migration: Add a test for mapped-ram with passing of fds Fabiano Rosas
2024-06-04 20:51   ` Peter Xu
2024-05-23 19:05 ` [PATCH v2 16/18] io/channel-file: Add direct-io support Fabiano Rosas
2024-06-03 10:32   ` Daniel P. Berrangé
2024-05-23 19:05 ` [PATCH v2 17/18] migration: Add direct-io helpers Fabiano Rosas
2024-05-23 19:05 ` [PATCH v2 18/18] migration/ram: Add direct-io support to precopy file migration Fabiano Rosas
2024-06-04 20:56   ` Peter Xu
2024-06-07 18:42     ` Fabiano Rosas
2024-06-07 20:39       ` Jim Fehlig
2024-06-10 16:09       ` Peter Xu
2024-06-10 17:45         ` Fabiano Rosas
2024-06-10 19:02           ` Peter Xu
2024-06-10 19:07             ` Daniel P. Berrangé
2024-06-10 20:12             ` Fabiano Rosas
2024-06-12 18:08               ` Fabiano Rosas
2024-06-12 18:15                 ` Daniel P. Berrangé
2024-06-12 18:27                   ` Peter Xu
2024-06-12 18:44                     ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a5jzh8oy.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=armbru@redhat.com \
    --cc=berrange@redhat.com \
    --cc=cfontana@suse.de \
    --cc=dave@treblig.org \
    --cc=jfehlig@suse.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.