From: "Michael S. Tsirkin" <mst@redhat.com>
To: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: "peter.maydell@linaro.org" <peter.maydell@linaro.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"stefanha@redhat.com" <stefanha@redhat.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"eblake@redhat.com" <eblake@redhat.com>,
	"sgarzare@redhat.com" <sgarzare@redhat.com>
Subject: Re: [PATCH v6] Work around vhost-user-blk-test hang
Date: Mon, 18 Oct 2021 17:50:41 -0400
Message-ID: <20211018171738-mutt-send-email-mst@kernel.org>
In-Reply-To: <20211014043216.10325-1-raphael.norwitz@nutanix.com>

On Thu, Oct 14, 2021 at 04:32:23AM +0000, Raphael Norwitz wrote:
> The vhost-user-blk-test qtest has been hanging intermittently for a
> while. The root cause is not yet fully understood, but the hang is
> impacting enough users that it is important to merge a workaround for
> it.
> 
> The race which causes the hang occurs early on in vhost-user setup,
> where a vhost-user message is never received by the backend. Forcing
> QEMU to wait until the storage-daemon has had some time to initialize
> prevents the hang. Thus the existing storage-daemon pidfile option can
> be used to implement a workaround cleanly and effectively, since it
> creates a file only once the storage-daemon initialization is complete.
> 
> This change implements a workaround for the vhost-user-blk-test hang by
> making QEMU wait until the storage-daemon has written out a pidfile
> before attempting to connect and send messages over the vhost-user
> socket.
> 
> Some relevant mailing list discussions:
> 
> [1] https://lore.kernel.org/qemu-devel/CAFEAcA8kYpz9LiPNxnWJAPSjc=nv532bEdyfynaBeMeohqBp3A@mail.gmail.com/
> [2] https://lore.kernel.org/qemu-devel/YWaky%2FKVbS%2FKZjlV@stefanha-x1.localdomain/
> 
> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>


Um. Does not seem to make things better for me:

**
ERROR:../tests/qtest/vhost-user-blk-test.c:950:start_vhost_user_blk: assertion failed (retries < PIDFILE_RETRIES): (5 < 5)
ERROR qtest-x86_64/qos-test - Bail out! ERROR:../tests/qtest/vhost-user-blk-test.c:950:start_vhost_user_blk: assertion failed (retries < PIDFILE_RETRIES): (5 < 5)

At this point I just disabled the test in meson. No need to make
everyone suffer.


> ---
>  tests/qtest/vhost-user-blk-test.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c
> index 6f108a1b62..c6626a286b 100644
> --- a/tests/qtest/vhost-user-blk-test.c
> +++ b/tests/qtest/vhost-user-blk-test.c
> @@ -24,6 +24,7 @@
>  #define TEST_IMAGE_SIZE         (64 * 1024 * 1024)
>  #define QVIRTIO_BLK_TIMEOUT_US  (30 * 1000 * 1000)
>  #define PCI_SLOT_HP             0x06
> +#define PIDFILE_RETRIES         5
>  
>  typedef struct {
>      pid_t pid;


Don't like the arbitrary retries counter: 5 retries of g_usleep(1000)
is only about 5 ms in total.

Maybe warn instead of asserting, but on a busy machine the daemon
might not write the pidfile in time ...
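
To illustrate, a wait bounded by a wall-clock deadline rather than an
iteration count could look roughly like the following. This is only a
sketch in plain POSIX C, not glib; wait_for_file(), now_ms() and the
10 ms poll interval are hypothetical names and values chosen for the
example, not anything in the tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Current monotonic clock reading in milliseconds. */
static int64_t now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: poll until `path` exists or `timeout_ms` has
 * elapsed. Returns true if the file appeared, false on timeout, so the
 * caller can decide whether to warn or to fail.
 */
static bool wait_for_file(const char *path, int64_t timeout_ms)
{
    /* Assumed poll interval of 10 ms; any small value would do. */
    const struct timespec poll_interval = { .tv_sec = 0,
                                            .tv_nsec = 10 * 1000 * 1000 };
    int64_t deadline;

    assert(path != NULL);
    deadline = now_ms() + timeout_ms;

    while (access(path, F_OK) != 0) {
        if (now_ms() >= deadline) {
            return false;
        }
        nanosleep(&poll_interval, NULL);
    }
    return true;
}
```

The caller would pass a generous timeout (tens of seconds, in the
spirit of QVIRTIO_BLK_TIMEOUT_US) and emit a warning on a false return
instead of tripping an assertion after a few milliseconds.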


> @@ -885,7 +886,8 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
>                                   int num_queues)
>  {
>      const char *vhost_user_blk_bin = qtest_qemu_storage_daemon_binary();
> -    int i;
> +    int i, retries;
> +    char *daemon_pidfile_path;
>      gchar *img_path;
>      GString *storage_daemon_command = g_string_new(NULL);
>      QemuStorageDaemonState *qsd;
> @@ -898,6 +900,8 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
>              " -object memory-backend-memfd,id=mem,size=256M,share=on "
>              " -M memory-backend=mem -m 256M ");
>  
> +    daemon_pidfile_path = g_strdup_printf("/tmp/daemon-%d", getpid());
> +

Ugh. Predictable paths directly in /tmp are problematic .. mktemp?
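
Since the storage daemon creates the pidfile itself, one option is to
create a private directory and put the pidfile inside it. A rough
sketch, assuming plain POSIX mkdtemp(3) (mktemp(3) only generates a
name and leaves the same race); make_private_pidfile_path() and the
"vhost-user-blk-test-" and "daemon.pid" names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a private directory (mode 0700) under
 * `tmpdir` with an unpredictable name and return a heap-allocated
 * pidfile path inside it. The pidfile itself is not created here.
 * Returns NULL on failure. The caller frees the path and removes the
 * pidfile and the directory when done.
 */
static char *make_private_pidfile_path(const char *tmpdir)
{
    char dir[PATH_MAX];
    char *path;
    size_t len;
    int n;

    assert(tmpdir != NULL);

    n = snprintf(dir, sizeof(dir), "%s/vhost-user-blk-test-XXXXXX", tmpdir);
    if (n < 0 || (size_t)n >= sizeof(dir)) {
        return NULL;
    }
    /* mkdtemp() replaces the XXXXXX suffix and creates the directory. */
    if (mkdtemp(dir) == NULL) {
        return NULL;
    }

    /* sizeof() of the literal includes the terminating NUL byte. */
    len = strlen(dir) + sizeof("/daemon.pid");
    path = malloc(len);
    if (path == NULL) {
        rmdir(dir);
        return NULL;
    }
    snprintf(path, len, "%s/daemon.pid", dir);
    return path;
}
```

In the test itself the glib equivalent (g_dir_make_tmp) would probably
be the more natural fit; the point is only that the name is not
guessable and the directory is owned by the test.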

>      for (i = 0; i < vus_instances; i++) {
>          int fd;
>          char *sock_path = create_listen_socket(&fd);
> @@ -914,6 +918,9 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
>                                 i + 1, sock_path);
>      }
>  
> +    g_string_append_printf(storage_daemon_command, "--pidfile %s ",
> +                           daemon_pidfile_path);
> +
>      g_test_message("starting vhost-user backend: %s",
>                     storage_daemon_command->str);
>      pid_t pid = fork();
> @@ -930,7 +937,27 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances,
>          execlp("/bin/sh", "sh", "-c", storage_daemon_command->str, NULL);
>          exit(1);
>      }
> +
> +    /*
> +     * FIXME: The loop here ensures the storage-daemon has come up properly
> +     *        before allowing the test to proceed. This is a workaround for
> +     *        a race which used to cause the vhost-user-blk-test to hang. It
> +     *        should be deleted once the root cause is fully understood and
> +     *        fixed.
> +     */
> +    retries = 0;
> +    while (access(daemon_pidfile_path, F_OK) != 0) {
> +        g_assert_cmpint(retries, <, PIDFILE_RETRIES);
> +
> +        retries++;
> +        g_usleep(1000);
> +    }
> +
>      g_string_free(storage_daemon_command, true);
> +    if (access(daemon_pidfile_path, F_OK) == 0) {
> +        unlink(daemon_pidfile_path);
> +    }
> +    g_free(daemon_pidfile_path);
>  
>      qsd = g_new(QemuStorageDaemonState, 1);
>      qsd->pid = pid;
> -- 
> 2.20.1



Thread overview: 6+ messages
2021-10-14  4:32 [PATCH v6] Work around vhost-user-blk-test hang Raphael Norwitz
2021-10-18 21:50 ` Michael S. Tsirkin [this message]
2021-10-18 22:33   ` Raphael Norwitz
2021-10-19  6:20     ` Michael S. Tsirkin
2021-10-19 13:57 ` Stefan Hajnoczi
2021-10-19 14:39   ` Michael S. Tsirkin
