From: Kevin Wolf <kwolf@redhat.com>
To: mingwei <gongwilliam@163.com>
Cc: namei.unix@gmail.com, sheepdog@lists.wpkg.org,
qemu-devel@nongnu.org, qemu-block@nongnu.org, mreitz@redhat.com
Subject: Re: [PATCH v1 1/1] sheepdog driver patch: fixs the problem of qemu process become crashed when the sheepdog gateway break the IO and then recover
Date: Fri, 2 Oct 2020 13:46:00 +0200
Message-ID: <20201002114600.GC4996@linux.fritz.box>
In-Reply-To: <20201001022127.7315-1-gongwilliam@163.com>
On 01.10.2020 at 04:21, mingwei wrote:
> This patch fixes a QEMU process crash that occurs when the sheepdog gateway interrupts I/O for a few seconds and then recovers.
>
> Steps to reproduce:
> 1. Start a fio process inside the QEMU guest to generate I/O to the sheepdog gateway (any I/O type works).
> 2. Kill the sheepdog gateway.
> 3. Wait for a few seconds.
> 4. Restart the sheepdog gateway.
> 5. The QEMU process crashes with "segfault error 6".
Can you post a stack trace?
Signal 6 is not a segfault, but SIGABRT.
> Cause:
> After the reconnect to the sheepdog gateway, the last I/O coroutine is destroyed, but it is still scheduled, and s->co_recv still holds the pointer to that destroyed coroutine. When a coroutine context switch to it happens, the QEMU process crashes.
>
> Fix:
> Set s->co_recv = NULL when the last I/O coroutine reconnects to the sheepdog gateway.
>
> Signed-off-by: mingwei <gongwilliam@163.com>
> ---
> block/sheepdog.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 2f5c0eb376..3a00f0c1e1 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -727,6 +727,7 @@ static coroutine_fn void reconnect_to_sdog(void *opaque)
>                         NULL, NULL, NULL);
>      close(s->fd);
>      s->fd = -1;
> +    s->co_recv = NULL;
If s->co_recv != NULL before this, there is still a coroutine running
that hasn't terminated yet. Don't we need to make sure that the
coroutine actually terminates instead of just overwriting the pointer to
it?
Otherwise, we either leak the coroutine and the memory used for its
stack, or the coroutine continues to run at some point and might
interfere with the operation of the new instance.
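To illustrate the difference, here is a minimal sketch in a simplified, single-threaded model. It is not QEMU's coroutine API: RecvCo, SdState, recv_co_resume(), reconnect_overwrite() and reconnect_wait() are hypothetical names, and "resuming" the coroutine is reduced to a plain function call. Overwriting the pointer leaves the receiving coroutine unfinished, whereas asking it to quit and waiting until it has cleared s->co_recv itself guarantees that it has returned before the new connection is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the receiving coroutine; not QEMU's API. */
typedef struct RecvCo {
    bool quit_requested; /* set by reconnect: stop using the old connection */
    bool finished;       /* coroutine has returned; its stack could be freed */
} RecvCo;

typedef struct SdState {
    RecvCo *co_recv;     /* non-NULL while a reply is being received */
} SdState;

/* One resumption of the receiving coroutine. */
static void recv_co_resume(SdState *s, RecvCo *co)
{
    if (co->finished) {
        return;
    }
    if (co->quit_requested) {
        /* The coroutine drops its own pointer right before returning. */
        assert(s->co_recv == co);
        s->co_recv = NULL;
        co->finished = true;
    }
    /* Otherwise it would keep waiting for data on the old connection. */
}

/* What the patch does: forget the pointer; the coroutine stays unfinished. */
static void reconnect_overwrite(SdState *s)
{
    s->co_recv = NULL;
}

/* Alternative: ask the coroutine to stop and wait until it has terminated. */
static void reconnect_wait(SdState *s)
{
    RecvCo *co = s->co_recv;
    if (co == NULL) {
        return; /* no receive in flight, nothing to wait for */
    }
    co->quit_requested = true;
    while (s->co_recv != NULL) {
        /* Stand-in for letting the woken coroutine run until it returns. */
        recv_co_resume(s, co);
    }
}
```

In real code the wait would presumably be a single wakeup followed by yielding to the event loop rather than a direct call; the model deliberately leaves that detail out.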
>      /* Wait for outstanding write requests to be completed. */
>      while (s->co_send != NULL) {
>          co_write_request(opaque);
>      }
This existing code after your change is wrong, too, by the way. It
potentially calls aio_co_wake() multiple times in a row, which will
crash if it ends up only scheduling the coroutine instead of directly
entering it.
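The repeated wakeup can be sketched the same way. Again this is a hypothetical model, not QEMU code: wake() stands in for aio_co_wake(), with a flag choosing between entering the coroutine directly and only scheduling it. As far as we understand it, QEMU's aio_co_schedule() aborts when a coroutine that is already scheduled is scheduled again; the model returns false at that point instead. The loop shaped like the quoted one only works when the wakeup enters the coroutine directly, while the alternative wakes exactly once and then lets the queued wakeup run until co_send is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical coroutine model; not QEMU's Coroutine type. */
typedef struct Co {
    bool scheduled; /* a deferred wakeup is queued but has not run yet */
    int entered;    /* how many times the coroutine body actually ran */
} Co;

typedef struct State {
    Co *co_send;    /* non-NULL while a write request is in flight */
} State;

/* Body of the sending coroutine: finishes the request and clears co_send. */
static void co_run(State *s, Co *co)
{
    assert(s->co_send == co);
    co->entered++;
    s->co_send = NULL;
}

/*
 * Stand-in for aio_co_wake(): either enters the coroutine directly or only
 * queues it. Returns false where QEMU would abort because the coroutine is
 * already scheduled.
 */
static bool wake(State *s, Co *co, bool direct)
{
    if (direct) {
        co_run(s, co);
        return true;
    }
    if (co->scheduled) {
        return false;
    }
    co->scheduled = true;
    return true;
}

/* Stand-in for one event loop iteration running the queued wakeup. */
static void run_pending(State *s, Co *co)
{
    if (co->scheduled) {
        co->scheduled = false;
        co_run(s, co);
    }
}

/* Same shape as the quoted loop: wakes again on every iteration. */
static bool wait_for_send_naive(State *s, bool direct)
{
    while (s->co_send != NULL) {
        if (!wake(s, s->co_send, direct)) {
            return false; /* second wakeup while the first is still pending */
        }
    }
    return true;
}

/* Wakes exactly once, then lets the queued wakeup run until co_send clears. */
static bool wait_for_send_once(State *s, bool direct)
{
    Co *co = s->co_send;
    if (co == NULL) {
        return true;
    }
    if (!wake(s, co, direct)) {
        return false;
    }
    while (s->co_send != NULL) {
        run_pending(s, co);
    }
    return true;
}
```

With direct entry both variants behave the same; only when the wakeup is deferred does the per-iteration wakeup hit the "already scheduled" case.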
Kevin
Thread overview: 2+ messages
2020-10-01 2:21 [PATCH v1 1/1] sheepdog driver patch: fixs the problem of qemu process become crashed when the sheepdog gateway break the IO and then recover mingwei
2020-10-02 11:46 Kevin Wolf (this message)