* [Qemu-devel] [PATCH v2 1/3] nbd-client: avoid read_reply_co entry if send failed
From: Stefan Hajnoczi @ 2017-08-29 12:27 UTC
To: qemu-devel
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Paolo Bonzini,
qemu-block, Eric Blake, Stefan Hajnoczi
The following segfault is encountered if the NBD server closes the UNIX
domain socket immediately after negotiation:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441
441 QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
(gdb) bt
#0 0x000000d3c01a50f8 in aio_co_schedule (ctx=0x0, co=0xd3c0ff2ef0) at util/async.c:441
#1 0x000000d3c012fa90 in nbd_coroutine_end (bs=bs@entry=0xd3c0fec650, request=<optimized out>) at block/nbd-client.c:207
#2 0x000000d3c012fb58 in nbd_client_co_preadv (bs=0xd3c0fec650, offset=0, bytes=<optimized out>, qiov=0x7ffc10a91b20, flags=0) at block/nbd-client.c:237
#3 0x000000d3c0128e63 in bdrv_driver_preadv (bs=bs@entry=0xd3c0fec650, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:836
#4 0x000000d3c012c3e0 in bdrv_aligned_preadv (child=child@entry=0xd3c0ff51d0, req=req@entry=0x7f31885d6e90, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=1, qiov=qiov@entry=0x7ffc10a91b20, flags=0) at block/io.c:1086
#5 0x000000d3c012c6b8 in bdrv_co_preadv (child=0xd3c0ff51d0, offset=offset@entry=0, bytes=bytes@entry=512, qiov=qiov@entry=0x7ffc10a91b20, flags=flags@entry=0) at block/io.c:1182
#6 0x000000d3c011cc17 in blk_co_preadv (blk=0xd3c0ff4f80, offset=0, bytes=512, qiov=0x7ffc10a91b20, flags=0) at block/block-backend.c:1032
#7 0x000000d3c011ccec in blk_read_entry (opaque=0x7ffc10a91b40) at block/block-backend.c:1079
#8 0x000000d3c01bbb96 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
#9 0x00007f3196cb8600 in __start_context () at /lib64/libc.so.6
The problem is that nbd_client_init() calls
nbd_client_attach_aio_context(), which schedules read_reply_co with
aio_co_schedule(new_context, client->read_reply_co).  Execution of
read_reply_co is deferred to a BH that doesn't run until later.
In the meantime blk_co_preadv() can be called and nbd_coroutine_end()
calls aio_co_wake() on read_reply_co.  At that point read_reply_co's
ctx hasn't been set yet because the coroutine has never been entered.
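For reference, the failing wake path can be sketched as follows.  This
is a condensed illustration only (the names match QEMU's util/async.c
and block/nbd-client.c, but the bodies are simplified and fast paths
are omitted), not the verbatim code:

  /* Condensed illustration of the crash, matching the backtrace above. */
  void aio_co_wake(Coroutine *co)
  {
      /* co->ctx is only assigned when the coroutine is actually entered.
       * read_reply_co has merely been aio_co_schedule()d from
       * nbd_client_attach_aio_context(); the BH that enters it has not
       * run yet, so ctx is still NULL here. */
      AioContext *ctx = co->ctx;

      aio_co_enter(ctx, co);    /* not the current context, so ... */
  }

  void aio_co_schedule(AioContext *ctx, Coroutine *co)
  {
      /* ... we end up here with ctx == NULL, and the insertion below
       * dereferences it, giving frame #0 of the backtrace. */
      QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                                co, co_scheduled_next);
      qemu_bh_schedule(ctx->co_schedule_bh);
  }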
This patch simplifies the nbd_co_send_request() ->
nbd_co_receive_reply() -> nbd_coroutine_end() lifecycle to just
nbd_co_send_request() -> nbd_co_receive_reply(). The request is "ended"
if an error occurs at any point. Callers no longer have to invoke
nbd_coroutine_end().
This cleanup also eliminates the segfault because we don't call
aio_co_schedule() to wake up s->read_reply_co if sending the request
failed. It is only necessary to wake up s->read_reply_co if a reply was
received.
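To illustrate, after this change a typical caller looks roughly like
the following (condensed from the nbd_client_co_flush() hunk below; the
flush feature check and request setup are omitted):

  int nbd_client_co_flush(BlockDriverState *bs)
  {
      NBDClientSession *client = nbd_get_client_session(bs);
      NBDRequest request = { .type = NBD_CMD_FLUSH };
      NBDReply reply;
      int ret;

      ret = nbd_co_send_request(bs, &request, NULL);
      if (ret < 0) {
          /* Send failed: nbd_co_send_request() has already released the
           * request slot and kicked free_sema, and read_reply_co is not
           * woken, so there is nothing left to clean up here. */
          reply.error = -ret;
      } else {
          /* The receive path now ends the request itself. */
          nbd_co_receive_reply(client, &request, &reply, NULL);
      }
      return -reply.error;
  }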
Note that this only happens with UNIX domain sockets on Linux.  It
doesn't seem possible to reproduce the crash with TCP sockets.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/nbd-client.c | 25 +++++++++----------------
1 file changed, 9 insertions(+), 16 deletions(-)
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 25bcaa2346..ea728fffc8 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -144,12 +144,12 @@ static int nbd_co_send_request(BlockDriverState *bs,
request->handle = INDEX_TO_HANDLE(s, i);
if (s->quit) {
- qemu_co_mutex_unlock(&s->send_mutex);
- return -EIO;
+ rc = -EIO;
+ goto err;
}
if (!s->ioc) {
- qemu_co_mutex_unlock(&s->send_mutex);
- return -EPIPE;
+ rc = -EPIPE;
+ goto err;
}
if (qiov) {
@@ -166,8 +166,13 @@ static int nbd_co_send_request(BlockDriverState *bs,
} else {
rc = nbd_send_request(s->ioc, request);
}
+
+err:
if (rc < 0) {
s->quit = true;
+ s->requests[i].coroutine = NULL;
+ s->in_flight--;
+ qemu_co_queue_next(&s->free_sema);
}
qemu_co_mutex_unlock(&s->send_mutex);
return rc;
@@ -201,13 +206,6 @@ static void nbd_co_receive_reply(NBDClientSession *s,
/* Tell the read handler to read another header. */
s->reply.handle = 0;
}
-}
-
-static void nbd_coroutine_end(BlockDriverState *bs,
- NBDRequest *request)
-{
- NBDClientSession *s = nbd_get_client_session(bs);
- int i = HANDLE_TO_INDEX(s, request->handle);
s->requests[i].coroutine = NULL;
@@ -243,7 +241,6 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
} else {
nbd_co_receive_reply(client, &request, &reply, qiov);
}
- nbd_coroutine_end(bs, &request);
return -reply.error;
}
@@ -272,7 +269,6 @@ int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
} else {
nbd_co_receive_reply(client, &request, &reply, NULL);
}
- nbd_coroutine_end(bs, &request);
return -reply.error;
}
@@ -306,7 +302,6 @@ int nbd_client_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
} else {
nbd_co_receive_reply(client, &request, &reply, NULL);
}
- nbd_coroutine_end(bs, &request);
return -reply.error;
}
@@ -330,7 +325,6 @@ int nbd_client_co_flush(BlockDriverState *bs)
} else {
nbd_co_receive_reply(client, &request, &reply, NULL);
}
- nbd_coroutine_end(bs, &request);
return -reply.error;
}
@@ -355,7 +349,6 @@ int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
} else {
nbd_co_receive_reply(client, &request, &reply, NULL);
}
- nbd_coroutine_end(bs, &request);
return -reply.error;
}
--
2.13.5
* [Qemu-devel] [PATCH v2 3/3] qemu-iotests: test NBD over UNIX domain sockets in 083
From: Stefan Hajnoczi @ 2017-08-29 12:27 UTC
To: qemu-devel
Cc: Kevin Wolf, Vladimir Sementsov-Ogievskiy, Paolo Bonzini,
qemu-block, Eric Blake, Stefan Hajnoczi
083 currently tests only TCP, and some failures might be specific to
UNIX domain sockets.
A few adjustments are necessary:
1. Generating a port number and waiting for server startup is
TCP-specific. Use the new nbd-fault-injector.py startup protocol to
fetch the address (see the sketch after this list). This is a little
more elegant because we no longer need netstat.
2. The NBD filter does not work for the UNIX domain socket URIs we
generate and must be extended.
3. Run all tests twice: once for TCP and once for UNIX domain sockets.
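To illustrate point 1 and the URIs that point 2 has to cope with, the
reworked check_disconnect() helper ends up doing roughly the following
(condensed from the diff below):

  if [ "$proto" = "tcp" ]; then
      nbd_addr="127.0.0.1:0"              # port 0: the kernel picks a free port
  else
      nbd_addr="$TEST_DIR/nbd.sock"
  fi

  # nbd-fault-injector.py prints "Listening on <address>" once it is
  # ready, so poll its output instead of using netstat.
  while ! grep -q 'Listening on ' "$TEST_DIR/nbd-fault-injector.out"; do
      sleep 0.1
  done
  nbd_addr=$(sed 's/Listening on \(.*\)$/\1/' "$TEST_DIR/nbd-fault-injector.out")

  if [ "$proto" = "tcp" ]; then
      nbd_url="nbd+tcp://$nbd_addr/$export_name"
  else
      nbd_url="nbd+unix:///$export_name?socket=$nbd_addr"
  fi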
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
tests/qemu-iotests/083 | 138 +++++++++++++++++++++++--------------
tests/qemu-iotests/083.out | 145 ++++++++++++++++++++++++++++++++++-----
tests/qemu-iotests/common.filter | 4 +-
3 files changed, 215 insertions(+), 72 deletions(-)
diff --git a/tests/qemu-iotests/083 b/tests/qemu-iotests/083
index bff9360048..0306f112da 100755
--- a/tests/qemu-iotests/083
+++ b/tests/qemu-iotests/083
@@ -27,6 +27,14 @@ echo "QA output created by $seq"
here=`pwd`
status=1 # failure is the default!
+_cleanup()
+{
+ rm -f nbd.sock
+ rm -f nbd-fault-injector.out
+ rm -f nbd-fault-injector.conf
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
# get standard environment, filters and checks
. ./common.rc
. ./common.filter
@@ -35,81 +43,105 @@ _supported_fmt generic
_supported_proto nbd
_supported_os Linux
-# Pick a TCP port based on our pid. This way multiple instances of this test
-# can run in parallel without conflicting.
-choose_tcp_port() {
- echo $((($$ % 31744) + 1024)) # 1024 <= port < 32768
-}
-
-wait_for_tcp_port() {
- while ! (netstat --tcp --listening --numeric | \
- grep "$1.*0\\.0\\.0\\.0:\\*.*LISTEN") >/dev/null 2>&1; do
- sleep 0.1
- done
-}
-
check_disconnect() {
+ local event export_name=foo extra_args nbd_addr nbd_url proto when
+
+ while true; do
+ case $1 in
+ --classic-negotiation)
+ shift
+ extra_args=--classic-negotiation
+ export_name=
+ ;;
+ --tcp)
+ shift
+ proto=tcp
+ ;;
+ --unix)
+ shift
+ proto=unix
+ ;;
+ *)
+ break
+ ;;
+ esac
+ done
+
event=$1
when=$2
- negotiation=$3
echo "=== Check disconnect $when $event ==="
echo
- port=$(choose_tcp_port)
-
cat > "$TEST_DIR/nbd-fault-injector.conf" <<EOF
[inject-error]
event=$event
when=$when
EOF
- if [ "$negotiation" = "--classic-negotiation" ]; then
- extra_args=--classic-negotiation
- nbd_url="nbd:127.0.0.1:$port"
+ if [ "$proto" = "tcp" ]; then
+ nbd_addr="127.0.0.1:0"
else
- nbd_url="nbd:127.0.0.1:$port:exportname=foo"
+ nbd_addr="$TEST_DIR/nbd.sock"
+ fi
+
+ rm -f "$TEST_DIR/nbd.sock"
+
+ $PYTHON nbd-fault-injector.py $extra_args "$nbd_addr" "$TEST_DIR/nbd-fault-injector.conf" >"$TEST_DIR/nbd-fault-injector.out" 2>&1 &
+
+ # Wait for server to be ready
+ while ! grep -q 'Listening on ' "$TEST_DIR/nbd-fault-injector.out"; do
+ sleep 0.1
+ done
+
+ # Extract the final address (port number has now been assigned in tcp case)
+ nbd_addr=$(sed 's/Listening on \(.*\)$/\1/' "$TEST_DIR/nbd-fault-injector.out")
+
+ if [ "$proto" = "tcp" ]; then
+ nbd_url="nbd+tcp://$nbd_addr/$export_name"
+ else
+ nbd_url="nbd+unix:///$export_name?socket=$nbd_addr"
fi
- $PYTHON nbd-fault-injector.py $extra_args "127.0.0.1:$port" "$TEST_DIR/nbd-fault-injector.conf" >/dev/null 2>&1 &
- wait_for_tcp_port "127\\.0\\.0\\.1:$port"
$QEMU_IO -c "read 0 512" "$nbd_url" 2>&1 | _filter_qemu_io | _filter_nbd
echo
}
-for event in neg1 "export" neg2 request reply data; do
- for when in before after; do
- check_disconnect "$event" "$when"
- done
-
- # Also inject short replies from the NBD server
- case "$event" in
- neg1)
- for when in 8 16; do
- check_disconnect "$event" "$when"
- done
- ;;
- "export")
- for when in 4 12 16; do
- check_disconnect "$event" "$when"
- done
- ;;
- neg2)
- for when in 8 10; do
- check_disconnect "$event" "$when"
+for proto in tcp unix; do
+ for event in neg1 "export" neg2 request reply data; do
+ for when in before after; do
+ check_disconnect "--$proto" "$event" "$when"
done
- ;;
- reply)
- for when in 4 8; do
- check_disconnect "$event" "$when"
- done
- ;;
- esac
-done
-# Also check classic negotiation without export information
-for when in before 8 16 24 28 after; do
- check_disconnect "neg-classic" "$when" --classic-negotiation
+ # Also inject short replies from the NBD server
+ case "$event" in
+ neg1)
+ for when in 8 16; do
+ check_disconnect "--$proto" "$event" "$when"
+ done
+ ;;
+ "export")
+ for when in 4 12 16; do
+ check_disconnect "--$proto" "$event" "$when"
+ done
+ ;;
+ neg2)
+ for when in 8 10; do
+ check_disconnect "--$proto" "$event" "$when"
+ done
+ ;;
+ reply)
+ for when in 4 8; do
+ check_disconnect "--$proto" "$event" "$when"
+ done
+ ;;
+ esac
+ done
+
+ # Also check classic negotiation without export information
+ for when in before 8 16 24 28 after; do
+ check_disconnect "--$proto" --classic-negotiation "neg-classic" "$when"
+ done
done
# success, all done
diff --git a/tests/qemu-iotests/083.out b/tests/qemu-iotests/083.out
index a24c6bfece..a7fb081889 100644
--- a/tests/qemu-iotests/083.out
+++ b/tests/qemu-iotests/083.out
@@ -1,43 +1,43 @@
QA output created by 083
=== Check disconnect before neg1 ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect after neg1 ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect 8 neg1 ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect 16 neg1 ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect before export ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect after export ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect 4 export ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect 12 export ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect 16 export ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect before neg2 ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect after neg2 ===
@@ -45,11 +45,11 @@ read failed: Input/output error
=== Check disconnect 8 neg2 ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect 10 neg2 ===
-can't open device nbd:127.0.0.1:PORT:exportname=foo
+can't open device nbd+tcp://127.0.0.1:PORT/foo
=== Check disconnect before request ===
@@ -88,23 +88,134 @@ read 512/512 bytes at offset 0
=== Check disconnect before neg-classic ===
-can't open device nbd:127.0.0.1:PORT
+can't open device nbd+tcp://127.0.0.1:PORT/
=== Check disconnect 8 neg-classic ===
-can't open device nbd:127.0.0.1:PORT
+can't open device nbd+tcp://127.0.0.1:PORT/
=== Check disconnect 16 neg-classic ===
-can't open device nbd:127.0.0.1:PORT
+can't open device nbd+tcp://127.0.0.1:PORT/
=== Check disconnect 24 neg-classic ===
-can't open device nbd:127.0.0.1:PORT
+can't open device nbd+tcp://127.0.0.1:PORT/
=== Check disconnect 28 neg-classic ===
-can't open device nbd:127.0.0.1:PORT
+can't open device nbd+tcp://127.0.0.1:PORT/
+
+=== Check disconnect after neg-classic ===
+
+read failed: Input/output error
+
+=== Check disconnect before neg1 ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect after neg1 ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect 8 neg1 ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect 16 neg1 ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect before export ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect after export ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect 4 export ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect 12 export ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect 16 export ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect before neg2 ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect after neg2 ===
+
+read failed: Input/output error
+
+=== Check disconnect 8 neg2 ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect 10 neg2 ===
+
+can't open device nbd+unix:///foo?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect before request ===
+
+read failed: Input/output error
+
+=== Check disconnect after request ===
+
+read failed: Input/output error
+
+=== Check disconnect before reply ===
+
+read failed: Input/output error
+
+=== Check disconnect after reply ===
+
+read failed: Input/output error
+
+=== Check disconnect 4 reply ===
+
+read failed
+read failed: Input/output error
+
+=== Check disconnect 8 reply ===
+
+read failed
+read failed: Input/output error
+
+=== Check disconnect before data ===
+
+read failed: Input/output error
+
+=== Check disconnect after data ===
+
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Check disconnect before neg-classic ===
+
+can't open device nbd+unix:///?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect 8 neg-classic ===
+
+can't open device nbd+unix:///?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect 16 neg-classic ===
+
+can't open device nbd+unix:///?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect 24 neg-classic ===
+
+can't open device nbd+unix:///?socket=TEST_DIR/nbd.sock
+
+=== Check disconnect 28 neg-classic ===
+
+can't open device nbd+unix:///?socket=TEST_DIR/nbd.sock
=== Check disconnect after neg-classic ===
diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 7a58e57317..9d5442ecd9 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -170,9 +170,9 @@ _filter_nbd()
#
# Filter out the TCP port number since this changes between runs.
sed -e '/nbd\/.*\.c:/d' \
- -e 's#nbd:\(//\)\?127\.0\.0\.1:[0-9]*#nbd:\1127.0.0.1:PORT#g' \
+ -e 's#127\.0\.0\.1:[0-9]*#127.0.0.1:PORT#g' \
-e "s#?socket=$TEST_DIR#?socket=TEST_DIR#g" \
- -e 's#\(exportname=foo\|PORT\): Failed to .*$#\1#'
+ -e 's#\(foo\|PORT/\?\|.sock\): Failed to .*$#\1#'
}
# make sure this script returns success
--
2.13.5