public inbox for linux-kernel@vger.kernel.org
* xenbus hang after userspace ctrl-c of xenstore-rm
@ 2019-10-01  9:57 James Dingwall
  2019-10-01 11:33 ` Jürgen Groß
From: James Dingwall @ 2019-10-01  9:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: Boris Ostrovsky, Juergen Gross, Stefano Stabellini

Hi,

I have been investigating a problem where xenstore becomes unresponsive
during domain shutdowns.  My test script triggers a similar hang,
although I cannot say definitively that it is the same problem.  The
issue can be reproduced in dom0 or in a domU.  If the test script is run
in dom0 it seems to be able to affect xenstore access in domUs, but when
it is run in a domU I have not observed any negative impact in dom0 or
other guests.

The environment is a default Ubuntu 5.0.0-29-generic kernel and xen
4.11.3-pre (built from current head of staging-4.11), with xenstore
running in a stubdom.  I did try a kernel with
d10e0cc113c9e1b64b5c6e3db37b5c839794f3df "xenbus: Avoid deadlock during
suspend due to open transactions" but that didn't help; the stack trace
below is with that patch applied.

[ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
[ 2551.492215]       Tainted: P           OE     5.0.0-29-generic #5
[ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2551.528585] xenbus          D    0    37      2 0x80000080
[ 2551.528590] Call Trace:
[ 2551.528603]  __schedule+0x2c0/0x870
[ 2551.528606]  ? _cond_resched+0x19/0x40
[ 2551.528632]  schedule+0x2c/0x70
[ 2551.528637]  xs_talkv+0x1ec/0x2b0
[ 2551.528642]  ? wait_woken+0x80/0x80
[ 2551.528645]  xs_single+0x53/0x80
[ 2551.528648]  xenbus_transaction_end+0x3b/0x70
[ 2551.528651]  xenbus_file_free+0x5a/0x160
[ 2551.528654]  xenbus_dev_queue_reply+0xc4/0x220
[ 2551.528657]  xenbus_thread+0x7de/0x880
[ 2551.528660]  ? wait_woken+0x80/0x80
[ 2551.528665]  kthread+0x121/0x140
[ 2551.528667]  ? xb_read+0x1d0/0x1d0
[ 2551.528670]  ? kthread_park+0x90/0x90
[ 2551.528673]  ret_from_fork+0x35/0x40

From a vanilla Ubuntu 5.0.0-29-generic kernel (seems to be the same):

[ 3639.401276] INFO: task xenbus:37 blocked for more than 120 seconds.
[ 3639.417908]       Tainted: P           OE     5.0.0-29-generic #31~18.04.1-Ubuntu
[ 3639.435642] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3639.453824] xenbus          D    0    37      2 0x80000080
[ 3639.453828] Call Trace:
[ 3639.453837]  __schedule+0x2bd/0x850
[ 3639.453842]  ? __wake_up+0x13/0x20
[ 3639.453844]  ? _cond_resched+0x19/0x40
[ 3639.453845]  schedule+0x2c/0x70
[ 3639.453848]  xs_talkv+0x1e8/0x2a0
[ 3639.453850]  ? wait_woken+0x80/0x80
[ 3639.453852]  xs_single+0x53/0x80
[ 3639.453853]  xenbus_transaction_end+0x3b/0x70
[ 3639.453855]  xenbus_file_free+0x5a/0x160
[ 3639.453857]  xenbus_dev_queue_reply+0xc4/0x220
[ 3639.453859]  xenbus_thread+0x7de/0x880
[ 3639.453861]  ? wait_woken+0x80/0x80
[ 3639.453864]  kthread+0x121/0x140
[ 3639.453865]  ? xb_read+0x1d0/0x1d0
[ 3639.453867]  ? kthread_park+0x90/0x90
[ 3639.453870]  ret_from_fork+0x35/0x40

To reproduce this I run the script once and allow it to complete.  On a
second run, if I ctrl-c while it is doing the xenstore-rm, it exits but
further xenstore commands are then unresponsive.  I haven't tried to
reduce the numbers, but I assume that interrupting the xenstore-rm with
only a small number of keys would have the same result.  I wrote this
test on a hunch that during domain shutdown the toolstack cleans up by
removing the keys for the shutdown domain.

I'm happy to provide further information about the configuration or run 
other tests.

Thanks,
James

----- Test script -----

#!/bin/bash

set -eu

# just in case xenstore stubdom tells us anything
xl set-parameters guest_loglvl=debug

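tree="${1:-/tree}"   # xenstore path to populate (default: /tree)
xcount="256"         # characters per leaf value
branches="256"       # branch nodes under ${tree}
leaves="256"         # leaf nodes per branch
# build a string of ${xcount} "x" characters to use as each leaf value
xstring="$(awk 'BEGIN { while (c++<'"${xcount}"') printf "x" }')"

echo "writing approximately $((xcount * branches * leaves)) bytes of data to xenstore under ${tree}"

# remove any tree left by a previous run; on the second run this is the
# command to interrupt with ctrl-c
xenstore-rm "${tree}"
xenstore-write "${tree}" ""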

for branch in $(seq 1 "${branches}") ; do
    echo "filling branch ${branch} of ${branches}"
    xenstore-write "${tree}/branch${branch}" ""
    for leaf in $(seq 1 "${leaves}") ; do
        xenstore-write "${tree}/branch${branch}/leaf${leaf}" "${xstring}"
    done
done
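
The manual ctrl-c step described above could probably be scripted.  The
following is only a sketch and was not part of the original report: the
helper names interrupt_after and responds_within are hypothetical, and
it assumes GNU coreutils timeout is available.  timeout runs the command
in the foreground, sends SIGINT after the given delay, and exits with
status 124 when it had to do so.

```shell
#!/bin/bash
# Hypothetical helpers (not from the original report) to automate the
# "ctrl-c during xenstore-rm" step and then probe xenstore.

# Run a command and send it SIGINT after $1 seconds.
# Returns 124 if the command was still running and had to be signalled,
# otherwise the command's own exit status.
interrupt_after() {
    local delay="$1"
    shift
    timeout --signal=INT "${delay}" "$@"
}

# Return 0 if the command finishes within $1 seconds (whatever its exit
# status), 1 if it had to be timed out.
responds_within() {
    local limit="$1"
    shift
    local rc=0
    timeout "${limit}" "$@" >/dev/null 2>&1 || rc=$?
    [ "${rc}" -ne 124 ]
}
```

After a first complete run of the test script, something like
"interrupt_after 1 xenstore-rm /tree" followed by
"responds_within 10 xenstore-ls /" might then show the hang.  The delay
is a guess and has to fall while the removal is still in progress, and
if the probing command ends up in uninterruptible sleep timeout itself
would block as well.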



* Re: xenbus hang after userspace ctrl-c of xenstore-rm
  2019-10-01  9:57 xenbus hang after userspace ctrl-c of xenstore-rm James Dingwall
@ 2019-10-01 11:33 ` Jürgen Groß
From: Jürgen Groß @ 2019-10-01 11:33 UTC (permalink / raw)
  To: James Dingwall, linux-kernel; +Cc: Stefano Stabellini, Boris Ostrovsky

On 01.10.19 11:57, James Dingwall wrote:
> Hi,
> 
> I have been investigating a problem where xenstore becomes unresponsive
> during domain shutdowns.  My test script triggers a similar hang,
> although I cannot say definitively that it is the same problem.  The
> issue can be reproduced in dom0 or in a domU.  If the test script is run
> in dom0 it seems to be able to affect xenstore access in domUs, but when
> it is run in a domU I have not observed any negative impact in dom0 or
> other guests.
> 
> The environment is a default Ubuntu 5.0.0-29-generic kernel and xen
> 4.11.3-pre (built from current head of staging-4.11), with xenstore
> running in a stubdom.  I did try a kernel with
> d10e0cc113c9e1b64b5c6e3db37b5c839794f3df "xenbus: Avoid deadlock during
> suspend due to open transactions" but that didn't help; the stack trace
> below is with that patch applied.
> 
> [ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
> [ 2551.492215]       Tainted: P           OE     5.0.0-29-generic #5
> [ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2551.528585] xenbus          D    0    37      2 0x80000080
> [ 2551.528590] Call Trace:
> [ 2551.528603]  __schedule+0x2c0/0x870
> [ 2551.528606]  ? _cond_resched+0x19/0x40
> [ 2551.528632]  schedule+0x2c/0x70
> [ 2551.528637]  xs_talkv+0x1ec/0x2b0
> [ 2551.528642]  ? wait_woken+0x80/0x80
> [ 2551.528645]  xs_single+0x53/0x80
> [ 2551.528648]  xenbus_transaction_end+0x3b/0x70
> [ 2551.528651]  xenbus_file_free+0x5a/0x160
> [ 2551.528654]  xenbus_dev_queue_reply+0xc4/0x220
> [ 2551.528657]  xenbus_thread+0x7de/0x880
> [ 2551.528660]  ? wait_woken+0x80/0x80
> [ 2551.528665]  kthread+0x121/0x140
> [ 2551.528667]  ? xb_read+0x1d0/0x1d0
> [ 2551.528670]  ? kthread_park+0x90/0x90
> [ 2551.528673]  ret_from_fork+0x35/0x40

Yes, this is a self-deadlock when cleaning up a user's file context.
Thanks for the nice debug data. :-)

I need to do the cleanup via a workqueue instead of calling it directly.

Cooking up a patch now...


Juergen
