From: Ian Campbell <Ian.Campbell@citrix.com>
To: "greg@enjellic.com" <greg@enjellic.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Basic blktap2 functionality issues.
Date: Tue, 10 Apr 2012 10:04:52 +0000 [thread overview]
Message-ID: <1334052292.12209.164.camel@dagon.hellion.org.uk> (raw)
In-Reply-To: <201204071559.q37Fx5u3004223@wind.enjellic.com>
On Sat, 2012-04-07 at 16:59 +0100, Dr. Greg Wettstein wrote:
> On Mar 30, 12:23pm, Ian Campbell wrote:
> } Subject: Re: [Xen-devel] Basic blktap2 functionality issues.
>
> Hi, hope the weekend is going well for everyone.
>
> Sorry for the delay in getting back to everyone on this, had a
> deadline on another project.
>
> > On Fri, 2012-03-30 at 09:17 +0100, Ian Campbell wrote:
> > > I think an approach worth trying would be to have
> > > tapdisk_control_detach_vbd respond to TAPDISK_MESSAGE_DETACH before
> > > doing the actual detach. i.e. it would respond with "Yes, I will do
> > > that" rather than "Yes, I have done that". My speculation is that this
> > > will allow libxl to continue and hopefully avoid the deadlock.
>
> > This seems to be the case as the following fixes things for
> > me. Thanks very much for your analysis which lead me to this
> > solution...
>
> I ported your fix into 4.1.2 but I think we still have a problem, at
> least in this codebase.
>
> I no longer see the select timeout delay when xl shuts down but upon
> shutdown the minor number is not freed. A 'tap-ctl list' shows a
> steadily increasing set of orphaned minor numbers as VM's are started
> up and shutdown.
>
> Are you seeing this in your development codebase?
It turns out that I am, yes. e.g. after starting/stopping a guest 3
times:
# tap-ctl list
- 0 - - -
- 1 - - -
- 2 - - -
> The culprit is a failed ioctl call for BLKTAP2_IOCTL_FREE_TAP in
> tap_ctl_free(). The underlying reason for the ioctl failure is the
> check in [linuxsrc]:drivers/block/blktap/ring.c:blktap_ring_destroy()
drivers/*xen*/... right? Or do you have a different blktap to me?
> for whether or not the task_struct pointer in the blktap_ring
> structure has been NULLed.
>
> Which certainly makes sense since there is a race between xl's call to
> tap_ctl_free() and tapdisk2 getting to the point where it can close
> its descriptor to the blktap instance and thus invoke the .release
> method which translates into a call to blktap_ring_release() which
> NULL's the task_struct pointer.
Sounds right, but then you would expect both the ioctl and release path
to cleanup, depending on who loses the race?
There also seems to be a BLKTAP_SHUTDOWN_REQUESTED bit which looks like
it is involved somehow...
We've gotten way beyond my understanding of blktap internals though.
> If you are not seeing the orphan minor numbers there must be ordering
> changes in the unstable version of xl which eliminate or alters the
> race timing.
>
> For the sake of completeness of information for this thread I captured
> the following stack trace of a tapdisk2 when it is deadlocked against
> xl:
[... thanks ...]
> Let me know if you are seeing the issues I'm seeing, in the meantime I
> will keep hunting to see if I can rundown the ultimate cause of the
> deadlock. Given the above trace it has to be an issue with xl
> orchestrating the release of resources which reference the tapdev
> block device.
I'm not convinced that the shutdown stuff on the kernel side isn't
either horribly broken or at best fragile (perhaps xl just tickles it
differently to xend).
Please do keep hunting and let us know what you find...
Thanks,
Ian
next prev parent reply other threads:[~2012-04-10 10:04 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-04-07 15:59 Basic blktap2 functionality issues Dr. Greg Wettstein
2012-04-10 10:04 ` Ian Campbell [this message]
-- strict thread matches above, loose matches on Subject: below --
2012-04-17 15:55 Dr. Greg Wettstein
2012-04-11 15:13 Dr. Greg Wettstein
2012-03-29 14:39 greg
2012-03-29 17:35 ` Stefano Stabellini
2012-03-30 8:02 ` Ian Campbell
2012-03-30 8:17 ` Ian Campbell
2012-03-30 11:23 ` Ian Campbell
2012-04-02 15:51 ` Ian Jackson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1334052292.12209.164.camel@dagon.hellion.org.uk \
--to=ian.campbell@citrix.com \
--cc=greg@enjellic.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).