All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Keir Fraser <keir.xen@gmail.com>
Cc: "Keir (Xen.org)" <keir@xen.org>, Jan Beulich <jbeulich@suse.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: Hypervisor memory leak/corruption because of guest irqs
Date: Fri, 7 Sep 2012 20:08:09 +0100	[thread overview]
Message-ID: <504A4619.90707@citrix.com> (raw)
In-Reply-To: <CC6FFE8E.3E156%keir.xen@gmail.com>


On 07/09/12 19:42, Keir Fraser wrote:
> Seems it means noone thought properly about teardown of guest-bound irqs.
> Probably because a lot of that code was dumbly ported over from Linux later.
>
> As for abuse of desc->action, you could turn that field explicitly into a
> discriminated union; it is already precisely discriminated by
> desc->status&IRQ_GUEST. Apart from that syntactic sugar, the idea of having
> that pointer point at two different things dependent on irq type doesn't
> seem ugly to me -- if it's irq-bound then it does not have, nor does it
> need, an irqaction. But the irq_guest_action of course has to be dealt with
> and your problem is that noone has thought about it!
>
>  -- Keir

Its not completely trivial to turn it into a discriminated union (in an
acceptable way), as irq_desc and irqaction is common while
irq_guest_action_t is x86 specific (and irq.c private, currently),
meaning the introduction of an arch_irqaction.

Now that the two actions are different, you could perhaps convert
__do_IRQ_guest() to be the 'handler' of the irqaction, which appears to
cause large amounts of "if(desc->status & IRQ_GUEST) then do something"
code to fall out.  Continuing along this line leads to large amounts of
code change.

I will see about working on a bare minimum patch to make the teardown of
guest IRQs safe, but I dont think it will be very small, and it will
open the way for some cleanup.

~Andrew

>
> On 07/09/2012 19:04, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:
>
>> Hello,
>>
>> I appear to have opened a can of worms here.  This was discovered when
>> turning on ASSERT()s, looking for another crash (and adding in 98 new
>> ASSERTs, along with some 'type checking' magic constants)
>>
>> The issue has been discovered against Xen 4.1.3, but a cursory
>> inspection of unstable shows no signs of it being fixed. The relevant
>> assertion (which I added) is attached.  (In the process of debugging, I
>> also developed the ASSERT_PRINTK(bool, fmt, ...) macro which will be
>> upstreamed in due course.)
>>
>> The root cause of the problem is the compelete abuse of the
>> irq_desc->action pointer being cast to a irq_guest_action_t* when
>> in-fact it is an irqaction*, but the (correct) solution is not easy.
>>
>> destroy_irq() calls dynamic_irq_cleanup() which xfree()'s desc->action.
>> This would be all well and fine if it were only an irqaction pointer.
>> However, in this case, it is actually an irq_guest_action_t pointer,
>> meaning that we have free()'d an inactive timer, which is on a pcpu's
>> inactive timer linked list.  This means that as soon as the free()'d
>> memory is reused for something new, the linked list gets trashed, which
>> which point all bets are off with regards to the validity of hypervisor
>> memory.
>>
>> As far as I can tell, this bug only manifests in combination with PCI
>> Passthrough, as we only perform cleanup of guest irqs when a domain with
>> passthrough is shut down.  The issue was first found by the ASSERT()s in
>> __list_del(), when something tried to use the pcpu inactive timer list,
>> after the free()'d memory was reused.
>>
>> In this specific case, a quick and dirty hack would be to check every
>> time we free an action and possibly kill the timer if it is a guest irq.
>>
>> Having said that, it is not a correct fix; the utter abuse of
>> irq_desc->action has been a ticking timebomb for a long time.
>> irq_guest_action_t is private to the 2nd half of irq.c(x86), whereas
>> irqaction is common and architecture independent.  The only acceptable
>> solution I see is to re-architect a substantial proportion of the irq code.
>>
>> Am I missing something obvious? or is the best way to continue (in which
>> case I have my work cut out as, it is currently affecting XenServer
>> customers) ?
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

  reply	other threads:[~2012-09-07 19:08 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-07 18:04 Hypervisor memory leak/corruption because of guest irqs Andrew Cooper
2012-09-07 18:42 ` Keir Fraser
2012-09-07 19:08   ` Andrew Cooper [this message]
2012-09-10 12:59     ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=504A4619.90707@citrix.com \
    --to=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=keir.xen@gmail.com \
    --cc=keir@xen.org \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.