From: George Dunlap <george.dunlap@eu.citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	konrad wilk <konrad.wilk@oracle.com>
Subject: Re: Xen 4.3 + tmem = Xen BUG at domain_page.c:143
Date: Wed, 12 Jun 2013 14:16:20 +0100
Message-ID: <51B874A4.8070906@eu.citrix.com>
In-Reply-To: <51B881BD02000078000DD907@nat28.tlf.novell.com>

On 12/06/13 13:12, Jan Beulich wrote:
>>>> On 12.06.13 at 13:00, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>> create ^
>> title it map_domain_page second-stage emergency fallback path never taken
>> thanks
>>
>> On Tue, Jun 11, 2013 at 7:52 PM, konrad wilk <konrad.wilk@oracle.com> wrote:
>>>> The BUG_ON() here is definitely valid - a few lines down, after the
>>>> enclosing if(), we use it in ways that require this to not have
>>>> triggered. It basically tells you whether an in-range idx was found,
>>>> which apparently isn't the case here.
>>>>
>>>> As I think George already pointed out - printing accum here would
>>>> be quite useful: It should have at least one of the low 32 bits set,
>>>> given that dcache->entries must be at most 32 according to the
>>>> data you already got logged.
>>>
>>> With extra debugging (see attached patch)
>>>
>>> (XEN) domain_page.c:125:d1 mfn: 1eb483, [0]: bffff1ff, ~ffffffff40000e00,
>>> idx: 9 garbage: 40000e00, inuse: ffffffff
>>> (XEN) domain_page.c:125:d1 mfn: 1eb480, [0]: fdbfffff, ~ffffffff02400000,
>>> idx: 22 garbage: 2400000, inuse: ffffffff
>>> (XEN) domain_page.c:125:d1 mfn: 2067ca, [0]: fffff7ff, ~ffffffff00000800,
>>> idx: 11 garbage: 800, inuse: ffffffff
>>> (XEN) domain_page.c:125:d1 mfn: 183642, [0]: ffffffff, ~ffffffff00000000,
>>> idx: 32 garbage: 0, inuse: ffffffff
>> So regardless of the fact that tmem is obviously holding what are
>> supposed to be short-term references for so long, there is something
>> that seems not quite right about this failure path.
>>
>> It looks like the algorithm is:
>> 1. Clean the garbage map and update the inuse list
>> 2. If anything has been cleaned up, use the first not-inuse entry
>> 3. Otherwise, do something else ("replace a hash entry" -- not sure
>> exactly what that means).
>>
>> What we see above is that this failure path succeeds three times, but
>> fails the fourth time: there are, in fact, no zero entries after the
>> garbage clean-up; however, because "inuse" is 32-bit (effectively) and
>> "accum" is 64-bit, ~inuse always has bits 32-63 set, and so will
>> always return true and never fall back to the "something else"
> Right, that's what occurred to me too yesterday, but then again
> I knew I had seen this code path executed. Now that I look again,
> I think I understand why: All of my Dom0-s and typical DomU-s
> have a vCPU count divisible by 4, and with MAPCACHE_VCPU_ENTRIES
> being 16, the full unsigned long would always be used.
>
>> This is probably not something we need to fix for 4.3, but we should
>> put it on our to-do list.
> Actually I think we should fix this right away.

How often is the second path taken in practice?

And, you said this doesn't happen with debug=n builds -- why not exactly?

I'm trying to assess the actual risk of not fixing it, vs the risk of 
fixing it.

  -George

