public inbox for linux-kernel@vger.kernel.org
From: ebiederm@xmission.com (Eric W. Biederman)
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>,
	Ian Campbell <ijc@hellion.org.uk>,
	Joel Becker <Joel.Becker@oracle.com>,
	Jody Belka <lists-lkml@pimb.org>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Thomas Gleixner <tglx@linutronix.de>,
	Andi Kleen <andi@firstfloor.org>,
	Mika Penttila <mika.penttila@kolumbus.fi>
Subject: Re: 2.6.25-rc1 xen pvops regression
Date: Thu, 21 Feb 2008 15:04:08 -0700
Message-ID: <m14pc26k6f.fsf@ebiederm.dsl.xmission.com>
In-Reply-To: <47BDEB57.5040203@zytor.com> (H. Peter Anvin's message of "Thu, 21 Feb 2008 13:21:27 -0800")

"H. Peter Anvin" <hpa@zytor.com> writes:

> Jeremy Fitzhardinge wrote:
>> Ian Campbell wrote:
>>> I'll see if I can track down where the page is getting used and have a
>>> go at getting in there first. It must be pretty early to be allocated
>>> already when dmi_scan_machine gets called.
>>>
>>> It's possible that the domain builder might have already allocated a PT
>>> at this address. I haven't checked but I think currently the domain
>>> builder always puts PT pages after the kernel so hopefully it's only a
>>> theoretical problem.
>>>
>>
>> Yes, it does.  And presumably the early pagetable builder is guaranteed to
>> avoid special memory like the DMI space.  But the bug definitely seems to be a
>> result of the DMI code trying to make a RW mapping of a pagetable page, so
>> something is amiss there.
>>
>> Ooh, sleazy hack idea: make DMI always map RO, so even if it does get a
>> pagetable it causes no complaint...  A bit awkward, since there doesn't seem
>> to be an RO form of early_ioremap.
>>
>>> Another option I was thinking of was a command line option to disable
>>> DMI, which (maybe) isn't terribly useful in itself but it introduces an
>>> associated variable to frob with. That's similar to how the TSC was
>>> handled in the past (well, the opposite since TSC was forced on).
>>>
>>
>> Yep, that would work too.
>>
>> Still curious about why a pagetable page is ending up in that range though.
>> Seems like it shouldn't be possible, since we shouldn't be allowed to allocate
>> from those pages, at least until the DMI probe has happened...  Unless the
>> early allocator is only excluded from e820 reserved pages, which would cause a
>> problem on systems which don't reserve the DMI space...  HPA?
>>
>
> I thought the problem was a Xen-provided pagetable from before Linux started?

The immediate symptom was that we have a page table at the address
where we are doing the DMI probe.  Xen does not allow page tables to
be mapped read-write, so early_ioremap gets into trouble.

We have a mystery:
- Why did the Xen domain builder or the Linux kernel use
  0xf0000 - 0x100000 for a page table?

  It should be possible to instrument the early Linux page allocation
  and see which pages Linux is using, to find out who is doing weird
  things.
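
  A minimal user-space sketch of the check such instrumentation could
  make, assuming 4K pages and that the allocator hands back page frame
  numbers.  The helper names pfn_in_bios_rom and count_pfns_in_bios_rom
  are hypothetical, not existing kernel functions; in the kernel the
  same test would sit behind a printk in the early allocation path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy BIOS ROM window probed by the DMI scan: [0xf0000, 0x100000). */
#define BIOS_ROM_START 0xf0000ULL
#define BIOS_ROM_END   0x100000ULL
#define PAGE_SHIFT_4K  12	/* assumption: 4K pages */

/* Hypothetical helper: does the page with this frame number lie in the window? */
static bool pfn_in_bios_rom(uint64_t pfn)
{
	uint64_t phys = pfn << PAGE_SHIFT_4K;

	return phys >= BIOS_ROM_START && phys < BIOS_ROM_END;
}

/* Hypothetical helper: count how many allocated frames fall in the window. */
static size_t count_pfns_in_bios_rom(const uint64_t *pfns, size_t n)
{
	size_t hits = 0;

	assert(pfns != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (pfn_in_bios_rom(pfns[i]))
			hits++;
	return hits;
}
```

  Any nonzero count for the frames handed out before dmi_scan_machine
  runs would point at Linux rather than the domain builder.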

We have possible solutions:
- Add a read-only flag to early_ioremap for use by our table scans.
- Don't do a DMI scan in the case of Xen.
- Fix the Xen domain builder.

My inclination is that we disable the fruitless DMI scan in the case
of Xen, or we get the Xen domain builder fixed.

If Xen is going to increasingly look like a normal x86 BIOS we should
let the DMI scan run and put the burden on Xen to keep things
looking like a normal x86 machine.  If Xen is not going to look more
like a normal x86 machine we can say: oh, that is nice, it's Xen, so
don't bother doing things that will cause problems.

In this case, can we confirm that the domain builder is using those
early 64k as pages for a page table, and then educate it that not
allowing OS access to those pages is a little silly?

All of that said, for DMI tables and other early tables we should not
be writing to them anyway, so learning to use read-only maps may be
the right solution.  If the reason Xen was complaining was that
we were accessing an area that was not page tables but instead
should only be mapped read-only, I would have a lot of sympathy
with that, as mapping areas that are architecturally ROMs read-write
is silly.
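
To illustrate why a read-only map is enough for this kind of table
scan, here is a minimal user-space sketch.  It assumes the legacy DMI
entry point layout (a "_DMI_" anchor on a 16-byte boundary, with the
15 bytes starting at the anchor summing to zero) and takes the ROM
window as a const buffer.  The function names are hypothetical and
this is not the kernel's dmi_scan code; the point is only that every
access goes through a const pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMI_ENTRY_LEN 15	/* bytes covered by the legacy entry point checksum */

/* Anchor matches and the 15 bytes sum to zero (mod 256). Reads only. */
static int dmi_entry_ok(const uint8_t *p)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < DMI_ENTRY_LEN; i++)
		sum += p[i];
	return memcmp(p, "_DMI_", 5) == 0 && sum == 0;
}

/*
 * Walk a read-only image of the ROM window in 16-byte steps.
 * Returns the offset of the first valid entry point, or -1 if none.
 */
static long find_dmi_entry(const uint8_t *rom, size_t len)
{
	assert(rom != NULL);
	for (size_t off = 0; off + DMI_ENTRY_LEN <= len; off += 16)
		if (dmi_entry_ok(rom + off))
			return (long)off;
	return -1;
}
```

Nothing in the scan stores through the pointer, so a hypothetical
read-only variant of early_ioremap would serve it equally well.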

So guys, can you please finish the root-cause analysis and really see
why there is a page table page in the 64K ROM BIOS area?

Eric
