xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: David Vrabel <david.vrabel@citrix.com>
To: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Cc: Attilio Rao <attilio.rao@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
Date: Thu, 16 Aug 2012 15:33:14 +0100	[thread overview]
Message-ID: <502D04AA.8030107@citrix.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1208161200200.2278@kaball.uk.xensource.com>

On 16/08/12 12:07, Stefano Stabellini wrote:
> On Wed, 15 Aug 2012, David Vrabel wrote:
>> On 15/08/12 14:55, Stefano Stabellini wrote:
>>> On Wed, 15 Aug 2012, David Vrabel wrote:
>>>> On 14/08/12 15:12, Attilio Rao wrote:
>>>>> On 14/08/12 14:57, David Vrabel wrote:
>>>>>> On 14/08/12 13:24, Attilio Rao wrote:
>>>> After looking at it some more, I think this pv-ops is unnecessary. How
>>>> about the following patch to just remove it completely?
>>>>
>>>> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
>>>> is sound.
>>>
>>> Do you have more then 4G to dom0 on those boxes?
>>
>> I've tested with 6G now, both 64-bit and 32-bit with HIGHPTE.
>>
>>> It certainly fixed a serious crash at the time it was introduced, see
>>> http://marc.info/?l=linux-kernel&m=129901609503574 and
>>> http://marc.info/?l=linux-kernel&m=130133909408229. Unless something big
>>> changed in kernel_physical_mapping_init, I think we still need it.
>>> Depending on the e820 of your test box, the kernel could crash (or not),
>>> possibly in different places.

FYI, it looks like pgt_buf is now located low down, which is why these
changes worked for me.  Possibly this changed as part of a memblock
refactor.

>>>>>> Having said that, I couldn't immediately see where pages in (end, 
>>>>>> pgt_buf_top] was getting set RO.  Can you point me to where it's 
>>>>>> done?
>>>>>>
>>>>>
>>>>> As mentioned in the comment, please look at xen_set_pte_init().
>>>>
>>>> xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
>>>> is already present and read-only.
>>>
>>> look at mask_rw_pte and read the threads linked above, unfortunately it
>>> is not that simple.
>>
>> Yes, I was remembering what 32-bit did here.
>>
>> The 64-bit version is a bit confused and it often ends up /not/ clearing
>> RW for the direct mapping of the pages in the pgt_buf because any
>> existing RW mappings will be used as-is.  See phys_pte_init() which
>> checks for an existing mapping and only sets the PTE if it is not
>> already set.
> 
> not all the pagetable pages might be already mapped, even if they are
> already hooked into the pagetable

Yes, I think this is easy to handle though.

    /*
     * Make sure that active page table pages are not mapped RW.
     */
    if (is_early_ioremap_ptep(ptep)) {
        /*
         * If we're updating an early ioremap PTE, then this PFN may
         * already be in the linear mapping.  If it is, use the
         * existing RW bit.
         */
        unsigned int level;
        pte_t *linear_pte;

        linear_pte = lookup_address(__va(PFN_PHYS(pfn)), &level);
        if (linear_pte && !pte_write(*linear_pt))
            pte = pte_wrprotect(pte);
    } else if (pfn >= pgt_buf_start && pfn < pgt_buf_end) {
        /*
         * The PFN may not be mapped but may be hooked into the page
         * tables.  Make sure this new mapping is read-only.
         */
        pte = pte_wrprotect(pte);
    }

However, the real subtlety are page tables that are mapped as they
themselves are hooked in.

As as example, let's say pgt_buf_start (s) is on a 1 GiB boundary and
pgt_buf_top (t) is below the next 2 MiB boundary.  We're mapping memory
with 4 KiB pages from s to s + 4 MiB.  This requires a new PUD and two
new PMDs.

To map this region:

1. Allocate a new PUD. (@ e = pgt_buf_end)

2. Allocate a new PMD. (@ e + 1)

3. Fill in this PMD's PTEs.  This covers pgt_buf so sets (s, e + 1) as RO.

4. Call pmd_populate() to hook-in this PMD.  This does not set e+1 as RO
(but this is ok as it already is).

5. Allocate a new PMD  (@ e + 2)

6. Fill in this PMD's PTEs.

7. Call pmd_populate() to hook in this PMD.  This does not set e+2 as RO
as the region isn't mapped yet.

8. Call pud_populate() to hook in this PUD.  This sets e as RO but e + 2
is still RW.

9. Boom!

It may be possible to fixup the permissions as the pages are hooked in.
 i.e., if this page's entries cover (pgt_buf_start, pgt_buf_end], walk
the entries and any child tables and fixup the permissions of the leaf
entries.

This would walk the tables a few times unless we were careful to only
walk them when hooking a page into an active table.

It was fun trying to understand this, but I think I'll give up now...

David

  reply	other threads:[~2012-08-16 14:33 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-14 12:24 [PATCH v2 0/2] XEN/X86: Document pagetable_reserve PVOPS and enforce a better semantic Attilio Rao
2012-08-14 12:24 ` [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS Attilio Rao
2012-08-15 17:25   ` Stefano Stabellini
2012-08-15 17:15     ` Attilio Rao
2012-08-15 17:46       ` Stefano Stabellini
2012-08-15 18:43         ` Ian Campbell
2012-08-15 18:50           ` Attilio Rao
2012-08-15 18:47         ` Attilio Rao
2012-08-16  8:12           ` Ian Campbell
2012-08-16  9:53           ` Stefano Stabellini
2012-08-15 17:33     ` Stefano Stabellini
2012-08-14 12:24 ` [PATCH v2 2/2] Xen: Document the semantic of the " Attilio Rao
2012-08-14 13:57   ` David Vrabel
2012-08-14 14:12     ` Attilio Rao
2012-08-15 11:19       ` David Vrabel
2012-08-15 13:55         ` Stefano Stabellini
2012-08-15 19:15           ` David Vrabel
2012-08-16 11:07             ` Stefano Stabellini
2012-08-16 14:33               ` David Vrabel [this message]
2012-08-15 17:46   ` Stefano Stabellini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=502D04AA.8030107@citrix.com \
    --to=david.vrabel@citrix.com \
    --cc=Stefano.Stabellini@eu.citrix.com \
    --cc=attilio.rao@citrix.com \
    --cc=konrad.wilk@oracle.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).