All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Vrabel <david.vrabel@citrix.com>
To: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Cc: Attilio Rao <attilio.rao@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2 2/2] Xen: Document the semantic of the pagetable_reserve PVOPS
Date: Thu, 16 Aug 2012 15:33:14 +0100	[thread overview]
Message-ID: <502D04AA.8030107@citrix.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1208161200200.2278@kaball.uk.xensource.com>

On 16/08/12 12:07, Stefano Stabellini wrote:
> On Wed, 15 Aug 2012, David Vrabel wrote:
>> On 15/08/12 14:55, Stefano Stabellini wrote:
>>> On Wed, 15 Aug 2012, David Vrabel wrote:
>>>> On 14/08/12 15:12, Attilio Rao wrote:
>>>>> On 14/08/12 14:57, David Vrabel wrote:
>>>>>> On 14/08/12 13:24, Attilio Rao wrote:
>>>> After looking at it some more, I think this pv-ops is unnecessary. How
>>>> about the following patch to just remove it completely?
>>>>
>>>> I've only smoke-tested 32-bit and 64-bit dom0 but I think the reasoning
>>>> is sound.
>>>
>>> Do you have more then 4G to dom0 on those boxes?
>>
>> I've tested with 6G now, both 64-bit and 32-bit with HIGHPTE.
>>
>>> It certainly fixed a serious crash at the time it was introduced, see
>>> http://marc.info/?l=linux-kernel&m=129901609503574 and
>>> http://marc.info/?l=linux-kernel&m=130133909408229. Unless something big
>>> changed in kernel_physical_mapping_init, I think we still need it.
>>> Depending on the e820 of your test box, the kernel could crash (or not),
>>> possibly in different places.

FYI, it looks like pgt_buf is now located low down, which is why these
changes worked for me.  Possibly this changed as part of a memblock
refactor.

>>>>>> Having said that, I couldn't immediately see where pages in (end, 
>>>>>> pgt_buf_top] was getting set RO.  Can you point me to where it's 
>>>>>> done?
>>>>>>
>>>>>
>>>>> As mentioned in the comment, please look at xen_set_pte_init().
>>>>
>>>> xen_set_pte_init() only ensures it doesn't set the PTE as writable if it
>>>> is already present and read-only.
>>>
>>> look at mask_rw_pte and read the threads linked above, unfortunately it
>>> is not that simple.
>>
>> Yes, I was remembering what 32-bit did here.
>>
>> The 64-bit version is a bit confused and it often ends up /not/ clearing
>> RW for the direct mapping of the pages in the pgt_buf because any
>> existing RW mappings will be used as-is.  See phys_pte_init() which
>> checks for an existing mapping and only sets the PTE if it is not
>> already set.
> 
> not all the pagetable pages might be already mapped, even if they are
> already hooked into the pagetable

Yes, I think this is easy to handle though.

    /*
     * Make sure that active page table pages are not mapped RW.
     */
    if (is_early_ioremap_ptep(ptep)) {
        /*
         * If we're updating an early ioremap PTE, then this PFN may
         * already be in the linear mapping.  If it is, use the
         * existing RW bit.
         */
        unsigned int level;
        pte_t *linear_pte;

        linear_pte = lookup_address(__va(PFN_PHYS(pfn)), &level);
        if (linear_pte && !pte_write(*linear_pt))
            pte = pte_wrprotect(pte);
    } else if (pfn >= pgt_buf_start && pfn < pgt_buf_end) {
        /*
         * The PFN may not be mapped but may be hooked into the page
         * tables.  Make sure this new mapping is read-only.
         */
        pte = pte_wrprotect(pte);
    }

However, the real subtlety are page tables that are mapped as they
themselves are hooked in.

As as example, let's say pgt_buf_start (s) is on a 1 GiB boundary and
pgt_buf_top (t) is below the next 2 MiB boundary.  We're mapping memory
with 4 KiB pages from s to s + 4 MiB.  This requires a new PUD and two
new PMDs.

To map this region:

1. Allocate a new PUD. (@ e = pgt_buf_end)

2. Allocate a new PMD. (@ e + 1)

3. Fill in this PMD's PTEs.  This covers pgt_buf so sets (s, e + 1) as RO.

4. Call pmd_populate() to hook-in this PMD.  This does not set e+1 as RO
(but this is ok as it already is).

5. Allocate a new PMD  (@ e + 2)

6. Fill in this PMD's PTEs.

7. Call pmd_populate() to hook in this PMD.  This does not set e+2 as RO
as the region isn't mapped yet.

8. Call pud_populate() to hook in this PUD.  This sets e as RO but e + 2
is still RW.

9. Boom!

It may be possible to fixup the permissions as the pages are hooked in.
 i.e., if this page's entries cover (pgt_buf_start, pgt_buf_end], walk
the entries and any child tables and fixup the permissions of the leaf
entries.

This would walk the tables a few times unless we were careful to only
walk them when hooking a page into an active table.

It was fun trying to understand this, but I think I'll give up now...

David

  reply	other threads:[~2012-08-16 14:33 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-14 12:24 [PATCH v2 0/2] XEN/X86: Document pagetable_reserve PVOPS and enforce a better semantic Attilio Rao
2012-08-14 12:24 ` [PATCH v2 1/2] XEN, X86: Improve semantic support for pagetable_reserve PVOPS Attilio Rao
2012-08-15 17:25   ` Stefano Stabellini
2012-08-15 17:15     ` Attilio Rao
2012-08-15 17:46       ` Stefano Stabellini
2012-08-15 18:43         ` Ian Campbell
2012-08-15 18:50           ` Attilio Rao
2012-08-15 18:47         ` Attilio Rao
2012-08-16  8:12           ` Ian Campbell
2012-08-16  9:53           ` Stefano Stabellini
2012-08-15 17:33     ` Stefano Stabellini
2012-08-14 12:24 ` [PATCH v2 2/2] Xen: Document the semantic of the " Attilio Rao
2012-08-14 13:57   ` David Vrabel
2012-08-14 14:12     ` Attilio Rao
2012-08-15 11:19       ` David Vrabel
2012-08-15 13:55         ` Stefano Stabellini
2012-08-15 19:15           ` David Vrabel
2012-08-16 11:07             ` Stefano Stabellini
2012-08-16 14:33               ` David Vrabel [this message]
2012-08-15 17:46   ` Stefano Stabellini
  -- strict thread matches above, loose matches on Subject: below --
2012-08-16 12:59 [PATCH v2 2/2] XEN: " Attilio Rao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=502D04AA.8030107@citrix.com \
    --to=david.vrabel@citrix.com \
    --cc=Stefano.Stabellini@eu.citrix.com \
    --cc=attilio.rao@citrix.com \
    --cc=konrad.wilk@oracle.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.