public inbox for linux-kernel@vger.kernel.org
From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: arch_flush_lazy_mmu_mode() in arch/x86/mm/highmem_32.c
Date: Mon, 17 Nov 2008 10:40:14 -0800	[thread overview]
Message-ID: <4921BA8E.60806@goop.org> (raw)
In-Reply-To: <1226944387.9969.77.camel@bodhitayantram.eng.vmware.com>

Zachary Amsden wrote:
> On Mon, 2008-11-17 at 01:08 -0800, Jan Beulich wrote:
>   
>> Commit 49f19710512c825aaea73b9207b3a848027cda1d hints at the
>> current solution not being the final one, yet the advertised (for 2.6.22)
>> change apparently never happened. However, it seems to me that
>> flushing a potentially huge (in terms of time to completion) batch
>> asynchronously is not a really good idea. Instead, I'd think that adding to
>>     
>
> The batch size is limited by many factors; the number of possible PTEs
> updated is only 64 in the COW path, and at most 1024 for the zero /
> remap / mprotect paths.  While 1024 is large, I don't think it qualifies
> as huge, and it should really be very uncommon.  In practice, the
> batching under VMI is limited by the size of the hypercall queue, which
> is only 256 entries.
>   

And the Xen multicall queue is only 32 entries too.

Also, the lazy updates can only be deferred for as long as you're 
holding the appropriate pte lock, so it's already a non-preemptible 
path.  The batch-executing hypercall could cause a bit of an interrupt 
hiccup, but that's really a tradeoff between large batches and hypercall 
overhead.  I thought a 32x reduction in hypercall cost would be 
reasonable in terms of increased interrupt latency.

>> the batch should be prevented in asynchronous contexts altogether, or
>> things should properly nest. As a positive side effect, disabling interrupts
>> in the batch handling - in particular around the submission of the batch -
>> could also be avoided, reducing interrupt latency (perhaps significantly
>> in some cases).
>>     
>
> Jeremy already fixed that; we don't disable interrupts, the change he
> made was to flush and then immediately restart the batching.
>   

Yes.  The Xen code only disables interrupts temporarily while actually 
constructing a new multicall list member, to stop a half-constructed 
multicall from being issued by a nested flush.  But that's very brief, 
and cheap under Xen.

> Re: nesting properly, there is no low-level interface provided to issue
> "forced" updates, that is updates which bypass any batching and take
> effect immediately.  We considered it, but wanted a simple design of a
> strictly in-order update queue to reduce complexity.  This means you
> either have no batching, eliminate the scenarios which require nesting,
> disable interrupts, or flush previously batched updates when nested
> operations are required.  The latter seems the lesser of the evils.
>   

Yeah.  There's really been no call to do anything more complex here.

>> Likewise I would think that the flush out of vmalloc_sync_one() isn't
>> appropriate, and it should rather be avoided for the set_pmd() there to
>> get into the batching code altogether.
>>     
>
> That's impossible.  The flush is needed and there is no way to avoid it.
> The kernel has no general restrictions about contexts in which it is
> safe vs. unsafe to touch the kernel's own vmalloc'ed memory, so you can
> get a page fault due to lazy syncing of vmalloc area PDEs in non-PAE
> mode.  You really have to service that fault.
>   

You could do the flush in the fault handler itself, rather than 
vmalloc_sync_one.  If you enter the handler with outstanding updates, 
then flush them and return.  Hm, but that only works if you're always 
going from NP->P; if you're doing P->P updates then you may just end up 
with stale mappings.

>> (Background to this: After adding lazy mmu mode support in our forward
>> ported tree, we've actually been hit by these calls out of kmap_...(), as
>> originally I didn't pay much attention to these and didn't care to synchronize
>> batch addition and flushing with asynchronous operations.)
>>     
>
> What tree is that?  Unclear from your message.
>   

The Novell kernel tree.  Jan's been doggedly forward-porting the old Xen 
patches.

    J


Thread overview: 9+ messages
2008-11-17  9:08 arch_flush_lazy_mmu_mode() in arch/x86/mm/highmem_32.c Jan Beulich
2008-11-17 17:53 ` Zachary Amsden
2008-11-17 18:40   ` Jeremy Fitzhardinge [this message]
2008-11-17 19:54     ` Zachary Amsden
2008-11-18  8:03     ` Jan Beulich
2008-11-18 17:01       ` Jeremy Fitzhardinge
2008-11-18 17:28         ` Jan Beulich
2008-11-18 18:00           ` Jeremy Fitzhardinge
2008-11-18 18:06           ` Zachary Amsden
