From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Xen-devel <xen-devel@lists.xensource.com>
Subject: Re: [PATCH] xen: core dom0 support
Date: Mon, 09 Mar 2009 11:06:40 -0700
Message-ID: <49B55AB0.1070605@goop.org>
In-Reply-To: <20090308221208.GA24079@elte.hu>

Ingo Molnar wrote:
> * H. Peter Anvin <hpa@zytor.com> wrote:
>
>   
>> Ingo Molnar wrote:
>>     
>>> Since it's the same kernel image i think the only truly reliable 
>>> method would be to reboot between _different_ kernel images: 
>>> same instructions but randomly re-align variables both in terms 
>>> of absolute address and in terms of relative position to each 
>>> other. Plus randomize bootmem allocs and never-gets-freed-really 
>>> boot-time allocations.
>>>
>>> Really hard to do i think ...
>>>
>>>       
>> Ouch, yeah.
>>
>> On the other hand, the numbers made sense to me, so I don't 
>> see why there is any reason to distrust them.  They show a 5% 
>> overhead with pv_ops enabled, reduced to a 2% overhead with 
>> the change.  That is more or less what would match my 
>> intuition from seeing the code.
>>     
>
> Yeah - it was Jeremy who expressed doubt in the numbers, not me.
>   

Mainly because I was seeing the instruction and cycle counts completely 
unchanged from run to run, which is implausible.  They're not zero, so 
they're clearly measurements of *something*, but not cycles and 
instructions, since we know that they're changing.  So what are they 
measurements of?  And if they're not what they claim, are the other 
numbers more meaningful?

It's easy to read the numbers as confirmations of preconceived 
expectations of the outcomes, but that's - as I said - unsatisfying.

> And we need to eliminate that 2% as well - 2% is still an awful 
> lot of native kernel overhead from a kernel feature that 95%+ of 
> users do not make any use of.
>   

Well, I think there are a few points here:

   1. The test in question is a bit vague about kernel and user
      measurements.  I assume the stuff coming from perfcounters is
      kernel-only state, but the elapsed time includes the usermode
      component, and so will be affected by usermode page placement
      and cache effects.  If I change the test to run from a fresh copy
      of the test executable (statically linked, to avoid shared
      libraries), that should at least fuzz out user page placement.
   2. It's true that the cache effects could be due to the precise layout
      of the kernel executable; but if those effects are swamping the
      effects of the changes to improve pvops, then it's unclear what the
      point of the exercise is.  Especially since:
   3. It is a config option, so if someone is sensitive to the
      performance hit and it gives them no useful functionality to
      offset it, then it can be disabled.  Distros tend to enable it
      because they value function and flexibility over raw
      performance; they also enable things like audit, SELinux and
      modules, which all have performance hits of a similar scale (of
      course, you could argue that more people get benefit from those
      features to offset their costs).  But,
   4. I think you're underestimating the number of people who get
      benefit from pvops; the Xen userbase is actually pretty large, and
      KVM will use pvops hooks when available to improve Linux-as-guest.
   5. Also, we're looking at a single benchmark with no obvious
      relevance to a real workload.  Perhaps there are workloads which
      continuously mash mmap/munmap/mremap(!), but I think they're
      fairly rare.  Such a benchmark is useful for tuning specific
      areas, but if we're going to evaluate pvops overhead, it would be
      nice to use something a bit broader to base our measurements on. 
      Also, what weighting are we going to put on 32 vs 64 bit?  Equally
      important?  One more than the other?

All that said, I would like to get the pvops overhead down to 
unmeasurable - the ideal would be to be able to justify removing the 
config option altogether and leave it always enabled.

The tradeoff, as always, is how much additional complexity we are willing 
to accept to get there.  The addition of a new calling convention is already 
fairly esoteric, but so far it has got us a 60% reduction in overhead 
(in this test).  But going further is going to get more complex.
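Assuming the 60% figure refers to the drop from the 5% to the 2% overhead quoted earlier in the thread, the arithmetic is:

$$
\frac{5\% - 2\%}{5\%} = \frac{3}{5} = 60\%
$$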

For example, the next step would be to attack set_pte (including 
set_pte_*, pte_clear, etc), to make them use the new calling convention, 
and possibly make them inlineable (i.e., to get them as close as possible to 
the non-pvops case).  But that will require them to be implemented in 
asm (to guarantee that they only use the registers they're allowed to 
use), and we already have 3 variants of each for the different pagetable 
modes.  All completely doable, and not even very hard, but it will be 
just one more thing to maintain - we just need to be sure the payoff is 
worth it.

    J
