From: "Yu, Zhang" <yu.c.zhang@linux.intel.com>
To: "Tian, Kevin" <kevin.tian@intel.com>,
	George Dunlap <george.dunlap@citrix.com>,
	Jan Beulich <JBeulich@suse.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Paul Durrant <Paul.Durrant@citrix.com>,
	Stefano Stabellini <Stefano.Stabellini@citrix.com>,
	"Lv, Zhiyuan" <zhiyuan.lv@intel.com>,
	"Keir (Xen.org)" <keir@xen.org>
Subject: Re: [PATCH v3 3/3] tools: introduce parameter max_wp_ram_ranges.
Date: Fri, 5 Feb 2016 16:41:33 +0800	[thread overview]
Message-ID: <56B4603D.6070708@linux.intel.com> (raw)
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D15F7A7485@SHSMSX101.ccr.corp.intel.com>



On 2/5/2016 12:18 PM, Tian, Kevin wrote:
>> From: George Dunlap [mailto:george.dunlap@citrix.com]
>> Sent: Friday, February 05, 2016 1:12 AM
>>
>> On 04/02/16 14:08, Jan Beulich wrote:
>>>>>> On 04.02.16 at 14:33, <Ian.Jackson@eu.citrix.com> wrote:
>>>> Jan Beulich writes ("Re: [Xen-devel] [PATCH v3 3/3] tools: introduce parameter
>>>> max_wp_ram_ranges."):
>>>>> On 04.02.16 at 10:38, <yu.c.zhang@linux.intel.com> wrote:
>>>>>> So another question is, if the value of this limit really matters, will a
>>>>>> lower one be more acceptable (the current 256 not being enough)?
>>>>>
>>>>> If you've carefully read George's replies, [...]
>>>>
>>>> Thanks to George for the very clear explanation, and also to him for
>>>> an illuminating in-person discussion.
>>>>
>>>> It is disturbing that as a result of me as a tools maintainer asking
>>>> questions about what seems to me to be a troublesome user-visible
>>>> control setting in libxl, we are now apparently revisiting lower
>>>> layers of the hypervisor design, which have already been committed.
>>>>
>>>> While I find George's line of argument convincing, neither I nor
>>>> George are maintainers of the relevant hypervisor code.  I am not
>>>> going to insist that anything in the hypervisor is done differently and
>>>> am not trying to use my tools maintainer position to that end.
>>>>
>>>> Clearly there has been a failure of our workflow to consider and
>>>> review everything properly together.  But given where we are now, I
>>>> think that this discussion about hypervisor internals is probably a
>>>> distraction.
>>>
>>> While I recall George having made that alternative suggestion,
>>> both Yu and Paul having reservations against it made me not
>>> insist on that alternative. Instead I've been trying to limit some
>>> of the bad effects that the variant originally proposed brought
>>> with it. Clearly, with the more detailed reply George has now
>>> given (involving areas he is the maintainer for), I should
>>> have been more demanding towards the exploration of that
>>> alternative. That's clearly unfortunate, and I apologize for that,
>>> but such things happen.
>>>
>>> As to one of the patches having already been committed - I'm not
>>> worried about that at all. We can always revert, that's why the
>>> thing is called "unstable".
>>
>> It looks like I should have been more careful to catch up on the current
>> state of things before I started arguing again -- please accept my
>> apologies.
>
> Thanks George for your careful thinking.
>
>>
>> I see that patch 2/3 addresses the gpfn/io question in the commit
>> message by saying, "Previously, a new hypercall or subop was suggested
>> to map write-protected pages into ioreq server. However, it turned out
>> handler of this new hypercall would be almost the same with the existing
>> pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's already a
>> type parameter in this hypercall. So no new hypercall defined, only a
>> new type is introduced."
>>
>> And I see that 2/3 internally separates the WP_RAM type into a separate
>> rangeset, whose size can be adjusted separately.
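
To make the reuse concrete, here is a minimal sketch of what a device
model registration of a write-protected gfn range could look like. The
structure layout mirrors the existing io-range interface as I understand
it, and the WP_MEM type name/value is an assumption based on patch 2/3,
not a confirmed interface:

    /* Sketch only: names are illustrative, not copied from the headers. */
    #include <stdint.h>

    #define HVMOP_IO_RANGE_PORT    0   /* existing: I/O port ranges       */
    #define HVMOP_IO_RANGE_MEMORY  1   /* existing: MMIO ranges           */
    #define HVMOP_IO_RANGE_PCI     2   /* existing: PCI SBDF ranges       */
    #define HVMOP_IO_RANGE_WP_MEM  3   /* assumed: write-protected RAM    */

    struct hvm_io_range_sketch {
        uint16_t domid;        /* IN - domain to be serviced              */
        uint16_t id;           /* IN - ioreq server id                    */
        uint32_t type;         /* IN - which rangeset to add the range to */
        uint64_t start, end;   /* IN - inclusive gfn range                */
    };

    /* A device model wanting write faults on gfns [s, e] forwarded to its
     * ioreq server fills the same structure used for port/MMIO/PCI ranges,
     * just with the new type, then issues the existing
     * HVMOP_map_io_range_to_ioreq_server hypercall. */
    static void fill_wp_range(struct hvm_io_range_sketch *r,
                              uint16_t domid, uint16_t id,
                              uint64_t s, uint64_t e)
    {
        r->domid = domid;
        r->id    = id;
        r->type  = HVMOP_IO_RANGE_WP_MEM;
        r->start = s;
        r->end   = e;
    }
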
>>
>> This addresses my complaint about the interface using gpfns rather than
>> MMIO ranges as an interface (somewhat anyway).  Sorry for not
>> acknowledging this at first.
>>
>> The question of the internal implementation -- whether to use RB tree
>> rangesets, or radix trees (as apparently ARM memaccess does) or p2m
>> types -- is an internal implementation question.  I think p2m types is
>> long-term the best way to go, but it won't hurt to have the current
>> implementation checked in, as long as it doesn't have any impacts on the
>> stable interface.
>
> I'm still trying to understand your suggestion vs. this one. Today we
> already have a p2m_mmio_write_dm type, and any
> write fault hitting that type will be delivered to an ioreq server. The next
> open question is how an ioreq server could know whether it should handle this
> request or not, which is why some tracking structures (either RB/radix)
> are created to maintain that specific information. It's under the assumption
> that multiple ioreq servers co-exist, so a loop check on all ioreq servers
> is required to identify the right target. And multiple ioreq servers are
> a real case in XenGT, because our vGPU device model lives in the kernel, as
> part of the Intel i915 graphics driver. So at least two ioreq servers already
> exist, with one routing to XenGT in Dom0 kernel space and the other
> to the default Qemu in Dom0 user space.
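
A rough sketch of the lookup described above (names here are
illustrative, not the actual Xen code):

    /* Illustrative only: a write fault on a p2m_mmio_write_dm page has to
     * be matched against every ioreq server's tracked ranges before it can
     * be forwarded, which is why each server keeps its own RB-tree/radix
     * lookup structure. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct ioreq_server_sketch {
        struct ioreq_server_sketch *next;
        void *wp_ram_ranges;            /* per-server rangeset / radix tree */
    };

    /* Assumed helper: does this server track the faulting gfn? */
    bool range_tracked(void *ranges, uint64_t gfn);

    static struct ioreq_server_sketch *
    select_wp_server(struct ioreq_server_sketch *servers, uint64_t gfn)
    {
        struct ioreq_server_sketch *s;

        /* Multiple servers really do co-exist for XenGT: one backed by the
         * in-kernel i915 device model, one by the default Qemu. */
        for ( s = servers; s != NULL; s = s->next )
            if ( range_tracked(s->wp_ram_ranges, gfn) )
                return s;               /* deliver the write fault here */

        return NULL;                    /* no server claimed this gfn */
    }
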
>
> In your long-term approach with p2m types, it looks like you are proposing
> encoding the ioreq server ID in the p2m type directly (e.g. 4 bits), which
> then eliminates the need for tracking on the ioreq server side, so the whole
> security concern is gone, and there is no limitation at all. But because the
> available p2m bits are limited, as Andrew pointed out, it might only be
> reasonable to implement this approach once a new p2t structure is added,
> which is why we consider it a long-term approach.
>
> Please correct me if the above understanding is wrong.
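
If I read the long-term idea correctly, it amounts to something like the
following (a purely illustrative bit layout with assumed widths; the real
p2m type field has no spare bits for this today, which is exactly the
point about needing a separate p2t structure):

    /* Illustrative encoding only: reserve a few bits next to the p2m type
     * so a write_dm page carries its owning ioreq server id, making
     * per-server rangesets (and their limits) unnecessary. */
    #include <stdint.h>

    #define P2T_TYPE_BITS      6    /* assumed width of the type field   */
    #define P2T_IOREQ_ID_BITS  4    /* e.g. 4 bits -> up to 16 servers   */
    #define P2T_IOREQ_ID_SHIFT P2T_TYPE_BITS
    #define P2T_IOREQ_ID_MASK  (((uint32_t)1 << P2T_IOREQ_ID_BITS) - 1)

    static inline uint32_t p2t_encode(uint32_t p2m_type, uint32_t ioreq_id)
    {
        return p2m_type |
               ((ioreq_id & P2T_IOREQ_ID_MASK) << P2T_IOREQ_ID_SHIFT);
    }

    static inline uint32_t p2t_ioreq_id(uint32_t entry)
    {
        /* On a write fault, the id read back here selects the target
         * server directly - no loop over servers, no rangeset lookup. */
        return (entry >> P2T_IOREQ_ID_SHIFT) & P2T_IOREQ_ID_MASK;
    }
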
>
>>
>> At the moment, as far as I can tell, there's no way for libxl to even
>> run a version of qemu with XenGT enabled, so there's no real need for
>> libxl to be involved.
>
> There is no way yet because we have not upstreamed all the toolstack
> changes, but we should still discuss the requirement, as we've been doing
> in this thread, right? Or do you mean something different?
>
>>
>> The purpose of having the limit would putatively be to prevent a guest
>> from being able to trigger an exhaustion of hypervisor memory by inducing
>> the device model to mark an arbitrary number of ranges as mmio_dm.
>>
>> Two angles on this.
>>
>> First, assuming that limiting the number of ranges is what we want:  I'm
>> not really a fan of using HVM_PARAMs for this, but as long as it's not
>> considered a public interface (i.e., it could go away or disappear and
>> everything would Just Work), then I wouldn't object.
>>
>> Although I would ask: would it instead be suitable for now to just set
>> the default limit for WP_RAM to 8196 in the hypervisor, since we do
>> expect it to be tracking gpfn ranges rather than IO regions?  And if we
>> determine in the future that more ranges are necessary, to then do the
>> work of moving it to using p2m types (or exposing a knob to adjust it)?
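
In code terms, the fixed-limit variant described above could be as small
as a per-type bound checked at insertion time; a sketch, with assumed
names, using the default figures mentioned in this thread:

    /* Sketch only: enforce a hypervisor-side cap on tracked ranges per
     * type, with a larger default for write-protected RAM since it tracks
     * gfns rather than a handful of MMIO BARs. */
    #include <errno.h>
    #include <stdint.h>

    #define MAX_NR_IO_RANGES     256    /* existing default (port/MMIO/PCI) */
    #define MAX_NR_WP_RAM_RANGES 8196   /* default suggested above for WP RAM */

    struct tracked_ranges_sketch {
        unsigned int nr_ranges;     /* ranges currently tracked            */
        unsigned int max_ranges;    /* cap chosen when the set is created  */
    };

    static int range_add_checked(struct tracked_ranges_sketch *r,
                                 uint64_t start, uint64_t end)
    {
        if ( r->nr_ranges >= r->max_ranges )
            return -ENOSPC;         /* device model hit the cap */

        /* ...insert [start, end] into the underlying tree here... */
        r->nr_ranges++;
        return 0;
    }
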
>>
>> But (and this is the other angle): is simply imposing a numerical limit
>> sufficient to avoid memory exhaustion? Is there a danger that, after
>> creating several guests such that Xen was now running very low on
>> memory, a guest would (purposely or not) cause memory to be
>> exhausted some time after boot, causing a system-wide DoS (or
>> just general lack of stability)?
>>
>> In the shadow / hap memory case, the memory is pre-allocated up front,
>> which makes sure that nothing a guest does can cause Xen to run out of
>> memory once it's booted.  Without pre-allocating it, it's still possible
>> that the admin might start up enough VMs that exhaustion is *possible*,
>> but won't be triggered until later (when the guest starts using more GTTs).
>
> that's a valid concern, though I believe it's not cleanly addressed in many places. :-)
>
>>
>> Although in fact this really points to the need for a general overhaul
>> in how memory allocation on behalf of a domain is handled in general;
>> that's a bigger chunk of work.
>>
>> But in any case, it seems to me that we can entirely avoid the question
>> of how many ranges might ever be necessary by starting with a fixed
>> limit in the hypervisor, and then moving to a p2m-type based
>> implementation if and when that becomes unsatisfactory.
>>
>
> I agree with this approach. Let's see whether we can get consensus from
> the others so we can reach a conclusion.
>
> Actually, another quick thought. Please help check whether it makes sense
> or is just too tricky.
>
> Could we have an intermediate option toward the p2m-type based one, by
> assuming only one ioreq server can handle write_dm related faults? If that's
> the case:
>
> - we don't need to add more bits in p2m type;
> - XenGT can register as the default ioreq server for write_dm;
> - when a write_dm fault is triggered, we loop over all ioreq servers to find
> the default one and then deliver through that path without further checks;
> - we may not need to change the ioreq server interface at all: assume the
> ioreq server which receives the 1st write-protection request from the device
> model is the default one;

Thank you, Kevin. With this suggestion, we do not need to worry about
the limits of the rangeset, and therefore there are no performance worries
about traversing the ranges. One disadvantage is that we need to add a
field like wp_ioreq_server to struct hvm_domain, even though there's
already a default_ioreq_server for qemu. So I'd like to hear Paul's
opinion about this.
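
For clarity, the shape of the change I have in mind is roughly the
following - a sketch under the assumption that the first server
registering a write-protection request becomes the write_dm target;
field and function names are placeholders, not the actual code:

    /* Placeholder names throughout - just to show how small the change is
     * if only one ioreq server ever handles write_dm faults. */
    #include <stddef.h>

    struct ioreq_server;                /* opaque here */

    struct hvm_domain_sketch {
        struct ioreq_server *default_ioreq_server; /* existing: qemu      */
        struct ioreq_server *wp_ioreq_server;      /* new: write_dm owner */
    };

    /* Set on the first write-protection registration (XenGT in practice). */
    static void claim_wp_server(struct hvm_domain_sketch *d,
                                struct ioreq_server *s)
    {
        if ( d->wp_ioreq_server == NULL )
            d->wp_ioreq_server = s;
    }

    /* On a write fault against p2m_mmio_write_dm, deliver straight to the
     * claimed server - no rangeset to size, no loop over servers. */
    static struct ioreq_server *wp_fault_target(struct hvm_domain_sketch *d)
    {
        return d->wp_ioreq_server;
    }
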

By now, we all agree that this new max_wp_ram_ranges parameter in the
tool stack is not suitable. To give a summary of the choices we have:

1> George's suggestion: "starting with a fixed limit in the hypervisor,
and then moving to a p2m-type based implementation if and when that
becomes unsatisfactory". My understanding is this looks like the v9
implementation.

2> If hard-coding the limit as 8K is acceptable, we can use a xengt flag
in the hvm configuration, which leaves this choice to the admin without
exposing too much hypervisor implementation detail.

3> Kevin's suggestion: leverage the p2m type, which no longer needs the
underlying rangeset.

4> I'll continue to do some path-finding on the device model side, to
see if, after optimization, this limit can be set to a lower value.

Anyway, thank you all for your attention and suggestions.
This weekend marks the Chinese Spring Festival, so my mail replies will be
slow in the next 2 weeks. Wish you all good luck in the new year! :)

Yu
