From: "Yu, Zhang" <yu.c.zhang@linux.intel.com>
To: George Dunlap <George.Dunlap@eu.citrix.com>,
	Paul Durrant <Paul.Durrant@citrix.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
	"Keir (Xen.org)" <keir@xen.org>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	George Dunlap <George.Dunlap@citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	"zhiyuan.lv@intel.com" <zhiyuan.lv@intel.com>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for ioreq server
Date: Tue, 07 Jul 2015 16:16:18 +0800
Message-ID: <559B8AD2.4090806@linux.intel.com>
In-Reply-To: <CAFLBxZa4=wAzt8O8fbUnbnN7C+XmBtDZqxJXFvbR=eiBb=x9qg@mail.gmail.com>

Thanks a lot, George.

On 7/6/2015 10:06 PM, George Dunlap wrote:
> On Mon, Jul 6, 2015 at 2:33 PM, Paul Durrant <Paul.Durrant@citrix.com> wrote:
>>> -----Original Message-----
>>> From: George Dunlap [mailto:george.dunlap@eu.citrix.com]
>>> Sent: 06 July 2015 14:28
>>> To: Paul Durrant; George Dunlap
>>> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
>>> Cooper; Kevin Tian; zhiyuan.lv@intel.com
>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
>>> ioreq server
>>>
>>> On 07/06/2015 02:09 PM, Paul Durrant wrote:
>>>>> -----Original Message-----
>>>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>>>>> George Dunlap
>>>>> Sent: 06 July 2015 13:50
>>>>> To: Paul Durrant
>>>>> Cc: Yu Zhang; xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew
>>>>> Cooper; Kevin Tian; zhiyuan.lv@intel.com
>>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES for
>>>>> ioreq server
>>>>>
>>>>> On Mon, Jul 6, 2015 at 1:38 PM, Paul Durrant <Paul.Durrant@citrix.com>
>>>>> wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of
>>>>>>> George Dunlap
>>>>>>> Sent: 06 July 2015 13:36
>>>>>>> To: Yu Zhang
>>>>>>> Cc: xen-devel@lists.xen.org; Keir (Xen.org); Jan Beulich; Andrew Cooper;
>>>>>>> Paul Durrant; Kevin Tian; zhiyuan.lv@intel.com
>>>>>>> Subject: Re: [Xen-devel] [PATCH v2 1/2] Resize the MAX_NR_IO_RANGES
>>>>>>> for ioreq server
>>>>>>>
>>>>>>> On Mon, Jul 6, 2015 at 7:25 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>>>> wrote:
>>>>>>>> MAX_NR_IO_RANGES is used by ioreq server as the maximum
>>>>>>>> number of discrete ranges to be tracked. This patch changes
>>>>>>>> its value to 8k, so that more ranges can be tracked on next
>>>>>>>> generation of Intel platforms in XenGT. Future patches can
>>>>>>>> extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
>>>>>>>> can serve as a default limit.
>>>>>>>>
>>>>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>>>>
>>>>>>> I said this at the Hackathon, and I'll say it here:  I think this is
>>>>>>> the wrong approach.
>>>>>>>
>>>>>>> The problem here is not that you don't have enough memory ranges.
>>>>>>> The problem is that you are not tracking memory ranges, but
>>>>>>> individual pages.
>>>>>>>
>>>>>>> You need to make a new interface that allows you to tag individual
>>>>>>> gfns as p2m_mmio_write_dm, and then allow one ioreq server to get
>>>>>>> notifications for all such writes.
>>>>>>>
>>>>>>
>>>>>> I think that is conflating things. It's quite conceivable that more
>>>>>> than one ioreq server will handle write_dm pages. If we had enough
>>>>>> types to have two page types per server then I'd agree with you, but
>>>>>> we don't.
>>>>>
>>>>> What's conflating things is using an interface designed for *device
>>>>> memory ranges* to instead *track writes to gfns*.
>>>>
>>>> What's the difference? Are you asserting that all device memory ranges
>>>> have read side effects and therefore write_dm is not a reasonable
>>>> optimization to use? I would not want to make that assertion.
>>>
>>> Using write_dm is not the problem; it's having thousands of memory
>>> "ranges" of 4k each that I object to.
>>>
>>> Which is why I suggested adding an interface to request updates to gfns
>>> (by marking them write_dm), rather than abusing the io range interface.
>>>
>>
>> And it's the assertion that use of write_dm will only be relevant to
>> gfns, and that all such notifications need only go to a single ioreq
>> server, that I have a problem with. Whilst the use of io ranges to
>> track gfn updates is, I agree, not ideal, I think the overloading of
>> write_dm is not a step in the right direction.
>
> So there are two questions here.
>
> First of all, I certainly think that the *interface* should be able to
> be transparently extended to support multiple ioreq servers being able
> to track gfns.  My suggestion was to add a hypercall that allows an
> ioreq server to say, "Please send modifications to gfn N to ioreq
> server X"; and that for the time being, only allow one such X to exist
> at a time per domain.  That is, if ioreq server Y makes such a call
> after ioreq server X has done so, return -EBUSY.  That way we can add
> support when we need it.
>

Well, I also agree the current implementation is probably not optimal.
And yes, it does seem indiscriminate (hope I did not use the wrong
word :) ) to mix device I/O ranges and guest memory in one interface.
But forwarding an ioreq to the backend driver based on nothing more
than a p2m type? Although it would be easy for XenGT to take this
approach, I agree with Paul that it would weaken the generality of the
ioreq server. Besides, is it appropriate for a p2m type to be used this
way? It seems strange to me.
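
Just to make sure we are talking about the same thing, I imagine the
interface you describe would look roughly like the sketch below. All of
the names here are made up for illustration; no such hypercall or
structure exists in the tree today:

    /* Hypothetical sketch only; illustrative names, not a real API. */
    struct xen_hvm_track_gfn_writes {
        domid_t    domid; /* domain whose gfn is to be tracked        */
        ioservid_t id;    /* ioreq server to receive write emulations */
        uint64_t   gfn;   /* page to be marked p2m_mmio_write_dm      */
    };
    /* Per your suggestion, only one ioreq server per domain may claim
     * gfn tracking for now; a second claimant would get -EBUSY. */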

> In fact, you probably already have a problem with two ioreq servers,
> because (if I recall correctly) you don't know for sure when a page

Fortunately, we do know: when a guest GPU page table is unmapped, its
pages are removed from the ioreq server's rangeset (a rough sketch of
what we do follows), so the scenario you describe below won't
happen. :)
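
To be concrete, the device model does roughly the following when it
sees a guest GPU page table being unmapped (a simplified sketch,
assuming the libxc range interface from Xen 4.5, with error handling
and the surrounding page-table tracking logic omitted):

    /* Sketch: drop the unmapped page table's page from the ioreq
     * server's rangeset, so that no stale watch can linger and
     * collide with another ioreq server later. */
    rc = xc_hvm_unmap_io_range_from_ioreq_server(
             xch, domid, ioservid,
             1,                                 /* memory, not port I/O */
             gfn << XC_PAGE_SHIFT,              /* start of the page    */
             ((gfn + 1) << XC_PAGE_SHIFT) - 1); /* end of the page      */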

> has stopped being used as a GPU pagetable.  Consider the following
> scenario:
> 1. Two devices, served by ioreq servers 1 and 2.
> 2. driver for device served by ioreq server 1 allocates a page, uses
> it as a pagetable.  ioreq server 1 adds that pfn to the ranges it's
> watching.
> 3. driver frees page back to guest OS; but ioreq server 1 doesn't know
> so it doesn't release the range
> 4. driver for device served by ioreq server 2 allocates a page, which
> happens to be the same one used before, and uses it as a pagetable.
> ioreq server 1 tries to add that pfn to the ranges it's watching.
>
> Now you have an "overlap in the range" between the two ioreq servers.
> What do you do?
>
> Regarding using write_dm for actual device memory ranges: Do you have
> any concrete scenarios in mind where you think this will be used?
>
> Fundamentally, write_dm looks to me like it's about tracking gfns --
> i.e., things backed by guest RAM -- not IO ranges.  As such, it should
> have an interface and an implementation that reflects that.
>

Here, I guess your main concern about the difference between tracking
gfns and I/O ranges is that the gfns are scattered? And yes, that is
exactly why we need more ranges inside a rangeset. The new limit of 8K
is a practical value for XenGT (see the sketch below). In the future,
we can either provide a way for the toolstack to configure the maximum
number of ranges per ioreq server, or add some xenheap allocation
management routines. Would that be OK?
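
For reference, the hypervisor side of this change is just the constant
below, in xen/include/asm-x86/hvm/domain.h (the value before this patch
is 256, if I remember correctly):

    /* Maximum number of discrete ranges an ioreq server may track;
     * raised so XenGT can watch scattered GPU page table pages. */
    #define MAX_NR_IO_RANGES  8192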

I thought we had convinced you at the hackathon, but it seems I was
wrong. Anyway, your advice is much appreciated. :)

>   -George
>

B.R.
Yu

