From: Chao Gao <chao.gao@intel.com>
To: Paul Durrant <Paul.Durrant@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
Wei Liu <wei.liu2@citrix.com>,
Andrew Cooper <Andrew.Cooper3@citrix.com>,
"Tim (Xen.org)" <tim@xen.org>,
George Dunlap <George.Dunlap@citrix.com>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
Jan Beulich <jbeulich@suse.com>,
Ian Jackson <Ian.Jackson@citrix.com>
Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4 pages
Date: Thu, 7 Dec 2017 14:56:31 +0800
Message-ID: <20171207065629.GA49036@op-computing>
In-Reply-To: <f1f66b3af868410d87e0f6dad4a57116@AMSPEX02CL03.citrite.net>
On Thu, Dec 07, 2017 at 08:41:14AM +0000, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
>> Of Paul Durrant
>> Sent: 06 December 2017 16:10
>> To: 'Chao Gao' <chao.gao@intel.com>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>; Wei Liu
>> <wei.liu2@citrix.com>; Andrew Cooper <Andrew.Cooper3@citrix.com>; Tim
>> (Xen.org) <tim@xen.org>; George Dunlap <George.Dunlap@citrix.com>;
>> xen-devel@lists.xen.org; Jan Beulich <jbeulich@suse.com>; Ian Jackson
>> <Ian.Jackson@citrix.com>
>> Subject: Re: [Xen-devel] [RFC Patch v4 2/8] ioreq: bump the number of
>> IOREQ page to 4 pages
>>
>> > -----Original Message-----
>> > From: Chao Gao [mailto:chao.gao@intel.com]
>> > Sent: 06 December 2017 09:02
>> > To: Paul Durrant <Paul.Durrant@citrix.com>
>> > Cc: xen-devel@lists.xen.org; Tim (Xen.org) <tim@xen.org>; Stefano
>> > Stabellini <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
>> > <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
>> > Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
>> > <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian Jackson
>> > <Ian.Jackson@citrix.com>
>> > Subject: Re: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
>> > pages
>> >
>> > On Wed, Dec 06, 2017 at 03:04:11PM +0000, Paul Durrant wrote:
>> > >> -----Original Message-----
>> > >> From: Chao Gao [mailto:chao.gao@intel.com]
>> > >> Sent: 06 December 2017 07:50
>> > >> To: xen-devel@lists.xen.org
>> > >> Cc: Chao Gao <chao.gao@intel.com>; Paul Durrant
>> > >> <Paul.Durrant@citrix.com>; Tim (Xen.org) <tim@xen.org>; Stefano
>> > Stabellini
>> > >> <sstabellini@kernel.org>; Konrad Rzeszutek Wilk
>> > >> <konrad.wilk@oracle.com>; Jan Beulich <jbeulich@suse.com>; George
>> > >> Dunlap <George.Dunlap@citrix.com>; Andrew Cooper
>> > >> <Andrew.Cooper3@citrix.com>; Wei Liu <wei.liu2@citrix.com>; Ian
>> > Jackson
>> > >> <Ian.Jackson@citrix.com>
>> > >> Subject: [RFC Patch v4 2/8] ioreq: bump the number of IOREQ page to 4
>> > >> pages
>> > >>
>> > >> One 4K-byte page at most contains 128 'ioreq_t'. In order to remove the
>> > vcpu
>> > >> number constraint imposed by one IOREQ page, bump the number of
>> > IOREQ
>> > >> page to
>> > >> 4 pages. With this patch, multiple pages can be used as IOREQ page.
>> > >>
>> > >> Basically, this patch extends 'ioreq' field in struct hvm_ioreq_server to
>> an
>> > >> array. All accesses to 'ioreq' field such as 's->ioreq' are replaced with
>> > >> FOR_EACH_IOREQ_PAGE macro.
>> > >>
>> > >> In order to access an IOREQ page, QEMU should get the gmfn and map
>> > this
>> > >> gmfn
>> > >> to its virtual address space.
>> > >
>> > >No. There's no need to extend the 'legacy' mechanism of using magic
>> page
>> > gfns. You should only handle the case where the mfns are allocated on
>> > demand (see the call to hvm_ioreq_server_alloc_pages() in
>> > hvm_get_ioreq_server_frame()). The number of guest vcpus is known at
>> > this point so the correct number of pages can be allocated. If the creator of
>> > the ioreq server attempts to use the legacy hvm_get_ioreq_server_info()
>> > and the guest has >128 vcpus then the call should fail.
>> >
>> > Great suggestion. I will introduce a new dmop, a variant of
>> > hvm_get_ioreq_server_frame() for creator to get an array of gfns and the
>> > size of array. And the legacy interface will report an error if more
>> > than one IOREQ PAGES are needed.
>>
>> You don't need a new dmop for mapping I think. The mem op to map ioreq
>> server frames should work. All you should need to do is update
>> hvm_get_ioreq_server_frame() to deal with an index > 1, and provide some
>> means for the ioreq server creator to convert the number of guest vcpus into
>> the correct number of pages to map. (That might need a new dm op).
>
>I realise after saying this that an emulator already knows the size of the ioreq structure and so can easily calculate the correct number of pages to map, given the number of guest vcpus.
How about the patch at the bottom? Is it heading in the right direction?
Do you have a QEMU patch that replaces the old method of setting up the
mapping with the new one? I would like to integrate it and run some tests.
Thanks
Chao
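The page count that an emulator has to map, and the location of a given
vcpu's slot, follow from the figures quoted in this thread. Below is a
minimal sketch of that arithmetic, not code from the patch. All sketch_*
names are hypothetical, and the 32-byte ioreq_t size is an assumption chosen
to match the "128 ioreq_t per 4K-byte page" figure above.

```c
#include <assert.h>

/* Assumptions for illustration only: 4 KiB pages and a 32-byte ioreq_t,
 * which together give the 128 entries per page quoted in the thread. */
#define SKETCH_PAGE_SIZE      4096u
#define SKETCH_IOREQ_SIZE     32u
#define SKETCH_IOREQ_PER_PAGE (SKETCH_PAGE_SIZE / SKETCH_IOREQ_SIZE)
#define SKETCH_MAX_IOREQ_PAGE 4u

/* Location of one vcpu's ioreq slot: which page, and which entry in it. */
struct sketch_slot {
    unsigned int page;
    unsigned int index;
};

/* Number of ioreq pages needed for nr_vcpus, rounded up. */
static unsigned int sketch_nr_ioreq_pages(unsigned int nr_vcpus)
{
    return (nr_vcpus + SKETCH_IOREQ_PER_PAGE - 1) / SKETCH_IOREQ_PER_PAGE;
}

/* Split a vcpu id into a page number and an entry within that page. */
static struct sketch_slot sketch_locate(unsigned int vcpu_id)
{
    struct sketch_slot slot = {
        vcpu_id / SKETCH_IOREQ_PER_PAGE,
        vcpu_id % SKETCH_IOREQ_PER_PAGE,
    };

    /* The sketch supports at most SKETCH_MAX_IOREQ_PAGE pages. */
    assert(slot.page < SKETCH_MAX_IOREQ_PAGE);
    return slot;
}
```

Under these assumptions a 129-vcpu guest needs two pages, and vcpu 130 lands
in entry 2 of page 1.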
From 44919e1e80f36981d6e213f74302c8c89cc9f828 Mon Sep 17 00:00:00 2001
From: Chao Gao <chao.gao@intel.com>
Date: Tue, 5 Dec 2017 14:20:24 +0800
Subject: [PATCH] ioreq: add support of multiple ioreq pages
Each vcpu should have a corresponding 'ioreq_t' structure in the ioreq page.
Currently, only one 4K-byte page is used as the ioreq page, which limits
the number of vcpus to 128 when a device model is in use.
This patch changes the 'ioreq' field to an array of at most 4 pages.
When an IO server is created, the actual number of ioreq pages is calculated
from the number of vcpus. All ioreq pages are allocated on demand.
The creator should provide enough gfns to set up the mapping.
For compatibility, all legacy operations take effect on ioreq[0].
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
xen/arch/x86/hvm/ioreq.c | 86 ++++++++++++++++++++++++++++------------
xen/include/asm-x86/hvm/domain.h | 6 ++-
2 files changed, 65 insertions(+), 27 deletions(-)
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index d991ac9..598aedb 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -66,12 +66,12 @@ static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
{
- shared_iopage_t *p = s->ioreq.va;
+ shared_iopage_t *p = s->ioreq[v->vcpu_id / NR_IOREQ_PER_PAGE].va;
ASSERT((v == current) || !vcpu_runnable(v));
ASSERT(p != NULL);
- return &p->vcpu_ioreq[v->vcpu_id];
+ return &p->vcpu_ioreq[v->vcpu_id % NR_IOREQ_PER_PAGE];
}
bool hvm_io_pending(struct vcpu *v)
@@ -239,7 +239,7 @@ static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
{
- struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+ struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq[0];
if ( gfn_eq(iorp->gfn, INVALID_GFN) )
return;
@@ -256,7 +256,7 @@ static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
{
struct domain *d = s->domain;
- struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+ struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq[0];
int rc;
if ( iorp->page )
@@ -294,10 +294,10 @@ static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
return rc;
}
-static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, uint8_t idx)
{
struct domain *currd = current->domain;
- struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+ struct hvm_ioreq_page *iorp = idx ? &s->ioreq[idx - 1] : &s->bufioreq;
if ( iorp->page )
{
@@ -344,9 +344,9 @@ static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
return 0;
}
-static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, uint8_t idx)
{
- struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+ struct hvm_ioreq_page *iorp = idx ? &s->ioreq[idx - 1] : &s->bufioreq;
if ( !iorp->page )
return;
@@ -368,7 +368,18 @@ bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
FOR_EACH_IOREQ_SERVER(d, id, s)
{
- if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
+ int i;
+
+ for ( i = 0; i < s->nr_ioreq_page; i++ )
+ {
+ if ( s->ioreq[i].page == page )
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if ( !found && s->bufioreq.page == page )
{
found = true;
break;
@@ -384,7 +395,7 @@ static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
{
struct domain *d = s->domain;
- struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+ struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq[0];
if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) )
return;
@@ -398,7 +409,7 @@ static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
{
struct domain *d = s->domain;
- struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+ struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq[0];
int rc;
if ( IS_DEFAULT(s) || gfn_eq(iorp->gfn, INVALID_GFN) )
@@ -419,7 +430,7 @@ static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
{
ASSERT(spin_is_locked(&s->lock));
- if ( s->ioreq.va != NULL )
+ if ( s->ioreq[sv->vcpu->vcpu_id / NR_IOREQ_PER_PAGE].va )
{
ioreq_t *p = get_ioreq(s, sv->vcpu);
@@ -563,23 +574,27 @@ static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
{
- int rc;
-
- rc = hvm_alloc_ioreq_mfn(s, false);
+ int rc = 0, i;
- if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
- rc = hvm_alloc_ioreq_mfn(s, true);
+ for ( i = !HANDLE_BUFIOREQ(s); i <= s->nr_ioreq_page; i++ )
+ {
+ if ( (rc = hvm_alloc_ioreq_mfn(s, i)) != 0 )
+ break;
+ }
if ( rc )
- hvm_free_ioreq_mfn(s, false);
+ for ( ; i >= 0; i-- )
+ hvm_free_ioreq_mfn(s, i);
return rc;
}
static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
{
- hvm_free_ioreq_mfn(s, true);
- hvm_free_ioreq_mfn(s, false);
+ int i;
+
+ for ( i = 0; i <= s->nr_ioreq_page; i++ )
+ hvm_free_ioreq_mfn(s, i);
}
static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
@@ -681,7 +696,7 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
int bufioreq_handling, ioservid_t id)
{
struct vcpu *v;
- int rc;
+ int rc, i;
s->domain = d;
s->domid = domid;
@@ -690,7 +705,10 @@ static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
INIT_LIST_HEAD(&s->ioreq_vcpu_list);
spin_lock_init(&s->bufioreq_lock);
- s->ioreq.gfn = INVALID_GFN;
+ s->nr_ioreq_page = (d->max_vcpus + NR_IOREQ_PER_PAGE - 1) /
+ NR_IOREQ_PER_PAGE;
+ for ( i = 0; i < MAX_NR_IOREQ_PAGE; i++ )
+ s->ioreq[i].gfn = INVALID_GFN;
s->bufioreq.gfn = INVALID_GFN;
rc = hvm_ioreq_server_alloc_rangesets(s, id);
@@ -760,6 +778,10 @@ int hvm_create_ioreq_server(struct domain *d, domid_t domid,
rc = -EEXIST;
if ( GET_IOREQ_SERVER(d, i) )
goto fail;
+
+ /* Don't create default IO server if > 1 ioreq pages are needed */
+ if ( d->max_vcpus > NR_IOREQ_PER_PAGE )
+ goto fail;
}
else
{
@@ -858,6 +880,9 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
if ( !s )
goto out;
+ if ( s->nr_ioreq_page > 1 )
+ return -EINVAL;
+
ASSERT(!IS_DEFAULT(s));
if ( ioreq_gfn || bufioreq_gfn )
@@ -868,7 +893,7 @@ int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
}
if ( ioreq_gfn )
- *ioreq_gfn = gfn_x(s->ioreq.gfn);
+ *ioreq_gfn = gfn_x(s->ioreq[0].gfn);
if ( HANDLE_BUFIOREQ(s) )
{
@@ -917,10 +942,19 @@ int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
rc = 0;
break;
- case XENMEM_resource_ioreq_server_frame_ioreq(0):
- *mfn = _mfn(page_to_mfn(s->ioreq.page));
- rc = 0;
+ case XENMEM_resource_ioreq_server_frame_ioreq(0) ... \
+ XENMEM_resource_ioreq_server_frame_ioreq(MAX_NR_IOREQ_PAGE - 1):
+ {
+ int i = idx - XENMEM_resource_ioreq_server_frame_ioreq(0);
+
+ rc = -EINVAL;
+ if ( i < s->nr_ioreq_page )
+ {
+ *mfn = _mfn(page_to_mfn(s->ioreq[i].page));
+ rc = 0;
+ }
break;
+ }
default:
rc = -EINVAL;
diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 87f7994..3202f74 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -51,6 +51,8 @@ struct hvm_ioreq_vcpu {
#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
#define MAX_NR_IO_RANGES 256
+#define MAX_NR_IOREQ_PAGE 4
+#define NR_IOREQ_PER_PAGE (PAGE_SIZE / sizeof(ioreq_t))
struct hvm_ioreq_server {
struct list_head list_entry;
@@ -61,7 +63,9 @@ struct hvm_ioreq_server {
/* Domain id of emulating domain */
domid_t domid;
- struct hvm_ioreq_page ioreq;
+ /* Per-IOserver limitation on the size of 'ioreq' array */
+ uint8_t nr_ioreq_page;
+ struct hvm_ioreq_page ioreq[MAX_NR_IOREQ_PAGE];
struct list_head ioreq_vcpu_list;
struct hvm_ioreq_page bufioreq;
--
1.8.3.1
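An all-or-nothing allocation over one buffered page plus N ioreq pages can
be sketched in isolation as follows. This is a standalone illustration with
a stub allocator, not Xen code. Every sketch_* name and the fail_at test
hook are hypothetical. The only thing borrowed from the patch is the index
convention, where index 0 names the buffered ioreq page and index n names
ioreq[n - 1].

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_IOREQ_PAGE 4

/* Hypothetical stand-in for an ioreq server; not the Xen structure. */
struct sketch_server {
    unsigned int nr_ioreq_page;  /* ioreq pages this server needs */
    bool handle_bufioreq;        /* whether the buffered page is wanted */
    /* allocated[0] is the buffered page, allocated[n] is ioreq[n - 1]. */
    bool allocated[SKETCH_MAX_IOREQ_PAGE + 1];
    int fail_at;                 /* index whose allocation fails, or -1 */
};

/* Stub allocator: fails only at the index named by fail_at. */
static int sketch_alloc(struct sketch_server *s, unsigned int idx)
{
    if ( s->fail_at >= 0 && idx == (unsigned int)s->fail_at )
        return -1;

    s->allocated[idx] = true;
    return 0;
}

static void sketch_free(struct sketch_server *s, unsigned int idx)
{
    s->allocated[idx] = false;
}

/* Allocate every required page; on failure, undo what was allocated. */
static int sketch_alloc_pages(struct sketch_server *s)
{
    unsigned int first = s->handle_bufioreq ? 0 : 1;
    unsigned int i;
    int rc = 0;

    assert(s->nr_ioreq_page <= SKETCH_MAX_IOREQ_PAGE);

    for ( i = first; i <= s->nr_ioreq_page; i++ )
    {
        rc = sketch_alloc(s, i);
        if ( rc )
            break;  /* stop at the first failure only */
    }

    if ( rc )
        while ( i-- > first )  /* free indices i - 1 down to first */
            sketch_free(s, i);

    return rc;
}
```

The rollback loop frees only the indices that were successfully allocated
before the failing one, so a failure leaves the server with no pages held.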