* Re: [Xen-users] Grant reference batch transmission
From: Ian Campbell @ 2015-03-10 16:22 UTC
To: Gareth Stockwell; +Cc: xen-devel

Hi Gareth,

I think this counts as a -devel question, so I've added -devel and
moved -users to bcc (-users is more for end users). I've done less
quote trimming than usual for the other folks on -devel.

On Tue, 2015-03-10 at 14:15 +0000, Gareth Stockwell wrote:
> What is the recommended way to transmit batches of grant references
> between Linux domains?
>
> I need to share large regions of memory between domains. I understand
> that the grant table API can be used to share memory, with one grant
> reference being created per page (frame) to be shared:
>
>     gref = gnttab_grant_foreign_access(otherend_id, frame, readonly);
>
> or
>
>     err = gnttab_alloc_grant_references(num_grefs, &gref_head);
>     gref = gnttab_claim_reference(&gref_head);
>     err = gnttab_grant_foreign_access_ref(gref, otherend_id, frame,
>                                           readonly);
>
> In order to share a large number of pages, it is desirable to minimise
> both the number of hypercalls required in each domain, and the number
> of messages (e.g. xenstore writes) required to transmit grant
> reference(s) from the donor to the recipient.
>
> I see that gnttab_grant_foreign_access just updates grant table fields
> in a page which is mapped into the donor, and does not require a
> hypercall. In the recipient domain, multiple grant references can be
> mapped by gnttab_map_refs using a single GNTTABOP_map_grant_ref
> hypercall (assuming that the target memory is not paged out).

Correct. Granting access to a page is just a case of writing to a local
page, and the mapping interface is batched.

> What is the recommended way for the donor to transmit a batch of grant
> references? I assume that this requires the donor to pack references
> into an index page, grant foreign access to the index and transmit the
> index grant reference. Does Linux provide any way to do this, or are
> xenbus drivers expected to implement their own batch transmission?

A bit of each. You would indeed want to set up a shared page and push
the references into it, and Linux (/the Xen interface headers) provide
some helpers for this sort of thing, but each driver largely sets
things up itself using a specific ring request format etc.

The actual ring structure helpers are in
xen.git/xen/include/public/io/ring.h. You would define a request and
response pair (e.g. containing one or more grefs per request) and then
use the macros from ring.h to set up and use the shared ring data
structures. (NB: xen.git/xen/include/public corresponds to
linux.git/include/xen/interface)

The ring macros include provision for batching and deferred
notifications, so you can balance the number of grefs per request vs.
multiple requests based on your needs.

As far as setup of the ring itself goes, typically the frontend would
allocate one of its pages, grant it to the backend and communicate that
to the backend via xenstore. Most drivers use a little start of day
synchronisation protocol based around the "state" keys in the front and
backend xenstore dirs, working through the states in enum xenbus_state
(XenbusState*) from xen/include/public/io/xenbus.h. It's assumed that
this setup is infrequent (i.e. corresponds to plugging in a new disk
etc).

xen/include/public/io/blkif.h has an example of how that works in the
case of the blk driver.

In Linux (for most drivers at least, yours may not fit this
infrastructure) that state machine can be driven from the
.otherend_changed callback in the struct xenbus_driver ops struct.

http://wiki.xen.org/wiki/XenBus covers some of this in the first third,
but TBH it's not as helpful as it could be. I thought we had something
better somewhere (a whitepaper or something), but I can't find any sign
of such a thing; perhaps someone on the list has a reference to one.

Other than that there is the code in Linux. I think both net and
blkback put most of the initial setup xenbus stuff in their respective
drivers/{block,net}/xen-{blk,net}back/xenbus.c. For the frontend
(drivers/{block,net}/xen-{blk,net}front.c) it's in the single file. In
both cases the .otherend_changed hook is probably the place to start.

I hope that helps.

Ian.
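As a rough illustration of the "request and response pair" idea in the message above, here is a standalone sketch that defines a hypothetical request carrying a batch of grefs and estimates how many such requests fit in a single-page ring. The names (my_req, my_rsp, my_slot, GREFS_PER_REQ) are made up for the example, and the header size and power-of-two rounding passed in by the caller are assumptions modelled on ring.h. A real driver would rely on the macros in ring.h rather than on this arithmetic.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t grant_ref_t;   /* assumption: a gref is a 32-bit value */

#define GREFS_PER_REQ 14        /* hypothetical batch size per request */

/* Hypothetical request: an id echoed back in the response, plus grefs. */
struct my_req {
    uint32_t    id;
    uint32_t    nr_grefs;               /* valid entries in gref[] */
    grant_ref_t gref[GREFS_PER_REQ];
};

/* Hypothetical response: echoes the id and reports a status. */
struct my_rsp {
    uint32_t id;
    int32_t  status;
};

/* A ring slot holds a request or a response, so it is as big as the larger. */
union my_slot {
    struct my_req req;
    struct my_rsp rsp;
};

/* Round down to a power of two so that ring indices can be masked. */
static unsigned int round_down_pow2(unsigned int n)
{
    unsigned int p = 1;

    assert(n > 0);
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Slots in a shared ring of ring_bytes once the header has taken its share. */
static unsigned int ring_slots(unsigned int ring_bytes, unsigned int header_bytes)
{
    assert(ring_bytes > header_bytes);
    return round_down_pow2((ring_bytes - header_bytes) /
                           (unsigned int)sizeof(union my_slot));
}

/* Grefs that one completely full ring can carry. */
static unsigned int grefs_per_ring(unsigned int ring_bytes, unsigned int header_bytes)
{
    return ring_slots(ring_bytes, header_bytes) * GREFS_PER_REQ;
}
```

With a 4096-byte page and an assumed 64-byte ring header, the 64-byte slot above leaves room for 63 slots, which rounds down to 32, so the space taken by the ring pointers can cost close to half the ring when the slot size divides the page exactly.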
* Re: [Xen-users] Grant reference batch transmission
From: Gareth Stockwell @ 2015-03-11 10:12 UTC
To: Ian.Campbell@citrix.com; +Cc: xen-devel

Hi Ian,

Thanks for your reply.

On Tue, Mar 10, 2015 at 16:22:40, Ian Campbell wrote:
> > What is the recommended way for the donor to transmit a batch of
> > grant references? I assume that this requires the donor to pack
> > references into an index page, grant foreign access to the index and
> > transmit the index grant reference. Does Linux provide any way to
> > do this, or are xenbus drivers expected to implement their own
> > batch transmission?
>
> A bit of each. You would indeed want to set up a shared page and push
> the references into it, and Linux (/the Xen interface headers) provide
> some helpers for this sort of thing, but each driver largely sets
> things up itself using a specific ring request format etc.
>
> As far as setup of the ring itself goes, typically the frontend would
> allocate one of its pages, grant it to the backend and communicate
> that to the backend via xenstore. Most drivers use a little start of
> day synchronisation protocol based around the "state" keys in the
> front and backend xenstore dirs, working through the states in enum
> xenbus_state (XenbusState*) from xen/include/public/io/xenbus.h. It's
> assumed that this setup is infrequent (i.e. corresponds to plugging
> in a new disk etc).
>
> In Linux (for most drivers at least, yours may not fit this
> infrastructure) that state machine can be driven from the
> .otherend_changed callback in the struct xenbus_driver ops struct.

We have implemented front/backend drivers which perform this handshake
via xenstore state keys, and which share a single page allocated by the
frontend.

I think this gives us two options for grant reference batch
transmission:

1. Send the grant references via the ring buffer.
   This doesn't require any additional allocations, but means that if
   the number of grant references in the batch is greater than
   O(sizeof(ringbuffer) / sizeof(grant_ref_t)), cycling through the
   ring will be required.

2. Allocate and share one or more "index" page(s) which hold the grant
   references.
   This means that only a single grant_ref_t needs to be sent via the
   ring, but at the cost of allocating additional memory for the index.
   If multiple index pages are required, they could be chained together
   by appending to index page N a grant reference pointing to page N+1.

AFAICS the existing drivers use approach #1; is there any precedent for
#2?

Gareth
* Re: [Xen-users] Grant reference batch transmission
From: Ian Campbell @ 2015-03-11 10:24 UTC
To: Gareth Stockwell; +Cc: Roger Pau Monne, Wei Liu, xen-devel

On Wed, 2015-03-11 at 10:12 +0000, Gareth Stockwell wrote:
> Hi Ian,
>
> Thanks for your reply.
>
> On Tue, Mar 10, 2015 at 16:22:40, Ian Campbell wrote:
> > > What is the recommended way for the donor to transmit a batch of
> > > grant references? I assume that this requires the donor to pack
> > > references into an index page, grant foreign access to the index
> > > and transmit the index grant reference. Does Linux provide any
> > > way to do this, or are xenbus drivers expected to implement their
> > > own batch transmission?
> >
> > A bit of each. You would indeed want to set up a shared page and
> > push the references into it, and Linux (/the Xen interface headers)
> > provide some helpers for this sort of thing, but each driver
> > largely sets things up itself using a specific ring request format
> > etc.
> >
> > As far as setup of the ring itself goes, typically the frontend
> > would allocate one of its pages, grant it to the backend and
> > communicate that to the backend via xenstore. Most drivers use a
> > little start of day synchronisation protocol based around the
> > "state" keys in the front and backend xenstore dirs, working
> > through the states in enum xenbus_state (XenbusState*) from
> > xen/include/public/io/xenbus.h. It's assumed that this setup is
> > infrequent (i.e. corresponds to plugging in a new disk etc).
> >
> > In Linux (for most drivers at least, yours may not fit this
> > infrastructure) that state machine can be driven from the
> > .otherend_changed callback in the struct xenbus_driver ops struct.
>
> We have implemented front/backend drivers which perform this handshake
> via xenstore state keys, and which share a single page allocated by
> the frontend.
>
> I think this gives us two options for grant reference batch
> transmission:
>
> 1. Send the grant references via the ring buffer.
>    This doesn't require any additional allocations, but means that if
>    the number of grant references in the batch is greater than
>    O(sizeof(ringbuffer) / sizeof(grant_ref_t)), cycling through the
>    ring will be required.

Correct. In fact it's a bit worse because the ring pointers steal a bit
of space out of the shared page. You might also find that in practice
you want an id in the request which is echoed back in the response,
e.g. to handle out of order completion (depends on your use case
though), which would increase sizeof(grant_ref_t) (which is really
sizeof(my_req_t)).

What sorts of batch sizes are you expecting to see in your use case?

> 2. Allocate and share one or more "index" page(s) which hold the grant
>    references.
>    This means that only a single grant_ref_t needs to be sent via the
>    ring, but at the cost of allocating additional memory for the
>    index. If multiple index pages are required, they could be chained
>    together by appending to index page N a grant reference pointing
>    to page N+1.
>
> AFAICS the existing drivers use approach #1; is there any precedent
> for #2?

There have been patches for both net and blk at various points which
implemented "multipage rings" in order to get around the limitation on
the number of requests/responses which can fit in a single page. I'm
not sure what the status of those is (Roger, Wei?) but I would
assume/hope that they also included some improvements to the common
infrastructure to make it simpler to arrange for this.

The other approach, which I believe is used by the blk protocol today,
is "indirect descriptors", where the ring request contains a gref which
references another page which can then be packed full of grant
references. That has the downside of one more map call per request
(although these can still be batched by the backend) but lets you pack
a lot more "bandwidth" into a single ring.

Ian.
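To make the chained "index page" idea from this exchange concrete, here is a minimal standalone sketch that packs a list of grefs into 4 kB index pages, each carrying the gref of the next page in the chain. The layout (a count, a next-page gref, then the payload), the NO_NEXT_PAGE marker and all function names are hypothetical and are not taken from the blk protocol or any existing driver. Granting the index pages themselves is assumed to have been done by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t grant_ref_t;   /* assumption: a gref is a 32-bit value */

#define INDEX_PAGE_SIZE  4096u
#define SLOTS_PER_PAGE   (INDEX_PAGE_SIZE / sizeof(grant_ref_t))
#define PAYLOAD_PER_PAGE (SLOTS_PER_PAGE - 2)    /* minus nr_refs and next */
#define NO_NEXT_PAGE     ((grant_ref_t)~0u)      /* hypothetical end marker */

/* Hypothetical index page layout: exactly one 4 kB page. */
struct index_page {
    uint32_t    nr_refs;                 /* valid entries in refs[] */
    grant_ref_t next;                    /* gref of next index page */
    grant_ref_t refs[PAYLOAD_PER_PAGE];
};

/* Number of index pages needed to describe nr_grefs data pages. */
static size_t index_pages_needed(size_t nr_grefs)
{
    return (nr_grefs + PAYLOAD_PER_PAGE - 1) / PAYLOAD_PER_PAGE;
}

/*
 * Fill the chain. pages[i] is index page i, and page_grefs[i] is the gref
 * under which the caller has already granted that page. Returns the number
 * of index pages used, or 0 if there is nothing to pack or too few pages.
 */
static size_t pack_index_pages(const grant_ref_t *grefs, size_t nr_grefs,
                               struct index_page *pages,
                               const grant_ref_t *page_grefs, size_t nr_pages)
{
    size_t needed = index_pages_needed(nr_grefs);
    size_t done = 0;

    if (needed == 0 || needed > nr_pages)
        return 0;

    for (size_t i = 0; i < needed; i++) {
        size_t n = nr_grefs - done;

        if (n > PAYLOAD_PER_PAGE)
            n = PAYLOAD_PER_PAGE;
        pages[i].nr_refs = (uint32_t)n;
        for (size_t j = 0; j < n; j++)
            pages[i].refs[j] = grefs[done + j];
        /* Link to the following page, or terminate the chain. */
        pages[i].next = (i + 1 < needed) ? page_grefs[i + 1] : NO_NEXT_PAGE;
        done += n;
    }
    return needed;
}
```

Only page_grefs[0] would then need to travel over the ring; the recipient maps that page, reads nr_refs entries, and follows next until it sees the end marker.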
* Re: [Xen-users] Grant reference batch transmission
From: Gareth Stockwell @ 2015-03-11 10:40 UTC
To: Ian.Campbell@citrix.com; +Cc: Roger Pau Monne, Wei Liu, xen-devel

On Wed, Mar 11, 2015 at 10:24:48, Ian Campbell wrote:
> > I think this gives us two options for grant reference batch
> > transmission:
> >
> > 1. Send the grant references via the ring buffer.
> >    This doesn't require any additional allocations, but means that
> >    if the number of grant references in the batch is greater than
> >    O(sizeof(ringbuffer) / sizeof(grant_ref_t)), cycling through the
> >    ring will be required.
>
> Correct. In fact it's a bit worse because the ring pointers steal a
> bit of space out of the shared page. You might also find that in
> practice you want an id in the request which is echoed back in the
> response, e.g. to handle out of order completion (depends on your use
> case though), which would increase sizeof(grant_ref_t) (which is
> really sizeof(my_req_t)).
>
> What sorts of batch sizes are you expecting to see in your use case?

We need to share hundreds of MB, so (assuming a 4kB guest page size)
the batch size can be thousands of grant references.

> > 2. Allocate and share one or more "index" page(s) which hold the
> >    grant references.
> >    This means that only a single grant_ref_t needs to be sent via
> >    the ring, but at the cost of allocating additional memory for
> >    the index. If multiple index pages are required, they could be
> >    chained together by appending to index page N a grant reference
> >    pointing to page N+1.
>
> There have been patches for both net and blk at various points which
> implemented "multipage rings" in order to get around the limitation on
> the number of requests/responses which can fit in a single page. I'm
> not sure what the status of those is (Roger, Wei?) but I would
> assume/hope that they also included some improvements to the common
> infrastructure to make it simpler to arrange for this.
>
> The other approach, which I believe is used by the blk protocol today,
> is "indirect descriptors", where the ring request contains a gref
> which references another page which can then be packed full of grant
> references. That has the downside of one more map call per request
> (although these can still be batched by the backend) but lets you
> pack a lot more "bandwidth" into a single ring.

The approach I was thinking about sounds similar to "indirect
descriptors" - I'll have a closer look at the blk drivers.
* Re: [Xen-users] Grant reference batch transmission
From: Ian Campbell @ 2015-03-11 10:54 UTC
To: Gareth Stockwell; +Cc: Roger Pau Monne, Wei Liu, xen-devel

On Wed, 2015-03-11 at 10:40 +0000, Gareth Stockwell wrote:
> On Wed, Mar 11, 2015 at 10:24:48, Ian Campbell wrote:
> > > I think this gives us two options for grant reference batch
> > > transmission:
> > >
> > > 1. Send the grant references via the ring buffer.
> > >    This doesn't require any additional allocations, but means
> > >    that if the number of grant references in the batch is greater
> > >    than O(sizeof(ringbuffer) / sizeof(grant_ref_t)), cycling
> > >    through the ring will be required.
> >
> > Correct. In fact it's a bit worse because the ring pointers steal a
> > bit of space out of the shared page. You might also find that in
> > practice you want an id in the request which is echoed back in the
> > response, e.g. to handle out of order completion (depends on your
> > use case though), which would increase sizeof(grant_ref_t) (which
> > is really sizeof(my_req_t)).
> >
> > What sorts of batch sizes are you expecting to see in your use case?
>
> We need to share hundreds of MB, so (assuming a 4kB guest page size)
> the batch size can be thousands of grant references.

FWIW, the granularity of a gref is 4kB irrespective of the guest's page
size (so e.g. a 64kB page would need 16 grefs to cover it), so your
calculation is correct everywhere.

We have been considering allowing "superpage" grants of higher order
mappings, but no work has been done on that yet. Would such a feature
be useful, or are your hundreds of MB scattered?

I suppose the 100s of MB is changing reasonably often (e.g. not just a
start of day setup thing), otherwise the performance/efficiency of gref
communications wouldn't be such an issue.

Ian.
* Re: [Xen-users] Grant reference batch transmission
From: Gareth Stockwell @ 2015-03-11 13:30 UTC
To: Ian.Campbell@citrix.com; +Cc: Roger Pau Monne, Wei Liu, xen-devel

On Wed, Mar 11, 2015 at 10:54:25, Ian Campbell wrote:
> > > What sorts of batch sizes are you expecting to see in your use
> > > case?
> >
> > We need to share hundreds of MB, so (assuming a 4kB guest page size)
> > the batch size can be thousands of grant references.
>
> FWIW, the granularity of a gref is 4kB irrespective of the guest's
> page size (so e.g. a 64kB page would need 16 grefs to cover it), so
> your calculation is correct everywhere.
>
> We have been considering allowing "superpage" grants of higher order
> mappings, but no work has been done on that yet. Would such a feature
> be useful, or are your hundreds of MB scattered?

They will be contiguous in 2MB chunks - so higher order grants would be
useful.

What happens to the gref granularity if the stage-2 page size is larger
than the stage-1 page size?

> I suppose the 100s of MB is changing reasonably often (e.g. not just a
> start of day setup thing), otherwise the performance/efficiency of
> gref communications wouldn't be such an issue.

It's not start of day (i.e. at driver probe time), but it's not
particularly frequent either. The memory being mapped into the remote
domain contains data for a workload whose processing time is typically
quite extended. The time available for establishing the mapping should
be at least in the tens of milliseconds.

Gareth
* Re: [Xen-users] Grant reference batch transmission
From: Ian Campbell @ 2015-03-12 10:43 UTC
To: Gareth Stockwell; +Cc: Roger Pau Monne, Wei Liu, xen-devel

On Wed, 2015-03-11 at 13:30 +0000, Gareth Stockwell wrote:
> On Wed, Mar 11, 2015 at 10:54:25, Ian Campbell wrote:
> > > > What sorts of batch sizes are you expecting to see in your use
> > > > case?
> > >
> > > We need to share hundreds of MB, so (assuming a 4kB guest page
> > > size) the batch size can be thousands of grant references.
> >
> > FWIW, the granularity of a gref is 4kB irrespective of the guest's
> > page size (so e.g. a 64kB page would need 16 grefs to cover it), so
> > your calculation is correct everywhere.
> >
> > We have been considering allowing "superpage" grants of higher order
> > mappings, but no work has been done on that yet. Would such a
> > feature be useful, or are your hundreds of MB scattered?
>
> They will be contiguous in 2MB chunks - so higher order grants would
> be useful.
>
> What happens to the gref granularity if the stage-2 page size is
> larger than the stage-1 page size?

I think it is unlikely that Xen would ever support anything other than
4K/2M/1G granules internally, i.e. in stage 2; there are just too many
assumptions in the ABI that the basic page size is 4K. Superpages are
something we can cope with, but changing the base size not so much.

TBH my feeling is that for a hypervisor at least, using 2M mappings
where possible is a better option than e.g. using 64K pages.

> > I suppose the 100s of MB is changing reasonably often (e.g. not just
> > a start of day setup thing), otherwise the performance/efficiency
> > of gref communications wouldn't be such an issue.
>
> It's not start of day (i.e. at driver probe time), but it's not
> particularly frequent either. The memory being mapped into the remote
> domain contains data for a workload whose processing time is typically
> quite extended. The time available for establishing the mapping
> should be at least in the tens of milliseconds.

From a back of the envelope calculation I think pushing 100MB worth of
grefs over a single page ring is going to take tens of full rings worth
of grefs, so pushing them individually is going to be pretty tedious
and probably blow the time budget. Even multipage rings wouldn't cut it
unless you went to pretty high order rings.

Some sort of indirect scheme seems like it would be best. Given tens of
ms to get it done, I don't think the mapping of the indirect pages will
be too much of a factor if you batch them.

Ian
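The back-of-the-envelope estimate in the message above can be reproduced with a few lines of arithmetic. This is only a sketch: the 4 kB gref granularity comes from the thread, while the ring geometry passed in by the caller (slots per ring, grefs per request) and the assumption that an indirect page holds nothing but 4-byte grefs are illustrative values, not figures from any real driver.

```c
#include <assert.h>
#include <stdint.h>

#define GRANT_GRANULE 4096u   /* bytes covered by one gref, per the thread */

/* Integer division rounding up. */
static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/* Grefs needed to cover a region of the given size. */
static uint64_t grefs_for_bytes(uint64_t bytes)
{
    return div_round_up(bytes, GRANT_GRANULE);
}

/* Times a ring must be filled to push nr_grefs directly through it. */
static uint64_t ring_fills_needed(uint64_t nr_grefs, uint64_t slots,
                                  uint64_t grefs_per_req)
{
    return div_round_up(nr_grefs, slots * grefs_per_req);
}

/* Indirect pages needed if each is packed with 4-byte grefs only. */
static uint64_t indirect_pages_needed(uint64_t nr_grefs)
{
    return div_round_up(nr_grefs, GRANT_GRANULE / sizeof(uint32_t));
}
```

For 100 MB this gives 25600 grefs. With a hypothetical ring of 32 slots and 14 grefs per request, that is 58 full rings, in line with the "tens of full rings" estimate, whereas the same grefs fit in 25 indirect pages. The same helper shows that a 64 kB page needs 16 grefs and a 2 MB chunk needs 512.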
* Re: [Xen-users] Grant reference batch transmission
From: Wei Liu @ 2015-03-11 10:43 UTC
To: Ian Campbell; +Cc: Wei Liu, Roger Pau Monne, Gareth Stockwell, xen-devel

On Wed, Mar 11, 2015 at 10:24:48AM +0000, Ian Campbell wrote:
> On Wed, 2015-03-11 at 10:12 +0000, Gareth Stockwell wrote:
> > Hi Ian,
> >
> > Thanks for your reply.
> >
> > On Tue, Mar 10, 2015 at 16:22:40, Ian Campbell wrote:
> > > > What is the recommended way for the donor to transmit a batch
> > > > of grant references? I assume that this requires the donor to
> > > > pack references into an index page, grant foreign access to the
> > > > index and transmit the index grant reference. Does Linux
> > > > provide any way to do this, or are xenbus drivers expected to
> > > > implement their own batch transmission?
> > >
> > > A bit of each. You would indeed want to set up a shared page and
> > > push the references into it, and Linux (/the Xen interface
> > > headers) provide some helpers for this sort of thing, but each
> > > driver largely sets things up itself using a specific ring
> > > request format etc.
> > >
> > > As far as setup of the ring itself goes, typically the frontend
> > > would allocate one of its pages, grant it to the backend and
> > > communicate that to the backend via xenstore. Most drivers use a
> > > little start of day synchronisation protocol based around the
> > > "state" keys in the front and backend xenstore dirs, working
> > > through the states in enum xenbus_state (XenbusState*) from
> > > xen/include/public/io/xenbus.h. It's assumed that this setup is
> > > infrequent (i.e. corresponds to plugging in a new disk etc).
> > >
> > > In Linux (for most drivers at least, yours may not fit this
> > > infrastructure) that state machine can be driven from the
> > > .otherend_changed callback in the struct xenbus_driver ops
> > > struct.
> >
> > We have implemented front/backend drivers which perform this
> > handshake via xenstore state keys, and which share a single page
> > allocated by the frontend.
> >
> > I think this gives us two options for grant reference batch
> > transmission:
> >
> > 1. Send the grant references via the ring buffer.
> >    This doesn't require any additional allocations, but means that
> >    if the number of grant references in the batch is greater than
> >    O(sizeof(ringbuffer) / sizeof(grant_ref_t)), cycling through the
> >    ring will be required.
>
> Correct. In fact it's a bit worse because the ring pointers steal a
> bit of space out of the shared page. You might also find that in
> practice you want an id in the request which is echoed back in the
> response, e.g. to handle out of order completion (depends on your use
> case though), which would increase sizeof(grant_ref_t) (which is
> really sizeof(my_req_t)).
>
> What sorts of batch sizes are you expecting to see in your use case?
>
> > 2. Allocate and share one or more "index" page(s) which hold the
> >    grant references.
> >    This means that only a single grant_ref_t needs to be sent via
> >    the ring, but at the cost of allocating additional memory for
> >    the index. If multiple index pages are required, they could be
> >    chained together by appending to index page N a grant reference
> >    pointing to page N+1.
> >
> > AFAICS the existing drivers use approach #1; is there any precedent
> > for #2?
>
> There have been patches for both net and blk at various points which
> implemented "multipage rings" in order to get around the limitation on
> the number of requests/responses which can fit in a single page. I'm
> not sure what the status of those is (Roger, Wei?) but I would
> assume/hope that they also included some improvements to the common
> infrastructure to make it simpler to arrange for this.

Search for "xenbus_client: extend interface to support multi-page ring"
for the common infrastructure bits. I think the latest public posting
is

<1422008071-27643-1-git-send-email-bob.liu@oracle.com>

The same thread also contains changes to blk drivers to make use of
that feature.

Wei.

> The other approach, which I believe is used by the blk protocol today,
> is "indirect descriptors", where the ring request contains a gref
> which references another page which can then be packed full of grant
> references. That has the downside of one more map call per request
> (although these can still be batched by the backend) but lets you
> pack a lot more "bandwidth" into a single ring.
>
> Ian.