* [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() @ 2017-06-02 11:50 Peter Xu 2017-06-02 11:50 ` [Qemu-devel] [PATCH 1/3] exec: add page_mask for address_space_do_translate Peter Xu ` (3 more replies) 0 siblings, 4 replies; 34+ messages in thread From: Peter Xu @ 2017-06-02 11:50 UTC (permalink / raw) To: qemu-devel Cc: Michael S . Tsirkin, Paolo Bonzini, Maxime Coquelin, peterx, Jason Wang With the patch applied: [PATCH v3] exec: fix address_space_get_iotlb_entry page mask (already in Paolo's pull request but not yet merged) Now we can have valid address masks. However it is still not ideal, considering that the mask may not be aligned to guest page sizes. One example would be when huge page is used in guest (please see commit message in patch 1 for details). It applies to normal pages too. So we not only need a valid address mask, we should make sure it is page mask (for x86, it should be either 4K/2M/1G pages). Patch 1+2 fixes the problem. Tested with both kernel net driver or testpmd, on either 4K/2M pages, to make sure the page mask is correct. Patch 3 is cherry picked from PT series, after fixing from 1+2, we'll definitely want patch 3 now. Here's the simplest TCP streaming test using vhost dmar and iommu=pt in guest: without patch 3: 12.0Gbps with patch 3: 33.5Gbps Please review, thanks. Peter Xu (3): exec: add page_mask for address_space_do_translate exec: simplify address_space_get_iotlb_entry vhost: iommu: cache static mapping if there is exec.c | 73 +++++++++++++++++++++++++++++++++----------------- hw/virtio/trace-events | 4 +++ hw/virtio/vhost.c | 66 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 119 insertions(+), 24 deletions(-) -- 2.7.4 ^ permalink raw reply [flat|nested] 34+ messages in thread
* [Qemu-devel] [PATCH 1/3] exec: add page_mask for address_space_do_translate 2017-06-02 11:50 [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() Peter Xu @ 2017-06-02 11:50 ` Peter Xu 2017-06-02 16:45 ` Michael S. Tsirkin 2017-06-02 11:50 ` [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry Peter Xu ` (2 subsequent siblings) 3 siblings, 1 reply; 34+ messages in thread From: Peter Xu @ 2017-06-02 11:50 UTC (permalink / raw) To: qemu-devel Cc: Michael S . Tsirkin, Paolo Bonzini, Maxime Coquelin, peterx, Jason Wang The function is originally used for address_space_translate() and what we care about most is (xlat, plen) range. However for iotlb requests, we don't really care about "plen", but the size of the page that "xlat" is located on. While, plen cannot really contain this information. A simple example to show why "plen" is not good for IOTLB translations: E.g., for huge pages, it is possible that guest mapped 1G huge page on device side that used this GPA range: 0x100000000 - 0x13fffffff Then let's say we want to translate one IOVA that finally mapped to GPA 0x13ffffe00 (which is located on this 1G huge page). Then here we'll get: (xlat, plen) = (0x13fffe00, 0x200) So the IOTLB would be only covering a very small range since from "plen" (which is 0x200 bytes) we cannot tell the size of the page. Actually we can really know that this is a huge page - we just throw the information away in address_space_do_translate(). This patch introduced "page_mask" optional parameter to capture that page mask info. Also, I made "plen" an optional parameter as well, with some comments for the whole function. No functional change yet. Signed-off-by: Peter Xu <peterx@redhat.com> --- exec.c | 46 ++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 40 insertions(+), 6 deletions(-) diff --git a/exec.c b/exec.c index 8fc0e78..63a3ff0 100644 --- a/exec.c +++ b/exec.c @@ -465,21 +465,45 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x return section; } -/* Called from RCU critical section */ +/** + * address_space_do_translate - translate an address in AddressSpace + * + * @as: the address space that we want to translate on + * @addr: the address to be translated in above address space + * @xlat: the translated address offset within memory region. It + * cannot be @NULL. + * @plen_out: valid read/write length of the translated address. It + * can be @NULL when we don't care about it. + * @page_mask_out: page mask for the translated address. This + * should only be meaningful for IOMMU translated + * addresses, since there may be huge pages that this bit + * would tell. It can be @NULL if we don't care about it. + * @is_write: whether the translation operation is for write + * @is_mmio: whether this can be MMIO, set true if it can + * + * This function is called from RCU critical section + */ static MemoryRegionSection address_space_do_translate(AddressSpace *as, hwaddr addr, hwaddr *xlat, - hwaddr *plen, + hwaddr *plen_out, + hwaddr *page_mask_out, bool is_write, bool is_mmio) { IOMMUTLBEntry iotlb; MemoryRegionSection *section; MemoryRegion *mr; + hwaddr page_mask = TARGET_PAGE_MASK; + hwaddr plen = (hwaddr)(-1); + + if (plen_out) { + plen = *plen_out; + } for (;;) { AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch); - section = address_space_translate_internal(d, addr, &addr, plen, is_mmio); + section = address_space_translate_internal(d, addr, &addr, &plen, is_mmio); mr = section->mr; if (!mr->iommu_ops) { @@ -490,7 +514,8 @@ static MemoryRegionSection address_space_do_translate(AddressSpace *as, IOMMU_WO : IOMMU_RO); addr = ((iotlb.translated_addr & ~iotlb.addr_mask) | (addr & iotlb.addr_mask)); - *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1); + page_mask = iotlb.addr_mask; + plen = MIN(plen, (addr | iotlb.addr_mask) - addr + 1); if (!(iotlb.perm & (1 << is_write))) { goto translate_fail; } @@ -500,6 +525,14 @@ static MemoryRegionSection address_space_do_translate(AddressSpace *as, *xlat = addr; + if (page_mask_out) { + *page_mask_out = page_mask; + } + + if (plen_out) { + *plen_out = plen; + } + return *section; translate_fail: @@ -518,7 +551,7 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, /* This can never be MMIO. */ section = address_space_do_translate(as, addr, &xlat, &plen, - is_write, false); + NULL, is_write, false); /* Illegal translation */ if (section.mr == &io_mem_unassigned) { @@ -560,7 +593,8 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr, MemoryRegionSection section; /* This can be MMIO, so setup MMIO bit. */ - section = address_space_do_translate(as, addr, xlat, plen, is_write, true); + section = address_space_do_translate(as, addr, xlat, plen, + NULL, is_write, true); mr = section.mr; if (xen_enabled() && memory_access_is_direct(mr, is_write)) { -- 2.7.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] exec: add page_mask for address_space_do_translate 2017-06-02 11:50 ` [Qemu-devel] [PATCH 1/3] exec: add page_mask for address_space_do_translate Peter Xu @ 2017-06-02 16:45 ` Michael S. Tsirkin 2017-06-05 2:52 ` Peter Xu 0 siblings, 1 reply; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-02 16:45 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, Paolo Bonzini, Maxime Coquelin, Jason Wang On Fri, Jun 02, 2017 at 07:50:52PM +0800, Peter Xu wrote: > The function is originally used for address_space_translate() and what > we care about most is (xlat, plen) range. However for iotlb requests, we > don't really care about "plen", but the size of the page that "xlat" is > located on. While, plen cannot really contain this information. > > A simple example to show why "plen" is not good for IOTLB translations: > > E.g., for huge pages, it is possible that guest mapped 1G huge page on > device side that used this GPA range: > > 0x100000000 - 0x13fffffff > > Then let's say we want to translate one IOVA that finally mapped to GPA > 0x13ffffe00 (which is located on this 1G huge page). Then here we'll > get: > > (xlat, plen) = (0x13fffe00, 0x200) > > So the IOTLB would be only covering a very small range since from > "plen" (which is 0x200 bytes) we cannot tell the size of the page. > > Actually we can really know that this is a huge page - we just throw the > information away in address_space_do_translate(). > > This patch introduced "page_mask" optional parameter to capture that > page mask info. Also, I made "plen" an optional parameter as well, with > some comments for the whole function. > > No functional change yet. > > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > exec.c | 46 ++++++++++++++++++++++++++++++++++++++++------ > 1 file changed, 40 insertions(+), 6 deletions(-) > > diff --git a/exec.c b/exec.c > index 8fc0e78..63a3ff0 100644 > --- a/exec.c > +++ b/exec.c > @@ -465,21 +465,45 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x > return section; > } > > -/* Called from RCU critical section */ > +/** > + * address_space_do_translate - translate an address in AddressSpace > + * > + * @as: the address space that we want to translate on > + * @addr: the address to be translated in above address space > + * @xlat: the translated address offset within memory region. It > + * cannot be @NULL. > + * @plen_out: valid read/write length of the translated address. It > + * can be @NULL when we don't care about it. > + * @page_mask_out: page mask for the translated address. This > + * should only be meaningful for IOMMU translated > + * addresses, since there may be huge pages that this bit > + * would tell. It can be @NULL if we don't care about it. Why do we need plen or mask at all? It seems MemoryRegionSection has address and length already. So if you want to find out distance to section end, do section.size - xlat and you are done. > + * @is_write: whether the translation operation is for write > + * @is_mmio: whether this can be MMIO, set true if it can > + * > + * This function is called from RCU critical section > + */ > static MemoryRegionSection address_space_do_translate(AddressSpace *as, > hwaddr addr, > hwaddr *xlat, > - hwaddr *plen, > + hwaddr *plen_out, > + hwaddr *page_mask_out, > bool is_write, > bool is_mmio) > { > IOMMUTLBEntry iotlb; > MemoryRegionSection *section; > MemoryRegion *mr; > + hwaddr page_mask = TARGET_PAGE_MASK; > + hwaddr plen = (hwaddr)(-1); > + > + if (plen_out) { > + plen = *plen_out; > + } > > for (;;) { > AddressSpaceDispatch *d = atomic_rcu_read(&as->dispatch); > - section = address_space_translate_internal(d, addr, &addr, plen, is_mmio); > + section = address_space_translate_internal(d, addr, &addr, &plen, is_mmio); > mr = section->mr; > > if (!mr->iommu_ops) { > @@ -490,7 +514,8 @@ static MemoryRegionSection address_space_do_translate(AddressSpace *as, > IOMMU_WO : IOMMU_RO); > addr = ((iotlb.translated_addr & ~iotlb.addr_mask) > | (addr & iotlb.addr_mask)); > - *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1); > + page_mask = iotlb.addr_mask; > + plen = MIN(plen, (addr | iotlb.addr_mask) - addr + 1); > if (!(iotlb.perm & (1 << is_write))) { > goto translate_fail; > } > @@ -500,6 +525,14 @@ static MemoryRegionSection address_space_do_translate(AddressSpace *as, > > *xlat = addr; > > + if (page_mask_out) { > + *page_mask_out = page_mask; > + } > + > + if (plen_out) { > + *plen_out = plen; > + } > + > return *section; > > translate_fail: > @@ -518,7 +551,7 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, > > /* This can never be MMIO. */ > section = address_space_do_translate(as, addr, &xlat, &plen, > - is_write, false); > + NULL, is_write, false); > > /* Illegal translation */ > if (section.mr == &io_mem_unassigned) { > @@ -560,7 +593,8 @@ MemoryRegion *address_space_translate(AddressSpace *as, hwaddr addr, > MemoryRegionSection section; > > /* This can be MMIO, so setup MMIO bit. */ > - section = address_space_do_translate(as, addr, xlat, plen, is_write, true); > + section = address_space_do_translate(as, addr, xlat, plen, > + NULL, is_write, true); > mr = section.mr; > > if (xen_enabled() && memory_access_is_direct(mr, is_write)) { > -- > 2.7.4 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 1/3] exec: add page_mask for address_space_do_translate 2017-06-02 16:45 ` Michael S. Tsirkin @ 2017-06-05 2:52 ` Peter Xu 0 siblings, 0 replies; 34+ messages in thread From: Peter Xu @ 2017-06-05 2:52 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: qemu-devel, Paolo Bonzini, Maxime Coquelin, Jason Wang On Fri, Jun 02, 2017 at 07:45:05PM +0300, Michael S. Tsirkin wrote: > On Fri, Jun 02, 2017 at 07:50:52PM +0800, Peter Xu wrote: > > The function is originally used for address_space_translate() and what > > we care about most is (xlat, plen) range. However for iotlb requests, we > > don't really care about "plen", but the size of the page that "xlat" is > > located on. While, plen cannot really contain this information. > > > > A simple example to show why "plen" is not good for IOTLB translations: > > > > E.g., for huge pages, it is possible that guest mapped 1G huge page on > > device side that used this GPA range: > > > > 0x100000000 - 0x13fffffff > > > > Then let's say we want to translate one IOVA that finally mapped to GPA > > 0x13ffffe00 (which is located on this 1G huge page). Then here we'll > > get: > > > > (xlat, plen) = (0x13fffe00, 0x200) > > > > So the IOTLB would be only covering a very small range since from > > "plen" (which is 0x200 bytes) we cannot tell the size of the page. > > > > Actually we can really know that this is a huge page - we just throw the > > information away in address_space_do_translate(). > > > > This patch introduced "page_mask" optional parameter to capture that > > page mask info. Also, I made "plen" an optional parameter as well, with > > some comments for the whole function. > > > > No functional change yet. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > exec.c | 46 ++++++++++++++++++++++++++++++++++++++++------ > > 1 file changed, 40 insertions(+), 6 deletions(-) > > > > diff --git a/exec.c b/exec.c > > index 8fc0e78..63a3ff0 100644 > > --- a/exec.c > > +++ b/exec.c > > @@ -465,21 +465,45 @@ address_space_translate_internal(AddressSpaceDispatch *d, hwaddr addr, hwaddr *x > > return section; > > } > > > > -/* Called from RCU critical section */ > > +/** > > + * address_space_do_translate - translate an address in AddressSpace > > + * > > + * @as: the address space that we want to translate on > > + * @addr: the address to be translated in above address space > > + * @xlat: the translated address offset within memory region. It > > + * cannot be @NULL. > > + * @plen_out: valid read/write length of the translated address. It > > + * can be @NULL when we don't care about it. > > + * @page_mask_out: page mask for the translated address. This > > + * should only be meaningful for IOMMU translated > > + * addresses, since there may be huge pages that this bit > > + * would tell. It can be @NULL if we don't care about it. > > Why do we need plen or mask at all? It seems MemoryRegionSection > has address and length already. So if you want to find out > distance to section end, do section.size - xlat and you are done. Hi, Michael, When you say: section.size - xlat Do you really mean this? section.offset_within_address_space + section.size - xlat Since otherwise it will make no much sense to me. Anyway, I don't know whether it'll be okay we remove the plen... In address_space_do_translate(), the logic is basically: 1. do internal translation (basically to find the section info from current address space) 2. do IOMMU translation if the MR is IOMMU typed 3. goto 1. Along the way (1 -> 2 -> 3 -> 1 -> ...) until we finished the translation (I don't really know whether we'll have cases for nested IOMMU translation, but anyway we have a while loop there, so assume the loop can be executed many times), plen can be shrinking all the time, either by this in address_space_translate_internal(): *plen = int128_get64(int128_min(diff, int128_make64(*plen))); Or this in address_space_do_translate(): *plen = MIN(*plen, (addr | iotlb.addr_mask) - addr + 1); And I don't know only using the final section.size to decide plen would be enough. Also, for page_mask information - I don't quite sure MemoryRegionSection can express that info. Again, huge page can be one example: MemoryRegionSection doesn't really contain huge page information, while MemoryRegionIOMMUOps.translate() does contain that information (via addr_mask field). (I see that you would like IOTLB to be using arbitary length rather than page masks. Maybe we can first decide which would be the best interface for IOTLB. I'll reply in that context later.) Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-02 11:50 [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() Peter Xu 2017-06-02 11:50 ` [Qemu-devel] [PATCH 1/3] exec: add page_mask for address_space_do_translate Peter Xu @ 2017-06-02 11:50 ` Peter Xu 2017-06-02 16:49 ` Michael S. Tsirkin 2017-06-02 11:50 ` [Qemu-devel] [PATCH 3/3] vhost: iommu: cache static mapping if there is Peter Xu 2017-06-02 14:51 ` [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() Michael S. Tsirkin 3 siblings, 1 reply; 34+ messages in thread From: Peter Xu @ 2017-06-02 11:50 UTC (permalink / raw) To: qemu-devel Cc: Michael S . Tsirkin, Paolo Bonzini, Maxime Coquelin, peterx, Jason Wang This patch let address_space_get_iotlb_entry() to use the newly introduced page_mask parameter in address_space_do_translate(). Then we will be sure the IOTLB can be aligned to page mask, also we should nicely support huge pages now when introducing a764040. Fixes: a764040 ("exec: abstract address_space_do_translate()") Signed-off-by: Peter Xu <peterx@redhat.com> --- exec.c | 29 ++++++++++------------------- 1 file changed, 10 insertions(+), 19 deletions(-) diff --git a/exec.c b/exec.c index 63a3ff0..1f86253 100644 --- a/exec.c +++ b/exec.c @@ -544,14 +544,14 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, bool is_write) { MemoryRegionSection section; - hwaddr xlat, plen; + hwaddr xlat, page_mask; - /* Try to get maximum page mask during translation. */ - plen = (hwaddr)-1; - - /* This can never be MMIO. */ - section = address_space_do_translate(as, addr, &xlat, &plen, - NULL, is_write, false); + /* + * This can never be MMIO, and we don't really care about plen, + * but page mask. + */ + section = address_space_do_translate(as, addr, &xlat, NULL, + &page_mask, is_write, false); /* Illegal translation */ if (section.mr == &io_mem_unassigned) { @@ -562,20 +562,11 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, xlat += section.offset_within_address_space - section.offset_within_region; - if (plen == (hwaddr)-1) { - /* If not specified during translation, use default mask */ - plen = TARGET_PAGE_MASK; - } else { - /* Make it a valid page mask */ - assert(plen); - plen = pow2floor(plen) - 1; - } - return (IOMMUTLBEntry) { .target_as = section.address_space, - .iova = addr & ~plen, - .translated_addr = xlat & ~plen, - .addr_mask = plen, + .iova = addr & ~page_mask, + .translated_addr = xlat & ~page_mask, + .addr_mask = page_mask, /* IOTLBs are for DMAs, and DMA only allows on RAMs. */ .perm = IOMMU_RW, }; -- 2.7.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-02 11:50 ` [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry Peter Xu @ 2017-06-02 16:49 ` Michael S. Tsirkin 2017-06-05 3:07 ` Peter Xu 0 siblings, 1 reply; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-02 16:49 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, Paolo Bonzini, Maxime Coquelin, Jason Wang On Fri, Jun 02, 2017 at 07:50:53PM +0800, Peter Xu wrote: > This patch let address_space_get_iotlb_entry() to use the newly > introduced page_mask parameter in address_space_do_translate(). Then we > will be sure the IOTLB can be aligned to page mask, also we should > nicely support huge pages now when introducing a764040. > > Fixes: a764040 ("exec: abstract address_space_do_translate()") > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > exec.c | 29 ++++++++++------------------- > 1 file changed, 10 insertions(+), 19 deletions(-) > > diff --git a/exec.c b/exec.c > index 63a3ff0..1f86253 100644 > --- a/exec.c > +++ b/exec.c > @@ -544,14 +544,14 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, > bool is_write) > { > MemoryRegionSection section; > - hwaddr xlat, plen; > + hwaddr xlat, page_mask; > > - /* Try to get maximum page mask during translation. */ > - plen = (hwaddr)-1; > - > - /* This can never be MMIO. */ > - section = address_space_do_translate(as, addr, &xlat, &plen, > - NULL, is_write, false); > + /* > + * This can never be MMIO, and we don't really care about plen, > + * but page mask. > + */ > + section = address_space_do_translate(as, addr, &xlat, NULL, > + &page_mask, is_write, false); > > /* Illegal translation */ > if (section.mr == &io_mem_unassigned) { Can we just use section.size - xlat here? > @@ -562,20 +562,11 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, > xlat += section.offset_within_address_space - > section.offset_within_region; > > - if (plen == (hwaddr)-1) { > - /* If not specified during translation, use default mask */ > - plen = TARGET_PAGE_MASK; > - } else { > - /* Make it a valid page mask */ > - assert(plen); > - plen = pow2floor(plen) - 1; > - } > - > return (IOMMUTLBEntry) { > .target_as = section.address_space, > - .iova = addr & ~plen, > - .translated_addr = xlat & ~plen, > - .addr_mask = plen, > + .iova = addr & ~page_mask, > + .translated_addr = xlat & ~page_mask, > + .addr_mask = page_mask, > /* IOTLBs are for DMAs, and DMA only allows on RAMs. */ BTW this comment is pretty confusing. What does it mean? > .perm = IOMMU_RW, > }; Looks like we should change IOMMUTLBEntry to pass size and not mask - then we could simply pass info from section as is. iova would be addr - xlat. > -- > 2.7.4 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-02 16:49 ` Michael S. Tsirkin @ 2017-06-05 3:07 ` Peter Xu 2017-06-06 14:34 ` Paolo Bonzini 0 siblings, 1 reply; 34+ messages in thread From: Peter Xu @ 2017-06-05 3:07 UTC (permalink / raw) To: Michael S. Tsirkin Cc: qemu-devel, Paolo Bonzini, Maxime Coquelin, Jason Wang, David Gibson On Fri, Jun 02, 2017 at 07:49:58PM +0300, Michael S. Tsirkin wrote: > On Fri, Jun 02, 2017 at 07:50:53PM +0800, Peter Xu wrote: > > This patch let address_space_get_iotlb_entry() to use the newly > > introduced page_mask parameter in address_space_do_translate(). Then we > > will be sure the IOTLB can be aligned to page mask, also we should > > nicely support huge pages now when introducing a764040. > > > > Fixes: a764040 ("exec: abstract address_space_do_translate()") > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > exec.c | 29 ++++++++++------------------- > > 1 file changed, 10 insertions(+), 19 deletions(-) > > > > diff --git a/exec.c b/exec.c > > index 63a3ff0..1f86253 100644 > > --- a/exec.c > > +++ b/exec.c > > @@ -544,14 +544,14 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, > > bool is_write) > > { > > MemoryRegionSection section; > > - hwaddr xlat, plen; > > + hwaddr xlat, page_mask; > > > > - /* Try to get maximum page mask during translation. */ > > - plen = (hwaddr)-1; > > - > > - /* This can never be MMIO. */ > > - section = address_space_do_translate(as, addr, &xlat, &plen, > > - NULL, is_write, false); > > + /* > > + * This can never be MMIO, and we don't really care about plen, > > + * but page mask. > > + */ > > + section = address_space_do_translate(as, addr, &xlat, NULL, > > + &page_mask, is_write, false); > > > > /* Illegal translation */ > > if (section.mr == &io_mem_unassigned) { > > > Can we just use section.size - xlat here? I replied in the other thread about what I thought... So will skip here. > > > @@ -562,20 +562,11 @@ IOMMUTLBEntry address_space_get_iotlb_entry(AddressSpace *as, hwaddr addr, > > xlat += section.offset_within_address_space - > > section.offset_within_region; > > > > - if (plen == (hwaddr)-1) { > > - /* If not specified during translation, use default mask */ > > - plen = TARGET_PAGE_MASK; > > - } else { > > - /* Make it a valid page mask */ > > - assert(plen); > > - plen = pow2floor(plen) - 1; > > - } > > - > > return (IOMMUTLBEntry) { > > .target_as = section.address_space, > > - .iova = addr & ~plen, > > - .translated_addr = xlat & ~plen, > > - .addr_mask = plen, > > + .iova = addr & ~page_mask, > > + .translated_addr = xlat & ~page_mask, > > + .addr_mask = page_mask, > > /* IOTLBs are for DMAs, and DMA only allows on RAMs. */ > > BTW this comment is pretty confusing. What does it mean? This function, address_space_get_iotlb_entry(), is for device to get IOTLB entry when they want to request for page translations for DMA, and DMA should only be allowed to RAM, right? Then we need IOMMU_RW permission here. Maybe I should add one more check above on the returned MR - it should be RAM typed as well. But I don't really know whether that's too strict, since if guest setup the IOMMU page table to allow one IOVA points to a non-RAM region, I thought it should still be legal from hypervisor POV (I see it a guest OS bug though)? > > > .perm = IOMMU_RW, > > }; > > Looks like we should change IOMMUTLBEntry to pass size and not mask - > then we could simply pass info from section as is. iova would be > addr - xlat. I don't sure whether it'll be a good interface for IOTLB. AFAIU at least for VT-d, the IOMMU translation is page aligned which is defined by spec, so it makes sense that (again at least for VT-d) here we'd better just use page_mask/addr_mask. That's also how I know about IOMMU in general - I assume it do the translations always with page masks (never arbitary length), though page size can differ from platfrom to platform, that's why here the IOTLB interface used addr_mask, then it works for all platforms. I don't know whether I'm 100% correct here though. Maybe David/Paolo/... would comment as well? (CC David) Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-05 3:07 ` Peter Xu @ 2017-06-06 14:34 ` Paolo Bonzini 2017-06-06 23:47 ` David Gibson 0 siblings, 1 reply; 34+ messages in thread From: Paolo Bonzini @ 2017-06-06 14:34 UTC (permalink / raw) To: Peter Xu, Michael S. Tsirkin Cc: qemu-devel, Maxime Coquelin, Jason Wang, David Gibson On 05/06/2017 05:07, Peter Xu wrote: > I don't sure whether it'll be a good interface for IOTLB. AFAIU at > least for VT-d, the IOMMU translation is page aligned which is defined > by spec, so it makes sense that (again at least for VT-d) here we'd > better just use page_mask/addr_mask. > > That's also how I know about IOMMU in general - I assume it do the > translations always with page masks (never arbitary length), though > page size can differ from platfrom to platform, that's why here the > IOTLB interface used addr_mask, then it works for all platforms. I > don't know whether I'm 100% correct here though. > > Maybe David/Paolo/... would comment as well? I would ask David. There are PowerPC MMUs that allow fast lookup of arbitrarily-sized windows (not necessarily power of two), so maybe the IOMMUs can do the same. Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-06 14:34 ` Paolo Bonzini @ 2017-06-06 23:47 ` David Gibson 2017-06-07 3:44 ` Peter Xu 2017-06-07 13:01 ` Paolo Bonzini 0 siblings, 2 replies; 34+ messages in thread From: David Gibson @ 2017-06-06 23:47 UTC (permalink / raw) To: Paolo Bonzini Cc: Peter Xu, Michael S. Tsirkin, qemu-devel, Maxime Coquelin, Jason Wang [-- Attachment #1: Type: text/plain, Size: 1461 bytes --] On Tue, Jun 06, 2017 at 04:34:30PM +0200, Paolo Bonzini wrote: > > > On 05/06/2017 05:07, Peter Xu wrote: > > I don't sure whether it'll be a good interface for IOTLB. AFAIU at > > least for VT-d, the IOMMU translation is page aligned which is defined > > by spec, so it makes sense that (again at least for VT-d) here we'd > > better just use page_mask/addr_mask. > > > > That's also how I know about IOMMU in general - I assume it do the > > translations always with page masks (never arbitary length), though > > page size can differ from platfrom to platform, that's why here the > > IOTLB interface used addr_mask, then it works for all platforms. I > > don't know whether I'm 100% correct here though. > > > > Maybe David/Paolo/... would comment as well? > > I would ask David. There are PowerPC MMUs that allow fast lookup of > arbitrarily-sized windows (not necessarily power of two), Uh.. I'm not sure what you mean here. You might be thinking of the BATs which really old (32-bit) PowerPC MMUs had - those allow arbitrary large block translations, but they do have to be a power of two. > so maybe the > IOMMUs can do the same. The only Power IOMMU I know about uses a fixed, power-of-two page size per DMA window. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-06 23:47 ` David Gibson @ 2017-06-07 3:44 ` Peter Xu 2017-06-07 13:07 ` Michael S. Tsirkin 2017-06-07 13:01 ` Paolo Bonzini 1 sibling, 1 reply; 34+ messages in thread From: Peter Xu @ 2017-06-07 3:44 UTC (permalink / raw) To: David Gibson Cc: Paolo Bonzini, Michael S. Tsirkin, qemu-devel, Maxime Coquelin, Jason Wang On Wed, Jun 07, 2017 at 09:47:05AM +1000, David Gibson wrote: > On Tue, Jun 06, 2017 at 04:34:30PM +0200, Paolo Bonzini wrote: > > > > > > On 05/06/2017 05:07, Peter Xu wrote: > > > I don't sure whether it'll be a good interface for IOTLB. AFAIU at > > > least for VT-d, the IOMMU translation is page aligned which is defined > > > by spec, so it makes sense that (again at least for VT-d) here we'd > > > better just use page_mask/addr_mask. > > > > > > That's also how I know about IOMMU in general - I assume it do the > > > translations always with page masks (never arbitary length), though > > > page size can differ from platfrom to platform, that's why here the > > > IOTLB interface used addr_mask, then it works for all platforms. I > > > don't know whether I'm 100% correct here though. > > > > > > Maybe David/Paolo/... would comment as well? > > > > I would ask David. There are PowerPC MMUs that allow fast lookup of > > arbitrarily-sized windows (not necessarily power of two), > > Uh.. I'm not sure what you mean here. You might be thinking of the > BATs which really old (32-bit) PowerPC MMUs had - those allow > arbitrary large block translations, but they do have to be a power of > two. > > > so maybe the > > IOMMUs can do the same. > > The only Power IOMMU I know about uses a fixed, power-of-two page size > per DMA window. If so, I would still be inclined to keep using masks for QEMU IOTLB. Then, my first two patches should still stand. I am just afraid that not using masks will diverge the emulation from real hardware and brings trouble one day. For vhost IOTLB interface, it does not need to be strictly aligned to QEMU IOMMU IOTLB definition, and that's how it's working now (current vhost iotlb allows arbitary length, and I think it's good). So imho we don't really need to worry about the performance - after all, we can do everything customized for vhost, just like what patch 3 did (yeah, it can be better...). Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-07 3:44 ` Peter Xu @ 2017-06-07 13:07 ` Michael S. Tsirkin 2017-06-08 6:11 ` Peter Xu 0 siblings, 1 reply; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-07 13:07 UTC (permalink / raw) To: Peter Xu Cc: David Gibson, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang On Wed, Jun 07, 2017 at 11:44:43AM +0800, Peter Xu wrote: > On Wed, Jun 07, 2017 at 09:47:05AM +1000, David Gibson wrote: > > On Tue, Jun 06, 2017 at 04:34:30PM +0200, Paolo Bonzini wrote: > > > > > > > > > On 05/06/2017 05:07, Peter Xu wrote: > > > > I don't sure whether it'll be a good interface for IOTLB. AFAIU at > > > > least for VT-d, the IOMMU translation is page aligned which is defined > > > > by spec, so it makes sense that (again at least for VT-d) here we'd > > > > better just use page_mask/addr_mask. > > > > > > > > That's also how I know about IOMMU in general - I assume it do the > > > > translations always with page masks (never arbitary length), though > > > > page size can differ from platfrom to platform, that's why here the > > > > IOTLB interface used addr_mask, then it works for all platforms. I > > > > don't know whether I'm 100% correct here though. > > > > > > > > Maybe David/Paolo/... would comment as well? > > > > > > I would ask David. There are PowerPC MMUs that allow fast lookup of > > > arbitrarily-sized windows (not necessarily power of two), > > > > Uh.. I'm not sure what you mean here. You might be thinking of the > > BATs which really old (32-bit) PowerPC MMUs had - those allow > > arbitrary large block translations, but they do have to be a power of > > two. > > > > > so maybe the > > > IOMMUs can do the same. > > > > The only Power IOMMU I know about uses a fixed, power-of-two page size > > per DMA window. > > If so, I would still be inclined to keep using masks for QEMU IOTLB. > Then, my first two patches should still stand. > > I am just afraid that not using masks will diverge the emulation from > real hardware and brings trouble one day. > > For vhost IOTLB interface, it does not need to be strictly aligned to > QEMU IOMMU IOTLB definition, and that's how it's working now (current > vhost iotlb allows arbitary length, and I think it's good). So imho we > don't really need to worry about the performance - after all, we can > do everything customized for vhost, just like what patch 3 did (yeah, > it can be better...). > > Thanks, Pre-faults is also something that does not happen on real hardware. And it's about security so a bigger issue. If I had to choose between that and using non-power-of-2 in the API, I'd go for non-power-of-2. Let backends that can only support power of 2 split it up to multiple transactions. > -- > Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-07 13:07 ` Michael S. Tsirkin @ 2017-06-08 6:11 ` Peter Xu 2017-06-08 18:59 ` Michael S. Tsirkin 0 siblings, 1 reply; 34+ messages in thread From: Peter Xu @ 2017-06-08 6:11 UTC (permalink / raw) To: Michael S. Tsirkin Cc: David Gibson, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang On Wed, Jun 07, 2017 at 04:07:20PM +0300, Michael S. Tsirkin wrote: > On Wed, Jun 07, 2017 at 11:44:43AM +0800, Peter Xu wrote: > > On Wed, Jun 07, 2017 at 09:47:05AM +1000, David Gibson wrote: > > > On Tue, Jun 06, 2017 at 04:34:30PM +0200, Paolo Bonzini wrote: > > > > > > > > > > > > On 05/06/2017 05:07, Peter Xu wrote: > > > > > I don't sure whether it'll be a good interface for IOTLB. AFAIU at > > > > > least for VT-d, the IOMMU translation is page aligned which is defined > > > > > by spec, so it makes sense that (again at least for VT-d) here we'd > > > > > better just use page_mask/addr_mask. > > > > > > > > > > That's also how I know about IOMMU in general - I assume it do the > > > > > translations always with page masks (never arbitary length), though > > > > > page size can differ from platfrom to platform, that's why here the > > > > > IOTLB interface used addr_mask, then it works for all platforms. I > > > > > don't know whether I'm 100% correct here though. > > > > > > > > > > Maybe David/Paolo/... would comment as well? > > > > > > > > I would ask David. There are PowerPC MMUs that allow fast lookup of > > > > arbitrarily-sized windows (not necessarily power of two), > > > > > > Uh.. I'm not sure what you mean here. You might be thinking of the > > > BATs which really old (32-bit) PowerPC MMUs had - those allow > > > arbitrary large block translations, but they do have to be a power of > > > two. > > > > > > > so maybe the > > > > IOMMUs can do the same. > > > > > > The only Power IOMMU I know about uses a fixed, power-of-two page size > > > per DMA window. > > > > If so, I would still be inclined to keep using masks for QEMU IOTLB. > > Then, my first two patches should still stand. > > > > I am just afraid that not using masks will diverge the emulation from > > real hardware and brings trouble one day. > > > > For vhost IOTLB interface, it does not need to be strictly aligned to > > QEMU IOMMU IOTLB definition, and that's how it's working now (current > > vhost iotlb allows arbitary length, and I think it's good). So imho we > > don't really need to worry about the performance - after all, we can > > do everything customized for vhost, just like what patch 3 did (yeah, > > it can be better...). > > > > Thanks, > > Pre-faults is also something that does not happen on real hardware. > And it's about security so a bigger issue. > > If I had to choose between that and using non-power-of-2 in > the API, I'd go for non-power-of-2. Let backends that can only > support power of 2 split it up to multiple transactions. The problem is that when I was fixing the problem that vhost had with PT (a764040, "exec: abstract address_space_do_translate()"), I did broke the IOTLB translation a bit (it was using page masks). IMHO we need to fix it first for correctness (patch 1/2). For patch 3, if we can have Jason's patch to allow dynamic iommu_platform switching, that'll be the best, then I can rewrite patch 3 with the switching logic rather than caching anything. But IMHO that can be separated from patch 1/2 if you like. Or do you have better suggestion on how should we fix it? Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-08 6:11 ` Peter Xu @ 2017-06-08 18:59 ` Michael S. Tsirkin 2017-06-09 1:58 ` Peter Xu 0 siblings, 1 reply; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-08 18:59 UTC (permalink / raw) To: Peter Xu Cc: David Gibson, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang On Thu, Jun 08, 2017 at 02:11:50PM +0800, Peter Xu wrote: > On Wed, Jun 07, 2017 at 04:07:20PM +0300, Michael S. Tsirkin wrote: > > On Wed, Jun 07, 2017 at 11:44:43AM +0800, Peter Xu wrote: > > > On Wed, Jun 07, 2017 at 09:47:05AM +1000, David Gibson wrote: > > > > On Tue, Jun 06, 2017 at 04:34:30PM +0200, Paolo Bonzini wrote: > > > > > > > > > > > > > > > On 05/06/2017 05:07, Peter Xu wrote: > > > > > > I don't sure whether it'll be a good interface for IOTLB. AFAIU at > > > > > > least for VT-d, the IOMMU translation is page aligned which is defined > > > > > > by spec, so it makes sense that (again at least for VT-d) here we'd > > > > > > better just use page_mask/addr_mask. > > > > > > > > > > > > That's also how I know about IOMMU in general - I assume it do the > > > > > > translations always with page masks (never arbitary length), though > > > > > > page size can differ from platfrom to platform, that's why here the > > > > > > IOTLB interface used addr_mask, then it works for all platforms. I > > > > > > don't know whether I'm 100% correct here though. > > > > > > > > > > > > Maybe David/Paolo/... would comment as well? > > > > > > > > > > I would ask David. There are PowerPC MMUs that allow fast lookup of > > > > > arbitrarily-sized windows (not necessarily power of two), > > > > > > > > Uh.. I'm not sure what you mean here. You might be thinking of the > > > > BATs which really old (32-bit) PowerPC MMUs had - those allow > > > > arbitrary large block translations, but they do have to be a power of > > > > two. > > > > > > > > > so maybe the > > > > > IOMMUs can do the same. > > > > > > > > The only Power IOMMU I know about uses a fixed, power-of-two page size > > > > per DMA window. > > > > > > If so, I would still be inclined to keep using masks for QEMU IOTLB. > > > Then, my first two patches should still stand. > > > > > > I am just afraid that not using masks will diverge the emulation from > > > real hardware and brings trouble one day. > > > > > > For vhost IOTLB interface, it does not need to be strictly aligned to > > > QEMU IOMMU IOTLB definition, and that's how it's working now (current > > > vhost iotlb allows arbitary length, and I think it's good). So imho we > > > don't really need to worry about the performance - after all, we can > > > do everything customized for vhost, just like what patch 3 did (yeah, > > > it can be better...). > > > > > > Thanks, > > > > Pre-faults is also something that does not happen on real hardware. > > And it's about security so a bigger issue. > > > > If I had to choose between that and using non-power-of-2 in > > the API, I'd go for non-power-of-2. Let backends that can only > > support power of 2 split it up to multiple transactions. > > The problem is that when I was fixing the problem that vhost had with > PT (a764040, "exec: abstract address_space_do_translate()"), I did > broke the IOTLB translation a bit (it was using page masks). IMHO we > need to fix it first for correctness (patch 1/2). > > For patch 3, if we can have Jason's patch to allow dynamic > iommu_platform switching, that'll be the best, then I can rewrite > patch 3 with the switching logic rather than caching anything. But > IMHO that can be separated from patch 1/2 if you like. > > Or do you have better suggestion on how should we fix it? > > Thanks, Can we drop masks completely and replace with length? I think we should do that instead of trying to fix masks. > -- > Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-08 18:59 ` Michael S. Tsirkin @ 2017-06-09 1:58 ` Peter Xu 2017-06-09 2:37 ` David Gibson 2017-06-11 10:09 ` Michael S. Tsirkin 0 siblings, 2 replies; 34+ messages in thread From: Peter Xu @ 2017-06-09 1:58 UTC (permalink / raw) To: Michael S. Tsirkin Cc: David Gibson, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang, Alex Williamson On Thu, Jun 08, 2017 at 09:59:50PM +0300, Michael S. Tsirkin wrote: > On Thu, Jun 08, 2017 at 02:11:50PM +0800, Peter Xu wrote: > > On Wed, Jun 07, 2017 at 04:07:20PM +0300, Michael S. Tsirkin wrote: > > > On Wed, Jun 07, 2017 at 11:44:43AM +0800, Peter Xu wrote: > > > > On Wed, Jun 07, 2017 at 09:47:05AM +1000, David Gibson wrote: > > > > > On Tue, Jun 06, 2017 at 04:34:30PM +0200, Paolo Bonzini wrote: > > > > > > > > > > > > > > > > > > On 05/06/2017 05:07, Peter Xu wrote: > > > > > > > I don't sure whether it'll be a good interface for IOTLB. AFAIU at > > > > > > > least for VT-d, the IOMMU translation is page aligned which is defined > > > > > > > by spec, so it makes sense that (again at least for VT-d) here we'd > > > > > > > better just use page_mask/addr_mask. > > > > > > > > > > > > > > That's also how I know about IOMMU in general - I assume it do the > > > > > > > translations always with page masks (never arbitary length), though > > > > > > > page size can differ from platfrom to platform, that's why here the > > > > > > > IOTLB interface used addr_mask, then it works for all platforms. I > > > > > > > don't know whether I'm 100% correct here though. > > > > > > > > > > > > > > Maybe David/Paolo/... would comment as well? > > > > > > > > > > > > I would ask David. There are PowerPC MMUs that allow fast lookup of > > > > > > arbitrarily-sized windows (not necessarily power of two), > > > > > > > > > > Uh.. I'm not sure what you mean here. You might be thinking of the > > > > > BATs which really old (32-bit) PowerPC MMUs had - those allow > > > > > arbitrary large block translations, but they do have to be a power of > > > > > two. > > > > > > > > > > > so maybe the > > > > > > IOMMUs can do the same. > > > > > > > > > > The only Power IOMMU I know about uses a fixed, power-of-two page size > > > > > per DMA window. > > > > > > > > If so, I would still be inclined to keep using masks for QEMU IOTLB. > > > > Then, my first two patches should still stand. > > > > > > > > I am just afraid that not using masks will diverge the emulation from > > > > real hardware and brings trouble one day. > > > > > > > > For vhost IOTLB interface, it does not need to be strictly aligned to > > > > QEMU IOMMU IOTLB definition, and that's how it's working now (current > > > > vhost iotlb allows arbitary length, and I think it's good). So imho we > > > > don't really need to worry about the performance - after all, we can > > > > do everything customized for vhost, just like what patch 3 did (yeah, > > > > it can be better...). > > > > > > > > Thanks, > > > > > > Pre-faults is also something that does not happen on real hardware. > > > And it's about security so a bigger issue. > > > > > > If I had to choose between that and using non-power-of-2 in > > > the API, I'd go for non-power-of-2. Let backends that can only > > > support power of 2 split it up to multiple transactions. > > > > The problem is that when I was fixing the problem that vhost had with > > PT (a764040, "exec: abstract address_space_do_translate()"), I did > > broke the IOTLB translation a bit (it was using page masks). IMHO we > > need to fix it first for correctness (patch 1/2). > > > > For patch 3, if we can have Jason's patch to allow dynamic > > iommu_platform switching, that'll be the best, then I can rewrite > > patch 3 with the switching logic rather than caching anything. But > > IMHO that can be separated from patch 1/2 if you like. > > > > Or do you have better suggestion on how should we fix it? > > > > Thanks, > > Can we drop masks completely and replace with length? I think we > should do that instead of trying to fix masks. Do you mean to modify IOMMUTLBEntry.addr_mask into length? Again, I am not sure this is good... At least we need to get ack from David since spapr should be the initial user of it, and possibly also Alex since vfio should be assuming that (IIUC both in QEMU and kernel) addr_mask is page masks rather than arbirary length. (CC Alex) Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-09 1:58 ` Peter Xu @ 2017-06-09 2:37 ` David Gibson 2017-06-11 10:09 ` Michael S. Tsirkin 1 sibling, 0 replies; 34+ messages in thread From: David Gibson @ 2017-06-09 2:37 UTC (permalink / raw) To: Peter Xu Cc: Michael S. Tsirkin, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang, Alex Williamson [-- Attachment #1: Type: text/plain, Size: 4656 bytes --] On Fri, Jun 09, 2017 at 09:58:47AM +0800, Peter Xu wrote: > On Thu, Jun 08, 2017 at 09:59:50PM +0300, Michael S. Tsirkin wrote: > > On Thu, Jun 08, 2017 at 02:11:50PM +0800, Peter Xu wrote: > > > On Wed, Jun 07, 2017 at 04:07:20PM +0300, Michael S. Tsirkin wrote: > > > > On Wed, Jun 07, 2017 at 11:44:43AM +0800, Peter Xu wrote: > > > > > On Wed, Jun 07, 2017 at 09:47:05AM +1000, David Gibson wrote: > > > > > > On Tue, Jun 06, 2017 at 04:34:30PM +0200, Paolo Bonzini wrote: > > > > > > > > > > > > > > > > > > > > > On 05/06/2017 05:07, Peter Xu wrote: > > > > > > > > I don't sure whether it'll be a good interface for IOTLB. AFAIU at > > > > > > > > least for VT-d, the IOMMU translation is page aligned which is defined > > > > > > > > by spec, so it makes sense that (again at least for VT-d) here we'd > > > > > > > > better just use page_mask/addr_mask. > > > > > > > > > > > > > > > > That's also how I know about IOMMU in general - I assume it do the > > > > > > > > translations always with page masks (never arbitary length), though > > > > > > > > page size can differ from platfrom to platform, that's why here the > > > > > > > > IOTLB interface used addr_mask, then it works for all platforms. I > > > > > > > > don't know whether I'm 100% correct here though. > > > > > > > > > > > > > > > > Maybe David/Paolo/... would comment as well? > > > > > > > > > > > > > > I would ask David. There are PowerPC MMUs that allow fast lookup of > > > > > > > arbitrarily-sized windows (not necessarily power of two), > > > > > > > > > > > > Uh.. I'm not sure what you mean here. You might be thinking of the > > > > > > BATs which really old (32-bit) PowerPC MMUs had - those allow > > > > > > arbitrary large block translations, but they do have to be a power of > > > > > > two. > > > > > > > > > > > > > so maybe the > > > > > > > IOMMUs can do the same. > > > > > > > > > > > > The only Power IOMMU I know about uses a fixed, power-of-two page size > > > > > > per DMA window. > > > > > > > > > > If so, I would still be inclined to keep using masks for QEMU IOTLB. > > > > > Then, my first two patches should still stand. > > > > > > > > > > I am just afraid that not using masks will diverge the emulation from > > > > > real hardware and brings trouble one day. > > > > > > > > > > For vhost IOTLB interface, it does not need to be strictly aligned to > > > > > QEMU IOMMU IOTLB definition, and that's how it's working now (current > > > > > vhost iotlb allows arbitary length, and I think it's good). So imho we > > > > > don't really need to worry about the performance - after all, we can > > > > > do everything customized for vhost, just like what patch 3 did (yeah, > > > > > it can be better...). > > > > > > > > > > Thanks, > > > > > > > > Pre-faults is also something that does not happen on real hardware. > > > > And it's about security so a bigger issue. > > > > > > > > If I had to choose between that and using non-power-of-2 in > > > > the API, I'd go for non-power-of-2. Let backends that can only > > > > support power of 2 split it up to multiple transactions. > > > > > > The problem is that when I was fixing the problem that vhost had with > > > PT (a764040, "exec: abstract address_space_do_translate()"), I did > > > broke the IOTLB translation a bit (it was using page masks). IMHO we > > > need to fix it first for correctness (patch 1/2). > > > > > > For patch 3, if we can have Jason's patch to allow dynamic > > > iommu_platform switching, that'll be the best, then I can rewrite > > > patch 3 with the switching logic rather than caching anything. But > > > IMHO that can be separated from patch 1/2 if you like. > > > > > > Or do you have better suggestion on how should we fix it? > > > > > > Thanks, > > > > Can we drop masks completely and replace with length? I think we > > should do that instead of trying to fix masks. > > Do you mean to modify IOMMUTLBEntry.addr_mask into length? > > Again, I am not sure this is good... At least we need to get ack from > David since spapr should be the initial user of it, and possibly also > Alex since vfio should be assuming that (IIUC both in QEMU and kernel) > addr_mask is page masks rather than arbirary length. So, I don't see that using size instead of mask would be a particular problem for spapr. However, I also don't see any advantage to switching. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-09 1:58 ` Peter Xu 2017-06-09 2:37 ` David Gibson @ 2017-06-11 10:09 ` Michael S. Tsirkin 2017-06-11 12:10 ` David Gibson 1 sibling, 1 reply; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-11 10:09 UTC (permalink / raw) To: Peter Xu Cc: David Gibson, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang, Alex Williamson On Fri, Jun 09, 2017 at 09:58:47AM +0800, Peter Xu wrote: > > > The problem is that when I was fixing the problem that vhost had with > > > PT (a764040, "exec: abstract address_space_do_translate()"), I did > > > broke the IOTLB translation a bit (it was using page masks). IMHO we > > > need to fix it first for correctness (patch 1/2). > > > > > > For patch 3, if we can have Jason's patch to allow dynamic > > > iommu_platform switching, that'll be the best, then I can rewrite > > > patch 3 with the switching logic rather than caching anything. But > > > IMHO that can be separated from patch 1/2 if you like. > > > > > > Or do you have better suggestion on how should we fix it? > > > > > > Thanks, > > > > Can we drop masks completely and replace with length? I think we > > should do that instead of trying to fix masks. > > Do you mean to modify IOMMUTLBEntry.addr_mask into length? I think it's better than alternatives. > Again, I am not sure this is good... At least we need to get ack from > David since spapr should be the initial user of it, and possibly also > Alex since vfio should be assuming that (IIUC both in QEMU and kernel) > addr_mask is page masks rather than arbirary length. > > (CC Alex) > > Thanks, Callbacks that need powers of two can easily split up the range. > -- > Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-11 10:09 ` Michael S. Tsirkin @ 2017-06-11 12:10 ` David Gibson 2017-06-12 2:34 ` Peter Xu 0 siblings, 1 reply; 34+ messages in thread From: David Gibson @ 2017-06-11 12:10 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Peter Xu, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang, Alex Williamson [-- Attachment #1: Type: text/plain, Size: 1809 bytes --] On Sun, Jun 11, 2017 at 01:09:26PM +0300, Michael S. Tsirkin wrote: > On Fri, Jun 09, 2017 at 09:58:47AM +0800, Peter Xu wrote: > > > > The problem is that when I was fixing the problem that vhost had with > > > > PT (a764040, "exec: abstract address_space_do_translate()"), I did > > > > broke the IOTLB translation a bit (it was using page masks). IMHO we > > > > need to fix it first for correctness (patch 1/2). > > > > > > > > For patch 3, if we can have Jason's patch to allow dynamic > > > > iommu_platform switching, that'll be the best, then I can rewrite > > > > patch 3 with the switching logic rather than caching anything. But > > > > IMHO that can be separated from patch 1/2 if you like. > > > > > > > > Or do you have better suggestion on how should we fix it? > > > > > > > > Thanks, > > > > > > Can we drop masks completely and replace with length? I think we > > > should do that instead of trying to fix masks. > > > > Do you mean to modify IOMMUTLBEntry.addr_mask into length? > > I think it's better than alternatives. > > > Again, I am not sure this is good... At least we need to get ack from > > David since spapr should be the initial user of it, and possibly also > > Alex since vfio should be assuming that (IIUC both in QEMU and kernel) > > addr_mask is page masks rather than arbirary length. > > > > (CC Alex) > > > > Thanks, > > Callbacks that need powers of two can easily split up the range. I think I missed part of the thread. What's the original use case for non-power-of-two IOTLB entries? It certainly won't happen on Power. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-11 12:10 ` David Gibson @ 2017-06-12 2:34 ` Peter Xu 2017-06-12 3:07 ` Michael S. Tsirkin 0 siblings, 1 reply; 34+ messages in thread From: Peter Xu @ 2017-06-12 2:34 UTC (permalink / raw) To: David Gibson, Michael S. Tsirkin Cc: Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang, Alex Williamson On Sun, Jun 11, 2017 at 08:10:15PM +0800, David Gibson wrote: > On Sun, Jun 11, 2017 at 01:09:26PM +0300, Michael S. Tsirkin wrote: > > On Fri, Jun 09, 2017 at 09:58:47AM +0800, Peter Xu wrote: > > > > > The problem is that when I was fixing the problem that vhost had with > > > > > PT (a764040, "exec: abstract address_space_do_translate()"), I did > > > > > broke the IOTLB translation a bit (it was using page masks). IMHO we > > > > > need to fix it first for correctness (patch 1/2). > > > > > > > > > > For patch 3, if we can have Jason's patch to allow dynamic > > > > > iommu_platform switching, that'll be the best, then I can rewrite > > > > > patch 3 with the switching logic rather than caching anything. But > > > > > IMHO that can be separated from patch 1/2 if you like. > > > > > > > > > > Or do you have better suggestion on how should we fix it? > > > > > > > > > > Thanks, > > > > > > > > Can we drop masks completely and replace with length? I think we > > > > should do that instead of trying to fix masks. > > > > > > Do you mean to modify IOMMUTLBEntry.addr_mask into length? > > > > I think it's better than alternatives. > > > > > Again, I am not sure this is good... At least we need to get ack from > > > David since spapr should be the initial user of it, and possibly also > > > Alex since vfio should be assuming that (IIUC both in QEMU and kernel) > > > addr_mask is page masks rather than arbirary length. > > > > > > (CC Alex) > > > > > > Thanks, > > > > Callbacks that need powers of two can easily split up the range. > > I think I missed part of the thread. What's the original use case for > non-power-of-two IOTLB entries? It certainly won't happen on Power. Currently address_space_get_iotlb_entry() didn't really follow the rule, addr_mask can be arbitary length. This series tried to fix it, while Michael was questioning about whether we should really fix that at all. Michael, Even if for performance's sake, I should still think we should fix it. Let's consider a most simple worst case: we have a single page mapped with IOVA range (2M page): [0x0, 0x200000) And if guest access IOVA using the following patern: 0x1fffff, 0x1ffffe, 0x1ffffd, ... Then now we'll get this: - request 0x1fffff, cache miss, will get iotlb [0x1fffff, 0x200000) - request 0x1ffffe, cache miss, will get iotlb [0x1ffffe, 0x200000) - request 0x1ffffd, cache miss, will get iotlb [0x1ffffd, 0x200000) - ... We'll all cache miss along the way until we access 0x0. While if we are with page mask, we'll get: - request 0x1fffff, cache miss, will get iotlb [0x0, 0x200000) - request 0x1ffffe, cache hit - request 0x1ffffd, cache hit - ... We'll only miss at the first IO. -- Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-12 2:34 ` Peter Xu @ 2017-06-12 3:07 ` Michael S. Tsirkin 2017-06-12 4:04 ` Peter Xu 0 siblings, 1 reply; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-12 3:07 UTC (permalink / raw) To: Peter Xu Cc: David Gibson, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang, Alex Williamson On Mon, Jun 12, 2017 at 10:34:43AM +0800, Peter Xu wrote: > On Sun, Jun 11, 2017 at 08:10:15PM +0800, David Gibson wrote: > > On Sun, Jun 11, 2017 at 01:09:26PM +0300, Michael S. Tsirkin wrote: > > > On Fri, Jun 09, 2017 at 09:58:47AM +0800, Peter Xu wrote: > > > > > > The problem is that when I was fixing the problem that vhost had with > > > > > > PT (a764040, "exec: abstract address_space_do_translate()"), I did > > > > > > broke the IOTLB translation a bit (it was using page masks). IMHO we > > > > > > need to fix it first for correctness (patch 1/2). > > > > > > > > > > > > For patch 3, if we can have Jason's patch to allow dynamic > > > > > > iommu_platform switching, that'll be the best, then I can rewrite > > > > > > patch 3 with the switching logic rather than caching anything. But > > > > > > IMHO that can be separated from patch 1/2 if you like. > > > > > > > > > > > > Or do you have better suggestion on how should we fix it? > > > > > > > > > > > > Thanks, > > > > > > > > > > Can we drop masks completely and replace with length? I think we > > > > > should do that instead of trying to fix masks. > > > > > > > > Do you mean to modify IOMMUTLBEntry.addr_mask into length? > > > > > > I think it's better than alternatives. > > > > > > > Again, I am not sure this is good... At least we need to get ack from > > > > David since spapr should be the initial user of it, and possibly also > > > > Alex since vfio should be assuming that (IIUC both in QEMU and kernel) > > > > addr_mask is page masks rather than arbirary length. > > > > > > > > (CC Alex) > > > > > > > > Thanks, > > > > > > Callbacks that need powers of two can easily split up the range. > > > > I think I missed part of the thread. What's the original use case for > > non-power-of-two IOTLB entries? It certainly won't happen on Power. > > Currently address_space_get_iotlb_entry() didn't really follow the > rule, addr_mask can be arbitary length. This series tried to fix it, > while Michael was questioning about whether we should really fix that > at all. > > Michael, > > Even if for performance's sake, I should still think we should fix it. > Let's consider a most simple worst case: we have a single page mapped > with IOVA range (2M page): > > [0x0, 0x200000) > > And if guest access IOVA using the following patern: > > 0x1fffff, 0x1ffffe, 0x1ffffd, ... > > Then now we'll get this: > > - request 0x1fffff, cache miss, will get iotlb [0x1fffff, 0x200000) > - request 0x1ffffe, cache miss, will get iotlb [0x1ffffe, 0x200000) > - request 0x1ffffd, cache miss, will get iotlb [0x1ffffd, 0x200000) > - ... We pass an offset too, do we not? So callee can figure out the region starts at 0x0 and avoid 2nd and 3rd misses. > We'll all cache miss along the way until we access 0x0. While if we > are with page mask, we'll get: > > - request 0x1fffff, cache miss, will get iotlb [0x0, 0x200000) > - request 0x1ffffe, cache hit > - request 0x1ffffd, cache hit > - ... > > We'll only miss at the first IO. I think we should send as much info as we can. There should be a way to find full region start and length. > -- > Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-12 3:07 ` Michael S. Tsirkin @ 2017-06-12 4:04 ` Peter Xu 2017-06-14 18:34 ` Michael S. Tsirkin 0 siblings, 1 reply; 34+ messages in thread From: Peter Xu @ 2017-06-12 4:04 UTC (permalink / raw) To: Michael S. Tsirkin Cc: David Gibson, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang, Alex Williamson On Mon, Jun 12, 2017 at 06:07:04AM +0300, Michael S. Tsirkin wrote: > On Mon, Jun 12, 2017 at 10:34:43AM +0800, Peter Xu wrote: > > On Sun, Jun 11, 2017 at 08:10:15PM +0800, David Gibson wrote: > > > On Sun, Jun 11, 2017 at 01:09:26PM +0300, Michael S. Tsirkin wrote: > > > > On Fri, Jun 09, 2017 at 09:58:47AM +0800, Peter Xu wrote: > > > > > > > The problem is that when I was fixing the problem that vhost had with > > > > > > > PT (a764040, "exec: abstract address_space_do_translate()"), I did > > > > > > > broke the IOTLB translation a bit (it was using page masks). IMHO we > > > > > > > need to fix it first for correctness (patch 1/2). > > > > > > > > > > > > > > For patch 3, if we can have Jason's patch to allow dynamic > > > > > > > iommu_platform switching, that'll be the best, then I can rewrite > > > > > > > patch 3 with the switching logic rather than caching anything. But > > > > > > > IMHO that can be separated from patch 1/2 if you like. > > > > > > > > > > > > > > Or do you have better suggestion on how should we fix it? > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Can we drop masks completely and replace with length? I think we > > > > > > should do that instead of trying to fix masks. > > > > > > > > > > Do you mean to modify IOMMUTLBEntry.addr_mask into length? > > > > > > > > I think it's better than alternatives. > > > > > > > > > Again, I am not sure this is good... At least we need to get ack from > > > > > David since spapr should be the initial user of it, and possibly also > > > > > Alex since vfio should be assuming that (IIUC both in QEMU and kernel) > > > > > addr_mask is page masks rather than arbirary length. > > > > > > > > > > (CC Alex) > > > > > > > > > > Thanks, > > > > > > > > Callbacks that need powers of two can easily split up the range. > > > > > > I think I missed part of the thread. What's the original use case for > > > non-power-of-two IOTLB entries? It certainly won't happen on Power. > > > > Currently address_space_get_iotlb_entry() didn't really follow the > > rule, addr_mask can be arbitary length. This series tried to fix it, > > while Michael was questioning about whether we should really fix that > > at all. > > > > Michael, > > > > Even if for performance's sake, I should still think we should fix it. > > Let's consider a most simple worst case: we have a single page mapped > > with IOVA range (2M page): > > > > [0x0, 0x200000) > > > > And if guest access IOVA using the following patern: > > > > 0x1fffff, 0x1ffffe, 0x1ffffd, ... > > > > Then now we'll get this: > > > > - request 0x1fffff, cache miss, will get iotlb [0x1fffff, 0x200000) > > - request 0x1ffffe, cache miss, will get iotlb [0x1ffffe, 0x200000) > > - request 0x1ffffd, cache miss, will get iotlb [0x1ffffd, 0x200000) > > - ... > > We pass an offset too, do we not? So callee can figure out > the region starts at 0x0 and avoid 2nd and 3rd misses. Here when you say "offset", do you mean the offset in MemoryRegionSection? In address_space_get_iotlb_entry() we have this: section = address_space_do_translate(as, addr, &xlat, &plen, is_write, false); One thing to mention is that, imho we cannot really assume the xlat is valid on the whole "section" range - the section can be a huge GPA range, while the xlat may only be valid on a single 4K page. The only safe region we can use here is (xlat, xlat+plen). Outside that, we should know nothing valid. Please correct me if I didn't really catch the point.. > > > > We'll all cache miss along the way until we access 0x0. While if we > > are with page mask, we'll get: > > > > - request 0x1fffff, cache miss, will get iotlb [0x0, 0x200000) > > - request 0x1ffffe, cache hit > > - request 0x1ffffd, cache hit > > - ... > > > > We'll only miss at the first IO. > > I think we should send as much info as we can. > There should be a way to find full region start and length. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-12 4:04 ` Peter Xu @ 2017-06-14 18:34 ` Michael S. Tsirkin 2017-06-15 2:31 ` Peter Xu 0 siblings, 1 reply; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-14 18:34 UTC (permalink / raw) To: Peter Xu Cc: David Gibson, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang, Alex Williamson On Mon, Jun 12, 2017 at 12:04:58PM +0800, Peter Xu wrote: > On Mon, Jun 12, 2017 at 06:07:04AM +0300, Michael S. Tsirkin wrote: > > On Mon, Jun 12, 2017 at 10:34:43AM +0800, Peter Xu wrote: > > > On Sun, Jun 11, 2017 at 08:10:15PM +0800, David Gibson wrote: > > > > On Sun, Jun 11, 2017 at 01:09:26PM +0300, Michael S. Tsirkin wrote: > > > > > On Fri, Jun 09, 2017 at 09:58:47AM +0800, Peter Xu wrote: > > > > > > > > The problem is that when I was fixing the problem that vhost had with > > > > > > > > PT (a764040, "exec: abstract address_space_do_translate()"), I did > > > > > > > > broke the IOTLB translation a bit (it was using page masks). IMHO we > > > > > > > > need to fix it first for correctness (patch 1/2). > > > > > > > > > > > > > > > > For patch 3, if we can have Jason's patch to allow dynamic > > > > > > > > iommu_platform switching, that'll be the best, then I can rewrite > > > > > > > > patch 3 with the switching logic rather than caching anything. But > > > > > > > > IMHO that can be separated from patch 1/2 if you like. > > > > > > > > > > > > > > > > Or do you have better suggestion on how should we fix it? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Can we drop masks completely and replace with length? I think we > > > > > > > should do that instead of trying to fix masks. > > > > > > > > > > > > Do you mean to modify IOMMUTLBEntry.addr_mask into length? > > > > > > > > > > I think it's better than alternatives. > > > > > > > > > > > Again, I am not sure this is good... At least we need to get ack from > > > > > > David since spapr should be the initial user of it, and possibly also > > > > > > Alex since vfio should be assuming that (IIUC both in QEMU and kernel) > > > > > > addr_mask is page masks rather than arbirary length. > > > > > > > > > > > > (CC Alex) > > > > > > > > > > > > Thanks, > > > > > > > > > > Callbacks that need powers of two can easily split up the range. > > > > > > > > I think I missed part of the thread. What's the original use case for > > > > non-power-of-two IOTLB entries? It certainly won't happen on Power. > > > > > > Currently address_space_get_iotlb_entry() didn't really follow the > > > rule, addr_mask can be arbitary length. This series tried to fix it, > > > while Michael was questioning about whether we should really fix that > > > at all. > > > > > > Michael, > > > > > > Even if for performance's sake, I should still think we should fix it. > > > Let's consider a most simple worst case: we have a single page mapped > > > with IOVA range (2M page): > > > > > > [0x0, 0x200000) > > > > > > And if guest access IOVA using the following patern: > > > > > > 0x1fffff, 0x1ffffe, 0x1ffffd, ... > > > > > > Then now we'll get this: > > > > > > - request 0x1fffff, cache miss, will get iotlb [0x1fffff, 0x200000) > > > - request 0x1ffffe, cache miss, will get iotlb [0x1ffffe, 0x200000) > > > - request 0x1ffffd, cache miss, will get iotlb [0x1ffffd, 0x200000) > > > - ... > > > > We pass an offset too, do we not? So callee can figure out > > the region starts at 0x0 and avoid 2nd and 3rd misses. > > Here when you say "offset", do you mean the offset in > MemoryRegionSection? > > In address_space_get_iotlb_entry() we have this: > > section = address_space_do_translate(as, addr, &xlat, &plen, > is_write, false); > > One thing to mention is that, imho we cannot really assume the xlat is > valid on the whole "section" range - the section can be a huge GPA > range, while the xlat may only be valid on a single 4K page. The only > safe region we can use here is (xlat, xlat+plen). Outside that, we > should know nothing valid. > > Please correct me if I didn't really catch the point.. IIUC section is the translation result. If so all of it is valid, not just one page. > > > > > > > We'll all cache miss along the way until we access 0x0. While if we > > > are with page mask, we'll get: > > > > > > - request 0x1fffff, cache miss, will get iotlb [0x0, 0x200000) > > > - request 0x1ffffe, cache hit > > > - request 0x1ffffd, cache hit > > > - ... > > > > > > We'll only miss at the first IO. > > > > I think we should send as much info as we can. > > There should be a way to find full region start and length. > > Thanks, > > -- > Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-14 18:34 ` Michael S. Tsirkin @ 2017-06-15 2:31 ` Peter Xu 2017-06-15 2:57 ` Peter Xu 2017-06-16 15:33 ` Michael S. Tsirkin 0 siblings, 2 replies; 34+ messages in thread From: Peter Xu @ 2017-06-15 2:31 UTC (permalink / raw) To: Michael S. Tsirkin Cc: David Gibson, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang, Alex Williamson On Wed, Jun 14, 2017 at 09:34:52PM +0300, Michael S. Tsirkin wrote: > On Mon, Jun 12, 2017 at 12:04:58PM +0800, Peter Xu wrote: > > On Mon, Jun 12, 2017 at 06:07:04AM +0300, Michael S. Tsirkin wrote: > > > On Mon, Jun 12, 2017 at 10:34:43AM +0800, Peter Xu wrote: > > > > On Sun, Jun 11, 2017 at 08:10:15PM +0800, David Gibson wrote: > > > > > On Sun, Jun 11, 2017 at 01:09:26PM +0300, Michael S. Tsirkin wrote: > > > > > > On Fri, Jun 09, 2017 at 09:58:47AM +0800, Peter Xu wrote: > > > > > > > > > The problem is that when I was fixing the problem that vhost had with > > > > > > > > > PT (a764040, "exec: abstract address_space_do_translate()"), I did > > > > > > > > > broke the IOTLB translation a bit (it was using page masks). IMHO we > > > > > > > > > need to fix it first for correctness (patch 1/2). > > > > > > > > > > > > > > > > > > For patch 3, if we can have Jason's patch to allow dynamic > > > > > > > > > iommu_platform switching, that'll be the best, then I can rewrite > > > > > > > > > patch 3 with the switching logic rather than caching anything. But > > > > > > > > > IMHO that can be separated from patch 1/2 if you like. > > > > > > > > > > > > > > > > > > Or do you have better suggestion on how should we fix it? > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Can we drop masks completely and replace with length? I think we > > > > > > > > should do that instead of trying to fix masks. > > > > > > > > > > > > > > Do you mean to modify IOMMUTLBEntry.addr_mask into length? > > > > > > > > > > > > I think it's better than alternatives. > > > > > > > > > > > > > Again, I am not sure this is good... At least we need to get ack from > > > > > > > David since spapr should be the initial user of it, and possibly also > > > > > > > Alex since vfio should be assuming that (IIUC both in QEMU and kernel) > > > > > > > addr_mask is page masks rather than arbirary length. > > > > > > > > > > > > > > (CC Alex) > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Callbacks that need powers of two can easily split up the range. > > > > > > > > > > I think I missed part of the thread. What's the original use case for > > > > > non-power-of-two IOTLB entries? It certainly won't happen on Power. > > > > > > > > Currently address_space_get_iotlb_entry() didn't really follow the > > > > rule, addr_mask can be arbitary length. This series tried to fix it, > > > > while Michael was questioning about whether we should really fix that > > > > at all. > > > > > > > > Michael, > > > > > > > > Even if for performance's sake, I should still think we should fix it. > > > > Let's consider a most simple worst case: we have a single page mapped > > > > with IOVA range (2M page): > > > > > > > > [0x0, 0x200000) > > > > > > > > And if guest access IOVA using the following patern: > > > > > > > > 0x1fffff, 0x1ffffe, 0x1ffffd, ... > > > > > > > > Then now we'll get this: > > > > > > > > - request 0x1fffff, cache miss, will get iotlb [0x1fffff, 0x200000) > > > > - request 0x1ffffe, cache miss, will get iotlb [0x1ffffe, 0x200000) > > > > - request 0x1ffffd, cache miss, will get iotlb [0x1ffffd, 0x200000) > > > > - ... > > > > > > We pass an offset too, do we not? So callee can figure out > > > the region starts at 0x0 and avoid 2nd and 3rd misses. > > > > Here when you say "offset", do you mean the offset in > > MemoryRegionSection? > > > > In address_space_get_iotlb_entry() we have this: > > > > section = address_space_do_translate(as, addr, &xlat, &plen, > > is_write, false); > > > > One thing to mention is that, imho we cannot really assume the xlat is > > valid on the whole "section" range - the section can be a huge GPA > > range, while the xlat may only be valid on a single 4K page. The only > > safe region we can use here is (xlat, xlat+plen). Outside that, we > > should know nothing valid. [1] > > > > Please correct me if I didn't really catch the point.. > > IIUC section is the translation result. If so all of it is valid, > not just one page. It should be only one page (I think that depends on how we implemented the translation logic). Please check this (gdb session on current master): (gdb) b address_space_get_iotlb_entry Breakpoint 2 at 0x55555576803a: file /root/git/qemu/exec.c, line 512. ... (until break) (gdb) bt #0 0x000055555576803a in address_space_get_iotlb_entry (as=0x55555841bbb0, addr=4294959104, is_write=false) at /root/git/qemu/exec.c:512 #1 0x00005555558322dd in vhost_device_iotlb_miss (dev=0x5555568b03a0, iova=4294959104, write=0) at /root/git/qemu/hw/virtio/vhost.c:982 #2 0x0000555555834ad9 in vhost_backend_handle_iotlb_msg (dev=0x5555568b03a0, imsg=0x7fffffffd428) at /root/git/qemu/hw/virtio/vhost-backend.c:334 #3 0x00005555558347be in vhost_kernel_iotlb_read (opaque=0x5555568b03a0) at /root/git/qemu/hw/virtio/vhost-backend.c:204 #4 0x0000555555c7e7c3 in aio_dispatch_handlers (ctx=0x5555568091b0) at /root/git/qemu/util/aio-posix.c:399 #5 0x0000555555c7e956 in aio_dispatch (ctx=0x5555568091b0) at /root/git/qemu/util/aio-posix.c:430 #6 0x0000555555c7a722 in aio_ctx_dispatch (source=0x5555568091b0, callback=0x0, user_data=0x0) at /root/git/qemu/util/async.c:261 #7 0x00007f98c62f3703 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 #8 0x0000555555c7d36c in glib_pollfds_poll () at /root/git/qemu/util/main-loop.c:213 #9 0x0000555555c7d468 in os_host_main_loop_wait (timeout=29311167) at /root/git/qemu/util/main-loop.c:261 #10 0x0000555555c7d521 in main_loop_wait (nonblocking=0) at /root/git/qemu/util/main-loop.c:517 #11 0x00005555558fe234 in main_loop () at /root/git/qemu/vl.c:1918 #12 0x0000555555905fe8 in main (argc=21, argv=0x7fffffffda48, envp=0x7fffffffdaf8) at /root/git/qemu/vl.c:4752 (gdb) s 517 plen = (hwaddr)-1; (gdb) 520 section = address_space_do_translate(as, addr, &xlat, &plen, (gdb) n 524 if (section.mr == &io_mem_unassigned) { (gdb) p/x xlat $2 = 0x73900000 (gdb) p/x addr $3 = 0xffffe000 (gdb) p/x plen $4 = 0x1000 (gdb) p section $5 = {mr = 0x5555569ad790, address_space = 0x5555562f9b40 <address_space_memory>, offset_within_region = 1048576, size = 0x0000000000000000000000007ff00000, offset_within_address_space = 1048576, readonly = false} Here $2-$4 shows that we are translating IOVA 0xffffe000 into [0x73900000, 0x73901000) range, while we can see the section is having size as 0x7ff00000, which is ranged from [0x100000000, 0x17ff00000). That's the upper RAM that the guest has, corresponds to what we can see in "info mtree -f" (the guest contains 4G RAM, this is upper 2G): 0000000100000000-000000017fffffff (prio 0, ram): pc.ram @0000000080000000 Obvious, we cannot assume we have a linear mapping on this whole range. So I don't think we can really use section info (though the section can be used to obtain some offset information, or address space the range belongs) to build up the IOTLB, but only the plen. Then, we need to fix current codes. And this is exactly what I meant above at [1]. Hope this clarifies a bit on the issue. Thanks, > > > > > > > > > > > We'll all cache miss along the way until we access 0x0. While if we > > > > are with page mask, we'll get: > > > > > > > > - request 0x1fffff, cache miss, will get iotlb [0x0, 0x200000) > > > > - request 0x1ffffe, cache hit > > > > - request 0x1ffffd, cache hit > > > > - ... > > > > > > > > We'll only miss at the first IO. > > > > > > I think we should send as much info as we can. > > > There should be a way to find full region start and length. > > > > Thanks, > > > > -- > > Peter Xu -- Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-15 2:31 ` Peter Xu @ 2017-06-15 2:57 ` Peter Xu 2017-06-16 15:33 ` Michael S. Tsirkin 1 sibling, 0 replies; 34+ messages in thread From: Peter Xu @ 2017-06-15 2:57 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Jason Wang, Alex Williamson, qemu-devel, Maxime Coquelin, Paolo Bonzini, David Gibson On Thu, Jun 15, 2017 at 10:31:11AM +0800, Peter Xu wrote: > On Wed, Jun 14, 2017 at 09:34:52PM +0300, Michael S. Tsirkin wrote: > > On Mon, Jun 12, 2017 at 12:04:58PM +0800, Peter Xu wrote: > > > On Mon, Jun 12, 2017 at 06:07:04AM +0300, Michael S. Tsirkin wrote: > > > > On Mon, Jun 12, 2017 at 10:34:43AM +0800, Peter Xu wrote: > > > > > On Sun, Jun 11, 2017 at 08:10:15PM +0800, David Gibson wrote: > > > > > > On Sun, Jun 11, 2017 at 01:09:26PM +0300, Michael S. Tsirkin wrote: > > > > > > > On Fri, Jun 09, 2017 at 09:58:47AM +0800, Peter Xu wrote: > > > > > > > > > > The problem is that when I was fixing the problem that vhost had with > > > > > > > > > > PT (a764040, "exec: abstract address_space_do_translate()"), I did > > > > > > > > > > broke the IOTLB translation a bit (it was using page masks). IMHO we > > > > > > > > > > need to fix it first for correctness (patch 1/2). > > > > > > > > > > > > > > > > > > > > For patch 3, if we can have Jason's patch to allow dynamic > > > > > > > > > > iommu_platform switching, that'll be the best, then I can rewrite > > > > > > > > > > patch 3 with the switching logic rather than caching anything. But > > > > > > > > > > IMHO that can be separated from patch 1/2 if you like. > > > > > > > > > > > > > > > > > > > > Or do you have better suggestion on how should we fix it? > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > Can we drop masks completely and replace with length? I think we > > > > > > > > > should do that instead of trying to fix masks. > > > > > > > > > > > > > > > > Do you mean to modify IOMMUTLBEntry.addr_mask into length? > > > > > > > > > > > > > > I think it's better than alternatives. > > > > > > > > > > > > > > > Again, I am not sure this is good... At least we need to get ack from > > > > > > > > David since spapr should be the initial user of it, and possibly also > > > > > > > > Alex since vfio should be assuming that (IIUC both in QEMU and kernel) > > > > > > > > addr_mask is page masks rather than arbirary length. > > > > > > > > > > > > > > > > (CC Alex) > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Callbacks that need powers of two can easily split up the range. > > > > > > > > > > > > I think I missed part of the thread. What's the original use case for > > > > > > non-power-of-two IOTLB entries? It certainly won't happen on Power. > > > > > > > > > > Currently address_space_get_iotlb_entry() didn't really follow the > > > > > rule, addr_mask can be arbitary length. This series tried to fix it, > > > > > while Michael was questioning about whether we should really fix that > > > > > at all. > > > > > > > > > > Michael, > > > > > > > > > > Even if for performance's sake, I should still think we should fix it. > > > > > Let's consider a most simple worst case: we have a single page mapped > > > > > with IOVA range (2M page): > > > > > > > > > > [0x0, 0x200000) > > > > > > > > > > And if guest access IOVA using the following patern: > > > > > > > > > > 0x1fffff, 0x1ffffe, 0x1ffffd, ... > > > > > > > > > > Then now we'll get this: > > > > > > > > > > - request 0x1fffff, cache miss, will get iotlb [0x1fffff, 0x200000) > > > > > - request 0x1ffffe, cache miss, will get iotlb [0x1ffffe, 0x200000) > > > > > - request 0x1ffffd, cache miss, will get iotlb [0x1ffffd, 0x200000) > > > > > - ... > > > > > > > > We pass an offset too, do we not? So callee can figure out > > > > the region starts at 0x0 and avoid 2nd and 3rd misses. > > > > > > Here when you say "offset", do you mean the offset in > > > MemoryRegionSection? > > > > > > In address_space_get_iotlb_entry() we have this: > > > > > > section = address_space_do_translate(as, addr, &xlat, &plen, > > > is_write, false); > > > > > > One thing to mention is that, imho we cannot really assume the xlat is > > > valid on the whole "section" range - the section can be a huge GPA > > > range, while the xlat may only be valid on a single 4K page. The only > > > safe region we can use here is (xlat, xlat+plen). Outside that, we > > > should know nothing valid. > > [1] > > > > > > > Please correct me if I didn't really catch the point.. > > > > IIUC section is the translation result. If so all of it is valid, > > not just one page. > > It should be only one page (I think that depends on how we implemented > the translation logic). Please check this (gdb session on current > master): > > (gdb) b address_space_get_iotlb_entry > Breakpoint 2 at 0x55555576803a: file /root/git/qemu/exec.c, line 512. > ... (until break) > (gdb) bt > #0 0x000055555576803a in address_space_get_iotlb_entry (as=0x55555841bbb0, addr=4294959104, is_write=false) at /root/git/qemu/exec.c:512 > #1 0x00005555558322dd in vhost_device_iotlb_miss (dev=0x5555568b03a0, iova=4294959104, write=0) at /root/git/qemu/hw/virtio/vhost.c:982 > #2 0x0000555555834ad9 in vhost_backend_handle_iotlb_msg (dev=0x5555568b03a0, imsg=0x7fffffffd428) at /root/git/qemu/hw/virtio/vhost-backend.c:334 > #3 0x00005555558347be in vhost_kernel_iotlb_read (opaque=0x5555568b03a0) at /root/git/qemu/hw/virtio/vhost-backend.c:204 > #4 0x0000555555c7e7c3 in aio_dispatch_handlers (ctx=0x5555568091b0) at /root/git/qemu/util/aio-posix.c:399 > #5 0x0000555555c7e956 in aio_dispatch (ctx=0x5555568091b0) at /root/git/qemu/util/aio-posix.c:430 > #6 0x0000555555c7a722 in aio_ctx_dispatch (source=0x5555568091b0, callback=0x0, user_data=0x0) at /root/git/qemu/util/async.c:261 > #7 0x00007f98c62f3703 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 > #8 0x0000555555c7d36c in glib_pollfds_poll () at /root/git/qemu/util/main-loop.c:213 > #9 0x0000555555c7d468 in os_host_main_loop_wait (timeout=29311167) at /root/git/qemu/util/main-loop.c:261 > #10 0x0000555555c7d521 in main_loop_wait (nonblocking=0) at /root/git/qemu/util/main-loop.c:517 > #11 0x00005555558fe234 in main_loop () at /root/git/qemu/vl.c:1918 > #12 0x0000555555905fe8 in main (argc=21, argv=0x7fffffffda48, envp=0x7fffffffdaf8) at /root/git/qemu/vl.c:4752 > (gdb) s > 517 plen = (hwaddr)-1; > (gdb) > 520 section = address_space_do_translate(as, addr, &xlat, &plen, > (gdb) n > 524 if (section.mr == &io_mem_unassigned) { > (gdb) p/x xlat > $2 = 0x73900000 > (gdb) p/x addr > $3 = 0xffffe000 > (gdb) p/x plen > $4 = 0x1000 > (gdb) p section > $5 = {mr = 0x5555569ad790, address_space = 0x5555562f9b40 <address_space_memory>, offset_within_region = 1048576, size = 0x0000000000000000000000007ff00000, > offset_within_address_space = 1048576, readonly = false} > > Here $2-$4 shows that we are translating IOVA 0xffffe000 into > [0x73900000, 0x73901000) range, while we can see the section is having > size as 0x7ff00000, > which is ranged from [0x100000000, 0x17ff00000). > That's the upper RAM that the guest has, corresponds to what we can > see in "info mtree -f" (the guest contains 4G RAM, this is upper 2G): > > 0000000100000000-000000017fffffff (prio 0, ram): pc.ram @0000000080000000 Hmm I made a mistake here... It should be the lower 2G mem (not upper), which ranges from: 0000000000100000-000000007fffffff (prio 0, ram): pc.ram @0000000000100000 Though the main point still stands below... > > Obvious, we cannot assume we have a linear mapping on this whole > range. So I don't think we can really use section info (though the > section can be used to obtain some offset information, or address > space the range belongs) to build up the IOTLB, but only the plen. > Then, we need to fix current codes. > > And this is exactly what I meant above at [1]. Hope this clarifies a > bit on the issue. Thanks, > > > > > > > > > > > > > > > > We'll all cache miss along the way until we access 0x0. While if we > > > > > are with page mask, we'll get: > > > > > > > > > > - request 0x1fffff, cache miss, will get iotlb [0x0, 0x200000) > > > > > - request 0x1ffffe, cache hit > > > > > - request 0x1ffffd, cache hit > > > > > - ... > > > > > > > > > > We'll only miss at the first IO. > > > > > > > > I think we should send as much info as we can. > > > > There should be a way to find full region start and length. > > > > > > Thanks, > > > > > > -- > > > Peter Xu > > -- > Peter Xu > -- Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-15 2:31 ` Peter Xu 2017-06-15 2:57 ` Peter Xu @ 2017-06-16 15:33 ` Michael S. Tsirkin 1 sibling, 0 replies; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-16 15:33 UTC (permalink / raw) To: Peter Xu Cc: David Gibson, Paolo Bonzini, qemu-devel, Maxime Coquelin, Jason Wang, Alex Williamson On Thu, Jun 15, 2017 at 10:31:11AM +0800, Peter Xu wrote: > On Wed, Jun 14, 2017 at 09:34:52PM +0300, Michael S. Tsirkin wrote: > > On Mon, Jun 12, 2017 at 12:04:58PM +0800, Peter Xu wrote: > > > On Mon, Jun 12, 2017 at 06:07:04AM +0300, Michael S. Tsirkin wrote: > > > > On Mon, Jun 12, 2017 at 10:34:43AM +0800, Peter Xu wrote: > > > > > On Sun, Jun 11, 2017 at 08:10:15PM +0800, David Gibson wrote: > > > > > > On Sun, Jun 11, 2017 at 01:09:26PM +0300, Michael S. Tsirkin wrote: > > > > > > > On Fri, Jun 09, 2017 at 09:58:47AM +0800, Peter Xu wrote: > > > > > > > > > > The problem is that when I was fixing the problem that vhost had with > > > > > > > > > > PT (a764040, "exec: abstract address_space_do_translate()"), I did > > > > > > > > > > broke the IOTLB translation a bit (it was using page masks). IMHO we > > > > > > > > > > need to fix it first for correctness (patch 1/2). > > > > > > > > > > > > > > > > > > > > For patch 3, if we can have Jason's patch to allow dynamic > > > > > > > > > > iommu_platform switching, that'll be the best, then I can rewrite > > > > > > > > > > patch 3 with the switching logic rather than caching anything. But > > > > > > > > > > IMHO that can be separated from patch 1/2 if you like. > > > > > > > > > > > > > > > > > > > > Or do you have better suggestion on how should we fix it? > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > Can we drop masks completely and replace with length? I think we > > > > > > > > > should do that instead of trying to fix masks. > > > > > > > > > > > > > > > > Do you mean to modify IOMMUTLBEntry.addr_mask into length? > > > > > > > > > > > > > > I think it's better than alternatives. > > > > > > > > > > > > > > > Again, I am not sure this is good... At least we need to get ack from > > > > > > > > David since spapr should be the initial user of it, and possibly also > > > > > > > > Alex since vfio should be assuming that (IIUC both in QEMU and kernel) > > > > > > > > addr_mask is page masks rather than arbirary length. > > > > > > > > > > > > > > > > (CC Alex) > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Callbacks that need powers of two can easily split up the range. > > > > > > > > > > > > I think I missed part of the thread. What's the original use case for > > > > > > non-power-of-two IOTLB entries? It certainly won't happen on Power. > > > > > > > > > > Currently address_space_get_iotlb_entry() didn't really follow the > > > > > rule, addr_mask can be arbitary length. This series tried to fix it, > > > > > while Michael was questioning about whether we should really fix that > > > > > at all. > > > > > > > > > > Michael, > > > > > > > > > > Even if for performance's sake, I should still think we should fix it. > > > > > Let's consider a most simple worst case: we have a single page mapped > > > > > with IOVA range (2M page): > > > > > > > > > > [0x0, 0x200000) > > > > > > > > > > And if guest access IOVA using the following patern: > > > > > > > > > > 0x1fffff, 0x1ffffe, 0x1ffffd, ... > > > > > > > > > > Then now we'll get this: > > > > > > > > > > - request 0x1fffff, cache miss, will get iotlb [0x1fffff, 0x200000) > > > > > - request 0x1ffffe, cache miss, will get iotlb [0x1ffffe, 0x200000) > > > > > - request 0x1ffffd, cache miss, will get iotlb [0x1ffffd, 0x200000) > > > > > - ... > > > > > > > > We pass an offset too, do we not? So callee can figure out > > > > the region starts at 0x0 and avoid 2nd and 3rd misses. > > > > > > Here when you say "offset", do you mean the offset in > > > MemoryRegionSection? > > > > > > In address_space_get_iotlb_entry() we have this: > > > > > > section = address_space_do_translate(as, addr, &xlat, &plen, > > > is_write, false); > > > > > > One thing to mention is that, imho we cannot really assume the xlat is > > > valid on the whole "section" range - the section can be a huge GPA > > > range, while the xlat may only be valid on a single 4K page. The only > > > safe region we can use here is (xlat, xlat+plen). Outside that, we > > > should know nothing valid. > > [1] > > > > > > > Please correct me if I didn't really catch the point.. > > > > IIUC section is the translation result. If so all of it is valid, > > not just one page. > > It should be only one page (I think that depends on how we implemented > the translation logic). Please check this (gdb session on current > master): > > (gdb) b address_space_get_iotlb_entry > Breakpoint 2 at 0x55555576803a: file /root/git/qemu/exec.c, line 512. > ... (until break) > (gdb) bt > #0 0x000055555576803a in address_space_get_iotlb_entry (as=0x55555841bbb0, addr=4294959104, is_write=false) at /root/git/qemu/exec.c:512 > #1 0x00005555558322dd in vhost_device_iotlb_miss (dev=0x5555568b03a0, iova=4294959104, write=0) at /root/git/qemu/hw/virtio/vhost.c:982 > #2 0x0000555555834ad9 in vhost_backend_handle_iotlb_msg (dev=0x5555568b03a0, imsg=0x7fffffffd428) at /root/git/qemu/hw/virtio/vhost-backend.c:334 > #3 0x00005555558347be in vhost_kernel_iotlb_read (opaque=0x5555568b03a0) at /root/git/qemu/hw/virtio/vhost-backend.c:204 > #4 0x0000555555c7e7c3 in aio_dispatch_handlers (ctx=0x5555568091b0) at /root/git/qemu/util/aio-posix.c:399 > #5 0x0000555555c7e956 in aio_dispatch (ctx=0x5555568091b0) at /root/git/qemu/util/aio-posix.c:430 > #6 0x0000555555c7a722 in aio_ctx_dispatch (source=0x5555568091b0, callback=0x0, user_data=0x0) at /root/git/qemu/util/async.c:261 > #7 0x00007f98c62f3703 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 > #8 0x0000555555c7d36c in glib_pollfds_poll () at /root/git/qemu/util/main-loop.c:213 > #9 0x0000555555c7d468 in os_host_main_loop_wait (timeout=29311167) at /root/git/qemu/util/main-loop.c:261 > #10 0x0000555555c7d521 in main_loop_wait (nonblocking=0) at /root/git/qemu/util/main-loop.c:517 > #11 0x00005555558fe234 in main_loop () at /root/git/qemu/vl.c:1918 > #12 0x0000555555905fe8 in main (argc=21, argv=0x7fffffffda48, envp=0x7fffffffdaf8) at /root/git/qemu/vl.c:4752 > (gdb) s > 517 plen = (hwaddr)-1; > (gdb) > 520 section = address_space_do_translate(as, addr, &xlat, &plen, > (gdb) n > 524 if (section.mr == &io_mem_unassigned) { > (gdb) p/x xlat > $2 = 0x73900000 > (gdb) p/x addr > $3 = 0xffffe000 > (gdb) p/x plen > $4 = 0x1000 > (gdb) p section > $5 = {mr = 0x5555569ad790, address_space = 0x5555562f9b40 <address_space_memory>, offset_within_region = 1048576, size = 0x0000000000000000000000007ff00000, > offset_within_address_space = 1048576, readonly = false} > > Here $2-$4 shows that we are translating IOVA 0xffffe000 into > [0x73900000, 0x73901000) range, while we can see the section is having > size as 0x7ff00000, which is ranged from [0x100000000, 0x17ff00000). > That's the upper RAM that the guest has, corresponds to what we can > see in "info mtree -f" (the guest contains 4G RAM, this is upper 2G): > > 0000000100000000-000000017fffffff (prio 0, ram): pc.ram @0000000080000000 > > Obvious, we cannot assume we have a linear mapping on this whole > range. So I don't think we can really use section info (though the > section can be used to obtain some offset information, or address > space the range belongs) to build up the IOTLB, but only the plen. > Then, we need to fix current codes. > > And this is exactly what I meant above at [1]. Hope this clarifies a > bit on the issue. Thanks, I agree we need to take the IOMMU PTE into account. What I was saying is that e.g. with a huge PTE we can figure out how it overlaps with the section and pass that exact info. Whatever is valid in both PTE and in section, is always safe to access, but might not be a power of two. > > > > > > > > > > > > > > > We'll all cache miss along the way until we access 0x0. While if we > > > > > are with page mask, we'll get: > > > > > > > > > > - request 0x1fffff, cache miss, will get iotlb [0x0, 0x200000) > > > > > - request 0x1ffffe, cache hit > > > > > - request 0x1ffffd, cache hit > > > > > - ... > > > > > > > > > > We'll only miss at the first IO. > > > > > > > > I think we should send as much info as we can. > > > > There should be a way to find full region start and length. > > > > > > Thanks, > > > > > > -- > > > Peter Xu > > -- > Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry 2017-06-06 23:47 ` David Gibson 2017-06-07 3:44 ` Peter Xu @ 2017-06-07 13:01 ` Paolo Bonzini 1 sibling, 0 replies; 34+ messages in thread From: Paolo Bonzini @ 2017-06-07 13:01 UTC (permalink / raw) To: David Gibson Cc: Peter Xu, Michael S. Tsirkin, qemu-devel, Maxime Coquelin, Jason Wang On 07/06/2017 01:47, David Gibson wrote: >> I would ask David. There are PowerPC MMUs that allow fast lookup of >> arbitrarily-sized windows (not necessarily power of two), > > Uh.. I'm not sure what you mean here. You might be thinking of the > BATs which really old (32-bit) PowerPC MMUs had - those allow > arbitrary large block translations, but they do have to be a power of > two. Oh, that is what I was thinking about. Thanks for figuring out! :) Paolo ^ permalink raw reply [flat|nested] 34+ messages in thread
* [Qemu-devel] [PATCH 3/3] vhost: iommu: cache static mapping if there is 2017-06-02 11:50 [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() Peter Xu 2017-06-02 11:50 ` [Qemu-devel] [PATCH 1/3] exec: add page_mask for address_space_do_translate Peter Xu 2017-06-02 11:50 ` [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry Peter Xu @ 2017-06-02 11:50 ` Peter Xu 2017-06-02 15:45 ` Michael S. Tsirkin 2017-06-02 16:51 ` Michael S. Tsirkin 2017-06-02 14:51 ` [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() Michael S. Tsirkin 3 siblings, 2 replies; 34+ messages in thread From: Peter Xu @ 2017-06-02 11:50 UTC (permalink / raw) To: qemu-devel Cc: Michael S . Tsirkin, Paolo Bonzini, Maxime Coquelin, peterx, Jason Wang This patch pre-heat vhost iotlb cache when passthrough mode enabled. Sometimes, even if user specified iommu_platform for vhost devices, IOMMU might still be disabled. One case is passthrough mode in VT-d implementation. We can detect this by observing iommu_list. If it's empty, it means IOMMU translation is disabled, then we can actually pre-heat the translation (it'll be static mapping then) by first invalidating all IOTLB, then cache existing memory ranges into vhost backend iotlb using 1:1 mapping. Reviewed-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> --- hw/virtio/trace-events | 4 +++ hw/virtio/vhost.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+) diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events index 1f7a7c1..54dcbb3 100644 --- a/hw/virtio/trace-events +++ b/hw/virtio/trace-events @@ -24,3 +24,7 @@ virtio_balloon_handle_output(const char *name, uint64_t gpa) "section name: %s g virtio_balloon_get_config(uint32_t num_pages, uint32_t actual) "num_pages: %d actual: %d" virtio_balloon_set_config(uint32_t actual, uint32_t oldactual) "actual: %d oldactual: %d" virtio_balloon_to_target(uint64_t target, uint32_t num_pages) "balloon target: %"PRIx64" num_pages: %d" + +# hw/virtio/vhost.c +vhost_iommu_commit(void) "" +vhost_iommu_static_preheat(void) "" diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 03a46a7..d03d720 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -27,6 +27,7 @@ #include "hw/virtio/virtio-access.h" #include "migration/blocker.h" #include "sysemu/dma.h" +#include "trace.h" /* enabled until disconnected backend stabilizes */ #define _VHOST_DEBUG 1 @@ -730,6 +731,11 @@ static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) } } +static bool vhost_iommu_mr_enabled(struct vhost_dev *dev) +{ + return !QLIST_EMPTY(&dev->iommu_list); +} + static void vhost_iommu_region_add(MemoryListener *listener, MemoryRegionSection *section) { @@ -782,6 +788,65 @@ static void vhost_iommu_region_del(MemoryListener *listener, } } +static void vhost_iommu_commit(MemoryListener *listener) +{ + struct vhost_dev *dev = container_of(listener, struct vhost_dev, + iommu_listener); + struct vhost_memory_region *r; + int i; + + trace_vhost_iommu_commit(); + + if (!vhost_iommu_mr_enabled(dev)) { + /* + * This means iommu_platform is enabled, however iommu memory + * region is disabled, e.g., when device passthrough is setup. + * Then, no translation is needed any more. + * + * Let's first invalidate the whole IOTLB, then pre-heat the + * static mapping by looping over vhost memory ranges. + */ + + if (dev->vhost_ops->vhost_invalidate_device_iotlb(dev, 0, + UINT64_MAX)) { + error_report("%s: flush existing IOTLB failed", __func__); + return; + } + + /* + * Current VHOST_IOTLB_INVALIDATE API has a small defect that + * the invalidation for (start=0, size=UINT64_MAX) cannot + * really invalidate an cached range of (start=UINT64_MAX-1, + * size=1). We send this 2nd invalidation to workaround this. + * But, frankly speaking for QEMU we don't have a problem with + * this since we will never have a vhost cache with range + * (start=UINT64_MAX-1, size=1) - if you see + * address_space_get_iotlb_entry() all IOTLBs are page + * aligned. + */ + if (dev->vhost_ops->vhost_invalidate_device_iotlb(dev, UINT64_MAX, + 1)) { + error_report("%s: flush existing IOTLB failed", __func__); + return; + } + + for (i = 0; i < dev->mem->nregions; i++) { + r = &dev->mem->regions[i]; + /* Vhost regions are writable RAM, so IOMMU_RW suites. */ + if (dev->vhost_ops->vhost_update_device_iotlb(dev, + r->guest_phys_addr, + r->userspace_addr, + r->memory_size, + IOMMU_RW)) { + error_report("%s: pre-heat static mapping failed", __func__); + return; + } + } + + trace_vhost_iommu_static_preheat(); + } +} + static void vhost_region_nop(MemoryListener *listener, MemoryRegionSection *section) { @@ -1298,6 +1363,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque, hdev->iommu_listener = (MemoryListener) { .region_add = vhost_iommu_region_add, .region_del = vhost_iommu_region_del, + .commit = vhost_iommu_commit, }; if (hdev->migration_blocker == NULL) { -- 2.7.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 3/3] vhost: iommu: cache static mapping if there is 2017-06-02 11:50 ` [Qemu-devel] [PATCH 3/3] vhost: iommu: cache static mapping if there is Peter Xu @ 2017-06-02 15:45 ` Michael S. Tsirkin 2017-06-05 3:15 ` Peter Xu 2017-06-02 16:51 ` Michael S. Tsirkin 1 sibling, 1 reply; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-02 15:45 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, Paolo Bonzini, Maxime Coquelin, Jason Wang On Fri, Jun 02, 2017 at 07:50:54PM +0800, Peter Xu wrote: > This patch pre-heat vhost iotlb cache when passthrough mode enabled. > > Sometimes, even if user specified iommu_platform for vhost devices, > IOMMU might still be disabled. One case is passthrough mode in VT-d > implementation. We can detect this by observing iommu_list. If it's > empty, it means IOMMU translation is disabled, then we can actually > pre-heat the translation (it'll be static mapping then) by first > invalidating all IOTLB, then cache existing memory ranges into vhost > backend iotlb using 1:1 mapping. > > Reviewed-by: Jason Wang <jasowang@redhat.com> > Signed-off-by: Peter Xu <peterx@redhat.com> This is still a hack I think. What if there's an invalidation? I think the right thing is to send updates only when requested, but sent the largest mapping including the iova, not from iova until end of page. Thoughts? > --- > hw/virtio/trace-events | 4 +++ > hw/virtio/vhost.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 70 insertions(+) > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events > index 1f7a7c1..54dcbb3 100644 > --- a/hw/virtio/trace-events > +++ b/hw/virtio/trace-events > @@ -24,3 +24,7 @@ virtio_balloon_handle_output(const char *name, uint64_t gpa) "section name: %s g > virtio_balloon_get_config(uint32_t num_pages, uint32_t actual) "num_pages: %d actual: %d" > virtio_balloon_set_config(uint32_t actual, uint32_t oldactual) "actual: %d oldactual: %d" > virtio_balloon_to_target(uint64_t target, uint32_t num_pages) "balloon target: %"PRIx64" num_pages: %d" > + > +# hw/virtio/vhost.c > +vhost_iommu_commit(void) "" > +vhost_iommu_static_preheat(void) "" > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > index 03a46a7..d03d720 100644 > --- a/hw/virtio/vhost.c > +++ b/hw/virtio/vhost.c > @@ -27,6 +27,7 @@ > #include "hw/virtio/virtio-access.h" > #include "migration/blocker.h" > #include "sysemu/dma.h" > +#include "trace.h" > > /* enabled until disconnected backend stabilizes */ > #define _VHOST_DEBUG 1 > @@ -730,6 +731,11 @@ static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > } > } > > +static bool vhost_iommu_mr_enabled(struct vhost_dev *dev) > +{ > + return !QLIST_EMPTY(&dev->iommu_list); > +} > + > static void vhost_iommu_region_add(MemoryListener *listener, > MemoryRegionSection *section) > { > @@ -782,6 +788,65 @@ static void vhost_iommu_region_del(MemoryListener *listener, > } > } > > +static void vhost_iommu_commit(MemoryListener *listener) > +{ > + struct vhost_dev *dev = container_of(listener, struct vhost_dev, > + iommu_listener); > + struct vhost_memory_region *r; > + int i; > + > + trace_vhost_iommu_commit(); > + > + if (!vhost_iommu_mr_enabled(dev)) { > + /* > + * This means iommu_platform is enabled, however iommu memory > + * region is disabled, e.g., when device passthrough is setup. > + * Then, no translation is needed any more. > + * > + * Let's first invalidate the whole IOTLB, then pre-heat the > + * static mapping by looping over vhost memory ranges. > + */ > + > + if (dev->vhost_ops->vhost_invalidate_device_iotlb(dev, 0, > + UINT64_MAX)) { > + error_report("%s: flush existing IOTLB failed", __func__); > + return; > + } > + > + /* > + * Current VHOST_IOTLB_INVALIDATE API has a small defect that > + * the invalidation for (start=0, size=UINT64_MAX) cannot > + * really invalidate an cached range of (start=UINT64_MAX-1, > + * size=1). We send this 2nd invalidation to workaround this. > + * But, frankly speaking for QEMU we don't have a problem with > + * this since we will never have a vhost cache with range > + * (start=UINT64_MAX-1, size=1) - if you see > + * address_space_get_iotlb_entry() all IOTLBs are page > + * aligned. > + */ > + if (dev->vhost_ops->vhost_invalidate_device_iotlb(dev, UINT64_MAX, > + 1)) { > + error_report("%s: flush existing IOTLB failed", __func__); > + return; > + } > + > + for (i = 0; i < dev->mem->nregions; i++) { > + r = &dev->mem->regions[i]; > + /* Vhost regions are writable RAM, so IOMMU_RW suites. */ > + if (dev->vhost_ops->vhost_update_device_iotlb(dev, > + r->guest_phys_addr, > + r->userspace_addr, > + r->memory_size, > + IOMMU_RW)) { > + error_report("%s: pre-heat static mapping failed", __func__); > + return; > + } > + } > + > + trace_vhost_iommu_static_preheat(); > + } > +} > + > static void vhost_region_nop(MemoryListener *listener, > MemoryRegionSection *section) > { > @@ -1298,6 +1363,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque, > hdev->iommu_listener = (MemoryListener) { > .region_add = vhost_iommu_region_add, > .region_del = vhost_iommu_region_del, > + .commit = vhost_iommu_commit, > }; > > if (hdev->migration_blocker == NULL) { > -- > 2.7.4 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 3/3] vhost: iommu: cache static mapping if there is 2017-06-02 15:45 ` Michael S. Tsirkin @ 2017-06-05 3:15 ` Peter Xu 2017-06-05 4:07 ` Jason Wang 2017-06-05 15:05 ` Michael S. Tsirkin 0 siblings, 2 replies; 34+ messages in thread From: Peter Xu @ 2017-06-05 3:15 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: qemu-devel, Paolo Bonzini, Maxime Coquelin, Jason Wang On Fri, Jun 02, 2017 at 06:45:05PM +0300, Michael S. Tsirkin wrote: > On Fri, Jun 02, 2017 at 07:50:54PM +0800, Peter Xu wrote: > > This patch pre-heat vhost iotlb cache when passthrough mode enabled. > > > > Sometimes, even if user specified iommu_platform for vhost devices, > > IOMMU might still be disabled. One case is passthrough mode in VT-d > > implementation. We can detect this by observing iommu_list. If it's > > empty, it means IOMMU translation is disabled, then we can actually > > pre-heat the translation (it'll be static mapping then) by first > > invalidating all IOTLB, then cache existing memory ranges into vhost > > backend iotlb using 1:1 mapping. > > > > Reviewed-by: Jason Wang <jasowang@redhat.com> > > Signed-off-by: Peter Xu <peterx@redhat.com> > > This is still a hack I think. What if there's an invalidation? > I think the right thing is to send updates only when requested, > but sent the largest mapping including the iova, not from iova until end > of page. Thoughts? Indeed it's kind of a hack, but it does not hurt anything but will definitely boost performance in most cases... Yes "sent the largest mapping including the iova" is okay, but the first IO on one region would be delayed as well, so IMHO it's not the best solution as well. I think the best solution should be (for sure) that vhost knows it's PT, then it just skips the translation completely. I just don't sure whether there's simple/good way to do this. Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 3/3] vhost: iommu: cache static mapping if there is 2017-06-05 3:15 ` Peter Xu @ 2017-06-05 4:07 ` Jason Wang 2017-06-05 15:05 ` Michael S. Tsirkin 1 sibling, 0 replies; 34+ messages in thread From: Jason Wang @ 2017-06-05 4:07 UTC (permalink / raw) To: Peter Xu, Michael S. Tsirkin; +Cc: qemu-devel, Paolo Bonzini, Maxime Coquelin On 2017年06月05日 11:15, Peter Xu wrote: > On Fri, Jun 02, 2017 at 06:45:05PM +0300, Michael S. Tsirkin wrote: >> On Fri, Jun 02, 2017 at 07:50:54PM +0800, Peter Xu wrote: >>> This patch pre-heat vhost iotlb cache when passthrough mode enabled. >>> >>> Sometimes, even if user specified iommu_platform for vhost devices, >>> IOMMU might still be disabled. One case is passthrough mode in VT-d >>> implementation. We can detect this by observing iommu_list. If it's >>> empty, it means IOMMU translation is disabled, then we can actually >>> pre-heat the translation (it'll be static mapping then) by first >>> invalidating all IOTLB, then cache existing memory ranges into vhost >>> backend iotlb using 1:1 mapping. >>> >>> Reviewed-by: Jason Wang <jasowang@redhat.com> >>> Signed-off-by: Peter Xu <peterx@redhat.com> >> This is still a hack I think. What if there's an invalidation? >> I think the right thing is to send updates only when requested, >> but sent the largest mapping including the iova, not from iova until end >> of page. Thoughts? > Indeed it's kind of a hack, but it does not hurt anything but will > definitely boost performance in most cases... > > Yes "sent the largest mapping including the iova" is okay, but the > first IO on one region would be delayed as well, so IMHO it's not the > best solution as well. I think the best solution should be (for sure) > that vhost knows it's PT, then it just skips the translation > completely. I just don't sure whether there's simple/good way to do > this. > > Thanks, We can disabled device IOTLB completely in this case. But looks like there's a minor kernel bug prevent us from doing this. Let me post a fix and let's see then. Thanks ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 3/3] vhost: iommu: cache static mapping if there is 2017-06-05 3:15 ` Peter Xu 2017-06-05 4:07 ` Jason Wang @ 2017-06-05 15:05 ` Michael S. Tsirkin 1 sibling, 0 replies; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-05 15:05 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, Paolo Bonzini, Maxime Coquelin, Jason Wang On Mon, Jun 05, 2017 at 11:15:11AM +0800, Peter Xu wrote: > On Fri, Jun 02, 2017 at 06:45:05PM +0300, Michael S. Tsirkin wrote: > > On Fri, Jun 02, 2017 at 07:50:54PM +0800, Peter Xu wrote: > > > This patch pre-heat vhost iotlb cache when passthrough mode enabled. > > > > > > Sometimes, even if user specified iommu_platform for vhost devices, > > > IOMMU might still be disabled. One case is passthrough mode in VT-d > > > implementation. We can detect this by observing iommu_list. If it's > > > empty, it means IOMMU translation is disabled, then we can actually > > > pre-heat the translation (it'll be static mapping then) by first > > > invalidating all IOTLB, then cache existing memory ranges into vhost > > > backend iotlb using 1:1 mapping. > > > > > > Reviewed-by: Jason Wang <jasowang@redhat.com> > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > > This is still a hack I think. What if there's an invalidation? > > I think the right thing is to send updates only when requested, > > but sent the largest mapping including the iova, not from iova until end > > of page. Thoughts? > > Indeed it's kind of a hack, but it does not hurt anything but will > definitely boost performance in most cases... > > Yes "sent the largest mapping including the iova" is okay, but the > first IO on one region would be delayed as well, so IMHO it's not the > best solution as well. I think the best solution should be (for sure) > that vhost knows it's PT, then it just skips the translation > completely. I just don't sure whether there's simple/good way to do > this. > > Thanks, If you send the whole 64 bit area, then backend can detect this easily. > -- > Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 3/3] vhost: iommu: cache static mapping if there is 2017-06-02 11:50 ` [Qemu-devel] [PATCH 3/3] vhost: iommu: cache static mapping if there is Peter Xu 2017-06-02 15:45 ` Michael S. Tsirkin @ 2017-06-02 16:51 ` Michael S. Tsirkin 1 sibling, 0 replies; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-02 16:51 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, Paolo Bonzini, Maxime Coquelin, Jason Wang On Fri, Jun 02, 2017 at 07:50:54PM +0800, Peter Xu wrote: > This patch pre-heat vhost iotlb cache when passthrough mode enabled. > > Sometimes, even if user specified iommu_platform for vhost devices, > IOMMU might still be disabled. One case is passthrough mode in VT-d > implementation. We can detect this by observing iommu_list. If it's > empty, it means IOMMU translation is disabled, then we can actually > pre-heat the translation (it'll be static mapping then) by first > invalidating all IOTLB, then cache existing memory ranges into vhost > backend iotlb using 1:1 mapping. > > Reviewed-by: Jason Wang <jasowang@redhat.com> > Signed-off-by: Peter Xu <peterx@redhat.com> Can't say I like this, this does not help the more important use-case of dpdk which has a small static mapping but can't be detected by QEMU. vhost should just heat up whatever's actually used. Why isn't this enough? > --- > hw/virtio/trace-events | 4 +++ > hw/virtio/vhost.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 70 insertions(+) > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events > index 1f7a7c1..54dcbb3 100644 > --- a/hw/virtio/trace-events > +++ b/hw/virtio/trace-events > @@ -24,3 +24,7 @@ virtio_balloon_handle_output(const char *name, uint64_t gpa) "section name: %s g > virtio_balloon_get_config(uint32_t num_pages, uint32_t actual) "num_pages: %d actual: %d" > virtio_balloon_set_config(uint32_t actual, uint32_t oldactual) "actual: %d oldactual: %d" > virtio_balloon_to_target(uint64_t target, uint32_t num_pages) "balloon target: %"PRIx64" num_pages: %d" > + > +# hw/virtio/vhost.c > +vhost_iommu_commit(void) "" > +vhost_iommu_static_preheat(void) "" > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > index 03a46a7..d03d720 100644 > --- a/hw/virtio/vhost.c > +++ b/hw/virtio/vhost.c > @@ -27,6 +27,7 @@ > #include "hw/virtio/virtio-access.h" > #include "migration/blocker.h" > #include "sysemu/dma.h" > +#include "trace.h" > > /* enabled until disconnected backend stabilizes */ > #define _VHOST_DEBUG 1 > @@ -730,6 +731,11 @@ static void vhost_iommu_unmap_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > } > } > > +static bool vhost_iommu_mr_enabled(struct vhost_dev *dev) > +{ > + return !QLIST_EMPTY(&dev->iommu_list); > +} > + > static void vhost_iommu_region_add(MemoryListener *listener, > MemoryRegionSection *section) > { > @@ -782,6 +788,65 @@ static void vhost_iommu_region_del(MemoryListener *listener, > } > } > > +static void vhost_iommu_commit(MemoryListener *listener) > +{ > + struct vhost_dev *dev = container_of(listener, struct vhost_dev, > + iommu_listener); > + struct vhost_memory_region *r; > + int i; > + > + trace_vhost_iommu_commit(); > + > + if (!vhost_iommu_mr_enabled(dev)) { > + /* > + * This means iommu_platform is enabled, however iommu memory > + * region is disabled, e.g., when device passthrough is setup. > + * Then, no translation is needed any more. > + * > + * Let's first invalidate the whole IOTLB, then pre-heat the > + * static mapping by looping over vhost memory ranges. > + */ > + > + if (dev->vhost_ops->vhost_invalidate_device_iotlb(dev, 0, > + UINT64_MAX)) { > + error_report("%s: flush existing IOTLB failed", __func__); > + return; > + } > + > + /* > + * Current VHOST_IOTLB_INVALIDATE API has a small defect that > + * the invalidation for (start=0, size=UINT64_MAX) cannot > + * really invalidate an cached range of (start=UINT64_MAX-1, > + * size=1). We send this 2nd invalidation to workaround this. > + * But, frankly speaking for QEMU we don't have a problem with > + * this since we will never have a vhost cache with range > + * (start=UINT64_MAX-1, size=1) - if you see > + * address_space_get_iotlb_entry() all IOTLBs are page > + * aligned. > + */ > + if (dev->vhost_ops->vhost_invalidate_device_iotlb(dev, UINT64_MAX, > + 1)) { > + error_report("%s: flush existing IOTLB failed", __func__); > + return; > + } > + > + for (i = 0; i < dev->mem->nregions; i++) { > + r = &dev->mem->regions[i]; > + /* Vhost regions are writable RAM, so IOMMU_RW suites. */ > + if (dev->vhost_ops->vhost_update_device_iotlb(dev, > + r->guest_phys_addr, > + r->userspace_addr, > + r->memory_size, > + IOMMU_RW)) { > + error_report("%s: pre-heat static mapping failed", __func__); > + return; > + } > + } > + > + trace_vhost_iommu_static_preheat(); > + } > +} > + > static void vhost_region_nop(MemoryListener *listener, > MemoryRegionSection *section) > { > @@ -1298,6 +1363,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque, > hdev->iommu_listener = (MemoryListener) { > .region_add = vhost_iommu_region_add, > .region_del = vhost_iommu_region_del, > + .commit = vhost_iommu_commit, > }; > > if (hdev->migration_blocker == NULL) { > -- > 2.7.4 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() 2017-06-02 11:50 [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() Peter Xu ` (2 preceding siblings ...) 2017-06-02 11:50 ` [Qemu-devel] [PATCH 3/3] vhost: iommu: cache static mapping if there is Peter Xu @ 2017-06-02 14:51 ` Michael S. Tsirkin 2017-06-05 3:20 ` Peter Xu 3 siblings, 1 reply; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-02 14:51 UTC (permalink / raw) To: Peter Xu; +Cc: qemu-devel, Paolo Bonzini, Maxime Coquelin, Jason Wang On Fri, Jun 02, 2017 at 07:50:51PM +0800, Peter Xu wrote: > With the patch applied: > > [PATCH v3] exec: fix address_space_get_iotlb_entry page mask > (already in Paolo's pull request but not yet merged) > > Now we can have valid address masks. However it is still not ideal, > considering that the mask may not be aligned to guest page sizes. One > example would be when huge page is used in guest (please see commit > message in patch 1 for details). It applies to normal pages too. So we > not only need a valid address mask, we should make sure it is page > mask (for x86, it should be either 4K/2M/1G pages). Why should we? To get better performance, right? > Patch 1+2 fixes the problem. Tested with both kernel net driver or > testpmd, on either 4K/2M pages, to make sure the page mask is correct. > > Patch 3 is cherry picked from PT series, after fixing from 1+2, we'll > definitely want patch 3 now. Here's the simplest TCP streaming test > using vhost dmar and iommu=pt in guest: > > without patch 3: 12.0Gbps And what happens without patches 1-2? > with patch 3: 33.5Gbps This is the part I don't get. Patches 1-2 will return a bigger region to callers. The result should be better performance - instead it seems to slow down vhost for some reason and we need tricks to get performance back. What's going on? > Please review, thanks. > > Peter Xu (3): > exec: add page_mask for address_space_do_translate > exec: simplify address_space_get_iotlb_entry > vhost: iommu: cache static mapping if there is > > exec.c | 73 +++++++++++++++++++++++++++++++++----------------- > hw/virtio/trace-events | 4 +++ > hw/virtio/vhost.c | 66 +++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 119 insertions(+), 24 deletions(-) > > -- > 2.7.4 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() 2017-06-02 14:51 ` [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() Michael S. Tsirkin @ 2017-06-05 3:20 ` Peter Xu 2017-06-06 15:29 ` Michael S. Tsirkin 0 siblings, 1 reply; 34+ messages in thread From: Peter Xu @ 2017-06-05 3:20 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: qemu-devel, Paolo Bonzini, Maxime Coquelin, Jason Wang On Fri, Jun 02, 2017 at 05:51:07PM +0300, Michael S. Tsirkin wrote: > On Fri, Jun 02, 2017 at 07:50:51PM +0800, Peter Xu wrote: > > With the patch applied: > > > > [PATCH v3] exec: fix address_space_get_iotlb_entry page mask > > (already in Paolo's pull request but not yet merged) > > > > Now we can have valid address masks. However it is still not ideal, > > considering that the mask may not be aligned to guest page sizes. One > > example would be when huge page is used in guest (please see commit > > message in patch 1 for details). It applies to normal pages too. So we > > not only need a valid address mask, we should make sure it is page > > mask (for x86, it should be either 4K/2M/1G pages). > > Why should we? To get better performance, right? IMHO one point is for performance, the other point is on how we should define the IOTLB interface. My opinion is that it is better valid masks. > > > Patch 1+2 fixes the problem. Tested with both kernel net driver or > > testpmd, on either 4K/2M pages, to make sure the page mask is correct. > > > > Patch 3 is cherry picked from PT series, after fixing from 1+2, we'll > > definitely want patch 3 now. Here's the simplest TCP streaming test > > using vhost dmar and iommu=pt in guest: > > > > without patch 3: 12.0Gbps > > And what happens without patches 1-2? Without 1-2, performance is good. But I think it is hacky to have such a good result (I explained why the performance is good in the VT-d PT support thread with some logs)... > > > with patch 3: 33.5Gbps > > This is the part I don't get. Patches 1-2 will return a bigger region to > callers. The result should be better performance - instead it seems to > slow down vhost for some reason and we need tricks to get > performance back. What's going on? Yes. The problem is that if without patch 1/2 I think the codes lacks correctness. With correctness, we lost performance, then I picked patch 3 as well. Again, I think the first thing we need to settle is what should be the best definition for IOTLB (addr_mask or arbitary length). Thanks, -- Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() 2017-06-05 3:20 ` Peter Xu @ 2017-06-06 15:29 ` Michael S. Tsirkin 0 siblings, 0 replies; 34+ messages in thread From: Michael S. Tsirkin @ 2017-06-06 15:29 UTC (permalink / raw) To: Peter Xu; +Cc: Jason Wang, Paolo Bonzini, Maxime Coquelin, qemu-devel On Mon, Jun 05, 2017 at 11:20:13AM +0800, Peter Xu wrote: > On Fri, Jun 02, 2017 at 05:51:07PM +0300, Michael S. Tsirkin wrote: > > On Fri, Jun 02, 2017 at 07:50:51PM +0800, Peter Xu wrote: > > > With the patch applied: > > > > > > [PATCH v3] exec: fix address_space_get_iotlb_entry page mask > > > (already in Paolo's pull request but not yet merged) > > > > > > Now we can have valid address masks. However it is still not ideal, > > > considering that the mask may not be aligned to guest page sizes. One > > > example would be when huge page is used in guest (please see commit > > > message in patch 1 for details). It applies to normal pages too. So we > > > not only need a valid address mask, we should make sure it is page > > > mask (for x86, it should be either 4K/2M/1G pages). > > > > Why should we? To get better performance, right? > > IMHO one point is for performance, the other point is on how we should > define the IOTLB interface. My opinion is that it is better valid > masks. > > > > > > Patch 1+2 fixes the problem. Tested with both kernel net driver or > > > testpmd, on either 4K/2M pages, to make sure the page mask is correct. > > > > > > Patch 3 is cherry picked from PT series, after fixing from 1+2, we'll > > > definitely want patch 3 now. Here's the simplest TCP streaming test > > > using vhost dmar and iommu=pt in guest: > > > > > > without patch 3: 12.0Gbps > > > > And what happens without patches 1-2? > > Without 1-2, performance is good. But I think it is hacky to have such > a good result (I explained why the performance is good in the VT-d PT > support thread with some logs)... > > > > > > with patch 3: 33.5Gbps > > > > This is the part I don't get. Patches 1-2 will return a bigger region to > > callers. The result should be better performance - instead it seems to > > slow down vhost for some reason and we need tricks to get > > performance back. What's going on? > > Yes. The problem is that if without patch 1/2 I think the codes lacks > correctness. With correctness, we lost performance, then I picked > patch 3 as well. > > Again, I think the first thing we need to settle is what should be the > best definition for IOTLB (addr_mask or arbitary length). > > Thanks, If arbitary length means we don't require prefaulting hacks, I'm for using arbitary length. > -- > Peter Xu ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2017-06-16 15:33 UTC | newest] Thread overview: 34+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-06-02 11:50 [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() Peter Xu 2017-06-02 11:50 ` [Qemu-devel] [PATCH 1/3] exec: add page_mask for address_space_do_translate Peter Xu 2017-06-02 16:45 ` Michael S. Tsirkin 2017-06-05 2:52 ` Peter Xu 2017-06-02 11:50 ` [Qemu-devel] [PATCH 2/3] exec: simplify address_space_get_iotlb_entry Peter Xu 2017-06-02 16:49 ` Michael S. Tsirkin 2017-06-05 3:07 ` Peter Xu 2017-06-06 14:34 ` Paolo Bonzini 2017-06-06 23:47 ` David Gibson 2017-06-07 3:44 ` Peter Xu 2017-06-07 13:07 ` Michael S. Tsirkin 2017-06-08 6:11 ` Peter Xu 2017-06-08 18:59 ` Michael S. Tsirkin 2017-06-09 1:58 ` Peter Xu 2017-06-09 2:37 ` David Gibson 2017-06-11 10:09 ` Michael S. Tsirkin 2017-06-11 12:10 ` David Gibson 2017-06-12 2:34 ` Peter Xu 2017-06-12 3:07 ` Michael S. Tsirkin 2017-06-12 4:04 ` Peter Xu 2017-06-14 18:34 ` Michael S. Tsirkin 2017-06-15 2:31 ` Peter Xu 2017-06-15 2:57 ` Peter Xu 2017-06-16 15:33 ` Michael S. Tsirkin 2017-06-07 13:01 ` Paolo Bonzini 2017-06-02 11:50 ` [Qemu-devel] [PATCH 3/3] vhost: iommu: cache static mapping if there is Peter Xu 2017-06-02 15:45 ` Michael S. Tsirkin 2017-06-05 3:15 ` Peter Xu 2017-06-05 4:07 ` Jason Wang 2017-06-05 15:05 ` Michael S. Tsirkin 2017-06-02 16:51 ` Michael S. Tsirkin 2017-06-02 14:51 ` [Qemu-devel] [PATCH 0/3] exec: further refine address_space_get_iotlb_entry() Michael S. Tsirkin 2017-06-05 3:20 ` Peter Xu 2017-06-06 15:29 ` Michael S. Tsirkin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).