* Re:
       [not found] <20090427104117.GB29082@redhat.com>
@ 2009-04-27 13:16 ` Sheng Yang
  2009-04-27 13:51   ` qemu/hw/device-assignment: questions about msix_table_page Michael S. Tsirkin
  0 siblings, 1 reply; 18+ messages in thread
From: Sheng Yang @ 2009-04-27 13:16 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
> Sheng, Marcelo,
> I've been reading code in qemu/hw/device-assignment.c, and
> I have a couple of questions about msi-x implementation:

Hi Michael

> 1. What is the reason that msix_table_page is allocated
>    with mmap and not with e.g. malloc?

msix_table_page is a page, and mmap allocates memory on a page boundary,
so I used it.

> 2. msix_table_page has the guest view of the msix table for the device.
>    However, even this memory isn't mapped into guest directly, instead
>    msix_mmio_read/msix_mmio_write perform the write in qemu.
>    Won't it be possible to map this page directly into
>    guest memory, reducing the overhead for table writes?

First, Linux configures the real MSI-X table in the device, which
is out of our scope. KVM accepts the interrupt from Linux, then
injects it into the guest according to the guest's MSI-X table
settings. So KVM needs to know about modifications to the page.
For example, the MSI-X table has a mask bit which can be written
by the guest at any time (this bit hasn't been implemented yet,
but should be soon); then we should mask the corresponding vector
in the real MSI-X table. The guest may also modify the MSI
address/data, and that should likewise be intercepted by KVM and
used to update our knowledge of the guest. So we can't pass the
modification through.

If the guest could write to the real device MSI-X table directly,
it would cause chaos in interrupt delivery, because what the guest
sees is totally different from what the host sees...

-- 
regards
Yang, Sheng

>
> Could you shed light on this for me please?
> Thanks,

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-04-27 13:16 ` Sheng Yang
@ 2009-04-27 13:51 ` Michael S. Tsirkin
  2009-04-27 14:03   ` Sheng Yang
  0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2009-04-27 13:51 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
> On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
> > Sheng, Marcelo,
> > I've been reading code in qemu/hw/device-assignment.c, and
> > I have a couple of questions about msi-x implementation:
>
> Hi Michael
>
> > 1. What is the reason that msix_table_page is allocated
> >    with mmap and not with e.g. malloc?
>
> msix_table_page is a page, and mmap allocates memory on a page boundary,
> so I used it.

Just wondering, would e.g. posix_memalign work here as well?

> > 2. msix_table_page has the guest view of the msix table for the device.
> >    However, even this memory isn't mapped into guest directly, instead
> >    msix_mmio_read/msix_mmio_write perform the write in qemu.
> >    Won't it be possible to map this page directly into
> >    guest memory, reducing the overhead for table writes?
>
> First, Linux configures the real MSI-X table in the device, which
> is out of our scope. KVM accepts the interrupt from Linux, then
> injects it into the guest according to the guest's MSI-X table
> settings. So KVM needs to know about modifications to the page.
> For example, the MSI-X table has a mask bit which can be written
> by the guest at any time (this bit hasn't been implemented yet,
> but should be soon); then we should mask the corresponding vector
> in the real MSI-X table. The guest may also modify the MSI
> address/data, and that should likewise be intercepted by KVM and
> used to update our knowledge of the guest. So we can't pass the
> modification through.

Right, I see that. However all msix_mmio_write does is a memcpy.
So what I don't understand yet is what causes the real MSI-X table to be
modified. Where's that code?

> If the guest could write to the real device MSI-X table directly,
> it would cause chaos in interrupt delivery, because what the guest
> sees is totally different from what the host sees...

Obviously.

Thanks,

-- 
MST

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-04-27 13:51   ` qemu/hw/device-assignment: questions about msix_table_page Michael S. Tsirkin
@ 2009-04-27 14:03 ` Sheng Yang
  2009-04-27 14:15   ` Michael S. Tsirkin
  2009-04-28  9:31   ` Avi Kivity
  0 siblings, 2 replies; 18+ messages in thread
From: Sheng Yang @ 2009-04-27 14:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Monday 27 April 2009 21:51:34 Michael S. Tsirkin wrote:
> On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
> > On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
> > > Sheng, Marcelo,
> > > I've been reading code in qemu/hw/device-assignment.c, and
> > > I have a couple of questions about msi-x implementation:
> >
> > Hi Michael
> >
> > > 1. What is the reason that msix_table_page is allocated
> > >    with mmap and not with e.g. malloc?
> >
> > msix_table_page is a page, and mmap allocates memory on a page boundary,
> > so I used it.
>
> Just wondering, would e.g. posix_memalign work here as well?

Um, I think it should work too.

> > > 2. msix_table_page has the guest view of the msix table for the device.
> > >    However, even this memory isn't mapped into guest directly, instead
> > >    msix_mmio_read/msix_mmio_write perform the write in qemu.
> > >    Won't it be possible to map this page directly into
> > >    guest memory, reducing the overhead for table writes?
> >
> > First, Linux configures the real MSI-X table in the device, which
> > is out of our scope. KVM accepts the interrupt from Linux, then
> > injects it into the guest according to the guest's MSI-X table
> > settings. So KVM needs to know about modifications to the page.
> > For example, the MSI-X table has a mask bit which can be written
> > by the guest at any time (this bit hasn't been implemented yet,
> > but should be soon); then we should mask the corresponding vector
> > in the real MSI-X table. The guest may also modify the MSI
> > address/data, and that should likewise be intercepted by KVM and
> > used to update our knowledge of the guest. So we can't pass the
> > modification through.
>
> Right, I see that. However all msix_mmio_write does is a memcpy.
> So what I don't understand yet is what causes the real MSI-X table to be
> modified. Where's that code?

Dynamic changes aren't allowed yet... :( For now, please refer to
assigned_device_update_msix_mmio in qemu/hw/device-assignment.c.
It scans the MSI-X page and hands it to KVM through an ioctl.
For the kernel part, please refer to pci_enable_msix().

Both userspace and the kernel still need changes to support the
mask/unmask feature. That's still on the TODO list (and I hope it
can make the 2.6.31 merge window).

-- 
regards
Yang, Sheng

> > If the guest could write to the real device MSI-X table directly,
> > it would cause chaos in interrupt delivery, because what the guest
> > sees is totally different from what the host sees...
>
> Obviously.
>
> Thanks,

^ permalink raw reply [flat|nested] 18+ messages in thread
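A minimal sketch of the two allocation options discussed above: one page obtained with mmap, and one obtained with posix_memalign. It is an illustration only, not the code in qemu/hw/device-assignment.c; the function names (alloc_page_mmap, alloc_page_memalign, is_page_aligned) are hypothetical. Both calls return page-aligned memory, but only the anonymous mmap is zero-filled by the kernel, so the heap variant clears the page explicitly.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS and posix_memalign */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: one zero-filled page from an anonymous mapping.
 * Returns NULL on failure. Release with munmap(). */
void *alloc_page_mmap(size_t page_size)
{
    void *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: one page-aligned page from the heap.
 * posix_memalign does not zero the memory, so clear it here.
 * Returns NULL on failure. Release with free(). */
void *alloc_page_memalign(size_t page_size)
{
    void *p = NULL;
    if (posix_memalign(&p, page_size, page_size) != 0)
        return NULL;
    memset(p, 0, page_size);
    return p;
}

/* Returns 1 if p sits on a page boundary; page_size must be a power of two. */
int is_page_aligned(const void *p, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return ((uintptr_t)p & (page_size - 1)) == 0;
}
```

With page_size taken from sysconf(_SC_PAGESIZE), both helpers should return pointers for which is_page_aligned() reports 1. The practical difference is in how the memory is released (munmap versus free), not in the alignment.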
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-04-27 14:03 ` Sheng Yang
@ 2009-04-27 14:15 ` Michael S. Tsirkin
  2009-04-27 14:30   ` Sheng Yang
  2009-04-28  9:31 ` Avi Kivity
  1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2009-04-27 14:15 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Mon, Apr 27, 2009 at 10:03:59PM +0800, Sheng Yang wrote:
> On Monday 27 April 2009 21:51:34 Michael S. Tsirkin wrote:
> > On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
> > > On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
> > > > Sheng, Marcelo,
> > > > I've been reading code in qemu/hw/device-assignment.c, and
> > > > I have a couple of questions about msi-x implementation:
> > >
> > > Hi Michael
> > >
> > > > 1. What is the reason that msix_table_page is allocated
> > > >    with mmap and not with e.g. malloc?
> > >
> > > msix_table_page is a page, and mmap allocates memory on a page boundary,
> > > so I used it.
> >
> > Just wondering, would e.g. posix_memalign work here as well?
>
> Um, I think it should work too.
>
> > > > 2. msix_table_page has the guest view of the msix table for the device.
> > > >    However, even this memory isn't mapped into guest directly, instead
> > > >    msix_mmio_read/msix_mmio_write perform the write in qemu.
> > > >    Won't it be possible to map this page directly into
> > > >    guest memory, reducing the overhead for table writes?
> > >
> > > First, Linux configures the real MSI-X table in the device, which
> > > is out of our scope. KVM accepts the interrupt from Linux, then
> > > injects it into the guest according to the guest's MSI-X table
> > > settings. So KVM needs to know about modifications to the page.
> > > For example, the MSI-X table has a mask bit which can be written
> > > by the guest at any time (this bit hasn't been implemented yet,
> > > but should be soon); then we should mask the corresponding vector
> > > in the real MSI-X table. The guest may also modify the MSI
> > > address/data, and that should likewise be intercepted by KVM and
> > > used to update our knowledge of the guest. So we can't pass the
> > > modification through.
> >
> > Right, I see that. However all msix_mmio_write does is a memcpy.
> > So what I don't understand yet is what causes the real MSI-X table to be
> > modified. Where's that code?
>
> Dynamic changes aren't allowed yet... :( For now, please refer to
> assigned_device_update_msix_mmio in qemu/hw/device-assignment.c.
> It scans the MSI-X page and hands it to KVM through an ioctl.
> For the kernel part, please refer to pci_enable_msix().

Right, I noticed that. So msix_mmio_read/write are placeholders, right?
For existing functionality we conceivably could let the guest write into
msix_table_page.

> Both userspace and the kernel still need changes to support the
> mask/unmask feature. That's still on the TODO list (and I hope it
> can make the 2.6.31 merge window).
>
> --
> regards
> Yang, Sheng
>
> > > If the guest could write to the real device MSI-X table directly,
> > > it would cause chaos in interrupt delivery, because what the guest
> > > sees is totally different from what the host sees...
> >
> > Obviously.
> >
> > Thanks,
>

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-04-27 14:15 ` Michael S. Tsirkin
@ 2009-04-27 14:30 ` Sheng Yang
  2009-04-27 14:35   ` Michael S. Tsirkin
  2009-05-05  9:51   ` Michael S. Tsirkin
  0 siblings, 2 replies; 18+ messages in thread
From: Sheng Yang @ 2009-04-27 14:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Monday 27 April 2009 22:15:04 Michael S. Tsirkin wrote:
> On Mon, Apr 27, 2009 at 10:03:59PM +0800, Sheng Yang wrote:
> > On Monday 27 April 2009 21:51:34 Michael S. Tsirkin wrote:
> > > On Mon, Apr 27, 2009 at 09:16:14PM +0800, Sheng Yang wrote:
> > > > On Monday 27 April 2009 18:41:17 Michael S. Tsirkin wrote:
> > > > > Sheng, Marcelo,
> > > > > I've been reading code in qemu/hw/device-assignment.c, and
> > > > > I have a couple of questions about msi-x implementation:
> > > >
> > > > Hi Michael
> > > >
> > > > > 1. What is the reason that msix_table_page is allocated
> > > > >    with mmap and not with e.g. malloc?
> > > >
> > > > msix_table_page is a page, and mmap allocates memory on a page boundary,
> > > > so I used it.
> > >
> > > Just wondering, would e.g. posix_memalign work here as well?
> >
> > Um, I think it should work too.
> >
> > > > > 2. msix_table_page has the guest view of the msix table for the device.
> > > > >    However, even this memory isn't mapped into guest directly, instead
> > > > >    msix_mmio_read/msix_mmio_write perform the write in qemu.
> > > > >    Won't it be possible to map this page directly into
> > > > >    guest memory, reducing the overhead for table writes?
> > > >
> > > > First, Linux configures the real MSI-X table in the device, which
> > > > is out of our scope. KVM accepts the interrupt from Linux, then
> > > > injects it into the guest according to the guest's MSI-X table
> > > > settings. So KVM needs to know about modifications to the page.
> > > > For example, the MSI-X table has a mask bit which can be written
> > > > by the guest at any time (this bit hasn't been implemented yet,
> > > > but should be soon); then we should mask the corresponding vector
> > > > in the real MSI-X table. The guest may also modify the MSI
> > > > address/data, and that should likewise be intercepted by KVM and
> > > > used to update our knowledge of the guest. So we can't pass the
> > > > modification through.
> > >
> > > Right, I see that. However all msix_mmio_write does is a memcpy.
> > > So what I don't understand yet is what causes the real MSI-X table to be
> > > modified. Where's that code?
> >
> > Dynamic changes aren't allowed yet... :( For now, please refer to
> > assigned_device_update_msix_mmio in qemu/hw/device-assignment.c.
> > It scans the MSI-X page and hands it to KVM through an ioctl.
> > For the kernel part, please refer to pci_enable_msix().
>
> Right, I noticed that. So msix_mmio_read/write are placeholders, right?
> For existing functionality we conceivably could let the guest write into
> msix_table_page.

Yeah, in a sense. But I wouldn't suggest letting the guest write into
msix_table_page.

1. Some drivers don't set/unset the mask bit; for them the guest won't
access the page frequently, so it shouldn't be performance critical.

2. With the mask bit, some drivers would access the page more
frequently, but then you have to intercept the accesses...

My suggestion is to get mask bit support first, then consider optimization.
The mask bit affects performance on large-scale systems, so it should
be done.

So the current MSI-X status is indeed temporary... I guess what you see
now may be frequent accesses due to the mask bit, which we ignore for
now, so it could be optimized. But I really think this is a temporary
state, so I think it's better not to spend much effort on optimization
before mask bit support is done...

-- 
regards
Yang, Sheng

> > Both userspace and the kernel still need changes to support the
> > mask/unmask feature. That's still on the TODO list (and I hope it
> > can make the 2.6.31 merge window).
> >
> > --
> > regards
> > Yang, Sheng
> >
> > > > If the guest could write to the real device MSI-X table directly,
> > > > it would cause chaos in interrupt delivery, because what the guest
> > > > sees is totally different from what the host sees...
> > >
> > > Obviously.
> > >
> > > Thanks,

^ permalink raw reply [flat|nested] 18+ messages in thread
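To make the interception argument concrete, here is a rough sketch of a trapped 32-bit write into a shadow copy of the table page. It is not the msix_mmio_write from qemu; the function name shadow_msix_write32 and the constants are hypothetical. It assumes the standard MSI-X entry layout (16 bytes per entry, Vector Control dword at offset 12, mask bit in bit 0) and a little-endian host. The write itself is still just a memcpy, as noted above, but the handler can also report when the mask bit flipped, which is the event that would have to be propagated to the real table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSIX_ENTRY_SIZE          16u  /* bytes per MSI-X table entry */
#define MSIX_VECTOR_CTRL_OFFSET  12u  /* Vector Control dword within an entry */
#define MSIX_CTRL_MASKBIT        1u   /* bit 0: per-vector mask */

/* Hypothetical trapped-write handler for a shadow MSI-X table page.
 * Stores val at offset and returns:
 *    1  if the write toggled an entry's mask bit,
 *    0  for any other accepted write,
 *   -1  if the access is unaligned or outside the page. */
int shadow_msix_write32(uint8_t *table_page, size_t page_size,
                        size_t offset, uint32_t val)
{
    uint32_t old;

    assert(table_page != NULL);
    if (offset % 4 != 0 || page_size < 4 || offset > page_size - 4)
        return -1;

    memcpy(&old, table_page + offset, sizeof(old));
    memcpy(table_page + offset, &val, sizeof(val));

    /* Only the Vector Control dword carries the mask bit. */
    if (offset % MSIX_ENTRY_SIZE != MSIX_VECTOR_CTRL_OFFSET)
        return 0;
    return ((old ^ val) & MSIX_CTRL_MASKBIT) != 0;
}
```

A caller would act on a return value of 1 by masking or unmasking the corresponding host vector. How that is done in KVM is outside this sketch.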
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-04-27 14:30 ` Sheng Yang
@ 2009-04-27 14:35 ` Michael S. Tsirkin
  2009-05-05  9:51 ` Michael S. Tsirkin
  1 sibling, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2009-04-27 14:35 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote:
> My suggestion is to get mask bit support first, then consider optimization.

Yea. Thanks for the explanations!

-- 
MST

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-04-27 14:30 ` Sheng Yang
  2009-04-27 14:35   ` Michael S. Tsirkin
@ 2009-05-05  9:51 ` Michael S. Tsirkin
  2009-05-05 10:19   ` Marcelo Tosatti
  1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2009-05-05 9:51 UTC (permalink / raw)
  To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm

On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote:
> > > > > If the guest could write to the real device MSI-X table directly,
> > > > > it would cause chaos in interrupt delivery, because what the guest
> > > > > sees is totally different from what the host sees...
> > > >
> > > > Obviously.
> > > >
> > > > Thanks,
>

What's the reason that this page is unmapped from the qemu memory space?
Specifically what do these lines do:
    int offset = r_dev->msix_table_addr - real_region->base_addr;
    ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);

-- 
MST

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-05-05  9:51 ` Michael S. Tsirkin
@ 2009-05-05 10:19 ` Marcelo Tosatti
  2009-05-05 10:34   ` Michael S. Tsirkin
  0 siblings, 1 reply; 18+ messages in thread
From: Marcelo Tosatti @ 2009-05-05 10:19 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Sheng Yang, Avi Kivity, kvm

On Tue, May 05, 2009 at 12:51:36PM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote:
> > > > > > If the guest could write to the real device MSI-X table directly,
> > > > > > it would cause chaos in interrupt delivery, because what the guest
> > > > > > sees is totally different from what the host sees...
> > > > >
> > > > > Obviously.
> > > > >
> > > > > Thanks,
> >
>
> What's the reason that this page is unmapped from the qemu memory space?
> Specifically what do these lines do:
>     int offset = r_dev->msix_table_addr - real_region->base_addr;
>     ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);

I believe this allows accesses to this page (the MSI-X table), which
is part of the guest address space (through kvm memory slots), to be
trapped by qemu.

Since there is no actual page at this guest address, KVM treats accesses
as MMIO and forwards them to QEMU.

^ permalink raw reply [flat|nested] 18+ messages in thread
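The userspace half of that munmap trick can be sketched in isolation. The code below is an assumption-laden stand-in, not the qemu code: it maps anonymous memory instead of a device BAR, the helper names (map_region_with_hole, page_is_mapped) are hypothetical, and nothing here involves KVM. It only shows that unmapping one page inside a larger mapping leaves a hole with no backing page, while the neighbouring pages stay mapped.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map npages of anonymous memory, then unmap the
 * single page at byte offset `offset`, leaving a one-page hole.
 * Returns the base of the region, or NULL on bad arguments or failure. */
void *map_region_with_hole(size_t npages, size_t offset, size_t page_size)
{
    unsigned char *base;
    size_t len;

    assert(page_size > 0);
    len = npages * page_size;
    if (offset % page_size != 0 || offset + page_size > len)
        return NULL;

    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Same shape as the quoted line: munmap(virtbase + offset, one page). */
    if (munmap(base + offset, page_size) != 0) {
        munmap(base, len);
        return NULL;
    }
    return base;
}

/* Returns 1 if the page at addr is currently mapped, 0 otherwise.
 * msync fails with ENOMEM on a range that is not mapped. */
int page_is_mapped(void *addr, size_t page_size)
{
    return msync(addr, page_size, MS_ASYNC) == 0;
}
```

In the assigned-device case, the region around the hole is what gets registered as a memory slot, so a guest access that lands in the hole has nothing to resolve to and, as described above, ends up being handled as MMIO.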
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-05-05 10:19 ` Marcelo Tosatti
@ 2009-05-05 10:34 ` Michael S. Tsirkin
  2009-05-05 10:49   ` Marcelo Tosatti
  0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2009-05-05 10:34 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Sheng Yang, Avi Kivity, kvm

On Tue, May 05, 2009 at 07:19:45AM -0300, Marcelo Tosatti wrote:
> On Tue, May 05, 2009 at 12:51:36PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote:
> > > > > > > If the guest could write to the real device MSI-X table directly,
> > > > > > > it would cause chaos in interrupt delivery, because what the guest
> > > > > > > sees is totally different from what the host sees...
> > > > > >
> > > > > > Obviously.
> > > > > >
> > > > > > Thanks,
> > >
> >
> > What's the reason that this page is unmapped from the qemu memory space?
> > Specifically what do these lines do:
> >     int offset = r_dev->msix_table_addr - real_region->base_addr;
> >     ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
>
> I believe this allows accesses to this page (the MSI-X table), which
> is part of the guest address space (through kvm memory slots), to be
> trapped by qemu.
>
> Since there is no actual page at this guest address, KVM treats accesses
> as MMIO and forwards them to QEMU.
>

I thought about this too.
But why is this necessary for assigned MSI-X but not for emulated devices
such as e1000? All e1000 seems to do is cpu_register_physical_memory ...

-- 
MST

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-05-05 10:34 ` Michael S. Tsirkin
@ 2009-05-05 10:49 ` Marcelo Tosatti
  2009-05-05 11:45   ` Michael S. Tsirkin
  2009-05-05 12:46   ` Michael S. Tsirkin
  0 siblings, 2 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2009-05-05 10:49 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Sheng Yang, Avi Kivity, kvm

On Tue, May 05, 2009 at 01:34:50PM +0300, Michael S. Tsirkin wrote:
> On Tue, May 05, 2009 at 07:19:45AM -0300, Marcelo Tosatti wrote:
> > On Tue, May 05, 2009 at 12:51:36PM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote:
> > > > > > > > If the guest could write to the real device MSI-X table directly,
> > > > > > > > it would cause chaos in interrupt delivery, because what the guest
> > > > > > > > sees is totally different from what the host sees...
> > > > > > >
> > > > > > > Obviously.
> > > > > > >
> > > > > > > Thanks,
> > > >
> > >
> > > What's the reason that this page is unmapped from the qemu memory space?
> > > Specifically what do these lines do:
> > >     int offset = r_dev->msix_table_addr - real_region->base_addr;
> > >     ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
> >
> > I believe this allows accesses to this page (the MSI-X table), which
> > is part of the guest address space (through kvm memory slots), to be
> > trapped by qemu.
> >
> > Since there is no actual page at this guest address, KVM treats accesses
> > as MMIO and forwards them to QEMU.
> >
>
> I thought about this too.
> But why is this necessary for assigned MSI-X but not for emulated devices
> such as e1000? All e1000 seems to do is cpu_register_physical_memory ...

Because there is no registered (kvm) memory slot for the range in which
e1000 registers its MMIO? Not sure about the address of the MSI-X table
page, but you could achieve the same effect by splitting the slot in
which it lives in two, with a 1-page hole between them.

BTW this is why you can't map the MSI-X table page directly, you want
accesses to be trapped.

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-05-05 10:49 ` Marcelo Tosatti
@ 2009-05-05 11:45 ` Michael S. Tsirkin
  2009-05-05 11:51   ` Marcelo Tosatti
  2009-05-05 12:46 ` Michael S. Tsirkin
  1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2009-05-05 11:45 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Sheng Yang, Avi Kivity, kvm

On Tue, May 05, 2009 at 07:49:10AM -0300, Marcelo Tosatti wrote:
> On Tue, May 05, 2009 at 01:34:50PM +0300, Michael S. Tsirkin wrote:
> > On Tue, May 05, 2009 at 07:19:45AM -0300, Marcelo Tosatti wrote:
> > > On Tue, May 05, 2009 at 12:51:36PM +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote:
> > > > > > > > > If the guest could write to the real device MSI-X table directly,
> > > > > > > > > it would cause chaos in interrupt delivery, because what the guest
> > > > > > > > > sees is totally different from what the host sees...
> > > > > > > >
> > > > > > > > Obviously.
> > > > > > > >
> > > > > > > > Thanks,
> > > > >
> > > >
> > > > What's the reason that this page is unmapped from the qemu memory space?
> > > > Specifically what do these lines do:
> > > >     int offset = r_dev->msix_table_addr - real_region->base_addr;
> > > >     ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
> > >
> > > I believe this allows accesses to this page (the MSI-X table), which
> > > is part of the guest address space (through kvm memory slots), to be
> > > trapped by qemu.
> > >
> > > Since there is no actual page at this guest address, KVM treats accesses
> > > as MMIO and forwards them to QEMU.
> > >
> >
> > I thought about this too.
> > But why is this necessary for assigned MSI-X but not for emulated devices
> > such as e1000? All e1000 seems to do is cpu_register_physical_memory ...
>
> Because there is no registered (kvm) memory slot for the range in which
> e1000 registers its MMIO?

    ret = kvm_register_phys_mem(kvm_context, e_phys,
                                region->u.r_virtbase,
                                TARGET_PAGE_ALIGN(e_size), 0);
is what creates this slot, correct?

> Not sure about the address of the MSI-X table
> page, but you could achieve the same effect by splitting the slot in
> which it lives in two, with a 1-page hole between them.
>
> BTW this is why you can't map the MSI-X table page directly, you want
> accesses to be trapped.

-- 
MST

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-05-05 11:45 ` Michael S. Tsirkin
@ 2009-05-05 11:51 ` Marcelo Tosatti
  0 siblings, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2009-05-05 11:51 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Sheng Yang, Avi Kivity, kvm

On Tue, May 05, 2009 at 02:45:38PM +0300, Michael S. Tsirkin wrote:
> On Tue, May 05, 2009 at 07:49:10AM -0300, Marcelo Tosatti wrote:
> > On Tue, May 05, 2009 at 01:34:50PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, May 05, 2009 at 07:19:45AM -0300, Marcelo Tosatti wrote:
> > > > On Tue, May 05, 2009 at 12:51:36PM +0300, Michael S. Tsirkin wrote:
> > > > > On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote:
> > > > > > > > > > If the guest could write to the real device MSI-X table directly,
> > > > > > > > > > it would cause chaos in interrupt delivery, because what the guest
> > > > > > > > > > sees is totally different from what the host sees...
> > > > > > > > >
> > > > > > > > > Obviously.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > >
> > > > >
> > > > > What's the reason that this page is unmapped from the qemu memory space?
> > > > > Specifically what do these lines do:
> > > > >     int offset = r_dev->msix_table_addr - real_region->base_addr;
> > > > >     ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
> > > >
> > > > I believe this allows accesses to this page (the MSI-X table), which
> > > > is part of the guest address space (through kvm memory slots), to be
> > > > trapped by qemu.
> > > >
> > > > Since there is no actual page at this guest address, KVM treats accesses
> > > > as MMIO and forwards them to QEMU.
> > > >
> > >
> > > I thought about this too.
> > > But why is this necessary for assigned MSI-X but not for emulated devices
> > > such as e1000? All e1000 seems to do is cpu_register_physical_memory ...
> >
> > Because there is no registered (kvm) memory slot for the range in which
> > e1000 registers its MMIO?
>
>     ret = kvm_register_phys_mem(kvm_context, e_phys,
>                                 region->u.r_virtbase,
>                                 TARGET_PAGE_ALIGN(e_size), 0);
> is what creates this slot, correct?

Yes, think so. Now I remember: you map the assigned PCI device memory
to the guest, but need to intercept only the MSI-X table.

>
> > Not sure about the address of the MSI-X table
> > page, but you could achieve the same effect by splitting the slot in
> > which it lives in two, with a 1-page hole between them.
> >
> > BTW this is why you can't map the MSI-X table page directly, you want
> > accesses to be trapped.
>
> --
> MST

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-05-05 10:49 ` Marcelo Tosatti
  2009-05-05 11:45   ` Michael S. Tsirkin
@ 2009-05-05 12:46 ` Michael S. Tsirkin
  2009-05-06  2:35   ` Sheng Yang
  1 sibling, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2009-05-05 12:46 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Sheng Yang, Avi Kivity, kvm

On Tue, May 05, 2009 at 07:49:10AM -0300, Marcelo Tosatti wrote:
> On Tue, May 05, 2009 at 01:34:50PM +0300, Michael S. Tsirkin wrote:
> > On Tue, May 05, 2009 at 07:19:45AM -0300, Marcelo Tosatti wrote:
> > > On Tue, May 05, 2009 at 12:51:36PM +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote:
> > > > > > > > > If the guest could write to the real device MSI-X table directly,
> > > > > > > > > it would cause chaos in interrupt delivery, because what the guest
> > > > > > > > > sees is totally different from what the host sees...
> > > > > > > >
> > > > > > > > Obviously.
> > > > > > > >
> > > > > > > > Thanks,
> > > > >
> > > >
> > > > What's the reason that this page is unmapped from the qemu memory space?
> > > > Specifically what do these lines do:
> > > >     int offset = r_dev->msix_table_addr - real_region->base_addr;
> > > >     ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
> > >
> > > I believe this allows accesses to this page (the MSI-X table), which
> > > is part of the guest address space (through kvm memory slots), to be
> > > trapped by qemu.
> > >
> > > Since there is no actual page at this guest address, KVM treats accesses
> > > as MMIO and forwards them to QEMU.
> > >
> >
> > I thought about this too.
> > But why is this necessary for assigned MSI-X but not for emulated devices
> > such as e1000? All e1000 seems to do is cpu_register_physical_memory ...
>
> Because there is no registered (kvm) memory slot for the range in which
> e1000 registers its MMIO? Not sure about the address of the MSI-X table
> page, but you could achieve the same effect by splitting the slot in
> which it lives in two, with a 1-page hole between them.

You could also move the emulated MSI-X table, sticking it on top of the
existing BAR. Since PCI config includes the pointer to the table,
a driver that reads this pointer will continue to work.

Of course, there's no guarantee that guest drivers don't just hard-code
this offset.

> BTW this is why you can't map the MSI-X table page directly, you want
> accesses to be trapped.

BTW the current design won't work if the base page size is > 4K, will it?
The hole covers a page, so you'll get faults outside the MSI-X table.

-- 
MST

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: qemu/hw/device-assignment: questions about msix_table_page
  2009-05-05 12:46 ` Michael S. Tsirkin
@ 2009-05-06  2:35 ` Sheng Yang
  2009-05-06  7:31   ` Michael S. Tsirkin
  0 siblings, 1 reply; 18+ messages in thread
From: Sheng Yang @ 2009-05-06 2:35 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Marcelo Tosatti, Avi Kivity, kvm

On Tuesday 05 May 2009 20:46:04 Michael S. Tsirkin wrote:
> On Tue, May 05, 2009 at 07:49:10AM -0300, Marcelo Tosatti wrote:
> > On Tue, May 05, 2009 at 01:34:50PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, May 05, 2009 at 07:19:45AM -0300, Marcelo Tosatti wrote:
> > > > On Tue, May 05, 2009 at 12:51:36PM +0300, Michael S. Tsirkin wrote:
> > > > > On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote:
> > > > > > > > > > If the guest could write to the real device MSI-X table directly,
> > > > > > > > > > it would cause chaos in interrupt delivery, because what the guest
> > > > > > > > > > sees is totally different from what the host sees...
> > > > > > > > >
> > > > > > > > > Obviously.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > >
> > > > > What's the reason that this page is unmapped from the qemu memory space?
> > > > > Specifically what do these lines do:
> > > > >     int offset = r_dev->msix_table_addr - real_region->base_addr;
> > > > >     ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
> > > >
> > > > I believe this allows accesses to this page (the MSI-X table), which
> > > > is part of the guest address space (through kvm memory slots), to be
> > > > trapped by qemu.
> > > >
> > > > Since there is no actual page at this guest address, KVM treats accesses
> > > > as MMIO and forwards them to QEMU.
> > >
> > > I thought about this too.
> > > But why is this necessary for assigned MSI-X but not for emulated devices
> > > such as e1000? All e1000 seems to do is cpu_register_physical_memory ...
> >
> > Because there is no registered (kvm) memory slot for the range in which
> > e1000 registers its MMIO? Not sure about the address of the MSI-X table
> > page, but you could achieve the same effect by splitting the slot in
> > which it lives in two, with a 1-page hole between them.
>
> You could also move the emulated MSI-X table, sticking it on top of the
> existing BAR. Since PCI config includes the pointer to the table,
> a driver that reads this pointer will continue to work.

One BAR can contain more than just an MSI-X table... The PCI spec only
says the other information should be page aligned and can't be in the
same page as the MSI-X table (except the PBA). I think this method makes
things more complicated; we don't want to, and can't, trap other
information in the same BAR...

> Of course, there's no guarantee that guest drivers don't just hard-code
> this offset.

I think this mostly won't happen.

>
> > BTW this is why you can't map the MSI-X table page directly, you want
> > accesses to be trapped.
>
> BTW the current design won't work if the base page size is > 4K, will it?
> The hole covers a page, so you'll get faults outside the MSI-X table.

Yes. One MSI-X entry is 16 bytes, so one 4K page can hold 256 entries.
Well, I haven't seen a device with more than 100 entries, but given this
limitation, maybe we should temporarily limit MSI-X max entries to 256
(rather than the current 512)...

-- 
regards
Yang, Sheng

^ permalink raw reply [flat|nested] 18+ messages in thread
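The entry-count arithmetic above can be written down as a small sketch. The struct and function names are hypothetical; the 16-byte entry layout (message address low/high, message data, vector control) is the standard MSI-X one. With 4096-byte pages, 4096 / 16 = 256 entries fit in one page, so a 512-entry table needs two pages and no longer fits in a single-page hole.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One MSI-X table entry: four 32-bit fields, 16 bytes in total. */
struct msix_entry_layout {
    uint32_t addr_lo;       /* message address, lower 32 bits */
    uint32_t addr_hi;       /* message address, upper 32 bits */
    uint32_t data;          /* message data */
    uint32_t vector_ctrl;   /* bit 0 is the per-vector mask bit */
};

static_assert(sizeof(struct msix_entry_layout) == 16,
              "an MSI-X table entry is 16 bytes");

/* Number of whole entries that fit in one page. */
size_t msix_entries_per_page(size_t page_size)
{
    return page_size / sizeof(struct msix_entry_layout);
}

/* Number of pages needed to hold nentries table entries (rounded up). */
size_t msix_table_pages(size_t nentries, size_t page_size)
{
    size_t bytes = nentries * sizeof(struct msix_entry_layout);
    return (bytes + page_size - 1) / page_size;
}
```

For example, msix_table_pages(100, 4096) is 1 and msix_table_pages(512, 4096) is 2, which is the case the proposed 256-entry limit is meant to avoid.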
* Re: qemu/hw/device-assignment: questions about msix_table_page 2009-05-06 2:35 ` Sheng Yang @ 2009-05-06 7:31 ` Michael S. Tsirkin 2009-05-06 8:17 ` Sheng Yang 0 siblings, 1 reply; 18+ messages in thread From: Michael S. Tsirkin @ 2009-05-06 7:31 UTC (permalink / raw) To: Sheng Yang; +Cc: Marcelo Tosatti, Avi Kivity, kvm On Wed, May 06, 2009 at 10:35:27AM +0800, Sheng Yang wrote: > On Tuesday 05 May 2009 20:46:04 Michael S. Tsirkin wrote: > > On Tue, May 05, 2009 at 07:49:10AM -0300, Marcelo Tosatti wrote: > > > On Tue, May 05, 2009 at 01:34:50PM +0300, Michael S. Tsirkin wrote: > > > > On Tue, May 05, 2009 at 07:19:45AM -0300, Marcelo Tosatti wrote: > > > > > On Tue, May 05, 2009 at 12:51:36PM +0300, Michael S. Tsirkin wrote: > > > > > > On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote: > > > > > > > > > > > If guest can write to the real device MSI-X table > > > > > > > > > > > directly, it would cause chaos on interrupt delivery, for > > > > > > > > > > > what guest see is totally different with what's host > > > > > > > > > > > see... > > > > > > > > > > > > > > > > > > > > Obviously. > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > What's the reason that this page is unmapped from the qemu memory > > > > > > space? Specifically what do these lines do: > > > > > > int offset = r_dev->msix_table_addr - > > > > > > real_region->base_addr; ret = munmap(region->u.r_virtbase + offset, > > > > > > TARGET_PAGE_SIZE); > > > > > > > > > > I believe this allows accesses to this page (the MSI-X table), which > > > > > is part of the guest address space (through kvm memory slots), to be > > > > > trapped by qemu. > > > > > > > > > > Since there is no actual page in this guest address, KVM treats > > > > > accesses as MMIO and forwards them to QEMU. > > > > > > > > I thought about this too. > > > > But why is this necessary for assigned MSI-X but not for emulated > > > > devices such as e.g. e1000? 
All e1000 does seems to be > > > > cpu_register_physical_memory ... > > > > > > Because there is no registered (kvm) memory slot for the range which > > > e1000 registers its MMIO? Not sure about the address of the MSI-X table > > > page, but you could achieve the same effect by splitting the slot which > > > it lives in two, with a 1 page hole between them. > > > > You could also move the emulated MSI-X table, sticking it on top of the > > existing BAR. Since PCI config includes the pointer to the table, > > a driver that reads this pointer will continue to work. > > One BAR can contain more than a MSI-X table... The PCI spec only said the > other information should be page aligned and can't in the same page of MSI-X > table(except PBA). I think this method make thing more complicate, we don't > want to and can't trap other informations in the same BAR... The trick I was suggesting was increasing the BAR size. Let's assume we have real BAR of size 1Mbyte and MSI-X table at offset 0. We report to guest BAR of size 2Mbyte and MSI-X table offset 1MByte. Trap all accesses 1MByte to 2MByte and copy them to MSI-X table. > > Of course, there's no guarantee that guest drivers don't just hard-code > > this offset. > > I think this mostly won't happen. > > > > > BTW this is why you can't map the MSI-X table page directly, you want > > > accesses to be trapped. > > > > BTW current design won't work if the base page size is > 4K, will it? > > The hole covers a page, so you'll get faults outside the MSI-X table. > > Yes. One entry for MSI-X is 16bytes, one page can contain 256 entries. Well, I > haven't see a device get more than 100 entries, but for this limitation, maybe > we should limit MSI-X max entries to 256 (rather than 512 entries > now)temporarily... Drivers might not have a clean fallback path if the number of entries becomes smaller. Another problem is if TARGET_PAGE_SIZE is > 4K. 
PCI spec only asks devices to reserve 4K of space for the table, so you will accidentally trap accesses not related to MSI-X. -- MST ^ permalink raw reply [flat|nested] 18+ messages in thread
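The BAR-doubling trick proposed in the message above can be sketched as an offset decode. This is only an illustration under stated assumptions: `struct shadow_bar` and both helpers are hypothetical names, and the sketch does not say what should happen to accesses that hit the real table's location in the lower half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of an assigned BAR whose guest-visible size
 * is doubled so the emulated MSI-X table can live in the upper half. */
struct shadow_bar {
    uint64_t real_size; /* size of the physical BAR, e.g. 1 MiB */
};

/* Size reported to the guest: twice the real BAR size. */
static inline uint64_t shadow_bar_guest_size(const struct shadow_bar *bar)
{
    return 2 * bar->real_size;
}

/* Returns true if guest_off falls in the upper (trapped) half and stores
 * the offset into the emulated MSI-X table in *table_off. Returns false
 * for the lower half, which would be passed through to the device. */
static inline bool shadow_bar_is_trapped(const struct shadow_bar *bar,
                                         uint64_t guest_off,
                                         uint64_t *table_off)
{
    assert(guest_off < shadow_bar_guest_size(bar));
    if (guest_off < bar->real_size) {
        return false;
    }
    *table_off = guest_off - bar->real_size;
    return true;
}
```

For the 1 MByte example in the message, the guest would see a 2 MByte BAR with the table pointer at offset 1 MByte, and only offsets in the upper half would be trapped.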
* Re: qemu/hw/device-assignment: questions about msix_table_page 2009-05-06 7:31 ` Michael S. Tsirkin @ 2009-05-06 8:17 ` Sheng Yang 0 siblings, 0 replies; 18+ messages in thread From: Sheng Yang @ 2009-05-06 8:17 UTC (permalink / raw) To: Michael S. Tsirkin; +Cc: Marcelo Tosatti, Avi Kivity, kvm On Wednesday 06 May 2009 15:31:18 Michael S. Tsirkin wrote: > On Wed, May 06, 2009 at 10:35:27AM +0800, Sheng Yang wrote: > > On Tuesday 05 May 2009 20:46:04 Michael S. Tsirkin wrote: > > > On Tue, May 05, 2009 at 07:49:10AM -0300, Marcelo Tosatti wrote: > > > > On Tue, May 05, 2009 at 01:34:50PM +0300, Michael S. Tsirkin wrote: > > > > > On Tue, May 05, 2009 at 07:19:45AM -0300, Marcelo Tosatti wrote: > > > > > > On Tue, May 05, 2009 at 12:51:36PM +0300, Michael S. Tsirkin wrote: > > > > > > > On Mon, Apr 27, 2009 at 10:30:17PM +0800, Sheng Yang wrote: > > > > > > > > > > > > If guest can write to the real device MSI-X table > > > > > > > > > > > > directly, it would cause chaos on interrupt delivery, > > > > > > > > > > > > for what guest see is totally different with what's > > > > > > > > > > > > host see... > > > > > > > > > > > > > > > > > > > > > > Obviously. > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > What's the reason that this page is unmapped from the qemu > > > > > > > memory space? Specifically what do these lines do: > > > > > > > int offset = r_dev->msix_table_addr - > > > > > > > real_region->base_addr; ret = munmap(region->u.r_virtbase + > > > > > > > offset, TARGET_PAGE_SIZE); > > > > > > > > > > > > I believe this allows accesses to this page (the MSI-X table), > > > > > > which is part of the guest address space (through kvm memory > > > > > > slots), to be trapped by qemu. > > > > > > > > > > > > Since there is no actual page in this guest address, KVM treats > > > > > > accesses as MMIO and forwards them to QEMU. > > > > > > > > > > I thought about this too. 
> > > > > But why is this necessary for assigned MSI-X but not for emulated > > > > > devices such as e.g. e1000? All e1000 does seems to be > > > > > cpu_register_physical_memory ... > > > > > > > > Because there is no registered (kvm) memory slot for the range which > > > > e1000 registers its MMIO? Not sure about the address of the MSI-X > > > > table page, but you could achieve the same effect by splitting the > > > > slot which it lives in two, with a 1 page hole between them. > > > > > > You could also move the emulated MSI-X table, sticking it on top of the > > > existing BAR. Since PCI config includes the pointer to the table, > > > a driver that reads this pointer will continue to work. > > > > One BAR can contain more than a MSI-X table... The PCI spec only said the > > other information should be page aligned and can't in the same page of > > MSI-X table(except PBA). I think this method make thing more complicate, > > we don't want to and can't trap other informations in the same BAR... > > The trick I was suggesting was increasing the BAR size. > Let's assume we have real BAR of size 1Mbyte and MSI-X table at offset 0. > We report to guest BAR of size 2Mbyte and MSI-X table offset 1MByte. > Trap all accesses 1MByte to 2MByte and copy them to MSI-X table. Oh, yeah, understood. I use the current method just because it's simple... > > > Of course, there's no guarantee that guest drivers don't just hard-code > > > this offset. > > > > I think this mostly won't happen. > > > > > > BTW this is why you can't map the MSI-X table page directly, you want > > > > accesses to be trapped. > > > > > > BTW current design won't work if the base page size is > 4K, will it? > > > The hole covers a page, so you'll get faults outside the MSI-X table. > > > > Yes. One entry for MSI-X is 16bytes, one page can contain 256 entries. 
> > Well, I haven't see a device get more than 100 entries, but for this > > limitation, maybe we should limit MSI-X max entries to 256 (rather than > > 512 entries now)temporarily... > > Drivers might not have a clean fallback path if the number of entries > becomes smaller. The biggest one I have seen so far is an Oplin card; it has 2 MSI-X vectors per CPU plus one (i.e. 17 vectors on an 8-core machine)... And if a driver doesn't have a clean fallback, that's the driver's problem... > > Another problem is if TARGET_PAGE_SIZE is > 4K. > PCI spec only asks devices to reserve 4K of space for the table, > so you will accidentally trapping accesses not related to MSI-X. Yes, this should be fixed... -- regards Yang, Sheng ^ permalink raw reply [flat|nested] 18+ messages in thread
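The TARGET_PAGE_SIZE > 4K problem acknowledged above can be made concrete with a short sketch. It assumes the trapped hole is one page containing the table offset and that the device reserves only 4K for the table; `struct trap_range` and the helpers are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end) within a BAR. */
struct trap_range {
    uint64_t start;
    uint64_t end;
};

/* Hypothetical helper: the one-page hole that contains table_off.
 * page_size must be a power of two. */
static inline struct trap_range page_hole_for(uint64_t table_off,
                                              uint64_t page_size)
{
    struct trap_range r;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    r.start = table_off & ~(page_size - 1);
    r.end = r.start + page_size;
    return r;
}

/* Bytes inside the hole that lie outside the space reserved for the
 * table, i.e. accesses trapped although they are unrelated to MSI-X. */
static inline uint64_t overtrapped_bytes(uint64_t page_size, uint64_t reserved)
{
    return page_size > reserved ? page_size - reserved : 0;
}
```

With a 4K page nothing outside the reserved space is trapped, while a 64K page would trap 60K of unrelated BAR space around the table.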
* Re: qemu/hw/device-assignment: questions about msix_table_page 2009-04-27 14:03 ` Sheng Yang 2009-04-27 14:15 ` Michael S. Tsirkin @ 2009-04-28 9:31 ` Avi Kivity 1 sibling, 0 replies; 18+ messages in thread From: Avi Kivity @ 2009-04-28 9:31 UTC (permalink / raw) To: Sheng Yang; +Cc: Michael S. Tsirkin, Marcelo Tosatti, kvm Sheng Yang wrote: >>> msix_table_page is a page, and mmap allocate memory on page boundary. So >>> I use it. >>> >> Just wondering, would e.g. posix_memalign work here as well? >> > > Um, I think it should work too. > I think qemu_malloc() would work just as well. The hardware never sees the page, so it doesn't need to be aligned. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. ^ permalink raw reply [flat|nested] 18+ messages in thread
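For comparison with the mmap approach, a page-aligned buffer can also be obtained with posix_memalign, as asked earlier in the thread. The sketch below is a minimal illustration, not the code in device-assignment.c; `alloc_table_page` is a hypothetical name, and as noted above alignment is not strictly needed when the hardware never sees the page.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: allocate one zeroed, page-aligned page for a
 * shadow MSI-X table. Returns NULL on failure; release with free(). */
static void *alloc_table_page(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *page = NULL;

    if (page_size <= 0) {
        return NULL;
    }
    /* posix_memalign returns 0 on success, an error number otherwise. */
    if (posix_memalign(&page, (size_t)page_size, (size_t)page_size) != 0) {
        return NULL;
    }
    assert((uintptr_t)page % (uintptr_t)page_size == 0);
    memset(page, 0, (size_t)page_size);
    return page;
}
```

Unlike an anonymous mmap, the result is released with free() rather than munmap(), and the memory has to be zeroed explicitly.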
* qemu/hw/device-assignment: questions about msix_table_page @ 2009-04-27 13:13 Michael S. Tsirkin 0 siblings, 0 replies; 18+ messages in thread From: Michael S. Tsirkin @ 2009-04-27 13:13 UTC (permalink / raw) To: Sheng Yang; +Cc: Avi Kivity, Marcelo Tosatti, kvm Sheng, Marcelo, I've been reading code in qemu/hw/device-assignment.c, and I have a couple of questions about msi-x implementation: 1. What is the reason that msix_table_page is allocated with mmap and not with e.g. malloc? 2. msix_table_page has the guest view of the msix table for the device. However, even this memory isn't mapped into guest directly, instead msix_mmio_read/msix_mmio_write perform the write in qemu. Won't it be possible to map this page directly into guest memory, reducing the overhead for table writes? Could you shed light on this for me please? Thanks, ( ------) ( Resending with a sane subject/reply-to address. ) ( Sorry about multiple copies.) -- MST ^ permalink raw reply [flat|nested] 18+ messages in thread
Thread overview: 18+ messages
[not found] <20090427104117.GB29082@redhat.com>
2009-04-27 13:16 ` Sheng Yang
2009-04-27 13:51 ` qemu/hw/device-assignment: questions about msix_table_page Michael S. Tsirkin
2009-04-27 14:03 ` Sheng Yang
2009-04-27 14:15 ` Michael S. Tsirkin
2009-04-27 14:30 ` Sheng Yang
2009-04-27 14:35 ` Michael S. Tsirkin
2009-05-05 9:51 ` Michael S. Tsirkin
2009-05-05 10:19 ` Marcelo Tosatti
2009-05-05 10:34 ` Michael S. Tsirkin
2009-05-05 10:49 ` Marcelo Tosatti
2009-05-05 11:45 ` Michael S. Tsirkin
2009-05-05 11:51 ` Marcelo Tosatti
2009-05-05 12:46 ` Michael S. Tsirkin
2009-05-06 2:35 ` Sheng Yang
2009-05-06 7:31 ` Michael S. Tsirkin
2009-05-06 8:17 ` Sheng Yang
2009-04-28 9:31 ` Avi Kivity
2009-04-27 13:13 Michael S. Tsirkin