* IOTLB page size question
@ 2014-10-13 20:50 Jan Sacha
  2014-10-13 21:41 ` Alex Williamson
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Sacha @ 2014-10-13 20:50 UTC (permalink / raw)
  To: kvm

Hi,

I have a question about the IOTLB. We are running KVM/QEMU VMs with huge 
page memory backing on Intel Xeon Ivy Bridge machines. Our VMs use 10G 
Ethernet NICs assigned in Intel VT-d mode. We see that the IOTLB becomes a 
performance bottleneck when the IOMMU uses 4k pages, and we get much better 
packet throughput with 2M IOTLB pages.
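
For a rough sense of why the page size matters so much for us, here is a 
back-of-envelope sketch only, assuming a purely hypothetical 64-entry 
IOTLB (I don't know the real entry count, it is implementation specific):

# iotlb_coverage.py - how much DMA address space the IOTLB can cover
ENTRIES = 64  # hypothetical number of IOTLB entries
for label, page_size in [("4k pages", 4 * 1024), ("2M pages", 2 * 1024 * 1024)]:
    coverage = ENTRIES * page_size
    print("%s: %.2f MiB covered without an IOTLB miss" % (label, coverage / 2.0**20))

With 4k pages the same number of entries covers 512 times less memory, so 
DMA into a large packet buffer pool misses the IOTLB constantly.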

We have tried two different Linux distributions (CentOS and Fedora). An 
older CentOS kernel maps everything using 4k IOTLB pages. Our newer 
Fedora kernel initially maps guest memory using 2M IOTLB pages, but we 
see that a couple of seconds later it remaps the first 0xE000 pages 
(3.75GB) of memory using 4k IOTLB pages. We still have 2M IOTLB page 
mappings for memory above 4GB.

Why would the kernel change the IOTLB page size from 2M to 4k? How can 
we make sure that all memory (except for some non-aligned bits) gets 
mapped using 2M IOTLB pages? As I mentioned, we are using huge-page 
memory backing for all our VMs. Any advice, including suggestions for 
debugging and further diagnosis, would be appreciated.

Jan


* Re: IOTLB page size question
  2014-10-13 20:50 IOTLB page size question Jan Sacha
@ 2014-10-13 21:41 ` Alex Williamson
  2014-10-13 23:16   ` Jan Sacha
  0 siblings, 1 reply; 4+ messages in thread
From: Alex Williamson @ 2014-10-13 21:41 UTC (permalink / raw)
  To: Jan Sacha; +Cc: kvm

On Mon, 2014-10-13 at 13:50 -0700, Jan Sacha wrote:
> Hi,
> 
> I have a question about the IOTLB. We are running KVM/QEMU VMs with huge 
> page memory backing on Intel Xeon Ivy Bridge machines. Our VMs use 10G 
> Ethernet NICs assigned in Intel VT-d mode. We see that the IOTLB becomes a 
> performance bottleneck when the IOMMU uses 4k pages, and we get much better 
> packet throughput with 2M IOTLB pages.
> 
> We have tried two different Linux distributions (CentOS and Fedora). An 

This doesn't really help narrow down your kernel version.

> older CentOS kernel maps everything using 4k IOTLB pages. Our newer 
> Fedora kernel initially maps guest memory using 2M IOTLB pages, but we 
> see that a couple of seconds later it remaps the first 0xE000 pages 
> (3.75GB) of memory using 4k IOTLB pages. We still have 2M IOTLB page 
> mappings for memory above 4GB.

0xe0000 pages?  0xe000 pages isn't 3.75G.

> 
> Why would the kernel change the IOTLB page size from 2M to 4k? How can 
> we make sure that all memory (except for some non-aligned bits) gets 
> mapped using 2M IOTLB pages? As I mentioned, we are using huge-page 
> memory backing for all our VMs. Any advice, including suggestions for 
> debugging and further diagnosis, would be appreciated.

Legacy KVM device assignment maps IOMMU pages using the host kernel page
size for the region, while VFIO will pass the largest contiguous range of
pages available to the IOMMU, regardless of kernel page size.  If VFIO
doesn't have the same problem, then perhaps the kernel's idea of the page
size for that region has changed between mappings.  Thanks,

Alex



* Re: IOTLB page size question
  2014-10-13 21:41 ` Alex Williamson
@ 2014-10-13 23:16   ` Jan Sacha
  2014-10-17 21:01     ` Jan Sacha
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Sacha @ 2014-10-13 23:16 UTC (permalink / raw)
  To: kvm


On 10/13/2014 02:41 PM, Alex Williamson wrote:
> On Mon, 2014-10-13 at 13:50 -0700, Jan Sacha wrote:
>> We have tried two different Linux distributions (CentOS and Fedora)...
> This doesn't really help narrow your kernel version.
The Fedora kernel was 3.11.10-100.fc18 and the CentOS kernel was an older 
2.6.32-431.el6.

> 0xe0000 pages? 0xe000 pages isn't 3.75G. 
Okay, never mind the 0xe000. I meant 3.75GB of memory.

> Legacy KVM device assignment maps IOMMU pages using the host kernel page
> size for the region, while VFIO will pass the largest contiguous range of
> pages available to the IOMMU, regardless of kernel page size.  If VFIO
> doesn't have the same problem, then perhaps the kernel's idea of the page
> size for that region has changed between mappings.  Thanks,
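If I understand the difference correctly, it is roughly like the following 
simplified sketch (a userspace illustration only, not the actual kernel 
code; the function names are made up):

# map_sketch.py - page-by-page mapping vs. coalescing contiguous pages
def map_page_by_page(pfns, page_size=4096):
    # One IOMMU mapping per kernel-sized page: only 4k IOTLB entries result.
    return [(pfn * page_size, page_size) for pfn in pfns]

def map_coalesced(pfns, page_size=4096):
    # VFIO-style: merge physically contiguous pages into one large mapping,
    # so the IOMMU can use superpages (e.g. 2M) where alignment allows.
    mappings = []
    start = prev = pfns[0]
    for pfn in pfns[1:]:
        if pfn != prev + 1:  # contiguity broken, flush the current run
            mappings.append((start * page_size, (prev - start + 1) * page_size))
            start = pfn
        prev = pfn
    mappings.append((start * page_size, (prev - start + 1) * page_size))
    return mappings

# A physically contiguous 2M chunk (512 x 4k pages) plus one stray page.
pfns = list(range(0x80000, 0x80000 + 512)) + [0x90000]
print(len(map_page_by_page(pfns)), "mappings page by page")   # 513
print(len(map_coalesced(pfns)), "mappings when coalesced")    # 2

In that model the contiguous run can be backed by a single 2M IOTLB entry, 
while the page-by-page version forces 512 separate 4k entries.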
When I have a VM running, I see the following on the host OS:

# cat /proc/1360/numa_maps
2aaaaac00000 prefer:0 
file=/dev/hugepages/libvirt/qemu/qemu_back_mem.DO2GNu\040(deleted) huge 
dirty=3070 N0=3070
...

# cat /proc/1360/maps
2aaaaac00000-2aac2a800000 rw-s 00000000 00:1e 19928 
/dev/hugepages/libvirt/qemu/qemu_back_mem.DO2GNu (deleted)
...

So it looks to me like roughly 6GB should be mapped using huge 
(2M) pages. PID 1360 is the QEMU process. The VM is configured to use 
6GB. However, the IOTLB seems to be using 4k pages for all memory below 
4GB in the guest physical space. It does use 2M pages for memory above 
4GB. Does this make sense?
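
For reference, the arithmetic behind my 6GB estimate, using the numbers 
from the /proc output above:

# mapping_size.py - sanity-check the hugetlbfs mapping size
start, end = 0x2aaaaac00000, 0x2aac2a800000      # range from /proc/1360/maps
print((end - start) / 2.0**30, "GiB in the hugetlbfs mapping")   # ~6.0 GiB
print(3070 * 2 / 1024.0, "GiB in dirty 2M pages (numa_maps)")    # ~6.0 GiB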

Thanks,

Jan


* Re: IOTLB page size question
  2014-10-13 23:16   ` Jan Sacha
@ 2014-10-17 21:01     ` Jan Sacha
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Sacha @ 2014-10-17 21:01 UTC (permalink / raw)
  To: kvm


On 10/13/2014 04:16 PM, Jan Sacha wrote:
>> Legacy KVM device assignment maps IOMMU pages using the host kernel page
>> size for the region, while VFIO will pass the largest contiguous range of
>> pages available to the IOMMU, regardless of kernel page size. If VFIO
>> doesn't have the same problem, then perhaps the kernel's idea of the page
>> size for that region has changed between mappings.  Thanks,
> When I have a VM running, I see the following on the host OS:
>
> # cat /proc/1360/numa_maps
> 2aaaaac00000 prefer:0 
> file=/dev/hugepages/libvirt/qemu/qemu_back_mem.DO2GNu\040(deleted) 
> huge dirty=3070 N0=3070
> ...
>
> # cat /proc/1360/maps
> 2aaaaac00000-2aac2a800000 rw-s 00000000 00:1e 19928 
> /dev/hugepages/libvirt/qemu/qemu_back_mem.DO2GNu (deleted)
> ...
>
> So it looks to me like roughly 6GB should be mapped using huge 
> (2M) pages. PID 1360 is the QEMU process. The VM is configured to use 
> 6GB. However, the IOTLB seems to be using 4k pages for all memory 
> below 4GB in the guest physical space. It does use 2M pages for memory 
> above 4GB. Does this make sense?

We found a solution, so I can answer my own question. This behavior was 
caused by a bug in the kernel, which was fixed in 3.13:

https://github.com/torvalds/linux/commit/e0230e1327fb862c9b6cde24ae62d55f9db62c9b

Jan



