From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tomasz Wroblewski
Subject: GPU passthrough performance regression in >4GB vms due to XSA-60 changes
Date: Thu, 15 May 2014 11:11:05 +0200
Message-ID: <537484A9.9000001@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
To: xen-devel@lists.xenproject.org
Cc: jinsong.liu@intel.com, JBeulich@suse.com
List-Id: xen-devel@lists.xenproject.org

Hello,

We recently updated from Xen 4.3.1 to 4.3.2 and found a major regression in GPU passthrough performance in VMs using more than 4GB of memory. With GPU passthrough (some Radeon cards, and also the integrated Intel GPU), CPU load is constantly near maximum and the screen is slow to update. The machines are Intel Haswell/Ivy Bridge laptops and desktops; the guests are Windows 7 64-bit HVMs.

I've bisected the failure to the XSA-60 changes, specifically:

commit e81d0ac25464825b3828cff5dc9e8285612992c4
Author: Liu Jinsong
Date: Mon Dec 9 14:26:03 2013 +0100

    VMX: remove the problematic set_uc_mode logic

This commit seems to have removed a piece of logic which, when the guest set the cache disable bit in CR0 for a brief time, iterated over all mapped pfns and reset the memory type in the EPT entries to be consistent with the result of the mtrr.c:epte_get_entry_emt() call. I believe my tracing indicates this used to return the WRITEBACK memory type for the 64-bit memory areas where the GPU's BARs seem to be located.
This code no longer runs, and I speculate that the PCI BAR area therefore stays uncached, which causes the general slowness. Note that I'm not talking about slow performance during the window in which CR0 has caching disabled: it stays slow even after the guest re-enables caching shortly afterwards, since the problem seems to be a side effect of the removed loop, which set some default EPT policies on all pfns. Reintroducing the removed loop fixes the problem.

I would welcome comments or ideas on how to debug this further, or maybe there's an obvious fix.
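
To make the hypothesis concrete, here is a toy model of the behaviour described above. It is not Xen code: every name in it is made up for illustration, the starting uncacheable state of the BAR pfns is an assumption that matches the speculation in this mail, and recompute_type() merely stands in for mtrr.c:epte_get_entry_emt(), assumed to return writeback as the tracing suggested.

```python
# Toy model only: none of these names are Xen APIs.
UC = "uncacheable"
WB = "writeback"


def recompute_type(pfn):
    """Hypothetical stand-in for mtrr.c:epte_get_entry_emt().

    Assumption taken from the mail: the call returns writeback for the
    pfns in the 64-bit area that holds the GPU BARs (and, in this toy,
    for every other pfn as well).
    """
    return WB


def resync_all_entries(ept, recompute=recompute_type):
    """Walk every mapped pfn and reset its memory type where it differs.

    Returns the number of entries that were rewritten.
    """
    changed = 0
    for pfn in sorted(ept):
        wanted = recompute(pfn)
        if ept[pfn] != wanted:
            ept[pfn] = wanted
            changed += 1
    return changed


def make_guest_ept():
    """Hypothetical guest: two RAM pfns below 4GB and two BAR pfns above it.

    With 4KB pages, 4GB corresponds to pfn 0x100000. The BAR pfns start
    out uncacheable, which is an assumption, not a traced fact.
    """
    return {0x1000: WB, 0x1001: WB, 0x100000: UC, 0x100001: UC}


# Without the walk, the BAR entries simply keep whatever type they had.
ept = make_guest_ept()
# With the walk (the behaviour the commit removed), they are rewritten.
changed = resync_all_entries(ept)
```

In this model only the two BAR entries are rewritten, and a second walk changes nothing. That is consistent with the slowness persisting after the guest re-enables caching: once the walk is gone, nothing else revisits those entries.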