From: Anthony Liguori
Subject: Re: vram_dirty vs. shadow paging dirty tracking
Date: Tue, 13 Mar 2007 16:30:22 -0500
Message-ID: <45F717EE.5040900@us.ibm.com>
References: <45F6FC68.3040207@us.ibm.com> <8A87A9A84C201449A0C56B728ACF491E0B9DBF@liverpoolst.ad.cl.cam.ac.uk>
In-Reply-To: <8A87A9A84C201449A0C56B728ACF491E0B9DBF@liverpoolst.ad.cl.cam.ac.uk>
To: Ian Pratt
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Ian Pratt wrote:
>> When thinking about multithreading the device model, it occurred to me
>> that it's a little odd that we're doing a memcmp to determine which
>> portions of the VRAM have changed. Couldn't we just use dirty page
>> tracking in the shadow paging code? That should significantly lower
>> the overhead of this, plus I believe the infrastructure is already
>> mostly there in the shadow2 code.
>
> Yep, it's been in the roadmap doc for quite a while. However, the log
> dirty code isn't ideal for this. We'd need to extend it to enable it to
> be turned on for just a subset of the GFN range (we could use a xen
> rangeset for this).

Okay, I was curious if the log dirty stuff could do ranges. I guess not.

> Even so, I'm not super keen on the idea of tearing down and rebuilding
> 1024 PTEs up to 50 times a second.
>
> A lower overhead solution would be to do scanning and resetting of the
> dirty bits on the PTEs (and a global tlb flush).

Right, this is the approach I was assuming. There's really no use in
tearing down the whole PTE (since you would have to take an extraneous
read fault).

> In the general case
> this is tricky as the framebuffer could be mapped by multiple PTEs. In
> practice, I believe this doesn't happen for either Linux or Windows.

I wouldn't think so, but showing my ignorance for a moment, does shadow2
not provide a mechanism to look up VAs given a GFN? This lookup could be
cheap if the structures are built during shadow page table construction.

Sounds like this is a good long-term goal, but I think I'll stick with the
threading as an intermediate goal. I've got a minor concern that
threading isn't going to help us much when dom0 is UP, since the VGA
scanning won't happen while an MMIO/PIO request happens. With an SMP
dom0, you could potentially do all the VGA scanning on one processor,
ensuring that qemu-dm wasn't ever "busy" when a request occurs.

I'm slightly concerned, though, that having a thread that's as CPU hungry
as the VGA scanning may increase context switches during the MMIO/PIO
handling, which would actually hurt performance. We'll see soon enough
though.

Regards,

Anthony Liguori

> There's always a good fallback of just returning 'all dirty' if the
> heuristic is violated. Would be good to knock this up.
>
> Best,
> Ian
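
For anyone following the thread, here is a rough sketch in C of the two approaches being compared. It is not code from qemu-dm or from the Xen shadow code. The names scan_memcmp and scan_dirty_bits, the flat array of PTE values, and the single_mapping flag are all hypothetical stand-ins, and the 4 KiB page size is an assumption. The only hardware detail relied on is that the x86 PTE dirty flag is bit 6. The first function is the memcmp scan against a saved copy; the second scans and resets dirty bits, and falls back to "all dirty" when the one-PTE-per-page heuristic does not hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VRAM_PAGE_SIZE 4096u  /* assumed page size */
#define PTE_DIRTY      0x40u  /* x86 PTE dirty flag (bit 6) */

/* Approach 1: compare every VRAM page against a saved copy.
 * Sets dirty[i] to 1 for each changed page, refreshes the saved copy
 * for that page, and returns the number of dirty pages.
 * Cost: reads all of VRAM on every scan, changed or not. */
static size_t scan_memcmp(const uint8_t *vram, uint8_t *saved,
                          size_t npages, uint8_t *dirty)
{
    size_t count = 0;

    assert(vram && saved && dirty);
    for (size_t i = 0; i < npages; i++) {
        const uint8_t *cur = vram + i * VRAM_PAGE_SIZE;
        uint8_t *old = saved + i * VRAM_PAGE_SIZE;

        dirty[i] = memcmp(cur, old, VRAM_PAGE_SIZE) != 0;
        if (dirty[i]) {
            memcpy(old, cur, VRAM_PAGE_SIZE);
            count++;
        }
    }
    return count;
}

/* Approach 2: scan and reset the dirty flag in one PTE per VRAM page.
 * ptes[] is a hypothetical flat array standing in for the entries that
 * map the framebuffer. If single_mapping is 0 (the framebuffer may be
 * mapped by more than one PTE), the heuristic is violated and every
 * page is reported dirty. Returns the number of dirty pages.
 * Cost: one word per page instead of one page per page. */
static size_t scan_dirty_bits(uint64_t *ptes, size_t npages,
                              int single_mapping, uint8_t *dirty)
{
    size_t count = 0;

    assert(ptes && dirty);
    if (!single_mapping) {
        memset(dirty, 1, npages);  /* fallback: 'all dirty' */
        return npages;
    }
    for (size_t i = 0; i < npages; i++) {
        dirty[i] = (ptes[i] & PTE_DIRTY) != 0;
        if (dirty[i]) {
            ptes[i] &= ~(uint64_t)PTE_DIRTY;  /* reset, keep the mapping */
            count++;
        }
    }
    /* Real page tables would need a TLB flush here; otherwise a cached
     * translation could let later writes go by without setting the
     * dirty flag again. This sketch has no TLB to flush. */
    return count;
}
```

For the 1024 pages mentioned above, the first approach touches 4 MiB of VRAM plus the saved copy on each pass, while the second touches 1024 eight-byte entries, which is the whole appeal. The sketch says nothing about the cost of the flush or of finding the PTEs in the first place.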