vram_dirty vs. shadow paging dirty tracking

* vram_dirty vs. shadow paging dirty tracking
@ 2007-03-13 19:32 Anthony Liguori
  2007-03-13 21:02 ` Ian Pratt
  2007-03-14  8:22 ` Zhai, Edwin
  0 siblings, 2 replies; 8+ messages in thread
From: Anthony Liguori @ 2007-03-13 19:32 UTC
  To: Ian Pratt, xen-devel

When thinking about multithreading the device model, it occurred to me
that it's a little odd that we're doing a memcmp to determine which
portions of the VRAM have changed.  Couldn't we just use dirty page
tracking in the shadow paging code?  That should significantly lower the
overhead, and I believe the infrastructure is already mostly there in
the shadow2 code.

Is this a sane idea?
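
For context, the memcmp approach in question boils down to roughly the
following.  This is only a sketch: the function name, the per-page
granularity and the flag array are assumptions made for illustration,
not the actual qemu-dm code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VRAM_PAGE_SIZE 4096u   /* assumed guest page size */

/*
 * Compare the live VRAM against a private copy, one page at a time.
 * Pages that differ are flagged in dirty[] and the copy is refreshed.
 * Returns the number of dirty pages found.
 */
static size_t vram_scan_memcmp(const uint8_t *vram, uint8_t *copy,
                               size_t npages, uint8_t *dirty)
{
    size_t ndirty = 0;

    assert(vram != NULL && copy != NULL && dirty != NULL);

    for (size_t i = 0; i < npages; i++) {
        const uint8_t *src = vram + i * VRAM_PAGE_SIZE;
        uint8_t *old = copy + i * VRAM_PAGE_SIZE;

        dirty[i] = (memcmp(src, old, VRAM_PAGE_SIZE) != 0);
        if (dirty[i]) {
            memcpy(old, src, VRAM_PAGE_SIZE);   /* resync the copy */
            ndirty++;
        }
    }
    return ndirty;
}
```

Every scan reads the whole framebuffer and its copy even when nothing
has changed, which is the overhead that dirty page tracking would avoid.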

Regards,

Anthony Liguori

* RE: vram_dirty vs. shadow paging dirty tracking
  2007-03-13 19:32 vram_dirty vs. shadow paging dirty tracking Anthony Liguori
@ 2007-03-13 21:02 ` Ian Pratt
  2007-03-13 21:30   ` Anthony Liguori
  2007-03-14  8:22 ` Zhai, Edwin
  1 sibling, 1 reply; 8+ messages in thread
From: Ian Pratt @ 2007-03-13 21:02 UTC
  To: Anthony Liguori, xen-devel

> When thinking about multithreading the device model, it occurred to me
> that it's a little odd that we're doing a memcmp to determine which
> portions of the VRAM has changed.  Couldn't we just use dirty page
> tracking in the shadow paging code?  That should significantly lower
> the overhead of this plus I believe the infrastructure is already
> mostly there in the shadow2 code.

Yep, it's been in the roadmap doc for quite a while. However, the log
dirty code isn't ideal for this. We'd need to extend it so that it can
be turned on for just a subset of the GFN range (we could use a Xen
rangeset for this).
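
As a rough sketch of what "log dirty for just a subset of the GFN range"
could look like: the struct and function below are hypothetical and
stand in for a real rangeset with a single contiguous range; they are
not existing Xen interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracker for one contiguous range of guest frames. */
struct vram_log_dirty {
    uint64_t first_gfn;   /* first tracked guest frame number */
    uint64_t nr_gfns;     /* number of tracked frames */
    uint8_t *bitmap;      /* one bit per tracked frame */
};

/*
 * Record a write to gfn.  Frames outside the tracked range are ignored,
 * so the rest of guest memory pays nothing for the tracking.
 * Returns true if the frame was inside the range and has been marked.
 */
static bool log_dirty_mark(struct vram_log_dirty *ld, uint64_t gfn)
{
    uint64_t idx;

    assert(ld != NULL && ld->bitmap != NULL);

    if (gfn < ld->first_gfn || gfn - ld->first_gfn >= ld->nr_gfns)
        return false;

    idx = gfn - ld->first_gfn;
    ld->bitmap[idx / 8] |= (uint8_t)(1u << (idx % 8));
    return true;
}
```

For a 1024-page framebuffer the bitmap is only 128 bytes, so handing it
to the device model on each refresh would be cheap.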

Even so, I'm not super keen on the idea of tearing down and rebuilding
1024 PTEs up to 50 times a second.

A lower-overhead solution would be to scan and reset the dirty bits on
the PTEs (followed by a global TLB flush). In the general case this is
tricky, as the framebuffer could be mapped by multiple PTEs. In
practice, I believe this doesn't happen for either Linux or Windows.
There's always a good fallback of just returning 'all dirty' if the
heuristic is violated. Would be good to knock this up.
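
A minimal sketch of the scan-and-reset idea, assuming the shadow PTEs
that map the framebuffer are available as a flat array and that some
other code decides whether the single-mapping heuristic still holds.
The function name and its parameters are hypothetical; only the
position of the x86 dirty bit (bit 6) is architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PTE_DIRTY (1ull << 6)   /* x86 dirty bit in a leaf PTE */

/*
 * Harvest and clear the dirty bits of the PTEs mapping the framebuffer.
 * If single_mapping is false the heuristic is violated, so every page
 * is reported dirty and the PTEs are left alone.
 * Returns true if any bit was cleared, i.e. the caller must flush the
 * TLB before the next scan can be trusted.
 * A real implementation would need an atomic clear, since the hardware
 * may set the bit concurrently.
 */
static bool vram_harvest_dirty(uint64_t *ptes, size_t npages,
                               bool single_mapping, uint8_t *dirty)
{
    bool need_flush = false;

    assert(ptes != NULL && dirty != NULL);

    if (!single_mapping) {
        memset(dirty, 1, npages);   /* fallback: 'all dirty' */
        return false;
    }

    for (size_t i = 0; i < npages; i++) {
        dirty[i] = (ptes[i] & PTE_DIRTY) != 0;
        if (dirty[i]) {
            ptes[i] &= ~PTE_DIRTY;
            need_flush = true;
        }
    }
    return need_flush;
}
```

Unlike the log dirty approach, the mappings stay in place, so the guest
takes no extra faults; the cost is the scan plus the TLB flush.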

Best,
Ian

* Re: vram_dirty vs. shadow paging dirty tracking
  2007-03-13 21:02 ` Ian Pratt
@ 2007-03-13 21:30   ` Anthony Liguori
  2007-03-14  0:17     ` Ian Pratt
  0 siblings, 1 reply; 8+ messages in thread
From: Anthony Liguori @ 2007-03-13 21:30 UTC
  To: Ian Pratt; +Cc: xen-devel

Ian Pratt wrote:
>> When thinking about multithreading the device model, it occurred to me
>> that it's a little odd that we're doing a memcmp to determine which
>> portions of the VRAM has changed.  Couldn't we just use dirty page
>> tracking in the shadow paging code?  That should significantly lower
>> the overhead of this plus I believe the infrastructure is already
>> mostly there in the shadow2 code.
>>     
>
> Yep, its been in the roadmap doc for quite a while. However, the log
> dirty code isn't ideal for this. We'd need to extend it to enable it to
> be turned on for just a subset of the GFN range (we could use a xen
> rangeset for this).
>   

Okay, I was curious if the log dirty stuff could do ranges.  I guess not.

> Even so, I'm not super keen on the idea of tearing down and rebuilding
> 1024 PTE's up to 50 times a second. 
>
> A lower overhead solution would be to do scanning and resetting of the
> dirty bits on the PTEs (and a global tlb flush).

Right, this is the approach I was assuming.  There's really no use in 
tearing down the whole PTE (since you would have to take an extraneous 
read fault).

> In the general case
> this is tricky as the framebuffer could be mapped by multiple PTEs. In
> practice, I believe this doesn't happen for either Linux or Windows.
>   

I wouldn't think so, but showing my ignorance for a moment, does shadow2
not provide a mechanism to look up VAs given a GFN?  This lookup could
be cheap if the structures are built during shadow page table construction.

Sounds like this is a good long-term goal, but I think I'll stick with
the threading as an intermediate goal.

I've got a minor concern that threading isn't going to help us much when 
dom0 is UP since the VGA scanning won't happen while an MMIO/PIO request 
happens.  With an SMP dom0, you could potentially do all the VGA 
scanning on one processor ensuring that qemu-dm wasn't ever "busy" when 
a request occurs.  I'm slightly concerned though that having a thread 
that's as CPU hungry as the VGA scanning may increase context-switches 
during the MMIO/PIO handling which would actually hurt performance.

We'll see soon enough though.

Regards,

Anthony Liguori

> There's always a good fallback of just returning 'all dirty' if the
> heuristic is violated. Would be good to knock this up.
>
> Best,
> Ian
>   

* RE: Re: vram_dirty vs. shadow paging dirty tracking
  2007-03-13 21:30   ` Anthony Liguori
@ 2007-03-14  0:17     ` Ian Pratt
  0 siblings, 0 replies; 8+ messages in thread
From: Ian Pratt @ 2007-03-14  0:17 UTC
  To: Anthony Liguori, Ian Pratt; +Cc: xen-devel

> > Yep, its been in the roadmap doc for quite a while. However, the log
> > dirty code isn't ideal for this. We'd need to extend it to enable it
> > to be turned on for just a subset of the GFN range (we could use a
> > xen rangeset for this).
> >
>
> Okay, I was curious if the log dirty stuff could do ranges.  I guess
> not.

It could certainly be added, but I prefer the dirty bit solution to this
particular problem. 
 
> > Even so, I'm not super keen on the idea of tearing down and
> > rebuilding 1024 PTE's up to 50 times a second.
> >
> > A lower overhead solution would be to do scanning and resetting of
> > the dirty bits on the PTEs (and a global tlb flush).
> 
> Right, this is the approach I was assuming.  There's really no use in
> tearing down the whole PTE (since you would have to take an extraneous
> read fault).
> 
> > In the general case
> > this is tricky as the framebuffer could be mapped by multiple PTEs.
> > In practice, I believe this doesn't happen for either Linux or
> > Windows.
> >
>
> I wouldn't think so, but showing my ignorance for a moment, does
> shadow2 not provide a mechanism to lookup VA's given a GFN?  This
> lookup could be cheap if the structures are built during shadow page
> table construction.

No, it deliberately doesn't, because threading all the PTEs that point to
a GFN can consume quite a bit of memory, introduces locking complexity
that will affect future scalability, and turns out to be completely
unnecessary for normal shadow mode operation because some simple
heuristics get a near-perfect hit rate.

> Sounds like this is a good long term goal but I think I'll stick with
> the threading as an intermediate goal.

Yes, that's more immediately useful, thanks.

> I've got a minor concern that threading isn't going to help us much
> when dom0 is UP since the VGA scanning won't happen while an MMIO/PIO
> request happens.

I think the VGA scanning burns enough CPU to stand a good chance of
getting pre-empted when an MMIO/PIO request arrives. We need to make
sure there's no synchronization required that prevents this.

Best,
Ian

* Re: vram_dirty vs. shadow paging dirty tracking
  2007-03-13 19:32 vram_dirty vs. shadow paging dirty tracking Anthony Liguori
  2007-03-13 21:02 ` Ian Pratt
@ 2007-03-14  8:22 ` Zhai, Edwin
  2007-03-14 16:00   ` Anthony Liguori
  1 sibling, 1 reply; 8+ messages in thread
From: Zhai, Edwin @ 2007-03-14  8:22 UTC
  To: Anthony Liguori; +Cc: Ian Pratt, xen-devel

On Tue, Mar 13, 2007 at 02:32:56PM -0500, Anthony Liguori wrote:
> When thinking about multithreading the device model, it occurred to me 
> that it's a little odd that we're doing a memcmp to determine which 
> portions of the VRAM has changed.  Couldn't we just use dirty page 

We added this code long ago to improve VNC responsiveness for the user.
QEMU now has a new VNC implementation that resolves that issue, and this
code causes a performance drop for Linux guests running X and for
Windows guests.

So I'd like to send a patch to revert it and work out a proper solution
in the future.

thanks,

> tracking in the shadow paging code?  That should significantly lower the 
> overhead of this plus I believe the infrastructure is already mostly 
> there in the shadow2 code.
> 
> Is this a sane idea?
> 
> Regards,
> 
> Anthony Liguori
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 

-- 
best rgds,
edwin

* Re: vram_dirty vs. shadow paging dirty tracking
  2007-03-14  8:22 ` Zhai, Edwin
@ 2007-03-14 16:00   ` Anthony Liguori
  2007-03-15  2:59     ` Zhai, Edwin
  2007-03-15  3:22     ` Dong, Eddie
  0 siblings, 2 replies; 8+ messages in thread
From: Anthony Liguori @ 2007-03-14 16:00 UTC
  To: Zhai, Edwin; +Cc: Ian Pratt, xen-devel

Zhai, Edwin wrote:
> On Tue, Mar 13, 2007 at 02:32:56PM -0500, Anthony Liguori wrote:
>   
>> When thinking about multithreading the device model, it occurred to me 
>> that it's a little odd that we're doing a memcmp to determine which 
>> portions of the VRAM has changed.  Couldn't we just use dirty page 
>>     
>
> we made this code to improve the user vnc responsiveness long before.
> now QEMU has new vnc implementation to resolve this issue and this code 
> introduce perf drop for guest of linux with X or windows.
>   

Compared to what, just updating the full screen 30 times a second?  I
suspect that's not as bad as it sounds, since SDL will be using an XShmImage.

The VNC minimization is done based on a timer, however, so sticking the
timer stuff into a thread is still useful.  Of course, we should be able
to quickly determine how useful this is by just changing SDL to update
the whole image...

Regards,

Anthony Liguori

> so i'd like to send a patch to revert it and make a proper solution in future.
>
> thanks,
>
>   
>> tracking in the shadow paging code?  That should significantly lower the 
>> overhead of this plus I believe the infrastructure is already mostly 
>> there in the shadow2 code.
>>
>> Is this a sane idea?
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>
>>     
>
>   

* Re: vram_dirty vs. shadow paging dirty tracking
  2007-03-14 16:00   ` Anthony Liguori
@ 2007-03-15  2:59     ` Zhai, Edwin
  2007-03-15  3:22     ` Dong, Eddie
  1 sibling, 0 replies; 8+ messages in thread
From: Zhai, Edwin @ 2007-03-15  2:59 UTC
  To: Anthony Liguori; +Cc: Ian Pratt, xen-devel, Zhai, Edwin

On Wed, Mar 14, 2007 at 11:00:17AM -0500, Anthony Liguori wrote:
> Zhai, Edwin wrote:
> >On Tue, Mar 13, 2007 at 02:32:56PM -0500, Anthony Liguori wrote:
> >  
> >>When thinking about multithreading the device model, it occurred to me 
> >>that it's a little odd that we're doing a memcmp to determine which 
> >>portions of the VRAM has changed.  Couldn't we just use dirty page 
> >>    
> >
> >we made this code to improve the user vnc responsiveness long before.
> >now QEMU has new vnc implementation to resolve this issue and this code 
> >introduce perf drop for guest of linux with X or windows.
> >  
> 
> Compared to what, just updating the full screen 30 times a second?  I 
> suspect that's not as bad as it sounds since SDL will be using a XShmImage.

Removing the memcpy and updating the whole screen each time gives better
performance.

> 
> The VNC minimization is done based on a timer however so sticking the 
> timer stuff into a thread is still useful.  Of course, we should be able 
> to quickly determine how useful this is by just changing SDL to update 
> the whole image...
> 
> Regards,
> 
> Anthony Liguori
> 
> >so i'd like to send a patch to revert it and make a proper solution in 
> >future.
> >
> >thanks,
> >
> >  

-- 
best rgds,
edwin

* RE: vram_dirty vs. shadow paging dirty tracking
  2007-03-14 16:00   ` Anthony Liguori
  2007-03-15  2:59     ` Zhai, Edwin
@ 2007-03-15  3:22     ` Dong, Eddie
  1 sibling, 0 replies; 8+ messages in thread
From: Dong, Eddie @ 2007-03-15  3:22 UTC
  To: Anthony Liguori, Zhai, Edwin; +Cc: Ian Pratt, xen-devel

> 
> Compared to what, just updating the full screen 30 times a second?  I
> suspect that's not as bad as it sounds since SDL will be using a
> XShmImage. 
> 
It depends on how you run the benchmark. In the multiple-VM case, where
several QEMU instances (say 8) are running, this kind of comparison eats
an unacceptable number of CPU cycles.
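
To put a rough number on it, here is a back-of-the-envelope helper. The
function name is made up for illustration, and the inputs are
assumptions: a 4 MiB framebuffer (1024 pages of 4 KiB, matching the
1024-PTE figure used earlier in this thread) scanned 30 times a second.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Framebuffer bytes scanned per second across all device models when
 * each one compares its whole framebuffer on every refresh.
 */
static uint64_t scan_bytes_per_sec(uint64_t nr_qemus, uint64_t fb_bytes,
                                   uint64_t scans_per_sec)
{
    assert(nr_qemus > 0 && fb_bytes > 0);
    return nr_qemus * fb_bytes * scans_per_sec;
}
```

With 8 QEMUs that is 8 * 4 MiB * 30 = 960 MiB of framebuffer scanned per
second, and the compare also has to read the saved copy, so the real
memory traffic is higher still.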
Eddie
