* [Qemu-devel] exec.c:invalidate_and_set_dirty() only checks whether first page in its range is dirty...
@ 2014-11-16 18:11 Peter Maydell
2014-11-18 14:53 ` Stefan Hajnoczi
0 siblings, 1 reply; 2+ messages in thread
From: Peter Maydell @ 2014-11-16 18:11 UTC (permalink / raw)
To: QEMU Developers; +Cc: Paolo Bonzini, Stefan Hajnoczi
I'm trying to track down a bug in ARM TCG where we:
* boot a guest
* run 'shutdown -r now' to trigger a reboot
* on reboot, crash when running userspace because the contents
of physical RAM have changed but the translated code from
before the shutdown was never invalidated
This is with a virtio-mmio block device as the disk.
Debugging indicates that when the post-reboot guest reloads
binaries from disk into ram we fail to invalidate the cached
translations. For the specific case I looked at, we have a
translation of code at ramaddr_t 0x806e000. The disk load
pulls 0x16000 bytes of data off disk to address 0x806a000.
Virtio correctly calls address_space_unmap(), which is supposed
to be what marks the ram range as dirty. It in turn calls
invalidate_and_set_clean(). However invalidate_and_set_clean()
just does this:
if (cpu_physical_memory_is_clean(addr)) {
/* invalidate code */
tb_invalidate_phys_page_range(addr, addr + length, 0);
/* set dirty bit */
cpu_physical_memory_set_dirty_range_nocode(addr, length);
}
So if the first page in the range (here 0x806a000) happens
to be dirty then we won't do anything, even if later pages
in the range do need to be invalidated. Also, we'll call
tb_invalidate_phys_page_range() with a start/end which may
be in different physical pages, which is forbidden by that
function's API.
I guess invalidate_and_set_clean() really needs to be
fixed to loop through each page in the range; does anybody
know how this is supposed to work (or why nobody's noticed
this bug before :-)) ?
thanks
-- PMM
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [Qemu-devel] exec.c:invalidate_and_set_dirty() only checks whether first page in its range is dirty...
2014-11-16 18:11 [Qemu-devel] exec.c:invalidate_and_set_dirty() only checks whether first page in its range is dirty Peter Maydell
@ 2014-11-18 14:53 ` Stefan Hajnoczi
0 siblings, 0 replies; 2+ messages in thread
From: Stefan Hajnoczi @ 2014-11-18 14:53 UTC (permalink / raw)
To: Peter Maydell; +Cc: Paolo Bonzini, QEMU Developers
[-- Attachment #1: Type: text/plain, Size: 2026 bytes --]
On Sun, Nov 16, 2014 at 06:11:48PM +0000, Peter Maydell wrote:
> I'm trying to track down a bug in ARM TCG where we:
> * boot a guest
> * run 'shutdown -r now' to trigger a reboot
> * on reboot, crash when running userspace because the contents
> of physical RAM have changed but the translated code from
> before the shutdown was never invalidated
>
> This is with a virtio-mmio block device as the disk.
>
> Debugging indicates that when the post-reboot guest reloads
> binaries from disk into ram we fail to invalidate the cached
> translations. For the specific case I looked at, we have a
> translation of code at ramaddr_t 0x806e000. The disk load
> pulls 0x16000 bytes of data off disk to address 0x806a000.
> Virtio correctly calls address_space_unmap(), which is supposed
> to be what marks the ram range as dirty. It in turn calls
> invalidate_and_set_clean(). However invalidate_and_set_clean()
> just does this:
>
> if (cpu_physical_memory_is_clean(addr)) {
> /* invalidate code */
> tb_invalidate_phys_page_range(addr, addr + length, 0);
> /* set dirty bit */
> cpu_physical_memory_set_dirty_range_nocode(addr, length);
> }
>
> So if the first page in the range (here 0x806a000) happens
> to be dirty then we won't do anything, even if later pages
> in the range do need to be invalidated. Also, we'll call
> tb_invalidate_phys_page_range() with a start/end which may
> be in different physical pages, which is forbidden by that
> function's API.
>
> I guess invalidate_and_set_clean() really needs to be
> fixed to loop through each page in the range; does anybody
> know how this is supposed to work (or why nobody's noticed
> this bug before :-)) ?
Not directly but I don't like this code because it's not atomic. I'll
send patches soon for atomic test-and-set and test-and-clear. Hopefully
it won't impact performance too much.
What you've discovered seems like a plain old bug. It needs a loop.
Stefan
[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2014-11-18 14:53 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-16 18:11 [Qemu-devel] exec.c:invalidate_and_set_dirty() only checks whether first page in its range is dirty Peter Maydell
2014-11-18 14:53 ` Stefan Hajnoczi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).