qemu-devel.nongnu.org archive mirror
* [Qemu-devel] Abnormal observation during migration: too many "write-not-dirty" pages
@ 2017-11-12  9:26 Chunguang Li
  2017-11-15  9:45 ` Juan Quintela
  2017-11-15 10:11 ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 8+ messages in thread
From: Chunguang Li @ 2017-11-12  9:26 UTC (permalink / raw)
  To: qemu-devel; +Cc: dgilbert, quintela, amit.shah, pbonzini, stefanha

Hi all!

I have observed something very abnormal during VM migration. Many pages marked as dirty during migration are "not really dirty": their content is identical to the old version.




I did the migration experiment like this:

During the setup phase of migration, I first suspended the VM. Then I copied all the pages in the guest physical address space into a memory buffer as large as the guest memory. After that, dirty tracking began and I resumed the VM. At the end of each iteration, I also suspended the VM temporarily. During the suspension, I compared the content of every page marked as dirty in this iteration, byte by byte, with its former copy in the buffer. If the content of a page was the same as its former copy, I recorded it as a "write-not-dirty" page (the page was written with exactly the same content as the old version). Otherwise, I replaced the page in the buffer with the new content, for possible comparison in later iterations. After the dirty bitmap was reset, I resumed the VM. In this way, I obtained the proportion of write-not-dirty pages among all the pages marked as dirty for each pre-copy iteration.
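
The per-iteration comparison described above can be sketched roughly as follows. This is only an illustration in Python, not the code actually used inside QEMU; the function name compare_dirty_pages, its parameters, and the 4 KiB page size are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed guest page size in bytes


def compare_dirty_pages(guest_mem, shadow, dirty_pfns, page_size=PAGE_SIZE):
    """Classify the pages marked dirty in one pre-copy iteration.

    guest_mem  -- bytes-like view of current guest memory (hypothetical)
    shadow     -- bytearray holding the former copy of every page;
                  updated in place for pages that really changed
    dirty_pfns -- page frame numbers marked dirty in this iteration

    Returns (write_not_dirty, total_dirty).
    """
    write_not_dirty = 0
    total_dirty = 0
    for pfn in dirty_pfns:
        start = pfn * page_size
        end = start + page_size
        total_dirty += 1
        if guest_mem[start:end] == shadow[start:end]:
            # Marked dirty, but byte-for-byte identical to the old version.
            write_not_dirty += 1
        else:
            # Really dirty: keep the new content for later comparisons.
            shadow[start:end] = guest_mem[start:end]
    return write_not_dirty, total_dirty


# Toy example: 4 pages, 3 marked dirty, only page 1 really changed.
guest = bytearray(4 * PAGE_SIZE)
shadow = bytearray(guest)        # copy taken during the setup phase
guest[PAGE_SIZE] = 0xFF          # a real modification inside page 1
result = compare_dirty_pages(guest, shadow, [0, 1, 2])
print(result)                    # (2, 3)
```

The proportion for an iteration is then write_not_dirty / total_dirty. The sketch assumes the VM is suspended while it runs, so that guest memory does not change during the comparison.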

I repeated this experiment with 15 workloads: 11 CPU2006 benchmarks, a Memcached server, kernel compilation, video playback, and an idle VM. The CPU2006 benchmarks and Memcached are write-intensive workloads, so almost none of them converged to stop-copy.




Startlingly, the proportions of write-not-dirty pages are quite high. Memcached and three CPU2006 benchmarks (zeusmp, mcf and bzip2) have the highest proportions: 45%-80% of all their dirty pages are write-not-dirty. The proportions for the other workloads are about 5%-20%, which is also abnormal. Intuitively, the proportion of write-not-dirty pages should be far lower than these numbers, since I would expect it to be quite rare for a page to be written with exactly the same content it already held.

Besides, zero pages are not counted in any of these results, because I think code such as memset() may write zeros over large areas of pages that were already zero pages.
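
As a rough illustration of the zero-page exclusion, the sketch below computes the write-not-dirty proportion while skipping dirty pages whose current content is all zeros. It is a Python sketch under assumptions: the names is_zero_page and write_not_dirty_ratio are hypothetical, and dropping zero pages from both the numerator and the denominator is one possible reading of "not counted".

```python
def is_zero_page(page):
    """Return True if every byte of the page is zero."""
    return page == bytes(len(page))


def write_not_dirty_ratio(guest_mem, shadow, dirty_pfns, page_size=4096):
    """Proportion of write-not-dirty pages among non-zero dirty pages.

    Assumption: a dirty page that is currently all zeros is ignored
    entirely, i.e. it counts neither as dirty nor as write-not-dirty.
    Returns None if no non-zero dirty page was seen.
    """
    same = 0
    counted = 0
    for pfn in dirty_pfns:
        start = pfn * page_size
        page = bytes(guest_mem[start:start + page_size])
        if is_zero_page(page):
            continue  # excluded from the results
        counted += 1
        if page == bytes(shadow[start:start + page_size]):
            same += 1
    return same / counted if counted else None


# Toy example with three dirty pages:
#   page 0: all zeros before and after  -> skipped
#   page 1: non-zero and unchanged      -> write-not-dirty
#   page 2: content really changed      -> dirty
old = bytearray(3 * 4096)
old[4096] = 7
old[8192] = 1
new = bytearray(old)
new[8192] = 2
ratio = write_not_dirty_ratio(new, old, [0, 1, 2])
print(ratio)  # 0.5
```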




I ruled out unknown causes in the machine hardware by repeating the experiments on two different sets of machines. Then I guessed it might be related to the huge page feature. However, the result was the same when I turned the huge page feature off in the OS.




Now I can see only two possible reasons.

First, there may be a bug in the KVM kernel dirty tracking mechanism: it may mark as dirty some pages that did not receive any write.

Second, there may be a bug in the OS running inside the VM: it may issue unnecessary writes.




What do you think about this abnormal phenomenon? Any advice, possible explanations, or even guesses are welcome. I would appreciate any response, because this has confused me for a long time. Thank you.


--
Chunguang Li, Ph.D. Candidate
Wuhan National Laboratory for Optoelectronics (WNLO)
Huazhong University of Science & Technology (HUST)
Wuhan, Hubei Prov., China


* Re: [Qemu-devel] Abnormal observation during migration: too many "write-not-dirty" pages
@ 2017-11-15  6:24 Chunguang Li
  0 siblings, 0 replies; 8+ messages in thread
From: Chunguang Li @ 2017-11-15  6:24 UTC (permalink / raw)
  To: lichunguang; +Cc: qemu-devel, dgilbert, quintela, amit.shah, pbonzini, stefanha

Some more details about this experiment:

The host is running Ubuntu 16.04 with a 4.4.0 Linux kernel and QEMU 2.5.1. The
guest is running Ubuntu 12.04, except for the Memcached experiment, where the
guest is running Ubuntu 16.04.

 

The exact proportions of write-not-dirty pages for the first
2 pre-copy iterations (0.445 means 44.5%):

Workload     Iter 1   Iter 2
Memcached    0.445    0.478
Zeusmp       0.670    0.727
Mcf          0.808    0.793
Bzip2        0.464    0.447
Milc         0.341    0.037
cactusADM    0.280    0.248
lbm          0.090    0.037
GemsFDTD     0.226    0.172
Bwaves       0.069    0.003
Astar        0.113    0.039
Xalancbmk    0.082    0.041
Wrf          0.141    0.073

 

Any advice? Looking forward to any response. Thank you.

 

Chunguang


