From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <509C1CE4.1060305@linux.vnet.ibm.com>
Date: Thu, 08 Nov 2012 15:58:12 -0500
From: "Jason J. Herne"
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Bug-fix: Align cpu_physical_memory_set_dirty_flags() addr to page boundary
Reply-To: jjherne@linux.vnet.ibm.com
To: qemu-devel@nongnu.org
Cc: Christian Borntraeger

While doing some QEMU migration-related development work, I stumbled upon what appears to be a bug (patch to follow in a separate email).

In exec-obsolete.h, cpu_physical_memory_set_dirty_flags() seems to assume that the caller provides a page-aligned address. However, some code paths call it with an address that is not on a page boundary. The subsequent call to cpu_physical_memory_get_dirty() also assumes page alignment, because it hard-codes a length of TARGET_PAGE_SIZE. With an unaligned address, the checked range appears to extend into the following page.

This causes problems when the target address lies within a page whose "migration dirty bit" is NOT set, but the following page's "migration dirty bit" is set. In this case, cpu_physical_memory_get_dirty() claims that the page is already dirty when it is not. cpu_physical_memory_set_dirty_flags() then skips incrementing ram_list.dirty_pages but still updates the target page's dirty bit with the following code:

    ram_list.phys_dirty[addr >> TARGET_PAGE_BITS] |= dirty_flags;

As a result, the counter (ram_list.dirty_pages) ends up lower than the actual number of dirty bits. This can cause our migration remaining-RAM counter to underflow and can even hang migration in some cases. In my development/test environment (a non-x86 platform) I hit this problem fairly frequently.

Does anyone know whether cpu_physical_memory_set_dirty_flags() should align the target address to a page boundary itself, or whether there is some reason this is a bad idea?

--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
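
To make the scenario above concrete, here is a small stand-alone model of it. This is only a sketch under assumptions, not QEMU's code: every model_/MODEL_ name, the 4 KiB page size, the flag value, and the body of the range check (round the start down, round the end up, OR the flags of every page in between) are made up for illustration. The align parameter stands in for the proposed fix of masking the address down to a page boundary before the check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for TARGET_PAGE_BITS and friends. */
#define MODEL_PAGE_BITS 12
#define MODEL_PAGE_SIZE ((uint64_t)1 << MODEL_PAGE_BITS)
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))
#define MODEL_MIGRATION_DIRTY 0x08
#define MODEL_NPAGES 4

static uint8_t model_phys_dirty[MODEL_NPAGES]; /* one flag byte per page */
static uint64_t model_dirty_pages;             /* running dirty counter  */

/* OR together the flags of every page touched by [start, start + length). */
static int model_get_dirty(uint64_t start, uint64_t length, int flags)
{
    uint64_t end = (start + length + MODEL_PAGE_SIZE - 1) & MODEL_PAGE_MASK;
    int ret = 0;

    for (uint64_t a = start & MODEL_PAGE_MASK; a < end; a += MODEL_PAGE_SIZE) {
        ret |= model_phys_dirty[a >> MODEL_PAGE_BITS] & flags;
    }
    return ret;
}

/* Mark the page containing addr dirty; count it only if it looked clean. */
static void model_set_dirty_flags(uint64_t addr, int flags, int align)
{
    assert((addr >> MODEL_PAGE_BITS) < MODEL_NPAGES);
    if (align) {
        addr &= MODEL_PAGE_MASK; /* the alignment being asked about */
    }
    if ((flags & MODEL_MIGRATION_DIRTY) &&
        !model_get_dirty(addr, MODEL_PAGE_SIZE, MODEL_MIGRATION_DIRTY)) {
        model_dirty_pages++;
    }
    model_phys_dirty[addr >> MODEL_PAGE_BITS] |= flags;
}

/* Count pages whose migration dirty bit is actually set. */
static unsigned model_count_flagged_pages(void)
{
    unsigned n = 0;

    for (unsigned i = 0; i < MODEL_NPAGES; i++) {
        n += (model_phys_dirty[i] & MODEL_MIGRATION_DIRTY) != 0;
    }
    return n;
}

/* Dirty page 1 first, then dirty page 0 through an unaligned address. */
static uint64_t model_run_scenario(int align)
{
    memset(model_phys_dirty, 0, sizeof(model_phys_dirty));
    model_dirty_pages = 0;
    model_set_dirty_flags(1 * MODEL_PAGE_SIZE, MODEL_MIGRATION_DIRTY, align);
    model_set_dirty_flags(0x10, MODEL_MIGRATION_DIRTY, align);
    return model_dirty_pages;
}
```

In this model, model_run_scenario(0) returns 1 while two pages carry the dirty bit, which is the mismatch described above. model_run_scenario(1) returns 2, matching the number of flagged pages.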