Message-ID: <5311D2DD.4090904@ozlabs.ru>
Date: Sat, 01 Mar 2014 23:30:21 +1100
From: Alexey Kardashevskiy
References: <6777CD901FD53644B082307CD745FB8326AC9348@SACEXCMBX02-PRD.hq.netapp.com> <530AC892.2080109@ozlabs.ru>
In-Reply-To: <530AC892.2080109@ozlabs.ru>
Subject: Re: [Qemu-devel] e1000 memory corruption in guest OS
To: "Hoyer, David", "Qemu-devel@nongnu.org"
Cc: "Moyer, Keith", "Best, Tish"

On 02/24/2014 03:20 PM, Alexey Kardashevskiy wrote:
> On 02/16/2014 03:29 PM, Hoyer, David wrote:
>
>> We are using Qemu-1.7.0 with Xen-4.3.0 and Debian jessie. We are
>> noticing that when we transfer large files from our network to the
>> guest OS via the e1000 virtual network device, we experience memory
>> corruption on the guest OS. We have debugged this problem, determined
>> where the corruption appears to be happening, and created a patch file
>> with a fix (at least, the corruption no longer happens on our guest
>> OS). Note that our test file is a large file consisting of the value
>> 0x61 repeated over the entire file.
>>
>
>> To troubleshoot this issue, we enabled tracing in QEMU and used the
>> xen_map_cache and xen_map_cache_return trace events. We also added some
>> of our own debug statements in e1000.c before and after the function
>> call that DMAs the network packet to the guest OS descriptor address.
>> Below is a commented summary of the trace output:
>>
>
>> /*** Check if guest OS address 0xe00000 (which maps to 0x7f15c313f000) is corrupted
>> xen_map_cache want 0xe00000
>> xen_map_cache_return 0x7f15c313f000
>> /*** It wasn't corrupted before the DMA write
>> /*** DMA a packet of length 0x5aa containing '0x61616161...' to guest OS at address 0x12ffac2 (which maps to 0x7f15c313eac2)
>> dma write to 12ffac2 len 5aa
>> xen_map_cache want 0x12ffac2
>> xen_map_cache_return 0x7f15c313eac2
>> /*** Check if guest OS address 0xe00000 (which maps to 0x7f15c313f000) is corrupted
>> xen_map_cache want 0xe00000
>> xen_map_cache_return 0x7f15c313f000
>> /*** It is corrupted now.
>> e1000: Corrupted 7: test_buf:5aa 5aa
>>
>
>> The DMA address 0x12ffac2 mapped to 0x7f15c313eac2. When you add the
>> packet length, 0x5aa, the result is 0x7f15c313f06c. This result is 0x6c
>> bytes into the mapping of guest OS address 0xe00000, which mapped to
>> 0x7f15c313f000. If you dump 0xe00000 in the guest OS, 0x6c bytes are
>> corrupted.
>
>> We believe that the correct fix is to use qemu_ram_ptr_length instead of
>> qemu_get_ram_ptr in the function address_space_rw to ensure (from what
>> we can tell) that the mapped address is valid for the entire length
>> specified. It looked like this might also be an issue in
>> cpu_physical_memory_write_rom, so we made the change there as well.
>>
>
>
> Corrupted DMA buffer is 0xe00000 -- 0x7f15c313f000.
> The e1000 packet is at 0x12ffac2 -- 0x7f15c313eac2.
>
> (0x7f15c313f000 - 0x7f15c313eac2) = 0x53e, which is less than 0x5aa, and
> (0x5aa - 0x53e) = 0x6c bytes get corrupted.
>
> I see a buffer overrun from e1000 here, and I suspect that your patch just
> hides the problem. What did I miss?

Ping, anyone?

> Does e1000 still work with the patch applied? Are 100% of packets
> delivered correctly?
>
>> We are fairly new to the QEMU source base, so we are looking to the
>> community to see whether this problem has previously been identified
>> and whether this is the correct fix.
>>
>
>> Following is the patch:
>>
>> --- orig/exec.c 2013-11-27 16:52:55.000000000 -0600
>> +++ new/exec.c 2014-02-15 21:58:34.311518000 -0600
>> @@ -1911,7 +1911,7 @@
>>      } else {
>>          addr1 += memory_region_get_ram_addr(mr);
>>          /* RAM case */
>> -        ptr = qemu_get_ram_ptr(addr1);
>> +        ptr = qemu_ram_ptr_length(addr1, &l);
>>          memcpy(ptr, buf, l);
>>          invalidate_and_set_dirty(addr1, l);
>>      }
>> @@ -1945,7 +1945,7 @@
>>          }
>>      } else {
>>          /* RAM case */
>> -        ptr = qemu_get_ram_ptr(mr->ram_addr + addr1);
>> +        ptr = qemu_ram_ptr_length(mr->ram_addr + addr1, &l);
>>          memcpy(buf, ptr, l);
>>      }
>>  }
>> @@ -1995,7 +1995,7 @@
>>      } else {
>>          addr1 += memory_region_get_ram_addr(mr);
>>          /* ROM/RAM case */
>> -        ptr = qemu_get_ram_ptr(addr1);
>> +        ptr = qemu_ram_ptr_length(addr1, &l);
>>          memcpy(ptr, buf, l);
>>          invalidate_and_set_dirty(addr1, l);
>>      }
>>
>>
>> David Hoyer
>> Controller Firmware Development
>> Array Products Group
>>
>> NetApp
>> 3718 N. Rock Road
>> Wichita, KS 67226
>> 316-636-8047 phone
>> 316-617-3677 mobile
>> David.Hoyer@netapp.com
>> netapp.com

-- 
Alexey