From mboxrd@z Thu Jan 1 00:00:00 1970
From: xuehai zhang
Subject: Re: open/stat64 syscalls run faster on Xen VM than standard Linux
Date: Mon, 28 Nov 2005 11:16:56 -0600
Message-ID: <438B3B88.8040702@cs.uchicago.edu>
References: <907625E08839C4409CE5768403633E0B0EAAC5@sefsexmb1.amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <907625E08839C4409CE5768403633E0B0EAAC5@sefsexmb1.amd.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: "Petersson, Mats"
Cc: Kate Keahey, Xen Mailing List, Tim Freeman
List-Id: xen-devel@lists.xenproject.org

Petersson, Mats wrote:
>>-----Original Message-----
>>From: xuehai zhang [mailto:hai@cs.uchicago.edu]
>>Sent: 28 November 2005 15:51
>>To: Petersson, Mats
>>Cc: Anthony Liguori; Xen Mailing List; Kate Keahey; Tim Freeman
>>Subject: Re: [Xen-devel] open/stat64 syscalls run faster on
>>Xen VM than standard Linux
>>
>>Mats,
>>
>>I mounted the loopback file in dom0, chrooted to the
>>mountpoint and redid the experiment. The results is attached
>>below. The time of open and stat64 calls is similar to the
>>XenLinux case and also much smaller than the standard Linux
>>case. So, either using loopback file as backend of XenLinux
>>or directly mounting it in local filesystem will result in
>>some benefit (maybe just caused by the extra layer of block
>>caching) for the performance of some system calls.
>
>
> Yes, I think the caching of the blocks in two layers will be the reason
> you get this effect. The loopback file is cached once in the fs handling
> the REAL HARD DISK, and then other blocks would be cached in the fs
> handling the loopback.

Is "the fs handling the REAL HD" dom0's filesystem? Is the cache used here dom0's disk buffer cache or something else? What is "the fs handling the loopback"?
Is it the filesystem seen inside XenLinux, or still the filesystem of dom0? Which cache is used in this case?

> In this case the directory of the file(s)
> involved in your benchmark are probably held entirely in memory, whilst
> when you use a real disk to do the same thing, you could end up with
> some "real" accesses to the disk device itself.

To confirm our hypothesis that two-layer block caching is the real cause, what experiments can I do to show that a block is served from a cache rather than from the hard disk on XenLinux, while it has to be read from the hard disk on standard Linux? Maybe I can use "vmstat" in dom0 to track blocks received/sent during the execution of the benchmark.

> Next question will probably be why write is slower in Xen+Linux than
> native Linux - something I can't say for sure, but I would expect it to
> be because the write is going through Xen in the Xen+Linux case and
> straight through Linux when in the native linux case. But that's just a
> guess. [And since it's slower in Xen, I don't expect you to be surprised
> by this]. And the write call is almost identical to the Linux native, as
> you'd expect.

I also agree that the overhead of the write system call in the VM is caused by Xen. I actually ran a "dd" benchmark to create a disk file from /dev/zero on both machines, and the VM is slower than the physical machine, as we expected.

So, the benchmark experiments I've done so far suggest that XenLinux using loopback files as VBD backends performs better (faster execution) than standard Linux on some system calls, like open and stat64, but worse (slower execution) on others, like write. Does this mean different applications may behave differently on a VM than on standard Linux? In other words, do some applications run faster on a VM and some slower, compared with the physical machine?

Thanks.
Xuehai

>># strace -c /bin/sh -c /bin/echo foo
>>
>>% time     seconds  usecs/call     calls    errors syscall
>>------ ----------- ----------- --------- --------- ----------------
>> 21.93    0.000490         490         1           write
>> 16.34    0.000365          24        15           old_mmap
>> 15.26    0.000341          38         9         3 open
>>  9.62    0.000215          43         5           read
>>  7.97    0.000178          10        18           brk
>>  7.79    0.000174          87         2           munmap
>>  4.07    0.000091           8        11           rt_sigaction
>>  3.27    0.000073          12         6           close
>>  2.91    0.000065          11         6           fstat64
>>  2.28    0.000051           9         6           rt_sigprocmask
>>  2.15    0.000048          24         2           access
>>  1.75    0.000039          13         3           uname
>>  1.66    0.000037          19         2           stat64
>>  0.40    0.000009           9         1           getpgrp
>>  0.40    0.000009           9         1           getuid32
>>  0.36    0.000008           8         1           time
>>  0.36    0.000008           8         1           getppid
>>  0.36    0.000008           8         1           getgid32
>>  0.31    0.000007           7         1           getpid
>>  0.27    0.000006           6         1           execve
>>  0.27    0.000006           6         1           geteuid32
>>  0.27    0.000006           6         1           getegid32
>>------ ----------- ----------- --------- --------- ----------------
>>100.00    0.002234                    95         3 total
>>
>>Thanks.
>>
>>Xuehai
>>
>>
>>Petersson, Mats wrote:
>>
>>>>-----Original Message-----
>>>>From: xen-devel-bounces@lists.xensource.com
>>>>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Anthony
>>>>Liguori
>>>>Sent: 28 November 2005 14:39
>>>>To: xuehai zhang
>>>>Cc: Xen Mailing List
>>>>Subject: Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM
>>>>than standard Linux
>>>>
>>>>This may just be the difference between having the extra level of
>>>>block caching from using a loop back device.
>>>>
>>>>Try running the same benchmark on a domain that uses an actual
>>>>partition. While the syscalls may appear to be faster, I imagine it's
>>>>because the cost of pulling in a block has already been payed so the
>>>>overall workload is unaffected.
>>>
>>>
>>>And this would be the same as running standard linux with the loopback
>>>file-system mounted and chroot to the local file-system, or would that
>>>be different?
>>>[I'm asking because I don't actually understand enough
>>>about how it works to know what difference it makes, and I would like
>>>to know, because at some point I'll probably need to know this.]
>>>
>>>--
>>>Mats
>>>
>>>
>>>>Regards,
>>>>
>>>>Anthony Liguori
>>>>
>>>>xuehai zhang wrote:
>>>
>>>[snip]
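
A rough sketch of the vmstat idea raised above, in case it helps: snapshot the kernel's paging counters before and after the benchmark command and look at the difference. This assumes a Linux host that exposes /proc/vmstat, and it assumes the pgpgin/pgpgout counters there are a fair stand-in for the block traffic that "vmstat" reports as bi/bo. The function names (parse_vmstat, counter_delta, measure) are made up for illustration. The counters are system-wide, so the machine should be otherwise idle, and a delta near zero only suggests (it does not prove) that the data came from a cache.

```python
import subprocess
import sys

# Counters assumed to reflect block-in / block-out traffic on Linux.
COUNTERS = ("pgpgin", "pgpgout")


def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def counter_delta(before, after, names=COUNTERS):
    """Return how much each named counter grew between two snapshots."""
    return {n: after.get(n, 0) - before.get(n, 0) for n in names}


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())


def measure(cmd):
    """Run cmd and report the system-wide counter growth around it."""
    before = read_vmstat()
    subprocess.run(cmd, check=False)
    after = read_vmstat()
    return counter_delta(before, after)


if __name__ == "__main__":
    # Default to the command used in the strace run quoted above.
    cmd = sys.argv[1:] or ["/bin/sh", "-c", "/bin/echo foo"]
    print(measure(cmd))
```

Running it once in dom0 (or in the chroot on the loopback mount) and once on the standard Linux machine, each time for the same command, would give two deltas to compare; a noticeably larger pgpgin on standard Linux would be consistent with the two-layer caching explanation.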