All of lore.kernel.org
 help / color / mirror / Atom feed
* RE: open/stat64 syscalls run faster on Xen VM than standard Linux
@ 2005-11-28 16:15 Petersson, Mats
  2005-11-28 17:16 ` xuehai zhang
  0 siblings, 1 reply; 21+ messages in thread
From: Petersson, Mats @ 2005-11-28 16:15 UTC (permalink / raw)
  To: xuehai zhang; +Cc: Kate Keahey, Xen Mailing List, Tim Freeman

> -----Original Message-----
> From: xuehai zhang [mailto:hai@cs.uchicago.edu] 
> Sent: 28 November 2005 15:51
> To: Petersson, Mats
> Cc: Anthony Liguori; Xen Mailing List; Kate Keahey; Tim Freeman
> Subject: Re: [Xen-devel] open/stat64 syscalls run faster on 
> Xen VM than standard Linux
> 
> Mats,
> 
> I mounted the loopback file in dom0, chrooted to the 
> mountpoint and redid the experiment. The results is attached 
> below. The time of open and stat64 calls is similar to the 
> XenLinux case and also much smaller than the standard Linux 
> case. So, either using loopback file as backend of XenLinux 
> or directly mounting it in local filesystem will result in 
> some benefit (maybe just caused by the extra layer of block 
> caching) for the performance of some system calls.

Yes, I think the caching of the blocks in two layers will be the reason
you get this effect. The loopback file is cached once in the fs handling
the REAL HARD DISK, and then other blocks would be cached in the fs
handling the loopback. In this case the directory of the file(s)
involved in your benchmark are probably held entirely in memory, whilst
when you use a real disk to do the same thing, you could end up with
some "real" accesses to the disk device itself. 

Next question will probably be why write is slower in Xen+Linux than
native Linux - something I can't say for sure, but I would expect it to
be because the write is going through Xen in the Xen+Linux case and
straight through Linux when in the native linux case. But that's just a
guess. [And since it's slower in Xen, I don't expect you to be surprised
by this]. And the write call is almost identical to the Linux native, as
you'd expect. 

--
Mats
> 
> # strace -c /bin/sh -c /bin/echo foo
> 
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>   21.93    0.000490         490         1           write
>   16.34    0.000365          24        15           old_mmap
>   15.26    0.000341          38         9         3 open
>    9.62    0.000215          43         5           read
>    7.97    0.000178          10        18           brk
>    7.79    0.000174          87         2           munmap
>    4.07    0.000091           8        11           rt_sigaction
>    3.27    0.000073          12         6           close
>    2.91    0.000065          11         6           fstat64
>    2.28    0.000051           9         6           rt_sigprocmask
>    2.15    0.000048          24         2           access
>    1.75    0.000039          13         3           uname
>    1.66    0.000037          19         2           stat64
>    0.40    0.000009           9         1           getpgrp
>    0.40    0.000009           9         1           getuid32
>    0.36    0.000008           8         1           time
>    0.36    0.000008           8         1           getppid
>    0.36    0.000008           8         1           getgid32
>    0.31    0.000007           7         1           getpid
>    0.27    0.000006           6         1           execve
>    0.27    0.000006           6         1           geteuid32
>    0.27    0.000006           6         1           getegid32
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.002234                    95         3 total
> 
> Thanks.
> 
> Xuehai
> 
> 
> Petersson, Mats wrote:
> >>-----Original Message-----
> >>From: xen-devel-bounces@lists.xensource.com
> >>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Anthony 
> >>Liguori
> >>Sent: 28 November 2005 14:39
> >>To: xuehai zhang
> >>Cc: Xen Mailing List
> >>Subject: Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM 
> >>than standard Linux
> >>
> >>This may just be the difference between having the extra level of 
> >>block caching from using a loop back device.
> >>
> >>Try running the same benchmark on a domain that uses an actual 
> >>partition.  While the syscalls may appear to be faster, I 
> imagine it's 
> >>because the cost of pulling in a block has already been 
> payed so the 
> >>overall workload is unaffected.
> > 
> > 
> > And this would be the same as running standard linux with 
> the loopback 
> > file-system mounted and chroot to the local file-system, or 
> would that 
> > be different? [I'm asking because I don't actually 
> understand enough 
> > about how it works to know what difference it makes, and I 
> would like 
> > to know, because at some point I'll probably need to know this.]
> > 
> > --
> > Mats
> > 
> >>Regards,
> >>
> >>Anthony Liguori
> >>
> >>xuehai zhang wrote:
> > 
> > [snip]
> > 
> > 
> 
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread
* RE: open/stat64 syscalls run faster on Xen VM than standard Linux
@ 2005-11-29 12:59 Petersson, Mats
  2005-11-29 13:48 ` xuehai zhang
  0 siblings, 1 reply; 21+ messages in thread
From: Petersson, Mats @ 2005-11-29 12:59 UTC (permalink / raw)
  To: xuehai zhang, Anthony Liguori; +Cc: Tim Freeman, Kate Keahey, Xen Mailing List

 

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of 
> xuehai zhang
> Sent: 29 November 2005 12:40
> To: Anthony Liguori
> Cc: Kate Keahey; Tim Freeman; Xen Mailing List
> Subject: Re: [Xen-devel] open/stat64 syscalls run faster on 
> Xen VM than standard Linux
> 
> Anthony Liguori wrote:
> > This may just be the difference between having the extra level of 
> > block caching from using a loop back device.
> > 
> > Try running the same benchmark on a domain that uses an actual 
> > partition.  While the syscalls may appear to be faster, I 
> imagine it's 
> > because the cost of pulling in a block has already been 
> payed so the 
> > overall workload is unaffected.
> 
> I created a new domainU by using the physical partition 
> instead of the loopback file as the backends of the VBDs and 
> I reran the "strace -c /bin/sh -c /bin/echo foo" benchmark 
> inside of the domU. The following are the results. Comparing 
> with the results for domU with loopback files as VBDs I 
> reported in the previous email (quoted below), the average 
> time of open/stat64 syscalls are very similar, but still much 
> smaller than the values for standard Linux. If the reason 
> that open/stat64 run faster on domU with loopback files as 
> VBDs is because of the extra level of block caching from 
> using a loop back device, why open/stat64 still run 
> similarily faster on domU with physical partition as VBDs 
> when there is no extra level of block caching from using a 
> loop back device?

Again, have you run the same test-case on the same physical partition?
You should be able to do the same trick with mount and chroot for this
one too. 

This is just to rule out other variations (where in the file-system
different files live, just as an example). 

I'm also not entirely convinced that you don't get some double-caching
with this method too - there is the real hard-disk driver in Dom0, then
the virtual disk driver in DomU, and the file-system driver in DomU. I
don't know if the real or virtual block-device driver itself will cache
things or not. Someone who knows may want to comment on this... 

Still, I agree with Anthony - run some REAL loads on the system, rather
than microbenchmark a particular, very small function of the whole
system. Microbenchmarks are very good at showing how long some
particular thing takes, but they are also very sensitive to extremely
small changes in the system architecture, which for a bigger load would
not be noticable. 

I'm doing my best to explain things here, but I'm not sure if I'm
getting the point across, as it seems like you're trying very hard to
get back to the same original question. We do know that Xen behaves
different than Linux. One reason for this is that it's dealing with
hardware devices through a virtual device. This will most of the time
slow things down, but as you've seen, for example due to the
double-caching effect, you may sometimes actually gain from it,
particularly with very small workloads. 

--
Mats
> 
> XenLinux (physical partition as VBDs)
> 
> root@cctest1:~/c# strace -c /bin/sh -c /bin/echo foo
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>   39.56    0.001955        1955         1           write
>   18.94    0.000936         936         1           execve
>    7.65    0.000378          24        16           old_mmap
>    7.57    0.000374          42         9         2 open
>    6.27    0.000310          52         6           read
>    5.10    0.000252          84         3           munmap
>    4.92    0.000243           9        26           brk
>    1.92    0.000095          14         7           close
>    1.78    0.000088           8        11           rt_sigaction
>    1.40    0.000069          10         7           fstat64
>    1.01    0.000050           8         6           rt_sigprocmask
>    0.93    0.000046          23         2           access
>    0.79    0.000039          13         3           uname
>    0.69    0.000034          17         2           stat64
>    0.38    0.000019          19         1           ioctl
>    0.16    0.000008           8         1           getppid
>    0.16    0.000008           8         1           getpgrp
>    0.14    0.000007           7         1           time
>    0.14    0.000007           7         1           getuid32
>    0.14    0.000007           7         1           getgid32
>    0.12    0.000006           6         1           getpid
>    0.12    0.000006           6         1           getegid32
>    0.10    0.000005           5         1           geteuid32
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.004942                   109         2 total
> 
> Thanks.
> 
> Xuehai
> 
> 
> > xuehai zhang wrote:
> > 
> >> Dear all,
> >> When I debugged the execution performance of an application using 
> >> strace, I found there are some system calls like open and stat64 
> >> which run faster on XenLinux than the standard Linux. The 
> following 
> >> is the output of running "strace -c /bin/sh -c /bin/echo 
> foo" on both 
> >> systems. An open call runs averagely 109 usec on standard 
> Linux but 
> >> only 41 usecs on XenLinux. An stat64 call runs
> >> 75 usecs on standard Linux but only 19 usecs on XenLinux.
> >> The Xen VM runs on the same physical machine as the 
> standard Linux. 
> >> It uses loopback files in dom0 as the backends of VBDs.
> >> Any insight is highly appreciated.
> >> Thanks.
> >> Xuehai
> >>
> >> XenLinux:
> >> # strace -c /bin/sh -c /bin/echo foo
> >>
> >> % time     seconds  usecs/call     calls    errors syscall
> >> ------ ----------- ----------- --------- --------- ----------------
> >>  39.05    0.001972        1972         1           write
> >>  19.35    0.000977         977         1           execve
> >>   7.74    0.000391          24        16           old_mmap
> >>   7.23    0.000365          41         9         2 open
> >>   6.06    0.000306          51         6           read
> >>   5.17    0.000261          10        26           brk
> >>   4.93    0.000249          83         3           munmap
> >>   1.98    0.000100          14         7           close
> >>   1.90    0.000096           9        11           rt_sigaction
> >>   1.52    0.000077          11         7           fstat64
> >>   1.01    0.000051           9         6           rt_sigprocmask
> >>   0.95    0.000048          24         2           access
> >>   0.81    0.000041          14         3           uname
> >>   0.75    0.000038          19         2           stat64
> >>   0.38    0.000019          19         1           ioctl
> >>   0.18    0.000009           9         1           time
> >>   0.18    0.000009           9         1           getppid
> >>   0.16    0.000008           8         1           getpgrp
> >>   0.16    0.000008           8         1           getuid32
> >>   0.14    0.000007           7         1           getgid32
> >>   0.12    0.000006           6         1           getpid
> >>   0.12    0.000006           6         1           geteuid32
> >>   0.12    0.000006           6         1           getegid32
> >> ------ ----------- ----------- --------- --------- ----------------
> >> 100.00    0.005050                   109         2 total
> >>
> >>
> >> Standard Linux:
> >> # strace -c /bin/sh -c /bin/echo foo
> >>
> >> % time     seconds  usecs/call     calls    errors syscall
> >> ------ ----------- ----------- --------- --------- ----------------
> >>  22.90    0.000982         109         9         2 open
> >>  22.85    0.000980         980         1           execve
> >>  10.87    0.000466          29        16           old_mmap
> >>  10.45    0.000448         448         1           write
> >>   7.06    0.000303          51         6           read
> >>   6.67    0.000286          10        30           brk
> >>   3.61    0.000155          78         2           access
> >>   3.50    0.000150          75         2           stat64
> >>   2.91    0.000125          42         3           munmap
> >>   2.24    0.000096          14         7           close
> >>   2.12    0.000091          13         7           fstat64
> >>   1.84    0.000079           7        11           rt_sigaction
> >>   1.03    0.000044           7         6           rt_sigprocmask
> >>   0.72    0.000031          10         3           uname
> >>   0.19    0.000008           8         1           geteuid32
> >>   0.16    0.000007           7         1           time
> >>   0.16    0.000007           7         1           getppid
> >>   0.16    0.000007           7         1           getpgrp
> >>   0.16    0.000007           7         1           getuid32
> >>   0.14    0.000006           6         1           getpid
> >>   0.14    0.000006           6         1           getgid32
> >>   0.12    0.000005           5         1           getegid32
> >> ------ ----------- ----------- --------- --------- ----------------
> >> 100.00    0.004289                   112         2 total
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@lists.xensource.com
> >> http://lists.xensource.com/xen-devel
> >>
> > 
> > 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread
* RE: open/stat64 syscalls run faster on Xen VM than standard Linux
@ 2005-11-28 18:17 Petersson, Mats
  0 siblings, 0 replies; 21+ messages in thread
From: Petersson, Mats @ 2005-11-28 18:17 UTC (permalink / raw)
  To: xuehai zhang; +Cc: Kate Keahey, Xen Mailing List, Tim Freeman

> -----Original Message-----
> From: xuehai zhang [mailto:hai@cs.uchicago.edu] 
> Sent: 28 November 2005 17:17
> To: Petersson, Mats
> Cc: Anthony Liguori; Xen Mailing List; Kate Keahey; Tim Freeman
> Subject: Re: [Xen-devel] open/stat64 syscalls run faster on 
> Xen VM than standard Linux
> 
> Petersson, Mats wrote:
> >>-----Original Message-----
> >>From: xuehai zhang [mailto:hai@cs.uchicago.edu]
> >>Sent: 28 November 2005 15:51
> >>To: Petersson, Mats
> >>Cc: Anthony Liguori; Xen Mailing List; Kate Keahey; Tim Freeman
> >>Subject: Re: [Xen-devel] open/stat64 syscalls run faster on Xen VM 
> >>than standard Linux
> >>
> >>Mats,
> >>
> >>I mounted the loopback file in dom0, chrooted to the mountpoint and 
> >>redid the experiment. The results is attached below. The 
> time of open 
> >>and stat64 calls is similar to the XenLinux case and also 
> much smaller 
> >>than the standard Linux case. So, either using loopback file as 
> >>backend of XenLinux or directly mounting it in local 
> filesystem will 
> >>result in some benefit (maybe just caused by the extra 
> layer of block
> >>caching) for the performance of some system calls.
> > 
> > 
> > Yes, I think the caching of the blocks in two layers will be the 
> > reason you get this effect. The loopback file is cached 
> once in the fs 
> > handling the REAL HARD DISK, and then other blocks would be 
> cached in 
> > the fs handling the loopback.
> 
> Is "the fs handling the REAL HD" the dom0's filesystem? Is 
> the cache used here is the dom0's disk buffer cache or 
> something else? What is "the fs handling the loopback"? Is 
> the filesystem seen inside of the XenLinux or still the 
> filesystem of dom0? What is the cache used in this case?

In both native linux and XenLinux, it's the file-system that handles the
actual disk where the VBD-file lives. The software that manages the
file-system, aka file-system driver, will have a block-cache. Since
you're then mounting a VBD through a different (maybe the same type)
file-system driver, that handles this "disk", it will have it's
additional set of block caching. Of course, if a block is cached in the
upper level (VBD) file-system driver, the lower-level file-system driver
doesn't get the call, and thus no action needs to be taken there. It
also means that any access to a DIFFERENT block in the lower level
driver can be cached, because there's now more space in that cache,
since the first cached access didn't even touch the second driver. It's
similar to doubling the block-cache. [The exact effects of doubling the
block-cache would depend on it's internal architecture, and I know
NOTHING about this, so I can't say if it would have the same effect or
different effect]. 
>
> > In this case the directory of the file(s) involved in your 
> benchmark 
> > are probably held entirely in memory, whilst when you use a 
> real disk 
> > to do the same thing, you could end up with some "real" accesses to 
> > the disk device itself.
> 
> To confirm our hypothesis that two layer block caching is the 
> real cause, what experiments I can do to show exactly a block 
> is accessed from a cache instead of hard disk on XenLinux but 
> it has to be read from hard disk on stand Linux? Maybe I can 
> use "vmstat" in dom0 to track block receive/send during the 
> execution of the benchmark.

I think we've just proven that it's the VBD that causes the speed
difference. Within a few percent, your last test showed the same numbers
as the Xen-Linux combination - writes being slower on Xen-linux, which
is what I'd expect, but otherwise the system calls are pretty much the
same. 

I'd expect execve to be much slower on Xen+Linux, as it's messing about
with loading an application and doing page-table manipulations. 

We get similar times for old_mmap, open is about the same, read is
reasonably the same (slower on XenLinux, also expected, I think),
similar for brk, munmap, etc, etc. So the main differences between
XenLinux and native Linux is that you're running one with two levels of
block-caching, and if you don't use the VBD/Loop-back-mounted file as
your "disk", you only get one level of caching and thus slower access. 

> 
> > Next question will probably be why write is slower in 
> Xen+Linux than 
> > native Linux - something I can't say for sure, but I would 
> expect it 
> > to be because the write is going through Xen in the 
> Xen+Linux case and 
> > straight through Linux when in the native linux case. But 
> that's just 
> > a guess. [And since it's slower in Xen, I don't expect you to be 
> > surprised by this]. And the write call is almost identical to the 
> > Linux native, as you'd expect.
> 
> I also agree the overhead of write system call in VM is 
> caused by Xen. I actually run a "dd" 
> benchmark to create a disk file from /dev/zero on both 
> machines and the VM is slower than the physical machine as we expect.
> 
> So, the benchmark experiments I've done so far suggests 
> XenLinux using loopback files as VBD backends shows better 
> performance (faster execution) on part of the system calls 
> like open and stat64, but it shows worse performance (slower 
> execution) on other system calls like write than the standard 
> Linux. Does this mean different applications may have 
> different execution behaviors on VM than on the standard 
> Linux? In other words, some applications run faster on VM and 
> some slower, comparing with the physical machine?

Yes, but for larger (i.e., more realistic) workloads it would probably
even out, since the advantage of any cache is only really useful if you
get lots of hits in that cache, and the smaller cache isn't big enough
already. If you have a cache of, say, 2MB, then the difference between
2MB and 4MB when you access 3.5MB of data would be very noticable
[numbers just pulled from thin air, I have no idea of how much data a
testcase such as yours will access]. If, on the other hand, you run a
full-linux kernel compile that accesses several hundred megabytes of
source-code, include files and generating several megabyes of object
files, symbol files and end-product binary, it would be much less
noticable whether the cache is 2MB or 4MB or even 32MB. And of course,
this would perform a lot of read/write operations that would very likely
eat up any of the previosly gained performance benefit. 

However, performing REAL tasks would be a much better benchmark on the
Xen performance than to microbenchmark some special system call such as
popen, since the REAL task would actually be something that a Xen-Linux
user would perhaps do in their daily work. [Although compiling the
kernel is fairly unlikely, I can certainly see people doing
code-development under Xen-Linux]. 

Running a large database under Xen-Linux would be another likely task,
or a web-server. Some of this would obviously take longer, others may
run faster. I have no idea of which it is. My point is rather that
you're analyzing ONE C-library function, and drawing big conclusions
from it. Running a wider spread of loads on the system would give more
accurate average results. Averages are good for describing something
where you have many different samples, but as my father once said "If
you stand with one foot in ice, and the other in hot water, on average,
your feet are comfortable". So don't JUST look at the average, look at
the max/min values too to determine how things stand.

Yes, I do realize that testing the system (real world) performance in
such tests as Compiling the Linux kernel, or running an SQL database or
web-server are both more complicated, involve less (or no) code-writing
- so less fun, and will not test PARTICULAR points of the kernel, and
will spend a lot of time in user-mode running exactly identical code on
both systems. However, I think it's a better way to get a BIG picture
understanding of differences between Xen and native Linux. 

Btw, I would also suggest that it would make sense to test on a MODERN
system - a Pentium III must be at least 5 years old by now, and if the
rest of the components are equal age, then I suspect that the
performance on a modern machine is at least 2-3x better.


--
Mats
> 
> Thanks.
> 
> Xuehai
> 
> >># strace -c /bin/sh -c /bin/echo foo
> >>
> >>% time     seconds  usecs/call     calls    errors syscall
> >>------ ----------- ----------- --------- --------- ----------------
> >>  21.93    0.000490         490         1           write
> >>  16.34    0.000365          24        15           old_mmap
> >>  15.26    0.000341          38         9         3 open
> >>   9.62    0.000215          43         5           read
> >>   7.97    0.000178          10        18           brk
> >>   7.79    0.000174          87         2           munmap
> >>   4.07    0.000091           8        11           rt_sigaction
> >>   3.27    0.000073          12         6           close
> >>   2.91    0.000065          11         6           fstat64
> >>   2.28    0.000051           9         6           rt_sigprocmask
> >>   2.15    0.000048          24         2           access
> >>   1.75    0.000039          13         3           uname
> >>   1.66    0.000037          19         2           stat64
> >>   0.40    0.000009           9         1           getpgrp
> >>   0.40    0.000009           9         1           getuid32
> >>   0.36    0.000008           8         1           time
> >>   0.36    0.000008           8         1           getppid
> >>   0.36    0.000008           8         1           getgid32
> >>   0.31    0.000007           7         1           getpid
> >>   0.27    0.000006           6         1           execve
> >>   0.27    0.000006           6         1           geteuid32
> >>   0.27    0.000006           6         1           getegid32
> >>------ ----------- ----------- --------- --------- ----------------
> >>100.00    0.002234                    95         3 total
> >>
> >>Thanks.
> >>
> >>Xuehai
> >>
> >>
> >>Petersson, Mats wrote:
> >>
> >>>>-----Original Message-----
> >>>>From: xen-devel-bounces@lists.xensource.com
> >>>>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf 
> Of Anthony 
> >>>>Liguori
> >>>>Sent: 28 November 2005 14:39
> >>>>To: xuehai zhang
> >>>>Cc: Xen Mailing List
> >>>>Subject: Re: [Xen-devel] open/stat64 syscalls run faster 
> on Xen VM 
> >>>>than standard Linux
> >>>>
> >>>>This may just be the difference between having the extra level of 
> >>>>block caching from using a loop back device.
> >>>>
> >>>>Try running the same benchmark on a domain that uses an actual 
> >>>>partition.  While the syscalls may appear to be faster, I
> >>
> >>imagine it's
> >>
> >>>>because the cost of pulling in a block has already been
> >>
> >>payed so the
> >>
> >>>>overall workload is unaffected.
> >>>
> >>>
> >>>And this would be the same as running standard linux with
> >>
> >>the loopback
> >>
> >>>file-system mounted and chroot to the local file-system, or
> >>
> >>would that
> >>
> >>>be different? [I'm asking because I don't actually
> >>
> >>understand enough
> >>
> >>>about how it works to know what difference it makes, and I
> >>
> >>would like
> >>
> >>>to know, because at some point I'll probably need to know this.]
> >>>
> >>>--
> >>>Mats
> >>>
> >>>
> >>>>Regards,
> >>>>
> >>>>Anthony Liguori
> >>>>
> >>>>xuehai zhang wrote:
> >>>
> >>>[snip]
> >>>
> >>>
> >>
> >>
> >>
> > 
> > 
> 
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread
* RE: open/stat64 syscalls run faster on Xen VM than standard Linux
@ 2005-11-28 14:45 Petersson, Mats
  2005-11-28 15:28 ` xuehai zhang
  2005-11-28 15:50 ` xuehai zhang
  0 siblings, 2 replies; 21+ messages in thread
From: Petersson, Mats @ 2005-11-28 14:45 UTC (permalink / raw)
  To: Anthony Liguori, xuehai zhang; +Cc: Xen Mailing List

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of 
> Anthony Liguori
> Sent: 28 November 2005 14:39
> To: xuehai zhang
> Cc: Xen Mailing List
> Subject: Re: [Xen-devel] open/stat64 syscalls run faster on 
> Xen VM than standard Linux
> 
> This may just be the difference between having the extra 
> level of block caching from using a loop back device.
> 
> Try running the same benchmark on a domain that uses an 
> actual partition.  While the syscalls may appear to be 
> faster, I imagine it's because the cost of pulling in a block 
> has already been payed so the overall workload is unaffected.

And this would be the same as running standard linux with the loopback
file-system mounted and chroot to the local file-system, or would that
be different? [I'm asking because I don't actually understand enough
about how it works to know what difference it makes, and I would like to
know, because at some point I'll probably need to know this.]

--
Mats
> 
> Regards,
> 
> Anthony Liguori
> 
> xuehai zhang wrote:
[snip]

^ permalink raw reply	[flat|nested] 21+ messages in thread
* RE: open/stat64 syscalls run faster on Xen VM than standard Linux
@ 2005-11-28 14:36 Petersson, Mats
  2005-11-28 14:47 ` xuehai zhang
  0 siblings, 1 reply; 21+ messages in thread
From: Petersson, Mats @ 2005-11-28 14:36 UTC (permalink / raw)
  To: xuehai zhang, Xen Mailing List

 

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com 
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of 
> xuehai zhang
> Sent: 28 November 2005 08:21
> To: Xen Mailing List
> Subject: [Xen-devel] open/stat64 syscalls run faster on Xen 
> VM than standard Linux
> 
> Dear all,
> When I debugged the execution performance of an application 
> using strace, I found there are some system calls like open 
> and stat64 which run faster on XenLinux than the standard 
> Linux. The following is the output of running "strace -c 
> /bin/sh -c /bin/echo foo" on both systems. An open call runs 
> averagely 109 usec on standard Linux but only 41 usecs on 
> XenLinux. An stat64 call runs
> 75 usecs on standard Linux but only 19 usecs on XenLinux.
> The Xen VM runs on the same physical machine as the standard 
> Linux. It uses loopback files in dom0 as the backends of VBDs.
> Any insight is highly appreciated.

And are you using the same Loopback file-system when running native
Linux? If not, it's probably the main cause of the difference. 

--
Mats

> Thanks.
> Xuehai
> 
[snip of tables]

^ permalink raw reply	[flat|nested] 21+ messages in thread
* open/stat64 syscalls run faster on Xen VM than standard Linux
@ 2005-11-28  8:21 xuehai zhang
  2005-11-28 14:38 ` Anthony Liguori
  0 siblings, 1 reply; 21+ messages in thread
From: xuehai zhang @ 2005-11-28  8:21 UTC (permalink / raw)
  To: Xen Mailing List

Dear all,
When I debugged the execution performance of an application using strace, I found there are some 
system calls like open and stat64 which run faster on XenLinux than the standard Linux. The 
following is the output of running "strace -c /bin/sh -c /bin/echo foo" on both systems. An open 
call runs averagely 109 usec on standard Linux but only 41 usecs on XenLinux. An stat64 call runs
75 usecs on standard Linux but only 19 usecs on XenLinux.
The Xen VM runs on the same physical machine as the standard Linux. It uses loopback files in dom0 
as the backends of VBDs.
Any insight is highly appreciated.
Thanks.
Xuehai

XenLinux:
# strace -c /bin/sh -c /bin/echo foo

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  39.05    0.001972        1972         1           write
  19.35    0.000977         977         1           execve
   7.74    0.000391          24        16           old_mmap
   7.23    0.000365          41         9         2 open
   6.06    0.000306          51         6           read
   5.17    0.000261          10        26           brk
   4.93    0.000249          83         3           munmap
   1.98    0.000100          14         7           close
   1.90    0.000096           9        11           rt_sigaction
   1.52    0.000077          11         7           fstat64
   1.01    0.000051           9         6           rt_sigprocmask
   0.95    0.000048          24         2           access
   0.81    0.000041          14         3           uname
   0.75    0.000038          19         2           stat64
   0.38    0.000019          19         1           ioctl
   0.18    0.000009           9         1           time
   0.18    0.000009           9         1           getppid
   0.16    0.000008           8         1           getpgrp
   0.16    0.000008           8         1           getuid32
   0.14    0.000007           7         1           getgid32
   0.12    0.000006           6         1           getpid
   0.12    0.000006           6         1           geteuid32
   0.12    0.000006           6         1           getegid32
------ ----------- ----------- --------- --------- ----------------
100.00    0.005050                   109         2 total


Standard Linux:
# strace -c /bin/sh -c /bin/echo foo

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  22.90    0.000982         109         9         2 open
  22.85    0.000980         980         1           execve
  10.87    0.000466          29        16           old_mmap
  10.45    0.000448         448         1           write
   7.06    0.000303          51         6           read
   6.67    0.000286          10        30           brk
   3.61    0.000155          78         2           access
   3.50    0.000150          75         2           stat64
   2.91    0.000125          42         3           munmap
   2.24    0.000096          14         7           close
   2.12    0.000091          13         7           fstat64
   1.84    0.000079           7        11           rt_sigaction
   1.03    0.000044           7         6           rt_sigprocmask
   0.72    0.000031          10         3           uname
   0.19    0.000008           8         1           geteuid32
   0.16    0.000007           7         1           time
   0.16    0.000007           7         1           getppid
   0.16    0.000007           7         1           getpgrp
   0.16    0.000007           7         1           getuid32
   0.14    0.000006           6         1           getpid
   0.14    0.000006           6         1           getgid32
   0.12    0.000005           5         1           getegid32
------ ----------- ----------- --------- --------- ----------------
100.00    0.004289                   112         2 total

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2005-11-29 13:48 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-11-28 16:15 open/stat64 syscalls run faster on Xen VM than standard Linux Petersson, Mats
2005-11-28 17:16 ` xuehai zhang
2005-11-28 19:37   ` Anthony Liguori
  -- strict thread matches above, loose matches on Subject: below --
2005-11-29 12:59 Petersson, Mats
2005-11-29 13:48 ` xuehai zhang
2005-11-28 18:17 Petersson, Mats
2005-11-28 14:45 Petersson, Mats
2005-11-28 15:28 ` xuehai zhang
2005-11-28 15:50 ` xuehai zhang
2005-11-28 16:07   ` Stephen C. Tweedie
2005-11-28 16:27     ` xuehai zhang
2005-11-28 17:10       ` Stephen C. Tweedie
2005-11-29  4:50         ` Tim Freeman
2005-11-29 13:27         ` xuehai zhang
2005-11-28 14:36 Petersson, Mats
2005-11-28 14:47 ` xuehai zhang
2005-11-28  8:21 xuehai zhang
2005-11-28 14:38 ` Anthony Liguori
2005-11-28 14:59   ` xuehai zhang
2005-11-29 12:39   ` xuehai zhang
2005-11-29 12:58     ` xuehai zhang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.