* [Qemu-devel] Network Performance between Win Host and Linux
@ 2006-04-11 17:20 Kenneth Duda
2006-04-11 17:28 ` Paul Brook
2006-04-11 22:36 ` [Qemu-devel] " Kenneth Duda
0 siblings, 2 replies; 15+ messages in thread
From: Kenneth Duda @ 2006-04-11 17:20 UTC (permalink / raw)
To: qemu-devel
I am also having severe performance problems using NFS-over-TCP on
qemu-0.8 with a Linux host and guest. I will be looking at this
today. My current theory is that the whole machine is going idle
before qemu decides to poll kernel ring buffers holding packets the
guest is transmitting, but if anyone has actual information, please
let me know.
Thanks,
-Ken
> Hello,
>
> I tried the cvs version from about a week ago with the latest kqemu
> driver, but the network problem still exists. I am using:
>
> qemu -net nic -net tap,ifname=my-tap
>
> under Win2k with a Gentoo guest. The network throughput is about 20 MB
> (per minute!). When I use qemu 0.7.2 with the tap patch:
>
> qemu -tap "my-tap"
>
> the performance is much better (about a factor of 10: 3 MB per second).
> What's going wrong there?
>
> Thanks
>
> Helmut
>
* Re: [Qemu-devel] Network Performance between Win Host and Linux
  2006-04-11 17:20 [Qemu-devel] Network Performance between Win Host and Linux Kenneth Duda
@ 2006-04-11 17:28 ` Paul Brook
  2006-04-11 17:49   ` Kenneth Duda
  2006-04-11 22:36 ` [Qemu-devel] " Kenneth Duda
  1 sibling, 1 reply; 15+ messages in thread
From: Paul Brook @ 2006-04-11 17:28 UTC (permalink / raw)
To: qemu-devel

On Tuesday 11 April 2006 18:20, Kenneth Duda wrote:
> I am also having severe performance problems using NFS-over-TCP on
> qemu-0.8 with a Linux host and guest. I will be looking at this
> today. My current theory is that the whole machine is going idle
> before qemu decides to poll kernel ring buffers holding packets the
> guest is transmitting, but if anyone has actual information, please
> let me know.

You could be suffering from high interrupt latency. If the guest CPU is
not idle then qemu only checks for interrupts (eg. the network RX
interrupt) every 1ms or 1/host_HZ seconds, whichever is greater.
If the guest CPU is idle it should respond immediately.
I wouldn't be surprised if this problem is worse when using kqemu.

Paul
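To put rough numbers on the latency Paul describes (the HZ value below is an assumption for illustration; hosts of the era ran anywhere from 100 to 1000): with host_HZ = 250, a busy guest's RX interrupt can be delivered up to 4 ms late, which caps a synchronous request/response protocol such as NFS-over-TCP at roughly 250 round trips per second. A minimal sketch of the arithmetic:

```c
/* Back-of-the-envelope estimate of the RPC-rate ceiling imposed by
 * timer-driven interrupt delivery.  host_hz = 250 is an assumed value. */
#include <stdio.h>

int main(void)
{
    const double host_hz = 250.0;           /* assumed host timer rate */
    double poll_interval = 1.0 / host_hz;   /* 4 ms between checks */
    if (poll_interval < 0.001)
        poll_interval = 0.001;              /* qemu's 1 ms floor */
    printf("worst-case extra latency: %.1f ms\n", poll_interval * 1e3);
    printf("sync RPC ceiling: ~%.0f ops/sec\n", 1.0 / poll_interval);
    return 0;
}
```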
* Re: [Qemu-devel] Network Performance between Win Host and Linux
  2006-04-11 17:28 ` Paul Brook
@ 2006-04-11 17:49   ` Kenneth Duda
  2006-04-11 18:19     ` Helmut Auer
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Kenneth Duda @ 2006-04-11 17:49 UTC (permalink / raw)
To: Paul Brook; +Cc: qemu-devel

Paul, thanks for the note.

In my case, the guest CPU is idle. The host CPU utilization is only 5
or 10 percent when running "find / -print > /dev/null" on the guest.
So I don't think guest interrupt latency is the issue for me in this
case.

My first guess is that qemu is asleep when the NFS response arrives on
the slirp socket, and stays asleep for several milliseconds before
deciding to check if anything has shown up via slirp. The problem is
that vl.c's main_loop_wait() has separate calls to select() for slirp
versus non-slirp fd's. I think this is the problem because strace
reveals qemu blocking for several milliseconds at a time in select(),
waking up with a SIGALRM, and then polling slirp and finding stuff to
do there. These select calls don't appear hard to integrate, and the
author seems to feel this would be a good idea anyway; from vl.c:

    #if defined(CONFIG_SLIRP)
        /* XXX: merge with the previous select() */
        if (slirp_inited) {

I will take a swing at this first. Please let me know if there's
anything I should be aware of.

Thanks,
    -Ken

On 4/11/06, Paul Brook <paul@codesourcery.com> wrote:
> On Tuesday 11 April 2006 18:20, Kenneth Duda wrote:
> > I am also having severe performance problems using NFS-over-TCP on
> > qemu-0.8 with a Linux host and guest. I will be looking at this
> > today. My current theory is that the whole machine is going idle
> > before qemu decides to poll kernel ring buffers holding packets the
> > guest is transmitting, but if anyone has actual information, please
> > let me know.
>
> You could be suffering from high interrupt latency. If the guest CPU is not
> idle then qemu only checks for interrupts (eg. the network RX interrupt)
> every 1ms or 1/host_HZ seconds, whichever is greater.
> If the guest CPU is idle it should respond immediately.
> I wouldn't be surprised if this problem is worse when using kqemu.
>
> Paul
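A minimal sketch of the merge Ken proposes — one select() covering both the device fds and slirp's sockets. The slirp_select_fill()/slirp_select_poll() entry points are the real ones used by the patches later in this thread, but the device-handler loop is reduced to a single stand-in fd here, so this is an illustration rather than the actual patch:

```c
#include <sys/select.h>
#include <sys/time.h>

/* Sketch of a merged main_loop_wait(): one select() wakes qemu for the
 * device fd and (when CONFIG_SLIRP is defined) slirp's sockets alike.
 * device_fd stands in for qemu's io-handler list. */
static void merged_wait(int device_fd, int timeout_ms)
{
    fd_set rfds, wfds, xfds;
    int nfds = -1;
    struct timeval tv = { 0, timeout_ms * 1000 };

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_ZERO(&xfds);

    FD_SET(device_fd, &rfds);       /* stand-in for the handler loop */
    if (device_fd > nfds)
        nfds = device_fd;

#if defined(CONFIG_SLIRP)
    slirp_select_fill(&nfds, &rfds, &wfds, &xfds);  /* slirp adds its fds */
#endif

    if (select(nfds + 1, &rfds, &wfds, &xfds, &tv) > 0) {
        if (FD_ISSET(device_fd, &rfds)) {
            /* ... dispatch the device read handler ... */
        }
#if defined(CONFIG_SLIRP)
        slirp_select_poll(&rfds, &wfds, &xfds);     /* slirp services its fds */
#endif
    }
}
```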
* Re: [Qemu-devel] Network Performance between Win Host and Linux
  2006-04-11 17:49 ` Kenneth Duda
@ 2006-04-11 18:19   ` Helmut Auer
  2006-04-12  2:10     ` Kazu
  2006-04-11 20:40   ` Leonardo E. Reiter
  2006-04-11 21:00   ` Leonardo E. Reiter
  2 siblings, 1 reply; 15+ messages in thread
From: Helmut Auer @ 2006-04-11 18:19 UTC (permalink / raw)
To: qemu-devel

Hello

> In my case, the guest CPU is idle. The host CPU utilization is only 5
> or 10 percent when running "find / -print > /dev/null" on the guest.
> So I don't think guest interrupt latency is the issue for me in this
> case.

In my environment the performance of about 300 KB is the good case,
that means the cpu is idle. When the cpu is busy it degrades to 20 KB
in the worst case.
As I said before, with the tap-patched qemu 0.7.2 it is about 10 times
faster.

--
Helmut Auer, helmut@helmutauer.de
* Re: [Qemu-devel] Network Performance between Win Host and Linux
  2006-04-11 18:19     ` Helmut Auer
@ 2006-04-12  2:10       ` Kazu
  0 siblings, 0 replies; 15+ messages in thread
From: Kazu @ 2006-04-12 2:10 UTC (permalink / raw)
To: qemu-devel

Hi,

I have already made a patch. Try this.
http://lists.gnu.org/archive/html/qemu-devel/2006-03/msg00041.html

Regards,
Kazu

Sent: Wednesday, April 12, 2006 3:19 AM
Helmut Auer wrote:

> Hello
>> In my case, the guest CPU is idle. The host CPU utilization is only 5
>> or 10 percent when running "find / -print > /dev/null" on the guest.
>> So I don't think guest interrupt latency is the issue for me in this
>> case.
>>
> In my environment the performance of about 300 KB is the good case,
> that means the cpu is idle. When the cpu is busy it degrades to 20 KB
> in the worst case.
> As I said before, with the tap-patched qemu 0.7.2 it is about 10 times
> faster.
>
> --
> Helmut Auer, helmut@helmutauer.de
* Re: [Qemu-devel] Network Performance between Win Host and Linux
  2006-04-11 17:49 ` Kenneth Duda
  2006-04-11 18:19   ` Helmut Auer
@ 2006-04-11 20:40   ` Leonardo E. Reiter
  2006-04-11 21:46     ` Kenneth Duda
  2006-04-11 21:00   ` Leonardo E. Reiter
  2 siblings, 1 reply; 15+ messages in thread
From: Leonardo E. Reiter @ 2006-04-11 20:40 UTC (permalink / raw)
To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2411 bytes --]

Hi Ken,

I'm attaching a pretty old patch I made (from the 0.7.1 days), which
did a quick and dirty merge of the select's. It's not something that
is clean and it will need adapting to 0.8.0... but, I figure you could
draw some quick hints on how to merge the 2. Basically it fills the
select bitmaps when it walks through the fd's the first time, then
calls select instead of poll. It also has slirp fill its own bits
(fd's) in before calling select. So this is condensed to 1 select
call.

Do what you want with the code - like I said, it's messy and old. But
maybe you can at least use it to quickly test your hypothesis. I'd be
interested in learning about any benchmarks you come up with if you
merge the select+poll. Also, it may not be valid at all on Windows
hosts since there is a question about select() being interrupted
properly on those hosts - it should work on Linux/BSD.

Regards,

Leo Reiter

P.S. this patch should be applied with -p1, not -p0 like my newer
patches are applied. Sorry for that - like I said, it's quite old.

Kenneth Duda wrote:
> Paul, thanks for the note.
>
> In my case, the guest CPU is idle. The host CPU utilization is only 5
> or 10 percent when running "find / -print > /dev/null" on the guest.
> So I don't think guest interrupt latency is the issue for me in this
> case.
>
> My first guess is that qemu is asleep when the NFS response arrives on
> the slirp socket, and stays asleep for several milliseconds before
> deciding to check if anything has shown up via slirp. The problem is
> that vl.c's main_loop_wait() has separate calls to select() for slirp
> versus non-slirp fd's. I think this is the problem because strace
> reveals qemu blocking for several milliseconds at a time in select(),
> waking up with a SIGALRM, and then polling slirp and finding stuff to
> do there. These select calls don't appear hard to integrate, and the
> author seems to feel this would be a good idea anyway; from vl.c:
>
>     #if defined(CONFIG_SLIRP)
>         /* XXX: merge with the previous select() */
>         if (slirp_inited) {
>
> I will take a swing at this first. Please let me know if there's
> anything I should be aware of.
>
> Thanks,
>     -Ken

--
Leonardo E. Reiter
Vice President of Product Development, CTO

Win4Lin, Inc.
Virtual Computing from Desktop to Data Center
Main: +1 512 339 7979
Fax: +1 512 532 6501
http://www.win4lin.com

[-- Attachment #2: qemu-vl-select.patch --]
[-- Type: text/x-patch, Size: 4168 bytes --]

--- qemu/vl.c	2005-05-11 17:10:02.000000000 -0400
+++ qemu-select/vl.c	2005-05-11 17:13:24.000000000 -0400
@@ -2598,51 +2598,85 @@
 void main_loop_wait(int timeout)
 {
 #ifndef _WIN32
-    struct pollfd ufds[MAX_IO_HANDLERS + 1], *pf;
     IOHandlerRecord *ioh, *ioh_next;
     uint8_t buf[4096];
     int n, max_size;
 #endif
     int ret;
+#if defined(CONFIG_SLIRP) || !defined(_WIN32)
+    fd_set rfds, wfds, xfds;
+    int nfds;
+    struct timeval tv;
+#endif
+#if defined(CONFIG_SLIRP)
+    int slirp_nfds;
+#endif
 
 #ifdef _WIN32
     if (timeout > 0)
         Sleep(timeout);
+
+#if defined(CONFIG_SLIRP)
+    /* XXX: merge with poll() */
+    if (slirp_inited) {
+
+        nfds = -1;
+        FD_ZERO(&rfds);
+        FD_ZERO(&wfds);
+        FD_ZERO(&xfds);
+        slirp_select_fill(&nfds, &rfds, &wfds, &xfds);
+        tv.tv_sec = 0;
+        tv.tv_usec = 0;
+        ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
+        if (ret >= 0) {
+            slirp_select_poll(&rfds, &wfds, &xfds);
+        }
+    }
+#endif
 #else
     /* poll any events */
     /* XXX: separate device handlers from system ones */
-    pf = ufds;
+    FD_ZERO(&rfds);
+    FD_ZERO(&wfds);
+    FD_ZERO(&xfds);
+    nfds = -1;
     for(ioh = first_io_handler; ioh != NULL; ioh = ioh->next) {
         if (!ioh->fd_can_read) {
+            FD_SET(ioh->fd, &rfds);
             max_size = 0;
-            pf->fd = ioh->fd;
-            pf->events = POLLIN;
-            ioh->ufd = pf;
-            pf++;
+            if (ioh->fd > nfds)
+                nfds = ioh->fd;
         } else {
            max_size = ioh->fd_can_read(ioh->opaque);
            if (max_size > 0) {
                if (max_size > sizeof(buf))
                    max_size = sizeof(buf);
-                pf->fd = ioh->fd;
-                pf->events = POLLIN;
-                ioh->ufd = pf;
-                pf++;
-            } else {
-                ioh->ufd = NULL;
+                FD_SET(ioh->fd, &rfds);
+                if (ioh->fd > nfds)
+                    nfds = ioh->fd;
            }
        }
        ioh->max_size = max_size;
    }
+
+#if defined(CONFIG_SLIRP)
+    if (slirp_inited) {
+        slirp_nfds = -1;
+        slirp_select_fill(&slirp_nfds, &rfds, &wfds, &xfds);
+        if (slirp_nfds > nfds)
+            nfds = slirp_nfds;
+    }
+#endif /* CONFIG_SLIRP */
+
+    tv.tv_sec = 0;
+    tv.tv_usec = timeout * 1000;
+    ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
 
-    ret = poll(ufds, pf - ufds, timeout);
     if (ret > 0) {
        /* XXX: better handling of removal */
        for(ioh = first_io_handler; ioh != NULL; ioh = ioh_next) {
            ioh_next = ioh->next;
-            pf = ioh->ufd;
-            if (pf) {
-                if (pf->revents & POLLIN) {
+            if (FD_ISSET(ioh->fd, &rfds)) {
                 if (ioh->max_size == 0) {
                     /* just a read event */
                     ioh->fd_read(ioh->opaque, NULL, 0);
@@ -2654,31 +2688,16 @@
                         ioh->fd_read(ioh->opaque, NULL, -errno);
                     }
                 }
-                }
-            }
+            }
         }
-    }
-#endif /* !defined(_WIN32) */
 
-#if defined(CONFIG_SLIRP)
-    /* XXX: merge with poll() */
-    if (slirp_inited) {
-        fd_set rfds, wfds, xfds;
-        int nfds;
-        struct timeval tv;
 
-        nfds = -1;
-        FD_ZERO(&rfds);
-        FD_ZERO(&wfds);
-        FD_ZERO(&xfds);
-        slirp_select_fill(&nfds, &rfds, &wfds, &xfds);
-        tv.tv_sec = 0;
-        tv.tv_usec = 0;
-        ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
-        if (ret >= 0) {
+#if defined(CONFIG_SLIRP)
+        if (slirp_inited)
             slirp_select_poll(&rfds, &wfds, &xfds);
-        }
     }
-#endif
+#endif /* defined(CONFIG_SLIRP) */
+
+#endif /* !defined(_WIN32) */
 
     if (vm_running) {
         qemu_run_timers(&active_timers[QEMU_TIMER_VIRTUAL],
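For readers comparing the two interfaces Leo's patch bridges: poll() registers one struct per descriptor, while select() sets one bit per descriptor and tracks a running maximum fd — which is why slirp, whose slirp_select_fill() works on fd_sets, can only merge into a select()-based loop. A side-by-side sketch of an equivalent single-fd wait (illustrative helper functions, not qemu code):

```c
#include <poll.h>
#include <sys/select.h>

/* Wait up to timeout_ms for fd to become readable, poll() style. */
static int wait_readable_poll(int fd, int timeout_ms)
{
    struct pollfd pf = { .fd = fd, .events = POLLIN };
    return poll(&pf, 1, timeout_ms);
}

/* The same wait expressed with select(): one bit per fd, plus the
 * highest-numbered descriptor + 1 as the first argument. */
static int wait_readable_select(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv = { .tv_sec  = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```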
* Re: [Qemu-devel] Network Performance between Win Host and Linux
  2006-04-11 20:40   ` Leonardo E. Reiter
@ 2006-04-11 21:46     ` Kenneth Duda
  2006-04-11 21:58       ` Leonardo E. Reiter
  0 siblings, 1 reply; 15+ messages in thread
From: Kenneth Duda @ 2006-04-11 21:46 UTC (permalink / raw)
To: qemu-devel

Thanks, Leo. It appears your patch or something similar has made it
into 0.8.0. I have already merged the select loops, but it didn't
help as much as I hoped, maybe 10%. A much bigger improvement was
made by fixing the badly hacked slirp DELACK behavior. Believe it or
not, slirp delays all TCP acks *unless* the segment data starts with
an escape character, I kid you not. I threw that out, and have made
slirp's tcp_input rfc2581 compliant (to my shallow reading of the rfc)
and that boosted throughput from vm->host by 3.5x, to 56 megabits
(from 16 megabits). The performance from host->vm was helped less,
and that was because of another hack in slirp that was causing it to
get the wrong MSS --- it was sending 512 byte segments. Now, I'm
looking at excessive numbers of retransmissions (believe it or not)
--- I suspect the ne2000 ring buffer is overflowing but I'm not yet
sure. I will post a patch including all of these things when I'm
done. I'm expecting a significant aggregate improvement.

-Ken

On 4/11/06, Leonardo E. Reiter <lreiter@win4lin.com> wrote:
> Hi Ken,
>
> I'm attaching a pretty old patch I made (from the 0.7.1 days), which did
> a quick and dirty merge of the select's. It's not something that is
> clean and it will need adapting to 0.8.0... but, I figure you could draw
> some quick hints on how to merge the 2. Basically it fills the select
> bitmaps when it walks through the fd's the first time, then calls select
> instead of poll. It also has slirp fill its own bits (fd's) in before
> calling select. So this is condensed to 1 select call.
>
> Do what you want with the code - like I said, it's messy and old. But
> maybe you can at least use it to quickly test your hypothesis. I'd be
> interested in learning about any benchmarks you come up with if you
> merge the select+poll. Also, it may not be valid at all on Windows
> hosts since there is a question about select() being interrupted
> properly on those hosts - it should work on Linux/BSD.
>
> Regards,
>
> Leo Reiter
>
> P.S. this patch should be applied with -p1, not -p0 like my newer
> patches are applied. Sorry for that - like I said, it's quite old.
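The 512-versus-1460 MSS figures Ken mentions come straight from Ethernet framing arithmetic; a quick sanity check (standard header sizes, no IP or TCP options assumed):

```c
/* TCP MSS over Ethernet: the MTU minus the fixed IP and TCP header
 * sizes.  With 512-byte segments, slirp wasted almost two thirds of
 * each frame's payload capacity. */
#define ETH_MTU       1500
#define IP_HDR_LEN      20
#define TCP_HDR_LEN     20
#define ETH_TCP_MSS  (ETH_MTU - IP_HDR_LEN - TCP_HDR_LEN)   /* = 1460 */
```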
* Re: [Qemu-devel] Network Performance between Win Host and Linux
  2006-04-11 21:46     ` Kenneth Duda
@ 2006-04-11 21:58       ` Leonardo E. Reiter
  2006-04-11 22:42         ` Kenneth Duda
  0 siblings, 1 reply; 15+ messages in thread
From: Leonardo E. Reiter @ 2006-04-11 21:58 UTC (permalink / raw)
To: qemu-devel

Yes... I sent a follow-up note after I looked at the latest vl.c with
a newer patch applied. Much simpler.

As for the delayed acks, I've seen this and removed the delay for
testing before. I read in the comment (not sure if it was Fabrice or
the slirp author) about how the delay was 1 of 3 methods that had been
chosen as sort of a "compromise." I recall testing newer versions of
the code and not having as much of an issue with the delayed ack as
before, so I figured Paul's performance fixes had addressed that
somewhat (they definitely helped tremendously for receiving data). In
any case, it's good that you are taking a scientific approach to
addressing this. I personally think that slirp is a great idea for
networking, for most uses, because it's totally in userspace, etc.,
etc. But let's keep in mind that the original code was designed to
meet the performance criteria of a serial line ;) The work you are
doing should help in bringing that more up to date. I'd be glad to
help with any testing if/when you have patches.

Thanks,

Leo Reiter

Kenneth Duda wrote:
> Thanks, Leo. It appears your patch or something similar has made it
> into 0.8.0. I have already merged the select loops, but it didn't
> help as much as I hoped, maybe 10%. A much bigger improvement was
> made by fixing the badly hacked slirp DELACK behavior. Believe it or
> not, slirp delays all TCP acks *unless* the segment data starts with
> an escape character, I kid you not. I threw that out, and have made
> slirp's tcp_input rfc2581 compliant (to my shallow reading of the rfc)
> and that boosted throughput from vm->host by 3.5x, to 56 megabits
> (from 16 megabits). The performance from host->vm was helped less,
> and that was because of another hack in slirp that was causing it to
> get the wrong MSS --- it was sending 512 byte segments. Now, I'm
> looking at excessive numbers of retransmissions (believe it or not)
> --- I suspect the ne2000 ring buffer is overflowing but I'm not yet
> sure. I will post a patch including all of these things when I'm
> done. I'm expecting a significant aggregate improvement.
>
> -Ken
>
> On 4/11/06, Leonardo E. Reiter <lreiter@win4lin.com> wrote:
>
>> Hi Ken,
>>
>> I'm attaching a pretty old patch I made (from the 0.7.1 days), which did
>> a quick and dirty merge of the select's. It's not something that is
>> clean and it will need adapting to 0.8.0... but, I figure you could draw
>> some quick hints on how to merge the 2. Basically it fills the select
>> bitmaps when it walks through the fd's the first time, then calls select
>> instead of poll. It also has slirp fill its own bits (fd's) in before
>> calling select. So this is condensed to 1 select call.
>>
>> Do what you want with the code - like I said, it's messy and old. But
>> maybe you can at least use it to quickly test your hypothesis. I'd be
>> interested in learning about any benchmarks you come up with if you
>> merge the select+poll. Also, it may not be valid at all on Windows
>> hosts since there is a question about select() being interrupted
>> properly on those hosts - it should work on Linux/BSD.
>>
>> Regards,
>>
>> Leo Reiter
>>
>> P.S. this patch should be applied with -p1, not -p0 like my newer
>> patches are applied. Sorry for that - like I said, it's quite old.

--
Leonardo E. Reiter
Vice President of Product Development, CTO

Win4Lin, Inc.
Virtual Computing from Desktop to Data Center
Main: +1 512 339 7979
Fax: +1 512 532 6501
http://www.win4lin.com
* Re: [Qemu-devel] Network Performance between Win Host and Linux
  2006-04-11 21:58       ` Leonardo E. Reiter
@ 2006-04-11 22:42         ` Kenneth Duda
  0 siblings, 0 replies; 15+ messages in thread
From: Kenneth Duda @ 2006-04-11 22:42 UTC (permalink / raw)
To: qemu-devel

I was confused by the comments around the delaying of acks. Delaying
these acks didn't make intuitive sense to me and is inconsistent with
RFC 2581, which states:

    ... a TCP receiver MUST NOT excessively delay acknowledgments.
    Specifically, an ACK SHOULD be generated for at least every
    second full-sized segment, and MUST be generated within 500 ms
    of the arrival of the first unacknowledged packet.

I have implemented things so that acks are never delayed, which is
simplest and seems fine in the environment where I imagine
slirp-within-qemu is being used (simulated ethernets). I'm interested
in other viewpoints.

-Ken

On 4/11/06, Leonardo E. Reiter <lreiter@win4lin.com> wrote:
> Yes... I sent a follow-up note after I looked at the latest vl.c with a
> newer patch applied. Much simpler.
>
> As for the delayed acks, I've seen this and removed the delay for testing
> before. I read in the comment (not sure if it was Fabrice or the slirp
> author) about how the delay was 1 of 3 methods that had been chosen as
> sort of a "compromise." I recall testing newer versions of the code and
> not having as much of an issue with the delayed ack as before, so I
> figured Paul's performance fixes had addressed that somewhat (they
> definitely helped tremendously for receiving data). In any case, it's
> good that you are taking a scientific approach to addressing this. I
> personally think that slirp is a great idea for networking, for most
> uses, because it's totally in userspace, etc., etc. But let's keep in
> mind that the original code was designed to meet the performance
> criteria of a serial line ;) The work you are doing should help in
> bringing that more up to date. I'd be glad to help with any testing
> if/when you have patches.
>
> Thanks,
>
> Leo Reiter
>
> Kenneth Duda wrote:
> > Thanks, Leo. It appears your patch or something similar has made it
> > into 0.8.0. I have already merged the select loops, but it didn't
> > help as much as I hoped, maybe 10%. A much bigger improvement was
> > made by fixing the badly hacked slirp DELACK behavior. Believe it or
> > not, slirp delays all TCP acks *unless* the segment data starts with
> > an escape character, I kid you not. I threw that out, and have made
> > slirp's tcp_input rfc2581 compliant (to my shallow reading of the rfc)
> > and that boosted throughput from vm->host by 3.5x, to 56 megabits
> > (from 16 megabits). The performance from host->vm was helped less,
> > and that was because of another hack in slirp that was causing it to
> > get the wrong MSS --- it was sending 512 byte segments. Now, I'm
> > looking at excessive numbers of retransmissions (believe it or not)
> > --- I suspect the ne2000 ring buffer is overflowing but I'm not yet
> > sure. I will post a patch including all of these things when I'm
> > done. I'm expecting a significant aggregate improvement.
> >
> > -Ken
> >
> > On 4/11/06, Leonardo E. Reiter <lreiter@win4lin.com> wrote:
> >
> >> Hi Ken,
> >>
> >> I'm attaching a pretty old patch I made (from the 0.7.1 days), which did
> >> a quick and dirty merge of the select's. It's not something that is
> >> clean and it will need adapting to 0.8.0... but, I figure you could draw
> >> some quick hints on how to merge the 2. Basically it fills the select
> >> bitmaps when it walks through the fd's the first time, then calls select
> >> instead of poll. It also has slirp fill its own bits (fd's) in before
> >> calling select. So this is condensed to 1 select call.
> >>
> >> Do what you want with the code - like I said, it's messy and old. But
> >> maybe you can at least use it to quickly test your hypothesis. I'd be
> >> interested in learning about any benchmarks you come up with if you
> >> merge the select+poll. Also, it may not be valid at all on Windows
> >> hosts since there is a question about select() being interrupted
> >> properly on those hosts - it should work on Linux/BSD.
> >>
> >> Regards,
> >>
> >> Leo Reiter
> >>
> >> P.S. this patch should be applied with -p1, not -p0 like my newer
> >> patches are applied. Sorry for that - like I said, it's quite old.
>
> --
> Leonardo E. Reiter
> Vice President of Product Development, CTO
>
> Win4Lin, Inc.
> Virtual Computing from Desktop to Data Center
> Main: +1 512 339 7979
> Fax: +1 512 532 6501
> http://www.win4lin.com
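For reference, a minimal model of the RFC 2581 receiver behaviour Ken quotes — ACK every second full-sized segment immediately, otherwise allow a delayed ACK. The struct and flag names imitate BSD's tcpcb but are local to this sketch, not slirp's actual code:

```c
/* Minimal model of an RFC 2581-compliant ack decision. */
enum { TF_ACKNOW = 1, TF_DELACK = 2 };

struct rcv_state {
    int flags;
    int unacked_full_segs;   /* full-sized segments since the last ACK */
};

/* RFC 2581: ACK at least every second full-sized segment; otherwise a
 * delayed ACK is allowed, but it must fire within 500 ms. */
static void on_segment(struct rcv_state *s, unsigned len, unsigned mss)
{
    if (len >= mss)
        s->unacked_full_segs++;

    if (s->unacked_full_segs >= 2) {
        s->flags |= TF_ACKNOW;    /* ack immediately */
        s->unacked_full_segs = 0;
    } else {
        s->flags |= TF_DELACK;    /* delack timer (< 500 ms) will ack */
    }
}
```

Ken's patch goes one step further and sets TF_ACKNOW unconditionally, which RFC 2581 permits (delaying is a SHOULD, not a MUST) and which is what recovered the send window on the guest side.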
* Re: [Qemu-devel] Network Performance between Win Host and Linux
  2006-04-11 17:49 ` Kenneth Duda
  2006-04-11 18:19   ` Helmut Auer
  2006-04-11 20:40   ` Leonardo E. Reiter
@ 2006-04-11 21:00   ` Leonardo E. Reiter
  2 siblings, 0 replies; 15+ messages in thread
From: Leonardo E. Reiter @ 2006-04-11 21:00 UTC (permalink / raw)
To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 835 bytes --]

Hi Ken,

please disregard my last mail on this... here's a current patch
against today's CVS. I didn't realize that vl.c already converted from
poll() to select(), so the patch logic is much easier and cleaner.
Check it out... I tested it minimally and it seems to work - only
tested it on Linux host,

Leo

P.S. you can apply this one with -p0 arg to patch.

Kenneth Duda wrote:
> Paul, thanks for the note.
>
> In my case, the guest CPU is idle. The host CPU utilization is only 5
> or 10 percent when running "find / -print > /dev/null" on the guest.
> So I don't think guest interrupt latency is the issue for me in this
> case.

--
Leonardo E. Reiter
Vice President of Product Development, CTO

Win4Lin, Inc.
Virtual Computing from Desktop to Data Center
Main: +1 512 339 7979
Fax: +1 512 532 6501
http://www.win4lin.com

[-- Attachment #2: qemu-select-merge.patch --]
[-- Type: text/x-patch, Size: 2057 bytes --]

Index: vl.c
===================================================================
RCS file: /cvsroot/qemu/qemu/vl.c,v
retrieving revision 1.168
diff -a -u -r1.168 vl.c
--- vl.c	9 Apr 2006 01:32:52 -0000	1.168
+++ vl.c	11 Apr 2006 20:56:56 -0000
@@ -3952,8 +3952,11 @@
 void main_loop_wait(int timeout)
 {
     IOHandlerRecord *ioh, *ioh_next;
-    fd_set rfds, wfds;
+    fd_set rfds, wfds, xfds;
     int ret, nfds;
+#if defined(CONFIG_SLIRP)
+    int slirp_nfds;
+#endif
     struct timeval tv;
 
 #ifdef _WIN32
@@ -3967,6 +3970,7 @@
     nfds = -1;
     FD_ZERO(&rfds);
     FD_ZERO(&wfds);
+    FD_ZERO(&xfds);
     for(ioh = first_io_handler; ioh != NULL; ioh = ioh->next) {
         if (ioh->fd_read &&
             (!ioh->fd_read_poll ||
@@ -3988,7 +3992,14 @@
 #else
     tv.tv_usec = timeout * 1000;
 #endif
-    ret = select(nfds + 1, &rfds, &wfds, NULL, &tv);
+#if defined(CONFIG_SLIRP)
+    if (slirp_inited) {
+        slirp_select_fill(&slirp_nfds, &rfds, &wfds, &xfds);
+        if (slirp_nfds > nfds)
+            nfds = slirp_nfds;
+    }
+#endif
+    ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
     if (ret > 0) {
         /* XXX: better handling of removal */
         for(ioh = first_io_handler; ioh != NULL; ioh = ioh_next) {
@@ -4000,30 +4011,14 @@
                 ioh->fd_write(ioh->opaque);
             }
         }
-    }
-#ifdef _WIN32
-    tap_win32_poll();
-#endif
-
 #if defined(CONFIG_SLIRP)
-    /* XXX: merge with the previous select() */
-    if (slirp_inited) {
-        fd_set rfds, wfds, xfds;
-        int nfds;
-        struct timeval tv;
-
-        nfds = -1;
-        FD_ZERO(&rfds);
-        FD_ZERO(&wfds);
-        FD_ZERO(&xfds);
-        slirp_select_fill(&nfds, &rfds, &wfds, &xfds);
-        tv.tv_sec = 0;
-        tv.tv_usec = 0;
-        ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
-        if (ret >= 0) {
+        if (slirp_inited)
             slirp_select_poll(&rfds, &wfds, &xfds);
-        }
+#endif
     }
+
+#ifdef _WIN32
+    tap_win32_poll();
 #endif
 
     if (vm_running) {
* [Qemu-devel] Re: Network Performance between Win Host and Linux
  2006-04-11 17:20 [Qemu-devel] Network Performance between Win Host and Linux Kenneth Duda
  2006-04-11 17:28 ` Paul Brook
@ 2006-04-11 22:36 ` Kenneth Duda
  2006-04-12 14:04   ` Leonardo E. Reiter
  2006-04-12 14:31   ` Leonardo E. Reiter
  1 sibling, 2 replies; 15+ messages in thread
From: Kenneth Duda @ 2006-04-11 22:36 UTC (permalink / raw)
To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1476 bytes --]

The "qemu-slirp-performance" patch contains three improvements to qemu
slirp networking performance. Booting my virtual machine (which
NFS-mounts its root filesystem from the host) has been accelerated by
8x, from over 5 minutes to 40 seconds. TCP throughput has been
accelerated from about 2 megabytes/sec to 9 megabytes/sec, in both
directions (measured using a simple python script). The system is
subjectively more responsive (for activities such as logging in or
running simple python scripts).

The specific problems fixed are:

  - the mss for the slirp-to-vm direction was 512 bytes (now 1460);
  - qemu would block in select() for up to four milliseconds at a
    time, even when data was waiting on slirp sockets;
  - slirp was deliberately delaying acks until timer expiration
    (TF_DELACK), preventing the vm from opening its send window, in
    violation of rfc2581.

These fixes are together in one patch (qemu-slirp-performance.patch).

I have also attached some related patches that fix fairly serious
slirp bugs for IP datagrams larger than 4k. Before these patches,
large packets can corrupt the heap or get reassembled in some
entertaining but incorrect orders (because ip_off was being sorted as
though it was signed!) These patches are attached in the order I
apply them.

I hope they are helpful. If there's anything I can do to make them
more likely to be accepted into the mainline, please let me know.

Thanks,
   -Ken

[-- Attachment #2: qemu-slirp-mbuf-bug.patch --]
[-- Type: text/plain, Size: 888 bytes --]

diff -BurN qemu-snapshot-2006-03-27_23.orig/slirp/mbuf.c qemu-snapshot-2006-03-27_23/slirp/mbuf.c
--- qemu-snapshot-2006-03-27_23.orig/slirp/mbuf.c	2004-04-22 00:10:47.000000000 +0000
+++ qemu-snapshot-2006-03-27_23/slirp/mbuf.c	2006-04-05 13:03:03.000000000 +0000
@@ -146,18 +146,19 @@
 	struct mbuf *m;
 	int size;
 {
+	int datasize;
+
 	/* some compiles throw up on gotos.  This one we can fake. */
 	if(m->m_size>size) return;
 
 	if (m->m_flags & M_EXT) {
-	  /* datasize = m->m_data - m->m_ext; */
+	  datasize = m->m_data - m->m_ext;
 	  m->m_ext = (char *)realloc(m->m_ext,size);
 /*		if (m->m_ext == NULL)
  *		return (struct mbuf *)NULL;
  */
-	  /* m->m_data = m->m_ext + datasize; */
+	  m->m_data = m->m_ext + datasize;
 	} else {
-	  int datasize;
 	  char *dat;
 	  datasize = m->m_data - m->m_dat;
 	  dat = (char *)malloc(size);

[-- Attachment #3: qemu-slirp-reassembly-bug.patch --]
[-- Type: text/plain, Size: 467 bytes --]

diff -BurN qemu-snapshot-2006-03-27_23.orig/slirp/ip_input.c qemu-snapshot-2006-03-27_23/slirp/ip_input.c
--- qemu-snapshot-2006-03-27_23.orig/slirp/ip_input.c	2004-04-22 00:10:47.000000000 +0000
+++ qemu-snapshot-2006-03-27_23/slirp/ip_input.c	2006-04-06 06:02:52.000000000 +0000
@@ -344,8 +344,8 @@
 	while (q != (struct ipasfrag *)fp) {
 	  struct mbuf *t;
 	  t = dtom(q);
-	  m_cat(m, t);
 	  q = (struct ipasfrag *) q->ipf_next;
+	  m_cat(m, t);
 	}
 
 	/*

[-- Attachment #4: qemu-slirp-32k-packets.patch --]
[-- Type: text/plain, Size: 4434 bytes --]

diff -burN qemu-snapshot-2006-03-27_23.orig/slirp/ip.h qemu-snapshot-2006-03-27_23/slirp/ip.h
--- qemu-snapshot-2006-03-27_23.orig/slirp/ip.h	2004-04-21 17:10:47.000000000 -0700
+++ qemu-snapshot-2006-03-27_23/slirp/ip.h	2006-04-06 00:28:49.000000000 -0700
@@ -79,6 +79,11 @@
  * We declare ip_len and ip_off to be short, rather than u_short
  * pragmatically since otherwise unsigned comparisons can result
  * against negative integers quite easily, and fail in subtle ways.
+ *
+ * The only problem with the above theory is that these quantities
+ * are in fact unsigned, and sorting fragments by a signed version
+ * of ip_off doesn't work very well, nor does length checks on
+ * ip packets with a signed version of their length!
  */
 struct ip {
 #ifdef WORDS_BIGENDIAN
@@ -101,6 +106,9 @@
 	struct	in_addr ip_src,ip_dst;	/* source and dest address */
 };
 
+#define IP_OFF(ip) (*(u_int16_t *)&((ip)->ip_off))
+#define IP_LEN(ip) (*(u_int16_t *)&((ip)->ip_len))
+
 #define	IP_MAXPACKET	65535		/* maximum packet size */
 
 /*
diff -burN qemu-snapshot-2006-03-27_23.orig/slirp/ip_input.c qemu-snapshot-2006-03-27_23/slirp/ip_input.c
--- qemu-snapshot-2006-03-27_23.orig/slirp/ip_input.c	2004-04-21 17:10:47.000000000 -0700
+++ qemu-snapshot-2006-03-27_23/slirp/ip_input.c	2006-04-06 00:32:19.000000000 -0700
@@ -111,7 +111,7 @@
 	 * Convert fields to host representation.
 	 */
 	NTOHS(ip->ip_len);
-	if (ip->ip_len < hlen) {
+	if (IP_LEN(ip) < hlen) {
 		ipstat.ips_badlen++;
 		goto bad;
 	}
@@ -124,13 +124,13 @@
 	 * Trim mbufs if longer than we expect.
 	 * Drop packet if shorter than we expect.
 	 */
-	if (m->m_len < ip->ip_len) {
+	if (m->m_len < IP_LEN(ip)) {
 		ipstat.ips_tooshort++;
 		goto bad;
 	}
 	/* Should drop packet if mbuf too long? hmmm... */
-	if (m->m_len > ip->ip_len)
-		m_adj(m, ip->ip_len - m->m_len);
+	if (m->m_len > IP_LEN(ip))
+		m_adj(m, IP_LEN(ip) - m->m_len);
 
 	/* check ip_ttl for a correct ICMP reply */
 	if(ip->ip_ttl==0 || ip->ip_ttl==1) {
@@ -191,7 +191,7 @@
 	 * or if this is not the first fragment,
 	 * attempt reassembly; if it succeeds, proceed.
 	 */
-	if (((struct ipasfrag *)ip)->ipf_mff & 1 || ip->ip_off) {
+	if (((struct ipasfrag *)ip)->ipf_mff & 1 || IP_OFF(ip)) {
 		ipstat.ips_fragments++;
 		ip = ip_reass((struct ipasfrag *)ip, fp);
 		if (ip == 0)
@@ -281,7 +281,7 @@
 	 */
 	for (q = (struct ipasfrag *)fp->ipq_next; q != (struct ipasfrag *)fp;
 	    q = (struct ipasfrag *)q->ipf_next)
-		if (q->ip_off > ip->ip_off)
+		if (IP_OFF(q) > IP_OFF(ip))
 			break;
 
 	/*
@@ -290,10 +290,10 @@
 	 * segment.  If it provides all of our data, drop us.
 	 */
 	if (q->ipf_prev != (ipasfragp_32)fp) {
-		i = ((struct ipasfrag *)(q->ipf_prev))->ip_off +
-		  ((struct ipasfrag *)(q->ipf_prev))->ip_len - ip->ip_off;
+		i = IP_OFF((struct ipasfrag *)(q->ipf_prev)) +
+		  IP_LEN((struct ipasfrag *)(q->ipf_prev)) - IP_OFF(ip);
 		if (i > 0) {
-			if (i >= ip->ip_len)
+			if (i >= IP_LEN(ip))
 				goto dropfrag;
 			m_adj(dtom(ip), i);
 			ip->ip_off += i;
@@ -305,9 +305,9 @@
 	 * While we overlap succeeding segments trim them or,
 	 * if they are completely covered, dequeue them.
 	 */
-	while (q != (struct ipasfrag *)fp && ip->ip_off + ip->ip_len > q->ip_off) {
-		i = (ip->ip_off + ip->ip_len) - q->ip_off;
-		if (i < q->ip_len) {
+	while (q != (struct ipasfrag *)fp && IP_OFF(ip) + IP_LEN(ip) > IP_OFF(q)) {
+		i = (IP_OFF(ip) + IP_LEN(ip)) - IP_OFF(q);
+		if (i < IP_LEN(q)) {
 			q->ip_len -= i;
 			q->ip_off += i;
 			m_adj(dtom(q), i);
@@ -327,9 +327,9 @@
 	next = 0;
 	for (q = (struct ipasfrag *) fp->ipq_next; q != (struct ipasfrag *)fp;
 	    q = (struct ipasfrag *) q->ipf_next) {
-		if (q->ip_off != next)
+		if (IP_OFF(q) != next)
 			return (0);
-		next += q->ip_len;
+		next += IP_LEN(q);
 	}
 	if (((struct ipasfrag *)(q->ipf_prev))->ipf_mff & 1)
 		return (0);
diff -burN qemu-snapshot-2006-03-27_23.orig/slirp/udp.c qemu-snapshot-2006-03-27_23/slirp/udp.c
--- qemu-snapshot-2006-03-27_23.orig/slirp/udp.c	2006-04-06 00:24:30.000000000 -0700
+++ qemu-snapshot-2006-03-27_23/slirp/udp.c	2006-04-06 00:32:55.000000000 -0700
@@ -111,12 +111,12 @@
 	 */
 	len = ntohs((u_int16_t)uh->uh_ulen);
 
-	if (ip->ip_len != len) {
-		if (len > ip->ip_len) {
+	if (IP_LEN(ip) != len) {
+		if (len > IP_LEN(ip)) {
 			udpstat.udps_badlen++;
 			goto bad;
 		}
-		m_adj(m, len - ip->ip_len);
+		m_adj(m, len - IP_LEN(ip));
 		ip->ip_len = len;
 	}

[-- Attachment #5: qemu-slirp-performance.patch --]
[-- Type: text/plain, Size: 3196 bytes --]

diff -BurN qemu-snapshot-2006-03-27_23.orig/slirp/tcp.h qemu-snapshot-2006-03-27_23/slirp/tcp.h
--- qemu-snapshot-2006-03-27_23.orig/slirp/tcp.h	2004-04-21 17:10:47.000000000 -0700
+++ qemu-snapshot-2006-03-27_23/slirp/tcp.h	2006-04-11 15:22:05.000000000 -0700
@@ -100,8 +100,10 @@
  * With an IP MSS of 576, this is 536,
  * but 512 is probably more convenient.
  * This should be defined as MIN(512, IP_MSS - sizeof (struct tcpiphdr)).
+ *
+ * We make this 1460 because we only care about Ethernet in the qemu context.
  */
-#define	TCP_MSS	512
+#define TCP_MSS 1460
 
 #define	TCP_MAXWIN	65535	/* largest value for (unscaled) window */
 
diff -BurN qemu-snapshot-2006-03-27_23.orig/slirp/tcp_input.c qemu-snapshot-2006-03-27_23/slirp/tcp_input.c
--- qemu-snapshot-2006-03-27_23.orig/slirp/tcp_input.c	2004-10-07 16:27:35.000000000 -0700
+++ qemu-snapshot-2006-03-27_23/slirp/tcp_input.c	2006-04-11 15:22:05.000000000 -0700
@@ -580,28 +580,11 @@
 			 * congestion avoidance sender won't send more until
 			 * he gets an ACK.
 			 *
-			 * Here are 3 interpretations of what should happen.
-			 * The best (for me) is to delay-ack everything except
-			 * if it's a one-byte packet containing an ESC
-			 * (this means it's an arrow key (or similar) sent using
-			 * Nagel, hence there will be no echo)
-			 * The first of these is the original, the second is the
-			 * middle ground between the other 2
+			 * It is better to not delay acks at all to maximize
+			 * TCP throughput.  See RFC 2581.
 			 */
-/*			if (((unsigned)ti->ti_len < tp->t_maxseg)) {
- */
-/*			if (((unsigned)ti->ti_len < tp->t_maxseg &&
- *			     (so->so_iptos & IPTOS_LOWDELAY) == 0) ||
- *			    ((so->so_iptos & IPTOS_LOWDELAY) &&
- *			     ((struct tcpiphdr_2 *)ti)->first_char == (char)27)) {
- */
-			if ((unsigned)ti->ti_len == 1 &&
-			    ((struct tcpiphdr_2 *)ti)->first_char == (char)27) {
-				tp->t_flags |= TF_ACKNOW;
-				tcp_output(tp);
-			} else {
-				tp->t_flags |= TF_DELACK;
-			}
+			tp->t_flags |= TF_ACKNOW;
+			tcp_output(tp);
 			return;
 		}
 	} /* header prediction */
diff -BurN qemu-snapshot-2006-03-27_23.orig/vl.c qemu-snapshot-2006-03-27_23/vl.c
--- qemu-snapshot-2006-03-27_23.orig/vl.c	2006-04-11 15:21:27.000000000 -0700
+++ qemu-snapshot-2006-03-27_23/vl.c	2006-04-11 15:22:05.000000000 -0700
@@ -4026,7 +4026,7 @@
 void main_loop_wait(int timeout)
 {
     IOHandlerRecord *ioh, *ioh_next;
-    fd_set rfds, wfds;
+    fd_set rfds, wfds, xfds;
     int ret, nfds;
     struct timeval tv;
 
@@ -4041,6 +4041,7 @@
     nfds = -1;
     FD_ZERO(&rfds);
    FD_ZERO(&wfds);
+    FD_ZERO(&xfds);
     for(ioh = first_io_handler; ioh != NULL; ioh = ioh->next) {
         if (ioh->fd_read &&
             (!ioh->fd_read_poll ||
@@ -4062,7 +4063,12 @@
 #else
     tv.tv_usec = timeout * 1000;
 #endif
-    ret = select(nfds + 1, &rfds, &wfds, NULL, &tv);
+#if defined(CONFIG_SLIRP)
+    if (slirp_inited) {
+        slirp_select_fill(&nfds, &rfds, &wfds, &xfds);
+    }
+#endif
+    ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
     if (ret > 0) {
         /* XXX: better handling of removal */
         for(ioh = first_io_handler; ioh != NULL; ioh = ioh_next) {
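The ip_off signedness bug the patch above addresses is easy to demonstrate in isolation: BSD-derived reassembly code shifts the 13-bit fragment offset into byte units, so offsets beyond 32 KB no longer fit in a signed 16-bit field. A standalone illustration (not slirp code):

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* byte offset of a fragment 36 KB into a large datagram */
    uint16_t off_bytes = 36864;

    int16_t  as_signed   = (int16_t)off_bytes;  /* what the old code saw */
    uint16_t as_unsigned = off_bytes;           /* what IP_OFF() reads   */

    /* signed view: -28672, so this fragment sorts *before* offset 0,
     * producing the "entertaining but incorrect" reassembly order */
    printf("signed: %d  unsigned: %u\n", (int)as_signed,
           (unsigned)as_unsigned);
    return 0;
}
```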
* Re: [Qemu-devel] Re: Network Performance between Win Host and Linux
  2006-04-11 22:36 ` [Qemu-devel] " Kenneth Duda
@ 2006-04-12 14:04   ` Leonardo E. Reiter
  2006-04-12 18:19     ` Kenneth Duda
  1 sibling, 1 reply; 15+ messages in thread
From: Leonardo E. Reiter @ 2006-04-12 14:04 UTC (permalink / raw)
To: qemu-devel

Hi Ken,

(all) the patches seem to work very well and be very stable with
Windows 2000 guests here. I measured some SMB over TCP/IP transfers,
and got about a 1.5x downstream improvement and a 2x upstream
improvement. You will likely get more boost from less convoluted
protocols like FTP or something, but I didn't get around to testing
that. Plus it's not clear how much Windows itself is impeding the
bandwidth. I am using -kernel-kqemu.

2 additional things I noticed:

1. before your patches, the upstream transfers (guest->host) consumed
almost no CPU at all, but of course were much slower. Now, about half
the CPU gets used under heavy upstream load. The downstream, with
Windows guests at least, consumes 100% CPU the same as before. I
suspect you addressed this specifically with your select hack to avoid
the delay if there is pending slirp activity

2. overall latency "feels" improved as well, at least for basic stuff
like web browsing, etc. This is purely subjective.

Nice work! I'll be testing with a Linux VM soon and try to pin down
some better benchmarks, free of Windows clutter.

- Leo Reiter

Kenneth Duda wrote:
> The "qemu-slirp-performance" patch contains three improvements to qemu
> slirp networking performance. Booting my virtual machine (which
> NFS-mounts its root filesystem from the host) has been accelerated by
> 8x, from over 5 minutes to 40 seconds. TCP throughput has been
> accelerated from about 2 megabytes/sec to 9 megabytes/sec, in both
> directions (measured using a simple python script). The system is
> subjectively more responsive (for activities such as logging in or
> running simple python scripts).
>
> The specific problems fixed are:
>
>   - the mss for the slirp-to-vm direction was 512 bytes (now 1460);
>   - qemu would block in select() for up to four milliseconds at a
>     time, even when data was waiting on slirp sockets;
>   - slirp was deliberately delaying acks until timer expiration
>     (TF_DELACK), preventing the vm from opening its send window, in
>     violation of rfc2581.
>
> These fixes are together in one patch (qemu-slirp-performance.patch).
> <snip>

--
Leonardo E. Reiter
Vice President of Product Development, CTO

Win4Lin, Inc.
Virtual Computing from Desktop to Data Center
Main: +1 512 339 7979
Fax: +1 512 532 6501
http://www.win4lin.com
* Re: [Qemu-devel] Re: Network Performance between Win Host and Linux
  2006-04-12 14:04   ` Leonardo E. Reiter
@ 2006-04-12 18:19     ` Kenneth Duda
  2006-04-12 18:26       ` Leonardo E. Reiter
  0 siblings, 1 reply; 15+ messages in thread
From: Kenneth Duda @ 2006-04-12 18:19 UTC (permalink / raw)
To: qemu-devel

Leo, thank you for exercising this stuff.

> 1. before your patches, the upstream transfers (guest->host) consumed
> almost no CPU at all, but of course were much slower. Now, about half
> the CPU gets used under heavy upstream load.

I am surprised that only half the CPU gets consumed --- that suggests
there's another factor of two improvement waiting to be made. If you
see anything like this with Linux-on-Linux, please let me know and
I'll try to track it down.

Separately, I'm curious about the path for getting these changes into
the qemu mainline. If that's something you're in tune with and are in
the mood to summarize for me, I'd appreciate that. We love qemu but
there are some rough edges and I think we have something like 16
patches we're maintaining internally, many of which might be helpful
for others.

-Ken

On 4/12/06, Leonardo E. Reiter <lreiter@win4lin.com> wrote:
> Hi Ken,
>
> (all) the patches seem to work very well and be very stable with Windows
> 2000 guests here. I measured some SMB over TCP/IP transfers, and got
> about a 1.5x downstream improvement and a 2x upstream improvement. You
> will likely get more boost from less convoluted protocols like FTP or
> something, but I didn't get around to testing that. Plus it's not clear
> how much Windows itself is impeding the bandwidth. I am using
> -kernel-kqemu.
>
> 2 additional things I noticed:
>
> 1. before your patches, the upstream transfers (guest->host) consumed
> almost no CPU at all, but of course were much slower. Now, about half
> the CPU gets used under heavy upstream load. The downstream, with
> Windows guests at least, consumes 100% CPU the same as before. I
> suspect you addressed this specifically with your select hack to avoid
> the delay if there is pending slirp activity
>
> 2. overall latency "feels" improved as well, at least for basic stuff
> like web browsing, etc. This is purely subjective.
>
> Nice work! I'll be testing with a Linux VM soon and try to pin down
> some better benchmarks, free of Windows clutter.
>
> - Leo Reiter
* Re: [Qemu-devel] Re: Network Performance between Win Host and Linux
  2006-04-12 18:19     ` Kenneth Duda
@ 2006-04-12 18:26       ` Leonardo E. Reiter
  0 siblings, 0 replies; 15+ messages in thread
From: Leonardo E. Reiter @ 2006-04-12 18:26 UTC (permalink / raw)
To: qemu-devel

Ken, I'll check that on Linux-on-Linux... it's likely just some
Windows overhead. Windows is my guest OS priority, which is why I
tested on Windows.

As for getting patches into the mainline, this is a job for the
maintainers. Fabrice is the main person, but Paul Brook also merges a
lot of patches in. I'm not sure what their process is, or to what
extent they communicate with each other. I'm sure Paul and/or Fabrice
would be kind enough to explain. I agree that there are lots of
pending patches... in the case of yours specifically though, since
it's so sweeping, I would guess that it probably needs more field
testing before it becomes mainline.

Regards,

Leo Reiter

Kenneth Duda wrote:
> Leo, thank you for exercising this stuff.
>
>> 1. before your patches, the upstream transfers (guest->host) consumed
>> almost no CPU at all, but of course were much slower. Now, about half
>> the CPU gets used under heavy upstream load.
>
> I am surprised that only half the CPU gets consumed --- that suggests
> there's another factor of two improvement waiting to be made. If you
> see anything like this with Linux-on-Linux, please let me know and
> I'll try to track it down.
>
> Separately, I'm curious about the path for getting these changes into
> the qemu mainline. If that's something you're in tune with and are in
> the mood to summarize for me, I'd appreciate that. We love qemu but
> there are some rough edges and I think we have something like 16
> patches we're maintaining internally, many of which might be helpful
> for others.
>
> -Ken
>
> On 4/12/06, Leonardo E. Reiter <lreiter@win4lin.com> wrote:
>
>> Hi Ken,
>>
>> (all) the patches seem to work very well and be very stable with Windows
>> 2000 guests here. I measured some SMB over TCP/IP transfers, and got
>> about a 1.5x downstream improvement and a 2x upstream improvement. You
>> will likely get more boost from less convoluted protocols like FTP or
>> something, but I didn't get around to testing that. Plus it's not clear
>> how much Windows itself is impeding the bandwidth. I am using
>> -kernel-kqemu.
>>
>> 2 additional things I noticed:
>>
>> 1. before your patches, the upstream transfers (guest->host) consumed
>> almost no CPU at all, but of course were much slower. Now, about half
>> the CPU gets used under heavy upstream load. The downstream, with
>> Windows guests at least, consumes 100% CPU the same as before. I
>> suspect you addressed this specifically with your select hack to avoid
>> the delay if there is pending slirp activity
>>
>> 2. overall latency "feels" improved as well, at least for basic stuff
>> like web browsing, etc. This is purely subjective.
>>
>> Nice work! I'll be testing with a Linux VM soon and try to pin down
>> some better benchmarks, free of Windows clutter.
>>
>> - Leo Reiter

--
Leonardo E. Reiter
Vice President of Product Development, CTO

Win4Lin, Inc.
Virtual Computing from Desktop to Data Center
Main: +1 512 339 7979
Fax: +1 512 532 6501
http://www.win4lin.com
* Re: [Qemu-devel] Re: Network Performance between Win Host and Linux
  2006-04-11 22:36 ` [Qemu-devel] " Kenneth Duda
  2006-04-12 14:04   ` Leonardo E. Reiter
@ 2006-04-12 14:31   ` Leonardo E. Reiter
  1 sibling, 0 replies; 15+ messages in thread
From: Leonardo E. Reiter @ 2006-04-12 14:31 UTC (permalink / raw)
To: qemu-devel

On an additional note, Windows host users may want to try moving the
arbitrary Sleep() in main_loop_wait() to the end of the function, and
making it conditional on there being no pending I/O events. Otherwise
there is a fixed penalty, and this does not take advantage of Ken's
new patch to avoid the delay if there are pending slirp requests, for
example.

When I have some time I will see if there is a better way to multiplex
poll in Windows, so that you can use something like select() but still
get interrupted. There might for example be something relevant in the
newer winsock libraries (i.e. V2), but of course it has to be general
enough to work on any type of fd.

I apologize but I have not yet been able to successfully build QEMU on
Windows, even after mucking with the mingw stuff. I probably need to
spend more time on it at some point. But if anyone is using Windows
and can compile QEMU from source, you can try moving the Sleep to see
if that helps, especially after applying Ken's new patches. Actually
Kazu's patch for TAP performance addresses this for TAP for example,
so it should be easy to adapt to slirp... the code is in very close
proximity.

- Leo Reiter

Kenneth Duda wrote:
> The "qemu-slirp-performance" patch contains three improvements to qemu
> slirp networking performance. Booting my virtual machine (which
> NFS-mounts its root filesystem from the host) has been accelerated by
> 8x, from over 5 minutes to 40 seconds. TCP throughput has been
> accelerated from about 2 megabytes/sec to 9 megabytes/sec, in both
> directions (measured using a simple python script). The system is
> subjectively more responsive (for activities such as logging in or
> running simple python scripts).

--
Leonardo E. Reiter
Vice President of Product Development, CTO

Win4Lin, Inc.
Virtual Computing from Desktop to Data Center
Main: +1 512 339 7979
Fax: +1 512 532 6501
http://www.win4lin.com
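A minimal sketch of the rearrangement Leo suggests for the _WIN32 branch: poll first, and pay the fixed Sleep() only when nothing is pending. pending_io() and service_io() are hypothetical stand-ins for the real checks (slirp fds, the tap queue); this is an untested illustration of the idea, not a Windows patch:

```c
#ifdef _WIN32
#include <windows.h>

extern int  pending_io(void);   /* hypothetical: any slirp/tap work queued? */
extern void service_io(void);   /* hypothetical: run the usual handlers */

static void win32_wait(int timeout)
{
    if (pending_io()) {
        service_io();           /* work is waiting: skip the delay entirely */
    } else if (timeout > 0) {
        Sleep(timeout);         /* fixed penalty only when truly idle */
    }
}
#endif /* _WIN32 */
```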