From: Dario Faggioli
Subject: Re: About vcpu wakeup and runq tickling in credit
Date: Fri, 16 Nov 2012 11:53:54 +0100
Message-ID: <1353063234.5351.107.camel@Solace>
In-Reply-To: <50A4DD95.5020107@eu.citrix.com>
To: George Dunlap
Cc: Keir Fraser, David Vrabel, Jan Beulich, xen-devel

(Cc-ing David as it looks like he uses xenalyze quite a bit, and I'm
seeking advice on how to squeeze data out of it too :-P)

On Thu, 2012-11-15 at 12:18 +0000, George Dunlap wrote:
> Maybe what we should do is do the wake-up based on who is likely to run
> on the current cpu: i.e., if "current" is likely to be pre-empted, look
> at idlers based on "current"'s mask; if "new" is likely to be put on
> the queue, look at idlers based on "new"'s mask.
>
Ok, find attached the two (trivial) patches that I produced and am
testing these days. Unfortunately, early results show that I/we might
be missing something. In fact, although I don't yet have the numbers
for the NUMA-aware scheduling case (which is what originated all this!
:-D), comparing 'upstream' and 'patched' (namely, 'upstream' plus the
two attached patches), I can spot some perf regressions. :-(
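To save scrolling down to the attachments while reading the numbers,
the core of the change is the idler-selection logic in __runq_tickle()
(this is just the second patch's hunk, condensed, not new code):

    /*
     * Tickle idlers that can help whichever vcpu is going to be left
     * waiting: if new will likely preempt cur, we want someone where
     * cur can run to come and pick it up; vice-versa, if cur stays,
     * we poke idlers where new can run.
     */
    if ( new->pri > cur->pri )
        cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity);
    else
        cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);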
Here are the results of running some benchmarks on 2, 6 and 10 VMs.
Each VM has 2 VCPUs, and they all run the benchmarks concurrently on a
16-CPU host (each test is repeated 3 times, and avg +/- stddev is what
is reported). Also, the VCPUs were statically pinned to the host's
PCPUs; as already said, numbers for no-pinning and NUMA-scheduling
will follow.

 + sysbench --test=memory (throughput, higher is better)

   #VMs |         upstream          |         patched
      2 | 550.97667 +/- 2.3512355   | 540.185   +/- 21.416892
      6 | 443.15    +/- 5.7471797   | 442.66389 +/- 2.1071732
     10 | 313.89233 +/- 1.3237493   | 305.69567 +/- 0.3279853

 + sysbench --test=cpu (time, lower is better)

   #VMs |         upstream          |         patched
      2 | 47.8211   +/- 0.0215503   | 47.816117 +/- 0.0174079
      6 | 62.689122 +/- 0.0877172   | 62.789883 +/- 0.1892171
     10 | 90.321097 +/- 1.4803867   | 91.197767 +/- 0.1032667

 + specjbb2005 (throughput, higher is better)

   #VMs |         upstream          |         patched
      2 | 49591.057 +/- 952.93384   | 50008.28  +/- 1502.4863
      6 | 33538.247 +/- 1089.2115   | 33647.873 +/- 1007.3538
     10 | 21927.87  +/- 831.88742   | 21869.654 +/- 578.236

So, as you can easily see, the numbers are very similar, with cases
where the patches produce a slight performance reduction, while I was
expecting the opposite, i.e., similar, but a little bit better with
the patches.

For most of the runs of all the benchmarks I have the full traces
(although only for SCHED-* events, IIRC), so I can investigate more.
It's a huge amount of data, though, so it's really hard to make sense
of it, and any advice and direction on that would be much appreciated.

For instance, looking at one of the runs of sysbench --test=memory,
here's what I found. With 10 VMs, the memory throughput reported by
one of the VMs during one of the runs is as follows:

 upstream: 315.68 MB/s
 patched:  306.69 MB/s

I then went through the traces and found out that the patched case
lasted longer (for transferring the same amount of memory, hence the
lower throughput), but with the following runstate-related results:

 upstream: running  for 73.67% of the time
           runnable for 24.94% of the time
 patched:  running  for 74.57% of the time
           runnable for 24.10% of the time

And that is consistent with other random instances I checked.
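Doing some back-of-the-envelope math on that instance (assuming memory
throughput should scale more or less linearly with the fraction of
time spent running):

    running time:  74.57 / 73.67  = 1.012  -->  ~1.2% more running
    throughput:   306.69 / 315.68 = 0.972  -->  ~2.8% less throughput

    throughput per point of running time:
      upstream: 315.68 / 73.67 = 4.28
      patched:  306.69 / 74.57 = 4.11   (~4% worse)

So, per unit of running time, the patched VM is moving about 4% less
data, which is one way of quantifying the "other effect" I talk about
below.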
So, it looks like the patches are, after all, doing their job in
increasing (at least a little) the running time of the various VCPUs,
at the expense of their runnable time, but the benefit of that is
being entirely eaten by some other effect --to the point that
sometimes things go even worse-- which I'm not able to identify... For
now! :-P

Any idea about what's going on, and what I should check to better
figure that out?

Thanks a lot and Regards,
Dario

--
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[attachment: xen-sched_credit-clarify-cpumask-and-during-tickle.patch]

# HG changeset patch
# Parent b0c342b749765bf254c664883d4f5e2891c1ff18

diff -r b0c342b74976 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Fri Nov 09 11:02:54 2012 +0100
+++ b/xen/common/sched_credit.c	Thu Nov 15 18:22:56 2012 +0100
@@ -254,7 +254,11 @@ static inline void
     ASSERT(cur);
     cpumask_clear(&mask);
 
-    /* If strictly higher priority than current VCPU, signal the CPU */
+    /*
+     * If new is strictly higher priority than current VCPU, let CPU
+     * know that re-scheduling is needed. That will likely pick-up new
+     * and put cur back in the runqueue.
+     */
     if ( new->pri > cur->pri )
     {
         if ( cur->pri == CSCHED_PRI_IDLE )
@@ -296,7 +300,6 @@ static inline void
                 else
                     cpumask_or(&mask, &mask, &idle_mask);
             }
-            cpumask_and(&mask, &mask, new->vcpu->cpu_affinity);
         }
     }

[attachment: xen-sched_credit-fix-tickling]

# HG changeset patch
# Parent 3a70bd1d02c1334857c84c9fb5e1dd22b6603a2c

diff -r 3a70bd1d02c1 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Thu Nov 15 18:22:56 2012 +0100
+++ b/xen/common/sched_credit.c	Thu Nov 15 19:03:19 2012 +0100
@@ -274,7 +274,7 @@ static inline void
     }
 
     /*
-     * If this CPU has at least two runnable VCPUs, we tickle any idlers to
+     * If this CPU has at least two runnable VCPUs, we tickle some idlers to
      * let them know there is runnable work in the system...
      */
     if ( cur->pri > CSCHED_PRI_IDLE )
@@ -287,7 +287,17 @@ static inline void
         {
             cpumask_t idle_mask;
 
-            cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
+            /*
+             * Which idlers do we want to tickle? If new has higher priority,
+             * it will likely preempt cur and run here. We then need someone
+             * where cur can run to come and pick it up. Vice-versa, if it is
+             * cur that stays, we poke idlers where new can run.
+             */
+            if ( new->pri > cur->pri )
+                cpumask_and(&idle_mask, prv->idlers, cur->vcpu->cpu_affinity);
+            else
+                cpumask_and(&idle_mask, prv->idlers, new->vcpu->cpu_affinity);
+
             if ( !cpumask_empty(&idle_mask) )
             {
                 SCHED_STAT_CRANK(tickle_idlers_some);
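(In case it helps with trying these out: both attachments are HG
changesets, so something like 'hg import <file>' -- or 'patch -p1' fed
with the diff body -- should apply them, the second one on top of the
first. The parent hashes are from my local tree, though, so a bit of
fuzz may be needed elsewhere.)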