From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [Help] Trigger Watchdog when adding an IPI in vcpu_wake Date: Tue, 20 Sep 2016 01:24:17 +0200 Message-ID: <1474327457.4393.102.camel@citrix.com> References: <20160913085437.GA14514@linux-gk3p> <1473766217.6339.103.camel@citrix.com> <20160914104417.GA25572@linux-gk3p> <1473869928.6339.210.camel@citrix.com> <20160916024932.GA13124@linux-gk3p> <1474042028.6339.262.camel@citrix.com> <20160917003109.GA15467@vultr.guest> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2546525695181211249==" Return-path: In-Reply-To: <20160917003109.GA15467@vultr.guest> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Wei Yang Cc: xuquan8@huawei.com, liuxiaojian6@huawei.com, george.dunlap@eu.citrix.com, xen-devel@lists.xen.org, mengxu@cis.upenn.edu, Wei Yang List-Id: xen-devel@lists.xenproject.org --===============2546525695181211249== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-gBfTFMoFqkRv9Uo+y5y4" --=-gBfTFMoFqkRv9Uo+y5y4 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sat, 2016-09-17 at 00:31 +0000, Wei Yang wrote: > On Fri, Sep 16, 2016 at 06:07:08PM +0200, Dario Faggioli wrote: > > But then again, if the system is not oversubscribed, I'd tend to > > think > > it to be tolerable, and I'd expect the biggest problem to be the > > work- > > stealing logic (considering the high number of pcpus), rather than > > the > > duration of the critical sections within vcpu_wake(). > >=20 > Yes, we are trying to improve the stealing part too. >=20 Great. > Sounds reasonable, vcpu_wake() is O(1) while "stealing" is O(N) in > terms of > #PCPUs. >=20 Exactly! > > I also think current upstream is a bit better, especially because > > it's > > mostly me that made it look the way it does. :-D >=20 > Ah, not intended to flattering you. >=20 EhEh! Sure, I was joking myself. :-) > >=20 > >=20 > > But I was actually describing how 4.1 works. In fact, in 4.1, if > > pcpu 6 > > is idle (se the '//xxx xxx xxx' comments I'm adding to the code > > excerpts: > >=20 > > =C2=A0=C2=A0=C2=A0=C2=A0if ( new->pri > cur->pri )=C2=A0=C2=A0//is true= , so we put pcpu 6 in mask > > =C2=A0=C2=A0=C2=A0=C2=A0{ > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0cpu_set(cpu, mask); > > =C2=A0=C2=A0=C2=A0=C2=A0} > > =C2=A0=C2=A0=C2=A0=C2=A0if ( cur->pri > CSCHED_PRI_IDLE )=C2=A0=C2=A0//= is false!! > > =C2=A0=C2=A0=C2=A0=C2=A0{ > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0.... > > =C2=A0=C2=A0=C2=A0=C2=A0} > > =C2=A0=C2=A0=C2=A0=C2=A0if ( !cpus_empty(mask) ) //the mask contains pc= pu 6 only > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0cpumask_raise_softirq(m= ask, SCHEDULE_SOFTIRQ); > >=20 >=20 > Hmm... don't have the code at hand, while looks you are right. I > misunderstood > the code. > So only one idler would be tickled even there would be several idlers > in the > system. I thought we would tickle several idlers, which may introduce > some > contention between them. >=20 You can ask Xen to tickle more than one idle, by means of that parameter. But again, tickling idlers --either one or many-- would only happen if there's actual need for it. > BTW, any idea behind the cycle_cpu()? If this is about the locality, > how about > cycle within the node? and cycle within the node where v->processor > is? >=20 Cycle is there as a form of load spreading, i.e., we don't want to risk tickling always the same set of pcpus, or always the ones with the lower cpu IDs. You're right that taking locality into account could be a good thing in big systems. No, that is not done, not in 4.1 nor in upstream, and it may be something actually worth trying (although, again, it's probably unrelated to your issue). > > And again, I'd personally spend some more time --even by > > temporarily > > and hackishly instrumenting the code-- to understand better where > > the > > bottleneck is. > >=20 >=20 > Hmm... usually we use xentrace and lock profile to identify the > bottleneck, > other method you would like to suggest? and instrumenting the code > means to > add some log in code? >=20 xentrace and lock profile is all I use too, and there's not much else, without touching the code. And in fact, yes, with "instrumenting the code" I means temporary changing the code to display the information you need. In this specific case, rather than adding printks here and there (which, e.g., is what you usually do for debugging crashes or alike), I'd see about modifying a little bit either xentrace or lock profiling code (or both!) to make them provide the info you need. Sorry, but I don't think I can be much more specific without going looking at the code and actually trying to do that... > > E.g., something that has proven effective for others (and for which > > I've got an unfinished and never submitted patch to upstream), is > > keeping track of what pcpus actually have at least 1 vcpu in their > > runqueues (i.e., at least 1 vcpus that is runnable and not running) > > and, inside balance_load(), only consider those when looking for > > work > > to steal. >=20 > Looks like we keep a cpumask with those who have 1 more vcpus and > during > stealing process, just scan those instead of the online cpumask. >=20 Yep, that sounds like a plan, and was along the lines of what I was doing. If you want to give it a try, patches (for upstream, of course :-D) are welcome. :-P Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-gBfTFMoFqkRv9Uo+y5y4 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJX4HOhAAoJEBZCeImluHPuAUQQAItA2RIIXjrwhl0VDOkYsxej r0oUMNz/dMh4wnqwjrSZw4p+bL1VyQoq+/zmpk3KxPhyAVRGWPyvUDS9L2Z9g0MV LHPjQF7SlsrNnidHNHvGXZfuy9wep+3tAEUzBimtS8/Wbd26SOYqjzx5bmVIvLHe MzpRTQA9PKsYFr8idlW4a9Ypdu/1SXu5QtY2XksfTJUfcVvKrwnw3LPnDc1S7f9x Xm1W3/8tA30kFCY8cluFFK2d5K26ZFrh3GnszKwh3VxcjM60QF8VWlC5KaD52jdz k98J/jwUY4tZolmO/B7h/rFUxQBQDSH3EOscctjWpGLnBIdmNSOpODyW+nJXf5ux t64M6lU0zmh1K94DRVfajzpsO+Mt1x4B1pF9q3J/smgbW9Ll/a9oiGndDJ4CB2aK ULkjvkRm9vxX8mApHRqyGwUU/Xw/rcv0Bk/XHWEViCVUrf1Qs9y4FTmDpAEZkUu/ gbNggyWeupkgWqxtdbUDG2vJle7gQ1Zfj7KljpT0gpfaFvTGOKWJrCyuRZgimY5Q 11kChF/FHPyWsZ0tvmSeyrAYwje/NEBegUW7rZvhMPe5NJWR7w4OASyiK3wXMuau glnQzAWLzvr8gDKtDyalJvCJQSdhfaW1WH8OyODtWUpz/u9yx53faSPTNlfkaYiR 7zb+EtPtLewgc12nF1bX =Bwwu -----END PGP SIGNATURE----- --=-gBfTFMoFqkRv9Uo+y5y4-- --===============2546525695181211249== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============2546525695181211249==--