From: Con Kolivas
To: Ingo Molnar
Cc: Andrew Morton, linux-kernel@vger.kernel.org, William Weston
Subject: Re: [patch] fix SMT scheduler latency bug
Date: Thu, 23 Jun 2005 00:40:55 +1000
Message-Id: <200506230040.58846.kernel@kolivas.org>
In-Reply-To: <20050622102541.GA10043@elte.hu>

Hi

On Wed, 22 Jun 2005 20:25, Ingo Molnar wrote:
> William Weston reported unusually high scheduling latencies on his x86
> HT box, on the -RT kernel. I managed to reproduce it on my HT box and
> the latency tracer shows the incident in action:

Thanks for picking this up. I've had a long hard look at the code and your
patch.

> the reason for this anomaly is the following code in dependent_sleeper():
>
>                 /*
>                  * If a user task with lower static priority than the
>                  * running task on the SMT sibling is trying to schedule,
>                  * delay it till there is proportionately less timeslice
>                  * left of the sibling task to prevent a lower priority
>                  * task from using an unfair proportion of the
>                  * physical cpu's resources. -ck
>                  */
> [...]
>                 if (((smt_curr->time_slice * (100 - sd->per_cpu_gain) / 100) >
>                                         task_timeslice(p)))
>                         ret = 1;
>
> note that in contrast to the comment above, we dont actually do the
> check based on static priority, we do the check based on timeslices. But
> timeslices go up and down, and even highprio tasks can randomly have
> very low timeslices (just before their next refill) and can thus be
> judged as 'lowprio' by the above piece of code.

I don't see it like that. task_timeslice(p) will always return the same value,
based purely on static priority, and smt_curr->time_slice cannot ever be
larger than task_timeslice(p) unless there is a significant enough 'nice'
difference. It is not smt_curr that is rescheduled as a result of this test;
it is p that is not scheduled, and we look at p's task_timeslice, which does
not alter. The task that is delayed in either case is dependent on its static
priority, which determines its task_timeslice(), versus the current value of
->time_slice on the sibling, which is emptied as that task runs and is
expected to fluctuate.

> This condition is
> clearly buggy. The correct test is to check for static_prio _and_ to
> check for the preemption priority. Even on different static priority
> levels, a higher-prio interactive task should not be delayed due to a
> higher-static-prio CPU hog.

> -                     if (((smt_curr->time_slice * (100 - sd->per_cpu_gain) /
> -                             100) > task_timeslice(p)))
> +                     if (smt_curr->static_prio < p->static_prio &&
> +                             !TASK_PREEMPTS_CURR(p, smt_rq) &&
> +                             smt_slice(smt_curr, sd) > task_timeslice(p))

Checking for smt_curr->static_prio < p->static_prio appears redundant to me,
because the timeslice condition can only be met if there is a significant
difference in static priority, as I mentioned above.

> +                     if (TASK_PREEMPTS_CURR(p, smt_rq) &&

Is this check necessary? The proportion is supposed to be distributed
according to static priority only.
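
To make the argument above concrete, here is a minimal userspace sketch of
the original test. It is not the kernel code: struct task, full_timeslice()
and delays_p() are hypothetical stand-ins, the linear timeslice scale is
illustrative only, and the per_cpu_gain of 25 used in the note below is an
assumed value.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the scheduler's task structure. */
struct task {
	int static_prio;	/* 100..139 here; lower value = higher priority */
	int time_slice;		/* remaining timeslice, in ticks */
};

/*
 * Assumed model of task_timeslice(): the full timeslice depends only on
 * static priority. The linear scale is illustrative, not the kernel's.
 */
static int full_timeslice(const struct task *p)
{
	int slice;

	assert(p->static_prio >= 100 && p->static_prio <= 139);
	slice = (140 - p->static_prio) * 5;
	return slice < 5 ? 5 : slice;
}

/*
 * The original test: returns 1 if p should be delayed because the task
 * running on the sibling still has proportionately more timeslice left
 * than p's full timeslice.
 */
static int delays_p(const struct task *smt_curr, const struct task *p,
		    int per_cpu_gain)
{
	return smt_curr->time_slice * (100 - per_cpu_gain) / 100 >
		full_timeslice(p);
}
```

In this model, with per_cpu_gain assumed to be 25, two tasks at the same
static priority never trigger the delay, whatever the sibling's remaining
timeslice is, because time_slice cannot exceed the shared full timeslice.
With static_prio 120 on the sibling and static_prio 139 for p, p is held
back while the sibling has 8 or more ticks left and is allowed to run once
it is down to 7.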

If this code is causing large latencies then I believe it can only occur with
different nice levels running on siblings and high priority tasks starting
new timeslices repeatedly and never getting to the last per_cpu_gain% of
their timeslice. Ingo, do you think this might be what is being seen? If this
truly can happen then this code will have to move to a jiffy-based proportion,
as the real-time code does, to prevent this problem.

Cheers,
Con