From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pw0-f42.google.com (mail-pw0-f42.google.com [209.85.160.42])
 by ozlabs.org (Postfix) with ESMTP id B1814B7CF6
 for ; Fri, 26 Mar 2010 13:04:37 +1100 (EST)
Received: by pwj8 with SMTP id 8so6055839pwj.15
 for ; Thu, 25 Mar 2010 19:04:36 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <1269568527.8599.259.camel@pasglop>
References: <43c137a81003241941p84cba56y3e02e40cb22623e2@mail.gmail.com>
 <1269505301.8599.238.camel@pasglop>
 <201003251105.10033.arnd@arndb.de>
 <43c137a81003250800n660195c5k42c8516068aeda8d@mail.gmail.com>
 <1269549524.8599.243.camel@pasglop>
 <43c137a81003251811s52ac72eaud921d187e9747098@mail.gmail.com>
 <1269568527.8599.259.camel@pasglop>
Date: Fri, 26 Mar 2010 10:04:35 +0800
Message-ID: <43c137a81003251904ged9f17fof65e74bc766167d@mail.gmail.com>
Subject: Re: Continual reading from the PowerPc time base register is not stable
From: Csdncannon
To: Benjamin Herrenschmidt
Content-Type: multipart/mixed; boundary=001636b144abc085c20482aa9513
Cc: linuxppc-dev@ozlabs.org, Arnd Bergmann
List-Id: Linux on PowerPC Developers Mail List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,

--001636b144abc085c20482aa9513
Content-Type: multipart/alternative; boundary=001636b144abc085b60482aa9511

--001636b144abc085b60482aa9511
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Ben,
    Attached is the previous failing one.

Thanks
Gino

2010/3/26 Benjamin Herrenschmidt <benh@kernel.crashing.org>
> On Fri, 2010-03-26 at 09:11 +0800, Csdncannon wrote:
> > After trying the new code with "isync" and the unsigned long long
> > conversion, this problem doesn't happen (I tested for several minutes).
> > But the previous block of code (lacking the isync) is borrowed from the
> > kernel. Is this a kernel bug?
>
> There's an outstanding question about that. Some processors make mftb
> context synchronizing but it appears that it may not be the case for all
> of them.
>
> Thus indeed, we -might- need some isync's in some places, it's not
> totally clear to me though.
>
> Can you send the code that fails (without the isync) ? The one you sent
> did have them everywhere.
>
> Cheers,
> Ben.
>
> > Thanks
> > Gino
> >
> > 2010/3/26 Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > > On Thu, 2010-03-25 at 23:00 +0800, Csdncannon wrote:
> > > > I am really sorry, the previously attached code was the wrong one;
> > > > this "timebase.c" is the right one, and the "log_timebase" file is
> > > > the right log.
> > > >
> > > > We are using a Freescale PowerPC 8378, kernel 2.6.28, compiled as
> > > > 32-bit.
> > >
> > > And despite all those sync/isync you can still observe the timebase
> > > going backward ? That sounds scary. However, at this stage all I can
> > > suggest is getting freescale folks to have a look, as this should
> > > really not happen. Maybe there's some setting with that specific SoC
> > > that is missing or similar...
> > >
> > > Cheers,
> > > Ben.
> > >
> > > > Thanks
> > > > Gino
> > > >
> > > > 2010/3/25 Arnd Bergmann <arnd@arndb.de>
> > > > > On Thursday 25 March 2010, Benjamin Herrenschmidt wrote:
> > > > > > On Thu, 2010-03-25 at 10:41 +0800, Csdncannon wrote:
> > > > > > > In my program, the value of the 64-bit time base register is
> > > > > > > read out, and the log "log_timebase" shows that a later value
> > > > > > > can be smaller than an earlier one. Because the kernel depends
> > > > > > > on the accuracy of the timebase to compensate for lost PIT
> > > > > > > interrupts, a negative difference between two consecutive
> > > > > > > timebase readings will make jiffies jump. This timebase
> > > > > > > problem will also make the gettimeofday system call unstable.
> > > > > > >
> > > > > > > Do you have any idea about this problem? Thanks for any
> > > > > > > advice. Attached is the code and log.
> > > > > >
> > > > > > This is a concern, it should definitely not happen. What machine
> > > > > > is that ? is the code compiled 32-bit or 64-bit ? What kernel
> > > > > > version ?
> > > > > >
> > > > > > Arnd, any chance that could relate to the bug you've been
> > > > > > chasing on Cell ?
> > > > >
> > > > > We're still busy with the problem analysis on Cell, waiting for a
> > > > > time slot to run the next test kernel. So far it seems like the
> > > > > timebase is actually synchronized at a significant accuracy on
> > > > > QS22 to never cause this problem with correct code, however it is
> > > > > possible to observe incorrect timebase values on Cell whenever the
> > > > > mftb instruction is not serialized with memory accesses, e.g. by
> > > > > using an isync in front of the mftb. On Power6 and other CPUs,
> > > > > that problem will not happen.
> > > > >
> > > > >         Arnd
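For readers following the thread, here is a minimal sketch of the kind of read loop under discussion. It is not the attached timebase.c and not the kernel's code; the names `combine_timebase`, `read_timebase` and `count_backward_steps` are made up for illustration. It assumes a 32-bit PowerPC build with GCC-style inline asm, and on any other target it falls back to a fake counter so the file still compiles. It shows the two changes Gino reports trying: an isync ahead of the reads, and widening to a 64-bit type before the shift. Whether the isync is actually required on a given core is the open question Ben raises above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Join the upper and lower 32-bit halves of the time base.
 * The cast widens 'upper' before the shift; shifting a 32-bit
 * value left by 32 bits is undefined behaviour in C. */
static uint64_t combine_timebase(uint32_t upper, uint32_t lower)
{
    return ((uint64_t)upper << 32) | lower;
}

#if defined(__powerpc__) && !defined(__powerpc64__)
/* 32-bit PowerPC: read upper, lower, upper again, and retry if the
 * upper half changed, i.e. the lower half wrapped between the reads. */
static uint64_t read_timebase(void)
{
    uint32_t upper, lower, upper2;

    do {
        /* Assumption: isync keeps the reads from running ahead of
         * earlier instructions on cores where mftb alone does not. */
        __asm__ volatile("isync" : : : "memory");
        __asm__ volatile("mftbu %0" : "=r"(upper));
        __asm__ volatile("mftb %0" : "=r"(lower));
        __asm__ volatile("mftbu %0" : "=r"(upper2));
    } while (upper != upper2);

    return combine_timebase(upper, lower);
}
#else
/* Any other target: hypothetical stand-in that only counts calls. */
static uint64_t read_timebase(void)
{
    static uint64_t fake_ticks;
    return ++fake_ticks;
}
#endif

/* Count how many samples are smaller than the one before them. */
static size_t count_backward_steps(const uint64_t *samples, size_t n)
{
    size_t backward = 0;

    assert(samples != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < samples[i - 1])
            backward++;
    }
    return backward;
}
```

Filling an array with successive `read_timebase()` results and passing it to `count_backward_steps` gives a nonzero count whenever a later sample is smaller than the one before it, which is the symptom reported in this thread.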



--001636b144abc085b60482aa9511-- --001636b144abc085c20482aa9513 Content-Type: application/octet-stream; name="timebase.c" Content-Disposition: attachment; filename="timebase.c" Content-Transfer-Encoding: base64 X-Attachment-Id: f_g78cn0ma0 LyogVFNDIHN5bmMgdGVzdA0KICoJCWJ5OiBqb2huIHN0dWx0eiAoam9obnN0dWxAdXMuaWJtLmNv bSkNCiAqCQkoQykgQ29weXJpZ2h0IElCTSAyMDAzLCAyMDA1DQogKgkJTGljZW5zZWQgdW5kZXIg dGhlIEdQTA0KICovDQoNCg0KI2luY2x1ZGUgPHN0ZGlvLmg+DQojaW5jbHVkZSA8c3lzL3RpbWUu aD4NCiNpbmNsdWRlIDxzdGRsaWIuaD4NCg0KI2RlZmluZSBDQUxMU19QRVJfTE9PUCA2NA0KDQp2 b2xhdGlsZSB1bnNpZ25lZCBsb25nIGxvbmcgZ2V0VGltZUJhc2UoKQ0Kew0KCXVuc2lnbmVkIGxv bmcgdXBwZXIsbG93ZXIsdXBwZXIyOw0KCWRvIHsNCgkJYXNtIHZvbGF0aWxlKCJzeW5jIjo6OiJt ZW1vcnkiKTsNCgkJYXNtIHZvbGF0aWxlKCJtZnRidSAlMCIgOiAiPXIiICh1cHBlcikpOw0KCQlh c20gdm9sYXRpbGUoInN5bmMiOjo6Im1lbW9yeSIpOw0KCQlhc20gdm9sYXRpbGUoIm1mdGJsICUw IiA6ICI9ciIgKGxvd2VyKSk7DQoJCWFzbSB2b2xhdGlsZSgic3luYyI6OjoibWVtb3J5Iik7DQoJ CWFzbSB2b2xhdGlsZSgibWZ0YnUgJTAiIDogIj1yIiAodXBwZXIyKSk7DQoJCWFzbSB2b2xhdGls ZSgic3luYyI6OjoibWVtb3J5Iik7DQoJfXdoaWxlKHVwcGVyMiE9dXBwZXIpOw0KDQoJcmV0dXJu ICh1cHBlcjw8MzIpfGxvd2VyOw0KfQ0KDQppbnQgbWFpbihpbnQgYXJnYywgY2hhciAqYXJndltd KQ0Kew0KCXZvbGF0aWxlIHVuc2lnbmVkIGxvbmcgbG9uZyBsaXN0W0NBTExTX1BFUl9MT09QXTsN Cglib29sIGJhZFtDQUxMU19QRVJfTE9PUF07DQoJaW50IGksIGluY29uc2lzdGVudDsNCg0KDQoJ LyogdGltZXN0YW1wIHN0YXJ0IG9mIHRlc3QgKi8NCglzeXN0ZW0oImRhdGUiKTsNCgl3aGlsZSgx KXsNCgkJaW5jb25zaXN0ZW50ID0gMDsNCg0KCQkvKiBGaWxsIGxpc3QgKi8NCgkJZm9yKGk9MDsg aSA8IENBTExTX1BFUl9MT09QOyBpKyspDQoJCQlsaXN0W2ldID0gZ2V0VGltZUJhc2UoKTsNCgkJ DQoJCS8qIENoZWNrIGZvciBpbmNvbnNpc3RlbmNpZXMgKi8NCgkJZm9yKGk9MDsgaSA8IENBTExT X1BFUl9MT09QLTE7IGkrKykNCgkJew0KCQkJaWYobGlzdFtpXSA+IGxpc3RbaSsxXSkNCgkJCXsN CgkJCQlpbmNvbnNpc3RlbnQgPSBpKzE7DQoJCQkJYmFkW2ldID0gdHJ1ZTsNCgkJCX0NCgkJCWVs c2V7DQoJCQkJYmFkW2ldID0gZmFsc2U7DQoJCQl9DQoJCX0NCg0KCQkvKiBkaXNwbGF5IGluY29u c2lzdGVuY3kgKi8NCgkJaWYoaW5jb25zaXN0ZW50KXsNCgkJCWluY29uc2lzdGVudC0tOw0KCQkJ 
Zm9yKGk9MDsgaSA8IENBTExTX1BFUl9MT09QOyBpKyspew0KCQkJCWlmKGJhZFtpXSA9PSB0cnVl KQ0KCQkJCQlwcmludGYoIi0tLS0tLS0tLS0tLS0tLS0tLS0tXG4iKTsNCgkJCQlwcmludGYoIjB4 JWxseFxuIixsaXN0W2ldKTsNCgkJCQlpZihiYWRbaS0xXSA9PSB0cnVlICYmIGJhZFtpXSA9PSBm YWxzZSApDQoJCQkJCXByaW50ZigiLS0tLS0tLS0tLS0tLS0tLS0tLS1cbiIpOw0KCQkJfQ0KCQkJ ZmZsdXNoKDApOw0KCQkJLyogdGltZXN0YW1wIGluY29uc2lzdGVuY3kqLw0KCQkJc3lzdGVtKCJk YXRlIik7CQ0KCQl9DQoNCgl9DQoJcmV0dXJuIDA7DQp9DQoNCg== --001636b144abc085c20482aa9513--