From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <50F45358.1020601@xenomai.org>
Date: Mon, 14 Jan 2013 19:50:00 +0100
From: Gilles Chanteperdrix
MIME-Version: 1.0
References: <50F19CE1.5080106@zultron.com> <50F19DE4.4030805@xenomai.org>
 <50F239E1.40400@zultron.com> <50F2A59D.5050600@xenomai.org>
 <50F30781.3050302@zultron.com> <50F30DE5.7030007@xenomai.org>
 <50F3F36A.8050804@siemens.com>
In-Reply-To: <50F3F36A.8050804@siemens.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Xenomai] 3.5.7 "I-pipe: could not find timer" (Was: Re: Kernel OOPS during regression tests)
List-Id: Discussions about the Xenomai project
To: Jan Kiszka
Cc: John Morris, Xenomai

On 01/14/2013 01:00 PM, Jan Kiszka wrote:
> On 2013-01-13 20:41, Gilles Chanteperdrix wrote:
>> On 01/13/2013 08:14 PM, John Morris wrote:
>>
>>> Hi Gilles and Jan,
>>>
>>> Note change of thread subject. I'm starting to get confused.
>>>
>>> On 01/13/2013 06:16 AM, Gilles Chanteperdrix wrote:
>>>> On 01/13/2013 05:36 AM, John Morris wrote:
>>>>
>>>>> On 01/12/2013 11:31 AM, Gilles Chanteperdrix wrote:
>>>>>> On 01/12/2013 06:26 PM, John Morris wrote:
>>>>>>> 1) Most worrisome is "kernel BUG at mm/mmap.c:2313! invalid opcode:
>>>>>>> 0000 [#2] SMP". Is this related to HEAPSZ or STACKPOOLSZ? My mind is
>>>>>>> getting foggy about all the things I've seen, but it seems like it was
>>>>>>> happening earlier in the tests until these config values were quadrupled.
>>>>>>
>>>>>>
>>>>>> Could you check whether you can reproduce this issue with the I-pipe
>>>>>> patch for 3.5.7 ? The next xenomai release will be based on this version
>>>>>> on x86 anyway.
>>>>>> Branch for-core-3.5.7 in ipipe-gch.git
>>>>>
>>>>> Different problem; Xenomai wouldn't start:
>>>>>
>>>>> I-pipe: could not find timer for cpu #0
>>>>>
>>>>> dmesg:
>>>>> http://www.zultron.com/static/2013/01/xenomai/3.5.7-test/foo-dmesg-3.5.7.log
>>>>>
>>>>> .config:
>>>>> www.zultron.com/static/2013/01/xenomai/3.5.7-test/foo-kernel-3.5.7.config
>>>>>
>>>>> FYI, I found this same problem on two of my systems while testing your
>>>>> Debian packages. Both AMD Athlon II 64-bit (one single, one dual core).
>>>>> They're about the same generation of motherboards, AM2 or AM2+ socket.
>>>>> One is AMD 770 chipset, the other NVidia GeForce 6100 / nForce 430.
>>>>>
>>>>> Hardware looks similar to Mariusz's in this post, where he had the same
>>>>> problem:
>>>>>
>>>>> http://www.xenomai.org/pipermail/xenomai/2012-December/027121.html
>>>>>
>>>>> He's also running AMD 64-bit on a Gigabyte motherboard, but the next
>>>>> generation AM3 socket, Phenom CPU, AMD 890 chipset. I don't have a C1E
>>>>> BIOS option on these boards to enable/disable. These same motherboards
>>>>> don't suffer this problem with mainline Xenomai on 3.5.3.
>>>>
>>>>
>>>> If you had the same problem as Marius, you would have seen it with
>>>> 3.5.3, and you would get the message in the dmesg about C1E, so, it is
>>>> probably something else.
>>>
>>> Yes, I'm definitely getting confused. I did see the same problem with
>>> C1E, but only while running your 3.5.7 .debs, and not in the 3.5.7 el6
>>> packages that are the main subject of this sub-thread:
>>>
>>> http://www.zultron.com/static/2013/01/xenomai/3.5.7-deb-gilles/dmesg-3.5.7-deb-gilles.log
>>
>>
>> Ah, that is because I rebased the I-pipe tree in between, and at some
>> point the code printing the message was wrong (ATOMIC_INIT(0) instead of
>> ATOMIC_INIT(-1)). That is my fault then, sorry.
>>
>>>
>>>> Could you run
>>>>
>>>> cat /proc/timer_list
>>>
>>> Back to el6 again, 3.5.7 i-pipe:
>>>
>>> http://www.zultron.com/static/2013/01/xenomai/3.5.7-test/foo-cat-proc-timer_list-3.5.7-test.log
>>
>>
>> The LAPIC is definitely up and running (mode: 3). So, it probably means
>> that the erratum detection is not sufficient to decide not to use a
>> LAPIC. Checking your logs, we see:
>>
>> using AMD E400 aware idle routine
>>
>> which means the LAPIC could potentially be unusable, but the idle
>> routine also checks for a bit in a K8 specific MSR and prints the message:
>>
>> System has AMD C1E enabled
>>
>> if this bit is set. In your case the message is not printed, so the
>> bit is not set. So, the LAPIC is usable, but due to the changes I made
>> to try and print a message in Marius' case, I broke the detection in your
>> case.
>>
>> I have just pushed a rework for this commit in the for-core-3.5.7 branch
>> in ipipe-gch git.
>
> Could you fold those changes into a single patch and add a few words to
> the changelog that setup_APIC_timer is too early to check? Then I'll
> merge it into the x86 queue.

I am trying to reach a point where we add bug-fixes and only bug-fixes
to re-release 2.6.2, so the for-core-3.5.7 branch is what I intend to
put in this release. I would like to avoid the other commits in your
branch.

-- 
Gilles.
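
For illustration only, the two-step check described in the quoted
explanation (the E400 erratum flag alone is not enough; the C1E bit in
the K8 MSR must also be set before the LAPIC timer is ruled out) can be
sketched as the user-space C below. This is not the I-pipe or kernel
code. The helper names c1e_active() and lapic_timer_usable() are
hypothetical, and the MSR number and mask are quoted from memory of the
mainline msr-index.h, so check them against your tree. The MSR value is
passed in as a plain integer, since a real read needs kernel context.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values, recalled from arch/x86/include/asm/msr-index.h in
 * mainline Linux; verify against the kernel tree you are building.
 * MSR_K8_INT_PENDING_MSG is the register a kernel would read with
 * rdmsr(); it is listed here for reference only.
 */
#define MSR_K8_INT_PENDING_MSG   0xc0010055u
#define K8_INTP_C1E_ACTIVE_MASK  0x18000000u

/* Hypothetical helper: true if any C1E-active bit is set in the low
 * 32 bits of the interrupt pending message MSR. */
static bool c1e_active(uint32_t int_pending_lo)
{
    return (int_pending_lo & K8_INTP_C1E_ACTIVE_MASK) != 0;
}

/*
 * Hypothetical helper: decide whether the LAPIC timer can be used.
 * A CPU without the E400 erratum is always fine. A CPU with the
 * erratum is only a problem once C1E is actually enabled.
 */
static bool lapic_timer_usable(bool cpu_has_e400_erratum,
                               uint32_t int_pending_lo)
{
    if (!cpu_has_e400_erratum)
        return true;
    return !c1e_active(int_pending_lo);
}
```

With the erratum flag set and the MSR bits clear, which matches the
case above where "System has AMD C1E enabled" is never printed,
lapic_timer_usable() returns true. Because the quoted discussion notes
that setup_APIC_timer is too early to check, a real implementation
would have to evaluate this later in boot; the sketch does not model
that timing.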