* Re: Status update on sparc32 genirq support
From: David Miller @ 2011-03-08 7:01 UTC
To: sparclinux
From: Sam Ravnborg <sam@ravnborg.org>
Date: Tue, 8 Mar 2011 07:00:39 +0100
> Added davem...
> We see strange SEGV faults in userspace and fail to read from ext2.
> All on some (but not all) sparc32 boxes.
I saw the original report.
But reverting this commit is the wrong thing to do from what I can
tell.
Either we have:
1) A compiler code gen bug.
2) Some piece of code which is sparc32-specific is invoking memset
or memcpy in a way which makes assumptions which are in fact not
valid.
3) The code change is merely making cache offsets change, masking the
true problem.
Especially in cases #2 and #3 we're just hiding a heisen-bug and
not fixing the real problem.
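To make case 2 concrete, here is one generic, hypothetical example of an invalid assumption (not code from the sparc32 tree): treating memcpy() as if it always copied front to back. With overlapping buffers that is undefined behaviour, so it may happen to work with one memcpy implementation and break with another. The helper name shift_right_by_one is made up for illustration.

```c
#include <string.h>

/* Hypothetical example of case 2: shifting a buffer right by one byte.
 * Source and destination overlap, so memcpy() would be undefined
 * behaviour: C does not specify the order in which memcpy() copies,
 * and an optimised implementation may copy backwards or in blocks. */
static void shift_right_by_one(char *buf, size_t len)
{
    if (len < 2)
        return;                       /* nothing to shift */

    /* memcpy(buf + 1, buf, len - 1);    invalid: regions overlap */
    memmove(buf + 1, buf, len - 1);   /* defined for overlapping regions */
}
```

For example, with `char buf[] = "abcdef";`, calling `shift_right_by_one(buf, strlen(buf))` leaves "aabcde" in buf.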
* Re: Status update on sparc32 genirq support
From: Sam Ravnborg @ 2011-03-08 7:08 UTC
To: sparclinux
On Mon, Mar 07, 2011 at 11:01:20PM -0800, David Miller wrote:
> From: Sam Ravnborg <sam@ravnborg.org>
> Date: Tue, 8 Mar 2011 07:00:39 +0100
>
> > Added davem...
> > We see strange SEGV faults in userspace and fail to read from ext2.
> > All on some (but not all) sparc32 boxes.
>
> I saw the original report.
>
> But reverting this commit is the wrong thing to do from what I can
> tell.
>
> Either we have:
>
> 1) A compiler code gen bug.
>
> 2) Some piece of code which is sparc32 specific is invoking memset
> or memcpy in a way which makes assumptions which are in fact not
> valid
>
> 3) The code change is merely making cache offsets change, masking the
> true problem
>
> Especially in cases #2 and #3 we're just hiding a heisen-bug and
> not fixing the real problem.
Agree on this.
But first step is to get confirmation that reverting this commit
indeed fixes the bug. Then we can go hunting for 2), 3) or 1).
I hope we will find that 2) is the culprit.
Sam
* Re: Status update on sparc32 genirq support
From: David Miller @ 2011-03-08 7:19 UTC
To: sparclinux
From: Sam Ravnborg <sam@ravnborg.org>
Date: Tue, 8 Mar 2011 08:08:42 +0100
> But first step is to get confirmation that reverting this commit
> indeed fixes the bug. Then we can go hunting for 2), 3) or 1).
> I hope we will find that 2) is the culprit.
Agreed, information never hurts :-)
* Re: Status update on sparc32 genirq support
From: Marcel van Nies @ 2011-03-08 7:37 UTC
To: sparclinux
Hi,
It appears that two consecutive commits are causing problems on
hyperSPARC; I noticed that too late.
Commit 4d14a459857bd151ecbd14bcd37b4628da00792b (the one I reported
earlier) only causes the system to hang, not panic:
[ 11.266665] sd 0:0:1:0: [sda] Attached SCSI disk
[ 11.279998] sd 0:0:3:0: [sdb] Attached SCSI disk
[ 11.299998] kjournald starting. Commit interval 5 seconds
[ 11.303332] EXT3-fs: mounted filesystem with writeback data mode.
[ 11.306665] VFS: Mounted root (ext3 filesystem) readonly on device 8:20.
[ 11.309998] Freeing unused kernel memory: 100k freed
<system hangs here - stop-A does go back to prom>
and
commit c658ad1b4e1520511da8323aa5e60d444cc303ed
Author: David S. Miller <davem@davemloft.net>
Date: Fri Dec 11 00:44:47 2009 -0800
sparc64: Add syscall tracepoint support.
Signed-off-by: David S. Miller <davem@davemloft.net>
actually makes the kernel panic:
[ 11.336665] Freeing unused kernel memory: 100k freed
[ 11.419998] Kernel panic - not syncing: Attempted to kill init!
[ 11.423332] [f002f5b8 : do_group_exit+0x84/0xb4 ]
[f0039490 : get_signal_to_deliver+0x338/0x35c ]
[f00124cc : do_signal+0x30/0x8f0 ]
[f0012da0 : do_notify_resume+0x14/0x38 ]
[f000fca4 : signal_p+0x14/0x24 ]
[f000edfc : srmmu_fault+0x58/0x68 ]
[ 11.466665] Press Stop-A (L1-A) to return to the boot prom
Marcel
* Re: Status update on sparc32 genirq support
From: Marcel van Nies @ 2011-03-08 7:45 UTC
To: sparclinux
Hi,
> But first step is to get confirmation that reverting this commit
> indeed fixes the bug
I'll try that.
M
* Re: Status update on sparc32 genirq support
From: Marcel van Nies @ 2011-03-08 11:17 UTC
To: sparclinux
Hi,
2.6.33.7 with commit 4d14a459857bd151ecbd14bcd37b4628da00792b reverted
does not segfault.
I also tried sparc-next-2.6, but I messed up my tree somehow. I will
try again later.
M
* Re: Status update on sparc32 genirq support
From: Marcel van Nies @ 2011-03-08 20:22 UTC
To: sparclinux
Hi,
The good news:
sparc-next-2.6 with commit 4d14a459857bd151ecbd14bcd37b4628da00792b reverted
does NOT segfault. I did not apply the genirq patch yet.
The bad news:
Segfault gone, say hello to EXT2 read failure :o(
I'll rebuild this kernel with the esp_debug.patch Sam sent a couple of days ago.
[ 0.233333] esp: esp0, regs[fd00a000:fd009000] irq[36]
[ 0.236666] esp: esp0 is a FAS100A, 40 MHz (ccf=0), SCSI ID 7
[ 3.243333] scsi0 : esp
[ 3.483332] scsi 0:0:1:0: Direct-Access FUJITSU MAP3735N SUN72G 0401 PQ: 0 ANSI: 4
[ 3.486666] scsi target0:0:1: Beginning Domain Validation
[ 3.493332] scsi target0:0:1: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
[ 3.499999] scsi target0:0:1: Domain Validation skipping write tests
[ 3.503332] scsi target0:0:1: Ending Domain Validation
[ 3.743332] scsi 0:0:3:0: Direct-Access FUJITSU MAP3735N SUN72G 0401 PQ: 0 ANSI: 4
[ 3.746666] scsi target0:0:3: Beginning Domain Validation
[ 3.753332] scsi target0:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
[ 3.756666] scsi target0:0:3: Domain Validation skipping write tests
[ 3.759999] scsi target0:0:3: Ending Domain Validation
[ 4.469999] esp: esp1, regs[fd00c000:fd00b000] irq[53]
[ 4.473332] esp: esp1 is a FASHME, 40 MHz (ccf=0), SCSI ID 7
[ 7.479999] scsi1 : esp
...
[ 11.029998] sd 0:0:1:0: [sda] 143374738 512-byte logical blocks: (73.4 GB/68.3 GiB)
[ 11.033332] sd 0:0:3:0: [sdb] 143374738 512-byte logical blocks: (73.4 GB/68.3 GiB)
[ 11.036665] sd 0:0:1:0: [sda] Write Protect is off
[ 11.043332] sd 0:0:1:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 11.046665] sd 0:0:3:0: [sdb] Write Protect is off
[ 11.053332] sd 0:0:3:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 11.066665] sda: sda1 sda2 sda3
[ 11.073332] sdb: sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7
[ 11.089998] sd 0:0:1:0: [sda] Attached SCSI disk
[ 11.093332] sd 0:0:3:0: [sdb] Attached SCSI disk
[ 11.106665] EXT3-fs: barriers not enabled
[ 11.113332] kjournald starting. Commit interval 5 seconds
[ 11.116665] EXT3-fs (sdb4): mounted filesystem with ordered data mode
[ 11.119998] VFS: Mounted root (ext3 filesystem) readonly on device 8:20.
[ 11.123332] Freeing unused kernel memory: 108k freed
INIT: version 2.86 booting
[ 12.673332] NET: Registered protocol family 1
Gentoo Linux; http://www.gentoo.org/
Copyright 1999-2007 Gentoo Foundation; Distributed under the GPLv2
* Mounting proc at /proc ... [ ok ]
* Mounting sysfs at /sys ... [ ok ]
* Mounting /dev for udev ... [ ok ]
...
blahblah
...
* Checking root filesystem ...fsck.ext3: No such file or directory while trying to open /dev/sdb4
/dev/sdb4:
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193 <device>
* Filesystem couldn't be fixed :(
[ !! ]
Give root password for maintenance
(or type Control-D to continue):
Marcel
* Re: Status update on sparc32 genirq support
From: Sam Ravnborg @ 2011-03-08 21:09 UTC
To: sparclinux
On Tue, Mar 08, 2011 at 09:22:07PM +0100, Marcel van Nies wrote:
> Hi,
>
> The good news:
> sparc-next-2.6 with commit 4d14a459857bd151ecbd14bcd37b4628da00792b reverted
> does NOT segfault. I did not apply the genirq patch yet.
Narrowing it down to a single patch is good.
>
> The bad news:
> Segfault gone, say hello to EXT2 read failure :o(
So we are dealing with two faults. Not a surprise considering how
little we have tested on sparc32 lately.
> I'll rebuild this kernel with the esp_debug.patch Sam sent a couple of days ago.
I have tried said patch myself.
You may try to play around with the value as it produces a lot of output.
Regarding the segfault - the easiest way forward would be to split the
patch up in smaller chunks so we know which part causes the segfault to happen.
I assume you had to hand-apply the revert. If you could post the exact patch
you used for revert I will try to split it up in smaller logical parts.
But likely not until the weekend.
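Splitting the patch turns the hunt into a search over its pieces. A minimal C sketch of that search, assuming the commit can be cut into n ordered pieces that each build on their own, and that applying more pieces never makes the segfault disappear again. The names reproduces_fn, first_bad_chunk and the simulated predicate are hypothetical; in practice each predicate call stands for one build-and-boot cycle, so roughly log2(n) boots are needed instead of n.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: build and boot a kernel with pieces 1..k of
 * the split-up commit applied on top of the tree where the commit is
 * reverted, and report whether the userspace segfault reproduces. */
typedef bool (*reproduces_fn)(size_t k);

/* Binary search for the first piece that introduces the failure.
 * Assumes reproduces(0) is false (the full revert is known-good) and
 * that the failure is monotonic: once piece k triggers it, every
 * larger k also triggers it. Returns 0 if reproduces(n) is false. */
static size_t first_bad_chunk(size_t n, reproduces_fn reproduces)
{
    if (n == 0 || !reproduces(n))
        return 0;

    size_t good = 0, bad = n;   /* invariant: good is OK, bad fails */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (reproduces(mid))
            bad = mid;
        else
            good = mid;
    }
    return bad;                 /* piece number `bad` is the culprit */
}

/* Simulated predicate for dry-running the search: pretend that piece
 * number simulated_culprit is the one that breaks things. */
static size_t simulated_culprit = 5;

static bool simulated_reproduces(size_t k)
{
    return k >= simulated_culprit;
}
```

With 8 pieces and piece 5 as the simulated culprit, `first_bad_chunk(8, simulated_reproduces)` returns 5 after four predicate calls.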
I have a SPARCstation 5 that I managed to boot - unfortunately it
did not show erratic behaviour as you describe.
And for now I do not have any Ext2 filesystem on disk to play with.
Sam
* Re: Status update on sparc32 genirq support
From: Marcel van Nies @ 2011-03-08 21:13 UTC
To: sparclinux
Hi,
As expected, esp_debug gives a lot of output.
Is there anything in particular to look out for?
Btw:
At this point:
> Give root password for maintenance
> (or type Control-D to continue):
I can logon, and reads from disk seem to go fine.
# mount -n -o remount,rw /
Then also writes to disk seem to go fine.
So, is this an ESP or EXT2 bug at all?
Marcel
* Re: Status update on sparc32 genirq support
From: David Miller @ 2011-03-08 21:19 UTC
To: sparclinux
From: Marcel van Nies <morcles@gmail.com>
Date: Tue, 8 Mar 2011 22:13:05 +0100
> So, is this an ESP or EXT2 bug at all?
The error message is that the program "fsck.ext3" cannot be found.
Does that binary exist in the correct location so that fsck can
find it?
* Re: Status update on sparc32 genirq support
From: Marcel van Nies @ 2011-03-08 21:20 UTC
To: sparclinux
Hi Sam,
> Regarding the segfault - the easiest way forward would be to split the
> patch up in smaller chunks so we know which part causes the segfault to happen.
Yes, I'll see if I can work that out.
> I assume you had to hand-apply the revert.
That's what I did.
> If you could post the exact patch you used for revert
> I will try to split it up in smaller logical parts.
I'll post that patch later.
> I have a SPARCstation 5 that I managed to boot -
> unfortunately it did not show erratic behaviour as you describe.
My SPARCstation 5 boots sparc-next-2.6 with your genirq patch just fine too.
It has been up for almost 2 days now, doing things. It has 4 EXT3
filesystems. No problemo.
Marcel
* Re: Status update on sparc32 genirq support
From: Marcel van Nies @ 2011-03-08 21:27 UTC
To: sparclinux
Hi,
/sbin/fsck.ext3 is there, it's used to check / in runlevel 1, i.e. every boot.
I can happily boot anything 2.6.32.27 or earlier.
Marcel
On Tue, Mar 8, 2011 at 10:19 PM, David Miller <davem@davemloft.net> wrote:
> From: Marcel van Nies <morcles@gmail.com>
> Date: Tue, 8 Mar 2011 22:13:05 +0100
>
>> So, is this an ESP or EXT2 bug at all ?
>
> The error message is that the program "fsck.ext3" cannot be found.
> Does that binary exist in the correct location so that fsck can
> find it?
>
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (10 preceding siblings ...)
2011-03-08 21:27 ` Marcel van Nies
@ 2011-03-08 21:30 ` Marcel van Nies
2011-03-08 21:30 ` David Miller
` (13 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-08 21:30 UTC (permalink / raw)
To: sparclinux
Correction:
/sbin/fsck.ext3 is a link to e2fsck.
/sbin/e2fsck is there.
Is the link the problem ?
Let's see.
M
On Tue, Mar 8, 2011 at 10:27 PM, Marcel van Nies <morcles@gmail.com> wrote:
> Hi,
>
> /sbin/fsck.ext3 is there, it's used to check / in runlevel 1, i.e. every boot.
>
> I can happily boot anything 2.6.32.27 or earlier.
>
> Marcel
>
> On Tue, Mar 8, 2011 at 10:19 PM, David Miller <davem@davemloft.net> wrote:
>> From: Marcel van Nies <morcles@gmail.com>
>> Date: Tue, 8 Mar 2011 22:13:05 +0100
>>
>>> So, is this an ESP or EXT2 bug at all ?
>>
>> The error message is that the program "fsck.ext3" cannot be found.
>> Does that binary exist in the correct location so that fsck can
>> find it?
>>
>
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (11 preceding siblings ...)
2011-03-08 21:30 ` Marcel van Nies
@ 2011-03-08 21:30 ` David Miller
2011-03-08 21:51 ` Marcel van Nies
` (12 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2011-03-08 21:30 UTC (permalink / raw)
To: sparclinux
From: Marcel van Nies <morcles@gmail.com>
Date: Tue, 8 Mar 2011 22:27:39 +0100
> /sbin/fsck.ext3 is there, it's used to check / in runlevel 1, i.e. every boot.
>
> I can happily boot anything 2.6.32.27 or earlier.
One large possibility is that there is a missing cache flush somewhere,
and reverting the memcpy/memset change masks it.
Or, like I said yesterday, bad gcc code generation wrt. those routines.
Looking at ESP logs is not going to give much information as that driver
has been stressed heavily on sparc64 for years without any blips.
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (12 preceding siblings ...)
2011-03-08 21:30 ` David Miller
@ 2011-03-08 21:51 ` Marcel van Nies
2011-03-08 22:00 ` David Miller
` (11 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-08 21:51 UTC (permalink / raw)
To: sparclinux
Hi,
> One large possibility is that there is a missing cache flush somewhere,
> and reverting the memcpy/memset change masks it.
2.6.33.7 with reverted commit is fine too, still leaving
2.6.34 - 2.6.38 for introducing weird behavior.
I've got 3 kinds of SPARC32 here, and the big difference is CPU.
All other hardware in the boxes is "the same".
The software (kernel et al.) is pretty much the same too.
Everything is ok, except for hyperSPARC.
So what makes the difference? This one :
[ 0.000000] Boot time fixup v1.6. 4/Mar/98 Jakub Jelinek (jj@ultra.linux.cz).
Patching kernel for srmmu[ROSS HyperSparc]/iommu
Combine that with srmmu_fault as in
init[1]: segfault at 0 ip 5000dac8 (rpc f000eea8) sp efe738a0 error
30001 in ld-2.3.5.so[50000000+1a000]
Kernel panic - not syncing: Attempted to kill init!
[f002ed74 : do_group_exit+0x84/0xb4 ]
[f0039a24 : get_signal_to_deliver+0x338/0x35c ]
[f0011fbc : do_signal+0x30/0x914 ]
[f00128b4 : do_notify_resume+0x14/0x38 ]
[f000fd50 : signal_p+0x14/0x24 ]
[f000eea8 : srmmu_fault+0x58/0x68 ]
I start thinking there is something wrong with Jakub's srmmu patch for
hyperSPARC...
Marcel
On Tue, Mar 8, 2011 at 10:30 PM, David Miller <davem@davemloft.net> wrote:
> From: Marcel van Nies <morcles@gmail.com>
> Date: Tue, 8 Mar 2011 22:27:39 +0100
>
>> /sbin/fsck.ext3 is there, it's used to check / in runlevel 1, i.e. every boot.
>>
>> I can happily boot anything 2.6.32.27 or earlier.
>
> One large possibility is that there is a missing cache flush somewhere,
> and reverting the memcpy/memset change masks it.
>
> Or, like I said yesterday, bad gcc code generation wrt. those routines.
>
> Looking at ESP logs is not going to give much information as that driver
> has been stressed heavily on sparc64 for years without any blips.
>
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (13 preceding siblings ...)
2011-03-08 21:51 ` Marcel van Nies
@ 2011-03-08 22:00 ` David Miller
2011-03-09 5:25 ` Bob Breuer
` (10 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2011-03-08 22:00 UTC (permalink / raw)
To: sparclinux
From: Marcel van Nies <morcles@gmail.com>
Date: Tue, 8 Mar 2011 22:51:00 +0100
> I start thinking there is something wrong with Jakub's srmmu patch for
> hyperSPARC...
That's just the code that links up the cpu specific cache and tlb
flushing routines, it's been doing that for more than 10 years.
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (14 preceding siblings ...)
2011-03-08 22:00 ` David Miller
@ 2011-03-09 5:25 ` Bob Breuer
2011-03-09 6:16 ` Bob Breuer
` (9 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: Bob Breuer @ 2011-03-09 5:25 UTC (permalink / raw)
To: sparclinux
Marcel van Nies wrote:
> Hi,
>
> The good news:
> sparc-next-2.6 with commit 4d14a459857bd151ecbd14bcd37b4628da00792b reverted
> does NOT segfault. I did not apply the genirq patch yet.
>
> The bad news:
> Segfault gone, say hello to EXT2 read failure :o(
>
> I'll rebuild this kernel with the esp_debug.patch Sam sent a couple of days ago.
>
>
> [ 0.233333] esp: esp0, regs[fd00a000:fd009000] irq[36]
> [ 0.236666] esp: esp0 is a FAS100A, 40 MHz (ccf=0), SCSI ID 7
> [ 3.243333] scsi0 : esp
> [ 3.483332] scsi 0:0:1:0: Direct-Access FUJITSU MAP3735N
> SUN72G 0401 PQ: 0 ANSI: 4
> [ 3.486666] scsi target0:0:1: Beginning Domain Validation
> [ 3.493332] scsi target0:0:1: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
> [ 3.499999] scsi target0:0:1: Domain Validation skipping write tests
> [ 3.503332] scsi target0:0:1: Ending Domain Validation
> [ 3.743332] scsi 0:0:3:0: Direct-Access FUJITSU MAP3735N
> SUN72G 0401 PQ: 0 ANSI: 4
> [ 3.746666] scsi target0:0:3: Beginning Domain Validation
> [ 3.753332] scsi target0:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
> [ 3.756666] scsi target0:0:3: Domain Validation skipping write tests
> [ 3.759999] scsi target0:0:3: Ending Domain Validation
> [ 4.469999] esp: esp1, regs[fd00c000:fd00b000] irq[53]
> [ 4.473332] esp: esp1 is a FASHME, 40 MHz (ccf=0), SCSI ID 7
> [ 7.479999] scsi1 : esp
> ...
> [ 11.029998] sd 0:0:1:0: [sda] 143374738 512-byte logical blocks:
> (73.4 GB/68.3 GiB)
> [ 11.033332] sd 0:0:3:0: [sdb] 143374738 512-byte logical blocks:
> (73.4 GB/68.3 GiB)
> [ 11.036665] sd 0:0:1:0: [sda] Write Protect is off
> [ 11.043332] sd 0:0:1:0: [sda] Write cache: disabled, read cache:
> enabled, doesn't support DPO or FUA
> [ 11.046665] sd 0:0:3:0: [sdb] Write Protect is off
> [ 11.053332] sd 0:0:3:0: [sdb] Write cache: disabled, read cache:
> enabled, doesn't support DPO or FUA
> [ 11.066665] sda: sda1 sda2 sda3
> [ 11.073332] sdb: sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7
> [ 11.089998] sd 0:0:1:0: [sda] Attached SCSI disk
> [ 11.093332] sd 0:0:3:0: [sdb] Attached SCSI disk
> [ 11.106665] EXT3-fs: barriers not enabled
> [ 11.113332] kjournald starting. Commit interval 5 seconds
> [ 11.116665] EXT3-fs (sdb4): mounted filesystem with ordered data mode
> [ 11.119998] VFS: Mounted root (ext3 filesystem) readonly on device 8:20.
> [ 11.123332] Freeing unused kernel memory: 108k freed
> INIT: version 2.86 booting
> [ 12.673332] NET: Registered protocol family 1
>
> Gentoo Linux; http://www.gentoo.org/
> Copyright 1999-2007 Gentoo Foundation; Distributed under the GPLv2
>
> * Mounting proc at /proc ... [ ok ]
> * Mounting sysfs at /sys ... [ ok ]
> * Mounting /dev for udev ... [ ok ]
> ...
> blahblah
> ...
> * Checking root filesystem ...fsck.ext3: No such file or directory
> while trying to open /dev/sdb4
Check if the device node for /dev/sdb4 exists.
For me, udev will sometimes fail to create a device node with the latest
kernel. This is with a SuperSparc cpu, so it's not hyperSparc related.
Of course, it could just be my udev acting up, as I get error messages
from udev such as:
Starting udev: udevd[88]: udev_event_run: fork of child failed: Invalid argument
udevd[88]: udev_event_run: fork of child failed: Invalid argument
udevd[88]: udev_event_run: fork of child failed: Invalid argument
udevd[88]: udev_event_run: fork of child failed: Invalid argument
I'm not sure whether the invalid argument is a udev problem, userspace
incompatibility, kernel bug, or a kernel feature I left out.
Bob
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (15 preceding siblings ...)
2011-03-09 5:25 ` Bob Breuer
@ 2011-03-09 6:16 ` Bob Breuer
2011-03-09 6:37 ` Bob Breuer
` (8 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: Bob Breuer @ 2011-03-09 6:16 UTC (permalink / raw)
To: sparclinux
Marcel van Nies wrote:
> Hi,
>
> git bisect came up with this:
>
> 4d14a459857bd151ecbd14bcd37b4628da00792b is the first bad commit
> commit 4d14a459857bd151ecbd14bcd37b4628da00792b
> Author: David S. Miller <davem@davemloft.net>
> Date: Thu Dec 10 23:32:10 2009 -0800
>
> sparc: Stop trying to be so fancy and use __builtin_{memcpy,memset}()
>
> This mirrors commit ff60fab71bb3b4fdbf8caf57ff3739ffd0887396
> (x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy)
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
My guess is that we're no longer using the special hyperSparc block copy
and fill from mm/hypersparc.S and are now leaving some data in the cache
that wasn't there before.
Unfortunately, my hyperSparc is failing from a completely different commit:
commit b6a2fea39318e43fee84fa7b0b90d68bed92d2ba
Author: Ollie Wild <aaw@google.com>
Date: Thu Jul 19 01:48:16 2007 -0700
mm: variable length argument support
The result I have is that the argv array for new commands looks
completely empty leading to this strange output when booting:
Freeing unused kernel memory: 144k freed
modprobe: FATAL: Module not found.
INIT: version 2.86 booting
: : No such file or directory
INIT: Entering runlevel: 3
: : No such file or directory
There must be a missing cache flush somewhere in there that's needed for
the argv array...
Bob
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (16 preceding siblings ...)
2011-03-09 6:16 ` Bob Breuer
@ 2011-03-09 6:37 ` Bob Breuer
2011-03-09 20:17 ` David Miller
` (7 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: Bob Breuer @ 2011-03-09 6:37 UTC (permalink / raw)
To: sparclinux
Marcel van Nies wrote:
> Hi,
>
>> One large possibility is that there is a missing cache flush somewhere,
>> and reverting the memcpy/memset change masks it.
>
> 2.6.33.7 with reverted commit is fine too, still leaving
> 2.6.34 - 2.6.38 for introducing weird behavior.
>
>
>
> I've got 3 kinds of SPARC32 here, and the big difference is CPU.
> All other hardware in the boxes is "the same".
> The software (kernel et al.) is pretty much the same too.
> Everything is ok, except for hyperSPARC.
>
> So what makes the difference? This one :
> [ 0.000000] Boot time fixup v1.6. 4/Mar/98 Jakub Jelinek (jj@ultra.linux.cz).
> Patching kernel for srmmu[ROSS HyperSparc]/iommu
>
> Combine that with srmmu_fault as in
> init[1]: segfault at 0 ip 5000dac8 (rpc f000eea8) sp efe738a0 error
> 30001 in ld-2.3.5.so[50000000+1a000]
> Kernel panic - not syncing: Attempted to kill init!
> [f002ed74 : do_group_exit+0x84/0xb4 ]
> [f0039a24 : get_signal_to_deliver+0x338/0x35c ]
> [f0011fbc : do_signal+0x30/0x914 ]
> [f00128b4 : do_notify_resume+0x14/0x38 ]
> [f000fd50 : signal_p+0x14/0x24 ]
> [f000eea8 : srmmu_fault+0x58/0x68 ]
>
> I start thinking there is something wrong with Jakub's srmmu patch for
> hyperSPARC...
No, hyperSparc just accesses its data cache in a peculiar way. It is
1-way virtually indexed, so when you map the same physical page to 2
different virtual addresses, you can actually end up with 2 different
cache lines being used simultaneously for the same chunk of physical
memory. This is known as cache aliasing. SuperSparc accesses its
cache using the physical address and doesn't suffer from the same problem.
Bob
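To make the aliasing Bob describes concrete, here is a small C sketch. The cache geometry (a 256 KiB direct-mapped, virtually indexed cache with 4 KiB pages) is an assumption chosen for illustration, not a hyperSPARC specification, and the helper names `cache_colour` and `mappings_alias` are hypothetical. Two virtual mappings of one physical page share their page-offset bits, so they land on different cache lines exactly when the index bits above the page offset (the "cache colour") differ.

```c
#include <stdint.h>

/* Assumed geometry for illustration only, not hyperSPARC data sheet values. */
#define ALIAS_CACHE_SIZE (256u * 1024u)   /* direct-mapped, virtually indexed */
#define ALIAS_PAGE_SIZE  (4u * 1024u)

/* Which page-sized slice ("colour") of the cache a virtual address indexes. */
static uint32_t cache_colour(uint32_t va)
{
    return (va & (ALIAS_CACHE_SIZE - 1u)) / ALIAS_PAGE_SIZE;
}

/* Two virtual mappings of one physical page share the page-offset bits,
 * so they use different cache lines only when their colours differ. */
static int mappings_alias(uint32_t va1, uint32_t va2)
{
    return cache_colour(va1) != cache_colour(va2);
}
```

Under these assumptions `mappings_alias(0x00010000u, 0x00050000u)` is 0 (same colour), while `mappings_alias(0x00010000u, 0x00011000u)` is 1, so data written through one mapping can sit in a cache line the other mapping never looks at until it is flushed.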
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (17 preceding siblings ...)
2011-03-09 6:37 ` Bob Breuer
@ 2011-03-09 20:17 ` David Miller
2011-03-11 21:26 ` Marcel van Nies
` (6 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2011-03-09 20:17 UTC (permalink / raw)
To: sparclinux
From: Bob Breuer <breuerr@mc.net>
Date: Wed, 09 Mar 2011 00:16:50 -0600
> My guess is that we're no longer using the special hyperSparc block copy
> and fill from mm/hypersparc.S and are now leaving some data in the cache
> that wasn't there before.
That is a possibility.
But let's be clear that this only applies to:
1) memcpy calls with constant "count" of PAGE_SIZE
2) memset calls with constant "c" of zero and "count" of PAGE_SIZE
Since those are the only cases where memset/memcpy gets translated
into calls to those optimized Hypersparc routines.
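The two cases above can be cross-checked against the pre-4d14a459 macros in the revert.patch Marcel posts later in this thread. The C sketch below restates that dispatch as plain functions that return the name of the routine a call would have reached; the `*_is_const` flags stand in for `__builtin_constant_p()`, the 4096-byte page size is an assumption, and the function names are hypothetical. Note that the old memcpy path also required an 8-byte-aligned destination before it used `__copy_1page`.

```c
#include <string.h>

/* Assumption for this sketch: sparc32 uses 4 KiB pages. */
#define OLD_PAGE_SIZE 4096UL

/* Routine the old sparc32 memcpy() macro would reach.
 * count_is_const stands in for __builtin_constant_p(n). */
static const char *old_memcpy_target(int count_is_const, unsigned long dst,
                                     unsigned long n)
{
    if (!count_is_const)
        return "__memcpy";              /* __nonconstant_memcpy */
    if (n <= 32)
        return "__builtin_memcpy";      /* small copies left to gcc */
    if ((dst & 7) != 0)
        return "__memcpy";              /* destination not double-word aligned */
    return n == OLD_PAGE_SIZE ? "__copy_1page" : "__memcpy";
}

/* Routine the old sparc32 memset() macro would reach.
 * c_is_const / count_is_const stand in for __builtin_constant_p(). */
static const char *old_memset_target(int c_is_const, int count_is_const,
                                     int c, unsigned long count)
{
    if (!c_is_const || c != 0)
        return "__memset";              /* non-constant or non-zero fill */
    if (count_is_const && count == OLD_PAGE_SIZE)
        return "bzero_1page";
    return "__bzero";
}
```

For example, `old_memcpy_target(1, 0x1000, 4096)` names `__copy_1page`, while the same call with destination `0x1004` falls back to `__memcpy`; only the page-sized, constant-argument cases reach the page helpers.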
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (18 preceding siblings ...)
2011-03-09 20:17 ` David Miller
@ 2011-03-11 21:26 ` Marcel van Nies
2011-03-11 22:40 ` Sam Ravnborg
` (5 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-11 21:26 UTC (permalink / raw)
To: sparclinux
[-- Attachment #1: Type: text/plain, Size: 2842 bytes --]
Hi,
> Regarding the segfault - the easiest way forward would be to split the
> patch up in smaller chunks so we know which part causes the segfault to happen.
>
> I assume you had to hand-apply the revert. If you could post the exact patch
> you used for revert I will try to split it up in smaller logical parts.
I attached the patch which I used to revert commit
4d14a459857bd151ecbd14bcd37b4628da00792b
I did a split of this patch, and built kernels with only the memcpy or
only the memset part reverted.
They both segfault.
memset reverted
===============
[ 11.129998] VFS: Mounted root (ext3 filesystem) readonly on device 8:20.
[ 11.133332] Freeing unused kernel memory: 108k freed
[ 11.196665] init[1]: segfault at 0 ip 5000dac8 (rpc f000eea4)
sp eff4f8a0 error 30001 in ld-2.3.5.so[50000000+1a000]
[ 12.023332] Kernel panic - not syncing: Attempted to kill init!
[ 12.026665] [f002f494 : do_group_exit+0x84/0xb4 ]
[f003a130 : get_signal_to_deliver+0x338/0x35c ]
[f0012654 : do_signal+0x30/0x914 ]
[f0012f4c : do_notify_resume+0x14/0x38 ]
[f000fd4c : signal_p+0x14/0x24 ]
[f000eea4 : srmmu_fault+0x58/0x68 ]
[ 12.069998] Press Stop-A (L1-A) to return to the boot prom
memcpy reverted
===============
[ 11.113332] VFS: Mounted root (ext3 filesystem) readonly on device 8:20.
[ 11.119998] Freeing unused kernel memory: 108k freed
INIT: version 2.86 booting
[ 12.453332] bash[23]: segfault at ffffffe8 ip 5015e81c (rpc 5015e80c)
sp ef868ac8 error 30002 in libc-2.3.5.so[5009a000+11e000]
[ 12.533332] rc[26]: segfault at 50189568 ip 50013ce0 (rpc 5000e034)
sp efbb1558 error 30001 in ld-2.3.5.so[50000000+1a000]
[ 13.156665] modprobe[43]: segfault at 24 ip 500aebdc (rpc 500aeb1c)
sp ef8fd800 error 30001 in libc-2.3.5.so[5004e000+11e000]
modprobe: FATAL: Error inserting unix
(/lib/modules/2.6.38-rc2-up-rev-memcpy/kernel/net/unix/unix.ko):
Invalid module format
[ 13.316665] modprobe[39]: segfault at 50 ip 500b7e28 (rpc 500b7e04)
sp ef885e18 error 30001 in libc-2.3.5.so[5004e000+11e000]
modprobe: FATAL: Error inserting unix
(/lib/modules/2.6.38-rc2-up-rev-memcpy/kernel/net/unix/unix.ko):
Invalid module format
modprobe: *** glibc detected *** realloc(): invalid next size: 0x000262d0 ***
modprobe: FATAL: Error inserting unix
(/lib/modules/2.6.38-rc2-up-rev-memcpy/kernel/net/unix/unix.ko):
Invalid module format
modprobe: FATAL: Error inserting unix
(/lib/modules/2.6.38-rc2-up-rev-memcpy/kernel/net/unix/unix.ko):
Invalid module format
modprobe: FATAL: Error inserting unix
(/lib/modules/2.6.38-rc2-up-rev-memcpy/kernel/net/unix/unix.ko):
Invalid module format
modprobe: *** glibc detected *** realloc(): invalid pointer: 0x000262d0 ***
INIT: Entering runlevel: 3
[ 14.169998] rc[44]: segfault at 3 ip 50013c2c (rpc 5000e034)
sp eff09558 error 30001 in ld-2.3.5.so[50000000+1a000]
Marcel
[-- Attachment #2: revert.patch --]
[-- Type: application/octet-stream, Size: 6049 bytes --]
diff -uNr a/arch/sparc/include/asm/string_32.h b/arch/sparc/include/asm/string_32.h
--- a/arch/sparc/include/asm/string_32.h
+++ b/arch/sparc/include/asm/string_32.h
@@ -16,7 +16,9 @@
#ifdef __KERNEL__
extern void __memmove(void *,const void *,__kernel_size_t);
-
+extern __kernel_size_t __memcpy(void *,const void *,__kernel_size_t);
+extern __kernel_size_t __memset(void *,int,__kernel_size_t);
+
#ifndef EXPORT_SYMTAB_STROPS
/* First the mem*() things. */
@@ -30,10 +32,82 @@
})
#define __HAVE_ARCH_MEMCPY
-#define memcpy(t, f, n) __builtin_memcpy(t, f, n)
+
+static inline void *__constant_memcpy(void *to, const void *from, __kernel_size_t n)
+{
+ extern void __copy_1page(void *, const void *);
+
+ if(n <= 32) {
+ __builtin_memcpy(to, from, n);
+ } else if (((unsigned int) to & 7) != 0) {
+ /* Destination is not aligned on the double-word boundary */
+ __memcpy(to, from, n);
+ } else {
+ switch(n) {
+ case PAGE_SIZE:
+ __copy_1page(to, from);
+ break;
+ default:
+ __memcpy(to, from, n);
+ break;
+ }
+ }
+ return to;
+}
+
+static inline void *__nonconstant_memcpy(void *to, const void *from, __kernel_size_t n)
+{
+ __memcpy(to, from, n);
+ return to;
+}
+
+#undef memcpy
+#define memcpy(t, f, n) \
+(__builtin_constant_p(n) ? \
+ __constant_memcpy((t),(f),(n)) : \
+ __nonconstant_memcpy((t),(f),(n)))
#define __HAVE_ARCH_MEMSET
-#define memset(s, c, count) __builtin_memset(s, c, count)
+
+static inline void *__constant_c_and_count_memset(void *s, char c, __kernel_size_t count)
+{
+ extern void bzero_1page(void *);
+ extern __kernel_size_t __bzero(void *, __kernel_size_t);
+
+ if(!c) {
+ if(count == PAGE_SIZE)
+ bzero_1page(s);
+ else
+ __bzero(s, count);
+ } else {
+ __memset(s, c, count);
+ }
+ return s;
+}
+
+static inline void *__constant_c_memset(void *s, char c, __kernel_size_t count)
+{
+ extern __kernel_size_t __bzero(void *, __kernel_size_t);
+
+ if(!c)
+ __bzero(s, count);
+ else
+ __memset(s, c, count);
+ return s;
+}
+
+static inline void *__nonconstant_memset(void *s, char c, __kernel_size_t count)
+{
+ __memset(s, c, count);
+ return s;
+}
+
+#undef memset
+#define memset(s, c, count) \
+(__builtin_constant_p(c) ? (__builtin_constant_p(count) ? \
+ __constant_c_and_count_memset((s), (c), (count)) : \
+ __constant_c_memset((s), (c), (count))) \
+ : __nonconstant_memset((s), (c), (count)))
#define __HAVE_ARCH_MEMSCAN
diff -uNr a/arch/sparc/include/asm/string_64.h b/arch/sparc/include/asm/string_64.h
--- a/arch/sparc/include/asm/string_64.h
+++ b/arch/sparc/include/asm/string_64.h
@@ -15,6 +15,8 @@
#include <asm/asi.h>
+extern void *__memset(void *,int,__kernel_size_t);
+
#ifndef EXPORT_SYMTAB_STROPS
/* First the mem*() things. */
@@ -22,10 +24,29 @@
extern void *memmove(void *, const void *, __kernel_size_t);
#define __HAVE_ARCH_MEMCPY
-#define memcpy(t, f, n) __builtin_memcpy(t, f, n)
+extern void *memcpy(void *, const void *, __kernel_size_t);
#define __HAVE_ARCH_MEMSET
-#define memset(s, c, count) __builtin_memset(s, c, count)
+extern void *__builtin_memset(void *,int,__kernel_size_t);
+
+static inline void *__constant_memset(void *s, int c, __kernel_size_t count)
+{
+ extern __kernel_size_t __bzero(void *, __kernel_size_t);
+
+ if (!c) {
+ __bzero(s, count);
+ return s;
+ } else
+ return __memset(s, c, count);
+}
+
+#undef memset
+#define memset(s, c, count) \
+((__builtin_constant_p(count) && (count) <= 32) ? \
+ __builtin_memset((s), (c), (count)) : \
+ (__builtin_constant_p(c) ? \
+ __constant_memset((s), (c), (count)) : \
+ __memset((s), (c), (count))))
#define __HAVE_ARCH_MEMSCAN
diff -uNr a/arch/sparc/lib/bzero.S b/arch/sparc/lib/bzero.S
--- a/arch/sparc/lib/bzero.S
+++ b/arch/sparc/lib/bzero.S
@@ -6,6 +6,10 @@
.text
+ .globl __memset
+ .type __memset, #function
+__memset: /* %o0=buf, %o1=pat, %o2=len */
+
.globl memset
.type memset, #function
memset: /* %o0=buf, %o1=pat, %o2=len */
@@ -79,6 +83,7 @@
retl
mov %o3, %o0
.size __bzero, .-__bzero
+ .size __memset, .-__memset
.size memset, .-memset
#define EX_ST(x,y) \
diff -uNr a/arch/sparc/lib/checksum_32.S b/arch/sparc/lib/checksum_32.S
--- a/arch/sparc/lib/checksum_32.S
+++ b/arch/sparc/lib/checksum_32.S
@@ -560,7 +560,7 @@
mov %i0, %o1
mov %i1, %o0
5:
- call memcpy
+ call __memcpy
mov %i2, %o2
tst %o0
bne,a 2f
diff -uNr a/arch/sparc/lib/ksyms.c b/arch/sparc/lib/ksyms.c
--- a/arch/sparc/lib/ksyms.c
+++ b/arch/sparc/lib/ksyms.c
@@ -30,6 +30,7 @@
EXPORT_SYMBOL(memcmp);
EXPORT_SYMBOL(memcpy);
EXPORT_SYMBOL(memset);
+EXPORT_SYMBOL(__memset);
EXPORT_SYMBOL(memmove);
EXPORT_SYMBOL(__bzero);
@@ -80,6 +81,7 @@
/* Special internal versions of library functions. */
EXPORT_SYMBOL(__copy_1page);
+EXPORT_SYMBOL(__memcpy);
EXPORT_SYMBOL(__memmove);
EXPORT_SYMBOL(bzero_1page);
diff -uNr a/arch/sparc/lib/memcpy.S b/arch/sparc/lib/memcpy.S
--- a/arch/sparc/lib/memcpy.S
+++ b/arch/sparc/lib/memcpy.S
@@ -543,6 +543,9 @@
b 3f
add %o0, 2, %o0
+#ifdef __KERNEL__
+FUNC(__memcpy)
+#endif
FUNC(memcpy) /* %o0=dst %o1=src %o2=len */
sub %o0, %o1, %o4
diff -uNr a/arch/sparc/lib/memset.S b/arch/sparc/lib/memset.S
--- a/arch/sparc/lib/memset.S
+++ b/arch/sparc/lib/memset.S
@@ -60,10 +60,11 @@
.globl __bzero_begin
__bzero_begin:
- .globl __bzero
+ .globl __bzero, __memset,
.globl memset
.globl __memset_start, __memset_end
__memset_start:
+__memset:
memset:
and %o1, 0xff, %g3
sll %g3, 8, %g2
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (19 preceding siblings ...)
2011-03-11 21:26 ` Marcel van Nies
@ 2011-03-11 22:40 ` Sam Ravnborg
2011-03-12 18:03 ` daniel
` (4 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: Sam Ravnborg @ 2011-03-11 22:40 UTC (permalink / raw)
To: sparclinux
Hi Marcel.
On Fri, Mar 11, 2011 at 10:26:36PM +0100, Marcel van Nies wrote:
> Hi,
>
> > Regarding the segfault - the easiest way forward would be to split the
> > patch up in smaller chunks so we know which part causes the segfault to happen.
> >
> > I assume you had to hand-apply the revert. If you could post the exact patch
> > you used for revert I will try to split it up in smaller logical parts.
>
> I attached the patch which I used to revert commit
> 4d14a459857bd151ecbd14bcd37b4628da00792b
>
> I did a split of this patch, and built kernels with only the memcpy or
> only the memset part reverted.
Thanks for all your effort!
I will during the weekend try to think how we can nail this,
but I'm a bit lost here as we are looking into areas I know little about.
Sam
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (20 preceding siblings ...)
2011-03-11 22:40 ` Sam Ravnborg
@ 2011-03-12 18:03 ` daniel
2011-03-13 21:13 ` Sam Ravnborg
` (3 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: daniel @ 2011-03-12 18:03 UTC (permalink / raw)
To: sparclinux
On Fri, 11 Mar 2011 23:40:32 +0100, Sam Ravnborg wrote:
> Hi Marcel.
>
> On Fri, Mar 11, 2011 at 10:26:36PM +0100, Marcel van Nies wrote:
> > Hi,
> >
> > > Regarding the segfault - the easiest way forward would be to split the
> > > patch up in smaller chunks so we know which part causes the segfault to happen.
> > >
> > > I assume you had to hand-apply the revert. If you could post the exact patch
> > > you used for revert I will try to split it up in smaller logical parts.
> >
> > I attached the patch which I used to revert commit
> > 4d14a459857bd151ecbd14bcd37b4628da00792b
> >
> > I did a split of this patch, and built kernels with only the memcpy or
> > only the memset part reverted.
>
> Thanks for all your effort!
> I will during the weekend try to think how we can nail this,
> but I'm a bit lost here as we are looking into areas I know little about.
>
Hi,
I have begun to look at the patches, one thing that strikes me is
why handle_level_irq is used and why the ack functions irq_ack and
irq_mask_ack are not defined. On the LEON architecture IRQs are
normally edge triggered, the exception being PCI interrupts, which are
level triggered. Implementing the ack functions in the current
implementation would result in acking edge triggered IRQs which means
IRQs may be lost (on the LEON at least)? I thought SUN SPARCs also have
edge triggered interrupts and that the CPU acks the IRQ automatically
when the trap is taken?
What is the difference between having handle_level_irq without ACKs
implemented and having handle_edge_irq doing the interrupt flow
handling?
Daniel
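As a rough illustration of the question above, the C sketch below records the order in which a simplified model of each genirq flow handler invokes the chip callbacks and the driver ISR. It is loosely modelled on kernel/irq/chip.c of this era but omits locking, the in-progress/pending handling and the edge handler's replay loop; `struct model_chip` and every function name here are hypothetical, not kernel API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: records the order of chip callbacks and the ISR. */
static char call_log[128];

static void reset_log(void) { call_log[0] = '\0'; }

static void note(const char *what)
{
    if (call_log[0] != '\0')
        strcat(call_log, ",");
    strcat(call_log, what);
}

/* Stand-ins for irq_chip callbacks and a driver's interrupt handler. */
static void chip_mask(void)   { note("mask"); }
static void chip_unmask(void) { note("unmask"); }
static void chip_ack(void)    { note("ack"); }
static void chip_eoi(void)    { note("eoi"); }
static void driver_isr(void)  { note("isr"); }

struct model_chip {
    void (*mask)(void);
    void (*unmask)(void);
    void (*ack)(void);   /* may be NULL, as when irq_ack is left undefined */
    void (*eoi)(void);
};

/* Level flow: mask (plus ack if the chip has one), run the ISR, unmask.
 * The real handler prefers a combined irq_mask_ack callback when present. */
static void model_level_flow(const struct model_chip *chip)
{
    chip->mask();
    if (chip->ack)
        chip->ack();
    driver_isr();
    chip->unmask();
}

/* Edge flow: ack the controller and run the ISR with the line left unmasked.
 * This model simply skips a missing ack; that is not a claim about the kernel. */
static void model_edge_flow(const struct model_chip *chip)
{
    if (chip->ack)
        chip->ack();
    driver_isr();
}

/* Fasteoi flow: no ack up front; run the ISR, then signal end-of-interrupt. */
static void model_fasteoi_flow(const struct model_chip *chip)
{
    driver_isr();
    chip->eoi();
}
```

With `ack` left NULL the level model logs `mask,isr,unmask`, the edge model always acks before the ISR, and the fasteoi model never acks at all.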
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (21 preceding siblings ...)
2011-03-12 18:03 ` daniel
@ 2011-03-13 21:13 ` Sam Ravnborg
2011-03-14 11:17 ` Daniel Hellstrom
` (2 subsequent siblings)
25 siblings, 0 replies; 27+ messages in thread
From: Sam Ravnborg @ 2011-03-13 21:13 UTC (permalink / raw)
To: sparclinux
Hi Daniel - thanks for looking at this patch.
I was actually planning to send it to David tonight.
But after your comments I will wait.
> I have begun to look at the patches, one thing that strikes me is
> why handle_level_irq is used and why the ack functions irq_ack and
> irq_mask_ack are not defined.
The sun4m at least uses level triggered interrupts.
And looking at the implementation, the enable() and disable() functions
in all cases did a simple mask and unmask - also the leon variants.
So based on this observation I decided to go for the handle_level_irq
flow handler. From the implementation in kernel/irq/ I could
also see that handle_level_irq always called irq_mask() / irq_unmask(),
which was a match to the earlier implementation.
On top of this - this just worked for my sun4m box.
> On the LEON architecture IRQs are
> normally edge triggered, the exception being PCI interrupts, which are
> level triggered. Implementing the ack functions in the current
> implementation would result in acking edge triggered IRQs which means
> IRQs may be lost (on the LEON at least)? I thought SUN SPARCs also have
> edge triggered interrupts and that the CPU acks the IRQ automatically
> when the trap is taken?
>
> What is the difference between having handle_level_irq without ACKs
> implemented and having handle_edge_irq doing the interrupt flow
> handling?
Thomas? Can you help here? You are much more into these details than I am.
Sam
* Re: Status update on sparc32 genirq support
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
` (22 preceding siblings ...)
2011-03-13 21:13 ` Sam Ravnborg
@ 2011-03-14 11:17 ` Daniel Hellstrom
2011-03-14 11:25 ` Daniel Hellstrom
2011-03-14 17:03 ` Thomas Gleixner
25 siblings, 0 replies; 27+ messages in thread
From: Daniel Hellstrom @ 2011-03-14 11:17 UTC (permalink / raw)
To: sparclinux
Sam Ravnborg wrote:
>Hi Daniel - thanks for looking at this patch.
>
>I was actually planning to send it to David tonight.
>But after your comments I will wait.
>
>
>
>> I have begun to look at the patches, one thing that strikes me is
>>why handle_level_irq is used and why the ack functions irq_ack and
>>irq_mask_ack are not defined.
>>
>>
>
>The sun4m at least uses level triggered interrupts.
>And looking at the implementation, the enable() and disable() functions
>in all cases did a simple mask and unmask - also the leon variants.
>
>
Ok.
mask and unmask is harmless on the LEON: incoming IRQs will be pending
but not propagated to the CPU until the source is unmasked; they will not disappear.
As I understand it, the mask/unmask in handler_irq is done in order to
avoid generating an extra IRQ for level triggered IRQs. When first
masking, no more IRQs will be queued for the CPU from this IRQ source,
the current pending IRQ is cleared by acking the IRQ controller, the
"real" IRQ source (for example a PCI board) is acked in the ISR, then
handler_irq unmasks the IRQ again and can safely avoid an extra spurious
IRQ.
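The mask, ack, ISR, unmask sequence for level triggered IRQs can be checked with a toy model. The C sketch below assumes a controller that latches a still-asserted level-triggered line into its pending bit whenever the source is unmasked; that is an illustrative assumption, not a description of the LEON or sun4m interrupt controllers, and all names are hypothetical. Acking without masking first lets the line re-latch before the ISR has quieted the device, which leaves a spurious interrupt pending.

```c
#include <stdbool.h>

/* Toy model (assumption, not real hardware): a level-triggered source whose
 * line is re-latched into the controller's pending bit while unmasked. */
struct toy_irq {
    bool line;     /* device is still asserting its interrupt line */
    bool masked;   /* controller mask bit for this source */
    bool pending;  /* controller pending bit seen by the CPU */
};

/* The controller samples the line; a masked source is not latched. */
static void toy_sample(struct toy_irq *s)
{
    if (s->line && !s->masked)
        s->pending = true;
}

/* Handle one interrupt; returns true if a spurious one is left pending. */
static bool toy_handle(struct toy_irq *s, bool mask_first)
{
    if (mask_first)
        s->masked = true;   /* stop new IRQs from this source being queued */
    s->pending = false;     /* ack the interrupt controller */
    toy_sample(s);          /* the device line is still asserted here */
    s->line = false;        /* ISR acks the real source, e.g. a PCI board */
    s->masked = false;      /* unmask again */
    toy_sample(s);
    return s->pending;
}
```

In this model `toy_handle(&s, true)` leaves nothing pending, while `toy_handle(&s, false)` on the same starting state leaves one spurious interrupt behind.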
>So based on this observation I decided to go for the handle_level_irq
>flow handler. From the implementation in kernel/irq/ I could
>also see that handle_level_irq always called irq_mask() / irq_unmask(),
>which was a match to the earlier implementation.
>
>On top of this - this just worked for my sun4m box.
>
>
Ok
>
>
>>On the LEON architecture IRQs are
>>normally edge triggered, the exception being PCI interrupts, which are
>>level triggered. Implementing the ack functions in the current
>>implementation would result in acking edge triggered IRQs which means
>>IRQs may be lost (on the LEON at least)? I thought SUN SPARCs also have
>>edge triggered interrupts and that the CPU acks the IRQ automatically
>>when the trap is taken?
>>
>> What is the difference between having handle_level_irq without ACKs
>>implemented and having handle_edge_irq doing the interrupt flow
>>handling?
>>
>>
>
>Thomas? Can you help here? You are much more into these details than I am.
>
>
Must add one more thing here: I now see that the edge IRQ handler
(handle_edge_irq) does ack, which is also incorrect on the LEON. One
could argue that irq_ack should be left undefined, however it is needed
for PCI level IRQs later. Mixing handle_edge_irq and handle_level_irq
does not seem to be a good idea on LEON since ack is called in both cases,
and edge IRQs must not be acked whereas level IRQs should be on the LEON.
Instead I suggest for the LEON:
1. using handle_fasteoi_irq for "edge" IRQs
2. using handle_level_irq for PCI IRQs in the future; this also requires
that irq_ack and irq_mask_ack are defined (I have a patch for this)
One other thing is that I can't boot anymore on the LEON with the genirq
patch. This is because the virtual IRQs are not mapped 1:1 to real IRQs and the
APBUART serial tty console driver uses the "interrupt" property
directly, thus VIRQs and real IRQs are mixed and that cannot work.
Perhaps there are other drivers on sparc32 machines that use the
interrupt property directly? I will try creating a patch for the APBUART
driver to use VIRQs instead; perhaps that patch can go in before Sam's
genirq patch...
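A toy C sketch of the VIRQ/real-IRQ mismatch described above. It assumes a hypothetical allocator that hands out virtual IRQ numbers sequentially from 1; this is an illustrative simplification, not the actual irq_alloc() in arch/sparc/kernel/irq_32.c, and the table and function names are made up. A driver that passes the raw "interrupt" property value where a VIRQ is expected ends up looking at a different entry than the one its interrupt was mapped to.

```c
/* Hypothetical toy VIRQ table (not the real irq_32.c data structures). */
#define NR_TOY_VIRQS 16

static unsigned int toy_virq_to_real[NR_TOY_VIRQS];
static unsigned int toy_next_virq = 1;   /* 0 means "no IRQ" */

/* Return the VIRQ for real_irq, allocating the next free one if needed. */
static unsigned int toy_irq_alloc(unsigned int real_irq)
{
    for (unsigned int v = 1; v < toy_next_virq; v++)
        if (toy_virq_to_real[v] == real_irq)
            return v;                    /* already mapped */
    if (toy_next_virq >= NR_TOY_VIRQS)
        return 0;                        /* table full */
    toy_virq_to_real[toy_next_virq] = real_irq;
    return toy_next_virq++;
}
```

If real IRQ 2 is allocated first and real IRQ 8 second, they receive VIRQs 1 and 2, so a driver using the raw value 2 as if it were a VIRQ lands on the entry for real IRQ 8.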
I really think this is the right way forward for IRQ handling on
sparc32; it is indeed an improvement for LEON. Thank you all for your
efforts on this.
Thanks,
Daniel
* Re: Status update on sparc32 genirq support
From: Daniel Hellstrom @ 2011-03-14 11:25 UTC (permalink / raw)
To: sparclinux
Sam Ravnborg wrote:
>Hi Daniel - thanks for looking at this patch.
>
>I was actually planning to send it to David tonight.
>But after your comments I will wait.
>
>
Thanks, please hold it a couple of days until I can test this a bit more.
Below is a patch that adds irq_unlink (I haven't tested it yet); I figure
we must have that in order to implement irq_shutdown?
I used the irq_alloc() hunk below to get LEON booting; it makes VIRQs
map 1:1 to real IRQs in most cases. This is needed to get the APBUART
driver working, but as I said in my previous email, I will try fixing
the APBUART driver instead, so you should probably ignore that hunk.
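A rough sketch of what the irq_alloc() hunk changes, written as a hypothetical userspace model: model_slot, model_table, model_reset, model_irq_alloc and MODEL_NR_IRQS are invented names, and only the 64-entry limit comes from the irq_32.c comment. With prefer_identity off, VIRQs are handed out sequentially, so a driver that passes a raw "interrupt" property value (say 5) to request_irq() names a different VIRQ from the one actually allocated (1). With it on, the VIRQ equals the real IRQ whenever that slot is still free, which is why the mapping is 1:1 only "in most cases". The model ignores the pil argument and, as an assumption of the sketch, keeps slot 0 reserved.

```c
#include <assert.h>
#include <string.h>

#define MODEL_NR_IRQS 64	/* NR_IRQS value quoted in the irq_32.c comment */

/* Hypothetical stand-in for irq_table[]; slot 0 is never handed out,
 * mirroring the "for (i = 1; ...)" search in irq_alloc(). */
struct model_slot {
	int used;
	unsigned int real_irq;
};
static struct model_slot model_table[MODEL_NR_IRQS];

static void model_reset(void)
{
	memset(model_table, 0, sizeof(model_table));
}

/* Returns the VIRQ for real_irq, or 0 if the table is full.
 * prefer_identity = 0 models the old sequential search,
 * prefer_identity = 1 models the patched "try virq == real_irq first". */
static unsigned int model_irq_alloc(unsigned int real_irq, int prefer_identity)
{
	unsigned int i;

	/* Reuse an existing mapping (the real code also compares pil). */
	for (i = 1; i < MODEL_NR_IRQS; i++)
		if (model_table[i].used && model_table[i].real_irq == real_irq)
			return i;

	if (prefer_identity && real_irq > 0 && real_irq < MODEL_NR_IRQS &&
	    !model_table[real_irq].used) {
		i = real_irq;		/* identity slot is free: take it */
	} else {
		for (i = 1; i < MODEL_NR_IRQS; i++)	/* first free slot */
			if (!model_table[i].used)
				break;
	}
	if (i >= MODEL_NR_IRQS)
		return 0;

	model_table[i].used = 1;
	model_table[i].real_irq = real_irq;
	return i;
}
```

Fixing the driver to use the VIRQ, as proposed above, removes the dependency on the identity mapping altogether, whereas the allocator change only makes the collision less likely.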
Daniel
---
arch/sparc/kernel/irq.h | 1 +
arch/sparc/kernel/irq_32.c | 28 +++++++++++++++++++++++-----
2 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/arch/sparc/kernel/irq.h b/arch/sparc/kernel/irq.h
index a43fc46..ecff50f 100644
--- a/arch/sparc/kernel/irq.h
+++ b/arch/sparc/kernel/irq.h
@@ -54,6 +54,7 @@ extern struct sparc_irq_config sparc_irq_config;
unsigned int irq_alloc(unsigned int real_irq, unsigned int pil);
void irq_link(unsigned int irq);
+void irq_unlink(unsigned int irq);
void handler_irq(unsigned int pil, struct pt_regs *regs);
/* Dave Redman (djhr@tadpole.co.uk)
diff --git a/arch/sparc/kernel/irq_32.c b/arch/sparc/kernel/irq_32.c
index 9ce6b97..f698f07 100644
--- a/arch/sparc/kernel/irq_32.c
+++ b/arch/sparc/kernel/irq_32.c
@@ -105,12 +105,12 @@ EXPORT_SYMBOL(arch_local_irq_restore);
* Sun4d complicates things even further. IRQ numbers are arbitrary
* 32-bit values in that case. Since this is similar to sparc64,
* we adopt a virtual IRQ numbering scheme as is done there.
- * Virutal interrupt numbers are allocated by build_irq(). So NR_IRQS
+ * Virtual interrupt numbers are allocated by build_irq(). So NR_IRQS
* just becomes a limit of how many interrupt sources we can handle in
* a single system. Even fully loaded SS2000 machines top off at
* about 32 interrupt sources or so, therefore a NR_IRQS value of 64
* is more than enough.
- *
+ *
* We keep a map of per-PIL enable interrupts. These get wired
* up via the irq_chip->startup() method which gets invoked by
* the generic IRQ layer during request_irq().
@@ -135,9 +135,13 @@ unsigned int irq_alloc(unsigned int real_irq, unsigned int pil)
return i;
}
- for (i = 1; i < NR_IRQS; i++) {
- if (!irq_table[i].irq)
- break;
+ if (real_irq < NR_IRQS && irq_table[real_irq].irq == 0) {
+ i = real_irq;
+ } else {
+ for (i = 1; i < NR_IRQS; i++) {
+ if (!irq_table[i].irq)
+ break;
+ }
}
if (i >= NR_IRQS) {
@@ -170,6 +174,20 @@ void irq_link(unsigned int irq)
irq_map[pil] = p;
}
+void irq_unlink(unsigned int irq)
+{
+ struct irq_bucket *p, **pnext;
+
+ BUG_ON(irq >= NR_IRQS);
+
+ p = &irq_table[irq];
+ BUG_ON(p->pil > SUN4D_MAX_IRQ);
+ pnext = &irq_map[p->pil];
+ while (*pnext != p)
+ pnext = &(*pnext)->next;
+ *pnext = p->next;
+}
+
int show_interrupts(struct seq_file *p, void *v)
{
int i = *(loff_t *) v, j;
--
1.6.3.3
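The irq_unlink() hunk removes a bucket from the singly linked per-PIL list by walking a pointer to the previous "next" field, so removing the list head needs no special case. The following userspace model shows the same walk next to a push-front link assumed to mirror irq_link(); model_bucket, model_table, model_map, MODEL_NR_IRQS and MODEL_MAX_PIL are hypothetical stand-ins for irq_table[], irq_map[], NR_IRQS and SUN4D_MAX_IRQ. Unlike the patch, the model asserts if the IRQ was never linked (the patch's loop would walk off the end of the list in that case) and clears the removed bucket's next pointer; both are additions for illustration, and locking is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_IRQS 64
#define MODEL_MAX_PIL 16	/* hypothetical stand-in for SUN4D_MAX_IRQ */

struct model_bucket {
	struct model_bucket *next;
	unsigned int pil;
};

static struct model_bucket model_table[MODEL_NR_IRQS];	/* like irq_table[] */
static struct model_bucket *model_map[MODEL_MAX_PIL];	/* like irq_map[] */

/* Push the bucket onto the front of its PIL list (assumed irq_link() behaviour). */
static void model_link(unsigned int irq)
{
	struct model_bucket *p;

	assert(irq < MODEL_NR_IRQS);
	p = &model_table[irq];
	assert(p->pil < MODEL_MAX_PIL);
	p->next = model_map[p->pil];
	model_map[p->pil] = p;
}

/* Same pointer-to-pointer walk as the proposed irq_unlink(): pnext always
 * points at the field that currently points to the node being examined. */
static void model_unlink(unsigned int irq)
{
	struct model_bucket *p, **pnext;

	assert(irq < MODEL_NR_IRQS);
	p = &model_table[irq];
	assert(p->pil < MODEL_MAX_PIL);
	pnext = &model_map[p->pil];
	while (*pnext != p) {
		assert(*pnext != NULL);	/* added: irq was never linked */
		pnext = &(*pnext)->next;
	}
	*pnext = p->next;
	p->next = NULL;		/* added: detach the removed bucket */
}
```

Linking IRQs 3, 7 and 9 on the same PIL gives the list 9, 7, 3; unlinking 7 splices 9 directly to 3, and unlinking 9 afterwards updates the list head itself through the same code path.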
* Re: Status update on sparc32 genirq support
From: Thomas Gleixner @ 2011-03-14 17:03 UTC (permalink / raw)
To: sparclinux
On Sun, 13 Mar 2011, Sam Ravnborg wrote:
> Hi Daniel - thanks for looking at this patch.
>
> I was actually planning to send it to David tonight.
> But after your comments I will wait.
>
> > I have begun to look at the patches; one thing that strikes me is
> > why handle_level_irq is used and why the ack functions irq_ack and
> > irq_mask_ack are not defined.
>
> The sun4m at least uses level triggered interrupts.
> And looking at the implementation, the enable() and disable() functions
> in all cases did a simple mask and unmask - also the leon variants.
>
> So based on this observation I decided to go for the handle_level_irq
> flow handler. From the implementation in kernel/irq/ I could
> also see that handle_level_irq always called irq_mask() / irq_unmask(),
> which was a match to the earlier implementation.
>
> On top of this - this just worked for my sun4m box.
>
> > On the LEON architecture IRQs are
> > normally edge triggered, the exception being PCI interrupts, which are
> > level triggered. Implementing the ack functions in the current
> > implementation would result in acking edge triggered IRQs, which means
> > IRQs may be lost (on the LEON at least)? I thought SUN SPARCs also have
> > edge triggered interrupts and that the CPU acks the IRQ automatically
> > when the trap is taken?
> >
> > What is the difference between having handle_level_irq without ACKs
> > implemented and having handle_edge_irq doing the interrupt flow
> > handling?
>
> Thomas? Can you help here? You are much more into these details than I am.
That largely depends on the hardware. Some hardware acks automatically,
either on trap entry or as part of the mask operation.
Also, edge triggered interrupts can be handled entirely safely by
handle_level_irq if the hardware latches the edge even while the
interrupt line is masked.
Without looking at the actual datasheets I can't tell.
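The latching argument can be illustrated with a small hypothetical model; model_line, model_edge, model_handler and model_level_dispatch are invented, and no real controller or datasheet is modelled. The line is acked on trap entry, masked around the handler and unmasked afterwards, roughly what handle_level_irq does when irq_ack is absent. A second edge arrives while the handler runs: if the controller latches it while masked, unmasking re-triggers and the event is handled; if it does not, the edge is lost.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interrupt line; not modelled on any real controller. */
struct model_line {
	bool latches_when_masked;	/* are edges remembered while masked? */
	bool masked;
	bool pending;
	int handled;
	int edges_during_handler;	/* extra edges injected while handling */
};

/* An edge on the wire: latched unless masked on a non-latching controller. */
static void model_edge(struct model_line *l)
{
	if (!l->masked || l->latches_when_masked)
		l->pending = true;
	/* otherwise the edge is simply lost */
}

/* Device handler; may see further edges arrive while the line is masked. */
static void model_handler(struct model_line *l)
{
	l->handled++;
	while (l->edges_during_handler > 0) {
		l->edges_during_handler--;
		model_edge(l);
	}
}

/* Level-style flow: ack on trap entry, mask, handle, unmask. Unmasking
 * with pending still set re-triggers the trap, modelled by the loop. */
static void model_level_dispatch(struct model_line *l)
{
	while (l->pending && !l->masked) {
		l->pending = false;	/* acked on trap entry */
		l->masked = true;	/* irq_mask */
		model_handler(l);
		l->masked = false;	/* irq_unmask */
	}
}
```

Several edges arriving during one masked window collapse into a single pending event in this model, which is the usual behaviour for a single latch bit.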
Thanks,
tglx
Thread overview: 27+ messages
2011-03-08 7:01 Status update on sparc32 genirq support David Miller
2011-03-08 7:08 ` Sam Ravnborg
2011-03-08 7:19 ` David Miller
2011-03-08 7:37 ` Marcel van Nies
2011-03-08 7:45 ` Marcel van Nies
2011-03-08 11:17 ` Marcel van Nies
2011-03-08 20:22 ` Marcel van Nies
2011-03-08 21:09 ` Sam Ravnborg
2011-03-08 21:13 ` Marcel van Nies
2011-03-08 21:19 ` David Miller
2011-03-08 21:20 ` Marcel van Nies
2011-03-08 21:27 ` Marcel van Nies
2011-03-08 21:30 ` Marcel van Nies
2011-03-08 21:30 ` David Miller
2011-03-08 21:51 ` Marcel van Nies
2011-03-08 22:00 ` David Miller
2011-03-09 5:25 ` Bob Breuer
2011-03-09 6:16 ` Bob Breuer
2011-03-09 6:37 ` Bob Breuer
2011-03-09 20:17 ` David Miller
2011-03-11 21:26 ` Marcel van Nies
2011-03-11 22:40 ` Sam Ravnborg
2011-03-12 18:03 ` daniel
2011-03-13 21:13 ` Sam Ravnborg
2011-03-14 11:17 ` Daniel Hellstrom
2011-03-14 11:25 ` Daniel Hellstrom
2011-03-14 17:03 ` Thomas Gleixner