* Re: Status update on sparc32 genirq support
@ 2011-03-08  7:01 David Miller
  2011-03-08  7:08 ` Sam Ravnborg
                   ` (25 more replies)
  0 siblings, 26 replies; 27+ messages in thread
From: David Miller @ 2011-03-08  7:01 UTC (permalink / raw)
  To: sparclinux

From: Sam Ravnborg <sam@ravnborg.org>
Date: Tue, 8 Mar 2011 07:00:39 +0100

> Added davem...
> We see strange SEGV faults in userspace and fail to read from ext2..
> All on some (but not all) sparc32 boxes.

I saw the original report.

But reverting this commit is the wrong thing to do from what I can
tell.

Either we have:

1) A compiler code gen bug.

2) Some piece of code which is sparc32 specific is invoking memset
   or memcpy in a way which makes assumptions which are in fact not
   valid

3) The code change is merely making cache offsets change, masking the
   true problem

Especially in cases #2 and #3 we're just hiding a heisen-bug and
not fixing the real problem.
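
To make case 2 concrete, here is a minimal Python sketch of one kind of invalid assumption: expecting a plain low-to-high byte copy (behaviour a memcpy implementation is allowed to have) to give correct results on overlapping buffers. This is purely illustrative. The thread does not identify which assumption, if any, is being violated, and `forward_copy` and `overlap_safe_copy` are hypothetical helpers, not kernel code.

```python
def forward_copy(buf: bytearray, dst: int, src: int, n: int) -> None:
    """Copy n bytes low-to-high, as a simple memcpy might."""
    for i in range(n):
        buf[dst + i] = buf[src + i]


def overlap_safe_copy(buf: bytearray, dst: int, src: int, n: int) -> None:
    """Copy via a snapshot of the source, giving memmove-like semantics."""
    buf[dst:dst + n] = bytes(buf[src:src + n])


naive = bytearray(b"abcdef")
forward_copy(naive, dst=2, src=0, n=4)      # source and destination overlap

safe = bytearray(b"abcdef")
overlap_safe_copy(safe, dst=2, src=0, n=4)  # same request, overlap handled

print(naive)  # bytearray(b'ababab')
print(safe)   # bytearray(b'ababcd')
```

A caller that silently depends on one copy direction can work for years and then break when the copy routine is rewritten, which is why a revert could hide such a bug rather than fix it.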


* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
@ 2011-03-08  7:08 ` Sam Ravnborg
  2011-03-08  7:19 ` David Miller
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Sam Ravnborg @ 2011-03-08  7:08 UTC (permalink / raw)
  To: sparclinux

On Mon, Mar 07, 2011 at 11:01:20PM -0800, David Miller wrote:
> From: Sam Ravnborg <sam@ravnborg.org>
> Date: Tue, 8 Mar 2011 07:00:39 +0100
> 
> > Added davem...
> > We see strange SEGV faults in userspace and fail to read from ext2..
> > All on some (but not all) sparc32 boxes.
> 
> I saw the original report.
> 
> But reverting this commit is the wrong thing to do from what I can
> tell.
> 
> Either we have:
> 
> 1) A compiler code gen bug.
> 
> 2) Some piece of code which is sparc32 specific is invoking memset
>    or memcpy in a way which makes assumptions which are in fact not
>    valid
> 
> 3) The code change is merely making cache offsets change, masking the
>    true problem
> 
> Especially in cases #2 and #3 we're just hiding a heisen-bug and
> not fixing the real problem.
Agree on this.
But first step is to get confirmation that reverting this commit
indeed fixes the bug. Then we can go hunting for 2), 3) or 1).
I hope we will find that 2) is the culprit.

	Sam


* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
  2011-03-08  7:08 ` Sam Ravnborg
@ 2011-03-08  7:19 ` David Miller
  2011-03-08  7:37 ` Marcel van Nies
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2011-03-08  7:19 UTC (permalink / raw)
  To: sparclinux

From: Sam Ravnborg <sam@ravnborg.org>
Date: Tue, 8 Mar 2011 08:08:42 +0100

> But first step is to get confirmation that reverting this commit
> indeed fixes the bug. Then we can go hunting for 2), 3) or 1).
> I hope we will find that 2) is the culprit.

Agreed, information never hurts :-)


* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
  2011-03-08  7:08 ` Sam Ravnborg
  2011-03-08  7:19 ` David Miller
@ 2011-03-08  7:37 ` Marcel van Nies
  2011-03-08  7:45 ` Marcel van Nies
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-08  7:37 UTC (permalink / raw)
  To: sparclinux

Hi,

It appears that two consecutive commits are causing problems on
hyperSPARC; I noticed that too late.

Commit 4d14a459857bd151ecbd14bcd37b4628da00792b (the one I reported
earlier) only causes the system to hang, not panic:
[   11.266665] sd 0:0:1:0: [sda] Attached SCSI disk
[   11.279998] sd 0:0:3:0: [sdb] Attached SCSI disk
[   11.299998] kjournald starting.  Commit interval 5 seconds
[   11.303332] EXT3-fs: mounted filesystem with writeback data mode.
[   11.306665] VFS: Mounted root (ext3 filesystem) readonly on device 8:20.
[   11.309998] Freeing unused kernel memory: 100k freed
<system hangs here - stop-A does go back to prom>

and
commit c658ad1b4e1520511da8323aa5e60d444cc303ed
Author: David S. Miller <davem@davemloft.net>
Date:   Fri Dec 11 00:44:47 2009 -0800

    sparc64: Add syscall tracepoint support.

    Signed-off-by: David S. Miller <davem@davemloft.net>

actually makes the kernel panic:
[   11.336665] Freeing unused kernel memory: 100k freed
[   11.419998] Kernel panic - not syncing: Attempted to kill init!
[   11.423332] [f002f5b8 : do_group_exit+0x84/0xb4 ]
 [f0039490 : get_signal_to_deliver+0x338/0x35c ]
 [f00124cc : do_signal+0x30/0x8f0 ]
 [f0012da0 : do_notify_resume+0x14/0x38 ]
 [f000fca4 : signal_p+0x14/0x24 ]
 [f000edfc : srmmu_fault+0x58/0x68 ]
[   11.466665] Press Stop-A (L1-A) to return to the boot prom


Marcel




* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (2 preceding siblings ...)
  2011-03-08  7:37 ` Marcel van Nies
@ 2011-03-08  7:45 ` Marcel van Nies
  2011-03-08 11:17 ` Marcel van Nies
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-08  7:45 UTC (permalink / raw)
  To: sparclinux

Hi,

> But first step is to get confirmation that reverting this commit
> indeed fixes the bug

I'll try that.
M



* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (3 preceding siblings ...)
  2011-03-08  7:45 ` Marcel van Nies
@ 2011-03-08 11:17 ` Marcel van Nies
  2011-03-08 20:22 ` Marcel van Nies
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-08 11:17 UTC (permalink / raw)
  To: sparclinux

Hi,

2.6.33.7 with commit 4d14a459857bd151ecbd14bcd37b4628da00792b reverted
does not segfault.
I also tried sparc-next-2.6, but I messed up my tree somehow. I will
try again later.

M



* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (4 preceding siblings ...)
  2011-03-08 11:17 ` Marcel van Nies
@ 2011-03-08 20:22 ` Marcel van Nies
  2011-03-08 21:09 ` Sam Ravnborg
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-08 20:22 UTC (permalink / raw)
  To: sparclinux

Hi,

The good news:
sparc-next-2.6 with commit 4d14a459857bd151ecbd14bcd37b4628da00792b reverted
does NOT segfault. I did not apply the genirq patch yet.

The bad news:
Segfault gone, say hello to EXT2 read failure   :o(

I'll rebuild this kernel with the esp_debug.patch Sam sent a couple of days ago.


[    0.233333] esp: esp0, regs[fd00a000:fd009000] irq[36]
[    0.236666] esp: esp0 is a FAS100A, 40 MHz (ccf=0), SCSI ID 7
[    3.243333] scsi0 : esp
[    3.483332] scsi 0:0:1:0: Direct-Access     FUJITSU  MAP3735N SUN72G  0401 PQ: 0 ANSI: 4
[    3.486666] scsi target0:0:1: Beginning Domain Validation
[    3.493332] scsi target0:0:1: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
[    3.499999] scsi target0:0:1: Domain Validation skipping write tests
[    3.503332] scsi target0:0:1: Ending Domain Validation
[    3.743332] scsi 0:0:3:0: Direct-Access     FUJITSU  MAP3735N SUN72G  0401 PQ: 0 ANSI: 4
[    3.746666] scsi target0:0:3: Beginning Domain Validation
[    3.753332] scsi target0:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
[    3.756666] scsi target0:0:3: Domain Validation skipping write tests
[    3.759999] scsi target0:0:3: Ending Domain Validation
[    4.469999] esp: esp1, regs[fd00c000:fd00b000] irq[53]
[    4.473332] esp: esp1 is a FASHME, 40 MHz (ccf=0), SCSI ID 7
[    7.479999] scsi1 : esp
...
[   11.029998] sd 0:0:1:0: [sda] 143374738 512-byte logical blocks: (73.4 GB/68.3 GiB)
[   11.033332] sd 0:0:3:0: [sdb] 143374738 512-byte logical blocks: (73.4 GB/68.3 GiB)
[   11.036665] sd 0:0:1:0: [sda] Write Protect is off
[   11.043332] sd 0:0:1:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   11.046665] sd 0:0:3:0: [sdb] Write Protect is off
[   11.053332] sd 0:0:3:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   11.066665]  sda: sda1 sda2 sda3
[   11.073332]  sdb: sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7
[   11.089998] sd 0:0:1:0: [sda] Attached SCSI disk
[   11.093332] sd 0:0:3:0: [sdb] Attached SCSI disk
[   11.106665] EXT3-fs: barriers not enabled
[   11.113332] kjournald starting.  Commit interval 5 seconds
[   11.116665] EXT3-fs (sdb4): mounted filesystem with ordered data mode
[   11.119998] VFS: Mounted root (ext3 filesystem) readonly on device 8:20.
[   11.123332] Freeing unused kernel memory: 108k freed
INIT: version 2.86 booting
[   12.673332] NET: Registered protocol family 1

Gentoo Linux; http://www.gentoo.org/
 Copyright 1999-2007 Gentoo Foundation; Distributed under the GPLv2

 * Mounting proc at /proc ...                                             [ ok ]
 * Mounting sysfs at /sys ...                                             [ ok ]
 * Mounting /dev for udev ...                                             [ ok ]
...
blahblah
...
 * Checking root filesystem ...fsck.ext3: No such file or directory while trying to open /dev/sdb4
/dev/sdb4:
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

 * Filesystem couldn't be fixed :(
         [ !! ]
Give root password for maintenance
(or type Control-D to continue):
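
Two numbers in the log above can be sanity-checked with a little arithmetic. The sketch below decodes "device 8:20" into a disk name and recomputes the reported disk size. It assumes the conventional SCSI disk numbering (major 8, 16 minors per disk); `sd_name` is a hypothetical helper written for this note.

```python
def sd_name(major: int, minor: int) -> str:
    """Map a SCSI disk device number to a name like 'sdb4'.

    Assumes the first sd major (8) with 16 minors per disk, which only
    covers sda..sdp.
    """
    if major != 8:
        raise ValueError("only the first SCSI disk major (8) is handled")
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")


blocks, block_size = 143374738, 512
size_bytes = blocks * block_size

print(sd_name(8, 20))                # sdb4
print(size_bytes)                    # 73407865856
print(round(size_bytes / 10**9, 1))  # 73.4 (decimal GB)
print(size_bytes / 2**30)            # a little under 68.4 (binary GiB)
```

So the root device 8:20 is sdb4, the same partition that EXT3-fs reports mounting and that fsck later fails to open, and the size figures in the log are self-consistent (the 68.3 GiB value matches if the kernel truncates rather than rounds).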


Marcel




* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (5 preceding siblings ...)
  2011-03-08 20:22 ` Marcel van Nies
@ 2011-03-08 21:09 ` Sam Ravnborg
  2011-03-08 21:13 ` Marcel van Nies
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Sam Ravnborg @ 2011-03-08 21:09 UTC (permalink / raw)
  To: sparclinux

On Tue, Mar 08, 2011 at 09:22:07PM +0100, Marcel van Nies wrote:
> Hi,
> 
> The good news:
> sparc-next-2.6 with commit 4d14a459857bd151ecbd14bcd37b4628da00792b reverted
> does NOT segfault. I did not apply the genirq patch yet.

Narrowing it down to a single patch is good.

> 
> The bad news:
> Segfault gone, say hello to EXT2 read failure   :o(

So we are dealing with two faults. Not a surprise considering how
little we have tested on sparc32 lately.

> I'll rebuild this kernel with the esp_debug.patch Sam sent a couple of days ago.

I have tried said patch myself.
You may try to play around with the value as it produces a lot of output.

Regarding the segfault - the easiest way forward would be to split the
patch up in smaller chunks so we know which part causes the segfault to happen.

I assume you had to hand-apply the revert. If you could post the exact patch
you used for revert I will try to split it up in smaller logical parts.
But likely not until the weekend.
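
As a rough illustration of a first cut at splitting, the sketch below breaks a git-style unified diff into one chunk per file. This is an assumption about how one might start, not the procedure used in this thread: "smaller logical parts" may well need hunks separated by hand. `split_patch_by_file` and the sample file names are hypothetical.

```python
def split_patch_by_file(patch_text):
    """Split a git-style unified diff into one chunk per 'diff --git' header.

    Text before the first header (for example a commit message) is dropped.
    """
    chunks, current = [], None
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("diff --git "):
            if current is not None:
                chunks.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        chunks.append("".join(current))
    return chunks


# Made-up two-file patch, only to exercise the splitter.
sample = (
    "Revert example\n"
    "\n"
    "diff --git a/one.c b/one.c\n"
    "--- a/one.c\n"
    "+++ b/one.c\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
    "diff --git a/two.c b/two.c\n"
    "--- a/two.c\n"
    "+++ b/two.c\n"
    "@@ -1 +1 @@\n"
    "-foo\n"
    "+bar\n"
)

pieces = split_patch_by_file(sample)
print(len(pieces))  # 2
```

Each piece can then be applied on its own and the kernel rebuilt and booted, so the part that makes the segfault appear or disappear can be narrowed down step by step.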

I have a sparcstation 5 that I managed to boot - unfortunately it
did not show erratic behaviour as you describe.
And for now I do not have any Ext2 filesystem on disk to play with.

	Sam


* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (6 preceding siblings ...)
  2011-03-08 21:09 ` Sam Ravnborg
@ 2011-03-08 21:13 ` Marcel van Nies
  2011-03-08 21:19 ` David Miller
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-08 21:13 UTC (permalink / raw)
  To: sparclinux

Hi,

As expected, esp_debug gives a lot of output.
Is there anything in particular to look out for ?

Btw:
At this point:
> Give root password for maintenance
> (or type Control-D to continue):

I can logon, and reads from disk seem to go fine.

# mount -n -o remount,rw /
Then also writes to disk seem to go fine.


So, is this an ESP or EXT2 bug at all ?

Marcel



* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (7 preceding siblings ...)
  2011-03-08 21:13 ` Marcel van Nies
@ 2011-03-08 21:19 ` David Miller
  2011-03-08 21:20 ` Marcel van Nies
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2011-03-08 21:19 UTC (permalink / raw)
  To: sparclinux

From: Marcel van Nies <morcles@gmail.com>
Date: Tue, 8 Mar 2011 22:13:05 +0100

> So, is this an ESP or EXT2 bug at all ?

The error message is that the program "fsck.ext3" cannot be found.
Does that binary exist in the correct location so that fsck can
find it?
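
One other possible reading of the fsck output: "No such file or directory" is the standard ENOENT error string, and in the quoted output it is followed by "while trying to open /dev/sdb4", so it may refer to the device node rather than to the fsck.ext3 binary. The sketch below only shows how a message of that shape arises when an open fails on a missing path. It does not establish what happened on the affected machine, and `describe_open_failure` is a hypothetical helper that merely mimics the message format.

```python
import errno
import os
import tempfile


def describe_open_failure(tool: str, device: str) -> str:
    """Try to open `device`; return an e2fsck-style message if that fails."""
    try:
        fd = os.open(device, os.O_RDONLY)
    except OSError as exc:
        return f"{tool}: {os.strerror(exc.errno)} while trying to open {device}"
    os.close(fd)
    return f"{tool}: opened {device}"


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a device node that does not exist.
    missing = os.path.join(tmp, "sdb4")
    message = describe_open_failure("fsck.ext3", missing)

print(message)
```

If this reading applies, checking whether /dev/sdb4 exists at that point in the boot would be as useful as checking for the binary.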


* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (8 preceding siblings ...)
  2011-03-08 21:19 ` David Miller
@ 2011-03-08 21:20 ` Marcel van Nies
  2011-03-08 21:27 ` Marcel van Nies
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-08 21:20 UTC (permalink / raw)
  To: sparclinux

Hi Sam,

> Regarding the segfault - the easiest way forward would be to split the
> patch up in smaller chunks so we know which part causes the segfault to happen.

Yes, I'll see if I can work that out.

> I assume you had to hand-apply the revert.
That's what I did.

> If you could post the exact patch you used for revert
> I will try to split it up in smaller logical parts.
I'll post that patch later.

> I have a sparcstation 5 that I managed to boot -
> unfortunately it did not show erratic behaviour as you describe.

My sparcSTATION 5 boots sparc-next-2.6 with your genirq patch just fine too.
It's up for almost 2 days now, doing things.  It has 4 EXT3
filesystems. No problemo.

Marcel


On Tue, Mar 8, 2011 at 10:09 PM, Sam Ravnborg <sam@ravnborg.org> wrote:
> On Tue, Mar 08, 2011 at 09:22:07PM +0100, Marcel van Nies wrote:
>> Hi,
>>
>> The good news:
>> sparc-next-2.6 with commit 4d14a459857bd151ecbd14bcd37b4628da00792b reverted
>> does NOT segfault. I did not apply the genirq patch yet.
>
> Narrowing it down to a sinlge patch is good.
>
>>
>> The bad news:
>> Segfault gone, say hello to EXT2 read failure   :o(
>
> So we are dealing with two faults. Not a surprise considering how
> little we have tested on sparc32 lately.
>
>> I'll rebuild this kernel with the esp_debug.patch Sam sent a couple of days ago.
>
> I have tried said patch myself.
> You may try to play around with the value as it produces a lot of output.
>
> Regarding the segfault - the easiest way forward would be to split the
> patch up in smaller chunks so we know which part causes the segfault to happen.
>
> I assume you had to hand-apply the revert. If you could post the exact patch
> you used for revert I will try to split it up in smaller logical parts.
> But likely not until the weekend.
>
> I have a sparcstation 5 that I managed to boot - unfortunately it
> did not show erratic behaviour as you describe.
> And for now I do not have any Ext2 filesystem on disk to play with.
>
>        Sam
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (9 preceding siblings ...)
  2011-03-08 21:20 ` Marcel van Nies
@ 2011-03-08 21:27 ` Marcel van Nies
  2011-03-08 21:30 ` Marcel van Nies
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-08 21:27 UTC (permalink / raw)
  To: sparclinux

Hi,

/sbin/fsck.ext3 is there, it's used to check / in runlevel 1, i.e. every boot.

I can happily boot anything 2.6.32.27 or earlier.

Marcel

On Tue, Mar 8, 2011 at 10:19 PM, David Miller <davem@davemloft.net> wrote:
> From: Marcel van Nies <morcles@gmail.com>
> Date: Tue, 8 Mar 2011 22:13:05 +0100
>
>> So, is this an ESP or EXT2 bug at all ?
>
> The error message is that the program "fsck.ext3" cannot be found.
> Does that binary exist in the correct location so that fsck can
> find it?
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (10 preceding siblings ...)
  2011-03-08 21:27 ` Marcel van Nies
@ 2011-03-08 21:30 ` Marcel van Nies
  2011-03-08 21:30 ` David Miller
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-08 21:30 UTC (permalink / raw)
  To: sparclinux

Correction:
/sbin/fsck.ext3 is a link to e2fsck.
/sbin/e2fsck is there.

Is the link the problem ?
Let's see.

M

On Tue, Mar 8, 2011 at 10:27 PM, Marcel van Nies <morcles@gmail.com> wrote:
> Hi,
>
> /sbin/fsck.ext3 is there, it's used to check / in runlevel 1, i.e. every boot.
>
> I can happily boot anything 2.6.32.27 or earlier.
>
> Marcel
>
> On Tue, Mar 8, 2011 at 10:19 PM, David Miller <davem@davemloft.net> wrote:
>> From: Marcel van Nies <morcles@gmail.com>
>> Date: Tue, 8 Mar 2011 22:13:05 +0100
>>
>>> So, is this an ESP or EXT2 bug at all ?
>>
>> The error message is that the program "fsck.ext3" cannot be found.
>> Does that binary exist in the correct location so that fsck can
>> find it?
>>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (11 preceding siblings ...)
  2011-03-08 21:30 ` Marcel van Nies
@ 2011-03-08 21:30 ` David Miller
  2011-03-08 21:51 ` Marcel van Nies
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2011-03-08 21:30 UTC (permalink / raw)
  To: sparclinux

From: Marcel van Nies <morcles@gmail.com>
Date: Tue, 8 Mar 2011 22:27:39 +0100

> /sbin/fsck.ext3 is there, it's used to check / in runlevel 1, i.e. every boot.
> 
> I can happily boot anything 2.6.32.27 or earlier.

One large possibility is that there is a missing cache flush somewhere,
and reverting the memcpy/memset change masks it.

Or, like I said yesterday, bad gcc code generation wrt. those routines.

Looking at ESP logs is not going to give much information as that driver
has been stressed heavily on sparc64 for years without any blips.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (12 preceding siblings ...)
  2011-03-08 21:30 ` David Miller
@ 2011-03-08 21:51 ` Marcel van Nies
  2011-03-08 22:00 ` David Miller
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-08 21:51 UTC (permalink / raw)
  To: sparclinux

Hi,

> One large possibility is that there is a missing cache flush somewhere,
> and reverting the memcpy/memset change masks it.

2.6.33.7 with reverted commit is fine too, still leaving
2.6.34 - 2.6.38 for introducing weird behavior.



I've got 3 kinds of SPARC32 here, and the big difference is CPU.
All other hardware in the boxes is "the same".
The software (kernel et al.) is pretty much the same too.
Everything is ok, except for hyperSPARC.

So what makes the difference? This one :
[    0.000000] Boot time fixup v1.6. 4/Mar/98 Jakub Jelinek (jj@ultra.linux.cz).
 Patching kernel for srmmu[ROSS HyperSparc]/iommu

Combine that with srmmu_fault as in
init[1]: segfault at 0 ip 5000dac8 (rpc f000eea8) sp efe738a0 error
30001 in ld-2.3.5.so[50000000+1a000]
Kernel panic - not syncing: Attempted to kill init!
 [f002ed74 : do_group_exit+0x84/0xb4 ]
 [f0039a24 : get_signal_to_deliver+0x338/0x35c ]
 [f0011fbc : do_signal+0x30/0x914 ]
 [f00128b4 : do_notify_resume+0x14/0x38 ]
 [f000fd50 : signal_p+0x14/0x24 ]
 [f000eea8 : srmmu_fault+0x58/0x68 ]

I start thinking there is something wrong with Jakub's srmmu patch for
hyperSPARC...

Marcel


On Tue, Mar 8, 2011 at 10:30 PM, David Miller <davem@davemloft.net> wrote:
> From: Marcel van Nies <morcles@gmail.com>
> Date: Tue, 8 Mar 2011 22:27:39 +0100
>
>> /sbin/fsck.ext3 is there, it's used to check / in runlevel 1, i.e. every boot.
>>
>> I can happily boot anything 2.6.32.27 or earlier.
>
> One large possibility is that there is a missing cache flush somewhere,
> and reverting the memcpy/memset change masks it.
>
> Or, like I said yesterday, bad gcc code generation wrt. those routines.
>
> Looking at ESP logs is not going to give much information as that driver
> has been stressed heavily on sparc64 for years without any blips.
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (13 preceding siblings ...)
  2011-03-08 21:51 ` Marcel van Nies
@ 2011-03-08 22:00 ` David Miller
  2011-03-09  5:25 ` Bob Breuer
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2011-03-08 22:00 UTC (permalink / raw)
  To: sparclinux

From: Marcel van Nies <morcles@gmail.com>
Date: Tue, 8 Mar 2011 22:51:00 +0100

> I start thinking there is something wrong with Jakub's srmmu patch for
> hyperSPARC...

That's just the code that links up the cpu specific cache and tlb
flushing routines, it's been doing that for more than 10 years.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (14 preceding siblings ...)
  2011-03-08 22:00 ` David Miller
@ 2011-03-09  5:25 ` Bob Breuer
  2011-03-09  6:16 ` Bob Breuer
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Bob Breuer @ 2011-03-09  5:25 UTC (permalink / raw)
  To: sparclinux

Marcel van Nies wrote:
> Hi,
> 
> The good news:
> sparc-next-2.6 with commit 4d14a459857bd151ecbd14bcd37b4628da00792b reverted
> does NOT segfault. I did not apply the genirq patch yet.
> 
> The bad news:
> Segfault gone, say hello to EXT2 read failure   :o(
> 
> I'll rebuild this kernel with the esp_debug.patch Sam sent a couple of days ago.
> 
> 
> [    0.233333] esp: esp0, regs[fd00a000:fd009000] irq[36]
> [    0.236666] esp: esp0 is a FAS100A, 40 MHz (ccf=0), SCSI ID 7
> [    3.243333] scsi0 : esp
> [    3.483332] scsi 0:0:1:0: Direct-Access     FUJITSU  MAP3735N
> SUN72G  0401 PQ: 0 ANSI: 4
> [    3.486666] scsi target0:0:1: Beginning Domain Validation
> [    3.493332] scsi target0:0:1: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
> [    3.499999] scsi target0:0:1: Domain Validation skipping write tests
> [    3.503332] scsi target0:0:1: Ending Domain Validation
> [    3.743332] scsi 0:0:3:0: Direct-Access     FUJITSU  MAP3735N
> SUN72G  0401 PQ: 0 ANSI: 4
> [    3.746666] scsi target0:0:3: Beginning Domain Validation
> [    3.753332] scsi target0:0:3: FAST-10 SCSI 10.0 MB/s ST (100 ns, offset 15)
> [    3.756666] scsi target0:0:3: Domain Validation skipping write tests
> [    3.759999] scsi target0:0:3: Ending Domain Validation
> [    4.469999] esp: esp1, regs[fd00c000:fd00b000] irq[53]
> [    4.473332] esp: esp1 is a FASHME, 40 MHz (ccf=0), SCSI ID 7
> [    7.479999] scsi1 : esp
> ...
> [   11.029998] sd 0:0:1:0: [sda] 143374738 512-byte logical blocks:
> (73.4 GB/68.3 GiB)
> [   11.033332] sd 0:0:3:0: [sdb] 143374738 512-byte logical blocks:
> (73.4 GB/68.3 GiB)
> [   11.036665] sd 0:0:1:0: [sda] Write Protect is off
> [   11.043332] sd 0:0:1:0: [sda] Write cache: disabled, read cache:
> enabled, doesn't support DPO or FUA
> [   11.046665] sd 0:0:3:0: [sdb] Write Protect is off
> [   11.053332] sd 0:0:3:0: [sdb] Write cache: disabled, read cache:
> enabled, doesn't support DPO or FUA
> [   11.066665]  sda: sda1 sda2 sda3
> [   11.073332]  sdb: sdb1 sdb2 sdb3 sdb4 sdb5 sdb6 sdb7
> [   11.089998] sd 0:0:1:0: [sda] Attached SCSI disk
> [   11.093332] sd 0:0:3:0: [sdb] Attached SCSI disk
> [   11.106665] EXT3-fs: barriers not enabled
> [   11.113332] kjournald starting.  Commit interval 5 seconds
> [   11.116665] EXT3-fs (sdb4): mounted filesystem with ordered data mode
> [   11.119998] VFS: Mounted root (ext3 filesystem) readonly on device 8:20.
> [   11.123332] Freeing unused kernel memory: 108k freed
> INIT: version 2.86 booting
> [   12.673332] NET: Registered protocol family 1
> 
> Gentoo Linux; http://www.gentoo.org/
>  Copyright 1999-2007 Gentoo Foundation; Distributed under the GPLv2
> 
>  * Mounting proc at /proc ...                                             [ ok ]
>  * Mounting sysfs at /sys ...                                             [ ok ]
>  * Mounting /dev for udev ...                                             [ ok ]
> ...
> blahblah
> ...
>  * Checking root filesystem ...fsck.ext3: No such file or directory
> while trying to open /dev/sdb4


Check if the device node for /dev/sdb4 exists.

For me, udev will sometimes fail to create a device node with the latest
kernel.  This is with a SuperSparc cpu, so it's not hyperSparc related.
 Of course, it could just be my udev acting up, as I get error messages
from udev such as:

Starting udev: udevd[88]: udev_event_run: fork of child failed: Invalid
argument
udevd[88]: udev_event_run: fork of child failed: Invalid argument
udevd[88]: udev_event_run: fork of child failed: Invalid argument
udevd[88]: udev_event_run: fork of child failed: Invalid argument

I'm not sure whether the invalid argument is a udev problem, userspace
incompatibility, kernel bug, or a kernel feature I left out.

Bob


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (15 preceding siblings ...)
  2011-03-09  5:25 ` Bob Breuer
@ 2011-03-09  6:16 ` Bob Breuer
  2011-03-09  6:37 ` Bob Breuer
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Bob Breuer @ 2011-03-09  6:16 UTC (permalink / raw)
  To: sparclinux

Marcel van Nies wrote:
> Hi,
> 
> git bisect came up with this:
> 
> 4d14a459857bd151ecbd14bcd37b4628da00792b is the first bad commit
> commit 4d14a459857bd151ecbd14bcd37b4628da00792b
> Author: David S. Miller <davem@davemloft.net>
> Date:   Thu Dec 10 23:32:10 2009 -0800
> 
>     sparc: Stop trying to be so fancy and use __builtin_{memcpy,memset}()
> 
>     This mirrors commit ff60fab71bb3b4fdbf8caf57ff3739ffd0887396
>     (x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy)
> 
>     Signed-off-by: David S. Miller <davem@davemloft.net>


My guess is that we're no longer using the special hyperSparc block copy
and fill from mm/hypersparc.S and are now leaving some data in the cache
that wasn't there before.

Unfortunately, my hyperSparc is failing from a completely different commit:
commit b6a2fea39318e43fee84fa7b0b90d68bed92d2ba
Author: Ollie Wild <aaw@google.com>
Date:   Thu Jul 19 01:48:16 2007 -0700

    mm: variable length argument support


The result I have is that the argv array for new commands looks
completely empty leading to this strange output when booting:

Freeing unused kernel memory: 144k freed
modprobe: FATAL: Module  not found.

INIT: version 2.86 booting
: : No such file or directory
INIT: Entering runlevel: 3
: : No such file or directory


There must be a missing cache flush somewhere in there that's needed for
the argv array...

Bob

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (16 preceding siblings ...)
  2011-03-09  6:16 ` Bob Breuer
@ 2011-03-09  6:37 ` Bob Breuer
  2011-03-09 20:17 ` David Miller
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Bob Breuer @ 2011-03-09  6:37 UTC (permalink / raw)
  To: sparclinux

Marcel van Nies wrote:
> Hi,
> 
>> One large possibility is that there is a missing cache flush somewhere,
>> and reverting the memcpy/memset change masks it.
> 
> 2.6.33.7 with reverted commit is fine too, still leaving
> 2.6.34 - 2.6.38 for introducing weird behavior.
> 
> 
> 
> I've got 3 kinds of SPARC32 here, and the big difference is CPU.
> All other hardware in the boxes is "the same".
> The software (kernel et al.) is pretty much the same too.
> Everything is ok, except for hyperSPARC.
> 
> So what makes the difference? This one :
> [    0.000000] Boot time fixup v1.6. 4/Mar/98 Jakub Jelinek (jj@ultra.linux.cz).
>  Patching kernel for srmmu[ROSS HyperSparc]/iommu
> 
> Combine that with srmmu_fault as in
> init[1]: segfault at 0 ip 5000dac8 (rpc f000eea8) sp efe738a0 error
> 30001 in ld-2.3.5.so[50000000+1a000]
> Kernel panic - not syncing: Attempted to kill init!
>  [f002ed74 : do_group_exit+0x84/0xb4 ]
>  [f0039a24 : get_signal_to_deliver+0x338/0x35c ]
>  [f0011fbc : do_signal+0x30/0x914 ]
>  [f00128b4 : do_notify_resume+0x14/0x38 ]
>  [f000fd50 : signal_p+0x14/0x24 ]
>  [f000eea8 : srmmu_fault+0x58/0x68 ]
> 
> I start thinking there is something wrong with Jakub's srmmu patch for
> hyperSPARC...

No, hyperSparc just accesses its data cache in a peculiar way.  It is
1-way virtually indexed, so when you map the same physical page to 2
different virtual addresses, you can actually end up with 2 different
cache lines being used simultaneously for the same chunk of physical
memory.  This is known as cache-aliasing.  SuperSparc accesses its
cache using the physical address and doesn't suffer from the same problem.
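
A toy model of that aliasing, assuming a direct-mapped cache that picks
its line from virtual address bits.  The line size and cache size below
are illustrative only (not the real hyperSparc geometry), and the toy_*
names are made up for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative geometry only; not taken from any hyperSparc manual. */
#define TOY_LINE_SHIFT  5u      /* 32-byte cache lines */
#define TOY_NUM_LINES   4096u   /* 4096 lines * 32 bytes = 128 KiB, 1-way */
#define TOY_PAGE_MASK   0xfffu  /* 4 KiB pages */

/* A virtually indexed cache selects the line from virtual address bits. */
static uint32_t toy_virt_index(uint32_t vaddr)
{
    return (vaddr >> TOY_LINE_SHIFT) % TOY_NUM_LINES;
}

/*
 * va1 and va2 are two virtual addresses mapped to the same physical byte,
 * so they must share the same offset within the page.  Returns true when
 * they land in different cache lines, i.e. when the two mappings alias.
 */
static bool toy_mappings_alias(uint32_t va1, uint32_t va2)
{
    assert((va1 & TOY_PAGE_MASK) == (va2 & TOY_PAGE_MASK));
    return toy_virt_index(va1) != toy_virt_index(va2);
}
```

With these numbers, two mappings one page apart select different lines,
while mappings exactly 128 KiB apart share a line.  A physically indexed
cache would pick the same line for both mappings regardless of the
virtual addresses.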

Bob


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (17 preceding siblings ...)
  2011-03-09  6:37 ` Bob Breuer
@ 2011-03-09 20:17 ` David Miller
  2011-03-11 21:26 ` Marcel van Nies
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: David Miller @ 2011-03-09 20:17 UTC (permalink / raw)
  To: sparclinux

From: Bob Breuer <breuerr@mc.net>
Date: Wed, 09 Mar 2011 00:16:50 -0600

> My guess is that we're no longer using the special hyperSparc block copy
> and fill from mm/hypersparc.S and are now leaving some data in the cache
> that wasn't there before.

That is a possibility.

But let's be clear that this only applies to:

1) memcpy calls with constant "count" of PAGE_SIZE
2) memset calls with constant "c" of zero and "count" of PAGE_SIZE

Since those are the only cases where memset/memcpy gets translated
into calls to those optimized Hypersparc routines.
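
A minimal userspace sketch of that kind of compile-time dispatch,
assuming GCC or Clang for __builtin_constant_p().  The sketch_* names
and the counters are hypothetical stand-ins, and the size and alignment
checks of the real header are left out:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

static unsigned int page_copy_calls;     /* times the page routine ran */
static unsigned int generic_copy_calls;  /* times the generic routine ran */

/* Stand-in for a CPU-specific whole-page copy routine. */
static void *sketch_copy_page(void *to, const void *from)
{
    assert(to != NULL && from != NULL);
    page_copy_calls++;
    return memcpy(to, from, SKETCH_PAGE_SIZE);
}

/* Stand-in for the generic out-of-line copy. */
static void *sketch_copy_generic(void *to, const void *from, size_t n)
{
    assert(to != NULL && from != NULL);
    generic_copy_calls++;
    return memcpy(to, from, n);
}

/*
 * Only a count that the compiler can prove constant AND that equals the
 * page size reaches the page routine; everything else goes generic.
 */
#define sketch_memcpy(t, f, n)                              \
    ((__builtin_constant_p(n) && (n) == SKETCH_PAGE_SIZE)   \
         ? sketch_copy_page((t), (f))                       \
         : sketch_copy_generic((t), (f), (n)))
```

A call with a literal SKETCH_PAGE_SIZE takes the page path; the same
length passed through a run-time variable does not, which is why only a
small set of call sites is affected.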

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (18 preceding siblings ...)
  2011-03-09 20:17 ` David Miller
@ 2011-03-11 21:26 ` Marcel van Nies
  2011-03-11 22:40 ` Sam Ravnborg
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Marcel van Nies @ 2011-03-11 21:26 UTC (permalink / raw)
  To: sparclinux

[-- Attachment #1: Type: text/plain, Size: 2842 bytes --]

Hi,

> Regarding the segfault - the easiest way forward would be to split the
> patch up in smaller chunks so we know which part causes the segfault to happen.
>
> I assume you had to hand-apply the revert. If you could post the exact patch
> you used for revert I will try to split it up in smaller logical parts.

I attached the patch which I used to revert commit
4d14a459857bd151ecbd14bcd37b4628da00792b

I did a split of this patch, and built kernels with only the memcpy or
only the memset part reverted.
They both segfault.

memset reverted
===============
[   11.129998] VFS: Mounted root (ext3 filesystem) readonly on device 8:20.
[   11.133332] Freeing unused kernel memory: 108k freed
[   11.196665] init[1]: segfault at 0 ip 5000dac8 (rpc f000eea4)
 sp eff4f8a0 error 30001 in ld-2.3.5.so[50000000+1a000]
[   12.023332] Kernel panic - not syncing: Attempted to kill init!
[   12.026665] [f002f494 : do_group_exit+0x84/0xb4 ]
 [f003a130 : get_signal_to_deliver+0x338/0x35c ]
 [f0012654 : do_signal+0x30/0x914 ]
 [f0012f4c : do_notify_resume+0x14/0x38 ]
 [f000fd4c : signal_p+0x14/0x24 ]
 [f000eea4 : srmmu_fault+0x58/0x68 ]
[   12.069998] Press Stop-A (L1-A) to return to the boot prom

memcpy reverted
===============
[   11.113332] VFS: Mounted root (ext3 filesystem) readonly on device 8:20.
[   11.119998] Freeing unused kernel memory: 108k freed
INIT: version 2.86 booting
[   12.453332] bash[23]: segfault at ffffffe8 ip 5015e81c (rpc 5015e80c)
 sp ef868ac8 error 30002 in libc-2.3.5.so[5009a000+11e000]
[   12.533332] rc[26]: segfault at 50189568 ip 50013ce0 (rpc 5000e034)
 sp efbb1558 error 30001 in ld-2.3.5.so[50000000+1a000]
[   13.156665] modprobe[43]: segfault at 24 ip 500aebdc (rpc 500aeb1c)
 sp ef8fd800 error 30001 in libc-2.3.5.so[5004e000+11e000]
modprobe: FATAL: Error inserting unix
(/lib/modules/2.6.38-rc2-up-rev-memcpy/kernel/net/unix/unix.ko):
Invalid module format
[   13.316665] modprobe[39]: segfault at 50 ip 500b7e28 (rpc 500b7e04)
 sp ef885e18 error 30001 in libc-2.3.5.so[5004e000+11e000]
modprobe: FATAL: Error inserting unix
(/lib/modules/2.6.38-rc2-up-rev-memcpy/kernel/net/unix/unix.ko):
Invalid module format
modprobe: *** glibc detected *** realloc(): invalid next size: 0x000262d0 ***
modprobe: FATAL: Error inserting unix
(/lib/modules/2.6.38-rc2-up-rev-memcpy/kernel/net/unix/unix.ko):
Invalid module format
modprobe: FATAL: Error inserting unix
(/lib/modules/2.6.38-rc2-up-rev-memcpy/kernel/net/unix/unix.ko):
Invalid module format
modprobe: FATAL: Error inserting unix
(/lib/modules/2.6.38-rc2-up-rev-memcpy/kernel/net/unix/unix.ko):
Invalid module format
modprobe: *** glibc detected *** realloc(): invalid pointer: 0x000262d0 ***
INIT: Entering runlevel: 3
[   14.169998] rc[44]: segfault at 3 ip 50013c2c (rpc 5000e034)
 sp eff09558 error 30001 in ld-2.3.5.so[50000000+1a000]

Marcel

[-- Attachment #2: revert.patch --]
[-- Type: application/octet-stream, Size: 6049 bytes --]

diff -uNr a/arch/sparc/include/asm/string_32.h b/arch/sparc/include/asm/string_32.h
--- a/arch/sparc/include/asm/string_32.h
+++ b/arch/sparc/include/asm/string_32.h
@@ -16,7 +16,9 @@
 #ifdef __KERNEL__
 
 extern void __memmove(void *,const void *,__kernel_size_t);
-
+extern __kernel_size_t __memcpy(void *,const void *,__kernel_size_t);
+extern __kernel_size_t __memset(void *,int,__kernel_size_t);
+ 
 #ifndef EXPORT_SYMTAB_STROPS
 
 /* First the mem*() things. */
@@ -30,10 +32,82 @@
 })
 
 #define __HAVE_ARCH_MEMCPY
-#define memcpy(t, f, n) __builtin_memcpy(t, f, n)
+
+static inline void *__constant_memcpy(void *to, const void *from, __kernel_size_t n)
+{
+       extern void __copy_1page(void *, const void *);
+
+       if(n <= 32) {
+               __builtin_memcpy(to, from, n);
+       } else if (((unsigned int) to & 7) != 0) {
+               /* Destination is not aligned on the double-word boundary */
+               __memcpy(to, from, n);
+       } else {
+               switch(n) {
+               case PAGE_SIZE:
+                       __copy_1page(to, from);
+                       break;
+               default:
+                       __memcpy(to, from, n);
+                       break;
+               }
+       }
+       return to;
+}
+
+static inline void *__nonconstant_memcpy(void *to, const void *from, __kernel_size_t n)
+{
+       __memcpy(to, from, n);
+       return to;
+}
+
+#undef memcpy
+#define memcpy(t, f, n) \
+(__builtin_constant_p(n) ? \
+ __constant_memcpy((t),(f),(n)) : \
+ __nonconstant_memcpy((t),(f),(n)))
 
 #define __HAVE_ARCH_MEMSET
-#define memset(s, c, count) __builtin_memset(s, c, count)
+
+static inline void *__constant_c_and_count_memset(void *s, char c, __kernel_size_t count)
+{
+       extern void bzero_1page(void *);
+       extern __kernel_size_t __bzero(void *, __kernel_size_t);
+
+       if(!c) {
+               if(count == PAGE_SIZE)
+                       bzero_1page(s);
+               else
+                       __bzero(s, count);
+       } else {
+               __memset(s, c, count);
+       }
+       return s;
+}
+
+static inline void *__constant_c_memset(void *s, char c, __kernel_size_t count)
+{
+       extern __kernel_size_t __bzero(void *, __kernel_size_t);
+
+       if(!c)
+               __bzero(s, count);
+       else
+               __memset(s, c, count);
+       return s;
+}
+
+static inline void *__nonconstant_memset(void *s, char c, __kernel_size_t count)
+{
+       __memset(s, c, count);
+       return s;
+}
+
+#undef memset
+#define memset(s, c, count) \
+(__builtin_constant_p(c) ? (__builtin_constant_p(count) ? \
+                            __constant_c_and_count_memset((s), (c), (count)) : \
+                            __constant_c_memset((s), (c), (count))) \
+                          : __nonconstant_memset((s), (c), (count)))
 
 #define __HAVE_ARCH_MEMSCAN
 
diff -uNr a/arch/sparc/include/asm/string_64.h b/arch/sparc/include/asm/string_64.h
--- a/arch/sparc/include/asm/string_64.h
+++ b/arch/sparc/include/asm/string_64.h
@@ -15,6 +15,8 @@
 
 #include <asm/asi.h>
 
+extern void *__memset(void *,int,__kernel_size_t);
+
 #ifndef EXPORT_SYMTAB_STROPS
 
 /* First the mem*() things. */
@@ -22,10 +24,29 @@
 extern void *memmove(void *, const void *, __kernel_size_t);
 
 #define __HAVE_ARCH_MEMCPY
-#define memcpy(t, f, n) __builtin_memcpy(t, f, n)
+extern void *memcpy(void *, const void *, __kernel_size_t);
 
 #define __HAVE_ARCH_MEMSET
-#define memset(s, c, count) __builtin_memset(s, c, count)
+extern void *__builtin_memset(void *,int,__kernel_size_t);
+
+static inline void *__constant_memset(void *s, int c, __kernel_size_t count)
+{
+       extern __kernel_size_t __bzero(void *, __kernel_size_t);
+
+       if (!c) {
+               __bzero(s, count);
+               return s;
+       } else
+               return __memset(s, c, count);
+}
+
+#undef memset
+#define memset(s, c, count) \
+((__builtin_constant_p(count) && (count) <= 32) ? \
+ __builtin_memset((s), (c), (count)) : \
+ (__builtin_constant_p(c) ? \
+  __constant_memset((s), (c), (count)) : \
+  __memset((s), (c), (count))))
 
 #define __HAVE_ARCH_MEMSCAN
 
diff -uNr a/arch/sparc/lib/bzero.S b/arch/sparc/lib/bzero.S
--- a/arch/sparc/lib/bzero.S
+++ b/arch/sparc/lib/bzero.S
@@ -6,6 +6,10 @@
 
 	.text
 
+       .globl  __memset
+       .type   __memset, #function
+__memset:              /* %o0=buf, %o1=pat, %o2=len */
+
 	.globl	memset
 	.type	memset, #function
 memset:			/* %o0=buf, %o1=pat, %o2=len */
@@ -79,6 +83,7 @@
 	retl
 	 mov		%o3, %o0
 	.size		__bzero, .-__bzero
+	.size		__memset, .-__memset
 	.size		memset, .-memset
 
 #define EX_ST(x,y)		\
diff -uNr a/arch/sparc/lib/checksum_32.S b/arch/sparc/lib/checksum_32.S
--- a/arch/sparc/lib/checksum_32.S
+++ b/arch/sparc/lib/checksum_32.S
@@ -560,7 +560,7 @@
 	 mov	%i0, %o1
 	mov	%i1, %o0
 5:
-	call	memcpy
+	call	__memcpy
 	 mov	%i2, %o2
 	tst	%o0
 	bne,a	2f
diff -uNr a/arch/sparc/lib/ksyms.c b/arch/sparc/lib/ksyms.c
--- a/arch/sparc/lib/ksyms.c
+++ b/arch/sparc/lib/ksyms.c
@@ -30,6 +30,7 @@
 EXPORT_SYMBOL(memcmp);
 EXPORT_SYMBOL(memcpy);
 EXPORT_SYMBOL(memset);
+EXPORT_SYMBOL(__memset);
 EXPORT_SYMBOL(memmove);
 EXPORT_SYMBOL(__bzero);
 
@@ -80,6 +81,7 @@
 
 /* Special internal versions of library functions. */
 EXPORT_SYMBOL(__copy_1page);
+EXPORT_SYMBOL(__memcpy);
 EXPORT_SYMBOL(__memmove);
 EXPORT_SYMBOL(bzero_1page);
 
diff -uNr a/arch/sparc/lib/memcpy.S b/arch/sparc/lib/memcpy.S
--- a/arch/sparc/lib/memcpy.S
+++ b/arch/sparc/lib/memcpy.S
@@ -543,6 +543,9 @@
 	b		3f
 	 add		%o0, 2, %o0
 
+#ifdef __KERNEL__
+FUNC(__memcpy)
+#endif
 FUNC(memcpy)	/* %o0=dst %o1=src %o2=len */
 
 	sub		%o0, %o1, %o4
diff -uNr a/arch/sparc/lib/memset.S b/arch/sparc/lib/memset.S
--- a/arch/sparc/lib/memset.S
+++ b/arch/sparc/lib/memset.S
@@ -60,10 +60,11 @@
         .globl  __bzero_begin
 __bzero_begin:
 
-	.globl	__bzero
+	.globl	__bzero,	__memset,
 	.globl	memset
 	.globl	__memset_start, __memset_end
 __memset_start:
+__memset:
 memset:
 	and	%o1, 0xff, %g3
 	sll	%g3, 8, %g2

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (19 preceding siblings ...)
  2011-03-11 21:26 ` Marcel van Nies
@ 2011-03-11 22:40 ` Sam Ravnborg
  2011-03-12 18:03 ` daniel
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Sam Ravnborg @ 2011-03-11 22:40 UTC (permalink / raw)
  To: sparclinux

Hi Marcel.

On Fri, Mar 11, 2011 at 10:26:36PM +0100, Marcel van Nies wrote:
> Hi,
> 
> > Regarding the segfault - the easiest way forward would be to split the
> > patch up in smaller chunks so we know which part causes the segfault to happen.
> >
> > I assume you had to hand-apply the revert. If you could post the exact patch
> > you used for revert I will try to split it up in smaller logical parts.
> 
> I attached the patch which I used to revert commit
> 4d14a459857bd151ecbd14bcd37b4628da00792b
> 
> I did a split of this patch, and built kernels with only the memcpy or
> only the memset part reverted.

Thanks for all your effort!
During the weekend I will try to think about how we can nail this,
but I'm a bit lost here as we are looking into areas I know little about.

	Sam

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (20 preceding siblings ...)
  2011-03-11 22:40 ` Sam Ravnborg
@ 2011-03-12 18:03 ` daniel
  2011-03-13 21:13 ` Sam Ravnborg
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: daniel @ 2011-03-12 18:03 UTC (permalink / raw)
  To: sparclinux

 
On Fri, 11 Mar 2011 23:40:32 +0100, Sam Ravnborg wrote:
> Hi Marcel.
>
> On Fri, Mar 11, 2011 at 10:26:36PM +0100, Marcel van Nies wrote:
> > Hi,
> >
> > > Regarding the segfault - the easiest way forward would be to split the
> > > patch up in smaller chunks so we know which part causes the segfault to happen.
> > >
> > > I assume you had to hand-apply the revert. If you could post the exact patch
> > > you used for revert I will try to split it up in smaller logical parts.
> >
> > I attached the patch which I used to revert commit
> > 4d14a459857bd151ecbd14bcd37b4628da00792b
> >
> > I did a split of this patch, and built kernels with only the memcpy or
> > only the memset part reverted.
>
> Thanks for all your effort!
> During the weekend I will try to think about how we can nail this,
> but I'm a bit lost here as we are looking into areas I know little about.
>
   
  Hi,
   
  I have begun to look at the patches. One thing that strikes me is
why handle_level_irq is used and why the ack functions irq_ack and
irq_mask_ack are not defined. On the LEON architecture IRQs are
normally edge triggered, the exception being PCI interrupts, which are
level triggered. Implementing the ack functions in the current
implementation would result in acking edge triggered IRQs which means
IRQs may be lost (on the LEON at least)? I thought SUN SPARCs also have
edge triggered interrupts and that the CPU acks the IRQ automatically
when the trap is taken?
   
  What is the difference between having handle_level_irq without ACKs
implemented and having handle_edge_irq doing the interrupt flow
handling?
   
  Daniel


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (21 preceding siblings ...)
  2011-03-12 18:03 ` daniel
@ 2011-03-13 21:13 ` Sam Ravnborg
  2011-03-14 11:17 ` Daniel Hellstrom
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: Sam Ravnborg @ 2011-03-13 21:13 UTC (permalink / raw)
  To: sparclinux

Hi Daniel - thanks for looking at this patch.

I was actually planning to send it to David tonight.
But after your comments I will wait.

>  I have begun to look at the patches, one thing that strikes me is
> why handle_level_irq is used and why the ack functions irq_ack and
> irq_mask_ack are not defined.

The sun4m at least uses level triggered interrupts.
And looking at the implementation the enable() and disable() functions
in all cases did a simple mask and unmask - also the leon variants.

So based on this observation I decided to go for the handle_level_irq
flow handler. From the implementation in kernel/irq/ I could
also see that handle_level_irq always called irq_mask() / irq_unmask()
which was a match towards the earlier implementation.

On top of this - this just worked for my sun4m box.

> On the LEON architecture IRQs are
> normally edge triggered, the exception being PCI interrupts, which are
> level triggered. Implementing the ack functions in the current
> implementation would result in acking edge triggered IRQs which means
> IRQs may be lost (on the LEON at least)? I thought SUN SPARCs also have
> edge triggered interrupts and that the CPU acks the IRQ automatically
> when the trap is taken?
>   
>  What is the difference between having handle_level_irq without ACKs
> implemented and having handle_edge_irq doing the interrupt flow
> handling?

Thomas? Can you help here? You are much more into these details than I am.

	Sam

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (22 preceding siblings ...)
  2011-03-13 21:13 ` Sam Ravnborg
@ 2011-03-14 11:17 ` Daniel Hellstrom
  2011-03-14 11:25 ` Daniel Hellstrom
  2011-03-14 17:03 ` Thomas Gleixner
  25 siblings, 0 replies; 27+ messages in thread
From: Daniel Hellstrom @ 2011-03-14 11:17 UTC (permalink / raw)
  To: sparclinux

Sam Ravnborg wrote:

>Hi Daniel - thanks for looking at this patch.
>
>I was actually planning to send it to David tonight.
>But after your comments I will wait.
>
>  
>
>> I have begun to look at the patches, one thing that strikes me is
>>why handle_level_irq is used and why the ack functions irq_ack and
>>irq_mask_ack are not defined.
>>    
>>
>
>The sun4m at least uses level triggered interrupts.
>And looking at the implementation the enable() and disable() functions
>in all cases did a simple mask and unmask - also the leon variants.
>  
>
Ok.

mask and unmask are harmless on the LEON: incoming IRQs will be pending
but not propagated to the CPU until unmasked, so they will not disappear.

As I understand it, the mask/unmask in handler_irq is done in order to
avoid generating an extra IRQ for level triggered IRQs. After first
masking, no more IRQs will be queued for the CPU from this IRQ source;
the current pending IRQ is cleared by acking the IRQ controller; the
"real" IRQ source (for example a PCI board) is acked in the ISR; then
handler_irq unmasks the IRQ again and can safely avoid an extra spurious
IRQ.
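
The sequence above can be sketched as a small userspace model. This is
not kernel code: struct level_src and run_level_flow() are hypothetical
names, and the model assumes that the CPU can take interrupts again
before the device has been quiesced and that the controller simply
follows the device line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one level-triggered source; not kernel code. */
struct level_src {
	bool line;    /* device holds its interrupt line asserted */
	bool masked;  /* mask bit in the interrupt controller */
};

/* The CPU sees the interrupt only while the line is high and unmasked. */
static bool cpu_sees_irq(const struct level_src *s)
{
	return s->line && !s->masked;
}

/*
 * Service one interrupt and return the number of extra (spurious)
 * deliveries. use_mask selects whether the flow masks around the ISR.
 */
static int run_level_flow(struct level_src *s, bool use_mask)
{
	int extra = 0;

	assert(s);
	if (use_mask)
		s->masked = true;   /* stop further delivery from this source */

	/* Assumed: CPU interrupts are enabled again before the ISR is done. */
	if (cpu_sees_irq(s))
		extra++;

	s->line = false;            /* ISR acks the "real" source (the device) */

	if (use_mask)
		s->masked = false;  /* safe now: the line is low again */

	if (cpu_sees_irq(s))
		extra++;

	return extra;
}
```

With use_mask set the model reports no extra delivery; without it the
still-asserted line is seen once more before the ISR has run.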

>So based on this observation I decided to go for the handle_level_irq
>flow handler. From the implementation in kernel/irq/ I could
>also see that handle_level_irq always called irq_mask() / irq_unmask()
>which was a match for the earlier implementation.
>
>On top of this - this just worked for my sun4m box.
>  
>
Ok

>  
>
>>On the LEON architecture IRQs are
>>normally edge triggered, the exception being PCI interrupts, which are
>>level triggered. Implementing the ack functions in the current
>>implementation would result in acking edge triggered IRQs which means
>>IRQs may be lost (on the LEON at least)? I thought SUN SPARCs also have
>>edge triggered interrupts and that the CPU acks the IRQ automatically
>>when the trap is taken?
>>  
>> What is the difference between having handle_level_irq without ACKs
>>implemented and having handle_edge_irq doing the interrupt flow
>>handling?
>>    
>>
>
>Thomas? Can you help here? You are much more into these details than I am.
>  
>

Must add one more thing here: I now see that the edge IRQ handler
(handle_edge_irq) does ack, which is also incorrect on the LEON. One
could argue that irq_ack should be left undefined; however, it is needed
for PCI level IRQs later. Mixing handle_edge_irq and handle_level_irq
does not seem to be a good idea on LEON since ack is called both times,
and edge IRQs must not be acked whereas level IRQs should be on the LEON.
Instead I suggest for the LEON:
1. using handle_fasteoi_irq for "edge" IRQs
2. using handle_level_irq for PCI IRQs in the future, this also requires
that irq_ack and irq_mask_ack are defined (I have a patch for this)
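
To make the difference between the flow handlers concrete, here is a
simplified userspace model of which chip callbacks each generic flow
invokes. The names struct chip_calls, enum flow and run_flow() are
hypothetical, and the call order is a reduced sketch of the handlers in
kernel/irq/chip.c (it leaves out the disabled, oneshot and re-entry
cases).

```c
#include <assert.h>

/* Hypothetical counters standing in for irq_chip callbacks. */
struct chip_calls {
	int mask, ack, unmask, eoi, isr;
};

enum flow { FLOW_LEVEL, FLOW_EDGE, FLOW_FASTEOI };

/* Simplified call order of each generic flow handler. */
static void run_flow(enum flow f, struct chip_calls *c)
{
	assert(c);
	switch (f) {
	case FLOW_LEVEL:        /* models handle_level_irq */
		c->mask++;
		c->ack++;       /* only happens if the chip defines an ack */
		c->isr++;
		c->unmask++;
		break;
	case FLOW_EDGE:         /* models handle_edge_irq */
		c->ack++;       /* ack comes before the ISR runs */
		c->isr++;
		break;
	case FLOW_FASTEOI:      /* models handle_fasteoi_irq */
		c->isr++;
		c->eoi++;       /* end-of-interrupt only, no ack */
		break;
	}
}
```

In this model only the fasteoi flow reaches the ISR without touching the
ack callback, which is why it looks like a fit for LEON "edge" IRQs once
irq_ack has to exist for the PCI case.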


One other thing is that I can't boot anymore on the LEON with the genirq
patch. This is because the virtual IRQs are not 1:1 with real IRQs and the
APBUART serial tty console driver uses the "interrupt" property
directly, thus VIRQs and real IRQs are mixed and that cannot work.
Perhaps there are other drivers on sparc32 machines that use the
interrupt property directly? I will try creating a patch for the APBUART
driver to use VIRQs instead, perhaps that patch can go in before Sam's
genirq patch...

I really think this is the right way forward for IRQ handling on 
sparc32, this is an improvement for LEON indeed. Thank you all for your 
efforts in this.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (23 preceding siblings ...)
  2011-03-14 11:17 ` Daniel Hellstrom
@ 2011-03-14 11:25 ` Daniel Hellstrom
  2011-03-14 17:03 ` Thomas Gleixner
  25 siblings, 0 replies; 27+ messages in thread
From: Daniel Hellstrom @ 2011-03-14 11:25 UTC (permalink / raw)
  To: sparclinux

Sam Ravnborg wrote:

>Hi Daniel - thanks for looking at this patch.
>
>I was actually planning to send it to David tonight.
>But after your comments I will wait.
>  
>
Thanks, please hold it a couple of days until I can test this a bit more.

Below is a patch that adds irq_unlink (haven't tested it yet), I figure
we must have that in order to implement irq_shutdown?
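
As a standalone illustration of the unlink technique, here is a
userspace sketch of removing a node from a singly linked list by walking
the link fields. struct bucket and bucket_unlink() are hypothetical
stand-ins for the irq_bucket code in the patch below; unlike the patch,
the sketch also stops at the end of the list, so unlinking a bucket that
was never linked returns 0 instead of dereferencing NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the irq_bucket used in the patch. */
struct bucket {
	struct bucket *next;
	unsigned int irq;
};

/*
 * Remove p from the list at *head. Returns 1 if p was found and
 * unlinked, 0 if it was not on the list.
 */
static int bucket_unlink(struct bucket **head, struct bucket *p)
{
	struct bucket **pnext = head;

	assert(head && p);
	/* Walk the link fields themselves, stopping at the end of the list. */
	while (*pnext && *pnext != p)
		pnext = &(*pnext)->next;
	if (!*pnext)
		return 0;
	*pnext = p->next;   /* bypass p, whether it was the head or not */
	p->next = NULL;
	return 1;
}
```

Walking a pointer to the link field means the head and interior cases
need no special handling.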

I used the patch on irq_alloc() below to get LEON booting, this will
make VIRQs map 1:1 to real IRQs in most cases. This is needed in order
to get the APBUART driver working; however, as said in my previous email,
I will try fixing the APBUART driver instead, so you should probably
ignore that hunk.
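
The "1:1 in most cases" behaviour can be sketched in userspace as
follows. model_table, MODEL_NR_IRQS and model_irq_alloc() are
hypothetical names; the sketch assumes a table indexed by virtual IRQ in
which 0 marks a free slot, and that an existing mapping is reused before
a new slot is picked.

```c
#include <assert.h>

#define MODEL_NR_IRQS 64   /* assumed table size for the sketch */

/* Hypothetical table: index is the virtual IRQ, value the real IRQ. */
static unsigned int model_table[MODEL_NR_IRQS];

/* Return a virtual IRQ for real_irq, or 0 when the table is full. */
static unsigned int model_irq_alloc(unsigned int real_irq)
{
	unsigned int i;

	assert(real_irq != 0);   /* 0 marks a free slot in this model */

	/* Reuse an existing mapping for this real IRQ. */
	for (i = 1; i < MODEL_NR_IRQS; i++)
		if (model_table[i] == real_irq)
			return i;

	/* Prefer the slot whose index equals the real IRQ number. */
	if (real_irq < MODEL_NR_IRQS && model_table[real_irq] == 0) {
		i = real_irq;
	} else {
		/* Otherwise fall back to the first free slot. */
		for (i = 1; i < MODEL_NR_IRQS; i++)
			if (model_table[i] == 0)
				break;
	}
	if (i >= MODEL_NR_IRQS)
		return 0;

	model_table[i] = real_irq;
	return i;
}
```

The mapping stops being 1:1 as soon as a real IRQ number is out of range
or its preferred slot is already taken, so a driver that uses the raw
"interrupt" property as an IRQ number only works by luck.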

Daniel


---
 arch/sparc/kernel/irq.h    |    1 +
 arch/sparc/kernel/irq_32.c |   28 +++++++++++++++++++++++-----
 2 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/arch/sparc/kernel/irq.h b/arch/sparc/kernel/irq.h
index a43fc46..ecff50f 100644
--- a/arch/sparc/kernel/irq.h
+++ b/arch/sparc/kernel/irq.h
@@ -54,6 +54,7 @@ extern struct sparc_irq_config sparc_irq_config;
 
 unsigned int irq_alloc(unsigned int real_irq, unsigned int pil);
 void irq_link(unsigned int irq);
+void irq_unlink(unsigned int irq);
 void handler_irq(unsigned int pil, struct pt_regs *regs);
 
 /* Dave Redman (djhr@tadpole.co.uk)
diff --git a/arch/sparc/kernel/irq_32.c b/arch/sparc/kernel/irq_32.c
index 9ce6b97..f698f07 100644
--- a/arch/sparc/kernel/irq_32.c
+++ b/arch/sparc/kernel/irq_32.c
@@ -105,12 +105,12 @@ EXPORT_SYMBOL(arch_local_irq_restore);
  * Sun4d complicates things even further.  IRQ numbers are arbitrary
  * 32-bit values in that case.  Since this is similar to sparc64,
  * we adopt a virtual IRQ numbering scheme as is done there.
- * Virutal interrupt numbers are allocated by build_irq().  So NR_IRQS
+ * Virtual interrupt numbers are allocated by build_irq().  So NR_IRQS
  * just becomes a limit of how many interrupt sources we can handle in
  * a single system.  Even fully loaded SS2000 machines top off at
  * about 32 interrupt sources or so, therefore a NR_IRQS value of 64
  * is more than enough.
-  *
+ *
  * We keep a map of per-PIL enable interrupts.  These get wired
  * up via the irq_chip->startup() method which gets invoked by
  * the generic IRQ layer during request_irq().
@@ -135,9 +135,13 @@ unsigned int irq_alloc(unsigned int real_irq, unsigned int pil)
             return i;
     }
 
-    for (i = 1; i < NR_IRQS; i++) {
-        if (!irq_table[i].irq)
-            break;
+    if (real_irq < NR_IRQS && irq_table[real_irq].irq == 0) {
+        i = real_irq;
+    } else {
+        for (i = 1; i < NR_IRQS; i++) {
+            if (!irq_table[i].irq)
+                break;
+        }
     }
 
     if (i >= NR_IRQS) {
@@ -170,6 +174,20 @@ void irq_link(unsigned int irq)
     irq_map[pil] = p;
 }
 
+void irq_unlink(unsigned int irq)
+{
+    struct irq_bucket *p, **pnext;
+
+    BUG_ON(irq >= NR_IRQS);
+
+    p = &irq_table[irq];
+    BUG_ON(p->pil > SUN4D_MAX_IRQ);
+    pnext = &irq_map[p->pil];
+    while (*pnext != p)
+        pnext = &(*pnext)->next;
+    *pnext = p->next;
+}
+
 int show_interrupts(struct seq_file *p, void *v)
 {
     int i = *(loff_t *) v, j;
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Status update on sparc32 genirq support
  2011-03-08  7:01 Status update on sparc32 genirq support David Miller
                   ` (24 preceding siblings ...)
  2011-03-14 11:25 ` Daniel Hellstrom
@ 2011-03-14 17:03 ` Thomas Gleixner
  25 siblings, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2011-03-14 17:03 UTC (permalink / raw)
  To: sparclinux


On Sun, 13 Mar 2011, Sam Ravnborg wrote:

> Hi Daniel - thanks for looking at this patch.
> 
> I was actually planning to send it to David tonight.
> But after your comments I will wait.
> 
> >  I have begun to look at the patches, one thing that strikes me is
> > why handle_level_irq is used and why the ack functions irq_ack and
> > irq_mask_ack are not defined.
> 
> The sun4m at least uses level triggered interrupts.
> And looking at the implementation the enable() and disable() functions
> in all cases did a simple mask and unmask - also the leon variants.
> 
> So based on this observation I decided to go for the handle_level_irq
> flow handler. From the implementation in kernel/irq/ I could
> also see that handle_level_irq always called irq_mask() / irq_unmask()
> which was a match for the earlier implementation.
> 
> On top of this - this just worked for my sun4m box.
> 
> > On the LEON architecture IRQs are
> > normally edge triggered, the exception being PCI interrupts, which are
> > level triggered. Implementing the ack functions in the current
> > implementation would result in acking edge triggered IRQs which means
> > IRQs may be lost (on the LEON at least)? I thought SUN SPARCs also have
> > edge triggered interrupts and that the CPU acks the IRQ automatically
> > when the trap is taken?
> >   
> >  What is the difference between having handle_level_irq without ACKs
> > implemented and having handle_edge_irq doing the interrupt flow
> > handling?
> 
> Thomas? Can you help here? You are much more into these details than I am.

That largely depends on the hardware. There is hardware which
automatically acks either on trap entry or with the mask.

Also edge triggered interrupts can be handled entirely safely by
handle_level_irq if the hardware latches the edge even when the
interrupt line is masked.
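
A small userspace model of that condition, as a sketch only:
struct edge_input, edge_arrives() and second_edge_survives() are
hypothetical names, and latches_when_masked stands for the hardware
property that only a datasheet can confirm.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical controller input; latches_when_masked is a hardware trait. */
struct edge_input {
	bool masked;
	bool pending;
	bool latches_when_masked;
};

/* A rising edge arrives from the device. */
static void edge_arrives(struct edge_input *in)
{
	assert(in);
	if (!in->masked || in->latches_when_masked)
		in->pending = true;   /* remembered until it is delivered */
	/* otherwise the edge is dropped: nothing records it */
}

/*
 * Level-style flow: mask, run the ISR (during which a second edge
 * arrives), unmask. Returns true if that second edge is still pending.
 */
static bool second_edge_survives(struct edge_input *in)
{
	in->masked = true;
	in->pending = false;   /* the first edge is being serviced */
	edge_arrives(in);      /* second edge arrives while masked */
	in->masked = false;
	return in->pending;
}
```

When the input latches while masked, the second edge is delivered after
the unmask; when it does not, the edge is lost.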

W/o looking at the actual datasheets I can't tell.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2011-03-14 17:03 UTC | newest]

Thread overview: 27+ messages
2011-03-08  7:01 Status update on sparc32 genirq support David Miller
2011-03-08  7:08 ` Sam Ravnborg
2011-03-08  7:19 ` David Miller
2011-03-08  7:37 ` Marcel van Nies
2011-03-08  7:45 ` Marcel van Nies
2011-03-08 11:17 ` Marcel van Nies
2011-03-08 20:22 ` Marcel van Nies
2011-03-08 21:09 ` Sam Ravnborg
2011-03-08 21:13 ` Marcel van Nies
2011-03-08 21:19 ` David Miller
2011-03-08 21:20 ` Marcel van Nies
2011-03-08 21:27 ` Marcel van Nies
2011-03-08 21:30 ` Marcel van Nies
2011-03-08 21:30 ` David Miller
2011-03-08 21:51 ` Marcel van Nies
2011-03-08 22:00 ` David Miller
2011-03-09  5:25 ` Bob Breuer
2011-03-09  6:16 ` Bob Breuer
2011-03-09  6:37 ` Bob Breuer
2011-03-09 20:17 ` David Miller
2011-03-11 21:26 ` Marcel van Nies
2011-03-11 22:40 ` Sam Ravnborg
2011-03-12 18:03 ` daniel
2011-03-13 21:13 ` Sam Ravnborg
2011-03-14 11:17 ` Daniel Hellstrom
2011-03-14 11:25 ` Daniel Hellstrom
2011-03-14 17:03 ` Thomas Gleixner
