flush_dcache_range problems

* flush_dcache_range problems
From: Justin (Gus) Hurwitz @ 2001-08-15  0:18 UTC
  To: linuxppc-embedded

Anyone have an idea as to why flush_dcache_range is crashing the kernel on
my 603e based board? It's dying on the "dcbst   r0,r3" instruction. I've
been unable to find any other code that uses this function, so I don't
even know if it works on the 603 at all.

Thoughts?

PS- tomorrow's the last day I'm working on this project, so quick replies
are great :)

Thanks,
--Gus

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/

* Re: flush_dcache_range problems
From: Dan Malek @ 2001-08-14 23:08 UTC
  To: Justin (Gus) Hurwitz; +Cc: linuxppc-embedded

"Justin (Gus) Hurwitz" wrote:
>
> Anyone have an idea as to why flush_dcache_range is crashing the kernel on
> my 603e based board?

Well, a little more information would be helpful.  First, you shouldn't
need to call it, so a backtrace would be useful.  The whole Linux panic
printout would be nice, so we can see what is in the registers......

	-- Dan

* Re: flush_dcache_range problems
From: Justin (Gus) Hurwitz @ 2001-08-15  4:33 UTC
  To: Dan Malek; +Cc: linuxppc-embedded

On Tue, 14 Aug 2001, Dan Malek wrote:

> "Justin (Gus) Hurwitz" wrote:
> >
> > Anyone have an idea as to why flush_dcache_range is crashing the kernel on
> > my 603e based board?
>
> Well, a little more information would be helpful.  First, you shouldn't
> need to call it, so a backtrace would be useful.  The whole Linux panic
> printout would be nice, so we can see what is in the registers......

Indeed- I apologize for the brevity of the mail; I knocked it off as I
ran out of the office this afternoon.

I assume that you say I shouldn't need it because the 603e typically
doesn't need such a call. Our 603e is broken (or, the chipset is): it has
no support for hardware snooping.

We're using an Intel 82596 ethernet chip. This chip requires either
hardware snooping or a region of non-cacheable memory. I have been unable
to get the kernel to allocate the non-cacheable memory, despite trying
many things that have been recommended. In a last ditch effort, I am
trying another approach.

Basically, our driver is nearly identical to drivers/net/lasi_82596.c (I'm
going from memory here- I can get out the code if you give me some more
time later tonight). The lasi_ driver, for a parisc system, uses
pci_alloc_consistent to allocate a block of non-cacheable memory. When
this fails, it resorts to flushing the cache whenever needed, using the
parisc's flush_dcache_range(long start, long size) function (all of the
dma_cache_* functions are macros for flush_dcache_range).

Our driver works perfectly with L1 disabled. We _need_ the L1 enabled,
however (the RAM is Reed-Solomon encoded, which can triple latencies).
With L1 enabled it very quickly bombs. Since I've been so drastically
unable to get the kernel to allocate non-cached memory on this board (not
for lack of trying, or suggestions from people on the list), I've decided
to borrow the lasi_ driver's cache flushing approach.

Basically, I added a macro to my driver:
#define CC(addr, len)	flush_dcache_range(addr, addr+len)

And wherever there was a CHECK_* macro in the lasi_ driver I added a CC in
my code with the same parameters (remember, despite appearances, the three
CHECK_* macros in the lasi_ driver all eventually become
flush_dcache_range() calls).
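
A note on the CC macro above: its arguments are not parenthesized, and it
only compiles cleanly when addr is an integer type. A minimal sketch of a
more defensive form follows. CC_SAFE is a hypothetical name, and the
flush_dcache_range() shown here is a user-space stand-in that only records
its arguments so the expansion can be checked off-target; neither is taken
from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's flush_dcache_range(start, stop).
 * It only records what it was asked to flush. Assumption for illustration. */
static unsigned long last_start, last_stop;
static unsigned int flush_calls;

static void flush_dcache_range(unsigned long start, unsigned long stop)
{
    assert(start <= stop);   /* a reversed range would give a huge length */
    last_start = start;
    last_stop = stop;
    flush_calls++;
}

/* Hypothetical hardened form of CC(): every argument is parenthesized and
 * cast, so pointers and compound expressions both expand as intended.
 * Note that addr is still evaluated twice, so it must have no side effects. */
#define CC_SAFE(addr, len) \
    flush_dcache_range((unsigned long)(addr), \
                       (unsigned long)(addr) + (unsigned long)(len))
```

With a buffer "char buf[256]", the call CC_SAFE(buf + 16, sizeof(buf) / 4)
expands to a flush of the 64 bytes starting at buf + 16.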

And this crashed dramatically when first called (in i82596_probe(), where
I added the call). And then I looked at my watch and realized I had to be
45 minutes away in 30 minutes (I ended up being 5 minutes late- damn), so
I threw off my email hoping that the working-code fairy would leave a
miracle under my pillow overnight :)

If you need any more info, let me know what you need, and I'll get it to
you ASAP :)

Thanks,
--Gus

* Re: flush_dcache_range problems
From: Dan Malek @ 2001-08-15  0:27 UTC
  To: Justin (Gus) Hurwitz; +Cc: linuxppc-embedded

"Justin (Gus) Hurwitz" wrote:

> Our driver works perfectly with L1 disabled.

I don't believe you :-).  You can't call dcb* instructions on
memory that isn't cache enabled......they crash.

> If you need any more info, let me know what you need, and I'll get it to
> you ASAP :)

I told you in the last message......backtrace and register dumps :-).

	-- Dan

* Re: flush_dcache_range problems
From: Justin (Gus) Hurwitz @ 2001-08-15  5:43 UTC
  To: Dan Malek; +Cc: linuxppc-embedded

On Tue, 14 Aug 2001, Dan Malek wrote:

> "Justin (Gus) Hurwitz" wrote:
>
> > Our driver works perfectly with L1 disabled.
>
> I don't believe you :-).  You can't call dcb* instructions on
> memory that isn't cache enabled......they crash.

I change the CC macro when running without the L1 (to do{}while(0)). I'm
quite certain that it works- I'm mounting an NFS root partition and
performing as much IO as possible to test the board- it remains stable
under multiple flood pings, while doing heavy NFS reading (as much as the
board can generate) and while maxing out the serial console.
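
The two builds described here (CC as a real flush with L1 on, CC as
do{}while(0) with L1 off) can live in one source file behind a compile-time
switch. This is only a sketch: BOARD_L1_ENABLED is a made-up symbol, not a
kernel config option, and flush_dcache_range() below is a user-space stub
that counts calls instead of touching the cache.

```c
#include <assert.h>

/* Hypothetical build-time switch; not a real kernel config symbol. */
#ifndef BOARD_L1_ENABLED
#define BOARD_L1_ENABLED 0
#endif

unsigned int flush_calls;   /* number of times the stub below was called */

/* User-space stub standing in for the kernel routine. */
void flush_dcache_range(unsigned long start, unsigned long stop)
{
    assert(start <= stop);
    flush_calls++;
}

#if BOARD_L1_ENABLED
/* L1 on: write the range back before the 82596 looks at it. */
#define CC(addr, len) \
    flush_dcache_range((unsigned long)(addr), \
                       (unsigned long)(addr) + (unsigned long)(len))
#else
/* L1 off: expands to an empty statement that still consumes the trailing
 * semicolon, so "if (x) CC(a, n); else ..." keeps parsing correctly. */
#define CC(addr, len) do { } while (0)
#endif
```

Built with the default (BOARD_L1_ENABLED set to 0), every CC() call compiles
away and the stub is never reached.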

> > If you need any more info, let me know what you need, and I'll get it to
> > you ASAP :)
>
> I told you in the last message......backtrace and register dumps :-).

I unfortunately don't have any with me- and I won't have them until around
9 AM EST tomorrow.

Remembering the backtrace, the crash was at NIP=c0006048 (which was the
dcbst instruction), and the backtrace was from i82596_probe, which was
called from probe_list (IIRC), called from ethif_probe.

But you do raise an interesting point- my memory is currently being
allocated with:
dev->mem_start = (int)pci_alloc_consistent( NULL, sizeof(struct i596_private), &dma_addr);

which might be allocating the memory as non-cacheable (despite the fact
that it is not working as non-cacheable memory should). I'll make some
modifications to the code now and test them when I get in in the morning
to revert to allocating non-non-cacheable memory, which might prevent the
dcbst from crashing, eh?

Any recommendations for other things I should look at/for? I'll gladly get
you a backtrace.

---

Hold a second- I lied, and I lied bigtime. I forgot that I had a hole
punched in the firewall at work, and an ftp server running on the PC that
had my logs on it (don't tell anyone):

Oops: kernel access of bad area, sig: 11
NIP: C0006048 XER: 20000000 LR: C01366D8 SP: C03C5F20 REGS: c03c5e70 TRAP: 0300
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: C8000000, DSISR: 20000000
TASK = c03c4000[1] 'swapper' Last syscall: 120
last math 00000000 last altivec 00000000
GPR00: 00000000 C03C5F20 C03C4000 C8000000 01C0228D 0000001F C7FBD19C C009F638
GPR08: C009F62C 07FBC000 C009E3E8 C009D73C 44002024 00000000 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 003FF000 00000000 00000000 00000000
GPR24: 00000000 00000000 C0130000 C0100000 C0126528 C03C5F30 C01264B8 C7FBC000
Call backtrace:
C0136614 C013623C C0136350 C0136A40 C0135170 C012E7B0 C012E7F8
C0003AF8 C00064CC

Those last three addresses in the bt correspond to the functions I
mentioned above.
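
For reading the dump above: the decoded MSR line can be re-derived from the
raw value. The sketch below decodes 0x00009032 with the classic 32-bit
PowerPC MSR bit positions and checks it against the printed
"EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11". It also checks that DAR equals GPR03,
which is what one would expect for "dcbst r0,r3": an RA field of 0 in that
instruction means the literal value zero, not the contents of r0, so the
effective address is simply r3. The struct and function names are
illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Classic 32-bit PowerPC MSR bit masks. */
#define MSR_EE 0x00008000u  /* external interrupts enabled */
#define MSR_PR 0x00004000u  /* problem (user) state */
#define MSR_FP 0x00002000u  /* floating point available */
#define MSR_ME 0x00001000u  /* machine check enabled */
#define MSR_IR 0x00000020u  /* instruction address translation on */
#define MSR_DR 0x00000010u  /* data address translation on */

struct msr_fields {
    unsigned ee, pr, fp, me, ir, dr;
};

/* Split a raw MSR value into the fields the oops printout shows. */
static struct msr_fields decode_msr(uint32_t msr)
{
    struct msr_fields f;

    f.ee = (msr & MSR_EE) != 0;
    f.pr = (msr & MSR_PR) != 0;
    f.fp = (msr & MSR_FP) != 0;
    f.me = (msr & MSR_ME) != 0;
    f.ir = (msr & MSR_IR) != 0;
    f.dr = (msr & MSR_DR) != 0;
    return f;
}

/* Values copied from the oops in this message. */
#define OOPS_MSR   0x00009032u
#define OOPS_DAR   0xC8000000u
#define OOPS_GPR03 0xC8000000u

/* Cross-check the raw values against the decoded line in the dump. */
static void check_oops_values(void)
{
    struct msr_fields f = decode_msr(OOPS_MSR);

    assert(f.ee == 1 && f.pr == 0 && f.fp == 0 && f.me == 1);
    assert(f.ir == 1 && f.dr == 1);      /* printed as "IR/DR: 11" */
    assert(OOPS_DAR == OOPS_GPR03);      /* faulting address is r3 */
}
```

PR: 0 with IR/DR: 11 says the fault was taken in supervisor mode with both
translations on, which fits a kernel-mode data access to 0xC8000000.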

I'm going to go grab my notebook and spend some time mucking with the code
(removing my attempts to allocate the memory as non-cacheable (or,
ifdef'ing it out)).

Thanks again,
--Gus

* Re: flush_dcache_range problems
From: Justin (Gus) Hurwitz @ 2001-08-15 18:03 UTC
  To: Dan Malek; +Cc: linuxppc-embedded


On Wed, 15 Aug 2001, Justin (Gus) Hurwitz wrote:

>
> On Tue, 14 Aug 2001, Dan Malek wrote:


> ---
>
> Hold a second- I lied, and I lied bigtime. I forgot that I had a hole
> punched in the firewall at work, and an ftp server running on the PC that
> had my logs on it (don't tell anyone):
>
> Oops: kernel access of bad area, sig: 11
> NIP: C0006048 XER: 20000000 LR: C01366D8 SP: C03C5F20 REGS: c03c5e70 TRAP: 0300
> MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
> DAR: C8000000, DSISR: 20000000
> TASK = c03c4000[1] 'swapper' Last syscall: 120
> last math 00000000 last altivec 00000000
> GPR00: 00000000 C03C5F20 C03C4000 C8000000 01C0228D 0000001F C7FBD19C C009F638
> GPR08: C009F62C 07FBC000 C009E3E8 C009D73C 44002024 00000000 00000000 00000000
> GPR16: 00000000 00000000 00000000 00000000 003FF000 00000000 00000000 00000000
> GPR24: 00000000 00000000 C0130000 C0100000 C0126528 C03C5F30 C01264B8 C7FBC000
> Call backtrace:
> C0136614 C013623C C0136350 C0136A40 C0135170 C012E7B0 C012E7F8
> C0003AF8 C00064CC
>
>
> Those last three addresses in the bt correspond to the functions I
> mentioned above.
>
> I'm going to go grab my notebook and spend some time mucking with the code
> (removing my attempts to allocate the memory as non-cacheable (or,
> ifdef'ing it out)).

OK- I'm now running with L1 cache enabled, and am no longer trying to
allocate non-cacheable memory. I am crashing later in the bootup process
(before it was in i82596_probe, now it's when trying to bring up the
chip).

Here's the crash:

NIP: C0006048 XER: 20000000 LR: C009DB5C SP: C03C5EE0 REGS: c03c5e30 TRAP: 0300
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: C8000000, DSISR: 20000000
TASK = c03c4000[1] 'swapper' Last syscall: 120
last math 00000000 last altivec 00000000
GPR00: FFFF8008 C03C5EE0 C03C4000 C8000000 0007A120 0000001F C7FBE3E8 C7FBE280
GPR08: FFFFFFFF C7FBE3E8 C7FBE400 E28007FB 82000022 00000000 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 003FF000 00000000 00000000 00000000
GPR24: C03C5EE8 00000000 C0130000 C01264B8 00000000 C7FBE000 C7FBE5E0 C7FBE620
Call backtrace:
C009D908 C00A7ABC C00A9344 C0137BB4 C01381C8 C012E7B0 C012E7F8
C0003AF8 C00064CC

c009d854 <i596_open>
c00a7a58 <dev_open>
c00a92d8 <dev_change_flags>
c0137a78 <ic_open_devs>
c013814c <ip_auto_config>
c012e780 <do_initcalls>
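
Pairing the backtrace entries with the symbols listed above amounts to a
nearest-preceding-symbol lookup. A rough sketch, using only the six
addresses quoted in this message (in practice the full System.map would be
the table); the struct and function names are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sym {
    uint32_t addr;
    const char *name;
};

/* Only the symbols quoted in the message, sorted by address. A table this
 * sparse will misattribute any address that belongs to an unlisted function. */
static const struct sym symtab[] = {
    { 0xc009d854u, "i596_open" },
    { 0xc00a7a58u, "dev_open" },
    { 0xc00a92d8u, "dev_change_flags" },
    { 0xc012e780u, "do_initcalls" },
    { 0xc0137a78u, "ic_open_devs" },
    { 0xc013814cu, "ip_auto_config" },
};

/* Return the last symbol whose address is <= pc and store pc's offset from
 * it, or return NULL (leaving *offset untouched) if pc precedes them all. */
static const struct sym *resolve(uint32_t pc, uint32_t *offset)
{
    const struct sym *best = NULL;
    size_t i;

    assert(offset != NULL);
    for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
        if (symtab[i].addr <= pc)
            best = &symtab[i];   /* table is sorted, so the last hit wins */
    }
    if (best != NULL)
        *offset = pc - best->addr;
    return best;
}
```

The first backtrace entry, C009D908, comes out as i596_open+0xb4, which
agrees with the objdump labelling of c009d92c as i596_open+0xd8.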

And, from i596_open:
  c009d8fc:       38 60 06 10     li      r3,1552
  c009d900:       38 80 00 20     li      r4,32
  c009d904:       48 00 5f e5     bl      c00a38e8 <alloc_skb>
* c009d908:       7c 63 1b 79     mr.     r3,r3
  c009d90c:       41 82 00 20     beq     c009d92c <i596_open+0xd8>
  c009d910:       81 23 00 80     lwz     r9,128(r3)
  c009d914:       81 63 00 84     lwz     r11,132(r3)
  c009d918:       39 29 00 10     addi    r9,r9,16
  c009d91c:       39 6b 00 10     addi    r11,r11,16
  c009d920:       91 23 00 80     stw     r9,128(r3)
  c009d924:       91 63 00 84     stw     r11,132(r3)
  c009d928:       40 82 00 10     bne     c009d938 <i596_open+0xe4>
  c009d92c:       3c 60 c0 10     lis     r3,-16368
  c009d930:       38 63 6b b8     addi    r3,r3,27576
  c009d934:       4b f7 3b 71     bl      c00114a4 <panic>

(note, c009d908 is in init_rx_bufs, which is a static inline called from
i596_open).

C0006048 is still dcbst (from flush_dcache_range):

c0006044:       7c 89 03 a6     mtctr   r4
c0006048:       7c 00 18 6c     dcbst   r0,r3
c000604c:       38 63 00 20     addi    r3,r3,32
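
The three instructions above look like the core of a count-controlled loop:
CTR is loaded from r4, and each pass issues dcbst on the line at r3 and then
advances r3 by 32 bytes. What follows is a user-space model of how a pass
count for such a loop could be derived from a (start, stop) pair. It is a
sketch written for this note, not the kernel source, and LINE_SIZE and the
function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_SIZE 32u   /* assumed from "addi r3,r3,32" in the disassembly */

/* First cache line covered by a range that begins at 'start'. */
static uint32_t first_line(uint32_t start)
{
    return start & ~(LINE_SIZE - 1u);
}

/*
 * Number of loop passes needed to cover [start, stop): start is aligned down
 * to a line boundary and the remaining length is rounded up to whole lines.
 * All arithmetic is modulo 2^32, as it would be in 32-bit registers, so a
 * stop that lies below start yields a very large count rather than zero.
 */
static uint32_t flush_line_count(uint32_t start, uint32_t stop)
{
    return (stop - first_line(start) + (LINE_SIZE - 1u)) / LINE_SIZE;
}

/* Address of the last line the loop would touch; needs at least one pass. */
static uint32_t last_line(uint32_t start, uint32_t stop)
{
    uint32_t n = flush_line_count(start, stop);

    assert(n > 0);
    return first_line(start) + (n - 1u) * LINE_SIZE;
}
```

Purely as arithmetic, and not as a confirmed diagnosis: with start =
0xC7FBC000 (the value in GPR31 of the first oops) and a second argument
between 0x1181 and 0x11A0, i.e. something that looks like a length rather
than an end address, the subtraction wraps and the model gives 0x01C0228D
passes, which is the value in GPR04 of that oops. A loop of that length
would keep walking upward far beyond the buffer. Whether the real call was
made that way cannot be told from the thread.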



And the clock is now officially running: I have about 3 hours left working
on this project (before I move to the other side of the country).

Thanks :)
--Gus

