Linux PARISC architecture development
* [parisc-linux] The problem on the PA8800 is all in the data-cache.
@ 2006-07-22 17:50 Carlos O'Donell
  2006-07-23  1:01 ` Thibaut VARENE
  2006-07-24  2:33 ` James Bottomley
  0 siblings, 2 replies; 23+ messages in thread
From: Carlos O'Donell @ 2006-07-22 17:50 UTC
  To: parisc-linux, James Bottomley, Grant Grundler

James,

After spending ~4 hours of debugging yesterday evening, between
Thibaut, Dave, and myself, we firmly believe the PA8800 problems are
data cache issues.

Let me describe the test environment, the test, and the results/conclusions:

1. Thibaut's magnum, PA8800. 1 cpu enabled, 2 cores, L1/L2 enabled. etc.
2. Kernel was 2.6.17, with a revision identifier that is not in my notes.
3. A statically compiled sshd, with *everything* disabled.
    This required LIBS="-ldl" and LD_FLAGS="-static" to achieve.

We copied the statically compiled sshd to the PA8800 and loaded it in
gdb. We passed the parameters "-D -p 2222", used "set follow-fork-mode
child" in gdb, and started the process.

From an external box we initiated an ssh connection to the remote
PA8800 and waited for gdb to catch the SIGSEGV in the sshd child. We
did this over, and over, and over to look for patterns.

Pattern 1:

Using strace, we looked at the syscalls, and determined that the child
sshd process *always* dies after an fd socket read.

Pattern 2:

The set of registers involved is small, roughly r4, r19, r3, r21, r28.
These registers are primarily used by GCC to reference local data on
the stack. r3 is the frame marker and was frequently involved in the
faults.

Pattern 3:

The functions that fail deal with allocating and touching new
memory. Deaths occur primarily in malloc, xmalloc, memset,
packet_read_seqnr, and buffer_put_bignum2_ret. In fact, we died in
malloc more often than not.

Results:

Initially we thought it was an icache issue. Then we realized that
PLABELs are just data, and when we removed the PLABELs from the
equation (complete static compile) we stopped seeing invalid insns. We
believe the PLABEL data was corrupted, so r19 and the ip were bogus,
which made the failure appear icache related. In truth, it was only
corrupted PLABELs.
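Why corrupted PLABELs masquerade as an icache problem can be shown with a simplified C model. It assumes, as the paragraph above implies, that an indirect call through a PLABEL (function descriptor) takes both the branch target and the r19 value from the descriptor in data memory. The struct and function names are hypothetical, and this is not the real hppa calling sequence, only a model of where the two values come from.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of a PLABEL (function descriptor): two words in
 * ordinary data memory. */
struct plabel {
    uint32_t entry;  /* code address the indirect call branches to */
    uint32_t r19;    /* value loaded into r19 (linkage table pointer) */
};

/* Register state right after the indirect branch. */
struct call_state {
    uint32_t ip;
    uint32_t r19;
};

/* Resolve an indirect call through a descriptor. Both values are plain
 * data loads, so if the descriptor reads back stale or corrupted (a
 * dcache problem), ip and r19 are bogus together and the failure shows
 * up as invalid instructions, i.e. it *looks* like an icache problem. */
static struct call_state call_through(const struct plabel *pl)
{
    struct call_state s = { pl->entry, pl->r19 };
    return s;
}
```

A fully static link removes these descriptors from the call path, which matches the observation that the invalid-insn failures disappeared.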

With a fully static sshd, the PLABELs are not present, and the faults
are *all* memory loads and stores to the stack.

Conclusions:

a) We think it is not an icache issue, but in fact a dcache issue.

Often it appears as if a register was corrupted, but the truth is that
the ldw loaded bogus data into a register.

b) One time, on a later comparison in gdb, the register and the data in
memory did not match. I stress that we only saw this situation once.

c) We have often seen the failure with the frame marker on a cacheline
boundary, for example 0xc0278100 (i.e. 256-byte aligned).
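The alignment observation in c) is simple mask arithmetic. A minimal C sketch follows; the helper names are hypothetical, and the boundary is a parameter because this sketch does not assume the actual PA8800 L1 line size, only the 256-byte figure quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of addr within a block of `boundary` bytes.
 * `boundary` must be a power of two. */
static uintptr_t offset_in_block(uintptr_t addr, uintptr_t boundary)
{
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return addr & (boundary - 1);
}

/* True if addr sits exactly on a `boundary`-byte boundary. */
static bool on_boundary(uintptr_t addr, uintptr_t boundary)
{
    return offset_in_block(addr, boundary) == 0;
}
```

For the frame marker above, on_boundary(0xc0278100, 256) holds, while on_boundary(0xc0278100, 512) does not (the remainder is 0x100).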

It is my hope that these patterns will trigger someone to devise a
plan for fixing this. If you have any questions about our methods, or
about reproducing this, you can easily talk to Thibaut, and we can
probably set up access to the test sshd binary.

Grant expressed worry that "Pattern 1" was indicative of a dma sync
problem with the network socket read.

Cheers,
Carlos.
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux


* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-22 17:50 [parisc-linux] The problem on the PA8800 is all in the data-cache Carlos O'Donell
@ 2006-07-23  1:01 ` Thibaut VARENE
  2006-07-23 16:28   ` Michael S. Zick
  2006-07-24  2:33 ` James Bottomley
  1 sibling, 1 reply; 23+ messages in thread
From: Thibaut VARENE @ 2006-07-23  1:01 UTC
  To: Carlos O'Donell; +Cc: James Bottomley, parisc-linux

Carlos,

I'm observing something totally crazy right now. On the very same
machine we hacked yesterday, exact same setup (same kernel, same
binary, same everything):

I /can't/ start our static sshd anymore. It dies right after a sysctl (!):

898   mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4000b000
898   getpid()                          = 898
898   rt_sigaction(SIGRTMIN, {0x40c51eba, [], 0}, NULL, 8) = 0
898   rt_sigaction(SIGRT_1, {0x40c51ec2, [TRAP], 0}, NULL, 8) = 0
898   rt_sigaction(SIGRT_2, {0x40c51ea2, [], 0}, NULL, 8) = 0
898   rt_sigprocmask(SIG_BLOCK, [RTMIN], NULL, 8) = 0
898   rt_sigprocmask(SIG_UNBLOCK, [RT_1], NULL, 8) = 0
898   _sysctl({{CTL_KERN, KERN_VERSION}, 2, 0xc00e67d4, 36, (nil), 0}) = 0
898   --- SIGSEGV (Segmentation fault) @ 0 (0) ---
898   +++ killed by SIGSEGV +++

As you can see, at that point it hasn't even spawned a child yet. gdb
isn't much help, as the backtrace is pretty clueless:

(gdb) symbol-file /home/varenet/openssh-4.3p2/sshd
Reading symbols from /home/varenet/openssh-4.3p2/sshd...done.
(gdb) set follow-fork-mode child
(gdb) set args -D -p 2222
(gdb) run
Starting program: /usr/local/test/sbin/sshd -D -p 2222

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x40c3a9a0 in ?? ()
Previous frame identical to this frame (corrupt stack?)

Finally, dmesg shows:

do_page_fault() pid=898 command='sshd' type=6 address=0x00000003

     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00000000000001101111111100001011 Not tainted
r00-03  0000000000000000 0000000040c5233c 0000000040c3a9a3 00000000c00e67d4
r04-07  0000000040c5233c 0000000040c51a80 000000004069e830 0000000000000200
r08-11  0000000040c513d4 00000000000000e1 0000000080000001 0000000090000000
r12-15  00000000d0000000 000000000021fd70 00000000001a5800 00000000001a5800
r16-19  000000000004c8c0 00000000c00de698 000000000004c8c0 0000000040c5233c
r20-23  0000000000000000 0000000000000053 0000000000000000 00000000c00e66c8
r24-27  00000000c00e67d4 0000000040c401b2 00000000c00e67d7 00000000001db060
r28-31  0000000040c51a80 0000000000000213 00000000c00e6a40 0000000040c3a9a3
sr0-3   00000000001fa800 0000000000000000 0000000000000000 00000000001fa800
sr4-7   00000000001fa800 00000000001fa800 00000000001fa800 00000000001fa800
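Two small decodings help when reading a dump like this one. A minimal C sketch, assuming (as the layout suggests) that the 32-character legend lines up one-to-one with the PSW bits, most significant bit first; the printed bit string is 0x0006FF0B. It also splits a code address into offset and privilege level, relying on the PA-RISC convention that the low two bits of an instruction address hold the privilege level (3 = user). That is why r02 (the return pointer), 0x40c3a9a3, matches frame #1 at 0x40c3a9a0 in the gdb backtrace, and why the fault address 0x00000003 reads as a branch to address 0 at privilege level 3. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Legend printed above the PSW bits in the dump, MSB first. */
static const char PSW_LEGEND[] = "YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI";

/* Write the legend letter of every set bit in `psw` to `out`
 * (MSB first) and NUL-terminate; `out` needs room for 33 bytes. */
static void psw_set_flags(uint32_t psw, char out[33])
{
    size_t n = 0;
    for (int i = 0; i < 32; i++) {
        if ((psw >> (31 - i)) & 1u)
            out[n++] = PSW_LEGEND[i];
    }
    out[n] = '\0';
}

/* Split a code address into its offset and privilege level
 * (low two bits; 0 = most privileged, 3 = user). */
static uint32_t code_offset(uint32_t addr) { return addr & ~3u; }
static unsigned code_priv(uint32_t addr)   { return addr & 3u; }
```

Applied to this dump's PSW, psw_set_flags(0x0006FF0B, buf) yields "CVcbcbcbcbQDI"; what each letter means is defined in the PA-RISC 2.0 architecture documentation, not here.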

The cool thing is that i can /consistently/ reproduce this.

I'm leaving the box powered up, not touching anything until we get a
chance to investigate this a bit more.

Aside from that, I /really/ believe that the fact that we can trigger
the bug so easily with some network applications isn't a coincidence.
Grant's hint of a DMA sync problem shouldn't be overlooked. The
"make" failures could also be I/O related...

HTH

T-Bone

PS: As I told you earlier today, I tried "ssh localhost" with the
'normal' sshd (/usr/sbin/sshd); it dies as expected, with pretty much
the same tombstones as those we saw yesterday.

I haven't investigated it much further at this point.

Note: you and James can access that machine, provided you remember
your password and your ssh key on mkhppa02.esiee.fr is valid. If
there's a problem with either of these, it's easy to fix. If jda wants
an account, that's also easy.

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-23  1:01 ` Thibaut VARENE
@ 2006-07-23 16:28   ` Michael S. Zick
  2006-07-23 22:03     ` Thibaut VARENE
  0 siblings, 1 reply; 23+ messages in thread
From: Michael S. Zick @ 2006-07-23 16:28 UTC
  To: parisc-linux; +Cc: Thibaut VARENE

On Sat July 22 2006 20:01, Thibaut VARENE wrote:
> Carlos,
> 
> I'm observing something totally crazy right now. On the very same
> machine we hacked yesterday, exact same setup (same kernel, same
> binary, same everything):
> 
> I /can't/ start our static sshd anymore. It dies right after a sysctl (!):
> 
> I'm leaving the box powered up, not touching anything until we get a
> chance to investigate this a bit more.
> 

Is it still possible to capture the kernel's internal state by
copying /proc/kcore to a file somewhere?

Mike

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-23 16:28   ` Michael S. Zick
@ 2006-07-23 22:03     ` Thibaut VARENE
  2006-07-24  1:40       ` Kyle McMartin
  0 siblings, 1 reply; 23+ messages in thread
From: Thibaut VARENE @ 2006-07-23 22:03 UTC
  To: Michael S. Zick; +Cc: parisc-linux

On 7/23/06, Michael S. Zick <mszick@morethan.org> wrote:
> On Sat July 22 2006 20:01, Thibaut VARENE wrote:
> > Carlos,
> >
> > I'm observing something totally crazy right now. On the very same
> > machine we hacked yesterday, exact same setup (same kernel, same
> > binary, same everything):
> >
> > I /can't/ start our static sshd anymore. It dies right after a sysctl (!):
> >
> > I'm leaving the box powered up, not touching anything until we get a
> > chance to investigate this a bit more.
> >
>
> Is it still possible to capture the kernel's internal state by
> copying /proc/kcore to a file somewhere?

That box hit the "kill your fs" bug I've seen on ppc and ia64
(upcoming report mail to be posted soon) so it's basically dead at
that point and I'm having a hard time reinstalling it (thanks to pa8800
being /such/ a hassle to install remotely).

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-23 22:03     ` Thibaut VARENE
@ 2006-07-24  1:40       ` Kyle McMartin
  2006-07-24  2:39         ` Thibaut VARENE
  0 siblings, 1 reply; 23+ messages in thread
From: Kyle McMartin @ 2006-07-24  1:40 UTC
  To: Thibaut VARENE; +Cc: parisc-linux

On Sun, Jul 23, 2006 at 06:03:47PM -0400, Thibaut VARENE wrote:
> That box hit the "kill your fs" bug I've seen on ppc and ia64
> (upcoming report mail to be posted soon) so it's basically dead at
> that point and I'm having a hard time reinstalling it (thanks to pa8800
> being /such/ a hassle to install remotely).
> 

Does it have two disks? Maybe we should keep one disk as 'known good'
backup on ioz and magnum, and if they crap out, BO ALT and dd it over
the scrogged disk? Just a thought which might save some pain in the
future considering they are crash n' bash machines.

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-22 17:50 [parisc-linux] The problem on the PA8800 is all in the data-cache Carlos O'Donell
  2006-07-23  1:01 ` Thibaut VARENE
@ 2006-07-24  2:33 ` James Bottomley
  2006-07-24  2:54   ` Thibaut VARENE
  2006-07-24 14:51   ` James Bottomley
  1 sibling, 2 replies; 23+ messages in thread
From: James Bottomley @ 2006-07-24  2:33 UTC
  To: Carlos O'Donell; +Cc: parisc-linux

On Sat, 2006-07-22 at 13:50 -0400, Carlos O'Donell wrote: 
> After spending ~4 hours of debugging yesterday evening, between
> Thibaut, Dave, and myself, we firmly believe the PA8800 problems are
> data cache issues.

Thanks very much for spending the time to do this!

> a) We think it is not an icache issue, but in fact a dcache issue.
> 
> Often it appears as if a register was corrupted, but the truth is that
> the ldw loaded bogus data into a register.

OK, I'll buy this provisionally ... D cache incoherence is far more
difficult to explain than D/I incoherence, but I'll try.

> b) One time, on a later comparison in gdb, the register and data in
> memory did not equal. I stress that we only saw this situation once.

OK, as long as it was a register read, this must have been D
incoherence.

> c) We have often seen the failure with the frame marker on a cacheline
> boundary, for example 0xc0278100 (e.g. 256 bytes).
> 
> It is my hope that these patterns will trigger someone to devise a
> plan for fixing this. If you have any questions about our methods, or
> reproducing this, you can easily talk to Thibaut and we can probably
> setup access to the test sshd binary.

OK .. let me try to think of this one.  To me, the pattern indicates
errors in newly faulted memory (either from stack growth or touching
malloced areas which are mmapped in glibc).  So, here's the only current
theory I can come up with:  Aggressive prefetching is causing us
problems in faulting.  Theoretically, it looks like the culprit should
be anonymous pages (because that's what stack and malloc areas
are---they're not file backed).  However, I tend to discount this
because the only way data gets into anonymous memory is when the user
(or the linker running on behalf of the user) puts it there.  Thus,
there should be no user coherence issues with data in anonymous memory.

> Grant expressed worry that "Pattern 1" was indicative of a dma sync
> problem with the network socket read.

I'm still dubious about this one ... even if we agree it's a D cache
issue, it's definitely a D cache issue affecting program execution (i.e.
function pointers or call indirection).  The data coming out of the
network pipe for ssh never finds its way into the execution stream,
which means it's unlikely to affect these areas.  Additionally, ssh has
message integrity checks which fail noisily (i.e. the network data is
verified against a secure hash before it's used).  So, if we had
incoherent data from the pipe, I would expect to see periodic MIC
failures, which we don't see.

I'm also coming to the conclusion that the aggressive prefetch theory
isn't entirely accurate.  It fits the I/D incoherency theory because we
get I cache prefetches on D cache TLB entries (because of the combined
I/D tlb) then we only flush the D cache because we don't expect I data
there (all our data regions currently seem to be executable as well).
However, for D cache incoherence alone, it seems implausible because we
have to have a tlb entry to move in across a page boundary, and every
tlb entry can only be inserted from a pte (and for every pte we will
flush the cache on object destruction).

One of the suspicious monsters in our code is the PAGE_FLUSH setting,
which allows tlb re-insertion after linux thinks it has been cleared in
violation of the linux tlb philosophy.  However, those mappings are
supposed to be "flush only" and, since the algorithm had a hole in it, I
thought I fixed it not to need PAGE_FLUSH entries (even though we keep
them around).  Regardless, tmpalias flushing completely eliminates any
window we have in this regard, and, as pa8800 still doesn't work, I
think I have to conclude it's not even this.

So, the final thing we're left with is a missed or elided flush
somewhere in the linux code, which is going to be extremely hard to
find.

James



* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-24  1:40       ` Kyle McMartin
@ 2006-07-24  2:39         ` Thibaut VARENE
  0 siblings, 0 replies; 23+ messages in thread
From: Thibaut VARENE @ 2006-07-24  2:39 UTC
  To: Kyle McMartin; +Cc: parisc-linux

On 7/23/06, Kyle McMartin <kyle@mcmartin.ca> wrote:
> On Sun, Jul 23, 2006 at 06:03:47PM -0400, Thibaut VARENE wrote:
> > That box hit the "kill your fs" bug I've seen on ppc and ia64
> > (upcoming report mail to be posted soon) so it's basically dead at
> > that point and I'm having a hard time reinstalling it (thanks to pa8800
> > being /such/ a hassle to install remotely).
> >
>
> Does it have two disks? Maybe we should keep one disk as 'known good'
> backup on ioz and magnum, and if they crap out, BO ALT and dd it over
> the scrogged disk? Just a thought which might save some pain in the
> future considering they are crash n' bash machines.

It /had/ two disks and one good backup, until I ran (to my own shame
and that of my offspring) '# rm -rf /mnt *' on the backup one...

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-24  2:33 ` James Bottomley
@ 2006-07-24  2:54   ` Thibaut VARENE
  2006-07-24  3:32     ` Matthew Wilcox
       [not found]     ` <1153711459.1235.13.camel@mulgrave.il.steeleye.com>
  2006-07-24 14:51   ` James Bottomley
  1 sibling, 2 replies; 23+ messages in thread
From: Thibaut VARENE @ 2006-07-24  2:54 UTC
  To: James Bottomley; +Cc: parisc-linux

On 7/23/06, James Bottomley <James.Bottomley@steeleye.com> wrote:
> Carlos wrote:
> > Grant expressed worry that "Pattern 1" was indicative of a dma sync
> > problem with the network socket read.
>
> I'm still dubious about this one ... even if we agree it's a D cache
> issue, it's definitely a D cache issue affecting program execution (i.e.
> function pointers or call indirection).  The data coming out of the
> network pipe for ssh never finds its way into the execution stream,
> which means it's unlikely to affect these areas.  Additionally, ssh has
> message integrity checks which fail noisily (i.e. the network data is
> verified against a secure hash before it's used).  So, if we had
> incoherent data from the pipe, I would expect to see periodic MIC
> failures, which we don't see.

Actually, on some occasions sshd would kill the incoming connection
with "bad packet length", "invalid hash packet", and all sorts of
other nasty error messages. And we made sure that these messages
were sent by the _server_, not the _client_...

My take is that we see that bug so much more on pa8800 because of its
huge cache and thus because we hit cache much more often than on all
other machines...

Still investigating this; I'm about to bring my rp3440 back online ;)

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-24  2:54   ` Thibaut VARENE
@ 2006-07-24  3:32     ` Matthew Wilcox
  2006-07-24  4:15       ` Thibaut VARENE
  2006-07-24 14:58       ` John David Anglin
       [not found]     ` <1153711459.1235.13.camel@mulgrave.il.steeleye.com>
  1 sibling, 2 replies; 23+ messages in thread
From: Matthew Wilcox @ 2006-07-24  3:32 UTC
  To: Thibaut VARENE; +Cc: James Bottomley, parisc-linux

On Sun, Jul 23, 2006 at 10:54:38PM -0400, Thibaut VARENE wrote:
> My take is that we see that bug so much more on pa8800 because of its
> huge cache and thus because we hit cache much more often than on all
> other machines...

I don't think so.  pa8800 has less cache per core than pa8700.  The L2
cache is ignorable for the purposes of this scenario, since it's
transparent to software.

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-24  3:32     ` Matthew Wilcox
@ 2006-07-24  4:15       ` Thibaut VARENE
       [not found]         ` <1153750204.1235.18.camel@mulgrave.il.steeleye.com>
  2006-07-24 14:58       ` John David Anglin
  1 sibling, 1 reply; 23+ messages in thread
From: Thibaut VARENE @ 2006-07-24  4:15 UTC
  To: Matthew Wilcox; +Cc: James Bottomley, parisc-linux

On 7/23/06, Matthew Wilcox <matthew@wil.cx> wrote:
> On Sun, Jul 23, 2006 at 10:54:38PM -0400, Thibaut VARENE wrote:
> > My take is that we see that bug so much more on pa8800 because of its
> > huge cache and thus because we hit cache much more often than on all
> > other machines...
>
> I don't think so.  pa8800 has less cache per core than pa8700.  The L2
> cache is ignorable for the purposes of this scenario, since it's
> transparent to software.

I don't think so :)

On the B180, the 1MB cache add-on is "transparent" and the kernel
doesn't see it at all (/proc/cpuinfo doesn't even show it).

On pa8800, the kernel sees L2 (/proc/cpuinfo shows it) and computes
the flush routines based on L2 size, so I really don't understand how
it is transparent...

This is why I'm making the point that either our cache flush
computations are wrong on pa8800 (and ggg says they aren't), or the L2
is /not/ transparent, in which case what I said previously about the
huge cache size should make some sense, shouldn't it?
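The argument above rests on flush loops being sized from the cache size the kernel is told about. A minimal C sketch of that arithmetic, with a hypothetical helper name and made-up sizes (not the PA8800's real geometry): the number of per-line flushes a full sweep needs grows linearly with the reported size, so a reported size that includes the L2 makes every whole-cache flush correspondingly longer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of per-line flush operations needed to
 * sweep a cache of `cache_bytes` when each flush covers `stride` bytes.
 * Rounds up so a partial final line is still covered. */
static size_t flush_ops_needed(size_t cache_bytes, size_t stride)
{
    assert(stride > 0);
    return (cache_bytes + stride - 1) / stride;
}
```

For example, a 1 MB cache swept with a 64-byte stride needs flush_ops_needed(1 << 20, 64) == 16384 operations.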

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
       [not found]     ` <1153711459.1235.13.camel@mulgrave.il.steeleye.com>
@ 2006-07-24  4:26       ` Thibaut VARENE
  2006-07-24  4:31         ` Thibaut VARENE
  0 siblings, 1 reply; 23+ messages in thread
From: Thibaut VARENE @ 2006-07-24  4:26 UTC
  To: James Bottomley; +Cc: parisc-linux

On 7/23/06, James Bottomley <James.Bottomley@steeleye.com> wrote:
> On Sun, 2006-07-23 at 22:54 -0400, Thibaut VARENE wrote:
> > Actually on some occasion, the sshd would kill the incoming connection
> > with "bad packet length" and "invalid hash packet" and all sorts of
> > various nasty error messages. And we made sure that these messages
> > were sent by the _server_, not the _client_...
> >
> > My take is that we see that bug so much more on pa8800 because of its
> > huge cache and thus because we hit cache much more often than on all
> > other machines...
> >
> > Still investigating this; I'm about to bring my rp3440 back online ;)
>
> I can get a stable ssh connection to a pa8800 ... once established, I
> don't ever see them close for MIC problems.

I'm totally amazed. Carlos and I could never get a single established
connection...

> To get one going, you just start several remote ssh's at once, so this
> would tend to indicate that it's some type of timing issue connected to
> the fork.

Well, I can consistently crash a remote telnet session if I output too
much data to the terminal (e.g. running dmesg or ls -lR). I doubt this
is fork related...

Another point worth mentioning is that sshd is very likely mlocking
stuff so that it doesn't get swapped. It's been a while since I last
looked at the sshd source, but I'd bet it keeps almost everything in
RAM, and given the size of the cache on pa8800, blah blah, see my
other mail, I'm exhausted ;)

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-24  4:26       ` Thibaut VARENE
@ 2006-07-24  4:31         ` Thibaut VARENE
  0 siblings, 0 replies; 23+ messages in thread
From: Thibaut VARENE @ 2006-07-24  4:31 UTC
  To: James Bottomley; +Cc: parisc-linux

On 7/24/06, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
> On 7/23/06, James Bottomley <James.Bottomley@steeleye.com> wrote:
> > On Sun, 2006-07-23 at 22:54 -0400, Thibaut VARENE wrote:
> > > Actually on some occasion, the sshd would kill the incoming connection
> > > with "bad packet length" and "invalid hash packet" and all sorts of
> > > various nasty error messages. And we made sure that these messages
> > > were sent by the _server_, not the _client_...
> > >
> > > My take is that we see that bug so much more on pa8800 because of its
> > > huge cache and thus because we hit cache much more often than on all
> > > other machines...
> > >
> > > Still investigating this; I'm about to bring my rp3440 back online ;)
> >
> > I can get a stable ssh connection to a pa8800 ... once established, I
> > don't ever see them close for MIC problems.
>
> I'm totally amazed. Carlos and I could never get a single established
> connection...

Another quick info dump: yesterday, Carlos and I experienced a totally
different behaviour from our statically linked sshd. It wouldn't even
start... Grant, on the other hand, at first kept being bounced off ioz,
then suddenly started to consistently get password prompts but couldn't
log in... Dunno if that rings any bells.

Carlos will probably have more to say about what we've been through
yesterday, he took notes ;)

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-24  2:33 ` James Bottomley
  2006-07-24  2:54   ` Thibaut VARENE
@ 2006-07-24 14:51   ` James Bottomley
  1 sibling, 0 replies; 23+ messages in thread
From: James Bottomley @ 2006-07-24 14:51 UTC
  To: Carlos O'Donell; +Cc: parisc-linux

On Sun, 2006-07-23 at 22:33 -0400, James Bottomley wrote:
> > Grant expressed worry that "Pattern 1" was indicative of a dma sync
> > problem with the network socket read.
> 
> I'm still dubious about this one ... even if we agree it's a D cache
> issue, it's definitely a D cache issue affecting program execution (i.e.
> function pointers or call indirection).  The data coming out of the
> network pipe for ssh never finds its way into the execution stream,
> which means it's unlikely to affect these areas.  Additionally, ssh has
> message integrity checks which fail noisily (i.e. the network data is
> verified against a secure hash before it's used).  So, if we had
> incoherent data from the pipe, I would expect to see periodic MIC
> failures, which we don't see.

Let me back up on this one.  I still don't think it's a DMA sync issue.
However, it could be a different D incoherency issue.  Because the linux
kernel operates with kernel to user aliases (i.e. the user address of a
page is rarely congruent to the kernel address of a page) it is possible
to generate D incoherency by missing a flush when a kernel page is
reclaimed (i.e. freed).

The scenario that resonates nicely with all this has to do with the
skbuff allocation and copying.  Because the network read path isn't zero
copy, we do intermediate copies into skbuff areas before eventually
sending the data to the user socket.  The idea is that the skbuff is
freed and then reallocated to the user process in the fault (this gives
us the necessary same physical index).  If the kernel address of the
skbuff were accidentally congruent to the fault address, we'd actually
see the skbuff data instead of the underlying page data if it weren't
flushed.  The problem, as usual, is that this isn't pa8800 specific ...
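The "congruent" test in the scenario above can be sketched in C. Two virtual addresses mapping the same physical page are equivalent aliases when they agree in every address bit below the cache's alias span, because then they select the same cache index. The span is a parameter here; the 4 MB value used in the checks is an illustrative assumption, not a documented PA8800 figure, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two virtual addresses are congruent (equivalent aliases) when they
 * agree in all bits below `alias_span`, so they index the same cache
 * lines. `alias_span` must be a power of two. */
static bool va_congruent(uintptr_t a, uintptr_t b, uintptr_t alias_span)
{
    assert(alias_span != 0 && (alias_span & (alias_span - 1)) == 0);
    return ((a ^ b) & (alias_span - 1)) == 0;
}
```

With a 4 MB span, a kernel address 0x10001000 and a user address 0x40001000 are congruent, while 0x10003000 and 0x40001000 are not; the non-congruent case is the usual kernel/user situation described above.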

James



* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-24  3:32     ` Matthew Wilcox
  2006-07-24  4:15       ` Thibaut VARENE
@ 2006-07-24 14:58       ` John David Anglin
  1 sibling, 0 replies; 23+ messages in thread
From: John David Anglin @ 2006-07-24 14:58 UTC
  To: Matthew Wilcox; +Cc: James.Bottomley, parisc-linux, T-Bone

> On Sun, Jul 23, 2006 at 10:54:38PM -0400, Thibaut VARENE wrote:
> > My take is that we see that bug so much more on pa8800 because of its
> > huge cache and thus because we hit cache much more often than on all
> > other machines...
> 
> I don't think so.  pa8800 has less cache per core than pa8700.  The L2
> cache is ignorable for the purposes of this scenario, since it's
> transparent to software.

It is visible.  You see it in the size returned for the D cache by
PDC_CACHE.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
       [not found]         ` <1153750204.1235.18.camel@mulgrave.il.steeleye.com>
@ 2006-07-24 16:32           ` Grant Grundler
  2006-07-25 14:51             ` James Bottomley
  0 siblings, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2006-07-24 16:32 UTC
  To: James Bottomley; +Cc: Matthew Wilcox, parisc-linux, Thibaut VARENE

On Mon, Jul 24, 2006 at 09:10:04AM -0500, James Bottomley wrote:
> What Matthew means is that the L2 cache is PIPT ... you can't get
> aliasing effects in a PIPT cache, so for the purposes of the problem it
> must be ignorable, since we can only get aliasing effects in the L1
> cache which is VIPT.

While I agree in general that a PIPT cache won't have aliasing effects,
ISTR the virtual coherence index (VCI) is part of the "physical address".
If it's not, I'm confused how CPUs on different sockets remain coherent.
I expect the VCI is visible across the McKinley Bus and thus is part
of the physical address. IOMMU is also pushing out an address that
has VCI bits in it - so DMA remains coherent with CPU virtual addresses.

If I've got this right, then we can have aliasing in PIPT cache.
Willy, can you check the pa8800 ERS and look for "coherence index"
or similar, related words?

thanks,
grant

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-24 16:32           ` Grant Grundler
@ 2006-07-25 14:51             ` James Bottomley
  2006-07-25 16:13               ` John David Anglin
  2006-07-25 16:34               ` Thibaut VARENE
  0 siblings, 2 replies; 23+ messages in thread
From: James Bottomley @ 2006-07-25 14:51 UTC
  To: Grant Grundler; +Cc: Thibaut VARENE, parisc-linux, Matthew Wilcox

On Mon, 2006-07-24 at 10:32 -0600, Grant Grundler wrote:
> While I agree in general that a PIPT cache won't have aliasing effects,
> ISTR the virtual coherence index (VCI) is part of the "physical address".
> If it's not, I'm confused how CPUs on different sockets remain coherent.
> I expect the VCI is visible across the McKinley Bus and thus is part
> of the physical address. IOMMU is also pushing out an address that
> has VCI bits in it - so DMA remains coherent with CPU virtual addresses.
> 
> If I've got this right, then we can have aliasing in PIPT cache.
> Willy, can you check the pa8800 ERS and look for "coherence index"
> or similar, related words?

I don't believe the PIPT cache can have a coherence index; otherwise it
would effectively be VIPT (the CI is just a form of the virtual index).
A PIPT cache is fully addressable by the physical address alone.  So, for
I/O we use the CI to evict L1 lines and the plain physaddr (without a CI)
to evict PIPT lines.  Exclusivity means that if we find the line in the
L2 we don't even need to present the physaddr and CI to the L1.
However, verifying this in the docs would be a good thing ...
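
To make the indexing argument concrete, here is a minimal sketch in C, assuming a made-up geometry (4 KiB pages, 64-byte lines, a 1024-set VIPT L1 and a 16384-set PIPT L2; none of these are PA8800 figures). It shows how two virtual mappings of one physical page can select different L1 sets (inequivalent aliases), while the PIPT set is a function of the physical address alone. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry for illustration only; NOT the PA8800's values. */
#define PAGE_SHIFT   12u  /* 4 KiB pages: bits 0..11 are the page offset */
#define LINE_SHIFT    6u  /* 64-byte cache lines */
#define L1_SET_BITS  10u  /* VIPT L1 index = virtual address bits 6..15 */
#define L2_SET_BITS  14u  /* PIPT L2 index = physical address bits 6..19 */

/* VIPT L1: the set is selected from the virtual address. */
static unsigned l1_set(uint64_t vaddr)
{
    return (unsigned)((vaddr >> LINE_SHIFT) & ((1u << L1_SET_BITS) - 1u));
}

/* PIPT L2: the set is selected from the physical address alone, so every
 * virtual alias of a physical line lands in the same L2 set. */
static unsigned l2_set(uint64_t paddr)
{
    return (unsigned)((paddr >> LINE_SHIFT) & ((1u << L2_SET_BITS) - 1u));
}

/* Two mappings of the same physical byte share the page-offset bits, so
 * they are equivalent aliases exactly when the L1 index bits above
 * PAGE_SHIFT (here bits 12..15) also agree, i.e. they pick the same set. */
static int equivalent_aliases(uint64_t va1, uint64_t va2)
{
    return l1_set(va1) == l1_set(va2);
}
```

With this geometry, 0x10000 and 0x20000 are equivalent aliases, while 0x10000 and 0x23000 are not, because they differ in address bits 12..15.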

However, here's the kicker ... I think we can get incoherence from clean
aliases in a combined VIPT/PIPT environment.  This is impossible in a
pure VIPT environment, and it is something Linux isn't expecting in its
cache management.

The Linux theory of managing inequivalent aliases is that as long as we
ensure that we only ever have one dirty alias, we can never get
corruption (and as long as we flush the aliases before reassigning the
page, we can never get incoherence from clean lines).  However, Linux is
perfectly happy to tolerate incoherence on the kernel/user addresses
provided only one alias is dirty.  We get this in COW breaking (where we
bring in a spurious clean copy of the old page at the kernel VA) and at
other I/O-related places.

However, in a combined VIPT/PIPT hierarchy, there is a nasty scenario to
do with cache eviction.  In general, cache lines are selected for eviction
(a process called victim selection) on an LRU basis.  For a single cache
level, a clean victim is just discarded and a dirty victim is written back
to main memory.  In a cache hierarchy, eviction means the line is written
from level n-1 to level n of the hierarchy.  This gives a scenario where a
clean line is actually rewritten to a lower cache in the hierarchy
rather than being discarded.  If the hierarchy is fully VIPT, this
doesn't matter.
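
The victim-handling rule described above can be written down as a small sketch in C. The enum and function names are hypothetical, and the hierarchy case encodes this message's description (clean victims move down a level too) rather than any documented PA8800 behaviour; the LRU helper is the textbook policy, not the PA8800's actual replacement logic.

```c
#include <assert.h>
#include <stdbool.h>

/* What happens to a victim line, following the description above. */
enum victim_action { DISCARD, WRITE_BACK_TO_MEMORY, MOVE_TO_NEXT_LEVEL };

static enum victim_action choose_victim_action(bool dirty, bool next_level_exists)
{
    if (next_level_exists)
        return MOVE_TO_NEXT_LEVEL;          /* clean lines move down too */
    return dirty ? WRITE_BACK_TO_MEMORY : DISCARD;
}

/* Textbook LRU victim selection within one set: pick the way whose
 * last-use timestamp is smallest (the least recently used one). */
static int lru_victim(const unsigned long *last_use, int ways)
{
    int victim = 0;
    for (int w = 1; w < ways; w++)
        if (last_use[w] < last_use[victim])
            victim = w;
    return victim;
}
```

The key point is the first branch: in a hierarchy the clean/dirty bit no longer decides whether the data survives eviction, which is what opens the door to clean-alias incoherence.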

However, if the lower cache is PIPT, it has to do alias combining like
this: the L1 presents the tag and data to the PIPT L2 (discarding the
virtual index, which the PIPT cache doesn't care about), so the L2 stores
the line fully physically indexed and tagged.  Now, assume we have two
inequivalent aliases of the same physical line P at virtual indexes V1
(containing data D1) and V2 (containing data D2).  Even if these are
clean, you get this behaviour: the L1 selects V1 for eviction and
writes its contents to the L2 at P (which now caches D1).  However, at a
later time the LRU algorithm selects V2 for eviction and writes the line
to the L2 at P (which now caches D2).  If the process later reads V1 for
P, it gets data D2 (an inequivalent alias incoherence of clean lines).
Theoretically, as long as the LRU algorithm is perfect, we should never
get this effect, since the oldest line should be evicted first, and that
should be the one containing the stale data.  However, my bet (which
would require verification in the PA8800 ERS) is that the LRU isn't
perfect, so there's a tiny window, most likely to be hit on COW
breaking, where the kernel and the user get different data aliases, with
the kernel one being only slightly older.  If the slightly younger user
alias is selected for eviction first, we get the old kernel data in the
PIPT cache, which is read back in when the user accesses the address again.
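
A toy simulation of this alias-combining scenario, in C, may help. It assumes a single physical line P, two inequivalent aliases V1 (holding stale D1, e.g. the kernel copy left by a COW break) and V2 (holding newer D2, e.g. the user copy), both clean, and an L2 that simply takes whatever the L1 evicts. All names are hypothetical, and the model only illustrates the theory, which still needed verification against the PA8800 ERS.

```c
#include <assert.h>
#include <stdbool.h>

/* One physical line P, two inequivalent virtual aliases in different
 * VIPT L1 sets, and the single PIPT L2 slot for P. Illustration only. */
enum { V1 = 0, V2 = 1, NALIASES = 2 };

struct line { bool valid; int data; };

struct hier {
    struct line l1[NALIASES];  /* L1 copies, one per virtual alias */
    struct line l2;            /* L2 copy, keyed by P only */
    int memory;                /* DRAM contents of P */
};

/* Evicting an L1 line moves it to the L2 at P even when it is clean; the
 * virtual index is dropped, so whichever alias is evicted last wins. */
static void evict_l1(struct hier *h, int alias)
{
    if (!h->l1[alias].valid)
        return;
    h->l2 = h->l1[alias];
    h->l1[alias].valid = false;
}

/* A load through one alias: L1 hit, else refill from L2, else memory. */
static int load(struct hier *h, int alias)
{
    struct line *l = &h->l1[alias];
    if (!l->valid) {
        l->data = h->l2.valid ? h->l2.data : h->memory;
        l->valid = true;
    }
    return l->data;
}
```

With V1 = 1 (stale), V2 = 2 (newer, already written back so memory holds 2): evicting V1 then V2 (perfect LRU) leaves D2 in the L2, and a reload of V2 returns 2. Evicting V2 first and then V1 leaves D1 in the L2, so the reload of V2 returns the stale value 1.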

Note: this scenario only works if both lines are clean ... to get the
modified D2, you have to dirty a line and then re-clean it, i.e. write
it out and then read it back again so fast that it's still in the
imperfect LRU window.  This seems impossible until you consider that
there are lots of operations (mainly around I/O) that will cause cache
flushing of the user alias.  So the final scenario seems to be: a COW
break creates the kernel alias, then a user write, user flush and user
read, all within the imperfect LRU window (which would be incredibly rare).

This theory seems to explain all of the behaviour, why we see it so
rarely, and why it seems to happen around forks when I/O is going on in
the system.  Unfortunately it also seems to indicate that the only way
of fixing it would be to have full equivalence.

James


* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-25 14:51             ` James Bottomley
@ 2006-07-25 16:13               ` John David Anglin
  2006-07-25 16:17                 ` James Bottomley
  2006-07-25 16:34               ` Thibaut VARENE
  1 sibling, 1 reply; 23+ messages in thread
From: John David Anglin @ 2006-07-25 16:13 UTC (permalink / raw)
  To: James Bottomley; +Cc: Matthew Wilcox, parisc-linux, Thibaut VARENE

> This theory seems to explain all of the behaviour, why we see it so
> rarely, and why it seems to happen around forks when I/O is going on in
> the system.  Unfortunately it also seems to indicate that the only way
> of fixing it would be to have full equivalence.

How about disabling L2?  This should test your theory.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-25 16:13               ` John David Anglin
@ 2006-07-25 16:17                 ` James Bottomley
  2006-07-25 16:46                   ` Kyle McMartin
  2006-07-25 22:02                   ` Grant Grundler
  0 siblings, 2 replies; 23+ messages in thread
From: James Bottomley @ 2006-07-25 16:17 UTC (permalink / raw)
  To: John David Anglin; +Cc: Matthew Wilcox, parisc-linux, Thibaut VARENE

On Tue, 2006-07-25 at 12:13 -0400, John David Anglin wrote:
> How about disabling L2?  This should test your theory.

Kyle's working on it based on Grant's original work ... you need access
to a lot of secret diag registers to do this, though, so it's not easy.

James
 

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-25 14:51             ` James Bottomley
  2006-07-25 16:13               ` John David Anglin
@ 2006-07-25 16:34               ` Thibaut VARENE
  2006-07-25 16:37                 ` Thibaut VARENE
  1 sibling, 1 reply; 23+ messages in thread
From: Thibaut VARENE @ 2006-07-25 16:34 UTC (permalink / raw)
  To: James Bottomley; +Cc: parisc-linux

On 7/25/06, James Bottomley <James.Bottomley@steeleye.com> wrote:

> This theory seems to explain all of the behaviour, why we see it so
> rarely, and why it seems to happen around forks when I/O is going on in
> the system.  Unfortunately it also seems to indicate that the only way
> of fixing it would be to have full equivalence.

So basically what you explain here should only affect pa8800, and what
we see on other machines is something else, correct?

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-25 16:34               ` Thibaut VARENE
@ 2006-07-25 16:37                 ` Thibaut VARENE
  0 siblings, 0 replies; 23+ messages in thread
From: Thibaut VARENE @ 2006-07-25 16:37 UTC (permalink / raw)
  To: James Bottomley; +Cc: parisc-linux

On 7/25/06, Thibaut VARENE <T-Bone@parisc-linux.org> wrote:
> On 7/25/06, James Bottomley <James.Bottomley@steeleye.com> wrote:
>
> > This theory seems to explain all of the behaviour, why we see it so
> > rarely, and why it seems to happen around forks when I/O is going on in
> > the system.  Unfortunately it also seems to indicate that the only way
> > of fixing it would be to have full equivalence.
>
> So basically what you explain here should only affect pa8800, and what
> we see on other machines is something else, correct?

And another quick dump: it's "rare" only if we have one socket enabled in
the system. Enable both sockets and the system won't even boot...
That might be something else again, though.
Still, it's probably a clue.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-25 16:17                 ` James Bottomley
@ 2006-07-25 16:46                   ` Kyle McMartin
  2006-07-25 22:02                   ` Grant Grundler
  1 sibling, 0 replies; 23+ messages in thread
From: Kyle McMartin @ 2006-07-25 16:46 UTC (permalink / raw)
  To: James Bottomley
  Cc: John David Anglin, Thibaut VARENE, parisc-linux, Matthew Wilcox

On Tue, Jul 25, 2006 at 11:17:19AM -0500, James Bottomley wrote:
> Kyle's working on it based on Grant's original work ... you need access
> to a lot of secret diag registers to do this, though, so it's not easy.
> 

WTF? Am not.

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-25 16:17                 ` James Bottomley
  2006-07-25 16:46                   ` Kyle McMartin
@ 2006-07-25 22:02                   ` Grant Grundler
  2006-07-26 21:54                     ` James Bottomley
  1 sibling, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2006-07-25 22:02 UTC (permalink / raw)
  To: James Bottomley
  Cc: John David Anglin, Matthew Wilcox, parisc-linux, Thibaut VARENE

On Tue, Jul 25, 2006 at 11:17:19AM -0500, James Bottomley wrote:
> On Tue, 2006-07-25 at 12:13 -0400, John David Anglin wrote:
> > How about disabling L2?  This should test your theory.
> 
> Kyle's working on it based on Grant's original work ... you need access
> to a lot of secret diag registers to do this, though, so it's not easy.

You misheard. I'll be poking at this again in the near future.
I've gotten the code to modify diag registers done.
I just need to test the L2 disable and write a wrapper that
does the necessary I/D cache flushing.

grant

* Re: [parisc-linux] The problem on the PA8800 is all in the data-cache.
  2006-07-25 22:02                   ` Grant Grundler
@ 2006-07-26 21:54                     ` James Bottomley
  0 siblings, 0 replies; 23+ messages in thread
From: James Bottomley @ 2006-07-26 21:54 UTC (permalink / raw)
  To: Grant Grundler
  Cc: John David Anglin, Thibaut VARENE, parisc-linux, Matthew Wilcox

On Tue, 2006-07-25 at 16:02 -0600, Grant Grundler wrote:
> You misheard. I'll be poking at this again in the near future.
> I've gotten the code to modify diag registers done.
> I just need to test the L2 disable and write a wrapper that
> does the necessary I/D cache flushing.

Actually, I found a different way to prove the theory.

In Linux, we're careful about sequential accesses to memory (i.e. only
the user or the kernel may touch a page at any one time) to avoid the
dirty-line aliasing that bites on all architectures.  In fact, the kernel
takes this care almost everywhere, principally because we don't
necessarily know whether the user line is dirty.  Thus it's possible
to use this guarantee to make the kernel fully equivalently mapped.

The primary engine of this guarantee is kmap/kunmap (and the _atomic
variants).  When we enter a kmap region, we're guaranteed that the user
mappings have been flushed, so all we really have to do is flush the
kernel mappings when we leave.

This kernel-mapping flush is a simple two-line addition to highmem.h and,
with it (as demonstration code), the PA8800 works reliably.
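
The kmap/kunmap change can be sketched as a toy model in C. sim_kmap, sim_kunmap and the other names below are hypothetical stand-ins, not the kernel's kmap API or the actual highmem.h patch; the model only shows the intended effect: after kunmap, no copy of the line, clean or dirty, remains at the kernel address.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page seen through its kernel virtual alias. */
struct kline { bool valid; bool dirty; int data; };

struct sim_page {
    int memory;            /* backing RAM for the page */
    struct kline kalias;   /* the page's line cached at the kernel VA */
};

/* Write back (if dirty) and invalidate the kernel alias line. */
static void flush_kernel_alias(struct sim_page *pg)
{
    if (pg->kalias.valid && pg->kalias.dirty)
        pg->memory = pg->kalias.data;
    pg->kalias.valid = false;
    pg->kalias.dirty = false;
}

/* Entry: per the thread, user aliases are already flushed at this point,
 * so the model just hands back the kernel alias. */
static struct kline *sim_kmap(struct sim_page *pg)
{
    return &pg->kalias;
}

/* Kernel load through the mapping: a miss brings in a clean copy. */
static int kernel_load(struct sim_page *pg, struct kline *l)
{
    if (!l->valid) {
        l->valid = true;
        l->dirty = false;
        l->data = pg->memory;
    }
    return l->data;
}

/* Kernel store through the mapping: the line becomes valid and dirty. */
static void kernel_store(struct kline *l, int value)
{
    l->valid = true;
    l->dirty = true;
    l->data = value;
}

/* Exit: flush the kernel alias so nothing is left to be evicted later. */
static void sim_kunmap(struct sim_page *pg)
{
    flush_kernel_alias(pg);
}
```

Note that the flush on exit also removes a merely clean kernel copy (the read-only case), which matters here because the suspected corruption comes from clean aliases.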

This fix handles the gross, first-order (O(1)) effects.  There will still
be second-order (O(2)) regions where this guarantee needs to be fixed up
independently of kmap/kunmap.  However, the O(1) fix gets the PA8800 to
the point where sshd works and we get only the occasional segfault (I've
only found one so far in an hour of stress testing).

The other thing that will be contributing to O(2) effects is cache
move-in.  Even though we flush the page, the CPU is entitled to move it
back into the cache if it finds a TLB entry, so we have to start purging
the TLB entries when we flush as well.
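
The move-in problem can be illustrated with a toy model in C, under the assumption that the CPU may speculatively refill a line only while a TLB entry for that virtual address is live. The names are hypothetical, and purging before flushing is simply one ordering that closes the window in this model, not necessarily what the eventual patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-VA state: is a translation live, is the line cached? */
struct va_state {
    bool tlb_valid;     /* translation for the VA is present in the TLB */
    bool line_valid;    /* a (clean) copy of the line is in the cache */
};

static void flush_line(struct va_state *s) { s->line_valid = false; }
static void purge_tlb(struct va_state *s)  { s->tlb_valid = false; }

/* Speculative move-in: possible whenever a translation is available. */
static void speculate(struct va_state *s)
{
    if (s->tlb_valid)
        s->line_valid = true;
}

/* Purge the TLB entry first, then flush: once the translation is gone,
 * speculate() can no longer bring the line back. */
static void purge_then_flush(struct va_state *s)
{
    purge_tlb(s);
    flush_line(s);
}
```

A plain flush_line() followed by speculate() leaves the line cached again, whereas purge_then_flush() followed by speculate() does not.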

The final problem (once we have the O(2) effects all sorted) will be to
get this upstream.  The kmap API is not supposed to care about coherence
effects, and now I'm making it do so, so I'll try to sell this to the arch
maintainers as eliminating a lot of the kmap/flush_kernel_dcache_page/kunmap
that we do today ... I hope.

James



end of thread

Thread overview: 23+ messages
2006-07-22 17:50 [parisc-linux] The problem on the PA8800 is all in the data-cache Carlos O'Donell
2006-07-23  1:01 ` Thibaut VARENE
2006-07-23 16:28   ` Michael S. Zick
2006-07-23 22:03     ` Thibaut VARENE
2006-07-24  1:40       ` Kyle McMartin
2006-07-24  2:39         ` Thibaut VARENE
2006-07-24  2:33 ` James Bottomley
2006-07-24  2:54   ` Thibaut VARENE
2006-07-24  3:32     ` Matthew Wilcox
2006-07-24  4:15       ` Thibaut VARENE
     [not found]         ` <1153750204.1235.18.camel@mulgrave.il.steeleye.com>
2006-07-24 16:32           ` Grant Grundler
2006-07-25 14:51             ` James Bottomley
2006-07-25 16:13               ` John David Anglin
2006-07-25 16:17                 ` James Bottomley
2006-07-25 16:46                   ` Kyle McMartin
2006-07-25 22:02                   ` Grant Grundler
2006-07-26 21:54                     ` James Bottomley
2006-07-25 16:34               ` Thibaut VARENE
2006-07-25 16:37                 ` Thibaut VARENE
2006-07-24 14:58       ` John David Anglin
     [not found]     ` <1153711459.1235.13.camel@mulgrave.il.steeleye.com>
2006-07-24  4:26       ` Thibaut VARENE
2006-07-24  4:31         ` Thibaut VARENE
2006-07-24 14:51   ` James Bottomley
