public inbox for linux-ia64@vger.kernel.org
* [Linux-ia64] psr.dt state when DO_SAVE_MIN is invoked
@ 2002-03-06 19:45 Luck, Tony
  2002-03-06 20:22 ` David Mosberger
  2002-03-06 20:59 ` Fleckenstein, Chuck
  0 siblings, 2 replies; 3+ messages in thread
From: Luck, Tony @ 2002-03-06 19:45 UTC (permalink / raw)
  To: linux-ia64

Some systems running an old kernel (2.4.7) have been seen to
hang, looping in an apparent recursive TLB fault.  The same tests
that locked up these machines seem to run fine on newer kernels,
but while looking into the issue the following oddity was
noted in the code; it still exists in 2.4.18.

In arch/ia64/kernel/ivt.S we have:
ENTRY(page_fault)
        ssm psr.dt
        ;;
        srlz.i
        ;;
        SAVE_MIN_WITH_COVER

and minstate.h defines:

#define SAVE_MIN_WITH_COVER     DO_SAVE_MIN(cover, mov rCRIFS=cr.ifs,)

which in turn says:

/*
 * DO_SAVE_MIN switches to the kernel stacks (if necessary) and saves
 * the minimum state necessary that allows us to turn psr.ic back
 * on.
 *
 * Assumed state upon entry:
 *      psr.ic: off
 *      psr.dt: off
 *      r31:    contains saved predicates (pr)
 *
	...

See how page_fault explicitly sets psr.dt, and then invokes a macro
that says that the assumed entry state is psr.dt should be off. Is
the comment just plain wrong, or is there a potential issue here?

The 2.4.7 failure hits at around the 67 hour mark in the tests, the
newer (RedHat 7.2 a.k.a. 2.4.9-18) kernel survives 72 hours ... but
that's as long as we scheduled the test to run.

-Tony



* Re: [Linux-ia64] psr.dt state when DO_SAVE_MIN is invoked
From: David Mosberger @ 2002-03-06 20:22 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 6 Mar 2002 11:45:38 -0800, "Luck, Tony" <tony.luck@intel.com> said:

  Tony> See how page_fault explicitly sets psr.dt, and then invokes a
  Tony> macro that says that the assumed entry state is psr.dt should
  Tony> be off. Is the comment just plain wrong, or is there a
  Tony> potential issue here?

The comment is wrong.  It used to be correct, but when the MC folks
changed the code to do the min save through virtual space, the comment
apparently wasn't updated.  I'll fix this.

Thanks,

	--david



* RE: [Linux-ia64] psr.dt state when DO_SAVE_MIN is invoked
From: Fleckenstein, Chuck @ 2002-03-06 20:59 UTC (permalink / raw)
  To: linux-ia64


Not sure if this is the same problem you are seeing, but with earlier
kernels we were seeing a recursive fault and tracked it down to an old
serialization bug (it has since been fixed via a code rewrite in newer
kernels).

The problem was a serialization issue in ia64_switch_to (entry.S): the
load of the stack pointer could be done before the insertion of the new
mapping was completed.  We just moved the load down below the ic
serialization to make sure the insertion was completed before trying to
do the access.  The itr.d is located at the bottom of the switch_to
routine, and the code then immediately branches to .done.

Not sure if this is the same issue you are encountering.

my 0.5 cents worth for today...

Chuck

###############  diffs between non faulting and faulting kernels...

Index: entry.S
===================================================================
RCS file: /ehome/cvs/CVSROOT/linux.sv/arch/ia64/kernel/entry.S,v
retrieving revision 1.13
retrieving revision 1.12
diff -c -r1.13 -r1.12
*** entry.S	2002/01/07 23:36:43	1.13
--- entry.S	2001/12/05 00:16:07	1.12
***************
*** 153,163 ****
  (p6)	cmp.eq p7,p6=r26,r27
  (p6)	br.cond.dpnt.few .map
  	;;
! .done:
  (p6)	ssm psr.ic			// if we we had to map, renable the psr.ic bit FIRST!!!
  	;;
  (p6)	srlz.d
- 	ld8 sp=[r21]			// load kernel stack pointer of new task
  	mov IA64_KR(CURRENT)=r20	// update "current" application register
  	mov r8=r13			// return pointer to previously running task
  	mov r13=in0			// set "current" pointer
--- 153,162 ----
  (p6)	cmp.eq p7,p6=r26,r27
  (p6)	br.cond.dpnt.few .map
  	;;
! .done:	ld8 sp=[r21]			// load kernel stack pointer of new task
  (p6)	ssm psr.ic			// if we we had to map, renable the psr.ic bit FIRST!!!
  	;;
  (p6)	srlz.d
  	mov IA64_KR(CURRENT)=r20	// update "current" application register
  	mov r8=r13			// return pointer to previously running task
  	mov r13=in0			// set "current" pointer

