From: Francois WELLENREITER <francois.wellenreiter@bull.net>
To: linux-ia64@vger.kernel.org
Subject: Re: 2.6.16 fails to resume after INIT in user space
Date: Wed, 05 Apr 2006 12:16:17 +0000
Message-ID: <4433B511.1010301@bull.net>
In-Reply-To: <12848.1144211334@kao2.melbourne.sgi.com>



Hi Keith and all,

Regarding this issue: INIT resume works well on the Bull Novascale 5160.

However, have you tested the INIT feature with a 2.6.15 kernel?
Since that kernel version, I have noticed that Intel Tiger machines
behave exactly as you describe below.
After a more detailed investigation with an ITP, I have seen that the
trouble always occurs when executing the following code:

________________________________________

ia64_old_stack:
    add regs=MCA_PT_REGS_OFFSET, r3
    mov b0=r2            // save return address
    GET_IA64_MCA_DATA(temp2)
    LOAD_PHYSICAL(p0,temp1,1f)
    ;;
    mov cr.ipsr=r0       // PSR after rfi is all zeroes, so psr.mc=0
    mov cr.ifs=r0
    mov cr.iip=temp1     // rfi should resume at the physical address of 1f
    ;;
    invala
    rfi                  // <--- INIT handler is re-entered here
________________________________________

After the rfi instruction, the kernel INIT handler is called again
instead of executing the code located at the "temp1" address.
Since we provide our own SAL version on NS5160 machines, I think that
the problem might be located at the SAL level.

My understanding is that the SAL may be mishandling the INIT event:
when the psr.mc bit is forced back to 0, the previous INIT signal is
no longer masked, and the entire INIT call chain is executed again.
But this is just a personal interpretation and I have no proof of it.
This point has been submitted to Intel gurus and is under investigation.
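
To make the hypothesis above concrete, here is a toy C model of it. This is
only a sketch and not kernel or SAL code: the struct, the function names and
the "sal_clears_pending" flag are all invented for illustration. The only
real detail is the psr.mc bit position (bit 35), and the model simply assumes,
as the hypothesis does, that a pending INIT is held off while psr.mc is 1 and
delivered as soon as rfi loads a PSR with psr.mc=0.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_MC (1ULL << 35)   /* psr.mc bit position on IA-64 */

/* Hypothetical model of one CPU; not a kernel structure. */
struct cpu_model {
    uint64_t psr;
    bool init_pending;        /* assumed: INIT stays latched until cleared */
    int init_entries;         /* how many times the OS INIT handler ran */
};

/* Deliver the pending INIT if psr.mc does not mask it. */
static void deliver_if_unmasked(struct cpu_model *c, bool sal_clears_pending)
{
    if (c->init_pending && !(c->psr & PSR_MC)) {
        c->init_entries++;
        c->psr |= PSR_MC;     /* handler runs with psr.mc=1 */
        if (sal_clears_pending)
            c->init_pending = false;
    }
}

/* Returns the number of handler entries after at most max_rfi rfi attempts. */
static int run_scenario(bool sal_clears_pending, int max_rfi)
{
    struct cpu_model c = { .psr = 0, .init_pending = true, .init_entries = 0 };

    deliver_if_unmasked(&c, sal_clears_pending);    /* first INIT entry */
    for (int i = 0; i < max_rfi; i++) {
        int before = c.init_entries;

        c.psr = 0;            /* rfi with cr.ipsr=r0: psr.mc becomes 0 */
        deliver_if_unmasked(&c, sal_clears_pending);
        if (c.init_entries == before)
            break;            /* no re-entry: execution continues at cr.iip */
    }
    return c.init_entries;
}
```

In this model, run_scenario(true, 3) enters the handler once and then
continues at cr.iip, while run_scenario(false, 3) re-enters the handler after
every rfi, which is the behaviour seen with the ITP. It says nothing about
whether the SAL really behaves this way.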

Best regards,

Francois WELLENREITER

>2.6.16 on SN2, compiled with gcc 3.3.3, no KDB.
>
>The SN2 controller 'NMI' command sends INIT to all processors, one as
>monarch, the rest as slaves.  If all the processors are in kernel space
>(including idle) then INIT resumes after dumping the process list.  If
>any of the processors are in user space then INIT claims to resume but
>gets something wrong, the system becomes dead.
>
>Send first NMI
>
>  Entered OS INIT handler. PSP=ffe301a0 cpu=0 monarch=0
>  cpu 0, INIT occurred in user space, original stack not modified
>  Entered OS INIT handler. PSP=ffe301a0 cpu=3 monarch=0
>  Entered OS INIT handler. PSP=ffe301a0 cpu=2 monarch=0
>  Entered OS INIT handler. PSP=ffe301a0 cpu=1 monarch=1
>  Delaying for 5 seconds...
>  Processes interrupted by INIT - 0 (cpu 1 task 0xe00000b47a4b8000) 0 (cpu 2 task 0xe00000b47a4e8000) 0 (cpu 3 task 0xe00000b47a500000)
>
>  ... process dump ...
>
>  INIT dump complete.  Monarch on cpu 1 returning to normal service.
>  Slave on cpu 0 returning to normal service.
>  Slave on cpu 3 returning to normal service.
>  Slave on cpu 2 returning to normal service.
>
>  ... No response ...
>
>Send second NMI
>
>  Entered OS INIT handler. PSP=ffe301a0 cpu=3 monarch=0
>  Entered OS INIT handler. PSP=ffe301a0 cpu=0 monarch=0
>  cpu 0, INIT inconsistent previous current and r13, original stack not modified
>  Entered OS INIT handler. PSP=ffe301a0 cpu=2 monarch=0
>  Entered OS INIT handler. PSP=ffe301a0 cpu=1 monarch=1
>  Delaying for 5 seconds...
>  Processes interrupted by INIT - 0 (cpu 1 task 0xe00000b47a4b8000) 0 (cpu 2 task 0xe00000b47a4e8000) 0 (cpu 3 task 0xe00000b47a500000)
>
>cpu 0 was running in user space during the first NMI, so the original
>stack was not modified.  On the second NMI, current for cpu 0 does not
>match r13.  Which means that something went wrong when processing the
>first NMI while the process was in user space.
>
>I am still investigating this problem, but any other eyes on the code
>would be appreciated.

Thread overview: 2 messages
2006-04-05  4:28 2.6.16 fails to resume after INIT in user space Keith Owens
2006-04-05 12:16 ` Francois WELLENREITER
