From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mukesh Rathor
Subject: Re: nmi cache weirdness???
Date: Fri, 15 Aug 2008 20:36:01 -0700
Message-ID: <48A64B21.90900@oracle.com>
Reply-To: mukesh.rathor@oracle.com
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

You are right! What's happening is that when the NMI button is pressed,
this hardware generates the NMI on all CPUs. During my previous
experiments on different systems, it was always only one CPU, the BSP,
that received the external NMI. I was not expecting this at all, and
given that the watchdog was disabled and there was nothing else to
generate an NMI, I was baffled.

Anyway, your response got me thinking about the possibility of a
different source.

Thanks,
Mukesh


Keir Fraser wrote:
> On 15/8/08 03:20, "Mukesh Rathor" wrote:
>
>> However, in do_nmi(), nmi_callback still points to dummy (receiving cpus).
>> What's interesting is, if I put two print lines back to back with nothing
>> in between right at the beginning, then the first prints dummy but the
>> second prints kdb_nmi_receive. I'm at a complete loss. Does NMI change
>> cache protocol? I've been looking thru Intel/AMD manuals, but nothing....
>
> What you describe is indeed impossible. My guess is that the NMI executing
> on the other CPUs is not the one triggered by smp_send_nmi_allbutself()
> immediately after set_nmi_callback(). For example, it could be a watchdog
> NMI or something like that.
>
> smp_send_nmi_allbutself() is not safe to call from within NMI context.
> send_IPI_mask() is not atomic, and it would be possible for an NMI handler
> to interrupt it, reenter it, and corrupt the IPI state being set up by the
> context that got interrupted. You can make it safe by saving/restoring the
> top half of the APIC ICR register, as that's what would get corrupted.
>
> -- Keir
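
Keir's suggestion of saving and restoring the top half of the ICR can be
illustrated with a small user-space simulation. This is only a sketch under
stated assumptions, not the Xen code: the register page is a plain array, and
apic_read(), apic_write(), ipi_set_dest(), ipi_fire(), nmi_handler_sends_ipi()
and simulate_interrupted_send() are hypothetical names invented for the
example. The register offsets (ICR low dword at 0x300, high dword at 0x310,
destination in bits 24-31 of the high dword) are the standard xAPIC layout.

```c
#include <assert.h>
#include <stdint.h>

/* Standard xAPIC register offsets; the register page itself is simulated. */
#define APIC_ICR     0x300u  /* ICR low dword: writing it sends the IPI */
#define APIC_ICR2    0x310u  /* ICR high dword: destination in bits 24-31 */
#define APIC_DM_NMI  0x400u  /* delivery mode field set to NMI */
#define APIC_PAGE_SZ 0x400u

static uint32_t fake_apic[APIC_PAGE_SZ / 4];  /* stand-in for the MMIO page */
static uint32_t last_dest = 0xffffffffu;      /* target of the latest simulated IPI */

/* Hypothetical accessors; real code would touch memory-mapped registers. */
static uint32_t apic_read(uint32_t off)
{
    assert(off % 4 == 0 && off < APIC_PAGE_SZ);
    return fake_apic[off / 4];
}

static void apic_write(uint32_t off, uint32_t val)
{
    assert(off % 4 == 0 && off < APIC_PAGE_SZ);
    fake_apic[off / 4] = val;
    if (off == APIC_ICR)  /* the "send": latch whatever destination ICR2 holds */
        last_dest = fake_apic[APIC_ICR2 / 4] >> 24;
}

/* Step 1 of the non-atomic send: program the destination. */
static void ipi_set_dest(uint32_t apicid)
{
    apic_write(APIC_ICR2, apicid << 24);
}

/* Step 2: writing the low dword triggers delivery. */
static void ipi_fire(uint32_t mode)
{
    apic_write(APIC_ICR, mode);
}

/* NMI handler that itself sends an IPI, optionally preserving ICR2. */
static void nmi_handler_sends_ipi(uint32_t apicid, int save_restore)
{
    uint32_t saved = apic_read(APIC_ICR2);  /* top half of the ICR */

    ipi_set_dest(apicid);
    ipi_fire(APIC_DM_NMI);

    if (save_restore)
        apic_write(APIC_ICR2, saved);  /* undo our change to the destination */
}

/* Returns the APIC ID that the interrupted sender's IPI ends up targeting. */
uint32_t simulate_interrupted_send(int save_restore)
{
    ipi_set_dest(3);                         /* interrupted context picks CPU 3 */
    nmi_handler_sends_ipi(5, save_restore);  /* NMI lands between the two writes */
    ipi_fire(APIC_DM_NMI);                   /* interrupted context resumes */
    return last_dest;
}
```

In this simulation, simulate_interrupted_send(1) returns 3, the destination
the interrupted context intended, while simulate_interrupted_send(0) returns
5 because the handler's destination is still sitting in the high dword when
the interrupted context writes the low dword.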