From: Don Slutz <dslutz@verizon.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Ian Campbell <ian.campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Don Slutz <dslutz@verizon.com>,
	xen-devel@lists.xen.org
Subject: Re: [PATCH 1/1] xentrace: Add TRC_HW_VCHIP
Date: Fri, 28 Mar 2014 09:18:57 -0400
Message-ID: <533576C1.1070203@terremark.com>
In-Reply-To: <53356EF402000078000035D7@nat28.tlf.novell.com>


On 03/28/14 07:45, Jan Beulich wrote:
>>>> On 28.03.14 at 12:25, <dslutz@verizon.com> wrote:
>> This add a set of trace events that track the setup of various
>> virtual chips related to timers in domU.
>>
>> This set is hpet, pit (i8253, i8254), rtc (MC146818), apic (lapic),
>> and pic (i8259).  The pmtimer is not traced since it does not have a
>> changeable rate.
> But you're not saying anything about why this would be useful
> (considering that it wasn't needed before), and hence don't
> provide a reason for taking this change.

Thank you for asking.  I take it from this that some patches
(like this one) should include a justification.

This is an area that I am very weak on.  No simple statement
comes to mind.  So here is the story of how this patch came
about.

Months ago, over a period of about 2 to 3 days, one server would
have its single domU hang after first boot (1 time out of 10), with
the first interesting message on the domU console being:

..MP-BIOS bug: 8254 timer not connected to IO-APIC

Since I know that this message was added to deal with certain
bad motherboards and that xen does not have this issue, I
started looking into this.

I considered adding code like:

         HVM_DBG_LOG(DBG_LEVEL_VLAPIC_TIMER, "value[0x%016"PRIx64"]", value);

but was not sure this would not change the timer enough to
stop the bug from happening.  So I added this patch.

I then spent a lot of time trying to reproduce this issue.  I was
not able to, and neither was the person who reported it, on the
server it was reported from.  This was under various configurations:

1) No change.

2) debug=y xen build

3) debug=n + patch

4) debug=y + patch

Using a few of the trace files and the source code of the domU's
kernel, I was able to determine that the hpet.c code was involved.

Using this knowledge, I made a patch to xen to simulate various
values of "diff" (tn_cmp - cur_tick).  With this debug code I was able
to generate the hang on demand.  This work is what caused me to
post the patch:

hpet: Act more like real hardware
http://lists.xen.org/archives/html/xen-devel/2014-02/msg02408.html

I now know that patch is not complete.  More testing since then
has shown that 'diff > 0' will also cause this report if diff is large
enough.  Armed with this I went back to a few saved traces that I
had and was able to determine that the first interval in the calls to
create_periodic_time() (i.e. diff) had a very high variance.  I no
longer have the actual data, but my recollection is that the
hpet_tick_to_ns(h, diff) values ranged from 23,696ns to 955,456ns.
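
As a rough illustration of the arithmetic above (a sketch only, not the
code in Xen's hpet.c): diff is the signed distance from the current
counter value to the comparator, and it is converted to nanoseconds
using the counter period.  The helper names hpet_diff_ticks() and
ticks_to_ns() are hypothetical, and the 16ns counter period is an
assumption made for the example; the real hpet_tick_to_ns() takes the
HPET state and may scale differently.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION for illustration: counter period of 16 ns, in femtoseconds. */
#define EXAMPLE_HPET_PERIOD_FS 16000000ULL

/*
 * Hypothetical helper: signed distance, in ticks, from the current
 * counter value to the comparator.  A negative result means the
 * comparator is already in the past.
 */
static int64_t hpet_diff_ticks(uint64_t tn_cmp, uint64_t cur_tick)
{
    return (int64_t)(tn_cmp - cur_tick);
}

/*
 * Hypothetical helper: convert a tick count to nanoseconds given the
 * counter period in femtoseconds.  Only meant for small tick counts;
 * the multiplication can overflow for very large values.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t period_fs)
{
    assert(period_fs > 0);
    return ticks * period_fs / 1000000ULL;
}
```

Under the assumed 16ns period, the two values quoted above would
correspond to first intervals of 1481 and 59716 ticks.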

Further reading of the Linux code in this area, and of the HPET hardware
and specification, leads me to conclude that this should not be happening.

I am still working on the set of changes to the hpet.c code to fix the
set of bugs that I think are there.


So all I know is that this patch did provide very useful data to me.  I
would think that it would be a help to developers in the future.

I think one could write a complex analysis program that takes the
current trace data and infers some of what this new trace data would
show.  That seems a lot harder to me.  Since this is a new,
independently selectable trace, a developer can decide to include or
exclude these events.

This is long (and not a good reason for taking this change) but I
hope it helps.

    -Don Slutz



> Jan
>
