From: Steve Ofsthun
Subject: Re: [timer/ticks related] dom0 hang during boot on large 1TB system
Date: Mon, 21 Dec 2009 14:17:57 -0500
Message-ID: <4B2FC9E5.4050001@oracle.com>
References: <20091217203636.76a10aea@mantra.us.oracle.com> <20091218204318.180e58f3@mantra.us.oracle.com>
In-Reply-To: <20091218204318.180e58f3@mantra.us.oracle.com>
To: Mukesh Rathor
Cc: Dan Magenheimer, "Xen-devel@lists.xensource.com", Hackel, jeremy@goop.org, Keir Fraser, Kurt@acsinet11.oracle.com
List-Id: xen-devel@lists.xenproject.org

Mukesh Rathor wrote:
> On Fri, 18 Dec 2009 07:02:55 +0000
> Keir Fraser wrote:
>
>> On 18/12/2009 04:36, "Mukesh Rathor" wrote:
>>
>>> The other fix I thought of was to change INITIAL_JIFFIES to
>>> something sooner.
>>>
>>> Would appreciate any help, I don't understand xen time management
>>> well.
>> This isn't really Xen time code, but unchanged Linux time code. I
>> don't know which tree you quoted the code from -- 2.6.18 has similar
>> but not identical. Anyway, I suggest try using the jiffy-comparison
>> macros from : time_before(), time_after(), etc.
>> These are designed to work even when jiffies wraps. Feel free to send
>> patch(es) for that, if you test that out and it works okay.
>>
>> -- Keir
>>
>
> Ok, I came up with the following patch. Jeremy, can you please take a
> look also, and comment on my fix since I noticed you've got the same
> issue in your tree.
> Here's a summary for your benefit:
>
> init/calibrate.c : calibrate_delay_direct():
>
>     start_jiffies = get_jiffies_64();
>     while (get_jiffies_64() <= (start_jiffies + tick_divider)) {
>         pre_start = start;
>         read_current_timer(&start);
>     }
>

Linux time code explicitly forces jiffies (32-bit) to wrap soon after
boot to prevent other kernel code from making assumptions about jiffies
wrap. In your case, I'm guessing that the scrubbing delay is causing a
sufficient number of timer interrupts to be delayed (queued up) that it
is forcing jiffies to wrap earlier in the boot path than expected.

As Keir suggests, the correct solution is probably to use the
time_before()/time_after() macros appropriately. The proposed code
avoids the problem by accessing jiffies_64 instead.

> if first ever timer interrupt comes after start_jiffies is set, dom0 boot
> may hang if delta in timer_interrupt() is so huge that it causes jiffies
> to wrap. It appears delta is very large when memory is more than 512GB on
> certain boxes causing wrap around.
>
> why is delta in dom0->timer_interrupt() related to memory on system?
> Because hyp creates dom0, then page scrubs, then unpauses vcpu. so it
> appears lot of page scrubbing results in huge delta on first tick.

The problem here may be that timers are running in the domain while the
vcpu is not.

Steve
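
For illustration, here is a minimal userspace sketch of the wrap-safe
comparison discussed above, next to a naive "<=" check on a 32-bit tick
counter. It is not the kernel's code: sketch_time_after(),
sketch_time_before(), naive_still_waiting() and safe_still_waiting() are
hypothetical names invented for this example, and the tick values are
made up. The signed-difference trick only holds while the two values are
less than 2^31 ticks apart.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a wrap-safe "a is later than b" test.
 * Assumes a and b are less than 2^31 ticks apart. */
static inline bool sketch_time_after(uint32_t a, uint32_t b)
{
    /* The unsigned difference wraps modulo 2^32; reinterpreting it as
     * signed gives a negative value exactly when a is ahead of b. */
    return (int32_t)(uint32_t)(b - a) < 0;
}

/* Hypothetical stand-in for "a is earlier than b". */
static inline bool sketch_time_before(uint32_t a, uint32_t b)
{
    return sketch_time_after(b, a);
}

/* Naive loop condition: plain "<=" against start + ticks. */
static inline bool naive_still_waiting(uint32_t now, uint32_t start,
                                       uint32_t ticks)
{
    return now <= (uint32_t)(start + ticks);
}

/* Wrap-safe loop condition: keep waiting until now is past the deadline. */
static inline bool safe_still_waiting(uint32_t now, uint32_t start,
                                      uint32_t ticks)
{
    return !sketch_time_after(now, (uint32_t)(start + ticks));
}

/* Made-up scenario: the counter is sampled just before it wraps, then one
 * large delta carries it past zero before the next check. */
void wrap_example(void)
{
    uint32_t start = 0xFFFFFFF0u;  /* 16 ticks before the wrap */
    uint32_t now   = 0x00000010u;  /* 32 ticks later, already wrapped */

    /* Naive check thinks the deadline is still far in the future. */
    assert(naive_still_waiting(now, start, 4));
    /* Wrap-safe check sees that the deadline has passed. */
    assert(!safe_still_waiting(now, start, 4));
}
```

With the naive check, a loop of this shape would keep spinning until the
counter came all the way around to the deadline again, which is one way
to get the kind of boot hang described in this thread. Switching to a
64-bit counter, as the proposed patch does, sidesteps the wrap rather
than comparing across it.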