From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Manohar Vanga <manohar.vanga@gmail.com>
Cc: xen-devel@lists.xen.org
Subject: Re: Problem with simple scheduler
Date: Thu, 2 Jan 2014 09:59:21 +0000
Message-ID: <52C53879.7090609@citrix.com>
In-Reply-To: <CAEktxaGXyoBO9bdjxAcO=BH45-cbD52VLreGWxbAG+J_fRRRvA@mail.gmail.com>
On 02/01/14 06:46, Manohar Vanga wrote:
> Hi all,
>
> I've spent the last few weeks trying to debug a weird issue with a new
> scheduler I'm developing for Xen. I have written a barebones
> round-robin scheduler which seems to work fine when starting up Dom0,
> but then at some point during the boot everything just hangs (somewhat
> deterministically from what I can tell from a week of debugging; see
> below).
>
> I've inlined my source code below. I don't expect anyone to read the
> whole thing (although it's quite minimal) so here are the key points:
>
> * I've implemented the following callbacks: init_domain,
> destroy_domain, insert_vcpu, remove_vcpu, sleep, wake, yield,
> pick_cpu, do_schedule, init, deinit, alloc_vdata, free_vdata,
> alloc_pdata, free_pdata, alloc_domdata, free_domdata. Most of
> these are minimal (or in some cases do nothing). Am I missing
> anything critical?
> * The hang occurs even if I'm running Dom0 with just a single vcpu.
> Nothing hangs if I choose a stock scheduler. Either I'm doing
> something foolish that is causing a deadlock (less likely since
> the code structure is borrowed from sched_credit.c) or I'm *not*
> doing something leading to Dom0 crashing and the vcpu just dying.
>
> If you do suspect some specific issue please let me know. Below are
> some of the possible issues that I've investigated but hit dead ends on:
>
> * Checking if my debug printk statements were leading to a deadlock
> due to sleeps in interrupt mode. This doesn't seem to be the case
> since Dom0 hangs during boot even if I disable all debug output.
> * I suspected incorrect queuing operations that might be corrupting
> memory somewhere. However, my debug logs tell me that this is not
> the case. There is at most one element in the runqueue at all
> times (I use Dom0 with 1 vcpu).
> * I also suspected a deadlock due to incorrect locking. However,
> based on what the credit scheduler does in sched_credit.c, I
> don't seem to be doing anything significantly different. In
> general though, which callbacks run in interrupt context?
> * In the end, I stuck debug statements in tick_suspend and
> tick_resume and after the hang, those get called infinitely which
> seems like the physical CPU has gone idle. Is this correct? In
> that case, *what am I doing wrong in the scheduler* to cause Dom0
> to crash?
> * The hang occurs around 3-5 seconds into the boot process quite
> deterministically. Could it be some periodic timer going off and
> interfering with my code in weird and wonderful ways?
>
> Also, how do the sleep/wake/yield callbacks work? When do they get
> called? Is there any documentation on the different callbacks with
> regard to when they are called? If I understand everything correctly
> after this, I would gladly create a wiki page explaining this (and
> perhaps a tutorial on writing a simple scheduler; something I wish
> existed!).
>
> I hope the description was enough to help understand my problem. If
> not, feel free to ask for more details :-)
>
> Thanks for reading this far! Source code follows
Using printk()s in the code is going to skew the timing terribly.
A serial console and the 'q' debug key are probably a good start, to see
some vcpu state.
'watchdog' on the Xen command line will enable NMI watchdogs which will
catch deadlocks, but as I don't see a single use of spinlocks in your
code, I doubt this is your issue.
Beyond that, writing a custom keyhandler to dump all of the xfair state
is probably the next thing to try.
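As a rough illustration of the kind of state dump meant here, the sketch
below walks a runqueue and formats one line per vcpu. It is plain
user-space C, not Xen code: the struct layouts, field names, and
xfair_dump_runq() are all hypothetical, and a real keyhandler would print
with printk() and be registered through Xen's keyhandler interface, whose
exact signature depends on the tree in use.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-vcpu scheduler state; field names are illustrative only. */
struct xfair_vcpu {
    int domain_id;
    int vcpu_id;
    int runnable;             /* 1 if the vcpu currently wants CPU time */
    struct xfair_vcpu *next;  /* next entry on the runqueue, or NULL */
};

/* Hypothetical per-pcpu runqueue. */
struct xfair_runq {
    int cpu;
    struct xfair_vcpu *head;
};

/*
 * Format the runqueue into buf, one line per vcpu, truncating if buf is
 * too small. Returns the number of vcpus found on the queue, so a caller
 * can cross-check it against what the scheduler believes is queued.
 */
int xfair_dump_runq(const struct xfair_runq *rq, char *buf, size_t len)
{
    size_t off = 0;
    int count = 0;
    int n;

    assert(rq != NULL && buf != NULL && len > 0);

    n = snprintf(buf, len, "cpu %d runq:\n", rq->cpu);
    if (n > 0)
        off = ((size_t)n < len) ? (size_t)n : len - 1;

    for (const struct xfair_vcpu *v = rq->head; v != NULL; v = v->next) {
        if (off < len - 1) {
            n = snprintf(buf + off, len - off, "  d%dv%d %s\n",
                         v->domain_id, v->vcpu_id,
                         v->runnable ? "runnable" : "blocked");
            if (n > 0)
                off += ((size_t)n < len - off) ? (size_t)n : len - off - 1;
        }
        count++;  /* keep counting even if the text was truncated */
    }
    return count;
}
```

Comparing the returned count with the scheduler's own idea of how many
vcpus are queued is one cheap way to spot a vcpu that was put to sleep and
never re-inserted on wake.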
~Andrew