xen-devel.lists.xenproject.org archive mirror
From: Don Slutz <Don@CloudSwitch.Com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Keir Fraser <keir@xen.org>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Don Slutz <dslutz@verizon.com>,
	xen-devel@lists.xen.org, David Vrabel <david.vrabel@citrix.com>
Subject: Re: [PATCH v2 0/2] Add xen-crashd.
Date: Mon, 2 Dec 2013 12:09:44 -0500
Message-ID: <529CBED8.2000103@CloudSwitch.Com>
In-Reply-To: <1385720807.20209.58.camel@kazak.uk.xensource.com>



On 11/29/13 05:26, Ian Campbell wrote:
> On Fri, 2013-11-15 at 14:20 -0500, Don Slutz wrote:
>
>>    Ian Campbell:
>>      Add 1st pass on some documentation on crash's remote protocol.
> My concern with this was that we were using some sort of internal crash
> protocol which has no ABI stability guarantees etc. Documenting it in
> the Xen tree doesn't really do anything to alleviate that concern. It
> should be a protocol which is published by the crash folks not us.
I have no issues with this.  The only documentation I can find is:

    http://people.redhat.com/anderson/crash_whitepaper/



> Ideally they would agree to some sort of protocol stability level, or
> maybe you can show that the protocol had inbuilt backward and forward
> compatibility capabilities already?
It may not have the best backward and forward compatibility that could be designed. However, so far I have been able to add features to a newer crash without causing any issues with older "crashd" servers, and older crash code works fine with the newer "crashd" servers. This is not the first one of these I have coded, just the first that I can release.
> Even more concerning is [0] where one of the crash maintainers says:
>> It's been deprecated for almost 10 years now.  I don't understand how
>> you have been able to even get it to build, never mind work as the mail
>> thread indicates?
> We surely don't want to be adding code which relies on a protocol which
> has been deprecated for 10 years!
The main reason that I know of is that crash in active mode (i.e. running live on machine A) is just so much simpler to use than a remote crash on machine B talking to a crashd on machine A. This is because the crashd on machine A is in "live" mode, which means that slow or unresponsive systems cannot be examined using the remote protocol. And keeping the right kernel versions that you need on machine B is just overhead.

With all this in mind, I was not surprised that it had been deprecated for 10 years. However, with Xen in the mix, machine A no longer needs to be active to run "crashd"; in fact it can be paused, running, crashed, shut down, etc.


> Daniel K asked about gdbsx -- can that not speak to crash somehow?
It is clearly possible to write a translator from the remote crash protocol to a remote gdb server, but needing to run 2 servers to connect up crash is, to me, too complex. I could also embed the xen-crashd code in gdbsx by adding command line options; however, very little code would be shared. Since I based xen-crashd on xenctx, it currently uses libxc calls, whereas gdbsx uses ioctl() directly to do the hypercalls. gdbsx also does not appear to support physical addresses, or virtual address to physical address conversion. Quoting from the crash whitepaper:

    Furthermore, to examine the contents of a live system's kernel internals from user space, the only readily available option has been to use gdb on /proc/kcore. While gdb is an incredibly powerful tool, it is designed to debug user programs, and is not at all "kernel-aware". Consequently, using gdb alone has limited usefulness when looking at kernel memory, essentially constrained to the printing of kernel data structures */if/* the vmlinux file was built with the -g C flag, the disassembly of kernel text, and raw data dumps.
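
To illustrate the virtual address to physical address conversion mentioned above, here is a minimal sketch in Python. It is not code from xen-crashd, crash or gdbsx. The layout constants are assumptions: they are the fixed x86_64 mappings I would expect for a 2.6.18-era kernel, and the function name kvaddr_to_phys is made up for this example. Only the two fixed mappings are handled; vmalloc and module addresses need a real page-table walk.

```python
# Illustrative sketch only. The constants below are ASSUMED values for the
# x86_64 layout of 2.6.18-era kernels; they are not taken from xen-crashd.
START_KERNEL_MAP = 0xFFFFFFFF80000000  # base of the kernel text/data mapping
MODULES_VADDR = 0xFFFFFFFF88000000     # end of that mapping (modules follow)
PAGE_OFFSET = 0xFFFF810000000000       # base of the direct (linear) mapping
VMALLOC_START = 0xFFFFC20000000000     # end of the direct mapping


def kvaddr_to_phys(vaddr, phys_base):
    """Translate a kernel virtual address that lies in a fixed mapping.

    phys_base is the physical load offset of the kernel image, i.e. the
    value given to crash with --machdep phys_base=...
    """
    if START_KERNEL_MAP <= vaddr < MODULES_VADDR:
        # Kernel image: offset within the image plus its load address.
        return vaddr - START_KERNEL_MAP + phys_base
    if PAGE_OFFSET <= vaddr < VMALLOC_START:
        # Direct mapping: physical memory mapped linearly at PAGE_OFFSET.
        return vaddr - PAGE_OFFSET
    # vmalloc, modules and user addresses are not covered by this sketch.
    raise ValueError(f"{vaddr:#x} needs a page-table walk")


if __name__ == "__main__":
    for va in (0xFFFFFFFF802EEAE0, 0xFFFF8100BABD9000):
        print(f"{va:#x} -> {kvaddr_to_phys(va, 0x200000):#x}")
```

With phys_base=0x200000, a kernel-image address such as ffffffff802eeae0 maps to physical 0x4eeae0 under these assumptions, and a direct-map address such as ffff8100babd9000 maps to 0xbabd9000.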


>   Or
> run on /proc/vmcore directly, or be extended to do so?
There is no /proc/vmcore in this case. Extending dom0 Linux to provide /proc/1/vmcore, /proc/2/vmcore, etc. (i.e. /proc/<domid>/vmcore) would be a big change, and designing a security model for these would also not be quick.

Maybe this will help:

    [root@dcs-xen-54 tmp]# xl list
    Name                                        ID   Mem VCPUs      State   Time(s)
    Domain-0                                     0  2048     8     r-----    3928.9
    P-1-0                                        1  3080     1     -b----      18.0


    [root@dcs-xen-54 tmp]# /usr/lib/xen/bin/xen-crashd 1&
    [1] 1447
    [root@dcs-xen-54 tmp]#  2 Dec 13 11:38:01.042 socket ready on port 5001 after 1 bind call

    [root@dcs-xen-54 tmp]# crash --machdep phys_base=0x200000 localhost:5001 /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux

    crash 6.1.4
    Copyright (C) 2002-2013  Red Hat, Inc.
    Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
    Copyright (C) 1999-2006  Hewlett-Packard Co
    Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
    Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
    Copyright (C) 2005, 2011  NEC Corporation
    Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
    Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
    This program is free software, covered by the GNU General Public License,
    and you are welcome to change it and/or distribute copies of it under
    certain conditions.  Enter "help copying" to see the conditions.
    This program has absolutely no warranty.  Enter "help warranty" for details.

      2 Dec 13 11:38:08.917 Accepted a connection.
    WARNING: daemon cannot access /proc/version

    NOTE: setting phys_base to: 0x200000

    GNU gdb (GDB) 7.3.1
    Copyright (C) 2011 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-unknown-linux-gnu"...

           KERNEL: /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux
         DUMPFILE: /dev/mem@localhost  (remote live system)
             CPUS: 1
             DATE: Mon Dec  2 11:37:02 2013
           UPTIME: 00:33:11
    LOAD AVERAGE: 0.01, 0.00, 0.00
            TASKS: 81
         NODENAME: P-1-0.TC5.CloudSwitch.com
          RELEASE: 2.6.18-128.el5
          VERSION: #1 SMP Wed Jan 21 10:41:14 EST 2009
          MACHINE: x86_64  (2400 Mhz)
           MEMORY: 3 GB
              PID: 0
          COMMAND: "swapper"
             TASK: ffffffff802eeae0  [THREAD_INFO: ffffffff803dc000]
              CPU: 0
            STATE: TASK_RUNNING (ACTIVE)

    crash> net
        NET_DEVICE     NAME   IP ADDRESS(ES)
    ffffffff80321e80  lo     127.0.0.1
    ffff8100babd9000  eth1   172.16.64.65
    ffff8100b6c96000  sit0
    crash> q
    [1]+  Done                    /usr/lib/xen/bin/xen-crashd 1


This is almost the same as:

    [root@dcs-xen-54 tmp]# xl dump-core 1 p-1-0.vmore
    [root@dcs-xen-54 tmp]# crash p-1-0.vmore /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux

    crash 6.1.4
    Copyright (C) 2002-2013  Red Hat, Inc.
    Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
    Copyright (C) 1999-2006  Hewlett-Packard Co
    Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
    Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
    Copyright (C) 2005, 2011  NEC Corporation
    Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
    Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
    This program is free software, covered by the GNU General Public License,
    and you are welcome to change it and/or distribute copies of it under
    certain conditions.  Enter "help copying" to see the conditions.
    This program has absolutely no warranty.  Enter "help warranty" for details.

    GNU gdb (GDB) 7.3.1
    Copyright (C) 2011 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-unknown-linux-gnu"...

           KERNEL: /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux
         DUMPFILE: p-1-0.vmore
             CPUS: 1
             DATE: Mon Dec  2 11:05:09 2013
           UPTIME: 00:01:18
    LOAD AVERAGE: 2.00, 0.70, 0.24
            TASKS: 81
         NODENAME: P-1-0.TC5.CloudSwitch.com
          RELEASE: 2.6.18-128.el5
          VERSION: #1 SMP Wed Jan 21 10:41:14 EST 2009
          MACHINE: x86_64  (2400 Mhz)
           MEMORY: 3 GB
            PANIC: ""
              PID: 0
          COMMAND: "swapper"
             TASK: ffffffff802eeae0  [THREAD_INFO: ffffffff803dc000]
              CPU: 0
            STATE: TASK_RUNNING (ACTIVE)
          WARNING: panic task not found

    crash> net
        NET_DEVICE     NAME   IP ADDRESS(ES)
    ffffffff80321e80  lo     127.0.0.1
    ffff8100babd9000  eth1   172.16.64.65
    ffff8100b6c96000  sit0
    crash> quit

With the changes in crash 7.0.4 (yet to be released), crash can be invoked in a remote "not live" mode, which is how it runs on a vmcore file.

So if a DomU is paused, "xl dump-core; crash" and "xen-crashd; crash" will give exactly the same answers, with the xen-crashd case taking a lot less real time.
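
For anyone who wants to check the real-time difference on their own setup, a minimal sketch of a timing helper in Python follows. It is only an illustration: the helper name wall_time is made up, and it just measures how long an arbitrary command takes to run to completion.

```python
import subprocess
import time


def wall_time(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    # check=True raises CalledProcessError if the command exits non-zero,
    # so a failed run is not mistaken for a fast one.
    subprocess.run(argv, check=True)
    return time.monotonic() - start
```

For example, wall_time(["xl", "dump-core", "1", "p-1-0.vmore"]) would measure the dump step that the xen-crashd path avoids. Since crash itself is interactive, timing it would need its commands fed on stdin, which this sketch does not do.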


    -Don Slutz
>
> Ian.
>
> [0]
> http://thread.gmane.org/gmane.linux.kernel.crash-dump.crash-utility/4714/focus=4736
>

