From mboxrd@z Thu Jan 1 00:00:00 1970
From: Atom2
Subject: Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM with PCI passthrough
Date: Tue, 04 Nov 2014 17:14:01 +0100
Message-ID: <5458FB49.4040801@web2web.at>
References: <544EB843.9060503@web2web.at> <1414493998.10206.3.camel@citrix.com> <544FB8C4.9000102@web2web.at> <1414512266.10974.5.camel@citrix.com> <54503440.3050302@web2web.at> <5452C43C.6050800@web2web.at> <5458ED27.8060502@web2web.at> <1415115868.11486.49.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1415115868.11486.49.camel@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 04.11.14 at 16:44, Ian Campbell wrote:
> On Tue, 2014-11-04 at 16:13 +0100, Atom2 wrote:
>> I assume it may be warranted to "upgrade" this issue to a bug status
>> (obviously also in the hope that it attracts wider interest) by
>> prefixing the subject line with a [BUG] prefix as per
>> http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen_Project. I have
>> exhausted all my options (including numerous IRC attempts), provided all
>> the information I have been asked for but the issue persists and nobody
>> seems to have an idea how to rectify the problem.
>
> Sorry for the delay, the issue is quite perplexing so I was intending to
> sleep on it, but didn't get any inspiration in doing so...
Thanks for getting back ... obviously sometimes sleep is not the right cure.
>
> In the gdb traces you provided there is:
> #10 read_all (fd=10, data=data@entry=0x7ffff0000a10, len=len@entry=16, nonblocking=nonblocking@entry=0) at xs.c:374
>
Just to be on the same page: That was for the destroy case.
The corresponding line for the create case was:
#10 read_all (fd=18, data=data@entry=0x7fffe80008d0, len=len@entry=16, nonblocking=nonblocking@entry=0) at xs.c:374
I don't know whether that makes any difference though.
> which seems to correspond to the
> if (!read_all(h->fd, &msg->hdr, sizeof(msg->hdr), nonblocking)) { /* Cancellation point */
I also had a look at xs.c in the source tree, and there are three source
files with that name:
tools/xenstore/xs.c
tools/python/xen/lowlevel/xs/xs.c
extras/mini-os/lib/xs.c
Of these, only the first two have at least 374 lines, and only the first
one has a non-empty line at line 374. In my source, however, that line
reads
done = read(fd, data, len)
and sits inside the function
static bool read_all(int fd, void *data, unsigned int len, int nonblocking)
which starts at line 361. The line you refer to is at line 1139 in the
same file. I just wanted to bring this to your attention, but I might be
on the wrong track here ...
> in read_message (because the size and offset seem to match this call, so
> I think it is more likely than the other one, but the logic below
> applies in either case).
>
> The thing we are reading into has literally just been allocated, so I
> can't think of any reason accessing it should fault.
>
> There is only one xenstore change between 4.3.1 and 4.3.3 which is
> commit 014f9219f1dca3ee92948f0cfcda8d1befa6cbcd
> Author: Matthew Daley
> Date: Sat Nov 30 13:20:04 2013 +1300
>
> xenstore: sanity check incoming message body lengths
>
> This is for the client-side receiving messages from xenstored, so there
> is no security impact, unlike XSA-72.
>
> but I can't see any way that could possibly cause a segfault.
>
> So, I'm afraid I'm completely mystified.
>
> You could try running the xl command under valgrind, you may find "xl
> create -F" (which keeps xl in the foreground) handy if you try this.
> That might help catch any heap corruption etc.
I don't know what valgrind is, but I'll have a look and see how to deal
with that ...
>
> A related thing to try might be to run "MALLOC_CHECK_=2 xl create ..."
> which enables glibc's heap consistency checks (described at the end of
> http://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html) which might give a clue.
I tried that, but I got the same segfault and no additional messages on
the screen - or should I have run this under gdb as well?
>
> Otherwise I think the next step would be to downgrade to 4.3.1 and see
> if the problem persists, in order to rule out changes elsewhere in the
> system. If the problem doesn't happen with a 4.3.1 rebuilt on your
> current system then the next thing would probably be to bisect the
> issue. There are only 31 toolstack changes in that range, so it ought to
> only take 5-6 iterations.
Unfortunately 4.3.1 is no longer available as an ebuild: 4.3.3 apparently
fixed security issues, so 4.3.1 has been deleted from the repos. Getting
the old version back is therefore not straightforward and I need to figure
out how to do it. But I am sure there's a way.

Thanks
Atom2
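
P.S. For anyone following the read_all() discussion without the source at hand: below is a minimal sketch of the general shape of such a helper, i.e. a loop that keeps calling read() until the requested number of bytes has arrived. It is NOT the code from tools/xenstore/xs.c; the name read_all_sketch is made up, the nonblocking parameter is left out, and treating an early end-of-file as an error is my assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, NOT the function from tools/xenstore/xs.c:
 * keep calling read() until exactly len bytes have been stored in data. */
static bool read_all_sketch(int fd, void *data, size_t len)
{
    char *p = data;

    assert(data != NULL);

    while (len > 0) {
        ssize_t done = read(fd, p, len);

        if (done < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return false;       /* real I/O error, errno set by read() */
        }
        if (done == 0) {
            errno = EBADF;      /* assumption: early EOF counts as an error */
            return false;
        }
        p += done;              /* advance past the bytes already read */
        len -= (size_t)done;
    }
    return true;
}
```

Called with len set to 16, as in frame #10 of the traces, this would simply block until 16 bytes have been read from fd or the read fails.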