From mboxrd@z Thu Jan  1 00:00:00 1970
From: Atom2 <ariel.atom2@web2web.at>
Subject: Re: [BUG] XEN 4.3.3 - segfault in xl create for HVM
 with PCI passthrough
Date: Tue, 04 Nov 2014 17:14:01 +0100
Message-ID: <5458FB49.4040801@web2web.at>
References: <544EB843.9060503@web2web.at>			<1414493998.10206.3.camel@citrix.com>	<544FB8C4.9000102@web2web.at>		<1414512266.10974.5.camel@citrix.com>	<54503440.3050302@web2web.at>	
	<5452C43C.6050800@web2web.at> <5458ED27.8060502@web2web.at>
	<1415115868.11486.49.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <1415115868.11486.49.camel@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 04.11.14 at 16:44, Ian Campbell wrote:
> On Tue, 2014-11-04 at 16:13 +0100, Atom2 wrote:
>> I assume it may be warranted to "upgrade" this issue to a bug status
>> (obviously also in the hope that it attracts wider interest) by
>> prefixing the subject line with a [BUG] prefix as per
>> http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen_Project. I have
>> exhausted all my options (including numerous IRC attempts), provided all
>> the information I have been asked for but the issue persists and nobody
>> seems to have an idea how to rectify the problem.
>
> Sorry for the delay, the issue is quite perplexing so I was intending to
> sleep on it, but didn't get any inspiration in doing so...
Thanks for getting back ... obviously sometimes sleep is not the right cure.
>
> In the gdb traces you provided there is:
> #10 read_all (fd=10, data=data@entry=0x7ffff0000a10, len=len@entry=16, nonblocking=nonblocking@entry=0) at xs.c:374
>
Just to be on the same page: That was for the destroy case. The 
corresponding line for the create case was:
#10 read_all (fd=18, data=data@entry=0x7fffe80008d0, len=len@entry=16, nonblocking=nonblocking@entry=0) at xs.c:374

I don't know whether that makes any difference though.
> which seems to correspond to the
>          if (!read_all(h->fd, &msg->hdr, sizeof(msg->hdr), nonblocking)) { /* Cancellation point */
I did have a look at the file xs.c as well in the source and there are 3 
source code files named xs.c:
	tools/xenstore/xs.c
	tools/python/xen/lowlevel/xs/xs.c
	extras/mini-os/lib/xs.c
Out of these, only the first two have at least 374 lines, and only the 
first one has a non-empty source line at line 374. In my source, that 
line reads as follows:
	done = read(fd, data, len)
and is located in function
	static bool read_all(int fd, void *data, unsigned int len, int nonblocking)
starting at line 361

The line you refer to is located at line 1139 in the same file. I just 
wanted to bring this to your attention, but I might be on the wrong 
track here ...
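
For reference, here is a minimal sketch of the kind of loop a function
like read_all() wraps around that read() call. This is not the code from
tools/xenstore/xs.c: read_all_sketch is a hypothetical name, the
nonblocking handling is left out, and the error handling is my own
assumption. It only illustrates why a frame at the read() line would sit
inside read_all, with the line you quoted being the caller.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sketch, NOT the actual xs.c code: keep calling read()
 * until exactly len bytes have arrived, retrying on EINTR. */
static bool read_all_sketch(int fd, void *data, size_t len)
{
    char *p = data;

    while (len > 0) {
        ssize_t done = read(fd, p, len);

        if (done < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, try again */
            return false;       /* real I/O error */
        }
        if (done == 0)
            return false;       /* EOF before the full length was read */

        p += done;              /* advance past the bytes just read */
        len -= (size_t)done;
    }
    return true;
}
```

A caller would pass the descriptor, a freshly allocated header buffer and
its size (16 bytes in the traces above).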

> in read_message (because the size and offset seem to match this call, so
> I think it is more likely than the other one, but the logic below
> applies in either case).
>
> The thing we are reading into has literally just been allocated, so I
> can't think of any reason accessing it should fault.
>
> There is only one xenstore change between 4.3.1 and 4.3.3 which is
>          commit 014f9219f1dca3ee92948f0cfcda8d1befa6cbcd
>          Author: Matthew Daley <mattd@bugfuzz.com>
>          Date:   Sat Nov 30 13:20:04 2013 +1300
>
>              xenstore: sanity check incoming message body lengths
>
>              This is for the client-side receiving messages from xenstored, so there
>              is no security impact, unlike XSA-72.
>
> but I can't see any way that could possibly cause a segfault.
>
> So, I'm afraid I'm completely mystified.
>
> You could try running the xl command under valgrind, you may find "xl
> create -F" (which keeps xl in the foreground) handy if you try this.
> That might help catch any heap corruption etc.
I don't know what valgrind is, but I'll have a look and see how to deal 
with that ...
>
> A related thing to try might be to run "MALLOC_CHECK_=2 xl create ..."
> which enables glibc's heap consistency checks (described at the end of
> http://www.gnu.org/software/libc/manual/html_node/Heap-Consistency-Checking.html) which might give a clue.
I tried that, but got the same segfault and no additional messages on the 
screen - or should I have run this under gdb as well?
>
> Otherwise I think the next step would be to downgrade to 4.3.1 and see
> if the problem persists, in order to rule out changes elsewhere in the
> system. If the problem doesn't happen with a 4.3.1 rebuilt on your
> current system then the next thing would probably be to bisect the
> issue. There are only 31 toolstack changes in that range, so it ought to
> only take 5-6 iterations.
Unfortunately 4.3.1 is no longer available as an ebuild as 4.3.3 seemed 
to fix security issues and therefore 4.3.1 has been deleted from the 
repos. So it's not straightforward and I need to figure out how to get 
the old version back. But I am sure there's a way.
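
As a quick sanity check of the iteration estimate for bisecting 31
changes, here is a small sketch. It assumes every bisect step halves the
remaining candidate range, rounding up in the worst case; bisect_steps is
a hypothetical helper, not part of any tool.

```c
/* Hypothetical helper: worst-case number of build-and-test rounds
 * needed to narrow n candidate changes down to one, assuming each
 * round discards half of the remaining candidates. */
static unsigned bisect_steps(unsigned n)
{
    unsigned steps = 0;

    while (n > 1) {
        n = (n + 1) / 2;    /* worst case: the larger half remains */
        steps++;
    }
    return steps;
}
```

For n = 31 the range shrinks 31, 16, 8, 4, 2, 1, i.e. five rounds, which
is in line with the estimate quoted above.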

Thanks, Atom2