From: Andrew Cooper <andrew.cooper3@citrix.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>,
"Ian Campbell" <ian.campbell@citrix.com>
Cc: Ross Lagerwall <ross.lagerwall@citrix.com>,
Ian Jackson <Ian.Jackson@eu.citrix.com>,
Wei Liu <wei.liu2@citrix.com>,
xen-devel@lists.xen.org
Subject: Re: [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable
Date: Wed, 1 Apr 2015 15:36:36 +0100 [thread overview]
Message-ID: <551C0274.3050105@citrix.com> (raw)
In-Reply-To: <551BC9B0.10000@citrix.com>
On 01/04/15 11:34, Roger Pau Monné wrote:
> Hello,
>
> El 30/03/15 a les 12.40, Ian Campbell ha escrit:
>> On Thu, 2015-03-26 at 16:20 +0100, Roger Pau Monné wrote:
>>> El 16/03/15 a les 14.29, Ross Lagerwall ha escrit:
>>>> From: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>
>>>> POLLHUP|POLLIN is a valid revent to receive when there is readable data in a
>>>> pipe, but the writable fd has been closed. This occurs in migration v2 when
>>>> the legacy conversion process (which transforms the data inline) completes and
>>>> exits successfully.
>>>>
>>>> In the case that there is data to read, suppress the POLLHUP. POSIX states
>>>> that the hangup state is latched[1], which means it will reoccur on subsequent
>>>> poll() calls. The datacopier is thus provided the opportunity to read until
>>>> EOF, if possible.
>>>>
>>>> A POLLHUP on its own is treated exactly as before, indicating a different
>>>> error with the fd.
>>>>
>>>> [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
>>> This patch breaks booting pygrub guests on FreeBSD, it seems to trigger
>>> some kind of error when pygrub exits:
>>>
>>> libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof copying bootloader output
>>> libxl: error: libxl_create.c:1129:domcreate_rebuild_done: cannot (re-)build domain: -3
>>>
>>> I will try to look into it, have you tested if the same happens on
>>> Linux?
>> Ross/Andrew, have you tried this on Linux?
>>
>> Roger, Does xl -vvv or even strace give any clues as to the sequences of
>> events which lead to the error?
> Here is the output of xl -vvv, it doesn't seem to contain any useful
> info apart from what I've already posted:
>
> # xl -vvv create debian.cfg
> Parsing config from debian.cfg
> libxl: debug: libxl_create.c:1521:do_domain_create: ao 0x802834080: create: how=0x0 callback=0x0 poller=0x80281f0a0
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=xvda spec.backend=unknown
> libxl: debug: libxl_device.c:298:libxl__device_disk_set_backend: Disk vdev=xvda, using backend phy
> libxl: debug: libxl_create.c:924:initiate_domain_create: running bootloader
> libxl: debug: libxl_device.c:269:libxl__device_disk_set_backend: Disk vdev=(null) spec.backend=phy
> libxl: debug: libxl.c:3065:libxl__device_disk_local_initiate_attach: locally attaching PHY disk /dev/zvol/tank/debian
> libxl: debug: libxl_bootloader.c:411:bootloader_disk_attached_cb: Config bootloader value: pygrub
> libxl: debug: libxl_bootloader.c:427:bootloader_disk_attached_cb: Checking for bootloader in libexec path: /usr/local/lib/xen/bin/pygrub
> libxl: debug: libxl_create.c:1537:do_domain_create: ao 0x802834080: inprogress: poller=0x80281f0a0, flags=i
> libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch w=0x8028367a0 wpath=/local/domain/2 token=3/0: register slotnum=3
> libxl: debug: libxl_event.c:1942:libxl__ao_progress_report: ao 0x802834080: progress report: ignored
> libxl: debug: libxl_bootloader.c:539:bootloader_gotptys: executing bootloader: /usr/local/lib/xen/bin/pygrub
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys: bootloader arg: /usr/local/lib/xen/bin/pygrub
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys: bootloader arg: --output=/var/run/xen/bootloader.2.out
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys: bootloader arg: --output-format=simple0
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys: bootloader arg: --output-directory=/var/run/xen/bootloader.2.d
> libxl: debug: libxl_bootloader.c:543:bootloader_gotptys: bootloader arg: /dev/zvol/tank/debian
> libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0x8028367a0 wpath=/local/domain/2 token=3/0: event epath=/local/domain/2
> libxl: error: libxl_bootloader.c:596:bootloader_copyfail: unexpected eof copying bootloader output
> libxl: debug: libxl_bootloader.c:638:bootloader_finished: bootloader completed
> libxl: debug: libxl_event.c:615:libxl__ev_xswatch_deregister: watch w=0x8028367a0 wpath=/local/domain/2 token=3/0: deregister slotnum=3
> libxl: error: libxl_create.c:1138:domcreate_rebuild_done: cannot (re-)build domain: -3
> libxl: info: libxl.c:1699:devices_destroy_cb: forked pid 763 for destroy of domain 2
> libxl: debug: libxl_event.c:1766:libxl__ao_complete: ao 0x802834080: complete, rc=-3
> libxl: debug: libxl_event.c:1738:libxl__ao__destroy: ao 0x802834080: destroy
> xc: debug: hypercall buffer: total allocations:44 total releases:44
> xc: debug: hypercall buffer: current allocations:0 maximum allocations:2
> xc: debug: hypercall buffer: cache current size:2
> xc: debug: hypercall buffer: cache hits:35 misses:2 toobig:7
>
> And this is (I think) the relevant output from ktrace/kdump:
>
> 766 xl RET read 20/0x14
> 766 xl CALL read(0xc,0x80280702f,0x3cd)
> 766 xl RET read -1 errno 35 Resource temporarily unavailable
> 766 xl CALL gettimeofday(0x7fffffffde90,0)
> 766 xl RET gettimeofday 0
> 766 xl CALL poll(0x802816180,0x6,0xffffffff)
> 766 xl RET poll 1
> 766 xl CALL gettimeofday(0x7fffffffde90,0)
> 766 xl RET gettimeofday 0
> 766 xl CALL write(0x10,0x802807014,0x1b)
> 766 xl GIO fd 16 wrote 27 bytes
> 0x0000 1b5b 3f31 6c1b 3e1b 5b32 343b 3148 0000 0000 000d 1b5b |.[?1l.>.[24;1H.......[|
> 0x0016 3f31 6c1b 3e |?1l.>|
>
> 766 xl RET write 27/0x1b
> 766 xl CALL gettimeofday(0x7fffffffde90,0)
> 766 xl RET gettimeofday 0
> 766 xl CALL poll(0x802816180,0x5,0xffffffff)
> 766 xl RET poll 1
> 766 xl PSIG SIGCHLD caught handler=0x80114bdf0 mask=0x0 code=CLD_EXITED
> 766 xl CALL sigprocmask(SIG_SETMASK,0x7fffffffd81c,0)
> 766 xl RET sigprocmask 0
> 766 xl CALL write(0xe,0x800af1fd2,0x1)
> 766 xl GIO fd 14 wrote 1 byte
> "\0"
> 766 xl RET write 1
> 766 xl CALL sigreturn(0x7fffffffd440)
> 766 xl RET sigreturn JUSTRETURN
> 766 xl CALL gettimeofday(0x7fffffffde90,0)
> 766 xl RET gettimeofday 0
> 766 xl CALL read(0xc,0x802807014,0x3e8)
> 766 xl GIO fd 12 read 0 bytes
> ""
> 766 xl RET read 0
> 766 xl CALL write(0x2,0x7fffffffd0b0,0x7)
> 766 xl GIO fd 2 wrote 7 bytes
> "libxl: "
> 766 xl RET write 7
> 766 xl CALL write(0x2,0x7fffffffd0b0,0x7)
> 766 xl GIO fd 2 wrote 7 bytes
> "error: "
>
>> Did/does commit 916735e1814d "tools/libxl: Restore errnoval behavior for
>> datacopier callback" make any difference?
> No, I still get the same error. I will try to take a closer look into this
> when I finish some pending stuff...
This works on Linux
libxl: debug: libxl_event.c:577:libxl__ev_xswatch_register: watch
w=0xb50870 wpath=/local/domain/11 token=3/0: register slotnum=3
libxl: debug: libxl_event.c:1942:libxl__ao_progress_report: ao 0xb53700:
progress report: ignored
libxl: debug: libxl_bootloader.c:539:bootloader_gotptys: executing
bootloader: /usr/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys: bootloader
arg: /usr/lib/xen/bin/pygrub
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys: bootloader
arg: --args=root=/dev/xvda2 ro
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys: bootloader
arg: --output=/var/run/xen/bootloader.11.out
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys: bootloader
arg: --output-format=simple0
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys: bootloader
arg: --output-directory=/var/run/xen/bootloader.11.d
libxl: debug: libxl_bootloader.c:543:bootloader_gotptys: bootloader
arg: /dev/vm/andrewcoop-jessie-disk
libxl: debug: libxl_event.c:514:watchfd_callback: watch w=0xb50870
wpath=/local/domain/11 token=3/0: event epath=/local/domain/11
libxl: debug: libxl_bootloader.c:638:bootloader_finished: bootloader
completed
libxl: debug: libxl_bootloader.c:138:bootloader_result_command:
bootloader output contained kernel
/var/run/xen/bootloader.11.d/boot_kernel.t2OsEV
libxl: debug: libxl_bootloader.c:138:bootloader_result_command:
bootloader output contained ramdisk
/var/run/xen/bootloader.11.d/boot_ramdisk.3i2X1b
libxl: debug: libxl_bootloader.c:138:bootloader_result_command:
bootloader output contained args root=/dev/xvda2 ro elevator=noop
root=/dev/xvda2 ro
libxl: debug: libxl_bootloader.c:651:bootloader_finished: bootloader
execution successful
As 7e9ec5 has positively been identified as the cause of the regression
on FreeBSD, that presumably means that FreeBSD won't permit a read() of
an FD which has POLLHUP set.
POSIX appears to specifically allow this action.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2015-04-01 14:36 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-16 13:29 [PATCH v3 1/6] tools/libxl: Introduce min and max macros Ross Lagerwall
2015-03-16 13:29 ` [PATCH v3 2/6] tools/libxl: Update datacopier to support sending data only Ross Lagerwall
2015-03-16 13:29 ` [PATCH v3 3/6] tools/libxl: Avoid overrunning static buffer with prefixdata Ross Lagerwall
2015-03-16 13:29 ` [PATCH v3 4/6] tools/libxl: Allow limiting amount copied by datacopier Ross Lagerwall
2015-03-18 11:12 ` Ian Campbell
2015-03-16 13:29 ` [PATCH v3 5/6] tools/libxl: Extend datacopier to support reading into a buffer Ross Lagerwall
2015-03-18 11:18 ` Ian Campbell
2015-04-01 15:53 ` Ian Jackson
2015-04-01 15:59 ` Ian Campbell
2015-03-16 13:29 ` [PATCH v3 6/6] tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable Ross Lagerwall
2015-03-18 11:31 ` Ian Campbell
2015-03-26 15:20 ` Roger Pau Monné
2015-03-30 10:40 ` Ian Campbell
2015-04-01 10:34 ` Roger Pau Monné
2015-04-01 14:36 ` Andrew Cooper [this message]
2015-04-02 15:03 ` [PATCH 0/3] datacopier POLLHUP fixes " Ian Jackson
2015-04-02 15:04 ` [PATCH 1/3] Revert "tools/libxl: Adjust datacopiers POLLHUP handling when the fd is also readable" Ian Jackson
2015-04-02 15:08 ` Ian Campbell
2015-04-02 15:27 ` Ian Jackson
2015-04-02 15:10 ` Andrew Cooper
2015-04-02 15:11 ` Ian Jackson
2015-04-02 15:04 ` [PATCH 2/3] libxl: Cope with pipes which signal POLLHUP|POLLIN on read eof Ian Jackson
2015-04-02 15:13 ` Ian Campbell
2015-04-02 15:04 ` [PATCH 3/3] libxl: datacopier: Avoid theoretical eof/POLLHUP race Ian Jackson
2015-04-02 15:15 ` Ian Campbell
2015-04-02 16:29 ` Ian Jackson
2015-04-07 11:14 ` Roger Pau Monné
2015-04-07 12:26 ` Ian Jackson
2015-04-02 15:09 ` [PATCH 0/3] datacopier POLLHUP fixes handling when the fd is also readable Ian Jackson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=551C0274.3050105@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=Ian.Jackson@eu.citrix.com \
--cc=ian.campbell@citrix.com \
--cc=roger.pau@citrix.com \
--cc=ross.lagerwall@citrix.com \
--cc=wei.liu2@citrix.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.