qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Sam Bobroff <sam.bobroff@au1.ibm.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Question about nonblocking stderr and lost logs
Date: Mon, 14 Dec 2015 16:19:50 +1100	[thread overview]
Message-ID: <20151214051950.GA23879@tungsten.ozlabs.ibm.com> (raw)
In-Reply-To: <20151210083722.GB4222@stefanha-x1.localdomain>

On Thu, Dec 10, 2015 at 04:37:22PM +0800, Stefan Hajnoczi wrote:
> On Mon, Dec 07, 2015 at 02:22:18PM +1100, Sam Bobroff wrote:
> > On Fri, Dec 04, 2015 at 02:33:01PM +0800, Stefan Hajnoczi wrote:
> > > On Mon, Nov 30, 2015 at 04:34:47PM +1100, Sam Bobroff wrote:
> > > > Hi QEMU programmers,
> > > > 
> > > > While doing some experimental work on QEMU that has involved adding a lot of
> > > > new log messages (using qemu_log_mask()), I've discovered that under some
> > > > conditions a lot of my log messages go missing.
> > > > 
> > > > I've tracked the issue down to qemu_logfile being left at the default (stderr)
> > > > (so when not using -D) and according to strace what is happening is that the
> > > > debug messages are being passed to write() but write() is returning EWOULDBLOCK
> > > > and the messags are dropped.
> > > > 
> > > > This seems to be happening because stderr is being set non-blocking (which is a
> > > > bit odd to me), and it seems like NONBLOCK is being set as qmp_chardev_add() adds a
> > > > device for stdout (yes stdout, not stderr; perhaps file descriptors have been
> > > > dup'd by that point?).
> > > > 
> > > > Is this by design to prevent a slow console from blocking QEMU? If not, should
> > > > I delve further and try to prevent non-blocking being set on stderr?
> > > > 
> > > > (Unfortunately I don't have a replication for this using an unmodified QEMU but
> > > > I suspect I could find one if necessary.)
> > > 
> > > This sounds like a bug.  stderr should be blocking.
> > > 
> > > But it's specific to your QEMU build or your command-line.  Here are my
> > > results on an x86 host:
> > > 
> > > $ qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
> > >                      -drive if=virtio,cache=none,file=test.img,format=raw
> > > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > > pos:	0
> > > flags:	0100002
> > > mnt_id:	22
> > > 
> > > The flags are O_RDWR | O_LARGEFILE.
> > > 
> > > O_NONBLOCK would have been 04000 (octal).
> > > 
> > > Please retry with qemu.git/master and post your QEMU command-line.
> > > 
> > > Stefan
> > 
> > Hi Stefan,
> > 
> > Thanks for checking this! I think I've worked out what's happening and it's a
> > bit complicated ;-)
> > 
> > It appears that the behaviour is not directly related to the command line or
> > build but rather the way the shell sets up the environment.
> > 
> > Here's a test very similar to yours using origin/master QEMU on my Debian
> > stable machine, with bash as my shell:
> > 
> > $ qemu-system-x86_64 -S -vga none -nographic
> > ( In another virtual console: )
> > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > pos:    0
> > flags:  0104002
> > mnt_id: 19
> > 
> > So I seem to have O_NONBLOCK set on stderr.
> > 
> > However, if I perform that test by backgrounding QEMU and using the same
> > console:
> > $ qemu-system-x86_64 -S -vga none -nographic &
> > [1] 4970
> > [1]+  Stopped                 qemu-system-x86_64 -S -nographic -vga none
> > $ cat /proc/$(pgrep qemu)/fdinfo/2
> > pos:    0
> > flags:  0100002
> > mnt_id: 19
> > 
> > Hmm!
> > 
> > After a bit of reading, it seems like this is caused by the fact that bash,
> > when there is no redirection, sets up fds 0, 1 and 2 by dup'ing a single file
> > descriptor. I found this surprising!
> > 
> > A small test program that sets O_NONBLOCK on fd 1 and checks for it on fd 2
> > seems to corroborate it:
> > 
> > #include <stdio.h>
> > #include <fcntl.h>
> > #include <assert.h>
> > 
> > int main() {
> >         int f1, f2;
> > 
> >         f1 = fcntl(1, F_GETFL, 0);
> >         f2 = fcntl(2, F_GETFL, 0);
> >         assert(!(f1 & O_NONBLOCK));
> >         assert(!(f2 & O_NONBLOCK));
> >         assert(!fcntl(1, F_SETFL, f1 | O_NONBLOCK));
> >         if (fcntl(2, F_GETFL, 0) & O_NONBLOCK)
> >                 fprintf(stderr, "File descriptors are duplicates.\n");
> >         else
> >                 fprintf(stderr, "File descriptors are separate.\n");
> >         return 0;
> > }
> > 
> > gcc -Wall foo.c
> > $ ./a.out
> > File descriptors are duplicates.
> > $ ./a.out > /dev/null
> > File descriptors are separate.
> > 
> > The nonblocking flag is always (?) being set on stdin and stdout during
> > qemu_chr_open_stdio(), but it doesn't affect stderr unless they are dup'd.
> > 
> > So this is probably not going to occur often, and certainly not when running
> > under libvirt or other tools. But if it is actually what is happening, I'm not
> > sure what to do about it. Any ideas?
> 
> This is unfortunate.  I wonder if opening /proc/fd/<n> is just a dup or
> if it actually creates separate file status flags.

Calling open() always creates a new "file description" (which contains file
status flags), so it's certainly a way to approach this.

I've testing doing this if we're going to use stderr as the qemu_logfile:

	dup2(open("/proc/self/fd/2", O_WRONLY), 2)

... in do_qemu_set_log() and it fixes the issue, however...

* Doing the dup2() causes it to fix any other fprintf()s to stderr, but
  I feel bad for changing the fd that's inside the stderr "FILE *" "underneath
  it". Would it be better to do the following...
	qemu_logfile = fopen("/proc/self/fd/2", "a")
  ... even though that will leave other fprintf()s unfixed. (I think that is
  probably OK.)
* We could also unconditionally do the dup2(), maybe early in main().
* I'm not sure which platforms other than modern Linux support
  /proc/self/fd/<n>. (Just fixing it for Linux seems a lot better than
  nothing.)
* We could print a warning if stderr is nonblocking and we can't work around
  it. This seems like a good idea even if don't fix it for Linux.
* Alternatively, we could take this approach to stdin and stdout as they are
  added to the char driver and set nonblocking. This somehow feels more
  invasive, although I'm not sure why.
* We could also handle it by converting the qemu_log_mask calls from fprintf()
  to sprintf() + write(), as we'd then be able to handle EAGAIN correctly. This
  doesn't feel any cleaner to me.

Cheers,
Sam.

  reply	other threads:[~2015-12-14  5:21 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-30  5:34 [Qemu-devel] Question about nonblocking stderr and lost logs Sam Bobroff
2015-12-04  6:33 ` Stefan Hajnoczi
2015-12-07  3:22   ` Sam Bobroff
2015-12-10  8:37     ` Stefan Hajnoczi
2015-12-14  5:19       ` Sam Bobroff [this message]
2015-12-14  7:29         ` Stefan Hajnoczi
2015-12-16  0:53           ` Sam Bobroff
2015-12-22  6:43             ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151214051950.GA23879@tungsten.ozlabs.ibm.com \
    --to=sam.bobroff@au1.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).