From: Lutz Vieweg
Date: Thu, 21 Apr 2016 17:54:48 +0200
Subject: Re: [Qemu-devel] I/O errors reported to guest for raw-image-file backed /dev/vda - but host sees no I/O errors
To: qemu-devel@nongnu.org
Cc: qemu-block@nongnu.org
References: <20160420021101.GC8684@ad-mail.usersys.redhat.com> <20160420115036.GG6517@noname.str.redhat.com>

On 04/20/2016 04:38 PM, Lutz Vieweg wrote:
> I've now a
>> strace -f -p 10727 -e trace=pwrite,pwritev,fdatasync,file -t 2>&1 | gzip -1 -c >trace.gz
> attached to the qemu-process.
>
> If the incident rate stays the same, by tomorrow I should be able
> to correlate newly emitted I/O-errors in the guest with that log.
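Once the incidents recur, the failed write and flush calls can be pulled out of the compressed log, and their timestamps can be compared with the guest's error messages. This is a minimal sketch. It assumes the log was produced by the strace invocation quoted above, so each line carries the `-t` timestamp and, once several threads are traced, the `[pid N]` prefix from `-f`. The function name `failed_writes` is hypothetical:

```shell
#!/bin/sh
# failed_writes TRACE_GZ
#
# Hypothetical helper: print the pwrite/pwritev/fdatasync lines of a
# gzip-compressed strace log whose return value is an error, such as
# "= -1 ENOSPC (No space left on device)". The "[pid N]" prefix (from -f)
# and the timestamp (from -t) are printed unchanged, so each failure can be
# lined up with the time the guest reported its I/O error. Interleaved
# lines of the form "<... pwritev resumed> ) = -1 ..." are matched as well.
failed_writes() {
    gzip -dc "$1" | grep -E '(pwritev?|fdatasync).*= -1 E[A-Z0-9]+'
}

# Example: failed_writes trace.gz
```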
Ok, mystery solved:

> [pid 18241] 00:17:15 pwritev(16, [{..., 4096}, {..., 4096}], 2, 6585417728) = -1 ENOSPC (No space left on device)
> [pid 18241] 00:17:15 pwrite(16, ..., 4096, 6581915648) = -1 ENOSPC (No space left on device)
> [pid 18241] 00:17:15 pwrite(16, ..., 4096, 1048576) = -1 ENOSPC (No space left on device)
> [pid 18241] 00:17:15 pwrite(16, ..., 4096, 1048576) = -1 ENOSPC (No space left on device)

File descriptor 16 belonged to a raw image file that actually resides on a btrfs filesystem: a fixed-size 16 GB file whose attributes are set to disable copy-on-write (CoW). Nevertheless, writes to such files can still fail with ENOSPC because of a bug in btrfs:

> http://www.spinics.net/lists/linux-btrfs/msg52691.html

And indeed, the errors occurred exactly when a backup procedure was creating a read-only snapshot with "btrfs subvolume snapshot -r". So until I can upgrade to a mainline kernel that includes the fix, I'll pause the qemu process while "btrfs subvolume snapshot -r" runs.

Thanks for the hints. Sorry this turned out to be a btrfs bug rather than a qemu bug; I was initially misled into believing the image was on XFS.

Nevertheless, I think qemu could be somewhat more verbose and report when and why it stops emulation. A message to the monitor or to standard output would be a helpful start.

Regards,
Lutz Vieweg
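The stop-gap described above, pausing the guest for the duration of the read-only snapshot, could be scripted along these lines. This is a minimal sketch under stated assumptions. The helper name `with_vm_paused` is hypothetical, and how the guest is actually paused is left to the caller: for example `virsh suspend`/`virsh resume` when the VM is managed by libvirt, or the monitor's `stop`/`cont` commands. The domain name and paths in the usage comment are placeholders.

```shell
#!/bin/sh
# with_vm_paused PAUSE_CMD RESUME_CMD COMMAND [ARGS...]
#
# Hypothetical helper: pause the guest, run COMMAND (e.g. the read-only
# btrfs snapshot), then resume the guest even if COMMAND failed.
# Returns COMMAND's exit status, or 1 if the guest could not be paused
# (COMMAND is not run in that case).
with_vm_paused() {
    wvp_pause=$1
    wvp_resume=$2
    shift 2
    # Stop the guest so it issues no writes to the image while COMMAND runs.
    sh -c "$wvp_pause" || return 1
    # Run the wrapped command; "||" keeps a failure from aborting under set -e.
    wvp_status=0
    "$@" || wvp_status=$?
    # Resume unconditionally so a failed snapshot does not leave the guest stopped.
    sh -c "$wvp_resume"
    return "$wvp_status"
}

# Example (assumes a libvirt-managed guest; domain name and paths are placeholders):
# with_vm_paused "virsh suspend guest1" "virsh resume guest1" \
#     btrfs subvolume snapshot -r /srv/vm-images /srv/snapshots/vm-images-ro
```

Resuming unconditionally is the main design choice here: a failed snapshot then costs only the backup run, not guest availability.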