On 05/06/2014 12:25 PM, Stoo Davies wrote:
> I'm doing some powerfail recovery testing on a storage array over iSCSI.
> Host is RHEL 6.4 kernel 2.6.32-358.el6.x86_64.
>
> With fio 2.1.2 -> 2.1.4 the job file below rides through the disks going
> away, and continues I/O after they come back, without reporting any errors.
> With fio 2.1.5 -> 2.1.8 when the disks come back fio immediately reports
> a meta verification error.
>
> I captured a trace with an finisar analyzer, and can see that after the
> disks come back and the host logs back in, a read is issued for an lba
> which was never written to.
> Since I don't see verification errors outside of the powerfail testing,
> I suspect fio isn't correctly handling failed writes during the time the
> disks are unavailable.
>
> The trace file is rather large, but I can make it available if you need
> to see it.
>
> [whee]
> bs=8k
> thread=4
> time_based=1
> runtime=864000
> readwrite=randrw
> direct=1
> iodepth=128
> ioengine=libaio
> size=100%
> verify=meta
> do_verify=1
> verify_fatal=1
> verify_dump=1
> verify_backlog=8192
> buffer_compress_percentage=95
> ignore_error=ENODEV:EIO,ENODEV:EIO,ENODEV:EIO
> filename=/dev/mapper/lun0
> .
> .
> filename=/dev/mapper/lun9

2.1.5 did indeed change when the IO was logged for verification, so that
does explain why it fails for you now. That's a problem. Can you try with
this patch? I'm not going to commit it yet, I want to carefully audit all
paths to ensure we also unlog or trim an io_piece, if we don't fully
complete it.

-- 
Jens Axboe
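
For readers following the thread, here is a rough toy model of the
"unlog or trim" idea mentioned above. It is not fio code and it is not the
patch referred to in this mail. The VerifyLog class and its method names
are hypothetical, and the model assumes a write is logged for verification
when it is issued, and that a completion reports either a byte count or a
negative errno.

```python
import errno


class VerifyLog:
    """Toy record of writes that a verifier will read back later.

    Hypothetical model for illustration only; it is not fio's io_piece code.
    """

    def __init__(self):
        # offset -> number of bytes expected to be on disk at that offset
        self.pieces = {}

    def log_write(self, offset, length):
        """Record a write at issue time, before its result is known."""
        self.pieces[offset] = length

    def complete_write(self, offset, result):
        """Reconcile the log with a completion.

        result is the byte count written, or a negative errno on failure.
        """
        length = self.pieces[offset]
        if result <= 0:
            # Nothing reached the disk: unlog so verify never reads it.
            del self.pieces[offset]
        elif result < length:
            # Short write: trim the entry to what actually landed.
            self.pieces[offset] = result
        # Full completion: the entry stays as logged.


log = VerifyLog()
for off in (0, 8192, 16384):
    log.log_write(off, 8192)

log.complete_write(0, 8192)            # completed in full
log.complete_write(8192, -errno.EIO)   # failed while the disks were away
log.complete_write(16384, 4096)        # short write

print(sorted(log.pieces.items()))      # [(0, 8192), (16384, 4096)]
```

In this model, skipping the unlog step would leave offset 8192 in the log,
and a later verify pass would read a block that was never written, which
would match the symptom in the report.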