linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: scst_vdisk: Resize-fsync() race fails IO
       [not found] <56EBEFAA.9040501@profitbricks.com>
@ 2016-03-21 15:22 ` Sebastian Parschauer
       [not found]   ` <56F20A8C.4010202@vlnb.net>
  0 siblings, 1 reply; 2+ messages in thread
From: Sebastian Parschauer @ 2016-03-21 15:22 UTC (permalink / raw)
  To: linux-scsi
  Cc: Scst-devel, Bart Van Assche, Sebastian Parschauer,
	Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 4021 bytes --]

On 18.03.2016 13:08, Sebastian Parschauer wrote:
> Hi,
> 
> we are live resizing LVM LVs below SCST/SRP but we only increase the size.
> 
> Writing to /sys/kernel/scst_tgt/devices/$DEV/resync_size with a new
> device size races against fsync() from the SRP initiator so that IO is
> failed. Resizing SCST devices only with reads/writes going on works.
> 
> With the vdisk_fileio handler this is quite sporadic but with
> vdisk_blockio it can be triggered immediately.
> 
> I'm attaching my LV resizing test script and the fsync() script run on
> the initiator on top of ib_srp devices.
> 
> I suspect an issue with usage of virt_dev->file_size in
> vdisk_synchronize_cache() at
> https://sourceforge.net/p/scst/svn/HEAD/tree/trunk/scst/src/dev_handlers/scst_vdisk.c#l1866.
> 
> Cheers,
> Sebastian
> 
> 
> === Details ===
> 
> Error in kernel log:
> 
> [69514.891904] sd 10:0:0:1: Capacity data has changed
> [69514.891962] end_request: I/O error, dev sdc, sector 0
> 
> fsync script output:
> 
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> OSError: [Errno 5] Input/output error
> 

+CC: linux-scsi, hch

Hi SCSI developers,

I've debugged this further with a v4.5 kernel and full SCSI command
logging enabled on the initiator side. It seems to be a SCSI issue. SCSI
commands succeed but result 8000002 is reported to the upper driver
which seems to fail IO.

What does result 8000002 mean?

Please help me to fix the issue.

Thanks,
Sebastian

dmesg output of the IO error with SCSI debugging:

> [   84.450906] sd 6:0:0:1: [sdc] sd_open
> [   84.451008] sd 6:0:0:1: scsi_block_when_processing_errors: rtn: 1
> [   84.451084] sd 6:0:0:1: [sdc] tag#1 Send: scmd 0xffff880210357080
> [   84.451143] sd 6:0:0:1: [sdc] tag#1 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
> [   84.456081] sd 6:0:0:1: [sdc] tag#0 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> [   84.456209] sd 6:0:0:1: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 10 00 00 08 00
> [   84.456295] sd 6:0:0:1: [sdc] tag#0 scsi host busy 2 failed 0
> [   84.456374] sd 6:0:0:1: Notifying upper driver of completion (result 0)
> [   84.456438] sd 6:0:0:1: [sdc] tag#0 sd_done: completed 4096 of 4096 bytes
> [   84.456498] sd 6:0:0:1: [sdc] tag#0 8 sectors total, 4096 bytes done.
> [   84.456595] sd 6:0:0:1: Capacity data has changed
> [   84.456612] sd 6:0:0:1: [sdc] sd_setup_read_write_cmnd: block=280, count=8
> [   84.456616] sd 6:0:0:1: [sdc] block=280
> [   84.456620] sd 6:0:0:1: [sdc] reading 8/8 512 byte blocks.
> [   84.456628] sd 6:0:0:1: [sdc] tag#0 Send: scmd 0xffff8800ca41ed80
> [   84.456634] sd 6:0:0:1: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 18 00 00 08 00
> [   84.457003] sd 6:0:0:1: [sdc] tag#1 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> [   84.457090] sd 6:0:0:1: [sdc] tag#1 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
> [   84.457211] sd 6:0:0:1: [sdc] tag#1 Sense Key : Unit Attention [current] 
> [   84.457272] sd 6:0:0:1: [sdc] tag#1 Add. Sense: Capacity data has changed
> [   84.457331] sd 6:0:0:1: [sdc] tag#1 scsi host busy 2 failed 0
> [   84.457387] sd 6:0:0:1: Notifying upper driver of completion (result 8000002)
> [   84.457445] sd 6:0:0:1: [sdc] tag#1 sd_done: completed 0 of 0 bytes
> [   84.457503] sd 6:0:0:1: [sdc] tag#1 0 sectors total, 0 bytes done.
> [   84.457562] blk_update_request: I/O error, dev sdc, sector 0
> [   84.457637] sd 6:0:0:1: [sdc] tag#0 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> [   84.457757] sd 6:0:0:1: [sdc] tag#0 CDB: Read(10) 28 00 00 00 01 18 00 00 08 00
> [   84.457842] sd 6:0:0:1: [sdc] tag#0 scsi host busy 1 failed 0
> [   84.457899] sd 6:0:0:1: Notifying upper driver of completion (result 0)
> [   84.457959] sd 6:0:0:1: [sdc] tag#0 sd_done: completed 4096 of 4096 bytes
> [   84.458019] sd 6:0:0:1: [sdc] tag#0 8 sectors total, 4096 bytes done.
> [   84.458111] sd 6:0:0:1: [sdc] sd_setup_read_write_cmnd: block=288, count=8
> [   84.458180] sd 6:0:0:1: [sdc] block=288

[-- Attachment #2: resize_vol1.sh.txt --]
[-- Type: text/plain, Size: 523 bytes --]

#!/bin/bash

set -e
#set -x

FILE1="/dev/storage/vol1"
INC=4

do_resize() {
    FILE=$1
    DEV=`basename $FILE`
    echo "FILE: $FILE"
    SIZE=`blockdev --getsize64 $FILE`
    let "SIZE /= (1024 * 1024)"
    echo "SIZE: $SIZE"
    let "SIZE += INC"
    echo "NEW_SIZE: $SIZE"
    lvresize -L "+${INC}M" $FILE >/dev/null
    SIZE=`blockdev --getsize64 $FILE`
    let "SIZE /= (1024 * 1024)"
    echo "SET_SIZE: $SIZE"
    echo 1 > /sys/kernel/scst_tgt/devices/$DEV/resync_size
}

while [ 1 ]; do
    do_resize $FILE1
done

[-- Attachment #3: test_resize_fsync.sh.txt --]
[-- Type: text/plain, Size: 218 bytes --]

#!/bin/bash

# test resize on storage while we do fsync()s all the time
# on the SRP initiator

DEV=/dev/sdc

RC=0
while [ $RC -eq 0 ]; do
    python -c "import os; os.fsync(os.open('$DEV', os.O_RDWR))"
    RC=$?
done

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: sd: Resize-fsync() race fails IO
       [not found]       ` <56F368EA.1010607@vlnb.net>
@ 2016-03-24  9:46         ` Sebastian Parschauer
  0 siblings, 0 replies; 2+ messages in thread
From: Sebastian Parschauer @ 2016-03-24  9:46 UTC (permalink / raw)
  To: Vladislav Bolkhovitin
  Cc: Christoph Hellwig, Bart Van Assche, Scst-devel, linux-scsi

On 24.03.2016 05:11, Vladislav Bolkhovitin wrote:
[snip]
>>>> +CC: linux-scsi, hch
>>>>
>>>> Hi SCSI developers,
>>>>
>>>> I've debugged this further with a v4.5 kernel and full SCSI command
>>>> logging enabled on the initiator side. It seems to be a SCSI issue. SCSI
>>>> commands succeed but result 8000002 is reported to the upper driver
>>>> which seems to fail IO.
>>>>
>>>> What does result 8000002 mean?
>>>
>>> It is driver_byte() DID_RESET and status_byte() CHECK_CONDITION (see the corresponding
>>> kernel's functions). It sounds like you have timeouts and resets somewhere.
>>
>> Thanks for the hint! But it is definitely not DID_RESET. It is
>> DRIVER_SENSE and CHECK_CONDITION then. The host byte is DID_OK.
>>
>> If Unit Attention and "Capacity data has changed" is part of a read or a
>> write, then the result is the same but this is handled correctly and IO
>> is not failed. But with a zero bytes fsync()/flush resulting in
>> SYNCHRONIZE_CACHE_16, it fails IO as the sd driver doesn't handle this
>> correctly it seems.
> 
> I'm not sure, what is not handled correctly? Zero length SYNCHRONIZE_CACHE is perfectly
> legal, and vdisk_fsync() should never see data_len 0.

The issue is located in the SCSI stack. I think the problem is located
in the sd driver. It can't handle zero length SYNCHRONIZE_CACHE together
with UA and "Capacity data changed". The race exists to which SCSI
command UA is put to.

> There should be no race around virt_dev->file_size, because assignments to 64-bit
> integers are atomic, the compiler supposed to align it to 64-bit boundaries and it's
> never assigned to any invalid value. At worst, accesses to it should be covered by
> ACCESS_ONCE(), but I'm not sure it is really needed, so would prefer to keep it away
> from the fast path. Although, probably, vdisk_synchronize_cache() is not too fast path.

SCST is fine. I couldn't break it with my tests. It succeeds the
commands but the sd driver fails IO because of the sense data.

So discussion should be moved to the linux-scsi list again.
I've changed the prefix in the subject from "scst_vdisk" to "sd".

[snip]


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2016-03-24  9:46 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <56EBEFAA.9040501@profitbricks.com>
2016-03-21 15:22 ` scst_vdisk: Resize-fsync() race fails IO Sebastian Parschauer
     [not found]   ` <56F20A8C.4010202@vlnb.net>
     [not found]     ` <56F25C8F.9040801@profitbricks.com>
     [not found]       ` <56F368EA.1010607@vlnb.net>
2016-03-24  9:46         ` sd: " Sebastian Parschauer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).