From: Peter Lieven <pl@kamp.de>
To: ronnie sahlberg <ronniesahlberg@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Michael Tokarev <mjt@tls.msk.ru>,
qemu-devel <qemu-devel@nongnu.org>,
Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] block/iscsi: use 16 byte CDBs only when necessary
Date: Tue, 02 Sep 2014 21:30:34 +0200
Message-ID: <54061ADA.7040109@kamp.de>
In-Reply-To: <CAN05THQ6=u2LgGC=qsVPzuKn5nbHP9JJW0Sw5SiGE8Bw9JLpYw@mail.gmail.com>
Looking at the code, is it possible that the trouble here is caused not by the guest but by the
multiwrite_merge code?
From what I see, the only limit it applies when merging requests is the number of IOVs.
Any thoughts?
Mine are:
a) Introduce bs->bl.max_request_size and set merge = 0 if the merged request would exceed it.
Default the max request size to 32768 sectors (see below).
b) Hardcode the limit in multiwrite_merge for now, limiting the merged size to 16MB (32768 sectors).
This is the limit we already use in bdrv_co_discard and bdrv_co_write_zeroes if we don't know
better.
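To make the check in both options concrete, here is a minimal sketch. It is not existing QEMU code: merge_fits() and MAX_MERGED_SECTORS_DEFAULT are hypothetical names, and the max_request_size argument stands in for the proposed bs->bl.max_request_size, with 0 assumed to mean "no limit known".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed default: 32768 sectors of 512 bytes = 16MB. */
#define MAX_MERGED_SECTORS_DEFAULT 32768

/*
 * Hypothetical helper, not an existing QEMU function: decide whether two
 * adjacent write requests may be merged. max_request_size == 0 is assumed
 * to mean that the driver reported no limit, so the default applies.
 */
static bool merge_fits(int64_t first_sectors, int64_t second_sectors,
                       int64_t max_request_size)
{
    int64_t limit = max_request_size > 0 ? max_request_size
                                         : MAX_MERGED_SECTORS_DEFAULT;

    assert(first_sectors >= 0 && second_sectors >= 0);
    return first_sectors + second_sectors <= limit;
}
```

With the default limit, merge_fits(16384, 16384, 0) allows the merge, while merge_fits(16384, 16385, 0) would leave the two requests separate.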
Peter
On 02.09.2014 at 17:28, ronnie sahlberg wrote:
> That is one big request. I assume the device reports "no limit" in
> the VPD page so we can not state it is the guest/application going
> beyond the allowed limit?
>
>
> I am not entirely sure what meaning the target assigns to Protocol
> Error here.
> It could be that ~100M is way higher than MaxBurstLength ? What is
> the MaxBurstLength that was reported by the server during login
> negotiation?
> If so, we should make libiscsi check the maxburstlength and fail the
> request early. We would still fail the I/O so it will not really solve
> anything much
> but at least we should not send the request to the server.
>
> Best would probably be to take the smallest of a non-zero
> Block-Limits.max_transfer_length and iscsi-MaxBurstLength/block-size
> and pass this back to the guest in the emulated Block-Limits-VPD.
> At least then you have tried to tell the guest "never do SCSI I/O
> bigger than this".
>
> I.e. even if the target reports BlockLimits.MaxTransferLength == 0 ==
> no limit to QEMU, QEMU should probably take the iscsi transport limit
> into account and pass this to the guest
> by scaling the emulated BlockLimits page it passes on to the
> maximum that MaxBurstLength allows.
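A minimal sketch of the "smallest non-zero limit" rule suggested above. effective_max_xfer_blocks() is a hypothetical helper, not a libiscsi or QEMU function, and treating a MaxBurstLength of 0 as "unknown" is an assumption made only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: combine the target's Block Limits maximum transfer
 * length (in blocks, 0 = no limit reported) with the negotiated iSCSI
 * MaxBurstLength (in bytes, 0 assumed to mean unknown). Returns the limit
 * in blocks to advertise to the guest, or 0 if neither value gives one.
 */
static uint32_t effective_max_xfer_blocks(uint32_t bl_max_xfer_blocks,
                                          uint32_t max_burst_bytes,
                                          uint32_t block_size)
{
    uint32_t burst_blocks;

    assert(block_size > 0);
    burst_blocks = max_burst_bytes / block_size;

    if (bl_max_xfer_blocks == 0) {
        return burst_blocks;
    }
    if (burst_blocks == 0) {
        return bl_max_xfer_blocks;
    }
    return bl_max_xfer_blocks < burst_blocks ? bl_max_xfer_blocks
                                             : burst_blocks;
}
```

For a target that reports no limit and a MaxBurstLength of 262144 bytes with 512 byte blocks, this yields 512 blocks for the emulated Block Limits page.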
>
>
> Then if BTRFS or SG_IO in the guest ignores the BlockLimits it is
> clearly a guest problem.
>
> (A different interpretation for ProtocolError could be the mismatch
> between the iscsi expected data transfer length and the scsi transfer
> length, but that should result in residuals, not protocol error.)
>
>
>
> Hypothetically there could be targets that support really huge
> MaxBurstLengths > 32MB. For those you probably want to switch to
> WRITE16 when the SCSI transfer length goes > 0xffff.
>
> - if (iscsilun->use_16_for_rw) {
> + if (iscsilun->use_16_for_rw || num_sectors > 0xffff) {
>
>
> regards
> ronnie sahlberg
>
> On Mon, Sep 1, 2014 at 8:21 AM, Peter Lieven <pl@kamp.de> wrote:
>> On 17.06.2014 13:46, Paolo Bonzini wrote:
>>
>> On 17/06/2014 13:37, Peter Lieven wrote:
>>
>> On 17.06.2014 13:15, Paolo Bonzini wrote:
>>
>> On 17/06/2014 08:14, Peter Lieven wrote:
>>
>>
>>
>> BTW, while debugging a case with a bigger storage supplier I found
>> that open-iscsi seems to show exactly this non-deterministic behaviour.
>> I have a 3TB LUN. If I access sectors below 2TB it uses READ10/WRITE10 and
>> if I go beyond 2TB it changes to READ16/WRITE16.
>>
>>
>> Isn't that exactly what your latest patch does for >64K sector writes? :)
>>
>>
>> Not exactly, we choose the default by checking the LUN size. 10 Byte for
>> < 2TB and 16 Byte otherwise.
>>
>>
>> Yeah, I meant introducing the non-determinism.
>>
>> My latest patch makes an exception if a request is bigger than 64K
>> sectors and
>> switches to 16 Byte requests. These would otherwise end in an I/O error.
>>
>>
>> It could also be split at the block layer, like we do for unmap. I think
>> there's also a maximum transfer size somewhere in the VPD, we could do
>> READ16/WRITE16 if it is >64K sectors.
>>
>>
>> It seems that there might be a real world example where Linux issues >32MB
>> write requests. Maybe someone familiar with btrfs can advise.
>> I see iSCSI Protocol Errors in my logs:
>>
>> Sep 1 10:10:14 libiscsi:0 PDU header: 01 a1 00 00 00 01 00 00 00 00 00 00
>> 00 00 00 00 00 00 00 07 06 8f 30 00 00 00 00 06 00 00 00 0a 2a 00 01 09 9e
>> 50 00 47 98 00 00 00 00 00 00 00 [XXX]
>> Sep 1 10:10:14 qemu-2.0.0: iSCSI: Failed to write10 data to iSCSI lun.
>> Request was rejected with reason: 0x04 (Protocol Error)
>>
>> Looking at the headers the xferlen in the iSCSI PDU is 110047232 Byte which
>> is 214936 sectors.
>> 214936 % 65536 = 18328 which is exactly the number of blocks in the SCSI
>> WRITE10 CDB.
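The arithmetic above can be reproduced with a short sketch. Both helpers are hypothetical and only illustrate the effect of the 16-bit transfer length field in WRITE10; a block size of 512 bytes is assumed, as implied by the numbers in the log.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumed block size */

/*
 * WRITE10 carries the transfer length in a 16-bit field, so only the low
 * 16 bits of a larger sector count end up in the CDB.
 */
static uint16_t write10_transfer_length(uint32_t num_sectors)
{
    return (uint16_t)(num_sectors & 0xffff);
}

/* Number of blocks implied by the iSCSI expected data transfer length. */
static uint32_t pdu_sectors(uint32_t xferlen_bytes)
{
    assert(xferlen_bytes % SECTOR_SIZE == 0);
    return xferlen_bytes / SECTOR_SIZE;
}
```

With the values from the log, pdu_sectors(110047232) gives 214936 and write10_transfer_length(214936) gives 18328 (0x4798), which matches the bytes 47 98 in the CDB part of the PDU dump.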
>>
>> Can someone advise whether this is something that btrfs can cause,
>> or whether I have to blame the customer for issuing very big write
>> requests with Direct I/O?
>>
>> The user sees something like this in the log:
>> [34640.489284] BTRFS: bdev /dev/vda2 errs: wr 8232, rd 0, flush 0, corrupt
>> 0, gen 0
>> [34640.490379] end_request: I/O error, dev vda, sector 17446880
>> [34640.491251] end_request: I/O error, dev vda, sector 5150144
>> [34640.491290] end_request: I/O error, dev vda, sector 17472080
>> [34640.492201] end_request: I/O error, dev vda, sector 17523488
>> [34640.492201] end_request: I/O error, dev vda, sector 17536592
>> [34640.492201] end_request: I/O error, dev vda, sector 17599088
>> [34640.492201] end_request: I/O error, dev vda, sector 17601104
>> [34640.685611] end_request: I/O error, dev vda, sector 15495456
>> [34640.685650] end_request: I/O error, dev vda, sector 7138216
>>
>> Thanks,
>> Peter
>>