From: Peter Lieven <pl@kamp.de>
To: ronnie sahlberg <ronniesahlberg@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Michael Tokarev <mjt@tls.msk.ru>,
qemu-devel <qemu-devel@nongnu.org>,
Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] block/iscsi: use 16 byte CDBs only when necessary
Date: Tue, 02 Sep 2014 21:30:34 +0200 [thread overview]
Message-ID: <54061ADA.7040109@kamp.de> (raw)
In-Reply-To: <CAN05THQ6=u2LgGC=qsVPzuKn5nbHP9JJW0Sw5SiGE8Bw9JLpYw@mail.gmail.com>
Looking at the code, is it possible that not the guest is causing trouble here, but
multiwrite_merge code?
>From what I see the only limit it has when merging requests is the number of IOVs.
Any thoughts?
Mine are:
a) Introducing bs->bl.max_request_size and set merge = 0 if the result would be too big. Default
max request size to 32768 sectors (see below).
b) Hardcoding the limit in multiwrite_merge for now limiting the merged size to 16MB (32768 sectors).
Which is the limit we already use in bdrv_co_discard and bdrv_co_write_zeroes if we don't know
better.
Peter
Am 02.09.2014 um 17:28 schrieb ronnie sahlberg:
> That is one big request. I assume the device reports "no limit" in
> the VPD page so we can not state it is the guest/application going
> beyond the allowed limit?
>
>
> I am not entirely sure what meaning the target assigns to Protocol
> Error means here.
> It could be that ~100M is way higher than MaxBurstLength ? What is
> the MaxBurstLength that was reported by the server during login
> negotiation?
> If so, we should make libiscsi check the maxburstlength and fail the
> request early. We would still fail the I/O so it will not really solve
> anything much
> but at least we should not send the request to the server.
>
> Best would probably be to take the smallest of a non-zero
> Block-Limits.max_transfer_length and iscsi-MaxBurstLength/block-size
> and pass this back to the guest in the emulated Block-Limits-VPD.
> At least then you have tried to tell the guest "never do SCSI I/O
> bigger than this".
>
> I.e. even if the target reports BlockLimits.MaxTransferLength == 0 ==
> no limit to QEMU, QEMU should probably take the iscsi transport limit
> into account and pass this to the guest
> by setting the emulated BlockLimits page it passes to scale to the
> maximum that MaxBurstLength allows.
>
>
> Then if BTRFS or SG_IO in the guest ignores the BlockLimits it is
> clearly a guest problem.
>
> (A different interpretation for ProtocolError could be the mismatch
> between the iscsi expected data transfer length and the scsi transfer
> length, but that should result in residuals, not protocol error.)
>
>
>
> Hypothetically there could be targets that support really huge
> MaxBurstLengths > 32MB. For those you probably want to switch to
> WRITE16 when the SCSI transfer length goes > 0xffff.
>
> - if (iscsilun->use_16_for_rw) {
> + if (iscsilun->use_16_for_rw || num_sectors > 0xffff) {
>
>
> regards
> ronnie sahlberg
>
> On Mon, Sep 1, 2014 at 8:21 AM, Peter Lieven <pl@kamp.de> wrote:
>> On 17.06.2014 13:46, Paolo Bonzini wrote:
>>
>> Il 17/06/2014 13:37, Peter Lieven ha scritto:
>>
>> On 17.06.2014 13:15, Paolo Bonzini wrote:
>>
>> Il 17/06/2014 08:14, Peter Lieven ha scritto:
>>
>>
>>
>> BTW, while debugging a case with a bigger storage supplier I found
>> that open-iscsi seems to do exactly this undeterministic behaviour.
>> I have a 3TB LUN. If I access < 2TB sectors it uses READ10/WRITE10 and
>> if I go beyond 2TB it changes to READ16/WRITE16.
>>
>>
>> Isn't that exactly what your latest patch does for >64K sector writes? :)
>>
>>
>> Not exactly, we choose the default by checking the LUN size. 10 Byte for
>> < 2TB and 16 Byte otherwise.
>>
>>
>> Yeah, I meant introducing the non-determinism.
>>
>> My latest patch makes an exception if a request is bigger than 64K
>> sectors and
>> switches to 16 Byte requests. These would otherwise end in an I/O error.
>>
>>
>> It could also be split at the block layer, like we do for unmap. I think
>> there's also a maximum transfer size somewhere in the VPD, we could to
>> READ16/WRITE16 if it is >64K sectors.
>>
>>
>> It seems that there might be a real world example where Linux issues >32MB
>> write requests. Maybe someone familiar with btrfs can advise.
>> I see iSCSI Protocol Errors in my logs:
>>
>> Sep 1 10:10:14 libiscsi:0 PDU header: 01 a1 00 00 00 01 00 00 00 00 00 00
>> 00 00 00 00 00 00 00 07 06 8f 30 00 00 00 00 06 00 00 00 0a 2a 00 01 09 9e
>> 50 00 47 98 00 00 00 00 00 00 00 [XXX]
>> Sep 1 10:10:14 qemu-2.0.0: iSCSI: Failed to write10 data to iSCSI lun.
>> Request was rejected with reason: 0x04 (Protocol Error)
>>
>> Looking at the headers the xferlen in the iSCSI PDU is 110047232 Byte which
>> is 214936 sectors.
>> 214936 % 65536 = 18328 which is exactly the number of blocks in the SCSI
>> WRITE10 CDB.
>>
>> Can someone advise if this is something that btrfs can cause
>> or if I have to
>> blame the customer that he issues very big write requests with Direct I/O?
>>
>> The user sseems something like this in the log:
>> [34640.489284] BTRFS: bdev /dev/vda2 errs: wr 8232, rd 0, flush 0, corrupt
>> 0, gen 0
>> [34640.490379] end_request: I/O error, dev vda, sector 17446880
>> [34640.491251] end_request: I/O error, dev vda, sector 5150144
>> [34640.491290] end_request: I/O error, dev vda, sector 17472080
>> [34640.492201] end_request: I/O error, dev vda, sector 17523488
>> [34640.492201] end_request: I/O error, dev vda, sector 17536592
>> [34640.492201] end_request: I/O error, dev vda, sector 17599088
>> [34640.492201] end_request: I/O error, dev vda, sector 17601104
>> [34640.685611] end_request: I/O error, dev vda, sector 15495456
>> [34640.685650] end_request: I/O error, dev vda, sector 7138216
>>
>> Thanks,
>> Peter
>>
next prev parent reply other threads:[~2014-09-02 19:30 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-04 13:47 [Qemu-devel] [PATCH] block/iscsi: use 16 byte CDBs only when necessary Peter Lieven
2014-06-04 14:00 ` ronnie sahlberg
2014-06-04 14:43 ` Peter Lieven
2014-06-04 14:54 ` ronnie sahlberg
2014-06-05 9:12 ` Michael Tokarev
2014-06-05 9:27 ` Peter Lieven
2014-06-17 6:14 ` Peter Lieven
2014-06-17 11:15 ` Paolo Bonzini
2014-06-17 11:37 ` Peter Lieven
2014-06-17 11:46 ` Paolo Bonzini
2014-06-17 11:50 ` Peter Lieven
2014-06-17 13:45 ` Peter Lieven
2014-09-01 15:21 ` Peter Lieven
2014-09-02 15:28 ` ronnie sahlberg
2014-09-02 18:14 ` Peter Lieven
2014-09-02 19:30 ` Peter Lieven [this message]
2014-09-03 8:09 ` Peter Lieven
2014-09-03 12:31 ` Stefan Hajnoczi
2014-09-03 13:13 ` Peter Lieven
2014-09-03 14:17 ` ronnie sahlberg
2014-09-03 14:18 ` Paolo Bonzini
2014-09-03 14:48 ` ronnie sahlberg
2014-09-03 19:29 ` Peter Lieven
2014-06-04 15:31 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54061ADA.7040109@kamp.de \
--to=pl@kamp.de \
--cc=kwolf@redhat.com \
--cc=mjt@tls.msk.ru \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=ronniesahlberg@gmail.com \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).