From: Maxim Levitsky <mlevitsk@redhat.com>
To: Tom Yan <tom.ty89@gmail.com>,
qemu-block@nongnu.org, kwolf@redhat.com, mreitz@redhat.com,
qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually works with sg
Date: Sun, 06 Sep 2020 15:53:11 +0300 [thread overview]
Message-ID: <f5f5a6e2f4cc75c84e1364ab1276ee22aa5e3748.camel@redhat.com> (raw)
In-Reply-To: <CAGnHSE=qnj8a8qaS6eiFdevp6ya0GjqXHuDgAV12B0tEAajddg@mail.gmail.com>
On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote:
> Maybe you want to add some condition for this:
> https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659
> Or not clamp it at all.
>
> On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote:
> > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears
> > to have assumed that the only "SCSI Passthrough" is `-device
> > scsi-generic`, while the fact is there's also `-device scsi-block`
> > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting
> > max_sectors is necessary to it (more precisely, hw_max_sectors might
> > what matters, but BLKSECTGET reports max_sectors, so).
> >
> > I'm unsure about how qemu-nbd works, but the commit clearly wasn't the
> > right approach to fix the original issue it addresses. (It should for
> > example ignore the max_transfer if it will never matter in to it, or
> > overrides it in certain cases; when I glimpsed over
> > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how
> > it could be file-posix problem when it is reporting the right thing,
> > regardless of whether "removing" the code helps.)
> >
> > I don't think we want to "mark" `-device scsi-block` as sg either. It
> > will probably bring even more unexpected problems, because they are
> > literally different sets of things behind the scene / in the kernel.
Yes, I did overlook the fact that scsi-block is kind of hybrid passthough device,
doing SG_IO for some things and regular IO for others.
I don't think that my approach was wrong way to fix the problem, but as you found
out, abusing 'bs->sg' hack (which I would be very happy to remove completely)
wasn't a good idea.
I actualy was aware of scsi-block and that it does SG_IO but it
was forgotten some where on the way.
So in summary what the problem is and what I think is the right solution:
Each qemu block driver exposes block limits and assumes that they are the same
for two IO interfaces the block driver can expose:
1. Regular qemu blk_pread/pwrite alike functions
2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on
host block devices/sg char devices, and by iscsi
The problem is that these two interfaces can have different block limits.
I don't know about iscsi, but for files, doing regular IO is always unlimited,
since it passes through the kernel block layer and segemented there on
demand which is faster that doing it in userspace, while SG_IO is passed as is
to the underlying SCSI device and lacks this segmentation.
Regardless of how NBD uses these limits, I think that these limits should be correctly
exposed by the block drivers, and therefore I propose is that each qemu block driver
would expose a pair of block limits.
One for the regular block IO, and other for SG_IO.
Then block driver clients (like scsi devices that you mention, nbd, etc)
can choose which limit to use, depending on which IO api they use.
The scsi-generic, and scsi-block can use the SG_IO limits,
while the rest can use the normal (unlimited for file I/O) limits, including the NBD.
Best regards,
Maxim Levitsky
> >
> > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote:
> > > sg devices have different major/minor than their corresponding
> > > block devices. Using sysfs to get max segments never really worked
> > > for them.
> > >
> > > Fortunately the sg driver provides an ioctl to get sg_tablesize,
> > > which is apparently equivalent to max segments.
> > >
> > > Signed-off-by: Tom Yan <tom.ty89@gmail.com>
> > > ---
> > > block/file-posix.c | 17 ++++++++++++++++-
> > > 1 file changed, 16 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 411a23cf99..9e37594145 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd)
> > > #endif
> > > }
> > >
> > > +static int sg_get_max_segments(int fd)
> > > +{
> > > +#ifdef SG_GET_SG_TABLESIZE
> > > + long max_segments = 0;
> > > +
> > > + if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) {
> > > + return max_segments;
> > > + } else {
> > > + return -errno;
> > > + }
> > > +#else
> > > + return -ENOSYS;
> > > +#endif
> > > +}
> > > +
> > > static int get_max_transfer_length(int fd)
> > > {
> > > #if defined(BLKSECTGET) && defined(BLKSSZGET)
> > > @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
> > > bs->bl.max_transfer = pow2floor(ret);
> > > }
> > >
> > > - ret = get_max_segments(s->fd);
> > > + ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
> > > if (ret > 0) {
> > > bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> > > ret * qemu_real_host_page_size);
> > > --
> > > 2.28.0
> > >
next prev parent reply other threads:[~2020-09-06 12:54 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-04 2:09 [PATCH v4 1/2] file-posix: split hdev_refresh_limits from raw_refresh_limits Tom Yan
2020-09-04 2:09 ` [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually works with sg Tom Yan
2020-09-06 10:58 ` Tom Yan
2020-09-06 11:04 ` Tom Yan
2020-09-06 12:53 ` Maxim Levitsky [this message]
2020-09-06 15:26 ` Tom Yan
2020-09-06 17:05 ` Maxim Levitsky
2020-09-06 19:50 ` Dmitry Fomichev
2020-09-07 1:29 ` Tom Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f5f5a6e2f4cc75c84e1364ab1276ee22aa5e3748.camel@redhat.com \
--to=mlevitsk@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=tom.ty89@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).