qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Maxim Levitsky <mlevitsk@redhat.com>
To: Tom Yan <tom.ty89@gmail.com>,
	eblake@redhat.com, pbonzini@redhat.com,  fam@euphon.net,
	anielhb@linux.vnet.ibm.com, kwolf@redhat.com, mreitz@redhat.com
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually works with sg
Date: Sun, 06 Sep 2020 20:05:44 +0300	[thread overview]
Message-ID: <46216704388f0c33c18e3a1d54505ab5e71c8cb2.camel@redhat.com> (raw)
In-Reply-To: <CAGnHSEkdTBBdT-Dm1Ztazq-ZxkLU5zq=3kYa_uiqna2uzm0Gkg@mail.gmail.com>

On Sun, 2020-09-06 at 23:26 +0800, Tom Yan wrote:
> I don't disagree with your proposal, but the thing is, do we even need
> another field/limit for case 1? For example, do we *really* need to
> clamp sizes[2] (NBD_INFO_BLOCK_SIZE) in nbd/server.c and
> max_io_sectors (SCSIBlockLimits) in hw/scsi/scsi-disk.c to to any kind
> of "dynamic" limit?

It depends on if we have other block drivers that have IO segment/max transfer sizes,
that we need to enforce, other than the SG_IO case.

Knowing that the commit that added these limits to file-posix, which I was fixing is relatively
new, I guess there are other cases for these limits.

I'll check this tomorrow.

> 
> Either way I can add another field (`max_pwrite`, maybe?) to
> BlockLimits, as an infrastructure/hint, but what should be switched to
> it, and what value should each block driver set it to, is probably
> beyond what I can take.

Implementation wise, I think I can do this, but I'll wait few days for the
feedback on my proposal.

Thanks for finding this bug!

Best regards,
	Maxim Levitsky

> 
> IMHO my patches should be merged first (they are really just fixing a
> regression and some bugs). If anyone finds it necessary and is capable
> and willing to fix the insufficiency of BlockLimits, they can do it
> afterwards with another patch series as an improvement. I can add a
> patch to remove the clamping in nbd/server.c as a perhaps-temporary
> measure to address the original issue, if desired.
> 
> On Sun, 6 Sep 2020 at 20:53, Maxim Levitsky <mlevitsk@redhat.com> wrote:
> > On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote:
> > > Maybe you want to add some condition for this:
> > > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659
> > > Or not clamp it at all.
> > > 
> > > On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote:
> > > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears
> > > > to have assumed that the only "SCSI Passthrough" is `-device
> > > > scsi-generic`, while the fact is there's also `-device scsi-block`
> > > > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting
> > > > max_sectors is necessary to it (more precisely, hw_max_sectors might
> > > > what matters, but BLKSECTGET reports max_sectors, so).
> > > > 
> > > > I'm unsure about how qemu-nbd works, but the commit clearly wasn't the
> > > > right approach to fix the original issue it addresses. (It should for
> > > > example ignore the max_transfer if it will never matter in to it, or
> > > > overrides it in certain cases; when I glimpsed over
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how
> > > > it could be file-posix problem when it is reporting the right thing,
> > > > regardless of whether "removing" the code helps.)
> > > > 
> > > > I don't think we want to "mark" `-device scsi-block` as sg either. It
> > > > will probably bring even more unexpected problems, because they are
> > > > literally different sets of things behind the scene / in the kernel.
> > 
> > Yes, I did overlook the fact that scsi-block is kind of hybrid passthough device,
> > doing SG_IO for some things and regular IO for others.
> > 
> > I don't think that my approach was wrong way to fix the problem, but as you found
> > out, abusing 'bs->sg' hack (which I would be very happy to remove completely)
> > wasn't a good idea.
> > I actualy was aware of scsi-block and that it does SG_IO but it
> > was forgotten some where on the way.
> > 
> > So in summary what the problem is and what I think is the right solution:
> > 
> > 
> > Each qemu block driver exposes block limits and assumes that they are the same
> > for two IO interfaces the block driver can expose:
> > 
> > 1. Regular qemu blk_pread/pwrite alike functions
> > 2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on
> >    host block devices/sg char devices, and by iscsi
> > 
> > The problem is that these two interfaces can have different block limits.
> > 
> > I don't know about iscsi, but for files, doing regular IO is always unlimited,
> > since it passes through the kernel block layer and segemented there on
> > demand which is faster that doing it in userspace, while SG_IO is passed as is
> > to the underlying SCSI device and lacks this segmentation.
> > 
> > Regardless of how NBD uses these limits, I think that these limits should be correctly
> > exposed by the block drivers, and therefore I propose is that each qemu block driver
> > would expose a pair of block limits.
> > One for the regular block IO, and other for SG_IO.
> > 
> > Then block driver clients (like scsi devices that you mention, nbd, etc)
> > can choose which limit to use, depending on which IO api they use.
> > The scsi-generic, and scsi-block can use the SG_IO limits,
> > while the rest  can use the normal (unlimited for file I/O) limits, including the NBD.
> > 
> > Best regards,
> >         Maxim Levitsky
> > 
> > > > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote:
> > > > > sg devices have different major/minor than their corresponding
> > > > > block devices. Using sysfs to get max segments never really worked
> > > > > for them.
> > > > > 
> > > > > Fortunately the sg driver provides an ioctl to get sg_tablesize,
> > > > > which is apparently equivalent to max segments.
> > > > > 
> > > > > Signed-off-by: Tom Yan <tom.ty89@gmail.com>
> > > > > ---
> > > > >  block/file-posix.c | 17 ++++++++++++++++-
> > > > >  1 file changed, 16 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > > > index 411a23cf99..9e37594145 100644
> > > > > --- a/block/file-posix.c
> > > > > +++ b/block/file-posix.c
> > > > > @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd)
> > > > >  #endif
> > > > >  }
> > > > > 
> > > > > +static int sg_get_max_segments(int fd)
> > > > > +{
> > > > > +#ifdef SG_GET_SG_TABLESIZE
> > > > > +    long max_segments = 0;
> > > > > +
> > > > > +    if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) {
> > > > > +        return max_segments;
> > > > > +    } else {
> > > > > +        return -errno;
> > > > > +    }
> > > > > +#else
> > > > > +    return -ENOSYS;
> > > > > +#endif
> > > > > +}
> > > > > +
> > > > >  static int get_max_transfer_length(int fd)
> > > > >  {
> > > > >  #if defined(BLKSECTGET) && defined(BLKSSZGET)
> > > > > @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
> > > > >          bs->bl.max_transfer = pow2floor(ret);
> > > > >      }
> > > > > 
> > > > > -    ret = get_max_segments(s->fd);
> > > > > +    ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
> > > > >      if (ret > 0) {
> > > > >          bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> > > > >                                             ret * qemu_real_host_page_size);
> > > > > --
> > > > > 2.28.0
> > > > > 




  reply	other threads:[~2020-09-06 17:07 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-04  2:09 [PATCH v4 1/2] file-posix: split hdev_refresh_limits from raw_refresh_limits Tom Yan
2020-09-04  2:09 ` [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually works with sg Tom Yan
2020-09-06 10:58   ` Tom Yan
2020-09-06 11:04     ` Tom Yan
2020-09-06 12:53       ` Maxim Levitsky
2020-09-06 15:26         ` Tom Yan
2020-09-06 17:05           ` Maxim Levitsky [this message]
2020-09-06 19:50             ` Dmitry Fomichev
2020-09-07  1:29               ` Tom Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46216704388f0c33c18e3a1d54505ab5e71c8cb2.camel@redhat.com \
    --to=mlevitsk@redhat.com \
    --cc=anielhb@linux.vnet.ibm.com \
    --cc=eblake@redhat.com \
    --cc=fam@euphon.net \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=tom.ty89@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).