From: Paolo Bonzini <pbonzini@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk,
linux-scsi@vger.kernel.org,
"James E.J. Bottomley" <JBottomley@parallels.com>
Subject: Re: [PATCH] sg_io: allow UNMAP and WRITE SAME without CAP_SYS_RAWIO
Date: Tue, 11 Sep 2012 20:54:03 +0200
Message-ID: <504F88CB.6030105@redhat.com>
In-Reply-To: <20120911182904.GS7677@google.com>

On 11/09/2012 20:29, Tejun Heo wrote:
> Hello, Paolo.
>
> On Tue, Sep 11, 2012 at 07:56:53PM +0200, Paolo Bonzini wrote:
>> Understood; unfortunately, there is another major user of it
>> (virtualization). If you are passing "raw" LUNs down to a virtual
>> machine, there's no possibility at all to use a properly encapsulated
>
> Is there still command filtering issue when you're passing "raw" LUNs
> down?

Yes. The passthrough is just a userland program that gets SCSI
commands from the guest, sends them via SG_IO, and passes back the
result. If the userland program is unprivileged (it usually is), then
every command goes through the filter.

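
To make the relay path concrete, here is a minimal sketch in C of how such
a userland program might forward one guest CDB through SG_IO. The helper
names (prepare_sg_io, relay_cdb) and the 30-second timeout are assumptions
made for illustration and are not taken from any existing program; only
struct sg_io_hdr and the SG_IO ioctl come from <scsi/sg.h>.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* Hypothetical helper: describe one guest CDB as an SG_IO request.
 * direction is one of SG_DXFER_NONE, SG_DXFER_TO_DEV, SG_DXFER_FROM_DEV. */
void prepare_sg_io(struct sg_io_hdr *hdr,
                   unsigned char *cdb, unsigned char cdb_len,
                   void *data, unsigned int data_len, int direction,
                   unsigned char *sense, unsigned char sense_len)
{
    assert(hdr && cdb && cdb_len > 0);

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';          /* required by the sg v3 interface */
    hdr->cmdp = cdb;                  /* CDB exactly as the guest sent it */
    hdr->cmd_len = cdb_len;
    hdr->dxferp = data;
    hdr->dxfer_len = data_len;
    hdr->dxfer_direction = direction;
    hdr->sbp = sense;                 /* device sense data lands here */
    hdr->mx_sb_len = sense_len;
    hdr->timeout = 30000;             /* milliseconds; arbitrary choice */
}

/* Hypothetical helper: submit the request.
 * Returns the SCSI status byte (>= 0) if the command reached the device,
 * or a negative errno if the ioctl itself failed (for instance because
 * the kernel filter rejected the command). */
int relay_cdb(int fd, struct sg_io_hdr *hdr)
{
    if (ioctl(fd, SG_IO, hdr) < 0)
        return -errno;
    return hdr->status;
}
```

Note that when the ioctl itself fails, relay_cdb has only an errno to
report: no SCSI status and no sense data come back from the device.
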
>> The set of use cases is so variable that no single filter can accommodate
>> all of them: high availability people want persistent reservations, NAS
>> people want trim/discard, but these are just two groups. Someone is
>> using a Windows VM to run vendor tools and wants to have access to
>> vendor-specific commands.
>>
>> You can tell this last group to use root, but not everyone else who is
>> already relying on Unix permissions, SELinux and/or device cgroups to
>> confine their virtual machines.
>
> You listed three - HA w/ persistent reservation, NAS w/ trim/discard
> and the third which you said that using root would be fine. Dunno
> much about persistent reservation but I don't see why trim/discard
> can't use existing block layer facilities whether from userland or
> virtio-scsi?

This is the userland for virtio-scsi (the kernel part of virtio-scsi is just
a driver running in the guest). It can run in two modes: it can do its own
SCSI emulation, or it can just relay CDBs and their results.

It can (and does) use higher-level services if SCSI emulation is done in
userland. In that case, trim/discard can become a BLKDISCARD or a fallocate,
for example. However, in the case at hand userland just relays CDBs: it
doesn't do any emulation and in fact doesn't even need to know that this
CDB is a discard.

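
For the emulation mode, the mapping from a guest discard to a higher-level
service could look roughly like the sketch below. emulate_discard is a
hypothetical name; BLKDISCARD and fallocate with FALLOC_FL_PUNCH_HOLE are
the existing Linux interfaces, and choosing between them by file type is an
assumption made for this example.

```c
#define _GNU_SOURCE             /* for fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>           /* BLKDISCARD */
#include <linux/falloc.h>       /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */

/* Hypothetical helper: discard [offset, offset + length) on the backing
 * store. Returns 0 on success or a negative errno on failure. */
int emulate_discard(int fd, uint64_t offset, uint64_t length)
{
    struct stat st;

    assert(offset + length >= offset);   /* no 64-bit wraparound */

    if (fstat(fd, &st) < 0)
        return -errno;

    if (S_ISBLK(st.st_mode)) {
        /* Block device: BLKDISCARD takes { start, length } in bytes. */
        uint64_t range[2] = { offset, length };
        if (ioctl(fd, BLKDISCARD, range) < 0)
            return -errno;
        return 0;
    }

    /* Regular file: punch a hole without changing the file size. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  (off_t)offset, (off_t)length) < 0)
        return -errno;
    return 0;
}
```

Either way, a failure reaches the caller only as an errno value, not as
SCSI sense data.
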
Also, if a discard converted to an ioctl fails, there's no way to
reconstruct the NAS's sense data to pass it back to the guest. We do a
limited amount of "making up" sense data (for example, if a command is
filtered, all we get is an errno value, and we report the command as not
recognized), but it should really be as simple and limited as possible.

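
As an illustration of what "making up" sense data involves, the sketch
below builds fixed-format sense data for ILLEGAL REQUEST / INVALID COMMAND
OPERATION CODE (sense key 0x05, ASC 0x20, ASCQ 0x00), which is how a command
is usually reported as not recognized. The function name and constants are
hypothetical; the byte layout follows the fixed sense data format from SPC.
The emulator would return this together with a CHECK CONDITION status.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SENSE_FIXED_LEN            18    /* minimal fixed-format length */
#define SENSE_KEY_ILLEGAL_REQUEST  0x05
#define ASC_INVALID_OPCODE         0x20  /* INVALID COMMAND OPERATION CODE */

/* Hypothetical helper: fill buf with synthetic "command not recognized"
 * sense data. Returns the number of sense bytes written. */
size_t make_invalid_opcode_sense(unsigned char *buf, size_t len)
{
    assert(buf && len >= SENSE_FIXED_LEN);

    memset(buf, 0, SENSE_FIXED_LEN);
    buf[0]  = 0x70;                       /* current error, fixed format */
    buf[2]  = SENSE_KEY_ILLEGAL_REQUEST;  /* sense key (low nibble) */
    buf[7]  = SENSE_FIXED_LEN - 8;        /* additional sense length */
    buf[12] = ASC_INVALID_OPCODE;         /* additional sense code */
    buf[13] = 0x00;                       /* additional sense code qualifier */
    return SENSE_FIXED_LEN;
}
```

Everything in this buffer is invented by userland, which is why the amount
of such synthesis is best kept small.
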
>> A generic filter (see
>> http://article.gmane.org/gmane.linux.kernel/1312326 for a proposal)
>> would be satisfactory for everyone, but it's also a major undertaking
>> and so far I've not received a single comment about it.
>
> Maybe I'm just not familiar with the problem space but I really hope
> things don't come to that.

Why not? :) (BTW the idea was suggested by Alan Cox; the link above is just
my proposal for how to do it.) I think that it's a good idea, but it's a big
bazooka for the smaller issue of supporting trim/discard.

>>> So, it wouldn't be a good idea to abuse SG_IO filtering for exposing
>>> trim/discard. It's something which should be retired or at least
>>> severely restricted in time. I don't think we want to be developing
>>> new uses of it.
>>>
>>> I think trim/discards are fairly easy to abstract and common enough to
>>> justify having properly abstracted interface. In fact, we already
>>> have block layer interface for it - BLKDISCARD. If it's lacking,
>>> let's improve that.
>>
>> I do want to improve the block layer interfaces so that people can avoid
>> SG_IO. But unfortunately this is for a completely different use case.
>
> Hmmm? This was about discard, no?

One example of block layer interfaces that I want to add is BLKPING, so
that you can see if the NAS is reachable. Then SCSI emulation can map
the "test unit ready" command to BLKPING. There's a handful of such
ioctls that would be useful, such as BLKDISCARD itself.

But this is for the other direction, where ioctls are not accurate enough.

Paolo