From: Paolo Bonzini <pbonzini@redhat.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, axboe@kernel.dk,
	linux-scsi@vger.kernel.org,
	"James E.J. Bottomley" <JBottomley@parallels.com>
Subject: Re: [PATCH] sg_io: allow UNMAP and WRITE SAME without CAP_SYS_RAWIO
Date: Tue, 11 Sep 2012 21:24:32 +0200
Message-ID: <504F8FF0.3000408@redhat.com>
In-Reply-To: <20120911191325.GU7677@google.com>

On 11/09/2012 21:13, Tejun Heo wrote:
> Hello, Paolo.
> 
> On Tue, Sep 11, 2012 at 08:54:03PM +0200, Paolo Bonzini wrote:
>>> On Tue, Sep 11, 2012 at 07:56:53PM +0200, Paolo Bonzini wrote:
>>>> Understood; unfortunately, there is another major user of it
>>>> (virtualization).  If you are passing "raw" LUNs down to a virtual
>>>> machine, there's no possibility at all to use a properly encapsulated
>>>
>>> Is there still command filtering issue when you're passing "raw" LUNs
>>> down?
>>
>> Yes, the passing down is just a userland program that gets SCSI
>> commands from the guest, sends them via SG_IO, and passes back the
>> result.  If the userland program is unprivileged (it usually is), then
>> you go through the filter.
> 
> Could being able to bypass the filters for this "you own this LUN" be
> a solution?  Or is it that we still need command filtering for
> whatever reason?

Yes, it could be.  Enabling/disabling the filters from a privileged
program and passing the unfiltered fd via SCM_RIGHTS would be enough.
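
As a rough illustration of the fd-passing half of that idea, here is a
minimal sketch of handing an open descriptor to another process over an
AF_UNIX socket with SCM_RIGHTS.  The helper names send_fd() and recv_fd()
are hypothetical, and the sketch assumes the privileged side has already
opened the device; how it would turn the filter off for that descriptor is
not shown, since no such interface is defined in this thread.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send one open descriptor over a connected
 * AF_UNIX socket.  Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0;  /* one byte of ordinary data accompanies the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* forces correct alignment of buf */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor refers to the same open file description as the
sender's, so any per-open state set by the privileged side would travel
with it; the unprivileged program never has to open the device itself.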

>> This is the userland for virtio-scsi (the kernel part of virtio-scsi is just
>> a driver running in the guest).  It can run in two modes: it can do its own
>> SCSI emulation, or it can just relay CDBs and their results.
>>
>> It can (and does) use higher-level services if SCSI emulation is done in
>> userland.  In that case, trim/discard can become a BLKDISCARD or a fallocate
>> for example.  However, in this case userland doesn't do any emulation and in
>> fact doesn't even need to know that this CDB is a discard.
> 
> Couldn't it intercept some of them - e.g. RWs and discards?
> What's the benefit / use case of doing pure bypass?

Basically, using the same storage technology for bare metal and
virtualized systems.  IMHO losing sense data is a no-no, but the above
solution could be feasible too.

> Would the benefits be strong enough to justify whole bpf cdb filtering?

If we can get a simpler solution that is okay with kernel maintainers,
I'm all for it.

>>> Hmmm?  This was about discard, no?
>>
>> One example of block layer interfaces that I want to add is BLKPING, so
>> that you can see if the NAS is reachable.  Then SCSI emulation can map
>> the "test unit ready" command to BLKPING.  There's a handful of such
>> ioctls that would be useful, such as BLKDISCARD itself.
> 
> Can't you make use of the existing disk events mechanism for that?
> Block layer already knows how to watch readiness of a device and tell
> the userland about it via uevent.

How?  But anyway I don't want to divert the discussion from the actual
topic...

Paolo
