linux-fsdevel.vger.kernel.org archive mirror
From: Boaz Harrosh <bharrosh@panasas.com>
To: Hannes Reinecke <hare@suse.de>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Chuck Lever <chuck.lever@oracle.com>, Ben Myers <bpm@sgi.com>,
	<lsf-pc@lists.linux-foundation.org>,
	<linux-fsdevel@vger.kernel.org>, <linux-scsi@vger.kernel.org>,
	<martin.petersen@oracle.com>,
	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Subject: Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
Date: Thu, 7 Feb 2013 14:08:32 +0200
Message-ID: <51139940.3000902@panasas.com>
In-Reply-To: <51138FA0.1080507@suse.de>

On 02/07/2013 01:27 PM, Hannes Reinecke wrote:
> On 02/07/2013 11:01 AM, Darrick J. Wong wrote:
>> On Thu, Feb 07, 2013 at 01:40:14AM -0800, Joel Becker wrote:
>>> On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote:
>>>>
>>>> On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong" <darrick.wong@oracle.com> wrote:
>>>>
>>>>> On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm interested in discussing how to pass protection information to and from
>>>>>> userspace.  Maybe Martin could be enlisted for the discussion.
>>>>>>
>>>>>> I read that some work has already been done in this area but have not been able
>>>>>> to locate it.  It looks like the bio-integrity code already makes it possible
>>>>>> to generate the t10-dif crc in the filesystem.  It would be good to be able to
>>>>>> get the guard and application tags back out to backup applications such as
>>>>>> xfsdump.  Enabling other applications to generate their own tags in userspace
>>>>>> is also interesting.
>>>>>
>>>>> This one's been on my list for a couple of years (and companies) too.  A few
>>>>> years ago Joel Becker had support for it in his sys_dio proposal (that hasn't
>>>>> gone anywhere), and more recently I've theorized that we could add a magic
>>>>> fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT
>>>>> *{read,write}v call as the PI buffer, which I think is similar to how DIX gets
>>>>> PI data to a disk.  But it's not like I have any code to show for it.
>>>>>
>>>>> I /think/ it's fairly straightforward to change the directio submit code to
>>>>> find the userspace PI buffer and amend the block integrity code to attach our
>>>>> own PI buffer.  You'd still have to let the block layer set the sector # field,
>>>>> but afaik that won't affect the crc or the app tag.
>>>>>
>>>>> I hear that the NFS guys want to propose some sort of protocol for transmitting
>>>>> PI data (across NFS), but I haven't seen anything concrete yet.
>>>>
>>>> I'm writing a requirements document for the NFS protocol which I can discuss at LSF.  The use cases for NFS for now would be virtual disk devices (hypervisors) or direct NFS access to storage from user space.
>>>>
>>>> Like everyone else we are waiting for a magical VFS and user space API to appear that can pass PI to and from storage.
>>>
>>> I'm happy to chat about it.  Unfortunately, like Darrick says, sys_dio()
>>> coding hasn't happened.  I do think we're better off with some kind of
>>> explicit API than some magic state on the file.  I mean, even something
>>> like:
>>>
>>> 	ssize_t write_with_pi(int fd, const void *buf, size_t count,
>>> 			      const void *pi, size_t pi_count);
>>>
>>> It's not as nice as a non-historical API (eg sys_dio), but it also
>>> probably plays nicer with buffered I/O.
>>
>> I also pondered simply adding a new io_prep_* function + IO_CMD_ code to libaio
>> and all the other plumbing necessary to make that happen...
>>
>> void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov,
>> 		       int iovcnt, long long offset, const void *pi,
>> 		       size_t pi_count);
>>
> This is also what I've envisioned.
> Updating io_prep / async I/O is reasonably easy as its been using a 
> separate structure for passing in the I/O details.
> 
> Normal read/write calls don't really map as you simply don't have 
> enough parameter to feed PI information into the kernel.
> So for that you'd need to invent a new interface / syscall.
> 
> For aio we just need to add additional fields to an existing structure.
> 
> So yeah, I'd be interested in that discussion as well.
> 

Me too, on multiple fronts. It's part of my general concern about
   "things we would like for user-mode servers".

I think that the current aio and libaio interface has been broken for a long
time, for a multitude of reasons. For instance, the nested structure definitions
are COMPAT broken (32-bit userspace on a 64-bit kernel), and there are lots of
missing pieces. (For example, search the archives for why bsg does not support
sg-lists.)
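
To make the COMPAT point concrete, here is a minimal sketch of why a nested,
pointer-bearing structure is awkward for 32-bit userspace on a 64-bit kernel.
The struct names and fields are hypothetical and are not taken from aio, libaio
or bsg. Pointers are modelled as fixed-width integers so that both layouts can
be compiled side by side on any host.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request as laid out by a 32-bit userspace process. */
struct seg32 { uint32_t base; uint32_t len; };
struct req32 { uint32_t opcode; struct seg32 seg; };

/* The same hypothetical request as a 64-bit kernel would lay it out. */
struct seg64 { uint64_t base; uint64_t len; };
struct req64 { uint32_t opcode; struct seg64 seg; };

/* The nested member changes the size of the outer struct, so the kernel
 * cannot simply reinterpret the 32-bit bytes as the 64-bit struct. */
static_assert(sizeof(struct req32) != sizeof(struct req64),
              "32-bit and 64-bit layouts differ");

/* Field-by-field translation that a compat shim would have to repeat for
 * every nested structure in the interface. */
static inline struct req64 req32_to_req64(const struct req32 *in)
{
    struct req64 out = {
        .opcode = in->opcode,
        .seg = { .base = in->seg.base, .len = in->seg.len },
    };
    return out;
}
```

Every additional level of nesting needs its own translation step like
req32_to_req64(), which is the kind of cost a flat, fixed-width layout avoids.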

And there are all these additions that everyone wants on top, that call for
a new interface anyway.

So I would like to see a deep fixup of this interface, with an aio version 2
that can take into consideration all future needs, including the ones
above. Kernel code will be very happy to be implemented with the new interface,
and a COMPAT layer could be put in place for the old interface.

All interested parties should bring to the table the extensions/changes
they need, and we can try to union all of them together.

(My addition is support for sg_lists in bsg, in a way that makes Tomo happy.
 I know that qemu has wanted this for a while, as have the multitude of
 user-mode servers.)
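
As a rough sketch only of what a flat request in such an "aio version 2" might
look like with a PI buffer attached: struct aio2_req, AIO2_CMD_PWRITEV_PI and
aio2_prep_pwritev_pi() below are hypothetical names, not an existing kernel or
libaio API. The prep helper mirrors the argument list of the
io_prep_preadv_pi() prototype quoted above. The 8-byte tuple size is the
standard T10 DIF tuple (2-byte guard, 2-byte application tag, 4-byte reference
tag). The 512-byte sector in the usage note is an assumption.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>  /* struct iovec */

#define AIO2_CMD_PWRITEV_PI 1u  /* hypothetical opcode */
#define PI_TUPLE_BYTES      8u  /* T10 DIF: guard(2) + app tag(2) + ref tag(4) */

/* Hypothetical flat request: fixed-width fields only, user pointers carried
 * as 64-bit integers, so 32-bit and 64-bit callers share one layout. */
struct aio2_req {
    uint64_t user_data;  /* opaque cookie returned on completion */
    uint32_t opcode;
    int32_t  fd;
    uint64_t iov;        /* user address of the data iovec array */
    uint32_t iovcnt;
    uint32_t flags;
    int64_t  offset;
    uint64_t pi;         /* user address of the PI buffer */
    uint64_t pi_count;   /* PI buffer length in bytes */
};

static_assert(sizeof(struct aio2_req) == 56, "layout has no padding");

/* PI bytes needed for data_len bytes of data, one tuple per sector. */
static inline uint64_t pi_bytes(uint64_t data_len, uint32_t sector_size)
{
    return (data_len / sector_size) * PI_TUPLE_BYTES;
}

static inline void aio2_prep_pwritev_pi(struct aio2_req *req, int fd,
                                        const struct iovec *iov, int iovcnt,
                                        long long offset, const void *pi,
                                        size_t pi_count)
{
    memset(req, 0, sizeof(*req));
    req->opcode   = AIO2_CMD_PWRITEV_PI;
    req->fd       = fd;
    req->iov      = (uint64_t)(uintptr_t)iov;
    req->iovcnt   = (uint32_t)iovcnt;
    req->offset   = offset;
    req->pi       = (uint64_t)(uintptr_t)pi;
    req->pi_count = pi_count;
}
```

For a 1024-byte write on 512-byte sectors, pi_bytes(1024, 512) gives 16, so the
caller would pass a 16-byte PI buffer. Note that struct iovec itself still
contains a pointer, so a real design would also need a fixed-width replacement
for it.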

Thanks
Boaz

> Cheers,
> 
> Hannes
> 


Thread overview: 23+ messages
2013-02-06 19:51 [LSF/MM TOPIC][ATTEND] protection information and userspace Ben Myers
2013-02-06 20:24 ` Darrick J. Wong
2013-02-06 20:34   ` Chuck Lever
2013-02-07  9:40     ` Joel Becker
2013-02-07 10:01       ` Darrick J. Wong
2013-02-07 11:27         ` Hannes Reinecke
2013-02-07 12:08           ` Boaz Harrosh [this message]
2013-02-07 12:16             ` Boaz Harrosh
2013-02-07 12:33               ` Hannes Reinecke
2013-02-07 12:54                 ` Boaz Harrosh
2013-02-07 12:29             ` Bart Van Assche
2013-02-07 12:47               ` Boaz Harrosh
2013-02-07 16:19             ` Jeff Moyer
2013-02-07 17:27               ` Zach Brown
2013-02-07 17:36                 ` Joel Becker
2013-02-07 21:04                   ` J. Bruce Fields
2013-02-08  9:38                     ` Joel Becker
2013-02-07 19:12       ` Martin K. Petersen
2013-02-08  9:36         ` Joel Becker
2013-02-07 19:09   ` Martin K. Petersen
2013-02-07 23:45     ` Darrick J. Wong
2013-02-07 23:59       ` Martin K. Petersen
2013-02-07 19:20 ` Martin K. Petersen
