From: Kent Overstreet <koverstreet@google.com>
To: Zach Brown <zab@zabbo.net>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	tytso@google.com, tj@kernel.org,
	Dave Kleikamp <dave.kleikamp@oracle.com>,
	Dmitry Monakhov <dmonakhov@openvz.org>,
	"Maxim V. Patlasov" <mpatlasov@parallels.com>,
	michael.mesnier@intel.com, jeffrey.d.skirvin@intel.com,
	Martin Petersen <martin.petersen@oracle.com>
Subject: Re: [RFC, PATCH] Extensible AIO interface
Date: Mon, 1 Oct 2012 16:22:35 -0700
Message-ID: <20121001232235.GH26488@google.com>
In-Reply-To: <20121001231222.GB14533@lenny.home.zabbo.net>

On Mon, Oct 01, 2012 at 04:12:22PM -0700, Zach Brown wrote:
> On Mon, Oct 01, 2012 at 03:23:41PM -0700, Kent Overstreet wrote:
> > So, I and other people keep running into things where we really need to
> > add an interface to pass some auxiliary... stuff along with a pread() or
> > pwrite().
> 
> Sure.  Martin (cc:ed) will sympathize.
> 
> > A few examples:
> > 
> > * IO scheduler hints...
> > * Cache hints...
> > 
> > * Passing checksums out to userspace. We've got bio integrity, which is
> > a (somewhat) generic interface for passing data checksums between the
> > filesystem and the hardware.
> 
> Hmm, careful here.  I think that in DIF/DIX the checksums are
> per-sector, not per IO, right?  That'd mean that the PAGE_SIZE attr
> limit in this patch would be magically creating different max IO size
> limits on different architectures.  That doesn't seem great.

Not just per sector, per hardware sector. For passing around checksums,
userspace would have to find out the hardware sector size and checksum
type/size via a different interface, and then the attribute would
contain a pointer to a buffer that can hold the appropriate number of
checksums. The attribute itself stays a fixed size, so the PAGE_SIZE
attr limit doesn't cap the IO size.
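
A rough sketch of that sizing, to make it concrete. None of these names
are in the patch: struct integrity_info, csum_buf_bytes() and
struct iocb_attr_csum are hypothetical, and the sector and checksum
sizes are assumed to come from whatever separate interface ends up
exposing them.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical description of the device's integrity format. Userspace
 * is assumed to have obtained these values through a separate interface.
 */
struct integrity_info {
	uint32_t hw_sector_size;	/* bytes per hardware sector */
	uint32_t csum_size;		/* checksum bytes per hardware sector */
};

/*
 * Hypothetical attribute payload: only a pointer and a length, so its
 * size does not grow with the size of the IO.
 */
struct iocb_attr_csum {
	uint64_t buf;			/* userspace address of checksum buffer */
	uint64_t buf_len;		/* size of that buffer in bytes */
};

/*
 * Bytes of checksum buffer needed for an IO of io_bytes.
 * Returns 0 if the sector size is unknown or the IO is not a whole
 * number of hardware sectors.
 */
static size_t csum_buf_bytes(const struct integrity_info *ii, size_t io_bytes)
{
	if (ii->hw_sector_size == 0 || io_bytes % ii->hw_sector_size != 0)
		return 0;

	return (io_bytes / ii->hw_sector_size) * ii->csum_size;
}
```

The checksum buffer scales with the number of hardware sectors in the
IO, while sizeof(struct iocb_attr_csum) is constant.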

> 
> > Hence, AIO attributes.
> 
> I have to be honest: I really don't like tying the interface to AIO, but
> I guess it's the only per-io facility we have today.  It'd be nice to
> include sync O_DIRECT when designing the interface to make sure that it
> is possible to use generic syscalls in the future without running up
> against unexpected problems. 

It'd certainly be useful with regular sync IO; I just want to take it
one step at a time, particularly since for sync IO we'll probably need
new syscalls.

But yes, you're right, it would be good to keep in mind.

> > An iocb_attr has an id field, and a size field - and some amount of data
> > specific to that attribute.
> 
> I'd hope that we can come up with a less fragile interface.  The kernel
> would have to scan the attributes to make sure that there aren't
> malicious sizes.  I only quickly glanced at the loops, but it seemed
> like you could have a 0 size attribute in there and _next() would spin
> forever.

Ouch, yeah that's wrong :/

I don't think there's anything fragile about the basic idea though. Or
do you have some way of improving upon it in mind?

The idea with the size field is that it's just sizeof(the particular
attribute struct), so when userspace is appending attributes it just
sets size = sizeof() and attr_list->size += attr->size.
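
Roughly, the userspace side could look like the sketch below. The
struct layouts are assumptions for illustration and may not match the
patch, struct iocb_attr_prio and both helpers are hypothetical, and I'm
assuming the list's size field counts its own header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layouts, for illustration only; the patch may differ. */
struct iocb_attr {
	uint64_t size;	/* sizeof() the whole attribute, header included */
	uint64_t id;	/* which attribute this is */
};

struct iocb_attr_list {
	uint64_t size;	/* bytes used so far, this header included */
};

/* Hypothetical attribute carrying an IO priority hint. */
struct iocb_attr_prio {
	struct iocb_attr attr;
	uint64_t prio;
};

static void attr_list_init(struct iocb_attr_list *list)
{
	list->size = sizeof(*list);
}

/*
 * Copy one attribute to the tail of the list. capacity is the size in
 * bytes of the buffer backing the list. Returns 0 on success, -1 if the
 * attribute is malformed or does not fit.
 */
static int attr_list_append(struct iocb_attr_list *list, size_t capacity,
			    const struct iocb_attr *attr)
{
	if (attr->size < sizeof(*attr))
		return -1;
	if (list->size > capacity || attr->size > capacity - list->size)
		return -1;

	memcpy((unsigned char *)list + list->size, attr, attr->size);
	list->size += attr->size;	/* attr_list->size += attr->size */
	return 0;
}
```

A caller would fill in a struct iocb_attr_prio with
attr.size = sizeof(struct iocb_attr_prio) and pass &prio.attr to
attr_list_append().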

The kernel is going to have to sanity check the size fields of the
individual attributes anyways to verify the size of the last attr
doesn't extend off the end of the attr list, so I think it makes sense
to keep the current semantics of the size fields and just also check
that the size field is nonzero (actually >= sizeof(struct iocb_attr)).
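
Something like the following walk would cover both checks. This is a
plain C sketch and not code from the patch: attr_list_validate() is a
hypothetical name, the struct iocb_attr layout is assumed, and buf is
taken to be the attribute area that follows the list header, already
copied in from userspace.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, for illustration only; the patch may differ. */
struct iocb_attr {
	uint64_t size;	/* sizeof() the whole attribute, header included */
	uint64_t id;
};

/*
 * Walk len bytes of packed attributes. Returns the number of attributes,
 * or -1 if any size field is smaller than the header or runs past the
 * end of the buffer.
 */
static int attr_list_validate(const unsigned char *buf, size_t len)
{
	size_t off = 0;
	int nr = 0;

	while (off < len) {
		struct iocb_attr hdr;

		/* There must be room for a complete header. */
		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr));

		/* Rejects size == 0, so the walk always makes progress. */
		if (hdr.size < sizeof(hdr))
			return -1;
		/* The attribute must not extend off the end of the list. */
		if (hdr.size > len - off)
			return -1;

		off += hdr.size;
		nr++;
	}

	return nr;
}
```

Since every accepted attribute advances off by at least
sizeof(struct iocb_attr), a zero size can no longer make the iteration
spin.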

Thread overview: 21+ messages
2012-10-01 22:23 [RFC, PATCH] Extensible AIO interface Kent Overstreet
2012-10-01 23:12 ` Zach Brown
2012-10-01 23:22   ` Kent Overstreet [this message]
2012-10-01 23:44     ` Zach Brown
2012-10-02  0:22       ` Kent Overstreet
2012-10-02 17:43         ` Zach Brown
2012-10-02 21:41           ` Kent Overstreet
2012-10-03  1:41             ` Tejun Heo
2012-10-03  3:00               ` Kent Overstreet
2012-10-03 21:58                 ` Tejun Heo
2012-10-04 19:50                   ` Kent Overstreet
2012-10-02  0:47       ` Kent Overstreet
2012-10-02 22:34     ` Martin K. Petersen
2012-10-02 17:41 ` Jeff Moyer
2012-10-03  0:20   ` Kent Overstreet
2012-10-03  1:28     ` Dave Chinner
2012-10-03  2:41       ` Kent Overstreet
2012-10-04  1:04         ` Dave Chinner
2012-10-03 19:15     ` Jeff Moyer
2012-10-04 19:37       ` Kent Overstreet
2012-10-02 19:34 ` Andi Kleen
