linux-fsdevel.vger.kernel.org archive mirror
From: Al Viro <viro@ZenIV.linux.org.uk>
To: "Dilger, Andreas" <andreas.dilger@intel.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"Drokin, Oleg" <oleg.drokin@intel.com>,
	Peng Tao <bergwolf@gmail.com>, "greg@kroah.com" <greg@kroah.com>,
	"Xiong, Jinshan" <jinshan.xiong@intel.com>
Subject: Re: RFC: [PATCH] staging/lustre/llite: fix O_TMPFILE/O_LOV_DELAY_CREATE conflict
Date: Tue, 11 Feb 2014 02:40:10 +0000	[thread overview]
Message-ID: <20140211024010.GI18016@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CF1EB9B4.913BD%andreas.dilger@intel.com>

On Tue, Feb 11, 2014 at 12:31:39AM +0000, Dilger, Andreas wrote:

> The off-stack storage of per-thread info is used to avoid stack overflow
> because of the 4kB stack limitation, and to avoid kmalloc/kfree of the
> same data structures for every call into the callpath.  Other code in
> the kernel handles similar problems by having a separate thread do the
> lower level IO, but that adds context switches to the IO path.  I'm not
> a big fan of this, but that is what we live with due to limited stack
> size.

<wry> Reducing the depth of call chains also might help. </wry>
And yes, I understand the motivation; makes keeping track of what's going on
really unpleasant, though...

> >4) cl_io_init() which calls
> >
> >5) cl_io_init0() which possibly calls a bunch of instances of
> >->coo_io_init(),
> >which may or may not return 0; I _hope_ that in this case that's what'll
> >happen and we get back to ll_file_io_generic(), where

Umm...  So am I right about the thing returning 0 in this case and in
case of ->aio_read()?

> >5') cl_io_start() which calls a bunch (how large in that case?) of
> >
> >6') ->cio_start() instances.  Hopefully that'll be vvp_io_write_start()

Again, is that assumption correct?  I'm not interested in criticizing
the overall design, etc.; what I am interested in is what that code
ends up doing wrt carrying iov/iovlen/iocb/position through the entire
thing.

> >which will pass the value of io->u.ci_rw.crw_pos (picked via an
> >overlapping field of union) to generic_file_aio_write().  Which, BTW,
> > updates ->ki_pos. Then we return into cl_io_loop(), where
> >
> >4'') we call cl_io_end(), cl_io_unlock() and
> >
> >5'') cl_io_rw_advance() which increments ....crw_pos, hopefully in sync
> >with what we have in iocb->ki_pos.  And calls a bunch of methods.  And
> >return into cl_io_loop(), where

> >3''') we copy ....crw_pos into iocb->ki_pos.  WTF do we need that?  Hadn't
> >generic_file_aio_write() been good enough?

Is it safe to assume that
	* these two calls of generic_file_aio_{read,write} will always
get pos equal to iocb->ki_pos
	* iocb/iov/iovlen will be unchanged since the moment they'd been
passed to ll_file_aio_{read,write}()
	* the ->ki_pos assignment after the call of cl_io_loop() is a no-op
for the IO_NORMAL case
	* these calls of generic_file_aio_{read,write} will always have
ll_file_aio_{read,write} in the call chain and can't be reached otherwise?

What I want to do is turn ->aio_read()/->aio_write() signature into
kiocb x iov_iter -> ssize_t.  With all instances, including
generic_file_aio_{read,write}() converted to that.   With pos delivered via
iocb and iov/nrsegs via iov_iter.  Supplying that in places where they
are called via method is trivial and for such calls we definitely have
pos == iocb->ki_pos.  Most of the places calling those instances directly
(not via method) are in other instances of the same methods and pass the
arguments unchanged.  These two in lustre are, in fact, the only real
exceptions.  IF the hypotheses above are correct, we can convert those
two - just store iov_iter reference instead of iov/nrsegs in vvp_io_args and
in ccc_io and that'll do the trick.  Probably put iov_iter alongside the
vti_local_iovec as well, but I really wonder if that's worth the trouble -
might be better to use do_sync_read/do_sync_write and be done with that.
Sure, it will slightly increase the stack footprint, but if you are _that_
tight on stack...  Anyway, that's a separate story - we can deal with that
later.

PS: I'm serious about the call graph, ideally along with the stack footprints
of all functions.  At least it would give some idea of how much the various
paths contribute...  Of course, it's a real bitch to automate with the
amount of indirect calls you've got there - the tricky part is finding the
set of functions that might be called at a given location ;-/


Thread overview: 15+ messages
2014-02-10 20:16 RFC: [PATCH] staging/lustre/llite: fix O_TMPFILE/O_LOV_DELAY_CREATE conflict Dilger, Andreas
2014-02-10 21:29 ` Al Viro
2014-02-10 22:10   ` Al Viro
2014-02-10 22:51     ` Al Viro
2014-02-11  0:31       ` Dilger, Andreas
2014-02-11  2:40         ` Al Viro [this message]
2014-02-11  2:54           ` Drokin, Oleg
2014-02-11  6:55           ` Xiong, Jinshan
2014-02-11 14:25             ` Al Viro
2014-02-11 18:26               ` Xiong, Jinshan
2014-02-11  0:18     ` Xiong, Jinshan
2014-02-11  0:37   ` Dilger, Andreas
2014-02-11  0:51     ` greg
2014-02-11  9:13   ` Christoph Hellwig
2014-02-11 11:01     ` Dilger, Andreas
