From mboxrd@z Thu Jan 1 00:00:00 1970 From: Al Viro Subject: Re: RFC: [PATCH] staging/lustre/llite: fix O_TMPFILE/O_LOV_DELAY_CREATE conflict Date: Mon, 10 Feb 2014 22:51:30 +0000 Message-ID: <20140210225130.GH18016@ZenIV.linux.org.uk> References: <20140210212929.GF18016@ZenIV.linux.org.uk> <20140210221030.GG18016@ZenIV.linux.org.uk> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Christoph Hellwig , "linux-fsdevel@vger.kernel.org" , "Drokin, Oleg" , Peng Tao , "greg@kroah.com" To: "Dilger, Andreas" Return-path: Received: from zeniv.linux.org.uk ([195.92.253.2]:37321 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752111AbaBJWvh (ORCPT ); Mon, 10 Feb 2014 17:51:37 -0500 Content-Disposition: inline In-Reply-To: <20140210221030.GG18016@ZenIV.linux.org.uk> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Mon, Feb 10, 2014 at 10:10:30PM +0000, Al Viro wrote: > Ugh... Sorry, I misread that code. Why the devil do you have the > pos argument passed to lustre_generic_file_{read,write}() by address, > when both proceed to dereference it and pass the value on? Egads. Correct me if I'm wrong, please, but it looks like you have 1) ->aio_write() stash its arguments in an off-stack struct and pass it and &iocb->ki_pos to 2) ll_file_io_generic() which calls 3) cl_io_rw_init() which copied ki_pos value passed to it into io->u.ci_rw.crw_pos (in another off-stack struct) and calls 4) cl_io_init() which calls 5) cl_io_init0() which possibly calls a bunch of instances of ->coo_io_init(), which may or may not return 0; I _hope_ that in this case that's what'll happen and we get back to ll_file_io_generic(), where 3') we stash iocb and iov/iovlen into the third off-stack structure (cio) and call 4') cl_io_loop() where we (in a loop) call cl_io_iter_init(), cl_io_lock() and 5') cl_io_start() which a calls a bunch (how large in that case?) of 6') ->cio_start() instances. Hopefully that'll be vvp_io_write_start() which will pass the value of io->u.ci_rw.crw_pos (picked via an overlapping field of union) to generic_file_aio_write(). Which, BTW, updates ->ki_pos. Then we return into cl_io_loop(), where 4'') we call cl_io_end(), cl_io_unlock() and 5'') cl_io_rw_advance() which increments ....crw_pos, hopefully in sync with what we have in iocb->ki_pos. And calls a bunch of methods. And return into cl_io_loop(), where 4''') we call cl_io_iter_fini() (_another_ pile of methods called) and possibly repeat everything from (4') on (apparently only if nothing had been done so far). Eventually we return into ll_file_io_generic() and there 3''') we copy ....crw_pos into iocb->ki_pos. WTF do we need that? Hadn't generic_file_aio_write() been good enough? Is that correct? I have *not* traced it into all methods that might've been called in process - stuff called from cl_io_loop() is chock-full of those. Have I missed anything relevant wrt file position handling in there? You guys really should be forced to hand-draw a call graph for that thing. Both as a punishment and a deterrent...