From: Al Viro
Subject: Re: RFC: [PATCH] staging/lustre/llite: fix O_TMPFILE/O_LOV_DELAY_CREATE conflict
Date: Tue, 11 Feb 2014 14:25:27 +0000
Message-ID: <20140211142527.GK18016@ZenIV.linux.org.uk>
References: <20140210212929.GF18016@ZenIV.linux.org.uk>
 <20140210221030.GG18016@ZenIV.linux.org.uk>
 <20140210225130.GH18016@ZenIV.linux.org.uk>
 <20140211024010.GI18016@ZenIV.linux.org.uk>
 <9AB6D806-62F9-4066-85CE-4467426B5671@intel.com>
In-Reply-To: <9AB6D806-62F9-4066-85CE-4467426B5671@intel.com>
To: "Xiong, Jinshan"
Cc: "Dilger, Andreas", Christoph Hellwig, "linux-fsdevel@vger.kernel.org",
 "Drokin, Oleg", Peng Tao, "greg@kroah.com"

On Tue, Feb 11, 2014 at 06:55:59AM +0000, Xiong, Jinshan wrote:

>> 4) cl_io_init() which calls
>>
>> 5) cl_io_init0() which possibly calls a bunch of instances of
>> ->coo_io_init(),
>> which may or may not return 0; I _hope_ that in this case that's what'll
>> happen and we get back to ll_file_io_generic(), where
>
>> Umm...  So am I right about the thing returning 0 in this case and in
>> case of ->aio_read()?
>
> Yes, cio_init() can return one of the following integers:
>  = 0: IO can go forward
>  > 0: IO is not necessary
>  < 0: error occurs
>
> So it will call aio_read() if cl_io_init() returns zero.

Er...  The question is the other way round, actually - will it return
zero when we arrive there from ll_file_aio_read()?  Or could it happen
that it returns something positive in that case?

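For reference, the three-way return convention described in the quoted
text (zero: go ahead, positive: nothing to do, negative: error) can be
sketched in plain user-space C as below.  This is only an illustration
of the convention, not Lustre code; the enum and the function name are
hypothetical.

```c
/*
 * Hypothetical names; this only mirrors the convention quoted above:
 *   rc == 0  the IO can go forward
 *   rc  > 0  the IO is not necessary
 *   rc  < 0  an error occurred (rc is a negative errno-style value)
 */
enum sketch_io_action {
    SKETCH_IO_PROCEED,
    SKETCH_IO_SKIP,
    SKETCH_IO_FAIL
};

/* Map an init-style return code to what the caller should do next. */
enum sketch_io_action sketch_classify_init(int rc)
{
    if (rc < 0)
        return SKETCH_IO_FAIL;
    if (rc > 0)
        return SKETCH_IO_SKIP;
    return SKETCH_IO_PROCEED;
}
```

Under this convention a caller only reaches the actual read or write
when the result is SKETCH_IO_PROCEED; whether a positive value can
actually occur on the read path is exactly the open question above.
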
> Let’s take an example here, say a file has two stripes on OST0 and
> OST1, and stripe size is 1M, so when 2M data is to be written to this
> file starting from zero, then the above code will loop twice to finish
> this IO.
> The first iteration will write [0, 1M) to OST0, and the 2nd one will
> write [1M, 2M) to OST1. As a result, generic_file_aio_write() will be
> called twice specifically.

_Twice_?  Oh, I see... ccc_io_advance(), right?  With iov_iter the whole
->cio_advance() thing would die, AFAICS.

> Yes. But as you have already noticed, the ppos in ll_file_io_generic()
> is NOT always &iocb->ki_pos. So it isn’t totally wrong to have
>
> “*ppos = io->u.ci_wr.wr.crw_pos;”
>
> in ll_file_io_generic(), take a look at ll_file_splice_read().

->splice_read/->splice_write is another part of that story; eventually,
I hope to get rid of those guys, when we get polymorphic iov_iter sorted
out.  Short version of the story: turn iov_iter into a tagged structure
with variants; primitives working with it would check the tag to decide
what to do.  Think of *BSD struct uio on steroids; AFAIK, Solaris has
kept it as well.

One of the cases would be iovec-based, with or without splitting the
kernel-space iovec into separate case.  That's pretty much what *BSD one
provides and what iov_iter does right now.

Another case: array of triples.  ->splice_write() would simply set such
an array for all non-empty pipe_buffers and pass it to ->write_iter().
That gives a usable generic implementation, suitable for pretty much all
filesystems.  _Maybe_ it can even be taught to deal with page stealing,
in a way that would allow to kill ->splice_write() as a method - I didn't
get to the point where it would be easy to investigate.

->splice_read() ought to use another iov_iter case, where
copy_page_to_iter() would grab a reference to page and plain
copy_to_iter() would allocate a page if needed and copy over there.

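A rough user-space sketch of the "tagged structure with variants" idea,
to make the shape concrete.  Everything here is hypothetical (the type
names, the tags, the fields); it is not the kernel's struct iov_iter and
not a proposed API.  The iovec-like variant is modelled as (base, len)
pairs, and the triple variant as (buffer, offset, length) with a plain
buffer standing in for a page.  The one copy primitive checks the tag to
find the current segment and otherwise does not care which variant it
was handed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tags, one per variant. */
enum sketch_iter_type { SKETCH_ITER_VEC, SKETCH_ITER_TRIPLE };

/* iovec-like segment: start address and length. */
struct sketch_vec { const char *base; size_t len; };

/* Triple segment: buffer (stand-in for a page), offset, length. */
struct sketch_triple { const char *buf; size_t off; size_t len; };

struct sketch_iter {
    enum sketch_iter_type type; /* which union member is live */
    size_t nr_segs;             /* segments not yet fully consumed */
    size_t seg_done;            /* bytes consumed from current segment */
    union {
        const struct sketch_vec *vec;
        const struct sketch_triple *triple;
    } u;
};

/* Start and length of the current segment, chosen by tag. */
static const char *sketch_cur(const struct sketch_iter *it, size_t *len)
{
    switch (it->type) {
    case SKETCH_ITER_VEC:
        *len = it->u.vec->len;
        return it->u.vec->base;
    case SKETCH_ITER_TRIPLE:
        *len = it->u.triple->len;
        return it->u.triple->buf + it->u.triple->off;
    }
    *len = 0;
    return NULL;
}

/* Step to the next segment of whichever array is live. */
static void sketch_next(struct sketch_iter *it)
{
    if (it->type == SKETCH_ITER_VEC)
        it->u.vec++;
    else
        it->u.triple++;
    it->nr_segs--;
    it->seg_done = 0;
}

/* Copy up to n bytes out of the iterator; returns bytes copied. */
size_t sketch_copy_from_iter(void *dst, size_t n, struct sketch_iter *it)
{
    char *out = dst;
    size_t copied = 0;

    while (copied < n && it->nr_segs > 0) {
        size_t seg_len;
        const char *seg = sketch_cur(it, &seg_len);
        size_t avail = seg_len - it->seg_done;
        size_t chunk = n - copied < avail ? n - copied : avail;

        memcpy(out + copied, seg + it->seg_done, chunk);
        copied += chunk;
        it->seg_done += chunk;
        if (it->seg_done == seg_len)
            sketch_next(it);
    }
    return copied;
}
```

In this toy model a splice-write-like caller would fill an array of
triples and hand it to the same primitive a write-like caller uses with
the iovec-like variant; only the tag differs.
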
Hell knows; I'm still dealing with preliminary messes and with
splice_write side of things.  We'll see what set of primitives will it
shake down to.  So far it's been spread between several fsdevel threads
(and some bits of off-list mail when potentially nasty bugs got caught).
I hope to get enough of the queue stable enough and large enough to make
it worth discussing.  Hopefully today or tomorrow there'll be enough
meat on it...  In any case I'm going to bring it up on LSF/MM.