From mboxrd@z Thu Jan 1 00:00:00 1970 From: Maxim Patlasov Subject: Re: [fuse-devel] [PATCH 2/4] fuse: writepages: crop secondary requests Date: Wed, 9 Oct 2013 19:37:11 +0400 Message-ID: <52557827.3020807@parallels.com> References: <20131002173701.31188.33547.stgit@dhcp-10-30-17-2.sw.ru> <20131002173823.31188.77171.stgit@dhcp-10-30-17-2.sw.ru> <20131003095749.GB14242@tucsk.piliscsaba.szeredi.hu> <524D70FE.5000701@parallels.com> <20131003151432.GE14242@tucsk.piliscsaba.szeredi.hu> <524D925A.8050402@parallels.com> <524D99BC.1030007@parallels.com> <525511CA.5010601@parallels.com> Mime-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit Cc: fuse-devel , Linux-Fsdevel , Kernel Mailing List To: "miklos@szeredi.hu" Return-path: In-Reply-To: <525511CA.5010601@parallels.com> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On 10/09/2013 12:20 PM, Maxim Patlasov wrote: > Hi Miklos, > > On 10/03/2013 08:22 PM, Maxim Patlasov wrote: >> On 10/03/2013 08:09 PM, Miklos Szeredi wrote: >>> On Thu, Oct 3, 2013 at 5:50 PM, Maxim Patlasov wrote: >>>> On 10/03/2013 07:14 PM, Miklos Szeredi wrote: >>>>> On Thu, Oct 03, 2013 at 05:28:30PM +0400, Maxim Patlasov wrote: >>>>> >>>>>> 1. There is an in-flight primary request with a chain of secondary ones. >>>>>> 2. User calls ftruncate(2) to extend file; fuse_set_nowrite() makes >>>>>> fi->writectr negative and starts waiting for completion of that >>>>>> in-flight request >>>>>> 3. Userspace fuse daemon ACKs the request and fuse_writepage_end() >>>>>> is called; it calls __fuse_flush_writepages(), but the latter does >>>>>> nothing because fi->writectr < 0 >>>>>> 4. fuse_do_setattr() proceeds extending i_size and calling >>>>>> __fuse_release_nowrite(). But now new (increased) i_size will be >>>>>> used as 'crop' arg of __fuse_flush_writepages() >>>>>> >>>>>> stale data can leak to the server. >>>>> So, lets do this then: skip fuse_flush_writepages() and call >>>>> fuse_send_writepage() directly. It will ignore the NOWRITE logic, but >>>>> that's >>>>> okay, this happens rarely and cannot happen more than once in a row. >>>>> >>>>> Does this look good? >>>> Yes, but let's at least add a comment explaining why it's safe. There are >>>> three different cases and what you write above explains only one of them: >>>> >>>> 1st case (trivial): there are no concurrent activities using >>>> fuse_set/release_nowrite. Then we're on safe side because >>>> fuse_flush_writepages() would call fuse_send_writepage() anyway. >>>> 2nd case: someone called fuse_set_nowrite and it is waiting now for >>>> completion of all in-flight requests. Here what you wrote about "happening >>>> rarely and no more than once" is applicable. >>>> 3rd case: someone (e.g. fuse_do_setattr()) is in the middle of >>>> fuse_set_nowrite..fuse_release_nowrite section. The fact that >>>> fuse_set_nowrite returned implies that all in-flight requests were completed >>>> along with all its secondary requests (because we increment writectr for a >>>> secondry before decrementing it for the primary -- that's how >>>> fuse_writepage_end is implemeted). Further requests are blocked by negative >>>> writectr. Hence there cannot be any in-flight requests and no invocations of >>>> fuse_writepage_end while we're in fuse_set_nowrite..fuse_release_nowrite >>>> section. >>>> >>>> It looks obvious now, but I'm not sure we'll able to recollect it later. >>> Added your analysis as a comment and all patches pushed to writepages.v2. >> Great! So I can proceed with re-basing the rest of >> writeback-cache-policy pile to writepages.v2 soon. > More testing (with writeback-cache-policy enabled) revealed another bug > in that implementation. The problem deals with a write(2) extending i_size: > > 1. There is an in-flight primary request now. It was properly cropped > against i_size which was valid then and is valid now. So there is a page > in the request that will be written to the server partially. > 2. write(2) to a distant offset makes a hole and extends i_size. > 3. write(2) populates that whole page by new user data. > 4. Writeback happens and fuse_writepage_in_flight() attaches a secondary > request to the primary request. > 5. fuse_writepage_end() for the primary request calls > fuse_send_writepage() with 'crop' arg equal to "inarg->offset + > inarg->size". But inarg->size was calculated before i_size extension, so > the second request will be cropped as well as primary. The result is > that the tail of secondary request populated by valid actual user data > won't be stored on the server. > > The problem will be hidden by adding fuse_wait_on_page_writeback() to > write_begin fuse method, but the implementation will remain unsafe if we > believe a re-dirty may happen spontaneously. Straightforward solution > would be to crop secondary requests at the time of their queuing (using > actual i_size). Then fuse_send_writepage() would crop further only if > i_size shrunk. Please let me know if you come up with a smarter idea. Sorry for flooding. I've just realized that the problem is actually solved (not "hidden") by adding fuse_wait_on_page_writeback() to write_begin fuse method. No need to rework cropping mechanism again. Thanks, Maxim