From mboxrd@z Thu Jan 1 00:00:00 1970 From: Maxim Patlasov Subject: Re: [PATCH] fuse: hotfix truncate_pagecache() issue Date: Thu, 29 Aug 2013 17:01:06 +0400 Message-ID: <521F4612.9090708@parallels.com> References: <20130828121920.23965.78383.stgit@maximpc.sw.ru> <20130829092533.GA18044@tucsk.piliscsaba.szeredi.hu> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: , , , , , , To: Miklos Szeredi Return-path: In-Reply-To: <20130829092533.GA18044@tucsk.piliscsaba.szeredi.hu> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org Hi, 08/29/2013 01:25 PM, Miklos Szeredi =D0=BF=D0=B8=D1=88=D0=B5=D1=82: > On Wed, Aug 28, 2013 at 04:21:46PM +0400, Maxim Patlasov wrote: >> The way how fuse calls truncate_pagecache() from fuse_change_attribu= tes() >> is completely wrong. Because, w/o i_mutex held, we never sure whethe= r >> 'oldsize' and 'attr->size' are valid by the time of execution of >> truncate_pagecache(inode, oldsize, attr->size). In fact, as soon as = we >> released fc->lock in the middle of fuse_change_attributes(), we comp= letely >> loose control of actions which may happen with given inode until we = reach >> truncate_pagecache. The list of potentially dangerous actions includ= es mmap-ed >> reads and writes, ftruncate(2) and write(2) extending file size. >> >> The typical outcome of doing truncate_pagecache() with outdated argu= ments is >> data corruption from user point of view. This is (in some sense) acc= eptable >> in cases when the issue is triggered by a change of the file on the = server >> (i.e. externally wrt fuse operation), but it is absolutely intolerab= le in >> scenarios when a single fuse client modifies a file without any exte= rnal >> intervention. A real life case I discovered by fsx-linux looked like= this: >> >> 1. Shrinking ftruncate(2) comes to fuse_do_setattr(). The latter sen= ds >> FUSE_SETATTR to the server synchronously, but before getting fc->loc= k ... >> 2. fuse_dentry_revalidate() is asynchronously called. It sends FUSE_= LOOKUP >> to the server synchronously, then calls fuse_change_attributes(). Th= e latter >> updates i_size, releases fc->lock, but before comparing oldsize vs a= ttr->size.. >> 3. fuse_do_setattr() from the first step proceeds by acquiring fc->l= ock and >> updating attributes and i_size, but now oldsize is equal to outarg.a= ttr.size >> because i_size has just been updated (step 2). Hence, fuse_do_setatt= r() >> returns w/o calling truncate_pagecache(). >> 4. As soon as ftruncate(2) completes, the user extends file size by = write(2) >> making a hole in the middle of file, then reads data from the hole e= ither by >> read(2) or mmap-ed read. The user expects to get zero data from the = hole, but >> gets stale data because truncate_pagecache() is not executed yet. >> >> The patch is a hotfix resolving the issue in a simplistic way: let's= skip >> dangerous i_size update and truncate_pagecache if an operation chang= ing file >> size is in progress. This simplistic approach looks correct for the = cases >> w/o external changes. And to handle them properly, more sophisticate= d and >> intrusive techniques (e.g. NFS-like one) would be required. I'd like >> to postpone it until the issue is well discussed on the mailing list= (s). > Thanks for the analysis! > > Okay, what about this even more simplistic approach? > > Not tested, but I think it addresses the very crux of the issue: not = truncating > the page cache even though we should. The patch looks fine, but it solves only one side of the problem=20 (exactly what you formulated above). Another side is opposite:=20 truncating page cache too late, when the state of inode changed=20 significantly. The beginning may be as in the scenario above:=20 fuse_dentry_revalidate() discovered that i_size changed (due to our own= =20 fuse_do_setattr()) and is going to call truncate_pagecache() for some=20 'new_size' it believes valid right now. But by the time that particular= =20 truncate_pagecache() is called, a lot of things may happen: 1) fuse_do_setattr() called truncate_pagecache() according to your patc= h 2) the file was extended either by write(2) or ftruncate(2) or fallocat= e(2). 3) mmap-ed write make a page in the extended region dirty The result will be the lost of data user wrote on the step '3)'. (my=20 patch solves the issue at least for all cases w/o external changes) > > AFAICS there's no such issue with write(2) or fallocate(2). But I ha= ven't > thought about this very deeply. I added bits to the fuse_perform_write() to address that other side of=20 the issue. Fixing fallocate is also required, but I postponed it until=20 you include that another fix for fallocate (incorrect use of=20 fuse_set_nowrite()). Thanks, Maxim