From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Maxim V. Patlasov"
Subject: Re: [PATCH v2 00/14] fuse: An attempt to implement a write-back cache policy
Date: Tue, 15 Jan 2013 19:20:42 +0400
Message-ID: <50F573CA.90402@parallels.com>
References: <20121116170123.3196.93431.stgit@maximpc.sw.ru> <50C89A78.4010309@parallels.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Kirill Korotaev, "fuse-devel@lists.sourceforge.net", "linux-kernel@vger.kernel.org", James Bottomley, "viro@zeniv.linux.org.uk", "linux-fsdevel@vger.kernel.org", Pavel Emelianov
To: "miklos@szeredi.hu"
Return-path:
In-Reply-To: <50C89A78.4010309@parallels.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

Hi Miklos,

12/12/2012 06:53 PM, Maxim V. Patlasov wrote:
> Hi Miklos,
>
> 11/16/2012 09:04 PM, Maxim Patlasov wrote:
>> Hi,
>>
>> This is the second iteration of Pavel Emelyanov's patch-set implementing
>> write-back policy for FUSE page cache. Initial patch-set description was
>> the following:
>>
>> One of the problems with the existing FUSE implementation is that it uses
>> the write-through cache policy, which results in performance problems on
>> certain workloads. E.g. when copying a big file into a FUSE file, cp
>> pushes every 128k to the userspace synchronously. This becomes a problem
>> when the userspace back-end uses networking for storing the data.
>>
>> A good solution to this is switching the FUSE page cache to a write-back
>> policy. With this, file data are pushed to the userspace in big chunks
>> (depending on the dirty memory limits, but this is much more than 128k),
>> which lets the FUSE daemons handle the size updates in a more efficient
>> manner.
>>
>> The writeback feature is per-connection and is explicitly configurable at
>> the init stage (is it worth making it CAP_SOMETHING protected?). When the
>> writeback is turned ON:
>>
>> * still copy writeback pages to temporary buffer when sending a writeback
>>   request and finish the page writeback immediately
>>
>> * make kernel maintain the inode's i_size to avoid frequent i_size
>>   synchronization with the user space
>>
>> * take NR_WRITEBACK_TEMP into account when making balance_dirty_pages
>>   decision. This protects us from having too many dirty pages on FUSE
>>
>> The provided patchset survives the fsx test. Performance measurements are
>> not yet all finished, but the mentioned copying of a huge file becomes
>> noticeably faster even on machines with little RAM and doesn't make the
>> system stuck (the dirty pages balancer does its work OK). Applies on top
>> of v3.5-rc4.
>>
>> We are currently exploring this with our own distributed storage
>> implementation, which is heavily oriented toward storing big blobs of data
>> with extremely rare meta-data updates (virtual machines' and containers'
>> disk images). With the existing cache policy a typical usage scenario --
>> copying a big VM disk into a cloud -- takes way too much time to proceed,
>> much longer than if it was simply scp-ed over the same network. The
>> write-back policy (as I mentioned) noticeably improves this scenario.
>> Kirill (in Cc) can share more details about the performance and the
>> storage concepts if required.
>>
>> Changed in v2:
>>  - numerous bugfixes:
>>    - fuse_write_begin, fuse_writepages_fill and fuse_writepage_locked
>>      must wait on page writeback because page writeback can extend beyond
>>      the lifetime of the page-cache page
>>    - fuse_send_writepages can end_page_writeback on original page only
>>      after adding request to fi->writepages list; otherwise another
>>      writeback may happen inside the gap between end_page_writeback and
>>      adding to the list
>>    - fuse_direct_io must wait on page writeback; otherwise data corruption
>>      is possible due to reordering requests
>>    - fuse_flush must flush dirty memory and wait for all writeback on
>>      given inode before sending FUSE_FLUSH to userspace; otherwise
>>      FUSE_FLUSH is not reliable
>>    - fuse_file_fallocate must hold i_mutex around FUSE_FALLOCATE and
>>      i_size update; otherwise a race with a writer extending i_size is
>>      possible
>>    - fix handling errors in fuse_writepages and fuse_send_writepages
>>  - handle i_mtime intelligently if writeback cache is on (see patch #7
>>    (update i_mtime on buffered writes) for details)
>>  - put enabling writeback cache under fusermount control (see mount
>>    option 'allow_wbcache' introduced by patch #13 (turn writeback cache
>>    on))
>>  - rebased on v3.7-rc5
>
> Any feedback on this version (v2) would be appreciated.

Heard nothing from you for two months. Any feedback would still be
appreciated.

Thanks,
Maxim

>
> Thanks,
> Maxim
>
>>
>> Thanks,
>> Maxim
>>
>> ---
>>
>> Maxim Patlasov (14):
>>       fuse: Linking file to inode helper
>>       fuse: Getting file for writeback helper
>>       fuse: Prepare to handle short reads
>>       fuse: Prepare to handle multiple pages in writeback
>>       fuse: Connection bit for enabling writeback
>>       fuse: Trust kernel i_size only
>>       fuse: Update i_mtime on buffered writes
>>       fuse: Flush files on wb close
>>       fuse: Implement writepages and write_begin/write_end callbacks
>>       fuse: fuse_writepage_locked() should wait on writeback
>>       fuse: fuse_flush() should wait on writeback
>>       fuse: Fix O_DIRECT operations vs cached writeback misorder
>>       fuse: Turn writeback cache on
>>       mm: Account for WRITEBACK_TEMP in balance_dirty_pages
>>
>>
>>  fs/fuse/dir.c             |   51 ++++
>>  fs/fuse/file.c            |  523 +++++++++++++++++++++++++++++++++++++++++---
>>  fs/fuse/fuse_i.h          |   20 ++
>>  fs/fuse/inode.c           |   98 ++++++++
>>  include/uapi/linux/fuse.h |    1
>>  mm/page-writeback.c       |    3
>>  6 files changed, 638 insertions(+), 58 deletions(-)
>>
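
For a rough feel of the request-count argument in the quoted description (write-through pushes every 128k to userspace synchronously, write-back pushes much bigger chunks), here is a back-of-the-envelope sketch. It is not part of the patch set. The 128 KiB figure comes from the description; the 4 MiB write-back chunk, the 1 GiB file size and the helper name requests_needed are assumptions made up for illustration only, since the real chunk size depends on the dirty memory limits.

```python
# Hypothetical model of how many userspace requests a file copy generates.
# Nothing here is taken from the FUSE sources; it is plain arithmetic.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

WRITE_THROUGH_CHUNK = 128 * KIB  # per-request size cited in the description
WRITE_BACK_CHUNK = 4 * MIB       # assumed value, purely illustrative


def requests_needed(file_size: int, chunk_size: int) -> int:
    """Return how many chunk_size-byte requests cover file_size bytes."""
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")
    # Ceiling division: a trailing partial chunk still costs one request.
    return (file_size + chunk_size - 1) // chunk_size


if __name__ == "__main__":
    size = 1 * GIB  # assumed file size
    print("write-through requests:", requests_needed(size, WRITE_THROUGH_CHUNK))
    print("write-back requests:   ", requests_needed(size, WRITE_BACK_CHUNK))
```

Under these assumed sizes the write-back case issues 32 times fewer requests, and each synchronous round trip saved matters most when the userspace back-end stores data over the network.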