From: Mike Marshall
Date: Tue, 2 Oct 2018 13:58:38 -0400
Subject: Re: [PATCH 00/17] orangefs: page cache
To: adilger@dilger.ca
Cc: Mike Marshall, linux-fsdevel

That seems like one of several writeback errors that might occur...

It looks to me like our code would set PG_error through SetPageError,
and then the error would be returned to the application through close
or fsync, which seems a little late.

I guess this is the kind of writeback error "mess" that Jeff Layton
was talking about at LSF/MM a couple of years ago...

-Mike

On Mon, Oct 1, 2018 at 4:03 PM Andreas Dilger wrote:
>
> > On Sep 20, 2018, at 12:31 PM, Mike Marshall wrote:
> >
> > Using the page cache seems like a game changer for the Orangefs kernel module.
> > Workloads with small IO suffer trying to push a parallel filesystem
> > with just a handful of bytes at a time. Below, vm2 with Fedora's 4.17
> > has /pvfsmnt mounted from an Orangefs filesystem that is itself running
> > on vm2. vm1 with 4.19.0-rc2 plus the Orangefs page cache patch also has
> > its /pvfsmnt mounted from a local Orangefs filesystem.
>
> Is there some mechanism to prevent the client cache size exceeding the amount
> of free space on the filesystem? If not, then the client may write data that
> can never be flushed to disk on the server.
>
> Cheers, Andreas
>
> > [vm2]$ dd if=/dev/zero of=/pvfsmnt/d.vm2/d.foo/dds.out bs=128 count=4194304
> > 4194304+0 records in
> > 4194304+0 records out
> > 536870912 bytes (537 MB, 512 MiB) copied, 662.013 s, 811 kB/s
> >
> > [vm1]$ dd if=/dev/zero of=/pvfsmnt/d.vm1/d.foo/dds.out bs=128 count=4194304
> > 4194304+0 records in
> > 4194304+0 records out
> > 536870912 bytes (537 MB, 512 MiB) copied, 11.3072 s, 47.5 MB/s
> >
> > Small IO collects in the page cache until a reasonable amount of
> > data is available for writeback.
> >
> > The trick, it seems, is to improve small IO without harming large IO.
> > Aligning writeback sizes, when possible, with the size of the IO buffer
> > that the Orangefs kernel module shares with its userspace component seems
> > promising in my dinky vm tests.
> >
> > -Mike
> >
> > On Mon, Sep 17, 2018 at 4:11 PM Martin Brandenburg wrote:
> >>
> >> If no major issues are found in review or in our testing, we intend to
> >> submit this during the next merge window.
> >>
> >> The goal of all this is to significantly reduce the number of network
> >> requests made to the OrangeFS server.
> >>
> >> First, the xattr cache is needed because otherwise we make a ton of
> >> getxattr calls from security_inode_need_killpriv.
> >>
> >> Then there's some reorganization so inode changes can be cached.
> >> Finally, we enable write_inode.
> >>
> >> Then remove the old readpages. Next there's some reorganization to
> >> support readpage/writepage. Finally, enable readpage/writepage, which
> >> is fairly straightforward except for the need to separate writes from
> >> different uid/gid pairs due to the design of our server.
> >>
> >> Martin Brandenburg (17):
> >>   orangefs: implement xattr cache
> >>   orangefs: do not invalidate attributes on inode create
> >>   orangefs: simplify orangefs_inode_getattr interface
> >>   orangefs: update attributes rather than relying on server
> >>   orangefs: hold i_lock during inode_getattr
> >>   orangefs: set up and use backing_dev_info
> >>   orangefs: let setattr write to cached inode
> >>   orangefs: reorganize setattr functions to track attribute changes
> >>   orangefs: remove orangefs_readpages
> >>   orangefs: service ops done for writeback are not killable
> >>   orangefs: migrate to generic_file_read_iter
> >>   orangefs: implement writepage
> >>   orangefs: skip inode writeout if nothing to write
> >>   orangefs: write range tracking
> >>   orangefs: avoid fsync service operation on flush
> >>   orangefs: use kmem_cache for orangefs_write_request
> >>   orangefs: implement writepages
> >>
> >>  fs/orangefs/acl.c             |   4 +-
> >>  fs/orangefs/file.c            | 193 ++++--------
> >>  fs/orangefs/inode.c           | 576 +++++++++++++++++++++++++++-------
> >>  fs/orangefs/namei.c           |  41 ++-
> >>  fs/orangefs/orangefs-cache.c  |  24 +-
> >>  fs/orangefs/orangefs-kernel.h |  56 +++-
> >>  fs/orangefs/orangefs-mod.c    |  10 +-
> >>  fs/orangefs/orangefs-utils.c  | 181 +++++------
> >>  fs/orangefs/super.c           |  38 ++-
> >>  fs/orangefs/waitqueue.c       |  18 +-
> >>  fs/orangefs/xattr.c           | 104 ++++++
> >>  11 files changed, 839 insertions(+), 406 deletions(-)
> >>
> >> --
> >> 2.19.0
> >>
>
> Cheers, Andreas