From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Maxim V. Patlasov" Subject: Re: [fuse-devel] [PATCH 14/14] mm: Account for WRITEBACK_TEMP in balance_dirty_pages Date: Fri, 26 Apr 2013 12:32:24 +0400 Message-ID: <517A3B98.807@parallels.com> References: <20130401103749.19027.89833.stgit@maximpc.sw.ru> <20130401104250.19027.27795.stgit@maximpc.sw.ru> <51793DE6.3000503@parallels.com> <517956ED.7060102@parallels.com> <20130425204331.GB16238@tucsk.piliscsaba.szeredi.hu> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: quoted-printable Cc: Kirill Korotaev , Pavel Emelianov , "fuse-devel@lists.sourceforge.net" , Kernel Mailing List , James Bottomley , Al Viro , Linux-Fsdevel , , Andrew Morton , , , , , , , Andrew Morton To: Miklos Szeredi Return-path: In-Reply-To: <20130425204331.GB16238@tucsk.piliscsaba.szeredi.hu> Sender: owner-linux-mm@kvack.org List-Id: linux-fsdevel.vger.kernel.org Hi Miklos, 04/26/2013 12:43 AM, Miklos Szeredi =D0=BF=D0=B8=D1=88=D0=B5=D1=82: > On Thu, Apr 25, 2013 at 08:16:45PM +0400, Maxim V. Patlasov wrote: >> As Mel Gorman pointed out, fuse daemon diving into >> balance_dirty_pages should not kick flusher judging on >> NR_WRITEBACK_TEMP. Essentially, all we need in balance_dirty_pages >> is: >> >> if (I'm not fuse daemon) >> nr_dirty +=3D global_page_state(NR_WRITEBACK_TEMP); > I strongly dislike the above. The above was well-discussed on mm track of LSF/MM. Everybody seemed to=20 agree with solution above. I'm cc-ing some guys who were involved in=20 discussion, mm mailing list and Andrew as well. For those who don't=20 follow from the beginning here is an excerpt: > 04/25/2013 07:49 PM, Miklos Szeredi =D0=BF=D0=B8=D1=88=D0=B5=D1=82: >> On Thu, Apr 25, 2013 at 4:29 PM, Maxim V. Patlasov >> wrote: >>>> diff --git a/mm/page-writeback.c b/mm/page-writeback.c >>>> index 0713bfb..c47bcd4 100644 >>>> --- a/mm/page-writeback.c >>>> +++ b/mm/page-writeback.c >>>> @@ -1235,7 +1235,8 @@ static void balance_dirty_pages(struct address= _space >>>> *mapping, >>>> */ >>>> nr_reclaimable =3D global_page_state(NR_FILE_DIRTY= ) + >>>> >>>> global_page_state(NR_UNSTABLE_NFS); >>>> - nr_dirty =3D nr_reclaimable + >>>> global_page_state(NR_WRITEBACK); >>>> + nr_dirty =3D nr_reclaimable + >>>> global_page_state(NR_WRITEBACK) + >>>> + global_page_state(NR_WRITEBACK_TEMP); >>>> global_dirty_limits(&background_thresh, &dirty_thr= esh); >>> Please drop this patch. As we discussed in LSF/MM, the fix above is c= orrect, >>> but it's not enough: we also need to ensure disregard of NR_WRITEBACK= _TEMP >>> when balance_dirty_pages() is called from fuse daemon. I'll send a se= parate >>> patch-set soon. >> Please elaborate. From a technical perspective "fuse daemon" is very >> hard to define, so anything that relies on whether something came from >> the fuse daemon or not is conceptually broken. > As Mel Gorman pointed out, fuse daemon diving into balance_dirty_pages > should not kick flusher judging on NR_WRITEBACK_TEMP. Essentially, all > we need in balance_dirty_pages is: > > if (I'm not fuse daemon) > nr_dirty +=3D global_page_state(NR_WRITEBACK_TEMP); > > The way how to identify fuse daemon was not thoroughly scrutinized > during LSF/MM. Firstly, I thought it would be enough to set a > per-process flag handling fuse device open. But now I understand that > fuse daemon may be quite a complicated multi-threaded multi-process > construction. I'm going to add new FUSE_NOTIFY to allow fuse daemon > decide when it works on behalf of draining writeout-s. Having in mind > that fuse-lib is multi-threaded, I'm also going to inherit the flag on > copy_process(). Does it make sense for you? > > Also, another patch will put this ad-hoc FUSE_NOTIFY under fusermount > control. This will prevent malicious unprivileged fuse mounts from > setting the flag for malicious purposes. And returning back to the last Miklos' mail... > > What about something like the following untested patch? > > The idea is that fuse filesystems should not go over the bdi limit even= if the > global limit hasn't been reached. This might work, but kicking flusher every time someone write to fuse=20 mount and dives into balance_dirty_pages looks fishy. However, setting=20 ad-hoc inode flag for files on fuse makes much more sense than my=20 approach of identifying fuse daemons (a feeble hope that userspace=20 daemons would notify in-kernel fuse saying "I'm fuse daemon, please=20 disregard NR_WRITEBACK_TEMP for me"). Let's combine our suggestions:=20 mark fuse inodes with AS_FUSE_WRITEBACK flag and convert what you=20 strongly dislike above to: if (test_bit(AS_FUSE_WRITEBACK, &mapping->flags)) nr_dirty +=3D global_page_state(NR_WRITEBACK_TEMP); Thanks, Maxim > > Thanks, > Miklos > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index 137185c..195ee45 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -291,6 +291,7 @@ struct inode *fuse_iget(struct super_block *sb, u64= nodeid, > inode->i_flags |=3D S_NOATIME|S_NOCMTIME; > inode->i_generation =3D generation; > inode->i_data.backing_dev_info =3D &fc->bdi; > + set_bit(AS_STRICTLIMIT, &inode->i_data.flags); > fuse_init_inode(inode, attr); > unlock_new_inode(inode); > } else if ((inode->i_mode ^ attr->mode) & S_IFMT) { > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > index 0e38e13..97f6a0c 100644 > --- a/include/linux/pagemap.h > +++ b/include/linux/pagemap.h > @@ -25,6 +25,7 @@ enum mapping_flags { > AS_MM_ALL_LOCKS =3D __GFP_BITS_SHIFT + 2, /* under mm_take_all_locks= () */ > AS_UNEVICTABLE =3D __GFP_BITS_SHIFT + 3, /* e.g., ramdisk, SHM_LOCK = */ > AS_BALLOON_MAP =3D __GFP_BITS_SHIFT + 4, /* balloon page special ma= p */ > + AS_STRICTLIMIT =3D __GFP_BITS_SHIFT + 5, /* strict dirty limit */ > }; > =20 > static inline void mapping_set_error(struct address_space *mapping, i= nt error) > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index efe6814..91a9e6e 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -1226,6 +1226,7 @@ static void balance_dirty_pages(struct address_sp= ace *mapping, > unsigned long dirty_ratelimit; > unsigned long pos_ratio; > struct backing_dev_info *bdi =3D mapping->backing_dev_info; > + int strictlimit =3D test_bit(AS_STRICTLIMIT, &mapping->flags); > unsigned long start_time =3D jiffies; > =20 > for (;;) { > @@ -1250,7 +1251,7 @@ static void balance_dirty_pages(struct address_sp= ace *mapping, > */ > freerun =3D dirty_freerun_ceiling(dirty_thresh, > background_thresh); > - if (nr_dirty <=3D freerun) { > + if (nr_dirty <=3D freerun && !strictlimit) { > current->dirty_paused_when =3D now; > current->nr_dirtied =3D 0; > current->nr_dirtied_pause =3D > @@ -1297,7 +1298,7 @@ static void balance_dirty_pages(struct address_sp= ace *mapping, > } > =20 > dirty_exceeded =3D (bdi_dirty > bdi_thresh) && > - (nr_dirty > dirty_thresh); > + ((nr_dirty > dirty_thresh) || strictlimit); > if (dirty_exceeded && !bdi->dirty_exceeded) > bdi->dirty_exceeded =3D 1; > =20 > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org