From: Andrew Morton <akpm@linux-foundation.org>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Fengguang Wu <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
linux-mm@kvack.org, Maxim Patlasov <MPatlasov@parallels.com>,
hmh@hmh.eng.br, mel@csn.ul.ie, t.artem@lycos.com,
"Theodore Ts'o" <tytso@mit.edu>, Jens Axboe <axboe@kernel.dk>,
linux-fsdevel@vger.kernel.org
Subject: Re: [patch 15/15] mm: add strictlimit knob
Date: Wed, 31 Jan 2018 14:58:44 -0800
Message-ID: <20180131145844.6f3ccb03a73266bebddea80c@linux-foundation.org>
In-Reply-To: <CAJfpeguJyyJ4ix2waeEh9nAC6YoWcxiepBH4iOc1_is3NYChEQ@mail.gmail.com>

On Thu, 7 Dec 2017 11:32:43 +0100 Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Thu, Dec 7, 2017 at 11:15 AM, Fengguang Wu <fengguang.wu@intel.com> wrote:
> > On Thu, Dec 07, 2017 at 09:50:23AM +0100, Miklos Szeredi wrote:
> >>
> >> On Thu, Dec 7, 2017 at 5:14 AM, Fengguang Wu <fengguang.wu@intel.com>
> >> wrote:
> >>>
> >>> CC fuse maintainer, too.
> >>>
> >>> On Wed, Dec 06, 2017 at 05:09:27PM -0800, Andrew Morton wrote:
> >>>>
> >>>>
> >>>> On Fri, 1 Dec 2017 13:29:28 +0100 Jan Kara <jack@suse.cz> wrote:
> >>>>
> >>>>> On Thu 30-11-17 14:15:58, Andrew Morton wrote:
> >>>>> > From: Maxim Patlasov <MPatlasov@parallels.com>
> >>>>> > Subject: mm: add strictlimit knob
> >>>>> >
> >>>>> > The "strictlimit" feature was introduced to enforce per-bdi dirty
> >>>>> > limits
> >>>>> > for FUSE which sets bdi max_ratio to 1% by default:
> >>>>> >
> >>>>> > http://article.gmane.org/gmane.linux.kernel.mm/105809
> >>>>> >
> >>>>> > However the feature can be useful for other relatively slow or
> >>>>> > untrusted
> >>>>> > BDIs like USB flash drives and DVD+RW. The patch adds a knob to
> >>>>> > enable
> >>>>> > the feature:
> >>>>> >
> >>>>> > echo 1 > /sys/class/bdi/X:Y/strictlimit
> >>>>> >
> >>>>> > Being enabled, the feature enforces bdi max_ratio limit even if
> >>>>> > global
> >>>>> > (10%) dirty limit is not reached. Of course, the effect is not
> >>>>> > visible
> >>>>> > until /sys/class/bdi/X:Y/max_ratio is decreased to some reasonable
> >>>>> > value.
> >>>>>
> >>>>> In principle I have nothing against this and the usecase sounds
> >>>>> reasonable
> >>>>> (in fact I believe the lack of a feature like this is one of reasons
> >>>>> why
> >>>>> desktop automounters usually mount USB devices with 'sync' mount
> >>>>> option).
> >>>>> So feel free to add:
> >>>>>
> >>>>> Reviewed-by: Jan Kara <jack@suse.cz>
> >>>>>
> >>>>
> >>>> Cc Jens, who may be vaguely interested in plans to finally merge this
> >>>> three-year-old patch?
> >>>>
> >>>>
> >>>>
> >>>> From: Maxim Patlasov <MPatlasov@parallels.com>
> >>>> Subject: mm: add strictlimit knob
> >>>>
> >>>> The "strictlimit" feature was introduced to enforce per-bdi dirty limits
> >>>> for FUSE which sets bdi max_ratio to 1% by default:
> >>>>
> >>>> http://article.gmane.org/gmane.linux.kernel.mm/105809
> >>>
> >>>
> >>>
> >>> That link is invalid for now, possibly due to the gmane site rebuild.
> >>> I find an email thread here which looks relevant:
> >>>
> >>> https://sourceforge.net/p/fuse/mailman/message/35254883/
> >>>
> >>> Where Maxim has an interesting point:
> >>>
> >>> > Did any one try increasing the limit and did see any better/worse
> >>> > performance ?
> >>>
> >>>
> >>> We've used 20% as default value in OpenVZ kernel for a long while
> >>> (1%
> >>> was not enough to saturate our distributed parallel storage).
> >>>
> >>> So the knob will also enable people to _disable_ the 1% fuse limit to
> >>> increase performance.
> >>>
> >>> So people can use the exposed knob in 2 ways to fit their needs, which
> >>> is in general a good thing.
> >>>
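The two uses of the knob described above can be sketched as a small helper. This is only an illustration, not part of the patch: the `strictlimit` file exists only with the proposed patch applied, the helper name `set_bdi_knob` and the device id `8:16` are hypothetical, and writing to sysfs needs root.

```python
from pathlib import Path


def set_bdi_knob(bdi, knob, value, root="/sys/class/bdi"):
    """Write `value` to <root>/<bdi>/<knob> and return the path written.

    `root` is a parameter only so the helper can be tried against a
    scratch directory instead of the real sysfs tree.
    """
    path = Path(root) / bdi / knob
    path.write_text(f"{value}\n")
    return path


# Hypothetical usage for a slow USB stick at bdi "8:16":
#   set_bdi_knob("8:16", "max_ratio", 5)    # cap the bdi's share of dirty memory
#   set_bdi_knob("8:16", "strictlimit", 1)  # enforce the cap below the global limit
#
# Hypothetical usage for a fuse mount where the default limit is too small:
#   set_bdi_knob("0:42", "strictlimit", 0)  # the case questioned in this thread
```

The second usage is exactly what the rest of the thread worries about, so it is shown for completeness rather than as a recommendation.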
> >>> However the comment in wb_position_ratio() says
> >>>
> >>>   * Without strictlimit feature, fuse writeback may
> >>>   * consume arbitrary amount of RAM because it is accounted in
> >>>   * NR_WRITEBACK_TEMP which is not involved in calculating "nr_dirty".
> >>>
> >>> How dangerous would that be if some user disabled the 1% fuse limit
> >>> through the exposed knob? Will the NR_WRITEBACK_TEMP effect go far
> >>> beyond the user's expectation (20% max dirty limit)?
> >>>
> >>> Looking at the fuse code, NR_WRITEBACK_TEMP will grow proportional to
> >>> WB_WRITEBACK, which should be throttled when bdi_write_congested().
> >>> The congested flag will be set on
> >>>
> >>> fuse_conn.num_background >= fuse_conn.congestion_threshold
> >>>
> >>> So it looks NR_WRITEBACK_TEMP will somehow be throttled. Just that
> >>> it's not included in the 20% dirty limit.
> >>
> >>
> >> Only balance_dirty_pages_ratelimited() is going to limit the
> >> generation of dirty pages, I don't think congestion flags will do
> >> that.
> >
> >
> > Right. However my concern is something to limit the generation of
> > fuse's _writeback_ pages.
> >
> > The normal writeback pages are limited in 2 ways:
> >
> > - balance_dirty_pages_ratelimited()'s dirty throttling:
> >
> > nr_dirty + nr_writeback + nr_unstable < global and/or bdi dirty limit
> >
> > - block layer's nr_requests queue limit
> >
> > However fuse's NR_WRITEBACK_TEMP looks special and has none of these
> > limits. The congested bit merely affects the vmscan pageout path.
> >
> > pageout
> > may_write_to_inode
> > inode_write_congested
> > wb_congested
> >
> > I wonder if fuse has its own approach to limit NR_WRITEBACK_TEMP?
> > Either explicitly or implicitly, there has to be some hard limit.
> >
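The concern above can be made concrete with a toy model of the throttling condition. This is a sketch under assumptions, not kernel code: the names mirror the counters quoted in the thread, and the `count_temp` flag is hypothetical. Whether temp pages effectively count towards the limit is the question this thread leaves open.

```python
from dataclasses import dataclass


@dataclass
class PageCounters:
    nr_dirty: int
    nr_writeback: int
    nr_unstable: int
    nr_writeback_temp: int = 0  # fuse temp pages, tracked separately


def over_dirty_limit(c, limit, count_temp=False):
    """Return True when the writer should be throttled.

    Models the quoted condition
        nr_dirty + nr_writeback + nr_unstable < limit
    as the state in which no throttling happens. `count_temp` is a
    hypothetical switch that adds NR_WRITEBACK_TEMP to the sum.
    """
    total = c.nr_dirty + c.nr_writeback + c.nr_unstable
    if count_temp:
        total += c.nr_writeback_temp
    return total >= limit


# Few dirty pages, but many pages parked in temp writeback.
counters = PageCounters(nr_dirty=100, nr_writeback=50, nr_unstable=0,
                        nr_writeback_temp=10_000)
print(over_dirty_limit(counters, limit=1_000))                   # temp pages ignored
print(over_dirty_limit(counters, limit=1_000, count_temp=True))  # temp pages counted
```

With temp pages left out of the sum, the writer in this model is never throttled no matter how large `nr_writeback_temp` grows, which is the unbounded case being asked about.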
> >> And (AFAICS) for fuse only BDI_CAP_STRICTLIMIT will allow
> >> accounting temp writeback pages when throttling dirty page generation.
> >> So without BDI_CAP_STRICTLIMIT kernel memory use of fuse may explode.
> >> So we probably need a way to force BDI_CAP_STRICTLIMIT (i.e. do not
> >> permit disabling it for fuse).
> >
> >
> > So fuse relies on small nr_dirty. Does fuse impose any explicit or
> > implicit rule that NR_WRITEBACK_TEMP will never exceed (N * nr_dirty)?
> > Otherwise the size of NR_WRITEBACK_TEMP cannot be guaranteed.
> >
> > For example, is it possible for some process (e.g. dd) to dirty pages
> > as fast as possible while some other kernel logic converts PG_dirty
> > to NR_WRITEBACK_TEMP as fast as possible, so that even the 1% bdi
> > strictlimit (which limits PG_dirty rather than NR_WRITEBACK_TEMP)
> > cannot stop all memory being eaten up by ever-growing NR_WRITEBACK_TEMP?
>
> Hmm, temp pages are still accounted as WB_WRITEBACK until writeback
> finishes. Does that not count towards the dirty limit?
>
This discussion died out and the patch is still "stuck" :(