linux-man.vger.kernel.org archive mirror
From: Minchan Kim <minchan-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Anton Vorontsov
	<anton.vorontsov-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Cc: Mel Gorman <mgorman-l3A5Bk7waGM@public.gmane.org>,
	Pekka Enberg <penberg-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Leonid Moiseichuk
	<leonid.moiseichuk-xNZwKgViW5gAvxtiuMwx3w@public.gmane.org>,
	KOSAKI Motohiro
	<kosaki.motohiro-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Bartlomiej Zolnierkiewicz
	<b.zolnierkie-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>,
	John Stultz <john.stultz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linaro-kernel-cunTk1MwBs8s++Sfvej+rw@public.gmane.org,
	patches-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org,
	kernel-team-z5hGa2qSFaRBDgjK7y7TUQ@public.gmane.org,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [RFC v2 0/2] vmevent: A bit reworked pressure attribute + docs + man page
Date: Fri, 26 Oct 2012 11:37:20 +0900
Message-ID: <20121026023720.GE15767@bbox>
In-Reply-To: <20121025090813.GA16078@lizard>

On Thu, Oct 25, 2012 at 02:08:14AM -0700, Anton Vorontsov wrote:
> Hello Minchan,
> 
> Thanks a lot for the email!
> 
> On Thu, Oct 25, 2012 at 03:40:09PM +0900, Minchan Kim wrote:
> [...]
> > > What applications (well, activity managers) are really interested in is
> > > this:
> > > 
> > > 1. Do we sacrifice resources for new memory allocations (e.g. file
> > >    cache)?
> > > 2. Does the cost of new memory allocations become too high, and does
> > >    the system suffer because of it?
> > > 3. Are we about to OOM soon?
> > 
> > Good but I think 3 is never easy.
> > But early notification would be better than late notification which can kill
> > someone.
> 
> Well, basically these are two fixed (strictly defined) levels (low and
> oom) + one flexible level (med), whose meaning can be slightly tuned (but
> we still have a meaningful definition for it).
> 

I mean detection of "3) Are we about to OOM soon" isn't easy.

> So, I guess it's a good start. :)

Absolutely!

> 
> > > And here are the answers:
> > > 
> > > 1. VMEVENT_PRESSURE_LOW
> > > 2. VMEVENT_PRESSURE_MED
> > > 3. VMEVENT_PRESSURE_OOM
> > > 
> > > There is no "high" pressure, since I really don't see any definition of
> > > it, but it's possible to introduce new levels without breaking ABI. The
> > > levels are described in more detail in the patches, and the stuff is still
> > > tunable, but now via sysctls, not the vmevent_fd() call itself (i.e. we
> > > don't need to rebuild applications to adjust window size or other mm
> > > "details").
> > > 
> > > What I couldn't fix in this RFC is making vmevent_{scanned,reclaimed}
> > > stuff per-CPU (there's a comment describing the problem with this). But I
> > > made it lockless and tried to make it very lightweight (plus I moved the
> > > vmevent_pressure() call to a more "cold" path).
> > 
> > Your description doesn't explain why we need a new vmevent_fd(2).
> > Of course, it's very flexible and makes it easy to add new VM knobs, but
> > the only thing we are about to use now is VMEVENT_ATTR_PRESSURE.
> > Are there any other use cases for swap or free? Or any potential users?
> 
> The number of idle pages by itself might not be that interesting, but the
> cache+idle level is quite interesting.
> 
> By definition, _MED happens when performance has already degraded, slightly,
> but still -- we may be swapping.
> 
> But _LOW notifications are coming when kernel is just reclaiming, so by
> using _LOW notifications + watching for cache level we can very easily
> predict the swapping activity long before we have even _MED pressure.

So, to see the cache level, we need a new vmevent_attr?

> 
> E.g. if idle+cache drops below amount of memory that userland can free,
> we'd indeed like to start freeing stuff (this somewhat resembles current
> logic that we have in the in-kernel LMK).
> 
> Sure, we can read and parse /proc/vmstat upon _LOW events (and that was my
> backup plan), but reporting stuff together would make things much nicer.

My concern is that users can imagine various scenarios with vmstat, and they might
start to require new vmevent_attrs in the future. Then vmevent_fd would become bloated,
and mm developers would have to take care of vmevent_fd whenever they add a new vmstat
counter. I don't like that. Users can do it by just reading /proc/vmstat, so I support
your backup plan.
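
The backup plan (reading and parsing /proc/vmstat when a _LOW event arrives)
could look roughly like the sketch below. This is only an illustration, not
code from the patches: the helper names vmstat_lookup() and vmstat_read() are
hypothetical, and counter names such as nr_free_pages or nr_file_pages are
given as examples only, since the set of fields depends on the kernel version.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up one "name value" counter in text that has
 * the /proc/vmstat layout. Returns 0 and stores the value on success,
 * -1 if the key is not present.
 */
int vmstat_lookup(const char *text, const char *key,
		  unsigned long long *value)
{
	char name[64];
	unsigned long long v;
	const char *line = text;

	assert(text != NULL && key != NULL && value != NULL);

	while (*line != '\0') {
		if (sscanf(line, "%63s %llu", name, &v) == 2 &&
		    strcmp(name, key) == 0) {
			*value = v;
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line == NULL)
			break;
		line++;
	}
	return -1;
}

/*
 * Hypothetical helper: read /proc/vmstat into buf as a NUL-terminated
 * string. Returns the number of bytes read, or -1 on error.
 */
long vmstat_read(char *buf, size_t size)
{
	FILE *fp;
	size_t n;

	if (buf == NULL || size == 0)
		return -1;
	fp = fopen("/proc/vmstat", "r");
	if (fp == NULL)
		return -1;
	n = fread(buf, 1, size - 1, fp);
	fclose(fp);
	buf[n] = '\0';
	return (long)n;
}
```

On a _LOW notification, userland would call vmstat_read() once and then
vmstat_lookup() for each counter it cares about, so no new attribute has to
be added to the kernel interface for every counter.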


> 
> Although, I somewhat doubt that it is OK to report raw numbers, so this
> needs some thinking to develop a more elegant solution.

Indeed.

> 
> Maybe it makes sense to implement something like PRESSURE_MILD with an
> additional nr_pages threshold, which basically hints the kernel about how
> many easily reclaimable pages userland has (that would be a part of our
> definition for the mild pressure level). So, essentially it will be
> 
> 	if (pressure_index >= oom_level)
> 		return PRESSURE_OOM;
> 	else if (pressure_index >= med_level)
> 		return PRESSURE_MEDIUM;
> 	else if (userland_reclaimable_pages >= nr_reclaimable_pages)
> 		return PRESSURE_MILD;
> 	return PRESSURE_LOW;
> 
> I must admit I like the idea more than exposing NR_FREE and stuff, but the
> scheme reminds me of the blended attributes, which we abandoned. Although,
> the definition sounds better now, and we seem to be doing it in the right
> place.
> 
> And if we go this way, then sure, we won't need any other attributes, and
> so we could make the API much simpler.

That's what I want! If there isn't any user who is really willing to use it,
let's drop it. Let's not argue from imaginary scenarios, because we should be
careful about introducing new ABI.
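
For reference, the level selection quoted above can be written as a small
compilable function. This is a sketch under stated assumptions, not the
implementation from the RFC: the struct, the function name and the field
names are hypothetical, and any threshold values passed in are made up for
illustration (the RFC says the real ones are tunable via sysctls).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical level names mirroring the quoted pseudocode. */
enum pressure_level {
	PRESSURE_LOW,
	PRESSURE_MILD,
	PRESSURE_MEDIUM,
	PRESSURE_OOM,
};

/* Hypothetical tunables; names and layout are assumptions. */
struct pressure_cfg {
	unsigned int med_level;	/* pressure index where MEDIUM starts */
	unsigned int oom_level;	/* pressure index where OOM starts */
	/* Hint from userland: how many pages it can easily free. */
	unsigned long userland_reclaimable_pages;
};

enum pressure_level classify_pressure(const struct pressure_cfg *cfg,
				      unsigned int pressure_index,
				      unsigned long nr_reclaimable_pages)
{
	assert(cfg != NULL);
	assert(cfg->med_level <= cfg->oom_level);

	if (pressure_index >= cfg->oom_level)
		return PRESSURE_OOM;
	if (pressure_index >= cfg->med_level)
		return PRESSURE_MEDIUM;
	/*
	 * The kernel's easily reclaimable pages have dropped to (or below)
	 * the amount userland says it can free: report mild pressure.
	 */
	if (cfg->userland_reclaimable_pages >= nr_reclaimable_pages)
		return PRESSURE_MILD;
	return PRESSURE_LOW;
}
```

The checks run from the most severe level down, so a high pressure index
always wins over the page-count comparison, as in the quoted pseudocode.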

> 
> > Adding vmevent_fd without them is rather overkill.
> > 
> > And I want to avoid timer-based polling of vmevent if possible.
> > KOSAKI's mem_notify doesn't use such a timer.
> 
> For pressure notifications we don't use the timers. We also read the

Hmm, when I look at the code, the timer still runs and can notify the user. No?

> vmstat counters together with the pressure, so "pressure + counters"
> effectively turns it into non-timer based polling. :)
> 
> But yes, hopefully we can get rid of the raw counters and timers, I don't
> like them either.

You and I are reaching a conclusion, at least.

> 
> > I don't object, but we need a rationale for adding a new system call, which
> > must be maintained forever once we add it.
> 
> We can do it via eventfd, or /dev/chardev (which has been discussed and
> people didn't like it, IIRC), or signals (which also has been discussed
> and there are problems with this approach as well).
> 
> I'm not sure why having a syscall is a big issue. If we're making eventfd
> interface, then we'd need to maintain /sys/.../ ABI the same way as we
> maintain the syscall. What's the difference? A dedicated syscall is just a

No difference. What I want is just to remove unnecessary stuff from vmevent_fd
and keep it simple. If we go via /dev/chardev, I expect we can do what is necessary
for VM pressure. But if we can slim down vmevent_fd, that would be better.
If so, maybe we should rename vmevent_fd to lowmem_fd or vmpressure_fd.

> simpler interface, we don't need to mess with opening and passing things
> through /sys/.../.
> 
> Personally I don't have any preference (except that I dislike chardev and
> ioctls :), I just want to see pros and cons of all the solutions, and so
> far the syscall seems like the easiest way? Anyway, I'm totally open to
> changing it into whatever fits best.

Yep. The interface isn't a big concern for low memory notification, so I'm not
strongly against it, either.
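
To make the eventfd alternative mentioned above more concrete, here is a
minimal userland sketch of waiting on an eventfd with poll(2). It only shows
the generic eventfd(2) mechanics: how the kernel would be told to signal the
descriptor (for example through some /sys file) is not specified in this
thread, and wait_for_event() is a hypothetical helper name.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for an eventfd to fire.
 * Returns the eventfd counter value (> 0) when it fired, 0 on timeout,
 * and -1 on error.
 */
long long wait_for_event(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;
	int ret;

	assert(efd >= 0);

	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;
	/* Reading an eventfd returns its 8-byte counter and resets it. */
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;
	return (long long)count;
}
```

The caller blocks in poll() rather than waking up on a timer, which is the
property asked for earlier in the thread.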

Thanks, Anton.

-- 
Kind regards,
Minchan Kim
