From: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
To: Pekka Enberg <penberg@kernel.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
Anton Vorontsov <anton.vorontsov@linaro.org>,
Minchan Kim <minchan@kernel.org>,
Leonid Moiseichuk <leonid.moiseichuk@nokia.com>,
John Stultz <john.stultz@linaro.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linaro-kernel@lists.linaro.org, patches@linaro.org,
kernel-team@android.com
Subject: Re: [PATCH 3/3] vmevent: Implement special low-memory attribute
Date: Tue, 08 May 2012 03:50:30 -0400
Message-ID: <4FA8D046.7000808@gmail.com>
In-Reply-To: <CAOJsxLG1+zhOKgi2Rg1eSoXSCU8QGvHVED_EefOOLP-6JbMDkg@mail.gmail.com>
(5/8/12 3:36 AM), Pekka Enberg wrote:
> On Tue, May 8, 2012 at 10:11 AM, KOSAKI Motohiro
> <kosaki.motohiro@gmail.com> wrote:
>> OK, sane. Then I'll take a little time and briefly review the current vmevent code.
>> (I read the vmevent/core branch in Pekka's tree. Please let me know if
>> there is a newer repository.)
>
> It's the latest one.
>
> On Tue, May 8, 2012 at 10:11 AM, KOSAKI Motohiro
> <kosaki.motohiro@gmail.com> wrote:
>> 1) sample_period is a brain-damaged idea. If people ONLY need to
>> sample statistics, they
>> only need to read /proc/vmstat periodically. Just remove it and
>> implement push notification.
>> _IF_ someone needs an infrequent level trigger, just use
>> "usleep(timeout); read(vmevent_fd)"
>> in userland code.
>
> That comes from a real-world requirement. See Leonid's email on the topic:
>
> https://lkml.org/lkml/2012/5/2/42
I know, many embedded guys prefer such a timer interval. I also have experience with
similar logic from when I was a TV box developer. But I must disagree. Some people want
to push timer housekeeping complexity into the kernel, but I haven't seen any justification.
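As a minimal sketch of the userland alternative suggested in point 1, the timer can live entirely in the application. The helper name `sample_once` is hypothetical, and `fd` only stands in for a descriptor such as the one the proposed vmevent_fd() would return; no event record layout is assumed, so the caller simply receives raw bytes.

```c
#define _DEFAULT_SOURCE   /* for usleep() on glibc under strict -std modes */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sleep for period_us microseconds, then read one chunk from fd.
 * Hypothetical helper: fd is assumed to be a readable descriptor
 * (for example one returned by the proposed vmevent_fd()).
 * Returns the byte count from read(2), or -1 on error.
 */
static ssize_t sample_once(int fd, useconds_t period_us, void *buf, size_t len)
{
	usleep(period_us);          /* the sampling period is kept in userland */
	return read(fd, buf, len);  /* blocks until the descriptor has data */
}
```

Calling `sample_once()` in a loop gives periodic sampling without any sample_period handling on the kernel side, which is the trade-off being argued about here.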
>> 2) VMEVENT_ATTR_STATE_ONE_SHOT is a misleading name. In effect it is an
>> edge trigger; it does not fire only once.
>
> Would VMEVENT_ATTR_STATE_EDGE_TRIGGER be a better name?
Maybe.
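To illustrate why "one shot" is a misleading name for edge-triggered behaviour, here is a small sketch that counts notifications for a "value below threshold" condition over a series of samples. The function names and the sample series are made up for illustration; this is the general meaning of level versus edge triggering, not the vmevent implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Level trigger: fires on every sample while the condition holds. */
static size_t count_level(const long *v, size_t n, long threshold)
{
	size_t fired = 0;

	for (size_t i = 0; i < n; i++)
		if (v[i] < threshold)
			fired++;
	return fired;
}

/* Edge trigger: fires only when the condition goes from false to true. */
static size_t count_edge(const long *v, size_t n, long threshold)
{
	size_t fired = 0;
	bool was_below = false;  /* assumption: the condition starts out false */

	for (size_t i = 0; i < n; i++) {
		bool below = v[i] < threshold;

		if (below && !was_below)
			fired++;
		was_below = below;
	}
	return fired;
}
```

For the samples {100, 40, 30, 80, 20} and a threshold of 50, the level trigger fires three times and the edge trigger fires twice: once per crossing, not once in total.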
>> 3) vmevent_fd() seems a sane interface, but it is namespace unaware.
>> Maybe we should discuss how to harmonize it with the namespace feature. No hurry, but we have
>> to think about that issue from the beginning.
>
> You mean VFS namespaces? Yeah, we need to take care of that.
If we keep the current vmevent_fd() design, we may need to create a new namespace concept
like the ipc namespace. The current vmevent_fd() is not VFS based.
>> 4) Currently, vmstat uses per-cpu batching, and vmstat updates can lag by up to 3
>> seconds.
>> This is fine for the usual case because almost all userland watchers only
>> read /proc/vmstat once per second.
>> But for the vmevent_fd() case, a 3 second delay may be unacceptable. At
>> worst, the inaccuracy is 128 batch x 4096
>> x 4k pagesize = 2G bytes.
>
> That's pretty awful. Anton, Leonid, comments?
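The worst-case figure in point 4 can be checked with a few lines of arithmetic. The function name below is hypothetical, and the message does not say what the factor 4096 counts, so it is passed through as an unlabeled multiplier.

```c
#include <stdint.h>

/*
 * Worst-case counter inaccuracy in bytes, following the product given in
 * the message: batch x multiplier x page size. What `multiplier` counts is
 * not stated in the message; it is kept generic here.
 */
static uint64_t worst_case_error_bytes(uint64_t batch, uint64_t multiplier,
				       uint64_t page_size)
{
	return batch * multiplier * page_size;
}
```

With the numbers from the message, `worst_case_error_bytes(128, 4096, 4096)` is 2^7 x 2^12 x 2^12 = 2^31 = 2147483648 bytes, that is 2 GiB.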
>
>> 5) __VMEVENT_ATTR_STATE_VALUE_WAS_LT should be removed from the files
>> exported to userland.
>> When kernel internals are exported, silly guys always use them and make us unhappy.
>
> Agreed. Anton, care to cook up a patch to do that?
>
>> 6) Also, vmevent_event must be hidden from userland.
>
> Why? That's part of the ABI.
Ahhh, if so, I missed something. As far as I can see, vmevent_fd() only depends
on vmevent_config. Which syscall depends on vmevent_event?
>> 7) vmevent_config::size must be removed. In the 20th century, M$ APIs
>> preferred this technique. But
>> they dropped it because a lot of applications didn't initialize
>> the size member, so it couldn't be used to keep upward compatibility.
>
> It's there to support forward/backward ABI compatibility like perf
> does. I'm going to keep it for now but I'm open to dropping it when
> the ABI is more mature.
The perf API is not intended to be used from generic applications, so I don't
think it will cause an ABI issue. tools/perf is sane, isn't it? But vmevent_fd()
is a generic API, and unfortunately we can't trust all userland guys to be sane.
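Both sides of the argument in point 7 can be seen in a small sketch of the size-member technique. The struct layouts and the function `demo_parse_config` are hypothetical and are not the real vmevent_config or perf code; the sketch only assumes that the caller is expected to store sizeof its own struct in the first field.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical first ABI revision. */
struct demo_config_v1 {
	uint32_t size;       /* caller must set this to sizeof its struct */
	uint32_t threshold;
};

/* Hypothetical later revision with one extra field appended. */
struct demo_config_v2 {
	uint32_t size;
	uint32_t threshold;
	uint32_t flags;
};

/*
 * Receiver side: accept callers built against either revision.
 * Returns 0 on success, -1 if ->size matches no known layout.
 */
static int demo_parse_config(const void *user, struct demo_config_v2 *out)
{
	uint32_t size;

	memcpy(&size, user, sizeof(size));
	if (size != sizeof(struct demo_config_v1) &&
	    size != sizeof(struct demo_config_v2))
		return -1;               /* a forgotten or garbage size ends up here */

	memset(out, 0, sizeof(*out));    /* fields the caller lacks default to 0 */
	memcpy(out, user, size);
	out->size = sizeof(*out);
	return 0;
}
```

A v1 caller that fills in `size` keeps working after `flags` is added, which is the compatibility benefit. A caller that leaves `size` uninitialized is rejected, or worse, accidentally accepted if the garbage happens to match a known size, which is the misuse risk raised above.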
>> 8) memcg unaware
>> 9) numa unaware
>> 10) zone unaware
>
> Yup.
>
>> And we may need VM internal changes if we really need lowmem
>> notification. The current kernel doesn't have such info. _And_ there is one more
>> big problem. Currently the kernel maintains memory per
>> zone, but almost all userland applications are aware of neither zones nor nodes.
>> Thus raw notifications aren't useful for userland. On the other hand, are total
>> memory and total free memory useful? Definitely no!
>> Even when there is a lot of total free memory, the system may start swapping out and
>> invoking the OOM killer. If we can't predict OOM invocation, this feature has a serious raison
>> d'etre issue. (I.e., (4), (8), (9) and (10) are not ignorable issues, I think.)
>
> I'm guessing most of the existing solutions get away with
> approximations and soft limits because they're mostly used on UMA
> embedded machines.
>
> But yes, we need to do better here.
Hm. If you want to make vmevent depend on CONFIG_EMBEDDED, I have no reason to
complain about this feature. In that world, almost all applications _know_ their
system configuration, so I don't think API misuse is a big matter.