All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <marcelo@kvack.org>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: "Marcelo Tosatti" <marcelo@kvack.org>,
	linux-mm@kvack.org, "Daniel Sp蚣g" <daniel.spang@gmail.com>,
	"Rik van Riel" <riel@redhat.com>,
	"Andrew Morton" <akpm@linux-foundation.org>
Subject: Re: [RFC][patch 1/2] mem notifications v3 improvement for large system
Date: Thu, 27 Dec 2007 16:04:56 -0500	[thread overview]
Message-ID: <20071227210456.GB14823@dmt> (raw)
In-Reply-To: <20071225182144.D26D.KOSAKI.MOTOHIRO@jp.fujitsu.com>

On Tue, Dec 25, 2007 at 07:31:14PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> I tried resolve too few notification problem.
> 
> mem_notify_status global variable mean wakeup 1 process.
> it is too few.
> 
> improvement step1:
> - add read method and wake up all process.
> 
> 1. run >10000 process test
>    console1# LANG=C; while [ 1 ] ;do sleep 1; date; vmstat 1 1 -S M -a; done
>    console2# sh m.sh 12500
> 
> result:
>  - wakeup all unoccur neither thundering herd nor soft lock-up.
>  - no swap out occured.
>  - but too much free ;-)
>    in my test-case, over 5GB freed.
> 
> 
> Wed Dec 26 03:19:20 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  7  0      0    605    209  12778    0    0   143    11 1458  183 14 10 76  1  0
> Wed Dec 26 03:19:21 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  6  0      0   2687    209  10769    0    0   142    11 1459  188 14 10 75  1  0
> Wed Dec 26 03:19:22 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  2  0      0   4560    209   8968    0    0   142    11 1459  191 14 10 75  1  0
> Wed Dec 26 03:19:23 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5857    209   7724    0    0   142    11 1457  192 14 10 75  1  0
> Wed Dec 26 03:19:24 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5872    209   7724    0    0   141    11 1454  192 14 10 75  1  0
> Wed Dec 26 03:19:25 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5884    209   7724    0    0   141    11 1451  192 14 10 75  1  0
> Wed Dec 26 03:19:26 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5895    209   7724    0    0   140    11 1448  191 14 10 75  1  0
> Wed Dec 26 03:19:27 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5904    209   7724    0    0   140    11 1445  191 14 10 75  1  0
> Wed Dec 26 03:19:28 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5912    209   7724    0    0   140    11 1442  190 13 10 75  1  0
> Wed Dec 26 03:19:29 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5920    209   7724    0    0   139    11 1439  190 13 10 75  1  0
> Wed Dec 26 03:19:30 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  1      0   5929    209   7724    0    0   139    11 1436  189 13 10 75  1  0
> Wed Dec 26 03:19:32 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5935    209   7724    0    0   139    11 1433  189 13 10 75  1  0
> Wed Dec 26 03:19:33 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0   5940    209   7724    0    0   138    11 1430  188 13 10 75  1  0
> Wed Dec 26 03:19:34 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  2  1      0   5948    209   7725    0    0   138    11 1427  188 13 10 75  1  0
> Wed Dec 26 03:19:35 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  0  0      0   5676    209   8005    0    0   138    11 1425  188 13 10 75  1  0
> Wed Dec 26 03:19:36 JST 2007
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free  inact active   si   so    bi    bo   in   cs us sy id wa st
>  0  1      0   5676    209   8006    0    0   137    11 1422  188 13 10 75  1  0
> 
> 
> Index: linux-2.6.23-mem_notify_v3/mm/mem_notify.c
> ===================================================================
> --- linux-2.6.23-mem_notify_v3.orig/mm/mem_notify.c
> +++ linux-2.6.23-mem_notify_v3/mm/mem_notify.c
> @@ -13,7 +13,11 @@
>  #include <linux/percpu.h>
>  #include <linux/timer.h>
> 
> -static unsigned long mem_notify_status = 0;
> +struct mem_notify_file_info {
> +        long          last_event;
> +};
> +
> +atomic_t mem_notify_event = ATOMIC_INIT(0);
> 
>  static DECLARE_WAIT_QUEUE_HEAD(mem_wait);
>  static DEFINE_PER_CPU(unsigned long, last_mem_notify) = INITIAL_JIFFIES;
> @@ -28,53 +32,81 @@ void mem_notify_userspace(void)
> 
>         if (time_after(now, target)) {
>                 __get_cpu_var(last_mem_notify) = now;
> -               mem_notify_status = 1;
> +               atomic_inc(&mem_notify_event);
>                 wake_up(&mem_wait);
>         }
>  }
> 
>  static int mem_notify_open(struct inode *inode, struct file *file)
>  {
> -       return 0;
> +        struct mem_notify_file_info *ptr;
> +        int    err = 0;
> +
> +        ptr = kmalloc(sizeof(*ptr), GFP_KERNEL);
> +        if (!ptr) {
> +                err = -ENOMEM;
> +                goto out;
> +        }
> +
> +        ptr->last_event = atomic_read(&mem_notify_event);
> +        file->private_data = ptr;
> +
> +out:
> +        return err;
>  }
> 
>  static int mem_notify_release(struct inode *inode, struct file *file)
>  {
> +        kfree(file->private_data);
> +
>         return 0;
>  }
> 
>  static unsigned int mem_notify_poll(struct file *file, poll_table *wait)
>  {
>         unsigned int val = 0;
> +       struct zone *zone;
> +       int pages_high, pages_free, pages_reserve;
> +        struct mem_notify_file_info *file_info = file->private_data;
> 
>         poll_wait(file, &mem_wait, wait);
> 
> -       if (mem_notify_status) {
> -               struct zone *zone;
> -               int pages_high, pages_free, pages_reserve;
> -
> -               mem_notify_status = 0;

By clearing mem_notify_status you avoid other processes going
through mem_notify_poll() from having a spurious notification if
memory is not exhausted. So the real point of notification is the
call from shrink_zone() (or shrink_active_list()), not the pages <
(pages_high+pages_reserve)*2 formula.

So something like the following sounds better:

- have your poll_wait_exclusive() patch in place
- pass a "status" parameter to mem_notify_userspace() and have it clear
mem_notify_status in case status is zero, so to stop sending POLLIN to processes.
- call mem_notify_userspace(0) from mm/vmscan.c when ZONE_NORMAL reclaim_mapped 
is false (that seems a good indication that VM is out of trouble).
- test for mem_notify_status in mem_notify_poll(), but do not clear it.
- at mem_notify_userspace(), use wake_up_nr(number of mem_notify users/10) (10
meaning a small percentage of registered users).


> -
> -               /* check if its not a spurious/stale notification */
> -               pages_high = pages_free = pages_reserve = 0;
> -               for_each_zone(zone) {
> -                       if (!populated_zone(zone) || is_highmem(zone))
> -                               continue;
> -                       pages_high += zone->pages_high;
> -                       pages_free += zone_page_state(zone, NR_FREE_PAGES);
> -                       pages_reserve += zone->lowmem_reserve[MAX_NR_ZONES-1];
> -               }
> +        if (file_info->last_event == atomic_read(&mem_notify_event))
> +                goto out;

What exactly are you trying to deal with by using last_event?

> 
> -               if (pages_free < (pages_high+pages_reserve)*2)
> -                       val = POLLIN;
> +       /* check if its not a spurious/stale notification */
> +       pages_high = pages_free = pages_reserve = 0;
> +       for_each_zone(zone) {
> +               if (!populated_zone(zone) || is_highmem(zone))
> +                       continue;
> +               pages_high += zone->pages_high;
> +               pages_free += zone_page_state(zone, NR_FREE_PAGES);
> +               pages_reserve += zone->lowmem_reserve[MAX_NR_ZONES-1];
>         }
> -
> +
> +       if (pages_free < (pages_high+pages_reserve)*2)
> +               val = POLLIN;
> +
> +out:
>         return val;
>  }
> 
> +static ssize_t mem_notify_read(struct file *file, char __user *buf,
> +                               size_t count, loff_t *ppos)
> +{
> +        struct mem_notify_file_info *file_info = file->private_data;
> +        if (!file_info)
> +                return -EINVAL;
> +
> +        file_info->last_event = atomic_read(&mem_notify_event);
> +
> +        return 0;
> +}

Your then using read() to affect poll() behavior. I don't like it.

> +
>  struct file_operations mem_notify_fops = {
>         .open = mem_notify_open,
>         .release = mem_notify_release,
>         .poll = mem_notify_poll,
> +        .read = mem_notify_read,
>  };
>  EXPORT_SYMBOL(mem_notify_fops);
> 
> 
> 
> 
> 
> 
> /kosaki

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-12-27 21:04 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-24 20:32 [PATCH] mem notifications v3 Marcelo Tosatti
2007-12-25  3:47 ` KOSAKI Motohiro
2007-12-25  4:56   ` [RFC] add poll_wait_exclusive() API KOSAKI Motohiro
2007-12-27 21:05     ` Marcelo Tosatti
2007-12-25  8:31   ` [PATCH] mem notifications v3 KOSAKI Motohiro
2007-12-25 10:31     ` [RFC][patch 1/2] mem notifications v3 improvement for large system KOSAKI Motohiro
2007-12-27 21:04       ` Marcelo Tosatti [this message]
2007-12-28  0:38         ` KOSAKI Motohiro
2007-12-25 10:31     ` [RFC][patch 2/2] " KOSAKI Motohiro
2007-12-25 10:41       ` KOSAKI Motohiro
2007-12-27  4:49       ` [RFC][patch] mem_notify more faster reduce load average KOSAKI Motohiro
2007-12-27 20:13   ` [PATCH] mem notifications v3 Marcelo Tosatti
2007-12-28  1:44     ` KOSAKI Motohiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20071227210456.GB14823@dmt \
    --to=marcelo@kvack.org \
    --cc=akpm@linux-foundation.org \
    --cc=daniel.spang@gmail.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.