From: Mike Travis <travis@sgi.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andi Kleen <ak@suse.de>, Christoph Lameter <clameter@sgi.com>,
	Jack Steiner <steiner@sgi.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/10] x86: Reduce memory and intra-node effects with large count NR_CPUs
Date: Mon, 14 Jan 2008 09:52:48 -0800
Message-ID: <478BA170.10806@sgi.com>
In-Reply-To: <20080114090010.GA5404@elte.hu>

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
>>> 32cpus			  1kcpus-before		    1kcpus-after
>>>    7172678 Total	   +23314404 Total	       -147590 Total
>> 1kcpus-after means it's +23314404-147590, i.e. +23166814? (i.e. a 0.6% 
>> reduction of the bloat?)
> 
> or if it's relative to 32cpus then that's an excellent result :)
> 
> 	Ingo

Nope, it's a cumulative thing.


> allsizes -w 72 32cpus 1kcpus-after
32cpus                              1kcpus-after
       228 .altinstr_replacemen             +0 .altinstr_replacemen
      1219 .altinstructions                 +0 .altinstructions
    717512 .bss                       +1395328 .bss
     61374 .comment                         +0 .comment
        16 .con_initcall.init               +0 .con_initcall.init
    425256 .data                        +19200 .data
    178688 .data.cacheline_alig      +12898304 .data.cacheline_alig
      8192 .data.init_task                  +0 .data.init_task
      4096 .data.page_aligned               +0 .data.page_aligned
     27008 .data.percpu                +128896 .data.percpu
     43904 .data.read_mostly          +8703776 .data.read_mostly
         4 .data_nosave                     +0 .data_nosave
      5141 .exit.text                       +8 .exit.text
    138480 .init.data                    +4608 .init.data
       133 .init.ramfs                      +1 .init.ramfs
      3192 .init.setup                      +0 .init.setup
    159754 .init.text                     +904 .init.text
      2288 .initcall.init                   +0 .initcall.init
         8 .jiffies                         +0 .jiffies
      4512 .pci_fixup                       +0 .pci_fixup
   1314438 .rodata                        +760 .rodata
     36552 .smp_locks                     +256 .smp_locks
   3971848 .text                        +14773 .text
      3368 .vdso                            +0 .vdso
         4 .vgetcpu_mode                    +0 .vgetcpu_mode
       218 .vsyscall_0                      +0 .vsyscall_0
        52 .vsyscall_1                      +0 .vsyscall_1
        91 .vsyscall_2                      +0 .vsyscall_2
         8 .vsyscall_3                      +0 .vsyscall_3
        54 .vsyscall_fn                     +0 .vsyscall_fn
        80 .vsyscall_gtod_data              +0 .vsyscall_gtod_data
     39480 __bug_table                      +0 __bug_table
     16320 __ex_table                       +0 __ex_table
      9160 __param                          +0 __param
   7172678 Total                     +23166814 Total
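
As a cross-check of the cumulative reading, the totals quoted in this message can be combined as in the sketch below. The variable names are illustrative only and are not output of the allsizes tool; the numbers are copied from the tables above.

```python
# Figures copied from the tables in this message (bytes).
base_32cpus = 7172678          # "32cpus" Total
delta_1k_before = 23314404     # growth of 1kcpus-before over 32cpus
delta_1k_after = -147590       # further change of 1kcpus-after over 1kcpus-before

# Cumulative reading: the deltas add up relative to the 32cpus baseline.
net_delta = delta_1k_before + delta_1k_after
total_with_net_delta = base_32cpus + net_delta

# Share of the growth recovered so far (hypothetical helper value, in percent).
reduction_pct = 100 * -delta_1k_after / delta_1k_before

print(net_delta, total_with_net_delta, round(reduction_pct, 1))
```

The net delta matches the "+23166814 Total" line of the allsizes output, and the reduction matches the roughly 0.6% figure in the quoted question.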

My goal is to move 90+% of the wasted, unused memory to either
the percpu area or the initdata section.  The 4 fronts are:
NR_CPUS arrays, cpumask_t usages, a more efficient cpu_alloc/percpu
area, and a (relatively small) redesign of the irq system.  (The
node and apicid arrays are related to the NR_CPUS arrays.)

The irq structs are particularly bad because they use NR_CPUS**2
arrays and the irq vars use 22588416 bytes (74%) of the total
30339492 bytes of memory:

   7172678 Total                      30339492 Total

> datasizes -w 72 32cpus 1kcpus-before
32cpus                              1kcpus-before
      262144    BSS __log_buf           12681216 CALNDA irq_desc
      163840 CALNDA irq_desc             8718336 RMDATA irq_cfg
      131072    BSS entries               528384    BSS irq_lists
       76800 INITDA early_node_map        396288    BSS irq_2_pin
       30720 RMDATA irq_cfg               264192    BSS irq_timer_state
       29440    BSS ide_hwifs             262144    BSS __log_buf
       24576    BSS boot_exception_       132168 PERCPU per_cpu__kstat
       20480    BSS irq_lists             131072    BSS entries
       18840   DATA ioctl_start           131072    BSS boot_pageset
       16384    BSS boot_cpu_stack        131072 CALNDA boot_cpu_pda
       15360    BSS irq_2_pin              98304    BSS cpu_devices
       14677   DATA bnx2_CP_b06FwTe        76800 INITDA early_node_map
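
The irq share quoted above can be reproduced from the 1kcpus-before column of this listing. The sketch below assumes that "the irq vars" means the five irq_* symbols shown in that column; that grouping is an assumption made for illustration and is not stated by the datasizes tool.

```python
# Sizes in bytes, copied from the 1kcpus-before column above.
# Assumption: these five symbols are what "the irq vars" refers to.
irq_vars = {
    "irq_desc": 12681216,
    "irq_cfg": 8718336,
    "irq_lists": 528384,
    "irq_2_pin": 396288,
    "irq_timer_state": 264192,
}

total_bytes = 30339492  # total quoted in this message

irq_bytes = sum(irq_vars.values())
irq_share_pct = 100 * irq_bytes / total_bytes

print(irq_bytes, round(irq_share_pct))
```

Under that assumption the sum is 22588416 bytes, which rounds to the 74% quoted in the message.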

I'm still working on a tool to analyze runtime usage of kernel
memory.

And I'm very open to any and all suggestions... ;-)

Thanks,
Mike
