From: Paul Jackson <pj@sgi.com>
To: Paul Jackson <pj@sgi.com>
Cc: colpatch@us.ibm.com, wli@holomorphy.com, linux-kernel@vger.kernel.org
Subject: Re: [Patch 17/23] mask v2 = [6/7] nodemask_t_ia64_changes
Date: Tue, 6 Apr 2004 04:37:32 -0700
Message-ID: <20040406043732.6fb2df9f.pj@sgi.com>
In-Reply-To: <20040401131240.00f7d74d.pj@sgi.com>
Matthew,
A couple of these nodemask changes are increasing kernel text size quite
a bit on big NUMA configurations.
In one test case I ran, the text size of vmlinux increased from 8789097
to 8810513 bytes, a growth of 21416 bytes (2.6.5 kernel for ia64 SN2
NR_CPUS=512 sn2_defconfig, gcc 3.2.3).
From a cursory comparison of 'nm --print-size --size-sort' output,
I think the increased space is caused by the numerous numnodes
changes, such as:
- pxm_to_nid_map[i] = numnodes;
- nid_to_pxm_map[numnodes++] = i;
+ pxm_to_nid_map[i] = num_online_nodes();
+ nid_to_pxm_map[num_online_nodes()] = i;
+ node_set_online(num_online_nodes());
And by the for loop replacements:
- for (nid = 0, i = 0; i < numnodes; i++) {
+ nid = 0;
+ for_each_online_node(i) {
In particular, the machine code generated by the following silly little
routine:
int foo() { int i = 0, n; for_each_online_node(n) i++; return i; }
is ... hold onto your hat ...
a000000100116f40 <foo>:
a000000100116f40: 00 10 20 02 29 26 [MII] addl r2=-2091896,r1
a000000100116f46: 80 00 00 00 42 20 mov r8=r0
a000000100116f4c: 02 00 08 90 mov r17=256
a000000100116f50: 0b 90 00 00 00 21 [MMI] mov r18=r0;;
a000000100116f56: 50 01 08 30 20 00 ld8 r21=[r2]
a000000100116f5c: 00 00 04 00 nop.i 0x0;;
a000000100116f60: 01 78 00 2a 00 21 [MII] mov r15=r21
a000000100116f66: 00 00 00 02 00 00 nop.i 0x0
a000000100116f6c: 00 00 04 00 nop.i 0x0;;
a000000100116f70: 03 80 20 1e 18 14 [MII] ld8 r16=[r15],8
a000000100116f76: 10 01 46 7e 46 40 adds r17=-64,r17;;
a000000100116f7c: 00 8c b0 88 and r2=-64,r17;;
a000000100116f80: 10 48 00 04 08 39 [MIB] cmp.eq p9,p8=0,r2
a000000100116f86: a0 00 40 16 f2 05 cmp.eq p10,p11=0,r16
a000000100116f8c: 90 00 00 43 (p11) br.cond.dpnt.few a000000100117010 <foo+0xd0>
a000000100116f90: 11 90 00 25 00 21 [MIB] adds r18=64,r18
a000000100116f96: 00 00 00 02 00 04 nop.i 0x0
a000000100116f9c: e0 ff ff 4a (p08) br.cond.dptk.few a000000100116f70 <foo+0x30>;;
a000000100116fa0: 01 50 fc f9 ff 27 [MII] mov r10=-1
a000000100116fa6: 00 01 46 4a 40 20 sub r16=64,r17
a000000100116fac: 01 88 20 e4 cmp.eq p9,p8=0,r17;;
a000000100116fb0: 30 71 00 24 00 21 [MIB] (p09) mov r14=r18
a000000100116fb6: b0 00 40 24 80 04 zxt4 r11=r16
a000000100116fbc: 90 00 00 42 (p09) br.cond.dptk.few a000000100117040 <foo+0x100>
a000000100116fc0: 0b 18 00 1e 18 10 [MMI] ld8 r3=[r15];;
a000000100116fc6: 00 00 00 02 00 20 nop.m 0x0
a000000100116fcc: b1 50 00 79 shr.u r9=r10,r11;;
a000000100116fd0: 03 00 00 00 01 00 [MII] nop.m 0x0
a000000100116fd6: 00 00 00 02 00 00 nop.i 0x0;;
a000000100116fdc: 00 00 04 00 nop.i 0x0;;
a000000100116fe0: 01 00 00 00 01 00 [MII] nop.m 0x0
a000000100116fe6: 00 00 00 02 00 00 nop.i 0x0
a000000100116fec: 00 00 04 00 nop.i 0x0;;
a000000100116ff0: 0b 80 24 06 0c 20 [MMI] and r16=r9,r3;;
a000000100116ff6: d0 00 40 18 72 00 cmp.eq p13,p12=0,r16
a000000100116ffc: 00 00 04 00 nop.i 0x0;;
a000000100117000: b0 71 48 22 00 20 [MIB] (p13) add r14=r18,r17
a000000100117006: 00 00 00 02 80 06 nop.i 0x0
a00000010011700c: 40 00 00 42 (p13) br.cond.dptk.few a000000100117040 <foo+0x100>
a000000100117010: 0b 98 fc 21 3f 23 [MMI] adds r19=-1,r16;;
a000000100117016: 10 99 40 1a 40 00 andcm r17=r19,r16
a00000010011701c: 00 00 04 00 nop.i 0x0;;
a000000100117020: 02 00 00 00 01 00 [MII] nop.m 0x0
a000000100117026: f0 00 44 a4 39 c0 popcnt r15=r17;;
a00000010011702c: 21 79 00 80 add r14=r18,r15
a000000100117030: 01 00 00 00 01 00 [MII] nop.m 0x0
a000000100117036: 00 00 00 02 00 00 nop.i 0x0
a00000010011703c: 00 00 04 00 nop.i 0x0;;
a000000100117040: 00 90 fc 01 01 24 [MII] mov r18=255
a000000100117046: f0 00 38 00 42 c0 mov r15=r14
a00000010011704c: f2 e7 ff 9f mov r22=-1
a000000100117050: 1d a0 fc 01 01 24 [MFB] mov r20=255
a000000100117056: 00 00 00 02 00 00 nop.f 0x0
a00000010011705c: 00 00 00 20 nop.b 0x0;;
a000000100117060: 10 78 48 1c 8e 30 [MIB] cmp4.lt p15,p14=r18,r14
a000000100117066: 00 00 00 02 80 87 nop.i 0x0
a00000010011706c: 08 00 84 03 (p15) br.ret.dpnt.many b0
a000000100117070: 01 c0 04 1e 00 21 [MII] adds r24=1,r15
a000000100117076: 30 01 00 04 48 00 mov r19=256
a00000010011707c: 11 40 00 84 adds r8=1,r8;;
a000000100117080: 02 00 00 00 01 00 [MII] nop.m 0x0
a000000100117086: f0 00 60 2c 00 e0 sxt4 r15=r24;;
a00000010011708c: c2 78 e4 52 shr.u r23=r15,6
a000000100117090: 09 90 00 1f 2c 22 [MMI] and r18=-64,r15
a000000100117096: a0 78 4c 16 68 e0 cmp.ltu p10,p11=r15,r19
a00000010011709c: f1 7b b0 80 and r15=63,r15;;
a0000001001170a0: 10 98 4c 24 05 e0 [MIB] sub r19=r19,r18
a0000001001170a6: e2 00 00 04 c8 05 (p11) mov r14=256
a0000001001170ac: a0 01 00 42 (p11) br.cond.dptk.few a000000100117240 <foo+0x300>
a0000001001170b0: 0a 40 00 1e 09 39 [MMI] cmp.eq p8,p9=0,r15;;
a0000001001170b6: c0 f8 4d 1a 6a 40 cmp.ltu p12,p13=63,r19
a0000001001170bc: 63 79 20 79 shl r26=r22,r15
a0000001001170c0: 11 88 5c 2a 12 20 [MIB] shladd r17=r23,3,r21
a0000001001170c6: 00 00 00 02 00 04 nop.i 0x0
a0000001001170cc: 70 00 00 42 (p08) br.cond.dptk.few a000000100117130 <foo+0x1f0>;;
a0000001001170d0: 01 c8 20 22 18 14 [MII] ld8 r25=[r17],8
a0000001001170d6: 00 00 00 02 00 00 nop.i 0x0
a0000001001170dc: 00 00 04 00 nop.i 0x0;;
a0000001001170e0: 01 00 00 00 01 00 [MII] nop.m 0x0
a0000001001170e6: 00 00 00 02 00 00 nop.i 0x0
a0000001001170ec: 00 00 04 00 nop.i 0x0;;
a0000001001170f0: 10 80 68 32 0c 20 [MIB] and r16=r26,r25
a0000001001170f6: 00 00 00 02 80 06 nop.i 0x0
a0000001001170fc: c0 00 00 43 (p13) br.cond.dpnt.few a0000001001171b0 <foo+0x270>
a000000100117100: 1d 98 00 27 3f 23 [MFB] adds r19=-64,r19
a000000100117106: 00 00 00 02 00 00 nop.f 0x0
a00000010011710c: 00 00 00 20 nop.b 0x0;;
a000000100117110: 10 70 00 20 0f 39 [MIB] cmp.eq p14,p15=0,r16
a000000100117116: 00 00 00 02 80 07 nop.i 0x0
a00000010011711c: 00 01 00 43 (p15) br.cond.dpnt.few a000000100117210 <foo+0x2d0>
a000000100117120: 00 90 00 25 00 21 [MII] adds r18=64,r18
a000000100117126: 00 00 00 02 00 00 nop.i 0x0
a00000010011712c: 00 00 04 00 nop.i 0x0
a000000100117130: 1d d8 00 27 2c 22 [MFB] and r27=-64,r19
a000000100117136: 00 00 00 02 00 00 nop.f 0x0
a00000010011713c: 00 00 00 20 nop.b 0x0;;
a000000100117140: 10 58 00 36 0a 39 [MIB] cmp.eq p11,p10=0,r27
a000000100117146: 00 00 00 02 80 05 nop.i 0x0
a00000010011714c: 40 00 00 43 (p11) br.cond.dpnt.few a000000100117180 <foo+0x240>
a000000100117150: 03 80 20 22 18 14 [MII] ld8 r16=[r17],8
a000000100117156: 30 01 4e 7e 46 80 adds r19=-64,r19;;
a00000010011715c: 03 9c b0 88 and r28=-64,r19;;
a000000100117160: 10 48 00 38 08 39 [MIB] cmp.eq p9,p8=0,r28
a000000100117166: c0 00 40 1a f2 06 cmp.eq p12,p13=0,r16
a00000010011716c: b0 00 00 43 (p13) br.cond.dpnt.few a000000100117210 <foo+0x2d0>
a000000100117170: 11 90 00 25 00 21 [MIB] adds r18=64,r18
a000000100117176: 00 00 00 02 00 04 nop.i 0x0
a00000010011717c: e0 ff ff 4a (p08) br.cond.dptk.few a000000100117150 <foo+0x210>;;
a000000100117180: 1d 48 00 26 08 39 [MFB] cmp.eq p9,p8=0,r19
a000000100117186: 00 00 00 02 00 00 nop.f 0x0
a00000010011718c: 00 00 00 20 nop.b 0x0;;
a000000100117190: 30 71 00 24 00 21 [MIB] (p09) mov r14=r18
a000000100117196: 00 00 00 02 80 04 nop.i 0x0
a00000010011719c: b0 00 00 42 (p09) br.cond.dptk.few a000000100117240 <foo+0x300>
a0000001001171a0: 00 80 00 22 18 10 [MII] ld8 r16=[r17]
a0000001001171a6: 00 00 00 02 00 00 nop.i 0x0
a0000001001171ac: 00 00 04 00 nop.i 0x0
a0000001001171b0: 0b f8 00 27 25 20 [MMI] sub r31=64,r19;;
a0000001001171b6: 00 00 00 02 00 c0 nop.m 0x0
a0000001001171bc: 03 f8 48 00 zxt4 r30=r31;;
a0000001001171c0: 01 00 00 00 01 00 [MII] nop.m 0x0
a0000001001171c6: d0 f1 58 80 3c 00 shr.u r29=r22,r30
a0000001001171cc: 00 00 04 00 nop.i 0x0;;
a0000001001171d0: 03 00 00 00 01 00 [MII] nop.m 0x0
a0000001001171d6: 00 00 00 02 00 00 nop.i 0x0;;
a0000001001171dc: 00 00 04 00 nop.i 0x0;;
a0000001001171e0: 01 00 00 00 01 00 [MII] nop.m 0x0
a0000001001171e6: 00 00 00 02 00 00 nop.i 0x0
a0000001001171ec: 00 00 04 00 nop.i 0x0;;
a0000001001171f0: 0b 80 74 20 0c 20 [MMI] and r16=r29,r16;;
a0000001001171f6: f0 00 40 1c 72 00 cmp.eq p15,p14=0,r16
a0000001001171fc: 00 00 04 00 nop.i 0x0;;
a000000100117200: f0 71 48 26 00 20 [MIB] (p15) add r14=r18,r19
a000000100117206: 00 00 00 02 80 07 nop.i 0x0
a00000010011720c: 40 00 00 42 (p15) br.cond.dptk.few a000000100117240 <foo+0x300>
a000000100117210: 0b 48 fc 21 3f 23 [MMI] adds r9=-1,r16;;
a000000100117216: 30 48 40 1a 40 00 andcm r3=r9,r16
a00000010011721c: 00 00 04 00 nop.i 0x0;;
a000000100117220: 02 00 00 00 01 00 [MII] nop.m 0x0
a000000100117226: 20 00 0c a4 39 c0 popcnt r2=r3;;
a00000010011722c: 21 11 00 80 add r14=r18,r2
a000000100117230: 01 00 00 00 01 00 [MII] nop.m 0x0
a000000100117236: 00 00 00 02 00 00 nop.i 0x0
a00000010011723c: 00 00 04 00 nop.i 0x0;;
a000000100117240: 10 78 00 1c 00 21 [MIB] mov r15=r14
a000000100117246: b0 a0 38 14 61 05 cmp4.lt p11,p10=r20,r14
a00000010011724c: 30 fe ff 4a (p10) br.cond.dptk.few a000000100117070 <foo+0x130>
a000000100117250: 11 00 00 00 01 00 [MIB] nop.m 0x0
a000000100117256: 00 00 00 02 00 80 nop.i 0x0
a00000010011725c: 08 00 84 00 br.ret.sptk.many b0;;
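To see where that much code might come from, here is a minimal user-space sketch of an iterator built on an inlined word-at-a-time bit search. It is a model under stated assumptions, not the definition from the patch: MAX_NUMNODES = 256 is inferred from the constant in the listing above, and find_next_node() is a hypothetical stand-in for whatever bitmap helper the real macro uses.

```c
#include <limits.h>

#define MAX_NUMNODES  256  /* assumption: matches the 256 seen in the listing */
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
#define MAP_LONGS     ((MAX_NUMNODES + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* User-space model of the online-node bitmap. */
static unsigned long node_online_map[MAP_LONGS];

static inline void node_set_online(int n)
{
	node_online_map[n / BITS_PER_LONG] |= 1UL << (n % BITS_PER_LONG);
}

/*
 * Hypothetical helper: index of the first set bit at or after 'offset',
 * or 'size' if there is none.  It is inlined at every call site, so each
 * use of the iterator below carries its own copy of this search.
 */
static inline int find_next_node(const unsigned long *map, int size, int offset)
{
	while (offset < size) {
		/* Drop the bits below 'offset' in the current word. */
		unsigned long word = map[offset / BITS_PER_LONG]
				     >> (offset % BITS_PER_LONG);

		if (word) {
			while (!(word & 1UL)) {	/* walk to the lowest set bit */
				word >>= 1;
				offset++;
			}
			return offset < size ? offset : size;
		}
		/* Nothing left in this word: jump to the start of the next. */
		offset = (offset / BITS_PER_LONG + 1) * BITS_PER_LONG;
	}
	return size;
}

/* Sketch of the iterator: one search to start, one more per iteration. */
#define for_each_online_node(n)						\
	for ((n) = find_next_node(node_online_map, MAX_NUMNODES, 0);	\
	     (n) < MAX_NUMNODES;					\
	     (n) = find_next_node(node_online_map, MAX_NUMNODES, (n) + 1))

/* The same silly little routine as above. */
static inline int foo(void)
{
	int i = 0, n;

	for_each_online_node(n)
		i++;
	return i;
}
```

In this model the loop expands to two inlined copies of the search (the initial one and the per-iteration one), which is at least consistent with the two similar-looking scan sequences in the disassembly.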
Possible changes to consider:
1) Instead of replacing each use of numnodes with num_online_nodes(),
add a function-local variable:
int numnodes = num_online_nodes();
This would reduce the size of the source code patch as well.
2) Perhaps some of the mechanism lying beneath num_online_nodes(),
such as in the bitmap/bitop area, should not be inlined.
3) Are not the following two code fragments essentially equivalent:
int n;
for_each_online_node(n) {
blah blah ...
}
and:
int n;
for (n = 0; n < MAX_NUMNODES; n++) {
if (! node_online(n))
continue;
blah blah ...
}
I'll wager the second form generates better code. And since
the second form is closer to what was there before, it also
generates a smaller patch.
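The equivalence claimed in 3) can be checked with a small user-space model. This is only a sketch: the bitmap layout, MAX_NUMNODES = 256, and the helper next_online() are assumptions made for illustration, and visit_with_macro() / visit_with_scan() are hypothetical names for the two loop forms.

```c
#include <limits.h>

#define MAX_NUMNODES  256  /* assumption for this sketch */
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
#define MAP_LONGS     ((MAX_NUMNODES + BITS_PER_LONG - 1) / BITS_PER_LONG)

static unsigned long node_online_map[MAP_LONGS];

static inline void node_set_online(int n)
{
	node_online_map[n / BITS_PER_LONG] |= 1UL << (n % BITS_PER_LONG);
}

static inline int node_online(int n)
{
	return (int)((node_online_map[n / BITS_PER_LONG]
		      >> (n % BITS_PER_LONG)) & 1UL);
}

/* Hypothetical helper: first online node at or after 'from', else MAX_NUMNODES. */
static inline int next_online(int from)
{
	while (from < MAX_NUMNODES) {
		unsigned long word = node_online_map[from / BITS_PER_LONG]
				     >> (from % BITS_PER_LONG);

		if (word) {
			while (!(word & 1UL)) {	/* walk to the lowest set bit */
				word >>= 1;
				from++;
			}
			return from;
		}
		from = (from / BITS_PER_LONG + 1) * BITS_PER_LONG; /* skip empty word */
	}
	return MAX_NUMNODES;
}

#define for_each_online_node(n) \
	for ((n) = next_online(0); (n) < MAX_NUMNODES; (n) = next_online((n) + 1))

/* First form: iterator macro.  Records visited nodes, returns how many. */
static inline int visit_with_macro(int *out)
{
	int n, count = 0;

	for_each_online_node(n)
		out[count++] = n;
	return count;
}

/* Second form: plain loop with an explicit node_online() test. */
static inline int visit_with_scan(int *out)
{
	int n, count = 0;

	for (n = 0; n < MAX_NUMNODES; n++) {
		if (!node_online(n))
			continue;
		out[count++] = n;
	}
	return count;
}
```

Both functions visit the same nodes in the same ascending order. The second pays one cheap bit test per possible node; the first pays an inlined search per online node. Which one wins on text size and speed would still have to be measured on the real kernel build.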
In other words, I am not yet understanding the value of changing
each loop over nodes to use these macros.
Just possible avenues for investigation - there are likely others.
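As an illustration of suggestion 1), the sketch below models num_online_nodes() as a population count over the bitmap and compares calling it in a loop condition against caching it in a local. Everything here is a user-space assumption: the bitmap layout, MAX_NUMNODES = 256, the weight_calls counter (added only to make the cost visible), and the sum_ids_*() example functions. It also assumes, as the old numnodes did, that online nodes are numbered contiguously from 0 and do not change during the loop.

```c
#include <limits.h>

#define MAX_NUMNODES  256  /* assumption for this sketch */
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
#define MAP_LONGS     ((MAX_NUMNODES + BITS_PER_LONG - 1) / BITS_PER_LONG)

static unsigned long node_online_map[MAP_LONGS];
static int weight_calls;	/* hypothetical: counts bitmap weighings */

static inline void node_set_online(int n)
{
	node_online_map[n / BITS_PER_LONG] |= 1UL << (n % BITS_PER_LONG);
}

/* Model of num_online_nodes(): population count over the whole bitmap. */
static inline int num_online_nodes(void)
{
	int i, count = 0;

	weight_calls++;
	for (i = 0; i < MAP_LONGS; i++) {
		unsigned long w = node_online_map[i];

		for (; w; w &= w - 1)	/* clear the lowest set bit */
			count++;
	}
	return count;
}

/* Direct replacement: the bitmap is re-weighed on every loop test. */
static inline int sum_ids_uncached(void)
{
	int i, sum = 0;

	for (i = 0; i < num_online_nodes(); i++)
		sum += i;
	return sum;
}

/* Suggestion 1: weigh once into a local, leave the loop as it was. */
static inline int sum_ids_cached(void)
{
	int numnodes = num_online_nodes();
	int i, sum = 0;

	for (i = 0; i < numnodes; i++)
		sum += i;
	return sum;
}
```

With N nodes online, the uncached loop weighs the bitmap N + 1 times and the cached loop once. The cached form also keeps the loop body textually identical to the pre-patch code.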
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373