From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jia He
Date: Fri, 30 Sep 2016 03:28:57 +0000
Subject: [PATCH v6 0/7] Reduce cache miss for snmp_fold_field
Message-Id: <1475206144-23228-1-git-send-email-hejianet@gmail.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: netdev@vger.kernel.org
Cc: linux-sctp@vger.kernel.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Vlad Yasevich, Neil Horman,
	Steffen Klassert, Herbert Xu, marcelo.leitner@gmail.com, Jia He

On a PowerPC server with a large number of CPUs (160), besides commit
a3a773726c9f ("net: Optimize snmp stat aggregation by walking all the
percpu data at once"), I found several other snmp_fold_field call sites
that cause a high cache miss rate.

test source code:
========
My simple test case, which reads from the procfs items endlessly:
/***********************************************************/
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#define LINELEN  2560
int main(int argc, char **argv)
{
        unsigned int i; /* unsigned, so the 0xffffffff bound cannot overflow */
        int fd = -1 ;
        int rdsize = 0;
        char buf[LINELEN+1];

        buf[LINELEN] = 0;
        memset(buf,0,LINELEN);

        if(1 >= argc) {
                printf("file name empty\n");
                return -1;
        }

        /* the procfs files are only read, so open them read-only */
        fd = open(argv[1], O_RDONLY);
        if(0 > fd){
                printf("open error\n");
                return -2;
        }

        for(i=0;i<0xffffffff;i++) {
                /* read the whole file, then rewind and start over */
                while(0 < (rdsize = read(fd,buf,LINELEN))){
                        //nothing here
                }

                lseek(fd, 0, SEEK_SET);
        }

        close(fd);
        return 0;
}
/**********************************************************/

compile and run:
========
gcc test.c -o test

perf stat -d -e cache-misses ./test /proc/net/snmp
perf stat -d -e cache-misses ./test /proc/net/snmp6
perf stat -d -e cache-misses ./test /proc/net/sctp/snmp
perf stat -d -e cache-misses ./test /proc/net/xfrm_stat

before the patch set:
==========
 Performance counter stats for 'system wide':

         355911097      cache-misses                                                 [40.08%]
        2356829300      L1-dcache-loads                                              [60.04%]
         355642645      L1-dcache-load-misses     #   15.09% of all L1-dcache hits   [60.02%]
         346544541      LLC-loads                                                    [59.97%]
            389763      LLC-load-misses           #    0.11% of all LL-cache hits    [40.02%]

       6.245162638 seconds time elapsed

After the patch set:
=========
 Performance counter stats for 'system wide':

         194992476      cache-misses                                                 [40.03%]
        6718051877      L1-dcache-loads                                              [60.07%]
         194871921      L1-dcache-load-misses     #    2.90% of all L1-dcache hits   [60.11%]
         187632232      LLC-loads                                                    [60.04%]
            464466      LLC-load-misses           #    0.25% of all LL-cache hits    [39.89%]

       6.868422769 seconds time elapsed

The L1-dcache miss rate is reduced from about 15% to 2.9%.

changelog
====
v6:
- correct v5
v5:
- order local variables from longest to shortest line
v4:
- move memset into one block of if statement in snmp6_seq_show_item
- remove the changes in netstat_seq_show, considering that the stack
  usage is too large
v3:
- introduce generic interface (suggested by Marcelo Ricardo Leitner)
- use max_t instead of self defined macro (suggested by David Miller)
v2:
- fix bug in udplite statistics.
- snmp_seq_show is split into 2 parts

Jia He (7):
  net:snmp: Introduce generic interfaces for snmp_get_cpu_field{,64}
  proc: Reduce cache miss in snmp_seq_show
  proc: Reduce cache miss in snmp6_seq_show
  proc: Reduce cache miss in sctp_snmp_seq_show
  proc: Reduce cache miss in xfrm_statistics_seq_show
  ipv6: Remove useless parameter in __snmp6_fill_statsdev
  net: Suppress the "Comparison to NULL could be written" warnings

 include/net/ip.h     |  23 ++++++++++++
 net/ipv4/proc.c      | 102 +++++++++++++++++++++++++++++++--------------------
 net/ipv6/addrconf.c  |  12 +++---
 net/ipv6/proc.c      |  30 +++++++++++----
 net/sctp/proc.c      |  10 +++--
 net/xfrm/xfrm_proc.c |  10 ++++-
 6 files changed, 129 insertions(+), 58 deletions(-)

-- 
2.5.5
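
For readers who want to see the access pattern behind the numbers above,
here is a minimal userspace sketch. It is not code from the patches: the
array sim_mib, the sizes SIM_CPUS/SIM_ITEMS and all function names are
made up for illustration. It assumes the idea named in the title of commit
a3a773726c9f, i.e. summing the counters by walking each CPU's block once
instead of walking all CPUs again for every single item. Both orders give
the same totals; only the memory access order differs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIM_CPUS  160	/* CPU count mentioned in the cover letter */
#define SIM_ITEMS 32	/* hypothetical number of counters per CPU */

/* One counter block per simulated CPU; stands in for the percpu mib data. */
static uint64_t sim_mib[SIM_CPUS][SIM_ITEMS];

/* Fill the counters with a known pattern so the totals can be checked. */
static void sim_fill(void)
{
	for (int cpu = 0; cpu < SIM_CPUS; cpu++)
		for (int item = 0; item < SIM_ITEMS; item++)
			sim_mib[cpu][item] = (uint64_t)cpu + item;
}

/* Item-major: for each item, touch every CPU's block again. */
static void fold_item_major(uint64_t out[SIM_ITEMS])
{
	for (int item = 0; item < SIM_ITEMS; item++) {
		out[item] = 0;
		for (int cpu = 0; cpu < SIM_CPUS; cpu++)
			out[item] += sim_mib[cpu][item];
	}
}

/* CPU-major: visit each CPU's block once and add up all its items. */
static void fold_cpu_major(uint64_t out[SIM_ITEMS])
{
	memset(out, 0, SIM_ITEMS * sizeof(out[0]));
	for (int cpu = 0; cpu < SIM_CPUS; cpu++)
		for (int item = 0; item < SIM_ITEMS; item++)
			out[item] += sim_mib[cpu][item];
}

/* Both walks must produce identical totals. */
static void sim_check(void)
{
	uint64_t a[SIM_ITEMS], b[SIM_ITEMS];

	sim_fill();
	fold_item_major(a);
	fold_cpu_major(b);
	assert(memcmp(a, b, sizeof(a)) == 0);
}
```

Calling sim_check() from a main() confirms that the two walks agree. The
sketch only shows the loop order; it makes no claim about how large the
cache effect is on any particular machine.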