* [PATCH 2.6.9-rc2 0/2] enhanced accounting data collection
@ 2004-09-27 22:34 Jay Lan
2004-09-27 22:44 ` [PATCH 2.6.9-rc2 1/2] enhanced I/O " Jay Lan
2004-09-27 22:50 ` [PATCH 2.6.9-rc2 2/2] enhanced MM " Jay Lan
0 siblings, 2 replies; 10+ messages in thread
From: Jay Lan @ 2004-09-27 22:34 UTC (permalink / raw)
To: LKML
Cc: lse-tech, CSA-ML, Andrew Morton, Guillaume Thouvenin, Tim Schmielau, Arthur Corliss

This is an effort to provide enhanced accounting data collection.
It is intended to offer a common data collection method for various
accounting packages, including BSD accounting, ELSA, CSA, and any other
acct packages that favor a common layer of data collection, separated
from the data presentation layer and the process group management layer.

This patchset consists of two parts, acct_io and acct_mm, as we
identified useful spots for improved data collection in the areas of
IO and MM.

This patchset replaces the previously submitted CSA patchset of four.
The CSA kernel module is a standalone module. The csa_eop patch was to
provide a hook for end-of-process handling, and that can be considered
separately unless there is enough common interest.

Now that the patchset is down to IO and MM, I hope it is more
appealing :)

Comments?

Best Regards,
- jay
---
Jay Lan - Linux System Software
Silicon Graphics Inc., Mountain View, CA

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2.6.9-rc2 1/2] enhanced I/O accounting data collection
2004-09-27 22:34 [PATCH 2.6.9-rc2 0/2] enhanced accounting data collection Jay Lan
@ 2004-09-27 22:44 ` Jay Lan
2004-09-27 22:50 ` [PATCH 2.6.9-rc2 2/2] enhanced MM " Jay Lan
1 sibling, 0 replies; 10+ messages in thread
From: Jay Lan @ 2004-09-27 22:44 UTC (permalink / raw)
To: LKML
Cc: lse-tech, CSA-ML, Andrew Morton, Guillaume Thouvenin, Tim Schmielau, Arthur Corliss

[-- Attachment #1: Type: text/plain, Size: 95 bytes --]

1/2: acct_io

Enhanced I/O accounting data collection.

Signed-off-by: Jay Lan <jlan@sgi.com>

[-- Attachment #2: acct_io --]
[-- Type: text/plain, Size: 2404 bytes --]

Index: linux/drivers/block/ll_rw_blk.c
===================================================================
--- linux.orig/drivers/block/ll_rw_blk.c	2004-09-12 22:31:31.000000000 -0700
+++ linux/drivers/block/ll_rw_blk.c	2004-09-27 12:37:04.374234677 -0700
@@ -1741,6 +1741,7 @@
 {
 	DEFINE_WAIT(wait);
 	struct request *rq;
+	unsigned long start_wait = jiffies;
 
 	generic_unplug_device(q);
 	do {
@@ -1769,6 +1770,7 @@
 		finish_wait(&rl->wait[rw], &wait);
 	} while (!rq);
 
+	current->bwtime += (unsigned long) jiffies - start_wait;
 	return rq;
 }
 
Index: linux/fs/read_write.c
===================================================================
--- linux.orig/fs/read_write.c	2004-09-12 22:32:55.000000000 -0700
+++ linux/fs/read_write.c	2004-09-27 12:37:04.381070659 -0700
@@ -216,8 +216,11 @@
 				ret = file->f_op->read(file, buf, count, pos);
 			else
 				ret = do_sync_read(file, buf, count, pos);
-			if (ret > 0)
+			if (ret > 0) {
 				dnotify_parent(file->f_dentry, DN_ACCESS);
+				current->rchar += ret;
+			}
+			current->syscr++;
 		}
 	}
 
@@ -260,8 +263,11 @@
 				ret = file->f_op->write(file, buf, count, pos);
 			else
 				ret = do_sync_write(file, buf, count, pos);
-			if (ret > 0)
+			if (ret > 0) {
 				dnotify_parent(file->f_dentry, DN_MODIFY);
+				current->wchar += ret;
+			}
+			current->syscw++;
 		}
 	}
 
@@ -540,6 +546,10 @@
 		fput_light(file, fput_needed);
 	}
 
+	if (ret > 0) {
+		current->rchar += ret;
+	}
+	current->syscr++;
 	return ret;
 }
 
@@ -558,6 +568,10 @@
 		fput_light(file, fput_needed);
 	}
 
+	if (ret > 0) {
+		current->wchar += ret;
+	}
+	current->syscw++;
 	return ret;
 }
 
@@ -636,6 +650,13 @@
 
 	retval = in_file->f_op->sendfile(in_file, ppos, count, file_send_actor, out_file);
 
+	if (retval > 0) {
+		current->rchar += retval;
+		current->wchar += retval;
+	}
+	current->syscr++;
+	current->syscw++;
+
 	if (*ppos > max)
 		retval = -EOVERFLOW;
 
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h	2004-09-27 11:57:40.220967100 -0700
+++ linux/include/linux/sched.h	2004-09-27 12:52:51.305237393 -0700
@@ -591,6 +591,9 @@
 	struct rw_semaphore pagg_sem;
 #endif
 
+/* i/o counters(bytes read/written, #syscalls, waittime */
+	unsigned long rchar, wchar, syscr, syscw, bwtime;
+
 };
 
 static inline pid_t process_group(struct task_struct *tsk)

^ permalink raw reply [flat|nested] 10+ messages in thread
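The bookkeeping in the read_write.c hunks above can be modeled in user space. This is only a sketch under assumptions: struct io_acct, account_read() and account_write() are hypothetical names that do not appear in the patch, and the model ignores where exactly in vfs_read()/vfs_write() the counters are bumped. It shows the one rule the hunks share: the byte counter advances only on a positive return value, while the syscall counter advances on every accounted call.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical stand-in for the fields the patch adds to task_struct. */
struct io_acct {
	unsigned long rchar, wchar;	/* bytes read / written */
	unsigned long syscr, syscw;	/* read / write calls seen */
};

/* Like the read hunks: add bytes only when ret > 0, always count the call. */
static void account_read(struct io_acct *a, ssize_t ret)
{
	assert(a);
	if (ret > 0)
		a->rchar += (unsigned long)ret;
	a->syscr++;
}

/* Like the write hunks: same rule for wchar and syscw. */
static void account_write(struct io_acct *a, ssize_t ret)
{
	assert(a);
	if (ret > 0)
		a->wchar += (unsigned long)ret;
	a->syscw++;
}
```

A failed read (negative ret) or a zero-length write therefore still shows up in syscr/syscw but leaves rchar/wchar unchanged.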
* Re: [PATCH 2.6.9-rc2 2/2] enhanced MM accounting data collection
2004-09-27 22:34 [PATCH 2.6.9-rc2 0/2] enhanced accounting data collection Jay Lan
2004-09-27 22:44 ` [PATCH 2.6.9-rc2 1/2] enhanced I/O " Jay Lan
@ 2004-09-27 22:50 ` Jay Lan
2004-09-28 9:33 ` Paul Jackson
1 sibling, 1 reply; 10+ messages in thread
From: Jay Lan @ 2004-09-27 22:50 UTC (permalink / raw)
To: LKML
Cc: lse-tech, CSA-ML, Andrew Morton, Guillaume Thouvenin, Tim Schmielau, Arthur Corliss

[-- Attachment #1: Type: text/plain, Size: 94 bytes --]

2/2: acct_mm

Enhanced MM accounting data collection.

Signed-off-by: Jay Lan <jlan@sgi.com>

[-- Attachment #2: acct_mm --]
[-- Type: text/plain, Size: 11185 bytes --]

Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c	2004-09-27 11:57:40.201435722 -0700
+++ linux/fs/exec.c	2004-09-27 14:05:41.266160725 -0700
@@ -47,6 +47,7 @@
 #include <linux/syscalls.h>
 #include <linux/rmap.h>
 #include <linux/pagg.h>
+#include <linux/csa_internal.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -1163,6 +1164,9 @@
 
 		/* execve success */
 		security_bprm_free(&bprm);
+		/* no-op if CONFIG_CSA not set */
+                csa_update_integrals();
+                update_mem_hiwater();
 		return retval;
 	}
 
Index: linux/include/linux/csa_internal.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/include/linux/csa_internal.h	2004-09-27 14:05:41.279832688 -0700
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2000-2002 Silicon Graphics, Inc and LANL All Rights Reserved.
+ * Copyright (c) 2004 Silicon Graphics, Inc All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
+ *
+ * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane,
+ * Mountain View, CA 94043, or:
+ *
+ * http://www.sgi.com
+ */
+
+/*
+ * CSA (Comprehensive System Accounting)
+ * Job Accounting for Linux
+ *
+ * This header file contains the definitions needed for communication
+ * between the kernel and the CSA module.
+ */
+
+#ifndef _LINUX_CSA_INTERNAL_H
+#define _LINUX_CSA_INTERNAL_H
+
+#include <linux/config.h>
+
+#if defined (CONFIG_CSA) || defined (CONFIG_CSA_MODULE)
+
+#include <linux/linkage.h>
+#include <linux/ptrace.h>
+
+static inline void csa_update_integrals(void)
+{
+	long delta;
+
+	if (current->mm) {
+		delta = current->stime - current->csa_stimexpd;
+		current->csa_stimexpd = current->stime;
+		current->csa_rss_mem1 += delta * current->mm->rss;
+		current->csa_vm_mem1 += delta * current->mm->total_vm;
+	}
+}
+
+static inline void csa_clear_integrals(struct task_struct *tsk)
+{
+	if (tsk) {
+		tsk->csa_stimexpd = 0;
+		tsk->csa_rss_mem1 = 0;
+		tsk->csa_vm_mem1 = 0;
+	}
+}
+
+#else	/* CONFIG_CSA || CONFIG_CSA_MODULE */
+
+#define csa_update_integrals()		do { } while (0)
+#define csa_clear_integrals(task)	do { } while (0)
+#endif	/* CONFIG_CSA || CONFIG_CSA_MODULE */
+
+#endif	/* _LINUX_CSA_INTERNAL_H */
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h	2004-09-27 14:04:52.905497872 -0700
+++ linux/include/linux/sched.h	2004-09-27 14:06:35.938387661 -0700
@@ -249,6 +249,8 @@
 	struct kioctx		*ioctx_list;
 
 	struct kioctx		default_kioctx;
+
+	unsigned long hiwater_rss, hiwater_vm;
 };
 
 extern int mmlist_nr;
@@ -593,6 +595,10 @@
 
 /* i/o counters(bytes read/written, #syscalls, waittime */
 	unsigned long rchar, wchar, syscr, syscw, bwtime;
+#if defined(CONFIG_CSA) || defined(CONFIG_CSA_MODULE)
+	u64 csa_rss_mem1, csa_vm_mem1;
+	clock_t csa_stimexpd;
+#endif
 
 };
 
@@ -817,6 +823,19 @@
 /* Remove the current tasks stale references to the old mm_struct */
 extern void mm_release(struct task_struct *, struct mm_struct *);
 
+/* Update highwater values */
+static inline void update_mem_hiwater(void)
+{
+	if (current->mm) {
+		if (current->mm->hiwater_rss < current->mm->rss) {
+			current->mm->hiwater_rss = current->mm->rss;
+		}
+		if (current->mm->hiwater_vm < current->mm->total_vm) {
+			current->mm->hiwater_vm = current->mm->total_vm;
+		}
+	}
+}
+
 extern int copy_thread(int, unsigned long, unsigned long, unsigned long, struct task_struct *, struct pt_regs *);
 extern void flush_thread(void);
 extern void exit_thread(void);
Index: linux/kernel/exit.c
===================================================================
--- linux.orig/kernel/exit.c	2004-09-27 11:57:40.247334460 -0700
+++ linux/kernel/exit.c	2004-09-27 14:05:41.292528082 -0700
@@ -25,6 +25,7 @@
 #include <linux/proc_fs.h>
 #include <linux/mempolicy.h>
 #include <linux/pagg.h>
+#include <linux/csa_internal.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
@@ -808,6 +809,9 @@
 		ptrace_notify((PTRACE_EVENT_EXIT << 8) | SIGTRAP);
 	}
 
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 	acct_process(code);
 	__exit_mm(tsk);
 
Index: linux/kernel/fork.c
===================================================================
--- linux.orig/kernel/fork.c	2004-09-27 12:35:06.585377528 -0700
+++ linux/kernel/fork.c	2004-09-27 14:05:41.296434358 -0700
@@ -39,7 +39,7 @@
 #include <linux/profile.h>
 #include <linux/rmap.h>
 #include <linux/pagg.h>
-
+#include <linux/csa_internal.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
@@ -607,6 +607,9 @@
 	if (retval)
 		goto free_pt;
 
+	mm->hiwater_rss = mm->rss;
+	mm->hiwater_vm = mm->total_vm;
+
 good_mm:
 	tsk->mm = mm;
 	tsk->active_mm = mm;
@@ -995,6 +998,8 @@
 	p->utime = p->stime = 0;
 	p->rchar = p->wchar = p->syscr = p->syscw = 0;
 	p->bwtime = 0;
+	/* no-op if CONFIG_CSA not set */
+	csa_clear_integrals(p);
 	p->lock_depth = -1;		/* -1 = no lock */
 	p->start_time = get_jiffies_64();
 	p->security = NULL;
Index: linux/mm/memory.c
===================================================================
--- linux.orig/mm/memory.c	2004-09-12 22:32:26.000000000 -0700
+++ linux/mm/memory.c	2004-09-27 14:05:41.304246908 -0700
@@ -44,6 +44,7 @@
 #include <linux/highmem.h>
 #include <linux/pagemap.h>
 #include <linux/rmap.h>
+#include <linux/csa_internal.h>
 #include <linux/module.h>
 #include <linux/init.h>
 
@@ -605,6 +606,8 @@
 	tlb = tlb_gather_mmu(mm, 0);
 	unmap_vmas(&tlb, mm, vma, address, end, &nr_accounted, details);
 	tlb_finish_mmu(tlb, address, end);
+	/* no-op unless CONFIG_CSA is set */
+	csa_update_integrals();
 	spin_unlock(&mm->page_table_lock);
 }
 
@@ -1095,9 +1098,12 @@
 	spin_lock(&mm->page_table_lock);
 	page_table = pte_offset_map(pmd, address);
 	if (likely(pte_same(*page_table, pte))) {
-		if (PageReserved(old_page))
+		if (PageReserved(old_page)) {
 			++mm->rss;
-		else
+			/* no-op if CONFIG_CSA not set */
+			csa_update_integrals();
+			update_mem_hiwater();
+		} else
 			page_remove_rmap(old_page);
 		break_cow(vma, new_page, address, page_table);
 		lru_cache_add_active(new_page);
@@ -1379,6 +1385,10 @@
 		remove_exclusive_swap_page(page);
 
 	mm->rss++;
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
+
 	pte = mk_pte(page, vma->vm_page_prot);
 	if (write_access && can_share_swap_page(page)) {
 		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
@@ -1444,6 +1454,9 @@
 			goto out;
 		}
 		mm->rss++;
+		/* no-op if CONFIG_CSA not set */
+		csa_update_integrals();
+		update_mem_hiwater();
 		entry = maybe_mkwrite(pte_mkdirty(mk_pte(page,
 							 vma->vm_page_prot)),
 				      vma);
@@ -1553,6 +1566,10 @@
 	if (pte_none(*page_table)) {
 		if (!PageReserved(new_page))
 			++mm->rss;
+		/* no-op if CONFIG_CSA not set */
+		csa_update_integrals();
+		update_mem_hiwater();
+
 		flush_icache_page(vma, new_page);
 		entry = mk_pte(new_page, vma->vm_page_prot);
 		if (write_access)
Index: linux/mm/mmap.c
===================================================================
--- linux.orig/mm/mmap.c	2004-09-12 22:32:54.000000000 -0700
+++ linux/mm/mmap.c	2004-09-27 14:05:41.308153183 -0700
@@ -20,6 +20,7 @@
 #include <linux/hugetlb.h>
 #include <linux/profile.h>
 #include <linux/module.h>
+#include <linux/csa_internal.h>
 #include <linux/mount.h>
 #include <linux/mempolicy.h>
 #include <linux/rmap.h>
@@ -1014,6 +1015,9 @@
 		down_write(&mm->mmap_sem);
 	}
 	__vm_stat_account(mm, vm_flags, file, len >> PAGE_SHIFT);
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 	return addr;
 
 unmap_and_free_vma:
@@ -1360,6 +1364,9 @@
 	if (vma->vm_flags & VM_LOCKED)
 		vma->vm_mm->locked_vm += grow;
 	__vm_stat_account(vma->vm_mm, vma->vm_flags, vma->vm_file, grow);
+	/* no-op if CONFIG_CSA_JOB_ACCT not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 	anon_vma_unlock(vma);
 	return 0;
 }
@@ -1816,6 +1823,9 @@
 		mm->locked_vm += len >> PAGE_SHIFT;
 		make_pages_present(addr, addr + len);
 	}
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 	return addr;
 }
 
Index: linux/mm/mremap.c
===================================================================
--- linux.orig/mm/mremap.c	2004-09-12 22:32:48.000000000 -0700
+++ linux/mm/mremap.c	2004-09-27 14:05:41.312059458 -0700
@@ -16,6 +16,7 @@
 #include <linux/fs.h>
 #include <linux/highmem.h>
 #include <linux/security.h>
+#include <linux/csa_internal.h>
 
 #include <asm/uaccess.h>
 #include <asm/cacheflush.h>
@@ -232,6 +233,10 @@
 					   new_addr + new_len);
 	}
 
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
+
 	return new_addr;
 }
 
@@ -368,6 +373,9 @@
 				make_pages_present(addr + old_len,
 						   addr + new_len);
 			}
+			/* no-op if CONFIG_CSA not set */
+			csa_update_integrals();
+			update_mem_hiwater();
 			ret = addr;
 			goto out;
 		}
Index: linux/mm/rmap.c
===================================================================
--- linux.orig/mm/rmap.c	2004-09-12 22:33:36.000000000 -0700
+++ linux/mm/rmap.c	2004-09-27 14:05:41.315965733 -0700
@@ -50,6 +50,7 @@
 #include <linux/swapops.h>
 #include <linux/slab.h>
 #include <linux/init.h>
+#include <linux/csa_internal.h>
 #include <linux/rmap.h>
 #include <linux/rcupdate.h>
 
@@ -580,6 +581,8 @@
 	}
 
 	mm->rss--;
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
 	page_remove_rmap(page);
 	page_cache_release(page);
 
@@ -679,6 +682,8 @@
 
 		page_remove_rmap(page);
 		page_cache_release(page);
+		/* no-op if CONFIG_CSA not set */
+		csa_update_integrals();
 		mm->rss--;
 		(*mapcount)--;
 	}
Index: linux/mm/swapfile.c
===================================================================
--- linux.orig/mm/swapfile.c	2004-09-12 22:31:57.000000000 -0700
+++ linux/mm/swapfile.c	2004-09-27 14:05:41.320848577 -0700
@@ -24,6 +24,7 @@
 #include <linux/module.h>
 #include <linux/rmap.h>
 #include <linux/security.h>
+#include <linux/csa_internal.h>
 #include <linux/backing-dev.h>
 
 #include <asm/pgtable.h>
@@ -435,6 +436,9 @@
 	set_pte(dir, pte_mkold(mk_pte(page, vma->vm_page_prot)));
 	page_add_anon_rmap(page, vma, address);
 	swap_free(entry);
+	/* no-op if CONFIG_CSA not set */
+	csa_update_integrals();
+	update_mem_hiwater();
 }
 
 /* vma->vm_mm->page_table_lock is held */

^ permalink raw reply [flat|nested] 10+ messages in thread
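The arithmetic in csa_update_integrals() and update_mem_hiwater() can be followed in a small user-space model. This is a sketch under assumptions, not kernel code: struct mm_model, struct task_model and the model_*() functions are hypothetical names, plain long stands in for clock_t, and the task is passed explicitly instead of using current. The field names inside the structs follow the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for mm_struct and task_struct. */
struct mm_model {
	unsigned long rss, total_vm;		/* current sizes, in pages */
	unsigned long hiwater_rss, hiwater_vm;	/* largest values seen */
};

struct task_model {
	long stime;			/* system time used so far */
	long csa_stimexpd;		/* stime at the previous update */
	uint64_t csa_rss_mem1;		/* running sum of rss * delta(stime) */
	uint64_t csa_vm_mem1;		/* running sum of total_vm * delta(stime) */
	struct mm_model *mm;
};

/* Same arithmetic as csa_update_integrals(), applied to the model. */
static void model_update_integrals(struct task_model *t)
{
	assert(t);
	if (t->mm) {
		long delta = t->stime - t->csa_stimexpd;

		t->csa_stimexpd = t->stime;
		t->csa_rss_mem1 += (uint64_t)delta * t->mm->rss;
		t->csa_vm_mem1 += (uint64_t)delta * t->mm->total_vm;
	}
}

/* Same comparisons as update_mem_hiwater(), applied to the model. */
static void model_update_hiwater(struct task_model *t)
{
	assert(t);
	if (t->mm) {
		if (t->mm->hiwater_rss < t->mm->rss)
			t->mm->hiwater_rss = t->mm->rss;
		if (t->mm->hiwater_vm < t->mm->total_vm)
			t->mm->hiwater_vm = t->mm->total_vm;
	}
}
```

With rss = 10 and total_vm = 20, an update at stime = 5 adds 50 and 100 to the two integrals. Growing rss to 30 and updating again at stime = 7 adds 2 * 30 and 2 * 20, so the sums weight each memory size by the system time charged since the previous update.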
* Re: [PATCH 2.6.9-rc2 2/2] enhanced MM accounting data collection
2004-09-27 22:50 ` [PATCH 2.6.9-rc2 2/2] enhanced MM " Jay Lan
@ 2004-09-28 9:33 ` Paul Jackson
2004-09-28 11:38 ` Robin Holt
2004-10-02 0:38 ` [Lse-tech] " Jay Lan
0 siblings, 2 replies; 10+ messages in thread
From: Paul Jackson @ 2004-09-28 9:33 UTC (permalink / raw)
To: Jay Lan
Cc: linux-kernel, lse-tech, csa, akpm, guillaume.thouvenin, tim, corliss

nits:

 1) I'm not sure the "no-op if CONFIG_CSA not set" comments
    are worthwhile - it does not seem to be a common practice
    to mark macros that collapse under certain CONFIG's with
    such comments, and some code, such as in fork.c, would
    become quite a bit less readable if such comments were
    widely used.

 2) Three of the added csa_update_integrals() lines have
    leading spaces, instead of a tab char, such as in:

	===================================================================
	--- linux.orig/fs/exec.c	2004-09-27 11:57:40.201435722 -0700
	+++ linux/fs/exec.c	2004-09-27 14:05:41.266160725 -0700
	@@ -1163,6 +1164,9 @@

			/* execve success */
			security_bprm_free(&bprm);
	+		/* no-op if CONFIG_CSA not set */
	+                csa_update_integrals();	<=========
	+                update_mem_hiwater();	<=========
			return retval;
		}

 3) Is it always the case that csa_update_integrals() and
    update_mem_hiwater() are used together? If so, perhaps
    they could be collapsed into one? Even the current->mm
    test inside them could be made one test, perhaps?

 4) What kind of kernel text size expansion does this cause?
    There seem to be about a dozen of these calls. What are
    the pros and cons of inlining csa_update_integrals() and
    update_mem_hiwater()? Are these on hot enough kernel code
    paths that we should benchmark with and without these hooks
    enabled, both inline and out-of-line?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2.6.9-rc2 2/2] enhanced MM accounting data collection
2004-09-28 9:33 ` Paul Jackson
@ 2004-09-28 11:38 ` Robin Holt
2004-09-28 13:29 ` Paul Jackson
2004-10-02 0:38 ` [Lse-tech] " Jay Lan
1 sibling, 1 reply; 10+ messages in thread
From: Robin Holt @ 2004-09-28 11:38 UTC (permalink / raw)
To: Paul Jackson
Cc: Jay Lan, linux-kernel, lse-tech, csa, akpm, guillaume.thouvenin, tim, corliss

On Tue, Sep 28, 2004 at 02:33:50AM -0700, Paul Jackson wrote:
> nits:
>
>  3) Is it always the case that csa_update_integrals() and
>     update_mem_hiwater() are used together? If so, perhaps
>     they could be collapsed into one? Even the current->mm
>     test inside them could be made one test, perhaps?

This sounds like a really good idea. Maybe update_mem_hiwater should
have the #ifdef CONFIG_CSA inside it. This really sounds like a good
idea. Is update_mem_hiwater everywhere that csa_update_integrals needs
to be? I seem to remember one or two places where that was not the
case. Of course that was a few years ago and my memory is really fuzzy.

>
>  4) What kind of kernel text size expansion does this cause?
>     There seem to be about a dozen of these calls. What are
>     the pros and cons of inlining csa_update_integrals() and
>     update_mem_hiwater()? Are these on hot enough kernel code
>     paths that we should benchmark with and without these hooks
>     enabled, both inline and out-of-line?

The size was never very noticeable. It usually did not even cause
overflow to the next 4k page. The csa_job module added the biggest
bloat, but I have nearly always compiled that as a module.

I have benchmarked these hooks a very long time ago. The number and
location has not changed appreciably. I ran three separate tests. The
first was without any csa config'd on. The second was with csa config'd
on, but no job containers and the writing of the accounting file turned
off. Last was with everything. We ran 7 runs of each config on a 32 cpu
system.

There was no delta between the two kernels that were not writing
accounting files. Actually, the average for the kernel with integrals
was 0.3% above the one without integrals, but this is well within the
noise range.

Originally, there was a 5% decrease in performance with the writing of
the accounting data. There was another unfortunate side effect that some
of the CSA metrics became much worse. This problem was later identified
and fixed. At that point, CSA logging caused a 2.7% drop in AIM7 peak,
with the transition point (I think that was the name I remember)
shifting towards a higher number of processes. Jack said that was due
to a slight serializing of process exits.

Robin

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2.6.9-rc2 2/2] enhanced MM accounting data collection
2004-09-28 11:38 ` Robin Holt
@ 2004-09-28 13:29 ` Paul Jackson
2004-09-28 14:34 ` Robin Holt
0 siblings, 1 reply; 10+ messages in thread
From: Paul Jackson @ 2004-09-28 13:29 UTC (permalink / raw)
To: Robin Holt
Cc: jlan, linux-kernel, lse-tech, csa, akpm, guillaume.thouvenin, tim, corliss

Robin wrote:
> I have benchmarked these hooks a very long time ago. The number and
> location has not changed appreciably.

These results seem reasonable ... thanks.

> The size was never very noticeable.

But would the time cost of being out of line be noticeable either?
Actually, being out of line might be a tick faster, if it reduced by a
cache line what was needed for a common execution path.

> Originally, there was a 5% decrease in performance with the writing of
> the accounting data. There was another unfortunate side effect that some
> of the CSA metrics became much worse. This problem was later identified
> and fixed.

Is there any non-trivial risk that some other "unfortunate side effect"
exists today, that we'd find on benchmarking?

I'm not sure it's worth benchmarking again, but I slightly suspect it is,
and if benchmarking was done, I'd do it with these calls both inline and
out of line, to see what effect that had on runtime. If no effect on
runtime, I'd tend toward the out of line calls - at least saving a
little kernel text space.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.650.933.1373

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2.6.9-rc2 2/2] enhanced MM accounting data collection
2004-09-28 13:29 ` Paul Jackson
@ 2004-09-28 14:34 ` Robin Holt
0 siblings, 0 replies; 10+ messages in thread
From: Robin Holt @ 2004-09-28 14:34 UTC (permalink / raw)
To: Paul Jackson
Cc: Robin Holt, jlan, linux-kernel, lse-tech, csa, akpm, guillaume.thouvenin, tim, corliss

>
> Is there any non-trivial risk that some other "unfortunate side effect"
> exists today, that we'd find on benchmarking?

When I last owned csa, I was running benchmarks before each SGI release.
The tests were a simple matter of grabbing belay or belay2, setting up
an FC disk vault (one with 16 disks was usually attached), and using
Jack's runit script to launch the run. I would then take the output and
use Jack's web page to graph and compare it to the previous one.
Additionally, every time I got access to a new larger system, I would
run the tests on there and check for any odd effects of CSA. Nothing
interesting ever popped up from LBS2.1.1 all the way through to LBS3.0.

>
> I'm not sure it's worth benchmarking again, but I slightly suspect it is,
> and if benchmarking was done, I'd do it with these calls both inline and
> out of line, to see what effect that had on runtime. If no effect on
> runtime, I'd tend toward the out of line calls - at least saving a
> little kernel text space.

AIM7 is far too big a hammer to find this level of micro-optimization.
You could probably find or write a simple microbenchmark which shows the
difference that introducing the code causes, but I doubt it would show
the inline versus the callout. Either way, we have probably spent more
time discussing benchmarking this than it is worth at this point in
time.

I would expect the do_no_page() path to be the easiest place to
identify the change. I have a simple test which maps a large region and
then touches a large number of pages. The whole loop is surrounded by
reads of the Shub RTC register. This was done to determine the effect
of quicklists on page faulting. That type of microbenchmark might be
your best bet at finding the problem.

Robin

^ permalink raw reply [flat|nested] 10+ messages in thread
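A page-touch microbenchmark of the kind Robin describes could look roughly like the following. This is a sketch under assumptions, not his test: time_page_touches() is a hypothetical name, and clock_gettime(CLOCK_MONOTONIC) is swapped in for the Shub RTC register, which is only available on SGI hardware. Each write to a fresh anonymous page takes a page fault, so the timed loop exercises the fault path where the hooks were added.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Map npages anonymous pages, write one byte to each so that every page
 * is faulted in, and return the elapsed time of the touch loop in
 * nanoseconds, or -1 if the mapping could not be set up.
 */
static int64_t time_page_touches(size_t npages)
{
	long page = sysconf(_SC_PAGESIZE);
	volatile unsigned char *p;
	struct timespec t0, t1;
	size_t len, i;

	assert(npages > 0);
	if (page <= 0)
		return -1;
	len = npages * (size_t)page;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < npages; i++)
		p[i * (size_t)page] = 1;	/* first write faults the page in */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	munmap((void *)p, len);
	return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
		+ (t1.tv_nsec - t0.tv_nsec);
}
```

Dividing the result by npages gives an approximate per-fault cost; comparing that figure between kernels built with and without the hooks is the kind of measurement suggested above, and several runs would be needed to separate a real difference from noise.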
* Re: [Lse-tech] Re: [PATCH 2.6.9-rc2 2/2] enhanced MM accounting data collection
2004-09-28 9:33 ` Paul Jackson
2004-09-28 11:38 ` Robin Holt
@ 2004-10-02 0:38 ` Jay Lan
1 sibling, 0 replies; 10+ messages in thread
From: Jay Lan @ 2004-10-02 0:38 UTC (permalink / raw)
To: Paul Jackson
Cc: linux-kernel, lse-tech, csa, akpm, guillaume.thouvenin, tim, corliss

Paul Jackson wrote:
> nits:
>
>  1) I'm not sure the "no-op if CONFIG_CSA not set" comments
>     are worthwhile - it does not seem to be a common practice
>     to mark macros that collapse under certain CONFIG's with
>     such comments, and some code, such as in fork.c, would
>     become quite a bit less readable if such comments were
>     widely used.

Yeah, that makes sense. Will be fixed in next posting.

>
>  2) Three of the added csa_update_integrals() lines have
>     leading spaces, instead of a tab char, such as in:
>
>	===================================================================
>	--- linux.orig/fs/exec.c	2004-09-27 11:57:40.201435722 -0700
>	+++ linux/fs/exec.c	2004-09-27 14:05:41.266160725 -0700
>	@@ -1163,6 +1164,9 @@
>
>			/* execve success */
>			security_bprm_free(&bprm);
>	+		/* no-op if CONFIG_CSA not set */
>	+                csa_update_integrals();	<=========
>	+                update_mem_hiwater();	<=========
>			return retval;
>		}

Caused by 'cut-n-paste'. Will be fixed.

>
>  3) Is it always the case that csa_update_integrals() and
>     update_mem_hiwater() are used together? If so, perhaps
>     they could be collapsed into one? Even the current->mm
>     test inside them could be made one test, perhaps?

As Robin pointed out, there are a couple of instances that are not
the case. Actually there are three.

Thanks for your feedback, Paul!

- jay

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2.6.9-rc2 1/2] enhanced I/O accounting data collection
@ 2004-09-28 15:21 Jens Axboe
2004-09-29 23:01 ` Jay Lan
0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2004-09-28 15:21 UTC (permalink / raw)
To: Jay Lan, Linux Kernel
Hi,
> Index: linux/drivers/block/ll_rw_blk.c
> ===================================================================
> --- linux.orig/drivers/block/ll_rw_blk.c 2004-09-12 22:31:31.000000000 -0700
> +++ linux/drivers/block/ll_rw_blk.c 2004-09-27 12:37:04.374234677 -0700
> @@ -1741,6 +1741,7 @@
> {
> DEFINE_WAIT(wait);
> struct request *rq;
>+ unsigned long start_wait = jiffies;
>
> generic_unplug_device(q);
> do {
>@@ -1769,6 +1770,7 @@
> finish_wait(&rl->wait[rw], &wait);
> } while (!rq);
>
>+ current->bwtime += (unsigned long) jiffies - start_wait;
> return rq;
> }
What is the purpose of this hunk alone as block io accounting? It
doesn't make any sense to me - you are accounting the time a process
spends sleeping on a congested queue, it has nothing to do with the
bandwidth time it used. Which, btw, isn't so easy to account on queueing
hardware.
Just curious on what you are trying to achieve here.
--
Jens Axboe
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2.6.9-rc2 1/2] enhanced I/O accounting data collection
2004-09-28 15:21 [PATCH 2.6.9-rc2 1/2] enhanced I/O " Jens Axboe
@ 2004-09-29 23:01 ` Jay Lan
0 siblings, 0 replies; 10+ messages in thread
From: Jay Lan @ 2004-09-29 23:01 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linux Kernel

You are right, Jens.

In our earlier posting, we also included block device read/write
counters. The block read/write counts are not very accurate, but they
fit our customers' needs, since they used that information more for
performance analysis than for accounting purposes. Thus the block
read/write counters were removed from our patch so that we can
concentrate on the accounting needs.

This bwtime (block wait time) should have been pulled together with
the block read/write counters.

Regards,
- jay

Jens Axboe wrote:
> Hi,
>
>
>>Index: linux/drivers/block/ll_rw_blk.c
>>===================================================================
>>--- linux.orig/drivers/block/ll_rw_blk.c	2004-09-12 22:31:31.000000000 -0700
>>+++ linux/drivers/block/ll_rw_blk.c	2004-09-27 12:37:04.374234677 -0700
>>@@ -1741,6 +1741,7 @@
>>{
>>	DEFINE_WAIT(wait);
>>	struct request *rq;
>>+	unsigned long start_wait = jiffies;
>>
>>	generic_unplug_device(q);
>>	do {
>>@@ -1769,6 +1770,7 @@
>>		finish_wait(&rl->wait[rw], &wait);
>>	} while (!rq);
>>
>>+	current->bwtime += (unsigned long) jiffies - start_wait;
>>	return rq;
>>}
>
>
> What is the purpose of this hunk alone as block io accounting? It
> doesn't make any sense to me - you are accounting the time a process
> spends sleeping on a congested queue, it has nothing to do with the
> bandwidth time it used. Which, btw, isn't so easy to account on queueing
> hardware.
>
> Just curious on what you are trying to achieve here.
>

^ permalink raw reply [flat|nested] 10+ messages in thread
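The quoted hunk measures elapsed ticks by subtracting two unsigned long samples of jiffies. A minimal sketch of why that subtraction stays correct even when the tick counter wraps between the two samples; elapsed_ticks() is a hypothetical helper, not something the patch defines:

```c
#include <assert.h>
#include <limits.h>

/*
 * Elapsed ticks between two samples of a free-running unsigned counter.
 * Unsigned arithmetic is modulo 2^N, so the difference is still right if
 * the counter wrapped once between start and now.
 */
static unsigned long elapsed_ticks(unsigned long start, unsigned long now)
{
	return now - start;
}
```

For example, a wait that starts at ULONG_MAX - 2 and ends at 3 spans six ticks, and elapsed_ticks(ULONG_MAX - 2, 3) returns 6. As Jens notes, what this accumulates is time spent sleeping until a request becomes available, not the time the device spent servicing the task's I/O.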