* Is notify_die being overloaded?
@ 2006-04-13 19:46 Robin Holt
2006-04-15 6:19 ` Keith Owens
2006-04-17 16:45 ` Keshavamurthy Anil S
0 siblings, 2 replies; 13+ messages in thread
From: Robin Holt @ 2006-04-13 19:46 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, Anil S Keshavamurthy, Keith Owens, Dean Nelson
notify_die seems to be called both to indicate that the machine is going
down and to report trapped events for a process.
Specifically, the following call notify_die when there are machine
related events:
ia64_mca_rendez_int_handler (DIE_MCA_RENDZVOUS_ENTER,
DIE_MCA_RENDZVOUS_PROCESS, DIE_MCA_RENDZVOUS_LEAVE)
ia64_mca_handler (DIE_MCA_MONARCH_ENTER, DIE_MCA_MONARCH_PROCESS,
DIE_MCA_MONARCH_LEAVE)
ia64_init_handler (DIE_INIT_ENTER,
DIE_INIT_{SLAVE|MONARCH}_{ENTER|PROCESS|LEAVE})
ia64_mca_init (DIE_MCA_NEW_TIMEOUT)
machine_restart (DIE_MACHINE_RESTART)
machine_halt (DIE_MACHINE_HALT)
die (DIE_OOPS)
The following seem to be process related:
ia64_bad_break (DIE_BREAK, DIE_FAULT)
ia64_do_page_fault (DIE_PAGE_FAULT)
Shouldn't these really be separated into two separate notifier chains?
One for OS-level die() type activity and another for process faults
which a debugger et al. would want to know about?
The specific concern is some testing we have been doing with an upcoming
OSD release. We see notify_die being called from ia64_do_page_fault
frequently in our performance samples. On these machines, xpc has
registered a die notifier, so callouts are occurring which have no
relationship to a process's page faulting. XPC is looking for events
which indicate the OS is stopping. Additionally, kdb is installed on
this machine and it too has registered a die notifier.
Thanks,
Robin Holt
* Re: Is notify_die being overloaded?
From: Keith Owens @ 2006-04-15 6:19 UTC (permalink / raw)
To: Robin Holt; +Cc: linux-kernel, Andrew Morton, Anil S Keshavamurthy, Dean Nelson

Robin Holt (on Thu, 13 Apr 2006 14:46:44 -0500) wrote:
>notify_die seems to be called both to indicate that the machine is
>going down and to report trapped events for a process.
>
>Specifically, the following call notify_die when there are machine
>related events:
>ia64_mca_rendez_int_handler (DIE_MCA_RENDZVOUS_ENTER,
>        DIE_MCA_RENDZVOUS_PROCESS, DIE_MCA_RENDZVOUS_LEAVE)
>ia64_mca_handler (DIE_MCA_MONARCH_ENTER, DIE_MCA_MONARCH_PROCESS,
>        DIE_MCA_MONARCH_LEAVE)
>ia64_init_handler (DIE_INIT_ENTER,
>        DIE_INIT_{SLAVE|MONARCH}_{ENTER|PROCESS|LEAVE})
>ia64_mca_init (DIE_MCA_NEW_TIMEOUT)
>machine_restart (DIE_MACHINE_RESTART)
>machine_halt (DIE_MACHINE_HALT)
>die (DIE_OOPS)
>
>The following seem to be process related:
>ia64_bad_break (DIE_BREAK, DIE_FAULT)
>ia64_do_page_fault (DIE_PAGE_FAULT)
>
>Shouldn't these really be separated into two separate notifier chains?
>One for OS-level die() type activity and another for process faults
>which a debugger et al. would want to know about?

The only real problem is the page fault handler event. All the other
calls to notify_die() are for rare events (MCA, INIT, restarts, halt,
oops) or for debugging events, none of which are performance critical.

DIE_PAGE_FAULT is only called because kprobes needs it, but that call
is on a performance-critical path and it can significantly slow down
the rest of the system. kprobes should be using its own notify chain
to trap page faults, and the handler for that chain should be
optimized away when CONFIG_KPROBES=n or there are no active probes.
* Re: Is notify_die being overloaded?
From: Robin Holt @ 2006-04-15 10:43 UTC (permalink / raw)
To: Keith Owens
Cc: linux-kernel, Andrew Morton, Anil S Keshavamurthy, Dean Nelson

On Sat, Apr 15, 2006 at 04:19:55PM +1000, Keith Owens wrote:
> The only real problem is the page fault handler event. All the other
...
> kprobes should be using its own notify chain to trap page faults, and
> the handler for that chain should be optimized away when
> CONFIG_KPROBES=n or there are no active probes.

I realize the page fault handler is the only performance critical
event, but don't all the debugging events _REALLY_ deserve a separate
call chain? They are _completely_ separate and isolated events. One is
a minor event which a small number of other userland processes are
concerned with. The other indicates the machine is about to stop
running and is only relevant to critical system infrastructure.

When I get back from vacation on Tuesday, I will try to work up a
patch which introduces a notify_debug() call and its call chain.
Maybe that will initiate more discussion.

Thanks,
Robin
* Re: Is notify_die being overloaded?
From: Keith Owens @ 2006-04-17 7:52 UTC (permalink / raw)
To: Robin Holt; +Cc: linux-kernel, Andrew Morton, Anil S Keshavamurthy, Dean Nelson

Robin Holt (on Sat, 15 Apr 2006 05:43:56 -0500) wrote:
>I realize the page fault handler is the only performance critical
>event, but don't all the debugging events _REALLY_ deserve a separate
>call chain? They are _completely_ separate and isolated events.

Unfortunately the events are ambiguous. On IA64, BUG() maps to break 0,
but break 0 is also used for debugging[*]. That makes it awkward to
differentiate between a kernel error and a debug event: we have to
first ask the debuggers if the event is for them and then, if the
debuggers do not want the event, drop into the die_if_kernel path.

[*] It does not help that IA64 break.b <n> does not store the value of
<n> in cr.iim. All break.b values look like break.b 0. There used to
be code in traps.c to detect this and extract the value of break.b,
but a kprobes patch removed that code.
* Re: Is notify_die being overloaded?
From: Robin Holt @ 2006-04-17 10:51 UTC (permalink / raw)
To: Keith Owens
Cc: Robin Holt, linux-kernel, Andrew Morton, Anil S Keshavamurthy, Dean Nelson

On Mon, Apr 17, 2006 at 05:52:10PM +1000, Keith Owens wrote:
> Unfortunately the events are ambiguous. On IA64, BUG() maps to
> break 0, but break 0 is also used for debugging[*]. That makes it
> awkward to differentiate between a kernel error and a debug event:
> we have to first ask the debuggers if the event is for them and
> then, if the debuggers do not want the event, drop into the
> die_if_kernel path.

I think this still argues for a notify_debugger() sort of callout,
which would read something like:

	if (notify_debugger(...) == NOTIFY_STOP)
		return;
	die_if_kernel(...);

That makes more sense than a notify_die() in there. Am I missing
something?

Thanks,
Robin
* Re: Is notify_die being overloaded?
From: Robin Holt @ 2006-04-17 11:25 UTC (permalink / raw)
To: Robin Holt
Cc: Keith Owens, linux-kernel, Andrew Morton, Anil S Keshavamurthy, Dean Nelson

On Mon, Apr 17, 2006 at 05:51:44AM -0500, Robin Holt wrote:
> I think this still argues for a notify_debugger() sort of callout,
> which would read something like:

I finally think I understand your point. You are saying that kdb would
have to register for the notify_debugger() chain and would therefore
get in the way of handle_page_fault(). What about changing the
notify_die() callout in handle_page_fault() into a notify_page_fault()?
That actually feels a lot better now that you got me to think about it.

Thanks,
Robin
* Re: Is notify_die being overloaded?
From: Keith Owens @ 2006-04-18 0:23 UTC (permalink / raw)
To: Robin Holt; +Cc: linux-kernel, Andrew Morton, Anil S Keshavamurthy, Dean Nelson

Robin Holt (on Mon, 17 Apr 2006 06:25:52 -0500) wrote:
>I finally think I understand your point. You are saying that kdb would
>have to register for the notify_debugger() chain and would therefore
>get in the way of handle_page_fault(). What about changing the
>notify_die() callout in handle_page_fault() into a notify_page_fault()?
>That actually feels a lot better now that you got me to think about it.

I thought that is what I said in my original response: "kprobes should
be using its own notify chain to trap page faults, and the handler for
that chain should be optimized away when CONFIG_KPROBES=n or there are
no active probes".

Even the overhead of calling into a notify_page_fault() routine just to
do nothing adds a measurable overhead to the page fault handler
(according to Jack Steiner). Since kprobes is the only code that needs
a callback on a page fault, it is up to kprobes to minimize the impact
of that callback on the normal processing.
* ia64_do_page_fault shows 19.4% slowdown from notify_die.
From: Robin Holt @ 2006-04-18 22:16 UTC (permalink / raw)
To: Keith Owens, Anil S Keshavamurthy, prasanna, ananth, davem
Cc: tony.luck, linux-kernel, Andrew Morton

On Tue, Apr 18, 2006 at 10:23:52AM +1000, Keith Owens wrote:
> I thought that is what I said in my original response: "kprobes should

I was a little dense and had forgotten that KDB would still need to
register as a debugger.

Some micro-benchmarking has shown this to be very painful. The average
of 128 iterations with 4194304 faults per iteration using the attached
micro-benchmark showed the following:

499 nSec/fault  ia64_do_page_fault with notify_die commented out.
501 nSec/fault  ia64_do_page_fault with nobody registered.
533 nSec/fault  notify_die in and just kprobes.
596 nSec/fault  notify_die in and kdb, kprobes, mca, and xpc loaded.

The 596 nSec/fault is a 19.4% slowdown. This is an upcoming OSD beta
kernel. It will be representative of what our typical customer will
have loaded. Is this enough justification for breaking notify_die into
notify_page_fault for the fault path?

> that chain should be optimized away when CONFIG_KPROBES=n or there
> are no active probes".

Having the notify_page_fault() without anybody registered was only a
0.4% slowdown. I am not sure that justifies the optimize-away, but I
would certainly not object.

I think the second and third numbers also indicate strongly that
kprobes should only be registering the notify_page_fault handler when
it is actually monitoring for a memory access. I know so little about
how kprobes works, I will stop right there. Is there anybody who is
willing to take that task or explain why it is impossible?

Thanks,
Robin Holt

------------------ Page fault micro-benchmark -------------------------

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define PAGE_SIZE getpagesize()
#define STRIDE PAGE_SIZE
#define FAULTS_TO_CAUSE (2048UL * 2048UL)
#define MAPPING_SIZE (FAULTS_TO_CAUSE * STRIDE)
#define LOOPS_TO_TIME 128

int main(int argc, char **argv)
{
	long offset, i, j;
	char *mapping;
	volatile char z;
	struct timeval tv;
	unsigned long start_ts, end_ts;
	unsigned long total_uSec;
	struct timezone tz;
	pid_t child;
	int child_status;

	tz.tz_minuteswest = 0;
	total_uSec = 0;
	mapping = mmap(NULL, (size_t) MAPPING_SIZE, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
	if ((unsigned long) mapping == -1UL) {
		perror("Mapping failed.");
		exit(1);
	}
	for (j = 0; j < LOOPS_TO_TIME; j++) {
		child = fork();
		if (child > 0) {
			wait(&child_status);
		} else if (child == 0) {
			gettimeofday(&tv, &tz);
			start_ts = tv.tv_sec * 1000000 + tv.tv_usec;
			/* touch one new page per iteration */
			for (i = 0; i < FAULTS_TO_CAUSE; i++) {
				offset = i * STRIDE;
				z = mapping[offset];
			}
			gettimeofday(&tv, &tz);
			end_ts = tv.tv_sec * 1000000 + tv.tv_usec;
			total_uSec += (end_ts - start_ts);
			printf("Took %ld nSecs per fault\n",
			       (total_uSec * 1000) / FAULTS_TO_CAUSE);
			exit(0);
		} else {
			printf("Fork failed\n");
			exit(1);
		}
	}
	munmap(mapping, (size_t) MAPPING_SIZE);
	return 0;
}
* Re: ia64_do_page_fault shows 19.4% slowdown from notify_die.
From: Keshavamurthy Anil S @ 2006-04-18 23:03 UTC (permalink / raw)
To: Robin Holt
Cc: Keith Owens, Anil S Keshavamurthy, prasanna, ananth, davem, tony.luck, linux-kernel, Andrew Morton

On Tue, Apr 18, 2006 at 05:16:23PM -0500, Robin Holt wrote:
> 499 nSec/fault  ia64_do_page_fault with notify_die commented out.
> 501 nSec/fault  ia64_do_page_fault with nobody registered.
> 533 nSec/fault  notify_die in and just kprobes.
> 596 nSec/fault  notify_die in and kdb, kprobes, mca, and xpc loaded.
>
> The 596 nSec/fault is a 19.4% slowdown. ...
>
> Is this enough justification for breaking notify_die into
> notify_page_fault for the fault path?

Yes sir, I am convinced 100%.

> Is there anybody who is willing to take that task or explain why it
> is impossible?

I will take it up and submit a patch soon. Thanks for your analysis.

-Anil
* Re: ia64_do_page_fault shows 19.4% slowdown from notify_die.
From: Andi Kleen @ 2006-04-19 0:30 UTC (permalink / raw)
To: Robin Holt; +Cc: tony.luck, linux-kernel, Andrew Morton

Robin Holt <holt@sgi.com> writes:
> 499 nSec/fault  ia64_do_page_fault with notify_die commented out.
> 501 nSec/fault  ia64_do_page_fault with nobody registered.
> 533 nSec/fault  notify_die in and just kprobes.
> 596 nSec/fault  notify_die in and kdb, kprobes, mca, and xpc loaded.
>
> The 596 nSec/fault is a 19.4% slowdown. This is an upcoming OSD beta
> kernel. It will be representative of what our typical customer will
> have loaded.

With kdb some slowdown is expected. But just going through kprobes
shouldn't be that slow. I guess there would be optimization potential
there. Do you have finer-grained profiling of what is actually slow?

> Having the notify_page_fault() without anybody registered was only a
> 0.4% slowdown. I am not sure that justifies the optimize-away, but I
> would certainly not object.

That still sounds far too much for what is essentially a call + load +
test + return. Where is that overhead coming from? I know IA64 doesn't
like indirect calls, but there shouldn't be any there for this case.

-Andi
* Re: ia64_do_page_fault shows 19.4% slowdown from notify_die.
From: Robin Holt @ 2006-04-19 11:11 UTC (permalink / raw)
To: Andi Kleen; +Cc: Robin Holt, tony.luck, linux-kernel, Andrew Morton

On Wed, Apr 19, 2006 at 02:30:35AM +0200, Andi Kleen wrote:
> With kdb some slowdown is expected.

kdb does not register a die notifier. It only does the notify_die
callouts. Sorry for the confusion. The mca handler and xpc both
register notifiers and both have very early exits.

> But just going through kprobes shouldn't be that slow. I guess there
> would be optimization potential there.
>
> Do you have finer-grained profiling of what is actually slow?
>
> That still sounds far too much for what is essentially a call + load
> + test + return. Where is that overhead coming from? I know IA64
> doesn't like indirect calls, but there shouldn't be any there for
> this case.

I think each registered notifier is adding approximately 32 nSec.
Actually, the noise on these samples was about +-9 nSec, which I
assumed was processor stalls on cacheline loads. It looks like a lot
of time when viewed as nSec, but when viewed as a percentage of
process run time it is probably not that great of an issue, which is
why it has been allowed to creep by for so long.

I cannot think of an easy way to diagnose this slowdown any further.
I could run through this code on the simulator so you can see which
instructions actually got executed. Would that be helpful?

Thanks,
Robin
* Re: Is notify_die being overloaded?
From: Keshavamurthy Anil S @ 2006-04-17 16:50 UTC (permalink / raw)
To: Keith Owens
Cc: Robin Holt, linux-kernel, Andrew Morton, Anil S Keshavamurthy, Dean Nelson

On Mon, Apr 17, 2006 at 05:52:10PM +1000, Keith Owens wrote:
> [*] It does not help that IA64 break.b <n> does not store the value
> of <n> in cr.iim. All break.b values look like break.b 0. There used
> to be code in traps.c to detect this and extract the value of
> break.b, but a kprobes patch removed that code.

Yes, the kprobes code removed it because, by the time this cpu reads
the ia64 instruction to decode the break value, the other cpu might
have replaced that instruction with the original one due to an
unregister_kprobes() call. Hence reading/decoding the instruction
might yield the wrong break number, so it is not a good idea to decode
the instruction.

-Anil
* Re: Is notify_die being overloaded?
From: Keshavamurthy Anil S @ 2006-04-17 16:45 UTC (permalink / raw)
To: Robin Holt
Cc: linux-kernel, Andrew Morton, Anil S Keshavamurthy, Keith Owens, Dean Nelson

On Thu, Apr 13, 2006 at 02:46:44PM -0500, Robin Holt wrote:
> Shouldn't these really be separated into two separate notifier
> chains? One for OS-level die() type activity and another for process
> faults which a debugger et al. would want to know about?
>
> The specific concern is some testing we have been doing with an
> upcoming OSD release. We see notify_die being called from
> ia64_do_page_fault frequently in our performance samples.

Since DIE_PAGE_FAULT is the one which comes in the performance path,
I think this is what should be optimized, and I would suggest making
notify_die(DIE_PAGE_FAULT, ...) into a separate notifier chain
(something like notify_page_fault(), which calls just the registered
handlers). That way, in the performance-critical path, we will be
calling only the required handlers (probably only the kprobes
handlers) and not the whole world registered on the notify_die() call
chain.

-thanks,
Anil
end of thread, other threads: [~2006-04-19 11:11 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-04-13 19:46 Is notify_die being overloaded? Robin Holt
2006-04-15  6:19 ` Keith Owens
2006-04-15 10:43   ` Robin Holt
2006-04-17  7:52     ` Keith Owens
2006-04-17 10:51       ` Robin Holt
2006-04-17 11:25         ` Robin Holt
2006-04-18  0:23           ` Keith Owens
2006-04-18 22:16             ` ia64_do_page_fault shows 19.4% slowdown from notify_die. Robin Holt
2006-04-18 23:03               ` Keshavamurthy Anil S
2006-04-19  0:30               ` Andi Kleen
2006-04-19 11:11                 ` Robin Holt
2006-04-17 16:50       ` Is notify_die being overloaded? Keshavamurthy Anil S
2006-04-17 16:45 ` Keshavamurthy Anil S