* Question about core files
@ 2009-10-06 14:04 Holger Kiehl
2009-10-06 14:41 ` Manish Katiyar
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Holger Kiehl @ 2009-10-06 14:04 UTC
To: linux-c-programming
Hello
Most of the time I compile my application without the -g option for
performance reasons. The problem is that when it hits a bug and dumps
core, the core file is not very useful because there is hardly any
information in it. Is there some way to get useful information out of
the core file? For example, one of my programs crashed, and with gdb
I see the following:
afd@helena:~$ gdb fd core.2515
GNU gdb Fedora (6.8-24.fc9)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(no debugging symbols found)
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib64/libc-2.8.so...Reading symbols from /usr/lib/debug/lib64/libc-2.8.so.debug...done.
done.
Loaded symbols for /lib64/libc-2.8.so
Reading symbols from /lib64/ld-2.8.so...Reading symbols from /usr/lib/debug/lib64/ld-2.8.so.debug...done.
done.
Loaded symbols for /lib64/ld-2.8.so
Reading symbols from /lib64/libnss_files-2.8.so...Reading symbols from /usr/lib/debug/lib64/libnss_files-2.8.so.debug...done.
done.
Loaded symbols for /lib64/libnss_files-2.8.so
Core was generated by `fd -w /home/afd'.
Program terminated with signal 6, Aborted.
[New process 2515]
#0 0x000000304cc32215 in raise (sig=<value optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) where
#0 0x000000304cc32215 in raise (sig=<value optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x000000304cc33d83 in abort () at abort.c:88
#2 0x000000000040b174 in sig_segv ()
#3 <signal handler called>
#4 0x0000000000404b5f in start_process ()
#5 0x0000000000407b9a in main ()
At least I know that the bug is in my function start_process. But is
there some way to find out at what line it happened?
Thanks,
Holger
* Re: Question about core files
2009-10-06 14:04 Question about core files Holger Kiehl
@ 2009-10-06 14:41 ` Manish Katiyar
2009-10-07 13:28 ` Holger Kiehl
2009-10-07 4:45 ` Glynn Clements
2009-10-07 4:58 ` vinit dhatrak
2 siblings, 1 reply; 18+ messages in thread
From: Manish Katiyar @ 2009-10-06 14:41 UTC
To: Holger Kiehl; +Cc: linux-c-programming

On Tue, Oct 6, 2009 at 7:34 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> Hello
>
> Most of the time I compile my application without the -g option for
> performance reasons. The problem is that when it hits a bug and dumps
> core, the core file is not very useful because there is hardly any
> information in it. Is there some way to get useful information out of
> the core file?

Is it possible to post your code? At least the start_process()
function. Given that you got a SIGSEGV, it is probably an invalid
pointer access.

You can also try printing $eip (or $rip, since this is a 64-bit machine)
and looking around the assembly. The output of "disas start_process"
from gdb will also help.

> For example, one of my programs crashed, and with gdb
> I see the following:
>
> [...]
> (gdb) where
> #0 0x000000304cc32215 in raise (sig=<value optimized out>)
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1 0x000000304cc33d83 in abort () at abort.c:88
> #2 0x000000000040b174 in sig_segv ()
> #3 <signal handler called>
> #4 0x0000000000404b5f in start_process ()
> #5 0x0000000000407b9a in main ()
>
> At least I know that the bug is in my function start_process. But is
> there some way to find out at what line it happened?
>
> Thanks,
> Holger

--
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================
* Re: Question about core files
2009-10-06 14:41 ` Manish Katiyar
@ 2009-10-07 13:28 ` Holger Kiehl
2009-10-07 13:54 ` Manish Katiyar
0 siblings, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-07 13:28 UTC
To: Manish Katiyar; +Cc: linux-c-programming

On Tue, 6 Oct 2009, Manish Katiyar wrote:

> Is it possible to post your code? At least the start_process()
> function. Given that you got a SIGSEGV, it is probably an invalid
> pointer access.
>
The code is GPL, so that is no problem. However, it is long, so I just
cut out start_process(), which you will find below.

> You can also try printing $eip (or $rip, since this is a 64-bit machine)
> and looking around the assembly. The output of "disas start_process"
> from gdb will also help.
>
I tried those, but I am not familiar with assembly:

(gdb) print $eip
$1 = void
(gdb) print $rip
$2 = (void (*)()) 0x404b5f <start_process+143>
(gdb) where
#0 0x000000304cc32215 in raise (sig=<value optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x000000304cc33d83 in abort () at abort.c:88
#2 0x000000000040b174 in sig_segv ()
#3 <signal handler called>
#4 0x0000000000404b5f in start_process ()
#5 0x0000000000407b9a in main ()
(gdb) disas start_process
Dump of assembler code for function start_process:
0x0000000000404ad0 <start_process+0>:    movslq %esi,%rsi
0x0000000000404ad3 <start_process+3>:    mov    %rbx,-0x30(%rsp)
0x0000000000404ad8 <start_process+8>:    mov    %rbp,-0x28(%rsp)
0x0000000000404add <start_process+13>:   mov    %rsi,%r11
0x0000000000404ae0 <start_process+16>:   mov    $0x68,%esi
0x0000000000404ae5 <start_process+21>:   mov    %r12,-0x20(%rsp)
0x0000000000404aea <start_process+26>:   imul   %rsi,%r11
0x0000000000404aee <start_process+30>:   mov    %r13,-0x18(%rsp)
0x0000000000404af3 <start_process+35>:   mov    %r14,-0x10(%rsp)
0x0000000000404af8 <start_process+40>:   mov    %r15,-0x8(%rsp)
0x0000000000404afd <start_process+45>:   sub    $0x568,%rsp
0x0000000000404b04 <start_process+52>:   mov    %rdx,%rbx
0x0000000000404b07 <start_process+55>:   mov    %edi,0x24(%rsp)
0x0000000000404b0b <start_process+59>:   mov    %r11,%rdi
0x0000000000404b0e <start_process+62>:   add    0x225513(%rip),%rdi        # 0x62a028 <qb>
0x0000000000404b15 <start_process+69>:   cmpb   $0x0,0x31(%rdi)
0x0000000000404b19 <start_process+73>:   je     0x404ed8 <start_process+1032>
0x0000000000404b1f <start_process+79>:   movslq 0x28(%rdi),%rax
0x0000000000404b23 <start_process+83>:   lea    0x0(,%rax,8),%rdx
0x0000000000404b2b <start_process+91>:   mov    %rax,%r8
0x0000000000404b2e <start_process+94>:   shl    $0x6,%r8
0x0000000000404b32 <start_process+98>:   sub    %rdx,%r8
0x0000000000404b35 <start_process+101>:  add    0x2259cc(%rip),%r8        # 0x62a508 <mdb>
0x0000000000404b3c <start_process+108>:  mov    0x2c(%r8),%r9d
0x0000000000404b40 <start_process+112>:  test   %r9d,%r9d
0x0000000000404b43 <start_process+115>:  jne    0x404d70 <start_process+672>
0x0000000000404b49 <start_process+121>:  movslq 0x24(%rsp),%rax
0x0000000000404b4e <start_process+126>:  imul   $0x8f8,%rax,%r14
0x0000000000404b55 <start_process+133>:  mov    %r14,%rax
0x0000000000404b58 <start_process+136>:  add    0x225441(%rip),%rax        # 0x629fa0 <fsa>
0x0000000000404b5f <start_process+143>:  mov    0xec(%rax),%edx
0x0000000000404b65 <start_process+149>:  test   $0x1,%dl
0x0000000000404b68 <start_process+152>:  jne    0x404d30 <start_process+608>
0x0000000000404b6e <start_process+158>:  dec    %ecx
0x0000000000404b70 <start_process+160>:  je     0x404bd0 <start_process+256>
0x0000000000404b72 <start_process+162>:  mov    0xf0(%rax),%ecx
0x0000000000404b78 <start_process+168>:  mov    $0x2,%esi
0x0000000000404b7d <start_process+173>:  test   %ecx,%ecx
0x0000000000404b7f <start_process+175>:  jne    0x404c88 <start_process+440>
0x0000000000404b85 <start_process+181>:  test   %dl,%dl
0x0000000000404b87 <start_process+183>:  jns    0x404bd0 <start_process+256>
0x0000000000404b89 <start_process+185>:  mov    0x104(%rax),%ecx
0x0000000000404b8f <start_process+191>:  movslq 0x28(%rdi),%rax
0x0000000000404b93 <start_process+195>:  mov    $0xffffffff,%esi
0x0000000000404b98 <start_process+200>:  mov    %r11,(%rsp)
0x0000000000404b9c <start_process+204>:  lea    0x0(,%rax,8),%rdx
0x0000000000404ba4 <start_process+212>:  shl    $0x6,%rax
0x0000000000404ba8 <start_process+216>:  sub    %rdx,%rax
0x0000000000404bab <start_process+219>:  mov    0x225956(%rip),%rdx        # 0x62a508 <mdb>
0x0000000000404bb2 <start_process+226>:  mov    0x28(%rdx,%rax,1),%edi
0x0000000000404bb6 <start_process+230>:  mov    %rbx,%rdx
0x0000000000404bb9 <start_process+233>:  callq  0x41ab00 <check_error_queue>
0x0000000000404bbe <start_process+238>:  test   %eax,%eax
0x0000000000404bc0 <start_process+240>:  mov    %eax,%esi
0x0000000000404bc2 <start_process+242>:  mov    (%rsp),%r11
0x0000000000404bc6 <start_process+246>:  jne    0x404c88 <start_process+440>
0x0000000000404bcc <start_process+252>:  nopl   0x0(%rax)
0x0000000000404bd0 <start_process+256>:  mov    %r14,%rcx
0x0000000000404bd3 <start_process+259>:  add    0x2253c6(%rip),%rcx        # 0x629fa0 <fsa>
0x0000000000404bda <start_process+266>:  cmpb   $0x5,0xba(%rcx)
0x0000000000404be1 <start_process+273>:  je     0x404f88 <start_process+1208>
0x0000000000404be7 <start_process+279>:  mov    0x225462(%rip),%rax        # 0x62a050 <p_afd_status>
0x0000000000404bee <start_process+286>:  mov    0x225194(%rip),%ecx        # 0x629d88 <max_connections>
0x0000000000404bf4 <start_process+292>:  cmp    %ecx,0x4f4(%rax)
0x0000000000404bfa <start_process+298>:  jge    0x404d30 <start_process+608>
0x0000000000404c00 <start_process+304>:  mov    %r14,%r8
0x0000000000404c03 <start_process+307>:  add    0x225396(%rip),%r8        # 0x629fa0 <fsa>
0x0000000000404c0a <start_process+314>:  mov    0x174(%r8),%edi
0x0000000000404c11 <start_process+321>:  cmp    %edi,0x170(%r8)
0x0000000000404c18 <start_process+328>:  jge    0x404d30 <start_process+608>
0x0000000000404c1e <start_process+334>:  test   %ecx,%ecx
0x0000000000404c20 <start_process+336>:  jle    0x404c5e <start_process+398>
0x0000000000404c22 <start_process+338>:  mov    0x2251ff(%rip),%rsi        # 0x62---Type <return> to continue, or q <return> to quit---q

So all I now know is that it happened with the assembly instruction:

    mov    0xec(%rax),%edx

But what does that tell me? In what part of my code could this be?

Thanks,
Holger

--------- code of start_process() ----------
static pid_t
start_process(int fsa_pos, int qb_pos, time_t current_time, int retry)
{
   pid_t pid = PENDING;

   if ((qb[qb_pos].msg_name[0] != '\0') &&
       (mdb[qb[qb_pos].pos].age_limit > 0) &&
       ((fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA) == 0) &&
       (current_time > qb[qb_pos].creation_time) &&
       ((current_time - qb[qb_pos].creation_time) > mdb[qb[qb_pos].pos].age_limit))
   {
      char del_dir[MAX_PATH_LENGTH];

      if (fsa[fsa_pos].host_status & ERROR_QUEUE_SET)
      {
         remove_from_error_queue(mdb[qb[qb_pos].pos].job_id, &fsa[fsa_pos],
                                 fsa_pos, fsa_fd);
      }
      (void)sprintf(del_dir, "%s%s%s/%s",
                    p_work_dir, AFD_FILE_DIR,
                    OUTGOING_DIR, qb[qb_pos].msg_name);
      extract_cus(qb[qb_pos].msg_name, dl.input_time, dl.split_job_counter,
                  dl.unique_number);
      remove_job_files(del_dir, fsa_pos, mdb[qb[qb_pos].pos].job_id,
                       FD, AGE_OUTPUT, -1);
      ABS_REDUCE(fsa_pos);
      pid = REMOVED;
   }
   else
   {
      int in_error_queue = NEITHER;

      if ((qb[qb_pos].msg_name[0] == '\0') &&
          (*(unsigned char *)((char *)fsa - AFD_FEATURE_FLAG_OFFSET_END) & DISABLE_RETRIEVE))
      {
         ABS_REDUCE(fsa_pos);

         return(REMOVED);
      }

      if (((fsa[fsa_pos].host_status & STOP_TRANSFER_STAT) == 0) &&
          ((retry == YES) ||
           ((fsa[fsa_pos].error_counter == 0) &&
            (((fsa[fsa_pos].host_status & ERROR_QUEUE_SET) == 0) ||
             ((fsa[fsa_pos].host_status & ERROR_QUEUE_SET) &&
              ((in_error_queue = check_error_queue(mdb[qb[qb_pos].pos].job_id,
                                                   -1, current_time,
                                                   fsa[fsa_pos].retry_interval)) == NO)))) ||
           ((fsa[fsa_pos].error_counter > 0) &&
            (fsa[fsa_pos].host_status & ERROR_QUEUE_SET) &&
            ((current_time - (fsa[fsa_pos].last_retry_time + fsa[fsa_pos].retry_interval)) >= 0) &&
            ((in_error_queue == NO) ||
             ((in_error_queue == NEITHER) &&
              (check_error_queue(mdb[qb[qb_pos].pos].job_id, -1,
                                 current_time,
                                 fsa[fsa_pos].retry_interval) == NO)))) ||
           ((fsa[fsa_pos].active_transfers == 0) &&
            ((current_time - (fsa[fsa_pos].last_retry_time + fsa[fsa_pos].retry_interval)) >= 0))))
      {
         /*
          * First lets try and take an existing process,
          * that is waiting for more data to come.
          */
         if ((fsa[fsa_pos].original_toggle_pos == NONE) &&
             ((fsa[fsa_pos].protocol_options & DISABLE_BURSTING) == 0) &&
             (fsa[fsa_pos].keep_connected > 0) &&
             (fsa[fsa_pos].active_transfers > 0) &&
             (fsa[fsa_pos].jobs_queued > 0) &&
             ((((fsa[fsa_pos].special_flag & KEEP_CON_NO_SEND) == 0) &&
               (qb[qb_pos].msg_name[0] != '\0')) ||
              (((fsa[fsa_pos].special_flag & KEEP_CON_NO_FETCH) == 0) &&
               (qb[qb_pos].msg_name[0] == '\0'))) &&
             ((qb[qb_pos].special_flag & HELPER_JOB) == 0))
         {
            int i,
                other_job_wait_pos[MAX_NO_PARALLEL_JOBS],
                other_qb_pos[MAX_NO_PARALLEL_JOBS],
                wait_counter = 0;

            for (i = 0; i < fsa[fsa_pos].allowed_transfers; i++)
            {
               if ((fsa[fsa_pos].job_status[i].proc_id != -1) &&
                   (fsa[fsa_pos].job_status[i].unique_name[2] == 5))
               {
                  int exec_qb_pos;

                  qb_pos_pid(fsa[fsa_pos].job_status[i].proc_id, &exec_qb_pos);
                  if (exec_qb_pos != -1)
                  {
                     if ((qb[qb_pos].msg_name[0] != '\0') &&
                         (qb[exec_qb_pos].msg_name[0] != '\0') &&
                         (mdb[qb[qb_pos].pos].type == mdb[qb[exec_qb_pos].pos].type) &&
                         (mdb[qb[qb_pos].pos].port == mdb[qb[exec_qb_pos].pos].port))
                     {
                        if (qb[qb_pos].retries > 0)
                        {
                           fsa[fsa_pos].job_status[i].file_name_in_use[0] = '\0';
                           fsa[fsa_pos].job_status[i].file_name_in_use[1] = 1;
                           (void)sprintf(&fsa[fsa_pos].job_status[i].file_name_in_use[2],
                                         "%u", qb[qb_pos].retries);
                        }
                        fsa[fsa_pos].job_status[i].job_id = mdb[qb[qb_pos].pos].job_id;
                        mdb[qb[qb_pos].pos].last_transfer_time = mdb[qb[exec_qb_pos].pos].last_transfer_time = current_time;
                        (void)memcpy(fsa[fsa_pos].job_status[i].unique_name,
                                     qb[qb_pos].msg_name, MAX_MSG_NAME_LENGTH);
                        (void)memcpy(connection[qb[exec_qb_pos].connect_pos].msg_name,
                                     qb[qb_pos].msg_name, MAX_MSG_NAME_LENGTH);
                        qb[qb_pos].pid = qb[exec_qb_pos].pid;
                        qb[qb_pos].connect_pos = qb[exec_qb_pos].connect_pos;
                        qb[qb_pos].special_flag |= BURST_REQUEUE;
                        connection[qb[exec_qb_pos].connect_pos].job_no = i;
                        if (qb[exec_qb_pos].pid > 0)
                        {
                           if (kill(qb[exec_qb_pos].pid, SIGUSR1) == -1)
                           {
                              system_log(DEBUG_SIGN, __FILE__, __LINE__,
                                         "Failed to send SIGUSR1 to %lld : %s",
                                         (pri_pid_t)qb[exec_qb_pos].pid, strerror(errno));
                           }
                           p_afd_status->burst2_counter++;
                        }
                        else
                        {
                           system_log(DEBUG_SIGN, __FILE__, __LINE__,
                                      "Hmmm, pid = %lld!!!",
                                      (pri_pid_t)qb[exec_qb_pos].pid);
                        }
                        if ((fsa[fsa_pos].transfer_rate_limit > 0) ||
                            (no_of_trl_groups > 0))
                        {
                           calc_trl_per_process(fsa_pos);
                        }
                        ABS_REDUCE(fsa_pos);
                        remove_msg(exec_qb_pos);

                        return(qb[qb_pos].pid);
                     }
                     else
                     {
                        other_job_wait_pos[wait_counter] = i;
                        other_qb_pos[wait_counter] = exec_qb_pos;
                        wait_counter++;
                     }
                  }
                  else
                  {
                     system_log(DEBUG_SIGN, __FILE__, __LINE__,
                                "Unable to locate qb_pos for %lld [fsa_pos=%d].",
                                (pri_pid_t)fsa[fsa_pos].job_status[i].proc_id, fsa_pos);
                  }
               }
            }
            if ((fsa[fsa_pos].active_transfers == fsa[fsa_pos].allowed_transfers) &&
                (wait_counter > 0))
            {
               for (i = 0; i < wait_counter; i++)
               {
                  if (fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] == 5)
                  {
                     if (qb[other_qb_pos[i]].pid > 0)
                     {
                        fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] = 6;
                        if (qb[other_qb_pos[i]].msg_name[0] == '\0')
                        {
                           return(PENDING);
                        }
                        else
                        {
                           if (kill(qb[other_qb_pos[i]].pid, SIGUSR1) == -1)
                           {
                              system_log(DEBUG_SIGN, __FILE__, __LINE__,
                                         "Failed to send SIGUSR1 to %lld : %s",
                                         (pri_pid_t)qb[other_qb_pos[i]].pid, strerror(errno));
                              fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] = 5;
                           }
                           else
                           {
                              return(PENDING);
                           }
                        }
                     }
                     else
                     {
                        system_log(DEBUG_SIGN, __FILE__, __LINE__,
                                   "Hmmm, pid = %lld!!!",
                                   (pri_pid_t)qb[other_qb_pos[i]].pid);
                     }
                  }
               }
            }
         }

         if ((p_afd_status->no_of_transfers < max_connections) &&
             (fsa[fsa_pos].active_transfers < fsa[fsa_pos].allowed_transfers))
         {
            int pos;

            if ((pos = get_free_connection()) == INCORRECT)
            {
               system_log(ERROR_SIGN, __FILE__, __LINE__,
                          "Failed to get free connection.");
            }
            else
            {
               if ((connection[pos].job_no = get_free_disp_pos(fsa_pos)) != INCORRECT)
               {
                  if (qb[qb_pos].msg_name[0] == '\0')
                  {
                     connection[pos].fra_pos = qb[qb_pos].pos;
                     connection[pos].protocol = fra[qb[qb_pos].pos].protocol;
                     connection[pos].msg_name[0] = '\0';
                     (void)memcpy(connection[pos].dir_alias,
                                  fra[qb[qb_pos].pos].dir_alias,
                                  MAX_DIR_ALIAS_LENGTH + 1);
                  }
                  else
                  {
                     connection[pos].fra_pos = -1;
                     connection[pos].protocol = mdb[qb[qb_pos].pos].type;
                     (void)memcpy(connection[pos].msg_name,
                                  qb[qb_pos].msg_name, MAX_MSG_NAME_LENGTH);
                     connection[pos].dir_alias[0] = '\0';
                  }
                  if (qb[qb_pos].special_flag & RESEND_JOB)
                  {
                     connection[pos].resend = YES;
                  }
                  else
                  {
                     connection[pos].resend = NO;
                  }
                  connection[pos].temp_toggle = OFF;
                  (void)memcpy(connection[pos].hostname,
                               fsa[fsa_pos].host_alias,
                               MAX_HOSTNAME_LENGTH + 1);
                  connection[pos].host_id = fsa[fsa_pos].host_id;
                  connection[pos].fsa_pos = fsa_pos;
                  if (fd_check_fsa() == YES)
                  {
                     if (check_fra_fd() == YES)
                     {
                        init_fra_data();
                     }

                     /*
                      * We need to set the connection[pos].pid to a
                      * value higher then 0 so the function get_new_positions()
                      * also locates the new connection[pos].fsa_pos. Otherwise
                      * from here on we point to some completely different
                      * host and this can cause havoc when someone uses
                      * edit_hc and changes the alias order.
                      */
                     connection[pos].pid = 1;
                     get_new_positions();
                     connection[pos].pid = 0;
                     init_msg_buffer();
                     fsa_pos = connection[pos].fsa_pos;
                     last_pos_lookup = INCORRECT;
                  }
                  (void)strcpy(fsa[fsa_pos].job_status[connection[pos].job_no].unique_name,
                               qb[qb_pos].msg_name);
                  if ((fsa[fsa_pos].error_counter == 0) &&
                      (fsa[fsa_pos].auto_toggle == ON) &&
                      (fsa[fsa_pos].original_toggle_pos != NONE) &&
                      (fsa[fsa_pos].max_successful_retries > 0))
                  {
                     if ((fsa[fsa_pos].original_toggle_pos == fsa[fsa_pos].toggle_pos) &&
                         (fsa[fsa_pos].successful_retries > 0))
                     {
                        fsa[fsa_pos].original_toggle_pos = NONE;
                        fsa[fsa_pos].successful_retries = 0;
                     }
                     else if (fsa[fsa_pos].successful_retries >= fsa[fsa_pos].max_successful_retries)
                     {
                        connection[pos].temp_toggle = ON;
                        fsa[fsa_pos].successful_retries = 0;
                     }
                     else
                     {
                        fsa[fsa_pos].successful_retries++;
                     }
                  }

                  /* Create process to distribute file. */
                  if ((connection[pos].pid = make_process(&connection[pos], qb_pos)) > 0)
                  {
                     pid = fsa[fsa_pos].job_status[connection[pos].job_no].proc_id = connection[pos].pid;
                     fsa[fsa_pos].active_transfers += 1;
                     if ((fsa[fsa_pos].transfer_rate_limit > 0) ||
                         (no_of_trl_groups > 0))
                     {
                        calc_trl_per_process(fsa_pos);
                     }
                     ABS_REDUCE(fsa_pos);
                     qb[qb_pos].connect_pos = pos;
                     p_afd_status->no_of_transfers++;
                  }
                  else
                  {
                     fsa[fsa_pos].job_status[connection[pos].job_no].connect_status = NOT_WORKING;
                     fsa[fsa_pos].job_status[connection[pos].job_no].no_of_files = 0;
                     fsa[fsa_pos].job_status[connection[pos].job_no].no_of_files_done = 0;
                     fsa[fsa_pos].job_status[connection[pos].job_no].file_size = 0;
                     fsa[fsa_pos].job_status[connection[pos].job_no].file_size_done = 0;
                     fsa[fsa_pos].job_status[connection[pos].job_no].file_size_in_use = 0;
                     fsa[fsa_pos].job_status[connection[pos].job_no].file_size_in_use_done = 0;
                     fsa[fsa_pos].job_status[connection[pos].job_no].file_name_in_use[0] = '\0';
                     fsa[fsa_pos].job_status[connection[pos].job_no].file_name_in_use[1] = 0;
                     fsa[fsa_pos].job_status[connection[pos].job_no].unique_name[0] = '\0';
                     connection[pos].hostname[0] = '\0';
                     connection[pos].msg_name[0] = '\0';
                     connection[pos].host_id = 0;
                     connection[pos].job_no = -1;
                     connection[pos].fsa_pos = -1;
                     connection[pos].fra_pos = -1;
                     connection[pos].pid = 0;
                  }
               }
            }
         }
      }
   }

   return(pid);
}
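The instructions leading up to the faulting load can be read as ordinary array indexing. "mov %edi,0x24(%rsp)" saves the first argument (fsa_pos), "movslq 0x24(%rsp),%rax" and "imul $0x8f8,%rax,%r14" scale it by 0x8f8, "add 0x225441(%rip),%rax" adds the value of the global pointer fsa, and "mov 0xec(%rax),%edx" reads four bytes at offset 0xec from the result. The sketch below only restates that arithmetic. It assumes that 0x8f8 is the size of one fsa element, which is inferred from the disassembly and not taken from the AFD headers, and the function name faulting_address() is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Both constants are read off the disassembly, not the AFD headers. */
#define ELEMENT_SIZE  0x8f8u  /* "imul $0x8f8,%rax,%r14": assumed sizeof one element */
#define MEMBER_OFFSET 0xecu   /* "mov 0xec(%rax),%edx": offset of the member read */

/* Address touched by the faulting load for a given base pointer and index,
 * i.e. base + index * element size + member offset (hypothetical helper). */
static uintptr_t faulting_address(uintptr_t fsa_base, int fsa_pos)
{
    /* movslq + imul: sign-extend the index and scale it */
    intptr_t scaled = (intptr_t)fsa_pos * (intptr_t)ELEMENT_SIZE;

    /* add <fsa> + displacement 0xec */
    return fsa_base + (uintptr_t)scaled + MEMBER_OFFSET;
}
```

Read this way, the load can only fault if the value stored in fsa is not a valid mapping (any more) or if fsa_pos points outside the array. In the core file, the word at 0x629fa0 (the address gdb annotates as <fsa>) can be dumped with "x/gx 0x629fa0" to see which base pointer was in use at the time of the crash.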
* Re: Question about core files
2009-10-07 13:28 ` Holger Kiehl
@ 2009-10-07 13:54 ` Manish Katiyar
2009-10-07 14:21 ` Holger Kiehl
0 siblings, 1 reply; 18+ messages in thread
From: Manish Katiyar @ 2009-10-07 13:54 UTC
To: Holger Kiehl; +Cc: linux-c-programming

On Wed, Oct 7, 2009 at 6:58 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> [...]
> I tried those, but I am not familiar with assembly:
>
> (gdb) print $rip
> $2 = (void (*)()) 0x404b5f <start_process+143>
> [...]
> 0x0000000000404b58 <start_process+136>:  add    0x225441(%rip),%rax        # 0x629fa0 <fsa>
> 0x0000000000404b5f <start_process+143>:  mov    0xec(%rax),%edx
> [...]
>
> So all I now know is that it happened with the assembly instruction:
>
>     mov    0xec(%rax),%edx
>
> But what does that tell me? In what part of my code could this be?

Hi Holger,

I don't have the source code, so it is a bit hard to guess. But you can
try to find out which member of your fsa structure is at offset 236
(0xec) and look around the lines in the function where you access
that member.

I am trying to download the AFD source code, which looks like it will
take ages on my slow broadband. Hopefully I can help after that.

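One way to find the member that lives at offset 0xec is to let the compiler print the offsets with offsetof(). The sketch below uses a made-up struct host_entry because the real fsa element type is not shown in this thread: the member names host_status and error_counter are borrowed from start_process(), but their placement at 0xec and 0xf0 and the padding arrays are assumptions for illustration only. With the real AFD header, the same macro would be applied to the actual structure and its members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the real fsa element type. The char arrays
 * replace members whose names and sizes are not shown in this thread. */
struct host_entry {
    char         leading_members[0xec];           /* whatever precedes offset 0xec */
    unsigned int host_status;                     /* assumed to sit at 0xec */
    int          error_counter;                   /* assumed to sit at 0xf0 */
    char         trailing_members[0x8f8 - 0xf4];  /* pads the struct to 0x8f8 bytes */
};

/* Print one "offset  name" line for a member of the given type. */
#define SHOW_OFFSET(type, member) \
    printf("0x%03zx  %s\n", offsetof(type, member), #member)

/* List the offsets of the members of interest. */
static void print_member_offsets(void)
{
    SHOW_OFFSET(struct host_entry, host_status);
    SHOW_OFFSET(struct host_entry, error_counter);
}
```

Calling print_member_offsets() from a small test program built with the same compiler flags as the application lists the offsets; the member printed next to 0x0ec is the one the faulting instruction was reading.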
> Thanks,
> Holger
>
> --------- code of start_process() ----------
> [...]
> */ > connection[pos].pid = 1; > get_new_positions(); > connection[pos].pid = 0; > init_msg_buffer(); > fsa_pos = connection[pos].fsa_pos; > last_pos_lookup = INCORRECT; > } > > (void)strcpy(fsa[fsa_pos].job_status[connection[pos].job_no].unique_name, > qb[qb_pos].msg_name); > if ((fsa[fsa_pos].error_counter == 0) && > (fsa[fsa_pos].auto_toggle == ON) && > (fsa[fsa_pos].original_toggle_pos != NONE) && > (fsa[fsa_pos].max_successful_retries > 0)) > { > if ((fsa[fsa_pos].original_toggle_pos == > fsa[fsa_pos].toggle_pos) && > (fsa[fsa_pos].successful_retries > 0)) > { > fsa[fsa_pos].original_toggle_pos = NONE; > fsa[fsa_pos].successful_retries = 0; > } > else if (fsa[fsa_pos].successful_retries >= > fsa[fsa_pos].max_successful_retries) > { > connection[pos].temp_toggle = ON; > fsa[fsa_pos].successful_retries = 0; > } > else > { > fsa[fsa_pos].successful_retries++; > } > } > > /* Create process to distribute file. */ > if ((connection[pos].pid = make_process(&connection[pos], > qb_pos)) > 0) > { > pid = > fsa[fsa_pos].job_status[connection[pos].job_no].proc_id = > connection[pos].pid; > fsa[fsa_pos].active_transfers += 1; > if ((fsa[fsa_pos].transfer_rate_limit > 0) || > (no_of_trl_groups > 0)) > { > calc_trl_per_process(fsa_pos); > } > ABS_REDUCE(fsa_pos); > qb[qb_pos].connect_pos = pos; > p_afd_status->no_of_transfers++; > } > else > { > > fsa[fsa_pos].job_status[connection[pos].job_no].connect_status = > NOT_WORKING; > > fsa[fsa_pos].job_status[connection[pos].job_no].no_of_files = 0; > > fsa[fsa_pos].job_status[connection[pos].job_no].no_of_files_done = 0; > > fsa[fsa_pos].job_status[connection[pos].job_no].file_size = 0; > > fsa[fsa_pos].job_status[connection[pos].job_no].file_size_done = 0; > > fsa[fsa_pos].job_status[connection[pos].job_no].file_size_in_use = 0; > > fsa[fsa_pos].job_status[connection[pos].job_no].file_size_in_use_done = 0; > > fsa[fsa_pos].job_status[connection[pos].job_no].file_name_in_use[0] = '\0'; > > 
fsa[fsa_pos].job_status[connection[pos].job_no].file_name_in_use[1] = 0; > > fsa[fsa_pos].job_status[connection[pos].job_no].unique_name[0] = '\0'; > connection[pos].hostname[0] = '\0'; > connection[pos].msg_name[0] = '\0'; > connection[pos].host_id = 0; > connection[pos].job_no = -1; > connection[pos].fsa_pos = -1; > connection[pos].fra_pos = -1; > connection[pos].pid = 0; > } > } > } > } > } > } > return(pid); > } > -- Thanks - Manish ================================== [$\*.^ -- I miss being one of them ================================== -- To unsubscribe from this list: send the line "unsubscribe linux-c-programming" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files 2009-10-07 13:54 ` Manish Katiyar @ 2009-10-07 14:21 ` Holger Kiehl 2009-10-07 17:36 ` Manish Katiyar 0 siblings, 1 reply; 18+ messages in thread From: Holger Kiehl @ 2009-10-07 14:21 UTC (permalink / raw) To: Manish Katiyar; +Cc: linux-c-programming Hello Manish On Wed, 7 Oct 2009, Manish Katiyar wrote: > Hi Holger, > > I don't have the source code, so a bit hard to guess. But you can try > to find out which member of your fsa structure is at offset 236 (0xec) > and look around those lines in the function where you are accessing > that member. > > I am trying to download the AFD source code, which looks like it will > take ages on my slow broadband. Hopefully I can help after that. > If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that is the one that caused the error. You can get it from: ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2 You will find the relevant code in src/fd.c. Holger ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files 2009-10-07 14:21 ` Holger Kiehl @ 2009-10-07 17:36 ` Manish Katiyar 2009-10-08 18:47 ` Manish Katiyar 2009-10-09 12:09 ` Holger Kiehl 0 siblings, 2 replies; 18+ messages in thread From: Manish Katiyar @ 2009-10-07 17:36 UTC (permalink / raw) To: Holger Kiehl; +Cc: linux-c-programming On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote: > Hello Manish > > On Wed, 7 Oct 2009, Manish Katiyar wrote: > >> Hi Holger, >> >> I don't have the source code, so a bit hard to guess. But you can try >> to find out which member of your fsa structure is at offset 236 (0xec) >> and look around those lines in the function where you are accessing >> that member. >> >> I am trying to download the AFD source code, which looks like it will >> take ages on my slow broadband. Hopefully I can help after that. >> > If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that > is the one that caused the error. You can get it from: > > ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2 > > You will find the relevant code in src/fd.c. Hi Holger, (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status) (gdb) p $offset $5 = 236 (gdb) p/x 236 $6 = 0xec host_status is at offset 236. In the function start_process I can see that this is used at places by dereferencing below "fsa[fsa_pos].host_status ". At this point my guess would be that you are getting fsa_pos as something illegal ie.. probably you are trying to access beyond the array. Since this is an input to the function, you can just check its value at the start and assert if that is ok and within reasonable range. 
HTH > > Holger > -- Thanks - Manish ================================== [$\*.^ -- I miss being one of them ================================== -- To unsubscribe from this list: send the line "unsubscribe linux-c-programming" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
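The gdb expression above, which takes the address of a member through a null pointer, computes the same number that C's standard offsetof macro gives. A minimal sketch follows, assuming a made-up structure: struct demo_status and its members are hypothetical stand-ins chosen so that the member lands at offset 236, and they are not the real AFD struct filetransfer_status.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>    /* printf */

/*
 * Hypothetical stand-in for illustration only. The real
 * struct filetransfer_status in AFD has a different layout.
 */
struct demo_status
{
   char         filler[236];   /* sized so the next member sits at 236 */
   unsigned int host_status;
   int          error_counter;
};

/* Byte offset of host_status inside struct demo_status. */
size_t host_status_offset(void)
{
   return offsetof(struct demo_status, host_status);
}

/* Print the offset in decimal and hex, like gdb's "p" and "p/x". */
void print_host_status_offset(void)
{
   size_t off = host_status_offset();

   printf("host_status offset = %zu (0x%zx)\n", off, off);
}
```

Matching a displacement seen in the disassembly (such as 0xec) against offsets computed this way is one way to guess which member an instruction touches when no debug information is available.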
* Re: Question about core files 2009-10-07 17:36 ` Manish Katiyar @ 2009-10-08 18:47 ` Manish Katiyar 2009-10-09 12:09 ` Holger Kiehl 1 sibling, 0 replies; 18+ messages in thread From: Manish Katiyar @ 2009-10-08 18:47 UTC (permalink / raw) To: Holger Kiehl; +Cc: linux-c-programming On Wed, Oct 7, 2009 at 11:06 PM, Manish Katiyar <mkatiyar@gmail.com> wrote: > On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote: >> Hello Manish >> >> On Wed, 7 Oct 2009, Manish Katiyar wrote: >> >>> Hi Holger, >>> >>> I don't have the source code, so a bit hard to guess. But you can try >>> to find out which member of your fsa structure is at offset 236 (0xec) >>> and look around those lines in the function where you are accessing >>> that member. >>> >>> I am trying to download the AFD source code, which looks like it will >>> take ages on my slow broadband. Hopefully I can help after that. >>> >> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that >> is the one that caused the error. You can get it from: >> >> ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2 >> >> You will find the relevant code in src/fd.c. Hi Holger, Have you been able to trace the bug ? > > Hi Holger, > > (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status) > (gdb) p $offset > $5 = 236 > (gdb) p/x 236 > $6 = 0xec > > host_status is at offset 236. In the function start_process I can see > that this is used at places by dereferencing below > "fsa[fsa_pos].host_status ". > > At this point my guess would be that you are getting fsa_pos as > something illegal ie.. probably you are trying to access beyond the > array. Since this is an input to the function, you can just check its > value at the start and assert if that is ok and within reasonable > range. 
> > HTH > > >> >> Holger >> > > > > -- > Thanks - > Manish > ================================== > [$\*.^ -- I miss being one of them > ================================== > -- Thanks - Manish ================================== [$\*.^ -- I miss being one of them ================================== -- To unsubscribe from this list: send the line "unsubscribe linux-c-programming" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-07 17:36 ` Manish Katiyar
2009-10-08 18:47 ` Manish Katiyar
@ 2009-10-09 12:09 ` Holger Kiehl
2009-10-09 12:15 ` Manish Katiyar
1 sibling, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-09 12:09 UTC (permalink / raw)
To: Manish Katiyar; +Cc: linux-c-programming

Hello Manish

First, sorry for the late response!

On Wed, 7 Oct 2009, Manish Katiyar wrote:

> On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>> Hello Manish
>>
>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>
>>> Hi Holger,
>>>
>>> I don't have the source code, so a bit hard to guess. But you can try
>>> to find out which member of your fsa structure is at offset 236 (0xec)
>>> and look around those lines in the function where you are accessing
>>> that member.
>>>
>>> I am trying to download the AFD source code, which looks like it will
>>> take ages on my slow broadband. Hopefully I can help after that.
>>>
>> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
>> is the one that caused the error. You can get it from:
>>
>> ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>>
>> You will find the relevant code in src/fd.c.
>
> Hi Holger,
>
> (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
> (gdb) p $offset
> $5 = 236
> (gdb) p/x 236
> $6 = 0xec
>
> host_status is at offset 236. In the function start_process I can see
> that this is used at places by dereferencing below
> "fsa[fsa_pos].host_status ".
>
> At this point my guess would be that you are getting fsa_pos as
> something illegal ie.. probably you are trying to access beyond the
> array. Since this is an input to the function, you can just check its
> value at the start and assert if that is ok and within reasonable
> range.
>
> HTH
>
Many thanks for finding this out! I think I now, with your help, have a
clue where the error could be. Is there a way to find out what value
fsa_pos had at that time? If it was -1 then it is definitely the error
I am thinking of, but if it is something else then I don't know.

Again many thanks for the valuable help!

Regards,
Holger

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-09 12:09 ` Holger Kiehl
@ 2009-10-09 12:15 ` Manish Katiyar
2009-10-09 12:43 ` Holger Kiehl
0 siblings, 1 reply; 18+ messages in thread
From: Manish Katiyar @ 2009-10-09 12:15 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming

On Fri, Oct 9, 2009 at 5:39 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> Hello Manish
>
> First, sorry for the late responce!
>
> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>
>> On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>>>
>>> Hello Manish
>>>
>>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>>
>>>> Hi Holger,
>>>>
>>>> I don't have the source code, so a bit hard to guess. But you can try
>>>> to find out which member of your fsa structure is at offset 236 (0xec)
>>>> and look around those lines in the function where you are accessing
>>>> that member.
>>>>
>>>> I am trying to download the AFD source code, which looks like it will
>>>> take ages on my slow broadband. Hopefully I can help after that.
>>>>
>>> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
>>> is the one that caused the error. You can get it from:
>>>
>>> ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>>>
>>> You will find the relevant code in src/fd.c.
>>
>> Hi Holger,
>>
>> (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
>> (gdb) p $offset
>> $5 = 236
>> (gdb) p/x 236
>> $6 = 0xec
>>
>> host_status is at offset 236. In the function start_process I can see
>> that this is used at places by dereferencing below
>> "fsa[fsa_pos].host_status ".
>>
>> At this point my guess would be that you are getting fsa_pos as
>> something illegal ie.. probably you are trying to access beyond the
>> array. Since this is an input to the function, you can just check its
>> value at the start and assert if that is ok and within reasonable
>> range.
>>
>> HTH
>>
> Many thanks for finding this out! I think I now, with your help, have a
> clue where the error could be. Is there a way to find out what value
> fsa_pos had at that time?

Since it is a runtime variable, we can probably get something by
looking at the output of "info registers". But you can try putting

if (fsa_pos < 0) {
   printf("going to die ... \n");
   return(INCORRECT);
}

at the start of the function itself and try.

> If it was -1 then it is definitely the error
> I am thinking of, but if it is something else then I don't know.
>
> Again many thanks for the valuable help!
>
> Regards,
> Holger

--
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================

^ permalink raw reply [flat|nested] 18+ messages in thread
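The check suggested in this message can be written as a small, self-contained guard. The following is only a sketch under assumptions: index_in_range(), checked_fetch() and DEMO_BAD_INDEX are hypothetical names invented here, not part of AFD, and the table of unsigned ints merely stands in for an array such as fsa[].

```c
#include <stdio.h>   /* fprintf */

/* Hypothetical error code for this sketch; AFD's own constants may differ. */
#define DEMO_BAD_INDEX (-1)

/* Returns 1 when pos is a usable index into an array of count elements. */
int index_in_range(int pos, int count)
{
   return (pos >= 0) && (pos < count);
}

/*
 * Copies table[pos] into *out after validating pos.
 * Returns 0 on success and DEMO_BAD_INDEX when pos is out of range,
 * so the caller never dereferences a bad element.
 */
int checked_fetch(const unsigned int *table, int count, int pos,
                  unsigned int *out)
{
   if (!index_in_range(pos, count))
   {
      fprintf(stderr, "index %d outside 0..%d\n", pos, count - 1);
      return DEMO_BAD_INDEX;
   }
   *out = table[pos];

   return 0;
}
```

Checking the upper bound as well as the negative case covers both directions of "beyond the array". Whether to return an error, log, or abort at that point depends on what the caller can handle.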
* Re: Question about core files
2009-10-09 12:15 ` Manish Katiyar
@ 2009-10-09 12:43 ` Holger Kiehl
2009-10-10 8:35 ` Glynn Clements
0 siblings, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-09 12:43 UTC (permalink / raw)
To: Manish Katiyar; +Cc: linux-c-programming

On Fri, 9 Oct 2009, Manish Katiyar wrote:

> On Fri, Oct 9, 2009 at 5:39 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>> Hello Manish
>>
>> First, sorry for the late responce!
>>
>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>
>>> On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>>>>
>>>> Hello Manish
>>>>
>>>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>>>
>>>>> Hi Holger,
>>>>>
>>>>> I don't have the source code, so a bit hard to guess. But you can try
>>>>> to find out which member of your fsa structure is at offset 236 (0xec)
>>>>> and look around those lines in the function where you are accessing
>>>>> that member.
>>>>>
>>>>> I am trying to download the AFD source code, which looks like it will
>>>>> take ages on my slow broadband. Hopefully I can help after that.
>>>>>
>>>> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
>>>> is the one that caused the error. You can get it from:
>>>>
>>>> ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>>>>
>>>> You will find the relevant code in src/fd.c.
>>>
>>> Hi Holger,
>>>
>>> (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
>>> (gdb) p $offset
>>> $5 = 236
>>> (gdb) p/x 236
>>> $6 = 0xec
>>>
>>> host_status is at offset 236. In the function start_process I can see
>>> that this is used at places by dereferencing below
>>> "fsa[fsa_pos].host_status ".
>>>
>>> At this point my guess would be that you are getting fsa_pos as
>>> something illegal ie.. probably you are trying to access beyond the
>>> array. Since this is an input to the function, you can just check its
>>> value at the start and assert if that is ok and within reasonable
>>> range.
>>>
>>> HTH
>>>
>> Many thanks for finding this out! I think I now, with your help, have a
>> clue where the error could be. Is there a way to find out what value
>> fsa_pos had at that time?
>
> Since it is a runtime variable, probably we can get something by
> looking at the output of "info registers". But you can try putting
>
How can I find which register is fsa_pos?

(gdb) info registers
rax            0x7fb48a2c8718      140413389014808
rbx            0x4acb3bcd          1254833101
rcx            0x0                 0
rdx            0x7fb48a2c9010      140413389017104
rsi            0x68                104
rdi            0x7fb48a3795d8      140413389739480
rbp            0x0                 0x0
rsp            0x7fffe4906840      0x7fffe4906840
r8             0x7fb48a346018      140413389529112
r9             0x0                 0
r10            0x3f                63
r11            0x25c8              9672
r12            0x5d                93
r13            0xbbfe88b9          3154020537
r14            0xfffffffffffff708  -2296
r15            0x1                 1
rip            0x404b5f            0x404b5f <start_process+143>
eflags         0x10207             [ CF PF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fctrl          0x0                 0
fstat          0x0                 0
ftag           0x0                 0
fiseg          0x0                 0
fioff          0x0                 0
foseg          0x0                 0
fooff          0x0                 0
fop            0x0                 0
mxcsr          0x0                 [ ]

> if (fsa_pos <0 ) {
> printf("going to die ... \n");
> return
> }
>
> in the start of the function itself and try.
>
Yes, I have already added that.

Thanks!
Holger

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-09 12:43 ` Holger Kiehl
@ 2009-10-10 8:35 ` Glynn Clements
2009-10-10 9:08 ` Manish Katiyar
2009-10-10 16:56 ` Holger Kiehl
0 siblings, 2 replies; 18+ messages in thread
From: Glynn Clements @ 2009-10-10 8:35 UTC (permalink / raw)
To: Holger Kiehl; +Cc: Manish Katiyar, linux-c-programming

Holger Kiehl wrote:

> How can I find which register is fsa_pos?

fsa_pos is a parameter, and doesn't appear to be changed within the
function, so I would expect "print fsa_pos" to give the correct value.

AFAICT, the following portion of the disassembly:

0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax
0x0000000000404b4e <start_process+126>: imul   $0x8f8,%rax,%r14
0x0000000000404b55 <start_process+133>: mov    %r14,%rax
0x0000000000404b58 <start_process+136>: add    0x225441(%rip),%rax   # 0x629fa0 <fsa>
0x0000000000404b5f <start_process+143>: mov    0xec(%rax),%edx
0x0000000000404b65 <start_process+149>: test   $0x1,%dl

corresponds to the expression

fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA

0x24(%rsp) is fsa_pos, $0x8f8 (2296) is the size of each element of
fsa[], 0x225441(%rip) is fsa, 0xec is the offset of the host_status
field.

So:

movslq 0x24(%rsp),%rax       # %rax = fsa_pos
imul   $0x8f8,%rax,%r14      # %r14 = fsa_pos * sizeof(fsa[i]) = &fsa[fsa_pos] - &fsa[0]
mov    %r14,%rax             # %rax = &fsa[fsa_pos] - &fsa[0]
add    0x225441(%rip),%rax   # %rax = &fsa[fsa_pos]
mov    0xec(%rax),%edx       # %edx = fsa[fsa_pos].host_status

Based upon this, %r14 should contain fsa_pos * 2296, so:

> (gdb) info registers
> r14 0xfffffffffffff708 -2296

Which suggests that fsa_pos is -1.

--
Glynn Clements <glynn@gclements.plus.com>

^ permalink raw reply [flat|nested] 18+ messages in thread
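The last step of this reasoning, going from the raw register value back to the array index, is plain arithmetic and can be checked with a few lines of C. This is a sketch only: index_from_scaled() and is_exact_multiple() are hypothetical helper names, and the conversion from an out-of-range unsigned value to a signed one is implementation-defined in older C standards (it wraps as expected with gcc on x86_64).

```c
#include <stdint.h>   /* uint64_t, int64_t */

/*
 * Recover an array index from a register that holds
 * index * element_size, as %r14 does after the imul above.
 */
int64_t index_from_scaled(uint64_t reg_value, int64_t elem_size)
{
   /* Reinterpret the 64 raw bits as a two's complement signed value. */
   int64_t scaled = (int64_t)reg_value;

   return scaled / elem_size;
}

/* Returns 1 when reg_value is an exact multiple of elem_size. */
int is_exact_multiple(uint64_t reg_value, int64_t elem_size)
{
   return ((int64_t)reg_value % elem_size) == 0;
}
```

With the values from the core file, index_from_scaled(0xfffffffffffff708, 0x8f8) divides -2296 by 2296 and yields -1. A non-zero remainder would hint that the register does not hold a scaled index after all.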
* Re: Question about core files 2009-10-10 8:35 ` Glynn Clements @ 2009-10-10 9:08 ` Manish Katiyar 2009-10-10 16:56 ` Holger Kiehl 1 sibling, 0 replies; 18+ messages in thread From: Manish Katiyar @ 2009-10-10 9:08 UTC (permalink / raw) To: Glynn Clements; +Cc: Holger Kiehl, linux-c-programming On Sat, Oct 10, 2009 at 2:05 PM, Glynn Clements <glynn@gclements.plus.com> wrote: > > Holger Kiehl wrote: > >> How can I find which register is fsa_pos? > > fsa_pos is a parameter, and doesn't appear to be changed within the > function, so I would expect "print fsa_pos" to give the correct value. > > AFAICT, the following portion of the disassembly: > > 0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax > 0x0000000000404b4e <start_process+126>: imul $0x8f8,%rax,%r14 > 0x0000000000404b55 <start_process+133>: mov %r14,%rax > 0x0000000000404b58 <start_process+136>: add 0x225441(%rip),%rax # 0x629fa0 <fsa> > 0x0000000000404b5f <start_process+143>: mov 0xec(%rax),%edx > 0x0000000000404b65 <start_process+149>: test $0x1,%dl > > corresponds to the expression > > fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA > > 0x24(%rsp) is fsa_pos, $0x8f8 (2296) is the size of each element of > fsa[], 0x225441(%rip) is fsa, 0xec is the offset of the host_status > field. > > So: > > movslq 0x24(%rsp),%rax # %rax = fsa_pos > imul $0x8f8,%rax,%r14 # %r14 = fsa_pos * sizeof(fsa[i]) = &fsa[fsa_pos] - &fsa[0] > mov %r14,%rax # %rax = &fsa[fsa_pos] - &fsa[0] > add 0x225441(%rip),%rax # %rax = &fsa[fsa_pos] > mov 0xec(%rax),%edx # %edx = fsa[fsa_pos].host_status > > Based upon this, %r14 should contain fsa_pos * 2296, so: > >> (gdb) info registers >> r14 0xfffffffffffff708 -2296 > > Which suggests that fsa_pos is -1. Excellent Glynn .... thanks :-) . I was having trouble deciphering it . 
> > -- > Glynn Clements <glynn@gclements.plus.com> > -- Thanks - Manish ================================== [$\*.^ -- I miss being one of them ================================== -- To unsubscribe from this list: send the line "unsubscribe linux-c-programming" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files 2009-10-10 8:35 ` Glynn Clements 2009-10-10 9:08 ` Manish Katiyar @ 2009-10-10 16:56 ` Holger Kiehl 1 sibling, 0 replies; 18+ messages in thread From: Holger Kiehl @ 2009-10-10 16:56 UTC (permalink / raw) To: Glynn Clements; +Cc: Manish Katiyar, linux-c-programming On Sat, 10 Oct 2009, Glynn Clements wrote: > > Holger Kiehl wrote: > >> How can I find which register is fsa_pos? > > fsa_pos is a parameter, and doesn't appear to be changed within the > function, so I would expect "print fsa_pos" to give the correct value. > Unfortunately not: (gdb) #4 0x0000000000404b5f in start_process () (gdb) print fsa_pos No symbol "fsa_pos" in current context. > AFAICT, the following portion of the disassembly: > > 0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax > 0x0000000000404b4e <start_process+126>: imul $0x8f8,%rax,%r14 > 0x0000000000404b55 <start_process+133>: mov %r14,%rax > 0x0000000000404b58 <start_process+136>: add 0x225441(%rip),%rax # 0x629fa0 <fsa> > 0x0000000000404b5f <start_process+143>: mov 0xec(%rax),%edx > 0x0000000000404b65 <start_process+149>: test $0x1,%dl > > corresponds to the expression > > fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA > > 0x24(%rsp) is fsa_pos, $0x8f8 (2296) is the size of each element of > fsa[], 0x225441(%rip) is fsa, 0xec is the offset of the host_status > field. > > So: > > movslq 0x24(%rsp),%rax # %rax = fsa_pos > imul $0x8f8,%rax,%r14 # %r14 = fsa_pos * sizeof(fsa[i]) = &fsa[fsa_pos] - &fsa[0] > mov %r14,%rax # %rax = &fsa[fsa_pos] - &fsa[0] > add 0x225441(%rip),%rax # %rax = &fsa[fsa_pos] > mov 0xec(%rax),%edx # %edx = fsa[fsa_pos].host_status > > Based upon this, %r14 should contain fsa_pos * 2296, so: > >> (gdb) info registers >> r14 0xfffffffffffff708 -2296 > > Which suggests that fsa_pos is -1. > Many thanks for this very detailed explanation and confirming that fsa_pos was indeed -1. 
I would never have thought that one could find so much information of a core from an optimized binary without debug information. Thanks to you and Manish Katiyar for this valuable help! Regards, Holger ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-06 14:04 Question about core files Holger Kiehl
2009-10-06 14:41 ` Manish Katiyar
@ 2009-10-07 4:45 ` Glynn Clements
2009-10-07 13:43 ` Holger Kiehl
2009-10-07 4:58 ` vinit dhatrak
2 siblings, 1 reply; 18+ messages in thread
From: Glynn Clements @ 2009-10-07 4:45 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming

Holger Kiehl wrote:

> Most the time I compile my application without the -g option due to
> performance reasons.

The -g switch has absolutely no effect upon performance. It simply
causes an additional section to be added to the resulting binary.
When the program is run normally (i.e. not under gdb), that section
won't be mapped. The only downside to -g is that it increases the size
of the file.

However: debug information isn't necessarily much help if you compile
with optimisation enabled, as the resulting machine code will bear
little resemblance to the original source code. Statements will be
re-ordered, many variables will be eliminated, etc.

> Problem is that when it hits some bug and dumps
> core, this is not very useful because there is hardly any information
> in it. Is there some way to get some useful information out of
> the core file. For example one of my program crashed and with gdb
> I see the following:

[snip]

> At least I know that the bug is in my function start_process. But is
> there some way to find out at what line it happened?

It isn't meaningful to talk about a "line" in the source code if you
compile with optimisation enabled.

However, you can tell gdb to disassemble the machine code for a
particular function, and you can print the values contained in
registers or at specific memory locations. Working out what that
information means in terms of the source code is something which needs
to be done manually.

--
Glynn Clements <glynn@gclements.plus.com>

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files 2009-10-07 4:45 ` Glynn Clements @ 2009-10-07 13:43 ` Holger Kiehl 2009-10-08 0:28 ` Glynn Clements 0 siblings, 1 reply; 18+ messages in thread From: Holger Kiehl @ 2009-10-07 13:43 UTC (permalink / raw) To: Glynn Clements; +Cc: linux-c-programming On Wed, 7 Oct 2009, Glynn Clements wrote: > > Holger Kiehl wrote: > >> Most the time I compile my application without the -g option due to >> performance reasons. > > The -g switch has absolutely no effect upon performance. It simply > causes and additional section to be added to the resulting binary. > When the program is run normally (i.e. not under gdb), that section > won't be mapped. The only downside to -g is that it increases the size > of the file. > But when executing the program will it not read the whole binary which is much larger with debug information and so will take longer (just the first reading of the binary)? Holger ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files 2009-10-07 13:43 ` Holger Kiehl @ 2009-10-08 0:28 ` Glynn Clements 2009-10-09 12:12 ` Holger Kiehl 0 siblings, 1 reply; 18+ messages in thread From: Glynn Clements @ 2009-10-08 0:28 UTC (permalink / raw) To: Holger Kiehl; +Cc: linux-c-programming Holger Kiehl wrote: > >> Most the time I compile my application without the -g option due to > >> performance reasons. > > > > The -g switch has absolutely no effect upon performance. It simply > > causes and additional section to be added to the resulting binary. > > When the program is run normally (i.e. not under gdb), that section > > won't be mapped. The only downside to -g is that it increases the size > > of the file. > > But when executing the program will it not read the whole binary which > is much larger with debug information and so will take longer (just the > first reading of the binary)? No. Binaries aren't "read", they're mapped (with mmap); pages are read into memory on demand. The loader only maps the sections which are actually required, which doesn't include the debug sections. -- Glynn Clements <glynn@gclements.plus.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
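The difference between reading a file and mapping it can be seen in miniature with mmap(). The sketch below maps a whole file but touches only its first four bytes, so the kernel only has to bring in the page that is actually accessed. It is an illustration of on-demand paging in general, not a reproduction of what the program loader does, and starts_with_elf_magic() is a hypothetical helper name.

```c
#include <fcntl.h>      /* open */
#include <string.h>     /* memcmp */
#include <sys/mman.h>   /* mmap, munmap */
#include <sys/stat.h>   /* fstat */
#include <unistd.h>     /* close */

/*
 * Maps the file at path and checks whether it begins with the ELF
 * magic bytes. Returns 1 if it does, 0 if not and -1 on error.
 */
int starts_with_elf_magic(const char *path)
{
   static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
   struct stat st;
   void        *map;
   int         fd,
               result;

   if ((fd = open(path, O_RDONLY)) == -1)
   {
      return -1;
   }
   if ((fstat(fd, &st) == -1) || (st.st_size < (off_t)sizeof(magic)))
   {
      (void)close(fd);
      return -1;
   }

   /* Map the whole file; no file data is copied at this point. */
   map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
   (void)close(fd);   /* The mapping stays valid after close(). */
   if (map == MAP_FAILED)
   {
      return -1;
   }

   /* Only the first page of the mapping is touched here. */
   result = (memcmp(map, magic, sizeof(magic)) == 0);
   (void)munmap(map, (size_t)st.st_size);

   return result;
}
```

On Linux, calling starts_with_elf_magic("/proc/self/exe") inspects the running program's own binary. The cost of the call does not grow with the size of the file, which is the same reason a large debug section that is never touched does not slow a program down.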
* Re: Question about core files 2009-10-08 0:28 ` Glynn Clements @ 2009-10-09 12:12 ` Holger Kiehl 0 siblings, 0 replies; 18+ messages in thread From: Holger Kiehl @ 2009-10-09 12:12 UTC (permalink / raw) To: Glynn Clements; +Cc: linux-c-programming On Thu, 8 Oct 2009, Glynn Clements wrote: > > Holger Kiehl wrote: > >>>> Most the time I compile my application without the -g option due to >>>> performance reasons. >>> >>> The -g switch has absolutely no effect upon performance. It simply >>> causes and additional section to be added to the resulting binary. >>> When the program is run normally (i.e. not under gdb), that section >>> won't be mapped. The only downside to -g is that it increases the size >>> of the file. >> >> But when executing the program will it not read the whole binary which >> is much larger with debug information and so will take longer (just the >> first reading of the binary)? > > No. Binaries aren't "read", they're mapped (with mmap); pages are read > into memory on demand. The loader only maps the sections which are > actually required, which doesn't include the debug sections. > Thanks for the clarification! Holger ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-06 14:04 Question about core files Holger Kiehl
2009-10-06 14:41 ` Manish Katiyar
2009-10-07 4:45 ` Glynn Clements
@ 2009-10-07 4:58 ` vinit dhatrak
2 siblings, 0 replies; 18+ messages in thread
From: vinit dhatrak @ 2009-10-07 4:58 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming

On Tue, Oct 6, 2009 at 7:34 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> Hello
>
> Most of the time I compile my application without the -g option due to
> performance reasons. Problem is that when it hits some bug and dumps

GCC allows you to use the -g option together with the -O flag. Here is
what "man gcc" says:

[snip]
GCC allows you to use -g with -O. The shortcuts taken by optimized
code may occasionally produce surprising results: some variables you
declared may not exist at all; flow of control may briefly move where
you did not expect it; some statements may not be executed because
they compute constant results or their values were already at hand;
some statements may execute in different places because they were
moved out of loops.
[/snip]

-Vinit

> core, this is not very useful because there is hardly any information
> in it. Is there some way to get some useful information out of
> the core file? For example one of my programs crashed and with gdb
> I see the following:
>
> afd@helena:~$ gdb fd core.2515
> GNU gdb Fedora (6.8-24.fc9)
> Copyright (C) 2008 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...
> (no debugging symbols found)
>
> warning: Can't read pathname for load map: Input/output error.
> Reading symbols from /lib64/libc-2.8.so...Reading symbols from /usr/lib/debug/lib64/libc-2.8.so.debug...done.
> done.
> Loaded symbols for /lib64/libc-2.8.so
> Reading symbols from /lib64/ld-2.8.so...Reading symbols from /usr/lib/debug/lib64/ld-2.8.so.debug...done.
> done.
> Loaded symbols for /lib64/ld-2.8.so
> Reading symbols from /lib64/libnss_files-2.8.so...Reading symbols from /usr/lib/debug/lib64/libnss_files-2.8.so.debug...done.
> done.
> Loaded symbols for /lib64/libnss_files-2.8.so
> Core was generated by `fd -w /home/afd'.
> Program terminated with signal 6, Aborted.
> [New process 2515]
> #0 0x000000304cc32215 in raise (sig=<value optimized out>)
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> 64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> (gdb) where
> #0 0x000000304cc32215 in raise (sig=<value optimized out>)
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1 0x000000304cc33d83 in abort () at abort.c:88
> #2 0x000000000040b174 in sig_segv ()
> #3 <signal handler called>
> #4 0x0000000000404b5f in start_process ()
> #5 0x0000000000407b9a in main ()
>
> At least I know that the bug is in my function start_process. But is
> there some way to find out at what line it happened?
>
> Thanks,
> Holger
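To make the quoted man page text a little more concrete, here is a small sketch of code in which an optimizing build may show the effects it lists. The function names scaled_sum and always_seven are invented for this illustration, and what the compiler actually does with them depends on the GCC version and flags, so the comments say what may happen, not what is guaranteed.

```c
#include <stddef.h>

/*
 * Sum of v[i] * (factor * 2). With optimization enabled, the computation
 * of scale does not depend on i, so it may be moved out of the loop; a
 * debugger stepping through the loop may then never appear to execute
 * that line inside it.
 */
long scaled_sum(const int *v, size_t n, int factor)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        int scale = factor * 2;        /* loop-invariant statement */
        total += (long)v[i] * scale;
    }
    return total;
}

/*
 * a and b hold compile-time constants, so an optimized build may fold the
 * addition into a single constant and keep neither variable around.
 */
int always_seven(void)
{
    int a = 3;
    int b = 4;

    return a + b;
}
```

Building such a file with both -g and -O and inspecting it in gdb may show variables reported as <value optimized out>, the same marker that appears for sig in the raise() frames of the backtrace quoted above. The results of the functions are the same with or without optimization; only what the debugger can show changes.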
end of thread (~2009-10-10 16:56 UTC)

Thread overview: 18+ messages
2009-10-06 14:04 Question about core files Holger Kiehl
2009-10-06 14:41 ` Manish Katiyar
2009-10-07 13:28 ` Holger Kiehl
2009-10-07 13:54 ` Manish Katiyar
2009-10-07 14:21 ` Holger Kiehl
2009-10-07 17:36 ` Manish Katiyar
2009-10-08 18:47 ` Manish Katiyar
2009-10-09 12:09 ` Holger Kiehl
2009-10-09 12:15 ` Manish Katiyar
2009-10-09 12:43 ` Holger Kiehl
2009-10-10 8:35 ` Glynn Clements
2009-10-10 9:08 ` Manish Katiyar
2009-10-10 16:56 ` Holger Kiehl
2009-10-07 4:45 ` Glynn Clements
2009-10-07 13:43 ` Holger Kiehl
2009-10-08 0:28 ` Glynn Clements
2009-10-09 12:12 ` Holger Kiehl
2009-10-07 4:58 ` vinit dhatrak