* Question about core files
@ 2009-10-06 14:04 Holger Kiehl
2009-10-06 14:41 ` Manish Katiyar
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Holger Kiehl @ 2009-10-06 14:04 UTC (permalink / raw)
To: linux-c-programming
Hello
Most of the time I compile my application without the -g option for
performance reasons. The problem is that when it hits some bug and dumps
core, this is not very useful because there is hardly any information
in it. Is there some way to get some useful information out of
the core file? For example, one of my programs crashed and with gdb
I see the following:
afd@helena:~$ gdb fd core.2515
GNU gdb Fedora (6.8-24.fc9)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(no debugging symbols found)
warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib64/libc-2.8.so...Reading symbols from /usr/lib/debug/lib64/libc-2.8.so.debug...done.
done.
Loaded symbols for /lib64/libc-2.8.so
Reading symbols from /lib64/ld-2.8.so...Reading symbols from /usr/lib/debug/lib64/ld-2.8.so.debug...done.
done.
Loaded symbols for /lib64/ld-2.8.so
Reading symbols from /lib64/libnss_files-2.8.so...Reading symbols from /usr/lib/debug/lib64/libnss_files-2.8.so.debug...done.
done.
Loaded symbols for /lib64/libnss_files-2.8.so
Core was generated by `fd -w /home/afd'.
Program terminated with signal 6, Aborted.
[New process 2515]
#0 0x000000304cc32215 in raise (sig=<value optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) where
#0 0x000000304cc32215 in raise (sig=<value optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x000000304cc33d83 in abort () at abort.c:88
#2 0x000000000040b174 in sig_segv ()
#3 <signal handler called>
#4 0x0000000000404b5f in start_process ()
#5 0x0000000000407b9a in main ()
At least I know that the bug is in my function start_process. But is
there some way to find out at what line it happened?
Thanks,
Holger
* Re: Question about core files
2009-10-06 14:04 Question about core files Holger Kiehl
@ 2009-10-06 14:41 ` Manish Katiyar
2009-10-07 13:28 ` Holger Kiehl
2009-10-07 4:45 ` Glynn Clements
2009-10-07 4:58 ` vinit dhatrak
2 siblings, 1 reply; 18+ messages in thread
From: Manish Katiyar @ 2009-10-06 14:41 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming
On Tue, Oct 6, 2009 at 7:34 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> Hello
>
> Most of the time I compile my application without the -g option for
> performance reasons. The problem is that when it hits some bug and dumps
> core, this is not very useful because there is hardly any information
> in it. Is there some way to get some useful information out of
> the core file?
Is it possible to post your code? At least the start_process()
function. Given that you got a SIGSEGV, it is probably an invalid
pointer access.
You can also try to print $eip (or $rip, since this is a 64-bit machine)
and look around the assembly. The output of "disas start_process" from
gdb will also help.
> For example, one of my programs crashed and with gdb
> I see the following:
>
> afd@helena:~$ gdb fd core.2515
> GNU gdb Fedora (6.8-24.fc9)
> Copyright (C) 2008 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...
> (no debugging symbols found)
>
> warning: Can't read pathname for load map: Input/output error.
> Reading symbols from /lib64/libc-2.8.so...Reading symbols from
> /usr/lib/debug/lib64/libc-2.8.so.debug...done.
> done.
> Loaded symbols for /lib64/libc-2.8.so
> Reading symbols from /lib64/ld-2.8.so...Reading symbols from
> /usr/lib/debug/lib64/ld-2.8.so.debug...done.
> done.
> Loaded symbols for /lib64/ld-2.8.so
> Reading symbols from /lib64/libnss_files-2.8.so...Reading symbols from
> /usr/lib/debug/lib64/libnss_files-2.8.so.debug...done.
> done.
> Loaded symbols for /lib64/libnss_files-2.8.so
> Core was generated by `fd -w /home/afd'.
> Program terminated with signal 6, Aborted.
> [New process 2515]
> #0 0x000000304cc32215 in raise (sig=<value optimized out>)
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> 64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> (gdb) where
> #0 0x000000304cc32215 in raise (sig=<value optimized out>)
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1 0x000000304cc33d83 in abort () at abort.c:88
> #2 0x000000000040b174 in sig_segv ()
> #3 <signal handler called>
> #4 0x0000000000404b5f in start_process ()
> #5 0x0000000000407b9a in main ()
>
> At least I know that the bug is in my function start_process. But is
> there some way to find out at what line it happened?
>
> Thanks,
> Holger
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-c-programming" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================
* Re: Question about core files
2009-10-06 14:04 Question about core files Holger Kiehl
2009-10-06 14:41 ` Manish Katiyar
@ 2009-10-07 4:45 ` Glynn Clements
2009-10-07 13:43 ` Holger Kiehl
2009-10-07 4:58 ` vinit dhatrak
2 siblings, 1 reply; 18+ messages in thread
From: Glynn Clements @ 2009-10-07 4:45 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming
Holger Kiehl wrote:
> Most of the time I compile my application without the -g option for
> performance reasons.
The -g switch has absolutely no effect upon performance. It simply
causes an additional section to be added to the resulting binary.
When the program is run normally (i.e. not under gdb), that section
won't be mapped. The only downside to -g is that it increases the size
of the file.
However: debug information isn't necessarily much help if you compile
with optimisation enabled, as the resulting machine code will bear
little resemblance to the original source code. Statements will be
re-ordered, many variables will be eliminated, etc.
> The problem is that when it hits some bug and dumps
> core, this is not very useful because there is hardly any information
> in it. Is there some way to get some useful information out of
> the core file? For example, one of my programs crashed and with gdb
> I see the following:
[snip]
> At least I know that the bug is in my function start_process. But is
> there some way to find out at what line it happened?
It isn't meaningful to talk about a "line" in the source code if you
compile with optimisation enabled.
However, you can tell gdb to disassemble the machine code for a
particular function, and you can print the values contained in
registers or at specific memory locations. Working out what that
information means in terms of the source code is something which needs
to be done manually.
--
Glynn Clements <glynn@gclements.plus.com>
* Re: Question about core files
2009-10-06 14:04 Question about core files Holger Kiehl
2009-10-06 14:41 ` Manish Katiyar
2009-10-07 4:45 ` Glynn Clements
@ 2009-10-07 4:58 ` vinit dhatrak
2 siblings, 0 replies; 18+ messages in thread
From: vinit dhatrak @ 2009-10-07 4:58 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming
On Tue, Oct 6, 2009 at 7:34 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> Hello
>
> Most of the time I compile my application without the -g option for
> performance reasons. The problem is that when it hits some bug and dumps
GCC allows you to use the -g option together with the -O flag. Here is what "man gcc" says:
[snip]
GCC allows you to use -g with -O. The shortcuts taken by
optimized code may occasionally produce surprising results: some
variables you declared may not exist at all; flow of control may
briefly move where you did not expect it; some statements may not be
executed because they compute constant results or their values were
already at hand; some statements may execute in different places
because they were moved out of loops.
[/snip]
-Vinit
> core, this is not very useful because there is hardly any information
> in it. Is there some way to get some useful information out of
> the core file? For example, one of my programs crashed and with gdb
> I see the following:
>
> afd@helena:~$ gdb fd core.2515
> GNU gdb Fedora (6.8-24.fc9)
> Copyright (C) 2008 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...
> (no debugging symbols found)
>
> warning: Can't read pathname for load map: Input/output error.
> Reading symbols from /lib64/libc-2.8.so...Reading symbols from
> /usr/lib/debug/lib64/libc-2.8.so.debug...done.
> done.
> Loaded symbols for /lib64/libc-2.8.so
> Reading symbols from /lib64/ld-2.8.so...Reading symbols from
> /usr/lib/debug/lib64/ld-2.8.so.debug...done.
> done.
> Loaded symbols for /lib64/ld-2.8.so
> Reading symbols from /lib64/libnss_files-2.8.so...Reading symbols from
> /usr/lib/debug/lib64/libnss_files-2.8.so.debug...done.
> done.
> Loaded symbols for /lib64/libnss_files-2.8.so
> Core was generated by `fd -w /home/afd'.
> Program terminated with signal 6, Aborted.
> [New process 2515]
> #0 0x000000304cc32215 in raise (sig=<value optimized out>)
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> 64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
> (gdb) where
> #0 0x000000304cc32215 in raise (sig=<value optimized out>)
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1 0x000000304cc33d83 in abort () at abort.c:88
> #2 0x000000000040b174 in sig_segv ()
> #3 <signal handler called>
> #4 0x0000000000404b5f in start_process ()
> #5 0x0000000000407b9a in main ()
>
> At least I know that the bug is in my function start_process. But is
> there some way to find out at what line it happened?
>
> Thanks,
> Holger
* Re: Question about core files
2009-10-06 14:41 ` Manish Katiyar
@ 2009-10-07 13:28 ` Holger Kiehl
2009-10-07 13:54 ` Manish Katiyar
0 siblings, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-07 13:28 UTC (permalink / raw)
To: Manish Katiyar; +Cc: linux-c-programming
On Tue, 6 Oct 2009, Manish Katiyar wrote:
> On Tue, Oct 6, 2009 at 7:34 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>> Hello
>>
>> Most of the time I compile my application without the -g option for
>> performance reasons. The problem is that when it hits some bug and dumps
>> core, this is not very useful because there is hardly any information
>> in it. Is there some way to get some useful information out of
>> the core file?
>
> Is it possible to post your code? At least the start_process()
> function. Given that you got a SIGSEGV, it is probably an invalid
> pointer access.
>
The code is GPL, so that is no problem. However, it is long, so I just
cut out start_process(), which you will find below.
> You can also try to print $eip (or $rip, since this is a 64-bit machine)
> and look around the assembly. The output of "disas start_process" from
> gdb will also help.
>
I tried those, but I am not familiar with assembly:
(gdb) print $eip
$1 = void
(gdb) print $rip
$2 = (void (*)()) 0x404b5f <start_process+143>
(gdb) where
#0 0x000000304cc32215 in raise (sig=<value optimized out>)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x000000304cc33d83 in abort () at abort.c:88
#2 0x000000000040b174 in sig_segv ()
#3 <signal handler called>
#4 0x0000000000404b5f in start_process ()
#5 0x0000000000407b9a in main ()
(gdb) disas start_process
Dump of assembler code for function start_process:
0x0000000000404ad0 <start_process+0>: movslq %esi,%rsi
0x0000000000404ad3 <start_process+3>: mov %rbx,-0x30(%rsp)
0x0000000000404ad8 <start_process+8>: mov %rbp,-0x28(%rsp)
0x0000000000404add <start_process+13>: mov %rsi,%r11
0x0000000000404ae0 <start_process+16>: mov $0x68,%esi
0x0000000000404ae5 <start_process+21>: mov %r12,-0x20(%rsp)
0x0000000000404aea <start_process+26>: imul %rsi,%r11
0x0000000000404aee <start_process+30>: mov %r13,-0x18(%rsp)
0x0000000000404af3 <start_process+35>: mov %r14,-0x10(%rsp)
0x0000000000404af8 <start_process+40>: mov %r15,-0x8(%rsp)
0x0000000000404afd <start_process+45>: sub $0x568,%rsp
0x0000000000404b04 <start_process+52>: mov %rdx,%rbx
0x0000000000404b07 <start_process+55>: mov %edi,0x24(%rsp)
0x0000000000404b0b <start_process+59>: mov %r11,%rdi
0x0000000000404b0e <start_process+62>: add 0x225513(%rip),%rdi # 0x62a028 <qb>
0x0000000000404b15 <start_process+69>: cmpb $0x0,0x31(%rdi)
0x0000000000404b19 <start_process+73>: je 0x404ed8 <start_process+1032>
0x0000000000404b1f <start_process+79>: movslq 0x28(%rdi),%rax
0x0000000000404b23 <start_process+83>: lea 0x0(,%rax,8),%rdx
0x0000000000404b2b <start_process+91>: mov %rax,%r8
0x0000000000404b2e <start_process+94>: shl $0x6,%r8
0x0000000000404b32 <start_process+98>: sub %rdx,%r8
0x0000000000404b35 <start_process+101>: add 0x2259cc(%rip),%r8 # 0x62a508 <mdb>
0x0000000000404b3c <start_process+108>: mov 0x2c(%r8),%r9d
0x0000000000404b40 <start_process+112>: test %r9d,%r9d
0x0000000000404b43 <start_process+115>: jne 0x404d70 <start_process+672>
0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax
0x0000000000404b4e <start_process+126>: imul $0x8f8,%rax,%r14
0x0000000000404b55 <start_process+133>: mov %r14,%rax
0x0000000000404b58 <start_process+136>: add 0x225441(%rip),%rax # 0x629fa0 <fsa>
0x0000000000404b5f <start_process+143>: mov 0xec(%rax),%edx
0x0000000000404b65 <start_process+149>: test $0x1,%dl
0x0000000000404b68 <start_process+152>: jne 0x404d30 <start_process+608>
0x0000000000404b6e <start_process+158>: dec %ecx
0x0000000000404b70 <start_process+160>: je 0x404bd0 <start_process+256>
0x0000000000404b72 <start_process+162>: mov 0xf0(%rax),%ecx
0x0000000000404b78 <start_process+168>: mov $0x2,%esi
0x0000000000404b7d <start_process+173>: test %ecx,%ecx
0x0000000000404b7f <start_process+175>: jne 0x404c88 <start_process+440>
0x0000000000404b85 <start_process+181>: test %dl,%dl
0x0000000000404b87 <start_process+183>: jns 0x404bd0 <start_process+256>
0x0000000000404b89 <start_process+185>: mov 0x104(%rax),%ecx
0x0000000000404b8f <start_process+191>: movslq 0x28(%rdi),%rax
0x0000000000404b93 <start_process+195>: mov $0xffffffff,%esi
0x0000000000404b98 <start_process+200>: mov %r11,(%rsp)
0x0000000000404b9c <start_process+204>: lea 0x0(,%rax,8),%rdx
0x0000000000404ba4 <start_process+212>: shl $0x6,%rax
0x0000000000404ba8 <start_process+216>: sub %rdx,%rax
0x0000000000404bab <start_process+219>: mov 0x225956(%rip),%rdx # 0x62a508 <mdb>
0x0000000000404bb2 <start_process+226>: mov 0x28(%rdx,%rax,1),%edi
0x0000000000404bb6 <start_process+230>: mov %rbx,%rdx
0x0000000000404bb9 <start_process+233>: callq 0x41ab00 <check_error_queue>
0x0000000000404bbe <start_process+238>: test %eax,%eax
0x0000000000404bc0 <start_process+240>: mov %eax,%esi
0x0000000000404bc2 <start_process+242>: mov (%rsp),%r11
0x0000000000404bc6 <start_process+246>: jne 0x404c88 <start_process+440>
0x0000000000404bcc <start_process+252>: nopl 0x0(%rax)
0x0000000000404bd0 <start_process+256>: mov %r14,%rcx
0x0000000000404bd3 <start_process+259>: add 0x2253c6(%rip),%rcx # 0x629fa0 <fsa>
0x0000000000404bda <start_process+266>: cmpb $0x5,0xba(%rcx)
0x0000000000404be1 <start_process+273>: je 0x404f88 <start_process+1208>
0x0000000000404be7 <start_process+279>: mov 0x225462(%rip),%rax # 0x62a050 <p_afd_status>
0x0000000000404bee <start_process+286>: mov 0x225194(%rip),%ecx # 0x629d88 <max_connections>
0x0000000000404bf4 <start_process+292>: cmp %ecx,0x4f4(%rax)
0x0000000000404bfa <start_process+298>: jge 0x404d30 <start_process+608>
0x0000000000404c00 <start_process+304>: mov %r14,%r8
0x0000000000404c03 <start_process+307>: add 0x225396(%rip),%r8 # 0x629fa0 <fsa>
0x0000000000404c0a <start_process+314>: mov 0x174(%r8),%edi
0x0000000000404c11 <start_process+321>: cmp %edi,0x170(%r8)
0x0000000000404c18 <start_process+328>: jge 0x404d30 <start_process+608>
0x0000000000404c1e <start_process+334>: test %ecx,%ecx
0x0000000000404c20 <start_process+336>: jle 0x404c5e <start_process+398>
0x0000000000404c22 <start_process+338>: mov 0x2251ff(%rip),%rsi # 0x62
---Type <return> to continue, or q <return> to quit---q
So all I know now is that it happened at the assembly instruction:
mov 0xec(%rax),%edx
But what does that tell me? Which part of my code could this be?
Thanks,
Holger
--------- code of start_process() ----------
static pid_t
start_process(int fsa_pos, int qb_pos, time_t current_time, int retry)
{
pid_t pid = PENDING;
if ((qb[qb_pos].msg_name[0] != '\0') &&
(mdb[qb[qb_pos].pos].age_limit > 0) &&
((fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA) == 0) &&
(current_time > qb[qb_pos].creation_time) &&
((current_time - qb[qb_pos].creation_time) > mdb[qb[qb_pos].pos].age_limit))
{
char del_dir[MAX_PATH_LENGTH];
if (fsa[fsa_pos].host_status & ERROR_QUEUE_SET)
{
remove_from_error_queue(mdb[qb[qb_pos].pos].job_id, &fsa[fsa_pos],
fsa_pos, fsa_fd);
}
(void)sprintf(del_dir, "%s%s%s/%s",
p_work_dir, AFD_FILE_DIR,
OUTGOING_DIR, qb[qb_pos].msg_name);
extract_cus(qb[qb_pos].msg_name, dl.input_time, dl.split_job_counter,
dl.unique_number);
remove_job_files(del_dir, fsa_pos, mdb[qb[qb_pos].pos].job_id,
FD, AGE_OUTPUT, -1);
ABS_REDUCE(fsa_pos);
pid = REMOVED;
}
else
{
int in_error_queue = NEITHER;
if ((qb[qb_pos].msg_name[0] == '\0') &&
(*(unsigned char *)((char *)fsa - AFD_FEATURE_FLAG_OFFSET_END) & DISABLE_RETRIEVE))
{
ABS_REDUCE(fsa_pos);
return(REMOVED);
}
if (((fsa[fsa_pos].host_status & STOP_TRANSFER_STAT) == 0) &&
((retry == YES) ||
((fsa[fsa_pos].error_counter == 0) &&
(((fsa[fsa_pos].host_status & ERROR_QUEUE_SET) == 0) ||
((fsa[fsa_pos].host_status & ERROR_QUEUE_SET) &&
((in_error_queue = check_error_queue(mdb[qb[qb_pos].pos].job_id,
-1, current_time,
fsa[fsa_pos].retry_interval)) == NO)))) ||
((fsa[fsa_pos].error_counter > 0) &&
(fsa[fsa_pos].host_status & ERROR_QUEUE_SET) &&
((current_time - (fsa[fsa_pos].last_retry_time + fsa[fsa_pos].retry_interval)) >= 0) &&
((in_error_queue == NO) ||
((in_error_queue == NEITHER) &&
(check_error_queue(mdb[qb[qb_pos].pos].job_id, -1, current_time,
fsa[fsa_pos].retry_interval) == NO)))) ||
((fsa[fsa_pos].active_transfers == 0) &&
((current_time - (fsa[fsa_pos].last_retry_time + fsa[fsa_pos].retry_interval)) >= 0))))
{
/*
* First lets try and take an existing process,
* that is waiting for more data to come.
*/
if ((fsa[fsa_pos].original_toggle_pos == NONE) &&
((fsa[fsa_pos].protocol_options & DISABLE_BURSTING) == 0) &&
(fsa[fsa_pos].keep_connected > 0) &&
(fsa[fsa_pos].active_transfers > 0) &&
(fsa[fsa_pos].jobs_queued > 0) &&
((((fsa[fsa_pos].special_flag & KEEP_CON_NO_SEND) == 0) &&
(qb[qb_pos].msg_name[0] != '\0')) ||
(((fsa[fsa_pos].special_flag & KEEP_CON_NO_FETCH) == 0) &&
(qb[qb_pos].msg_name[0] == '\0'))) &&
((qb[qb_pos].special_flag & HELPER_JOB) == 0))
{
int i,
other_job_wait_pos[MAX_NO_PARALLEL_JOBS],
other_qb_pos[MAX_NO_PARALLEL_JOBS],
wait_counter = 0;
for (i = 0; i < fsa[fsa_pos].allowed_transfers; i++)
{
if ((fsa[fsa_pos].job_status[i].proc_id != -1) &&
(fsa[fsa_pos].job_status[i].unique_name[2] == 5))
{
int exec_qb_pos;
qb_pos_pid(fsa[fsa_pos].job_status[i].proc_id, &exec_qb_pos);
if (exec_qb_pos != -1)
{
if ((qb[qb_pos].msg_name[0] != '\0') &&
(qb[exec_qb_pos].msg_name[0] != '\0') &&
(mdb[qb[qb_pos].pos].type == mdb[qb[exec_qb_pos].pos].type) &&
(mdb[qb[qb_pos].pos].port == mdb[qb[exec_qb_pos].pos].port))
{
if (qb[qb_pos].retries > 0)
{
fsa[fsa_pos].job_status[i].file_name_in_use[0] = '\0';
fsa[fsa_pos].job_status[i].file_name_in_use[1] = 1;
(void)sprintf(&fsa[fsa_pos].job_status[i].file_name_in_use[2],
"%u", qb[qb_pos].retries);
}
fsa[fsa_pos].job_status[i].job_id = mdb[qb[qb_pos].pos].job_id;
mdb[qb[qb_pos].pos].last_transfer_time = mdb[qb[exec_qb_pos].pos].last_transfer_time = current_time;
(void)memcpy(fsa[fsa_pos].job_status[i].unique_name,
qb[qb_pos].msg_name, MAX_MSG_NAME_LENGTH);
(void)memcpy(connection[qb[exec_qb_pos].connect_pos].msg_name,
qb[qb_pos].msg_name, MAX_MSG_NAME_LENGTH);
qb[qb_pos].pid = qb[exec_qb_pos].pid;
qb[qb_pos].connect_pos = qb[exec_qb_pos].connect_pos;
qb[qb_pos].special_flag |= BURST_REQUEUE;
connection[qb[exec_qb_pos].connect_pos].job_no = i;
if (qb[exec_qb_pos].pid > 0)
{
if (kill(qb[exec_qb_pos].pid, SIGUSR1) == -1)
{
system_log(DEBUG_SIGN, __FILE__, __LINE__,
"Failed to send SIGUSR1 to %lld : %s",
(pri_pid_t)qb[exec_qb_pos].pid, strerror(errno));
}
p_afd_status->burst2_counter++;
}
else
{
system_log(DEBUG_SIGN, __FILE__, __LINE__,
"Hmmm, pid = %lld!!!", (pri_pid_t)qb[exec_qb_pos].pid);
}
if ((fsa[fsa_pos].transfer_rate_limit > 0) ||
(no_of_trl_groups > 0))
{
calc_trl_per_process(fsa_pos);
}
ABS_REDUCE(fsa_pos);
remove_msg(exec_qb_pos);
return(qb[qb_pos].pid);
}
else
{
other_job_wait_pos[wait_counter] = i;
other_qb_pos[wait_counter] = exec_qb_pos;
wait_counter++;
}
}
else
{
system_log(DEBUG_SIGN, __FILE__, __LINE__,
"Unable to locate qb_pos for %lld [fsa_pos=%d].",
(pri_pid_t)fsa[fsa_pos].job_status[i].proc_id,
fsa_pos);
}
}
}
if ((fsa[fsa_pos].active_transfers == fsa[fsa_pos].allowed_transfers) &&
(wait_counter > 0))
{
for (i = 0; i < wait_counter; i++)
{
if (fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] == 5)
{
if (qb[other_qb_pos[i]].pid > 0)
{
fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] = 6;
if (qb[other_qb_pos[i]].msg_name[0] == '\0')
{
return(PENDING);
}
else
{
if (kill(qb[other_qb_pos[i]].pid, SIGUSR1) == -1)
{
system_log(DEBUG_SIGN, __FILE__, __LINE__,
"Failed to send SIGUSR1 to %lld : %s",
(pri_pid_t)qb[other_qb_pos[i]].pid, strerror(errno));
fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] = 5;
}
else
{
return(PENDING);
}
}
}
else
{
system_log(DEBUG_SIGN, __FILE__, __LINE__,
"Hmmm, pid = %lld!!!", (pri_pid_t)qb[other_qb_pos[i]].pid);
}
}
}
}
}
if ((p_afd_status->no_of_transfers < max_connections) &&
(fsa[fsa_pos].active_transfers < fsa[fsa_pos].allowed_transfers))
{
int pos;
if ((pos = get_free_connection()) == INCORRECT)
{
system_log(ERROR_SIGN, __FILE__, __LINE__,
"Failed to get free connection.");
}
else
{
if ((connection[pos].job_no = get_free_disp_pos(fsa_pos)) != INCORRECT)
{
if (qb[qb_pos].msg_name[0] == '\0')
{
connection[pos].fra_pos = qb[qb_pos].pos;
connection[pos].protocol = fra[qb[qb_pos].pos].protocol;
connection[pos].msg_name[0] = '\0';
(void)memcpy(connection[pos].dir_alias,
fra[qb[qb_pos].pos].dir_alias,
MAX_DIR_ALIAS_LENGTH + 1);
}
else
{
connection[pos].fra_pos = -1;
connection[pos].protocol = mdb[qb[qb_pos].pos].type;
(void)memcpy(connection[pos].msg_name, qb[qb_pos].msg_name,
MAX_MSG_NAME_LENGTH);
connection[pos].dir_alias[0] = '\0';
}
if (qb[qb_pos].special_flag & RESEND_JOB)
{
connection[pos].resend = YES;
}
else
{
connection[pos].resend = NO;
}
connection[pos].temp_toggle = OFF;
(void)memcpy(connection[pos].hostname, fsa[fsa_pos].host_alias,
MAX_HOSTNAME_LENGTH + 1);
connection[pos].host_id = fsa[fsa_pos].host_id;
connection[pos].fsa_pos = fsa_pos;
if (fd_check_fsa() == YES)
{
if (check_fra_fd() == YES)
{
init_fra_data();
}
/*
* We need to set the connection[pos].pid to a
* value higher then 0 so the function get_new_positions()
* also locates the new connection[pos].fsa_pos. Otherwise
* from here on we point to some completely different
* host and this can cause havoc when someone uses
* edit_hc and changes the alias order.
*/
connection[pos].pid = 1;
get_new_positions();
connection[pos].pid = 0;
init_msg_buffer();
fsa_pos = connection[pos].fsa_pos;
last_pos_lookup = INCORRECT;
}
(void)strcpy(fsa[fsa_pos].job_status[connection[pos].job_no].unique_name,
qb[qb_pos].msg_name);
if ((fsa[fsa_pos].error_counter == 0) &&
(fsa[fsa_pos].auto_toggle == ON) &&
(fsa[fsa_pos].original_toggle_pos != NONE) &&
(fsa[fsa_pos].max_successful_retries > 0))
{
if ((fsa[fsa_pos].original_toggle_pos == fsa[fsa_pos].toggle_pos) &&
(fsa[fsa_pos].successful_retries > 0))
{
fsa[fsa_pos].original_toggle_pos = NONE;
fsa[fsa_pos].successful_retries = 0;
}
else if (fsa[fsa_pos].successful_retries >= fsa[fsa_pos].max_successful_retries)
{
connection[pos].temp_toggle = ON;
fsa[fsa_pos].successful_retries = 0;
}
else
{
fsa[fsa_pos].successful_retries++;
}
}
/* Create process to distribute file. */
if ((connection[pos].pid = make_process(&connection[pos],
qb_pos)) > 0)
{
pid = fsa[fsa_pos].job_status[connection[pos].job_no].proc_id = connection[pos].pid;
fsa[fsa_pos].active_transfers += 1;
if ((fsa[fsa_pos].transfer_rate_limit > 0) ||
(no_of_trl_groups > 0))
{
calc_trl_per_process(fsa_pos);
}
ABS_REDUCE(fsa_pos);
qb[qb_pos].connect_pos = pos;
p_afd_status->no_of_transfers++;
}
else
{
fsa[fsa_pos].job_status[connection[pos].job_no].connect_status = NOT_WORKING;
fsa[fsa_pos].job_status[connection[pos].job_no].no_of_files = 0;
fsa[fsa_pos].job_status[connection[pos].job_no].no_of_files_done = 0;
fsa[fsa_pos].job_status[connection[pos].job_no].file_size = 0;
fsa[fsa_pos].job_status[connection[pos].job_no].file_size_done = 0;
fsa[fsa_pos].job_status[connection[pos].job_no].file_size_in_use = 0;
fsa[fsa_pos].job_status[connection[pos].job_no].file_size_in_use_done = 0;
fsa[fsa_pos].job_status[connection[pos].job_no].file_name_in_use[0] = '\0';
fsa[fsa_pos].job_status[connection[pos].job_no].file_name_in_use[1] = 0;
fsa[fsa_pos].job_status[connection[pos].job_no].unique_name[0] = '\0';
connection[pos].hostname[0] = '\0';
connection[pos].msg_name[0] = '\0';
connection[pos].host_id = 0;
connection[pos].job_no = -1;
connection[pos].fsa_pos = -1;
connection[pos].fra_pos = -1;
connection[pos].pid = 0;
}
}
}
}
}
}
return(pid);
}
* Re: Question about core files
2009-10-07 4:45 ` Glynn Clements
@ 2009-10-07 13:43 ` Holger Kiehl
2009-10-08 0:28 ` Glynn Clements
0 siblings, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-07 13:43 UTC (permalink / raw)
To: Glynn Clements; +Cc: linux-c-programming
On Wed, 7 Oct 2009, Glynn Clements wrote:
>
> Holger Kiehl wrote:
>
>> Most of the time I compile my application without the -g option for
>> performance reasons.
>
> The -g switch has absolutely no effect upon performance. It simply
> causes an additional section to be added to the resulting binary.
> When the program is run normally (i.e. not under gdb), that section
> won't be mapped. The only downside to -g is that it increases the size
> of the file.
>
But when the program is executed, won't the whole binary be read, which
is much larger with debug information, and so take longer (at least for
the first read of the binary)?
Holger
* Re: Question about core files
2009-10-07 13:28 ` Holger Kiehl
@ 2009-10-07 13:54 ` Manish Katiyar
2009-10-07 14:21 ` Holger Kiehl
0 siblings, 1 reply; 18+ messages in thread
From: Manish Katiyar @ 2009-10-07 13:54 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming
On Wed, Oct 7, 2009 at 6:58 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> On Tue, 6 Oct 2009, Manish Katiyar wrote:
>
>> On Tue, Oct 6, 2009 at 7:34 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>>>
>>> Hello
>>>
>>> Most of the time I compile my application without the -g option for
>>> performance reasons. The problem is that when it hits some bug and dumps
>>> core, this is not very useful because there is hardly any information
>>> in it. Is there some way to get some useful information out of
>>> the core file?
>>
>> Is it possible to post your code? At least the start_process()
>> function. Given that you got a SIGSEGV, it is probably an invalid
>> pointer access.
>>
> The code is GPL, so that is no problem. However, it is long, so I just
> cut out start_process(), which you will find below.
>
>> You can also try to print $eip (or $rip, since this is a 64-bit machine)
>> and look around the assembly. The output of "disas start_process" from
>> gdb will also help.
>>
> I tried those, but I am not familiar with assembly:
>
> (gdb) print $eip
> $1 = void
> (gdb) print $rip
> $2 = (void (*)()) 0x404b5f <start_process+143>
> (gdb) where
> #0 0x000000304cc32215 in raise (sig=<value optimized out>)
> at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
> #1 0x000000304cc33d83 in abort () at abort.c:88
> #2 0x000000000040b174 in sig_segv ()
> #3 <signal handler called>
> #4 0x0000000000404b5f in start_process ()
> #5 0x0000000000407b9a in main ()
> (gdb) disas start_process
> Dump of assembler code for function start_process:
> 0x0000000000404ad0 <start_process+0>: movslq %esi,%rsi
> 0x0000000000404ad3 <start_process+3>: mov %rbx,-0x30(%rsp)
> 0x0000000000404ad8 <start_process+8>: mov %rbp,-0x28(%rsp)
> 0x0000000000404add <start_process+13>: mov %rsi,%r11
> 0x0000000000404ae0 <start_process+16>: mov $0x68,%esi
> 0x0000000000404ae5 <start_process+21>: mov %r12,-0x20(%rsp)
> 0x0000000000404aea <start_process+26>: imul %rsi,%r11
> 0x0000000000404aee <start_process+30>: mov %r13,-0x18(%rsp)
> 0x0000000000404af3 <start_process+35>: mov %r14,-0x10(%rsp)
> 0x0000000000404af8 <start_process+40>: mov %r15,-0x8(%rsp)
> 0x0000000000404afd <start_process+45>: sub $0x568,%rsp
> 0x0000000000404b04 <start_process+52>: mov %rdx,%rbx
> 0x0000000000404b07 <start_process+55>: mov %edi,0x24(%rsp)
> 0x0000000000404b0b <start_process+59>: mov %r11,%rdi
> 0x0000000000404b0e <start_process+62>: add 0x225513(%rip),%rdi # 0x62a028 <qb>
> 0x0000000000404b15 <start_process+69>: cmpb $0x0,0x31(%rdi)
> 0x0000000000404b19 <start_process+73>: je 0x404ed8 <start_process+1032>
> 0x0000000000404b1f <start_process+79>: movslq 0x28(%rdi),%rax
> 0x0000000000404b23 <start_process+83>: lea 0x0(,%rax,8),%rdx
> 0x0000000000404b2b <start_process+91>: mov %rax,%r8
> 0x0000000000404b2e <start_process+94>: shl $0x6,%r8
> 0x0000000000404b32 <start_process+98>: sub %rdx,%r8
> 0x0000000000404b35 <start_process+101>: add 0x2259cc(%rip),%r8 # 0x62a508 <mdb>
> 0x0000000000404b3c <start_process+108>: mov 0x2c(%r8),%r9d
> 0x0000000000404b40 <start_process+112>: test %r9d,%r9d
> 0x0000000000404b43 <start_process+115>: jne 0x404d70 <start_process+672>
> 0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax
> 0x0000000000404b4e <start_process+126>: imul $0x8f8,%rax,%r14
> 0x0000000000404b55 <start_process+133>: mov %r14,%rax
> 0x0000000000404b58 <start_process+136>: add 0x225441(%rip),%rax # 0x629fa0 <fsa>
> 0x0000000000404b5f <start_process+143>: mov 0xec(%rax),%edx
> 0x0000000000404b65 <start_process+149>: test $0x1,%dl
> 0x0000000000404b68 <start_process+152>: jne 0x404d30 <start_process+608>
> 0x0000000000404b6e <start_process+158>: dec %ecx
> 0x0000000000404b70 <start_process+160>: je 0x404bd0 <start_process+256>
> 0x0000000000404b72 <start_process+162>: mov 0xf0(%rax),%ecx
> 0x0000000000404b78 <start_process+168>: mov $0x2,%esi
> 0x0000000000404b7d <start_process+173>: test %ecx,%ecx
> 0x0000000000404b7f <start_process+175>: jne 0x404c88 <start_process+440>
> 0x0000000000404b85 <start_process+181>: test %dl,%dl
> 0x0000000000404b87 <start_process+183>: jns 0x404bd0 <start_process+256>
> 0x0000000000404b89 <start_process+185>: mov 0x104(%rax),%ecx
> 0x0000000000404b8f <start_process+191>: movslq 0x28(%rdi),%rax
> 0x0000000000404b93 <start_process+195>: mov $0xffffffff,%esi
> 0x0000000000404b98 <start_process+200>: mov %r11,(%rsp)
> 0x0000000000404b9c <start_process+204>: lea 0x0(,%rax,8),%rdx
> 0x0000000000404ba4 <start_process+212>: shl $0x6,%rax
> 0x0000000000404ba8 <start_process+216>: sub %rdx,%rax
> 0x0000000000404bab <start_process+219>: mov 0x225956(%rip),%rdx # 0x62a508 <mdb>
> 0x0000000000404bb2 <start_process+226>: mov 0x28(%rdx,%rax,1),%edi
> 0x0000000000404bb6 <start_process+230>: mov %rbx,%rdx
> 0x0000000000404bb9 <start_process+233>: callq 0x41ab00 <check_error_queue>
> 0x0000000000404bbe <start_process+238>: test %eax,%eax
> 0x0000000000404bc0 <start_process+240>: mov %eax,%esi
> 0x0000000000404bc2 <start_process+242>: mov (%rsp),%r11
> 0x0000000000404bc6 <start_process+246>: jne 0x404c88 <start_process+440>
> 0x0000000000404bcc <start_process+252>: nopl 0x0(%rax)
> 0x0000000000404bd0 <start_process+256>: mov %r14,%rcx
> 0x0000000000404bd3 <start_process+259>: add 0x2253c6(%rip),%rcx # 0x629fa0 <fsa>
> 0x0000000000404bda <start_process+266>: cmpb $0x5,0xba(%rcx)
> 0x0000000000404be1 <start_process+273>: je 0x404f88 <start_process+1208>
> 0x0000000000404be7 <start_process+279>: mov 0x225462(%rip),%rax # 0x62a050 <p_afd_status>
> 0x0000000000404bee <start_process+286>: mov 0x225194(%rip),%ecx # 0x629d88 <max_connections>
> 0x0000000000404bf4 <start_process+292>: cmp %ecx,0x4f4(%rax)
> 0x0000000000404bfa <start_process+298>: jge 0x404d30 <start_process+608>
> 0x0000000000404c00 <start_process+304>: mov %r14,%r8
> 0x0000000000404c03 <start_process+307>: add 0x225396(%rip),%r8 # 0x629fa0 <fsa>
> 0x0000000000404c0a <start_process+314>: mov 0x174(%r8),%edi
> 0x0000000000404c11 <start_process+321>: cmp %edi,0x170(%r8)
> 0x0000000000404c18 <start_process+328>: jge 0x404d30 <start_process+608>
> 0x0000000000404c1e <start_process+334>: test %ecx,%ecx
> 0x0000000000404c20 <start_process+336>: jle 0x404c5e <start_process+398>
> 0x0000000000404c22 <start_process+338>: mov 0x2251ff(%rip),%rsi # 0x62---Type <return> to continue, or q <return> to quit---q
>
> So all I now know is that it happened with the assembly instruction:
>
> mov 0xec(%rax),%edx
>
> But what does it tell me. At what part of my code could this be?
Hi Holger,
I don't have the source code, so a bit hard to guess. But you can try
to find out which member of your fsa structure is at offset 236 (0xec)
and look around those lines in the function where you are accessing
that member.
I am trying to download the AFD source code, which looks like it will
take ages on my slow broadband. Hopefully I can help after that.
>
> Thanks,
> Holger
>
> --------- code of start_process() ----------
> static pid_t
> start_process(int fsa_pos, int qb_pos, time_t current_time, int retry)
> {
> pid_t pid = PENDING;
>
> if ((qb[qb_pos].msg_name[0] != '\0') &&
> (mdb[qb[qb_pos].pos].age_limit > 0) &&
> ((fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA) == 0) &&
> (current_time > qb[qb_pos].creation_time) &&
> ((current_time - qb[qb_pos].creation_time) >
> mdb[qb[qb_pos].pos].age_limit))
> {
> char del_dir[MAX_PATH_LENGTH];
>
> if (fsa[fsa_pos].host_status & ERROR_QUEUE_SET)
> {
> remove_from_error_queue(mdb[qb[qb_pos].pos].job_id, &fsa[fsa_pos],
> fsa_pos, fsa_fd);
> }
> (void)sprintf(del_dir, "%s%s%s/%s",
> p_work_dir, AFD_FILE_DIR,
> OUTGOING_DIR, qb[qb_pos].msg_name);
> extract_cus(qb[qb_pos].msg_name, dl.input_time, dl.split_job_counter,
> dl.unique_number);
> remove_job_files(del_dir, fsa_pos, mdb[qb[qb_pos].pos].job_id,
> FD, AGE_OUTPUT, -1);
> ABS_REDUCE(fsa_pos);
> pid = REMOVED;
> }
> else
> {
> int in_error_queue = NEITHER;
>
> if ((qb[qb_pos].msg_name[0] == '\0') &&
> (*(unsigned char *)((char *)fsa - AFD_FEATURE_FLAG_OFFSET_END) &
> DISABLE_RETRIEVE))
> {
> ABS_REDUCE(fsa_pos);
>
> return(REMOVED);
> }
>
> if (((fsa[fsa_pos].host_status & STOP_TRANSFER_STAT) == 0) &&
> ((retry == YES) ||
> ((fsa[fsa_pos].error_counter == 0) &&
> (((fsa[fsa_pos].host_status & ERROR_QUEUE_SET) == 0) ||
> ((fsa[fsa_pos].host_status & ERROR_QUEUE_SET) &&
> ((in_error_queue =
> check_error_queue(mdb[qb[qb_pos].pos].job_id,
> -1, current_time,
>
> fsa[fsa_pos].retry_interval)) == NO)))) ||
> ((fsa[fsa_pos].error_counter > 0) &&
> (fsa[fsa_pos].host_status & ERROR_QUEUE_SET) &&
> ((current_time - (fsa[fsa_pos].last_retry_time +
> fsa[fsa_pos].retry_interval)) >= 0) &&
> ((in_error_queue == NO) ||
> ((in_error_queue == NEITHER) &&
> (check_error_queue(mdb[qb[qb_pos].pos].job_id, -1,
> current_time,
> fsa[fsa_pos].retry_interval) == NO)))) ||
> ((fsa[fsa_pos].active_transfers == 0) &&
> ((current_time - (fsa[fsa_pos].last_retry_time +
> fsa[fsa_pos].retry_interval)) >= 0))))
> {
> /*
> * First lets try and take an existing process,
> * that is waiting for more data to come.
> */
> if ((fsa[fsa_pos].original_toggle_pos == NONE) &&
> ((fsa[fsa_pos].protocol_options & DISABLE_BURSTING) == 0) &&
> (fsa[fsa_pos].keep_connected > 0) &&
> (fsa[fsa_pos].active_transfers > 0) &&
> (fsa[fsa_pos].jobs_queued > 0) &&
> ((((fsa[fsa_pos].special_flag & KEEP_CON_NO_SEND) == 0) &&
> (qb[qb_pos].msg_name[0] != '\0')) ||
> (((fsa[fsa_pos].special_flag & KEEP_CON_NO_FETCH) == 0) &&
> (qb[qb_pos].msg_name[0] == '\0'))) &&
> ((qb[qb_pos].special_flag & HELPER_JOB) == 0))
> {
> int i,
> other_job_wait_pos[MAX_NO_PARALLEL_JOBS],
> other_qb_pos[MAX_NO_PARALLEL_JOBS],
> wait_counter = 0;
>
> for (i = 0; i < fsa[fsa_pos].allowed_transfers; i++)
> {
> if ((fsa[fsa_pos].job_status[i].proc_id != -1) &&
> (fsa[fsa_pos].job_status[i].unique_name[2] == 5))
> {
> int exec_qb_pos;
>
> qb_pos_pid(fsa[fsa_pos].job_status[i].proc_id,
> &exec_qb_pos);
> if (exec_qb_pos != -1)
> {
> if ((qb[qb_pos].msg_name[0] != '\0') &&
> (qb[exec_qb_pos].msg_name[0] != '\0') &&
> (mdb[qb[qb_pos].pos].type ==
> mdb[qb[exec_qb_pos].pos].type) &&
> (mdb[qb[qb_pos].pos].port ==
> mdb[qb[exec_qb_pos].pos].port))
> {
> if (qb[qb_pos].retries > 0)
> {
> fsa[fsa_pos].job_status[i].file_name_in_use[0] =
> '\0';
> fsa[fsa_pos].job_status[i].file_name_in_use[1] =
> 1;
>
> (void)sprintf(&fsa[fsa_pos].job_status[i].file_name_in_use[2],
> "%u", qb[qb_pos].retries);
> }
> fsa[fsa_pos].job_status[i].job_id =
> mdb[qb[qb_pos].pos].job_id;
> mdb[qb[qb_pos].pos].last_transfer_time =
> mdb[qb[exec_qb_pos].pos].last_transfer_time = current_time;
> (void)memcpy(fsa[fsa_pos].job_status[i].unique_name,
> qb[qb_pos].msg_name,
> MAX_MSG_NAME_LENGTH);
>
> (void)memcpy(connection[qb[exec_qb_pos].connect_pos].msg_name,
> qb[qb_pos].msg_name,
> MAX_MSG_NAME_LENGTH);
> qb[qb_pos].pid = qb[exec_qb_pos].pid;
> qb[qb_pos].connect_pos = qb[exec_qb_pos].connect_pos;
> qb[qb_pos].special_flag |= BURST_REQUEUE;
> connection[qb[exec_qb_pos].connect_pos].job_no = i;
> if (qb[exec_qb_pos].pid > 0)
> {
> if (kill(qb[exec_qb_pos].pid, SIGUSR1) == -1)
> {
> system_log(DEBUG_SIGN, __FILE__, __LINE__,
> "Failed to send SIGUSR1 to %lld : %s",
> (pri_pid_t)qb[exec_qb_pos].pid,
> strerror(errno));
> }
> p_afd_status->burst2_counter++;
> }
> else
> {
> system_log(DEBUG_SIGN, __FILE__, __LINE__,
> "Hmmm, pid = %lld!!!",
> (pri_pid_t)qb[exec_qb_pos].pid);
> }
> if ((fsa[fsa_pos].transfer_rate_limit > 0) ||
> (no_of_trl_groups > 0))
> {
> calc_trl_per_process(fsa_pos);
> }
> ABS_REDUCE(fsa_pos);
> remove_msg(exec_qb_pos);
>
> return(qb[qb_pos].pid);
> }
> else
> {
> other_job_wait_pos[wait_counter] = i;
> other_qb_pos[wait_counter] = exec_qb_pos;
> wait_counter++;
> }
> }
> else
> {
> system_log(DEBUG_SIGN, __FILE__, __LINE__,
> "Unable to locate qb_pos for %lld [fsa_pos=%d].",
> (pri_pid_t)fsa[fsa_pos].job_status[i].proc_id,
> fsa_pos);
> }
> }
> }
> if ((fsa[fsa_pos].active_transfers ==
> fsa[fsa_pos].allowed_transfers) &&
> (wait_counter > 0))
> {
> for (i = 0; i < wait_counter; i++)
> {
> if
> (fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] == 5)
> {
> if (qb[other_qb_pos[i]].pid > 0)
> {
>
> fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] = 6;
> if (qb[other_qb_pos[i]].msg_name[0] == '\0')
> {
> return(PENDING);
> }
> else
> {
> if (kill(qb[other_qb_pos[i]].pid, SIGUSR1) == -1)
> {
> system_log(DEBUG_SIGN, __FILE__, __LINE__,
> "Failed to send SIGUSR1 to %lld : %s",
> (pri_pid_t)qb[other_qb_pos[i]].pid,
> strerror(errno));
>
> fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] = 5;
> }
> else
> {
> return(PENDING);
> }
> }
> }
> else
> {
> system_log(DEBUG_SIGN, __FILE__, __LINE__,
> "Hmmm, pid = %lld!!!",
> (pri_pid_t)qb[other_qb_pos[i]].pid);
> }
> }
> }
> }
> }
>
> if ((p_afd_status->no_of_transfers < max_connections) &&
> (fsa[fsa_pos].active_transfers <
> fsa[fsa_pos].allowed_transfers))
> {
> int pos;
>
> if ((pos = get_free_connection()) == INCORRECT)
> {
> system_log(ERROR_SIGN, __FILE__, __LINE__,
> "Failed to get free connection.");
> }
> else
> {
> if ((connection[pos].job_no = get_free_disp_pos(fsa_pos)) !=
> INCORRECT)
> {
> if (qb[qb_pos].msg_name[0] == '\0')
> {
> connection[pos].fra_pos = qb[qb_pos].pos;
> connection[pos].protocol = fra[qb[qb_pos].pos].protocol;
> connection[pos].msg_name[0] = '\0';
> (void)memcpy(connection[pos].dir_alias,
> fra[qb[qb_pos].pos].dir_alias,
> MAX_DIR_ALIAS_LENGTH + 1);
> }
> else
> {
> connection[pos].fra_pos = -1;
> connection[pos].protocol = mdb[qb[qb_pos].pos].type;
> (void)memcpy(connection[pos].msg_name,
> qb[qb_pos].msg_name,
> MAX_MSG_NAME_LENGTH);
> connection[pos].dir_alias[0] = '\0';
> }
> if (qb[qb_pos].special_flag & RESEND_JOB)
> {
> connection[pos].resend = YES;
> }
> else
> {
> connection[pos].resend = NO;
> }
> connection[pos].temp_toggle = OFF;
> (void)memcpy(connection[pos].hostname,
> fsa[fsa_pos].host_alias,
> MAX_HOSTNAME_LENGTH + 1);
> connection[pos].host_id = fsa[fsa_pos].host_id;
> connection[pos].fsa_pos = fsa_pos;
> if (fd_check_fsa() == YES)
> {
> if (check_fra_fd() == YES)
> {
> init_fra_data();
> }
>
> /*
> * We need to set the connection[pos].pid to a
> * value higher than 0 so the function get_new_positions()
> * also locates the new connection[pos].fsa_pos. Otherwise
> * from here on we point to some completely different
> * host and this can cause havoc when someone uses
> * edit_hc and changes the alias order.
> */
> connection[pos].pid = 1;
> get_new_positions();
> connection[pos].pid = 0;
> init_msg_buffer();
> fsa_pos = connection[pos].fsa_pos;
> last_pos_lookup = INCORRECT;
> }
>
> (void)strcpy(fsa[fsa_pos].job_status[connection[pos].job_no].unique_name,
> qb[qb_pos].msg_name);
> if ((fsa[fsa_pos].error_counter == 0) &&
> (fsa[fsa_pos].auto_toggle == ON) &&
> (fsa[fsa_pos].original_toggle_pos != NONE) &&
> (fsa[fsa_pos].max_successful_retries > 0))
> {
> if ((fsa[fsa_pos].original_toggle_pos ==
> fsa[fsa_pos].toggle_pos) &&
> (fsa[fsa_pos].successful_retries > 0))
> {
> fsa[fsa_pos].original_toggle_pos = NONE;
> fsa[fsa_pos].successful_retries = 0;
> }
> else if (fsa[fsa_pos].successful_retries >=
> fsa[fsa_pos].max_successful_retries)
> {
> connection[pos].temp_toggle = ON;
> fsa[fsa_pos].successful_retries = 0;
> }
> else
> {
> fsa[fsa_pos].successful_retries++;
> }
> }
>
> /* Create process to distribute file. */
> if ((connection[pos].pid = make_process(&connection[pos],
> qb_pos)) > 0)
> {
> pid =
> fsa[fsa_pos].job_status[connection[pos].job_no].proc_id =
> connection[pos].pid;
> fsa[fsa_pos].active_transfers += 1;
> if ((fsa[fsa_pos].transfer_rate_limit > 0) ||
> (no_of_trl_groups > 0))
> {
> calc_trl_per_process(fsa_pos);
> }
> ABS_REDUCE(fsa_pos);
> qb[qb_pos].connect_pos = pos;
> p_afd_status->no_of_transfers++;
> }
> else
> {
>
> fsa[fsa_pos].job_status[connection[pos].job_no].connect_status =
> NOT_WORKING;
>
> fsa[fsa_pos].job_status[connection[pos].job_no].no_of_files = 0;
>
> fsa[fsa_pos].job_status[connection[pos].job_no].no_of_files_done = 0;
>
> fsa[fsa_pos].job_status[connection[pos].job_no].file_size = 0;
>
> fsa[fsa_pos].job_status[connection[pos].job_no].file_size_done = 0;
>
> fsa[fsa_pos].job_status[connection[pos].job_no].file_size_in_use = 0;
>
> fsa[fsa_pos].job_status[connection[pos].job_no].file_size_in_use_done = 0;
>
> fsa[fsa_pos].job_status[connection[pos].job_no].file_name_in_use[0] = '\0';
>
> fsa[fsa_pos].job_status[connection[pos].job_no].file_name_in_use[1] = 0;
>
> fsa[fsa_pos].job_status[connection[pos].job_no].unique_name[0] = '\0';
> connection[pos].hostname[0] = '\0';
> connection[pos].msg_name[0] = '\0';
> connection[pos].host_id = 0;
> connection[pos].job_no = -1;
> connection[pos].fsa_pos = -1;
> connection[pos].fra_pos = -1;
> connection[pos].pid = 0;
> }
> }
> }
> }
> }
> }
> return(pid);
> }
>
--
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================
--
To unsubscribe from this list: send the line "unsubscribe linux-c-programming" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-07 13:54 ` Manish Katiyar
@ 2009-10-07 14:21 ` Holger Kiehl
2009-10-07 17:36 ` Manish Katiyar
0 siblings, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-07 14:21 UTC (permalink / raw)
To: Manish Katiyar; +Cc: linux-c-programming
Hello Manish
On Wed, 7 Oct 2009, Manish Katiyar wrote:
> Hi Holger,
>
> I don't have the source code, so a bit hard to guess. But you can try
> to find out which member of your fsa structure is at offset 236 (0xec)
> and look around those lines in the function where you are accessing
> that member.
>
> I am trying to download the AFD source code, which looks like it will
> take ages on my slow broadband. Hopefully I can help after that.
>
If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
is the one that caused the error. You can get it from:
ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
You will find the relevant code in src/fd.c.
Holger
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-07 14:21 ` Holger Kiehl
@ 2009-10-07 17:36 ` Manish Katiyar
2009-10-08 18:47 ` Manish Katiyar
2009-10-09 12:09 ` Holger Kiehl
0 siblings, 2 replies; 18+ messages in thread
From: Manish Katiyar @ 2009-10-07 17:36 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming
On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> Hello Manish
>
> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>
>> Hi Holger,
>>
>> I don't have the source code, so a bit hard to guess. But you can try
>> to find out which member of your fsa structure is at offset 236 (0xec)
>> and look around those lines in the function where you are accessing
>> that member.
>>
>> I am trying to download the AFD source code, which looks like it will
>> take ages on my slow broadband. Hopefully I can help after that.
>>
> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
> is the one that caused the error. You can get it from:
>
> ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>
> You will find the relevant code in src/fd.c.
Hi Holger,
(gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
(gdb) p $offset
$5 = 236
(gdb) p/x 236
$6 = 0xec
host_status is at offset 236. In the function start_process I can see
that this member is accessed in several places as
"fsa[fsa_pos].host_status".
At this point my guess would be that fsa_pos holds an illegal value,
i.e. you are probably accessing beyond the bounds of the array. Since
this is an input to the function, you can check its value at the start
and assert that it is within a reasonable range.
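
A minimal sketch of both ideas in C, for illustration only. The struct below is a hypothetical stand-in (struct demo_status is not the real struct filetransfer_status from AFD, and the padding member exists only to place host_status at 0xec), and valid_fsa_pos() and its no_of_hosts parameter are made-up names:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for AFD's struct filetransfer_status. The real
 * layout lives in the AFD headers and is NOT reproduced here; the char
 * array only pads the structure so host_status lands at offset 236.
 */
struct demo_status
{
   char         padding[236];
   unsigned int host_status;
   int          error_counter;
};

/* Compile-time equivalent of the gdb expression used above. */
size_t
host_status_offset(void)
{
   return(offsetof(struct demo_status, host_status));
}

/*
 * Returns 1 when pos is a usable index into an array holding
 * no_of_hosts elements, otherwise 0.
 */
int
valid_fsa_pos(int pos, int no_of_hosts)
{
   return((pos >= 0) && (pos < no_of_hosts));
}
```

Something like assert(valid_fsa_pos(fsa_pos, no_of_hosts)) as the first statement of start_process() would then abort with a file and line number instead of a bare segmentation fault, assuming the number of hosts is available at that point.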
HTH
>
> Holger
>
--
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-07 13:43 ` Holger Kiehl
@ 2009-10-08 0:28 ` Glynn Clements
2009-10-09 12:12 ` Holger Kiehl
0 siblings, 1 reply; 18+ messages in thread
From: Glynn Clements @ 2009-10-08 0:28 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming
Holger Kiehl wrote:
> >> Most of the time I compile my application without the -g option due to
> >> performance reasons.
> >
> > The -g switch has absolutely no effect upon performance. It simply
> > causes an additional section to be added to the resulting binary.
> > When the program is run normally (i.e. not under gdb), that section
> > won't be mapped. The only downside to -g is that it increases the size
> > of the file.
>
> But when executing the program will it not read the whole binary which
> is much larger with debug information and so will take longer (just the
> first reading of the binary)?
No. Binaries aren't "read", they're mapped (with mmap); pages are read
into memory on demand. The loader only maps the sections which are
actually required, which doesn't include the debug sections.
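
One way to check this for a given binary is to look at the section flags: sections that occupy memory at run time carry SHF_ALLOC, and the debug sections normally do not. The following is only a sketch under a few assumptions: a 64-bit ELF file in host byte order, section headers present, and section_is_allocated() being a made-up helper name.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 1 if the named section has SHF_ALLOC set, 0 if the section
 * exists without SHF_ALLOC, and -1 if it is not found or the file
 * cannot be parsed as a 64-bit ELF file.
 */
int
section_is_allocated(const char *path, const char *wanted)
{
   FILE       *fp;
   Elf64_Ehdr ehdr;
   Elf64_Shdr *shdr = NULL;
   char       *names = NULL;
   size_t     i, names_size;
   int        result = -1;

   if ((fp = fopen(path, "rb")) == NULL)
   {
      return(-1);
   }
   if ((fread(&ehdr, sizeof(ehdr), 1, fp) != 1) ||
       (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) ||
       (ehdr.e_ident[EI_CLASS] != ELFCLASS64) ||
       (ehdr.e_shentsize != sizeof(Elf64_Shdr)) ||
       (ehdr.e_shnum == 0) || (ehdr.e_shstrndx >= ehdr.e_shnum))
   {
      goto done;
   }

   /* Read the whole section header table. */
   shdr = malloc((size_t)ehdr.e_shnum * sizeof(*shdr));
   if ((shdr == NULL) ||
       (fseek(fp, (long)ehdr.e_shoff, SEEK_SET) != 0) ||
       (fread(shdr, sizeof(*shdr), ehdr.e_shnum, fp) != (size_t)ehdr.e_shnum))
   {
      goto done;
   }

   /* Read the string table that holds the section names. */
   names_size = (size_t)shdr[ehdr.e_shstrndx].sh_size;
   names = malloc(names_size + 1);
   if ((names == NULL) ||
       (fseek(fp, (long)shdr[ehdr.e_shstrndx].sh_offset, SEEK_SET) != 0) ||
       (fread(names, 1, names_size, fp) != names_size))
   {
      goto done;
   }
   names[names_size] = '\0';

   /* Look up the wanted section by name and test its SHF_ALLOC flag. */
   for (i = 0; i < ehdr.e_shnum; i++)
   {
      if ((shdr[i].sh_name < names_size) &&
          (strcmp(names + shdr[i].sh_name, wanted) == 0))
      {
         result = (shdr[i].sh_flags & SHF_ALLOC) ? 1 : 0;
         break;
      }
   }

done:
   free(names);
   free(shdr);
   (void)fclose(fp);
   return(result);
}
```

Called on a binary built with -g, this would be expected to return 1 for ".text" and 0 for ".debug_info". "readelf -S" shows the same information: allocated sections have an "A" in the flags column.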
--
Glynn Clements <glynn@gclements.plus.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-07 17:36 ` Manish Katiyar
@ 2009-10-08 18:47 ` Manish Katiyar
2009-10-09 12:09 ` Holger Kiehl
1 sibling, 0 replies; 18+ messages in thread
From: Manish Katiyar @ 2009-10-08 18:47 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming
On Wed, Oct 7, 2009 at 11:06 PM, Manish Katiyar <mkatiyar@gmail.com> wrote:
> On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>> Hello Manish
>>
>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>
>>> Hi Holger,
>>>
>>> I don't have the source code, so a bit hard to guess. But you can try
>>> to find out which member of your fsa structure is at offset 236 (0xec)
>>> and look around those lines in the function where you are accessing
>>> that member.
>>>
>>> I am trying to download the AFD source code, which looks like it will
>>> take ages on my slow broadband. Hopefully I can help after that.
>>>
>> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
>> is the one that caused the error. You can get it from:
>>
>> ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>>
>> You will find the relevant code in src/fd.c.
Hi Holger,
Have you been able to trace the bug ?
>
> Hi Holger,
>
> (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
> (gdb) p $offset
> $5 = 236
> (gdb) p/x 236
> $6 = 0xec
>
> host_status is at offset 236. In the function start_process I can see
> that this is used at places by dereferencing below
> "fsa[fsa_pos].host_status ".
>
> At this point my guess would be that you are getting fsa_pos as
> something illegal ie.. probably you are trying to access beyond the
> array. Since this is an input to the function, you can just check its
> value at the start and assert if that is ok and within reasonable
> range.
>
> HTH
>
>
>>
>> Holger
>>
>
>
>
> --
> Thanks -
> Manish
> ==================================
> [$\*.^ -- I miss being one of them
> ==================================
>
--
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-07 17:36 ` Manish Katiyar
2009-10-08 18:47 ` Manish Katiyar
@ 2009-10-09 12:09 ` Holger Kiehl
2009-10-09 12:15 ` Manish Katiyar
1 sibling, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-09 12:09 UTC (permalink / raw)
To: Manish Katiyar; +Cc: linux-c-programming
Hello Manish
First, sorry for the late response!
On Wed, 7 Oct 2009, Manish Katiyar wrote:
> On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>> Hello Manish
>>
>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>
>>> Hi Holger,
>>>
>>> I don't have the source code, so a bit hard to guess. But you can try
>>> to find out which member of your fsa structure is at offset 236 (0xec)
>>> and look around those lines in the function where you are accessing
>>> that member.
>>>
>>> I am trying to download the AFD source code, which looks like it will
>>> take ages on my slow broadband. Hopefully I can help after that.
>>>
>> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
>> is the one that caused the error. You can get it from:
>>
>> ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>>
>> You will find the relevant code in src/fd.c.
>
> Hi Holger,
>
> (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
> (gdb) p $offset
> $5 = 236
> (gdb) p/x 236
> $6 = 0xec
>
> host_status is at offset 236. In the function start_process I can see
> that this is used at places by dereferencing below
> "fsa[fsa_pos].host_status ".
>
> At this point my guess would be that you are getting fsa_pos as
> something illegal ie.. probably you are trying to access beyond the
> array. Since this is an input to the function, you can just check its
> value at the start and assert if that is ok and within reasonable
> range.
>
> HTH
>
Many thanks for finding this out! I think I now, with your help, have a
clue where the error could be. Is there a way to find out what value
fsa_pos had at that time? If it was -1 then it is definitely the error
I am thinking of, but if it is something else then I don't know.
Again many thanks for the valuable help!
Regards,
Holger
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-08 0:28 ` Glynn Clements
@ 2009-10-09 12:12 ` Holger Kiehl
0 siblings, 0 replies; 18+ messages in thread
From: Holger Kiehl @ 2009-10-09 12:12 UTC (permalink / raw)
To: Glynn Clements; +Cc: linux-c-programming
On Thu, 8 Oct 2009, Glynn Clements wrote:
>
> Holger Kiehl wrote:
>
>>>> Most of the time I compile my application without the -g option due to
>>>> performance reasons.
>>>
>>> The -g switch has absolutely no effect upon performance. It simply
>>> causes an additional section to be added to the resulting binary.
>>> When the program is run normally (i.e. not under gdb), that section
>>> won't be mapped. The only downside to -g is that it increases the size
>>> of the file.
>>
>> But when executing the program will it not read the whole binary which
>> is much larger with debug information and so will take longer (just the
>> first reading of the binary)?
>
> No. Binaries aren't "read", they're mapped (with mmap); pages are read
> into memory on demand. The loader only maps the sections which are
> actually required, which doesn't include the debug sections.
>
Thanks for the clarification!
Holger
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-09 12:09 ` Holger Kiehl
@ 2009-10-09 12:15 ` Manish Katiyar
2009-10-09 12:43 ` Holger Kiehl
0 siblings, 1 reply; 18+ messages in thread
From: Manish Katiyar @ 2009-10-09 12:15 UTC (permalink / raw)
To: Holger Kiehl; +Cc: linux-c-programming
On Fri, Oct 9, 2009 at 5:39 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> Hello Manish
>
> First, sorry for the late response!
>
> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>
>> On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>>>
>>> Hello Manish
>>>
>>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>>
>>>> Hi Holger,
>>>>
>>>> I don't have the source code, so a bit hard to guess. But you can try
>>>> to find out which member of your fsa structure is at offset 236 (0xec)
>>>> and look around those lines in the function where you are accessing
>>>> that member.
>>>>
>>>> I am trying to download the AFD source code, which looks like it will
>>>> take ages on my slow broadband. Hopefully I can help after that.
>>>>
>>> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
>>> is the one that caused the error. You can get it from:
>>>
>>> ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>>>
>>> You will find the relevant code in src/fd.c.
>>
>> Hi Holger,
>>
>> (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
>> (gdb) p $offset
>> $5 = 236
>> (gdb) p/x 236
>> $6 = 0xec
>>
>> host_status is at offset 236. In the function start_process I can see
>> that this is used at places by dereferencing below
>> "fsa[fsa_pos].host_status ".
>>
>> At this point my guess would be that you are getting fsa_pos as
>> something illegal ie.. probably you are trying to access beyond the
>> array. Since this is an input to the function, you can just check its
>> value at the start and assert if that is ok and within reasonable
>> range.
>>
>> HTH
>>
> Many thanks for finding this out! I think I now, with your help, have a
> clue where the error could be. Is there a way to find out what value
> fsa_pos had at that time?
Since it is a runtime variable, we can probably get something by
looking at the output of "info registers". But you can also try putting
if (fsa_pos < 0) {
printf("going to die ... \n");
return(PENDING); /* start_process() returns pid_t; pick whatever value the caller can handle */
}
at the start of the function itself.
> If it was -1 then it is definitely the error
> I am thinking of, but if it is something else then I don't know.
>
> Again many thanks for the valuable help!
>
> Regards,
> Holger
--
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-09 12:15 ` Manish Katiyar
@ 2009-10-09 12:43 ` Holger Kiehl
2009-10-10 8:35 ` Glynn Clements
0 siblings, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-09 12:43 UTC (permalink / raw)
To: Manish Katiyar; +Cc: linux-c-programming
On Fri, 9 Oct 2009, Manish Katiyar wrote:
> On Fri, Oct 9, 2009 at 5:39 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>> Hello Manish
>>
>> First, sorry for the late response!
>>
>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>
>>> On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>>>>
>>>> Hello Manish
>>>>
>>>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>>>
>>>>> Hi Holger,
>>>>>
>>>>> I don't have the source code, so a bit hard to guess. But you can try
>>>>> to find out which member of your fsa structure is at offset 236 (0xec)
>>>>> and look around those lines in the function where you are accessing
>>>>> that member.
>>>>>
>>>>> I am trying to download the AFD source code, which looks like it will
>>>>> take ages on my slow broadband. Hopefully I can help after that.
>>>>>
>>>> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
>>>> is the one that caused the error. You can get it from:
>>>>
>>>> ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>>>>
>>>> You will find the relevant code in src/fd.c.
>>>
>>> Hi Holger,
>>>
>>> (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
>>> (gdb) p $offset
>>> $5 = 236
>>> (gdb) p/x 236
>>> $6 = 0xec
>>>
>>> host_status is at offset 236. In the function start_process I can see
>>> that this is used at places by dereferencing below
>>> "fsa[fsa_pos].host_status ".
>>>
>>> At this point my guess would be that you are getting fsa_pos as
>>> something illegal ie.. probably you are trying to access beyond the
>>> array. Since this is an input to the function, you can just check its
>>> value at the start and assert if that is ok and within reasonable
>>> range.
>>>
>>> HTH
>>>
>> Many thanks for finding this out! I think I now, with your help, have a
>> clue where the error could be. Is there a way to find out what value
>> fsa_pos had at that time?
>
> Since it is a runtime variable, probably we can get something by
> looking at the output of "info registers". But you can try putting
>
How can I find which register is fsa_pos?
(gdb) info registers
rax 0x7fb48a2c8718 140413389014808
rbx 0x4acb3bcd 1254833101
rcx 0x0 0
rdx 0x7fb48a2c9010 140413389017104
rsi 0x68 104
rdi 0x7fb48a3795d8 140413389739480
rbp 0x0 0x0
rsp 0x7fffe4906840 0x7fffe4906840
r8 0x7fb48a346018 140413389529112
r9 0x0 0
r10 0x3f 63
r11 0x25c8 9672
r12 0x5d 93
r13 0xbbfe88b9 3154020537
r14 0xfffffffffffff708 -2296
r15 0x1 1
rip 0x404b5f 0x404b5f <start_process+143>
eflags 0x10207 [ CF PF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
fctrl 0x0 0
fstat 0x0 0
ftag 0x0 0
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
fooff 0x0 0
fop 0x0 0
mxcsr 0x0 [ ]
> if (fsa_pos <0 ) {
> printf("going to die ... \n");
> return
> }
>
> in the start of the function itself and try.
>
Yes, I have already added that. Thanks!
Holger
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-09 12:43 ` Holger Kiehl
@ 2009-10-10 8:35 ` Glynn Clements
2009-10-10 9:08 ` Manish Katiyar
2009-10-10 16:56 ` Holger Kiehl
0 siblings, 2 replies; 18+ messages in thread
From: Glynn Clements @ 2009-10-10 8:35 UTC (permalink / raw)
To: Holger Kiehl; +Cc: Manish Katiyar, linux-c-programming
Holger Kiehl wrote:
> How can I find which register is fsa_pos?
fsa_pos is a parameter, and doesn't appear to be changed within the
function, so I would expect "print fsa_pos" to give the correct value.
AFAICT, the following portion of the disassembly:
0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax
0x0000000000404b4e <start_process+126>: imul $0x8f8,%rax,%r14
0x0000000000404b55 <start_process+133>: mov %r14,%rax
0x0000000000404b58 <start_process+136>: add 0x225441(%rip),%rax # 0x629fa0 <fsa>
0x0000000000404b5f <start_process+143>: mov 0xec(%rax),%edx
0x0000000000404b65 <start_process+149>: test $0x1,%dl
corresponds to the expression
fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA
0x24(%rsp) is fsa_pos, $0x8f8 (2296) is the size of each element of
fsa[], 0x225441(%rip) is fsa, 0xec is the offset of the host_status
field.
So:
movslq 0x24(%rsp),%rax # %rax = fsa_pos
imul $0x8f8,%rax,%r14 # %r14 = fsa_pos * sizeof(fsa[i]) = &fsa[fsa_pos] - &fsa[0]
mov %r14,%rax # %rax = &fsa[fsa_pos] - &fsa[0]
add 0x225441(%rip),%rax # %rax = &fsa[fsa_pos]
mov 0xec(%rax),%edx # %edx = fsa[fsa_pos].host_status
Based upon this, %r14 should contain fsa_pos * 2296, so:
> (gdb) info registers
> r14 0xfffffffffffff708 -2296
Which suggests that fsa_pos is -1.
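
The arithmetic can be checked with a few lines of C. This is only a sketch: index_from_offset() is a made-up helper, and it assumes the register really holds index * element size, as the disassembly above indicates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Recover an array index from a byte offset found in a register.
 * The register value is reinterpreted as a signed 64-bit number first,
 * so that 0xfffffffffffff708 is treated as -2296.
 */
int64_t
index_from_offset(uint64_t reg_value, int64_t elem_size)
{
   return((int64_t)reg_value / elem_size);
}
```

With the values from the register dump, index_from_offset(0xfffffffffffff708, 0x8f8) is -2296 / 2296 = -1. The same reasoning probably applies to %r11, which the prologue sets to qb_pos * 0x68: 0x25c8 / 0x68 = 9672 / 104 = 93, which matches the 93 in %r12.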
--
Glynn Clements <glynn@gclements.plus.com>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Question about core files
2009-10-10 8:35 ` Glynn Clements
@ 2009-10-10 9:08 ` Manish Katiyar
2009-10-10 16:56 ` Holger Kiehl
1 sibling, 0 replies; 18+ messages in thread
From: Manish Katiyar @ 2009-10-10 9:08 UTC (permalink / raw)
To: Glynn Clements; +Cc: Holger Kiehl, linux-c-programming
On Sat, Oct 10, 2009 at 2:05 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:
>
> Holger Kiehl wrote:
>
>> How can I find which register is fsa_pos?
>
> fsa_pos is a parameter, and doesn't appear to be changed within the
> function, so I would expect "print fsa_pos" to give the correct value.
>
> AFAICT, the following portion of the disassembly:
>
> 0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax
> 0x0000000000404b4e <start_process+126>: imul $0x8f8,%rax,%r14
> 0x0000000000404b55 <start_process+133>: mov %r14,%rax
> 0x0000000000404b58 <start_process+136>: add 0x225441(%rip),%rax # 0x629fa0 <fsa>
> 0x0000000000404b5f <start_process+143>: mov 0xec(%rax),%edx
> 0x0000000000404b65 <start_process+149>: test $0x1,%dl
>
> corresponds to the expression
>
> fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA
>
> 0x24(%rsp) is fsa_pos, $0x8f8 (2296) is the size of each element of
> fsa[], 0x225441(%rip) is fsa, 0xec is the offset of the host_status
> field.
>
> So:
>
> movslq 0x24(%rsp),%rax # %rax = fsa_pos
> imul $0x8f8,%rax,%r14 # %r14 = fsa_pos * sizeof(fsa[i]) = &fsa[fsa_pos] - &fsa[0]
> mov %r14,%rax # %rax = &fsa[fsa_pos] - &fsa[0]
> add 0x225441(%rip),%rax # %rax = &fsa[fsa_pos]
> mov 0xec(%rax),%edx # %edx = fsa[fsa_pos].host_status
>
> Based upon this, %r14 should contain fsa_pos * 2296, so:
>
>> (gdb) info registers
>> r14 0xfffffffffffff708 -2296
>
> Which suggests that fsa_pos is -1.
Excellent, Glynn, thanks :-). I was having trouble deciphering it.
>
> --
> Glynn Clements <glynn@gclements.plus.com>
>
--
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================
* Re: Question about core files
2009-10-10 8:35 ` Glynn Clements
2009-10-10 9:08 ` Manish Katiyar
@ 2009-10-10 16:56 ` Holger Kiehl
1 sibling, 0 replies; 18+ messages in thread
From: Holger Kiehl @ 2009-10-10 16:56 UTC (permalink / raw)
To: Glynn Clements; +Cc: Manish Katiyar, linux-c-programming
On Sat, 10 Oct 2009, Glynn Clements wrote:
>
> Holger Kiehl wrote:
>
>> How can I find which register is fsa_pos?
>
> fsa_pos is a parameter, and doesn't appear to be changed within the
> function, so I would expect "print fsa_pos" to give the correct value.
>
Unfortunately not:
(gdb)
#4 0x0000000000404b5f in start_process ()
(gdb) print fsa_pos
No symbol "fsa_pos" in current context.
> AFAICT, the following portion of the disassembly:
>
> 0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax
> 0x0000000000404b4e <start_process+126>: imul $0x8f8,%rax,%r14
> 0x0000000000404b55 <start_process+133>: mov %r14,%rax
> 0x0000000000404b58 <start_process+136>: add 0x225441(%rip),%rax # 0x629fa0 <fsa>
> 0x0000000000404b5f <start_process+143>: mov 0xec(%rax),%edx
> 0x0000000000404b65 <start_process+149>: test $0x1,%dl
>
> corresponds to the expression
>
> fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA
>
> 0x24(%rsp) is fsa_pos, $0x8f8 (2296) is the size of each element of
> fsa[], 0x225441(%rip) is fsa, 0xec is the offset of the host_status
> field.
>
> So:
>
> movslq 0x24(%rsp),%rax # %rax = fsa_pos
> imul $0x8f8,%rax,%r14 # %r14 = fsa_pos * sizeof(fsa[i]) = &fsa[fsa_pos] - &fsa[0]
> mov %r14,%rax # %rax = &fsa[fsa_pos] - &fsa[0]
> add 0x225441(%rip),%rax # %rax = &fsa[fsa_pos]
> mov 0xec(%rax),%edx # %edx = fsa[fsa_pos].host_status
>
> Based upon this, %r14 should contain fsa_pos * 2296, so:
>
>> (gdb) info registers
>> r14 0xfffffffffffff708 -2296
>
> Which suggests that fsa_pos is -1.
>
Many thanks for this very detailed explanation and for confirming that
fsa_pos was indeed -1. I would never have thought that one could get
so much information out of a core file from an optimized binary without
debug information.
Thanks to you and Manish Katiyar for this valuable help!
Regards,
Holger
end of thread, other threads:[~2009-10-10 16:56 UTC | newest]
Thread overview: 18+ messages
2009-10-06 14:04 Question about core files Holger Kiehl
2009-10-06 14:41 ` Manish Katiyar
2009-10-07 13:28 ` Holger Kiehl
2009-10-07 13:54 ` Manish Katiyar
2009-10-07 14:21 ` Holger Kiehl
2009-10-07 17:36 ` Manish Katiyar
2009-10-08 18:47 ` Manish Katiyar
2009-10-09 12:09 ` Holger Kiehl
2009-10-09 12:15 ` Manish Katiyar
2009-10-09 12:43 ` Holger Kiehl
2009-10-10 8:35 ` Glynn Clements
2009-10-10 9:08 ` Manish Katiyar
2009-10-10 16:56 ` Holger Kiehl
2009-10-07 4:45 ` Glynn Clements
2009-10-07 13:43 ` Holger Kiehl
2009-10-08 0:28 ` Glynn Clements
2009-10-09 12:12 ` Holger Kiehl
2009-10-07 4:58 ` vinit dhatrak