linux-c-programming.vger.kernel.org archive mirror
* Question about core files
@ 2009-10-06 14:04 Holger Kiehl
  2009-10-06 14:41 ` Manish Katiyar
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Holger Kiehl @ 2009-10-06 14:04 UTC
  To: linux-c-programming

Hello

Most of the time I compile my application without the -g option for
performance reasons. The problem is that when it hits a bug and dumps
core, the core file is not very useful because there is hardly any
information in it. Is there some way to get useful information out of
the core file? For example, one of my programs crashed and with gdb
I see the following:

    afd@helena:~$ gdb fd core.2515
    GNU gdb Fedora (6.8-24.fc9)
    Copyright (C) 2008 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu"...
    (no debugging symbols found)

    warning: Can't read pathname for load map: Input/output error.
    Reading symbols from /lib64/libc-2.8.so...Reading symbols from /usr/lib/debug/lib64/libc-2.8.so.debug...done.
    done.
    Loaded symbols for /lib64/libc-2.8.so
    Reading symbols from /lib64/ld-2.8.so...Reading symbols from /usr/lib/debug/lib64/ld-2.8.so.debug...done.
    done.
    Loaded symbols for /lib64/ld-2.8.so
    Reading symbols from /lib64/libnss_files-2.8.so...Reading symbols from /usr/lib/debug/lib64/libnss_files-2.8.so.debug...done.
    done.
    Loaded symbols for /lib64/libnss_files-2.8.so
    Core was generated by `fd -w /home/afd'.
    Program terminated with signal 6, Aborted.
    [New process 2515]
    #0  0x000000304cc32215 in raise (sig=<value optimized out>)
        at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
    (gdb) where
    #0  0x000000304cc32215 in raise (sig=<value optimized out>)
        at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #1  0x000000304cc33d83 in abort () at abort.c:88
    #2  0x000000000040b174 in sig_segv ()
    #3  <signal handler called>
    #4  0x0000000000404b5f in start_process ()
    #5  0x0000000000407b9a in main ()

At least I know that the bug is in my function start_process. But is
there some way to find out at what line it happened?

Thanks,
Holger


* Re: Question about core files
  2009-10-06 14:04 Question about core files Holger Kiehl
@ 2009-10-06 14:41 ` Manish Katiyar
  2009-10-07 13:28   ` Holger Kiehl
  2009-10-07  4:45 ` Glynn Clements
  2009-10-07  4:58 ` vinit dhatrak
  2 siblings, 1 reply; 18+ messages in thread
From: Manish Katiyar @ 2009-10-06 14:41 UTC
  To: Holger Kiehl; +Cc: linux-c-programming

On Tue, Oct 6, 2009 at 7:34 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> Hello
>
> Most of the time I compile my application without the -g option for
> performance reasons. The problem is that when it hits a bug and dumps
> core, the core file is not very useful because there is hardly any
> information in it. Is there some way to get useful information out of
> the core file?

Is it possible to post your code? At least the start_process()
function. Given that you got a SIGSEGV, it is probably an invalid
pointer access.

You can also try printing $eip (or $rip, since this is a 64-bit machine)
and looking at the assembly around it. The output of "disas start_process"
from gdb will also help.
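
The backtrace ends in abort() with signal 6 even though the underlying
fault looks like a segmentation fault. That is the pattern one would
expect when a program installs its own SIGSEGV handler and the handler
calls abort(). A minimal sketch of that arrangement follows. It assumes
a handler shaped like the sig_segv frame in the backtrace (the real
handler is not shown in this thread), and
crash_child_termination_signal() is a made-up name for the demo:

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler, named after the sig_segv frame in the backtrace. */
static void sig_segv(int signo)
{
    (void)signo;
    abort();    /* raises SIGABRT; this is the signal that ends the process */
}

/*
 * Fork a child that installs the handler and then writes through a
 * null pointer. Returns the number of the signal that terminated the
 * child, or -1 if the child could not be started or did not die from
 * a signal.
 */
int crash_child_termination_signal(void)
{
    pid_t pid = fork();
    int status;

    if (pid == -1)
        return -1;

    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };
        volatile int *bad = NULL;

        /* Keep the demo from writing a core file of its own. */
        (void)setrlimit(RLIMIT_CORE, &no_core);
        if (signal(SIGSEGV, sig_segv) == SIG_ERR)
            _exit(1);
        *bad = 1;   /* invalid access: the kernel delivers SIGSEGV */
        _exit(2);   /* not reached if the fault is delivered */
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On Linux the function should return SIGABRT (6), which matches
"Program terminated with signal 6, Aborted" in the gdb output. The
frame just below <signal handler called> is then the one where the
invalid access happened.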





-- 
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================
--
To unsubscribe from this list: send the line "unsubscribe linux-c-programming" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Question about core files
  2009-10-06 14:04 Question about core files Holger Kiehl
  2009-10-06 14:41 ` Manish Katiyar
@ 2009-10-07  4:45 ` Glynn Clements
  2009-10-07 13:43   ` Holger Kiehl
  2009-10-07  4:58 ` vinit dhatrak
  2 siblings, 1 reply; 18+ messages in thread
From: Glynn Clements @ 2009-10-07  4:45 UTC
  To: Holger Kiehl; +Cc: linux-c-programming


Holger Kiehl wrote:

> Most of the time I compile my application without the -g option for
> performance reasons.

The -g switch has absolutely no effect upon performance. It simply
causes additional sections to be added to the resulting binary.
When the program is run normally (i.e. not under gdb), those sections
won't be mapped. The only downside to -g is that it increases the size
of the file.

However: debug information isn't necessarily much help if you compile
with optimisation enabled, as the resulting machine code will bear
little resemblance to the original source code. Statements will be
re-ordered, many variables will be eliminated, etc.

> The problem is that when it hits a bug and dumps
> core, the core file is not very useful because there is hardly any
> information in it. Is there some way to get useful information out of
> the core file? For example, one of my programs crashed and with gdb
> I see the following:

[snip]

> At least I know that the bug is in my function start_process. But is
> there some way to find out at what line it happened?

It isn't meaningful to talk about a "line" in the source code if you
compile with optimisation enabled.

However, you can tell gdb to disassemble the machine code for a
particular function, and you can print the values contained in
registers or at specific memory locations. Working out what that
information means in terms of the source code is something which needs
to be done manually.
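
One way to do that manual mapping is to compare the displacement in a
faulting instruction with the offsets of the members of the structure
being accessed. The sketch below is illustrative only: struct
host_entry and its members are a made-up layout, not the real structure
from any program in this thread, and member_at_displacement() is a
hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical layout, for illustration only. */
struct host_entry {
    char         host_alias[16];
    unsigned int host_status;
    int          error_counter;
    long         last_retry_time;
};

struct member_info {
    const char *name;
    size_t      offset;
    size_t      size;
};

/* Build one table row from a member name using offsetof/sizeof. */
#define MEMBER(type, field) \
    { #field, offsetof(type, field), sizeof(((type *)0)->field) }

static const struct member_info host_entry_members[] = {
    MEMBER(struct host_entry, host_alias),
    MEMBER(struct host_entry, host_status),
    MEMBER(struct host_entry, error_counter),
    MEMBER(struct host_entry, last_retry_time),
};

/*
 * Return the name of the member that contains byte `disp` of the
 * structure, or NULL if no listed member covers it (padding, or a
 * displacement past the end of the structure).
 */
const char *member_at_displacement(size_t disp)
{
    size_t n = sizeof(host_entry_members) / sizeof(host_entry_members[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        const struct member_info *m = &host_entry_members[i];

        if (disp >= m->offset && disp < m->offset + m->size)
            return m->name;
    }
    return NULL;
}
```

For a real program the table would list the members of the actual
structure, and the displacement would come from the memory operand of
the faulting instruction in the gdb disassembly.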

-- 
Glynn Clements <glynn@gclements.plus.com>


* Re: Question about core files
  2009-10-06 14:04 Question about core files Holger Kiehl
  2009-10-06 14:41 ` Manish Katiyar
  2009-10-07  4:45 ` Glynn Clements
@ 2009-10-07  4:58 ` vinit dhatrak
  2 siblings, 0 replies; 18+ messages in thread
From: vinit dhatrak @ 2009-10-07  4:58 UTC
  To: Holger Kiehl; +Cc: linux-c-programming

On Tue, Oct 6, 2009 at 7:34 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> Hello
>
> Most of the time I compile my application without the -g option for
> performance reasons.

GCC allows you to use the -g option together with the -O flag. Here is
what "man gcc" says:

[snip]

GCC allows you to use -g with -O.  The shortcuts taken by optimized
code may occasionally produce surprising results: some variables you
declared may not exist at all; flow of control may briefly move where
you did not expect it; some statements may not be executed because
they compute constant results or their values were already at hand;
some statements may execute in different places because they were
moved out of loops.

[/snip]

-Vinit


* Re: Question about core files
  2009-10-06 14:41 ` Manish Katiyar
@ 2009-10-07 13:28   ` Holger Kiehl
  2009-10-07 13:54     ` Manish Katiyar
  0 siblings, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-07 13:28 UTC
  To: Manish Katiyar; +Cc: linux-c-programming

On Tue, 6 Oct 2009, Manish Katiyar wrote:

> On Tue, Oct 6, 2009 at 7:34 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>> Hello
>>
>> Most of the time I compile my application without the -g option for
>> performance reasons. The problem is that when it hits a bug and dumps
>> core, the core file is not very useful because there is hardly any
>> information in it. Is there some way to get useful information out of
>> the core file?
>
> Is it possible to post your code? At least the start_process()
> function. Given that you got a SIGSEGV, it is probably an invalid
> pointer access.
>
The code is GPL, so that is no problem. However, it is long, so I have
only cut out start_process(), which you will find below.

> You can also try printing $eip (or $rip, since this is a 64-bit machine)
> and looking at the assembly around it. The output of "disas start_process"
> from gdb will also help.
>
I tried those, but I am not familiar with assembly:

    (gdb) print $eip
    $1 = void
    (gdb) print $rip
    $2 = (void (*)()) 0x404b5f <start_process+143>
    (gdb) where
    #0  0x000000304cc32215 in raise (sig=<value optimized out>)
        at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
    #1  0x000000304cc33d83 in abort () at abort.c:88
    #2  0x000000000040b174 in sig_segv ()
    #3  <signal handler called>
    #4  0x0000000000404b5f in start_process ()
    #5  0x0000000000407b9a in main ()
    (gdb) disas start_process
    Dump of assembler code for function start_process:
    0x0000000000404ad0 <start_process+0>:   movslq %esi,%rsi
    0x0000000000404ad3 <start_process+3>:   mov    %rbx,-0x30(%rsp)
    0x0000000000404ad8 <start_process+8>:   mov    %rbp,-0x28(%rsp)
    0x0000000000404add <start_process+13>:  mov    %rsi,%r11
    0x0000000000404ae0 <start_process+16>:  mov    $0x68,%esi
    0x0000000000404ae5 <start_process+21>:  mov    %r12,-0x20(%rsp)
    0x0000000000404aea <start_process+26>:  imul   %rsi,%r11
    0x0000000000404aee <start_process+30>:  mov    %r13,-0x18(%rsp)
    0x0000000000404af3 <start_process+35>:  mov    %r14,-0x10(%rsp)
    0x0000000000404af8 <start_process+40>:  mov    %r15,-0x8(%rsp)
    0x0000000000404afd <start_process+45>:  sub    $0x568,%rsp
    0x0000000000404b04 <start_process+52>:  mov    %rdx,%rbx
    0x0000000000404b07 <start_process+55>:  mov    %edi,0x24(%rsp)
    0x0000000000404b0b <start_process+59>:  mov    %r11,%rdi
    0x0000000000404b0e <start_process+62>:  add    0x225513(%rip),%rdi        # 0x62a028 <qb>
    0x0000000000404b15 <start_process+69>:  cmpb   $0x0,0x31(%rdi)
    0x0000000000404b19 <start_process+73>:  je     0x404ed8 <start_process+1032>
    0x0000000000404b1f <start_process+79>:  movslq 0x28(%rdi),%rax
    0x0000000000404b23 <start_process+83>:  lea    0x0(,%rax,8),%rdx
    0x0000000000404b2b <start_process+91>:  mov    %rax,%r8
    0x0000000000404b2e <start_process+94>:  shl    $0x6,%r8
    0x0000000000404b32 <start_process+98>:  sub    %rdx,%r8
    0x0000000000404b35 <start_process+101>: add    0x2259cc(%rip),%r8        # 0x62a508 <mdb>
    0x0000000000404b3c <start_process+108>: mov    0x2c(%r8),%r9d
    0x0000000000404b40 <start_process+112>: test   %r9d,%r9d
    0x0000000000404b43 <start_process+115>: jne    0x404d70 <start_process+672>
    0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax
    0x0000000000404b4e <start_process+126>: imul   $0x8f8,%rax,%r14
    0x0000000000404b55 <start_process+133>: mov    %r14,%rax
    0x0000000000404b58 <start_process+136>: add    0x225441(%rip),%rax        # 0x629fa0 <fsa>
    0x0000000000404b5f <start_process+143>: mov    0xec(%rax),%edx
    0x0000000000404b65 <start_process+149>: test   $0x1,%dl
    0x0000000000404b68 <start_process+152>: jne    0x404d30 <start_process+608>
    0x0000000000404b6e <start_process+158>: dec    %ecx
    0x0000000000404b70 <start_process+160>: je     0x404bd0 <start_process+256>
    0x0000000000404b72 <start_process+162>: mov    0xf0(%rax),%ecx
    0x0000000000404b78 <start_process+168>: mov    $0x2,%esi
    0x0000000000404b7d <start_process+173>: test   %ecx,%ecx
    0x0000000000404b7f <start_process+175>: jne    0x404c88 <start_process+440>
    0x0000000000404b85 <start_process+181>: test   %dl,%dl
    0x0000000000404b87 <start_process+183>: jns    0x404bd0 <start_process+256>
    0x0000000000404b89 <start_process+185>: mov    0x104(%rax),%ecx
    0x0000000000404b8f <start_process+191>: movslq 0x28(%rdi),%rax
    0x0000000000404b93 <start_process+195>: mov    $0xffffffff,%esi
    0x0000000000404b98 <start_process+200>: mov    %r11,(%rsp)
    0x0000000000404b9c <start_process+204>: lea    0x0(,%rax,8),%rdx
    0x0000000000404ba4 <start_process+212>: shl    $0x6,%rax
    0x0000000000404ba8 <start_process+216>: sub    %rdx,%rax
    0x0000000000404bab <start_process+219>: mov    0x225956(%rip),%rdx        # 0x62a508 <mdb>
    0x0000000000404bb2 <start_process+226>: mov    0x28(%rdx,%rax,1),%edi
    0x0000000000404bb6 <start_process+230>: mov    %rbx,%rdx
    0x0000000000404bb9 <start_process+233>: callq  0x41ab00 <check_error_queue>
    0x0000000000404bbe <start_process+238>: test   %eax,%eax
    0x0000000000404bc0 <start_process+240>: mov    %eax,%esi
    0x0000000000404bc2 <start_process+242>: mov    (%rsp),%r11
    0x0000000000404bc6 <start_process+246>: jne    0x404c88 <start_process+440>
    0x0000000000404bcc <start_process+252>: nopl   0x0(%rax)
    0x0000000000404bd0 <start_process+256>: mov    %r14,%rcx
    0x0000000000404bd3 <start_process+259>: add    0x2253c6(%rip),%rcx        # 0x629fa0 <fsa>
    0x0000000000404bda <start_process+266>: cmpb   $0x5,0xba(%rcx)
    0x0000000000404be1 <start_process+273>: je     0x404f88 <start_process+1208>
    0x0000000000404be7 <start_process+279>: mov    0x225462(%rip),%rax        # 0x62a050 <p_afd_status>
    0x0000000000404bee <start_process+286>: mov    0x225194(%rip),%ecx        # 0x629d88 <max_connections>
    0x0000000000404bf4 <start_process+292>: cmp    %ecx,0x4f4(%rax)
    0x0000000000404bfa <start_process+298>: jge    0x404d30 <start_process+608>
    0x0000000000404c00 <start_process+304>: mov    %r14,%r8
    0x0000000000404c03 <start_process+307>: add    0x225396(%rip),%r8        # 0x629fa0 <fsa>
    0x0000000000404c0a <start_process+314>: mov    0x174(%r8),%edi
    0x0000000000404c11 <start_process+321>: cmp    %edi,0x170(%r8)
    0x0000000000404c18 <start_process+328>: jge    0x404d30 <start_process+608>
    0x0000000000404c1e <start_process+334>: test   %ecx,%ecx
    0x0000000000404c20 <start_process+336>: jle    0x404c5e <start_process+398>
    0x0000000000404c22 <start_process+338>: mov    0x2251ff(%rip),%rsi        # 0x62
    ---Type <return> to continue, or q <return> to quit---q

So all I know now is that it happened at the assembly instruction:

    mov    0xec(%rax),%edx

But what does that tell me? Which part of my code could this be?
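
One possible reading of the disassembly above, offered as a sketch
rather than a diagnosis: the instructions just before the fault load
the int argument kept at 0x24(%rsp), multiply it by 0x8f8 and add the
pointer stored in the global fsa, so %rax looks like the address of
fsa[fsa_pos] with 0x8f8 as the element size. The faulting mov then
reads a 32-bit member at displacement 0xec, and the "test $0x1,%dl"
after it resembles one of the host_status bit tests in the source
below, possibly the STOP_TRANSFER_STAT check if that flag is 1 (its
value is not shown in this thread). If that reading is right, either
fsa_pos is out of range or fsa points at memory that is no longer
mapped. The helper names below are made up; only the two constants come
from the disassembly.

```c
#include <stdint.h>

/* Constants read off the posted disassembly. */
#define ELEMENT_SIZE 0x8f8  /* imul $0x8f8,%rax,%r14 */
#define MEMBER_DISP  0xec   /* mov  0xec(%rax),%edx  */

/*
 * Address the faulting mov would read from, given the pointer stored
 * in the array base variable and the index argument.
 */
uint64_t load_address(uint64_t base, int32_t index)
{
    /* movslq sign-extends the 32-bit index before the multiply. */
    int64_t byte_offset = (int64_t)index * ELEMENT_SIZE;

    return base + (uint64_t)byte_offset + MEMBER_DISP;
}

/*
 * Recover the index from the scaled value index * ELEMENT_SIZE, which
 * the disassembly leaves in %r14 at the point of the fault.
 */
int64_t index_from_scaled(int64_t scaled)
{
    return scaled / ELEMENT_SIZE;
}
```

In gdb, after selecting the start_process frame with "frame 4",
"print $r14 / 0x8f8" should give the same result as
index_from_scaled(), and "x/gx 0x629fa0" shows the pointer stored in
fsa, so both values can be checked against what the program expected
them to be.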

Thanks,
Holger

--------- code of start_process() ----------
static pid_t
start_process(int fsa_pos, int qb_pos, time_t current_time, int retry)
{
    pid_t pid = PENDING;

    if ((qb[qb_pos].msg_name[0] != '\0') &&
        (mdb[qb[qb_pos].pos].age_limit > 0) &&
        ((fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA) == 0) &&
        (current_time > qb[qb_pos].creation_time) &&
        ((current_time - qb[qb_pos].creation_time) > mdb[qb[qb_pos].pos].age_limit))
    {
       char del_dir[MAX_PATH_LENGTH];

       if (fsa[fsa_pos].host_status & ERROR_QUEUE_SET)
       {
          remove_from_error_queue(mdb[qb[qb_pos].pos].job_id, &fsa[fsa_pos],
                                  fsa_pos, fsa_fd);
       }
       (void)sprintf(del_dir, "%s%s%s/%s",
                     p_work_dir, AFD_FILE_DIR,
                     OUTGOING_DIR, qb[qb_pos].msg_name);
       extract_cus(qb[qb_pos].msg_name, dl.input_time, dl.split_job_counter,
                   dl.unique_number);
       remove_job_files(del_dir, fsa_pos, mdb[qb[qb_pos].pos].job_id,
                        FD, AGE_OUTPUT, -1);
       ABS_REDUCE(fsa_pos);
       pid = REMOVED;
    }
    else
    {
       int in_error_queue = NEITHER;

       if ((qb[qb_pos].msg_name[0] == '\0') &&
           (*(unsigned char *)((char *)fsa - AFD_FEATURE_FLAG_OFFSET_END) & DISABLE_RETRIEVE))
       {
          ABS_REDUCE(fsa_pos);

          return(REMOVED);
       }

       if (((fsa[fsa_pos].host_status & STOP_TRANSFER_STAT) == 0) &&
           ((retry == YES) ||
            ((fsa[fsa_pos].error_counter == 0) &&
             (((fsa[fsa_pos].host_status & ERROR_QUEUE_SET) == 0) ||
              ((fsa[fsa_pos].host_status & ERROR_QUEUE_SET) &&
               ((in_error_queue = check_error_queue(mdb[qb[qb_pos].pos].job_id,
                                                    -1, current_time,
                                                    fsa[fsa_pos].retry_interval)) == NO)))) ||
            ((fsa[fsa_pos].error_counter > 0) &&
             (fsa[fsa_pos].host_status & ERROR_QUEUE_SET) &&
             ((current_time - (fsa[fsa_pos].last_retry_time + fsa[fsa_pos].retry_interval)) >= 0) &&
             ((in_error_queue == NO) ||
              ((in_error_queue == NEITHER) &&
               (check_error_queue(mdb[qb[qb_pos].pos].job_id, -1, current_time,
                                  fsa[fsa_pos].retry_interval) == NO)))) ||
            ((fsa[fsa_pos].active_transfers == 0) &&
             ((current_time - (fsa[fsa_pos].last_retry_time + fsa[fsa_pos].retry_interval)) >= 0))))
       {
          /*
           * First let's try to take an existing process,
           * that is waiting for more data to come.
           */
          if ((fsa[fsa_pos].original_toggle_pos == NONE) &&
              ((fsa[fsa_pos].protocol_options & DISABLE_BURSTING) == 0) &&
              (fsa[fsa_pos].keep_connected > 0) &&
              (fsa[fsa_pos].active_transfers > 0) &&
              (fsa[fsa_pos].jobs_queued > 0) &&
              ((((fsa[fsa_pos].special_flag & KEEP_CON_NO_SEND) == 0) &&
                (qb[qb_pos].msg_name[0] != '\0')) ||
               (((fsa[fsa_pos].special_flag & KEEP_CON_NO_FETCH) == 0) &&
                (qb[qb_pos].msg_name[0] == '\0'))) &&
              ((qb[qb_pos].special_flag & HELPER_JOB) == 0))
          {
             int i,
                 other_job_wait_pos[MAX_NO_PARALLEL_JOBS],
                 other_qb_pos[MAX_NO_PARALLEL_JOBS],
                 wait_counter = 0;

             for (i = 0; i < fsa[fsa_pos].allowed_transfers; i++)
             {
                if ((fsa[fsa_pos].job_status[i].proc_id != -1) &&
                    (fsa[fsa_pos].job_status[i].unique_name[2] == 5))
                {
                   int exec_qb_pos;

                   qb_pos_pid(fsa[fsa_pos].job_status[i].proc_id, &exec_qb_pos);
                   if (exec_qb_pos != -1)
                   {
                      if ((qb[qb_pos].msg_name[0] != '\0') &&
                          (qb[exec_qb_pos].msg_name[0] != '\0') &&
                          (mdb[qb[qb_pos].pos].type == mdb[qb[exec_qb_pos].pos].type) &&
                          (mdb[qb[qb_pos].pos].port == mdb[qb[exec_qb_pos].pos].port))
                      {
                         if (qb[qb_pos].retries > 0)
                         {
                            fsa[fsa_pos].job_status[i].file_name_in_use[0] = '\0';
                            fsa[fsa_pos].job_status[i].file_name_in_use[1] = 1;
                            (void)sprintf(&fsa[fsa_pos].job_status[i].file_name_in_use[2],
                                          "%u", qb[qb_pos].retries);
                         }
                         fsa[fsa_pos].job_status[i].job_id = mdb[qb[qb_pos].pos].job_id;
                         mdb[qb[qb_pos].pos].last_transfer_time = mdb[qb[exec_qb_pos].pos].last_transfer_time = current_time;
                         (void)memcpy(fsa[fsa_pos].job_status[i].unique_name,
                                      qb[qb_pos].msg_name, MAX_MSG_NAME_LENGTH);
                         (void)memcpy(connection[qb[exec_qb_pos].connect_pos].msg_name,
                                      qb[qb_pos].msg_name, MAX_MSG_NAME_LENGTH);
                         qb[qb_pos].pid = qb[exec_qb_pos].pid;
                         qb[qb_pos].connect_pos = qb[exec_qb_pos].connect_pos;
                         qb[qb_pos].special_flag |= BURST_REQUEUE;
                         connection[qb[exec_qb_pos].connect_pos].job_no = i;
                         if (qb[exec_qb_pos].pid > 0)
                         {
                            if (kill(qb[exec_qb_pos].pid, SIGUSR1) == -1)
                            {
                               system_log(DEBUG_SIGN, __FILE__, __LINE__,
                                          "Failed to send SIGUSR1 to %lld : %s",
                                          (pri_pid_t)qb[exec_qb_pos].pid, strerror(errno));
                            }
                            p_afd_status->burst2_counter++;
                         }
                         else
                         {
                            system_log(DEBUG_SIGN, __FILE__, __LINE__,
                                       "Hmmm, pid = %lld!!!", (pri_pid_t)qb[exec_qb_pos].pid);
                         }
                         if ((fsa[fsa_pos].transfer_rate_limit > 0) ||
                             (no_of_trl_groups > 0))
                         {
                            calc_trl_per_process(fsa_pos);
                         }
                         ABS_REDUCE(fsa_pos);
                         remove_msg(exec_qb_pos);

                         return(qb[qb_pos].pid);
                      }
                      else
                      {
                         other_job_wait_pos[wait_counter] = i;
                         other_qb_pos[wait_counter] = exec_qb_pos;
                         wait_counter++;
                      }
                   }
                   else
                   {
                      system_log(DEBUG_SIGN, __FILE__, __LINE__,
                                 "Unable to locate qb_pos for %lld [fsa_pos=%d].",
                                 (pri_pid_t)fsa[fsa_pos].job_status[i].proc_id,
                                 fsa_pos);
                   }
                }
             }
             if ((fsa[fsa_pos].active_transfers == fsa[fsa_pos].allowed_transfers) &&
                 (wait_counter > 0))
             {
                for (i = 0; i < wait_counter; i++)
                {
                   if (fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] == 5)
                   {
                      if (qb[other_qb_pos[i]].pid > 0)
                      {
                         fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] = 6;
                         if (qb[other_qb_pos[i]].msg_name[0] == '\0')
                         {
                            return(PENDING);
                         }
                         else
                         {
                            if (kill(qb[other_qb_pos[i]].pid, SIGUSR1) == -1)
                            {
                               system_log(DEBUG_SIGN, __FILE__, __LINE__,
                                          "Failed to send SIGUSR1 to %lld : %s",
                                          (pri_pid_t)qb[other_qb_pos[i]].pid, strerror(errno));
                               fsa[fsa_pos].job_status[other_job_wait_pos[i]].unique_name[2] = 5;
                            }
                            else
                            {
                               return(PENDING);
                            }
                         }
                      }
                      else
                      {
                         system_log(DEBUG_SIGN, __FILE__, __LINE__,
                                    "Hmmm, pid = %lld!!!", (pri_pid_t)qb[other_qb_pos[i]].pid);
                      }
                   }
                }
             }
          }

          if ((p_afd_status->no_of_transfers < max_connections) &&
              (fsa[fsa_pos].active_transfers < fsa[fsa_pos].allowed_transfers))
          {
             int pos;

             if ((pos = get_free_connection()) == INCORRECT)
             {
                system_log(ERROR_SIGN, __FILE__, __LINE__,
                           "Failed to get free connection.");
             }
             else
             {
                if ((connection[pos].job_no = get_free_disp_pos(fsa_pos)) != INCORRECT)
                {
                   if (qb[qb_pos].msg_name[0] == '\0')
                   {
                      connection[pos].fra_pos = qb[qb_pos].pos;
                      connection[pos].protocol = fra[qb[qb_pos].pos].protocol;
                      connection[pos].msg_name[0] = '\0';
                      (void)memcpy(connection[pos].dir_alias,
                                   fra[qb[qb_pos].pos].dir_alias,
                                   MAX_DIR_ALIAS_LENGTH + 1);
                   }
                   else
                   {
                      connection[pos].fra_pos = -1;
                      connection[pos].protocol = mdb[qb[qb_pos].pos].type;
                      (void)memcpy(connection[pos].msg_name, qb[qb_pos].msg_name,
                                   MAX_MSG_NAME_LENGTH);
                      connection[pos].dir_alias[0] = '\0';
                   }
                   if (qb[qb_pos].special_flag & RESEND_JOB)
                   {
                      connection[pos].resend = YES;
                   }
                   else
                   {
                      connection[pos].resend = NO;
                   }
                   connection[pos].temp_toggle = OFF;
                   (void)memcpy(connection[pos].hostname, fsa[fsa_pos].host_alias,
                                MAX_HOSTNAME_LENGTH + 1);
                   connection[pos].host_id = fsa[fsa_pos].host_id;
                   connection[pos].fsa_pos = fsa_pos;
                   if (fd_check_fsa() == YES)
                   {
                      if (check_fra_fd() == YES)
                      {
                         init_fra_data();
                      }

                      /*
                       * We need to set the connection[pos].pid to a
                       * value higher than 0 so the function get_new_positions()
                       * also locates the new connection[pos].fsa_pos. Otherwise
                       * from here on we point to some completely different
                       * host and this can cause havoc when someone uses
                       * edit_hc and changes the alias order.
                       */
                      connection[pos].pid = 1;
                      get_new_positions();
                      connection[pos].pid = 0;
                      init_msg_buffer();
                      fsa_pos = connection[pos].fsa_pos;
                      last_pos_lookup = INCORRECT;
                   }
                   (void)strcpy(fsa[fsa_pos].job_status[connection[pos].job_no].unique_name,
                                qb[qb_pos].msg_name);
                   if ((fsa[fsa_pos].error_counter == 0) &&
                       (fsa[fsa_pos].auto_toggle == ON) &&
                       (fsa[fsa_pos].original_toggle_pos != NONE) &&
                       (fsa[fsa_pos].max_successful_retries > 0))
                   {
                      if ((fsa[fsa_pos].original_toggle_pos == fsa[fsa_pos].toggle_pos) &&
                          (fsa[fsa_pos].successful_retries > 0))
                      {
                         fsa[fsa_pos].original_toggle_pos = NONE;
                         fsa[fsa_pos].successful_retries = 0;
                      }
                      else if (fsa[fsa_pos].successful_retries >= fsa[fsa_pos].max_successful_retries)
                           {
                              connection[pos].temp_toggle = ON;
                              fsa[fsa_pos].successful_retries = 0;
                           }
                           else
                           {
                              fsa[fsa_pos].successful_retries++;
                           }
                   }

                   /* Create process to distribute file. */
                   if ((connection[pos].pid = make_process(&connection[pos],
                                                           qb_pos)) > 0)
                   {
                      pid = fsa[fsa_pos].job_status[connection[pos].job_no].proc_id = connection[pos].pid;
                      fsa[fsa_pos].active_transfers += 1;
                      if ((fsa[fsa_pos].transfer_rate_limit > 0) ||
                          (no_of_trl_groups > 0))
                      {
                         calc_trl_per_process(fsa_pos);
                      }
                      ABS_REDUCE(fsa_pos);
                      qb[qb_pos].connect_pos = pos;
                      p_afd_status->no_of_transfers++;
                   }
                   else
                   {
                      fsa[fsa_pos].job_status[connection[pos].job_no].connect_status = NOT_WORKING;
                      fsa[fsa_pos].job_status[connection[pos].job_no].no_of_files = 0;
                      fsa[fsa_pos].job_status[connection[pos].job_no].no_of_files_done = 0;
                      fsa[fsa_pos].job_status[connection[pos].job_no].file_size = 0;
                      fsa[fsa_pos].job_status[connection[pos].job_no].file_size_done = 0;
                      fsa[fsa_pos].job_status[connection[pos].job_no].file_size_in_use = 0;
                      fsa[fsa_pos].job_status[connection[pos].job_no].file_size_in_use_done = 0;
                      fsa[fsa_pos].job_status[connection[pos].job_no].file_name_in_use[0] = '\0';
                      fsa[fsa_pos].job_status[connection[pos].job_no].file_name_in_use[1] = 0;
                      fsa[fsa_pos].job_status[connection[pos].job_no].unique_name[0] = '\0';
                      connection[pos].hostname[0] = '\0';
                      connection[pos].msg_name[0] = '\0';
                      connection[pos].host_id = 0;
                      connection[pos].job_no = -1;
                      connection[pos].fsa_pos = -1;
                      connection[pos].fra_pos = -1;
                      connection[pos].pid = 0;
                   }
                }
             }
          }
       }
    }
    return(pid);
}


* Re: Question about core files
  2009-10-07  4:45 ` Glynn Clements
@ 2009-10-07 13:43   ` Holger Kiehl
  2009-10-08  0:28     ` Glynn Clements
  0 siblings, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-07 13:43 UTC (permalink / raw)
  To: Glynn Clements; +Cc: linux-c-programming

On Wed, 7 Oct 2009, Glynn Clements wrote:

>
> Holger Kiehl wrote:
>
>> Most of the time I compile my application without the -g option for
>> performance reasons.
>
> The -g switch has absolutely no effect upon performance. It simply
> causes an additional section to be added to the resulting binary.
> When the program is run normally (i.e. not under gdb), that section
> won't be mapped. The only downside to -g is that it increases the size
> of the file.
>
But when the program is executed, will the whole binary not be read? With
debug information it is much larger, so would that not take longer (at
least the first time the binary is read)?

Holger


* Re: Question about core files
  2009-10-07 13:28   ` Holger Kiehl
@ 2009-10-07 13:54     ` Manish Katiyar
  2009-10-07 14:21       ` Holger Kiehl
  0 siblings, 1 reply; 18+ messages in thread
From: Manish Katiyar @ 2009-10-07 13:54 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-c-programming

On Wed, Oct 7, 2009 at 6:58 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> On Tue, 6 Oct 2009, Manish Katiyar wrote:
>
>> On Tue, Oct 6, 2009 at 7:34 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>>>
>>> Hello
>>>
>>> Most of the time I compile my application without the -g option for
>>> performance reasons. The problem is that when it hits some bug and dumps
>>> core, this is not very useful because there is hardly any information
>>> in it. Is there some way to get some useful information out of
>>> the core file?
>>
>> Is it possible to post your code? At least the start_process()
>> function. Given that you got a SIGSEGV, it is probably an invalid
>> pointer access.
>>
> The code is GPL so that is no problem. However it is long so I just
> cut out start_process() which you will find below.
>
>> You can also try to print $eip (or $rip, since this is a 64-bit machine)
>> and look around the assembly. The output of "disas start_process" from
>> gdb will also help.
>>
> I tried those but I am not familiar with assembly:
>
>   (gdb) print $eip
>   $1 = void
>   (gdb) print $rip
>   $2 = (void (*)()) 0x404b5f <start_process+143>
>   (gdb) where
>   #0  0x000000304cc32215 in raise (sig=<value optimized out>)
>       at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>   #1  0x000000304cc33d83 in abort () at abort.c:88
>   #2  0x000000000040b174 in sig_segv ()
>   #3  <signal handler called>
>   #4  0x0000000000404b5f in start_process ()
>   #5  0x0000000000407b9a in main ()
>   (gdb) disas start_process
>   Dump of assembler code for function start_process:
>   0x0000000000404ad0 <start_process+0>:   movslq %esi,%rsi
>   0x0000000000404ad3 <start_process+3>:   mov    %rbx,-0x30(%rsp)
>   0x0000000000404ad8 <start_process+8>:   mov    %rbp,-0x28(%rsp)
>   0x0000000000404add <start_process+13>:  mov    %rsi,%r11
>   0x0000000000404ae0 <start_process+16>:  mov    $0x68,%esi
>   0x0000000000404ae5 <start_process+21>:  mov    %r12,-0x20(%rsp)
>   0x0000000000404aea <start_process+26>:  imul   %rsi,%r11
>   0x0000000000404aee <start_process+30>:  mov    %r13,-0x18(%rsp)
>   0x0000000000404af3 <start_process+35>:  mov    %r14,-0x10(%rsp)
>   0x0000000000404af8 <start_process+40>:  mov    %r15,-0x8(%rsp)
>   0x0000000000404afd <start_process+45>:  sub    $0x568,%rsp
>   0x0000000000404b04 <start_process+52>:  mov    %rdx,%rbx
>   0x0000000000404b07 <start_process+55>:  mov    %edi,0x24(%rsp)
>   0x0000000000404b0b <start_process+59>:  mov    %r11,%rdi
>   0x0000000000404b0e <start_process+62>:  add    0x225513(%rip),%rdi        # 0x62a028 <qb>
>   0x0000000000404b15 <start_process+69>:  cmpb   $0x0,0x31(%rdi)
>   0x0000000000404b19 <start_process+73>:  je     0x404ed8 <start_process+1032>
>   0x0000000000404b1f <start_process+79>:  movslq 0x28(%rdi),%rax
>   0x0000000000404b23 <start_process+83>:  lea    0x0(,%rax,8),%rdx
>   0x0000000000404b2b <start_process+91>:  mov    %rax,%r8
>   0x0000000000404b2e <start_process+94>:  shl    $0x6,%r8
>   0x0000000000404b32 <start_process+98>:  sub    %rdx,%r8
>   0x0000000000404b35 <start_process+101>: add    0x2259cc(%rip),%r8        # 0x62a508 <mdb>
>   0x0000000000404b3c <start_process+108>: mov    0x2c(%r8),%r9d
>   0x0000000000404b40 <start_process+112>: test   %r9d,%r9d
>   0x0000000000404b43 <start_process+115>: jne    0x404d70 <start_process+672>
>   0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax
>   0x0000000000404b4e <start_process+126>: imul   $0x8f8,%rax,%r14
>   0x0000000000404b55 <start_process+133>: mov    %r14,%rax
>   0x0000000000404b58 <start_process+136>: add    0x225441(%rip),%rax        # 0x629fa0 <fsa>
>   0x0000000000404b5f <start_process+143>: mov    0xec(%rax),%edx
>   0x0000000000404b65 <start_process+149>: test   $0x1,%dl
>   0x0000000000404b68 <start_process+152>: jne    0x404d30 <start_process+608>
>   0x0000000000404b6e <start_process+158>: dec    %ecx
>   0x0000000000404b70 <start_process+160>: je     0x404bd0 <start_process+256>
>   0x0000000000404b72 <start_process+162>: mov    0xf0(%rax),%ecx
>   0x0000000000404b78 <start_process+168>: mov    $0x2,%esi
>   0x0000000000404b7d <start_process+173>: test   %ecx,%ecx
>   0x0000000000404b7f <start_process+175>: jne    0x404c88 <start_process+440>
>   0x0000000000404b85 <start_process+181>: test   %dl,%dl
>   0x0000000000404b87 <start_process+183>: jns    0x404bd0 <start_process+256>
>   0x0000000000404b89 <start_process+185>: mov    0x104(%rax),%ecx
>   0x0000000000404b8f <start_process+191>: movslq 0x28(%rdi),%rax
>   0x0000000000404b93 <start_process+195>: mov    $0xffffffff,%esi
>   0x0000000000404b98 <start_process+200>: mov    %r11,(%rsp)
>   0x0000000000404b9c <start_process+204>: lea    0x0(,%rax,8),%rdx
>   0x0000000000404ba4 <start_process+212>: shl    $0x6,%rax
>   0x0000000000404ba8 <start_process+216>: sub    %rdx,%rax
>   0x0000000000404bab <start_process+219>: mov    0x225956(%rip),%rdx        # 0x62a508 <mdb>
>   0x0000000000404bb2 <start_process+226>: mov    0x28(%rdx,%rax,1),%edi
>   0x0000000000404bb6 <start_process+230>: mov    %rbx,%rdx
>   0x0000000000404bb9 <start_process+233>: callq  0x41ab00 <check_error_queue>
>   0x0000000000404bbe <start_process+238>: test   %eax,%eax
>   0x0000000000404bc0 <start_process+240>: mov    %eax,%esi
>   0x0000000000404bc2 <start_process+242>: mov    (%rsp),%r11
>   0x0000000000404bc6 <start_process+246>: jne    0x404c88 <start_process+440>
>   0x0000000000404bcc <start_process+252>: nopl   0x0(%rax)
>   0x0000000000404bd0 <start_process+256>: mov    %r14,%rcx
>   0x0000000000404bd3 <start_process+259>: add    0x2253c6(%rip),%rcx        # 0x629fa0 <fsa>
>   0x0000000000404bda <start_process+266>: cmpb   $0x5,0xba(%rcx)
>   0x0000000000404be1 <start_process+273>: je     0x404f88 <start_process+1208>
>   0x0000000000404be7 <start_process+279>: mov    0x225462(%rip),%rax        # 0x62a050 <p_afd_status>
>   0x0000000000404bee <start_process+286>: mov    0x225194(%rip),%ecx        # 0x629d88 <max_connections>
>   0x0000000000404bf4 <start_process+292>: cmp    %ecx,0x4f4(%rax)
>   0x0000000000404bfa <start_process+298>: jge    0x404d30 <start_process+608>
>   0x0000000000404c00 <start_process+304>: mov    %r14,%r8
>   0x0000000000404c03 <start_process+307>: add    0x225396(%rip),%r8        # 0x629fa0 <fsa>
>   0x0000000000404c0a <start_process+314>: mov    0x174(%r8),%edi
>   0x0000000000404c11 <start_process+321>: cmp    %edi,0x170(%r8)
>   0x0000000000404c18 <start_process+328>: jge    0x404d30 <start_process+608>
>   0x0000000000404c1e <start_process+334>: test   %ecx,%ecx
>   0x0000000000404c20 <start_process+336>: jle    0x404c5e <start_process+398>
>   0x0000000000404c22 <start_process+338>: mov    0x2251ff(%rip),%rsi        # 0x62---Type <return> to continue, or q <return> to quit---q
>
> So all I now know is that it happened with the assembly instruction:
>
>   mov    0xec(%rax),%edx
>
> But what does that tell me? In which part of my code could this be?

Hi Holger,

I don't have the source code, so a bit hard to guess. But you can try
to find out which member of your fsa structure is at offset 236 (0xec)
and look around those lines in the function where you are accessing
that member.
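
A minimal sketch of the address arithmetic behind the faulting instruction, under stated assumptions: `struct demo_status` is a made-up layout (not the real AFD structure), padded so that one member lands at offset 0xec, and `element_member_addr()` is a hypothetical helper. The disassembly appears to scale the index by 0x8f8 (`imul $0x8f8,%rax,%r14`), add the `fsa` base, and then load from offset 0xec, which is the usual base + index * sizeof(element) + offsetof(member) pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real AFD structure: padded so that
 * host_status sits at offset 0xec purely for illustration. */
struct demo_status
{
   char         filler[0xec];   /* Stands in for the preceding members. */
   unsigned int host_status;    /* Lands at offset 0xec (236).          */
   unsigned int other;
};

/* Compile-time check that the illustrative member is where we claim. */
static_assert(offsetof(struct demo_status, host_status) == 0xec,
              "host_status is expected at offset 0xec");

/* Address the compiler computes for array[index].member:
 * base + index * element_size + member_offset. */
static uintptr_t
element_member_addr(uintptr_t base,
                    long      index,
                    size_t    element_size,
                    size_t    member_offset)
{
   return(base + (uintptr_t)index * element_size + member_offset);
}
```

With the real header at hand, applying `offsetof` to the members of the real structure would identify the one at 0xec. An index far outside the array makes the computed address fall outside any mapped region, and the load from it is what raises SIGSEGV.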

I am trying to download the AFD source code, which looks like it will
take ages on my slow broadband. Hopefully I can help after that.


>
> Thanks,
> Holger
>



-- 
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================
--
To unsubscribe from this list: send the line "unsubscribe linux-c-programming" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Question about core files
  2009-10-07 13:54     ` Manish Katiyar
@ 2009-10-07 14:21       ` Holger Kiehl
  2009-10-07 17:36         ` Manish Katiyar
  0 siblings, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-07 14:21 UTC (permalink / raw)
  To: Manish Katiyar; +Cc: linux-c-programming

Hello Manish

On Wed, 7 Oct 2009, Manish Katiyar wrote:

> Hi Holger,
>
> I don't have the source code, so a bit hard to guess. But you can try
> to find out which member of your fsa structure is at offset 236 (0xec)
> and look around those lines in the function where you are accessing
> that member.
>
> I am trying to download the AFD source code, which looks like it will
> take ages on my slow broadband. Hopefully I can help after that.
>
If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
is the one that caused the error. You can get it from:

    ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2

You will find the relevant code in src/fd.c.

Holger


* Re: Question about core files
  2009-10-07 14:21       ` Holger Kiehl
@ 2009-10-07 17:36         ` Manish Katiyar
  2009-10-08 18:47           ` Manish Katiyar
  2009-10-09 12:09           ` Holger Kiehl
  0 siblings, 2 replies; 18+ messages in thread
From: Manish Katiyar @ 2009-10-07 17:36 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-c-programming

On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> Hello Manish
>
> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>
>> Hi Holger,
>>
>> I don't have the source code, so a bit hard to guess. But you can try
>> to find out which member of your fsa structure is at offset 236 (0xec)
>> and look around those lines in the function where you are accessing
>> that member.
>>
>> I am trying to download the AFD source code, which looks like it will
>> take ages on my slow broadband. Hopefully I can help after that.
>>
> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
> is the one that caused the error. You can get it from:
>
>   ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>
> You will find the relevant code in src/fd.c.

Hi Holger,

(gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
(gdb) p $offset
$5 = 236
(gdb) p/x 236
$6 = 0xec

host_status is at offset 236. In start_process() this member is read in
several places through the expression "fsa[fsa_pos].host_status".

At this point my guess is that fsa_pos holds an invalid value, i.e. you
are probably accessing memory beyond the end of the array. Since fsa_pos
is an input to the function, you can check its value at the start and
assert that it is within a reasonable range.
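
The range check suggested above could look roughly like this. It is only a sketch: `index_in_range()` and `require_index()` are hypothetical helpers, not AFD functions, and the element count would have to come from wherever the program tracks the current size of the array.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 when pos is a usable index into an array of count elements. */
static int
index_in_range(int pos, int count)
{
   return((pos >= 0) && (pos < count));
}

/* Hypothetical guard for the top of a function that indexes an array:
 * report the bad value and abort, so the core file points at the real
 * cause instead of a later invalid memory access. */
static void
require_index(const char *name, int pos, int count)
{
   if (!index_in_range(pos, count))
   {
      (void)fprintf(stderr, "%s = %d is out of range [0, %d)\n",
                    name, pos, count);
      abort();
   }
}
```

A call such as `require_index("fsa_pos", fsa_pos, host_count);` as the first statement of the function (with `host_count` standing in for the real element count) would turn a stray index into an immediate, self-describing abort.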

HTH


>
> Holger
>



-- 
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================


* Re: Question about core files
  2009-10-07 13:43   ` Holger Kiehl
@ 2009-10-08  0:28     ` Glynn Clements
  2009-10-09 12:12       ` Holger Kiehl
  0 siblings, 1 reply; 18+ messages in thread
From: Glynn Clements @ 2009-10-08  0:28 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-c-programming


Holger Kiehl wrote:

> >> Most of the time I compile my application without the -g option for
> >> performance reasons.
> >
> > The -g switch has absolutely no effect upon performance. It simply
> > causes an additional section to be added to the resulting binary.
> > When the program is run normally (i.e. not under gdb), that section
> > won't be mapped. The only downside to -g is that it increases the size
> > of the file.
> 
> But when executing the program will it not read the whole binary which
> is much larger with debug information and so will take longer (just the
> first reading of the binary)?

No. Binaries aren't "read", they're mapped (with mmap); pages are read
into memory on demand. The loader only maps the sections which are
actually required, which doesn't include the debug sections.
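
One way to see this for a given binary is to look at the section flags: in ELF, only sections with SHF_ALLOC occupy memory at run time. The sketch below assumes a 64-bit ELF file with the same byte order as the host and uses the definitions from the Linux `<elf.h>` header; `section_is_loaded()` and `list_sections()` are illustrative names, not part of any library.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A section occupies memory at run time only if SHF_ALLOC is set. */
static int
section_is_loaded(const Elf64_Shdr *sh)
{
   return((sh->sh_flags & SHF_ALLOC) != 0);
}

/* Print every section of a 64-bit ELF file together with whether it is
 * loaded. Returns the number of loaded sections, or -1 on error. */
static int
list_sections(const char *path)
{
   Elf64_Ehdr eh;
   Elf64_Shdr *sh = NULL;
   char       *names = NULL;
   size_t     names_size;
   int        i,
              loaded = -1;
   FILE       *fp = fopen(path, "rb");

   if (fp == NULL)
   {
      return(-1);
   }
   if ((fread(&eh, sizeof(eh), 1, fp) != 1) ||
       (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) ||
       (eh.e_ident[EI_CLASS] != ELFCLASS64) ||
       (eh.e_shentsize != sizeof(Elf64_Shdr)) ||
       (eh.e_shnum == 0) || (eh.e_shstrndx >= eh.e_shnum))
   {
      goto done;
   }

   /* Read the section header table. */
   sh = malloc(eh.e_shnum * sizeof(*sh));
   if ((sh == NULL) ||
       (fseek(fp, (long)eh.e_shoff, SEEK_SET) != 0) ||
       (fread(sh, sizeof(*sh), eh.e_shnum, fp) != eh.e_shnum))
   {
      goto done;
   }

   /* Read the string table that holds the section names. */
   names_size = sh[eh.e_shstrndx].sh_size;
   names = malloc(names_size + 1);
   if ((names == NULL) ||
       (fseek(fp, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0) ||
       (fread(names, 1, names_size, fp) != names_size))
   {
      goto done;
   }
   names[names_size] = '\0';

   loaded = 0;
   for (i = 0; i < eh.e_shnum; i++)
   {
      const char *name = (sh[i].sh_name < names_size) ?
                         names + sh[i].sh_name : "?";
      int        is_loaded = section_is_loaded(&sh[i]);

      (void)printf("%-24s %s\n", name, is_loaded ? "loaded" : "not loaded");
      loaded += is_loaded;
   }

done:
   free(names);
   free(sh);
   (void)fclose(fp);

   return(loaded);
}
```

Run against a binary built with -g, sections such as .text and .data should be reported as loaded, while the .debug_* sections should not.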

-- 
Glynn Clements <glynn@gclements.plus.com>


* Re: Question about core files
  2009-10-07 17:36         ` Manish Katiyar
@ 2009-10-08 18:47           ` Manish Katiyar
  2009-10-09 12:09           ` Holger Kiehl
  1 sibling, 0 replies; 18+ messages in thread
From: Manish Katiyar @ 2009-10-08 18:47 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-c-programming

On Wed, Oct 7, 2009 at 11:06 PM, Manish Katiyar <mkatiyar@gmail.com> wrote:
> On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>> Hello Manish
>>
>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>
>>> Hi Holger,
>>>
>>> I don't have the source code, so a bit hard to guess. But you can try
>>> to find out which member of your fsa structure is at offset 236 (0xec)
>>> and look around those lines in the function where you are accessing
>>> that member.
>>>
>>> I am trying to download the AFD source code, which looks like it will
>>> take ages on my slow broadband. Hopefully I can help after that.
>>>
>> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
>> is the one that caused the error. You can get it from:
>>
>>   ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>>
>> You will find the relevant code in src/fd.c.

Hi Holger,

Have you been able to trace the bug?


>
> Hi Holger,
>
> (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
> (gdb) p $offset
> $5 = 236
> (gdb) p/x 236
> $6 = 0xec
>
> host_status is at offset 236. In the function start_process I can see
> that it is accessed in several places through the expression
> "fsa[fsa_pos].host_status".
>
> At this point my guess is that fsa_pos holds an illegal value, i.e.
> you are probably accessing beyond the bounds of the array. Since it is
> an input to the function, you can check its value at the start and
> assert that it is within a reasonable range.
>
> HTH
>
>
>>
>> Holger
>>
>
>
>
> --
> Thanks -
> Manish
> ==================================
> [$\*.^ -- I miss being one of them
> ==================================
>



-- 
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================
--
To unsubscribe from this list: send the line "unsubscribe linux-c-programming" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Question about core files
  2009-10-07 17:36         ` Manish Katiyar
  2009-10-08 18:47           ` Manish Katiyar
@ 2009-10-09 12:09           ` Holger Kiehl
  2009-10-09 12:15             ` Manish Katiyar
  1 sibling, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-09 12:09 UTC (permalink / raw)
  To: Manish Katiyar; +Cc: linux-c-programming


Hello Manish

First, sorry for the late response!

On Wed, 7 Oct 2009, Manish Katiyar wrote:

> On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>> Hello Manish
>>
>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>
>>> Hi Holger,
>>>
>>> I don't have the source code, so a bit hard to guess. But you can try
>>> to find out which member of your fsa structure is at offset 236 (0xec)
>>> and look around those lines in the function where you are accessing
>>> that member.
>>>
>>> I am trying to download the AFD source code, which looks like it will
>>> take ages on my slow broadband. Hopefully I can help after that.
>>>
>> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
>> is the one that caused the error. You can get it from:
>>
>>   ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>>
>> You will find the relevant code in src/fd.c.
>
> Hi Holger,
>
> (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
> (gdb) p $offset
> $5 = 236
> (gdb) p/x 236
> $6 = 0xec
>
> host_status is at offset 236. In the function start_process I can see
> that it is accessed in several places through the expression
> "fsa[fsa_pos].host_status".
>
> At this point my guess is that fsa_pos holds an illegal value, i.e.
> you are probably accessing beyond the bounds of the array. Since it is
> an input to the function, you can check its value at the start and
> assert that it is within a reasonable range.
>
> HTH
>
Many thanks for finding this out! I think I now, with your help, have a
clue where the error could be. Is there a way to find out what value
fsa_pos had at that time? If it was -1 then it is definitely the error
I am thinking of, but if it is something else then I don't know.

Again many thanks for the valuable help!

Regards,
Holger


* Re: Question about core files
  2009-10-08  0:28     ` Glynn Clements
@ 2009-10-09 12:12       ` Holger Kiehl
  0 siblings, 0 replies; 18+ messages in thread
From: Holger Kiehl @ 2009-10-09 12:12 UTC (permalink / raw)
  To: Glynn Clements; +Cc: linux-c-programming

On Thu, 8 Oct 2009, Glynn Clements wrote:

>
> Holger Kiehl wrote:
>
>>>> Most of the time I compile my application without the -g option for
>>>> performance reasons.
>>>
>>> The -g switch has absolutely no effect upon performance. It simply
>>> causes an additional section to be added to the resulting binary.
>>> When the program is run normally (i.e. not under gdb), that section
>>> won't be mapped. The only downside to -g is that it increases the size
>>> of the file.
>>
>> But when executing the program will it not read the whole binary which
>> is much larger with debug information and so will take longer (just the
>> first reading of the binary)?
>
> No. Binaries aren't "read", they're mapped (with mmap); pages are read
> into memory on demand. The loader only maps the sections which are
> actually required, which doesn't include the debug sections.
>
Thanks for the clarification!

Holger


* Re: Question about core files
  2009-10-09 12:09           ` Holger Kiehl
@ 2009-10-09 12:15             ` Manish Katiyar
  2009-10-09 12:43               ` Holger Kiehl
  0 siblings, 1 reply; 18+ messages in thread
From: Manish Katiyar @ 2009-10-09 12:15 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-c-programming

On Fri, Oct 9, 2009 at 5:39 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
> Hello Manish
>
> First, sorry for the late response!
>
> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>
>> On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>>>
>>> Hello Manish
>>>
>>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>>
>>>> Hi Holger,
>>>>
>>>> I don't have the source code, so a bit hard to guess. But you can try
>>>> to find out which member of your fsa structure is at offset 236 (0xec)
>>>> and look around those lines in the function where you are accessing
>>>> that member.
>>>>
>>>> I am trying to download the AFD source code, which looks like it will
>>>> take ages on my slow broadband. Hopefully I can help after that.
>>>>
>>> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
>>> is the one that caused the error. You can get it from:
>>>
>>>   ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>>>
>>> You will find the relevant code in src/fd.c.
>>
>> Hi Holger,
>>
>> (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
>> (gdb) p $offset
>> $5 = 236
>> (gdb) p/x 236
>> $6 = 0xec
>>
>> host_status is at offset 236. In the function start_process I can see
>> that it is accessed in several places through the expression
>> "fsa[fsa_pos].host_status".
>>
>> At this point my guess is that fsa_pos holds an illegal value, i.e.
>> you are probably accessing beyond the bounds of the array. Since it is
>> an input to the function, you can check its value at the start and
>> assert that it is within a reasonable range.
>>
>> HTH
>>
> Many thanks for finding this out! I think I now, with your help, have a
> clue where the error could be. Is there a way to find out what value
> fsa_pos had at that time?

Since it is a runtime variable, probably we can get something by
looking at the output of "info registers". But you can try putting

if (fsa_pos < 0) {
   printf("going to die ...\n");
   return;
}

at the start of the function itself and try again.

>  If it was -1 then it is definitely the error
> I am thinking of, but if it is something else then I don't know.
>
> Again many thanks for the valuable help!
>
> Regards,
> Holger



-- 
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================


* Re: Question about core files
  2009-10-09 12:15             ` Manish Katiyar
@ 2009-10-09 12:43               ` Holger Kiehl
  2009-10-10  8:35                 ` Glynn Clements
  0 siblings, 1 reply; 18+ messages in thread
From: Holger Kiehl @ 2009-10-09 12:43 UTC (permalink / raw)
  To: Manish Katiyar; +Cc: linux-c-programming


On Fri, 9 Oct 2009, Manish Katiyar wrote:

> On Fri, Oct 9, 2009 at 5:39 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>> Hello Manish
>>
>> First, sorry for the late response!
>>
>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>
>>> On Wed, Oct 7, 2009 at 7:51 PM, Holger Kiehl <Holger.Kiehl@dwd.de> wrote:
>>>>
>>>> Hello Manish
>>>>
>>>> On Wed, 7 Oct 2009, Manish Katiyar wrote:
>>>>
>>>>> Hi Holger,
>>>>>
>>>>> I don't have the source code, so a bit hard to guess. But you can try
>>>>> to find out which member of your fsa structure is at offset 236 (0xec)
>>>>> and look around those lines in the function where you are accessing
>>>>> that member.
>>>>>
>>>>> I am trying to download the AFD source code, which looks like it will
>>>>> take ages on my slow broadband. Hopefully I can help after that.
>>>>>
>>>> If you download, please take afd-1.4.0-0.20.beta.tar.bz2 because that
>>>> is the one that caused the error. You can get it from:
>>>>
>>>>   ftp://ftp.dwd.de/pub/afd/development/afd-1.4.0-0.20.beta.tar.bz2
>>>>
>>>> You will find the relevant code in src/fd.c.
>>>
>>> Hi Holger,
>>>
>>> (gdb) set $offset = (int)(&((struct filetransfer_status *)0)->host_status)
>>> (gdb) p $offset
>>> $5 = 236
>>> (gdb) p/x 236
>>> $6 = 0xec
>>>
>>> host_status is at offset 236. In the function start_process I can see
>>> that it is accessed in several places through the expression
>>> "fsa[fsa_pos].host_status".
>>>
>>> At this point my guess is that fsa_pos holds an illegal value, i.e.
>>> you are probably accessing beyond the bounds of the array. Since it is
>>> an input to the function, you can check its value at the start and
>>> assert that it is within a reasonable range.
>>>
>>> HTH
>>>
>> Many thanks for finding this out! I think I now, with your help, have a
>> clue where the error could be. Is there a way to find out what value
>> fsa_pos had at that time?
>
> Since it is a runtime variable, probably we can get something by
> looking at the output of "info registers". But you can try putting
>
How can I find which register is fsa_pos?

    (gdb) info registers
    rax            0x7fb48a2c8718   140413389014808
    rbx            0x4acb3bcd       1254833101
    rcx            0x0      0
    rdx            0x7fb48a2c9010   140413389017104
    rsi            0x68     104
    rdi            0x7fb48a3795d8   140413389739480
    rbp            0x0      0x0
    rsp            0x7fffe4906840   0x7fffe4906840
    r8             0x7fb48a346018   140413389529112
    r9             0x0      0
    r10            0x3f     63
    r11            0x25c8   9672
    r12            0x5d     93
    r13            0xbbfe88b9       3154020537
    r14            0xfffffffffffff708       -2296
    r15            0x1      1
    rip            0x404b5f 0x404b5f <start_process+143>
    eflags         0x10207  [ CF PF IF RF ]
    cs             0x33     51
    ss             0x2b     43
    ds             0x0      0
    es             0x0      0
    fs             0x0      0
    gs             0x0      0
    fctrl          0x0      0
    fstat          0x0      0
    ftag           0x0      0
    fiseg          0x0      0
    fioff          0x0      0
    foseg          0x0      0
    fooff          0x0      0
    fop            0x0      0
    mxcsr          0x0      [ ]

> if (fsa_pos < 0) {
>   printf("going to die ...\n");
>   return;
> }
>
> at the start of the function itself and try again.
>
Yes, I have already added that. Thanks!

Holger


* Re: Question about core files
  2009-10-09 12:43               ` Holger Kiehl
@ 2009-10-10  8:35                 ` Glynn Clements
  2009-10-10  9:08                   ` Manish Katiyar
  2009-10-10 16:56                   ` Holger Kiehl
  0 siblings, 2 replies; 18+ messages in thread
From: Glynn Clements @ 2009-10-10  8:35 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: Manish Katiyar, linux-c-programming


Holger Kiehl wrote:

> How can I find which register is fsa_pos?

fsa_pos is a parameter, and doesn't appear to be changed within the
function, so I would expect "print fsa_pos" to give the correct value.

AFAICT, the following portion of the disassembly:

    0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax
    0x0000000000404b4e <start_process+126>: imul   $0x8f8,%rax,%r14
    0x0000000000404b55 <start_process+133>: mov    %r14,%rax
    0x0000000000404b58 <start_process+136>: add    0x225441(%rip),%rax        # 0x629fa0 <fsa>
    0x0000000000404b5f <start_process+143>: mov    0xec(%rax),%edx
    0x0000000000404b65 <start_process+149>: test   $0x1,%dl

corresponds to the expression

	fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA

0x24(%rsp) is fsa_pos, $0x8f8 (2296) is the size of each element of
fsa[], 0x225441(%rip) is fsa, 0xec is the offset of the host_status
field.

So:

    movslq 0x24(%rsp),%rax      # %rax = fsa_pos (sign-extended to 64 bits)
    imul   $0x8f8,%rax,%r14     # %r14 = fsa_pos * sizeof(fsa[0]), the byte offset of fsa[fsa_pos]
    mov    %r14,%rax            # %rax = that byte offset
    add    0x225441(%rip),%rax  # %rax = &fsa[fsa_pos]
    mov    0xec(%rax),%edx      # %edx = fsa[fsa_pos].host_status

Based upon this, %r14 should contain fsa_pos * 2296, so:

>     (gdb) info registers
>     r14            0xfffffffffffff708       -2296

Which suggests that fsa_pos is -1.

-- 
Glynn Clements <glynn@gclements.plus.com>


* Re: Question about core files
  2009-10-10  8:35                 ` Glynn Clements
@ 2009-10-10  9:08                   ` Manish Katiyar
  2009-10-10 16:56                   ` Holger Kiehl
  1 sibling, 0 replies; 18+ messages in thread
From: Manish Katiyar @ 2009-10-10  9:08 UTC (permalink / raw)
  To: Glynn Clements; +Cc: Holger Kiehl, linux-c-programming

On Sat, Oct 10, 2009 at 2:05 PM, Glynn Clements
<glynn@gclements.plus.com> wrote:
>
> Holger Kiehl wrote:
>
>> How can I find which register is fsa_pos?
>
> fsa_pos is a parameter, and doesn't appear to be changed within the
> function, so I would expect "print fsa_pos" to give the correct value.
>
> AFAICT, the following portion of the disassembly:
>
>    0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax
>    0x0000000000404b4e <start_process+126>: imul   $0x8f8,%rax,%r14
>    0x0000000000404b55 <start_process+133>: mov    %r14,%rax
>    0x0000000000404b58 <start_process+136>: add    0x225441(%rip),%rax        # 0x629fa0 <fsa>
>    0x0000000000404b5f <start_process+143>: mov    0xec(%rax),%edx
>    0x0000000000404b65 <start_process+149>: test   $0x1,%dl
>
> corresponds to the expression
>
>        fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA
>
> 0x24(%rsp) is fsa_pos, $0x8f8 (2296) is the size of each element of
> fsa[], 0x225441(%rip) is fsa, 0xec is the offset of the host_status
> field.
>
> So:
>
>    movslq 0x24(%rsp),%rax      # %rax = fsa_pos
>    imul   $0x8f8,%rax,%r14     # %r14 = fsa_pos * sizeof(fsa[i]) = &fsa[fsa_pos] - &fsa[0]
>    mov    %r14,%rax            # %rax = &fsa[fsa_pos] - &fsa[0]
>    add    0x225441(%rip),%rax  # %rax = &fsa[fsa_pos]
>    mov    0xec(%rax),%edx      # %edx = fsa[fsa_pos].host_status
>
> Based upon this, %r14 should contain fsa_pos * 2296, so:
>
>>     (gdb) info registers
>>     r14            0xfffffffffffff708       -2296
>
> Which suggests that fsa_pos is -1.

Excellent, Glynn, thanks :-) I was having trouble deciphering it.

>
> --
> Glynn Clements <glynn@gclements.plus.com>
>



-- 
Thanks -
Manish
==================================
[$\*.^ -- I miss being one of them
==================================


* Re: Question about core files
  2009-10-10  8:35                 ` Glynn Clements
  2009-10-10  9:08                   ` Manish Katiyar
@ 2009-10-10 16:56                   ` Holger Kiehl
  1 sibling, 0 replies; 18+ messages in thread
From: Holger Kiehl @ 2009-10-10 16:56 UTC (permalink / raw)
  To: Glynn Clements; +Cc: Manish Katiyar, linux-c-programming

On Sat, 10 Oct 2009, Glynn Clements wrote:

>
> Holger Kiehl wrote:
>
>> How can I find which register is fsa_pos?
>
> fsa_pos is a parameter, and doesn't appear to be changed within the
> function, so I would expect "print fsa_pos" to give the correct value.
>
Unfortunately not:

    (gdb)
    #4  0x0000000000404b5f in start_process ()
    (gdb) print fsa_pos
    No symbol "fsa_pos" in current context.

> AFAICT, the following portion of the disassembly:
>
>    0x0000000000404b49 <start_process+121>: movslq 0x24(%rsp),%rax
>    0x0000000000404b4e <start_process+126>: imul   $0x8f8,%rax,%r14
>    0x0000000000404b55 <start_process+133>: mov    %r14,%rax
>    0x0000000000404b58 <start_process+136>: add    0x225441(%rip),%rax        # 0x629fa0 <fsa>
>    0x0000000000404b5f <start_process+143>: mov    0xec(%rax),%edx
>    0x0000000000404b65 <start_process+149>: test   $0x1,%dl
>
> corresponds to the expression
>
> 	fsa[fsa_pos].host_status & DO_NOT_DELETE_DATA
>
> 0x24(%rsp) is fsa_pos, $0x8f8 (2296) is the size of each element of
> fsa[], 0x225441(%rip) is fsa, 0xec is the offset of the host_status
> field.
>
> So:
>
>    movslq 0x24(%rsp),%rax	# %rax = fsa_pos
>    imul   $0x8f8,%rax,%r14	# %r14 = fsa_pos * sizeof(fsa[i]) = &fsa[fsa_pos] - &fsa[0]
>    mov    %r14,%rax		# %rax = &fsa[fsa_pos] - &fsa[0]
>    add    0x225441(%rip),%rax	# %rax = &fsa[fsa_pos]
>    mov    0xec(%rax),%edx      # %edx = fsa[fsa_pos].host_status
>
> Based upon this, %r14 should contain fsa_pos * 2296, so:
>
>>     (gdb) info registers
>>     r14            0xfffffffffffff708       -2296
>
> Which suggests that fsa_pos is -1.
>
Many thanks for this very detailed explanation and for confirming that
fsa_pos was indeed -1. I would never have thought that one could get
so much information out of a core file from an optimized binary without
debug information.

Thanks to you and Manish Katiyar for this valuable help!

Regards,
Holger


Thread overview: 18+ messages
2009-10-06 14:04 Question about core files Holger Kiehl
2009-10-06 14:41 ` Manish Katiyar
2009-10-07 13:28   ` Holger Kiehl
2009-10-07 13:54     ` Manish Katiyar
2009-10-07 14:21       ` Holger Kiehl
2009-10-07 17:36         ` Manish Katiyar
2009-10-08 18:47           ` Manish Katiyar
2009-10-09 12:09           ` Holger Kiehl
2009-10-09 12:15             ` Manish Katiyar
2009-10-09 12:43               ` Holger Kiehl
2009-10-10  8:35                 ` Glynn Clements
2009-10-10  9:08                   ` Manish Katiyar
2009-10-10 16:56                   ` Holger Kiehl
2009-10-07  4:45 ` Glynn Clements
2009-10-07 13:43   ` Holger Kiehl
2009-10-08  0:28     ` Glynn Clements
2009-10-09 12:12       ` Holger Kiehl
2009-10-07  4:58 ` vinit dhatrak
