From: Leonardo Chiquitto <leonardo.lists@gmail.com>
To: Ian Kent <raven@themaw.net>
Cc: autofs@linux.kernel.org
Subject: Re: Automount daemon getting killed by SIGBUS
Date: Wed, 17 Mar 2010 21:07:59 -0300
Message-ID: <20100318000758.GA9422@libre.l.ngdn.org>
In-Reply-To: <1267151409.2116.22.camel@localhost>

> > > What is the version in use and what additional patches have been applied?
> > 
> > They are running 5.0.3 plus all the patches in patch_order-5.0.3 and
> > autofs-5.0.4-fix_negative_cache_non-existent_key.patch, meaning that we
> > don't have the other alloca() replacements that went in after 5.0.4.
> 
> OK.
> 
> I had a bug report where the customer believed that the max open file
> limit and stack size were a problem. It turned out that increasing them,
> for some unknown reason, reduced the likelihood of the problem occurring,
> but actually had nothing to do with the problem.

Increasing the stack size definitely helped here too. The customer is no
longer seeing the problem, and now that we have a workaround it's harder
to keep asking for more tests. I spent a lot of time trying to reproduce
the problem in house to make testing easier, but even with a very
similar setup (LDAP plus thousands of mount points) I was not able to
make it crash.
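
When comparing a lab machine against the affected system, it can help to
record the stack limit a process actually inherits. The sketch below is
Python, not part of autofs, and rests on an assumption this message does
not confirm: that the stack size was raised through RLIMIT_STACK (for
example with "ulimit -s") rather than by some other means. The helper
name stack_limits is made up for illustration.

```python
import resource


def stack_limits():
    """Return (soft, hard) RLIMIT_STACK in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    unlimited = resource.RLIM_INFINITY
    return (
        None if soft == unlimited else soft,
        None if hard == unlimited else hard,
    )


soft, hard = stack_limits()
print("soft stack limit:", soft)
print("hard stack limit:", hard)
```

Running it in the same environment that starts the daemon on both
machines shows whether the two setups really differ in this respect.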

> If automount crashes then you need to look at the gdb backtrace of the
> running threads at the time of the crash with "thr a a bt" to get more
> info. I don't know how you provide debug symbols for your packages but
> you will need them if you want to make any sense at all of the backtrace.

All threads look all right, except for thread 1, which apparently has a
corrupted stack (and hence caused the SIGBUS):

(gdb) thr a a bt
Thread 7 (Thread 3577):
#0  0x00002b39dd901a48 in do_sigwait () from /lib64/libpthread.so.0
#1  0x00002b39dd901aed in sigwait () from /lib64/libpthread.so.0
#2  0x000055555555d6aa in statemachine (arg=<value optimized out>)
    at automount.c:1382
#3  main (arg=<value optimized out>) at automount.c:2105

Thread 6 (Thread 3578):
#0  0x00002b39dd8fe517 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x0000555555571802 in alarm_handler (arg=<value optimized out>)
    at alarm.c:203
#2  0x00002b39dd8fa193 in start_thread () from /lib64/libpthread.so.0
#3  0x00002b39ddbd1dfd in clone () from /lib64/libc.so.6

Thread 5 (Thread 3579):
#0  0x00002b39dd8fe517 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x000055555556b72d in st_queue_handler (arg=<value optimized out>)
    at state.c:1022
#2  0x00002b39dd8fa193 in start_thread () from /lib64/libpthread.so.0
#3  0x00002b39ddbd1dfd in clone () from /lib64/libc.so.6

Thread 4 (Thread 3582):
#0  0x00002b39ddbc9b26 in poll () from /lib64/libc.so.6
#1  0x000055555555f2f7 in get_pkt (pkt=<value optimized out>, 
    ap=<value optimized out>) at automount.c:925
#2  handle_packet (pkt=<value optimized out>, ap=<value optimized out>)
    at automount.c:1082
#3  handle_mounts (pkt=<value optimized out>, ap=<value optimized out>)
    at automount.c:1581
#4  0x00002b39dd8fa193 in start_thread () from /lib64/libpthread.so.0
#5  0x00002b39ddbd1dfd in clone () from /lib64/libc.so.6

Thread 3 (Thread 3585):
#0  0x00002b39ddbc9b26 in poll () from /lib64/libc.so.6
#1  0x000055555555f2f7 in get_pkt (pkt=<value optimized out>, 
    ap=<value optimized out>) at automount.c:925
#2  handle_packet (pkt=<value optimized out>, ap=<value optimized out>)
    at automount.c:1082
#3  handle_mounts (pkt=<value optimized out>, ap=<value optimized out>)
    at automount.c:1581
#4  0x00002b39dd8fa193 in start_thread () from /lib64/libpthread.so.0
#5  0x00002b39ddbd1dfd in clone () from /lib64/libc.so.6

Thread 2 (Thread 3586):
#0  0x00002b39ddbc9b26 in poll () from /lib64/libc.so.6
#1  0x000055555555f2f7 in get_pkt (pkt=<value optimized out>, 
    ap=<value optimized out>) at automount.c:925
#2  handle_packet (pkt=<value optimized out>, ap=<value optimized out>)
    at automount.c:1082
#3  handle_mounts (pkt=<value optimized out>, ap=<value optimized out>)
    at automount.c:1581
#4  0x00002b39dd8fa193 in start_thread () from /lib64/libpthread.so.0
#5  0x00002b39ddbd1dfd in clone () from /lib64/libc.so.6

Thread 1 (Thread 11657):
Cannot access memory at address 0x800040623598

(gdb) thr 1
[Switching to thread 1 (Thread 11657)]#0  0x0000555555566bd0 in
spawn_mount (
    logopt=Cannot access memory at address 0x80004062242c
) at spawn.c:412
412	}

(gdb) info registers
rax            0x0	0
rbx            0x406223f0	1080173552
rcx            0x1	1
rdx            0x0	0
rsi            0x0	0
rdi            0x1	1
rbp            0x800040623590	0x800040623590
rsp            0x800040623568	0x800040623568
r8             0x1	1
r9             0x2d89	11657
r10            0x8	8
r11            0x246	582
r12            0x0	0
r13            0x0	0
r14            0x2	2
r15            0x406223b0	1080173488
rip            0x555555566bd0	0x555555566bd0 <spawn_mount+832>
eflags         0x10287	[ CF PF SF IF RF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x63	99
gs             0x0	0
fctrl          0x37f	895
fstat          0x0	0
ftag           0xffff	65535
fiseg          0x0	0
fioff          0x0	0
foseg          0x0	0
fooff          0x0	0
fop            0x0	0
mxcsr          0x1f80	[ IM DM ZM OM UM PM ]
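
A side note on the register dump above, offered as a sketch rather than
a diagnosis: rsp and rbp both have bit 47 set while bits 48-63 are
clear. On x86-64 with 48-bit virtual addresses (an assumption about
this machine) that is not a canonical address. Clearing that single bit
gives values a few KiB above rbx and r15, which look like ordinary
stack addresses. The Python below only does that arithmetic on values
copied from the dump; the helper names is_canonical and clear_bit are
made up for illustration.

```python
# Values copied from the "info registers" output above.
RSP = 0x800040623568
RBP = 0x800040623590
RBX = 0x406223f0
R15 = 0x406223b0

VA_BITS = 48  # assumption: 48-bit virtual addresses


def is_canonical(addr, va_bits=VA_BITS):
    """True if bits 63..(va_bits - 1) are all zeros or all ones."""
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def clear_bit(addr, bit):
    """Return addr with the given bit forced to zero."""
    return addr & ~(1 << bit)


for name, value in (("rsp", RSP), ("rbp", RBP), ("rbx", RBX), ("r15", R15)):
    print(f"{name} {value:#x} canonical={is_canonical(value)}")

fixed_rsp = clear_bit(RSP, 47)
print(f"rsp with bit 47 cleared: {fixed_rsp:#x}")
print(f"distance from rbx: {fixed_rsp - RBX:#x}")
print(f"distance from r15: {fixed_rsp - R15:#x}")
```

This says nothing about how the bit came to be set. It only shows that
the values in rsp and rbp would be plausible stack addresses if that one
bit were clear, which fits the reading that the stack pointer itself is
damaged rather than some pointer stored on the stack.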

> Is your customer using direct mounts?

Yes, lots of direct mounts (more than 9000).

> Is your customer using LDAP?

Yes, all maps are retrieved from LDAP.

> Have a look at the patches below and try and work out if they are
> relevant to the code base you are working with:

Thanks a lot for the useful comments and for listing the patches. I'll
try to merge them into our package.

Kind regards,
Leonardo
