On 08/11/2016 10:50 AM, Bart Van Assche wrote:
> On 08/08/2016 05:01 AM, Mike Christie wrote:
>> This patch adds a callback which can be used to repair a path
>> if check() has determined it is in the PATH_DOWN state.
>>
>> The next patch that adds rbd checker support which will use this to
>> handle the case where a rbd device is blacklisted.
>
> Hello Mike,
>
> With this patch applied, with the TUR checker enabled in multipath.conf
> I see the following crash if I trigger SRP failover and failback:
>
> ion-dev-ib-ini:~ # gdb ~bart/software/multipath-tools/multipathd/multipathd
> (gdb) handle SIGPIPE noprint nostop
> Signal        Stop      Print   Pass to program Description
> SIGPIPE       No        No      Yes             Broken pipe
> (gdb) run -d
> Aug 11 08:46:27 | sde: remove path (uevent)
> Aug 11 08:46:27 | mpathbe: adding map
> Aug 11 08:46:27 | 8:64: cannot find block device
> Aug 11 08:46:27 | Invalid device number 1
> Aug 11 08:46:27 | 1: cannot find block device
> Aug 11 08:46:27 | 8:96: cannot find block device
> Aug 11 08:46:27 | mpathbe: failed to setup multipath
> Aug 11 08:46:27 | dm-0: uev_add_map failed
> Aug 11 08:46:27 | uevent trigger error
>
> Thread 4 "multipathd" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff7f8b700 (LWP 8446)]
> 0x0000000000000000 in ?? ()
> (gdb) bt
> #0  0x0000000000000000 in ?? ()
> #1  0x00007ffff6c41905 in checker_repair (c=0x7fffdc001ef0) at checkers.c:225
> #2  0x000000000040a760 in repair_path (vecs=0x66d7e0, pp=0x7fffdc001a40)
>     at main.c:1733
> #3  0x000000000040ab27 in checkerloop (ap=0x66d7e0) at main.c:1807
> #4  0x00007ffff79bb474 in start_thread (arg=0x7ffff7f8b700)
>     at pthread_create.c:333
> #5  0x00007ffff63243ed in clone ()
>     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> (gdb) up
> #1  0x00007ffff6c41905 in checker_repair (c=0x7fffdc001ef0) at checkers.c:225
> 225             c->repair(c);
> (gdb) print *c
> $1 = {node = {next = 0x0, prev = 0x0}, handle = 0x0, refcount = 0, fd = 0,
>   sync = 0, timeout = 0, disable = 0, name = '\000' ,
>   message = '\000' , context = 0x0, mpcontext = 0x0,
>   check = 0x0, repair = 0x0, init = 0x0, free = 0x0}
>

Sorry about the stupid bug. Could you try the attached patch?

I found two segfaults. If check_path returns less than 0, then we free
the path, so we cannot call repair on it. If libcheck_init fails, it
memsets the checker, so we cannot call repair on it either.

I moved the repair call to the specific code paths where the path is
down.
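
For readers following the backtrace, here is a minimal sketch of the second failure mode: after the checker has been memset to zero, `repair` is a NULL function pointer, and calling through it jumps to address 0. This is not the attached patch, which moves the call sites. It only illustrates a defensive NULL check on the callback as a complementary safeguard. The struct is a hypothetical, trimmed-down stand-in for the real checker structure, and `checker_clear`, `checker_repair_guarded`, `demo_repair`, and the `repaired` field are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for the checker printed in the gdb
 * session above; only the fields needed for the illustration are kept. */
struct checker {
	int fd;
	int repaired;                      /* illustration only: counts repair calls */
	int (*check)(struct checker *);
	void (*repair)(struct checker *);
};

/* Example repair callback (hypothetical). */
void demo_repair(struct checker *c)
{
	c->repaired++;
}

/* Mimics what the thread describes for a failed init: the whole struct is
 * zeroed, which on mainstream platforms leaves every callback NULL, as in
 * the "print *c" output. */
void checker_clear(struct checker *c)
{
	memset(c, 0, sizeof(*c));
}

/* Guarded dispatch: only call the callback if it is set.
 * Returns 1 if the repair callback ran, 0 if it was skipped. */
int checker_repair_guarded(struct checker *c)
{
	if (c == NULL || c->repair == NULL)
		return 0;
	c->repair(c);
	return 1;
}
```

A guard like this would only cover the zeroed-checker case. It does not help with the first failure mode, where the path has already been freed, so the caller still has to avoid touching the path after check_path returns a negative value.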