From: seth vidal
To: device-mapper development
Subject: Re: failover time and failback time
Date: Sat, 26 Aug 2006 14:39:02 -0400
Message-ID: <1156617542.13298.31.camel@cutter>
In-Reply-To: <1156615617.13298.25.camel@cutter>

On Sat, 2006-08-26 at 14:06 -0400, seth vidal wrote:
> On Sat, 2006-08-26 at 13:35 -0400, seth vidal wrote:
> > Hi,
> >
> > Then I yank one connection on one of the cards in the back of the
> > system.
> > I watch dmesg and I see:
> > qla2300 0000:03:0b.0: LOOP DOWN detected (2).
> >
> > At this point I would expect multipathd to fail out the paths connected
> > and continue happily.
> >
>
> So, I think I know why multipathd was failing back correctly :)
> It's because it wasn't running. I thought it was but I was wrong.
>
> However, now I'm seeing this when it tries to failover:
> Aug 26 14:04:10 multipathd: error calling out /sbin/mpath_prio_alua 8:240
> Aug 26 14:04:10 kernel: SCSI error : <1 0 3 3> return code = 0x10000
>
> I've checked that /sbin/mpath_prio_alua runs, so I'm not sure where
> I should look next.

It's so fun learning things in semi-public :)

The error above is multipathd calling out to the priority program to
verify the failed path (8:240). It keeps doing this on every check
until the path is restored.

Now, is there any way to tell multipath, "Yes, we know it's down; stop
trying for now, because it isn't going to be back"? Sort of like
acknowledging an alert in Nagios. I can think of some controlled
'failures' where I might want to tell it to be quiet.

Thanks for putting up with my messages. :)

-sv
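multipathd keeps re-checking a failed path because that is the only way it can notice the path coming back and fail back to it; every check against the dead path produces another error like the ones quoted above. The Python sketch below is a toy model of that polling loop, not multipathd's code. The `PathMonitor` class, its `acknowledge()` method and the log strings are hypothetical names invented for illustration. It shows one way the requested "acknowledge" behaviour could work: the checks continue, so recovery is still detected, but repeated errors are silenced until the path returns.

```python
from typing import Callable, Dict, List, Set


class PathMonitor:
    """Hypothetical toy model of a path checker (not multipathd's code).

    Every poll re-checks every path, including failed ones, so that a
    restored path is noticed. Acknowledging a failed path only silences
    the repeated "still down" messages; it does not stop the checks.
    """

    def __init__(self, checker: Callable[[str], bool], paths: List[str]) -> None:
        self.checker = checker                          # returns True if the path is usable
        self.up: Dict[str, bool] = {p: True for p in paths}
        self.acked: Set[str] = set()                    # failed paths the operator has acknowledged
        self.log: List[str] = []

    def acknowledge(self, path: str) -> None:
        # "Yes, we know it's down": only meaningful for a path that has failed.
        if not self.up[path]:
            self.acked.add(path)

    def poll_once(self) -> None:
        for path in self.up:
            if self.checker(path):
                if not self.up[path]:
                    self.log.append(f"{path}: reinstated")
                self.up[path] = True
                self.acked.discard(path)                # acknowledgement expires on recovery
            else:
                if self.up[path]:
                    self.log.append(f"{path}: failed")
                elif path not in self.acked:
                    self.log.append(f"{path}: still down")
                self.up[path] = False


# Usage: simulate yanking the cable on the path behind device 8:240.
state = {"8:240": False, "8:16": True}
mon = PathMonitor(lambda p: state[p], ["8:240", "8:16"])
mon.poll_once()              # first failure is reported
mon.poll_once()              # path still down and not acknowledged: reported again
mon.acknowledge("8:240")
mon.poll_once()              # acknowledged: checked, but nothing logged
state["8:240"] = True
mon.poll_once()              # cable back in: failback is still detected
print(mon.log)
```

The design choice worth noting is that acknowledgement suppresses only the noise, never the checking itself; if it stopped the checks, the daemon would also stop noticing when the path came back.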