From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Dilger
Date: Fri, 09 Jan 2009 17:50:16 -0700
Subject: [Lustre-devel] imperative recovery
In-Reply-To:
References: <1906DB02-F9DF-4F49-9A9A-23FE7E799EA8@sun.com> <046101c95ef4$2fe3a8d0$8faafa70$@com> <494AAF4A.4030304@sun.com> <49676CF9.5050805@cray.com>
Message-ID: <20090110005016.GL13721@webber.adilger.int>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: lustre-devel@lists.lustre.org

On Jan 09, 2009 09:04 -0800, Robert Read wrote:
> On Jan 9, 2009, at 07:27 , Nicholas Henke wrote:
> > This would be a great enhancement for OSS failover or reboot, it is
> > really the only way we'll get to recovery times under ~2.5 x obd_timeout.
> >
> > I do think this will miss a significant case: combo MGS+MDS. A
> > majority of our customers are deploying with this configuration.
> > Perhaps exposing this mechanism on the clients via a /proc file
> > would be enough - that way a failover framework
> > could manually trigger the timeout and/or nid switching.
>
> Yes, exactly what I was thinking. Exposing this feature via proc (or
> lctl) on the clients is the first step. It has minimal impact,
> requires no changes to the server, and should integrate well with
> existing failover frameworks. We also need to get the server to end
> recovery sooner (without waiting for all the stale exports), but VBR
> should help with that.

Hey, wouldn't (essentially) "lctl --device $foo recover" do the trick today?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.