From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nicolas Williams
Date: Fri, 5 Feb 2010 14:20:13 -0600
Subject: [Lustre-devel] server-side resending & bulk transfer
In-Reply-To: <002d01caa686$72c40d40$584c27c0$@com>
References: <20100205163524.GW236@granier.hd.free.fr>
 <002d01caa686$72c40d40$584c27c0$@com>
Message-ID: <20100205202013.GN1061@Sun.COM>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: lustre-devel@lists.lustre.org

On Fri, Feb 05, 2010 at 05:12:51PM +0000, Eric Barton wrote:
> On Feb 5, 2010, at 8:35 AM, Johann Lombardi wrote:
> > Unlike lock callback rpcs, losing the start bulk signal is not fatal since
> > the bulk transfer will timeout on the server side, the request be dropped
> > and the client will resend after reconnection. This is indeed harmless,
> > but still causes slowdown which could be avoided according to LLNL if we
> > try to resend the start bulk signal (bug 21714). Brian Behlendorf's
> > proposal is to resend the start bulk signal after the first l_wait_event()
> > timeout in ost_brw_write(). However, we don't know if this is safe to do,
> > e.g. how does the client react if it receives duplicated start bulk signals?
>
> Yes, the server could retry the bulk if it times out and this
> will be safe for the client since its bulk buffer is auto-unlinked,
> so only 1 bulk PUT/GET can match it. But if the problem happens
> on the way back to the server rather than the way out to the client,
> you're hosed since the bulk has completed from the client's POV.
>
> This should be an exceptional circumstance - i.e. a router has
> actually failed - so I think it's better just to stick with the
> client retrying from scratch rather than tying down a server thread
> until it has decided whether there was a router failure or the
> client really crashed.

I agree that tying down a server thread on a long block is not a good thing.
If the LLNL proposal (resend the start bulk signal) is on the money,
then the thing to do would be to create a queue and separate service
thread(s) to handle such resends.

> Roll on the health network! :)

Well, if the deadline here is on the order of 1s or thereabouts then the
health network isn't likely to help much because we're not going to get
sub-second dead node detection. (Well, if we jack up the ping rate and
reduce the time-to-declare-death low enough, and make sure that HN
threads and messaging are suitably prioritized, then we might be able to
get sub-second dead node detection, but my gut feeling is that any
heuristic approach should wait for longer than 1s.)

Nico
--