From mboxrd@z Thu Jan 1 00:00:00 1970 From: Olivier Bonvalet Subject: Re: Issue #5876 : assertion failure in rbd_img_obj_callback() Date: Wed, 26 Mar 2014 04:58:55 +0100 Message-ID: <1395806335.2076.69.camel@localhost> References: <1395736765.2823.29.camel@localhost> <53317BC2.9010700@ieee.org> <1395753516.2823.37.camel@localhost> <533184AF.9050101@ieee.org> <5331853D.40408@ieee.org> <1395767705.9967.5.camel@localhost> <5331C05D.1060008@ieee.org> <1395773582.2076.10.camel@localhost> <5331D2E8.6060002@ieee.org> <1395778894.2076.12.camel@localhost> <1395780835.2076.15.camel@localhost> <1395781847.2076.21.camel@localhost> <1395782577.2076.23.camel@localhost> <1395783675.2076.26.camel@localhost> <1395784476.2076.28.camel@localhost> <1395785839.2076.30.camel@localhost> <5332416D.20600@ieee.org> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: Received: from licorne.daevel.fr ([178.32.94.222]:48393 "EHLO licorne.daevel.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753371AbaCZD7E (ORCPT ); Tue, 25 Mar 2014 23:59:04 -0400 In-Reply-To: <5332416D.20600@ieee.org> Sender: ceph-devel-owner@vger.kernel.org List-ID: To: Alex Elder Cc: Ilya Dryomov , Ceph Development Le mardi 25 mars 2014 =C3=A0 21:54 -0500, Alex Elder a =C3=A9crit : > OK, here's some more analysis of this one. I'm pretty > convinced the object request in question has already > been freed/destroyed. The real question is "why?" >=20 > On 03/25/2014 05:17 PM, Olivier Bonvalet wrote: > > Mar 25 23:14:45 rurkh kernel: [ 330.054196] rbd_img_obj_callback: = bad image object request information: > > Mar 25 23:14:45 rurkh kernel: [ 330.054205] obj_request ffff88025f= 3df058 > > Mar 25 23:14:45 rurkh kernel: [ 330.054209] ->object_name <(nu= ll)> > This can occur for an object request that has been destroyed. > After the name is freed, the field is set to NULL. >=20 > > Mar 25 23:14:45 rurkh kernel: [ 330.054211] ->offset 0 > > Mar 25 23:14:45 rurkh kernel: [ 330.054213] ->length 4096 > Given the image request, this makes sense as the second of > two object request in this image request. (More on the offsets > below). >=20 > > Mar 25 23:14:45 rurkh kernel: [ 330.054216] ->type 0x1 > This is OBJ_REQUEST_BIO. Reasonable. >=20 > > Mar 25 23:14:45 rurkh kernel: [ 330.054218] ->flags 0x3 > This means IMG_DATA and DONE are set. It's an image object > (which is expected), and it's done (which is also expected, > because we're calling the callback function). >=20 > > Mar 25 23:14:45 rurkh kernel: [ 330.054220] ->which 4294967295 > This is BAD_WHICH. That value overwrites the original when > an image object request is removed from its image in > rbd_img_obj_request_del(). >=20 > > Mar 25 23:14:45 rurkh kernel: [ 330.054222] ->xferred 4096 > > Mar 25 23:14:45 rurkh kernel: [ 330.054224] ->result 0 > > Mar 25 23:14:45 rurkh kernel: [ 330.054227] img_request ffff880273= 1f8448 > > Mar 25 23:14:45 rurkh kernel: [ 330.054229] ->snap 0xfffffffff= ffffffe > This is CEPH_NOSNAP. >=20 > > Mar 25 23:14:45 rurkh kernel: [ 330.054231] ->offset 250818150= 4 > > Mar 25 23:14:45 rurkh kernel: [ 330.054233] ->length 16384 > Offset is 0x957FD000, length 0x4000, The first object request > should be (assuming 4MB objects) object 255, offset 0x3FD000, > length 0x3000 (or 12KB) The second object request should be in > object 256, offset 0, length 0x1000 (or 4KB). >=20 > > Mar 25 23:14:45 rurkh kernel: [ 330.054235] ->flags 0x0 > It is not layered nor initiated by a child image. It was a read. >=20 > > Mar 25 23:14:45 rurkh kernel: [ 330.054237] ->obj_request_coun= t 0 > As objects are removed from an image request, the request count > is decremented until it's zero. This suggests this image request > (which looks for the most part valid) has been destroyed (or at > least is in the process). >=20 > > Mar 25 23:14:45 rurkh kernel: [ 330.054239] ->next_completion = 2 > For an image request comprised of 2 object requests, this is > the right value when both of the object requests have completed. > The next completion (which will never occur) equals the object > request count (2). >=20 > > Mar 25 23:14:45 rurkh kernel: [ 330.054241] ->xferred 16384 > This means the request transferred completely... >=20 > > Mar 25 23:14:45 rurkh kernel: [ 330.054243] ->result 0 > ...and was successful. >=20 >=20 > So, it looks like the image request completed successfully. The > first and second object requests must have both passed through > rbd_img_obj_callback() in order for this to occur, and both of > them transferred all their data successfully. >=20 > However, at the time of this assertion failure, the callback > is being called for the already-completed second object > request. Why? >=20 > The only place an object request's callback is called is in > rbd_obj_request_complete(). Olivier reports that these are > version 1 images and there is no layering involved. So the > only place rbd_obj_request_complete() is called is from > rbd_osd_req_callback(), and then only if the object request > has been marked done by (in this case) rbd_osd_read_callback(). > For a non-layered image object request, that always occurs > (after possibly zeroing all or part of the buffer). So it > makes sense that these OSD read requests completed normally, > and triggered object request completion, and when both of > them had completed they caused the image request to complete. >=20 > This suggests that rbd_osd_req_callback() got called more > than once for that second object request. That function is > called only via ceph_osd_request->r_callback(), and that > only occurs in net/ceph/osd_client.c:handle_reply(). It > only calls it if local variable already_completed is false. > That's set to the osd request's r_got_reply value, taken > while the OSD client' request mutex is held. >=20 > Any chance the osd client can be calling r_callback twice? >=20 > A second possibility is that the last object request > got duplicated somehow. Right now I'm looking at > rbd_img_request_fill() and bio_chain_clone_range() to > see if I can see a way that could happen. >=20 > That's it for now. I'm pretty sure this is getting > pretty close to the root cause. >=20 > -Alex >=20 I'm pretty sure that the new high frequency of the crash is a part of the problem. Before the spinlock fix, the problem was hard to trigger (once per week, maybe per day on two hosts only). Now it's really hard to *don't* trigger it : I now have 5 client servers, and only one of them have more than one hour of uptime. I can be wrong, since I don't know neither really understand the code, but from what I understand, there was a "ghost" request, a duplicate, that follow the good one. Before the spinlock fix, this second request could be proceed if next_completion was not yet modified, so this "bug" hadn't a big impact. But now that next_completion is properly checked with the spinlock, tha= t ghost request is no more proceed neither ignored, it throws a kernel bug. =46rom my point of view, since it seems to be an orphaned read request,= it should be track in logs (this request should not exists) but then it should be ignored without locking the kernel. =46rom now, I switch on previous kernels (yes, I should do that some ho= urs sooner ;)). -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" i= n the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html