Date: Wed, 6 Sep 2006 10:09:31 +0200
From: Lars Ellenberg
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] DRBD-8 - system hangs when NegDReply received

/ 2006-09-05 21:41:36 -0400
\ Graham, Simon:
> I'd still like to understand why simply completing the original request
> with an error similar to what is done in receive_DataReply leads to a
> hang - all suggestions gratefully received - this is what the NegDReply
> code looks like now:
>
> STATIC int got_NegDReply(drbd_dev *mdev, Drbd_Header* h)
> {
>     drbd_request_t *req;
>     Drbd_BlockAck_Packet *p = (Drbd_BlockAck_Packet*)h;
>     sector_t sector = be64_to_cpu(p->sector);
>
>     req = (drbd_request_t *)(unsigned long)p->block_id;
>     if(unlikely(!drbd_pr_verify(mdev,req,sector))) {
>         ERR("Got a corrupt block_id/sector pair(3).\n");
>         return FALSE;
>     }
>
>     ERR("Got NegDReply; Sector %llx, len %x; Fail original request.\n",
>         (unsigned long long)sector,be32_to_cpu(p->blksize));
>
>     spin_lock(&mdev->pr_lock);
>     hlist_del(&req->colision);
>     spin_unlock(&mdev->pr_lock);
>
>     /* Complete original request with error */
>     drbd_bio_endio(req->master_bio,0 /* failed */);

I am still working on a monster patch to consolidate all the request
functionality in one place, so it is more obvious what should and should
not happen.

I may be wrong here, but you cannot simply end the master request and
free the req just because you got a NegDReply. The local part
(submit_bio) may still be in flight. You have to use drbd_end_req with
the appropriate flags...

>
>     dec_ap_bio(mdev);
>     dec_ap_pending(mdev);
>
>     drbd_req_free(req);
>
>     drbd_khelper(mdev,"pri-on-incon-degr");
>
>     return TRUE;
> }

-- 
: Lars Ellenberg                            Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Schoenbrunner Str. 244, A-1120 Vienna/Europe http://www.linbit.com :
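
To illustrate the point about not freeing the req while the local part
may still be in flight, here is a rough userspace sketch of the idea. It
is not the DRBD code and does not use the real drbd_end_req signature:
every name in it (sk_request, sk_end_req, the SK_* flags) is made up for
illustration, and it assumes the master request counts as successful if
at least one of the two halves succeeded. It is single-threaded; real
code would need a lock around the flag update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flags, NOT the identifiers used in the DRBD source. */
enum {
    SK_LOCAL_DONE = 1 << 0, /* local submit_bio has completed */
    SK_NET_DONE   = 1 << 1, /* peer has answered (reply or NegDReply) */
    SK_LOCAL_OK   = 1 << 2, /* local half succeeded */
    SK_NET_OK     = 1 << 3  /* network half succeeded */
};

struct sk_request {
    unsigned flags;
    int *master_result; /* stands in for master_bio: -1 pending, 0 failed, 1 ok */
};

/*
 * Record that one half of the request has finished.  The master request
 * is completed and the req freed only once BOTH halves are done.
 * Returns true if req was freed (the caller must not touch it again).
 */
bool sk_end_req(struct sk_request *req, unsigned done_flag, bool ok)
{
    const unsigned both = SK_LOCAL_DONE | SK_NET_DONE;

    req->flags |= done_flag;
    if (ok)
        req->flags |= (done_flag == SK_LOCAL_DONE) ? SK_LOCAL_OK : SK_NET_OK;

    if ((req->flags & both) != both)
        return false; /* other half still in flight: keep req alive */

    /* Assumption: success if at least one half succeeded. */
    *req->master_result = (req->flags & (SK_LOCAL_OK | SK_NET_OK)) ? 1 : 0;
    free(req);
    return true;
}

/* NegDReply arrives first, the local completion later. */
int sk_demo_neg_reply_first(bool local_ok)
{
    int master = -1;
    struct sk_request *req = calloc(1, sizeof(*req));

    if (!req)
        return -1;
    req->master_result = &master;

    /* Negative reply from the peer: must NOT complete or free yet. */
    bool freed = sk_end_req(req, SK_NET_DONE, false);
    assert(!freed && master == -1);

    /* Local I/O finishes: now the master is completed and req freed. */
    freed = sk_end_req(req, SK_LOCAL_DONE, local_ok);
    assert(freed);
    return master;
}
```

With this shape, the NegDReply handler only reports "network half done,
failed"; whichever half finishes last is the one that ends the master
request and frees the req, so nothing is freed under a completion that
is still pending.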