From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1763284AbYDNRhf (ORCPT ); Mon, 14 Apr 2008 13:37:35 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753361AbYDNRh2 (ORCPT ); Mon, 14 Apr 2008 13:37:28 -0400 Received: from brick.kernel.dk ([87.55.233.238]:1721 "EHLO kernel.dk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753336AbYDNRh1 (ORCPT ); Mon, 14 Apr 2008 13:37:27 -0400 Date: Mon, 14 Apr 2008 19:37:20 +0200 From: Jens Axboe To: scameron@beardog.cca.cpqcorp.net Cc: linux-kernel@vger.kernel.org, mike.miller@hp.com, mikem@beardog.cca.cpqcorp.net Subject: Re: [patch] cciss: Fix race between disk-adding code and interrupt handler Message-ID: <20080414173720.GQ12774@kernel.dk> References: <20080414142019.GG16584@beardog.cca.cpqcorp.net> <20080414170505.GO12774@kernel.dk> <20080414172716.GM16584@beardog.cca.cpqcorp.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080414172716.GM16584@beardog.cca.cpqcorp.net> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 14 2008, scameron@beardog.cca.cpqcorp.net wrote: > > > > On Mon, Apr 14 2008, scameron@beardog.cca.cpqcorp.net wrote: > > > > > > > > > Fix race condition between cciss_init_one(), cciss_update_drive_info(), > > > and cciss_check_queues(). cciss_softirq_done would try to start > > > queues which were not quite ready to be started, as its checks for > > > readiness were not sufficiently synchronized with the queue initializing > > > code in cciss_init_one and cciss_update_drive_info. Slow cpu and > > > large numbers of logical drives seem to make the race more likely > > > to cause a problem. > > > > Hmm, this seems backwards to me. cciss_softirq_done() isn't going to > > start the queues, until an irq has triggered for instance. Why isn't the > > init properly ordered instead of band-aiding around this with a > > 'queue_ready' variable? > > > > Each call to add_disk() will trigger some interrupts, > and earlier added disks may cause the queues of later, > not-yet-completely added disks to be started. > > I suppose the init routine might be reorganized to initialize all > the queues, then have second loop call add_disk() for all > of them. Is that what you had in mind by "properly ordered?" Yep precisely, don't call add_disk() until everything is set up. > Disks may be added at run time though, and I think this tears > down all but the first disk, and re-adds them all, if I remember > right, so there is some complication there to think about. Well, other drivers manage quite fine without resorting to work-arounds :-) -- Jens Axboe