From mboxrd@z Thu Jan 1 00:00:00 1970
From: Doug Ledford
Subject: Re: possible use-after-free in 2.5.44 scsi changes
Date: Fri, 25 Oct 2002 00:06:59 -0400
Sender: linux-scsi-owner@vger.kernel.org
Message-ID: <20021025040659.GB3556@redhat.com>
References: <3DB8A0CC.1804DF79@digeo.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To: <3DB8A0CC.1804DF79@digeo.com>
List-Id: linux-scsi@vger.kernel.org
To: Andrew Morton
Cc: "linux-scsi@vger.kernel.org", Badari Pulavarty, "Martin J. Bligh", Jens Axboe

On Thu, Oct 24, 2002 at 06:39:24PM -0700, Andrew Morton wrote:
>
> Gents,
>
> we have some code in the -mm patchsets which adds a per-cpu
> LIFO pool which frontends the page allocator. To return pages
> which are cache-warm on the calling CPU.
>
> That code has been stable and unchanging since 2.5.40. But in
> 2.5.44, Badari's machines are crashing when those patches are
> applied. Memory corruption deep in the scsi softirq callbacks.
>
> There were no significant memory allocator changes between 2.5.43
> and 2.5.44, but there were a lot of scsi changes.

[ snip ]

Sorry I haven't been able to do anything since Tuesday and won't be able
to again until next week around Wednesday or so (company meeting stuff
the entire time :-(

Anyway, I've got all my current updates pushed to
linux-scsi.bkbits.net/scsi-misc-2.5 and I know James and Patrick have
some fixes in there as well. I also know someone has been changing
around the scsi merge function and the scsi init io function recently,
and it hasn't been me ;-)

In any case, the scsi code appears to be leaking memory, and that might
be partially related to this problem. If you touch any device on the
scsi bus such that it results in an actual merge between two requests,
and an sg table has to be reallocated to a larger size in order to
accommodate the combined sg table size, then it appears that the smaller
table(s) are leaked.
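A minimal userspace sketch of the leak pattern suspected above, assuming the problem is simply that the smaller table is never released once a larger one is allocated for the merged request. Everything here is hypothetical: `struct sg_table_model`, `table_alloc()`, `grow_leaky()` and `grow_fixed()` are illustration names, not scsi_mod code, and `live_tables` only stands in for the count of objects still outstanding in an allocation cache.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scatter-gather table; not the kernel's struct. */
struct sg_table_model {
    size_t nents;   /* number of entries the table can hold */
    int *ents;      /* placeholder storage for the entries */
};

/* Tables currently allocated, like objects still outstanding in a cache. */
static int live_tables;

static struct sg_table_model *table_alloc(size_t nents)
{
    struct sg_table_model *t = malloc(sizeof(*t));

    if (!t)
        return NULL;
    t->ents = calloc(nents, sizeof(*t->ents));
    if (!t->ents) {
        free(t);
        return NULL;
    }
    t->nents = nents;
    live_tables++;
    return t;
}

static void table_free(struct sg_table_model *t)
{
    assert(t != NULL);
    free(t->ents);
    free(t);
    live_tables--;
}

/* Suspected pattern: allocate a bigger table and forget the old one. */
static struct sg_table_model *grow_leaky(struct sg_table_model *old, size_t nents)
{
    (void)old;      /* the smaller table is dropped without being freed */
    return table_alloc(nents);
}

/* Corrected pattern: release the smaller table once the bigger one exists. */
static struct sg_table_model *grow_fixed(struct sg_table_model *old, size_t nents)
{
    struct sg_table_model *bigger = table_alloc(nents);

    if (!bigger)
        return old; /* keep using the old table if allocation fails */
    table_free(old);
    return bigger;
}
```

In this model, after `grow_leaky()` and a free of the returned table, `live_tables` is still 1, which is the kind of outstanding object that would stop a cache from being torn down cleanly. With `grow_fixed()` the count returns to 0.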
Just try hitting the disk with an e2fsck or similar program and then try
unloading the complete scsi stack; you should see the failure when it
attempts to free the sg table caches while unloading scsi_mod. Debugging
takes a while too, since you have to reboot the machine to be able to
load the scsi modules again :-/

If someone could look into that specific problem, it might give a clue
to this other problem (and sorry I can't do more about it myself right
now; it's what I was just starting to look into when I ran out of time
and into meetings).

-- 
Doug Ledford 919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606