From: Jeff Mahoney
Subject: Re: [patch 07/99] btrfs: Use mempools for extent_state structures
Date: Thu, 01 Dec 2011 14:55:11 -0500
Message-ID: <4ED7DB9F.4030502@suse.de>
References: <20111124003533.395674389@suse.com> <20111124004220.628076319@suse.com> <4ED42171.6090704@suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
To: Andi Kleen
Cc: Jeff Mahoney, Btrfs List
In-Reply-To: <4ED42171.6090704@suse.de>

On 11/28/2011 07:04 PM, Jeff Mahoney wrote:
> On 11/28/2011 06:53 PM, Andi Kleen wrote:
>> Jeff Mahoney writes:
>
>>> The extent_state structure is used at the core of the extent
>>> i/o code for managing flags, locking, etc. It requires
>>> allocations deep in the write code and if failures occur they
>>> are difficult to recover from.
>>>
>>> We avoid most of the failures by using a mempool, which can
>>> sleep when required, to honor the allocations. This allows
>>> future patches to convert most of the
>>> {set,clear,convert}_extent_bit and derivatives to return void.
>
>> Is this really safe?
>
>> iirc if there's any operation that needs multiple mempool
>> objects you can get into the following ABBA style deadlock:
>
>> thread 1                        thread 2
>
>> get object A from pool 1        get object C from pool 2
>> get object B from pool 2
>> pool 2 full -> block            get object D from pool 2
>                                                ^ pool 1, I assume
>>                                 pool 1 full -> block
>
>> Now for thread 1 to make progress it needs thread 2 to free its
>> object first, but that needs thread 1 to free its object also
>> first, which is a deadlock.
>
>> It would still work if there are other users which eventually
>> make progress, but if you're really unlucky either pool 1 or 2
>> is completely used by threads doing a multiple-object operation.
>> So you've got a nasty rare deadlock ...
>
> Yes, I think you're right. I think the risk is probably there and
> I'm not sure it can be totally mitigated. We'd be stuck if there is
> a string of pathological cases and both mempools are empty. The only
> way I can see to try to help the situation is to make the mempool
> fairly substantial, which is what I did here. I'd prefer to make
> the mempools per-fs, but that would require pretty heavy
> modifications in order to pass around a per-fs struct. In any case,
> the risk isn't totally eliminated.

The more I look into it, the more I think this is not an uncommon pattern
in the kernel. Device mapper draws from a number of mempools that can be
interspersed with allocations from the bio pool. Even without stacking
different types of dm devices, I think we can run into this scenario. The
DM case is probably even worse, since those allocations use GFP_NOIO
instead of the (relatively) more relaxed GFP_NOFS. I'm not saying it's ok,
just that there are similar sites already that seem easier to hit, and we
haven't heard of issues there either. It seems to be a behavior that is
largely mitigated by establishing mempools of sufficient size.
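To make the scenario concrete, here's a rough sketch of the pattern Andi
describes. The pool and function names (pool1, pool2, path_one, path_two)
are made up for illustration only and aren't real btrfs symbols:

/*
 * Two call paths, each needing one element from each of two mempools.
 * With a blockable gfp mask, mempool_alloc() sleeps until an element
 * is freed back to the pool.
 */
#include <linux/mempool.h>
#include <linux/gfp.h>

static mempool_t *pool1;	/* e.g. the extent_state pool */
static mempool_t *pool2;	/* a second pool used in the same path */

/* Thread 1: allocates from pool 1, then pool 2 */
static void path_one(void)
{
	void *a = mempool_alloc(pool1, GFP_NOFS);	/* takes the last pool-1 element */
	void *b = mempool_alloc(pool2, GFP_NOFS);	/* pool 2 exhausted -> sleeps */

	/* ... use a and b ... */

	mempool_free(b, pool2);
	mempool_free(a, pool1);
}

/* Thread 2: allocates from pool 2, then pool 1 */
static void path_two(void)
{
	void *c = mempool_alloc(pool2, GFP_NOFS);	/* takes the last pool-2 element */
	void *d = mempool_alloc(pool1, GFP_NOFS);	/* pool 1 exhausted -> sleeps */

	/* ... use c and d ... */

	mempool_free(d, pool1);
	mempool_free(c, pool2);
}

If all the pool 1 elements end up held by threads sleeping in path_one()
and all the pool 2 elements by threads sleeping in path_two(), nothing is
ever freed back and both sides sleep in mempool_alloc() indefinitely.
Making the pools larger doesn't remove that possibility; it only makes
the coincidence less likely.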
-Jeff

--
Jeff Mahoney
SUSE Labs