From: Jeff Mahoney <jeffm@suse.de>
To: Andi Kleen <andi@firstfloor.org>
Cc: Btrfs List <linux-btrfs@vger.kernel.org>
Subject: Re: [patch 07/99] btrfs: Use mempools for extent_state structures
Date: Fri, 02 Dec 2011 23:53:24 -0500
Message-ID: <4ED9AB44.60600@suse.de>
In-Reply-To: <4ED7DB9F.4030502@suse.de>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 12/01/2011 02:55 PM, Jeff Mahoney wrote:
> On 11/28/2011 07:04 PM, Jeff Mahoney wrote:
>> On 11/28/2011 06:53 PM, Andi Kleen wrote:
>>> Jeff Mahoney <jeffm@suse.com> writes:
>
>>>> The extent_state structure is used at the core of the extent
>>>> i/o code for managing flags, locking, etc. It requires
>>>> allocations deep in the write code and if failures occur
>>>> they are difficult to recover from.
>>>>
>>>> We avoid most of the failures by using a mempool, which can
>>>> sleep when required, to honor the allocations. This allows
>>>> future patches to convert most of the
>>>> {set,clear,convert}_extent_bit and derivatives to return
>>>> void.
>
>>> Is this really safe?
>
>>> iirc if there's any operation that needs multiple mempool
>>> objects you can get into the following ABBA style deadlock:
>
>>> thread 1                      thread 2
>
>>> get object A from pool 1      get object C from pool 2
>>> get object B from pool 2
>>> pool 2 full -> block
>>>                               get object D from pool 2
>>                                                  ^ pool1, i assume
>>>                               pool1 full -> block
>
>
>>> Now for thread 1 to make progress it needs thread 2 to free its
>>> object first, but that needs thread 1 to free its object also
>>> first, which is a deadlock.
>
>>> It would still work if there are other users which eventually
>>> make progress, but if you're really unlucky either pool 1 or 2
>>> is completely used by threads doing a multiple object operation.
>>> So you got a nasty rare deadlock ...
>
>> Yes, I think you're right. I think the risk is probably there
>> and I'm not sure it can be totally mitigated. We'd be stuck if
>> there is a string of pathological cases and both mempools are
>> empty. The only way I can see to try to help the situation is to
>> make the mempool fairly substantial, which is what I did here.
>> I'd prefer to make the mempools per-fs but that would require
>> pretty heavy modifications in order to pass around a per-fs
>> struct. In any case, the risk isn't totally eliminated.
>
> The more I look into it, the more I don't think this is an
> uncommon scenario in the kernel. Device mapper draws from a number
> of mempools that can be interspersed with allocations from the bio
> pool. Even without stacking different types of dm devices, I think
> we can run into this scenario. The DM scenario is probably even
> worse since the allocs are GFP_NOIO instead of the (relatively)
> more relaxed GFP_NOFS.
>
> I'm not saying it's ok, just that there are similar sites already
> that seem to be easier to hit but we haven't heard of issues there
> either. It seems to be a behavior that is largely mitigated by
> establishing mempools of sufficient size.
... and the next installment:
My understanding of mempools was inaccurate: the pathological case
can be hit much more quickly than I anticipated, or at least has a
performance impact I'd rather not see. It turns out that the pool will
be depleted *before* the other pathological cases are hit. In fact,
the pool is drawn from before reclaim starts at all.
Rather than:

  normal alloc with gfp_t
  (normal alloc fails after trying reclaim)
  pool alloc
  (pool alloc fails)
  (wait, timeout, retry)

it is:

  normal alloc with gfp_t & ~(__GFP_WAIT|__GFP_IO)
  (which is GFP_NOFS & ~GFP_NOFS for the common case)

... so:

  normal alloc with GFP_NOWAIT
  (normal alloc fails) (much more often)
  pool alloc
  (pool alloc fails)
  ((wait until an object is freed or timeout) and retry with GFP_NOFS)
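
The mask arithmetic above can be checked with a small userspace sketch.
This is not kernel code: the TOY_* names and their bit values are
assumptions chosen only to mirror the relationship described here
(GFP_NOFS being __GFP_WAIT|__GFP_IO, and GFP_NOWAIT having none of the
three bits set), and first_attempt_mask() is a hypothetical helper.

```c
#include <assert.h>

/* Illustrative flag bits. The values are assumptions, not kernel ABI. */
#define TOY_GFP_WAIT 0x10u /* caller may sleep */
#define TOY_GFP_IO   0x40u /* caller may start I/O */
#define TOY_GFP_FS   0x80u /* caller may call into the filesystem */

#define TOY_GFP_NOFS   (TOY_GFP_WAIT | TOY_GFP_IO)
#define TOY_GFP_KERNEL (TOY_GFP_WAIT | TOY_GFP_IO | TOY_GFP_FS)
#define TOY_GFP_NOWAIT 0u

/*
 * Mask used for the first, opportunistic allocation attempt:
 * the caller's mask with the "may sleep" and "may do I/O" bits removed.
 */
static unsigned int first_attempt_mask(unsigned int gfp_mask)
{
	return gfp_mask & ~(TOY_GFP_WAIT | TOY_GFP_IO);
}
```

With these assumed values, a GFP_NOFS caller ends up making its first
attempt with an empty mask, which is why that attempt cannot enter
reclaim and fails much more often than a sleeping allocation would.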
So, using mempools for all cases as I'd hoped isn't what I want. I'll
work up another set of patches that use a straight kmem_cache_alloc
where possible. The good news is that slab-backed mempools can be
replenished with objects from the slab that weren't allocated from the
pool. So there's that.
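
A toy userspace model of that replenishing behaviour, as a sketch only:
struct toy_pool, TOY_MIN_NR and the toy_pool_* helpers are hypothetical
names, malloc()/free() stand in for the slab, and locking and waiting
are left out entirely. The point it illustrates is that the free path
refills the reserve first, regardless of where the object came from.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MIN_NR 2 /* reserve size; an arbitrary value for the sketch */

struct toy_pool {
	int curr_nr;                /* elements currently held in reserve */
	void *elements[TOY_MIN_NR]; /* the reserve itself */
};

/* Fill the reserve up front. Returns 0 on success, -1 on failure. */
static int toy_pool_init(struct toy_pool *pool, size_t size)
{
	pool->curr_nr = 0;
	while (pool->curr_nr < TOY_MIN_NR) {
		void *obj = malloc(size);

		if (!obj)
			return -1;
		pool->elements[pool->curr_nr++] = obj;
	}
	return 0;
}

/* Take one element out of the reserve, or NULL if it is empty. */
static void *toy_pool_take(struct toy_pool *pool)
{
	return pool->curr_nr > 0 ? pool->elements[--pool->curr_nr] : NULL;
}

/*
 * Free path: top up the reserve first, and only hand the object back
 * to the backing allocator once the reserve is full again.
 */
static void toy_pool_free(struct toy_pool *pool, void *obj)
{
	if (pool->curr_nr < TOY_MIN_NR)
		pool->elements[pool->curr_nr++] = obj;
	else
		free(obj);
}
```

In this model an object obtained directly from malloc() and later
passed to toy_pool_free() lands in the reserve whenever the reserve is
below TOY_MIN_NR, so a drained pool recovers as soon as any object of
that type is freed.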
- -Jeff
- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iQIcBAEBAgAGBQJO2atEAAoJEB57S2MheeWy2CYP/iix4znJVqBkN8wUg31MELGB
UpodfZkwlhrD/8PfJrisjA1wLLKpK5GzJ0m6rhKCYvkBau28yCSAWGW9sJIfGM0I
4fLAoupncoJqmYyQvrZIczxztDVwMu37bL7fNfCkC64HoaETChQbaDbrwu+L8Rn2
Lls7EtkUNWrpY+J4F0zq9b8eE46WjbpZBKhI/PuqAE0LfSXK8nd46Z5qA/MM7fab
tFftC0AyBLP2dq8R0H1IX0yjri6YD0xwwfdHdbzVAoJ/tINVbfiQxntYyONgNnBF
6sNogCtcskpTSDHZyVK7ATuJJL6ZAIFO0ZUJjZYFc+q0q1oYMfmbhNs9Qq5le7bZ
Ig6pcMgHqGOqkis95jqzgStl2A1OIYPZsn6K1329N44fvZ8PjxDVCHS83FzBY0qw
YuEgBNd8vRrdjtcMax2QOs9yaSmvEXDkws1+tLWg/ZV0Ik8crbW0ctVqsyDpWwPN
2dSyrlxGbAJyXzELUg498dLORw+chHokUhsEvwYEmL1HGLsZGFoXAV3H5df4v6Mx
wsKZeT+Nsp16CyYOXVCAWGRyp6FPDWECJAUxygjyEaIGWmiJjMKU1GizL9J4fgc+
59fantrAMbFwPpRwJxfHumCdnQOi55qtrcP1UvCB3o9jBH4QsUofP4sGQsIAw4oQ
ctuvtI1R7zcub0906BwP
=h+a2
-----END PGP SIGNATURE-----