* sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
@ 2009-06-08 2:37 Wu Fengguang
2009-06-08 4:55 ` KOSAKI Motohiro
2009-07-06 10:52 ` Herbert Xu
0 siblings, 2 replies; 17+ messages in thread
From: Wu Fengguang @ 2009-06-08 2:37 UTC (permalink / raw)
To: LKML; +Cc: linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
Hi,
This lockdep warning appears when doing stress memory tests over NFS.
page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim
Any ideas?
Thanks,
Fengguang
---
[ 1630.751276] NFS: Server wrote zero bytes, expected 4096.
[ 1637.984875]
[ 1637.984878] =================================
[ 1637.987429] [ INFO: inconsistent lock state ]
[ 1637.987429] 2.6.30-rc8-mm1 #299
[ 1637.987429] ---------------------------------
[ 1637.987429] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 1637.987429] kswapd0/387 [HC0[0]:SC0[1]:HE1:SE0] takes:
[ 1637.987429] (sk_lock-AF_INET-RPC){+.+.?.}, at: [<ffffffff81458972>] tcp_sendmsg+0x22/0xbc0
[ 1637.987429] {RECLAIM_FS-ON-W} state was registered at:
[ 1637.987429] [<ffffffff81079b18>] mark_held_locks+0x68/0x90
[ 1637.987429] [<ffffffff81079c35>] lockdep_trace_alloc+0xf5/0x100
[ 1637.987429] [<ffffffff810c7f55>] __alloc_pages_nodemask+0x95/0x6c0
[ 1637.987429] [<ffffffff810f71f9>] __slab_alloc_page+0xb9/0x3b0
[ 1637.987429] [<ffffffff810f8596>] kmem_cache_alloc_node+0x166/0x200
[ 1637.987429] [<ffffffff81423cba>] __alloc_skb+0x4a/0x160
[ 1637.987429] [<ffffffff81466da6>] tcp_send_fin+0x86/0x1a0
[ 1637.987429] [<ffffffff814573c0>] tcp_close+0x3f0/0x4b0
[ 1637.987429] [<ffffffff81478dd2>] inet_release+0x42/0x70
[ 1637.987429] [<ffffffff8141b2d4>] sock_release+0x24/0x90
[ 1637.987429] [<ffffffff814d8fd8>] xs_reset_transport+0xb8/0xd0
[ 1637.987429] [<ffffffff814d900d>] xs_close+0x1d/0x60
[ 1637.987429] [<ffffffff814d9082>] xs_destroy+0x32/0xa0
[ 1637.987429] [<ffffffff814d69be>] xprt_destroy+0x6e/0x90
[ 1637.987429] [<ffffffff8126ed77>] kref_put+0x37/0x70
[ 1637.987429] [<ffffffff814d6940>] xprt_put+0x10/0x20
[ 1637.987429] [<ffffffff814d5e2b>] rpc_free_client+0x8b/0x100
[ 1637.987429] [<ffffffff8126ed77>] kref_put+0x37/0x70
[ 1637.987429] [<ffffffff814d5ee1>] rpc_free_auth+0x41/0x70
[ 1637.987429] [<ffffffff8126ed77>] kref_put+0x37/0x70
[ 1637.987429] [<ffffffff814d5d5e>] rpc_release_client+0x2e/0x70
[ 1637.987429] [<ffffffff814d5f5c>] rpc_shutdown_client+0x4c/0xf0
[ 1637.987429] [<ffffffff814e52b1>] rpcb_getport_sync+0xa1/0xf0
[ 1637.987429] [<ffffffff81814f5f>] nfs_root_data+0x3a9/0x40a
[ 1637.987429] [<ffffffff817f63ae>] mount_root+0x1f/0x141
[ 1637.987429] [<ffffffff817f65c8>] prepare_namespace+0xf8/0x190
[ 1637.987429] [<ffffffff817f5728>] kernel_init+0x1b5/0x1d2
[ 1637.987429] [<ffffffff8100d0ca>] child_rip+0xa/0x20
[ 1637.987429] [<ffffffffffffffff>] 0xffffffffffffffff
[ 1637.987429] irq event stamp: 285158
[ 1637.987429] hardirqs last enabled at (285157): [<ffffffff81544cff>] _spin_unlock_irqrestore+0x3f/0x70
[ 1637.987429] hardirqs last disabled at (285156): [<ffffffff8154506d>] _spin_lock_irqsave+0x2d/0x90
[ 1637.987429] softirqs last enabled at (285152): [<ffffffff814d7f8f>] xprt_transmit+0x1bf/0x2d0
[ 1637.987429] softirqs last disabled at (285158): [<ffffffff81544f67>] _spin_lock_bh+0x17/0x70
[ 1637.987429]
[ 1637.987429] other info that might help us debug this:
[ 1637.987429] no locks held by kswapd0/387.
[ 1637.987429]
[ 1637.987429] stack backtrace:
[ 1637.987429] Pid: 387, comm: kswapd0 Not tainted 2.6.30-rc8-mm1 #299
[ 1637.987429] Call Trace:
[ 1637.987429] [<ffffffff810793bc>] print_usage_bug+0x18c/0x1f0
[ 1638.251441] [<ffffffff810798bf>] mark_lock+0x49f/0x690
[ 1638.259418] [<ffffffff8107a310>] ? check_usage_forwards+0x0/0xc0
[ 1638.267420] [<ffffffff8107b039>] __lock_acquire+0x289/0x1b40
[ 1638.267420] [<ffffffff810127f0>] ? native_sched_clock+0x20/0x80
[ 1638.277673] [<ffffffff8107c9d1>] lock_acquire+0xe1/0x120
[ 1638.277673] [<ffffffff81458972>] ? tcp_sendmsg+0x22/0xbc0
[ 1638.287418] [<ffffffff8141d9b5>] lock_sock_nested+0x105/0x120
[ 1638.287418] [<ffffffff81458972>] ? tcp_sendmsg+0x22/0xbc0
[ 1638.287418] [<ffffffff810127f0>] ? native_sched_clock+0x20/0x80
[ 1638.287418] [<ffffffff81458972>] tcp_sendmsg+0x22/0xbc0
[ 1638.287418] [<ffffffff81077944>] ? find_usage_forwards+0x94/0xd0
[ 1638.287418] [<ffffffff81077944>] ? find_usage_forwards+0x94/0xd0
[ 1638.287418] [<ffffffff8141ad2f>] sock_sendmsg+0xdf/0x110
[ 1638.287418] [<ffffffff81077944>] ? find_usage_forwards+0x94/0xd0
[ 1638.287418] [<ffffffff81066ae0>] ? autoremove_wake_function+0x0/0x40
[ 1638.287418] [<ffffffff8107a36e>] ? check_usage_forwards+0x5e/0xc0
[ 1638.287418] [<ffffffff8107966e>] ? mark_lock+0x24e/0x690
[ 1638.287418] [<ffffffff8141b0b4>] kernel_sendmsg+0x34/0x50
[ 1638.287418] [<ffffffff814d9234>] xs_send_kvec+0x94/0xa0
[ 1638.287418] [<ffffffff81079e55>] ? trace_hardirqs_on_caller+0x155/0x1a0
[ 1638.287418] [<ffffffff814d92bd>] xs_sendpages+0x7d/0x220
[ 1638.287418] [<ffffffff814d95b9>] xs_tcp_send_request+0x59/0x190
[ 1638.287418] [<ffffffff814d7e4e>] xprt_transmit+0x7e/0x2d0
[ 1638.287418] [<ffffffff814d4eb8>] call_transmit+0x1c8/0x2a0
[ 1638.287418] [<ffffffff814dccb2>] __rpc_execute+0xb2/0x2b0
[ 1638.287418] [<ffffffff814dced8>] rpc_execute+0x28/0x30
[ 1638.403414] [<ffffffff814d5b5b>] rpc_run_task+0x3b/0x80
[ 1638.403414] [<ffffffff811c7b6d>] nfs_write_rpcsetup+0x1ad/0x250
[ 1638.403414] [<ffffffff811c9b69>] nfs_flush_one+0xb9/0x100
[ 1638.419417] [<ffffffff811c3f82>] nfs_pageio_doio+0x32/0x70
[ 1638.419417] [<ffffffff811c3fc9>] nfs_pageio_complete+0x9/0x10
[ 1638.427413] [<ffffffff811c7ee5>] nfs_writepage_locked+0x85/0xc0
[ 1638.435414] [<ffffffff811c9ab0>] ? nfs_flush_one+0x0/0x100
[ 1638.435414] [<ffffffff81079e55>] ? trace_hardirqs_on_caller+0x155/0x1a0
[ 1638.435414] [<ffffffff811c8509>] nfs_writepage+0x19/0x40
[ 1638.435414] [<ffffffff810ce005>] shrink_page_list+0x675/0x810
[ 1638.435414] [<ffffffff810127f0>] ? native_sched_clock+0x20/0x80
[ 1638.435414] [<ffffffff810ce761>] shrink_list+0x301/0x650
[ 1638.435414] [<ffffffff810ced23>] shrink_zone+0x273/0x370
[ 1638.435414] [<ffffffff810cf9f9>] kswapd+0x729/0x7a0
[ 1638.435414] [<ffffffff810cca80>] ? isolate_pages_global+0x0/0x250
[ 1638.435414] [<ffffffff81066ae0>] ? autoremove_wake_function+0x0/0x40
[ 1638.435414] [<ffffffff810cf2d0>] ? kswapd+0x0/0x7a0
[ 1638.435414] [<ffffffff810666de>] kthread+0x9e/0xb0
[ 1638.435414] [<ffffffff8100d0ca>] child_rip+0xa/0x20
[ 1638.435414] [<ffffffff8100ca90>] ? restore_args+0x0/0x30
[ 1638.435414] [<ffffffff81066640>] ? kthread+0x0/0xb0
[ 1638.435414] [<ffffffff8100d0c0>] ? child_rip+0x0/0x20

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: KOSAKI Motohiro @ 2009-06-08 4:55 UTC (permalink / raw)
To: Wu Fengguang
Cc: kosaki.motohiro-+CUm20s59erQFUHtdCDX3A, LKML,
    linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA

Hi

> Hi,
>
> This lockdep warning appears when doing stress memory tests over NFS.
>
> page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
>
> tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim
>
> Any ideas?

AFAIK, btrfs has re-dirty hack.

------------------------------------------------------------------
static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
{
	struct extent_io_tree *tree;

	if (current->flags & PF_MEMALLOC) {
		redirty_page_for_writepage(wbc, page);
		unlock_page(page);
		return 0;
	}
	tree = &BTRFS_I(page->mapping->host)->io_tree;
	return extent_write_full_page(tree, page, btrfs_get_extent, wbc);
}
---------------------------------------------------------------

PF_MEMALLOC mean caller is try_to_free_pages(). (not normal write nor kswapd)
Can't nfs does similar hack?

I'm not net nor nfs expert. perhaps I'm wrong :-)

Thanks.
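
For concreteness, a minimal sketch of what KOSAKI's suggestion would look like if transplanted into NFS. This is purely illustrative and not a patch from this thread: the non-guarded part of nfs_writepage() is only approximated from kernels of this era, and whether redirtying from reclaim context is an acceptable trade-off for NFS is exactly what the replies below go on to debate.

------------------------------------------------------------------
/*
 * Hypothetical adaptation of the btrfs PF_MEMALLOC guard to NFS.
 * Only the PF_MEMALLOC branch is new; the rest approximates the
 * existing nfs_writepage() of this era.
 */
static int nfs_writepage(struct page *page, struct writeback_control *wbc)
{
	int ret;

	if (current->flags & PF_MEMALLOC) {
		/*
		 * Entered from page reclaim: don't issue an RPC (and the
		 * socket allocations behind it) here; put the page back on
		 * the dirty list and let ordinary writeback flush it.
		 */
		redirty_page_for_writepage(wbc, page);
		unlock_page(page);
		return 0;
	}

	ret = nfs_writepage_locked(page, wbc);
	unlock_page(page);
	return ret;
}
------------------------------------------------------------------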

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: Wu Fengguang @ 2009-06-08 5:00 UTC (permalink / raw)
To: KOSAKI Motohiro; +Cc: LKML, linux-nfs@vger.kernel.org, netdev@vger.kernel.org

On Mon, Jun 08, 2009 at 12:55:18PM +0800, KOSAKI Motohiro wrote:
> Hi
>
> > Hi,
> >
> > This lockdep warning appears when doing stress memory tests over NFS.
> >
> > page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
> >
> > tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim
> >
> > Any ideas?
>
> AFAIK, btrfs has re-dirty hack.
>
> ------------------------------------------------------------------
> static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
> {
> 	struct extent_io_tree *tree;
>
> 	if (current->flags & PF_MEMALLOC) {
> 		redirty_page_for_writepage(wbc, page);
> 		unlock_page(page);
> 		return 0;
> 	}
> 	tree = &BTRFS_I(page->mapping->host)->io_tree;
> 	return extent_write_full_page(tree, page, btrfs_get_extent, wbc);
> }
> ---------------------------------------------------------------
>
> PF_MEMALLOC mean caller is try_to_free_pages(). (not normal write nor kswapd)
> Can't nfs does similar hack?

But the trace shows that current is kswapd:

[ 1638.403414] [<ffffffff811c9b69>] nfs_flush_one+0xb9/0x100
[ 1638.419417] [<ffffffff811c3f82>] nfs_pageio_doio+0x32/0x70
[ 1638.419417] [<ffffffff811c3fc9>] nfs_pageio_complete+0x9/0x10
[ 1638.427413] [<ffffffff811c7ee5>] nfs_writepage_locked+0x85/0xc0
[ 1638.435414] [<ffffffff811c8509>] nfs_writepage+0x19/0x40
[ 1638.435414] [<ffffffff810ce005>] shrink_page_list+0x675/0x810
[ 1638.435414] [<ffffffff810ce761>] shrink_list+0x301/0x650
[ 1638.435414] [<ffffffff810ced23>] shrink_zone+0x273/0x370
[ 1638.435414] [<ffffffff810cf9f9>] kswapd+0x729/0x7a0
[ 1638.435414] [<ffffffff810666de>] kthread+0x9e/0xb0
[ 1638.435414] [<ffffffff8100d0ca>] child_rip+0xa/0x20

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: KOSAKI Motohiro @ 2009-06-08 5:07 UTC (permalink / raw)
To: Wu Fengguang
Cc: kosaki.motohiro-+CUm20s59erQFUHtdCDX3A, LKML,
    linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> On Mon, Jun 08, 2009 at 12:55:18PM +0800, KOSAKI Motohiro wrote:
> > Hi
> >
> > > Hi,
> > >
> > > This lockdep warning appears when doing stress memory tests over NFS.
> > >
> > > page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
> > >
> > > tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim
> > >
> > > Any ideas?
> >
> > AFAIK, btrfs has re-dirty hack.
> >
> > ------------------------------------------------------------------
> > static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
> > {
> > 	struct extent_io_tree *tree;
> >
> > 	if (current->flags & PF_MEMALLOC) {
> > 		redirty_page_for_writepage(wbc, page);
> > 		unlock_page(page);
> > 		return 0;
> > 	}
> > 	tree = &BTRFS_I(page->mapping->host)->io_tree;
> > 	return extent_write_full_page(tree, page, btrfs_get_extent, wbc);
> > }
> > ---------------------------------------------------------------
> >
> > PF_MEMALLOC mean caller is try_to_free_pages(). (not normal write nor kswapd)
> > Can't nfs does similar hack?
>
> But the trace shows that current is kswapd:
>
> [ 1638.403414] [<ffffffff811c9b69>] nfs_flush_one+0xb9/0x100
> [ 1638.419417] [<ffffffff811c3f82>] nfs_pageio_doio+0x32/0x70
> [ 1638.419417] [<ffffffff811c3fc9>] nfs_pageio_complete+0x9/0x10
> [ 1638.427413] [<ffffffff811c7ee5>] nfs_writepage_locked+0x85/0xc0
> [ 1638.435414] [<ffffffff811c8509>] nfs_writepage+0x19/0x40
> [ 1638.435414] [<ffffffff810ce005>] shrink_page_list+0x675/0x810
> [ 1638.435414] [<ffffffff810ce761>] shrink_list+0x301/0x650
> [ 1638.435414] [<ffffffff810ced23>] shrink_zone+0x273/0x370
> [ 1638.435414] [<ffffffff810cf9f9>] kswapd+0x729/0x7a0
> [ 1638.435414] [<ffffffff810666de>] kthread+0x9e/0xb0
> [ 1638.435414] [<ffffffff8100d0ca>] child_rip+0xa/0x20

kswapd can't hold sk-lock before calling reclaim. Thus, we don't need
care its bogus warning, I think.

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: Wu Fengguang @ 2009-06-08 5:53 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: LKML, linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Mon, Jun 08, 2009 at 01:07:26PM +0800, KOSAKI Motohiro wrote:
> > On Mon, Jun 08, 2009 at 12:55:18PM +0800, KOSAKI Motohiro wrote:
> > > Hi
> > >
> > > > Hi,
> > > >
> > > > This lockdep warning appears when doing stress memory tests over NFS.
> > > >
> > > > page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
> > > >
> > > > tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim
> > > >
> > > > Any ideas?
> > >
> > > AFAIK, btrfs has re-dirty hack.
> > >
> > > ------------------------------------------------------------------
> > > static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
> > > {
> > > 	struct extent_io_tree *tree;
> > >
> > > 	if (current->flags & PF_MEMALLOC) {
> > > 		redirty_page_for_writepage(wbc, page);
> > > 		unlock_page(page);
> > > 		return 0;
> > > 	}
> > > 	tree = &BTRFS_I(page->mapping->host)->io_tree;
> > > 	return extent_write_full_page(tree, page, btrfs_get_extent, wbc);
> > > }
> > > ---------------------------------------------------------------
> > >
> > > PF_MEMALLOC mean caller is try_to_free_pages(). (not normal write nor kswapd)
> > > Can't nfs does similar hack?
> >
> > But the trace shows that current is kswapd:
> >
> > [ 1638.403414] [<ffffffff811c9b69>] nfs_flush_one+0xb9/0x100
> > [ 1638.419417] [<ffffffff811c3f82>] nfs_pageio_doio+0x32/0x70
> > [ 1638.419417] [<ffffffff811c3fc9>] nfs_pageio_complete+0x9/0x10
> > [ 1638.427413] [<ffffffff811c7ee5>] nfs_writepage_locked+0x85/0xc0
> > [ 1638.435414] [<ffffffff811c8509>] nfs_writepage+0x19/0x40
> > [ 1638.435414] [<ffffffff810ce005>] shrink_page_list+0x675/0x810
> > [ 1638.435414] [<ffffffff810ce761>] shrink_list+0x301/0x650
> > [ 1638.435414] [<ffffffff810ced23>] shrink_zone+0x273/0x370
> > [ 1638.435414] [<ffffffff810cf9f9>] kswapd+0x729/0x7a0
> > [ 1638.435414] [<ffffffff810666de>] kthread+0x9e/0xb0
> > [ 1638.435414] [<ffffffff8100d0ca>] child_rip+0xa/0x20
>
> kswapd can't hold sk-lock before calling reclaim. Thus, we don't need
> care its bogus warning, I think.

Right. Although this path is possible:
        tcp_sendmsg() => page reclaim => tcp_send_fin()
But it won't happen for the same socket, so one sk_lock won't be
grabbed twice and go deadlock.

So it's a harmful warning for both direct/background page reclaims?

Thanks,
Fengguang

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: Wu Fengguang @ 2009-06-08 5:56 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: LKML, linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Mon, Jun 08, 2009 at 01:53:26PM +0800, Wu Fengguang wrote:
> On Mon, Jun 08, 2009 at 01:07:26PM +0800, KOSAKI Motohiro wrote:
> > > On Mon, Jun 08, 2009 at 12:55:18PM +0800, KOSAKI Motohiro wrote:
> > > > Hi
> > > >
> > > > > Hi,
> > > > >
> > > > > This lockdep warning appears when doing stress memory tests over NFS.
> > > > >
> > > > > page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
> > > > >
> > > > > tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim
> > > > >
> > > > > Any ideas?
> > > >
> > > > AFAIK, btrfs has re-dirty hack.
> > > >
> > > > ------------------------------------------------------------------
> > > > static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
> > > > {
> > > > 	struct extent_io_tree *tree;
> > > >
> > > > 	if (current->flags & PF_MEMALLOC) {
> > > > 		redirty_page_for_writepage(wbc, page);
> > > > 		unlock_page(page);
> > > > 		return 0;
> > > > 	}
> > > > 	tree = &BTRFS_I(page->mapping->host)->io_tree;
> > > > 	return extent_write_full_page(tree, page, btrfs_get_extent, wbc);
> > > > }
> > > > ---------------------------------------------------------------
> > > >
> > > > PF_MEMALLOC mean caller is try_to_free_pages(). (not normal write nor kswapd)
> > > > Can't nfs does similar hack?
> > >
> > > But the trace shows that current is kswapd:
> > >
> > > [ 1638.403414] [<ffffffff811c9b69>] nfs_flush_one+0xb9/0x100
> > > [ 1638.419417] [<ffffffff811c3f82>] nfs_pageio_doio+0x32/0x70
> > > [ 1638.419417] [<ffffffff811c3fc9>] nfs_pageio_complete+0x9/0x10
> > > [ 1638.427413] [<ffffffff811c7ee5>] nfs_writepage_locked+0x85/0xc0
> > > [ 1638.435414] [<ffffffff811c8509>] nfs_writepage+0x19/0x40
> > > [ 1638.435414] [<ffffffff810ce005>] shrink_page_list+0x675/0x810
> > > [ 1638.435414] [<ffffffff810ce761>] shrink_list+0x301/0x650
> > > [ 1638.435414] [<ffffffff810ced23>] shrink_zone+0x273/0x370
> > > [ 1638.435414] [<ffffffff810cf9f9>] kswapd+0x729/0x7a0
> > > [ 1638.435414] [<ffffffff810666de>] kthread+0x9e/0xb0
> > > [ 1638.435414] [<ffffffff8100d0ca>] child_rip+0xa/0x20
> >
> > kswapd can't hold sk-lock before calling reclaim. Thus, we don't need
> > care its bogus warning, I think.
>
> Right. Although this path is possible:
>         tcp_sendmsg() => page reclaim => tcp_send_fin()
> But it won't happen for the same socket, so one sk_lock won't be
> grabbed twice and go deadlock.
>
> So it's a harmful warning for both direct/background page reclaims?

btw, can anyone explain these NFS warnings? It happens in a very
memory tight and busy nfsroot system.

[ 113.267340] NFS: Server wrote zero bytes, expected 3671.
[ 423.202607] NFS: Server wrote zero bytes, expected 108.
[ 723.588411] NFS: Server wrote zero bytes, expected 560.
[ 1060.246747] NFS: Server wrote zero bytes, expected 54.
[ 1397.841183] NFS: Server wrote zero bytes, expected 402.
[ 1779.545035] NFS: Server wrote zero bytes, expected 319.

Thanks,
Fengguang

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: KOSAKI Motohiro @ 2009-06-08 6:12 UTC (permalink / raw)
To: Wu Fengguang
Cc: kosaki.motohiro-+CUm20s59erQFUHtdCDX3A, LKML,
    linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

> btw, can anyone explain these NFS warnings? It happens in a very
> memory tight and busy nfsroot system.
>
> [ 113.267340] NFS: Server wrote zero bytes, expected 3671.
> [ 423.202607] NFS: Server wrote zero bytes, expected 108.
> [ 723.588411] NFS: Server wrote zero bytes, expected 560.
> [ 1060.246747] NFS: Server wrote zero bytes, expected 54.
> [ 1397.841183] NFS: Server wrote zero bytes, expected 402.
> [ 1779.545035] NFS: Server wrote zero bytes, expected 319.

server side write function is below

-----------------------------------------------------------
static __be32
nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
				loff_t offset, struct kvec *vec, int vlen,
				unsigned long *cnt, int *stablep)
{
	(snip)

	host_err = vfs_writev(file, (struct iovec __user *)vec, vlen, &offset);	// (1)

	(snip)

	/*
	 * Gathered writes: If another process is currently
	 * writing to the file, there's a high chance
	 * this is another nfsd (triggered by a bulk write
	 * from a client's biod). Rather than syncing the
	 * file with each write request, we sleep for 10 msec.
	 *
	 * I don't know if this roughly approximates
	 * C. Juszak's idea of gathered writes, but it's a
	 * nice and simple solution (IMHO), and it seems to
	 * work:-)
	 */
	if (EX_WGATHER(exp)) {
		(snip)
		if (inode->i_state & I_DIRTY) {
			dprintk("nfsd: write sync %d\n", task_pid_nr(current));
			host_err=nfsd_sync(file);	// (2)
		}
	}

	(snip)

	dprintk("nfsd: write complete host_err=%d\n", host_err);
	if (host_err >= 0) {
		err = 0;
		*cnt = host_err;
	} else
		err = nfserrno(host_err);
out:
	return err;
}
---------------------------------------------------------------------------

if (1) or (2) makes host_err == 0, it makes your warning messages.

Thanks.

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: Wu Fengguang @ 2009-06-09 3:07 UTC (permalink / raw)
To: KOSAKI Motohiro
Cc: LKML, linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org

On Mon, Jun 08, 2009 at 12:55:18PM +0800, KOSAKI Motohiro wrote:
> Hi
>
> > Hi,
> >
> > This lockdep warning appears when doing stress memory tests over NFS.
> >
> > page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
> >
> > tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim
> >
> > Any ideas?
>
> AFAIK, btrfs has re-dirty hack.
>
> ------------------------------------------------------------------
> static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
> {
> 	struct extent_io_tree *tree;
>
> 	if (current->flags & PF_MEMALLOC) {
> 		redirty_page_for_writepage(wbc, page);
> 		unlock_page(page);
> 		return 0;
> 	}
> 	tree = &BTRFS_I(page->mapping->host)->io_tree;
> 	return extent_write_full_page(tree, page, btrfs_get_extent, wbc);
> }
> ---------------------------------------------------------------
>
> PF_MEMALLOC mean caller is try_to_free_pages(). (not normal write nor kswapd)

No, kswapd also sets the PF_MEMALLOC flag. It looks like btrfs_writepage()
is trying to avoid inefficient page outs at the cost of pinning dirty
pages in memory (even when we really want free pages).

Thanks,
Fengguang

> Can't nfs does similar hack?
>
> I'm not net nor nfs expert. perhaps I'm wrong :-)
>
> Thanks.
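
Wu's correction is grounded in the kswapd thread function itself, which marks the whole kswapd task with PF_MEMALLOC at startup. Roughly, an abbreviated excerpt in the spirit of mm/vmscan.c of this era, with surrounding context elided:

------------------------------------------------------------------
static int kswapd(void *p)
{
	struct task_struct *tsk = current;

	/* ... setup elided ... */

	/*
	 * kswapd tags itself as a memory reclaimer, so a PF_MEMALLOC
	 * test in a ->writepage() catches both kswapd and direct
	 * reclaim (try_to_free_pages()), not direct reclaim alone.
	 */
	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;

	/* ... balance_pgdat() loop elided ... */
}
------------------------------------------------------------------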

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: KOSAKI Motohiro @ 2009-06-09 3:15 UTC (permalink / raw)
To: Wu Fengguang
Cc: kosaki.motohiro-+CUm20s59erQFUHtdCDX3A, LKML,
    linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    linux-btrfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    chris.mason-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org

> On Mon, Jun 08, 2009 at 12:55:18PM +0800, KOSAKI Motohiro wrote:
> > Hi
> >
> > > Hi,
> > >
> > > This lockdep warning appears when doing stress memory tests over NFS.
> > >
> > > page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
> > >
> > > tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim
> > >
> > > Any ideas?
> >
> > AFAIK, btrfs has re-dirty hack.
> >
> > ------------------------------------------------------------------
> > static int btrfs_writepage(struct page *page, struct writeback_control *wbc)
> > {
> > 	struct extent_io_tree *tree;
> >
> > 	if (current->flags & PF_MEMALLOC) {
> > 		redirty_page_for_writepage(wbc, page);
> > 		unlock_page(page);
> > 		return 0;
> > 	}
> > 	tree = &BTRFS_I(page->mapping->host)->io_tree;
> > 	return extent_write_full_page(tree, page, btrfs_get_extent, wbc);
> > }
> > ---------------------------------------------------------------
> >
> > PF_MEMALLOC mean caller is try_to_free_pages(). (not normal write nor kswapd)
>
> No, kswapd also sets the PF_MEMALLOC flag. It looks like btrfs_writepage()
> is trying to avoid inefficient page outs at the cost of pinning dirty
> pages in memory (even when we really want free pages).

Sorry, I was confused ;)

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: Herbert Xu @ 2009-07-06 10:52 UTC (permalink / raw)
To: Wu Fengguang
Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
    netdev-u79uwXL29TY76Z2rM5mHXA, David S. Miller

Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
>
> This lockdep warning appears when doing stress memory tests over NFS.
>
> page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
>
> tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim

Well perhaps not this particular path, but it is certainly possible
if an existing NFS socket dies and NFS tries to reestablish it.

I suggest that NFS should utilise the sk_allocation field and
set an appropriate value.  Note that you may have to patch TCP
so that it uses sk_allocation everywhere necessary, e.g., in
tcp_send_fin.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
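
As the next reply points out, the sunrpc transport already assigns sk_allocation when it brings its TCP socket up. Schematically, the relevant piece of xs_tcp_finish_connecting() in net/sunrpc/xprtsock.c looks like the sketch below; the signature and the surrounding callback setup are simplified, so this is not a verbatim excerpt:

------------------------------------------------------------------
static void xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
{
	struct sock *sk = sock->sk;

	sk->sk_user_data  = xprt;
	/*
	 * RPC sends can run under memory pressure / from reclaim context,
	 * so the transport asks for non-blocking socket allocations.
	 */
	sk->sk_allocation = GFP_ATOMIC;

	/* ... install sk_data_ready / sk_state_change / sk_write_space ... */
}
------------------------------------------------------------------

Herbert's point is that this field only helps if the TCP paths reachable from NFS actually honour it instead of hard-coding GFP_KERNEL, which is what the patch in the next message addresses.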

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: Wu Fengguang @ 2009-07-09 13:17 UTC (permalink / raw)
To: Herbert Xu
Cc: linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
    netdev@vger.kernel.org, David S. Miller

On Mon, Jul 06, 2009 at 06:52:16PM +0800, Herbert Xu wrote:
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> >
> > This lockdep warning appears when doing stress memory tests over NFS.
> >
> > page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
> >
> > tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim
>
> Well perhaps not this particular path, but it is certainly possible
> if an existing NFS socket dies and NFS tries to reestablish it.
>
> I suggest that NFS should utilise the sk_allocation field and
> set an appropriate value.  Note that you may have to patch TCP
> so that it uses sk_allocation everywhere necessary, e.g., in
> tcp_send_fin.

Good suggestion! NFS already sets sk_allocation to GFP_ATOMIC in
linux/net/sunrpc/xprtsock.c <<xs_tcp_finish_connecting>>.

To fix this warning and possible recursions, I converted some
GPF_KERNEL cases to sk_allocation in the tcp/ipv4 code:

---
tcp: replace hard coded GFP_KERNEL with sk_allocation

This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:

	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
		tcp_send_fin => alloc_skb_fclone => page reclaim

	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock

CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: David S. Miller <davem@davemloft.net>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 net/ipv4/tcp.c        |    4 ++--
 net/ipv4/tcp_ipv4.c   |    5 +++--
 net/ipv4/tcp_output.c |    5 +++--
 3 files changed, 8 insertions(+), 6 deletions(-)

--- linux.orig/net/ipv4/tcp_ipv4.c
+++ linux/net/ipv4/tcp_ipv4.c
@@ -970,8 +970,9 @@ static int tcp_v4_parse_md5_keys(struct
 	if (!tcp_sk(sk)->md5sig_info) {
 		struct tcp_sock *tp = tcp_sk(sk);
-		struct tcp_md5sig_info *p = kzalloc(sizeof(*p), GFP_KERNEL);
+		struct tcp_md5sig_info *p;
 
+		p = kzalloc(sizeof(*p), sk->sk_allocation);
 		if (!p)
 			return -EINVAL;
 
@@ -979,7 +980,7 @@ static int tcp_v4_parse_md5_keys(struct
 		sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
 	}
 
-	newkey = kmemdup(cmd.tcpm_key, cmd.tcpm_keylen, GFP_KERNEL);
+	newkey = kmemdup(cmd.tcpm_key, cmd.tcpm_keylen, sk->sk_allocation);
 	if (!newkey)
 		return -ENOMEM;
 	return tcp_v4_md5_do_add(sk, sin->sin_addr.s_addr,
--- linux.orig/net/ipv4/tcp_output.c
+++ linux/net/ipv4/tcp_output.c
@@ -2100,7 +2100,8 @@ void tcp_send_fin(struct sock *sk)
 	} else {
 		/* Socket is locked, keep trying until memory is available. */
 		for (;;) {
-			skb = alloc_skb_fclone(MAX_TCP_HEADER, GFP_KERNEL);
+			skb = alloc_skb_fclone(MAX_TCP_HEADER,
+					       sk->sk_allocation);
 			if (skb)
 				break;
 			yield();
@@ -2358,7 +2359,7 @@ int tcp_connect(struct sock *sk)
 	sk->sk_wmem_queued += buff->truesize;
 	sk_mem_charge(sk, buff->truesize);
 	tp->packets_out += tcp_skb_pcount(buff);
-	tcp_transmit_skb(sk, buff, 1, GFP_KERNEL);
+	tcp_transmit_skb(sk, buff, 1, sk->sk_allocation);
 
 	/* We change tp->snd_nxt after the tcp_transmit_skb() call
 	 * in order to make this packet get counted in tcpOutSegs.
--- linux.orig/net/ipv4/tcp.c
+++ linux/net/ipv4/tcp.c
@@ -1834,7 +1834,7 @@ void tcp_close(struct sock *sk, long tim
 		/* Unread data was tossed, zap the connection. */
 		NET_INC_STATS_USER(sock_net(sk), LINUX_MIB_TCPABORTONCLOSE);
 		tcp_set_state(sk, TCP_CLOSE);
-		tcp_send_active_reset(sk, GFP_KERNEL);
+		tcp_send_active_reset(sk, sk->sk_allocation);
 	} else if (sock_flag(sk, SOCK_LINGER) && !sk->sk_lingertime) {
 		/* Check zero linger _after_ checking for unread data. */
 		sk->sk_prot->disconnect(sk, 0);
@@ -2666,7 +2666,7 @@ static struct tcp_md5sig_pool **__tcp_al
 		struct tcp_md5sig_pool *p;
 		struct crypto_hash *hash;
 
-		p = kzalloc(sizeof(*p), GFP_KERNEL);
+		p = kzalloc(sizeof(*p), sk->sk_allocation);
 		if (!p)
 			goto out_free;
 		*per_cpu_ptr(pool, cpu) = p;

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: David Miller @ 2009-07-10 0:13 UTC (permalink / raw)
To: fengguang.wu-ral2JQCrhuEAvxtiuMwx3w
Cc: herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
    linux-kernel-u79uwXL29TY76Z2rM5mHXA, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
    netdev-u79uwXL29TY76Z2rM5mHXA

From: Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Date: Thu, 9 Jul 2009 21:17:46 +0800

> @@ -2100,7 +2100,8 @@ void tcp_send_fin(struct sock *sk)
>  	} else {
>  		/* Socket is locked, keep trying until memory is available. */
>  		for (;;) {
> -			skb = alloc_skb_fclone(MAX_TCP_HEADER, GFP_KERNEL);
> +			skb = alloc_skb_fclone(MAX_TCP_HEADER,
> +					       sk->sk_allocation);
>  			if (skb)
>  				break;
>  			yield();

I think this specific case needs more thinking.

If the allocation fails, and it's GFP_ATOMIC, we are going to yield()
(which sleeps) and loop endlessly waiting for the allocation to
succeed.

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: Herbert Xu @ 2009-07-10 0:59 UTC (permalink / raw)
To: David Miller
Cc: fengguang.wu-ral2JQCrhuEAvxtiuMwx3w, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
    linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA

On Thu, Jul 09, 2009 at 05:13:55PM -0700, David Miller wrote:
>
> I think this specific case needs more thinking.
>
> If the allocation fails, and it's GFP_ATOMIC, we are going to yield()
> (which sleeps) and loop endlessly waiting for the allocation to
> succeed.

Indeed.  We could do one of the following:

1) Preallocate, either universally or conditinally on sk_allocation.

2) Set a bit somewhere to indicate FIN and retry the allocation
as part of retransmit.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
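
To make option (1) a little more concrete, a very rough sketch of the preallocation idea follows. It is hypothetical and was never posted as a patch in this thread; in particular, the fin_skb field does not exist in struct tcp_sock and is invented purely for illustration.

------------------------------------------------------------------
/*
 * Hypothetical sketch of option (1): grab the FIN skb while sleeping
 * is still allowed, so tcp_send_fin() never has to allocate (or loop
 * on a failing GFP_ATOMIC allocation) under sk_lock.
 */
static int tcp_prealloc_fin_skb(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);

	if (!tp->fin_skb)	/* tp->fin_skb: invented field, see above */
		tp->fin_skb = alloc_skb_fclone(MAX_TCP_HEADER, GFP_KERNEL);
	return tp->fin_skb ? 0 : -ENOMEM;
}

/*
 * tcp_send_fin() would then consume tp->fin_skb instead of running the
 * alloc_skb_fclone()/yield() retry loop quoted above.
 */
------------------------------------------------------------------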

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: Wu Fengguang @ 2009-07-10 8:00 UTC (permalink / raw)
To: David Miller
Cc: herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org,
    linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Fri, Jul 10, 2009 at 08:13:55AM +0800, David Miller wrote:
> From: Wu Fengguang <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Date: Thu, 9 Jul 2009 21:17:46 +0800
>
> > @@ -2100,7 +2100,8 @@ void tcp_send_fin(struct sock *sk)
> >  	} else {
> >  		/* Socket is locked, keep trying until memory is available. */
> >  		for (;;) {
> > -			skb = alloc_skb_fclone(MAX_TCP_HEADER, GFP_KERNEL);
> > +			skb = alloc_skb_fclone(MAX_TCP_HEADER,
> > +					       sk->sk_allocation);
> >  			if (skb)
> >  				break;
> >  			yield();
>
> I think this specific case needs more thinking.
>
> If the allocation fails, and it's GFP_ATOMIC, we are going to yield()
> (which sleeps) and loop endlessly waiting for the allocation to
> succeed.

The _retried_ GFP_ATOMIC won't be much worse than GFP_KERNEL.
GFP_KERNEL can directly reclaim FS pages; GFP_ATOMIC will wake up
kswapd to do that. So after yield(), GFP_ATOMIC have good opportunity
to succeed if GFP_KERNEL could succeed.

The original GFP_KERNEL does have _a bit_ better chance to succeed,
but there are no guarantee. It could loop endlessly whether it be
GFP_KERNEL or GFP_ATOMIC.

btw, generally speaking, it would be more robust that NFS set
sk_allocation to GFP_NOIO, and let the networking code choose whether
to use plain sk_allocation or (sk_allocation & ~__GFP_WAIT).

The (sk_allocation & ~__GFP_WAIT) cases should be rare, but I guess
the networking code shall do it anyway, because sk_allocation defaults
to GFP_KERNEL. It seems that currently the networking code simply uses
a lot of GFP_ATOMIC, do they really mean "I cannot sleep"?

Thanks,
Fengguang
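
The split Wu is proposing here could be captured in a tiny helper; sk_gfp() below is hypothetical (not an existing kernel API) and simply restates the idea that the socket owner picks a blocking mask such as GFP_NOIO, while atomic paths derive their mask from it instead of hard-coding GFP_ATOMIC:

------------------------------------------------------------------
/*
 * Hypothetical helper illustrating the proposal above.  Callers that
 * may sleep pass may_sleep = true and get sk->sk_allocation unchanged
 * (e.g. GFP_NOIO if NFS set it that way); softirq callers pass false
 * and get the same mask with the blocking flag stripped.
 */
static inline gfp_t sk_gfp(const struct sock *sk, bool may_sleep)
{
	return may_sleep ? sk->sk_allocation
			 : (sk->sk_allocation & ~__GFP_WAIT);
}
------------------------------------------------------------------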

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: Herbert Xu @ 2009-07-10 8:02 UTC (permalink / raw)
To: Wu Fengguang
Cc: David Miller, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org,
    netdev@vger.kernel.org

On Fri, Jul 10, 2009 at 04:00:17PM +0800, Wu Fengguang wrote:
>
> The (sk_allocation & ~__GFP_WAIT) cases should be rare, but I guess
> the networking code shall do it anyway, because sk_allocation defaults
> to GFP_KERNEL. It seems that currently the networking code simply uses
> a lot of GFP_ATOMIC, do they really mean "I cannot sleep"?

Yep because they're done from softirq context.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: David Miller @ 2009-07-14 16:04 UTC (permalink / raw)
To: herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q
Cc: fengguang.wu-ral2JQCrhuEAvxtiuMwx3w, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
    linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA

From: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
Date: Fri, 10 Jul 2009 16:02:47 +0800

> On Fri, Jul 10, 2009 at 04:00:17PM +0800, Wu Fengguang wrote:
>>
>> The (sk_allocation & ~__GFP_WAIT) cases should be rare, but I guess
>> the networking code shall do it anyway, because sk_allocation defaults
>> to GFP_KERNEL. It seems that currently the networking code simply uses
>> a lot of GFP_ATOMIC, do they really mean "I cannot sleep"?
>
> Yep because they're done from softirq context.

Yes, this is the core issue.

All of Wu's talk about how "GFP_ATOMIC will wake up kswapd and
therefore can succeed just as well as GFP_KERNEL" is not relevant,
because GFP_ATOMIC means sleeping is not allowed.

* Re: sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage
From: Wu Fengguang @ 2009-07-15 7:45 UTC (permalink / raw)
To: David Miller
Cc: herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org,
    linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
    netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org

On Wed, Jul 15, 2009 at 12:04:32AM +0800, David Miller wrote:
> From: Herbert Xu <herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
> Date: Fri, 10 Jul 2009 16:02:47 +0800
>
> > On Fri, Jul 10, 2009 at 04:00:17PM +0800, Wu Fengguang wrote:
> >>
> >> The (sk_allocation & ~__GFP_WAIT) cases should be rare, but I guess
> >> the networking code shall do it anyway, because sk_allocation defaults
> >> to GFP_KERNEL. It seems that currently the networking code simply uses
> >> a lot of GFP_ATOMIC, do they really mean "I cannot sleep"?
> >
> > Yep because they're done from softirq context.
>
> Yes, this is the core issue.

Yes, that's general true. But..

> All of Wu's talk about how "GFP_ATOMIC will wake up kswapd and
> therefore can succeed just as well as GFP_KERNEL" is not relevant,
> because GFP_ATOMIC means sleeping is not allowed.

We are talking about tcp_send_fin() here, which can sleep.

Thanks,
Fengguang

Thread overview: 17+ messages
2009-06-08 2:37 sk_lock: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage Wu Fengguang
2009-06-08 4:55 ` KOSAKI Motohiro
2009-06-08 5:00 ` Wu Fengguang
2009-06-08 5:07 ` KOSAKI Motohiro
[not found] ` <20090608140529.4376.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2009-06-08 5:53 ` Wu Fengguang
2009-06-08 5:56 ` Wu Fengguang
2009-06-08 6:12 ` KOSAKI Motohiro
[not found] ` <20090608134428.4373.A69D9226-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
2009-06-09 3:07 ` Wu Fengguang
2009-06-09 3:15 ` KOSAKI Motohiro
2009-07-06 10:52 ` Herbert Xu
2009-07-09 13:17 ` Wu Fengguang
2009-07-10 0:13 ` David Miller
[not found] ` <20090709.171355.09466097.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2009-07-10 0:59 ` Herbert Xu
2009-07-10 8:00 ` Wu Fengguang
2009-07-10 8:02 ` Herbert Xu
[not found] ` <20090710080247.GA2693-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q@public.gmane.org>
2009-07-14 16:04 ` David Miller
[not found] ` <20090714.090432.13343695.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2009-07-15 7:45 ` Wu Fengguang