From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx117.postini.com [74.125.245.117]) by kanga.kvack.org (Postfix) with SMTP id 09C7D6B005D for ; Wed, 30 May 2012 12:33:22 -0400 (EDT) Date: Wed, 30 May 2012 12:33:17 -0400 From: Dave Jones Subject: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120530163317.GA13189@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-linux-mm@kvack.org List-ID: To: Linux Kernel Cc: linux-mm@kvack.org Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc ------------[ cut here ]------------ WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Hardware name: Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_emu10k1 snd_util_mem snd_ac97_codec ac97_bus snd_hwdep snd_rawmidi snd_seq_device snd_pcm microcode snd_page_alloc pcspkr snd_timer snd lpc_ich i2c_i801 mfd_core e1000e soundcore vhost_net tun macvtap macvlan kvm_intel nfsd kvm nfs_acl auth_rpcgss lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci firewire_core sata_sil crc_itu_t floppy radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan] Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] __set_page_dirty_nobuffers+0x13a/0x170 [] migrate_page_copy+0x1e2/0x260 [] migrate_page+0x5b/0x70 [] move_to_new_page+0xa5/0x260 [] migrate_pages+0x4c8/0x540 [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 [] compact_zone+0x216/0x480 [] ? 
debug_check_no_obj_freed+0x88/0x210 [] compact_zone_order+0x8d/0xd0 [] try_to_compact_pages+0xc9/0x140 [] __alloc_pages_direct_compact+0xaa/0x1d0 [] __alloc_pages_nodemask+0x60b/0xab0 [] ? debug_check_no_obj_freed+0x16c/0x210 [] alloc_pages_vma+0xb6/0x190 [] khugepaged+0x95d/0x1570 [] ? wake_up_bit+0x40/0x40 [] ? collect_mm_slot+0xa0/0xa0 [] kthread+0xb7/0xc0 [] kernel_thread_helper+0x4/0x10 [] ? retint_restore_args+0xe/0xe [] ? flush_kthread_worker+0x190/0x190 [] ? gs_change+0xb/0xb ---[ end trace 4324bd0bca27f6f0 ]--- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx198.postini.com [74.125.245.198]) by kanga.kvack.org (Postfix) with SMTP id BC15D6B005D for ; Wed, 30 May 2012 20:57:43 -0400 (EDT) Date: Wed, 30 May 2012 20:57:40 -0400 From: Dave Jones Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120531005739.GA4532@redhat.com> References: <20120530163317.GA13189@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120530163317.GA13189@redhat.com> Sender: owner-linux-mm@kvack.org List-ID: To: Linux Kernel Cc: linux-mm@kvack.org, Andrew Morton On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote: > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc > > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() > Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_emu10k1 snd_util_mem snd_ac97_codec ac97_bus 
snd_hwdep snd_rawmidi snd_seq_device snd_pcm microcode snd_page_alloc pcspkr snd_timer snd lpc_ich i2c_i801 mfd_core e1000e soundcore vhost_net tun macvtap macvlan kvm_intel nfsd kvm nfs_acl auth_rpcgss lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci firewire_core sata_sil crc_itu_t floppy radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan] > Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75 > Call Trace: > [] warn_slowpath_common+0x7f/0xc0 > [] warn_slowpath_null+0x1a/0x20 > [] __set_page_dirty_nobuffers+0x13a/0x170 > [] migrate_page_copy+0x1e2/0x260 > [] migrate_page+0x5b/0x70 > [] move_to_new_page+0xa5/0x260 > [] migrate_pages+0x4c8/0x540 > [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 > [] compact_zone+0x216/0x480 > [] ? debug_check_no_obj_freed+0x88/0x210 > [] compact_zone_order+0x8d/0xd0 > [] try_to_compact_pages+0xc9/0x140 > [] __alloc_pages_direct_compact+0xaa/0x1d0 > [] __alloc_pages_nodemask+0x60b/0xab0 > [] ? debug_check_no_obj_freed+0x16c/0x210 > [] alloc_pages_vma+0xb6/0x190 > [] khugepaged+0x95d/0x1570 > [] ? wake_up_bit+0x40/0x40 > [] ? collect_mm_slot+0xa0/0xa0 > [] kthread+0xb7/0xc0 > [] kernel_thread_helper+0x4/0x10 > [] ? retint_restore_args+0xe/0xe > [] ? flush_kthread_worker+0x190/0x190 > [] ? gs_change+0xb/0xb Seems this can be triggered from mmap, as well as from khugepaged.. 
WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Modules linked in: tun dccp_ipv4 dccp nfnetlink sctp libcrc32c fuse ipt_ULOG binfmt_misc caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel serio_raw microcode pcspkr i2c_i801 usb_debug lpc_ich mfd_core e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] Pid: 1171, comm: trinity-child4 Not tainted 3.4.0+ #38 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] __set_page_dirty_nobuffers+0x13a/0x170 [] migrate_page_copy+0x1e2/0x260 [] migrate_page+0x5b/0x70 [] move_to_new_page+0xa5/0x260 [] migrate_pages+0x4c8/0x540 [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 [] compact_zone+0x216/0x480 [] ? trace_hardirqs_off_caller+0x28/0xc0 [] compact_zone_order+0x8d/0xd0 [] ? get_page_from_freelist+0x565/0x970 [] try_to_compact_pages+0xc9/0x140 [] __alloc_pages_direct_compact+0xaa/0x1d0 [] __alloc_pages_nodemask+0x60b/0xab0 [] ? trace_hardirqs_off_caller+0x28/0xc0 [] ? __lock_acquire+0x2b0/0x1aa0 [] alloc_pages_vma+0xb6/0x190 [] do_huge_pmd_anonymous_page+0x133/0x310 [] handle_mm_fault+0x242/0x2e0 [] __get_user_pages+0x142/0x560 [] ? mmap_region+0x3f8/0x630 [] get_user_pages+0x52/0x60 [] make_pages_present+0x92/0xc0 [] mmap_region+0x3a6/0x630 [] ? do_setitimer+0x1cc/0x310 [] do_mmap_pgoff+0x35d/0x3b0 [] ? sys_mmap_pgoff+0x66/0x240 [] sys_mmap_pgoff+0x84/0x240 [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] sys_mmap+0x22/0x30 [] system_call_fastpath+0x16/0x1b ---[ end trace 336c91f371296e41 ]--- I'd bisect this, but it takes a few hours to trigger, which makes it hard to distinguish between 'good kernel' and 'hasn't triggered yet'. 
Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx162.postini.com [74.125.245.162]) by kanga.kvack.org (Postfix) with SMTP id E35A66B004D for ; Thu, 31 May 2012 22:31:12 -0400 (EDT) Date: Thu, 31 May 2012 22:31:07 -0400 From: Dave Jones Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601023107.GA19445@redhat.com> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120531005739.GA4532@redhat.com> Sender: owner-linux-mm@kvack.org List-ID: To: Linux Kernel , linux-mm@kvack.org, Andrew Morton , Linus Torvalds , Hugh Dickins , Cong Wang On Wed, May 30, 2012 at 08:57:40PM -0400, Dave Jones wrote: > On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote: > > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc > > > > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() > > Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_emu10k1 snd_util_mem snd_ac97_codec ac97_bus snd_hwdep snd_rawmidi snd_seq_device snd_pcm microcode snd_page_alloc pcspkr snd_timer snd lpc_ich i2c_i801 mfd_core e1000e soundcore vhost_net tun macvtap macvlan kvm_intel nfsd kvm nfs_acl auth_rpcgss lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci firewire_core sata_sil crc_itu_t floppy radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: 
scsi_wait_scan] > > Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75 > > Call Trace: > > [] warn_slowpath_common+0x7f/0xc0 > > [] warn_slowpath_null+0x1a/0x20 > > [] __set_page_dirty_nobuffers+0x13a/0x170 > > [] migrate_page_copy+0x1e2/0x260 > > [] migrate_page+0x5b/0x70 > > [] move_to_new_page+0xa5/0x260 > > [] migrate_pages+0x4c8/0x540 > > [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 > > [] compact_zone+0x216/0x480 > > [] ? debug_check_no_obj_freed+0x88/0x210 > > [] compact_zone_order+0x8d/0xd0 > > [] try_to_compact_pages+0xc9/0x140 > > [] __alloc_pages_direct_compact+0xaa/0x1d0 > > [] __alloc_pages_nodemask+0x60b/0xab0 > > [] ? debug_check_no_obj_freed+0x16c/0x210 > > [] alloc_pages_vma+0xb6/0x190 > > [] khugepaged+0x95d/0x1570 > > [] ? wake_up_bit+0x40/0x40 > > [] ? collect_mm_slot+0xa0/0xa0 > > [] kthread+0xb7/0xc0 > > [] kernel_thread_helper+0x4/0x10 > > [] ? retint_restore_args+0xe/0xe > > [] ? flush_kthread_worker+0x190/0x190 > > [] ? gs_change+0xb/0xb > > Seems this can be triggered from mmap, as well as from khugepaged.. 
> > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() > Modules linked in: tun dccp_ipv4 dccp nfnetlink sctp libcrc32c fuse ipt_ULOG binfmt_misc caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel serio_raw microcode pcspkr i2c_i801 usb_debug lpc_ich mfd_core e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] > Pid: 1171, comm: trinity-child4 Not tainted 3.4.0+ #38 > Call Trace: > [] warn_slowpath_common+0x7f/0xc0 > [] warn_slowpath_null+0x1a/0x20 > [] __set_page_dirty_nobuffers+0x13a/0x170 > [] migrate_page_copy+0x1e2/0x260 > [] migrate_page+0x5b/0x70 > [] move_to_new_page+0xa5/0x260 > [] migrate_pages+0x4c8/0x540 > [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 > [] compact_zone+0x216/0x480 > [] ? trace_hardirqs_off_caller+0x28/0xc0 > [] compact_zone_order+0x8d/0xd0 > [] ? get_page_from_freelist+0x565/0x970 > [] try_to_compact_pages+0xc9/0x140 > [] __alloc_pages_direct_compact+0xaa/0x1d0 > [] __alloc_pages_nodemask+0x60b/0xab0 > [] ? trace_hardirqs_off_caller+0x28/0xc0 > [] ? __lock_acquire+0x2b0/0x1aa0 > [] alloc_pages_vma+0xb6/0x190 > [] do_huge_pmd_anonymous_page+0x133/0x310 > [] handle_mm_fault+0x242/0x2e0 > [] __get_user_pages+0x142/0x560 > [] ? mmap_region+0x3f8/0x630 > [] get_user_pages+0x52/0x60 > [] make_pages_present+0x92/0xc0 > [] mmap_region+0x3a6/0x630 > [] ? do_setitimer+0x1cc/0x310 > [] do_mmap_pgoff+0x35d/0x3b0 > [] ? sys_mmap_pgoff+0x66/0x240 > [] sys_mmap_pgoff+0x84/0x240 > [] ? 
trace_hardirqs_on_thunk+0x3a/0x3f > [] sys_mmap+0x22/0x30 > [] system_call_fastpath+0x16/0x1b > ---[ end trace 336c91f371296e41 ]--- > > > > I'd bisect this, but it takes a few hours to trigger, which makes it hard > to distinguish between 'good kernel' and 'hasn't triggered yet'. So I bisected it anyway, and it led to ... 3f31d07571eeea18a7d34db9af21d2285b807a17 is the first bad commit commit 3f31d07571eeea18a7d34db9af21d2285b807a17 Author: Hugh Dickins Date: Tue May 29 15:06:40 2012 -0700 mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE Now tmpfs supports hole-punching via fallocate(), switch madvise_remove() to use do_fallocate() instead of vmtruncate_range(): which extends madvise(,,MADV_REMOVE) support from tmpfs to ext4, ocfs2 and xfs. There is one more user of vmtruncate_range() in our tree, staging/android's ashmem_shrink(): convert it to use do_fallocate() too (but if its unpinned areas are already unmapped - I don't know - then it would do better to use shmem_truncate_range() directly). Based-on-patch-by: Cong Wang Signed-off-by: Hugh Dickins Cc: Christoph Hellwig Cc: Al Viro Cc: Colin Cross Cc: John Stultz Cc: Greg Kroah-Hartman Cc: "Theodore Ts'o" Cc: Andreas Dilger Cc: Mark Fasheh Cc: Joel Becker Cc: Dave Chinner Cc: Ben Myers Cc: Michael Kerrisk Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Hugh ? I'll repeat the bisect tomorrow just to be sure. (It took all day, even though there were only a half dozen bisect points, as I ran the test for an hour on each build to see what fell out). Here's what I found.. 
git bisect start 'mm/'
# bad: [4b395d7ea79472ac240ee8768b4930ca9ce096ef] Merge /home/davej/src/git-trees/kernel/linux
git bisect bad 4b395d7ea79472ac240ee8768b4930ca9ce096ef
# good: [76e10d158efb6d4516018846f60c2ab5501900bc] Linux 3.4
git bisect good 76e10d158efb6d4516018846f60c2ab5501900bc
# good: [c6785b6bf1b2a4b47238b24ee56f61e27c3af682] mm: bootmem: rename alloc_bootmem_core to alloc_bootmem_bdata
git bisect good c6785b6bf1b2a4b47238b24ee56f61e27c3af682
# bad: [89abfab133ef1f5902abafb744df72793213ac19] mm/memcg: move reclaim_stat into lruvec
git bisect bad 89abfab133ef1f5902abafb744df72793213ac19
# bad: [4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3] tmpfs: support SEEK_DATA and SEEK_HOLE
git bisect bad 4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3
# good: [bde05d1ccd512696b09db9dd2e5f33ad19152605] shmem: replace page if mapping excludes its zone
git bisect good bde05d1ccd512696b09db9dd2e5f33ad19152605
# bad: [3f31d07571eeea18a7d34db9af21d2285b807a17] mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
git bisect bad 3f31d07571eeea18a7d34db9af21d2285b807a17
# good: [ec9516fbc5fa814014991e1ae7f8860127122105] tmpfs: optimize clearing when writing
git bisect good ec9516fbc5fa814014991e1ae7f8860127122105
# good: [83e4fa9c16e4af7122e31be3eca5d57881d236fe] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE
git bisect good 83e4fa9c16e4af7122e31be3eca5d57881d236fe

This has been a challenge to bisect additionally because I'm not sure if the other mm
bug I reported in the last few days (the list_debug/list_add corruption warnings in the
compaction code) are related or not. Sometimes during the bisect these errors happened
in pairs, sometimes only together. The 'good' builds showed no errors at all.

As a reminder, the list_add corruption looks like this...

WARNING: at lib/list_debug.c:29 __list_add+0x6c/0x90()
list_add corruption. next->prev should be prev (ffff88014e5d9ed8), but was ffffea0004f48360. (next=ffffea0004b23920).
Modules linked in: ipt_ULOG fuse tun nfnetlink binfmt_misc sctp libcrc32c caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw i2c_i801 pcspkr e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] Pid: 24594, comm: trinity-child1 Not tainted 3.4.0+ #42 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_fmt+0x46/0x50 [] ? trace_hardirqs_on+0xd/0x10 [] __list_add+0x6c/0x90 [] move_freepages_block+0x16d/0x190 [] suitable_migration_target.isra.14+0x1b3/0x1d0 [] compaction_alloc+0x1db/0x2f0 [] migrate_pages+0xc7/0x540 [] ? isolate_freepages_block+0x260/0x260 [] compact_zone+0x216/0x480 [] compact_zone_order+0x8d/0xd0 [] ? get_page_from_freelist+0x565/0x970 [] try_to_compact_pages+0xc9/0x140 [] __alloc_pages_direct_compact+0xaa/0x1d0 [] __alloc_pages_nodemask+0x60b/0xab0 [] ? trace_hardirqs_off_caller+0x28/0xc0 [] ? __lock_acquire+0x2f0/0x1aa0 [] alloc_pages_vma+0xb6/0x190 [] do_huge_pmd_anonymous_page+0x133/0x310 [] handle_mm_fault+0x242/0x2e0 [] __get_user_pages+0x142/0x560 [] ? mmap_region+0x3f8/0x630 [] get_user_pages+0x52/0x60 [] make_pages_present+0x92/0xc0 [] mmap_region+0x3a6/0x630 [] ? do_setitimer+0x1cc/0x310 [] do_mmap_pgoff+0x35d/0x3b0 [] ? sys_mmap_pgoff+0x66/0x240 [] sys_mmap_pgoff+0x84/0x240 [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] sys_mmap+0x22/0x30 [] system_call_fastpath+0x16/0x1b ---[ end trace b606ea2a53bf1425 ]--- On an affected kernel, it'll show up within an hour of fuzzing on a fast machine. Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx134.postini.com [74.125.245.134]) by kanga.kvack.org (Postfix) with SMTP id E0D156B004D for ; Thu, 31 May 2012 22:43:47 -0400 (EDT)
Received: by wefh52 with SMTP id h52so1399293wef.14 for ; Thu, 31 May 2012 19:43:46 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20120601023107.GA19445@redhat.com>
References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com>
From: Linus Torvalds
Date: Thu, 31 May 2012 19:43:25 -0700
Message-ID:
Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Dave Jones, Linux Kernel, linux-mm@kvack.org, Andrew Morton, Linus Torvalds, Hugh Dickins, Cong Wang

On Thu, May 31, 2012 at 7:31 PM, Dave Jones wrote:
>
> So I bisected it anyway, and it led to ...

Ok, that doesn't sound entirely unlikely, but considering that you're
nervous about the bisection, please just try to revert it and see if
that fixes your testcase.

You'll obviously need to revert the commit that removes
vmtruncate_range() too, since reverting 3f31d07571ee will re-introduce
the use of it (it's the next one:
17cf28afea2a1112f240a3a2da8af883be024811), but it looks like those two
commits revert cleanly and the end result seems to compile ok.

Linus
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx103.postini.com [74.125.245.103]) by kanga.kvack.org (Postfix) with SMTP id C84956B004D for ; Fri, 1 Jun 2012 04:45:11 -0400 (EDT) Received: by pbbrp2 with SMTP id rp2so3348677pbb.14 for ; Fri, 01 Jun 2012 01:45:11 -0700 (PDT) Date: Fri, 1 Jun 2012 01:44:44 -0700 (PDT) From: Hugh Dickins Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: <20120601023107.GA19445@redhat.com> Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones Cc: Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Thu, 31 May 2012, Dave Jones wrote: > On Wed, May 30, 2012 at 08:57:40PM -0400, Dave Jones wrote: > > On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote: > > > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc > > > > > > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() I did see your reports, and noted to come back to them, but sad to say I hadn't even made time to check out line 1990 of mm/page-writeback.c: ah, that WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); > > > Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75 > > > Call Trace: > > > [] __set_page_dirty_nobuffers+0x13a/0x170 > > > [] migrate_page_copy+0x1e2/0x260 > > > > Seems this can be triggered from mmap, as well as from khugepaged.. 
> > > > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() > > Pid: 1171, comm: trinity-child4 Not tainted 3.4.0+ #38 > > Call Trace: > > [] __set_page_dirty_nobuffers+0x13a/0x170 > > [] migrate_page_copy+0x1e2/0x260 > > > > I'd bisect this, but it takes a few hours to trigger, which makes it hard > > to distinguish between 'good kernel' and 'hasn't triggered yet'. > > So I bisected it anyway, and it led to ... Thanks so much for taking the trouble. > > 3f31d07571eeea18a7d34db9af21d2285b807a17 is the first bad commit > commit 3f31d07571eeea18a7d34db9af21d2285b807a17 > Author: Hugh Dickins > Date: Tue May 29 15:06:40 2012 -0700 > > mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE > > Now tmpfs supports hole-punching via fallocate(), switch madvise_remove() > to use do_fallocate() instead of vmtruncate_range(): which extends > madvise(,,MADV_REMOVE) support from tmpfs to ext4, ocfs2 and xfs. > > Hugh ? Ow, you've caught me. > > I'll repeat the bisect tomorrow just to be sure. (It took all day, even though > there were only a half dozen bisect points, as I ran the test for an hour on > each build to see what fell out). > > Here's what I found.. 
> > git bisect start 'mm/' > # bad: [4b395d7ea79472ac240ee8768b4930ca9ce096ef] Merge /home/davej/src/git-trees/kernel/linux > git bisect bad 4b395d7ea79472ac240ee8768b4930ca9ce096ef > # good: [76e10d158efb6d4516018846f60c2ab5501900bc] Linux 3.4 > git bisect good 76e10d158efb6d4516018846f60c2ab5501900bc > # good: [c6785b6bf1b2a4b47238b24ee56f61e27c3af682] mm: bootmem: rename alloc_bootmem_core to alloc_bootmem_bdata > git bisect good c6785b6bf1b2a4b47238b24ee56f61e27c3af682 > # bad: [89abfab133ef1f5902abafb744df72793213ac19] mm/memcg: move reclaim_stat into lruvec > git bisect bad 89abfab133ef1f5902abafb744df72793213ac19 > # bad: [4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3] tmpfs: support SEEK_DATA and SEEK_HOLE > git bisect bad 4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3 > # good: [bde05d1ccd512696b09db9dd2e5f33ad19152605] shmem: replace page if mapping excludes its zone > git bisect good bde05d1ccd512696b09db9dd2e5f33ad19152605 > # bad: [3f31d07571eeea18a7d34db9af21d2285b807a17] mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE > git bisect bad 3f31d07571eeea18a7d34db9af21d2285b807a17 > # good: [ec9516fbc5fa814014991e1ae7f8860127122105] tmpfs: optimize clearing when writing > git bisect good ec9516fbc5fa814014991e1ae7f8860127122105 > # good: [83e4fa9c16e4af7122e31be3eca5d57881d236fe] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE > git bisect good 83e4fa9c16e4af7122e31be3eca5d57881d236fe That puzzled me for quite a while: it seemed so much more likely that your bisection would converge on the commit which comes a few later, 1635f6a74152 "tmpfs: undo fallocation on failure", where indeed I do start to play around with tmpfs pages unlocked while !PageUptodate. And yes, they're PageDirty !PagePrivate, so migration could very well end up trying to migrate one and hitting line 1990. 
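The condition behind the line 1990 warning can be modelled in user space. The following is only a minimal sketch: struct toy_page, its fields and would_warn() are illustrative names, not kernel definitions, and the three booleans merely stand in for PageDirty, PagePrivate and PageUptodate. It shows why a tmpfs page in the state described above (dirty, no private data, not yet uptodate) trips the check once migration passes it to __set_page_dirty_nobuffers(), while an ordinary uptodate page does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the three flags discussed here.
 * Hypothetical type, for illustration only. */
struct toy_page {
	bool dirty;     /* models PageDirty */
	bool has_priv;  /* models PagePrivate (e.g. buffer heads attached) */
	bool uptodate;  /* models PageUptodate */
};

/* Models the test inside WARN_ON_ONCE() at mm/page-writeback.c:1990:
 * a page with no private data is expected to be uptodate when dirtied. */
static bool would_warn(const struct toy_page *p)
{
	return !p->has_priv && !p->uptodate;
}

/* Small self-check covering the cases mentioned in the thread. */
static void self_check(void)
{
	/* Ordinary dirty page cache page: uptodate, so no warning. */
	struct toy_page normal = { .dirty = true, .has_priv = false, .uptodate = true };
	/* Page with private data attached: exempt even if not uptodate. */
	struct toy_page buffered = { .dirty = true, .has_priv = true, .uptodate = false };
	/* State described above: PageDirty, !PagePrivate, !PageUptodate. */
	struct toy_page fallocated = { .dirty = true, .has_priv = false, .uptodate = false };

	assert(!would_warn(&normal));
	assert(!would_warn(&buffered));
	assert(would_warn(&fallocated));
}
```

Calling self_check() from a small main() exercises all three cases; only the last one, the dirty page that is neither private nor uptodate, satisfies the warning condition.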
It's an aberration of migrate_page_copy(), that it uses __set_page_dirty_nobuffers() on mappings which would never normally go that way at all (I discovered this last year, when I experimented with radix_tree tags for swap in tmpfs, and hit upon this rare case where page migration sets a dirty tag for a tmpfs page, despite tmpfs never using tags). One half of the patch at the bottom should fix that: I'm not sure that it's the fix we actually want (a mapping_cap_account_dirty test might be more appropriate, but it's easier just to test a page flag here); but it should be good to shed more light on the problem. Because your bisection converged on a commit a few before I introduced that bug - and although it was a difficult bisection, you would be very unlikely to mistake a good for bad: the danger was the other way around. So I'm wondering if your trinity fuzzer happens to succeed a lot more often on madvise MADV_REMOVEs than fallocate FALLOC_FL_PUNCH_HOLEs, and the bug you converged on is not in tmpfs, but in ext4 (or xfs? or ocfs2?), which began to support MADV_REMOVE with that commit. So the second half of the patch should show which filesystem's page is involved when you hit the WARN_ON - unless the first half of the patch turns out to stop the warnings completely, in which case I need to think harder about what was going on in tmpfs, and whether it matters. Or another possibility is that the bad commit doesn't actually touch mm at all: you were doing a bisection just on mm/ changes, weren't you? > > This has been a challenge to bisect additionally because I'm not sure if the other mm > bug I reported in the last few days (the list_debug/list_add corruption warnings in the > compaction code) are related or not. At present I suspect they're not related; but may change my mind. > Sometimes during the bisect these errors happened > in pairs, sometimes only together. Sometimes in pairs, sometimes together? I don't understand. 
And are "these errors" the list debug warnings, or list debug warnings and Line 1990 warnings? > The 'good' builds showed no errors at all. > > As a reminder, the list_add corruption looks like this... > > WARNING: at lib/list_debug.c:29 __list_add+0x6c/0x90() > list_add corruption. next->prev should be prev (ffff88014e5d9ed8), but was ffffea0004f48360. (next=ffffea0004b23920). > Pid: 24594, comm: trinity-child1 Not tainted 3.4.0+ #42 > Call Trace: > [] warn_slowpath_common+0x7f/0xc0 > [] warn_slowpath_fmt+0x46/0x50 > [] ? trace_hardirqs_on+0xd/0x10 > [] __list_add+0x6c/0x90 > [] move_freepages_block+0x16d/0x190 > [] suitable_migration_target.isra.14+0x1b3/0x1d0 > [] compaction_alloc+0x1db/0x2f0 > [] migrate_pages+0xc7/0x540 > [] ? isolate_freepages_block+0x260/0x260 > [] compact_zone+0x216/0x480 > [] compact_zone_order+0x8d/0xd0 > [] ? get_page_from_freelist+0x565/0x970 > [] try_to_compact_pages+0xc9/0x140 > [] __alloc_pages_direct_compact+0xaa/0x1d0 > [] __alloc_pages_nodemask+0x60b/0xab0 > [] ? trace_hardirqs_off_caller+0x28/0xc0 > [] ? __lock_acquire+0x2f0/0x1aa0 > [] alloc_pages_vma+0xb6/0x190 > [] do_huge_pmd_anonymous_page+0x133/0x310 > [] handle_mm_fault+0x242/0x2e0 > [] __get_user_pages+0x142/0x560 > [] ? mmap_region+0x3f8/0x630 > [] get_user_pages+0x52/0x60 > [] make_pages_present+0x92/0xc0 > [] mmap_region+0x3a6/0x630 > [] ? do_setitimer+0x1cc/0x310 > [] do_mmap_pgoff+0x35d/0x3b0 > [] ? sys_mmap_pgoff+0x66/0x240 > [] sys_mmap_pgoff+0x84/0x240 > [] ? trace_hardirqs_on_thunk+0x3a/0x3f > [] sys_mmap+0x22/0x30 > [] system_call_fastpath+0x16/0x1b > ---[ end trace b606ea2a53bf1425 ]--- > > On an affected kernel, it'll show up within an hour of fuzzing on a fast machine. Please give this patch a try (preferably on current git), and let us know. 
Thanks,
Hugh

--- 3.4.0+/mm/migrate.c	2012-05-27 10:01:43.104049010 -0700
+++ linux/mm/migrate.c	2012-06-01 00:10:58.080098749 -0700
@@ -436,7 +436,10 @@ void migrate_page_copy(struct page *newp
 		 * is actually a signal that all of the page has become dirty.
 		 * Whereas only part of our page may be dirty.
 		 */
-		__set_page_dirty_nobuffers(newpage);
+		if (PageSwapBacked(page))
+			SetPageDirty(newpage);
+		else
+			__set_page_dirty_nobuffers(newpage);
 	}
 
 	mlock_migrate_page(newpage, page);
--- 3.4.0+/mm/page-writeback.c	2012-05-29 08:09:58.304806782 -0700
+++ linux/mm/page-writeback.c	2012-06-01 00:23:43.984116973 -0700
@@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa
 		mapping2 = page_mapping(page);
 		if (mapping2) { /* Race with truncate? */
 			BUG_ON(mapping2 != mapping);
-			WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
+			if (WARN_ON(!PagePrivate(page) && !PageUptodate(page)))
+				print_symbol(KERN_WARNING
+				    "mapping->a_ops->writepage: %s\n",
+				    (unsigned long)mapping->a_ops->writepage);
 			account_page_dirtied(page, mapping);
 			radix_tree_tag_set(&mapping->page_tree,
 				page_index(page), PAGECACHE_TAG_DIRTY);
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx128.postini.com [74.125.245.128]) by kanga.kvack.org (Postfix) with SMTP id 849436B004D for ; Fri, 1 Jun 2012 04:51:38 -0400 (EDT)
Received: by qcsd16 with SMTP id d16so1258666qcs.14 for ; Fri, 01 Jun 2012 01:51:37 -0700 (PDT)
Message-ID: <4FC88299.1040707@gmail.com>
Date: Fri, 01 Jun 2012 04:51:37 -0400
From: KOSAKI Motohiro
MIME-Version: 1.0
Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Hugh Dickins
Cc: Dave Jones, Linus Torvalds, Andrew Morton, Cong Wang, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kosaki.motohiro@gmail.com

>  	mlock_migrate_page(newpage, page);
> --- 3.4.0+/mm/page-writeback.c	2012-05-29 08:09:58.304806782 -0700
> +++ linux/mm/page-writeback.c	2012-06-01 00:23:43.984116973 -0700
> @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa
>  		mapping2 = page_mapping(page);
>  		if (mapping2) { /* Race with truncate? */
>  			BUG_ON(mapping2 != mapping);
> -			WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> +			if (WARN_ON(!PagePrivate(page) && !PageUptodate(page)))
> +				print_symbol(KERN_WARNING
> +				    "mapping->a_ops->writepage: %s\n",
> +				    (unsigned long)mapping->a_ops->writepage);

type mismatch? I guess you want %pf or %pF.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx194.postini.com [74.125.245.194]) by kanga.kvack.org (Postfix) with SMTP id 7F6586B004D for ; Fri, 1 Jun 2012 05:08:35 -0400 (EDT)
Received: by dakp5 with SMTP id p5so3185426dak.14 for ; Fri, 01 Jun 2012 02:08:34 -0700 (PDT)
Date: Fri, 1 Jun 2012 02:08:07 -0700 (PDT)
From: Hugh Dickins
Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
In-Reply-To: <4FC88299.1040707@gmail.com>
Message-ID:
References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <4FC88299.1040707@gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-linux-mm@kvack.org
List-ID:
To: KOSAKI Motohiro
Cc: Dave Jones, Linus Torvalds, Andrew Morton, Cong Wang, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Fri, 1 Jun 2012, KOSAKI Motohiro wrote:
> >  	mlock_migrate_page(newpage, page);
> > --- 3.4.0+/mm/page-writeback.c	2012-05-29 08:09:58.304806782 -0700
> > +++ linux/mm/page-writeback.c	2012-06-01 00:23:43.984116973 -0700
> > @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa
> >  		mapping2 = page_mapping(page);
> >  		if (mapping2) { /* Race with truncate? */
> >  			BUG_ON(mapping2 != mapping);
> > -			WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> > +			if (WARN_ON(!PagePrivate(page) && !PageUptodate(page)))
> > +				print_symbol(KERN_WARNING
> > +				    "mapping->a_ops->writepage: %s\n",
> > +				    (unsigned long)mapping->a_ops->writepage);
>
> type mismatch?

I don't think so: I just copied from print_bad_pte().
Probably you're reading "printk" where it's "print_symbol"?

> I guess you want %pf or %pF.

I expect there is new-fangled %pMagic that can do it too, yes.
Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx104.postini.com [74.125.245.104]) by kanga.kvack.org (Postfix) with SMTP id A39896B005C for ; Fri, 1 Jun 2012 05:12:43 -0400 (EDT) Received: by ggm4 with SMTP id 4so2018737ggm.14 for ; Fri, 01 Jun 2012 02:12:42 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <4FC88299.1040707@gmail.com> From: KOSAKI Motohiro Date: Fri, 1 Jun 2012 05:12:19 -0400 Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Fri, Jun 1, 2012 at 5:08 AM, Hugh Dickins wrote: > On Fri, 1 Jun 2012, KOSAKI Motohiro wrote: >> > mlock_migrate_page(newpage, page); >> > --- 3.4.0+/mm/page-writeback.c 2012-05-29 08:09:58.304806782 -0700 >> > +++ linux/mm/page-writeback.c 2012-06-01 00:23:43.984116973 -0700 >> > @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa >> > mapping2 = page_mapping(page); >> > if (mapping2) { /* Race with truncate? */ >> > BUG_ON(mapping2 != mapping); >> > - WARN_ON_ONCE(!PagePrivate(page)&& >> > !PageUptodate(page)); >> > + if (WARN_ON(!PagePrivate(page)&& >> > !PageUptodate(page))) >> > + print_symbol(KERN_WARNING >> > + "mapping->a_ops->writepage: %s\n", >> > + (unsigned >> > long)mapping->a_ops->writepage); >> >> type mismatch? > > I don't think so: I just copied from print_bad_pte(). > Probably you're reading "printk" where it's "print_symbol"? Oops, yes, sorry for noise. >> I guess you want %pf or %pF. > > I expect there is new-fangled %pMagic that can do it too, yes. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx131.postini.com [74.125.245.131]) by kanga.kvack.org (Postfix) with SMTP id B62A66B004D for ; Fri, 1 Jun 2012 09:43:30 -0400 (EDT) Date: Fri, 1 Jun 2012 09:43:23 -0400 From: Dave Jones Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601134323.GA5214@redhat.com> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Linux Kernel , linux-mm@kvack.org, Andrew Morton , Hugh Dickins , Cong Wang On Thu, May 31, 2012 at 07:43:25PM -0700, Linus Torvalds wrote: > On Thu, May 31, 2012 at 7:31 PM, Dave Jones wrote: > > > > So I bisected it anyway, and it led to ... > > Ok, that doesn't sound entirely unlikely, but considering that you're > nervous about the bisection, please just try to revert it and see if > that fixes your testcase. > > You'll obviously need to revert the commit that removes > vmtruncate_range() too, since reverting 3f31d07571ee will re-introduce > the use of it (it's the next one: > 17cf28afea2a1112f240a3a2da8af883be024811), but it looks like those two > commits revert cleanly and the end result seems to compile ok. crap, so much for that theory. I ran latest with those two reverted overnight, and woke up to a dead box. Over serial console, I see a bunch of those same compaction oopses (Via sys_mmap_pgoff), and then kernel BUG at include/linux/mm.h:448! was the last thing it said before it choked. I'll redo the bisect. It's possible that one of the 'good' paths just didn't run for long enough. 
Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx118.postini.com [74.125.245.118]) by kanga.kvack.org (Postfix) with SMTP id ADB436B005A for ; Fri, 1 Jun 2012 10:09:48 -0400 (EDT) Date: Fri, 1 Jun 2012 10:09:43 -0400 From: Dave Jones Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601140943.GB1732@redhat.com> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Fri, Jun 01, 2012 at 01:44:44AM -0700, Hugh Dickins wrote: > > 3f31d07571eeea18a7d34db9af21d2285b807a17 is the first bad commit > > commit 3f31d07571eeea18a7d34db9af21d2285b807a17 > > Author: Hugh Dickins > > Date: Tue May 29 15:06:40 2012 -0700 > > > > mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE > > > > Now tmpfs supports hole-punching via fallocate(), switch madvise_remove() > > to use do_fallocate() instead of vmtruncate_range(): which extends > > madvise(,,MADV_REMOVE) support from tmpfs to ext4, ocfs2 and xfs. > > > > Hugh ? > > Ow, you've caught me. As I said in another mail, it looks like the bisect was wrong somewhere, as with this backed out I still see problems. 
> One half of the patch at the bottom should fix that: I'm not sure that > it's the fix we actually want (a mapping_cap_account_dirty test might > be more appropriate, but it's easier just to test a page flag here); > but it should be good to shed more light on the problem. I'll give the patch a try anyway, as builds are quick on that box. > So I'm wondering if your trinity fuzzer happens to succeed a lot more > often on madvise MADV_REMOVEs than fallocate FALLOC_FL_PUNCH_HOLEs, and > the bug you converged on is not in tmpfs, but in ext4 (or xfs? or ocfs2?), > which began to support MADV_REMOVE with that commit. ext4 is a possibility. > So the second half of the patch should show which filesystem's page is > involved when you hit the WARN_ON - unless the first half of the patch > turns out to stop the warnings completely, in which case I need to think > harder about what was going on in tmpfs, and whether it matters. > > Or another possibility is that the bad commit doesn't actually touch mm > at all: you were doing a bisection just on mm/ changes, weren't you? oh, good point. It hadn't occurred to me that this could be fs related. The mm-heavy stack-trace may have misled me. > > Sometimes during the bisect these errors happened > > in pairs, sometimes only together. > > Sometimes in pairs, sometimes together? I don't understand. beware late-night emails. I meant sometimes I saw both the list-debug's and the WARN, but other times I saw only one or the other. > Please give this patch a try (preferably on current git), and let us know. Will do. Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx201.postini.com [74.125.245.201]) by kanga.kvack.org (Postfix) with SMTP id E3C8E6B005A for ; Fri, 1 Jun 2012 10:15:04 -0400 (EDT) Date: Fri, 1 Jun 2012 10:14:59 -0400 From: Dave Jones Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601141459.GC1732@redhat.com> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Fri, Jun 01, 2012 at 01:44:44AM -0700, Hugh Dickins wrote: > So I'm wondering if your trinity fuzzer happens to succeed a lot more > often on madvise MADV_REMOVEs than fallocate FALLOC_FL_PUNCH_HOLEs, and > the bug you converged on is not in tmpfs, but in ext4 (or xfs? or ocfs2?), > which began to support MADV_REMOVE with that commit. One more thing: I happened to see this during a kernel build last night on another machine too, so it's not just fuzzing fallout. I'm surprised more people aren't seeing it. Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx180.postini.com [74.125.245.180]) by kanga.kvack.org (Postfix) with SMTP id CB8606B004D for ; Fri, 1 Jun 2012 12:12:10 -0400 (EDT) Date: Fri, 1 Jun 2012 12:12:05 -0400 From: Dave Jones Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601161205.GA1918@redhat.com> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Fri, Jun 01, 2012 at 01:44:44AM -0700, Hugh Dickins wrote: > Please give this patch a try (preferably on current git), and let us know. > > Thanks, > Hugh > > --- 3.4.0+/mm/migrate.c 2012-05-27 10:01:43.104049010 -0700 > +++ linux/mm/migrate.c 2012-06-01 00:10:58.080098749 -0700 > @@ -436,7 +436,10 @@ void migrate_page_copy(struct page *newp > * is actually a signal that all of the page has become dirty. > * Whereas only part of our page may be dirty. > */ > - __set_page_dirty_nobuffers(newpage); > + if (PageSwapBacked(page)) > + SetPageDirty(newpage); > + else > + __set_page_dirty_nobuffers(newpage); > } > > mlock_migrate_page(newpage, page); > --- 3.4.0+/mm/page-writeback.c 2012-05-29 08:09:58.304806782 -0700 > +++ linux/mm/page-writeback.c 2012-06-01 00:23:43.984116973 -0700 > @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa > mapping2 = page_mapping(page); > if (mapping2) { /* Race with truncate? 
*/ > BUG_ON(mapping2 != mapping); > - WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); > + if (WARN_ON(!PagePrivate(page) && !PageUptodate(page))) > + print_symbol(KERN_WARNING > + "mapping->a_ops->writepage: %s\n", > + (unsigned long)mapping->a_ops->writepage); > account_page_dirtied(page, mapping); > radix_tree_tag_set(&mapping->page_tree, > page_index(page), PAGECACHE_TAG_DIRTY); So with this applied, I don't seem to be able to trigger it. It's been running two hours so far. I'll leave it running, but right now I don't know what to make of this. Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx191.postini.com [74.125.245.191]) by kanga.kvack.org (Postfix) with SMTP id 75D3A6B004D for ; Fri, 1 Jun 2012 12:16:43 -0400 (EDT) Date: Fri, 1 Jun 2012 18:16:40 +0200 From: Markus Trippelsdorf Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601161640.GA329@x4> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On 2012.06.01 at 01:44 -0700, Hugh Dickins wrote: > On Thu, 31 May 2012, Dave Jones wrote: > > On Wed, May 30, 2012 at 08:57:40PM -0400, Dave Jones wrote: > > > On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote: > > > > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc > > > > > > > > WARNING: at mm/page-writeback.c:1990 
__set_page_dirty_nobuffers+0x13a/0x170() I've also hit this warning today: ------------[ cut here ]------------ WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0xea/0x120() Hardware name: System Product Name Pid: 4385, comm: firefox Not tainted 3.4.0-09547-gfb21aff-dirty #46 Call Trace: [] ? warn_slowpath_common+0x60/0xa0 [] ? __set_page_dirty_nobuffers+0xea/0x120 [] ? migrate_page_copy+0x150/0x160 [] ? migrate_page+0x4d/0x80 [] ? move_to_new_page+0x7d/0x220 [] ? suitable_migration_target.isra.12+0x1a0/0x1a0 [] ? migrate_pages+0x3c8/0x460 [] ? compact_zone+0x1c4/0x2c0 [] ? compact_zone_order+0x82/0xc0 [] ? try_to_compact_pages+0xca/0x140 [] ? __alloc_pages_direct_compact+0xa7/0x18f [] ? __alloc_pages_nodemask+0x3b0/0x7a0 [] ? do_huge_pmd_anonymous_page+0x10d/0x2a0 [] ? do_page_fault+0xfb/0x400 [] ? mmap_region+0x1dd/0x540 [] ? page_fault+0x1f/0x30 ---[ end trace 7d7c821044142576 ]--- -- Markus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx150.postini.com [74.125.245.150]) by kanga.kvack.org (Postfix) with SMTP id 34D7C6B004D for ; Fri, 1 Jun 2012 12:29:18 -0400 (EDT) Received: by wgbdt14 with SMTP id dt14so1891191wgb.26 for ; Fri, 01 Jun 2012 09:29:16 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20120601161640.GA329@x4> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161640.GA329@x4> From: Linus Torvalds Date: Fri, 1 Jun 2012 09:28:56 -0700 Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Markus Trippelsdorf Cc: Hugh Dickins , Dave Jones , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Fri, Jun 1, 2012 at 9:16 AM, Markus Trippelsdorf wrote: > > I've also hit this warning today: Can you try the patch by Hugh Dickins earlier in this thread? Dave is reporting tentative success with it, even though I don't think we really understand this thing fully yet. Getting way more testing would still be good, though. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx185.postini.com [74.125.245.185]) by kanga.kvack.org (Postfix) with SMTP id 0178E6B004D for ; Fri, 1 Jun 2012 12:39:20 -0400 (EDT) Date: Fri, 1 Jun 2012 18:39:18 +0200 From: Markus Trippelsdorf Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601163918.GB329@x4> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161640.GA329@x4> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Hugh Dickins , Dave Jones , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On 2012.06.01 at 09:28 -0700, Linus Torvalds wrote: > On Fri, Jun 1, 2012 at 9:16 AM, Markus Trippelsdorf > wrote: > > > > I've also hit this warning today: > > Can you try the patch by Hugh Dickins earlier in this thread? I will try. But please notice that the warning just happened per accident. I don't know how to reproduce the issue yet. > Dave is reporting tentative success with it, even though I don't think > we really understand this thing fully yet. Getting way more testing > would still be good, though. -- Markus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx106.postini.com [74.125.245.106]) by kanga.kvack.org (Postfix) with SMTP id C60A96B004D for ; Fri, 1 Jun 2012 14:05:47 -0400 (EDT) Date: Fri, 1 Jun 2012 13:16:06 -0400 From: Dave Jones Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601171606.GA3794@redhat.com> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120601161205.GA1918@redhat.com> Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Fri, Jun 01, 2012 at 12:12:05PM -0400, Dave Jones wrote: > So with this applied, I don't seem to be able to trigger it. It's been running two hours > so far. I'll leave it running, but right now I don't know what to make of this. I can trigger the list corruption still, but not the WARN. Dave [ 551.980716] ------------[ cut here ]------------ [ 551.981646] WARNING: at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0() [ 551.983461] list_del corruption. 
prev->next should be ffffea0004b305a0, but was ffffea0004f117e0 [ 551.984406] Modules linked in: tun fuse nfnetlink binfmt_misc ipt_ULOG sctp libcrc32c caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr i2c_i801 e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 551.988121] Pid: 21459, comm: trinity-child2 Not tainted 3.4.0+ #49 [ 551.989063] Call Trace: [ 551.990012] [] warn_slowpath_common+0x7f/0xc0 [ 551.990956] [] warn_slowpath_fmt+0x46/0x50 [ 551.991902] [] __list_del_entry+0xa1/0xd0 [ 551.992849] [] move_freepages_block+0x159/0x190 [ 551.993800] [] suitable_migration_target.isra.15+0x1b3/0x1d0 [ 551.994761] [] compaction_alloc+0x22e/0x2f0 [ 551.995731] [] migrate_pages+0xc7/0x540 [ 551.996684] [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 [ 551.997638] [] compact_zone+0x216/0x480 [ 551.998593] [] ? trace_hardirqs_off_caller+0x28/0xc0 [ 551.999558] [] compact_zone_order+0x8d/0xd0 [ 552.000525] [] ? get_page_from_freelist+0x565/0x970 [ 552.001502] [] try_to_compact_pages+0xc9/0x140 [ 552.002548] [] __alloc_pages_direct_compact+0xaa/0x1d0 [ 552.003592] [] __alloc_pages_nodemask+0x60b/0xab0 [ 552.004650] [] ? trace_hardirqs_off_caller+0x28/0xc0 [ 552.005708] [] ? __lock_acquire+0x2d0/0x1aa0 [ 552.007332] [] alloc_pages_vma+0xb6/0x190 [ 552.008953] [] do_huge_pmd_anonymous_page+0x133/0x310 [ 552.010584] [] handle_mm_fault+0x242/0x2e0 [ 552.012233] [] __get_user_pages+0x142/0x560 [ 552.013891] [] ? mmap_region+0x3f8/0x630 [ 552.015753] [] get_user_pages+0x52/0x60 [ 552.017348] [] make_pages_present+0x92/0xc0 [ 552.018936] [] mmap_region+0x3a6/0x630 [ 552.021074] [] ? 
do_setitimer+0x1cc/0x310 [ 552.022367] [] do_mmap_pgoff+0x35d/0x3b0 [ 552.023406] [] ? sys_mmap_pgoff+0x66/0x240 [ 552.024429] [] sys_mmap_pgoff+0x84/0x240 [ 552.025445] [] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 552.026466] [] sys_mmap+0x22/0x30 [ 552.027486] [] system_call_fastpath+0x16/0x1b [ 552.028521] ---[ end trace c092df1e14d11d14 ]--- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx198.postini.com [74.125.245.198]) by kanga.kvack.org (Postfix) with SMTP id 27F556B004D for ; Fri, 1 Jun 2012 18:18:14 -0400 (EDT) Received: by pbbrp2 with SMTP id rp2so4503787pbb.14 for ; Fri, 01 Jun 2012 15:18:13 -0700 (PDT) Date: Fri, 1 Jun 2012 15:17:48 -0700 (PDT) From: Hugh Dickins Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: <20120601171606.GA3794@redhat.com> Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones , Linus Torvalds , Andrew Morton , Cong Wang Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org On Fri, 1 Jun 2012, Dave Jones wrote: > On Fri, Jun 01, 2012 at 12:12:05PM -0400, Dave Jones wrote: > > > So with this applied, I don't seem to be able to trigger it. It's been running two hours > > so far. I'll leave it running, but right now I don't know what to make of this. > > I can trigger the list corruption still, but not the WARN. 
> > Dave > > [ 551.980716] ------------[ cut here ]------------ > [ 551.981646] WARNING: at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0() > [ 551.983461] list_del corruption. prev->next should be ffffea0004b305a0, but was ffffea0004f117e0 > [ 551.984406] Modules linked in: tun fuse nfnetlink binfmt_misc ipt_ULOG sctp libcrc32c caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr i2c_i801 e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] > [ 551.988121] Pid: 21459, comm: trinity-child2 Not tainted 3.4.0+ #49 > [ 551.989063] Call Trace: > [ 551.990012] [] warn_slowpath_common+0x7f/0xc0 > [ 551.990956] [] warn_slowpath_fmt+0x46/0x50 > [ 551.991902] [] __list_del_entry+0xa1/0xd0 > [ 551.992849] [] move_freepages_block+0x159/0x190 > [ 551.993800] [] suitable_migration_target.isra.15+0x1b3/0x1d0 > [ 551.994761] [] compaction_alloc+0x22e/0x2f0 > [ 551.995731] [] migrate_pages+0xc7/0x540 > [ 551.996684] [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 > [ 551.997638] [] compact_zone+0x216/0x480 > [ 551.998593] [] ? trace_hardirqs_off_caller+0x28/0xc0 > [ 551.999558] [] compact_zone_order+0x8d/0xd0 > [ 552.000525] [] ? get_page_from_freelist+0x565/0x970 > [ 552.001502] [] try_to_compact_pages+0xc9/0x140 > [ 552.002548] [] __alloc_pages_direct_compact+0xaa/0x1d0 > [ 552.003592] [] __alloc_pages_nodemask+0x60b/0xab0 > [ 552.004650] [] ? trace_hardirqs_off_caller+0x28/0xc0 > [ 552.005708] [] ? 
__lock_acquire+0x2d0/0x1aa0 > [ 552.007332] [] alloc_pages_vma+0xb6/0x190 > [ 552.008953] [] do_huge_pmd_anonymous_page+0x133/0x310 > [ 552.010584] [] handle_mm_fault+0x242/0x2e0 > [ 552.012233] [] __get_user_pages+0x142/0x560 > [ 552.013891] [] ? mmap_region+0x3f8/0x630 > [ 552.015753] [] get_user_pages+0x52/0x60 > [ 552.017348] [] make_pages_present+0x92/0xc0 > [ 552.018936] [] mmap_region+0x3a6/0x630 > [ 552.021074] [] ? do_setitimer+0x1cc/0x310 > [ 552.022367] [] do_mmap_pgoff+0x35d/0x3b0 > [ 552.023406] [] ? sys_mmap_pgoff+0x66/0x240 > [ 552.024429] [] sys_mmap_pgoff+0x84/0x240 > [ 552.025445] [] ? trace_hardirqs_on_thunk+0x3a/0x3f > [ 552.026466] [] sys_mmap+0x22/0x30 > [ 552.027486] [] system_call_fastpath+0x16/0x1b > [ 552.028521] ---[ end trace c092df1e14d11d14 ]--- Several distractions today, and I must rush out now for two or three hours: but please check if this patch below makes sense (I've only checked that it builds), and if so give it a run to see if it fixes your list corruptions - thanks. (Looks like there's an independent off-by-one in page_zone(end_page), but that shouldn't do any harm.) 
Hugh --- 3.4.0+/mm/compaction.c 2012-05-30 08:17:19.396008280 -0700 +++ linux/mm/compaction.c 2012-06-01 15:04:18.612051243 -0700 @@ -369,6 +369,9 @@ static bool rescue_unmovable_pageblock(s { unsigned long pfn, start_pfn, end_pfn; struct page *start_page, *end_page; + struct zone *zone; + unsigned long flags; + bool rescued = false; pfn = page_to_pfn(page); start_pfn = pfn & ~(pageblock_nr_pages - 1); @@ -378,9 +381,11 @@ static bool rescue_unmovable_pageblock(s end_page = pfn_to_page(end_pfn); /* Do not deal with pageblocks that overlap zones */ - if (page_zone(start_page) != page_zone(end_page)) + zone = page_zone(start_page); + if (zone != page_zone(end_page)) return false; + spin_lock_irqsave(&zone->lock, flags); for (page = start_page, pfn = start_pfn; page < end_page; pfn++, page++) { if (!pfn_valid_within(pfn)) @@ -396,12 +401,15 @@ static bool rescue_unmovable_pageblock(s } else if (page_count(page) == 0 || PageLRU(page)) continue; - return false; + goto out; } set_pageblock_migratetype(page, MIGRATE_MOVABLE); - move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE); - return true; + move_freepages_block(zone, page, MIGRATE_MOVABLE); + rescued = true; +out: + spin_unlock_irqrestore(&zone->lock, flags); + return rescued; } enum smt_result { -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx104.postini.com [74.125.245.104]) by kanga.kvack.org (Postfix) with SMTP id 415196B004D for ; Fri, 1 Jun 2012 21:45:30 -0400 (EDT) Received: by wgbdt14 with SMTP id dt14so2195391wgb.26 for ; Fri, 01 Jun 2012 18:45:28 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> From: Linus Torvalds Date: Fri, 1 Jun 2012 18:45:07 -0700 Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins wrote: > > + spin_lock_irqsave(&zone->lock, flags); > for (page = start_page, pfn = start_pfn; page < end_page; pfn++, > page++) { So holding the spinlock (and disabling irqs!) over the whole loop sounds horrible. At the same time, the iterators don't seem to require the spinlock, so it should be possible to just move the lock into the loop, no? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx190.postini.com [74.125.245.190]) by kanga.kvack.org (Postfix) with SMTP id 305CA6B004D for ; Sat, 2 Jun 2012 00:41:02 -0400 (EDT) Received: by dakp5 with SMTP id p5so4626159dak.14 for ; Fri, 01 Jun 2012 21:41:01 -0700 (PDT) Date: Fri, 1 Jun 2012 21:40:35 -0700 (PDT) From: Hugh Dickins Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="8323584-1474580492-1338612042=:11308" Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --8323584-1474580492-1338612042=:11308 Content-Type: TEXT/PLAIN; charset=ISO-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE On Fri, 1 Jun 2012, Linus Torvalds wrote: > On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins wrote: > > > > + spin_lock_irqsave(&zone->lock, flags); > > for (page = start_page, pfn = start_pfn; page < end_page; pfn++, > > page++) { > > So holding the spinlock (and disabling irqs!) over the whole loop > sounds horrible. 
There looks to be a pretty similar loop inside move_freepages_block(), which is the part which I believe really needs the lock - it's moving free pages from one lru to another. > > At the same time, the iterators don't seem to require the spinlock, so > it should be possible to just move the lock into the loop, no? Move the lock after the loop, I think you meant. I put the lock before the loop because it's deciding whether it can usefully proceed, and then proceeding: I was thinking that the lock would stabilize the conditions that it bases that decision on. But it certainly does not stabilize all of them (most obviously not PageLRU), so I'm guessing that this is a best-effort decision which can safely go wrong some of the time. In which case, yes, much better to follow your suggestion, and hold the lock (with irqs disabled) for only half the time. Similarly untested patch below. But I'm entirely unfamiliar with this code: best Cc people more familiar with it. Does this addition of locking to rescue_unmovable_pageblock() look correct to you, and do you think it has a good chance of fixing the move_freepages_block() list debug warnings which Dave has been reporting (in this and in another thread)? (Although there's still something of a mystery in where Dave's bisection appeared to converge, our best assumption at present is that one of my tmpfs changes is to blame for the __set_page_dirty_nobuffers warnings, and I need to send a finalized patch to fix that later. I'm guessing that the few people who see the warning are those running new systemd distros, and that systemd is indeed now making use of the fallocate support we added into tmpfs for it.) 
Hugh

--- 3.4.0+/mm/compaction.c	2012-05-30 08:17:19.396008280 -0700
+++ linux/mm/compaction.c	2012-06-01 20:59:56.840204915 -0700
@@ -369,6 +369,8 @@ static bool rescue_unmovable_pageblock(s
 {
 	unsigned long pfn, start_pfn, end_pfn;
 	struct page *start_page, *end_page;
+	struct zone *zone;
+	unsigned long flags;
 
 	pfn = page_to_pfn(page);
 	start_pfn = pfn & ~(pageblock_nr_pages - 1);
@@ -378,7 +380,8 @@ static bool rescue_unmovable_pageblock(s
 	end_page = pfn_to_page(end_pfn);
 
 	/* Do not deal with pageblocks that overlap zones */
-	if (page_zone(start_page) != page_zone(end_page))
+	zone = page_zone(start_page);
+	if (zone != page_zone(end_page))
 		return false;
 
 	for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
@@ -399,8 +402,10 @@ static bool rescue_unmovable_pageblock(s
 		return false;
 	}
 
+	spin_lock_irqsave(&zone->lock, flags);
 	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	spin_unlock_irqrestore(&zone->lock, flags);
 	return true;
 }
 
--8323584-1474580492-1338612042=:11308--
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx152.postini.com [74.125.245.152]) by kanga.kvack.org (Postfix) with SMTP id DB44F6B004D for ; Sat, 2 Jun 2012 00:59:17 -0400 (EDT) Received: by wefh52 with SMTP id h52so2365370wef.14 for ; Fri, 01 Jun 2012 21:59:16 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> From: Linus Torvalds Date: Fri, 1 Jun 2012 21:58:50 -0700 Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Fri, Jun 1, 2012 at 9:40 PM, Hugh Dickins wrote: > > Move the lock after the loop, I think you meant. Well, I wasn't sure if anything inside the loop might need it. I don't *think* so, but at the same time, what protects "page_order(page)" (or, indeed PageBuddy()) from being stable while that loop content uses them? I don't understand that code at all. It does that crazy iteration over page, and changes "page" in random ways, and then finishes up with a totally new "page" value that is some random thing that is *after* the end_page thing. WHAT? The code makes no sense. It tests all those pages within the page-block, but then after it has done all those tests, it does the final set_pageblock_migratetype(..) move_freepages_block(..) using a page that is *beyond* the pageblock (and with the whole page_order() thing, who knows just how far beyond it?) 
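The overshoot described above can be reproduced in a few lines of userspace C. This is only a minimal sketch under stated assumptions, not the kernel code: `toy_page`, `scan_pageblock()` and the 8-page `PAGEBLOCK_NR_PAGES` are hypothetical stand-ins, and the loop body is modelled on the behaviour discussed in this thread (skip a buddy block by its order, accept free or LRU pages, bail out otherwise).

```c
#include <stdbool.h>

/* Toy value (assumption); the real pageblock_nr_pages is much larger. */
#define PAGEBLOCK_NR_PAGES 8

/* Hypothetical stand-in for struct page: only what the scan looks at. */
struct toy_page {
	bool buddy;	/* models PageBuddy() */
	int order;	/* models page_order(); meaningful only if buddy */
	bool lru;	/* models PageLRU() */
	int count;	/* models page_count() */
};

/*
 * Walk one pageblock and return the index, relative to the start of the
 * block, that the cursor holds when the loop exits. Returns -1 if the
 * scan bails out on a page that is neither free nor on the LRU.
 *
 * An index is used instead of a pointer so that stepping past the end
 * of the array stays well defined in this model.
 */
long scan_pageblock(const struct toy_page *block)
{
	long i;

	for (i = 0; i < PAGEBLOCK_NR_PAGES; i++) {
		const struct toy_page *page = &block[i];

		if (page->buddy) {
			/* Skip the rest of the free block in one step. */
			i += (1L << page->order) - 1;
			continue;
		}
		if (page->count == 0 || page->lru)
			continue;

		return -1;
	}

	/* On success the cursor is never inside the block that was scanned. */
	return i;
}
```

With every page on the LRU the function returns 8, so the cursor already refers to the first page of the next pageblock. If an order of 3 is read at index 4 (for instance a stale value read without the lock), it returns 12, which is further out still.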
It looks entirely too much like random-monkey code to me. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx177.postini.com [74.125.245.177]) by kanga.kvack.org (Postfix) with SMTP id 1EED46B004D for ; Sat, 2 Jun 2012 03:17:34 -0400 (EDT) Date: Sat, 2 Jun 2012 09:17:30 +0200 From: Markus Trippelsdorf Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120602071730.GB329@x4> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On 2012.06.01 at 21:40 -0700, Hugh Dickins wrote: > > I'm guessing that the few people who see the warning are those running > new systemd distros, and that systemd is indeed now making use of the > fallocate support we added into tmpfs for it.) At least in my case it's nothing that horrible. I'm just setting browser.cache.disk.parent_directory to /dev/shm in Firefox. And Firefox does indeed use fallocate on its "disk cache" items. -- Markus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx195.postini.com [74.125.245.195]) by kanga.kvack.org (Postfix) with SMTP id A3B5C6B004D for ; Sat, 2 Jun 2012 03:20:40 -0400 (EDT) Received: by dakp5 with SMTP id p5so4772795dak.14 for ; Sat, 02 Jun 2012 00:20:40 -0700 (PDT) Date: Sat, 2 Jun 2012 00:20:13 -0700 (PDT) From: Hugh Dickins Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Fri, 1 Jun 2012, Linus Torvalds wrote: > On Fri, Jun 1, 2012 at 9:40 PM, Hugh Dickins wrote: > > > > Move the lock after the loop, I think you meant. > > Well, I wasn't sure if anything inside the loop might need it. I don't > *think* so, but at the same time, what protects "page_order(page)" > (or, indeed PageBuddy()) from being stable while that loop content > uses them? Yes, I believe you're right, page_order(page) could supply nonsense if it's not stabilized under zone->lock along with PageBuddy(page). Though if this rescue_unmovable_pageblock() is just best-effort, with a little more care we can probably avoid the lock in there. > > I don't understand that code at all. 
It does that crazy iteration over > page, and changes "page" in random ways, I don't think they're random ways: when buddy it uses the order to skip that block, otherwise it goes page by page, considering a free (I guess on pcp) page or an lru page as good for movable. > and then finishes up with a > totally new "page" value that is some random thing that is *after* the > end_page thing. WHAT? > > The code makes no sense. It tests all those pages within the > page-block, but then after it has done all those tests, it does the > final > > set_pageblock_migratetype(..) > move_freepages_block(..) > > using a page that is *beyond* the pageblock (and with the whole > page_order() thing, who knows just how far beyond it?) I totally missed that, thank goodness you did not. Yes, it's rubbish. It goes to this effort to find a suitable pageblock, then chooses the next one instead (or possibly another). Perhaps it would get even better results using a random number generator in there. > > It looks entirely too much like random-monkey code to me. Presumably it should be passing start_page instead of page to set_pageblock_migratetype() and move_freepages_block(). But this does seem to be code of the kind, that the longer you look at it, the more bugs you find. And I worry about what trouble it might then cause, if it actually started to work in the way it was intending. I don't think fixing it up is wise for -rc1. Commit 5ceb9ce6fe9462a298bb2cd5c9f1ca6cb80a0199 ("mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks") appears to revert cleanly, and I'm running with it reverted now. I'm not saying it shouldn't come back later, but does anyone see an argument against reverting it now? Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx158.postini.com [74.125.245.158]) by kanga.kvack.org (Postfix) with SMTP id 4F5B06B004D for ; Sat, 2 Jun 2012 03:23:01 -0400 (EDT) Received: by pbbrp2 with SMTP id rp2so4974606pbb.14 for ; Sat, 02 Jun 2012 00:23:00 -0700 (PDT) Date: Sat, 2 Jun 2012 00:22:34 -0700 (PDT) From: Hugh Dickins Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: <20120602071730.GB329@x4> Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120602071730.GB329@x4> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Markus Trippelsdorf Cc: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Sat, 2 Jun 2012, Markus Trippelsdorf wrote: > On 2012.06.01 at 21:40 -0700, Hugh Dickins wrote: > > > > I'm guessing that the few people who see the warning are those running > > new systemd distros, and that systemd is indeed now making use of the > > fallocate support we added into tmpfs for it.) > > At least in my case it's nothing that horrible. I'm just setting > browser.cache.disk.parent_directory to /dev/shm in Firefox. And Firefox > does indeed use fallocate on its "disk cache" items. That fits, and it's very helpful to know - thank you. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx205.postini.com [74.125.245.205]) by kanga.kvack.org (Postfix) with SMTP id B57826B004D for ; Sat, 2 Jun 2012 03:28:07 -0400 (EDT) Received: by pbbrp2 with SMTP id rp2so4979047pbb.14 for ; Sat, 02 Jun 2012 00:28:07 -0700 (PDT) Date: Sat, 2 Jun 2012 00:27:47 -0700 (PDT) From: Hugh Dickins Subject: [PATCH] mm: fix warning in __set_page_dirty_nobuffers In-Reply-To: Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120602071730.GB329@x4> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Markus Trippelsdorf , Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org New tmpfs use of !PageUptodate pages for fallocate() is triggering the WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers() is called from migrate_page_copy() for compaction. It is anomalous that migration should use __set_page_dirty_nobuffers() on an address_space that does not participate in dirty and writeback accounting; and this has also been observed to insert surprising dirty tags into a tmpfs radix_tree, despite tmpfs not using tags at all. We should probably give migrate_page_copy() a better way to preserve the tag and migrate accounting info, when mapping_cap_account_dirty(). But that needs some more work: so in the interim, avoid the warning by using a simple SetPageDirty on PageSwapBacked pages. 
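The interim behaviour described in this changelog can be modelled in userspace roughly as follows. This is a sketch under assumptions, not the kernel implementation: `toy_page`, `toy_mapping` and both functions are hypothetical, and the warning condition is simplified to "page is not uptodate".

```c
#include <stdbool.h>

/* Hypothetical page state: just the bits the changelog talks about. */
struct toy_page {
	bool dirty;
	bool uptodate;
	bool swap_backed;	/* models PageSwapBacked(), e.g. tmpfs pages */
};

/* Hypothetical mapping: counts side effects tmpfs does not expect. */
struct toy_mapping {
	int dirty_tags;	/* models dirty tags inserted into the radix tree */
	int warnings;	/* models the warning firing */
};

/* Simplified model of __set_page_dirty_nobuffers(): flag, check, tag. */
void toy_set_page_dirty_nobuffers(struct toy_page *page,
				  struct toy_mapping *mapping)
{
	if (page->dirty)
		return;
	page->dirty = true;
	if (!page->uptodate)
		mapping->warnings++;
	mapping->dirty_tags++;
}

/* Model of the dirty-preserving step with the interim fix applied. */
void toy_migrate_dirty(struct toy_page *newpage, const struct toy_page *page,
		       struct toy_mapping *mapping)
{
	if (!page->dirty)
		return;

	if (page->swap_backed)
		newpage->dirty = true;	/* plain flag: no tag, no accounting */
	else
		toy_set_page_dirty_nobuffers(newpage, mapping);
}
```

In this model a dirty, not-uptodate, swap-backed page migrates with no warning and no tag, while the same page without `swap_backed` takes the old path and produces one of each.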
Reported-by: Dave Jones Signed-off-by: Hugh Dickins --- mm/migrate.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) --- 3.4.0+/mm/migrate.c 2012-05-27 10:01:43.104049010 -0700 +++ linux/mm/migrate.c 2012-06-01 00:10:58.080098749 -0700 @@ -436,7 +436,10 @@ void migrate_page_copy(struct page *newp * is actually a signal that all of the page has become dirty. * Whereas only part of our page may be dirty. */ - __set_page_dirty_nobuffers(newpage); + if (PageSwapBacked(page)) + SetPageDirty(newpage); + else + __set_page_dirty_nobuffers(newpage); } mlock_migrate_page(newpage, page); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx162.postini.com [74.125.245.162]) by kanga.kvack.org (Postfix) with SMTP id 0ABD06B004D for ; Sun, 3 Jun 2012 14:16:03 -0400 (EDT) Date: Sun, 3 Jun 2012 14:15:48 -0400 From: Dave Jones Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120603181548.GA306@redhat.com> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Fri, Jun 01, 2012 at 09:40:35PM -0700, Hugh Dickins wrote: > In which case, yes, much better to follow your suggestion, and hold > the lock (with irqs 
disabled) for only half the time. > > Similarly untested patch below. Things aren't happy with that patch at all. ============================================= [ INFO: possible recursive locking detected ] 3.5.0-rc1+ #50 Not tainted --------------------------------------------- trinity-child1/31784 is trying to acquire lock: (&(&zone->lock)->rlock){-.-.-.}, at: [] suitable_migration_target.isra.15+0x19d/0x1e0 but task is already holding lock: (&(&zone->lock)->rlock){-.-.-.}, at: [] compaction_alloc+0x21b/0x2f0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&zone->lock)->rlock); lock(&(&zone->lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by trinity-child1/31784: #0: (&mm->mmap_sem){++++++}, at: [] vm_mmap_pgoff+0x66/0xb0 #1: (&(&zone->lock)->rlock){-.-.-.}, at: [] compaction_alloc+0x21b/0x2f0 stack backtrace: Pid: 31784, comm: trinity-child1 Not tainted 3.5.0-rc1+ #50 Call Trace: [] __lock_acquire+0x1584/0x1aa0 [] ? trace_hardirqs_off_caller+0x28/0xc0 [] ? local_clock+0x47/0x60 [] lock_acquire+0x92/0x1f0 [] ? suitable_migration_target.isra.15+0x19d/0x1e0 [] ? _raw_spin_lock_irqsave+0x25/0x90 [] _raw_spin_lock_irqsave+0x52/0x90 [] ? suitable_migration_target.isra.15+0x19d/0x1e0 [] suitable_migration_target.isra.15+0x19d/0x1e0 [] compaction_alloc+0x22e/0x2f0 [] migrate_pages+0xc7/0x540 [] ? isolate_freepages_block+0x260/0x260 [] compact_zone+0x216/0x480 [] ? trace_hardirqs_off_caller+0x28/0xc0 [] compact_zone_order+0x8d/0xd0 [] ? get_page_from_freelist+0x565/0x970 [] try_to_compact_pages+0xc9/0x140 [] __alloc_pages_direct_compact+0xaa/0x1d0 Then a bunch of NMI backtraces, and a hard lockup. Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx154.postini.com [74.125.245.154]) by kanga.kvack.org (Postfix) with SMTP id DFD1A6B004D for ; Sun, 3 Jun 2012 14:23:51 -0400 (EDT) Received: by wibhj6 with SMTP id hj6so1817770wib.8 for ; Sun, 03 Jun 2012 11:23:50 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20120603181548.GA306@redhat.com> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> From: Linus Torvalds Date: Sun, 3 Jun 2012 11:23:29 -0700 Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones , Hugh Dickins , Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones wrote: > > Things aren't happy with that patch at all. Yeah, at this point I think we need to just revert the compaction changes. Guys, what's the minimal set of commits to revert? That clearly buggy "rescue_unmovable_pageblock()" function was introduced by commit 5ceb9ce6fe94, but is that actually involved with the particular bug? That commit seems to revert cleanly still, but is that sufficient or does it even matter? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx203.postini.com [74.125.245.203]) by kanga.kvack.org (Postfix) with SMTP id 7F3406B004D for ; Sun, 3 Jun 2012 14:31:47 -0400 (EDT) Date: Sun, 3 Jun 2012 14:31:39 -0400 From: Dave Jones Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120603183139.GA1061@redhat.com> References: <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Hugh Dickins , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Sun, Jun 03, 2012 at 11:23:29AM -0700, Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones wrote: > > > > Things aren't happy with that patch at all. > > Yeah, at this point I think we need to just revert the compaction changes. > > Guys, what's the minimal set of commits to revert? That clearly buggy > "rescue_unmovable_pageblock()" function was introduced by commit > 5ceb9ce6fe94, but is that actually involved with the particular bug? > That commit seems to revert cleanly still, but is that sufficient or > does it even matter? I'll rerun the test with that (and Hugh's last patch) backed out, and see if that makes any difference. Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx165.postini.com [74.125.245.165]) by kanga.kvack.org (Postfix) with SMTP id DA32E6B005C for ; Sun, 3 Jun 2012 16:53:42 -0400 (EDT) Date: Sun, 3 Jun 2012 16:53:32 -0400 From: Dave Jones Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120603205332.GA5412@redhat.com> References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120603183139.GA1061@redhat.com> Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds , Hugh Dickins , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Sun, Jun 03, 2012 at 02:31:39PM -0400, Dave Jones wrote: > On Sun, Jun 03, 2012 at 11:23:29AM -0700, Linus Torvalds wrote: > > On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones wrote: > > > > > > Things aren't happy with that patch at all. > > > > Yeah, at this point I think we need to just revert the compaction changes. > > > > Guys, what's the minimal set of commits to revert? That clearly buggy > > "rescue_unmovable_pageblock()" function was introduced by commit > > 5ceb9ce6fe94, but is that actually involved with the particular bug? > > That commit seems to revert cleanly still, but is that sufficient or > > does it even matter? > > I'l rerun the test with that (and Hugh's last patch) backed out, and see > if that makes any difference. running just over two hours with that commit reverted with no obvious ill effects so far. 
Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx162.postini.com [74.125.245.162]) by kanga.kvack.org (Postfix) with SMTP id B67466B005C for ; Sun, 3 Jun 2012 17:59:44 -0400 (EDT) Received: by wefh52 with SMTP id h52so3320724wef.14 for ; Sun, 03 Jun 2012 14:59:43 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20120603205332.GA5412@redhat.com> References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> From: Linus Torvalds Date: Sun, 3 Jun 2012 14:59:22 -0700 Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones , Linus Torvalds , Hugh Dickins , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Sun, Jun 3, 2012 at 1:53 PM, Dave Jones wrote: > > running just over two hours with that commit reverted with no obvious ill effects so far. And how quickly have you usually seen the problems? Would you have considered two hours "good" in your bisection thing? Also, just to check: Hugh sent out a patch called "mm: fix warning in __set_page_dirty_nobuffers". Is that applied in your tree too, or did the __set_page_dirty_nobuffers() warning go away with just the revert? I'm just trying to figure out exactly what you are testing.
When you said "test with that (and Hugh's last patch) backed out", the "and Hugh's last patch" part was a bit ambiguous. Do you mean the trial patch in this thread (backed out) or do you mean "*with* Hugh's patch for the __set_page_dirty_nobuffers() warning". Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx203.postini.com [74.125.245.203]) by kanga.kvack.org (Postfix) with SMTP id 6B7066B005C for ; Sun, 3 Jun 2012 18:13:36 -0400 (EDT) Date: Sun, 3 Jun 2012 18:13:26 -0400 From: Dave Jones Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120603221326.GA7707@redhat.com> References: <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Hugh Dickins , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Sun, Jun 03, 2012 at 02:59:22PM -0700, Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 1:53 PM, Dave Jones wrote: > > > > running just over two hours with that commit reverted with no obvious ill effects so far. > > And how quickly have you usually seen the problems? Would you have > considered two ours "good" in your bisection thing? Yeah, usually see something go awry in an hour or less. 
> Also, just to check: Hugh sent out a patch called "mm: fix warning in > __set_page_dirty_nobuffers". Is that applied in your tree too, or did > the __set_page_dirty_nobuffers() warning go away with just the revert? That is applied. Otherwise I see the warning he refers to. > I'm just trying to figure out exactly what you are testing. When you > said "test with that (and Hugh's last patch) backed out", the "and > Hugh's last patch" part was a bit ambiguous. Do you mean the trial > patch in this thread (backed out) or do you mean "*with* Hugh's patch > for the __set_page_dirty_nobuffers() warning". The former. (This). --- 3.4.0+/mm/compaction.c 2012-05-30 08:17:19.396008280 -0700 +++ linux/mm/compaction.c 2012-06-01 20:59:56.840204915 -0700 @@ -369,6 +369,8 @@ static bool rescue_unmovable_pageblock(s { unsigned long pfn, start_pfn, end_pfn; struct page *start_page, *end_page; + struct zone *zone; + unsigned long flags; pfn = page_to_pfn(page); start_pfn = pfn & ~(pageblock_nr_pages - 1); @@ -378,7 +380,8 @@ static bool rescue_unmovable_pageblock(s end_page = pfn_to_page(end_pfn); /* Do not deal with pageblocks that overlap zones */ - if (page_zone(start_page) != page_zone(end_page)) + zone = page_zone(start_page); + if (zone != page_zone(end_page)) return false; for (page = start_page, pfn = start_pfn; page < end_page; pfn++, @@ -399,8 +402,10 @@ static bool rescue_unmovable_pageblock(s return false; } + spin_lock_irqsave(&zone->lock, flags); set_pageblock_migratetype(page, MIGRATE_MOVABLE); - move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE); + move_freepages_block(zone, page, MIGRATE_MOVABLE); + spin_unlock_irqrestore(&zone->lock, flags); return true; I do see something else weird going on, but it seems like an unrelated problem. I have a lot of processes hanging after calling sys_renameat. I'll dig some more on that, and post a follow-up. Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx161.postini.com [74.125.245.161]) by kanga.kvack.org (Postfix) with SMTP id EDE486B005C for ; Sun, 3 Jun 2012 18:18:04 -0400 (EDT) Received: by pbbrp2 with SMTP id rp2so6654209pbb.14 for ; Sun, 03 Jun 2012 15:18:04 -0700 (PDT) Date: Sun, 3 Jun 2012 15:17:36 -0700 (PDT) From: Hugh Dickins Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: <20120603205332.GA5412@redhat.com> Message-ID: References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Dave Jones , Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org On Sun, 3 Jun 2012, Dave Jones wrote: > On Sun, Jun 03, 2012 at 02:31:39PM -0400, Dave Jones wrote: > > On Sun, Jun 03, 2012 at 11:23:29AM -0700, Linus Torvalds wrote: > > > On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones wrote: > > > > > > > > Things aren't happy with that patch at all. > > > > > > Yeah, at this point I think we need to just revert the compaction changes. > > > > > > Guys, what's the minimal set of commits to revert? That clearly buggy > > > "rescue_unmovable_pageblock()" function was introduced by commit > > > 5ceb9ce6fe94, but is that actually involved with the particular bug? > > > That commit seems to revert cleanly still, but is that sufficient or > > > does it even matter? 
> > > > I'l rerun the test with that (and Hugh's last patch) backed out, and see > > if that makes any difference. > > running just over two hours with that commit reverted with no obvious ill effects so far. Yes, and I ran happily with precisely that commit reverted on Friday - though I've never got the list corruption that you saw with it in. The locking bug certainly comes in with that commit, it's an isolated commit that reverts cleanly, and I think you got the list corruption rather sooner than two hours before (9min, 30min, 41min from the traces you sent). Maybe we should let you run a little longer, or wait for others to comment. But another strike against that commit: I tried fixing it up to use start_page instead of page at the end, with the worrying but safer locking I suggested at first, with a count of how many times it went there, and how many times it succeeded. While I ran my usual swapping test (perhaps that's a very unfair test to run on this, I've no idea) for seven hours, it went there 25406 times (once per second, it appears) and it succeeded... 0 times. Let's hope it failed quickly each time, I wasn't capturing that. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx123.postini.com [74.125.245.123]) by kanga.kvack.org (Postfix) with SMTP id ACA8F6B005C for ; Sun, 3 Jun 2012 18:30:14 -0400 (EDT) Received: by pbbrp2 with SMTP id rp2so6662068pbb.14 for ; Sun, 03 Jun 2012 15:30:14 -0700 (PDT) Date: Sun, 3 Jun 2012 15:29:46 -0700 (PDT) From: Hugh Dickins Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: Message-ID: References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Sun, 3 Jun 2012, Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 1:53 PM, Dave Jones wrote: > > > > running just over two hours with that commit reverted with no obvious ill effects so far. > > And how quickly have you usually seen the problems? Would you have > considered two ours "good" in your bisection thing? > > Also, just to check: Hugh sent out a patch called "mm: fix warning in > __set_page_dirty_nobuffers". Is that applied in your tree too, or did > the __set_page_dirty_nobuffers() warning go away with just the revert? That patch is good for fixing the __set_page_dirty_nobuffers() warning, but it has no relevance to the list corruption Dave was also reporting, nor vice versa. The common factor there is just Dave. 
And no disaster that the warning fix missed -rc1: it's only a WARN_ON_ONCE, and nothing was wrong beyond the warning itself, just noise. It's true that Dave's original bisection raised the doubt whether that warning is coming from somewhere else too; but best guess at this point is that something got mixed up, and we should only worry about that if we see the warning again once the known fix is in. Hugh > > I'm just trying to figure out exactly what you are testing. When you > said "test with that (and Hugh's last patch) backed out", the "and > Hugh's last patch" part was a bit ambiguous. Do you mean the trial > patch in this thread (backed out) or do you mean "*with* Hugh's patch > for the __set_page_dirty_nobuffers() warning". -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx187.postini.com [74.125.245.187]) by kanga.kvack.org (Postfix) with SMTP id 26A7F6B005C for ; Sun, 3 Jun 2012 19:13:35 -0400 (EDT) Received: by wefh52 with SMTP id h52so3347022wef.14 for ; Sun, 03 Jun 2012 16:13:33 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> From: Linus Torvalds Date: Sun, 3 Jun 2012 16:13:13 -0700 Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus 
Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins wrote: > > But another strike against that commit: I tried fixing it up to use > start_page instead of page at the end, with the worrying but safer > locking I suggested at first, with a count of how many times it went > there, and how many times it succeeded. You can't use start_page anyway, it might not be a valid page. There's a reason it does that "pfn_valid_within()", methinks. Anyway, my current plan is to apply your "mm: fix warning in __set_page_dirty_nobuffers" patch - even if it's just a harmless WARN_ON_ONCE(), and revert 5ceb9ce6fe94. Sounds like Dave normally hit his problem well before two hours, and it must be even longer now. Ack on that plan? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx192.postini.com [74.125.245.192]) by kanga.kvack.org (Postfix) with SMTP id 56A5A6B005C for ; Sun, 3 Jun 2012 20:46:16 -0400 (EDT) Received: by qafl39 with SMTP id l39so1587324qaf.9 for ; Sun, 03 Jun 2012 17:46:15 -0700 (PDT) Message-ID: <4FCC0553.80100@gmail.com> Date: Sun, 03 Jun 2012 20:46:11 -0400 From: KOSAKI Motohiro MIME-Version: 1.0 Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Hugh Dickins , Dave Jones ,
Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kosaki.motohiro@gmail.com (6/3/12 7:13 PM), Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins wrote: >> >> But another strike against that commit: I tried fixing it up to use >> start_page instead of page at the end, with the worrying but safer >> locking I suggested at first, with a count of how many times it went >> there, and how many times it succeeded. > > You can't use start_page anyway, it might not be a valid page. There's > a reson it does that "pfn_valid_within()", methinks. Right. ia64 has strange^H^H^H^H optimized pfn_valid and we need care it. (btw, I don't understand why mips may enable CONFIG_HOLES_INZONE, mips doesn't have custom pfn_valid) > Anyway, my current plan is to apply your "mm: fix warning in > __set_page_dirty_nobuffers" patch - even if it's just a harmless > WARN_ON_ONCE(), and revert 5ceb9ce6fe94. Sounds like Dave hit normally > hit his problem much before two hours, and it must be even longer now. > > Ack on that plan? +1. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx172.postini.com [74.125.245.172]) by kanga.kvack.org (Postfix) with SMTP id B16EF6B005C for ; Sun, 3 Jun 2012 21:10:24 -0400 (EDT) Message-ID: <4FCC0B09.1070708@kernel.org> Date: Mon, 04 Jun 2012 10:10:33 +0900 From: Minchan Kim MIME-Version: 1.0 Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On 06/02/2012 01:40 PM, Hugh Dickins wrote: > On Fri, 1 Jun 2012, Linus Torvalds wrote: >> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins wrote: >>> >>> + spin_lock_irqsave(&zone->lock, flags); >>> for (page = start_page, pfn = start_pfn; page < end_page; pfn++, >>> page++) { >> >> So holding the spinlock (and disabling irqs!) over the whole loop >> sounds horrible. > > There looks to be a pretty similar loop inside move_freepages_block(), > which is the part which I believe really needs the lock - it's moving > free pages from one lru to another. > >> >> At the same time, the iterators don't seem to require the spinlock, so >> it should be possible to just move the lock into the loop, no? > > Move the lock after the loop, I think you meant. 
> > I put the lock before the loop because it's deciding whether it can > usefully proceed, and then proceeding: I was thinking that the lock > would stabilize the conditions that it bases that decision on. We do it with two phase. In first phase, we don't need lock because we don't need to be exact. In second phase where move pages really, we need a lock so we already hold it. ret = suitable_migration_target(page, cc); .. .. spin_lock_irqsave(&zone->lock, flags); ret = suitable_migration_target(page, cc); So you shouldn't put the lock in loop. > > But it certainly does not stabilize all of them (most obviously not > PageLRU), so I'm guesssing that this is a best-effort decision which > can safely go wrong some of the time. Right. > > In which case, yes, much better to follow your suggestion, and hold > the lock (with irqs disabled) for only half the time. > > Similarly untested patch below. > > But I'm entirely unfamiliar with this code: best Cc people more familiar > with it. Does this addition of locking to rescue_unmovable_pageblock() > look correct to you, and do you think it has a good chance of fixing the No.I think we need to use start_page instead of page and we need a last page of page block to check cross-over zones, not first page in next page block. I should have reviewed more carefully. 
:( barrios@bbox:~/linux-2.6$ git diff diff --git a/mm/compaction.c b/mm/compaction.c index 4ac338a..b3fcc4b 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page) pfn = page_to_pfn(page); start_pfn = pfn & ~(pageblock_nr_pages - 1); - end_pfn = start_pfn + pageblock_nr_pages; + end_pfn = start_pfn + pageblock_nr_pages - 1; start_page = pfn_to_page(start_pfn); end_page = pfn_to_page(end_pfn); @@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page) if (page_zone(start_page) != page_zone(end_page)) return false; - for (page = start_page, pfn = start_pfn; page < end_page; pfn++, + for (page = start_page, pfn = start_pfn; page <= end_page; pfn++, page++) { if (!pfn_valid_within(pfn)) continue; @@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page) return false; } - set_pageblock_migratetype(page, MIGRATE_MOVABLE); - move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE); + set_pageblock_migratetype(start_page, MIGRATE_MOVABLE); + move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE); return true; } Hugh, thanks for looking this. > move_freepages_block() list debug warnings which Dave has been reporting > (in this and in another thread)? > > (Although there's still something of a mystery in where Dave's bisection > appeared to converge, our best assumption at present is that one of my > tmpfs changes is to blame for the __set_page_dirty_nobuffers warnings, > and I need to send a finalized patch to fix that later. > > I'm guessing that the few people who see the warning are those running > new systemd distros, and that systemd is indeed now making use of the > fallocate support we added into tmpfs for it.) 
> > Hugh > > --- 3.4.0+/mm/compaction.c 2012-05-30 08:17:19.396008280 -0700 > +++ linux/mm/compaction.c 2012-06-01 20:59:56.840204915 -0700 > @@ -369,6 +369,8 @@ static bool rescue_unmovable_pageblock(s > { > unsigned long pfn, start_pfn, end_pfn; > struct page *start_page, *end_page; > + struct zone *zone; > + unsigned long flags; > > pfn = page_to_pfn(page); > start_pfn = pfn & ~(pageblock_nr_pages - 1); > @@ -378,7 +380,8 @@ static bool rescue_unmovable_pageblock(s > end_page = pfn_to_page(end_pfn); > > /* Do not deal with pageblocks that overlap zones */ > - if (page_zone(start_page) != page_zone(end_page)) > + zone = page_zone(start_page); > + if (zone != page_zone(end_page)) > return false; > > for (page = start_page, pfn = start_pfn; page < end_page; pfn++, > @@ -399,8 +402,10 @@ static bool rescue_unmovable_pageblock(s > return false; > } > > + spin_lock_irqsave(&zone->lock, flags); > set_pageblock_migratetype(page, MIGRATE_MOVABLE); > - move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE); > + move_freepages_block(zone, page, MIGRATE_MOVABLE); > + spin_unlock_irqrestore(&zone->lock, flags); > return true; > } > -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx171.postini.com [74.125.245.171]) by kanga.kvack.org (Postfix) with SMTP id 680E16B005C for ; Sun, 3 Jun 2012 21:19:09 -0400 (EDT) Received: by dakp5 with SMTP id p5so6586211dak.14 for ; Sun, 03 Jun 2012 18:19:08 -0700 (PDT) Date: Sun, 3 Jun 2012 18:18:39 -0700 (PDT) From: Hugh Dickins Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: Message-ID: References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Sun, 3 Jun 2012, Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins wrote: > > > > But another strike against that commit: I tried fixing it up to use > > start_page instead of page at the end, with the worrying but safer > > locking I suggested at first, with a count of how many times it went > > there, and how many times it succeeded. > > You can't use start_page anyway, it might not be a valid page. There's > a reson it does that "pfn_valid_within()", methinks. You wouldn't want me to say that I think you're right, it would impudently suggest that I might conceive of you being wrong. I sigh for your heavy burden. > > Anyway, my current plan is to apply your "mm: fix warning in > __set_page_dirty_nobuffers" patch - even if it's just a harmless > WARN_ON_ONCE(), and revert 5ceb9ce6fe94. 
Sounds like Dave hit normally > hit his problem much before two hours, and it must be even longer now. > > Ack on that plan? Sure, ack from me on that plan. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx182.postini.com [74.125.245.182]) by kanga.kvack.org (Postfix) with SMTP id 138ED6B005D for ; Sun, 3 Jun 2012 21:21:42 -0400 (EDT) Message-ID: <4FCC0DB4.30106@kernel.org> Date: Mon, 04 Jun 2012 10:21:56 +0900 From: Minchan Kim MIME-Version: 1.0 Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Linus Torvalds Cc: Hugh Dickins , Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On 06/04/2012 08:13 AM, Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins wrote: >> >> But another strike against that commit: I tried fixing it up to use >> start_page instead of page at the end, with the worrying but safer >> locking I suggested at first, with a count of how many times it went >> there, and how many times it succeeded. > > You can't use start_page anyway, it might not be a valid page. There's > a reson it does that "pfn_valid_within()", methinks. Right. I missed that. 
I think we can use the page passed to rescue_unmovable_pageblock. We make sure it's valid in isolate_freepages. So how about this? barrios@bbox:~/linux-2.6$ git diff diff --git a/mm/compaction.c b/mm/compaction.c index 4ac338a..7459ab5 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -368,11 +368,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, static bool rescue_unmovable_pageblock(struct page *page) { unsigned long pfn, start_pfn, end_pfn; - struct page *start_page, *end_page; + struct page *start_page, *end_page, *cursor_page; pfn = page_to_pfn(page); start_pfn = pfn & ~(pageblock_nr_pages - 1); - end_pfn = start_pfn + pageblock_nr_pages; + end_pfn = start_pfn + pageblock_nr_pages - 1; start_page = pfn_to_page(start_pfn); end_page = pfn_to_page(end_pfn); @@ -381,19 +381,19 @@ static bool rescue_unmovable_pageblock(struct page *page) if (page_zone(start_page) != page_zone(end_page)) return false; - for (page = start_page, pfn = start_pfn; page < end_page; pfn++, - page++) { + for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page; pfn++, + cursor_page++) { if (!pfn_valid_within(pfn)) continue; - if (PageBuddy(page)) { - int order = page_order(page); + if (PageBuddy(cursor_page)) { + int order = page_order(cursor_page); pfn += (1 << order) - 1; - page += (1 << order) - 1; + cursor_page += (1 << order) - 1; continue; - } else if (page_count(page) == 0 || PageLRU(page)) + } else if (page_count(cursor_page) == 0 || PageLRU(cursor_page)) continue; return false; > > Anyway, my current plan is to apply your "mm: fix warning in > __set_page_dirty_nobuffers" patch - even if it's just a harmless > WARN_ON_ONCE(), and revert 5ceb9ce6fe94. Sounds like Dave hit normally > hit his problem much before two hours, and it must be even longer now. > > Ack on that plan? No objection. The patch wasn't a bug fix and even test workload was very theoretical. 
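For anyone following the end_pfn change in the diff above, here is a minimal userspace sketch of the pageblock boundary arithmetic. It is not kernel code. PAGEBLOCK_NR_PAGES = 512 is an assumed value for illustration only (the kernel's pageblock_nr_pages depends on architecture and configuration), and block_start(), block_last() and same_block() are hypothetical helpers that do not exist in mm/compaction.c.

```c
#include <assert.h>

/* Assumed pageblock size for illustration; must be a power of two. */
#define PAGEBLOCK_NR_PAGES 512UL

/* First pfn of the pageblock that contains pfn (same mask as in the diff). */
static unsigned long block_start(unsigned long pfn)
{
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* Last pfn that still belongs to that pageblock (inclusive bound). */
static unsigned long block_last(unsigned long pfn)
{
	return block_start(pfn) + PAGEBLOCK_NR_PAGES - 1;
}

/* Nonzero when both pfns fall into the same pageblock. */
static int same_block(unsigned long a, unsigned long b)
{
	return block_start(a) == block_start(b);
}

/* Hypothetical self-check that spells out the off-by-one. */
static void check_block_bounds(void)
{
	unsigned long start = block_start(1300);

	/* start + nr - 1 is still inside the block ... */
	assert(same_block(start, block_last(1300)));
	/* ... but start + nr is the first pfn of the next block. */
	assert(!same_block(start, start + PAGEBLOCK_NR_PAGES));
}
```

Under these assumptions pfn 1300 lies in the block [1024, 1535], while start_pfn + pageblock_nr_pages (1536) already belongs to the next block. That is why the zone comparison wants the "- 1" and the loop then needs "<=" to still visit the last page.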
> > Linus > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ > Don't email: email@kvack.org > -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx200.postini.com [74.125.245.200]) by kanga.kvack.org (Postfix) with SMTP id 8D2CD6B0069 for ; Sun, 3 Jun 2012 21:27:00 -0400 (EDT) Received: by pbbrp2 with SMTP id rp2so6799403pbb.14 for ; Sun, 03 Jun 2012 18:26:59 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <4FCC0DB4.30106@kernel.org> References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> <4FCC0DB4.30106@kernel.org> From: KOSAKI Motohiro Date: Sun, 3 Jun 2012 21:26:39 -0400 Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Linus Torvalds , Hugh Dickins , Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org > Right. I missed that. I think we can use the page passed to rescue_unmovable_pageblock. > We make sure it's valid in isolate_freepages. So how about this?
>
> barrios@bbox:~/linux-2.6$ git diff
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4ac338a..7459ab5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -368,11 +368,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>  static bool rescue_unmovable_pageblock(struct page *page)
>  {
>         unsigned long pfn, start_pfn, end_pfn;
> -       struct page *start_page, *end_page;
> +       struct page *start_page, *end_page, *cursor_page;
>
>         pfn = page_to_pfn(page);
>         start_pfn = pfn & ~(pageblock_nr_pages - 1);
> -       end_pfn = start_pfn + pageblock_nr_pages;
> +       end_pfn = start_pfn + pageblock_nr_pages - 1;
>
>         start_page = pfn_to_page(start_pfn);
>         end_page = pfn_to_page(end_pfn);
> @@ -381,19 +381,19 @@ static bool rescue_unmovable_pageblock(struct page *page)
>         if (page_zone(start_page) != page_zone(end_page))
>                 return false;
>
> -       for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
> -                                                               page++) {
> +       for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page; pfn++,
> +                                                               cursor_page++) {
>                 if (!pfn_valid_within(pfn))
>                         continue;

I guess page_zone() should be used after pfn_valid_within(). Why can we
assume invalid pfn return correct zone?

> -               if (PageBuddy(page)) {
> -                       int order = page_order(page);
> +               if (PageBuddy(cursor_page)) {
> +                       int order = page_order(cursor_page);
>
>                         pfn += (1 << order) - 1;
> -                       page += (1 << order) - 1;
> +                       cursor_page += (1 << order) - 1;
>
>                         continue;
> -               } else if (page_count(page) == 0 || PageLRU(page))
> +               } else if (page_count(cursor_page) == 0 || PageLRU(cursor_page))
>                         continue;
>
>                 return false;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx134.postini.com [74.125.245.134]) by kanga.kvack.org (Postfix) with SMTP id 940786B0069 for ; Sun, 3 Jun 2012 21:41:46 -0400 (EDT) Received: by pbbrp2 with SMTP id rp2so6814253pbb.14 for ; Sun, 03 Jun 2012 18:41:45 -0700 (PDT) Date: Sun, 3 Jun 2012 18:41:21 -0700 (PDT) From: Hugh Dickins Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: <4FCC0B09.1070708@kernel.org> Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <4FCC0B09.1070708@kernel.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On Mon, 4 Jun 2012, Minchan Kim wrote: > On 06/02/2012 01:40 PM, Hugh Dickins wrote: > > > On Fri, 1 Jun 2012, Linus Torvalds wrote: > >> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins wrote: > >>> > >>> + spin_lock_irqsave(&zone->lock, flags); > >>> for (page = start_page, pfn = start_pfn; page < end_page; pfn++, > >>> page++) { > >> > >> So holding the spinlock (and disabling irqs!) over the whole loop > >> sounds horrible. > > > > There looks to be a pretty similar loop inside move_freepages_block(), > > which is the part which I believe really needs the lock - it's moving > > free pages from one lru to another. > > > >> > >> At the same time, the iterators don't seem to require the spinlock, so > >> it should be possible to just move the lock into the loop, no? 
> > > > Move the lock after the loop, I think you meant. > > > > I put the lock before the loop because it's deciding whether it can > > usefully proceed, and then proceeding: I was thinking that the lock > > would stabilize the conditions that it bases that decision on. > > > We do it with two phase. > In first phase, we don't need lock because we don't need to be exact. > In second phase where move pages really, we need a lock so we already hold it. No, see Linus's point elsewhere in this thread. To spell it out further, page_order(page) uses page_private(page), and you've no idea what someone might put into page_private(page) once it's no longer PageBuddy but perhaps allocated to a user. So the unlocked advancment by page_order(page) may even take you way out of this or any pageblock. Linus was suggesting to take and drop the lock around that little block each time. Maybe. I'm wary, I don't pretend to have thought it through (nor shall further). > > ret = suitable_migration_target(page, cc); > .. > .. > spin_lock_irqsave(&zone->lock, flags); > ret = suitable_migration_target(page, cc); > > So you shouldn't put the lock in loop. > > > > > But it certainly does not stabilize all of them (most obviously not > > PageLRU), so I'm guesssing that this is a best-effort decision which > > > can safely go wrong some of the time. > > Right. > > > > > In which case, yes, much better to follow your suggestion, and hold > > the lock (with irqs disabled) for only half the time. > > > > Similarly untested patch below. > > > > But I'm entirely unfamiliar with this code: best Cc people more familiar > > with it. Does this addition of locking to rescue_unmovable_pageblock() > > look correct to you, and do you think it has a good chance of fixing the > > > No.I think we need to use start_page instead of page and I thought so, but Linus points out why not (pfn_valid_within). > we need a last page of page block to check cross-over zones, > not first page in next page block. 
Yes, that's the off-by-one I was alluding to. > > I should have reviewed more carefully. :( > > barrios@bbox:~/linux-2.6$ git diff > diff --git a/mm/compaction.c b/mm/compaction.c > index 4ac338a..b3fcc4b 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page) > > pfn = page_to_pfn(page); > start_pfn = pfn & ~(pageblock_nr_pages - 1); > - end_pfn = start_pfn + pageblock_nr_pages; > + end_pfn = start_pfn + pageblock_nr_pages - 1; Yes. > > start_page = pfn_to_page(start_pfn); > end_page = pfn_to_page(end_pfn); > @@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page) > if (page_zone(start_page) != page_zone(end_page)) > return false; > > - for (page = start_page, pfn = start_pfn; page < end_page; pfn++, > + for (page = start_page, pfn = start_pfn; page <= end_page; pfn++, > page++) { Yes. > if (!pfn_valid_within(pfn)) > continue; > @@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page) > return false; > } > > - set_pageblock_migratetype(page, MIGRATE_MOVABLE); > - move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE); > + set_pageblock_migratetype(start_page, MIGRATE_MOVABLE); > + move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE); No. I guess we can assume the incoming page was valid (fair?), so should still use that, but something else for the loop iterator. And you seem to have missed out all the locking needed. > return true; > } So Nack to that on several grounds. And I'd like to hear evidence that this really is useful code, justifying the locking and interrupt-disabling which would have to be added. My 0 out of 25000 was not reassuring. Nor the original test results, when it was doing completely the wrong thing unnoticed. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx189.postini.com [74.125.245.189]) by kanga.kvack.org (Postfix) with SMTP id 9B2176B005C for ; Sun, 3 Jun 2012 21:47:29 -0400 (EDT) Received: by qcsd16 with SMTP id d16so2466429qcs.14 for ; Sun, 03 Jun 2012 18:47:28 -0700 (PDT) Message-ID: <4FCC13AC.3070005@gmail.com> Date: Sun, 03 Jun 2012 21:47:24 -0400 From: KOSAKI Motohiro MIME-Version: 1.0 Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <4FCC0B09.1070708@kernel.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Minchan Kim , Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kosaki.motohiro@gmail.com >> - set_pageblock_migratetype(page, MIGRATE_MOVABLE); >> - move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE); >> + set_pageblock_migratetype(start_page, MIGRATE_MOVABLE); >> + move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE); > > No. I guess we can assume the incoming page was valid (fair?), > so should still use that, but something else for the loop iterator. Fair. passed page is always valid. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx194.postini.com [74.125.245.194]) by kanga.kvack.org (Postfix) with SMTP id 3E9016B005C for ; Sun, 3 Jun 2012 22:28:40 -0400 (EDT) Message-ID: <4FCC1D68.8060406@kernel.org> Date: Mon, 04 Jun 2012 11:28:56 +0900 From: Minchan Kim MIME-Version: 1.0 Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <4FCC0B09.1070708@kernel.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Hugh Dickins Cc: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org On 06/04/2012 10:41 AM, Hugh Dickins wrote: > On Mon, 4 Jun 2012, Minchan Kim wrote: >> On 06/02/2012 01:40 PM, Hugh Dickins wrote: >> >>> On Fri, 1 Jun 2012, Linus Torvalds wrote: >>>> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins wrote: >>>>> >>>>> + spin_lock_irqsave(&zone->lock, flags); >>>>> for (page = start_page, pfn = start_pfn; page < end_page; pfn++, >>>>> page++) { >>>> >>>> So holding the spinlock (and disabling irqs!) over the whole loop >>>> sounds horrible. >>> >>> There looks to be a pretty similar loop inside move_freepages_block(), >>> which is the part which I believe really needs the lock - it's moving >>> free pages from one lru to another. >>> >>>> >>>> At the same time, the iterators don't seem to require the spinlock, so >>>> it should be possible to just move the lock into the loop, no? 
>>>
>>> Move the lock after the loop, I think you meant.
>>>
>>> I put the lock before the loop because it's deciding whether it can
>>> usefully proceed, and then proceeding: I was thinking that the lock
>>> would stabilize the conditions that it bases that decision on.
>>
>>
>> We do it with two phase.
>> In first phase, we don't need lock because we don't need to be exact.
>> In second phase where move pages really, we need a lock so we already hold it.
>
> No, see Linus's point elsewhere in this thread.
>
> To spell it out further, page_order(page) uses page_private(page),
> and you've no idea what someone might put into page_private(page)
> once it's no longer PageBuddy but perhaps allocated to a user.
>
> So the unlocked advancment by page_order(page) may even take you
> way out of this or any pageblock.
>
> Linus was suggesting to take and drop the lock around that little
> block each time. Maybe. I'm wary, I don't pretend to have thought
> it through (nor shall further).

Right. I got confused because suitable_migration_target did the
rescue_unmovable_pageblock. I don't want that. I'd like to separate the
test, which just checks whether it's migratable or not, from the work
that really does the rescue.

So I think something like the following would be better.

	if (!suitable_migration_target())
		continue;

	spin_lock_irqsave(&zone->lock, flags);
	if (ret = suitable_migration_target()) {
		if (ret == CAN_MAKE_MOVABLE_PAGE_BLOCK)
			rescue_unmovable_pageblock();
		isolate_freepages_block();
	}

>
>>
>> ret = suitable_migration_target(page, cc);
>> ..
>> ..
>> spin_lock_irqsave(&zone->lock, flags);
>> ret = suitable_migration_target(page, cc);
>>
>> So you shouldn't put the lock in loop.
>>
>>>
>>> But it certainly does not stabilize all of them (most obviously not
>>> PageLRU), so I'm guesssing that this is a best-effort decision which
>>
>>> can safely go wrong some of the time.
>>
>> Right.
>>
>>>
>>> In which case, yes, much better to follow your suggestion, and hold
>>> the lock (with irqs disabled) for only half the time.
>>>
>>> Similarly untested patch below.
>>>
>>> But I'm entirely unfamiliar with this code: best Cc people more familiar
>>> with it.  Does this addition of locking to rescue_unmovable_pageblock()
>>> look correct to you, and do you think it has a good chance of fixing the
>>
>>
>> No. I think we need to use start_page instead of page and
>
> I thought so, but Linus points out why not (pfn_valid_within).
>
>> we need a last page of page block to check cross-over zones,
>> not first page in next page block.
>
> Yes, that's the off-by-one I was alluding to.
>
>>
>> I should have reviewed more carefully. :(
>>
>> barrios@bbox:~/linux-2.6$ git diff
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 4ac338a..b3fcc4b 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
>>
>>  	pfn = page_to_pfn(page);
>>  	start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> -	end_pfn = start_pfn + pageblock_nr_pages;
>> +	end_pfn = start_pfn + pageblock_nr_pages - 1;
>
> Yes.
>
>>
>>  	start_page = pfn_to_page(start_pfn);
>>  	end_page = pfn_to_page(end_pfn);
>> @@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
>>  	if (page_zone(start_page) != page_zone(end_page))
>>  		return false;
>>
>> -	for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
>> +	for (page = start_page, pfn = start_pfn; page <= end_page; pfn++,
>>  								  page++) {
>
> Yes.
>
>>  		if (!pfn_valid_within(pfn))
>>  			continue;
>> @@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page)
>>  		return false;
>>  	}
>>
>> -	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>> -	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
>> +	set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
>> +	move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE);
>
> No.  I guess we can assume the incoming page was valid (fair?),
> so should still use that, but something else for the loop iterator.

It should be fair. I did that in the following mail.

>
> And you seem to have missed out all the locking needed.
>
>>  	return true;
>>  }
>
> So Nack to that on several grounds.
>
> And I'd like to hear evidence that this really is useful code,
> justifying the locking and interrupt-disabling which would have to
> be added.  My 0 out of 25000 was not reassuring.  Nor the original
> test results, when it was doing completely the wrong thing unnoticed.

In the changelog, Bartlomiej said:

    My particular test case (on a ARM EXYNOS4 device with 512 MiB, which means
    131072 standard 4KiB pages in 'Normal' zone) is to:

    - allocate 120000 pages for kernel's usage
    - free every second page (60000 pages) of memory just allocated
    - allocate and use 60000 pages from user space
    - free remaining 60000 pages of kernel memory
      (now we have fragmented memory occupied mostly by user space pages)
    - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage

    The results:
    - with compaction disabled I get 11 successful allocations
    - with compaction enabled - 14 successful allocations
    - with this patch I'm able to get all 100 successful allocations

I think the above workload is really artificial and theoretical, so I didn't like
this patch, but Mel seems to like it. :(

Quote from Mel
" Ok, that is indeed an adverse workload that the current system will not
properly deal with.
I think you are right to try fixing this but may need
a different approach that takes the cost out of the allocation/free path
and moves it to the compaction path."

We can fix this patch so that it works, but we at least need a justification for it.
Do we really need this patch for such an artificial workload?
What do you think?

--
Kind regards,
Minchan Kim

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx201.postini.com [74.125.245.201])
	by kanga.kvack.org (Postfix) with SMTP id 602B66B0062
	for ; Sun, 3 Jun 2012 22:30:23 -0400 (EDT)
Message-ID: <4FCC1DD0.8090003@kernel.org>
Date: Mon, 04 Jun 2012 11:30:40 +0900
From: Minchan Kim
MIME-Version: 1.0
Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> <4FCC0DB4.30106@kernel.org>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: KOSAKI Motohiro
Cc: Linus Torvalds, Hugh Dickins, Dave Jones, Bartlomiej Zolnierkiewicz, Kyungmin Park, Marek Szyprowski, Mel Gorman, Rik van Riel, Andrew Morton, Cong Wang, Markus Trippelsdorf, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On 06/04/2012 10:26 AM, KOSAKI Motohiro wrote:

>> Right. I missed that. I think we can use the page passed to rescue_unmovable_pageblock.
>> We make sure it's valid in isolate_freepages. So how about this?
>>
>> barrios@bbox:~/linux-2.6$ git diff
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 4ac338a..7459ab5 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -368,11 +368,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
>>  static bool rescue_unmovable_pageblock(struct page *page)
>>  {
>>  	unsigned long pfn, start_pfn, end_pfn;
>> -	struct page *start_page, *end_page;
>> +	struct page *start_page, *end_page, *cursor_page;
>>
>>  	pfn = page_to_pfn(page);
>>  	start_pfn = pfn & ~(pageblock_nr_pages - 1);
>> -	end_pfn = start_pfn + pageblock_nr_pages;
>> +	end_pfn = start_pfn + pageblock_nr_pages - 1;
>>
>>  	start_page = pfn_to_page(start_pfn);
>>  	end_page = pfn_to_page(end_pfn);
>> @@ -381,19 +381,19 @@ static bool rescue_unmovable_pageblock(struct page *page)
>>  	if (page_zone(start_page) != page_zone(end_page))
>>  		return false;
>>
>> -	for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
>> -								  page++) {
>> +	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page; pfn++,
>> +								  cursor_page++) {
>>  		if (!pfn_valid_within(pfn))
>>  			continue;
>
> I guess page_zone() should be used after pfn_valid_within(). Why can
> we assume invalid
> pfn return correct zone?

Right you are.
We can't be sure of that in the CONFIG_HOLES_IN_ZONE case.

--
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
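
The off-by-one discussed in the mail above can be sketched in userspace C.
This is only an illustration under stated assumptions, not kernel code: the
constant SKETCH_PAGEBLOCK_NR_PAGES and the sketch_* helper names are
hypothetical, and the real pageblock_nr_pages depends on the kernel
configuration. The mask only works when the pageblock size is a power of two.

```c
#include <assert.h>

/* Assumed pageblock size, for illustration only; must be a power of two. */
#define SKETCH_PAGEBLOCK_NR_PAGES 512UL

/* First pfn of the pageblock containing pfn (same mask as in the diff). */
static unsigned long sketch_block_start(unsigned long pfn)
{
	/* The mask trick is only valid for a power-of-two block size. */
	assert((SKETCH_PAGEBLOCK_NR_PAGES & (SKETCH_PAGEBLOCK_NR_PAGES - 1)) == 0);
	return pfn & ~(SKETCH_PAGEBLOCK_NR_PAGES - 1);
}

/* Last pfn that still belongs to that pageblock (inclusive bound). */
static unsigned long sketch_block_last(unsigned long pfn)
{
	return sketch_block_start(pfn) + SKETCH_PAGEBLOCK_NR_PAGES - 1;
}

/* Count the pfns visited by an inclusive walk over the pageblock. */
static unsigned long sketch_block_walk(unsigned long pfn)
{
	unsigned long last = sketch_block_last(pfn);
	unsigned long cur, visited = 0;

	for (cur = sketch_block_start(pfn); cur <= last; cur++)
		visited++;
	return visited;
}
```

With the assumed 512-page pageblock, pfn 1000 falls in the block 512..1023.
The value start + size (1024) is already the first page of the next
pageblock, which is why comparing zones against it can give the wrong answer
at a zone boundary.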
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx186.postini.com [74.125.245.186]) by kanga.kvack.org (Postfix) with SMTP id 3054E6B005C for ; Mon, 4 Jun 2012 00:21:40 -0400 (EDT) Received: by qafl39 with SMTP id l39so1651393qaf.9 for ; Sun, 03 Jun 2012 21:21:39 -0700 (PDT) Message-ID: <4FCC37CE.3080203@gmail.com> Date: Mon, 04 Jun 2012 00:21:34 -0400 From: KOSAKI Motohiro MIME-Version: 1.0 Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <4FCC0B09.1070708@kernel.org> <4FCC1D68.8060406@kernel.org> In-Reply-To: <4FCC1D68.8060406@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Hugh Dickins , Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kosaki.motohiro@gmail.com > In changelog, Bartlomiej said. 
>
>     My particular test case (on a ARM EXYNOS4 device with 512 MiB, which means
>     131072 standard 4KiB pages in 'Normal' zone) is to:
>
>     - allocate 120000 pages for kernel's usage
>     - free every second page (60000 pages) of memory just allocated
>     - allocate and use 60000 pages from user space
>     - free remaining 60000 pages of kernel memory
>       (now we have fragmented memory occupied mostly by user space pages)
>     - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
>
>     The results:
>     - with compaction disabled I get 11 successful allocations
>     - with compaction enabled - 14 successful allocations
>     - with this patch I'm able to get all 100 successful allocations
>
> I think the above workload is really artificial and theoretical, so I didn't like
> this patch, but Mel seems to like it. :(
>
> Quote from Mel
> " Ok, that is indeed an adverse workload that the current system will not
> properly deal with. I think you are right to try fixing this but may need
> a different approach that takes the cost out of the allocation/free path
> and moves it to the compaction path."
>
> We can fix this patch so that it works, but we at least need a justification for it.
> Do we really need this patch for such an artificial workload?
> What do you think?

I'm OK with resubmitting, but please use a new thread.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
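
The page arithmetic in the quoted test case can be checked with a few lines
of C. This is a minimal sketch, assuming the 4 KiB page size stated in the
changelog; the SKETCH_PAGE_SIZE_KIB constant and the helper names are made up
for this example and are not kernel APIs.

```c
#include <assert.h>

/* Page size assumed from the changelog ("standard 4KiB pages"). */
#define SKETCH_PAGE_SIZE_KIB 4UL

/* Number of pages in the given amount of memory, expressed in MiB. */
static unsigned long sketch_pages_in_mib(unsigned long mib)
{
	return mib * 1024UL / SKETCH_PAGE_SIZE_KIB;
}

/* Number of base pages in one allocation of the given order. */
static unsigned long sketch_pages_in_order(unsigned int order)
{
	assert(order < sizeof(unsigned long) * 8);
	return 1UL << order;
}

/* Size in KiB of one allocation of the given order. */
static unsigned long sketch_order_size_kib(unsigned int order)
{
	return sketch_pages_in_order(order) * SKETCH_PAGE_SIZE_KIB;
}
```

sketch_pages_in_mib(512) gives 131072 and sketch_order_size_kib(9) gives
2048, matching the figures in the changelog. The 100 order-9 allocations
together need 100 * 512 = 51200 contiguous-in-blocks pages out of those
131072.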
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx193.postini.com [74.125.245.193]) by kanga.kvack.org (Postfix) with SMTP id 78CC86B0068 for ; Mon, 4 Jun 2012 09:38:48 -0400 (EDT) Received: from euspt1 (mailout4.w1.samsung.com [210.118.77.14]) by mailout4.w1.samsung.com (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP id <0M530068DHXSLH10@mailout4.w1.samsung.com> for linux-mm@kvack.org; Mon, 04 Jun 2012 14:39:28 +0100 (BST) Received: from linux.samsung.com ([106.116.38.10]) by spt1.w1.samsung.com (iPlanet Messaging Server 5.2 Patch 2 (built Jul 14 2004)) with ESMTPA id <0M5300JI5HWLIS@spt1.w1.samsung.com> for linux-mm@kvack.org; Mon, 04 Jun 2012 14:38:46 +0100 (BST) Date: Mon, 04 Jun 2012 15:37:30 +0200 From: Bartlomiej Zolnierkiewicz Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-reply-to: <4FCC1D68.8060406@kernel.org> Message-id: <201206041537.30787.b.zolnierkie@samsung.com> MIME-version: 1.0 Content-type: Text/Plain; charset=iso-8859-1 Content-transfer-encoding: 7BIT References: <20120530163317.GA13189@redhat.com> <4FCC1D68.8060406@kernel.org> Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Hugh Dickins , Linus Torvalds , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Hi, On Monday 04 June 2012 04:28:56 Minchan Kim wrote: > On 06/04/2012 10:41 AM, Hugh Dickins wrote: > > > On Mon, 4 Jun 2012, Minchan Kim wrote: > >> On 06/02/2012 01:40 PM, Hugh Dickins wrote: > >> > >>> On Fri, 1 Jun 2012, Linus Torvalds wrote: > >>>> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins wrote: > >>>>> > >>>>> + spin_lock_irqsave(&zone->lock, flags); > >>>>> for (page = start_page, pfn = start_pfn; page < 
end_page; pfn++, > >>>>> page++) { > >>>> > >>>> So holding the spinlock (and disabling irqs!) over the whole loop > >>>> sounds horrible. > >>> > >>> There looks to be a pretty similar loop inside move_freepages_block(), > >>> which is the part which I believe really needs the lock - it's moving > >>> free pages from one lru to another. > >>> > >>>> > >>>> At the same time, the iterators don't seem to require the spinlock, so > >>>> it should be possible to just move the lock into the loop, no? > >>> > >>> Move the lock after the loop, I think you meant. > >>> > >>> I put the lock before the loop because it's deciding whether it can > >>> usefully proceed, and then proceeding: I was thinking that the lock > >>> would stabilize the conditions that it bases that decision on. > >> > >> > >> We do it with two phase. > >> In first phase, we don't need lock because we don't need to be exact. > >> In second phase where move pages really, we need a lock so we already hold it. > > > > No, see Linus's point elsewhere in this thread. > > > > To spell it out further, page_order(page) uses page_private(page), > > and you've no idea what someone might put into page_private(page) > > once it's no longer PageBuddy but perhaps allocated to a user. > > > > So the unlocked advancment by page_order(page) may even take you > > way out of this or any pageblock. > > > > Linus was suggesting to take and drop the lock around that little > > block each time. Maybe. I'm wary, I don't pretend to have thought > > it through (nor shall further). > > > Right. > I got confused because suitable_migration_target did rescure_unmovable_pageblock. I don't want it. > I hope separating test which does just check whether it's migratable or not and working > which really does rescue. > So I think it would be better following as. 
> > if (!suitable_migration_target()) > continue; > > spin_lock_irqsave(&zone->lock, flags); > if (ret = suitable_migration_target()) { > if (ret == CAN_MAKE_MOVABLE_PAGE_BLOCK) > rescure_unmoable_pageblock() > isolate_freepages_block(); > } > > > > >> > >> ret = suitable_migration_target(page, cc); > >> .. > >> .. > >> spin_lock_irqsave(&zone->lock, flags); > >> ret = suitable_migration_target(page, cc); > >> > >> So you shouldn't put the lock in loop. > >> > >>> > >>> But it certainly does not stabilize all of them (most obviously not > >>> PageLRU), so I'm guesssing that this is a best-effort decision which > >> > >>> can safely go wrong some of the time. > >> > >> Right. > >> > >>> > >>> In which case, yes, much better to follow your suggestion, and hold > >>> the lock (with irqs disabled) for only half the time. > >>> > >>> Similarly untested patch below. > >>> > >>> But I'm entirely unfamiliar with this code: best Cc people more familiar > >>> with it. Does this addition of locking to rescue_unmovable_pageblock() > >>> look correct to you, and do you think it has a good chance of fixing the > >> > >> > >> No.I think we need to use start_page instead of page and > > > > I thought so, but Linus points out why not (pfn_valid_within). > > > >> we need a last page of page block to check cross-over zones, > >> not first page in next page block. > > > > Yes, that's the off-by-one I was alluding to. > > > >> > >> I should have reviewed more carefully. :( > >> > >> barrios@bbox:~/linux-2.6$ git diff > >> diff --git a/mm/compaction.c b/mm/compaction.c > >> index 4ac338a..b3fcc4b 100644 > >> --- a/mm/compaction.c > >> +++ b/mm/compaction.c > >> @@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page) > >> > >> pfn = page_to_pfn(page); > >> start_pfn = pfn & ~(pageblock_nr_pages - 1); > >> - end_pfn = start_pfn + pageblock_nr_pages; > >> + end_pfn = start_pfn + pageblock_nr_pages - 1; > > > > Yes. 
> > > >> > >> start_page = pfn_to_page(start_pfn); > >> end_page = pfn_to_page(end_pfn); > >> @@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page) > >> if (page_zone(start_page) != page_zone(end_page)) > >> return false; > >> > >> - for (page = start_page, pfn = start_pfn; page < end_page; pfn++, > >> + for (page = start_page, pfn = start_pfn; page <= end_page; pfn++, > >> page++) { > > > > Yes. > > > >> if (!pfn_valid_within(pfn)) > >> continue; > >> @@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page) > >> return false; > >> } > >> > >> - set_pageblock_migratetype(page, MIGRATE_MOVABLE); > >> - move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE); > >> + set_pageblock_migratetype(start_page, MIGRATE_MOVABLE); > >> + move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE); > > > > No. I guess we can assume the incoming page was valid (fair?), > > so should still use that, but something else for the loop iterator. > > > It should be fair. I did it in following mail. > > > > > And you seem to have missed out all the locking needed. > > > >> return true; > >> } > > > > So Nack to that on several grounds. > > > > And I'd like to hear evidence that this really is useful code, > > justifying the locking and interrupt-disabling which would have to > > be added. My 0 out of 25000 was not reassuring. Nor the original > > test results, when it was doing completely the wrong thing unnoticed. > > > In changelog, Bartlomiej said. 
>
>     My particular test case (on a ARM EXYNOS4 device with 512 MiB, which means
>     131072 standard 4KiB pages in 'Normal' zone) is to:
>
>     - allocate 120000 pages for kernel's usage
>     - free every second page (60000 pages) of memory just allocated
>     - allocate and use 60000 pages from user space
>     - free remaining 60000 pages of kernel memory
>       (now we have fragmented memory occupied mostly by user space pages)
>     - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
>
>     The results:
>     - with compaction disabled I get 11 successful allocations
>     - with compaction enabled - 14 successful allocations
>     - with this patch I'm able to get all 100 successful allocations
>
> I think the above workload is really artificial and theoretical, so I didn't like
> this patch, but Mel seems to like it. :(
>
> Quote from Mel
> " Ok, that is indeed an adverse workload that the current system will not
> properly deal with. I think you are right to try fixing this but may need
> a different approach that takes the cost out of the allocation/free path
> and moves it to the compaction path."

Please note that the current patch is less intrusive than the original
version that Mel was talking about in the above quote (the cost is only
in the compaction path, which is a non-default one, and in an allocation
slow path).

> We can fix this patch so that it works, but we at least need a justification for it.
> Do we really need this patch for such an artificial workload?
> What do you think?

I would still like to get this patch included since it helps with my
test case and does not add much code or complexity.

So far I have fixed (all?) outstanding issues in the patch attached below,
and I will post the next combined version (v9) of the patch in a new thread.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung Poland R&D Center


- use right page for pageblock conversion in rescue_unmovable_pageblock()

- split rescue_unmovable_pageblock() on can_rescue_unmovable_pageblock()
  and __rescue_unmovable_pageblock()

- add missing locking

---
 mm/compaction.c |   66 ++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 45 insertions(+), 21 deletions(-)

Index: b/mm/compaction.c
===================================================================
--- a/mm/compaction.c	2012-06-04 15:19:04.564467996 +0200
+++ b/mm/compaction.c	2012-06-04 15:19:15.700467901 +0200
@@ -362,50 +362,70 @@ isolate_migratepages_range(struct zone *
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 
 #ifdef CONFIG_COMPACTION
 /*
- * Returns true if MIGRATE_UNMOVABLE pageblock was successfully
+ * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
  * converted to MIGRATE_MOVABLE type, false otherwise.
  */
-static bool rescue_unmovable_pageblock(struct page *page)
+static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
 {
 	unsigned long pfn, start_pfn, end_pfn;
-	struct page *start_page, *end_page;
+	struct page *start_page, *end_page, *cursor_page;
 
 	pfn = page_to_pfn(page);
 	start_pfn = pfn & ~(pageblock_nr_pages - 1);
-	end_pfn = start_pfn + pageblock_nr_pages;
+	end_pfn = start_pfn + pageblock_nr_pages - 1;
 
 	start_page = pfn_to_page(start_pfn);
 	end_page = pfn_to_page(end_pfn);
 
-	/* Do not deal with pageblocks that overlap zones */
-	if (page_zone(start_page) != page_zone(end_page))
-		return false;
+	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
+		pfn++, cursor_page++) {
+		struct zone *zone = page_zone(start_page);
+		unsigned long flags;
 
-	for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
-								  page++) {
 		if (!pfn_valid_within(pfn))
 			continue;
 
-		if (PageBuddy(page)) {
-			int order = page_order(page);
+		/* Do not deal with pageblocks that overlap zones */
+		if (page_zone(cursor_page) != zone)
+			return false;
+
+		if (!locked)
+			spin_lock_irqsave(&zone->lock, flags);
+
+		if (PageBuddy(cursor_page)) {
+			int order = page_order(cursor_page);
 
 			pfn += (1 << order) - 1;
-			page += (1 << order) - 1;
+			cursor_page += (1 << order) - 1;
+			if (!locked)
+				spin_unlock_irqrestore(&zone->lock, flags);
 			continue;
-		} else if (page_count(page) == 0 || PageLRU(page))
+		} else if (page_count(cursor_page) == 0 ||
+			   PageLRU(cursor_page)) {
+			if (!locked)
+				spin_unlock_irqrestore(&zone->lock, flags);
 			continue;
+		}
+
+		if (!locked)
+			spin_unlock_irqrestore(&zone->lock, flags);
 
 		return false;
 	}
 
+	return true;
+}
+
+void __rescue_unmovable_pageblock(struct page *page)
+{
 	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
-	return true;
 }
 
 enum smt_result {
 	GOOD_AS_MIGRATION_TARGET,
+	GOOD_CAN_RESCUE_UNMOVABLE_TARGET,
 	FAIL_UNMOVABLE_TARGET,
 	FAIL_BAD_TARGET,
 };
@@ -416,7 +436,7 @@ enum smt_result {
  * is within a MIGRATE_UNMOVABLE block, FAIL_BAD_TARGET otherwise.
  */
 static enum smt_result suitable_migration_target(struct page *page,
-				      struct compact_control *cc)
+				      struct compact_control *cc, bool locked)
 {
 	int migratetype = get_pageblock_migratetype(page);
 
@@ -440,8 +460,8 @@ static enum smt_result suitable_migratio
 
 	if (cc->mode != COMPACT_ASYNC_MOVABLE &&
 	    migratetype == MIGRATE_UNMOVABLE &&
-	    rescue_unmovable_pageblock(page))
-		return GOOD_AS_MIGRATION_TARGET;
+	    can_rescue_unmovable_pageblock(page, locked))
+		return GOOD_CAN_RESCUE_UNMOVABLE_TARGET;
 
 	/* Otherwise skip the block */
 	return FAIL_BAD_TARGET;
@@ -509,8 +529,9 @@ static void isolate_freepages(struct zon
 			continue;
 
 		/* Check the block is suitable for migration */
-		ret = suitable_migration_target(page, cc);
-		if (ret != GOOD_AS_MIGRATION_TARGET) {
+		ret = suitable_migration_target(page, cc, false);
+		if (ret != GOOD_AS_MIGRATION_TARGET &&
+		    ret != GOOD_CAN_RESCUE_UNMOVABLE_TARGET) {
 			if (ret == FAIL_UNMOVABLE_TARGET)
 				cc->nr_pageblocks_skipped++;
 			continue;
@@ -523,8 +544,11 @@ static void isolate_freepages(struct zon
 		 */
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
-		ret = suitable_migration_target(page, cc);
-		if (ret == GOOD_AS_MIGRATION_TARGET) {
+		ret = suitable_migration_target(page, cc, true);
+		if (ret == GOOD_AS_MIGRATION_TARGET ||
+		    ret == GOOD_CAN_RESCUE_UNMOVABLE_TARGET) {
+			if (ret == GOOD_CAN_RESCUE_UNMOVABLE_TARGET)
+				__rescue_unmovable_pageblock(page);
 			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
 			isolated = isolate_freepages_block(pfn, end_pfn,
 							   freelist, false);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
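
The check, lock, recheck flow that the patch above gives isolate_freepages()
can be sketched in userspace C. This is a simplified model under loud
assumptions, not the kernel implementation: struct sketch_block, the
sketch_* functions and the C11 atomic_flag standing in for zone->lock are all
hypothetical, and the unlocked read in phase 1 is only safe here because the
example is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for enum smt_result in the patch. */
enum sketch_result { SKETCH_GOOD, SKETCH_GOOD_CAN_RESCUE, SKETCH_FAIL };

/* Hypothetical model of one pageblock plus the lock protecting it. */
struct sketch_block {
	atomic_flag lock;	/* stand-in for zone->lock */
	bool movable;		/* block is already a movable target */
	bool rescuable;		/* unmovable block that could be converted */
	int isolated;		/* how many times isolation ran */
};

static void sketch_lock(struct sketch_block *b)
{
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void sketch_unlock(struct sketch_block *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Pure test: classify the block without changing it. */
static enum sketch_result sketch_classify(const struct sketch_block *b)
{
	if (b->movable)
		return SKETCH_GOOD;
	if (b->rescuable)
		return SKETCH_GOOD_CAN_RESCUE;
	return SKETCH_FAIL;
}

/* Returns true if the block was isolated. */
static bool sketch_try_isolate(struct sketch_block *b)
{
	enum sketch_result ret;
	bool done = false;

	assert(b != NULL);

	/* Phase 1: cheap unlocked hint; the answer may be stale. */
	if (sketch_classify(b) == SKETCH_FAIL)
		return false;

	/* Phase 2: recheck under the lock and act only on this result. */
	sketch_lock(b);
	ret = sketch_classify(b);
	if (ret != SKETCH_FAIL) {
		if (ret == SKETCH_GOOD_CAN_RESCUE)
			b->movable = true;	/* the "rescue" step */
		b->isolated++;
		done = true;
	}
	sketch_unlock(b);
	return done;
}
```

The design point is that the test (sketch_classify) has no side effects, so
it can run twice: once unlocked to skip hopeless blocks cheaply, and once
under the lock, where the conversion and the isolation happen together. A
concurrent version would also need the unlocked hint to tolerate racing
writers, which this sketch does not attempt.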
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752837Ab2E3QdZ (ORCPT ); Wed, 30 May 2012 12:33:25 -0400 Received: from mx1.redhat.com ([209.132.183.28]:58424 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752502Ab2E3QdX (ORCPT ); Wed, 30 May 2012 12:33:23 -0400 Date: Wed, 30 May 2012 12:33:17 -0400 From: Dave Jones To: Linux Kernel Cc: linux-mm@kvack.org Subject: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120530163317.GA13189@redhat.com> Mail-Followup-To: Dave Jones , Linux Kernel , linux-mm@kvack.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc ------------[ cut here ]------------ WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Hardware name: Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_emu10k1 snd_util_mem snd_ac97_codec ac97_bus snd_hwdep snd_rawmidi snd_seq_device snd_pcm microcode snd_page_alloc pcspkr snd_timer snd lpc_ich i2c_i801 mfd_core e1000e soundcore vhost_net tun macvtap macvlan kvm_intel nfsd kvm nfs_acl auth_rpcgss lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci firewire_core sata_sil crc_itu_t floppy radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan] Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] 
warn_slowpath_null+0x1a/0x20 [] __set_page_dirty_nobuffers+0x13a/0x170 [] migrate_page_copy+0x1e2/0x260 [] migrate_page+0x5b/0x70 [] move_to_new_page+0xa5/0x260 [] migrate_pages+0x4c8/0x540 [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 [] compact_zone+0x216/0x480 [] ? debug_check_no_obj_freed+0x88/0x210 [] compact_zone_order+0x8d/0xd0 [] try_to_compact_pages+0xc9/0x140 [] __alloc_pages_direct_compact+0xaa/0x1d0 [] __alloc_pages_nodemask+0x60b/0xab0 [] ? debug_check_no_obj_freed+0x16c/0x210 [] alloc_pages_vma+0xb6/0x190 [] khugepaged+0x95d/0x1570 [] ? wake_up_bit+0x40/0x40 [] ? collect_mm_slot+0xa0/0xa0 [] kthread+0xb7/0xc0 [] kernel_thread_helper+0x4/0x10 [] ? retint_restore_args+0xe/0xe [] ? flush_kthread_worker+0x190/0x190 [] ? gs_change+0xb/0xb ---[ end trace 4324bd0bca27f6f0 ]--- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756828Ab2EaA5u (ORCPT ); Wed, 30 May 2012 20:57:50 -0400 Received: from mx1.redhat.com ([209.132.183.28]:33243 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754583Ab2EaA5s (ORCPT ); Wed, 30 May 2012 20:57:48 -0400 Date: Wed, 30 May 2012 20:57:40 -0400 From: Dave Jones To: Linux Kernel Cc: linux-mm@kvack.org, Andrew Morton Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120531005739.GA4532@redhat.com> Mail-Followup-To: Dave Jones , Linux Kernel , linux-mm@kvack.org, Andrew Morton References: <20120530163317.GA13189@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120530163317.GA13189@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote: > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc > > WARNING: at 
mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() > Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_emu10k1 snd_util_mem snd_ac97_codec ac97_bus snd_hwdep snd_rawmidi snd_seq_device snd_pcm microcode snd_page_alloc pcspkr snd_timer snd lpc_ich i2c_i801 mfd_core e1000e soundcore vhost_net tun macvtap macvlan kvm_intel nfsd kvm nfs_acl auth_rpcgss lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci firewire_core sata_sil crc_itu_t floppy radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan] > Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75 > Call Trace: > [] warn_slowpath_common+0x7f/0xc0 > [] warn_slowpath_null+0x1a/0x20 > [] __set_page_dirty_nobuffers+0x13a/0x170 > [] migrate_page_copy+0x1e2/0x260 > [] migrate_page+0x5b/0x70 > [] move_to_new_page+0xa5/0x260 > [] migrate_pages+0x4c8/0x540 > [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 > [] compact_zone+0x216/0x480 > [] ? debug_check_no_obj_freed+0x88/0x210 > [] compact_zone_order+0x8d/0xd0 > [] try_to_compact_pages+0xc9/0x140 > [] __alloc_pages_direct_compact+0xaa/0x1d0 > [] __alloc_pages_nodemask+0x60b/0xab0 > [] ? debug_check_no_obj_freed+0x16c/0x210 > [] alloc_pages_vma+0xb6/0x190 > [] khugepaged+0x95d/0x1570 > [] ? wake_up_bit+0x40/0x40 > [] ? collect_mm_slot+0xa0/0xa0 > [] kthread+0xb7/0xc0 > [] kernel_thread_helper+0x4/0x10 > [] ? retint_restore_args+0xe/0xe > [] ? flush_kthread_worker+0x190/0x190 > [] ? gs_change+0xb/0xb Seems this can be triggered from mmap, as well as from khugepaged.. 
WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Modules linked in: tun dccp_ipv4 dccp nfnetlink sctp libcrc32c fuse ipt_ULOG binfmt_misc caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel serio_raw microcode pcspkr i2c_i801 usb_debug lpc_ich mfd_core e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] Pid: 1171, comm: trinity-child4 Not tainted 3.4.0+ #38 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_null+0x1a/0x20 [] __set_page_dirty_nobuffers+0x13a/0x170 [] migrate_page_copy+0x1e2/0x260 [] migrate_page+0x5b/0x70 [] move_to_new_page+0xa5/0x260 [] migrate_pages+0x4c8/0x540 [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 [] compact_zone+0x216/0x480 [] ? trace_hardirqs_off_caller+0x28/0xc0 [] compact_zone_order+0x8d/0xd0 [] ? get_page_from_freelist+0x565/0x970 [] try_to_compact_pages+0xc9/0x140 [] __alloc_pages_direct_compact+0xaa/0x1d0 [] __alloc_pages_nodemask+0x60b/0xab0 [] ? trace_hardirqs_off_caller+0x28/0xc0 [] ? __lock_acquire+0x2b0/0x1aa0 [] alloc_pages_vma+0xb6/0x190 [] do_huge_pmd_anonymous_page+0x133/0x310 [] handle_mm_fault+0x242/0x2e0 [] __get_user_pages+0x142/0x560 [] ? mmap_region+0x3f8/0x630 [] get_user_pages+0x52/0x60 [] make_pages_present+0x92/0xc0 [] mmap_region+0x3a6/0x630 [] ? do_setitimer+0x1cc/0x310 [] do_mmap_pgoff+0x35d/0x3b0 [] ? sys_mmap_pgoff+0x66/0x240 [] sys_mmap_pgoff+0x84/0x240 [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] sys_mmap+0x22/0x30 [] system_call_fastpath+0x16/0x1b ---[ end trace 336c91f371296e41 ]--- I'd bisect this, but it takes a few hours to trigger, which makes it hard to distinguish between 'good kernel' and 'hasn't triggered yet'. 
Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758797Ab2FACbS (ORCPT ); Thu, 31 May 2012 22:31:18 -0400 Received: from mx1.redhat.com ([209.132.183.28]:32357 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758452Ab2FACbP (ORCPT ); Thu, 31 May 2012 22:31:15 -0400 Date: Thu, 31 May 2012 22:31:07 -0400 From: Dave Jones To: Linux Kernel , linux-mm@kvack.org, Andrew Morton , Linus Torvalds , Hugh Dickins , Cong Wang Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601023107.GA19445@redhat.com> Mail-Followup-To: Dave Jones , Linux Kernel , linux-mm@kvack.org, Andrew Morton , Linus Torvalds , Hugh Dickins , Cong Wang References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120531005739.GA4532@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 30, 2012 at 08:57:40PM -0400, Dave Jones wrote: > On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote: > > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc > > > > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() > > Modules linked in: ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_emu10k1 snd_util_mem snd_ac97_codec ac97_bus snd_hwdep snd_rawmidi snd_seq_device snd_pcm microcode snd_page_alloc pcspkr snd_timer snd lpc_ich i2c_i801 mfd_core e1000e soundcore vhost_net tun macvtap macvlan kvm_intel nfsd kvm nfs_acl auth_rpcgss lockd sunrpc btrfs libcrc32c zlib_deflate firewire_ohci 
firewire_core sata_sil crc_itu_t floppy radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan] > > Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75 > > Call Trace: > > [] warn_slowpath_common+0x7f/0xc0 > > [] warn_slowpath_null+0x1a/0x20 > > [] __set_page_dirty_nobuffers+0x13a/0x170 > > [] migrate_page_copy+0x1e2/0x260 > > [] migrate_page+0x5b/0x70 > > [] move_to_new_page+0xa5/0x260 > > [] migrate_pages+0x4c8/0x540 > > [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 > > [] compact_zone+0x216/0x480 > > [] ? debug_check_no_obj_freed+0x88/0x210 > > [] compact_zone_order+0x8d/0xd0 > > [] try_to_compact_pages+0xc9/0x140 > > [] __alloc_pages_direct_compact+0xaa/0x1d0 > > [] __alloc_pages_nodemask+0x60b/0xab0 > > [] ? debug_check_no_obj_freed+0x16c/0x210 > > [] alloc_pages_vma+0xb6/0x190 > > [] khugepaged+0x95d/0x1570 > > [] ? wake_up_bit+0x40/0x40 > > [] ? collect_mm_slot+0xa0/0xa0 > > [] kthread+0xb7/0xc0 > > [] kernel_thread_helper+0x4/0x10 > > [] ? retint_restore_args+0xe/0xe > > [] ? flush_kthread_worker+0x190/0x190 > > [] ? gs_change+0xb/0xb > > Seems this can be triggered from mmap, as well as from khugepaged.. 
> > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() > Modules linked in: tun dccp_ipv4 dccp nfnetlink sctp libcrc32c fuse ipt_ULOG binfmt_misc caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel serio_raw microcode pcspkr i2c_i801 usb_debug lpc_ich mfd_core e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] > Pid: 1171, comm: trinity-child4 Not tainted 3.4.0+ #38 > Call Trace: > [] warn_slowpath_common+0x7f/0xc0 > [] warn_slowpath_null+0x1a/0x20 > [] __set_page_dirty_nobuffers+0x13a/0x170 > [] migrate_page_copy+0x1e2/0x260 > [] migrate_page+0x5b/0x70 > [] move_to_new_page+0xa5/0x260 > [] migrate_pages+0x4c8/0x540 > [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 > [] compact_zone+0x216/0x480 > [] ? trace_hardirqs_off_caller+0x28/0xc0 > [] compact_zone_order+0x8d/0xd0 > [] ? get_page_from_freelist+0x565/0x970 > [] try_to_compact_pages+0xc9/0x140 > [] __alloc_pages_direct_compact+0xaa/0x1d0 > [] __alloc_pages_nodemask+0x60b/0xab0 > [] ? trace_hardirqs_off_caller+0x28/0xc0 > [] ? __lock_acquire+0x2b0/0x1aa0 > [] alloc_pages_vma+0xb6/0x190 > [] do_huge_pmd_anonymous_page+0x133/0x310 > [] handle_mm_fault+0x242/0x2e0 > [] __get_user_pages+0x142/0x560 > [] ? mmap_region+0x3f8/0x630 > [] get_user_pages+0x52/0x60 > [] make_pages_present+0x92/0xc0 > [] mmap_region+0x3a6/0x630 > [] ? do_setitimer+0x1cc/0x310 > [] do_mmap_pgoff+0x35d/0x3b0 > [] ? sys_mmap_pgoff+0x66/0x240 > [] sys_mmap_pgoff+0x84/0x240 > [] ? 
trace_hardirqs_on_thunk+0x3a/0x3f > [] sys_mmap+0x22/0x30 > [] system_call_fastpath+0x16/0x1b > ---[ end trace 336c91f371296e41 ]--- > > > > I'd bisect this, but it takes a few hours to trigger, which makes it hard > to distinguish between 'good kernel' and 'hasn't triggered yet'. So I bisected it anyway, and it led to ... 3f31d07571eeea18a7d34db9af21d2285b807a17 is the first bad commit commit 3f31d07571eeea18a7d34db9af21d2285b807a17 Author: Hugh Dickins Date: Tue May 29 15:06:40 2012 -0700 mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE Now tmpfs supports hole-punching via fallocate(), switch madvise_remove() to use do_fallocate() instead of vmtruncate_range(): which extends madvise(,,MADV_REMOVE) support from tmpfs to ext4, ocfs2 and xfs. There is one more user of vmtruncate_range() in our tree, staging/android's ashmem_shrink(): convert it to use do_fallocate() too (but if its unpinned areas are already unmapped - I don't know - then it would do better to use shmem_truncate_range() directly). Based-on-patch-by: Cong Wang Signed-off-by: Hugh Dickins Cc: Christoph Hellwig Cc: Al Viro Cc: Colin Cross Cc: John Stultz Cc: Greg Kroah-Hartman Cc: "Theodore Ts'o" Cc: Andreas Dilger Cc: Mark Fasheh Cc: Joel Becker Cc: Dave Chinner Cc: Ben Myers Cc: Michael Kerrisk Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Hugh ? I'll repeat the bisect tomorrow just to be sure. (It took all day, even though there were only a half dozen bisect points, as I ran the test for an hour on each build to see what fell out). Here's what I found.. 
git bisect start 'mm/'
# bad: [4b395d7ea79472ac240ee8768b4930ca9ce096ef] Merge /home/davej/src/git-trees/kernel/linux
git bisect bad 4b395d7ea79472ac240ee8768b4930ca9ce096ef
# good: [76e10d158efb6d4516018846f60c2ab5501900bc] Linux 3.4
git bisect good 76e10d158efb6d4516018846f60c2ab5501900bc
# good: [c6785b6bf1b2a4b47238b24ee56f61e27c3af682] mm: bootmem: rename alloc_bootmem_core to alloc_bootmem_bdata
git bisect good c6785b6bf1b2a4b47238b24ee56f61e27c3af682
# bad: [89abfab133ef1f5902abafb744df72793213ac19] mm/memcg: move reclaim_stat into lruvec
git bisect bad 89abfab133ef1f5902abafb744df72793213ac19
# bad: [4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3] tmpfs: support SEEK_DATA and SEEK_HOLE
git bisect bad 4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3
# good: [bde05d1ccd512696b09db9dd2e5f33ad19152605] shmem: replace page if mapping excludes its zone
git bisect good bde05d1ccd512696b09db9dd2e5f33ad19152605
# bad: [3f31d07571eeea18a7d34db9af21d2285b807a17] mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE
git bisect bad 3f31d07571eeea18a7d34db9af21d2285b807a17
# good: [ec9516fbc5fa814014991e1ae7f8860127122105] tmpfs: optimize clearing when writing
git bisect good ec9516fbc5fa814014991e1ae7f8860127122105
# good: [83e4fa9c16e4af7122e31be3eca5d57881d236fe] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE
git bisect good 83e4fa9c16e4af7122e31be3eca5d57881d236fe

This has been a challenge to bisect additionally because I'm not sure if the other mm
bug I reported in the last few days (the list_debug/list_add corruption warnings in the
compaction code) are related or not. Sometimes during the bisect these errors happened
in pairs, sometimes only together. The 'good' builds showed no errors at all.

As a reminder, the list_add corruption looks like this...

WARNING: at lib/list_debug.c:29 __list_add+0x6c/0x90()
list_add corruption. next->prev should be prev (ffff88014e5d9ed8), but was ffffea0004f48360. (next=ffffea0004b23920).
Modules linked in: ipt_ULOG fuse tun nfnetlink binfmt_misc sctp libcrc32c caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw i2c_i801 pcspkr e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] Pid: 24594, comm: trinity-child1 Not tainted 3.4.0+ #42 Call Trace: [] warn_slowpath_common+0x7f/0xc0 [] warn_slowpath_fmt+0x46/0x50 [] ? trace_hardirqs_on+0xd/0x10 [] __list_add+0x6c/0x90 [] move_freepages_block+0x16d/0x190 [] suitable_migration_target.isra.14+0x1b3/0x1d0 [] compaction_alloc+0x1db/0x2f0 [] migrate_pages+0xc7/0x540 [] ? isolate_freepages_block+0x260/0x260 [] compact_zone+0x216/0x480 [] compact_zone_order+0x8d/0xd0 [] ? get_page_from_freelist+0x565/0x970 [] try_to_compact_pages+0xc9/0x140 [] __alloc_pages_direct_compact+0xaa/0x1d0 [] __alloc_pages_nodemask+0x60b/0xab0 [] ? trace_hardirqs_off_caller+0x28/0xc0 [] ? __lock_acquire+0x2f0/0x1aa0 [] alloc_pages_vma+0xb6/0x190 [] do_huge_pmd_anonymous_page+0x133/0x310 [] handle_mm_fault+0x242/0x2e0 [] __get_user_pages+0x142/0x560 [] ? mmap_region+0x3f8/0x630 [] get_user_pages+0x52/0x60 [] make_pages_present+0x92/0xc0 [] mmap_region+0x3a6/0x630 [] ? do_setitimer+0x1cc/0x310 [] do_mmap_pgoff+0x35d/0x3b0 [] ? sys_mmap_pgoff+0x66/0x240 [] sys_mmap_pgoff+0x84/0x240 [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] sys_mmap+0x22/0x30 [] system_call_fastpath+0x16/0x1b ---[ end trace b606ea2a53bf1425 ]--- On an affected kernel, it'll show up within an hour of fuzzing on a fast machine. 
Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758803Ab2FACns (ORCPT ); Thu, 31 May 2012 22:43:48 -0400 Received: from mail-wg0-f44.google.com ([74.125.82.44]:48431 "EHLO mail-wg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758775Ab2FACnr (ORCPT ); Thu, 31 May 2012 22:43:47 -0400 MIME-Version: 1.0 In-Reply-To: <20120601023107.GA19445@redhat.com> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> From: Linus Torvalds Date: Thu, 31 May 2012 19:43:25 -0700 X-Google-Sender-Auth: tpl3gwMFanlH_ZOBf-7ku_28mh0 Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() To: Dave Jones , Linux Kernel , linux-mm@kvack.org, Andrew Morton , Linus Torvalds , Hugh Dickins , Cong Wang Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 31, 2012 at 7:31 PM, Dave Jones wrote: > > So I bisected it anyway, and it led to ... Ok, that doesn't sound entirely unlikely, but considering that you're nervous about the bisection, please just try to revert it and see if that fixes your testcase. You'll obviously need to revert the commit that removes vmtruncate_range() too, since reverting 3f31d07571ee will re-introduce the use of it (it's the next one: 17cf28afea2a1112f240a3a2da8af883be024811), but it looks like those two commits revert cleanly and the end result seems to compile ok. 
Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759062Ab2FAIpP (ORCPT ); Fri, 1 Jun 2012 04:45:15 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:44128 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758375Ab2FAIpL (ORCPT ); Fri, 1 Jun 2012 04:45:11 -0400 Date: Fri, 1 Jun 2012 01:44:44 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Dave Jones cc: Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: <20120601023107.GA19445@redhat.com> Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 31 May 2012, Dave Jones wrote: > On Wed, May 30, 2012 at 08:57:40PM -0400, Dave Jones wrote: > > On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote: > > > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc > > > > > > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() I did see your reports, and noted to come back to them, but sad to say I hadn't even made time to check out line 1990 of mm/page-writeback.c: ah, that WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page)); > > > Pid: 35, comm: khugepaged Not tainted 3.4.0+ #75 > > > Call Trace: > > > [] __set_page_dirty_nobuffers+0x13a/0x170 > > > [] migrate_page_copy+0x1e2/0x260 > > > > Seems this can be triggered from mmap, as well as from khugepaged.. 
> > > > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() > > Pid: 1171, comm: trinity-child4 Not tainted 3.4.0+ #38 > > Call Trace: > > [] __set_page_dirty_nobuffers+0x13a/0x170 > > [] migrate_page_copy+0x1e2/0x260 > > > > I'd bisect this, but it takes a few hours to trigger, which makes it hard > > to distinguish between 'good kernel' and 'hasn't triggered yet'. > > So I bisected it anyway, and it led to ... Thanks so much for taking the trouble. > > 3f31d07571eeea18a7d34db9af21d2285b807a17 is the first bad commit > commit 3f31d07571eeea18a7d34db9af21d2285b807a17 > Author: Hugh Dickins > Date: Tue May 29 15:06:40 2012 -0700 > > mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE > > Now tmpfs supports hole-punching via fallocate(), switch madvise_remove() > to use do_fallocate() instead of vmtruncate_range(): which extends > madvise(,,MADV_REMOVE) support from tmpfs to ext4, ocfs2 and xfs. > > Hugh ? Ow, you've caught me. > > I'll repeat the bisect tomorrow just to be sure. (It took all day, even though > there were only a half dozen bisect points, as I ran the test for an hour on > each build to see what fell out). > > Here's what I found.. 
> > git bisect start 'mm/' > # bad: [4b395d7ea79472ac240ee8768b4930ca9ce096ef] Merge /home/davej/src/git-trees/kernel/linux > git bisect bad 4b395d7ea79472ac240ee8768b4930ca9ce096ef > # good: [76e10d158efb6d4516018846f60c2ab5501900bc] Linux 3.4 > git bisect good 76e10d158efb6d4516018846f60c2ab5501900bc > # good: [c6785b6bf1b2a4b47238b24ee56f61e27c3af682] mm: bootmem: rename alloc_bootmem_core to alloc_bootmem_bdata > git bisect good c6785b6bf1b2a4b47238b24ee56f61e27c3af682 > # bad: [89abfab133ef1f5902abafb744df72793213ac19] mm/memcg: move reclaim_stat into lruvec > git bisect bad 89abfab133ef1f5902abafb744df72793213ac19 > # bad: [4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3] tmpfs: support SEEK_DATA and SEEK_HOLE > git bisect bad 4fb5ef089b288942c6fc3f85c4ecb4016c1aa4c3 > # good: [bde05d1ccd512696b09db9dd2e5f33ad19152605] shmem: replace page if mapping excludes its zone > git bisect good bde05d1ccd512696b09db9dd2e5f33ad19152605 > # bad: [3f31d07571eeea18a7d34db9af21d2285b807a17] mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE > git bisect bad 3f31d07571eeea18a7d34db9af21d2285b807a17 > # good: [ec9516fbc5fa814014991e1ae7f8860127122105] tmpfs: optimize clearing when writing > git bisect good ec9516fbc5fa814014991e1ae7f8860127122105 > # good: [83e4fa9c16e4af7122e31be3eca5d57881d236fe] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE > git bisect good 83e4fa9c16e4af7122e31be3eca5d57881d236fe That puzzled me for quite a while: it seemed so much more likely that your bisection would converge on the commit which comes a few later, 1635f6a74152 "tmpfs: undo fallocation on failure", where indeed I do start to play around with tmpfs pages unlocked while !PageUptodate. And yes, they're PageDirty !PagePrivate, so migration could very well end up trying to migrate one and hitting line 1990. 
It's an aberration of migrate_page_copy(), that it uses __set_page_dirty_nobuffers() on mappings which would never normally go that way at all (I discovered this last year, when I experimented with radix_tree tags for swap in tmpfs, and hit upon this rare case where page migration sets a dirty tag for a tmpfs page, despite tmpfs never using tags). One half of the patch at the bottom should fix that: I'm not sure that it's the fix we actually want (a mapping_cap_account_dirty test might be more appropriate, but it's easier just to test a page flag here); but it should be good to shed more light on the problem. Because your bisection converged on a commit a few before I introduced that bug - and although it was a difficult bisection, you would be very unlikely to mistake a good for bad: the danger was the other way around. So I'm wondering if your trinity fuzzer happens to succeed a lot more often on madvise MADV_REMOVEs than fallocate FALLOC_FL_PUNCH_HOLEs, and the bug you converged on is not in tmpfs, but in ext4 (or xfs? or ocfs2?), which began to support MADV_REMOVE with that commit. So the second half of the patch should show which filesystem's page is involved when you hit the WARN_ON - unless the first half of the patch turns out to stop the warnings completely, in which case I need to think harder about what was going on in tmpfs, and whether it matters. Or another possibility is that the bad commit doesn't actually touch mm at all: you were doing a bisection just on mm/ changes, weren't you? > > This has been a challenge to bisect additionally because I'm not sure if the other mm > bug I reported in the last few days (the list_debug/list_add corruption warnings in the > compaction code) are related or not. At present I suspect they're not related; but may change my mind. > Sometimes during the bisect these errors happened > in pairs, sometimes only together. Sometimes in pairs, sometimes together? I don't understand. 
And are "these errors" the list debug warnings, or list debug warnings and Line 1990 warnings? > The 'good' builds showed no errors at all. > > As a reminder, the list_add corruption looks like this... > > WARNING: at lib/list_debug.c:29 __list_add+0x6c/0x90() > list_add corruption. next->prev should be prev (ffff88014e5d9ed8), but was ffffea0004f48360. (next=ffffea0004b23920). > Pid: 24594, comm: trinity-child1 Not tainted 3.4.0+ #42 > Call Trace: > [] warn_slowpath_common+0x7f/0xc0 > [] warn_slowpath_fmt+0x46/0x50 > [] ? trace_hardirqs_on+0xd/0x10 > [] __list_add+0x6c/0x90 > [] move_freepages_block+0x16d/0x190 > [] suitable_migration_target.isra.14+0x1b3/0x1d0 > [] compaction_alloc+0x1db/0x2f0 > [] migrate_pages+0xc7/0x540 > [] ? isolate_freepages_block+0x260/0x260 > [] compact_zone+0x216/0x480 > [] compact_zone_order+0x8d/0xd0 > [] ? get_page_from_freelist+0x565/0x970 > [] try_to_compact_pages+0xc9/0x140 > [] __alloc_pages_direct_compact+0xaa/0x1d0 > [] __alloc_pages_nodemask+0x60b/0xab0 > [] ? trace_hardirqs_off_caller+0x28/0xc0 > [] ? __lock_acquire+0x2f0/0x1aa0 > [] alloc_pages_vma+0xb6/0x190 > [] do_huge_pmd_anonymous_page+0x133/0x310 > [] handle_mm_fault+0x242/0x2e0 > [] __get_user_pages+0x142/0x560 > [] ? mmap_region+0x3f8/0x630 > [] get_user_pages+0x52/0x60 > [] make_pages_present+0x92/0xc0 > [] mmap_region+0x3a6/0x630 > [] ? do_setitimer+0x1cc/0x310 > [] do_mmap_pgoff+0x35d/0x3b0 > [] ? sys_mmap_pgoff+0x66/0x240 > [] sys_mmap_pgoff+0x84/0x240 > [] ? trace_hardirqs_on_thunk+0x3a/0x3f > [] sys_mmap+0x22/0x30 > [] system_call_fastpath+0x16/0x1b > ---[ end trace b606ea2a53bf1425 ]--- > > On an affected kernel, it'll show up within an hour of fuzzing on a fast machine. Please give this patch a try (preferably on current git), and let us know. 
Thanks,
Hugh

--- 3.4.0+/mm/migrate.c	2012-05-27 10:01:43.104049010 -0700
+++ linux/mm/migrate.c	2012-06-01 00:10:58.080098749 -0700
@@ -436,7 +436,10 @@ void migrate_page_copy(struct page *newp
 		 * is actually a signal that all of the page has become dirty.
 		 * Whereas only part of our page may be dirty.
 		 */
-		__set_page_dirty_nobuffers(newpage);
+		if (PageSwapBacked(page))
+			SetPageDirty(newpage);
+		else
+			__set_page_dirty_nobuffers(newpage);
 	}
 
 	mlock_migrate_page(newpage, page);
--- 3.4.0+/mm/page-writeback.c	2012-05-29 08:09:58.304806782 -0700
+++ linux/mm/page-writeback.c	2012-06-01 00:23:43.984116973 -0700
@@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa
 		mapping2 = page_mapping(page);
 		if (mapping2) { /* Race with truncate? */
 			BUG_ON(mapping2 != mapping);
-			WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
+			if (WARN_ON(!PagePrivate(page) && !PageUptodate(page)))
+				print_symbol(KERN_WARNING
+				    "mapping->a_ops->writepage: %s\n",
+				    (unsigned long)mapping->a_ops->writepage);
 			account_page_dirtied(page, mapping);
 			radix_tree_tag_set(&mapping->page_tree,
 				page_index(page), PAGECACHE_TAG_DIRTY);

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932225Ab2FAIvj (ORCPT ); Fri, 1 Jun 2012 04:51:39 -0400
Received: from mail-qa0-f46.google.com ([209.85.216.46]:54482 "EHLO mail-qa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752810Ab2FAIvi (ORCPT ); Fri, 1 Jun 2012 04:51:38 -0400
Message-ID: <4FC88299.1040707@gmail.com>
Date: Fri, 01 Jun 2012 04:51:37 -0400
From: KOSAKI Motohiro
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: Hugh Dickins
CC: Dave Jones , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kosaki.motohiro@gmail.com
Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
References:
<20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > mlock_migrate_page(newpage, page); > --- 3.4.0+/mm/page-writeback.c 2012-05-29 08:09:58.304806782 -0700 > +++ linux/mm/page-writeback.c 2012-06-01 00:23:43.984116973 -0700 > @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa > mapping2 = page_mapping(page); > if (mapping2) { /* Race with truncate? */ > BUG_ON(mapping2 != mapping); > - WARN_ON_ONCE(!PagePrivate(page)&& !PageUptodate(page)); > + if (WARN_ON(!PagePrivate(page)&& !PageUptodate(page))) > + print_symbol(KERN_WARNING > + "mapping->a_ops->writepage: %s\n", > + (unsigned long)mapping->a_ops->writepage); type mismatch? I guess you want %pf or %pF. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757262Ab2FAJIh (ORCPT ); Fri, 1 Jun 2012 05:08:37 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:45377 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754086Ab2FAJIg (ORCPT ); Fri, 1 Jun 2012 05:08:36 -0400 Date: Fri, 1 Jun 2012 02:08:07 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: KOSAKI Motohiro cc: Dave Jones , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: <4FC88299.1040707@gmail.com> Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <4FC88299.1040707@gmail.com> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org 
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 1 Jun 2012, KOSAKI Motohiro wrote: > > mlock_migrate_page(newpage, page); > > --- 3.4.0+/mm/page-writeback.c 2012-05-29 08:09:58.304806782 -0700 > > +++ linux/mm/page-writeback.c 2012-06-01 00:23:43.984116973 -0700 > > @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa > > mapping2 = page_mapping(page); > > if (mapping2) { /* Race with truncate? */ > > BUG_ON(mapping2 != mapping); > > - WARN_ON_ONCE(!PagePrivate(page)&& > > !PageUptodate(page)); > > + if (WARN_ON(!PagePrivate(page)&& > > !PageUptodate(page))) > > + print_symbol(KERN_WARNING > > + "mapping->a_ops->writepage: %s\n", > > + (unsigned > > long)mapping->a_ops->writepage); > > type mismatch? I don't think so: I just copied from print_bad_pte(). Probably you're reading "printk" where it's "print_symbol"? > I guess you want %pf or %pF. I expect there is new-fangled %pMagic that can do it too, yes. Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758103Ab2FAJMp (ORCPT ); Fri, 1 Jun 2012 05:12:45 -0400 Received: from mail-yw0-f46.google.com ([209.85.213.46]:34662 "EHLO mail-yw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757867Ab2FAJMn convert rfc822-to-8bit (ORCPT ); Fri, 1 Jun 2012 05:12:43 -0400 MIME-Version: 1.0 In-Reply-To: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <4FC88299.1040707@gmail.com> From: KOSAKI Motohiro Date: Fri, 1 Jun 2012 05:12:19 -0400 Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() To: Hugh Dickins Cc: Dave Jones , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org On Fri, Jun 1, 2012 at 5:08 AM, Hugh Dickins wrote: > On Fri, 1 Jun 2012, KOSAKI Motohiro wrote: >> >     mlock_migrate_page(newpage, page); >> > --- 3.4.0+/mm/page-writeback.c      2012-05-29 08:09:58.304806782 -0700 >> > +++ linux/mm/page-writeback.c       2012-06-01 00:23:43.984116973 -0700 >> > @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa >> >             mapping2 = page_mapping(page); >> >             if (mapping2) { /* Race with truncate? */ >> >                     BUG_ON(mapping2 != mapping); >> > -                   WARN_ON_ONCE(!PagePrivate(page)&& >> > !PageUptodate(page)); >> > +                   if (WARN_ON(!PagePrivate(page)&& >> > !PageUptodate(page))) >> > +                           print_symbol(KERN_WARNING >> > +                               "mapping->a_ops->writepage: %s\n", >> > +                               (unsigned >> > long)mapping->a_ops->writepage); >> >> type mismatch? > > I don't think so: I just copied from print_bad_pte(). > Probably you're reading "printk" where it's "print_symbol"? Oops, yes, sorry for noise. >> I guess you want %pf or %pF. > > I expect there is new-fangled %pMagic that can do it too, yes. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759872Ab2FANnd (ORCPT ); Fri, 1 Jun 2012 09:43:33 -0400 Received: from mx1.redhat.com ([209.132.183.28]:61484 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758209Ab2FANnc (ORCPT ); Fri, 1 Jun 2012 09:43:32 -0400 Date: Fri, 1 Jun 2012 09:43:23 -0400 From: Dave Jones To: Linus Torvalds Cc: Linux Kernel , linux-mm@kvack.org, Andrew Morton , Hugh Dickins , Cong Wang Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601134323.GA5214@redhat.com> Mail-Followup-To: Dave Jones , Linus Torvalds , Linux Kernel , linux-mm@kvack.org, Andrew Morton , Hugh Dickins , Cong Wang References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 31, 2012 at 07:43:25PM -0700, Linus Torvalds wrote: > On Thu, May 31, 2012 at 7:31 PM, Dave Jones wrote: > > > > So I bisected it anyway, and it led to ... > > Ok, that doesn't sound entirely unlikely, but considering that you're > nervous about the bisection, please just try to revert it and see if > that fixes your testcase. > > You'll obviously need to revert the commit that removes > vmtruncate_range() too, since reverting 3f31d07571ee will re-introduce > the use of it (it's the next one: > 17cf28afea2a1112f240a3a2da8af883be024811), but it looks like those two > commits revert cleanly and the end result seems to compile ok. crap, so much for that theory. I ran latest with those two reverted overnight, and woke up to a dead box. 
Over serial console, I see a bunch of those same compaction oopses (Via sys_mmap_pgoff), and then kernel BUG at include/linux/mm.h:448! was the last thing it said before it choked. I'll redo the bisect. It's possible that one of the 'good' paths just didn't run for long enough. Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759976Ab2FAOJv (ORCPT ); Fri, 1 Jun 2012 10:09:51 -0400 Received: from mx1.redhat.com ([209.132.183.28]:21885 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758970Ab2FAOJu (ORCPT ); Fri, 1 Jun 2012 10:09:50 -0400 Date: Fri, 1 Jun 2012 10:09:43 -0400 From: Dave Jones To: Hugh Dickins Cc: Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601140943.GB1732@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 01, 2012 at 01:44:44AM -0700, Hugh Dickins wrote: > > 3f31d07571eeea18a7d34db9af21d2285b807a17 is the first bad commit > > commit 3f31d07571eeea18a7d34db9af21d2285b807a17 > > Author: Hugh Dickins > > Date: Tue May 29 15:06:40 2012 -0700 > > > > mm/fs: route MADV_REMOVE to FALLOC_FL_PUNCH_HOLE > > > > Now tmpfs supports hole-punching via fallocate(), switch madvise_remove() > > to use do_fallocate() instead of vmtruncate_range(): which extends > > madvise(,,MADV_REMOVE) support from tmpfs to ext4, ocfs2 and xfs. 
> >
> > Hugh ?
>
> Ow, you've caught me.

As I said in another mail, it looks like the bisect was wrong somewhere,
as with this backed out I still see problems.

> One half of the patch at the bottom should fix that: I'm not sure that
> it's the fix we actually want (a mapping_cap_account_dirty test might
> be more appropriate, but it's easier just to test a page flag here);
> but it should be good to shed more light on the problem.

I'll give the patch a try anyway, as builds are quick on that box.

> So I'm wondering if your trinity fuzzer happens to succeed a lot more
> often on madvise MADV_REMOVEs than fallocate FALLOC_FL_PUNCH_HOLEs, and
> the bug you converged on is not in tmpfs, but in ext4 (or xfs? or ocfs2?),
> which began to support MADV_REMOVE with that commit.

ext4 is a possibility.

> So the second half of the patch should show which filesystem's page is
> involved when you hit the WARN_ON - unless the first half of the patch
> turns out to stop the warnings completely, in which case I need to think
> harder about what was going on in tmpfs, and whether it matters.
>
> Or another possibility is that the bad commit doesn't actually touch mm
> at all: you were doing a bisection just on mm/ changes, weren't you?

oh, good point. It hadn't occurred to me that this could be fs related.
The mm-heavy stack-trace may have misled me.

> > Sometimes during the bisect these errors happened
> > in pairs, sometimes only together.
>
> Sometimes in pairs, sometimes together? I don't understand.

beware late-night emails. I meant sometimes I saw both the list-debug's and the WARN,
but other times I saw only one or the other.

> Please give this patch a try (preferably on current git), and let us know.

Will do.
Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933048Ab2FAOPI (ORCPT ); Fri, 1 Jun 2012 10:15:08 -0400 Received: from mx1.redhat.com ([209.132.183.28]:59713 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932595Ab2FAOPG (ORCPT ); Fri, 1 Jun 2012 10:15:06 -0400 Date: Fri, 1 Jun 2012 10:14:59 -0400 From: Dave Jones To: Hugh Dickins Cc: Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601141459.GC1732@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 01, 2012 at 01:44:44AM -0700, Hugh Dickins wrote: > So I'm wondering if your trinity fuzzer happens to succeed a lot more > often on madvise MADV_REMOVEs than fallocate FALLOC_FL_PUNCH_HOLEs, and > the bug you converged on is not in tmpfs, but in ext4 (or xfs? or ocfs2?), > which began to support MADV_REMOVE with that commit. One more thing: I happened to see this during a kernel build last night on another machine too, so it's not just fuzzing fallout. I'm surprised more people aren't seeing it. 
Dave

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965365Ab2FAQMQ (ORCPT ); Fri, 1 Jun 2012 12:12:16 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53469 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965287Ab2FAQMN (ORCPT ); Fri, 1 Jun 2012 12:12:13 -0400 Date: Fri, 1 Jun 2012 12:12:05 -0400 From: Dave Jones To: Hugh Dickins Cc: Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601161205.GA1918@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 01, 2012 at 01:44:44AM -0700, Hugh Dickins wrote:

> Please give this patch a try (preferably on current git), and let us know.
>
> Thanks,
> Hugh
>
> --- 3.4.0+/mm/migrate.c	2012-05-27 10:01:43.104049010 -0700
> +++ linux/mm/migrate.c	2012-06-01 00:10:58.080098749 -0700
> @@ -436,7 +436,10 @@ void migrate_page_copy(struct page *newp
>  		 * is actually a signal that all of the page has become dirty.
>  		 * Whereas only part of our page may be dirty.
>  		 */
> -		__set_page_dirty_nobuffers(newpage);
> +		if (PageSwapBacked(page))
> +			SetPageDirty(newpage);
> +		else
> +			__set_page_dirty_nobuffers(newpage);
>  	}
>
>  	mlock_migrate_page(newpage, page);
> --- 3.4.0+/mm/page-writeback.c	2012-05-29 08:09:58.304806782 -0700
> +++ linux/mm/page-writeback.c	2012-06-01 00:23:43.984116973 -0700
> @@ -1987,7 +1987,10 @@ int __set_page_dirty_nobuffers(struct pa
>  		mapping2 = page_mapping(page);
>  		if (mapping2) { /* Race with truncate? */
>  			BUG_ON(mapping2 != mapping);
> -			WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
> +			if (WARN_ON(!PagePrivate(page) && !PageUptodate(page)))
> +				print_symbol(KERN_WARNING
> +				    "mapping->a_ops->writepage: %s\n",
> +				    (unsigned long)mapping->a_ops->writepage);
>  			account_page_dirtied(page, mapping);
>  			radix_tree_tag_set(&mapping->page_tree,
>  				page_index(page), PAGECACHE_TAG_DIRTY);

So with this applied, I don't seem to be able to trigger it. It's been running two hours so far. I'll leave it running, but right now I don't know what to make of this.
Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965488Ab2FAQX1 (ORCPT ); Fri, 1 Jun 2012 12:23:27 -0400 Received: from ud10.udmedia.de ([194.117.254.50]:45387 "EHLO mail.ud10.udmedia.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965393Ab2FAQXX (ORCPT ); Fri, 1 Jun 2012 12:23:23 -0400 X-Greylist: delayed 400 seconds by postgrey-1.27 at vger.kernel.org; Fri, 01 Jun 2012 12:23:23 EDT Date: Fri, 1 Jun 2012 18:16:40 +0200 From: Markus Trippelsdorf To: Hugh Dickins Cc: Dave Jones , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601161640.GA329@x4> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2012.06.01 at 01:44 -0700, Hugh Dickins wrote: > On Thu, 31 May 2012, Dave Jones wrote: > > On Wed, May 30, 2012 at 08:57:40PM -0400, Dave Jones wrote: > > > On Wed, May 30, 2012 at 12:33:17PM -0400, Dave Jones wrote: > > > > Just saw this on Linus tree as of 731a7378b81c2f5fa88ca1ae20b83d548d5613dc > > > > > > > > WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() I've also hit this warning today: ------------[ cut here ]------------ WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0xea/0x120() Hardware name: System Product Name Pid: 4385, comm: firefox Not tainted 3.4.0-09547-gfb21aff-dirty #46 Call Trace: [] ? warn_slowpath_common+0x60/0xa0 [] ? __set_page_dirty_nobuffers+0xea/0x120 [] ? migrate_page_copy+0x150/0x160 [] ? migrate_page+0x4d/0x80 [] ? move_to_new_page+0x7d/0x220 [] ? 
suitable_migration_target.isra.12+0x1a0/0x1a0 [] ? migrate_pages+0x3c8/0x460 [] ? compact_zone+0x1c4/0x2c0 [] ? compact_zone_order+0x82/0xc0 [] ? try_to_compact_pages+0xca/0x140 [] ? __alloc_pages_direct_compact+0xa7/0x18f [] ? __alloc_pages_nodemask+0x3b0/0x7a0 [] ? do_huge_pmd_anonymous_page+0x10d/0x2a0 [] ? do_page_fault+0xfb/0x400 [] ? mmap_region+0x1dd/0x540 [] ? page_fault+0x1f/0x30 ---[ end trace 7d7c821044142576 ]--- -- Markus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965475Ab2FAQ3S (ORCPT ); Fri, 1 Jun 2012 12:29:18 -0400 Received: from mail-wi0-f170.google.com ([209.85.212.170]:51470 "EHLO mail-wi0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965417Ab2FAQ3R (ORCPT ); Fri, 1 Jun 2012 12:29:17 -0400 MIME-Version: 1.0 In-Reply-To: <20120601161640.GA329@x4> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161640.GA329@x4> From: Linus Torvalds Date: Fri, 1 Jun 2012 09:28:56 -0700 X-Google-Sender-Auth: WjkHH_IY3griDFqCf6FEe7Cv7ms Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() To: Markus Trippelsdorf Cc: Hugh Dickins , Dave Jones , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 1, 2012 at 9:16 AM, Markus Trippelsdorf wrote: > > I've also hit this warning today: Can you try the patch by Hugh Dickins earlier in this thread? Dave is reporting tentative success with it, even though I don't think we really understand this thing fully yet. Getting way more testing would still be good, though. 
Linus

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965512Ab2FAQjX (ORCPT ); Fri, 1 Jun 2012 12:39:23 -0400 Received: from ud10.udmedia.de ([194.117.254.50]:47819 "EHLO mail.ud10.udmedia.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965403Ab2FAQjW (ORCPT ); Fri, 1 Jun 2012 12:39:22 -0400 Date: Fri, 1 Jun 2012 18:39:18 +0200 From: Markus Trippelsdorf To: Linus Torvalds Cc: Hugh Dickins , Dave Jones , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601163918.GB329@x4> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161640.GA329@x4> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On 2012.06.01 at 09:28 -0700, Linus Torvalds wrote:
> On Fri, Jun 1, 2012 at 9:16 AM, Markus Trippelsdorf
> wrote:
> >
> > I've also hit this warning today:
>
> Can you try the patch by Hugh Dickins earlier in this thread?

I will try. But please note that the warning just happened by accident.
I don't know how to reproduce the issue yet.

> Dave is reporting tentative success with it, even though I don't think
> we really understand this thing fully yet. Getting way more testing
> would still be good, though.
-- Markus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933617Ab2FASFy (ORCPT ); Fri, 1 Jun 2012 14:05:54 -0400 Received: from mx1.redhat.com ([209.132.183.28]:48582 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932480Ab2FASFt (ORCPT ); Fri, 1 Jun 2012 14:05:49 -0400 Date: Fri, 1 Jun 2012 13:16:06 -0400 From: Dave Jones To: Hugh Dickins , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120601171606.GA3794@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , Linus Torvalds , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120601161205.GA1918@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 01, 2012 at 12:12:05PM -0400, Dave Jones wrote: > So with this applied, I don't seem to be able to trigger it. It's been running two hours > so far. I'll leave it running, but right now I don't know what to make of this. I can trigger the list corruption still, but not the WARN. Dave [ 551.980716] ------------[ cut here ]------------ [ 551.981646] WARNING: at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0() [ 551.983461] list_del corruption. 
prev->next should be ffffea0004b305a0, but was ffffea0004f117e0 [ 551.984406] Modules linked in: tun fuse nfnetlink binfmt_misc ipt_ULOG sctp libcrc32c caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr i2c_i801 e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] [ 551.988121] Pid: 21459, comm: trinity-child2 Not tainted 3.4.0+ #49 [ 551.989063] Call Trace: [ 551.990012] [] warn_slowpath_common+0x7f/0xc0 [ 551.990956] [] warn_slowpath_fmt+0x46/0x50 [ 551.991902] [] __list_del_entry+0xa1/0xd0 [ 551.992849] [] move_freepages_block+0x159/0x190 [ 551.993800] [] suitable_migration_target.isra.15+0x1b3/0x1d0 [ 551.994761] [] compaction_alloc+0x22e/0x2f0 [ 551.995731] [] migrate_pages+0xc7/0x540 [ 551.996684] [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 [ 551.997638] [] compact_zone+0x216/0x480 [ 551.998593] [] ? trace_hardirqs_off_caller+0x28/0xc0 [ 551.999558] [] compact_zone_order+0x8d/0xd0 [ 552.000525] [] ? get_page_from_freelist+0x565/0x970 [ 552.001502] [] try_to_compact_pages+0xc9/0x140 [ 552.002548] [] __alloc_pages_direct_compact+0xaa/0x1d0 [ 552.003592] [] __alloc_pages_nodemask+0x60b/0xab0 [ 552.004650] [] ? trace_hardirqs_off_caller+0x28/0xc0 [ 552.005708] [] ? __lock_acquire+0x2d0/0x1aa0 [ 552.007332] [] alloc_pages_vma+0xb6/0x190 [ 552.008953] [] do_huge_pmd_anonymous_page+0x133/0x310 [ 552.010584] [] handle_mm_fault+0x242/0x2e0 [ 552.012233] [] __get_user_pages+0x142/0x560 [ 552.013891] [] ? mmap_region+0x3f8/0x630 [ 552.015753] [] get_user_pages+0x52/0x60 [ 552.017348] [] make_pages_present+0x92/0xc0 [ 552.018936] [] mmap_region+0x3a6/0x630 [ 552.021074] [] ? 
do_setitimer+0x1cc/0x310 [ 552.022367] [] do_mmap_pgoff+0x35d/0x3b0 [ 552.023406] [] ? sys_mmap_pgoff+0x66/0x240 [ 552.024429] [] sys_mmap_pgoff+0x84/0x240 [ 552.025445] [] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 552.026466] [] sys_mmap+0x22/0x30 [ 552.027486] [] system_call_fastpath+0x16/0x1b [ 552.028521] ---[ end trace c092df1e14d11d14 ]--- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965830Ab2FAWSR (ORCPT ); Fri, 1 Jun 2012 18:18:17 -0400 Received: from mail-pz0-f46.google.com ([209.85.210.46]:44828 "EHLO mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965750Ab2FAWSO (ORCPT ); Fri, 1 Jun 2012 18:18:14 -0400 Date: Fri, 1 Jun 2012 15:17:48 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Dave Jones , Linus Torvalds , Andrew Morton , Cong Wang cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: <20120601171606.GA3794@redhat.com> Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 1 Jun 2012, Dave Jones wrote: > On Fri, Jun 01, 2012 at 12:12:05PM -0400, Dave Jones wrote: > > > So with this applied, I don't seem to be able to trigger it. It's been running two hours > > so far. I'll leave it running, but right now I don't know what to make of this. > > I can trigger the list corruption still, but not the WARN. 
> > Dave > > [ 551.980716] ------------[ cut here ]------------ > [ 551.981646] WARNING: at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0() > [ 551.983461] list_del corruption. prev->next should be ffffea0004b305a0, but was ffffea0004f117e0 > [ 551.984406] Modules linked in: tun fuse nfnetlink binfmt_misc ipt_ULOG sctp libcrc32c caif_socket caif phonet bluetooth rfkill can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 psnap p8022 llc ax25 ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr i2c_i801 e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan] > [ 551.988121] Pid: 21459, comm: trinity-child2 Not tainted 3.4.0+ #49 > [ 551.989063] Call Trace: > [ 551.990012] [] warn_slowpath_common+0x7f/0xc0 > [ 551.990956] [] warn_slowpath_fmt+0x46/0x50 > [ 551.991902] [] __list_del_entry+0xa1/0xd0 > [ 551.992849] [] move_freepages_block+0x159/0x190 > [ 551.993800] [] suitable_migration_target.isra.15+0x1b3/0x1d0 > [ 551.994761] [] compaction_alloc+0x22e/0x2f0 > [ 551.995731] [] migrate_pages+0xc7/0x540 > [ 551.996684] [] ? suitable_migration_target.isra.15+0x1d0/0x1d0 > [ 551.997638] [] compact_zone+0x216/0x480 > [ 551.998593] [] ? trace_hardirqs_off_caller+0x28/0xc0 > [ 551.999558] [] compact_zone_order+0x8d/0xd0 > [ 552.000525] [] ? get_page_from_freelist+0x565/0x970 > [ 552.001502] [] try_to_compact_pages+0xc9/0x140 > [ 552.002548] [] __alloc_pages_direct_compact+0xaa/0x1d0 > [ 552.003592] [] __alloc_pages_nodemask+0x60b/0xab0 > [ 552.004650] [] ? trace_hardirqs_off_caller+0x28/0xc0 > [ 552.005708] [] ? 
__lock_acquire+0x2d0/0x1aa0 > [ 552.007332] [] alloc_pages_vma+0xb6/0x190 > [ 552.008953] [] do_huge_pmd_anonymous_page+0x133/0x310 > [ 552.010584] [] handle_mm_fault+0x242/0x2e0 > [ 552.012233] [] __get_user_pages+0x142/0x560 > [ 552.013891] [] ? mmap_region+0x3f8/0x630 > [ 552.015753] [] get_user_pages+0x52/0x60 > [ 552.017348] [] make_pages_present+0x92/0xc0 > [ 552.018936] [] mmap_region+0x3a6/0x630 > [ 552.021074] [] ? do_setitimer+0x1cc/0x310 > [ 552.022367] [] do_mmap_pgoff+0x35d/0x3b0 > [ 552.023406] [] ? sys_mmap_pgoff+0x66/0x240 > [ 552.024429] [] sys_mmap_pgoff+0x84/0x240 > [ 552.025445] [] ? trace_hardirqs_on_thunk+0x3a/0x3f > [ 552.026466] [] sys_mmap+0x22/0x30 > [ 552.027486] [] system_call_fastpath+0x16/0x1b > [ 552.028521] ---[ end trace c092df1e14d11d14 ]--- Several distractions today, and I must rush out now for two or three hours: but please check if this patch below makes sense (I've only checked that it builds), and if so give it a run to see if it fixes your list corruptions - thanks. (Looks like there's an independent off-by-one in page_zone(end_page), but that shouldn't do any harm.) 
Hugh

--- 3.4.0+/mm/compaction.c	2012-05-30 08:17:19.396008280 -0700
+++ linux/mm/compaction.c	2012-06-01 15:04:18.612051243 -0700
@@ -369,6 +369,9 @@ static bool rescue_unmovable_pageblock(s
 {
 	unsigned long pfn, start_pfn, end_pfn;
 	struct page *start_page, *end_page;
+	struct zone *zone;
+	unsigned long flags;
+	bool rescued = false;
 
 	pfn = page_to_pfn(page);
 	start_pfn = pfn & ~(pageblock_nr_pages - 1);
@@ -378,9 +381,11 @@ static bool rescue_unmovable_pageblock(s
 	end_page = pfn_to_page(end_pfn);
 
 	/* Do not deal with pageblocks that overlap zones */
-	if (page_zone(start_page) != page_zone(end_page))
+	zone = page_zone(start_page);
+	if (zone != page_zone(end_page))
 		return false;
 
+	spin_lock_irqsave(&zone->lock, flags);
 	for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
 								  page++) {
 		if (!pfn_valid_within(pfn))
@@ -396,12 +401,15 @@ static bool rescue_unmovable_pageblock(s
 		} else if (page_count(page) == 0 || PageLRU(page))
 			continue;
 
-		return false;
+		goto out;
 	}
 
 	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
-	return true;
+	move_freepages_block(zone, page, MIGRATE_MOVABLE);
+	rescued = true;
+out:
+	spin_unlock_irqrestore(&zone->lock, flags);
+	return rescued;
 }
 
 enum smt_result {

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966202Ab2FBBpb (ORCPT ); Fri, 1 Jun 2012 21:45:31 -0400 Received: from mail-wi0-f172.google.com ([209.85.212.172]:40328 "EHLO mail-wi0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S966188Ab2FBBp3 convert rfc822-to-8bit (ORCPT ); Fri, 1 Jun 2012 21:45:29 -0400 MIME-Version: 1.0 In-Reply-To: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> From: Linus Torvalds Date: Fri, 1 Jun 2012 18:45:07 -0700 X-Google-Sender-Auth:
MgjYepTACaQC49TKf5qQ11LrLmw Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() To: Hugh Dickins Cc: Dave Jones , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins wrote: > > +       spin_lock_irqsave(&zone->lock, flags); >        for (page = start_page, pfn = start_pfn; page < end_page; pfn++, >                                                                  page++) { So holding the spinlock (and disabling irqs!) over the whole loop sounds horrible. At the same time, the iterators don't seem to require the spinlock, so it should be possible to just move the lock into the loop, no? Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932542Ab2FBE7S (ORCPT ); Sat, 2 Jun 2012 00:59:18 -0400 Received: from mail-wi0-f172.google.com ([209.85.212.172]:59654 "EHLO mail-wi0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756993Ab2FBE7R (ORCPT ); Sat, 2 Jun 2012 00:59:17 -0400 MIME-Version: 1.0 In-Reply-To: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> From: Linus Torvalds Date: Fri, 1 Jun 2012 21:58:50 -0700 X-Google-Sender-Auth: 1BphR7UYHfcZPYjKyGWiFBLzWiI Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() To: Hugh Dickins Cc: Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; 
charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jun 1, 2012 at 9:40 PM, Hugh Dickins wrote: > > Move the lock after the loop, I think you meant. Well, I wasn't sure if anything inside the loop might need it. I don't *think* so, but at the same time, what protects "page_order(page)" (or, indeed PageBuddy()) from being stable while that loop content uses them? I don't understand that code at all. It does that crazy iteration over page, and changes "page" in random ways, and then finishes up with a totally new "page" value that is some random thing that is *after* the end_page thing. WHAT? The code makes no sense. It tests all those pages within the page-block, but then after it has done all those tests, it does the final set_pageblock_migratetype(..) move_freepages_block(..) using a page that is *beyond* the pageblock (and with the whole page_order() thing, who knows just how far beyond it?) It looks entirely too much like random-monkey code to me. 
Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966400Ab2FBHRh (ORCPT ); Sat, 2 Jun 2012 03:17:37 -0400 Received: from ud10.udmedia.de ([194.117.254.50]:33707 "EHLO mail.ud10.udmedia.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S966316Ab2FBHRd (ORCPT ); Sat, 2 Jun 2012 03:17:33 -0400 Date: Sat, 2 Jun 2012 09:17:30 +0200 From: Markus Trippelsdorf To: Hugh Dickins Cc: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120602071730.GB329@x4> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2012.06.01 at 21:40 -0700, Hugh Dickins wrote: > > I'm guessing that the few people who see the warning are those running > new systemd distros, and that systemd is indeed now making use of the > fallocate support we added into tmpfs for it.) At least in my case it's nothing that horrible. I'm just setting browser.cache.disk.parent_directory to /dev/shm in Firefox. And Firefox does indeed use fallocate on its "disk cache" items. 
-- Markus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966394Ab2FBHUl (ORCPT ); Sat, 2 Jun 2012 03:20:41 -0400 Received: from mail-pz0-f46.google.com ([209.85.210.46]:48261 "EHLO mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S966371Ab2FBHUk (ORCPT ); Sat, 2 Jun 2012 03:20:40 -0400 Date: Sat, 2 Jun 2012 00:20:13 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Linus Torvalds cc: Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 1 Jun 2012, Linus Torvalds wrote: > On Fri, Jun 1, 2012 at 9:40 PM, Hugh Dickins wrote: > > > > Move the lock after the loop, I think you meant. > > Well, I wasn't sure if anything inside the loop might need it. I don't > *think* so, but at the same time, what protects "page_order(page)" > (or, indeed PageBuddy()) from being stable while that loop content > uses them? Yes, I believe you're right, page_order(page) could supply nonsense if it's not stabilized under zone->lock along with PageBuddy(page). Though if this rescue_unmovable_pageblock() is just best-effort, with a little more care we can probably avoid the lock in there. > > I don't understand that code at all. 
It does that crazy iteration over > page, and changes "page" in random ways, I don't think they're random ways: when buddy it uses the order to skip that block, otherwise it goes page by page, considering a free (I guess on pcp) page or an lru page as good for movable. > and then finishes up with a > totally new "page" value that is some random thing that is *after* the > end_page thing. WHAT? > > The code makes no sense. It tests all those pages within the > page-block, but then after it has done all those tests, it does the > final > > set_pageblock_migratetype(..) > move_freepages_block(..) > > using a page that is *beyond* the pageblock (and with the whole > page_order() thing, who knows just how far beyond it?) I totally missed that, thank goodness you did not. Yes, it's rubbish. It goes to this effort to find a suitable pageblock, then chooses the next one instead (or possibly another). Perhaps it would get even better results using a random number generator in there. > > It looks entirely too much like random-monkey code to me. Presumably it should be passing start_page instead of page to set_pageblock_migratetype() and move_freepages_block(). But this does seem to be code of the kind, that the longer you look at it, the more bugs you find. And I worry about what trouble it might then cause, if it actually started to work in the way it was intending. I don't think fixing it up is wise for -rc1. Commit 5ceb9ce6fe9462a298bb2cd5c9f1ca6cb80a0199 ("mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks") appears to revert cleanly, and I'm running with it reverted now. I'm not saying it shouldn't come back later, but does anyone see an argument against reverting it now? 
Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966406Ab2FBHXD (ORCPT ); Sat, 2 Jun 2012 03:23:03 -0400 Received: from mail-pz0-f46.google.com ([209.85.210.46]:51298 "EHLO mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S966378Ab2FBHXB (ORCPT ); Sat, 2 Jun 2012 03:23:01 -0400 Date: Sat, 2 Jun 2012 00:22:34 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Markus Trippelsdorf cc: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: <20120602071730.GB329@x4> Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120602071730.GB329@x4> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, 2 Jun 2012, Markus Trippelsdorf wrote: > On 2012.06.01 at 21:40 -0700, Hugh Dickins wrote: > > > > I'm guessing that the few people who see the warning are those running > > new systemd distros, and that systemd is indeed now making use of the > > fallocate support we added into tmpfs for it.) > > At least in my case it's nothing that horrible. I'm just setting > browser.cache.disk.parent_directory to /dev/shm in Firefox. And Firefox > does indeed use fallocate on its "disk cache" items. That fits, and it's very helpful to know - thank you. 
Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966419Ab2FBH2J (ORCPT ); Sat, 2 Jun 2012 03:28:09 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:34037 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S966396Ab2FBH2H (ORCPT ); Sat, 2 Jun 2012 03:28:07 -0400 Date: Sat, 2 Jun 2012 00:27:47 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Linus Torvalds cc: Markus Trippelsdorf , Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH] mm: fix warning in __set_page_dirty_nobuffers In-Reply-To: Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120602071730.GB329@x4> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org New tmpfs use of !PageUptodate pages for fallocate() is triggering the WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers() is called from migrate_page_copy() for compaction. It is anomalous that migration should use __set_page_dirty_nobuffers() on an address_space that does not participate in dirty and writeback accounting; and this has also been observed to insert surprising dirty tags into a tmpfs radix_tree, despite tmpfs not using tags at all. We should probably give migrate_page_copy() a better way to preserve the tag and migrate accounting info, when mapping_cap_account_dirty(). But that needs some more work: so in the interim, avoid the warning by using a simple SetPageDirty on PageSwapBacked pages. 
Reported-by: Dave Jones
Signed-off-by: Hugh Dickins
---

 mm/migrate.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- 3.4.0+/mm/migrate.c	2012-05-27 10:01:43.104049010 -0700
+++ linux/mm/migrate.c	2012-06-01 00:10:58.080098749 -0700
@@ -436,7 +436,10 @@ void migrate_page_copy(struct page *newp
 		 * is actually a signal that all of the page has become dirty.
 		 * Whereas only part of our page may be dirty.
 		 */
-		__set_page_dirty_nobuffers(newpage);
+		if (PageSwapBacked(page))
+			SetPageDirty(newpage);
+		else
+			__set_page_dirty_nobuffers(newpage);
 	}
 
 	mlock_migrate_page(newpage, page);

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754118Ab2FCSQP (ORCPT ); Sun, 3 Jun 2012 14:16:15 -0400 Received: from mx1.redhat.com ([209.132.183.28]:21060 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753992Ab2FCSQO (ORCPT ); Sun, 3 Jun 2012 14:16:14 -0400 Date: Sun, 3 Jun 2012 14:15:48 -0400 From: Dave Jones To: Hugh Dickins Cc: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120603181548.GA306@redhat.com> Mail-Followup-To: Dave Jones , Hugh Dickins , Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 01, 2012 at 09:40:35PM -0700, Hugh Dickins wrote:

 > In which case, yes, much better to follow your suggestion, and hold
 > the lock (with irqs disabled) for only half the time.
 >
 > Similarly untested patch below.

Things aren't happy with that patch at all.

=============================================
[ INFO: possible recursive locking detected ]
3.5.0-rc1+ #50 Not tainted
---------------------------------------------
trinity-child1/31784 is trying to acquire lock:
 (&(&zone->lock)->rlock){-.-.-.}, at: [] suitable_migration_target.isra.15+0x19d/0x1e0

but task is already holding lock:
 (&(&zone->lock)->rlock){-.-.-.}, at: [] compaction_alloc+0x21b/0x2f0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&zone->lock)->rlock);
  lock(&(&zone->lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by trinity-child1/31784:
 #0: (&mm->mmap_sem){++++++}, at: [] vm_mmap_pgoff+0x66/0xb0
 #1: (&(&zone->lock)->rlock){-.-.-.}, at: [] compaction_alloc+0x21b/0x2f0

stack backtrace:
Pid: 31784, comm: trinity-child1 Not tainted 3.5.0-rc1+ #50
Call Trace:
 [] __lock_acquire+0x1584/0x1aa0
 [] ? trace_hardirqs_off_caller+0x28/0xc0
 [] ? local_clock+0x47/0x60
 [] lock_acquire+0x92/0x1f0
 [] ? suitable_migration_target.isra.15+0x19d/0x1e0
 [] ? _raw_spin_lock_irqsave+0x25/0x90
 [] _raw_spin_lock_irqsave+0x52/0x90
 [] ? suitable_migration_target.isra.15+0x19d/0x1e0
 [] suitable_migration_target.isra.15+0x19d/0x1e0
 [] compaction_alloc+0x22e/0x2f0
 [] migrate_pages+0xc7/0x540
 [] ? isolate_freepages_block+0x260/0x260
 [] compact_zone+0x216/0x480
 [] ? trace_hardirqs_off_caller+0x28/0xc0
 [] compact_zone_order+0x8d/0xd0
 [] ? get_page_from_freelist+0x565/0x970
 [] try_to_compact_pages+0xc9/0x140
 [] __alloc_pages_direct_compact+0xaa/0x1d0

Then a bunch of NMI backtraces, and a hard lockup.
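
For readers less used to this kind of lockdep report: the trace says the task
tried to take zone->lock while already holding it (taken in compaction_alloc(),
taken again under suitable_migration_target(), presumably through the locking
the trial patch added). Kernel spinlocks are not recursive, so that second
acquisition can never succeed. The following is only a user-space analogy, not
kernel code: it assumes a POSIX error-checking pthread mutex as a stand-in for
zone->lock, and the function name nested_lock_attempt() is made up for the
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * User-space analogy only: an error-checking pthread mutex stands in for
 * the kernel's non-recursive zone->lock. Returns the error code produced
 * by the second (nested) lock attempt on the same thread.
 */
int nested_lock_attempt(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t lock;
    int rc, nested;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&lock, &attr);
    assert(rc == 0);

    /* Outer caller takes the lock (compaction_alloc() in the report). */
    rc = pthread_mutex_lock(&lock);
    assert(rc == 0);

    /*
     * A callee tries to take the same lock again. An error-checking
     * mutex reports the problem instead of hanging; a plain spinlock
     * would simply spin forever.
     */
    nested = pthread_mutex_lock(&lock);

    pthread_mutex_unlock(&lock);
    pthread_mutex_destroy(&lock);
    pthread_mutexattr_destroy(&attr);
    return nested;
}
```

Calling nested_lock_attempt() should return EDEADLK on a POSIX system; in the
kernel there is no such error return, which is consistent with the hard lockup
reported above.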
Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754172Ab2FCSXw (ORCPT ); Sun, 3 Jun 2012 14:23:52 -0400 Received: from mail-we0-f174.google.com ([74.125.82.174]:40777 "EHLO mail-we0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754084Ab2FCSXv (ORCPT ); Sun, 3 Jun 2012 14:23:51 -0400 MIME-Version: 1.0 In-Reply-To: <20120603181548.GA306@redhat.com> References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> From: Linus Torvalds Date: Sun, 3 Jun 2012 11:23:29 -0700 X-Google-Sender-Auth: nt_fNlnp8FLv9cZ8UfCt9GDQupA Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() To: Dave Jones , Hugh Dickins , Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones wrote: > > Things aren't happy with that patch at all. Yeah, at this point I think we need to just revert the compaction changes. Guys, what's the minimal set of commits to revert? That clearly buggy "rescue_unmovable_pageblock()" function was introduced by commit 5ceb9ce6fe94, but is that actually involved with the particular bug? That commit seems to revert cleanly still, but is that sufficient or does it even matter? 
Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754251Ab2FCSb5 (ORCPT ); Sun, 3 Jun 2012 14:31:57 -0400 Received: from mx1.redhat.com ([209.132.183.28]:26400 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754081Ab2FCSb4 (ORCPT ); Sun, 3 Jun 2012 14:31:56 -0400 Date: Sun, 3 Jun 2012 14:31:39 -0400 From: Dave Jones To: Linus Torvalds Cc: Hugh Dickins , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120603183139.GA1061@redhat.com> Mail-Followup-To: Dave Jones , Linus Torvalds , Hugh Dickins , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jun 03, 2012 at 11:23:29AM -0700, Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones wrote: > > > > Things aren't happy with that patch at all. > > Yeah, at this point I think we need to just revert the compaction changes. > > Guys, what's the minimal set of commits to revert? That clearly buggy > "rescue_unmovable_pageblock()" function was introduced by commit > 5ceb9ce6fe94, but is that actually involved with the particular bug? 
> That commit seems to revert cleanly still, but is that sufficient or > does it even matter? I'l rerun the test with that (and Hugh's last patch) backed out, and see if that makes any difference. Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754784Ab2FCUxw (ORCPT ); Sun, 3 Jun 2012 16:53:52 -0400 Received: from mx1.redhat.com ([209.132.183.28]:50883 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754531Ab2FCUxv (ORCPT ); Sun, 3 Jun 2012 16:53:51 -0400 Date: Sun, 3 Jun 2012 16:53:32 -0400 From: Dave Jones To: Linus Torvalds , Hugh Dickins , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120603205332.GA5412@redhat.com> Mail-Followup-To: Dave Jones , Linus Torvalds , Hugh Dickins , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20120603183139.GA1061@redhat.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jun 03, 2012 at 02:31:39PM -0400, Dave Jones wrote: > On Sun, Jun 03, 2012 at 11:23:29AM -0700, Linus Torvalds wrote: > > On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones wrote: > > > > > > Things aren't happy with that patch at all. 
> > > > Yeah, at this point I think we need to just revert the compaction changes. > > > > Guys, what's the minimal set of commits to revert? That clearly buggy > > "rescue_unmovable_pageblock()" function was introduced by commit > > 5ceb9ce6fe94, but is that actually involved with the particular bug? > > That commit seems to revert cleanly still, but is that sufficient or > > does it even matter? > > I'l rerun the test with that (and Hugh's last patch) backed out, and see > if that makes any difference. running just over two hours with that commit reverted with no obvious ill effects so far. Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754839Ab2FCV7q (ORCPT ); Sun, 3 Jun 2012 17:59:46 -0400 Received: from mail-wg0-f44.google.com ([74.125.82.44]:47649 "EHLO mail-wg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751037Ab2FCV7p (ORCPT ); Sun, 3 Jun 2012 17:59:45 -0400 MIME-Version: 1.0 In-Reply-To: <20120603205332.GA5412@redhat.com> References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> From: Linus Torvalds Date: Sun, 3 Jun 2012 14:59:22 -0700 X-Google-Sender-Auth: 26yaPxQEQNrG8IG020bsSgI0LxU Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() To: Dave Jones , Linus Torvalds , Hugh Dickins , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jun 3, 2012 at 1:53 PM, Dave Jones wrote: > > running just over two hours with that commit reverted 
with no obvious ill effects so far. And how quickly have you usually seen the problems? Would you have considered two ours "good" in your bisection thing? Also, just to check: Hugh sent out a patch called "mm: fix warning in __set_page_dirty_nobuffers". Is that applied in your tree too, or did the __set_page_dirty_nobuffers() warning go away with just the revert? I'm just trying to figure out exactly what you are testing. When you said "test with that (and Hugh's last patch) backed out", the "and Hugh's last patch" part was a bit ambiguous. Do you mean the trial patch in this thread (backed out) or do you mean "*with* Hugh's patch for the __set_page_dirty_nobuffers() warning". Linus From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754860Ab2FCWNp (ORCPT ); Sun, 3 Jun 2012 18:13:45 -0400 Received: from mx1.redhat.com ([209.132.183.28]:59848 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753869Ab2FCWNn (ORCPT ); Sun, 3 Jun 2012 18:13:43 -0400 Date: Sun, 3 Jun 2012 18:13:26 -0400 From: Dave Jones To: Linus Torvalds Cc: Hugh Dickins , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() Message-ID: <20120603221326.GA7707@redhat.com> Mail-Followup-To: Dave Jones , Linus Torvalds , Hugh Dickins , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; 
charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, Jun 03, 2012 at 02:59:22PM -0700, Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 1:53 PM, Dave Jones wrote: > > > > running just over two hours with that commit reverted with no obvious ill effects so far. > > And how quickly have you usually seen the problems? Would you have > considered two ours "good" in your bisection thing? Yeah, usually see something go awry in an hour or less. > Also, just to check: Hugh sent out a patch called "mm: fix warning in > __set_page_dirty_nobuffers". Is that applied in your tree too, or did > the __set_page_dirty_nobuffers() warning go away with just the revert? That is applied. Otherwise I see the warning he refers to. > I'm just trying to figure out exactly what you are testing. When you > said "test with that (and Hugh's last patch) backed out", the "and > Hugh's last patch" part was a bit ambiguous. Do you mean the trial > patch in this thread (backed out) or do you mean "*with* Hugh's patch > for the __set_page_dirty_nobuffers() warning". The former. (This). 
--- 3.4.0+/mm/compaction.c 2012-05-30 08:17:19.396008280 -0700
+++ linux/mm/compaction.c 2012-06-01 20:59:56.840204915 -0700
@@ -369,6 +369,8 @@ static bool rescue_unmovable_pageblock(s
 {
         unsigned long pfn, start_pfn, end_pfn;
         struct page *start_page, *end_page;
+        struct zone *zone;
+        unsigned long flags;
 
         pfn = page_to_pfn(page);
         start_pfn = pfn & ~(pageblock_nr_pages - 1);
@@ -378,7 +380,8 @@ static bool rescue_unmovable_pageblock(s
         end_page = pfn_to_page(end_pfn);
 
         /* Do not deal with pageblocks that overlap zones */
-        if (page_zone(start_page) != page_zone(end_page))
+        zone = page_zone(start_page);
+        if (zone != page_zone(end_page))
                 return false;
 
         for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
@@ -399,8 +402,10 @@ static bool rescue_unmovable_pageblock(s
                 return false;
         }
 
+        spin_lock_irqsave(&zone->lock, flags);
         set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-        move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+        move_freepages_block(zone, page, MIGRATE_MOVABLE);
+        spin_unlock_irqrestore(&zone->lock, flags);
         return true;

I do see something else weird going on, but it seems like an unrelated problem.
I have a lot of processes hanging after calling sys_renameat.
I'll dig some more on that, and post a follow-up.
Dave From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754928Ab2FCWSH (ORCPT ); Sun, 3 Jun 2012 18:18:07 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:48709 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754617Ab2FCWSF (ORCPT ); Sun, 3 Jun 2012 18:18:05 -0400 Date: Sun, 3 Jun 2012 15:17:36 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Dave Jones , Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: <20120603205332.GA5412@redhat.com> Message-ID: References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, 3 Jun 2012, Dave Jones wrote: > On Sun, Jun 03, 2012 at 02:31:39PM -0400, Dave Jones wrote: > > On Sun, Jun 03, 2012 at 11:23:29AM -0700, Linus Torvalds wrote: > > > On Sun, Jun 3, 2012 at 11:15 AM, Dave Jones wrote: > > > > > > > > Things aren't happy with that patch at all. > > > > > > Yeah, at this point I think we need to just revert the compaction changes. > > > > > > Guys, what's the minimal set of commits to revert? That clearly buggy > > > "rescue_unmovable_pageblock()" function was introduced by commit > > > 5ceb9ce6fe94, but is that actually involved with the particular bug? 
> > > That commit seems to revert cleanly still, but is that sufficient or > > > does it even matter? > > > > I'l rerun the test with that (and Hugh's last patch) backed out, and see > > if that makes any difference. > > running just over two hours with that commit reverted with no obvious ill effects so far. Yes, and I ran happily with precisely that commit reverted on Friday - though I've never got the list corruption that you saw with it in. The locking bug certainly comes in with that commit, it's an isolated commit that reverts cleanly, and I think you got the list corruption rather sooner than two hours before (9min, 30min, 41min from the traces you sent). Maybe we should let you run a little longer, or wait for others to comment. But another strike against that commit: I tried fixing it up to use start_page instead of page at the end, with the worrying but safer locking I suggested at first, with a count of how many times it went there, and how many times it succeeded. While I ran my usual swapping test (perhaps that's a very unfair test to run on this, I've no idea) for seven hours, it went there 25406 times (once per second, it appears) and it succeeded... 0 times. Let's hope it failed quickly each time, I wasn't capturing that. 
Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755023Ab2FCWaQ (ORCPT ); Sun, 3 Jun 2012 18:30:16 -0400 Received: from mail-pz0-f46.google.com ([209.85.210.46]:52993 "EHLO mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754920Ab2FCWaO (ORCPT ); Sun, 3 Jun 2012 18:30:14 -0400 Date: Sun, 3 Jun 2012 15:29:46 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Linus Torvalds cc: Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: Message-ID: References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, 3 Jun 2012, Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 1:53 PM, Dave Jones wrote: > > > > running just over two hours with that commit reverted with no obvious ill effects so far. > > And how quickly have you usually seen the problems? Would you have > considered two ours "good" in your bisection thing? > > Also, just to check: Hugh sent out a patch called "mm: fix warning in > __set_page_dirty_nobuffers". Is that applied in your tree too, or did > the __set_page_dirty_nobuffers() warning go away with just the revert? That patch is good for fixing the __set_page_dirty_nobuffers() warning, but it has no relevance to the list corruption Dave was also reporting, nor vice versa. 
The common factor there is just Dave. And no disaster that the warning fix missed -rc1: it's only a WARN_ON_ONCE, and nothing was wrong beyond the warning itself, just noise. It's true that Dave's original bisection raised the doubt whether that warning is coming from somewhere else too; but best guess at this point is that something got mixed up, and we should only worry about that if we see the warning again once the known fix is in. Hugh > > I'm just trying to figure out exactly what you are testing. When you > said "test with that (and Hugh's last patch) backed out", the "and > Hugh's last patch" part was a bit ambiguous. Do you mean the trial > patch in this thread (backed out) or do you mean "*with* Hugh's patch > for the __set_page_dirty_nobuffers() warning". From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755130Ab2FCXNf (ORCPT ); Sun, 3 Jun 2012 19:13:35 -0400 Received: from mail-we0-f174.google.com ([74.125.82.174]:44162 "EHLO mail-we0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755074Ab2FCXNe (ORCPT ); Sun, 3 Jun 2012 19:13:34 -0400 MIME-Version: 1.0 In-Reply-To: References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> From: Linus Torvalds Date: Sun, 3 Jun 2012 16:13:13 -0700 X-Google-Sender-Auth: dVP7z6mlpZKrjxfKJ2IKV_DjlVw Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() To: Hugh Dickins Cc: Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org

On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins wrote:
>
> But another strike against that commit: I tried fixing it up to use
> start_page instead of page at the end, with the worrying but safer
> locking I suggested at first, with a count of how many times it went
> there, and how many times it succeeded.

You can't use start_page anyway, it might not be a valid page. There's
a reason it does that "pfn_valid_within()", methinks.

Anyway, my current plan is to apply your "mm: fix warning in
__set_page_dirty_nobuffers" patch - even if it's just a harmless
WARN_ON_ONCE(), and revert 5ceb9ce6fe94. Sounds like Dave normally
hit his problem much before two hours, and it must be even longer now.

Ack on that plan?

Linus

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755414Ab2FDAqR (ORCPT ); Sun, 3 Jun 2012 20:46:17 -0400
Received: from mail-qa0-f46.google.com ([209.85.216.46]:56527 "EHLO mail-qa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755391Ab2FDAqQ (ORCPT ); Sun, 3 Jun 2012 20:46:16 -0400
Message-ID: <4FCC0553.80100@gmail.com>
Date: Sun, 03 Jun 2012 20:46:11 -0400
From: KOSAKI Motohiro
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:12.0) Gecko/20120428 Thunderbird/12.0.1
MIME-Version: 1.0
To: Linus Torvalds
CC: Hugh Dickins , Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kosaki.motohiro@gmail.com
Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170()
References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com>
In-Reply-To:
Content-Type: text/plain;
charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org (6/3/12 7:13 PM), Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins wrote: >> >> But another strike against that commit: I tried fixing it up to use >> start_page instead of page at the end, with the worrying but safer >> locking I suggested at first, with a count of how many times it went >> there, and how many times it succeeded. > > You can't use start_page anyway, it might not be a valid page. There's > a reson it does that "pfn_valid_within()", methinks. Right. ia64 has strange^H^H^H^H optimized pfn_valid and we need care it. (btw, I don't understand why mips may enable CONFIG_HOLES_INZONE, mips doesn't have custom pfn_valid) > Anyway, my current plan is to apply your "mm: fix warning in > __set_page_dirty_nobuffers" patch - even if it's just a harmless > WARN_ON_ONCE(), and revert 5ceb9ce6fe94. Sounds like Dave hit normally > hit his problem much before two hours, and it must be even longer now. > > Ack on that plan? +1. 
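
To make the start_page objection concrete, here is a toy user-space sketch of
why the first pfn of a pageblock may have no valid struct page even when the
page handed to the function does. Everything in it is an assumption made for
illustration: the struct toy_memmap layout, the single hole, the 512-page
block size, and the toy_* helpers are stand-ins, not the kernel's
pfn_valid_within() or its real memory model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical block size; the real pageblock_nr_pages depends on arch/config. */
#define TOY_PAGEBLOCK_NR_PAGES 512UL

/* Hypothetical memory map: pfns in [hole_start, hole_end) have no struct page. */
struct toy_memmap {
    unsigned long hole_start;
    unsigned long hole_end;
};

/* Toy stand-in for pfn_valid_within(); NOT the kernel implementation. */
bool toy_pfn_valid(const struct toy_memmap *map, unsigned long pfn)
{
    return pfn < map->hole_start || pfn >= map->hole_end;
}

/*
 * Walk the pageblock containing pfn and count the pfns that are valid,
 * skipping holes the same way the loop in rescue_unmovable_pageblock()
 * skips pfns for which pfn_valid_within() fails.
 */
unsigned long toy_count_valid_in_block(const struct toy_memmap *map,
                                       unsigned long pfn)
{
    unsigned long start = pfn & ~(TOY_PAGEBLOCK_NR_PAGES - 1);
    unsigned long end = start + TOY_PAGEBLOCK_NR_PAGES; /* exclusive */
    unsigned long valid = 0;
    unsigned long p;

    for (p = start; p < end; p++) {
        if (!toy_pfn_valid(map, p))
            continue;   /* hole: there is no page to look at */
        valid++;
    }
    return valid;
}
```

With a hole covering pfns 512..519, pfn 700 is valid but the start of its
block (pfn 512) is not, so any code that dereferences the block's first page
unconditionally would be touching a page that does not exist in this toy
model.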
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755457Ab2FDBK1 (ORCPT ); Sun, 3 Jun 2012 21:10:27 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:61527 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755385Ab2FDBK0 (ORCPT ); Sun, 3 Jun 2012 21:10:26 -0400 X-AuditID: 9c930197-b7be2ae000000ebb-24-4fcc0afe8784 Message-ID: <4FCC0B09.1070708@kernel.org> Date: Mon, 04 Jun 2012 10:10:33 +0900 From: Minchan Kim User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1 MIME-Version: 1.0 Newsgroups: gmane.linux.kernel.mm,gmane.linux.kernel To: Hugh Dickins CC: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/02/2012 01:40 PM, Hugh Dickins wrote: > On Fri, 1 Jun 2012, Linus Torvalds wrote: >> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins wrote: >>> >>> + spin_lock_irqsave(&zone->lock, flags); >>> for (page = start_page, pfn = start_pfn; page < end_page; pfn++, >>> page++) { >> >> So holding the spinlock (and disabling irqs!) over the whole loop >> sounds horrible. > > There looks to be a pretty similar loop inside move_freepages_block(), > which is the part which I believe really needs the lock - it's moving > free pages from one lru to another. 
>
>>
>> At the same time, the iterators don't seem to require the spinlock, so
>> it should be possible to just move the lock into the loop, no?
>
> Move the lock after the loop, I think you meant.
>
> I put the lock before the loop because it's deciding whether it can
> usefully proceed, and then proceeding: I was thinking that the lock
> would stabilize the conditions that it bases that decision on.

We do it in two phases.
In the first phase we don't need the lock, because the check doesn't need to be exact.
In the second phase, where the pages are really moved, we need the lock, so we already hold it.

        ret = suitable_migration_target(page, cc);
        ..
        ..
        spin_lock_irqsave(&zone->lock, flags);
        ret = suitable_migration_target(page, cc);

So you shouldn't put the lock in the loop.

>
> But it certainly does not stabilize all of them (most obviously not
> PageLRU), so I'm guesssing that this is a best-effort decision which
> can safely go wrong some of the time.

Right.

>
> In which case, yes, much better to follow your suggestion, and hold
> the lock (with irqs disabled) for only half the time.
>
> Similarly untested patch below.
>
> But I'm entirely unfamiliar with this code: best Cc people more familiar
> with it. Does this addition of locking to rescue_unmovable_pageblock()
> look correct to you, and do you think it has a good chance of fixing the

No. I think we need to use start_page instead of page, and we need the last page
of the pageblock to check for crossing zones, not the first page of the next pageblock.
I should have reviewed more carefully.
:(

barrios@bbox:~/linux-2.6$ git diff
diff --git a/mm/compaction.c b/mm/compaction.c
index 4ac338a..b3fcc4b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
 
         pfn = page_to_pfn(page);
         start_pfn = pfn & ~(pageblock_nr_pages - 1);
-        end_pfn = start_pfn + pageblock_nr_pages;
+        end_pfn = start_pfn + pageblock_nr_pages - 1;
 
         start_page = pfn_to_page(start_pfn);
         end_page = pfn_to_page(end_pfn);
@@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
         if (page_zone(start_page) != page_zone(end_page))
                 return false;
 
-        for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
+        for (page = start_page, pfn = start_pfn; page <= end_page; pfn++,
                                                                 page++) {
                 if (!pfn_valid_within(pfn))
                         continue;
@@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page)
                 return false;
         }
 
-        set_pageblock_migratetype(page, MIGRATE_MOVABLE);
-        move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
+        set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
+        move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE);
         return true;
 }

Hugh, thanks for looking at this.

> move_freepages_block() list debug warnings which Dave has been reporting
> (in this and in another thread)?
>
> (Although there's still something of a mystery in where Dave's bisection
> appeared to converge, our best assumption at present is that one of my
> tmpfs changes is to blame for the __set_page_dirty_nobuffers warnings,
> and I need to send a finalized patch to fix that later.
>
> I'm guessing that the few people who see the warning are those running
> new systemd distros, and that systemd is indeed now making use of the
> fallocate support we added into tmpfs for it.)
> > Hugh > > --- 3.4.0+/mm/compaction.c 2012-05-30 08:17:19.396008280 -0700 > +++ linux/mm/compaction.c 2012-06-01 20:59:56.840204915 -0700 > @@ -369,6 +369,8 @@ static bool rescue_unmovable_pageblock(s > { > unsigned long pfn, start_pfn, end_pfn; > struct page *start_page, *end_page; > + struct zone *zone; > + unsigned long flags; > > pfn = page_to_pfn(page); > start_pfn = pfn & ~(pageblock_nr_pages - 1); > @@ -378,7 +380,8 @@ static bool rescue_unmovable_pageblock(s > end_page = pfn_to_page(end_pfn); > > /* Do not deal with pageblocks that overlap zones */ > - if (page_zone(start_page) != page_zone(end_page)) > + zone = page_zone(start_page); > + if (zone != page_zone(end_page)) > return false; > > for (page = start_page, pfn = start_pfn; page < end_page; pfn++, > @@ -399,8 +402,10 @@ static bool rescue_unmovable_pageblock(s > return false; > } > > + spin_lock_irqsave(&zone->lock, flags); > set_pageblock_migratetype(page, MIGRATE_MOVABLE); > - move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE); > + move_freepages_block(zone, page, MIGRATE_MOVABLE); > + spin_unlock_irqrestore(&zone->lock, flags); > return true; > } > -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755486Ab2FDBTL (ORCPT ); Sun, 3 Jun 2012 21:19:11 -0400 Received: from mail-pz0-f46.google.com ([209.85.210.46]:48586 "EHLO mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755407Ab2FDBTJ (ORCPT ); Sun, 3 Jun 2012 21:19:09 -0400 Date: Sun, 3 Jun 2012 18:18:39 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Linus Torvalds cc: Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Minchan Kim , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() 
In-Reply-To: Message-ID: References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, 3 Jun 2012, Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins wrote: > > > > But another strike against that commit: I tried fixing it up to use > > start_page instead of page at the end, with the worrying but safer > > locking I suggested at first, with a count of how many times it went > > there, and how many times it succeeded. > > You can't use start_page anyway, it might not be a valid page. There's > a reson it does that "pfn_valid_within()", methinks. You wouldn't want me to say that I think you're right, it would impudently suggest that I might conceive of you being wrong. I sigh for your heavy burden. > > Anyway, my current plan is to apply your "mm: fix warning in > __set_page_dirty_nobuffers" patch - even if it's just a harmless > WARN_ON_ONCE(), and revert 5ceb9ce6fe94. Sounds like Dave hit normally > hit his problem much before two hours, and it must be even longer now. > > Ack on that plan? Sure, ack from me on that plan. 
Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755492Ab2FDBVo (ORCPT ); Sun, 3 Jun 2012 21:21:44 -0400 Received: from LGEMRELSE1Q.lge.com ([156.147.1.111]:59683 "EHLO LGEMRELSE1Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755407Ab2FDBVm (ORCPT ); Sun, 3 Jun 2012 21:21:42 -0400 X-AuditID: 9c93016f-b7c3cae000001954-5e-4fcc0da3c499 Message-ID: <4FCC0DB4.30106@kernel.org> Date: Mon, 04 Jun 2012 10:21:56 +0900 From: Minchan Kim User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1 MIME-Version: 1.0 Newsgroups: gmane.linux.kernel.mm,gmane.linux.kernel To: Linus Torvalds CC: Hugh Dickins , Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/04/2012 08:13 AM, Linus Torvalds wrote: > On Sun, Jun 3, 2012 at 3:17 PM, Hugh Dickins wrote: >> >> But another strike against that commit: I tried fixing it up to use >> start_page instead of page at the end, with the worrying but safer >> locking I suggested at first, with a count of how many times it went >> there, and how many times it succeeded. > > You can't use start_page anyway, it might not be a valid page. There's > a reson it does that "pfn_valid_within()", methinks. Right. I missed that. 
I think we can use the page passed to rescue_unmovable_pageblock.
We make sure it's valid in isolate_freepages. So how about this?

barrios@bbox:~/linux-2.6$ git diff
diff --git a/mm/compaction.c b/mm/compaction.c
index 4ac338a..7459ab5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -368,11 +368,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 static bool rescue_unmovable_pageblock(struct page *page)
 {
        unsigned long pfn, start_pfn, end_pfn;
-       struct page *start_page, *end_page;
+       struct page *start_page, *end_page, *cursor_page;
 
        pfn = page_to_pfn(page);
        start_pfn = pfn & ~(pageblock_nr_pages - 1);
-       end_pfn = start_pfn + pageblock_nr_pages;
+       end_pfn = start_pfn + pageblock_nr_pages - 1;
 
        start_page = pfn_to_page(start_pfn);
        end_page = pfn_to_page(end_pfn);
@@ -381,19 +381,19 @@ static bool rescue_unmovable_pageblock(struct page *page)
        if (page_zone(start_page) != page_zone(end_page))
                return false;
 
-       for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
-                                                                 page++) {
+       for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page; pfn++,
+                                                                 cursor_page++) {
                if (!pfn_valid_within(pfn))
                        continue;
 
-               if (PageBuddy(page)) {
-                       int order = page_order(page);
+               if (PageBuddy(cursor_page)) {
+                       int order = page_order(cursor_page);
 
                        pfn += (1 << order) - 1;
-                       page += (1 << order) - 1;
+                       cursor_page += (1 << order) - 1;
 
                        continue;
-               } else if (page_count(page) == 0 || PageLRU(page))
+               } else if (page_count(cursor_page) == 0 || PageLRU(cursor_page))
                        continue;
 
                return false;

>
> Anyway, my current plan is to apply your "mm: fix warning in
> __set_page_dirty_nobuffers" patch - even if it's just a harmless
> WARN_ON_ONCE(), and revert 5ceb9ce6fe94. Sounds like Dave hit normally
> hit his problem much before two hours, and it must be even longer now.
>
> Ack on that plan?

No objection. The patch wasn't a bug fix and even test workload was very theoretical.
> > Linus > > -- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755518Ab2FDB1C (ORCPT ); Sun, 3 Jun 2012 21:27:02 -0400 Received: from mail-pz0-f46.google.com ([209.85.210.46]:44486 "EHLO mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755444Ab2FDB1A convert rfc822-to-8bit (ORCPT ); Sun, 3 Jun 2012 21:27:00 -0400 MIME-Version: 1.0 In-Reply-To: <4FCC0DB4.30106@kernel.org> References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> <4FCC0DB4.30106@kernel.org> From: KOSAKI Motohiro Date: Sun, 3 Jun 2012 21:26:39 -0400 Message-ID: Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() To: Minchan Kim Cc: Linus Torvalds , Hugh Dickins , Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > Right. I missed that. I think we can use the page passed to rescue_unmovable_pageblock. > We make sure it's valid in isolate_freepages. So how about this?
> > barrios@bbox:~/linux-2.6$ git diff > diff --git a/mm/compaction.c b/mm/compaction.c > index 4ac338a..7459ab5 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -368,11 +368,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, >  static bool rescue_unmovable_pageblock(struct page *page) >  { >        unsigned long pfn, start_pfn, end_pfn; > -       struct page *start_page, *end_page; > +       struct page *start_page, *end_page, *cursor_page; > >        pfn = page_to_pfn(page); >        start_pfn = pfn & ~(pageblock_nr_pages - 1); > -       end_pfn = start_pfn + pageblock_nr_pages; > +       end_pfn = start_pfn + pageblock_nr_pages - 1; > >        start_page = pfn_to_page(start_pfn); >        end_page = pfn_to_page(end_pfn); > @@ -381,19 +381,19 @@ static bool rescue_unmovable_pageblock(struct page *page) >        if (page_zone(start_page) != page_zone(end_page)) >                return false; > > -       for (page = start_page, pfn = start_pfn; page < end_page; pfn++, > -                                                                 page++) { > +       for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page; pfn++, > +                                                                 cursor_page++) { >                if (!pfn_valid_within(pfn)) >                        continue; I guess page_zone() should be used after pfn_valid_within(). Why can we assume invalid pfn return correct zone? 
> -               if (PageBuddy(page)) { > -                       int order = page_order(page); > +               if (PageBuddy(cursor_page)) { > +                       int order = page_order(cursor_page); > >                        pfn += (1 << order) - 1; > -                       page += (1 << order) - 1; > +                       cursor_page += (1 << order) - 1; > >                        continue; > -               } else if (page_count(page) == 0 || PageLRU(page)) > +               } else if (page_count(cursor_page) == 0 || PageLRU(cursor_page)) >                        continue; > >                return false; From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755539Ab2FDBlr (ORCPT ); Sun, 3 Jun 2012 21:41:47 -0400 Received: from mail-pz0-f46.google.com ([209.85.210.46]:48208 "EHLO mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755463Ab2FDBlq (ORCPT ); Sun, 3 Jun 2012 21:41:46 -0400 Date: Sun, 3 Jun 2012 18:41:21 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Minchan Kim cc: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-Reply-To: <4FCC0B09.1070708@kernel.org> Message-ID: References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <4FCC0B09.1070708@kernel.org> User-Agent: Alpine 2.00 (LSU 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 4 Jun 2012, Minchan Kim wrote: > On 06/02/2012 01:40 PM, Hugh Dickins 
wrote: > > > On Fri, 1 Jun 2012, Linus Torvalds wrote: > >> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins wrote: > >>> > >>> + spin_lock_irqsave(&zone->lock, flags); > >>> for (page = start_page, pfn = start_pfn; page < end_page; pfn++, > >>> page++) { > >> > >> So holding the spinlock (and disabling irqs!) over the whole loop > >> sounds horrible. > > > > There looks to be a pretty similar loop inside move_freepages_block(), > > which is the part which I believe really needs the lock - it's moving > > free pages from one lru to another. > > > >> > >> At the same time, the iterators don't seem to require the spinlock, so > >> it should be possible to just move the lock into the loop, no? > > > > Move the lock after the loop, I think you meant. > > > > I put the lock before the loop because it's deciding whether it can > > usefully proceed, and then proceeding: I was thinking that the lock > > would stabilize the conditions that it bases that decision on. > > > We do it with two phase. > In first phase, we don't need lock because we don't need to be exact. > In second phase where move pages really, we need a lock so we already hold it. No, see Linus's point elsewhere in this thread. To spell it out further, page_order(page) uses page_private(page), and you've no idea what someone might put into page_private(page) once it's no longer PageBuddy but perhaps allocated to a user. So the unlocked advancment by page_order(page) may even take you way out of this or any pageblock. Linus was suggesting to take and drop the lock around that little block each time. Maybe. I'm wary, I don't pretend to have thought it through (nor shall further). > > ret = suitable_migration_target(page, cc); > .. > .. > spin_lock_irqsave(&zone->lock, flags); > ret = suitable_migration_target(page, cc); > > So you shouldn't put the lock in loop. 
> > > > > But it certainly does not stabilize all of them (most obviously not > > PageLRU), so I'm guesssing that this is a best-effort decision which > > > can safely go wrong some of the time. > > Right. > > > > > In which case, yes, much better to follow your suggestion, and hold > > the lock (with irqs disabled) for only half the time. > > > > Similarly untested patch below. > > > > But I'm entirely unfamiliar with this code: best Cc people more familiar > > with it. Does this addition of locking to rescue_unmovable_pageblock() > > look correct to you, and do you think it has a good chance of fixing the > > > No.I think we need to use start_page instead of page and I thought so, but Linus points out why not (pfn_valid_within). > we need a last page of page block to check cross-over zones, > not first page in next page block. Yes, that's the off-by-one I was alluding to. > > I should have reviewed more carefully. :( > > barrios@bbox:~/linux-2.6$ git diff > diff --git a/mm/compaction.c b/mm/compaction.c > index 4ac338a..b3fcc4b 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page) > > pfn = page_to_pfn(page); > start_pfn = pfn & ~(pageblock_nr_pages - 1); > - end_pfn = start_pfn + pageblock_nr_pages; > + end_pfn = start_pfn + pageblock_nr_pages - 1; Yes. > > start_page = pfn_to_page(start_pfn); > end_page = pfn_to_page(end_pfn); > @@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page) > if (page_zone(start_page) != page_zone(end_page)) > return false; > > - for (page = start_page, pfn = start_pfn; page < end_page; pfn++, > + for (page = start_page, pfn = start_pfn; page <= end_page; pfn++, > page++) { Yes. 
> if (!pfn_valid_within(pfn)) > continue; > @@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page) > return false; > } > > - set_pageblock_migratetype(page, MIGRATE_MOVABLE); > - move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE); > + set_pageblock_migratetype(start_page, MIGRATE_MOVABLE); > + move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE); No. I guess we can assume the incoming page was valid (fair?), so should still use that, but something else for the loop iterator. And you seem to have missed out all the locking needed. > return true; > } So Nack to that on several grounds. And I'd like to hear evidence that this really is useful code, justifying the locking and interrupt-disabling which would have to be added. My 0 out of 25000 was not reassuring. Nor the original test results, when it was doing completely the wrong thing unnoticed. Hugh From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755550Ab2FDBra (ORCPT ); Sun, 3 Jun 2012 21:47:30 -0400 Received: from mail-qa0-f49.google.com ([209.85.216.49]:57952 "EHLO mail-qa0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755462Ab2FDBr3 (ORCPT ); Sun, 3 Jun 2012 21:47:29 -0400 Message-ID: <4FCC13AC.3070005@gmail.com> Date: Sun, 03 Jun 2012 21:47:24 -0400 From: KOSAKI Motohiro User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Hugh Dickins CC: Minchan Kim , Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kosaki.motohiro@gmail.com Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> 
<20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <4FCC0B09.1070708@kernel.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org >> - set_pageblock_migratetype(page, MIGRATE_MOVABLE); >> - move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE); >> + set_pageblock_migratetype(start_page, MIGRATE_MOVABLE); >> + move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE); > > No. I guess we can assume the incoming page was valid (fair?), > so should still use that, but something else for the loop iterator. Fair. passed page is always valid. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755607Ab2FDC2m (ORCPT ); Sun, 3 Jun 2012 22:28:42 -0400 Received: from LGEMRELSE7Q.lge.com ([156.147.1.151]:53067 "EHLO LGEMRELSE7Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755288Ab2FDC2k (ORCPT ); Sun, 3 Jun 2012 22:28:40 -0400 X-AuditID: 9c930197-b7be2ae000000ebb-04-4fcc1d562d2a Message-ID: <4FCC1D68.8060406@kernel.org> Date: Mon, 04 Jun 2012 11:28:56 +0900 From: Minchan Kim User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1 MIME-Version: 1.0 Newsgroups: gmane.linux.kernel.mm,gmane.linux.kernel To: Hugh Dickins CC: Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> 
<4FCC0B09.1070708@kernel.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/04/2012 10:41 AM, Hugh Dickins wrote: > On Mon, 4 Jun 2012, Minchan Kim wrote: >> On 06/02/2012 01:40 PM, Hugh Dickins wrote: >> >>> On Fri, 1 Jun 2012, Linus Torvalds wrote: >>>> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins wrote: >>>>> >>>>> + spin_lock_irqsave(&zone->lock, flags); >>>>> for (page = start_page, pfn = start_pfn; page < end_page; pfn++, >>>>> page++) { >>>> >>>> So holding the spinlock (and disabling irqs!) over the whole loop >>>> sounds horrible. >>> >>> There looks to be a pretty similar loop inside move_freepages_block(), >>> which is the part which I believe really needs the lock - it's moving >>> free pages from one lru to another. >>> >>>> >>>> At the same time, the iterators don't seem to require the spinlock, so >>>> it should be possible to just move the lock into the loop, no? >>> >>> Move the lock after the loop, I think you meant. >>> >>> I put the lock before the loop because it's deciding whether it can >>> usefully proceed, and then proceeding: I was thinking that the lock >>> would stabilize the conditions that it bases that decision on. >> >> >> We do it with two phase. >> In first phase, we don't need lock because we don't need to be exact. >> In second phase where move pages really, we need a lock so we already hold it. > > No, see Linus's point elsewhere in this thread. > > To spell it out further, page_order(page) uses page_private(page), > and you've no idea what someone might put into page_private(page) > once it's no longer PageBuddy but perhaps allocated to a user. > > So the unlocked advancment by page_order(page) may even take you > way out of this or any pageblock. > > Linus was suggesting to take and drop the lock around that little > block each time. Maybe. 
I'm wary, I don't pretend to have thought > it through (nor shall further). Right. I got confused because suitable_migration_target did rescure_unmovable_pageblock. I don't want it. I hope separating test which does just check whether it's migratable or not and working which really does rescue. So I think it would be better following as. if (!suitable_migration_target()) continue; spin_lock_irqsave(&zone->lock, flags); if (ret = suitable_migration_target()) { if (ret == CAN_MAKE_MOVABLE_PAGE_BLOCK) rescure_unmoable_pageblock() isolate_freepages_block(); } > >> >> ret = suitable_migration_target(page, cc); >> .. >> .. >> spin_lock_irqsave(&zone->lock, flags); >> ret = suitable_migration_target(page, cc); >> >> So you shouldn't put the lock in loop. >> >>> >>> But it certainly does not stabilize all of them (most obviously not >>> PageLRU), so I'm guesssing that this is a best-effort decision which >> >>> can safely go wrong some of the time. >> >> Right. >> >>> >>> In which case, yes, much better to follow your suggestion, and hold >>> the lock (with irqs disabled) for only half the time. >>> >>> Similarly untested patch below. >>> >>> But I'm entirely unfamiliar with this code: best Cc people more familiar >>> with it. Does this addition of locking to rescue_unmovable_pageblock() >>> look correct to you, and do you think it has a good chance of fixing the >> >> >> No.I think we need to use start_page instead of page and > > I thought so, but Linus points out why not (pfn_valid_within). > >> we need a last page of page block to check cross-over zones, >> not first page in next page block. > > Yes, that's the off-by-one I was alluding to. > >> >> I should have reviewed more carefully. 
:( >> >> barrios@bbox:~/linux-2.6$ git diff >> diff --git a/mm/compaction.c b/mm/compaction.c >> index 4ac338a..b3fcc4b 100644 >> --- a/mm/compaction.c >> +++ b/mm/compaction.c >> @@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page) >> >> pfn = page_to_pfn(page); >> start_pfn = pfn & ~(pageblock_nr_pages - 1); >> - end_pfn = start_pfn + pageblock_nr_pages; >> + end_pfn = start_pfn + pageblock_nr_pages - 1; > > Yes. > >> >> start_page = pfn_to_page(start_pfn); >> end_page = pfn_to_page(end_pfn); >> @@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page) >> if (page_zone(start_page) != page_zone(end_page)) >> return false; >> >> - for (page = start_page, pfn = start_pfn; page < end_page; pfn++, >> + for (page = start_page, pfn = start_pfn; page <= end_page; pfn++, >> page++) { > > Yes. > >> if (!pfn_valid_within(pfn)) >> continue; >> @@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page) >> return false; >> } >> >> - set_pageblock_migratetype(page, MIGRATE_MOVABLE); >> - move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE); >> + set_pageblock_migratetype(start_page, MIGRATE_MOVABLE); >> + move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE); > > No. I guess we can assume the incoming page was valid (fair?), > so should still use that, but something else for the loop iterator. It should be fair. I did it in following mail. > > And you seem to have missed out all the locking needed. > >> return true; >> } > > So Nack to that on several grounds. > > And I'd like to hear evidence that this really is useful code, > justifying the locking and interrupt-disabling which would have to > be added. My 0 out of 25000 was not reassuring. Nor the original > test results, when it was doing completely the wrong thing unnoticed. In changelog, Bartlomiej said. 

My particular test case (on a ARM EXYNOS4 device with 512 MiB, which means
131072 standard 4KiB pages in 'Normal' zone) is to:

- allocate 120000 pages for kernel's usage
- free every second page (60000 pages) of memory just allocated
- allocate and use 60000 pages from user space
- free remaining 60000 pages of kernel memory
  (now we have fragmented memory occupied mostly by user space pages)
- try to allocate 100 order-9 (2048 KiB) pages for kernel's usage

The results:
- with compaction disabled I get 11 successful allocations
- with compaction enabled - 14 successful allocations
- with this patch I'm able to get all 100 successful allocations

I think above workload is really really artificial and theoretical so I didn't like
this patch but Mel seem to like it. :(

Quote from Mel
" Ok, that is indeed an adverse workload that the current system will not
properly deal with. I think you are right to try fixing this but may need
a different approach that takes the cost out of the allocation/free path
and moves it the compaction path."

We can correct this patch to work but at least need justification about it.
Do we really need this patch for such artificial workload?
what do you think?
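
The figures in the quoted test case can be checked with a little arithmetic. The following is only a userspace sketch, not kernel code: the helper names (pages_in_mib, pages_per_order, kib_per_order) are made up for illustration, and the 4 KiB page size is taken from the changelog text.

```c
/* Assumption: 4 KiB base pages, as stated in the quoted changelog. */
#define PAGE_SIZE_BYTES 4096UL

/* Number of base pages covering the given amount of memory in MiB. */
static unsigned long pages_in_mib(unsigned long mib)
{
    return mib * 1024UL * 1024UL / PAGE_SIZE_BYTES;
}

/* Number of base pages in one allocation of the given order. */
static unsigned long pages_per_order(unsigned int order)
{
    return 1UL << order;
}

/* Size in KiB of one allocation of the given order. */
static unsigned long kib_per_order(unsigned int order)
{
    return pages_per_order(order) * PAGE_SIZE_BYTES / 1024UL;
}
```

With these helpers, pages_in_mib(512) gives the 131072 pages mentioned above and kib_per_order(9) gives 2048 KiB. The 100 order-9 requests add up to 51200 pages, that is 100 of the 256 aligned 512-page blocks that 512 MiB can hold.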
-- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755622Ab2FDCaY (ORCPT ); Sun, 3 Jun 2012 22:30:24 -0400 Received: from LGEMRELSE1Q.lge.com ([156.147.1.111]:60876 "EHLO LGEMRELSE1Q.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755323Ab2FDCaX (ORCPT ); Sun, 3 Jun 2012 22:30:23 -0400 X-AuditID: 9c93016f-b7c3cae000001954-63-4fcc1dbd97a1 Message-ID: <4FCC1DD0.8090003@kernel.org> Date: Mon, 04 Jun 2012 11:30:40 +0900 From: Minchan Kim User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1 MIME-Version: 1.0 Newsgroups: gmane.linux.kernel.mm,gmane.linux.kernel To: KOSAKI Motohiro CC: Linus Torvalds , Hugh Dickins , Dave Jones , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <20120603181548.GA306@redhat.com> <20120603183139.GA1061@redhat.com> <20120603205332.GA5412@redhat.com> <4FCC0DB4.30106@kernel.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Brightmail-Tracker: AAAAAA== Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 06/04/2012 10:26 AM, KOSAKI Motohiro wrote: >> Right. I missed that. I think we can use the page passed to rescue_unmovable_pageblock. >> We make sure it's valid in isolate_freepages. So how about this? 
>> >> barrios@bbox:~/linux-2.6$ git diff >> diff --git a/mm/compaction.c b/mm/compaction.c >> index 4ac338a..7459ab5 100644 >> --- a/mm/compaction.c >> +++ b/mm/compaction.c >> @@ -368,11 +368,11 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc, >> static bool rescue_unmovable_pageblock(struct page *page) >> { >> unsigned long pfn, start_pfn, end_pfn; >> - struct page *start_page, *end_page; >> + struct page *start_page, *end_page, *cursor_page; >> >> pfn = page_to_pfn(page); >> start_pfn = pfn & ~(pageblock_nr_pages - 1); >> - end_pfn = start_pfn + pageblock_nr_pages; >> + end_pfn = start_pfn + pageblock_nr_pages - 1; >> >> start_page = pfn_to_page(start_pfn); >> end_page = pfn_to_page(end_pfn); >> @@ -381,19 +381,19 @@ static bool rescue_unmovable_pageblock(struct page *page) >> if (page_zone(start_page) != page_zone(end_page)) >> return false; >> >> - for (page = start_page, pfn = start_pfn; page < end_page; pfn++, >> - page++) { >> + for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page; pfn++, >> + cursor_page++) { >> if (!pfn_valid_within(pfn)) >> continue; > > I guess page_zone() should be used after pfn_valid_within(). Why can > we assume invalid > pfn return correct zone? Right you are. We can't make sure it in case of CONFIG_HOLES_IN_ZONE. 
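
The two points agreed on here (an inclusive last pfn for the pageblock, and never asking an invalid pfn for its zone) can be sketched in plain userspace C. This is an illustration only, not a proposed patch: struct fake_page, PAGEBLOCK_NR_PAGES and the helper names are hypothetical stand-ins for struct page, pageblock_nr_pages, pfn_valid_within() and page_zone(), and the block size of 512 pages is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption for illustration: 512 pages per pageblock. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Hypothetical stand-in for a memory map entry; not the kernel's struct page. */
struct fake_page {
    bool valid;   /* models pfn_valid_within() for this pfn */
    int zone_id;  /* models page_zone(); meaningless when !valid */
};

/* First pfn of the pageblock that contains pfn. */
static unsigned long pageblock_start_pfn(unsigned long pfn)
{
    return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/* Last pfn that still belongs to the same pageblock (inclusive bound). */
static unsigned long pageblock_last_pfn(unsigned long pfn)
{
    return pageblock_start_pfn(pfn) + PAGEBLOCK_NR_PAGES - 1;
}

/*
 * Returns true when every valid page of the block containing pfn is in the
 * same zone as the page at pfn. The caller is expected to pass a pfn that is
 * known to be valid; holes inside the block are skipped and their zone field
 * is never read.
 */
static bool block_within_one_zone(const struct fake_page *map, size_t map_len,
                                  unsigned long pfn)
{
    unsigned long start = pageblock_start_pfn(pfn);
    unsigned long last = pageblock_last_pfn(pfn);
    unsigned long cursor;
    int zone;

    if (last >= map_len || !map[pfn].valid)
        return false;
    zone = map[pfn].zone_id;    /* read only from a page known to be valid */

    for (cursor = start; cursor <= last; cursor++) {
        if (!map[cursor].valid)
            continue;           /* hole: do not consult its zone */
        if (map[cursor].zone_id != zone)
            return false;
    }
    return true;
}
```

The zone is taken from the page that was passed in rather than from the first or last page of the block, since those may be holes. A separate cursor keeps the incoming pfn untouched for whatever the caller does after the loop.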
-- Kind regards, Minchan Kim From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754879Ab2FDEVk (ORCPT ); Mon, 4 Jun 2012 00:21:40 -0400 Received: from mail-qa0-f46.google.com ([209.85.216.46]:46652 "EHLO mail-qa0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751735Ab2FDEVj (ORCPT ); Mon, 4 Jun 2012 00:21:39 -0400 Message-ID: <4FCC37CE.3080203@gmail.com> Date: Mon, 04 Jun 2012 00:21:34 -0400 From: KOSAKI Motohiro User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:12.0) Gecko/20120428 Thunderbird/12.0.1 MIME-Version: 1.0 To: Minchan Kim CC: Hugh Dickins , Linus Torvalds , Bartlomiej Zolnierkiewicz , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kosaki.motohiro@gmail.com Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() References: <20120530163317.GA13189@redhat.com> <20120531005739.GA4532@redhat.com> <20120601023107.GA19445@redhat.com> <20120601161205.GA1918@redhat.com> <20120601171606.GA3794@redhat.com> <4FCC0B09.1070708@kernel.org> <4FCC1D68.8060406@kernel.org> In-Reply-To: <4FCC1D68.8060406@kernel.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > In changelog, Bartlomiej said. 
>
> My particular test case (on a ARM EXYNOS4 device with 512 MiB, which means
> 131072 standard 4KiB pages in 'Normal' zone) is to:
>
> - allocate 120000 pages for kernel's usage
> - free every second page (60000 pages) of memory just allocated
> - allocate and use 60000 pages from user space
> - free remaining 60000 pages of kernel memory
>   (now we have fragmented memory occupied mostly by user space pages)
> - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
>
> The results:
> - with compaction disabled I get 11 successful allocations
> - with compaction enabled - 14 successful allocations
> - with this patch I'm able to get all 100 successful allocations
>
> I think above workload is really really artificial and theoretical so I didn't like
> this patch but Mel seem to like it. :(
>
> Quote from Mel
> " Ok, that is indeed an adverse workload that the current system will not
> properly deal with. I think you are right to try fixing this but may need
> a different approach that takes the cost out of the allocation/free path
> and moves it the compaction path."
>
> We can correct this patch to work but at least need justification about it.
> Do we really need this patch for such artificial workload?
> what do you think?

I'm ok to resubmit. But please change the thread.

From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932317Ab2FDNiy (ORCPT ); Mon, 4 Jun 2012 09:38:54 -0400 Received: from mailout3.w1.samsung.com ([210.118.77.13]:49295 "EHLO mailout3.w1.samsung.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755468Ab2FDNis (ORCPT ); Mon, 4 Jun 2012 09:38:48 -0400 Date: Mon, 04 Jun 2012 15:37:30 +0200 From: Bartlomiej Zolnierkiewicz Subject: Re: WARNING: at mm/page-writeback.c:1990 __set_page_dirty_nobuffers+0x13a/0x170() In-reply-to: <4FCC1D68.8060406@kernel.org> To: Minchan Kim Cc: Hugh Dickins , Linus Torvalds , Kyungmin Park , Marek Szyprowski , Mel Gorman , Rik van Riel , Dave Jones , Andrew Morton , Cong Wang , Markus Trippelsdorf , linux-kernel@vger.kernel.org, linux-mm@kvack.org Message-id: <201206041537.30787.b.zolnierkie@samsung.com> MIME-version: 1.0 Content-type: Text/Plain; charset=iso-8859-1 Content-transfer-encoding: 7BIT User-Agent: KMail/1.13.2 (Linux/3.2.6; KDE/4.4.5; i686; ; ) References: <20120530163317.GA13189@redhat.com> <4FCC1D68.8060406@kernel.org> X-TM-AS-MML: No Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On Monday 04 June 2012 04:28:56 Minchan Kim wrote: > On 06/04/2012 10:41 AM, Hugh Dickins wrote: > > > On Mon, 4 Jun 2012, Minchan Kim wrote: > >> On 06/02/2012 01:40 PM, Hugh Dickins wrote: > >> > >>> On Fri, 1 Jun 2012, Linus Torvalds wrote: > >>>> On Fri, Jun 1, 2012 at 3:17 PM, Hugh Dickins wrote: > >>>>> > >>>>> + spin_lock_irqsave(&zone->lock, flags); > >>>>> for (page = start_page, pfn = start_pfn; page < end_page; pfn++, > >>>>> page++) { > >>>> > >>>> So holding the spinlock (and disabling irqs!) over the whole loop > >>>> sounds horrible. > >>> > >>> There looks to be a pretty similar loop inside move_freepages_block(), > >>> which is the part which I believe really needs the lock - it's moving > >>> free pages from one lru to another. 
> >>> > >>>> > >>>> At the same time, the iterators don't seem to require the spinlock, so > >>>> it should be possible to just move the lock into the loop, no? > >>> > >>> Move the lock after the loop, I think you meant. > >>> > >>> I put the lock before the loop because it's deciding whether it can > >>> usefully proceed, and then proceeding: I was thinking that the lock > >>> would stabilize the conditions that it bases that decision on. > >> > >> > >> We do it with two phase. > >> In first phase, we don't need lock because we don't need to be exact. > >> In second phase where move pages really, we need a lock so we already hold it. > > > > No, see Linus's point elsewhere in this thread. > > > > To spell it out further, page_order(page) uses page_private(page), > > and you've no idea what someone might put into page_private(page) > > once it's no longer PageBuddy but perhaps allocated to a user. > > > > So the unlocked advancment by page_order(page) may even take you > > way out of this or any pageblock. > > > > Linus was suggesting to take and drop the lock around that little > > block each time. Maybe. I'm wary, I don't pretend to have thought > > it through (nor shall further). > > > Right. > I got confused because suitable_migration_target did rescure_unmovable_pageblock. I don't want it. > I hope separating test which does just check whether it's migratable or not and working > which really does rescue. > So I think it would be better following as. > > if (!suitable_migration_target()) > continue; > > spin_lock_irqsave(&zone->lock, flags); > if (ret = suitable_migration_target()) { > if (ret == CAN_MAKE_MOVABLE_PAGE_BLOCK) > rescure_unmoable_pageblock() > isolate_freepages_block(); > } > > > > >> > >> ret = suitable_migration_target(page, cc); > >> .. > >> .. > >> spin_lock_irqsave(&zone->lock, flags); > >> ret = suitable_migration_target(page, cc); > >> > >> So you shouldn't put the lock in loop. 
> >>
> >>>
> >>> But it certainly does not stabilize all of them (most obviously not
> >>> PageLRU), so I'm guessing that this is a best-effort decision which
> >>
> >>> can safely go wrong some of the time.
> >>
> >> Right.
> >>
> >>>
> >>> In which case, yes, much better to follow your suggestion, and hold
> >>> the lock (with irqs disabled) for only half the time.
> >>>
> >>> Similarly untested patch below.
> >>>
> >>> But I'm entirely unfamiliar with this code: best Cc people more familiar
> >>> with it.  Does this addition of locking to rescue_unmovable_pageblock()
> >>> look correct to you, and do you think it has a good chance of fixing the
> >>
> >>
> >> No. I think we need to use start_page instead of page and
> >
> > I thought so, but Linus points out why not (pfn_valid_within).
> >
> >> we need a last page of page block to check cross-over zones,
> >> not first page in next page block.
> >
> > Yes, that's the off-by-one I was alluding to.
> >
> >>
> >> I should have reviewed more carefully. :(
> >>
> >> barrios@bbox:~/linux-2.6$ git diff
> >> diff --git a/mm/compaction.c b/mm/compaction.c
> >> index 4ac338a..b3fcc4b 100644
> >> --- a/mm/compaction.c
> >> +++ b/mm/compaction.c
> >> @@ -372,7 +372,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
> >>
> >>         pfn = page_to_pfn(page);
> >>         start_pfn = pfn & ~(pageblock_nr_pages - 1);
> >> -       end_pfn = start_pfn + pageblock_nr_pages;
> >> +       end_pfn = start_pfn + pageblock_nr_pages - 1;
> >
> > Yes.
> >
> >>
> >>         start_page = pfn_to_page(start_pfn);
> >>         end_page = pfn_to_page(end_pfn);
> >> @@ -381,7 +381,7 @@ static bool rescue_unmovable_pageblock(struct page *page)
> >>         if (page_zone(start_page) != page_zone(end_page))
> >>                 return false;
> >>
> >> -       for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
> >> +       for (page = start_page, pfn = start_pfn; page <= end_page; pfn++,
> >>                                                                   page++) {
> >
> > Yes.
> >
> >>                 if (!pfn_valid_within(pfn))
> >>                         continue;
> >> @@ -399,8 +399,8 @@ static bool rescue_unmovable_pageblock(struct page *page)
> >>                 return false;
> >>         }
> >>
> >> -       set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> >> -       move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
> >> +       set_pageblock_migratetype(start_page, MIGRATE_MOVABLE);
> >> +       move_freepages_block(page_zone(start_page), start_page, MIGRATE_MOVABLE);
> >
> > No.  I guess we can assume the incoming page was valid (fair?),
> > so should still use that, but something else for the loop iterator.
> >
> It should be fair. I did it in following mail.
>
> >
> > And you seem to have missed out all the locking needed.
> >
> >>         return true;
> >>  }
> >
> > So Nack to that on several grounds.
> >
> > And I'd like to hear evidence that this really is useful code,
> > justifying the locking and interrupt-disabling which would have to
> > be added.  My 0 out of 25000 was not reassuring.  Nor the original
> > test results, when it was doing completely the wrong thing unnoticed.
> >
> In changelog, Bartlomiej said.
>
> My particular test case (on a ARM EXYNOS4 device with 512 MiB, which means
> 131072 standard 4KiB pages in 'Normal' zone) is to:
>
> - allocate 120000 pages for kernel's usage
> - free every second page (60000 pages) of memory just allocated
> - allocate and use 60000 pages from user space
> - free remaining 60000 pages of kernel memory
>   (now we have fragmented memory occupied mostly by user space pages)
> - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
>
> The results:
> - with compaction disabled I get 11 successful allocations
> - with compaction enabled - 14 successful allocations
> - with this patch I'm able to get all 100 successful allocations
>
> I think above workload is really really artificial and theoretical so I didn't like
> this patch but Mel seems to like it. :(
>
> Quote from Mel
> " Ok, that is indeed an adverse workload that the current system will not
> properly deal with. I think you are right to try fixing this but may need
> a different approach that takes the cost out of the allocation/free path
> and moves it the compaction path."

Please note that the current patch is less intrusive than the original
version that Mel was talking about in the above quote (the cost is only
in the compaction path, which is a non-default one, and in an allocation
slow path).

> We can correct this patch to work but at least need justification about it.
> Do we really need this patch for such artificial workload?
> what do you think?

I would still like to get this patch included since it helps with my
test case and does not add much code or complexity.

So far I have fixed (all?) outstanding issues in the patch attached below
and will post the next combined version (v9) of the patch in a new
thread.

Best regards,
-- 
Bartlomiej Zolnierkiewicz
Samsung Poland R&D Center


- use right page for pageblock conversion in rescue_unmovable_pageblock()

- split rescue_unmovable_pageblock() into can_rescue_unmovable_pageblock()
  and __rescue_unmovable_pageblock()

- add missing locking

---
 mm/compaction.c |   66 ++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 45 insertions(+), 21 deletions(-)

Index: b/mm/compaction.c
===================================================================
--- a/mm/compaction.c	2012-06-04 15:19:04.564467996 +0200
+++ b/mm/compaction.c	2012-06-04 15:19:15.700467901 +0200
@@ -362,50 +362,70 @@ isolate_migratepages_range(struct zone *
 #endif /* CONFIG_COMPACTION || CONFIG_CMA */
 #ifdef CONFIG_COMPACTION
 /*
- * Returns true if MIGRATE_UNMOVABLE pageblock was successfully
+ * Returns true if MIGRATE_UNMOVABLE pageblock can be successfully
  * converted to MIGRATE_MOVABLE type, false otherwise.
  */
-static bool rescue_unmovable_pageblock(struct page *page)
+static bool can_rescue_unmovable_pageblock(struct page *page, bool locked)
 {
 	unsigned long pfn, start_pfn, end_pfn;
-	struct page *start_page, *end_page;
+	struct page *start_page, *end_page, *cursor_page;
 
 	pfn = page_to_pfn(page);
 	start_pfn = pfn & ~(pageblock_nr_pages - 1);
-	end_pfn = start_pfn + pageblock_nr_pages;
+	end_pfn = start_pfn + pageblock_nr_pages - 1;
 
 	start_page = pfn_to_page(start_pfn);
 	end_page = pfn_to_page(end_pfn);
 
-	/* Do not deal with pageblocks that overlap zones */
-	if (page_zone(start_page) != page_zone(end_page))
-		return false;
+	for (cursor_page = start_page, pfn = start_pfn; cursor_page <= end_page;
+		pfn++, cursor_page++) {
+		struct zone *zone = page_zone(start_page);
+		unsigned long flags;
 
-	for (page = start_page, pfn = start_pfn; page < end_page; pfn++,
-								  page++) {
 		if (!pfn_valid_within(pfn))
 			continue;
 
-		if (PageBuddy(page)) {
-			int order = page_order(page);
+		/* Do not deal with pageblocks that overlap zones */
+		if (page_zone(cursor_page) != zone)
+			return false;
+
+		if (!locked)
+			spin_lock_irqsave(&zone->lock, flags);
+
+		if (PageBuddy(cursor_page)) {
+			int order = page_order(cursor_page);
 
 			pfn += (1 << order) - 1;
-			page += (1 << order) - 1;
+			cursor_page += (1 << order) - 1;
+			if (!locked)
+				spin_unlock_irqrestore(&zone->lock, flags);
 
 			continue;
-		} else if (page_count(page) == 0 || PageLRU(page))
+		} else if (page_count(cursor_page) == 0 ||
+			   PageLRU(cursor_page)) {
+			if (!locked)
+				spin_unlock_irqrestore(&zone->lock, flags);
 			continue;
+		}
+
+		if (!locked)
+			spin_unlock_irqrestore(&zone->lock, flags);
 
 		return false;
 	}
 
+	return true;
+}
+
+void __rescue_unmovable_pageblock(struct page *page)
+{
 	set_pageblock_migratetype(page, MIGRATE_MOVABLE);
 	move_freepages_block(page_zone(page), page, MIGRATE_MOVABLE);
-	return true;
 }
 
 enum smt_result {
 	GOOD_AS_MIGRATION_TARGET,
+	GOOD_CAN_RESCUE_UNMOVABLE_TARGET,
 	FAIL_UNMOVABLE_TARGET,
 	FAIL_BAD_TARGET,
 };
@@ -416,7 +436,7 @@ enum smt_result {
  * is within a MIGRATE_UNMOVABLE block, FAIL_BAD_TARGET otherwise.
  */
 static enum smt_result suitable_migration_target(struct page *page,
-				      struct compact_control *cc)
+				      struct compact_control *cc, bool locked)
 {
 	int migratetype = get_pageblock_migratetype(page);
 
@@ -440,8 +460,8 @@ static enum smt_result suitable_migratio
 
 	if (cc->mode != COMPACT_ASYNC_MOVABLE &&
 	    migratetype == MIGRATE_UNMOVABLE &&
-	    rescue_unmovable_pageblock(page))
-		return GOOD_AS_MIGRATION_TARGET;
+	    can_rescue_unmovable_pageblock(page, locked))
+		return GOOD_CAN_RESCUE_UNMOVABLE_TARGET;
 
 	/* Otherwise skip the block */
 	return FAIL_BAD_TARGET;
@@ -509,8 +529,9 @@ static void isolate_freepages(struct zon
 			continue;
 
 		/* Check the block is suitable for migration */
-		ret = suitable_migration_target(page, cc);
-		if (ret != GOOD_AS_MIGRATION_TARGET) {
+		ret = suitable_migration_target(page, cc, false);
+		if (ret != GOOD_AS_MIGRATION_TARGET &&
+		    ret != GOOD_CAN_RESCUE_UNMOVABLE_TARGET) {
 			if (ret == FAIL_UNMOVABLE_TARGET)
 				cc->nr_pageblocks_skipped++;
 			continue;
@@ -523,8 +544,11 @@ static void isolate_freepages(struct zon
 		 */
 		isolated = 0;
 		spin_lock_irqsave(&zone->lock, flags);
-		ret = suitable_migration_target(page, cc);
-		if (ret == GOOD_AS_MIGRATION_TARGET) {
+		ret = suitable_migration_target(page, cc, true);
+		if (ret == GOOD_AS_MIGRATION_TARGET ||
+		    ret == GOOD_CAN_RESCUE_UNMOVABLE_TARGET) {
+			if (ret == GOOD_CAN_RESCUE_UNMOVABLE_TARGET)
+				__rescue_unmovable_pageblock(page);
 			end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
 			isolated = isolate_freepages_block(pfn, end_pfn,
 							   freelist, false);
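
For readers following the off-by-one discussed in this thread (checking the
last page of the pageblock rather than the first page of the next one), here
is a minimal user-space sketch of the pfn arithmetic. It is not kernel code:
BLOCK_NR_PAGES stands in for pageblock_nr_pages, and toy_zone_of() and the
other helper names are hypothetical, with zones modeled as a single pfn
boundary.

```c
#include <stdbool.h>

/* Hypothetical stand-in for pageblock_nr_pages; must be a power of two. */
#define BLOCK_NR_PAGES 512UL

/* First pfn of the pageblock that contains pfn. */
static unsigned long block_start_pfn(unsigned long pfn)
{
	return pfn & ~(BLOCK_NR_PAGES - 1);
}

/* Last pfn that still belongs to the same pageblock (inclusive bound). */
static unsigned long block_last_pfn(unsigned long pfn)
{
	return block_start_pfn(pfn) + BLOCK_NR_PAGES - 1;
}

/* Toy zone model (assumption): pfns below boundary are zone 0, others zone 1. */
static int toy_zone_of(unsigned long pfn, unsigned long boundary)
{
	return pfn < boundary ? 0 : 1;
}

/*
 * Compare the zone of the first page with the zone of the "end" page.
 * With inclusive_end == false the end is the first pfn of the NEXT block,
 * which mirrors the old "start_pfn + pageblock_nr_pages" computation.
 */
static bool block_in_one_zone(unsigned long pfn, unsigned long boundary,
			      bool inclusive_end)
{
	unsigned long start = block_start_pfn(pfn);
	unsigned long end = inclusive_end ? block_last_pfn(pfn)
					  : start + BLOCK_NR_PAGES;

	return toy_zone_of(start, boundary) == toy_zone_of(end, boundary);
}
```

With a zone boundary at pfn 1024, the block covering pfns 512..1023 lies
entirely in zone 0. The inclusive bound accepts it, while the exclusive bound
looks at pfn 1024 in the next zone and rejects it.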