From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Wed, 27 Feb 2008 13:23:00 +0000 (GMT) From: Hugh Dickins Subject: Re: [PATCH 05/15] memcg: fix VM_BUG_ON from page migration In-Reply-To: <20080227055211.GB2317@balbir.in.ibm.com> Message-ID: References: <20080227055211.GB2317@balbir.in.ibm.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-linux-mm@kvack.org Return-Path: To: Balbir Singh Cc: Andrew Morton , KAMEZAWA Hiroyuki , Hirokazu Takahashi , YAMAMOTO Takashi , linux-mm@kvack.org List-ID: On Wed, 27 Feb 2008, Balbir Singh wrote: > * Hugh Dickins [2008-02-25 23:39:23]: > > > Page migration gave me free_hot_cold_page's VM_BUG_ON page->page_cgroup. > > remove_migration_pte was calling mem_cgroup_charge on the new page whenever > > it found a swap pte, before it had determined it to be a migration entry. > > That left a surplus reference count on the page_cgroup, so it was still > > attached when the page was later freed. > > > > Move that mem_cgroup_charge down to where we're sure it's a migration entry. > > We were already under i_mmap_lock or anon_vma->lock, so its GFP_KERNEL was > > already inappropriate: change that to GFP_ATOMIC. > > > > One side effect I see of this patch is that the page_cgroup lock and > the lru_lock can now be taken from within i_mmap_lock or > anon_vma->lock. That's not a side-effect of this patch, but it is something which was already being done there, and you're absolutely right to draw attention to it, thank you. Although I mentioned they were held in the comment, it hadn't really dawned on me how unwelcome that is: it's not a violation, and lockdep doesn't protest, but we'd all be happier not to interweave those otherwise independent locks in that one place. Oh, hold on, no, it's not that one place. It's a well-established nesting of locks, as when mem_cgroup_uncharge_page is called by page_remove_rmap from try_to_unmap_one. Panic over! But we'd better add memcontrol's locks to the hierarchies shown in filemap.c and in rmap.c. > > - if (mem_cgroup_charge(new, mm, GFP_KERNEL)) { > > - pte_unmap(ptep); > > - return; > > - } > > - > > ptl = pte_lockptr(mm, pmd); > > spin_lock(ptl); > > pte = *ptep; > > @@ -169,6 +164,20 @@ static void remove_migration_pte(struct > > if (!is_migration_entry(entry) || migration_entry_to_page(entry) != old) > > goto out; > > Is it not easier to uncharge here then to move to the charging to the > context below? Do you suspect this will be a common operation (so we > might end up charging/uncharing more frequently?) In what way would it be easier to charge too early, then uncharge when we find it was wrong, than simply to charge once we know it's right, as the patch does? If we were not already in atomic context, it would definitely be better to do it the way you suggest, with GFP_KERNEL not GFP_ATOMIC; but we're already in atomic context, so I cannot see any advantage to doing it your way. What would be a real improvement is a way of doing it outside the atomic context: I've given that little thought, but it's not obvious how. And really the wider issue of force_empty inconsistencies is more important than this singular wart in page migration. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org