From mboxrd@z Thu Jan 1 00:00:00 1970 From: Larry Woodman Subject: Re: [PATCH] fix pgd_lock deadlock Date: Tue, 15 Feb 2011 14:31:30 -0500 Message-ID: <4D5AD492.5030500@redhat.com> References: <4CB76E8B.2090309@goop.org> <4CC0AB73.8060609@goop.org> <20110203024838.GI5843@random.random> <4D4B1392.5090603@goop.org> <20110204012109.GP5843@random.random> <4D4C6F45.6010204@goop.org> <20110207232045.GJ3347@random.random> <20110215190710.GL5935@random.random> Reply-To: lwoodman@redhat.com Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0741702087==" Return-path: In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: Thomas Gleixner Cc: Andrea Arcangeli , Jeremy Fitzhardinge , "Xen-devel@lists.xensource.com" , Ian Campbell , the arch/x86 maintainers , Hugh Dickins , Linux Kernel Mailing List , Jan Beulich , Andi Kleen , "H. Peter Anvin" , Andrew Morton , Johannes Weiner List-Id: xen-devel@lists.xenproject.org This is a multi-part message in MIME format. --===============0741702087== Content-Type: multipart/alternative; boundary="------------040305040806010203010400" This is a multi-part message in MIME format. --------------040305040806010203010400 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit On 02/15/2011 02:26 PM, Thomas Gleixner wrote: > On Tue, 15 Feb 2011, Andrea Arcangeli wrote: > >> Hello, >> >> Without this patch we can deadlock in the page_table_lock with NR_CPUS >> < 4 or THP on, with this patch we hopefully won't deadlock in the >> pgd_lock (if taken from irq). I can't see anything taking it from irq >> (maybe aio? to check I also tried the libaio testuite with no apparent >> VM_BUG_ON triggering), so unless somebody sees it, I think we should >> apply it. I've been running for a while with this patch applied >> without apparent problems. Other archs may follow suit if it's proven >> that there's nothing taking the pgd_lock from irq. >> >> === >> Subject: fix pgd_lock deadlock >> >> From: Andrea Arcangeli >> >> It's forbidden to take the page_table_lock with the irq disabled or if there's >> contention the IPIs (for tlb flushes) sent with the page_table_lock held will >> never run leading to a deadlock. > I really read this thing 5 times and still cannot make any sense of it. > > You talk about page_table_lock and then fiddle with pgd_lock. > > -ENOSENSE > > tglx > > I put this expanation in the redhat BZ, says it all: Larry Woodman 2011-01-21 15:54:58 EST The problem is with THP. The page reclaim code calls page_referenced_one() which takes the mm->page_table_lock on one CPU before sending an IPI to other CPU(s): On CPU1 we take the mm->page_table_lock, send IPIs and wait for a response: page_referenced_one(...) if (unlikely(PageTransHuge(page))) { pmd_t *pmd; spin_lock(&mm->page_table_lock); pmd = page_check_address_pmd(page, mm, address, PAGE_CHECK_ADDRESS_PMD_FLAG); if (pmd&& !pmd_trans_splitting(*pmd)&& pmdp_clear_flush_young_notify(vma, address, pmd)) referenced++; spin_unlock(&mm->page_table_lock); } else { CPU2 can race in vmalloc_sync_all() because it disables interrupt(preventing a response to the IPI from CPU1) and takes the pgd_lock then spins in the mm->page_table_lock which is already held on CPU1. spin_lock_irqsave(&pgd_lock, flags); list_for_each_entry(page,&pgd_list, lru) { pgd_t *pgd; spinlock_t *pgt_lock; pgd = (pgd_t *)page_address(page) + pgd_index(address); pgt_lock =&pgd_page_get_mm(page)->page_table_lock; spin_lock(pgt_lock); At this point the system is deadlocked. The pmdp_clear_flush_young_notify needs to do its PDG business with the page_table_lock held then release that lock before sending the IPIs to the other CPUs. Larry --------------040305040806010203010400 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit On 02/15/2011 02:26 PM, Thomas Gleixner wrote:
On Tue, 15 Feb 2011, Andrea Arcangeli wrote:

Hello,

Without this patch we can deadlock in the page_table_lock with NR_CPUS
< 4 or THP on, with this patch we hopefully won't deadlock in the
pgd_lock (if taken from irq). I can't see anything taking it from irq
(maybe aio? to check I also tried the libaio testuite with no apparent
VM_BUG_ON triggering), so unless somebody sees it, I think we should
apply it. I've been running for a while with this patch applied
without apparent problems. Other archs may follow suit if it's proven
that there's nothing taking the pgd_lock from irq.

===
Subject: fix pgd_lock deadlock

From: Andrea Arcangeli <aarcange@redhat.com>

It's forbidden to take the page_table_lock with the irq disabled or if there's
contention the IPIs (for tlb flushes) sent with the page_table_lock held will
never run leading to a deadlock.
I really read this thing 5 times and still cannot make any sense of it.

You talk about page_table_lock and then fiddle with pgd_lock.

-ENOSENSE

	tglx

 
I put this expanation in the redhat BZ, says it all:


2011-01-21 15:54:58 EST
The problem is with THP.  The page reclaim code calls page_referenced_one()
which takes the mm->page_table_lock on one CPU before sending an IPI to other
CPU(s):

On CPU1 we take the mm->page_table_lock, send IPIs and wait for a response:
page_referenced_one(...)
        if (unlikely(PageTransHuge(page))) {
                pmd_t *pmd;

                spin_lock(&mm->page_table_lock);
                pmd = page_check_address_pmd(page, mm, address,
                                             PAGE_CHECK_ADDRESS_PMD_FLAG);
                if (pmd && !pmd_trans_splitting(*pmd) &&
                    pmdp_clear_flush_young_notify(vma, address, pmd))
                        referenced++;
                spin_unlock(&mm->page_table_lock);
        } else {


CPU2 can race in vmalloc_sync_all() because it disables interrupt(preventing a
response to the IPI from CPU1) and takes the pgd_lock then spins in the
mm->page_table_lock which is already held on CPU1.

                spin_lock_irqsave(&pgd_lock, flags);
                list_for_each_entry(page, &pgd_list, lru) {
                        pgd_t *pgd;
                        spinlock_t *pgt_lock;

                        pgd = (pgd_t *)page_address(page) + pgd_index(address);

                        pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
                        spin_lock(pgt_lock);


At this point the system is deadlocked.  The pmdp_clear_flush_young_notify
needs to do its PDG business with the page_table_lock held then release that
lock before sending the IPIs to the other CPUs.


Larry
--------------040305040806010203010400-- --===============0741702087== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel --===============0741702087==--