All of lore.kernel.org
 help / color / mirror / Atom feed
From: Helge Deller <deller@gmx.de>
To: "Russell King (Oracle)" <linux@armlinux.org.uk>,
	Dinh Nguyen <dinguyen@kernel.org>
Cc: Helge Deller <deller@gmx.de>,
	Bart Van Assche <bvanassche@acm.org>,
	Linux SCSI List <linux-scsi@vger.kernel.org>,
	linux-aio@kvack.org, linux-parisc <linux-parisc@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: spinlock recursion in aio_complete()
Date: Tue, 23 May 2023 22:01:20 +0200	[thread overview]
Message-ID: <ZG0bkNJ5jQC1a3pY@p100> (raw)
In-Reply-To: <ZGyawdtBhNnvvTv3@shell.armlinux.org.uk>

* Russell King (Oracle) <linux@armlinux.org.uk>:
> On Tue, May 23, 2023 at 12:24:04PM +0200, Helge Deller wrote:
> > On 5/22/23 23:22, Helge Deller wrote:
> > > > > It hangs in fs/aio.c:1128, function aio_complete(), in this call:
> > > > >      spin_lock_irqsave(&ctx->completion_lock, flags);
> > > >
> > > > All code that I found and that obtains ctx->completion_lock disables IRQs.
> > > > It is not clear to me how this spinlock can be locked recursively? Is it
> > > > sure that the "spinlock recursion" report is correct?
> > >
> > > Yes, it seems correct.
> > > [...]
> >
> > Bart, thanks to your suggestions I was able to narrow down the problem!
> >
> > I got LOCKDEP working on parisc, which then reports:
> > 	raw_local_irq_restore() called with IRQs enabled
> > for the spin_unlock_irqrestore() in function aio_complete(), which shouldn't happen.
> >
> > Finally, I found that parisc's flush_dcache_page() re-enables the IRQs
> > which leads to the spinlock hang in aio_complete().
> >
> > So, this is NOT a bug in aio or scsci, but we need fix in the the arch code.
>
> You can find some of the background to this at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=16ceff2d5dc9f0347ab5a08abff3f4647c2fee04
>
> which introduced flush_dcache_mmap_lock(). It looks like Hugh had
> questions over whether this should be _irqsave() rather than _irq()
> but I guess at the time all callers had interrupts enabled, and
> it's only recently that someone came up with the idea of calling
> flush_dcache_page() with interrupts disabled.
>
> Adding another arg to flush_dcache_mmap_lock() to save the flags
> may be doable, but requires a patch that touches not only architectures
> that have a private implementation, but also various code in mm/.

I've tested the attached patch on parisc, and it solves the issue.
I've not compile-tested it on arm and nios2, both seem to be
the only other affected platforms.

Thoughts?

Helge


From 25a96a4211975d46e6f4dac06e144d0fb9f5ed53 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller@gmx.de>
Date: Tue, 23 May 2023 21:48:33 +0200
Subject: [PATCH] Fix flush_dcache_page() for usage in irq context

flush_dcache_page() can be called with IRQs disabled, e.g. from
aio_complete().

Fix flush_dcache_page() on the arm, parisc and nios2 architectures
to not unintentionally re-enable IRQs by using xa_lock_irqsave() instead
of xa_lock_irq() for the flush_dcache_mmap_*lock() functions.

Cc: Russell King (Oracle) <linux@armlinux.org.uk>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index a094f964c869..5b8a1ef0dc50 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -315,6 +315,10 @@ static inline void flush_anon_page(struct vm_area_struct *vma,

 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
+#define flush_dcache_mmap_lock_irqsave(mapping, flags)		\
+		xa_lock_irqsave(&mapping->i_pages, flags)
+#define flush_dcache_mmap_unlock_irqrestore(mapping, flags)	\
+		xa_unlock_irqrestore(&mapping->i_pages, flags)

 /*
  * We don't appear to need to do anything here.  In fact, if we did, we'd
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7ff9feea13a6..d57ec9165520 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -238,6 +238,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 {
 	struct mm_struct *mm = current->active_mm;
 	struct vm_area_struct *mpnt;
+	unsigned long flags;
 	pgoff_t pgoff;

 	/*
@@ -248,7 +249,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 	 */
 	pgoff = page->index;

-	flush_dcache_mmap_lock(mapping);
+	flush_dcache_mmap_lock_irqsave(mapping, flags);
 	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
 		unsigned long offset;

@@ -262,7 +263,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
 		flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
 	}
-	flush_dcache_mmap_unlock(mapping);
+	flush_dcache_mmap_unlock_irqrestore(mapping, flags);
 }

 #if __LINUX_ARM_ARCH__ >= 6
diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h
index d0b71dd71287..a37242662809 100644
--- a/arch/nios2/include/asm/cacheflush.h
+++ b/arch/nios2/include/asm/cacheflush.h
@@ -48,5 +48,9 @@ extern void invalidate_dcache_range(unsigned long start, unsigned long end);

 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
+#define flush_dcache_mmap_lock_irqsave(mapping, flags)		\
+		xa_lock_irqsave(&mapping->i_pages, flags)
+#define flush_dcache_mmap_unlock_irqrestore(mapping, flags)	\
+		xa_unlock_irqrestore(&mapping->i_pages, flags)

 #endif /* _ASM_NIOS2_CACHEFLUSH_H */
diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
index 6aa9257c3ede..35f3b599187f 100644
--- a/arch/nios2/mm/cacheflush.c
+++ b/arch/nios2/mm/cacheflush.c
@@ -75,11 +75,12 @@ static void flush_aliases(struct address_space *mapping, struct page *page)
 {
 	struct mm_struct *mm = current->active_mm;
 	struct vm_area_struct *mpnt;
+	unsigned long flags;
 	pgoff_t pgoff;

 	pgoff = page->index;

-	flush_dcache_mmap_lock(mapping);
+	flush_dcache_mmap_lock_irqsave(mapping, flags);
 	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
 		unsigned long offset;

@@ -92,7 +93,7 @@ static void flush_aliases(struct address_space *mapping, struct page *page)
 		flush_cache_page(mpnt, mpnt->vm_start + offset,
 			page_to_pfn(page));
 	}
-	flush_dcache_mmap_unlock(mapping);
+	flush_dcache_mmap_unlock_irqrestore(mapping, flags);
 }

 void flush_cache_all(void)
diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 0bdee6724132..c8b6928cee1e 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -48,6 +48,10 @@ void flush_dcache_page(struct page *page);

 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
+#define flush_dcache_mmap_lock_irqsave(mapping, flags)		\
+		xa_lock_irqsave(&mapping->i_pages, flags)
+#define flush_dcache_mmap_unlock_irqrestore(mapping, flags)	\
+		xa_unlock_irqrestore(&mapping->i_pages, flags)

 #define flush_icache_page(vma,page)	do { 		\
 	flush_kernel_dcache_page_addr(page_address(page)); \
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 1d3b8bc8a623..ca4a302d4365 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -399,6 +399,7 @@ void flush_dcache_page(struct page *page)
 	unsigned long offset;
 	unsigned long addr, old_addr = 0;
 	unsigned long count = 0;
+	unsigned long flags;
 	pgoff_t pgoff;

 	if (mapping && !mapping_mapped(mapping)) {
@@ -420,7 +421,7 @@ void flush_dcache_page(struct page *page)
 	 * to flush one address here for them all to become coherent
 	 * on machines that support equivalent aliasing
 	 */
-	flush_dcache_mmap_lock(mapping);
+	flush_dcache_mmap_lock_irqsave(mapping, flags);
 	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
 		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
 		addr = mpnt->vm_start + offset;
@@ -460,7 +461,7 @@ void flush_dcache_page(struct page *page)
 		}
 		WARN_ON(++count == 4096);
 	}
-	flush_dcache_mmap_unlock(mapping);
+	flush_dcache_mmap_unlock_irqrestore(mapping, flags);
 }
 EXPORT_SYMBOL(flush_dcache_page);


WARNING: multiple messages have this Message-ID (diff)
From: Helge Deller <deller@gmx.de>
To: "Russell King (Oracle)" <linux@armlinux.org.uk>,
	Dinh Nguyen <dinguyen@kernel.org>
Cc: Helge Deller <deller@gmx.de>,
	Bart Van Assche <bvanassche@acm.org>,
	Linux SCSI List <linux-scsi@vger.kernel.org>,
	linux-aio@kvack.org, linux-parisc <linux-parisc@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: spinlock recursion in aio_complete()
Date: Tue, 23 May 2023 22:01:20 +0200	[thread overview]
Message-ID: <ZG0bkNJ5jQC1a3pY@p100> (raw)
In-Reply-To: <ZGyawdtBhNnvvTv3@shell.armlinux.org.uk>

* Russell King (Oracle) <linux@armlinux.org.uk>:
> On Tue, May 23, 2023 at 12:24:04PM +0200, Helge Deller wrote:
> > On 5/22/23 23:22, Helge Deller wrote:
> > > > > It hangs in fs/aio.c:1128, function aio_complete(), in this call:
> > > > >      spin_lock_irqsave(&ctx->completion_lock, flags);
> > > >
> > > > All code that I found and that obtains ctx->completion_lock disables IRQs.
> > > > It is not clear to me how this spinlock can be locked recursively? Is it
> > > > sure that the "spinlock recursion" report is correct?
> > >
> > > Yes, it seems correct.
> > > [...]
> >
> > Bart, thanks to your suggestions I was able to narrow down the problem!
> >
> > I got LOCKDEP working on parisc, which then reports:
> > 	raw_local_irq_restore() called with IRQs enabled
> > for the spin_unlock_irqrestore() in function aio_complete(), which shouldn't happen.
> >
> > Finally, I found that parisc's flush_dcache_page() re-enables the IRQs
> > which leads to the spinlock hang in aio_complete().
> >
> > So, this is NOT a bug in aio or scsci, but we need fix in the the arch code.
>
> You can find some of the background to this at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=16ceff2d5dc9f0347ab5a08abff3f4647c2fee04
>
> which introduced flush_dcache_mmap_lock(). It looks like Hugh had
> questions over whether this should be _irqsave() rather than _irq()
> but I guess at the time all callers had interrupts enabled, and
> it's only recently that someone came up with the idea of calling
> flush_dcache_page() with interrupts disabled.
>
> Adding another arg to flush_dcache_mmap_lock() to save the flags
> may be doable, but requires a patch that touches not only architectures
> that have a private implementation, but also various code in mm/.

I've tested the attached patch on parisc, and it solves the issue.
I've not compile-tested it on arm and nios2, both seem to be
the only other affected platforms.

Thoughts?

Helge


From 25a96a4211975d46e6f4dac06e144d0fb9f5ed53 Mon Sep 17 00:00:00 2001
From: Helge Deller <deller@gmx.de>
Date: Tue, 23 May 2023 21:48:33 +0200
Subject: [PATCH] Fix flush_dcache_page() for usage in irq context

flush_dcache_page() can be called with IRQs disabled, e.g. from
aio_complete().

Fix flush_dcache_page() on the arm, parisc and nios2 architectures
to not unintentionally re-enable IRQs by using xa_lock_irqsave() instead
of xa_lock_irq() for the flush_dcache_mmap_*lock() functions.

Cc: Russell King (Oracle) <linux@armlinux.org.uk>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Helge Deller <deller@gmx.de>

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index a094f964c869..5b8a1ef0dc50 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -315,6 +315,10 @@ static inline void flush_anon_page(struct vm_area_struct *vma,

 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
+#define flush_dcache_mmap_lock_irqsave(mapping, flags)		\
+		xa_lock_irqsave(&mapping->i_pages, flags)
+#define flush_dcache_mmap_unlock_irqrestore(mapping, flags)	\
+		xa_unlock_irqrestore(&mapping->i_pages, flags)

 /*
  * We don't appear to need to do anything here.  In fact, if we did, we'd
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 7ff9feea13a6..d57ec9165520 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -238,6 +238,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 {
 	struct mm_struct *mm = current->active_mm;
 	struct vm_area_struct *mpnt;
+	unsigned long flags;
 	pgoff_t pgoff;

 	/*
@@ -248,7 +249,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 	 */
 	pgoff = page->index;

-	flush_dcache_mmap_lock(mapping);
+	flush_dcache_mmap_lock_irqsave(mapping, flags);
 	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
 		unsigned long offset;

@@ -262,7 +263,7 @@ static void __flush_dcache_aliases(struct address_space *mapping, struct page *p
 		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
 		flush_cache_page(mpnt, mpnt->vm_start + offset, page_to_pfn(page));
 	}
-	flush_dcache_mmap_unlock(mapping);
+	flush_dcache_mmap_unlock_irqrestore(mapping, flags);
 }

 #if __LINUX_ARM_ARCH__ >= 6
diff --git a/arch/nios2/include/asm/cacheflush.h b/arch/nios2/include/asm/cacheflush.h
index d0b71dd71287..a37242662809 100644
--- a/arch/nios2/include/asm/cacheflush.h
+++ b/arch/nios2/include/asm/cacheflush.h
@@ -48,5 +48,9 @@ extern void invalidate_dcache_range(unsigned long start, unsigned long end);

 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
+#define flush_dcache_mmap_lock_irqsave(mapping, flags)		\
+		xa_lock_irqsave(&mapping->i_pages, flags)
+#define flush_dcache_mmap_unlock_irqrestore(mapping, flags)	\
+		xa_unlock_irqrestore(&mapping->i_pages, flags)

 #endif /* _ASM_NIOS2_CACHEFLUSH_H */
diff --git a/arch/nios2/mm/cacheflush.c b/arch/nios2/mm/cacheflush.c
index 6aa9257c3ede..35f3b599187f 100644
--- a/arch/nios2/mm/cacheflush.c
+++ b/arch/nios2/mm/cacheflush.c
@@ -75,11 +75,12 @@ static void flush_aliases(struct address_space *mapping, struct page *page)
 {
 	struct mm_struct *mm = current->active_mm;
 	struct vm_area_struct *mpnt;
+	unsigned long flags;
 	pgoff_t pgoff;

 	pgoff = page->index;

-	flush_dcache_mmap_lock(mapping);
+	flush_dcache_mmap_lock_irqsave(mapping, flags);
 	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
 		unsigned long offset;

@@ -92,7 +93,7 @@ static void flush_aliases(struct address_space *mapping, struct page *page)
 		flush_cache_page(mpnt, mpnt->vm_start + offset,
 			page_to_pfn(page));
 	}
-	flush_dcache_mmap_unlock(mapping);
+	flush_dcache_mmap_unlock_irqrestore(mapping, flags);
 }

 void flush_cache_all(void)
diff --git a/arch/parisc/include/asm/cacheflush.h b/arch/parisc/include/asm/cacheflush.h
index 0bdee6724132..c8b6928cee1e 100644
--- a/arch/parisc/include/asm/cacheflush.h
+++ b/arch/parisc/include/asm/cacheflush.h
@@ -48,6 +48,10 @@ void flush_dcache_page(struct page *page);

 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)
+#define flush_dcache_mmap_lock_irqsave(mapping, flags)		\
+		xa_lock_irqsave(&mapping->i_pages, flags)
+#define flush_dcache_mmap_unlock_irqrestore(mapping, flags)	\
+		xa_unlock_irqrestore(&mapping->i_pages, flags)

 #define flush_icache_page(vma,page)	do { 		\
 	flush_kernel_dcache_page_addr(page_address(page)); \
diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 1d3b8bc8a623..ca4a302d4365 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -399,6 +399,7 @@ void flush_dcache_page(struct page *page)
 	unsigned long offset;
 	unsigned long addr, old_addr = 0;
 	unsigned long count = 0;
+	unsigned long flags;
 	pgoff_t pgoff;

 	if (mapping && !mapping_mapped(mapping)) {
@@ -420,7 +421,7 @@ void flush_dcache_page(struct page *page)
 	 * to flush one address here for them all to become coherent
 	 * on machines that support equivalent aliasing
 	 */
-	flush_dcache_mmap_lock(mapping);
+	flush_dcache_mmap_lock_irqsave(mapping, flags);
 	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
 		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
 		addr = mpnt->vm_start + offset;
@@ -460,7 +461,7 @@ void flush_dcache_page(struct page *page)
 		}
 		WARN_ON(++count == 4096);
 	}
-	flush_dcache_mmap_unlock(mapping);
+	flush_dcache_mmap_unlock_irqrestore(mapping, flags);
 }
 EXPORT_SYMBOL(flush_dcache_page);


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2023-05-23 20:01 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-21  5:43 spinlock recursion in aio_complete() Helge Deller
2023-05-22 19:28 ` Bart Van Assche
2023-05-22 20:51   ` Helge Deller
2023-05-22 20:58     ` Bart Van Assche
2023-05-22 21:22       ` Helge Deller
2023-05-23 10:24         ` Helge Deller
2023-05-23 10:24           ` Helge Deller
2023-05-23 10:51           ` Russell King (Oracle)
2023-05-23 10:51             ` Russell King (Oracle)
2023-05-23 20:01             ` Helge Deller [this message]
2023-05-23 20:01               ` Helge Deller
2023-05-23 20:06               ` Bart Van Assche
2023-05-23 20:06                 ` Bart Van Assche
2023-05-23 20:12                 ` Helge Deller
2023-05-23 20:12                   ` Helge Deller
2023-05-23 20:24               ` Helge Deller
2023-05-23 20:24                 ` Helge Deller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZG0bkNJ5jQC1a3pY@p100 \
    --to=deller@gmx.de \
    --cc=bvanassche@acm.org \
    --cc=dinguyen@kernel.org \
    --cc=linux-aio@kvack.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.