linux-mm.kvack.org archive mirror
* [RESEND PATCH] mm: Use spin_lock_irqsave in __set_page_dirty_nobuffers
@ 2011-01-11  6:15 Andy Grover
  2011-01-21  8:18 ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Grover @ 2011-01-11  6:15 UTC (permalink / raw)
  To: linux-mm; +Cc: rds-devel, Andy Grover

RDS is calling set_page_dirty from interrupt context, which
ends up calling this function. Using irqsave ensures irqs
are not re-enabled by this function.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
---
 mm/page-writeback.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index b840afa..c6c381b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1155,11 +1155,12 @@ int __set_page_dirty_nobuffers(struct page *page)
 	if (!TestSetPageDirty(page)) {
 		struct address_space *mapping = page_mapping(page);
 		struct address_space *mapping2;
+		unsigned long flags;
 
 		if (!mapping)
 			return 1;
 
-		spin_lock_irq(&mapping->tree_lock);
+		spin_lock_irqsave(&mapping->tree_lock, flags);
 		mapping2 = page_mapping(page);
 		if (mapping2) { /* Race with truncate? */
 			BUG_ON(mapping2 != mapping);
@@ -1168,7 +1169,7 @@ int __set_page_dirty_nobuffers(struct page *page)
 			radix_tree_tag_set(&mapping->page_tree,
 				page_index(page), PAGECACHE_TAG_DIRTY);
 		}
-		spin_unlock_irq(&mapping->tree_lock);
+		spin_unlock_irqrestore(&mapping->tree_lock, flags);
 		if (mapping->host) {
 			/* !PageAnon && !swapper_space */
 			__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [RESEND PATCH] mm: Use spin_lock_irqsave in __set_page_dirty_nobuffers
  2011-01-11  6:15 [RESEND PATCH] mm: Use spin_lock_irqsave in __set_page_dirty_nobuffers Andy Grover
@ 2011-01-21  8:18 ` Andrew Morton
  2011-01-21 19:25   ` Andy Grover
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2011-01-21  8:18 UTC (permalink / raw)
  To: Andy Grover; +Cc: linux-mm, rds-devel

On Mon, 10 Jan 2011 22:15:34 -0800 Andy Grover <andy.grover@oracle.com> wrote:

> RDS is calling set_page_dirty from interrupt context,

yikes.  Whatever possessed you to try that?

> @@ -1155,11 +1155,12 @@ int __set_page_dirty_nobuffers(struct page *page)

__set_page_dirty_buffers(): bug, takes mapping->private_lock in irq context
                            bug, __set_page_dirty() reenables IRQs
ceph_set_page_dirty():      more bugs than I care to enumerate
nilfs_set_file_dirty():     bug, takes sbi->s_inode_lock in IRQ context


* Re: [RESEND PATCH] mm: Use spin_lock_irqsave in __set_page_dirty_nobuffers
  2011-01-21  8:18 ` Andrew Morton
@ 2011-01-21 19:25   ` Andy Grover
  2011-01-21 20:09     ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Grover @ 2011-01-21 19:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, rds-devel

On 01/21/2011 12:18 AM, Andrew Morton wrote:
> On Mon, 10 Jan 2011 22:15:34 -0800 Andy Grover<andy.grover@oracle.com>  wrote:
>
>> RDS is calling set_page_dirty from interrupt context,
>
> yikes.  Whatever possessed you to try that?

When doing an RDMA read into pinned pages, we get notified the operation 
is complete in a tasklet, and would like to mark the pages dirty and 
unpin in the same context.

The issue was __set_page_dirty_buffers (via calling set_page_dirty) was 
unconditionally re-enabling irqs as a side-effect because it was using 
*_irq instead of *_irqsave/restore.
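
The difference between the two lock variants can be illustrated with a small user-space model. This is only a sketch: the `irqs_enabled` flag and every `model_*` helper are hypothetical stand-ins for the CPU interrupt state and the kernel primitives, not kernel code. It shows that an `*_irq` unlock turns interrupts back on even when the caller entered with them off, while the `*_irqsave`/`*_irqrestore` pair puts back whatever state it found.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* Models spin_lock_irq(): disables interrupts, remembers nothing. */
static void model_lock_irq(void)
{
	irqs_enabled = false;
}

/* Models spin_unlock_irq(): unconditionally enables interrupts. */
static void model_unlock_irq(void)
{
	irqs_enabled = true;
}

/* Models spin_lock_irqsave(): returns the prior state, then disables. */
static unsigned long model_lock_irqsave(void)
{
	unsigned long flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

/* Models spin_unlock_irqrestore(): restores the saved state. */
static void model_unlock_irqrestore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

/* Caller enters with interrupts off; report the state after the section. */
static bool irq_state_after_irq_variant(void)
{
	irqs_enabled = false;		/* caller already has interrupts off */
	model_lock_irq();
	model_unlock_irq();		/* side effect: interrupts are now on */
	return irqs_enabled;
}

static bool irq_state_after_irqsave_variant(void)
{
	unsigned long flags;

	irqs_enabled = false;		/* caller already has interrupts off */
	flags = model_lock_irqsave();
	model_unlock_irqrestore(flags);	/* caller's state is preserved */
	return irqs_enabled;
}
```

Calling `irq_state_after_irq_variant()` reports interrupts enabled on return, which is the side effect described above; `irq_state_after_irqsave_variant()` reports them still disabled.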

How would you recommend we proceed? My understanding was calling 
set_page_dirty prior to issuing the operation isn't an option since it 
might get cleaned too early.

Thanks -- Regards -- Andy


* Re: [RESEND PATCH] mm: Use spin_lock_irqsave in __set_page_dirty_nobuffers
  2011-01-21 19:25   ` Andy Grover
@ 2011-01-21 20:09     ` Andrew Morton
  2011-01-25  1:30       ` Andy Grover
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2011-01-21 20:09 UTC (permalink / raw)
  To: Andy Grover; +Cc: linux-mm, rds-devel

On Fri, 21 Jan 2011 11:25:26 -0800
Andy Grover <andy.grover@oracle.com> wrote:

> On 01/21/2011 12:18 AM, Andrew Morton wrote:
> > On Mon, 10 Jan 2011 22:15:34 -0800 Andy Grover<andy.grover@oracle.com>  wrote:
> >
> >> RDS is calling set_page_dirty from interrupt context,
> >
> > yikes.  Whatever possessed you to try that?
> 
> When doing an RDMA read into pinned pages, we get notified the operation 
> is complete in a tasklet, and would like to mark the pages dirty and 
> unpin in the same context.
> 
> The issue was __set_page_dirty_buffers (via calling set_page_dirty) was 
> unconditionally re-enabling irqs as a side-effect because it was using 
> *_irq instead of *_irqsave/restore.

Your patch patched __set_page_dirty_nobuffers()?

> How would you recommend we proceed? My understanding was calling 
> set_page_dirty prior to issuing the operation isn't an option since it 
> might get cleaned too early.

The page should be locked, for reasons explained over
set_page_dirty_lock() (which was a strange place to document this).

What you could perhaps do is to lock_page() all the pages and run
set_page_dirty() on them *before* setting up the IO operation, then run
unlock_page() from interrupt context.

I assume that all these pages are mapped into userspace processes?  If
so, they're fully uptodate and we're OK.  If they're plain old
pagecache pages then we could have partially uptodate pages and things
get messier.

Running lock_page() against multiple pages is problematic because it
introduces a risk of ab/ba deadlocks against another thread which is
also locking multiple pages.  Possible solutions are a) take some
higher-level mutex so that only one thread will ever be running the
lock_page()s at a time or b) lock all the pages in ascending
page_to_pfn() order.  Both of these are a PITA.

A slow-and-safe solution to all this would be to punt the operation to
a process-context helper thread and run

	lock_page(page);
	if (page->mapping)	/* truncate? */
		set_page_dirty(page);
	unlock_page(page);

against each page.
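
The shape of that deferral can be sketched as a user-space model. Everything here is an assumption made for illustration: `struct model_page`, `defer_dirty()` and `run_deferred_dirty()` are hypothetical names, the boolean `locked` field only imitates lock_page()/unlock_page(), and the list is unsynchronized (real code shared between interrupt and process context would need a lock or a lockless list).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a page; not the kernel's struct page. */
struct model_page {
	bool has_mapping;		/* false models a page that was truncated */
	bool locked;
	bool dirty;
	struct model_page *next;	/* link in the deferred list */
};

/* Pages waiting for the process-context helper (unsynchronized model). */
static struct model_page *deferred_head;

/* "Interrupt context": only queue the page, take no page lock here. */
static void defer_dirty(struct model_page *page)
{
	page->next = deferred_head;
	deferred_head = page;
}

/*
 * "Process context" helper: drain the list and apply the
 * lock / check mapping / dirty / unlock sequence to each page.
 * Returns the number of pages actually dirtied.
 */
static int run_deferred_dirty(void)
{
	int dirtied = 0;

	while (deferred_head) {
		struct model_page *page = deferred_head;

		deferred_head = page->next;
		page->next = NULL;

		assert(!page->locked);
		page->locked = true;		/* models lock_page(page) */
		if (page->has_mapping) {	/* truncate? */
			page->dirty = true;	/* models set_page_dirty(page) */
			dirtied++;
		}
		page->locked = false;		/* models unlock_page(page) */
	}
	return dirtied;
}
```

In this model a page whose mapping has gone away is unlocked again without being dirtied, mirroring the `if (page->mapping)` check in the snippet above.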

Some thought is needed regarding anonymous pages and swapcache pages.


* Re: [RESEND PATCH] mm: Use spin_lock_irqsave in __set_page_dirty_nobuffers
  2011-01-21 20:09     ` Andrew Morton
@ 2011-01-25  1:30       ` Andy Grover
  2011-01-25  1:44         ` Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Grover @ 2011-01-25  1:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, rds-devel

On 01/21/2011 12:09 PM, Andrew Morton wrote:
> Andy Grover<andy.grover@oracle.com>  wrote:
>> When doing an RDMA read into pinned pages, we get notified the operation
>> is complete in a tasklet, and would like to mark the pages dirty and
>> unpin in the same context.
>>
>> The issue was __set_page_dirty_buffers (via calling set_page_dirty)
>> was unconditionally re-enabling irqs as a side-effect because it
>> was using *_irq instead of *_irqsave/restore.
>
> Your patch patched __set_page_dirty_nobuffers()?

Yes, _nobuffers, sorry.

> What you could perhaps do is to lock_page() all the pages and run
> set_page_dirty() on them *before* setting up the IO operation, then run
> unlock_page() from interrupt context.
>
> I assume that all these pages are mapped into userspace processes?  If
> so, they're fully uptodate and we're OK.  If they're plain old
> pagecache pages then we could have partially uptodate pages and things
> get messier.
>
> Running lock_page() against multiple pages is problematic because it
> introduces a risk of ab/ba deadlocks against another thread which is
> also locking multiple pages.  Possible solutions are a) take some
> higher-level mutex so that only one thread will ever be running the
> lock_page()s at a time or b) lock all the pages in ascending
> page_to_pfn() order.  Both of these are a PITA.

Another problem may be that lock/unlock_page() doesn't nest. We need to 
be able to handle multiple ops to the same page. So, sounds like we also 
need to keep track of all pages we lock/dirty and make sure they aren't 
unlocked as long as we have references against them?

I just want to fully understand what's needed, before writing at least 2 
PITA's worth of extra code :)

> Some thought is needed regarding anonymous pages and swapcache pages.

I think the common case for us is IO into anon pages.

Regards -- Andy


* Re: [RESEND PATCH] mm: Use spin_lock_irqsave in __set_page_dirty_nobuffers
  2011-01-25  1:30       ` Andy Grover
@ 2011-01-25  1:44         ` Andrew Morton
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2011-01-25  1:44 UTC (permalink / raw)
  To: Andy Grover; +Cc: linux-mm, rds-devel

On Mon, 24 Jan 2011 17:30:27 -0800 Andy Grover <andy.grover@oracle.com> wrote:

> > Running lock_page() against multiple pages is problematic because it
> > introduces a risk of ab/ba deadlocks against another thread which is
> > also locking multiple pages.  Possible solutions are a) take some
> > higher-level mutex so that only one thread will ever be running the
> > lock_page()s at a time or b) lock all the pages in ascending
> > page_to_pfn() order.  Both of these are a PITA.
> 
> Another problem may be that lock/unlock_page() doesn't nest.

Not against the same page, no.  It's functionally the same as
mutex_lock/unlock, only lockdep doesn't know about lock_page().

> We need to 
> be able to handle multiple ops to the same page. So, sounds like we also 
> need to keep track of all pages we lock/dirty and make sure they aren't 
> unlocked as long as we have references against them?

It sounds like it.  Also need to address the ab/ba issue with multiple
lock_page()s in a single thread.

I don't *think* there's any other site in the kernel which locks
multiple pages like this.  Adopting the convention of "lock them in
ascending pfn order" will be OK, I think.
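
A minimal user-space sketch of that convention, under stated assumptions: `struct pfn_page`, `cmp_pfn()` and `lock_pages_ascending()` are hypothetical names, the `pfn` field stands in for page_to_pfn(), and a boolean stands in for the page lock. It sorts the batch by pfn so every thread acquires locks in the same order, and it skips a page that appears twice in the batch, since the lock does not nest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a page: a frame number plus a lock bit. */
struct pfn_page {
	unsigned long pfn;
	bool locked;
};

/* qsort() comparator over an array of struct pfn_page pointers. */
static int cmp_pfn(const void *a, const void *b)
{
	const struct pfn_page *pa = *(struct pfn_page *const *)a;
	const struct pfn_page *pb = *(struct pfn_page *const *)b;

	return (pa->pfn > pb->pfn) - (pa->pfn < pb->pfn);
}

/*
 * Sort the batch into ascending pfn order, then lock each distinct
 * page exactly once. Returns the number of locks taken.
 */
static size_t lock_pages_ascending(struct pfn_page **pages, size_t n)
{
	size_t taken = 0;

	qsort(pages, n, sizeof(*pages), cmp_pfn);
	for (size_t i = 0; i < n; i++) {
		/* After sorting, repeated references to a page are adjacent. */
		if (i > 0 && pages[i]->pfn == pages[i - 1]->pfn)
			continue;
		assert(!pages[i]->locked);
		pages[i]->locked = true;	/* models lock_page() */
		taken++;
	}
	return taken;
}
```

Because every caller sorts before locking, two threads working on overlapping batches cannot each hold a page the other is waiting for. Tracking how many operations still reference each locked page, so it is unlocked only after the last one completes, is left out of this sketch.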

> I just want to fully understand what's needed, before writing at least 2 
> PITA's worth of extra code :)
> 
> > Some thought is needed regarding anonymous pages and swapcache pages.
> 
> I think the common case for us is IO into anon pages.

lock_page() will presumably keep the swapcache manipulations happy. 
We'd also need to think about the implications of pte-dirtiness and
maybe rmap walks when dealing with non-cpu-initiated dirtyings.  "do
what fs/direct-io.c does" would be a good starting point.

Actually, fs/direct-io.c gets away without locking the pages.

