linux-mm.kvack.org archive mirror
* [PATCH -V2] hugetlbfs: Drop taking inode i_mutex lock from hugetlbfs_read
@ 2012-03-01  9:18 Aneesh Kumar K.V
  2012-03-01 22:10 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Aneesh Kumar K.V @ 2012-03-01  9:18 UTC
  To: linux-mm, mgorman, kamezawa.hiroyu, dhillf, akpm, viro, hughd
  Cc: linux-kernel, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

Taking the i_mutex lock in hugetlbfs_read() can result in a deadlock with
mmap(), as shown below:
 Thread A:
  read() on hugetlbfs
   hugetlbfs_read() called
    i_mutex grabbed
     hugetlbfs_read_actor() called
      __copy_to_user() called
       page fault is triggered
 Thread B, sharing address space with A:
  mmap() the same file
   ->mmap_sem is grabbed on task_B->mm->mmap_sem
    hugetlbfs_file_mmap() is called
     attempt to grab ->i_mutex and block waiting for A to give it up
 Thread A:
  page fault handler blocks on its attempt to grab task_A->mm->mmap_sem,
 which happens to be the same thing as task_B->mm->mmap_sem.  It waits
 for B to give it up.

AFAIU, the i_mutex lock was added to hugetlbfs_read() as per
http://lkml.indiana.edu/hypermail/linux/kernel/0707.2/3066.html
to take care of the race between truncate and read. This patch fixes
that race by looking at page->mapping under the page lock
(find_lock_page()) to ensure the inode didn't get truncated in the range
during a parallel read.
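
As a rough illustration of that idea, here is a userspace model, assuming a
pthread mutex for the page lock and an atomic counter for the page reference.
The struct and function names are hypothetical, and the model collapses the
kernel's "retry the lookup" step into simply returning NULL when the page no
longer belongs to the mapping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical models; these are not the kernel's data structures. */
struct mapping_model { int id; };

struct page_model {
	pthread_mutex_t lock;           /* models the page lock */
	struct mapping_model *mapping;  /* NULL once truncated away */
	atomic_int refcount;            /* models the page reference count */
};

/* Truncate side: detach the page from its mapping under the page lock. */
static void truncate_page_model(struct page_model *page)
{
	pthread_mutex_lock(&page->lock);
	page->mapping = NULL;
	pthread_mutex_unlock(&page->lock);
}

/*
 * Read side: take a reference, lock the page, then recheck that it still
 * belongs to the mapping. Returns the page locked and referenced, or NULL.
 */
static struct page_model *find_lock_page_model(struct page_model *page,
					       struct mapping_model *mapping)
{
	if (page == NULL)
		return NULL;
	atomic_fetch_add(&page->refcount, 1);
	pthread_mutex_lock(&page->lock);
	if (page->mapping != mapping) {
		/* Truncated while we waited: drop everything and give up. */
		pthread_mutex_unlock(&page->lock);
		atomic_fetch_sub(&page->refcount, 1);
		return NULL;
	}
	return page;
}

/* Returns 1 if page data would be copied, 0 if it is treated as a hole. */
static int read_page_model(struct page_model *page,
			   struct mapping_model *mapping)
{
	struct page_model *p = find_lock_page_model(page, mapping);

	if (p == NULL)
		return 0;
	/* Unlock before copying: the copy to user space may fault. */
	pthread_mutex_unlock(&p->lock);
	/* ... copy to the user buffer here, holding only the reference ... */
	atomic_fetch_sub(&p->refcount, 1);
	return 1;
}
```

In this model the reader holds no sleeping lock across the copy, only a
reference, which is what lets the read path avoid i_mutex.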

Ideally we would extend the patch to make sure we don't increase i_size
in mmap. But that would break userspace, because applications would then
have to use truncate(2) to increase i_size in hugetlbfs.

Based on the original patch from Hillf Danton <dhillf@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 fs/hugetlbfs/inode.c |   25 +++++++++----------------
 1 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 1e85a7a..3645cd3 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -238,17 +238,10 @@ static ssize_t hugetlbfs_read(struct file *filp, char __user *buf,
 	loff_t isize;
 	ssize_t retval = 0;
 
-	mutex_lock(&inode->i_mutex);
-
 	/* validate length */
 	if (len == 0)
 		goto out;
 
-	isize = i_size_read(inode);
-	if (!isize)
-		goto out;
-
-	end_index = (isize - 1) >> huge_page_shift(h);
 	for (;;) {
 		struct page *page;
 		unsigned long nr, ret;
@@ -256,18 +249,21 @@ static ssize_t hugetlbfs_read(struct file *filp, char __user *buf,
 
 		/* nr is the maximum number of bytes to copy from this page */
 		nr = huge_page_size(h);
+		isize = i_size_read(inode);
+		if (!isize)
+			goto out;
+		end_index = (isize - 1) >> huge_page_shift(h);
 		if (index >= end_index) {
 			if (index > end_index)
 				goto out;
 			nr = ((isize - 1) & ~huge_page_mask(h)) + 1;
-			if (nr <= offset) {
+			if (nr <= offset)
 				goto out;
-			}
 		}
 		nr = nr - offset;
 
 		/* Find the page */
-		page = find_get_page(mapping, index);
+		page = find_lock_page(mapping, index);
 		if (unlikely(page == NULL)) {
 			/*
 			 * We have a HOLE, zero out the user-buffer for the
@@ -279,17 +275,18 @@ static ssize_t hugetlbfs_read(struct file *filp, char __user *buf,
 			else
 				ra = 0;
 		} else {
+			unlock_page(page);
+
 			/*
 			 * We have the page, copy it to user space buffer.
 			 */
 			ra = hugetlbfs_read_actor(page, offset, buf, len, nr);
 			ret = ra;
+			page_cache_release(page);
 		}
 		if (ra < 0) {
 			if (retval == 0)
 				retval = ra;
-			if (page)
-				page_cache_release(page);
 			goto out;
 		}
 
@@ -299,16 +296,12 @@ static ssize_t hugetlbfs_read(struct file *filp, char __user *buf,
 		index += offset >> huge_page_shift(h);
 		offset &= ~huge_page_mask(h);
 
-		if (page)
-			page_cache_release(page);
-
 		/* short read or no more work */
 		if ((ret != nr) || (len == 0))
 			break;
 	}
 out:
 	*ppos = ((loff_t)index << huge_page_shift(h)) + offset;
-	mutex_unlock(&inode->i_mutex);
 	return retval;
 }
 
-- 
1.7.9
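
The per-iteration size calculation in the loop above, which the patch now
redoes on every pass because i_size can change without i_mutex held, can be
sketched as a standalone function. This is only an illustration: the function
name and parameters are hypothetical, and `shift` stands in for
huge_page_shift(h), so `page_size - 1` plays the role of ~huge_page_mask(h).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes that can be read from huge page `index`, starting at
 * byte `offset` within that page, for a file of `isize` bytes.
 * Returns 0 when the position is at or beyond end of file.
 */
static uint64_t bytes_in_page(uint64_t isize, unsigned int shift,
			      uint64_t index, uint64_t offset)
{
	uint64_t page_size = (uint64_t)1 << shift;
	uint64_t nr = page_size;  /* a full page unless this is the last one */
	uint64_t end_index;

	if (isize == 0)
		return 0;

	/* Index of the page that holds the last byte of the file. */
	end_index = (isize - 1) >> shift;
	if (index >= end_index) {
		if (index > end_index)
			return 0;
		/* Last page: only the bytes up to i_size are valid. */
		nr = ((isize - 1) & (page_size - 1)) + 1;
		if (nr <= offset)
			return 0;
	}
	return nr - offset;
}
```

For example, with 2 MiB huge pages (shift 21) and a 3 MiB file, page 0 yields
a full 2 MiB and page 1 yields the remaining 1 MiB.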

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH -V2] hugetlbfs: Drop taking inode i_mutex lock from hugetlbfs_read
  2012-03-01  9:18 [PATCH -V2] hugetlbfs: Drop taking inode i_mutex lock from hugetlbfs_read Aneesh Kumar K.V
@ 2012-03-01 22:10 ` Andrew Morton
  2012-03-01 22:40   ` Dave Jones
  2012-03-01 22:40   ` Josh Boyer
  0 siblings, 2 replies; 5+ messages in thread
From: Andrew Morton @ 2012-03-01 22:10 UTC
  To: Aneesh Kumar K.V
  Cc: linux-mm, mgorman, kamezawa.hiroyu, dhillf, viro, hughd,
	linux-kernel

On Thu,  1 Mar 2012 14:48:50 +0530
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:

> Taking i_mutex lock in hugetlbfs_read can result in deadlock with mmap
> as explained below
>  Thread A:
>   read() on hugetlbfs
>    hugetlbfs_read() called
>     i_mutex grabbed
>      hugetlbfs_read_actor() called
>       __copy_to_user() called
>        page fault is triggered
>  Thread B, sharing address space with A:
>   mmap() the same file
>    ->mmap_sem is grabbed on task_B->mm->mmap_sem
>     hugetlbfs_file_mmap() is called
>      attempt to grab ->i_mutex and block waiting for A to give it up
>  Thread A:
>   pagefault handled blocked on attempt to grab task_A->mm->mmap_sem,
>  which happens to be the same thing as task_B->mm->mmap_sem.  Block waiting
>  for B to give it up.
> 
> AFAIU i_mutex lock got added to  hugetlbfs_read as per
> http://lkml.indiana.edu/hypermail/linux/kernel/0707.2/3066.html
> to take care of the race between truncate and read. This patch fix
> this by looking at page->mapping under page_lock (find_lock_page())
> to ensure; the inode didn't get truncated in the range during a
> parallel read.
> 
> Ideally we can extend the patch to make sure we don't increase i_size
> in mmap. But that will break userspace, because application will now
> have to use truncate(2) to increase i_size in hugetlbfs.

Looks OK to me.

Given that the bug has been there for four years, I'm assuming that
we'll be OK merging this fix into 3.4.  Or we could merge it into 3.4
and tag it for backporting into earlier kernels - it depends on whether
people are hurting from it, which I don't know?


* Re: [PATCH -V2] hugetlbfs: Drop taking inode i_mutex lock from hugetlbfs_read
  2012-03-01 22:10 ` Andrew Morton
@ 2012-03-01 22:40   ` Dave Jones
  2012-03-01 22:40   ` Josh Boyer
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Jones @ 2012-03-01 22:40 UTC
  To: Andrew Morton
  Cc: Aneesh Kumar K.V, linux-mm, mgorman, kamezawa.hiroyu, dhillf,
	viro, hughd, linux-kernel

On Thu, Mar 01, 2012 at 02:10:07PM -0800, Andrew Morton wrote:
 
 > > AFAIU i_mutex lock got added to  hugetlbfs_read as per
 > > http://lkml.indiana.edu/hypermail/linux/kernel/0707.2/3066.html
 > > to take care of the race between truncate and read. This patch fix
 > > this by looking at page->mapping under page_lock (find_lock_page())
 > > to ensure; the inode didn't get truncated in the range during a
 > > parallel read.
 > > 
 > > Ideally we can extend the patch to make sure we don't increase i_size
 > > in mmap. But that will break userspace, because application will now
 > > have to use truncate(2) to increase i_size in hugetlbfs.
 > 
 > Looks OK to me.
 > 
 > Given that the bug has been there for four years, I'm assuming that
 > we'll be OK merging this fix into 3.4.  Or we could merge it into 3.4
 > and tag it for backporting into earlier kernels - it depends on whether
 > people are hurting from it, which I don't know?

My testing hits this every day. It's not a real problem, but it's annoying
to see the lockdep spew constantly.  We've had a couple Fedora users
report it too in regular day-to-day use as opposed to the hostile
workloads I use to provoke it.

FWIW, I'll probably throw it in the Fedora kernels, so if it ends up
in stable, it'll be one less patch to carry.

	Dave

* Re: [PATCH -V2] hugetlbfs: Drop taking inode i_mutex lock from hugetlbfs_read
  2012-03-01 22:10 ` Andrew Morton
  2012-03-01 22:40   ` Dave Jones
@ 2012-03-01 22:40   ` Josh Boyer
  2012-03-01 22:44     ` Andrew Morton
  1 sibling, 1 reply; 5+ messages in thread
From: Josh Boyer @ 2012-03-01 22:40 UTC
  To: Andrew Morton
  Cc: Aneesh Kumar K.V, linux-mm, mgorman, kamezawa.hiroyu, dhillf,
	viro, hughd, linux-kernel

On Thu, Mar 1, 2012 at 5:10 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu,  1 Mar 2012 14:48:50 +0530
> "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> wrote:
>
>> Taking i_mutex lock in hugetlbfs_read can result in deadlock with mmap
>> as explained below
>>  Thread A:
>>   read() on hugetlbfs
>>    hugetlbfs_read() called
>>     i_mutex grabbed
>>      hugetlbfs_read_actor() called
>>       __copy_to_user() called
>>        page fault is triggered
>>  Thread B, sharing address space with A:
>>   mmap() the same file
>>    ->mmap_sem is grabbed on task_B->mm->mmap_sem
>>     hugetlbfs_file_mmap() is called
>>      attempt to grab ->i_mutex and block waiting for A to give it up
>>  Thread A:
>>   pagefault handled blocked on attempt to grab task_A->mm->mmap_sem,
>>  which happens to be the same thing as task_B->mm->mmap_sem.  Block waiting
>>  for B to give it up.
>>
>> AFAIU i_mutex lock got added to  hugetlbfs_read as per
>> http://lkml.indiana.edu/hypermail/linux/kernel/0707.2/3066.html
>> to take care of the race between truncate and read. This patch fix
>> this by looking at page->mapping under page_lock (find_lock_page())
>> to ensure; the inode didn't get truncated in the range during a
>> parallel read.
>>
>> Ideally we can extend the patch to make sure we don't increase i_size
>> in mmap. But that will break userspace, because application will now
>> have to use truncate(2) to increase i_size in hugetlbfs.
>
> Looks OK to me.
>
> Given that the bug has been there for four years, I'm assuming that
> we'll be OK merging this fix into 3.4.  Or we could merge it into 3.4
> and tag it for backporting into earlier kernels - it depends on whether
> people are hurting from it, which I don't know?

We've gotten a few lockdep reports about it in Fedora on various kernels.
A CC to stable might be nice.

josh

* Re: [PATCH -V2] hugetlbfs: Drop taking inode i_mutex lock from hugetlbfs_read
  2012-03-01 22:40   ` Josh Boyer
@ 2012-03-01 22:44     ` Andrew Morton
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2012-03-01 22:44 UTC
  To: Josh Boyer
  Cc: Aneesh Kumar K.V, linux-mm, mgorman, kamezawa.hiroyu, dhillf,
	viro, hughd, linux-kernel


On Thu, 1 Mar 2012 17:40:41 -0500
Josh Boyer <jwboyer@gmail.com> wrote:

> We've gotten a few lockdep reports about it in Fedora on various kernels.
> A CC to stable might be nice.
> 

On Thu, 1 Mar 2012 17:40:14 -0500
Dave Jones <davej@redhat.com> wrote:

> My testing hits this every day. It's not a real problem, but it's annoying
> to see the lockdep spew constantly.  We've had a couple Fedora users
> report it too in regular day-to-day use as opposed to the hostile
> workloads I use to provoke it.
> 
> FWIW, I'll probably throw it in the Fedora kernels, so if it ends up
> in stable, it'll be one less patch to carry.

OK, thanks guys.  Cc:stable is added.
