[patch 01/28] md: fix loading of out-of-date bitmap.

public inbox for linux-kernel@vger.kernel.org

* [patch 01/28] md: fix loading of out-of-date bitmap.
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 02/28] md: fix some (more) errors with bitmaps on devices larger than 2TB Greg KH
                     ` (26 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, NeilBrown

[-- Attachment #1: md-fix-loading-of-out-of-date-bitmap.patch --]
[-- Type: text/plain, Size: 1726 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: NeilBrown <neilb@suse.de>

commit b74fd2826c5acce20e6f691437b2d19372bc2057 upstream.

When md is loading a bitmap which it knows is out of date, it fills
each page with 1s and writes it back out again.  However the
write_page call makes use of bitmap->file_pages and
bitmap->last_page_size which haven't been set correctly yet.  So this
can sometimes fail.

Move the setting of file_pages and last_page_size to before the call
to write_page.

This bug can cause the assembly of an array to fail, thus making the
data inaccessible.  Hence I think it is a suitable candidate for
-stable.

Reported-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/md/bitmap.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -986,6 +986,9 @@ static int bitmap_init_from_disk(struct 
 			oldindex = index;
 			oldpage = page;
 
+			bitmap->filemap[bitmap->file_pages++] = page;
+			bitmap->last_page_size = count;
+
 			if (outofdate) {
 				/*
 				 * if bitmap is out of date, dirty the
@@ -998,15 +1001,9 @@ static int bitmap_init_from_disk(struct 
 				write_page(bitmap, page, 1);
 
 				ret = -EIO;
-				if (bitmap->flags & BITMAP_WRITE_ERROR) {
-					/* release, page not in filemap yet */
-					put_page(page);
+				if (bitmap->flags & BITMAP_WRITE_ERROR)
 					goto err;
-				}
 			}
-
-			bitmap->filemap[bitmap->file_pages++] = page;
-			bitmap->last_page_size = count;
 		}
 		paddr = kmap_atomic(page, KM_USER0);
 		if (bitmap->flags & BITMAP_HOSTENDIAN)



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [patch 02/28] md: fix some (more) errors with bitmaps on devices larger than 2TB.
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
  2009-05-14 22:51   ` [patch 01/28] md: fix loading of out-of-date bitmap Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 03/28] md/raid10: dont clear bitmap during recovery if array will still be degraded Greg KH
                     ` (25 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, NeilBrown

[-- Attachment #1: md-fix-some-errors-with-bitmaps-on-devices-larger-than-2tb.patch --]
[-- Type: text/plain, Size: 2496 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: NeilBrown <neilb@suse.de>

commit db305e507d554430a69ede901a6308e6ecb72349 upstream.

If a write intent bitmap covers more than 2TB, we sometimes work with
values beyond 32bit, so these need to be sector_t.  This patch
adds the required casts to some unsigned longs that are being shifted
up.

This will affect any raid10 larger than 2TB, or any raid1/4/5/6 with
member devices that are larger than 2TB.
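
As an illustration only (not part of the patch): a minimal user-space
sketch of the truncation described above.  uint32_t stands in for
"unsigned long" on a 32-bit kernel and uint64_t for a 64-bit sector_t;
both, and all function names, are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 64-bit sector_t (assumption for this sketch). */
typedef uint64_t sector_t;

/*
 * Buggy shape: the shift is evaluated in 32 bits, so the high bits
 * are lost before the result is widened to sector_t.
 */
static sector_t chunk_to_sector_truncating(uint32_t chunk, unsigned int shift)
{
	return chunk << shift;
}

/* Fixed shape: widen first, then shift, as the added casts do. */
static sector_t chunk_to_sector(uint32_t chunk, unsigned int shift)
{
	return (sector_t)chunk << shift;
}

void shift_selfcheck(void)
{
	/* 2^25 chunks of 2^7 sectors each is 2^32 sectors, i.e. 2TB. */
	assert(chunk_to_sector_truncating(0x02000000u, 7) == 0);
	assert(chunk_to_sector(0x02000000u, 7) == 0x100000000ULL);
}
```

The first chunk at the 2TB boundary is enough to show the difference:
the truncating form wraps to sector 0, the widened form does not.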

Signed-off-by: NeilBrown <neilb@suse.de>
Reported-by: "Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/md/bitmap.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1013,9 +1013,11 @@ static int bitmap_init_from_disk(struct 
 		kunmap_atomic(paddr, KM_USER0);
 		if (b) {
 			/* if the disk bit is set, set the memory bit */
-			bitmap_set_memory_bits(bitmap, i << CHUNK_BLOCK_SHIFT(bitmap),
-					       ((i+1) << (CHUNK_BLOCK_SHIFT(bitmap)) >= start)
-				);
+			int needed = ((sector_t)(i+1) << (CHUNK_BLOCK_SHIFT(bitmap))
+				      >= start);
+			bitmap_set_memory_bits(bitmap,
+					       (sector_t)i << CHUNK_BLOCK_SHIFT(bitmap),
+					       needed);
 			bit_cnt++;
 			set_page_attr(bitmap, page, BITMAP_PAGE_CLEAN);
 		}
@@ -1151,8 +1153,9 @@ void bitmap_daemon_work(struct bitmap *b
 			spin_lock_irqsave(&bitmap->lock, flags);
 			clear_page_attr(bitmap, page, BITMAP_PAGE_CLEAN);
 		}
-		bmc = bitmap_get_counter(bitmap, j << CHUNK_BLOCK_SHIFT(bitmap),
-					&blocks, 0);
+		bmc = bitmap_get_counter(bitmap,
+					 (sector_t)j << CHUNK_BLOCK_SHIFT(bitmap),
+					 &blocks, 0);
 		if (bmc) {
 /*
   if (j < 100) printk("bitmap: j=%lu, *bmc = 0x%x\n", j, *bmc);
@@ -1166,7 +1169,8 @@ void bitmap_daemon_work(struct bitmap *b
 			} else if (*bmc == 1) {
 				/* we can clear the bit */
 				*bmc = 0;
-				bitmap_count_page(bitmap, j << CHUNK_BLOCK_SHIFT(bitmap),
+				bitmap_count_page(bitmap,
+						  (sector_t)j << CHUNK_BLOCK_SHIFT(bitmap),
 						  -1);
 
 				/* clear the bit */
@@ -1482,7 +1486,7 @@ void bitmap_dirty_bits(struct bitmap *bi
 	unsigned long chunk;
 
 	for (chunk = s; chunk <= e; chunk++) {
-		sector_t sec = chunk << CHUNK_BLOCK_SHIFT(bitmap);
+		sector_t sec = (sector_t)chunk << CHUNK_BLOCK_SHIFT(bitmap);
 		bitmap_set_memory_bits(bitmap, sec, 1);
 		bitmap_file_set_bit(bitmap, sec);
 	}




* [patch 03/28] md/raid10: dont clear bitmap during recovery if array will still be degraded.
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
  2009-05-14 22:51   ` [patch 01/28] md: fix loading of out-of-date bitmap Greg KH
  2009-05-14 22:51   ` [patch 02/28] md: fix some (more) errors with bitmaps on devices larger than 2TB Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 04/28] md: remove ability to explicit set an inactive array to clean Greg KH
                     ` (24 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, NeilBrown

[-- Attachment #1: md-raid10-don-t-clear-bitmap-during-recovery-if-array-will-still-be-degraded.patch --]
[-- Type: text/plain, Size: 1735 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: NeilBrown <neilb@suse.de>

commit 18055569127253755d01733f6ecc004ed02f88d0 upstream.

If we have a raid10 with multiple missing devices, and we recover just
one of these to a spare, then we risk (depending on the bitmap and
array chunk size) clearing bits of the bitmap for which recovery isn't
complete (because a device is still missing).

This can lead to a subsequent "re-add" being recovered without
any IO happening, which would result in loss of data.

This patch takes the safe approach of not clearing bitmap bits
if the array will still be degraded.

This patch is suitable for all active -stable kernels.
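
As an illustration only (not part of the patch): a simplified sketch of
the "will the array still be degraded" test, scanning every member of
the array rather than only the copies of one section.  struct member
and the function name are hypothetical stand-ins for conf->mirrors[]
and the loop in sync_request().

```c
#include <assert.h>

/* Hypothetical stand-in for one entry of conf->mirrors[]. */
struct member {
	int present;	/* models rdev != NULL */
	int faulty;	/* models test_bit(Faulty, &rdev->flags) */
};

/* Returns 1 if any member of the whole array is missing or faulty. */
static int array_still_degraded(const struct member *m, int raid_disks)
{
	int j;

	for (j = 0; j < raid_disks; j++)
		if (!m[j].present || m[j].faulty)
			return 1;
	return 0;
}

void degraded_selfcheck(void)
{
	struct member healthy[4] = { {1, 0}, {1, 0}, {1, 0}, {1, 0} };
	struct member missing[4] = { {1, 0}, {1, 0}, {1, 0}, {0, 0} };

	assert(array_still_degraded(healthy, 4) == 0);
	assert(array_still_degraded(missing, 4) == 1);
}
```

When this returns 1, the caller would pass still_degraded = 1 to
bitmap_start_sync() so that bitmap bits are left set.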

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/md/raid10.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1805,17 +1805,17 @@ static sector_t sync_request(mddev_t *md
 				r10_bio->sector = sect;
 
 				raid10_find_phys(conf, r10_bio);
-				/* Need to check if this section will still be
+
+				/* Need to check if the array will still be
 				 * degraded
 				 */
-				for (j=0; j<conf->copies;j++) {
-					int d = r10_bio->devs[j].devnum;
-					if (conf->mirrors[d].rdev == NULL ||
-					    test_bit(Faulty, &conf->mirrors[d].rdev->flags)) {
+				for (j=0; j<conf->raid_disks; j++)
+					if (conf->mirrors[j].rdev == NULL ||
+					    test_bit(Faulty, &conf->mirrors[j].rdev->flags)) {
 						still_degraded = 1;
 						break;
 					}
-				}
+
 				must_sync = bitmap_start_sync(mddev->bitmap, sect,
 							      &sync_blocks, still_degraded);
 




* [patch 04/28] md: remove ability to explicit set an inactive array to clean.
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (2 preceding siblings ...)
  2009-05-14 22:51   ` [patch 03/28] md/raid10: dont clear bitmap during recovery if array will still be degraded Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 05/28] USB: Gadget: fix UTF conversion in the usbstring library Greg KH
                     ` (23 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Dan Williams, NeilBrown

[-- Attachment #1: md-remove-ability-to-explicit-set-an-inactive-array-to-clean.patch --]
[-- Type: text/plain, Size: 1861 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: NeilBrown <neilb@suse.de>

commit 5bf295975416f8e97117bbbcfb0191c00bc3e2b4 upstream.

Being able to write 'clean' to an 'array_state' of an inactive array
to activate it in 'clean' mode is both unnecessary and inconvenient.

It is unnecessary because the same can be achieved by writing
'active'.  This activates an array, but it still remains 'clean'
until the first write.

It is inconvenient because writing 'clean' is more often used to
cause an 'active' array to revert to 'clean' mode (thus blocking
any writes until a 'write-pending' is promoted to 'active').

Allowing 'clean' to both activate an array and mark an active array as
clean can lead to races:  One program writes 'clean' to mark the
active array as clean at the same time as another program writes
'inactive' to deactivate (stop) an active array.  Depending on which
writes first, the array could be deactivated and immediately
reactivated which isn't what was desired.

So just disable the use of 'clean' to activate an array.

This avoids a race that can be triggered with mdadm-3.0 and external
metadata, so it is suitable for -stable.

Reported-by: Rafal Marszewski <rafal.marszewski@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/md/md.c |    7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2772,11 +2772,8 @@ array_state_store(mddev_t *mddev, const 
 			} else
 				err = -EBUSY;
 			spin_unlock_irq(&mddev->write_lock);
-		} else {
-			mddev->ro = 0;
-			mddev->recovery_cp = MaxSector;
-			err = do_md_run(mddev);
-		}
+		} else
+			err = -EINVAL;
 		break;
 	case active:
 		if (mddev->pers) {


* [patch 05/28] USB: Gadget: fix UTF conversion in the usbstring library
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (3 preceding siblings ...)
  2009-05-14 22:51   ` [patch 04/28] md: remove ability to explicit set an inactive array to clean Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 06/28] dup2: Fix return value with oldfd == newfd and invalid fd Greg KH
                     ` (22 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Alan Stern, David Brownell

[-- Attachment #1: usb-gadget-fix-utf-conversion-in-the-usbstring-library.patch --]
[-- Type: text/plain, Size: 1396 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Alan Stern <stern@rowland.harvard.edu>

commit 0f43158caddcbb110916212ebe4e39993ae70864 upstream.

This patch (as1234) fixes a bug in the UTF-8 -> UTF-16 conversion
routine in the gadget/usbstring library.  In a UTF-8 multi-byte
sequence, all bytes after the first should have their high-order
two bits set to 10, not 11.
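
As an illustration only (not part of the patch): a small user-space
sketch of the continuation-byte test for a two-byte sequence.  The
function names are made up for the example, and the sketch does not
reject overlong encodings.

```c
#include <assert.h>
#include <stdint.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_utf8_continuation(uint8_t c)
{
	return (c & 0xc0) == 0x80;
}

/*
 * Decode a two-byte sequence (110xxxxx 10xxxxxx) into a code point.
 * Returns -1 if either byte does not have the expected high bits.
 */
static int decode_utf8_2byte(uint8_t b0, uint8_t b1)
{
	if ((b0 & 0xe0) != 0xc0 || !is_utf8_continuation(b1))
		return -1;
	return ((b0 & 0x1f) << 6) | (b1 & 0x3f);
}

void utf8_selfcheck(void)
{
	/* U+00E9 is encoded as 0xC3 0xA9. */
	assert(decode_utf8_2byte(0xc3, 0xa9) == 0xe9);
	/* A lead byte (11xxxxxx) in second position is malformed. */
	assert(decode_utf8_2byte(0xc3, 0xc3) == -1);
}
```

With the old comparison against 0xc0, the valid pair 0xC3 0xA9 would
have been rejected and the malformed pair 0xC3 0xC3 accepted.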

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/usb/gadget/usbstring.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/drivers/usb/gadget/usbstring.c
+++ b/drivers/usb/gadget/usbstring.c
@@ -38,7 +38,7 @@ static int utf8_to_utf16le(const char *s
 				uchar = (c & 0x1f) << 6;
 
 				c = (u8) *s++;
-				if ((c & 0xc0) != 0xc0)
+				if ((c & 0xc0) != 0x80)
 					goto fail;
 				c &= 0x3f;
 				uchar |= c;
@@ -49,13 +49,13 @@ static int utf8_to_utf16le(const char *s
 				uchar = (c & 0x0f) << 12;
 
 				c = (u8) *s++;
-				if ((c & 0xc0) != 0xc0)
+				if ((c & 0xc0) != 0x80)
 					goto fail;
 				c &= 0x3f;
 				uchar |= c << 6;
 
 				c = (u8) *s++;
-				if ((c & 0xc0) != 0xc0)
+				if ((c & 0xc0) != 0x80)
 					goto fail;
 				c &= 0x3f;
 				uchar |= c;




* [patch 06/28] dup2: Fix return value with oldfd == newfd and invalid fd
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (4 preceding siblings ...)
  2009-05-14 22:51   ` [patch 05/28] USB: Gadget: fix UTF conversion in the usbstring library Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 07/28] i2c-algo-bit: Fix timeout test Greg KH
                     ` (21 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Jeff Mahoney

[-- Attachment #1: dup2-fix-return-value-with-oldfd-newfd-and-invalid-fd.patch --]
[-- Type: text/plain, Size: 1361 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Jeff Mahoney <jeffm@suse.com>

commit 2b79bc4f7ebbd5af3c8b867968f9f15602d5f802 upstream.

The return value of dup2 when oldfd == newfd and the fd isn't valid is
not getting properly sign extended.  We end up with 4294967287 instead
of -EBADF.

I've reproduced this on SLE11 (2.6.27.21), openSUSE Factory
(2.6.29-rc5), and Ubuntu 9.04 (2.6.28).

This patch uses a signed int for the error value so it is properly
extended.

Commit 6c5d0512a091480c9f981162227fdb1c9d70e555 introduced this
regression.
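
As an illustration only (not part of the patch): a user-space sketch of
why storing the error in an unsigned int loses the sign when the value
is widened to long.  The function names and the fd_is_valid parameter
are hypothetical; the real code calls fcheck_files().

```c
#include <assert.h>
#include <errno.h>

/* Buggy shape: the error lands in an unsigned int, then widens to long. */
static long dup2_same_fd_buggy(unsigned int oldfd, int fd_is_valid)
{
	if (!fd_is_valid)
		oldfd = -EBADF;
	return oldfd;	/* zero-extended when long is 64 bits */
}

/* Fixed shape: a signed int carries the error, so it is sign-extended. */
static long dup2_same_fd_fixed(unsigned int oldfd, int fd_is_valid)
{
	int retval = oldfd;

	if (!fd_is_valid)
		retval = -EBADF;
	return retval;
}

void dup2_selfcheck(void)
{
	assert(dup2_same_fd_fixed(5, 1) == 5);
	assert(dup2_same_fd_fixed(5, 0) == -EBADF);
	/* With a 64-bit long the buggy variant is a large positive value. */
	if (sizeof(long) == 8)
		assert(dup2_same_fd_buggy(5, 0) > 0);
}
```

On Linux, where EBADF is 9, the buggy variant on a 64-bit system gives
2^32 - 9 = 4294967287, the value quoted above.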

Reported-by: Jiri Dluhos <jdluhos@novell.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/fcntl.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -117,11 +117,13 @@ SYSCALL_DEFINE2(dup2, unsigned int, oldf
 {
 	if (unlikely(newfd == oldfd)) { /* corner case */
 		struct files_struct *files = current->files;
+		int retval = oldfd;
+
 		rcu_read_lock();
 		if (!fcheck_files(files, oldfd))
-			oldfd = -EBADF;
+			retval = -EBADF;
 		rcu_read_unlock();
-		return oldfd;
+		return retval;
 	}
 	return sys_dup3(oldfd, newfd, 0);
 }




* [patch 07/28] i2c-algo-bit: Fix timeout test
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (5 preceding siblings ...)
  2009-05-14 22:51   ` [patch 06/28] dup2: Fix return value with oldfd == newfd and invalid fd Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 08/28] i2c-algo-pca: Let PCA9564 recover from unacked data byte (state 0x30) Greg KH
                     ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Dave Airlie, Jean Delvare

[-- Attachment #1: i2c-algo-bit-fix-timeout-test.patch --]
[-- Type: text/plain, Size: 1187 bytes --]


2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Dave Airlie <airlied@redhat.com>

commit 0cdba07bb23cdd3e0d64357ec3d983e6b75e541f upstream

When fetching DDC using i2c algo bit, we were often seeing timeouts
before getting valid EDID on a retry. The VESA spec states 2ms is the
DDC timeout, so when this translates into 1 jiffy and we are close
to the end of the time period, it could return with a timeout less than
2ms.

Change this code to use time_after instead of time_after_eq.
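
As an illustration only (not part of the patch): a user-space sketch of
the two comparisons.  The helpers use, to my understanding, the same
arithmetic as the kernel's time_after() and time_after_eq() macros, but
they are written as plain functions under made-up names.

```c
#include <assert.h>
#include <limits.h>

/* True once tick counter a is strictly past b (wrap-safe). */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True once tick counter a has reached or passed b (wrap-safe). */
static int tick_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

void timeout_selfcheck(void)
{
	unsigned long start = 1000, timeout = 1;

	/*
	 * After one tick boundary, anywhere between almost nothing and
	 * one full tick of real time has elapsed since start was read.
	 */
	assert(tick_after_eq(1001, start + timeout));	/* old test fires */
	assert(!tick_after(1001, start + timeout));	/* new test waits */
	assert(tick_after(1002, start + timeout));	/* a full tick passed */

	/* The comparison also survives counter wrap-around. */
	start = ULONG_MAX;
	assert(!tick_after(0, start + timeout));
	assert(tick_after(1, start + timeout));
}
```

With a one-tick timeout, the strict comparison guarantees that at least
one whole tick has elapsed before -ETIMEDOUT is returned.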

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/i2c/algos/i2c-algo-bit.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/i2c/algos/i2c-algo-bit.c
+++ b/drivers/i2c/algos/i2c-algo-bit.c
@@ -104,7 +104,7 @@ static int sclhi(struct i2c_algo_bit_dat
 		 * chips may hold it low ("clock stretching") while they
 		 * are processing data internally.
 		 */
-		if (time_after_eq(jiffies, start + adap->timeout))
+		if (time_after(jiffies, start + adap->timeout))
 			return -ETIMEDOUT;
 		cond_resched();
 	}




* [patch 08/28] i2c-algo-pca: Let PCA9564 recover from unacked data byte (state 0x30)
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (6 preceding siblings ...)
  2009-05-14 22:51   ` [patch 07/28] i2c-algo-bit: Fix timeout test Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 09/28] mm: page_mkwrite change prototype to match fault Greg KH
                     ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Enrik Berkhan, Jean Delvare

[-- Attachment #1: i2c-algo-pca-let-pca9564-recover-from-unacked-data-byte.patch --]
[-- Type: text/plain, Size: 2086 bytes --]


2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Enrik Berkhan <Enrik.Berkhan@ge.com>

commit 2196d1cf4afab93fb64c2e5b417096e49b661612 upstream

Currently, the i2c-algo-pca driver does nothing if the chip enters state
0x30 (Data byte in I2CDAT has been transmitted; NOT ACK has been
received).  Thus, the i2c bus connected to the controller gets stuck
afterwards.

I have seen this kind of error on a custom board in certain load
situations most probably caused by interference or noise.

A possible reaction is to let the controller generate a STOP condition.
This is documented in the PCA9564 data sheet (2006-09-01) and the same
is done for other NACK states as well.

Further, state 0x38 isn't handled completely, either. Try to do another
START in this case like the data sheet says. As this couldn't be tested,
I've added a comment to try to reset the chip if the START doesn't help
as suggested by Wolfram Sang.

Signed-off-by: Enrik Berkhan <Enrik.Berkhan@ge.com>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/i2c/algos/i2c-algo-pca.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

--- a/drivers/i2c/algos/i2c-algo-pca.c
+++ b/drivers/i2c/algos/i2c-algo-pca.c
@@ -270,10 +270,21 @@ static int pca_xfer(struct i2c_adapter *
 
 		case 0x30: /* Data byte in I2CDAT has been transmitted; NOT ACK has been received */
 			DEB2("NOT ACK received after data byte\n");
+			pca_stop(adap);
 			goto out;
 
 		case 0x38: /* Arbitration lost during SLA+W, SLA+R or data bytes */
 			DEB2("Arbitration lost\n");
+			/*
+			 * The PCA9564 data sheet (2006-09-01) says "A
+			 * START condition will be transmitted when the
+			 * bus becomes free (STOP or SCL and SDA high)"
+			 * when the STA bit is set (p. 11).
+			 *
+			 * In case this won't work, try pca_reset()
+			 * instead.
+			 */
+			pca_start(adap);
 			goto out;
 
 		case 0x58: /* Data byte has been received; NOT ACK has been returned */




* [patch 09/28] mm: page_mkwrite change prototype to match fault
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (7 preceding siblings ...)
  2009-05-14 22:51   ` [patch 08/28] i2c-algo-pca: Let PCA9564 recover from unacked data byte (state 0x30) Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 10/28] fs: fix page_mkwrite error cases in core code and btrfs Greg KH
                     ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-fsdevel, linux-mm, Nick Piggin, Chris Mason,
	Trond Myklebust, Miklos Szeredi, Steven Whitehouse, Mark Fasheh,
	Joel Becker, Artem Bityutskiy, Felix Blyakher

[-- Attachment #1: mm-page_mkwrite-change-prototype-to-match-fault.patch --]
[-- Type: text/plain, Size: 11624 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Nick Piggin <npiggin@suse.de>

commit c2ec175c39f62949438354f603f4aa170846aabb upstream


mm: page_mkwrite change prototype to match fault

Change the page_mkwrite prototype to take a struct vm_fault, and return
VM_FAULT_xxx flags.  There should be no functional change.

This makes it possible to return much more detailed error information to
the VM (and can also provide more information, e.g. virtual_address, to the
driver, which might be important in some special cases).

This is required for a subsequent fix, and will also make it easier to
merge page_mkwrite() with fault() in the future.
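
As an illustration only (not part of the patch): a user-space sketch of
the conversion pattern applied to each handler.  The struct
definitions, the VM_FAULT_SIGBUS value and both function names are
stand-ins assumed for the example; the real ones come from
<linux/mm.h>.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel types and flag. */
#define VM_FAULT_SIGBUS 0x0002	/* value assumed for illustration */
struct page { int id; };
struct vm_area_struct { int unused; };
struct vm_fault { struct page *page; };

/* Old-style body: takes the page, returns 0 or a negative errno. */
static int example_mkwrite_old(struct vm_area_struct *vma, struct page *page)
{
	(void)vma;
	return page->id < 0 ? -ENOSPC : 0;
}

/*
 * New-style handler: takes a struct vm_fault, pulls the page out of
 * it, and maps any error to VM_FAULT_SIGBUS.
 */
static int example_page_mkwrite(struct vm_area_struct *vma,
				struct vm_fault *vmf)
{
	struct page *page = vmf->page;
	int ret = example_mkwrite_old(vma, page);

	if (ret)
		ret = VM_FAULT_SIGBUS;
	return ret;
}

void mkwrite_selfcheck(void)
{
	struct vm_area_struct vma = { 0 };
	struct page good = { 1 }, bad = { -1 };
	struct vm_fault vmf = { &good };

	assert(example_page_mkwrite(&vma, &vmf) == 0);
	vmf.page = &bad;
	assert(example_page_mkwrite(&vma, &vmf) == VM_FAULT_SIGBUS);
}
```

Because every error collapses to VM_FAULT_SIGBUS, user space sees the
same SIGBUS as before, which is why no functional change is expected.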

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 Documentation/filesystems/Locking |    2 +-
 drivers/video/fb_defio.c          |    3 ++-
 fs/buffer.c                       |    6 +++++-
 fs/ext4/ext4.h                    |    2 +-
 fs/ext4/inode.c                   |    5 ++++-
 fs/fuse/file.c                    |    3 ++-
 fs/gfs2/ops_file.c                |    5 ++++-
 fs/nfs/file.c                     |    5 ++++-
 fs/ocfs2/mmap.c                   |    6 ++++--
 fs/ubifs/file.c                   |    9 ++++++---
 fs/xfs/linux-2.6/xfs_file.c       |    4 ++--
 include/linux/buffer_head.h       |    2 +-
 include/linux/mm.h                |    3 ++-
 mm/memory.c                       |   26 ++++++++++++++++++++++----
 14 files changed, 60 insertions(+), 21 deletions(-)

--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -502,7 +502,7 @@ prototypes:
 	void (*open)(struct vm_area_struct*);
 	void (*close)(struct vm_area_struct*);
 	int (*fault)(struct vm_area_struct*, struct vm_fault *);
-	int (*page_mkwrite)(struct vm_area_struct *, struct page *);
+	int (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *);
 	int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
 
 locking rules:
--- a/drivers/video/fb_defio.c
+++ b/drivers/video/fb_defio.c
@@ -70,8 +70,9 @@ EXPORT_SYMBOL_GPL(fb_deferred_io_fsync);
 
 /* vm_ops->page_mkwrite handler */
 static int fb_deferred_io_mkwrite(struct vm_area_struct *vma,
-				  struct page *page)
+				  struct vm_fault *vmf)
 {
+	struct page *page = vmf->page;
 	struct fb_info *info = vma->vm_private_data;
 	struct fb_deferred_io *fbdefio = info->fbdefio;
 	struct page *cur;
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2402,9 +2402,10 @@ int block_commit_write(struct page *page
  * unlock the page.
  */
 int
-block_page_mkwrite(struct vm_area_struct *vma, struct page *page,
+block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 		   get_block_t get_block)
 {
+	struct page *page = vmf->page;
 	struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
 	unsigned long end;
 	loff_t size;
@@ -2429,6 +2430,9 @@ block_page_mkwrite(struct vm_area_struct
 		ret = block_commit_write(page, 0, end);
 
 out_unlock:
+	if (ret)
+		ret = VM_FAULT_SIGBUS;
+
 	unlock_page(page);
 	return ret;
 }
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1084,7 +1084,7 @@ extern int ext4_meta_trans_blocks(struct
 extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
 extern int ext4_block_truncate_page(handle_t *handle,
 		struct address_space *mapping, loff_t from);
-extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
+extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 
 /* ioctl.c */
 extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4861,8 +4861,9 @@ static int ext4_bh_unmapped(handle_t *ha
 	return !buffer_mapped(bh);
 }
 
-int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
+	struct page *page = vmf->page;
 	loff_t size;
 	unsigned long len;
 	int ret = -EINVAL;
@@ -4913,6 +4914,8 @@ int ext4_page_mkwrite(struct vm_area_str
 		goto out_unlock;
 	ret = 0;
 out_unlock:
+	if (ret)
+		ret = VM_FAULT_SIGBUS;
 	up_read(&inode->i_alloc_sem);
 	return ret;
 }
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1219,8 +1219,9 @@ static void fuse_vma_close(struct vm_are
  * - sync(2)
  * - try_to_free_pages() with order > PAGE_ALLOC_COSTLY_ORDER
  */
-static int fuse_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+static int fuse_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
+	struct page *page = vmf->page;
 	/*
 	 * Don't use page->mapping as it may become NULL from a
 	 * concurrent truncate.
--- a/fs/gfs2/ops_file.c
+++ b/fs/gfs2/ops_file.c
@@ -338,8 +338,9 @@ static int gfs2_allocate_page_backing(st
  * blocks allocated on disk to back that page.
  */
 
-static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+static int gfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
+	struct page *page = vmf->page;
 	struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
 	struct gfs2_inode *ip = GFS2_I(inode);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
@@ -411,6 +412,8 @@ out_unlock:
 	gfs2_glock_dq(&gh);
 out:
 	gfs2_holder_uninit(&gh);
+	if (ret)
+		ret = VM_FAULT_SIGBUS;
 	return ret;
 }
 
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -448,8 +448,9 @@ const struct address_space_operations nf
 	.launder_page = nfs_launder_page,
 };
 
-static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
+	struct page *page = vmf->page;
 	struct file *filp = vma->vm_file;
 	struct dentry *dentry = filp->f_path.dentry;
 	unsigned pagelen;
@@ -480,6 +481,8 @@ static int nfs_vm_page_mkwrite(struct vm
 		ret = pagelen;
 out_unlock:
 	unlock_page(page);
+	if (ret)
+		ret = VM_FAULT_SIGBUS;
 	return ret;
 }
 
--- a/fs/ocfs2/mmap.c
+++ b/fs/ocfs2/mmap.c
@@ -150,8 +150,9 @@ out:
 	return ret;
 }
 
-static int ocfs2_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+static int ocfs2_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
+	struct page *page = vmf->page;
 	struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
 	struct buffer_head *di_bh = NULL;
 	sigset_t blocked, oldset;
@@ -192,7 +193,8 @@ out:
 	ret2 = ocfs2_vm_op_unblock_sigs(&oldset);
 	if (ret2 < 0)
 		mlog_errno(ret2);
-
+	if (ret)
+		ret = VM_FAULT_SIGBUS;
 	return ret;
 }
 
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1140,8 +1140,9 @@ static int ubifs_releasepage(struct page
  * mmap()d file has taken write protection fault and is being made
  * writable. UBIFS must ensure page is budgeted for.
  */
-static int ubifs_vm_page_mkwrite(struct vm_area_struct *vma, struct page *page)
+static int ubifs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
+	struct page *page = vmf->page;
 	struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
 	struct ubifs_info *c = inode->i_sb->s_fs_info;
 	struct timespec now = ubifs_current_time(inode);
@@ -1153,7 +1154,7 @@ static int ubifs_vm_page_mkwrite(struct 
 	ubifs_assert(!(inode->i_sb->s_flags & MS_RDONLY));
 
 	if (unlikely(c->ro_media))
-		return -EROFS;
+		return VM_FAULT_SIGBUS; /* -EROFS */
 
 	/*
 	 * We have not locked @page so far so we may budget for changing the
@@ -1186,7 +1187,7 @@ static int ubifs_vm_page_mkwrite(struct 
 		if (err == -ENOSPC)
 			ubifs_warn("out of space for mmapped file "
 				   "(inode number %lu)", inode->i_ino);
-		return err;
+		return VM_FAULT_SIGBUS;
 	}
 
 	lock_page(page);
@@ -1226,6 +1227,8 @@ static int ubifs_vm_page_mkwrite(struct 
 out_unlock:
 	unlock_page(page);
 	ubifs_release_budget(c, &req);
+	if (err)
+		err = VM_FAULT_SIGBUS;
 	return err;
 }
 
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -427,9 +427,9 @@ xfs_file_ioctl_invis(
 STATIC int
 xfs_vm_page_mkwrite(
 	struct vm_area_struct	*vma,
-	struct page		*page)
+	struct vm_fault		*vmf)
 {
-	return block_page_mkwrite(vma, page, xfs_get_blocks);
+	return block_page_mkwrite(vma, vmf, xfs_get_blocks);
 }
 
 const struct file_operations xfs_file_operations = {
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -222,7 +222,7 @@ int cont_write_begin(struct file *, stru
 			get_block_t *, loff_t *);
 int generic_cont_expand_simple(struct inode *inode, loff_t size);
 int block_commit_write(struct page *page, unsigned from, unsigned to);
-int block_page_mkwrite(struct vm_area_struct *vma, struct page *page,
+int block_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf,
 				get_block_t get_block);
 void block_sync_page(struct page *);
 sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *);
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -138,6 +138,7 @@ extern pgprot_t protection_map[16];
 
 #define FAULT_FLAG_WRITE	0x01	/* Fault was a write access */
 #define FAULT_FLAG_NONLINEAR	0x02	/* Fault was via a nonlinear mapping */
+#define FAULT_FLAG_MKWRITE	0x04	/* Fault was mkwrite of existing pte */
 
 
 /*
@@ -173,7 +174,7 @@ struct vm_operations_struct {
 
 	/* notification that a previously read-only page is about to become
 	 * writable, if an error is returned it will cause a SIGBUS */
-	int (*page_mkwrite)(struct vm_area_struct *vma, struct page *page);
+	int (*page_mkwrite)(struct vm_area_struct *vma, struct vm_fault *vmf);
 
 	/* called by access_process_vm when get_user_pages() fails, typically
 	 * for use by special VMAs that can switch between memory and hardware
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1801,6 +1801,15 @@ static int do_wp_page(struct mm_struct *
 		 * get_user_pages(.write=1, .force=1).
 		 */
 		if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
+			struct vm_fault vmf;
+			int tmp;
+
+			vmf.virtual_address = (void __user *)(address &
+								PAGE_MASK);
+			vmf.pgoff = old_page->index;
+			vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
+			vmf.page = old_page;
+
 			/*
 			 * Notify the address space that the page is about to
 			 * become writable so that it can prohibit this or wait
@@ -1812,8 +1821,12 @@ static int do_wp_page(struct mm_struct *
 			page_cache_get(old_page);
 			pte_unmap_unlock(page_table, ptl);
 
-			if (vma->vm_ops->page_mkwrite(vma, old_page) < 0)
+			tmp = vma->vm_ops->page_mkwrite(vma, &vmf);
+			if (unlikely(tmp &
+					(VM_FAULT_ERROR | VM_FAULT_NOPAGE))) {
+				ret = tmp;
 				goto unwritable_page;
+			}
 
 			/*
 			 * Since we dropped the lock we need to revalidate
@@ -1955,7 +1968,7 @@ oom:
 
 unwritable_page:
 	page_cache_release(old_page);
-	return VM_FAULT_SIGBUS;
+	return ret;
 }
 
 /*
@@ -2472,9 +2485,14 @@ static int __do_fault(struct mm_struct *
 			 * to become writable
 			 */
 			if (vma->vm_ops->page_mkwrite) {
+				int tmp;
+
 				unlock_page(page);
-				if (vma->vm_ops->page_mkwrite(vma, page) < 0) {
-					ret = VM_FAULT_SIGBUS;
+				vmf.flags |= FAULT_FLAG_MKWRITE;
+				tmp = vma->vm_ops->page_mkwrite(vma, &vmf);
+				if (unlikely(tmp &
+					  (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) {
+					ret = tmp;
 					anon = 1; /* no anon but release vmf.page */
 					goto out_unlocked;
 				}




* [patch 10/28] fs: fix page_mkwrite error cases in core code and btrfs
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (8 preceding siblings ...)
  2009-05-14 22:51   ` [patch 09/28] mm: page_mkwrite change prototype to match fault Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 11/28] mm: close page_mkwrite races Greg KH
                     ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-fsdevel, linux-mm, Chris Mason, Nick Piggin

[-- Attachment #1: fs-fix-page_mkwrite-error-cases-in-core-code-and-btrfs.patch --]
[-- Type: text/plain, Size: 2330 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Nick Piggin <npiggin@suse.de>

commit 56a76f8275c379ed73c8a43cfa1dfa2f5e9cfa19 upstream

fs: fix page_mkwrite error cases in core code and btrfs

page_mkwrite is called with neither the page lock nor the ptl held.  This
means a page can be concurrently truncated or invalidated out from
underneath it.  Callers are supposed to prevent truncate races themselves;
previously, however, the only thing they could do on hitting one was to
raise a SIGBUS.  A SIGBUS is wrong for the case that the page has been
invalidated or truncated within i_size (e.g. hole punched).  Callers may
also have to perform memory allocations in this path, where again, SIGBUS
would be wrong.

The previous patch ("mm: page_mkwrite change prototype to match fault")
made it possible to properly specify errors.  Convert the generic buffer.c
code and btrfs to return sane error values (in the case of page removed
from pagecache, VM_FAULT_NOPAGE will cause the fault handler to exit
without doing anything, and the fault will be retried properly).

This fixes core code, and converts btrfs as a template/example.  All other
filesystems defining their own page_mkwrite should be fixed in a similar
manner.
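
A minimal userspace sketch of the error mapping described above, for
illustration only. The SKETCH_* constants and sketch_mkwrite_result() are
hypothetical names, and the numeric values are assumptions standing in for
the kernel's VM_FAULT_* bit flags; this is not the buffer.c code itself.

```c
#include <assert.h>
#include <errno.h>

/*
 * Illustrative stand-ins for the kernel's VM_FAULT_* bit flags.  The
 * numeric values are assumptions made for this sketch.
 */
#define SKETCH_VM_FAULT_OOM	0x0001
#define SKETCH_VM_FAULT_SIGBUS	0x0002
#define SKETCH_VM_FAULT_NOPAGE	0x0100

/*
 * Hypothetical helper: translate the outcome of a page_mkwrite attempt
 * into a fault code.  'truncated' models the page having been removed
 * from the pagecache; 'err' is 0 or a negative errno.
 */
int sketch_mkwrite_result(int truncated, int err)
{
	/* callers pass 0 or a negative errno, never a positive value */
	assert(err <= 0);

	if (truncated)
		return SKETCH_VM_FAULT_NOPAGE;	/* the VM retries the fault */
	if (err == -ENOMEM)
		return SKETCH_VM_FAULT_OOM;
	if (err)				/* -ENOSPC, -EIO, etc */
		return SKETCH_VM_FAULT_SIGBUS;
	return 0;
}
```

The point of the ordering is that a truncated page is reported as
"retry" rather than as an error, so SIGBUS is reserved for failures the
task should actually see.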

Acked-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/buffer.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2409,7 +2409,7 @@ block_page_mkwrite(struct vm_area_struct
 	struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
 	unsigned long end;
 	loff_t size;
-	int ret = -EINVAL;
+	int ret = VM_FAULT_NOPAGE; /* make the VM retry the fault */

 	lock_page(page);
 	size = i_size_read(inode);
@@ -2429,10 +2429,14 @@ block_page_mkwrite(struct vm_area_struct
 	if (!ret)
 		ret = block_commit_write(page, 0, end);

-out_unlock:
-	if (ret)
-		ret = VM_FAULT_SIGBUS;
+	if (unlikely(ret)) {
+		if (ret == -ENOMEM)
+			ret = VM_FAULT_OOM;
+		else /* -ENOSPC, -EIO, etc */
+			ret = VM_FAULT_SIGBUS;
+	}

+out_unlock:
 	unlock_page(page);
 	return ret;
 }


* [patch 11/28] mm: close page_mkwrite races
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (9 preceding siblings ...)
  2009-05-14 22:51   ` [patch 10/28] fs: fix page_mkwrite error cases in core code and btrfs Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 12/28] GFS2: Fix page_mkwrite() return code Greg KH
                     ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-fsdevel, linux-mm, Sage Weil, Trond Myklebust,
	Nick Piggin, Valdis Kletnieks

[-- Attachment #1: mm-close-page_mkwrite-races.patch --]
[-- Type: text/plain, Size: 10276 bytes --]


2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Nick Piggin <npiggin@suse.de>

commit b827e496c893de0c0f142abfaeb8730a2fd6b37f upstream

mm: close page_mkwrite races

Change page_mkwrite to allow implementations to return with the page
locked, and also change its callers (in page fault paths) to hold the
lock until the page is marked dirty.  This allows the filesystem to have
full control of page dirtying events coming from the VM.

Rather than simply hold the page locked over the page_mkwrite call, we
call page_mkwrite with the page unlocked and allow callers to return with
it locked, so filesystems can avoid LOR conditions with page lock.

The problem with the current scheme is this: a filesystem that wants to
associate some metadata with a page as long as the page is dirty, will
perform this manipulation in its ->page_mkwrite.  It currently then must
return with the page unlocked and may not hold any other locks (according
to existing page_mkwrite convention).

In this window, the VM could write out the page, clearing page-dirty.  The
filesystem has no good way to detect that a dirty pte is about to be
attached, so it will happily write out the page, at which point, the
filesystem may manipulate the metadata to reflect that the page is no
longer dirty.

It is not always possible to perform the required metadata manipulation in
->set_page_dirty, because that function cannot block or fail.  The
filesystem may need to allocate some data structure, for example.

And the VM cannot mark the pte dirty before page_mkwrite, because
page_mkwrite is allowed to fail, so we must not allow any window where the
page could be written to if page_mkwrite does fail.

This solution of holding the page locked over the 3 critical operations
(page_mkwrite, setting the pte dirty, and finally setting the page dirty)
closes out races nicely, preventing page cleaning for writeout being
initiated in that window.  This provides the filesystem with a strong
synchronisation against the VM here.
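
The caller-side convention can be modelled in userspace roughly as
follows. This is a sketch under stated assumptions: struct sketch_page,
sketch_after_mkwrite() and the SKETCH_* names are hypothetical, the flag
value is assumed, and a plain bool stands in for the real page lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_VM_FAULT_LOCKED	0x0200	/* assumed value, illustration only */

/* Toy model of a page: a lock bit and a mapping pointer. */
struct sketch_page {
	bool locked;
	void *mapping;	/* NULL once the page was truncated/invalidated */
};

enum sketch_action { SKETCH_PROCEED, SKETCH_RETRY };

/*
 * Model of what a fault path does after ->page_mkwrite() returned
 * without error, with return value 'tmp'.
 *
 * On SKETCH_PROCEED the page is locked and is meant to stay locked until
 * the caller has set the pte dirty and the page dirty.
 * On SKETCH_RETRY the page is left unlocked and the fault is retried.
 */
enum sketch_action sketch_after_mkwrite(struct sketch_page *page, int tmp)
{
	if (!(tmp & SKETCH_VM_FAULT_LOCKED)) {
		/* handler returned with the page unlocked: lock it here */
		page->locked = true;
		if (!page->mapping) {
			/* page went away in the unlocked window */
			page->locked = false;
			return SKETCH_RETRY;
		}
	} else {
		/* handler promised a locked page */
		assert(page->locked);
	}
	return SKETCH_PROCEED;
}
```

Either way the caller ends up holding the page lock before it touches
the pte, which is what keeps writeout from being started in that window.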

- Sage needs this race closed for ceph filesystem.
- Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
- I need it for fsblock.
- I suspect other filesystems may need it too (eg. btrfs).
- I have converted buffer.c to the new locking. Even simple block allocation
  under dirty pages might be susceptible to i_size changing under partial page
  at the end of file (we also have a buffer.c-side problem here, but it cannot
  be fixed properly without this patch).
- Other filesystems (eg. NFS, maybe btrfs) will need to change their
  page_mkwrite functions themselves.

[ This also moves page_mkwrite another step closer to fault, which should
  eventually allow page_mkwrite to be moved into ->fault, thus avoiding a
  filesystem calldown and page lock/unlock cycle in __do_fault. ]

[akpm@linux-foundation.org: fix derefs of NULL ->mapping]
Cc: Sage Weil <sage@newdream.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


---
 Documentation/filesystems/Locking |   24 +++++---
 fs/buffer.c                       |   10 ++-
 mm/memory.c                       |  108 ++++++++++++++++++++++++++------------
 3 files changed, 98 insertions(+), 44 deletions(-)

--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -509,16 +509,24 @@ locking rules:
 		BKL	mmap_sem	PageLocked(page)
 open:		no	yes
 close:		no	yes
-fault:		no	yes
-page_mkwrite:	no	yes		no
+fault:		no	yes		can return with page locked
+page_mkwrite:	no	yes		can return with page locked
 access:		no	yes
 
-	->page_mkwrite() is called when a previously read-only page is
-about to become writeable. The file system is responsible for
-protecting against truncate races. Once appropriate action has been
-taking to lock out truncate, the page range should be verified to be
-within i_size. The page mapping should also be checked that it is not
-NULL.
+	->fault() is called when a previously not present pte is about
+to be faulted in. The filesystem must find and return the page associated
+with the passed in "pgoff" in the vm_fault structure. If it is possible that
+the page may be truncated and/or invalidated, then the filesystem must lock
+the page, then ensure it is not already truncated (the page lock will block
+subsequent truncate), and then return with VM_FAULT_LOCKED, and the page
+locked. The VM will unlock the page.
+
+	->page_mkwrite() is called when a previously read-only pte is
+about to become writeable. The filesystem again must ensure that there are
+no truncate/invalidate races, and then return with the page locked. If
+the page has been truncated, the filesystem should not look up a new page
+like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which
+will cause the VM to retry the fault.
 
 	->access() is called when get_user_pages() fails in
 acces_process_vm(), typically used to debug a process through
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2416,7 +2416,8 @@ block_page_mkwrite(struct vm_area_struct
 	if ((page->mapping != inode->i_mapping) ||
 	    (page_offset(page) > size)) {
 		/* page got truncated out from underneath us */
-		goto out_unlock;
+		unlock_page(page);
+		goto out;
 	}
 
 	/* page is wholly or partially inside EOF */
@@ -2430,14 +2431,15 @@ block_page_mkwrite(struct vm_area_struct
 		ret = block_commit_write(page, 0, end);
 
 	if (unlikely(ret)) {
+		unlock_page(page);
 		if (ret == -ENOMEM)
 			ret = VM_FAULT_OOM;
 		else /* -ENOSPC, -EIO, etc */
 			ret = VM_FAULT_SIGBUS;
-	}
+	} else
+		ret = VM_FAULT_LOCKED;
 
-out_unlock:
-	unlock_page(page);
+out:
 	return ret;
 }
 
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1827,6 +1827,15 @@ static int do_wp_page(struct mm_struct *
 				ret = tmp;
 				goto unwritable_page;
 			}
+			if (unlikely(!(tmp & VM_FAULT_LOCKED))) {
+				lock_page(old_page);
+				if (!old_page->mapping) {
+					ret = 0; /* retry the fault */
+					unlock_page(old_page);
+					goto unwritable_page;
+				}
+			} else
+				VM_BUG_ON(!PageLocked(old_page));
 
 			/*
 			 * Since we dropped the lock we need to revalidate
@@ -1836,9 +1845,11 @@ static int do_wp_page(struct mm_struct *
 			 */
 			page_table = pte_offset_map_lock(mm, pmd, address,
 							 &ptl);
-			page_cache_release(old_page);
-			if (!pte_same(*page_table, orig_pte))
+			if (!pte_same(*page_table, orig_pte)) {
+				unlock_page(old_page);
+				page_cache_release(old_page);
 				goto unlock;
+			}
 
 			page_mkwrite = 1;
 		}
@@ -1943,9 +1954,6 @@ gotten:
 unlock:
 	pte_unmap_unlock(page_table, ptl);
 	if (dirty_page) {
-		if (vma->vm_file)
-			file_update_time(vma->vm_file);
-
 		/*
 		 * Yes, Virginia, this is actually required to prevent a race
 		 * with clear_page_dirty_for_io() from clearing the page dirty
@@ -1954,16 +1962,41 @@ unlock:
 		 *
 		 * do_no_page is protected similarly.
 		 */
-		wait_on_page_locked(dirty_page);
-		set_page_dirty_balance(dirty_page, page_mkwrite);
+		if (!page_mkwrite) {
+			wait_on_page_locked(dirty_page);
+			set_page_dirty_balance(dirty_page, page_mkwrite);
+		}
 		put_page(dirty_page);
+		if (page_mkwrite) {
+			struct address_space *mapping = dirty_page->mapping;
+
+			set_page_dirty(dirty_page);
+			unlock_page(dirty_page);
+			page_cache_release(dirty_page);
+			if (mapping)	{
+				/*
+				 * Some device drivers do not set page.mapping
+				 * but still dirty their pages
+				 */
+				balance_dirty_pages_ratelimited(mapping);
+			}
+		}
+
+		/* file_update_time outside page_lock */
+		if (vma->vm_file)
+			file_update_time(vma->vm_file);
 	}
 	return ret;
 oom_free_new:
 	page_cache_release(new_page);
 oom:
-	if (old_page)
+	if (old_page) {
+		if (page_mkwrite) {
+			unlock_page(old_page);
+			page_cache_release(old_page);
+		}
 		page_cache_release(old_page);
+	}
 	return VM_FAULT_OOM;
 
 unwritable_page:
@@ -2488,27 +2521,22 @@ static int __do_fault(struct mm_struct *
 				int tmp;
 
 				unlock_page(page);
-				vmf.flags |= FAULT_FLAG_MKWRITE;
+				vmf.flags = FAULT_FLAG_WRITE|FAULT_FLAG_MKWRITE;
 				tmp = vma->vm_ops->page_mkwrite(vma, &vmf);
 				if (unlikely(tmp &
 					  (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) {
 					ret = tmp;
-					anon = 1; /* no anon but release vmf.page */
-					goto out_unlocked;
-				}
-				lock_page(page);
-				/*
-				 * XXX: this is not quite right (racy vs
-				 * invalidate) to unlock and relock the page
-				 * like this, however a better fix requires
-				 * reworking page_mkwrite locking API, which
-				 * is better done later.
-				 */
-				if (!page->mapping) {
-					ret = 0;
-					anon = 1; /* no anon but release vmf.page */
-					goto out;
+					goto unwritable_page;
 				}
+				if (unlikely(!(tmp & VM_FAULT_LOCKED))) {
+					lock_page(page);
+					if (!page->mapping) {
+						ret = 0; /* retry the fault */
+						unlock_page(page);
+						goto unwritable_page;
+					}
+				} else
+					VM_BUG_ON(!PageLocked(page));
 				page_mkwrite = 1;
 			}
 		}
@@ -2565,19 +2593,35 @@ static int __do_fault(struct mm_struct *
 	pte_unmap_unlock(page_table, ptl);
 
 out:
-	unlock_page(vmf.page);
-out_unlocked:
-	if (anon)
-		page_cache_release(vmf.page);
-	else if (dirty_page) {
-		if (vma->vm_file)
-			file_update_time(vma->vm_file);
+	if (dirty_page) {
+		struct address_space *mapping = page->mapping;
 
-		set_page_dirty_balance(dirty_page, page_mkwrite);
+		if (set_page_dirty(dirty_page))
+			page_mkwrite = 1;
+		unlock_page(dirty_page);
 		put_page(dirty_page);
+		if (page_mkwrite && mapping) {
+			/*
+			 * Some device drivers do not set page.mapping but still
+			 * dirty their pages
+			 */
+			balance_dirty_pages_ratelimited(mapping);
+		}
+
+		/* file_update_time outside page_lock */
+		if (vma->vm_file)
+			file_update_time(vma->vm_file);
+	} else {
+		unlock_page(vmf.page);
+		if (anon)
+			page_cache_release(vmf.page);
 	}
 
 	return ret;
+
+unwritable_page:
+	page_cache_release(page);
+	return ret;
 }
 
 static int do_linear_fault(struct mm_struct *mm, struct vm_area_struct *vma,




* [patch 12/28] GFS2: Fix page_mkwrite() return code
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (10 preceding siblings ...)
  2009-05-14 22:51   ` [patch 11/28] mm: close page_mkwrite races Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 13/28] NFS: Fix the return value in nfs_page_mkwrite() Greg KH
                     ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-fsdevel, linux-mm, Steven Whitehouse, Nick Piggin

[-- Attachment #1: gfs2-fix-page_mkwrite-return-code.patch --]
[-- Type: text/plain, Size: 821 bytes --]


2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Steven Whitehouse <swhiteho@redhat.com>

commit e56985da455b9dc0591b8cb2006cc94b6f4fb0f4 upstream

This allows for the possibility of returning VM_FAULT_OOM as
well as VM_FAULT_SIGBUS. This ensures that the correct action
is taken.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


---
 fs/gfs2/ops_file.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/fs/gfs2/ops_file.c
+++ b/fs/gfs2/ops_file.c
@@ -412,7 +412,9 @@ out_unlock:
 	gfs2_glock_dq(&gh);
 out:
 	gfs2_holder_uninit(&gh);
-	if (ret)
+	if (ret == -ENOMEM)
+		ret = VM_FAULT_OOM;
+	else if (ret)
 		ret = VM_FAULT_SIGBUS;
 	return ret;
 }




* [patch 13/28] NFS: Fix the return value in nfs_page_mkwrite()
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (11 preceding siblings ...)
  2009-05-14 22:51   ` [patch 12/28] GFS2: Fix page_mkwrite() return code Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 14/28] NFS: Close page_mkwrite() races Greg KH
                     ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-fsdevel, linux-mm, Trond Myklebust, Nick Piggin

[-- Attachment #1: nfs-fix-the-return-value-in-nfs_page_mkwrite.patch --]
[-- Type: text/plain, Size: 931 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Trond Myklebust <Trond.Myklebust@netapp.com>

commit 2b2ec7554cf7ec5e4412f89a5af6abe8ce950700 upstream

Commit c2ec175c39f62949438354f603f4aa170846aabb ("mm: page_mkwrite
change prototype to match fault") exposed a bug in the NFS
implementation of page_mkwrite.  We should be returning 0 on success...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/nfs/file.c |    2 --
 1 file changed, 2 deletions(-)

--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -477,8 +477,6 @@ static int nfs_vm_page_mkwrite(struct vm
 		goto out_unlock;
 
 	ret = nfs_updatepage(filp, page, 0, pagelen);
-	if (ret == 0)
-		ret = pagelen;
 out_unlock:
 	unlock_page(page);
 	if (ret)




* [patch 14/28] NFS: Close page_mkwrite() races
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (12 preceding siblings ...)
  2009-05-14 22:51   ` [patch 13/28] NFS: Fix the return value in nfs_page_mkwrite() Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 15/28] cifs: Fix buffer size for tcon->nativeFileSystem field Greg KH
                     ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-fsdevel, linux-mm, Trond Myklebust, Nick Piggin

[-- Attachment #1: nfs-close-page_mkwrite-races.patch --]
[-- Type: text/plain, Size: 1062 bytes --]


2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Trond Myklebust <Trond.Myklebust@netapp.com>

commit 7fdf523067666b0eaff330f362401ee50ce187c4 upstream

Follow up to Nick Piggin's patches to ensure that nfs_vm_page_mkwrite
returns with the page lock held, and sets the VM_FAULT_LOCKED flag.

See http://bugzilla.kernel.org/show_bug.cgi?id=12913

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>


---
 fs/nfs/file.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -478,10 +478,10 @@ static int nfs_vm_page_mkwrite(struct vm
 
 	ret = nfs_updatepage(filp, page, 0, pagelen);
 out_unlock:
+	if (!ret)
+		return VM_FAULT_LOCKED;
 	unlock_page(page);
-	if (ret)
-		ret = VM_FAULT_SIGBUS;
-	return ret;
+	return VM_FAULT_SIGBUS;
 }
 
 static struct vm_operations_struct nfs_file_vm_ops = {




* [patch 15/28] cifs: Fix buffer size for tcon->nativeFileSystem field
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (13 preceding siblings ...)
  2009-05-14 22:51   ` [patch 14/28] NFS: Close page_mkwrite() races Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 16/28] cifs: Increase size of tmp_buf in cifs_readdir to avoid potential overflows Greg KH
                     ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Steve French, Jeff Layton, Steve French, Suresh Jayaraman

[-- Attachment #1: cifs-fix-buffer-size-for-tcon-nativefilesystem-field.patch --]
[-- Type: text/plain, Size: 1654 bytes --]


2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Jeff Layton <jlayton@redhat.com>

Commit f083def68f84b04fe3f97312498911afce79609e refreshed.

cifs: fix buffer size for tcon->nativeFileSystem field

The buffer for this was resized recently to fix a bug. It's still
possible however that a malicious server could overflow this field
by sending characters in it that are >2 bytes in the local charset.
Double the size of the buffer to account for this possibility.
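
To illustrate the sizing argument, here is a small userspace sketch. It
assumes the local charset is UTF-8, where a 16-bit code unit can expand
to 3 bytes; sketch_dst_size() and sketch_ucs2_to_utf8() are hypothetical
helpers written for this example and are not the cifs conversion code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Worst-case destination size in the style of the patch: (4 * length) + 2. */
size_t sketch_dst_size(size_t src_units)
{
	return 4 * src_units + 2;
}

/*
 * Encode 'n' 16-bit code units as UTF-8 into 'dst' (capacity 'dst_size')
 * and NUL-terminate.  Each unit is encoded on its own (1 to 3 bytes);
 * surrogate pairs are not combined in this sketch.
 * Returns the number of bytes written, excluding the NUL, or (size_t)-1
 * if the destination is too small.
 */
size_t sketch_ucs2_to_utf8(char *dst, size_t dst_size,
			   const uint16_t *src, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		uint16_t c = src[i];
		size_t need = c < 0x80 ? 1 : (c < 0x800 ? 2 : 3);

		/* keep one byte in reserve for the terminating NUL */
		if (out + need + 1 > dst_size)
			return (size_t)-1;
		if (need == 1) {
			dst[out++] = (char)c;
		} else if (need == 2) {
			dst[out++] = (char)(0xC0 | (c >> 6));
			dst[out++] = (char)(0x80 | (c & 0x3F));
		} else {
			dst[out++] = (char)(0xE0 | (c >> 12));
			dst[out++] = (char)(0x80 | ((c >> 6) & 0x3F));
			dst[out++] = (char)(0x80 | (c & 0x3F));
		}
	}
	if (out + 1 > dst_size)
		return (size_t)-1;
	dst[out] = '\0';
	return out;
}
```

With three code units of U+20AC, a buffer of 2*(length + 1) = 8 bytes is
too small for the 9 bytes of output plus NUL, while (4 * length) + 2 = 14
bytes leaves room to spare.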

Also get rid of some really strange and seemingly pointless NULL
termination. It's NULL terminating the string in the source buffer,
but by the time that happens, we've already copied the string.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Cc: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/cifs/connect.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -3549,16 +3549,12 @@ CIFSTCon(unsigned int xid, struct cifsSe
 			    BCC(smb_buffer_response)) {
 				kfree(tcon->nativeFileSystem);
 				tcon->nativeFileSystem =
-				    kzalloc(2*(length + 1), GFP_KERNEL);
+				    kzalloc((4 * length) + 2, GFP_KERNEL);
 				if (tcon->nativeFileSystem)
 					cifs_strfromUCS_le(
 						tcon->nativeFileSystem,
 						(__le16 *) bcc_ptr,
 						length, nls_codepage);
-				bcc_ptr += 2 * length;
-				bcc_ptr[0] = 0;	/* null terminate the string */
-				bcc_ptr[1] = 0;
-				bcc_ptr += 2;
 			}
 			/* else do not bother copying these information fields*/
 		} else {




* [patch 16/28] cifs: Increase size of tmp_buf in cifs_readdir to avoid potential overflows
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (14 preceding siblings ...)
  2009-05-14 22:51   ` [patch 15/28] cifs: Fix buffer size for tcon->nativeFileSystem field Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 17/28] cifs: Fix incorrect destination buffer size in cifs_strncpy_to_host Greg KH
                     ` (11 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Steve French, Jeff Layton, Suresh Jayaraman, Steve French

[-- Attachment #1: cifs-increase-size-of-tmp_buf-in-cifs_readdir-to-avoid-potential-overflows.patch --]
[-- Type: text/plain, Size: 1835 bytes --]


2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Suresh Jayaraman <sjayaraman@suse.de>

Commit 7b0c8fcff47a885743125dd843db64af41af5a61 refreshed to use
a #define from commit f58841666bc22e827ca0dcef7b71c7bc2758ce82.

cifs: Increase size of tmp_buf in cifs_readdir to avoid potential overflows

Increase size of tmp_buf to possible maximum to avoid potential
overflows. Also moved UNICODE_NAME_MAX definition so that it can be used
elsewhere.

Pointed-out-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/cifs/cifs_unicode.h |    7 +++++++
 fs/cifs/readdir.c      |    2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

--- a/fs/cifs/cifs_unicode.h
+++ b/fs/cifs/cifs_unicode.h
@@ -64,6 +64,13 @@ int cifs_strtoUCS(__le16 *, const char *
 #endif
 
 /*
+ * To be safe - for UCS to UTF-8 with strings loaded with the rare long
+ * characters alloc more to account for such multibyte target UTF-8
+ * characters.
+ */
+#define UNICODE_NAME_MAX ((4 * NAME_MAX) + 2)
+
+/*
  * UniStrcat:  Concatenate the second string to the first
  *
  * Returns:
--- a/fs/cifs/readdir.c
+++ b/fs/cifs/readdir.c
@@ -1075,7 +1075,7 @@ int cifs_readdir(struct file *file, void
 		with the rare long characters alloc more to account for
 		such multibyte target UTF-8 characters. cifs_unicode.c,
 		which actually does the conversion, has the same limit */
-		tmp_buf = kmalloc((2 * NAME_MAX) + 4, GFP_KERNEL);
+		tmp_buf = kmalloc(UNICODE_NAME_MAX, GFP_KERNEL);
 		for (i = 0; (i < num_to_fill) && (rc == 0); i++) {
 			if (current_entry == NULL) {
 				/* evaluate whether this case is an error */




* [patch 17/28] cifs: Fix incorrect destination buffer size in cifs_strncpy_to_host
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (15 preceding siblings ...)
  2009-05-14 22:51   ` [patch 16/28] cifs: Increase size of tmp_buf in cifs_readdir to avoid potential overflows Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 18/28] cifs: Fix buffer size in cifs_convertUCSpath Greg KH
                     ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Steve French, Jeff Layton, Suresh Jayaraman, Steve French

[-- Attachment #1: cifs-fix-incorrect-destination-buffer-size-in-cifs_strncpy_to_host.patch --]
[-- Type: text/plain, Size: 2236 bytes --]


2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Suresh Jayaraman <sjayaraman@suse.de>


Relevant commits 968460ebd8006d55661dec0fb86712b40d71c413 and
066ce6899484d9026acd6ba3a8dbbedb33d7ae1b. Minimal hunks to fix the buffer
size and to fix an existing problem, pointed out by Guenter Kukuk, that the
length of src is used for NULL termination of dst.

cifs: Rename cifs_strncpy_to_host and fix buffer size

There is a possibility for the path_name and node_name buffers to
overflow if they contain characters that are >2 bytes in the local
charset. Resize the buffer allocation to avoid this possibility.

Also, as pointed out by Jeff Layton, it would be appropriate to
rename the function to cifs_strlcpy_to_host to reflect the fact
that the copied string is always NULL terminated.
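
For readers unfamiliar with the strlcpy convention referred to here, a
userspace stand-in is sketched below. sketch_strlcpy() is a hypothetical
name used only for this example (strlcpy is not available in every C
library); it is meant to show the "always terminated" behaviour, not to
reproduce the kernel's implementation.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for strlcpy(): copy at most size - 1 bytes from
 * 'src' and always NUL-terminate 'dst' when size > 0.
 * Returns strlen(src), so a result >= size signals truncation.
 */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst && src);
	len = strlen(src);
	if (size) {
		size_t n = len >= size ? size - 1 : len;

		memcpy(dst, src, n);
		dst[n] = '\0';	/* terminated even when truncated */
	}
	return len;
}
```

By contrast, strncpy() leaves the destination unterminated when the
source is at least as long as the count, which is why the old code had
to write the terminator by hand.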

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/cifs/cifssmb.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

--- a/fs/cifs/cifssmb.c
+++ b/fs/cifs/cifssmb.c
@@ -91,23 +91,22 @@ static int
 cifs_strncpy_to_host(char **dst, const char *src, const int maxlen,
 		 const bool is_unicode, const struct nls_table *nls_codepage)
 {
-	int plen;
+	int src_len, dst_len;
 
 	if (is_unicode) {
-		plen = UniStrnlen((wchar_t *)src, maxlen);
-		*dst = kmalloc(plen + 2, GFP_KERNEL);
+		src_len = UniStrnlen((wchar_t *)src, maxlen);
+		*dst = kmalloc((4 * src_len) + 2, GFP_KERNEL);
 		if (!*dst)
 			goto cifs_strncpy_to_host_ErrExit;
-		cifs_strfromUCS_le(*dst, (__le16 *)src, plen, nls_codepage);
+		dst_len = cifs_strfromUCS_le(*dst, (__le16 *)src, src_len, nls_codepage);
+		(*dst)[dst_len + 1] = 0;
 	} else {
-		plen = strnlen(src, maxlen);
-		*dst = kmalloc(plen + 2, GFP_KERNEL);
+		src_len = strnlen(src, maxlen);
+		*dst = kmalloc(src_len + 1, GFP_KERNEL);
 		if (!*dst)
 			goto cifs_strncpy_to_host_ErrExit;
-		strncpy(*dst, src, plen);
+		strlcpy(*dst, src, src_len + 1);
 	}
-	(*dst)[plen] = 0;
-	(*dst)[plen+1] = 0; /* harmless for ASCII case, needed for Unicode */
 	return 0;
 
 cifs_strncpy_to_host_ErrExit:




* [patch 18/28] cifs: Fix buffer size in cifs_convertUCSpath
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (16 preceding siblings ...)
  2009-05-14 22:51   ` [patch 17/28] cifs: Fix incorrect destination buffer size in cifs_strncpy_to_host Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 19/28] cifs: Fix unicode string area word alignment in session setup Greg KH
                     ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Steve French, Jeff Layton, Suresh Jayaraman, Steve French

[-- Attachment #1: cifs-fix-buffer-size-in-cifs_convertucspath.patch --]
[-- Type: text/plain, Size: 1117 bytes --]


2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Suresh Jayaraman <sjayaraman@suse.de>

Relevant commits 7fabf0c9479fef9fdb9528a5fbdb1cb744a744a4 and
f58841666bc22e827ca0dcef7b71c7bc2758ce82. The upstream commits add
cifs_from_ucs2, which includes the functionality of cifs_convertUCSpath,
and do some cleanup.

Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: Steve French <sfrench@us.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/cifs/misc.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/fs/cifs/misc.c
+++ b/fs/cifs/misc.c
@@ -685,14 +685,15 @@ cifs_convertUCSpath(char *target, const 
 						NLS_MAX_CHARSET_SIZE);
 				if (len > 0) {
 					j += len;
-					continue;
+					goto overrun_chk;
 				} else {
 					target[j] = '?';
 				}
 		}
 		j++;
 		/* make sure we do not overrun callers allocated temp buffer */
-		if (j >= (2 * NAME_MAX))
+overrun_chk:
+		if (j >= UNICODE_NAME_MAX)
 			break;
 	}
 cUCS_out:




* [patch 19/28] cifs: Fix unicode string area word alignment in session setup
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (17 preceding siblings ...)
  2009-05-14 22:51   ` [patch 18/28] cifs: Fix buffer size in cifs_convertUCSpath Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 20/28] epoll: fix size check in epoll_create() Greg KH
                     ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Steve French, Jeff Layton, Steve French, Suresh Jayaraman

[-- Attachment #1: cifs-fix-unicode-string-area-word-alignment-in-session-setup.patch --]
[-- Type: text/plain, Size: 3957 bytes --]


2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Jeff Layton <jlayton@redhat.com>

commit 27b87fe52baba0a55e9723030e76fce94fabcea4 refreshed.

cifs: fix unicode string area word alignment in session setup

The handling of unicode string area alignment is wrong.
decode_unicode_ssetup improperly assumes that it will always be preceded
by a pad byte. This isn't the case if the string area is already
word-aligned.

This problem, combined with the bad buffer sizing for the serverDomain
string can cause memory corruption. The bad alignment can make it so
that the alignment of the characters is off. This can make them
translate to characters that are greater than 2 bytes each. If this
happens we can overflow the allocation.

Fix this by fixing the alignment in CIFS_SessSetup instead so we can
verify it against the head of the response. Also, clean up the
workaround for improperly terminated strings by checking for
odd-length unicode buffers and then forcibly terminating them.

Finally, resize the buffer for serverDomain. Now that we've fixed
the alignment, it's probably fine, but a malicious server could
overflow it.

A better solution for handling these strings is still needed, but
this should be a suitable bandaid.
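
The two steps described above (word-aligning the string area against the
head of the response, then padding an odd-length buffer) can be sketched
in userspace C. The helper names align_string_area and pad_odd_length
are hypothetical, and the guard against a zero byte count is an addition
of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * If ptr sits at an odd offset from the start of the response, skip one
 * pad byte and shrink the remaining byte count to match.  The check on
 * *bytes_left is an extra safety guard added for this sketch.
 */
static const unsigned char *align_string_area(const unsigned char *head,
					      const unsigned char *ptr,
					      size_t *bytes_left)
{
	if ((((uintptr_t)ptr - (uintptr_t)head) % 2) && *bytes_left > 0) {
		ptr++;
		(*bytes_left)--;
	}
	return ptr;
}

/*
 * If an odd number of bytes remain, write a zero byte after them so the
 * final 16-bit unit is complete, and return the new even length.
 * The caller must guarantee room for one extra byte in data.
 */
static size_t pad_odd_length(unsigned char *data, size_t bytes_left)
{
	if (bytes_left % 2) {
		data[bytes_left] = 0;
		bytes_left++;
	}
	return bytes_left;
}
```

Doing the alignment in the caller, where the start of the response is
known, is what lets the offset be checked rather than assumed.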

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Cc: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/cifs/sess.c |   44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

--- a/fs/cifs/sess.c
+++ b/fs/cifs/sess.c
@@ -202,27 +202,26 @@ static int decode_unicode_ssetup(char **
 	int words_left, len;
 	char *data = *pbcc_area;
 
-
-
 	cFYI(1, ("bleft %d", bleft));
 
-
-	/* SMB header is unaligned, so cifs servers word align start of
-	   Unicode strings */
-	data++;
-	bleft--; /* Windows servers do not always double null terminate
-		    their final Unicode string - in which case we
-		    now will not attempt to decode the byte of junk
-		    which follows it */
+	/*
+	 * Windows servers do not always double null terminate their final
+	 * Unicode string. Check to see if there are an uneven number of bytes
+	 * left. If so, then add an extra NULL pad byte to the end of the
+	 * response.
+	 *
+	 * See section 2.7.2 in "Implementing CIFS" for details
+	 */
+	if (bleft % 2) {
+		data[bleft] = 0;
+		++bleft;
+	}
 
 	words_left = bleft / 2;
 
 	/* save off server operating system */
 	len = UniStrnlen((wchar_t *) data, words_left);
 
-/* We look for obvious messed up bcc or strings in response so we do not go off
-   the end since (at least) WIN2K and Windows XP have a major bug in not null
-   terminating last Unicode string in response  */
 	if (len >= words_left)
 		return rc;
 
@@ -260,13 +259,10 @@ static int decode_unicode_ssetup(char **
 		return rc;
 
 	kfree(ses->serverDomain);
-	ses->serverDomain = kzalloc(2 * (len + 1), GFP_KERNEL); /* BB FIXME wrong length */
-	if (ses->serverDomain != NULL) {
+	ses->serverDomain = kzalloc((4 * len) + 2, GFP_KERNEL);
+	if (ses->serverDomain != NULL)
 		cifs_strfromUCS_le(ses->serverDomain, (__le16 *)data, len,
 				   nls_cp);
-		ses->serverDomain[2*len] = 0;
-		ses->serverDomain[(2*len) + 1] = 0;
-	}
 	data += 2 * (len + 1);
 	words_left -= len + 1;
 
@@ -616,12 +612,18 @@ CIFS_SessSetup(unsigned int xid, struct 
 	}
 
 	/* BB check if Unicode and decode strings */
-	if (smb_buf->Flags2 & SMBFLG2_UNICODE)
+	if (smb_buf->Flags2 & SMBFLG2_UNICODE) {
+		/* unicode string area must be word-aligned */
+		if (((unsigned long) bcc_ptr - (unsigned long) smb_buf) % 2) {
+			++bcc_ptr;
+			--bytes_remaining;
+		}
 		rc = decode_unicode_ssetup(&bcc_ptr, bytes_remaining,
-						   ses, nls_cp);
-	else
+					   ses, nls_cp);
+	} else {
 		rc = decode_ascii_ssetup(&bcc_ptr, bytes_remaining,
 					 ses, nls_cp);
+	}
 
 ssetup_exit:
 	if (spnego_key)




* [patch 20/28] epoll: fix size check in epoll_create()
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (18 preceding siblings ...)
  2009-05-14 22:51   ` [patch 19/28] cifs: Fix unicode string area word alignment in session setup Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 21/28] nfsd4: check for negative dentry before use in nfsv4 readdir Greg KH
                     ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Davide Libenzi, Hiroyuki.Mach, rohit verma, Ulrich Drepper

[-- Attachment #1: epoll-fix-size-check-in-epoll_create.patch --]
[-- Type: text/plain, Size: 1017 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Davide Libenzi <davidel@xmailserver.org>

commit bfe3891a5f5d3b78146a45f40e435d14f5ae39dd upstream.

Fix a size check WRT the manual pages.  This was inadvertently broken by
commit 9fe5ad9c8cef9ad5873d8ee55d1cf00d9b607df0 ("flag parameters
add-on: remove epoll_create size param").
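
For illustration, the difference between the old and new checks can be
modelled in plain C. The functions check_epoll_size and
check_epoll_size_old are hypothetical stand-ins for the test at the top
of epoll_create(); they are not kernel functions.

```c
#include <assert.h>
#include <errno.h>

/* Model of the fixed check: size must be strictly positive. */
static int check_epoll_size(int size)
{
	if (size <= 0)
		return -EINVAL;
	return 0;
}

/* Model of the pre-fix check, which let size == 0 through. */
static int check_epoll_size_old(int size)
{
	if (size < 0)
		return -EINVAL;
	return 0;
}
```

The only input on which the two differ is size == 0, which the fixed
check rejects with -EINVAL.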

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: <Hiroyuki.Mach@gmail.com>
Cc: rohit verma <rohit.170309@gmail.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/eventpoll.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1132,7 +1132,7 @@ error_return:
 
 SYSCALL_DEFINE1(epoll_create, int, size)
 {
-	if (size < 0)
+	if (size <= 0)
 		return -EINVAL;
 
 	return sys_epoll_create1(0);




* [patch 21/28] nfsd4: check for negative dentry before use in nfsv4 readdir
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (19 preceding siblings ...)
  2009-05-14 22:51   ` [patch 20/28] epoll: fix size check in epoll_create() Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 22/28] NFS: Fix the notifications when renaming onto an existing file Greg KH
                     ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, J. Bruce Fields

[-- Attachment #1: nfsd4-check-for-negative-dentry-before-use-in-nfsv4-readdir.patch --]
[-- Type: text/plain, Size: 2424 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: J. Bruce Fields <bfields@citi.umich.edu>

commit b2c0cea6b1cb210e962f07047df602875564069e upstream.

After 2f9092e1020246168b1309b35e085ecd7ff9ff72 "Fix i_mutex vs.  readdir
handling in nfsd" (and 14f7dd63 "Copy XFS readdir hack into nfsd code"),
an entry may be removed between the first mutex_unlock and the second
mutex_lock. In this case, lookup_one_len() will return a negative
dentry.  Check for this case to avoid a NULL dereference.
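
A much simplified userspace model of the check follows. All of the types
and functions here (toy_inode, toy_dentry, toy_dput, toy_encode_attrs)
are hypothetical stand-ins invented for this sketch; the real code works
on struct dentry and returns nfserr_noent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct toy_inode {
	int mode;
};

struct toy_dentry {
	struct toy_inode *d_inode;	/* NULL means a negative dentry */
	int refcount;
};

enum toy_status { TOY_OK = 0, TOY_NOENT = 1 };

/* Drop the reference taken by the lookup. */
static void toy_dput(struct toy_dentry *d)
{
	d->refcount--;
}

/* Read attributes only when the looked-up entry still has an inode. */
static enum toy_status toy_encode_attrs(struct toy_dentry *d, int *mode_out)
{
	if (!d->d_inode) {
		/* Entry vanished between readdir and lookup: skip it. */
		toy_dput(d);
		return TOY_NOENT;
	}
	*mode_out = d->d_inode->mode;
	toy_dput(d);
	return TOY_OK;
}
```

Note that the reference is dropped on both paths, as the patch does with
dput() before returning.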

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Reviewed-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/nfsd/nfs4xdr.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1833,6 +1833,15 @@ nfsd4_encode_dirent_fattr(struct nfsd4_r
 	dentry = lookup_one_len(name, cd->rd_fhp->fh_dentry, namlen);
 	if (IS_ERR(dentry))
 		return nfserrno(PTR_ERR(dentry));
+	if (!dentry->d_inode) {
+		/*
+		 * nfsd_buffered_readdir drops the i_mutex between
+		 * readdir and calling this callback, leaving a window
+		 * where this directory entry could have gone away.
+		 */
+		dput(dentry);
+		return nfserr_noent;
+	}
 
 	exp_get(exp);
 	/*
@@ -1895,6 +1904,7 @@ nfsd4_encode_dirent(void *ccdv, const ch
 	struct nfsd4_readdir *cd = container_of(ccd, struct nfsd4_readdir, common);
 	int buflen;
 	__be32 *p = cd->buffer;
+	__be32 *cookiep;
 	__be32 nfserr = nfserr_toosmall;
 
 	/* In nfsv4, "." and ".." never make it onto the wire.. */
@@ -1911,7 +1921,7 @@ nfsd4_encode_dirent(void *ccdv, const ch
 		goto fail;
 
 	*p++ = xdr_one;                             /* mark entry present */
-	cd->offset = p;                             /* remember pointer */
+	cookiep = p;
 	p = xdr_encode_hyper(p, NFS_OFFSET_MAX);    /* offset of next entry */
 	p = xdr_encode_array(p, name, namlen);      /* name length & name */
 
@@ -1925,6 +1935,8 @@ nfsd4_encode_dirent(void *ccdv, const ch
 		goto fail;
 	case nfserr_dropit:
 		goto fail;
+	case nfserr_noent:
+		goto skip_entry;
 	default:
 		/*
 		 * If the client requested the RDATTR_ERROR attribute,
@@ -1943,6 +1955,8 @@ nfsd4_encode_dirent(void *ccdv, const ch
 	}
 	cd->buflen -= (p - cd->buffer);
 	cd->buffer = p;
+	cd->offset = cookiep;
+skip_entry:
 	cd->common.err = nfs_ok;
 	return 0;
 fail:




* [patch 22/28] NFS: Fix the notifications when renaming onto an existing file
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (20 preceding siblings ...)
  2009-05-14 22:51   ` [patch 21/28] nfsd4: check for negative dentry before use in nfsv4 readdir Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 23/28] ehea: fix invalid pointer access Greg KH
                     ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Trond Myklebust

[-- Attachment #1: nfs-fix-the-notifications-when-renaming-onto-an-existing-file.patch --]
[-- Type: text/plain, Size: 1444 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Trond Myklebust <Trond.Myklebust@netapp.com>

commit b1e4adf4ea41bb8b5a7bfc1a7001f137e65495df upstream.

NFS appears to be returning an unnecessary "delete" notification when
we're doing an atomic rename. See

  http://bugzilla.gnome.org/show_bug.cgi?id=575684

The fix is to get rid of the redundant call to d_delete().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/nfs/dir.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1613,8 +1613,7 @@ static int nfs_rename(struct inode *old_
 		} else if (atomic_read(&new_dentry->d_count) > 1)
 			/* dentry still busy? */
 			goto out;
-	} else
-		nfs_drop_nlink(new_inode);
+	}
 
 go_ahead:
 	/*
@@ -1627,10 +1626,8 @@ go_ahead:
 	}
 	nfs_inode_return_delegation(old_inode);
 
-	if (new_inode != NULL) {
+	if (new_inode != NULL)
 		nfs_inode_return_delegation(new_inode);
-		d_delete(new_dentry);
-	}
 
 	error = NFS_PROTO(old_dir)->rename(old_dir, &old_dentry->d_name,
 					   new_dir, &new_dentry->d_name);
@@ -1639,6 +1636,8 @@ out:
 	if (rehash)
 		d_rehash(rehash);
 	if (!error) {
+		if (new_inode != NULL)
+			nfs_drop_nlink(new_inode);
 		d_move(old_dentry, new_dentry);
 		nfs_set_verifier(new_dentry,
 					nfs_save_change_attribute(new_dir));




* [patch 23/28] ehea: fix invalid pointer access
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (21 preceding siblings ...)
  2009-05-14 22:51   ` [patch 22/28] NFS: Fix the notifications when renaming onto an existing file Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 24/28] powerpc/5200: Dont specify IRQF_SHARED in PSC UART driver Greg KH
                     ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Hannes Hering, Jan-Bernd Themann, David S. Miller

[-- Attachment #1: ehea-fix-invalid-pointer-access.patch --]
[-- Type: text/plain, Size: 1877 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Hannes Hering <hering2@de.ibm.com>

commit 0b2febf38a33d7c40fb7bb4a58c113a1fa33c412 upstream.

This patch fixes an invalid pointer access in case the receive queue
holds no pointer to the next skb when the queue is empty.
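
The guard can be sketched in userspace C as below. The names toy_skb and
prefetch_next are hypothetical, __builtin_prefetch is the GCC/Clang
builtin standing in for the kernel's prefetch helpers, and the sketch
assumes that the slot examined is the one after the current index and
that arr_len is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the field we touch. */
struct toy_skb {
	unsigned char *data;
};

/*
 * Issue prefetch hints for the slot after index.
 * Returns 1 if hints were issued, 0 if the slot was empty.
 */
static int prefetch_next(struct toy_skb **skb_array, size_t arr_len,
			 size_t index)
{
	size_t x = (index + 1) & (arr_len - 1);	/* wrap around the ring */
	struct toy_skb *next = skb_array[x];

	if (!next)
		return 0;	/* empty slot: next->data must not be read */

	__builtin_prefetch(next, 1);		/* the skb itself, for writing */
	__builtin_prefetch(next->data, 0);	/* its payload, for reading */
	return 1;
}
```

The prefetch hints themselves are harmless on a NULL address; it is the
read of the data member through the NULL pointer that the check avoids.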

Signed-off-by: Hannes Hering <hering2@de.ibm.com>
Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/net/ehea/ehea_main.c |   31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -529,14 +529,17 @@ static inline struct sk_buff *get_skb_by
 	x &= (arr_len - 1);
 
 	pref = skb_array[x];
-	prefetchw(pref);
-	prefetchw(pref + EHEA_CACHE_LINE);
+	if (pref) {
+		prefetchw(pref);
+		prefetchw(pref + EHEA_CACHE_LINE);
+
+		pref = (skb_array[x]->data);
+		prefetch(pref);
+		prefetch(pref + EHEA_CACHE_LINE);
+		prefetch(pref + EHEA_CACHE_LINE * 2);
+		prefetch(pref + EHEA_CACHE_LINE * 3);
+	}
 
-	pref = (skb_array[x]->data);
-	prefetch(pref);
-	prefetch(pref + EHEA_CACHE_LINE);
-	prefetch(pref + EHEA_CACHE_LINE * 2);
-	prefetch(pref + EHEA_CACHE_LINE * 3);
 	skb = skb_array[skb_index];
 	skb_array[skb_index] = NULL;
 	return skb;
@@ -553,12 +556,14 @@ static inline struct sk_buff *get_skb_by
 	x &= (arr_len - 1);
 
 	pref = skb_array[x];
-	prefetchw(pref);
-	prefetchw(pref + EHEA_CACHE_LINE);
-
-	pref = (skb_array[x]->data);
-	prefetchw(pref);
-	prefetchw(pref + EHEA_CACHE_LINE);
+	if (pref) {
+		prefetchw(pref);
+		prefetchw(pref + EHEA_CACHE_LINE);
+
+		pref = (skb_array[x]->data);
+		prefetchw(pref);
+		prefetchw(pref + EHEA_CACHE_LINE);
+	}
 
 	skb = skb_array[wqe_index];
 	skb_array[wqe_index] = NULL;




* [patch 24/28] powerpc/5200: Dont specify IRQF_SHARED in PSC UART driver
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (22 preceding siblings ...)
  2009-05-14 22:51   ` [patch 23/28] ehea: fix invalid pointer access Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 25/28] splice: split up __splice_from_pipe() Greg KH
                     ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Grant Likely

[-- Attachment #1: powerpc-5200-don-t-specify-irqf_shared-in-psc-uart-driver.patch --]
[-- Type: text/plain, Size: 1081 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Grant Likely <grant.likely@secretlab.ca>

commit d9f0c5f9bc74f16d0ea0f6c518b209e48783a796 upstream.

The MPC5200 PSC device is wired up to a dedicated interrupt line
which is never shared.  This patch removes the IRQF_SHARED flag
from the request_irq() call which eliminates the "IRQF_DISABLED
is not guaranteed on shared IRQs" warning message from the console
output.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/serial/mpc52xx_uart.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/serial/mpc52xx_uart.c
+++ b/drivers/serial/mpc52xx_uart.c
@@ -515,7 +515,7 @@ mpc52xx_uart_startup(struct uart_port *p
 
 	/* Request IRQ */
 	ret = request_irq(port->irq, mpc52xx_uart_int,
-		IRQF_DISABLED | IRQF_SAMPLE_RANDOM | IRQF_SHARED,
+		IRQF_DISABLED | IRQF_SAMPLE_RANDOM,
 		"mpc52xx_psc_uart", port);
 	if (ret)
 		return ret;




* [patch 25/28] splice: split up __splice_from_pipe()
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (23 preceding siblings ...)
  2009-05-14 22:51   ` [patch 24/28] powerpc/5200: Dont specify IRQF_SHARED in PSC UART driver Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 26/28] splice: remove i_mutex locking in splice_from_pipe() Greg KH
                     ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Miklos Szeredi, Jens Axboe

[-- Attachment #1: splice-split-up-__splice_from_pipe.patch --]
[-- Type: text/plain, Size: 8816 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Miklos Szeredi <miklos@szeredi.hu>

commit b3c2d2ddd63944ef2a1e4a43077b602288107e01 upstream.

Split up __splice_from_pipe() into four helper functions:

  splice_from_pipe_begin()
  splice_from_pipe_next()
  splice_from_pipe_feed()
  splice_from_pipe_end()

splice_from_pipe_next() will wait (if necessary) for more buffers to
be added to the pipe.  splice_from_pipe_feed() will feed the buffers
to the supplied actor and return when there's no more data available
(or if all of the requested data has been copied).

This is necessary so that implementations can do locking around the
non-waiting splice_from_pipe_feed().

This patch should not cause any change in behavior.
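
The calling pattern of the four helpers can be sketched with a toy
userspace model. Everything prefixed toy_ is hypothetical: the model has
no actor callback, never sleeps, and counts wakeups instead of waking
anyone. It only shows the begin/next/feed/end loop and the return-value
convention (positive from feed means "pipe drained, want more", zero
means "request satisfied").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BUFS 4	/* must be a power of two */

struct toy_pipe {
	size_t len[TOY_BUFS];	/* bytes left in each buffer */
	int curbuf;
	int nrbufs;
};

struct toy_desc {
	size_t total_len;	/* bytes still requested */
	size_t num_spliced;	/* bytes moved so far */
	bool need_wakeup;
	int wakeups;		/* counts wakeups instead of performing them */
};

static void toy_begin(struct toy_desc *sd)
{
	sd->num_spliced = 0;
	sd->need_wakeup = false;
}

/* 1 if buffers are available, 0 if not.  The real helper may sleep here. */
static int toy_next(struct toy_pipe *pipe, struct toy_desc *sd)
{
	(void)sd;
	return pipe->nrbufs ? 1 : 0;
}

/* Consume buffers without waiting: 1 = pipe drained, 0 = request done. */
static int toy_feed(struct toy_pipe *pipe, struct toy_desc *sd)
{
	while (pipe->nrbufs) {
		size_t *len = &pipe->len[pipe->curbuf];
		size_t n = *len < sd->total_len ? *len : sd->total_len;

		*len -= n;
		sd->num_spliced += n;
		sd->total_len -= n;

		if (!*len) {
			pipe->curbuf = (pipe->curbuf + 1) & (TOY_BUFS - 1);
			pipe->nrbufs--;
			sd->need_wakeup = true;
		}
		if (!sd->total_len)
			return 0;
	}
	return 1;
}

static void toy_end(struct toy_desc *sd)
{
	if (sd->need_wakeup)
		sd->wakeups++;
}

/* Same loop shape as the rewritten __splice_from_pipe(). */
static long toy_splice(struct toy_pipe *pipe, struct toy_desc *sd)
{
	int ret;

	toy_begin(sd);
	do {
		ret = toy_next(pipe, sd);
		if (ret > 0)
			ret = toy_feed(pipe, sd);
	} while (ret > 0);
	toy_end(sd);

	return sd->num_spliced ? (long)sd->num_spliced : ret;
}
```

Because the feed step never waits, a caller that needs a lock around the
copy can take it just around that step and drop it before the next wait.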

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/splice.c            |  223 ++++++++++++++++++++++++++++++++-----------------
 include/linux/splice.h |   10 ++
 2 files changed, 156 insertions(+), 77 deletions(-)

--- a/fs/splice.c
+++ b/fs/splice.c
@@ -599,107 +599,176 @@ out:
 	return ret;
 }
 
+static void wakeup_pipe_writers(struct pipe_inode_info *pipe)
+{
+	smp_mb();
+	if (waitqueue_active(&pipe->wait))
+		wake_up_interruptible(&pipe->wait);
+	kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
+}
+
 /**
- * __splice_from_pipe - splice data from a pipe to given actor
+ * splice_from_pipe_feed - feed available data from a pipe to a file
  * @pipe:	pipe to splice from
  * @sd:		information to @actor
  * @actor:	handler that splices the data
  *
  * Description:
- *    This function does little more than loop over the pipe and call
- *    @actor to do the actual moving of a single struct pipe_buffer to
- *    the desired destination. See pipe_to_file, pipe_to_sendpage, or
- *    pipe_to_user.
+
+ *    This function loops over the pipe and calls @actor to do the
+ *    actual moving of a single struct pipe_buffer to the desired
+ *    destination.  It returns when there's no more buffers left in
+ *    the pipe or if the requested number of bytes (@sd->total_len)
+ *    have been copied.  It returns a positive number (one) if the
+ *    pipe needs to be filled with more data, zero if the required
+ *    number of bytes have been copied and -errno on error.
  *
+ *    This, together with splice_from_pipe_{begin,end,next}, may be
+ *    used to implement the functionality of __splice_from_pipe() when
+ *    locking is required around copying the pipe buffers to the
+ *    destination.
  */
-ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd,
-			   splice_actor *actor)
+int splice_from_pipe_feed(struct pipe_inode_info *pipe, struct splice_desc *sd,
+			  splice_actor *actor)
 {
-	int ret, do_wakeup, err;
-
-	ret = 0;
-	do_wakeup = 0;
+	int ret;
 
-	for (;;) {
-		if (pipe->nrbufs) {
-			struct pipe_buffer *buf = pipe->bufs + pipe->curbuf;
-			const struct pipe_buf_operations *ops = buf->ops;
-
-			sd->len = buf->len;
-			if (sd->len > sd->total_len)
-				sd->len = sd->total_len;
-
-			err = actor(pipe, buf, sd);
-			if (err <= 0) {
-				if (!ret && err != -ENODATA)
-					ret = err;
+	while (pipe->nrbufs) {
+		struct pipe_buffer *buf = pipe->bufs + pipe->curbuf;
+		const struct pipe_buf_operations *ops = buf->ops;
+
+		sd->len = buf->len;
+		if (sd->len > sd->total_len)
+			sd->len = sd->total_len;
+
+		ret = actor(pipe, buf, sd);
+		if (ret <= 0) {
+			if (ret == -ENODATA)
+				ret = 0;
+			return ret;
+		}
+		buf->offset += ret;
+		buf->len -= ret;
+
+		sd->num_spliced += ret;
+		sd->len -= ret;
+		sd->pos += ret;
+		sd->total_len -= ret;
 
-				break;
-			}
+		if (!buf->len) {
+			buf->ops = NULL;
+			ops->release(pipe, buf);
+			pipe->curbuf = (pipe->curbuf + 1) & (PIPE_BUFFERS - 1);
+			pipe->nrbufs--;
+			if (pipe->inode)
+				sd->need_wakeup = true;
+		}
 
-			ret += err;
-			buf->offset += err;
-			buf->len -= err;
-
-			sd->len -= err;
-			sd->pos += err;
-			sd->total_len -= err;
-			if (sd->len)
-				continue;
-
-			if (!buf->len) {
-				buf->ops = NULL;
-				ops->release(pipe, buf);
-				pipe->curbuf = (pipe->curbuf + 1) & (PIPE_BUFFERS - 1);
-				pipe->nrbufs--;
-				if (pipe->inode)
-					do_wakeup = 1;
-			}
+		if (!sd->total_len)
+			return 0;
+	}
 
-			if (!sd->total_len)
-				break;
-		}
+	return 1;
+}
+EXPORT_SYMBOL(splice_from_pipe_feed);
 
-		if (pipe->nrbufs)
-			continue;
+/**
+ * splice_from_pipe_next - wait for some data to splice from
+ * @pipe:	pipe to splice from
+ * @sd:		information about the splice operation
+ *
+ * Description:
+ *    This function will wait for some data and return a positive
+ *    value (one) if pipe buffers are available.  It will return zero
+ *    or -errno if no more data needs to be spliced.
+ */
+int splice_from_pipe_next(struct pipe_inode_info *pipe, struct splice_desc *sd)
+{
+	while (!pipe->nrbufs) {
 		if (!pipe->writers)
-			break;
-		if (!pipe->waiting_writers) {
-			if (ret)
-				break;
-		}
+			return 0;
 
-		if (sd->flags & SPLICE_F_NONBLOCK) {
-			if (!ret)
-				ret = -EAGAIN;
-			break;
-		}
+		if (!pipe->waiting_writers && sd->num_spliced)
+			return 0;
 
-		if (signal_pending(current)) {
-			if (!ret)
-				ret = -ERESTARTSYS;
-			break;
-		}
+		if (sd->flags & SPLICE_F_NONBLOCK)
+			return -EAGAIN;
 
-		if (do_wakeup) {
-			smp_mb();
-			if (waitqueue_active(&pipe->wait))
-				wake_up_interruptible_sync(&pipe->wait);
-			kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
-			do_wakeup = 0;
+		if (signal_pending(current))
+			return -ERESTARTSYS;
+
+		if (sd->need_wakeup) {
+			wakeup_pipe_writers(pipe);
+			sd->need_wakeup = false;
 		}
 
 		pipe_wait(pipe);
 	}
 
-	if (do_wakeup) {
-		smp_mb();
-		if (waitqueue_active(&pipe->wait))
-			wake_up_interruptible(&pipe->wait);
-		kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
-	}
+	return 1;
+}
+EXPORT_SYMBOL(splice_from_pipe_next);
 
-	return ret;
+/**
+ * splice_from_pipe_begin - start splicing from pipe
+ * @pipe:	pipe to splice from
+ *
+ * Description:
+ *    This function should be called before a loop containing
+ *    splice_from_pipe_next() and splice_from_pipe_feed() to
+ *    initialize the necessary fields of @sd.
+ */
+void splice_from_pipe_begin(struct splice_desc *sd)
+{
+	sd->num_spliced = 0;
+	sd->need_wakeup = false;
+}
+EXPORT_SYMBOL(splice_from_pipe_begin);
+
+/**
+ * splice_from_pipe_end - finish splicing from pipe
+ * @pipe:	pipe to splice from
+ * @sd:		information about the splice operation
+ *
+ * Description:
+ *    This function will wake up pipe writers if necessary.  It should
+ *    be called after a loop containing splice_from_pipe_next() and
+ *    splice_from_pipe_feed().
+ */
+void splice_from_pipe_end(struct pipe_inode_info *pipe, struct splice_desc *sd)
+{
+	if (sd->need_wakeup)
+		wakeup_pipe_writers(pipe);
+}
+EXPORT_SYMBOL(splice_from_pipe_end);
+
+/**
+ * __splice_from_pipe - splice data from a pipe to given actor
+ * @pipe:	pipe to splice from
+ * @sd:		information to @actor
+ * @actor:	handler that splices the data
+ *
+ * Description:
+ *    This function does little more than loop over the pipe and call
+ *    @actor to do the actual moving of a single struct pipe_buffer to
+ *    the desired destination. See pipe_to_file, pipe_to_sendpage, or
+ *    pipe_to_user.
+ *
+ */
+ssize_t __splice_from_pipe(struct pipe_inode_info *pipe, struct splice_desc *sd,
+			   splice_actor *actor)
+{
+	int ret;
+
+	splice_from_pipe_begin(sd);
+	do {
+		ret = splice_from_pipe_next(pipe, sd);
+		if (ret > 0)
+			ret = splice_from_pipe_feed(pipe, sd, actor);
+	} while (ret > 0);
+	splice_from_pipe_end(pipe, sd);
+
+	return sd->num_spliced ? sd->num_spliced : ret;
 }
 EXPORT_SYMBOL(__splice_from_pipe);
 
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -36,6 +36,8 @@ struct splice_desc {
 		void *data;		/* cookie */
 	} u;
 	loff_t pos;			/* file position */
+	size_t num_spliced;		/* number of bytes already spliced */
+	bool need_wakeup;		/* need to wake up writer */
 };
 
 struct partial_page {
@@ -66,6 +68,14 @@ extern ssize_t splice_from_pipe(struct p
 				splice_actor *);
 extern ssize_t __splice_from_pipe(struct pipe_inode_info *,
 				  struct splice_desc *, splice_actor *);
+extern int splice_from_pipe_feed(struct pipe_inode_info *, struct splice_desc *,
+				 splice_actor *);
+extern int splice_from_pipe_next(struct pipe_inode_info *,
+				 struct splice_desc *);
+extern void splice_from_pipe_begin(struct splice_desc *);
+extern void splice_from_pipe_end(struct pipe_inode_info *,
+				 struct splice_desc *);
+
 extern ssize_t splice_to_pipe(struct pipe_inode_info *,
 			      struct splice_pipe_desc *);
 extern ssize_t splice_direct_to_actor(struct file *, struct splice_desc *,




* [patch 26/28] splice: remove i_mutex locking in splice_from_pipe()
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (24 preceding siblings ...)
  2009-05-14 22:51   ` [patch 25/28] splice: split up __splice_from_pipe() Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 27/28] splice: fix i_mutex locking in generic_splice_write() Greg KH
  2009-05-14 22:51   ` [patch 28/28] ocfs2: fix i_mutex locking in ocfs2_splice_to_file() Greg KH
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Miklos Szeredi, Jens Axboe

[-- Attachment #1: splice-remove-i_mutex-locking-in-splice_from_pipe.patch --]
[-- Type: text/plain, Size: 2064 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Miklos Szeredi <miklos@szeredi.hu>

commit 2933970b960223076d6affcf7a77e2bc546b8102 upstream.

splice_from_pipe() is only called from two places:

  - generic_splice_sendpage()
  - splice_write_null()

Neither of these requires i_mutex to be taken on the destination inode.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/splice.c |   18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

--- a/fs/splice.c
+++ b/fs/splice.c
@@ -782,7 +782,7 @@ EXPORT_SYMBOL(__splice_from_pipe);
  * @actor:	handler that splices the data
  *
  * Description:
- *    See __splice_from_pipe. This function locks the input and output inodes,
+ *    See __splice_from_pipe. This function locks the pipe inode,
  *    otherwise it's identical to __splice_from_pipe().
  *
  */
@@ -791,7 +791,6 @@ ssize_t splice_from_pipe(struct pipe_ino
 			 splice_actor *actor)
 {
 	ssize_t ret;
-	struct inode *inode = out->f_mapping->host;
 	struct splice_desc sd = {
 		.total_len = len,
 		.flags = flags,
@@ -799,24 +798,11 @@ ssize_t splice_from_pipe(struct pipe_ino
 		.u.file = out,
 	};
 
-	/*
-	 * The actor worker might be calling ->prepare_write and
-	 * ->commit_write. Most of the time, these expect i_mutex to
-	 * be held. Since this may result in an ABBA deadlock with
-	 * pipe->inode, we have to order lock acquiry here.
-	 *
-	 * Outer lock must be inode->i_mutex, as pipe_wait() will
-	 * release and reacquire pipe->inode->i_mutex, AND inode must
-	 * never be a pipe.
-	 */
-	WARN_ON(S_ISFIFO(inode->i_mode));
-	mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
 	if (pipe->inode)
-		mutex_lock_nested(&pipe->inode->i_mutex, I_MUTEX_CHILD);
+		mutex_lock(&pipe->inode->i_mutex);
 	ret = __splice_from_pipe(pipe, &sd, actor);
 	if (pipe->inode)
 		mutex_unlock(&pipe->inode->i_mutex);
-	mutex_unlock(&inode->i_mutex);
 
 	return ret;
 }




* [patch 27/28] splice: fix i_mutex locking in generic_splice_write()
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (25 preceding siblings ...)
  2009-05-14 22:51   ` [patch 26/28] splice: remove i_mutex locking in splice_from_pipe() Greg KH
@ 2009-05-14 22:51   ` Greg KH
  2009-05-14 22:51   ` [patch 28/28] ocfs2: fix i_mutex locking in ocfs2_splice_to_file() Greg KH
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Miklos Szeredi, Jens Axboe

[-- Attachment #1: splice-fix-i_mutex-locking-in-generic_splice_write.patch --]
[-- Type: text/plain, Size: 1728 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Miklos Szeredi <miklos@szeredi.hu>

commit eb443e5a25d43996deb62b9bcee1a4ce5dea2ead upstream.

Rearrange locking of i_mutex on destination so it's only held while
buffers are copied with the pipe_to_file() actor, and not while
waiting for more data on the pipe.
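
As a rough userspace illustration of the locking shape described above
(not the kernel code), the sketch below holds an outer "pipe" lock for
the whole operation and takes the destination lock only around each
copy step.  All names here (toy_pipe, toy_file, toy_pipe_next,
toy_pipe_feed, toy_splice_write) are hypothetical stand-ins, and
pthread mutexes are assumed in place of i_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a pipe: a small queue of string buffers. */
struct toy_pipe {
	pthread_mutex_t lock;	/* plays the role of pipe->inode->i_mutex */
	const char *bufs[4];	/* queued buffers */
	size_t nbufs;		/* number of queued buffers */
	size_t next;		/* index of the next buffer to consume */
};

/* Hypothetical stand-in for the destination file. */
struct toy_file {
	pthread_mutex_t lock;	/* plays the role of inode->i_mutex */
	char data[64];
	size_t len;
	unsigned int lock_count; /* how often the destination lock was taken */
};

/*
 * Returns 1 if a buffer is available, 0 at end of data.  A real pipe
 * could sleep here; the point is that only the pipe lock is held.
 */
static int toy_pipe_next(struct toy_pipe *p)
{
	return p->next < p->nbufs;
}

/* Copies one buffer to the file.  Caller holds both locks. */
static int toy_pipe_feed(struct toy_pipe *p, struct toy_file *f)
{
	const char *src;
	size_t n;

	assert(p->next < p->nbufs);
	src = p->bufs[p->next];
	n = strlen(src);
	if (f->len + n > sizeof(f->data))
		return -1;	/* destination full */
	memcpy(f->data + f->len, src, n);
	f->len += n;
	p->next++;
	return (int)n;
}

/* Outer lock: pipe.  Inner lock: destination, held only while copying. */
static long toy_splice_write(struct toy_pipe *p, struct toy_file *f)
{
	long total = 0;
	int ret;

	pthread_mutex_lock(&p->lock);
	do {
		ret = toy_pipe_next(p);	/* destination lock NOT held here */
		if (ret <= 0)
			break;

		pthread_mutex_lock(&f->lock);
		f->lock_count++;
		ret = toy_pipe_feed(p, f);
		pthread_mutex_unlock(&f->lock);
		if (ret > 0)
			total += ret;
	} while (ret > 0);
	pthread_mutex_unlock(&p->lock);

	/* Report bytes copied if any, otherwise the last status. */
	return total ? total : ret;
}
```

With two queued buffers "ab" and "cde", toy_splice_write() would copy
five bytes and take the destination lock twice, once per buffer,
rather than once around the whole loop.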

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/splice.c |   34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

--- a/fs/splice.c
+++ b/fs/splice.c
@@ -893,17 +893,29 @@ generic_file_splice_write(struct pipe_in
 	};
 	ssize_t ret;
 
-	WARN_ON(S_ISFIFO(inode->i_mode));
-	mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
-	ret = file_remove_suid(out);
-	if (likely(!ret)) {
-		if (pipe->inode)
-			mutex_lock_nested(&pipe->inode->i_mutex, I_MUTEX_CHILD);
-		ret = __splice_from_pipe(pipe, &sd, pipe_to_file);
-		if (pipe->inode)
-			mutex_unlock(&pipe->inode->i_mutex);
-	}
-	mutex_unlock(&inode->i_mutex);
+	if (pipe->inode)
+		mutex_lock_nested(&pipe->inode->i_mutex, I_MUTEX_PARENT);
+
+	splice_from_pipe_begin(&sd);
+	do {
+		ret = splice_from_pipe_next(pipe, &sd);
+		if (ret <= 0)
+			break;
+
+		mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
+		ret = file_remove_suid(out);
+		if (!ret)
+			ret = splice_from_pipe_feed(pipe, &sd, pipe_to_file);
+		mutex_unlock(&inode->i_mutex);
+	} while (ret > 0);
+	splice_from_pipe_end(pipe, &sd);
+
+	if (pipe->inode)
+		mutex_unlock(&pipe->inode->i_mutex);
+
+	if (sd.num_spliced)
+		ret = sd.num_spliced;
+
 	if (ret > 0) {
 		unsigned long nr_pages;
 




* [patch 28/28] ocfs2: fix i_mutex locking in ocfs2_splice_to_file()
  2009-05-14 22:54 ` [patch 00/28] 2.6.27.24-stable review Greg KH
                     ` (26 preceding siblings ...)
  2009-05-14 22:51   ` [patch 27/28] splice: fix i_mutex locking in generic_splice_write() Greg KH
@ 2009-05-14 22:51   ` Greg KH
  27 siblings, 0 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:51 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Miklos Szeredi, Jens Axboe

[-- Attachment #1: ocfs2-fix-i_mutex-locking-in-ocfs2_splice_to_file.patch --]
[-- Type: text/plain, Size: 4906 bytes --]

2.6.27-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Miklos Szeredi <miklos@szeredi.hu>

commit 328eaaba4e41a04c1dc4679d65bea3fee4349d86 upstream.

Rearrange locking of i_mutex on destination and call to
ocfs2_rw_lock() so locks are only held while buffers are copied with
the pipe_to_file() actor, and not while waiting for more data on the
pipe.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ocfs2/file.c        |   96 ++++++++++++++++++++++++++++++++++++++-----------
 fs/splice.c            |    5 +-
 include/linux/splice.h |    2 +
 3 files changed, 80 insertions(+), 23 deletions(-)

--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2075,6 +2075,22 @@ out_sems:
 	return written ? written : ret;
 }
 
+static int ocfs2_splice_to_file(struct pipe_inode_info *pipe,
+				struct file *out,
+				struct splice_desc *sd)
+{
+	int ret;
+
+	ret = ocfs2_prepare_inode_for_write(out->f_path.dentry,	&sd->pos,
+					    sd->total_len, 0, NULL);
+	if (ret < 0) {
+		mlog_errno(ret);
+		return ret;
+	}
+
+	return splice_from_pipe_feed(pipe, sd, pipe_to_file);
+}
+
 static ssize_t ocfs2_file_splice_write(struct pipe_inode_info *pipe,
 				       struct file *out,
 				       loff_t *ppos,
@@ -2082,38 +2098,76 @@ static ssize_t ocfs2_file_splice_write(s
 				       unsigned int flags)
 {
 	int ret;
-	struct inode *inode = out->f_path.dentry->d_inode;
+	struct address_space *mapping = out->f_mapping;
+	struct inode *inode = mapping->host;
+	struct splice_desc sd = {
+		.total_len = len,
+		.flags = flags,
+		.pos = *ppos,
+		.u.file = out,
+	};
 
 	mlog_entry("(0x%p, 0x%p, %u, '%.*s')\n", out, pipe,
 		   (unsigned int)len,
 		   out->f_path.dentry->d_name.len,
 		   out->f_path.dentry->d_name.name);
 
-	mutex_lock_nested(&inode->i_mutex, I_MUTEX_PARENT);
-
-	ret = ocfs2_rw_lock(inode, 1);
-	if (ret < 0) {
-		mlog_errno(ret);
-		goto out;
-	}
+	if (pipe->inode)
+		mutex_lock_nested(&pipe->inode->i_mutex, I_MUTEX_PARENT);
 
-	ret = ocfs2_prepare_inode_for_write(out->f_path.dentry, ppos, len, 0,
-					    NULL);
-	if (ret < 0) {
-		mlog_errno(ret);
-		goto out_unlock;
-	}
+	splice_from_pipe_begin(&sd);
+	do {
+		ret = splice_from_pipe_next(pipe, &sd);
+		if (ret <= 0)
+			break;
+
+		mutex_lock_nested(&inode->i_mutex, I_MUTEX_CHILD);
+		ret = ocfs2_rw_lock(inode, 1);
+		if (ret < 0)
+			mlog_errno(ret);
+		else {
+			ret = ocfs2_splice_to_file(pipe, out, &sd);
+			ocfs2_rw_unlock(inode, 1);
+		}
+		mutex_unlock(&inode->i_mutex);
+	} while (ret > 0);
+	splice_from_pipe_end(pipe, &sd);
 
 	if (pipe->inode)
-		mutex_lock_nested(&pipe->inode->i_mutex, I_MUTEX_CHILD);
-	ret = generic_file_splice_write_nolock(pipe, out, ppos, len, flags);
-	if (pipe->inode)
 		mutex_unlock(&pipe->inode->i_mutex);
 
-out_unlock:
-	ocfs2_rw_unlock(inode, 1);
-out:
-	mutex_unlock(&inode->i_mutex);
+	if (sd.num_spliced)
+		ret = sd.num_spliced;
+
+	if (ret > 0) {
+		unsigned long nr_pages;
+
+		*ppos += ret;
+		nr_pages = (ret + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+
+		/*
+		 * If file or inode is SYNC and we actually wrote some data,
+		 * sync it.
+		 */
+		if (unlikely((out->f_flags & O_SYNC) || IS_SYNC(inode))) {
+			int err;
+
+			mutex_lock(&inode->i_mutex);
+			err = ocfs2_rw_lock(inode, 1);
+			if (err < 0) {
+				mlog_errno(err);
+			} else {
+				err = generic_osync_inode(inode, mapping,
+						  OSYNC_METADATA|OSYNC_DATA);
+				ocfs2_rw_unlock(inode, 1);
+			}
+			mutex_unlock(&inode->i_mutex);
+
+			if (err)
+				ret = err;
+		}
+		balance_dirty_pages_ratelimited_nr(mapping, nr_pages);
+	}
 
 	mlog_exit(ret);
 	return ret;
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -553,8 +553,8 @@ static int pipe_to_sendpage(struct pipe_
  * SPLICE_F_MOVE isn't set, or we cannot move the page, we simply create
  * a new page in the output file page cache and fill/dirty that.
  */
-static int pipe_to_file(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
-			struct splice_desc *sd)
+int pipe_to_file(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
+		 struct splice_desc *sd)
 {
 	struct file *file = sd->u.file;
 	struct address_space *mapping = file->f_mapping;
@@ -598,6 +598,7 @@ static int pipe_to_file(struct pipe_inod
 out:
 	return ret;
 }
+EXPORT_SYMBOL(pipe_to_file);
 
 static void wakeup_pipe_writers(struct pipe_inode_info *pipe)
 {
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -75,6 +75,8 @@ extern int splice_from_pipe_next(struct 
 extern void splice_from_pipe_begin(struct splice_desc *);
 extern void splice_from_pipe_end(struct pipe_inode_info *,
 				 struct splice_desc *);
+extern int pipe_to_file(struct pipe_inode_info *, struct pipe_buffer *,
+			struct splice_desc *);
 
 extern ssize_t splice_to_pipe(struct pipe_inode_info *,
 			      struct splice_pipe_desc *);




* [patch 00/28] 2.6.27.24-stable review
@ 2009-05-14 22:54 ` Greg KH
  2009-05-14 22:51   ` [patch 01/28] md: fix loading of out-of-date bitmap Greg KH
                     ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: Greg KH @ 2009-05-14 22:54 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan


This is the start of the stable review cycle for the 2.6.27.24 release.
There are 28 patches in this series; all will be posted as responses to
this one.  If anyone has any issues with these being applied, please let
us know.  If anyone is a maintainer of the proper subsystem, and wants
to add a Signed-off-by: line to the patch, please respond with it.

These patches are sent out with a number of different people on the Cc:
line.  If you wish to be a reviewer, please email stable@kernel.org to
add your name to the list.  If you want to be off the reviewer list,
also email us.

Responses should be made by Saturday, May 16, 20:00:00 UTC.  Anything
received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.27.24-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

 Documentation/filesystems/Locking |   26 +++--
 Makefile                          |    2 +-
 drivers/i2c/algos/i2c-algo-bit.c  |    2 +-
 drivers/i2c/algos/i2c-algo-pca.c  |   11 ++
 drivers/md/bitmap.c               |   29 ++--
 drivers/md/md.c                   |    7 +-
 drivers/md/raid10.c               |   12 +-
 drivers/net/ehea/ehea_main.c      |   31 +++--
 drivers/serial/mpc52xx_uart.c     |    2 +-
 drivers/usb/gadget/usbstring.c    |    6 +-
 drivers/video/fb_defio.c          |    3 +-
 fs/buffer.c                       |   20 ++-
 fs/cifs/cifs_unicode.h            |    7 +
 fs/cifs/cifssmb.c                 |   17 +--
 fs/cifs/connect.c                 |    6 +-
 fs/cifs/misc.c                    |    5 +-
 fs/cifs/readdir.c                 |    2 +-
 fs/cifs/sess.c                    |   44 +++---
 fs/eventpoll.c                    |    2 +-
 fs/ext4/ext4.h                    |    2 +-
 fs/ext4/inode.c                   |    5 +-
 fs/fcntl.c                        |    6 +-
 fs/fuse/file.c                    |    3 +-
 fs/gfs2/ops_file.c                |    7 +-
 fs/nfs/dir.c                      |    9 +-
 fs/nfs/file.c                     |    9 +-
 fs/nfsd/nfs4xdr.c                 |   16 ++-
 fs/ocfs2/file.c                   |   94 ++++++++++---
 fs/ocfs2/mmap.c                   |    6 +-
 fs/splice.c                       |  276 +++++++++++++++++++++++--------------
 fs/ubifs/file.c                   |    9 +-
 fs/xfs/linux-2.6/xfs_file.c       |    4 +-
 include/linux/buffer_head.h       |    2 +-
 include/linux/mm.h                |    3 +-
 include/linux/splice.h            |   12 ++
 mm/memory.c                       |  132 +++++++++++++-----
 36 files changed, 547 insertions(+), 282 deletions(-)

