public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [patch 00/39] 2.6.28.7-stable review
@ 2009-02-18 21:30 ` Greg KH
  2009-02-18 21:32   ` [patch 01/39] pid: implement ns_of_pid Greg KH
                     ` (38 more replies)
  0 siblings, 39 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:30 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan


This is the start of the stable review cycle for the 2.6.28.7 release.
There are 39 patches in this series, all will be posted as a response to
this one.  If anyone has any issues with these being applied, please let
us know.  If anyone is a maintainer of the proper subsystem, and wants
to add a Signed-off-by: line to the patch, please respond with it.

These patches are sent out with a number of different people on the Cc:
line.  If you wish to be a reviewer, please email stable@kernel.org to
add your name to the list.  If you want to be off the reviewer list,
also email us.

Responses should be made by Friday, Febuary 20, 2009, 21:00:00 UTC.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.28.7-rc1.gz
and the diffstat can be found below.


thanks,

greg k-h


 Makefile                               |    2 +-
 arch/powerpc/kernel/align.c            |    7 +-
 arch/x86/kernel/traps.c                |   10 +-
 arch/x86/mm/pageattr.c                 |   14 +
 drivers/ata/pata_via.c                 |    4 +-
 drivers/ata/sata_nv.c                  |   14 +-
 drivers/bluetooth/btsdio.c             |    1 +
 drivers/net/3c505.c                    |   26 ++-
 drivers/pci/intel-iommu.c              |   14 +-
 drivers/scsi/libiscsi.c                |    1 +
 drivers/watchdog/Kconfig               |    2 +-
 drivers/watchdog/iTCO_vendor_support.c |   32 ++-
 drivers/watchdog/iTCO_wdt.c            |   35 +--
 fs/ext2/super.c                        |    9 +-
 fs/ext4/balloc.c                       |  168 +++++--------
 fs/ext4/ext4.h                         |   41 +++-
 fs/ext4/ext4_sb.h                      |    4 +-
 fs/ext4/hash.c                         |   77 +++++-
 fs/ext4/ialloc.c                       |  166 ++++++++-----
 fs/ext4/inode.c                        |   76 ++++--
 fs/ext4/mballoc.c                      |  433 +++++++++++++++++++++++++-------
 fs/ext4/mballoc.h                      |   26 +--
 fs/ext4/namei.c                        |   28 ++-
 fs/ext4/resize.c                       |   68 +----
 fs/ext4/super.c                        |   46 +++-
 fs/jbd2/commit.c                       |   27 ++-
 include/linux/jbd2.h                   |    4 +-
 include/linux/mod_devicetable.h        |    7 +
 include/linux/pci_ids.h                |    1 +
 include/linux/pid.h                    |   18 ++
 ipc/mqueue.c                           |    3 +-
 kernel/sched_fair.c                    |   30 ++-
 32 files changed, 938 insertions(+), 456 deletions(-)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 01/39] pid: implement ns_of_pid
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 02/39] mqueue: fix si_pid value in mqueue do_notify() Greg KH
                     ` (37 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Eric W. Biederman, Sukadev Bhattiprolu, Oleg Nesterov,
	Roland McGrath, Bastian Blank, Pavel Emelyanov, Nadia Derbey,
	Serge Hallyn

[-- Attachment #1: pid-implement-ns_of_pid.patch --]
[-- Type: text/plain, Size: 2277 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Eric W. Biederman <ebiederm@xmission.com>

commit f9fb860f67b9542cd78d1558dec7058092b57d8e upstream.

A current problem with the pid namespace is that it is easy to do pid
related work after exit_task_namespaces which drops the nsproxy pointer.

However if we are doing pid namespace related work we are always operating
on some struct pid which retains the pid_namespace pointer of the pid
namespace it was allocated in.

So provide ns_of_pid which allows us to find the pid namespace a pid was
allocated in.

Using this we have the needed infrastructure to do pid namespace related
work at anytime we have a struct pid, removing the chance of accidentally
having a NULL pointer dereference when accessing current->nsproxy.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Bastian Blank <bastian@waldi.eu.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 include/linux/pid.h |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -123,6 +123,24 @@ extern struct pid *alloc_pid(struct pid_
 extern void free_pid(struct pid *pid);
 
 /*
+ * ns_of_pid() returns the pid namespace in which the specified pid was
+ * allocated.
+ *
+ * NOTE:
+ * 	ns_of_pid() is expected to be called for a process (task) that has
+ * 	an attached 'struct pid' (see attach_pid(), detach_pid()) i.e @pid
+ * 	is expected to be non-NULL. If @pid is NULL, caller should handle
+ * 	the resulting NULL pid-ns.
+ */
+static inline struct pid_namespace *ns_of_pid(struct pid *pid)
+{
+	struct pid_namespace *ns = NULL;
+	if (pid)
+		ns = pid->numbers[pid->level].ns;
+	return ns;
+}
+
+/*
  * the helpers to get the pid's id seen from different namespaces
  *
  * pid_nr()    : global id, i.e. the id seen from the init namespace;


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 02/39] mqueue: fix si_pid value in mqueue do_notify()
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
  2009-02-18 21:32   ` [patch 01/39] pid: implement ns_of_pid Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 03/39] WATCHDOG: iTCO_wdt: fix SMI_EN regression 2 Greg KH
                     ` (36 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Nadia Derbey, Sukadev Bhattiprolu, Oleg Nesterov,
	Roland McGrath, Bastian Blank, Pavel Emelyanov, Eric W. Biederman,
	Serge Hallyn

[-- Attachment #1: mqueue-fix-si_pid-value-in-mqueue-do_notify.patch --]
[-- Type: text/plain, Size: 2130 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

commit a6684999f7c6bddd75cf9755ad7ff44435f72fff upstream.

If a process registers for asynchronous notification on a POSIX message
queue, it gets a signal and a siginfo_t structure when a message arrives
on the message queue.  The si_pid in the siginfo_t structure is set to the
PID of the process that sent the message to the message queue.

The principle is the following:
. when mq_notify(SIGEV_SIGNAL) is called, the caller registers for
  notification when a msg arrives. The associated pid structure is stroed into
  inode_info->notify_owner. Let's call this process P1.
. when mq_send() is called by say P2, P2 sends a signal to P1 to notify
  him about msg arrival.

The way .si_pid is set today is not correct, since it doesn't take into account
the fact that the process that is sending the message might not be in the
same namespace as the notified one.

This patch proposes to set si_pid to the sender's pid into the notify_owner
namespace.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Bastian Blank <bastian@waldi.eu.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 ipc/mqueue.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -506,7 +506,8 @@ static void __do_notify(struct mqueue_in
 			sig_i.si_errno = 0;
 			sig_i.si_code = SI_MESGQ;
 			sig_i.si_value = info->notify.sigev_value;
-			sig_i.si_pid = task_tgid_vnr(current);
+			sig_i.si_pid = task_tgid_nr_ns(current,
+						ns_of_pid(info->notify_owner));
 			sig_i.si_uid = current->uid;
 
 			kill_pid_info(info->notify.sigev_signo,


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 03/39] WATCHDOG: iTCO_wdt: fix SMI_EN regression 2
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
  2009-02-18 21:32   ` [patch 01/39] pid: implement ns_of_pid Greg KH
  2009-02-18 21:32   ` [patch 02/39] mqueue: fix si_pid value in mqueue do_notify() Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 04/39] powerpc/vsx: Fix VSX alignment handler for regs 32-63 Greg KH
                     ` (35 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Wim Van Sebroeck

[-- Attachment #1: watchdog-itco_wdt-fix-smi_en-regression-2.patch --]
[-- Type: text/plain, Size: 6946 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Wim Van Sebroeck <wim@iguana.be>

commit 12d60e28bed3f593aac5385acbdbb089eb8ae21e upstream.

bugzilla: #12363
commit 7cd5b08be3c489df11b559fef210b81133764ad4 added a second regression:
some Dell's and Compaq's lockup on boot. So we revert most of the code.
The ICH9 reboot issue remains in place and will need some more fixing... :-(

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/watchdog/Kconfig               |    2 -
 drivers/watchdog/iTCO_vendor_support.c |   32 ++++++++++++++++++++++++++----
 drivers/watchdog/iTCO_wdt.c            |   35 +++++++++++++--------------------
 3 files changed, 43 insertions(+), 26 deletions(-)

--- a/drivers/watchdog/iTCO_vendor_support.c
+++ b/drivers/watchdog/iTCO_vendor_support.c
@@ -1,7 +1,7 @@
 /*
  *	intel TCO vendor specific watchdog driver support
  *
- *	(c) Copyright 2006-2008 Wim Van Sebroeck <wim@iguana.be>.
+ *	(c) Copyright 2006-2009 Wim Van Sebroeck <wim@iguana.be>.
  *
  *	This program is free software; you can redistribute it and/or
  *	modify it under the terms of the GNU General Public License
@@ -19,7 +19,7 @@
 
 /* Module and version information */
 #define DRV_NAME	"iTCO_vendor_support"
-#define DRV_VERSION	"1.02"
+#define DRV_VERSION	"1.03"
 #define PFX		DRV_NAME ": "
 
 /* Includes */
@@ -77,6 +77,26 @@ MODULE_PARM_DESC(vendorsupport, "iTCO ve
  *	    20.6 seconds.
  */
 
+static void supermicro_old_pre_start(unsigned long acpibase)
+{
+	unsigned long val32;
+
+	/* Bit 13: TCO_EN -> 0 = Disables TCO logic generating an SMI# */
+	val32 = inl(SMI_EN);
+	val32 &= 0xffffdfff;	/* Turn off SMI clearing watchdog */
+	outl(val32, SMI_EN);	/* Needed to activate watchdog */
+}
+
+static void supermicro_old_pre_stop(unsigned long acpibase)
+{
+	unsigned long val32;
+
+	/* Bit 13: TCO_EN -> 1 = Enables the TCO logic to generate SMI# */
+	val32 = inl(SMI_EN);
+	val32 |= 0x00002000;	/* Turn on SMI clearing watchdog */
+	outl(val32, SMI_EN);	/* Needed to deactivate watchdog */
+}
+
 static void supermicro_old_pre_keepalive(unsigned long acpibase)
 {
 	/* Reload TCO Timer (done in iTCO_wdt_keepalive) + */
@@ -228,14 +248,18 @@ static void supermicro_new_pre_set_heart
 void iTCO_vendor_pre_start(unsigned long acpibase,
 			   unsigned int heartbeat)
 {
-	if (vendorsupport == SUPERMICRO_NEW_BOARD)
+	if (vendorsupport == SUPERMICRO_OLD_BOARD)
+		supermicro_old_pre_start(acpibase);
+	else if (vendorsupport == SUPERMICRO_NEW_BOARD)
 		supermicro_new_pre_start(heartbeat);
 }
 EXPORT_SYMBOL(iTCO_vendor_pre_start);
 
 void iTCO_vendor_pre_stop(unsigned long acpibase)
 {
-	if (vendorsupport == SUPERMICRO_NEW_BOARD)
+	if (vendorsupport == SUPERMICRO_OLD_BOARD)
+		supermicro_old_pre_stop(acpibase);
+	else if (vendorsupport == SUPERMICRO_NEW_BOARD)
 		supermicro_new_pre_stop();
 }
 EXPORT_SYMBOL(iTCO_vendor_pre_stop);
--- a/drivers/watchdog/iTCO_wdt.c
+++ b/drivers/watchdog/iTCO_wdt.c
@@ -1,7 +1,7 @@
 /*
- *	intel TCO Watchdog Driver (Used in i82801 and i6300ESB chipsets)
+ *	intel TCO Watchdog Driver (Used in i82801 and i63xxESB chipsets)
  *
- *	(c) Copyright 2006-2008 Wim Van Sebroeck <wim@iguana.be>.
+ *	(c) Copyright 2006-2009 Wim Van Sebroeck <wim@iguana.be>.
  *
  *	This program is free software; you can redistribute it and/or
  *	modify it under the terms of the GNU General Public License
@@ -63,7 +63,7 @@
 
 /* Module and version information */
 #define DRV_NAME	"iTCO_wdt"
-#define DRV_VERSION	"1.04"
+#define DRV_VERSION	"1.05"
 #define PFX		DRV_NAME ": "
 
 /* Includes */
@@ -236,16 +236,16 @@ MODULE_DEVICE_TABLE(pci, iTCO_wdt_pci_tb
 
 /* Address definitions for the TCO */
 /* TCO base address */
-#define	TCOBASE		iTCO_wdt_private.ACPIBASE + 0x60
+#define TCOBASE		iTCO_wdt_private.ACPIBASE + 0x60
 /* SMI Control and Enable Register */
-#define	SMI_EN		iTCO_wdt_private.ACPIBASE + 0x30
+#define SMI_EN		iTCO_wdt_private.ACPIBASE + 0x30
 
 #define TCO_RLD		TCOBASE + 0x00	/* TCO Timer Reload and Curr. Value */
 #define TCOv1_TMR	TCOBASE + 0x01	/* TCOv1 Timer Initial Value	*/
-#define	TCO_DAT_IN	TCOBASE + 0x02	/* TCO Data In Register		*/
-#define	TCO_DAT_OUT	TCOBASE + 0x03	/* TCO Data Out Register	*/
-#define	TCO1_STS	TCOBASE + 0x04	/* TCO1 Status Register		*/
-#define	TCO2_STS	TCOBASE + 0x06	/* TCO2 Status Register		*/
+#define TCO_DAT_IN	TCOBASE + 0x02	/* TCO Data In Register		*/
+#define TCO_DAT_OUT	TCOBASE + 0x03	/* TCO Data Out Register	*/
+#define TCO1_STS	TCOBASE + 0x04	/* TCO1 Status Register		*/
+#define TCO2_STS	TCOBASE + 0x06	/* TCO2 Status Register		*/
 #define TCO1_CNT	TCOBASE + 0x08	/* TCO1 Control Register	*/
 #define TCO2_CNT	TCOBASE + 0x0a	/* TCO2 Control Register	*/
 #define TCOv2_TMR	TCOBASE + 0x12	/* TCOv2 Timer Initial Value	*/
@@ -338,7 +338,6 @@ static int iTCO_wdt_unset_NO_REBOOT_bit(
 static int iTCO_wdt_start(void)
 {
 	unsigned int val;
-	unsigned long val32;
 
 	spin_lock(&iTCO_wdt_private.io_lock);
 
@@ -351,11 +350,6 @@ static int iTCO_wdt_start(void)
 		return -EIO;
 	}
 
-	/* Bit 13: TCO_EN -> 0 = Disables TCO logic generating an SMI# */
-	val32 = inl(SMI_EN);
-	val32 &= 0xffffdfff;	/* Turn off SMI clearing watchdog */
-	outl(val32, SMI_EN);
-
 	/* Force the timer to its reload value by writing to the TCO_RLD
 	   register */
 	if (iTCO_wdt_private.iTCO_version == 2)
@@ -378,7 +372,6 @@ static int iTCO_wdt_start(void)
 static int iTCO_wdt_stop(void)
 {
 	unsigned int val;
-	unsigned long val32;
 
 	spin_lock(&iTCO_wdt_private.io_lock);
 
@@ -390,11 +383,6 @@ static int iTCO_wdt_stop(void)
 	outw(val, TCO1_CNT);
 	val = inw(TCO1_CNT);
 
-	/* Bit 13: TCO_EN -> 1 = Enables the TCO logic to generate SMI# */
-	val32 = inl(SMI_EN);
-	val32 |= 0x00002000;
-	outl(val32, SMI_EN);
-
 	/* Set the NO_REBOOT bit to prevent later reboots, just for sure */
 	iTCO_wdt_set_NO_REBOOT_bit();
 
@@ -649,6 +637,7 @@ static int __devinit iTCO_wdt_init(struc
 	int ret;
 	u32 base_address;
 	unsigned long RCBA;
+	unsigned long val32;
 
 	/*
 	 *      Find the ACPI/PM base I/O address which is the base
@@ -695,6 +684,10 @@ static int __devinit iTCO_wdt_init(struc
 		ret = -EIO;
 		goto out;
 	}
+	/* Bit 13: TCO_EN -> 0 = Disables TCO logic generating an SMI# */
+	val32 = inl(SMI_EN);
+	val32 &= 0xffffdfff;	/* Turn off SMI clearing watchdog */
+	outl(val32, SMI_EN);
 
 	/* The TCO I/O registers reside in a 32-byte range pointed to
 	   by the TCOBASE value */
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -399,7 +399,7 @@ config ITCO_WDT
 	---help---
 	  Hardware driver for the intel TCO timer based watchdog devices.
 	  These drivers are included in the Intel 82801 I/O Controller
-	  Hub family (from ICH0 up to ICH8) and in the Intel 6300ESB
+	  Hub family (from ICH0 up to ICH10) and in the Intel 63xxESB
 	  controller hub.
 
 	  The TCO (Total Cost of Ownership) timer is a watchdog timer


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 04/39] powerpc/vsx: Fix VSX alignment handler for regs 32-63
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (2 preceding siblings ...)
  2009-02-18 21:32   ` [patch 03/39] WATCHDOG: iTCO_wdt: fix SMI_EN regression 2 Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 05/39] sata_nv: give up hardreset on nf2 Greg KH
                     ` (34 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Michael Neuling, Benjamin Herrenschmidt

[-- Attachment #1: powerpc-vsx-fix-vsx-alignment-handler-for-regs-32-63.patch --]
[-- Type: text/plain, Size: 1149 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Michael Neuling <mikey@neuling.org>

commit 26456dcfb8d8e43b1b64b2a14710694cf7a72f05 upstream.

Fix the VSX alignment handler for VSX registers > 32.  32-63 are stored
in the VMX part of the thread_struct not the FPR part.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/powerpc/kernel/align.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -646,11 +646,16 @@ static int emulate_vsx(unsigned char __u
 		       unsigned int areg, struct pt_regs *regs,
 		       unsigned int flags, unsigned int length)
 {
-	char *ptr = (char *) &current->thread.TS_FPR(reg);
+	char *ptr;
 	int ret = 0;
 
 	flush_vsx_to_thread(current);
 
+	if (reg < 32)
+		ptr = (char *) &current->thread.TS_FPR(reg);
+	else
+		ptr = (char *) &current->thread.vr[reg - 32];
+
 	if (flags & ST)
 		ret = __copy_to_user(addr, ptr, length);
         else {


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 05/39] sata_nv: give up hardreset on nf2
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (3 preceding siblings ...)
  2009-02-18 21:32   ` [patch 04/39] powerpc/vsx: Fix VSX alignment handler for regs 32-63 Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-03-03 23:53     ` Stefan Lippers-Hollmann
  2009-02-18 21:32   ` [patch 06/39] Fix Intel IOMMU write-buffer flushing Greg KH
                     ` (33 subsequent siblings)
  38 siblings, 1 reply; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Tejun Heo, Robert Hancock, Jeff Garzik

[-- Attachment #1: sata_nv-give-up-hardreset-on-nf2.patch --]
[-- Type: text/plain, Size: 1719 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Tejun Heo <tj@kernel.org>

commit 7dac745b8e367c99175b8f0d014d996f0e5ed9e5 upstream.

Kernel bz#12176 reports that nf2 hardreset simply doesn't work.  Give
up.  Argh...

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Robert Hancock <hancockr@shaw.ca>
Reported-by: Saro <saro_v@hotmail.it>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/ata/sata_nv.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

--- a/drivers/ata/sata_nv.c
+++ b/drivers/ata/sata_nv.c
@@ -421,19 +421,21 @@ static struct ata_port_operations nv_gen
 	.hardreset		= ATA_OP_NULL,
 };
 
-/* OSDL bz3352 reports that nf2/3 controllers can't determine device
- * signature reliably.  Also, the following thread reports detection
- * failure on cold boot with the standard debouncing timing.
+/* nf2 is ripe with hardreset related problems.
+ *
+ * kernel bz#3352 reports nf2/3 controllers can't determine device
+ * signature reliably.  The following thread reports detection failure
+ * on cold boot with the standard debouncing timing.
  *
  * http://thread.gmane.org/gmane.linux.ide/34098
  *
- * Debounce with hotplug timing and request follow-up SRST.
+ * And bz#12176 reports that hardreset simply doesn't work on nf2.
+ * Give up on it and just don't do hardreset.
  */
 static struct ata_port_operations nv_nf2_ops = {
-	.inherits		= &nv_common_ops,
+	.inherits		= &nv_generic_ops,
 	.freeze			= nv_nf2_freeze,
 	.thaw			= nv_nf2_thaw,
-	.hardreset		= nv_noclassify_hardreset,
 };
 
 /* For initial probing after boot and hot plugging, hardreset mostly


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 06/39] Fix Intel IOMMU write-buffer flushing
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (4 preceding siblings ...)
  2009-02-18 21:32   ` [patch 05/39] sata_nv: give up hardreset on nf2 Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 07/39] SCSI: libiscsi: fix iscsi pool leak Greg KH
                     ` (32 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, David Woodhouse

[-- Attachment #1: fix-intel-iommu-write-buffer-flushing.patch --]
[-- Type: text/plain, Size: 1991 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: David Woodhouse <dwmw2@infradead.org>

commit ca77fde8e62cecb2c0769052228d15b901367af8 upstream.

This is the cause of the DMA faults and disk corruption that people have
been seeing. Some chipsets neglect to report the RWBF "capability" --
the flag which says that we need to flush the chipset write-buffer when
changing the DMA page tables, to ensure that the change is visible to
the IOMMU.

Override that bit on the affected chipsets, and everything is happy
again.

Thanks to Chris and Bhavesh and others for helping to debug.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Tested-by: Chris Wright <chrisw@sous-sol.org>
Reviewed-by: Bhavesh Davda <bhavesh@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/pci/intel-iommu.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -506,7 +506,7 @@ static void iommu_flush_write_buffer(str
 	u32 val;
 	unsigned long flag;
 
-	if (!cap_rwbf(iommu->cap))
+	if (!rwbf_quirk && !cap_rwbf(iommu->cap))
 		return;
 	val = iommu->gcmd | DMA_GCMD_WBF;
 
@@ -1315,6 +1315,8 @@ static void domain_remove_dev_info(struc
 	spin_unlock_irqrestore(&device_domain_lock, flags);
 }
 
+static int rwbf_quirk = 0;
+
 /*
  * find_domain
  * Note: we use struct pci_dev->dev.archdata.iommu stores the info
@@ -2436,3 +2438,13 @@ u64 intel_iommu_iova_to_pfn(struct dmar_
 	return pfn >> VTD_PAGE_SHIFT;
 }
 EXPORT_SYMBOL_GPL(intel_iommu_iova_to_pfn);
+
+static void __devinit quirk_iommu_rwbf(struct pci_dev *dev)
+{
+	/* Mobile 4 Series Chipset neglects to set RWBF capability,
+	   but needs it */
+	printk(KERN_INFO "DMAR: Forcing write-buffer flush capability\n");
+	rwbf_quirk = 1;
+}
+
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2a40, quirk_iommu_rwbf);


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 07/39] SCSI: libiscsi: fix iscsi pool leak
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (5 preceding siblings ...)
  2009-02-18 21:32   ` [patch 06/39] Fix Intel IOMMU write-buffer flushing Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 08/39] x86/cpa: make sure cpa is safe to call in lazy mmu mode Greg KH
                     ` (31 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Mike Christie, James Bottomley, Jean Delvare

[-- Attachment #1: scsi-libiscsi-fix-iscsi-pool-leak.patch --]
[-- Type: text/plain, Size: 1024 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Mike Christie <michaelc@cs.wisc.edu>

commit 2f5899a39dcffb404c9a3d06ad438aff3e03bf04 upstream.

I am not sure what happened. It looks like we have always leaked
the q->queue that is allocated from the kfifo_init call. nab finally
noticed that we were leaking and this patch fixes it by adding a
kfree call to iscsi_pool_free. kfifo_free is not used per kfifo_init's
instructions to use kfree.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/scsi/libiscsi.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1899,6 +1899,7 @@ void iscsi_pool_free(struct iscsi_pool *
 		kfree(q->pool[i]);
 	if (q->pool)
 		kfree(q->pool);
+	kfree(q->queue);
 }
 EXPORT_SYMBOL_GPL(iscsi_pool_free);
 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 08/39] x86/cpa: make sure cpa is safe to call in lazy mmu mode
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (6 preceding siblings ...)
  2009-02-18 21:32   ` [patch 07/39] SCSI: libiscsi: fix iscsi pool leak Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 09/39] sched: SCHED_OTHER vs SCHED_IDLE isolation Greg KH
                     ` (30 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Jeremy Fitzhardinge, Marcelo Tosatti, Ingo Molnar

[-- Attachment #1: x86-cpa-make-sure-cpa-is-safe-to-call-in-lazy-mmu-mode.patch --]
[-- Type: text/plain, Size: 1770 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Jeremy Fitzhardinge <jeremy@goop.org>

commit 4f06b0436b2ddbd3b67b10e77098a6862787b3eb upstream.

Impact: fix race leading to crash under KVM and Xen

The CPA code may be called while we're in lazy mmu update mode - for
example, when using DEBUG_PAGE_ALLOC and doing a slab allocation
in an interrupt handler which interrupted a lazy mmu update.  In this
case, the in-memory pagetable state may be out of date due to pending
queued updates.  We need to flush any pending updates before inspecting
the page table.  Similarly, we must explicitly flush any modifications
CPA may have made (which comes down to flushing queued operations when
flushing the TLB).

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/x86/mm/pageattr.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -576,6 +576,13 @@ static int __change_page_attr(struct cpa
 	else
 		address = *cpa->vaddr;
 
+	/*
+	 * If we're called with lazy mmu updates enabled, the
+	 * in-memory pte state may be stale.  Flush pending updates to
+	 * bring them up to date.
+	 */
+	arch_flush_lazy_mmu_mode();
+
 repeat:
 	kpte = lookup_address(address, &level);
 	if (!kpte)
@@ -854,6 +861,13 @@ static int change_page_attr_set_clr(unsi
 	} else
 		cpa_flush_all(cache);
 
+	/*
+	 * If we've been called with lazy mmu updates enabled, then
+	 * make sure that everything gets flushed out before we
+	 * return.
+	 */
+	arch_flush_lazy_mmu_mode();
+
 out:
 	return ret;
 }


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 09/39] sched: SCHED_OTHER vs SCHED_IDLE isolation
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (7 preceding siblings ...)
  2009-02-18 21:32   ` [patch 08/39] x86/cpa: make sure cpa is safe to call in lazy mmu mode Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 10/39] x86, vm86: fix preemption bug Greg KH
                     ` (29 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Mike Galbraith, Peter Zijlstra, Ingo Molnar

[-- Attachment #1: sched-sched_other-vs-sched_idle-isolation.patch --]
[-- Type: text/plain, Size: 2405 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

commit 6bc912b71b6f33b041cfde93ca3f019cbaa852bc upstream.

Stronger SCHED_IDLE isolation:

 - no SCHED_IDLE buddies
 - never let SCHED_IDLE preempt on wakeup
 - always preempt SCHED_IDLE on wakeup
 - limit SLEEPER fairness for SCHED_IDLE.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 kernel/sched_fair.c |   30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -681,9 +681,13 @@ place_entity(struct cfs_rq *cfs_rq, stru
 			unsigned long thresh = sysctl_sched_latency;
 
 			/*
-			 * convert the sleeper threshold into virtual time
+			 * Convert the sleeper threshold into virtual time.
+			 * SCHED_IDLE is a special sub-class.  We care about
+			 * fairness only relative to other SCHED_IDLE tasks,
+			 * all of which have the same weight.
 			 */
-			if (sched_feat(NORMALIZED_SLEEPER))
+			if (sched_feat(NORMALIZED_SLEEPER) &&
+					task_of(se)->policy != SCHED_IDLE)
 				thresh = calc_delta_fair(thresh, se);
 
 			vruntime -= thresh;
@@ -1328,14 +1332,18 @@ wakeup_preempt_entity(struct sched_entit
 
 static void set_last_buddy(struct sched_entity *se)
 {
-	for_each_sched_entity(se)
-		cfs_rq_of(se)->last = se;
+	if (likely(task_of(se)->policy != SCHED_IDLE)) {
+		for_each_sched_entity(se)
+			cfs_rq_of(se)->last = se;
+	}
 }
 
 static void set_next_buddy(struct sched_entity *se)
 {
-	for_each_sched_entity(se)
-		cfs_rq_of(se)->next = se;
+	if (likely(task_of(se)->policy != SCHED_IDLE)) {
+		for_each_sched_entity(se)
+			cfs_rq_of(se)->next = se;
+	}
 }
 
 /*
@@ -1382,12 +1390,18 @@ static void check_preempt_wakeup(struct 
 		return;
 
 	/*
-	 * Batch tasks do not preempt (their preemption is driven by
+	 * Batch and idle tasks do not preempt (their preemption is driven by
 	 * the tick):
 	 */
-	if (unlikely(p->policy == SCHED_BATCH))
+	if (unlikely(p->policy != SCHED_NORMAL))
 		return;
 
+	/* Idle tasks are by definition preempted by everybody. */
+	if (unlikely(curr->policy == SCHED_IDLE)) {
+		resched_task(curr);
+		return;
+	}
+
 	if (!sched_feat(WAKEUP_PREEMPT))
 		return;
 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 10/39] x86, vm86: fix preemption bug
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (8 preceding siblings ...)
  2009-02-18 21:32   ` [patch 09/39] sched: SCHED_OTHER vs SCHED_IDLE isolation Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 11/39] Add support for VT6415 PCIE PATA IDE Host Controller Greg KH
                     ` (28 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Thomas Gleixner, Ingo Molnar

[-- Attachment #1: x86-vm86-fix-preemption-bug.patch --]
[-- Type: text/plain, Size: 2019 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Thomas Gleixner <tglx@linutronix.de>

commit be716615fe596ee117292dc615e95f707fb67fd1 upstream.

Commit 3d2a71a596bd9c761c8487a2178e95f8a61da083 ("x86, traps: converge
do_debug handlers") changed the preemption disable logic of do_debug()
so vm86_handle_trap() is called with preemption disabled resulting in:

 BUG: sleeping function called from invalid context at include/linux/kernel.h:155
 in_atomic(): 1, irqs_disabled(): 0, pid: 3005, name: dosemu.bin
 Pid: 3005, comm: dosemu.bin Tainted: G        W  2.6.29-rc1 #51
 Call Trace:
  [<c050d669>] copy_to_user+0x33/0x108
  [<c04181f4>] save_v86_state+0x65/0x149
  [<c0418531>] handle_vm86_trap+0x20/0x8f
  [<c064e345>] do_debug+0x15b/0x1a4
  [<c064df1f>] debug_stack_correct+0x27/0x2c
  [<c040365b>] sysenter_do_call+0x12/0x2f
 BUG: scheduling while atomic: dosemu.bin/3005/0x10000001

Restore the original calling convention and reenable preemption before
calling handle_vm86_trap().

Reported-by: Michal Suchanek <hramrach@centrum.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 arch/x86/kernel/traps.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -104,6 +104,12 @@ static inline void preempt_conditional_s
 		local_irq_enable();
 }
 
+static inline void conditional_cli(struct pt_regs *regs)
+{
+	if (regs->flags & X86_EFLAGS_IF)
+		local_irq_disable();
+}
+
 static inline void preempt_conditional_cli(struct pt_regs *regs)
 {
 	if (regs->flags & X86_EFLAGS_IF)
@@ -629,8 +635,10 @@ clear_dr7:
 
 #ifdef CONFIG_X86_32
 debug_vm86:
+	/* reenable preemption: handle_vm86_trap() might sleep */
+	dec_preempt_count();
 	handle_vm86_trap((struct kernel_vm86_regs *) regs, error_code, 1);
-	preempt_conditional_cli(regs);
+	conditional_cli(regs);
 	return;
 #endif
 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 11/39] Add support for VT6415 PCIE PATA IDE Host Controller
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (9 preceding siblings ...)
  2009-02-18 21:32   ` [patch 10/39] x86, vm86: fix preemption bug Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 12/39] ext2/xip: refuse to change xip flag during remount with busy inodes Greg KH
                     ` (27 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Zlatko Calusic

[-- Attachment #1: add-support-for-vt6415-pcie-pata-ide-host-controller.patch --]
[-- Type: text/plain, Size: 2059 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Zlatko Calusic <zlatko.calusic@iskon.hr>

commit 5955c7a2cfb6a35429adea5dc480002b15ca8cfc upstream.

Signed-off-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/ata/pata_via.c  |    4 +++-
 include/linux/pci_ids.h |    1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/ata/pata_via.c
+++ b/drivers/ata/pata_via.c
@@ -110,7 +110,8 @@ static const struct via_isa_bridge {
 	{ "vt8237s",	PCI_DEVICE_ID_VIA_8237S,    0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST },
 	{ "vt8251",	PCI_DEVICE_ID_VIA_8251,     0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST },
 	{ "cx700",	PCI_DEVICE_ID_VIA_CX700,    0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST | VIA_SATA_PATA },
-	{ "vt6410",	PCI_DEVICE_ID_VIA_6410,     0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST | VIA_NO_ENABLES},
+	{ "vt6410",	PCI_DEVICE_ID_VIA_6410,     0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST | VIA_NO_ENABLES },
+	{ "vt6415",	PCI_DEVICE_ID_VIA_6415,     0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST | VIA_NO_ENABLES },
 	{ "vt8237a",	PCI_DEVICE_ID_VIA_8237A,    0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST },
 	{ "vt8237",	PCI_DEVICE_ID_VIA_8237,     0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST },
 	{ "vt8235",	PCI_DEVICE_ID_VIA_8235,     0x00, 0x2f, VIA_UDMA_133 | VIA_BAD_AST },
@@ -593,6 +594,7 @@ static int via_reinit_one(struct pci_dev
 #endif
 
 static const struct pci_device_id via[] = {
+	{ PCI_VDEVICE(VIA, 0x0415), },
 	{ PCI_VDEVICE(VIA, 0x0571), },
 	{ PCI_VDEVICE(VIA, 0x0581), },
 	{ PCI_VDEVICE(VIA, 0x1571), },
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -1312,6 +1312,7 @@
 #define PCI_DEVICE_ID_VIA_VT3351	0x0351
 #define PCI_DEVICE_ID_VIA_VT3364	0x0364
 #define PCI_DEVICE_ID_VIA_8371_0	0x0391
+#define PCI_DEVICE_ID_VIA_6415		0x0415
 #define PCI_DEVICE_ID_VIA_8501_0	0x0501
 #define PCI_DEVICE_ID_VIA_82C561	0x0561
 #define PCI_DEVICE_ID_VIA_82C586_1	0x0571


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 12/39] ext2/xip: refuse to change xip flag during remount with busy inodes
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (10 preceding siblings ...)
  2009-02-18 21:32   ` [patch 11/39] Add support for VT6415 PCIE PATA IDE Host Controller Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 13/39] 3c505: do not set pcb->data.raw beyond its size Greg KH
                     ` (26 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Carsten Otte, Nick Piggin, Jared Hulbert,
	Martin Schwidefsky, Heiko Carstens

[-- Attachment #1: ext2-xip-refuse-to-change-xip-flag-during-remount-with-busy-inodes.patch --]
[-- Type: text/plain, Size: 2734 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Carsten Otte <cotte@de.ibm.com>

commit 0e4a9b59282914fe057ab17027f55123964bc2e2 upstream.

For a reason that I was unable to understand in three months of debugging,
mount ext2 -o remount stopped working properly when remounting from
regular operation to xip, or the other way around.  According to a git
bisect search, the problem was introduced with the VM_MIXEDMAP/PTE_SPECIAL
rework in the vm:

commit 70688e4dd1647f0ceb502bbd5964fa344c5eb411
Author: Nick Piggin <npiggin@suse.de>
Date:   Mon Apr 28 02:13:02 2008 -0700

    xip: support non-struct page backed memory

In the failing scenario, the filesystem is mounted read only via root=
kernel parameter on s390x.  During remount (in rc.sysinit), the inodes of
the bash binary and its libraries are busy and cannot be invalidated (the
bash which is running rc.sysinit resides on subject filesystem).
Afterwards, another bash process (running ifup-eth) recurses into a
subshell, runs dup_mm (via fork).  Some of the mappings in this bash
process were created from inodes that could not be invalidated during
remount.

Both parent and child process crash some time later due to inconsistencies
in their address spaces.  The issue seems to be timing sensitive, various
attempts to recreate it have failed.

This patch refuses to change the xip flag during remount in case some
inodes cannot be invalidated.  This patch keeps users from running into
that issue.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext2/super.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -1177,9 +1177,12 @@ static int ext2_remount (struct super_bl
 	es = sbi->s_es;
 	if (((sbi->s_mount_opt & EXT2_MOUNT_XIP) !=
 	    (old_mount_opt & EXT2_MOUNT_XIP)) &&
-	    invalidate_inodes(sb))
-		ext2_warning(sb, __func__, "busy inodes while remounting "\
-			     "xip remain in cache (no functional problem)");
+	    invalidate_inodes(sb)) {
+		ext2_warning(sb, __func__, "refusing change of xip flag "
+			     "with busy inodes while remounting");
+		sbi->s_mount_opt &= ~EXT2_MOUNT_XIP;
+		sbi->s_mount_opt |= old_mount_opt & EXT2_MOUNT_XIP;
+	}
 	if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
 		return 0;
 	if (*flags & MS_RDONLY) {


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 13/39] 3c505: do not set pcb->data.raw beyond its size
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (11 preceding siblings ...)
  2009-02-18 21:32   ` [patch 12/39] ext2/xip: refuse to change xip flag during remount with busy inodes Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 14/39] Bluetooth: Fix TX error path in btsdio driver Greg KH
                     ` (25 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Roel Kluin, David S. Miller

[-- Attachment #1: 3c505-do-not-set-pcb-data.raw-beyond-its-size.patch --]
[-- Type: text/plain, Size: 1817 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Roel Kluin <roel.kluin@gmail.com>

commit 501aa061bd68169a5b54c123641f8dfa9ad31545 upstream.

Ensure that we do not set pcb->data.raw beyond its size, print an error message
and return false if we attempt to. A timout message was printed one too early.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/net/3c505.c |   26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

--- a/drivers/net/3c505.c
+++ b/drivers/net/3c505.c
@@ -493,21 +493,27 @@ static bool receive_pcb(struct net_devic
 	}
 	/* read the data */
 	spin_lock_irqsave(&adapter->lock, flags);
-	i = 0;
-	do {
-		j = 0;
-		while (((stat = get_status(dev->base_addr)) & ACRF) == 0 && j++ < 20000);
-		pcb->data.raw[i++] = inb_command(dev->base_addr);
-		if (i > MAX_PCB_DATA)
-			INVALID_PCB_MSG(i);
-	} while ((stat & ASF_PCB_MASK) != ASF_PCB_END && j < 20000);
+	for (i = 0; i < MAX_PCB_DATA; i++) {
+		for (j = 0; j < 20000; j++) {
+			stat = get_status(dev->base_addr);
+			if (stat & ACRF)
+				break;
+		}
+		pcb->data.raw[i] = inb_command(dev->base_addr);
+		if ((stat & ASF_PCB_MASK) == ASF_PCB_END || j >= 20000)
+			break;
+	}
 	spin_unlock_irqrestore(&adapter->lock, flags);
+	if (i >= MAX_PCB_DATA) {
+		INVALID_PCB_MSG(i);
+		return false;
+	}
 	if (j >= 20000) {
 		TIMEOUT_MSG(__LINE__);
 		return false;
 	}
-	/* woops, the last "data" byte was really the length! */
-	total_length = pcb->data.raw[--i];
+	/* the last "data" byte was really the length! */
+	total_length = pcb->data.raw[i];
 
 	/* safety check total length vs data length */
 	if (total_length != (pcb->length + 2)) {


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 14/39] Bluetooth: Fix TX error path in btsdio driver
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (12 preceding siblings ...)
  2009-02-18 21:32   ` [patch 13/39] 3c505: do not set pcb->data.raw beyond its size Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 15/39] ext4: Add support for non-native signed/unsigned htree hash algorithms Greg KH
                     ` (24 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Tomas Winkler, Marcel Holtmann

[-- Attachment #1: bluetooth-fix-tx-error-path-in-btsdio-driver.patch --]
[-- Type: text/plain, Size: 834 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Tomas Winkler <tomas.winkler@intel.com>

commit 7644d63d1348ec044ccd8f775fefe5eb7cbcac69 upstream.

This patch fixes accumulating of the header in case packet was requeued
in the error path.

Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/bluetooth/btsdio.c |    1 +
 1 file changed, 1 insertion(+)

--- a/drivers/bluetooth/btsdio.c
+++ b/drivers/bluetooth/btsdio.c
@@ -91,6 +91,7 @@ static int btsdio_tx_packet(struct btsdi
 
 	err = sdio_writesb(data->func, REG_TDAT, skb->data, skb->len);
 	if (err < 0) {
+		skb_pull(skb, 4);
 		sdio_writeb(data->func, 0x01, REG_PC_WRT, NULL);
 		return err;
 	}


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 15/39] ext4: Add support for non-native signed/unsigned htree hash algorithms
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (13 preceding siblings ...)
  2009-02-18 21:32   ` [patch 14/39] Bluetooth: Fix TX error path in btsdio driver Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 16/39] ext4: tone down ext4_da_writepages warnings Greg KH
                     ` (23 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4

[-- Attachment #1: ext4-add-support-for-non-native-signed-unsigned-htree-hash-algorithms.patch --]
[-- Type: text/plain, Size: 6862 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: "Theodore Ts'o" <tytso@mit.edu>

(cherry picked from commit f99b25897a86fcfff9140396a97261ae65fed872)

The original ext3 hash algorithms assumed that variables of type char
were signed, as God and K&R intended.  Unfortunately, this assumption
is not true on some architectures.  Userspace support for marking
filesystems with non-native signed/unsigned chars was added two years
ago, but the kernel-side support was never added (until now).

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/ext4.h    |    3 ++
 fs/ext4/ext4_sb.h |    1 
 fs/ext4/hash.c    |   77 ++++++++++++++++++++++++++++++++++++++++++++++--------
 fs/ext4/namei.c   |    7 ++++
 fs/ext4/super.c   |   12 ++++++++
 5 files changed, 90 insertions(+), 10 deletions(-)

--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -891,6 +891,9 @@ static inline __le16 ext4_rec_len_to_dis
 #define DX_HASH_LEGACY		0
 #define DX_HASH_HALF_MD4	1
 #define DX_HASH_TEA		2
+#define DX_HASH_LEGACY_UNSIGNED	3
+#define DX_HASH_HALF_MD4_UNSIGNED	4
+#define DX_HASH_TEA_UNSIGNED		5
 
 #ifdef __KERNEL__
 
--- a/fs/ext4/ext4_sb.h
+++ b/fs/ext4/ext4_sb.h
@@ -57,6 +57,7 @@ struct ext4_sb_info {
 	u32 s_next_generation;
 	u32 s_hash_seed[4];
 	int s_def_hash_version;
+	int s_hash_unsigned;	/* 3 if hash should be signed, 0 if not */
 	struct percpu_counter s_freeblocks_counter;
 	struct percpu_counter s_freeinodes_counter;
 	struct percpu_counter s_dirs_counter;
--- a/fs/ext4/hash.c
+++ b/fs/ext4/hash.c
@@ -35,23 +35,71 @@ static void TEA_transform(__u32 buf[4], 
 
 
 /* The old legacy hash */
-static __u32 dx_hack_hash(const char *name, int len)
+static __u32 dx_hack_hash_unsigned(const char *name, int len)
 {
-	__u32 hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
+	__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
+	const unsigned char *ucp = (const unsigned char *) name;
+
+	while (len--) {
+		hash = hash1 + (hash0 ^ (((int) *ucp++) * 7152373));
+
+		if (hash & 0x80000000)
+			hash -= 0x7fffffff;
+		hash1 = hash0;
+		hash0 = hash;
+	}
+	return hash0 << 1;
+}
+
+static __u32 dx_hack_hash_signed(const char *name, int len)
+{
+	__u32 hash, hash0 = 0x12a3fe2d, hash1 = 0x37abe8f9;
+	const signed char *scp = (const signed char *) name;
+
 	while (len--) {
-		__u32 hash = hash1 + (hash0 ^ (*name++ * 7152373));
+		hash = hash1 + (hash0 ^ (((int) *scp++) * 7152373));
 
-		if (hash & 0x80000000) hash -= 0x7fffffff;
+		if (hash & 0x80000000)
+			hash -= 0x7fffffff;
 		hash1 = hash0;
 		hash0 = hash;
 	}
-	return (hash0 << 1);
+	return hash0 << 1;
+}
+
+static void str2hashbuf_signed(const char *msg, int len, __u32 *buf, int num)
+{
+	__u32	pad, val;
+	int	i;
+	const signed char *scp = (const signed char *) msg;
+
+	pad = (__u32)len | ((__u32)len << 8);
+	pad |= pad << 16;
+
+	val = pad;
+	if (len > num*4)
+		len = num * 4;
+	for (i = 0; i < len; i++) {
+		if ((i % 4) == 0)
+			val = pad;
+		val = ((int) scp[i]) + (val << 8);
+		if ((i % 4) == 3) {
+			*buf++ = val;
+			val = pad;
+			num--;
+		}
+	}
+	if (--num >= 0)
+		*buf++ = val;
+	while (--num >= 0)
+		*buf++ = pad;
 }
 
-static void str2hashbuf(const char *msg, int len, __u32 *buf, int num)
+static void str2hashbuf_unsigned(const char *msg, int len, __u32 *buf, int num)
 {
 	__u32	pad, val;
 	int	i;
+	const unsigned char *ucp = (const unsigned char *) msg;
 
 	pad = (__u32)len | ((__u32)len << 8);
 	pad |= pad << 16;
@@ -62,7 +110,7 @@ static void str2hashbuf(const char *msg,
 	for (i = 0; i < len; i++) {
 		if ((i % 4) == 0)
 			val = pad;
-		val = msg[i] + (val << 8);
+		val = ((int) ucp[i]) + (val << 8);
 		if ((i % 4) == 3) {
 			*buf++ = val;
 			val = pad;
@@ -95,6 +143,8 @@ int ext4fs_dirhash(const char *name, int
 	const char	*p;
 	int		i;
 	__u32		in[8], buf[4];
+	void		(*str2hashbuf)(const char *, int, __u32 *, int) =
+				str2hashbuf_signed;
 
 	/* Initialize the default seed for the hash checksum functions */
 	buf[0] = 0x67452301;
@@ -113,13 +163,18 @@ int ext4fs_dirhash(const char *name, int
 	}
 
 	switch (hinfo->hash_version) {
+	case DX_HASH_LEGACY_UNSIGNED:
+		hash = dx_hack_hash_unsigned(name, len);
+		break;
 	case DX_HASH_LEGACY:
-		hash = dx_hack_hash(name, len);
+		hash = dx_hack_hash_signed(name, len);
 		break;
+	case DX_HASH_HALF_MD4_UNSIGNED:
+		str2hashbuf = str2hashbuf_unsigned;
 	case DX_HASH_HALF_MD4:
 		p = name;
 		while (len > 0) {
-			str2hashbuf(p, len, in, 8);
+			(*str2hashbuf)(p, len, in, 8);
 			half_md4_transform(buf, in);
 			len -= 32;
 			p += 32;
@@ -127,10 +182,12 @@ int ext4fs_dirhash(const char *name, int
 		minor_hash = buf[2];
 		hash = buf[1];
 		break;
+	case DX_HASH_TEA_UNSIGNED:
+		str2hashbuf = str2hashbuf_unsigned;
 	case DX_HASH_TEA:
 		p = name;
 		while (len > 0) {
-			str2hashbuf(p, len, in, 4);
+			(*str2hashbuf)(p, len, in, 4);
 			TEA_transform(buf, in);
 			len -= 16;
 			p += 16;
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -372,6 +372,8 @@ dx_probe(const struct qstr *d_name, stru
 		goto fail;
 	}
 	hinfo->hash_version = root->info.hash_version;
+	if (hinfo->hash_version <= DX_HASH_TEA)
+		hinfo->hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned;
 	hinfo->seed = EXT4_SB(dir->i_sb)->s_hash_seed;
 	if (d_name)
 		ext4fs_dirhash(d_name->name, d_name->len, hinfo);
@@ -641,6 +643,9 @@ int ext4_htree_fill_tree(struct file *di
 	dir = dir_file->f_path.dentry->d_inode;
 	if (!(EXT4_I(dir)->i_flags & EXT4_INDEX_FL)) {
 		hinfo.hash_version = EXT4_SB(dir->i_sb)->s_def_hash_version;
+		if (hinfo.hash_version <= DX_HASH_TEA)
+			hinfo.hash_version +=
+				EXT4_SB(dir->i_sb)->s_hash_unsigned;
 		hinfo.seed = EXT4_SB(dir->i_sb)->s_hash_seed;
 		count = htree_dirblock_to_tree(dir_file, dir, 0, &hinfo,
 					       start_hash, start_minor_hash);
@@ -1408,6 +1413,8 @@ static int make_indexed_dir(handle_t *ha
 
 	/* Initialize as for dx_probe */
 	hinfo.hash_version = root->info.hash_version;
+	if (hinfo.hash_version <= DX_HASH_TEA)
+		hinfo.hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned;
 	hinfo.seed = EXT4_SB(dir->i_sb)->s_hash_seed;
 	ext4fs_dirhash(name, namelen, &hinfo);
 	frame = frames;
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2118,6 +2118,18 @@ static int ext4_fill_super(struct super_
 	for (i = 0; i < 4; i++)
 		sbi->s_hash_seed[i] = le32_to_cpu(es->s_hash_seed[i]);
 	sbi->s_def_hash_version = es->s_def_hash_version;
+	i = le32_to_cpu(es->s_flags);
+	if (i & EXT2_FLAGS_UNSIGNED_HASH)
+		sbi->s_hash_unsigned = 3;
+	else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
+#ifdef __CHAR_UNSIGNED__
+		es->s_flags |= cpu_to_le32(EXT2_FLAGS_UNSIGNED_HASH);
+		sbi->s_hash_unsigned = 3;
+#else
+		es->s_flags |= cpu_to_le32(EXT2_FLAGS_SIGNED_HASH);
+#endif
+		sb->s_dirt = 1;
+	}
 
 	if (sbi->s_blocks_per_group > blocksize * 8) {
 		printk(KERN_ERR


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 16/39] ext4: tone down ext4_da_writepages warnings
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (14 preceding siblings ...)
  2009-02-18 21:32   ` [patch 15/39] ext4: Add support for non-native signed/unsigned htree hash algorithms Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 17/39] ext4: Fix the delalloc writepages to allocate blocks at the right offset Greg KH
                     ` (22 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4

[-- Attachment #1: ext4-tone-down-ext4_da_writepages-warnings.patch --]
[-- Type: text/plain, Size: 2013 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: "Theodore Ts'o" <tytso@mit.edu>

(cherry picked from commit 2a21e37e48b94388f2cc8c0392f104f5443d4bb8)

If the filesystem has errors, ext4_da_writepages() will return a *lot*
of errors, including lots and lots of stack dumps.  While it's true
that we are dropping user data on the floor, which is unfortunate, the
stack dumps aren't helpful, and they tend to obscure the true original
root cause of the problem.  So in the case where the filesystem has
aborted, return an EROFS right away.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/inode.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2388,6 +2388,20 @@ static int ext4_da_writepages(struct add
 	 */
 	if (!mapping->nrpages || !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
 		return 0;
+
+	/*
+	 * If the filesystem has aborted, it is read-only, so return
+	 * right away instead of dumping stack traces later on that
+	 * will obscure the real source of the problem.  We test
+	 * EXT4_MOUNT_ABORT instead of sb->s_flag's MS_RDONLY because
+	 * the latter could be true if the filesystem is mounted
+	 * read-only, and in that case, ext4_da_writepages should
+	 * *never* be called, so if that ever happens, we would want
+	 * the stack trace.
+	 */
+	if (unlikely(sbi->s_mount_opt & EXT4_MOUNT_ABORT))
+		return -EROFS;
+
 	/*
 	 * Make sure nr_to_write is >= sbi->s_mb_stream_request
 	 * This make sure small files blocks are allocated in
@@ -2432,7 +2446,7 @@ static int ext4_da_writepages(struct add
 		handle = ext4_journal_start(inode, needed_blocks);
 		if (IS_ERR(handle)) {
 			ret = PTR_ERR(handle);
-			printk(KERN_EMERG "%s: jbd2_start: "
+			printk(KERN_CRIT "%s: jbd2_start: "
 			       "%ld pages, ino %lu; err %d\n", __func__,
 				wbc->nr_to_write, inode->i_ino, ret);
 			dump_stack();


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 17/39] ext4: Fix the delalloc writepages to allocate blocks at the right offset.
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (15 preceding siblings ...)
  2009-02-18 21:32   ` [patch 16/39] ext4: tone down ext4_da_writepages warnings Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 18/39] ext4: avoid ext4_error when mounting a fs with a single bg Greg KH
                     ` (21 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-fix-the-delalloc-writepages-to-allocate-blocks-at-the-right-offset.patch --]
[-- Type: text/plain, Size: 3563 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit 791b7f08954869d7b8ff438f3dac3cfb39778297)

When iterating through the pages which have mapped buffer_heads, we
failed to update the b_state value. This results in allocating blocks
at logical offset 0.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/inode.c |   56 +++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 17 deletions(-)

--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1644,35 +1644,39 @@ struct mpage_da_data {
  */
 static int mpage_da_submit_io(struct mpage_da_data *mpd)
 {
-	struct address_space *mapping = mpd->inode->i_mapping;
-	int ret = 0, err, nr_pages, i;
-	unsigned long index, end;
-	struct pagevec pvec;
 	long pages_skipped;
+	struct pagevec pvec;
+	unsigned long index, end;
+	int ret = 0, err, nr_pages, i;
+	struct inode *inode = mpd->inode;
+	struct address_space *mapping = inode->i_mapping;
 
 	BUG_ON(mpd->next_page <= mpd->first_page);
-	pagevec_init(&pvec, 0);
+	/*
+	 * We need to start from the first_page to the next_page - 1
+	 * to make sure we also write the mapped dirty buffer_heads.
+	 * If we look at mpd->lbh.b_blocknr we would only be looking
+	 * at the currently mapped buffer_heads.
+	 */
 	index = mpd->first_page;
 	end = mpd->next_page - 1;
 
+	pagevec_init(&pvec, 0);
 	while (index <= end) {
-		/*
-		 * We can use PAGECACHE_TAG_DIRTY lookup here because
-		 * even though we have cleared the dirty flag on the page
-		 * We still keep the page in the radix tree with tag
-		 * PAGECACHE_TAG_DIRTY. See clear_page_dirty_for_io.
-		 * The PAGECACHE_TAG_DIRTY is cleared in set_page_writeback
-		 * which is called via the below writepage callback.
-		 */
-		nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
-					PAGECACHE_TAG_DIRTY,
-					min(end - index,
-					(pgoff_t)PAGEVEC_SIZE-1) + 1);
+		nr_pages = pagevec_lookup(&pvec, mapping, index, PAGEVEC_SIZE);
 		if (nr_pages == 0)
 			break;
 		for (i = 0; i < nr_pages; i++) {
 			struct page *page = pvec.pages[i];
 
+			index = page->index;
+			if (index > end)
+				break;
+			index++;
+
+			BUG_ON(!PageLocked(page));
+			BUG_ON(PageWriteback(page));
+
 			pages_skipped = mpd->wbc->pages_skipped;
 			err = mapping->a_ops->writepage(page, mpd->wbc);
 			if (!err && (pages_skipped == mpd->wbc->pages_skipped))
@@ -2086,11 +2090,29 @@ static int __mpage_da_writepage(struct p
 		bh = head;
 		do {
 			BUG_ON(buffer_locked(bh));
+			/*
+			 * We need to try to allocate
+			 * unmapped blocks in the same page.
+			 * Otherwise we won't make progress
+			 * with the page in ext4_da_writepage
+			 */
 			if (buffer_dirty(bh) &&
 				(!buffer_mapped(bh) || buffer_delay(bh))) {
 				mpage_add_bh_to_extent(mpd, logical, bh);
 				if (mpd->io_done)
 					return MPAGE_DA_EXTENT_TAIL;
+			} else if (buffer_dirty(bh) && (buffer_mapped(bh))) {
+				/*
+				 * mapped dirty buffer. We need to update
+				 * the b_state because we look at
+				 * b_state in mpage_da_map_blocks. We don't
+				 * update b_size because if we find an
+				 * unmapped buffer_head later we need to
+				 * use the b_state flag of that buffer_head.
+				 */
+				if (mpd->lbh.b_size == 0)
+					mpd->lbh.b_state =
+						bh->b_state & BH_FLAGS;
 			}
 			logical++;
 		} while ((bh = bh->b_this_page) != head);


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 18/39] ext4: avoid ext4_error when mounting a fs with a single bg
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (16 preceding siblings ...)
  2009-02-18 21:32   ` [patch 17/39] ext4: Fix the delalloc writepages to allocate blocks at the right offset Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 19/39] ext4: Widen type of ext4_sb_info.s_mb_maxs[] Greg KH
                     ` (20 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-avoid-ext4_error-when-mounting-a-fs-with-a-single-bg.patch --]
[-- Type: text/plain, Size: 1148 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit 565a9617b2151e21b22700e97a8b04e70e103153)

Remove some completely unneeded code which which caused an ext4_error
to be generated when mounting a file system with only a single block
group.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/super.c |    4 ----
 1 file changed, 4 deletions(-)

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1445,7 +1445,6 @@ static int ext4_fill_flex_info(struct su
 	ext4_group_t flex_group_count;
 	ext4_group_t flex_group;
 	int groups_per_flex = 0;
-	__u64 block_bitmap = 0;
 	int i;
 
 	if (!sbi->s_es->s_log_groups_per_flex) {
@@ -1468,9 +1467,6 @@ static int ext4_fill_flex_info(struct su
 		goto failed;
 	}
 
-	gdp = ext4_get_group_desc(sb, 1, &bh);
-	block_bitmap = ext4_block_bitmap(sb, gdp) - 1;
-
 	for (i = 0; i < sbi->s_groups_count; i++) {
 		gdp = ext4_get_group_desc(sb, i, &bh);
 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 19/39] ext4: Widen type of ext4_sb_info.s_mb_maxs[]
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (17 preceding siblings ...)
  2009-02-18 21:32   ` [patch 18/39] ext4: avoid ext4_error when mounting a fs with a single bg Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 20/39] jbd2: Add barrier not supported test to journal_wait_on_commit_record Greg KH
                     ` (19 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Li Zefan, Yasunori Goto, linux-ext4, Miao Xie

[-- Attachment #1: ext4-widen-type-of-ext4_sb_info.s_mb_maxs.patch --]
[-- Type: text/plain, Size: 1564 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Yasunori Goto <y-goto@jp.fujitsu.com>

(cherry picked from commit ff7ef329b268b603ea4a2303241ef1c3829fd574)

I chased the cause of following ext4 oops report which is tested on
ia64 box.

http://bugzilla.kernel.org/show_bug.cgi?id=12018

The cause is the size of s_mb_maxs array that is defined as "unsigned
short" in ext4_sb_info structure.  If the file system's block size is
8k or greater, an unsigned short is not wide enough to contain the
value fs->blocksize << 3.

Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/ext4_sb.h |    3 ++-
 fs/ext4/mballoc.c |    2 ++
 2 files changed, 4 insertions(+), 1 deletion(-)

--- a/fs/ext4/ext4_sb.h
+++ b/fs/ext4/ext4_sb.h
@@ -102,7 +102,8 @@ struct ext4_sb_info {
 	spinlock_t s_reserve_lock;
 	spinlock_t s_md_lock;
 	tid_t s_last_transaction;
-	unsigned short *s_mb_offsets, *s_mb_maxs;
+	unsigned short *s_mb_offsets;
+	unsigned int *s_mb_maxs;
 
 	/* tunables */
 	unsigned long s_stripe;
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2493,6 +2493,8 @@ int ext4_mb_init(struct super_block *sb,
 	if (sbi->s_mb_offsets == NULL) {
 		return -ENOMEM;
 	}
+
+	i = (sb->s_blocksize_bits + 2) * sizeof(unsigned int);
 	sbi->s_mb_maxs = kmalloc(i, GFP_KERNEL);
 	if (sbi->s_mb_maxs == NULL) {
 		kfree(sbi->s_mb_maxs);


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 20/39] jbd2: Add barrier not supported test to journal_wait_on_commit_record
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (18 preceding siblings ...)
  2009-02-18 21:32   ` [patch 19/39] ext4: Widen type of ext4_sb_info.s_mb_maxs[] Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 21/39] ext4: Dont overwrite allocation_context ac_status Greg KH
                     ` (18 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4

[-- Attachment #1: jbd2-add-barrier-not-supported-test-to-journal_wait_on_commit_record.patch --]
[-- Type: text/plain, Size: 2434 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: "Theodore Ts'o" <tytso@mit.edu>

(cherry picked from commit fd98496f467b3d26d05ab1498f41718b5ef13de5)

Xen doesn't report that barriers are not supported until buffer I/O is
reported as completed, instead of when the buffer I/O is submitted.
Add a check and a fallback codepath to journal_wait_on_commit_record()
to detect this case, so that attempts to mount ext4 filesystems on
LVM/devicemapper devices on Xen guests don't blow up with an "Aborting
journal on device XXX"; "Remounting filesystem read-only" error.

Thanks to Andreas Sundstrom for reporting this issue.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/jbd2/commit.c |   27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -25,6 +25,7 @@
 #include <linux/crc32.h>
 #include <linux/writeback.h>
 #include <linux/backing-dev.h>
+#include <linux/bio.h>
 
 /*
  * Default IO end handler for temporary BJ_IO buffer_heads.
@@ -168,12 +169,34 @@ static int journal_submit_commit_record(
  * This function along with journal_submit_commit_record
  * allows to write the commit record asynchronously.
  */
-static int journal_wait_on_commit_record(struct buffer_head *bh)
+static int journal_wait_on_commit_record(journal_t *journal,
+					 struct buffer_head *bh)
 {
 	int ret = 0;
 
+retry:
 	clear_buffer_dirty(bh);
 	wait_on_buffer(bh);
+	if (buffer_eopnotsupp(bh) && (journal->j_flags & JBD2_BARRIER)) {
+		printk(KERN_WARNING
+		       "JBD2: wait_on_commit_record: sync failed on %s - "
+		       "disabling barriers\n", journal->j_devname);
+		spin_lock(&journal->j_state_lock);
+		journal->j_flags &= ~JBD2_BARRIER;
+		spin_unlock(&journal->j_state_lock);
+
+		lock_buffer(bh);
+		clear_buffer_dirty(bh);
+		set_buffer_uptodate(bh);
+		bh->b_end_io = journal_end_buffer_io_sync;
+
+		ret = submit_bh(WRITE_SYNC, bh);
+		if (ret) {
+			unlock_buffer(bh);
+			return ret;
+		}
+		goto retry;
+	}
 
 	if (unlikely(!buffer_uptodate(bh)))
 		ret = -EIO;
@@ -799,7 +822,7 @@ wait_for_iobuf:
 			__jbd2_journal_abort_hard(journal);
 	}
 	if (!err && !is_journal_aborted(journal))
-		err = journal_wait_on_commit_record(cbh);
+		err = journal_wait_on_commit_record(journal, cbh);
 
 	if (err)
 		jbd2_journal_abort(journal, err);


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 21/39] ext4: Dont overwrite allocation_context ac_status
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (19 preceding siblings ...)
  2009-02-18 21:32   ` [patch 20/39] jbd2: Add barrier not supported test to journal_wait_on_commit_record Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 22/39] ext4: Add blocks added during resize to bitmap Greg KH
                     ` (17 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-don-t-overwrite-allocation_context-ac_status.patch --]
[-- Type: text/plain, Size: 1449 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit 032115fcef837a00336ddf7bda584e89789ea498)

We can call ext4_mb_check_limits even after successfully allocating
the requested blocks.  In that case, make sure we don't overwrite
ac_status if it already has the status AC_STATUS_FOUND.  This fixes
the lockdep warning:

=============================================
[ INFO: possible recursive locking detected ]
2.6.28-rc6-autokern1 #1
---------------------------------------------
fsstress/11948 is trying to acquire lock:
 (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278
.....

stack backtrace:
.....
 [<c04db974>] ext4_mb_regular_allocator+0xbb5/0xd44
.....

but task is already holding lock:
 (&meta_group_info[i]->alloc_sem){----}, at: [<c04d9a49>] ext4_mb_load_buddy+0x9f/0x278

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 fs/ext4/mballoc.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1326,6 +1326,8 @@ static void ext4_mb_check_limits(struct 
 	struct ext4_free_extent ex;
 	int max;
 
+	if (ac->ac_status == AC_STATUS_FOUND)
+		return;
 	/*
 	 * We don't want to scan for a whole year
 	 */


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 22/39] ext4: Add blocks added during resize to bitmap
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (20 preceding siblings ...)
  2009-02-18 21:32   ` [patch 21/39] ext4: Dont overwrite allocation_context ac_status Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 23/39] ext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize Greg KH
                     ` (16 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-add-blocks-added-during-resize-to-bitmap.patch --]
[-- Type: text/plain, Size: 10343 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit e21675d4b63975d09eb75c443c48ebe663d23e18)

With this change new blocks added during resize
are marked as free in the block bitmap and the
group is flagged with EXT4_GROUP_INFO_NEED_INIT_BIT
flag.  This makes sure when mballoc tries to allocate
blocks from the new group we would reload the
buddy information using the bitmap present in the disk.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/balloc.c |  138 ++++++++++++-------------------------------------------
 fs/ext4/ext4.h   |    5 -
 fs/ext4/resize.c |   11 ----
 3 files changed, 35 insertions(+), 119 deletions(-)

--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -20,6 +20,7 @@
 #include "ext4.h"
 #include "ext4_jbd2.h"
 #include "group.h"
+#include "mballoc.h"
 
 /*
  * balloc.c contains the blocks allocation and deallocation routines
@@ -350,62 +351,43 @@ ext4_read_block_bitmap(struct super_bloc
 }
 
 /**
- * ext4_free_blocks_sb() -- Free given blocks and update quota
+ * ext4_add_groupblocks() -- Add given blocks to an existing group
  * @handle:			handle to this transaction
  * @sb:				super block
- * @block:			start physcial block to free
+ * @block:			start physcial block to add to the block group
  * @count:			number of blocks to free
- * @pdquot_freed_blocks:	pointer to quota
  *
- * XXX This function is only used by the on-line resizing code, which
- * should probably be fixed up to call the mballoc variant.  There
- * this needs to be cleaned up later; in fact, I'm not convinced this
- * is 100% correct in the face of the mballoc code.  The online resizing
- * code needs to be fixed up to more tightly (and correctly) interlock
- * with the mballoc code.
- */
-void ext4_free_blocks_sb(handle_t *handle, struct super_block *sb,
-			 ext4_fsblk_t block, unsigned long count,
-			 unsigned long *pdquot_freed_blocks)
+ * This marks the blocks as free in the bitmap. We ask the
+ * mballoc to reload the buddy after this by setting group
+ * EXT4_GROUP_INFO_NEED_INIT_BIT flag
+ */
+void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
+			 ext4_fsblk_t block, unsigned long count)
 {
 	struct buffer_head *bitmap_bh = NULL;
 	struct buffer_head *gd_bh;
 	ext4_group_t block_group;
 	ext4_grpblk_t bit;
 	unsigned long i;
-	unsigned long overflow;
 	struct ext4_group_desc *desc;
 	struct ext4_super_block *es;
 	struct ext4_sb_info *sbi;
 	int err = 0, ret;
-	ext4_grpblk_t group_freed;
+	ext4_grpblk_t blocks_freed;
+	struct ext4_group_info *grp;
 
-	*pdquot_freed_blocks = 0;
 	sbi = EXT4_SB(sb);
 	es = sbi->s_es;
-	if (block < le32_to_cpu(es->s_first_data_block) ||
-	    block + count < block ||
-	    block + count > ext4_blocks_count(es)) {
-		ext4_error(sb, "ext4_free_blocks",
-			   "Freeing blocks not in datazone - "
-			   "block = %llu, count = %lu", block, count);
-		goto error_return;
-	}
+	ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
 
-	ext4_debug("freeing block(s) %llu-%llu\n", block, block + count - 1);
-
-do_more:
-	overflow = 0;
 	ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
 	/*
 	 * Check to see if we are freeing blocks across a group
 	 * boundary.
 	 */
 	if (bit + count > EXT4_BLOCKS_PER_GROUP(sb)) {
-		overflow = bit + count - EXT4_BLOCKS_PER_GROUP(sb);
-		count -= overflow;
+		goto error_return;
 	}
-	brelse(bitmap_bh);
 	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
 	if (!bitmap_bh)
 		goto error_return;
@@ -418,18 +400,17 @@ do_more:
 	    in_range(block, ext4_inode_table(sb, desc), sbi->s_itb_per_group) ||
 	    in_range(block + count - 1, ext4_inode_table(sb, desc),
 		     sbi->s_itb_per_group)) {
-		ext4_error(sb, "ext4_free_blocks",
-			   "Freeing blocks in system zones - "
+		ext4_error(sb, __func__,
+			   "Adding blocks in system zones - "
 			   "Block = %llu, count = %lu",
 			   block, count);
 		goto error_return;
 	}
 
 	/*
-	 * We are about to start releasing blocks in the bitmap,
+	 * We are about to add blocks to the bitmap,
 	 * so we need undo access.
 	 */
-	/* @@@ check errors */
 	BUFFER_TRACE(bitmap_bh, "getting undo access");
 	err = ext4_journal_get_undo_access(handle, bitmap_bh);
 	if (err)
@@ -445,87 +426,28 @@ do_more:
 	if (err)
 		goto error_return;
 
-	jbd_lock_bh_state(bitmap_bh);
-
-	for (i = 0, group_freed = 0; i < count; i++) {
-		/*
-		 * An HJ special.  This is expensive...
-		 */
-#ifdef CONFIG_JBD2_DEBUG
-		jbd_unlock_bh_state(bitmap_bh);
-		{
-			struct buffer_head *debug_bh;
-			debug_bh = sb_find_get_block(sb, block + i);
-			if (debug_bh) {
-				BUFFER_TRACE(debug_bh, "Deleted!");
-				if (!bh2jh(bitmap_bh)->b_committed_data)
-					BUFFER_TRACE(debug_bh,
-						"No commited data in bitmap");
-				BUFFER_TRACE2(debug_bh, bitmap_bh, "bitmap");
-				__brelse(debug_bh);
-			}
-		}
-		jbd_lock_bh_state(bitmap_bh);
-#endif
-		if (need_resched()) {
-			jbd_unlock_bh_state(bitmap_bh);
-			cond_resched();
-			jbd_lock_bh_state(bitmap_bh);
-		}
-		/* @@@ This prevents newly-allocated data from being
-		 * freed and then reallocated within the same
-		 * transaction.
-		 *
-		 * Ideally we would want to allow that to happen, but to
-		 * do so requires making jbd2_journal_forget() capable of
-		 * revoking the queued write of a data block, which
-		 * implies blocking on the journal lock.  *forget()
-		 * cannot block due to truncate races.
-		 *
-		 * Eventually we can fix this by making jbd2_journal_forget()
-		 * return a status indicating whether or not it was able
-		 * to revoke the buffer.  On successful revoke, it is
-		 * safe not to set the allocation bit in the committed
-		 * bitmap, because we know that there is no outstanding
-		 * activity on the buffer any more and so it is safe to
-		 * reallocate it.
-		 */
-		BUFFER_TRACE(bitmap_bh, "set in b_committed_data");
-		J_ASSERT_BH(bitmap_bh,
-				bh2jh(bitmap_bh)->b_committed_data != NULL);
-		ext4_set_bit_atomic(sb_bgl_lock(sbi, block_group), bit + i,
-				bh2jh(bitmap_bh)->b_committed_data);
-
-		/*
-		 * We clear the bit in the bitmap after setting the committed
-		 * data bit, because this is the reverse order to that which
-		 * the allocator uses.
-		 */
+	for (i = 0, blocks_freed = 0; i < count; i++) {
 		BUFFER_TRACE(bitmap_bh, "clear bit");
 		if (!ext4_clear_bit_atomic(sb_bgl_lock(sbi, block_group),
 						bit + i, bitmap_bh->b_data)) {
-			jbd_unlock_bh_state(bitmap_bh);
 			ext4_error(sb, __func__,
 				   "bit already cleared for block %llu",
 				   (ext4_fsblk_t)(block + i));
-			jbd_lock_bh_state(bitmap_bh);
 			BUFFER_TRACE(bitmap_bh, "bit already cleared");
 		} else {
-			group_freed++;
+			blocks_freed++;
 		}
 	}
-	jbd_unlock_bh_state(bitmap_bh);
-
 	spin_lock(sb_bgl_lock(sbi, block_group));
-	le16_add_cpu(&desc->bg_free_blocks_count, group_freed);
+	le16_add_cpu(&desc->bg_free_blocks_count, blocks_freed);
 	desc->bg_checksum = ext4_group_desc_csum(sbi, block_group, desc);
 	spin_unlock(sb_bgl_lock(sbi, block_group));
-	percpu_counter_add(&sbi->s_freeblocks_counter, count);
+	percpu_counter_add(&sbi->s_freeblocks_counter, blocks_freed);
 
 	if (sbi->s_log_groups_per_flex) {
 		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
 		spin_lock(sb_bgl_lock(sbi, flex_group));
-		sbi->s_flex_groups[flex_group].free_blocks += count;
+		sbi->s_flex_groups[flex_group].free_blocks += blocks_freed;
 		spin_unlock(sb_bgl_lock(sbi, flex_group));
 	}
 
@@ -536,15 +458,17 @@ do_more:
 	/* And the group descriptor block */
 	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
 	ret = ext4_journal_dirty_metadata(handle, gd_bh);
-	if (!err) err = ret;
-	*pdquot_freed_blocks += group_freed;
-
-	if (overflow && !err) {
-		block += count;
-		count = overflow;
-		goto do_more;
-	}
+	if (!err)
+		err = ret;
 	sb->s_dirt = 1;
+	/*
+	 * request to reload the buddy with the
+	 * new bitmap information
+	 */
+	grp = ext4_get_group_info(sb, block_group);
+	set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &(grp->bb_state));
+	ext4_mb_update_group_info(grp, blocks_freed);
+
 error_return:
 	brelse(bitmap_bh);
 	ext4_std_error(sb, err);
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1009,9 +1009,8 @@ extern int ext4_claim_free_blocks(struct
 extern int ext4_has_free_blocks(struct ext4_sb_info *sbi, s64 nblocks);
 extern void ext4_free_blocks(handle_t *handle, struct inode *inode,
 			ext4_fsblk_t block, unsigned long count, int metadata);
-extern void ext4_free_blocks_sb(handle_t *handle, struct super_block *sb,
-				ext4_fsblk_t block, unsigned long count,
-				unsigned long *pdquot_freed_blocks);
+extern void ext4_add_groupblocks(handle_t *handle, struct super_block *sb,
+				ext4_fsblk_t block, unsigned long count);
 extern ext4_fsblk_t ext4_count_free_blocks(struct super_block *);
 extern void ext4_check_blocks_bitmap(struct super_block *);
 extern struct ext4_group_desc * ext4_get_group_desc(struct super_block * sb,
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -975,9 +975,7 @@ int ext4_group_extend(struct super_block
 	struct buffer_head *bh;
 	handle_t *handle;
 	int err;
-	unsigned long freed_blocks;
 	ext4_group_t group;
-	struct ext4_group_info *grp;
 
 	/* We don't need to worry about locking wrt other resizers just
 	 * yet: we're going to revalidate es->s_blocks_count after
@@ -1076,7 +1074,8 @@ int ext4_group_extend(struct super_block
 	unlock_super(sb);
 	ext4_debug("freeing blocks %llu through %llu\n", o_blocks_count,
 		   o_blocks_count + add);
-	ext4_free_blocks_sb(handle, sb, o_blocks_count, add, &freed_blocks);
+	/* We add the blocks to the bitmap and set the group need init bit */
+	ext4_add_groupblocks(handle, sb, o_blocks_count, add);
 	ext4_debug("freed blocks %llu through %llu\n", o_blocks_count,
 		   o_blocks_count + add);
 	if ((err = ext4_journal_stop(handle)))
@@ -1119,12 +1118,6 @@ int ext4_group_extend(struct super_block
 			ClearPageUptodate(page);
 			page_cache_release(page);
 		}
-
-		/* Get the info on the last group */
-		grp = ext4_get_group_info(sb, group);
-
-		/* Update free blocks in group info */
-		ext4_mb_update_group_info(grp, add);
 	}
 
 	if (test_opt(sb, DEBUG))


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 23/39] ext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (21 preceding siblings ...)
  2009-02-18 21:32   ` [patch 22/39] ext4: Add blocks added during resize to bitmap Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:32   ` [patch 24/39] ext4: cleanup mballoc header files Greg KH
                     ` (15 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-use-ext4_group_info_need_init_bit-during-resize.patch --]
[-- Type: text/plain, Size: 16855 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit 920313a726e04fef0f2c0bcb04ad8229c0e700d8)

The new groups added during resize are flagged as
need_init group. Make sure we properly initialize these
groups. When we have block size < page size and we are adding
new groups the page may still be marked uptodate even though
we haven't initialized the group. While forcing the init
of buddy cache we need to make sure other groups part of the
same page of buddy cache is not using the cache.
group_info->alloc_sem is added to ensure the same.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/balloc.c  |   21 ++--
 fs/ext4/ext4.h    |    7 -
 fs/ext4/mballoc.c |  259 +++++++++++++++++++++++++++++++++++++++++-------------
 fs/ext4/mballoc.h |    3 
 fs/ext4/resize.c  |   49 +---------
 5 files changed, 229 insertions(+), 110 deletions(-)

--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -381,6 +381,7 @@ void ext4_add_groupblocks(handle_t *hand
 	ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
 
 	ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
+	grp = ext4_get_group_info(sb, block_group);
 	/*
 	 * Check to see if we are freeing blocks across a group
 	 * boundary.
@@ -425,7 +426,11 @@ void ext4_add_groupblocks(handle_t *hand
 	err = ext4_journal_get_write_access(handle, gd_bh);
 	if (err)
 		goto error_return;
-
+	/*
+	 * make sure we don't allow a parallel init on other groups in the
+	 * same buddy cache
+	 */
+	down_write(&grp->alloc_sem);
 	for (i = 0, blocks_freed = 0; i < count; i++) {
 		BUFFER_TRACE(bitmap_bh, "clear bit");
 		if (!ext4_clear_bit_atomic(sb_bgl_lock(sbi, block_group),
@@ -450,6 +455,13 @@ void ext4_add_groupblocks(handle_t *hand
 		sbi->s_flex_groups[flex_group].free_blocks += blocks_freed;
 		spin_unlock(sb_bgl_lock(sbi, flex_group));
 	}
+	/*
+	 * request to reload the buddy with the
+	 * new bitmap information
+	 */
+	set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &(grp->bb_state));
+	ext4_mb_update_group_info(grp, blocks_freed);
+	up_write(&grp->alloc_sem);
 
 	/* We dirtied the bitmap block */
 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
@@ -461,13 +473,6 @@ void ext4_add_groupblocks(handle_t *hand
 	if (!err)
 		err = ret;
 	sb->s_dirt = 1;
-	/*
-	 * request to reload the buddy with the
-	 * new bitmap information
-	 */
-	grp = ext4_get_group_info(sb, block_group);
-	set_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &(grp->bb_state));
-	ext4_mb_update_group_info(grp, blocks_freed);
 
 error_return:
 	brelse(bitmap_bh);
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1056,12 +1056,13 @@ extern int __init init_ext4_mballoc(void
 extern void exit_ext4_mballoc(void);
 extern void ext4_mb_free_blocks(handle_t *, struct inode *,
 		unsigned long, unsigned long, int, unsigned long *);
-extern int ext4_mb_add_more_groupinfo(struct super_block *sb,
+extern int ext4_mb_add_groupinfo(struct super_block *sb,
 		ext4_group_t i, struct ext4_group_desc *desc);
 extern void ext4_mb_update_group_info(struct ext4_group_info *grp,
 		ext4_grpblk_t add);
-
-
+extern int ext4_mb_get_buddy_cache_lock(struct super_block *, ext4_group_t);
+extern void ext4_mb_put_buddy_cache_lock(struct super_block *,
+						ext4_group_t, int);
 /* inode.c */
 int ext4_forget(handle_t *handle, int is_metadata, struct inode *inode,
 		struct buffer_head *bh, ext4_fsblk_t blocknr);
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -886,18 +886,20 @@ static noinline_for_stack int
 ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
 					struct ext4_buddy *e4b)
 {
-	struct ext4_sb_info *sbi = EXT4_SB(sb);
-	struct inode *inode = sbi->s_buddy_cache;
 	int blocks_per_page;
 	int block;
 	int pnum;
 	int poff;
 	struct page *page;
 	int ret;
+	struct ext4_group_info *grp;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct inode *inode = sbi->s_buddy_cache;
 
 	mb_debug("load group %lu\n", group);
 
 	blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
+	grp = ext4_get_group_info(sb, group);
 
 	e4b->bd_blkbits = sb->s_blocksize_bits;
 	e4b->bd_info = ext4_get_group_info(sb, group);
@@ -905,6 +907,15 @@ ext4_mb_load_buddy(struct super_block *s
 	e4b->bd_group = group;
 	e4b->bd_buddy_page = NULL;
 	e4b->bd_bitmap_page = NULL;
+	e4b->alloc_semp = &grp->alloc_sem;
+
+	/* Take the read lock on the group alloc
+	 * sem. This would make sure a parallel
+	 * ext4_mb_init_group happening on other
+	 * groups mapped by the page is blocked
+	 * till we are done with allocation
+	 */
+	down_read(e4b->alloc_semp);
 
 	/*
 	 * the buddy cache inode stores the block bitmap
@@ -920,6 +931,14 @@ ext4_mb_load_buddy(struct super_block *s
 	page = find_get_page(inode->i_mapping, pnum);
 	if (page == NULL || !PageUptodate(page)) {
 		if (page)
+			/*
+			 * drop the page reference and try
+			 * to get the page with lock. If we
+			 * are not uptodate that implies
+			 * somebody just created the page but
+			 * is yet to initialize the same. So
+			 * wait for it to initialize.
+			 */
 			page_cache_release(page);
 		page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
 		if (page) {
@@ -985,6 +1004,9 @@ err:
 		page_cache_release(e4b->bd_buddy_page);
 	e4b->bd_buddy = NULL;
 	e4b->bd_bitmap = NULL;
+
+	/* Done with the buddy cache */
+	up_read(e4b->alloc_semp);
 	return ret;
 }
 
@@ -994,6 +1016,8 @@ static void ext4_mb_release_desc(struct 
 		page_cache_release(e4b->bd_bitmap_page);
 	if (e4b->bd_buddy_page)
 		page_cache_release(e4b->bd_buddy_page);
+	/* Done with the buddy cache */
+	up_read(e4b->alloc_semp);
 }
 
 
@@ -1694,6 +1718,173 @@ static int ext4_mb_good_group(struct ext
 	return 0;
 }
 
+/*
+ * lock the group_info alloc_sem of all the groups
+ * belonging to the same buddy cache page. This
+ * make sure other parallel operation on the buddy
+ * cache doesn't happen  whild holding the buddy cache
+ * lock
+ */
+int ext4_mb_get_buddy_cache_lock(struct super_block *sb, ext4_group_t group)
+{
+	int i;
+	int block, pnum;
+	int blocks_per_page;
+	int groups_per_page;
+	ext4_group_t first_group;
+	struct ext4_group_info *grp;
+
+	blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
+	/*
+	 * the buddy cache inode stores the block bitmap
+	 * and buddy information in consecutive blocks.
+	 * So for each group we need two blocks.
+	 */
+	block = group * 2;
+	pnum = block / blocks_per_page;
+	first_group = pnum * blocks_per_page / 2;
+
+	groups_per_page = blocks_per_page >> 1;
+	if (groups_per_page == 0)
+		groups_per_page = 1;
+	/* read all groups the page covers into the cache */
+	for (i = 0; i < groups_per_page; i++) {
+
+		if ((first_group + i) >= EXT4_SB(sb)->s_groups_count)
+			break;
+		grp = ext4_get_group_info(sb, first_group + i);
+		/* take all groups write allocation
+		 * semaphore. This make sure there is
+		 * no block allocation going on in any
+		 * of that groups
+		 */
+		down_write(&grp->alloc_sem);
+	}
+	return i;
+}
+
+void ext4_mb_put_buddy_cache_lock(struct super_block *sb,
+					ext4_group_t group, int locked_group)
+{
+	int i;
+	int block, pnum;
+	int blocks_per_page;
+	ext4_group_t first_group;
+	struct ext4_group_info *grp;
+
+	blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
+	/*
+	 * the buddy cache inode stores the block bitmap
+	 * and buddy information in consecutive blocks.
+	 * So for each group we need two blocks.
+	 */
+	block = group * 2;
+	pnum = block / blocks_per_page;
+	first_group = pnum * blocks_per_page / 2;
+	/* release locks on all the groups */
+	for (i = 0; i < locked_group; i++) {
+
+		grp = ext4_get_group_info(sb, first_group + i);
+		/* take all groups write allocation
+		 * semaphore. This make sure there is
+		 * no block allocation going on in any
+		 * of that groups
+		 */
+		up_write(&grp->alloc_sem);
+	}
+
+}
+
+static int ext4_mb_init_group(struct super_block *sb, ext4_group_t group)
+{
+
+	int ret;
+	void *bitmap;
+	int blocks_per_page;
+	int block, pnum, poff;
+	int num_grp_locked = 0;
+	struct ext4_group_info *this_grp;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct inode *inode = sbi->s_buddy_cache;
+	struct page *page = NULL, *bitmap_page = NULL;
+
+	mb_debug("init group %lu\n", group);
+	blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
+	this_grp = ext4_get_group_info(sb, group);
+	/*
+	 * This ensures we don't add group
+	 * to this buddy cache via resize
+	 */
+	num_grp_locked =  ext4_mb_get_buddy_cache_lock(sb, group);
+	if (!EXT4_MB_GRP_NEED_INIT(this_grp)) {
+		/*
+		 * somebody initialized the group
+		 * return without doing anything
+		 */
+		ret = 0;
+		goto err;
+	}
+	/*
+	 * the buddy cache inode stores the block bitmap
+	 * and buddy information in consecutive blocks.
+	 * So for each group we need two blocks.
+	 */
+	block = group * 2;
+	pnum = block / blocks_per_page;
+	poff = block % blocks_per_page;
+	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	if (page) {
+		BUG_ON(page->mapping != inode->i_mapping);
+		ret = ext4_mb_init_cache(page, NULL);
+		if (ret) {
+			unlock_page(page);
+			goto err;
+		}
+		unlock_page(page);
+	}
+	if (page == NULL || !PageUptodate(page)) {
+		ret = -EIO;
+		goto err;
+	}
+	mark_page_accessed(page);
+	bitmap_page = page;
+	bitmap = page_address(page) + (poff * sb->s_blocksize);
+
+	/* init buddy cache */
+	block++;
+	pnum = block / blocks_per_page;
+	poff = block % blocks_per_page;
+	page = find_or_create_page(inode->i_mapping, pnum, GFP_NOFS);
+	if (page == bitmap_page) {
+		/*
+		 * If both the bitmap and buddy are in
+		 * the same page we don't need to force
+		 * init the buddy
+		 */
+		unlock_page(page);
+	} else if (page) {
+		BUG_ON(page->mapping != inode->i_mapping);
+		ret = ext4_mb_init_cache(page, bitmap);
+		if (ret) {
+			unlock_page(page);
+			goto err;
+		}
+		unlock_page(page);
+	}
+	if (page == NULL || !PageUptodate(page)) {
+		ret = -EIO;
+		goto err;
+	}
+	mark_page_accessed(page);
+err:
+	ext4_mb_put_buddy_cache_lock(sb, group, num_grp_locked);
+	if (bitmap_page)
+		page_cache_release(bitmap_page);
+	if (page)
+		page_cache_release(page);
+	return ret;
+}
+
 static noinline_for_stack int
 ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 {
@@ -1777,7 +1968,7 @@ repeat:
 				group = 0;
 
 			/* quick check to skip empty groups */
-			grp = ext4_get_group_info(ac->ac_sb, group);
+			grp = ext4_get_group_info(sb, group);
 			if (grp->bb_free == 0)
 				continue;
 
@@ -1790,10 +1981,9 @@ repeat:
 				 * we need full data about the group
 				 * to make a good selection
 				 */
-				err = ext4_mb_load_buddy(sb, group, &e4b);
+				err = ext4_mb_init_group(sb, group);
 				if (err)
 					goto out;
-				ext4_mb_release_desc(&e4b);
 			}
 
 			/*
@@ -2302,6 +2492,7 @@ int ext4_mb_add_groupinfo(struct super_b
 	}
 
 	INIT_LIST_HEAD(&meta_group_info[i]->bb_prealloc_list);
+	init_rwsem(&meta_group_info[i]->alloc_sem);
 	meta_group_info[i]->bb_free_root.rb_node = NULL;;
 
 #ifdef DOUBLE_CHECK
@@ -2329,54 +2520,6 @@ exit_meta_group_info:
 } /* ext4_mb_add_groupinfo */
 
 /*
- * Add a group to the existing groups.
- * This function is used for online resize
- */
-int ext4_mb_add_more_groupinfo(struct super_block *sb, ext4_group_t group,
-			       struct ext4_group_desc *desc)
-{
-	struct ext4_sb_info *sbi = EXT4_SB(sb);
-	struct inode *inode = sbi->s_buddy_cache;
-	int blocks_per_page;
-	int block;
-	int pnum;
-	struct page *page;
-	int err;
-
-	/* Add group based on group descriptor*/
-	err = ext4_mb_add_groupinfo(sb, group, desc);
-	if (err)
-		return err;
-
-	/*
-	 * Cache pages containing dynamic mb_alloc datas (buddy and bitmap
-	 * datas) are set not up to date so that they will be re-initilaized
-	 * during the next call to ext4_mb_load_buddy
-	 */
-
-	/* Set buddy page as not up to date */
-	blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
-	block = group * 2;
-	pnum = block / blocks_per_page;
-	page = find_get_page(inode->i_mapping, pnum);
-	if (page != NULL) {
-		ClearPageUptodate(page);
-		page_cache_release(page);
-	}
-
-	/* Set bitmap page as not up to date */
-	block++;
-	pnum = block / blocks_per_page;
-	page = find_get_page(inode->i_mapping, pnum);
-	if (page != NULL) {
-		ClearPageUptodate(page);
-		page_cache_release(page);
-	}
-
-	return 0;
-}
-
-/*
  * Update an existing group.
  * This function is used for online resize
  */
@@ -4585,11 +4728,6 @@ do_more:
 	err = ext4_journal_get_write_access(handle, gd_bh);
 	if (err)
 		goto error_return;
-
-	err = ext4_mb_load_buddy(sb, block_group, &e4b);
-	if (err)
-		goto error_return;
-
 #ifdef AGGRESSIVE_CHECK
 	{
 		int i;
@@ -4603,6 +4741,8 @@ do_more:
 	/* We dirtied the bitmap block */
 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
 	err = ext4_journal_dirty_metadata(handle, bitmap_bh);
+	if (err)
+		goto error_return;
 
 	if (ac) {
 		ac->ac_b_ex.fe_group = block_group;
@@ -4611,6 +4751,9 @@ do_more:
 		ext4_mb_store_history(ac);
 	}
 
+	err = ext4_mb_load_buddy(sb, block_group, &e4b);
+	if (err)
+		goto error_return;
 	if (metadata) {
 		/* blocks being freed are metadata. these blocks shouldn't
 		 * be used until this transaction is committed */
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -20,6 +20,7 @@
 #include <linux/version.h>
 #include <linux/blkdev.h>
 #include <linux/marker.h>
+#include <linux/mutex.h>
 #include "ext4_jbd2.h"
 #include "ext4.h"
 #include "group.h"
@@ -130,6 +131,7 @@ struct ext4_group_info {
 #ifdef DOUBLE_CHECK
 	void		*bb_bitmap;
 #endif
+	struct rw_semaphore alloc_sem;
 	unsigned short	bb_counters[];
 };
 
@@ -250,6 +252,7 @@ struct ext4_buddy {
 	struct super_block *bd_sb;
 	__u16 bd_blkbits;
 	ext4_group_t bd_group;
+	struct rw_semaphore *alloc_semp;
 };
 #define EXT4_MB_BITMAP(e4b)	((e4b)->bd_bitmap)
 #define EXT4_MB_BUDDY(e4b)	((e4b)->bd_buddy)
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -747,6 +747,7 @@ int ext4_group_add(struct super_block *s
 	struct inode *inode = NULL;
 	handle_t *handle;
 	int gdb_off, gdb_num;
+	int num_grp_locked = 0;
 	int err, err2;
 
 	gdb_num = input->group / EXT4_DESC_PER_BLOCK(sb);
@@ -787,6 +788,7 @@ int ext4_group_add(struct super_block *s
 		}
 	}
 
+
 	if ((err = verify_group_input(sb, input)))
 		goto exit_put;
 
@@ -855,6 +857,7 @@ int ext4_group_add(struct super_block *s
          * using the new disk blocks.
          */
 
+	num_grp_locked = ext4_mb_get_buddy_cache_lock(sb, input->group);
 	/* Update group descriptor block for new group */
 	gdp = (struct ext4_group_desc *)((char *)primary->b_data +
 					 gdb_off * EXT4_DESC_SIZE(sb));
@@ -870,9 +873,11 @@ int ext4_group_add(struct super_block *s
 	 * We can allocate memory for mb_alloc based on the new group
 	 * descriptor
 	 */
-	err = ext4_mb_add_more_groupinfo(sb, input->group, gdp);
-	if (err)
+	err = ext4_mb_add_groupinfo(sb, input->group, gdp);
+	if (err) {
+		ext4_mb_put_buddy_cache_lock(sb, input->group, num_grp_locked);
 		goto exit_journal;
+	}
 
 	/*
 	 * Make the new blocks and inodes valid next.  We do this before
@@ -914,6 +919,7 @@ int ext4_group_add(struct super_block *s
 
 	/* Update the global fs size fields */
 	sbi->s_groups_count++;
+	ext4_mb_put_buddy_cache_lock(sb, input->group, num_grp_locked);
 
 	ext4_journal_dirty_metadata(handle, primary);
 
@@ -1081,45 +1087,6 @@ int ext4_group_extend(struct super_block
 	if ((err = ext4_journal_stop(handle)))
 		goto exit_put;
 
-	/*
-	 * Mark mballoc pages as not up to date so that they will be updated
-	 * next time they are loaded by ext4_mb_load_buddy.
-	 *
-	 * XXX Bad, Bad, BAD!!!  We should not be overloading the
-	 * Uptodate flag, particularly on thte bitmap bh, as way of
-	 * hinting to ext4_mb_load_buddy() that it needs to be
-	 * overloaded.  A user could take a LVM snapshot, then do an
-	 * on-line fsck, and clear the uptodate flag, and this would
-	 * not be a bug in userspace, but a bug in the kernel.  FIXME!!!
-	 */
-	{
-		struct ext4_sb_info *sbi = EXT4_SB(sb);
-		struct inode *inode = sbi->s_buddy_cache;
-		int blocks_per_page;
-		int block;
-		int pnum;
-		struct page *page;
-
-		/* Set buddy page as not up to date */
-		blocks_per_page = PAGE_CACHE_SIZE / sb->s_blocksize;
-		block = group * 2;
-		pnum = block / blocks_per_page;
-		page = find_get_page(inode->i_mapping, pnum);
-		if (page != NULL) {
-			ClearPageUptodate(page);
-			page_cache_release(page);
-		}
-
-		/* Set bitmap page as not up to date */
-		block++;
-		pnum = block / blocks_per_page;
-		page = find_get_page(inode->i_mapping, pnum);
-		if (page != NULL) {
-			ClearPageUptodate(page);
-			page_cache_release(page);
-		}
-	}
-
 	if (test_opt(sb, DEBUG))
 		printk(KERN_DEBUG "EXT4-fs: extended group to %llu blocks\n",
 		       ext4_blocks_count(es));


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 24/39] ext4: cleanup mballoc header files
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (22 preceding siblings ...)
  2009-02-18 21:32   ` [patch 23/39] ext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize Greg KH
@ 2009-02-18 21:32   ` Greg KH
  2009-02-18 21:33   ` [patch 25/39] ext4: dont use blocks freed but not yet committed in buddy cache init Greg KH
                     ` (14 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:32 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-cleanup-mballoc-header-files.patch --]
[-- Type: text/plain, Size: 4197 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit c3a326a657562dab81acf05aee106dc1fe345eb4)

Move some of the forward declaration of the static functions
to mballoc.c where they are used. This enables us to include
mballoc.h in other .c files. Also correct the buddy cache
documentation.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/mballoc.c |   23 +++++++++++++++++++----
 fs/ext4/mballoc.h |   18 +-----------------
 2 files changed, 20 insertions(+), 21 deletions(-)

--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -100,7 +100,7 @@
  * inode as:
  *
  *  {                        page                        }
- *  [ group 0 buddy][ group 0 bitmap] [group 1][ group 1]...
+ *  [ group 0 bitmap][ group 0 buddy] [group 1][ group 1]...
  *
  *
  * one block each for bitmap and buddy information.  So for each group we
@@ -330,6 +330,16 @@
  *        object
  *
  */
+static struct kmem_cache *ext4_pspace_cachep;
+static struct kmem_cache *ext4_ac_cachep;
+static struct kmem_cache *ext4_free_ext_cachep;
+static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
+					ext4_group_t group);
+static int ext4_mb_init_per_dev_proc(struct super_block *sb);
+static int ext4_mb_destroy_per_dev_proc(struct super_block *sb);
+static void release_blocks_on_commit(journal_t *journal, transaction_t *txn);
+
+
 
 static inline void *mb_correct_addr_and_bit(int *bit, void *addr)
 {
@@ -716,7 +726,7 @@ static void ext4_mb_generate_buddy(struc
  * stored in the inode as
  *
  * {                        page                        }
- * [ group 0 buddy][ group 0 bitmap] [group 1][ group 1]...
+ * [ group 0 bitmap][ group 0 buddy] [group 1][ group 1]...
  *
  *
  * one block each for bitmap and buddy information.
@@ -1320,8 +1330,13 @@ static void ext4_mb_use_best_found(struc
 	ac->ac_tail = ret & 0xffff;
 	ac->ac_buddy = ret >> 16;
 
-	/* XXXXXXX: SUCH A HORRIBLE **CK */
-	/*FIXME!! Why ? */
+	/*
+	 * take the page reference. We want the page to be pinned
+	 * so that we don't get a ext4_mb_init_cache_call for this
+	 * group until we update the bitmap. That would mean we
+	 * double allocate blocks. The reference is dropped
+	 * in ext4_mb_release_context
+	 */
 	ac->ac_bitmap_page = e4b->bd_bitmap_page;
 	get_page(ac->ac_bitmap_page);
 	ac->ac_buddy_page = e4b->bd_buddy_page;
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -99,9 +99,6 @@
  */
 #define MB_DEFAULT_GROUP_PREALLOC	512
 
-static struct kmem_cache *ext4_pspace_cachep;
-static struct kmem_cache *ext4_ac_cachep;
-static struct kmem_cache *ext4_free_ext_cachep;
 
 struct ext4_free_data {
 	/* this links the free block information from group_info */
@@ -262,25 +259,12 @@ static inline void ext4_mb_store_history
 {
 	return;
 }
-#else
-static void ext4_mb_store_history(struct ext4_allocation_context *ac);
 #endif
 
 #define in_range(b, first, len)	((b) >= (first) && (b) <= (first) + (len) - 1)
 
 struct buffer_head *read_block_bitmap(struct super_block *, ext4_group_t);
 
-static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
-					ext4_group_t group);
-static void ext4_mb_return_to_preallocation(struct inode *inode,
-					struct ext4_buddy *e4b, sector_t block,
-					int count);
-static void ext4_mb_put_pa(struct ext4_allocation_context *,
-			struct super_block *, struct ext4_prealloc_space *pa);
-static int ext4_mb_init_per_dev_proc(struct super_block *sb);
-static int ext4_mb_destroy_per_dev_proc(struct super_block *sb);
-static void release_blocks_on_commit(journal_t *journal, transaction_t *txn);
-
 
 static inline void ext4_lock_group(struct super_block *sb, ext4_group_t group)
 {
@@ -306,7 +290,7 @@ static inline int ext4_is_group_locked(s
 						&(grinfo->bb_state));
 }
 
-static ext4_fsblk_t ext4_grp_offs_to_block(struct super_block *sb,
+static inline ext4_fsblk_t ext4_grp_offs_to_block(struct super_block *sb,
 					struct ext4_free_extent *fex)
 {
 	ext4_fsblk_t block;


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 25/39] ext4: dont use blocks freed but not yet committed in buddy cache init
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (23 preceding siblings ...)
  2009-02-18 21:32   ` [patch 24/39] ext4: cleanup mballoc header files Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 26/39] ext4: Fix race between read_block_bitmap() and mark_diskspace_used() Greg KH
                     ` (13 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-don-t-use-blocks-freed-but-not-yet-committed-in-buddy-cache-init.patch --]
[-- Type: text/plain, Size: 6777 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit 7a2fcbf7f85737735fd44eb34b62315bccf6d6e4)

When we generate buddy cache (especially during resize) we need to
make sure we don't use the blocks freed but not yet comitted.  This
makes sure we have the right value of free blocks count in the group
info and also in the bitmap.  This also ensures the ordered mode
consistency

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/mballoc.c |   82 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 60 insertions(+), 22 deletions(-)

--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -335,6 +335,8 @@ static struct kmem_cache *ext4_ac_cachep
 static struct kmem_cache *ext4_free_ext_cachep;
 static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
 					ext4_group_t group);
+static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
+						ext4_group_t group);
 static int ext4_mb_init_per_dev_proc(struct super_block *sb);
 static int ext4_mb_destroy_per_dev_proc(struct super_block *sb);
 static void release_blocks_on_commit(journal_t *journal, transaction_t *txn);
@@ -858,7 +860,9 @@ static int ext4_mb_init_cache(struct pag
 			/*
 			 * incore got set to the group block bitmap below
 			 */
+			ext4_lock_group(sb, group);
 			ext4_mb_generate_buddy(sb, data, incore, group);
+			ext4_unlock_group(sb, group);
 			incore = NULL;
 		} else {
 			/* this is block of bitmap */
@@ -872,6 +876,7 @@ static int ext4_mb_init_cache(struct pag
 
 			/* mark all preallocated blks used in in-core bitmap */
 			ext4_mb_generate_from_pa(sb, data, group);
+			ext4_mb_generate_from_freelist(sb, data, group);
 			ext4_unlock_group(sb, group);
 
 			/* set incore so that the buddy information can be
@@ -3469,6 +3474,32 @@ ext4_mb_use_preallocated(struct ext4_all
 }
 
 /*
+ * the function goes through all block freed in the group
+ * but not yet committed and marks them used in in-core bitmap.
+ * buddy must be generated from this bitmap
+ * Need to be called with ext4 group lock (ext4_lock_group)
+ */
+static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
+						ext4_group_t group)
+{
+	struct rb_node *n;
+	struct ext4_group_info *grp;
+	struct ext4_free_data *entry;
+
+	grp = ext4_get_group_info(sb, group);
+	n = rb_first(&(grp->bb_free_root));
+
+	while (n) {
+		entry = rb_entry(n, struct ext4_free_data, node);
+		mb_set_bits(sb_bgl_lock(EXT4_SB(sb), group),
+				bitmap, entry->start_blk,
+				entry->count);
+		n = rb_next(n);
+	}
+	return;
+}
+
+/*
  * the function goes through all preallocation in this group and marks them
  * used in in-core bitmap. buddy must be generated from this bitmap
  * Need to be called with ext4 group lock (ext4_lock_group)
@@ -4565,12 +4596,13 @@ static int can_merge(struct ext4_free_da
 
 static noinline_for_stack int
 ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
-			  ext4_group_t group, ext4_grpblk_t block, int count)
+		      struct ext4_free_data *new_entry)
 {
+	ext4_grpblk_t block;
+	struct ext4_free_data *entry;
 	struct ext4_group_info *db = e4b->bd_info;
 	struct super_block *sb = e4b->bd_sb;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
-	struct ext4_free_data *entry, *new_entry;
 	struct rb_node **n = &db->bb_free_root.rb_node, *node;
 	struct rb_node *parent = NULL, *new_node;
 
@@ -4578,14 +4610,9 @@ ext4_mb_free_metadata(handle_t *handle, 
 	BUG_ON(e4b->bd_bitmap_page == NULL);
 	BUG_ON(e4b->bd_buddy_page == NULL);
 
-	new_entry  = kmem_cache_alloc(ext4_free_ext_cachep, GFP_NOFS);
-	new_entry->start_blk = block;
-	new_entry->group  = group;
-	new_entry->count = count;
-	new_entry->t_tid = handle->h_transaction->t_tid;
 	new_node = &new_entry->node;
+	block = new_entry->start_blk;
 
-	ext4_lock_group(sb, group);
 	if (!*n) {
 		/* first free block exent. We need to
 		   protect buddy cache from being freed,
@@ -4603,7 +4630,6 @@ ext4_mb_free_metadata(handle_t *handle, 
 		else if (block >= (entry->start_blk + entry->count))
 			n = &(*n)->rb_right;
 		else {
-			ext4_unlock_group(sb, group);
 			ext4_error(sb, __func__,
 			    "Double free of blocks %d (%d %d)\n",
 			    block, entry->start_blk, entry->count);
@@ -4645,7 +4671,6 @@ ext4_mb_free_metadata(handle_t *handle, 
 	spin_lock(&sbi->s_md_lock);
 	list_add(&new_entry->list, &handle->h_transaction->t_private_list);
 	spin_unlock(&sbi->s_md_lock);
-	ext4_unlock_group(sb, group);
 	return 0;
 }
 
@@ -4750,15 +4775,6 @@ do_more:
 			BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data));
 	}
 #endif
-	mb_clear_bits(sb_bgl_lock(sbi, block_group), bitmap_bh->b_data,
-			bit, count);
-
-	/* We dirtied the bitmap block */
-	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
-	err = ext4_journal_dirty_metadata(handle, bitmap_bh);
-	if (err)
-		goto error_return;
-
 	if (ac) {
 		ac->ac_b_ex.fe_group = block_group;
 		ac->ac_b_ex.fe_start = bit;
@@ -4770,11 +4786,29 @@ do_more:
 	if (err)
 		goto error_return;
 	if (metadata) {
-		/* blocks being freed are metadata. these blocks shouldn't
-		 * be used until this transaction is committed */
-		ext4_mb_free_metadata(handle, &e4b, block_group, bit, count);
+		struct ext4_free_data *new_entry;
+		/*
+		 * blocks being freed are metadata. these blocks shouldn't
+		 * be used until this transaction is committed
+		 */
+		new_entry  = kmem_cache_alloc(ext4_free_ext_cachep, GFP_NOFS);
+		new_entry->start_blk = bit;
+		new_entry->group  = block_group;
+		new_entry->count = count;
+		new_entry->t_tid = handle->h_transaction->t_tid;
+		ext4_lock_group(sb, block_group);
+		mb_clear_bits(sb_bgl_lock(sbi, block_group), bitmap_bh->b_data,
+				bit, count);
+		ext4_mb_free_metadata(handle, &e4b, new_entry);
+		ext4_unlock_group(sb, block_group);
 	} else {
 		ext4_lock_group(sb, block_group);
+		/* need to update group_info->bb_free and bitmap
+		 * with group lock held. generate_buddy look at
+		 * them with group lock_held
+		 */
+		mb_clear_bits(sb_bgl_lock(sbi, block_group), bitmap_bh->b_data,
+				bit, count);
 		mb_free_blocks(inode, &e4b, bit, count);
 		ext4_mb_return_to_preallocation(inode, &e4b, block, count);
 		ext4_unlock_group(sb, block_group);
@@ -4797,6 +4831,10 @@ do_more:
 
 	*freed += count;
 
+	/* We dirtied the bitmap block */
+	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+	err = ext4_journal_dirty_metadata(handle, bitmap_bh);
+
 	/* And the group descriptor block */
 	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
 	ret = ext4_journal_dirty_metadata(handle, gd_bh);


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 26/39] ext4: Fix race between read_block_bitmap() and mark_diskspace_used()
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (24 preceding siblings ...)
  2009-02-18 21:33   ` [patch 25/39] ext4: dont use blocks freed but not yet committed in buddy cache init Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 27/39] ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() Greg KH
                     ` (12 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-fix-race-between-read_block_bitmap-and-mark_diskspace_used.patch --]
[-- Type: text/plain, Size: 2823 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit e8134b27e351e813414da3b95aa8eac6d3908088)

We need to make sure we update the block bitmap and clear
EXT4_BG_BLOCK_UNINIT flag with sb_bgl_lock held, since
ext4_read_block_bitmap() looks at EXT4_BG_BLOCK_UNINIT to decide
whether to initialize the block bitmap each time it is called
(introduced by commit c806e68f), and this can race with block
allocations in ext4_mb_mark_diskspace_used().

ext4_read_block_bitmap does:

spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
	ext4_init_block_bitmap(sb, bh, block_group, desc);

Now on the block allocation side we do

mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
			ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
....
spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
	gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);

ie on allocation we update the bitmap then we take the sb_bgl_lock
and clear the EXT4_BG_BLOCK_UNINIT flag. What can happen is a
parallel ext4_read_block_bitmap can zero out the bitmap in between
the above mb_set_bits and spin_lock(sb_bg_lock..)

The race results in below user visible errors
EXT4-fs error (device sdb1): ext4_mb_release_inode_pa: free 100, pa_free 105
EXT4-fs error (device sdb1): mb_free_blocks: double-free of inode 0's block ..

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/mballoc.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1070,7 +1070,10 @@ static void mb_clear_bits(spinlock_t *lo
 			cur += 32;
 			continue;
 		}
-		mb_clear_bit_atomic(lock, cur, bm);
+		if (lock)
+			mb_clear_bit_atomic(lock, cur, bm);
+		else
+			mb_clear_bit(cur, bm);
 		cur++;
 	}
 }
@@ -1088,7 +1091,10 @@ static void mb_set_bits(spinlock_t *lock
 			cur += 32;
 			continue;
 		}
-		mb_set_bit_atomic(lock, cur, bm);
+		if (lock)
+			mb_set_bit_atomic(lock, cur, bm);
+		else
+			mb_set_bit(cur, bm);
 		cur++;
 	}
 }
@@ -3033,10 +3039,9 @@ ext4_mb_mark_diskspace_used(struct ext4_
 		}
 	}
 #endif
-	mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
-				ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
-
 	spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
+	mb_set_bits(NULL, bitmap_bh->b_data,
+				ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
 	if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
 		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
 		gdp->bg_free_blocks_count =


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 27/39] ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (25 preceding siblings ...)
  2009-02-18 21:33   ` [patch 26/39] ext4: Fix race between read_block_bitmap() and mark_diskspace_used() Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 28/39] jbd2: Add BH_JBDPrivateStart Greg KH
                     ` (11 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-fix-the-race-between-read_inode_bitmap-and-ext4_new_inode.patch --]
[-- Type: text/plain, Size: 7626 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit 393418676a7602e1d7d3f6e560159c65c8cbd50e)

We need to make sure we update the inode bitmap and clear
EXT4_BG_INODE_UNINIT flag with sb_bgl_lock held, since
ext4_read_inode_bitmap() looks at EXT4_BG_INODE_UNINIT to decide
whether to initialize the inode bitmap each time it is called.
(introduced by commit c806e68f.)

ext4_read_inode_bitmap does:

spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
	ext4_init_inode_bitmap(sb, bh, block_group, desc);

and ext4_new_inode does
if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
                   ino, inode_bitmap_bh->b_data))
		   ......
		   ...
spin_lock(sb_bgl_lock(sbi, group));

gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
i.e., on allocation we update the bitmap then we take the sb_bgl_lock
and clear the EXT4_BG_INODE_UNINIT flag. What can happen is a
parallel ext4_read_inode_bitmap can zero out the bitmap in between
the above ext4_set_bit_atomic and spin_lock(sb_bg_lock..)

The race results in below user visible errors
EXT4-fs error (device sdb1): ext4_free_inode: bit already cleared for inode 168449
EXT4-fs warning (device sdb1): ext4_unlink: Deleting nonexistent file ...
EXT4-fs warning (device sdb1): ext4_rmdir: empty directory has too many links ...
ls: /mnt/tmp/f/p369/d3/d6/d39/db2/dee/d10f/d3f/l71: Stale NFS file handle

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/ialloc.c |  140 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 83 insertions(+), 57 deletions(-)

--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -570,6 +570,77 @@ static int find_group_other(struct super
 }
 
 /*
+ * claim the inode from the inode bitmap. If the group
+ * is uninit we need to take the groups's sb_bgl_lock
+ * and clear the uninit flag. The inode bitmap update
+ * and group desc uninit flag clear should be done
+ * after holding sb_bgl_lock so that ext4_read_inode_bitmap
+ * doesn't race with the ext4_claim_inode
+ */
+static int ext4_claim_inode(struct super_block *sb,
+			struct buffer_head *inode_bitmap_bh,
+			unsigned long ino, ext4_group_t group, int mode)
+{
+	int free = 0, retval = 0;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group, NULL);
+
+	spin_lock(sb_bgl_lock(sbi, group));
+	if (ext4_set_bit(ino, inode_bitmap_bh->b_data)) {
+		/* not a free inode */
+		retval = 1;
+		goto err_ret;
+	}
+	ino++;
+	if ((group == 0 && ino < EXT4_FIRST_INO(sb)) ||
+			ino > EXT4_INODES_PER_GROUP(sb)) {
+		spin_unlock(sb_bgl_lock(sbi, group));
+		ext4_error(sb, __func__,
+			   "reserved inode or inode > inodes count - "
+			   "block_group = %lu, inode=%lu", group,
+			   ino + group * EXT4_INODES_PER_GROUP(sb));
+		return 1;
+	}
+	/* If we didn't allocate from within the initialized part of the inode
+	 * table then we need to initialize up to this inode. */
+	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+
+		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
+			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
+			/* When marking the block group with
+			 * ~EXT4_BG_INODE_UNINIT we don't want to depend
+			 * on the value of bg_itable_unused even though
+			 * mke2fs could have initialized the same for us.
+			 * Instead we calculated the value below
+			 */
+
+			free = 0;
+		} else {
+			free = EXT4_INODES_PER_GROUP(sb) -
+				le16_to_cpu(gdp->bg_itable_unused);
+		}
+
+		/*
+		 * Check the relative inode number against the last used
+		 * relative inode number in this group. if it is greater
+		 * we need to  update the bg_itable_unused count
+		 *
+		 */
+		if (ino > free)
+			gdp->bg_itable_unused =
+				cpu_to_le16(EXT4_INODES_PER_GROUP(sb) - ino);
+	}
+	le16_add_cpu(&gdp->bg_free_inodes_count, -1);
+	if (S_ISDIR(mode)) {
+		le16_add_cpu(&gdp->bg_used_dirs_count, 1);
+	}
+	gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
+err_ret:
+	spin_unlock(sb_bgl_lock(sbi, group));
+	return retval;
+}
+
+/*
  * There are two policies for allocating an inode.  If the new inode is
  * a directory, then a forward search is made for a block group with both
  * free space and a low directory-to-inode ratio; if that fails, then of
@@ -652,8 +723,12 @@ repeat_in_this_group:
 			if (err)
 				goto fail;
 
-			if (!ext4_set_bit_atomic(sb_bgl_lock(sbi, group),
-						ino, bitmap_bh->b_data)) {
+			BUFFER_TRACE(bh2, "get_write_access");
+			err = ext4_journal_get_write_access(handle, bh2);
+			if (err)
+				goto fail;
+			if (!ext4_claim_inode(sb, bitmap_bh,
+						ino, group, mode)) {
 				/* we won it */
 				BUFFER_TRACE(bitmap_bh,
 					"call ext4_journal_dirty_metadata");
@@ -661,10 +736,13 @@ repeat_in_this_group:
 								bitmap_bh);
 				if (err)
 					goto fail;
+				/* zero bit is inode number 1*/
+				ino++;
 				goto got;
 			}
 			/* we lost it */
 			jbd2_journal_release_buffer(handle, bitmap_bh);
+			jbd2_journal_release_buffer(handle, bh2);
 
 			if (++ino < EXT4_INODES_PER_GROUP(sb))
 				goto repeat_in_this_group;
@@ -684,21 +762,6 @@ repeat_in_this_group:
 	goto out;
 
 got:
-	ino++;
-	if ((group == 0 && ino < EXT4_FIRST_INO(sb)) ||
-	    ino > EXT4_INODES_PER_GROUP(sb)) {
-		ext4_error(sb, __func__,
-			   "reserved inode or inode > inodes count - "
-			   "block_group = %lu, inode=%lu", group,
-			   ino + group * EXT4_INODES_PER_GROUP(sb));
-		err = -EIO;
-		goto fail;
-	}
-
-	BUFFER_TRACE(bh2, "get_write_access");
-	err = ext4_journal_get_write_access(handle, bh2);
-	if (err) goto fail;
-
 	/* We may have to initialize the block bitmap if it isn't already */
 	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM) &&
 	    gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
@@ -733,47 +796,10 @@ got:
 		if (err)
 			goto fail;
 	}
-
-	spin_lock(sb_bgl_lock(sbi, group));
-	/* If we didn't allocate from within the initialized part of the inode
-	 * table then we need to initialize up to this inode. */
-	if (EXT4_HAS_RO_COMPAT_FEATURE(sb, EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
-		if (gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
-			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_INODE_UNINIT);
-
-			/* When marking the block group with
-			 * ~EXT4_BG_INODE_UNINIT we don't want to depend
-			 * on the value of bg_itable_unused even though
-			 * mke2fs could have initialized the same for us.
-			 * Instead we calculated the value below
-			 */
-
-			free = 0;
-		} else {
-			free = EXT4_INODES_PER_GROUP(sb) -
-				le16_to_cpu(gdp->bg_itable_unused);
-		}
-
-		/*
-		 * Check the relative inode number against the last used
-		 * relative inode number in this group. if it is greater
-		 * we need to  update the bg_itable_unused count
-		 *
-		 */
-		if (ino > free)
-			gdp->bg_itable_unused =
-				cpu_to_le16(EXT4_INODES_PER_GROUP(sb) - ino);
-	}
-
-	le16_add_cpu(&gdp->bg_free_inodes_count, -1);
-	if (S_ISDIR(mode)) {
-		le16_add_cpu(&gdp->bg_used_dirs_count, 1);
-	}
-	gdp->bg_checksum = ext4_group_desc_csum(sbi, group, gdp);
-	spin_unlock(sb_bgl_lock(sbi, group));
-	BUFFER_TRACE(bh2, "call ext4_journal_dirty_metadata");
+	BUFFER_TRACE(bh2, "call ext4_handle_dirty_metadata");
 	err = ext4_journal_dirty_metadata(handle, bh2);
-	if (err) goto fail;
+	if (err)
+		goto fail;
 
 	percpu_counter_dec(&sbi->s_freeinodes_counter);
 	if (S_ISDIR(mode))


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 28/39] jbd2: Add BH_JBDPrivateStart
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (26 preceding siblings ...)
  2009-02-18 21:33   ` [patch 27/39] ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 29/39] ext4: Use new buffer_head flag to check uninit group bitmaps initialization Greg KH
                     ` (10 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Mark Fasheh, linux-ext4

[-- Attachment #1: jbd2-add-bh_jbdprivatestart.patch --]
[-- Type: text/plain, Size: 974 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Mark Fasheh <mfasheh@suse.com>

(cherry picked from commit e97fcd95a4778a8caf1980c6c72fdf68185a0838)

Add this so that file systems using JBD2 can safely allocate unused b_state
bits.

In this case, we add it so that Ocfs2 can define a single bit for tracking
the validation state of a buffer.

Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 include/linux/jbd2.h |    1 +
 1 file changed, 1 insertion(+)

--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -329,6 +329,7 @@ enum jbd_state_bits {
 	BH_State,		/* Pins most journal_head state */
 	BH_JournalHead,		/* Pins bh->b_private and jh->b_bh */
 	BH_Unshadow,		/* Dummy bit, for BJ_Shadow wakeup filtering */
+	BH_JBDPrivateStart,	/* First bit available for private use by FS */
 };
 
 BUFFER_FNS(JBD, jbd)


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 29/39] ext4: Use new buffer_head flag to check uninit group bitmaps initialization
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (27 preceding siblings ...)
  2009-02-18 21:33   ` [patch 28/39] jbd2: Add BH_JBDPrivateStart Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 30/39] ext4: mark the blocks/inode bitmap beyond end of group as used Greg KH
                     ` (9 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-use-new-buffer_head-flag-to-check-uninit-group-bitmaps-initialization.patch --]
[-- Type: text/plain, Size: 5792 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit 2ccb5fb9f113dae969d1ae9b6c10e80fa34f8cd3)

For uninit block group, the on-disk bitmap is not initialized. That
implies we cannot depend on the uptodate flag on the bitmap
buffer_head to find bitmap validity.  Use a new buffer_head flag which
would be set after we properly initialize the bitmap.  This also
prevents (re-)initializing the uninit group bitmap every time we call
ext4_read_block_bitmap().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/balloc.c  |   25 +++++++++++++++++++++++--
 fs/ext4/ext4.h    |   19 +++++++++++++++++++
 fs/ext4/ialloc.c  |   24 ++++++++++++++++++++++--
 fs/ext4/mballoc.c |   24 ++++++++++++++++++++++--
 4 files changed, 86 insertions(+), 6 deletions(-)

--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -320,20 +320,41 @@ ext4_read_block_bitmap(struct super_bloc
 			    block_group, bitmap_blk);
 		return NULL;
 	}
-	if (buffer_uptodate(bh) &&
-	    !(desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))
+
+	if (bitmap_uptodate(bh))
 		return bh;
 
 	lock_buffer(bh);
+	if (bitmap_uptodate(bh)) {
+		unlock_buffer(bh);
+		return bh;
+	}
 	spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
 	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
 		ext4_init_block_bitmap(sb, bh, block_group, desc);
+		set_bitmap_uptodate(bh);
 		set_buffer_uptodate(bh);
 		unlock_buffer(bh);
 		spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
 		return bh;
 	}
 	spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
+	if (buffer_uptodate(bh)) {
+		/*
+		 * if not uninit if bh is uptodate,
+		 * bitmap is also uptodate
+		 */
+		set_bitmap_uptodate(bh);
+		unlock_buffer(bh);
+		return bh;
+	}
+	/*
+	 * submit the buffer_head for read. We can
+	 * safely mark the bitmap as uptodate now.
+	 * We do it here so the bitmap uptodate bit
+	 * get set with buffer lock held.
+	 */
+	set_bitmap_uptodate(bh);
 	if (bh_submit_read(bh) < 0) {
 		put_bh(bh);
 		ext4_error(sb, __func__,
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -19,6 +19,7 @@
 #include <linux/types.h>
 #include <linux/blkdev.h>
 #include <linux/magic.h>
+#include <linux/jbd2.h>
 #include "ext4_i.h"
 
 /*
@@ -1286,6 +1287,24 @@ extern int ext4_get_blocks_wrap(handle_t
 			sector_t block, unsigned long max_blocks,
 			struct buffer_head *bh, int create,
 			int extend_disksize, int flag);
+
+/*
+ * Add new method to test wether block and inode bitmaps are properly
+ * initialized. With uninit_bg reading the block from disk is not enough
+ * to mark the bitmap uptodate. We need to also zero-out the bitmap
+ */
+#define BH_BITMAP_UPTODATE BH_JBDPrivateStart
+
+static inline int bitmap_uptodate(struct buffer_head *bh)
+{
+	return (buffer_uptodate(bh) &&
+			test_bit(BH_BITMAP_UPTODATE, &(bh)->b_state));
+}
+static inline void set_bitmap_uptodate(struct buffer_head *bh)
+{
+	set_bit(BH_BITMAP_UPTODATE, &(bh)->b_state);
+}
+
 #endif	/* __KERNEL__ */
 
 #endif	/* _EXT4_H */
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -115,20 +115,40 @@ ext4_read_inode_bitmap(struct super_bloc
 			    block_group, bitmap_blk);
 		return NULL;
 	}
-	if (buffer_uptodate(bh) &&
-	    !(desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)))
+	if (bitmap_uptodate(bh))
 		return bh;
 
 	lock_buffer(bh);
+	if (bitmap_uptodate(bh)) {
+		unlock_buffer(bh);
+		return bh;
+	}
 	spin_lock(sb_bgl_lock(EXT4_SB(sb), block_group));
 	if (desc->bg_flags & cpu_to_le16(EXT4_BG_INODE_UNINIT)) {
 		ext4_init_inode_bitmap(sb, bh, block_group, desc);
+		set_bitmap_uptodate(bh);
 		set_buffer_uptodate(bh);
 		unlock_buffer(bh);
 		spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
 		return bh;
 	}
 	spin_unlock(sb_bgl_lock(EXT4_SB(sb), block_group));
+	if (buffer_uptodate(bh)) {
+		/*
+		 * if not uninit if bh is uptodate,
+		 * bitmap is also uptodate
+		 */
+		set_bitmap_uptodate(bh);
+		unlock_buffer(bh);
+		return bh;
+	}
+	/*
+	 * submit the buffer_head for read. We can
+	 * safely mark the bitmap as uptodate now.
+	 * We do it here so the bitmap uptodate bit
+	 * get set with buffer lock held.
+	 */
+	set_bitmap_uptodate(bh);
 	if (bh_submit_read(bh) < 0) {
 		put_bh(bh);
 		ext4_error(sb, __func__,
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -794,22 +794,42 @@ static int ext4_mb_init_cache(struct pag
 		if (bh[i] == NULL)
 			goto out;
 
-		if (buffer_uptodate(bh[i]) &&
-		    !(desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))
+		if (bitmap_uptodate(bh[i]))
 			continue;
 
 		lock_buffer(bh[i]);
+		if (bitmap_uptodate(bh[i])) {
+			unlock_buffer(bh[i]);
+			continue;
+		}
 		spin_lock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
 		if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
 			ext4_init_block_bitmap(sb, bh[i],
 						first_group + i, desc);
+			set_bitmap_uptodate(bh[i]);
 			set_buffer_uptodate(bh[i]);
 			unlock_buffer(bh[i]);
 			spin_unlock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
 			continue;
 		}
 		spin_unlock(sb_bgl_lock(EXT4_SB(sb), first_group + i));
+		if (buffer_uptodate(bh[i])) {
+			/*
+			 * if not uninit if bh is uptodate,
+			 * bitmap is also uptodate
+			 */
+			set_bitmap_uptodate(bh[i]);
+			unlock_buffer(bh[i]);
+			continue;
+		}
 		get_bh(bh[i]);
+		/*
+		 * submit the buffer_head for read. We can
+		 * safely mark the bitmap as uptodate now.
+		 * We do it here so the bitmap uptodate bit
+		 * get set with buffer lock held.
+		 */
+		set_bitmap_uptodate(bh[i]);
 		bh[i]->b_end_io = end_buffer_read_sync;
 		submit_bh(READ, bh[i]);
 		mb_debug("read bitmap for group %lu\n", first_group + i);


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 30/39] ext4: mark the blocks/inode bitmap beyond end of group as used
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (28 preceding siblings ...)
  2009-02-18 21:33   ` [patch 29/39] ext4: Use new buffer_head flag to check uninit group bitmaps initialization Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 31/39] ext4: Dont allow new groups to be added during block allocation Greg KH
                     ` (8 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-mark-the-blocks-inode-bitmap-beyond-end-of-group-as-used.patch --]
[-- Type: text/plain, Size: 2394 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit 648f5879f5892dddd3ba71cd0d285599f40f2512)

We need to mark the block/inode bitmap beyond the end of the group
with '1'.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/ialloc.c  |    2 +-
 fs/ext4/mballoc.c |    4 ++--
 fs/ext4/resize.c  |    6 ++----
 3 files changed, 5 insertions(+), 7 deletions(-)

--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -84,7 +84,7 @@ unsigned ext4_init_inode_bitmap(struct s
 	}
 
 	memset(bh->b_data, 0, (EXT4_INODES_PER_GROUP(sb) + 7) / 8);
-	mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), EXT4_BLOCKS_PER_GROUP(sb),
+	mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), sb->s_blocksize * 8,
 			bh->b_data);
 
 	return EXT4_INODES_PER_GROUP(sb);
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -3036,8 +3036,8 @@ ext4_mb_mark_diskspace_used(struct ext4_
 	    in_range(block + len - 1, ext4_inode_table(sb, gdp),
 		     EXT4_SB(sb)->s_itb_per_group)) {
 		ext4_error(sb, __func__,
-			   "Allocating block in system zone - block = %llu",
-			   block);
+			   "Allocating block %llu in system zone of %d group\n",
+			   block, ac->ac_b_ex.fe_group);
 		/* File system mounted not to panic on error
 		 * Fix the bitmap and repeat the block allocation
 		 * We leak some of the blocks here.
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -284,11 +284,9 @@ static int setup_new_group_blocks(struct
 	if ((err = extend_or_restart_transaction(handle, 2, bh)))
 		goto exit_bh;
 
-	mark_bitmap_end(input->blocks_count, EXT4_BLOCKS_PER_GROUP(sb),
-			bh->b_data);
+	mark_bitmap_end(input->blocks_count, sb->s_blocksize * 8, bh->b_data);
 	ext4_journal_dirty_metadata(handle, bh);
 	brelse(bh);
-
 	/* Mark unused entries in inode bitmap used */
 	ext4_debug("clear inode bitmap %#04llx (+%llu)\n",
 		   input->inode_bitmap, input->inode_bitmap - start);
@@ -297,7 +295,7 @@ static int setup_new_group_blocks(struct
 		goto exit_journal;
 	}
 
-	mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), EXT4_BLOCKS_PER_GROUP(sb),
+	mark_bitmap_end(EXT4_INODES_PER_GROUP(sb), sb->s_blocksize * 8,
 			bh->b_data);
 	ext4_journal_dirty_metadata(handle, bh);
 exit_bh:


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 31/39] ext4: Dont allow new groups to be added during block allocation
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (29 preceding siblings ...)
  2009-02-18 21:33   ` [patch 30/39] ext4: mark the blocks/inode bitmap beyond end of group as used Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 32/39] ext4: Init the complete page while building buddy cache Greg KH
                     ` (7 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-don-t-allow-new-groups-to-be-added-during-block-allocation.patch --]
[-- Type: text/plain, Size: 2885 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit 8556e8f3b6c4c11601ce1e9ea8090a6d8bd5daae)

After we mark the blocks in the buddy cache as allocated,
we need to ensure that we don't reinit the buddy cache until
the block bitmap is updated.  This commit achieves this by holding
the group_info alloc_semaphore till ext4_mb_release_context

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/mballoc.c |   16 +++++++++++++---
 fs/ext4/mballoc.h |    5 +++++
 2 files changed, 18 insertions(+), 3 deletions(-)

--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1052,7 +1052,8 @@ static void ext4_mb_release_desc(struct 
 	if (e4b->bd_buddy_page)
 		page_cache_release(e4b->bd_buddy_page);
 	/* Done with the buddy cache */
-	up_read(e4b->alloc_semp);
+	if (e4b->alloc_semp)
+		up_read(e4b->alloc_semp);
 }
 
 
@@ -1372,7 +1373,9 @@ static void ext4_mb_use_best_found(struc
 	get_page(ac->ac_bitmap_page);
 	ac->ac_buddy_page = e4b->bd_buddy_page;
 	get_page(ac->ac_buddy_page);
-
+	/* on allocation we use ac to track the held semaphore */
+	ac->alloc_semp =  e4b->alloc_semp;
+	e4b->alloc_semp = NULL;
 	/* store last allocated for subsequent stream allocation */
 	if ((ac->ac_flags & EXT4_MB_HINT_DATA)) {
 		spin_lock(&sbi->s_md_lock);
@@ -4286,6 +4289,7 @@ ext4_mb_initialize_context(struct ext4_a
 	ac->ac_pa = NULL;
 	ac->ac_bitmap_page = NULL;
 	ac->ac_buddy_page = NULL;
+	ac->alloc_semp = NULL;
 	ac->ac_lg = NULL;
 
 	/* we have to define context: we'll we work with a file or
@@ -4466,6 +4470,8 @@ static int ext4_mb_release_context(struc
 		}
 		ext4_mb_put_pa(ac, ac->ac_sb, pa);
 	}
+	if (ac->alloc_semp)
+		up_read(ac->alloc_semp);
 	if (ac->ac_bitmap_page)
 		page_cache_release(ac->ac_bitmap_page);
 	if (ac->ac_buddy_page)
@@ -4566,10 +4572,14 @@ repeat:
 				ac->ac_o_ex.fe_len < ac->ac_b_ex.fe_len)
 			ext4_mb_new_preallocation(ac);
 	}
-
 	if (likely(ac->ac_status == AC_STATUS_FOUND)) {
 		*errp = ext4_mb_mark_diskspace_used(ac, handle, reserv_blks);
 		if (*errp ==  -EAGAIN) {
+			/*
+			 * drop the reference that we took
+			 * in ext4_mb_use_best_found
+			 */
+			ext4_mb_release_context(ac);
 			ac->ac_b_ex.fe_group = 0;
 			ac->ac_b_ex.fe_start = 0;
 			ac->ac_b_ex.fe_len = 0;
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -216,6 +216,11 @@ struct ext4_allocation_context {
 	__u8 ac_op;		/* operation, for history only */
 	struct page *ac_bitmap_page;
 	struct page *ac_buddy_page;
+	/*
+	 * pointer to the held semaphore upon successful
+	 * block allocation
+	 */
+	struct rw_semaphore *alloc_semp;
 	struct ext4_prealloc_space *ac_pa;
 	struct ext4_locality_group *ac_lg;
 };


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 32/39] ext4: Init the complete page while building buddy cache
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (30 preceding siblings ...)
  2009-02-18 21:33   ` [patch 31/39] ext4: Dont allow new groups to be added during block allocation Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 33/39] ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc Greg KH
                     ` (6 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-init-the-complete-page-while-building-buddy-cache.patch --]
[-- Type: text/plain, Size: 1440 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit 29eaf024980e07cc01f31ae4ea5d68c917f4b7da)

We need to init the complete page during buddy cache init
by setting the contents to '1'.  Otherwise we can see the
following errors after doing an online resize of the
filesystem:

EXT4-fs error (device sdb1): ext4_mb_mark_diskspace_used:
	Allocating block 1040385 in system zone of 127 group

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/mballoc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -846,6 +846,8 @@ static int ext4_mb_init_cache(struct pag
 
 	err = 0;
 	first_block = page->index * blocks_per_page;
+	/* init the page  */
+	memset(page_address(page), 0xff, PAGE_CACHE_SIZE);
 	for (i = 0; i < blocks_per_page; i++) {
 		int group;
 		struct ext4_group_info *grinfo;
@@ -872,7 +874,6 @@ static int ext4_mb_init_cache(struct pag
 			BUG_ON(incore == NULL);
 			mb_debug("put buddy for group %u in page %lu/%x\n",
 				group, page->index, i * blocksize);
-			memset(data, 0xff, blocksize);
 			grinfo = ext4_get_group_info(sb, group);
 			grinfo->bb_fragments = 0;
 			memset(grinfo->bb_counters, 0,


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 33/39] ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (31 preceding siblings ...)
  2009-02-18 21:33   ` [patch 32/39] ext4: Init the complete page while building buddy cache Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 34/39] ext4: Add sanity checks for the superblock before mounting the filesystem Greg KH
                     ` (5 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4, Aneesh Kumar K.V

[-- Attachment #1: ext4-fix-s_dirty_blocks_counter-if-block-allocation-failed-with-nodelalloc.patch --]
[-- Type: text/plain, Size: 1390 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

(cherry picked from commit 0087d9fb3f29f59e8d42c8b058376d80e5adde4c)

With nodelalloc option we need to update the dirty block counter on
block allocation failure. This is needed because we increment the
dirty block counter early in the block allocation phase. Without
the patch s_dirty_blocks_counter goes wrong so that filesystem's
free blocks decreases incorrectly.

Tested-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/mballoc.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4538,7 +4538,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t
 	}
 	if (ar->len == 0) {
 		*errp = -EDQUOT;
-		return 0;
+		goto out3;
 	}
 	inquota = ar->len;
 
@@ -4611,6 +4611,13 @@ out2:
 out1:
 	if (ar->len < inquota)
 		DQUOT_FREE_BLOCK(ar->inode, inquota - ar->len);
+out3:
+	if (!ar->len) {
+		if (!EXT4_I(ar->inode)->i_delalloc_reserved_flag)
+			/* release all the reserved blocks if non delalloc */
+			percpu_counter_sub(&sbi->s_dirtyblocks_counter,
+						reserv_blks);
+	}
 
 	return block;
 }


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 34/39] ext4: Add sanity checks for the superblock before mounting the filesystem
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (32 preceding siblings ...)
  2009-02-18 21:33   ` [patch 33/39] ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 35/39] ext4: only use i_size_high for regular files Greg KH
                     ` (4 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Thiemo Nagel, linux-ext4

[-- Attachment #1: ext4-add-sanity-checks-for-the-superblock-before-mounting-the-filesystem.patch --]
[-- Type: text/plain, Size: 2799 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: "Theodore Ts'o" <tytso@mit.edu>

(cherry picked from commit 4ec110281379826c5cf6ed14735e47027c3c5765)

This avoids insane superblock configurations that could lead to kernel
oops due to null pointer derefences.

http://bugzilla.kernel.org/show_bug.cgi?id=12371

Thanks to David Maciejak at Fortinet's FortiGuard Global Security
Research Team who discovered this bug independently (but at
approximately the same time) as Thiemo Nagel, who submitted the patch.

Signed-off-by: Thiemo Nagel <thiemo.nagel@ph.tum.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/super.c |   30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1869,8 +1869,8 @@ static int ext4_fill_super(struct super_
 	char *cp;
 	int ret = -EINVAL;
 	int blocksize;
-	int db_count;
-	int i;
+	unsigned int db_count;
+	unsigned int i;
 	int needs_recovery, has_huge_files;
 	__le32 features;
 	__u64 blocks_count;
@@ -2153,20 +2153,30 @@ static int ext4_fill_super(struct super_
 	if (EXT4_BLOCKS_PER_GROUP(sb) == 0)
 		goto cantfind_ext4;
 
-	/* ensure blocks_count calculation below doesn't sign-extend */
-	if (ext4_blocks_count(es) + EXT4_BLOCKS_PER_GROUP(sb) <
-	    le32_to_cpu(es->s_first_data_block) + 1) {
-		printk(KERN_WARNING "EXT4-fs: bad geometry: block count %llu, "
-		       "first data block %u, blocks per group %lu\n",
-			ext4_blocks_count(es),
-			le32_to_cpu(es->s_first_data_block),
-			EXT4_BLOCKS_PER_GROUP(sb));
+        /*
+         * It makes no sense for the first data block to be beyond the end
+         * of the filesystem.
+         */
+        if (le32_to_cpu(es->s_first_data_block) >= ext4_blocks_count(es)) {
+                printk(KERN_WARNING "EXT4-fs: bad geometry: first data"
+		       "block %u is beyond end of filesystem (%llu)\n",
+		       le32_to_cpu(es->s_first_data_block),
+		       ext4_blocks_count(es));
 		goto failed_mount;
 	}
 	blocks_count = (ext4_blocks_count(es) -
 			le32_to_cpu(es->s_first_data_block) +
 			EXT4_BLOCKS_PER_GROUP(sb) - 1);
 	do_div(blocks_count, EXT4_BLOCKS_PER_GROUP(sb));
+	if (blocks_count > ((uint64_t)1<<32) - EXT4_DESC_PER_BLOCK(sb)) {
+		printk(KERN_WARNING "EXT4-fs: groups count too large: %u "
+		       "(block count %llu, first data block %u, "
+		       "blocks per group %lu)\n", sbi->s_groups_count,
+		       ext4_blocks_count(es),
+		       le32_to_cpu(es->s_first_data_block),
+		       EXT4_BLOCKS_PER_GROUP(sb));
+		goto failed_mount;
+	}
 	sbi->s_groups_count = blocks_count;
 	db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
 		   EXT4_DESC_PER_BLOCK(sb);


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 35/39] ext4: only use i_size_high for regular files
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (33 preceding siblings ...)
  2009-02-18 21:33   ` [patch 34/39] ext4: Add sanity checks for the superblock before mounting the filesystem Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 36/39] ext4: Add sanity check to make_indexed_dir Greg KH
                     ` (3 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4

[-- Attachment #1: ext4-only-use-i_size_high-for-regular-files.patch --]
[-- Type: text/plain, Size: 2060 bytes --]


2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: "Theodore Ts'o" <tytso@mit.edu>

(cherry picked from commit 06a279d636734da32bb62dd2f7b0ade666f65d7c)

Directories are not allowed to be bigger than 2GB, so don't use
i_size_high for anything other than regular files.  E2fsck should
complain about these inodes, but the simplest thing to do for the
kernel is to only use i_size_high for regular files.

This prevents an intentially corrupted filesystem from causing the
kernel to burn a huge amount of CPU and issuing error messages such
as:

EXT4-fs warning (device loop0): ext4_block_to_path: block 135090028 > max

Thanks to David Maciejak from Fortinet's FortiGuard Global Security
Research Team for reporting this issue.

http://bugzilla.kernel.org/show_bug.cgi?id=12375

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/ext4.h  |    7 +++++--
 fs/ext4/inode.c |    4 ++--
 2 files changed, 7 insertions(+), 4 deletions(-)

--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1188,8 +1188,11 @@ static inline void ext4_r_blocks_count_s
 
 static inline loff_t ext4_isize(struct ext4_inode *raw_inode)
 {
-	return ((loff_t)le32_to_cpu(raw_inode->i_size_high) << 32) |
-		le32_to_cpu(raw_inode->i_size_lo);
+	if (S_ISREG(le16_to_cpu(raw_inode->i_mode)))
+		return ((loff_t)le32_to_cpu(raw_inode->i_size_high) << 32) |
+			le32_to_cpu(raw_inode->i_size_lo);
+	else
+		return (loff_t) le32_to_cpu(raw_inode->i_size_lo);
 }
 
 static inline void ext4_isize_set(struct ext4_inode *raw_inode, loff_t i_size)
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -351,9 +351,9 @@ static int ext4_block_to_path(struct ino
 		final = ptrs;
 	} else {
 		ext4_warning(inode->i_sb, "ext4_block_to_path",
-				"block %lu > max",
+				"block %lu > max in inode %lu",
 				i_block + direct_blocks +
-				indirect_blocks + double_blocks);
+				indirect_blocks + double_blocks, inode->i_ino);
 	}
 	if (boundary)
 		*boundary = final - 1 - (i_block & (ptrs - 1));


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 36/39] ext4: Add sanity check to make_indexed_dir
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (34 preceding siblings ...)
  2009-02-18 21:33   ` [patch 35/39] ext4: only use i_size_high for regular files Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 37/39] jbd2: On a __journal_expect() assertion failure printk "JBD2", not "EXT3-fs" Greg KH
                     ` (2 subsequent siblings)
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4

[-- Attachment #1: ext4-add-sanity-check-to-make_indexed_dir.patch --]
[-- Type: text/plain, Size: 2182 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: "Theodore Ts'o" <tytso@mit.edu>

(cherry picked from commit e6b8bc09ba2075cd91fbffefcd2778b1a00bd76f)

Make sure the rec_len field in the '..' entry is sane, lest we overrun
the directory block and cause a kernel oops on a purposefully
corrupted filesystem.

Thanks to Sami Liedes for reporting this bug.

http://bugzilla.kernel.org/show_bug.cgi?id=12430

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/namei.c |   21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1372,7 +1372,7 @@ static int make_indexed_dir(handle_t *ha
 	struct fake_dirent *fde;
 
 	blocksize =  dir->i_sb->s_blocksize;
-	dxtrace(printk(KERN_DEBUG "Creating index\n"));
+	dxtrace(printk(KERN_DEBUG "Creating index: inode %lu\n", dir->i_ino));
 	retval = ext4_journal_get_write_access(handle, bh);
 	if (retval) {
 		ext4_std_error(dir->i_sb, retval);
@@ -1381,6 +1381,20 @@ static int make_indexed_dir(handle_t *ha
 	}
 	root = (struct dx_root *) bh->b_data;
 
+	/* The 0th block becomes the root, move the dirents out */
+	fde = &root->dotdot;
+	de = (struct ext4_dir_entry_2 *)((char *)fde +
+		ext4_rec_len_from_disk(fde->rec_len));
+	if ((char *) de >= (((char *) root) + blocksize)) {
+		ext4_error(dir->i_sb, __func__,
+			   "invalid rec_len for '..' in inode %lu",
+			   dir->i_ino);
+		brelse(bh);
+		return -EIO;
+	}
+	len = ((char *) root) + blocksize - (char *) de;
+
+	/* Allocate new block for the 0th block's dirents */
 	bh2 = ext4_append(handle, dir, &block, &retval);
 	if (!(bh2)) {
 		brelse(bh);
@@ -1389,11 +1403,6 @@ static int make_indexed_dir(handle_t *ha
 	EXT4_I(dir)->i_flags |= EXT4_INDEX_FL;
 	data1 = bh2->b_data;
 
-	/* The 0th block becomes the root, move the dirents out */
-	fde = &root->dotdot;
-	de = (struct ext4_dir_entry_2 *)((char *)fde +
-		ext4_rec_len_from_disk(fde->rec_len));
-	len = ((char *) root) + blocksize - (char *) de;
 	memcpy (data1, de, len);
 	de = (struct ext4_dir_entry_2 *) data1;
 	top = data1 + len;


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 37/39] jbd2: On a __journal_expect() assertion failure printk "JBD2", not "EXT3-fs"
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (35 preceding siblings ...)
  2009-02-18 21:33   ` [patch 36/39] ext4: Add sanity check to make_indexed_dir Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 38/39] ext4: Initialize the new group descriptor when resizing the filesystem Greg KH
  2009-02-18 21:33   ` [patch 39/39] Fix longstanding "error: storage size of __mod_dmi_device_table isnt known" Greg KH
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4

[-- Attachment #1: jbd2-on-a-__journal_expect-assertion-failure-printk-jbd2-not-ext3-fs.patch --]
[-- Type: text/plain, Size: 985 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: "Theodore Ts'o" <tytso@mit.edu>

(cherry picked from commit 08ec8c3878cea0bf91f2ba3c0badf44b383752d0)

Otherwise it can be very confusing to find a "EXT3-fs: " failure in
the middle of EXT4-fs failures, and it makes it harder to track the
source of the failure.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 include/linux/jbd2.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -308,7 +308,8 @@ void buffer_assertion_failure(struct buf
 		int val = (expr);					     \
 		if (!val) {						     \
 			printk(KERN_ERR					     \
-				"EXT3-fs unexpected failure: %s;\n",# expr); \
+			       "JBD2 unexpected failure: %s: %s;\n",	     \
+			       __func__, #expr);			     \
 			printk(KERN_ERR why "\n");			     \
 		}							     \
 		val;							     \


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 38/39] ext4: Initialize the new group descriptor when resizing the filesystem
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (36 preceding siblings ...)
  2009-02-18 21:33   ` [patch 37/39] jbd2: On a __journal_expect() assertion failure printk "JBD2", not "EXT3-fs" Greg KH
@ 2009-02-18 21:33   ` Greg KH
  2009-02-18 21:33   ` [patch 39/39] Fix longstanding "error: storage size of __mod_dmi_device_table isnt known" Greg KH
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, linux-ext4

[-- Attachment #1: ext4-initialize-the-new-group-descriptor-when-resizing-the-filesystem.patch --]
[-- Type: text/plain, Size: 1367 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: "Theodore Ts'o" <tytso@mit.edu>
(cherry picked from commit fdff73f094e7220602cc3f8959c7230517976412)

Make sure all of the fields of the group descriptor are properly
initialized.  Previously, we allowed bg_flags field to be contain
random garbage, which could trigger non-deterministic behavior,
including a kernel OOPS.

http://bugzilla.kernel.org/show_bug.cgi?id=12433

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/ext4/resize.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -860,11 +860,13 @@ int ext4_group_add(struct super_block *s
 	gdp = (struct ext4_group_desc *)((char *)primary->b_data +
 					 gdb_off * EXT4_DESC_SIZE(sb));
 
+	memset(gdp, 0, EXT4_DESC_SIZE(sb));
 	ext4_block_bitmap_set(sb, gdp, input->block_bitmap); /* LV FIXME */
 	ext4_inode_bitmap_set(sb, gdp, input->inode_bitmap); /* LV FIXME */
 	ext4_inode_table_set(sb, gdp, input->inode_table); /* LV FIXME */
 	gdp->bg_free_blocks_count = cpu_to_le16(input->free_blocks_count);
 	gdp->bg_free_inodes_count = cpu_to_le16(EXT4_INODES_PER_GROUP(sb));
+	gdp->bg_flags = cpu_to_le16(EXT4_BG_INODE_ZEROED);
 	gdp->bg_checksum = ext4_group_desc_csum(sbi, input->group, gdp);
 
 	/*


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [patch 39/39] Fix longstanding "error: storage size of __mod_dmi_device_table isnt known"
  2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
                     ` (37 preceding siblings ...)
  2009-02-18 21:33   ` [patch 38/39] ext4: Initialize the new group descriptor when resizing the filesystem Greg KH
@ 2009-02-18 21:33   ` Greg KH
  38 siblings, 0 replies; 42+ messages in thread
From: Greg KH @ 2009-02-18 21:33 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Justin Forbes, Zwane Mwaikambo, Theodore Ts'o, Randy Dunlap,
	Dave Jones, Chuck Wolber, Chris Wedgwood, Michael Krufky,
	Chuck Ebbert, Domenico Andreoli, Willy Tarreau,
	Rodrigo Rubira Branco, Jake Edge, Eugene Teo, torvalds, akpm,
	alan, Alexey Dobriyan

[-- Attachment #1: fix-longstanding-error-storage-size-of-__mod_dmi_device_table-isn-t-known.patch --]
[-- Type: text/plain, Size: 1079 bytes --]

2.6.28-stable review patch.  If anyone has any objections, please let us know.

------------------

From: Alexey Dobriyan <adobriyan@gmail.com>

commit 40413dcb7b273bda681dca38e6ff0bbb3728ef11 upstream.

gcc 3.4.6 doesn't like MODULE_DEVICE_TABLE(dmi, x) expansion enough to
error out.  Shut it up in a most simple way.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 include/linux/mod_devicetable.h |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@ -443,6 +443,13 @@ struct dmi_system_id {
 	struct dmi_strmatch matches[4];
 	void *driver_data;
 };
+/*
+ * struct dmi_device_id appears during expansion of
+ * "MODULE_DEVICE_TABLE(dmi, x)". Compiler doesn't look inside it
+ * but this is enough for gcc 3.4.6 to error out:
+ *	error: storage size of '__mod_dmi_device_table' isn't known
+ */
+#define dmi_device_id dmi_system_id
 #endif
 
 #define DMI_MATCH(a, b)	{ a, b }


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 05/39] sata_nv: give up hardreset on nf2
  2009-02-18 21:32   ` [patch 05/39] sata_nv: give up hardreset on nf2 Greg KH
@ 2009-03-03 23:53     ` Stefan Lippers-Hollmann
  2009-03-04  2:12       ` Tejun Heo
  0 siblings, 1 reply; 42+ messages in thread
From: Stefan Lippers-Hollmann @ 2009-03-03 23:53 UTC (permalink / raw)
  To: Greg KH
  Cc: linux-kernel, stable, Tejun Heo, Robert Hancock, Jeff Garzik,
	saro_v

[-- Attachment #1: Type: text/plain, Size: 7148 bytes --]

Hi

On Mittwoch, 18. Februar 2009, Greg KH wrote:
> 2.6.28-stable review patch.  If anyone has any objections, please let us know.
[...]
> From: Tejun Heo <tj@kernel.org>
> 
> commit 7dac745b8e367c99175b8f0d014d996f0e5ed9e5 upstream.
> 
> Kernel bz#12176 reports that nf2 hardreset simply doesn't work.  Give
> up.  Argh...

Sorry to report this rather late, but I upgraded a remote nforce3 system to
2.6.28.7 today, and noticed the following messages repeating continously 
(reproducable on a different nforce3 system as well):

ata4: EH pending after 5 tries, giving up
ata4: EH complete
ata4: EH pending after 5 tries, giving up
ata4: EH complete

Reverting just this patch from 2.6.28.7 lets them disappear again. Likewise
2.6.29-rc6-git7 is affected as well - and reverting this patch seems to fix
the issue there as well. 

Below is the lspci output from the affected system, with dmesg attached:

affected:
	2.6.28.7:		dmesg.2.6.28-7.slh.1-sidux-amd64.gz
	2.6.29-rc6-git5:	dmesg.2.6.29-rc6-sidux-amd64.broken.gz

not affected (only sata_nv-give-up-hardreset-on-nf2 reverted):
	2.6.28.7:		dmesg.2.6.28-7.slh.1.2-sidux-amd64.gz
	2.6.29-rc6-git7:	dmesg.2.6.29-rc6-sidux-amd64.working.gz

(don't be confused the $(uname -r) output for -rc kernels, thats's a 
limitation of the current kernel packaging)


This is a regression compared to plain 2.6.28.[0-6] and present in current 
-stable (2.6.28.7, 2.6.27.19 (untested)) and 2.6.29-rc*-git.

00:09.0 IDE interface [0101]: nVidia Corporation nForce3 Serial ATA Controller 2 [10de:00ee] (rev a2) (prog-if 85 [Master SecO PriO])
        Subsystem: Micro-Star International Co., Ltd. Device [1462:0250]
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0 (750ns min, 250ns max)
        Interrupt: pin A routed to IRQ 23
        Region 0: I/O ports at 09e0 [size=8]
        Region 1: I/O ports at 0be0 [size=4]
        Region 2: I/O ports at 0960 [size=8]
        Region 3: I/O ports at 0b60 [size=4]
        Region 4: I/O ports at c800 [size=16]
        Region 5: I/O ports at c400 [size=128]
        Capabilities: <access denied>
        Kernel driver in use: sata_nv
        Kernel modules: pata_acpi, sata_nv, ata_generic

00:0a.0 IDE interface [0101]: nVidia Corporation nForce3 Serial ATA Controller [10de:00e3] (rev a2) (prog-if 85 [Master SecO PriO])
        Subsystem: Micro-Star International Co., Ltd. Device [1462:0250]
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0 (750ns min, 250ns max)
        Interrupt: pin A routed to IRQ 22
        Region 0: I/O ports at 09f0 [size=8]
        Region 1: I/O ports at 0bf0 [size=4]
        Region 2: I/O ports at 0970 [size=8]
        Region 3: I/O ports at 0b70 [size=4]
        Region 4: I/O ports at b000 [size=16]
        Region 5: I/O ports at ac00 [size=128]
        Capabilities: <access denied>
        Kernel driver in use: sata_nv
        Kernel modules: pata_acpi, sata_nv, ata_generic

00:00.0 Host bridge [0600]: nVidia Corporation nForce3 250Gb Host Bridge [10de:00e1] (rev a1)
00:01.0 ISA bridge [0601]: nVidia Corporation nForce3 250Gb LPC Bridge [10de:00e0] (rev a2)
00:01.1 SMBus [0c05]: nVidia Corporation nForce 250Gb PCI System Management [10de:00e4] (rev a1)
00:02.0 USB Controller [0c03]: nVidia Corporation CK8S USB Controller [10de:00e7] (rev a1)
00:02.1 USB Controller [0c03]: nVidia Corporation CK8S USB Controller [10de:00e7] (rev a1)
00:02.2 USB Controller [0c03]: nVidia Corporation nForce3 EHCI USB 2.0 Controller [10de:00e8] (rev a2)
00:05.0 Bridge [0680]: nVidia Corporation CK8S Ethernet Controller [10de:00df] (rev a2)
00:06.0 Multimedia audio controller [0401]: nVidia Corporation nForce3 250Gb AC'97 Audio Controller [10de:00ea] (rev a1)
00:08.0 IDE interface [0101]: nVidia Corporation CK8S Parallel ATA Controller (v2.5) [10de:00e5] (rev a2)
00:09.0 IDE interface [0101]: nVidia Corporation nForce3 Serial ATA Controller 2 [10de:00ee] (rev a2)
00:0a.0 IDE interface [0101]: nVidia Corporation nForce3 Serial ATA Controller [10de:00e3] (rev a2)
00:0b.0 PCI bridge [0604]: nVidia Corporation nForce3 250Gb AGP Host to PCI Bridge [10de:00e2] (rev a2)
00:0e.0 PCI bridge [0604]: nVidia Corporation nForce3 250Gb PCI-to-PCI Bridge [10de:00ed] (rev a2)
00:18.0 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration [1022:1100]
00:18.1 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map [1022:1101]
00:18.2 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller [1022:1102]
00:18.3 Host bridge [0600]: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control [1022:1103]
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV280 [Radeon 9200 PRO] [1002:5960] (rev 01)
01:00.1 Display controller [0380]: ATI Technologies Inc RV280 [Radeon 9200 PRO] (Secondary) [1002:5940] (rev 01)
02:0a.0 Multimedia controller [0480]: Philips Semiconductors TriMedia TM-1300 [1131:5402] (rev 82)
02:0c.0 FireWire (IEEE 1394) [0c00]: VIA Technologies, Inc. VT6306 Fire II IEEE 1394 OHCI Link Layer Controller [1106:3044] (rev 46)
02:0d.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet [10ec:8169] (rev 10)

Regards
	Stefan Lippers-Hollmann

-- 

> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Robert Hancock <hancockr@shaw.ca>
> Reported-by: Saro <saro_v@hotmail.it>
> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
> 
> ---
>  drivers/ata/sata_nv.c |   14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> --- a/drivers/ata/sata_nv.c
> +++ b/drivers/ata/sata_nv.c
> @@ -421,19 +421,21 @@ static struct ata_port_operations nv_gen
>  	.hardreset		= ATA_OP_NULL,
>  };
>  
> -/* OSDL bz3352 reports that nf2/3 controllers can't determine device
> - * signature reliably.  Also, the following thread reports detection
> - * failure on cold boot with the standard debouncing timing.
> +/* nf2 is ripe with hardreset related problems.
> + *
> + * kernel bz#3352 reports nf2/3 controllers can't determine device
> + * signature reliably.  The following thread reports detection failure
> + * on cold boot with the standard debouncing timing.
>   *
>   * http://thread.gmane.org/gmane.linux.ide/34098
>   *
> - * Debounce with hotplug timing and request follow-up SRST.
> + * And bz#12176 reports that hardreset simply doesn't work on nf2.
> + * Give up on it and just don't do hardreset.
>   */
>  static struct ata_port_operations nv_nf2_ops = {
> -	.inherits		= &nv_common_ops,
> +	.inherits		= &nv_generic_ops,
>  	.freeze			= nv_nf2_freeze,
>  	.thaw			= nv_nf2_thaw,
> -	.hardreset		= nv_noclassify_hardreset,
>  };
>  
>  /* For initial probing after boot and hot plugging, hardreset mostly

[-- Attachment #2: dmesg.2.6.29-rc6-sidux-amd64.broken.gz --]
[-- Type: application/x-gzip, Size: 9726 bytes --]

[-- Attachment #3: dmesg.2.6.29-rc6-sidux-amd64.working.gz --]
[-- Type: application/x-gzip, Size: 9465 bytes --]

[-- Attachment #4: dmesg.2.6.28-7.slh.1.2-sidux-amd64.gz --]
[-- Type: application/x-gzip, Size: 8985 bytes --]

[-- Attachment #5: dmesg.2.6.28-7.slh.1-sidux-amd64.gz --]
[-- Type: application/x-gzip, Size: 9217 bytes --]

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [patch 05/39] sata_nv: give up hardreset on nf2
  2009-03-03 23:53     ` Stefan Lippers-Hollmann
@ 2009-03-04  2:12       ` Tejun Heo
  0 siblings, 0 replies; 42+ messages in thread
From: Tejun Heo @ 2009-03-04  2:12 UTC (permalink / raw)
  To: Stefan Lippers-Hollmann
  Cc: Greg KH, linux-kernel, stable, Robert Hancock, Jeff Garzik,
	saro_v

Hello, Stefan.

Stefan Lippers-Hollmann wrote:
> Hi
> 
> On Mittwoch, 18. Februar 2009, Greg KH wrote:
>> 2.6.28-stable review patch.  If anyone has any objections, please let us know.
> [...]
>> From: Tejun Heo <tj@kernel.org>
>>
>> commit 7dac745b8e367c99175b8f0d014d996f0e5ed9e5 upstream.
>>
>> Kernel bz#12176 reports that nf2 hardreset simply doesn't work.  Give
>> up.  Argh...
> 
> Sorry to report this rather late, but I upgraded a remote nforce3 system to
> 2.6.28.7 today, and noticed the following messages repeating continously 
> (reproducable on a different nforce3 system as well):
> 
> ata4: EH pending after 5 tries, giving up
> ata4: EH complete
> ata4: EH pending after 5 tries, giving up
> ata4: EH complete
> 
> Reverting just this patch from 2.6.28.7 lets them disappear again. Likewise
> 2.6.29-rc6-git7 is affected as well - and reverting this patch seems to fix
> the issue there as well. 

This is reported in bko#11615.  I don't quite understand it yet.  The
only thing that should have changed is the use of softreset instead of
hardreset and changing inheritance to nv_generic_ops should have done
exactly that in a very safe way.  Apparently, it didn't.  I'll dig
into what went wrong.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2009-03-04  2:12 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20090218212144.965748151@mini.kroah.org>
2009-02-18 21:30 ` [patch 00/39] 2.6.28.7-stable review Greg KH
2009-02-18 21:32   ` [patch 01/39] pid: implement ns_of_pid Greg KH
2009-02-18 21:32   ` [patch 02/39] mqueue: fix si_pid value in mqueue do_notify() Greg KH
2009-02-18 21:32   ` [patch 03/39] WATCHDOG: iTCO_wdt: fix SMI_EN regression 2 Greg KH
2009-02-18 21:32   ` [patch 04/39] powerpc/vsx: Fix VSX alignment handler for regs 32-63 Greg KH
2009-02-18 21:32   ` [patch 05/39] sata_nv: give up hardreset on nf2 Greg KH
2009-03-03 23:53     ` Stefan Lippers-Hollmann
2009-03-04  2:12       ` Tejun Heo
2009-02-18 21:32   ` [patch 06/39] Fix Intel IOMMU write-buffer flushing Greg KH
2009-02-18 21:32   ` [patch 07/39] SCSI: libiscsi: fix iscsi pool leak Greg KH
2009-02-18 21:32   ` [patch 08/39] x86/cpa: make sure cpa is safe to call in lazy mmu mode Greg KH
2009-02-18 21:32   ` [patch 09/39] sched: SCHED_OTHER vs SCHED_IDLE isolation Greg KH
2009-02-18 21:32   ` [patch 10/39] x86, vm86: fix preemption bug Greg KH
2009-02-18 21:32   ` [patch 11/39] Add support for VT6415 PCIE PATA IDE Host Controller Greg KH
2009-02-18 21:32   ` [patch 12/39] ext2/xip: refuse to change xip flag during remount with busy inodes Greg KH
2009-02-18 21:32   ` [patch 13/39] 3c505: do not set pcb->data.raw beyond its size Greg KH
2009-02-18 21:32   ` [patch 14/39] Bluetooth: Fix TX error path in btsdio driver Greg KH
2009-02-18 21:32   ` [patch 15/39] ext4: Add support for non-native signed/unsigned htree hash algorithms Greg KH
2009-02-18 21:32   ` [patch 16/39] ext4: tone down ext4_da_writepages warnings Greg KH
2009-02-18 21:32   ` [patch 17/39] ext4: Fix the delalloc writepages to allocate blocks at the right offset Greg KH
2009-02-18 21:32   ` [patch 18/39] ext4: avoid ext4_error when mounting a fs with a single bg Greg KH
2009-02-18 21:32   ` [patch 19/39] ext4: Widen type of ext4_sb_info.s_mb_maxs[] Greg KH
2009-02-18 21:32   ` [patch 20/39] jbd2: Add barrier not supported test to journal_wait_on_commit_record Greg KH
2009-02-18 21:32   ` [patch 21/39] ext4: Dont overwrite allocation_context ac_status Greg KH
2009-02-18 21:32   ` [patch 22/39] ext4: Add blocks added during resize to bitmap Greg KH
2009-02-18 21:32   ` [patch 23/39] ext4: Use EXT4_GROUP_INFO_NEED_INIT_BIT during resize Greg KH
2009-02-18 21:32   ` [patch 24/39] ext4: cleanup mballoc header files Greg KH
2009-02-18 21:33   ` [patch 25/39] ext4: dont use blocks freed but not yet committed in buddy cache init Greg KH
2009-02-18 21:33   ` [patch 26/39] ext4: Fix race between read_block_bitmap() and mark_diskspace_used() Greg KH
2009-02-18 21:33   ` [patch 27/39] ext4: Fix the race between read_inode_bitmap() and ext4_new_inode() Greg KH
2009-02-18 21:33   ` [patch 28/39] jbd2: Add BH_JBDPrivateStart Greg KH
2009-02-18 21:33   ` [patch 29/39] ext4: Use new buffer_head flag to check uninit group bitmaps initialization Greg KH
2009-02-18 21:33   ` [patch 30/39] ext4: mark the blocks/inode bitmap beyond end of group as used Greg KH
2009-02-18 21:33   ` [patch 31/39] ext4: Dont allow new groups to be added during block allocation Greg KH
2009-02-18 21:33   ` [patch 32/39] ext4: Init the complete page while building buddy cache Greg KH
2009-02-18 21:33   ` [patch 33/39] ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc Greg KH
2009-02-18 21:33   ` [patch 34/39] ext4: Add sanity checks for the superblock before mounting the filesystem Greg KH
2009-02-18 21:33   ` [patch 35/39] ext4: only use i_size_high for regular files Greg KH
2009-02-18 21:33   ` [patch 36/39] ext4: Add sanity check to make_indexed_dir Greg KH
2009-02-18 21:33   ` [patch 37/39] jbd2: On a __journal_expect() assertion failure printk "JBD2", not "EXT3-fs" Greg KH
2009-02-18 21:33   ` [patch 38/39] ext4: Initialize the new group descriptor when resizing the filesystem Greg KH
2009-02-18 21:33   ` [patch 39/39] Fix longstanding "error: storage size of __mod_dmi_device_table isnt known" Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox