public inbox for linux-kernel@vger.kernel.org
* [2.6.20.16 review 00/28] 2.6.20.16 -stable review
@ 2007-08-11 18:47 Willy Tarreau
  2007-08-11 19:47 ` [2.6.20.16 review 01/28] i386: Fix K8/core2 oprofile on multiple CPUs Willy Tarreau
                   ` (23 more replies)
  0 siblings, 24 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 18:47 UTC (permalink / raw)
  To: linux-kernel, stable

I proposed to Chris and Greg to continue issuing a few more 2.6.20 releases
during the time needed for 2.6.21 and 2.6.22 to show a significant drop
in their patch rates, which hopefully will be just a matter of a few
releases.

My goal is *not* to do all the hard work they do, but just to backport
from their patches those which are meaningful for 2.6.20. For this
reason, 2.6.20 releases will always be slightly late and should not
contain patches not merged in more recent releases.

My intent with this version is to catch up with 2.6.21.7. Other patches
are already pending for future releases, but one thing at a time. I'm
trying to follow the same review/release process, so 28 patches will
be posted for review in response to this message.

If some people think it's useless to repost individual patches that have
already been reviewed in more recent versions, I'm open to adapting the
process (eg: switch to one mail for -rc and one for release like Adrian
does with 2.6.16).

The rolled-up patch can be found here:
   ftp.kernel.org/pub/linux/kernel/v2.6/stable-review/patch-2.6.20.16-rc1.gz

Responses should be made by August 13, 2007, 20:00:00 UTC. Anything
received after that time might be too late.

Thanks,
Willy

--


* [2.6.20.16 review 01/28] i386: Fix K8/core2 oprofile on multiple CPUs
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
@ 2007-08-11 19:47 ` Willy Tarreau
  2007-08-11 19:47 ` [2.6.20.16 review 02/28] md: Avoid overflow in raid0 calculation with large components Willy Tarreau
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Andi Kleen, Linus Torvalds, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0001-PATCH-i386-Fix-K8-core2-oprofile-on-multiple-CPUs.patch --]
[-- Type: text/plain, Size: 1610 bytes --]

Only try to allocate MSRs once instead of for every CPU.

This assumes the MSRs are the same on all CPUs, which is currently
true. P4-HT is a special case for different SMT threads, but the code
always saves/restores all MSRs, so it works identically.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 arch/i386/oprofile/nmi_int.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/arch/i386/oprofile/nmi_int.c b/arch/i386/oprofile/nmi_int.c
index 3700eef..be4a9a8 100644
--- a/arch/i386/oprofile/nmi_int.c
+++ b/arch/i386/oprofile/nmi_int.c
@@ -131,7 +131,6 @@ static void nmi_save_registers(void * dummy)
 {
 	int cpu = smp_processor_id();
 	struct op_msrs * msrs = &cpu_msrs[cpu];
-	model->fill_in_addresses(msrs);
 	nmi_cpu_save_registers(msrs);
 }
 
@@ -195,6 +194,7 @@ static struct notifier_block profile_exceptions_nb = {
 static int nmi_setup(void)
 {
 	int err=0;
+	int cpu;
 
 	if (!allocate_msrs())
 		return -ENOMEM;
@@ -207,6 +207,13 @@ static int nmi_setup(void)
 	/* We need to serialize save and setup for HT because the subset
 	 * of msrs are distinct for save and setup operations
 	 */
+
+	/* Assume saved/restored counters are the same on all CPUs */
+	model->fill_in_addresses(&cpu_msrs[0]);
+	for_each_possible_cpu (cpu) {
+		if (cpu != 0)
+			cpu_msrs[cpu] = cpu_msrs[0];
+	}
 	on_each_cpu(nmi_save_registers, NULL, 0, 1);
 	on_each_cpu(nmi_cpu_setup, NULL, 0, 1);
 	nmi_enabled = 1;
-- 
1.5.2.4

-- 


* [2.6.20.16 review 02/28] md: Avoid overflow in raid0 calculation with large components.
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
  2007-08-11 19:47 ` [2.6.20.16 review 01/28] i386: Fix K8/core2 oprofile on multiple CPUs Willy Tarreau
@ 2007-08-11 19:47 ` Willy Tarreau
  2007-08-11 19:47 ` [2.6.20.16 review 03/28] md: Dont write more than is required of the last page of a bitmap Willy Tarreau
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jeff Zheng, Neil Brown, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0002-PATCH-md-Avoid-overflow-in-raid0-calculation-with.patch --]
[-- Type: text/plain, Size: 1334 bytes --]

If a raid0 has a component device larger than 4TB, and is accessed on
a 32-bit machine, then as 'chunk' is unsigned long,
   chunk << chunksize_bits
can overflow (this can be as high as the size of the device in KB).
chunk itself will not overflow (without triggering a BUG).

So change 'chunk' to be 'sector_t', and get rid of the 'BUG' as it becomes
impossible to hit.

Cc: "Jeff Zheng" <Jeff.Zheng@endace.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/md/raid0.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index dfe3214..2c404f7 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -415,7 +415,7 @@ static int raid0_make_request (request_queue_t *q, struct bio *bio)
 	raid0_conf_t *conf = mddev_to_conf(mddev);
 	struct strip_zone *zone;
 	mdk_rdev_t *tmp_dev;
-	unsigned long chunk;
+	sector_t chunk;
 	sector_t block, rsect;
 	const int rw = bio_data_dir(bio);
 
@@ -470,7 +470,6 @@ static int raid0_make_request (request_queue_t *q, struct bio *bio)
 
 		sector_div(x, zone->nb_dev);
 		chunk = x;
-		BUG_ON(x != (sector_t)chunk);
 
 		x = block >> chunksize_bits;
 		tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
-- 
1.5.2.4

-- 
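As a rough illustration of the overflow described in this patch, here is
a minimal user-space sketch. It is not code from raid0.c: the helper names
shift_chunk_32() and shift_chunk_64() are hypothetical, uint32_t stands in
for 'unsigned long' on a 32-bit machine, and uint64_t stands in for a
64-bit sector_t.

```c
#include <assert.h>
#include <stdint.h>

/* Models 'unsigned long chunk' on a 32-bit machine (assumption: 32 bits).
 * The shift is done in 32 bits, so bits shifted past bit 31 are lost. */
static uint32_t shift_chunk_32(uint32_t chunk, unsigned int chunksize_bits)
{
	assert(chunksize_bits < 32);
	return (uint32_t)(chunk << chunksize_bits);
}

/* Models 'sector_t chunk' with a 64-bit sector_t (assumption): the same
 * shift keeps the high bits. */
static uint64_t shift_chunk_64(uint64_t chunk, unsigned int chunksize_bits)
{
	assert(chunksize_bits < 64);
	return chunk << chunksize_bits;
}
```

With 64KB chunks (chunksize_bits of 6 when counting 1KB blocks), chunk
number 2^26 corresponds to 2^32 KB, i.e. 4TB: the 32-bit version wraps to
0 while the 64-bit version yields 2^32.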


* [2.6.20.16 review 03/28] md: Dont write more than is required of the last page of a bitmap
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
  2007-08-11 19:47 ` [2.6.20.16 review 01/28] i386: Fix K8/core2 oprofile on multiple CPUs Willy Tarreau
  2007-08-11 19:47 ` [2.6.20.16 review 02/28] md: Avoid overflow in raid0 calculation with large components Willy Tarreau
@ 2007-08-11 19:47 ` Willy Tarreau
  2007-08-11 19:47 ` [2.6.20.16 review 04/28] make freezeable workqueues singlethread Willy Tarreau
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:47 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Neil Brown, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0003-PATCH-md-Don-t-write-more-than-is-required-of-the.patch --]
[-- Type: text/plain, Size: 2653 bytes --]

It is possible that real data or metadata follows the bitmap
without full page alignment.
So limit the last write to be only the required number of bytes,
rounded up to the hard sector size of the device.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/md/bitmap.c         |   17 ++++++++++++-----
 include/linux/raid/bitmap.h |    1 +
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index cef1287..550ac72 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -255,19 +255,25 @@ static struct page *read_sb_page(mddev_t *mddev, long offset, unsigned long inde
 
 }
 
-static int write_sb_page(mddev_t *mddev, long offset, struct page *page, int wait)
+static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
 {
 	mdk_rdev_t *rdev;
 	struct list_head *tmp;
+	mddev_t *mddev = bitmap->mddev;
 
 	ITERATE_RDEV(mddev, rdev, tmp)
 		if (test_bit(In_sync, &rdev->flags)
-		    && !test_bit(Faulty, &rdev->flags))
+		    && !test_bit(Faulty, &rdev->flags)) {
+			int size = PAGE_SIZE;
+			if (page->index == bitmap->file_pages-1)
+				size = roundup(bitmap->last_page_size,
+					       bdev_hardsect_size(rdev->bdev));
 			md_super_write(mddev, rdev,
-				       (rdev->sb_offset<<1) + offset
+				       (rdev->sb_offset<<1) + bitmap->offset
 				       + page->index * (PAGE_SIZE/512),
-				       PAGE_SIZE,
+				       size,
 				       page);
+		}
 
 	if (wait)
 		md_super_wait(mddev);
@@ -282,7 +288,7 @@ static int write_page(struct bitmap *bitmap, struct page *page, int wait)
 	struct buffer_head *bh;
 
 	if (bitmap->file == NULL)
-		return write_sb_page(bitmap->mddev, bitmap->offset, page, wait);
+		return write_sb_page(bitmap, page, wait);
 
 	bh = page_buffers(page);
 
@@ -923,6 +929,7 @@ static int bitmap_init_from_disk(struct bitmap *bitmap, sector_t start)
 			}
 
 			bitmap->filemap[bitmap->file_pages++] = page;
+			bitmap->last_page_size = count;
 		}
 		paddr = kmap_atomic(page, KM_USER0);
 		if (bitmap->flags & BITMAP_HOSTENDIAN)
diff --git a/include/linux/raid/bitmap.h b/include/linux/raid/bitmap.h
index 6db9a4c..dd5a05d 100644
--- a/include/linux/raid/bitmap.h
+++ b/include/linux/raid/bitmap.h
@@ -232,6 +232,7 @@ struct bitmap {
 	struct page **filemap; /* list of cache pages for the file */
 	unsigned long *filemap_attr; /* attributes associated w/ filemap pages */
 	unsigned long file_pages; /* number of pages in the file */
+	int last_page_size; /* bytes in the last page */
 
 	unsigned long flags;
 
-- 
1.5.2.4

-- 
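The size calculation in this patch can be sketched in user space as
follows. This is only an illustration under stated assumptions: the
function name bitmap_page_write_size() and the constant SKETCH_PAGE_SIZE
are hypothetical, a 4096-byte page is assumed, and the rounding is written
out by hand rather than using the kernel's roundup() macro.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096	/* assumption: 4KB pages */

/* Returns how many bytes to write for bitmap page 'index'.
 * Every page but the last is written in full; the last page is limited
 * to last_page_size rounded up to a multiple of the hard sector size. */
static int bitmap_page_write_size(unsigned long index,
				  unsigned long file_pages,
				  int last_page_size,
				  int hardsect_size)
{
	assert(hardsect_size > 0);
	assert(file_pages > 0);

	if (index == file_pages - 1)
		return ((last_page_size + hardsect_size - 1) / hardsect_size)
			* hardsect_size;
	return SKETCH_PAGE_SIZE;
}
```

For example, with 512-byte sectors and 700 valid bytes in the last page,
only 1024 bytes are written instead of a full page, so data or metadata
placed right after the bitmap is left alone.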


* [2.6.20.16 review 04/28] make freezeable workqueues singlethread
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (2 preceding siblings ...)
  2007-08-11 19:47 ` [2.6.20.16 review 03/28] md: Dont write more than is required of the last page of a bitmap Willy Tarreau
@ 2007-08-11 19:47 ` Willy Tarreau
  2007-08-11 19:47 ` [2.6.20.16 review 05/28] Char: cyclades, fix deadlock Willy Tarreau
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Rafael J. Wysocki, Gautham R Shenoy, Andrew Morton,
	Linus Torvalds, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0004-PATCH-make-freezeable-workqueues-singlethread.patch --]
[-- Type: text/plain, Size: 1348 bytes --]

It is a known fact that freezeable multithreaded workqueues don't like
CPU_DEAD. We keep them only for the incoming CPU-hotplug rework.

Sadly, we can't just kill create_freezeable_workqueue() right now, so make
them singlethread.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 include/linux/workqueue.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 2a7b38d..1a76bda 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -162,7 +162,7 @@ extern struct workqueue_struct *__create_workqueue(const char *name,
 						    int singlethread,
 						    int freezeable);
 #define create_workqueue(name) __create_workqueue((name), 0, 0)
-#define create_freezeable_workqueue(name) __create_workqueue((name), 0, 1)
+#define create_freezeable_workqueue(name) __create_workqueue((name), 1, 1)
 #define create_singlethread_workqueue(name) __create_workqueue((name), 1, 0)
 
 extern void destroy_workqueue(struct workqueue_struct *wq);
-- 
1.5.2.4

-- 


* [2.6.20.16 review 05/28] Char: cyclades, fix deadlock
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (3 preceding siblings ...)
  2007-08-11 19:47 ` [2.6.20.16 review 04/28] make freezeable workqueues singlethread Willy Tarreau
@ 2007-08-11 19:47 ` Willy Tarreau
  2007-08-11 19:47 ` [2.6.20.16 review 06/28] e1000: disable polling before registering netdevice Willy Tarreau
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jiri Slaby, Andrew Morton, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0005-PATCH-Char-cyclades-fix-deadlock.patch --]
[-- Type: text/plain, Size: 746 bytes --]

An omitted unlock.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/char/cyclades.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/char/cyclades.c b/drivers/char/cyclades.c
index 3ffa080..e4e0ccb 100644
--- a/drivers/char/cyclades.c
+++ b/drivers/char/cyclades.c
@@ -1102,6 +1102,7 @@ static void cyy_intr_chip(struct cyclades_card *cinfo, int chip,
 
 				if (data & info->ignore_status_mask) {
 					info->icount.rx++;
+					spin_unlock(&cinfo->card_lock);
 					return;
 				}
 				if (tty_buffer_request_room(tty, 1)) {
-- 
1.5.2.4

-- 


* [2.6.20.16 review 06/28] e1000: disable polling before registering netdevice
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (4 preceding siblings ...)
  2007-08-11 19:47 ` [2.6.20.16 review 05/28] Char: cyclades, fix deadlock Willy Tarreau
@ 2007-08-11 19:47 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 08/28] x86_64: allocate sparsemem memmap above 4G Willy Tarreau
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:47 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Auke Kok, Herbert Xu, Doug Chapman, Jeff Garzik, Chris Wright,
	Greg Kroah-Hartman

[-- Attachment #1: 0006-PATCH-e1000-disable-polling-before-registering-ne.patch --]
[-- Type: text/plain, Size: 1478 bytes --]

To ensure the symmetry of poll enable/disable in up/down, we should
initialize the netdevice to be poll_disabled at load time. Doing
this after register_netdevice leaves us open to another race, so
let's move all the netif_* calls above register_netdevice so the
stack starts out how we expect it to be.

Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Doug Chapman <doug.chapman@hp.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/net/e1000/e1000_main.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index c6259c7..40bdcf9 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -1157,13 +1157,16 @@ e1000_probe(struct pci_dev *pdev,
 	    !e1000_check_mng_mode(&adapter->hw))
 		e1000_get_hw_control(adapter);
 
-	strcpy(netdev->name, "eth%d");
-	if ((err = register_netdev(netdev)))
-		goto err_register;
-
 	/* tell the stack to leave us alone until e1000_open() is called */
 	netif_carrier_off(netdev);
 	netif_stop_queue(netdev);
+#ifdef CONFIG_E1000_NAPI
+	netif_poll_disable(netdev);
+#endif
+
+	strcpy(netdev->name, "eth%d");
+	if ((err = register_netdev(netdev)))
+		goto err_register;
 
 	DPRINTK(PROBE, INFO, "Intel(R) PRO/1000 Network Connection\n");
 
-- 
1.5.2.4

-- 


* [2.6.20.16 review 08/28] x86_64: allocate sparsemem memmap above 4G
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (5 preceding siblings ...)
  2007-08-11 19:47 ` [2.6.20.16 review 06/28] e1000: disable polling before registering netdevice Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-12 10:18   ` Andi Kleen
  2007-08-11 19:48 ` [2.6.20.16 review 09/28] sparsemem: fix oops in x86_64 show_mem Willy Tarreau
                   ` (16 subsequent siblings)
  23 siblings, 1 reply; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Zou Nan hai, Suresh Siddha, Andi Kleen, Andrew Morton,
	Linus Torvalds, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0008-PATCH-x86_64-allocate-sparsemem-memmap-above-4G.patch --]
[-- Type: text/plain, Size: 2681 bytes --]

On systems with a huge amount of physical memory, VFS cache and memory memmap
may eat all available system memory under 4G, then the system may fail to
allocate swiotlb bounce buffer.

There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix does
not cover the sparsemem model.

This patch adds the fix to the sparsemem model by first trying to allocate
the memmap above 4G.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[chrisw: trivial backport]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 arch/x86_64/mm/init.c   |    6 ++++++
 include/linux/bootmem.h |    1 +
 mm/sparse.c             |   11 +++++++++++
 3 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/mm/init.c b/arch/x86_64/mm/init.c
index 2968b90..2489aa7 100644
--- a/arch/x86_64/mm/init.c
+++ b/arch/x86_64/mm/init.c
@@ -766,3 +766,9 @@ int in_gate_area_no_task(unsigned long addr)
 {
 	return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END);
 }
+
+void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
+{
+	return __alloc_bootmem_core(pgdat->bdata, size,
+			SMP_CACHE_BYTES, (4UL*1024*1024*1024), 0);
+}
diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index 2275f27..8f820e4 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -59,6 +59,7 @@ extern void *__alloc_bootmem_core(struct bootmem_data *bdata,
 				  unsigned long align,
 				  unsigned long goal,
 				  unsigned long limit);
+extern void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size);
 
 #ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE
 extern void reserve_bootmem(unsigned long addr, unsigned long size);
diff --git a/mm/sparse.c b/mm/sparse.c
index ac26eb0..faa08e2 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -209,6 +209,12 @@ static int sparse_init_one_section(struct mem_section *ms,
 	return 1;
 }
 
+__attribute__((weak))
+void *alloc_bootmem_high_node(pg_data_t *pgdat, unsigned long size)
+{
+	return NULL;
+}
+
 static struct page *sparse_early_mem_map_alloc(unsigned long pnum)
 {
 	struct page *map;
@@ -219,6 +225,11 @@ static struct page *sparse_early_mem_map_alloc(unsigned long pnum)
 	if (map)
 		return map;
 
+  	map = alloc_bootmem_high_node(NODE_DATA(nid),
+                       sizeof(struct page) * PAGES_PER_SECTION);
+	if (map)
+		return map;
+
 	map = alloc_bootmem_node(NODE_DATA(nid),
 			sizeof(struct page) * PAGES_PER_SECTION);
 	if (map)
-- 
1.5.2.4

-- 
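The weak-symbol fallback used in mm/sparse.c can be sketched in user space
like this. It is a simplified illustration, not the kernel code: the names
alloc_high(), early_map_alloc() and used_fallback are hypothetical,
malloc() stands in for the bootmem allocator, and the sketch assumes a
compiler that supports __attribute__((weak)) (GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Weak default: returns NULL unless another object file provides a
 * strong definition of alloc_high(), which the linker would prefer. */
__attribute__((weak)) void *alloc_high(size_t size)
{
	(void)size;
	return NULL;
}

/* Set when the generic allocator had to be used (for illustration). */
static int used_fallback;

/* Try the architecture-specific "high" allocator first, then fall back
 * to the generic allocator if it is absent or fails. */
static void *early_map_alloc(size_t size)
{
	void *map;

	assert(size > 0);
	map = alloc_high(size);
	if (map)
		return map;

	used_fallback = 1;
	return malloc(size);
}
```

In this single-file sketch only the weak default exists, so the fallback
path is always taken. In the patch, x86_64 supplies the strong definition
in arch/x86_64/mm/init.c and other architectures keep the NULL default.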


* [2.6.20.16 review 09/28] sparsemem: fix oops in x86_64 show_mem
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (6 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 08/28] x86_64: allocate sparsemem memmap above 4G Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 10/28] rt-mutex: Fix stale return value Willy Tarreau
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Bob Picco, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0009-sparsemem-fix-oops-in-x86_64-show_mem.patch --]
[-- Type: text/plain, Size: 1388 bytes --]

We aren't sampling for holes in memory. Thus we encounter a section hole with
empty section map pointer for SPARSEMEM and OOPs for show_mem. This issue
has been seen in 2.6.21, current git and current mm. This patch is for
2.6.21 stable. It was tested against sparsemem.

Previous to commit f0a5a58aa812b31fd9f197c4ba48245942364eae memory_present
was called for node_start_pfn to node_end_pfn. This would cover the hole(s)
with reserved pages and valid sections. Most SPARSEMEM supported arches
do a pfn_valid check in show_mem before computing the page structure address.

This issue was brought to my attention on IRC by Arnaldo Carvalho de Melo at
acme@redhat.com. Thanks to Arnaldo for testing.

Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 arch/x86_64/mm/init.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/mm/init.c b/arch/x86_64/mm/init.c
index 2489aa7..e67cc4f 100644
--- a/arch/x86_64/mm/init.c
+++ b/arch/x86_64/mm/init.c
@@ -72,6 +72,8 @@ void show_mem(void)
 
 	for_each_online_pgdat(pgdat) {
                for (i = 0; i < pgdat->node_spanned_pages; ++i) {
+			if (!pfn_valid(pgdat->node_start_pfn + i))
+				continue;
 			page = pfn_to_page(pgdat->node_start_pfn + i);
 			total++;
 			if (PageReserved(page))
-- 
1.5.2.4

-- 
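The effect of the added pfn_valid() check can be sketched with a plain
array standing in for the memory map. This is only an illustration: the
function count_backed_pages() and the valid[] array are hypothetical, with
valid[i] playing the role of pfn_valid(node_start_pfn + i).

```c
#include <assert.h>
#include <stddef.h>

/* Counts pages in a node's spanned range, skipping holes.
 * valid[i] != 0 means the page frame has a backing struct page. */
static unsigned long count_backed_pages(const unsigned char *valid,
					unsigned long spanned_pages)
{
	unsigned long i, total = 0;

	assert(valid != NULL);
	for (i = 0; i < spanned_pages; ++i) {
		if (!valid[i])
			continue;	/* hole: nothing to dereference */
		total++;
	}
	return total;
}
```

Without the check, the loop body would run for the hole entries as well,
which in the kernel means computing a struct page address from an empty
section map.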


* [2.6.20.16 review 10/28] rt-mutex: Fix stale return value
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (7 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 09/28] sparsemem: fix oops in x86_64 show_mem Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 11/28] rt-mutex: Fix chain walk early wakeup bug Willy Tarreau
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Alexey Kuznetsov, Thomas Gleixner, Ingo Molnar, Chris Wright,
	Greg Kroah-Hartman

[-- Attachment #1: 0010-rt-mutex-Fix-stale-return-value.patch --]
[-- Type: text/plain, Size: 1339 bytes --]

Alexey Kuznetsov found some problems in the pi-futex code.

The major problem is a stale return value in rt_mutex_slowlock():

When the pi chain walk returns -EDEADLK, but the waiter was woken up
during the phases where the locks were dropped, the rtmutex could be
acquired, but due to the stale return value, -EDEADLK is returned to the
caller.

Reset the return value in the woken up path.

Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 kernel/rtmutex.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 4ab17da..9b08847 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -659,9 +659,16 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
 			 * all over without going into schedule to try
 			 * to get the lock now:
 			 */
-			if (unlikely(!waiter.task))
+			if (unlikely(!waiter.task)) {
+				/*
+				 * Reset the return value. We might
+				 * have returned with -EDEADLK and the
+				 * owner released the lock while we
+				 * were walking the pi chain.
+				 */
+				ret = 0;
 				continue;
-
+			}
 			if (unlikely(ret))
 				break;
 		}
-- 
1.5.2.4

-- 


* [2.6.20.16 review 11/28] rt-mutex: Fix chain walk early wakeup bug
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (8 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 10/28] rt-mutex: Fix stale return value Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 13/28] md: Fix two raid10 bugs Willy Tarreau
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Alexey Kuznetsov, Thomas Gleixner, Ingo Molnar, Chris Wright,
	Greg Kroah-Hartman

[-- Attachment #1: 0011-rt-mutex-Fix-chain-walk-early-wakeup-bug.patch --]
[-- Type: text/plain, Size: 1328 bytes --]

Alexey Kuznetsov found some problems in the pi-futex code.

One of the root causes is:

When a wakeup happens, we do not stop the chain walk, so we
follow a non-existent locking chain.

Drop out when this happens.

Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 kernel/rtmutex.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 9b08847..dd5feae 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -212,6 +212,19 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
 	if (!waiter || !waiter->task)
 		goto out_unlock_pi;
 
+	/*
+	 * Check the orig_waiter state. After we dropped the locks,
+	 * the previous owner of the lock might have released the lock
+	 * and made us the pending owner:
+	 */
+	if (orig_waiter && !orig_waiter->task)
+		goto out_unlock_pi;
+
+	/*
+	 * Drop out, when the task has no waiters. Note,
+	 * top_waiter can be NULL, when we are in the deboosting
+	 * mode!
+	 */
 	if (top_waiter && (!task_has_pi_waiters(task) ||
 			   top_waiter != task_top_pi_waiter(task)))
 		goto out_unlock_pi;
-- 
1.5.2.4

-- 


* [2.6.20.16 review 13/28] md: Fix two raid10 bugs.
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (9 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 11/28] rt-mutex: Fix chain walk early wakeup bug Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 14/28] md: Fix bug in error handling during raid1 repair Willy Tarreau
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Neil Brown, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0013-md-Fix-two-raid10-bugs.patch --]
[-- Type: text/plain, Size: 1380 bytes --]

1/ When resyncing a degraded raid10 which has more than 2 copies of each block,
  garbage can get synced on top of good data.

2/ We round the wrong way in part of the device size calculation, which
  can cause confusion.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/md/raid10.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 82249a6..9eb66c1 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1867,6 +1867,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 			int d = r10_bio->devs[i].devnum;
 			bio = r10_bio->devs[i].bio;
 			bio->bi_end_io = NULL;
+			clear_bit(BIO_UPTODATE, &bio->bi_flags);
 			if (conf->mirrors[d].rdev == NULL ||
 			    test_bit(Faulty, &conf->mirrors[d].rdev->flags))
 				continue;
@@ -2037,6 +2038,11 @@ static int run(mddev_t *mddev)
 	/* 'size' is now the number of chunks in the array */
 	/* calculate "used chunks per device" in 'stride' */
 	stride = size * conf->copies;
+
+	/* We need to round up when dividing by raid_disks to
+	 * get the stride size.
+	 */
+	stride += conf->raid_disks - 1;
 	sector_div(stride, conf->raid_disks);
 	mddev->size = stride  << (conf->chunk_shift-1);
 
-- 
1.5.2.4

-- 
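The rounding fix in run() is the usual "add divisor minus one before
dividing" idiom. A minimal sketch, assuming plain 64-bit integers in place
of sector_t and sector_div(); the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Behaviour before the patch: the division truncates. */
static uint64_t stride_truncating(uint64_t size, unsigned int copies,
				  unsigned int raid_disks)
{
	assert(raid_disks > 0);
	return (size * copies) / raid_disks;
}

/* Behaviour after the patch: round up to the next whole chunk. */
static uint64_t stride_rounded_up(uint64_t size, unsigned int copies,
				  unsigned int raid_disks)
{
	assert(raid_disks > 0);
	return (size * copies + raid_disks - 1) / raid_disks;
}
```

For 10 chunks with 3 copies on 4 disks, the truncating version gives 7
used chunks per device while the rounded-up version gives 8. When the
product divides evenly, both agree.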


* [2.6.20.16 review 14/28] md: Fix bug in error handling during raid1 repair.
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (10 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 13/28] md: Fix two raid10 bugs Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 15/28] dm crypt: disable barriers Willy Tarreau
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Neil Brown, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0014-md-Fix-bug-in-error-handling-during-raid1-repair.patch --]
[-- Type: text/plain, Size: 1798 bytes --]

If raid1/repair (which reads all blocks and fixes any differences
it finds) hits a read error, it doesn't reset the bio for writing
before writing correct data back, so the read error isn't fixed,
and the device probably gets a zero-length write which it might
complain about.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/md/raid1.c |   21 ++++++++++++++-------
 1 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 97ee870..b20c6e9 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1235,17 +1235,24 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 			}
 		r1_bio->read_disk = primary;
 		for (i=0; i<mddev->raid_disks; i++)
-			if (r1_bio->bios[i]->bi_end_io == end_sync_read &&
-			    test_bit(BIO_UPTODATE, &r1_bio->bios[i]->bi_flags)) {
+			if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
 				int j;
 				int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
 				struct bio *pbio = r1_bio->bios[primary];
 				struct bio *sbio = r1_bio->bios[i];
-				for (j = vcnt; j-- ; )
-					if (memcmp(page_address(pbio->bi_io_vec[j].bv_page),
-						   page_address(sbio->bi_io_vec[j].bv_page),
-						   PAGE_SIZE))
-						break;
+
+				if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+					for (j = vcnt; j-- ; ) {
+						struct page *p, *s;
+						p = pbio->bi_io_vec[j].bv_page;
+						s = sbio->bi_io_vec[j].bv_page;
+						if (memcmp(page_address(p),
+							   page_address(s),
+							   PAGE_SIZE))
+							break;
+					}
+				} else
+					j = 0;
 				if (j >= 0)
 					mddev->resync_mismatches += r1_bio->sectors;
 				if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
-- 
1.5.2.4

-- 
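The control flow around 'j' in this patch is easy to misread, so here is
a small user-space sketch of it. It is an illustration only: the function
last_mismatch() is hypothetical, flat byte buffers replace the bio page
vectors, and 'uptodate' stands in for test_bit(BIO_UPTODATE, ...).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a negative value if every page matches, or a value >= 0 if a
 * mismatch was found or the secondary read failed (not up to date). */
static int last_mismatch(const unsigned char *primary,
			 const unsigned char *secondary,
			 int vcnt, size_t page_size, int uptodate)
{
	int j;

	assert(vcnt >= 0);
	if (!uptodate)
		return 0;	/* read error: treat as a mismatch */

	/* Count down from the last page; j ends at -1 if nothing differs. */
	for (j = vcnt; j--; )
		if (memcmp(primary + (size_t)j * page_size,
			   secondary + (size_t)j * page_size,
			   page_size))
			break;
	return j;
}
```

A result of 0 or more corresponds to the "j >= 0" branch that counts
resync mismatches, so a failed read now goes on to have good data written
back rather than being skipped.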


* [2.6.20.16 review 15/28] dm crypt: disable barriers
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (11 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 14/28] md: Fix bug in error handling during raid1 repair Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 16/28] dm crypt: fix call to clone_init Willy Tarreau
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Milan Broz, Alasdair G Kergon, Jens Axboe, Andrew Morton,
	Linus Torvalds, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0015-dm-crypt-disable-barriers.patch --]
[-- Type: text/plain, Size: 1076 bytes --]

Disable barriers in dm-crypt because the current workqueue processing can
reorder requests.

This must be addressed later but for now disabling barriers is needed to
prevent data corruption.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/md/dm-crypt.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 4c2471e..f68677d 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -954,6 +954,9 @@ static int crypt_map(struct dm_target *ti, struct bio *bio,
 	struct crypt_config *cc = ti->private;
 	struct crypt_io *io;
 
+	if (bio_barrier(bio))
+		return -EOPNOTSUPP;
+
 	io = mempool_alloc(cc->io_pool, GFP_NOIO);
 	io->target = ti;
 	io->base_bio = bio;
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2.6.20.16 review 16/28] dm crypt: fix call to clone_init
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (12 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 15/28] dm crypt: disable barriers Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 17/28] dm crypt: fix avoid cloned bio ref after free Willy Tarreau
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Olaf Kirch, Alasdair G Kergon, Andrew Morton, Linus Torvalds,
	Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0016-dm-crypt-fix-call-to-clone_init.patch --]
[-- Type: text/plain, Size: 2914 bytes --]

Call clone_init early

We need to call clone_init as early as possible - at least before calling
bio_put(clone) in any error path.  Otherwise, the destructor will try to
dereference bi_private, which may still be NULL.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/md/dm-crypt.c |   12 +++++++-----
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index f68677d..bffaf1c 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -107,6 +107,8 @@ struct crypt_config {
 
 static struct kmem_cache *_crypt_io_pool;
 
+static void clone_init(struct crypt_io *, struct bio *);
+
 /*
  * Different IV generation algorithms:
  *
@@ -379,9 +381,10 @@ static int crypt_convert(struct crypt_config *cc,
  * May return a smaller bio when running out of pages
  */
 static struct bio *
-crypt_alloc_buffer(struct crypt_config *cc, unsigned int size,
+crypt_alloc_buffer(struct crypt_io *io, unsigned int size,
                    struct bio *base_bio, unsigned int *bio_vec_idx)
 {
+	struct crypt_config *cc = io->target->private;
 	struct bio *clone;
 	unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
 	gfp_t gfp_mask = GFP_NOIO | __GFP_HIGHMEM;
@@ -396,7 +399,7 @@ crypt_alloc_buffer(struct crypt_config *cc, unsigned int size,
 	if (!clone)
 		return NULL;
 
-	clone->bi_destructor = dm_crypt_bio_destructor;
+	clone_init(io, clone);
 
 	/* if the last bio was not complete, continue where that one ended */
 	clone->bi_idx = *bio_vec_idx;
@@ -562,6 +565,7 @@ static void clone_init(struct crypt_io *io, struct bio *clone)
 	clone->bi_end_io  = crypt_endio;
 	clone->bi_bdev    = cc->dev->bdev;
 	clone->bi_rw      = io->base_bio->bi_rw;
+	clone->bi_destructor = dm_crypt_bio_destructor;
 }
 
 static void process_read(struct crypt_io *io)
@@ -585,7 +589,6 @@ static void process_read(struct crypt_io *io)
 	}
 
 	clone_init(io, clone);
-	clone->bi_destructor = dm_crypt_bio_destructor;
 	clone->bi_idx = 0;
 	clone->bi_vcnt = bio_segments(base_bio);
 	clone->bi_size = base_bio->bi_size;
@@ -615,7 +618,7 @@ static void process_write(struct crypt_io *io)
 	 * so repeat the whole process until all the data can be handled.
 	 */
 	while (remaining) {
-		clone = crypt_alloc_buffer(cc, base_bio->bi_size,
+		clone = crypt_alloc_buffer(io, base_bio->bi_size,
 					   io->first_clone, &bvec_idx);
 		if (unlikely(!clone)) {
 			dec_pending(io, -ENOMEM);
@@ -631,7 +634,6 @@ static void process_write(struct crypt_io *io)
 			return;
 		}
 
-		clone_init(io, clone);
 		clone->bi_sector = cc->start + sector;
 
 		if (!io->first_clone) {
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2.6.20.16 review 17/28] dm crypt: fix avoid cloned bio ref after free
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (13 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 16/28] dm crypt: fix call to clone_init Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 19/28] sched: fix next_interval determination in idle_balance() Willy Tarreau
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Olaf Kirch, Alasdair G Kergon, Jens Axboe, Andrew Morton,
	Linus Torvalds, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0017-dm-crypt-fix-avoid-cloned-bio-ref-after-free.patch --]
[-- Type: text/plain, Size: 1090 bytes --]

Do not access the bio after generic_make_request

We should never access a bio after generic_make_request - there's no guarantee
it still exists.

Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/md/dm-crypt.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index bffaf1c..28831a9 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -655,9 +655,12 @@ static void process_write(struct crypt_io *io)
 
 		generic_make_request(clone);
 
+		/* Do not reference clone after this - it
+		 * may be gone already. */
+
 		/* out of memory -> run queues */
 		if (remaining)
-			congestion_wait(bio_data_dir(clone), HZ/100);
+			congestion_wait(WRITE, HZ/100);
 	}
 }
 
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2.6.20.16 review 19/28] sched: fix next_interval determination in idle_balance()
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (14 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 17/28] dm crypt: fix avoid cloned bio ref after free Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 21/28] audit: fix oops removing watch if audit disabled Willy Tarreau
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Christoph Lameter, Ingo Molnar, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0019-sched-fix-next_interval-determination-in-idle_balan.patch --]
[-- Type: text/plain, Size: 2229 bytes --]

Fix massive SMP imbalance on NUMA nodes observed on 2.6.21.5 with CFS.
(and later on reproduced without CFS as well).

The intervals of domains that do not have SD_BALANCE_NEWIDLE must be
considered for the calculation of the time of the next balance.
Otherwise we may defer rebalancing forever and nodes might stay idle for
very long times.

Siddha also spotted that the conversion of the balance interval to
jiffies is missing. Fix that too.

From: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

also continue the loop if !(sd->flags & SD_LOAD_BALANCE).

Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

It did in fact trigger under all three of mainline, CFS, and -rt
including CFS -- see below for a couple of emails from last Friday
giving results for these three on the AMD box (where it happened) and on
a single-quad NUMA-Q system (where it did not, at least not with such
severity).

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 kernel/sched.c |   22 +++++++++++++---------
 1 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 62db30c..907ab05 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2814,17 +2814,21 @@ static void idle_balance(int this_cpu, struct rq *this_rq)
 	unsigned long next_balance = jiffies + 60 *  HZ;
 
 	for_each_domain(this_cpu, sd) {
-		if (sd->flags & SD_BALANCE_NEWIDLE) {
+		unsigned long interval;
+
+		if (!(sd->flags & SD_LOAD_BALANCE))
+			continue;
+
+		if (sd->flags & SD_BALANCE_NEWIDLE)
 			/* If we've pulled tasks over stop searching: */
 			pulled_task = load_balance_newidle(this_cpu,
-							this_rq, sd);
-			if (time_after(next_balance,
-				  sd->last_balance + sd->balance_interval))
-				next_balance = sd->last_balance
-						+ sd->balance_interval;
-			if (pulled_task)
-				break;
-		}
+								this_rq, sd);
+
+		interval = msecs_to_jiffies(sd->balance_interval);
+		if (time_after(next_balance, sd->last_balance + interval))
+			next_balance = sd->last_balance + interval;
+		if (pulled_task)
+			break;
 	}
 	if (!pulled_task)
 		/*
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2.6.20.16 review 21/28] audit: fix oops removing watch if audit disabled
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (15 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 19/28] sched: fix next_interval determination in idle_balance() Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 22/28] POWERPC: Fix subtle FP state corruption bug in signal return on SMP Willy Tarreau
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tony Jones, Al Viro, Andrew Morton, Chris Wright,
	Greg Kroah-Hartman

[-- Attachment #1: 0021-audit-fix-oops-removing-watch-if-audit-disabled.patch --]
[-- Type: text/plain, Size: 1073 bytes --]

Removing a watched file will oops if audit is disabled (auditctl -e 0).

To reproduce:
- auditctl -e 1
- touch /tmp/foo
- auditctl -w /tmp/foo
- auditctl -e 0
- rm /tmp/foo (or mv)

Signed-off-by: Tony Jones <tonyj@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 kernel/auditfilter.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index 9c8c232..5a75657 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -905,7 +905,7 @@ static void audit_update_watch(struct audit_parent *parent,
 
 		/* If the update involves invalidating rules, do the inode-based
 		 * filtering now, so we don't omit records. */
-		if (invalidating &&
+		if (invalidating && current->audit_context &&
 		    audit_filter_inodes(current, current->audit_context) == AUDIT_RECORD_CONTEXT)
 			audit_set_auditable(current->audit_context);
 
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2.6.20.16 review 22/28] POWERPC: Fix subtle FP state corruption bug in signal return on SMP
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (16 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 21/28] audit: fix oops removing watch if audit disabled Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 23/28] mm: kill validate_anon_vma to avoid mapcount BUG Willy Tarreau
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Paul Mackerras, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0022-POWERPC-Fix-subtle-FP-state-corruption-bug-in-signa.patch --]
[-- Type: text/plain, Size: 2112 bytes --]

This fixes a bug which can cause corruption of the floating-point state
on return from a signal handler.  If we have a signal handler that has
used the floating-point registers, and it happens to context-switch to
another task while copying the interrupted floating-point state from the
user stack into the thread struct (e.g. because of a page fault, or
because it gets preempted), the context switch code will think that the
FP registers contain valid FP state that needs to be copied into the
thread_struct, and will thus overwrite the values that the signal return
code has put into the thread_struct.

This can occur because we clear the MSR bits that indicate the presence
of valid FP state after copying the state into the thread_struct.  To fix
this we just move the clearing of the MSR bits to before the copy.  A
similar potential problem also occurs with the Altivec state, and this
fixes that in the same way.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 arch/powerpc/kernel/signal_64.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index f72e8e8..a84304e 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -177,6 +177,13 @@ static long restore_sigcontext(struct pt_regs *regs, sigset_t *set, int sig,
 	 */
 	discard_lazy_cpu_state();
 
+	/*
+	 * Force reload of FP/VEC.
+	 * This has to be done before copying stuff into current->thread.fpr/vr
+	 * for the reasons explained in the previous comment.
+	 */
+	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
+
 	err |= __copy_from_user(&current->thread.fpr, &sc->fp_regs, FP_REGS_SIZE);
 
 #ifdef CONFIG_ALTIVEC
@@ -198,9 +205,6 @@ static long restore_sigcontext(struct pt_regs *regs, sigset_t *set, int sig,
 		current->thread.vrsave = 0;
 #endif /* CONFIG_ALTIVEC */
 
-	/* Force reload of FP/VEC */
-	regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC);
-
 	return err;
 }
 
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2.6.20.16 review 23/28] mm: kill validate_anon_vma to avoid mapcount BUG
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (17 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 22/28] POWERPC: Fix subtle FP state corruption bug in signal return on SMP Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 24/28] saa7134: fix thread shutdown handling Willy Tarreau
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hugh Dickins, Petr Vandrovec, Nick Piggin, Andrea Arcangeli,
	Andrew Morton, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0023-mm-kill-validate_anon_vma-to-avoid-mapcount-BUG.patch --]
[-- Type: text/plain, Size: 2789 bytes --]

validate_anon_vma gave a useful check on the integrity of the anon_vma list
when Andrea was developing obj rmap; but it was not enabled in SLES9
itself, nor in mainline, until Nick changed commented-out RMAP_DEBUG to
configurable CONFIG_DEBUG_VM in 2.6.17.  Now Petr Vandrovec reports that
its BUG_ON(mapcount > 100000) can easily crash a CONFIG_DEBUG_VM=y system.

That limit was just an arbitrary number to protect against an infinite
loop.  We could raise it to something enormous (depending on sizeof struct
vma and size of memory?); but I rather think validate_anon_vma has outlived
its usefulness, and is better just removed - which gives a magnificent
performance boost to anything like Petr's test program ;)

Of course, a very long anon_vma list is bad news for preemption latency,
and I believe there has been one recent report of such: let's not forget
that, but validate_anon_vma only makes it worse not better.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Petr Vandrovec <petr@vmware.com>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Andrea Arcangeli <andrea@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 mm/rmap.c |   24 +-----------------------
 1 files changed, 1 insertions(+), 23 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 7ce69c1..c30781c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -53,24 +53,6 @@
 
 struct kmem_cache *anon_vma_cachep;
 
-static inline void validate_anon_vma(struct vm_area_struct *find_vma)
-{
-#ifdef CONFIG_DEBUG_VM
-	struct anon_vma *anon_vma = find_vma->anon_vma;
-	struct vm_area_struct *vma;
-	unsigned int mapcount = 0;
-	int found = 0;
-
-	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
-		mapcount++;
-		BUG_ON(mapcount > 100000);
-		if (vma == find_vma)
-			found = 1;
-	}
-	BUG_ON(!found);
-#endif
-}
-
 /* This must be called under the mmap_sem. */
 int anon_vma_prepare(struct vm_area_struct *vma)
 {
@@ -121,10 +103,8 @@ void __anon_vma_link(struct vm_area_struct *vma)
 {
 	struct anon_vma *anon_vma = vma->anon_vma;
 
-	if (anon_vma) {
+	if (anon_vma)
 		list_add_tail(&vma->anon_vma_node, &anon_vma->head);
-		validate_anon_vma(vma);
-	}
 }
 
 void anon_vma_link(struct vm_area_struct *vma)
@@ -134,7 +114,6 @@ void anon_vma_link(struct vm_area_struct *vma)
 	if (anon_vma) {
 		spin_lock(&anon_vma->lock);
 		list_add_tail(&vma->anon_vma_node, &anon_vma->head);
-		validate_anon_vma(vma);
 		spin_unlock(&anon_vma->lock);
 	}
 }
@@ -148,7 +127,6 @@ void anon_vma_unlink(struct vm_area_struct *vma)
 		return;
 
 	spin_lock(&anon_vma->lock);
-	validate_anon_vma(vma);
 	list_del(&vma->anon_vma_node);
 
 	/* We must garbage collect the anon_vma if it's empty */
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2.6.20.16 review 24/28] saa7134: fix thread shutdown handling
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (18 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 23/28] mm: kill validate_anon_vma to avoid mapcount BUG Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 25/28] serial: clear proper MPSC interrupt cause bits Willy Tarreau
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jeff Mahoney, Mauro Carvalho Chehab, Andrew Morton, Chris Wright,
	Greg Kroah-Hartman

[-- Attachment #1: 0024-saa7134-fix-thread-shutdown-handling.patch --]
[-- Type: text/plain, Size: 1557 bytes --]

This patch changes the test for the thread pid from >= 0 to > 0.

When the saa7134 driver initialization fails after a certain point, it goes
through the complete shutdown process for the driver.  Part of shutting it
down includes tearing down the thread for tv audio.

The test for tearing down the thread tests for >= 0.  Since the dev
structure is kzalloc'd, the test will always be true if we haven't tried to
start the thread yet.  We end up waiting on pid 0 to complete, which will
never happen, so we lock up.

This bug was observed in Novell Bugzilla 284718, when request_irq() failed.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/media/video/saa7134/saa7134-tvaudio.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/media/video/saa7134/saa7134-tvaudio.c b/drivers/media/video/saa7134/saa7134-tvaudio.c
index dd759d6..36b3fa3 100644
--- a/drivers/media/video/saa7134/saa7134-tvaudio.c
+++ b/drivers/media/video/saa7134/saa7134-tvaudio.c
@@ -1006,7 +1006,7 @@ int saa7134_tvaudio_init2(struct saa7134_dev *dev)
 int saa7134_tvaudio_fini(struct saa7134_dev *dev)
 {
 	/* shutdown tvaudio thread */
-	if (dev->thread.pid >= 0) {
+	if (dev->thread.pid > 0) {
 		dev->thread.shutdown = 1;
 		wake_up_interruptible(&dev->thread.wq);
 		wait_for_completion(&dev->thread.exit);
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2.6.20.16 review 25/28] serial: clear proper MPSC interrupt cause bits
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (19 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 24/28] saa7134: fix thread shutdown handling Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 26/28] i386: fix infinite loop with singlestep int80 syscalls Willy Tarreau
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jay Lubomirski, Mark A. Greer, Andrew Morton, Chris Wright,
	Greg Kroah-Hartman

[-- Attachment #1: 0025-serial-clear-proper-MPSC-interrupt-cause-bits.patch --]
[-- Type: text/plain, Size: 1225 bytes --]

The interrupt clearing code in mpsc_sdma_intr_ack() mistakenly clears the
interrupt for both controllers instead of just the one it's supposed to.
This can result in the other controller appearing to hang because its
interrupt was effectively lost.

So, don't clear the interrupt cause bits for both MPSC controllers when
clearing the interrupt for one of them.  Just clear the one that is
supposed to be cleared.

Signed-off-by: Jay Lubomirski <jaylubo@motorola.com>
Acked-by: Mark A. Greer <mgreer@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/serial/mpsc.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/serial/mpsc.c b/drivers/serial/mpsc.c
index 3d2fcc5..64ed5ef 100644
--- a/drivers/serial/mpsc.c
+++ b/drivers/serial/mpsc.c
@@ -502,7 +502,8 @@ mpsc_sdma_intr_ack(struct mpsc_port_info *pi)
 
 	if (pi->mirror_regs)
 		pi->shared_regs->SDMA_INTR_CAUSE_m = 0;
-	writel(0, pi->shared_regs->sdma_intr_base + SDMA_INTR_CAUSE);
+	writeb(0x00, pi->shared_regs->sdma_intr_base + SDMA_INTR_CAUSE +
+	       pi->port.line);
 	return;
 }
 
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2.6.20.16 review 26/28] i386: fix infinite loop with singlestep int80 syscalls
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (20 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 25/28] serial: clear proper MPSC interrupt cause bits Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 27/28] NTP: remove clock_was_set() call to prevent deadlock Willy Tarreau
  2007-08-11 19:48 ` [2.6.20.16 review 28/28] sky2: workaround for lost IRQ Willy Tarreau
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jason Wessel, Jeremy Fitzhardinge, Chuck Ebbert, Chris Wright,
	Greg Kroah-Hartman

[-- Attachment #1: 0026-i386-fix-infinite-loop-with-singlestep-int80-syscal.patch --]
[-- Type: text/plain, Size: 2880 bytes --]

The commit 635cf99a80f4ebee59d70eb64bb85ce829e4591f introduced a
regression.  Executing a ptrace single step after certain int80
accesses will infinitely loop and never advance the PC.

The TIF_SINGLESTEP check should be done on the return from the syscall
and not before it.

The new test case is below:

/* Test whether singlestep through an int80 syscall works.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <asm/user.h>
#include <string.h>

static int child, status;
static struct user_regs_struct regs;

static void do_child()
{
	char str[80] = "child: int80 test\n";

	ptrace(PTRACE_TRACEME, 0, 0, 0);
	kill(getpid(), SIGUSR1);
	write(fileno(stdout),str,strlen(str));
	asm ("int $0x80" : : "a" (20)); /* getpid */
}

static void do_parent()
{
	unsigned long eip, expected = 0;
again:
	waitpid(child, &status, 0);
	if (WIFEXITED(status) || WIFSIGNALED(status))
		return;

	if (WIFSTOPPED(status)) {
		ptrace(PTRACE_GETREGS, child, 0, &regs);
		eip = regs.eip;
		if (expected)
			fprintf(stderr, "child stop @ %08lx, expected %08lx %s\n",
					eip, expected,
					eip == expected ? "" : " <== ERROR");

		if (*(unsigned short *)eip == 0x80cd) {
			fprintf(stderr, "int 0x80 at %08x\n", (unsigned int)eip);
			expected = eip + 2;
		} else
			expected = 0;

		ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
	}
	goto again;
}

int main(int argc, char * const argv[])
{
	child = fork();
	if (child)
		do_parent();
	else
		do_child();
	return 0;
}

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 arch/i386/kernel/entry.S |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/i386/kernel/entry.S b/arch/i386/kernel/entry.S
index 5e47683..9bf056e 100644
--- a/arch/i386/kernel/entry.S
+++ b/arch/i386/kernel/entry.S
@@ -367,10 +367,6 @@ ENTRY(system_call)
 	CFI_ADJUST_CFA_OFFSET 4
 	SAVE_ALL
 	GET_THREAD_INFO(%ebp)
-	testl $TF_MASK,PT_EFLAGS(%esp)
-	jz no_singlestep
-	orl $_TIF_SINGLESTEP,TI_flags(%ebp)
-no_singlestep:
 					# system call tracing in operation / emulation
 	/* Note, _TIF_SECCOMP is bit number 8, and so it needs testw and not testb */
 	testw $(_TIF_SYSCALL_EMU|_TIF_SYSCALL_TRACE|_TIF_SECCOMP|_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
@@ -385,6 +381,10 @@ syscall_exit:
 					# setting need_resched or sigpending
 					# between sampling and the iret
 	TRACE_IRQS_OFF
+	testl $TF_MASK,PT_EFLAGS(%esp)	# If tracing set singlestep flag on exit
+	jz no_singlestep
+	orl $_TIF_SINGLESTEP,TI_flags(%ebp)
+no_singlestep:
 	movl TI_flags(%ebp), %ecx
 	testw $_TIF_ALLWORK_MASK, %cx	# current->work
 	jne syscall_exit_work
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2.6.20.16 review 27/28] NTP: remove clock_was_set() call to prevent deadlock
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (21 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 26/28] i386: fix infinite loop with singlestep int80 syscalls Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  2007-08-12 11:15   ` Jason Uhlenkott
  2007-08-11 19:48 ` [2.6.20.16 review 28/28] sky2: workaround for lost IRQ Willy Tarreau
  23 siblings, 1 reply; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Thomas Gleixner, john stultz, Dave Jones, Ingo Molnar,
	Vincent Fortier, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0027-NTP-remove-clock_was_set-call-to-prevent-deadlock.patch --]
[-- Type: text/plain, Size: 1786 bytes --]

The clock_was_set() call in second_overflow(), which happens only when
leap seconds are inserted / deleted, is wrong in two respects:

1. it results in a call to on_each_cpu() with interrupts disabled
2. it is a potential deadlock source vs. call_lock in smp_call_function()

The only possible side effect of the removal might be, that an absolute
CLOCK_REALTIME timer fires 1 second too late, in the rare case of leap
second deletion and an absolute CLOCK_REALTIME timer which expires in
the affected time frame. It will never fire too early.

This was probably observed by the reporter of a June 30th -> July 1st
hang: http://lkml.org/lkml/2007/7/3/

A similar problem was observed by Dave Jones, who provided a screen shot
with a lockdep back trace, which made it possible to analyse the problem.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Vincent Fortier <Vincent.Fortier1@EC.GC.CA>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 kernel/time/ntp.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 3afeaa3..64744bb 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -116,7 +116,6 @@ void second_overflow(void)
 			 */
 			time_interpolator_update(-NSEC_PER_SEC);
 			time_state = TIME_OOP;
-			clock_was_set();
 			printk(KERN_NOTICE "Clock: inserting leap second "
 					"23:59:60 UTC\n");
 		}
@@ -131,7 +130,6 @@ void second_overflow(void)
 			 */
 			time_interpolator_update(NSEC_PER_SEC);
 			time_state = TIME_WAIT;
-			clock_was_set();
 			printk(KERN_NOTICE "Clock: deleting leap second "
 					"23:59:59 UTC\n");
 		}
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [2.6.20.16 review 28/28] sky2: workaround for lost IRQ
  2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
                   ` (22 preceding siblings ...)
  2007-08-11 19:48 ` [2.6.20.16 review 27/28] NTP: remove clock_was_set() call to prevent deadlock Willy Tarreau
@ 2007-08-11 19:48 ` Willy Tarreau
  23 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-11 19:48 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Stephen Hemminger, Jeff Garzik, Chris Wright, Greg Kroah-Hartman

[-- Attachment #1: 0028-sky2-workaround-for-lost-IRQ.patch --]
[-- Type: text/plain, Size: 1516 bytes --]

This patch restores a couple of workarounds from 2.6.16:
 * restart transmit moderation timer in case it expires during IRQ routine
 * default to having 10 HZ watchdog timer.
At this point it is more important not to hang than to worry about the
power cost.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/net/sky2.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 38e75cf..aec8c59 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -95,7 +95,7 @@ static int disable_msi = 0;
 module_param(disable_msi, int, 0);
 MODULE_PARM_DESC(disable_msi, "Disable Message Signaled Interrupt (MSI)");
 
-static int idle_timeout = 0;
+static int idle_timeout = 100;
 module_param(idle_timeout, int, 0);
 MODULE_PARM_DESC(idle_timeout, "Watchdog timer for lost interrupts (ms)");
 
@@ -2341,6 +2341,13 @@ static int sky2_poll(struct net_device *dev0, int *budget)
 
 	work_done = sky2_status_intr(hw, work_limit);
 	if (work_done < work_limit) {
+		/* Bug/Errata workaround?
+		 * Need to kick the TX irq moderation timer.
+		 */
+		if (sky2_read8(hw, STAT_TX_TIMER_CTRL) == TIM_START) {
+			sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_STOP);
+			sky2_write8(hw, STAT_TX_TIMER_CTRL, TIM_START);
+		}
 		netif_rx_complete(dev0);
 
 		sky2_read32(hw, B0_Y2_SP_LISR);
-- 
1.5.2.4

-- 
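
The stop/start "kick" in the hunk above can be tried outside the kernel
with a small mock. This is only a sketch under stated assumptions:
struct mock_hw, mock_read8(), mock_write8(), kick_tx_timer() and
watchdog_hz() are hypothetical names, and the TIM_* values are
placeholders rather than the driver's definitions. It only shows the
control flow (restart the timer only if it is currently started) and
that a 100 ms idle_timeout corresponds to a 10 Hz watchdog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values; the real ones live in the driver's header. */
enum { TIM_STOP = 1 << 1, TIM_START = 1 << 2 };

/* Mock of the single control register the workaround touches. */
struct mock_hw {
	uint8_t timer_ctrl;	/* last value written to the register */
	unsigned restarts;	/* how many times TIM_START was written */
};

static uint8_t mock_read8(const struct mock_hw *hw)
{
	return hw->timer_ctrl;
}

static void mock_write8(struct mock_hw *hw, uint8_t val)
{
	hw->timer_ctrl = val;
	if (val == TIM_START)
		hw->restarts++;
}

/* Same stop/start sequence as the sky2_poll() hunk, against the mock. */
static void kick_tx_timer(struct mock_hw *hw)
{
	assert(hw != NULL);
	if (mock_read8(hw) == TIM_START) {
		mock_write8(hw, TIM_STOP);
		mock_write8(hw, TIM_START);
	}
}

/* Watchdog frequency implied by idle_timeout in ms; 0 means disabled. */
static unsigned watchdog_hz(unsigned idle_timeout_ms)
{
	return idle_timeout_ms ? 1000u / idle_timeout_ms : 0u;
}
```

With the mock, a timer that is already stopped is left alone, which
matches the guard in the patch.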


* Re: [2.6.20.16 review 08/28] x86_64: allocate sparsemem memmap above 4G
  2007-08-11 19:48 ` [2.6.20.16 review 08/28] x86_64: allocate sparsemem memmap above 4G Willy Tarreau
@ 2007-08-12 10:18   ` Andi Kleen
  2007-08-12 11:52     ` Willy Tarreau
  0 siblings, 1 reply; 29+ messages in thread
From: Andi Kleen @ 2007-08-12 10:18 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Zou Nan hai, Suresh Siddha, Andrew Morton,
	Linus Torvalds, Chris Wright, Greg Kroah-Hartman

On Saturday 11 August 2007 21:48, Willy Tarreau wrote:
> On systems with huge amount of physical memory, VFS cache and memory memmap
> may eat all available system memory under 4G, then the system may fail to
> allocate swiotlb bounce buffer.
>
> There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix does
> not cover the sparsemem model.

Have you checked if sparsemem even worked in 2.6.20? IIRC it was quite
unstable a couple of releases ago. There were times when it rarely
booted on x86-64 because so few people test it.
If not, the patch is not needed, although it is relatively harmless too.

-Andi


* Re: [2.6.20.16 review 27/28] NTP: remove clock_was_set() call to prevent deadlock
  2007-08-11 19:48 ` [2.6.20.16 review 27/28] NTP: remove clock_was_set() call to prevent deadlock Willy Tarreau
@ 2007-08-12 11:15   ` Jason Uhlenkott
  2007-08-12 11:47     ` Willy Tarreau
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Uhlenkott @ 2007-08-12 11:15 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: linux-kernel, stable, Thomas Gleixner, john stultz, Dave Jones,
	Ingo Molnar, Vincent Fortier, Chris Wright, Greg Kroah-Hartman

On Sat, Aug 11, 2007 at 21:48:19 +0200, Willy Tarreau wrote:
> The clock_was_set() call in second_overflow(), which happens only when
> leap seconds are inserted / deleted, is wrong in two aspects:
> 
> 1. it results in a call to on_each_cpu() with interrupts disabled
> 2. it is a potential deadlock source vs. call_lock in smp_call_function()

clock_was_set() is a no-op in 2.6.20, so this one looks unnecessary
(but harmless).  Thankfully the "hang every Linux box on the planet
simultaneously" regression (okay, that's *slight* hyperbole) was
limited to 2.6.21.
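
To illustrate why the removal makes no difference when the call is a
no-op, here is a minimal userspace sketch. It assumes, as stated above,
that clock_was_set() expands to nothing in 2.6.20; the macro body,
struct ntp_model and both helper functions are hypothetical stand-ins,
not the kernel's code.

```c
#include <assert.h>

/* Assumption: models a no-op clock_was_set(), as described for 2.6.20. */
#define clock_was_set() do { } while (0)

enum { TIME_OK, TIME_WAIT };

/* Toy model of the state touched when a leap second is deleted. */
struct ntp_model {
	long sec;	/* stands in for xtime.tv_sec */
	int state;	/* stands in for time_state */
};

/* Leap second deletion as before the patch: the call is still there. */
static void delete_leap_with_call(struct ntp_model *m)
{
	m->sec++;
	m->state = TIME_WAIT;
	clock_was_set();	/* expands to nothing in this model */
}

/* Leap second deletion as after the patch: the call is gone. */
static void delete_leap_without_call(struct ntp_model *m)
{
	m->sec++;
	m->state = TIME_WAIT;
}
```

Both variants leave the model in the same state, which is the sense in
which the patch is unnecessary but harmless here.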


* Re: [2.6.20.16 review 27/28] NTP: remove clock_was_set() call to prevent deadlock
  2007-08-12 11:15   ` Jason Uhlenkott
@ 2007-08-12 11:47     ` Willy Tarreau
  0 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-12 11:47 UTC (permalink / raw)
  To: Jason Uhlenkott
  Cc: linux-kernel, stable, Thomas Gleixner, john stultz, Dave Jones,
	Ingo Molnar, Vincent Fortier, Chris Wright, Greg Kroah-Hartman

On Sun, Aug 12, 2007 at 04:15:58AM -0700, Jason Uhlenkott wrote:
> On Sat, Aug 11, 2007 at 21:48:19 +0200, Willy Tarreau wrote:
> > The clock_was_set() call in second_overflow(), which happens only when
> > leap seconds are inserted / deleted, is wrong in two aspects:
> > 
> > 1. it results in a call to on_each_cpu() with interrupts disabled
> > 2. it is a potential deadlock source vs. call_lock in smp_call_function()
> 
> clock_was_set() is a no-op in 2.6.20, so this one looks unnecessary
> (but harmless).  Thankfully the "hang every Linux box on the planet
> simultaneously" regression (okay, that's *slight* hyperbole) was
> limited to 2.6.21.

OK, patch removed.

Thanks for your help,
Willy



* Re: [2.6.20.16 review 08/28] x86_64: allocate sparsemem memmap above 4G
  2007-08-12 10:18   ` Andi Kleen
@ 2007-08-12 11:52     ` Willy Tarreau
  0 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2007-08-12 11:52 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, stable, Zou Nan hai, Suresh Siddha, Andrew Morton,
	Linus Torvalds, Chris Wright, Greg Kroah-Hartman

Hi Andi,

On Sun, Aug 12, 2007 at 12:18:11PM +0200, Andi Kleen wrote:
> On Saturday 11 August 2007 21:48, Willy Tarreau wrote:
> > On systems with huge amount of physical memory, VFS cache and memory memmap
> > may eat all available system memory under 4G, then the system may fail to
> > allocate swiotlb bounce buffer.
> >
> > There was a fix for this issue in arch/x86_64/mm/numa.c, but that fix does
> > not cover the sparsemem model.
> 
> Have you checked if sparsemem even worked in 2.6.20?

No, unfortunately I'm not equipped for that.

> IIRC it was quite
> unstable a couple of releases ago. There were times when it rarely
> booted on x86-64 because so few people test it.
> If not, the patch is not needed, although it is relatively harmless too.

OK. So I'd propose the following:
 - if someone can confirm that it did not work anyway, I remove the patch,
   which becomes useless.
 - but if we get no confirmation, then, since in doubt some people _may_
   be relying on it and it does not affect anyone else, we keep it.

Are you OK with this?

Thanks,
Willy



end of thread, other threads:[~2007-08-12 11:53 UTC | newest]

Thread overview: 29+ messages
2007-08-11 18:47 [2.6.20.16 review 00/28] 2.6.20.16 -stable review Willy Tarreau
2007-08-11 19:47 ` [2.6.20.16 review 01/28] i386: Fix K8/core2 oprofile on multiple CPUs Willy Tarreau
2007-08-11 19:47 ` [2.6.20.16 review 02/28] md: Avoid overflow in raid0 calculation with large components Willy Tarreau
2007-08-11 19:47 ` [2.6.20.16 review 03/28] md: Dont write more than is required of the last page of a bitmap Willy Tarreau
2007-08-11 19:47 ` [2.6.20.16 review 04/28] make freezeable workqueues singlethread Willy Tarreau
2007-08-11 19:47 ` [2.6.20.16 review 05/28] Char: cyclades, fix deadlock Willy Tarreau
2007-08-11 19:47 ` [2.6.20.16 review 06/28] e1000: disable polling before registering netdevice Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 08/28] x86_64: allocate sparsemem memmap above 4G Willy Tarreau
2007-08-12 10:18   ` Andi Kleen
2007-08-12 11:52     ` Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 09/28] sparsemem: fix oops in x86_64 show_mem Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 10/28] rt-mutex: Fix stale return value Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 11/28] rt-mutex: Fix chain walk early wakeup bug Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 13/28] md: Fix two raid10 bugs Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 14/28] md: Fix bug in error handling during raid1 repair Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 15/28] dm crypt: disable barriers Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 16/28] dm crypt: fix call to clone_init Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 17/28] dm crypt: fix avoid cloned bio ref after free Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 19/28] sched: fix next_interval determination in idle_balance() Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 21/28] audit: fix oops removing watch if audit disabled Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 22/28] POWERPC: Fix subtle FP state corruption bug in signal return on SMP Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 23/28] mm: kill validate_anon_vma to avoid mapcount BUG Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 24/28] saa7134: fix thread shutdown handling Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 25/28] serial: clear proper MPSC interrupt cause bits Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 26/28] i386: fix infinite loop with singlestep int80 syscalls Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 27/28] NTP: remove clock_was_set() call to prevent deadlock Willy Tarreau
2007-08-12 11:15   ` Jason Uhlenkott
2007-08-12 11:47     ` Willy Tarreau
2007-08-11 19:48 ` [2.6.20.16 review 28/28] sky2: workaround for lost IRQ Willy Tarreau
