* [linux-lvm] oops with snapshot / 2.4.29
From: dean gaudet @ 2005-03-18 1:20 UTC
To: linux-lvm
i seem to have run into an oops described in the archives -- in particular
this thread from last year:

https://www.redhat.com/archives/linux-lvm/2004-January/msg00110.html

has there been any further work on this? any success stories for the #2
patch mentioned in the above thread?

my system is running 2.4.29 + linux-2.4.22-VFS-lock.patch ... it's a dual
xeon w/hyperthreading. i've been using snapshots for over a year for
nightly backups and this is the first time i've run into the oops.

thanks
-dean

Mar 16 03:31:19 twinlark kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
Mar 16 03:31:19 twinlark kernel: printing eip:
Mar 16 03:31:19 twinlark kernel: c021f30c
Mar 16 03:31:19 twinlark kernel: *pde = 00000000
Mar 16 03:31:19 twinlark kernel: Oops: 0002
Mar 16 03:31:19 twinlark kernel: CPU: 3
Mar 16 03:31:19 twinlark kernel: EIP: 0010:[lvm_snapshot_remap_block+220/256] Not tainted
Mar 16 03:31:19 twinlark kernel: EFLAGS: 00010202
Mar 16 03:31:19 twinlark kernel: eax: f8a3dde0 ebx: f8d36800 ecx: f8a31000 edx: 00000000
Mar 16 03:31:19 twinlark kernel: esi: 00010180 edi: 00000001 ebp: 00000903 esp: e1483ce0
Mar 16 03:31:19 twinlark kernel: ds: 0018 es: 0018 ss: 0018
Mar 16 03:31:19 twinlark kernel: Process qmail-smtpd (pid: 18415, stackpage=e1483000)
Mar 16 03:31:19 twinlark kernel: Stack: 00000038 00000903 00000000 f7ba9600 e94e5c00 e94e5d70 00010000 c021bb28
Mar 16 03:31:19 twinlark kernel: e1483d34 e1483d2c 00010180 e94e5c00 00000001 c01c7662 00000001 f7ba9770
Mar 16 03:31:19 twinlark kernel: f750c000 00010180 00000000 000101b8 000101b8 09030903 00003a00 f7535980
Mar 16 03:31:19 twinlark kernel: Call Trace: [lvm_map+408/1136] [submit_bh+82/256] [lvm_make_request_fn+23/48] [generic_make_request+228/320] [submit_bh+82/256]
Mar 16 03:31:19 twinlark kernel: [__refile_buffer+86/112] [sync_page_buffers+168/176] [try_to_free_buffers+331/368] [shrink_cache+827/1040] [shrink_caches+74/96] [try_to_free_pages_zone+98/240]
Mar 16 03:31:19 twinlark kernel: [balance_classzone+77/496] [__alloc_pages+376/640] [filemap_nopage+494/560] [filemap_nopage+0/560] [do_no_page+330/640] [handle_mm_fault+117/256]
Mar 16 03:31:19 twinlark kernel: [do_page_fault+968/1397] [do_mmap_pgoff+874/1488] [old_mmap+272/304] [filp_close+143/208] [do_page_fault+0/1397] [error_code+52/60]
Mar 16 03:31:19 twinlark kernel:
Mar 16 03:31:19 twinlark kernel: Code: 89 42 04 8b 03 89 48 04 89 01 89 59 04 89 0b 89 ca eb 9f b8
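
A possible reading of the fault address, offered as a sketch and not something the report itself states: `Oops: 0002` marks a kernel-mode write to a not-present page, the register dump shows edx as 00000000, and the first Code bytes `89 42 04` decode on i386 to a store of eax to 4(%edx). Assuming the Code dump starts at the faulting instruction, that is a write through a NULL pointer to a field at offset 4, which is where `prev` sits in a two-pointer list node on 32-bit x86. `struct list_head32` and both helpers below are hypothetical names that only emulate that layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a doubly linked list node as laid out on
 * 32-bit x86: two 4-byte pointers, emulated with uint32_t so the offsets
 * are the same on any build host. */
struct list_head32 {
	uint32_t next;	/* offset 0 */
	uint32_t prev;	/* offset 4 */
};

/* Address touched when a field at field_offset is accessed through base. */
static uint32_t field_address(uint32_t base, size_t field_offset)
{
	return base + (uint32_t)field_offset;
}

/* Address that a store to ->prev hits when the node pointer is NULL. */
static uint32_t null_prev_store_address(void)
{
	return field_address(0, offsetof(struct list_head32, prev));
}
```

Under those assumptions `null_prev_store_address()` evaluates to 4, matching the "virtual address 00000004" in the first line of the oops.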
* [linux-lvm] [patch] oops with snapshot / 2.4.29
From: dean gaudet @ 2005-03-30 23:56 UTC
To: LVM general discussion and development; +Cc: Marcelo Tosatti
having looked at the oops below and studied the code a bit more i think
that the safest thing to do for 2.4.x to fix this race condition is to get
rid of the unsafe promote-to-front hash list traversal. i considered
other fixes (such as a separate spinlock for just the hash list, or a
small array of them to give some amount of concurrency) ... but in the
interests of stability the following patch seems the most appropriate.

besides... nobody has responded to my mail, so perhaps proposing this
patch to marcelo will cause folks to wake up and object :)

-dean

Signed-off-by: dean gaudet <dean@arctic.org>

--- linux-2.4.29/drivers/md/lvm-snap.c.orig 2005-03-25 22:03:43.000000000 -0800
+++ linux-2.4.29/drivers/md/lvm-snap.c 2005-03-30 01:46:17.000000000 -0800
@@ -119,7 +119,6 @@ static inline lv_block_exception_t *lvm_
 	unsigned long mask = lv->lv_snapshot_hash_mask;
 	int chunk_size = lv->lv_chunk_size;
 	lv_block_exception_t *ret;
-	int i = 0;
 
 	hash_table =
 	    &hash_table[hashfn(org_dev, org_start, mask, chunk_size)];
@@ -132,15 +131,9 @@ static inline lv_block_exception_t *lvm_
 		exception = list_entry(next, lv_block_exception_t, hash);
 		if (exception->rsector_org == org_start &&
 		    exception->rdev_org == org_dev) {
-			if (i) {
-				/* fun, isn't it? :) */
-				list_del(next);
-				list_add(next, hash_table);
-			}
 			ret = exception;
 			break;
 		}
-		i++;
 	}
 	return ret;
 }
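
To make the effect of the patch concrete, here is a minimal userspace sketch of a bucket lookup that never writes to the list. It is not the kernel code: `list_node`, `list_init`, `list_add_front`, `struct exception` and `find_exception` are stand-ins assumed for illustration, and the sketch assumes insertions and removals are serialized elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel list helpers (assumed names). */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_front(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Hypothetical exception entry: only the list link and the lookup key. */
struct exception {
	struct list_node hash;	/* must stay first: lookup casts node to entry */
	unsigned long rsector_org;
	int rdev_org;
};

/* Read-only bucket walk: no unlink/relink, so no list pointer is written. */
static struct exception *find_exception(struct list_node *bucket,
					int org_dev, unsigned long org_start)
{
	struct list_node *pos;

	assert(bucket != NULL);
	for (pos = bucket->next; pos != bucket; pos = pos->next) {
		struct exception *e = (struct exception *)pos;

		if (e->rsector_org == org_start && e->rdev_org == org_dev)
			return e;
	}
	return NULL;
}
```

Because the walk only reads `next` pointers, concurrent lookups cannot corrupt the links for one another, which is the property the promote-to-front version lacked. The cost is that frequently hit entries no longer migrate to the head of the bucket.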
On Thu, 17 Mar 2005, dean gaudet wrote:
> i seem to have run into an oops described in the archives -- in particular
> this thread from last year:
>
> https://www.redhat.com/archives/linux-lvm/2004-January/msg00110.html
>
> has there been any further work on this? any success stories for the #2
> patch mentioned in the above thread?
>
> my system is running 2.4.29 + linux-2.4.22-VFS-lock.patch ... it's a dual
> xeon w/hyperthreading. i've been using snapshots for over a year for
> nightly backups and this is the first time i've run into the oops.
>
> thanks
> -dean
>
> Mar 16 03:31:19 twinlark kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
> Mar 16 03:31:19 twinlark kernel: printing eip:
> Mar 16 03:31:19 twinlark kernel: c021f30c
> Mar 16 03:31:19 twinlark kernel: *pde = 00000000
> Mar 16 03:31:19 twinlark kernel: Oops: 0002
> Mar 16 03:31:19 twinlark kernel: CPU: 3
> Mar 16 03:31:19 twinlark kernel: EIP: 0010:[lvm_snapshot_remap_block+220/256] Not tainted
> Mar 16 03:31:19 twinlark kernel: EFLAGS: 00010202
> Mar 16 03:31:19 twinlark kernel: eax: f8a3dde0 ebx: f8d36800 ecx: f8a31000 edx: 00000000
> Mar 16 03:31:19 twinlark kernel: esi: 00010180 edi: 00000001 ebp: 00000903 esp: e1483ce0
> Mar 16 03:31:19 twinlark kernel: ds: 0018 es: 0018 ss: 0018
> Mar 16 03:31:19 twinlark kernel: Process qmail-smtpd (pid: 18415, stackpage=e1483000)
> Mar 16 03:31:19 twinlark kernel: Stack: 00000038 00000903 00000000 f7ba9600 e94e5c00 e94e5d70 00010000 c021bb28
> Mar 16 03:31:19 twinlark kernel: e1483d34 e1483d2c 00010180 e94e5c00 00000001 c01c7662 00000001 f7ba9770
> Mar 16 03:31:19 twinlark kernel: f750c000 00010180 00000000 000101b8 000101b8 09030903 00003a00 f7535980
> Mar 16 03:31:19 twinlark kernel: Call Trace: [lvm_map+408/1136] [submit_bh+82/256] [lvm_make_request_fn+23/48] [generic_make_request+228/320] [submit_bh+82/256]
> Mar 16 03:31:19 twinlark kernel: [__refile_buffer+86/112] [sync_page_buffers+168/176] [try_to_free_buffers+331/368] [shrink_cache+827/1040] [shrink_caches+74/96] [try_to_free_pages_zone+98/240]
> Mar 16 03:31:19 twinlark kernel: [balance_classzone+77/496] [__alloc_pages+376/640] [filemap_nopage+494/560] [filemap_nopage+0/560] [do_no_page+330/640] [handle_mm_fault+117/256]
> Mar 16 03:31:19 twinlark kernel: [do_page_fault+968/1397] [do_mmap_pgoff+874/1488] [old_mmap+272/304] [filp_close+143/208] [do_page_fault+0/1397] [error_code+52/60]
> Mar 16 03:31:19 twinlark kernel:
> Mar 16 03:31:19 twinlark kernel: Code: 89 42 04 8b 03 89 48 04 89 01 89 59 04 89 0b 89 ca eb 9f b8
* [linux-lvm] Re: [patch] oops with snapshot / 2.4.29
From: Marcelo Tosatti @ 2005-03-31 12:23 UTC
To: dean gaudet; +Cc: LVM general discussion and development
Hi dean,

The hash table write is not performed anymore (i.e. your suggestion is already in).

D 1.15 05/01/26 13:01:33-02:00 mauelshagen@redhat.com[marcelo] 16 15 0/7/750
P drivers/md/lvm-snap.c
C fix panics while backing up LVM snapshots
===== lvm-snap.c 1.14 vs 1.15 =====
--- 1.14/drivers/md/lvm-snap.c 2004-03-13 02:25:25 -03:00
+++ 1.15/drivers/md/lvm-snap.c 2005-01-26 13:01:33 -02:00
@@ -119,7 +119,6 @@
 	unsigned long mask = lv->lv_snapshot_hash_mask;
 	int chunk_size = lv->lv_chunk_size;
 	lv_block_exception_t *ret;
-	int i = 0;
 
 	hash_table =
 	    &hash_table[hashfn(org_dev, org_start, mask, chunk_size)];
@@ -132,15 +131,9 @@
 		exception = list_entry(next, lv_block_exception_t, hash);
 		if (exception->rsector_org == org_start &&
 		    exception->rdev_org == org_dev) {
-			if (i) {
-				/* fun, isn't it? :) */
-				list_del(next);
-				list_add(next, hash_table);
-			}
 			ret = exception;
 			break;
 		}
-		i++;
 	}
 	return ret;
 }