* [Qemu-devel] Endless loop in qcow2_alloc_cluster_offset
From: Jan Kiszka @ 2009-11-19 12:19 UTC
To: qemu-devel; +Cc: Kevin Wolf, kvm

Hi,

I just managed to push a qemu-kvm process (git rev. b496fe3431) into an
endless loop in qcow2_alloc_cluster_offset, namely over
QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight):

(gdb) bt
#0 0x000000000048614b in qcow2_alloc_cluster_offset (bs=0xc4e1d0, offset=7417184256, n_start=0, n_end=16, num=0xcb351c, m=0xcb3568) at /data/qemu-kvm/block/qcow2-cluster.c:750
#1 0x00000000004828d0 in qcow_aio_write_cb (opaque=0xcb34d0, ret=0) at /data/qemu-kvm/block/qcow2.c:587
#2 0x0000000000482a44 in qcow_aio_writev (bs=<value optimized out>, sector_num=<value optimized out>, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>) at /data/qemu-kvm/block/qcow2.c:645
#3 0x0000000000470e89 in bdrv_aio_writev (bs=0xc4e1d0, sector_num=2, qiov=0x7f48a9010ed0, nb_sectors=16, cb=0x470d20 <bdrv_rw_em_cb>, opaque=0x7f48a9010f0c) at /data/qemu-kvm/block.c:1362
#4 0x0000000000472991 in bdrv_write_em (bs=0xc4e1d0, sector_num=14486688, buf=0xd67200 "H\a", nb_sectors=16) at /data/qemu-kvm/block.c:1736
#5 0x0000000000435581 in ide_sector_write (s=0xc92650) at /data/qemu-kvm/hw/ide/core.c:622
#6 0x0000000000425fc2 in kvm_handle_io (env=<value optimized out>) at /data/qemu-kvm/kvm-all.c:553
#7 kvm_run (env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:964
#8 0x0000000000426049 in kvm_cpu_exec (env=0x1000) at /data/qemu-kvm/qemu-kvm.c:1651
#9 0x000000000042627d in kvm_main_loop_cpu (_env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:1893
#10 ap_main_loop (_env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:1943
#11 0x00007f48ae89d070 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f48abf0711d in clone () from /lib64/libc.so.6
#13 0x0000000000000000 in ?? ()
(gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
$5 = (struct QCowL2Meta *) 0xcb3568
(gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
$6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}

So next == first.

Is something fiddling with cluster_allocs concurrently, e.g. some signal
handler? Or what could cause this list corruption? Would it be enough to
move to QLIST_FOREACH_SAFE?

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset
From: Kevin Wolf @ 2009-11-19 14:49 UTC
To: Jan Kiszka; +Cc: qemu-devel, kvm

Hi Jan,

On 19.11.2009 13:19, Jan Kiszka wrote:
> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
> $5 = (struct QCowL2Meta *) 0xcb3568
> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}
>
> So next == first.

Oops. Doesn't sound quite right...

> Is something fiddling with cluster_allocs concurrently, e.g. some signal
> handler? Or what could cause this list corruption? Would it be enough to
> move to QLIST_FOREACH_SAFE?

Are there any specific signals you're thinking of? Related to block code
I can only think of SIGUSR2 and this one shouldn't call any block driver
functions directly. You're using aio=threads, I assume? (It's the default)

QLIST_FOREACH_SAFE shouldn't make a difference in this place as the loop
doesn't insert or remove any elements. If the list is corrupted now, I
think it would be corrupted with QLIST_FOREACH_SAFE as well - at best,
the endless loop would occur one call later.

The only way I see to get such a loop in a list is to re-insert an
element that already is part of the list. The only insert is at
qcow2-cluster.c:777. Remains the question how we came there twice
without run_dependent_requests() removing the L2Meta from our list first
- because this is definitely wrong...

Presumably, it's not reproducible?

Kevin
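The re-insertion scenario described above can be sketched in isolation. The following is a minimal illustration, assuming a simplified stand-in for QLIST (which follows the BSD LIST macros); the names `struct node`, `list_insert_head`, `list_walk_bounded` and `list_walk_safe_bounded` are hypothetical and not taken from QEMU. It suggests that inserting the head element a second time leaves `le_next` pointing at the element itself and `le_prev` at the list head, which is the shape seen in the gdb dump, and that a "safe" style traversal spins just the same.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an intrusive BSD-style list. All names here are
 * illustrative assumptions, not QEMU's actual definitions. */
struct node {
    struct node *le_next;   /* next element, NULL at the tail */
    struct node **le_prev;  /* address of the pointer that points at us */
};

struct list_head {
    struct node *lh_first;
};

/* Same pointer updates as a BSD-style LIST_INSERT_HEAD. */
static void list_insert_head(struct list_head *head, struct node *elm)
{
    assert(head != NULL && elm != NULL);
    elm->le_next = head->lh_first;
    if (elm->le_next != NULL) {
        head->lh_first->le_prev = &elm->le_next;
    }
    head->lh_first = elm;
    elm->le_prev = &head->lh_first;
}

/* Walk like a FOREACH loop, but stop after `limit` steps so that a
 * corrupted list cannot hang the caller. Returns the steps taken. */
static size_t list_walk_bounded(const struct list_head *head, size_t limit)
{
    size_t steps = 0;
    const struct node *n;

    for (n = head->lh_first; n != NULL && steps < limit; n = n->le_next) {
        steps++;
    }
    return steps;
}

/* Same walk, but the next pointer is read before the element is
 * "visited", as a FOREACH_SAFE-style loop does. */
static size_t list_walk_safe_bounded(const struct list_head *head, size_t limit)
{
    size_t steps = 0;
    const struct node *n = head->lh_first;

    while (n != NULL && steps < limit) {
        const struct node *next = n->le_next;
        steps++;
        n = next;
    }
    return steps;
}
```

Calling `list_insert_head()` twice with the same element and then either walker with a limit shows both traversals exhausting the limit, so switching the iteration macro alone would not be expected to help here.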
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset 2009-11-19 14:49 ` [Qemu-devel] " Kevin Wolf @ 2009-11-19 14:58 ` Jan Kiszka 2009-12-07 14:16 ` Jan Kiszka 0 siblings, 1 reply; 15+ messages in thread From: Jan Kiszka @ 2009-11-19 14:58 UTC (permalink / raw) To: Kevin Wolf; +Cc: qemu-devel, kvm Kevin Wolf wrote: > Hi Jan, > > Am 19.11.2009 13:19, schrieb Jan Kiszka: >> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first >> $5 = (struct QCowL2Meta *) 0xcb3568 >> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first >> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight = {le_next = 0xcb3568, le_prev = 0xc4ebd8}} >> >> So next == first. > > Oops. Doesn't sound quite right... > >> Is something fiddling with cluster_allocs concurrently, e.g. some signal >> handler? Or what could cause this list corruption? Would it be enough to >> move to QLIST_FOREACH_SAFE? > > Are there any specific signals you're thinking of? Related to block code No, was just blind guessing. > I can only think of SIGUSR2 and this one shouldn't call any block driver > functions directly. You're using aio=threads, I assume? (It's the default) Yes, all on defaults. > > QLIST_FOREACH_SAFE shouldn't make a difference in this place as the loop > doesn't insert or remove any elements. If the list is corrupted now, I > think it would be corrupted with QLIST_FOREACH_SAFE as well - at best, > the endless loop would occur one call later. > > The only way I see to get such a loop in a list is to re-insert an > element that already is part of the list. The only insert is at > qcow2-cluster.c:777. Remains the question how we came there twice > without run_dependent_requests() removing the L2Meta from our list first > - because this is definitely wrong... > > Presumably, it's not reproducible? Likely not. 
What I did was nothing special, and I did not notice such a crash in the last months.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset
From: Jan Kiszka @ 2009-12-07 14:16 UTC
Cc: Kevin Wolf, qemu-devel, kvm

Jan Kiszka wrote:
> Kevin Wolf wrote:
>> Hi Jan,
>>
>> On 19.11.2009 13:19, Jan Kiszka wrote:
>>> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
>>> $5 = (struct QCowL2Meta *) 0xcb3568
>>> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
>>> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}
>>>
>>> So next == first.
>> Oops. Doesn't sound quite right...
>>
>>> Is something fiddling with cluster_allocs concurrently, e.g. some signal
>>> handler? Or what could cause this list corruption? Would it be enough to
>>> move to QLIST_FOREACH_SAFE?
>> Are there any specific signals you're thinking of? Related to block code
>
> No, was just blind guessing.
>
>> I can only think of SIGUSR2 and this one shouldn't call any block driver
>> functions directly. You're using aio=threads, I assume? (It's the default)
>
> Yes, all on defaults.
>
>> QLIST_FOREACH_SAFE shouldn't make a difference in this place as the loop
>> doesn't insert or remove any elements. If the list is corrupted now, I
>> think it would be corrupted with QLIST_FOREACH_SAFE as well - at best,
>> the endless loop would occur one call later.
>>
>> The only way I see to get such a loop in a list is to re-insert an
>> element that already is part of the list. The only insert is at
>> qcow2-cluster.c:777. Remains the question how we came there twice
>> without run_dependent_requests() removing the L2Meta from our list first
>> - because this is definitely wrong...
>>
>> Presumably, it's not reproducible?
>
> Likely not. What I did was nothing special, and I did not notice such a
> crash in the last months.

And now it happened again (qemu-kvm head, during kernel installation
from network onto local qcow2-disk). Any clever idea how to proceed with
this?

I could try to run the step in a loop, hopefully retriggering it once in
a (likely longer) while. But then we need some good instrumentation first.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset
From: Jan Kiszka @ 2009-12-07 14:50 UTC
To: Kevin Wolf, qemu-devel, kvm

Jan Kiszka wrote:
> And now it happened again (qemu-kvm head, during kernel installation
> from network onto local qcow2-disk). Any clever idea how to proceed with
> this?
>
> I could try to run the step in a loop, hopefully retriggering it once in
> a (likely longer) while. But then we need some good instrumentation first.
>

Maybe I'm seeing ghosts, and I don't even have a minimal clue about what
goes on in the code, but this looks fishy:

preallocate() invokes qcow2_alloc_cluster_offset() passing &meta, a
stack variable. It seems that qcow2_alloc_cluster_offset() may insert
this structure into cluster_allocs and leave it there. So we corrupt the
queue as soon as preallocate() returns, no?

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset
From: Kevin Wolf @ 2009-12-07 15:03 UTC
To: Jan Kiszka; +Cc: qemu-devel, kvm

On 07.12.2009 15:50, Jan Kiszka wrote:
> Jan Kiszka wrote:
>> And now it happened again (qemu-kvm head, during kernel installation
>> from network onto local qcow2-disk). Any clever idea how to proceed with
>> this?
>>
>> I could try to run the step in a loop, hopefully retriggering it once in
>> a (likely longer) while. But then we need some good instrumentation first.
>>
>
> Maybe I'm seeing ghosts, and I don't even have a minimal clue about what
> goes on in the code, but this looks fishy:
>
> preallocate() invokes qcow2_alloc_cluster_offset() passing &meta, a
> stack variable. It seems that qcow2_alloc_cluster_offset() may insert
> this structure into cluster_allocs and leave it there. So we corrupt the
> queue as soon as preallocate() returns, no?

preallocate() is about metadata preallocation during image creation. It
is only ever run by qemu-img. Apart from that it calls
run_dependent_requests() which removes the request from the list again.

Kevin
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset 2009-12-07 15:03 ` Kevin Wolf @ 2009-12-07 15:25 ` Jan Kiszka 0 siblings, 0 replies; 15+ messages in thread From: Jan Kiszka @ 2009-12-07 15:25 UTC (permalink / raw) To: Kevin Wolf; +Cc: qemu-devel, kvm Kevin Wolf wrote: > Am 07.12.2009 15:50, schrieb Jan Kiszka: >> Jan Kiszka wrote: >>> And now it happened again (qemu-kvm head, during kernel installation >>> from network onto local qcow2-disk). Any clever idea how to proceed with >>> this? >>> >>> I could try to run the step in a loop, hopefully retriggering it once in >>> a (likely longer) while. But then we need some good instrumentation first. >>> >> Maybe I'm seeing ghosts, and I don't even have a minimal clue about what >> goes on in the code, but this looks fishy: >> >> preallocate() invokes qcow2_alloc_cluster_offset() passing &meta, a >> stack variable. It seems that qcow2_alloc_cluster_offset() may insert >> this structure into cluster_allocs and leave it there. So we corrupt the >> queue as soon as preallocate() returns, no? > > preallocate() is about metadata preallocation during image creation. It > is only ever run by qemu-img. Apart from that it calls > run_dependent_requests() which removes the request from the list again. OK, I see - was far too easy anyway. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset
From: Avi Kivity @ 2009-12-07 15:04 UTC
To: Jan Kiszka; +Cc: Kevin Wolf, qemu-devel, kvm

On 12/07/2009 04:50 PM, Jan Kiszka wrote:
>
> Maybe I'm seeing ghosts, and I don't even have a minimal clue about what
> goes on in the code, but this looks fishy:
>

Plenty of ghosts in qcow2, of all those explorers who tried to brave the
code. Only Kevin has ever come back.

> preallocate() invokes qcow2_alloc_cluster_offset() passing &meta, a
> stack variable. It seems that qcow2_alloc_cluster_offset() may insert
> this structure into cluster_allocs and leave it there. So we corrupt the
> queue as soon as preallocate() returns, no?
>

We invoke run_dependent_requests() which should dequeue those &meta
again (I think).

--
error compiling committee.c: too many arguments to function
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset
From: Kevin Wolf @ 2009-12-07 15:00 UTC
To: Jan Kiszka; +Cc: qemu-devel, kvm

On 07.12.2009 15:16, Jan Kiszka wrote:
>> Likely not. What I did was nothing special, and I did not notice such a
>> crash in the last months.
>
> And now it happened again (qemu-kvm head, during kernel installation
> from network onto local qcow2-disk). Any clever idea how to proceed with
> this?

I still haven't seen this and I still have no theory on what could be
happening here. I'm just trying to write down what I think must happen
to get into this situation. Maybe you can point at something I'm missing
or maybe it helps you to have a sudden inspiration.

The crash happens because we have a loop in the s->cluster_allocs list.
A loop can only be created by inserting an object twice. The only insert
to this list happens in qcow2_alloc_cluster_offset (though an earlier
call than that of the stack trace).

There is only one relevant caller of this function, qcow_aio_write_cb.
Part of it is a call to run_dependent_requests which removes the request
from s->cluster_allocs. So after the QLIST_REMOVE in
run_dependent_requests the request can't be contained in the list, but
at the call of qcow2_alloc_cluster_offset it must be contained again. It
must be added somewhere in between these two calls.

In qcow_aio_write_cb there isn't much happening between these calls. The
only thing that could somehow become dangerous is the
qcow_aio_write_cb(req, 0); for queued requests in run_dependent_requests.

> I could try to run the step in a loop, hopefully retriggering it once in
> a (likely longer) while. But then we need some good instrumentation first.

I can't explain what exactly would be going wrong there, but if my
thoughts are right so far, I think that moving this into a Bottom Half
would help. So if you can reproduce it in a loop this could be worth a
try.

I'd certainly prefer to understand the problem first, but thinking about
AIO is the perfect way to make your brain hurt...

Kevin
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset
From: Jan Kiszka @ 2009-12-07 16:09 UTC
To: Kevin Wolf; +Cc: qemu-devel, kvm

Kevin Wolf wrote:
> On 07.12.2009 15:16, Jan Kiszka wrote:
>>> Likely not. What I did was nothing special, and I did not notice such a
>>> crash in the last months.
>> And now it happened again (qemu-kvm head, during kernel installation
>> from network onto local qcow2-disk). Any clever idea how to proceed with
>> this?
>
> I still haven't seen this and I still have no theory on what could be
> happening here. I'm just trying to write down what I think must happen
> to get into this situation. Maybe you can point at something I'm missing
> or maybe it helps you to have a sudden inspiration.
>
> The crash happens because we have a loop in the s->cluster_allocs list.
> A loop can only be created by inserting an object twice. The only insert
> to this list happens in qcow2_alloc_cluster_offset (though an earlier
> call than that of the stack trace).
>
> There is only one relevant caller of this function, qcow_aio_write_cb.
> Part of it is a call to run_dependent_requests which removes the request
> from s->cluster_allocs. So after the QLIST_REMOVE in
> run_dependent_requests the request can't be contained in the list, but
> at the call of qcow2_alloc_cluster_offset it must be contained again. It
> must be added somewhere in between these two calls.
>
> In qcow_aio_write_cb there isn't much happening between these calls. The
> only thing that could somehow become dangerous is the
> qcow_aio_write_cb(req, 0); for queued requests in run_dependent_requests.

If m->nb_clusters is 0, the entry won't be removed from the list. And
if something corrupted nb_clusters so that it became 0 although it's
still enqueued, we would see the deadly loop I faced, right?
Unfortunately, any arbitrary memory corruption that generates such zeros
can cause this...

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
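The nb_clusters hypothesis from this message can be played through on a toy model. This is only a sketch under stated assumptions: `struct meta`, `alloc_insert_head`, `alloc_remove` and `complete_request` are hypothetical names, and the one property borrowed from the thread is that removal on completion is skipped when `nb_clusters` is 0. The rest is a simplified BSD-style list, not QEMU's code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an in-flight allocation entry. Names are illustrative. */
struct meta {
    int nb_clusters;        /* 0 is treated as "nothing was enqueued" */
    struct meta *le_next;
    struct meta **le_prev;
};

struct alloc_list {
    struct meta *lh_first;
};

/* BSD-style head insertion. */
static void alloc_insert_head(struct alloc_list *head, struct meta *m)
{
    assert(head != NULL && m != NULL);
    m->le_next = head->lh_first;
    if (m->le_next != NULL) {
        head->lh_first->le_prev = &m->le_next;
    }
    head->lh_first = m;
    m->le_prev = &head->lh_first;
}

/* BSD-style removal of an element that is currently linked. */
static void alloc_remove(struct meta *m)
{
    if (m->le_next != NULL) {
        m->le_next->le_prev = m->le_prev;
    }
    *m->le_prev = m->le_next;
}

/* Completion path: the entry is dequeued only when nb_clusters != 0
 * (the guard discussed in the thread, modelled here as an assumption). */
static void complete_request(struct meta *m)
{
    if (m->nb_clusters != 0) {
        alloc_remove(m);
    }
}
```

If `nb_clusters` is overwritten with 0 while the entry is still linked, `complete_request()` leaves it in the list, and the next `alloc_insert_head()` of the same entry produces `le_next == m`, i.e. the self-loop from the report.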
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset
From: Kevin Wolf @ 2009-12-07 16:26 UTC
To: Jan Kiszka; +Cc: qemu-devel, kvm

On 07.12.2009 17:09, Jan Kiszka wrote:
> Kevin Wolf wrote:
>> In qcow_aio_write_cb there isn't much happening between these calls. The
>> only thing that could somehow become dangerous is the
>> qcow_aio_write_cb(req, 0); for queued requests in run_dependent_requests.
>
> If m->nb_clusters is 0, the entry won't be removed from the list. And
> if something corrupted nb_clusters so that it became 0 although it's
> still enqueued, we would see the deadly loop I faced, right?
> Unfortunately, any arbitrary memory corruption that generates such zeros
> can cause this...

Right, this looks like another way to get into that endless loop. I
don't think it's very likely the cause, but who knows.

Kevin
* Re: [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset 2009-12-07 15:00 ` Kevin Wolf 2009-12-07 16:09 ` Jan Kiszka @ 2009-12-08 14:51 ` Kevin Wolf 1 sibling, 0 replies; 15+ messages in thread From: Kevin Wolf @ 2009-12-08 14:51 UTC (permalink / raw) To: Jan Kiszka; +Cc: qemu-devel, kvm Am 07.12.2009 16:00, schrieb Kevin Wolf: > Am 07.12.2009 15:16, schrieb Jan Kiszka: >>> Likely not. What I did was nothing special, and I did not noticed such a >>> crash in the last months. >> >> And now it happened again (qemu-kvm head, during kernel installation >> from network onto local qcow2-disk). Any clever idea how to proceed with >> this? > > I still haven't seen this and I still have no theory on what could be > happening here. I'm just trying to write down what I think must happen > to get into this situation. Maybe you can point at something I'm missing > or maybe it helps you to have a sudden inspiration. > > The crash happens because we have a loop in the s->cluster_allocs list. > A loop can only be created by inserting an object twice. The only insert > to this list happens in qcow2_alloc_cluster_offset (though an earlier > call than that of the stack trace). > > There is only one relevant caller of this function, qcow_aio_write_cb. > Part of it is a call to run_dependent_requests which removes the request > from s->cluster_allocs. So after the QLIST_REMOVE in > run_dependent_requests the request can't be contained in the list, but > at the call of qcow2_alloc_cluster_offset it must be contained again. It > must be added somewhere in between these two calls. > > In qcow_aio_write_cb there isn't much happening between these calls. The > only thing that could somehow become dangerous is the > qcow_aio_write_cb(req, 0); for queued requests in run_dependent_requests. Hm, you're using only one disk, and it's an IDE disk, right? Then the queue of dependent requests should be empty anyway, so no dangerous calls here. 
Maybe your theory of a memory corruption is the better one.

Kevin
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset 2009-11-19 12:19 [Qemu-devel] Endless loop in qcow2_alloc_cluster_offset Jan Kiszka 2009-11-19 14:49 ` [Qemu-devel] " Kevin Wolf @ 2010-05-07 1:19 ` Marcelo Tosatti 2010-05-07 7:37 ` Kevin Wolf 1 sibling, 1 reply; 15+ messages in thread From: Marcelo Tosatti @ 2010-05-07 1:19 UTC (permalink / raw) To: Jan Kiszka; +Cc: Kevin Wolf, qemu-devel, kvm On Thu, Nov 19, 2009 at 01:19:55PM +0100, Jan Kiszka wrote: > Hi, > > I just managed to push a qemu-kvm process (git rev. b496fe3431) into an > endless loop in qcow2_alloc_cluster_offset, namely over > QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight): > > (gdb) bt > #0 0x000000000048614b in qcow2_alloc_cluster_offset (bs=0xc4e1d0, offset=7417184256, n_start=0, n_end=16, num=0xcb351c, m=0xcb3568) at /data/qemu-kvm/block/qcow2-cluster.c:750 > #1 0x00000000004828d0 in qcow_aio_write_cb (opaque=0xcb34d0, ret=0) at /data/qemu-kvm/block/qcow2.c:587 > #2 0x0000000000482a44 in qcow_aio_writev (bs=<value optimized out>, sector_num=<value optimized out>, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>) at /data/qemu-kvm/block/qcow2.c:645 > #3 0x0000000000470e89 in bdrv_aio_writev (bs=0xc4e1d0, sector_num=2, qiov=0x7f48a9010ed0, nb_sectors=16, cb=0x470d20 <bdrv_rw_em_cb>, opaque=0x7f48a9010f0c) at /data/qemu-kvm/block.c:1362 > #4 0x0000000000472991 in bdrv_write_em (bs=0xc4e1d0, sector_num=14486688, buf=0xd67200 "H\a", nb_sectors=16) at /data/qemu-kvm/block.c:1736 > #5 0x0000000000435581 in ide_sector_write (s=0xc92650) at /data/qemu-kvm/hw/ide/core.c:622 > #6 0x0000000000425fc2 in kvm_handle_io (env=<value optimized out>) at /data/qemu-kvm/kvm-all.c:553 > #7 kvm_run (env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:964 > #8 0x0000000000426049 in kvm_cpu_exec (env=0x1000) at /data/qemu-kvm/qemu-kvm.c:1651 > #9 0x000000000042627d in kvm_main_loop_cpu (_env=<value optimized out>) at 
/data/qemu-kvm/qemu-kvm.c:1893
> #10 ap_main_loop (_env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:1943
> #11 0x00007f48ae89d070 in start_thread () from /lib64/libpthread.so.0
> #12 0x00007f48abf0711d in clone () from /lib64/libc.so.6
> #13 0x0000000000000000 in ?? ()
> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
> $5 = (struct QCowL2Meta *) 0xcb3568
> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}
>
> So next == first.
>

Seen the exact same bug twice in a row while installing FC12 with IDE
disk, current qemu-kvm.git.

qemu-system-x86_64 -drive file=/root/images/fc12-ide.img,cache=writeback \
    -m 1000 -vnc :1 \
    -net nic,model=virtio \
    -net tap,script=/root/ifup.sh -serial stdio \
    -cdrom /root/iso/linux/Fedora-12-x86_64-DVD.iso \
    -monitor telnet::4445,server,nowait -usbdevice tablet

Can't reproduce though.
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset 2010-05-07 1:19 ` Marcelo Tosatti @ 2010-05-07 7:37 ` Kevin Wolf 2010-05-07 15:16 ` Marcelo Tosatti 0 siblings, 1 reply; 15+ messages in thread From: Kevin Wolf @ 2010-05-07 7:37 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Jan Kiszka, qemu-devel, kvm Am 07.05.2010 03:19, schrieb Marcelo Tosatti: > On Thu, Nov 19, 2009 at 01:19:55PM +0100, Jan Kiszka wrote: >> Hi, >> >> I just managed to push a qemu-kvm process (git rev. b496fe3431) into an >> endless loop in qcow2_alloc_cluster_offset, namely over >> QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight): >> >> (gdb) bt >> #0 0x000000000048614b in qcow2_alloc_cluster_offset (bs=0xc4e1d0, offset=7417184256, n_start=0, n_end=16, num=0xcb351c, m=0xcb3568) at /data/qemu-kvm/block/qcow2-cluster.c:750 >> #1 0x00000000004828d0 in qcow_aio_write_cb (opaque=0xcb34d0, ret=0) at /data/qemu-kvm/block/qcow2.c:587 >> #2 0x0000000000482a44 in qcow_aio_writev (bs=<value optimized out>, sector_num=<value optimized out>, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>) at /data/qemu-kvm/block/qcow2.c:645 >> #3 0x0000000000470e89 in bdrv_aio_writev (bs=0xc4e1d0, sector_num=2, qiov=0x7f48a9010ed0, nb_sectors=16, cb=0x470d20 <bdrv_rw_em_cb>, opaque=0x7f48a9010f0c) at /data/qemu-kvm/block.c:1362 >> #4 0x0000000000472991 in bdrv_write_em (bs=0xc4e1d0, sector_num=14486688, buf=0xd67200 "H\a", nb_sectors=16) at /data/qemu-kvm/block.c:1736 >> #5 0x0000000000435581 in ide_sector_write (s=0xc92650) at /data/qemu-kvm/hw/ide/core.c:622 >> #6 0x0000000000425fc2 in kvm_handle_io (env=<value optimized out>) at /data/qemu-kvm/kvm-all.c:553 >> #7 kvm_run (env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:964 >> #8 0x0000000000426049 in kvm_cpu_exec (env=0x1000) at /data/qemu-kvm/qemu-kvm.c:1651 >> #9 0x000000000042627d in kvm_main_loop_cpu (_env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:1893 >> 
#10 ap_main_loop (_env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:1943
>> #11 0x00007f48ae89d070 in start_thread () from /lib64/libpthread.so.0
>> #12 0x00007f48abf0711d in clone () from /lib64/libc.so.6
>> #13 0x0000000000000000 in ?? ()
>> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
>> $5 = (struct QCowL2Meta *) 0xcb3568
>> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
>> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}
>>
>> So next == first.
>>
>
> Seen the exact same bug twice in a row while installing FC12 with IDE
> disk, current qemu-kvm.git.
>
> qemu-system-x86_64 -drive file=/root/images/fc12-ide.img,cache=writeback \
> -m 1000 -vnc :1 \
> -net nic,model=virtio \
> -net tap,script=/root/ifup.sh -serial stdio \
> -cdrom /root/iso/linux/Fedora-12-x86_64-DVD.iso -monitor
> telnet::4445,server,nowait -usbdevice tablet
>
> Can't reproduce though.

In current git master? That's interesting news. I had kind of expected
it would be fixed with c644db3d.

Kevin
* [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset 2010-05-07 7:37 ` Kevin Wolf @ 2010-05-07 15:16 ` Marcelo Tosatti 0 siblings, 0 replies; 15+ messages in thread From: Marcelo Tosatti @ 2010-05-07 15:16 UTC (permalink / raw) To: Kevin Wolf; +Cc: Jan Kiszka, qemu-devel, kvm On Fri, May 07, 2010 at 09:37:22AM +0200, Kevin Wolf wrote: > Am 07.05.2010 03:19, schrieb Marcelo Tosatti: > > On Thu, Nov 19, 2009 at 01:19:55PM +0100, Jan Kiszka wrote: > >> Hi, > >> > >> I just managed to push a qemu-kvm process (git rev. b496fe3431) into an > >> endless loop in qcow2_alloc_cluster_offset, namely over > >> QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight): > >> > >> (gdb) bt > >> #0 0x000000000048614b in qcow2_alloc_cluster_offset (bs=0xc4e1d0, offset=7417184256, n_start=0, n_end=16, num=0xcb351c, m=0xcb3568) at /data/qemu-kvm/block/qcow2-cluster.c:750 > >> #1 0x00000000004828d0 in qcow_aio_write_cb (opaque=0xcb34d0, ret=0) at /data/qemu-kvm/block/qcow2.c:587 > >> #2 0x0000000000482a44 in qcow_aio_writev (bs=<value optimized out>, sector_num=<value optimized out>, qiov=<value optimized out>, nb_sectors=<value optimized out>, cb=<value optimized out>, opaque=<value optimized out>) at /data/qemu-kvm/block/qcow2.c:645 > >> #3 0x0000000000470e89 in bdrv_aio_writev (bs=0xc4e1d0, sector_num=2, qiov=0x7f48a9010ed0, nb_sectors=16, cb=0x470d20 <bdrv_rw_em_cb>, opaque=0x7f48a9010f0c) at /data/qemu-kvm/block.c:1362 > >> #4 0x0000000000472991 in bdrv_write_em (bs=0xc4e1d0, sector_num=14486688, buf=0xd67200 "H\a", nb_sectors=16) at /data/qemu-kvm/block.c:1736 > >> #5 0x0000000000435581 in ide_sector_write (s=0xc92650) at /data/qemu-kvm/hw/ide/core.c:622 > >> #6 0x0000000000425fc2 in kvm_handle_io (env=<value optimized out>) at /data/qemu-kvm/kvm-all.c:553 > >> #7 kvm_run (env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:964 > >> #8 0x0000000000426049 in kvm_cpu_exec (env=0x1000) at /data/qemu-kvm/qemu-kvm.c:1651 > >> #9 0x000000000042627d in kvm_main_loop_cpu 
(_env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:1893
> >> #10 ap_main_loop (_env=<value optimized out>) at /data/qemu-kvm/qemu-kvm.c:1943
> >> #11 0x00007f48ae89d070 in start_thread () from /lib64/libpthread.so.0
> >> #12 0x00007f48abf0711d in clone () from /lib64/libc.so.6
> >> #13 0x0000000000000000 in ?? ()
> >> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
> >> $5 = (struct QCowL2Meta *) 0xcb3568
> >> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
> >> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}
> >>
> >> So next == first.
> >>
> >
> > Seen the exact same bug twice in a row while installing FC12 with IDE
> > disk, current qemu-kvm.git.
> >
> > qemu-system-x86_64 -drive file=/root/images/fc12-ide.img,cache=writeback \
> > -m 1000 -vnc :1 \
> > -net nic,model=virtio \
> > -net tap,script=/root/ifup.sh -serial stdio \
> > -cdrom /root/iso/linux/Fedora-12-x86_64-DVD.iso -monitor
> > telnet::4445,server,nowait -usbdevice tablet
> >
> > Can't reproduce though.
>
> In current git master? That's interesting news. I had kind of expected
> it would be fixed with c644db3d.

Yes, with 31b460256 more precisely. And the symptom was the same as Jan
reported, cluster_allocs.lh_first had le_next pointing to itself.

Perhaps you can add an assert there, so it abort()'s in that case along
with some useful information?

I'll try to reproduce.
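One possible shape for the assert suggested above is a cycle check run before the list is walked. The sketch below is an assumption-laden illustration rather than the check that was eventually added anywhere: `struct node`, `list_has_cycle` and `list_check` are hypothetical names, and it uses Floyd's tortoise-and-hare technique on the `le_next` pointers only, which catches the `le_next == self` case as well as longer loops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the forward link matters for cycle detection. Illustrative type. */
struct node {
    struct node *le_next;
};

/* Floyd's cycle detection: advance one pointer by one step and another
 * by two; they can only meet again if the chain loops back on itself. */
static bool list_has_cycle(const struct node *first)
{
    const struct node *slow = first;
    const struct node *fast = first;

    while (fast != NULL && fast->le_next != NULL) {
        slow = slow->le_next;
        fast = fast->le_next->le_next;
        if (slow == fast) {
            return true;
        }
    }
    return false;
}

/* Debug hook: abort() via assert as soon as the list is found to be
 * corrupted, instead of spinning forever in a later traversal. */
static void list_check(const struct node *first)
{
    assert(!list_has_cycle(first));
}
```

Calling something like `list_check()` right after each insert would be expected to stop the process at the point where the loop is created, so a core dump shows the inserting call chain rather than a later victim.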