On 10/27/2014 04:25 AM, Ketor D wrote:
> Hi Jens:
>       After debug the v3 patch, I found there is a bug in the patch.
> On the first fio_rbd_getevents loop, the fri->io_seen is set to
> 1, and this variable never set to 0 again. So the program get into
> endless loop in such code:
>
>     do {
>         this_events = rbd_iter_events(td, &events, min, wait);
>
>         if (events >= min)
>             break;
>         if (this_events)
>             continue;
>
>         wait = 1;
>     } while (1);
>
> this_events and events always be 0, because the fri->io_seen is always
> 1, so no events can be getted.
>
> The Bug fix is:
> in the function _fio_rbd_finish_read_aiocb,
> _fio_rbd_finish_write_aiocb and _fio_rbd_finish_sync_aiocb add
> "fio_rbd_iou->io_seen = 0;" after "fio_rbd_iou->io_complete = 1;".

So there are two issues. One is that ->io_seen should be reset in the
->queue() ops, before issuing the IO. The second is that the comp is
released in a racy way, so we can't use it in getevents() reliably.

The new patch moves the comp release to when we reap the event, and
cleans up the ->io_seen setting as well. As far as I can tell, this
should fix all cases. Additionally, it now actually checks for IO errors
and handles those correctly. They were just ignored before. Gets rid of
some useless casting as well, and lots of duplicated IO comp functions.

If everybody involved (Mark, you) could try this one out, then I'd
appreciate it.

-- 
Jens Axboe
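
For illustration only, the two fixes described above (resetting
->io_seen in ->queue() before issuing the IO, and releasing the
completion when the event is reaped) could be modeled roughly as below.
This is a standalone sketch and not the actual patch: the sketch_* names
and the struct are hypothetical, and a plain flag stands in for the
librbd completion object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-IO state kept by the rbd engine. */
struct sketch_io_u {
    int io_seen;     /* already handed back by a getevents pass */
    int io_complete; /* set by the completion callback */
    int comp_held;   /* models the completion object still being alive */
    int error;       /* positive errno-style value, 0 on success */
};

/* ->queue(): reset the per-IO state before the IO is issued. */
static void sketch_queue(struct sketch_io_u *io)
{
    io->io_seen = 0;
    io->io_complete = 0;
    io->error = 0;
    io->comp_held = 1; /* completion allocated for this IO */
}

/* Completion callback: record the result, but do NOT release the comp. */
static void sketch_complete(struct sketch_io_u *io, int ret)
{
    io->error = ret < 0 ? -ret : 0;
    io->io_complete = 1;
}

/* One getevents pass: reap completed, not-yet-seen IOs. */
static int sketch_iter_events(struct sketch_io_u *ios, size_t n,
                              unsigned int *events)
{
    int this_events = 0;

    for (size_t i = 0; i < n; i++) {
        struct sketch_io_u *io = &ios[i];

        if (io->io_seen || !io->io_complete)
            continue;

        /* The comp is still valid here because it is released only now. */
        assert(io->comp_held);
        io->comp_held = 0;
        io->io_seen = 1;
        (*events)++;
        this_events++;
    }

    return this_events;
}
```

Because sketch_queue() clears io_seen each time, a requeued IO becomes
visible to the next pass again, which is the condition the endless loop
in the quoted report was missing.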