Currently the scsi_debug driver wraps every queued command with the
host_lock and every mid-level completion callback (apart from delay=0)
with a spinlock.

Attached is a patch against Linus's tree that also applies to
lk 3.15.1. It attempts to address some of these issues.

ChangeLog
  - 'host_lock' option added that simply drops that lock when
    host_lock=0 (which is the default)
  - allow delay=-1 [delay=1 (jiffy) is still the default] that
    uses a tasklet to schedule the response quickly
  - for completions (when delay!=0) the callback to the mid-level
    is un-(spin)locked
  - completions are counted; can be viewed with
    cat /proc/scsi/scsi_debug/
  - delay_override removed from TEST UNIT READY. This makes
    'sg_turs -n 1m -t /dev/bsg/' a more realistic test of command
    overhead. I get about 100k IOPS on my laptop.

This patch has been lightly tested. Perhaps someone could throw a
scsi-mq test at it.

Comments welcome.

Doug Gilbert
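
For readers who want a picture of the "un-(spin)locked" completion item in the ChangeLog above, here is a minimal user-space sketch of the general pattern. It is not code from the patch or from scsi_debug.c. Every name in it (sim_queue, sim_queue_submit, sim_queue_complete, sim_mark_done) is hypothetical, and a C11 atomic_flag stands in for the kernel spinlock. The idea shown: hold the lock only long enough to detach the pending command and bump a completion counter, then invoke the callback with the lock released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* All names below are hypothetical; this is an illustration only. */
typedef void (*done_fn)(void *cmd);

struct sim_queue {
    atomic_flag lock;           /* stands in for the driver's spinlock */
    void *pending_cmd;          /* single in-flight command, or NULL */
    done_fn pending_done;       /* its completion callback, or NULL */
    unsigned long completions;  /* number of completions delivered */
};

static void sim_lock(atomic_flag *l)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void sim_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

void sim_queue_init(struct sim_queue *q)
{
    assert(q != NULL);
    atomic_flag_clear(&q->lock);
    q->pending_cmd = NULL;
    q->pending_done = NULL;
    q->completions = 0;
}

/* Queue one command; returns 0 on success, -1 if the slot is busy. */
int sim_queue_submit(struct sim_queue *q, void *cmd, done_fn done)
{
    int ret = -1;

    sim_lock(&q->lock);
    if (q->pending_done == NULL) {
        q->pending_cmd = cmd;
        q->pending_done = done;
        ret = 0;
    }
    sim_unlock(&q->lock);
    return ret;
}

/*
 * Complete the pending command, if any. The lock protects only the
 * bookkeeping; the callback itself runs with the lock released.
 * Returns 1 if a callback was invoked, 0 if nothing was pending.
 */
int sim_queue_complete(struct sim_queue *q)
{
    done_fn done;
    void *cmd;

    sim_lock(&q->lock);
    done = q->pending_done;
    cmd = q->pending_cmd;
    q->pending_done = NULL;
    q->pending_cmd = NULL;
    if (done != NULL)
        q->completions++;
    sim_unlock(&q->lock);

    if (done == NULL)
        return 0;
    done(cmd);                  /* lock is NOT held here */
    return 1;
}

/* Example callback: increments the int that cmd points to. */
void sim_mark_done(void *cmd)
{
    (*(int *)cmd)++;
}
```

Because the callback runs unlocked in this sketch, a callback may itself call sim_queue_submit() on the same queue without spinning forever on the lock; with the lock held across the callback, that would deadlock.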