* scsi per-device lock [was Re: [RFC][PATCH] ... ]
@ 2003-03-06 19:40 Patrick Mansfield
From: Patrick Mansfield @ 2003-03-06 19:40 UTC (permalink / raw)
To: James Bottomley; +Cc: SCSI Mailing List
On Thu, Mar 06, 2003 at 12:04:04PM -0600, James Bottomley wrote:
> On Thu, 2003-03-06 at 11:41, Patrick Mansfield wrote:
> > Such a solution complicates moving to a per-device queue lock - we need a
> > per-host lock while pulling off the restart list, and we need a
> > per-queue-lock prior to calling __blk_run_queue(); but when starving in
> > the scsi_request_fn(), we have a per-queue-lock and need the per-host lock
> > to add to the restart list. So we have to allow dropping a lock in both
> > cases without leading to a hung request queue.
>
> Actually, I don't think it does. Any driver that can truly have a per
> device lock can't have a fixed size pool of resources that device
> requests are competing for (otherwise it would need a host lock to
> access these resources), thus can_queue should always be set to a number
> much greater than the aggregate of the local device queue depths (we'd
> need a comment in the code to this effect).
I don't quite follow the above - do you mean a per-device lock in the
actual LLDD probably has no benefit? If so, yes, I agree with that.
But we do not need the host_lock while processing the request queue, or
modifying scsi_device values. Holding the host_lock in such cases not only
prevents IO from completing (i.e. host_busy decrement), but also blocks
other disks on the same adapter from prepping a request, allocating a
scsi_cmnd or checking scsi_device limits.
An LLDD that uses host_lock to protect scsi_device or its request queue is
doing something wrong (it can reference invariant fields of scsi_device
such as host/chan/id/lun without a lock, and generally it can reference
scsi_cmnd fields without a lock).
> Do we actually have any LLDs in the tree today that can benefit from a
> device based locking approach? (i.e. they must manage their resource
> pools on a device basis and also have a mailbox scheme that doesn't
> require host based locking).
>
> James
I'm not proposing changing the LLDD's to use a per-device lock, rather
that scsi-core use a per device lock to protect the request queue and the
scsi_device. The scsi core would still use the host_lock to protect
scsi_host and LLDD calls (the silly lock/call/unlock sequences).
Do you see any major problems with this approach?
I have a lock split patch, built on top of the previous pending_cmd queue
patch; it still needs code to handle the single_lun case, and it has a
lock hierarchy. I can post it as-is but it is still in progress, and I
(now) have to change the pending_cmd to instead be a queue of starved
request queues.
I ran with and without the lock split on a 4 processor NUMAQ with 10
drives on a single qla 2300 adapter using the feral driver. I also ran
some tests on the same NUMAQ system with 8 procs and about 39 drives.
I used a simple program to try to maximize IOs per second - it uses
O_DIRECT and rereads the first 512 bytes of the disk. One instance is run
per disk for a fixed number of IOs.
The 4x run with 10 drives, each drive rereading 20000 times, using the
current code gave (I don't know how it got 103% CPU):
0.51user 21.21system 0:21.08elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6670major+2705minor)pagefaults 0swaps
Top oprofile users (low to high):
c0168180 1156 1.10736 dio_await_one
c0180105 1365 1.30757 .text.lock.inode
c01e0ab0 1512 1.44839 scsi_dispatch_cmd
c0115cb0 1545 1.48 try_to_wake_up
c01164c0 1621 1.5528 load_balance
c01e7087 1772 1.69745 .text.lock.scsi_lib
c0151cc9 3380 3.2378 .text.lock.block_dev
c0116d20 3491 3.34413 schedule
c01e1cf5 4148 3.97348 .text.lock.scsi
c01cb945 7584 7.26492 .text.lock.ll_rw_blk
c0107010 25712 24.6302 default_idle
With the lock split the same test gave:
0.43user 14.33system 0:20.55elapsed 71%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (6670major+2705minor)pagefaults 0swaps
Top oprofile users (low to high):
c01077a0 880 1.02679 __switch_to
c0180105 888 1.03612 .text.lock.inode
c01896b0 894 1.04313 start_this_handle
c01e6cc0 912 1.06413 scsi_check_shost
c01e07f0 965 1.12597 scsi_get_command
c0168180 1106 1.29049 dio_await_one
c0115cb0 1483 1.73037 try_to_wake_up
c01164c0 1489 1.73738 load_balance
c0151cc9 2706 3.15738 .text.lock.block_dev
c0116d20 3470 4.04882 schedule
c0107010 24072 28.0874 default_idle
Even though elapsed times were nearly equal with and without the lock
split, system time went from 21 to 14 seconds, a fairly significant drop
for this super-micro-mini benchmark.
I don't have the 8x numbers with 39 drives available, but the *elapsed*
time went from about 60 seconds to about 40 seconds.
Any suggestions for a simple multiple-disk benchmark?
-- Patrick Mansfield