* slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
@ 2002-12-16 23:19 Justin T. Gibbs
2002-12-17 0:03 ` Douglas Gilbert
2002-12-17 5:41 ` Doug Ledford
0 siblings, 2 replies; 13+ messages in thread
From: Justin T. Gibbs @ 2002-12-16 23:19 UTC (permalink / raw)
To: linux-scsi; +Cc: dledford
In debugging a different bug in the new 2.5.X port of the aic7xxx driver,
I came across this behavior in scsi_probe_and_add_lun()
	/*
	 * Since we reuse the same sdevscan over and over with different
	 * target and lun values, we have to destroy and then recreate
	 * any possible low level attachments since they very will might
	 * also store the id and lun numbers in some form and need updating
	 * with each scan.
	 */
	if (sdevscan->host->hostt->slave_destroy)
		sdevscan->host->hostt->slave_destroy(sdevscan);
	if (sdevscan->host->hostt->slave_alloc)
		sdevscan->host->hostt->slave_alloc(sdevscan);
So, you cannot rely on slave_destroy as an indication of a device really
going away in the physical sense. In SPI, for example, the driver can only
tell that the device is gone if a command is issued to it. I had hoped that
I could detect hot-pull/scsi-remove-single-device operations via this
callback.
Granted, for some drivers, recreating and destroying state associated with a
particular device might be pretty cheap, but certainly not in all cases. The
aic7xxx and aic79xx drivers maintain domain validation and other negotiation
state in these structures. You certainly don't want to go through another
full Domain Validation sequence the next time a device is allocated via
slave_alloc() if the device isn't really "new".
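[Editor's sketch: the kind of per-device state being discussed can be illustrated with a simplified userspace mock-up. The structures and names below are hypothetical stand-ins, not actual aic7xxx or mid-layer code; the point is only that a destroy/alloc cycle driven by the scan loop throws away negotiation state such as a domain-validation-done flag.]

```c
#include <stdlib.h>

/* Simplified stand-in for struct scsi_device; the real one is far larger. */
struct scsi_device {
	int id, lun;
	void *hostdata;
};

/* Hypothetical per-device negotiation state of the sort the
 * aic7xxx/aic79xx drivers keep attached to each device. */
struct dev_state {
	int dv_done;        /* domain validation already performed? */
	int period, offset; /* negotiated sync parameters */
};

static int slave_alloc(struct scsi_device *sdev)
{
	struct dev_state *st = calloc(1, sizeof(*st));

	if (!st)
		return -1;
	sdev->hostdata = st;	/* fresh state: dv_done == 0 */
	return 0;
}

static void slave_destroy(struct scsi_device *sdev)
{
	free(sdev->hostdata);	/* negotiation results are lost here */
	sdev->hostdata = NULL;
}

/* Demonstrates the complaint: a destroy/alloc cycle issued purely by the
 * scan loop discards dv_done, so the driver would redo full DV even
 * though the physical device never went away. */
int scan_cycle_loses_state(void)
{
	struct scsi_device sdev = { .id = 0, .lun = 0, .hostdata = NULL };
	int dv_after;

	slave_alloc(&sdev);
	((struct dev_state *)sdev.hostdata)->dv_done = 1; /* DV completed */
	slave_destroy(&sdev);	/* mid-layer rescans the same id/lun */
	slave_alloc(&sdev);
	dv_after = ((struct dev_state *)sdev.hostdata)->dv_done; /* now 0 */
	slave_destroy(&sdev);
	return dv_after;
}
```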
Any chance of changing this behavior?
Thanks,
Justin
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
From: Douglas Gilbert @ 2002-12-17 0:03 UTC (permalink / raw)
To: Justin T. Gibbs; +Cc: linux-scsi, dledford

Justin T. Gibbs wrote:
> In debugging a different bug in the new 2.5.X port of the aic7xxx driver,
> I came across this behavior in scsi_probe_and_add_lun()
>
> /*
>  * Since we reuse the same sdevscan over and over with different
>  * target and lun values, we have to destroy and then recreate
>  * any possible low level attachments since they very will might
>  * also store the id and lun numbers in some form and need updating
>  * with each scan.
>  */
> if (sdevscan->host->hostt->slave_destroy)
>         sdevscan->host->hostt->slave_destroy(sdevscan);
> if (sdevscan->host->hostt->slave_alloc)
>         sdevscan->host->hostt->slave_alloc(sdevscan);
>
> So, you cannot rely on slave_destroy as an indication of a device really
> going away in the physical sense. In SPI, for example, the driver can only
> tell that the device is gone if a command is issued to it. I had hoped that
> I could detect hot-pull/scsi-remove-single-device operations via this
> callback.
> Granted, for some drivers, recreating and destroying state associated with a
> particular device might be pretty cheap, but certainly not in all cases. The
> aic7xxx and aic79xx drivers maintain domain validation and other negotiation
> state in these structures. You certainly don't want to go through another
> full Domain Validation sequence the next time a device is allocated via
> slave_alloc() if the device isn't really "new".
>
> Any chance in changing this behavior?

Justin,
Yes, that behaviour does seem inconsistent with the description of
slave_alloc() and slave_destroy().
This post from 29th November shows a trace from a device scan being done
on the scsi_debug driver:
marc.theaimsgroup.com/?l=linux-scsi&m=103855771307230&w=2

I believe Pat Mansfield has done most of the recent work in the device
scan area and he is currently on holiday.

Doug Gilbert
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
From: Doug Ledford @ 2002-12-17 5:41 UTC (permalink / raw)
To: Justin T. Gibbs; +Cc: linux-scsi

On Mon, Dec 16, 2002 at 04:19:46PM -0700, Justin T. Gibbs wrote:
> In debugging a different bug in the new 2.5.X port of the aic7xxx driver,
> I came across this behavior in scsi_probe_and_add_lun()
>
> /*
>  * Since we reuse the same sdevscan over and over with different
>  * target and lun values, we have to destroy and then recreate
>  * any possible low level attachments since they very will might
>  * also store the id and lun numbers in some form and need updating
>  * with each scan.
>  */
> if (sdevscan->host->hostt->slave_destroy)
>         sdevscan->host->hostt->slave_destroy(sdevscan);
> if (sdevscan->host->hostt->slave_alloc)
>         sdevscan->host->hostt->slave_alloc(sdevscan);
>
> So, you cannot rely on slave_destroy as an indication of a device really
> going away in the physical sense.

No, you can. In the code snippet above you might be destroying
something at scsi0:0:0:0 and adding something at scsi0:0:1:0.
Regardless, the thing being destroyed is in fact going away
permanently. Whenever we do find a device, we actually allocate a new
device struct identical to our current device struct and call
slave_alloc() for the newly created device. So, whenever we find a new
device, there will be a momentary point in time at which two device
structs will exist that point to the same device. After the new device
is allocated and set up, the original sdevscan device is simply
renumbered in place (by updating the target or lun value) and then we
call slave_destroy()/slave_alloc() so that the low level driver can
also update their target/lun values to match.
> In SPI, for example, the driver can only
> tell that the device is gone if a command is issued to it. I had hoped that
> I could detect hot-pull/scsi-remove-single-device operations via this
> callback.

You can. On any device we find, at device tear down time your
slave_destroy() entry point will get called right before the device
struct itself is kfree()ed.

> Granted, for some drivers, recreating and destroying state associated with a
> particular device might be pretty cheap, but certainly not in all cases. The
> aic7xxx and aic79xx drivers maintain domain validation and other negotiation
> state in these structures. You certainly don't want to go through another
> full Domain Validation sequence the next time a device is allocated via
> slave_alloc() if the device isn't really "new".

In this case I would suggest that the better course of action is to
delay any domain validation stuff until the slave_configure() call.
The original intent of slave_alloc() was for it to be as lightweight as
possible simply because I knew that it would get called for every
single target/lun value scanned. Once we do find a device though, and
once we have retrieved INQUIRY data so we know what the device is
capable of, then we get the slave_configure() call which is a much more
appropriate place for heavier initializations once you know a device
has been found and not that we are just scanning.

> Any chance in changing this behavior?

Possibly. But, does the explanation above and the one suggestion above
solve your issues?

--
Doug Ledford <dledford@redhat.com>  919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
From: Justin T. Gibbs @ 2002-12-17 20:25 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-scsi

>> So, you cannot rely on slave_destroy as an indication of a device really
>> going away in the physical sense.
>
> No, you can. In the code snippet above you might be destroying something
> at scsi0:0:0:0 and adding something at scsi0:0:1:0. Regardless, the
> thing being destroyed is in fact going away permanently.

The SDEV, yes. The physical device, not necessarily. The concern is
that the API should only be called when the system is saying "physical
device gone", not due to some quirk in how the mid-layer manages its
data objects. In other words, the mid-layer could just as easily free
the sdev every time a probe fails, retain the already allocated sdev
for the "found device", and allocate a new sdev for each new probe.
This would avoid callbacks for physical devices that Linux has
successfully probed.

> Whenever we do
> find a device, we actually allocate a new device struct identical to our
> current device struct and call slave_alloc() for the newly created
> device. So, whenever we find a new device, there will be a momentary
> point in time at which two device structs will exist that point to the
> same device. After the new device is allocated and set up, the original
> sdevscan device is simply renumbered in place (by updating the target or
> lun value) and then we call slave_destroy()/slave_alloc() so that the
> low level driver can also update their target/lun values to match.

Actually, this danger only exists if the low level driver attaches
something to the SDEV and/or has a pointer to the SDEV. The device at
a particular target/lun is still the same device. The little dance
performed in the mid-layer can't change that.
>> In SPI, for example, the driver can only
>> tell that the device is gone if a command is issued to it. I had hoped
>> that I could detect hot-pull/scsi-remove-single-device operations via
>> this callback.
>
> You can. On any device we find, at device tear down time your
> slave_destroy() entry point will get called right before the device
> struct itself is kfree()ed.

The problem is that the SDEV lifetime is not representative of the
device's lifetime.

>> Granted, for some drivers, recreating and destroying state associated
>> with a particular device might be pretty cheap, but certainly not in all
>> cases. The
>> aic7xxx and aic79xx drivers maintain domain validation and other
>> negotiation state in these structures. You certainly don't want to go
>> through another full
>> Domain Validation sequence the next time a device is allocated via
>> slave_alloc() if the device isn't really "new".
>
> In this case I would suggest that the better course of action is to delay
> any domain validation stuff until the slave_configure() call.

Waiting for Linux to get around to probing devices will cause, at
minimum, a 2X (7902 is a dual channel device) increase in the time it
takes to perform domain validation. Because Linux does not probe
busses in parallel, both of these drivers actually perform their DV to
all busses in parallel before the return from the detect routine (and
on any later hot-plugged device). This means that the underlying,
per-device, data structures are already configured prior to the first
call to slave_alloc() or slave_configure(). Now I can work around this
little wart by ignoring slave_destroy() calls that occur before
slave_configure() is called, but I would rather the API have semantics
that mirror the physical world rather than be dictated by poor
programming practice.

> The
> original intent of slave_alloc() was for it to be as lightweight as
> possible simply because I knew that it would get called for every single
> target/lun value scanned.
> Once we do find a device though, and once we
> have retrieved INQUIRY data so we know what the device is capable of,
> then we get the slave_configure() call which is a much more appropriate
> place for heavier initializations once you know a device has been found
> and not that we are just scanning.

Actually, the slave_configure() is postponed until way after the
inquiry data is retrieved. If slave_configure() were called as soon as
the device were properly detected, the slave_destroy() in scsi_scan.c
would be destroying a device that had been slave_configured.

--
Justin
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
From: Doug Ledford @ 2002-12-17 22:24 UTC (permalink / raw)
To: Justin T. Gibbs; +Cc: linux-scsi

On Tue, Dec 17, 2002 at 01:25:07PM -0700, Justin T. Gibbs wrote:
> >> So, you cannot rely on slave_destroy as an indication of a device really
> >> going away in the physical sense.
> >
> > No, you can. In the code snippet above you might be destroying something
> > at scsi0:0:0:0 and adding something at scsi0:0:1:0. Regardless, the
> > thing being destroyed is in fact going away permanently.
>
> The SDEV, yes. The physical device, not necessarily.

We're arguing semantics here. On the extra slave_destroy() calls, one
of two things will be true: A) there was a device present, another
slave_alloc() for this exact device has already been made, and we are
now just destroying the extraneous reference to the device, or B) there
was no device and the destroy is in fact a real destroy event. So, if
in your slave_alloc() area you are attaching a scsi_device struct to
other internal state information, you could in fact do a refcount
attachment, and then on any transition from non-0 to 0 in the refcount
you would in fact have a true destroy event.

Now, that being said, I agree that the above is utter crap. It's
dictated by historical fact in the scsi_scan.c code. The scenario you
point out below is actually what I would prefer and would have done,
but I can in fact see the point that creating a new block queue for
each device you scan is insane, especially on some fiber channel
controllers doing probe_all_luns on very high lun limits. At bootup I
wouldn't really care about the overhead of creating all these queues.
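[Editor's sketch: the refcount suggestion in this message can be mocked up as follows. The struct and function names are hypothetical, not actual driver code; the point is that slave_alloc()/slave_destroy() bump a count on shared per-device state, and only the non-0-to-0 transition is treated as a true destroy event.]

```c
/* Hypothetical per-physical-device state shared by any sdevs that
 * currently point at the same target/lun. */
struct dev_state {
	int refcount;
	int torn_down;		/* set only on a true destroy event */
};

static void get_state(struct dev_state *st)
{
	st->refcount++;
}

static void put_state(struct dev_state *st)
{
	/* Only the transition from non-0 to 0 is a real destroy event. */
	if (--st->refcount == 0)
		st->torn_down = 1;  /* invalidate negotiation, free state... */
}

/* Mimics the scan dance: the second sdev for the found device is
 * allocated before the scan sdev is destroyed, so the count never
 * reaches 0 and no teardown happens. */
int scan_dance_keeps_device(void)
{
	struct dev_state st = { 0, 0 };

	get_state(&st);		/* slave_alloc(sdevscan)              */
	get_state(&st);		/* slave_alloc(new sdev), same device */
	put_state(&st);		/* slave_destroy(sdevscan)            */
	return st.torn_down;	/* still 0: device not really gone    */
}

/* A hot-pull: the only reference goes away and the count hits 0. */
int hot_pull_tears_down(void)
{
	struct dev_state st = { 0, 0 };

	get_state(&st);		/* slave_alloc()                 */
	put_state(&st);		/* slave_destroy() on removal    */
	return st.torn_down;	/* 1: true destroy event         */
}
```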
I'm more concerned with hot plug events that bring a new controller
online when a system is already under some amount of memory pressure.
My concern is that in that situation, allocating 256 block layer
request structs plus a few other items for each device/lun we scan
might in fact be excessive memory churn. Anyone who wishes is free to
convince me otherwise, in which case we can fix the scan code properly
and make it do like you are suggesting.

> The concern is that
> the API should only be called when the system is saying "physical device
> gone", not due to some quirk in how the mid-layer manages its data objects.

Well, whatever other meaning you want to attach to it, it *is* about
data objects. That's why it's called slave_alloc()/slave_destroy()
which are standard object manipulation terms. One of the reasons I did
this was so that in my driver I could yank out a bunch of host structs
of the form

	unsigned char queue_depth[MAX_TARGETS][MAX_LUNS];

and instead add a new struct aic_dev_data that I attach directly to
device->hostdata during slave_alloc(), and in that struct aic_dev_data
goes *all* the state data needed for that device. It resulted in a not
insignificant reduction in memory foot print for the driver. And, once
you start talking about higher and higher lun values, need I again
mention fiber channel here, keeping complete tables like that becomes
wasteful in the extreme. So, the actual *primary* reason for adding
this interface was exactly this manipulation of data objects. The
secondary use that a slave_destroy() event can trigger a device
shutdown sequence in your driver was a bonus, but because of hokey crap
in the scsi scan code it's a bonus that requires you refcount things
instead of just assuming that a slave_destroy() is permanent.
My driver doesn't have that particular problem because when the scsi
mid layer allocates the new sdev and calls slave_alloc(), I create a
new struct, attach it, init it, then treat the two sdevs as totally
independent entities, so when the destroy comes in on the probe device,
all I have to do is kfree() my aic_dev stuff. If I were to hook this
for actual device shut down commands, then I would have to jump through
the same hoops as you (or else do something simpler, like have
slave_configure() set a AHC_DEVICE_RUNNING flag in your device state
and on slave_destroy() only bother performing any shutdown operations
if that flag is set).

> In other words, the mid-layer could just as easily free the sdev every
> time a probe fails, retain the already allocated sdev for the "found
> device", and allocate a new sdev for each new probe. This would avoid
> callbacks for physical devices that Linux has successfully probed.
>
> > Whenever we do
> > find a device, we actually allocate a new device struct identical to our
> > current device struct and call slave_alloc() for the newly created
> > device. So, whenever we find a new device, there will be a momentary
> > point in time at which two device structs will exist that point to the
> > same device. After the new device is allocated and set up, the original
> > sdevscan device is simply renumbered in place (by updating the target or
> > lun value) and then we call slave_destroy()/slave_alloc() so that the
> > low level driver can also update their target/lun values to match.
>
> Actually, this danger only exists if the low level driver attaches
> something to the SDEV and/or has a pointer to the SDEV. The device
> at a particular target/lun is still the same device. The little dance
> performed in the mid-layer can't change that.

Hanging data structs off of the sdev->hostdata place holder is exactly
what this was intended to support.
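[Editor's sketch: the simpler flag-based alternative mentioned above, where slave_configure() sets a running flag and slave_destroy() only performs shutdown work if it is set, can be mocked up like this. AHC_DEVICE_RUNNING and the surrounding structures are illustrative stand-ins taken from the mail, not actual driver code.]

```c
#define AHC_DEVICE_RUNNING 0x01	/* flag name borrowed from the message */

/* Hypothetical per-device driver state. */
struct dev_state {
	unsigned int flags;
	int shutdown_ran;
};

static void slave_configure(struct dev_state *st)
{
	/* Only devices that were really found ever get configured. */
	st->flags |= AHC_DEVICE_RUNNING;
}

static void slave_destroy(struct dev_state *st)
{
	/* Ignore destroy calls on sdevs that were never configured:
	 * those come from the scan loop, not from a real removal. */
	if (st->flags & AHC_DEVICE_RUNNING)
		st->shutdown_ran = 1;	/* force renegotiation / redo DV */
}

/* A scan-time destroy on a never-configured sdev does nothing. */
int scan_destroy_is_ignored(void)
{
	struct dev_state st = { 0, 0 };

	slave_destroy(&st);		/* scan loop churn */
	return st.shutdown_ran;		/* 0 */
}

/* A destroy after configure is treated as a real removal. */
int real_destroy_runs_shutdown(void)
{
	struct dev_state st = { 0, 0 };

	slave_configure(&st);
	slave_destroy(&st);		/* hot-pull after configure */
	return st.shutdown_ran;		/* 1 */
}
```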
> >> In SPI, for example, the driver can only
> >> tell that the device is gone if a command is issued to it. I had hoped
> >> that I could detect hot-pull/scsi-remove-single-device operations via
> >> this callback.
> >
> > You can. On any device we find, at device tear down time your
> > slave_destroy() entry point will get called right before the device
> > struct itself is kfree()ed.
>
> The problem is that the SDEV lifetime is not representative of the
> device's lifetime.

Only true for the sdev used to scan the devices. The use of either
refcounting or the simple flag set during slave_configure() either one
would solve your problem. And yes, I do agree that this answer is
ugly, but I need someone to convince me that block queue allocations on
a live and possibly busy machine aren't something that could cause
problems before I would change it myself. Well, that or I need Jens to
clean up the block layer allocation code so that it only allocates one
request at block queue init time and from then on does lazy request
allocations once the device needs them, similar to what I did with the
scsi command blocks. That way, strong memory pressure and huge, wasted
allocations would be avoided.

It would also make sense to me if Jens would add a feature to the block
layer so that when we adjust our scsi queue depth on our devices we can
indicate our queue depth to the block layer, so that the block layer
could adjust its request depth to some amount > our queue depth for
optimum merging and what not. Dynamic changes to this depth at the
block layer would also be nice. If these changes were made, then I
would rip the damn scsi_scan code apart and put it back together right
using the exact semantics you outlined above.

> Actually, the slave_configure() is postponed until way after the inquiry
> data is retrieved.
> If slave_configure() were called as soon as the
> device were properly detected, the slave_destroy() in scsi_scan.c would
> be destroying a device that had been slave_configured.

Heh, if you write your code that way, then yes. In my driver it would
not:

  alloc_sdev(sdev1)
  slave_alloc(sdev1) - attaches struct to sdev
  INQUIRY
  found device!
  alloc_sdev(sdev2)
  copy state from sdev1 to sdev2
  slave_alloc(sdev2) - attaches a freshly kmalloced struct to sdev2
  slave_configure(sdev2) - inits sdev2->hostdata struct state
  slave_destroy(sdev1) - destroys original struct, leaving the
                         configured struct untouched

In my driver I don't attempt to do anything like send cache flush
commands or the like, so I don't have the shutdown difficulties you do
and things work quite nicely. We could make things work much nicer for
your driver if Jens makes those changes to the block layer that I just
suggested (hint! hint!)

--
Doug Ledford <dledford@redhat.com>  919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
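[Editor's sketch: the alloc/configure/destroy sequence laid out in the message above can be replayed in a simplified userspace mock-up, showing that destroying the scan sdev never touches the configured device's state. Structures and names are illustrative stand-ins, not mid-layer code.]

```c
#include <stdlib.h>

/* Minimal stand-ins for the mid-layer and driver structures. */
struct scsi_device {
	void *hostdata;
};

struct dev_state {
	int configured;
};

static void slave_alloc(struct scsi_device *sdev)
{
	/* Each sdev gets its own freshly allocated driver struct. */
	sdev->hostdata = calloc(1, sizeof(struct dev_state));
}

static void slave_configure(struct scsi_device *sdev)
{
	((struct dev_state *)sdev->hostdata)->configured = 1;
}

static void slave_destroy(struct scsi_device *sdev)
{
	free(sdev->hostdata);
	sdev->hostdata = NULL;
}

/* Replays the sequence from the message: sdev2 ends up configured and
 * its state survives the destroy of the scan device sdev1. */
int dance(void)
{
	struct scsi_device sdev1 = { NULL }, sdev2 = { NULL };
	int ok;

	slave_alloc(&sdev1);	/* alloc_sdev(sdev1)                    */
	/* INQUIRY ... found device!                                    */
	slave_alloc(&sdev2);	/* alloc_sdev(sdev2), fresh struct      */
	slave_configure(&sdev2);/* inits sdev2->hostdata state          */
	slave_destroy(&sdev1);	/* original struct goes away            */
	ok = ((struct dev_state *)sdev2.hostdata)->configured;
	slave_destroy(&sdev2);
	return ok;
}
```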
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
From: Christoph Hellwig @ 2002-12-17 22:33 UTC (permalink / raw)
To: Justin T. Gibbs, linux-scsi

On Tue, Dec 17, 2002 at 05:24:59PM -0500, Doug Ledford wrote:
> Well, that or I need Jens to clean up the block layer allocation code so
> that it only allocates one request at block queue init time and from then
> on does lazy request allocations once the device needs them, similar to
> what I did with the scsi command blocks.

What do you think about forward-porting akpm's blk_grow_request_list()
changes in 2.4? I could do that easily and it should help to get this
sorted out properly.
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
From: Andrew Morton @ 2002-12-17 22:54 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Justin T. Gibbs, linux-scsi

Christoph Hellwig wrote:
> On Tue, Dec 17, 2002 at 05:24:59PM -0500, Doug Ledford wrote:
> > Well, that or I need Jens to clean up the block layer allocation code so
> > that it only allocates one request at block queue init time and from then
> > on does lazy request allocations once the device needs them, similar to
> > what I did with the scsi command blocks.
>
> What do you think about forward-porting akpm's blk_grow_request_list()
> changes in 2.4? I could do that easily and it should help to get this
> sorted out properly.

wakes up.

I think (hope) the plan there is to do away with the preallocated
per-queue request lists altogether. Just allocate the requests direct
from slab at __make_request(). The request slab would have to be
backed by a (small) mempool of course. get_request_wait() should
disappear, and management of the amount of memory which is under IO is
moved up to the VM layer.

This means that we have basically no upper bound on the amount of
memory which can be placed under I/O. Which used to give the VM a
heart-attack, but it should be OK for writes now. If not I'll fix it
up. There could be issues with reads and direct-IO.

The rbtree elevator becomes compulsory with such potentially-large
queues, but that's working fine.
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
From: Doug Ledford @ 2002-12-18 1:00 UTC (permalink / raw)
To: Andrew Morton; +Cc: Christoph Hellwig, Justin T. Gibbs, linux-scsi

On Tue, Dec 17, 2002 at 02:54:43PM -0800, Andrew Morton wrote:
> wakes up.
>
> I think (hope) the plan there is to do away with the preallocated
> per-queue request lists altogether. Just allocate the requests
> direct from slab at __make_request().

So, what is the overhead of using the slab allocator on each command?
If you prealloc a reasonable queue, allocation from that queue is O(1).
Would we suffer no/little/large penalty using slab instead?

/me hasn't gone looking in the slab allocator and has no idea how well
it actually works at being a cache...

Second issue I have is that overly large request queues have never
seemed to help performance in my experience. At a certain point the
overhead of the queue and merging so many requests, etc. becomes
greater than the gain of the increased depth and starts to slow things
back down. So, my question to you, is why would we *want* to be able
to have huge queues?

--
Doug Ledford <dledford@redhat.com>  919-754-3700 x44233
Red Hat, Inc.
1801 Varsity Dr.
Raleigh, NC 27606
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
From: William Lee Irwin III @ 2002-12-18 1:03 UTC (permalink / raw)
To: Andrew Morton, Christoph Hellwig, Justin T. Gibbs, linux-scsi

On Tue, Dec 17, 2002 at 02:54:43PM -0800, Andrew Morton wrote:
>> I think (hope) the plan there is to do away with the preallocated
>> per-queue request lists altogether. Just allocate the requests
>> direct from slab at __make_request().

On Tue, Dec 17, 2002 at 08:00:50PM -0500, Doug Ledford wrote:
> So, what is the overhead of using the slab allocator on each command? If
> you prealloc a reasonable queue, allocation from that queue is O(1).
> Would we suffer no/little/large penalty using slab instead?
> /me hasn't gone looking in the slab allocator and has no idea how well it
> actually works at being a cache...
> Second issue I have is that overly large request queues have never seemed
> to help performance in my experience. At a certain point the overhead of
> the queue and merging so many requests, etc. becomes greater than the gain
> of the increased depth and starts to slow things back down. So, my
> question to you, is why would we *want* to be able to have huge queues?

It helps here. When there's a lot of RAM, getting a decent fraction of
memory in-flight by and large overloads the queues. Also, blocking
explicitly breaks the asynchronous semantics expected by the apps.

Bill
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
From: Andrew Morton @ 2002-12-18 1:22 UTC (permalink / raw)
To: Doug Ledford; +Cc: Christoph Hellwig, Justin T. Gibbs, linux-scsi

Doug Ledford wrote:
> On Tue, Dec 17, 2002 at 02:54:43PM -0800, Andrew Morton wrote:
> > I think (hope) the plan there is to do away with the preallocated
> > per-queue request lists altogether. Just allocate the requests
> > direct from slab at __make_request().
>
> So, what is the overhead of using the slab allocator on each command? If
> you prealloc a reasonable queue, allocation from that queue is O(1).
> Would we suffer no/little/large penalty using slab instead?
>
> /me hasn't gone looking in the slab allocator and has no idea how well it
> actually works at being a cache...

It's wickedly quick. There won't be a problem there...

> Second issue I have is that overly large request queues have never seemed
> to help performance in my experience. At a certain point the overhead of
> the queue and merging so many requests, etc. becomes greater than the gain
> of the increased depth and starts to slow things back down. So, my
> question to you, is why would we *want* to be able to have huge queues?

Generally, huge queues don't appear to buy us much. And in principle
they shouldn't, because we should be ordering writes better at the VFS
layer. In practice, I expect there will be certain workloads at which
the VFS is known to behave very suboptimally in which the huge queues
will make a large difference. Mainly the "seek all around a huge file
writing stuff" workload.
The other "in principle" thing here is that the VM/VFS should _not_ be
dependent on request exhaustion for its own throttling, because that
will fail if there are a large number of disks, or if the machine has a
teeny amount of memory.

So I expect that in practice we would be unlikely to go beyond 128
requests per direction per queue. But it should be an option, and it
should work well.
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
From: Luben Tuikov @ 2002-12-18 3:22 UTC (permalink / raw)
To: Andrew Morton; +Cc: Doug Ledford, Christoph Hellwig, Justin T. Gibbs, linux-scsi

Andrew Morton wrote:
> [...]
> So I expect that in practice we would be unlikely to go beyond
> 128 requests per direction per queue. But it should be an
> option, and it should work well.

I have an actual number: 512*. This is the number which /proc/slabinfo
tells me, and I suspect that the number of active (LLDD owner) commands
is a lot less than this.**

* I get this number when writing (I->T) just over 100 GiB file, using a
``mini'' scsi-core which I wrote for a project at work. And this, 512,
number is the largest value I've seen so far, i.e. the largest pool.

** I can actually obtain the exact (average) number of active commands
but I don't think that there's any need to. (QED)

But, yes, using the slab allocator (with the appropriate flags(!),
etc...) isn't a bad idea at all.

--
Luben
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
From: Justin T. Gibbs @ 2002-12-18 2:07 UTC (permalink / raw)
To: Doug Ledford; +Cc: linux-scsi

> Now, that being said, I agree that the above is utter crap.

Thank you. 8-)

>> The concern is that
>> the API should only be called when the system is saying "physical device
>> gone", not due to some quirk in how the mid-layer manages its data
>> objects.
>
> Well, whatever other meaning you want to attach to it, it *is* about data
> objects. That's why it's called slave_alloc()/slave_destroy() which are
> standard object manipulation terms. One of the reasons I did this was so
> that in my driver I could yank out a bunch of host structs of the form
>
>     unsigned char queue_depth[MAX_TARGETS][MAX_LUNS];
>
> and instead add a new struct aic_dev_data that I attach directly to
> device->hostdata during slave_alloc(), and in that struct aic_dev_data
> goes *all* the state data needed for that device. It resulted in a not
> insignificant reduction in memory foot print for the driver.

Well, by dynamically allocating these types of things, and aggregating
them into a single structure instead of a bunch of arrays, I would
expect such a reduction. The only difference between what your driver
is doing and what mine does is that there is a single static array for
indexing targets. The rest of the data structures are dynamically
allocated and only live for "real" devices. The lookup array is a
requirement for supporting 2.4.X.

...

> The secondary use that a slave_destroy() event can trigger a device
> shutdown sequence in your driver was a bonus, but because of hokey crap
> in the scsi scan code it's a bonus that requires you refcount things
> instead of just assuming that a slave_destroy() is permanent.
I decided to instrument what the SCSI layer does with these calls
before looking at refcounting. Here's the output of the scan of a bus
with two drives on it: id 0 and id 1.

scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.23
        <Adaptec aic7899 Ultra160 SCSI adapter>
        aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

scsi0: Slave Alloc 0
scsi0: Slave Destroy 0
scsi0: Slave Alloc 0
scsi0: Slave Alloc 0
  Vendor: SEAGATE   Model: ST39236LWV   Rev: 0010
  Type:   Direct-Access   ANSI SCSI revision: 03
scsi0: Slave Destroy 0
scsi0: Slave Alloc 0
scsi0: Slave Destroy 1
scsi0: Slave Alloc 1
scsi0: Slave Alloc 1
  Vendor: SEAGATE   Model: ST39236LW   Rev: 0010
  Type:   Direct-Access   ANSI SCSI revision: 03
scsi0: Slave Destroy 1
scsi0: Slave Alloc 1
scsi0: Slave Destroy 2
scsi0: Slave Alloc 2
(scsi0:A:1): 160.000MB/s transfers (80.000MHz DT, offset 31, 16bit)
scsi0: Slave Destroy 3
scsi0: Slave Alloc 3
scsi0: Slave Destroy 4
scsi0: Slave Alloc 4
scsi0: Slave Destroy 5
scsi0: Slave Alloc 5
scsi0: Slave Destroy 6
scsi0: Slave Alloc 6
scsi0: Slave Destroy 8
scsi0: Slave Alloc 8
scsi0: Slave Destroy 9
scsi0: Slave Alloc 9
scsi0: Slave Destroy 10
scsi0: Slave Alloc 10
scsi0: Slave Destroy 11
scsi0: Slave Alloc 11
scsi0: Slave Destroy 12
scsi0: Slave Alloc 12
scsi0: Slave Destroy 13
scsi0: Slave Alloc 13
scsi0: Slave Destroy 14
scsi0: Slave Alloc 14
scsi0: Slave Destroy 15
scsi0: Slave Alloc 15
scsi0: Slave Destroy 15
...
scsi0: Slave Configure 0
scsi0: Slave Configure 1

Notice that for all IDs but 0, a slave destroy call is performed prior
to any slave allocations. Very nice. Note that I wasn't complaining
that I couldn't work around this kind of crap, just that this crap is
unsettling. 8-)

> In my driver I don't attempt to do anything like send cache flush
> commands or the like, so I don't have the shutdown difficulties you do
> and things work quite nicely.

It's not about cache flushing or anything else.
It's about knowing if you need to retest the connection to a device,
whether or not you should force a renegotiation the next time a command
is attempted to a device, etc. etc. So, when linux tells the driver that
the device is gone, we setup our state so that we will not trust the old
negotiated values and will perform Domain Validation again should the
device return.

--
Justin

^ permalink raw reply [flat|nested] 13+ messages in thread
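[Editor's note: a minimal sketch of the two ideas in this exchange, in
user-space C so it compiles standalone. It is not the actual aic7xxx or
kernel source; the struct layouts, the `example_` names, and the fixed
16-target table are invented for illustration. It shows per-device state
hung off sdev->hostdata in slave_alloc()/slave_destroy() (Doug's scheme),
plus driver-side negotiation state that must outlive the sdev so Domain
Validation only reruns when a device is really gone (Justin's concern).]

```c
#include <stdlib.h>

/* Stand-in for the kernel's struct scsi_device; illustrative only. */
struct scsi_device {
	int id;
	void *hostdata;
};

/* Per-device state attached to sdev->hostdata in slave_alloc(). */
struct aic_dev_data {
	unsigned char queue_depth;
};

/* Per-target negotiation state kept in a driver-side table, since it
 * has to survive the sdev itself being destroyed and re-created. */
struct target_negotiation {
	int dv_done;	/* can we still trust old Domain Validation? */
};

static struct target_negotiation targets[16];

static int example_slave_alloc(struct scsi_device *sdev)
{
	struct aic_dev_data *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return -1;	/* would be -ENOMEM in the kernel */
	sdev->hostdata = dev;
	return 0;
}

static void example_slave_destroy(struct scsi_device *sdev)
{
	/* The pointer itself locates the allocation, so the in-place
	 * sdev id update during scanning doesn't matter for the free. */
	free(sdev->hostdata);
	sdev->hostdata = NULL;

	/* If destroy really meant "device gone", distrust the old
	 * negotiation so DV reruns should the device return.  This is
	 * exactly why the spurious destroys during scanning hurt: each
	 * one would needlessly force a full DV pass here. */
	targets[sdev->id].dv_done = 0;
}
```

The freeing side needs no id/lun lookup at all, which is why Doug's
driver tolerates the in-place target renumbering; only the negotiation
invalidation depends on destroy meaning what it says.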
* Re: slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun()
2002-12-18 2:07 ` Justin T. Gibbs
@ 2002-12-18 3:35 ` Doug Ledford
0 siblings, 0 replies; 13+ messages in thread
From: Doug Ledford @ 2002-12-18 3:35 UTC (permalink / raw)
To: Justin T. Gibbs; +Cc: linux-scsi

On Tue, Dec 17, 2002 at 07:07:34PM -0700, Justin T. Gibbs wrote:
> I decided to instrument what the SCSI layer does with these calls before
> looking at refcounting. Here's the output of the scan of a bus with
> two drives on it: id 0 and id 1.
>
> scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.23
>         <Adaptec aic7899 Ultra160 SCSI adapter>
>         aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
>
> scsi0: Slave Alloc 0

This is the initial scsi_alloc_sdev() slave_alloc() call.

> scsi0: Slave Destroy 0
> scsi0: Slave Alloc 0

This pair is the first time through the probe_and_add_lun() routine where
we destroy the old slave and alloc a new one because, all except for this
first time through, we will be updating the device number. We also send
the actual INQUIRY command in between this alloc and the alloc below.

> scsi0: Slave Alloc 0

We alloc the new sdev here and copy the stuff from the scan sdev into it.

>   Vendor: SEAGATE   Model: ST39236LWV   Rev: 0010
>   Type:   Direct-Access                 ANSI SCSI revision: 03

Print the info.

> scsi0: Slave Destroy 0
> scsi0: Slave Alloc 0

I think, but I'm not positive, that this extra Destroy/Alloc pair is from
the attempt to scan for lun information.

> scsi0: Slave Destroy 1
> scsi0: Slave Alloc 1

And here's the gotcha. The destroy call you see points to 1 because we
have already updated the sdev->target value in place before calling
destroy. In my case, it's a simple kfree() that doesn't care about what
number or anything else, it knows what to kfree() simply by grabbing
sdev->hostdata. Then we alloc a struct for 1 which is the next device to
be scanned. The rest of this is pretty much a straight repeat of events.

> ...
> scsi0: Slave Configure 0
> scsi0: Slave Configure 1
>
> Notice that for all IDs but 0, a slave destroy call is performed
> prior to any slave allocations. Very nice. Note that I wasn't
> complaining that I couldn't work around this kind of crap, just
> that this crap is unsettling. 8-)

I stared at the scsi_scan.c code for a full day, did about 3 rewrites and
threw them all away half way through each time before I finally gave in
and left the scan code as much alone as I could. But, I think I have the
answer. I'm headed to a midnight showing of the Two Towers in 30 minutes
(OK, the movie isn't for an hour and a half, but I'm meeting friends at
the local coffee shop to sit around and bullshit until it starts), but
when I get home from that I'll see if I can't get my idea coded up and
out before I crash. Assuming I do, you will get exactly one alloc per
channel/target/lun value scanned, exactly one configure per device found
(although still delayed like it is now), exactly one immediate destroy
per device alloced and not found, and exactly one destroy immediately
prior to device free at the mid layer level (and this time all the
destroy events will come after the corresponding alloc :-)

> > In my driver I don't attempt to do anything like send cache flush
> > commands or the like, so I don't have the shutdown difficulties you do
> > and things work quite nicely.
>
> It's not about cache flushing or anything else. It's about knowing if
> you need to retest the connection to a device, whether or not you should
> force a renegotiation the next time a command is attempted to a device,
> etc. etc. So, when linux tells the driver that the device is gone,
> we setup our state so that we will not trust the old negotiated values
> and will perform Domain Validation again should the device return.

It's coming together well enough in my mind now that I'm 99% positive
you'll have this by tomorrow morning.
/me pulls bk latest so he can get a small start before leaving

--
Doug Ledford <dledford@redhat.com>     919-754-3700 x44233
         Red Hat, Inc.
         1801 Varsity Dr.
         Raleigh, NC 27606

^ permalink raw reply [flat|nested] 13+ messages in thread
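[Editor's note: the ordering Doug promises from his rework can be
sketched as a tiny lifecycle checker. This is an editorial sketch with
invented names, not kernel code; it just encodes the invariants he
states: exactly one alloc per scanned address, configure only for
devices found, and destroy always after the matching alloc.]

```c
/* Lifecycle of one scanned channel/target/lun address. */
enum sdev_state { SDEV_NONE, SDEV_ALLOCED, SDEV_CONFIGURED, SDEV_DESTROYED };

/* Exactly one alloc per address scanned. */
static int do_alloc(enum sdev_state *s)
{
	if (*s != SDEV_NONE)
		return -1;		/* double alloc: a bug */
	*s = SDEV_ALLOCED;
	return 0;
}

/* Exactly one configure, only for devices actually found. */
static int do_configure(enum sdev_state *s)
{
	if (*s != SDEV_ALLOCED)
		return -1;		/* configure without a prior alloc */
	*s = SDEV_CONFIGURED;
	return 0;
}

/* Exactly one destroy, always after the matching alloc: either
 * immediately (device not found) or just before device free. */
static int do_destroy(enum sdev_state *s)
{
	if (*s != SDEV_ALLOCED && *s != SDEV_CONFIGURED)
		return -1;		/* the old bug: destroy before alloc */
	*s = SDEV_DESTROYED;
	return 0;
}
```

Under these rules the "Slave Destroy 1" before any "Slave Alloc 1" seen
in Justin's instrumented scan would be rejected at the first check.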
end of thread, other threads:[~2002-12-18 3:35 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-12-16 23:19 slave_destroy called in scsi_scan.c:scsi_probe_and_add_lun() Justin T. Gibbs
2002-12-17  0:03 ` Douglas Gilbert
2002-12-17  5:41 ` Doug Ledford
2002-12-17 20:25 ` Justin T. Gibbs
2002-12-17 22:24 ` Doug Ledford
2002-12-17 22:33 ` Christoph Hellwig
2002-12-17 22:54 ` Andrew Morton
2002-12-18  1:00 ` Doug Ledford
2002-12-18  1:03 ` William Lee Irwin III
2002-12-18  1:22 ` Andrew Morton
2002-12-18  3:22 ` Luben Tuikov
2002-12-18  2:07 ` Justin T. Gibbs
2002-12-18  3:35 ` Doug Ledford