* Re: [PATCH][RELEASE]: megaraid unified driver version 2.20.0.B1
[not found] <0E3FA95632D6D047BA649F95DAB60E570230C7DB@exa-atlanta.se.lsil.com>
@ 2004-04-16 18:01 ` Jeff Garzik
2004-04-16 18:24 ` Brian King
0 siblings, 1 reply; 8+ messages in thread
From: Jeff Garzik @ 2004-04-16 18:01 UTC (permalink / raw)
To: Bagalkote, Sreenivas
Cc: 'Matt_Domsch@dell.com', 'paul@kungfoocoder.org',
Mukker, Atul, 'James.Bottomley@SteelEye.com',
'arjanv@redhat.com', 'linux-scsi@vger.kernel.org',
'linux-kernel@vger.kernel.org'
Thanks much for attaching the patch to your email, that makes review a
lot easier.
Comments:
0) Am I to judge from megaraid_mbox_build_cmd that megaraid does not
really support SCSI, but it translates instead? This may imply that
implementation as a SCSI driver is inappropriate.
1) Remove back-compat code from kdep.h. It's OK for devel and for
vendor-specific branch, but we wouldn't want this in 2.6.x mainline.
2) Buggy ADDR definitions:
+#define ADDR_LO(addr) (((unsigned long)(addr)) & 0xffffffff)
+#define ADDR_HI(addr) ((((ulong)(addr)) & 0xffffffff00000000ULL) >> 32)
+#define ADDR_64(hi, lo) ((((uint64_t)(hi)) << 32) | (lo))
Don't cast to unsigned long, cast to u64.
3) Delete non-standard types: ulong
4) For kernel-internal code, prefer types u8/u16/u32/u64 to C99 types,
though it's not a big deal. Mainly, prefer consistency.
5) No foo_t structures. Linux kernel style is "struct foo" not "foo_t":
+typedef struct mraid_hba_info {
+} __attribute__ ((packed)) mraid_hba_info_t;
+
+typedef struct mcontroller {
+} __attribute__ ((packed)) mcontroller_t;
6) Kill uses of "volatile". _Most_ of the time, this indicates buggy
code. You should have the proper barriers in place: mb(), wmb(),
rmb(), barrier(), and cpu_relax(). This has been mentioned before :)
7) Allow me to compliment you on proper kernel-doc documentation.
8) Consider adding ____cacheline_aligned markers to the definition of
struct adapter (a.k.a. adapter_t, before item #5 is fixed). Re-arrange
your adapter structure so that struct members that are used together are
kept in the same cacheline. For example, in net drivers, I usually
split the structures into "TX", "RX", and "everything else" sections.
Keep in mind that cacheline size varies (32/64/128 usually), so this is
not an absolute limit, just a marker where to physically separate into
distinct cachelines.
9) {SET,IS}_PRV_INTF_AVAILABLE's use of atomic_foo() seems racy.
10) In 2.6, use the type-safe module_param() rather than MODULE_PARM()
11) Why is PAGE_SIZE chosen here?
+ /*
+ * Allocate all pages in a loop
+ */
+ for (i = 0; i < num_pages; i++) {
+
+ pool->page_arr[i] = pci_alloc_consistent( dev, PAGE_SIZE-1,
+
&pool->dmah_arr[i] );
+ if (pool->page_arr[i] == NULL) {
+ con_log(CL_ANN, (KERN_WARNING
+ "megaraid: Failed to alloc page # %d\n",
i ));
+ goto memlib_fail_alloc;
+ }
+ }
12) Don't invent your own printk() wrappers. I don't see the need for
con_log()
13) try_assertion{}, catch_assertion{}, and end_assertion attempt to
turn C code into C++ code. This will inevitably fail :)
14) the following check doesn't scale, please remove:
+ if (subsysvid && (subsysvid != PCI_VENDOR_ID_AMI) &&
+ (subsysvid != PCI_VENDOR_ID_DELL) &&
+ (subsysvid != PCI_VENDOR_ID_HP) &&
+ (subsysvid != PCI_VENDOR_ID_INTEL) &&
+ (subsysvid != PCI_SUBSYS_ID_FSC) &&
+ (subsysvid != PCI_VENDOR_ID_LSI_LOGIC)) {
+
+ con_log(CL_ANN, (KERN_WARNING
+ "megaraid: not loading for subsysvid:%#4.04x\n",
+ subsysvid));
+
+ return -ENODEV;
+ }
15) Don't set a DMA mask, then change it. in megaraid_probe_one() you
have enough info to decide whether 64-bit can be supported.
16) Use normal Linux kernel style to handle errors. Don't duplicate
+ kfree(adapter);
+
+ pci_disable_device(pdev);
in multiple error handling paths.
17) Eliminate MAX_CONTROLLERS. There is no reason for this limit.
Global structures such as mraid_driver_g are generally not needed.
18) Use pci_request_regions() in megaraid_probe_one(), rather than
request_region or request_mem_region. To free, you use
pci_release_regions().
19) When hardware is malfunctioning or removed, the following code turns
into an infinite loop:
+ // FIXME: this may not be required
+ while (RDINDOOR(raid_dev) & 0x02) cpu_relax();
20) You don't need spin_lock_irqsave() in interrupt handler, only
spin_lock()
21) VERY unfriendly to shared interrupts. megaraid_iombox_ack_sequence
MUST check immediately for
(a) no irq at all (shared interrupt)
(b) status return of 0xffffffff (hardware is malfunctioning/unplugged)
22) MAJOR bug: sleep inside spinlock. The SCSI error handler calls its
error handling hooks with the adapter lock held. You cannot yield()
inside mbox_post_sync_cmd
23) Alongside #22, the following code is unacceptable:
+ // wait for maximum 1 second for status to post
+ for (i = 0; i < 40000; i++) {
+ if (mbox->numstatus != 0xFF) break;
+ udelay(25); yield();
+ }
Besides yield(), the maximum delay with spinlock held is far too long.
24) Same issues as #22 and #23 in __megaraid_busywait_mbox:
unacceptable delay inside spinlock, and schedule inside spinlock:
+ for (counter = 0; counter < 10000; counter++) {
+
+ if (!mbox->busy) return MRAID_SUCCESS;
+
+ udelay(100); yield();
+ }
25) sleep_on_timeout() is deprecated and should not be used:
+ while (adp->outstanding_cmds > 0)
+ sleep_on_timeout( &wq, 1*HZ );
26) General question: do you ever call down() inside a spinlock? I did
not look closely, but I think you do. That would be wrong, as down()
can sleep.
27) drivers/scsi/megaraid/readme belongs in the Documentation/ directory
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH][RELEASE]: megaraid unified driver version 2.20.0.B1
2004-04-16 18:01 ` Jeff Garzik
@ 2004-04-16 18:24 ` Brian King
2004-04-16 18:30 ` Jeff Garzik
0 siblings, 1 reply; 8+ messages in thread
From: Brian King @ 2004-04-16 18:24 UTC (permalink / raw)
To: Jeff Garzik
Cc: Bagalkote, Sreenivas, 'Matt_Domsch@dell.com',
'paul@kungfoocoder.org', Mukker, Atul,
'James.Bottomley@SteelEye.com',
'arjanv@redhat.com', 'linux-scsi@vger.kernel.org',
'linux-kernel@vger.kernel.org'
Jeff Garzik wrote:
> 6) Kill uses of "volatile". _Most_ of the time, this indicates buggy
> code. You should have the proper barriers in place: mb(), wmb(),
> rmb(), barrier(), and cpu_relax(). This has been mentioned before :)
I'm not sure I totally agree with this statement. I took a quick look at the driver
and the volatile appears to be used to point to host memory that is modified
by the adapter. Correct me if I am wrong, but memory barriers will not accomplish
what the volatile will in this situation.
--
Brian King
eServer Storage I/O
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH][RELEASE]: megaraid unified driver version 2.20.0.B1
2004-04-16 18:24 ` Brian King
@ 2004-04-16 18:30 ` Jeff Garzik
0 siblings, 0 replies; 8+ messages in thread
From: Jeff Garzik @ 2004-04-16 18:30 UTC (permalink / raw)
To: Brian King
Cc: Bagalkote, Sreenivas, 'Matt_Domsch@dell.com',
'paul@kungfoocoder.org', Mukker, Atul,
'James.Bottomley@SteelEye.com',
'arjanv@redhat.com', 'linux-scsi@vger.kernel.org',
'linux-kernel@vger.kernel.org'
Brian King wrote:
> Jeff Garzik wrote:
>
>> 6) Kill uses of "volatile". _Most_ of the time, this indicates buggy
>> code. You should have the proper barriers in place: mb(), wmb(),
>> rmb(), barrier(), and cpu_relax(). This has been mentioned before :)
>
>
> I'm not sure I totally agree with this statement. I took a quick look at
> the driver
> and the volatile appears to be used to point to host memory that is
> modified
> by the adapter. Correct me if I am wrong, but memory barriers will not
> accomplish
> what the volatile will in this situation.
You are wrong :) Compare the underlying asm.
Proper use of barriers tells the compiler when it cannot cache certain
loads and stores in registers. 'volatile' is an atomic bomb to the
compiler, killing many valid optimizations while (sometimes) hiding
bugs. It almost always indicates that the programmer did not bother
thinking about when and where DMA memory needs to be accessed.
Jeff
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH][RELEASE]: megaraid unified driver version 2.20.0.B1
@ 2004-04-16 22:50 Mukker, Atul
2004-04-16 23:38 ` Jeff Garzik
0 siblings, 1 reply; 8+ messages in thread
From: Mukker, Atul @ 2004-04-16 22:50 UTC (permalink / raw)
To: 'Jeff Garzik', Bagalkote, Sreenivas,
'Christoph Hellwig'
Cc: 'Matt_Domsch@dell.com', 'paul@kungfoocoder.org',
Mukker, Atul, 'James.Bottomley@SteelEye.com',
'arjanv@redhat.com', 'linux-scsi@vger.kernel.org',
'linux-kernel@vger.kernel.org'
> 0) Am I to judge from megaraid_mbox_build_cmd that megaraid does not
> really support SCSI, but it translates instead? This may imply that
> implementation as a SCSI driver is inappropriate.
MegaRAID is a very SCSI stack :-)
>
>
> 1) Remove back-compat code from kdep.h. It's OK for devel and for
> vendor-specific branch, but we wouldn't want this in 2.6.x mainline.
The 2.4 portion of the code has a very small footprint in the driver. Thanks
to earlier inputs from Christoph Hellwig, it will be even smaller.
Maintaining single code for lk 2.4 and 2.6 makes very sense from logistics
and maintenance point of view. lk 2.4 is still very popular :-) and we would
like to provide support for the same with the latest drivers as well.
>
>
> 2) Buggy ADDR definitions:
>
> +#define ADDR_LO(addr) (((unsigned long)(addr)) & 0xffffffff)
> +#define ADDR_HI(addr) ((((ulong)(addr)) &
> 0xffffffff00000000ULL) >> 32)
> +#define ADDR_64(hi, lo) ((((uint64_t)(hi)) << 32) | (lo))
>
> Don't cast to unsigned long, cast to u64.
This will be removed, it is not used in the driver anyway
>
>
> 3) Delete non-standard types: ulong
ok
>
>
> 5) No foo_t structures. Linux kernel style is "struct foo"
> not "foo_t":
>
> +typedef struct mraid_hba_info {
> +} __attribute__ ((packed)) mraid_hba_info_t;
> +
> +typedef struct mcontroller {
> +} __attribute__ ((packed)) mcontroller_t;
We also prefer to have "struct foo" mostly. But this should not be a rule.
E.g., having adapter_t instead of "struct adapter" is very convenient,
simply because it used so widely throughout the driver. But point taken :-)
>
>
> 6) Kill uses of "volatile". _Most_ of the time, this indicates buggy
> code. You should have the proper barriers in place: mb(), wmb(),
> rmb(), barrier(), and cpu_relax(). This has been mentioned before :)
We have carefully used "volatile" only on the fields, which are touch by the
hardware and/or firmware. The proper barriers are used when we want to make
sure access is properly sequenced.
>
>
> 7) Allow me to compliment you on proper kernel-doc documentation.
>
Thanks!
>
> 8) Consider adding ____cacheline_aligned markers to the definition of
> struct adapter (a.k.a. adapter_t, before item #5 is fixed).
> Re-arrange
> your adapter structure so that struct members that are used
> together are
> kept in the same cacheline. For example, in net drivers, I usually
> split the structures into "TX", "RX", and "everything else" sections.
> Keep in mind that cacheline size varies (32/64/128 usually),
> so this is
> not an absolute limit, just a marker where to physically
> separate into
> distinct cachelines.
ok
>
>
> 9) {SET,IS}_PRV_INTF_AVAILABLE's use of atomic_foo() seems racy.
>
This is not necessary and will be removed.
>
> 10) In 2.6, use the type-safe module_param() rather than MODULE_PARM()
>
We will explore and reply
>
> 11) Why is PAGE_SIZE chosen here?
>
> + /*
> + * Allocate all pages in a loop
> + */
> + for (i = 0; i < num_pages; i++) {
> +
> + pool->page_arr[i] = pci_alloc_consistent(
> dev, PAGE_SIZE-1,
> +
> &pool->dmah_arr[i] );
> + if (pool->page_arr[i] == NULL) {
> + con_log(CL_ANN, (KERN_WARNING
> + "megaraid: Failed to alloc
> page # %d\n",
> i ));
> + goto memlib_fail_alloc;
> + }
> + }
>
This API will only be used by low level drivers which require multiple small
chunks of DMAable memory. This chunk are assumed to be much smaller than a
PAGE therefore many can fit in a single PAGE. Driver portions requiring
bigger buffers (>4KB) use pci_alloc_consistent directly.
>
> 12) Don't invent your own printk() wrappers. I don't see the
> need for
> con_log()
Actually we do, since the debug level can be changed on the fly using ioctl
and sysfs
>
>
> 13) try_assertion{}, catch_assertion{}, and end_assertion attempt to
> turn C code into C++ code. This will inevitably fail :)
We see it as better error handling. It would be very useful specially when
we move to more fine-grained locks in the drivers.
>
>
> 14) the following check doesn't scale, please remove:
>
> + if (subsysvid && (subsysvid != PCI_VENDOR_ID_AMI) &&
> + (subsysvid != PCI_VENDOR_ID_DELL) &&
> + (subsysvid != PCI_VENDOR_ID_HP) &&
> + (subsysvid != PCI_VENDOR_ID_INTEL) &&
> + (subsysvid != PCI_SUBSYS_ID_FSC) &&
> + (subsysvid != PCI_VENDOR_ID_LSI_LOGIC)) {
> +
> + con_log(CL_ANN, (KERN_WARNING
> + "megaraid: not loading for
> subsysvid:%#4.04x\n",
> + subsysvid));
> +
> + return -ENODEV;
> + }
>
Please elaborate. We simply want to make sure the pci vendor, device and
subsystem ID are appropriate before loading the driver.
>
> 15) Don't set a DMA mask, then change it. in
> megaraid_probe_one() you
> have enough info to decide whether 64-bit can be supported.
>
>
ok
> 16) Use normal Linux kernel style to handle errors. Don't duplicate
> + kfree(adapter);
> +
> + pci_disable_device(pdev);
>
> in multiple error handling paths.
ok
>
>
> 17) Eliminate MAX_CONTROLLERS. There is no reason for this limit.
> Global structures such as mraid_driver_g are generally not needed.
We need it, since management module and applications need to see a global
picture of all adapters in the system
>
>
> 18) Use pci_request_regions() in megaraid_probe_one(), rather than
> request_region or request_mem_region. To free, you use
> pci_release_regions().
ok
>
>
> 19) When hardware is malfunctioning or removed, the following
> code turns
> into an infinite loop:
>
> + // FIXME: this may not be required
> + while (RDINDOOR(raid_dev) & 0x02) cpu_relax();
Is there is bette way to do it. If there is, please suggest - we will
definitely us it. In fact, this is not the only such loop in the ISR - there
are two more.
>
>
> 20) You don't need spin_lock_irqsave() in interrupt handler, only
> spin_lock()
ok
>
>
> 21) VERY unfriendly to shared interrupts.
> megaraid_iombox_ack_sequence
> MUST check immediately for
> (a) no irq at all (shared interrupt)
> (b) status return of 0xffffffff (hardware is
> malfunctioning/unplugged)
This is for EOL controllers. We are planning to phase out this portion.
>
>
> 22) MAJOR bug: sleep inside spinlock. The SCSI error
> handler calls its
> error handling hooks with the adapter lock held. You cannot yield()
> inside mbox_post_sync_cmd
>
>
> 23) Alongside #22, the following code is unacceptable:
>
> + // wait for maximum 1 second for status to post
> + for (i = 0; i < 40000; i++) {
> + if (mbox->numstatus != 0xFF) break;
> + udelay(25); yield();
> + }
>
> Besides yield(), the maximum delay with spinlock held is far too long.
>
>
> 24) Same issues as #22 and #23 in __megaraid_busywait_mbox:
> unacceptable delay inside spinlock, and schedule inside spinlock:
>
> + for (counter = 0; counter < 10000; counter++) {
> +
> + if (!mbox->busy) return MRAID_SUCCESS;
> +
> + udelay(100); yield();
> + }
>
Very good point! We would fix in the next drop.
>
>
> 25) sleep_on_timeout() is deprecated and should not be used:
>
> + while (adp->outstanding_cmds > 0)
> + sleep_on_timeout( &wq, 1*HZ );
>
What is the recommended way to do it?
>
>
> 26) General question: do you ever call down() inside a
> spinlock? I did
> not look closely, but I think you do. That would be wrong, as down()
> can sleep.
That should not happen since down() is only done in ioctl() through
management module. Will double check though.
>
>
> 27) drivers/scsi/megaraid/readme belongs in the
> Documentation/ directory
ok
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH][RELEASE]: megaraid unified driver version 2.20.0.B1
2004-04-16 22:50 [PATCH][RELEASE]: megaraid unified driver version 2.20.0.B1 Mukker, Atul
@ 2004-04-16 23:38 ` Jeff Garzik
0 siblings, 0 replies; 8+ messages in thread
From: Jeff Garzik @ 2004-04-16 23:38 UTC (permalink / raw)
To: Mukker, Atul
Cc: Bagalkote, Sreenivas, 'Christoph Hellwig',
'Matt_Domsch@dell.com', 'paul@kungfoocoder.org',
'James.Bottomley@SteelEye.com',
'arjanv@redhat.com', 'linux-scsi@vger.kernel.org',
'linux-kernel@vger.kernel.org'
Mukker, Atul wrote:
>>0) Am I to judge from megaraid_mbox_build_cmd that megaraid does not
>>really support SCSI, but it translates instead? This may imply that
>>implementation as a SCSI driver is inappropriate.
>
>
> MegaRAID is a very SCSI stack :-)
Well, the stack is irrelevant -- if you are not pushing real SCSI CDBs
to the RAID arrays, then it is not terribly appropriate as a SCSI driver.
We must distinguish use of Linux SCSI for driver API, from use of Linux
SCSI for communicating with SCSI devices.
This is more of a Linux 2.7.x issue, since I don't expect LSI to rewrite
the driver again as a block driver, at the moment :)
>>1) Remove back-compat code from kdep.h. It's OK for devel and for
>>vendor-specific branch, but we wouldn't want this in 2.6.x mainline.
>
>
> The 2.4 portion of the code has a very small footprint in the driver. Thanks
> to earlier inputs from Christoph Hellwig, it will be even smaller.
> Maintaining single code for lk 2.4 and 2.6 makes very sense from logistics
> and maintenance point of view. lk 2.4 is still very popular :-) and we would
> like to provide support for the same with the latest drivers as well.
I certainly agree that 2.4 maintenance will be required for quite some
time. My employer (Red Hat) has a maintenance lifetime of ~5 years or
more for its 2.4.x-based products.
The suggested direction, however is not to include back-compat in the
upstream kernel tree. This leads to unnecessary levels of indirection
in the upstream tree, which makes it more difficult for reviewers to
understand.
You _have_ done a good job of minimizing the 2.4/2.6 differences, but I
still feel it is preferable to leave these out of the 2.6 tree. I have
done the same thing with my SATA driver (libata), which has several
differences
In particular, I feel that attempting to make the differences between
2.4 and 2.6 SCSI layer probing is a big mistake. 2.6 SCSI layer is far
more hotplug-friendly than 2.4, and supporting both in the same source
base puts unreasonable constraints on the code over time, IMO.
>>6) Kill uses of "volatile". _Most_ of the time, this indicates buggy
>>code. You should have the proper barriers in place: mb(), wmb(),
>>rmb(), barrier(), and cpu_relax(). This has been mentioned before :)
>
> We have carefully used "volatile" only on the fields, which are touch by the
> hardware and/or firmware. The proper barriers are used when we want to make
> sure access is properly sequenced.
The vast majority of Linux drivers are volatile-free, for several reasons:
1) it hurts performance (inhibits valid compiler optimizations)
2) it hides bugs
3) it is not necessary for proper operation of hardware
I request that megaraid follow the example of the "vast majority".
>>11) Why is PAGE_SIZE chosen here?
>>
>>+ /*
>>+ * Allocate all pages in a loop
>>+ */
>>+ for (i = 0; i < num_pages; i++) {
>>+
>>+ pool->page_arr[i] = pci_alloc_consistent(
>>dev, PAGE_SIZE-1,
>>+
>>&pool->dmah_arr[i] );
>>+ if (pool->page_arr[i] == NULL) {
>>+ con_log(CL_ANN, (KERN_WARNING
>>+ "megaraid: Failed to alloc
>>page # %d\n",
>>i ));
>>+ goto memlib_fail_alloc;
>>+ }
>>+ }
>>
>
> This API will only be used by low level drivers which require multiple small
> chunks of DMAable memory. This chunk are assumed to be much smaller than a
> PAGE therefore many can fit in a single PAGE. Driver portions requiring
> bigger buffers (>4KB) use pci_alloc_consistent directly.
Now that I understand the purpose, the preference is the same as in my
other reply -- let's fix the problems with pci_pool_xxx API.
We have a kernel API precisely for this purpose; it should be used.
pci_pool is also present in 2.4.x...
>>12) Don't invent your own printk() wrappers. I don't see the
>>need for
>>con_log()
>
> Actually we do, since the debug level can be changed on the fly using ioctl
> and sysfs
Fair enough. We do something similar in net drivers.
>>13) try_assertion{}, catch_assertion{}, and end_assertion attempt to
>>turn C code into C++ code. This will inevitably fail :)
>
> We see it as better error handling. It would be very useful specially when
> we move to more fine-grained locks in the drivers.
It's a very fragile scheme, unfamiliar to non-LSI engineers, and thus
more likely to have problems in the long term. It's just syntactic
sugar, really.
>>14) the following check doesn't scale, please remove:
>>
>>+ if (subsysvid && (subsysvid != PCI_VENDOR_ID_AMI) &&
>>+ (subsysvid != PCI_VENDOR_ID_DELL) &&
>>+ (subsysvid != PCI_VENDOR_ID_HP) &&
>>+ (subsysvid != PCI_VENDOR_ID_INTEL) &&
>>+ (subsysvid != PCI_SUBSYS_ID_FSC) &&
>>+ (subsysvid != PCI_VENDOR_ID_LSI_LOGIC)) {
>>+
>>+ con_log(CL_ANN, (KERN_WARNING
>>+ "megaraid: not loading for
>>subsysvid:%#4.04x\n",
>>+ subsysvid));
>>+
>>+ return -ENODEV;
>>+ }
>>
>
> Please elaborate. We simply want to make sure the pci vendor, device and
> subsystem ID are appropriate before loading the driver.
Two points:
a) Such limitations should be in the pci_device_table, not the code itself
b) It is considered "unfriendly" and maintenance-intensive to add such
checks. If a hardware vendor (let's pick FSC as example) comes out with
new megaraid hardware, you should make every effort to ensure that the
driver _already works_ for that hardware.
We see from the megaraid 2.10.3 driver this patch:
@@ -311,6 +327,7 @@
(subsysvid != DELL_SUBSYS_VID) &&
(subsysvid != HP_SUBSYS_VID) &&
(subsysvid != INTEL_SUBSYS_VID) &&
+ (subsysvid != FSC_SUBSYS_VID) &&
(subsysvid != LSI_SUBSYS_VID) )
The hardware CLEARLY did not need FSC-specific modifications. However,
the FSC hardware would not work without this simple patch.
It is better to eliminate the need for such patches. If the driver and
hardware both comply to the specification, it should Just Work(tm). We
call this "future-proofing" the driver.
You provide better value to your customers by future-proofing your drivers.
>>17) Eliminate MAX_CONTROLLERS. There is no reason for this limit.
>>Global structures such as mraid_driver_g are generally not needed.
>
> We need it, since management module and applications need to see a global
> picture of all adapters in the system
I agree -- but I think you misunderstand.
It is possible to give mm and apps global picture of all adapters,
without introducing an arbitrary hardcoded limit. megaraid already has
a global list of adapters, that should be sufficient.
Dynamic allocation of each adapter, plus a global list, provides all
that megaraid needs, without any hardcoded MAX_CONTROLLERS limit.
When coding, imagine a completely crazy scenario -- such as 1,000
megaraid controllers in a system -- and contemplate all the limits that
will violate in the driver. A simple dynamic allocation scheme would
have no such limits.
>>19) When hardware is malfunctioning or removed, the following
>>code turns
>>into an infinite loop:
>>
>>+ // FIXME: this may not be required
>>+ while (RDINDOOR(raid_dev) & 0x02) cpu_relax();
>
> Is there is bette way to do it. If there is, please suggest - we will
> definitely us it. In fact, this is not the only such loop in the ISR - there
> are two more.
Typically, one includes a simple counter to ensure the loop is not
infinite...
while (condition && (counter-- > 0))
cpu_relax()
>>21) VERY unfriendly to shared interrupts.
>>megaraid_iombox_ack_sequence
>>MUST check immediately for
>> (a) no irq at all (shared interrupt)
>> (b) status return of 0xffffffff (hardware is
>>malfunctioning/unplugged)
>
> This is for EOL controllers. We are planning to phase out this portion.
The (a) and (b) checks I describe are -required- for all controllers,
EOL or not...
>>25) sleep_on_timeout() is deprecated and should not be used:
>>
>>+ while (adp->outstanding_cmds > 0)
>>+ sleep_on_timeout( &wq, 1*HZ );
>>
>
> What is the recommended way to do it?
There are a couple ways... the easiest is to use a semaphone/timer.
>>26) General question: do you ever call down() inside a
>>spinlock? I did
>>not look closely, but I think you do. That would be wrong, as down()
>>can sleep.
>
> That should not happen since down() is only done in ioctl() through
> management module. Will double check though.
Thanks,
Jeff
^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [PATCH][RELEASE]: megaraid unified driver version 2.20.0.B1
@ 2004-04-17 6:40 Mukker, Atul
2004-04-17 13:11 ` Ingo Oeser
2004-04-18 14:00 ` Matt Domsch
0 siblings, 2 replies; 8+ messages in thread
From: Mukker, Atul @ 2004-04-17 6:40 UTC (permalink / raw)
To: 'Jeff Garzik', Mukker, Atul
Cc: Bagalkote, Sreenivas, 'Christoph Hellwig',
'Matt_Domsch@dell.com', 'paul@kungfoocoder.org',
'James.Bottomley@SteelEye.com',
'arjanv@redhat.com', 'linux-scsi@vger.kernel.org',
'linux-kernel@vger.kernel.org'
> >>0) Am I to judge from megaraid_mbox_build_cmd that megaraid
> does not
> >>really support SCSI, but it translates instead? This may
> imply that
> >>implementation as a SCSI driver is inappropriate.
> >
> >
> > MegaRAID is a very SCSI stack :-)
>
> Well, the stack is irrelevant -- if you are not pushing real
> SCSI CDBs
> to the RAID arrays, then it is not terribly appropriate as a
> SCSI driver.
>
> We must distinguish use of Linux SCSI for driver API, from
> use of Linux
> SCSI for communicating with SCSI devices.
>
> This is more of a Linux 2.7.x issue, since I don't expect LSI
> to rewrite
> the driver again as a block driver, at the moment :)
In fact, we have contemplated about moving the driver up in the block layer,
and there might be such a driver in near future. But it is not right to say
that driver does not handle native SCSI packets. In addition to the logical
drives - which have special packets, the driver has a passthru interface for
non disk SCSI device and even SCSI disk for certain RAID+SCSI combinations.
The reasons we have not yet published drivers with block interface are:
(a) - Driver would have two personalities - block interface for logical disk
drives and SCSI for non-disk devices
(b) - Certain SCSI commands are indeed valid for logical drives, e.g., to
setup the clustering with shared storage - driver should get SCSI RESERVE,
RELEASE, RESET for logical drives.
>
>
> >>1) Remove back-compat code from kdep.h. It's OK for devel and for
> >>vendor-specific branch, but we wouldn't want this in 2.6.x mainline.
> >
> >
> > The 2.4 portion of the code has a very small footprint in
> the driver. Thanks
> > to earlier inputs from Christoph Hellwig, it will be even smaller.
> > Maintaining single code for lk 2.4 and 2.6 makes very sense
> from logistics
> > and maintenance point of view. lk 2.4 is still very popular
> :-) and we would
> > like to provide support for the same with the latest
> drivers as well.
>
> I certainly agree that 2.4 maintenance will be required for
> quite some
> time. My employer (Red Hat) has a maintenance lifetime of ~5
> years or
> more for its 2.4.x-based products.
>
> The suggested direction, however is not to include back-compat in the
> upstream kernel tree. This leads to unnecessary levels of
> indirection
> in the upstream tree, which makes it more difficult for reviewers to
> understand.
>
> You _have_ done a good job of minimizing the 2.4/2.6
> differences, but I
> still feel it is preferable to leave these out of the 2.6
> tree. I have
> done the same thing with my SATA driver (libata), which has several
> differences
>
> In particular, I feel that attempting to make the differences between
> 2.4 and 2.6 SCSI layer probing is a big mistake. 2.6 SCSI
> layer is far
> more hotplug-friendly than 2.4, and supporting both in the
> same source
> base puts unreasonable constraints on the code over time, IMO.
>
We have not heard any other argument either in favor of or against
maintaining a single code base for 2.4 and 2.6. Although - there were mixed
sentiments when the alpha code was dropped. If 2.6 SCSI maintainers feel
strongly against have 2.4 specific code - we can patch the driver
appropriately. My only concern is, the same argument would then be valid for
2.4 version. It should not have 2.6 related code. But the 2.4 version of the
driver would be so "not 2.4'ish" since this driver is primarily 2.6 based.
>
> >>6) Kill uses of "volatile". _Most_ of the time, this
> indicates buggy
> >>code. You should have the proper barriers in place: mb(), wmb(),
> >>rmb(), barrier(), and cpu_relax(). This has been mentioned
> before :)
> >
> > We have carefully used "volatile" only on the fields, which
> are touch by the
> > hardware and/or firmware. The proper barriers are used when
> we want to make
> > sure access is properly sequenced.
>
> The vast majority of Linux drivers are volatile-free, for
> several reasons:
> 1) it hurts performance (inhibits valid compiler optimizations)
> 2) it hides bugs
> 3) it is not necessary for proper operation of hardware
>
> I request that megaraid follow the example of the "vast majority".
Haven't had many arguments against this as well. So, we will remove all
volatile keywords in next version.
>
>
> >>11) Why is PAGE_SIZE chosen here?
> >>
> >>+ /*
> >>+ * Allocate all pages in a loop
> >>+ */
> >>+ for (i = 0; i < num_pages; i++) {
> >>+
> >>+ pool->page_arr[i] = pci_alloc_consistent(
> >>dev, PAGE_SIZE-1,
> >>+
> >>&pool->dmah_arr[i] );
> >>+ if (pool->page_arr[i] == NULL) {
> >>+ con_log(CL_ANN, (KERN_WARNING
> >>+ "megaraid: Failed to alloc
> >>page # %d\n",
> >>i ));
> >>+ goto memlib_fail_alloc;
> >>+ }
> >>+ }
> >>
> >
> > This API will only be used by low level drivers which
> require multiple small
> > chunks of DMAable memory. This chunk are assumed to be much
> smaller than a
> > PAGE therefore many can fit in a single PAGE. Driver
> portions requiring
> > bigger buffers (>4KB) use pci_alloc_consistent directly.
>
> Now that I understand the purpose, the preference is the same
> as in my
> other reply -- let's fix the problems with pci_pool_xxx API.
>
> We have a kernel API precisely for this purpose; it should be used.
>
> pci_pool is also present in 2.4.x...
We will investigate this further and share with lists. But this should not
be a pre-requisite for driver making in the kernel. We have made sure that
the semantics for pci_pool_xxx match closely with this our API - so it would
be a simple substitution.
>
> >>14) the following check doesn't scale, please remove:
> >>
> >>+ if (subsysvid && (subsysvid != PCI_VENDOR_ID_AMI) &&
> >>+ (subsysvid != PCI_VENDOR_ID_DELL) &&
> >>+ (subsysvid != PCI_VENDOR_ID_HP) &&
> >>+ (subsysvid != PCI_VENDOR_ID_INTEL) &&
> >>+ (subsysvid != PCI_SUBSYS_ID_FSC) &&
> >>+ (subsysvid != PCI_VENDOR_ID_LSI_LOGIC)) {
> >>+
> >>+ con_log(CL_ANN, (KERN_WARNING
> >>+ "megaraid: not loading for
> >>subsysvid:%#4.04x\n",
> >>+ subsysvid));
> >>+
> >>+ return -ENODEV;
> >>+ }
> >>
> >
> > Please elaborate. We simply want to make sure the pci
> vendor, device and
> > subsystem ID are appropriate before loading the driver.
>
> Two points:
>
> a) Such limitations should be in the pci_device_table, not
> the code itself
Combinig the subsystem id with pci vendor and device id in the device table
would result in many permuations and combinations.
>
> b) It is considered "unfriendly" and maintenance-intensive to
> add such
> checks. If a hardware vendor (let's pick FSC as example)
> comes out with
> new megaraid hardware, you should make every effort to ensure
> that the
> driver _already works_ for that hardware.
>
> We see from the megaraid 2.10.3 driver this patch:
>
> @@ -311,6 +327,7 @@
> (subsysvid != DELL_SUBSYS_VID) &&
> (subsysvid != HP_SUBSYS_VID) &&
> (subsysvid != INTEL_SUBSYS_VID) &&
> + (subsysvid != FSC_SUBSYS_VID) &&
> (subsysvid != LSI_SUBSYS_VID) )
>
> The hardware CLEARLY did not need FSC-specific modifications.
> However,
> the FSC hardware would not work without this simple patch.
>
> It is better to eliminate the need for such patches. If the
> driver and
> hardware both comply to the specification, it should Just
> Work(tm). We
> call this "future-proofing" the driver.
>
> You provide better value to your customers by future-proofing
> your drivers.
You are right about future-proofing. But when we do this, we have something
else in mind, which is totally opposite. I am sure, it seems abstruse to
redundantly check for the subsystem ids, when the vendor and device ids are
provided in the device table. This is done, so that we do not try to take
ownership of controller, which actually belongs to some other vendor. I know
of at least one example, where the qlogic driver loads for an AMI MegaRAID
controller - because both share the same vendor and device ids. Now the
driver assumes it to be a qlogic controller and tries to start it,
eventually hanging the server.
There definitely are other ways we driver simply wants to support a new
controller, which should Just Work(tm), e.g., by accepting new PCI vendor
id, device id and subsystem id as module parameters. It is unlikely we need
this in near future because the frequency at which we update drivers makes
it very easy to sneak in another set of PCI ids in the driver :-))
>
>
> >>17) Eliminate MAX_CONTROLLERS. There is no reason for this limit.
> >>Global structures such as mraid_driver_g are generally not needed.
> >
> > We need it, since management module and applications need
> to see a global
> > picture of all adapters in the system
>
> I agree -- but I think you misunderstand.
>
> It is possible to give mm and apps global picture of all adapters,
> without introducing an arbitrary hardcoded limit. megaraid
> already has
> a global list of adapters, that should be sufficient.
>
> Dynamic allocation of each adapter, plus a global list, provides all
> that megaraid needs, without any hardcoded MAX_CONTROLLERS limit.
>
Understood and accepted :-)
>
> >>19) When hardware is malfunctioning or removed, the following
> >>code turns
> >>into an infinite loop:
> >>
> >>+ // FIXME: this may not be required
> >>+ while (RDINDOOR(raid_dev) & 0x02) cpu_relax();
> >
> > Is there is bette way to do it. If there is, please suggest
> - we will
> > definitely us it. In fact, this is not the only such loop
> in the ISR - there
> > are two more.
>
> Typically, one includes a simple counter to ensure the loop is not
> infinite...
>
> while (condition && (counter-- > 0))
> cpu_relax()
>
The hard part is arriving at the counter value. Given that this driver might
run on a 700MHz singe cpu, one of my test servers :-( or a 3.2GHz SMP server
- the typical counter value is difficult to set. But you are right, we
should find out the typical time a cpu would take to execute so many
iterations and base counter value over it.
>
> >>21) VERY unfriendly to shared interrupts.
> >>megaraid_iombox_ack_sequence
> >>MUST check immediately for
> >> (a) no irq at all (shared interrupt)
> >> (b) status return of 0xffffffff (hardware is
> >>malfunctioning/unplugged)
> >
> > This is for EOL controllers. We are planning to phase out
> this portion.
>
> The (a) and (b) checks I describe are -required- for all controllers,
> EOL or not...
Looks like missed your point completely. Are you reffering to the irq
parameter passed to the ISR? In our ISR, it first thing we check is, whether
the controller raised an interrupt. If yes, then do further processing,
otherwise return immediately.
Thanks
-Atul Mukker
LSI Logic
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH][RELEASE]: megaraid unified driver version 2.20.0.B1
2004-04-17 6:40 Mukker, Atul
@ 2004-04-17 13:11 ` Ingo Oeser
2004-04-18 14:00 ` Matt Domsch
1 sibling, 0 replies; 8+ messages in thread
From: Ingo Oeser @ 2004-04-17 13:11 UTC (permalink / raw)
To: linux-kernel
Cc: Mukker, Atul, 'Jeff Garzik', Bagalkote, Sreenivas,
'Christoph Hellwig', 'Matt_Domsch@dell.com',
'paul@kungfoocoder.org',
'James.Bottomley@SteelEye.com',
'arjanv@redhat.com', 'linux-scsi@vger.kernel.org',
'linux-kernel@vger.kernel.org'
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,
On Saturday 17 April 2004 08:40, Mukker, Atul wrote:
> > Typically, one includes a simple counter to ensure the loop is not
> > infinite...
> >
> > while (condition && (counter-- > 0))
> > cpu_relax()
> >
> The hard part is arriving at the counter value. Given that this driver might
> run on a 700MHz singe cpu, one of my test servers :-( or a 3.2GHz SMP server
> - the typical counter value is difficult to set. But you are right, we
> should find out the typical time a cpu would take to execute so many
> iterations and base counter value over it.
Then you might use a udelay(1) instead. This should equalize this
effect, no?
Regards
Ingo Oeser
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
iD8DBQFAgSz1U56oYWuOrkARApDLAKDRyAJk7wL4T2M5NDjo3sUVT9sXTACgiEmo
3NQ5RyBs8aIWxrrfCzlP4JM=
=3K/t
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH][RELEASE]: megaraid unified driver version 2.20.0.B1
2004-04-17 6:40 Mukker, Atul
2004-04-17 13:11 ` Ingo Oeser
@ 2004-04-18 14:00 ` Matt Domsch
1 sibling, 0 replies; 8+ messages in thread
From: Matt Domsch @ 2004-04-18 14:00 UTC (permalink / raw)
To: Mukker, Atul
Cc: 'Jeff Garzik', Bagalkote, Sreenivas,
'Christoph Hellwig', 'paul@kungfoocoder.org',
'James.Bottomley@SteelEye.com',
'arjanv@redhat.com', 'linux-scsi@vger.kernel.org',
'linux-kernel@vger.kernel.org'
On Sat, Apr 17, 2004 at 02:40:36AM -0400, Mukker, Atul wrote:
> > >>14) the following check doesn't scale, please remove:
> > >>
> > >>+ if (subsysvid && (subsysvid != PCI_VENDOR_ID_AMI) &&
> > >>+ (subsysvid != PCI_VENDOR_ID_DELL) &&
> > >>+ (subsysvid != PCI_VENDOR_ID_HP) &&
> > >>+ (subsysvid != PCI_VENDOR_ID_INTEL) &&
> > >>+ (subsysvid != PCI_SUBSYS_ID_FSC) &&
> > >>+ (subsysvid != PCI_VENDOR_ID_LSI_LOGIC)) {
>
> Combinig the subsystem id with pci vendor and device id in the device table
> would result in many permuations and combinations.
That's OK. See the e100 driver, it's got like 35 entries on its list
already.
> You are right about future-proofing. But when we do this, we have something
> else in mind, which is totally opposite. I am sure, it seems abstruse to
> redundantly check for the subsystem ids, when the vendor and device ids are
> provided in the device table. This is done, so that we do not try to take
> ownership of controller, which actually belongs to some other vendor. I know
> of at least one example, where the qlogic driver loads for an AMI MegaRAID
> controller - because both share the same vendor and device ids. Now the
> driver assumes it to be a qlogic controller and tries to start it,
> eventually hanging the server.
But megaraid doesn't have this problem AFAIK.
And if necessary, a second pci_device_id list of IDs to test against
and exclude would be appropriate, and analogous to the pci_device_id
list that's used to accept devices.
> There definitely are other ways we driver simply wants to support a new
> controller, which should Just Work(tm), e.g., by accepting new PCI vendor
> id, device id and subsystem id as module parameters.
/sys/bus/pci/driver/megaraid/new_id already does this
> It is unlikely we need this in near future because the frequency at
> which we update drivers makes it very easy to sneak in another set
> of PCI ids in the driver :-))
Wrong. The issue isn't "how fast can LSI provide an updated driver to
kernel.org", but "how fast can users themselves add new IDs to a
driver for their given kernel/distribution at install time without
needing to wait for LSI to produce a new driver". Hence the new_id
trick above was introduced for 2.6 for exactly this purpose.
Thanks,
Matt
--
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2004-04-18 14:02 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-04-16 22:50 [PATCH][RELEASE]: megaraid unified driver version 2.20.0.B1 Mukker, Atul
2004-04-16 23:38 ` Jeff Garzik
-- strict thread matches above, loose matches on Subject: below --
2004-04-17 6:40 Mukker, Atul
2004-04-17 13:11 ` Ingo Oeser
2004-04-18 14:00 ` Matt Domsch
[not found] <0E3FA95632D6D047BA649F95DAB60E570230C7DB@exa-atlanta.se.lsil.com>
2004-04-16 18:01 ` Jeff Garzik
2004-04-16 18:24 ` Brian King
2004-04-16 18:30 ` Jeff Garzik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox