* [patch resend] firewire: ieee1394: Move away from SG_ALL @ 2008-08-04 18:08 Stefan Richter 2008-08-04 19:21 ` James Bottomley 0 siblings, 1 reply; 14+ messages in thread
From: Stefan Richter @ 2008-08-04 18:08 UTC (permalink / raw)
To: linux1394-devel; +Cc: linux-scsi, Boaz Harrosh

As a discussion reminded me today, I believe I should merge the following patch (could have done so much earlier in fact). Or is there anything speaking against it?

Meanwhile, SG_ALL has been redefined from 255 to SCSI_MAX_SG_SEGMENTS = 128 without me noticing it. But since several popular architectures define ARCH_HAS_SG_CHAIN, we may want to go back to 255 in (fw-)sbp2. (Besides, we should also do some tests to figure out an actual optimum. And fw-sbp2.c's struct sbp2_command_orb should become variably sized.)

Does the most relevant user of large transfers with SBP-2 devices, sd-mod, use chained s/g lists?

Date: Thu, 17 Jan 2008 18:41:47 +0200
From: Boaz Harrosh <bharrosh@panasas.com>
Subject: [PATCH 1/8] firewire: ieee1394: Move away from SG_ALL

SG_ALL wants to be ~0, meaning "any future size". The drivers below preallocate a scatter list buffer of max_size, so set that size to 255 (like before). A better scheme could be devised with more dynamic allocation, perhaps from a kmem_cache.
List of drivers/files:
	drivers/firewire/fw-sbp2.c
	drivers/ieee1394/sbp2.[ch]

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 drivers/firewire/fw-sbp2.c |    6 ++++--
 drivers/ieee1394/sbp2.c    |    2 +-
 drivers/ieee1394/sbp2.h    |    4 +++-
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/firewire/fw-sbp2.c b/drivers/firewire/fw-sbp2.c
index c2169d2..078501c 100644
--- a/drivers/firewire/fw-sbp2.c
+++ b/drivers/firewire/fw-sbp2.c
@@ -151,6 +151,7 @@ struct sbp2_target {
 };
 
 #define SBP2_MAX_SG_ELEMENT_LENGTH	0xf000
+#define SBP2_MAX_SG_COUNT		255
 #define SBP2_MAX_SECTORS		255	/* Max sectors supported */
 #define SBP2_ORB_TIMEOUT		2000	/* Timeout in ms */
@@ -272,7 +273,8 @@ struct sbp2_command_orb {
 	scsi_done_fn_t done;
 	struct sbp2_logical_unit *lu;
 
-	struct sbp2_pointer page_table[SG_ALL] __attribute__((aligned(8)));
+	struct sbp2_pointer page_table[SBP2_MAX_SG_COUNT]
+		__attribute__((aligned(8)));
 	dma_addr_t page_table_bus;
 };
@@ -1329,7 +1331,7 @@ static struct scsi_host_template scsi_driver_template = {
 	.slave_configure	= sbp2_scsi_slave_configure,
 	.eh_abort_handler	= sbp2_scsi_abort,
 	.this_id		= -1,
-	.sg_tablesize		= SG_ALL,
+	.sg_tablesize		= SBP2_MAX_SG_COUNT,
 	.use_clustering		= ENABLE_CLUSTERING,
 	.cmd_per_lun		= 1,
 	.can_queue		= 1,
diff --git a/drivers/ieee1394/sbp2.c b/drivers/ieee1394/sbp2.c
index 1eda11a..070763a 100644
--- a/drivers/ieee1394/sbp2.c
+++ b/drivers/ieee1394/sbp2.c
@@ -324,7 +324,7 @@ static struct scsi_host_template sbp2_shost_template = {
 	.slave_configure	= sbp2scsi_slave_configure,
 	.slave_destroy		= sbp2scsi_slave_destroy,
 	.this_id		= -1,
-	.sg_tablesize		= SG_ALL,
+	.sg_tablesize		= SBP2_MAX_SG_COUNT,
 	.use_clustering		= ENABLE_CLUSTERING,
 	.cmd_per_lun		= SBP2_MAX_CMDS,
 	.can_queue		= SBP2_MAX_CMDS,
diff --git a/drivers/ieee1394/sbp2.h b/drivers/ieee1394/sbp2.h
index 333a4bb..ae31788 100644
--- a/drivers/ieee1394/sbp2.h
+++ b/drivers/ieee1394/sbp2.h
@@ -222,6 +222,7 @@ struct sbp2_status_block {
  */
 #define SBP2_MAX_SG_ELEMENT_LENGTH	0xf000
+#define SBP2_MAX_SG_COUNT		255
 #define SBP2_MAX_SECTORS		255
 
 /* There is no real limitation of the queue depth (i.e. length of the linked
  * list of command ORBs) at the target. The chosen depth is merely an
@@ -257,7 +258,8 @@ struct sbp2_command_info {
 
 	/* Also need s/g structure for each sbp2 command */
 	struct sbp2_unrestricted_page_table
-		scatter_gather_element[SG_ALL] __attribute__((aligned(8)));
+		scatter_gather_element[SBP2_MAX_SG_COUNT]
+		__attribute__((aligned(8)));
 	dma_addr_t sge_dma;
 	void *sge_buffer;
 	dma_addr_t cmd_dma;

--
Stefan Richter
-=====-==--- =--- --=--
http://arcgraph.de/sr/
* Re: [patch resend] firewire: ieee1394: Move away from SG_ALL 2008-08-04 18:08 [patch resend] firewire: ieee1394: Move away from SG_ALL Stefan Richter @ 2008-08-04 19:21 ` James Bottomley 2008-08-04 19:58 ` Stefan Richter 0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2008-08-04 19:21 UTC (permalink / raw)
To: Stefan Richter; +Cc: linux1394-devel, linux-scsi, Boaz Harrosh

On Mon, 2008-08-04 at 20:08 +0200, Stefan Richter wrote:
> As a discussion reminded me today, I believe I should merge the
> following patch (could have done so much earlier in fact). Or is there
> anything speaking against it?

The value 255 is chosen because it worked before. What you need to do is establish what the upper limit for firewire is (or indeed if it has one).

> Meanwhile, SG_ALL has been redefined from 255 to SCSI_MAX_SG_SEGMENTS =
> 128 without me noticing it. But since several popular architectures
> define ARCH_HAS_SG_CHAIN, we may want to go back to 255 in (fw-)sbp2.
> (Besides, we should also do some tests to figure out an actual optimum.
> And fw-sbp2.c's struct sbp2_command_orb should become variably sized.)

Don't bother with optimum ... that's block's job based on what it sees from the completions. All we need to know is maximum.

> Does the most relevant user of large transfers with SBP-2 devices,
> sd-mod, use chained s/g lists?

pass, but firewire is a reasonably slow bus by modern standards, and you have the protocol overhead for each ORB, so I'd guess there's a point at which increasing the transaction size doesn't buy anything.

James

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
* Re: [patch resend] firewire: ieee1394: Move away from SG_ALL 2008-08-04 19:21 ` James Bottomley @ 2008-08-04 19:58 ` Stefan Richter 2008-08-05 12:22 ` Boaz Harrosh 0 siblings, 1 reply; 14+ messages in thread
From: Stefan Richter @ 2008-08-04 19:58 UTC (permalink / raw)
To: James Bottomley; +Cc: linux1394-devel, linux-scsi, Boaz Harrosh

James Bottomley wrote:
> On Mon, 2008-08-04 at 20:08 +0200, Stefan Richter wrote:
>> As a discussion reminded me today, I believe I should merge the
>> following patch (could have done so much earlier in fact). Or is there
>> anything speaking against it?
>
> The value 255 is chosen because it worked before. What you need to do
> is establish what the upper limit for firewire is (or indeed if it has
> one).

The limit is 65535, following from SBP-2 clause 5.1.2, definition of data_size.

[Side note: The SBP-2 s/g list (a.k.a. page table) consists of 64bit wide entries and needs to be contiguous in memory from the POV of the FireWire PCI or PCIe controller, and the SBP-2 target reads the table from the initiator's memory. The (fw-)sbp2 driver builds this table as a copy of the kernel's s/g list; but this was certainly already clear to the reader from the context in the diff.]

>> Meanwhile, SG_ALL has been redefined from 255 to SCSI_MAX_SG_SEGMENTS =
>> 128 without me noticing it. But since several popular architectures
>> define ARCH_HAS_SG_CHAIN, we may want to go back to 255 in (fw-)sbp2.
>> (Besides, we should also do some tests to figure out an actual optimum.
>> And fw-sbp2.c's struct sbp2_command_orb should become variably sized.)
>
> Don't bother with optimum ... that's block's job based on what it sees
> from the completions. All we need to know is maximum.

OK, with the small caveat that the current (fw-)sbp2 driver code is somewhat simplistic WRT page table handling and geared towards rather short page tables. But this may be possible to enhance without too much difficulty.
>> Does the most relevant user of large transfers with SBP-2 devices,
>> sd-mod, use chained s/g lists?
>
> pass, but firewire is a reasonably slow bus by modern standards, and you

A 3200 Mb/s spec exists :-) (though no silicon yet, to my knowledge).

> have the protocol overhead for each ORB, so I'd guess there's a point at
> which increasing the transaction size doesn't buy anything.
>
> James

--
Stefan Richter
-=====-==--- =--- --=--
http://arcgraph.de/sr/
* Re: [patch resend] firewire: ieee1394: Move away from SG_ALL 2008-08-04 19:58 ` Stefan Richter @ 2008-08-05 12:22 ` Boaz Harrosh 2008-08-05 15:01 ` Stefan Richter 2008-08-05 19:57 ` [patch resend] firewire: ieee1394: Move away from SG_ALL Stefan Richter 0 siblings, 2 replies; 14+ messages in thread
From: Boaz Harrosh @ 2008-08-05 12:22 UTC (permalink / raw)
To: Stefan Richter; +Cc: James Bottomley, linux1394-devel, linux-scsi

Stefan Richter wrote:
> James Bottomley wrote:
>> On Mon, 2008-08-04 at 20:08 +0200, Stefan Richter wrote:
>>> As a discussion reminded me today, I believe I should merge the
>>> following patch (could have done so much earlier in fact). Or is there
>>> anything speaking against it?
>> The value 255 is chosen because it worked before. What you need to do
>> is establish what the upper limit for firewire is (or indeed if it has
>> one).
>
> The limit is 65535, following from SBP-2 clause 5.1.2, definition of
> data_size.
>
> [Side note: The SBP-2 s/g list (a.k.a. page table) consists of 64bit
> wide entries and needs to be contiguous in memory from the POV of the
> FireWire PCI or PCIe controller, and the SBP-2 target reads the table
> from the initiator's memory. The (fw-)sbp2 driver builds this table as
> a copy of the kernel's s/g list; but this was certainly already clear
> to the reader from the context in the diff.]

Does the above mean you need contiguous physical memory to hold sbp2's sg table? If so, then it should be allocated with alloc_coherent, with a total size of maybe up to a PAGE_SIZE.

>>> Meanwhile, SG_ALL has been redefined from 255 to SCSI_MAX_SG_SEGMENTS =
>>> 128 without me noticing it. But since several popular architectures
>>> define ARCH_HAS_SG_CHAIN, we may want to go back to 255 in (fw-)sbp2.
>>> (Besides, we should also do some tests to figure out an actual optimum.
>>> And fw-sbp2.c's struct sbp2_command_orb should become variably sized.)
>> Don't bother with optimum ... that's block's job based on what it sees
>> from the completions. All we need to know is maximum.

From my tests, the block layer will very quickly send you chained commands. If you set it to 255, you'll see two chained fragments. In iscsi we have 4096 and it is very good for performance. However, since you are pre-allocating the memory per can_queue commands, this can get big, and the memory overhead can be an impact on performance.

> OK, with the small caveat that the current (fw-)sbp2 driver code is
> somewhat simplistic WRT page table handling and geared towards rather
> short page tables. But this may be possible to enhance without too much
> difficulty.
>
>>> Does the most relevant user of large transfers with SBP-2 devices,
>>> sd-mod, use chained s/g lists?
>> pass, but firewire is a reasonably slow bus by modern standards, and you
>
> A 3200 Mb/s spec exists :-) (though no silicon yet, to my knowledge).

As you said above, and since bitrate is increasing, 255 is a good preallocated value, but perhaps it could also be a module parameter so fast cards can have a bigger pre-allocated sg table (per can_queue command). Also, embedded systems might want to lower this value.

>> have the protocol overhead for each ORB, so I'd guess there's a point at
>> which increasing the transaction size doesn't buy anything.
>>
>> James

From re-inspecting the code, I can see that struct sbp2_command_orb might have problems on non-DMA-coherent arches. I think for safety and flexibility the sbp2_command_orb.page_table array should be dynamically allocated at init time, maybe even with alloc_coherent. Do you need me to send you a sketch of what I mean?

Boaz
* Re: [patch resend] firewire: ieee1394: Move away from SG_ALL 2008-08-05 12:22 ` Boaz Harrosh @ 2008-08-05 15:01 ` Stefan Richter 2008-08-05 15:28 ` Boaz Harrosh 2008-08-05 19:57 ` [patch resend] firewire: ieee1394: Move away from SG_ALL Stefan Richter 1 sibling, 1 reply; 14+ messages in thread From: Stefan Richter @ 2008-08-05 15:01 UTC (permalink / raw) To: Boaz Harrosh; +Cc: James Bottomley, linux1394-devel, linux-scsi Boaz Harrosh wrote: > Stefan Richter wrote: >> James Bottomley wrote: >>> On Mon, 2008-08-04 at 20:08 +0200, Stefan Richter wrote: >>>> As a discussion reminded me today, I believe I should merge the >>>> following patch (could have done so much earlier in fact). Or is there >>>> anything speaking against it? >>> The value 255 is chosen because it worked before. What you need to do >>> is establish what the upper limit for firewire is (or indeed if it has >>> one). >> >> The limit is 65535, following from SBP-2 clause 5.1.2, definition of >> data_size. >> >> [Side note: The SBP-2 s/g list (a.k.a. page table) consists of 64bit >> wide entries and needs to be contiguous in memory from the POV of the >> FireWire PCI or PCIe controller, and the SBP-2 target reads the table >> from the initiator's memory. The (fw-)sbp2 driver builds this table as >> a copy of the kernel's s/g list; but this was certainly already to the >> reader clear from the context in the diff.] > > Does the above mean you need contiguous physical memory to hold the > sbp2's sg table. If so then it should be allocated with alloc_coherent and > maybe total size up to a PAGE_SIZE. It doesn't have to be cache-coherent but it indeed needs to be memory which can be DMA-mapped in one piece. (DMA direction is DMA_TO_DEVICE. Lifetime of the mapping is from before the ORB is committed until after we received completion status, or the request timed out and its ORB was aborted. 
fw-sbp2 implements exactly this lifetime, while sbp2 keeps the DMA mapping around from before login to the target until after logout.) >>>> Meanwhile, SG_ALL has been redefined from 255 to SCSI_MAX_SG_SEGMENTS = >>>> 128 without me noticing it. But since several popular architectures >>>> define ARCH_HAS_SG_CHAIN, we may want to go back to 255 in (fw-)sbp2. >>>> (Besides, we should also do some tests to figure out an actual optimum. >>>> And fw-sbp2.c's struct sbp2_command_orb should become variably sized.) >>> Don't bother with optmium ... that's block's job based on what it sees >>> from the completions. All we need to know is maximum. > > From my tests, the block layer will, very fast, send you chained commands. > if you set it to 255 you'll see two chained fragments. In iscsi we have > 4096 and it is very good for performance. However since you are pre-allocating > the memory per can_queue commands, this can get big, and the memory overhead > can be an impact on performance. OK. This gives me a better picture. >> OK, with the small caveat that the current (fw-)sbp2 driver code is >> somewhat simplistic WRT page table handling and geared towards rather >> short page tables. But this may be possible to enhance without too much >> difficulty. >> >>>> Does the most relevant user of large transfers with SBP-2 devices, >>>> sd-mod, use chained s/g lists? >>> pass, but firewire is a reasonably slow bus by modern standards, and you >> >> A 3200 Mb/s spec exists :-) (though no silicon yet, to my knowledge). > > As you said above and since bitrate is increasing, a 255 is a good > preallocated value, but perhaps it could also be a module parameter so > fast cards can have a bigger pre-allocated sg table. (per canQ command). > Also for embedded systems, they might want to lower this value. A preset according to connection speed could also be done by the driver without user intervention. 
>>> have the protocol overhead for each ORB, so I'd guess there's a point at
>>> which increasing the transaction size doesn't buy anything.
>>>
>>> James
>
> From re-inspecting the code, I can see that struct sbp2_command_orb
> might have problems on non-DMA-coherent arches.

Hmm. This would be a bug. But from what I see while scrolling through fw-sbp2 and through sbp2, they both do it right. Well, to be very strict, sbp2 should dma_sync_single_for_cpu() at a slightly different place than it does now. I shall patch this.

> I think for safety
> and flexibility the sbp2_command_orb.page_table array should be
> dynamically allocated at init time

The older driver sbp2 uses a fixed-size pool of ORBs and s/g tables for the whole time that it is logged in to a target. The author of the newer fw-sbp2 driver chose to keep ORB and s/g table only around for the lifetime of a request, and I think it's a valid approach.

> with even maybe alloc_coherent.

alloc_coherent isn't necessary. Even code simplicity wouldn't profit too much IMO; we already have to DMA-map the data buffer for each request anyway.

--
Stefan Richter
-=====-==--- =--- --=-=
http://arcgraph.de/sr/
* Re: [patch resend] firewire: ieee1394: Move away from SG_ALL 2008-08-05 15:01 ` Stefan Richter @ 2008-08-05 15:28 ` Boaz Harrosh 2008-08-08 11:59 ` [PATCH] ieee1394: sbp2: stricter dma_sync Stefan Richter 0 siblings, 1 reply; 14+ messages in thread From: Boaz Harrosh @ 2008-08-05 15:28 UTC (permalink / raw) To: Stefan Richter; +Cc: James Bottomley, linux1394-devel, linux-scsi Stefan Richter wrote: > Boaz Harrosh wrote: >> Stefan Richter wrote: >>> James Bottomley wrote: >>>> On Mon, 2008-08-04 at 20:08 +0200, Stefan Richter wrote: >>>>> As a discussion reminded me today, I believe I should merge the >>>>> following patch (could have done so much earlier in fact). Or is there >>>>> anything speaking against it? >>>> The value 255 is chosen because it worked before. What you need to do >>>> is establish what the upper limit for firewire is (or indeed if it has >>>> one). >>> The limit is 65535, following from SBP-2 clause 5.1.2, definition of >>> data_size. >>> >>> [Side note: The SBP-2 s/g list (a.k.a. page table) consists of 64bit >>> wide entries and needs to be contiguous in memory from the POV of the >>> FireWire PCI or PCIe controller, and the SBP-2 target reads the table >>> from the initiator's memory. The (fw-)sbp2 driver builds this table as >>> a copy of the kernel's s/g list; but this was certainly already to the >>> reader clear from the context in the diff.] >> Does the above mean you need contiguous physical memory to hold the >> sbp2's sg table. If so then it should be allocated with alloc_coherent and >> maybe total size up to a PAGE_SIZE. > > It doesn't have to be cache-coherent but it indeed needs to be memory > which can be DMA-mapped in one piece. > > (DMA direction is DMA_TO_DEVICE. Lifetime of the mapping is from before > the ORB is committed until after we received completion status, or the > request timed out and its ORB was aborted. 
fw-sbp2 implements exactly > this lifetime, while sbp2 keeps the DMA mapping around from before login > to the target until after logout.) > OK you are right, DMA_TO_DEVICE is less problematic in regard to variables that are changed by the CPU that share cacheline with DMAed memory. As long as the code has a cache-sync before the start of transfer. But I see you'll take care of that below. > >>>>> Meanwhile, SG_ALL has been redefined from 255 to SCSI_MAX_SG_SEGMENTS = >>>>> 128 without me noticing it. But since several popular architectures >>>>> define ARCH_HAS_SG_CHAIN, we may want to go back to 255 in (fw-)sbp2. >>>>> (Besides, we should also do some tests to figure out an actual optimum. >>>>> And fw-sbp2.c's struct sbp2_command_orb should become variably sized.) >>>> Don't bother with optmium ... that's block's job based on what it sees >>>> from the completions. All we need to know is maximum. >> From my tests, the block layer will, very fast, send you chained commands. >> if you set it to 255 you'll see two chained fragments. In iscsi we have >> 4096 and it is very good for performance. However since you are pre-allocating >> the memory per can_queue commands, this can get big, and the memory overhead >> can be an impact on performance. > > OK. This gives me a better picture. > > >>> OK, with the small caveat that the current (fw-)sbp2 driver code is >>> somewhat simplistic WRT page table handling and geared towards rather >>> short page tables. But this may be possible to enhance without too much >>> difficulty. >>> >>>>> Does the most relevant user of large transfers with SBP-2 devices, >>>>> sd-mod, use chained s/g lists? >>>> pass, but firewire is a reasonably slow bus by modern standards, and you >>> A 3200 Mb/s spec exists :-) (though no silicon yet, to my knowledge). 
>> As you said above, and since bitrate is increasing, 255 is a good
>> preallocated value, but perhaps it could also be a module parameter so
>> fast cards can have a bigger pre-allocated sg table (per can_queue
>> command). Also, embedded systems might want to lower this value.
>
> A preset according to connection speed could also be done by the driver
> without user intervention.

Even better, thanks; that would be ideal. (Except for the embedded systems that would want to keep it low, perhaps at configuration time.)

>>>> have the protocol overhead for each ORB, so I'd guess there's a point at
>>>> which increasing the transaction size doesn't buy anything.
>>>>
>>>> James
>> From re-inspecting the code, I can see that struct sbp2_command_orb
>> might have problems on non-DMA-coherent arches.
>
> Hmm. This would be a bug. But from what I see while scrolling through
> fw-sbp2 and through sbp2, they both do it right. Well, to be very
> strict, sbp2 should dma_sync_single_for_cpu() at a slightly different
> place than it does now. I shall patch this.

Yes, I agree with you. Keep me posted, I'll review the code.

>> I think for safety
>> and flexibility the sbp2_command_orb.page_table array should be
>> dynamically allocated at init time
>
> The older driver sbp2 uses a fixed-size pool of ORBs and s/g tables for
> the whole time that it is logged in to a target. The author of the
> newer fw-sbp2 driver chose to keep ORB and s/g table only around for the
> lifetime of a request, and I think it's a valid approach.
>
>> with even maybe alloc_coherent.
>
> alloc_coherent isn't necessary. Even code simplicity wouldn't profit
> too much IMO; we already have to DMA-map the data buffer for each
> request anyway.

OK, I'm convinced. I think the proposed patch (funny, I wrote it) is good as a first measure to take matters into the driver's hands and avoid any unintentional side effects from scsi constants.
Thanks,
Boaz
* [PATCH] ieee1394: sbp2: stricter dma_sync 2008-08-05 15:28 ` Boaz Harrosh @ 2008-08-08 11:59 ` Stefan Richter 2008-08-08 12:00 ` [PATCH] ieee1394: sbp2: avoid DMA bounce buffers for ORBs Stefan Richter 2008-08-09 18:13 ` [PATCH update] ieee1394: sbp2: stricter dma_sync Stefan Richter 0 siblings, 2 replies; 14+ messages in thread From: Stefan Richter @ 2008-08-08 11:59 UTC (permalink / raw) To: Boaz Harrosh; +Cc: James Bottomley, linux1394-devel, linux-scsi On 5 Aug, Boaz Harrosh wrote: > Stefan Richter wrote: >> Boaz Harrosh wrote: >>> From re-inspecting the code I can see that the struct sbp2_command_orb >>> might have problems with none DMA coherent ARCHs. >> >> Hmm. This would be a bug. But from what I see while scrolling through >> fw-sbp2 and through sbp2, they both do it right. Well, to be very >> strict, sbp2 should dma_sync_single_for_cpu() at a slightly different >> place than it does now. I shall patch this. >> > > Yes I agree with you. Keep me posted I'll review the code Two dma_sync_single_for_cpu() were called in the wrong place. Luckily they were merely for DMA_TO_DEVICE, hence nobody noticed. Also reorder the matching dma_sync_single_for_device() a little bit so that they reside in the same functions as their counterparts. This also avoids syncing the s/g table for requests which don't use it. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> --- drivers/ieee1394/sbp2.c | 33 ++++++++++++++++----------------- 1 file changed, 16 insertions(+), 17 deletions(-) On a related note, sbp2 lacks error checks for the DMA mapping when the ORB pool is created and when the request buffer is mapped. I leave that as a ToDo for sometime later. 
Index: linux-2.6.27-rc2/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.27-rc2.orig/drivers/ieee1394/sbp2.c
+++ linux-2.6.27-rc2/drivers/ieee1394/sbp2.c
@@ -1522,6 +1522,10 @@ static void sbp2_prep_command_orb_sg(str
 	orb->misc |= ORB_SET_PAGE_TABLE_PRESENT(0x1);
 	orb->data_descriptor_lo = cmd->sge_dma;
 
+	dma_sync_single_for_cpu(hi->host->device.parent, cmd->sge_dma,
+				sizeof(cmd->scatter_gather_element),
+				DMA_TO_DEVICE);
+
 	/* loop through and fill out our SBP-2 page tables
 	 * (and split up anything too large) */
 	for (i = 0, sg_count = 0; i < count; i++, sg = sg_next(sg)) {
@@ -1548,6 +1552,11 @@ static void sbp2_prep_command_orb_sg(str
 		sbp2util_cpu_to_be32_buffer(sg_element,
 				(sizeof(struct sbp2_unrestricted_page_table))
 				* sg_count);
+
+		dma_sync_single_for_device(hi->host->device.parent,
+					   cmd->sge_dma,
+					   sizeof(cmd->scatter_gather_element),
+					   DMA_TO_DEVICE);
 	}
 }
@@ -1555,12 +1564,14 @@ static void sbp2_create_command_orb(stru
 				    struct sbp2_command_info *cmd,
 				    struct scsi_cmnd *SCpnt)
 {
-	struct sbp2_fwhost_info *hi = lu->hi;
+	struct device *dev = lu->hi->host->device.parent;
 	struct sbp2_command_orb *orb = &cmd->command_orb;
 	u32 orb_direction;
 	unsigned int scsi_request_bufflen = scsi_bufflen(SCpnt);
 	enum dma_data_direction dma_dir = SCpnt->sc_data_direction;
 
+	dma_sync_single_for_cpu(dev, cmd->command_orb_dma,
+			sizeof(struct sbp2_command_orb), DMA_TO_DEVICE);
 	/*
 	 * Set-up our command ORB.
 	 *
@@ -1592,7 +1603,7 @@ static void sbp2_create_command_orb(stru
 		orb->data_descriptor_lo = 0x0;
 		orb->misc |= ORB_SET_DIRECTION(1);
 	} else
-		sbp2_prep_command_orb_sg(orb, hi, cmd, scsi_sg_count(SCpnt),
+		sbp2_prep_command_orb_sg(orb, lu->hi, cmd, scsi_sg_count(SCpnt),
 					 scsi_sglist(SCpnt), orb_direction,
 					 dma_dir);
@@ -1600,6 +1611,9 @@ static void sbp2_create_command_orb(stru
 	memset(orb->cdb, 0, sizeof(orb->cdb));
 	memcpy(orb->cdb, SCpnt->cmnd, SCpnt->cmd_len);
+
+	dma_sync_single_for_device(dev, cmd->command_orb_dma,
+			sizeof(struct sbp2_command_orb), DMA_TO_DEVICE);
 }
 
 static void sbp2_link_orb_command(struct sbp2_lu *lu,
@@ -1613,14 +1627,6 @@ static void sbp2_link_orb_command(struct
 	size_t length;
 	unsigned long flags;
 
-	dma_sync_single_for_device(hi->host->device.parent,
-				   cmd->command_orb_dma,
-				   sizeof(struct sbp2_command_orb),
-				   DMA_TO_DEVICE);
-	dma_sync_single_for_device(hi->host->device.parent, cmd->sge_dma,
-				   sizeof(cmd->scatter_gather_element),
-				   DMA_TO_DEVICE);
-
 	/* check to see if there are any previous orbs to use */
 	spin_lock_irqsave(&lu->cmd_orb_lock, flags);
 	last_orb = lu->last_orb;
@@ -1778,13 +1784,6 @@ static int sbp2_handle_status_write(stru
 	else
 		cmd = sbp2util_find_command_for_orb(lu, sb->ORB_offset_lo);
 	if (cmd) {
-		dma_sync_single_for_cpu(hi->host->device.parent,
-					cmd->command_orb_dma,
-					sizeof(struct sbp2_command_orb),
-					DMA_TO_DEVICE);
-		dma_sync_single_for_cpu(hi->host->device.parent, cmd->sge_dma,
-					sizeof(cmd->scatter_gather_element),
-					DMA_TO_DEVICE);
 		/* Grab SCSI command pointers and check status. */
 		/*
 		 * FIXME: If the src field in the status is 1, the ORB DMA must

--
Stefan Richter
-=====-==--- =--- -=---
http://arcgraph.de/sr/
* [PATCH] ieee1394: sbp2: avoid DMA bounce buffers for ORBs 2008-08-08 11:59 ` [PATCH] ieee1394: sbp2: stricter dma_sync Stefan Richter @ 2008-08-08 12:00 ` Stefan Richter 2008-08-08 17:54 ` Stefan Richter 0 siblings, 1 reply; 14+ messages in thread
From: Stefan Richter @ 2008-08-08 12:00 UTC (permalink / raw)
To: linux1394-devel; +Cc: James Bottomley, linux-scsi, Boaz Harrosh

The command ORB pool is used frequently during the whole lifetime of an SBP-2 logical unit. It doesn't make sense to let it live outside the 32bit DMA zone.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
 drivers/ieee1394/sbp2.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.27-rc2/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.27-rc2.orig/drivers/ieee1394/sbp2.c
+++ linux-2.6.27-rc2/drivers/ieee1394/sbp2.c
@@ -531,7 +531,7 @@ static int sbp2util_create_command_orb_p
 	int i, orbs = sbp2_serialize_io ? 2 : SBP2_MAX_CMDS;
 
 	for (i = 0; i < orbs; i++) {
-		cmd = kzalloc(sizeof(*cmd), GFP_KERNEL);
+		cmd = kzalloc(sizeof(*cmd), GFP_KERNEL | GFP_DMA32);
 		if (!cmd)
 			return -ENOMEM;
 		cmd->command_orb_dma = dma_map_single(hi->host->device.parent,

--
Stefan Richter
-=====-==--- =--- -=---
http://arcgraph.de/sr/
* Re: [PATCH] ieee1394: sbp2: avoid DMA bounce buffers for ORBs 2008-08-08 12:00 ` [PATCH] ieee1394: sbp2: avoid DMA bounce buffers for ORBs Stefan Richter @ 2008-08-08 17:54 ` Stefan Richter 0 siblings, 0 replies; 14+ messages in thread
From: Stefan Richter @ 2008-08-08 17:54 UTC (permalink / raw)
To: linux1394-devel; +Cc: Boaz Harrosh, James Bottomley, linux-scsi

On 8 Aug, Stefan Richter wrote:
> --- linux-2.6.27-rc2.orig/drivers/ieee1394/sbp2.c
> +++ linux-2.6.27-rc2/drivers/ieee1394/sbp2.c
> @@ -531,7 +531,7 @@ static int sbp2util_create_command_orb_p
>  	int i, orbs = sbp2_serialize_io ? 2 : SBP2_MAX_CMDS;
>  
>  	for (i = 0; i < orbs; i++) {
> -		cmd = kzalloc(sizeof(*cmd), GFP_KERNEL);
> +		cmd = kzalloc(sizeof(*cmd), GFP_KERNEL | GFP_DMA32);
>  		if (!cmd)
>  			return -ENOMEM;
>  		cmd->command_orb_dma = dma_map_single(hi->host->device.parent,
>
This patch is bogus. I cannot use GFP_DMA32 in slab allocations.

--
Stefan Richter
-=====-==--- =--- -=---
http://arcgraph.de/sr/
* [PATCH update] ieee1394: sbp2: stricter dma_sync
  2008-08-08 11:59 ` [PATCH] ieee1394: sbp2: stricter dma_sync Stefan Richter
  2008-08-08 12:00 ` [PATCH] ieee1394: sbp2: avoid DMA bounce buffers for ORBs Stefan Richter
@ 2008-08-09 18:13 ` Stefan Richter
  2008-08-09 18:16 ` [PATCH] ieee1394: sbp2: check for DMA mapping failures Stefan Richter
  1 sibling, 1 reply; 14+ messages in thread
From: Stefan Richter @ 2008-08-09 18:13 UTC (permalink / raw)
To: linux1394-devel; +Cc: linux-scsi, Boaz Harrosh

Two dma_sync_single_for_cpu() were called in the wrong place.  Luckily
they were merely for DMA_TO_DEVICE, hence nobody noticed.

Also reorder the matching dma_sync_single_for_device() a little bit so
that they reside in the same functions as their counterparts.  This also
avoids syncing the s/g table for requests which don't use it.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
Update:
  - I forgot to remove four calls of dma_sync_single_for_cpu.
  - Call the pointer to the controller dmadev instead of dev to avoid
    confusion with the logical unit's device pointer.

 drivers/ieee1394/sbp2.c |   50 ++++++++++++----------------------------
 1 file changed, 16 insertions(+), 34 deletions(-)

Index: linux-2.6.27-rc2/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.27-rc2.orig/drivers/ieee1394/sbp2.c
+++ linux-2.6.27-rc2/drivers/ieee1394/sbp2.c
@@ -1522,6 +1522,10 @@ static void sbp2_prep_command_orb_sg(str
 		orb->misc |= ORB_SET_PAGE_TABLE_PRESENT(0x1);
 		orb->data_descriptor_lo = cmd->sge_dma;
 
+		dma_sync_single_for_cpu(hi->host->device.parent, cmd->sge_dma,
+					sizeof(cmd->scatter_gather_element),
+					DMA_TO_DEVICE);
+
 		/* loop through and fill out our SBP-2 page tables
 		 * (and split up anything too large) */
 		for (i = 0, sg_count = 0; i < count; i++, sg = sg_next(sg)) {
@@ -1548,6 +1552,11 @@ static void sbp2_prep_command_orb_sg(str
 		sbp2util_cpu_to_be32_buffer(sg_element,
 				(sizeof(struct sbp2_unrestricted_page_table)) *
 				sg_count);
+
+		dma_sync_single_for_device(hi->host->device.parent,
+					   cmd->sge_dma,
+					   sizeof(cmd->scatter_gather_element),
+					   DMA_TO_DEVICE);
 	}
 }
@@ -1555,12 +1564,14 @@ static void sbp2_create_command_orb(stru
 				    struct sbp2_command_info *cmd,
 				    struct scsi_cmnd *SCpnt)
 {
-	struct sbp2_fwhost_info *hi = lu->hi;
+	struct device *dmadev = lu->hi->host->device.parent;
 	struct sbp2_command_orb *orb = &cmd->command_orb;
 	u32 orb_direction;
 	unsigned int scsi_request_bufflen = scsi_bufflen(SCpnt);
 	enum dma_data_direction dma_dir = SCpnt->sc_data_direction;
 
+	dma_sync_single_for_cpu(dmadev, cmd->command_orb_dma,
+				sizeof(struct sbp2_command_orb), DMA_TO_DEVICE);
 	/*
 	 * Set-up our command ORB.
 	 *
@@ -1592,7 +1603,7 @@ static void sbp2_create_command_orb(stru
 		orb->data_descriptor_lo = 0x0;
 		orb->misc |= ORB_SET_DIRECTION(1);
 	} else
-		sbp2_prep_command_orb_sg(orb, hi, cmd, scsi_sg_count(SCpnt),
+		sbp2_prep_command_orb_sg(orb, lu->hi, cmd, scsi_sg_count(SCpnt),
 					 scsi_sglist(SCpnt),
 					 orb_direction, dma_dir);
@@ -1600,6 +1611,9 @@ static void sbp2_create_command_orb(stru
 	memset(orb->cdb, 0, sizeof(orb->cdb));
 	memcpy(orb->cdb, SCpnt->cmnd, SCpnt->cmd_len);
+
+	dma_sync_single_for_device(dmadev, cmd->command_orb_dma,
+				   sizeof(struct sbp2_command_orb), DMA_TO_DEVICE);
 }
 
 static void sbp2_link_orb_command(struct sbp2_lu *lu,
@@ -1613,14 +1627,6 @@ static void sbp2_link_orb_command(struct
 	size_t length;
 	unsigned long flags;
 
-	dma_sync_single_for_device(hi->host->device.parent,
-				   cmd->command_orb_dma,
-				   sizeof(struct sbp2_command_orb),
-				   DMA_TO_DEVICE);
-	dma_sync_single_for_device(hi->host->device.parent, cmd->sge_dma,
-				   sizeof(cmd->scatter_gather_element),
-				   DMA_TO_DEVICE);
-
 	/* check to see if there are any previous orbs to use */
 	spin_lock_irqsave(&lu->cmd_orb_lock, flags);
 	last_orb = lu->last_orb;
@@ -1778,13 +1784,6 @@ static int sbp2_handle_status_write(stru
 	else
 		cmd = sbp2util_find_command_for_orb(lu, sb->ORB_offset_lo);
 	if (cmd) {
-		dma_sync_single_for_cpu(hi->host->device.parent,
-					cmd->command_orb_dma,
-					sizeof(struct sbp2_command_orb),
-					DMA_TO_DEVICE);
-		dma_sync_single_for_cpu(hi->host->device.parent, cmd->sge_dma,
-					sizeof(cmd->scatter_gather_element),
-					DMA_TO_DEVICE);
 		/* Grab SCSI command pointers and check status. */
 		/*
 		 * FIXME: If the src field in the status is 1, the ORB DMA must
@@ -1901,7 +1900,6 @@ done:
 static void sbp2scsi_complete_all_commands(struct sbp2_lu *lu, u32 status)
 {
-	struct sbp2_fwhost_info *hi = lu->hi;
 	struct list_head *lh;
 	struct sbp2_command_info *cmd;
 	unsigned long flags;
@@ -1910,13 +1908,6 @@ static void sbp2scsi_complete_all_comman
 	while (!list_empty(&lu->cmd_orb_inuse)) {
 		lh = lu->cmd_orb_inuse.next;
 		cmd = list_entry(lh, struct sbp2_command_info, list);
-		dma_sync_single_for_cpu(hi->host->device.parent,
-					cmd->command_orb_dma,
-					sizeof(struct sbp2_command_orb),
-					DMA_TO_DEVICE);
-		dma_sync_single_for_cpu(hi->host->device.parent, cmd->sge_dma,
-					sizeof(cmd->scatter_gather_element),
-					DMA_TO_DEVICE);
 		sbp2util_mark_command_completed(lu, cmd);
 		if (cmd->Current_SCpnt) {
 			cmd->Current_SCpnt->result = status << 16;
@@ -2038,7 +2029,6 @@ static void sbp2scsi_slave_destroy(struc
 static int sbp2scsi_abort(struct scsi_cmnd *SCpnt)
 {
 	struct sbp2_lu *lu = (struct sbp2_lu *)SCpnt->device->host->hostdata[0];
-	struct sbp2_fwhost_info *hi = lu->hi;
 	struct sbp2_command_info *cmd;
 	unsigned long flags;
@@ -2052,14 +2042,6 @@ static int sbp2scsi_cm
 	spin_lock_irqsave(&lu->cmd_orb_lock, flags);
 	cmd = sbp2util_find_command_for_SCpnt(lu, SCpnt);
 	if (cmd) {
-		dma_sync_single_for_cpu(hi->host->device.parent,
-					cmd->command_orb_dma,
-					sizeof(struct sbp2_command_orb),
-					DMA_TO_DEVICE);
-		dma_sync_single_for_cpu(hi->host->device.parent,
-					cmd->sge_dma,
-					sizeof(cmd->scatter_gather_element),
-					DMA_TO_DEVICE);
 		sbp2util_mark_command_completed(lu, cmd);
 		if (cmd->Current_SCpnt) {
 			cmd->Current_SCpnt->result = DID_ABORT << 16;
-- 
Stefan Richter
-=====-==--- =--- -=--=
http://arcgraph.de/sr/

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
* [PATCH] ieee1394: sbp2: check for DMA mapping failures
  2008-08-09 18:13 ` [PATCH update] ieee1394: sbp2: stricter dma_sync Stefan Richter
@ 2008-08-09 18:16 ` Stefan Richter
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Richter @ 2008-08-09 18:16 UTC (permalink / raw)
To: linux1394-devel; +Cc: linux-scsi, Boaz Harrosh

On 8 Aug, Stefan Richter wrote:
> On a related note, sbp2 lacks error checks for the DMA mapping when the
> ORB pool is created and when the request buffer is mapped.  I leave that
> as a ToDo for sometime later.

From: Stefan Richter <stefanr@s5r6.in-berlin.de>
Subject: ieee1394: sbp2: check for DMA mapping failures

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
---
 drivers/ieee1394/sbp2.c |   94 +++++++++++++++++++++++++---------------
 1 file changed, 59 insertions(+), 35 deletions(-)

Index: linux-2.6.27-rc2/drivers/ieee1394/sbp2.c
===================================================================
--- linux-2.6.27-rc2.orig/drivers/ieee1394/sbp2.c
+++ linux-2.6.27-rc2/drivers/ieee1394/sbp2.c
@@ -526,26 +526,41 @@ static void sbp2util_write_doorbell(stru
 static int sbp2util_create_command_orb_pool(struct sbp2_lu *lu)
 {
-	struct sbp2_fwhost_info *hi = lu->hi;
 	struct sbp2_command_info *cmd;
+	struct device *dmadev = lu->hi->host->device.parent;
 	int i, orbs = sbp2_serialize_io ? 2 : SBP2_MAX_CMDS;
 
 	for (i = 0; i < orbs; i++) {
 		cmd = kzalloc(sizeof(*cmd), GFP_KERNEL);
 		if (!cmd)
-			return -ENOMEM;
-		cmd->command_orb_dma = dma_map_single(hi->host->device.parent,
-					&cmd->command_orb,
-					sizeof(struct sbp2_command_orb),
-					DMA_TO_DEVICE);
-		cmd->sge_dma = dma_map_single(hi->host->device.parent,
-					&cmd->scatter_gather_element,
-					sizeof(cmd->scatter_gather_element),
-					DMA_TO_DEVICE);
+			goto failed_alloc;
+
+		cmd->command_orb_dma =
+			dma_map_single(dmadev, &cmd->command_orb,
+				       sizeof(struct sbp2_command_orb),
+				       DMA_TO_DEVICE);
+		if (dma_mapping_error(dmadev, cmd->command_orb_dma))
+			goto failed_orb;
+
+		cmd->sge_dma =
+			dma_map_single(dmadev, &cmd->scatter_gather_element,
+				       sizeof(cmd->scatter_gather_element),
+				       DMA_TO_DEVICE);
+		if (dma_mapping_error(dmadev, cmd->sge_dma))
+			goto failed_sge;
+
 		INIT_LIST_HEAD(&cmd->list);
 		list_add_tail(&cmd->list, &lu->cmd_orb_completed);
 	}
 	return 0;
+
+failed_sge:
+	dma_unmap_single(dmadev, cmd->command_orb_dma,
+			 sizeof(struct sbp2_command_orb), DMA_TO_DEVICE);
+failed_orb:
+	kfree(cmd);
+failed_alloc:
+	return -ENOMEM;
 }
 
 static void sbp2util_remove_command_orb_pool(struct sbp2_lu *lu,
@@ -1483,14 +1498,16 @@ static int sbp2_agent_reset(struct sbp2_
 	return 0;
 }
 
-static void sbp2_prep_command_orb_sg(struct sbp2_command_orb *orb,
-				     struct sbp2_fwhost_info *hi,
-				     struct sbp2_command_info *cmd,
-				     unsigned int scsi_use_sg,
-				     struct scatterlist *sg,
-				     u32 orb_direction,
-				     enum dma_data_direction dma_dir)
+static int sbp2_prep_command_orb_sg(struct sbp2_command_orb *orb,
+				    struct sbp2_fwhost_info *hi,
+				    struct sbp2_command_info *cmd,
+				    unsigned int scsi_use_sg,
+				    struct scatterlist *sg,
+				    u32 orb_direction,
+				    enum dma_data_direction dma_dir)
 {
+	struct device *dmadev = hi->host->device.parent;
+
 	cmd->dma_dir = dma_dir;
 	orb->data_descriptor_hi = ORB_SET_NODE_ID(hi->host->node_id);
 	orb->misc |= ORB_SET_DIRECTION(orb_direction);
@@ -1500,9 +1517,12 @@ static void sbp2_prep_command_orb_sg(str
 		cmd->dma_size = sg->length;
 		cmd->dma_type = CMD_DMA_PAGE;
-		cmd->cmd_dma = dma_map_page(hi->host->device.parent,
-					    sg_page(sg), sg->offset,
+		cmd->cmd_dma = dma_map_page(dmadev, sg_page(sg), sg->offset,
 					    cmd->dma_size, cmd->dma_dir);
+		if (dma_mapping_error(dmadev, cmd->cmd_dma)) {
+			cmd->cmd_dma = 0;
+			return -ENOMEM;
+		}
 
 		orb->data_descriptor_lo = cmd->cmd_dma;
 		orb->misc |= ORB_SET_DATA_SIZE(cmd->dma_size);
@@ -1512,8 +1532,7 @@ static void sbp2_prep_command_orb_sg(str
 			&cmd->scatter_gather_element[0];
 		u32 sg_count, sg_len;
 		dma_addr_t sg_addr;
-		int i, count = dma_map_sg(hi->host->device.parent, sg,
-					  scsi_use_sg, dma_dir);
+		int i, count = dma_map_sg(dmadev, sg, scsi_use_sg, dma_dir);
 
 		cmd->dma_size = scsi_use_sg;
 		cmd->sge_buffer = sg;
@@ -1522,7 +1541,7 @@ static void sbp2_prep_command_orb_sg(str
 		orb->misc |= ORB_SET_PAGE_TABLE_PRESENT(0x1);
 		orb->data_descriptor_lo = cmd->sge_dma;
 
-		dma_sync_single_for_cpu(hi->host->device.parent, cmd->sge_dma,
+		dma_sync_single_for_cpu(dmadev, cmd->sge_dma,
 					sizeof(cmd->scatter_gather_element),
 					DMA_TO_DEVICE);
 
@@ -1553,22 +1572,23 @@ static void sbp2_prep_command_orb_sg(str
 				(sizeof(struct sbp2_unrestricted_page_table)) *
 				sg_count);
 
-		dma_sync_single_for_device(hi->host->device.parent,
-					   cmd->sge_dma,
+		dma_sync_single_for_device(dmadev, cmd->sge_dma,
 					   sizeof(cmd->scatter_gather_element),
 					   DMA_TO_DEVICE);
 	}
+	return 0;
 }
 
-static void sbp2_create_command_orb(struct sbp2_lu *lu,
-				    struct sbp2_command_info *cmd,
-				    struct scsi_cmnd *SCpnt)
+static int sbp2_create_command_orb(struct sbp2_lu *lu,
+				   struct sbp2_command_info *cmd,
+				   struct scsi_cmnd *SCpnt)
 {
 	struct device *dmadev = lu->hi->host->device.parent;
 	struct sbp2_command_orb *orb = &cmd->command_orb;
-	u32 orb_direction;
 	unsigned int scsi_request_bufflen = scsi_bufflen(SCpnt);
 	enum dma_data_direction dma_dir = SCpnt->sc_data_direction;
+	u32 orb_direction;
+	int ret;
 
 	dma_sync_single_for_cpu(dmadev, cmd->command_orb_dma,
 				sizeof(struct sbp2_command_orb), DMA_TO_DEVICE);
@@ -1602,11 +1622,13 @@ static void sbp2_create_command_orb(stru
 		orb->data_descriptor_hi = 0x0;
 		orb->data_descriptor_lo = 0x0;
 		orb->misc |= ORB_SET_DIRECTION(1);
-	} else
-		sbp2_prep_command_orb_sg(orb, lu->hi, cmd, scsi_sg_count(SCpnt),
-					 scsi_sglist(SCpnt),
-					 orb_direction, dma_dir);
-
+		ret = 0;
+	} else {
+		ret = sbp2_prep_command_orb_sg(orb, lu->hi, cmd,
+					       scsi_sg_count(SCpnt),
+					       scsi_sglist(SCpnt),
+					       orb_direction, dma_dir);
+	}
 	sbp2util_cpu_to_be32_buffer(orb, sizeof(*orb));
 
 	memset(orb->cdb, 0, sizeof(orb->cdb));
@@ -1614,6 +1636,7 @@ static void sbp2_create_command_orb(stru
 
 	dma_sync_single_for_device(dmadev, cmd->command_orb_dma,
 				   sizeof(struct sbp2_command_orb), DMA_TO_DEVICE);
+	return ret;
 }
 
 static void sbp2_link_orb_command(struct sbp2_lu *lu,
@@ -1694,9 +1717,10 @@ static int sbp2_send_command(struct sbp2
 	if (!cmd)
 		return -EIO;
 
-	sbp2_create_command_orb(lu, cmd, SCpnt);
-	sbp2_link_orb_command(lu, cmd);
+	if (sbp2_create_command_orb(lu, cmd, SCpnt))
+		return -ENOMEM;
+	sbp2_link_orb_command(lu, cmd);
 	return 0;
 }
-- 
Stefan Richter
-=====-==--- =--- -=--=
http://arcgraph.de/sr/
* Re: [patch resend] firewire: ieee1394: Move away from SG_ALL
  2008-08-05 12:22 ` Boaz Harrosh
  2008-08-05 15:01 ` Stefan Richter
@ 2008-08-05 19:57 ` Stefan Richter
  2008-08-06 9:10 ` Boaz Harrosh
  1 sibling, 1 reply; 14+ messages in thread
From: Stefan Richter @ 2008-08-05 19:57 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: James Bottomley, linux1394-devel, linux-scsi

Boaz Harrosh wrote:
> From my tests, the block layer will, very fast, send you chained commands.
> if you set it to 255 you'll see two chained fragments. In iscsi we have
> 4096 and it is very good for performance. However since you are pre-allocating
> the memory per can_queue commands, this can get big, and the memory overhead
> can be an impact on performance.

I did repeated read tests with dd now, with a 3.5" 750G SATA/ FireWire
800 disk and with a 2.5" 250G IDE/ FireWire 400 disk, each with either
FireWire stack.  I used dd with and without iflag=direct, just because I
don't have a clue how to capture the influence of .sg_tablesize best.

Tests with .sg_tablesize = 128, 255, and 5000 all yield the same
transfer rates.  So I guess I simply leave it at the current SG_ALL for
now.
-- 
Stefan Richter
-=====-==--- =--- --=-=
http://arcgraph.de/sr/
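A minimal form of the dd read test described here might look like the following sketch. The device path, block size, and count are assumptions; on a real disk, pass "iflag=direct" for the O_DIRECT variant, which bypasses the page cache:

```shell
# Sequential read throughput via dd; the last stderr line carries the rate.
read_test() {
	dev=$1
	flags=$2
	dd if="$dev" of=/dev/null bs=1M count=64 $flags 2>&1 | tail -n 1
}

# Buffered read from a device that always exists, as a smoke test.
# On a real box: read_test /dev/sdb "" and read_test /dev/sdb iflag=direct.
read_test /dev/zero ""
```

Note iflag=direct fails on /dev/zero on many systems, so the direct variant only makes sense against a real block device.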
* Re: [patch resend] firewire: ieee1394: Move away from SG_ALL
  2008-08-05 19:57 ` [patch resend] firewire: ieee1394: Move away from SG_ALL Stefan Richter
@ 2008-08-06 9:10 ` Boaz Harrosh
  2008-08-06 20:49 ` Stefan Richter
  0 siblings, 1 reply; 14+ messages in thread
From: Boaz Harrosh @ 2008-08-06 9:10 UTC (permalink / raw)
To: Stefan Richter; +Cc: James Bottomley, linux1394-devel, linux-scsi, Jens Axboe

Stefan Richter wrote:
> Boaz Harrosh wrote:
>> From my tests, the block layer will, very fast, send you chained commands.
>> if you set it to 255 you'll see two chained fragments. In iscsi we have
>> 4096 and it is very good for performance. However since you are pre-allocating
>> the memory per can_queue commands, this can get big, and the memory overhead
>> can be an impact on performance.
>
> I did repeated read tests with dd now, with a 3.5" 750G SATA/ FireWire
> 800 disk and with a 2.5" 250G IDE/ FireWire 400 disk, each with either
> FireWire stack.  I used dd with and without iflag=direct, just because I
> don't have a clue how to capture the influence of .sg_tablesize best.
>
> Tests with .sg_tablesize = 128, 255, and 5000 all yield the same
> transfer rates.  So I guess I simply leave it at the current SG_ALL for now.

You will need to enable sg chaining for your device.  I'm not sure where
the latest documentation is. (CC: Jens)

Here is the script I used in my tests:

#!/bin/bash
sdx=sdb

# mess with sglist chaining
cd /sys/block/$sdx/queue
echo 4096 > max_segments
cat max_hw_sectors_kb > max_sectors_kb
echo "max_hw_sectors_kb="$(cat max_hw_sectors_kb)
echo "max_sectors_kb="$(cat max_sectors_kb)
echo "max_segments="$(cat max_segments)

Try to view the content of max_segments before and after and see if it
changed.

Also, with dd you need a large block size; here is how I calculated it.
(Other people might have better tests.)  I did a write test, but a read
test should be better:

if=/dev/zero
of=/dev/sdb
outputfile=$1.txt
echo "Testing $1"

# send 32G in $1 sectrors at once
do_dd() {
	# blocks of one sector
	bs=512
	#memory page in blocks
	page=8
	#number of scatterlist elements in a transfer
	sgs=$1
	#calculate the bpt param
	bpt=$(($sgs*$page))
	#total blocks to transfer 32 Giga bytes
	count=64M

	echo $3: "bpt=$bpt"
	\time bash -c \
	"sg_dd blk_sgio=1 dio=1 if=$if of=$of bpt=$bpt bs=$bs count=$count 2>/dev/null" \
	2>> $2
}

echo "BEGIN RUN $1" >> $outputfile

echo "run with 128-sgilist"
echo "run with 128-sgilist" >> $outputfile
for i in {1..20}; do do_dd 128 $outputfile $i; done

echo "run with 2048-sgilist"
echo "run with 2048-sgilist" >> $outputfile
for i in {1..20}; do do_dd 2048 $outputfile $i; done

I think that in the second case, with 2048, if your device does not
support it, dd will fail. (if I recall correctly)

Boaz

PS: OK, I used sg_dd from sg-utils.  sg_read might be even better.
* Re: [patch resend] firewire: ieee1394: Move away from SG_ALL
  2008-08-06 9:10 ` Boaz Harrosh
@ 2008-08-06 20:49 ` Stefan Richter
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Richter @ 2008-08-06 20:49 UTC (permalink / raw)
To: Boaz Harrosh; +Cc: James Bottomley, linux1394-devel, linux-scsi, Jens Axboe

Boaz Harrosh wrote:
> You will need to enable sg chaining for your device.

Probably, because...

> I'm not sure where is the latest documentation. (CC: Jens)
>
> Here is the script I used in my tests:
>
> #!/bin/bash
> sdx=sdb
>
> # mess with sglist chaining
> cd /sys/block/$sdx/queue
> echo 4096 > max_segments
> cat max_hw_sectors_kb > max_sectors_kb
> echo "max_hw_sectors_kb="$(cat max_hw_sectors_kb)
> echo "max_sectors_kb="$(cat max_sectors_kb)
> echo "max_segments="$(cat max_segments)
>
> Try to view the content of max_segments before and after
> see if it changed.

...right now there is no max_segments attribute.

Anyway.  This already fully answers what my ideal table size limit is.
Thanks so far,
-- 
Stefan Richter
-=====-==--- =--- --==-
http://arcgraph.de/sr/
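A version of that sysfs probe which tolerates attributes a given kernel does not export (as happens with max_segments here) could be sketched like this; the device name is an assumption:

```shell
# Print the queue limits relevant to s/g table sizing, skipping
# attributes that this kernel version does not expose in sysfs.
print_queue_limits() {
	q=/sys/block/$1/queue
	for attr in max_segments max_sectors_kb max_hw_sectors_kb; do
		if [ -r "$q/$attr" ]; then
			echo "$attr=$(cat "$q/$attr")"
		else
			echo "$attr: not exported"
		fi
	done
}

print_queue_limits sda   # adjust the device name to match your disk
```

Probing for the attribute first avoids the silent failure seen above, where "echo 4096 > max_segments" has nothing to write to.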
end of thread, other threads:[~2008-08-09 18:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-08-04 18:08 [patch resend] firewire: ieee1394: Move away from SG_ALL Stefan Richter
2008-08-04 19:21 ` James Bottomley
2008-08-04 19:58   ` Stefan Richter
2008-08-05 12:22     ` Boaz Harrosh
2008-08-05 15:01       ` Stefan Richter
2008-08-05 15:28         ` Boaz Harrosh
2008-08-08 11:59           ` [PATCH] ieee1394: sbp2: stricter dma_sync Stefan Richter
2008-08-08 12:00             ` [PATCH] ieee1394: sbp2: avoid DMA bounce buffers for ORBs Stefan Richter
2008-08-08 17:54               ` Stefan Richter
2008-08-09 18:13             ` [PATCH update] ieee1394: sbp2: stricter dma_sync Stefan Richter
2008-08-09 18:16               ` [PATCH] ieee1394: sbp2: check for DMA mapping failures Stefan Richter
2008-08-05 19:57   ` [patch resend] firewire: ieee1394: Move away from SG_ALL Stefan Richter
2008-08-06 9:10     ` Boaz Harrosh
2008-08-06 20:49       ` Stefan Richter