* IOMMU and scatterlist limits @ 2005-11-17 8:34 Pierre Ossman 2005-11-17 8:54 ` Jens Axboe 0 siblings, 1 reply; 13+ messages in thread From: Pierre Ossman @ 2005-11-17 8:34 UTC (permalink / raw) To: LKML I'm writing a PCI driver for the first time and I'm trying to wrap my head around the DMA mappings in that world. I've done an ISA driver which uses DMA, but this is a bit more complex and the documentation doesn't explain everything. What I'm particularly confused about is how the IOMMU should be handled with regard to scatterlist limits. My hardware cannot handle scatterlists, only a single DMA address. But from what I understand the IOMMU can be very similar to a normal "CPU" MMU. So it should be able to aggregate pages that are non-contiguous in physical memory into one single block in bus memory. Now the question is what do I set nr_phys_segments and nr_hw_segments to? Of course the code also needs to handle systems without an IOMMU. Rgds Pierre ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-11-17 8:34 IOMMU and scatterlist limits Pierre Ossman @ 2005-11-17 8:54 ` Jens Axboe 2005-11-17 9:02 ` Pierre Ossman 0 siblings, 1 reply; 13+ messages in thread From: Jens Axboe @ 2005-11-17 8:54 UTC (permalink / raw) To: Pierre Ossman; +Cc: LKML On Thu, Nov 17 2005, Pierre Ossman wrote: > I'm writing a PCI driver for the first time and I'm trying to wrap my > head around the DMA mappings in that world. I've done an ISA driver which > uses DMA, but this is a bit more complex and the documentation doesn't > explain everything. > > What I'm particularly confused about is how the IOMMU should be handled > with regard to scatterlist limits. My hardware cannot handle > scatterlists, only a single DMA address. But from what I understand the What kind of hardware can't handle scatter gather? > IOMMU can be very similar to a normal "CPU" MMU. So it should be able to > aggregate pages that are non-contiguous in physical memory into one > single block in bus memory. Now the question is what do I set > nr_phys_segments and nr_hw_segments to? Of course the code also needs to > handle systems without an IOMMU. nr_hw_segments is how many segments your driver will see once DMA mapping is complete (and the IOMMU has done its tricks), so you want to set that to 1 if the hardware can't handle an sg list. That'll work regardless of whether there's an IOMMU there or not. Note that the mere existence of an IOMMU will _not_ save your performance on this hardware, you need one with good virtual merging support to get larger transfers. -- Jens Axboe ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-11-17 8:54 ` Jens Axboe @ 2005-11-17 9:02 ` Pierre Ossman 2005-11-17 9:13 ` Jens Axboe 0 siblings, 1 reply; 13+ messages in thread From: Pierre Ossman @ 2005-11-17 9:02 UTC (permalink / raw) To: Jens Axboe; +Cc: LKML Jens Axboe wrote: > On Thu, Nov 17 2005, Pierre Ossman wrote: > >> I'm writing a PCI driver for the first time and I'm trying to wrap my >> head around the DMA mappings in that world. I've done a ISA driver which >> uses DMA, but this is a bit more complex and the documentation doesn't >> explain everything. >> >> What I'm particularly confused about is how the IOMMU should be handled >> with regard to scatterlist limits. My hardware cannot handle >> scatterlists, only a single DMA address. But from what I understand the >> > > What kind of hardware can't handle scatter gather? > > I'd figure most hardware? DMA is handled by writing the start address into one register and a size into another. Being able to set several addr/len pairs seems highly advanced to me. :) >> IOMMU can be very similar to a normal "CPU" MMU. So it should be able to >> aggregate pages that are non-continuous in physical memory into one >> single block in bus memory. Now the question is what do I set >> nr_phys_segments and nr_hw_segments to? Of course the code also needs to >> handle systems without an IOMMU. >> > > nr_hw_segments is how many segments your driver will see once dma > mapping is complete (and the IOMMU has done its tricks), so you want to > set that to 1 if the hardware can't handle an sg list. > > And nr_phys_segments? I haven't really grasped the relation between the two. Is this the number of segments handed to the IOMMU? If so, then I would need to know how many it can handle (and set it to one if there is no IOMMU). > That'll work irregardless of whether there's an IOMMU there or not. 
Note > that the mere existence of an IOMMU will _not_ save your performance on > this hardware, you need one with good virtual merging support to get > larger transfers. > > I thought the IOMMU could do the merging through its mapping tables? The way I understood it, sg support in the device was just to avoid wasting resources on the IOMMU by using fewer mappings (which would assume the IOMMU is segment based, not page based). Rgds Pierre ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-11-17 9:02 ` Pierre Ossman @ 2005-11-17 9:13 ` Jens Axboe 2005-11-17 9:27 ` Pierre Ossman 0 siblings, 1 reply; 13+ messages in thread From: Jens Axboe @ 2005-11-17 9:13 UTC (permalink / raw) To: Pierre Ossman; +Cc: LKML On Thu, Nov 17 2005, Pierre Ossman wrote: > Jens Axboe wrote: > > On Thu, Nov 17 2005, Pierre Ossman wrote: > > > >> I'm writing a PCI driver for the first time and I'm trying to wrap my > >> head around the DMA mappings in that world. I've done a ISA driver which > >> uses DMA, but this is a bit more complex and the documentation doesn't > >> explain everything. > >> > >> What I'm particularly confused about is how the IOMMU should be handled > >> with regard to scatterlist limits. My hardware cannot handle > >> scatterlists, only a single DMA address. But from what I understand the > >> > > > > What kind of hardware can't handle scatter gather? > > > > > > I'd figure most hardware? DMA is handled by writing the start address > into one register and a size into another. Being able to set several > addr/len pairs seems highly advanced to me. :) Must be a pretty nice rock you are living behind, since it's apparently kept you there for a long time :-) Sane hardware will accept an sg list directly. Are you sure you are reading the specifications for that hardware correctly? > >> IOMMU can be very similar to a normal "CPU" MMU. So it should be able to > >> aggregate pages that are non-continuous in physical memory into one > >> single block in bus memory. Now the question is what do I set > >> nr_phys_segments and nr_hw_segments to? Of course the code also needs to > >> handle systems without an IOMMU. > >> > > > > nr_hw_segments is how many segments your driver will see once dma > > mapping is complete (and the IOMMU has done its tricks), so you want to > > set that to 1 if the hardware can't handle an sg list. > > > > > > And nr_phys_segments? I haven't really grasped the relation between the > two. 
Is this the number of segments handed to the IOMMU? If so, then I > would need to know how many it can handle (and set it to one if there is > no IOMMU). nr_phys_segments is basically just to cap the segments somewhere, since the driver needs to store it before getting it dma mapped to a (perhaps) smaller number of segments. So yes, it's the number of 'real' segments before dma mapping. > > That'll work irregardless of whether there's an IOMMU there or not. Note > > that the mere existence of an IOMMU will _not_ save your performance on > > this hardware, you need one with good virtual merging support to get > > larger transfers. > > > > > > I thought the IOMMU could do the merging through its mapping tables? The > way I understood it, sg support in the device was just to avoid wasting > resources on the IOMMU by using fewer mappings (which would assume the > IOMMU is segment based, not page based). Depends on the IOMMU. Some IOMMUs just help you with address remapping for high addresses. The way I see it, with just 1 segment you need to be pretty damn picky with your hardware about what platform you use it on or risk losing 50% performance or so. -- Jens Axboe ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-11-17 9:13 ` Jens Axboe @ 2005-11-17 9:27 ` Pierre Ossman 2005-11-17 9:38 ` Jens Axboe 0 siblings, 1 reply; 13+ messages in thread From: Pierre Ossman @ 2005-11-17 9:27 UTC (permalink / raw) To: Jens Axboe; +Cc: LKML Jens Axboe wrote: > On Thu, Nov 17 2005, Pierre Ossman wrote: > >> Jens Axboe wrote: >> >>> >>> >>> What kind of hardware can't handle scatter gather? >>> >>> >>> >> I'd figure most hardware? DMA is handled by writing the start address >> into one register and a size into another. Being able to set several >> addr/len pairs seems highly advanced to me. :) >> > > Must be a pretty nice rock you are living behind, since it's apparently > kept you there for a long time :-) > > The driver support is simply too good in Linux so I haven't had the need for writing a PCI driver until now. ;) > Sane hardware will accept an sg list directly. Are you sure you are > reading the specifications for that hardware correctly? > > Specifications? Such luxury. This driver is based on googling and reverse engineering. Any requests for specifications have so far been put in the round filing cabinet. What I know is that I have the registers: * System address (32 bit) * Block size (16 bit) * Block count (16 bit) From what I've seen these are written to once. So I'm having a hard time believing these support more than one segment. >>> >>> >> And nr_phys_segments? I haven't really grasped the relation between the >> two. Is this the number of segments handed to the IOMMU? If so, then I >> would need to know how many it can handle (and set it to one if there is >> no IOMMU). >> > > nr_phys_segments is basically just to cap the segments somewhere, since > the driver needs to store it before getting it dma mapped to a (perhaps) > smaller number of segments. So yes, it's the number of 'real' segments > before dma mapping. > > So from a driver point of view, this is just a matter of memory usage? In that case, what is a good value? 
=) Since there is no guarantee this will be mapped down to one segment (that the hardware can accept), is it expected that the driver iterates over the entire list or can I mark only the first segment as completed and wait for the request to be reissued? (this is a MMC driver, which behaves like the block layer) >>> That'll work irregardless of whether there's an IOMMU there or not. Note >>> that the mere existence of an IOMMU will _not_ save your performance on >>> this hardware, you need one with good virtual merging support to get >>> larger transfers. >>> >>> >>> >> I thought the IOMMU could do the merging through its mapping tables? The >> way I understood it, sg support in the device was just to avoid wasting >> resources on the IOMMU by using fewer mappings (which would assume the >> IOMMU is segment based, not page based). >> > > Depends on the IOMMU. Some IOMMUs just help you with address remapping > for high addresses. The way I see it, with just 1 segment you need to be > pretty damn picky with your hardware about what platform you use it on > or risk losing 50% performance or so. > > Ok. Being a block device, the segments are usually rather large so the overhead of setting up many DMA transfers shouldn't be that terrible. Rgds Pierre ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-11-17 9:27 ` Pierre Ossman @ 2005-11-17 9:38 ` Jens Axboe 2005-11-17 9:49 ` Pierre Ossman 2005-12-18 22:41 ` Pierre Ossman 0 siblings, 2 replies; 13+ messages in thread From: Jens Axboe @ 2005-11-17 9:38 UTC (permalink / raw) To: Pierre Ossman; +Cc: LKML On Thu, Nov 17 2005, Pierre Ossman wrote: > Jens Axboe wrote: > > On Thu, Nov 17 2005, Pierre Ossman wrote: > > > >> Jens Axboe wrote: > >> > >>> > >>> > >>> What kind of hardware can't handle scatter gather? > >>> > >>> > >>> > >> I'd figure most hardware? DMA is handled by writing the start address > >> into one register and a size into another. Being able to set several > >> addr/len pairs seems highly advanced to me. :) > >> > > > > Must be a pretty nice rock you are living behind, since it's apparently > > kept you there for a long time :-) > > > > > > The driver support is simply too good in Linux so I haven't had the need > for writing a PCI driver until now. ;) ;-) > > Sane hardware will accept an sg list directly. Are you sure you are > > reading the specifications for that hardware correctly? > > > > > > Specifications? Such luxury. This driver is based on googling and > reverse engineering. Any requests for specifications have so far been > put in the round filing cabinet. > > What I know is that I have the registers: > > * System address (32 bit) > * Block size (16 bit) > * Block count (16 bit) Sounds like a pretty simple device, then. Any device engineered for any kind of at least half serious performance would accept more than just an address/length tuple. > From what I've seen these are written to once. So I'm having a hard time > believing these support more than one segment. > > >>> > >>> > >> And nr_phys_segments? I haven't really grasped the relation between the > >> two. Is this the number of segments handed to the IOMMU? If so, then I > >> would need to know how many it can handle (and set it to one if there is > >> no IOMMU). 
> >> > > > > nr_phys_segments is basically just to cap the segments somewhere, since > > the driver needs to store it before getting it dma mapped to a (perhaps) > > smaller number of segments. So yes, it's the number of 'real' segments > > before dma mapping. > > > > > > So from a driver point of view, this is just a matter of memory usage? > In that case, what is a good value? =) Yep. A good value depends on how big a transfer you can support anyways and how fast the device is. And how much you potentially gain by doing larger transfers as compared to small. The block layer default is 128 segments, but that's probably too big for you. Something like 16 should still give you at least 64kb transfers. > Since there is no guarantee this will be mapped down to one segment > (that the hardware can accept), is it expected that the driver iterates > over the entire list or can I mark only the first segment as completed > and wait for the request to be reissued? (this is a MMC driver, which > behaves like the block layer) Ah MMC, that explains a few things :-) It's quite legal (and possible) to partially handle a given request, you are not obliged to handle a request as a single unit. See how other block drivers have an end request handling function ala: void my_end_request(struct hw_struct *hw, struct request *rq, int nbytes, int uptodate) { ... if (!end_that_request_chunk(rq, uptodate, nbytes)) { blkdev_dequeue_request(rq); end_that_request_last(rq); } ... } elv_next_request() will keep giving you the same request until you have dequeued and ended it, so you don't have to keep track of the 'current' request. end_that_request_*() will make sure the request state is sane after each call as well, so you can treat the request as a new one every time. Doing partial requests is not harder than doing full requests. > >>> That'll work irregardless of whether there's an IOMMU there or not. 
Note > >>> that the mere existence of an IOMMU will _not_ save your performance on > >>> this hardware, you need one with good virtual merging support to get > >>> larger transfers. > >>> > >>> > >>> > >> I thought the IOMMU could do the merging through its mapping tables? The > >> way I understood it, sg support in the device was just to avoid wasting > >> resources on the IOMMU by using fewer mappings (which would assume the > >> IOMMU is segment based, not page based). > >> > > > > Depends on the IOMMU. Some IOMMUs just help you with address remapping > > for high addresses. The way I see it, with just 1 segment you need to be > > pretty damn picky with your hardware about what platform you use it on > > or risk losing 50% performance or so. > > > > > > Ok. Being a block device, the segments are usually rather large so the > overhead of setting up many DMA transfers shouldn't be that terrible. The segments will typically be page sized, so it could be worse. Whether it hurts performance a lot or not depends on what your command overhead is like. -- Jens Axboe ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-11-17 9:38 ` Jens Axboe @ 2005-11-17 9:49 ` Pierre Ossman 2005-11-17 12:02 ` Jens Axboe 2005-12-18 22:41 ` Pierre Ossman 1 sibling, 1 reply; 13+ messages in thread From: Pierre Ossman @ 2005-11-17 9:49 UTC (permalink / raw) To: Jens Axboe; +Cc: LKML Jens Axboe wrote: > On Thu, Nov 17 2005, Pierre Ossman wrote: > >> Ok. Being a block device, the segments are usually rather large so the >> overhead of setting up many DMA transfers shouldn't be that terrible. >> > > The segments will typically be paged size, so could be worse. It all > depends on what your command overhead is like whether it hurts > performance a lot or not. > > MMC overhead is a lot larger than sending new addr/len tuples to the hardware. So I suppose there is performance to be gained by iterating over the segments inside the driver. Thanks for clearing things up. Maybe someone could update DMA-mapping.txt with the things you've explained to me here *hint* ;) Rgds Pierre ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-11-17 9:49 ` Pierre Ossman @ 2005-11-17 12:02 ` Jens Axboe 0 siblings, 0 replies; 13+ messages in thread From: Jens Axboe @ 2005-11-17 12:02 UTC (permalink / raw) To: Pierre Ossman; +Cc: LKML On Thu, Nov 17 2005, Pierre Ossman wrote: > Thanks for clearing things up. Maybe someone could update > DMA-mapping.txt with the things you've explained to me here *hint* ;) Most of it is block driver specific, I doubt I added much in the way of actual DMA-mapping.txt :-) But yeah, it's not the first time I've been asked these questions. At least this time it was with lkml cc'ed, so I can point others at the thread! -- Jens Axboe ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-11-17 9:38 ` Jens Axboe 2005-11-17 9:49 ` Pierre Ossman @ 2005-12-18 22:41 ` Pierre Ossman 2005-12-20 11:10 ` Tejun Heo 1 sibling, 1 reply; 13+ messages in thread From: Pierre Ossman @ 2005-12-18 22:41 UTC (permalink / raw) To: Jens Axboe; +Cc: LKML Revisiting a dear old thread. :) After some initial tests, some more questions popped up. See below. Jens Axboe wrote: > On Thu, Nov 17 2005, Pierre Ossman wrote: > >> Since there is no guarantee this will be mapped down to one segment >> (that the hardware can accept), is it expected that the driver iterates >> over the entire list or can I mark only the first segment as completed >> and wait for the request to be reissued? (this is a MMC driver, which >> behaves like the block layer) >> > > Ah MMC, that explains a few things :-) > > It's quite legal (and possible) to partially handle a given request, you > are not obliged to handle a request as a single unit. See how other > block drivers have an end request handling function ala: > > After testing this it seems the block layer never gives me more than max_hw_segs segments. Is it being clever because I'm compiling for a system without an IOMMU? The hardware should (haven't properly tested this) be able to get new DMA addresses during a transfer. In essence scatter gather with some CPU support. Since I avoid MMC overhead this should give a nice performance boost. But this relies on the block layer giving me more than one segment. Do I need to lie in max_hw_segs to achieve this? Rgds Pierre ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-12-18 22:41 ` Pierre Ossman @ 2005-12-20 11:10 ` Tejun Heo 2005-12-20 11:36 ` Pierre Ossman 0 siblings, 1 reply; 13+ messages in thread From: Tejun Heo @ 2005-12-20 11:10 UTC (permalink / raw) To: Pierre Ossman; +Cc: Jens Axboe, LKML Pierre Ossman wrote: > Revisiting a dear old thread. :) > > After some initial tests, some more questions popped up. See below. > > Jens Axboe wrote: > >>On Thu, Nov 17 2005, Pierre Ossman wrote: >> >> >>>Since there is no guarantee this will be mapped down to one segment >>>(that the hardware can accept), is it expected that the driver iterates >>>over the entire list or can I mark only the first segment as completed >>>and wait for the request to be reissued? (this is a MMC driver, which >>>behaves like the block layer) >>> >> >>Ah MMC, that explains a few things :-) >> >>It's quite legal (and possible) to partially handle a given request, you >>are not obliged to handle a request as a single unit. See how other >>block drivers have an end request handling function ala: >> >> > > > After testing this it seems the block layer never gives me more than > max_hw_segs segments. Is it being clever because I'm compiling for a > system without an IOMMU? > > The hardware should (haven't properly tested this) be able to get new > DMA addresses during a transfer. In essence scatter gather with some CPU > support. Since I avoid MMC overhead this should give a nice performance > boost. But this relies on the block layer giving me more than one > segment. Do I need to lie in max_hw_segs to achieve this? > Hi, Pierre. max_phys_segments: the maximum number of segments in a request *before* DMA mapping max_hw_segments: the maximum number of segments in a request *after* DMA mapping (ie. after IOMMU merging) Those maximum numbers are for block layer. Block layer must not exceed above limits when it passes a request downward. 
As long as all entries in sg are processed, block layer doesn't care whether sg iteration is performed by the driver or hardware. So, if you're gonna perform sg by iterating in the driver, what numbers to report for max_phys_segments and max_hw_segments is entirely up to how many entries the driver can handle. Just report some nice number (64 or 128?) for both. Don't forget that the number of sg entries can be decreased after DMA-mapping on machines with IOMMU. IOW, the part which performs sg iteration gets to determine above limits. In your case, the driver is responsible for both iterations (pre and post DMA mapping), so all the limits are up to the driver. Hope it helped. -- tejun ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-12-20 11:10 ` Tejun Heo @ 2005-12-20 11:36 ` Pierre Ossman 2005-12-20 12:04 ` Tejun Heo 0 siblings, 1 reply; 13+ messages in thread From: Pierre Ossman @ 2005-12-20 11:36 UTC (permalink / raw) To: Tejun Heo; +Cc: Jens Axboe, LKML Tejun Heo wrote: > Pierre Ossman wrote: >> Revisiting a dear old thread. :) >> >> After some initial tests, some more questions popped up. See below. >> >> Jens Axboe wrote: >> >>> On Thu, Nov 17 2005, Pierre Ossman wrote: >>> >>> >>>> Since there is no guarantee this will be mapped down to one segment >>>> (that the hardware can accept), is it expected that the driver >>>> iterates >>>> over the entire list or can I mark only the first segment as completed >>>> and wait for the request to be reissued? (this is a MMC driver, which >>>> behaves like the block layer) >>>> >>> >>> Ah MMC, that explains a few things :-) >>> >>> It's quite legal (and possible) to partially handle a given request, >>> you >>> are not obliged to handle a request as a single unit. See how other >>> block drivers have an end request handling function ala: >>> >>> >> >> >> After testing this it seems the block layer never gives me more than >> max_hw_segs segments. Is it being clever because I'm compiling for a >> system without an IOMMU? >> >> The hardware should (haven't properly tested this) be able to get new >> DMA addresses during a transfer. In essence scatter gather with some CPU >> support. Since I avoid MMC overhead this should give a nice performance >> boost. But this relies on the block layer giving me more than one >> segment. Do I need to lie in max_hw_segs to achieve this? >> > > Hi, Pierre. > > max_phys_segments: the maximum number of segments in a request > *before* DMA mapping > > max_hw_segments: the maximum number of segments in a request > *after* DMA mapping (ie. after IOMMU merging) > > Those maximum numbers are for block layer. 
Block layer must not > exceed above limits when it passes a request downward. As long as all > entries in sg are processed, block layer doesn't care whether sg > iteration is performed by the driver or hardware. > > So, if you're gonna perform sg by iterating in the driver, what > numbers to report for max_phys_segments and max_hw_segments is > entirely up to how many entries the driver can handle. > > Just report some nice number (64 or 128?) for both. Don't forget that > the number of sg entries can be decreased after DMA-mapping on > machines with IOMMU. > > IOW, the part which performs sg iteration gets to determine above > limits. In your case, the driver is responsible for both iterations > (pre and post DMA mapping), so all the limits are up to the driver. > > I'm still a bit confused why the block layer needs to know the maximum number of hw segments. Different hardware might be connected to different IOMMUs, so only the driver will know how much the number can be reduced. So the block layer should only care about not going above max_phys_segments, since that's what the driver has room for. What is the scenario that requires both? Rgds Pierre ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-12-20 11:36 ` Pierre Ossman @ 2005-12-20 12:04 ` Tejun Heo 2005-12-20 12:28 ` Pierre Ossman 0 siblings, 1 reply; 13+ messages in thread From: Tejun Heo @ 2005-12-20 12:04 UTC (permalink / raw) To: Pierre Ossman; +Cc: Jens Axboe, LKML Pierre Ossman wrote: > Tejun Heo wrote: > >>Pierre Ossman wrote: >>> >>>After testing this it seems the block layer never gives me more than >>>max_hw_segs segments. Is it being clever because I'm compiling for a >>>system without an IOMMU? >>> >>>The hardware should (haven't properly tested this) be able to get new >>>DMA addresses during a transfer. In essence scatter gather with some CPU >>>support. Since I avoid MMC overhead this should give a nice performance >>>boost. But this relies on the block layer giving me more than one >>>segment. Do I need to lie in max_hw_segs to achieve this? >>> >> >>Hi, Pierre. >> >>max_phys_segments: the maximum number of segments in a request >> *before* DMA mapping >> >>max_hw_segments: the maximum number of segments in a request >> *after* DMA mapping (ie. after IOMMU merging) >> >>Those maximum numbers are for block layer. Block layer must not >>exceed above limits when it passes a request downward. As long as all >>entries in sg are processed, block layer doesn't care whether sg >>iteration is performed by the driver or hardware. >> >>So, if you're gonna perform sg by iterating in the driver, what >>numbers to report for max_phys_segments and max_hw_segments is >>entirely upto how many entries the driver can handle. >> >>Just report some nice number (64 or 128?) for both. Don't forget that >>the number of sg entries can be decreased after DMA-mapping on >>machines with IOMMU. >> >>IOW, the part which performs sg iteration gets to determine above >>limits. In your case, the driver is reponsible for both iterations >>(pre and post DMA mapping), so all the limits are upto the driver. 
>> >> > > I'm still a bit confused why the block layer needs to know the maximum > number of hw segments. Different hardware might be connected to > different IOMMUs, so only the driver will know how much the number can > be reduced. So the block layer should only care about not going above > max_phys_segments, since that's what the driver has room for. > > What is the scenario that requires both? > Let's say there is a piece of (crap) controller which can handle 4 segments; but the system has a powerful IOMMU which can merge pretty well. The driver wants to handle large requests for performance but it doesn't want to break up requests itself (pretty pointless, block layer merges, driver breaks down). A request should be large but not larger than what the hardware can take at once. So, it uses max_phys_segments to tell block layer how many sg entries the driver is willing to handle (some arbitrary large number) and reports 4 for max_hw_segments letting block layer know that requests should not be more than 4 segments after DMA-mapping. To sum up, block layer performs request sizing in favor of block drivers, so it needs to know the size limits. Is this explanation any better than my previous one? :-P Also, theoretically there can be more than one IOMMU on a system (are there already?). Block layer isn't yet ready to handle such cases but when it becomes necessary, all that is needed is to make currently global IOMMU merging parameters request queue specific and modify drivers such that they tell block layer their IOMMU parameters. -- tejun ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: IOMMU and scatterlist limits 2005-12-20 12:04 ` Tejun Heo @ 2005-12-20 12:28 ` Pierre Ossman 0 siblings, 0 replies; 13+ messages in thread From: Pierre Ossman @ 2005-12-20 12:28 UTC (permalink / raw) To: Tejun Heo; +Cc: Jens Axboe, LKML Tejun Heo wrote: > Pierre Ossman wrote: >> >> I'm still a bit confused why the block layer needs to know the maximum >> number of hw segments. Different hardware might be connected to >> different IOMMU:s, so only the driver will now how much the number can >> be reduced. So the block layer should only care about not going above >> max_phys_segments, since that's what the driver has room for. >> >> What is the scenario that requires both? >> > > Let's say there is a piece of (crap) controller which can handle 4 > segments; but the system has a powerful IOMMU which can merge pretty > well. The driver wants to handle large requests for performance but > it doesn't want to break up requests itself (pretty pointless, block > layer merges, driver breaks down). A request should be large but not > larger than what the hardware can take at once. > > So, it uses max_phys_segments to tell block layer how many sg entries > the driver is willing to handle (some arbitrary large number) and > reports 4 for max_hw_segments letting block layer know that requests > should not be more than 4 segments after DMA-mapping. > > To sum up, block layer performs request sizing in favor of block > drivers, so it needs to know the size limits. > > Is this explanation any better than my previous one? :-P > > Also, theoretically there can be more than one IOMMUs on a system (is > there already?). Block layer isn't yet ready to handle such cases but > when it becomes necessary, all that needed is to make currently global > IOMMU merging parameters request queue specific and modify drivers > such that they tell block layer their IOMMU parameters. > Ahh. I thought the block layer wasn't aware of any IOMMU. 
Since I saw those as bus specific I figured only the DMA APIs, which have access to the device object, could know which IOMMU was to be used and how it would merge segments. So in my case I'll have to lie to the block layer. Iterating in the driver will be much faster than having to do an entire new transfer. Thanks for clearing things up. :) Rgds Pierre ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread [~2005-12-20 12:28 UTC] Thread overview: 13+ messages -- 2005-11-17 8:34 IOMMU and scatterlist limits Pierre Ossman 2005-11-17 8:54 ` Jens Axboe 2005-11-17 9:02 ` Pierre Ossman 2005-11-17 9:13 ` Jens Axboe 2005-11-17 9:27 ` Pierre Ossman 2005-11-17 9:38 ` Jens Axboe 2005-11-17 9:49 ` Pierre Ossman 2005-11-17 12:02 ` Jens Axboe 2005-12-18 22:41 ` Pierre Ossman 2005-12-20 11:10 ` Tejun Heo 2005-12-20 11:36 ` Pierre Ossman 2005-12-20 12:04 ` Tejun Heo 2005-12-20 12:28 ` Pierre Ossman