* max_hw_segments vs. max_phys_segments and scsi_alloc_queue()
@ 2004-02-26 7:15 Jeremy Higdon
From: Jeremy Higdon @ 2004-02-26 7:15 UTC (permalink / raw)
To: linux-scsi; +Cc: jbarnes
In scsi_alloc_queue(), I see the following two lines:
blk_queue_max_hw_segments(q, shost->sg_tablesize);
blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);
I've been looking through the block code a bit, and it looks to me
as though max_phys_segments more accurately describes what the
I/O device hardware is capable of, while max_hw_segments is used
when you have an IOMMU.
That is, max_hw_segments is the number of discrete segments that
a PCI device would see, while max_phys_segments would be the
max s/g list size that a PCI device could handle.
It further seems as though when there is no IOMMU that the number
of hw_segments will equal the number of phys_segments, at least if I
understand the code in blk_recount_segments() and the comments
around the definition of BIO_VMERGE_BOUNDARY in
include/asm-ia64/io.h.
In particular, on ia64 machines, where BIO_VMERGE_BOUNDARY is
currently 0, and thus, the number of hw_segments equals the number
of phys_segments, we should be using the host's sg_tablesize to
set the max number of phys segments (as well as the max number
of hw segments). I see no reason why this wouldn't carry forward
to the other architectures, though there may be limits to the
total amount of data that could be mapped. This would have to
be fed to the block layer from the arch layer, though, I think.
Does this make sense, or have I completely missed something?
thanks
jeremy
===== drivers/scsi/scsi_lib.c 1.120 vs edited =====
--- 1.120/drivers/scsi/scsi_lib.c Mon Feb 23 06:21:36 2004
+++ edited/drivers/scsi/scsi_lib.c Wed Feb 25 23:13:35 2004
@@ -1285,7 +1285,7 @@
blk_queue_prep_rq(q, scsi_prep_fn);
blk_queue_max_hw_segments(q, shost->sg_tablesize);
- blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);
+ blk_queue_max_phys_segments(q, shost->sg_tablesize);
blk_queue_max_sectors(q, shost->max_sectors);
blk_queue_bounce_limit(q, scsi_calculate_bounce_limit(shost));
blk_queue_segment_boundary(q, shost->dma_boundary);
* Re: max_hw_segments vs. max_phys_segments and scsi_alloc_queue()
From: James Bottomley @ 2004-02-26 15:02 UTC (permalink / raw)
To: Jeremy Higdon; +Cc: SCSI Mailing List, jbarnes
On Thu, 2004-02-26 at 01:15, Jeremy Higdon wrote:
> That is, max_hw_segments is the number of discrete segments that
> a PCI device would see, while max_phys_segments would be the
> max s/g list size that a PCI device could handle.
No, max_hw_segments is the maximum size of the PCI device's sg table.
max_phys_segments is the size of the internal sg list before dma_map_sg()
gets its paws on it. The only parameter the drivers care about is
max_hw_segments. It's the mid-layer that cares about the
max_phys_segments because the mid-layer provides the memory for the sg
list that comes out of blk_rq_map_sg().
> It further seems as though when there is no IOMMU that the number
> of hw_segments will equal the number of phys_segments, at least if I
> understand the code in blk_recount_segments() and the comments
> around the definition of BIO_VMERGE_BOUNDARY in
> include/asm-ia64/io.h.
That's correct. No virtual merging => max_phys_segments ==
max_hw_segments.
> In particular, on ia64 machines, where BIO_VMERGE_BOUNDARY is
> currently 0, and thus, the number of hw_segments equals the number
> of phys_segments, we should be using the host's sg_tablesize to
> set the max number of phys segments (as well as the max number
> of hw segments). I see no reason why this wouldn't carry forward
> to the other architectures, though there may be limits to the
> total amount of data that could be mapped. This would have to
> be fed to the block layer from the arch layer, though, I think.
>
> Does this make sense, or have I completely missed something?
No, you can't do this (or at least, not simply like your patch).
Like I said, the mid-layer has to allocate the sg table coming out of
blk_rq_map_sg(). It does this in scsi_alloc_sgtable() using mempools,
and the maximum sg table size it's expecting is MAX_PHYS_SEGMENTS. If
you increase max_phys_segments beyond what the mid-layer can cope with,
you'll end up with a request we can never map.
We'd have to rejig the entire mempool setup to increase this (even
though it looks like it's nicely coded to be variable in
MAX_PHYS_SEGMENTS, the mempools are in fact coded assuming that the
value is 128).
James