public inbox for linux-kernel@vger.kernel.org
* Re: Bio pool & scsi scatter gather pool usage
@ 2002-04-18 22:58 Mark Peloquin
  2002-04-18 23:36 ` Alan Cox
  0 siblings, 1 reply; 25+ messages in thread
From: Mark Peloquin @ 2002-04-18 22:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel


Andrew Morton wrote:
> Mark Peloquin wrote:
> >
> >  ...
> > > Tell me why this won't work?
> >
> > This would require the BIO assembly code to make at least one
> > call to find the current permissible BIO size at offset xyzzy.
> > Depending on the actual IO size, many foo_max_bytes calls may
> > be required. Envision the LVM or RAID case, where physical
> > extent or chunk sizes can be as small as 8KB, I believe. For
> > a 64KB IO, it's conceivable that 9 calls to foo_max_bytes may
> > be required to package that IO into permissibly sized BIOs.

> True.  But probably the common case is not as bad as that, and
> these repeated calls are probably still cheaper than allocating
> and populating the redundant top-level BIO.

Perhaps, but calls are expensive. Repeated calls down stacked block
devices will add up. Only in the most unusual cases will there
be a need to allocate more than one top-level BIO, so the savings
for most cases requiring splitting will be just a single BIO. The
other BIOs will still need to be allocated and populated; it
would just happen outside the block devices. The savings of
repeated calls vs. allocating a single BIO is not obvious to me.

> Also, the top-level code can be cache-friendly.  The bad way
> to write it would be to do:
>
> while (more to send) {
>  maxbytes = bio_max_bytes(block);
>  build_and_send_a_bio(block, maxbytes);
>  block += maxbytes / whatever;
> }

> That creates long code paths and L1 cache thrashing.  The kernel
> tends to do that rather a lot in the IO paths.

> The good way is:
>  int maxbytes[something];
>  int i = 0;
>  while (more_to_send) {
>   maxbytes[i] = bio_max_bytes(block);
>   block += maxbytes[i++] / whatever;
>  }
>  i = 0;
>  while (more_to_send) {
>   build_and_send_a_bio(block, maxbytes[i]);
>   block += maxbytes[i++] / whatever;
>  }

> if you get my drift.  This way the computational costs of
> the second and succeeding bio_max_bytes() calls are very
> small.

Yup.

> One thing which concerns me about the whole scheme at
> present is that the uncommon case (volume managers, RAID,
> etc) will end up penalising the common case - boring
> old ext2 on boring old IDE/SCSI.

Yes it would.

> Right now, BIO_MAX_SECTORS is only 64k, and IDE can
> take twice that.  I'm not sure what the largest
> request size is for SCSI - certainly 128k.

In 2.5.7, Jens allows the BIO vectors to hold up to 256
pages, so it would seem that larger than 64k IOs are
planned.

> Let's not create any designed-in limitations at this
> stage of the game.

Not really trying to create any limitations, just trying
to balance what I think could be a performance concern.


^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Bio pool & scsi scatter gather pool usage
@ 2002-04-18 23:11 Douglas Gilbert
  0 siblings, 0 replies; 25+ messages in thread
From: Douglas Gilbert @ 2002-04-18 23:11 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel

Andrew Morton <akpm@zip.com.au> wrote:

<snip/>
> Right now, BIO_MAX_SECTORS is only 64k, and IDE can
> take twice that.  I'm not sure what the largest
> request size is for SCSI - certainly 128k.

Scatter gather lists in the scsi subsystem have a max
length of 255. The actual maximum size is dictated by 
the HBA driver (sg_tablesize). The HBA driver can
further throttle the size of a single transfer with
max_sectors.

Experiments with raw IO (both in 2.4 and 2.5) indicate
that pages are not contiguous when the scatter gather 
list is built. On i386 this limits the maximum transfer
size of a single scsi command to just less than 1 MB.

Doug Gilbert

* Re: Bio pool & scsi scatter gather pool usage
@ 2002-04-18 18:23 Mark Peloquin
  2002-04-18 18:57 ` Andrew Morton
  2002-04-25 19:43 ` Mike Fedyk
  0 siblings, 2 replies; 25+ messages in thread
From: Mark Peloquin @ 2002-04-18 18:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel


> Andrew Morton wrote:
> >
> > Mark Peloquin wrote:
> > >
> > ...
> > > In EVMS, we are adding code to deal with BIO splitting, to
> > > enable our feature modules, such as DriveLinking, LVM, & MD
> > > Linear, etc to break large BIOs up on chunk size or lower
> > > level device boundaries.
> >
> > Could I suggest that this code not be part of EVMS, but that
> > you implement it as a library within the core kernel?  Lots of
> > stuff is going to need BIO splitting - software RAID, ataraid,
> > XFS, etc.  May as well talk with Jens, Martin Petersen, Arjan,
> > Neil Brown.  Do it once, do it right...
> >
> I take that back.
>
> We really, really do not want to perform BIO splitting at all.
> It requires that the kernel perform GFP_NOIO allocations at
> the worst possible time, and it's just broken.
>
> What I would much prefer is that the top-level BIO assembly
> code be able to find out, beforehand, what the maximum
> permissible BIO size is at the chosen offset.  It can then
>  simply restrict the BIO to that size.
>
> Simply:
>
>  max = bio_max_bytes(dev, block);
>
> which gets passed down the exact path as the requests themselves.
> Each layer does:
>
> int foo_max_bytes(sector_t sector)
> {
>  int my_maxbytes, his_maxbytes;
>  sector_t my_sector;
>
>  my_sector = my_translation(sector);
>  his_maxbytes = next_device(me)->max_bytes(my_sector);
>  my_maxbytes = whatever(my_sector);
>  return min(my_maxbytes, his_maxbytes);
> }
>
> and, at the bottom:
>
> int ide_max_bytes(sector_t sector)
> {
>  return 248 * 512;
> }
>
> BIO_MAX_SECTORS and request_queue.max_sectors go away.
>
> Tell me why this won't work?

This would require the BIO assembly code to make at least one
call to find the current permissible BIO size at offset xyzzy.
Depending on the actual IO size, many foo_max_bytes calls may
be required. Envision the LVM or RAID case, where physical
extent or chunk sizes can be as small as 8KB, I believe. For
a 64KB IO, it's conceivable that 9 calls to foo_max_bytes may
be required to package that IO into permissibly sized BIOs.

What you're proposing is doable, but not without a cost.

This cost would be incurred to some degree on every IO, rather
than just in the exception case. Certain underlying storage
layouts would pay a higher cost, but they would also pay a
higher cost if they had to split BIOs themselves.

Perhaps if foo_max_bytes also accepted a size and could be coded
to return a list of sizes, only one call would be required
to determine all the permissible BIO sizes needed to package
an IO of a specified size.

What your proposal guarantees is that BIOs would never have
to be split up at all.

Mark


* Bio pool & scsi scatter gather pool usage
@ 2002-04-18 13:58 Mark Peloquin
  2002-04-18 16:17 ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Mark Peloquin @ 2002-04-18 13:58 UTC (permalink / raw)
  To: linux-kernel

I'm experiencing a problem using the bio pool created in
2.5.7 and I'm not quite able to put my finger on the cause
and hoped someone might have the knowledge and insight to
understand this problem.

In EVMS, we are adding code to deal with BIO splitting, to
enable our feature modules, such as DriveLinking, LVM, & MD
Linear, etc to break large BIOs up on chunk size or lower
level device boundaries.

In the first implementation of this, EVMS created its own
private pool of BIOs to use both for internally generated
synchronous IOs as well as for the source of BIOs used to
create the resulting split BIOs. In this implementation,
everything worked well, even under heavy loads.

However, after some thought, I concluded it was redundant
of EVMS to create its own pool of BIOs, when 2.5 had already
created a pool along with several support routines for
appropriately dealing with them.

So I made the changes to EVMS to use the 2.5 BIO pool. I
started testing a volume using linear concatenation and BIO
splitting. In the test, I have an ext2 filesystem formatted
with a block size of 4096. The BIO split function was
tweaked to maximize the stress by splitting all BIOs into
512 byte pieces. So this test is generating 8 additional
BIOs for each one coming down for this volume.

The allocation and initialization of the resulting split
BIOs seems to be correct and works in light loads. However,
under heavier loads, the assert in scsi_merge.c:82
{BUG_ON(!sgpnt)} fires, due to the fact that scatter gather
pool for MAX_PHYS_SEGMENTS (128) is empty. This is occurring
at interrupt time when __scsi_end_request is attempting to
queue the next request.

It's not perfectly clear to me how switching from a private
BIO pool to the 2.5 BIO pool should affect the usage of the
scsi driver's scatter gather pools.

Rather than simply increasing the size of the scatter gather
pools, I hope to understand how these changes resulted in
this behaviour so the proper solution can be determined.

Another data point: I have observed that the BIO pool does
get depleted below the 50% point of its minimum value, and
in such cases mempool_alloc (the internal worker for
bio_alloc) tries to free up more memory (I assume to grow
the pool) by waking bdflush. As a result, even more
pressure is put on the BIO pool when the dirty buffers
are being flushed.

</speculation on>

BIO splitting does increase the pressure on the BIO pool.
mempool_alloc increases pressure on all IO pools when it
wakes bdflush. BIO splitting alone (when from a private
pool) didn't create sufficient IO pressure to deplete the
currently sized pools in the IO path. Can the behaviour
of mempool_alloc, triggering bdflush, in addition to BIO
splitting adequately explain why the scsi scatter gather
pool would become depleted?

</speculation off>

Have I caused a problem by unrealistically increasing
pressure on the BIO pool by a factor of 8? Or have I
discovered a problem that can occur under very heavy loads?
What are your thoughts on a recommended solution?

Thanks.
Mark



end of thread, other threads:[~2002-04-25 19:59 UTC | newest]

Thread overview: 25+ messages
2002-04-18 22:58 Bio pool & scsi scatter gather pool usage Mark Peloquin
2002-04-18 23:36 ` Alan Cox
2002-04-18 23:48   ` Andrew Morton
2002-04-19  7:29     ` Stephen Lord
2002-04-19  8:08       ` Joe Thornber
2002-04-19  8:51         ` Alan Cox
2002-04-19  8:58       ` Alan Cox
2002-04-19 15:27         ` Steve Lord
2002-04-19 15:57           ` Alan Cox
2002-04-19 15:51             ` Rik van Riel
2002-04-22  6:50         ` Suparna Bhattacharya
2002-04-22  7:06           ` arjan
2002-04-22  7:54             ` Suparna Bhattacharya
2002-04-24 10:20         ` Helge Hafting
2002-04-19 18:15     ` Oliver Xymoron
  -- strict thread matches above, loose matches on Subject: below --
2002-04-18 23:11 Douglas Gilbert
2002-04-18 18:23 Mark Peloquin
2002-04-18 18:57 ` Andrew Morton
2002-04-19 15:44   ` Denis Vlasenko
2002-04-25 19:43 ` Mike Fedyk
2002-04-25 19:56   ` Andrew Morton
2002-04-25 19:59   ` David Mansfield
2002-04-18 13:58 Mark Peloquin
2002-04-18 16:17 ` Andrew Morton
2002-04-18 17:35   ` Andrew Morton
