public inbox for linux-scsi@vger.kernel.org
* Re: ST alloc failures
From: Christoph Hellwig @ 2004-04-02  6:32 UTC (permalink / raw)
  To: Nathan Scott; +Cc: linux-scsi

[linux-scsi is the right list for st problems, moving the thread there]

On Fri, Apr 02, 2004 at 03:13:55PM +1000, Nathan Scott wrote:
> Hi all,
> 
> I'm seeing a bunch of large allocation attempts failing from
> the SCSI tape driver when doing dumps and restores ... (this
> is with a stock 2.6.4 kernel).
> 
> xfsdump: page allocation failure. order:8, mode:0xd0
> Call Trace:
>  [<c013982b>] __alloc_pages+0x33b/0x3d0
>  [<c03805ac>] enlarge_buffer+0xdc/0x1b0
>  [<c03819a3>] st_map_user_pages+0x33/0x90
>  [<c037cf24>] setup_buffering+0xb4/0x160

This looks like the driver tries to pin down the user pages first
(st_map_user_pages) but then fails and needs to use an in-kernel
buffer.  Can you put some debug printks into st_map_user_pages
to see why it fails?  The actual message is harmless; it's the
same thing we had in the XFS log code: it tries to allocate
as large a buffer as possible and, if that fails, tries the next
smaller power-of-two size.  We should probably add __GFP_NOWARN
here.


* Re: ST alloc failures
From: Kai Makisara @ 2004-04-03  7:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Nathan Scott, linux-scsi

On Fri, 2 Apr 2004, Christoph Hellwig wrote:

> [linux-scsi is the right list for st problems, moving the thread there]
> 
> On Fri, Apr 02, 2004 at 03:13:55PM +1000, Nathan Scott wrote:
> > Hi all,
> > 
> > I'm seeing a bunch of large allocation attempts failing from
> > the SCSI tape driver when doing dumps and restores ... (this
> > is with a stock 2.6.4 kernel).
> > 
> > xfsdump: page allocation failure. order:8, mode:0xd0
> > Call Trace:
> >  [<c013982b>] __alloc_pages+0x33b/0x3d0
> >  [<c03805ac>] enlarge_buffer+0xdc/0x1b0
> >  [<c03819a3>] st_map_user_pages+0x33/0x90
> >  [<c037cf24>] setup_buffering+0xb4/0x160
> 
> This looks like the driver tries to pin down the user pages first
> (st_map_user_pages) but then fails and needs to use an in-kernel
> buffer.  Can you put some debug printks into st_map_user_pages
> to see why it fails?

Pinning down pages should not fail with most modern hardware except for 
the following three cases:

1) A change in 2.6.4 (*) requires st (and sg) not to use direct transfers 
unless the user buffer is aligned on a 512-byte boundary. This means, for 
instance, that in most cases transfers from/to malloc()ed/calloc()ed 
buffers are forced to use bounce buffers, because such buffers are 
usually aligned only at 8- or 16-byte boundaries.

2) A bug in checking the allowed address range. Most SCSI adapters 
support 64-bit addresses, so even a large amount of memory should not 
prevent direct transfers.

3) Some resource shortage that happened just now. This is not a bug.

> The actual message is harmless; it's the
> same thing we had in the XFS log code: it tries to allocate
> as large a buffer as possible and, if that fails, tries the next
> smaller power-of-two size.  We should probably add __GFP_NOWARN
> here.

Yes. st prints its own message if the allocation ultimately fails.

(*) Some history for those who have not followed this development:

In 2.6.3, st (and sg) started checking the user buffer alignment with 
queue_dma_alignment(). The overall default is 512 bytes. A change was also 
added to the SCSI code to set the limit for SCSI devices to 8 bytes.

In 2.6.4, the code setting the SCSI device alignment to 8 bytes was, for 
some reason unknown to me, removed, which raised the requirement back to 
512 bytes.

There are several possible strategies for setting the default alignment 
requirement. The current one is very safe: the low-level drivers (in this 
case practically every driver) could relax the constraint, but none 
currently do. Another strategy (the 8-byte limit) sets a default that is 
safe for most devices; the exceptions can then enforce stricter limits.

-- 
Kai

* Re: ST alloc failures
From: Nathan Scott @ 2004-04-06  8:48 UTC (permalink / raw)
  To: Christoph Hellwig, Kai Makisara; +Cc: linux-scsi, linux-xfs

Hi there,

On Sat, Apr 03, 2004 at 10:19:51AM +0300, Kai Makisara wrote:
> On Fri, 2 Apr 2004, Christoph Hellwig wrote:
> 
> > [linux-scsi is the right list for st problems, moving the thread there]
> > 
> > On Fri, Apr 02, 2004 at 03:13:55PM +1000, Nathan Scott wrote:
> > > Hi all,
> > > 
> > > I'm seeing a bunch of large allocation attempts failing from
> > > the SCSI tape driver when doing dumps and restores ... (this
> > > is with a stock 2.6.4 kernel).
> > > 
> > > xfsdump: page allocation failure. order:8, mode:0xd0
> > > Call Trace:
> > >  [<c013982b>] __alloc_pages+0x33b/0x3d0
> > >  [<c03805ac>] enlarge_buffer+0xdc/0x1b0
> > >  [<c03819a3>] st_map_user_pages+0x33/0x90
> > >  [<c037cf24>] setup_buffering+0xb4/0x160
> > 
> > This looks like the driver tries to pin down the user pages first
> > (st_map_user_pages) but then fails and needs to use an in-kernel
> > buffer.  Can you put some debug printks into st_map_user_pages
> > to see why it fails?

Apologies for the delay; after whacking in some printks it looks
like the point where st decides not to pin down the user pages for
me is here in sgl_map_user_pages:

	/* Too big */
	if (nr_pages > max_pages) {
		return -ENOMEM;
	}

In my case nr_pages is always 256 and max_pages is always 96 (I
see this printk a fair few times, and it's always from this point).

> Pinning down pages should not fail with most modern hardware except for 
> the following three cases:
> 
> 1) A change in 2.6.4 (*) requires st (and sg) not to use direct transfers 
> unless the user buffer is aligned on a 512-byte boundary. This means, for 
> instance, that in most cases transfers from/to malloc()ed/calloc()ed 
> buffers are forced to use bounce buffers, because such buffers are 
> usually aligned only at 8- or 16-byte boundaries.
> 
> 2) A bug in checking the allowed address range. Most SCSI adapters 
> support 64-bit addresses, so even a large amount of memory should not 
> prevent direct transfers.

From the printk, I guess it's neither of these two?

> 3) Some resource shortage that happened just now. This is not a bug.

Hmm... I see this a lot, but I have a fair bit of memory in the machine
(it's during stress and regression testing that I hit this, so I'm not
sure about the exact memory usage at each particular printk I see).

Is this something we should be tuning in xfsdump/xfsrestore, Kai?
(to make smaller requests?)

cheers.

-- 
Nathan

* Re: ST alloc failures
From: Kai Makisara @ 2004-04-06 19:09 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Christoph Hellwig, linux-scsi, linux-xfs

On Tue, 6 Apr 2004, Nathan Scott wrote:

> Hi there,
> 
> On Sat, Apr 03, 2004 at 10:19:51AM +0300, Kai Makisara wrote:
> > On Fri, 2 Apr 2004, Christoph Hellwig wrote:
> > 
> > > [linux-scsi is the right list for st problems, moving the thread there]
> > > 
> > > On Fri, Apr 02, 2004 at 03:13:55PM +1000, Nathan Scott wrote:
> > > > Hi all,
> > > > 
> > > > I'm seeing a bunch of large allocation attempts failing from
> > > > the SCSI tape driver when doing dumps and restores ... (this
> > > > is with a stock 2.6.4 kernel).
> > > > 
> > > > xfsdump: page allocation failure. order:8, mode:0xd0
> > > > Call Trace:
> > > >  [<c013982b>] __alloc_pages+0x33b/0x3d0
> > > >  [<c03805ac>] enlarge_buffer+0xdc/0x1b0
> > > >  [<c03819a3>] st_map_user_pages+0x33/0x90
> > > >  [<c037cf24>] setup_buffering+0xb4/0x160
> > > 
> > > This looks like the driver tries to pin down the user pages first
> > > (st_map_user_pages) but then fails and needs to use an in-kernel
> > > buffer.  Can you put some debug printks into st_map_user_pages
> > > to see why it fails?
> 
> Apologies for the delay; after whacking in some printks it looks
> like the point where st decides not to pin down the user pages for
> me is here in sgl_map_user_pages:
> 
> 	/* Too big */
> 	if (nr_pages > max_pages) {
> 		return -ENOMEM;
> 	}
> 
> In my case nr_pages is always 256 and max_pages is always 96 (I
> see this printk a fair few times, and it's always from this point).
> 
OK. max_pages is the maximum number of scatter/gather segments supported 
by the SCSI adapter.

> > Pinning down pages should not fail with most modern hardware except for 
> > the following three cases:
> > 
> > 1) A change in 2.6.4 (*) requires st (and sg) not to use direct transfers 
> > unless the user buffer is aligned on a 512-byte boundary. This means, for 
> > instance, that in most cases transfers from/to malloc()ed/calloc()ed 
> > buffers are forced to use bounce buffers, because such buffers are 
> > usually aligned only at 8- or 16-byte boundaries.
> > 
> > 2) A bug in checking the allowed address range. Most SCSI adapters 
> > support 64-bit addresses, so even a large amount of memory should not 
> > prevent direct transfers.
> 
> From the printk, I guess it's neither of these two?
> 
Correct. 1) was something that would have explained why you see this 
starting from 2.6.4. I am happy that it is not 2 :-)

> > 3) Some resource shortage that happened just now. This is not a bug.
> 
> Hmm... I see this a lot, but I have a fair bit of memory in the machine
> (it's during stress and regression testing that I hit this, so I'm not
> sure about the exact memory usage at each particular printk I see).
> 
Having a lot of memory does not help because it gets fragmented, too. st 
tries to allocate big chunks so that it can satisfy user requests with 
the available number of s/g segments even when the user successively 
requests bigger and bigger block sizes. Usually, chunks smaller than the 
maximum suffice if the user keeps the same block size for subsequent 
requests. If allocating big chunks fails, the driver tries smaller chunks, 
as long as they are still big enough for the current user request. This 
is what happens in your case now. Earlier allocations with the big chunk 
size succeeded, so no error messages were written.

> Is this something we should be tuning in xfsdump/xfsrestore, Kai?
> (to make smaller requests?)
> 
There are actually two problems. As Christoph said, the messages you see 
are harmless. I have already sent a patch to linux-scsi that adds 
__GFP_NOWARN to the allocation; this should remove these error messages.

The other problem is that you probably would like to use direct transfers 
between the xfsdump/xfsrestore buffer and the drive instead of the 
"bounce" buffer in the driver. This is not possible unless the tape 
requests are small enough for the SCSI adapter. In your case the limit is 
96 pages. You can try to increase this limit, but that is not a general 
solution.

I would recommend that xfsdump/xfsrestore use smaller requests if possible. 
64 pages of 4 kB make 256 kB, which fits within the 96-page limit. Using 
this request size should not limit throughput even with the fastest tape 
drives.

I would like st to somehow tell users when it is using the driver buffer 
instead of direct transfers. Some users would probably like to know this 
because it limits throughput in some cases. The best idea I have so far 
is to log a message once per open if this happens. Even this may be too 
much. Good ideas are welcome.

-- 
Kai
