From: Boaz Harrosh <bharrosh@panasas.com>
To: Lukas Kolbe <lkolbe@techfak.uni-bielefeld.de>
Cc: Kai Makisara <Kai.Makisara@kolumbus.fi>, linux-scsi@vger.kernel.org
Subject: Re: After memory pressure: can't read from tape anymore
Date: Tue, 30 Nov 2010 18:10:06 +0200
Message-ID: <4CF521DE.7020800@panasas.com>
In-Reply-To: <1291123886.2181.1347.camel@quux.techfak.uni-bielefeld.de>

On 11/30/2010 03:31 PM, Lukas Kolbe wrote:
> On Mon, 2010-11-29 at 19:09 +0200, Kai Makisara wrote:
> 
> Hi,
> 
>>> On our backup system (2 LTO4 drives/Tandberg library via LSISAS1068E,
>>> Kernel 2.6.36 with the stock Fusion MPT SAS Host driver 3.04.17 on
>>> debian/squeeze), we see reproducible tape read and write failures after
>>> the system was under memory pressure:
>>>
>>> [342567.297152] st0: Can't allocate 2097152 byte tape buffer.
>>> [342569.316099] st0: Can't allocate 2097152 byte tape buffer.
>>> [342570.805164] st0: Can't allocate 2097152 byte tape buffer.
>>> [342571.958331] st0: Can't allocate 2097152 byte tape buffer.
>>> [342572.704264] st0: Can't allocate 2097152 byte tape buffer.
>>> [342873.737130] st: from_buffer offset overflow.
>>>
>>> Bacula is spewing this message every time it tries to access the tape
>>> drive:
>>> 28-Nov 19:58 sd1.techfak JobId 2857: Error: block.c:1002 Read error on fd=10 at file:blk 0:0 on device "drv2" (/dev/nst0). ERR=Input/output error
>>>
>>> By memory pressure, I mean that the KVM processes containing the
>>> postgres-db (~20million files) and the bacula director have used all
>>> available RAM, one of them used ~4GiB of its 12GiB swap for an hour or
>>> so (by selecting a full restore, it seems that the whole directory tree
>>> of the 15mio files backup gets read into memory). After this, I wasn't
>>> able to read from the second tape drive anymore (/dev/st0); whereas the
>>> first tape drive was restoring the data happily (it is currently about
>>> halfway through a 3TiB restore from 5 tapes).
>>>
>>> This same behaviour appears when we're doing a few incremental backups;
>>> after a while, it just isn't possible to use the tape drives anymore -
>>> every I/O operation gives an I/O Error, even a simple dd bs=64k
>>> count=10. After a restart, the system behaves correctly until
>>> -seemingly- another memory pressure situation occurs.
>>>
>> This is predictable. The maximum number of scatter/gather segments seems 
>> to be 128. The st driver first tries to set up transfer directly from the 
>> user buffer to the HBA. The user buffer is usually fragmented so that one 
>> scatter/gather segment is used for each page. Assuming 4 kB page size, the 
>> maximum size of the direct transfer is 128 x 4 = 512 kB.
>>
>> When this fails, the driver tries to allocate a kernel buffer made of 
>> physically contiguous segments larger than 4 kB. Let's assume that 
>> it can find 128 16 kB segments. In this case the maximum block size is 
>> 2048 kB. Memory pressure results in memory fragmentation and the driver 
>> can't find large enough segments and allocation fails. This is what you 
>> are seeing.
> 
> Reasonable explanation, thanks. What makes me wonder is why it still
> fails *after* memory pressure was gone - ie free shows more than 4GiB of
> free memory. I had the output of /proc/meminfo at that time but can't
> find it anymore :/
> 
>> So, one solution is to use 512 kB block size. Another one is to try to 
>> find out if the 128 segment limit is a physical limitation or just a 
>> choice. In the latter case the mptsas driver could be modified to support 
>> larger block size even after memory fragmentation.
> 
> Even with a 64 kB block size (dd bs=64k), I was getting I/O errors trying
> to access the tape drive. I am now trying to raise the max_sg_segs
> parameter of the st module (modinfo says 256 is the default; I'm trying
> 1024 now) to see how well this works under memory pressure.
> 

It looks like something in st is broken or relies on old code. Most of the
important LLDs, as well as the block layer and scsi-ml, fully support
sg-chaining, which effectively delivers unlimited sg list sizes (limited
only by the HW). It looks like st has some code that tries to allocate
contiguous buffers larger than PAGE_SIZE. Why does it do that? It should
not be necessary any more.

>> Kai
> 

Boaz

