From: Boaz Harrosh <bharrosh@panasas.com>
To: Lukas Kolbe <lkolbe@techfak.uni-bielefeld.de>
Cc: Kai Makisara <Kai.Makisara@kolumbus.fi>, linux-scsi@vger.kernel.org
Subject: Re: After memory pressure: can't read from tape anymore
Date: Tue, 30 Nov 2010 18:10:06 +0200
Message-ID: <4CF521DE.7020800@panasas.com>
In-Reply-To: <1291123886.2181.1347.camel@quux.techfak.uni-bielefeld.de>
On 11/30/2010 03:31 PM, Lukas Kolbe wrote:
> On Mon, 2010-11-29 at 19:09 +0200, Kai Makisara wrote:
>
> Hi,
>
>>> On our backup system (2 LTO4 drives/Tandberg library via LSISAS1068E,
>>> Kernel 2.6.36 with the stock Fusion MPT SAS Host driver 3.04.17 on
>>> debian/squeeze), we see reproducible tape read and write failures after
>>> the system was under memory pressure:
>>>
>>> [342567.297152] st0: Can't allocate 2097152 byte tape buffer.
>>> [342569.316099] st0: Can't allocate 2097152 byte tape buffer.
>>> [342570.805164] st0: Can't allocate 2097152 byte tape buffer.
>>> [342571.958331] st0: Can't allocate 2097152 byte tape buffer.
>>> [342572.704264] st0: Can't allocate 2097152 byte tape buffer.
>>> [342873.737130] st: from_buffer offset overflow.
>>>
>>> Bacula is spewing this message every time it tries to access the tape
>>> drive:
>>> 28-Nov 19:58 sd1.techfak JobId 2857: Error: block.c:1002 Read error on fd=10 at file:blk 0:0 on device "drv2" (/dev/nst0). ERR=Input/output error
>>>
>>> By memory pressure, I mean that the KVM processes containing the
>>> postgres-db (~20million files) and the bacula director have used all
>>> available RAM, one of them used ~4GiB of its 12GiB swap for an hour or
>>> so (by selecting a full restore, it seems that the whole directory tree
>>> of the 15mio files backup gets read into memory). After this, I wasn't
>>> able to read from the second tape drive anymore (/dev/st0); whereas the
>>> first tape drive was restoring the data happily (it is currently about
>>> halfway through a 3TiB restore from 5 tapes).
>>>
>>> This same behaviour appears when we're doing a few incremental backups;
>>> after a while, it just isn't possible to use the tape drives anymore -
>>> every I/O operation gives an I/O Error, even a simple dd bs=64k
>>> count=10. After a restart, the system behaves correctly until
>>> -seemingly- another memory pressure situation occurred.
>>>
>> This is predictable. The maximum number of scatter/gather segments seems
>> to be 128. The st driver first tries to set up transfer directly from the
>> user buffer to the HBA. The user buffer is usually fragmented so that one
>> scatter/gather segment is used for each page. Assuming 4 kB page size, the
>> maximum size of the direct transfer is 128 x 4 kB = 512 kB.
>>
>> When this fails, the driver tries to allocate a kernel buffer made of
>> physically contiguous segments larger than 4 kB. Let's assume that
>> it can find 128 16 kB segments. In this case the maximum block size is
>> 2048 kB. Memory pressure results in memory fragmentation and the driver
>> can't find large enough segments and allocation fails. This is what you
>> are seeing.
>
> Reasonable explanation, thanks. What makes me wonder is why it still
> fails *after* memory pressure was gone - ie free shows more than 4GiB of
> free memory. I had the output of /proc/meminfo at that time but can't
> find it anymore :/
>
>> So, one solution is to use 512 kB block size. Another one is to try to
>> find out if the 128 segment limit is a physical limitation or just a
>> choice. In the latter case the mptsas driver could be modified to support
>> larger block size even after memory fragmentation.
>
> Even with a 64 kB block size (dd bs=64k), I was getting I/O errors trying to
> access the tape drive. I am now trying to raise the max_sg_segs
> parameter of the st module (modinfo says 256 is the default; I'm trying
> 1024 now) to see how well this works under memory pressure.
>
It looks like something is broken, or old code, in st. Most important LLDs
and the block layer/scsi-ml fully support sg-chaining, which effectively
delivers limitless (only limited by HW) sg sizes. It looks like
st has some code that tries to allocate contiguous buffers larger than
PAGE_SIZE. Why does it do that? It should not be necessary any more.
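
For reference, here is the arithmetic behind Kai's numbers quoted above, as a
rough Python sketch. The helper names are made up for illustration and are
not st driver code. The only inputs are figures from this thread (128
segments, 4 kB pages, the 2097152-byte buffer from the log), and it assumes
all segments have the same size.

```python
# Hypothetical helpers; they only reproduce the arithmetic from the thread.

def max_transfer_kib(max_segments: int, segment_kib: int) -> int:
    """Largest transfer when each s/g segment covers segment_kib of memory."""
    return max_segments * segment_kib

def min_segment_kib(block_kib: int, max_segments: int) -> int:
    """Smallest equal-sized segment that fits block_kib into max_segments."""
    return -(-block_kib // max_segments)  # ceiling division

def page_order(segment_kib: int, page_kib: int = 4) -> int:
    """Power-of-two page order needed for one contiguous segment."""
    pages = -(-segment_kib // page_kib)
    return (pages - 1).bit_length()

requested_kib = 2097152 // 1024                 # buffer size from the log
direct_kib = max_transfer_kib(128, 4)           # one 4 kB page per segment
buffered_kib = max_transfer_kib(128, 16)        # Kai's 16 kB example
needed_kib = min_segment_kib(requested_kib, 128)
needed_order = page_order(needed_kib)

print(direct_kib, buffered_kib, needed_kib, needed_order)
```

Under these assumptions a 2 MiB block needs 128 physically contiguous 16 kB
chunks (order-2 allocations with 4 kB pages), and those are what become hard
to find once memory is fragmented.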
>> Kai
>
Boaz