public inbox for linux-scsi@vger.kernel.org
 help / color / mirror / Atom feed
From: Lukas Kolbe <lkolbe@techfak.uni-bielefeld.de>
To: Kai Makisara <Kai.Makisara@kolumbus.fi>
Cc: linux-scsi@vger.kernel.org, Kashyap Desai <kashyap.desai@lsi.com>
Subject: Re: After memory pressure: can't read from tape anymore
Date: Wed, 01 Dec 2010 18:06:51 +0100	[thread overview]
Message-ID: <1291223211.2814.72.camel@larosa> (raw)
In-Reply-To: <alpine.LNX.2.00.1011301813460.9731@kai.makisara.local>

Am Dienstag, den 30.11.2010, 18:20 +0200 schrieb Kai Makisara:

Hi, in reply to your earlier mail:

> On Tue, 30 Nov 2010, Lukas Kolbe wrote:
> 
> > On Mon, 2010-11-29 at 19:09 +0200, Kai Makisara wrote:
> > 
> > Hi,
> > 
> > > > On our backup system (2 LTO4 drives/Tandberg library via LSISAS1068E,
> > > > Kernel 2.6.36 with the stock Fusion MPT SAS Host driver 3.04.17 on
> > > > debian/squeeze), we see reproducible tape read and write failures after
> > > > the system was under memory pressure:
> > > > 
> > > > [342567.297152] st0: Can't allocate 2097152 byte tape buffer.
> > > > [342569.316099] st0: Can't allocate 2097152 byte tape buffer.
> > > > [342570.805164] st0: Can't allocate 2097152 byte tape buffer.
> > > > [342571.958331] st0: Can't allocate 2097152 byte tape buffer.
> > > > [342572.704264] st0: Can't allocate 2097152 byte tape buffer.
> > > > [342873.737130] st: from_buffer offset overflow.
> > > > 
> ...
> > > When this fails, the driver tries to allocate a kernel buffer so that 
> > > there larger than 4 kB physically contiguous segments. Let's assume that 
> > > it can find 128 16 kB segments. In this case the maximum block size is 
> > > 2048 kB. Memory pressure results in memory fragmentation and the driver 
> > > can't find large enough segments and allocation fails. This is what you 
> > > are seeing.
> > 
> > Reasonable explanation, thanks. What makes me wonder is why it still
> > fails *after* memory pressure was gone - ie free shows more than 4GiB of
> > free memory. I had the output of /proc/meminfo at that time but can't
> > find it anymore :/
> > 
> This is because (AFAIK) the kernel does not defragment the memory. There 
> may be contiguous free pages but the memory management data structures 
> don't show these.
> 
> > > So, one solution is to use 512 kB block size. Another one is to try to 
> > > find out if the 128 segment limit is a physical limitation or just a 
> > > choice. In the latter case the mptsas driver could be modified to support 
> > > larger block size even after memory fragmentation.
> > 
> > Even with 64kb blocksize (dd bs=64k), I was getting I/O errors trying to
> > access the tape drive. I am now trying to upper the max_sg_segs
> > parameter to the st module (modinfo says 256 is the default; I'm trying
> > 1024 now) and see how well this works under memory pressure.
> > 
> This will not help. The final limit is the minimum of the limit of st and 
> the limit of mtpsas. The mptsas limit is 128. This is the limit that 
> should be increased but I don't know if it is possible.
> 
> If you see error with 64 kB block size, I would like to see any messages 
> associated with these errors.

I have now hit this bug again. Trying to read and write a label from the
tape drive in question results in this (via bacula's btape command):

*readlabel
01-Dec 17:47 btape JobId 0: Error: block.c:1002 Read error on fd=3 at
file:blk 0:0 on device "drv1" (/dev/nst1). ERR=Value too large for
defined data type.
btape: btape.c:525 Volume has no label.

Volume Label:
Id                : **error**VerNo             : 0
VolName           :
PrevVolName       :
VolFile           : 0
LabelType         : Unknown 0
LabelSize         : 0
PoolName          :
MediaType         :
PoolType          :
HostName          :
Date label written: -4712-01-01 at 00:00
*label
Enter Volume Name: AAA543
01-Dec 17:47 btape JobId 0: Error: block.c:577 Write error at 0:0 on
device "drv1" (/dev/nst1). ERR=Input/output error.
01-Dec 17:48 btape JobId 0: Error: Backspace record at EOT failed.
ERR=Input/output error
Wrote Volume label for volume "AAA543".

dmesg says (as expected): 

[158529.011206] st1: Can't allocate 2097152 byte tape buffer.
[158544.348411] st: append_to_buffer offset overflow.
[158544.348416] st: append_to_buffer offset overflow.
[158544.348418] st: append_to_buffer offset overflow.
[158544.348419] st: append_to_buffer offset overflow.

Now a dd with 64kb blocksize behaves really strange:

root@shepherd:~# dd if=/dev/nst1 of=/tmp/x bs=64k count=1
dd: reading `/dev/nst1': Device or resource busy
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.118717 s, 0.0 kB/s

ok, so some process must be using /dev/nst1, right?

root@shepherd:~# lsof  |grep st1

nope, nothing.

Subsequent dd's, only a few seconds later:

root@shepherd:~# dd if=/dev/nst1 of=/tmp/x bs=64k count=1
0+0 records in
0+0 records out
0 bytes (0 B) copied, 4.64747 s, 0.0 kB/s
root@shepherd:~# echo $?
0

Jeha right, we successfully read EOF/EOT

root@shepherd:~# dd if=/dev/nst1 of=/tmp/x bs=64k count=1
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.0041229 s, 0.0 kB/s

Possibly another EOT?

root@shepherd:~# dd if=/dev/nst1 of=/tmp/x bs=64k count=1
dd: reading `/dev/nst1': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.0128587 s, 0.0 kB/s
root@shepherd:~# dd if=/dev/nst1 of=/tmp/x bs=64k count=1
dd: reading `/dev/nst1': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000236144 s, 0.0 kB/s
root@shepherd:~# echo $?
1

Hm, now an I/O error! now dmesg has this to tell me:
[158651.882012] st1: Can't allocate 5085561 byte tape buffer.

Trying to write to the tape looks like below, which seems to match your
earlier description; ie 64/65k works, 128k blocksize works, 256k
blocksize and above don't work anymore. I wasn't able to reproduce not
being able to write with a 64k blocksize at the moment.

root@shepherd:~# lsof  |grep st1
root@shepherd:~# dd if=/dev/zero of=/dev/nst1 bs=65k count=100
100+0 records in
100+0 records out
6656000 bytes (6.7 MB) copied, 2.08872 s, 3.2 MB/s
root@shepherd:~# dd if=/dev/zero of=/dev/nst1 bs=64k count=100
100+0 records in
100+0 records out
6553600 bytes (6.6 MB) copied, 1.71815 s, 3.8 MB/s
root@shepherd:~# dd if=/dev/zero of=/dev/nst1 bs=512k count=100
dd: writing `/dev/nst1': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 1.82643 s, 0.0 kB/s
root@shepherd:~# dd if=/dev/zero of=/dev/nst1 bs=256k count=100
dd: writing `/dev/nst1': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 1.71959 s, 0.0 kB/s
root@shepherd:~# dd if=/dev/zero of=/dev/nst1 bs=128k count=100
100+0 records in
100+0 records out
13107200 bytes (13 MB) copied, 2.08911 s, 6.3 MB/s
root@shepherd:~# dd if=/dev/zero of=/dev/nst1 bs=64k count=100
100+0 records in
100+0 records out
6553600 bytes (6.6 MB) copied, 1.99401 s, 3.3 MB/s
root@shepherd:~# mt -f /dev/nst1 rewind
root@shepherd:~# dd if=/dev/nst1 of=/tmp/x bs=64k count=100
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00889507 s, 0.0 kB/s
root@shepherd:~# dd if=/dev/nst1 of=/tmp/x bs=64k count=10^C
root@shepherd:~# lsof  |grep st1
root@shepherd:~# dd if=/dev/nst1 of=/tmp/x bs=512k count=100
dd: reading `/dev/nst1': Device or resource busy
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000232968 s, 0.0 kB/s
root@shepherd:~# lsof  |grep st1
root@shepherd:~# dd if=/dev/nst1 of=/tmp/x bs=128k count=100
0+100 records in
0+100 records out
6656000 bytes (6.7 MB) copied, 0.314093 s, 21.2 MB/s
root@shepherd:~# dd if=/dev/nst1 of=/tmp/x bs=256k count=100
dd: reading `/dev/nst1': Device or resource busy
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000367819 s, 0.0 kB/s
root@shepherd:~# lsof  |grep st1
root@shepherd:~# 


Swap is used mainly because of overcommiting RAM to two VMs, but that
memory is rarely accessed.


root@shepherd:~# cat /proc/meminfo 
MemTotal:        8197296 kB
MemFree:           72648 kB
Buffers:           40496 kB
Cached:          1891664 kB
SwapCached:      1131684 kB
Active:          4258136 kB
Inactive:        3452272 kB
Active(anon):    4010488 kB
Inactive(anon):  1767884 kB
Active(file):     247648 kB
Inactive(file):  1684388 kB
Unevictable:         160 kB
Mlocked:             160 kB
SwapTotal:       4194300 kB
SwapFree:        1398976 kB
Dirty:            336920 kB
Writeback:             0 kB
AnonPages:       4648596 kB
Mapped:             4472 kB
Shmem:                12 kB
Slab:             155140 kB
SReclaimable:     109152 kB
SUnreclaim:        45988 kB
KernelStack:        1448 kB
PageTables:        18436 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8227412 kB
Committed_AS:    7884284 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       59244 kB
VmallocChunk:   34359660812 kB
HardwareCorrupted:     0 kB
HugePages_Total:      64
HugePages_Free:       64
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7488 kB
DirectMap2M:     8380416 kB

I hope this somehow helps,
Lukas



  reply	other threads:[~2010-12-01 17:06 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-28 19:15 After memory pressure: can't read from tape anymore Lukas Kolbe
2010-11-29 17:09 ` Kai Makisara
2010-11-30 13:31   ` Lukas Kolbe
2010-11-30 16:10     ` Boaz Harrosh
2010-11-30 16:23       ` Kai Makisara
2010-11-30 16:44         ` Boaz Harrosh
2010-11-30 17:04           ` Kai Makisara
2010-11-30 17:24             ` Boaz Harrosh
2010-11-30 19:53               ` Kai Makisara
2010-12-01  9:40                 ` Lukas Kolbe
2010-12-02 11:17                   ` Desai, Kashyap
2010-12-02 16:22                     ` Kai Makisara
2010-12-02 18:14                       ` Desai, Kashyap
2010-12-02 20:25                         ` Kai Makisara
2010-12-05 10:44                           ` Lukas Kolbe
2010-12-03 10:13                       ` FUJITA Tomonori
2010-12-03 10:45                         ` Desai, Kashyap
2010-12-03 11:11                           ` FUJITA Tomonori
2010-12-02 10:01                 ` Lukas Kolbe
2010-12-03  9:44               ` FUJITA Tomonori
2010-11-30 16:20     ` Kai Makisara
2010-12-01 17:06       ` Lukas Kolbe [this message]
2010-12-02 16:41         ` Kai Makisara
2010-12-06  7:59           ` Kai Makisara
2010-12-06  8:50             ` FUJITA Tomonori
2010-12-06  9:36             ` Lukas Kolbe
2010-12-06 11:34               ` Bjørn Mork
2010-12-08 14:19               ` Lukas Kolbe
2010-12-03 12:27   ` FUJITA Tomonori
2010-12-03 14:59     ` Kai Mäkisara
2010-12-03 15:06       ` James Bottomley
2010-12-03 17:03         ` Lukas Kolbe
2010-12-03 18:10           ` James Bottomley
2010-12-05 10:53             ` Lukas Kolbe
2010-12-05 12:16               ` FUJITA Tomonori
2010-12-14 20:35             ` Vladislav Bolkhovitin
2010-12-14 22:23               ` Stephen Hemminger
2010-12-15 16:27                 ` Vladislav Bolkhovitin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1291223211.2814.72.camel@larosa \
    --to=lkolbe@techfak.uni-bielefeld.de \
    --cc=Kai.Makisara@kolumbus.fi \
    --cc=kashyap.desai@lsi.com \
    --cc=linux-scsi@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox