From: Lukas Kolbe <lkolbe@techfak.uni-bielefeld.de>
To: James Bottomley <James.Bottomley@suse.de>
Cc: "Kai Mäkisara" <kai.makisara@kolumbus.fi>,
"FUJITA Tomonori" <fujita.tomonori@lab.ntt.co.jp>,
linux-scsi@vger.kernel.org,
"Kashyap Desai" <Kashyap.Desai@lsi.com>
Subject: Re: After memory pressure: can't read from tape anymore
Date: Sun, 05 Dec 2010 11:53:03 +0100
Message-ID: <1291546383.2814.2890.camel@larosa>
In-Reply-To: <1291399814.2881.66.camel@mulgrave.site>
On Friday, 2010-12-03 at 12:10 -0600, James Bottomley wrote:
> On Fri, 2010-12-03 at 18:03 +0100, Lukas Kolbe wrote:
> > On Friday, 2010-12-03 at 09:06 -0600, James Bottomley wrote:
> > > On Fri, 2010-12-03 at 16:59 +0200, Kai Mäkisara wrote:
> > > > On 12/03/2010 02:27 PM, FUJITA Tomonori wrote:
> > > > >
> > > > > Can we make enlarge_buffer a bit friendlier to the memory allocator?
> > > > >
> > > > > His problem is that the driver can't allocate 2 MB with the hardware
> > > > > limit of 128 segments.
> > > > >
> > > > > enlarge_buffer tries to use ST_MAX_ORDER, and if the allocation (a 256 kB
> > > > > page) fails, enlarge_buffer fails. It could try a smaller order instead?
> > > > >
> > > > > Not tested at all.
> > > > >
> > > > >
> > > > > diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
> > > > > index 5b7388f..119544b 100644
> > > > > --- a/drivers/scsi/st.c
> > > > > +++ b/drivers/scsi/st.c
> > > > > @@ -3729,7 +3729,8 @@ static int enlarge_buffer(struct st_buffer * STbuffer, int new_size, int need_dm
> > > > >  		b_size = PAGE_SIZE << order;
> > > > >  	} else {
> > > > >  		for (b_size = PAGE_SIZE, order = 0;
> > > > > -		     order < ST_MAX_ORDER && b_size < new_size;
> > > > > +		     order < ST_MAX_ORDER &&
> > > > > +			     max_segs * (PAGE_SIZE << order) < new_size;
> > > > >  		     order++, b_size *= 2)
> > > > >  			;  /* empty */
> > > > >  	}
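As a sanity check of what the patched loop now selects, here is a
minimal standalone sketch (userspace C, compilable as-is; it assumes
4 kB pages, st.c's ST_MAX_ORDER of 6, and the 128-segment hardware
limit discussed in this thread):

#include <stdio.h>

/* Assumed values for illustration only; in st.c these come from the
 * kernel's PAGE_SIZE, ST_MAX_ORDER and the adapter's max_segs. */
#define MY_PAGE_SIZE 4096L
#define ST_MAX_ORDER 6
#define MAX_SEGS     128

int main(void)
{
	long new_size = 2 * 1024 * 1024;	/* 2 MB tape block */
	long b_size;
	int order;

	/* The patched condition: stop as soon as MAX_SEGS segments of
	 * (page size << order) bytes can cover new_size. */
	for (b_size = MY_PAGE_SIZE, order = 0;
	     order < ST_MAX_ORDER &&
	     MAX_SEGS * (MY_PAGE_SIZE << order) < new_size;
	     order++, b_size *= 2)
		; /* empty */

	printf("order %d -> segment size %ld, %ld segments for %ld bytes\n",
	       order, b_size, (new_size + b_size - 1) / b_size, new_size);
	return 0;
}

With those numbers it settles on order 2 (16 kB segments, 128 of them)
instead of insisting on order 6 (256 kB segments), which is exactly
the "try a smaller order" idea above.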
> > > >
> > > > You are correct. The loop does not work at all as it should. Years ago,
> > > > the strategy was to start with blocks as big as possible to minimize the
> > > > number of s/g segments. Nowadays the segments must be of the same size,
> > > > so the old logic no longer applies.
> > > >
> > > > I have not tested the patch either but it looks correct.
> > > >
> > > > Thanks for noticing this bug. I hope this helps the users. The question
> > > > about the number of s/g segments is still valid for the direct i/o case,
> > > > but that is an optimization issue, not a matter of whether one can read
> > > > or write at all.
> > >
> > > Realistically, though, this will only increase the probability of making
> > > an allocation work; we can't make it a certainty.
> > >
> > > Since we fixed up the infrastructure to allow arbitrary-length sg lists,
> > > perhaps we should document which cards can actually take advantage of
> > > this (and how to do so, since it's not set automatically on boot). That
> > > way, users wanting tapes at least know what the problems are likely to be
> > > and how to avoid them in their hardware purchasing decisions. The
> > > corollary is that we should likely have a list of not-recommended cards:
> > > if they can't go over 128 SG elements, then they're pretty much
> > > unsuitable for modern tapes.
> >
> > Are you implying here that the LSI SAS1068E is unsuitable to drive two
> > LTO-4 tape drives? Or is it 'just' a problem with the driver?
>
> The information seems to be the former. There's no way the kernel can
> guarantee physical contiguity of memory as it operates. We try to
> defrag, but it's probabilistic, not certain, so if we have to try to
> find a physically contiguous buffer to copy into for an operation like
> this, at some point that allocation is going to fail.
>
> The only way to be certain you can get a 2 MB block down to a tape device
> is to be able to transmit the whole thing as an SG list of fully
> discontiguous pages. On a system with 4k pages, that requires 512 SG
> entries. From what I've heard Kashyap say, that can't currently be done
> on the 1068 because of firmware limitations (I'm not entirely clear on
> this, but that's how it sounds to me ... if there is a way of making
> firmware accept more than 128 SG elements per SCSI command, then it is a
> fairly simple driver change).
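To spell out the arithmetic behind that (a trivial sketch, again
assuming 4 kB pages and the 128-segment firmware limit; the 512 kB
figure at the end is the implied worst-case cap, derived here rather
than quoted from anywhere):

#include <stdio.h>

int main(void)
{
	long page = 4096;		/* assumed page size       */
	long block = 2 * 1024 * 1024;	/* 2 MB tape block         */
	long sg_limit = 128;		/* reported firmware limit */

	/* Worst case: every page is discontiguous, one SG entry each. */
	printf("SG entries needed: %ld (limit %ld)\n",
	       block / page, sg_limit);
	printf("largest fully-discontiguous block: %ld kB\n",
	       sg_limit * page / 1024);
	return 0;
}

So with 128 entries of single pages, anything beyond 512 kB depends on
finding physically contiguous chunks, which is exactly what memory
pressure takes away.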
Well, 2 MB block sizes actually do work: bacula reports a block size of
~2 MB for each drive while writing to it. Only after there was memory
pressure and a new tape was inserted did writing to the tape with these
block sizes become impossible, and dmesg shows one of these every time
bacula tries to read from or write to a tape:
[101883.958351] st0: Can't allocate 2097152 byte tape buffer.
[103901.666608] st0: Can't allocate 10249541 byte tape buffer.
No idea why it's trying 10 MB, though.
I tested with Fujita's patch, and these messages from before applying
the patch:
[158544.348411] st: append_to_buffer offset overflow.
do not appear anymore.
It didn't help with the not-being-able-to-write-after-memory-pressure
issue, though.
> This isn't something we can work around
> in the driver because the transaction can't be split ... it has to go
> down as a single WRITE command with a single output data buffer.
>
> The LSI 1068 is an upgradeable firmware system, so it's always possible
> LSI can come up with a firmware update that increases the size (this
> would also require a corresponding driver change), but it doesn't sound
> to be something that can be done in the driver alone.
If only LSI's website were a little clearer about where to find updated
firmware and what the latest version is :/.
--
Lukas