From: Lukas Kolbe <lkolbe@techfak.uni-bielefeld.de>
To: James Bottomley <James.Bottomley@suse.de>
Cc: "Kai Mäkisara" <kai.makisara@kolumbus.fi>,
	"FUJITA Tomonori" <fujita.tomonori@lab.ntt.co.jp>,
	linux-scsi@vger.kernel.org,
	"Kashyap Desai" <Kashyap.Desai@lsi.com>
Subject: Re: After memory pressure: can't read from tape anymore
Date: Sun, 05 Dec 2010 11:53:03 +0100
Message-ID: <1291546383.2814.2890.camel@larosa>
In-Reply-To: <1291399814.2881.66.camel@mulgrave.site>

On Friday, 03.12.2010 at 12:10 -0600, James Bottomley wrote:
> On Fri, 2010-12-03 at 18:03 +0100, Lukas Kolbe wrote:
> > On Friday, 03.12.2010 at 09:06 -0600, James Bottomley wrote:
> > > On Fri, 2010-12-03 at 16:59 +0200, Kai Mäkisara wrote:
> > > > On 12/03/2010 02:27 PM, FUJITA Tomonori wrote:
> > > > >
> > > > > > Can we make enlarge_buffer a bit friendlier to the memory allocator?
> > > > > >
> > > > > > His problem is that the driver can't allocate 2 MB with the hardware
> > > > > > limit of 128 segments.
> > > > > >
> > > > > > enlarge_buffer tries to use ST_MAX_ORDER, and if that allocation (a
> > > > > > 256 kB chunk) fails, enlarge_buffer fails. Could it try a smaller
> > > > > > order instead?
> > > > >
> > > > > Not tested at all.
> > > > >
> > > > >
> > > > > diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
> > > > > index 5b7388f..119544b 100644
> > > > > --- a/drivers/scsi/st.c
> > > > > +++ b/drivers/scsi/st.c
> > > > > @@ -3729,7 +3729,8 @@ static int enlarge_buffer(struct st_buffer * STbuffer, int new_size, int need_dm
> > > > >   		b_size = PAGE_SIZE<<  order;
> > > > >   	} else {
> > > > >   		for (b_size = PAGE_SIZE, order = 0;
> > > > > -		     order<  ST_MAX_ORDER&&  b_size<  new_size;
> > > > > +		     order<  ST_MAX_ORDER&&
> > > > > +			     max_segs * (PAGE_SIZE<<  order)<  new_size;
> > > > >   		     order++, b_size *= 2)
> > > > >   			;  /* empty */
> > > > >   	}
> > > > 
> > > > > You are correct. The loop does not work at all as it should. Years ago,
> > > > > the strategy was to start with blocks as big as possible to minimize the
> > > > > number of s/g segments. Nowadays the segments must be of the same size
> > > > > and the old logic is not applicable.
> > > > 
> > > > I have not tested the patch either but it looks correct.
> > > > 
> > > > > Thanks for noticing this bug. I hope this helps the users. The question
> > > > > about the number of s/g segments is still valid for the direct i/o case,
> > > > > but that is a matter of optimization, not of whether one can read/write.
> > > 
> > > Realistically, though, this will only increase the probability of making
> > > an allocation work; we can't make it a certainty.
> > > 
> > > Since we fixed up the infrastructure to allow arbitrary length sg lists,
> > > perhaps we should document what cards can actually take advantage of
> > > this (and how to do so, since it's not set automatically on boot).  That
> > > way users wanting tapes at least know what the problems are likely to be
> > > and how to avoid them in their hardware purchasing decisions.  The
> > > corollary is that we should likely have a list of not recommended cards:
> > > if they can't go over 128 SG elements, then they're pretty much
> > > unsuitable for modern tapes.
> > 
> > Are you implying here that the LSI SAS1068E is unsuitable to drive two
> > LTO-4 tape drives? Or is it 'just' a problem with the driver?
> 
> The information seems to be the former.  There's no way the kernel can
> guarantee physical contiguity of memory as it operates.  We try to
> defrag, but it's probabilistic, not certain, so if we have to try to
> find a physically contiguous buffer to copy into for an operation like
> this, at some point that allocation is going to fail.
> 
> The only way to be certain you can get a 2MB block down to a tape device
> is to be able to transmit the whole thing as a SG list of fully
> discontiguous pages.  On a system with 4k pages, that requires 512 SG
> entries.  From what I've heard Kashyap say, that can't currently be done
> on the 1068 because of firmware limitations (I'm not entirely clear on
> this, but that's how it sounds to me ... if there is a way of making
> firmware accept more than 128 SG elements per SCSI command, then it is a
> fairly simple driver change).

Well, 2MB block sizes actually do work: bacula reports a block size of
~2MB for each drive while writing to it. Only after there has been
memory pressure and a new tape has been inserted is it *not* possible
anymore to write to the tape with these block sizes, and dmesg shows one
of these every time bacula tries to read from or write to a tape:

[101883.958351] st0: Can't allocate 2097152 byte tape buffer.
[103901.666608] st0: Can't allocate 10249541 byte tape buffer.

No idea why it's trying 10MB, though.
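
For reference, the arithmetic behind those two sizes can be sketched as
follows. This is only an illustration, not code from st.c: it assumes 4 KiB
pages and the 128-segment limit discussed above, and the macro and helper
names are made up for this example.

```c
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB pages */
#define SKETCH_MAX_SEGS  128UL  /* assumption: the 128-segment limit from this thread */

/* Hypothetical helper: single pages needed to hold `bytes` (rounded up). */
static unsigned long pages_needed(unsigned long bytes)
{
	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Hypothetical helper: the smallest number of contiguous bytes each
 * segment must hold so that SKETCH_MAX_SEGS segments cover `bytes`.
 */
static unsigned long min_bytes_per_seg(unsigned long bytes)
{
	return (bytes + SKETCH_MAX_SEGS - 1) / SKETCH_MAX_SEGS;
}

/* Print the numbers for one requested buffer size. */
static void report(unsigned long bytes)
{
	printf("%lu bytes: %lu single pages, or at least %lu bytes per segment\n",
	       bytes, pages_needed(bytes), min_bytes_per_seg(bytes));
}
```

Calling report(2097152UL) gives 512 single pages, four times the 128-segment
limit, so each segment has to be at least 16384 contiguous bytes (four
pages). For the 10249541-byte request the minimum is 80075 bytes per
segment, which rounds up to a 128 KiB power-of-two chunk.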

I tested with the patch from Fujita, and these messages, which showed up
before applying the patch:

[158544.348411] st: append_to_buffer offset overflow.

do not appear anymore.
It didn't help with the not-being-able-to-write-after-memory-pressure
matter, though.
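
As a rough illustration of what the changed loop condition does, here are
the two loops from the diff as standalone functions. This is a sketch and
not the driver code: ST_MAX_ORDER is assumed to be 6 (which matches the
256 kB chunk mentioned above on 4 KiB pages), and the function and macro
names are invented for this example.

```c
#define SKETCH_PAGE_SIZE    4096UL /* assumption: 4 KiB pages */
#define SKETCH_ST_MAX_ORDER 6      /* assumption: 4 KiB << 6 = 256 kB */

/* Old condition: grow the chunk until one chunk alone covers new_size. */
static int order_old(unsigned long new_size)
{
	unsigned long b_size;
	int order;

	for (b_size = SKETCH_PAGE_SIZE, order = 0;
	     order < SKETCH_ST_MAX_ORDER && b_size < new_size;
	     order++, b_size *= 2)
		;  /* empty */
	return order;
}

/* Patched condition: stop once max_segs chunks together cover new_size. */
static int order_patched(unsigned long new_size, unsigned long max_segs)
{
	int order;

	for (order = 0;
	     order < SKETCH_ST_MAX_ORDER &&
		     max_segs * (SKETCH_PAGE_SIZE << order) < new_size;
	     order++)
		;  /* empty */
	return order;
}
```

Under these assumptions a 2097152-byte request gets order 6 (256 kB chunks)
from the old loop and order 2 (16 kB chunks) from the patched one with 128
segments. Order-2 chunks are easier to find than order-6 ones, but each
chunk still has to be physically contiguous, so an allocation can still
fail on fragmented memory.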

>  This isn't something we can work around
> in the driver because the transaction can't be split ... it has to go
> down as a single WRITE command with a single output data buffer.
> 
> The LSI 1068 is an upgradeable firmware system, so it's always possible
> LSI can come up with a firmware update that increases the size (this
> would also require a corresponding driver change), but it doesn't sound
> to be something that can be done in the driver alone.

If only LSI's website were a little clearer about where to find updated
firmware and what the latest version is :/.

-- 
Lukas


