From: Andy Smith <andy@strugglers.net>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: xen-devel <xen-devel@lists.xensource.com>
Subject: Re: mpt3sas bug with Debian jessie kernel only under Xen - "swiotlb buffer is full"
Date: Tue, 6 Dec 2016 02:43:50 +0000
Message-ID: <20161206024350.GV1804@bitfolk.com>
In-Reply-To: <2226c155-43ab-d170-c9e4-d112b8fa2de2@citrix.com>

[-- Attachment #1: Type: text/plain, Size: 4336 bytes --]

Hi Andrew,

On Sun, Dec 04, 2016 at 03:59:20PM +0000, Andrew Cooper wrote:
> On 04/12/16 08:32, Andy Smith wrote:
> > Under the Debian jessie amd64 kernel (linux-image-3.16.0-4-amd64
> > 3.16.36-1+deb8u2) running under Xen, I cannot put the system's
> > storage under heavy load without receiving a bunch of "swiotlb
> > buffer is full" kernel error messages and severely degraded
> > performance. Sometimes the system panics and reboots itself.

[…]

> Can you try these two patches from the XenServer Patch queue?
> https://github.com/xenserver/linux-3.x.pg/blob/master/master/series#L613-L614

Looking good.

Using those patches I'm ~20 minutes into this now:

Every 2.0s: cat /proc/mdstat                                 Tue Dec  6 02:16:40 2016

Personalities : [raid1] [raid10]
md5 : active raid10 sdb[0] sda[1]
      1875243008 blocks super 1.2 512K chunks 2 far-copies [2/2] [UU]
      [==>..................]  check = 11.5% (217058176/1875243008) finish=133.9min speed=206252K/sec
      bitmap: 0/14 pages [0KB], 65536KB chunk

md4 : active raid10 sdc[0] sdd[1]
      3906886656 blocks super 1.2 512K chunks 2 far-copies [2/2] [UU]
      [>....................]  check =  2.6% (102650880/3906886656) finish=674.4min speed=94007K/sec
      bitmap: 0/30 pages [0KB], 65536KB chunk

…where previously it would have given kernel errors within 5
seconds, so I think that fixes it. I will have to perform some more
strenuous testing.
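For the more strenuous testing, one simple way to tell whether the error has come back is to count the offending messages in the kernel log. The following is only a sketch: the function name count_swiotlb_errors is made up for illustration, and it assumes the message text stays exactly as in the log excerpt quoted further down.

```shell
# Count "swiotlb buffer is full" messages in kernel log text read from stdin.
# (count_swiotlb_errors is a hypothetical helper name, not an existing tool.)
count_swiotlb_errors() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status at zero in that case.
    grep -c 'swiotlb buffer is full' || true
}
```

It could be fed from the kernel ring buffer while the md check is running, e.g. "dmesg | count_swiotlb_errors"; a count that stays at 0 under load is the hoped-for result.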

Those two patches did not apply cleanly to the source of
linux-image-3.16.0-4-amd64 3.16.36-1+deb8u2. The last hunk of each
patch was rejected, so I removed those hunks and put equivalent
changes into a separate patch file (0003-fixup.patch attached).
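A quick way to see in advance which patches will be rejected is a dry run with GNU patch from the top of the unpacked source tree. This is a sketch under assumptions: check_patches is a hypothetical helper name, the patches are assumed to use the usual a/ and b/ path prefixes (hence -p1), and GNU patch is assumed to be installed.

```shell
# Report which patch files would apply cleanly to the tree in the current
# directory. --dry-run means no files are modified; --force stops patch
# from asking questions on the terminal.
# (check_patches is a hypothetical helper name.)
check_patches() {
    for p in "$@"; do
        if patch -p1 --dry-run --force < "$p" >/dev/null 2>&1; then
            echo "clean: $p"
        else
            echo "rejects: $p"
        fi
    done
}
```

Run from inside the source directory with the patch files as arguments, for example "check_patches ../0001-dma-add-dma_get_required_mask_from_max_pfn.patch". Note that each patch is tested against the unmodified tree, so a patch that depends on an earlier one may be reported as rejected.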

I have not done this in a long time, so just for the archives, my
process followed:

    https://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official

# mkdir -p /data/debian
# chown andy: /data/debian
# apt-get install build-essential fakeroot
# apt-get build-dep linux
$ cd /data/debian
$ apt-get source linux
$ wget https://raw.githubusercontent.com/xenserver/linux-3.x.pg/master/master/0001-dma-add-dma_get_required_mask_from_max_pfn.patch
$ wget https://raw.githubusercontent.com/xenserver/linux-3.x.pg/master/master/0002-x86-xen-correct-dma_get_required_mask-for-Xen-PV-gue.patch
$ # remove last parts of each patch file, create 0003-fixup.patch that performs equivalent changes
$ cd linux-3.16.36
$ # applying these patches is going to change symbols so changing the abiname
$ # is necessary.
$ # See https://kernel-handbook.alioth.debian.org/ch-versions.html#s-abi-name
$ sed -i -e 's/^abiname: 4/abiname: 4bf/' debian/config/defines
$ fakeroot debian/rules debian/control-real
$ bash debian/bin/test-patches -f amd64 ../0001-dma-add-dma_get_required_mask_from_max_pfn.patch ../0002-x86-xen-correct-dma_get_required_mask-for-Xen-PV-gue.patch ../0003-fixup.patch
# dpkg -i ../linux-headers-3.16.0-4bf-amd64_3.16.36-1+deb8u2a~test_amd64.deb ../linux-image-3.16.0-4bf-amd64_3.16.36-1+deb8u2a~test_amd64.deb

Then boot into the new kernel under Xen:

$ uname -a
Linux elephant 3.16.0-4bf-amd64 #1 SMP Debian 3.16.36-1+deb8u2a~test (2016-12-05) x86_64 GNU/Linux
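Since the stock and rebuilt kernels differ only in the abiname, it is easy to boot the wrong one by accident. A small check along these lines can guard a test script; kernel_has_abiname is a hypothetical helper name, and the sketch assumes the Debian release string layout seen above (version-abiname-flavour).

```shell
# Succeed if a kernel release string carries the given ABI name, i.e. the
# abiname appears as its own dash-separated field (3.16.0-4bf-amd64).
# (kernel_has_abiname is a hypothetical helper name.)
kernel_has_abiname() {
    release=$1
    abiname=$2
    case "$release" in
        *-"$abiname"-*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For the running kernel this would be called as: kernel_has_abiname "$(uname -r)" 4bf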

I think my next steps should be:

1. Do some more strenuous testing

2. Report bug against source package "linux" in Debian jessie with
   pointer to those two patches.

3. Check if those fixes are already applied in Debian backports
   and/or Debian testing linux package.

> > Dec  4 07:06:00 elephant kernel: [22019.373653] mpt3sas 0000:01:00.0: swiotlb buffer is full (sz: 57344 bytes)
> > Dec  4 07:06:00 elephant kernel: [22019.374707] mpt3sas 0000:01:00.0: swiotlb buffer is full
> > Dec  4 07:06:00 elephant kernel: [22019.375754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> > Dec  4 07:06:00 elephant kernel: [22019.376430] IP: [<ffffffffa004e779>] _base_build_sg_scmd_ieee+0x1f9/0x2d0 [mpt3sas]
> > Dec  4 07:06:00 elephant kernel: [22019.377122] PGD 0
> 
> This alone is a clear error handling bug in the mpt3sas driver.  It
> hasn't checked the DMA mapping call for a successful mapping before
> following the NULL pointer it got given back.  It is collateral damage
> from the swiotlb buffer being full, but a bug none the less.

Should that be reported upstream as a Linux bug in mpt3sas,
then?

Thanks for your help.

Cheers,
Andy

[-- Attachment #2: 0003-fixup.patch --]
[-- Type: text/x-diff, Size: 1393 bytes --]

diff -u a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
--- a/drivers/xen/swiotlb-xen.c	2016-06-15 20:29:36.000000000 +0000
+++ b/drivers/xen/swiotlb-xen.c	2016-12-05 07:05:13.009992832 +0000
@@ -673,6 +673,13 @@
 }
 EXPORT_SYMBOL_GPL(xen_swiotlb_dma_supported);
 
+u64
+xen_swiotlb_get_required_mask(struct device *dev)
+{
+	return DMA_BIT_MASK(64);
+}
+EXPORT_SYMBOL_GPL(xen_swiotlb_get_required_mask);
+
 int
 xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask)
 {
diff -u a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h	2016-06-15 20:29:36.000000000 +0000
+++ b/include/linux/dma-mapping.h	2016-12-05 07:03:13.992601404 +0000
@@ -127,6 +127,7 @@
 	return dma_set_mask_and_coherent(dev, mask);
 }
 
+extern u64 dma_get_required_mask_from_max_pfn(struct device *dev);
 extern u64 dma_get_required_mask(struct device *dev);
 
 #ifndef set_arch_dma_coherent_ops
diff -u a/include/xen/swiotlb-xen.h b/include/xen/swiotlb-xen.h
--- a/include/xen/swiotlb-xen.h	2016-06-15 20:29:36.000000000 +0000
+++ b/include/xen/swiotlb-xen.h	2016-12-05 07:06:01.084938801 +0000
@@ -56,6 +56,10 @@
 extern int
 xen_swiotlb_dma_supported(struct device *hwdev, u64 mask);
 
+extern u64
+xen_swiotlb_get_required_mask(struct device *dev);
+
+
 extern int
 xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask);
 #endif /* __LINUX_SWIOTLB_XEN_H */
