From: "Verma, Vishal L" <vishal.l.verma@intel.com>
To: "toshi.kani@hpe.com" <toshi.kani@hpe.com>,
"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>
Cc: "Williams, Dan J" <dan.j.williams@intel.com>,
"jmoyer@redhat.com" <jmoyer@redhat.com>,
"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>
Subject: Re: [PATCH v4 0/6] BTT error clearing rework
Date: Mon, 31 Jul 2017 23:35:08 +0000
Message-ID: <1501544000.4405.5.camel@intel.com>
In-Reply-To: <1501542358.2042.97.camel@hpe.com>
On Mon, 2017-07-31 at 23:15 +0000, Kani, Toshimitsu wrote:
> On Wed, 2017-07-26 at 17:35 -0600, Vishal Verma wrote:
> :
> >
> > Clearing errors or badblocks during a BTT write requires sending an
> > ACPI DSM, which means potentially sleeping. Since a BTT IO happens in
> > atomic context (preemption disabled, spinlocks may be held), we
> > cannot perform error clearing in the course of an IO. Due to this,
> > error clearing for BTT IOs has hitherto been disabled.
> >
> > This series fixes these problems by moving the error clearing out of
> > the atomic sections in the BTT.
> >
> > Also fix a potential deadlock that can occur while clearing errors
> > from either BTT or pmem due to memory allocations in the IO path.
>
> Hi Vishal,
>
> I just tested the series (sorry for the delay). It works nicely when
> doing I/Os to a block device directly. But I am seeing a lot of write
> errors with a filesystem.
>
> Here is what I did for the testing.
>
> 1. 'mkfs.ext4 /dev/pmem0s' and 'mount /dev/pmem0s /mnt/pmem0s'.
> 2. Inject an error somewhere in the pmem0s device, but not in the
> metadata area at the beginning.
> 3. Run the following script.
> ===
> DEV=pmem0s
> set -x
> dd if=/dev/zero of=/mnt/$DEV/1Gfile bs=1M count=1024
> while true; do
> cp /mnt/$DEV/1Gfile /mnt/$DEV/file-1
> cp /mnt/$DEV/1Gfile /mnt/$DEV/file-2
> cp /mnt/$DEV/1Gfile /mnt/$DEV/file-3
> cp /mnt/$DEV/1Gfile /mnt/$DEV/file-4
> cp /mnt/$DEV/1Gfile /mnt/$DEV/file-5
> cp /mnt/$DEV/1Gfile /mnt/$DEV/file-6
> cp /mnt/$DEV/1Gfile /mnt/$DEV/file-7
> cp /mnt/$DEV/1Gfile /mnt/$DEV/file-8
> cp /mnt/$DEV/1Gfile /mnt/$DEV/file-9
> cp /mnt/$DEV/1Gfile /mnt/$DEV/file-10
> done
> ===
>
> Step 3 clears the error and runs fine in raw and memory modes. In
> sector mode, however, it produces continuous write errors like those
> below and does not clear the error. Do you have any thoughts?
>
> EXT4-fs warning (device pmem0s): ext4_end_bio:322: I/O error 10
> writing to inode 17 (offset 1023410176 size 8388608 starting block
> 1834752)
> Buffer I/O error on device pmem0s, logical block 1834752
> Buffer I/O error on device pmem0s, logical block 1834753
> Buffer I/O error on device pmem0s, logical block 1834754
> :
> nd_pmem btt0.0: io error in WRITE sector 14680064, len 4096,
> EXT4-fs warning (device pmem0s): ext4_end_bio:322: I/O error 10
> writing to inode 17 (offset 1031798784 size 1052672 starting block
> 1835008)
> nd_pmem btt0.0: io error in WRITE sector 14682112, len 4096,
> EXT4-fs warning (device pmem0s): ext4_end_bio:322: I/O error 10
> writing to inode 17 (offset 1031798784 size 2101248 starting block
> 1835264)
> :
> nd_pmem btt0.0: io error in WRITE sector 14698496, len 4096,
> nd_pmem btt0.0: io error in WRITE sector 14700544, len 4096,
> nd_pmem btt0.0: io error in WRITE sector 14702592, len 4096,
> nd_pmem btt0.0: io error in WRITE sector 14704640, len 4096,
> :
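As a quick cross-check of the log above, the sector numbers printed by nd_pmem can be related to the "starting block" values printed by ext4. This is only a sketch: it assumes 512-byte sectors in the nd_pmem messages and a 4096-byte filesystem block size, neither of which is stated in the report, and the helper name sector_to_block is made up for illustration.

```shell
#!/bin/sh
# Assumptions (not stated in the report): nd_pmem reports 512-byte
# sectors and the filesystem uses 4096-byte blocks.
SECTOR_SIZE=512
FS_BLOCK_SIZE=4096

# Hypothetical helper: convert a sector number to a filesystem block number.
sector_to_block() {
    echo $(( $1 * SECTOR_SIZE / FS_BLOCK_SIZE ))
}

# Sector numbers taken from the nd_pmem lines in the log.
for s in 14680064 14682112; do
    echo "sector $s -> block $(sector_to_block "$s")"
done
```

Under those assumptions the two sectors map to blocks 1835008 and 1835264, which are the starting blocks in the neighbouring ext4 warnings, so the BTT write errors and the ext4 errors appear to refer to the same region.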
Thanks for the test, Toshi. I will try to reproduce it.
My first guess: are the injected errors potentially in the BTT
metadata area towards the end?
->rw_bytes can only clear errors on properly aligned writes, and the BTT
metadata writes will be too small to clear metadata errors.
>
> Thanks,
> -Toshi
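To illustrate the alignment point in the reply above, here is a minimal sketch of the kind of check involved. It is not the kernel code: the 512-byte granularity, the function names, and the 16-byte metadata write size are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Assumption: errors can only be cleared when both the byte offset and
# the length of a write are multiples of 512.
CLEAR_ALIGN=512

# Hypothetical check: succeeds (exit status 0) when a write of $2 bytes
# at byte offset $1 is aligned well enough to clear an error.
can_clear_error() {
    [ $(( $1 % CLEAR_ALIGN )) -eq 0 ] && [ $(( $2 % CLEAR_ALIGN )) -eq 0 ]
}

describe() {
    if can_clear_error "$1" "$2"; then
        echo "offset=$1 len=$2: aligned, could clear"
    else
        echo "offset=$1 len=$2: not aligned, cannot clear"
    fi
}

# A 4096-byte data write at sector 14680064 (from the log, 512-byte sectors assumed).
describe $(( 14680064 * 512 )) 4096
# A small metadata update; offset and 16-byte length are hypothetical.
describe 4096 16
```

With these assumptions the data write passes the check while the small metadata write does not, which would be consistent with an error in the metadata area never being cleared.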