* [PATCH 0/6] BTT error clearing rework
@ 2017-07-14 22:11 Vishal Verma
2017-07-14 22:11 ` [PATCH 1/6] btt: fix a missed NVDIMM_IO_ATOMIC case in the write path Vishal Verma
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Vishal Verma @ 2017-07-14 22:11 UTC (permalink / raw)
To: linux-nvdimm
Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki,
Toshi Kani, Vishal Verma
Clearing errors or badblocks during a BTT write requires sending an ACPI
DSM, which means potentially sleeping. Since a BTT IO happens in atomic
context (preemption disabled, spinlocks may be held), we cannot perform
error clearing in the course of an IO. Due to this error clearing for
BTT IOs has hitherto been disabled.
This series fixes these problems by moving the error clearing out of
the atomic sections in the BTT.
Patch 5 makes a core ACPI change and I would like some views on it
from the ACPI guys - does the problem make sense, and is this the right
approach to solve it?
Vishal Verma (6):
btt: fix a missed NVDIMM_IO_ATOMIC case in the write path
btt: refactor map entry operations with macros
btt: ensure that flags were also unchanged during a map_read
btt: cache sector_size in arena_info
acpi: change memory allocations to GFP_NOIO
libnvdimm, btt: rework error clearing
drivers/nvdimm/btt.c | 111 ++++++++++++++++++++++++++++++++------
drivers/nvdimm/btt.h | 11 ++++
drivers/nvdimm/claim.c | 9 +---
include/acpi/platform/aclinuxex.h | 6 +--
4 files changed, 109 insertions(+), 28 deletions(-)
--
2.9.3
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH 1/6] btt: fix a missed NVDIMM_IO_ATOMIC case in the write path 2017-07-14 22:11 [PATCH 0/6] BTT error clearing rework Vishal Verma @ 2017-07-14 22:11 ` Vishal Verma 2017-07-14 22:11 ` [PATCH 2/6] btt: refactor map entry operations with macros Vishal Verma ` (3 subsequent siblings) 4 siblings, 0 replies; 9+ messages in thread From: Vishal Verma @ 2017-07-14 22:11 UTC (permalink / raw) To: linux-nvdimm Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki, Toshi Kani, Vishal Verma The IO context conversion for rw_bytes missed a case in the BTT write path (btt_map_write) which should've been marked as atomic. In reality this should not cause a problem, because map writes are to small for nsio_rw_bytes to attempt error clearing, but it should be fixed for posterity. Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that things like the nfit unit tests, which don't actually sleep, can catch bugs like this. Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> --- drivers/nvdimm/btt.c | 3 ++- drivers/nvdimm/claim.c | 1 + 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index 2af329d..52aa96f 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -1155,7 +1155,8 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, if (ret) goto out_map; - ret = btt_map_write(arena, premap, new_postmap, 0, 0, 0); + ret = btt_map_write(arena, premap, new_postmap, 0, 0, + NVDIMM_IO_ATOMIC); if (ret) goto out_map; diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c index f8ad92b..8f29937 100644 --- a/drivers/nvdimm/claim.c +++ b/drivers/nvdimm/claim.c @@ -292,6 +292,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns, && !(flags & NVDIMM_IO_ATOMIC)) { long cleared; + might_sleep(); cleared = nvdimm_clear_poison(&ndns->dev, nsio->res.start + offset, size); if (cleared < size) -- 2.9.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/6] btt: refactor map entry operations with macros 2017-07-14 22:11 [PATCH 0/6] BTT error clearing rework Vishal Verma 2017-07-14 22:11 ` [PATCH 1/6] btt: fix a missed NVDIMM_IO_ATOMIC case in the write path Vishal Verma @ 2017-07-14 22:11 ` Vishal Verma 2017-07-14 22:11 ` [PATCH 4/6] btt: cache sector_size in arena_info Vishal Verma ` (2 subsequent siblings) 4 siblings, 0 replies; 9+ messages in thread From: Vishal Verma @ 2017-07-14 22:11 UTC (permalink / raw) To: linux-nvdimm Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki, Toshi Kani, Vishal Verma Add helpers for converting a raw map entry to just the block number, or either of the 'e' or 'z' flags in preparation for actually using the error flag to mark blocks with media errors. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> --- drivers/nvdimm/btt.c | 8 ++++---- drivers/nvdimm/btt.h | 4 ++++ 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index 52aa96f..3aa6bc0 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -106,7 +106,7 @@ static int btt_map_write(struct arena_info *arena, u32 lba, u32 mapping, * This 'mapping' is supposed to be just the LBA mapping, without * any flags set, so strip the flag bits. */ - mapping &= MAP_LBA_MASK; + mapping = ent_lba(mapping); ze = (z_flag << 1) + e_flag; switch (ze) { @@ -155,10 +155,10 @@ static int btt_map_read(struct arena_info *arena, u32 lba, u32 *mapping, raw_mapping = le32_to_cpu(in); - z_flag = (raw_mapping & MAP_TRIM_MASK) >> MAP_TRIM_SHIFT; - e_flag = (raw_mapping & MAP_ERR_MASK) >> MAP_ERR_SHIFT; + z_flag = ent_z_flag(raw_mapping); + e_flag = ent_e_flag(raw_mapping); ze = (z_flag << 1) + e_flag; - postmap = raw_mapping & MAP_LBA_MASK; + postmap = ent_lba(raw_mapping); /* Reuse the {z,e}_flag variables for *trim and *error */ z_flag = 0; diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h index 888e862..09fabf5 100644 --- a/drivers/nvdimm/btt.h +++ b/drivers/nvdimm/btt.h @@ -38,6 +38,10 @@ #define IB_FLAG_ERROR 0x00000001 #define IB_FLAG_ERROR_MASK 0x00000001 +#define ent_lba(ent) (ent & MAP_LBA_MASK) +#define ent_e_flag(ent) (!!(ent & MAP_ERR_MASK)) +#define ent_z_flag(ent) (!!(ent & MAP_TRIM_MASK)) + enum btt_init_state { INIT_UNCHECKED = 0, INIT_NOTFOUND, -- 2.9.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 4/6] btt: cache sector_size in arena_info 2017-07-14 22:11 [PATCH 0/6] BTT error clearing rework Vishal Verma 2017-07-14 22:11 ` [PATCH 1/6] btt: fix a missed NVDIMM_IO_ATOMIC case in the write path Vishal Verma 2017-07-14 22:11 ` [PATCH 2/6] btt: refactor map entry operations with macros Vishal Verma @ 2017-07-14 22:11 ` Vishal Verma 2017-07-14 22:11 ` [PATCH 5/6] acpi: change memory allocations to GFP_NOIO Vishal Verma [not found] ` <20170714221148.11232-1-vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> 4 siblings, 0 replies; 9+ messages in thread From: Vishal Verma @ 2017-07-14 22:11 UTC (permalink / raw) To: linux-nvdimm Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki, Toshi Kani, Vishal Verma In preparation for the error clearing rework, add sector_size in the arena_info struct. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> --- drivers/nvdimm/btt.c | 1 + drivers/nvdimm/btt.h | 2 ++ 2 files changed, 3 insertions(+) diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index 304dac4..6b84eae 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -566,6 +566,7 @@ static struct arena_info *alloc_arena(struct btt *btt, size_t size, if (!arena) return NULL; arena->nd_btt = btt->nd_btt; + arena->sector_size = btt->sector_size; if (!size) return arena; diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h index 09fabf5..2bc0d10b 100644 --- a/drivers/nvdimm/btt.h +++ b/drivers/nvdimm/btt.h @@ -108,6 +108,7 @@ struct aligned_lock { * handle incoming writes. * @version_major: Metadata layout version major. * @version_minor: Metadata layout version minor. + * @sector_size: The Linux sector size - 512 or 4096 * @nextoff: Offset in bytes to the start of the next arena. * @infooff: Offset in bytes to the info block of this arena. * @dataoff: Offset in bytes to the data area of this arena. @@ -135,6 +136,7 @@ struct arena_info { u32 nfree; u16 version_major; u16 version_minor; + u32 sector_size; /* Byte offsets to the different on-media structures */ u64 nextoff; u64 infooff; -- 2.9.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 5/6] acpi: change memory allocations to GFP_NOIO 2017-07-14 22:11 [PATCH 0/6] BTT error clearing rework Vishal Verma ` (2 preceding siblings ...) 2017-07-14 22:11 ` [PATCH 4/6] btt: cache sector_size in arena_info Vishal Verma @ 2017-07-14 22:11 ` Vishal Verma 2017-07-15 5:26 ` Dan Williams [not found] ` <20170714221148.11232-1-vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> 4 siblings, 1 reply; 9+ messages in thread From: Vishal Verma @ 2017-07-14 22:11 UTC (permalink / raw) To: linux-nvdimm Cc: linux-acpi, Dan Williams, Jeff Moyer, Rafael J. Wysocki, Toshi Kani, Vishal Verma With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths. Specifically, the DSM to clear media errors is called during writes, so that we can provide a writes-fix-errors model. However it is easy to imagine a scenario like: - write through the nvdimm driver - acpi allocation - writeback, causes more IO through the nvdimm driver - deadlock Making the acpi allocations GPF_NOIO would ensure that it doesn't trigger writeback, and avoids the above deadlock. Cc: <linux-acpi@vger.kernel.org> Cc: <linux-nvdimm@lists.01.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> --- include/acpi/platform/aclinuxex.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/include/acpi/platform/aclinuxex.h b/include/acpi/platform/aclinuxex.h index efdff52..9bd54ec 100644 --- a/include/acpi/platform/aclinuxex.h +++ b/include/acpi/platform/aclinuxex.h @@ -83,12 +83,12 @@ acpi_status acpi_os_terminate(void); */ static inline void *acpi_os_allocate(acpi_size size) { - return kmalloc(size, irqs_disabled()? GFP_ATOMIC : GFP_KERNEL); + return kmalloc(size, irqs_disabled()? GFP_ATOMIC : GFP_NOIO); } static inline void *acpi_os_allocate_zeroed(acpi_size size) { - return kzalloc(size, irqs_disabled()? GFP_ATOMIC : GFP_KERNEL); + return kzalloc(size, irqs_disabled()? GFP_ATOMIC : GFP_NOIO); } static inline void acpi_os_free(void *memory) @@ -99,7 +99,7 @@ static inline void acpi_os_free(void *memory) static inline void *acpi_os_acquire_object(acpi_cache_t * cache) { return kmem_cache_zalloc(cache, - irqs_disabled()? GFP_ATOMIC : GFP_KERNEL); + irqs_disabled()? GFP_ATOMIC : GFP_NOIO); } static inline acpi_thread_id acpi_os_get_thread_id(void) -- 2.9.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 5/6] acpi: change memory allocations to GFP_NOIO 2017-07-14 22:11 ` [PATCH 5/6] acpi: change memory allocations to GFP_NOIO Vishal Verma @ 2017-07-15 5:26 ` Dan Williams 2017-07-15 12:45 ` Rafael J. Wysocki 0 siblings, 1 reply; 9+ messages in thread From: Dan Williams @ 2017-07-15 5:26 UTC (permalink / raw) To: Vishal Verma Cc: linux-nvdimm@lists.01.org, Linux ACPI, Jeff Moyer, Rafael J. Wysocki, Toshi Kani, Moore, Robert [ adding Bob ] On Fri, Jul 14, 2017 at 3:11 PM, Vishal Verma <vishal.l.verma@intel.com> wrote: > With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths. > Specifically, the DSM to clear media errors is called during writes, so > that we can provide a writes-fix-errors model. > > However it is easy to imagine a scenario like: > - write through the nvdimm driver > - acpi allocation > - writeback, causes more IO through the nvdimm driver > - deadlock > > Making the acpi allocations GPF_NOIO would ensure that it doesn't > trigger writeback, and avoids the above deadlock. Another option would be to custom code an acpi_ars_clear_error() routine that manages to issue the proper DSM without needing to perform any memory allocations, but this will have implications for the common ACPICA code. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 5/6] acpi: change memory allocations to GFP_NOIO 2017-07-15 5:26 ` Dan Williams @ 2017-07-15 12:45 ` Rafael J. Wysocki 0 siblings, 0 replies; 9+ messages in thread From: Rafael J. Wysocki @ 2017-07-15 12:45 UTC (permalink / raw) To: Dan Williams, Vishal Verma Cc: linux-nvdimm@lists.01.org, Linux ACPI, Jeff Moyer, Rafael J. Wysocki, Toshi Kani, Moore, Robert On Friday, July 14, 2017 10:26:42 PM Dan Williams wrote: > [ adding Bob ] > > On Fri, Jul 14, 2017 at 3:11 PM, Vishal Verma <vishal.l.verma@intel.com> wrote: > > With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths. > > Specifically, the DSM to clear media errors is called during writes, so > > that we can provide a writes-fix-errors model. > > > > However it is easy to imagine a scenario like: > > - write through the nvdimm driver > > - acpi allocation > > - writeback, causes more IO through the nvdimm driver > > - deadlock > > > > Making the acpi allocations GPF_NOIO would ensure that it doesn't > > trigger writeback, and avoids the above deadlock. Well, no. The above is not a good enough reason to switch the entire subsystem over to using GPF_NOIO wholesale, especially on systems that don't care about NFIT and the related things (and that's the majority of systems shipping today AFAICS let alone the systems already in production). If that's one of *multiple* reasons to do the switch-over, we can talk about that, but otherwise it just looks like an attempt to cut corners. > Another option would be to custom code an acpi_ars_clear_error() > routine that manages to issue the proper DSM without needing to > perform any memory allocations, but this will have implications for > the common ACPICA code. But isn't that the *right* way to address the problem at hand? I guess any OS using ACPICA will run into this problem sooner or later, so it would not be unreasonable to request ACPICA modifications to address it IMO. Thanks, Rafael ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <20170714221148.11232-1-vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>]
* [PATCH 3/6] btt: ensure that flags were also unchanged during a map_read [not found] ` <20170714221148.11232-1-vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> @ 2017-07-14 22:11 ` Vishal Verma 2017-07-14 22:11 ` [PATCH 6/6] libnvdimm, btt: rework error clearing Vishal Verma 1 sibling, 0 replies; 9+ messages in thread From: Vishal Verma @ 2017-07-14 22:11 UTC (permalink / raw) To: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw Cc: Rafael J. Wysocki, linux-acpi-u79uwXL29TY76Z2rM5mHXA In btt_map_read, we read the map twice to make sure that the map entry didn't change after we added it to the read tracking table. In anticipation of expanding the use of the error bit, also make sure that the error and zero flags are constant across the two map reads. Signed-off-by: Vishal Verma <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> --- drivers/nvdimm/btt.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index 3aa6bc0..304dac4 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -1031,6 +1031,7 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip, */ while (1) { u32 new_map; + int new_t, new_e; if (t_flag) { zero_fill_data(page, off, cur_len); @@ -1049,15 +1050,18 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip, */ barrier(); - ret = btt_map_read(arena, premap, &new_map, &t_flag, - &e_flag, NVDIMM_IO_ATOMIC); + ret = btt_map_read(arena, premap, &new_map, &new_t, + &new_e, NVDIMM_IO_ATOMIC); if (ret) goto out_rtt; - if (postmap == new_map) + if ((postmap == new_map) && (t_flag == new_t) && + (e_flag == new_e)) break; postmap = new_map; + t_flag = new_t; + e_flag = new_e; } ret = btt_data_read(arena, page, off, postmap, cur_len); -- 2.9.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 6/6] libnvdimm, btt: rework error clearing [not found] ` <20170714221148.11232-1-vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> 2017-07-14 22:11 ` [PATCH 3/6] btt: ensure that flags were also unchanged during a map_read Vishal Verma @ 2017-07-14 22:11 ` Vishal Verma 1 sibling, 0 replies; 9+ messages in thread From: Vishal Verma @ 2017-07-14 22:11 UTC (permalink / raw) To: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw Cc: Rafael J. Wysocki, linux-acpi-u79uwXL29TY76Z2rM5mHXA Clearing errors or badblocks during a BTT write requires sending an ACPI DSM, which means potentially sleeping. Since a BTT IO happens in atomic context (preemption disabled, spinlocks may be held), we cannot perform error clearing in the course of an IO. Due to this error clearing for BTT IOs has hitherto been disabled. In this patch we move error clearing out of the atomic section, and thus re-enable error clearing with BTTs. When we are about to add a block to the free list, we check if it was previously marked as an error, and if it was, we add it to the freelist, but also set a flag that says error clearing will be required. We then drop the lane (ending the atomic context), and send a zero buffer so that the error can be cleared. The error flag in the free list is protected by the nd 'lane', and is set only be a thread while it holds that lane. When the error is cleared, the flag is cleared, but while holding a mutex for that freelist index. When writing, we check for two things - 1/ If the freelist mutex is held or if the error flag is set. If so, this is an error block that is being (or about to be) cleared. 2/ If the block is a known badblock based on nsio->bb The second check is required because the BTT map error flag for a map entry only gets set when an error LBA is read. If we write to a new location that may not have the map error flag set, but still might be in the region's badblock list, we can trigger an EIO on the write, which is undesirable and completely avoidable. Cc: Jeff Moyer <jmoyer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Cc: Toshi Kani <toshi.kani-ZPxbGqLxI0U@public.gmane.org> Cc: Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> Signed-off-by: Vishal Verma <vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> --- drivers/nvdimm/btt.c | 89 +++++++++++++++++++++++++++++++++++++++++++++----- drivers/nvdimm/btt.h | 5 +++ drivers/nvdimm/claim.c | 8 ----- 3 files changed, 85 insertions(+), 17 deletions(-) diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index 6b84eae..48382ca9 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -381,7 +381,9 @@ static int btt_flog_write(struct arena_info *arena, u32 lane, u32 sub, arena->freelist[lane].sub = 1 - arena->freelist[lane].sub; if (++(arena->freelist[lane].seq) == 4) arena->freelist[lane].seq = 1; - arena->freelist[lane].block = le32_to_cpu(ent->old_map); + if (ent_e_flag(ent->old_map)) + arena->freelist[lane].has_err = 1; + arena->freelist[lane].block = le32_to_cpu(ent_lba(ent->old_map)); return ret; } @@ -480,6 +482,34 @@ static int btt_log_init(struct arena_info *arena) return ret; } +static u64 to_namespace_offset(struct arena_info *arena, u64 lba) +{ + return arena->dataoff + ((u64)lba * arena->internal_lbasize); +} + +static int arena_clear_freelist_error(struct arena_info *arena, u32 lane) +{ + int ret = 0; + + if (arena->freelist[lane].has_err) { + u32 lba = arena->freelist[lane].block; + u64 nsoff = to_namespace_offset(arena, lba); + void *zerobuf; + + zerobuf = kzalloc(arena->sector_size, GFP_KERNEL); + if (!zerobuf) + return -ENOMEM; + + mutex_lock(&arena->freelist[lane].err_lock); + ret = arena_write_bytes(arena, nsoff, zerobuf, + arena->sector_size, 0); + arena->freelist[lane].has_err = 0; + mutex_unlock(&arena->freelist[lane].err_lock); + kfree(zerobuf); + } + return ret; +} + static int btt_freelist_init(struct arena_info *arena) { int old, new, ret; @@ -505,6 +535,9 @@ static int btt_freelist_init(struct arena_info *arena) arena->freelist[i].seq = nd_inc_seq(le32_to_cpu(log_new.seq)); arena->freelist[i].block = le32_to_cpu(log_new.old_map); + if (ent_e_flag(log_new.old_map)) + arena_clear_freelist_error(arena, i); + /* This implies a newly created or untouched flog entry */ if (log_new.old_map == log_new.new_map) continue; @@ -525,7 +558,7 @@ static int btt_freelist_init(struct arena_info *arena) if (ret) return ret; } - + mutex_init(&arena->freelist[i].err_lock); } return 0; @@ -905,11 +938,6 @@ static void unlock_map(struct arena_info *arena, u32 premap) spin_unlock(&arena->map_locks[idx].lock); } -static u64 to_namespace_offset(struct arena_info *arena, u64 lba) -{ - return arena->dataoff + ((u64)lba * arena->internal_lbasize); -} - static int btt_data_read(struct arena_info *arena, struct page *page, unsigned int off, u32 lba, u32 len) { @@ -1066,8 +1094,14 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip, } ret = btt_data_read(arena, page, off, postmap, cur_len); - if (ret) + if (ret) { + int rc; + + /* Media error - set the e_flag */ + rc = btt_map_write(arena, premap, postmap, 0, 1, + NVDIMM_IO_ATOMIC); goto out_rtt; + } if (bip) { ret = btt_rw_integrity(btt, bip, arena, postmap, READ); @@ -1092,6 +1126,15 @@ static int btt_read_pg(struct btt *btt, struct bio_integrity_payload *bip, return ret; } +static bool btt_is_badblock(struct btt *btt, struct arena_info *arena, + u32 postmap) +{ + u64 nsoff = to_namespace_offset(arena, postmap); + sector_t phys_sector = nsoff >> 9; + + return is_bad_pmem(btt->phys_bb, phys_sector, arena->internal_lbasize); +} + static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, sector_t sector, struct page *page, unsigned int off, unsigned int len) @@ -1104,7 +1147,9 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, while (len) { u32 cur_len; + int e_flag; + retry: lane = nd_region_acquire_lane(btt->nd_region); ret = lba_to_arena(btt, sector, &premap, &arena); @@ -1117,6 +1162,24 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, goto out_lane; } + /* The block had a media error, and is being cleared */ + if (mutex_is_locked(&arena->freelist[lane].err_lock) + || arena->freelist[lane].has_err) { + nd_region_release_lane(btt->nd_region, lane); + /* OK to acquire a different lane/free block */ + goto retry; + } + + /* The block had a media error, and needs to be cleared */ + if (btt_is_badblock(btt, arena, arena->freelist[lane].block)) { + arena->freelist[lane].has_err = 1; + nd_region_release_lane(btt->nd_region, lane); + + arena_clear_freelist_error(arena, lane); + /* OK to acquire a different lane/free block */ + goto retry; + } + new_postmap = arena->freelist[lane].block; /* Wait if the new block is being read from */ @@ -1142,7 +1205,7 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, } lock_map(arena, premap); - ret = btt_map_read(arena, premap, &old_postmap, NULL, NULL, + ret = btt_map_read(arena, premap, &old_postmap, NULL, &e_flag, NVDIMM_IO_ATOMIC); if (ret) goto out_map; @@ -1150,6 +1213,8 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, ret = -EIO; goto out_map; } + if (e_flag) + set_e_flag(old_postmap); log.lba = cpu_to_le32(premap); log.old_map = cpu_to_le32(old_postmap); @@ -1168,6 +1233,9 @@ static int btt_write_pg(struct btt *btt, struct bio_integrity_payload *bip, unlock_map(arena, premap); nd_region_release_lane(btt->nd_region, lane); + if (e_flag) + arena_clear_freelist_error(arena, lane); + len -= cur_len; off += cur_len; sector += btt->sector_size >> SECTOR_SHIFT; @@ -1358,6 +1426,7 @@ static struct btt *btt_init(struct nd_btt *nd_btt, unsigned long long rawsize, { int ret; struct btt *btt; + struct nd_namespace_io *nsio; struct device *dev = &nd_btt->dev; btt = devm_kzalloc(dev, sizeof(struct btt), GFP_KERNEL); @@ -1371,6 +1440,8 @@ static struct btt *btt_init(struct nd_btt *nd_btt, unsigned long long rawsize, INIT_LIST_HEAD(&btt->arena_list); mutex_init(&btt->init_lock); btt->nd_region = nd_region; + nsio = to_nd_namespace_io(&nd_btt->ndns->dev); + btt->phys_bb = &nsio->bb; ret = discover_arenas(btt); if (ret) { diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h index 2bc0d10b..69c1f90 100644 --- a/drivers/nvdimm/btt.h +++ b/drivers/nvdimm/btt.h @@ -15,6 +15,7 @@ #ifndef _LINUX_BTT_H #define _LINUX_BTT_H +#include <linux/badblocks.h> #include <linux/types.h> #define BTT_SIG_LEN 16 @@ -41,6 +42,7 @@ #define ent_lba(ent) (ent & MAP_LBA_MASK) #define ent_e_flag(ent) (!!(ent & MAP_ERR_MASK)) #define ent_z_flag(ent) (!!(ent & MAP_TRIM_MASK)) +#define set_e_flag(ent) (ent |= MAP_ERR_MASK) enum btt_init_state { INIT_UNCHECKED = 0, @@ -82,6 +84,8 @@ struct free_entry { u32 block; u8 sub; u8 seq; + struct mutex err_lock; + u8 has_err; }; struct aligned_lock { @@ -187,6 +191,7 @@ struct btt { struct mutex init_lock; int init_state; int num_arenas; + struct badblocks *phys_bb; }; bool nd_btt_arena_is_valid(struct nd_btt *nd_btt, struct btt_sb *super); diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c index 8f29937..727f11b 100644 --- a/drivers/nvdimm/claim.c +++ b/drivers/nvdimm/claim.c @@ -280,14 +280,6 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns, } if (unlikely(is_bad_pmem(&nsio->bb, sector, sz_align))) { - /* - * FIXME: nsio_rw_bytes() may be called from atomic - * context in the btt case and the ACPI DSM path for - * clearing the error takes sleeping locks and allocates - * memory. An explicit error clearing path, and support - * for tracking badblocks in BTT metadata is needed to - * work around this collision. - */ if (IS_ALIGNED(offset, 512) && IS_ALIGNED(size, 512) && !(flags & NVDIMM_IO_ATOMIC)) { long cleared; -- 2.9.3 ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-07-15 12:53 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-07-14 22:11 [PATCH 0/6] BTT error clearing rework Vishal Verma
2017-07-14 22:11 ` [PATCH 1/6] btt: fix a missed NVDIMM_IO_ATOMIC case in the write path Vishal Verma
2017-07-14 22:11 ` [PATCH 2/6] btt: refactor map entry operations with macros Vishal Verma
2017-07-14 22:11 ` [PATCH 4/6] btt: cache sector_size in arena_info Vishal Verma
2017-07-14 22:11 ` [PATCH 5/6] acpi: change memory allocations to GFP_NOIO Vishal Verma
2017-07-15 5:26 ` Dan Williams
2017-07-15 12:45 ` Rafael J. Wysocki
[not found] ` <20170714221148.11232-1-vishal.l.verma-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-07-14 22:11 ` [PATCH 3/6] btt: ensure that flags were also unchanged during a map_read Vishal Verma
2017-07-14 22:11 ` [PATCH 6/6] libnvdimm, btt: rework error clearing Vishal Verma
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).