Linux block layer

* Re: [PATCH v2 04/13] md: prepare for managing resync I/O pages in clean way
From: Ming Lei @ 2017-03-02  2:09 UTC
  To: Shaohua Li
  Cc: Jens Axboe, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	linux-block, Christoph Hellwig
In-Reply-To: <20170228233011.rdqtde22zwsimbz7@kernel.org>

Hi Shaohua,

On Wed, Mar 1, 2017 at 7:30 AM, Shaohua Li <shli@kernel.org> wrote:
> On Tue, Feb 28, 2017 at 11:41:34PM +0800, Ming Lei wrote:
>> Currently resync I/O uses the bio's bvec table to manage pages,
>> which is very hacky and may no longer work
>> once multipage bvec is introduced.
>>
>> So introduce helpers and new data structure for
>> managing resync I/O pages more cleanly.
>>
>> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> ---
>>  drivers/md/md.h | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 54 insertions(+)
>>
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 1d63239a1be4..b5a638d85cb4 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -720,4 +720,58 @@ static inline void mddev_check_writesame(struct mddev *mddev, struct bio *bio)
>>  #define RESYNC_BLOCK_SIZE (64*1024)
>>  #define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE-1) / PAGE_SIZE)
>>
>> +/* for managing resync I/O pages */
>> +struct resync_pages {
>> +     unsigned        idx;    /* for get/put page from the pool */
>> +     void            *raid_bio;
>> +     struct page     *pages[RESYNC_PAGES];
>> +};
>
> I'd like this to be embedded into r1bio directly. Not sure if we really need a
> structure.

There are two reasons we can't put this into r1bio:
- r1bio is used for both normal and resync I/O, so it isn't fair to make
normal I/O allocate more; that is why this patch avoids touching r1bio or r10bio.

- the number of 'struct resync_pages' instances depends on raid_disks (raid1)
or copies (raid10), neither of which is known at compile time.
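To make the second point concrete, here is a user-space sketch (not the md code) of why the `struct resync_pages` instances have to be allocated at run time: the array length comes from raid_disks (or copies), which is only known once the array is assembled. It assumes 4 KiB pages, uses `malloc()` in place of `alloc_page()`, and `alloc_resync_pages_array()` / `free_resync_pages_array()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL                 /* assumption: 4 KiB pages */
#define RESYNC_BLOCK_SIZE (64 * 1024)
/* Round up: 64 KiB / 4 KiB = 16 pages per resync bio. */
#define RESYNC_PAGES ((RESYNC_BLOCK_SIZE + PAGE_SIZE - 1) / PAGE_SIZE)

struct resync_pages {
	unsigned idx;                        /* next page to hand out */
	void *raid_bio;
	void *pages[RESYNC_PAGES];           /* malloc() stands in for alloc_page() */
};

/* Hypothetical helper: one resync_pages per member disk (raid1) or per
 * copy (raid10). The count is a run-time value, so it cannot be a
 * fixed-size array embedded in r1bio/r10bio. */
static struct resync_pages *alloc_resync_pages_array(int raid_disks)
{
	struct resync_pages *rps = calloc(raid_disks, sizeof(*rps));

	if (!rps)
		return NULL;
	for (int d = 0; d < raid_disks; d++)
		for (unsigned i = 0; i < RESYNC_PAGES; i++) {
			rps[d].pages[i] = malloc(PAGE_SIZE);
			if (!rps[d].pages[i])
				goto out_free;
		}
	return rps;

out_free:
	/* calloc() zeroed the untouched slots, and free(NULL) is a no-op. */
	for (int d = 0; d < raid_disks; d++)
		for (unsigned i = 0; i < RESYNC_PAGES; i++)
			free(rps[d].pages[i]);
	free(rps);
	return NULL;
}

static void free_resync_pages_array(struct resync_pages *rps, int raid_disks)
{
	assert(rps != NULL);
	for (int d = 0; d < raid_disks; d++)
		for (unsigned i = 0; i < RESYNC_PAGES; i++)
			free(rps[d].pages[i]);
	free(rps);
}
```

A caller would size the array from the runtime disk count, e.g. `alloc_resync_pages_array(3)` for a three-way mirror.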

>
>> +
>> +static inline int resync_alloc_pages(struct resync_pages *rp,
>> +                                  gfp_t gfp_flags)
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < RESYNC_PAGES; i++) {
>> +             rp->pages[i] = alloc_page(gfp_flags);
>> +             if (!rp->pages[i])
>> +                     goto out_free;
>> +     }
>> +
>> +     return 0;
>> +
>> + out_free:
>> +     while (--i >= 0)
>> +             __free_page(rp->pages[i]);
>> +     return -ENOMEM;
>> +}
>> +
>> +static inline void resync_free_pages(struct resync_pages *rp)
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < RESYNC_PAGES; i++)
>> +             __free_page(rp->pages[i]);
>
> Since we will use get_page, shouldn't this be put_page?

You are right, will fix in v3.
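The get/put pairing Shaohua refers to can be modeled in user space: every `get_page()` is balanced by a `put_page()`, and a page goes away only when its last reference is dropped. `struct fake_page` and the helpers below are hypothetical stand-ins, not the kernel's `struct page` API.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PAGES 4                      /* small stand-in for RESYNC_PAGES */

/* Hypothetical stand-in for struct page: a reference count plus a flag
 * recording whether the "page" has been returned to the allocator. */
struct fake_page {
	int refcount;
	bool freed;
};

static void fake_alloc_page(struct fake_page *p)
{
	p->refcount = 1;                /* the allocation holds one reference */
	p->freed = false;
}

static void fake_get_page(struct fake_page *p)
{
	assert(!p->freed && p->refcount > 0);
	p->refcount++;
}

/* Drop one reference; release the page only when the last one goes. */
static void fake_put_page(struct fake_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/* Mirrors resync_get_all_pages(): one extra reference per page. */
static void get_all_pages(struct fake_page *pages, int n)
{
	for (int i = 0; i < n; i++)
		fake_get_page(&pages[i]);
}

/* Mirrors resync_free_pages() written with put semantics, as suggested. */
static void put_all_pages(struct fake_page *pages, int n)
{
	for (int i = 0; i < n; i++)
		fake_put_page(&pages[i]);
}
```

With two holders, the first `put_all_pages()` leaves every page alive and only the second one releases them.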

>
>> +}
>> +
>> +static inline void resync_get_all_pages(struct resync_pages *rp)
>> +{
>> +     int i;
>> +
>> +     for (i = 0; i < RESYNC_PAGES; i++)
>> +             get_page(rp->pages[i]);
>> +}
>> +
>> +static inline struct page *resync_fetch_page(struct resync_pages *rp)
>> +{
>> +     if (WARN_ON_ONCE(rp->idx >= RESYNC_PAGES))
>> +             return NULL;
>> +     return rp->pages[rp->idx++];
>
> I'd like the caller to explicitly specify the index instead of using a global variable
> to track it, which will make the code more understandable and less error prone.

That is fine, but the index has to be per bio, so it ultimately has to be
stored in 'struct resync_pages' anyway, and every user would have to call
it like this:

          resync_fetch_page(rp, rp->idx);

so there seems to be no benefit in passing the index explicitly.
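The trade-off being discussed can be sketched in user space. Both helpers below are hypothetical, modeled on `resync_fetch_page()` from the patch: the cursor form keeps the per-bio position inside `struct resync_pages`, while the explicit-index form makes the caller pass it, so the position still has to be kept somewhere per bio.

```c
#include <assert.h>
#include <stddef.h>

#define NR_PAGES 16                     /* stand-in for RESYNC_PAGES */

struct resync_pages {
	unsigned idx;                   /* per-bio cursor, as in the v2 patch */
	void *pages[NR_PAGES];
};

/* Cursor style (v2): the helper advances rp->idx itself. */
static void *fetch_page_cursor(struct resync_pages *rp)
{
	if (rp->idx >= NR_PAGES)        /* the kernel helper uses WARN_ON_ONCE() */
		return NULL;
	return rp->pages[rp->idx++];
}

/* Explicit-index style (reviewer's suggestion): no hidden state. */
static void *fetch_page_at(struct resync_pages *rp, unsigned idx)
{
	assert(idx < NR_PAGES);
	return rp->pages[idx];
}
```

When all pages of a bio are consumed in a single loop, the loop counter can serve as the index; the stored cursor only helps when the position must survive across separate calls for the same bio.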

>
>> +}
>> +
>> +static inline bool resync_page_available(struct resync_pages *rp)
>> +{
>> +     return rp->idx < RESYNC_PAGES;
>> +}
>
> Then we don't need this obscure API.

That is fine.


Thanks,
Ming Lei

* Re: [PATCH 2/3] blk-mq: Provide freeze queue timeout
From: Christoph Hellwig @ 2017-03-02  0:14 UTC
  To: Keith Busch
  Cc: Jens Axboe, Sagi Grimberg, Christoph Hellwig, linux-nvme,
	linux-block, Marc MERLIN, Jens Axboe
In-Reply-To: <1488396132-11369-3-git-send-email-keith.busch@intel.com>

> +int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
> +				     unsigned long timeout)
> +{
> +	return wait_event_timeout(q->mq_freeze_wq,
> +					percpu_ref_is_zero(&q->q_usage_counter),
> +					timeout);
> +}
> +EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout);

Can you just add the timeout argument to blk_mq_freeze_queue_wait?
Existing callers can pass 0, which is interpreted as no timeout by
the low-level wait code.
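A user-space sketch of the single-function API being proposed, using a hypothetical `freeze_wait()` in which a timeout of 0 means "wait indefinitely". The sketch implements that convention itself by choosing between `pthread_cond_wait()` and `pthread_cond_timedwait()`. One caveat worth checking: with the kernel's `wait_event_timeout()`, a zero timeout appears not to sleep at all when the condition is false, so a real implementation would probably need a similar special case. `struct fake_queue` and `queue_exit()` are illustrative, not blk-mq code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical user-space model of a queue being frozen. */
struct fake_queue {
	pthread_mutex_t lock;
	pthread_cond_t freeze_wq;       /* signalled when usage drops to zero */
	int usage;                      /* stands in for q->q_usage_counter */
};

/* A request leaves the queue; wake waiters when the last one is gone. */
static void queue_exit(struct fake_queue *q)
{
	pthread_mutex_lock(&q->lock);
	assert(q->usage > 0);
	if (--q->usage == 0)
		pthread_cond_broadcast(&q->freeze_wq);
	pthread_mutex_unlock(&q->lock);
}

/*
 * Wait until usage reaches zero; timeout_ms == 0 means "no timeout".
 * Returns 1 if the queue drained, 0 if the timeout expired.
 */
static int freeze_wait(struct fake_queue *q, unsigned long timeout_ms)
{
	struct timespec deadline;
	int drained;

	pthread_mutex_lock(&q->lock);
	if (timeout_ms == 0) {
		while (q->usage != 0)
			pthread_cond_wait(&q->freeze_wq, &q->lock);
	} else {
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += timeout_ms / 1000;
		deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}
		while (q->usage != 0)
			if (pthread_cond_timedwait(&q->freeze_wq, &q->lock,
						   &deadline) == ETIMEDOUT)
				break;
	}
	drained = (q->usage == 0);
	pthread_mutex_unlock(&q->lock);
	return drained;
}
```

Existing untimed callers would then become `freeze_wait(q, 0)`.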

* Re: [PATCH v2] blkcg: allocate struct blkcg_gq outside request queue spinlock
From: Tahsin Erdogan @ 2017-03-01 23:49 UTC
  To: Tejun Heo, Jens Axboe
  Cc: linux-block, David Rientjes, linux-kernel, Tahsin Erdogan
In-Reply-To: <20170301234319.29584-1-tahsin@google.com>

Hi Tejun,
>
> Ah, indeed, but we can break out allocation of blkg and its
> initialization, right?  It's a bit more work but then we'd be able to
> do something like.
>
>
> retry:
>         new_blkg = alloc;
>         lock;
>         sanity checks;
>         blkg = blkg_lookup_and_create(..., new_blkg);
>         if (!blkg) {
>                 unlock;
>                 goto retry;
>         }

I tried doing it based on the sample above, but I wasn't happy with the
result: the code grew too large. I sent a simplified version that does
the blkg allocation within blkg_lookup_create(). I think this version is
simpler; let me know what you think.
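For comparison, the allocate-outside-the-lock retry loop from Tejun's outline can be sketched in user space. A pthread mutex stands in for the queue spinlock and a small per-level table stands in for the blkg hierarchy; `lookup_and_create()` and `get_or_create()` are hypothetical names, not blk-cgroup functions. Each pass allocates one object with no lock held; the locked step either consumes it to create the next missing level or frees it if nothing is missing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define DEPTH 3                          /* root -> parent -> target */

struct node { int level; };

/* Hypothetical lookup table: slot i holds the node for level i. */
static struct node *table[DEPTH];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Called with table_lock held. Uses the preallocated node for the first
 * missing level. Returns the target once every level exists, or NULL if
 * another preallocation is needed. Frees the preallocation if unused.
 */
static struct node *lookup_and_create(struct node *prealloc)
{
	for (int lvl = 0; lvl < DEPTH; lvl++) {
		if (table[lvl])
			continue;
		if (!prealloc)
			return NULL;            /* caller must allocate again */
		prealloc->level = lvl;
		table[lvl] = prealloc;
		prealloc = NULL;                /* consumed */
	}
	free(prealloc);                         /* everything already existed */
	return table[DEPTH - 1];
}

static struct node *get_or_create(void)
{
	for (;;) {
		/* Blocking allocation with no lock held. */
		struct node *new_node = malloc(sizeof(*new_node));
		struct node *n;

		if (!new_node)
			return NULL;
		pthread_mutex_lock(&table_lock);
		/* Sanity checks (e.g. queue dying) would go here. */
		n = lookup_and_create(new_node);
		pthread_mutex_unlock(&table_lock);
		if (n)
			return n;
		/* One level was created but more are missing: retry. */
	}
}
```

Each missing ancestor costs one extra trip around the loop, which is part of why this shape needs more code than a single in-function allocation.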


* [PATCH v2] blkcg: allocate struct blkcg_gq outside request queue spinlock
From: Tahsin Erdogan @ 2017-03-01 23:43 UTC
  To: Tejun Heo, Jens Axboe
  Cc: linux-block, David Rientjes, linux-kernel, Tahsin Erdogan
In-Reply-To: <20170301165501.GB3662@htj.duckdns.org>

blkg_conf_prep() currently calls blkg_lookup_create() while holding
request queue spinlock. This means allocating memory for struct
blkcg_gq has to be made non-blocking. This causes occasional -ENOMEM
failures in call paths like below:

  pcpu_alloc+0x68f/0x710
  __alloc_percpu_gfp+0xd/0x10
  __percpu_counter_init+0x55/0xc0
  cfq_pd_alloc+0x3b2/0x4e0
  blkg_alloc+0x187/0x230
  blkg_create+0x489/0x670
  blkg_lookup_create+0x9a/0x230
  blkg_conf_prep+0x1fb/0x240
  __cfqg_set_weight_device.isra.105+0x5c/0x180
  cfq_set_weight_on_dfl+0x69/0xc0
  cgroup_file_write+0x39/0x1c0
  kernfs_fop_write+0x13f/0x1d0
  __vfs_write+0x23/0x120
  vfs_write+0xc2/0x1f0
  SyS_write+0x44/0xb0
  entry_SYSCALL_64_fastpath+0x18/0xad

In the code path above, the percpu allocator cannot call vmalloc() because
the queue spinlock is held.

A failure in this call path gives grief to tools which are trying to
configure I/O weights. We see occasional failures shortly after reboots
even when the system is not under any memory pressure. Machines with
many CPUs are more vulnerable to this condition.

Add a flag to blkg_lookup_create() to indicate whether releasing locks
for memory allocation purposes is okay.
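The shape of the wait_ok path is: drop the lock, do a blocking allocation, retake the lock, then re-check whatever may have changed while it was released (in the patch, whether the queue started bypassing, and in blkg_conf_prep() whether the policy is still enabled). A hypothetical user-space sketch follows, with a pthread mutex for the queue lock, a `dying` flag in place of the bypass/dying checks, and a switch that simulates the non-blocking allocation failing. The sketch also re-checks whether another thread created the object meanwhile, which a generic drop-and-retake pattern has to consider.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct fake_queue {
	pthread_mutex_t lock;           /* stands in for q->queue_lock */
	bool dying;                     /* stands in for bypass/dying checks */
	bool nowait_alloc_fails;        /* simulate a failing atomic allocation */
	void *group;                    /* the object to look up or create */
};

static void *alloc_nowait(struct fake_queue *q, size_t size)
{
	return q->nowait_alloc_fails ? NULL : malloc(size);
}

/*
 * Called with q->lock held; returns with it held. If wait_ok is true,
 * the lock may be dropped around a blocking allocation.
 * Returns 0 on success or a negative errno.
 */
static int lookup_create(struct fake_queue *q, bool wait_ok)
{
	void *obj;

	if (q->dying)
		return -ENODEV;
	if (q->group)
		return 0;                       /* already exists */

	if (wait_ok) {
		pthread_mutex_unlock(&q->lock);
		obj = malloc(64);               /* may block: no lock held */
		pthread_mutex_lock(&q->lock);
		if (!obj)
			return -ENOMEM;
		/* State may have changed while unlocked: re-validate. */
		if (q->dying) {
			free(obj);
			return -ENODEV;
		}
		if (q->group) {                 /* someone else created it */
			free(obj);
			return 0;
		}
	} else {
		obj = alloc_nowait(q, 64);
		if (!obj)
			return -ENOMEM;
	}
	q->group = obj;
	return 0;
}
```

Callers in atomic context keep passing false, as blkcg_bio_issue_check() does in the patch.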

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
---
v2:
  Moved blkg creation into blkg_lookup_create() to avoid duplicating
  blkg_lookup_create() logic.

 block/blk-cgroup.c         | 51 +++++++++++++++++++++++++++++++++++++++-------
 include/linux/blk-cgroup.h |  4 ++--
 2 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 295e98c2c8cc..afb16e998bf3 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -258,18 +258,22 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
  * blkg_lookup_create - lookup blkg, try to create one if not there
  * @blkcg: blkcg of interest
  * @q: request_queue of interest
+ * @wait_ok: whether blocking for memory allocations is okay
  *
  * Lookup blkg for the @blkcg - @q pair.  If it doesn't exist, try to
  * create one.  blkg creation is performed recursively from blkcg_root such
  * that all non-root blkg's have access to the parent blkg.  This function
  * should be called under RCU read lock and @q->queue_lock.
  *
+ * When @wait_ok is true, rcu and queue locks may be dropped for allocating
+ * memory. In this case, the locks will be reacquired on return.
+ *
  * Returns pointer to the looked up or created blkg on success, ERR_PTR()
  * value on error.  If @q is dead, returns ERR_PTR(-EINVAL).  If @q is not
  * dead and bypassing, returns ERR_PTR(-EBUSY).
  */
 struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
-				    struct request_queue *q)
+				    struct request_queue *q, bool wait_ok)
 {
 	struct blkcg_gq *blkg;
 
@@ -300,7 +304,30 @@ struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
 			parent = blkcg_parent(parent);
 		}
 
-		blkg = blkg_create(pos, q, NULL);
+		if (wait_ok) {
+			struct blkcg_gq *new_blkg;
+
+			spin_unlock_irq(q->queue_lock);
+			rcu_read_unlock();
+
+			new_blkg = blkg_alloc(pos, q, GFP_KERNEL);
+
+			rcu_read_lock();
+			spin_lock_irq(q->queue_lock);
+
+			if (unlikely(!new_blkg))
+				return ERR_PTR(-ENOMEM);
+
+			if (unlikely(blk_queue_bypass(q))) {
+				blkg_free(new_blkg);
+				return ERR_PTR(blk_queue_dying(q) ?
+							-ENODEV : -EBUSY);
+			}
+
+			blkg = blkg_create(pos, q, new_blkg);
+		} else
+			blkg = blkg_create(pos, q, NULL);
+
 		if (pos == blkcg || IS_ERR(blkg))
 			return blkg;
 	}
@@ -789,6 +816,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
 {
 	struct gendisk *disk;
 	struct blkcg_gq *blkg;
+	struct request_queue *q;
 	struct module *owner;
 	unsigned int major, minor;
 	int key_len, part, ret;
@@ -812,18 +840,27 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
 		return -ENODEV;
 	}
 
+	q = disk->queue;
+
 	rcu_read_lock();
-	spin_lock_irq(disk->queue->queue_lock);
+	spin_lock_irq(q->queue_lock);
 
-	if (blkcg_policy_enabled(disk->queue, pol))
-		blkg = blkg_lookup_create(blkcg, disk->queue);
-	else
+	if (blkcg_policy_enabled(q, pol)) {
+		blkg = blkg_lookup_create(blkcg, q, true /* wait_ok */);
+
+		/*
+		 * blkg_lookup_create() may have dropped and reacquired the
+		 * queue lock. Check policy enabled state again.
+		 */
+		if (!IS_ERR(blkg) && unlikely(!blkcg_policy_enabled(q, pol)))
+			blkg = ERR_PTR(-EOPNOTSUPP);
+	} else
 		blkg = ERR_PTR(-EOPNOTSUPP);
 
 	if (IS_ERR(blkg)) {
 		ret = PTR_ERR(blkg);
 		rcu_read_unlock();
-		spin_unlock_irq(disk->queue->queue_lock);
+		spin_unlock_irq(q->queue_lock);
 		owner = disk->fops->owner;
 		put_disk(disk);
 		module_put(owner);
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index 01b62e7bac74..78067dd59c91 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -172,7 +172,7 @@ extern struct cgroup_subsys_state * const blkcg_root_css;
 struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg,
 				      struct request_queue *q, bool update_hint);
 struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
-				    struct request_queue *q);
+				    struct request_queue *q, bool wait_ok);
 int blkcg_init_queue(struct request_queue *q);
 void blkcg_drain_queue(struct request_queue *q);
 void blkcg_exit_queue(struct request_queue *q);
@@ -694,7 +694,7 @@ static inline bool blkcg_bio_issue_check(struct request_queue *q,
 	blkg = blkg_lookup(blkcg, q);
 	if (unlikely(!blkg)) {
 		spin_lock_irq(q->queue_lock);
-		blkg = blkg_lookup_create(blkcg, q);
+		blkg = blkg_lookup_create(blkcg, q, false /* wait_ok */);
 		if (IS_ERR(blkg))
 			blkg = NULL;
 		spin_unlock_irq(q->queue_lock);
-- 
2.12.0.rc1.440.g5b76565f74-goog

* Re: [PATCH 0/3] nvme suspend/resume fix
From: Jens Axboe @ 2017-03-01 23:29 UTC
  To: Keith Busch, Sagi Grimberg, Christoph Hellwig, linux-nvme,
	linux-block
  Cc: Marc MERLIN, Jens Axboe
In-Reply-To: <1488396132-11369-1-git-send-email-keith.busch@intel.com>

On 03/01/2017 12:22 PM, Keith Busch wrote:
> Hi Jens,
> 
> This is hopefully the last version to fix nvme stopping blk-mq's CPU
> event from making forward progress. The solution requires a couple of new
> blk-mq exports so the nvme driver can properly sync with queue states.
> 
> Since this depends on the blk-mq parts, and if you approve of the
> proposal, I think it'd be easiest if you can take this directly into
> linux-block/for-linus. Otherwise, we can send you a pull request if you
> Ack the blk-mq parts.
> 
> The difference from the previous patch is an update that Artur
> confirmed passes hibernate on a stacked request queue. Personally,
> I tested this for several hours with fio running buffered writes
> in the background and rtcwake running suspend/resume at intervals.
> This succeeded with no fio errors.

I've queued it up for this series, thanks Keith.

-- 
Jens Axboe

* Re: [PATCH 1/8] nowait aio: Introduce IOCB_FLAG_NOWAIT
From: Christoph Hellwig @ 2017-03-01 22:44 UTC
  To: Goldwyn Rodrigues
  Cc: Christoph Hellwig, jack, linux-fsdevel, linux-block, linux-btrfs,
	linux-ext4, linux-xfs
In-Reply-To: <cc3fc6ee-c48d-1b51-59b7-1e322d10a561@suse.de>

On Wed, Mar 01, 2017 at 10:57:17AM -0600, Goldwyn Rodrigues wrote:
> RWF_* ? Aren't those kernel-space flags? Or did you intend to say
> IOCB_FLAG_*?

No, they are the flags for preadv2/pwritev2.

> If so, do we maintain two flag fields? aio_reserved1 (perhaps
> renamed to aio_flags2) and aio_flags?

Yes - I'd call it aio_rw_flags or similar.

> aio_reserved1 is also used to return key for the purpose of io_cancel,
> but we should be able to fetch the flags before putting the key value
> there. Still I am not comfortable using the same field for it because it
> will be overwritten when io_submit returns.

It's not - the key is a separate field.  It's just that the two are
defined using a very strange macro that switches their positions
based on the endianness.
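The layout described here can be sketched with a standalone struct. The macro below is modeled on the `PADDED()` macro in `include/uapi/linux/aio_abi.h`, but `MY_PADDED`, `struct fake_iocb` and the `aio_rw_flags` field name (following the naming suggested above) are illustrative assumptions, and endianness is detected with the GCC/Clang `__BYTE_ORDER__` macros. The point is that the key and the flags word are two distinct 32-bit members whose order swaps with byte order, so storing a key does not overwrite flags held in the other member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the declaration order of two fields depending on byte order. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define MY_PADDED(x, y) x, y
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define MY_PADDED(x, y) y, x
#else
#error "unsupported byte order"
#endif

/* Illustrative subset of an iocb-like header. */
struct fake_iocb {
	uint64_t aio_data;
	uint32_t MY_PADDED(aio_key, aio_rw_flags);   /* two separate members */
};

static_assert(offsetof(struct fake_iocb, aio_key) !=
	      offsetof(struct fake_iocb, aio_rw_flags),
	      "key and flags must not overlap");
```

The macro only changes which of the two members comes first; it never makes them share storage.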

> Which brings me to the next question: What is the purpose of aio_key?
> Why is aio_key set to KIOCB_KEY (which is zero) every time? You are not
> differentiating the requests by setting every iocb's key to zero.

I don't know the history of this rather odd field.

* [PATCH 1/2] loop: fix LO_FLAGS_PARTSCAN hang
From: Omar Sandoval @ 2017-03-01 18:42 UTC
  To: Jens Axboe, linux-block; +Cc: kernel-team, Ming Lei, stable

From: Omar Sandoval <osandov@fb.com>

loop_reread_partitions() needs to do I/O, but we just froze the queue,
so we end up waiting forever. This can easily be reproduced with losetup
-P. Fix it by moving the reread to after we unfreeze the queue.
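The hang and the fix can be modeled in a few lines. In this hypothetical sketch, `submit_io()` stands in for the I/O that loop_reread_partitions() issues; in the kernel that I/O would wait for the queue to be unfrozen, and since the waiter is the task holding the freeze, it would wait forever. The sketch turns that wait into an assertion and follows the fixed ordering: change state while frozen, unfreeze, then rescan.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_loop {
	int freeze_depth;       /* > 0 means the queue is frozen */
	bool partscan;          /* stands in for LO_FLAGS_PARTSCAN */
	int rescans;            /* how many partition rescans ran */
};

static void freeze(struct fake_loop *lo)
{
	lo->freeze_depth++;
}

static void unfreeze(struct fake_loop *lo)
{
	assert(lo->freeze_depth > 0);
	lo->freeze_depth--;
}

/* I/O against a frozen queue would block until unfreeze; if the caller
 * is the one holding the freeze, that is a self-deadlock. */
static void submit_io(struct fake_loop *lo)
{
	assert(lo->freeze_depth == 0 && "I/O on frozen queue would hang");
}

static void reread_partitions(struct fake_loop *lo)
{
	submit_io(lo);          /* reading the partition table needs I/O */
	lo->rescans++;
}

/* Fixed ordering: update parameters while frozen, rescan afterwards. */
static int set_status(struct fake_loop *lo, bool want_partscan)
{
	int err = 0;

	freeze(lo);
	/* ... update other loop parameters; err may be set here ... */
	unfreeze(lo);

	if (!err && want_partscan && !lo->partscan) {
		lo->partscan = true;
		reread_partitions(lo);
	}
	return err;
}
```

Moving the rescan back inside the frozen section would trip the assertion in `submit_io()`.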

Fixes: ecdd09597a57 ("block/loop: fix race between I/O and set_status")
Reported-by: Tejun Heo <tj@kernel.org>
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 drivers/block/loop.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 4b52a1690329..132c9f371dce 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1142,13 +1142,6 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 	     (info->lo_flags & LO_FLAGS_AUTOCLEAR))
 		lo->lo_flags ^= LO_FLAGS_AUTOCLEAR;
 
-	if ((info->lo_flags & LO_FLAGS_PARTSCAN) &&
-	     !(lo->lo_flags & LO_FLAGS_PARTSCAN)) {
-		lo->lo_flags |= LO_FLAGS_PARTSCAN;
-		lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
-		loop_reread_partitions(lo, lo->lo_device);
-	}
-
 	lo->lo_encrypt_key_size = info->lo_encrypt_key_size;
 	lo->lo_init[0] = info->lo_init[0];
 	lo->lo_init[1] = info->lo_init[1];
@@ -1163,6 +1156,14 @@ loop_set_status(struct loop_device *lo, const struct loop_info64 *info)
 
  exit:
 	blk_mq_unfreeze_queue(lo->lo_queue);
+
+	if (!err && (info->lo_flags & LO_FLAGS_PARTSCAN) &&
+	     !(lo->lo_flags & LO_FLAGS_PARTSCAN)) {
+		lo->lo_flags |= LO_FLAGS_PARTSCAN;
+		lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
+		loop_reread_partitions(lo, lo->lo_device);
+	}
+
 	return err;
 }
 
-- 
2.12.0

* [PATCH 2/2] blk-mq-debugfs: add q->mq_freeze_depth output
From: Omar Sandoval @ 2017-03-01 18:42 UTC
  To: Jens Axboe, linux-block; +Cc: kernel-team
In-Reply-To: <7e888cad9e6c35835559281d3ab9e05ea48836d0.1488393750.git.osandov@fb.com>

From: Omar Sandoval <osandov@fb.com>

This can be used to diagnose freeze-related hangs.
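Why a depth is more useful than a frozen/unfrozen bit: freezes nest, so the queue stays frozen until every freeze has been matched by an unfreeze, and a depth that never returns to zero points at an unbalanced freeze. A hypothetical user-space sketch with C11 atomics; `show_freeze_depth()` prints the value in the same "%d\n" form as the new show routine.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

struct fake_queue {
	atomic_int mq_freeze_depth;
	int blocked;            /* 1 while new requests are kept out */
};

static void freeze_start(struct fake_queue *q)
{
	/* Only the first freezer actually blocks new requests. */
	if (atomic_fetch_add(&q->mq_freeze_depth, 1) == 0)
		q->blocked = 1;
}

static void unfreeze(struct fake_queue *q)
{
	int depth = atomic_fetch_sub(&q->mq_freeze_depth, 1) - 1;

	assert(depth >= 0);     /* unbalanced unfreeze */
	if (depth == 0)
		q->blocked = 0; /* last unfreeze lets requests in again */
}

/* What a read of the "freeze_depth" attribute would report. */
static int show_freeze_depth(struct fake_queue *q, FILE *out)
{
	return fprintf(out, "%d\n", atomic_load(&q->mq_freeze_depth));
}
```

In the loop hang fixed by the previous patch, for instance, the task stuck in loop_reread_partitions() is holding a freeze, so one would expect this value to read as nonzero while it is stuck.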

Signed-off-by: Omar Sandoval <osandov@fb.com>
---
 block/blk-mq-debugfs.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index f6d917977b33..1459546788da 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -29,6 +29,26 @@ struct blk_mq_debugfs_attr {
 	const struct file_operations *fops;
 };
 
+static int queue_freeze_depth_show(struct seq_file *m, void *v)
+{
+	struct request_queue *q = m->private;
+
+	seq_printf(m, "%d\n", atomic_read(&q->mq_freeze_depth));
+	return 0;
+}
+
+static int queue_freeze_depth_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, queue_freeze_depth_show, inode->i_private);
+}
+
+static const struct file_operations queue_freeze_depth_fops = {
+	.open		= queue_freeze_depth_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static int blk_mq_debugfs_seq_open(struct inode *inode, struct file *file,
 				   const struct seq_operations *ops)
 {
@@ -636,6 +656,11 @@ static const struct file_operations ctx_completed_fops = {
 	.release	= single_release,
 };
 
+static const struct blk_mq_debugfs_attr blk_mq_debugfs_queue_attrs[] = {
+	{"freeze_depth", 0400, &queue_freeze_depth_fops},
+	{},
+};
+
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
 	{"state", 0400, &hctx_state_fops},
 	{"flags", 0400, &hctx_flags_fops},
@@ -753,6 +778,9 @@ int blk_mq_debugfs_register_hctxs(struct request_queue *q)
 	if (!q->mq_debugfs_dir)
 		goto err;
 
+	if (!debugfs_create_files(q->mq_debugfs_dir, q, blk_mq_debugfs_queue_attrs))
+		return -ENOMEM;
+
 	queue_for_each_hw_ctx(q, hctx, i) {
 		if (blk_mq_debugfs_register_hctx(q, hctx))
 			goto err;
-- 
2.12.0

* [PATCH 3/3] nvme: Complete all stuck requests
From: Keith Busch @ 2017-03-01 19:22 UTC
  To: Jens Axboe, Sagi Grimberg, Christoph Hellwig, linux-nvme,
	linux-block
  Cc: Marc MERLIN, Jens Axboe, Keith Busch
In-Reply-To: <1488396132-11369-1-git-send-email-keith.busch@intel.com>

If the nvme driver is shutting down its controller, the driver will not
start the queues up again, preventing blk-mq's hot CPU notifier from
making forward progress.

To fix that, this patch starts a request_queue freeze when the driver
resets a controller so no new requests may enter. The driver waits for
the freeze to complete after the I/O queues are restarted, to ensure the
queue reference can be reinitialized when nvme requests to unfreeze the
queues.

If the driver is doing a safe shutdown, the driver will wait for the
controller to successfully complete all in-flight requests so that we
don't unnecessarily fail them. Once the controller has been disabled,
the queues will be restarted to force remaining entered requests to end
in failure so that blk-mq's hot CPU notifier may progress.
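Whether in-flight requests get a chance to complete hinges on the `dead` decision added to nvme_dev_disable(). A standalone sketch of that predicate follows; the CSTS bit positions (RDY = bit 0, CFS = bit 1) follow the NVMe specification, while `controller_dead()`, `should_wait_for_inflight()` and their boolean parameters are hypothetical simplifications of the pci_is_enabled(), readl() and pdev->error_state checks in the patch. A device that has dropped off the bus commonly reads back as all ones, which sets CFS and therefore counts as dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CSTS_RDY (1u << 0)      /* controller ready */
#define CSTS_CFS (1u << 1)      /* controller fatal status */

/* Mirrors the "dead" computation: dead stays true if PCI is disabled. */
static bool controller_dead(bool pci_enabled, uint32_t csts, bool channel_ok)
{
	if (!pci_enabled)
		return true;
	return (csts & CSTS_CFS) || !(csts & CSTS_RDY) || !channel_ok;
}

/* Only a healthy controller on a safe shutdown gets time to drain. */
static bool should_wait_for_inflight(bool pci_enabled, uint32_t csts,
				     bool channel_ok, bool shutdown)
{
	return shutdown && !controller_dead(pci_enabled, csts, channel_ok);
}
```

For a dead controller the driver skips the timed wait and goes straight to stopping the queues and cancelling requests.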

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/nvme/host/core.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  4 ++++
 drivers/nvme/host/pci.c  | 33 ++++++++++++++++++++++++++++-----
 3 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 25ec4e5..9b3b57f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2344,6 +2344,53 @@ void nvme_kill_queues(struct nvme_ctrl *ctrl)
 }
 EXPORT_SYMBOL_GPL(nvme_kill_queues);
 
+void nvme_unfreeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_unfreeze_queue(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_unfreeze);
+
+void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
+		if (timeout <= 0)
+			break;
+	}
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_wait_freeze_timeout);
+
+void nvme_wait_freeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_freeze_queue_wait(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_wait_freeze);
+
+void nvme_start_freeze(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list)
+		blk_mq_freeze_queue_start(ns->queue);
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+EXPORT_SYMBOL_GPL(nvme_start_freeze);
+
 void nvme_stop_queues(struct nvme_ctrl *ctrl)
 {
 	struct nvme_ns *ns;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a3da1e9..2aa20e3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -294,6 +294,10 @@ void nvme_queue_async_events(struct nvme_ctrl *ctrl);
 void nvme_stop_queues(struct nvme_ctrl *ctrl);
 void nvme_start_queues(struct nvme_ctrl *ctrl);
 void nvme_kill_queues(struct nvme_ctrl *ctrl);
+void nvme_unfreeze(struct nvme_ctrl *ctrl);
+void nvme_wait_freeze(struct nvme_ctrl *ctrl);
+void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
+void nvme_start_freeze(struct nvme_ctrl *ctrl);
 
 #define NVME_QID_ANY -1
 struct request *nvme_alloc_request(struct request_queue *q,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index eee8f84..26a5fd0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1675,21 +1675,34 @@ static void nvme_pci_disable(struct nvme_dev *dev)
 static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 {
 	int i, queues;
-	u32 csts = -1;
+	bool dead = true;
+	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
 	del_timer_sync(&dev->watchdog_timer);
 
 	mutex_lock(&dev->shutdown_lock);
-	if (pci_is_enabled(to_pci_dev(dev->dev))) {
-		nvme_stop_queues(&dev->ctrl);
-		csts = readl(dev->bar + NVME_REG_CSTS);
+	if (pci_is_enabled(pdev)) {
+		u32 csts = readl(dev->bar + NVME_REG_CSTS);
+
+		if (dev->ctrl.state == NVME_CTRL_LIVE)
+			nvme_start_freeze(&dev->ctrl);
+		dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
+			pdev->error_state  != pci_channel_io_normal);
 	}
 
+	/*
+	 * Give the controller a chance to complete all entered requests if
+	 * doing a safe shutdown.
+	 */
+	if (!dead && shutdown)
+		nvme_wait_freeze_timeout(&dev->ctrl, NVME_IO_TIMEOUT);
+	nvme_stop_queues(&dev->ctrl);
+
 	queues = dev->online_queues - 1;
 	for (i = dev->queue_count - 1; i > 0; i--)
 		nvme_suspend_queue(dev->queues[i]);
 
-	if (csts & NVME_CSTS_CFS || !(csts & NVME_CSTS_RDY)) {
+	if (dead) {
 		/* A device might become IO incapable very soon during
 		 * probe, before the admin queue is configured. Thus,
 		 * queue_count can be 0 here.
@@ -1704,6 +1717,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 
 	blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
 	blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
+
+	/*
+	 * The driver will not be starting up queues again if shutting down so
+	 * must flush all entered requests to their failed completion to avoid
+	 * deadlocking blk-mq hot-cpu notifier.
+	 */
+	if (shutdown)
+		nvme_start_queues(&dev->ctrl);
 	mutex_unlock(&dev->shutdown_lock);
 }
 
@@ -1826,7 +1847,9 @@ static void nvme_reset_work(struct work_struct *work)
 		nvme_remove_namespaces(&dev->ctrl);
 	} else {
 		nvme_start_queues(&dev->ctrl);
+		nvme_wait_freeze(&dev->ctrl);
 		nvme_dev_add(dev);
+		nvme_unfreeze(&dev->ctrl);
 	}
 
 	if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_LIVE)) {
-- 
2.5.5

* [PATCH 2/3] blk-mq: Provide freeze queue timeout
From: Keith Busch @ 2017-03-01 19:22 UTC
  To: Jens Axboe, Sagi Grimberg, Christoph Hellwig, linux-nvme,
	linux-block
  Cc: Marc MERLIN, Jens Axboe, Keith Busch
In-Reply-To: <1488396132-11369-1-git-send-email-keith.busch@intel.com>

A driver may wish to take corrective action if queued requests do not
complete within a set time.
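`wait_event_timeout()` returns 0 if the timeout elapsed and the remaining time (at least 1) if the condition became true, so the return value can be fed back in as the budget for the next wait. That is how nvme_wait_freeze_timeout() in patch 3/3 spreads one timeout across all namespaces. A hypothetical sketch of that chaining, where `fake_wait()` simulates a queue that needs a given number of ticks to drain; it uses `long`, matching the type `wait_event_timeout()` evaluates to.

```c
#include <assert.h>

/*
 * Simulated wait on a queue that needs `needed` ticks to drain.
 * Follows the wait_event_timeout() convention: 0 on timeout, otherwise
 * the remaining budget, never less than 1.
 */
static long fake_wait(long needed, long timeout)
{
	if (needed > timeout)
		return 0;
	return timeout - needed > 0 ? timeout - needed : 1;
}

/* Wait on each queue in turn, sharing one overall budget. Returns the
 * number of queues that drained before the budget ran out. */
static int wait_all(const long *needed, int nr, long timeout)
{
	int drained = 0;

	for (int i = 0; i < nr; i++) {
		timeout = fake_wait(needed[i], timeout);
		if (timeout <= 0)
			break;
		drained++;
	}
	return drained;
}
```

With drain times of 3, 4 and 5 ticks, a budget of 20 covers all three queues, while a budget of 10 runs out on the third.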

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.c         | 9 +++++++++
 include/linux/blk-mq.h | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8da2c04..a5e66a7 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -81,6 +81,15 @@ void blk_mq_freeze_queue_wait(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
+int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
+				     unsigned long timeout)
+{
+	return wait_event_timeout(q->mq_freeze_wq,
+					percpu_ref_is_zero(&q->q_usage_counter),
+					timeout);
+}
+EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait_timeout);
+
 /*
  * Guarantee no request is in use, so we can change any data structure of
  * the queue afterward.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8dacf68..b296a90 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -246,6 +246,8 @@ void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);
 void blk_mq_freeze_queue_start(struct request_queue *q);
 void blk_mq_freeze_queue_wait(struct request_queue *q);
+int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
+				     unsigned long timeout);
 int blk_mq_reinit_tagset(struct blk_mq_tag_set *set);
 
 int blk_mq_map_queues(struct blk_mq_tag_set *set);
-- 
2.5.5
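The new helper pairs wait_event_timeout() with percpu_ref_is_zero(), returning 0 if the queue still has users when the timeout expires. The sketch below is a rough userspace analogue. It assumes a mutex-protected int stands in for q->q_usage_counter and a POSIX condition variable stands in for q->mq_freeze_wq, and every name in it is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace stand-in for a request queue's usage counter. */
struct fake_queue {
	pthread_mutex_t lock;
	pthread_cond_t  freeze_wq;  /* plays the role of q->mq_freeze_wq */
	int             usage;      /* plays the role of q->q_usage_counter */
};

/* Drop one reference and wake waiters when the last one goes away. */
static void fake_queue_exit(struct fake_queue *q)
{
	pthread_mutex_lock(&q->lock);
	assert(q->usage > 0);
	if (--q->usage == 0)
		pthread_cond_broadcast(&q->freeze_wq);
	pthread_mutex_unlock(&q->lock);
}

/*
 * Wait up to timeout_ms for usage to reach zero. Like wait_event_timeout(),
 * return 0 if the condition is still false at the deadline, nonzero otherwise.
 */
static int fake_freeze_wait_timeout(struct fake_queue *q, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0, ret;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&q->lock);
	while (q->usage != 0 && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&q->freeze_wq, &q->lock, &deadline);
	ret = (q->usage == 0);
	pthread_mutex_unlock(&q->lock);
	return ret;
}

/* Thread body that completes the single outstanding request. */
static void *complete_request(void *arg)
{
	fake_queue_exit(arg);
	return NULL;
}
```

A caller that gets 0 back knows some request is stuck and can take corrective action, for example by tearing the controller down. This is the use case the commit message describes.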


* [PATCH 1/3] blk-mq: Export blk_mq_freeze_queue_wait
From: Keith Busch @ 2017-03-01 19:22 UTC
  To: Jens Axboe, Sagi Grimberg, Christoph Hellwig, linux-nvme,
	linux-block
  Cc: Marc MERLIN, Jens Axboe, Keith Busch
In-Reply-To: <1488396132-11369-1-git-send-email-keith.busch@intel.com>

Drivers can already start a freeze; exporting this function gives them a way
to wait until the queue is actually frozen.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 block/blk-mq.c         | 3 ++-
 include/linux/blk-mq.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 94593c6..8da2c04 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -75,10 +75,11 @@ void blk_mq_freeze_queue_start(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_start);
 
-static void blk_mq_freeze_queue_wait(struct request_queue *q)
+void blk_mq_freeze_queue_wait(struct request_queue *q)
 {
 	wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter));
 }
+EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
 /*
  * Guarantee no request is in use, so we can change any data structure of
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 001d30d..8dacf68 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -245,6 +245,7 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
 void blk_mq_freeze_queue(struct request_queue *q);
 void blk_mq_unfreeze_queue(struct request_queue *q);
 void blk_mq_freeze_queue_start(struct request_queue *q);
+void blk_mq_freeze_queue_wait(struct request_queue *q);
 int blk_mq_reinit_tagset(struct blk_mq_tag_set *set);
 
 int blk_mq_map_queues(struct blk_mq_tag_set *set);
-- 
2.5.5
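Exporting blk_mq_freeze_queue_wait() lets a driver split a freeze into two steps: start it early, then wait for in-flight requests to drain later. blk_mq_freeze_queue() does both at once. The nvme_reset_work() hunk above uses this split by calling nvme_wait_freeze() between restarting the queues and nvme_dev_add(). Below is a minimal single-threaded model of the gate. Names are hypothetical, and a non-blocking check stands in for the blocking wait. Real blk-mq makes new submitters wait in blk_queue_enter() rather than refusing them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a freeze-depth gate plus an in-flight count. */
struct gate {
	int freeze_depth;  /* > 0 while frozen; new entries are refused */
	int in_flight;     /* requests that entered before the freeze */
};

/* Try to enter; refused while a freeze is in progress. */
static bool gate_enter(struct gate *g)
{
	if (g->freeze_depth > 0)
		return false;
	g->in_flight++;
	return true;
}

/* A request that entered earlier completes. */
static void gate_exit(struct gate *g)
{
	assert(g->in_flight > 0);
	g->in_flight--;
}

/* Step 1: block new entries, but do not wait for old ones. */
static void gate_freeze_start(struct gate *g)
{
	g->freeze_depth++;
}

/* Step 2 (non-blocking stand-in): true once everything has drained. */
static bool gate_frozen(const struct gate *g)
{
	return g->freeze_depth > 0 && g->in_flight == 0;
}

/* Reopen the gate after reconfiguring. */
static void gate_unfreeze(struct gate *g)
{
	assert(g->freeze_depth > 0);
	g->freeze_depth--;
}
```

Between gate_freeze_start() and gate_frozen() returning true, the driver is free to do other work, such as restarting hardware queues so that the remaining requests can complete.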


* [PATCH 0/3] nvme suspend/resume fix
From: Keith Busch @ 2017-03-01 19:22 UTC
  To: Jens Axboe, Sagi Grimberg, Christoph Hellwig, linux-nvme,
	linux-block
  Cc: Marc MERLIN, Jens Axboe, Keith Busch

Hi Jens,

This is hopefully the last version of the fix for nvme preventing blk-mq's
CPU hotplug notifier from making forward progress. The solution requires a
couple of new blk-mq exports so the nvme driver can properly synchronize
with queue state.

Since this depends on the blk-mq parts, if you approve of the proposal I
think it would be easiest for you to take this directly into
linux-block/for-linus. Otherwise, we can send you a pull request if you
Ack the blk-mq parts.

The difference from the previous version is an update that Artur
confirmed passes hibernation on a stacked request queue. I also tested
this myself for several hours, with fio running buffered writes in the
background and rtcwake cycling suspend/resume at intervals, and fio
reported no errors.

Keith Busch (3):
  blk-mq: Export blk_mq_freeze_queue_wait
  blk-mq: Provide queue freeze wait timeout
  nvme: Complete all stuck requests

 block/blk-mq.c           | 12 +++++++++++-
 drivers/nvme/host/core.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |  4 ++++
 drivers/nvme/host/pci.c  | 33 ++++++++++++++++++++++++++++-----
 include/linux/blk-mq.h   |  3 +++
 5 files changed, 93 insertions(+), 6 deletions(-)

-- 
2.5.5


* Re: [PATCH 11/16] mmc: block: shuffle retry and error handling
From: Bartlomiej Zolnierkiewicz @ 2017-03-01 17:48 UTC
  To: Linus Walleij
  Cc: linux-mmc, Ulf Hansson, Adrian Hunter, Paolo Valente,
	Chunyan Zhang, Baolin Wang, linux-block, Jens Axboe,
	Christoph Hellwig, Arnd Bergmann
In-Reply-To: <18156581.sUHfslyV5F@amdc3058>

On Wednesday, March 01, 2017 04:52:38 PM Bartlomiej Zolnierkiewicz wrote:

> I assume that the problem got introduced even earlier,
> commit 4515dc6 ("mmc: block: shuffle retry and error
> handling") just makes it happen every time.

It appears to have been introduced by patch #6. With patch #5 the system
survived 30 consecutive boot+sync iterations (with later patches the issue
shows up within the first 12 iterations).

root@target:~# sync
[  248.801846] INFO: task mmcqd/0:128 blocked for more than 120 seconds.
[  248.806866]       Tainted: G        W       4.10.0-rc3-00113-g5750765 #2739
[  248.814051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  248.821696] mmcqd/0         D    0   128      2 0x00000000
[  248.827123] [<c06df51c>] (__schedule) from [<c06dfa24>] (schedule+0x40/0xac)
[  248.834210] [<c06dfa24>] (schedule) from [<c06e5384>] (schedule_timeout+0x148/0x220)
[  248.841912] [<c06e5384>] (schedule_timeout) from [<c06e0310>] (wait_for_common+0xb8/0x144)
[  248.850058] [<c06e0310>] (wait_for_common) from [<c0528100>] (mmc_start_areq+0x40/0x1ac)
[  248.858209] [<c0528100>] (mmc_start_areq) from [<c05376c0>] (mmc_blk_issue_rw_rq+0x78/0x314)
[  248.866599] [<c05376c0>] (mmc_blk_issue_rw_rq) from [<c0538358>] (mmc_blk_issue_rq+0x9c/0x458)
[  248.875293] [<c0538358>] (mmc_blk_issue_rq) from [<c0538868>] (mmc_queue_thread+0x98/0x180)
[  248.883789] [<c0538868>] (mmc_queue_thread) from [<c0135604>] (kthread+0xfc/0x134)
[  248.891058] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
[  248.898364] INFO: task jbd2/mmcblk0p2-:136 blocked for more than 120 seconds.
[  248.905400]       Tainted: G        W       4.10.0-rc3-00113-g5750765 #2739
[  248.912353] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  248.919923] jbd2/mmcblk0p2- D    0   136      2 0x00000000
[  248.925693] [<c06df51c>] (__schedule) from [<c06dfa24>] (schedule+0x40/0xac)
[  248.932470] [<c06dfa24>] (schedule) from [<c0294ccc>] (jbd2_journal_commit_transaction+0x1e8/0x15c4)
[  248.941552] [<c0294ccc>] (jbd2_journal_commit_transaction) from [<c0298af0>] (kjournald2+0xbc/0x264)
[  248.950608] [<c0298af0>] (kjournald2) from [<c0135604>] (kthread+0xfc/0x134)
[  248.957660] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
[  248.964860] INFO: task kworker/u16:2:730 blocked for more than 120 seconds.
[  248.971780]       Tainted: G        W       4.10.0-rc3-00113-g5750765 #2739
[  248.978673] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  248.986686] kworker/u16:2   D    0   730      2 0x00000000
[  248.991993] Workqueue: writeback wb_workfn (flush-179:0)
[  248.997230] [<c06df51c>] (__schedule) from [<c06dfa24>] (schedule+0x40/0xac)
[  249.004287] [<c06dfa24>] (schedule) from [<c06e5384>] (schedule_timeout+0x148/0x220)
[  249.011997] [<c06e5384>] (schedule_timeout) from [<c06df364>] (io_schedule_timeout+0x74/0xb0)
[  249.020451] [<c06df364>] (io_schedule_timeout) from [<c06dfd0c>] (bit_wait_io+0x10/0x58)
[  249.028545] [<c06dfd0c>] (bit_wait_io) from [<c06dff1c>] (__wait_on_bit_lock+0x74/0xd0)
[  249.036513] [<c06dff1c>] (__wait_on_bit_lock) from [<c06dffe0>] (out_of_line_wait_on_bit_lock+0x68/0x70)
[  249.046231] [<c06dffe0>] (out_of_line_wait_on_bit_lock) from [<c0293dfc>] (do_get_write_access+0x3d0/0x4c4)
[  249.055729] [<c0293dfc>] (do_get_write_access) from [<c029410c>] (jbd2_journal_get_write_access+0x38/0x64)
[  249.065336] [<c029410c>] (jbd2_journal_get_write_access) from [<c0272680>] (__ext4_journal_get_write_access+0x2c/0x68)
[  249.076016] [<c0272680>] (__ext4_journal_get_write_access) from [<c0278eb8>] (ext4_mb_mark_diskspace_used+0x64/0x474)
[  249.086515] [<c0278eb8>] (ext4_mb_mark_diskspace_used) from [<c027a334>] (ext4_mb_new_blocks+0x258/0xa1c)
[  249.096040] [<c027a334>] (ext4_mb_new_blocks) from [<c026fc80>] (ext4_ext_map_blocks+0x8b4/0xf28)
[  249.104883] [<c026fc80>] (ext4_ext_map_blocks) from [<c024f318>] (ext4_map_blocks+0x144/0x5f8)
[  249.113468] [<c024f318>] (ext4_map_blocks) from [<c0254b0c>] (mpage_map_and_submit_extent+0xa4/0x788)
[  249.122641] [<c0254b0c>] (mpage_map_and_submit_extent) from [<c02556d0>] (ext4_writepages+0x4e0/0x670)
[  249.131925] [<c02556d0>] (ext4_writepages) from [<c01a5348>] (do_writepages+0x24/0x38)
[  249.139774] [<c01a5348>] (do_writepages) from [<c0208038>] (__writeback_single_inode+0x28/0x18c)
[  249.148555] [<c0208038>] (__writeback_single_inode) from [<c02085f0>] (writeback_sb_inodes+0x1e0/0x394)
[  249.157909] [<c02085f0>] (writeback_sb_inodes) from [<c0208814>] (__writeback_inodes_wb+0x70/0xac)
[  249.166833] [<c0208814>] (__writeback_inodes_wb) from [<c02089dc>] (wb_writeback+0x18c/0x1b4)
[  249.175324] [<c02089dc>] (wb_writeback) from [<c0208c74>] (wb_workfn+0xd4/0x388)
[  249.182704] [<c0208c74>] (wb_workfn) from [<c012fdf8>] (process_one_work+0x120/0x318)
[  249.190464] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac)
[  249.198551] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134)
[  249.205904] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
[  249.213094] INFO: task sync:1403 blocked for more than 120 seconds.
[  249.219261]       Tainted: G        W       4.10.0-rc3-00113-g5750765 #2739
[  249.226220] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.234019] sync            D    0  1403   1396 0x00000000
[  249.239424] [<c06df51c>] (__schedule) from [<c06dfa24>] (schedule+0x40/0xac)
[  249.246624] [<c06dfa24>] (schedule) from [<c02078c0>] (wb_wait_for_completion+0x50/0x7c)
[  249.254538] [<c02078c0>] (wb_wait_for_completion) from [<c0207c14>] (sync_inodes_sb+0x94/0x20c)
[  249.263200] [<c0207c14>] (sync_inodes_sb) from [<c01e4dc8>] (iterate_supers+0xac/0xd4)
[  249.271056] [<c01e4dc8>] (iterate_supers) from [<c020c088>] (sys_sync+0x30/0x98)
[  249.278446] [<c020c088>] (sys_sync) from [<c01078c0>] (ret_fast_syscall+0x0/0x3c)

I also once hit another problem with patch #6 that doesn't
happen with patch #5:

[   12.121767] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[   12.129747] pgd = c0004000
[   12.132425] [00000008] *pgd=00000000
[   12.135996] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[   12.141262] Modules linked in:
[   12.144304] CPU: 0 PID: 126 Comm: mmcqd/0 Tainted: G        W       4.10.0-rc3-00113-g5750765 #2739
[   12.153296] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   12.159367] task: edd19900 task.stack: edd66000
[   12.163900] PC is at kthread_queue_work+0x18/0x64
[   12.168574] LR is at _raw_spin_lock_irqsave+0x20/0x28
[   12.173583] pc : [<c0135b24>]    lr : [<c06e6138>]    psr: 60000193
[   12.173583] sp : edd67d10  ip : 00000000  fp : edcc9b04
[   12.185014] r10: 00000000  r9 : edd6808c  r8 : edcc9b08
[   12.190215] r7 : 00000000  r6 : edc97320  r5 : edc97324  r4 : 00000008
[   12.196714] r3 : edc97000  r2 : 00000000  r1 : 0b750b74  r0 : a0000113
[   12.203216] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   12.210406] Control: 10c5387d  Table: 6d0e006a  DAC: 00000051
[   12.216125] Process mmcqd/0 (pid: 126, stack limit = 0xedd66210)
[   12.222102] Stack: (0xedd67d10 to 0xedd68000)
[   12.226444] 7d00:                                     edc97000 edd68004 edd6808c edc97000
[   12.234595] 7d20: 00000000 c0527ab8 edcc9a10 edc97000 edd68004 edd68004 edcc9b08 c0542834
[   12.242740] 7d40: edd680c0 edcc9a10 00000001 edd680f4 edd68004 c0542b5c edd67da4 edcc9a80
[   12.250886] 7d60: c0b108aa edcc9af0 edcc9af4 00000000 c0a62244 00000000 c0b02080 00000006
[   12.259031] 7d80: 00000101 c011f6f0 00000000 c0b02098 00000006 c0a622c8 c0b02080 c011edac
[   12.267176] 7da0: eea15160 00000001 00000000 00000009 ffff8f8d 00208840 eea15100 00000000
[   12.275322] 7dc0: 0000004b c0a65c20 00000000 00000001 ee818000 edd67e28 00000000 c011f1a8
[   12.283468] 7de0: 0000008c c016068c f0802000 c0b05724 f080200c 000003eb c0b17c30 f0803000
[   12.291614] 7e00: edd67e28 c0101470 c03448b8 20000013 ffffffff edd67e5c 00000000 edd66000
[   12.299759] 7e20: edd68004 c010b00c c08a2154 c0890cdc edd67e78 edd66000 00000000 c011f068
[   12.307904] 7e40: c0890cdc c08a2154 00000000 edd68030 edd68004 00000000 00000001 edd67e78
[   12.316050] 7e60: c011f068 c03448b8 20000013 ffffffff 00000051 00000000 edd68004 00000001
[   12.324195] 7e80: 00000000 00000201 edc97000 edd68004 00000001 c011f068 edc97000 c0527b8c
[   12.332340] 7ea0: 00000000 edd68004 edc97000 edd6813c 00000001 c0527d04 edd68044 edc97000
[   12.340487] 7ec0: 00000000 c0528208 edd68000 edd31800 edd48858 edd48858 ede6fe60 edd48840
[   12.348631] 7ee0: edd48840 00000001 00000000 c05376c0 00000000 00000001 00000000 00000000
[   12.356777] 7f00: 00000000 c013c5ec 00000100 ede6fe60 00000000 edd48858 edd48840 edd48840
[   12.364922] 7f20: edd31800 00000001 00000000 c0538358 edc18b50 edc97000 edd48860 00000001
[   12.373068] 7f40: edc18b50 edd48858 00000000 ede6fe60 edc18b50 edc97000 edd48860 00000001
[   12.381214] 7f60: 00000000 c0538868 edd19900 eeae0500 00000000 edd4e000 eeae0528 edd48858
[   12.389358] 7f80: edc87d14 c05387d0 00000000 c0135604 edd4e000 c0135508 00000000 00000000
[   12.397502] 7fa0: 00000000 00000000 00000000 c0107978 00000000 00000000 00000000 00000000
[   12.405647] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   12.413795] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff ffffffff
[   12.421985] [<c0135b24>] (kthread_queue_work) from [<c0527ab8>] (mmc_request_done+0xd8/0x158)
[   12.430458] [<c0527ab8>] (mmc_request_done) from [<c0542834>] (dw_mci_request_end+0xa0/0xd8)
[   12.438848] [<c0542834>] (dw_mci_request_end) from [<c0542b5c>] (dw_mci_tasklet_func+0x2f0/0x394)
[   12.447693] [<c0542b5c>] (dw_mci_tasklet_func) from [<c011f6f0>] (tasklet_action+0x84/0x12c)
[   12.456089] [<c011f6f0>] (tasklet_action) from [<c011edac>] (__do_softirq+0xec/0x244)
[   12.463885] [<c011edac>] (__do_softirq) from [<c011f1a8>] (irq_exit+0xc0/0x104)
[   12.471166] [<c011f1a8>] (irq_exit) from [<c016068c>] (__handle_domain_irq+0x70/0xe4)
[   12.478966] [<c016068c>] (__handle_domain_irq) from [<c0101470>] (gic_handle_irq+0x50/0x9c)
[   12.487280] [<c0101470>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8)
[   12.494716] Exception stack(0xedd67e28 to 0xedd67e70)
[   12.499753] 7e20:                   c08a2154 c0890cdc edd67e78 edd66000 00000000 c011f068
[   12.507902] 7e40: c0890cdc c08a2154 00000000 edd68030 edd68004 00000000 00000001 edd67e78
[   12.516039] 7e60: c011f068 c03448b8 20000013 ffffffff
[   12.521085] [<c010b00c>] (__irq_svc) from [<c03448b8>] (check_preemption_disabled+0x30/0x128)
[   12.529573] [<c03448b8>] (check_preemption_disabled) from [<c011f068>] (__local_bh_enable_ip+0xc8/0xec)
[   12.538931] [<c011f068>] (__local_bh_enable_ip) from [<c0527b8c>] (__mmc_start_request+0x54/0xdc)
[   12.547770] [<c0527b8c>] (__mmc_start_request) from [<c0527d04>] (mmc_start_request+0xf0/0x11c)
[   12.556437] [<c0527d04>] (mmc_start_request) from [<c0528208>] (mmc_start_areq+0x148/0x1ac)
[   12.564753] [<c0528208>] (mmc_start_areq) from [<c05376c0>] (mmc_blk_issue_rw_rq+0x78/0x314)
[   12.573155] [<c05376c0>] (mmc_blk_issue_rw_rq) from [<c0538358>] (mmc_blk_issue_rq+0x9c/0x458)
[   12.581733] [<c0538358>] (mmc_blk_issue_rq) from [<c0538868>] (mmc_queue_thread+0x98/0x180)
[   12.590053] [<c0538868>] (mmc_queue_thread) from [<c0135604>] (kthread+0xfc/0x134)
[   12.597603] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
[   12.604782] Code: e1a06000 e1a04001 e1a00005 eb16c17c (e5943000) 
[   12.610842] ---[ end trace 86f45842e4b0b193 ]---
[   12.615426] Kernel panic - not syncing: Fatal exception in interrupt
[   12.621786] CPU1: stopping
[   12.624455] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D W       4.10.0-rc3-00113-g5750765 #2739
[   12.633452] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   12.639567] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14)
[   12.647261] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94)
[   12.654445] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8)
[   12.661810] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c)
[   12.669344] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8)
[   12.676783] Exception stack(0xee8b3f78 to 0xee8b3fc0)
[   12.681813] 3f60:                                                       00000001 00000000
[   12.689970] 3f80: ee8b3fd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000
[   12.698113] 3fa0: 00000000 00000000 00000001 ee8b3fc8 c01083c0 c01083c4 60000013 ffffffff
[   12.706265] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c)
[   12.713653] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200)
[   12.721001] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c)
[   12.728538] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c)
[   12.735463] CPU5: stopping
[   12.738165] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G      D W       4.10.0-rc3-00113-g5750765 #2739
[   12.747156] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   12.753291] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14)
[   12.760971] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94)
[   12.768153] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8)
[   12.775515] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c)
[   12.783049] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8)
[   12.790485] Exception stack(0xee8bbf78 to 0xee8bbfc0)
[   12.795517] bf60:                                                       00000001 00000000
[   12.803673] bf80: ee8bbfd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000
[   12.811817] bfa0: 00000000 00000000 00000001 ee8bbfc8 c01083c0 c01083c4 60000013 ffffffff
[   12.819968] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c)
[   12.827350] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200)
[   12.834710] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c)
[   12.842239] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c)
[   12.849159] CPU4: stopping
[   12.851846] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G      D W       4.10.0-rc3-00113-g5750765 #2739
[   12.860840] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   12.866957] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14)
[   12.874653] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94)
[   12.881835] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8)
[   12.889197] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c)
[   12.896729] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8)
[   12.904168] Exception stack(0xee8b9f78 to 0xee8b9fc0)
[   12.909204] 9f60:                                                       00000001 00000000
[   12.917356] 9f80: ee8b9fd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000
[   12.925499] 9fa0: 00000000 00000000 00000001 ee8b9fc8 c01083c0 c01083c4 60000013 ffffffff
[   12.933655] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c)
[   12.941028] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200)
[   12.948393] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c)
[   12.955923] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c)
[   12.962842] CPU2: stopping
[   12.965520] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D W       4.10.0-rc3-00113-g5750765 #2739
[   12.974517] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   12.980621] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14)
[   12.988321] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94)
[   12.995508] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8)
[   13.002875] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c)
[   13.010409] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8)
[   13.017849] Exception stack(0xee8b5f78 to 0xee8b5fc0)
[   13.022878] 5f60:                                                       00000001 00000000
[   13.031036] 5f80: ee8b5fd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000
[   13.039182] 5fa0: 00000000 00000000 00000001 ee8b5fc8 c01083c0 c01083c4 60000013 ffffffff
[   13.047329] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c)
[   13.054703] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200)
[   13.062066] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c)
[   13.069600] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c)
[   13.076519] CPU3: stopping
[   13.079209] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D W       4.10.0-rc3-00113-g5750765 #2739
[   13.088207] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   13.094309] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14)
[   13.102010] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94)
[   13.109197] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8)
[   13.116563] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c)
[   13.124099] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8)
[   13.131537] Exception stack(0xee8b7f78 to 0xee8b7fc0)
[   13.136566] 7f60:                                                       00000001 00000000
[   13.144723] 7f80: ee8b7fd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000
[   13.152869] 7fa0: 00000000 00000000 00000001 ee8b7fc8 c01083c0 c01083c4 60000013 ffffffff
[   13.161019] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c)
[   13.168390] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200)
[   13.175754] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c)
[   13.183286] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c)
[   13.190213] CPU6: stopping
[   13.192912] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G      D W       4.10.0-rc3-00113-g5750765 #2739
[   13.201905] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   13.208022] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14)
[   13.215716] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94)
[   13.222899] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8)
[   13.230263] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c)
[   13.237796] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8)
[   13.245233] Exception stack(0xee8bdf78 to 0xee8bdfc0)
[   13.250265] df60:                                                       00000001 00000000
[   13.258422] df80: ee8bdfd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000
[   13.266567] dfa0: 00000000 00000000 00000001 ee8bdfc8 c01083c0 c01083c4 60000013 ffffffff
[   13.274720] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c)
[   13.282096] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200)
[   13.289459] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c)
[   13.296989] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c)
[   13.303908] CPU7: stopping
[   13.306603] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G      D W       4.10.0-rc3-00113-g5750765 #2739
[   13.315594] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   13.321711] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14)
[   13.329407] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94)
[   13.336587] [<c032956c>] (dump_stack) from [<c010caac>] (handle_IPI+0x170/0x1a8)
[   13.343950] [<c010caac>] (handle_IPI) from [<c01014b0>] (gic_handle_irq+0x90/0x9c)
[   13.351484] [<c01014b0>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8)
[   13.358923] Exception stack(0xee8bff78 to 0xee8bffc0)
[   13.363955] ff60:                                                       00000001 00000000
[   13.372113] ff80: ee8bffd0 c0114060 c0b05444 c0b053e4 c0a66cc8 c0b0544c c0b108a2 00000000
[   13.380256] ffa0: 00000000 00000000 00000001 ee8bffc8 c01083c0 c01083c4 60000013 ffffffff
[   13.388410] [<c010b00c>] (__irq_svc) from [<c01083c4>] (arch_cpu_idle+0x30/0x3c)
[   13.395786] [<c01083c4>] (arch_cpu_idle) from [<c0152f34>] (do_idle+0x13c/0x200)
[   13.403148] [<c0152f34>] (do_idle) from [<c015328c>] (cpu_startup_entry+0x18/0x1c)
[   13.410678] [<c015328c>] (cpu_startup_entry) from [<4010154c>] (0x4010154c)
[   13.417621] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[   13.424840] ------------[ cut here ]------------
[   13.429318] WARNING: CPU: 0 PID: 126 at kernel/workqueue.c:857 wq_worker_waking_up+0x70/0x80
[   13.437681] Modules linked in:
[   13.440727] CPU: 0 PID: 126 Comm: mmcqd/0 Tainted: G      D W       4.10.0-rc3-00113-g5750765 #2739
[   13.449728] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[   13.455823] [<c010d830>] (unwind_backtrace) from [<c010a544>] (show_stack+0x10/0x14)
[   13.463530] [<c010a544>] (show_stack) from [<c032956c>] (dump_stack+0x74/0x94)
[   13.470717] [<c032956c>] (dump_stack) from [<c011ad10>] (__warn+0xd4/0x100)
[   13.477650] [<c011ad10>] (__warn) from [<c011ad5c>] (warn_slowpath_null+0x20/0x28)
[   13.485194] [<c011ad5c>] (warn_slowpath_null) from [<c0130e70>] (wq_worker_waking_up+0x70/0x80)
[   13.493873] [<c0130e70>] (wq_worker_waking_up) from [<c013ba30>] (ttwu_do_activate+0x58/0x6c)
[   13.502355] [<c013ba30>] (ttwu_do_activate) from [<c013c4ec>] (try_to_wake_up+0x190/0x290)
[   13.510586] [<c013c4ec>] (try_to_wake_up) from [<c01521dc>] (__wake_up_common+0x4c/0x80)
[   13.518645] [<c01521dc>] (__wake_up_common) from [<c0152224>] (__wake_up_locked+0x14/0x1c)
[   13.526876] [<c0152224>] (__wake_up_locked) from [<c0152c24>] (complete+0x34/0x44)
[   13.534433] [<c0152c24>] (complete) from [<c04fcd34>] (exynos5_i2c_irq+0x220/0x26c)
[   13.542042] [<c04fcd34>] (exynos5_i2c_irq) from [<c0160dac>] (__handle_irq_event_percpu+0x58/0x140)
[   13.551048] [<c0160dac>] (__handle_irq_event_percpu) from [<c0160eb0>] (handle_irq_event_percpu+0x1c/0x58)
[   13.560664] [<c0160eb0>] (handle_irq_event_percpu) from [<c0160f24>] (handle_irq_event+0x38/0x5c)
[   13.569511] [<c0160f24>] (handle_irq_event) from [<c016422c>] (handle_fasteoi_irq+0xc4/0x19c)
[   13.578016] [<c016422c>] (handle_fasteoi_irq) from [<c0160574>] (generic_handle_irq+0x18/0x28)
[   13.586579] [<c0160574>] (generic_handle_irq) from [<c0160688>] (__handle_domain_irq+0x6c/0xe4)
[   13.595239] [<c0160688>] (__handle_domain_irq) from [<c0101470>] (gic_handle_irq+0x50/0x9c)
[   13.603556] [<c0101470>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8)
[   13.610994] Exception stack(0xedd67b30 to 0xedd67b78)
[   13.616028] 7b20:                                     00000041 edd19900 00000102 edd66000
[   13.624180] 7b40: c0b49ae8 00000000 c0881434 00000000 00000000 edd19900 60000193 edcc9b04
[   13.632321] 7b60: 00000001 edd67b80 c0196974 c0196978 20000113 ffffffff
[   13.638933] [<c010b00c>] (__irq_svc) from [<c0196978>] (panic+0x1e8/0x26c)
[   13.645769] [<c0196978>] (panic) from [<c010a7f8>] (die+0x2b0/0x2e0)
[   13.652099] [<c010a7f8>] (die) from [<c011514c>] (__do_kernel_fault.part.0+0x54/0x1e4)
[   13.659982] [<c011514c>] (__do_kernel_fault.part.0) from [<c0110bec>] (do_page_fault+0x26c/0x294)
[   13.668812] [<c0110bec>] (do_page_fault) from [<c0101308>] (do_DataAbort+0x34/0xb4)
[   13.676432] [<c0101308>] (do_DataAbort) from [<c010af78>] (__dabt_svc+0x58/0x80)
[   13.683783] Exception stack(0xedd67cc0 to 0xedd67d08)
[   13.688825] 7cc0: a0000113 0b750b74 00000000 edc97000 00000008 edc97324 edc97320 00000000
[   13.696970] 7ce0: edcc9b08 edd6808c 00000000 edcc9b04 00000000 edd67d10 c06e6138 c0135b24
[   13.705102] 7d00: 60000193 ffffffff
[   13.708586] [<c010af78>] (__dabt_svc) from [<c0135b24>] (kthread_queue_work+0x18/0x64)
[   13.716478] [<c0135b24>] (kthread_queue_work) from [<c0527ab8>] (mmc_request_done+0xd8/0x158)
[   13.724970] [<c0527ab8>] (mmc_request_done) from [<c0542834>] (dw_mci_request_end+0xa0/0xd8)
[   13.733373] [<c0542834>] (dw_mci_request_end) from [<c0542b5c>] (dw_mci_tasklet_func+0x2f0/0x394)
[   13.742211] [<c0542b5c>] (dw_mci_tasklet_func) from [<c011f6f0>] (tasklet_action+0x84/0x12c)
[   13.750614] [<c011f6f0>] (tasklet_action) from [<c011edac>] (__do_softirq+0xec/0x244)
[   13.758411] [<c011edac>] (__do_softirq) from [<c011f1a8>] (irq_exit+0xc0/0x104)
[   13.765689] [<c011f1a8>] (irq_exit) from [<c016068c>] (__handle_domain_irq+0x70/0xe4)
[   13.773486] [<c016068c>] (__handle_domain_irq) from [<c0101470>] (gic_handle_irq+0x50/0x9c)
[   13.781804] [<c0101470>] (gic_handle_irq) from [<c010b00c>] (__irq_svc+0x6c/0xa8)
[   13.789241] Exception stack(0xedd67e28 to 0xedd67e70)
[   13.794279] 7e20:                   c08a2154 c0890cdc edd67e78 edd66000 00000000 c011f068
[   13.802427] 7e40: c0890cdc c08a2154 00000000 edd68030 edd68004 00000000 00000001 edd67e78
[   13.810565] 7e60: c011f068 c03448b8 20000013 ffffffff
[   13.815603] [<c010b00c>] (__irq_svc) from [<c03448b8>] (check_preemption_disabled+0x30/0x128)
[   13.824098] [<c03448b8>] (check_preemption_disabled) from [<c011f068>] (__local_bh_enable_ip+0xc8/0xec)
[   13.833457] [<c011f068>] (__local_bh_enable_ip) from [<c0527b8c>] (__mmc_start_request+0x54/0xdc)
[   13.842297] [<c0527b8c>] (__mmc_start_request) from [<c0527d04>] (mmc_start_request+0xf0/0x11c)
[   13.850963] [<c0527d04>] (mmc_start_request) from [<c0528208>] (mmc_start_areq+0x148/0x1ac)
[   13.859278] [<c0528208>] (mmc_start_areq) from [<c05376c0>] (mmc_blk_issue_rw_rq+0x78/0x314)
[   13.867680] [<c05376c0>] (mmc_blk_issue_rw_rq) from [<c0538358>] (mmc_blk_issue_rq+0x9c/0x458)
[   13.876258] [<c0538358>] (mmc_blk_issue_rq) from [<c0538868>] (mmc_queue_thread+0x98/0x180)
[   13.884579] [<c0538868>] (mmc_queue_thread) from [<c0135604>] (kthread+0xfc/0x134)
[   13.892121] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
[   13.899292] ---[ end trace 86f45842e4b0b194 ]---

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics


* Re: [PATCH 1/8] nowait aio: Introduce IOCB_FLAG_NOWAIT
From: Goldwyn Rodrigues @ 2017-03-01 16:57 UTC
  To: Christoph Hellwig
  Cc: jack, linux-fsdevel, linux-block, linux-btrfs, linux-ext4,
	linux-xfs
In-Reply-To: <20170301155634.GA9630@infradead.org>

On 03/01/2017 09:56 AM, Christoph Hellwig wrote:
> On Wed, Mar 01, 2017 at 07:36:48AM -0800, Christoph Hellwig wrote:
>> Given that we aren't validating aio_flags in older kernels we can't
>> just add this flag as it will be a no-op in older kernels.  I think
>> we will have to add IOCB_CMD_PREADV2/IOCB_CMD_WRITEV2 opcodes that
>> properly validate all reserved fields or flags first.
>>
>> Once we do that I'd really prefer to use the same flags values
>> as preadv2/pwritev2 so that we'll only need one set of flags over
>> sync/async read/write ops.
> 
> I just took another look and we do verify that
> aio_reserved1/aio_reserved2 must be zero.  So I think we can just
> stick RWF_* into aio_reserved1 and fix that problem that way.
> 

RWF_*? Aren't those kernel-space flags? Or did you mean to say
IOCB_FLAG_*? If so, do we maintain two flag fields, aio_reserved1 (perhaps
renamed to aio_flags2) and aio_flags?

aio_reserved1 is also used to return the key for the purpose of io_cancel,
but we should be able to fetch the flags before putting the key value
there. Still, I am not comfortable using the same field for it, because it
will be overwritten when io_submit returns.

Which brings me to the next question: what is the purpose of aio_key?
Why is aio_key set to KIOCB_KEY (which is zero) every time? You are not
differentiating the requests by setting every iocb's key to zero.

-- 
Goldwyn

^ permalink raw reply

* Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock
From: Tejun Heo @ 2017-03-01 16:55 UTC (permalink / raw)
  To: Tahsin Erdogan; +Cc: Jens Axboe, linux-block, David Rientjes, linux-kernel
In-Reply-To: <CAAeU0aMMmWagAKc_nUoZj77EYiuiyhdtPZ35C4Yk6BPG-_=kxg@mail.gmail.com>

Hello,

On Tue, Feb 28, 2017 at 03:51:27PM -0800, Tahsin Erdogan wrote:
> On Tue, Feb 28, 2017 at 2:47 PM, Tejun Heo <tj@kernel.org> wrote:
> >> +     if (!blkcg_policy_enabled(q, pol)) {
> >> +             ret = -EOPNOTSUPP;
> >> +             goto fail;
> >
> > Pulling this out of the queue_lock doesn't seem safe to me.  This
> > function may end up calling into callbacks of disabled policies this
> > way.
> 
> I will move this to within the lock. To make things safe, I am also
> thinking of rechecking both blkcg_policy_enabled() and
> blk_queue_bypass() after reacquiring the locks in each iteration.
> 
> >> +             parent = blkcg_parent(blkcg);
> >> +             while (parent && !__blkg_lookup(parent, q, false)) {
> >> +                     pos = parent;
> >> +                     parent = blkcg_parent(parent);
> >> +             }
> >
> > Hmm... how about adding @new_blkg to blkg_lookup_create() and calling
> > it with non-NULL @new_blkg until it succeeds?  Wouldn't that be
> > simpler?
> >
> >> +
> >> +             new_blkg = blkg_alloc(pos, q, GFP_KERNEL);
> 
> The challenge with that approach is creating a new_blkg with the right
> blkcg before passing to blkg_lookup_create(). blkg_lookup_create()
> walks down the hierarchy and will try to fill the first missing entry
> and the preallocated new_blkg must have been created with the right
> blkcg (feel free to send a code fragment if you think I am
> misunderstanding the suggestion).

Ah, indeed, but we can split the allocation of blkg from its
initialization, right?  It's a bit more work, but then we'd be able to
do something like:


retry:
	new_blkg = alloc;
	lock;
	sanity checks;
	blkg = blkg_lookup_and_create(..., new_blkg);
	if (!blkg) {
		unlock;
		goto retry;
	}
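A userspace sketch of the retry loop above, assuming the split described here: a blank object is allocated with the lock dropped and only bound to a cgroup inside the locked lookup, so the caller never has to guess the right blkcg in advance. Every name (struct cg, struct blkg, lookup_create(), get_blkg()) is a hypothetical stand-in and a pthread mutex plays the role of the queue spinlock; this is not the blk-cgroup code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for blkcg, blkcg_gq and request_queue. */
struct cg {
	struct cg *parent;
	struct blkg *blkg;          /* NULL until created for this queue */
};

struct blkg {
	struct cg *cg;              /* bound at initialization, under the lock */
};

struct queue {
	pthread_mutex_t lock;       /* plays the role of the queue spinlock */
	bool policy_enabled;
};

/* Topmost ancestor of @cg (or @cg itself) that still lacks a blkg. */
static struct cg *first_missing(struct cg *cg)
{
	struct cg *pos = NULL;

	for (; cg; cg = cg->parent)
		if (!cg->blkg)
			pos = cg;
	return pos;
}

/* Called with q->lock held.  Initializes the blank *@prealloc for the first
 * missing level and consumes it.  Returns @cg's blkg once it exists, or NULL
 * if lower levels still need another preallocation. */
static struct blkg *lookup_create(struct cg *cg, struct blkg **prealloc)
{
	struct cg *pos;

	assert(*prealloc != NULL);
	pos = first_missing(cg);
	if (!pos)
		return cg->blkg;        /* whole chain already exists */
	(*prealloc)->cg = pos;
	pos->blkg = *prealloc;
	*prealloc = NULL;
	return pos == cg ? pos->blkg : NULL;
}

static int get_blkg(struct queue *q, struct cg *cg, struct blkg **out)
{
	struct blkg *new_blkg = NULL;
	struct blkg *blkg;

	for (;;) {
		/* Allocate with the lock dropped (GFP_KERNEL may sleep). */
		if (!new_blkg) {
			new_blkg = calloc(1, sizeof(*new_blkg));
			if (!new_blkg)
				return -ENOMEM;
		}
		pthread_mutex_lock(&q->lock);
		/* Sanity checks are repeated every time the lock is retaken. */
		if (!q->policy_enabled) {
			pthread_mutex_unlock(&q->lock);
			free(new_blkg);
			return -EOPNOTSUPP;
		}
		blkg = lookup_create(cg, &new_blkg);
		pthread_mutex_unlock(&q->lock);
		if (blkg)
			break;
	}
	free(new_blkg);             /* still set only if nothing was missing */
	*out = blkg;
	return 0;
}
```

Each pass fills one missing level, top-down, and the policy check is redone after every relock, in line with the rechecking proposed earlier in the thread.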

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH] nbd: stop leaking sockets
From: Jens Axboe @ 2017-03-01 16:53 UTC (permalink / raw)
  To: Josef Bacik, axboe, nbd-general, linux-block, kernel-team; +Cc: stable
In-Reply-To: <1488386842-16515-1-git-send-email-jbacik@fb.com>

On 03/01/2017 09:47 AM, Josef Bacik wrote:
> This was introduced in the multi-connection patch; we've been leaking
> sockets ever since.
> 
> Fixes: 9561a7a ("nbd: add multi-connection support")
> cc: stable@vger.kernel.org
> Signed-off-by: Josef Bacik <jbacik@fb.com>

Applied for this series, thanks Josef.

-- 
Jens Axboe

^ permalink raw reply

* [PATCH] nbd: stop leaking sockets
From: Josef Bacik @ 2017-03-01 16:47 UTC (permalink / raw)
  To: axboe, nbd-general, linux-block, kernel-team; +Cc: stable

This was introduced in the multi-connection patch; we've been leaking
sockets ever since.

Fixes: 9561a7a ("nbd: add multi-connection support")
cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 drivers/block/nbd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 0bf2b21..c7e93f6 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -689,8 +689,10 @@ static int nbd_clear_sock(struct nbd_device *nbd, struct block_device *bdev)
 	    nbd->num_connections) {
 		int i;
 
-		for (i = 0; i < nbd->num_connections; i++)
+		for (i = 0; i < nbd->num_connections; i++) {
+			sockfd_put(nbd->socks[i]->sock);
 			kfree(nbd->socks[i]);
+		}
 		kfree(nbd->socks);
 		nbd->socks = NULL;
 		nbd->num_connections = 0;
-- 
2.7.4
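A minimal userspace model of the leak, assuming (as the fix implies) that each nbd->socks[i] wrapper owns one reference on its socket, so freeing the wrapper alone never drops it. The demo_* names are hypothetical; demo_sock_put() stands in for sockfd_put().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted socket; only the count matters here. */
struct demo_sock {
	int refs;
};

static void demo_sock_get(struct demo_sock *s) { s->refs++; }
static void demo_sock_put(struct demo_sock *s) { assert(s->refs > 0); s->refs--; }

/* Per-connection wrapper, like nbd->socks[i], pinning one reference. */
struct demo_conn {
	struct demo_sock *sock;
};

static struct demo_conn *demo_conn_add(struct demo_sock *s)
{
	struct demo_conn *c = malloc(sizeof(*c));

	if (c) {
		demo_sock_get(s);       /* reference owned by the wrapper */
		c->sock = s;
	}
	return c;
}

/* Tear down all wrappers; @fixed adds the put that the patch introduces. */
static void demo_clear_socks(struct demo_conn **socks, int n, bool fixed)
{
	for (int i = 0; i < n; i++) {
		if (fixed)
			demo_sock_put(socks[i]->sock);
		free(socks[i]);
	}
	free(socks);
}
```

Without the put, every teardown leaves one reference per connection behind; with it, the count returns to its value before the connections were added.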

^ permalink raw reply related

* Re: [PATCH 0/13 v2] block: Fix block device shutdown related races
From: Tejun Heo @ 2017-03-01 16:26 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Dan Williams,
	Thiago Jung Bauermann, Lekshmi Pillai, NeilBrown, Omar Sandoval
In-Reply-To: <20170301153700.GJ20512@quack2.suse.cz>

Hello, Jan.

On Wed, Mar 01, 2017 at 04:37:00PM +0100, Jan Kara wrote:
> > The other thing which came to mind is that the congested->__bdi sever
> > semantics.  IIRC, that one was also to support the "bdi must go away now"
> > behavior.  As bdi is refcnted now, I think we can probably just let cong
> > hold onto the bdi rather than try to sever the ref there.
> 
> So currently I get away with __bdi not being a proper refcounted reference.
> If we were to remove the clearing of __bdi, we'd have to make it into
> refcounted reference which is slightly ugly as we need to special-case
> embedded bdi_writeback_congested structures. Maybe it will be a worthwhile
> cleanup but for now I left it alone...

Yeah, absolutely, it's an additional step that we can take later.
Nothing urgent.
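A sketch of the special case mentioned above, under an assumption about why it is needed: a separately allocated congested structure could pin its bdi with a reference, but one embedded in the bdi must not, or the bdi would hold a reference on itself and never be released. All names are hypothetical, not the backing-dev API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_bdi;

/* Hypothetical stand-in for bdi_writeback_congested. */
struct demo_cong {
	struct demo_bdi *bdi;
	bool embedded;              /* lives inside the bdi it points to */
};

struct demo_bdi {
	int refs;
	struct demo_cong embedded_cong;
};

static void demo_bdi_get(struct demo_bdi *bdi) { bdi->refs++; }
static void demo_bdi_put(struct demo_bdi *bdi) { assert(bdi->refs > 0); bdi->refs--; }

/* A separately allocated cong pins its bdi; the embedded one does not. */
static void demo_cong_init(struct demo_cong *cong, struct demo_bdi *bdi, bool embedded)
{
	cong->bdi = bdi;
	cong->embedded = embedded;
	if (!embedded)
		demo_bdi_get(bdi);
}

static void demo_cong_exit(struct demo_cong *cong)
{
	if (!cong->embedded)
		demo_bdi_put(cong->bdi);
	cong->bdi = NULL;
}
```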

Thanks.

-- 
tejun

^ permalink raw reply

* Re: [PATCH 11/16] mmc: block: shuffle retry and error handling
From: Bartlomiej Zolnierkiewicz @ 2017-03-01 15:58 UTC (permalink / raw)
  To: Linus Walleij
  Cc: linux-mmc, Ulf Hansson, Adrian Hunter, Paolo Valente,
	Chunyan Zhang, Baolin Wang, linux-block, Jens Axboe,
	Christoph Hellwig, Arnd Bergmann
In-Reply-To: <18156581.sUHfslyV5F@amdc3058>

On Wednesday, March 01, 2017 04:52:38 PM Bartlomiej Zolnierkiewicz wrote:

> I assume that the problem got introduced even earlier,
> commit 4515dc6 ("mmc: block: shuffle retry and error
> handling") just makes it happen every time.

Patch #16 makes it worse, as now I get a deadlock on boot:

[  248.801750] INFO: task kworker/2:2:113 blocked for more than 120 seconds.
[  248.807119]       Tainted: G        W       4.10.0-rc3-00123-g1bec9a6 #2726
[  248.814162] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  248.821943] kworker/2:2     D    0   113      2 0x00000000
[  248.827357] Workqueue: events_freezable mmc_rescan
[  248.832227] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac)
[  248.839123] [<c06df634>] (schedule) from [<c0527708>] (__mmc_claim_host+0x8c/0x1a0)
[  248.846851] [<c0527708>] (__mmc_claim_host) from [<c052dc54>] (mmc_attach_mmc+0xb8/0x14c)
[  248.854989] [<c052dc54>] (mmc_attach_mmc) from [<c052a124>] (mmc_rescan+0x274/0x34c)
[  248.862725] [<c052a124>] (mmc_rescan) from [<c012fdf8>] (process_one_work+0x120/0x318)
[  248.870498] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac)
[  248.878653] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134)
[  248.885934] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
[  248.893098] INFO: task jbd2/mmcblk0p2-:132 blocked for more than 120 seconds.
[  248.900092]       Tainted: G        W       4.10.0-rc3-00123-g1bec9a6 #2726
[  248.907108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  248.914904] jbd2/mmcblk0p2- D    0   132      2 0x00000000
[  248.920319] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac)
[  248.927433] [<c06df634>] (schedule) from [<c06e4f94>] (schedule_timeout+0x148/0x220)
[  248.935139] [<c06e4f94>] (schedule_timeout) from [<c06def74>] (io_schedule_timeout+0x74/0xb0)
[  248.943634] [<c06def74>] (io_schedule_timeout) from [<c06df91c>] (bit_wait_io+0x10/0x58)
[  248.951684] [<c06df91c>] (bit_wait_io) from [<c06dfd3c>] (__wait_on_bit+0x84/0xbc)
[  248.959134] [<c06dfd3c>] (__wait_on_bit) from [<c06dfe60>] (out_of_line_wait_on_bit+0x68/0x70)
[  248.968142] [<c06dfe60>] (out_of_line_wait_on_bit) from [<c0295f4c>] (jbd2_journal_commit_transaction+0x1468/0x15c4)
[  248.978397] [<c0295f4c>] (jbd2_journal_commit_transaction) from [<c0298af0>] (kjournald2+0xbc/0x264)
[  248.987514] [<c0298af0>] (kjournald2) from [<c0135604>] (kthread+0xfc/0x134)
[  248.994494] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
[  249.001714] INFO: task kworker/1:2H:134 blocked for more than 120 seconds.
[  249.008412]       Tainted: G        W       4.10.0-rc3-00123-g1bec9a6 #2726
[  249.015479] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.023094] kworker/1:2H    D    0   134      2 0x00000000
[  249.028510] Workqueue: kblockd blk_mq_run_work_fn
[  249.033330] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac)
[  249.040199] [<c06df634>] (schedule) from [<c0527708>] (__mmc_claim_host+0x8c/0x1a0)
[  249.047856] [<c0527708>] (__mmc_claim_host) from [<c053881c>] (mmc_queue_rq+0x9c/0xa8)
[  249.055736] [<c053881c>] (mmc_queue_rq) from [<c0314358>] (blk_mq_dispatch_rq_list+0xd4/0x1d0)
[  249.064316] [<c0314358>] (blk_mq_dispatch_rq_list) from [<c03145d4>] (blk_mq_process_rq_list+0x180/0x198)
[  249.073845] [<c03145d4>] (blk_mq_process_rq_list) from [<c03146a4>] (__blk_mq_run_hw_queue+0xb8/0x110)
[  249.083120] [<c03146a4>] (__blk_mq_run_hw_queue) from [<c012fdf8>] (process_one_work+0x120/0x318)
[  249.092076] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac)
[  249.099990] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134)
[  249.107322] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
[  249.114485] INFO: task kworker/5:2H:136 blocked for more than 120 seconds.
[  249.121326]       Tainted: G        W       4.10.0-rc3-00123-g1bec9a6 #2726
[  249.128232] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.136074] kworker/5:2H    D    0   136      2 0x00000000
[  249.141544] Workqueue: kblockd blk_mq_run_work_fn
[  249.146187] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac)
[  249.153419] [<c06df634>] (schedule) from [<c0527708>] (__mmc_claim_host+0x8c/0x1a0)
[  249.160825] [<c0527708>] (__mmc_claim_host) from [<c053881c>] (mmc_queue_rq+0x9c/0xa8)
[  249.168755] [<c053881c>] (mmc_queue_rq) from [<c0314358>] (blk_mq_dispatch_rq_list+0xd4/0x1d0)
[  249.177318] [<c0314358>] (blk_mq_dispatch_rq_list) from [<c03145d4>] (blk_mq_process_rq_list+0x180/0x198)
[  249.186858] [<c03145d4>] (blk_mq_process_rq_list) from [<c03146a4>] (__blk_mq_run_hw_queue+0xb8/0x110)
[  249.196124] [<c03146a4>] (__blk_mq_run_hw_queue) from [<c012fdf8>] (process_one_work+0x120/0x318)
[  249.204969] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac)
[  249.213161] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134)
[  249.220270] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
[  249.227505] INFO: task kworker/0:1H:145 blocked for more than 120 seconds.
[  249.234328]       Tainted: G        W       4.10.0-rc3-00123-g1bec9a6 #2726
[  249.241229] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.249066] kworker/0:1H    D    0   145      2 0x00000000
[  249.254521] Workqueue: kblockd blk_mq_run_work_fn
[  249.259176] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac)
[  249.266233] [<c06df634>] (schedule) from [<c0527708>] (__mmc_claim_host+0x8c/0x1a0)
[  249.274001] [<c0527708>] (__mmc_claim_host) from [<c053881c>] (mmc_queue_rq+0x9c/0xa8)
[  249.281747] [<c053881c>] (mmc_queue_rq) from [<c0314358>] (blk_mq_dispatch_rq_list+0xd4/0x1d0)
[  249.290284] [<c0314358>] (blk_mq_dispatch_rq_list) from [<c03145d4>] (blk_mq_process_rq_list+0x180/0x198)
[  249.299843] [<c03145d4>] (blk_mq_process_rq_list) from [<c03146a4>] (__blk_mq_run_hw_queue+0xb8/0x110)
[  249.309122] [<c03146a4>] (__blk_mq_run_hw_queue) from [<c012fdf8>] (process_one_work+0x120/0x318)
[  249.317951] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac)
[  249.326017] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134)
[  249.333408] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
[  249.340459] INFO: task udevd:280 blocked for more than 120 seconds.
[  249.346725]       Tainted: G        W       4.10.0-rc3-00123-g1bec9a6 #2726
[  249.353644] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.361452] udevd           D    0   280    258 0x00000005
[  249.366885] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac)
[  249.373964] [<c06df634>] (schedule) from [<c06e4f94>] (schedule_timeout+0x148/0x220)
[  249.381651] [<c06e4f94>] (schedule_timeout) from [<c06def74>] (io_schedule_timeout+0x74/0xb0)
[  249.390110] [<c06def74>] (io_schedule_timeout) from [<c0198a0c>] (__lock_page+0xe8/0x118)
[  249.398399] [<c0198a0c>] (__lock_page) from [<c01a88b0>] (truncate_inode_pages_range+0x580/0x59c)
[  249.407129] [<c01a88b0>] (truncate_inode_pages_range) from [<c01a8984>] (truncate_inode_pages+0x18/0x20)
[  249.416571] [<c01a8984>] (truncate_inode_pages) from [<c0214bf0>] (__blkdev_put+0x68/0x1d8)
[  249.424892] [<c0214bf0>] (__blkdev_put) from [<c0214ea8>] (blkdev_close+0x18/0x20)
[  249.432422] [<c0214ea8>] (blkdev_close) from [<c01e3178>] (__fput+0x84/0x1c0)
[  249.439501] [<c01e3178>] (__fput) from [<c0133d60>] (task_work_run+0xbc/0xdc)
[  249.446677] [<c0133d60>] (task_work_run) from [<c011de60>] (do_exit+0x304/0x9bc)
[  249.454152] [<c011de60>] (do_exit) from [<c011e664>] (do_group_exit+0x3c/0xbc)
[  249.461165] [<c011e664>] (do_group_exit) from [<c01278c0>] (get_signal+0x200/0x65c)
[  249.468833] [<c01278c0>] (get_signal) from [<c010ed48>] (do_signal+0x84/0x3c4)
[  249.476015] [<c010ed48>] (do_signal) from [<c010a0e4>] (do_work_pending+0xa4/0xb4)
[  249.483557] [<c010a0e4>] (do_work_pending) from [<c0107914>] (slow_work_pending+0xc/0x20)
[  249.491689] INFO: task udevd:281 blocked for more than 120 seconds.
[  249.497900]       Tainted: G        W       4.10.0-rc3-00123-g1bec9a6 #2726
[  249.504892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.512771] udevd           D    0   281    258 0x00000005
[  249.518097] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac)
[  249.525153] [<c06df634>] (schedule) from [<c06e4f94>] (schedule_timeout+0x148/0x220)
[  249.532853] [<c06e4f94>] (schedule_timeout) from [<c06def74>] (io_schedule_timeout+0x74/0xb0)
[  249.541354] [<c06def74>] (io_schedule_timeout) from [<c0198a0c>] (__lock_page+0xe8/0x118)
[  249.549463] [<c0198a0c>] (__lock_page) from [<c01a88b0>] (truncate_inode_pages_range+0x580/0x59c)
[  249.558331] [<c01a88b0>] (truncate_inode_pages_range) from [<c01a8984>] (truncate_inode_pages+0x18/0x20)
[  249.567785] [<c01a8984>] (truncate_inode_pages) from [<c0214bf0>] (__blkdev_put+0x68/0x1d8)
[  249.576207] [<c0214bf0>] (__blkdev_put) from [<c0214ea8>] (blkdev_close+0x18/0x20)
[  249.583669] [<c0214ea8>] (blkdev_close) from [<c01e3178>] (__fput+0x84/0x1c0)
[  249.590710] [<c01e3178>] (__fput) from [<c0133d60>] (task_work_run+0xbc/0xdc)
[  249.597843] [<c0133d60>] (task_work_run) from [<c011de60>] (do_exit+0x304/0x9bc)
[  249.605217] [<c011de60>] (do_exit) from [<c011e664>] (do_group_exit+0x3c/0xbc)
[  249.612399] [<c011e664>] (do_group_exit) from [<c01278c0>] (get_signal+0x200/0x65c)
[  249.620000] [<c01278c0>] (get_signal) from [<c010ed48>] (do_signal+0x84/0x3c4)
[  249.627228] [<c010ed48>] (do_signal) from [<c010a0e4>] (do_work_pending+0xa4/0xb4)
[  249.634874] [<c010a0e4>] (do_work_pending) from [<c0107914>] (slow_work_pending+0xc/0x20)
[  249.642922] INFO: task kworker/u16:2:1268 blocked for more than 120 seconds.
[  249.649891]       Tainted: G        W       4.10.0-rc3-00123-g1bec9a6 #2726
[  249.656847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  249.664654] kworker/u16:2   D    0  1268      2 0x00000000
[  249.670094] Workqueue: writeback wb_workfn (flush-179:0)
[  249.675398] [<c06df12c>] (__schedule) from [<c06df634>] (schedule+0x40/0xac)
[  249.682425] [<c06df634>] (schedule) from [<c06e4f94>] (schedule_timeout+0x148/0x220)
[  249.690103] [<c06e4f94>] (schedule_timeout) from [<c06def74>] (io_schedule_timeout+0x74/0xb0)
[  249.698738] [<c06def74>] (io_schedule_timeout) from [<c03154e4>] (bt_get+0x140/0x228)
[  249.706432] [<c03154e4>] (bt_get) from [<c03156d0>] (blk_mq_get_tag+0x24/0xa8)
[  249.713613] [<c03156d0>] (blk_mq_get_tag) from [<c03119c0>] (__blk_mq_alloc_request+0x10/0x15c)
[  249.722287] [<c03119c0>] (__blk_mq_alloc_request) from [<c0311bbc>] (blk_mq_map_request+0xb0/0xfc)
[  249.731178] [<c0311bbc>] (blk_mq_map_request) from [<c03136f0>] (blk_sq_make_request+0x8c/0x298)
[  249.739962] [<c03136f0>] (blk_sq_make_request) from [<c0308e00>] (generic_make_request+0xd8/0x180)
[  249.748891] [<c0308e00>] (generic_make_request) from [<c0308f30>] (submit_bio+0x88/0x148)
[  249.757175] [<c0308f30>] (submit_bio) from [<c0256ccc>] (ext4_io_submit+0x34/0x40)
[  249.764581] [<c0256ccc>] (ext4_io_submit) from [<c0255674>] (ext4_writepages+0x484/0x670)
[  249.772722] [<c0255674>] (ext4_writepages) from [<c01a5348>] (do_writepages+0x24/0x38)
[  249.780573] [<c01a5348>] (do_writepages) from [<c0208038>] (__writeback_single_inode+0x28/0x18c)
[  249.789359] [<c0208038>] (__writeback_single_inode) from [<c02085f0>] (writeback_sb_inodes+0x1e0/0x394)
[  249.798717] [<c02085f0>] (writeback_sb_inodes) from [<c0208814>] (__writeback_inodes_wb+0x70/0xac)
[  249.807643] [<c0208814>] (__writeback_inodes_wb) from [<c02089dc>] (wb_writeback+0x18c/0x1b4)
[  249.816241] [<c02089dc>] (wb_writeback) from [<c0208d68>] (wb_workfn+0x1c8/0x388)
[  249.823590] [<c0208d68>] (wb_workfn) from [<c012fdf8>] (process_one_work+0x120/0x318)
[  249.831375] [<c012fdf8>] (process_one_work) from [<c0130054>] (worker_thread+0x2c/0x4ac)
[  249.839408] [<c0130054>] (worker_thread) from [<c0135604>] (kthread+0xfc/0x134)
[  249.846726] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply

* Re: [PATCH 1/8] nowait aio: Introduce IOCB_FLAG_NOWAIT
From: Christoph Hellwig @ 2017-03-01 15:56 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: jack, hch, linux-fsdevel, linux-block, linux-btrfs, linux-ext4,
	linux-xfs, Goldwyn Rodrigues
In-Reply-To: <20170301153647.GA30631@infradead.org>

On Wed, Mar 01, 2017 at 07:36:48AM -0800, Christoph Hellwig wrote:
> Given that we aren't validating aio_flags in older kernels we can't
> just add this flag as it will be a no-op in older kernels.  I think
> we will have to add IOCB_CMD_PREADV2/IOCB_CMD_WRITEV2 opcodes that
> properly validate all reserved fields or flags first.
> 
> Once we do that I'd really prefer to use the same flags values
> as preadv2/pwritev2 so that we'll only need one set of flags over
> sync/async read/write ops.

I just took another look and we do verify that
aio_reserved1/aio_reserved2 must be zero.  So I think we can just
stick RWF_* into aio_reserved1 and fix that problem that way.

^ permalink raw reply

* Re: [PATCH 11/16] mmc: block: shuffle retry and error handling
From: Bartlomiej Zolnierkiewicz @ 2017-03-01 15:52 UTC (permalink / raw)
  To: Linus Walleij
  Cc: linux-mmc, Ulf Hansson, Adrian Hunter, Paolo Valente,
	Chunyan Zhang, Baolin Wang, linux-block, Jens Axboe,
	Christoph Hellwig, Arnd Bergmann
In-Reply-To: <1718299.ToPxjyb5YA@amdc3058>

On Wednesday, March 01, 2017 12:45:57 PM Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> On Tuesday, February 28, 2017 06:45:20 PM Bartlomiej Zolnierkiewicz wrote:
> > On Thursday, February 09, 2017 04:33:58 PM Linus Walleij wrote:
> > > Instead of doing retries at the same time as trying to submit new
> > > requests, do the retries when the request is reported as completed
> > > by the driver, in the finalization worker.
> > > 
> > > This is achieved by letting the core worker call back into the block
> > > layer using mmc_blk_rw_done(), that will read the status and repeatedly
> > > try to hammer the request using single request etc by calling back to
> > > the core layer using mmc_restart_areq()
> > > 
> > > The beauty of it is that the completion will not complete until the
> > > block layer has had the opportunity to hammer a bit at the card using
> > > a bunch of different approaches in the while() loop in
> > > mmc_blk_rw_done()
> > > 
> > > The algorithm for recapture, retry and handle errors is essentially
> > > identical to the one we used to have in mmc_blk_issue_rw_rq(),
> > > only augmented to get called in another path.
> > > 
> > > We have to add and initialize a pointer back to the struct mmc_queue
> > > from the struct mmc_queue_req to find the queue from the asynchronous
> > > request.
> > > 
> > > Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> > 
> > It seems that after this change we can end up queuing more
> > work for kthread from the kthread worker itself and wait
> > inside it for this nested work to complete.  I hope that
> 
> On the second look it seems that there is no waiting for
> the retried areq to complete so I cannot see what protects
> us from racing and trying to run two areq-s in parallel:
> 
> 1st areq being retried (in the completion kthread):
> 
> 	mmc_blk_rw_done()->mmc_restart_areq()->__mmc_start_data_req()
> 
> 2nd areq coming from the second request in the queue
> (in the queuing kthread):
> 
> 	mmc_blk_issue_rw_rq()->mmc_start_areq()->__mmc_start_data_req()
> 
> (after mmc_blk_rw_done() is done in mmc_finalize_areq() 1st
> areq is marked as completed by the completion kthread and
> the waiting on host->areq in mmc_start_areq() of the queuing
> kthread is done and 2nd areq is started while the 1st one
> is still being retried)
> 
> ?
> 
> Also retrying of areqs for MMC_BLK_RETRY status case got broken
> (before change do {} while() loop increased retry variable,
> now the loop is gone and retry variable will not be increased
> correctly and we can loop forever).
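For the MMC_BLK_RETRY point, a generic sketch of a bounded retry loop in which the counter advances on every pass. The status names, the bound of 5 and the callback type are hypothetical and not taken from the mmc code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_RETRIES 5          /* hypothetical bound */

enum demo_status { ST_OK, ST_RETRY, ST_ABORT };

typedef enum demo_status (*demo_attempt_fn)(void *ctx);

/* The counter advances on every pass, so a request that keeps asking for
 * a retry is eventually aborted instead of being retried forever. */
static enum demo_status demo_run_with_retries(demo_attempt_fn attempt, void *ctx,
					      int *attempts)
{
	enum demo_status st = ST_RETRY;
	int retry;

	assert(attempt != NULL);
	for (retry = 0; retry < DEMO_MAX_RETRIES; retry++) {
		st = attempt(ctx);
		if (st != ST_RETRY)
			break;
	}
	*attempts = retry < DEMO_MAX_RETRIES ? retry + 1 : DEMO_MAX_RETRIES;
	return st == ST_RETRY ? ST_ABORT : st;
}

/* Example attempt functions for the usage below. */
static enum demo_status demo_always_retry(void *ctx)
{
	(void)ctx;
	return ST_RETRY;
}

static enum demo_status demo_ok_on_third(void *ctx)
{
	int *calls = ctx;

	return ++*calls >= 3 ? ST_OK : ST_RETRY;
}
```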

There is another problem with this patch.

During boot there is a ~30 sec delay, and later I get a deadlock
when trying to run the sync command (the first thing I do after boot):

...
[    5.960623] asoc-simple-card sound: HiFi <-> 3830000.i2s mapping ok
done.
[....] Waiting for /dev to be fully populated...[   17.745887] random: crng init done
done.
[....] Activating swap...done.
[   39.767982] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
...
root@target:~# sync
[  248.801708] INFO: task udevd:287 blocked for more than 120 seconds.
[  248.806552]       Tainted: G        W       4.10.0-rc3-00118-g4515dc6 #2736
[  248.813590] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  248.821275] udevd           D    0   287    249 0x00000005
[  248.826815] [<c06df404>] (__schedule) from [<c06df90c>] (schedule+0x40/0xac)
[  248.833889] [<c06df90c>] (schedule) from [<c06e526c>] (schedule_timeout+0x148/0x220)
[  248.841598] [<c06e526c>] (schedule_timeout) from [<c06df24c>] (io_schedule_timeout+0x74/0xb0)
[  248.849993] [<c06df24c>] (io_schedule_timeout) from [<c0198a0c>] (__lock_page+0xe8/0x118)
[  248.858235] [<c0198a0c>] (__lock_page) from [<c01a88b0>] (truncate_inode_pages_range+0x580/0x59c)
[  248.867053] [<c01a88b0>] (truncate_inode_pages_range) from [<c01a8984>] (truncate_inode_pages+0x18/0x20)
[  248.876525] [<c01a8984>] (truncate_inode_pages) from [<c0214bf0>] (__blkdev_put+0x68/0x1d8)
[  248.884828] [<c0214bf0>] (__blkdev_put) from [<c0214ea8>] (blkdev_close+0x18/0x20)
[  248.892375] [<c0214ea8>] (blkdev_close) from [<c01e3178>] (__fput+0x84/0x1c0)
[  248.899383] [<c01e3178>] (__fput) from [<c0133d60>] (task_work_run+0xbc/0xdc)
[  248.906593] [<c0133d60>] (task_work_run) from [<c011de60>] (do_exit+0x304/0x9bc)
[  248.913938] [<c011de60>] (do_exit) from [<c011e664>] (do_group_exit+0x3c/0xbc)
[  248.921046] [<c011e664>] (do_group_exit) from [<c01278c0>] (get_signal+0x200/0x65c)
[  248.928776] [<c01278c0>] (get_signal) from [<c010ed48>] (do_signal+0x84/0x3c4)
[  248.935970] [<c010ed48>] (do_signal) from [<c010a0e4>] (do_work_pending+0xa4/0xb4)
[  248.943506] [<c010a0e4>] (do_work_pending) from [<c0107914>] (slow_work_pending+0xc/0x20)
[  248.951637] INFO: task sync:1398 blocked for more than 120 seconds.
[  248.957756]       Tainted: G        W       4.10.0-rc3-00118-g4515dc6 #2736
[  248.965052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  248.972681] sync            D    0  1398   1390 0x00000000
[  248.978117] [<c06df404>] (__schedule) from [<c06df90c>] (schedule+0x40/0xac)
[  248.985174] [<c06df90c>] (schedule) from [<c06dfb3c>] (schedule_preempt_disabled+0x14/0x20)
[  248.993609] [<c06dfb3c>] (schedule_preempt_disabled) from [<c06e3b18>] (__mutex_lock_slowpath+0x480/0x6ec)
[  249.003153] [<c06e3b18>] (__mutex_lock_slowpath) from [<c0215964>] (iterate_bdevs+0xb8/0x108)
[  249.011729] [<c0215964>] (iterate_bdevs) from [<c020c0ac>] (sys_sync+0x54/0x98)
[  249.018802] [<c020c0ac>] (sys_sync) from [<c01078c0>] (ret_fast_syscall+0x0/0x3c)

To be exact, the same issue also sometimes happens with the
previous commit 784da04 ("mmc: queue: simplify queue
logic"), and I also got a deadlock on boot once with commit
9a4c8a3 ("mmc: core: kill off the context info"):

...
[    5.958868] asoc-simple-card sound: HiFi <-> 3830000.i2s mapping ok
done.
[....] Waiting for /dev to be fully populated...[   16.361597] random: crng init done
done.
[  248.801776] INFO: task mmcqd/0:127 blocked for more than 120 seconds.
[  248.806795]       Tainted: G        W       4.10.0-rc3-00116-g9a4c8a3 #2735
[  248.813882] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  248.821909] mmcqd/0         D    0   127      2 0x00000000
[  248.827031] [<c06df4b4>] (__schedule) from [<c06df9bc>] (schedule+0x40/0xac)
[  248.834098] [<c06df9bc>] (schedule) from [<c06e531c>] (schedule_timeout+0x148/0x220)
[  248.841788] [<c06e531c>] (schedule_timeout) from [<c06e02a8>] (wait_for_common+0xb8/0x144)
[  248.849969] [<c06e02a8>] (wait_for_common) from [<c05280f8>] (mmc_start_areq+0x40/0x1ac)
[  248.858092] [<c05280f8>] (mmc_start_areq) from [<c0537680>] (mmc_blk_issue_rw_rq+0x78/0x314)
[  248.866485] [<c0537680>] (mmc_blk_issue_rw_rq) from [<c0538318>] (mmc_blk_issue_rq+0x9c/0x458)
[  248.875060] [<c0538318>] (mmc_blk_issue_rq) from [<c0538820>] (mmc_queue_thread+0x90/0x16c)
[  248.883383] [<c0538820>] (mmc_queue_thread) from [<c0135604>] (kthread+0xfc/0x134)
[  248.890867] [<c0135604>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)
[  248.898124] INFO: task udevd:273 blocked for more than 120 seconds.
[  248.904331]       Tainted: G        W       4.10.0-rc3-00116-g9a4c8a3 #2735
[  248.911191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  248.919057] udevd           D    0   273    250 0x00000005
[  248.924543] [<c06df4b4>] (__schedule) from [<c06df9bc>] (schedule+0x40/0xac)
[  248.931557] [<c06df9bc>] (schedule) from [<c06e531c>] (schedule_timeout+0x148/0x220)
[  248.939206] [<c06e531c>] (schedule_timeout) from [<c06df2fc>] (io_schedule_timeout+0x74/0xb0)
[  248.947770] [<c06df2fc>] (io_schedule_timeout) from [<c0198a0c>] (__lock_page+0xe8/0x118)
[  248.955916] [<c0198a0c>] (__lock_page) from [<c01a88b0>] (truncate_inode_pages_range+0x580/0x59c)
[  248.964751] [<c01a88b0>] (truncate_inode_pages_range) from [<c01a8984>] (truncate_inode_pages+0x18/0x20)
[  248.974401] [<c01a8984>] (truncate_inode_pages) from [<c0214bf0>] (__blkdev_put+0x68/0x1d8)
[  248.982593] [<c0214bf0>] (__blkdev_put) from [<c0214ea8>] (blkdev_close+0x18/0x20)
[  248.990088] [<c0214ea8>] (blkdev_close) from [<c01e3178>] (__fput+0x84/0x1c0)
[  248.997229] [<c01e3178>] (__fput) from [<c0133d60>] (task_work_run+0xbc/0xdc)
[  249.004380] [<c0133d60>] (task_work_run) from [<c011de60>] (do_exit+0x304/0x9bc)
[  249.011570] [<c011de60>] (do_exit) from [<c011e664>] (do_group_exit+0x3c/0xbc)
[  249.018732] [<c011e664>] (do_group_exit) from [<c01278c0>] (get_signal+0x200/0x65c)
[  249.026392] [<c01278c0>] (get_signal) from [<c010ed48>] (do_signal+0x84/0x3c4)
[  249.033577] [<c010ed48>] (do_signal) from [<c010a0e4>] (do_work_pending+0xa4/0xb4)
[  249.041086] [<c010a0e4>] (do_work_pending) from [<c0107914>] (slow_work_pending+0xc/0x20)

I assume that the problem was introduced even earlier;
commit 4515dc6 ("mmc: block: shuffle retry and error
handling") just makes it happen every time.

The hardware I use for testing is Odroid XU3-Lite.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply

* Re: [PATCH 7/8] nowait aio: xfs
From: Christoph Hellwig @ 2017-03-01 15:40 UTC (permalink / raw)
  To: Goldwyn Rodrigues
  Cc: jack, hch, linux-fsdevel, linux-block, linux-btrfs, linux-ext4,
	linux-xfs, Goldwyn Rodrigues
In-Reply-To: <20170228233610.25456-8-rgoldwyn@suse.de>

> @@ -528,12 +528,17 @@ xfs_file_dio_aio_write(
>  	    ((iocb->ki_pos + count) & mp->m_blockmask)) {
>  		unaligned_io = 1;
>  		iolock = XFS_IOLOCK_EXCL;
> +		if (iocb->ki_flags & IOCB_NOWAIT)
> +			return -EAGAIN;

So all unaligned I/O will return -EAGAIN?  Why?  Also please explain
that reason in a comment right here.
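A sketch of the classification the quoted hunk performs, assuming the elided first half of the condition tests the start offset: a write is unaligned if it starts or ends inside a filesystem block, and with NOWAIT set the patch returns -EAGAIN for such writes, presumably because they need the exclusive iolock. The demo_* names and the flag value are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_IOCB_NOWAIT 0x1        /* hypothetical stand-in for IOCB_NOWAIT */

/* True if the write starts or ends inside a filesystem block. */
static int demo_is_unaligned(uint64_t pos, uint64_t count, uint64_t blockmask)
{
	assert(((blockmask + 1) & blockmask) == 0);   /* block size is a power of two */
	return (pos & blockmask) || ((pos + count) & blockmask);
}

/* Unaligned direct writes take the exclusive iolock, which may block, so
 * with NOWAIT the sketch bails out early instead. */
static int demo_dio_write_precheck(uint64_t pos, uint64_t count,
				   uint64_t blockmask, int ki_flags)
{
	if (demo_is_unaligned(pos, count, blockmask) && (ki_flags & DEMO_IOCB_NOWAIT))
		return -EAGAIN;
	return 0;
}
```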

> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 1aa3abd..84f981a 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1020,6 +1020,11 @@ xfs_file_iomap_begin(
>  	if ((flags & IOMAP_REPORT) ||
>  	    (xfs_is_reflink_inode(ip) &&
>  	     (flags & IOMAP_WRITE) && (flags & IOMAP_DIRECT))) {
> +		/* Allocations due to reflinks */
> +		if ((flags & IOMAP_NOWAIT) && !(flags & IOMAP_REPORT)) {
> +			error = -EAGAIN;
> +			goto out_unlock;
> +		}

FYI, this code looks very different in current Linus' tree - I think
you're on some old kernel base.

^ permalink raw reply

* Re: [PATCH 3/8] nowait aio: return if direct write will trigger writeback
From: Christoph Hellwig @ 2017-03-01 15:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Goldwyn Rodrigues, jack, hch, linux-fsdevel, linux-block,
	linux-btrfs, linux-ext4, linux-xfs, Goldwyn Rodrigues
In-Reply-To: <20170301034606.GK16328@bombadil.infradead.org>

On Tue, Feb 28, 2017 at 07:46:06PM -0800, Matthew Wilcox wrote:
> Ugh, this is pretty inefficient.  If that's all you want to know, then
> using the radix tree directly will be far more efficient than spinning
> up all the pagevec machinery only to discard the pages found.
> 
> But what's going to kick these pages out of cache?  Shouldn't we rather
> find the pages, kick them out if clean, start writeback if not, and *then*
> return -EAGAIN?
> 
> So maybe we want to spin up the pagevec machinery after all so we can
> do that extra work?

As pointed out in the last round of these patches I think we really
need to pass a flags argument to filemap_write_and_wait_range to
communicate the non-blocking nature and only return -EAGAIN if we'd
block.  As a bonus that can indeed start to kick the pages out.
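A toy model of the flags argument proposed here: writeback is started for dirty pages in either mode, and -EAGAIN is returned only when the caller would otherwise have to wait. The page states, DEMO_WAIT_NOWAIT and demo_write_and_wait_range() are hypothetical; the flags argument to filemap_write_and_wait_range() is only being proposed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical page states; the real page cache tracks far more. */
enum demo_page { DEMO_CLEAN, DEMO_DIRTY, DEMO_WRITEBACK };

#define DEMO_WAIT_NOWAIT 0x1        /* hypothetical flag for the proposed argument */

/* Writeback is started for dirty pages in both modes, but with
 * DEMO_WAIT_NOWAIT the call returns -EAGAIN instead of waiting. */
static int demo_write_and_wait_range(enum demo_page *pages, size_t n, int flags)
{
	int must_wait = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (pages[i] == DEMO_DIRTY)
			pages[i] = DEMO_WRITEBACK;      /* start writeback */
		if (pages[i] == DEMO_WRITEBACK)
			must_wait = 1;
	}
	if (must_wait && (flags & DEMO_WAIT_NOWAIT))
		return -EAGAIN;
	/* Blocking mode: model "wait for writeback" as its completion. */
	for (size_t i = 0; i < n; i++)
		if (pages[i] == DEMO_WRITEBACK)
			pages[i] = DEMO_CLEAN;
	return 0;
}
```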

^ permalink raw reply

* Re: [PATCH] sbitmap: boundary checks for resized bitmap
From: Hannes Reinecke @ 2017-03-01 15:39 UTC (permalink / raw)
  To: Omar Sandoval; +Cc: Jens Axboe, Omar Sandoval, linux-block, Hannes Reinecke
In-Reply-To: <20170228191512.GA28004@vader.DHCP.thefacebook.com>

On 02/28/2017 08:15 PM, Omar Sandoval wrote:
> On Wed, Feb 15, 2017 at 12:10:42PM +0100, Hannes Reinecke wrote:
>> If the sbitmap gets resized we need to ensure not to overflow
>> the original allocation. And we should limit the search in
>> sbitmap_any_bit_set() to the available depth to avoid looking
>> into unused space.
> 
> Hey, Hannes, I don't really like this change. It's easy enough for the
> caller to keep track of this and check themselves if they really care. I
> even included a comment in sbitmap.h to that effect:
> 
Okay.

> /**
>  * sbitmap_resize() - Resize a &struct sbitmap.
>  * @sb: Bitmap to resize.
>  * @depth: New number of bits to resize to.
>  *
>  * Doesn't reallocate anything. It's up to the caller to ensure that the new
>  * depth doesn't exceed the depth that the sb was initialized with.
>  */
> 
> 
> As for the sbitmap_any_bit_set() change, the bits beyond the actual
> depth should all be zero, so I don't think that change is worth it,
> either.
> 
Hmm. That would be okay if we could be sure that the remaining bits really
are zero, which would probably need to be checked by the caller, too.

So yeah, if you don't like it, okay.
Just ignore it then.
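A simplified sketch of the caller-side contract discussed above. `struct toy_sbitmap`, `toy_resize()`, `toy_any_bit_set()` and `caller_resize()` are hypothetical and much simpler than the real `struct sbitmap` (a single 64-bit word instead of per-cache-line words). Resizing only updates the depth and never reallocates, so the caller remembers the initial depth and enforces the bound itself; refusing to shrink while bits above the new depth are set is an assumed policy that keeps an unbounded `any_bit_set` scan exact.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy bitmap: storage for up to 64 bits, fixed at init time. */
struct toy_sbitmap {
    uint64_t word;          /* the allocation; never reallocated */
    unsigned int depth;     /* current number of usable bits */
};

static void toy_init(struct toy_sbitmap *sb, unsigned int depth)
{
    assert(depth <= 64);
    sb->word = 0;
    sb->depth = depth;
}

static void toy_set_bit(struct toy_sbitmap *sb, unsigned int nr)
{
    assert(nr < sb->depth);
    sb->word |= UINT64_C(1) << nr;
}

/* Mirrors the documented behaviour: only updates depth, no bounds check. */
static void toy_resize(struct toy_sbitmap *sb, unsigned int depth)
{
    sb->depth = depth;
}

/* Scans the whole allocation; exact only if bits >= depth are zero. */
static bool toy_any_bit_set(const struct toy_sbitmap *sb)
{
    return sb->word != 0;
}

/* Caller-side wrapper that performs the checks the library leaves out. */
static int caller_resize(struct toy_sbitmap *sb, unsigned int init_depth,
                         unsigned int new_depth)
{
    if (new_depth > init_depth)
        return -EINVAL;     /* would overflow the original allocation */
    /* Assumed policy: don't shrink while bits above new_depth are set. */
    if (new_depth < 64 && (sb->word >> new_depth) != 0)
        return -EBUSY;
    toy_resize(sb, new_depth);
    return 0;
}
```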

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


* Re: [PATCH 1/8] nowait aio: Introduce IOCB_FLAG_NOWAIT
From: Christoph Hellwig @ 2017-03-01 15:36 UTC
  To: Goldwyn Rodrigues
  Cc: jack, hch, linux-fsdevel, linux-block, linux-btrfs, linux-ext4,
	linux-xfs, Goldwyn Rodrigues
In-Reply-To: <20170228233610.25456-2-rgoldwyn@suse.de>

On Tue, Feb 28, 2017 at 05:36:03PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> This flag tells the kernel to bail out if an AIO request would block,
> for reasons such as file allocation, triggered writeback, or
> blocking while allocating requests when performing
> direct I/O.
> 
> IOCB_FLAG_NOWAIT is translated to IOCB_NOWAIT for
> iocb->ki_flags.

Given that we aren't validating aio_flags in older kernels, we can't
just add this flag, as it will be a no-op in older kernels.  I think
we will have to add IOCB_CMD_PREADV2/IOCB_CMD_WRITEV2 opcodes that
properly validate all reserved fields or flags first.

Once we do that, I'd really prefer to use the same flag values
as preadv2/pwritev2 so that we'll only need one set of flags across
sync/async read/write ops.
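The compatibility argument can be illustrated with a small userspace sketch. The names `submit_old()`, `submit_v2()` and the `FLAG_*` bit values are hypothetical, not the actual aio ABI. An entry point that never checked unknown bits accepts a new flag and silently ignores it, so userspace cannot tell whether non-blocking behaviour is in effect; an entry point that rejects unknown bits from the start returns -EINVAL, which lets userspace detect support for any flag added later.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define FLAG_RESFD   (1u << 0)   /* a flag the old interface understood */
#define FLAG_NOWAIT  (1u << 1)   /* the newly proposed flag */
#define FLAG_FUTURE  (1u << 2)   /* some flag added after the v2 opcode */

#define V2_KNOWN_FLAGS (FLAG_RESFD | FLAG_NOWAIT)

/* Old-style submission: unknown bits are never checked, so a new
 * flag is accepted but has no effect. */
static int submit_old(uint32_t flags, int *honoured_nowait)
{
    assert(honoured_nowait);
    (void)flags;
    *honoured_nowait = 0;
    return 0;
}

/* New opcode: reject any bit it does not understand up front. */
static int submit_v2(uint32_t flags, int *honoured_nowait)
{
    assert(honoured_nowait);
    if (flags & ~V2_KNOWN_FLAGS)
        return -EINVAL;
    *honoured_nowait = (flags & FLAG_NOWAIT) != 0;
    return 0;
}
```

Here `submit_old(FLAG_NOWAIT, ...)` "succeeds" without honouring the flag, whereas `submit_v2(FLAG_FUTURE, ...)` fails loudly with -EINVAL.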

