All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Ming Lei <tom.leiming@gmail.com>, Mike Snitzer <snitzer@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>,
	linux-block <linux-block@vger.kernel.org>
Subject: Re: for-next hangs on test srp/012
Date: Tue, 11 Dec 2018 18:23:31 -0700	[thread overview]
Message-ID: <05be643a-46b8-c836-11bf-005dbb0df9e4@kernel.dk> (raw)
In-Reply-To: <41d15b87-d32e-9c6d-d68e-03bd5e7b74b0@kernel.dk>

On 12/11/18 6:05 PM, Jens Axboe wrote:
> On 12/11/18 5:38 PM, Ming Lei wrote:
>> On Wed, Dec 12, 2018 at 8:28 AM Mike Snitzer <snitzer@redhat.com> wrote:
>>>
>>> On Tue, Dec 11 2018 at  7:19pm -0500,
>>> Ming Lei <tom.leiming@gmail.com> wrote:
>>>
>>>> On Wed, Dec 12, 2018 at 8:04 AM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>
>>>>> On 12/11/18 3:58 PM, Bart Van Assche wrote:
>>>>>> Hi Jens,
>>>>>>
>>>>>> If I run the following subset of blktests:
>>>>>>
>>>>>>   while :; do ./check -q srp && ./check -q nvmeof-mp; done
>>>>>>
>>>>>> against today's for-next branch (commit dd2bf2df85a7) then after some
>>>>>> time the following hang is reported:
>>>>>>
>>>>>> INFO: task fio:14869 blocked for more than 120 seconds.
>>>>>>       Not tainted 4.20.0-rc6-dbg+ #1
>>>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>>> fio             D25272 14869  14195 0x00000000
>>>>>> Call Trace:
>>>>>>  __schedule+0x401/0xe50
>>>>>>  schedule+0x4e/0xd0
>>>>>>  io_schedule+0x21/0x50
>>>>>>  blk_mq_get_tag+0x46d/0x640
>>>>>>  blk_mq_get_request+0x7c0/0xa00
>>>>>>  blk_mq_make_request+0x241/0xa70
>>>>>>  generic_make_request+0x411/0x950
>>>>>>  submit_bio+0x9b/0x250
>>>>>>  blkdev_direct_IO+0x7fb/0x870
>>>>>>  generic_file_direct_write+0x119/0x210
>>>>>>  __generic_file_write_iter+0x11c/0x280
>>>>>>  blkdev_write_iter+0x13c/0x220
>>>>>>  aio_write+0x204/0x310
>>>>>>  io_submit_one+0x9c6/0xe70
>>>>>>  __x64_sys_io_submit+0x115/0x340
>>>>>>  do_syscall_64+0x71/0x210
>>>>>>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>>>>>
>>>>>> When that hang occurs my list-pending-block-requests script does not show
>>>>>> any pending requests:
>>>>>>
>>>>>> # list-pending-block-requests
>>>>>> dm-0
>>>>>> loop0
>>>>>> loop1
>>>>>> loop2
>>>>>> loop3
>>>>>> loop4
>>>>>> loop5
>>>>>> loop6
>>>>>> loop7
>>>>>> nullb0
>>>>>> nullb1
>>>>>> sda
>>>>>> sdb
>>>>>> sdc
>>>>>> sdd
>>>>>> vda
>>>>>> vdb
>>>>>>
>>>>>> Enabling fail_if_no_path mode did not resolve the hang so I don't think
>>>>>> that the root cause is in any of the dm drivers used in this test:
>>>>>>
>>>>>> # dmsetup ls | while read dm rest; do dmsetup message $dm 0 fail_if_no_path; done; dmsetup remove_all; dmsetup table
>>>>>> 360014056e756c6c62300000000000000: 0 65536 multipath 0 1 alua 1 1 service-time 0 1 2 8:16 1 1
>>>>>>
>>>>>> The same test passes against kernel v4.20-rc6.
>>>>>
>>>>> What device is this being run on?
>>>>
>>>> I saw this issue on usb storage too.
>>>>
>>>> Seems it is introduced by commit ea86ea2cdced ("sbitmap: ammortize cost of
>>>> clearing bits"). When the IO hang happens, .cleared is 2, and .busy is 0 on
>>>> the sched_tag's sbitmap queue.
>>>
>>> You saw this running the same tests as Bart?
>>
>> Not the srp test as done by Bart, I just run 'parted' test on usb storage disk,
>> see the attached test script.
>>
>> Mostly it can be triggered in one run, sometimes it needs more.
> 
> I'll take a look. The ->cleared doesn't make sense for QD=1, or on
> one word in general. But I'd like to try and understand why it hangs.
> 
> Are you using a scheduler?

OK, I think I see what it is, the shallow is missing the deferred clear.
On top of this, probably worth to check at what depths deferred starts
to make sense. For QD == 1, definitely not. But that should be on top of
the fix.

Can you try this one?


diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index 2261136ae067..d98ba7af6bce 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -20,6 +20,47 @@
 #include <linux/sbitmap.h>
 #include <linux/seq_file.h>
 
+/*
+ * See if we have deferred clears that we can batch move
+ */
+static inline bool sbitmap_deferred_clear(struct sbitmap *sb, int index)
+{
+	unsigned long mask, val;
+	unsigned long __maybe_unused flags;
+	bool ret = false;
+
+	/* Silence bogus lockdep warning */
+#if defined(CONFIG_LOCKDEP)
+	local_irq_save(flags);
+#endif
+	spin_lock(&sb->map[index].swap_lock);
+
+	if (!sb->map[index].cleared)
+		goto out_unlock;
+
+	/*
+	 * First get a stable cleared mask, setting the old mask to 0.
+	 */
+	do {
+		mask = sb->map[index].cleared;
+	} while (cmpxchg(&sb->map[index].cleared, mask, 0) != mask);
+
+	/*
+	 * Now clear the masked bits in our free word
+	 */
+	do {
+		val = sb->map[index].word;
+	} while (cmpxchg(&sb->map[index].word, val, val & ~mask) != val);
+
+	ret = true;
+out_unlock:
+	spin_unlock(&sb->map[index].swap_lock);
+#if defined(CONFIG_LOCKDEP)
+	local_irq_restore(flags);
+#endif
+	return ret;
+}
+
 int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift,
 		      gfp_t flags, int node)
 {
@@ -70,6 +111,9 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth)
 	unsigned int bits_per_word = 1U << sb->shift;
 	unsigned int i;
 
+	for (i = 0; i < sb->map_nr; i++)
+		sbitmap_deferred_clear(sb, i);
+
 	sb->depth = depth;
 	sb->map_nr = DIV_ROUND_UP(sb->depth, bits_per_word);
 
@@ -112,47 +156,6 @@ static int __sbitmap_get_word(unsigned long *word, unsigned long depth,
 	return nr;
 }
 
-/*
- * See if we have deferred clears that we can batch move
- */
-static inline bool sbitmap_deferred_clear(struct sbitmap *sb, int index)
-{
-	unsigned long mask, val;
-	unsigned long __maybe_unused flags;
-	bool ret = false;
-
-	/* Silence bogus lockdep warning */
-#if defined(CONFIG_LOCKDEP)
-	local_irq_save(flags);
-#endif
-	spin_lock(&sb->map[index].swap_lock);
-
-	if (!sb->map[index].cleared)
-		goto out_unlock;
-
-	/*
-	 * First get a stable cleared mask, setting the old mask to 0.
-	 */
-	do {
-		mask = sb->map[index].cleared;
-	} while (cmpxchg(&sb->map[index].cleared, mask, 0) != mask);
-
-	/*
-	 * Now clear the masked bits in our free word
-	 */
-	do {
-		val = sb->map[index].word;
-	} while (cmpxchg(&sb->map[index].word, val, val & ~mask) != val);
-
-	ret = true;
-out_unlock:
-	spin_unlock(&sb->map[index].swap_lock);
-#if defined(CONFIG_LOCKDEP)
-	local_irq_restore(flags);
-#endif
-	return ret;
-}
-
 static int sbitmap_find_bit_in_index(struct sbitmap *sb, int index,
 				     unsigned int alloc_hint, bool round_robin)
 {
@@ -215,6 +218,7 @@ int sbitmap_get_shallow(struct sbitmap *sb, unsigned int alloc_hint,
 	index = SB_NR_TO_INDEX(sb, alloc_hint);
 
 	for (i = 0; i < sb->map_nr; i++) {
+again:
 		nr = __sbitmap_get_word(&sb->map[index].word,
 					min(sb->map[index].depth, shallow_depth),
 					SB_NR_TO_BIT(sb, alloc_hint), true);
@@ -223,6 +227,9 @@ int sbitmap_get_shallow(struct sbitmap *sb, unsigned int alloc_hint,
 			break;
 		}
 
+		if (sbitmap_deferred_clear(sb, index))
+			goto again;
+
 		/* Jump to next index. */
 		index++;
 		alloc_hint = index << sb->shift;

-- 
Jens Axboe


  reply	other threads:[~2018-12-12  1:23 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-11 22:58 for-next hangs on test srp/012 Bart Van Assche
2018-12-11 23:05 ` Mike Snitzer
2018-12-11 23:09   ` Bart Van Assche
2018-12-11 23:09     ` Bart Van Assche
2018-12-12  0:22     ` Mike Snitzer
2018-12-12  0:02 ` Jens Axboe
2018-12-12  0:18   ` Bart Van Assche
2018-12-12  0:19   ` Ming Lei
2018-12-12  0:27     ` Mike Snitzer
2018-12-12  0:38       ` Ming Lei
2018-12-12  1:05         ` Jens Axboe
2018-12-12  1:23           ` Jens Axboe [this message]
2018-12-12  1:36             ` Jens Axboe
2018-12-12  1:43               ` Ming Lei
2018-12-12  1:44                 ` Jens Axboe
2018-12-12  1:49                 ` Ming Lei
2018-12-12  2:03                   ` Ming Lei
2018-12-12  2:25                     ` Jens Axboe
2018-12-12  4:28                       ` Ming Lei
2018-12-12  1:37             ` Ming Lei
2018-12-12  1:39               ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=05be643a-46b8-c836-11bf-005dbb0df9e4@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=bvanassche@acm.org \
    --cc=linux-block@vger.kernel.org \
    --cc=snitzer@redhat.com \
    --cc=tom.leiming@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.