From: Jens Axboe <axboe@kernel.dk>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Jakub Kicinski <kuba@kernel.org>,
	Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, pabeni@redhat.com,
	bpf@vger.kernel.org, Tejun Heo <tj@kernel.org>
Subject: Re: [GIT PULL] Networking for v6.9
Date: Tue, 12 Mar 2024 15:40:07 -0600
Message-ID: <39c3c4dc-d852-40b3-a662-6202c5422acf@kernel.dk>
In-Reply-To: <CAHk-=wiOaBLqarS2uFhM1YdwOvCX4CZaWkeyNDY1zONpbYw2ig@mail.gmail.com>

On 3/12/24 3:11 PM, Linus Torvalds wrote:
> On Tue, 12 Mar 2024 at 13:47, Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> With your tree as of 65d287c7eb1d it gets to prompt but dies soon after
>> when prod services kick in (dunno what rpm Kdump does but says iocost
>> so adding Tejun):
> 
> Both of your traces are timers that seem to either lock up in ioc_now():
> 
>    https://lore.kernel.org/all/20240312133427.1a744844@kernel.org/
> 
> and now it looks like ioc_timer_fn():
> 
>   https://lore.kernel.org/all/20240312134739.248e6bd3@kernel.org/
> 
> But in neither case does it actually look like it's a lockup on a *lock*.
> 
> IOW, the NMI isn't happening on some spin_lock sequence or anything like that.
> 
> Yes, ioc_now() could have been looping on the seq read-lock if the
> sequence number was odd. But the writers do seem to be done with
> interrupts disabled, plus then you wouldn't have this lockup in
> ioc_timer_fn, so it's probably not that.
> 
> And yes, ioc_timer_fn() does take locks, but again, that doesn't seem
> to be where it is hanging.
> 
> So it smells like it's an endless loop in ioc_timer_fn() to me, or
> perhaps retriggering the timer itself infinitely.
> 
> Which would then explain both of those traces (that endless loop would
> call ioc_now() as part of it).
> 
> The blk-iocost.c code itself hasn't changed, but the timer code has
> gone through big changes.
> 
> That said, there's a more blk-related change: da4c8c3d0975 ("block:
> cache current nsec time in struct blk_plug").
> 
> *And* your second dump is from that
> 
>         period_vtime = now.vnow - ioc->period_at_vtime;
>         if (WARN_ON_ONCE(!period_vtime)) {
> 
> so it smells like the blk-iocost code is just completely confused by
> the time caching. Jens?
> 
> Jakub, it might be worth seeing if just reverting that commit
> da4c8c3d0975 makes the problem go away. Otherwise a bisect might be
> needed...

Hmm, I wonder if the below will fix it. At least from the timer side,
we should not be using the cached clock.


diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 9a85bfbbc45a..646b50e1c914 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -1044,7 +1044,7 @@ static void ioc_now(struct ioc *ioc, struct ioc_now *now)
 	unsigned seq;
 	u64 vrate;
 
-	now->now_ns = blk_time_get_ns();
+	now->now_ns = ktime_get_ns();
 	now->now = ktime_to_us(now->now_ns);
 	vrate = atomic64_read(&ioc->vtime_rate);
 

-- 
Jens Axboe

