From: Matt Mackall <mpm@selenic.com>
To: Francois Romieu <romieu@fr.zoreil.com>
Cc: Mark Broadbent <markb@wetlettuce.com>, netdev@oss.sgi.com
Subject: Re: Followup to netpoll issues
Date: Fri, 7 Jan 2005 14:07:23 -0800 [thread overview]
Message-ID: <20050107220723.GB2940@waste.org> (raw)
In-Reply-To: <20050107214254.GA17317@electric-eye.fr.zoreil.com>
On Fri, Jan 07, 2005 at 10:42:54PM +0100, Francois Romieu wrote:
> Matt Mackall <mpm@selenic.com> :
> [...]
> > printk("debug: at point A");
> > do_something_that_sends_a_packet_B();
> > printk("debug: at point C");
> >
> > Depending on locking and queueing, we might see on our network dump B,
> > A, C, and wrongly conclude that whatever did B was not between A and C.
> > That's a bad way for printk to work.
>
> I completely agree that it is not perfect.
>
> However netconsole is currently unable to guarantee that you will
> always see both A and B (currently = assuming the skb is dropped
> when netconsole fails trylock as suggested by Jamal). So there is
> an issue with the reliability of the delivery as well.
Indeed, and we can't really do much about it generally as we're on a
lossy media. But I think that's less misleading than potential
reordering. I acknowledge that packets can get reordered on the wire
potentially as well but for typical LAN segments, this will be uncommon.
> I won't push harder on the queuing side as I believe that it will
> be possible to add it as an extra choice to the user whatever form
> netconsole takes to stop deadlocking.
I might be willing to compromise on a queue depth of one. Much simpler
code, much less effort to paper over bugs that should be fixed in a
different way.
But then we've got to have a way to disable it for kgdb-over-ethernet
or netdump, where the delayed work simply won't get processed because
the box is stopped. Admittedly trying to trace your way into the
network driver is asking for a beating here, but the general issue is
that netpoll clients can stop interrupt processing and netpoll should
tell such clients that the packets its trying to send aren't going
anywhere rather than quietly scheduling them for later delivery, see?
> [...]
> > The bugs I'm talking about are identical to the xmit_lock deadlock
> > except with locks we can't see outside the driver. In other words,
>
> Right, it's clearer now. Thanks for the reminder.
>
> User space takes device's private lock -> printks -> netconsole.write
> -> hard_start_xmit -> device's private lock -> splat. Same thing from
> interrupt context (in_irq() can probably help though).
>
> So we ought to check rtnl_sem as well (dev_base_lock anyone ?).
>
> /me scratches neck...
>
> > this patch addresses the easy part of larger problem by adding a bunch
> > of complexity that doesn't help in the larger problem. To me, that's a
> > hint that it's the wrong fix.
>
> Too big. It won't bite. :o)
>
> [...]
> > > I am not convinced that people will be satisfied with a rule which
> > > states that printk _from anywhere_ are lost as soon as a CPU enters
> > > in the xmit_lock zone but, hey, it's just me.
> >
> > It should only be dropped on the CPU holding the lock, with a loud
> > warning to follow shortly.
>
> Sorry if I was not clear: "from anywhere" meant printk issued from
> any part of the kernel which can interrupt the xmit_locked section
> of a qdisc_run(), i.e. printk from irq handlers.
Hrmm. Yes, that's uglier than I realized.
Perhaps there's a way to use the existing IRQ handler reentrancy
avoidance to detect that we might be about to recurse on a lock.
> If I read correctly the suggested design, the remaining CPUs should
> loop in netpoll_send_skb() when they notice that they can not take
> the lock and that their CPU do not own it, right ?
Yep.
--
Mathematics is the supreme nostalgia of our time.
next prev parent reply other threads:[~2005-01-07 22:07 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1105045914.7687.3.camel@tigger>
[not found] ` <20050106234610.GT2940@waste.org>
[not found] ` <20050107011547.GE27896@electric-eye.fr.zoreil.com>
2005-01-07 17:01 ` Followup to netpoll issues Matt Mackall
2005-01-07 21:42 ` Francois Romieu
2005-01-07 22:07 ` Matt Mackall [this message]
2005-01-07 22:41 ` Francois Romieu
[not found] ` <20050107002053.GD27896@electric-eye.fr.zoreil.com>
2005-01-07 20:14 ` Mark Broadbent
2005-01-07 23:18 ` Francois Romieu
2005-01-10 22:03 ` Mark Broadbent
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20050107220723.GB2940@waste.org \
--to=mpm@selenic.com \
--cc=markb@wetlettuce.com \
--cc=netdev@oss.sgi.com \
--cc=romieu@fr.zoreil.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).