From: Mike Waychison <mikew@google.com>
To: "Américo Wang" <xiyou.wangcong@gmail.com>
Cc: Matt Mackall <mpm@selenic.com>, Greg KH <greg@kroah.com>,
	simon.kagstrom@netinsight.net, davem@davemloft.net,
	adurbin@google.com, akpm@linux-foundation.org, chavey@google.com,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Subject: Re: [PATCH v1 00/12] netoops support
Date: Thu, 04 Nov 2010 10:38:15 -0700
Message-ID: <4CD2EF87.2030906@google.com>
In-Reply-To: <20101104063511.GE5210@cr0.nay.redhat.com>

Américo Wang wrote:
> On Wed, Nov 03, 2010 at 06:18:41PM -0700, Mike Waychison wrote:
>> Matt Mackall wrote:
>>> On Wed, 2010-11-03 at 13:29 -0700, Mike Waychison wrote:
>>>> Mike Waychison wrote:
>>>>> FWIW, another semantic difference between netconsole and netoops (that
>>>>> I had missed in the last email) is filtering: we really do want to get
>>>>> the whole log when a crash happens, debug messages and all.
>>>>> Netconsole is subject to console filtering (which we _do_ want as
>>>>> debug messages going out the uart slows the whole world down).
>>>>>
>>>>> netconsole and netoops _do_ have bits in common, for instance the
>>>>> handling of NETDEV events and source+target configuration.  I'd rather
>>>>> those bits become common between the two than figure out how to jam
>>>>> the semantics we need into netconsole.
>>>> Hi Matt,
>>>>
>>>> I've been reading through the netconsole driver in response to
>>>> Greg's comments on this thread, and it is definitely more robust
>>>> in terms of configuration and handling of network device events
>>>> than the netoops driver I proposed.
>>> I've been following the discussion to see if it went anywhere
>>> interesting..
>>>
>>>> What are your thoughts on extending netconsole with the same sort
>>>> of semantics that are in the netoops patchset?
>>> My first thought is that it's a bit unfortunate that some of the
>>> netconsole configgy bits weren't implemented in a generic way that would
>>> be applicable to other netpoll clients. Some people have never gotten it
>>> into their heads that netconsole isn't the only client.
>>>
>>>> I'd still like to have blit-dmesg-to-the-network-on-oops
>>>> semantics, which seems doable by having a per-target flag for
>>>> streaming of console messages (enabled by default) and a flag to
>>>> emit a structured full dmesg dump (disabled by default).
>>> I'd actually like to see you go forward with netoops. It's clear to me
>>> that it's a different beast and complexifying netconsole with a bunch of
>>> weird new options doesn't really sit well. If that means abstracting
>>> some of the sysfs crap from netconsole, great.
>> I'd be happy to take a stab at this.  This solves most of the ABI
>> reservations that I have with this v1 patchset.
>>
>> Looking at netconsole, it looks to lack some locking for data
>> consistency, and it appears that we will deadlock if we ever get a
>> NETDEV_UNREGISTER event (due to recursively grabbing the rtnl in
>> netpoll_cleanup).  I have a couple patches I've been hacking on this
>> afternoon that should clear those issues up.
>>
> 
> 
> You might want to look at net-next-2.6, it has some fixes
> from Neil.

Excellent, yes, 3b410a31 fixes the recursive rtnl deadlock I was 
referring to.

> 
> 
>> I'm thinking of pushing all the target handling options down into
>> net/core/netpoll.c.  I'll probably expose this interface as "struct
>> netpoll_targets" where ->lock and ->list could be completely exposed
>> to clients.  netconsole would then get a lot smaller as would
>> netoops.
>>
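For illustration, a minimal userspace C sketch of what a shared "struct netpoll_targets" with an exposed ->lock and ->list might look like. Only the struct name and those two members come from the proposal above; the per-target fields, the helper names, and the spinlock built on atomic_flag (standing in for a kernel spinlock) are assumptions made for this sketch, not the interface that was eventually written.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-target state; field names are illustrative only. */
struct target {
	struct target *next;
	char remote_ip[16];	/* dotted-quad IPv4 address plus NUL */
	int enabled;
};

/* Userspace stand-in for the proposed shared container. */
struct netpoll_targets {
	atomic_flag lock;	/* protects ->list; stand-in for a spinlock */
	struct target *list;	/* singly linked list of targets */
};

static void targets_lock(struct netpoll_targets *nts)
{
	while (atomic_flag_test_and_set_explicit(&nts->lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void targets_unlock(struct netpoll_targets *nts)
{
	atomic_flag_clear_explicit(&nts->lock, memory_order_release);
}

static void targets_init(struct netpoll_targets *nts)
{
	assert(nts != NULL);
	atomic_flag_clear(&nts->lock);
	nts->list = NULL;
}

/* Add an enabled target at the head of the list; 0 on success. */
static int targets_add(struct netpoll_targets *nts, const char *ip)
{
	struct target *t = calloc(1, sizeof(*t));

	if (!t)
		return -1;
	snprintf(t->remote_ip, sizeof(t->remote_ip), "%s", ip);
	t->enabled = 1;

	targets_lock(nts);
	t->next = nts->list;
	nts->list = t;
	targets_unlock(nts);
	return 0;
}

/* A client (netconsole, netoops) walks ->list under ->lock. */
static int targets_count_enabled(struct netpoll_targets *nts)
{
	int n = 0;

	targets_lock(nts);
	for (struct target *t = nts->list; t; t = t->next)
		if (t->enabled)
			n++;
	targets_unlock(nts);
	return n;
}

/* Free every target and leave the list empty. */
static void targets_destroy(struct netpoll_targets *nts)
{
	targets_lock(nts);
	while (nts->list) {
		struct target *t = nts->list;

		nts->list = t->next;
		free(t);
	}
	targets_unlock(nts);
}
```

With the list handling in one place, each client would only need its own "what to send" logic and could iterate over the shared targets, which is the size reduction the paragraph above anticipates.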
>>> That said, I don't think netoops is an ideal name, given how closely
>>> bound oops _events_ are with their textual output. Presumably it covers
>>> events other than oopsen like panics too.
>> True.  We call this code 'netdump' or 'network_dumper' internally,
>> but I figured it'd be better to follow current conventions with
>> ramoops and mtdoops already in the tree.  I don't really care what
>> it's called in the end :)
>>
> 
> 
> "netdump" was used by a utility that does crash dumping over the net.
> It is deprecated now, since we have kdump.

Yup.  If you go back far enough, I think this was a gut of that code 
long long ago, hence the name.

> 
>>> Regarding rolling oopses: lots of machines regularly survive
>>> oopses, so I think you ought to consider rate-limiting them (to a
>>> configurable rate with a very low default) rather than suppressing
>>> all but the first.
>>>
>> The trouble with Oopses is just that:  We don't know whether we can
>> safely survive them or not and it's a total gamble each time we do
>> Oops.  We can't programmatically know how crapped out the machine is,
>> so historically we've erred on not allowing bad things to continue
>> happening once someone notices something wrong.
>>
>> It's easier for us to just shoot the machine in the head
>> (panic_on_oops) and move on than corrupt data or dead-lock in weird
>> ways at some later point in time.  This is definitely not the
>> behaviour I would want nor expect from my desktop or phone, but for
>> the cluster, it's just safer.
> 
> We also have pause_on_oops, or we can invent an oops_once.
