* Regression in current git - Network Manager fails (bisected)
@ 2007-10-22 0:58 Joseph Fannin
2007-10-22 9:22 ` Denis V. Lunev
0 siblings, 1 reply; 8+ messages in thread
From: Joseph Fannin @ 2007-10-22 0:58 UTC (permalink / raw)
To: netdev; +Cc: Denis V. Lunev, David S. Miller, Alexey Kuznetsov
Network Manager (the freedesktop.org one) fails to work with Linus's
current git on a couple of different boxes I have here. All the boxes
have different NIC types, with different drivers.
I've bisected it down to cd40b7d3983c708aabe3d3008ec64ffce56d33b0 ,
"[NET]: make netlink user -> kernel interface synchronious". I've
double checked this by testing the kernel as of the immediately
previous commit; Network Manager works with that one, as it did on all
my machines in 2.6.23-mm1.
The netlink change seems to confuse N-M: it somehow decides that
there's no link beat, so it doesn't try to bring up the interface. If I
run "ifconfig eth0 up", N-M decides there's a carrier after all
and takes over. Ethtool detects the link state correctly even with
the interface down.
If I down the interface again with ifconfig, N-M brings it right back
up without a problem, but if I kill N-M, it'll down the interface
before it exits, and fail in the same way as before when restarted.
N-M also emits this error:
"-- Error: Invalid message: type=DONE length=20 flags=<MULTI> sequence-nr=1193012574 pid=1185943630"
...which it doesn't do on kernels where it works normally.
strace'ing NetworkManager shows that it prints that message just after
talking over a netlink socket.
Networking otherwise works fine here with the latest git and N-M, if I
use the ifconfig "trick" to get the link up.
--
Joseph Fannin
jfannin@gmail.com
* Re: Regression in current git - Network Manager fails (bisected)
2007-10-22 0:58 Regression in current git - Network Manager fails (bisected) Joseph Fannin
@ 2007-10-22 9:22 ` Denis V. Lunev
2007-10-22 15:57 ` Dan Williams
0 siblings, 1 reply; 8+ messages in thread
From: Denis V. Lunev @ 2007-10-22 9:22 UTC (permalink / raw)
To: netdev, Denis V. Lunev, David S. Miller, Alexey Kuznetsov
Alexey and I have spent some time on the problem, and we have no
guesses for now.
Could you name the exact version of Network Manager and all
related libraries, and provide us with strace output showing the full
buffers sent and received over netlink? Something like
strace -v -x -s 32768 <nm>
Regards,
Den
Joseph Fannin wrote:
> Network Manager (the freedesktop.org one) fails to work with Linus's
> current git on a couple of different boxes I have here. All the boxes
> have different NIC types, with different drivers.
>
> I've bisected it down to cd40b7d3983c708aabe3d3008ec64ffce56d33b0 ,
> "[NET]: make netlink user -> kernel interface synchronious". I've
> double checked this by testing the kernel as of the immediately
> previous commit; Network Manager works with that one, as it did on all
> my machines in 2.6.23-mm1.
>
> The netlink change seems to confuse N-M, and it somehow decides that
> there's no link beat, so doesn't try to bring up the interface. If I
> run "ifconfig eth0 up", N-M will decide there's a carrier after all
> and takes over. Ethtool detects the link state correctly even with
> the interface down.
>
> If I down the interface again with ifconfig, N-M brings it right back
> up without a problem, but if I kill N-M, it'll down the interface
> before it exits, and fail in the same way as before when restarted.
>
> N-M also emits this error:
>
> "-- Error: Invalid message: type=DONE length=20 flags=<MULTI> sequence-nr=1193012574 pid=1185943630"
>
> ...which it doesn't do on kernels where it works normally.
> strace'ing NetworkManager shows that it prints that message just after
> talking over a netlink socket.
>
> Networking otherwise works fine here with the latest git and N-M, if I
> use the ifconfig "trick" to get the link up.
>
> --
> Joseph Fannin
> jfannin@gmail.com
>
>
* Re: Regression in current git - Network Manager fails (bisected)
2007-10-22 9:22 ` Denis V. Lunev
@ 2007-10-22 15:57 ` Dan Williams
2007-10-23 12:11 ` Thomas Graf
0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2007-10-22 15:57 UTC (permalink / raw)
To: Denis V. Lunev; +Cc: netdev, Denis V. Lunev, David S. Miller, Alexey Kuznetsov
On Mon, 2007-10-22 at 13:22 +0400, Denis V. Lunev wrote:
> We have spent some time with the problem with Alexey and there are no
> guesses for now.
>
> Is it possible to name exact version of Network Manager and all
> libraries related + provide us an output of strace with full buffers
> send/received from netlink. Something like
> strace -v -x -s 32768 <nm>
NM uses netlink in two places: libnl (from Thomas Graf) and some custom
code for listening for interface up/down events and wireless events.
It looks like that error message comes from libnl's lib/handlers.c, where
it thinks the received message is invalid.
I'm pretty sure the code that checks carrier status of the device isn't
libnl code; so maybe the error message (which should get fixed of
course) isn't in the same path as the link detection.
The link detection comes from src/nm-netlink-monitor.c, so maybe we
should look at debugging there.
Dan
> Regards,
> Den
>
> Joseph Fannin wrote:
> > Network Manager (the freedesktop.org one) fails to work with Linus's
> > current git on a couple of different boxes I have here. All the boxes
> > have different NIC types, with different drivers.
> >
> > I've bisected it down to cd40b7d3983c708aabe3d3008ec64ffce56d33b0 ,
> > "[NET]: make netlink user -> kernel interface synchronious". I've
> > double checked this by testing the kernel as of the immediately
> > previous commit; Network Manager works with that one, as it did on all
> > my machines in 2.6.23-mm1.
> >
> > The netlink change seems to confuse N-M, and it somehow decides that
> > there's no link beat, so doesn't try to bring up the interface. If I
> > run "ifconfig eth0 up", N-M will decide there's a carrier after all
> > and takes over. Ethtool detects the link state correctly even with
> > the interface down.
> >
> > If I down the interface again with ifconfig, N-M brings it right back
> > up without a problem, but if I kill N-M, it'll down the interface
> > before it exits, and fail in the same way as before when restarted.
> >
> > N-M also emits this error:
> >
> > "-- Error: Invalid message: type=DONE length=20 flags=<MULTI> sequence-nr=1193012574 pid=1185943630"
> >
> > ...which it doesn't do on kernels where it works normally.
> > strace'ing NetworkManager shows that it prints that message just after
> > talking over a netlink socket.
> >
> > Networking otherwise works fine here with the latest git and N-M, if I
> > use the ifconfig "trick" to get the link up.
> >
> > --
> > Joseph Fannin
> > jfannin@gmail.com
> >
> >
* Re: Regression in current git - Network Manager fails (bisected)
2007-10-22 15:57 ` Dan Williams
@ 2007-10-23 12:11 ` Thomas Graf
2007-10-23 13:09 ` Denis V. Lunev
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Graf @ 2007-10-23 12:11 UTC (permalink / raw)
To: Dan Williams
Cc: Denis V. Lunev, netdev, Denis V. Lunev, David S. Miller,
Alexey Kuznetsov
* Dan Williams <dcbw@redhat.com> 2007-10-22 11:57
> On Mon, 2007-10-22 at 13:22 +0400, Denis V. Lunev wrote:
> > We have spent some time with the problem with Alexey and there are no
> > guesses for now.
> >
> > Is it possible to name exact version of Network Manager and all
> > libraries related + provide us an output of strace with full buffers
> > send/received from netlink. Something like
> > strace -v -x -s 32768 <nm>
>
> NM uses netlink in two places; libnl (from Thomas Graf) and some custom
> code for listening for interface up/down events and wireless events.
>
> It looks like that code comes from libnl's lib/handlers.c where it
> thinks the received message is invalid.
>
> I'm pretty sure the code that checks carrier status of the device isn't
> libnl code; so maybe the error message (which should get fixed of
> course) isn't in the same path as the link detection.
>
> The link detection comes from src/nm-netlink-monitor.c, so maybe we
> should look at debugging there.
The patch introduced a change in semantics because it removed the
special ACK handling after a dump was started.
I will look into this.
* Re: Regression in current git - Network Manager fails (bisected)
2007-10-23 12:11 ` Thomas Graf
@ 2007-10-23 13:09 ` Denis V. Lunev
2007-10-23 13:38 ` Thomas Graf
0 siblings, 1 reply; 8+ messages in thread
From: Denis V. Lunev @ 2007-10-23 13:09 UTC (permalink / raw)
To: Thomas Graf
Cc: Dan Williams, netdev, Denis V. Lunev, David S. Miller,
Alexey Kuznetsov
Thomas Graf wrote:
> * Dan Williams <dcbw@redhat.com> 2007-10-22 11:57
>> On Mon, 2007-10-22 at 13:22 +0400, Denis V. Lunev wrote:
>>> We have spent some time with the problem with Alexey and there are no
>>> guesses for now.
>>>
>>> Is it possible to name exact version of Network Manager and all
>>> libraries related + provide us an output of strace with full buffers
>>> send/received from netlink. Something like
>>> strace -v -x -s 32768 <nm>
>> NM uses netlink in two places; libnl (from Thomas Graf) and some custom
>> code for listening for interface up/down events and wireless events.
>>
>> It looks like that code comes from libnl's lib/handlers.c where it
>> thinks the received message is invalid.
>>
>> I'm pretty sure the code that checks carrier status of the device isn't
>> libnl code; so maybe the error message (which should get fixed of
>> course) isn't in the same path as the link detection.
>>
>> The link detection comes from src/nm-netlink-monitor.c, so maybe we
>> should look at debugging there.
>
> The patch introduced a change in semantics because it removed the
> special ACK handling after a dump was started.
>
> I will look into this.
>
I have reproduced the problem with a one-line test.
./nl-route-get 192.168.1.1
The problem is with this message:
-- Debug: Sent Message:
-------------------------- BEGIN NETLINK MESSAGE ---------------------------
[HEADER] 16 octets
.nlmsg_len = 20
.nlmsg_type = 18 <route/link>
.nlmsg_flags = 773 <REQUEST,ACK,ROOT,MATCH>
.nlmsg_seq = 1193143772
.nlmsg_pid = 8233
[PAYLOAD] 16 octets
00 1d fa 20 00 00 00 00 81 0e 02 00 00 00 00 00 ... ............
--------------------------- END NETLINK MESSAGE ---------------------------
It starts a dump and requests an ACK.
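For reference, the flag value 773 in the dump above can be decoded by hand from the bit definitions in <linux/netlink.h>. A minimal Python sketch follows; the helper names decode_nlmsg_flags and is_dump_with_ack are made up for illustration and are not part of libnl or the kernel.

```python
# Request flag bits as defined in <linux/netlink.h> (GET-request modifiers).
NLM_F_FLAGS = {
    0x001: "REQUEST",
    0x002: "MULTI",
    0x004: "ACK",
    0x008: "ECHO",
    0x100: "ROOT",
    0x200: "MATCH",
    0x400: "ATOMIC",
}
NLM_F_ACK = 0x004
NLM_F_DUMP = 0x100 | 0x200  # NLM_F_ROOT | NLM_F_MATCH


def decode_nlmsg_flags(flags):
    """Return the names of the known flag bits set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(NLM_F_FLAGS.items()) if flags & bit]


def is_dump_with_ack(flags):
    """True when the request both starts a dump and asks for an ACK."""
    return (flags & NLM_F_DUMP) == NLM_F_DUMP and bool(flags & NLM_F_ACK)


# Hypothetical usage with the value from the debug output above.
print(decode_nlmsg_flags(773))
print(is_dump_with_ack(773))
```

773 is 0x305, i.e. REQUEST (0x001) + ACK (0x004) + ROOT (0x100) + MATCH (0x200), which matches the names libnl printed.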
Regards,
Den
* Re: Regression in current git - Network Manager fails (bisected)
2007-10-23 13:09 ` Denis V. Lunev
@ 2007-10-23 13:38 ` Thomas Graf
2007-10-23 14:10 ` Dan Williams
0 siblings, 1 reply; 8+ messages in thread
From: Thomas Graf @ 2007-10-23 13:38 UTC (permalink / raw)
To: Denis V. Lunev
Cc: Dan Williams, netdev, Denis V. Lunev, David S. Miller,
Alexey Kuznetsov
* Denis V. Lunev <den@sw.ru> 2007-10-23 17:09
> I have reproduced the problem with one-line test.
> ./nl-route-get 192.168.1.1
> The problem is with this message:
>
> -- Debug: Sent Message:
> -------------------------- BEGIN NETLINK MESSAGE
> ---------------------------
> [HEADER] 16 octets
> .nlmsg_len = 20
> .nlmsg_type = 18 <route/link>
> .nlmsg_flags = 773 <REQUEST,ACK,ROOT,MATCH>
> .nlmsg_seq = 1193143772
> .nlmsg_pid = 8233
> [PAYLOAD] 16 octets
> 00 1d fa 20 00 00 00 00 81 0e 02 00 00 00 00 00 ... ............
> --------------------------- END NETLINK MESSAGE
> ---------------------------
> it starts dump and requests ACK.
libnl sets the ACK bit for all requests unless the application
disables this behaviour.
* Re: Regression in current git - Network Manager fails (bisected)
2007-10-23 13:38 ` Thomas Graf
@ 2007-10-23 14:10 ` Dan Williams
2007-10-25 9:06 ` Thomas Graf
0 siblings, 1 reply; 8+ messages in thread
From: Dan Williams @ 2007-10-23 14:10 UTC (permalink / raw)
To: Thomas Graf
Cc: Denis V. Lunev, netdev, Denis V. Lunev, David S. Miller,
Alexey Kuznetsov
On Tue, 2007-10-23 at 15:38 +0200, Thomas Graf wrote:
> * Denis V. Lunev <den@sw.ru> 2007-10-23 17:09
> > I have reproduced the problem with one-line test.
> > ./nl-route-get 192.168.1.1
> > The problem is with this message:
> >
> > -- Debug: Sent Message:
> > -------------------------- BEGIN NETLINK MESSAGE
> > ---------------------------
> > [HEADER] 16 octets
> > .nlmsg_len = 20
> > .nlmsg_type = 18 <route/link>
> > .nlmsg_flags = 773 <REQUEST,ACK,ROOT,MATCH>
> > .nlmsg_seq = 1193143772
> > .nlmsg_pid = 8233
> > [PAYLOAD] 16 octets
> > 00 1d fa 20 00 00 00 00 81 0e 02 00 00 00 00 00 ... ............
> > --------------------------- END NETLINK MESSAGE
> > ---------------------------
> > it starts dump and requests ACK.
>
> libnl sets the ACK bit for all requests unless the application
> disables this behaviour.
Should I make NM disable ACKs for now until it gets fixed?
Dan
* Re: Regression in current git - Network Manager fails (bisected)
2007-10-23 14:10 ` Dan Williams
@ 2007-10-25 9:06 ` Thomas Graf
0 siblings, 0 replies; 8+ messages in thread
From: Thomas Graf @ 2007-10-25 9:06 UTC (permalink / raw)
To: Dan Williams
Cc: Denis V. Lunev, netdev, Denis V. Lunev, David S. Miller,
Alexey Kuznetsov
* Dan Williams <dcbw@redhat.com> 2007-10-23 10:10
> Should I make NM disable ACKs for now until it gets fixed?
The reason libnl enables ACKs by default is to give the
application using it clear synchronisation points. For
change requests that means the interface function won't
return until the change has been committed, as it will
call nl_wait_for_ack(). So if you disable it in NM and
run it on "old" kernels still using async netlink, you
won't be sure when the change is actually made, so
this might break things if you rely on it.
I think providing an invalid message handler which returns
NL_OK if nlmsg_type is NLMSG_DONE or NLMSG_ERROR && err == 0
would be better if you need some kind of workaround. As
those messages are always last, this should never cause
real trouble.
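The check such a handler would make can be sketched outside libnl. The Python below is only an illustration of the decision, reading the condition as "NLMSG_DONE, or NLMSG_ERROR with err == 0"; it parses a raw netlink message and does not use the libnl API. The function name is_harmless_terminator is hypothetical, and a real handler would be registered through libnl's callback mechanism, which is not shown here.

```python
import struct

# Message types from <linux/netlink.h>.
NLMSG_ERROR = 2
NLMSG_DONE = 3

# struct nlmsghdr: nlmsg_len (u32), nlmsg_type (u16), nlmsg_flags (u16),
# nlmsg_seq (u32), nlmsg_pid (u32); 16 bytes, native byte order, no padding.
NLMSGHDR = struct.Struct("=IHHII")


def is_harmless_terminator(raw):
    """Return True if `raw` is an NLMSG_DONE, or an NLMSG_ERROR with error 0 (an ACK)."""
    if len(raw) < NLMSGHDR.size:
        return False
    _length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(raw)
    if msg_type == NLMSG_DONE:
        return True
    if msg_type == NLMSG_ERROR:
        # The payload of NLMSG_ERROR starts with a signed int error code.
        if len(raw) < NLMSGHDR.size + 4:
            return False
        (err,) = struct.unpack_from("=i", raw, NLMSGHDR.size)
        return err == 0
    return False


# Rebuild a message shaped like the one in the reported error:
# type=DONE length=20 flags=<MULTI> with the sequence number and pid from the log.
done_msg = NLMSGHDR.pack(20, NLMSG_DONE, 0x002, 1193012574, 1185943630) + struct.pack("=i", 0)
print(is_harmless_terminator(done_msg))
```

A handler built on this check would return NL_OK when the function is true and leave every other message on the normal invalid-message path.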
Thread overview: 8+ messages
2007-10-22 0:58 Regression in current git - Network Manager fails (bisected) Joseph Fannin
2007-10-22 9:22 ` Denis V. Lunev
2007-10-22 15:57 ` Dan Williams
2007-10-23 12:11 ` Thomas Graf
2007-10-23 13:09 ` Denis V. Lunev
2007-10-23 13:38 ` Thomas Graf
2007-10-23 14:10 ` Dan Williams
2007-10-25 9:06 ` Thomas Graf